[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578846#comment-16578846 ]

Chen Liang commented on HDFS-13735:
-----------------------------------

Thanks for pointing that out, [~shv]. I've committed this to branch-3.0 and branch-3.1.

> Make QJM HTTP URL connection timeout configurable
> -------------------------------------------------
>
>                 Key: HDFS-13735
>                 URL: https://issues.apache.org/jira/browse/HDFS-13735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: qjm
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Minor
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: HDFS-13735.000.patch, HDFS-13735.001.patch
>
>
> We've seen "connect timed out" happen internally when QJM tries to open HTTP
> connections to JNs. This currently uses {{newDefaultURLConnectionFactory}},
> which uses the default 60s timeout and is not configurable.
> It would be better for this to be configurable, especially for
> ObserverNameNode (HDFS-12943), where latency is important and 60s may not be
> a good value.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
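The issue asks for the QJM-to-JN HTTP timeouts to come from configuration instead of the fixed 60s default. Below is a minimal sketch of that idea in plain Java.

Several parts are assumptions:
* A {{Map<String, String>}} stands in for Hadoop's {{Configuration}}.
* The key names {{example.qjournal.http.connect.timeout.ms}} and {{example.qjournal.http.read.timeout.ms}} are hypothetical placeholders. They are not necessarily the keys the committed patch added to {{DFSConfigKeys}}.
* The JN host in the usage example is made up.

The timeouts are applied with the standard {{HttpURLConnection#setConnectTimeout}} and {{setReadTimeout}} calls.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Collections;
import java.util.Map;

/** Sketch: configurable connect/read timeouts for QJM-to-JN HTTP connections. */
final class JournalHttpTimeouts {
    // Hypothetical key names, for illustration only.
    static final String CONNECT_TIMEOUT_KEY = "example.qjournal.http.connect.timeout.ms";
    static final String READ_TIMEOUT_KEY = "example.qjournal.http.read.timeout.ms";
    // Keeps today's behavior when nothing is configured: 60s, as in the issue description.
    static final int DEFAULT_TIMEOUT_MS = 60_000;

    final int connectTimeoutMs;
    final int readTimeoutMs;

    JournalHttpTimeouts(Map<String, String> conf) {
        this.connectTimeoutMs = millis(conf, CONNECT_TIMEOUT_KEY);
        this.readTimeoutMs = millis(conf, READ_TIMEOUT_KEY);
    }

    /** Returns the configured value in ms, or the 60s default when the key is absent. */
    private static int millis(Map<String, String> conf, String key) {
        String raw = conf.get(key);
        if (raw == null) {
            return DEFAULT_TIMEOUT_MS;
        }
        int value = Integer.parseInt(raw.trim());
        // HttpURLConnection rejects negative timeouts; 0 would mean "wait forever".
        if (value < 0) {
            throw new IllegalArgumentException(key + " must be >= 0, got " + value);
        }
        return value;
    }

    /** Creates (without connecting) an HTTP connection with both timeouts applied. */
    HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // E.g. an ObserverNameNode deployment that wants connects to fail fast.
        JournalHttpTimeouts timeouts =
                new JournalHttpTimeouts(Collections.singletonMap(CONNECT_TIMEOUT_KEY, "10000"));
        HttpURLConnection conn = timeouts.open("http://jn1.example.com:8480/");
        System.out.println(conn.getConnectTimeout() + " " + conn.getReadTimeout()); // prints "10000 60000"
    }
}
```

Defaulting to the old 60s keeps existing deployments unchanged. Latency-sensitive setups can opt into a shorter connect timeout without also shortening the read timeout.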
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576979#comment-16576979 ]

Konstantin Shvachko commented on HDFS-13735:
--------------------------------------------

Should we also push it to branch-3.0, [~vagarychen]?
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575184#comment-16575184 ]

Hudson commented on HDFS-13735:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14740 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14740/])
HDFS-13735. Make QJM HTTP URL connection timeout configurable. (cliang: rev 5326a7906de7c86a236d948012cabf3a9ba82310)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575173#comment-16575173 ]

Chao Sun commented on HDFS-13735:
---------------------------------

Thanks [~shv], [~xkrogen] for the review and [~vagarychen] for helping to commit it!
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575151#comment-16575151 ]

Chen Liang commented on HDFS-13735:
-----------------------------------

I've committed this to trunk. Thanks [~csun] for the contribution and [~shv], [~xkrogen] for the review!
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566200#comment-16566200 ]

Konstantin Shvachko commented on HDFS-13735:
--------------------------------------------

+1. Let's move on. I think this should go into trunk.
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555981#comment-16555981 ]

Chao Sun commented on HDFS-13735:
---------------------------------

Thanks for taking a look, [~shv]! Please see my replies inline.

bq. Can we reuse an existing parameter for this purpose?

I looked around, and the only existing configs are {{dfs.webhdfs.socket.[connect|read]-timeout}}, which do not seem like a good fit for this use case.

bq. If we cannot use existing, should we make the new ones public, keep undocumented, or use a reasonable hard-coded constant?

I'm more in favor of having config parameters for these. There are also already several timeout-related configurations in {{QuorumJournalManager}}, all of which are exposed through {{hdfs-default.xml}}. Should we do the same for these two, just for consistency?

bq. If we introduce a new parameter, we should give it a reasonable default value. What is the reasonable timeout here? You set it to the old default.

I think it's probably fine to keep the old default *without ObserverNameNode*. With ObserverNameNode, though, the timeout value should be decreased for the reasons I listed above. Internally, our 5-minute P99 latency for this is a little under 8 seconds (this also includes the time to apply the edit logs). So 10 seconds seems like a good value to me, although it obviously depends on the environment.

bq. The best solution would be to take the http call (readOp()) out of the global lock. Can it be done?

I'm not completely sure. One approach might be to add an {{init}} method to {{EditLogInputStream}} and call it outside the global lock. This would call {{EditLogFileInputStream#init()}} (see [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java#L134]), which creates the HTTP connection. A hackier way is to call {{EditLogInputStream#getVersion(false)}}.

However, one issue is that we currently select input streams inside the lock (see [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java#L298]). We would have to move that selection outside the lock to make the above possible.

Even if we can move the http call outside the global lock, point 2) I mentioned above:

bq. The namespace freshness of ObserverNameNode w.r.t Active NN will be as stale as 60s because of the long timeout. This may not be acceptable for some scenarios.

is still not resolved, though.
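The suggestion above splits the tailer's work into two phases. First, open the remote streams (the slow HTTP step) without holding the namesystem lock. Then take the write lock only to apply the edits that were already fetched. Below is a minimal sketch of that split using {{java.util.concurrent.locks.ReentrantReadWriteLock}}.

{{EditSource}}, {{tailOnce}} and {{appliedEdits}} are hypothetical stand-ins, not the real {{EditLogTailer}} or {{EditLogInputStream}} API. The sketch also simplifies in two ways:
* It fetches edits eagerly, whereas the real code streams them lazily.
* It ignores stream selection, which the comment notes currently happens inside the lock.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: do slow remote I/O outside the lock, apply edits inside it. */
final class TwoPhaseTailer {
    /** Hypothetical remote source of edits (stands in for a JournalNode stream). */
    interface EditSource {
        List<String> fetch() throws IOException; // may block up to the HTTP timeout
    }

    private final ReentrantReadWriteLock namespaceLock = new ReentrantReadWriteLock();
    private final List<String> appliedEdits = new ArrayList<>();

    boolean isWriteLocked() {
        return namespaceLock.isWriteLocked();
    }

    /** Returns true if edits were fetched and applied; false means "retry next iteration". */
    boolean tailOnce(EditSource source) {
        final List<String> edits;
        try {
            // Phase 1: network I/O with no lock held, so readers are not blocked
            // for the duration of a connect timeout.
            edits = source.fetch();
        } catch (IOException e) {
            return false;
        }
        // Phase 2: purely in-memory apply under the write lock.
        namespaceLock.writeLock().lock();
        try {
            appliedEdits.addAll(edits);
        } finally {
            namespaceLock.writeLock().unlock();
        }
        return true;
    }

    /** Snapshot of applied edits, read under the read lock. */
    List<String> appliedEdits() {
        namespaceLock.readLock().lock();
        try {
            return new ArrayList<>(appliedEdits);
        } finally {
            namespaceLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        TwoPhaseTailer tailer = new TwoPhaseTailer();
        boolean ok = tailer.tailOnce(() -> Arrays.asList("OP_MKDIR /a", "OP_MKDIR /b"));
        boolean failed = tailer.tailOnce(() -> { throw new IOException("connect timed out"); });
        System.out.println(ok + " " + failed + " " + tailer.appliedEdits());
    }
}
```

As the comment points out, this only removes the lock-hold problem. A slow connect still delays how fresh the namespace is.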
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554951#comment-16554951 ]

Konstantin Shvachko commented on HDFS-13735:
--------------------------------------------

Looked at your patch, Chao. I am not a big fan of adding new timeout configuration parameters for every type of connection. QJM, and Hadoop in general, already have a bunch of those. So my questions are:
# Can we reuse an existing parameter for this purpose?
# If we cannot use existing, should we make the new ones public, keep undocumented, or use a reasonable hard-coded constant?
# If we introduce a new parameter, we should give it a reasonable default value. What is the reasonable timeout here? You set it to the old default.
# The best solution would be to take the http call ({{readOp()}}) out of the global lock. Can it be done?
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554570#comment-16554570 ]

Chao Sun commented on HDFS-13735:
---------------------------------

[~shv]: could you take a look?
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545900#comment-16545900 ]

genericqa commented on HDFS-13735:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 29s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 27m 35s | trunk passed |
| +1 | compile | 0m 54s | trunk passed |
| +1 | checkstyle | 0m 15s | trunk passed |
| +1 | mvnsite | 1m 1s | trunk passed |
| +1 | shadedclient | 10m 42s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 59s | trunk passed |
| +1 | javadoc | 0m 47s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 3s | the patch passed |
| +1 | compile | 0m 52s | the patch passed |
| +1 | javac | 0m 52s | the patch passed |
| +1 | checkstyle | 0m 7s | the patch passed |
| +1 | mvnsite | 0m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 53s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 3s | the patch passed |
| +1 | javadoc | 0m 45s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 97m 35s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 158m 53s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13735 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931838/HDFS-13735.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 72842cbd3f21 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 121865c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit |
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545822#comment-16545822 ]

Erik Krogen commented on HDFS-13735:
------------------------------------

The logic seems good to me. It is a pretty general change, so I don't see a reason not to include it in trunk.
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545752#comment-16545752 ]

genericqa commented on HDFS-13735:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 27m 51s | trunk passed |
| +1 | compile | 1m 1s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 1m 7s | trunk passed |
| +1 | shadedclient | 12m 11s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 59s | trunk passed |
| +1 | javadoc | 0m 47s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 2s | the patch passed |
| +1 | compile | 0m 54s | the patch passed |
| +1 | javac | 0m 54s | the patch passed |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 0m 59s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 55s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 58s | the patch passed |
| +1 | javadoc | 0m 43s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 77m 17s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 141m 8s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| | hadoop.hdfs.client.impl.TestBlockReaderLocal |
| | hadoop.tools.TestHdfsConfigFields |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13735 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931814/HDFS-13735.000.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux e365c9f614fd 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0c7a578 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24596/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24596/testReport/ |
| Max. process+thread count | 3852 (vs. ulimit of 1) |
| modules | C:
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545750#comment-16545750 ]

Chao Sun commented on HDFS-13735:
---------------------------------

[~xkrogen]: I will add this to {{hdfs-default.xml}}.

I'd like to decrease the timeout so that it fails quickly. Currently, if the timeout occurs, the {{EditLogTailer}} will [retry|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java#L468] in the next iteration. This is a minor issue for the SBN but not for the ObserverNameNode, for two reasons:

1. The HTTP connection is opened while holding the NN read/write lock. See [readOp() inside FSEditLogLoader|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java#L213] and how it calls [EditLogFileInputStream#init()|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java#L134] to open the HTTP connection to the JNs. As a result, a huge RPC spike could occur when the connection times out.
2. The namespace freshness of ObserverNameNode w.r.t Active NN will be as stale as 60s because of the long timeout. This may not be acceptable for some scenarios.

Since this is mainly an issue for ObserverNameNode, please also let me know whether it makes more sense to move this JIRA under HDFS-12943. Thanks.
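The comment describes a QJM connect timeout as non-fatal: the tailer simply retries on its next iteration, so a long timeout costs freshness rather than correctness. Below is a minimal sketch of that retry-on-next-iteration behavior.

{{RetryingTailLoop}} and {{FetchAttempt}} are hypothetical names, not the actual {{EditLogTailer}} code. The real tailer also sleeps between iterations, which the sketch only marks with a comment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.SocketTimeoutException;

/** Sketch: a tailing loop that treats a connect timeout as "retry next iteration". */
final class RetryingTailLoop {
    /** Hypothetical single attempt to open a JN connection and read edits. */
    interface FetchAttempt {
        void fetch() throws IOException;
    }

    /** Number of iterations that ended in a timeout. */
    int timeouts;

    /**
     * Runs up to maxIterations attempts and returns the 1-based iteration on
     * which a fetch first succeeded, or -1 if every attempt timed out.
     */
    int run(FetchAttempt attempt, int maxIterations) {
        for (int i = 1; i <= maxIterations; i++) {
            try {
                attempt.fetch();
                return i;
            } catch (SocketTimeoutException e) {
                // Non-fatal: count it and retry on the next iteration. Each such
                // failure can cost up to the full connect timeout (60s by default).
                timeouts++;
            } catch (IOException e) {
                // Other I/O errors are surfaced to the caller in this sketch.
                throw new UncheckedIOException(e);
            }
            // A real tailer would sleep for its tailing period here.
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        RetryingTailLoop loop = new RetryingTailLoop();
        // Simulate two "connect timed out" failures followed by a success.
        int succeededOn = loop.run(() -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("connect timed out");
            }
        }, 5);
        // prints "succeeded on iteration 3 after 2 timeouts"
        System.out.println("succeeded on iteration " + succeededOn + " after " + loop.timeouts + " timeouts");
    }
}
```

Because each timed-out iteration can block for the whole connect timeout, shrinking that timeout directly bounds how long the Observer can fall behind.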
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545678#comment-16545678 ]

Erik Krogen commented on HDFS-13735:
------------------------------------

The patch LGTM, except that you need to update {{hdfs-default.xml}} as well. However, I would like to understand more about the use case. Are you saying you want to increase the timeouts to avoid "connect timed out" errors, or decrease them to make it fail more quickly? What is the implication if a timeout occurs here: a fatal exception or a retry?
[jira] [Commented] (HDFS-13735) Make QJM HTTP URL connection timeout configurable
[ https://issues.apache.org/jira/browse/HDFS-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545606#comment-16545606 ]

Chao Sun commented on HDFS-13735:
---------------------------------

[~shv], [~xkrogen]: could you take a look at this patch? Thanks.