Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/

[Nov 6, 2018 5:14:57 AM] (wwei) YARN-8969. AbstractYarnScheduler#getNodeTracker should return generic
[Nov 6, 2018 5:38:39 AM] (xiao) HDFS-14053. Provide ability for NN to re-replicate based on topology
[Nov 6, 2018 6:50:09 AM] (wwei) YARN-8970. Improve the debug message in
[Nov 6, 2018 9:42:38 AM] (aajisaka) HADOOP-15904. [JDK 11] Javadoc build failed due to bad use of '>' in
[Nov 6, 2018 11:44:05 AM] (aajisaka) HADOOP-15902. [JDK 11] Specify the HTML version of Javadoc to 4.01.
[Nov 6, 2018 12:26:43 PM] (aajisaka) HDDS-811. [JDK 11] Fix compilation failures with jdk 11. Contributed by
[Nov 6, 2018 2:40:59 PM] (jlowe) YARN-8865. RMStateStore contains large number of expired
[Nov 6, 2018 5:22:27 PM] (templedf) HDFS-14047. [libhdfs++] Fix hdfsGetLastExceptionRootCause bug in
[Nov 6, 2018 5:28:54 PM] (elek) HDDS-798. Storage-class is showing incorrectly. Contributed by Bharat
[Nov 6, 2018 6:05:58 PM] (inigoiri) HDFS-14051. Refactor NameNodeHttpServer#initWebHdfs to specify local
[Nov 6, 2018 6:42:19 PM] (aengineer) HDDS-810. Move the "\r\n" at the chunk end in
[Nov 6, 2018 7:18:15 PM] (inigoiri) HDFS-14048. DFSOutputStream close() throws exception on subsequent call
[Nov 6, 2018 7:46:05 PM] (eyang) YARN-8957. Add Serializable interface to ComponentContainers.
[Nov 6, 2018 11:55:51 PM] (jlowe) MAPREDUCE-7156. NullPointerException when reaching max shuffle

-1 overall

The following subsystems voted -1:
    findbugs hadolint pathlen shadedclient unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

Specific tests:

   XML :

       Parsing Error(s):
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

   cc:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/diff-compile-cc-root.txt  [4.0K]

   javac:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/diff-compile-javac-root.txt  [324K]

   checkstyle:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/diff-checkstyle-root.txt  [17M]

   hadolint:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/diff-patch-hadolint.txt  [4.0K]

   pathlen:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/pathlen.txt  [12K]

   pylint:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/diff-patch-pylint.txt  [40K]

   shellcheck:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/diff-patch-shellcheck.txt  [68K]

   shelldocs:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/diff-patch-shelldocs.txt  [12K]

   whitespace:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/whitespace-eol.txt  [9.3M]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/whitespace-tabs.txt  [1.1M]

   xml:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/xml.txt  [4.0K]

   findbugs:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-hdds_client.txt  [24K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-hdds_container-service.txt  [8.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-hdds_framework.txt  [4.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-hdds_server-scm.txt  [12K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-hdds_tools.txt  [4.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-ozone_client.txt  [8.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-ozone_common.txt  [4.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-ozone_objectstore-service.txt  [8.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-ozone_ozone-manager.txt  [4.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-ozone_ozonefs.txt  [16K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-hadoop-ozone_s3gateway.txt  [44K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/950/artifact/out/branch-findbugs-
[jira] [Resolved] (HDFS-14048) DFSOutputStream close() throws exception on subsequent call after DataNode restart
[ https://issues.apache.org/jira/browse/HDFS-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri resolved HDFS-14048.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.9.2
                   2.10.0

> DFSOutputStream close() throws exception on subsequent call after DataNode
> restart
> --------------------------------------------------------------------------
>
>                 Key: HDFS-14048
>                 URL: https://issues.apache.org/jira/browse/HDFS-14048
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>             Fix For: 2.10.0, 2.9.2, 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
>         Attachments: HDFS-14048-branch-2.000.patch, HDFS-14048.000.patch
>
>
> We recently discovered an issue in which, during a rolling upgrade, some jobs
> were failing with exceptions like (sadly this is the whole stack trace):
> {code}
> java.io.IOException: A datanode is restarting:
> DatanodeInfoWithStorage[1.1.1.1:71,BP-,DISK]
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:877)
> {code}
> with an earlier statement in the log like:
> {code}
> INFO [main] org.apache.hadoop.hdfs.DFSClient: A datanode is restarting:
> DatanodeInfoWithStorage[1.1.1.1:71,BP-,DISK]
> {code}
> Strangely we did not see any other logs about the {{DFSOutputStream}} failing
> after waiting for the DataNode restart. We eventually realized that in some
> cases {{DFSOutputStream#close()}} may be called more than once, and that if
> so, the {{IOException}} above is thrown on the _second_ call to {{close()}}
> (this is even with HDFS-5335; prior to this it would have been thrown on all
> calls to {{close()}} besides the first).
>
> The problem is that in {{DataStreamer#createBlockOutputStream()}}, after the
> new output stream is created, it resets the error states:
> {code}
> errorState.resetInternalError();
> // remove all restarting nodes from failed nodes list
> failed.removeAll(restartingNodes);
> restartingNodes.clear();
> {code}
> But it forgets to clear {{lastException}}. When
> {{DFSOutputStream#closeImpl()}} is called a second time, this block is
> triggered:
> {code}
> if (isClosed()) {
>   LOG.debug("Closing an already closed stream. [Stream:{}, streamer:{}]",
>       closed, getStreamer().streamerClosed());
>   try {
>     getStreamer().getLastException().check(true);
> {code}
> The second time, {{isClosed()}} is true, so the exception checking occurs and
> the "Datanode is restarting" exception is thrown even though the stream has
> already been successfully closed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
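[Editor's note] The failure mode quoted above can be condensed into a small sketch. The classes and methods below are hypothetical stand-ins, not the real DFSOutputStream/DataStreamer code: the sketch only models why a stale lastException makes the second close() throw, and how clearing it during pipeline recovery (the HDFS-14048 fix) avoids that.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

public class StaleExceptionSketch {
    static class Streamer {
        final AtomicReference<IOException> lastException = new AtomicReference<>();
        boolean closed = false;

        void onDatanodeRestart() {
            // Analogous to the "A datanode is restarting" error being recorded.
            lastException.set(new IOException("A datanode is restarting"));
        }

        // Analogous to createBlockOutputStream() resetting error state after
        // the pipeline is rebuilt. The bug: omitting lastException.set(null)
        // leaves the stale error behind.
        void recoverPipeline(boolean clearLastException) {
            if (clearLastException) {
                lastException.set(null); // the fix: clear the stale exception
            }
        }

        // Mirrors the closeImpl() logic quoted above: a second close() on an
        // already-closed stream re-checks lastException and throws if non-null.
        void close() throws IOException {
            if (closed) {
                IOException e = lastException.get();
                if (e != null) {
                    throw e; // spurious failure on the second close()
                }
                return;
            }
            closed = true;
        }
    }

    static boolean secondCloseSucceeds(boolean withFix) {
        Streamer s = new Streamer();
        s.onDatanodeRestart();
        s.recoverPipeline(withFix);
        try {
            s.close(); // first close: succeeds and marks the stream closed
            s.close(); // second close: throws only if lastException is stale
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + secondCloseSucceeds(false));
        System.out.println("with fix:    " + secondCloseSucceeds(true));
    }
}
```

Without the fix the second close() rethrows the stale "datanode is restarting" exception; with the cleared state both calls succeed.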
[jira] [Reopened] (HDFS-14048) DFSOutputStream close() throws exception on subsequent call after DataNode restart
[ https://issues.apache.org/jira/browse/HDFS-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen reopened HDFS-14048:
--------------------------------

Re-opening for branch-2 commit. Sorry for the trouble [~elgoiri], I have just
attached the branch-2 patch. Since I'm not sure whether Jenkins will run
properly given the branch-2 build issues, I also ran all of the following
tests locally without any failures:
{{TestClientProtocolForPipelineRecovery, TestDFSOutputStream,
TestClientBlockVerification, TestDatanodeRestart}}

> DFSOutputStream close() throws exception on subsequent call after DataNode
> restart
> --------------------------------------------------------------------------
>
>                 Key: HDFS-14048
>                 URL: https://issues.apache.org/jira/browse/HDFS-14048
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>             Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
>         Attachments: HDFS-14048.000.patch
>
[jira] [Created] (HDDS-815) Rename Ozone/HDDS config keys prefixed with 'dfs'
Arpit Agarwal created HDDS-815:
----------------------------------

             Summary: Rename Ozone/HDDS config keys prefixed with 'dfs'
                 Key: HDDS-815
                 URL: https://issues.apache.org/jira/browse/HDDS-815
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
            Reporter: Arpit Agarwal


The following Ozone config keys are prefixed with _dfs_, which is the prefix used by HDFS. Instead, we should prefix them with either HDDS or Ozone.

{code}
dfs.container.ipc
dfs.container.ipc.random.port
dfs.container.ratis.datanode.storage.dir
dfs.container.ratis.enabled
dfs.container.ratis.ipc
dfs.container.ratis.ipc.random.port
dfs.container.ratis.num.container.op.executors
dfs.container.ratis.num.write.chunk.threads
dfs.container.ratis.replication.level
dfs.container.ratis.rpc.type
dfs.container.ratis.segment.preallocated.size
dfs.container.ratis.segment.size
dfs.container.ratis.statemachinedata.sync.timeout
dfs.ratis.client.request.max.retries
dfs.ratis.client.request.retry.interval
dfs.ratis.client.request.timeout.duration
dfs.ratis.leader.election.minimum.timeout.duration
dfs.ratis.server.failure.duration
dfs.ratis.server.request.timeout.duration
dfs.ratis.server.retry-cache.timeout.duration
dfs.ratis.snapshot.threshold
{code}

Additionally, _dfs.container.ipc_ should be changed to _dfs.container.ipc.port_.
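[Editor's note] One possible shape of the rename is sketched below. The _hdds._ replacement prefix and the special-cased _.port_ suffix are assumptions for illustration, not the final names chosen in HDDS-815.

```java
public class KeyRenameSketch {
    // Hypothetical mapping: swap the 'dfs.' prefix for an assumed 'hdds.'
    // prefix, and give the bare ipc key the '.port' suffix proposed above.
    static String rename(String key) {
        if (key.equals("dfs.container.ipc")) {
            return "hdds.container.ipc.port";
        }
        return key.replaceFirst("^dfs\\.", "hdds.");
    }

    public static void main(String[] args) {
        String[] oldKeys = {
            "dfs.container.ipc",
            "dfs.container.ratis.enabled",
            "dfs.ratis.snapshot.threshold",
        };
        for (String k : oldKeys) {
            System.out.println(k + " -> " + rename(k));
        }
    }
}
```

In practice a rename like this would also register the old names via Hadoop's configuration-deprecation mechanism so existing ozone-site.xml files keep working.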
[jira] [Created] (HDDS-814) dfs.ratis.leader.election.minimum.timeout.duration should not be read by client
Arpit Agarwal created HDDS-814:
----------------------------------

             Summary: dfs.ratis.leader.election.minimum.timeout.duration should not be read by client
                 Key: HDDS-814
                 URL: https://issues.apache.org/jira/browse/HDDS-814
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
            Reporter: Arpit Agarwal


dfs.ratis.leader.election.minimum.timeout.duration is read by the client for the following assertion.

{code}
Preconditions.assertTrue(maxRetryCount * retryInterval > 5 * leaderElectionTimeout,
    "Please make sure dfs.ratis.client.request.max.retries * "
        + "dfs.ratis.client.request.retry.interval > "
        + "5 * dfs.ratis.leader.election.minimum.timeout.duration");
{code}

This does not guarantee that the leader is using the same value as the client. We should probably just ensure that the defaults are sane and remove this assertion.
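[Editor's note] A minimal sketch of why the client-side check is unreliable: the client evaluates the inequality against *its own* configuration, which may differ from the election timeout the leader actually uses. All values below are made up for illustration.

```java
public class RetryCheckSketch {
    // The invariant the quoted Preconditions call enforces: the client's
    // total retry budget must outlast five leader-election timeouts.
    static boolean retriesOutlastElections(long maxRetries, long retryIntervalMs,
                                           long leaderElectionTimeoutMs) {
        return maxRetries * retryIntervalMs > 5 * leaderElectionTimeoutMs;
    }

    public static void main(String[] args) {
        // The client's config says the election timeout is 1s, so the
        // assertion passes (180 * 100ms = 18s > 5s)...
        System.out.println(retriesOutlastElections(180, 100, 1000));
        // ...but if the leader is actually configured with 5s, the same
        // retry budget no longer satisfies the invariant (18s < 25s).
        System.out.println(retriesOutlastElections(180, 100, 5000));
    }
}
```

The check can therefore pass on the client while the property it is meant to guarantee does not hold on the server, which is the argument for dropping the assert in favor of sane defaults.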
[jira] [Created] (HDDS-813) [JDK11] mvn javadoc:javadoc fails
Akira Ajisaka created HDDS-813:
----------------------------------

             Summary: [JDK11] mvn javadoc:javadoc fails
                 Key: HDDS-813
                 URL: https://issues.apache.org/jira/browse/HDDS-813
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
            Reporter: Akira Ajisaka


{{mvn javadoc:javadoc -Phdds}} fails on Java 11:

{noformat}
[ERROR] /Users/aajisaka/git/hadoop/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/client/ScmClient.java:107: error: bad use of '>'
[ERROR]    * @param count count must be > 0.
[ERROR] /Users/aajisaka/git/hadoop/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocol/LocatedContainer.java:85: error: unknown tag: DatanodeInfo
[ERROR]    * @return Set nodes that currently host the container
[ERROR] /Users/aajisaka/git/hadoop/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocol/ScmLocatedBlock.java:71: error: unknown tag: DatanodeInfo
[ERROR]    * @return List nodes that currently host the block
[ERROR] /Users/aajisaka/git/hadoop/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/audit/Auditable.java:28: error: malformed HTML
[ERROR]    * @return Map with values to be logged in audit.
[ERROR]              ^
[ERROR] /Users/aajisaka/git/hadoop/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/audit/Auditable.java:28: error: bad use of '>'
[ERROR]    * @return Map with values to be logged in audit.
{noformat}
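[Editor's note] The usual repair for these errors is to escape bare '<' and '>' in Javadoc, typically with {@literal} or {@code}, which the stricter JDK 11 doclet accepts. The sketch below is illustrative, not the actual HDDS patch; DatanodeInfo is a placeholder inner class standing in for the real type.

```java
import java.util.Collections;
import java.util.Set;

/** Illustrative class showing JDK 11-safe Javadoc; not the real HDDS code. */
public class JavadocFixSketch {
    /** Placeholder stand-in for the real DatanodeInfo type. */
    static class DatanodeInfo { }

    /**
     * Before the fix this tag read "count must be > 0", which the JDK 11
     * doclet rejects as "bad use of '>'"; {@literal >} (or &gt;) is the
     * common repair.
     *
     * @param count count must be {@literal >} 0.
     */
    void allocate(int count) { }

    /**
     * A raw "Set<DatanodeInfo>" in a tag is parsed as an unknown HTML tag on
     * JDK 11; wrapping the generic type in {@code ...} avoids that.
     *
     * @return {@code Set<DatanodeInfo>} nodes that currently host the container
     */
    Set<DatanodeInfo> getNodes() {
        return Collections.emptySet();
    }

    public static void main(String[] args) {
        System.out.println(new JavadocFixSketch().getNodes().isEmpty());
    }
}
```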
[jira] [Resolved] (HDFS-13939) [JDK10] Javadoc build fails on JDK 10 in hadoop-hdfs-project
[ https://issues.apache.org/jira/browse/HDFS-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka resolved HDFS-13939.
----------------------------------
    Resolution: Done

All the sub-tasks were resolved. Closing this.

> [JDK10] Javadoc build fails on JDK 10 in hadoop-hdfs-project
> ------------------------------------------------------------
>
>                 Key: HDFS-13939
>                 URL: https://issues.apache.org/jira/browse/HDFS-13939
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: build, documentation
>            Reporter: Takanobu Asanuma
>            Priority: Major
>
> There are many javadoc errors on JDK 10 in hadoop-hdfs-project. Let's fix
> them per project or module.
> * hadoop-hdfs-project/hadoop-hdfs: 212 errors
> * hadoop-hdfs-project/hadoop-hdfs-client: 85 errors
> * hadoop-hdfs-project/hadoop-hdfs-rbf: 34 errors
> We can confirm the errors with the command below.
> {noformat}
> $ mvn javadoc:javadoc --projects hadoop-hdfs-project/hadoop-hdfs-client
> {noformat}
> See also: HADOOP-15785
[jira] [Created] (HDDS-812) TestEndPoint#testCheckVersionResponse is failing
Nanda kumar created HDDS-812:
--------------------------------

             Summary: TestEndPoint#testCheckVersionResponse is failing
                 Key: HDDS-812
                 URL: https://issues.apache.org/jira/browse/HDDS-812
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: test
            Reporter: Nanda kumar


TestEndPoint#testCheckVersionResponse is failing with the error below:

{code:java}
[ERROR] testCheckVersionResponse(org.apache.hadoop.ozone.container.common.TestEndPoint)  Time elapsed: 0.142 s  <<< FAILURE!
java.lang.AssertionError: expected: but was:
{code}

Once we are in the REGISTER state, we no longer allow the getVersion call. This is causing the test case to fail.
[jira] [Created] (HDDS-810) Move the "\r\n" at the chunk end in TestSignedChunksInputStream#singlechunkwithoutend
chencan created HDDS-810:
----------------------------

             Summary: Move the "\r\n" at the chunk end in TestSignedChunksInputStream#singlechunkwithoutend
                 Key: HDDS-810
                 URL: https://issues.apache.org/jira/browse/HDDS-810
             Project: Hadoop Distributed Data Store
          Issue Type: Test
            Reporter: chencan
[jira] [Created] (HDDS-809) Refactor SCMChillModeManager
Yiqun Lin created HDDS-809:
------------------------------

             Summary: Refactor SCMChillModeManager
                 Key: HDDS-809
                 URL: https://issues.apache.org/jira/browse/HDDS-809
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
    Affects Versions: 0.3.0
            Reporter: Yiqun Lin
            Assignee: Yiqun Lin


Currently the class SCMChillModeManager and the related classes Precheck/ChillModePrecheck sit directly under the package {{org.apache.hadoop.hdds.scm.server}}. This is not an appropriate location, and we can refactor it. In addition, we can separate its inner (rule) classes into independent classes. This will make the chill mode management logic much cleaner.