[jira] [Created] (HDFS-13525) RBF: Add unit test TestStateStoreDisabledNameserviceStore
Yiqun Lin created HDFS-13525:
--------------------------------

             Summary: RBF: Add unit test TestStateStoreDisabledNameserviceStore
                 Key: HDFS-13525
                 URL: https://issues.apache.org/jira/browse/HDFS-13525
             Project: Hadoop HDFS
          Issue Type: Sub-task
    Affects Versions: 3.0.1
            Reporter: Yiqun Lin
            Assignee: Yiqun Lin

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-13524) Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize
Wei-Chiu Chuang created HDFS-13524:
--------------------------------------

             Summary: Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize
                 Key: HDFS-13524
                 URL: https://issues.apache.org/jira/browse/HDFS-13524
             Project: Hadoop HDFS
          Issue Type: Bug
         Environment:
TestLargeBlock#testLargeBlockSize may fail with the error:

{quote}
All datanodes [DatanodeInfoWithStorage[127.0.0.1:44968,DS-acddd79e-cdf1-4ac5-aac5-e804a2e61600,DISK]] are bad. Aborting...
{quote}

Tracing back, the error is due to the stress applied to the host while sending a 2GB block, causing a write pipeline ack read timeout:

{quote}
2017-09-10 22:16:07,285 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (DataXceiver.java:writeBlock(742)) - Receiving BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 src: /127.0.0.1:57794 dest: /127.0.0.1:44968
2017-09-10 22:16:50,402 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] WARN datanode.DataNode (BlockReceiver.java:flushOrSync(434)) - Slow flushOrSync took 5383ms (threshold=300ms), isSync:false, flushTotalNanos=5383638982ns, volume=file:/tmp/tmp.1oS3ZfDCwq/src/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/
2017-09-10 22:17:54,427 [ResponseProcessor for block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001] WARN hdfs.DataStreamer (DataStreamer.java:run(1214)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:57794 remote=/127.0.0.1:44968]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:434)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
	at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1104)
2017-09-10 22:17:54,432 [DataXceiver for client DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO datanode.DataNode (BlockReceiver.java:receiveBlock(1000)) - Exception for BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 java.io.IOException: Connection reset by peer
{quote}

Instead of raising the read timeout, I suggest increasing the cluster size from the default of 1 to 3, so that the client has the opportunity to choose a different DN and resend.

Suspect this started failing after HDFS-13103, in Hadoop 2.8/3.0.0-alpha1, when we introduced the client acknowledgement read timeout.

            Reporter: Wei-Chiu Chuang
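The suggestion above hinges on pipeline recovery having a spare datanode to swap in. A minimal illustrative model (plain Java, not Hadoop code; all names here are invented for the sketch) of why a 3-node MiniDFSCluster helps where a 1-node cluster aborts:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative model: when a pipeline node times out, the client can only
// rebuild the write pipeline if a healthy node exists outside the bad set.
// With a single-datanode cluster, every failure yields the
// "All datanodes ... are bad. Aborting..." path.
class PipelineRecoveryModel {

    /**
     * Returns a rebuilt pipeline excluding the bad nodes, or null when no
     * healthy datanode remains (the abort case from the report above).
     */
    static List<String> rebuildPipeline(List<String> clusterNodes,
                                        Set<String> badNodes) {
        List<String> rebuilt = new ArrayList<>();
        for (String node : clusterNodes) {
            if (!badNodes.contains(node)) {
                rebuilt.add(node);
            }
        }
        return rebuilt.isEmpty() ? null : rebuilt;
    }
}
```

With three nodes, excluding one slow datanode still leaves two candidates; with the default of one, the same timeout is fatal.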
[jira] [Created] (HDFS-13523) Support observer nodes in MiniDFSCluster
Erik Krogen created HDFS-13523:
----------------------------------

             Summary: Support observer nodes in MiniDFSCluster
                 Key: HDFS-13523
                 URL: https://issues.apache.org/jira/browse/HDFS-13523
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: namenode, test
            Reporter: Erik Krogen

MiniDFSCluster should support Observer nodes so that we can write decent integration tests.
[jira] [Created] (HDFS-13522) Support observer node from Router-Based Federation
Erik Krogen created HDFS-13522:
----------------------------------

             Summary: Support observer node from Router-Based Federation
                 Key: HDFS-13522
                 URL: https://issues.apache.org/jira/browse/HDFS-13522
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: federation, namenode
            Reporter: Erik Krogen

Changes will need to occur to the router to support the new observer node. One such change will be to make the router understand the observer state, e.g. {{FederationNamenodeServiceState}}.
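To make the shape of that change concrete, here is a hypothetical sketch of a namenode-state enum extended with an OBSERVER member. The real {{FederationNamenodeServiceState}} lives in the hadoop-hdfs-rbf module; the member names and helper methods below are illustrative assumptions, not a copy of the Hadoop source.

```java
// Hypothetical sketch: a router-side view of namenode state that knows
// about observers. Reads may be served by ACTIVE or OBSERVER namenodes;
// writes must still go only to the ACTIVE one.
enum FederationNamenodeServiceState {
    ACTIVE,
    OBSERVER,   // the new state the router would need to understand
    STANDBY,
    UNAVAILABLE;

    /** Whether the router may route read RPCs to a namenode in this state. */
    boolean canServeReads() {
        return this == ACTIVE || this == OBSERVER;
    }

    /** Whether the router may route write RPCs to a namenode in this state. */
    boolean canServeWrites() {
        return this == ACTIVE;
    }
}
```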
[jira] [Resolved] (HDFS-6589) TestDistributedFileSystem.testAllWithNoXmlDefaults failed intermittently
     [ https://issues.apache.org/jira/browse/HDFS-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang resolved HDFS-6589.
-----------------------------------
    Resolution: Cannot Reproduce

Resolving as cannot reproduce. The last time I saw this bug was two years ago. Most likely it was a real bug that was fixed later.

> TestDistributedFileSystem.testAllWithNoXmlDefaults failed intermittently
> ------------------------------------------------------------------------
>
>                 Key: HDFS-6589
>                 URL: https://issues.apache.org/jira/browse/HDFS-6589
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.5.0
>            Reporter: Yongjun Zhang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>              Labels: flaky-test
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/7207 is clean
> https://builds.apache.org/job/PreCommit-HDFS-Build/7208 has the following failure. The code is essentially the same.
> Running the same test locally doesn't reproduce it; a flaky test.
> {code}
> Stacktrace
> java.lang.AssertionError: null
> 	at org.junit.Assert.fail(Assert.java:86)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.junit.Assert.assertFalse(Assert.java:64)
> 	at org.junit.Assert.assertFalse(Assert.java:74)
> 	at org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSClient(TestDistributedFileSystem.java:263)
> 	at org.apache.hadoop.hdfs.TestDistributedFileSystem.testAllWithNoXmlDefaults(TestDistributedFileSystem.java:651)
> {code}
[jira] [Created] (HDFS-13521) NFS Gateway should support impersonation
Wei-Chiu Chuang created HDFS-13521:
--------------------------------------

             Summary: NFS Gateway should support impersonation
                 Key: HDFS-13521
                 URL: https://issues.apache.org/jira/browse/HDFS-13521
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Wei-Chiu Chuang

Similar to HDFS-10481, the NFS gateway and httpfs are independent processes that accept client connections. The NFS Gateway currently solves the file permission/ownership problem by running as the HDFS superuser and then calling setOwner() to change the file owner. This is not desirable:
# it adds additional RPC load to the NameNode.
# it does not support at-rest encryption, because by design the HDFS superuser cannot access KMS. This is yet another problem around KMS ACLs.

[~xiaochen] [~rushabh.shah] thoughts?
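For context, Hadoop's usual alternative to running as the superuser is proxy-user (impersonation) configuration in core-site.xml, which is what a fix along the lines of HDFS-10481 would likely build on. A sketch, assuming the gateway runs as a principal named {{nfsserver}} on a host named {{nfs-gateway-host.example.com}} (both names are placeholders, not from this issue):

```xml
<!-- Sketch of proxy-user (impersonation) settings in core-site.xml.
     hadoop.proxyuser.<user>.hosts/groups are standard Hadoop properties;
     the user "nfsserver" and the hostname are illustrative assumptions. -->
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>nfs-gateway-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>*</value>
</property>
```

With such a configuration, the gateway could create files as the end user directly instead of creating them as the superuser and calling setOwner() afterwards.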
[jira] [Created] (HDFS-13520) fuse_dfs to support keytab based login
Wei-Chiu Chuang created HDFS-13520:
--------------------------------------

             Summary: fuse_dfs to support keytab based login
                 Key: HDFS-13520
                 URL: https://issues.apache.org/jira/browse/HDFS-13520
             Project: Hadoop HDFS
          Issue Type: Improvement
    Affects Versions: 2.6.0
         Environment: Hadoop 2.6/3.0, Kerberized, fuse_dfs
            Reporter: Wei-Chiu Chuang

It looks like the current fuse_dfs implementation supports login using the current Kerberos credential. If the TGT expires, it fails with the following error:

{noformat}
hdfsBuilderConnect(forceNewInstance=1, nn=hdfs://ns1, port=0, kerbTicketCachePath=/tmp/krb5cc_2000, userName=systest) error:
LoginException: Unable to obtain Principal Name for authentication
org.apache.hadoop.security.KerberosAuthException: failure to login: for user: systest using ticket cache file: /tmp/krb5cc_2000 javax.security.auth.login.LoginException: Unable to obtain Principal Name for authentication
	at org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:807)
	at org.apache.hadoop.security.UserGroupInformation.getBestUGI(UserGroupInformation.java:742)
	at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:404)
Caused by: javax.security.auth.login.LoginException: Unable to obtain Principal Name for authentication
	at com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:841)
	at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:704)
	at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
	at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
	at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
	at org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:788)
	... 2 more
{noformat}

This is easily reproducible in a test cluster with an extremely short ticket lifetime (e.g. 1 minute).

Note: HDFS-3608 addresses a similar issue, but in this case, since the ticket cache file itself does not change, fuse cannot detect the change and update.

It looks like it should call UserGroupInformation#loginUserFromKeytab() at startup, similar to how the balancer supports keytab based login (HDFS-9804). Thanks [~xiaochen] for the idea. Alternatively, have a background thread that continuously relogins from the keytab.
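The background-relogin alternative mentioned above can be sketched with a small scheduling harness. This is plain Java, not Hadoop code: the Runnable stands in for whatever the real Kerberos relogin call would be (in Hadoop that would live in UserGroupInformation); only the scheduling structure is the point here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a daemon that periodically refreshes Kerberos credentials so a
// long-lived process like fuse_dfs never runs with an expired TGT. The
// relogin action is injected; this class only owns the schedule.
class KeytabReloginDaemon {

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "keytab-relogin");
            t.setDaemon(true);  // must not block process shutdown
            return t;
        });

    /**
     * Runs reloginTask every periodMillis. In practice the period should be
     * well inside the ticket lifetime (e.g. lifetime / 2).
     */
    void start(Runnable reloginTask, long periodMillis) {
        scheduler.scheduleAtFixedRate(reloginTask, periodMillis, periodMillis,
            TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

A one-shot login at startup (the first alternative in the description) is simpler; the daemon approach additionally survives arbitrarily long process lifetimes.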
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/

[May 1, 2018 1:47:10 AM] (wwei) YARN-8212. Pending backlog for async allocation threads should be
[May 1, 2018 4:41:10 AM] (aengineer) HDDS-13. Refactor StorageContainerManager into seperate RPC endpoints.
[May 1, 2018 8:31:34 AM] (sunilg) YARN-8187. [UI2] Individual Node page does not contain breadcrumb trail.
[May 1, 2018 2:27:47 PM] (billie) YARN-2674. Fix distributed shell AM container relaunch during RM work
[May 1, 2018 3:12:46 PM] (inigoiri) HDFS-13503. Fix TestFsck test failures on Windows. Contributed by Xiao
[May 1, 2018 8:46:34 PM] (eyang) YARN-7799. Improved YARN service jar file handling.
[May 1, 2018 9:19:53 PM] (jlowe) MAPREDUCE-7086. Add config to allow FileInputFormat to ignore
[May 1, 2018 9:32:40 PM] (stevel) HADOOP-15250. Split-DNS MultiHomed Server Network Cluster Network IPC

-1 overall

The following subsystems voted -1:
    asflicense findbugs unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

   FindBugs :

       module:hadoop-hdds/common
       Exceptional return value of java.io.File.mkdirs() ignored in org.apache.hadoop.utils.LevelDBStore.openDB(File, Options) At LevelDBStore.java:[line 79]

   FindBugs :

       module:hadoop-hdds/server-scm
       Synchronization performed on java.util.concurrent.ConcurrentMap in org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.updateContainerReportMetrics(StorageContainerDatanodeProtocolProtos$ContainerReportsRequestProto) At SCMDatanodeProtocolServer.java:[line 230]

   FindBugs :

       module:hadoop-ozone/common
       org.apache.hadoop.ozone.web.handlers.UserArgs.getGroups() may expose internal representation by returning UserArgs.groups At UserArgs.java:[line 121]
       org.apache.hadoop.ozone.web.handlers.UserArgs.setGroups(String[]) may expose internal representation by storing an externally mutable object into UserArgs.groups At UserArgs.java:[line 130]

   Failed junit tests :

       hadoop.hdfs.client.impl.TestBlockReaderLocal
       hadoop.hdfs.web.TestWebHdfsTimeouts
       hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
       hadoop.hdfs.TestReconstructStripedFile
       hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing

   cc:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-compile-cc-root.txt [4.0K]

   javac:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-compile-javac-root.txt [332K]

   checkstyle:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-checkstyle-root.txt [17M]

   pylint:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-patch-pylint.txt [24K]

   shellcheck:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-patch-shellcheck.txt [20K]

   shelldocs:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-patch-shelldocs.txt [12K]

   whitespace:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/whitespace-eol.txt [9.4M]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/whitespace-tabs.txt [1.1M]

   xml:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/xml.txt [4.0K]

   findbugs:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/branch-findbugs-hadoop-hdds_common-warnings.html [8.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/branch-findbugs-hadoop-hdds_server-scm-warnings.html [8.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/branch-findbugs-hadoop-ozone_client.txt [8.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/branch-findbugs-hadoop-ozone_common-warnings.html [8.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/branch-findbugs-hadoop-ozone_objectstore-service.txt [4.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out
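The two "may expose internal representation" FindBugs findings against UserArgs above are typically fixed with defensive copies. A minimal sketch (class and field names mirror the report, but this is not the actual Hadoop/Ozone source):

```java
import java.util.Arrays;

// Sketch of the standard fix for EI_EXPOSE_REP-style FindBugs warnings:
// copy the array on the way in (setter) and on the way out (getter) so no
// caller ever shares a reference to the internal groups array.
class UserArgs {

    private String[] groups;

    String[] getGroups() {
        // Return a copy so callers cannot mutate internal state.
        return groups == null ? null : Arrays.copyOf(groups, groups.length);
    }

    void setGroups(String[] groups) {
        // Store a copy so later changes to the caller's array do not leak in.
        this.groups =
            groups == null ? null : Arrays.copyOf(groups, groups.length);
    }
}
```

The mkdirs() finding in LevelDBStore is a different pattern: its boolean return value should be checked (or an exception thrown) rather than ignored.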