[jira] [Created] (HDFS-13525) RBF: Add unit test TestStateStoreDisabledNameserviceStore

2018-05-02 Thread Yiqun Lin (JIRA)
Yiqun Lin created HDFS-13525:


 Summary: RBF: Add unit test TestStateStoreDisabledNameserviceStore
 Key: HDFS-13525
 URL: https://issues.apache.org/jira/browse/HDFS-13525
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.1
Reporter: Yiqun Lin
Assignee: Yiqun Lin









[jira] [Created] (HDFS-13524) Occasional "All datanodes are bad" error in TestLargeBlock#testLargeBlockSize

2018-05-02 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-13524:
--

 Summary: Occasional "All datanodes are bad" error in 
TestLargeBlock#testLargeBlockSize
 Key: HDFS-13524
 URL: https://issues.apache.org/jira/browse/HDFS-13524
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: TestLargeBlock#testLargeBlockSize may fail with the following error:
{quote}
All datanodes 
[DatanodeInfoWithStorage[127.0.0.1:44968,DS-acddd79e-cdf1-4ac5-aac5-e804a2e61600,DISK]]
 are bad. Aborting...
{quote}

Tracing back, the error is due to the stress applied to the host while sending a 2 GB 
block, which causes a write pipeline ack read timeout:
{quote}
2017-09-10 22:16:07,285 [DataXceiver for client 
DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block 
BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO  
datanode.DataNode (DataXceiver.java:writeBlock(742)) - Receiving 
BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001 src: 
/127.0.0.1:57794 dest: /127.0.0.1:44968
2017-09-10 22:16:50,402 [DataXceiver for client 
DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block 
BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] WARN  
datanode.DataNode (BlockReceiver.java:flushOrSync(434)) - Slow flushOrSync took 
5383ms (threshold=300ms), isSync:false, flushTotalNanos=5383638982ns, 
volume=file:/tmp/tmp.1oS3ZfDCwq/src/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/
2017-09-10 22:17:54,427 [ResponseProcessor for block 
BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001] WARN  
hdfs.DataStreamer (DataStreamer.java:run(1214)) - Exception for 
BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/127.0.0.1:57794 remote=/127.0.0.1:44968]
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at 
org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:434)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at 
org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1104)
2017-09-10 22:17:54,432 [DataXceiver for client 
DFSClient_NONMAPREDUCE_998779779_9 at /127.0.0.1:57794 [Receiving block 
BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001]] INFO  
datanode.DataNode (BlockReceiver.java:receiveBlock(1000)) - Exception for 
BP-682118952-172.26.15.143-1505106964162:blk_1073741825_1001
java.io.IOException: Connection reset by peer
{quote}

Instead of raising the read timeout, I suggest increasing the cluster size from the 
default of 1 to 3 DataNodes, so that the client has the opportunity to choose a 
different DataNode and resend.
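
For illustration, a minimal sketch (assuming the existing test scaffolding) of bringing 
the mini cluster up with three DataNodes; the builder calls are standard MiniDFSCluster 
test APIs, but the surrounding setup shown here is only an approximation of the actual test:

{code:java}
// Sketch only: start the mini cluster with 3 DataNodes instead of the default 1,
// so the write pipeline can be rebuilt on another DataNode after a timeout.
Configuration conf = new HdfsConfiguration();
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(3)   // default is 1
    .build();
try {
  cluster.waitActive();
  // ... run the existing 2 GB block write/verify logic here ...
} finally {
  cluster.shutdown();
}
{code}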

I suspect this started failing after HDFS-13103 (Hadoop 2.8/3.0.0-alpha1), which 
introduced the client acknowledgement read timeout.
Reporter: Wei-Chiu Chuang









[jira] [Created] (HDFS-13523) Support observer nodes in MiniDFSCluster

2018-05-02 Thread Erik Krogen (JIRA)
Erik Krogen created HDFS-13523:
--

 Summary: Support observer nodes in MiniDFSCluster
 Key: HDFS-13523
 URL: https://issues.apache.org/jira/browse/HDFS-13523
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, test
Reporter: Erik Krogen


MiniDFSCluster should support Observer nodes so that we can write decent 
integration tests.
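
To make the goal concrete, a purely hypothetical sketch of what a test might look like 
once this support exists; {{transitionToObserver()}} and the three-NameNode topology 
helper below are not existing MiniDFSCluster methods, they only illustrate the desired API:

{code:java}
// Hypothetical API sketch -- not existing MiniDFSCluster methods.
Configuration conf = new Configuration();
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .nnTopology(MiniDFSNNTopology.simpleHATopology(3)) // active + standby + observer (proposed)
    .numDataNodes(1)
    .build();
cluster.waitActive();
cluster.transitionToActive(0);
cluster.transitionToObserver(2); // proposed, analogous to transitionToStandby(int)
{code}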






[jira] [Created] (HDFS-13522) Support observer node from Router-Based Federation

2018-05-02 Thread Erik Krogen (JIRA)
Erik Krogen created HDFS-13522:
--

 Summary: Support observer node from Router-Based Federation
 Key: HDFS-13522
 URL: https://issues.apache.org/jira/browse/HDFS-13522
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: federation, namenode
Reporter: Erik Krogen


Changes will need to be made to the Router to support the new observer node.

One such change will be to make the Router understand the observer state, e.g. in 
{{FederationNamenodeServiceState}}.
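
As a rough sketch of one possible shape of that change (the existing enum members 
listed here are assumed for illustration, not quoted from the current code):

{code:java}
// Illustrative only: add an OBSERVER state so the Router can track observer
// NameNodes and, eventually, route eligible reads to them.
public enum FederationNamenodeServiceState {
  ACTIVE,
  OBSERVER,     // proposed new value
  STANDBY,
  UNAVAILABLE,
  EXPIRED;
}
{code}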






[jira] [Resolved] (HDFS-6589) TestDistributedFileSystem.testAllWithNoXmlDefaults failed intermittently

2018-05-02 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-6589.
---
Resolution: Cannot Reproduce

Resolving as Cannot Reproduce. The last time I saw this bug was two years ago. Most 
likely it was a real bug that has since been fixed.

> TestDistributedFileSystem.testAllWithNoXmlDefaults failed intermittently
> 
>
> Key: HDFS-6589
> URL: https://issues.apache.org/jira/browse/HDFS-6589
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: flaky-test
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/7207 is clean, while
> https://builds.apache.org/job/PreCommit-HDFS-Build/7208 has the following
> failure, even though the code is essentially the same.
> Running the same test locally does not reproduce it, so it appears to be a flaky test.
> {code}
> Stacktrace
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at org.junit.Assert.assertFalse(Assert.java:74)
>   at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSClient(TestDistributedFileSystem.java:263)
>   at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testAllWithNoXmlDefaults(TestDistributedFileSystem.java:651)
> {code}






[jira] [Created] (HDFS-13521) NFS Gateway should support impersonation

2018-05-02 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-13521:
--

 Summary: NFS Gateway should support impersonation
 Key: HDFS-13521
 URL: https://issues.apache.org/jira/browse/HDFS-13521
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang


Similar to HDFS-10481, the NFS gateway and HttpFS are independent processes that 
accept client connections.
The NFS Gateway currently solves the file permission/ownership problem by running as 
the HDFS superuser and then calling setOwner() to change the file owner.

This is not desirable:
# it adds additional RPC load on the NameNode;
# it does not support at-rest encryption, because by design the HDFS superuser cannot 
access the KMS.

This is yet another problem around KMS ACLs. [~xiaochen] [~rushabh.shah] thoughts?
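
For reference, a minimal sketch of the proxy-user (impersonation) pattern that would 
avoid the superuser + setOwner() workaround; {{UserGroupInformation.createProxyUser()}} 
and {{doAs()}} are standard Hadoop security APIs, while the user name and path below 
are placeholders and the gateway principal is assumed to be whitelisted via the 
{{hadoop.proxyuser.*}} settings:

{code:java}
// Sketch only: the gateway's login user impersonates the end user, so files are
// created with the correct owner (no setOwner() call) and KMS/encryption checks
// see the end user rather than the HDFS superuser.
UserGroupInformation gatewayUgi = UserGroupInformation.getLoginUser();
UserGroupInformation proxyUgi =
    UserGroupInformation.createProxyUser("alice", gatewayUgi); // "alice" is a placeholder
proxyUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
  FileSystem fs = FileSystem.get(conf);                 // conf assumed in scope
  fs.create(new Path("/user/alice/file.bin")).close();  // owned by "alice"
  return null;
});
{code}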






[jira] [Created] (HDFS-13520) fuse_dfs to support keytab based login

2018-05-02 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-13520:
--

 Summary: fuse_dfs to support keytab based login
 Key: HDFS-13520
 URL: https://issues.apache.org/jira/browse/HDFS-13520
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.6.0
 Environment: Hadoop 2.6/3.0, Kerberized, fuse_dfs
Reporter: Wei-Chiu Chuang


It looks like the current fuse_dfs implementation only supports login using the 
current Kerberos credential (ticket cache). If the TGT expires, it fails with the 
following error:
{noformat}
hdfsBuilderConnect(forceNewInstance=1, nn=hdfs://ns1, port=0, 
kerbTicketCachePath=/tmp/krb5cc_2000, userName=systest) error:
LoginException: Unable to obtain Principal Name for authentication 
org.apache.hadoop.security.KerberosAuthException: failure to login: for user: 
systest using ticket cache file: /tmp/krb5cc_2000 
javax.security.auth.login.LoginException: Unable to obtain Principal Name for 
authentication
at 
org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:807)
at 
org.apache.hadoop.security.UserGroupInformation.getBestUGI(UserGroupInformation.java:742)
at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:404)
Caused by: javax.security.auth.login.LoginException: Unable to obtain Principal 
Name for authentication
at 
com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:841)
at 
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:704)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at 
org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:788)
... 2 more

{noformat}
This is easily reproducible in a test cluster with an extremely short ticket 
lifetime (e.g. 1 minute).

Note: HDFS-3608 addresses a similar issue, but in this case the ticket cache file 
itself does not change, so fuse_dfs cannot detect the expiration and re-login.

It looks like fuse_dfs should call UserGroupInformation#loginUserFromKeytab() at 
startup, similar to how the Balancer supports keytab-based login (HDFS-9804). 
Thanks [~xiaochen] for the idea.

Alternatively, a background thread could periodically re-login from the keytab.
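
A minimal sketch of the Java side of that approach (the principal and keytab path are 
placeholders; how fuse_dfs would pass them in, e.g. via new mount options, is left open):

{code:java}
// Sketch only: log in from a keytab once at startup, then re-login as needed.
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab(
    "systest@EXAMPLE.COM",                    // placeholder principal
    "/etc/security/keytabs/systest.keytab");  // placeholder keytab path

// Called periodically (e.g. from a background thread) before issuing RPCs;
// it is a no-op while the TGT is still fresh.
UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
{code}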






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2018-05-02 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/

[May 1, 2018 1:47:10 AM] (wwei) YARN-8212. Pending backlog for async allocation 
threads should be
[May 1, 2018 4:41:10 AM] (aengineer) HDDS-13. Refactor StorageContainerManager 
into seperate RPC endpoints.
[May 1, 2018 8:31:34 AM] (sunilg) YARN-8187. [UI2] Individual Node page does 
not contain breadcrumb trail.
[May 1, 2018 2:27:47 PM] (billie) YARN-2674. Fix distributed shell AM container 
relaunch during RM work
[May 1, 2018 3:12:46 PM] (inigoiri) HDFS-13503. Fix TestFsck test failures on 
Windows. Contributed by Xiao
[May 1, 2018 8:46:34 PM] (eyang) YARN-7799. Improved YARN service jar file 
handling.   
[May 1, 2018 9:19:53 PM] (jlowe) MAPREDUCE-7086. Add config to allow 
FileInputFormat to ignore
[May 1, 2018 9:32:40 PM] (stevel) HADOOP-15250. Split-DNS MultiHomed Server 
Network Cluster Network IPC




-1 overall


The following subsystems voted -1:
asflicense findbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
unit


Specific tests:

FindBugs :

   module:hadoop-hdds/common 
   Exceptional return value of java.io.File.mkdirs() ignored in 
org.apache.hadoop.utils.LevelDBStore.openDB(File, Options) At 
LevelDBStore.java:ignored in org.apache.hadoop.utils.LevelDBStore.openDB(File, 
Options) At LevelDBStore.java:[line 79] 
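
   (For context, the usual remedy for this warning is to check the boolean returned by
   mkdirs(); a generic sketch, not the actual LevelDBStore code:)

{code:java}
// Generic illustration of the fix for this FindBugs pattern; not the actual
// LevelDBStore.openDB() code.
File dbDir = new File("/path/to/db");  // placeholder path
if (!dbDir.exists() && !dbDir.mkdirs()) {
  throw new IOException("Unable to create directory " + dbDir);
}
{code}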

FindBugs :

   module:hadoop-hdds/server-scm 
   Synchronization performed on java.util.concurrent.ConcurrentMap in 
org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.updateContainerReportMetrics(StorageContainerDatanodeProtocolProtos$ContainerReportsRequestProto)
 At 
SCMDatanodeProtocolServer.java:org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.updateContainerReportMetrics(StorageContainerDatanodeProtocolProtos$ContainerReportsRequestProto)
 At SCMDatanodeProtocolServer.java:[line 230] 
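
   (Generic illustration only, not the actual SCMDatanodeProtocolServer code: the
   typical fix is to rely on the ConcurrentMap's own atomic operations, or a dedicated
   lock object, rather than synchronizing on the map itself:)

{code:java}
// Uses java.util.concurrent.ConcurrentHashMap and java.util.concurrent.atomic.LongAdder.
// Instead of: synchronized (metricsMap) { get/put ... }
ConcurrentMap<String, LongAdder> metricsMap = new ConcurrentHashMap<>();
String containerId = "container-1";  // placeholder key
metricsMap.computeIfAbsent(containerId, k -> new LongAdder()).increment();
{code}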

FindBugs :

   module:hadoop-ozone/common 
   org.apache.hadoop.ozone.web.handlers.UserArgs.getGroups() may expose 
internal representation by returning UserArgs.groups At UserArgs.java:by 
returning UserArgs.groups At UserArgs.java:[line 121] 
   org.apache.hadoop.ozone.web.handlers.UserArgs.setGroups(String[]) may 
expose internal representation by storing an externally mutable object into 
UserArgs.groups At UserArgs.java:by storing an externally mutable object into 
UserArgs.groups At UserArgs.java:[line 130] 
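
   (Generic illustration of the usual fix for these exposure warnings, namely defensive
   copies of the array; not the actual UserArgs code:)

{code:java}
// Illustrative only.
class UserArgsSketch {
  private String[] groups;

  public String[] getGroups() {
    return groups == null ? null : groups.clone();        // return a copy
  }

  public void setGroups(String[] groups) {
    this.groups = groups == null ? null : groups.clone();  // store a copy
  }
}
{code}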

Failed junit tests :

   hadoop.hdfs.client.impl.TestBlockReaderLocal 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration 
   hadoop.hdfs.TestReconstructStripedFile 
   
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-compile-javac-root.txt
  [332K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/whitespace-eol.txt
  [9.4M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/whitespace-tabs.txt
  [1.1M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/xml.txt
  [4.0K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/branch-findbugs-hadoop-hdds_common-warnings.html
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/branch-findbugs-hadoop-hdds_server-scm-warnings.html
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/branch-findbugs-hadoop-ozone_client.txt
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/branch-findbugs-hadoop-ozone_common-warnings.html
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out/branch-findbugs-hadoop-ozone_objectstore-service.txt
  [4.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/769/artifact/out