[jira] [Commented] (HDFS-10575) webhdfs fails with filenames including semicolons

2016-08-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436327#comment-15436327
 ] 

Yongjun Zhang commented on HDFS-10575:
--

Thanks [~bobhansen] and [~yuanbo]. I resolved it as a duplicate of HDFS-10574.



> webhdfs fails with filenames including semicolons
> -
>
> Key: HDFS-10575
> URL: https://issues.apache.org/jira/browse/HDFS-10575
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Bob Hansen
> Attachments: curl_request.txt, dfs_copyfrom_local_traffic.txt
>
>
> Via webhdfs or native HDFS, we can create files with semicolons in their 
> names:
> {code}
> bhansen@::1 /tmp$ hdfs dfs -copyFromLocal /tmp/data 
> "webhdfs://localhost:50070/foo;bar"
> bhansen@::1 /tmp$ hadoop fs -ls /
> Found 1 items
> -rw-r--r--   2 bhansen supergroup  9 2016-06-24 12:20 /foo;bar
> {code}
> Attempting to fetch the file via webhdfs fails:
> {code}
> bhansen@::1 /tmp$ curl -L 
> "http://localhost:50070/webhdfs/v1/foo%3Bbar?user.name=bhansen=OPEN;
> {"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
>  does not exist: /foo\n\tat 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)\n\tat
>  
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)\n\tat
>  
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)\n\tat
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)\n\tat 
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)\n\tat 
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)\n\tat 
> java.security.AccessController.doPrivileged(Native Method)\n\tat 
> javax.security.auth.Subject.doAs(Subject.java:422)\n\tat 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)\n\tat
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)\n"}}
> {code}
> It appears (from the attached TCP dump in curl_request.txt) that the 
> namenode's redirect unescapes the semicolon, the DataNode's HTTP server then 
> splits the request at the semicolon, and it fails to find the file "foo".
> Interesting side notes:
> * In the attached dfs_copyfrom_local_traffic.txt, you can see the 
> copyFromLocal command writing the data to "foo;bar_COPYING_", which is then 
> redirected and just writes to "foo".  The subsequent rename attempts to 
> rename "foo;bar_COPYING_" to "foo;bar", but has the same parsing bug so 
> effectively renames "foo" to "foo;bar".
> Here is the full range of special characters that we initially started with 
> that led to the minimal reproducer above:
> {code}
> hdfs dfs -copyFromLocal /tmp/data webhdfs://localhost:50070/'~`!@#$%^& 
> ()-_=+|<.>]}",\\\[\{\*\?\;'\''data'
> curl -L 
> "http://localhost:50070/webhdfs/v1/%7E%60%21%40%23%24%25%5E%26+%28%29-_%3D%2B%7C%3C.%3E%5D%7D%22%2C%5C%5B%7B*%3F%3B%27data?user.name=bhansen=OPEN=0;
> {code}
> Thanks to [~anatoli.shein] for making a concise reproducer.
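
One way to avoid the split, shown purely as an illustration (this is not the HDFS-10574 patch, and the helper name is made up), is to percent-encode each path segment before building the redirect URL, so the semicolon survives all the way to the DataNode:

{code}
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/**
 * Sketch: percent-encode each path segment so characters such as ';'
 * reach the DataNode's HTTP server as %3B instead of being treated as
 * a path-parameter separator.
 */
public class WebHdfsPathEscapeSketch {

  static String encodePath(String path) throws UnsupportedEncodingException {
    String[] segments = path.split("/", -1);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < segments.length; i++) {
      if (i > 0) {
        sb.append('/');
      }
      // URLEncoder targets form encoding, so map '+' back to a proper space escape.
      sb.append(URLEncoder.encode(segments[i], "UTF-8").replace("+", "%20"));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    // "/foo;bar" becomes "/foo%3Bbar", so the request is no longer split at ';'.
    System.out.println(encodePath("/foo;bar"));
  }
}
{code}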



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-10575) webhdfs fails with filenames including semicolons

2016-08-24 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-10575.
--
Resolution: Duplicate

> webhdfs fails with filenames including semicolons
> -
>
> Key: HDFS-10575
> URL: https://issues.apache.org/jira/browse/HDFS-10575
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Bob Hansen
> Attachments: curl_request.txt, dfs_copyfrom_local_traffic.txt
>
>
> Via webhdfs or native HDFS, we can create files with semicolons in their 
> names:
> {code}
> bhansen@::1 /tmp$ hdfs dfs -copyFromLocal /tmp/data 
> "webhdfs://localhost:50070/foo;bar"
> bhansen@::1 /tmp$ hadoop fs -ls /
> Found 1 items
> -rw-r--r--   2 bhansen supergroup  9 2016-06-24 12:20 /foo;bar
> {code}
> Attempting to fetch the file via webhdfs fails:
> {code}
> bhansen@::1 /tmp$ curl -L 
> "http://localhost:50070/webhdfs/v1/foo%3Bbar?user.name=bhansen=OPEN;
> {"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
>  does not exist: /foo\n\tat 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)\n\tat
>  
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)\n\tat
>  
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)\n\tat
>  
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)\n\tat
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)\n\tat 
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)\n\tat 
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)\n\tat 
> java.security.AccessController.doPrivileged(Native Method)\n\tat 
> javax.security.auth.Subject.doAs(Subject.java:422)\n\tat 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)\n\tat
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)\n"}}
> {code}
> It appears (from the attached TCP dump in curl_request.txt) that the 
> namenode's redirect unescapes the semicolon, the DataNode's HTTP server then 
> splits the request at the semicolon, and it fails to find the file "foo".
> Interesting side notes:
> * In the attached dfs_copyfrom_local_traffic.txt, you can see the 
> copyFromLocal command writing the data to "foo;bar_COPYING_", which is then 
> redirected and just writes to "foo".  The subsequent rename attempts to 
> rename "foo;bar_COPYING_" to "foo;bar", but has the same parsing bug so 
> effectively renames "foo" to "foo;bar".
> Here is the full range of special characters that we initially started with 
> that led to the minimal reproducer above:
> {code}
> hdfs dfs -copyFromLocal /tmp/data webhdfs://localhost:50070/'~`!@#$%^& 
> ()-_=+|<.>]}",\\\[\{\*\?\;'\''data'
> curl -L 
> "http://localhost:50070/webhdfs/v1/%7E%60%21%40%23%24%25%5E%26+%28%29-_%3D%2B%7C%3C.%3E%5D%7D%22%2C%5C%5B%7B*%3F%3B%27data?user.name=bhansen=OPEN=0;
> {code}
> Thanks to [~anatoli.shein] for making a concise reproducer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-10788) fsck NullPointerException when it encounters corrupt replicas

2016-08-24 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-10788.
--
Resolution: Duplicate

Thanks guys, I'm marking it as a duplicate of HDFS-9958.


> fsck NullPointerException when it encounters corrupt replicas
> -
>
> Key: HDFS-10788
> URL: https://issues.apache.org/jira/browse/HDFS-10788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CDH5.5.2, CentOS 6.7
>Reporter: Jeff Field
>
> Somehow (I haven't found the root cause yet) we ended up with blocks that have 
> corrupt replicas where the replica count is inconsistent between the blockmap 
> and the corrupt replicas map. If we try to hdfs fsck any parent directory 
> that has a child with one of these blocks, fsck will exit with something like 
> this:
> {code}
> $ hdfs fsck /path/to/parent/dir/ | egrep -v '^\.+$'
> Connecting to namenode via http://mynamenode:50070
> FSCK started by bot-hadoop (auth:KERBEROS_SSL) from /10.97.132.43 for path 
> /path/to/parent/dir/ at Tue Aug 23 20:34:58 UTC 2016
> .FSCK 
> ended at Tue Aug 23 20:34:59 UTC 2016 in 1098 milliseconds
> null
> Fsck on path '/path/to/parent/dir/' FAILED
> {code}
> So I start at the top, fscking every subdirectory until I find one or more 
> that fails. Then I do the same thing with those directories (our top level 
> directories all have subdirectories with date directories in them, which then 
> contain the files) and once I find a directory with files in it, I run a 
> checksum of the files in that directory. When I do that, I don't get the name 
> of the file, rather I get:
> checksum: java.lang.NullPointerException
> but since the files are in order, I can figure it out by seeing which file 
> was before the NPE. Once I get to this point, I can see the following in the 
> namenode log when I try to checksum the corrupt file:
> 2016-08-23 20:24:59,627 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent 
> number of corrupt replicas for blk_1335893388_1100036319546 blockMap has 0 
> but corrupt replicas map has 1
> 2016-08-23 20:24:59,627 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 23 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 
> 192.168.1.100:47785 Call#1 Retry#0
> java.lang.NullPointerException
> At which point I can delete the file, but it is a very tedious process.
> Ideally, shouldn't fsck be able to emit the name of the file that is the 
> source of the problem - and (if -delete is specified) get rid of the file, 
> instead of exiting without saying why?
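
Until fsck itself reports the offending path, the manual narrowing described above can be scripted against the public FileSystem API; a rough sketch (the starting path is a placeholder):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Recursively request a checksum for every file under a directory and
 * print the paths whose checksum call fails (e.g. with the
 * NullPointerException seen above).
 */
public class FindBadChecksumFiles {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    walk(fs, new Path(args.length > 0 ? args[0] : "/path/to/parent/dir"));
  }

  static void walk(FileSystem fs, Path dir) throws Exception {
    for (FileStatus stat : fs.listStatus(dir)) {
      if (stat.isDirectory()) {
        walk(fs, stat.getPath());
      } else {
        try {
          fs.getFileChecksum(stat.getPath());
        } catch (Exception e) {
          System.out.println("checksum failed for " + stat.getPath() + ": " + e);
        }
      }
    }
  }
}
{code}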



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10748) TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436304#comment-15436304
 ] 

Hadoop QA commented on HDFS-10748:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 30s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 20s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825399/HDFS-10748.002.patch |
| JIRA Issue | HDFS-10748 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux dfc6c52fc785 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 5a6fc5f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16534/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16534/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16534/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout
> 
>
> Key: HDFS-10748
> URL: https://issues.apache.org/jira/browse/HDFS-10748
> Project: Hadoop HDFS
>  Issue Type: Bug
>  

[jira] [Created] (HDFS-10794) Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work

2016-08-24 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10794:
---

 Summary: Provide storage policy satisfy worker at DN for 
co-ordinating the block storage movement work
 Key: HDFS-10794
 URL: https://issues.apache.org/jira/browse/HDFS-10794
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to implement a mechanism to move blocks to the given 
target in order to satisfy the block storage policy. The datanode receives 
{{blocktomove}} details via the heartbeat response from the NN. More specifically, 
it is a datanode-side extension to handle the block storage movement commands.
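
As a very rough sketch of the idea (all class, method and field names below are hypothetical, not the eventual patch): commands arriving with the heartbeat response are queued on the datanode, and a background worker drains the queue, moving each block to its target storage.

{code}
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical DN-side worker that coordinates block storage movement. */
public class StoragePolicySatisfyWorkerSketch implements Runnable {

  /** Placeholder for one block-to-move item received from the NN. */
  static class BlockMovingInfo {
    long blockId;
    String sourceStorage;   // e.g. "DISK"
    String targetStorage;   // e.g. "ARCHIVE"
  }

  private final BlockingQueue<BlockMovingInfo> pending =
      new LinkedBlockingQueue<>();

  /** Called from heartbeat handling when the NN sends blocktomove details. */
  public void enqueue(List<BlockMovingInfo> commands) {
    pending.addAll(commands);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        moveBlock(pending.take());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  private void moveBlock(BlockMovingInfo info) {
    // A real implementation would copy the replica to the target storage
    // and report the result back to the NN; omitted here.
    System.out.println("moving block " + info.blockId + " from "
        + info.sourceStorage + " to " + info.targetStorage);
  }
}
{code}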



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-24 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436247#comment-15436247
 ] 

Jingcheng Du commented on HDFS-9668:


Thanks a lot for the comments! [~eddyxu]
bq. AutoCloseableLock acquireDatasetLock(boolean readLock);. Would it be more 
clear to split it into two methods acquireReadLock() and acquireWriteLock(). 
From the caller aspect, it makes the code self explained.
I defined the APIs in the same way as HDFS-10682. Yes, I can split it into two 
methods. Do you think we should retain the parameterless acquireDatasetLock() 
method?
bq. In FsDatasetImpl#getStoredBlock(). Could you explain what does blockOpLock 
protect? IMO, datasetReadLock does not need to proecte findMetadataFile() and 
parseGenerationStamp(). What if we do the following:
I tried to replace the old locks with the new locks directly, to avoid potential 
issues and concerns in review :), and I plan to do more refinements step by step 
according to the comments. You are right, we can move the file parsing operations 
out of the lock scope. I will do that in the next patch.
bq. Similarly, in getTmpInputStreams, the datasetReadLock and blockOpLock 
should only protect getReplicaInfo(), instead of several openAndSeek() calls. 
Btw, FsVolumeReference is AutoClosable that can be used into 
try-finally-resources as well.
Right, I will do it in the next patch.
bq. In private FsDatasetImpl#append(), you need the write lock to run
{{volumeMap.add(bpid, newReplicaInfo);}} has its own synchronization mutex 
inside its methods, so I think it is okay to use a read lock here?
bq. In summary, in your write-heavy workloads, the write requests need to 
acquire datasetWriteLock to update volumeMap. ... since the changes on block / 
blockFile can be protected by blockOpLock, it seems to me that there is no need 
to hold dataset (read/write) locks when manipulating the blocks (i.g., bump 
genstamp)
The read/write lock is used to synchronize volume operations with block 
operations, to avoid the race condition when block operations and adding/removing 
volume operations happen concurrently. So I have to retain the read locks even 
when only manipulating blocks? But yes, some read-only operations can be moved 
out of the lock scope.
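
For reference, a minimal self-contained sketch of the acquireReadLock()/acquireWriteLock() split discussed above (a stand-in for the patch, not the patch itself):

{code}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: expose the dataset lock as read/write sessions usable in try-with-resources. */
public class DatasetLockSketch {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(true);

  /** AutoCloseable wrapper so callers release the lock via try-with-resources. */
  public static final class Session implements AutoCloseable {
    private final Lock lock;
    Session(Lock lock) {
      this.lock = lock;
      lock.lock();
    }
    @Override
    public void close() {
      lock.unlock();
    }
  }

  public Session acquireReadLock() {
    return new Session(rwLock.readLock());
  }

  public Session acquireWriteLock() {
    return new Session(rwLock.writeLock());
  }
}
{code}

A read-mostly caller such as getStoredBlock() would then hold only the read session, e.g. try (DatasetLockSketch.Session s = lock.acquireReadLock()) { ... }, while adding/removing a volume takes the write session.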


> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> 

[jira] [Updated] (HDFS-10748) TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout

2016-08-24 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10748:
-
Attachment: HDFS-10748.002.patch

I found that this issue is very similar to HDFS-8729, where there is already a 
complete analysis and a corresponding patch (see 
link:https://issues.apache.org/jira/browse/HDFS-8729?focusedCommentId=1461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1461).

Since [~xuebinzhang@gmail.com] asked me offline about the current status of this 
jira, softly pinging [~ajisakaa]: could you take a look at this if you have 
time? Thanks in advance.

Finally, attaching a new patch that adds the sleep time suggested in the 
comment on HDFS-8729.
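
For context, the wait the test performs is essentially a poll-until-true with a deadline; a simplified stand-in (not DFSTestUtil internals and not the attached patch) looks like this:

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Poll a condition periodically and fail with TimeoutException at the deadline. */
public final class WaitForSketch {
  public static void waitFor(BooleanSupplier condition, long checkEveryMs,
      long timeoutMs) throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() > deadline) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(checkEveryMs);
    }
  }
}
{code}

Giving the restarted datanodes a little extra time to re-register before such a check gives up is the spirit of the change, e.g. waitFor(() -> replicaCount(file) == 3, 100, 60_000) with a generous timeout (replicaCount being a hypothetical helper here).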

> TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout
> 
>
> Key: HDFS-10748
> URL: https://issues.apache.org/jira/browse/HDFS-10748
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Xiaoyu Yao
>Assignee: Yiqun Lin
> Attachments: HDFS-10748.001.patch, HDFS-10748.002.patch
>
>
> This was fixed by HDFS-7886. But some recent [Jenkins 
> Results|https://builds.apache.org/job/PreCommit-HDFS-Build/16390/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
>  started seeing this again: 
> {code}
> Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 172.025 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
> testTruncateWithDataNodesRestart(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate)
>   Time elapsed: 43.861 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for 
> /test/testTruncateWithDataNodesRestart to reach 3 replicas
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:751)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestart(TestFileTruncate.java:704)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184

2016-08-24 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436161#comment-15436161
 ] 

Manoj Govindassamy commented on HDFS-10793:
---

Had a chat with [~andrew.wang] and here is the proposal:

{{HdfsAuditLogger.java}}
-- Retain the old abstract method {{logAuditEvent}}, the one with no 
{{CallerContext}} in the args
-- Make the current {{logAuditEvent}} method (the one with the {{CallerContext}} 
arg) non-abstract; its default body will simply delegate to the older method 
(dropping the {{CallerContext}} info). This lets older AuditLogger classes work 
without any changes or a rebuild against the new code
-- So any AuditLogger that wants to make use of the {{CallerContext}} info has to 
override the newer {{logAuditEvent}} method with a custom implementation

{{FSNamesystem.java}} 
-- In {{DefaultAuditLogger}}, implement the abstract {{logAuditEvent}} method 
with no {{CallerContext}} arg, with the body simply delegating to the current 
version of {{logAuditEvent}} and passing null for the {{CallerContext}}

Tested the above with a class implementing HdfsAuditLogger the older way; client 
operations are logged as expected without any method signature errors.
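
A condensed sketch of the proposal (signatures simplified; the real logAuditEvent takes more arguments):

{code}
import java.net.InetAddress;

/** Simplified stand-in for the audit logger class discussed above. */
abstract class HdfsAuditLoggerSketch {

  /** Old abstract overload: subclasses compiled against 2.7 implement this. */
  public abstract void logAuditEvent(boolean succeeded, String userName,
      InetAddress addr, String cmd, String src, String dst);

  /**
   * Newer overload (CallerContext-aware). Non-abstract: the default body
   * drops the context and delegates, so old subclasses keep working
   * without changes or a rebuild.
   */
  public void logAuditEvent(boolean succeeded, String userName,
      InetAddress addr, String cmd, String src, String dst,
      Object callerContext /* CallerContext in the real class */) {
    logAuditEvent(succeeded, userName, addr, cmd, src, dst);
  }
}

/** The DefaultAuditLogger side: the old overload delegates with a null context. */
class DefaultAuditLoggerSketch extends HdfsAuditLoggerSketch {
  @Override
  public void logAuditEvent(boolean succeeded, String userName,
      InetAddress addr, String cmd, String src, String dst) {
    logAuditEvent(succeeded, userName, addr, cmd, src, dst, null);
  }

  @Override
  public void logAuditEvent(boolean succeeded, String userName,
      InetAddress addr, String cmd, String src, String dst,
      Object callerContext) {
    System.out.println("audit: cmd=" + cmd + " src=" + src + " ctx=" + callerContext);
  }
}
{code}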





> Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
> --
>
> Key: HDFS-10793
> URL: https://issues.apache.org/jira/browse/HDFS-10793
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Manoj Govindassamy
>Priority: Blocker
>
> HDFS-9184 added a new parameter to an existing method signature in 
> HdfsAuditLogger, which is a Public/Evolving class. This breaks binary 
> compatibility with implementing subclasses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-4210) NameNode Format should not fail for DNS resolution on minority of JournalNode

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436157#comment-15436157
 ] 

Hadoop QA commented on HDFS-4210:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 76m 
41s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12682752/HDFS-4210.001.patch |
| JIRA Issue | HDFS-4210 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 6e1c2852c653 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a1f3293 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16533/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16533/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NameNode Format should not fail for DNS resolution on minority of JournalNode
> -
>
> Key: HDFS-4210
> URL: https://issues.apache.org/jira/browse/HDFS-4210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, journal-node, namenode
>Affects Versions: 2.6.0
>Reporter: Damien Hardy
>Assignee: John Zhuge
>Priority: Trivial
>  Labels: BB2015-05-TBR
> Attachments: HDFS-4210.001.patch
>
>
> Setting  : 
>   

[jira] [Commented] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184

2016-08-24 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436131#comment-15436131
 ] 

Manoj Govindassamy commented on HDFS-10793:
---


Here is a test showing the problem. 
-- Set the config key "dfs.namenode.audit.loggers" to a class implementing 
HdfsAuditLogger the older way. That is, class implementing the method with *no* 
CallerContext arg.
-- Run a client which runs setTimes() operation on the cluster filesystem
-- Client operation fails as the custom AuditLogger is not binary compatible 
with the new one

{noformat}
252 2016-08-24 18:27:30,010 [IPC Server handler 0 on 63080] WARN  ipc.Server 
(Server.java:logException(2494)) - IPC Server handler 0 on 63080, call 
org.apache.hadoop.hdfs.protocol.ClientProtocol.setTimes from 127.0.0.1:63086 
Call#5 Retry#0
253 java.lang.AbstractMethodError: 
org.apache.hadoop.hdfs.server.namenode.HdfsAuditLogger.logAuditEvent(ZLjava/lang/String;Ljava/net/InetAddress;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Lorg/apache/hadoop/fs/FileStatus;Lorg/apache/hadoop/ipc/CallerContext;Lorg/apache/hadoop/security/UserGroupInformation;Lorg/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager;)V
254 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logAuditEvent(FSNamesystem.java:362)
255 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logAuditEvent(FSNamesystem.java:340)
256 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1913)
257 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1344)
258 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:948)
259 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
260 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:663)
261 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
262 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2423)
263 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2419)
264 at java.security.AccessController.doPrivileged(Native Method)
265 at javax.security.auth.Subject.doAs(Subject.java:422)
266 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
267 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2419)
{noformat}
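
For clarity, "implementing HdfsAuditLogger the older way" means a subclass along the lines of the following sketch (signature simplified), which only provides the overload without CallerContext and therefore trips the AbstractMethodError above when the namenode invokes the newer abstract overload:

{code}
import java.net.InetAddress;

/** Sketch of a pre-HDFS-9184 style audit logger: no CallerContext overload. */
public class LegacyStyleAuditLogger /* extends HdfsAuditLogger */ {
  public void logAuditEvent(boolean succeeded, String userName,
      InetAddress addr, String cmd, String src, String dst) {
    System.out.println("audit: " + cmd + " " + src + " by " + userName);
  }
  // Note: the CallerContext variant added by HDFS-9184 is not overridden here.
}
{code}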

> Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
> --
>
> Key: HDFS-10793
> URL: https://issues.apache.org/jira/browse/HDFS-10793
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Manoj Govindassamy
>Priority: Blocker
>
> HDFS-9184 added a new parameter to an existing method signature in 
> HdfsAuditLogger, which is a Public/Evolving class. This breaks binary 
> compatibility with implementing subclasses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10791) Delete block meta file when the block file is missing

2016-08-24 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436085#comment-15436085
 ] 

Yiqun Lin commented on HDFS-10791:
--

Hi [~szetszwo], this issue seems to have already been fixed; the relevant code:
{code:title=FsDatasetImpl.java|borderStyle=solid}
  public void checkAndUpdate(String bpid, long blockId, File diskFile,
  File diskMetaFile, FsVolumeSpi vol) throws IOException {
Block corruptBlock = null;
ReplicaInfo memBlockInfo;
try (AutoCloseableLock lock = datasetLock.acquire()) {
  memBlockInfo = volumeMap.get(bpid, blockId);
  if (memBlockInfo != null && memBlockInfo.getState() != 
ReplicaState.FINALIZED) {
// Block is not finalized - ignore the difference
return;
  }

  final long diskGS = diskMetaFile != null && diskMetaFile.exists() ?
  Block.getGenerationStamp(diskMetaFile.getName()) :
HdfsConstants.GRANDFATHER_GENERATION_STAMP;

  if (diskFile == null || !diskFile.exists()) {
if (memBlockInfo == null) {
  // Block file does not exist and block does not exist in memory
  // If metadata file exists then delete it
  if (diskMetaFile != null && diskMetaFile.exists()
  && diskMetaFile.delete()) {
LOG.warn("Deleted a metadata file without a block "
+ diskMetaFile.getAbsolutePath());
  }
  return;
}
...
  }
{code}


> Delete block meta file when the block file is missing
> -
>
> Key: HDFS-10791
> URL: https://issues.apache.org/jira/browse/HDFS-10791
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>
> When the block file is missing, the block meta file should be deleted if it 
> exists.
> Note that such a situation is possible since the meta file is closed before the 
> block file, so the datanode could be killed in between.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-4210) NameNode Format should not fail for DNS resolution on minority of JournalNode

2016-08-24 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436050#comment-15436050
 ] 

John Zhuge commented on HDFS-4210:
--

[~clamb]/[~cwl], I am taking over the jira in order to push it over the finish 
line. Hope that is ok with you.

> NameNode Format should not fail for DNS resolution on minority of JournalNode
> -
>
> Key: HDFS-4210
> URL: https://issues.apache.org/jira/browse/HDFS-4210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, journal-node, namenode
>Affects Versions: 2.6.0
>Reporter: Damien Hardy
>Assignee: John Zhuge
>Priority: Trivial
>  Labels: BB2015-05-TBR
> Attachments: HDFS-4210.001.patch
>
>
> Setting  : 
>   qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>   cdh4master01 and cdh4master02 JournalNode up and running, 
>   cdh4worker03 not yet provisioned (no DNS entry)
> With :
> `hadoop namenode -format` fails with :
>   12/11/19 14:42:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.IllegalArgumentException: Unable to construct journal, 
> qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1235)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:745)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1099)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1233)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:161)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:141)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:353)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:104)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:93)
>   ... 10 more
> I suggest that if the quorum is up, format should not fail.
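
A sketch of the suggested behaviour, purely for illustration (names and structure are not from the attached patch): only fail the format when a majority of the configured JournalNode addresses cannot be resolved.

{code}
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

/** Count resolvable JournalNode addresses and require only a quorum of them. */
public class JournalQuorumCheckSketch {
  public static void main(String[] args) {
    List<String> journalNodes = Arrays.asList(
        "cdh4master01:8485", "cdh4master02:8485", "cdh4worker03:8485");

    int resolved = 0;
    for (String hostPort : journalNodes) {
      String host = hostPort.split(":")[0];
      int port = Integer.parseInt(hostPort.split(":")[1]);
      // The constructor attempts DNS resolution; isUnresolved() reports failure.
      InetSocketAddress addr = new InetSocketAddress(host, port);
      if (addr.isUnresolved()) {
        System.err.println("WARN: cannot resolve " + hostPort + ", continuing without it");
      } else {
        resolved++;
      }
    }

    int quorum = journalNodes.size() / 2 + 1;
    if (resolved < quorum) {
      throw new IllegalStateException("only " + resolved + " of "
          + journalNodes.size() + " JournalNodes resolvable; need " + quorum);
    }
    System.out.println("quorum of resolvable JournalNodes: " + resolved);
  }
}
{code}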



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-4210) NameNode Format should not fail for DNS resolution on minority of JournalNode

2016-08-24 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge reassigned HDFS-4210:


Assignee: John Zhuge  (was: Charles Lamb)

> NameNode Format should not fail for DNS resolution on minority of JournalNode
> -
>
> Key: HDFS-4210
> URL: https://issues.apache.org/jira/browse/HDFS-4210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, journal-node, namenode
>Affects Versions: 2.6.0
>Reporter: Damien Hardy
>Assignee: John Zhuge
>Priority: Trivial
>  Labels: BB2015-05-TBR
> Attachments: HDFS-4210.001.patch
>
>
> Setting  : 
>   qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>   cdh4master01 and cdh4master02 JournalNode up and running, 
>   cdh4worker03 not yet provisioned (no DNS entry)
> With :
> `hadoop namenode -format` fails with :
>   12/11/19 14:42:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.IllegalArgumentException: Unable to construct journal, 
> qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1235)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:745)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1099)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1233)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:161)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:141)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:353)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:104)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:93)
>   ... 10 more
> I suggest that if the quorum is up, format should not fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-9028) Create accessor methods for DataNode#data and DataNode#isBlockTokenEnabled

2016-08-24 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo resolved HDFS-9028.

Resolution: Won't Fix

The original motivation for creating these accessor methods is no longer valid. 
I figured out a way to mock them without the methods. Please re-open if you 
feel otherwise.

> Create accessor methods for DataNode#data and DataNode#isBlockTokenEnabled
> --
>
> Key: HDFS-9028
> URL: https://issues.apache.org/jira/browse/HDFS-9028
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Priority: Minor
>
> Currently both DataNode#data and DataNode#isBlockTokenEnabled instance 
> variables are package scoped with no accessor methods. This makes mocking in 
> unit tests difficult. This jira is to make them private scoped with proper 
> getters and setters.
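
For what it's worth, the accessors being proposed were along these lines (types simplified; illustration only):

{code}
/** Trivial sketch of the proposed DataNode accessors. */
class DataNodeAccessorsSketch {
  private Object data;                 // FsDatasetSpi<?> in the real DataNode
  private boolean blockTokenEnabled;

  Object getFSDataset() {
    return data;
  }

  void setFSDataset(Object dataset) {  // would let tests inject a mock dataset
    this.data = dataset;
  }

  boolean isBlockTokenEnabled() {
    return blockTokenEnabled;
  }

  void setBlockTokenEnabled(boolean enabled) {
    this.blockTokenEnabled = enabled;
  }
}
{code}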



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9028) Create accessor methods for DataNode#data and DataNode#isBlockTokenEnabled

2016-08-24 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated HDFS-9028:
---
Assignee: (was: Chris Trezzo)

> Create accessor methods for DataNode#data and DataNode#isBlockTokenEnabled
> --
>
> Key: HDFS-9028
> URL: https://issues.apache.org/jira/browse/HDFS-9028
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Priority: Minor
>
> Currently both DataNode#data and DataNode#isBlockTokenEnabled instance 
> variables are package scoped with no accessor methods. This makes mocking in 
> unit tests difficult. This jira is to make them private scoped with proper 
> getters and setters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10792) RedundantEditLogInputStream should log caught exceptions

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436022#comment-15436022
 ] 

Hadoop QA commented on HDFS-10792:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 74m 
45s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825354/HDFS-10792.01.patch |
| JIRA Issue | HDFS-10792 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux c4958ca13007 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a1f3293 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16532/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16532/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> RedundantEditLogInputStream should log caught exceptions
> 
>
> Key: HDFS-10792
> URL: https://issues.apache.org/jira/browse/HDFS-10792
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10792.01.patch
>
>
> 

[jira] [Updated] (HDFS-10475) Adding metrics for long FSD lock

2016-08-24 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10475:
--
Summary: Adding metrics for long FSD lock  (was: Adding metrics and 
warn/debug logs for long FSD lock)

> Adding metrics for long FSD lock
> 
>
> Key: HDFS-10475
> URL: https://issues.apache.org/jira/browse/HDFS-10475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> This is a follow-up to the comments on HADOOP-12916 and 
> [here|https://issues.apache.org/jira/browse/HDFS-9924?focusedCommentId=15310837&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15310837]:
>  add more metrics and WARN/DEBUG logs for long FSD/FSN locking operations on 
> the namenode, similar to what we have for slow write/network WARN/metrics on 
> the datanode.
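
A rough sketch of the kind of instrumentation being asked for (threshold and names are illustrative, and reentrant acquisition is ignored for brevity): time each lock hold, then bump a counter and log a WARN when the hold exceeds a threshold.

{code}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative lock wrapper that records long hold times as a metric and WARN log. */
public class InstrumentedLockSketch {
  private static final long WARN_THRESHOLD_MS = 300;  // illustrative value

  private final ReentrantLock lock = new ReentrantLock();
  private final AtomicLong numLongHolds = new AtomicLong();
  private volatile long acquiredAtNanos;

  public void lock() {
    lock.lock();
    acquiredAtNanos = System.nanoTime();
  }

  public void unlock() {
    long heldMs = (System.nanoTime() - acquiredAtNanos) / 1_000_000;
    lock.unlock();
    if (heldMs >= WARN_THRESHOLD_MS) {
      numLongHolds.incrementAndGet();  // would be exported via the metrics system
      System.err.println("WARN: lock held for " + heldMs + " ms");
    }
  }

  /** Exposed so the value can be published as a namenode metric. */
  public long getNumLongHolds() {
    return numLongHolds.get();
  }
}
{code}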



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10475) Adding metrics and warn/debug logs for long FSD lock

2016-08-24 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435977#comment-15435977
 ] 

Xiaoyu Yao commented on HDFS-10475:
---

Thanks [~zhz] for the heads up. The logging part is covered by HDFS-9145, but the 
metrics are not. I've updated the title to reflect that.

> Adding metrics and warn/debug logs for long FSD lock
> 
>
> Key: HDFS-10475
> URL: https://issues.apache.org/jira/browse/HDFS-10475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> This is a follow-up to the comments on HADOOP-12916 and 
> [here|https://issues.apache.org/jira/browse/HDFS-9924?focusedCommentId=15310837&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15310837]:
>  add more metrics and WARN/DEBUG logs for long FSD/FSN locking operations on 
> the namenode, similar to what we have for slow write/network WARN/metrics on 
> the datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184

2016-08-24 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-10793:
---
Assignee: Manoj Govindassamy

> Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
> --
>
> Key: HDFS-10793
> URL: https://issues.apache.org/jira/browse/HDFS-10793
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Manoj Govindassamy
>Priority: Blocker
>
> HDFS-9184 added a new parameter to an existing method signature in 
> HdfsAuditLogger, which is a Public/Evolving class. This breaks binary 
> compatibility with implementing subclasses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184

2016-08-24 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-10793:
--

 Summary: Fix HdfsAuditLogger binary incompatibility introduced by 
HDFS-9184
 Key: HDFS-10793
 URL: https://issues.apache.org/jira/browse/HDFS-10793
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Andrew Wang
Priority: Blocker


HDFS-9184 added a new parameter to an existing method signature in 
HdfsAuditLogger, which is a Public/Evolving class. This breaks binary 
compatibility with implementing subclasses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10652) Add a unit test for HDFS-4660

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435950#comment-15435950
 ] 

Hadoop QA commented on HDFS-10652:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 65m  
3s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825348/HDFS-10652.006.patch |
| JIRA Issue | HDFS-10652 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux c3aa9c71f181 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a1f3293 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16531/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16531/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add a unit test for HDFS-4660
> -
>
> Key: HDFS-10652
> URL: https://issues.apache.org/jira/browse/HDFS-10652
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Yongjun Zhang
>Assignee: Vinayakumar B
> Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch, 
> HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch, 
> HDFS-10652.006.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-10475) Adding metrics and warn/debug logs for long FSD lock

2016-08-24 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435932#comment-15435932
 ] 

Zhe Zhang commented on HDFS-10475:
--

[~xyao] Does this proposal partially overlap with HDFS-9145? Thanks.

> Adding metrics and warn/debug logs for long FSD lock
> 
>
> Key: HDFS-10475
> URL: https://issues.apache.org/jira/browse/HDFS-10475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> This is a follow up of the comment on HADOOP-12916 and 
> [here|https://issues.apache.org/jira/browse/HDFS-9924?focusedCommentId=15310837=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15310837]
>  add more metrics and WARN/DEBUG logs for long FSD/FSN locking operations on 
> namenode similar to what we have for slow write/network WARN/metrics on 
> datanode.
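
For concreteness, the kind of instrumentation being proposed might look roughly 
like the sketch below; the names, threshold, and placement are illustrative 
assumptions rather than the actual patch:

{code}
// Hypothetical sketch: warn when a namesystem/directory lock is held too long.
fsLock.writeLock().lock();
long start = Time.monotonicNow();
try {
  // ... namespace operation performed under the lock ...
} finally {
  long heldMs = Time.monotonicNow() - start;
  fsLock.writeLock().unlock();
  if (heldMs > LOCK_WARN_THRESHOLD_MS) {  // threshold is an assumed, configurable value
    LOG.warn("FSNamesystem write lock held for " + heldMs + " ms");
  }
}
{code}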



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9205) Do not schedule corrupt blocks for replication

2016-08-24 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435895#comment-15435895
 ] 

Ravi Prakash commented on HDFS-9205:


Thanks for the change Nicholas! Should this line be modified? 
https://github.com/apache/hadoop/blob/a1f3293762dddb0ca953d1145f5b53d9086b25b8/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java#L62
 .

I think most often this queue had missing blocks, so it didn't really make 
sense to re-replicate missing blocks anyway. We should be careful about 
removing this queue though, because it's where the [count of missing blocks is 
taken 
from|https://github.com/apache/hadoop/blob/a1f3293762dddb0ca953d1145f5b53d9086b25b8/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4112].

> Do not schedule corrupt blocks for replication
> --
>
> Key: HDFS-9205
> URL: https://issues.apache.org/jira/browse/HDFS-9205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: h9205_20151007.patch, h9205_20151007b.patch, 
> h9205_20151008.patch, h9205_20151009.patch, h9205_20151009b.patch, 
> h9205_20151013.patch, h9205_20151015.patch
>
>
> Corrupted blocks are, by definition, blocks that cannot be read. As a consequence, 
> they cannot be replicated.  In UnderReplicatedBlocks, there is a queue for 
> QUEUE_WITH_CORRUPT_BLOCKS and chooseUnderReplicatedBlocks may choose blocks 
> from it.  It seems that scheduling corrupted blocks for replication wastes 
> resources and potentially slows down replication for the higher priority blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10792) RedundantEditLogInputStream should log caught exceptions

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10792:
---
Status: Patch Available  (was: Open)

> RedundantEditLogInputStream should log caught exceptions
> 
>
> Key: HDFS-10792
> URL: https://issues.apache.org/jira/browse/HDFS-10792
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10792.01.patch
>
>
> There are a few places in {{RedundantEditLogInputStream}} where an 
> IOException is caught but never logged. We should improve the logging of 
> these exceptions to help debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10792) RedundantEditLogInputStream should log caught exceptions

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10792:
---
Attachment: HDFS-10792.01.patch

v01: adds logging for the exceptions caught in the two places that need it.
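
For reference, the change amounts to logging the caught exception instead of 
swallowing it. A minimal sketch only; field and method names are assumptions and 
the actual patch may differ:

{code}
try {
  return streams[curIdx].readOp();
} catch (IOException e) {
  // Previously the exception was swallowed here; log it before failing over to
  // the next stream so the root cause is visible in the NameNode log.
  LOG.warn("Failed to read op from edit log stream " + streams[curIdx].getName()
      + ", will try the next stream", e);
  // ... fall through to the next stream ...
}
{code}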

> RedundantEditLogInputStream should log caught exceptions
> 
>
> Key: HDFS-10792
> URL: https://issues.apache.org/jira/browse/HDFS-10792
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10792.01.patch
>
>
> There are a few places in {{RedundantEditLogInputStream}} where an 
> IOException is caught but never logged. We should improve the logging of 
> these exceptions to help debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10792) RedundantEditLogInputStream should log caught exceptions

2016-08-24 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-10792:
--

 Summary: RedundantEditLogInputStream should log caught exceptions
 Key: HDFS-10792
 URL: https://issues.apache.org/jira/browse/HDFS-10792
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


There are a few places in {{RedundantEditLogInputStream}} where an IOException 
is caught but never logged. We should improve the logging of these exceptions 
to help debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-4660) Block corruption can happen during pipeline recovery

2016-08-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435861#comment-15435861
 ] 

Yongjun Zhang edited comment on HDFS-4660 at 8/24/16 10:42 PM:
---

Hi [~nroberts],

Thanks for your earlier work here. Would you please explain how you did the 
first step

"Modify teragen to hflush() every 1 records"

in

https://issues.apache.org/jira/browse/HDFS-4660?focusedCommentId=14542862=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14542862

Thanks much.




was (Author: yzhangal):
Hi [~nroberts],

Thanks for your earlier work here. Would you please explain how you did the 
first step

"Modify teragen to hflush() every 1 records"

in

"
https://issues.apache.org/jira/browse/HDFS-4660?focusedCommentId=14542862=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14542862

Thanks much.



> Block corruption can happen during pipeline recovery
> 
>
> Key: HDFS-4660
> URL: https://issues.apache.org/jira/browse/HDFS-4660
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.3-alpha, 3.0.0-alpha1
>Reporter: Peng Zhang
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 2.7.1, 2.6.4
>
> Attachments: HDFS-4660.br26.patch, HDFS-4660.patch, HDFS-4660.patch, 
> HDFS-4660.v2.patch
>
>
> pipeline DN1  DN2  DN3
> stop DN2
> pipeline added node DN4 located at 2nd position
> DN1  DN4  DN3
> recover RBW
> DN4 after recover rbw
> 2013-04-01 21:02:31,570 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover 
> RBW replica 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
> 2013-04-01 21:02:31,570 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
>   getNumBytes() = 134144
>   getBytesOnDisk() = 134144
>   getVisibleLength()= 134144
> end at chunk (134144/512=262)
> DN3 after recover rbw
> 2013-04-01 21:02:31,575 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover 
> RBW replica 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_10042013-04-01
>  21:02:31,575 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
>   getNumBytes() = 134028 
>   getBytesOnDisk() = 134028
>   getVisibleLength()= 134028
> client send packet after recover pipeline
> offset=133632  len=1008
> DN4 after flush 
> 2013-04-01 21:02:31,779 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file 
> offset:134640; meta offset:1063
> // meta end position should be floor(134640/512)*4 + 7 == 1059, but now it is 
> 1063.
> DN3 after flush
> 2013-04-01 21:02:31,782 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, 
> type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, 
> lastPacketInBlock=false, offsetInBlock=134640, 
> ackEnqueueNanoTime=8817026136871545)
> 2013-04-01 21:02:31,782 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing 
> meta file offset of block 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from 
> 1055 to 1051
> 2013-04-01 21:02:31,782 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file 
> offset:134640; meta offset:1059
> After checking meta on DN4, I found checksum of chunk 262 is duplicated, but 
> data not.
> Later after block was finalized, DN4's scanner detected bad block, and then 
> reported it to NM. NM send a command to delete this block, and replicate this 
> block from other DN in pipeline to satisfy duplication num.
> I think this is because BlockReceiver skips data bytes already written, 
> but does not skip checksum bytes already written. And the function 
> adjustCrcFilePosition is only used for the last non-completed chunk, but
> not for this situation.
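
For readers following the offsets in the quoted log, the expected meta-file 
position of 1059 can be reproduced from the usual HDFS defaults (512 bytes per 
checksum chunk, a 4-byte CRC32 per chunk, and a 7-byte meta-file header); these 
constants are assumptions, not values taken from the report:

{code}
long blockOffset = 134640;       // bytes flushed to the block file
int bytesPerChecksum = 512;      // assumed default chunk size
int checksumSize = 4;            // CRC32 per chunk
int metaHeaderSize = 7;          // assumed meta file header size
long chunks = (blockOffset + bytesPerChecksum - 1) / bytesPerChecksum; // 263 = 262 full + 1 partial
long expectedMetaOffset = metaHeaderSize + chunks * checksumSize;      // 7 + 263 * 4 = 1059
// The observed offset of 1063 is exactly one extra 4-byte checksum,
// consistent with the duplicated checksum for chunk 262 described above.
{code}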



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-4660) Block corruption can happen during pipeline recovery

2016-08-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435861#comment-15435861
 ] 

Yongjun Zhang commented on HDFS-4660:
-

Hi [~nroberts],

Thanks for your earlier work here. Would you please explain how you did the 
first step

"Modify teragen to hflush() every 1 records"

in

"
https://issues.apache.org/jira/browse/HDFS-4660?focusedCommentId=14542862=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14542862

Thanks much.



> Block corruption can happen during pipeline recovery
> 
>
> Key: HDFS-4660
> URL: https://issues.apache.org/jira/browse/HDFS-4660
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.3-alpha, 3.0.0-alpha1
>Reporter: Peng Zhang
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 2.7.1, 2.6.4
>
> Attachments: HDFS-4660.br26.patch, HDFS-4660.patch, HDFS-4660.patch, 
> HDFS-4660.v2.patch
>
>
> pipeline DN1  DN2  DN3
> stop DN2
> pipeline added node DN4 located at 2nd position
> DN1  DN4  DN3
> recover RBW
> DN4 after recover rbw
> 2013-04-01 21:02:31,570 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover 
> RBW replica 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
> 2013-04-01 21:02:31,570 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
>   getNumBytes() = 134144
>   getBytesOnDisk() = 134144
>   getVisibleLength()= 134144
> end at chunk (134144/512=262)
> DN3 after recover rbw
> 2013-04-01 21:02:31,575 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover 
> RBW replica 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_10042013-04-01
>  21:02:31,575 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
>   getNumBytes() = 134028 
>   getBytesOnDisk() = 134028
>   getVisibleLength()= 134028
> client send packet after recover pipeline
> offset=133632  len=1008
> DN4 after flush 
> 2013-04-01 21:02:31,779 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file 
> offset:134640; meta offset:1063
> // meta end position should be floor(134640/512)*4 + 7 == 1059, but now it is 
> 1063.
> DN3 after flush
> 2013-04-01 21:02:31,782 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, 
> type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, 
> lastPacketInBlock=false, offsetInBlock=134640, 
> ackEnqueueNanoTime=8817026136871545)
> 2013-04-01 21:02:31,782 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing 
> meta file offset of block 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from 
> 1055 to 1051
> 2013-04-01 21:02:31,782 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file 
> offset:134640; meta offset:1059
> After checking meta on DN4, I found checksum of chunk 262 is duplicated, but 
> data not.
> Later after block was finalized, DN4's scanner detected bad block, and then 
> reported it to NM. NM send a command to delete this block, and replicate this 
> block from other DN in pipeline to satisfy duplication num.
> I think this is because BlockReceiver skips data bytes already written, 
> but does not skip checksum bytes already written. And the function 
> adjustCrcFilePosition is only used for the last non-completed chunk, but
> not for this situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10788) fsck NullPointerException when it encounters corrupt replicas

2016-08-24 Thread Jeff Field (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435855#comment-15435855
 ] 

Jeff Field commented on HDFS-10788:
---

Perfect, thanks, I'll discuss a backport with my vendor/plan an upgrade.

> fsck NullPointerException when it encounters corrupt replicas
> -
>
> Key: HDFS-10788
> URL: https://issues.apache.org/jira/browse/HDFS-10788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CDH5.5.2, CentOS 6.7
>Reporter: Jeff Field
>
> Somehow (I haven't found root cause yet) we ended up with blocks that have 
> corrupt replicas where the replica count is inconsistent between the blockmap 
> and the corrupt replicas map. If we try to hdfs fsck any parent directory 
> that has a child with one of these blocks, fsck will exit with something like 
> this:
> {code}
> $ hdfs fsck /path/to/parent/dir/ | egrep -v '^\.+$'
> Connecting to namenode via http://mynamenode:50070
> FSCK started by bot-hadoop (auth:KERBEROS_SSL) from /10.97.132.43 for path 
> /path/to/parent/dir/ at Tue Aug 23 20:34:58 UTC 2016
> .FSCK 
> ended at Tue Aug 23 20:34:59 UTC 2016 in 1098 milliseconds
> null
> Fsck on path '/path/to/parent/dir/' FAILED
> {code}
> So I start at the top, fscking every subdirectory until I find one or more 
> that fails. Then I do the same thing with those directories (our top level 
> directories all have subdirectories with date directories in them, which then 
> contain the files) and once I find a directory with files in it, I run a 
> checksum of the files in that directory. When I do that, I don't get the name 
> of the file, rather I get:
> checksum: java.lang.NullPointerException
> but since the files are in order, I can figure it out by seeing which file 
> was before the NPE. Once I get to this point, I can see the following in the 
> namenode log when I try to checksum the corrupt file:
> 2016-08-23 20:24:59,627 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent 
> number of corrupt replicas for blk_1335893388_1100036319546 blockMap has 0 
> but corrupt replicas map has 1
> 2016-08-23 20:24:59,627 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 23 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 
> 192.168.1.100:47785 Call#1 Retry#0
> java.lang.NullPointerException
> At which point I can delete the file, but it is a very tedious process.
> Ideally, shouldn't fsck be able to emit the name of the file that is the 
> source of the problem - and (if -delete is specified) get rid of the file, 
> instead of exiting without saying why?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10791) Delete block meta file when the block file is missing

2016-08-24 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435843#comment-15435843
 ] 

Tsz Wo Nicholas Sze commented on HDFS-10791:


A related comment: Currently, BlockPoolSlice.addToReplicasMap(..) ignores files 
other than block files.  We should log a warning if an unidentified file is found.
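
A minimal sketch of the suggested warning; the helper and field names are 
illustrative, not the actual BlockPoolSlice code:

{code}
// While scanning the block-pool directory, skip files we don't recognize, but
// leave a trace in the log instead of ignoring them silently.
if (!Block.isBlockFilename(file) && !isMetaFile(file)) {  // isMetaFile is a hypothetical helper
  LOG.warn("Unidentified file " + file.getAbsolutePath()
      + " found while adding replicas to the replica map; ignoring it.");
  continue;
}
{code}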

> Delete block meta file when the block file is missing
> -
>
> Key: HDFS-10791
> URL: https://issues.apache.org/jira/browse/HDFS-10791
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>
> When the block file is missing, the block meta file should be deleted if it 
> exists.
> Note that such a situation is possible since the meta file is closed before the 
> block file; the datanode could be killed in between.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10791) Delete block meta file when the block file is missing

2016-08-24 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-10791:
--

 Summary: Delete block meta file when the block file is missing
 Key: HDFS-10791
 URL: https://issues.apache.org/jira/browse/HDFS-10791
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Tsz Wo Nicholas Sze


When the block file is missing, the block meta file should be deleted if it 
exists.

Note that such a situation is possible since the meta file is closed before the 
block file; the datanode could be killed in between.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.

2016-08-24 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435817#comment-15435817
 ] 

Bob Hansen commented on HDFS-10754:
---

Looks like there was a compilation failure in tools: could not find 
hdfs_find.cpp.

> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, 
> hdfs_chown, hdfs_chmod and hdfs_find.
> ---
>
> Key: HDFS-10754
> URL: https://issues.apache.org/jira/browse/HDFS-10754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
> Attachments: HDFS-10754.HDFS-8707.000.patch, 
> HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch, 
> HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch, 
> HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, 
> HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch, 
> HDFS-10754.HDFS-8707.009.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.

2016-08-24 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435809#comment-15435809
 ] 

Bob Hansen edited comment on HDFS-10754 at 8/24/16 10:10 PM:
-

Thanks for your hard work, [~anatoli.shein].  I'm sorry to have to keep 
dragging you back, but...

The new recursive methods should not use future/promises internally.  That 
blocks one of the asio threads waiting for more data; if a consumer tried to do 
one of these with a single thread in the threadpool, it would deadlock waiting 
for the subtasks to complete, but they'd all get queued up behind the initial 
handler.

Instead, whenever they're all done (request_count == 0), the last one out the 
door (the handler that dropped the request_count to 0) should call into the 
consumer's handler directly with the final status.  If any of the other threads 
has received an error, all of the subsequent deliveries to the handler should 
return false, telling find "I'm going to report an error anyway, so don't 
bother recursing any more."  It's good to wait until request_count==0, even in 
an error state, so the consumer doesn't have any lame duck requests queued up 
to take care of?

Also because all of this is asynchronous, you can't allocate the lock and state 
variables on the stack.  When the consumer calls SetOwner('/', true, handler), 
the function is going to return as soon as the find operation is kicked off, 
destroying all of the elements on the stack.  We'll need to create a little 
struct for SetOwner that is maintained with a shared_ptr and cleaned up when 
the last request is done.



Minor points:
In find, perhaps recursion_counter is a bit of a misnomer at this point.  It's 
more outstanding_requests, since for big directories, we'll have more requests 
without recursing.

Perhaps FindOperationState is a better name than CurrentState, and 
SharedFindState is better than just SharedState (since we might have many 
shared states in the FileSystem class).

In CurrentState, perhaps "depth" is more accurate than position?

Do we support a globbing find without recursion?  Can I find "/dir?/path*/" 
"\*.db", and not have it recurse to the sub-directories of path*?

Can we push the shims and state into the .cpp file and keep them out of the 
interface (even if private)?

We're very close, now.



was (Author: bobhansen):
Thanks for your hard work, [~anatoli.shein].  I'm sorry to have to keep 
dragging you back, but...

The new recursive methods should not use future/promises internally.  That 
blocks one of the asio threads waiting for more data; if a consumer tried to do 
one of these with a single thread in the threadpool, it would deadlock waiting 
for the subtasks to complete, but they'd all get queued up behind the initial 
handler.

Instead, whenever they're all done (request_count == 0), the last one out the 
door (the handler that dropped the request_count to 0) should call into the 
consumer's handler directly with the final status.  If any of the other threads 
has received an error, all of the subsequent deliveries to the handler should 
return false, telling find "I'm going to report an error anyway, so don't 
bother recursing any more."  It's good to wait until request_count==0, even in 
an error state, so the consumer doesn't have any lame duck requests queued up 
to take care of?

Also because all of this is asynchronous, you can't allocate the lock and state 
variables on the stack.  When the consumer calls SetOwner('/', true, handler), 
the function is going to return as soon as the find operation is kicked off, 
destroying all of the elements on the stack.  We'll need to create a little 
struct for SetOwner that is maintained with a shared_ptr and cleaned up when 
the last request is done.



Minor points:
In find, perhaps recursion_counter is a bit of a misnomer at this point.  It's 
more outstanding_requests, since for big directories, we'll have more requests 
without recursing.

Perhaps FindOperationState is a better name than CurrentState, and 
SharedFindState is better than just SharedState (since we might have many 
shared states in the FileSystem class).

In CurrentState, perhaps "depth" is more accurate than position?

Do we support a globbing find without recursion?  Can I find "/dir?/path*/" 
"*.db", and not have it recurse to the sub-directories of path*?

Can we push the shims and state into the .cpp file and keep them out of the 
interface (even if private)?

We're very close, now.


> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, 
> hdfs_chown, hdfs_chmod and hdfs_find.
> ---
>
> Key: HDFS-10754
> URL: https://issues.apache.org/jira/browse/HDFS-10754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  

[jira] [Commented] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.

2016-08-24 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435809#comment-15435809
 ] 

Bob Hansen commented on HDFS-10754:
---

Thanks for your hard work, [~anatoli.shein].  I'm sorry to have to keep 
dragging you back, but...

The new recursive methods should not use future/promises internally.  That 
blocks one of the asio threads waiting for more data; if a consumer tried to do 
one of these with a single thread in the threadpool, it would deadlock waiting 
for the subtasks to complete, but they'd all get queued up behind the initial 
handler.

Instead, whenever they're all done (request_count == 0), the last one out the 
door (the handler that dropped the request_count to 0) should call into the 
consumer's handler directly with the final status.  If any of the other threads 
has received an error, all of the subsequent deliveries to the handler should 
return false, telling find "I'm going to report an error anyway, so don't 
bother recursing any more."  It's good to wait until request_count==0, even in 
an error state, so the consumer doesn't have any lame duck requests queued up 
to take care of?

Also because all of this is asynchronous, you can't allocate the lock and state 
variables on the stack.  When the consumer calls SetOwner('/', true, handler), 
the function is going to return as soon as the find operation is kicked off, 
destroying all of the elements on the stack.  We'll need to create a little 
struct for SetOwner that is maintained with a shared_ptr and cleaned up when 
the last request is done.



Minor points:
In find, perhaps recursion_counter is a bit of a misnomer at this point.  It's 
more outstanding_requests, since for big directories, we'll have more requests 
without recursing.

Perhaps FindOperationState is a better name than CurrentState, and 
SharedFindState is better than just SharedState (since we might have many 
shared states in the FileSystem class).

In CurrentState, perhaps "depth" is more accurate than position?

Do we support a globbing find without recursion?  Can I find "/dir?/path*/" 
"*.db", and not have it recurse to the sub-directories of path*?

Can we push the shims and state into the .cpp file and keep them out of the 
interface (even if private)?

We're very close, now.


> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, 
> hdfs_chown, hdfs_chmod and hdfs_find.
> ---
>
> Key: HDFS-10754
> URL: https://issues.apache.org/jira/browse/HDFS-10754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
> Attachments: HDFS-10754.HDFS-8707.000.patch, 
> HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch, 
> HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch, 
> HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, 
> HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch, 
> HDFS-10754.HDFS-8707.009.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10768) Optimize mkdir ops

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435812#comment-15435812
 ] 

Hadoop QA commented on HDFS-10768:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 58 unchanged - 1 fixed = 59 total (was 59) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  
1s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 33s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Invocation of toString on localName in 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.addFile(FSDirectory, 
INodesInPath, byte[], PermissionStatus, short, long, String, String)  At 
FSDirWriteFileOp.java:in 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.addFile(FSDirectory, 
INodesInPath, byte[], PermissionStatus, short, long, String, String)  At 
FSDirWriteFileOp.java:[line 598] |
| Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824008/HDFS-10768.patch |
| JIRA Issue | HDFS-10768 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux b239842e71fc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a1f3293 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16529/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16529/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
 |

[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-24 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435803#comment-15435803
 ] 

Lei (Eddy) Xu commented on HDFS-9668:
-

Hi, [~jingcheng...@intel.com] Thanks a lot for the patch.  It looks nice 
overall.

* {{AutoCloseableLock acquireDatasetLock(boolean readLock);}}.  Would it be 
clearer to split it into two methods, {{acquireReadLock()}} and 
{{acquireWriteLock()}}? From the caller's perspective, it makes the code 
self-explanatory. 
* In {{FsDatasetImpl#getStoredBlock()}}. Could you explain what 
{{blockOpLock}} protects?  IMO, {{datasetReadLock}} does not need to protect 
{{findMetadataFile()}} and {{parseGenerationStamp()}}. What if we do the 
following:

{code}
File blockFile = null;
try (AutoCloseableLock lock = datasetReadLock.acquire()) {
  synchronized (getBlockOpLock(blkid)) {
    blockFile = getFile(bpid, blkid, false);
  }
}
if (blockFile == null) {
  return null;
}
final File metaFile = ...
{code}

Similarly, in {{getTmpInputStreams}}, the {{datasetReadLock}} and 
{{blockOpLock}} should only protect {{getReplicaInfo()}}, instead of the several 
{{openAndSeek()}} calls.  
Btw, {{FsVolumeReference}} is {{AutoCloseable}} and can be used in 
{{try-with-resources}} as well.

* In {{private FsDatasetImpl#append()}}, you need the write lock to run 
{code}
volumeMap.add(bpid, newReplicaInfo);  // FsDatasetImpl.java, line 1311
{code}
Also, you might want to add a comment for {{append()}} that the caller must 
hold {{blockOpLock}}.

* Similarly, we do not need read locks in {{recoverAppend()}} and 
{{recoverClose()}} after calling {{recoverCheck()}}.

In summary, in your write-heavy workloads, the write requests need to acquire 
{{datasetWriteLock}} to update {{volumeMap}}. As this patch uses fair 
read/write locks, the duration of the {{readLock}} should be as short as possible 
to allow write locks to be acquired more frequently. On the other hand, since 
the changes on {{block / blockFile}} can be protected by {{blockOpLock}}, it 
seems to me that there is no need to hold the dataset (read/write) locks when 
manipulating the blocks (e.g., bumping the genstamp). What do you think, 
[~jingcheng...@intel.com]?
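
To make the suggestion concrete, a rough sketch of how the split accessors would 
read at a call site (the method names follow the suggestion above; the rest is 
assumed):

{code}
// Short, read-only access: the fair read lock is held only while consulting volumeMap.
try (AutoCloseableLock l = dataset.acquireReadLock()) {
  replicaInfo = volumeMap.get(bpid, blkid);
}

// Mutations take the write lock, again held as briefly as possible.
try (AutoCloseableLock l = dataset.acquireWriteLock()) {
  volumeMap.add(bpid, newReplicaInfo);
}
{code}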









> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> 

[jira] [Updated] (HDFS-10652) Add a unit test for HDFS-4660

2016-08-24 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-10652:
-
Attachment: HDFS-10652.006.patch

Hi [~vinayrpet],

Sorry for the delay. I finally got to look at this case again. 

I found that it's not that we are shutting down the wrong DNs. It's that, from 
time to time, the block report of the new DN has not made it to the NN by the 
time we do the read.  So I introduced some new code to trigger a block report 
instead of
{code}
DFSTestUtil.waitForReplication(fs, fileName, (short)3, 2000);
{code}
which doesn't seem to guarantee that the block report is received by the NN.
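
For illustration only, one way to force the reports in a MiniDFSCluster-based 
test is sketched below; this is an assumption about the approach, not the actual 
patch:

{code}
// Ask every DataNode to send a block report immediately, so the NameNode is
// guaranteed to know about the replica on the newly added DN before the read.
cluster.triggerBlockReports();
{code}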

With this change, the test fails consistently with a checksum error without the 
fix of HDFS-4660/HDFS-9220, and passes consistently with the fixes.

Would you please take a look?

Thanks.


> Add a unit test for HDFS-4660
> -
>
> Key: HDFS-10652
> URL: https://issues.apache.org/jira/browse/HDFS-10652
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Yongjun Zhang
>Assignee: Vinayakumar B
> Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch, 
> HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch, 
> HDFS-10652.006.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435765#comment-15435765
 ] 

Hadoop QA commented on HDFS-10754:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
52s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
50s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
48s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m  
9s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
11s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
8s{color} | {color:red} hadoop-hdfs-native-client in the patch failed with JDK 
v1.8.0_101. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m  8s{color} | 
{color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.8.0_101. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m  8s{color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK 
v1.8.0_101. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
10s{color} | {color:red} hadoop-hdfs-native-client in the patch failed with JDK 
v1.7.0_111. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 10s{color} | 
{color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.7.0_111. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 10s{color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK 
v1.7.0_111. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m  
7s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m  
9s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 10s{color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK 
v1.7.0_111. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0cf5e66 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825336/HDFS-10754.HDFS-8707.009.patch
 |
| JIRA Issue | HDFS-10754 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  javadoc  
mvninstall  |
| uname | Linux 0537edd0cd98 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / 4a74bc4 |
| Default Java | 1.7.0_111 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_101 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111 |
| 

[jira] [Created] (HDFS-10790) libhdfs++: split recursive versions of SetPermission and SetOwner to SetAllPermissions and SetAllOwner

2016-08-24 Thread Bob Hansen (JIRA)
Bob Hansen created HDFS-10790:
-

 Summary: libhdfs++: split recursive versions of SetPermission and 
SetOwner to SetAllPermissions and SetAllOwner
 Key: HDFS-10790
 URL: https://issues.apache.org/jira/browse/HDFS-10790
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Bob Hansen


We currently have a flag that we pass in to SetPermission and SetOwner that 
changes the semantics of the call.  We should split it into two functions: one 
that does an efficient, direct version, and another that does globbing and, 
optionally, recursion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10679) libhdfs++: Implement parallel find with wildcards tool

2016-08-24 Thread Anatoli Shein (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435692#comment-15435692
 ] 

Anatoli Shein edited comment on HDFS-10679 at 8/24/16 9:23 PM:
---

Thanks for the review, [~bobhansen].

I will close this jira now since I am moving the changes from here to 
HDFS-10754.

I have addressed your comments as follows:

FS::Find:
* Async - Callback deliver results with a const std::vector &, not a 
shared_ptr. This is a signal to the consumer to use the data delivered during 
the callback, but don't use the passed-in container.
(/) Done, also fixed GetListing Async.
* Likewise, the synchronous call should take a non-const std::vector 
* for an output parameter, signaling to the consumer that we are going to 
mutate their input vector
(/) Done, also fixed GetListing Sync.
* We need a very clear threading model. Will the handler be called concurrently 
from multiple threads (currently, yes. If we ever get on asio fibers, we should 
make it a no, because we love our consumers)
(i) I agree. We might need to make a jira for that.
* We're doing a lot of dynamic memory allocation during recursion. Could we 
restructure things a little to not copy the entirety of the FindState and 
RecursionState on each call? It appears that they each have one element that is 
being updated for each recursive call
(/) I separated the state into CurrentState and SharedState. SharedState is 
never copied now.
* We need to hold the lock while incrementing the recursion_counter also
(i) recursion_counter is atomic and, in our case, increments are never paired 
with read accesses, so they do not need locking.
* If the handler returns false (don't want more) at the end of the function, do 
we do anything to prevent more from being delivered? Should we push that into 
the shared find_state and bail out for any subsequent NN responses?
(/) I added a variable "aborted" that stops recursion when the user does not 
want any more.

find.cpp:
* Like the cat examples, simplify as much as possible. Nuke URI parsing, etc.
(/) Done.
* Expand smth_found to something_found to prevent confusion (especially in an 
example)
(/) Done.
* We have race conditions if one thread is outputting the previous block while 
another thread gets a final block (or error).
(/) Fixed by locking the handler.
FS::GetFileInfo should populate the full_path member also
(/) Done.


was (Author: anatoli.shein):
Thanks for the review, [~bobhansen].

I will close this jira now since I am moving the changes from here to 
HDFS-10754.

I have addressed your comments as follows:

FS::Find:
* Async - Callback deliver results with a const std::vector &, not a 
shared_ptr. This is a signal to the consumer to use the data delivered during 
the callback, but don't use the passed-in container.
(/) Done, also fixed GetListing Async.
* Likewise, the synchronous call should take a non-const std::vector 
* for an output parameter, signaling to the consumer that we are going to 
mutate their input vector
(/) Done, also fixed GetListing Sync.
* We need a very clear threading model. Will the handler be called concurrently 
from multiple threads (currently, yes. If we ever get on asio fibers, we should 
make it a no, because we love our consumers)
(i) I agree. We might need to make a jira for that.
* We're doing a lot of dynamic memory allocation during recursion. Could we 
restructure things a little to not copy the entirety of the FindState and 
RecursionState on each call? It appears that they each have one element that is 
being updated for each recursive call
(/) I separated the state into CurrentState and SharedState. SharedState is 
never copied now.
* We need to hold the lock while incrementing the recursion_counter also
(i) recursion_counter is atomic and, in our case, increments are never paired 
with read accesses, so they do not need locking.
* If the handler returns false (don't want more) at the end of the function, do 
we do anything to prevent more from being delivered? Should we push that into 
the shared find_state and bail out for any subsequent NN responses?
(/) I added a variable "aborted" that stops recursion when the user does not 
want any more.
find.cpp:
* Like the cat examples, simplify as much as possible. Nuke URI parsing, etc.
(/) Done.
* Expand smth_found to something_found to prevent confusion (especially in an 
example)
(/) Done.
* We have race conditions if one thread is outputting the previous block while 
another thread gets a final block (or error).
(/) Fixed by locking the handler.
FS::GetFileInfo should populate the full_path member also
(/) Done.

> libhdfs++: Implement parallel find with wildcards tool
> --
>
> Key: HDFS-10679
> URL: https://issues.apache.org/jira/browse/HDFS-10679
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>

[jira] [Comment Edited] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.

2016-08-24 Thread Anatoli Shein (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435697#comment-15435697
 ] 

Anatoli Shein edited comment on HDFS-10754 at 8/24/16 9:22 PM:
---

Thanks for the review, [~bobhansen].

I have addressed your comments as follows:

cat:
* Why are we including "google/protobuf/stubs/common.h"?
(i) Because we need it to call ShutdownProtobufLibrary() (clean up for 
protobuf).
* Comment says "wrapping fs in unique_ptr", but then we wrap it in a shared_ptr.
(/) Fixed.
* We shouldn't check the port on defaultFS before reporting error. The 
defaultFS might point to an HA name. Just check that defaultFS isn't empty, and 
report it if it is.
(/) Done.
* For simplicity, cat should just use file->Read rather than PositionRead
(/) Done.

gendirs:
* As an example, gendirs should definitely not need to use 
fs/namenode_operations
(/) Fixed by adding hardcoded permissions instead of pulling the default ones 
from the namenode_ops.
* It should not need to use common/* either; we should push the configuration 
stuff into hdfspp/ (but perhaps that can be a different bug)
(i) Created a new jira for this: HDFS-10787
* We generally don't make values on the stack const (e.g. path, depth, fanout). 
It's not wrong, just generally redundant (unless it's important they be const 
for some reason)
(/) Fixed.
* Put a comment on setting the timeout referring to HDFS-10781 (and vice-versa)
(/) Comments added.
configuration_loader.cc:
* Add a comment on why we're calling setDefaultSearchPath in the ctor
(/) Added.

configuration_loader.h:
* I might make the comment something along the lines of "Creates a 
configuration loader with the default search path (). If you want to 
explicitly set the entire search path, call ClearSearchPath() first"
(/) Fixed.
* Filesystem.cc: can SetOwner/SetPermission be written as a call to 
::Find(recursive=true) with the SetOwner/SetPermission implemented in the 
callback? Then we wouldn't need three separate implementations of the recursion 
logic
(/) Done. Recursive SetOwner and SetPermission now just use Find results to 
launch their own asynchronous calls (a sketch of this fan-out pattern follows 
this comment). The code between them is similar though, and might need to be 
collapsed even further in the future.
* Does recursive SetOwner/SetPermissions accept globs both for the recursive 
and non-recursive versions? We should be consistent. Perhaps SetOwner(explicit 
filename, fast) and SetOwner(glob/recursive, slower) should be different methods
(i) Since SetOwner and SetPermission now use Find to recurse through the 
results, we get wildcards for free. To activate them we will possibly need to 
split all the recursive functions in two.

Tools impl:
* Make a usage() function that prints out the usage of the tool. Call it if 
"--help", "-h", or a parameter problem occurs.
(/) Added a usage() function which is called when a parameter problem occurs or 
when the "-h" flag is passed.
(i) getopt does not allow long options, so I did not add a "--help" flag, just 
"-h".
* Keep gendirs in the examples. I don't think we need a tool version of it.
(/) hdfs_gendirs removed.
* Include a comment on HDFS-9539 to fix up the tools and examples as part of 
the scope.
(/) Done.
* hdfsNewBuilderFromDirectory (in hdfs.cc) should call ClearSearchPath rather 
than inheriting the default. Are there any other instances in our codebase 
where we're currently constructing loaders whose behavior we need to 
double-check?
(/) Done. Also fixed in files: configuration_test.cc, configuration_test.h, and 
hdfs_configuration_test.cc.
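
For readers following along, here is a minimal sketch of the protobuf teardown
pattern mentioned in the first answer above. Only the google::protobuf calls
are real library API from google/protobuf/stubs/common.h; RunTool() is a
hypothetical placeholder for the tool's actual work.
{code}
#include <google/protobuf/stubs/common.h>

#include <iostream>

// Hypothetical placeholder for whatever the tool actually does.
static int RunTool() {
  std::cout << "tool finished" << std::endl;
  return 0;
}

int main() {
  // Abort early if the generated headers and the linked protobuf runtime disagree.
  GOOGLE_PROTOBUF_VERIFY_VERSION;

  int rc = RunTool();

  // Free protobuf's internal global allocations so the tool exits cleanly
  // under leak checkers; this call is why common.h is included.
  google::protobuf::ShutdownProtobufLibrary();
  return rc;
}
{code}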
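
The recursive SetOwner/SetPermission answer above boils down to a fan-out
pattern: run Find once and launch an independent asynchronous call per result.
A rough, self-contained sketch under assumed names (Find and SetOwnerAsync are
illustrative stand-ins, not the real libhdfs++ signatures):
{code}
#include <functional>
#include <future>
#include <iostream>
#include <string>
#include <vector>

// Illustrative stand-in for FileSystem::Find: delivers matching paths in batches.
static void Find(const std::string& /*pattern*/,
                 const std::function<void(const std::vector<std::string>&, bool)>& on_batch) {
  on_batch({"/a/file1", "/a/file2"}, /*has_more=*/true);
  on_batch({"/a/b/file3"}, /*has_more=*/false);
}

// Illustrative stand-in for an asynchronous SetOwner call.
static std::future<void> SetOwnerAsync(std::string path, std::string owner) {
  return std::async(std::launch::async, [path, owner] {
    std::cout << "chown " << owner << " " << path << std::endl;
  });
}

int main() {
  std::vector<std::future<void>> in_flight;

  // No separate recursion logic: Find does the walking, the callback only
  // fans out one SetOwner per result. (In the real library the callback can
  // arrive on worker threads, so this container would need a lock.)
  Find("/a/**", [&in_flight](const std::vector<std::string>& batch, bool /*has_more*/) {
    for (const auto& path : batch) {
      in_flight.push_back(SetOwnerAsync(path, "new_owner"));
    }
  });

  // Wait for every launched SetOwner to finish before exiting.
  for (auto& f : in_flight) {
    f.get();
  }
  return 0;
}
{code}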



[jira] [Comment Edited] (HDFS-10788) fsck NullPointerException when it encounters corrupt replicas

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435658#comment-15435658
 ] 

Wei-Chiu Chuang edited comment on HDFS-10788 at 8/24/16 9:23 PM:
-

Thanks [~kshukla] for confirming my guess. I also traced the code and found 
that {{ClientProtocol.getBlockLocations}} indirectly calls 
{{BlockManager#createLocatedBlocks}}. CDH 5.5.2 went GA early this year, before 
HDFS-9985 was committed, so it does not have the HDFS-9985 fix.


was (Author: jojochuang):
Thanks [~kshukla] for confirming my guess. I also traced the code and found 
{{ClientProtocol.getBlockLocations}} indirectly calls 
{{BlockManager#createLocatedBlocks}}. CDH5.5.2 is GA before Hadoop 2.7.3 so it 
does not have the fix HDFS-9985.

> fsck NullPointerException when it encounters corrupt replicas
> -
>
> Key: HDFS-10788
> URL: https://issues.apache.org/jira/browse/HDFS-10788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CDH5.5.2, CentOS 6.7
>Reporter: Jeff Field
>
> Somehow (I haven't found root cause yet) we ended up with blocks that have 
> corrupt replicas where the replica count is inconsistent between the blockmap 
> and the corrupt replicas map. If we try to hdfs fsck any parent directory 
> that has a child with one of these blocks, fsck will exit with something like 
> this:
> {code}
> $ hdfs fsck /path/to/parent/dir/ | egrep -v '^\.+$'
> Connecting to namenode via http://mynamenode:50070
> FSCK started by bot-hadoop (auth:KERBEROS_SSL) from /10.97.132.43 for path 
> /path/to/parent/dir/ at Tue Aug 23 20:34:58 UTC 2016
> .FSCK 
> ended at Tue Aug 23 20:34:59 UTC 2016 in 1098 milliseconds
> null
> Fsck on path '/path/to/parent/dir/' FAILED
> {code}
> So I start at the top, fscking every subdirectory until I find one or more 
> that fails. Then I do the same thing with those directories (our top level 
> directories all have subdirectories with date directories in them, which then 
> contain the files) and once I find a directory with files in it, I run a 
> checksum of the files in that directory. When I do that, I don't get the name 
> of the file, rather I get:
> checksum: java.lang.NullPointerException
> but since the files are in order, I can figure it out by seeing which file 
> was before the NPE. Once I get to this point, I can see the following in the 
> namenode log when I try to checksum the corrupt file:
> 2016-08-23 20:24:59,627 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent 
> number of corrupt replicas for blk_1335893388_1100036319546 blockMap has 0 
> but corrupt replicas map has 1
> 2016-08-23 20:24:59,627 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 23 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 
> 192.168.1.100:47785 Call#1 Retry#0
> java.lang.NullPointerException
> At which point I can delete the file, but it is a very tedious process.
> Ideally, shouldn't fsck be able to emit the name of the file that is the 
> source of the problem - and (if -delete is specified) get rid of the file, 
> instead of exiting without saying why?






[jira] [Updated] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.

2016-08-24 Thread Anatoli Shein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Shein updated HDFS-10754:
-
Attachment: HDFS-10754.HDFS-8707.009.patch

Please review the new patch.

> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, 
> hdfs_chown, hdfs_chmod and hdfs_find.
> ---
>
> Key: HDFS-10754
> URL: https://issues.apache.org/jira/browse/HDFS-10754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
> Attachments: HDFS-10754.HDFS-8707.000.patch, 
> HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch, 
> HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch, 
> HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, 
> HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch, 
> HDFS-10754.HDFS-8707.009.patch
>
>







[jira] [Commented] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.

2016-08-24 Thread Anatoli Shein (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435697#comment-15435697
 ] 

Anatoli Shein commented on HDFS-10754:
--

Thanks for the review, [~bobhansen].

I have addressed your comments as follows:
cat:
* Why are we including "google/protobuf/stubs/common.h"?
(i) Because we need it to call ShutdownProtobufLibrary() (clean up for 
protobuf).
* Comment says "wrapping fs in unique_ptr", but then we wrap it in a shared_ptr.
(/) Fixed.
* We shouldn't check the port on defaultFS before reporting error. The 
defaultFS might point to an HA name. Just check that defaultFS isn't empty, and 
report it if it is.
(/) Done.
* For simplicity, cat should just use file->Read rather than PositionRead
(/) Done.
gendirs:
* As an example, gendirs should definitely not need to use 
fs/namenode_operations
(/) Fixed by adding hardcoded permissions instead of pulling the default ones 
from the namenode_ops.
* It should not need to use common/* either; we should push the configuration 
stuff into hdfspp/ (but perhaps that can be a different bug)
(i) Created a new jira for this: HDFS-10787
* We generally don't make values on the stack const (e.g. path, depth, fanout). 
It's not wrong, just generally redundant (unless it's important they be const 
for some reason)
(/) Fixed.
* Put a comment on setting the timeout referring to HDFS-10781 (and vice-versa)
(/) Comments added.
configuration_loader.cc:
* Add a comment on why we're calling setDefaultSearchPath in the ctor
(/) Added.
configuration_loader.h:
* I might make the comment something along the lines of "Creates a 
configuration loader with the default search path (). If you want to 
explicitly set the entire search path, call ClearSearchPath() first"
(/) Fixed.
* Filesystem.cc: can SetOwner/SetPermission be written as a call to 
::Find(recursive=true) with the SetOwner/SetPermission implemented in the 
callback? Then we wouldn't need three separate implementations of the recursion 
logic
(/) Done. Recursive SetOwner and SetPermission now just use Find results to 
launch their own asynchronous calls. The code between them is similar though, and 
might need to be collapsed even further in the future.
* Does recursive SetOwner/SetPermissions accept globs both for the recursive 
and non-recursive versions? We should be consistent. Perhaps SetOwner(explicit 
filename, fast) and SetOwner(glob/recursive, slower) should be different methods
(i) Since SetOwner and SetPermission now use Find to recurse through the 
results, we get wildcards for free. To activate them we will possibly need to 
split all the recursive functions in two.
Tools impl:
* Make a usage() function that prints out the usage of the tool. Call it if 
"--help", "-h", or a parameter problem occurs.
(/) Added a usage() function which is called when a parameter problem occurs or 
when the "-h" flag is passed; a getopt-based sketch of this pattern follows 
this comment.
(i) getopt does not allow long options, so I did not add a "--help" flag, just 
"-h".
* Keep gendirs in the examples. I don't think we need a tool version of it.
(/) hdfs_gendirs removed.
* Include a comment on HDFS-9539 to fix up the tools and examples as part of 
the scope.
(/) Done.
* hdfsNewBuilderFromDirectory (in hdfs.cc) should call ClearSearchPath rather 
than inheriting the default. Are there any other instances in our codebase 
where we're currently constructing loaders whose behavior we need to 
double-check?
(/) Done. Also fixed in files: configuration_test.cc, configuration_test.h, and 
hdfs_configuration_test.cc.
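
As referenced above, a minimal sketch of the usage()/getopt pattern. The option
letters and the tool name are illustrative, not the actual option sets of the
hdfs_* tools:
{code}
#include <unistd.h>

#include <cstdlib>
#include <iostream>

// Print how the tool is invoked; called for "-h" or for any parameter problem.
static void usage() {
  std::cerr << "usage: hdfs_tool [-h] [-R] <path>\n"
            << "  -h  print this help and exit\n"
            << "  -R  apply the operation recursively\n";
}

int main(int argc, char* argv[]) {
  bool recursive = false;

  // getopt(3) only understands short options, hence "-h" but no "--help".
  int opt;
  while ((opt = getopt(argc, argv, "hR")) != -1) {
    switch (opt) {
      case 'R':
        recursive = true;
        break;
      case 'h':
        usage();
        return EXIT_SUCCESS;
      default:  // unknown flag or missing option argument
        usage();
        return EXIT_FAILURE;
    }
  }
  if (optind >= argc) {  // the <path> operand is missing
    usage();
    return EXIT_FAILURE;
  }

  std::cout << "operating on " << argv[optind]
            << (recursive ? " (recursively)" : "") << std::endl;
  return EXIT_SUCCESS;
}
{code}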

> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, 
> hdfs_chown, hdfs_chmod and hdfs_find.
> ---
>
> Key: HDFS-10754
> URL: https://issues.apache.org/jira/browse/HDFS-10754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
> Attachments: HDFS-10754.HDFS-8707.000.patch, 
> HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch, 
> HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch, 
> HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, 
> HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch
>
>







[jira] [Commented] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.

2016-08-24 Thread Anatoli Shein (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435696#comment-15435696
 ] 

Anatoli Shein commented on HDFS-10754:
--

I am moving the find tool into this jira from HDFS-10679 now because it shares 
code with the rest of the tools.

> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, 
> hdfs_chown, hdfs_chmod and hdfs_find.
> ---
>
> Key: HDFS-10754
> URL: https://issues.apache.org/jira/browse/HDFS-10754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
> Attachments: HDFS-10754.HDFS-8707.000.patch, 
> HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch, 
> HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch, 
> HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, 
> HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch
>
>







[jira] [Updated] (HDFS-10679) libhdfs++: Implement parallel find with wildcards tool

2016-08-24 Thread Anatoli Shein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Shein updated HDFS-10679:
-
Resolution: Resolved
Status: Resolved  (was: Patch Available)

It is being reviewed as part of HDFS-10754 now.

> libhdfs++: Implement parallel find with wildcards tool
> --
>
> Key: HDFS-10679
> URL: https://issues.apache.org/jira/browse/HDFS-10679
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
> Attachments: HDFS-10679.HDFS-8707.000.patch, 
> HDFS-10679.HDFS-8707.001.patch, HDFS-10679.HDFS-8707.002.patch, 
> HDFS-10679.HDFS-8707.003.patch, HDFS-10679.HDFS-8707.004.patch, 
> HDFS-10679.HDFS-8707.005.patch, HDFS-10679.HDFS-8707.006.patch, 
> HDFS-10679.HDFS-8707.007.patch, HDFS-10679.HDFS-8707.008.patch, 
> HDFS-10679.HDFS-8707.009.patch, HDFS-10679.HDFS-8707.010.patch, 
> HDFS-10679.HDFS-8707.011.patch, HDFS-10679.HDFS-8707.012.patch, 
> HDFS-10679.HDFS-8707.013.patch
>
>
> The find tool will issue the GetListing namenode operation on a given 
> directory, and filter the results using a POSIX globbing library.
> If the recursive option is selected, for each returned entry that is a 
> directory the tool will issue another asynchronous call GetListing and repeat 
> the result processing in a recursive fashion.
> One implementation issue that needs to be addressed is the way results are 
> returned to the user: we can either buffer the results and return 
> them to the user in bulk, or we can return results continuously as they 
> arrive. While buffering would be an easier solution, returning results as 
> they arrive would be more beneficial to the user in terms of performance, 
> since the result processing can start as soon as the first results arrive 
> without any delay. In order to do that we need the user to use a loop to 
> process arriving results, and we need to send a special message back to the 
> user when the search is over.
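
As a rough illustration of the approach described in the summary above (list a
directory, filter the names with POSIX globbing, and recurse into
subdirectories), here is a minimal single-threaded sketch. GetListing and the
"is it a directory" check are hypothetical stand-ins; the real tool would use
the file type returned by the namenode and issue the recursive listings
asynchronously:
{code}
#include <fnmatch.h>

#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for one GetListing response: the entries of a directory.
static std::vector<std::string> GetListing(const std::string& dir) {
  if (dir == "/logs") return {"app_2016.log", "app_2015.log", "README", "archive"};
  if (dir == "/logs/archive") return {"app_2014.log"};
  return {};
}

// Depth-first sketch: list a directory, keep names matching the glob, recurse.
static void FindMatches(const std::string& dir, const std::string& glob,
                        std::vector<std::string>* out) {
  for (const std::string& name : GetListing(dir)) {
    const std::string full_path = dir + "/" + name;
    // fnmatch(3) is the POSIX globbing primitive ('*', '?', '[...]').
    if (fnmatch(glob.c_str(), name.c_str(), 0) == 0) {
      out->push_back(full_path);
    }
    // In this sketch, "no extension" marks a directory; the real tool checks
    // the entry type and issues the next GetListing asynchronously.
    if (name.find('.') == std::string::npos) {
      FindMatches(full_path, glob, out);
    }
  }
}

int main() {
  std::vector<std::string> matches;
  FindMatches("/logs", "app_*.log", &matches);
  for (const auto& m : matches) std::cout << m << std::endl;
  return 0;
}
{code}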






[jira] [Updated] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.

2016-08-24 Thread Anatoli Shein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Shein updated HDFS-10754:
-
Summary: libhdfs++: Create tools directory and implement hdfs_cat, 
hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.  (was: libhdfs++: Create 
tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, and hdfs_chmod)

> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, 
> hdfs_chown, hdfs_chmod and hdfs_find.
> ---
>
> Key: HDFS-10754
> URL: https://issues.apache.org/jira/browse/HDFS-10754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
> Attachments: HDFS-10754.HDFS-8707.000.patch, 
> HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch, 
> HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch, 
> HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, 
> HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch
>
>







[jira] [Commented] (HDFS-10679) libhdfs++: Implement parallel find with wildcards tool

2016-08-24 Thread Anatoli Shein (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435692#comment-15435692
 ] 

Anatoli Shein commented on HDFS-10679:
--

Thanks for the review, [~bobhansen].

I will close this jira now since I am moving the changes from here to 
HDFS-10754.

I have addressed your comments as follows:

FS::Find:
* Async - Callback delivers results with a const std::vector &, not a 
shared_ptr. This is a signal to the consumer to use the data delivered during 
the callback, but not to hold on to the passed-in container.
(/) Done, also fixed GetListing Async.
* Likewise, the synchronous call should take a non-const std::vector * as an 
output parameter, signaling to the consumer that we are going to mutate their 
input vector.
(/) Done, also fixed GetListing Sync (a small sketch of both calling 
conventions follows this list).
* We need a very clear threading model. Will the handler be called concurrently 
from multiple threads (currently, yes. If we ever get on asio fibers, we should 
make it a no, because we love our consumers)
(i) I agree. We might need to make a jira for that.
* We're doing a lot of dynamic memory allocation during recursion. Could we 
restructure things a little to not copy the entirety of the FindState and 
RecursionState on each call? It appears that they each have one element that is 
being updated for each recursive call
(/) I separated the state into CurrentState and SharedState. SharedState is 
never copied now.
* We need to hold the lock while incrementing the recursion_counter also
(i) recursion_counter is atomic and in our case increments are never paired 
with read accesses, so they do not need locking.
* If the handler returns false (don't want more) at the end of the function, do 
we do anything to prevent more from being delivered? Should we push that into 
the shared find_state and bail out for any subsequent NN responses?
(/) I added a variable "aborted" that stops recursion when the user does not 
want any more results (the shared-state sketch after this list illustrates 
this, together with the recursion counter and the handler lock).
find.cpp:
* Like the cat examples, simplify as much as possible. Nuke URI parsing, etc.
(/) Done.
* Expand smth_found to something_found to prevent confusion (especially in an 
example)
(/) Done.
* We have race conditions if one thread is outputting the previous block while 
another thread gets a final block (or error).
(/) Fixed by locking the handler.
FS::GetFileInfo should populate the full_path member also
(/) Done.
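
As referenced above, a small sketch of the two calling conventions from the
first bullets: the async flavor hands results to the callback as a const
reference (read during the call, copy what you keep), while the sync flavor
takes a non-const vector pointer that it fills. Names are illustrative, not the
real libhdfs++ signatures.
{code}
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Async flavor: the result container belongs to the caller; the handler gets
// a const reference and must copy anything it wants to keep.
static void FindAsync(
    const std::function<void(const std::vector<std::string>&, bool)>& handler) {
  std::vector<std::string> batch = {"/a/1", "/a/2"};
  handler(batch, /*has_more=*/false);
}

// Sync flavor: the non-const pointer signals that the call mutates (fills)
// the consumer's container.
static void Find(std::vector<std::string>* results) {
  FindAsync([results](const std::vector<std::string>& batch, bool /*has_more*/) {
    results->insert(results->end(), batch.begin(), batch.end());
  });
}

int main() {
  std::vector<std::string> results;
  Find(&results);
  for (const auto& r : results) std::cout << r << std::endl;
  return 0;
}
{code}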
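
And, as referenced in the answers about the recursion counter, the "aborted"
flag and the handler lock, here is a rough sketch of that shared per-Find
state. Field and function names are illustrative, not the actual patch: a bare
atomic increment needs no lock because nothing reads the counter and writes it
back as two separate steps, the aborted flag drops late namenode responses once
the user handler returns false, and a mutex serializes calls into the handler
so output from concurrent blocks cannot interleave.
{code}
#include <atomic>
#include <cstdint>
#include <functional>
#include <iostream>
#include <mutex>
#include <string>
#include <vector>

// Illustrative shared state for one Find operation (not the actual patch).
struct SharedFindState {
  std::atomic<uint64_t> recursion_counter{0};  // bumped for every GetListing issued
  std::atomic<bool> aborted{false};            // set once the user says "no more"
  std::mutex handler_mutex;                    // serializes calls into the user handler
};

// Deliver one batch of results to the user handler, one caller at a time.
static void DeliverBatch(
    SharedFindState* state, const std::vector<std::string>& batch, bool has_more,
    const std::function<bool(const std::vector<std::string>&, bool)>& handler) {
  if (state->aborted.load()) {
    return;  // a previous callback already said "stop"; drop late NN responses
  }
  std::lock_guard<std::mutex> lock(state->handler_mutex);
  if (!handler(batch, has_more)) {
    state->aborted.store(true);  // no further recursion, no further delivery
  }
}

int main() {
  SharedFindState state;

  // An unpaired increment is safe without a lock: fetch_add is itself atomic.
  state.recursion_counter.fetch_add(1);

  auto handler = [](const std::vector<std::string>& batch, bool /*has_more*/) {
    for (const auto& p : batch) std::cout << p << std::endl;
    return false;  // pretend the user only wanted the first batch
  };

  DeliverBatch(&state, {"/a/1", "/a/2"}, true, handler);
  DeliverBatch(&state, {"/a/3"}, false, handler);  // dropped: aborted was set
  return 0;
}
{code}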

> libhdfs++: Implement parallel find with wildcards tool
> --
>
> Key: HDFS-10679
> URL: https://issues.apache.org/jira/browse/HDFS-10679
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
> Attachments: HDFS-10679.HDFS-8707.000.patch, 
> HDFS-10679.HDFS-8707.001.patch, HDFS-10679.HDFS-8707.002.patch, 
> HDFS-10679.HDFS-8707.003.patch, HDFS-10679.HDFS-8707.004.patch, 
> HDFS-10679.HDFS-8707.005.patch, HDFS-10679.HDFS-8707.006.patch, 
> HDFS-10679.HDFS-8707.007.patch, HDFS-10679.HDFS-8707.008.patch, 
> HDFS-10679.HDFS-8707.009.patch, HDFS-10679.HDFS-8707.010.patch, 
> HDFS-10679.HDFS-8707.011.patch, HDFS-10679.HDFS-8707.012.patch, 
> HDFS-10679.HDFS-8707.013.patch
>
>
> The find tool will issue the GetListing namenode operation on a given 
> directory, and filter the results using a POSIX globbing library.
> If the recursive option is selected, for each returned entry that is a 
> directory the tool will issue another asynchronous call GetListing and repeat 
> the result processing in a recursive fashion.
> One implementation issue that needs to be addressed is the way results are 
> returned to the user: we can either buffer the results and return 
> them to the user in bulk, or we can return results continuously as they 
> arrive. While buffering would be an easier solution, returning results as 
> they arrive would be more beneficial to the user in terms of performance, 
> since the result processing can start as soon as the first results arrive 
> without any delay. In order to do that we need the user to use a loop to 
> process arriving results, and we need to send a special message back to the 
> user when the search is over.






[jira] [Commented] (HDFS-10788) fsck NullPointerException when it encounters corrupt replicas

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435658#comment-15435658
 ] 

Wei-Chiu Chuang commented on HDFS-10788:


Thanks [~kshukla] for confirming my guess. I also traced the code and found 
that {{ClientProtocol.getBlockLocations}} indirectly calls 
{{BlockManager#createLocatedBlocks}}. CDH 5.5.2 went GA before Hadoop 2.7.3, so 
it does not have the HDFS-9985 fix.

> fsck NullPointerException when it encounters corrupt replicas
> -
>
> Key: HDFS-10788
> URL: https://issues.apache.org/jira/browse/HDFS-10788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CDH5.5.2, CentOS 6.7
>Reporter: Jeff Field
>
> Somehow (I haven't found root cause yet) we ended up with blocks that have 
> corrupt replicas where the replica count is inconsistent between the blockmap 
> and the corrupt replicas map. If we try to hdfs fsck any parent directory 
> that has a child with one of these blocks, fsck will exit with something like 
> this:
> {code}
> $ hdfs fsck /path/to/parent/dir/ | egrep -v '^\.+$'
> Connecting to namenode via http://mynamenode:50070
> FSCK started by bot-hadoop (auth:KERBEROS_SSL) from /10.97.132.43 for path 
> /path/to/parent/dir/ at Tue Aug 23 20:34:58 UTC 2016
> .FSCK 
> ended at Tue Aug 23 20:34:59 UTC 2016 in 1098 milliseconds
> null
> Fsck on path '/path/to/parent/dir/' FAILED
> {code}
> So I start at the top, fscking every subdirectory until I find one or more 
> that fails. Then I do the same thing with those directories (our top level 
> directories all have subdirectories with date directories in them, which then 
> contain the files) and once I find a directory with files in it, I run a 
> checksum of the files in that directory. When I do that, I don't get the name 
> of the file, rather I get:
> checksum: java.lang.NullPointerException
> but since the files are in order, I can figure it out by seeing which file 
> was before the NPE. Once I get to this point, I can see the following in the 
> namenode log when I try to checksum the corrupt file:
> 2016-08-23 20:24:59,627 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent 
> number of corrupt replicas for blk_1335893388_1100036319546 blockMap has 0 
> but corrupt replicas map has 1
> 2016-08-23 20:24:59,627 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 23 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 
> 192.168.1.100:47785 Call#1 Retry#0
> java.lang.NullPointerException
> At which point I can delete the file, but it is a very tedious process.
> Ideally, shouldn't fsck be able to emit the name of the file that is the 
> source of the problem - and (if -delete is specified) get rid of the file, 
> instead of exiting without saying why?






[jira] [Commented] (HDFS-8901) Use ByteBuffer in striping positional read

2016-08-24 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435642#comment-15435642
 ] 

Zhe Zhang commented on HDFS-8901:
-

Sorry about the delay. I just finished reviewing the patch. It needs a rebase, 
most likely due to HDFS-8905.
# The below code can be moved to {{BlockReaderUtil}}:
{code}
+//Behave exactly as the readAll() call
+ByteBuffer tmp = buf.duplicate();
+tmp.limit(tmp.position() + len);
+tmp = tmp.slice();
...
{code}
# This is a little confusing and actually IntelliJ can't find where {{ret}} is 
declared. Can we change to regular assignments?
{code}
int nread = 0, ret;
{code}
# Just to clarify, is the plan to add a {{ByteBuffer}} version of the below 
positional read public API?
{code}
 public int read(long position, byte[] buffer, int offset, int length)
{code}
# Why was this not needed before but is needed now?
{code}
+  strategy.getReadBuffer().flip();
{code}
# Below formatting looks odd. Maybe an IDE setting issue?
{code}
+  private static void calculateChunkPositionsInBuf(int cellSize,
+   AlignedStripe[] stripes,
+   StripingCell[] cells,
+   ByteBuffer bu
{code}
# So with the changes in {{DFSInputStream}} and {{DFSStripedInputStream}}, 
every {{StripingChunk}} should use {{ByteBuffer}} instead of a byte array, 
right? Can we unify the usage of {{byteBuffer}} and {{chunkBuffer}}? Looks like 
{{ChunkByteBuffer}} can just handle a special case of a single {{ByteBuffer}}?
# Just to clarify, we will improve performance of the below operation only 
after implementing the new public API for {{ByteBuffer}}-based positional read, 
right? Because otherwise {{chunk.getChunkBuffer}} will be an indirect buffer 
backed by a regular on-heap array.
{code}
-chunk.copyTo(decodeInputs[i]);
+if (chunk.useChunkBuffer()) {
+  chunk.getChunkBuffer().copyTo(decodeInputs[i]);
+}
{code}

> Use ByteBuffer in striping positional read
> --
>
> Key: HDFS-8901
> URL: https://issues.apache.org/jira/browse/HDFS-8901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Youwei Wang
> Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, 
> HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, 
> HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, 
> HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, 
> HDFS-8901.v13.patch, HDFS-8901.v14.patch, initial-poc.patch
>
>
> Native erasure coder prefers to direct ByteBuffer for performance 
> consideration. To prepare for it, this change uses ByteBuffer through the 
> codes in implementing striping position read. It will also fix avoiding 
> unnecessary data copying between striping read chunk buffers and decode input 
> buffers.






[jira] [Commented] (HDFS-10772) Reduce byte/string conversions for get listing

2016-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435640#comment-15435640
 ] 

Hudson commented on HDFS-10772:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10339 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10339/])
HDFS-10772. Reduce byte/string conversions for get listing. Contributed 
(kihwal: rev a1f3293762dddb0ca953d1145f5b53d9086b25b8)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirStatAndListingOp.java


> Reduce byte/string conversions for get listing
> --
>
> Key: HDFS-10772
> URL: https://issues.apache.org/jira/browse/HDFS-10772
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-10772.patch
>
>
> {{FSDirectory.getListingInt}} does a byte/string conversion for the byte[] 
> startAfter just to determine if it should be resolved as an inode path.  This 
> is not the common case but rather for NFS support so it should be avoided.  
> When the resolution is necessary the conversions may be reduced.






[jira] [Commented] (HDFS-10768) Optimize mkdir ops

2016-08-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435633#comment-15435633
 ] 

Kihwal Lee commented on HDFS-10768:
---

The patch applies now. I kicked off a precommit build.

> Optimize mkdir ops
> --
>
> Key: HDFS-10768
> URL: https://issues.apache.org/jira/browse/HDFS-10768
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-10768.patch
>
>
> Directory creation causes excessive object allocation: ex. an immutable list 
> builder, containing the string of components converted from the IIP's 
> byte[]s, sublist views of the string list, iterable, followed by string to 
> byte[] conversion.  This can all be eliminated by accessing the component's 
> byte[] in the IIP.






[jira] [Commented] (HDFS-10742) Measurement of lock held time in FsDatasetImpl

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435626#comment-15435626
 ] 

Hadoop QA commented on HDFS-10742:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
50s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 35s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}100m 41s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be 
atomic in org.apache.hadoop.hdfs.InstrumentedLock.check(long, long)  At 
InstrumentedLock.java:may not be atomic in 
org.apache.hadoop.hdfs.InstrumentedLock.check(long, long)  At 
InstrumentedLock.java:[line 162] |
|  |  Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be 
atomic in org.apache.hadoop.hdfs.InstrumentedLock.check(long, long)  At 
InstrumentedLock.java:may not be atomic in 
org.apache.hadoop.hdfs.InstrumentedLock.check(long, long)  At 
InstrumentedLock.java:[line 175] |
| Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825311/HDFS-10742.006.patch |
| JIRA Issue | HDFS-10742 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux ec48519e1d32 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 3476156 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| findbugs | 

[jira] [Updated] (HDFS-10772) Reduce byte/string conversions for get listing

2016-08-24 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10772:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha2
   2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.

> Reduce byte/string conversions for get listing
> --
>
> Key: HDFS-10772
> URL: https://issues.apache.org/jira/browse/HDFS-10772
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-10772.patch
>
>
> {{FSDirectory.getListingInt}} does a byte/string conversion for the byte[] 
> startAfter just to determine if it should be resolved as an inode path.  This 
> is not the common case but rather for NFS support so it should be avoided.  
> When the resolution is necessary the conversions may be reduced.






[jira] [Commented] (HDFS-10788) fsck NullPointerException when it encounters corrupt replicas

2016-08-24 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435620#comment-15435620
 ] 

Kuhu Shukla commented on HDFS-10788:


[~jfield], this is fixed in 2.7.3 through HDFS-9958. I am guessing this is from 
HDFS 2.6, as you mentioned in the affected version. Hope this helps.

> fsck NullPointerException when it encounters corrupt replicas
> -
>
> Key: HDFS-10788
> URL: https://issues.apache.org/jira/browse/HDFS-10788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CDH5.5.2, CentOS 6.7
>Reporter: Jeff Field
>
> Somehow (I haven't found root cause yet) we ended up with blocks that have 
> corrupt replicas where the replica count is inconsistent between the blockmap 
> and the corrupt replicas map. If we try to hdfs fsck any parent directory 
> that has a child with one of these blocks, fsck will exit with something like 
> this:
> {code}
> $ hdfs fsck /path/to/parent/dir/ | egrep -v '^\.+$'
> Connecting to namenode via http://mynamenode:50070
> FSCK started by bot-hadoop (auth:KERBEROS_SSL) from /10.97.132.43 for path 
> /path/to/parent/dir/ at Tue Aug 23 20:34:58 UTC 2016
> .FSCK 
> ended at Tue Aug 23 20:34:59 UTC 2016 in 1098 milliseconds
> null
> Fsck on path '/path/to/parent/dir/' FAILED
> {code}
> So I start at the top, fscking every subdirectory until I find one or more 
> that fails. Then I do the same thing with those directories (our top level 
> directories all have subdirectories with date directories in them, which then 
> contain the files) and once I find a directory with files in it, I run a 
> checksum of the files in that directory. When I do that, I don't get the name 
> of the file, rather I get:
> checksum: java.lang.NullPointerException
> but since the files are in order, I can figure it out by seeing which file 
> was before the NPE. Once I get to this point, I can see the following in the 
> namenode log when I try to checksum the corrupt file:
> 2016-08-23 20:24:59,627 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent 
> number of corrupt replicas for blk_1335893388_1100036319546 blockMap has 0 
> but corrupt replicas map has 1
> 2016-08-23 20:24:59,627 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 23 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 
> 192.168.1.100:47785 Call#1 Retry#0
> java.lang.NullPointerException
> At which point I can delete the file, but it is a very tedious process.
> Ideally, shouldn't fsck be able to emit the name of the file that is the 
> source of the problem - and (if -delete is specified) get rid of the file, 
> instead of exiting without saying why?






[jira] [Commented] (HDFS-10772) Reduce byte/string conversions for get listing

2016-08-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435606#comment-15435606
 ] 

Kihwal Lee commented on HDFS-10772:
---

+1, it looks straightforward.

> Reduce byte/string conversions for get listing
> --
>
> Key: HDFS-10772
> URL: https://issues.apache.org/jira/browse/HDFS-10772
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-10772.patch
>
>
> {{FSDirectory.getListingInt}} does a byte/string conversion for the byte[] 
> startAfter just to determine if it should be resolved as an inode path.  This 
> is not the common case but rather for NFS support so it should be avoided.  
> When the resolution is necessary the conversions may be reduced.






[jira] [Commented] (HDFS-10788) fsck NullPointerException when it encounters corrupt replicas

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435578#comment-15435578
 ] 

Wei-Chiu Chuang commented on HDFS-10788:


I suspect it's HDFS-9958.

> fsck NullPointerException when it encounters corrupt replicas
> -
>
> Key: HDFS-10788
> URL: https://issues.apache.org/jira/browse/HDFS-10788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CDH5.5.2, CentOS 6.7
>Reporter: Jeff Field
>
> Somehow (I haven't found root cause yet) we ended up with blocks that have 
> corrupt replicas where the replica count is inconsistent between the blockmap 
> and the corrupt replicas map. If we try to hdfs fsck any parent directory 
> that has a child with one of these blocks, fsck will exit with something like 
> this:
> {code}
> $ hdfs fsck /path/to/parent/dir/ | egrep -v '^\.+$'
> Connecting to namenode via http://mynamenode:50070
> FSCK started by bot-hadoop (auth:KERBEROS_SSL) from /10.97.132.43 for path 
> /path/to/parent/dir/ at Tue Aug 23 20:34:58 UTC 2016
> .FSCK 
> ended at Tue Aug 23 20:34:59 UTC 2016 in 1098 milliseconds
> null
> Fsck on path '/path/to/parent/dir/' FAILED
> {code}
> So I start at the top, fscking every subdirectory until I find one or more 
> that fails. Then I do the same thing with those directories (our top level 
> directories all have subdirectories with date directories in them, which then 
> contain the files) and once I find a directory with files in it, I run a 
> checksum of the files in that directory. When I do that, I don't get the name 
> of the file, rather I get:
> checksum: java.lang.NullPointerException
> but since the files are in order, I can figure it out by seeing which file 
> was before the NPE. Once I get to this point, I can see the following in the 
> namenode log when I try to checksum the corrupt file:
> 2016-08-23 20:24:59,627 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent 
> number of corrupt replicas for blk_1335893388_1100036319546 blockMap has 0 
> but corrupt replicas map has 1
> 2016-08-23 20:24:59,627 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 23 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 
> 192.168.1.100:47785 Call#1 Retry#0
> java.lang.NullPointerException
> At which point I can delete the file, but it is a very tedious process.
> Ideally, shouldn't fsck be able to emit the name of the file that is the 
> source of the problem - and (if -delete is specified) get rid of the file, 
> instead of exiting without saying why?






[jira] [Commented] (HDFS-10788) fsck NullPointerException when it encounters corrupt replicas

2016-08-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435537#comment-15435537
 ] 

Kihwal Lee commented on HDFS-10788:
---

[~kshukla], I think one of your jiras fixed it. Please verify.

> fsck NullPointerException when it encounters corrupt replicas
> -
>
> Key: HDFS-10788
> URL: https://issues.apache.org/jira/browse/HDFS-10788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CDH5.5.2, CentOS 6.7
>Reporter: Jeff Field
>
> Somehow (I haven't found root cause yet) we ended up with blocks that have 
> corrupt replicas where the replica count is inconsistent between the blockmap 
> and the corrupt replicas map. If we try to hdfs fsck any parent directory 
> that has a child with one of these blocks, fsck will exit with something like 
> this:
> {code}
> $ hdfs fsck /path/to/parent/dir/ | egrep -v '^\.+$'
> Connecting to namenode via http://mynamenode:50070
> FSCK started by bot-hadoop (auth:KERBEROS_SSL) from /10.97.132.43 for path 
> /path/to/parent/dir/ at Tue Aug 23 20:34:58 UTC 2016
> .FSCK 
> ended at Tue Aug 23 20:34:59 UTC 2016 in 1098 milliseconds
> null
> Fsck on path '/path/to/parent/dir/' FAILED
> {code}
> So I start at the top, fscking every subdirectory until I find one or more 
> that fails. Then I do the same thing with those directories (our top level 
> directories all have subdirectories with date directories in them, which then 
> contain the files) and once I find a directory with files in it, I run a 
> checksum of the files in that directory. When I do that, I don't get the name 
> of the file, rather I get:
> checksum: java.lang.NullPointerException
> but since the files are in order, I can figure it out by seeing which file 
> was before the NPE. Once I get to this point, I can see the following in the 
> namenode log when I try to checksum the corrupt file:
> 2016-08-23 20:24:59,627 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent 
> number of corrupt replicas for blk_1335893388_1100036319546 blockMap has 0 
> but corrupt replicas map has 1
> 2016-08-23 20:24:59,627 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 23 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 
> 192.168.1.100:47785 Call#1 Retry#0
> java.lang.NullPointerException
> At which point I can delete the file, but it is a very tedious process.
> Ideally, shouldn't fsck be able to emit the name of the file that is the 
> source of the problem - and (if -delete is specified) get rid of the file, 
> instead of exiting without saying why?






[jira] [Commented] (HDFS-10784) Implement WebHdfsFileSystem#listStatusIterator

2016-08-24 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435511#comment-15435511
 ] 

Andrew Wang commented on HDFS-10784:


Thanks for taking a look Xiao. FWIW, the limitation of starting the listing at 
an offset is a function of the Java API, but not the REST API itself. Since Hue 
doesn't use the Java API, it can implement an "iterator at offset" if it so 
desires. We could implement it in the Java API too if that's desired, but I'm 
hesitant to widen the FileSystem API any further.

> Implement WebHdfsFileSystem#listStatusIterator
> --
>
> Key: HDFS-10784
> URL: https://issues.apache.org/jira/browse/HDFS-10784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.6.4
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HDFS-10784.001.patch
>
>
> It would be nice to implement the iterative listStatus in WebHDFS so client 
> apps do not need to buffer the full file list for large directories.






[jira] [Commented] (HDFS-10784) Implement WebHdfsFileSystem#listStatusIterator

2016-08-24 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435496#comment-15435496
 ] 

Xiao Chen commented on HDFS-10784:
--

Thanks Andrew for working on this, and Brahma for bringing up HDFS-9366. Patch 
itself looks good.

The feature the two jiras want to provide is very much alike: allowing some 
pagination when listing. The implementation is a little different:
- HDFS-10784 implements {{RemoteIterator}}, so adds a new interface 
{{listStatusIterator}}. 
- HDFS-9366 overloads current {{listStatus}}, with customized offset and size 
parameters.

IMHO, HDFS-10784 is cleaner and more flexible, hence easier to use when the 
user wants to iterate over the whole listing. HDFS-9366 could need fewer 
end-to-end trips when listing with a starting offset not at the beginning.
Looking at {{DistributedFileSystem}}, an iterator would be more consistent with 
the HDFS context. I don't have a strong opinion; maybe we should ask our user 
[~romainr], since both jiras seem to aim at Hue.

I think we should combine this and HDFS-9366 after agreement, and add 
documentation. Would be great if httpfs is supported too.

> Implement WebHdfsFileSystem#listStatusIterator
> --
>
> Key: HDFS-10784
> URL: https://issues.apache.org/jira/browse/HDFS-10784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.6.4
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HDFS-10784.001.patch
>
>
> It would be nice to implement the iterative listStatus in WebHDFS so client 
> apps do not need to buffer the full file list for large directories.






[jira] [Commented] (HDFS-10608) Include event for AddBlock in Inotify Event Stream

2016-08-24 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435492#comment-15435492
 ] 

Surendra Singh Lilhore commented on HDFS-10608:
---

Thanks [~churromorales] for the patch.

Comments for v3:
1. 
{code}
+private Long lastBlockIdLen;
{code}
This is not required. We cannot provide the length for the last block, since we 
don't know how many bytes the client will write.

2. For the client, the block ID is nothing but the block name (blk_xxx), so set 
the block name in {{penultimateBlockId}} and {{lastBlockId}}.

   If you look at shell commands like {{fsck}}, we pass the block name from the 
command prompt for block-related queries.

> Include event for AddBlock in Inotify Event Stream
> --
>
> Key: HDFS-10608
> URL: https://issues.apache.org/jira/browse/HDFS-10608
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: churro morales
>Priority: Minor
> Attachments: HDFS-10608.patch, HDFS-10608.v1.patch, 
> HDFS-10608.v2.patch, HDFS-10608.v3.patch
>
>
> It would be nice to have an AddBlockEvent in the INotify pipeline.  Based on 
> discussions from mailing list:
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201607.mbox/%3C1467743792.4040080.657624289.7BE240AD%40webmail.messagingengine.com%3E






[jira] [Updated] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-08-24 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-8986:
--
Fix Version/s: (was: 2.9.0)

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Xiao Chen
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch, 
> HDFS-8986.03.patch, HDFS-8986.04.patch, HDFS-8986.05.patch, 
> HDFS-8986.06.patch, HDFS-8986.07.patch, HDFS-8986.branch-2.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot related 
> disk usage in the output






[jira] [Updated] (HDFS-10742) Measurement of lock held time in FsDatasetImpl

2016-08-24 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10742:
--
Attachment: HDFS-10742.006.patch

Findbugs still complains about the issue; I slightly modified the code to 
address it.

> Measurement of lock held time in FsDatasetImpl
> --
>
> Key: HDFS-10742
> URL: https://issues.apache.org/jira/browse/HDFS-10742
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha2
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10742.001.patch, HDFS-10742.002.patch, 
> HDFS-10742.003.patch, HDFS-10742.004.patch, HDFS-10742.005.patch, 
> HDFS-10742.006.patch
>
>
> This JIRA proposes to measure the time the lock of {{FsDatasetImpl}} is 
> held by a thread. Doing so will allow us to measure lock statistics.
> This can be done by extending the {{AutoCloseableLock}} lock object in 
> {{FsDatasetImpl}}. In the future we can also consider replacing the lock with 
> a read-write lock.






[jira] [Commented] (HDFS-10742) Measurement of lock held time in FsDatasetImpl

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435442#comment-15435442
 ] 

Hadoop QA commented on HDFS-10742:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
47s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 58m  
8s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be 
atomic in org.apache.hadoop.hdfs.InstrumentedLock.check(long, long)  At 
InstrumentedLock.java:may not be atomic in 
org.apache.hadoop.hdfs.InstrumentedLock.check(long, long)  At 
InstrumentedLock.java:[line 163] |
|  |  Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be 
atomic in org.apache.hadoop.hdfs.InstrumentedLock.check(long, long)  At 
InstrumentedLock.java:may not be atomic in 
org.apache.hadoop.hdfs.InstrumentedLock.check(long, long)  At 
InstrumentedLock.java:[line 179] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825294/HDFS-10742.005.patch |
| JIRA Issue | HDFS-10742 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux be3739531a9a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 6fa9bf4 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16527/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
 |
|  Test 

[jira] [Updated] (HDFS-10777) DataNode should report volume failures if DU cannot access files

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10777:
---
Attachment: HDFS-10777.01.patch

v01: a proof of concept. Whenever DU gets an IOException whose message matches 
"cannot access (.*): Input/output error", it sets a flag. If the flag is on, 
the DiskCheck thread scans all directories under the volume.

Given that we scan the entire volume only in this specific case, rather than 
scanning blindly, I feel the performance impact should be acceptable.
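
Roughly, the detection pattern would look like the sketch below (class and 
method names are made up for illustration; this is not the attached patch):

{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.regex.Pattern;

// Sketch only: detect an I/O error reported by the du implementation and ask
// a background disk-check thread to scan the whole volume. Names are
// illustrative, not the real DataNode code.
class DuErrorDetector {
  private static final Pattern IO_ERROR =
      Pattern.compile("cannot access (.*): Input/output error");
  private final AtomicBoolean fullScanRequested = new AtomicBoolean(false);

  void onDuException(IOException e) {
    String msg = e.getMessage();
    if (msg != null && IO_ERROR.matcher(msg).find()) {
      fullScanRequested.set(true);  // picked up later by the disk-check thread
    }
  }

  boolean shouldScanEntireVolume() {
    return fullScanRequested.getAndSet(false);
  }
}
{code}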

> DataNode should report volume failures if DU cannot access files
> ---
>
> Key: HDFS-10777
> URL: https://issues.apache.org/jira/browse/HDFS-10777
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.8.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10777.01.patch
>
>
> HADOOP-12973 refactored DU and made it pluggable. The refactoring has a 
> side effect: if DU encounters an exception, the exception is caught, logged 
> and ignored, which essentially fixes HDFS-9908 (in which case runaway 
> exceptions prevent DataNodes from handshaking with NameNodes).
> However, this "fix" is not good, in the sense that if the disk is bad, the 
> DataNode takes no immediate action other than logging the exception. The 
> existing {{FsDatasetSpi#checkDataDir}} has been reduced to blindly checking 
> only a small number of directories. If a disk goes bad, it is often the case 
> that only a few files are bad initially, so by checking only a small number 
> of directories it is easy to overlook the degraded disk.
> I propose: in addition to logging the exception, the DataNode should 
> proactively verify that the files are not accessible, remove the volume, and 
> make the failure visible by showing it in JMX, so that administrators can 
> spot the failure via monitoring systems.
> A different fix, based on HDFS-9908, is needed before Hadoop 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10636) Modify ReplicaInfo to remove the assumption that replica metadata and data are stored in java.io.File.

2016-08-24 Thread Virajith Jalaparti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HDFS-10636:
--
Attachment: HDFS-10636.006.patch

Posting an updated patch based on [~eddyxu]'s comments (also rebased on most 
recent trunk). 

> Modify ReplicaInfo to remove the assumption that replica metadata and data 
> are stored in java.io.File.
> --
>
> Key: HDFS-10636
> URL: https://issues.apache.org/jira/browse/HDFS-10636
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, fs
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-10636.001.patch, HDFS-10636.002.patch, 
> HDFS-10636.003.patch, HDFS-10636.004.patch, HDFS-10636.005.patch, 
> HDFS-10636.006.patch
>
>
> Replace java.io.File related APIs from {{ReplicaInfo}}, and enable the 
> definition of new {{ReplicaInfo}} sub-classes whose metadata and data can be 
> present on external storages (HDFS-9806). 
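
As a rough sketch of that direction (illustrative names only, not the actual 
patch), a replica would expose URIs and streams instead of java.io.File 
handles, so subclasses can live on local disks or on external storage:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Sketch only: replica data and metadata addressed by URI and accessed
// through streams rather than java.io.File. Names are illustrative.
abstract class UriBackedReplica {
  abstract URI getDataURI();
  abstract URI getMetadataURI();
  abstract InputStream openData(long offset) throws IOException;
  abstract InputStream openMetadata(long offset) throws IOException;
}
{code}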



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10742) Measurement of lock held time in FsDatasetImpl

2016-08-24 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435324#comment-15435324
 ] 

Chen Liang commented on HDFS-10742:
---

Thanks [~arpitagarwal]!

Updated the patch to fix style. Findbugs reports a potentially non-atomic 
operation, but this is actually the desired behavior and the operation does 
not need to be atomic. (Also, earlier patches had the same code and for some 
reason findbugs did not complain.)
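
For context, the flagged pattern is a check-then-act sequence on a 
ConcurrentHashMap, roughly like the sketch below (illustrative names, not the 
actual InstrumentedLock code). FindBugs flags it because another thread could 
update the map between the two calls; for approximate statistics or 
rate-limited logging, an occasional duplicate is acceptable:

{code}
import java.util.concurrent.ConcurrentHashMap;

class NonAtomicCheckThenActSketch {
  private final ConcurrentHashMap<String, Long> lastLoggedMs =
      new ConcurrentHashMap<>();

  // FindBugs: "Sequence of calls to ConcurrentHashMap may not be atomic".
  // The get followed by put is not atomic, but for rate-limited logging an
  // occasional extra log line does no harm.
  boolean shouldLog(String key, long nowMs, long minIntervalMs) {
    Long last = lastLoggedMs.get(key);
    if (last == null || nowMs - last >= minIntervalMs) {
      lastLoggedMs.put(key, nowMs);
      return true;
    }
    return false;
  }
}
{code}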

> Measurement of lock held time in FsDatasetImpl
> --
>
> Key: HDFS-10742
> URL: https://issues.apache.org/jira/browse/HDFS-10742
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha2
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10742.001.patch, HDFS-10742.002.patch, 
> HDFS-10742.003.patch, HDFS-10742.004.patch, HDFS-10742.005.patch
>
>
> This JIRA proposes to measure the time the lock of {{FsDatasetImpl}} is held 
> by a thread. Doing so will allow us to collect lock statistics.
> This can be done by extending the {{AutoCloseableLock}} lock object in 
> {{FsDatasetImpl}}. In the future we can also consider replacing the lock with 
> a read-write lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10742) Measurement of lock held time in FsDatasetImpl

2016-08-24 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10742:
--
Attachment: HDFS-10742.005.patch

> Measurement of lock held time in FsDatasetImpl
> --
>
> Key: HDFS-10742
> URL: https://issues.apache.org/jira/browse/HDFS-10742
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha2
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10742.001.patch, HDFS-10742.002.patch, 
> HDFS-10742.003.patch, HDFS-10742.004.patch, HDFS-10742.005.patch
>
>
> This JIRA proposes to measure the time the lock of {{FsDatasetImpl}} is held 
> by a thread. Doing so will allow us to collect lock statistics.
> This can be done by extending the {{AutoCloseableLock}} lock object in 
> {{FsDatasetImpl}}. In the future we can also consider replacing the lock with 
> a read-write lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8093) blk does not exist or is not under Constructionnull

2016-08-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-8093:
--
Summary: blk does not exist or is not under Constructionnull  (was: BP does 
not exist or is not under Constructionnull)

> blk does not exist or is not under Constructionnull
> ---
>
> Key: HDFS-8093
> URL: https://issues.apache.org/jira/browse/HDFS-8093
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0
> Environment: Centos 6.5
>Reporter: LINTE
>
> The HDFS balancer ran for several hours balancing blocks between datanodes, 
> then ended by failing with the following error.
> The getStoredBlock function returned a null BlockInfo.
> java.io.IOException: Bad response ERROR for block 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 from 
> datanode 192.168.0.18:1004
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:897)
> 15/04/08 05:52:51 WARN hdfs.DFSClient: Error Recovery for block 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 in pipeline 
> 192.168.0.63:1004, 192.168.0.1:1004, 192.168.0.18:1004: bad datanode 
> 192.168.0.18:1004
> 15/04/08 05:52:51 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not 
> exist or is not under Constructionnull
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:717)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:931)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> at org.apache.hadoop.ipc.Client.call(Client.java:1468)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy11.updateBlockForPipeline(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:877)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy12.updateBlockForPipeline(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1266)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
> 15/04/08 05:52:51 ERROR hdfs.DFSClient: Failed to close inode 19801755
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not 
> exist or is not under Constructionnull
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980)
> at 
> 

[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-08-24 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435262#comment-15435262
 ] 

Xiao Chen commented on HDFS-8986:
-

Thanks [~jojochuang] for the reviews, and everyone for creating the issue and 
for the thoughtful discussions.

I set the fix versions according to Wei-Chiu's comment above.

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Xiao Chen
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch, 
> HDFS-8986.03.patch, HDFS-8986.04.patch, HDFS-8986.05.patch, 
> HDFS-8986.06.patch, HDFS-8986.07.patch, HDFS-8986.branch-2.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot-related 
> disk usage from the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8093) BP does not exist or is not under Constructionnull

2016-08-24 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435275#comment-15435275
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8093:
---

Then your cluster probably doesn't have HDFS-9365, which was committed to 2.7.3.

> BP does not exist or is not under Constructionnull
> --
>
> Key: HDFS-8093
> URL: https://issues.apache.org/jira/browse/HDFS-8093
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0
> Environment: Centos 6.5
>Reporter: LINTE
>
> The HDFS balancer ran for several hours balancing blocks between datanodes, 
> then ended by failing with the following error.
> The getStoredBlock function returned a null BlockInfo.
> java.io.IOException: Bad response ERROR for block 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 from 
> datanode 192.168.0.18:1004
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:897)
> 15/04/08 05:52:51 WARN hdfs.DFSClient: Error Recovery for block 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 in pipeline 
> 192.168.0.63:1004, 192.168.0.1:1004, 192.168.0.18:1004: bad datanode 
> 192.168.0.18:1004
> 15/04/08 05:52:51 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not 
> exist or is not under Constructionnull
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:717)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:931)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> at org.apache.hadoop.ipc.Client.call(Client.java:1468)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy11.updateBlockForPipeline(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:877)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy12.updateBlockForPipeline(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1266)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
> 15/04/08 05:52:51 ERROR hdfs.DFSClient: Failed to close inode 19801755
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not 
> exist or is not under Constructionnull
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980)
> at 
> 

[jira] [Updated] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-08-24 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-8986:

Fix Version/s: 3.0.0-alpha2
   2.9.0
   2.8.0

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Xiao Chen
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch, 
> HDFS-8986.03.patch, HDFS-8986.04.patch, HDFS-8986.05.patch, 
> HDFS-8986.06.patch, HDFS-8986.07.patch, HDFS-8986.branch-2.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot-related 
> disk usage from the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-8986:
--
Release Note: Add a -x option for "hdfs -du" and "hdfs -count" commands to 
exclude snapshots from being calculated.  (was: Add a -x option to "hdfs -du" 
and "hdfs -count" command to excludes snapshots from being calculated.)

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Xiao Chen
> Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch, 
> HDFS-8986.03.patch, HDFS-8986.04.patch, HDFS-8986.05.patch, 
> HDFS-8986.06.patch, HDFS-8986.07.patch, HDFS-8986.branch-2.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot-related 
> disk usage from the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-8986:
--
Release Note: Add a -x option to "hdfs -du" and "hdfs -count" command to 
excludes snapshots from being calculated.  (was: Add a -x option to "hdfs -du" 
command to excludes snapshots from being calculated.)

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Xiao Chen
> Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch, 
> HDFS-8986.03.patch, HDFS-8986.04.patch, HDFS-8986.05.patch, 
> HDFS-8986.06.patch, HDFS-8986.07.patch, HDFS-8986.branch-2.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot-related 
> disk usage from the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-8986:
--
Release Note: Add a -x option to "hdfs -du" command to excludes snapshots 
from being calculated.

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Xiao Chen
> Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch, 
> HDFS-8986.03.patch, HDFS-8986.04.patch, HDFS-8986.05.patch, 
> HDFS-8986.06.patch, HDFS-8986.07.patch, HDFS-8986.branch-2.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot-related 
> disk usage from the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435207#comment-15435207
 ] 

Wei-Chiu Chuang commented on HDFS-8986:
---

+1 on the branch-2 patch. The test failures are not reproducible in my local 
tree (except for TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten, which 
is a known flaky test).

Committed to branch-2 and branch-2.8. Thanks [~ggop] for reporting the bug, 
[~jagadesh.kiran], [~qwertymaniac], and [~cnauroth] for commenting, and 
[~xiaochen] for the patch.

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Xiao Chen
> Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch, 
> HDFS-8986.03.patch, HDFS-8986.04.patch, HDFS-8986.05.patch, 
> HDFS-8986.06.patch, HDFS-8986.07.patch, HDFS-8986.branch-2.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot-related 
> disk usage from the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-08-24 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-8986:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Xiao Chen
> Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch, 
> HDFS-8986.03.patch, HDFS-8986.04.patch, HDFS-8986.05.patch, 
> HDFS-8986.06.patch, HDFS-8986.07.patch, HDFS-8986.branch-2.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot-related 
> disk usage from the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode

2016-08-24 Thread Xinwei Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435188#comment-15435188
 ] 

Xinwei Qin  commented on HDFS-7859:
---

Sorry for attaching the wrong patch (not the latest one); I will correct it 
tomorrow morning.

> Erasure Coding: Persist erasure coding policies in NameNode
> ---
>
> Key: HDFS-7859
> URL: https://issues.apache.org/jira/browse/HDFS-7859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Xinwei Qin 
>  Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do
> Attachments: HDFS-7859-HDFS-7285.002.patch, 
> HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, 
> HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, 
> HDFS-7859.005.patch, HDFS-7859.006.patch, HDFS-7859.007.patch, 
> HDFS-7859.008.patch
>
>
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we 
> persist EC schemas in NameNode centrally and reliably, so that EC zones can 
> reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435140#comment-15435140
 ] 

Hadoop QA commented on HDFS-7859:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
40s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
57s{color} | {color:red} hadoop-hdfs-project in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 57s{color} | 
{color:red} hadoop-hdfs-project in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 57s{color} 
| {color:red} hadoop-hdfs-project in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | {color:orange} hadoop-hdfs-project: The patch generated 3 new + 
1220 unchanged - 2 fixed = 1223 total (was 1222) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
40s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
30s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client generated 1 new 
+ 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
21s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 39s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs-client |
|  |  Class 
org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure$Policy 
defines non-transient non-serializable instance field condition  In 
ReplaceDatanodeOnFailure.java:instance field condition  In 
ReplaceDatanodeOnFailure.java |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825274/HDFS-7859.008.patch |
| JIRA Issue | HDFS-7859 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  javadoc  
mvninstall  findbugs  checkstyle  |
| uname | Linux 4340414684b8 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 

[jira] [Commented] (HDFS-10763) Open files can leak permanently due to inconsistent lease update

2016-08-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435124#comment-15435124
 ] 

Kihwal Lee commented on HDFS-10763:
---

The one thing I had to do in the latest patch for branch-2.7 was to maintain 
whatever the snapshot code was doing for files deleted in snapshots. If it 
leaks UC features, it will continue to leak; if it doesn't, there will be no 
leak with the patch either. So I think it is safe for branch-2.6 as well.

> Open files can leak permanently due to inconsistent lease update
> 
>
> Key: HDFS-10763
> URL: https://issues.apache.org/jira/browse/HDFS-10763
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.6.4
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-10763.br27.patch, 
> HDFS-10763.branch-2.7.supplement.patch, HDFS-10763.branch-2.7.v2.patch, 
> HDFS-10763.patch
>
>
> This can happen during {{commitBlockSynchronization()}} or when a client 
> gives up on closing a file after retries.
> In {{finalizeINodeFileUnderConstruction()}}, the lease is removed first and 
> then the inode is turned into the closed state. But if any block is not in 
> COMPLETE state, {{INodeFile#assertAllBlocksComplete()}} will throw an 
> exception. This causes the lease to be removed from the lease manager, but 
> not from the inode. Since the lease manager does not have a lease for the 
> file, no lease recovery will happen for this file. Moreover, this broken 
> state is persisted and reconstructed through saving and loading of fsimage. 
> Since no replication is scheduled for the blocks of the file, this can cause 
> data loss and also block the decommissioning of datanodes.
> The lease cannot be manually recovered either. It fails with
> {noformat}
> ...AlreadyBeingCreatedException): Failed to RECOVER_LEASE /xyz/xyz for user1 
> on
>  0.0.0.1 because the file is under construction but no leases found.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2950)
> ...
> {noformat}
> When a client retries {{close()}}, the same inconsistent state is created, 
> but it can succeed the next time since {{checkLease()}} only looks at the 
> inode, not the lease manager, in this case. The close behavior is different 
> if HDFS-8999 is activated by setting 
> {{dfs.namenode.file.close.num-committed-allowed}} to 1 (unlikely) or 2 
> (never).
> In principle, the under-construction feature of an inode and the lease in 
> the lease manager should never go out of sync. The fix involves two parts.
> 1) Prevent inconsistent lease updates. We can achieve this by calling 
> {{removeLease()}} only after checking the block state.
> 2) Avoid reconstructing inconsistent lease states from an fsimage. 1) alone 
> does not correct existing inconsistencies that survive through fsimages. 
> This can be done at fsimage loading time by making sure a corresponding 
> lease exists for each inode that has the under-construction feature.
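
A minimal sketch of part 1), with simplified types and method names (not the 
actual patch):

{code}
// Sketch of the reordering: verify the block state before touching the lease
// manager, so a thrown exception cannot leave the lease manager and the inode
// out of sync. Types and method names are simplified for illustration.
class LeaseCloseSketch {
  interface LeaseManager { void removeLease(String holder, String src); }
  interface FileUnderConstruction {
    void assertAllBlocksComplete();   // throws if any block is not COMPLETE
    void convertToClosedFile();
    String getClientName();
  }

  void finalizeFile(LeaseManager leaseManager, FileUnderConstruction file,
                    String src) {
    file.assertAllBlocksComplete();                       // may throw; lease intact
    leaseManager.removeLease(file.getClientName(), src);  // only when blocks OK
    file.convertToClosedFile();                           // close the inode last
  }
}
{code}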



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10782) Decrease memory frequent exchange of Centralized Cache Management when run balancer

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435117#comment-15435117
 ] 

Hadoop QA commented on HDFS-10782:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
14s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} branch-2 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
38s{color} | {color:green} branch-2 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 25s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 148 unchanged - 0 fixed = 151 total (was 148) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 64m  
6s{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_111. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}155m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_101 Failed junit tests | 
hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:b59b8b7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825264/HDFS-10782-branch-2.001.patch
 |
| JIRA Issue | HDFS-10782 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 9d1f95f94b23 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 

[jira] [Updated] (HDFS-9696) Garbage snapshot records lingering forever

2016-08-24 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9696:
-
Attachment: HDFS-9696.branch-2.6.patch

I think it is worth having in branch-2.6; we would take it if we were still on 
2.6. Attaching a patch for 2.6.

> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-9696.branch-2.6.patch, HDFS-9696.patch, 
> HDFS-9696.v2.patch
>
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS now does not have any snapshots, but I see filediff records 
> persisted in its fsimage. Since the namenode has been restarted many times 
> and checkpointed over 100 times since then, the records must have been 
> persisted and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode

2016-08-24 Thread Xinwei Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435092#comment-15435092
 ] 

Xinwei Qin  commented on HDFS-7859:
---

Attaching a new patch to fix the only TestOfflineEditsViewer failure. The 
checkstyle and findbugs results are not related to this issue.

> Erasure Coding: Persist erasure coding policies in NameNode
> ---
>
> Key: HDFS-7859
> URL: https://issues.apache.org/jira/browse/HDFS-7859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Xinwei Qin 
>  Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do
> Attachments: HDFS-7859-HDFS-7285.002.patch, 
> HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, 
> HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, 
> HDFS-7859.005.patch, HDFS-7859.006.patch, HDFS-7859.007.patch, 
> HDFS-7859.008.patch
>
>
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we 
> persist EC schemas in NameNode centrally and reliably, so that EC zones can 
> reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode

2016-08-24 Thread Xinwei Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinwei Qin  updated HDFS-7859:
--
Attachment: HDFS-7859.008.patch

> Erasure Coding: Persist erasure coding policies in NameNode
> ---
>
> Key: HDFS-7859
> URL: https://issues.apache.org/jira/browse/HDFS-7859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Xinwei Qin 
>  Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do
> Attachments: HDFS-7859-HDFS-7285.002.patch, 
> HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, 
> HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, 
> HDFS-7859.005.patch, HDFS-7859.006.patch, HDFS-7859.007.patch, 
> HDFS-7859.008.patch
>
>
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we 
> persist EC schemas in NameNode centrally and reliably, so that EC zones can 
> reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8093) BP does not exist or is not under Constructionnull

2016-08-24 Thread Max Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435064#comment-15435064
 ] 

Max Schmidt commented on HDFS-8093:
---

I am still facing this issue on my namenode (it just happened once while 
creating a file with a Java client). From my namenode.log:

{code}
java.io.IOException: BP-1876130894-10.5.0.4-1469019082320:blk_1073787208_63449 
does not exist or is not under Constructionnull
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6238)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6305)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:804)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
{code}

I am using Hadoop 2.7.1 with the corresponding Java libraries.

> BP does not exist or is not under Constructionnull
> --
>
> Key: HDFS-8093
> URL: https://issues.apache.org/jira/browse/HDFS-8093
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.6.0
> Environment: Centos 6.5
>Reporter: LINTE
>
> The HDFS balancer ran for several hours balancing blocks between datanodes, 
> then ended by failing with the following error.
> The getStoredBlock function returned a null BlockInfo.
> java.io.IOException: Bad response ERROR for block 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 from 
> datanode 192.168.0.18:1004
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:897)
> 15/04/08 05:52:51 WARN hdfs.DFSClient: Error Recovery for block 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 in pipeline 
> 192.168.0.63:1004, 192.168.0.1:1004, 192.168.0.18:1004: bad datanode 
> 192.168.0.18:1004
> 15/04/08 05:52:51 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
> BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not 
> exist or is not under Constructionnull
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:717)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:931)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> at org.apache.hadoop.ipc.Client.call(Client.java:1468)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy11.updateBlockForPipeline(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:877)
> at 

[jira] [Commented] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy

2016-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435007#comment-15435007
 ] 

Hudson commented on HDFS-8905:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10335 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10335/])
HDFS-8905. Refactor DFSInputStream#ReaderStrategy. Contributed by Kai 
(kai.zheng: rev 793447f79924c97c2b562d5e41fa85adf19673fe)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsDataInputStream.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/IOUtilsClient.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ReadStatistics.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedInputStream.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ReaderStrategy.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestExternalBlockReader.java


> Refactor DFSInputStream#ReaderStrategy
> --
>
> Key: HDFS-8905
> URL: https://issues.apache.org/jira/browse/HDFS-8905
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Kai Zheng
>Assignee: SammiChen
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-8905-HDFS-7285-v1.patch, HDFS-8905-v10.patch, 
> HDFS-8905-v11.patch, HDFS-8905-v12.patch, HDFS-8905-v13.patch, 
> HDFS-8905-v14.patch, HDFS-8905-v15.patch, HDFS-8905-v2.patch, 
> HDFS-8905-v3.patch, HDFS-8905-v4.patch, HDFS-8905-v5.patch, 
> HDFS-8905-v6.patch, HDFS-8905-v7.patch, HDFS-8905-v8.patch, HDFS-8905-v9.patch
>
>
> The DFSInputStream#ReaderStrategy family doesn't look very good. This refactors it 
> a little to make it clearer.
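A simplified sketch of the strategy idea (illustrative names only, not the exact ReaderStrategy API added by this commit): one read loop can fill either a byte[] or a ByteBuffer by delegating the copy step to a small strategy object.

{code}
import java.nio.ByteBuffer;

// Illustrative sketch only; the real ReaderStrategy/ReadStatistics classes differ.
// The point is that the same read path can serve both target buffer types.
interface TargetBuffer {
  void put(byte[] src, int offset, int length); // copy bytes into the caller's buffer
}

class ByteArrayTarget implements TargetBuffer {
  private final byte[] buf;
  private int pos;
  ByteArrayTarget(byte[] buf, int pos) { this.buf = buf; this.pos = pos; }
  public void put(byte[] src, int offset, int length) {
    System.arraycopy(src, offset, buf, pos, length);
    pos += length;
  }
}

class ByteBufferTarget implements TargetBuffer {
  private final ByteBuffer buf;
  ByteBufferTarget(ByteBuffer buf) { this.buf = buf; }
  public void put(byte[] src, int offset, int length) {
    buf.put(src, offset, length);
  }
}
{code}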



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10762) Pass IIP for file status related methods

2016-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434984#comment-15434984
 ] 

Hudson commented on HDFS-10762:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10334 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10334/])
HDFS-10762. Pass IIP for file status related methods (daryn: rev 
ec252ce0fc0998ce13f31af3440c08a236328e5a)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirAppendOp.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodesInPath.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReservedRawPaths.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirStatAndListingOp.java


> Pass IIP for file status related methods
> 
>
> Key: HDFS-10762
> URL: https://issues.apache.org/jira/browse/HDFS-10762
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-10762.1.patch, HDFS-10762.patch
>
>
> The frequently called file status methods will not require path re-resolution 
> if the IIP is passed down the call stack. The code can be simplified further 
> if the IIP tracks whether the original path was a reserved raw path.
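A minimal, self-contained sketch of the pattern (illustrative names only, not the actual FSDirStatAndListingOp signatures): resolve the path to an IIP once at the entry point and pass it to the status helpers instead of re-resolving the string path.

{code}
// Illustrative sketch only; names and types are simplified, not the actual HDFS classes.
class IipPassingSketch {
  static class INodesInPath { } // stands in for the resolved inode chain
  static class FileStatus { }   // stands in for HdfsFileStatus

  // Expensive: walks the namespace from the root to resolve the string path.
  static INodesInPath resolvePath(String src) {
    return new INodesInPath();
  }

  // Entry point: resolve once ...
  static FileStatus getFileInfo(String src) {
    INodesInPath iip = resolvePath(src);
    return getFileInfo(iip);
  }

  // ... and pass the IIP down the call stack so callees never re-resolve.
  static FileStatus getFileInfo(INodesInPath iip) {
    return new FileStatus();
  }
}
{code}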



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy

2016-08-24 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8905:

  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 3.0.0-alpha1
Target Version/s:   (was: )
  Status: Resolved  (was: Patch Available)

Committed to the 3.0.0 and trunk branches. Thanks [~Sammi] for the contribution, and 
[~zhz] and [~rakeshr] for the review!

> Refactor DFSInputStream#ReaderStrategy
> --
>
> Key: HDFS-8905
> URL: https://issues.apache.org/jira/browse/HDFS-8905
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Kai Zheng
>Assignee: SammiChen
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-8905-HDFS-7285-v1.patch, HDFS-8905-v10.patch, 
> HDFS-8905-v11.patch, HDFS-8905-v12.patch, HDFS-8905-v13.patch, 
> HDFS-8905-v14.patch, HDFS-8905-v15.patch, HDFS-8905-v2.patch, 
> HDFS-8905-v3.patch, HDFS-8905-v4.patch, HDFS-8905-v5.patch, 
> HDFS-8905-v6.patch, HDFS-8905-v7.patch, HDFS-8905-v8.patch, HDFS-8905-v9.patch
>
>
> The DFSInputStream#ReaderStrategy family doesn't look very good. This refactors it 
> a little to make it clearer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10762) Pass IIP for file status related methods

2016-08-24 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-10762:
---
   Resolution: Fixed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk & branch-2.

> Pass IIP for file status related methods
> 
>
> Key: HDFS-10762
> URL: https://issues.apache.org/jira/browse/HDFS-10762
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HDFS-10762.1.patch, HDFS-10762.patch
>
>
> The frequently called file status methods will not require path re-resolution 
> if the IIP is passed down the call stack. The code can be simplified further 
> if the IIP tracks whether the original path was a reserved raw path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10782) Decrease memory frequent exchange of Centralized Cache Management when run balancer

2016-08-24 Thread He Xiaoqiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HDFS-10782:
---
Status: Patch Available  (was: Open)

Re-submitted the patch.

> Decrease memory frequent exchange of Centralized Cache Management when run 
> balancer
> ---
>
> Key: HDFS-10782
> URL: https://issues.apache.org/jira/browse/HDFS-10782
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, caching
>Affects Versions: 2.7.1
>Reporter: He Xiaoqiao
>  Labels: patch
> Attachments: HDFS-10782-branch-2.001.patch
>
>
> CachedBlocks are currently transparent to the Balancer when the centralized cache 
> management feature is active. This makes DataNodes cache and uncache memory 
> frequently, because the Balancer does not distinguish CachedBlocks from ordinary 
> blocks, so it may trigger a large number of cache/uncache operations.
> I think the NameNode should avoid returning CachedBlocks as much as possible from 
> Balancer#getBlocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10782) Decrease memory frequent exchange of Centralized Cache Management when run balancer

2016-08-24 Thread He Xiaoqiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HDFS-10782:
---
Status: Open  (was: Patch Available)

> Decrease memory frequent exchange of Centralized Cache Management when run 
> balancer
> ---
>
> Key: HDFS-10782
> URL: https://issues.apache.org/jira/browse/HDFS-10782
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, caching
>Affects Versions: 2.7.1
>Reporter: He Xiaoqiao
>  Labels: patch
> Attachments: HDFS-10782-branch-2.001.patch
>
>
> CachedBlocks are currently transparent to the Balancer when the centralized cache 
> management feature is active. This makes DataNodes cache and uncache memory 
> frequently, because the Balancer does not distinguish CachedBlocks from ordinary 
> blocks, so it may trigger a large number of cache/uncache operations.
> I think the NameNode should avoid returning CachedBlocks as much as possible from 
> Balancer#getBlocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10782) Decrease memory frequent exchange of Centralized Cache Management when run balancer

2016-08-24 Thread He Xiaoqiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HDFS-10782:
---
Attachment: (was: HDFS-10782-branch-2.001.patch)

> Decrease memory frequent exchange of Centralized Cache Management when run 
> balancer
> ---
>
> Key: HDFS-10782
> URL: https://issues.apache.org/jira/browse/HDFS-10782
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, caching
>Affects Versions: 2.7.1
>Reporter: He Xiaoqiao
>  Labels: patch
> Attachments: HDFS-10782-branch-2.001.patch
>
>
> CachedBlocks are currently transparent to the Balancer when the centralized cache 
> management feature is active. This makes DataNodes cache and uncache memory 
> frequently, because the Balancer does not distinguish CachedBlocks from ordinary 
> blocks, so it may trigger a large number of cache/uncache operations.
> I think the NameNode should avoid returning CachedBlocks as much as possible from 
> Balancer#getBlocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10782) Decrease memory frequent exchange of Centralized Cache Management when run balancer

2016-08-24 Thread He Xiaoqiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HDFS-10782:
---
Attachment: HDFS-10782-branch-2.001.patch

> Decrease memory frequent exchange of Centralized Cache Management when run 
> balancer
> ---
>
> Key: HDFS-10782
> URL: https://issues.apache.org/jira/browse/HDFS-10782
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, caching
>Affects Versions: 2.7.1
>Reporter: He Xiaoqiao
>  Labels: patch
> Attachments: HDFS-10782-branch-2.001.patch, 
> HDFS-10782-branch-2.001.patch
>
>
> CachedBlocks are currently transparent to the Balancer when the centralized cache 
> management feature is active. This makes DataNodes cache and uncache memory 
> frequently, because the Balancer does not distinguish CachedBlocks from ordinary 
> blocks, so it may trigger a large number of cache/uncache operations.
> I think the NameNode should avoid returning CachedBlocks as much as possible from 
> Balancer#getBlocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10782) Decrease memory frequent exchange of Centralized Cache Management when run balancer

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434756#comment-15434756
 ] 

Hadoop QA commented on HDFS-10782:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} 
| {color:red} HDFS-10782 does not apply to branch-2. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825240/HDFS-10782-branch-2.001.patch
 |
| JIRA Issue | HDFS-10782 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16524/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Decrease memory frequent exchange of Centralized Cache Management when run 
> balancer
> ---
>
> Key: HDFS-10782
> URL: https://issues.apache.org/jira/browse/HDFS-10782
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, caching
>Affects Versions: 2.7.1
>Reporter: He Xiaoqiao
>  Labels: patch
> Attachments: HDFS-10782-branch-2.001.patch
>
>
> CachedBlocks are currently transparent to the Balancer when the centralized cache 
> management feature is active. This makes DataNodes cache and uncache memory 
> frequently, because the Balancer does not distinguish CachedBlocks from ordinary 
> blocks, so it may trigger a large number of cache/uncache operations.
> I think the NameNode should avoid returning CachedBlocks as much as possible from 
> Balancer#getBlocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10782) Decrease memory frequent exchange of Centralized Cache Management when run balancer

2016-08-24 Thread He Xiaoqiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HDFS-10782:
---
Labels: patch  (was: )
Status: Patch Available  (was: Open)

The NN returns uncached blocks for #getBlocks first; if totalSize does not meet the 
requirement, it then adds some cached blocks.
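A minimal sketch of that selection order (illustrative types, not the actual NameNode getBlocks code): prefer uncached blocks and only fall back to cached blocks once the requested size has not yet been reached.

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the real NameNode getBlocks implementation differs.
class GetBlocksSketch {
  static class Block {
    final long numBytes;
    final boolean cached;
    Block(long numBytes, boolean cached) { this.numBytes = numBytes; this.cached = cached; }
  }

  // Prefer uncached blocks; add cached blocks only if totalSize is still below the target.
  static List<Block> selectBlocks(List<Block> candidates, long targetSize) {
    List<Block> picked = new ArrayList<>();
    long totalSize = 0;
    for (Block b : candidates) {            // first pass: uncached blocks only
      if (!b.cached && totalSize < targetSize) {
        picked.add(b);
        totalSize += b.numBytes;
      }
    }
    for (Block b : candidates) {            // second pass: top up with cached blocks
      if (b.cached && totalSize < targetSize) {
        picked.add(b);
        totalSize += b.numBytes;
      }
    }
    return picked;
  }
}
{code}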

> Decrease memory frequent exchange of Centralized Cache Management when run 
> balancer
> ---
>
> Key: HDFS-10782
> URL: https://issues.apache.org/jira/browse/HDFS-10782
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, caching
>Affects Versions: 2.7.1
>Reporter: He Xiaoqiao
>  Labels: patch
> Attachments: HDFS-10782-branch-2.001.patch
>
>
> CachedBlocks are currently transparent to the Balancer when the centralized cache 
> management feature is active. This makes DataNodes cache and uncache memory 
> frequently, because the Balancer does not distinguish CachedBlocks from ordinary 
> blocks, so it may trigger a large number of cache/uncache operations.
> I think the NameNode should avoid returning CachedBlocks as much as possible from 
> Balancer#getBlocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434723#comment-15434723
 ] 

Hadoop QA commented on HDFS-8905:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
7s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} hadoop-hdfs-project: The patch generated 0 new + 130 
unchanged - 15 fixed = 130 total (was 145) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 59m 
42s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825237/HDFS-8905-v15.patch |
| JIRA Issue | HDFS-8905 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 00d949c0c313 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 092b4d5 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16523/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client 
hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16523/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Refactor DFSInputStream#ReaderStrategy
> --
>
>  

[jira] [Commented] (HDFS-9337) Should check required params in WebHDFS to avoid NPE

2016-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434703#comment-15434703
 ] 

Hadoop QA commented on HDFS-9337:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 11s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.TestFileCreationDelete |
|   | hadoop.hdfs.qjournal.client.TestQuorumJournalManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12772186/HDFS-9337_03.patch |
| JIRA Issue | HDFS-9337 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 1e92bec30e3d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 092b4d5 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16522/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16522/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16522/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Should check required params in WebHDFS to avoid NPE
> 
>
> Key: HDFS-9337
> URL: 

[jira] [Updated] (HDFS-10782) Decrease memory frequent exchange of Centralized Cache Management when run balancer

2016-08-24 Thread He Xiaoqiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HDFS-10782:
---
Attachment: HDFS-10782-branch-2.001.patch

Submitted a patch for branch-2.7.

> Decrease memory frequent exchange of Centralized Cache Management when run 
> balancer
> ---
>
> Key: HDFS-10782
> URL: https://issues.apache.org/jira/browse/HDFS-10782
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, caching
>Affects Versions: 2.7.1
>Reporter: He Xiaoqiao
> Attachments: HDFS-10782-branch-2.001.patch
>
>
> CachedBlocks are currently transparent to the Balancer when the centralized cache 
> management feature is active. This makes DataNodes cache and uncache memory 
> frequently, because the Balancer does not distinguish CachedBlocks from ordinary 
> blocks, so it may trigger a large number of cache/uncache operations.
> I think the NameNode should avoid returning CachedBlocks as much as possible from 
> Balancer#getBlocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy

2016-08-24 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-8905:

Attachment: HDFS-8905-v15.patch

Rebased against the latest trunk code

> Refactor DFSInputStream#ReaderStrategy
> --
>
> Key: HDFS-8905
> URL: https://issues.apache.org/jira/browse/HDFS-8905
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Kai Zheng
>Assignee: SammiChen
> Attachments: HDFS-8905-HDFS-7285-v1.patch, HDFS-8905-v10.patch, 
> HDFS-8905-v11.patch, HDFS-8905-v12.patch, HDFS-8905-v13.patch, 
> HDFS-8905-v14.patch, HDFS-8905-v15.patch, HDFS-8905-v2.patch, 
> HDFS-8905-v3.patch, HDFS-8905-v4.patch, HDFS-8905-v5.patch, 
> HDFS-8905-v6.patch, HDFS-8905-v7.patch, HDFS-8905-v8.patch, HDFS-8905-v9.patch
>
>
> The DFSInputStream#ReaderStrategy family doesn't look very good. This refactors it 
> a little to make it clearer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9337) Should check required params in WebHDFS to avoid NPE

2016-08-24 Thread Jagadesh Kiran N (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434558#comment-15434558
 ] 

Jagadesh Kiran N commented on HDFS-9337:


[~walter.k.su] & [~vinayrpet], sorry for the delay; I was busy with other work. I 
will update the patch soon.

> Should check required params in WebHDFS to avoid NPE
> 
>
> Key: HDFS-9337
> URL: https://issues.apache.org/jira/browse/HDFS-9337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jagadesh Kiran N
>Assignee: Jagadesh Kiran N
> Attachments: HDFS-9337_00.patch, HDFS-9337_01.patch, 
> HDFS-9337_02.patch, HDFS-9337_03.patch
>
>
> {code}
>  curl -i -X PUT 
> "http://10.19.92.127:50070/webhdfs/v1/kiran/sreenu?op=RENAMESNAPSHOT=SNAPSHOTNAME;
> {code}
> A null pointer exception will be thrown:
> {code}
> {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException","message":null}}
> {code}
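A minimal sketch of the kind of guard that avoids the NPE (hypothetical helper, not the actual WebHDFS handler code): validate required query parameters up front and fail with a clear IllegalArgumentException instead of letting a null value propagate.

{code}
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch only -- not the actual WebHDFS implementation. Required
// parameters are checked before the operation runs, so a missing one produces a
// descriptive error instead of a NullPointerException deep in the handler.
class RequiredParamSketch {

  static String requireParam(Map<String, String> query, String name, String op) {
    String value = query.get(name);
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException(
          "Required parameter '" + name + "' is missing for op=" + op);
    }
    return value;
  }

  public static void main(String[] args) {
    // Example: a rename-snapshot style request that supplies only the new name
    // (parameter names here are assumptions for illustration).
    Map<String, String> query = Collections.singletonMap("snapshotname", "SNAPSHOTNAME");
    try {
      requireParam(query, "oldsnapshotname", "RENAMESNAPSHOT");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // clear message instead of an NPE
    }
  }
}
{code}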



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


