[jira] [Updated] (HDFS-9355) Add client API to get Datanode list based on storage type

2015-12-02 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9355:
-
Description: 
HDFS-2576 gives an option to hint the namenode about favored nodes, but in a 
heterogeneous cluster this will not work out. Suppose a client wants to write its 
data into a directory with the COLD policy, but it does not know which DNs have 
ARCHIVE storage, so it will not be able to give a favoredNodes list. 


  was:
Through this feature the client can suggest that HDFS write all of its blocks 
on the same set of datanodes. Currently this can be achieved through HDFS-2576. 
HDFS-2576 gives an option to hint the namenode about favored nodes, but in a 
heterogeneous cluster this will not work out. Suppose a client wants to write its 
data into a directory with the COLD policy, but it does not know which DNs have 
ARCHIVE storage, so it will not be able to give a favoredNodes list. 


*Implementation*

Colocation can be enabled by setting "dfs.colocation.enable" to true in the client 
configuration. If colocation is enabled and the favoredNodes list is empty, then 
{{DataStreamer}} will set the datanodes chosen for the first block as favoredNodes, 
and subsequent blocks will use the same datanodes for the write. Before closing the 
file, the client can get the favoredNodes list and reuse it when writing a new 
file.
 

 Issue Type: Improvement  (was: New Feature)
Summary: Add client API to get Datanode list based on storage type  
(was: Support colocation in HDFS.)

Updated the jira summary and description based on [~nijel]'s suggestion.
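
To make the proposal concrete, here is a minimal sketch of what such a client API could look like. Every name and signature below is an assumption for illustration, not a committed interface:

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.List;

// Hypothetical client-side API: ask which datanodes carry a given storage type,
// then pass the answer to create() as the favoredNodes hint.
interface StorageTypeAwareClient {
  enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }

  /** Return datanodes that report at least one volume of the given storage type. */
  List<InetSocketAddress> getDatanodesWithStorageType(StorageType type) throws IOException;
}
{code}

With something along these lines, a client writing into a COLD directory could first request the ARCHIVE datanodes and supply them as its favoredNodes list.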

> Add client API to get Datanode list based on storage type
> -
>
> Key: HDFS-9355
> URL: https://issues.apache.org/jira/browse/HDFS-9355
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>
> HDFS-2576 gives an option to hint the namenode about favored nodes, but in a 
> heterogeneous cluster this will not work out. Suppose a client wants to write 
> its data into a directory with the COLD policy, but it does not know which DNs 
> have ARCHIVE storage, so it will not be able to give a favoredNodes list. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-8791:
-
Priority: Blocker  (was: Critical)

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks which would mean single digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.
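
For readers unfamiliar with the layout, here is a rough sketch of how a block-ID-based layout derives the leaf directory from the block ID; the bit choices and names are illustrative assumptions, not the exact HDFS code:

{code:java}
// Two bytes of the block ID pick one of 256 x 256 subdirectories, which is where
// the ~64K leaf directories come from; a 32x32 variant simply masks fewer bits
// per level (0x1F instead of 0xFF), cutting the directory-block count drastically.
class BlockIdLayoutSketch {
  static String leafDirFor(long blockId, int mask) {
    int d1 = (int) ((blockId >> 16) & mask);
    int d2 = (int) ((blockId >> 8) & mask);
    return "subdir" + d1 + "/subdir" + d2;
  }

  public static void main(String[] args) {
    // Same block, current fan-out vs. the proposed smaller fan-out.
    System.out.println(leafDirFor(0x12345678L, 0xFF)); // 256x256 layout
    System.out.println(leafDirFor(0x12345678L, 0x1F)); // 32x32 layout
  }
}
{code}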



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9486) Valgrind failures when using more than 1 io_service worker thread.

2015-12-02 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9486:
--
Attachment: HDFS-9486-stacks-sanitized.txt

Attached a set of stacks to give a snapshot of what things look like right 
before the invalid read.  This was done with 5 asio worker threads and 128 
threads doing small reads (12 byte file).

This only happens during disconnect.  I think it's likely things getting 
destroyed in the wrong order in HadoopFileSystem's destructor (happened before 
and looked similar) or an object explicitly deleting a pointer that also 
happens to be held by a member smart_ptr in some other object.

It seems to be very timing dependent, at least on my machine.  It usually shows 
up the first time I run valgrind with a cold FS cache and then doesn't appear 
in subsequent runs.

> Valgrind failures when using more than 1 io_service worker thread.
> --
>
> Key: HDFS-9486
> URL: https://issues.apache.org/jira/browse/HDFS-9486
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-9486-stacks-sanitized.txt
>
>
> Valgrind catches an invalid read of size 8.  Setup: 4 io_service worker 
> threads, 64 threads doing open-read-close on a small file.
> Stack:
> ==8351== Invalid read of size 8
> ==8351==at 0x51F45C: 
> asio::detail::reactive_socket_recv_op asio::detail::read_op asio::stream_socket_service >, asio::mutable_buffers_1, 
> asio::detail::transfer_all_t, std::_Bind asio::stream_socket_service > >::*)(std::error_code const&, 
> unsigned long)> 
> (hdfs::RpcConnectionImpl asio::stream_socket_service > >*, std::_Placeholder<1>, 
> std::_Placeholder<2>)> > >::do_complete(asio::detail::task_io_service*, 
> asio::detail::task_io_service_operation*, std::error_code const&, unsigned 
> long) (functional:601)
> ==8351==by 0x508B10: hdfs::IoServiceImpl::Run() 
> (task_io_service_operation.hpp:37)
> ==8351==by 0x55BCBEF: ??? (in 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)
> ==8351==by 0x5A2D181: start_thread (pthread_create.c:312)
> ==8351==by 0x5D3D47C: clone (clone.S:111)
> ==8351==  Address 0x67e3eb0 is 0 bytes inside a block of size 216 free'd
> ==8351==at 0x4C2C2BC: operator delete(void*) (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==8351==by 0x51F7B2: 
> hdfs::RpcConnectionImpl asio::stream_socket_service > >::~RpcConnectionImpl() 
> (rpc_connection.h:32)
> ==8351==by 0x50C104: hdfs::FileSystemImpl::~FileSystemImpl() 
> (unique_ptr.h:67)
> ==8351==by 0x503A10: hdfs::HadoopFileSystem::~HadoopFileSystem() 
> (unique_ptr.h:67)
> ==8351==by 0x503B28: hdfs::HadoopFileSystem::~HadoopFileSystem() 
> (hdfs_cpp.cc:140)
> ==8351==by 0x503580: hdfs_internal::~hdfs_internal() (unique_ptr.h:67)
> ==8351==by 0x502FEE: hdfsDisconnect (hdfs.cc:127)
> ==8351==by 0x5010B7: main (threaded_stress_test.cc:74)
> ==8351== 
> pure virtual method called
> terminate called without an active exception



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9468) DfsAdmin command set dataXceiver count for datanode

2015-12-02 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035934#comment-15035934
 ] 

Lin Yiqun commented on HDFS-9468:
-

Thanks [~kihwal] for the comments. I changed my code as you suggested and found it 
easy to implement individual commands for specific purposes, but I ran into some 
problems with the generic refresh command. A few questions:
* Should I get all the datanodes in my cluster, so I can refresh the 
configuration one by one?
* If I don't want to collect all the datanode addresses, one option is to refresh 
the configuration via heartbeat, like {{setBalancerBandwidth}}. Could we do it 
like that?

> DfsAdmin command set dataXceiver count for datanode
> ---
>
> Key: HDFS-9468
> URL: https://issues.apache.org/jira/browse/HDFS-9468
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9468.001.patch
>
>
> Now in every datanode the concurrent xceiver count is set by 
> {{DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_DEFAULT}}. If you want to 
> set different values because some nodes have less memory or fewer cores, you 
> must change the config and restart the datanode. Maybe we could set the 
> dataxceiver count dynamically through a dfsadmin command, so the value can be 
> set for one or many specific nodes.
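
For reference, a hedged sketch of how the limit is wired today (the constant names follow {{DFSConfigKeys}} as cited above; treat the exact key name as an assumption). The value is read once from configuration at startup, which is why changing it currently requires a restart:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

class XceiverCountSketch {
  // Resolve the per-datanode xceiver ceiling from configuration.
  static int maxXceiverCount(Configuration conf) {
    return conf.getInt(DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_KEY,
                       DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_DEFAULT);
  }
}
{code}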



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9497) libhdfs++: move lib/proto/cpp_helpers to third-party since it won't have an ASF license

2015-12-02 Thread Bob Hansen (JIRA)
Bob Hansen created HDFS-9497:


 Summary: libhdfs++: move lib/proto/cpp_helpers to third-party 
since it won't have an ASF license
 Key: HDFS-9497
 URL: https://issues.apache.org/jira/browse/HDFS-9497
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Bob Hansen
Assignee: Bob Hansen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9497) libhdfs++: move lib/proto/cpp_helpers to third-party since it won't have an ASF license

2015-12-02 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9497:
-
Status: Patch Available  (was: Open)

> libhdfs++: move lib/proto/cpp_helpers to third-party since it won't have an 
> ASF license
> ---
>
> Key: HDFS-9497
> URL: https://issues.apache.org/jira/browse/HDFS-9497
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9497.HDFS-8707.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035938#comment-15035938
 ] 

Junping Du commented on HDFS-8791:
--

Mark this as blocker per comments from [~jrottinghuis] and [~kihwal].

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks which would mean single digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2015-12-02 Thread Jagadesh Kiran N (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035970#comment-15035970
 ] 

Jagadesh Kiran N commented on HDFS-8986:


No issues, [~ggop], you can assign it to yourself.

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Jagadesh Kiran N
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot related 
> disk usage in the output



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8705) BlockStoragePolicySuite uses equalsIgnoreCase for name lookup, won't work in all locales

2015-12-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035945#comment-15035945
 ] 

Steve Loughran commented on HDFS-8705:
--

If it's not doing the case conversion, then that's a critical defect. Yes, fix 
it.

> BlockStoragePolicySuite uses equalsIgnoreCase for name lookup, won't work in 
> all locales
> 
>
> Key: HDFS-8705
> URL: https://issues.apache.org/jira/browse/HDFS-8705
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Minor
> Attachments: HDFS-8705-002.patch, HDFS-8705.patch
>
>
> Looking at {{BlockStoragePolicySuite.getPolicy(name)}}, it is using 
> {{equalsIgnoreCase()}} to find a policy which matches a name.
> This will not work in all locales. It must use 
> {{toLowerCase(Locale.ENGLISH).equals(name)}}.
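
As an aside, a minimal standalone sketch (not from the patch) of why pinning the locale matters when lowercasing policy names:

{code:java}
import java.util.Locale;

public class LocaleCaseDemo {
  public static void main(String[] args) {
    String name = "POLICY";
    // Default-locale lowercasing is locale-sensitive: under a Turkish locale
    // the letter 'I' maps to dotless 'ı', so the comparison below fails.
    Locale.setDefault(new Locale("tr", "TR"));
    System.out.println(name.toLowerCase().equals("policy"));               // false under tr_TR
    // Pinning the locale keeps the mapping stable regardless of the default.
    System.out.println(name.toLowerCase(Locale.ENGLISH).equals("policy")); // true
  }
}
{code}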



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9497) libhdfs++: move lib/proto/cpp_helpers to third-party since it won't have an ASF license

2015-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036059#comment-15036059
 ] 

Hadoop QA commented on HDFS-9497:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
46s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 38s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 35s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 16s {color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 8s {color} | 
{color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 27s 
{color} | {color:red} Patch generated 427 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 43m 15s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775320/HDFS-9497.HDFS-8707.000.patch
 |
| JIRA Issue | HDFS-9497 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux ccc8591be674 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / d6d056d |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13733/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.8.0_66.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13733/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.7.0_91.txt
 |
| JDK v1.7.0_91  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13733/testReport/ |
| asflicense | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13733/artifact/patchprocess/patch-asflicense-problems.txt
 |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Max memory used | 76MB |
| Powered by | Apache Yetus   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13733/console |



[jira] [Commented] (HDFS-9441) Do not construct path string when choosing block placement targets

2015-12-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035916#comment-15035916
 ] 

Kihwal Lee commented on HDFS-9441:
--

bq. It is actually a common practice. 
It makes sense if the method isn't going to interpret the argument and Object 
is the correct level of abstraction for those methods. If the path argument is 
strictly for logging in block placement policy, it might be an acceptable 
design. But if there is a possibility of an implementation interpreting the 
path to influence block placement, this is not desirable.  If we believe path 
should only be used for logging, we should document it.

bq. It may declare a third method with an Object parameter. This is the same 
solution as the patch!
It is not. The abstract class, {{BlockPlacementPolicy}}, has tighter type 
definitions and using Object will be internal to the implementation in this 
case. And it doesn't even have to use Object internally.  Along with the 
tighter type definition, the interpretation of the argument can be provided in 
the abstract class by adding methods such as
{code:java}
  String getFullPath(String path) {
return path;
  }
  String getFullPath(BlockCollection bc) {
return bc.getName();
  }
{code}
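
To spell out the shape being argued for, a compilable, simplified sketch; the names below are stand-ins for illustration, not the real {{BlockPlacementPolicy}} / {{BlockCollection}} API, which carries many more parameters:

{code:java}
interface BlockCollectionLike {
  String getName();
}

abstract class PlacementPolicySketch {
  // Tighter-typed entry point: the default policy never needs the path string,
  // so nothing pays for building it on the hot path.
  abstract String[] chooseTarget(BlockCollectionLike bc, int numReplicas);

  // Interpretation helpers as in the snippet above: a policy that does want
  // the path (e.g. for logging) resolves it only when asked.
  String getFullPath(String path) { return path; }
  String getFullPath(BlockCollectionLike bc) { return bc.getName(); }
}
{code}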



> Do not construct path string when choosing block placement targets
> --
>
> Key: HDFS-9441
> URL: https://issues.apache.org/jira/browse/HDFS-9441
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h9441_20151118.patch, h9441_20151119.patch
>
>
> - INodeFile.getName() is expensive since it involves quite a few string 
> operations.  The method is called in both ReplicationWork and 
> ErasureCodingWork but the default BlockPlacementPolicy does not use the 
> returned string.  We should simply pass BlockCollection to reduce unnecessary 
> computation when using the default BlockPlacementPolicy.
> - Another improvement: the return type of FSNamesystem.getBlockCollection 
> should be changed to INodeFile since it always returns an INodeFile object.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9497) libhdfs++: move lib/proto/cpp_helpers to third-party since it won't have an ASF license

2015-12-02 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9497:
-
Attachment: HDFS-9497.HDFS-8707.000.patch

Fix: Moved cpp_helper.h to third-party/protobuf and cleaned up references.

> libhdfs++: move lib/proto/cpp_helpers to third-party since it won't have an 
> ASF license
> ---
>
> Key: HDFS-9497
> URL: https://issues.apache.org/jira/browse/HDFS-9497
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9497.HDFS-8707.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9490) MiniDFSCluster should change block generation stamp via FsDatasetTestUtils

2015-12-02 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036109#comment-15036109
 ] 

Tony Wu commented on HDFS-9490:
---

Hi [~eddyxu], 

Thanks a lot for the quick review. I have updated the patch to address your 
comments. Please take a look at the v2 patch.

Regards,
Tony

> MiniDFSCluster should change block generation stamp via FsDatasetTestUtils
> --
>
> Key: HDFS-9490
> URL: https://issues.apache.org/jira/browse/HDFS-9490
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Minor
> Attachments: HDFS-9490.001.patch, HDFS-9490.002.patch
>
>
> {{MiniDFSCluster#changeGenStampOfBlock}} directly manipulates the block meta 
> file to update the generation stamp. This depends on file based {{FsDataset}}.
> We can abstract the change generation stamp operation in 
> {{FsDatasetTestUtils}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9490) MiniDFSCluster should change block generation stamp via FsDatasetTestUtils

2015-12-02 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HDFS-9490:
--
Attachment: HDFS-9490.002.patch

In v2 patch:
* Addressed [~eddyxu]'s review comments by changing 
{{changeStoredGenerationStamp}} to be {{void}}.
* Updated relevant functions and tests for the change.
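
For context, a hedged sketch of the abstraction being added; the method name comes from the comment above, but the parameter list is an assumption:

{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;

interface FsDatasetTestUtilsSketch {
  /**
   * Rewrite the generation stamp recorded for the given block in whatever form
   * the dataset stores it (a meta file for file-based FsDataset implementations),
   * so MiniDFSCluster no longer has to manipulate on-disk files directly.
   */
  void changeStoredGenerationStamp(ExtendedBlock block, long newGenStamp) throws IOException;
}
{code}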

> MiniDFSCluster should change block generation stamp via FsDatasetTestUtils
> --
>
> Key: HDFS-9490
> URL: https://issues.apache.org/jira/browse/HDFS-9490
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Minor
> Attachments: HDFS-9490.001.patch, HDFS-9490.002.patch
>
>
> {{MiniDFSCluster#changeGenStampOfBlock}} directly manipulates the block meta 
> file to update the generation stamp. This depends on file based {{FsDataset}}.
> We can abstract the change generation stamp operation in 
> {{FsDatasetTestUtils}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9495) Data node opens random port for HTTPServer, not configurable

2015-12-02 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9495.
-
Resolution: Duplicate

Hello, [~neha.bathra].  This issue is tracked in HDFS-9049, so I'm resolving 
this one as a duplicate.

> Data node opens random port for HTTPServer, not configurable
> 
>
> Key: HDFS-9495
> URL: https://issues.apache.org/jira/browse/HDFS-9495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: neha
>
> The datanode opens a random port for its HTTP server, which is not currently 
> configurable. It would be better to make it configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9429) Tests in TestDFSAdminWithHA intermittently fail with EOFException

2015-12-02 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036139#comment-15036139
 ] 

Xiao Chen commented on HDFS-9429:
-

Thanks Colin for the review and commit! Also thanks Wei-Chiu and Zhe for the 
review.

> Tests in TestDFSAdminWithHA intermittently fail with EOFException
> -
>
> Key: HDFS-9429
> URL: https://issues.apache.org/jira/browse/HDFS-9429
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Fix For: 2.8.0
>
> Attachments: HDFS-9429.001.patch, HDFS-9429.002.patch, 
> HDFS-9429.003.patch, HDFS-9429.reproduce
>
>
> I have seen this fail a handful of times for {{testMetaSave}}, but from my 
> understanding this is from {{setUpHaCluster}}, so theoretically it could fail 
> for any case in the class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9496) Erasure coding: an erasure codec throughput benchmark tool

2015-12-02 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036155#comment-15036155
 ] 

Zhe Zhang commented on HDFS-9496:
-

Thanks for the suggestion Hui. Pinging [~lirui] for comments.

> Erasure coding: an erasure codec throughput benchmark tool
> --
>
> Key: HDFS-9496
> URL: https://issues.apache.org/jira/browse/HDFS-9496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Reporter: Hui Zheng
>
> We need a tool which can help us decide on and benchmark an erasure codec and 
> schema. Considering that HDFS-8968 has implemented an I/O throughput benchmark 
> tool, maybe we could simply add encode/decode operations to it or implement 
> another tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036168#comment-15036168
 ] 

Chris Trezzo commented on HDFS-8791:


If I have my Twitter hat on, like [~jrottinghuis] said, we have already gained 
the benefits of this patch internally because we have back-ported it to our 
2.6.2 branch. From that perspective, I would be happy if this patch simply made 
it into the next branch-2 minor release.

On the other hand, if I have my community hat on, I am wondering how many 
hadoop users would want this patch and, if that group is large enough, what is 
the best way to get the patch to them on a stable release.

1. How many people would want this patch?: I think this will affect all hadoop 
clusters that have seen over 16 million blocks written to the entire cluster 
over its lifespan and are running ext4. As a reminder, data node startup time 
and potentially IO perf of user level containers will start to degrade before 
this point (as the directory structure grows, the impact becomes greater). I 
would say that most large hadoop users fall into this category. My guess is 
that a non-trivial number of production hadoop clusters for medium size users 
would fall into this category as well. [~andrew.wang] I am sure you would have 
a better sense for how many production clusters this would affect.

2. How do we get this patch out to users on a stable release?: I definitely 
understand the desire to avoid a layout change as part of a maintenance 
release, but I also think it would be nice to have a stable release that users 
could deploy with this patch. Here is one potential solution:
* Since 2.8 is cut but not released, rename the 2.8 branch to 2.9 and continue 
with the release schedule it is currently on.
* Cut a new 2.8 branch off of 2.7.3 and apply this patch to this "new" 2.8.
* Going forward:
** People that are averse to making the layout change can continue doing 
maintenance releases on the 2.7 line. My guess is that this is a small group 
and that the 2.7 branch will essentially die.
** Maintenance releases can continue on the new 2.8 branch as they would have 
for the 2.7 branch. People that were on 2.7 should be able to easily move to 
2.8 because it is essentially a maintenance release plus the new layout.
* I would say that there is no need to back-port the layout change to the 2.6 
branch if we have a stable 2.8 that users can upgrade to.

With this scenario we get a stable release with the new layout (i.e. the new 
2.8 branch) and we avoid making a layout change in a maintenance release. 
Thoughts?

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using 

[jira] [Commented] (HDFS-9219) Even if permission is enabled in an environment, while resolving reserved paths there is no check on permission.

2015-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036189#comment-15036189
 ] 

Hadoop QA commented on HDFS-9219:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 30s 
{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 32s 
{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 32s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 30s 
{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 30s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 33s 
{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 31s 
{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 33s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 32s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 54s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12770038/HDFS-9219.3.patch |
| JIRA Issue | HDFS-9219 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 65e98a0fb2b9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 53e3bf7 |
| findbugs | v3.0.0 |
| 

[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036331#comment-15036331
 ] 

Andrew Wang commented on HDFS-8791:
---

So I won't block anything if the 2.6 or 2.7 RMs really want this included, but 
I do like Chris' proposal about a new 2.8 the most. I think it'd be very 
surprising to 2.6.2 users that 2.6.3 will do an upgrade. Our upstream compat 
guidelines leave this up in the air, but there's some expectation of being able 
to downgrade between maintenance releases, e.g. just swapping JARs back and 
forth.

I've also only seen one or two upgrade issues caused by the 256x256 layout, and 
a good chunk of Cloudera users are on it now. So there's a threshold where this 
kicks in which most Cloudera users aren't hitting. I think that's 
representative of small to medium sized Hadoop users.

Last few questions: for users who are already on the 256x256 layout and are 
affected by this issue, is the upgrade to 32x32 going to be painful again? That 
would also make me very wary of including this in a maintenance release. Does 
the same apply to finalizing the upgrade, since we rm -rf the previous directory? 
These would be good details to have in the release notes.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks which would mean single digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9429) Tests in TestDFSAdminWithHA intermittently fail with EOFException

2015-12-02 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036152#comment-15036152
 ] 

Zhe Zhang commented on HDFS-9429:
-

[~cmccabe] Seems you forgot branch-2.8 commit?

> Tests in TestDFSAdminWithHA intermittently fail with EOFException
> -
>
> Key: HDFS-9429
> URL: https://issues.apache.org/jira/browse/HDFS-9429
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Fix For: 2.8.0
>
> Attachments: HDFS-9429.001.patch, HDFS-9429.002.patch, 
> HDFS-9429.003.patch, HDFS-9429.reproduce
>
>
> I have seen this fail a handful of times for {{testMetaSave}}, but from my 
> understanding this is from {{setUpHaCluster}}, so theoretically it could fail 
> for any case in the class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-1312) Re-balance disks within a Datanode

2015-12-02 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-1312:
---
Attachment: HDFS-9469-HDFS-1312.002.patch

* Fixed the whitespace issue.
* Ignored the checkstyle issues since they are all of the {{this.x = x;}} form, 
where x hides a local name. They come from getters and setters.
* Test failures don't seem to be related to this patch.

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
>Assignee: Anu Engineer
> Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time (see the sketch below).
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.
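
A minimal sketch of the first option above (weighting less-used disks when placing new blocks); this is purely illustrative, not the HDFS volume-choosing implementation:

{code:java}
import java.util.Random;

class FreeSpaceWeightedVolumePicker {
  private final Random random = new Random();

  /** Pick a volume index with probability proportional to its free space. */
  int pickVolume(long[] freeBytesPerVolume) {
    long totalFree = 0;
    for (long free : freeBytesPerVolume) {
      totalFree += free;
    }
    long target = (long) (random.nextDouble() * totalFree);
    long cumulative = 0;
    for (int i = 0; i < freeBytesPerVolume.length; i++) {
      cumulative += freeBytesPerVolume[i];
      if (target < cumulative) {
        return i;
      }
    }
    // Only reached when every volume reports zero free space.
    return freeBytesPerVolume.length - 1;
  }
}
{code}

In a write-heavy cluster this keeps all spindles busy while letting emptier disks catch up over time.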



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-1312) Re-balance disks within a Datanode

2015-12-02 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-1312:
---
Attachment: (was: HDFS-9469-HDFS-1312.002.patch)

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
>Assignee: Anu Engineer
> Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036476#comment-15036476
 ] 

Andrew Wang commented on HDFS-8791:
---

[~wheat9] [~kihwal] are y'all okay with [~ctrezzo]'s proposal? It gets a 
production release out, and avoids the aforementioned issues with non-monotonic 
layout versions, downgrade, and other expectations about maintenance releases 
(which include open questions around upgrade and finalize from [my last 
comment|https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12845740=15036331]).

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks which would mean single digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9448) Enable valgrind for libhdfspp unit tests

2015-12-02 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036601#comment-15036601
 ] 

James Clampffer commented on HDFS-9448:
---

This looks good to me, but I'd like to get a +1 from [~aw] or somebody else who 
knows the docker infrastructure well.  Is checking the skip.valgrind.tests from 
maven sufficient or are modifications to the dockerfile still required?

> Enable valgrind for libhdfspp unit tests
> 
>
> Key: HDFS-9448
> URL: https://issues.apache.org/jira/browse/HDFS-9448
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9448.HDFS-8707.000.patch, 
> HDFS-9448.HDFS-8707.001.patch, HDFS-9448.HDFS-8707.002.patch
>
>
> We should have a target that runs the unit tests under valgrind if it is 
> available on the target machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8831) Trash Support for deletion in HDFS encryption zone

2015-12-02 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036603#comment-15036603
 ] 

Xiaoyu Yao commented on HDFS-8831:
--

Thanks [~arpit99] for the review and the detailed feedback! I will address it 
in the next patch shortly.

> Trash Support for deletion in HDFS encryption zone
> --
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, 
> HDFS-8831.01.patch, HDFS-8831.02.patch
>
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is 
> deleted. If you delete files whinin the zone with trash feature enabled, you 
> will get error similar to the following 
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: 
> /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> With HDFS-8830, we can support "Soft Delete" by adding the .Trash folder of 
> the file being deleted appropriately to the same encryption zone. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036421#comment-15036421
 ] 

Joep Rottinghuis commented on HDFS-8791:


Thanks [~ctrezzo], that seems like a reasonable compromise.
Thanks for the additional data points [~andrew.wang], that gives at least some 
comfort that 2.6.x without the patch isn't completely dead for adoption 
(although still at risk).

Aside from 2.6.2->2.6.3 being a surprising layout upgrade with this patch in 
2.6.3, we would also have to make it clear that you would not be able to go 
from 2.6.3 to 2.7.1 because the layout version would go backwards.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified are quite high, which keeps those blocks hot and much less 
> likely to be evicted.
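
To make the directory arithmetic above concrete, here is a minimal Java sketch of how a block ID-based layout can hash a block ID into two levels of subdirectories. It is illustrative only and does not reproduce the actual DatanodeUtil code; it assumes 0xFF masks for the 256x256 layout and a hypothetical 0x1F mask for a 32x32 variant.

{code}
import java.io.File;

public class BlockDirLayoutSketch {
  /** Map a block ID to its leaf directory under a storage root (sketch). */
  static File idToBlockDir(File root, long blockId, int mask) {
    int d1 = (int) ((blockId >> 16) & mask);   // first-level subdir index
    int d2 = (int) ((blockId >> 8) & mask);    // second-level subdir index
    return new File(root, "subdir" + d1 + File.separator + "subdir" + d2);
  }

  public static void main(String[] args) {
    File root = new File("/data/dfs/dn/current/finalized");
    long blockId = 1073741825L;
    System.out.println(idToBlockDir(root, blockId, 0xFF)); // 256x256: 64K leaf dirs
    System.out.println(idToBlockDir(root, blockId, 0x1F)); // 32x32: 1K leaf dirs
  }
}
{code}

The mask width is what determines whether a volume ends up with 64K leaf directories (and 64K+ directory blocks to keep warm) or roughly 1K, which is the crux of the cold-cache seek behavior described in the report.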



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.

2015-12-02 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9267:

Attachment: HDFS-9267.06.patch

Hi, [~cmccabe] 

Thanks a lot for the reviews and the understanding. 

In the newest patch, I changed {{getStoredReplicas(...)}} to return an 
{{Iterator}} as you suggested.



> TestDiskError should get stored replicas through FsDatasetTestUtils.
> 
>
> Key: HDFS-9267
> URL: https://issues.apache.org/jira/browse/HDFS-9267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-9267.00.patch, HDFS-9267.01.patch, 
> HDFS-9267.02.patch, HDFS-9267.03.patch, HDFS-9267.04.patch, 
> HDFS-9267.05.patch, HDFS-9267.06.patch
>
>
> {{TestDiskError#testReplicationError}} scans local directories to verify 
> blocks and metadata files, which leaks the details of {{FsDataset}} 
> implementation. 
> This JIRA will abstract the "scanning" operation to {{FsDatasetTestUtils}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9417) Clean up the RAT warnings in the HDFS-8707 branch.

2015-12-02 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-9417:
---
Status: Patch Available  (was: Open)

Clicking Submit Patch so that Jenkins fires off a build.

> Clean up the RAT warnings in the HDFS-8707 branch.
> --
>
> Key: HDFS-9417
> URL: https://issues.apache.org/jira/browse/HDFS-9417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9417.HDFS-8707.000.patch, 
> HDFS-9417.HDFS-8707.001.patch
>
>
> Recent Jenkins builds reveal that the pom.xml in the HDFS-8707 branch does 
> not currently exclude third-party files. The RAT plugin generates warnings as 
> these files do not have Apache headers.
> The warnings need to be suppressed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9430) waitforloadingfsimage() can be removed since checknnstartup() already ensures image loaded and namenode started.

2015-12-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036479#comment-15036479
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9430:
---

> My main question was around the order of fsimage loading and RPC server 
> startup. ...

I agree that the RPC server is fine.  We should check the web interface:
- WebHDFS is fine since it uses the RPC server.
- FsckServlet may not be okay.
- We also need to check the HTML and JS files.

> waitforloadingfsimage() can be removed since checknnstartup() already ensures 
> image loaded and namenode started.
> 
>
> Key: HDFS-9430
> URL: https://issues.apache.org/jira/browse/HDFS-9430
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9430.patch
>
>
> After initial startup, loading the fsimage in between happens only for the 
> secondary NN, but it won't serve any client requests.
> So IMO it would be ok to remove it.
> Please check the comment [here 
> |https://issues.apache.org/jira/browse/HDFS-9413?focusedCommentId=15005129=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15005129]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9430) waitforloadingfsimage() can be removed since checknnstartup() already ensures image loaded and namenode started.

2015-12-02 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036391#comment-15036391
 ] 

Ming Ma commented on HDFS-9430:
---

Thanks [~vinayrpet] for the explanation. It makes sense to keep the 
{{FSNamesystem#imageLoaded}} getter/setter around for the quota check during edit log 
replay at NN startup.

My main question was around the order of fsimage loading and RPC server 
startup. In {{NameNode}}'s {{initialize}} function, {{loadNamesystem}} sets 
{{imageLoaded}} to true and it is called before the RPC server is started. Thus, 
when RPC methods are processed, {{imageLoaded}} should already have been set to true.

{noformat}
loadNamesystem(conf);
rpcServer = createRpcServer(conf);
{noformat}

I will wait until the end of the week for [~szetszwo] and [~wheat9] to provide 
any additional comments they might have before committing.

> waitforloadingfsimage() can be removed since checknnstartup() already ensures 
> image loaded and namenode started.
> 
>
> Key: HDFS-9430
> URL: https://issues.apache.org/jira/browse/HDFS-9430
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9430.patch
>
>
> After initial startup,loading fsimage in between happens only for 
> secondarynn..But it wn't server any client requests.
> So IMO it would be ok to remove it.
> Please check commement [here 
> |https://issues.apache.org/jira/browse/HDFS-9413?focusedCommentId=15005129=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15005129]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036442#comment-15036442
 ] 

Haohui Mai commented on HDFS-8791:
--

bq. I've also only seen one or two upgrade issues caused by the 256x256 layout, 
and a good chunk of Cloudera users are on it now. So there's a threshold where 
this kicks in which most Cloudera users aren't hitting. I think that's 
representative of small to medium sized Hadoop users.

I have seen a significant number of Hortonworks customers hit this issue 
during upgrades.

I agree with [~kihwal] and other folks here that 2.6 / 2.7 are effectively 
unusable in some use cases without this fix. IMO the issue is significant 
enough that it needs to be cherry-picked to active maintenance releases. How to 
ensure the upgrade story works and how to document it properly is a second-order 
issue compared to not having a usable release in production clusters.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified are quite high, which keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9417) Clean up the RAT warnings in the HDFS-8707 branch.

2015-12-02 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9417:
-
Attachment: HDFS-9417.HDFS-8707.002.patch

New patch: should take care of everything except cpp_helper.h, which is being 
moved to third_party in HDFS-9497.

> Clean up the RAT warnings in the HDFS-8707 branch.
> --
>
> Key: HDFS-9417
> URL: https://issues.apache.org/jira/browse/HDFS-9417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9417.HDFS-8707.000.patch, 
> HDFS-9417.HDFS-8707.001.patch, HDFS-9417.HDFS-8707.002.patch
>
>
> Recent Jenkins builds reveal that the pom.xml in the HDFS-8707 branch does 
> not currently exclude third-party files. The RAT plugin generates warnings as 
> these files do not have Apache headers.
> The warnings need to be suppressed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects

2015-12-02 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9144:
--
Attachment: HDFS-9144.HDFS-8707.006.patch

I took a stab at rebasing this onto the branch head to get it set for committing.

Two changes not due to rebasing:
- Added an Apache header to one of the CMakeLists files; I had to patch it 
manually, so figured why not.
- Added virtual destructors to DataNodeConnection and 
DataNodeConnectionImpl to get HDFS-9559 to link.

> Refactor libhdfs into stateful/ephemeral objects
> 
>
> Key: HDFS-9144
> URL: https://issues.apache.org/jira/browse/HDFS-9144
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9144.HDFS-8707.001.patch, 
> HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, 
> HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch, 
> HDFS-9144.HDFS-8707.006.patch
>
>
> In discussion for other efforts, we decided that we should separate several 
> concerns:
> * A posix-like FileSystem/FileHandle object (stream-based, positional reads)
> * An ephemeral ReadOperation object that holds the state for 
> reads-in-progress, which consumes
> * An immutable FileInfo object which holds the block map and file size (and 
> other metadata about the file that we assume will not change over the life of 
> the file)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects

2015-12-02 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036541#comment-15036541
 ] 

James Clampffer commented on HDFS-9144:
---

And by HDFS-9559 I meant HDFS-9359.

> Refactor libhdfs into stateful/ephemeral objects
> 
>
> Key: HDFS-9144
> URL: https://issues.apache.org/jira/browse/HDFS-9144
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9144.HDFS-8707.001.patch, 
> HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, 
> HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch, 
> HDFS-9144.HDFS-8707.006.patch
>
>
> In discussion for other efforts, we decided that we should separate several 
> concerns:
> * A posix-like FileSystem/FileHandle object (stream-based, positional reads)
> * An ephemeral ReadOperation object that holds the state for 
> reads-in-progress, which consumes
> * An immutable FileInfo object which holds the block map and file size (and 
> other metadata about the file that we assume will not change over the life of 
> the file)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9417) Clean up the RAT warnings in the HDFS-8707 branch.

2015-12-02 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036407#comment-15036407
 ] 

Bob Hansen commented on HDFS-9417:
--

[~aw] - any idea why this isn't kicking off a Jenkins build?

> Clean up the RAT warnings in the HDFS-8707 branch.
> --
>
> Key: HDFS-9417
> URL: https://issues.apache.org/jira/browse/HDFS-9417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9417.HDFS-8707.000.patch, 
> HDFS-9417.HDFS-8707.001.patch
>
>
> Recent Jenkins builds reveal that the pom.xml in the HDFS-8707 branch does 
> not currently exclude third-party files. The RAT plugin generates warnings as 
> these files do not have Apache headers.
> The warnings need to be suppressed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9430) waitforloadingfsimage() can be removed since checknnstartup() already ensures image loaded and namenode started.

2015-12-02 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036414#comment-15036414
 ] 

Haohui Mai commented on HDFS-9430:
--

It's okay to remove the {{waitForLoadingFsImage()}} function but I think it 
requires more refactoring to get things right.

bq. I think its fair to remove waitForLoadingFsImage() but leave the tracker 
FSNamesystem#imageLoaded and get/setter for it as is.

Today the paradigm is broken. We assume that when replaying edit logs the 
NN ops will call functions (i.e., {{unprotected*}}) that bypass all checks, 
including quotas, etc. This is clearly broken for all callers of 
{{fsn.isImageLoaded()}} and {{fsd.shouldSkipQuotaChecks()}}. I think it's okay 
to file a follow-up jira to address this.
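
For readers following along, a simplified sketch of the pattern under discussion is shown below. The names ({{imageLoaded}}, {{shouldSkipQuotaChecks}}, {{addBlockUnprotected}}) are hypothetical placeholders, not the actual FSNamesystem/FSDirectory code; the point is only that checks such as quota verification are skipped while the fsimage is still being loaded and edit logs are being replayed.

{code}
// Simplified sketch only -- not the real NameNode code.
public class QuotaCheckSketch {
  private volatile boolean imageLoaded = false;   // set once fsimage loading completes

  boolean shouldSkipQuotaChecks() {
    // During edit log replay at startup the image is not yet marked loaded,
    // so quota (and similar) checks are skipped.
    return !imageLoaded;
  }

  void addBlockUnprotected(long consumedBytes, long quotaBytes) {
    if (!shouldSkipQuotaChecks() && consumedBytes > quotaBytes) {
      throw new IllegalStateException("quota exceeded");
    }
    // ... apply the operation ...
  }

  void markImageLoaded() {
    imageLoaded = true;   // normal RPC-driven ops are fully checked from here on
  }
}
{code}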

> waitforloadingfsimage() can be removed since checknnstartup() already ensures 
> image loaded and namenode started.
> 
>
> Key: HDFS-9430
> URL: https://issues.apache.org/jira/browse/HDFS-9430
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9430.patch
>
>
> After initial startup,loading fsimage in between happens only for 
> secondarynn..But it wn't server any client requests.
> So IMO it would be ok to remove it.
> Please check commement [here 
> |https://issues.apache.org/jira/browse/HDFS-9413?focusedCommentId=15005129=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15005129]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9417) Clean up the RAT warnings in the HDFS-8707 branch.

2015-12-02 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036508#comment-15036508
 ] 

Bob Hansen commented on HDFS-9417:
--

Of course.  Thank you.  Is there a format the patch has to follow to 
trigger Jenkins?  I assume attaching a Word doc won't launch a CI build.

> Clean up the RAT warnings in the HDFS-8707 branch.
> --
>
> Key: HDFS-9417
> URL: https://issues.apache.org/jira/browse/HDFS-9417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9417.HDFS-8707.000.patch, 
> HDFS-9417.HDFS-8707.001.patch
>
>
> Recent Jenkins builds reveal that the pom.xml in the HDFS-8707 branch does 
> not currently exclude third-party files. The RAT plugin generates warnings as 
> these files do not have Apache headers.
> The warnings need to be suppressed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9417) Clean up the RAT warnings in the HDFS-8707 branch.

2015-12-02 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen reassigned HDFS-9417:


Assignee: Bob Hansen  (was: Xiaobing Zhou)

> Clean up the RAT warnings in the HDFS-8707 branch.
> --
>
> Key: HDFS-9417
> URL: https://issues.apache.org/jira/browse/HDFS-9417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Bob Hansen
> Attachments: HDFS-9417.HDFS-8707.000.patch, 
> HDFS-9417.HDFS-8707.001.patch, HDFS-9417.HDFS-8707.002.patch
>
>
> Recent Jenkins builds reveal that the pom.xml in the HDFS-8707 branch does 
> not currently exclude third-party files. The RAT plugin generates warnings as 
> these files do not have Apache headers.
> The warnings need to be suppressed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9469) DiskBalancer : Add Planner

2015-12-02 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9469:
---
Attachment: HDFS-9469-HDFS-1312.002.patch

* Fixed the whitespace issue.
* Ignored the checkstyle issues since they are all of the form "this.x = x", where x 
hides a local name. They come from getters and setters.
* Test failures don't seem to be related to this patch.

> DiskBalancer : Add Planner 
> ---
>
> Key: HDFS-9469
> URL: https://issues.apache.org/jira/browse/HDFS-9469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9469-HDFS-1312.001.patch, 
> HDFS-9469-HDFS-1312.002.patch
>
>
> Disk Balancer reads the cluster data and then creates a plan for the data 
> moves based on the snap-shot of the data read from the nodes. This plan is 
> later submitted to data nodes for execution. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9490) MiniDFSCluster should change block generation stamp via FsDatasetTestUtils

2015-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036455#comment-15036455
 ] 

Hadoop QA commented on HDFS-9490:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 27s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 13s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 26s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 170m 47s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.server.datanode.TestBlockReplacement |
|   | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestRecoverStripedFile |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775332/HDFS-9490.002.patch |
| JIRA Issue | HDFS-9490 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit 

[jira] [Commented] (HDFS-9430) waitforloadingfsimage() can be removed since checknnstartup() already ensures image loaded and namenode started.

2015-12-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036457#comment-15036457
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9430:
---

waitForLoadingFSImage() and checkNNStartup() safeguard FSNamesystem and 
NameNodeRpcServer, respectively.  In the NameNode, waitForLoadingFSImage() is not 
needed, as mentioned.

However, FSNamesystem is also used in SecondaryNameNode and BackupNode, and they 
don't have checkNNStartup().  We may forget about BackupNode since it has not been 
working for a long time anyway.  We need to make sure SecondaryNameNode still 
works correctly after the change.

> waitforloadingfsimage() can be removed since checknnstartup() already ensures 
> image loaded and namenode started.
> 
>
> Key: HDFS-9430
> URL: https://issues.apache.org/jira/browse/HDFS-9430
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9430.patch
>
>
> After initial startup, loading the fsimage in between happens only for the 
> secondary NN, but it won't serve any client requests.
> So IMO it would be ok to remove it.
> Please check the comment [here 
> |https://issues.apache.org/jira/browse/HDFS-9413?focusedCommentId=15005129=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15005129]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9441) Do not construct path string when choosing block placement targets

2015-12-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036566#comment-15036566
 ] 

Kihwal Lee commented on HDFS-9441:
--

bq. If the path argument is strictly for logging in block placement policy, it 
might be an acceptable design.
I think we need to answer this first. I wouldn't personally use path for 
placing blocks and existing policies also ignore it, so for all practical 
purposes, the answer might be yes.  But others might have different opinions. 
If the consensus is "yes", your design should be acceptable.
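
As a hedged illustration of the design being weighed here (hypothetical signatures, not the actual BlockPlacementPolicy API), the sketch below passes the block collection itself and defers path-string construction to the rare case where a policy wants it, e.g. for logging:

{code}
// Sketch only: illustrates passing the BlockCollection and building the
// path string lazily, rather than constructing it for every placement call.
interface BlockCollectionView {
  String getName();   // expensive: builds the full path string on demand
}

abstract class PlacementPolicySketch {
  final void chooseTargets(BlockCollectionView bc) {
    // The default policy never touches bc.getName(), so no path string is built.
    doChooseTargets(bc);
  }

  abstract void doChooseTargets(BlockCollectionView bc);

  /** A policy that wants the path for logging pays the cost explicitly. */
  void logChoice(BlockCollectionView bc) {
    System.out.println("placed blocks for " + bc.getName());
  }
}
{code}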

> Do not construct path string when choosing block placement targets
> --
>
> Key: HDFS-9441
> URL: https://issues.apache.org/jira/browse/HDFS-9441
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h9441_20151118.patch, h9441_20151119.patch
>
>
> - INodeFile.getName() is expensive since it involves quite a few string 
> operations.  The method is called in both ReplicationWork and 
> ErasureCodingWork but the default BlockPlacementPolicy does not use the 
> returned string.  We should simply pass BlockCollection to reduce unnecessary 
> computation when using the default BlockPlacementPolicy.
> - Another improvement: the return type of FSNamesystem.getBlockCollection 
> should be changed to INodeFile since it always returns an INodeFile object.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HDFS-9129) Move the safemode block count into BlockManager

2015-12-02 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu reopened HDFS-9129:
-

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Fix For: 3.0.0
>
> Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, 
> HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, 
> HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, 
> HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, 
> HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, 
> HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, 
> HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, 
> HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, 
> HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, 
> HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-12-02 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Status: Patch Available  (was: Reopened)

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Fix For: 3.0.0
>
> Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, 
> HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, 
> HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, 
> HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, 
> HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, 
> HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, 
> HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, 
> HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, 
> HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, 
> HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9441) Do not construct path string when choosing block placement targets

2015-12-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036419#comment-15036419
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9441:
---

[~kihwal], you seem to have some good ideas.  Could you show more details on how 
to add the getFullPath(..) methods?  And what would the interface and the 
implementation look like?

> Do not construct path string when choosing block placement targets
> --
>
> Key: HDFS-9441
> URL: https://issues.apache.org/jira/browse/HDFS-9441
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h9441_20151118.patch, h9441_20151119.patch
>
>
> - INodeFile.getName() is expensive since it involves quite a few string 
> operations.  The method is called in both ReplicationWork and 
> ErasureCodingWork but the default BlockPlacementPolicy does not use the 
> returned string.  We should simply pass BlockCollection to reduce unnecessary 
> computation when using the default BlockPlacementPolicy.
> - Another improvement: the return type of FSNamesystem.getBlockCollection 
> should be changed to INodeFile since it always returns an INodeFile object.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036547#comment-15036547
 ] 

Kihwal Lee commented on HDFS-8791:
--

This is not regarding the release planning, but the patch itself.

We tried a rolling upgrade of a sandbox/test cluster and it didn't go well.  We 
pulled in the layout fix and the hard-linking was taking about 6-9 minutes per 
drive. The following is an example of a 9-minute upgrade. I think it is still the 
cost of scanning the old layout.

{noformat}
2015-12-02 19:10:13,384 INFO common.Storage: Upgrading block pool storage 
directory
 /xxx/current/BP-1586417773-98.139.153.156-1363377856192.
   old LV = -56; old CTime = 1416360571152.
   new LV = -57; new CTime = 1416360571152
2015-12-02 19:19:02,184 INFO common.Storage: HardLinkStats: 64735 Directories, 
including 48966 Empty Directories,
 43842 single Link operations, 1 multi-Link operations, linking 12 files, total 
43854 linkable files.  Also physically copied 0 other files.
{noformat}

At minimum, we need to make the upgrade (storage initialization) parallel as 
suggested by [~cmccabe] before.
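
A rough sketch of that "parallel storage initialization" idea follows. The helper names are hypothetical (this is not the actual DataStorage code); the point is simply to run each volume's scan and hard-link work on its own thread so the wall-clock time is bounded by the slowest drive rather than the sum of all drives.

{code}
import java.io.File;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelUpgradeSketch {
  static void upgradeBlockPool(File volumeDir) {
    // placeholder for the per-volume scan + hard-link work described above
  }

  static void upgradeAll(List<File> volumes) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
    for (File v : volumes) {
      pool.submit(() -> upgradeBlockPool(v));   // one upgrade task per drive
    }
    pool.shutdown();
    // With 6-9 minutes per drive serially, running the drives concurrently
    // bounds the upgrade time by the slowest volume instead of the total.
    pool.awaitTermination(2, TimeUnit.HOURS);
  }
}
{code}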

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified are quite high, which keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9497) libhdfs++: move lib/proto/cpp_helpers to third-party since it won't have an ASF license

2015-12-02 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036561#comment-15036561
 ] 

James Clampffer commented on HDFS-9497:
---

Seems straightforward to me.  +1.

I plan on committing tomorrow morning unless someone has an issue.

> libhdfs++: move lib/proto/cpp_helpers to third-party since it won't have an 
> ASF license
> ---
>
> Key: HDFS-9497
> URL: https://issues.apache.org/jira/browse/HDFS-9497
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9497.HDFS-8707.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.

2015-12-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035454#comment-15035454
 ] 

Colin Patrick McCabe commented on HDFS-9267:


Thanks for working on this, [~eddyxu].  This is a lot more code than I was 
expecting.  Patch 002 is really simple... I wish we could do something that was 
as simple.

Does it make sense to do something like what patch 002 is doing and just change

{code}
- public Collection getStoredReplicas(String bpid) throws IOException {
+ public Iterator getStoredReplicas(String bpid) throws IOException {
{code}
and
{code}
-    return ret;
+    return ret.iterator();
{code}

Since this is part of the test utils, we might not need to optimize it yet.

Thanks again for working on this, and sorry for the sometimes slow reviews.
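
For context, a minimal sketch of the Iterator-returning variant being suggested is shown below. The type and field names ({{StoredReplica}}, {{bpRoot}}) are hypothetical and the real FsDatasetTestUtils implementation may differ; it just illustrates a file-based scan that hands back an iterator instead of a Collection.

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class StoredReplica {
  final File blockFile;
  StoredReplica(File blockFile) { this.blockFile = blockFile; }
}

class FsDatasetTestUtilsSketch {
  private final File bpRoot;   // root of one block pool on one volume
  FsDatasetTestUtilsSketch(File bpRoot) { this.bpRoot = bpRoot; }

  Iterator<StoredReplica> getStoredReplicas(String bpid) {
    List<StoredReplica> ret = new ArrayList<>();
    // Walk the finalized tree and collect block files (blk_* without .meta).
    collect(new File(bpRoot, bpid + "/current/finalized"), ret);
    return ret.iterator();   // same data as before, just exposed as an Iterator
  }

  private void collect(File dir, List<StoredReplica> out) {
    File[] children = dir.listFiles();
    if (children == null) return;
    for (File f : children) {
      if (f.isDirectory()) collect(f, out);
      else if (f.getName().startsWith("blk_") && !f.getName().endsWith(".meta"))
        out.add(new StoredReplica(f));
    }
  }
}
{code}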

> TestDiskError should get stored replicas through FsDatasetTestUtils.
> 
>
> Key: HDFS-9267
> URL: https://issues.apache.org/jira/browse/HDFS-9267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-9267.00.patch, HDFS-9267.01.patch, 
> HDFS-9267.02.patch, HDFS-9267.03.patch, HDFS-9267.04.patch, HDFS-9267.05.patch
>
>
> {{TestDiskError#testReplicationError}} scans local directories to verify 
> blocks and metadata files, which leaks the details of {{FsDataset}} 
> implementation. 
> This JIRA will abstract the "scanning" operation to {{FsDatasetTestUtils}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9496) Erasure coding: an erasure codec throughput benchmark tool

2015-12-02 Thread Hui Zheng (JIRA)
Hui Zheng created HDFS-9496:
---

 Summary: Erasure coding: an erasure codec throughput benchmark tool
 Key: HDFS-9496
 URL: https://issues.apache.org/jira/browse/HDFS-9496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: erasure-coding, test
Reporter: Hui Zheng


We need a tool that can help us decide on and benchmark an erasure codec and schema.
Considering that HDFS-8968 has implemented an I/O throughput benchmark tool, maybe we 
could simply add encode/decode operations to it or implement another tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036701#comment-15036701
 ] 

Chris Trezzo commented on HDFS-8791:


Thanks [~kihwal] for the testing info!
I definitely saw the longer upgrade time going from the 256x256 to the 32x32 layout (a 
more detailed breakdown can be found in the "(-56) to (-57) with high block 
density" section of the [testing 
doc|https://issues.apache.org/jira/secure/attachment/12774454/32x32DatanodeLayoutTesting-v2.pdf]),
 but not quite as long as the hard-linking time that you saw.

[~vinodkv] I also agree we need to make the upgrade path smooth regardless of 
which release this patch goes into.

[~kihwal] How long did it take to scan all the block pools (i.e. was 
hard-linking the majority of the upgrade time)?

Thanks all for the comments!

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified are quite high, which keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036716#comment-15036716
 ] 

Kihwal Lee commented on HDFS-8791:
--

bq.  How long did it take to scan all the block pools (i.e. was hard-linking 
the majority of the upgrade time)?
I don't think hard-linking was the major contributor to the long upgrade time. 
Scanning didn't take too long with the new layout.

{noformat}
INFO impl.FsDatasetImpl:Time taken to scan block pool BP- on 
/xxx/hdfs/data/current: 92ms
...
INFO impl.FsDatasetImpl: Time to add replicas to map for block pool BP- on 
volume /xxx/hdfs/data/current: 2274ms
{noformat}

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified are quite high, which keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9499) Fix typos in DFSAdmin.java

2015-12-02 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-9499:
---

 Summary: Fix typos in DFSAdmin.java
 Key: HDFS-9499
 URL: https://issues.apache.org/jira/browse/HDFS-9499
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 2.8.0
Reporter: Arpit Agarwal


There are multiple instances of 'snapshot' spelled as 'snaphot' in 
DFSAdmin.java and TestSnapshotCommands.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9417) Clean up the RAT warnings in the HDFS-8707 branch.

2015-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036758#comment-15036758
 ] 

Hadoop QA commented on HDFS-9417:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
37s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 39s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 37s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 6s {color} | 
{color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 1s {color} | 
{color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 21s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m 24s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775376/HDFS-9417.HDFS-8707.002.patch
 |
| JIRA Issue | HDFS-9417 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  cc  |
| uname | Linux 552b5b806f22 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / d6d056d |
| unit | 

[jira] [Created] (HDFS-9498) Move BlockManager#numberOfBytesInFutureBlocks to BlockManagerSafeMode

2015-12-02 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-9498:
---

 Summary: Move BlockManager#numberOfBytesInFutureBlocks to 
BlockManagerSafeMode
 Key: HDFS-9498
 URL: https://issues.apache.org/jira/browse/HDFS-9498
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Mingliang Liu
Assignee: Mingliang Liu


[HDFS-4015] counts and reports orphaned blocks {{numberOfBytesInFutureBlocks}} 
in safe mode. It was implemented in {{BlockManager}}. Per the discussion in 
[HDFS-9129], which introduces {{BlockManagerSafeMode}}, we can move the code 
that maintains orphaned blocks into that class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9490) MiniDFSCluster should change block generation stamp via FsDatasetTestUtils

2015-12-02 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036754#comment-15036754
 ] 

Tony Wu commented on HDFS-9490:
---

The failed tests are not related to this patch. Only 
{{TestPendingCorruptDnMessages}} and {{TestNameNodeMetadataConsistency}} use 
the updated {{MiniDFSCluster#changeGenStampOfBlock}}, and both of these tests 
pass fine.

Manually ran the failed tests with JDK 1.8 on OS X and they all pass.

The failed JDK 1.7 tests both suffer from a permission-denied error (might be a 
test system issue):
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testListCacheDirectives:
{{Error while running command to get file permissions : ExitCodeException 
exitCode=127: /bin/ls: error while loading shared libraries: libc.so.6: failed 
to map segment from shared object: Permission denied}}
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testListCacheDirectives:
{{bash: error while loading shared libraries: libdl.so.2: failed to map segment 
from shared object: Permission denied}}

The failed JDK 1.8 test TestSeveralNameNodes is tracked by HDFS-9376.


> MiniDFSCluster should change block generation stamp via FsDatasetTestUtils
> --
>
> Key: HDFS-9490
> URL: https://issues.apache.org/jira/browse/HDFS-9490
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Minor
> Attachments: HDFS-9490.001.patch, HDFS-9490.002.patch
>
>
> {{MiniDFSCluster#changeGenStampOfBlock}} directly manipulates the block meta 
> file to update the generation stamp. This depends on file based {{FsDataset}}.
> We can abstract the change generation stamp operation in 
> {{FsDatasetTestUtils}}.
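
As a rough illustration of the abstraction being proposed, a minimal sketch of 
what such a dataset-agnostic test hook could look like (the interface name, 
method name and signature below are assumptions for illustration, not the 
actual FsDatasetTestUtils API):

{code}
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.ExtendedBlock;

/**
 * Hypothetical sketch: a hook that lets tests change a replica's generation
 * stamp without MiniDFSCluster touching on-disk meta files directly. How the
 * stamp is stored (file rename for the file-based dataset, an in-memory
 * update for others) is left to the FsDataset implementation.
 */
public interface GenStampTestHook {
  void changeStoredGenerationStamp(ExtendedBlock block, long newGenStamp)
      throws IOException;
}
{code}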



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036772#comment-15036772
 ] 

Hadoop QA commented on HDFS-9129:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} 
| {color:red} Docker failed to build yetus/hadoop:5d9212c. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775174/HDFS-9129-branch-2.025.patch
 |
| JIRA Issue | HDFS-9129 |
| Powered by | Apache Yetus   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13739/console |


This message was automatically generated.



> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Fix For: 3.0.0
>
> Attachments: HDFS-9129-branch-2.025.patch, HDFS-9129.000.patch, 
> HDFS-9129.001.patch, HDFS-9129.002.patch, HDFS-9129.003.patch, 
> HDFS-9129.004.patch, HDFS-9129.005.patch, HDFS-9129.006.patch, 
> HDFS-9129.007.patch, HDFS-9129.008.patch, HDFS-9129.009.patch, 
> HDFS-9129.010.patch, HDFS-9129.011.patch, HDFS-9129.012.patch, 
> HDFS-9129.013.patch, HDFS-9129.014.patch, HDFS-9129.015.patch, 
> HDFS-9129.016.patch, HDFS-9129.017.patch, HDFS-9129.018.patch, 
> HDFS-9129.019.patch, HDFS-9129.020.patch, HDFS-9129.021.patch, 
> HDFS-9129.022.patch, HDFS-9129.023.patch, HDFS-9129.024.patch, 
> HDFS-9129.025.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of safe mode. These fields can be moved to the 
> {{BlockManager}} class.
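
For readers unfamiliar with the safe mode bookkeeping being discussed, a 
simplified sketch of the kind of threshold check involved (field and method 
names are illustrative, not the actual BlockManagerSafeMode code; the default 
threshold mirrors dfs.namenode.safemode.threshold-pct):

{code}
// Simplified sketch: safe mode can be left once enough blocks have reached
// minimal replication, relative to the total expected from the image/edits.
class SafeModeBlockTrackerSketch {
  private long blockTotal;                     // blocks expected after startup
  private long blockSafe;                      // blocks at minimal replication
  private final double thresholdPct = 0.999;   // dfs.namenode.safemode.threshold-pct

  void setBlockTotal(long total) { blockTotal = total; }
  void incrementSafeBlockCount() { blockSafe++; }

  boolean areBlockThresholdsMet() {
    long blockThreshold = (long) (blockTotal * thresholdPct);
    return blockSafe >= blockThreshold;
  }
}
{code}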



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1503#comment-1503
 ] 

Vinod Kumar Vavilapalli commented on HDFS-8791:
---

Agree with Kihwal that the release discussion is beside the point.

The question is what will happen to existing users' clusters, whether they are 
based on 2.7.x, 2.8.x or 2.9.x.

Dumbing it down, I think there are (1) users who care about the perf issue and 
can manage the resulting storage layout change and (2) those who won't. Making 
the upgrade (as well as downgrade?) work seamlessly as part of our code pretty 
much seems like a blocker, in order to avoid surprises for unsuspecting users 
who do not need this change.

Forcing a manual step for a 2.6.x / 2.7.x user when he/she upgrades to 2.6.4, 
2.7.3 or 2.8.0 seems like a non-starter to me.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.
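
To make the scale concrete, here is a small sketch of how a block ID can be 
mapped onto the two-level 256x256 subdirectory tree described above (the bit 
arithmetic mirrors the general idea of the block-ID-based layout; treat the 
helper as illustrative rather than the exact DataNode code):

{code}
import java.io.File;

// Illustrative two-level, block-ID-based directory layout. With 8 bits per
// level this yields 256 * 256 = 65,536 leaf directories, which is exactly why
// a cold full-tree scan (du, checkDirs) turns into a seek storm on ext4.
public class BlockIdLayoutSketch {
  static File idToBlockDir(File bpRoot, long blockId) {
    int d1 = (int) ((blockId >> 16) & 0xFF);   // first-level subdir, 0..255
    int d2 = (int) ((blockId >> 8) & 0xFF);    // second-level subdir, 0..255
    return new File(bpRoot, "subdir" + d1 + File.separator + "subdir" + d2);
  }

  public static void main(String[] args) {
    System.out.println(idToBlockDir(
        new File("/data/dn/current/BP-1/finalized"), 1073741825L));
  }
}
{code}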



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036721#comment-15036721
 ] 

Kihwal Lee edited comment on HDFS-8791 at 12/2/15 10:06 PM:


As for making it run parallel, we could do it in 
{{DataStorage#addStorageLocations()}}. We can borrow the code from 
{{FsVolumeList#addBlockPool()}}.
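
A rough sketch of the kind of parallelization being suggested, borrowing the 
thread-per-volume pattern of {{FsVolumeList#addBlockPool()}} (the "StorageDir" 
type and doTransition() call below are placeholders for illustration, not the 
real DataStorage types):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: run the per-storage-directory upgrade/load work in parallel.
class ParallelStorageLoadSketch {
  interface StorageDir { void doTransition(); }   // placeholder per-dir work

  void loadStorageDirs(List<StorageDir> dirs) throws InterruptedException {
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.max(1, Math.min(dirs.size(), 8)));
    List<Future<?>> futures = new ArrayList<>();
    for (StorageDir dir : dirs) {
      futures.add(pool.submit(dir::doTransition));   // one task per directory
    }
    pool.shutdown();
    for (Future<?> f : futures) {
      try {
        f.get();                 // surface failures from each directory
      } catch (ExecutionException e) {
        // log and continue with the remaining directories, as addBlockPool does
      }
    }
  }
}
{code}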


was (Author: kihwal):
As for making it run parallel, we could do it in {{Datanode#initStorage()}} or 
{{DataStorage#recoverTransitionRead()}}. We can borrow the code from 
{{FsVolumeList#addBlockPool()}}.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects

2015-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036774#comment-15036774
 ] 

Hadoop QA commented on HDFS-9144:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} 
| {color:red} HDFS-9144 does not apply to HDFS-8707. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-9144 |
| GITHUB PR | https://github.com/apache/hadoop/pull/43 |
| Powered by | Apache Yetus   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13740/console |


This message was automatically generated.



> Refactor libhdfs into stateful/ephemeral objects
> 
>
> Key: HDFS-9144
> URL: https://issues.apache.org/jira/browse/HDFS-9144
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9144.HDFS-8707.001.patch, 
> HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, 
> HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch, 
> HDFS-9144.HDFS-8707.006.patch
>
>
> In discussion for other efforts, we decided that we should separate several 
> concerns:
> * A posix-like FileSystem/FileHandle object (stream-based, positional reads)
> * An ephemeral ReadOperation object that holds the state for 
> reads-in-progress, which consumes
> * An immutable FileInfo object which holds the block map and file size (and 
> other metadata about the file that we assume will not change over the life of 
> the file)
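
As a language-agnostic sketch of the separation described above (expressed as 
Java-style interfaces purely for readability; the real work is C++ in 
libhdfs++, and all names here are made up for illustration), the three 
concerns might be factored roughly like this:

{code}
import java.nio.ByteBuffer;
import java.util.List;

interface FileInfoSketch {                 // immutable: block map + file size
  long fileSize();
  List<String> blockLocations(long offset, long length);
}

interface ReadOperationSketch {            // ephemeral: state of one in-flight read
  void start(FileInfoSketch info, long offset, int length);
  boolean isComplete();
}

interface FileHandleSketch {               // posix-like stream/positional reads
  int positionRead(ByteBuffer dst, long offset);
  FileInfoSketch fileInfo();
}
{code}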



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9144) Refactor libhdfs into stateful/ephemeral objects

2015-12-02 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036692#comment-15036692
 ] 

Bob Hansen commented on HDFS-9144:
--

Thanks for rebasing those changes, [~James Clampffer].  +1.

It looks like the refactored lifecycle management clears up the segfault in 
test_libhdfs_threaded_hdfspp_test_shim_static.  It was segfaulting fairly 
regularly on my dev machine before applying the current patch, and after 
applying it, I've run through ~40 iterations of the native tests without an 
error.

I can't say for sure that it clears up HDFS-9486, but it definitely makes the 
integration tests more stable on my machine.

> Refactor libhdfs into stateful/ephemeral objects
> 
>
> Key: HDFS-9144
> URL: https://issues.apache.org/jira/browse/HDFS-9144
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9144.HDFS-8707.001.patch, 
> HDFS-9144.HDFS-8707.002.patch, HDFS-9144.HDFS-8707.003.patch, 
> HDFS-9144.HDFS-8707.004.patch, HDFS-9144.HDFS-8707.005.patch, 
> HDFS-9144.HDFS-8707.006.patch
>
>
> In discussion for other efforts, we decided that we should separate several 
> concerns:
> * A posix-like FileSystem/FileHandle object (stream-based, positional reads)
> * An ephemeral ReadOperation object that holds the state for 
> reads-in-progress, which consumes
> * An immutable FileInfo object which holds the block map and file size (and 
> other metadata about the file that we assume will not change over the life of 
> the file)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9441) Do not construct path string when choosing block placement targets

2015-12-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036728#comment-15036728
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9441:
---

In the past, the raid block placement policy by Facebook used the path to 
enforce the storage scheme (use raid or not).  I believe the parameter was 
added at that time.  Since then, no (known) block placement policy has used it 
for block placement.

> Do not construct path string when choosing block placement targets
> --
>
> Key: HDFS-9441
> URL: https://issues.apache.org/jira/browse/HDFS-9441
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h9441_20151118.patch, h9441_20151119.patch
>
>
> - INodeFile.getName() is expensive since it involves quite a few string 
> operations.  The method is called in both ReplicationWork and 
> ErasureCodingWork but the default BlockPlacementPolicy does not use the 
> returned string.  We should simply pass BlockCollection to reduce unnecessary 
> computation when using the default BlockPlacementPolicy.
> - Another improvement: the return type of FSNamesystem.getBlockCollection 
> should be changed to INodeFile since it always returns an INodeFile object.
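
To illustrate the change being proposed, a minimal sketch (the names below are 
approximations for illustration, not the actual BlockPlacementPolicy 
signatures or the exact patch):

{code}
import java.util.List;

// Sketch: pass the block collection itself so the expensive path-string
// construction happens only in policies that actually need the path.
interface PlacementPolicySketch {
  /** Old shape: the caller must compute the source path string eagerly. */
  List<String> chooseTargets(String srcPath, int numReplicas);

  /** New shape: the default policy ignores the path entirely; a path-aware
   *  policy can still call bc.getName() lazily. */
  List<String> chooseTargets(BlockCollectionSketch bc, int numReplicas);

  interface BlockCollectionSketch {
    String getName();   // potentially expensive; call only when required
  }
}
{code}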



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036721#comment-15036721
 ] 

Kihwal Lee commented on HDFS-8791:
--

As for making it run parallel, we could do it in {{Datanode#initStorage()}} or 
{{DataStorage#recoverTransitionRead()}}. We can borrow the code from 
{{FsVolumeList#addBlockPool()}}.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few a hundred seeks which would mean single digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9499) Fix typos in DFSAdmin.java

2015-12-02 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-9499:
---
Assignee: Nicole Pazmany

> Fix typos in DFSAdmin.java
> --
>
> Key: HDFS-9499
> URL: https://issues.apache.org/jira/browse/HDFS-9499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.8.0
>Reporter: Arpit Agarwal
>Assignee: Nicole Pazmany
>
> There are multiple instances of 'snapshot' spelled as 'snaphot' in 
> DFSAdmin.java and TestSnapshotCommands.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9436) Make NNThroughputBenchmark$BlockReportStats run with 10 datanodes by default

2015-12-02 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036932#comment-15036932
 ] 

Konstantin Shvachko commented on HDFS-9436:
---

{{numOpsRequired = 10}} and {{numDatanodes = 10}} imply that each DN sends 
exactly one block report. I'd probably prefer a couple of block reports per DN 
for better testing, but it's up to you.
Otherwise the patch looks good.
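
For readers following along, the arithmetic behind that remark as a tiny 
sketch (the variable names echo the benchmark parameters under discussion, not 
necessarily the exact fields in NNThroughputBenchmark):

{code}
public class ReportsPerDatanode {
  public static void main(String[] args) {
    // Each simulated DN sends numOpsRequired / numDatanodes block reports.
    int numOpsRequired = 10, numDatanodes = 10;
    System.out.println(numOpsRequired / numDatanodes);   // 1 report per DN
    // Raising numOpsRequired to 20 or 30 would give 2-3 reports per DN,
    // i.e. the "couple of block reports per DN" suggested above.
  }
}
{code}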

> Make NNThroughputBenchmark$BlockReportStats run with 10 datanodes by default
> 
>
> Key: HDFS-9436
> URL: https://issues.apache.org/jira/browse/HDFS-9436
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Minor
> Attachments: HDFS-9436.000.patch, HDFS-9436.001.patch
>
>
> This is a follow-up of [HDFS-9379].
> Though for actual benchmarking the defaults are rarely used, it would be good 
> to change the default for {{numThreads}} to a value >= 10, and maybe 
> {{numOpsRequired}} in {{BlockReportStats}} as well, just to make sure the 
> condition in [HDFS-9379] is tested in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.

2015-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036943#comment-15036943
 ] 

Hadoop QA commented on HDFS-9267:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 17s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 19s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 22s 
{color} | {color:red} Patch generated 56 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 167m 50s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
| JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775362/HDFS-9267.06.patch |
| JIRA Issue | HDFS-9267 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  

[jira] [Updated] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2015-12-02 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-8647:

Attachment: (was: HDFS-8647-branch27.patch)

> Abstract BlockManager's rack policy into BlockPlacementPolicy
> -
>
> Key: HDFS-8647
> URL: https://issues.apache.org/jira/browse/HDFS-8647
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, 
> HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch, 
> HDFS-8647-005.patch, HDFS-8647-006.patch, HDFS-8647-007.patch, 
> HDFS-8647-008.patch, HDFS-8647-009.patch
>
>
> Sometimes we want to have the namenode use an alternative block placement 
> policy, such as upgrade domains in HDFS-7541.
> BlockManager has built-in assumptions about the rack policy in functions such 
> as useDelHint and blockHasEnoughRacks. That means when we have a new block 
> placement policy, we need to modify BlockManager to account for it. Ideally 
> BlockManager should ask the BlockPlacementPolicy object instead. That will 
> allow us to provide a new BlockPlacementPolicy without changing BlockManager.
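
A minimal sketch of the direction described above (method names are indicative 
only; the real patch touches BlockPlacementPolicy and BlockManager, but the 
exact methods and signatures may differ):

{code}
import java.util.Collection;

// Sketch: BlockManager delegates rack-policy questions to the placement
// policy instead of hard-coding them.
abstract class BlockPlacementPolicySketch {
  /** Do these replica locations satisfy the policy (e.g. enough racks)? */
  abstract boolean isPlacementSatisfied(Collection<String> replicaRacks,
                                        int requiredReplicas);

  /** For an over-replicated block, which existing replica should be removed? */
  abstract String chooseReplicaToDelete(Collection<String> candidateNodes);
}

class BlockManagerSketch {
  private final BlockPlacementPolicySketch policy;

  BlockManagerSketch(BlockPlacementPolicySketch policy) { this.policy = policy; }

  boolean blockHasEnoughLocations(Collection<String> replicaRacks, int required) {
    // Formerly an in-line assumption about the default rack policy; now delegated.
    return policy.isPlacementSatisfied(replicaRacks, required);
  }
}
{code}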



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2015-12-02 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-8647:

Attachment: (was: HDFS-8647-branch26.patch)

> Abstract BlockManager's rack policy into BlockPlacementPolicy
> -
>
> Key: HDFS-8647
> URL: https://issues.apache.org/jira/browse/HDFS-8647
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, 
> HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch, 
> HDFS-8647-005.patch, HDFS-8647-006.patch, HDFS-8647-007.patch, 
> HDFS-8647-008.patch, HDFS-8647-009.patch
>
>
> Sometimes we want to have the namenode use an alternative block placement 
> policy, such as upgrade domains in HDFS-7541.
> BlockManager has built-in assumptions about the rack policy in functions such 
> as useDelHint and blockHasEnoughRacks. That means when we have a new block 
> placement policy, we need to modify BlockManager to account for it. Ideally 
> BlockManager should ask the BlockPlacementPolicy object instead. That will 
> allow us to provide a new BlockPlacementPolicy without changing BlockManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9469) DiskBalancer : Add Planner

2015-12-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036990#comment-15036990
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9469:
---

I think we need to change the data model to use mean and variance before adding 
the planner.  Otherwise, it will be harder to change later. 

Other comments:
- In DiskBalancerCluster.createOutPutDirectory
-* createOutPutDirectory: P should be in lower case.
-* It seems that throwing new IOException is enough.  We don't need LOG.fatal.

- computePlan: top is "The total number of nodes to process".  Then what is 
nodesToProcess.size()?  Is it supposed that top >= nodesToProcess.size()?

- computePoolSize returns 0 if nodeCount is 9000.  It should not have " % 100 " 
at the end.

- In PlannerFactory.getPlanner,
-* It logs a message per node.  Is it needed?
-* Is the planner supposed to be fixed for a single run?
-* What other planners are we going to support?
-* It should throw an exception instead of returning null at the end.

- We should use LOG.error instead of LOG.fatal below.
{code}
  try {
planList.add(f.get());
  } catch (InterruptedException e) {
LOG.fatal("Compute Node plan was cancelled or interrupted : ", e);
  } catch (ExecutionException e) {
LOG.fatal("Unable to compute plan : ", e);
  }
{code}

- The GreedyPlanner is an algorithm.  The input is a DiskBalancerDataNode.  So 
node should be a parameter of plan(), not a field.

- Use StringUtils.TraditionalBinaryPrefix.long2String(..) instead of adding 
getSizeString.

I did not continue reviewing the computation since it requires the data model 
change.

> DiskBalancer : Add Planner 
> ---
>
> Key: HDFS-9469
> URL: https://issues.apache.org/jira/browse/HDFS-9469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9469-HDFS-1312.001.patch, 
> HDFS-9469-HDFS-1312.002.patch
>
>
> Disk Balancer reads the cluster data and then creates a plan for the data 
> moves based on the snap-shot of the data read from the nodes. This plan is 
> later submitted to data nodes for execution. 
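
For context, a rough sketch of what a computed plan could carry per node (the 
class and field names are illustrative; the actual plan format is whatever the 
DiskBalancer patches define):

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative plan structure: an ordered list of volume-to-volume move steps
// for one datanode, produced from a snapshot of its volume usage.
class NodePlanSketch {
  static class MoveStep {
    final String sourceVolume;
    final String destVolume;
    final long bytesToMove;

    MoveStep(String src, String dst, long bytes) {
      this.sourceVolume = src;
      this.destVolume = dst;
      this.bytesToMove = bytes;
    }
  }

  final String datanode;
  final List<MoveStep> steps = new ArrayList<>();

  NodePlanSketch(String datanode) { this.datanode = datanode; }

  void addStep(String src, String dst, long bytes) {
    steps.add(new MoveStep(src, dst, bytes));
  }
}
{code}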



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9294) DFSClient deadlock when close file and failed to renew lease

2015-12-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036993#comment-15036993
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9294:
---

[~aw], why were there only 49 tests (-4077) executed in the previous build?

> DFSClient  deadlock when close file and failed to renew lease
> -
>
> Key: HDFS-9294
> URL: https://issues.apache.org/jira/browse/HDFS-9294
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.2.0, 2.7.1
> Environment: Hadoop 2.2.0
>Reporter: 邓飞
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: HDFS-9294-002.patch, HDFS-9294-002.patch, 
> HDFS-9294-branch-2.7.patch, HDFS-9294-branch-2.patch, HDFS-9294.patch
>
>
> We found a deadlock in our HBase (0.98) cluster (the Hadoop version is 
> 2.2.0), and it appears to be an HDFS bug; at the time our network was not 
> stable.
>  Below is the stack:
> *
> Found one Java-level deadlock:
> =
> "MemStoreFlusher.1":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   waiting to lock monitor 0x7ff2e67e16a8 (object 0x000486ce6620, a 
> org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "MemStoreFlusher.0"
> "MemStoreFlusher.0":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> Java stack information for the threads listed above:
> ===
> "MemStoreFlusher.1":
>   at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
>   - waiting to lock <0x0002fae5ebe0> (a 
> org.apache.hadoop.hdfs.LeaseRenewer)
>   at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
>   - locked <0x00055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
>   at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
>   at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>   - locked <0x00059869eed8> (a java.lang.Object)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238)
>   at java.lang.Thread.run(Thread.java:744)
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822)
>   - waiting to lock <0x000486ce6620> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780)
>   at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753)
>   at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:453)
>   - locked <0x0002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
>   at 

[jira] [Commented] (HDFS-9294) DFSClient deadlock when close file and failed to renew lease

2015-12-02 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037004#comment-15037004
 ] 

Chris Nauroth commented on HDFS-9294:
-

Hi [~szetszwo].  That relates to the recent split of the client code into a 
separate hadoop-hdfs-client module.  Tests are only executed in the modules 
being changed by a patch.  This patch only changes files in hadoop-hdfs-client, 
not files in hadoop-hdfs.

> DFSClient  deadlock when close file and failed to renew lease
> -
>
> Key: HDFS-9294
> URL: https://issues.apache.org/jira/browse/HDFS-9294
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.2.0, 2.7.1
> Environment: Hadoop 2.2.0
>Reporter: 邓飞
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: HDFS-9294-002.patch, HDFS-9294-002.patch, 
> HDFS-9294-branch-2.7.patch, HDFS-9294-branch-2.patch, HDFS-9294.patch
>
>
> We found a deadlock in our HBase (0.98) cluster (the Hadoop version is 
> 2.2.0), and it appears to be an HDFS bug; at the time our network was not 
> stable.
>  Below is the stack:
> *
> Found one Java-level deadlock:
> =
> "MemStoreFlusher.1":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   waiting to lock monitor 0x7ff2e67e16a8 (object 0x000486ce6620, a 
> org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "MemStoreFlusher.0"
> "MemStoreFlusher.0":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> Java stack information for the threads listed above:
> ===
> "MemStoreFlusher.1":
>   at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
>   - waiting to lock <0x0002fae5ebe0> (a 
> org.apache.hadoop.hdfs.LeaseRenewer)
>   at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
>   - locked <0x00055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
>   at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
>   at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>   - locked <0x00059869eed8> (a java.lang.Object)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238)
>   at java.lang.Thread.run(Thread.java:744)
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822)
>   - waiting to lock <0x000486ce6620> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780)
>   at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753)
>   at 

[jira] [Updated] (HDFS-9436) Make NNThroughputBenchmark$BlockReportStats run with 10 datanodes by default

2015-12-02 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9436:

Attachment: HDFS-9436.002.patch

Thank you very much [~shv] for your review. Sending a couple of block reports 
makes perfect sense to me. The v2 patch addresses this.

> Make NNThroughputBenchmark$BlockReportStats run with 10 datanodes by default
> 
>
> Key: HDFS-9436
> URL: https://issues.apache.org/jira/browse/HDFS-9436
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Minor
> Attachments: HDFS-9436.000.patch, HDFS-9436.001.patch, 
> HDFS-9436.002.patch
>
>
> This is a follow-up of [HDFS-9379].
> Though for actual benchmarking the defaults are rarely used, it would be good 
> to change the default for {{numThreads}} to a value >= 10, and maybe 
> {{numOpsRequired}} in {{BlockReportStats}} as well, just to make sure the 
> condition in [HDFS-9379] is tested in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9294) DFSClient deadlock when close file and failed to renew lease

2015-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036953#comment-15036953
 ] 

Hadoop QA commented on HDFS-9294:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 0s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 29s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775431/HDFS-9294-002.patch |
| JIRA Issue | HDFS-9294 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 09ffb4893e70 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 

[jira] [Updated] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2015-12-02 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-8647:

Attachment: HDFS-8647-branch27.patch

> Abstract BlockManager's rack policy into BlockPlacementPolicy
> -
>
> Key: HDFS-8647
> URL: https://issues.apache.org/jira/browse/HDFS-8647
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, 
> HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch, 
> HDFS-8647-005.patch, HDFS-8647-006.patch, HDFS-8647-007.patch, 
> HDFS-8647-008.patch, HDFS-8647-009.patch, HDFS-8647-branch27.patch
>
>
> Sometimes we want to have the namenode use an alternative block placement 
> policy, such as upgrade domains in HDFS-7541.
> BlockManager has built-in assumptions about the rack policy in functions such 
> as useDelHint and blockHasEnoughRacks. That means when we have a new block 
> placement policy, we need to modify BlockManager to account for it. Ideally 
> BlockManager should ask the BlockPlacementPolicy object instead. That will 
> allow us to provide a new BlockPlacementPolicy without changing BlockManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2015-12-02 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037001#comment-15037001
 ] 

Xiao Chen commented on HDFS-8647:
-

Sorry about flooding the comments here...
Talked with Zhe offline: the cherry-pick was from trunk - sadly I didn't pick 
it from branch-2! (lesson learned: double-check before backporting - the 
comments above also show that Ming did the backport...) So please ignore the 
above comments regarding the cherry-picks. Summary of the branch-2 backport to 
branch-2.7 below:
Conflicts:
- 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
(HDFS-8823 and HDFS-8938, both in branch-2 but not in branch2.7. Several 
BlockInfo related conflicts resolved by using Block, and resolved some 
conflicts regarding HDFS-8938's new methods by not bringing the methods into 
this patch. HDFS-9083 is only in branch 2.6 and 2.7, and removed 
shouldCheckForEnoughRacks related stuff in BlockManager.java, so no change 
needed here.)
- 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
 
(import conflicts)
- 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyRackFaultTolerant.java
 
(not exist)
- 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithUpgradeDomain.java
 
(not exist)
- 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
 
(Import conflicts. Also additional changes due to HDFS-9083 which is only in 
branch 2.6 and 2.7.)
- 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithUpgradeDomain.java
 
(not exist)

> Abstract BlockManager's rack policy into BlockPlacementPolicy
> -
>
> Key: HDFS-8647
> URL: https://issues.apache.org/jira/browse/HDFS-8647
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, 
> HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch, 
> HDFS-8647-005.patch, HDFS-8647-006.patch, HDFS-8647-007.patch, 
> HDFS-8647-008.patch, HDFS-8647-009.patch, HDFS-8647-branch27.patch
>
>
> Sometimes we want to have the namenode use an alternative block placement 
> policy, such as upgrade domains in HDFS-7541.
> BlockManager has built-in assumptions about the rack policy in functions such 
> as useDelHint and blockHasEnoughRacks. That means when we have a new block 
> placement policy, we need to modify BlockManager to account for it. Ideally 
> BlockManager should ask the BlockPlacementPolicy object instead. That will 
> allow us to provide a new BlockPlacementPolicy without changing BlockManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9294) DFSClient deadlock when close file and failed to renew lease

2015-12-02 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037400#comment-15037400
 ] 

Brahma Reddy Battula commented on HDFS-9294:


[~szetszwo] thanks a lot for reviewing and committing this issue.

> DFSClient  deadlock when close file and failed to renew lease
> -
>
> Key: HDFS-9294
> URL: https://issues.apache.org/jira/browse/HDFS-9294
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.2.0, 2.7.1
> Environment: Hadoop 2.2.0
>Reporter: 邓飞
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: HDFS-9294-002.patch, HDFS-9294-002.patch, 
> HDFS-9294-branch-2.7.patch, HDFS-9294-branch-2.patch, HDFS-9294.patch
>
>
> We found a deadlock in our HBase (0.98) cluster (the Hadoop version is 
> 2.2.0), and it appears to be an HDFS bug; at the time our network was not 
> stable.
>  Below is the stack:
> *
> Found one Java-level deadlock:
> =
> "MemStoreFlusher.1":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   waiting to lock monitor 0x7ff2e67e16a8 (object 0x000486ce6620, a 
> org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "MemStoreFlusher.0"
> "MemStoreFlusher.0":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> Java stack information for the threads listed above:
> ===
> "MemStoreFlusher.1":
>   at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
>   - waiting to lock <0x0002fae5ebe0> (a 
> org.apache.hadoop.hdfs.LeaseRenewer)
>   at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
>   - locked <0x00055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
>   at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
>   at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>   - locked <0x00059869eed8> (a java.lang.Object)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238)
>   at java.lang.Thread.run(Thread.java:744)
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822)
>   - waiting to lock <0x000486ce6620> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780)
>   at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753)
>   at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:453)
>   - locked <0x0002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
>   at 

[jira] [Commented] (HDFS-9496) Erasure coding: an erasure codec throughput benchmark tool

2015-12-02 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037240#comment-15037240
 ] 

Hui Zheng commented on HDFS-9496:
-

HADOOP-11588 is good enough for me. Thanks, [~lirui].

> Erasure coding: an erasure codec throughput benchmark tool
> --
>
> Key: HDFS-9496
> URL: https://issues.apache.org/jira/browse/HDFS-9496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Reporter: Hui Zheng
>
> We need a tool which can help us decide/benchmark an Erasure Codec and schema.
> Considering HDFS-8968 has implemented an I/O throughput benchmark tool.Maybe 
> we could simply add encode/decode operation to it or implement another tool. 
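
The quoted idea can be sketched roughly as follows (purely hypothetical: encode() below is a dummy stand-in, not a real erasure codec, and the buffer sizes and round count are arbitrary); a real tool would swap in the codec under test and report MB/s per codec and schema.
{code}
import java.util.concurrent.TimeUnit;

public class CodecThroughput {
  // Dummy stand-in for a real erasure encoder: folds every data byte into parity.
  static void encode(byte[] data, byte[] parity) {
    for (int i = 0; i < data.length; i++) {
      parity[i % parity.length] ^= data[i];
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[64 * 1024 * 1024];   // 64 MB of "data" cells
    byte[] parity = new byte[16 * 1024 * 1024]; // 16 MB of "parity" cells
    int rounds = 20;

    long start = System.nanoTime();
    for (int r = 0; r < rounds; r++) {
      encode(data, parity);
    }
    double secs = (System.nanoTime() - start) / (double) TimeUnit.SECONDS.toNanos(1);
    double mbPerSec = (double) data.length * rounds / (1024 * 1024) / secs;
    System.out.printf("encode throughput: %.1f MB/s%n", mbPerSec);
  }
}
{code}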



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8705) BlockStoragePolicySuite uses equalsIgnoreCase for name lookup, won't work in all locales

2015-12-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037123#comment-15037123
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8705:
---

I agree that equalsIgnoreCase is safe for all English letters and all locales; 
try the test program below.
{code}
  public static void main(String[] args) {
    final String allLetters = getAllLetters();
    System.out.println(allLetters);
    for(Locale locale : Locale.getAvailableLocales()) {
      System.out.println("locale = " + locale);

      final String upper = allLetters.toUpperCase(locale);
      final String lower = allLetters.toLowerCase(locale);
      System.out.println("  upper = " + upper);
      System.out.println("  lower = " + lower);
      assertEqualsIgnoreCase(upper, lower);
      assertEqualsIgnoreCase(upper, allLetters);
      assertEqualsIgnoreCase(lower, allLetters);
    }
  }

  static String getAllLetters() {
    final StringBuilder b = new StringBuilder();
    for(char lower = 'a', upper = 'A'; lower <= 'z'; lower++, upper++) {
      b.append(lower).append(upper);
    }
    return b.toString();
  }

  static void assertEqualsIgnoreCase(String a, String b) {
    if (!a.equalsIgnoreCase(b)) {
      throw new AssertionError("a.equalsIgnoreCase(b) = " + a.equalsIgnoreCase(b)
          + "\na = " + a + "\nb=" + b);
    }
  }
{code}
In particular, below is the output for Turkish.
{code}
locale = tr_TR
  upper = AABBCCDDEEFFGGHHİIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ
  lower = aabbccddeeffgghhiıjjkkllmmnnooppqqrrssttuuvvwwxxyyzz
{code}


> BlockStoragePolicySuite uses equalsIgnoreCase for name lookup, won't work in 
> all locales
> 
>
> Key: HDFS-8705
> URL: https://issues.apache.org/jira/browse/HDFS-8705
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Minor
> Attachments: HDFS-8705-002.patch, HDFS-8705.patch
>
>
> Looking at {{BlockStoragePolicySuite.getPolicy(name)}}, is using 
> {{equalsIgnoreCase()}} to find a policy which matches a name.
> This will not work in all locales. It must use 
> {{toLowerCase(Locale.ENGLISH).equals(name)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9448) Enable valgrind for libhdfspp unit tests

2015-12-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037181#comment-15037181
 ] 

Allen Wittenauer commented on HDFS-9448:


bq. Does that seem idiomatically in line with the rest of the Hadoop build 
infrastructure?

It sort of feels backwards, in a way.  But I'm hoping to get a chance to play 
with it more tomorrow.

bq. We can open a separate issue to add valgrind to the yetus image. Allen 
Wittenauer - Should that be an HDFS Jira or a Yetus Jira?

This tells me that I've failed to communicate. :(  So let's try it one more 
time.

I'm developing a patch for Hadoop.  Let's say I do it on a Mac. I do a build 
and everything seems ok, but I know I don't have a working libzip2.   I can run 
the ./start-build-env.sh script that is in the root of the source tree.  It 
will fire off Docker and create a working Linux environment that has 
*everything* I need to get the *full* capabilities of the Hadoop build, 
including any requirements of *all* of the unit tests.

Now ask yourself, if a test gets added that needs valgrind, should it be part 
of the Dockerfile that ships with Hadoop?

> Enable valgrind for libhdfspp unit tests
> 
>
> Key: HDFS-9448
> URL: https://issues.apache.org/jira/browse/HDFS-9448
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9448.HDFS-8707.000.patch, 
> HDFS-9448.HDFS-8707.001.patch, HDFS-9448.HDFS-8707.002.patch
>
>
> We should have a target that runs the unit tests under valgrind if it is 
> available on the target machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9436) Make NNThroughputBenchmark$BlockReportStats run with 10 datanodes by default

2015-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037220#comment-15037220
 ] 

Hadoop QA commented on HDFS-9436:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 9m 13s {color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66 with JDK v1.8.0_66 
generated 1 new issues (was 33, now 33). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 10m 14s 
{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91 with JDK 
v1.7.0_91 generated 1 new issues (was 35, now 35). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 27s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 5s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s 
{color} | {color:red} Patch generated 56 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 215m 12s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.fs.contract.hdfs.TestHDFSContractRootDirectory |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.datanode.TestBlockReplacement |
|   

[jira] [Commented] (HDFS-9469) DiskBalancer : Add Planner

2015-12-02 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037290#comment-15037290
 ] 

Anu Engineer commented on HDFS-9469:


[~szetszwo], Thanks for the detailed review. Here are my thoughts on your 
comments. I will update the review soon with a new patch.

bq. I think we need to change the data model to use mean and variance before 
adding planner. Otherwise, it is harder to change later.
Will do. Right now it is masked behind isBalancingNeeded. I will file a Jira 
and fix that.

bq. computePlan: top is "The total number of nodes to process". Then what is 
nodesToProcess.size()? Is it supposed top >= nodesToProcess.size()?
I see how the comments are confusing. Top is the number of nodes that gets sent 
down to the program by the user, and nodesToProcess is the list we discover 
from the cluster. I agree that top is badly named.

bq. computePoolSize return 0 if nodeCount is 9000. It should not " % 100 " at 
the end.
Thanks for catching that; fixed.
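
A hedged illustration of the pitfall (the real computePoolSize formula is not shown in this thread, so the arithmetic and names below are hypothetical): a trailing "% 100" wraps any multiple of 100 back to zero.
{code}
public class PoolSizeDemo {
  // Hypothetical reconstruction of the bug: the cap plus a trailing "% 100"
  // turns a pool size of exactly 100 into 0.
  static int computePoolSizeBuggy(int nodeCount) {
    return Math.min(nodeCount / 10, 100) % 100;
  }

  static int computePoolSizeFixed(int nodeCount) {
    return Math.min(nodeCount / 10, 100);
  }

  public static void main(String[] args) {
    System.out.println(computePoolSizeBuggy(9000)); // 0
    System.out.println(computePoolSizeFixed(9000)); // 100
  }
}
{code}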

bq. It logs a messge per node. Is it needed?
I can see that it is noisy, since it is going to be the same planner most of 
the time.

bq. Is the planner supposed to be fixed for a single run?
It depends on how we define a single run. If you are asking whether the planner 
is fixed for the duration of a single planner run, then yes. On the other hand, 
nothing prevents the user from creating a plan and executing it later, so 
technically it is possible to create different plans and then execute them.

bq. What other planners are we going to support?
The idea behind the planner is that it will serve as a generic disk layout 
tool. At some point in time it could merge with mover, for example, and that 
would need a different kind of planner.

bq. It should throw an exception instead of returning null at the end.
 Will do.

bq. We should use LOG.error instead of LOG.fatal below.
Will do.

bq. The GreedyPlanner is an algorithm. The input is a DiskBalancerDataNode. So 
node should be a parameter of plan() but not a field.
Will do.

bq. Use StringUtils.TraditionalBinaryPrefix.long2String(..) instead of adding 
getSizeString.
Thanks for the pointer; I did not know about that.
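
For reference, a small usage sketch of that utility (the printed formatting is from memory, so treat it as approximate):
{code}
import org.apache.hadoop.util.StringUtils;

public class SizeStringDemo {
  public static void main(String[] args) {
    long bytes = 3L * 1024 * 1024 * 1024 + 512L * 1024 * 1024; // 3.5 GiB
    // Replaces a hand-rolled getSizeString(); expected to print something
    // like "3.50 GB" (exact formatting may differ).
    System.out.println(StringUtils.TraditionalBinaryPrefix.long2String(bytes, "B", 2));
  }
}
{code}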

bq. I did not continue reviewing the computation since it requires the data 
model change.
If you are referring to weighted mean and variance, they are technically hidden 
behind isBalancingNeeded. I will certainly update the algorithm as you have 
proposed. I just wanted to make sure we are commenting on the same issue; if it 
is indeed the same issue, I will update this patch with the changes related to 
it and update the data models in another Jira, since we will need some tests to 
make sure the code works correctly with the weighted mean and variance changes.


> DiskBalancer : Add Planner 
> ---
>
> Key: HDFS-9469
> URL: https://issues.apache.org/jira/browse/HDFS-9469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9469-HDFS-1312.001.patch, 
> HDFS-9469-HDFS-1312.002.patch
>
>
> Disk Balancer reads the cluster data and then creates a plan for the data 
> moves based on the snap-shot of the data read from the nodes. This plan is 
> later submitted to data nodes for execution. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9500) datanodesSoftwareVersions map may counting wrong when rolling upgrade

2015-12-02 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HDFS-9500:

Description: 
While rolling upgrading, namenode's website overview will report there are two 
versions datanodes in the cluster, for example, 2.6.0 has x nodes and 2.6.2 has 
y nodes. However, sometimes when I stop a datanode in old version and start a 
new version one, namenode only increases the number of new version but not 
decreases the number of old version. So the total number x+y will be larger 
than the number of datanodes. Even all datanodes are upgraded, there will still 
have the messages that there are several datanode in old version. And I must 
run hdfs dfsadmin -refreshNodes to clear this message.

I think this issue is caused by DatanodeManager.registerDatanode. If nodeS in 
old version is not alive because of shutting down, it will not pass 
shouldCountVersion, so the number of old version won't be decreased. But this 
method only judges the status of heartbeat and isAlive at that moment, if 
namenode has not been noticed and removed this node and this node restarts in 
the new version, the decrementVersionCount belongs to this node will never be 
executed.

So the simplest way to fix this is that we always recounting the version map in 
registerDatanode since it is not a heavy operation.


  was:
While rolling upgrading, namenode's website overview will report there are two 
versions datanodes in the cluster, for example, 2.6.0 has x nodes and 2.6.2 has 
y nodes. However, sometimes when I stop a datanode in old version and start a 
new version one, namenode only increases the number of new version but not 
decreases the number of old version. So the total number x+y will be larger 
than the number of datanodes. Even all datanodes are upgraded, there will still 
have the messages that there are several datanode in old version. And I must 
run hdfs dfsadmin -refreshNodes to clear this message.

I think this issue is caused by DatanodeManager.registerDatanode. If nodeS in 
old version is not alive because of shutting down, it will not pass 
shouldCountVersion, so the number of old version won't be decreased. But this 
method only judges the status of heartbeat and isAlive on the moment, if 
namenode has not been noticed and removed this node and this node restarts in 
the new version, the decrementVersionCount belongs to this node will never be 
executed.

So the simplest way to fix this is that we always recounting the version map in 
registerDatanode since it is not a heavy operation.



> datanodesSoftwareVersions map may counting wrong when rolling upgrade
> -
>
> Key: HDFS-9500
> URL: https://issues.apache.org/jira/browse/HDFS-9500
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 9500-v1.patch
>
>
> While rolling upgrading, namenode's website overview will report there are 
> two versions datanodes in the cluster, for example, 2.6.0 has x nodes and 
> 2.6.2 has y nodes. However, sometimes when I stop a datanode in old version 
> and start a new version one, namenode only increases the number of new 
> version but not decreases the number of old version. So the total number x+y 
> will be larger than the number of datanodes. Even all datanodes are upgraded, 
> there will still have the messages that there are several datanode in old 
> version. And I must run hdfs dfsadmin -refreshNodes to clear this message.
> I think this issue is caused by DatanodeManager.registerDatanode. If nodeS in 
> old version is not alive because of shutting down, it will not pass 
> shouldCountVersion, so the number of old version won't be decreased. But this 
> method only judges the status of heartbeat and isAlive at that moment, if 
> namenode has not been noticed and removed this node and this node restarts in 
> the new version, the decrementVersionCount belongs to this node will never be 
> executed.
> So the simplest way to fix this is that we always recounting the version map 
> in registerDatanode since it is not a heavy operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9083) Replication violates block placement policy.

2015-12-02 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037361#comment-15037361
 ] 

Brahma Reddy Battula commented on HDFS-9083:


[~xyao] thanks for pointing same..

{code}
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(capacities.length)
  .hosts(new String[]{"localhost", "localhost"})
  .racks(new String[]{"rack0", "rack1"}).simulatedCapacities(capacities).build()
{code}

2 DNs are started with "rack0" and "rack1". Ideally we should not create 2 DNs 
with the same hostname, and pinning depends on favoredNodes. DFSClient#create(..) 
only uses host:port; if favoredNodes is created by new InetSocketAddress(ip, port), 
DFSClient will attempt a reverse lookup locally to get host:port instead of 
sending ip:port directly to the NameNode.

MiniDFSCluster uses the fake hostname "host1.foo.com" to start DataNodes, and 
DFSClient doesn't use StaticMapping. So if DFSClient does a reverse lookup, 
"127.0.0.1:8020" becomes "localhost:8020".

The fix can be like the following, which I applied the same way in branch-2 and trunk.

{code}
+String[] hosts = {"host0", "host1"};
 String[] racks = { RACK0, RACK1 };
 int numOfDatanodes = capacities.length;
 
 cluster = new MiniDFSCluster.Builder(conf).numDataNodes(capacities.length)
-  .hosts(new String[]{"localhost", "localhost"})
-  .racks(racks).simulatedCapacities(capacities).build();
+.hosts(hosts).racks(racks).simulatedCapacities(capacities).build();
 
 try {
   cluster.waitActive();
@@ -377,7 +377,10 @@ public void testBalancerWithPinnedBlocks() throws 
Exception {
   long totalUsedSpace = totalCapacity * 8 / 10;
   InetSocketAddress[] favoredNodes = new InetSocketAddress[numOfDatanodes];
   for (int i = 0; i < favoredNodes.length; i++) {
-favoredNodes[i] = cluster.getDataNodes().get(i).getXferAddress();
+// DFSClient will attempt reverse lookup. In case it resolves
+// "127.0.0.1" to "localhost", we manually specify the hostname.
+int port = cluster.getDataNodes().get(i).getXferAddress().getPort();
+favoredNodes[i] = new InetSocketAddress(hosts[i], port);
{code}
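
The reverse-lookup behaviour being worked around here is plain java.net; a small standalone demo (the "host0" name and the port are just placeholders):
{code}
import java.net.InetSocketAddress;

public class ReverseLookupDemo {
  public static void main(String[] args) {
    // Built from an IP literal: getHostName() triggers a reverse lookup,
    // and 127.0.0.1 typically resolves back to "localhost".
    InetSocketAddress byIp = new InetSocketAddress("127.0.0.1", 8020);
    System.out.println(byIp.getHostName() + ":" + byIp.getPort());

    // Built from a hostname: getHostString() returns the name exactly as given,
    // with no reverse lookup involved.
    InetSocketAddress byHost = new InetSocketAddress("host0", 8020);
    System.out.println(byHost.getHostString() + ":" + byHost.getPort());
  }
}
{code}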

> Replication violates block placement policy.
> 
>
> Key: HDFS-9083
> URL: https://issues.apache.org/jira/browse/HDFS-9083
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Blocker
> Fix For: 2.7.2, 2.6.3
>
> Attachments: HDFS-9083-branch-2.6.patch, HDFS-9083-branch-2.7.patch
>
>
> Recently we are noticing many cases in which all the replica of the block are 
> residing on the same rack.
> During the block creation, the block placement policy was honored.
> But after node failure event in some specific manner, the block ends up in 
> such state.
> On investigating more I found out that BlockManager#blockHasEnoughRacks is 
> dependent on the config (net.topology.script.file.name)
> {noformat}
>  if (!this.shouldCheckForEnoughRacks) {
>   return true;
> }
> {noformat}
> We specify DNSToSwitchMapping implementation (our own custom implementation) 
> via net.topology.node.switch.mapping.impl and no longer use 
> net.topology.script.file.name config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated HDFS-8791:
---
Attachment: test-node-upgrade.txt

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz, 
> test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few a hundred seeks which would mean single digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.
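
As a back-of-the-envelope check on the quoted "10s of minutes" figure (the 10 ms random-seek latency below is an assumed average for a cold spinning disk):
{code}
public class ColdScanEstimate {
  public static void main(String[] args) {
    int leafDirs = 256 * 256;   // 65,536 second-level directories in the 256x256 layout
    double seekMs = 10.0;       // assumed average random-seek latency with a cold cache
    double minutes = leafDirs * seekMs / 1000.0 / 60.0;
    // Roughly 11 minutes just to touch every directory block once, before reading any inode.
    System.out.printf("~%.0f minutes%n", minutes);
  }
}
{code}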



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037383#comment-15037383
 ] 

Chris Trezzo commented on HDFS-8791:


[~kihwal]
For the sake of completeness, we upgraded another test cluster this afternoon 
from 256x256 to 32x32. During this upgrade, we did see the long upgrade times 
that you were seeing. One of the data nodes took 1 hour and 25 min from start 
of upgrade until the last namespace was finalized. Here is the [upgrade 
log|https://issues.apache.org/jira/secure/attachment/12775501/test-node-upgrade.txt].
 This data node was not an outlier. As you can see for this node, the 
hard-linking for all 12 disks took an hour by itself.

I will look at {{DataStorage#addStorageLocations()}} and 
{{FsVolumeList#addBlockPool()}}. I will spend some effort to see if I can put 
together a patch that will parallelize the upgrade. 
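
A rough sketch of the parallelization idea (upgradeVolume() and the volume list below are placeholders, not the actual DataStorage/FsVolumeList code): run the hard-link pass for each storage directory on its own thread and wait for all of them.
{code}
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUpgradeSketch {
  // Placeholder for the per-volume upgrade work (hard-linking the old layout).
  static void upgradeVolume(File volume) {
    System.out.println("upgrading " + volume);
  }

  public static void main(String[] args) throws Exception {
    List<File> volumes = Arrays.asList(new File("/data1"), new File("/data2")); // placeholders
    ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
    List<Future<?>> futures = new ArrayList<>();
    for (File volume : volumes) {
      futures.add(pool.submit(() -> upgradeVolume(volume)));
    }
    for (Future<?> f : futures) {
      f.get(); // propagate any per-volume failure instead of swallowing it
    }
    pool.shutdown();
  }
}
{code}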

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz, 
> test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few a hundred seeks which would mean single digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9083) Replication violates block placement policy.

2015-12-02 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037293#comment-15037293
 ] 

Xiaoyu Yao commented on HDFS-9083:
--

The 2.7 patch caused failure of TestBalancer#testBalancerWithPinnedBlocks. The 
test was passing without this patch. 
[~shahrs87], can you take a look?

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.hdfs.server.balancer.TestBalancer
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.888 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.server.balancer.TestBalancer
testBalancerWithPinnedBlocks(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
  Time elapsed: 12.748 sec  <<< FAILURE!
java.lang.AssertionError: expected:<-3> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithPinnedBlocks(TestBalancer.java:362)


Results :

Failed tests: 
  TestBalancer.testBalancerWithPinnedBlocks:362 expected:<-3> but was:<0>



> Replication violates block placement policy.
> 
>
> Key: HDFS-9083
> URL: https://issues.apache.org/jira/browse/HDFS-9083
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Blocker
> Fix For: 2.7.2, 2.6.3
>
> Attachments: HDFS-9083-branch-2.6.patch, HDFS-9083-branch-2.7.patch
>
>
> Recently we are noticing many cases in which all the replica of the block are 
> residing on the same rack.
> During the block creation, the block placement policy was honored.
> But after node failure event in some specific manner, the block ends up in 
> such state.
> On investigating more I found out that BlockManager#blockHasEnoughRacks is 
> dependent on the config (net.topology.script.file.name)
> {noformat}
>  if (!this.shouldCheckForEnoughRacks) {
>   return true;
> }
> {noformat}
> We specify DNSToSwitchMapping implementation (our own custom implementation) 
> via net.topology.node.switch.mapping.impl and no longer use 
> net.topology.script.file.name config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9294) DFSClient deadlock when close file and failed to renew lease

2015-12-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037339#comment-15037339
 ] 

Hudson commented on HDFS-9294:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #659 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/659/])
HDFS-9294. DFSClient deadlock when close file and failed to renew lease. 
(szetszwo: rev e8bd1ba74b2fc7a6a1b71d068ef01a0fb0bbe294)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> DFSClient  deadlock when close file and failed to renew lease
> -
>
> Key: HDFS-9294
> URL: https://issues.apache.org/jira/browse/HDFS-9294
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.2.0, 2.7.1
> Environment: Hadoop 2.2.0
>Reporter: 邓飞
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: HDFS-9294-002.patch, HDFS-9294-002.patch, 
> HDFS-9294-branch-2.7.patch, HDFS-9294-branch-2.patch, HDFS-9294.patch
>
>
> We found a deadlock at our HBase(0.98) cluster(and the Hadoop Version is 
> 2.2.0),and it should be HDFS BUG,at the time our network is not stable.
>  below is the stack:
> *
> Found one Java-level deadlock:
> =
> "MemStoreFlusher.1":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   waiting to lock monitor 0x7ff2e67e16a8 (object 0x000486ce6620, a 
> org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "MemStoreFlusher.0"
> "MemStoreFlusher.0":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a 
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> Java stack information for the threads listed above:
> ===
> "MemStoreFlusher.1":
>   at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
>   - waiting to lock <0x0002fae5ebe0> (a 
> org.apache.hadoop.hdfs.LeaseRenewer)
>   at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
>   at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
>   at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
>   - locked <0x00055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
>   at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
>   at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>   - locked <0x00059869eed8> (a java.lang.Object)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238)
>   at java.lang.Thread.run(Thread.java:744)
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822)
>   - waiting to 

[jira] [Created] (HDFS-9500) datanodesSoftwareVersions map may counting wrong when rolling upgrade

2015-12-02 Thread Phil Yang (JIRA)
Phil Yang created HDFS-9500:
---

 Summary: datanodesSoftwareVersions map may counting wrong when 
rolling upgrade
 Key: HDFS-9500
 URL: https://issues.apache.org/jira/browse/HDFS-9500
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.2, 2.7.1
Reporter: Phil Yang
Assignee: Phil Yang


While rolling upgrading, namenode's website overview will report there are two 
versions datanodes in the cluster, for example, 2.6.0 has x nodes and 2.6.2 has 
y nodes. However, sometimes when I stop a datanode in old version and start a 
new version one, namenode only increases the number of new version but not 
decreases the number of old version. So the total number x+y will be larger 
than the number of datanodes. Even all datanodes are upgraded, there will still 
have the messages that there are several datanode in old version. And I must 
run hdfs dfsadmin -refreshNodes to clear this message.

I think this issue is caused by DatanodeManager.registerDatanode. If nodeS in 
old version is not alive because of shutting down, it will not pass 
shouldCountVersion, so the number of old version won't be decreased. But this 
method only judges the status of heartbeat and isAlive on the moment, if 
namenode has not been noticed and removed this node and this node restarts in 
the new version, the decrementVersionCount belongs to this node will never be 
executed.

So the simplest fix is that we always recounting the version map in 
registerDatanode since it is not a heavy operation.
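
A minimal sketch of the "recount on every registration" idea (the class and method names below are simplified stand-ins, not the actual DatanodeManager fields):
{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VersionRecountSketch {
  private final Map<String, Integer> versionCount = new HashMap<>();

  /** Rebuild the whole version map from the currently alive datanodes. */
  synchronized void recountSoftwareVersions(List<String> aliveDatanodeVersions) {
    versionCount.clear();
    for (String version : aliveDatanodeVersions) {
      versionCount.merge(version, 1, Integer::sum);
    }
  }

  synchronized Map<String, Integer> getDatanodesSoftwareVersions() {
    return new HashMap<>(versionCount);
  }
}
{code}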




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9500) datanodesSoftwareVersions map may counting wrong when rolling upgrade

2015-12-02 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HDFS-9500:

Attachment: 9500-v1.patch

> datanodesSoftwareVersions map may counting wrong when rolling upgrade
> -
>
> Key: HDFS-9500
> URL: https://issues.apache.org/jira/browse/HDFS-9500
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 9500-v1.patch
>
>
> While rolling upgrading, namenode's website overview will report there are 
> two versions datanodes in the cluster, for example, 2.6.0 has x nodes and 
> 2.6.2 has y nodes. However, sometimes when I stop a datanode in old version 
> and start a new version one, namenode only increases the number of new 
> version but not decreases the number of old version. So the total number x+y 
> will be larger than the number of datanodes. Even all datanodes are upgraded, 
> there will still have the messages that there are several datanode in old 
> version. And I must run hdfs dfsadmin -refreshNodes to clear this message.
> I think this issue is caused by DatanodeManager.registerDatanode. If nodeS in 
> old version is not alive because of shutting down, it will not pass 
> shouldCountVersion, so the number of old version won't be decreased. But this 
> method only judges the status of heartbeat and isAlive on the moment, if 
> namenode has not been noticed and removed this node and this node restarts in 
> the new version, the decrementVersionCount belongs to this node will never be 
> executed.
> So the simplest fix is that we always recounting the version map in 
> registerDatanode since it is not a heavy operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037393#comment-15037393
 ] 

Chris Trezzo commented on HDFS-8791:


As a side note: I see that there are already multiple jiras around making the 
upgrade parallel. I see HDFS-8782 and HDFS-8578. I will investigate more.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz, 
> test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few a hundred seeks which would mean single digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9484) NNThroughputBenchmark$BlockReportStats should not send empty block reports

2015-12-02 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037435#comment-15037435
 ] 

Mingliang Liu commented on HDFS-9484:
-

The failing tests seem unrelated.

> NNThroughputBenchmark$BlockReportStats should not send empty block reports
> --
>
> Key: HDFS-9484
> URL: https://issues.apache.org/jira/browse/HDFS-9484
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9484.000.patch, HDFS-9484.001.patch
>
>
> There are two potential bugs that make the 
> {{NNThroughputBenchmark$BlockReportStats}} send empty block reports.
> # In {{NNThroughputBenchmark$BlockReportStats#formBlockReport()}}, the 
> {{blockReportList}} is always {{BlockListAsLongs.EMPTY}}. We should construct 
> the block report list by encoding generated {{blocks}} in test.
> # {{TinyDatanode#blocks}} is an empty ArrayList with initial capacity. In 
> {{TinyDatanode#addBlock()}} first statement, the {{if(nrBlocks == 
> blocks.size()) {}} will always be true. We should either fill the blocks with 
> dummy report in {{TinyDatanode()}} constructor, or use initial capacity 
> instead of {{blocks.size()}} in the above _if_ statement (we should replace 
> {{ArrayList#set}} with {{ArrayList#add}} as well).
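
The second point boils down to how ArrayList treats its initial capacity; a tiny standalone demo:
{code}
import java.util.ArrayList;
import java.util.List;

public class CapacityVsSize {
  public static void main(String[] args) {
    // The constructor argument is only a capacity hint; the list is still empty.
    List<Long> blocks = new ArrayList<>(1000);
    System.out.println(blocks.size()); // prints 0, not 1000

    // So a guard like "if (nrBlocks == blocks.size())" is trivially true at first, and
    // blocks.set(i, ...) would throw IndexOutOfBoundsException; the list must either be
    // pre-filled up to the capacity or populated with add(...) instead of set(...).
  }
}
{code}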



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9436) Make NNThroughputBenchmark$BlockReportStats run with 10 datanodes by default

2015-12-02 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037433#comment-15037433
 ] 

Mingliang Liu commented on HDFS-9436:
-

The failing tests seem unrelated.

Specifically, {{hadoop.hdfs.server.datanode.TestDirectoryScanner}} is tracked by 
[HDFS-9300], {{TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure}} 
tracked by [HDFS-9466]. Other failing tests can be investigated separately.

> Make NNThroughputBenchmark$BlockReportStats run with 10 datanodes by default
> 
>
> Key: HDFS-9436
> URL: https://issues.apache.org/jira/browse/HDFS-9436
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Minor
> Attachments: HDFS-9436.000.patch, HDFS-9436.001.patch, 
> HDFS-9436.002.patch
>
>
> This is a follow-up of [HDFS-9379].
> Though for actual benchmarking the defaults are rarely used, it would be good 
> to change the default for {{numThreads}} as a >=10 value and may be 
> {{numOpsRequired}} in {{BlockReportStats}} just to make sure the condition in 
> [HDFS-9379] is tested in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9083) Replication violates block placement policy.

2015-12-02 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037436#comment-15037436
 ] 

Xiaoyu Yao commented on HDFS-9083:
--

Thanks [~brahmareddy] for the explanation; that helps me understand the issue.
Is this fixed in the 2.7.x branches, such as branch-2.7.1 or branch-2.7.2? If not, 
we need a separate ticket for the unit test fix.

> Replication violates block placement policy.
> 
>
> Key: HDFS-9083
> URL: https://issues.apache.org/jira/browse/HDFS-9083
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Blocker
> Fix For: 2.7.2, 2.6.3
>
> Attachments: HDFS-9083-branch-2.6.patch, HDFS-9083-branch-2.7.patch
>
>
> Recently we are noticing many cases in which all the replica of the block are 
> residing on the same rack.
> During the block creation, the block placement policy was honored.
> But after node failure event in some specific manner, the block ends up in 
> such state.
> On investigating more I found out that BlockManager#blockHasEnoughRacks is 
> dependent on the config (net.topology.script.file.name)
> {noformat}
>  if (!this.shouldCheckForEnoughRacks) {
>   return true;
> }
> {noformat}
> We specify DNSToSwitchMapping implementation (our own custom implementation) 
> via net.topology.node.switch.mapping.impl and no longer use 
> net.topology.script.file.name config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2015-12-02 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated HDFS-8791:
---
Attachment: test-node-upgrade.txt

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, hadoop-56-layout-datanode-dir.tgz
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few a hundred seeks which would mean single digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8705) BlockStoragePolicySuite uses equalsIgnoreCase for name lookup, won't work in all locales

2015-12-02 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037423#comment-15037423
 ] 

Brahma Reddy Battula commented on HDFS-8705:


As I mentioned earlier, {{String.equalsIgnoreCase()}} is locale-free. Thanks 
[~walter.k.su] and [~szetszwo] for your inputs.

{{String.equalsIgnoreCase()}} internally uses {{Character.toLowerCase()}} and 
{{Character.toUpperCase()}}, which do not change based on the Locale, but 
{{String.toLowerCase()}} and {{String.toUpperCase()}} do depend on the Locale.

 *Sample Test Code:* 

{code}
  public static void main(String[] args) {
    Locale trlocale = new Locale("tr", "TR");
    Locale.setDefault(trlocale);
    System.out.println(Locale.getDefault()); // tr_TR
    char dottedUpper = '\u0130';
    char dottedLower = '\u0069';
    char dotlessUpper = '\u0049';
    char dotlessLower = '\u0131';

    char[] chars = new char[] { dottedLower, dottedUpper, dotlessLower,
        dotlessUpper };
    for (int i = 0; i < chars.length; i++) {
      char ch = chars[i];
      System.out.println("" + ch);
      System.out.println(" Character.toUpperCase('" + ch + "') --> "
          + Character.toUpperCase(ch));
      System.out.println(" Character.toLowerCase('" + ch + "') --> "
          + Character.toLowerCase(ch));
      String chString = new String(new char[] { ch });
      System.out.println(" \"" + chString + "\".toUpperCase() --> "
          + chString.toUpperCase());
      System.out.println(" \"" + chString + "\".toLowerCase() --> "
          + chString.toLowerCase());
    }
  }
{code}
{{Character.toLowerCase()}} and {{Character.toUpperCase()}} always map to 
English characters, not Turkish characters.


 *Output when char is "I"* 
{code}
 Character.toUpperCase('I') --> I
 Character.toLowerCase('I') --> i
 "I".toUpperCase() --> I
 "I".toLowerCase() --> ı
{code}
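
For completeness, a small sketch (the policy name is illustrative) showing that, under the Turkish default locale, both lookup styles agree for ASCII-only names, which is consistent with keeping equalsIgnoreCase here:
{code}
import java.util.Locale;

public class PolicyLookupDemo {
  public static void main(String[] args) {
    Locale.setDefault(new Locale("tr", "TR"));
    String stored = "COLD"; // illustrative, ASCII-only policy name
    String query = "cold";
    System.out.println(stored.equalsIgnoreCase(query));                   // true
    System.out.println(stored.toLowerCase(Locale.ENGLISH).equals(query)); // true
    // The two approaches only diverge for non-ASCII letters such as the
    // Turkish dotted/dotless I shown above.
  }
}
{code}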

> BlockStoragePolicySuite uses equalsIgnoreCase for name lookup, won't work in 
> all locales
> 
>
> Key: HDFS-8705
> URL: https://issues.apache.org/jira/browse/HDFS-8705
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Minor
> Attachments: HDFS-8705-002.patch, HDFS-8705.patch
>
>
> Looking at {{BlockStoragePolicySuite.getPolicy(name)}}, is using 
> {{equalsIgnoreCase()}} to find a policy which matches a name.
> This will not work in all locales. It must use 
> {{toLowerCase(Locale.ENGLISH).equals(name)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9498) Move code that tracks orphan blocks to BlockManagerSafeMode

2015-12-02 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9498:

Summary: Move code that tracks orphan blocks to BlockManagerSafeMode  (was: 
Move BlockManager#numberOfBytesInFutureBlocks to BlockManagerSafeMode)

> Move code that tracks orphan blocks to BlockManagerSafeMode
> ---
>
> Key: HDFS-9498
> URL: https://issues.apache.org/jira/browse/HDFS-9498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> [HDFS-4015] counts and reports orphaned blocks  
> {{numberOfBytesInFutureBlocks}} in safe mode. It was implemented in 
> {{BlockManager}}. Per discussion in [HDFS-9129] which introduces the 
> {{BlockManagerSafeMode}}, we can move code that maintaining orphaned blocks 
> to this class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9496) Erasure coding: an erasure codec throughput benchmark tool

2015-12-02 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved HDFS-9496.
-
Resolution: Duplicate

Thanks Rui for clarifying this.

> Erasure coding: an erasure codec throughput benchmark tool
> --
>
> Key: HDFS-9496
> URL: https://issues.apache.org/jira/browse/HDFS-9496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Reporter: Hui Zheng
>
> We need a tool which can help us decide/benchmark an Erasure Codec and schema.
> Considering HDFS-8968 has implemented an I/O throughput benchmark tool.Maybe 
> we could simply add encode/decode operation to it or implement another tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9500) datanodesSoftwareVersions map may count wrong during rolling upgrade

2015-12-02 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HDFS-9500:

Description: 
During a rolling upgrade, the namenode's web UI overview reports datanodes of 
two versions in the cluster, for example x nodes on 2.6.0 and y nodes on 2.6.2. 
However, sometimes when I stop an old-version datanode and start a new-version 
one, the namenode only increases the count for the new version and does not 
decrease the count for the old version, so the total x+y becomes larger than 
the number of datanodes. Even after all datanodes are upgraded, the message 
that several datanodes are still on the old version remains, and I must run 
hdfs dfsadmin -refreshNodes to clear it.

I think this issue is caused by DatanodeManager.registerDatanode. If nodeS on 
the old version is not alive because it was shut down, it does not pass 
shouldCountVersion, so the count for the old version is not decreased. But this 
method only judges the heartbeat and isAlive status at that moment; if the 
namenode has not yet noticed and removed this node and the node restarts on the 
new version, the decrementVersionCount for this node will never be executed.

So the simplest way to fix this is to always recount the version map in 
registerDatanode, since it is not a heavy operation.


  was:
During a rolling upgrade, the namenode's web UI overview reports datanodes of 
two versions in the cluster, for example x nodes on 2.6.0 and y nodes on 2.6.2. 
However, sometimes when I stop an old-version datanode and start a new-version 
one, the namenode only increases the count for the new version and does not 
decrease the count for the old version, so the total x+y becomes larger than 
the number of datanodes. Even after all datanodes are upgraded, the message 
that several datanodes are still on the old version remains, and I must run 
hdfs dfsadmin -refreshNodes to clear it.

I think this issue is caused by DatanodeManager.registerDatanode. If nodeS on 
the old version is not alive because it was shut down, it does not pass 
shouldCountVersion, so the count for the old version is not decreased. But this 
method only judges the heartbeat and isAlive status at that moment; if the 
namenode has not yet noticed and removed this node and the node restarts on the 
new version, the decrementVersionCount for this node will never be executed.

So the simplest fix is to always recount the version map in registerDatanode, 
since it is not a heavy operation.



> datanodesSoftwareVersions map may count wrong during rolling upgrade
> -
>
> Key: HDFS-9500
> URL: https://issues.apache.org/jira/browse/HDFS-9500
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 9500-v1.patch
>
>
> During a rolling upgrade, the namenode's web UI overview reports datanodes of 
> two versions in the cluster, for example x nodes on 2.6.0 and y nodes on 
> 2.6.2. However, sometimes when I stop an old-version datanode and start a 
> new-version one, the namenode only increases the count for the new version 
> and does not decrease the count for the old version, so the total x+y becomes 
> larger than the number of datanodes. Even after all datanodes are upgraded, 
> the message that several datanodes are still on the old version remains, and 
> I must run hdfs dfsadmin -refreshNodes to clear it.
> I think this issue is caused by DatanodeManager.registerDatanode. If nodeS on 
> the old version is not alive because it was shut down, it does not pass 
> shouldCountVersion, so the count for the old version is not decreased. But 
> this method only judges the heartbeat and isAlive status at that moment; if 
> the namenode has not yet noticed and removed this node and the node restarts 
> on the new version, the decrementVersionCount for this node will never be 
> executed.
> So the simplest way to fix this is to always recount the version map in 
> registerDatanode, since it is not a heavy operation.
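
As a rough illustration of the proposed fix (recounting instead of 
incrementally adjusting counters), here is a hypothetical, self-contained 
sketch; it is not the actual DatanodeManager code:
{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VersionRecount {
  // Rebuild the version -> count map from the current datanode list on every
  // registration, instead of incrementing/decrementing and risking drift.
  static Map<String, Integer> recountVersions(List<String> datanodeVersions) {
    Map<String, Integer> counts = new HashMap<>();
    for (String version : datanodeVersions) {
      counts.merge(version, 1, Integer::sum);
    }
    return counts;
  }
}
{code}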



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9500) datanodesSoftwareVersions map may count wrong during rolling upgrade

2015-12-02 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HDFS-9500:

Description: 
During a rolling upgrade, the namenode's web UI overview reports datanodes of 
two versions in the cluster, for example x nodes on 2.6.0 and y nodes on 2.6.2. 
However, sometimes when I stop an old-version datanode and start a new-version 
one, the namenode only increases the count for the new version and does not 
decrease the count for the old version, so the total x+y becomes larger than 
the number of datanodes. Even after all datanodes are upgraded, the message 
that several datanodes are still on the old version remains, and I must run 
hdfs dfsadmin -refreshNodes to clear it.

I think this issue is caused by DatanodeManager.registerDatanode. If nodeS on 
the old version is not alive because it was shut down, it does not pass 
shouldCountVersion, so the count for the old version is not decreased. But this 
method only judges the heartbeat and isAlive status at that moment; if the 
namenode has not yet removed this node (which would decrement the version map) 
and the node restarts on the new version, the decrementVersionCount for this 
node will never be executed.

So the simplest way to fix this is to always recount the version map in 
registerDatanode, since it is not a heavy operation.


  was:
During a rolling upgrade, the namenode's web UI overview reports datanodes of 
two versions in the cluster, for example x nodes on 2.6.0 and y nodes on 2.6.2. 
However, sometimes when I stop an old-version datanode and start a new-version 
one, the namenode only increases the count for the new version and does not 
decrease the count for the old version, so the total x+y becomes larger than 
the number of datanodes. Even after all datanodes are upgraded, the message 
that several datanodes are still on the old version remains, and I must run 
hdfs dfsadmin -refreshNodes to clear it.

I think this issue is caused by DatanodeManager.registerDatanode. If nodeS on 
the old version is not alive because it was shut down, it does not pass 
shouldCountVersion, so the count for the old version is not decreased. But this 
method only judges the heartbeat and isAlive status at that moment; if the 
namenode has not yet noticed and removed this node and the node restarts on the 
new version, the decrementVersionCount for this node will never be executed.

So the simplest way to fix this is to always recount the version map in 
registerDatanode, since it is not a heavy operation.



> datanodesSoftwareVersions map may count wrong during rolling upgrade
> -
>
> Key: HDFS-9500
> URL: https://issues.apache.org/jira/browse/HDFS-9500
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 9500-v1.patch
>
>
> During a rolling upgrade, the namenode's web UI overview reports datanodes of 
> two versions in the cluster, for example x nodes on 2.6.0 and y nodes on 
> 2.6.2. However, sometimes when I stop an old-version datanode and start a 
> new-version one, the namenode only increases the count for the new version 
> and does not decrease the count for the old version, so the total x+y becomes 
> larger than the number of datanodes. Even after all datanodes are upgraded, 
> the message that several datanodes are still on the old version remains, and 
> I must run hdfs dfsadmin -refreshNodes to clear it.
> I think this issue is caused by DatanodeManager.registerDatanode. If nodeS on 
> the old version is not alive because it was shut down, it does not pass 
> shouldCountVersion, so the count for the old version is not decreased. But 
> this method only judges the heartbeat and isAlive status at that moment; if 
> the namenode has not yet removed this node (which would decrement the version 
> map) and the node restarts on the new version, the decrementVersionCount for 
> this node will never be executed.
> So the simplest way to fix this is to always recount the version map in 
> registerDatanode, since it is not a heavy operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9500) datanodesSoftwareVersions map may count wrong during rolling upgrade

2015-12-02 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HDFS-9500:

Status: Patch Available  (was: Open)

> datanodesSoftwareVersions map may count wrong during rolling upgrade
> -
>
> Key: HDFS-9500
> URL: https://issues.apache.org/jira/browse/HDFS-9500
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.2, 2.7.1
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 9500-v1.patch
>
>
> During a rolling upgrade, the namenode's web UI overview reports datanodes of 
> two versions in the cluster, for example x nodes on 2.6.0 and y nodes on 
> 2.6.2. However, sometimes when I stop an old-version datanode and start a 
> new-version one, the namenode only increases the count for the new version 
> and does not decrease the count for the old version, so the total x+y becomes 
> larger than the number of datanodes. Even after all datanodes are upgraded, 
> the message that several datanodes are still on the old version remains, and 
> I must run hdfs dfsadmin -refreshNodes to clear it.
> I think this issue is caused by DatanodeManager.registerDatanode. If nodeS on 
> the old version is not alive because it was shut down, it does not pass 
> shouldCountVersion, so the count for the old version is not decreased. But 
> this method only judges the heartbeat and isAlive status at that moment; if 
> the namenode has not yet removed this node (which would decrement the version 
> map) and the node restarts on the new version, the decrementVersionCount for 
> this node will never be executed.
> So the simplest way to fix this is to always recount the version map in 
> registerDatanode, since it is not a heavy operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

