[jira] [Commented] (HDFS-10256) Use GenericTestUtils.getTestDir method in tests for temporary directories

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225698#comment-15225698
 ] 

Hadoop QA commented on HDFS-10256:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} 
| {color:red} HDFS-10256 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-10256 |
| GITHUB PR | https://github.com/apache/hadoop/pull/90 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15064/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Use GenericTestUtils.getTestDir method in tests for temporary directories
> -
>
> Key: HDFS-10256
> URL: https://issues.apache.org/jira/browse/HDFS-10256
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>
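For context on the improvement in the summary, here is a minimal sketch (the test class, directory name, and cluster setup are illustrative, not taken from the patch; it assumes the org.apache.hadoop.test.GenericTestUtils#getTestDir helper the summary refers to) of replacing a hard-coded temporary directory with the shared helper:

{noformat}
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.test.GenericTestUtils;

public class TestTempDirSketch {
  public void testWithManagedTestDir() throws Exception {
    // Before: new File(System.getProperty("test.build.data", "build/test/data"), "dfs")
    // After: let GenericTestUtils resolve the temporary directory for this test.
    File baseDir = GenericTestUtils.getTestDir("TestTempDirSketch");

    Configuration conf = new Configuration();
    conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath());
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      cluster.waitActive();
      // ... test body ...
    } finally {
      cluster.shutdown();
    }
  }
}
{noformat}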




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225682#comment-15225682
 ] 

Andrew Wang commented on HDFS-8791:
---

bq. This is really unfortunate. Can you give a reference to the NameNode 
LayoutVersion change?

[~vinodkv] we made a few changes between 2.6 and 2.7; you can look at 
NameNodeLayoutVersion.java for a short summary. Most of them add new 
edit log ops, but truncate is a bigger one.

bq. Did we ever establish clear rules about downgrades? We need to lay out 
our story around supporting downgrades continuously and codify it.

We've never addressed downgrade in our compatibility policy, so officially we 
don't have anything.

bq. I'd vote for keeping strict rules for downgrades too, otherwise users are 
left to fend for themselves in deciding the risk associated with every version 
upgrade - are we in a place where we can support this?

We aren't yet, unless we want to exclude most larger features from branch-2. 
Long ago HDFS-5223 was filed to add feature flags to HDFS, which would enable 
downgrade in more scenarios. We never reached consensus though, and it stalled 
out since people seemed to like rolling upgrade (HDFS-5535) more. It can be 
revived, but it does mean additional testing complexity, and new 
features need to be developed with feature flags in mind.

I like this sentiment in general though, and would be in favor of requiring 
upgrade and downgrade within a major version, once we have HDFS-5223 in. It can 
be added compatibly to branch-2 since Colin did the groundwork in HDFS-5784.

bq. To conclude, is the consensus to document all these downgrade related 
breakages but keep them in 2.7.x and 2.8?

Unless we have new energy to pursue HDFS-5223, I think we'll keep doing LV 
changes. We've been doing them almost every 2.x release, so I think user 
expectations should be in line. AFAIK we've never announced a change in our 
support for downgrade in the 2.x line.

It's also worth noting that HDFS-5223 was focused on NN LV changes; it 
predates the NN and DN LV split. Thus I'm not sure the feature-flag work in 
HDFS-5784 would apply to this particular JIRA.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job 

[jira] [Commented] (HDFS-10257) Quick Thread Local Storage set-up has a small flaw

2016-04-04 Thread Stephen Bovy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225669#comment-15225669
 ] 

Stephen Bovy commented on HDFS-10257:
-

When someone wants to use "HAVE_BETTER_TLS" (quick thread-local storage), the 
code should not execute "threadLocalStorageSet", because it is not needed.

The changes I recommended would fix this.

First, the macro "THREAD_LOCAL_STORAGE_SET_QUICK" should be changed so that after

   quickTlsEnv = (env);

is set, it should

   return env;

Second, the order of invoking the macro should be changed:

THREAD_LOCAL_STORAGE_SET_QUICK(env);

if (threadLocalStorageSet(env)) 
{ return NULL; } 



> Quick Thread Local Storage set-up has a small flaw
> --
>
> Key: HDFS-10257
> URL: https://issues.apache.org/jira/browse/HDFS-10257
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.6.4
> Environment: Linux 
>Reporter: Stephen Bovy
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In jni_helper.c, in the getJNIEnv function, the “THREAD_LOCAL_STORAGE_SET_QUICK(env);” 
> macro is in the wrong location.
> It should precede “threadLocalStorageSet(env)” as follows:
> THREAD_LOCAL_STORAGE_SET_QUICK(env);
> if (threadLocalStorageSet(env)) {
>   return NULL;
> }
> And in “thread_local_storage.h”, the macro “THREAD_LOCAL_STORAGE_SET_QUICK” 
> should be as follows:
> #ifdef HAVE_BETTER_TLS
>   #define THREAD_LOCAL_STORAGE_GET_QUICK() \
> static __thread JNIEnv *quickTlsEnv = NULL; \
> { \
>   if (quickTlsEnv) { \
> return quickTlsEnv; \
>   } \
> }
>   #define THREAD_LOCAL_STORAGE_SET_QUICK(env) \
> { \
>   quickTlsEnv = (env); \
>   return env; \
> }
> #else
>   #define THREAD_LOCAL_STORAGE_GET_QUICK()
>   #define THREAD_LOCAL_STORAGE_SET_QUICK(env)
> #endif



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9599) TestDecommissioningStatus.testDecommissionStatus occasionally fails

2016-04-04 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225666#comment-15225666
 ] 

Lin Yiqun commented on HDFS-9599:
-

Thanks [~iwasakims] for the commit!

> TestDecommissioningStatus.testDecommissionStatus occasionally fails
> ---
>
> Key: HDFS-9599
> URL: https://issues.apache.org/jira/browse/HDFS-9599
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: HDFS-9599.001.patch, HDFS-9599.002.patch
>
>
> From test result of a recent jenkins nightly 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2663/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestDecommissioningStatus/testDecommissionStatus/
> The test failed because the number of under replicated blocks is 4, instead 
> of 3.
> Looking at the log, there is a stray block, which might have caused the 
> failure:
> {noformat}
> 2015-12-23 00:42:05,820 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2131)) - BLOCK* processReport: 
> blk_1073741825_1001 on node 127.0.0.1:57382 size 16384 does not belong to any 
> file
> {noformat}
> The block size 16384 suggests this is left over from the sibling test case 
> testDecommissionStatusAfterDNRestart. This can happen, because the same 
> minidfs cluster is reused between tests.
> The test implementation should do a better job isolating tests.
> Another case of failure is when the load factor comes into play and a block 
> cannot find sufficient datanodes on which to place a replica. In this test, 
> the runtime should not consider the load factor:
> {noformat}
> conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, 
> false);
> {noformat}
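As a rough sketch of both suggestions above (a fresh cluster per test plus the consider-load setting), assuming JUnit 4 style; the class name and cluster sizing are illustrative, not the actual test code:

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.After;
import org.junit.Before;

public class TestDecommissionIsolationSketch {
  private MiniDFSCluster cluster;

  @Before
  public void setUp() throws Exception {
    Configuration conf = new Configuration();
    // Keep datanode load out of replica placement decisions for this test.
    conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, false);
    // A fresh cluster per test prevents stray blocks from a sibling test case.
    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
    cluster.waitActive();
  }

  @After
  public void tearDown() {
    if (cluster != null) {
      cluster.shutdown();
    }
  }
}
{noformat}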



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10257) Quick Thread Local Storage set-up has a small flaw

2016-04-04 Thread Stephen Bovy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225659#comment-15225659
 ] 

Stephen Bovy commented on HDFS-10257:
-

Thanks for your quick reply :)

The problem is that "threadLocalStorageSet" should not be executed when someone 
wants to use "HAVE_BETTER_TLS".

The changes I recommended would fix this.



> Quick Thread Local Storage set-up has a small flaw
> --
>
> Key: HDFS-10257
> URL: https://issues.apache.org/jira/browse/HDFS-10257
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.6.4
> Environment: Linux 
>Reporter: Stephen Bovy
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In jni_helper.c, in the getJNIEnv function, the “THREAD_LOCAL_STORAGE_SET_QUICK(env);” 
> macro is in the wrong location.
> It should precede “threadLocalStorageSet(env)” as follows:
> THREAD_LOCAL_STORAGE_SET_QUICK(env);
> if (threadLocalStorageSet(env)) {
>   return NULL;
> }
> And in “thread_local_storage.h”, the macro “THREAD_LOCAL_STORAGE_SET_QUICK” 
> should be as follows:
> #ifdef HAVE_BETTER_TLS
>   #define THREAD_LOCAL_STORAGE_GET_QUICK() \
> static __thread JNIEnv *quickTlsEnv = NULL; \
> { \
>   if (quickTlsEnv) { \
> return quickTlsEnv; \
>   } \
> }
>   #define THREAD_LOCAL_STORAGE_SET_QUICK(env) \
> { \
>   quickTlsEnv = (env); \
>   return env; \
> }
> #else
>   #define THREAD_LOCAL_STORAGE_GET_QUICK()
>   #define THREAD_LOCAL_STORAGE_SET_QUICK(env)
> #endif



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225649#comment-15225649
 ] 

Colin Patrick McCabe commented on HDFS-8791:


I don't think we ever discussed or documented anywhere the idea that we would 
support rolling downgrade throughout all of the 2.x releases.  In fact, I think 
almost every new minor 2.x release we've ever made has involved a layout 
version upgrade (maybe someone can think of some exceptions?).  And one major 
goal of HDFS-5535 was making such upgrades easier.

On the other hand, I tend to agree with [~busbey] that rolling downgrade should 
be supported in stability branches, and that's currently the discussion we're 
having with the stable branch maintainers.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.
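To put rough numbers on the cold-cache case described above, a back-of-the-envelope sketch (the ~10 ms average seek cost is an assumed figure, not taken from the report):

{noformat}
public class SeekEstimateSketch {
  public static void main(String[] args) {
    long leafDirBlocks = 256L * 256L;   // 64K leaf directory blocks in the 256x256 layout
    double seekMs = 10.0;               // assumed average random seek + read, in milliseconds
    double coldMinutes = leafDirBlocks * seekMs / 1000.0 / 60.0;
    // ~11 minutes of pure seeking, the same order of magnitude as the ~20 minutes observed.
    System.out.printf("256x256 layout, cold: %d seeks ~= %.1f minutes%n",
        leafDirBlocks, coldMinutes);

    long oldLayoutDirBlocks = 300;      // "a few hundred" directory blocks in the old layout
    double oldSeconds = oldLayoutDirBlocks * seekMs / 1000.0;
    System.out.printf("old layout, cold: %d seeks ~= %.0f seconds%n",
        oldLayoutDirBlocks, oldSeconds);
  }
}
{noformat}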



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10257) Quick Thread Local Storage set-up has a small flaw

2016-04-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225641#comment-15225641
 ] 

Colin Patrick McCabe commented on HDFS-10257:
-

Can you explain why you think this is a flaw?  If the {{threadLocalStorageSet}} 
function fails, I don't think we want the data lingering in the other TLS 
location.

> Quick Thread Local Storage set-up has a small flaw
> --
>
> Key: HDFS-10257
> URL: https://issues.apache.org/jira/browse/HDFS-10257
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.6.4
> Environment: Linux 
>Reporter: Stephen Bovy
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In jni_helper.c, in the getJNIEnv function, the “THREAD_LOCAL_STORAGE_SET_QUICK(env);” 
> macro is in the wrong location.
> It should precede “threadLocalStorageSet(env)” as follows:
> THREAD_LOCAL_STORAGE_SET_QUICK(env);
> if (threadLocalStorageSet(env)) {
>   return NULL;
> }
> And in “thread_local_storage.h”, the macro “THREAD_LOCAL_STORAGE_SET_QUICK” 
> should be as follows:
> #ifdef HAVE_BETTER_TLS
>   #define THREAD_LOCAL_STORAGE_GET_QUICK() \
> static __thread JNIEnv *quickTlsEnv = NULL; \
> { \
>   if (quickTlsEnv) { \
> return quickTlsEnv; \
>   } \
> }
>   #define THREAD_LOCAL_STORAGE_SET_QUICK(env) \
> { \
>   quickTlsEnv = (env); \
>   return env; \
> }
> #else
>   #define THREAD_LOCAL_STORAGE_GET_QUICK()
>   #define THREAD_LOCAL_STORAGE_SET_QUICK(env)
> #endif



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10208) Addendum for HDFS-9579: to handle the case when client machine can't resolve network path

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225634#comment-15225634
 ] 

Hadoop QA commented on HDFS-10208:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 42s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
7s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 37s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 10s 
{color} | {color:red} root: patch generated 5 new + 278 unchanged - 0 fixed = 
283 total (was 278) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 23s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 45s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 15s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 59s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 

[jira] [Updated] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-04 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10175:

Attachment: TestStatisticsOverhead.java

I wrote a test named {{TestStatisticsOverhead}} to quantify the overheads more 
precisely.  On my machine, the difference between running with this patch and 
running without it is about 3MB, as quantified by {{getHeapMemoryUsage}}.  This 
is with 2000 threads.  Since this is a per-FS overhead, some applications that 
use 3 or 4 filesystem objects may experience 3x or 4x that memory overhead.

I agree that this level of overhead is not a deal-breaker, but I would 
like to understand what we are getting in exchange.  Can you be a little bit 
clearer about the goals for this?  Right now it seems to support only HDFS.  
And many of the operations are things that only HDFS will ever support, such as 
{{getBytesWithFutureGenerationStamps}}. Some operations are not used at all, 
such as {{COPY_FROM_LOCAL_FILE}}.  It's a mishmash of high-level operations 
such as copy_from_local_file, which involve multiple FSes, and very low-level 
things like whether or not we called {{mkdirs}} or {{primitiveMkdir}}.  Are you 
planning to add this statistics support to other filesystems as well?

{{HAS_NEXT}} does not seem like a metric that users would be interested in.  If 
we do another {{listStatus}} RPC, we should increment {{listStatus}}.

If we are going to have these maps, we should move the existing hard-coded 
statistics longs into the maps for simplicity's sake-- or at least things like 
bytesReadDistanceOfOneOrTwo and bytesReadDistanceOfThreeOrFour.
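For reference, a rough sketch of the kind of comparison described above (this is not the attached {{TestStatisticsOverhead}}, which is not reproduced here; it only shows how {{getHeapMemoryUsage}} can bracket an allocation, and the stand-in data is made up):

{noformat}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapDeltaSketch {
  public static void main(String[] args) {
    MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

    System.gc();
    long before = memory.getHeapMemoryUsage().getUsed();

    // Allocate the state being measured here, e.g. statistics objects
    // touched from a couple of thousand threads.
    Object[] state = new Object[2000];
    for (int i = 0; i < state.length; i++) {
      state[i] = new long[64];          // stand-in for per-thread statistics data
    }

    System.gc();
    long after = memory.getHeapMemoryUsage().getUsed();
    System.out.printf("Approximate overhead: %.2f MB%n",
        (after - before) / (1024.0 * 1024.0));

    // Keep the allocations reachable until after the measurement.
    System.out.println("Allocated " + state.length + " stand-in entries.");
  }
}
{noformat}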

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.
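A minimal sketch of the kind of per-operation counter the proposal describes (illustrative only; the class and method names are made up and this is not the attached patch):

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class PerOpStatisticsSketch {
  private final Map<String, LongAdder> opCounts = new ConcurrentHashMap<>();

  /** Called by the client wrapper around each operation, e.g. "mkdirs" or "rename". */
  public void incrementOpCounter(String op) {
    opCounts.computeIfAbsent(op, k -> new LongAdder()).increment();
  }

  /** Read side, so frameworks like MapReduce can publish the values as job counters. */
  public long getOpCount(String op) {
    LongAdder counter = opCounts.get(op);
    return counter == null ? 0L : counter.sum();
  }
}
{noformat}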



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10235) [NN UI] Last contact for Live Nodes should be relative time

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225603#comment-15225603
 ] 

Hadoop QA commented on HDFS-10235:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 22s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 52s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12796973/HDFS-10235-002.patch |
| JIRA Issue | HDFS-10235 |
| Optional Tests |  asflicense  |
| uname | Linux d75049bd308b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 818d6b7 |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15062/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> [NN UI] Last contact for Live Nodes should be relative time
> ---
>
> Key: HDFS-10235
> URL: https://issues.apache.org/jira/browse/HDFS-10235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10235-002.patch, HDFS-10235.patch
>
>
> Last contact for Live Nodes should be relative time, and we can keep the 
> absolute date for Dead Nodes. Relative time is more convenient for knowing 
> the last contact; no extra math is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225581#comment-15225581
 ] 

Hadoop QA commented on HDFS-9732:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 27s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 54s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 54s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 51s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 37s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 51s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 54s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 13s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | 

[jira] [Created] (HDFS-10257) Quick Thread Local Storage set-up has a small flaw

2016-04-04 Thread Stephen Bovy (JIRA)
Stephen Bovy created HDFS-10257:
---

 Summary: Quick Thread Local Storage set-up has a small flaw
 Key: HDFS-10257
 URL: https://issues.apache.org/jira/browse/HDFS-10257
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.6.4
 Environment: Linux 
Reporter: Stephen Bovy
Priority: Minor


In jni_helper.c, in the getJNIEnv function, the “THREAD_LOCAL_STORAGE_SET_QUICK(env);” 
macro is in the wrong location.

It should precede “threadLocalStorageSet(env)” as follows:

THREAD_LOCAL_STORAGE_SET_QUICK(env);

if (threadLocalStorageSet(env)) {
  return NULL;
}

And in “thread_local_storage.h”, the macro “THREAD_LOCAL_STORAGE_SET_QUICK” 
should be as follows:

#ifdef HAVE_BETTER_TLS
  #define THREAD_LOCAL_STORAGE_GET_QUICK() \
static __thread JNIEnv *quickTlsEnv = NULL; \
{ \
  if (quickTlsEnv) { \
return quickTlsEnv; \
  } \
}

  #define THREAD_LOCAL_STORAGE_SET_QUICK(env) \
{ \
  quickTlsEnv = (env); \
  return env; \
}
#else
  #define THREAD_LOCAL_STORAGE_GET_QUICK()
  #define THREAD_LOCAL_STORAGE_SET_QUICK(env)
#endif






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10235) [NN UI] Last contact for Live Nodes should be relative time

2016-04-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-10235:

Attachment: HDFS-10235-002.patch

Uploaded the patch. Added the time unit (sec) for Live Nodes.

> [NN UI] Last contact for Live Nodes should be relative time
> ---
>
> Key: HDFS-10235
> URL: https://issues.apache.org/jira/browse/HDFS-10235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10235-002.patch, HDFS-10235.patch
>
>
> Last contact for Live Nodes should be relative time, and we can keep the 
> absolute date for Dead Nodes. Relative time is more convenient for knowing 
> the last contact; no extra math is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10245) Fix the findbug warnings in branch-2.7

2016-04-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225546#comment-15225546
 ] 

Brahma Reddy Battula commented on HDFS-10245:
-

bq. There was 1 findbugs found in prepatch.
Yes, exactly; this JIRA itself is to fix that. In that case, the findbugs report 
should point to the post-patch findbugs results, no matter whether it is empty.
An explicit filtered report of the findbugs warnings introduced by the patch is 
not required. But a message at the end like 
{{0 new + 0 unchanged - 1 fixed = 0 total (was 1)}}, similar to the checkstyle 
output, will help identify the changes.

> Fix the findbug warnings in branch-2.7
> --
>
> Key: HDFS-10245
> URL: https://issues.apache.org/jira/browse/HDFS-10245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10245-branch-2.7.patch
>
>
> *Link*
> https://builds.apache.org/job/PreCommit-HDFS-Build/15036/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
> Following seen in HDFS-9917 jenkins run.
> {noformat}
> Code  Warning
> ISInconsistent synchronization of 
> org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 85% of time
> Bug type IS2_INCONSISTENT_SYNC (click for details) 
> In class org.apache.hadoop.hdfs.DFSOutputStream
> Field org.apache.hadoop.hdfs.DFSOutputStream.streamer
> Synchronized 85% of the time
> Unsynchronized access at DFSOutputStream.java:[line 2372]
> Unsynchronized access at DFSOutputStream.java:[line 2390]
> Unsynchronized access at DFSOutputStream.java:[line 1722]
> Synchronized access at DFSOutputStream.java:[line 2178]
> Synchronized access at DFSOutputStream.java:[line 2100]
> Synchronized access at DFSOutputStream.java:[line 2103]
> Synchronized access at DFSOutputStream.java:[line 2264]
> Synchronized access at DFSOutputStream.java:[line 1555]
> Synchronized access at DFSOutputStream.java:[line 1558]
> Synchronized access at DFSOutputStream.java:[line 2166]
> Synchronized access at DFSOutputStream.java:[line 2208]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2209]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2038]
> Synchronized access at DFSOutputStream.java:[line 2062]
> Synchronized access at DFSOutputStream.java:[line 2063]
> Synchronized access at DFSOutputStream.java:[line 2355]
> {noformat}
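For readers unfamiliar with IS2_INCONSISTENT_SYNC, a simplified illustration (not the actual DFSOutputStream code; names are made up) of the pattern findbugs flags and the usual remedy of guarding every access to the field with the same lock:

{noformat}
public class InconsistentSyncSketch {
  private Object streamer;   // flagged when some accesses bypass the lock

  // Unsynchronized read: mixed with the synchronized methods below, this is
  // what produces the "locked 85% of time" IS2_INCONSISTENT_SYNC warning.
  public Object peekStreamerUnsafe() {
    return streamer;
  }

  // Remedy: make every read and write of the field hold the same monitor
  // (or make the field final/volatile if its usage allows).
  public synchronized Object getStreamer() {
    return streamer;
  }

  public synchronized void setStreamer(Object newStreamer) {
    streamer = newStreamer;
  }
}
{noformat}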



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10250) Ozone: Add key Persistence

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225544#comment-15225544
 ] 

Hadoop QA commented on HDFS-10250:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
33s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} HDFS-7240 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} HDFS-7240 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
26s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
17s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s 
{color} | {color:green} HDFS-7240 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 39s 
{color} | {color:green} HDFS-7240 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 
2 unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 51s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 56m 44s 
{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 148m 20s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.TestHFlush |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12796938/HDFS-10250-HDFS-7240.003.patch
 |
| JIRA Issue | HDFS-10250 

[jira] [Commented] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.

2016-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225529#comment-15225529
 ] 

Hudson commented on HDFS-9917:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9555 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9555/])
HDFS-9917. IBR accumulate more objects when SNN was down for sometime. 
(vinayakumarb: rev 818d6b799eead13a17a0214172df60a269b046fb)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java


> IBR accumulate more objects when SNN was down for sometime.
> ---
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-9917-02.patch, HDFS-9917-branch-2.7-002.patch, 
> HDFS-9917-branch-2.7.patch, HDFS-9917.patch
>
>
> SNN was down for some time for various reasons. After restarting the SNN, it 
> became unresponsive because:
> - 29 DNs were each sending IBRs containing about 5 million blocks (most of 
> them delete IBRs), whereas each datanode had only ~2.5 million blocks.
> - GC could not reclaim these objects since they were all held in the RPC queue.
> To recover (to clear these objects), all the DNs were restarted one by one. 
> This issue happened in 2.4.1, where block report splitting was not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.

2016-04-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225526#comment-15225526
 ] 

Brahma Reddy Battula commented on HDFS-9917:


[~vinayrpet] thanks for the review and commit, and thanks to [~szetszwo] for 
the additional review.

> IBR accumulate more objects when SNN was down for sometime.
> ---
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-9917-02.patch, HDFS-9917-branch-2.7-002.patch, 
> HDFS-9917-branch-2.7.patch, HDFS-9917.patch
>
>
> SNN was down for some time for various reasons. After restarting the SNN, it 
> became unresponsive because:
> - 29 DNs were each sending IBRs containing about 5 million blocks (most of 
> them delete IBRs), whereas each datanode had only ~2.5 million blocks.
> - GC could not reclaim these objects since they were all held in the RPC queue.
> To recover (to clear these objects), all the DNs were restarted one by one. 
> This issue happened in 2.4.1, where block report splitting was not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225522#comment-15225522
 ] 

Sean Busbey commented on HDFS-8791:
---

Downgrade breakage only in minor versions please

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.

2016-04-04 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-9917:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8 and branch-2.7.

Thanks [~brahmareddy] for the contribution.
Thanks [~szetszwo] for reviews.

> IBR accumulate more objects when SNN was down for sometime.
> ---
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-9917-02.patch, HDFS-9917-branch-2.7-002.patch, 
> HDFS-9917-branch-2.7.patch, HDFS-9917.patch
>
>
> SNN was down for some time for various reasons. After restarting the SNN, it 
> became unresponsive because:
> - 29 DNs were each sending IBRs containing about 5 million blocks (most of 
> them delete IBRs), whereas each datanode had only ~2.5 million blocks.
> - GC could not reclaim these objects since they were all held in the RPC queue.
> To recover (to clear these objects), all the DNs were restarted one by one. 
> This issue happened in 2.4.1, where block report splitting was not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.

2016-04-04 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225500#comment-15225500
 ] 

Vinayakumar B commented on HDFS-9917:
-

Committing shortly.

> IBR accumulate more objects when SNN was down for sometime.
> ---
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Attachments: HDFS-9917-02.patch, HDFS-9917-branch-2.7-002.patch, 
> HDFS-9917-branch-2.7.patch, HDFS-9917.patch
>
>
> SNN was down for some time for various reasons. After restarting the SNN, it 
> became unresponsive because:
> - 29 DNs were each sending IBRs containing about 5 million blocks (most of 
> them delete IBRs), whereas each datanode had only ~2.5 million blocks.
> - GC could not reclaim these objects since they were all held in the RPC queue.
> To recover (to clear these objects), all the DNs were restarted one by one. 
> This issue happened in 2.4.1, where block report splitting was not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225492#comment-15225492
 ] 

Vinayakumar B commented on HDFS-10178:
--

Thanks [~kihwal] for clarifying the issue for me. Nice find and fix.

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch, HDFS-10178.v5.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the write op creates a 
> pipeline, but the actual data packet does not reach one or more datanodes in 
> time, the pipeline recovery will be done against the 0-byte partial block.  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225482#comment-15225482
 ] 

Vinod Kumar Vavilapalli commented on HDFS-8791:
---

bq.  I didn't think that it broke rolling upgrade (you should still be able to 
upgrade from an earlier layout version to this one). Did I miss something?
My point was mainly about rolling downgrade. I just used upgrade/downgrade 
together in my comment because in my mind the expectations are the same.

bq. Do we actually support downgrade between 2.7 and 2.6? We changed the 
NameNode LayoutVersion, so I don't think so. These branches don't have 
HDFS-8432 either.
[~andrew.wang], tx for this info.

This is really unfortunate. Can you give a reference to the NameNode 
LayoutVersion change?

Did we ever establish clear rules about downgrades? We need to lay out our 
story around supporting downgrades continuously and codify it. I'd vote for 
keeping strict rules for downgrades too, otherwise users are left to fend for 
themselves in deciding the risk associated with every version upgrade - are we 
in a place where we can support this?

For upgrades, there is at minimum tribal knowledge among committers/reviewers. 
And on the YARN side, we've proposed (but made little progress on) tools to 
automatically catch some of it - YARN-3292.

To conclude, is the consensus to document all these downgrade related breakages 
but keep them in 2.7.x and 2.8?

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.
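
For illustration, here is a minimal sketch (not the actual DataNode code; the method name and bit masks are assumptions based on the description above) of how a block ID can map to a two-level subdirectory, and how shrinking the per-level mask from 8 bits to 5 bits reduces the 64K leaf directories to about 1K:

{code:java}
// Minimal sketch, not the real DataNode code: derive the two directory
// levels from bits of the block ID. With 8-bit masks there are
// 256 * 256 = 65536 possible leaf directories; with 5-bit masks (the
// 32x32 layout discussed in this issue) there are only 32 * 32 = 1024.
public class BlockDirSketch {
  static String idToBlockDir(long blockId, boolean use32x32Layout) {
    int mask = use32x32Layout ? 0x1F : 0xFF;  // 5 bits vs. 8 bits per level
    int d1 = (int) ((blockId >> 16) & mask);  // first-level subdir
    int d2 = (int) ((blockId >> 8) & mask);   // second-level subdir
    return "subdir" + d1 + "/subdir" + d2;
  }

  public static void main(String[] args) {
    long blockId = 1073741825L;                        // example block ID
    System.out.println(idToBlockDir(blockId, false));  // 256x256 layout
    System.out.println(idToBlockDir(blockId, true));   // 32x32 layout
  }
}
{code}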



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads

2016-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225458#comment-15225458
 ] 

Hudson commented on HDFS-8496:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9554 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9554/])
HDFS-8496. Calling stopWriter() with FSDatasetImpl lock held may block 
(cmccabe: rev f6b1a818124cc42688c4c5acaf537d96cf00e43b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java


> Calling stopWriter() with FSDatasetImpl lock held may block other threads
> -
>
> Key: HDFS-8496
> URL: https://issues.apache.org/jira/browse/HDFS-8496
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: zhouyingchao
>Assignee: Colin Patrick McCabe
> Fix For: 2.8.0
>
> Attachments: HDFS-8496-001.patch, HDFS-8496.002.patch, 
> HDFS-8496.003.patch, HDFS-8496.004.patch
>
>
> On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and  
> heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By 
> looking at the stack, we found the calling of stopWriter() with FSDatasetImpl 
> lock blocked everything.
> Following is the heartbeat stack, as an example, to show how threads are 
> blocked by FSDatasetImpl lock:
> {code}
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
> - waiting to lock <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
> - locked <0x000770465dc0> (a java.lang.Object)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The thread which held the FSDatasetImpl lock is just sleeping to wait another 
> thread to exit in stopWriter(). The stack is:
> {code}
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Thread.join(Thread.java:1194)
> - locked <0x0007636953b8> (a org.apache.hadoop.util.Daemon)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026)
> - locked <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> In this case, we had deployed quite a lot of other workloads on the DN, so the 
> local file system and disks were quite busy. We guess this is why the 
> stopWriter call took quite a long time.
> Anyway, it is not quite reasonable to call stopWriter with the FSDatasetImpl 
> lock held.   In HDFS-7999, the createTemporary() is changed to call 
> stopWriter without FSDatasetImpl lock. We guess we should do so in the other 
> three methods: recoverClose()/recoverAppend/recoverRbw().
> I'll try to finish a patch for this today. 
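
To illustrate the locking pattern being proposed, here is a minimal sketch in plain Java (hypothetical class and field names, not the actual FsDatasetImpl code) of joining the writer thread outside the shared lock and then re-acquiring the lock to finish the state change:

{code:java}
// Hypothetical sketch of the pattern discussed above (generic Java, not the
// real FsDatasetImpl code): join the writer thread outside the shared lock,
// then re-acquire the lock to finish the bookkeeping.
public class StopWriterSketch {
  private final Object datasetLock = new Object();
  private Thread writer;               // stands in for the replica's writer thread

  void recoverClose() throws InterruptedException {
    Thread w;
    synchronized (datasetLock) {
      w = writer;                      // read current state under the lock
      writer = null;
    }
    if (w != null) {
      w.interrupt();
      w.join();                        // blocking join happens WITHOUT the lock
    }
    synchronized (datasetLock) {
      // Finish the recovery bookkeeping under the lock; heartbeat and
      // DataXceiver threads were never blocked behind the join above.
    }
  }
}
{code}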



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads

2016-04-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225456#comment-15225456
 ] 

Colin Patrick McCabe commented on HDFS-8496:


Committed to 2.8... thanks, all.

> Calling stopWriter() with FSDatasetImpl lock held may block other threads
> -
>
> Key: HDFS-8496
> URL: https://issues.apache.org/jira/browse/HDFS-8496
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: zhouyingchao
>Assignee: Colin Patrick McCabe
> Fix For: 2.8.0
>
> Attachments: HDFS-8496-001.patch, HDFS-8496.002.patch, 
> HDFS-8496.003.patch, HDFS-8496.004.patch
>
>
> On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and  
> heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By 
> looking at the stack, we found the calling of stopWriter() with FSDatasetImpl 
> lock blocked everything.
> Following is the heartbeat stack, as an example, to show how threads are 
> blocked by FSDatasetImpl lock:
> {code}
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
> - waiting to lock <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
> - locked <0x000770465dc0> (a java.lang.Object)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The thread which held the FSDatasetImpl lock is just sleeping to wait another 
> thread to exit in stopWriter(). The stack is:
> {code}
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Thread.join(Thread.java:1194)
> - locked <0x0007636953b8> (a org.apache.hadoop.util.Daemon)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026)
> - locked <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> In this case, we had deployed quite a lot of other workloads on the DN, so the 
> local file system and disks were quite busy. We guess this is why the 
> stopWriter call took quite a long time.
> Anyway, it is not quite reasonable to call stopWriter with the FSDatasetImpl 
> lock held.   In HDFS-7999, the createTemporary() is changed to call 
> stopWriter without FSDatasetImpl lock. We guess we should do so in the other 
> three methods: recoverClose()/recoverAppend/recoverRbw().
> I'll try to finish a patch for this today. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads

2016-04-04 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-8496:
---
  Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0  (was: 2.7.3, 2.6.5)
  Status: Resolved  (was: Patch Available)

> Calling stopWriter() with FSDatasetImpl lock held may block other threads
> -
>
> Key: HDFS-8496
> URL: https://issues.apache.org/jira/browse/HDFS-8496
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: zhouyingchao
>Assignee: Colin Patrick McCabe
> Fix For: 2.8.0
>
> Attachments: HDFS-8496-001.patch, HDFS-8496.002.patch, 
> HDFS-8496.003.patch, HDFS-8496.004.patch
>
>
> On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and  
> heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By 
> looking at the stack, we found the calling of stopWriter() with FSDatasetImpl 
> lock blocked everything.
> Following is the heartbeat stack, as an example, to show how threads are 
> blocked by FSDatasetImpl lock:
> {code}
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
> - waiting to lock <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
> - locked <0x000770465dc0> (a java.lang.Object)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The thread which held the FSDatasetImpl lock is just sleeping to wait another 
> thread to exit in stopWriter(). The stack is:
> {code}
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Thread.join(Thread.java:1194)
> - locked <0x0007636953b8> (a org.apache.hadoop.util.Daemon)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026)
> - locked <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> In this case, we had deployed quite a lot of other workloads on the DN, so the 
> local file system and disks were quite busy. We guess this is why the 
> stopWriter call took quite a long time.
> Anyway, it is not quite reasonable to call stopWriter with the FSDatasetImpl 
> lock held.   In HDFS-7999, the createTemporary() is changed to call 
> stopWriter without FSDatasetImpl lock. We guess we should do so in the other 
> three methods: recoverClose()/recoverAppend/recoverRbw().
> I'll try to finish a patch for this today. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads

2016-04-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225423#comment-15225423
 ] 

Colin Patrick McCabe edited comment on HDFS-8496 at 4/5/16 1:01 AM:


Thanks for the review, [~eddyxu].  The "hog" was intentional, I was referring 
to the idea that the function could "hog" the lock by not letting go of it.


was (Author: cmccabe):
Thanks for the review, [~eddyxu].  The "hog" was intentional, I was referring 
to the idea that the function could "hog" the lock by not letting go of it.  On 
second thought, though, I like your proposed name better-- I will fixup on 
commit.

> Calling stopWriter() with FSDatasetImpl lock held may block other threads
> -
>
> Key: HDFS-8496
> URL: https://issues.apache.org/jira/browse/HDFS-8496
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: zhouyingchao
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-8496-001.patch, HDFS-8496.002.patch, 
> HDFS-8496.003.patch, HDFS-8496.004.patch
>
>
> On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and  
> heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By 
> looking at the stack, we found the calling of stopWriter() with FSDatasetImpl 
> lock blocked everything.
> Following is the heartbeat stack, as an example, to show how threads are 
> blocked by FSDatasetImpl lock:
> {code}
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
> - waiting to lock <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
> - locked <0x000770465dc0> (a java.lang.Object)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The thread which held the FSDatasetImpl lock is just sleeping to wait another 
> thread to exit in stopWriter(). The stack is:
> {code}
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Thread.join(Thread.java:1194)
> - locked <0x0007636953b8> (a org.apache.hadoop.util.Daemon)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026)
> - locked <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> In this case, we had deployed quite a lot of other workloads on the DN, so the 
> local file system and disks were quite busy. We guess this is why the 
> stopWriter call took quite a long time.
> Anyway, it is not quite reasonable to call stopWriter with the FSDatasetImpl 
> lock held.   In HDFS-7999, the createTemporary() is changed to call 
> stopWriter without FSDatasetImpl lock. We guess we should do so in the other 
> three methods: recoverClose()/recoverAppend/recoverRbw().
> I'll try to finish a patch for this today. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads

2016-04-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225423#comment-15225423
 ] 

Colin Patrick McCabe commented on HDFS-8496:


Thanks for the review, [~eddyxu].  The "hog" was intentional, I was referring 
to the idea that the function could "hog" the lock by not letting go of it.  On 
second thought, though, I like your proposed name better-- I will fixup on 
commit.

> Calling stopWriter() with FSDatasetImpl lock held may block other threads
> -
>
> Key: HDFS-8496
> URL: https://issues.apache.org/jira/browse/HDFS-8496
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: zhouyingchao
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-8496-001.patch, HDFS-8496.002.patch, 
> HDFS-8496.003.patch, HDFS-8496.004.patch
>
>
> On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and  
> heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By 
> looking at the stack, we found the calling of stopWriter() with FSDatasetImpl 
> lock blocked everything.
> Following is the heartbeat stack, as an example, to show how threads are 
> blocked by FSDatasetImpl lock:
> {code}
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
> - waiting to lock <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
> - locked <0x000770465dc0> (a java.lang.Object)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The thread which held the FSDatasetImpl lock is just sleeping to wait another 
> thread to exit in stopWriter(). The stack is:
> {code}
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Thread.join(Thread.java:1194)
> - locked <0x0007636953b8> (a org.apache.hadoop.util.Daemon)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026)
> - locked <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> In this case, we had deployed quite a lot of other workloads on the DN, so the 
> local file system and disks were quite busy. We guess this is why the 
> stopWriter call took quite a long time.
> Anyway, it is not quite reasonable to call stopWriter with the FSDatasetImpl 
> lock held.   In HDFS-7999, the createTemporary() is changed to call 
> stopWriter without FSDatasetImpl lock. We guess we should do so in the other 
> three methods: recoverClose()/recoverAppend/recoverRbw().
> I'll try to finish a patch for this today. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10256) Use GenericTestUtils.getTestDir method in tests for temporary directories

2016-04-04 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-10256:
-
Target Version/s: 2.9.0
  Status: Patch Available  (was: Open)

> Use GenericTestUtils.getTestDir method in tests for temporary directories
> -
>
> Key: HDFS-10256
> URL: https://issues.apache.org/jira/browse/HDFS-10256
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7717) Erasure Coding: distribute replication to EC conversion work to DataNode

2016-04-04 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7717:

Summary: Erasure Coding: distribute replication to EC conversion work to 
DataNode  (was: Erasure Coding: distribute replication to stripping erasure 
coding conversion work to DataNode)

> Erasure Coding: distribute replication to EC conversion work to DataNode
> 
>
> Key: HDFS-7717
> URL: https://issues.apache.org/jira/browse/HDFS-7717
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: GAO Rui
>
> In the *striping* erasure coding case, we need some approach to distribute the 
> work of converting between replication and striped erasure coding to the DataNodes. 
> It could be driven by the NameNode, by a tool utilizing MR just like the current 
> distcp, or by another daemon like the balancer/mover. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads

2016-04-04 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225393#comment-15225393
 ] 

Lei (Eddy) Xu commented on HDFS-8496:
-

+1 Thanks for the nice work, [~sinago], [~cmccabe].

One minor typo: {{testInitReplicaRecoveryDoesNotHogLock()}} -> 
{{test...HoldLock}}?

> Calling stopWriter() with FSDatasetImpl lock held may block other threads
> -
>
> Key: HDFS-8496
> URL: https://issues.apache.org/jira/browse/HDFS-8496
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: zhouyingchao
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-8496-001.patch, HDFS-8496.002.patch, 
> HDFS-8496.003.patch, HDFS-8496.004.patch
>
>
> On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and  
> heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By 
> looking at the stack, we found the calling of stopWriter() with FSDatasetImpl 
> lock blocked everything.
> Following is the heartbeat stack, as an example, to show how threads are 
> blocked by FSDatasetImpl lock:
> {code}
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
> - waiting to lock <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
> - locked <0x000770465dc0> (a java.lang.Object)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The thread which held the FSDatasetImpl lock is just sleeping to wait another 
> thread to exit in stopWriter(). The stack is:
> {code}
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Thread.join(Thread.java:1194)
> - locked <0x0007636953b8> (a org.apache.hadoop.util.Daemon)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026)
> - locked <0x0007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> In this case, we had deployed quite a lot of other workloads on the DN, so the 
> local file system and disks were quite busy. We guess this is why the 
> stopWriter call took quite a long time.
> Anyway, it is not quite reasonable to call stopWriter with the FSDatasetImpl 
> lock held.   In HDFS-7999, the createTemporary() is changed to call 
> stopWriter without FSDatasetImpl lock. We guess we should do so in the other 
> three methods: recoverClose()/recoverAppend/recoverRbw().
> I'll try to finish a patch for this today. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9909) Can't read file after hdfs restart

2016-04-04 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225381#comment-15225381
 ] 

Xiao Chen commented on HDFS-9909:
-

Thanks for reporting this Bogdan. I'd like to work on this.

The test program attached actually passes on trunk. I further tested on 
branch-2.7 and it fails. The reason for this is that we have HDFS-5356 which 
automatically closes DFS on trunk.

To double check this, if I change the {{FileSystem}} to 
{{DistributedFileSystem}}, and call {{close}} before shutdown, the test passes 
on branch-2.7 as well.
{code:java}
/* start cluster and write something */
//  FileSystem fs = cluster.getFileSystem();
DistributedFileSystem fs = cluster.getFileSystem(); // <=== change to DFS
FSDataOutputStream out = fs.create(path, true);
out.write(bytes);
out.hflush();
/* stop cluster while file is open for writing */
cluster.shutdown();
fs.close();// <=== move this line up by 1

{code}

So now comes the original question: what if the client didn't call 
{{DFS#close}}, and HDFS restarts? Currently the file lease is revoked after the 
hard limit (1hr). Before that READ fails (e.g. {{cp: Cannot obtain block length 
for ...}}). This is because {{fs.delete}} or {{fs.create}} will remove the 
lease, but {{fs.open}} will not.

Since it's the read that fails, I don't think it's a good idea to recover the 
lease on the read operation. One possible solution is perhaps to detect this 
and revoke on restart. I need to further investigate and will update here. 
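
For reference, here is a minimal client-side sketch of the {{recoverLease}} workaround mentioned in the issue description; the path and retry interval are made-up example values:

{code:java}
// Minimal sketch of the recoverLease() workaround; the path and retry
// interval are example values, and exception handling is omitted for brevity.
Configuration conf = new Configuration();
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
Path path = new Path("/tmp/file-left-open");     // hypothetical file
boolean recovered = dfs.recoverLease(path);
while (!recovered) {
  Thread.sleep(1000);                            // back off between attempts
  recovered = dfs.recoverLease(path);            // true once the lease is released
}
// Once recovery completes, a new reader can open the file without waiting
// for the one-hour hard lease limit.
{code}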

> Can't read file after hdfs restart
> --
>
> Key: HDFS-9909
> URL: https://issues.apache.org/jira/browse/HDFS-9909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, namenode
>Affects Versions: 2.7.1, 2.7.2
>Reporter: Bogdan Raducanu
>Priority: Critical
> Attachments: Main.java
>
>
> If HDFS is restarted while a file is open for writing then new clients can't 
> read that file until the hard lease limit expires and block recovery starts.
> Scenario:
> 1. write to file, call hflush
> 2. without closing the file, restart hdfs 
> 3. after hdfs is back up, opening file for reading from a new client fails 
> for 1 hour
> Repro attached.
> Thoughts:
> * possibly this also happens in other cases not just when hdfs is restarted 
> (e.g. only all datanodes in pipeline are restarted)
> * As far as I can tell this happens because the last block is RWR and 
> getReplicaVisibleLength returns -1 for this. The recovery starts after hard 
> lease limit expires (so file is readable only after 1 hour).
> * one can call recoverLease which will start the lease recovery sooner, BUT, 
> how can one know when to call this? The exception thrown is IOException which 
> can happen for other reasons.
> I think a reasonable solution would be to return a specialized exception 
> (similar to AlreadyBeingCreatedException when trying to write to open file).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9909) Can't read file after hdfs restart

2016-04-04 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reassigned HDFS-9909:
---

Assignee: Xiao Chen

> Can't read file after hdfs restart
> --
>
> Key: HDFS-9909
> URL: https://issues.apache.org/jira/browse/HDFS-9909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, namenode
>Affects Versions: 2.7.1, 2.7.2
>Reporter: Bogdan Raducanu
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: Main.java
>
>
> If HDFS is restarted while a file is open for writing then new clients can't 
> read that file until the hard lease limit expires and block recovery starts.
> Scenario:
> 1. write to file, call hflush
> 2. without closing the file, restart hdfs 
> 3. after hdfs is back up, opening file for reading from a new client fails 
> for 1 hour
> Repro attached.
> Thoughts:
> * possibly this also happens in other cases not just when hdfs is restarted 
> (e.g. only all datanodes in pipeline are restarted)
> * As far as I can tell this happens because the last block is RWR and 
> getReplicaVisibleLength returns -1 for this. The recovery starts after hard 
> lease limit expires (so file is readable only after 1 hour).
> * one can call recoverLease which will start the lease recovery sooner, BUT, 
> how can one know when to call this? The exception thrown is IOException which 
> can happen for other reasons.
> I think a reasonable solution would be to return a specialized exception 
> (similar to AlreadyBeingCreatedException when trying to write to open file).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225347#comment-15225347
 ] 

Hadoop QA commented on HDFS-10207:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} HDFS-10207 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12796941/HDFS-10207-HDFS-9000.002.patch
 |
| JIRA Issue | HDFS-10207 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15060/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Support enable Hadoop IPC backoff without namenode restart
> --
>
> Key: HDFS-10207
> URL: https://issues.apache.org/jira/browse/HDFS-10207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10207-HDFS-9000.000.patch, 
> HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch
>
>
> It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a 
> namenode restart to protect namenode from being overloaded.
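
For context, a minimal sketch of how backoff is enabled today; the port number 8020 is only an example, and the point of this issue is to allow flipping the flag without a NameNode restart:

{code:java}
// Minimal sketch: RPC backoff is keyed by the RPC server's port, so for a
// NameNode RPC server listening on port 8020 (example value only) the
// property would be ipc.8020.backoff.enable. Today this takes effect only
// after a restart; this issue proposes making it changeable at runtime.
Configuration conf = new Configuration();
conf.setBoolean("ipc.8020.backoff.enable", true);
{code}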



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-04-04 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10207:
-
Status: Open  (was: Patch Available)

> Support enable Hadoop IPC backoff without namenode restart
> --
>
> Key: HDFS-10207
> URL: https://issues.apache.org/jira/browse/HDFS-10207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10207-HDFS-9000.000.patch, 
> HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch
>
>
> It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a 
> namenode restart to protect namenode from being overloaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-04-04 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10207:
-
Status: Patch Available  (was: Open)

> Support enable Hadoop IPC backoff without namenode restart
> --
>
> Key: HDFS-10207
> URL: https://issues.apache.org/jira/browse/HDFS-10207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10207-HDFS-9000.000.patch, 
> HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch
>
>
> It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a 
> namenode restart to protect namenode from being overloaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-04-04 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10207:
-
Attachment: HDFS-10207-HDFS-9000.002.patch

> Support enable Hadoop IPC backoff without namenode restart
> --
>
> Key: HDFS-10207
> URL: https://issues.apache.org/jira/browse/HDFS-10207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10207-HDFS-9000.000.patch, 
> HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch
>
>
> It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a 
> namenode restart to protect namenode from being overloaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-04-04 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225304#comment-15225304
 ] 

Xiaobing Zhou commented on HDFS-10207:
--

v002 added more unit tests.

> Support enable Hadoop IPC backoff without namenode restart
> --
>
> Key: HDFS-10207
> URL: https://issues.apache.org/jira/browse/HDFS-10207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10207-HDFS-9000.000.patch, 
> HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch
>
>
> It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a 
> namenode restart to protect namenode from being overloaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10250) Ozone: Add key Persistence

2016-04-04 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-10250:

Attachment: HDFS-10250-HDFS-7240.003.patch

Hopefully this slays the typo demon 

> Ozone: Add key Persistence
> --
>
> Key: HDFS-10250
> URL: https://issues.apache.org/jira/browse/HDFS-10250
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-10250-HDFS-7240.001.patch, 
> HDFS-10250-HDFS-7240.002.patch, HDFS-10250-HDFS-7240.003.patch
>
>
> Support writing keys to containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-04 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225283#comment-15225283
 ] 

Yongjun Zhang commented on HDFS-9732:
-

Hi [~ste...@apache.org],

Thanks a lot for your review and comments. Sorry for getting back late since I 
was out last week.

I just uploaded rev 03 to address your comments:
* I changed the method to {{toStringStable}} and added javadocs
* I found that, using a mock, it is hard to create a valid 
DelegationTokenIdentifier that can be parsed by 
{{AbstractDelegationTokenIdentifier#readFields}}, so I decided to plug my 
test into an existing test that starts a mini cluster
* I introduced a new test-visible method that returns a string; it is called 
in production to print to System.out and can also be checked in the unit test 
(see the sketch below)

Would you, [~cnauroth] and [~aw], please help take a look at rev 03? Thanks a 
lot.
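
For illustration, a minimal sketch of the shape of the change described in the bullets above; the accessor names and output format here are assumptions, not the committed patch:

{code:java}
// Hypothetical sketch only; accessor names and formatting are assumptions,
// not the committed HDFS-9732 change.
@Override
public String toString() {
  // Rich, human-oriented form; free to change between releases.
  return super.toString();
}

/** Stable string form that the unit test (and scripts) can safely check. */
public String toStringStable() {
  return getKind() + " token " + getSequenceNumber() + " for " + getOwner();
}
{code}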
 

> Remove DelegationTokenIdentifier.toString() —for better logging output
> --
>
> Key: HDFS-9732
> URL: https://issues.apache.org/jira/browse/HDFS-9732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Yongjun Zhang
> Attachments: HADOOP-12752-001.patch, HDFS-9732-000.patch, 
> HDFS-9732.001.patch, HDFS-9732.002.patch, HDFS-9732.003.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> HDFS {{DelegationTokenIdentifier.toString()}} adds some diagnostics info, 
> owner, sequence number. But its superclass,  
> {{AbstractDelegationTokenIdentifier}} contains a lot more information, 
> including token issue and expiry times.
> Because  {{DelegationTokenIdentifier.toString()}} doesn't include this data,
> information that is potentially useful for kerberos diagnostics is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-04 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9732:

Attachment: HDFS-9732.003.patch

> Remove DelegationTokenIdentifier.toString() —for better logging output
> --
>
> Key: HDFS-9732
> URL: https://issues.apache.org/jira/browse/HDFS-9732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Yongjun Zhang
> Attachments: HADOOP-12752-001.patch, HDFS-9732-000.patch, 
> HDFS-9732.001.patch, HDFS-9732.002.patch, HDFS-9732.003.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> HDFS {{DelegationTokenIdentifier.toString()}} adds some diagnostics info, 
> owner, sequence number. But its superclass,  
> {{AbstractDelegationTokenIdentifier}} contains a lot more information, 
> including token issue and expiry times.
> Because  {{DelegationTokenIdentifier.toString()}} doesn't include this data,
> information that is potentially useful for kerberos diagnostics is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225258#comment-15225258
 ] 

Chris Trezzo commented on HDFS-8791:


[~busbey] That being said, I did manually verify the previous directory 
contained the correct content on multiple datanodes (similar to what the above 
test case does). The rollback functionality seems to simply rename the previous 
directory back to current. As stated above, this functionality should be 
identical to the previous layout (and was presumably tested).

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225244#comment-15225244
 ] 

Chris Trezzo commented on HDFS-8791:


bq. My understanding was that this patch adds a new DataNode layout version 
which should be more efficient. I didn't think that it broke rolling upgrade 
(you should still be able to upgrade from an earlier layout version to this 
one). Did I miss something?

You are correct. This does not break rolling upgrade. This code path has been 
tested. The major concern is over the fact that you can not downgrade and have 
to do a rollback.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225242#comment-15225242
 ] 

Chris Trezzo commented on HDFS-8791:


bq. Please expand the release note to state that should something go wrong, 
rolling downgrade will not be possible and a rollback (which requires downtime) 
will be needed.

Done. To be clear, this new layout uses all of the same code paths as the first 
block-id-based layout; the only change is the number of directories. It should 
support the same mechanisms that the previous layout supported. From looking at 
the code, my understanding is that it supports rolling upgrades and rollbacks 
(i.e. downgrades are not supported).

bq. Has someone tested that rollback works after upgrading to a version with 
this patch? I saw someone manually examined a non-finalized previous directory, 
but I'm wondering if someone walked through the entire process.

I personally did not test the full rollback path. There is some minimal test 
coverage ensuring that the contents of the previous directory are as expected 
(see TestDataNodeRollingUpgrade#testWithLayoutChangeAndFinalize). The new 
layout should be very similar to the old layout, but I agree that is still no 
excuse not to test it.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated HDFS-8791:
---
Release Note: HDFS-8791 introduces a new datanode layout format. This 
layout is identical to the previous block-id-based layout except that it has a 
smaller 32x32 sub-directory structure in each data storage. On startup, the 
datanode will automatically upgrade its storages to this new layout. 
Currently, datanode layout changes support rolling upgrades; on the other hand, 
downgrading is not supported between datanode layout changes, and a rollback 
would be required.  (was: HDFS-8791 introduces a new datanode layout format. 
This layout is identical to the previous block id based layout except it has a 
smaller 32x32 sub-directory structure in each data storage. On startup, the 
datanode will automatically upgrade it's storages to this new layout.)

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10251) Ozone: Shutdown cleanly

2016-04-04 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-10251:

Attachment: HDFS-10251-HDFS-7240.001.patch

Posting for early code review; depends on HDFS-10250.

> Ozone: Shutdown cleanly
> ---
>
> Key: HDFS-10251
> URL: https://issues.apache.org/jira/browse/HDFS-10251
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-10251-HDFS-7240.001.patch
>
>
> Ozone containers need to shut down cleanly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225151#comment-15225151
 ] 

Colin Patrick McCabe commented on HDFS-8791:


Hmm.  My understanding was that this patch adds a new DataNode layout version 
which should be more efficient.  I didn't think that it broke rolling upgrade 
(you should still be able to upgrade from an earlier layout version to this 
one).  Did I miss something?  On the other hand, clearly, with a new layout 
version you will not be able to downgrade, but will have to roll back, which is 
a definite tradeoff -- and maybe a reason to keep it out of "dot releases" (but 
not minor releases).

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225148#comment-15225148
 ] 

Hudson commented on HDFS-10178:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9552 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9552/])
HDFS-10178. Permanent write failures can happen if pipeline recoveries (kihwal: 
rev a7d1fb0cd2fdbf830602eb4dbbd9bbe62f4d5584)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java


> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch, HDFS-10178.v5.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the write op creates a 
> pipeline, but the actual data packet does not reach one or more datanodes in 
> time, the pipeline recovery will be done against the 0-byte partial block.  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10178:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   Status: Resolved  (was: Patch Available)

Committed to trunk through branch-2.7. The 2.7 cherry-pick was clean, but the 
test was modified to use {{DFSConfigKeys}} instead of {{HdfsClientConfigKeys}}. 
Thanks for the reviews and comments, Akira, Arpit, Masatake and Vinay.

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch, HDFS-10178.v5.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the write op creates a 
> pipeline, but the actual data packet does not reach one or more datanodes in 
> time, the pipeline recovery will be done against the 0-byte partial block.  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10216) distcp -diff relative path exception

2016-04-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225130#comment-15225130
 ] 

Jing Zhao commented on HDFS-10216:
--

Looks like {{getPathWithSchemeAndAuthority}} does not correctly handle relative 
paths when reconstructing the path:
{code}
}

return new Path(scheme, authority, path.toUri().getPath());
{code}

Do you want to provide a patch to fix this, [~jzhuge]?
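
One possible shape of the fix, sketched under the assumption that fully qualifying the path first is acceptable (this is not the committed patch): resolve the relative path against its FileSystem before rebuilding it, so the path component handed to the Path constructor is always absolute.

{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: qualify the path so "d1" resolves to an absolute path with
// scheme and authority before Path(scheme, authority, path) is called.
class PathQualifySketch {
  static Path getPathWithSchemeAndAuthority(Path path, Configuration conf)
      throws IOException {
    FileSystem fs = path.getFileSystem(conf);
    URI uri = fs.makeQualified(path).toUri();  // absolute path + scheme + authority
    return new Path(uri.getScheme(), uri.getAuthority(), uri.getPath());
  }
}
{code}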

> distcp -diff relative path exception
> 
>
> Key: HDFS-10216
> URL: https://issues.apache.org/jira/browse/HDFS-10216
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Critical
>
> Got this exception when running {{distcp -diff}} with relative paths:
> {code}
> $ hadoop distcp -update -diff s1 s2 d1 d2
> 16/03/25 09:45:40 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], 
> targetPath=d2, targetPathExists=true, preserveRawXattrs=false, 
> filtersFile='null'}
> 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at 
> jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032
> 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: 
> hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2
>   at org.apache.hadoop.fs.Path.initialize(Path.java:206)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:197)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:123)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:436)
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
> hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2
>   at java.net.URI.checkPath(URI.java:1804)
>   at java.net.URI.<init>(URI.java:752)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:203)
>   ... 11 more
> {code}
> But these commands worked:
> * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 
> /user/systest/d2}}
> * No {{-diff}}: {{hadoop distcp -update d1 d2}}
> However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 
> d1 d2}} again. I am not sure the problem only exists with option {{-diff}}. 
> Trying to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225103#comment-15225103
 ] 

Kihwal Lee commented on HDFS-10178:
---

{{TestHFlush}}: HDFS-2043. Will review the patch.
The JDK8 failures don't have logs, so they are hard to debug.
{{TestDFSClientRetries}}: timed out. It tried to restart the namenode, but that timed out too. 
Without seeing the log, it is hard to know what went wrong. 
{{TestReplication}}: timed out. The datanode shutdown hung at netty shutdown.
{noformat}
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
at io.netty.channel.nio.NioEventLoop.wakeup(NioEventLoop.java:590)
at 
io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:503)
at 
io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:160)
at 
io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:70)
at 
org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.close(DatanodeHttpServer.java:249)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1863)
{noformat}
{{TestBlockTokenWithDFS}}: The datanode was restarted and hit a bind 
exception; the old port was still taken.

The test failures are not related to this patch. They pass when run on my 
machine.

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch, HDFS-10178.v5.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the write op creates a 
> pipeline, but the actual data packet does not reach one or more datanodes in 
> time, the pipeline recovery will be done against the 0-byte partial block.  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10250) Ozone: Add key Persistence

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225090#comment-15225090
 ] 

Hadoop QA commented on HDFS-10250:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
43s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} HDFS-7240 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} HDFS-7240 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
10s {color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} HDFS-7240 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 35s 
{color} | {color:green} HDFS-7240 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 30s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 4s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 27s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 50s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.qjournal.client.TestQuorumJournalManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 

[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225062#comment-15225062
 ] 

Andrew Wang commented on HDFS-8791:
---

Do we actually support downgrade between 2.7 and 2.6? We changed the NameNode 
LayoutVersion, so I don't think so. These branches don't have HDFS-8432 either.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6962) ACLs inheritance conflict with umaskmode

2016-04-04 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6962:
--
Priority: Critical  (was: Major)
Hadoop Flags: Incompatible change

Bumping the importance of this issue for Hadoop 3.0. [~supputuri], are you still 
interested in working on this JIRA?

> ACLs inheritance conflict with umaskmode
> 
>
> Key: HDFS-6962
> URL: https://issues.apache.org/jira/browse/HDFS-6962
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.4.1
> Environment: CentOS release 6.5 (Final)
>Reporter: LINTE
>Assignee: Srikanth Upputuri
>Priority: Critical
>  Labels: hadoop, security
> Attachments: HDFS-6962.1.patch
>
>
> In hdfs-site.xml 
> <property>
>   <name>dfs.umaskmode</name>
>   <value>027</value>
> </property>
> 1/ Create a directory as superuser
> bash# hdfs dfs -mkdir  /tmp/ACLS
> 2/ Set default ACLs on this directory: rwx access for group readwrite and user 
> toto
> bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS
> bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS
> 3/ check ACLs /tmp/ACLS/
> bash# hdfs dfs -getfacl /tmp/ACLS/
> # file: /tmp/ACLS
> # owner: hdfs
> # group: hadoop
> user::rwx
> group::r-x
> other::---
> default:user::rwx
> default:user:toto:rwx
> default:group::r-x
> default:group:readwrite:rwx
> default:mask::rwx
> default:other::---
> user::rwx | group::r-x | other::--- matches the umaskmode defined in 
> hdfs-site.xml, so everything is OK!
> default:group:readwrite:rwx allows the readwrite group rwx access for 
> inheritance.
> default:user:toto:rwx allows the toto user rwx access for inheritance.
> default:mask::rwx: the inheritance mask is rwx, so no masking applies.
> 4/ Create a subdir to test inheritance of ACL
> bash# hdfs dfs -mkdir  /tmp/ACLS/hdfs
> 5/ check ACLs /tmp/ACLS/hdfs
> bash# hdfs dfs -getfacl /tmp/ACLS/hdfs
> # file: /tmp/ACLS/hdfs
> # owner: hdfs
> # group: hadoop
> user::rwx
> user:toto:rwx   #effective:r-x
> group::r-x
> group:readwrite:rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:toto:rwx
> default:group::r-x
> default:group:readwrite:rwx
> default:mask::rwx
> default:other::---
> Here we can see that the readwrite group has the rwx ACL but only r-x is effective 
> because the mask is r-x (mask::r-x), even though the default mask for inheritance 
> is set to default:mask::rwx on /tmp/ACLS/
> 6/ Modify hdfs-site.xml and restart the namenode
> <property>
>   <name>dfs.umaskmode</name>
>   <value>010</value>
> </property>
> 7/ Create a subdir to test inheritance of ACL with new parameter umaskmode
> bash# hdfs dfs -mkdir  /tmp/ACLS/hdfs2
> 8/ Check ACL on /tmp/ACLS/hdfs2
> bash# hdfs dfs -getfacl /tmp/ACLS/hdfs2
> # file: /tmp/ACLS/hdfs2
> # owner: hdfs
> # group: hadoop
> user::rwx
> user:toto:rwx   #effective:rw-
> group::r-x  #effective:r--
> group:readwrite:rwx #effective:rw-
> mask::rw-
> other::---
> default:user::rwx
> default:user:toto:rwx
> default:group::r-x
> default:group:readwrite:rwx
> default:mask::rwx
> default:other::---
> So HDFS masks the ACL values (user, group and other -- except the POSIX 
> owner) with the group mask of the dfs.umaskmode property when creating a 
> directory with inherited ACLs.
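
To make the masking rule in the report concrete, here is a tiny illustrative snippet. It only restates the POSIX effective-rights rule; it is not HDFS code and not a fix. The effective rights of a named entry are the entry ANDed with the mask, and the reported bug is that the child's mask comes from dfs.umaskmode rather than from the parent's default:mask.

{code}
import org.apache.hadoop.fs.permission.FsAction;

// Illustration only: effective rights = named entry AND mask.
class EffectiveAclSketch {
  public static void main(String[] args) {
    FsAction namedEntry = FsAction.ALL;                 // default:user:toto:rwx
    FsAction maskFromUmask027 = FsAction.READ_EXECUTE;  // child mask derived from umask 027
    FsAction effective = namedEntry.and(maskFromUmask027);
    System.out.println(effective.SYMBOL);               // prints "r-x", as in step 5
  }
}
{code}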



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-04-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225026#comment-15225026
 ] 

Vinod Kumar Vavilapalli commented on HDFS-8791:
---

I commented on the JIRA way back (see 
https://issues.apache.org/jira/browse/HDFS-8791?focusedCommentId=1503=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1503),
 saying what I said below. Unfortunately, I haven’t followed the patch along 
after my initial comment. 

This isn’t about any specific release - starting 2.6 we declared support for 
rolling upgrades and downgrades. Any patch that breaks this should not be in 
branch-2.

Two options from where I stand
 # For folks who worked on the patch: Is there a way to (a) make the 
upgrade-downgrade seamless for people who don’t care about this, and (b) have 
explicit documentation for people who care to switch this behavior on and are 
willing to risk not having downgrades? If this means a new configuration 
property, so be it. It’s a necessary evil.
 # Just let specific users backport this into specific 2.x branches they need 
and leave it only on trunk.

Unless this behavior stops breaking rolling upgrades/downgrades, I think we 
should just revert it from branch-2 and definitely 2.7.3 as it stands today.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10250) Ozone: Add key Persistence

2016-04-04 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225010#comment-15225010
 ] 

Chris Nauroth commented on HDFS-10250:
--

Hello [~anu].  In patch v002, I still see typos in the names of the chunk 
response objects:

{code}
  optional  WriteChunkReponseProto writeChunk = 12;
  optional  ReadChunkReponseProto readChunk = 13;
{code}


> Ozone: Add key Persistence
> --
>
> Key: HDFS-10250
> URL: https://issues.apache.org/jira/browse/HDFS-10250
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-10250-HDFS-7240.001.patch, 
> HDFS-10250-HDFS-7240.002.patch
>
>
> Support writing keys to containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade

2016-04-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225002#comment-15225002
 ] 

Vinod Kumar Vavilapalli commented on HDFS-8893:
---

Tx for the quick response, [~shahrs87]!

> DNs with failed volumes stop serving during rolling upgrade
> ---
>
> Key: HDFS-8893
> URL: https://issues.apache.org/jira/browse/HDFS-8893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Daryn Sharp
>Priority: Critical
>
> When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker 
> to each of their volumes. If one of the volumes is bad, this will fail. When 
> this failure happens, the DN does not update the key it received from the NN.
> Unfortunately, we had one failed volume on all 3 of the datanodes that were 
> holding the replicas.
> Keys expire after 20 hours, so at about 20 hours into the rolling upgrade, the 
> DNs with failed volumes will stop serving clients.
> Here is the stack trace on the datanode side:
> {noformat}
> 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN 
> datanode.DataNode: IOException in offerService
> java.io.IOException: Read-only file system
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:947)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}
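
A hypothetical mitigation, sketched under the assumption that skipping the bad volume is acceptable (not a committed patch, and the marker file name below is illustrative): write the marker per volume and catch the per-volume IOException, so a single read-only volume does not prevent the DN from acknowledging the rolling upgrade and refreshing its keys.

{code}
import java.io.File;
import java.io.IOException;
import java.util.List;

// Sketch only: tolerate a failed volume when writing the rolling upgrade marker.
class RollingUpgradeMarkerSketch {
  static void setMarkers(List<File> bpStorageDirs) {
    for (File dir : bpStorageDirs) {
      try {
        File marker = new File(dir, "rollingUpgrade.marker");  // illustrative name
        if (!marker.exists() && !marker.createNewFile()) {
          System.err.println("Could not create marker under " + dir);
        }
      } catch (IOException e) {
        // A read-only or failed volume should not block the key refresh on healthy ones.
        System.err.println("Skipping failed volume " + dir + ": " + e);
      }
    }
  }
}
{code}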



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade

2016-04-04 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224993#comment-15224993
 ] 

Rushabh S Shah commented on HDFS-8893:
--

This is still a bug, but I don't have enough cycles to work on it right now.
Moving it to the next release; hopefully it will be fixed by then.

> DNs with failed volumes stop serving during rolling upgrade
> ---
>
> Key: HDFS-8893
> URL: https://issues.apache.org/jira/browse/HDFS-8893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Daryn Sharp
>Priority: Critical
>
> When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker 
> to each of their volumes. If one of the volumes is bad, this will fail. When 
> this failure happens, the DN does not update the key it received from the NN.
> Unfortunately, we had one failed volume on all 3 of the datanodes that were 
> holding the replicas.
> Keys expire after 20 hours, so at about 20 hours into the rolling upgrade, the 
> DNs with failed volumes will stop serving clients.
> Here is the stack trace on the datanode side:
> {noformat}
> 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN 
> datanode.DataNode: IOException in offerService
> java.io.IOException: Read-only file system
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:947)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8871) Decommissioning of a node with a failed volume may not start

2016-04-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224994#comment-15224994
 ] 

Vinod Kumar Vavilapalli commented on HDFS-8871:
---

[~daryn] / [~kihwal], any update on this? Is this still a bug? Considering this 
for a 2.7.3 RC later this week. Thanks.


> Decommissioning of a node with a failed volume may not start
> 
>
> Key: HDFS-8871
> URL: https://issues.apache.org/jira/browse/HDFS-8871
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>Priority: Critical
>
> Since staleness may not be properly cleared, a node with a failed volume may 
> not actually get scanned for block replication. Nothing is being replicated 
> from these nodes.
> This bug does not manifest unless the datanode has a unique storage ID per 
> volume. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade

2016-04-04 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-8893:
-
Target Version/s: 2.7.4  (was: 2.7.3)

> DNs with failed volumes stop serving during rolling upgrade
> ---
>
> Key: HDFS-8893
> URL: https://issues.apache.org/jira/browse/HDFS-8893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Daryn Sharp
>Priority: Critical
>
> When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker 
> to each of their volumes. If one of the volumes is bad, this will fail. When 
> this failure happens, the DN does not update the key it received from the NN.
> Unfortunately, we had one failed volume on all 3 of the datanodes that were 
> holding the replicas.
> Keys expire after 20 hours, so at about 20 hours into the rolling upgrade, the 
> DNs with failed volumes will stop serving clients.
> Here is the stack trace on the datanode side:
> {noformat}
> 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN 
> datanode.DataNode: IOException in offerService
> java.io.IOException: Read-only file system
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:947)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224987#comment-15224987
 ] 

Hadoop QA commented on HDFS-10178:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 
99 unchanged - 2 fixed = 101 total (was 101) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 42s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 59s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 137m 58s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.TestReplication |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.TestHFlush |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12796873/HDFS-10178.v5.patch |
| JIRA Issue | HDFS-10178 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 47eac507384a 

[jira] [Commented] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade

2016-04-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224988#comment-15224988
 ] 

Vinod Kumar Vavilapalli commented on HDFS-8893:
---

[~daryn] / [~shahrs87], any progress on this? Is this still a bug? 
Considering this for a 2.7.3 RC later this week.


> DNs with failed volumes stop serving during rolling upgrade
> ---
>
> Key: HDFS-8893
> URL: https://issues.apache.org/jira/browse/HDFS-8893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Daryn Sharp
>Priority: Critical
>
> When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker 
> to each of their volumes. If one of the volumes is bad, this will fail. When 
> this failure happens, the DN does not update the key it received from the NN.
> Unfortunately, we had one failed volume on all 3 of the datanodes that were 
> holding the replicas.
> Keys expire after 20 hours, so at about 20 hours into the rolling upgrade, the 
> DNs with failed volumes will stop serving clients.
> Here is the stack trace on the datanode side:
> {noformat}
> 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN 
> datanode.DataNode: IOException in offerService
> java.io.IOException: Read-only file system
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:947)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224888#comment-15224888
 ] 

Masatake Iwasaki commented on HDFS-10178:
-

Thanks for the update, [~kihwal]. +1 too on v5.

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch, HDFS-10178.v5.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the write op creates a 
> pipeline, but the actual data packet does not reach one or more datanodes in 
> time, the pipeline recovery will be done against the 0-byte partial block.  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10235) [NN UI] Last contact for Live Nodes should be relative time

2016-04-04 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224876#comment-15224876
 ] 

Ravi Prakash commented on HDFS-10235:
-

I too am +1 for relative time for live nodes and absolute time for dead nodes.

> [NN UI] Last contact for Live Nodes should be relative time
> ---
>
> Key: HDFS-10235
> URL: https://issues.apache.org/jira/browse/HDFS-10235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10235.patch
>
>
> Last contact for Live Nodes should be shown as relative time, and we can keep the 
> absolute date for Dead Nodes. Relative time makes it more convenient to see the 
> last contact; we don't need to do extra math here.
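
For illustration only: the NN UI renders this client-side in JavaScript, but the conversion being requested is just elapsed-time formatting, roughly like the sketch below (names are made up for the example).

{code}
// Illustration only -- not the NN UI code.
class RelativeTimeSketch {
  static String relativeLastContact(long lastContactMillis) {
    long s = (System.currentTimeMillis() - lastContactMillis) / 1000;
    if (s < 60)   return s + "s ago";
    if (s < 3600) return (s / 60) + "m ago";
    return (s / 3600) + "h ago";
  }
}
{code}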



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9599) TestDecommissioningStatus.testDecommissionStatus occasionally fails

2016-04-04 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-9599:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
  Status: Resolved  (was: Patch Available)

Committed. Thanks, [~linyiqun] and [~jojochuang].

> TestDecommissioningStatus.testDecommissionStatus occasionally fails
> ---
>
> Key: HDFS-9599
> URL: https://issues.apache.org/jira/browse/HDFS-9599
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: HDFS-9599.001.patch, HDFS-9599.002.patch
>
>
> From test result of a recent jenkins nightly 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2663/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestDecommissioningStatus/testDecommissionStatus/
> The test failed because the number of under replicated blocks is 4, instead 
> of 3.
> Looking at the log, there is a stray block, which might have caused the 
> failure:
> {noformat}
> 2015-12-23 00:42:05,820 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2131)) - BLOCK* processReport: 
> blk_1073741825_1001 on node 127.0.0.1:57382 size 16384 does not belong to any 
> file
> {noformat}
> The block size 16384 suggests this is left over from the sibling test case 
> testDecommissionStatusAfterDNRestart. This can happen, because the same 
> minidfs cluster is reused between tests.
> The test implementation should do a better job isolating tests.
> Another case of failure is when the load factor comes into play, and a block 
> can not find sufficient data nodes to place replica. In this test, the 
> runtime should not consider load factor:
> {noformat}
> conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, 
> false);
> {noformat}
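
A sketch of the isolation the report asks for, assuming a fresh MiniDFSCluster per test with load-based placement disabled (names follow the snippet above; this is not the committed test change):

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.After;
import org.junit.Before;

// Sketch only: a fresh cluster per test prevents blocks from one test
// bleeding into the next, and CONSIDERLOAD is disabled as suggested.
public class DecommissionIsolationSketch {
  private MiniDFSCluster cluster;

  @Before
  public void setUp() throws IOException {
    Configuration conf = new HdfsConfiguration();
    conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, false);
    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
    cluster.waitActive();
  }

  @After
  public void tearDown() {
    if (cluster != null) {
      cluster.shutdown();
    }
  }
}
{code}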



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9599) TestDecommissioningStatus.testDecommissionStatus occasionally fails

2016-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224757#comment-15224757
 ] 

Hudson commented on HDFS-9599:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9551 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9551/])
HDFS-9599. TestDecommissioningStatus.testDecommissionStatus occasionally 
(iwasakims: rev 154d2532cf015e9ab9141864bd3ab0d6100ef597)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java


> TestDecommissioningStatus.testDecommissionStatus occasionally fails
> ---
>
> Key: HDFS-9599
> URL: https://issues.apache.org/jira/browse/HDFS-9599
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Lin Yiqun
> Attachments: HDFS-9599.001.patch, HDFS-9599.002.patch
>
>
> From test result of a recent jenkins nightly 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2663/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestDecommissioningStatus/testDecommissionStatus/
> The test failed because the number of under replicated blocks is 4, instead 
> of 3.
> Looking at the log, there is a stray block, which might have caused the 
> failure:
> {noformat}
> 2015-12-23 00:42:05,820 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2131)) - BLOCK* processReport: 
> blk_1073741825_1001 on node 127.0.0.1:57382 size 16384 does not belong to any 
> file
> {noformat}
> The block size 16384 suggests this is left over from the sibling test case 
> testDecommissionStatusAfterDNRestart. This can happen, because the same 
> minidfs cluster is reused between tests.
> The test implementation should do a better job isolating tests.
> Another case of failure is when the load factor comes into play and a block 
> cannot find sufficient datanodes to place replicas. In this test, the 
> runtime should not consider the load factor:
> {noformat}
> conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, 
> false);
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9599) TestDecommissioningStatus.testDecommissionStatus occasionally fails

2016-04-04 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224729#comment-15224729
 ] 

Masatake Iwasaki commented on HDFS-9599:


+1

> TestDecommissioningStatus.testDecommissionStatus occasionally fails
> ---
>
> Key: HDFS-9599
> URL: https://issues.apache.org/jira/browse/HDFS-9599
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Lin Yiqun
> Attachments: HDFS-9599.001.patch, HDFS-9599.002.patch
>
>
> From test result of a recent jenkins nightly 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2663/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestDecommissioningStatus/testDecommissionStatus/
> The test failed because the number of under replicated blocks is 4, instead 
> of 3.
> Looking at the log, there is a stray block, which might have caused the 
> failure:
> {noformat}
> 2015-12-23 00:42:05,820 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2131)) - BLOCK* processReport: 
> blk_1073741825_1001 on node 127.0.0.1:57382 size 16384 does not belong to any 
> file
> {noformat}
> The block size 16384 suggests this is left over from the sibling test case 
> testDecommissionStatusAfterDNRestart. This can happen, because the same 
> minidfs cluster is reused between tests.
> The test implementation should do a better job isolating tests.
> Another case of failure is when the load factor comes into play and a block 
> cannot find sufficient datanodes to place replicas. In this test, the 
> runtime should not consider the load factor:
> {noformat}
> conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, 
> false);
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224718#comment-15224718
 ] 

Arpit Agarwal commented on HDFS-10178:
--

+1 pending Jenkins.

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch, HDFS-10178.v5.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the write op creates a 
> pipeline, but the actual data packet does not reach one or more datanodes in 
> time, the pipeline recovery will be done against the 0-byte partial block.  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2016-04-04 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224675#comment-15224675
 ] 

Arpit Agarwal commented on HDFS-8277:
-

Just to make it explicit, I am -1 on that approach.

> Safemode enter fails when Standby NameNode is down
> --
>
> Key: HDFS-8277
> URL: https://issues.apache.org/jira/browse/HDFS-8277
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.6.0
> Environment: HDP 2.2.0
>Reporter: Hari Sekhon
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, 
> HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch
>
>
> HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to 
> AMBARI-10536).
> {code}hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused{code}
> This appears to be a bug in that it's not trying both NameNodes like the 
> standard hdfs client code does, and is instead stopping after getting a 
> connection refused from nn1 which is down. I verified normal hadoop fs writes 
> and reads via cli did work at this time, using nn2. I happened to run this 
> command as the hdfs user on nn2 which was the surviving Active NameNode.
> After I re-bootstrapped the Standby NN to fix it the command worked as 
> expected again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2016-04-04 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224665#comment-15224665
 ] 

Arpit Agarwal commented on HDFS-8277:
-

Hi [~surendrasingh], I am not convinced the v1 patch is a good idea. I've 
explained my reasoning in earlier comments.

> Safemode enter fails when Standby NameNode is down
> --
>
> Key: HDFS-8277
> URL: https://issues.apache.org/jira/browse/HDFS-8277
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.6.0
> Environment: HDP 2.2.0
>Reporter: Hari Sekhon
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, 
> HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch
>
>
> HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to 
> AMBARI-10536).
> {code}hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused{code}
> This appears to be a bug in that it's not trying both NameNodes like the 
> standard hdfs client code does, and is instead stopping after getting a 
> connection refused from nn1 which is down. I verified normal hadoop fs writes 
> and reads via cli did work at this time, using nn2. I happened to run this 
> command as the hdfs user on nn2 which was the surviving Active NameNode.
> After I re-bootstrapped the Standby NN to fix it the command worked as 
> expected again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10178:
--
Attachment: HDFS-10178.v5.patch

Fifth time's the charm. Moved the fault injection code. Verified that it still fails 
without the fix and passes with the fix.

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch, HDFS-10178.v5.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the write op creates a 
> pipeline, but the actual data packet does not reach one or more datanodes in 
> time, the pipeline recovery will be done against the 0-byte partial block.  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7661) [umbrella] support hflush and hsync for erasure coded files

2016-04-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224630#comment-15224630
 ] 

Jing Zhao commented on HDFS-7661:
-

I agree with [~demongaorui]'s comment here. In general, storing data cells will 
not simplify the problem when we have multiple flushes within the same stripe. 
Starting from the 2nd flush, to keep the "last time flushed" data safe during 
the current flush, we need a mechanism similar to the one proposed in the current 
design.

> [umbrella] support hflush and hsync for erasure coded files
> ---
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, 
> HDFS-EC-file-flush-sync-design-v20160323.pdf, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10231) libhdfs++: Fix race conditions in RPC layer

2016-04-04 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-10231:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed with c2e1e23d8ea9a6002c1aafe18f7e6e7e41bb3301

> libhdfs++: Fix race conditions in RPC layer
> ---
>
> Key: HDFS-10231
> URL: https://issues.apache.org/jira/browse/HDFS-10231
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: Bob Hansen
> Attachments: HDFS-10231.HDFS-8707.000.patch, 
> HDFS-10231.HDFS-8707.001.patch, HDFS-10231.HDFS-8707.002.patch, 
> HDFS-10231.HDFS-8707.003.patch
>
>
> Namenode calls seem to hang and the retry logic never properly kicks in.  It 
> looks like there's a race condition that leads to a failed rpc call never 
> properly passing the request object to the new RpcConnectionImpl so things 
> hang forever.  RpcConnectionImpl objects are also kept alive longer than they 
> should because of a shared_ptr cycle between them and the timeout tracking 
> objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10245) Fix the findbug warnings in branch-2.7

2016-04-04 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224316#comment-15224316
 ] 

Allen Wittenauer commented on HDFS-10245:
-

bq. Seems to be, findbug report shows 
branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html instead of 
patch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html

Nope, the findbugs report is acting as expected.

There was 1 findbugs warning found pre-patch, so it points to the branch log to show 
you what that one is. There are no *additional* findbugs warnings, therefore 
there is no report for the patch. There is no value in pointing to an empty log.

> Fix the findbug warnings in branch-2.7
> --
>
> Key: HDFS-10245
> URL: https://issues.apache.org/jira/browse/HDFS-10245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10245-branch-2.7.patch
>
>
> *Link*
> https://builds.apache.org/job/PreCommit-HDFS-Build/15036/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
> Following seen in HDFS-9917 jenkins run.
> {noformat}
> Code  Warning
> ISInconsistent synchronization of 
> org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 85% of time
> Bug type IS2_INCONSISTENT_SYNC (click for details) 
> In class org.apache.hadoop.hdfs.DFSOutputStream
> Field org.apache.hadoop.hdfs.DFSOutputStream.streamer
> Synchronized 85% of the time
> Unsynchronized access at DFSOutputStream.java:[line 2372]
> Unsynchronized access at DFSOutputStream.java:[line 2390]
> Unsynchronized access at DFSOutputStream.java:[line 1722]
> Synchronized access at DFSOutputStream.java:[line 2178]
> Synchronized access at DFSOutputStream.java:[line 2100]
> Synchronized access at DFSOutputStream.java:[line 2103]
> Synchronized access at DFSOutputStream.java:[line 2264]
> Synchronized access at DFSOutputStream.java:[line 1555]
> Synchronized access at DFSOutputStream.java:[line 1558]
> Synchronized access at DFSOutputStream.java:[line 2166]
> Synchronized access at DFSOutputStream.java:[line 2208]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2209]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2038]
> Synchronized access at DFSOutputStream.java:[line 2062]
> Synchronized access at DFSOutputStream.java:[line 2063]
> Synchronized access at DFSOutputStream.java:[line 2355]
> {noformat}
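For readers unfamiliar with this FindBugs category, IS2_INCONSISTENT_SYNC means a field is guarded by a lock in most accesses but not all of them. A toy example (not the DFSOutputStream code) and the usual fix:

{code:java}
// Toy class, for illustration only -- not DFSOutputStream.
class Counter {
  private long value;               // intended to be guarded by "this"

  synchronized void increment() {   // synchronized access
    value++;
  }

  long unsafeRead() {               // unsynchronized access: FindBugs flags
    return value;                   // the field as inconsistently synchronized
  }

  synchronized long safeRead() {    // fix: take the same lock on every access
    return value;
  }
}
{code}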



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10178:
--
Description: 
We have observed that write fails permanently if the first packet doesn't go 
through properly and pipeline recovery happens. If the write op creates a 
pipeline, but the actual data packet does not reach one or more datanodes in 
time, the pipeline recovery will be done against the 0-byte partial block.  

If additional datanodes are added, the block is transferred to the new nodes.  
After the transfer, each node will have a meta file containing the header and 
0-length data block file. The pipeline recovery seems to work correctly up to 
this point, but write fails when actual data packet is resent. 

  was:
We have observed that write fails permanently if the first packet doesn't go 
through properly and pipeline recovery happens. If the packet header is sent 
out, but the data portion of the packet does not reach one or more datanodes in 
time, the pipeline recovery will be done against the 0-byte partial block.  

If additional datanodes are added, the block is transferred to the new nodes.  
After the transfer, each node will have a meta file containing the header and 
0-length data block file. The pipeline recovery seems to work correctly up to 
this point, but write fails when actual data packet is resent. 


> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the write op creates a 
> pipeline, but the actual data packet does not reach one or more datanodes in 
> time, the pipeline recovery will be done against the 0-byte partial block.  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224294#comment-15224294
 ] 

Kihwal Lee commented on HDFS-10178:
---

[~vinayrpet], sorry, I gave a confusing description of the problem. I was 
mixing up the meta file header and the non-payload protobuf fields. After a 
connection is made and the command is parsed, a {{BlockReceiver}} is created 
and {{createRbw()}} is called before getting to the packet. It creates a meta 
file with the header only. If this file is used for transferring the block, the checksum 
type is lost.

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the packet header is sent 
> out, but the data portion of the packet does not reach one or more datanodes 
> in time, the pipeline recovery will be done against the 0-byte partial block. 
>  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6376) Distcp data between two HA clusters requires another configuration

2016-04-04 Thread Michael Wall (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Wall updated HDFS-6376:
---
Assignee: Dave Marion  (was: Michael Wall)

> Distcp data between two HA clusters requires another configuration
> --
>
> Key: HDFS-6376
> URL: https://issues.apache.org/jira/browse/HDFS-6376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, federation, hdfs-client
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
> Environment: Hadoop 2.3.0
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 2.6.0
>
> Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, 
> HDFS-6376-4-branch-2.4.patch, HDFS-6376-5-trunk.patch, 
> HDFS-6376-6-trunk.patch, HDFS-6376-7-trunk.patch, HDFS-6376-branch-2.4.patch, 
> HDFS-6376-patch-1.patch, HDFS-6376.000.patch, HDFS-6376.008.patch, 
> HDFS-6376.009.patch, HDFS-6376.010.patch, HDFS-6376.011.patch
>
>
> User has to create a third set of configuration files for distcp when 
> transferring data between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties 
> in core-site.xml and hdfs-site.xml for the client to resolve the location of 
> both active namenodes. If you do, then the datanodes from cluster A may join 
> cluster B. I cannot find a configuration option that tells the datanodes to 
> federate blocks for only one of the clusters in the configuration.
> [1] 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E
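For context, a sketch of the kind of client-only configuration a distcp job needs in order to resolve both HA nameservices. The nameservice IDs and hostnames are placeholders, and this shows the standard HA client keys rather than the patch attached here:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class DistcpHaClientConfSketch {
  /** Client-side view of two HA clusters; all values are placeholders. */
  static Configuration bothClusters() {
    Configuration conf = new Configuration();
    conf.set("dfs.nameservices", "clusterA,clusterB");

    conf.set("dfs.ha.namenodes.clusterA", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.clusterA.nn1", "a-nn1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.clusterA.nn2", "a-nn2.example.com:8020");
    conf.set("dfs.client.failover.proxy.provider.clusterA",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

    conf.set("dfs.ha.namenodes.clusterB", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.clusterB.nn1", "b-nn1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.clusterB.nn2", "b-nn2.example.com:8020");
    conf.set("dfs.client.failover.proxy.provider.clusterB",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    return conf;
  }
}
{code}

Keeping these keys only in the distcp client's configuration (the "third set" of files mentioned above) is what prevents the datanodes from seeing the second nameservice.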



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.

2016-04-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224222#comment-15224222
 ] 

Brahma Reddy Battula commented on HDFS-9917:


{{TestRenameSnapshot}}, {{TestDecommissioningStatus}} and {{TestBalancer}} passed locally.
{{TestHFlush}} is tracked in HDFS-2043...

Kindly review the patch.

> IBR accumulate more objects when SNN was down for sometime.
> ---
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Attachments: HDFS-9917-02.patch, HDFS-9917-branch-2.7-002.patch, 
> HDFS-9917-branch-2.7.patch, HDFS-9917.patch
>
>
> SNN was down for some time for various reasons. After restarting, the SNN 
> became unresponsive because:
> - 29 DNs were sending IBRs of about 5 million entries each (most of them delete IBRs), 
> whereas each datanode had only ~2.5 million blocks.
> - GC cannot reclaim these objects since they all sit in the RPC queue. 
> To recover (to clear these objects), all the DNs were restarted one by 
> one. This issue happened in 2.4.1, where splitting of the block report was not 
> available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6376) Distcp data between two HA clusters requires another configuration

2016-04-04 Thread loushang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

loushang updated HDFS-6376:
---
Assignee: Michael Wall  (was: Dave Marion)

> Distcp data between two HA clusters requires another configuration
> --
>
> Key: HDFS-6376
> URL: https://issues.apache.org/jira/browse/HDFS-6376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, federation, hdfs-client
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
> Environment: Hadoop 2.3.0
>Reporter: Dave Marion
>Assignee: Michael Wall
> Fix For: 2.6.0
>
> Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, 
> HDFS-6376-4-branch-2.4.patch, HDFS-6376-5-trunk.patch, 
> HDFS-6376-6-trunk.patch, HDFS-6376-7-trunk.patch, HDFS-6376-branch-2.4.patch, 
> HDFS-6376-patch-1.patch, HDFS-6376.000.patch, HDFS-6376.008.patch, 
> HDFS-6376.009.patch, HDFS-6376.010.patch, HDFS-6376.011.patch
>
>
> User has to create a third set of configuration files for distcp when 
> transferring data between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties 
> in core-site.xml and hdfs-site.xml for the client to resolve the location of 
> both active namenodes. If you do, then the datanodes from cluster A may join 
> cluster B. I cannot find a configuration option that tells the datanodes to 
> federate blocks for only one of the clusters in the configuration.
> [1] 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224205#comment-15224205
 ] 

Vinayakumar B commented on HDFS-10178:
--

Hi [~kihwal], I am trying to understand the issue; maybe my understanding 
of packet sending/receiving is wrong. 
bq. If the packet header is sent out, but the data portion of the packet does 
not reach one or more datanodes in time
How can only the header be sent out without any data/datalen? The header carries the 
payload length, so the PacketReceiver should fail or wait until it receives the entire 
packet (not just the header portion), right? 
Or is the payload length in the incoming packet corrupted to 0?
{code:java|title=PacketReceiver.java#doRead(..)}
// Each packet looks like:
//   PLEN    HLEN    HEADER    CHECKSUMS    DATA
//   32-bit  16-bit
//
// PLEN:  Payload length
//= length(PLEN) + length(CHECKSUMS) + length(DATA)
//This length includes its own encoded length in
//the sum for historical reasons.
//
// HLEN:  Header length
//= length(HEADER)
//
// HEADER:the actual packet header fields, encoded in protobuf
// CHECKSUMS: the crcs for the data chunk. May be missing if
//checksums were not requested
// DATA   the actual block data
{code}

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the packet header is sent 
> out, but the data portion of the packet does not reach one or more datanodes 
> in time, the pipeline recovery will be done against the 0-byte partial block. 
>  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-04-04 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224116#comment-15224116
 ] 

Kihwal Lee commented on HDFS-10178:
---

bq. Should the test logic be encapsulated in the DataNodeFaultInjector's method? 
Good point. That's where it logically belongs, and the code will look cleaner.
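For context, a generic sketch of the fault-injection pattern being discussed. The class name and the hook name below are hypothetical stand-ins; the real hook lives in DataNodeFaultInjector and its exact name and installation mechanism are not shown here:

{code:java}
import java.io.IOException;

// Hypothetical stand-in for DataNodeFaultInjector: production code calls a
// no-op hook, and a test swaps in a subclass that fails at the interesting
// point (here, before the first packet is sent).
public class FaultInjectorSketch {
  private static FaultInjectorSketch instance = new FaultInjectorSketch();

  public static FaultInjectorSketch get() { return instance; }
  public static void set(FaultInjectorSketch injector) { instance = injector; }

  /** No-op in production; a test overrides this to inject a failure. */
  public void beforeSendingFirstPacket() throws IOException { }
}
{code}

A test would then install an anonymous subclass whose beforeSendingFirstPacket() throws an IOException, forcing the pipeline-recovery-on-first-packet path this JIRA exercises.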

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the packet header is sent 
> out, but the data portion of the packet does not reach one or more datanodes 
> in time, the pipeline recovery will be done against the 0-byte partial block. 
>  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10256) Use GenericTestUtils.getTestDir method in tests for temporary directories

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224092#comment-15224092
 ] 

ASF GitHub Bot commented on HDFS-10256:
---

GitHub user vinayakumarb opened a pull request:

https://github.com/apache/hadoop/pull/90

HDFS-10256. Use GenericTestUtils.getTestDir method in tests for temporary 
directories

Separated the HDFS changes from HADOOP-12984

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinayakumarb/hadoop features/HDFS-10256

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/90.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #90


commit c029cee003492b68954be8a20cfc23ad22bedaa5
Author: Vinayakumar B 
Date:   2016-04-04T12:54:37Z

HADOOP-12984. Add GenericTestUtils.getTestDir method and use it for 
temporary directory in tests

commit 1e7c43187deb18fe3bd56250fc6620d82d0602bb
Author: Vinayakumar B 
Date:   2016-04-04T13:04:24Z

HDFS-10256. Use GenericTestUtils.getTestDir method in tests for temporary 
directories




> Use GenericTestUtils.getTestDir method in tests for temporary directories
> -
>
> Key: HDFS-10256
> URL: https://issues.apache.org/jira/browse/HDFS-10256
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10256) Use GenericTestUtils.getTestDir method in tests for temporary directories

2016-04-04 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224077#comment-15224077
 ] 

Vinayakumar B commented on HDFS-10256:
--

Depends on HADOOP-12984.

> Use GenericTestUtils.getTestDir method in tests for temporary directories
> -
>
> Key: HDFS-10256
> URL: https://issues.apache.org/jira/browse/HDFS-10256
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10256) Use GenericTestUtils.getTestDir method in tests for temporary directories

2016-04-04 Thread Vinayakumar B (JIRA)
Vinayakumar B created HDFS-10256:


 Summary: Use GenericTestUtils.getTestDir method in tests for 
temporary directories
 Key: HDFS-10256
 URL: https://issues.apache.org/jira/browse/HDFS-10256
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: build, test
Reporter: Vinayakumar B
Assignee: Vinayakumar B






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10245) Fix the findbug warnings in branch-2.7

2016-04-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224037#comment-15224037
 ] 

Brahma Reddy Battula commented on HDFS-10245:
-

Hi [~aw]

 *It seems the findbugs report link shows 
{{branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html}} 
{color:blue}instead of{color} 
{{patch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html}}.* 

 *The post-patch run shows "0" warnings, which is not considered for the QA report.* 

{noformat}


 Patch findbugs detection




cd /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs
mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hadoop-branch-2.7-0 -Ptest-patch 
-DskipTests test-compile findbugs:findbugs -DskipTests=true > 
/testptch/hadoop/patchprocess/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt
 2>&1
Elapsed:   3m  1s
Starting with 
/testptch/hadoop/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.xml
Merging 
/testptch/hadoop/patchprocess/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.xml
Writing 
/testptch/hadoop/patchprocess/combined-findbugs-hadoop-hdfs-project_hadoop-hdfs.xml

hadoop-hdfs-project/hadoop-hdfs generated 0 new + 0 unchanged - 1 fixed = 0 
total (was 1)
{noformat}

 *The pre-patch run shows "1" warning, which is the one reported by QA.* 
{noformat}


   Pre-patch findbugs detection




cd /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs
mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hadoop-branch-2.7-0 -Ptest-patch 
-DskipTests test-compile findbugs:findbugs -DskipTests=true > 
/testptch/hadoop/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt
 2>&1
Elapsed:   2m 50s

hadoop-hdfs-project/hadoop-hdfs in branch-2.7 has 1 extant Findbugs warnings.
{noformat}


> Fix the findbug warnings in branch-2.7
> --
>
> Key: HDFS-10245
> URL: https://issues.apache.org/jira/browse/HDFS-10245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10245-branch-2.7.patch
>
>
> *Link*
> https://builds.apache.org/job/PreCommit-HDFS-Build/15036/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
> Following seen in HDFS-9917 jenkins run.
> {noformat}
> Code  Warning
> ISInconsistent synchronization of 
> org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 85% of time
> Bug type IS2_INCONSISTENT_SYNC (click for details) 
> In class org.apache.hadoop.hdfs.DFSOutputStream
> Field org.apache.hadoop.hdfs.DFSOutputStream.streamer
> Synchronized 85% of the time
> Unsynchronized access at DFSOutputStream.java:[line 2372]
> Unsynchronized access at DFSOutputStream.java:[line 2390]
> Unsynchronized access at DFSOutputStream.java:[line 1722]
> Synchronized access at DFSOutputStream.java:[line 2178]
> Synchronized access at DFSOutputStream.java:[line 2100]
> Synchronized access at DFSOutputStream.java:[line 2103]
> Synchronized access at DFSOutputStream.java:[line 2264]
> Synchronized access at DFSOutputStream.java:[line 1555]
> Synchronized access at DFSOutputStream.java:[line 1558]
> Synchronized access at DFSOutputStream.java:[line 2166]
> Synchronized access at DFSOutputStream.java:[line 2208]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2209]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2038]
> Synchronized access at DFSOutputStream.java:[line 2062]
> Synchronized access at DFSOutputStream.java:[line 2063]
> Synchronized access at DFSOutputStream.java:[line 2355]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10235) [NN UI] Last contact for Live Nodes should be relative time

2016-04-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223974#comment-15223974
 ] 

Brahma Reddy Battula commented on HDFS-10235:
-

 {{lastContact}} is given in seconds by the NameNode; it is currently being 
converted to an absolute timestamp on the UI side.
 
If we convert that to msec, the values will just be multiples of 1000, so it is 
better to keep it in seconds. This is only for a user/admin to read, not for any 
metrics.

> [NN UI] Last contact for Live Nodes should be relative time
> ---
>
> Key: HDFS-10235
> URL: https://issues.apache.org/jira/browse/HDFS-10235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10235.patch
>
>
> Last contact for Live Nodes should be relative time, and we can keep the absolute 
> date for Dead Nodes. Relative time makes it more convenient to see the last 
> contact; no extra math is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client

2016-04-04 Thread Vishal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223928#comment-15223928
 ] 

Vishal commented on HDFS-6549:
--

Hi Team,

Could you please suggest how to install these patches and which server we 
should apply them on?

Thanks.

Vish

> Add support for accessing the NFS gateway from the AIX NFS client
> -
>
> Key: HDFS-6549
> URL: https://issues.apache.org/jira/browse/HDFS-6549
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 2.5.0
>
> Attachments: HDFS-6549.patch, HDFS-6549.patch, HDFS-6549.patch
>
>
> We've identified two issues when trying to access the HDFS NFS Gateway from 
> an AIX NFS client:
> # In the case of COMMITs, the AIX NFS client will always send 4096, or a 
> multiple of the page size, for the offset to be committed, even if fewer 
> bytes than this have ever, or will ever, be written to the file. This will 
> cause a write to a file from the AIX NFS client to hang on close unless the 
> size of that file is a multiple of 4096.
> # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the 
> same cookie verifier for a given directory seemingly forever after that 
> directory is first accessed over NFS, instead of getting a new cookie 
> verifier for every set of incremental readdir calls. This means that if a 
> directory's mtime ever changes, the FS must be unmounted/remounted before 
> readdir calls on that dir from AIX will ever succeed again.
> From my interpretation of RFC-1813, the NFS Gateway is in fact doing the 
> correct thing in both cases, but we can introduce simple changes on the NFS 
> Gateway side to be able to optionally work around these incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9902) dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223848#comment-15223848
 ] 

Hadoop QA commented on HDFS-9902:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 45s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 23s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 144m 1s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestHAMetrics |
|   | hadoop.hdfs.server.namenode.TestNameNodeResourceChecker |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12796727/HDFS-9902-02.patch |
| JIRA Issue | HDFS-9902 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 47883e3846e1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool 

[jira] [Commented] (HDFS-10245) Fix the findbug warnings in branch-2.7

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223846#comment-15223846
 ] 

Hadoop QA commented on HDFS-10245:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 8m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
17s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} branch-2.7 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 53s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in branch-2.7 has 1 
extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 
120 unchanged - 1 fixed = 121 total (was 121) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1239 line(s) that end in whitespace. Use 
git apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 27s 
{color} | {color:red} The patch has 248 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
22s {color} | {color:green} hadoop-hdfs-project/hadoop-hdfs generated 0 new + 0 
unchanged - 1 fixed = 0 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 34s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 45m 7s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 9s 
{color} | {color:red} Patch generated 65 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 133m 59s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | 

[jira] [Commented] (HDFS-10235) [NN UI] Last contact for Live Nodes should be relative time

2016-04-04 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223803#comment-15223803
 ] 

Akira AJISAKA commented on HDFS-10235:
--

One comment for the patch:
{code:title=dfshealth.html}

  

  Node
  Last contact
{code}
* Would you add a time unit (msec?) to the last contact?

> [NN UI] Last contact for Live Nodes should be relative time
> ---
>
> Key: HDFS-10235
> URL: https://issues.apache.org/jira/browse/HDFS-10235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10235.patch
>
>
> Last contact for Live Nodes should be relative time, and we can keep the absolute 
> date for Dead Nodes. Relative time makes it more convenient to see the last 
> contact; no extra math is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10235) [NN UI] Last contact for Live Nodes should be relative time

2016-04-04 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223801#comment-15223801
 ] 

Akira AJISAKA commented on HDFS-10235:
--

Sorry, I missed that. I'm +1 for this change. Relative time for only live nodes 
is good, but there is an inconsistency between live nodes and dead nodes. 
However, it is (hopefully) a rare case that there are live nodes and dead nodes 
at the same time, so the inconsistency is acceptable to me. What do you think, 
[~raviprak]?

> [NN UI] Last contact for Live Nodes should be relative time
> ---
>
> Key: HDFS-10235
> URL: https://issues.apache.org/jira/browse/HDFS-10235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10235.patch
>
>
> Last contact for Live Nodes should be relative time, and we can keep the absolute 
> date for Dead Nodes. Relative time makes it more convenient to see the last 
> contact; no extra math is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10235) [NN UI] Last contact for Live Nodes should be relative time

2016-04-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223788#comment-15223788
 ] 

Brahma Reddy Battula commented on HDFS-10235:
-

It's only for live nodes, not for dead nodes.

> [NN UI] Last contact for Live Nodes should be relative time
> ---
>
> Key: HDFS-10235
> URL: https://issues.apache.org/jira/browse/HDFS-10235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10235.patch
>
>
> Last contact for Live Nodes should be relative time, and we can keep the absolute 
> date for Dead Nodes. Relative time makes it more convenient to see the last 
> contact; no extra math is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10235) [NN UI] Last contact for Live Nodes should be relative time

2016-04-04 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223781#comment-15223781
 ] 

Akira AJISAKA commented on HDFS-10235:
--

I'm -1 for the proposal. Absolute time is good for administrators: they can see when 
a failure happened without doing any calculation, and they use absolute timestamps when 
searching the daemon logs.

> [NN UI] Last contact for Live Nodes should be relative time
> ---
>
> Key: HDFS-10235
> URL: https://issues.apache.org/jira/browse/HDFS-10235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10235.patch
>
>
> Last contact for Live Nodes should be relative time, and we can keep the absolute 
> date for Dead Nodes. Relative time makes it more convenient to see the last 
> contact; no extra math is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10245) Fix the findbug warnings in branch-2.7

2016-04-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-10245:

Attachment: HDFS-10245-branch-2.7.patch

> Fix the findbug warnings in branch-2.7
> --
>
> Key: HDFS-10245
> URL: https://issues.apache.org/jira/browse/HDFS-10245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10245-branch-2.7.patch
>
>
> *Link*
> https://builds.apache.org/job/PreCommit-HDFS-Build/15036/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
> Following seen in HDFS-9917 jenkins run.
> {noformat}
> Code  Warning
> ISInconsistent synchronization of 
> org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 85% of time
> Bug type IS2_INCONSISTENT_SYNC (click for details) 
> In class org.apache.hadoop.hdfs.DFSOutputStream
> Field org.apache.hadoop.hdfs.DFSOutputStream.streamer
> Synchronized 85% of the time
> Unsynchronized access at DFSOutputStream.java:[line 2372]
> Unsynchronized access at DFSOutputStream.java:[line 2390]
> Unsynchronized access at DFSOutputStream.java:[line 1722]
> Synchronized access at DFSOutputStream.java:[line 2178]
> Synchronized access at DFSOutputStream.java:[line 2100]
> Synchronized access at DFSOutputStream.java:[line 2103]
> Synchronized access at DFSOutputStream.java:[line 2264]
> Synchronized access at DFSOutputStream.java:[line 1555]
> Synchronized access at DFSOutputStream.java:[line 1558]
> Synchronized access at DFSOutputStream.java:[line 2166]
> Synchronized access at DFSOutputStream.java:[line 2208]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2209]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2038]
> Synchronized access at DFSOutputStream.java:[line 2062]
> Synchronized access at DFSOutputStream.java:[line 2063]
> Synchronized access at DFSOutputStream.java:[line 2355]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10245) Fix the findbug warnings in branch-2.7

2016-04-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-10245:

Attachment: (was: HDFS-10245-branch-2.7 .patch)

> Fix the findbug warnings in branch-2.7
> --
>
> Key: HDFS-10245
> URL: https://issues.apache.org/jira/browse/HDFS-10245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>
> *Link*
> https://builds.apache.org/job/PreCommit-HDFS-Build/15036/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
> Following seen in HDFS-9917 jenkins run.
> {noformat}
> Code  Warning
> ISInconsistent synchronization of 
> org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 85% of time
> Bug type IS2_INCONSISTENT_SYNC (click for details) 
> In class org.apache.hadoop.hdfs.DFSOutputStream
> Field org.apache.hadoop.hdfs.DFSOutputStream.streamer
> Synchronized 85% of the time
> Unsynchronized access at DFSOutputStream.java:[line 2372]
> Unsynchronized access at DFSOutputStream.java:[line 2390]
> Unsynchronized access at DFSOutputStream.java:[line 1722]
> Synchronized access at DFSOutputStream.java:[line 2178]
> Synchronized access at DFSOutputStream.java:[line 2100]
> Synchronized access at DFSOutputStream.java:[line 2103]
> Synchronized access at DFSOutputStream.java:[line 2264]
> Synchronized access at DFSOutputStream.java:[line 1555]
> Synchronized access at DFSOutputStream.java:[line 1558]
> Synchronized access at DFSOutputStream.java:[line 2166]
> Synchronized access at DFSOutputStream.java:[line 2208]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2209]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2038]
> Synchronized access at DFSOutputStream.java:[line 2062]
> Synchronized access at DFSOutputStream.java:[line 2063]
> Synchronized access at DFSOutputStream.java:[line 2355]
> {noformat}
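
The usual way to clear a warning of this kind is to make the locking consistent. The sketch below shows one common approach, routing every access through synchronized accessors; it is offered only as a generic illustration of the fix pattern, not as the contents of the attached HDFS-10245 patch.

{code:java}
// Hypothetical fix pattern: every read and write of the field goes
// through synchronized accessors, so all accesses hold the same lock.
public class ConsistentSyncExample {

  /** Minimal stand-in type so the sketch compiles on its own. */
  static class DataStreamer {
    void start() { }
  }

  private DataStreamer streamer;

  private synchronized DataStreamer getStreamer() {
    return streamer;               // every read now holds the lock
  }

  private synchronized void setStreamer(DataStreamer s) {
    streamer = s;                  // every write now holds the lock
  }

  public void start(DataStreamer s) {
    setStreamer(s);
    s.start();
  }

  public boolean isRunning() {
    return getStreamer() != null;  // no longer an unsynchronized access
  }
}
{code}

When only visibility matters and there is no compound check-then-act on the field, declaring it volatile is an alternative that also removes the inconsistency without taking a lock on every access.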



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10245) Fix the findbug warnings in branch-2.7

2016-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223730#comment-15223730
 ] 

Hadoop QA commented on HDFS-10245:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s {color} 
| {color:red} HDFS-10245 does not apply to branch-2. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12796783/HDFS-10245-branch-2.7%20.patch
 |
| JIRA Issue | HDFS-10245 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15053/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Fix the findbug warnings in branch-2.7
> --
>
> Key: HDFS-10245
> URL: https://issues.apache.org/jira/browse/HDFS-10245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10245-branch-2.7 .patch
>
>
> *Link*
> https://builds.apache.org/job/PreCommit-HDFS-Build/15036/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
> Following seen in HDFS-9917 jenkins run.
> {noformat}
> Code  Warning
> IS    Inconsistent synchronization of 
> org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 85% of time
> Bug type IS2_INCONSISTENT_SYNC (click for details) 
> In class org.apache.hadoop.hdfs.DFSOutputStream
> Field org.apache.hadoop.hdfs.DFSOutputStream.streamer
> Synchronized 85% of the time
> Unsynchronized access at DFSOutputStream.java:[line 2372]
> Unsynchronized access at DFSOutputStream.java:[line 2390]
> Unsynchronized access at DFSOutputStream.java:[line 1722]
> Synchronized access at DFSOutputStream.java:[line 2178]
> Synchronized access at DFSOutputStream.java:[line 2100]
> Synchronized access at DFSOutputStream.java:[line 2103]
> Synchronized access at DFSOutputStream.java:[line 2264]
> Synchronized access at DFSOutputStream.java:[line 1555]
> Synchronized access at DFSOutputStream.java:[line 1558]
> Synchronized access at DFSOutputStream.java:[line 2166]
> Synchronized access at DFSOutputStream.java:[line 2208]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2209]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2038]
> Synchronized access at DFSOutputStream.java:[line 2062]
> Synchronized access at DFSOutputStream.java:[line 2063]
> Synchronized access at DFSOutputStream.java:[line 2355]
> {noformat}
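
When the unsynchronized access is known to be safe (for example, a benign racy null check), another option is to suppress the report and record the reason. The snippet below is only a hedged illustration of that route, assuming the FindBugs annotations artifact is on the classpath; it does not reflect what the HDFS-10245 patch actually does.

{code:java}
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;

// Class-level suppression; the justification documents why the racy read is safe.
@SuppressFBWarnings(
    value = "IS2_INCONSISTENT_SYNC",
    justification = "Null check outside the lock is a benign racy read")
public class SuppressedSyncExample {

  private Object state;

  public synchronized void setState(Object s) {
    state = s;                 // synchronized write
  }

  public boolean hasState() {
    return state != null;      // deliberately unsynchronized read
  }
}
{code}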



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10245) Fix the findbug warnings in branch-2.7

2016-04-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-10245:

Attachment: (was: HDFS-10245 .patch)

> Fix the findbug warnings in branch-2.7
> --
>
> Key: HDFS-10245
> URL: https://issues.apache.org/jira/browse/HDFS-10245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10245-branch-2.7 .patch
>
>
> *Link*
> https://builds.apache.org/job/PreCommit-HDFS-Build/15036/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
> Following seen in HDFS-9917 jenkins run.
> {noformat}
> Code  Warning
> IS    Inconsistent synchronization of 
> org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 85% of time
> Bug type IS2_INCONSISTENT_SYNC (click for details) 
> In class org.apache.hadoop.hdfs.DFSOutputStream
> Field org.apache.hadoop.hdfs.DFSOutputStream.streamer
> Synchronized 85% of the time
> Unsynchronized access at DFSOutputStream.java:[line 2372]
> Unsynchronized access at DFSOutputStream.java:[line 2390]
> Unsynchronized access at DFSOutputStream.java:[line 1722]
> Synchronized access at DFSOutputStream.java:[line 2178]
> Synchronized access at DFSOutputStream.java:[line 2100]
> Synchronized access at DFSOutputStream.java:[line 2103]
> Synchronized access at DFSOutputStream.java:[line 2264]
> Synchronized access at DFSOutputStream.java:[line 1555]
> Synchronized access at DFSOutputStream.java:[line 1558]
> Synchronized access at DFSOutputStream.java:[line 2166]
> Synchronized access at DFSOutputStream.java:[line 2208]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2209]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2038]
> Synchronized access at DFSOutputStream.java:[line 2062]
> Synchronized access at DFSOutputStream.java:[line 2063]
> Synchronized access at DFSOutputStream.java:[line 2355]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10245) Fix the findbug warnings in branch-2.7

2016-04-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-10245:

Attachment: HDFS-10245-branch-2.7 .patch

> Fix the findbug warnings in branch-2.7
> --
>
> Key: HDFS-10245
> URL: https://issues.apache.org/jira/browse/HDFS-10245
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10245-branch-2.7 .patch
>
>
> *Link*
> https://builds.apache.org/job/PreCommit-HDFS-Build/15036/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
> Following seen in HDFS-9917 jenkins run.
> {noformat}
> Code  Warning
> IS    Inconsistent synchronization of 
> org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 85% of time
> Bug type IS2_INCONSISTENT_SYNC (click for details) 
> In class org.apache.hadoop.hdfs.DFSOutputStream
> Field org.apache.hadoop.hdfs.DFSOutputStream.streamer
> Synchronized 85% of the time
> Unsynchronized access at DFSOutputStream.java:[line 2372]
> Unsynchronized access at DFSOutputStream.java:[line 2390]
> Unsynchronized access at DFSOutputStream.java:[line 1722]
> Synchronized access at DFSOutputStream.java:[line 2178]
> Synchronized access at DFSOutputStream.java:[line 2100]
> Synchronized access at DFSOutputStream.java:[line 2103]
> Synchronized access at DFSOutputStream.java:[line 2264]
> Synchronized access at DFSOutputStream.java:[line 1555]
> Synchronized access at DFSOutputStream.java:[line 1558]
> Synchronized access at DFSOutputStream.java:[line 2166]
> Synchronized access at DFSOutputStream.java:[line 2208]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2209]
> Synchronized access at DFSOutputStream.java:[line 2216]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2037]
> Synchronized access at DFSOutputStream.java:[line 2038]
> Synchronized access at DFSOutputStream.java:[line 2062]
> Synchronized access at DFSOutputStream.java:[line 2063]
> Synchronized access at DFSOutputStream.java:[line 2355]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)