[jira] [Resolved] (HDFS-15301) statfs function in hdfs-fuse is not working

2020-04-28 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDFS-15301.
--
Fix Version/s: 3.4.0
   3.3.0
   Resolution: Fixed

> statfs function in hdfs-fuse is not working
> ---
>
> Key: HDFS-15301
> URL: https://issues.apache.org/jira/browse/HDFS-15301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs, libhdfs
>Reporter: Aryan Gupta
>Assignee: Aryan Gupta
>Priority: Major
>  Labels: https://github.com/apache/hadoop/pull/1980
> Fix For: 3.3.0, 3.4.0
>
>
> *statfs function in hdfs-fuse is not working.* It gives error like:
> could not find method org/apache/hadoop/fs/FsStatus from class 
> org/apache/hadoop/fs/FsStatus with signature getUsed
> hdfsGetUsed: FsStatus#getUsed error:
> NoSuchMethodError: org/apache/hadoop/fs/FsStatusjava.lang.NoSuchMethodError: 
> org/apache/hadoop/fs/FsStatus
>  
> Problem: Incorrect passing of parameters to the invokeMethod function:
> invokeMethod(env, &jVal, INSTANCE, fss, JC_FS_STATUS,
> HADOOP_FSSTATUS, "getUsed", "()J");
>  
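For context, the call quoted above is C from libhdfs; on the Java side it resolves to the FsStatus API. A minimal sketch of that call chain as a standalone Java client (hypothetical usage, assuming a default Configuration pointing at a reachable HDFS):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class FsStatusExample {
  public static void main(String[] args) throws Exception {
    // hdfsGetUsed() in libhdfs reaches this chain through JNI:
    // FileSystem.getStatus() returns an FsStatus whose getUsed()
    // matches the JNI signature "()J" (no arguments, returns long).
    FileSystem fs = FileSystem.get(new Configuration());
    FsStatus status = fs.getStatus();
    System.out.println("used=" + status.getUsed()
        + " capacity=" + status.getCapacity()
        + " remaining=" + status.getRemaining());
  }
}
{code}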



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15301) statfs function in hdfs-fuse is not working

2020-04-28 Thread Mukul Kumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095105#comment-17095105
 ] 

Mukul Kumar Singh commented on HDFS-15301:
--

Merged this to trunk and backported to branch-3.3. 







[jira] [Commented] (HDFS-15301) statfs function in hdfs-fuse is not working

2020-04-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095047#comment-17095047
 ] 

Hudson commented on HDFS-15301:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18196 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18196/])
HDFS-15301. statfs function in hdfs-fuse not working. Contributed by (github: 
rev 816042e62bf472a58d9f6dbce1123e9af6d06fb0)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c








[jira] [Commented] (HDFS-15301) statfs function in hdfs-fuse is not working

2020-04-28 Thread Mukul Kumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095036#comment-17095036
 ] 

Mukul Kumar Singh commented on HDFS-15301:
--

Thanks for the review [~weichiu] and [~pifta]. I have merged the changes to 
trunk, and will also backport this to Hadoop-3.3.

[~aryangupta1998], can we add the test as a follow-up task?







[jira] [Commented] (HDFS-15285) The same distance and load nodes don't shuffle when consider DataNode load

2020-04-28 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094999#comment-17094999
 ] 

Lisheng Sun commented on HDFS-15285:


Added the identical v002 patch to trigger a new build.

> The same distance and load nodes don't shuffle when consider DataNode load
> --
>
> Key: HDFS-15285
> URL: https://issues.apache.org/jira/browse/HDFS-15285
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15285.001.patch, HDFS-15285.002.patch
>
>







[jira] [Updated] (HDFS-15285) The same distance and load nodes don't shuffle when consider DataNode load

2020-04-28 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15285:
---
Attachment: HDFS-15285.002.patch








[jira] [Created] (HDFS-15307) Update description of "dfs.client.socketcache.*" properties to not have mention of short-circuits

2020-04-28 Thread Andrey Elenskiy (Jira)
Andrey Elenskiy created HDFS-15307:
--

 Summary: Update description of "dfs.client.socketcache.*" 
properties to not have mention of short-circuits
 Key: HDFS-15307
 URL: https://issues.apache.org/jira/browse/HDFS-15307
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: caching, dfsclient
Affects Versions: 3.1.3
 Environment: Hadoop 3.1.3
Reporter: Andrey Elenskiy


Both `dfs.client.socketcache.capacity` and `dfs.client.socketcache.expiryMsec` 
state that the cache is for short-circuit reads. That appears not to be the 
case, as PeerCache is also used for caching remote TCP connections:

[https://github.com/apache/hadoop/blob/3f223bebfa6b382a762edcc518fcbae310ce22e5/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderRemote.java#L312]

https://github.com/apache/hadoop/blob/3f223bebfa6b382a762edcc518fcbae310ce22e5/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java#L815
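For illustration, a minimal sketch of tuning the two keys on a client (the key names are taken from the report above; the numeric values and the NameNode URI are placeholders):

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SocketCacheTuning {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // These keys size and expire the DFSClient's PeerCache which, per the
    // report above, holds remote TCP connections to DataNodes and not only
    // short-circuit-read peers.
    conf.setInt("dfs.client.socketcache.capacity", 16);
    conf.setLong("dfs.client.socketcache.expiryMsec", 3000L);
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
    // Reads through this client may now reuse cached peers.
    fs.close();
  }
}
{code}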






[jira] [Created] (HDFS-15306) Make mount-table to read from central place ( Let's say from HDFS)

2020-04-28 Thread Uma Maheswara Rao G (Jira)
Uma Maheswara Rao G created HDFS-15306:
--

 Summary: Make mount-table to read from central place ( Let's say 
from HDFS)
 Key: HDFS-15306
 URL: https://issues.apache.org/jira/browse/HDFS-15306
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: configuration, hadoop-client
Affects Versions: 3.2.1
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G









[jira] [Created] (HDFS-15305) Extend ViewFS and provide ViewFSOverloadScheme implementation with scheme configurable.

2020-04-28 Thread Uma Maheswara Rao G (Jira)
Uma Maheswara Rao G created HDFS-15305:
--

 Summary: Extend ViewFS and provide ViewFSOverloadScheme 
implementation with scheme configurable.
 Key: HDFS-15305
 URL: https://issues.apache.org/jira/browse/HDFS-15305
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: viewfs, hadoop-client, fs, hdfs-client
Affects Versions: 3.2.1
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G









[jira] [Commented] (HDFS-15289) Allow viewfs mounts with hdfs scheme and centralized mount table

2020-04-28 Thread Virajith Jalaparti (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094884#comment-17094884
 ] 

Virajith Jalaparti commented on HDFS-15289:
---

Thanks for your response [~umamaheswararao].

bq. One thought is, admin commands use -fs option and specify the required nn 
address.

Yes, this is what we are leaning towards along with setting 
{{-Dfs.hdfs.impl=DistributedFileSystem}} when running HAAdmin and DFSAdmin. 
These configs can work in the short-term before HAAdmin and DFSAdmin are 
completely moved to {{ViewFSOverLoadScheme}}.

bq. If users access DFS directly, they may need to get the childFileSystems 
from ViewFSOverloadScheme and check the instanceOf.

This makes sense for cases where we must use {{DistributedFileSystem}} as a 
library.

[~abhishekd] had to make some changes to {{ViewFileSystem}} to change the scope 
of some classes. He can post more details about this. I suspect some of these 
were needed due to the packaging chosen for the child class and might not be 
needed if the new class remains in the existing package 
{{org.apache.hadoop.fs.viewfs}}.

> Allow viewfs mounts with hdfs scheme and centralized mount table
> 
>
> Key: HDFS-15289
> URL: https://issues.apache.org/jira/browse/HDFS-15289
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: ViewFSOverloadScheme - V1.0.pdf
>
>
> ViewFS provides the flexibility to mount different filesystem types via a 
> mount point configuration table, and it allows any fs scheme (not only HDFS) 
> in the mount table mapping. This approach solves the scalability problems, 
> but users need to reconfigure their filesystem to ViewFS and to its scheme. 
> That is problematic for paths persisted in meta stores, e.g. Hive: systems 
> like Hive store URIs in the meta store, so changing the filesystem scheme 
> creates a burden to upgrade/recreate meta stores. In our experience many 
> users are not ready to make that change.
> Router based federation is another implementation that provides coordinated 
> mount points for HDFS federation clusters. Even though it handles mount 
> points easily, it does not allow mounting other (non-HDFS) filesystems, so 
> it does not serve the purpose when users want to mount external (non-HDFS) 
> filesystems.
> So the problem is: even though many users want to adopt the scalable fs 
> options available, the technical challenge of changing schemes (e.g. in meta 
> stores) in deployments is obstructing them.
> We therefore propose to allow the hdfs scheme in a ViewFS-like client-side 
> mount system and provision users to create mount links without changing URI 
> paths.
> I will upload a detailed design doc shortly.
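For context, a minimal sketch of the client-side mount table configuration involved (the mount-link keys follow the existing {{fs.viewfs.mounttable}} convention; the cluster name, namenode address, and paths are placeholders):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class MountTableSketch {
  static Configuration viewFsMounts() {
    Configuration conf = new Configuration();
    // Classic ViewFS: clients must switch to the viewfs:// scheme,
    // which is exactly the migration burden described above.
    conf.set("fs.defaultFS", "viewfs://clusterX/");
    conf.set("fs.viewfs.mounttable.clusterX.link./user",
        "hdfs://nn1:8020/user");
    // Non-HDFS filesystems can be mounted too, e.g. an object store.
    conf.set("fs.viewfs.mounttable.clusterX.link./data",
        "s3a://bucket/data");
    return conf;
  }
}
{code}

The proposal would keep {{fs.defaultFS}} on the hdfs scheme while still honoring such mount links, so URIs persisted in meta stores need not change.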






[jira] [Commented] (HDFS-15265) HttpFS: validate content-type in HttpFSUtils

2020-04-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094820#comment-17094820
 ] 

Íñigo Goiri commented on HDFS-15265:


I'm getting conflicts.
Do you mind rebasing?

> HttpFS: validate content-type in HttpFSUtils
> 
>
> Key: HDFS-15265
> URL: https://issues.apache.org/jira/browse/HDFS-15265
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15265.001.patch
>
>
> Validate that the content-type in HttpFSUtils is JSON.
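For illustration, a minimal sketch of the kind of check being proposed (the class and method names here are hypothetical, not the patch itself):

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;

public final class ContentTypeCheck {
  // Reject responses whose Content-Type is not JSON before parsing.
  static void validateJsonContentType(HttpURLConnection conn)
      throws IOException {
    String contentType = conn.getContentType();
    if (contentType == null
        || !contentType.toLowerCase().startsWith("application/json")) {
      throw new IOException(
          "Expected JSON response but got Content-Type: " + contentType);
    }
  }
}
{code}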






[jira] [Commented] (HDFS-15293) Relax the condition for accepting a fsimage when receiving a checkpoint

2020-04-28 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094792#comment-17094792
 ] 

Chen Liang commented on HDFS-15293:
---

[~shv] I don't think the issue you mentioned will actually happen currently, 
because the check only skips an image if BOTH conditions are met: 1. the time 
delta is too small AND 2. the txnid delta is too small. It's an AND, not an OR.

So in the case you mentioned, it is true that the time delta will always be 
considered too small due to the ridiculously large interval, but if a small 
txn count is configured, it is easy to accumulate enough transactions, so the 
txnid delta won't be considered too small. A small time delta alone does not 
lead to rejecting an image.

But indeed, it is possible that in a cluster with a ridiculously large 
interval plus an extremely light load (so the txnid barely makes progress), 
both conditions will always be true. In that case every checkpoint will be 
rejected. Although realistically I don't think there is much value in 
checkpointing in such a situation anyway, it is probably not a good idea to 
change the behavior of the system by effectively rejecting all images.

Because of this, I'm thinking of removing the txnid condition altogether, so 
the check only looks at the time delta and allows any txnid delta. It seems 
trickier to justify blocking all the use cases with slow txnid increase. 
(Time always advances, but the txnid does not necessarily.) I think we were 
mainly targeting the time condition originally.
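For illustration, a sketch of the acceptance logic as described above (the names are illustrative, not the actual NameNode fields):

{code:java}
public class CheckpointAcceptance {
  // Paraphrase of the check discussed above. The image is rejected only
  // when BOTH deltas are too small (AND, not OR).
  static boolean shouldRejectImage(long secsSinceLastCheckpoint,
      long txnsSinceLastCheckpoint, long checkpointPeriodSecs,
      long checkpointTxnCount, double factor) {
    boolean timeDeltaTooSmall =
        secsSinceLastCheckpoint < checkpointPeriodSecs * factor;
    boolean txnDeltaTooSmall =
        txnsSinceLastCheckpoint < checkpointTxnCount * factor;
    // With a huge configured period the time check always trips, but a
    // modest txn threshold lets images through unless the cluster is
    // nearly idle; that idle corner case is the one described above.
    return timeDeltaTooSmall && txnDeltaTooSmall;
  }
}
{code}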

> Relax the condition for accepting a fsimage when receiving a checkpoint 
> 
>
> Key: HDFS-15293
> URL: https://issues.apache.org/jira/browse/HDFS-15293
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: multi-sbnn
>
> HDFS-12979 introduced logic whereby, if the ANN sees consecutive fsimage 
> uploads from a Standby with a small delta compared to the previous fsimage, 
> the ANN rejects the image. This is to avoid overly frequent fsimages when 
> there are multiple Standby nodes. However, this check could be too stringent.






[jira] [Updated] (HDFS-15301) statfs function in hdfs-fuse is not working

2020-04-28 Thread Aryan Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aryan Gupta updated HDFS-15301:
---
Labels: https://github.com/apache/hadoop/pull/1980  (was: )







[jira] [Commented] (HDFS-15265) HttpFS: validate content-type in HttpFSUtils

2020-04-28 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094774#comment-17094774
 ] 

hemanthboyina commented on HDFS-15265:
--

Hi [~elgoiri], can you push this patch forward?







[jira] [Commented] (HDFS-15300) RBF: updateActiveNamenode() is invalid when RPC address is IP

2020-04-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094767#comment-17094767
 ] 

Íñigo Goiri commented on HDFS-15300:


I'm a little worried about calling NetUtils.createSocketAddr() unnecessarily.
Can we check for an IP pattern before resolving?
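For illustration, a sketch of that suggestion (the helper name is hypothetical; it assumes Guava's InetAddresses, which Hadoop already ships, and a host:port input):

{code:java}
import java.net.InetSocketAddress;
import com.google.common.net.InetAddresses;
import org.apache.hadoop.net.NetUtils;

public class AddressNormalization {
  // Only fall back to NetUtils.createSocketAddr(), which may trigger a
  // DNS lookup, when the host part is not already a literal IP.
  static InetSocketAddress toSocketAddr(String hostPort) {
    int colon = hostPort.lastIndexOf(':');
    String host = hostPort.substring(0, colon);
    int port = Integer.parseInt(hostPort.substring(colon + 1));
    if (InetAddresses.isInetAddress(host)) {
      // Literal IPs are parsed without any DNS resolution.
      return new InetSocketAddress(host, port);
    }
    return NetUtils.createSocketAddr(hostPort);
  }
}
{code}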

> RBF: updateActiveNamenode() is invalid when RPC address is IP
> -
>
> Key: HDFS-15300
> URL: https://issues.apache.org/jira/browse/HDFS-15300
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Assignee: xuzq
>Priority: Major
> Attachments: HDFS-15300-001.patch
>
>
> ActiveNamenodeResolver#updateActiveNamenode is invalid when the RPC address 
> is of the form ip:port.






[jira] [Commented] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.

2020-04-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094760#comment-17094760
 ] 

Íñigo Goiri commented on HDFS-14353:


Let's file a JIRA for TestReconstructStripedFile (and make that assertTrue an 
assertEquals :))
+1 on  [^HDFS-14353.010.patch].

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, erasure-coding
>Affects Versions: 3.3.0
>Reporter: maobaolong
>Assignee: maobaolong
>Priority: Major
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, 
> HDFS-14353.003.patch, HDFS-14353.004.patch, HDFS-14353.005.patch, 
> HDFS-14353.006.patch, HDFS-14353.007.patch, HDFS-14353.008.patch, 
> HDFS-14353.009.patch, HDFS-14353.010.patch, screenshot-1.png
>
>







[jira] [Updated] (HDFS-15300) RBF: updateActiveNamenode() is invalid when RPC address is IP

2020-04-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-15300:
---
Summary: RBF: updateActiveNamenode() is invalid when RPC address is IP  
(was: RBF:updateActiveNamenode is invalid when rpc address is ip)







[jira] [Comment Edited] (HDFS-15287) HDFS rollingupgrade prepare never finishes

2020-04-28 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094722#comment-17094722
 ] 

Chen Liang edited comment on HDFS-15287 at 4/28/20, 5:42 PM:
-

Thanks for the update [~kihwal]. Will follow up on HDFS-15293; the issue to 
resolve there should be relatively straightforward, though. The issue 
mentioned in HDFS-15293 does not happen consistently and can lead to missing 
at most one periodic image upload.

And just to clarify, the improvement from HDFS-15036 is not specific to 
Observer; it was for multiple SBNs in general. Even without Observer, as long 
as there are multiple SBNs, there can be frequent image uploads. And even 
with Observer, if there is only one SBN, frequent uploads would not be an 
issue.

Regarding making this configurable, I would like to have [~shv]'s thoughts 
here, as Konstantin was opposed to adding this new config.


was (Author: vagarychen):
Thanks for the update [~kihwal]. Will follow up on HDFS-15293; the issue to 
resolve there should be relatively straightforward, though. The issue 
mentioned in HDFS-15293 does not happen consistently and can lead to missing 
at most one periodic image upload.

And just to clarify, the improvement from HDFS-15036 is not specific to 
Observer; it was for multiple SBNs in general. Even without Observer, as long 
as there are multiple SBNs, there can be frequent image uploads. And even 
with Observer, if there is only one SBN, frequent uploads would not be an 
issue.

> HDFS rollingupgrade prepare never finishes
> --
>
> Key: HDFS-15287
> URL: https://issues.apache.org/jira/browse/HDFS-15287
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0, 3.3.0
>Reporter: Kihwal Lee
>Priority: Blocker
>
> After HDFS-12979, the prepare step of rolling upgrade does not work, because 
> it added an additional check for sufficient time passing since the last 
> checkpoint. Since RU rollback image creation and upload can happen at any 
> time, uploading of the rollback image never succeeds. For a new cluster 
> deployed for testing it might work, since it has never checkpointed before.
> It was found that this check is disabled for unit tests, defeating the very 
> purpose of testing.






[jira] [Commented] (HDFS-15287) HDFS rollingupgrade prepare never finishes

2020-04-28 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094722#comment-17094722
 ] 

Chen Liang commented on HDFS-15287:
---

Thanks for the update [~kihwal]. Will follow up on HDFS-15293; the issue to 
resolve there should be relatively straightforward, though. The issue 
mentioned in HDFS-15293 does not happen consistently and can lead to missing 
at most one periodic image upload.

And just to clarify, the improvement from HDFS-15036 is not specific to 
Observer; it was for multiple SBNs in general. Even without Observer, as long 
as there are multiple SBNs, there can be frequent image uploads. And even 
with Observer, if there is only one SBN, frequent uploads would not be an 
issue.







[jira] [Commented] (HDFS-15294) RBF: Balance data across federation namespaces with DistCp and snapshot diff

2020-04-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094721#comment-17094721
 ] 

Hadoop QA commented on HDFS-15294:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 1s | No case conflicting files found. |
| 0 | shelldocs | 0m 0s | Shelldocs was not available. |
| 0 | markdownlint | 0m 0s | markdownlint was not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 9 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 6s | Maven dependency ordering for branch |
| +1 | mvninstall | 21m 45s | trunk passed |
| +1 | compile | 18m 2s | trunk passed |
| +1 | checkstyle | 2m 57s | trunk passed |
| +1 | mvnsite | 4m 52s | trunk passed |
| +1 | shadedclient | 15m 45s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 3m 54s | trunk passed |
| 0 | spotbugs | 5m 24s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| 0 | findbugs | 0m 33s | branch/hadoop-project no findbugs output file (findbugsXml.xml) |
| 0 | findbugs | 0m 32s | branch/hadoop-assemblies no findbugs output file (findbugsXml.xml) |
| 0 | findbugs | 0m 33s | branch/hadoop-tools/hadoop-tools-dist no findbugs output file (findbugsXml.xml) |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 30s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 2s | the patch passed |
| +1 | compile | 17m 33s | the patch passed |
| +1 | javac | 17m 33s | the patch passed |
| +1 | checkstyle | 2m 59s | the patch passed |
| +1 | mvnsite | 5m 34s | the patch passed |
| +1 | shellcheck | 0m 0s | There were no new shellcheck issues. |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 6s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 15m 28s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 4m 30s | the patch passed |
| 0 | findbugs | 0m 35s | hadoop-project has no data from findbugs |
| 0 | findbugs | 0m 32s | hadoop-assemblies has no data from findbugs |
| 

[jira] [Commented] (HDFS-15285) The same distance and load nodes don't shuffle when consider DataNode load

2020-04-28 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094716#comment-17094716
 ] 

Stephen O'Donnell commented on HDFS-15285:
--

The findbugs warnings that have appeared recently seem to be fixed by 
HDFS-15298.

Could you push a new, identical 002 patch file to trigger a new build? If it 
comes back clean, I think we are good to commit. Then we can figure out the 
findbugs problem in HDFS-15255.








[jira] [Commented] (HDFS-15302) Backport HDFS-15286 to branch-2.x

2020-04-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094717#comment-17094717
 ] 

Hadoop QA commented on HDFS-15302:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 11s | HDFS-15302 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15302 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13001486/HDFS-15302-branch.2.10.1.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29198/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Backport HDFS-15286 to branch-2.x
> -
>
> Key: HDFS-15302
> URL: https://issues.apache.org/jira/browse/HDFS-15302
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira Ajisaka
>Assignee: hemanthboyina
>Priority: Blocker
> Attachments: HDFS-15302-branch.2.10.1.patch
>
>
> Backport HDFS-15286 to branch-2.10 and branch-2.9.






[jira] [Updated] (HDFS-15302) Backport HDFS-15286 to branch-2.x

2020-04-28 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15302:
-
Attachment: HDFS-15302-branch.2.10.1.patch
  Assignee: hemanthboyina
Status: Patch Available  (was: Open)







[jira] [Assigned] (HDFS-15304) Infinite loop between DN and NN at rare condition

2020-04-28 Thread Istvan Fajth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth reassigned HDFS-15304:
---

Assignee: Istvan Fajth

> Infinite loop between DN and NN at rare condition
> -
>
> Key: HDFS-15304
> URL: https://issues.apache.org/jira/browse/HDFS-15304
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Minor
>






[jira] [Created] (HDFS-15304) Infinite loop between DN and NN at rare condition

2020-04-28 Thread Istvan Fajth (Jira)
Istvan Fajth created HDFS-15304:
---

 Summary: Infinite loop between DN and NN at rare condition
 Key: HDFS-15304
 URL: https://issues.apache.org/jira/browse/HDFS-15304
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Istvan Fajth


During the investigation leading to HDFS-15303, we identified the following 
infinite loop between the NN and the DNs affected by the data directory layout 
problem:
- for a particular misplaced block, the VolumeScanner finds the block file and 
realizes that it is not part of the block map
- the block is added to the block map
- at the next FBR the block is reported to the NN
- the NN finds that the block should have been deleted already, as the 
corresponding inode was already deleted
- the NN issues the deletion of the block on the DataNode
- the DataNode runs the delete routine, but it silently fails to delete 
anything, as it is trying to delete the block from the wrong internal subdir, 
calculated from the block id with a different algorithm
- the block is removed from the block map
- the VolumeScanner finds the block again and adds it back to the block map

The problem can only happen when there is a mixed layout on the DataNode due 
to some issue: there are blocks in a subdir that is correct according to the 
Hadoop2 format while the DN is already on Hadoop3, or vice versa if the 
problematic layout was born during a rollback. 
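For illustration, a sketch contrasting two block-id-to-subdirectory mappings of the kind involved here (the masks mirror the pre/post HDFS-8791 256x256 vs 32x32 layouts; treat the exact constants as an assumption, not a quote of the DataNode code):

{code:java}
public class BlockDirLayout {
  static String oldLayoutDir(long blockId) { // 256x256 subdirs
    int d1 = (int) ((blockId >> 16) & 0xFF);
    int d2 = (int) ((blockId >> 8) & 0xFF);
    return "subdir" + d1 + "/subdir" + d2;
  }

  static String newLayoutDir(long blockId) { // 32x32 subdirs
    int d1 = (int) ((blockId >> 16) & 0x1F);
    int d2 = (int) ((blockId >> 8) & 0x1F);
    return "subdir" + d1 + "/subdir" + d2;
  }

  public static void main(String[] args) {
    long id = 1073750000L;
    // The same block id lands in different subdirs under the two layouts,
    // which is why the delete routine can silently miss the file.
    System.out.println(oldLayoutDir(id) + " vs " + newLayoutDir(id));
  }
}
{code}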






[jira] [Assigned] (HDFS-15303) Provide a tool that can validate/fix the block file placement in DataNode data directories

2020-04-28 Thread Istvan Fajth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth reassigned HDFS-15303:
---

Assignee: Istvan Fajth

> Provide a tool that can validate/fix the block file placement in DataNode 
> data directories
> --
>
> Key: HDFS-15303
> URL: https://issues.apache.org/jira/browse/HDFS-15303
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Minor
>






[jira] [Created] (HDFS-15303) Provide a tool that can validate/fix the block file placement in DataNode data directories

2020-04-28 Thread Istvan Fajth (Jira)
Istvan Fajth created HDFS-15303:
---

 Summary: Provide a tool that can validate/fix the block file 
placement in DataNode data directories
 Key: HDFS-15303
 URL: https://issues.apache.org/jira/browse/HDFS-15303
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Istvan Fajth


We recently ran into an issue where, during an upgrade from Hadoop2 to 
Hadoop3, the filesystem under the DataNode data directories was left in an 
intermediate state: some of the directories were in the Hadoop2 format and 
other parts in the Hadoop3 format.

At first we had to roll back the upgrade, and after the rollback we started 
to see FileNotFoundExceptions for particular block files.
The exceptions were logged on the DataNodes and sometimes failed jobs as 
well. The HDFS NameNode did not show any missing blocks, and we found the 
block files and meta files in the DataNode's data directories, but at a 
different location.

This was the point when we realized that something had gone wrong during the 
rollback: some of the data directories had blocks placed according to Hadoop3 
rules, while others were placed according to Hadoop2 rules. We suspect a 
possible premature DataNode shutdown or an unknown failure during the 
rollback, but by the time we realized what the issue was and could look into 
it, the logs that would have shown us the cause were already gone.

This JIRA suggests two new commands that can help administrators in this 
situation: one to validate the data directories and ensure that blocks are 
placed correctly according to the rules, and one to fix the data directory 
layout if needed.
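For illustration, a rough sketch of what the proposed "validate" half could do (the directory layout, the 32x32 mask, and all names here are simplifying assumptions, not the proposed tool):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class BlockPlacementValidator {
  public static void main(String[] args) throws IOException {
    // Walk a finalized block pool directory, recompute the expected
    // subdir for each block data file from its id, report mismatches.
    Path finalizedDir = Paths.get(args[0]);
    try (Stream<Path> files = Files.walk(finalizedDir)) {
      files.filter(p -> p.getFileName().toString().startsWith("blk_"))
           .filter(p -> !p.getFileName().toString().endsWith(".meta"))
           .forEach(p -> {
             long blockId = Long.parseLong(
                 p.getFileName().toString().substring("blk_".length()));
             Path expected = finalizedDir.resolve(expectedDir(blockId));
             if (!p.getParent().equals(expected)) {
               System.out.println("misplaced: " + p + " expected " + expected);
               // a "fix" mode could move the file (and its .meta) here
             }
           });
    }
  }

  static String expectedDir(long blockId) { // 32x32 layout assumption
    return "subdir" + ((blockId >> 16) & 0x1F)
        + "/subdir" + ((blockId >> 8) & 0x1F);
  }
}
{code}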






[jira] [Commented] (HDFS-15287) HDFS rollingupgrade prepare never finishes

2020-04-28 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094531#comment-17094531
 ] 

Kihwal Lee commented on HDFS-15287:
---

[~aajisaka], one problem was resolved on 4/16. Another side-effect of this new 
check still remains. It is a blocker for us as it breaks an existing 
feature/use case. HDFS-15293 might fix it.

> HDFS rollingupgrade prepare never finishes
> --
>
> Key: HDFS-15287
> URL: https://issues.apache.org/jira/browse/HDFS-15287
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0, 3.3.0
>Reporter: Kihwal Lee
>Priority: Blocker
>
> After HDFS-12979, the prepare step of rolling upgrade does not work. This is 
> because it added additional check for sufficient time passing since last 
> checkpoint. Since RU rollback image creation and upload can happen any time, 
> uploading of rollback image never succeeds. For a new cluster deployed for 
> testing, it might work since it never checkpointed before.
> It was found that this check is disabled for unit tests, defeating the very 
> purpose of testing.






[jira] [Commented] (HDFS-15287) HDFS rollingupgrade prepare never finishes

2020-04-28 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094527#comment-17094527
 ] 

Kihwal Lee commented on HDFS-15287:
---

Thanks for the explanation. It looks like HDFS-15036 was pulled into 2.10 on 
4/16. We were running a snapshot of 2.10 from late March, so it didn't have 
the fix. I haven't verified the fix, but I trust it works. We are now on an 
internal release with HDFS-15036, but it also has the check disabled, so we 
cannot easily verify at this time.

[~vagarychen], please do make this check configurable so that more frequent 
checkpointing & uploading is possible. Or you can tie it to the observer 
feature and have it automatically enabled only when the entire feature is 
enabled. After all, we were told the observer feature won't interfere with 
normal operation if disabled.

> HDFS rollingupgrade prepare never finishes
> --
>
> Key: HDFS-15287
> URL: https://issues.apache.org/jira/browse/HDFS-15287
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0, 3.3.0
>Reporter: Kihwal Lee
>Priority: Blocker
>
> After HDFS-12979, the prepare step of rolling upgrade does not work. This is 
> because it added additional check for sufficient time passing since last 
> checkpoint. Since RU rollback image creation and upload can happen any time, 
> uploading of rollback image never succeeds. For a new cluster deployed for 
> testing, it might work since it never checkpointed before.
> It was found that this check is disabled for unit tests, defeating the very 
> purpose of testing.






[jira] [Updated] (HDFS-15294) RBF: Balance data across federation namespaces with DistCp and snapshot diff

2020-04-28 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15294:
---
Attachment: HDFS-15294.004.patch

> RBF: Balance data across federation namespaces with DistCp and snapshot diff
> 
>
> Key: HDFS-15294
> URL: https://issues.apache.org/jira/browse/HDFS-15294
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch, 
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch, 
> HDFS-15294.004.patch, distcp-balance.pdf
>
>
> This jira introduces a new balance command 'fedbalance' that is run by the 
> administrator. The process is:
> 1. Use distcp and snapshot diff to sync data between src and dst until they 
> are the same.
> 2. Update the mount table in the Router.
> 3. Delete the src to trash.
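For illustration, a hypothetical outline of those three steps in client code (the real fedbalance command and its internals are not shown; the snapshot names, round limit, and the mount-table update are placeholders):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class FedBalanceSketch {
  static void balance(FileSystem srcFs, Path src, FileSystem dstFs, Path dst,
      Configuration conf) throws Exception {
    // Step 1: iterate snapshot-diff + distcp rounds until src == dst.
    String prev = null;
    for (int round = 0; /* until the diff is empty */ round < 10; round++) {
      String curr = "fedbalance-" + round;
      srcFs.createSnapshot(src, curr);
      // Run the equivalent of:
      //   hadoop distcp -update [-diff <prev> <curr>] <src> <dst>
      // (omitted: invoking DistCp and checking the remaining diff)
      if (prev != null) {
        srcFs.deleteSnapshot(src, prev);
      }
      prev = curr;
    }
    // Step 2: switch the Router mount table entry from src to dst
    // (admin RPC, omitted).
    // Step 3: move the old source tree to trash rather than deleting it.
    Trash.moveToAppropriateTrash(srcFs, src, conf);
  }
}
{code}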






[jira] [Commented] (HDFS-15294) RBF: Balance data across federation namespaces with DistCp and snapshot diff

2020-04-28 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094498#comment-17094498
 ] 

Jinglun commented on HDFS-15294:


Uploaded v04: fixed the checkstyle and findbugs issues.







[jira] [Commented] (HDFS-14758) Decrease lease hard limit

2020-04-28 Thread bianqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094178#comment-17094178
 ] 

bianqi commented on HDFS-14758:
---

[~kihwal], the code adjustments left an error in this comment; please fix:

{quote}
  /**
   * For a HDFS client to write to a file, a lease is granted; During the lease
   * period, no other client can write to the file. The writing client can
   * periodically renew the lease. When the file is closed, the lease is
   * revoked. The lease duration is bound by this soft limit and a
   * {@link HdfsConstants#LEASE_HARDLIMIT_PERIOD hard limit}. Until the
   * soft limit expires, the writer has sole write access to the file. If the
   * soft limit expires and the client fails to close the file or renew the
   * lease, another client can preempt the lease.
   */
  public static final long LEASE_SOFTLIMIT_PERIOD = 60 * 1000;
{quote}

The {{@link HdfsConstants#LEASE_HARDLIMIT_PERIOD}} reference points to a 
variable that no longer exists.

> Decrease lease hard limit
> -
>
> Key: HDFS-14758
> URL: https://issues.apache.org/jira/browse/HDFS-14758
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Eric Payne
>Assignee: hemanthboyina
>Priority: Minor
> Fix For: 3.3.0, 2.8.6, 2.9.3, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14758.001.patch, HDFS-14758.002.patch, 
> HDFS-14758.003.patch, HDFS-14758.004.patch, HDFS-14758.005.patch, 
> HDFS-14758.005.patch, HDFS-14758.006.patch
>
>
> The hard limit is currently hard-coded to 1 hour. This also determines the 
> NN automatic lease recovery interval. Something like 20 min would make more 
> sense.
> After the 5 min soft limit, other clients can recover the lease. If no one 
> else takes the lease away, the original client can still renew the lease 
> within the hard limit. So even after a NN full GC of 8 minutes, leases can 
> still be valid.
> However, there is one risk in reducing the hard limit, e.g. to 20 min: if 
> the NN crashes and the manual failover takes more than 20 minutes, clients 
> will abort.
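For illustration, the timeline math behind the proposal (the constants restate the discussion above; they are not the HdfsConstants values verbatim):

{code:java}
import java.util.concurrent.TimeUnit;

public class LeaseLimitMath {
  static final long OLD_HARD_LIMIT_MS = TimeUnit.HOURS.toMillis(1);
  static final long NEW_HARD_LIMIT_MS = TimeUnit.MINUTES.toMillis(20);

  public static void main(String[] args) {
    // A writer that keeps renewing survives an NN pause shorter than the
    // hard limit, e.g. the 8-minute full GC mentioned above:
    long gcPauseMs = TimeUnit.MINUTES.toMillis(8);
    System.out.println(gcPauseMs < OLD_HARD_LIMIT_MS); // true
    System.out.println(gcPauseMs < NEW_HARD_LIMIT_MS); // true

    // The risk of lowering the limit: a manual failover longer than the
    // new hard limit makes clients abort, where 1 hour tolerated it.
    long failoverMs = TimeUnit.MINUTES.toMillis(25);
    System.out.println(failoverMs < OLD_HARD_LIMIT_MS); // true, clients OK
    System.out.println(failoverMs < NEW_HARD_LIMIT_MS); // false, clients abort
  }
}
{code}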


