[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143504#comment-17143504
 ] 

Chen Liang commented on HDFS-15421:
---

Thanks for reporting this, [~kihwal], and thanks for working on it, [~aajisaka]! 
Good catch on the missing updates; the change looks good to me.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch, HDFS-15421.003.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.
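
For context, the standby-side handling behaves roughly like this minimal 
sketch (names are simplified; {{isGenStampInFuture}} and {{pendingDNMessages}} 
are the real concepts in the NameNode, but the bodies here are illustrative 
only):

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified sketch of the standby-NN behavior described above: a reported
// block whose genstamp is ahead of the standby's global genstamp is parked in
// the pending-DN-message queue instead of being processed to completion.
class StandbyIbrSketch {
  private long globalGenStamp;                       // lags behind after HDFS-14941
  private final Queue<Long> pendingDNMessages = new ArrayDeque<>();

  void processReportedBlock(long reportedGenStamp) {
    if (reportedGenStamp > globalGenStamp) {         // "from the future"
      // Re-queued, not processed; nothing drains this queue until the global
      // genstamp catches up or the NN transitions to active, hence the leak.
      pendingDNMessages.add(reportedGenStamp);
      return;
    }
    completeBlock(reportedGenStamp);
  }

  private void completeBlock(long genStamp) { /* mark block complete */ }
}
{code}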






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143502#comment-17143502
 ] 

Akira Ajisaka commented on HDFS-15421:
--

Thanks [~shv] for your review and suggestion. Merged the test files.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch, HDFS-15421.003.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Updated] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15421:
-
Attachment: HDFS-15421.003.patch

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch, HDFS-15421.003.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Updated] (HDFS-13248) RBF: Namenode need to choose block location for the client

2020-06-23 Thread jianghua zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianghua zhu updated HDFS-13248:

Description: 
When executing a put operation via the Router, the NameNode chooses the block 
locations for the Router, not for the real client. This affects the file's 
locality.

I think we should add a new addBlock method on both the NameNode and the 
Router, or add a parameter to the current addBlock method, to pass the real 
client information.

  was:NegativeArraySizeException when PROVIDED replication >1


> RBF: Namenode need to choose block location for the client
> --
>
> Key: HDFS-13248
> URL: https://issues.apache.org/jira/browse/HDFS-13248
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wu Weiwei
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, 
> HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, 
> HDFS-13248.005.patch, HDFS-Router-Data-Locality.odt, RBF Data Locality 
> Design.pdf, clientMachine-call-path.jpeg, debug-info-1.jpeg, debug-info-2.jpeg
>
>
> When executing a put operation via the Router, the NameNode chooses the block 
> locations for the Router, not for the real client. This affects the file's 
> locality.
> I think we should add a new addBlock method on both the NameNode and the 
> Router, or add a parameter to the current addBlock method, to pass the real 
> client information.
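
For illustration, a hypothetical sketch of the idea (not the actual 
{{ClientProtocol}} API; the names below are made up):

{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

// Hypothetical sketch of the proposal (not the actual ClientProtocol API):
// carry the originating client's host through addBlock, so the NameNode can
// place replicas near the real client instead of near the Router.
interface RouterAwareBlockAllocation {
  // src and clientName mirror the existing addBlock arguments; clientMachine
  // is the new piece of information this jira proposes to pass along. The
  // Router would fill it in before forwarding the call to the NameNode.
  LocatedBlock addBlock(String src, String clientName, String clientMachine)
      throws IOException;
}
{code}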






[jira] [Updated] (HDFS-13248) RBF: Namenode need to choose block location for the client

2020-06-23 Thread jianghua zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianghua zhu updated HDFS-13248:

Description: NegativeArraySizeException when PROVIDED replication >1  (was: 
When executing a put operation via the Router, the NameNode chooses the block 
locations for the Router, not for the real client. This affects the file's 
locality.

I think we should add a new addBlock method on both the NameNode and the 
Router, or add a parameter to the current addBlock method, to pass the real 
client information.)

> RBF: Namenode need to choose block location for the client
> --
>
> Key: HDFS-13248
> URL: https://issues.apache.org/jira/browse/HDFS-13248
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wu Weiwei
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, 
> HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, 
> HDFS-13248.005.patch, HDFS-Router-Data-Locality.odt, RBF Data Locality 
> Design.pdf, clientMachine-call-path.jpeg, debug-info-1.jpeg, debug-info-2.jpeg
>
>
> NegativeArraySizeException when PROVIDED replication >1






[jira] [Commented] (HDFS-15416) DataStorage#addStorageLocations() should add more reasonable information verification.

2020-06-23 Thread jianghua zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143483#comment-17143483
 ] 

jianghua zhu commented on HDFS-15416:
-

[~elgoiri], thank you very much for your suggestions.
I have submitted a new patch file and modified some of the source code.

 

> DataStorage#addStorageLocations() should add more reasonable information 
> verification.
> --
>
> Key: HDFS-15416
> URL: https://issues.apache.org/jira/browse/HDFS-15416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.1.1
>Reporter: jianghua zhu
>Assignee: jianghua zhu
>Priority: Major
> Attachments: HDFS-15416.000.patch, HDFS-15416.001.patch
>
>
> successLocations is a list; when it is empty, loadBlockPoolSliceStorage() 
> does not need to be executed.
> Code:
> {code:java}
> try {
>   final List<StorageLocation> successLocations = loadDataStorage(
>       datanode, nsInfo, dataDirs, startOpt, executor);
>   return loadBlockPoolSliceStorage(
>       datanode, nsInfo, successLocations, startOpt, executor);
> } finally {
>   executor.shutdown();
> }
> {code}
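
A self-contained sketch of the suggested guard (hypothetical, simplified types 
mirroring the shape of {{DataStorage#addStorageLocations}}, not the committed 
patch):

{code:java}
import java.util.Collections;
import java.util.List;

// Self-contained sketch of the proposed verification: skip the second phase
// entirely when the first phase produced no successfully loaded locations.
class AddStorageLocationsSketch {
  List<String> addStorageLocations(List<String> dataDirs) {
    List<String> successLocations = loadDataStorage(dataDirs);
    if (successLocations.isEmpty()) {
      // Nothing was loaded, so there are no block pool slices to scan;
      // returning early avoids a pointless loadBlockPoolSliceStorage() pass.
      return Collections.emptyList();
    }
    return loadBlockPoolSliceStorage(successLocations);
  }

  private List<String> loadDataStorage(List<String> dataDirs) {
    return dataDirs;  // stand-in for the real per-directory loading
  }

  private List<String> loadBlockPoolSliceStorage(List<String> locations) {
    return locations;  // stand-in for the real block-pool slice loading
  }
}
{code}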






[jira] [Updated] (HDFS-15416) DataStorage#addStorageLocations() should add more reasonable information verification.

2020-06-23 Thread jianghua zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianghua zhu updated HDFS-15416:

Attachment: HDFS-15416.001.patch
Status: Patch Available  (was: In Progress)

> DataStorage#addStorageLocations() should add more reasonable information 
> verification.
> --
>
> Key: HDFS-15416
> URL: https://issues.apache.org/jira/browse/HDFS-15416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.1, 3.1.0
>Reporter: jianghua zhu
>Assignee: jianghua zhu
>Priority: Major
> Attachments: HDFS-15416.000.patch, HDFS-15416.001.patch
>
>
> successLocations is a list; when it is empty, loadBlockPoolSliceStorage() 
> does not need to be executed.
> Code:
> {code:java}
> try {
>   final List<StorageLocation> successLocations = loadDataStorage(
>       datanode, nsInfo, dataDirs, startOpt, executor);
>   return loadBlockPoolSliceStorage(
>       datanode, nsInfo, successLocations, startOpt, executor);
> } finally {
>   executor.shutdown();
> }
> {code}






[jira] [Commented] (HDFS-15425) Review Logging of DFSClient

2020-06-23 Thread Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143456#comment-17143456
 ] 

Hongbing Wang commented on HDFS-15425:
--

[~elgoiri] Thanks for your review!

> Review Logging of DFSClient
> ---
>
> Key: HDFS-15425
> URL: https://issues.apache.org/jira/browse/HDFS-15425
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hongbing Wang
>Assignee: Hongbing Wang
>Priority: Minor
> Attachments: HDFS-15425.001.patch, HDFS-15425.002.patch
>
>
> Review the use of SLF4J for DFSClient.LOG.
> Make the code more concise and readable.
> Less is more!
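
As a small example of the kind of cleanup intended here (the class and methods 
below are illustrative, not taken from DFSClient):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// SLF4J's parameterized logging replaces guarded string concatenation.
public class LoggingExample {
  private static final Logger LOG =
      LoggerFactory.getLogger(LoggingExample.class);

  void before(String src, long len) {
    if (LOG.isDebugEnabled()) {                      // explicit guard
      LOG.debug("Reading " + src + " len=" + len);   // eager concatenation
    }
  }

  void after(String src, long len) {
    // {} placeholders are formatted only if DEBUG is enabled; no guard needed.
    LOG.debug("Reading {} len={}", src, len);
  }
}
{code}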






[jira] [Comment Edited] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-23 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143435#comment-17143435
 ] 

liusheng edited comment on HDFS-15098 at 6/24/20, 1:30 AM:
---

Hi [~weichiu],

I am sorry for the delay on this feature. We have now updated the patches and 
tested them successfully locally; we have added test cases, config options, and 
docs to the patch. Currently, SM4 is supported in OpenSSL >= 1.1.1; if this 
requirement is unsatisfied, it will fall back to the SM4 implementation of 
BouncyCastleProvider, which is already a dependency of Hadoop. So now we only 
need to configure the KMS services to enable SM4 support.

Could you please help to review again?


was (Author: seanlau):
Hi [~weichiu],

I am sorry for the delay on this feature. We have now updated the patches and 
tested them successfully locally; we have added test cases, config options, and 
docs to the patch. Currently, SM4 is supported in OpenSSL >= 1.1.1; if this 
requirement is unsatisfied, it will fall back to the SM4 implementation of 
BouncyCastleProvider, which is already a dependency of Hadoop. So now we only 
need to configure the KMS services to enable SM4 support.

Could you please help to review again?

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
> SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far been 
> rejected by ISO. One of the reasons for the rejection has been opposition to 
> the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK: place bcprov-ext-jdk15on-165.jar in the 
> $JAVA_HOME/jre/lib/ext directory and add 
> "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. OpenSSL version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK
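
As a quick illustration of the BouncyCastleProvider fallback path, a minimal 
sketch (assuming bcprov is on the classpath; it registers the provider 
programmatically instead of editing java.security):

{code:java}
import java.security.SecureRandom;
import java.security.Security;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import org.bouncycastle.jce.provider.BouncyCastleProvider;

// Round-trips a small message through SM4/CTR using the Bouncy Castle
// provider, mirroring the fallback path described above.
public class Sm4Smoke {
  public static void main(String[] args) throws Exception {
    Security.addProvider(new BouncyCastleProvider());
    byte[] key = new byte[16];  // SM4 uses a 128-bit key
    byte[] iv = new byte[16];   // 128-bit counter block for CTR mode
    new SecureRandom().nextBytes(key);
    new SecureRandom().nextBytes(iv);

    Cipher enc = Cipher.getInstance("SM4/CTR/NoPadding", "BC");
    enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "SM4"),
        new IvParameterSpec(iv));
    byte[] ct = enc.doFinal("hello sm4".getBytes("UTF-8"));

    Cipher dec = Cipher.getInstance("SM4/CTR/NoPadding", "BC");
    dec.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "SM4"),
        new IvParameterSpec(iv));
    System.out.println(new String(dec.doFinal(ct), "UTF-8"));
  }
}
{code}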






[jira] [Commented] (HDFS-15098) Add SM4 encryption method for HDFS

2020-06-23 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143435#comment-17143435
 ] 

liusheng commented on HDFS-15098:
-

Hi [~weichiu],

I am sorry for the delay on this feature. We have now updated the patches and 
tested them successfully locally; we have added test cases, config options, and 
docs to the patch. Currently, SM4 is supported in OpenSSL >= 1.1.1; if this 
requirement is unsatisfied, it will fall back to the SM4 implementation of 
BouncyCastleProvider, which is already a dependency of Hadoop. So now we only 
need to configure the KMS services to enable SM4 support.

Could you please help to review again?

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: zZtai
>Priority: Major
>  Labels: sm4
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch
>
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure).
> SM4 was a cipher proposed for the IEEE 802.11i standard, but has so far been 
> rejected by ISO. One of the reasons for the rejection has been opposition to 
> the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
> *Use SM4 on HDFS as follows:*
> 1. Download the Bouncy Castle Crypto APIs from bouncycastle.org:
> [https://bouncycastle.org/download/bcprov-ext-jdk15on-165.jar]
> 2. Configure the JDK: place bcprov-ext-jdk15on-165.jar in the 
> $JAVA_HOME/jre/lib/ext directory and add 
> "security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider" 
> to the $JAVA_HOME/jre/lib/security/java.security file.
> 3. Configure Hadoop KMS.
> 4. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. OpenSSL version >= 1.1.1
> 2. Bouncy Castle Crypto configured on the JDK






[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-06-23 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143430#comment-17143430
 ] 

Fengnan Li commented on HDFS-15383:
---

Thanks! [~elgoiri] [~hexiaoqiao]

> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Based on the current design for delegation tokens in the secure Router, the 
> total number of watches for tokens is the product of the number of Routers 
> and the number of tokens. This is because ZKDelegationTokenManager uses 
> Curator's PathChildrenCache, which automatically sets a watch, and ZK pushes 
> the sync information to each Router. Evaluations show that a large number of 
> watches in ZooKeeper has a negative performance impact on the ZooKeeper 
> server.
> In our practice, when the number of watches exceeds 1.2 million in a single 
> ZK server, there is significant ZK performance degradation. Thus this ticket 
> rewrites ZKDelegationTokenManagerImpl.java to explicitly disable the 
> PathChildrenCache and have the Routers sync periodically from ZooKeeper. This 
> has been working fine at the scale of 10 Routers with 2 million tokens.
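
A minimal sketch of the periodic-sync idea (illustrative only, not the 
committed patch; {{refreshLocalTokenCache}} is a hypothetical stand-in):

{code:java}
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;

// Instead of a PathChildrenCache that sets one watch per token znode, poll
// the token parent path on a timer. No watches are registered, so the ZK
// server-side watch count stays flat regardless of the number of Routers.
public class TokenPoller {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start(CuratorFramework zk, String tokensPath, long intervalSec) {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        // getChildren without a registered Watcher => no server-side watch.
        List<String> tokens = zk.getChildren().forPath(tokensPath);
        refreshLocalTokenCache(tokens);  // hypothetical local-cache update
      } catch (Exception e) {
        // log and retry on the next tick
      }
    }, 0, intervalSec, TimeUnit.SECONDS);
  }

  private void refreshLocalTokenCache(List<String> tokens) { /* ... */ }
}
{code}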






[jira] [Updated] (HDFS-15429) mkdirs should work when parent dir is internalDir and fallback configured.

2020-06-23 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15429:
---
Status: Patch Available  (was: Open)

Updated a PR for review!

> mkdirs should work when parent dir is internalDir and fallback configured.
> --
>
> Key: HDFS-15429
> URL: https://issues.apache.org/jira/browse/HDFS-15429
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
> mkdir will not work if the parent dir is an internal mount dir (a non-leaf 
> dir in the mount path) and a fallback is configured.
> Since the fallback is available, and if the same tree structure is available 
> in the fallback, we should be able to mkdir in the fallback.
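
A minimal sketch of the scenario (the cluster name, mount points, and NN 
addresses below are made up):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// /user is an internal (non-leaf) mount dir, and a fallback fs is configured.
// With this fix, mkdirs on /user/newDir should succeed by creating the dir in
// the fallback when the same parent tree exists there.
public class FallbackMkdirExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.viewfs.mounttable.myCluster.link./user/data",
        "hdfs://nn1:8020/data");
    conf.set("fs.viewfs.mounttable.myCluster.linkFallback",
        "hdfs://nn2:8020/fallback");
    FileSystem viewFs = FileSystem.get(
        java.net.URI.create("viewfs://myCluster/"), conf);
    // Parent /user is an internal mount dir; the mkdirs should land in the
    // fallback fs under /fallback/user/newDir.
    viewFs.mkdirs(new Path("/user/newDir"));
  }
}
{code}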






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143331#comment-17143331
 ] 

Kihwal Lee commented on HDFS-15421:
---

Thanks, [~aajisaka] for the patch. I will also have a look soon.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143296#comment-17143296
 ] 

Konstantin Shvachko commented on HDFS-15421:


Good catch [~kihwal], thanks for debugging this. [~aajisaka], thanks for the 
patch.
Clearly HDFS-14941 missed some append and truncate cases, which update blocks 
with a new genStamp while tailing.

Took a look at the v02 patch. It seems you correctly caught all the other cases 
of block updates during tailing. It would be good if [~vagarychen] could take a 
look as well.
One suggestion for the tests is to move all the test cases into 
{{TestAddBlockTailing}} if possible, potentially renaming it to something like 
{{TestUpdateBlockTailing}}. The two new tests have a lot of code in common with 
{{TestAddBlockTailing}}, and merging them will avoid extra MiniCluster 
startups, making the tests run faster.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-06-23 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143278#comment-17143278
 ] 

Hudson commented on HDFS-15383:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18377 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18377/])
HDFS-15383. RBF: Add support for router delegation token without watch (github: 
rev 84110d850e2bc2a9ff4afcc7508fecd81cb5b7e5)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/ZKDelegationTokenSecretManagerImpl.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/security/token/TestZKDelegationTokenSecretManagerImpl.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/token/delegation/TestZKDelegationTokenSecretManager.java


> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Based on the current design for delegation tokens in the secure Router, the 
> total number of watches for tokens is the product of the number of Routers 
> and the number of tokens. This is because ZKDelegationTokenManager uses 
> Curator's PathChildrenCache, which automatically sets a watch, and ZK pushes 
> the sync information to each Router. Evaluations show that a large number of 
> watches in ZooKeeper has a negative performance impact on the ZooKeeper 
> server.
> In our practice, when the number of watches exceeds 1.2 million in a single 
> ZK server, there is significant ZK performance degradation. Thus this ticket 
> rewrites ZKDelegationTokenManagerImpl.java to explicitly disable the 
> PathChildrenCache and have the Routers sync periodically from ZooKeeper. This 
> has been working fine at the scale of 10 Routers with 2 million tokens.






[jira] [Commented] (HDFS-15416) DataStorage#addStorageLocations() should add more reasonable information verification.

2020-06-23 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143276#comment-17143276
 ] 

Íñigo Goiri commented on HDFS-15416:


Thanks [~jianghuazhu] for the update.
For tracking, please keep adding patches with the sequence number.
For the test, the comment shouldn't be a javadoc-style comment but a regular 
one.

> DataStorage#addStorageLocations() should add more reasonable information 
> verification.
> --
>
> Key: HDFS-15416
> URL: https://issues.apache.org/jira/browse/HDFS-15416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.1.1
>Reporter: jianghua zhu
>Assignee: jianghua zhu
>Priority: Major
> Attachments: HDFS-15416.000.patch
>
>
> successLocations is a list; when it is empty, loadBlockPoolSliceStorage() 
> does not need to be executed.
> Code:
> {code:java}
> try {
>   final List<StorageLocation> successLocations = loadDataStorage(
>       datanode, nsInfo, dataDirs, startOpt, executor);
>   return loadBlockPoolSliceStorage(
>       datanode, nsInfo, successLocations, startOpt, executor);
> } finally {
>   executor.shutdown();
> }
> {code}






[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-06-23 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143268#comment-17143268
 ] 

Íñigo Goiri commented on HDFS-15383:


Thanks [~fengnanli] for the patch and [~hexiaoqiao] for the review.
Merged the PR.

> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Based on the current design for delegation tokens in the secure Router, the 
> total number of watches for tokens is the product of the number of Routers 
> and the number of tokens. This is because ZKDelegationTokenManager uses 
> Curator's PathChildrenCache, which automatically sets a watch, and ZK pushes 
> the sync information to each Router. Evaluations show that a large number of 
> watches in ZooKeeper has a negative performance impact on the ZooKeeper 
> server.
> In our practice, when the number of watches exceeds 1.2 million in a single 
> ZK server, there is significant ZK performance degradation. Thus this ticket 
> rewrites ZKDelegationTokenManagerImpl.java to explicitly disable the 
> PathChildrenCache and have the Routers sync periodically from ZooKeeper. This 
> has been working fine at the scale of 10 Routers with 2 million tokens.






[jira] [Resolved] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-06-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri resolved HDFS-15383.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Based on the current design for delegation tokens in the secure Router, the 
> total number of watches for tokens is the product of the number of Routers 
> and the number of tokens. This is because ZKDelegationTokenManager uses 
> Curator's PathChildrenCache, which automatically sets a watch, and ZK pushes 
> the sync information to each Router. Evaluations show that a large number of 
> watches in ZooKeeper has a negative performance impact on the ZooKeeper 
> server.
> In our practice, when the number of watches exceeds 1.2 million in a single 
> ZK server, there is significant ZK performance degradation. Thus this ticket 
> rewrites ZKDelegationTokenManagerImpl.java to explicitly disable the 
> PathChildrenCache and have the Routers sync periodically from ZooKeeper. This 
> has been working fine at the scale of 10 Routers with 2 million tokens.






[jira] [Commented] (HDFS-15425) Review Logging of DFSClient

2020-06-23 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143165#comment-17143165
 ] 

Íñigo Goiri commented on HDFS-15425:


[^HDFS-15425.002.patch] looks safer.
We probably should fix the checkstyle though.

> Review Logging of DFSClient
> ---
>
> Key: HDFS-15425
> URL: https://issues.apache.org/jira/browse/HDFS-15425
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hongbing Wang
>Assignee: Hongbing Wang
>Priority: Minor
> Attachments: HDFS-15425.001.patch, HDFS-15425.002.patch
>
>
> Review the use of SLF4J for DFSClient.LOG.
> Make the code more concise and readable.
> Less is more!






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143102#comment-17143102
 ] 

Hadoop QA commented on HDFS-15421:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m  
0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  
1s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}108m 28s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}187m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
|   | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29455/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15421 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13006271/HDFS-15421.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 68ab2dd622fb 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 03f855e3e7a |
| Default Java | Private 

[jira] [Resolved] (HDFS-13510) Ozone: Fix precommit hook for Ozone/Hdds on trunk

2020-06-23 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek resolved HDFS-13510.

Resolution: Won't Fix

> Ozone: Fix precommit hook for Ozone/Hdds on trunk
> -
>
> Key: HDFS-13510
> URL: https://issues.apache.org/jira/browse/HDFS-13510
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>
> The current precommit doesn't work with the Ozone projects as they are in an 
> optional profile.
> This jira may not have any code change, but I opened it to track the required 
> changes on builds.apache.org and to make the changes more transparent.
> I think we need the following changes:
> 1. A separate jira subproject, as planned.
> 2. After that we can create a new Precommit-OZONE-Build job which will be 
> triggered by PreCommit-Admin (the jira filter should be modified).
> 3. In Precommit-OZONE-Build we need to enable the hdds profile. This could be 
> done by modifying the yetus personality or by creating a .mvn/maven.config.
> 4. We need the ozone/hdds snapshot artifacts in the apache nexus:
>   a.) One option is adding -P hdds to Hadoop-trunk-Commit. This is the 
> simplest, but an Hdds/Ozone build failure will cause missing artifacts on 
> nexus (low chance, as merges will be guarded by the PreCommit hook).
>   b.) The other option is to create a Hadoop-Ozone-trunk-Commit job which 
> does a full compilation but deploys only the hdds and ozone artifacts (there 
> may be some sync problems here if different core artifacts are uploaded...).
> 5. We also need a daily unit test run (qbt).






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142930#comment-17142930
 ] 

Akira Ajisaka commented on HDFS-15421:
--

002 patch
* fixed comments in test

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Updated] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15421:
-
Attachment: HDFS-15421.002.patch

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Comment Edited] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142884#comment-17142884
 ] 

Akira Ajisaka edited comment on HDFS-15421 at 6/23/20, 1:33 PM:


{quote}I think we need to update genstamp when rolling {{OP_APPEND}}. In 
{{OP_TRUNCATE}}, it is the same.
{quote}
This change does not fix the problem for append. When appending to a block 
without {{CreateFlag.NEW_BLOCK}}, the edit log becomes as follows:
* {{OP_APPEND}}: prepare for append
* {{OP_SET_GENSTAMP_V2}}: update pipeline
* (edited) {{OP_UPDATE_BLOCKS}}: update blocks

That way the SNN will tail {{OP_SET_GENSTAMP_V2}} after {{OP_APPEND}}, so 
applying the impending genstamp in {{OP_APPEND}} does not fix this problem.
I'll attach a patch with some regression tests.
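
A self-contained sketch of the fix direction (the class and method names below 
are illustrative, drawn from this discussion's vocabulary rather than the 
actual patch):

{code:java}
// While tailing, ops that carry a newer block genstamp must also advance the
// standby's global genstamp, or IBRs for that genstamp stay queued.
class GenStampTailer {
  private long globalGenStamp;
  private long impendingGenStamp;  // staged by OP_SET_GENSTAMP_V2 (per HDFS-14941)

  void onSetGenStampV2(long gs) {
    impendingGenStamp = gs;  // staged, not yet applied
  }

  // Called when tailing OP_UPDATE_BLOCKS / OP_TRUNCATE: the block now uses the
  // newer genstamp, so the staged value must be applied; otherwise IBRs for
  // this block look "from the future" and leak into pendingDNMessages.
  void onBlockUpdateOp(long blockGenStamp) {
    if (blockGenStamp > globalGenStamp) {
      globalGenStamp = Math.max(globalGenStamp, impendingGenStamp);
    }
  }

  boolean isGenStampInFuture(long reportedGenStamp) {
    return reportedGenStamp > globalGenStamp;
  }
}
{code}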


was (Author: ajisakaa):
{quote}I think we need to update genstamp when rolling {{OP_APPEND}}. In 
{{OP_TRUNCATE}}, it is the same.
{quote}
This change does not fix the problem for append. When appending to a block 
without {{CreateFlag.NEW_BLOCK}}, the edit log becomes as follows:
* {{OP_APPEND}}: prepare for append
* {{OP_SET_GENSTAMP_V2}}: update pipeline

That way the SNN will tail {{OP_SET_GENSTAMP_V2}} after {{OP_APPEND}}, so 
applying the impending genstamp in {{OP_APPEND}} does not fix this problem.
I'll attach a patch with some regression tests.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Updated] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15421:
-
Assignee: Akira Ajisaka
  Status: Patch Available  (was: Open)

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142925#comment-17142925
 ] 

Akira Ajisaka commented on HDFS-15421:
--

Attached a 001 patch to update the global genstamp in the SBN when tailing 
{{OP_TRUNCATE}} and {{OP_UPDATE_BLOCKS}} edit log ops. Please ignore my 
previous comments; sorry for going back and forth.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Updated] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15421:
-
Attachment: HDFS-15421-001.patch

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142892#comment-17142892
 ] 

Akira Ajisaka commented on HDFS-15421:
--

I think HDFS-14941 can be reverted because it causes the IBR leak not only in 
append but also in pipeline recovery. Now I'm +1 for option 3, fixing the edit 
log race, as described in 
https://issues.apache.org/jira/browse/HDFS-14941?focusedCommentId=16963371=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16963371

Any thoughts?

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Updated] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15421:
-
Attachment: HDFS-15421-000.patch

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142884#comment-17142884
 ] 

Akira Ajisaka commented on HDFS-15421:
--

{quote}I think we need to update genstamp when rolling {{OP_APPEND}}. In 
{{OP_TRUNCATE}}, it is the same.
{quote}
This change does not fix the problem for append. When appending to a block 
without {{CreateFlag.NEW_BLOCK}}, the edit log becomes as follows:
* {{OP_APPEND}}: prepare for append
* {{OP_SET_GENSTAMP_V2}}: update pipeline

That way the SNN will tail {{OP_SET_GENSTAMP_V2}} after {{OP_APPEND}}, so 
applying the impending genstamp in {{OP_APPEND}} does not fix this problem.
I'll attach a patch with some regression tests.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch
>
>
> After HDFS-14941, the update of the global generation stamp is delayed in 
> certain situations. This makes the last set of incremental block reports 
> (IBRs) from append appear to be "from the future", which causes them to be 
> simply re-queued to the pending DN message queue rather than processed to 
> complete the block. The last set of IBRs leaks and is never cleaned up until 
> the NN transitions to active. The size of {{pendingDNMessages}} grows 
> constantly until then.
> If the leak happens while in startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-15425) Review Logging of DFSClient

2020-06-23 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142850#comment-17142850
 ] 

Hadoop QA commented on HDFS-15425:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 30m 
53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
51s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 22s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-client: The 
patch generated 1 new + 61 unchanged - 0 fixed = 62 total (was 61) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
19s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29454/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15425 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13006247/HDFS-15425.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux f542ec76cf44 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / fa14e4bc001 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| checkstyle | 

[jira] [Commented] (HDFS-15425) Review Logging of DFSClient

2020-06-23 Thread Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142792#comment-17142792
 ] 

Hongbing Wang commented on HDFS-15425:
--

I provided an optional version as 002.patch. [~elgoiri] Could you help review 
it? Thanks!

> Review Logging of DFSClient
> ---
>
> Key: HDFS-15425
> URL: https://issues.apache.org/jira/browse/HDFS-15425
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hongbing Wang
>Assignee: Hongbing Wang
>Priority: Minor
> Attachments: HDFS-15425.001.patch, HDFS-15425.002.patch
>
>
> Review use of SLF4J for DFSClient.LOG. 
> Make the code more concise and readable. 
> Less is more!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15425) Review Logging of DFSClient

2020-06-23 Thread Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-15425:
-
Attachment: HDFS-15425.002.patch

> Review Logging of DFSClient
> ---
>
> Key: HDFS-15425
> URL: https://issues.apache.org/jira/browse/HDFS-15425
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hongbing Wang
>Assignee: Hongbing Wang
>Priority: Minor
> Attachments: HDFS-15425.001.patch, HDFS-15425.002.patch
>
>
> Review use of SLF4J for DFSClient.LOG. 
> Make the code more concise and readable. 
> Less is more!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15431) Can not read a opening file after NameNode failover if pipeline recover occured

2020-06-23 Thread ludun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142772#comment-17142772
 ] 

ludun edited comment on HDFS-15431 at 6/23/20, 9:11 AM:


[~surendrasingh], [~hemanthboyina] please check this issue.


was (Author: pilchard):
[~surendrasingh], [~hemanthboyina] please check the issue.

> Can not read a opening file after NameNode failover if pipeline recover 
> occured
> ---
>
> Key: HDFS-15431
> URL: https://issues.apache.org/jira/browse/HDFS-15431
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: ludun
>Priority: Major
>
> A file with two replicas is created and kept open for writing.
> First it writes to DN1 and DN2.
> {code}
> 2020-06-23 14:22:51,379 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-7e434b35-0b10-44fa-9d3b-c3c938f1724d,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN2 restarts, it writes to DN1 and DN3.
> {code}
> 2020-06-23 14:24:04,559 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN1 restarts, it writes to DN3 and DN4.
> {code}
> 2020-06-23 14:25:21,340 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK],
>  
> DatanodeInfoWithStorage[DN4:25009,DS-5fbb2232-e7c8-4186-8eb9-87a6aff86cef,DISK]]
>  | DataStreamer.java:1757
> {code}
> Restart the active NameNode, then try to read the file.
> The NameNode returns located blocks with DN1 and DN2, and a "Can not obtain 
> block" exception occurs.
> {code}
> 20/06/20 17:57:06 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
>   fileLength=0
>   underConstruction=true
>   
> blocks=[LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}]
>   
> lastLocatedBlock=LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}
>   isLastBlockComplete=false}
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15431) Can not read a opening file after NameNode failover if pipeline recover occured

2020-06-23 Thread ludun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ludun updated HDFS-15431:
-
Summary: Can not read a opening file after NameNode failover if pipeline 
recover occured  (was: Can not read a opening file after NameNode failover if 
pipeline recover occuered)

> Can not read a opening file after NameNode failover if pipeline recover 
> occured
> ---
>
> Key: HDFS-15431
> URL: https://issues.apache.org/jira/browse/HDFS-15431
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: ludun
>Priority: Major
>
> A file with two replicas is created and kept open for writing.
> First it writes to DN1 and DN2.
> {code}
> 2020-06-23 14:22:51,379 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-7e434b35-0b10-44fa-9d3b-c3c938f1724d,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN2 restarts, it writes to DN1 and DN3.
> {code}
> 2020-06-23 14:24:04,559 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN1 restarts, it writes to DN3 and DN4.
> {code}
> 2020-06-23 14:25:21,340 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK],
>  
> DatanodeInfoWithStorage[DN4:25009,DS-5fbb2232-e7c8-4186-8eb9-87a6aff86cef,DISK]]
>  | DataStreamer.java:1757
> {code}
> Restart the active NameNode, then try to read the file.
> The NameNode returns located blocks with DN1 and DN2, and a "Can not obtain 
> block" exception occurs.
> {code}
> 20/06/20 17:57:06 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
>   fileLength=0
>   underConstruction=true
>   
> blocks=[LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}]
>   
> lastLocatedBlock=LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}
>   isLastBlockComplete=false}
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15431) Can not read a opening file after NameNode failover if pipeline recover occuered

2020-06-23 Thread ludun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142772#comment-17142772
 ] 

ludun commented on HDFS-15431:
--

[~surendrasingh] [~hemanthboyina] please check the issue.

> Can not read a opening file after NameNode failover if pipeline recover 
> occuered
> 
>
> Key: HDFS-15431
> URL: https://issues.apache.org/jira/browse/HDFS-15431
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: ludun
>Priority: Major
>
> A file with two replicas is created and kept open for writing.
> First it writes to DN1 and DN2.
> {code}
> 2020-06-23 14:22:51,379 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-7e434b35-0b10-44fa-9d3b-c3c938f1724d,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN2 restarts, it writes to DN1 and DN3.
> {code}
> 2020-06-23 14:24:04,559 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN1 restarts, it writes to DN3 and DN4.
> {code}
> 2020-06-23 14:25:21,340 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK],
>  
> DatanodeInfoWithStorage[DN4:25009,DS-5fbb2232-e7c8-4186-8eb9-87a6aff86cef,DISK]]
>  | DataStreamer.java:1757
> {code}
> Restart the active NameNode, then try to read the file.
> The NameNode returns located blocks with DN1 and DN2, and a "Can not obtain 
> block" exception occurs.
> {code}
> 20/06/20 17:57:06 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
>   fileLength=0
>   underConstruction=true
>   
> blocks=[LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}]
>   
> lastLocatedBlock=LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}
>   isLastBlockComplete=false}
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15431) Can not read a opening file after NameNode failover if pipeline recover occuered

2020-06-23 Thread ludun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142772#comment-17142772
 ] 

ludun edited comment on HDFS-15431 at 6/23/20, 9:09 AM:


[~surendrasingh], [~hemanthboyina] please check the issue.


was (Author: pilchard):
[~surendrasingh] [~hemanthboyina] please check the issue.

> Can not read a opening file after NameNode failover if pipeline recover 
> occuered
> 
>
> Key: HDFS-15431
> URL: https://issues.apache.org/jira/browse/HDFS-15431
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: ludun
>Priority: Major
>
> A file with two replicas is created and kept open for writing.
> First it writes to DN1 and DN2.
> {code}
> 2020-06-23 14:22:51,379 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-7e434b35-0b10-44fa-9d3b-c3c938f1724d,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN2 restarts, it writes to DN1 and DN3.
> {code}
> 2020-06-23 14:24:04,559 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN1 restarts, it writes to DN3 and DN4.
> {code}
> 2020-06-23 14:25:21,340 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK],
>  
> DatanodeInfoWithStorage[DN4:25009,DS-5fbb2232-e7c8-4186-8eb9-87a6aff86cef,DISK]]
>  | DataStreamer.java:1757
> {code}
> Restart the active NameNode, then try to read the file.
> The NameNode returns located blocks with DN1 and DN2, and a "Can not obtain 
> block" exception occurs.
> {code}
> 20/06/20 17:57:06 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
>   fileLength=0
>   underConstruction=true
>   
> blocks=[LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}]
>   
> lastLocatedBlock=LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}
>   isLastBlockComplete=false}
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15431) Can not read a opening file after NameNode failover if pipeline recover occuered

2020-06-23 Thread ludun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142764#comment-17142764
 ] 

ludun commented on HDFS-15431:
--

In {{BlockReceiver}}, we should call {{notifyNamenodeReceivingBlock}} in the 
{{PIPELINE_SETUP_STREAMING_RECOVERY}} case as well:
{code}
case PIPELINE_SETUP_CREATE:
  replicaHandler = datanode.data.createRbw(storageType, storageId,
  block, allowLazyPersist);
  datanode.notifyNamenodeReceivingBlock(
  block, replicaHandler.getReplica().getStorageUuid());
  break;
case PIPELINE_SETUP_STREAMING_RECOVERY:
  replicaHandler = datanode.data.recoverRbw(
  block, newGs, minBytesRcvd, maxBytesRcvd);
  block.setGenerationStamp(newGs);
  break;
{code}

That way the standby NameNode also learns the location of the new DN, and 
after failover the file can be read normally.
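A sketch of the proposed change, with the placement inferred from the excerpt 
above (not a committed patch):
{code}
case PIPELINE_SETUP_STREAMING_RECOVERY:
  replicaHandler = datanode.data.recoverRbw(
      block, newGs, minBytesRcvd, maxBytesRcvd);
  block.setGenerationStamp(newGs);
  // Proposed addition: report the receiving replica to the NameNode(s),
  // mirroring the PIPELINE_SETUP_CREATE branch, so the standby also
  // learns that this DN now holds the block.
  datanode.notifyNamenodeReceivingBlock(
      block, replicaHandler.getReplica().getStorageUuid());
  break;
{code}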

> Can not read a opening file after NameNode failover if pipeline recover 
> occuered
> 
>
> Key: HDFS-15431
> URL: https://issues.apache.org/jira/browse/HDFS-15431
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: ludun
>Priority: Major
>
> A file with two replicas is created and kept open for writing.
> First it writes to DN1 and DN2.
> {code}
> 2020-06-23 14:22:51,379 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-7e434b35-0b10-44fa-9d3b-c3c938f1724d,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN2 restarts, it writes to DN1 and DN3.
> {code}
> 2020-06-23 14:24:04,559 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN1 restarts, it writes to DN3 and DN4.
> {code}
> 2020-06-23 14:25:21,340 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK],
>  
> DatanodeInfoWithStorage[DN4:25009,DS-5fbb2232-e7c8-4186-8eb9-87a6aff86cef,DISK]]
>  | DataStreamer.java:1757
> {code}
> Restart the active NameNode, then try to read the file.
> The NameNode returns located blocks with DN1 and DN2, and a "Can not obtain 
> block" exception occurs.
> {code}
> 20/06/20 17:57:06 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
>   fileLength=0
>   underConstruction=true
>   
> blocks=[LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}]
>   
> lastLocatedBlock=LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}
>   isLastBlockComplete=false}
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15431) Can not read a opening file after NameNode failover if pipeline recover occuered

2020-06-23 Thread ludun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ludun updated HDFS-15431:
-
Description: 
A file with two replicas is created and kept open for writing.

First it writes to DN1 and DN2.
{code}
2020-06-23 14:22:51,379 | DEBUG | pipeline = 
[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
 
DatanodeInfoWithStorage[DN2:25009,DS-7e434b35-0b10-44fa-9d3b-c3c938f1724d,DISK]]
 | DataStreamer.java:1757
{code}
After DN2 restarts, it writes to DN1 and DN3.
{code}
2020-06-23 14:24:04,559 | DEBUG | pipeline = 
[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
 
DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK]]
 | DataStreamer.java:1757
{code}
After DN1 restarts, it writes to DN3 and DN4.
{code}
2020-06-23 14:25:21,340 | DEBUG | pipeline = 
[DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK],
 
DatanodeInfoWithStorage[DN4:25009,DS-5fbb2232-e7c8-4186-8eb9-87a6aff86cef,DISK]]
 | DataStreamer.java:1757
{code}
Restart the active NameNode, then try to read the file.

The NameNode returns located blocks with DN1 and DN2, and a "Can not obtain 
block" exception occurs.
{code}
20/06/20 17:57:06 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
  fileLength=0
  underConstruction=true
  
blocks=[LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
 getBlockSize()=53; corrupt=false; offset=0; 
locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
 
DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}]
  
lastLocatedBlock=LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
 getBlockSize()=53; corrupt=false; offset=0; 
locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
 
DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}
  isLastBlockComplete=false}
{code}
 

  was:
a file with two replications and keep it opening.

first it writes to DN1 and DN2.

after DN2 restart, it writes to DN1 and DN3, 

after DN1 restart. it writes to DN3 and DN4.

restart Active NameNode.  then try to get the file.   

NameNode return locatedblocks with DN1 and DN2. Can not obtain block Exception 
occurred.
{code}
20/06/20 17:57:06 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
  fileLength=0
  underConstruction=true
  
blocks=[LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
 getBlockSize()=53; corrupt=false; offset=0; 
locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
 
DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}]
  
lastLocatedBlock=LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
 getBlockSize()=53; corrupt=false; offset=0; 
locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
 
DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}
  isLastBlockComplete=false}
{code}
 


> Can not read a opening file after NameNode failover if pipeline recover 
> occuered
> 
>
> Key: HDFS-15431
> URL: https://issues.apache.org/jira/browse/HDFS-15431
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: ludun
>Priority: Major
>
> A file with two replicas is created and kept open for writing.
> First it writes to DN1 and DN2.
> {code}
> 2020-06-23 14:22:51,379 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN2:25009,DS-7e434b35-0b10-44fa-9d3b-c3c938f1724d,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN2 restarts, it writes to DN1 and DN3.
> {code}
> 2020-06-23 14:24:04,559 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
>  
> DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK]]
>  | DataStreamer.java:1757
> {code}
> After DN1 restarts, it writes to DN3 and DN4.
> {code}
> 2020-06-23 14:25:21,340 | DEBUG | pipeline = 
> [DatanodeInfoWithStorage[DN3:25009,DS-1810c3d5-b6e8-4403-a0fc-071ea6e5489f,DISK],
>  
> DatanodeInfoWithStorage[DN4:25009,DS-5fbb2232-e7c8-4186-8eb9-87a6aff86cef,DISK]]
>  | DataStreamer.java:1757
> {code}
> Restart the active NameNode, then try to read the file.
> The NameNode returns located blocks with DN1 and DN2, and a "Can not obtain 
> block" exception occurs.
> {code}
> 20/06/20 17:57:06 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
>   fileLength=0
>   underConstruction=true
>   
> blocks=[LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
>  getBlockSize()=53; corrupt=false; offset=0; 
> 

[jira] [Created] (HDFS-15431) Can not read a opening file after NameNode failover if pipeline recover occuered

2020-06-23 Thread ludun (Jira)
ludun created HDFS-15431:


 Summary: Can not read a opening file after NameNode failover if 
pipeline recover occuered
 Key: HDFS-15431
 URL: https://issues.apache.org/jira/browse/HDFS-15431
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: ludun


A file with two replicas is created and kept open for writing.

First it writes to DN1 and DN2.

After DN2 restarts, it writes to DN1 and DN3.

After DN1 restarts, it writes to DN3 and DN4.

Restart the active NameNode, then try to read the file.

The NameNode returns located blocks with DN1 and DN2, and a "Can not obtain 
block" exception occurs.
{code}
20/06/20 17:57:06 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
  fileLength=0
  underConstruction=true
  
blocks=[LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
 getBlockSize()=53; corrupt=false; offset=0; 
locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
 
DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}]
  
lastLocatedBlock=LocatedBlock{BP-1590194288-10.162.26.113-1587096223927:blk_1073895975_155796;
 getBlockSize()=53; corrupt=false; offset=0; 
locs=[DatanodeInfoWithStorage[DN1:25009,DS-1dcbe5bd-f69a-422c-bea6-a41bda773084,DISK],
 
DatanodeInfoWithStorage[DN2:25009,DS-cd06a4f9-c25d-42ab-887b-f129707dba17,DISK]]}
  isLastBlockComplete=false}
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15427) Merged ListStatus with Fallback target filesystem and InternalDirViewFS.

2020-06-23 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142757#comment-17142757
 ] 

Hudson commented on HDFS-15427:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18374 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18374/])
HDFS-15427. Merged ListStatus with Fallback target filesystem and (github: rev 
7c02d1889bbeabc73c95a4c83f0cd204365ff410)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFs.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemLinkFallback.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java


> Merged ListStatus with Fallback target filesystem and InternalDirViewFS.
> 
>
> Key: HDFS-15427
> URL: https://issues.apache.org/jira/browse/HDFS-15427
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, ListStatus does not consider the fallback directory when the 
> passed path is an internal directory (except the root).
> Since a fallback is configured, we should be able to list the fallback 
> directories when the passed path is an internal directory. It should list 
> the union of fallbackDir and internalDir, so that fallback directories are 
> not shadowed when the path matches an internal dir.
>  
> The idea here is: when the user configures the default filesystem with a 
> fallback fs, every operation on a path that has no mount link should go to 
> the fallback fs. That way users need not configure all paths as mounts from 
> the default fs.
>  
> This will be very useful in the case of ViewFSOverloadScheme. There, if you 
> configure your existing cluster as the fallback fs, you can configure the 
> desired mount paths to external filesystems, and every other path will go 
> to the fallback.  
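A condensed sketch of the merge described above (the shape is assumed from the 
description, not copied from the commit):
{code}
// Union of internal (mount) children and fallback children for the same
// path; mount entries shadow same-named fallback entries.
FileStatus[] listStatusWithFallback(FileSystem fallbackFs, Path path,
    FileStatus[] internalDirStatuses) throws IOException {
  Map<String, FileStatus> merged = new LinkedHashMap<>();
  for (FileStatus s : fallbackFs.listStatus(path)) {   // fallback first
    merged.put(s.getPath().getName(), s);
  }
  for (FileStatus s : internalDirStatuses) {           // mount entries win
    merged.put(s.getPath().getName(), s);
  }
  return merged.values().toArray(new FileStatus[0]);
}
{code}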



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15427) Merged ListStatus with Fallback target filesystem and InternalDirViewFS.

2020-06-23 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G resolved HDFS-15427.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks a lot [~ayushsaxena] for reviews! 

> Merged ListStatus with Fallback target filesystem and InternalDirViewFS.
> 
>
> Key: HDFS-15427
> URL: https://issues.apache.org/jira/browse/HDFS-15427
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, ListStatus does not consider the fallback directory when the 
> passed path is an internal directory (except the root).
> Since a fallback is configured, we should be able to list the fallback 
> directories when the passed path is an internal directory. It should list 
> the union of fallbackDir and internalDir, so that fallback directories are 
> not shadowed when the path matches an internal dir.
>  
> The idea here is: when the user configures the default filesystem with a 
> fallback fs, every operation on a path that has no mount link should go to 
> the fallback fs. That way users need not configure all paths as mounts from 
> the default fs.
>  
> This will be very useful in the case of ViewFSOverloadScheme. There, if you 
> configure your existing cluster as the fallback fs, you can configure the 
> desired mount paths to external filesystems, and every other path will go 
> to the fallback.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15421:
-
Target Version/s: 3.3.0, 3.1.4, 3.2.2, 2.10.1  (was: 2.10.1)
  Labels: release-blocker  (was: )

I observed the leak in our 3.3.0-SNAPSHOT dev cluster. Adding the target 
versions.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Blocker
>  Labels: release-blocker
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain 
> situations.  This makes the last set of incremental block reports from an 
> append appear to come "from the future", which causes them to be simply 
> re-queued to the pending DN message queue rather than processed to complete 
> the block.  The last set of IBRs will leak and never be cleaned up until the 
> NN transitions to active.  The size of {{pendingDNMessages}} constantly 
> grows until then.
> If a leak happens while in a startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15430) create should work when parent dir is internalDir and fallback configured.

2020-06-23 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15430:
---
Description: 
create will not work if the parent dir is an internal mount dir (a non-leaf 
dir in the mount path) even when a fallback is configured.

Since a fallback is available, and the same tree structure is available in the 
fallback, we should be able to create the file in the fallback fs.
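A minimal sketch of the intended behavior, assuming the internal-dir handler 
can reach the fallback target (identifiers are illustrative, not the actual 
patch; mkdirs in HDFS-15429 would follow the same pattern):
{code}
// In InternalDirOfViewFs-style code: instead of unconditionally failing
// with a read-only-mount-table error, delegate create() to the fallback
// fs when one is linked and the parent exists there.
public FSDataOutputStream create(Path f, FsPermission permission,
    boolean overwrite, int bufferSize, short replication, long blockSize,
    Progressable progress) throws IOException {
  if (fallbackFs != null) {                       // a fallback is linked
    return fallbackFs.create(f, permission, overwrite, bufferSize,
        replication, blockSize, progress);
  }
  throw readOnlyMountTable("create", f);          // existing behavior
}
{code}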

> create should work when parent dir is internalDir and fallback configured.
> ---
>
> Key: HDFS-15430
> URL: https://issues.apache.org/jira/browse/HDFS-15430
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
> create will not work if the parent dir is an internal mount dir (a non-leaf 
> dir in the mount path) even when a fallback is configured.
> Since a fallback is available, and the same tree structure is available in 
> the fallback, we should be able to create the file in the fallback fs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15430) create should work when parent dir is internalDir and fallback configured.

2020-06-23 Thread Uma Maheswara Rao G (Jira)
Uma Maheswara Rao G created HDFS-15430:
--

 Summary: create should work when parent dir is internalDir and 
fallback configured.
 Key: HDFS-15430
 URL: https://issues.apache.org/jira/browse/HDFS-15430
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.2.1
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15429) mkdirs should work when parent dir is internalDir and fallback configured.

2020-06-23 Thread Uma Maheswara Rao G (Jira)
Uma Maheswara Rao G created HDFS-15429:
--

 Summary: mkdirs should work when parent dir is internalDir and 
fallback configured.
 Key: HDFS-15429
 URL: https://issues.apache.org/jira/browse/HDFS-15429
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.2.1
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G


mkdir will not work if the parent dir is an internal mount dir (a non-leaf dir 
in the mount path) even when a fallback is configured.

Since a fallback is available, and the same tree structure is available in the 
fallback, we should be able to mkdir in the fallback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-23 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142713#comment-17142713
 ] 

Akira Ajisaka commented on HDFS-15421:
--

Thank you [~kihwal] for the detailed report. I read your report and the 
discussion in HDFS-14941.

In the append operation, the ANN first logs {{OP_SET_GENSTAMP_V2}} and then 
logs {{OP_APPEND}}. After HDFS-14941, the SNN replays the 
{{OP_SET_GENSTAMP_V2}} edit and sets the impending genstamp without updating 
the global genstamp. Next the SNN replays the {{OP_APPEND}} edit, but the 
global genstamp is still not updated. That's why the genstamp is never updated 
and the IBRs always come "from the future". I think we need to update the 
genstamp when replaying {{OP_APPEND}}. The same applies to {{OP_TRUNCATE}}.
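A minimal sketch of that direction in the edit-log replay path (the helper 
name is hypothetical, not the committed patch):
{code}
// FSEditLogLoader.applyEditLogOp, sketched:
case OP_APPEND: {
  AppendOp appendOp = (AppendOp) op;
  // ... existing replay of the append itself ...
  // Hypothetical fix: promote the impending genstamp recorded by the
  // preceding OP_SET_GENSTAMP_V2 into the global genstamp, so IBRs for
  // the new generation are no longer postponed as "from the future".
  fsNamesys.getBlockManager().getBlockIdManager()
      .applyImpendingGenerationStamp();   // hypothetical helper
  break;
}
{code}
The same hook would be needed when replaying {{OP_TRUNCATE}}.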

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Priority: Blocker
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain 
> situations.  This makes the last set of incremental block reports from an 
> append appear to come "from the future", which causes them to be simply 
> re-queued to the pending DN message queue rather than processed to complete 
> the block.  The last set of IBRs will leak and never be cleaned up until the 
> NN transitions to active.  The size of {{pendingDNMessages}} constantly 
> grows until then.
> If a leak happens while in a startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org