[jira] [Commented] (HDFS-14942) Change Log Level to debug in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966456#comment-16966456
 ] 

Lisheng Sun commented on HDFS-14942:


Thanks [~ayushtkn] [~weichiu] for your good suggestions.

The message should be ignored and not be seen by the user.

I updated the patch and uploaded the v004 patch.
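
For reference, a minimal sketch of what the v004 change amounts to, based only on the snippet quoted below; the exact debug message wording is an assumption, not the actual patch:

{code:java}
  GetEditLogManifestResponseProto editLogManifest;
  try {
    editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
        nameServiceId, 0, false);
  } catch (IOException e) {
    // A peer JournalNode that is still on 2.x (e.g. during a rolling upgrade)
    // does not implement InterQJournalProtocol, so this failure is expected
    // and should not be surfaced to operators at ERROR level.
    LOG.debug("Could not sync with Journal at {}",
        otherJNProxies.get(journalNodeIndexForSync), e);
    return;
  }
{code}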

> Change Log Level to debug in JournalNodeSyncer#syncWithJournalAtIndex
> -
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14942.001.patch, HDFS-14942.002.patch, 
> HDFS-14942.003.patch, HDFS-14942.004.patch
>
>
> When a Hadoop 2.x cluster is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
> added, so the older JournalNodes throw "Unknown protocol". 
> The new InterQJournalProtocol is used to synchronize past log segments to 
> JNs that missed them, and this error does not affect normal 
> service. I think it should not be an ERROR log; a WARN log is more 
> reasonable.
> {code:java}
>  private void syncWithJournalAtIndex(int index) {
>   ...
> GetEditLogManifestResponseProto editLogManifest;
> try {
>   editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
>   nameServiceId, 0, false);
> } catch (IOException e) {
>   LOG.error("Could not sync with Journal at " +
>   otherJNProxies.get(journalNodeIndexForSync), e);
>   return;
> }
> {code}
> {code:java}
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
> at org.apache.hadoop.ipc.Client.call(Client.java:1511)
> at org.apache.hadoop.ipc.Client.call(Client.java:1421)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14942) Change Log Level to debug in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14942:
---
Attachment: HDFS-14942.004.patch

> Change Log Level to debug in JournalNodeSyncer#syncWithJournalAtIndex
> -
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14942.001.patch, HDFS-14942.002.patch, 
> HDFS-14942.003.patch, HDFS-14942.004.patch
>
>
> When a Hadoop 2.x cluster is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
> added, so the older JournalNodes throw "Unknown protocol". 
> The new InterQJournalProtocol is used to synchronize past log segments to 
> JNs that missed them, and this error does not affect normal 
> service. I think it should not be an ERROR log; a WARN log is more 
> reasonable.
> {code:java}
>  private void syncWithJournalAtIndex(int index) {
>   ...
> GetEditLogManifestResponseProto editLogManifest;
> try {
>   editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
>   nameServiceId, 0, false);
> } catch (IOException e) {
>   LOG.error("Could not sync with Journal at " +
>   otherJNProxies.get(journalNodeIndexForSync), e);
>   return;
> }
> {code}
> {code:java}
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
> at org.apache.hadoop.ipc.Client.call(Client.java:1511)
> at org.apache.hadoop.ipc.Client.call(Client.java:1421)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Updated] (HDFS-14942) Change Log Level to debug in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14942:
---
Summary: Change Log Level to debug in 
JournalNodeSyncer#syncWithJournalAtIndex  (was: Change Log Level to warn in 
JournalNodeSyncer#syncWithJournalAtIndex)

> Change Log Level to debug in JournalNodeSyncer#syncWithJournalAtIndex
> -
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14942.001.patch, HDFS-14942.002.patch, 
> HDFS-14942.003.patch, HDFS-14942.004.patch
>
>
> When a Hadoop 2.x cluster is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
> added, so the older JournalNodes throw "Unknown protocol". 
> The new InterQJournalProtocol is used to synchronize past log segments to 
> JNs that missed them, and this error does not affect normal 
> service. I think it should not be an ERROR log; a WARN log is more 
> reasonable.
> {code:java}
>  private void syncWithJournalAtIndex(int index) {
>   ...
> GetEditLogManifestResponseProto editLogManifest;
> try {
>   editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
>   nameServiceId, 0, false);
> } catch (IOException e) {
>   LOG.error("Could not sync with Journal at " +
>   otherJNProxies.get(journalNodeIndexForSync), e);
>   return;
> }
> {code}
> {code:java}
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
> at org.apache.hadoop.ipc.Client.call(Client.java:1511)
> at org.apache.hadoop.ipc.Client.call(Client.java:1421)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-11-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966444#comment-16966444
 ] 

Ayush Saxena commented on HDFS-14938:
-

Thanks [~leosun08], seems fair enough to me.
v007 LGTM, +1

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch, 
> HDFS-14938.003.patch, HDFS-14938.004.patch, HDFS-14938.005.patch, 
> HDFS-14938.006.patch, HDFS-14938.007.patch
>
>
> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType().
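
For readers skimming the thread, a rough sketch of the kind of early-exit guard the title describes; the method shape and variable names here are assumptions, not the contents of the v007 patch:

{code:java}
// Hypothetical guard near the top of a chooseRandomWithStorageType-style method:
// if the scope node itself is excluded, nothing under that scope can be chosen,
// so return early instead of searching the subtree.
Node scopeNode = getNode(scope);
if (excludedNodes != null && excludedNodes.contains(scopeNode)) {
  return null;
}
{code}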






[jira] [Commented] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966443#comment-16966443
 ] 

Ayush Saxena commented on HDFS-14942:
-

Thanks [~leosun08] for confirming.
If this happens only during a rolling upgrade and never in any other case, then as 
[~weichiu] said, we need to always ignore it. Shouldn't we turn it to debug, 
rather than adding a line to ignore the exception?


> Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex
> 
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14942.001.patch, HDFS-14942.002.patch, 
> HDFS-14942.003.patch
>
>
> When a Hadoop 2.x cluster is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
> added, so the older JournalNodes throw "Unknown protocol". 
> The new InterQJournalProtocol is used to synchronize past log segments to 
> JNs that missed them, and this error does not affect normal 
> service. I think it should not be an ERROR log; a WARN log is more 
> reasonable.
> {code:java}
>  private void syncWithJournalAtIndex(int index) {
>   ...
> GetEditLogManifestResponseProto editLogManifest;
> try {
>   editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
>   nameServiceId, 0, false);
> } catch (IOException e) {
>   LOG.error("Could not sync with Journal at " +
>   otherJNProxies.get(journalNodeIndexForSync), e);
>   return;
> }
> {code}
> {code:java}
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
> at org.apache.hadoop.ipc.Client.call(Client.java:1511)
> at org.apache.hadoop.ipc.Client.call(Client.java:1421)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (HDFS-14802) The feature of protect directories should be used in RenameOp

2019-11-03 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966434#comment-16966434
 ] 

Fei Hui commented on HDFS-14802:


[~weichiu] Thanks.
Waiting for [~ste...@apache.org]'s comments.
If necessary, I will file a new jira.

> The feature of protect directories should be used in RenameOp
> -
>
> Key: HDFS-14802
> URL: https://issues.apache.org/jira/browse/HDFS-14802
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14802.001.patch, HDFS-14802.002.patch, 
> HDFS-14802.003.patch, HDFS-14802.004.patch
>
>
> Now we can set fs.protected.directories to prevent users from deleting 
> important directories, but users can still work around the limitation:
> 1. Rename the directories and then delete them.
> 2. Move the directories to trash, and the namenode will delete them.
> So I think we should apply the protected-directories feature in RenameOp as well.
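
A minimal sketch of the kind of guard the description argues for, borrowing the idea from the existing delete path; the call site and helper shown here are assumptions, not the actual patch:

{code:java}
// Hypothetical check in the rename path, mirroring what the delete path already
// does for fs.protected.directories: refuse to rename (or move to trash) a
// directory if it, or any descendant being moved, is a protected directory.
// checkProtectedDescendants is assumed to throw AccessControlException.
DFSUtil.checkProtectedDescendants(fsd, srcIIP);
{code}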






[jira] [Commented] (HDFS-14946) Erasure Coding: Block recovery failed during decommissioning

2019-11-03 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966420#comment-16966420
 ] 

Fei Hui commented on HDFS-14946:


[~ayushtkn] Thanks for your review!
Uploaded the v005 patch addressing your comments.

> Erasure Coding: Block recovery failed during decommissioning
> 
>
> Key: HDFS-14946
> URL: https://issues.apache.org/jira/browse/HDFS-14946
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14946.001.patch, HDFS-14946.002.patch, 
> HDFS-14946.003.patch
>
>
> DataNode logs as follow
> {quote}
> org.apache.hadoop.HadoopIllegalArgumentException: No enough valid inputs are 
> provided, not recoverable
>   at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119)
>   at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.(ByteBufferDecodingState.java:47)
>   at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>   at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstructTargets(StripedBlockReconstructor.java:126)
>   at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:97)
>   at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> {quote}
> Block recovery always fails because the srcNodes are in the wrong order.
> Reproduce steps:
> # EC block (b0, b1, b2, b3, b4, b5, b6, b7, b8), b[0-8] are on dn[0-8], and 
> dn[0-3] are decommissioning.
> # dn[1-3] are decommissioned, dn0 is still decommissioning, and the EC block is 
> [b0(decommissioning), b[1-3](decommissioned), b[4-8](live), b[0-3](live)].
> # dn4 crashes, so b4 needs to be recovered, and the EC block is [b0(decommissioning), 
> b[1-3](decommissioned), null, b[5-8](live), b[0-3](live)].
> We then see the error log above, and b4 is not recovered successfully, because the 
> srcNodes transferred to the recovering datanode contain blocks [b0, b[5-8], b[0-3]], 
> and the datanode uses [b0, b[5-8], b0] (minRequiredSources readers to reconstruct, 
> minRequiredSources = Math.min(cellsNum, dataBlkNum)) to recover the missing 
> block.
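
To make the ordering problem concrete, here is a small self-contained sketch (illustrative only, not the DataNode code) of how taking the first minRequiredSources entries from the badly ordered source list reads the b0 replica twice and never reaches b1-b3:

{code:java}
import java.util.Arrays;
import java.util.List;

public class SrcNodeOrderDemo {
  public static void main(String[] args) {
    // Source replicas in the order handed to the reconstructing datanode:
    // the decommissioning b0 first, then live b5..b8, then the live b0..b3.
    List<String> srcNodes = Arrays.asList(
        "b0(decommissioning)", "b5", "b6", "b7", "b8", "b0", "b1", "b2", "b3");
    int dataBlkNum = 6;
    int cellsNum = 6;  // full stripe for RS-6-3
    int minRequiredSources = Math.min(cellsNum, dataBlkNum);
    // Prints [b0(decommissioning), b5, b6, b7, b8, b0]: b0 is used twice,
    // so decoding b4 fails with "No enough valid inputs are provided".
    System.out.println(srcNodes.subList(0, minRequiredSources));
  }
}
{code}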






[jira] [Updated] (HDFS-14946) Erasure Coding: Block recovery failed during decommissioning

2019-11-03 Thread Fei Hui (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HDFS-14946:
---
Attachment: HDFS-14946.003.patch

> Erasure Coding: Block recovery failed during decommissioning
> 
>
> Key: HDFS-14946
> URL: https://issues.apache.org/jira/browse/HDFS-14946
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14946.001.patch, HDFS-14946.002.patch, 
> HDFS-14946.003.patch
>
>
> DataNode logs as follow
> {quote}
> org.apache.hadoop.HadoopIllegalArgumentException: No enough valid inputs are 
> provided, not recoverable
>   at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119)
>   at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.(ByteBufferDecodingState.java:47)
>   at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>   at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstructTargets(StripedBlockReconstructor.java:126)
>   at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:97)
>   at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> {quote}
> Block recovery always fails because the srcNodes are in the wrong order.
> Reproduce steps:
> # EC block (b0, b1, b2, b3, b4, b5, b6, b7, b8), b[0-8] are on dn[0-8], and 
> dn[0-3] are decommissioning.
> # dn[1-3] are decommissioned, dn0 is still decommissioning, and the EC block is 
> [b0(decommissioning), b[1-3](decommissioned), b[4-8](live), b[0-3](live)].
> # dn4 crashes, so b4 needs to be recovered, and the EC block is [b0(decommissioning), 
> b[1-3](decommissioned), null, b[5-8](live), b[0-3](live)].
> We then see the error log above, and b4 is not recovered successfully, because the 
> srcNodes transferred to the recovering datanode contain blocks [b0, b[5-8], b[0-3]], 
> and the datanode uses [b0, b[5-8], b0] (minRequiredSources readers to reconstruct, 
> minRequiredSources = Math.min(cellsNum, dataBlkNum)) to recover the missing 
> block.






[jira] [Commented] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966404#comment-16966404
 ] 

Hadoop QA commented on HDFS-14942:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m 59s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}170m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.TestMultipleNNPortQOP |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14942 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984745/HDFS-14942.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 10a2021552d5 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d462308 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28237/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28237/testReport/ |
| Max. process+thread count | 2715 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Work logged] (HDDS-1569) Add ability to SCM for creating multiple pipelines with same datanode

2019-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1569?focusedWorklogId=337937=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337937
 ]

ASF GitHub Bot logged work on HDDS-1569:


Author: ASF GitHub Bot
Created on: 04/Nov/19 03:58
Start Date: 04/Nov/19 03:58
Worklog Time Spent: 10m 
  Work Description: timmylicheng commented on pull request #1431: HDDS-1569 
Support creating multiple pipelines with same datanode
URL: https://github.com/apache/hadoop/pull/1431
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337937)
Time Spent: 8.5h  (was: 8h 20m)

> Add ability to SCM for creating multiple pipelines with same datanode
> -
>
> Key: HDDS-1569
> URL: https://issues.apache.org/jira/browse/HDDS-1569
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Siddharth Wagle
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: HDDS-1564
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> - Refactor _RatisPipelineProvider.create()_ to be able to create pipelines 
> with datanodes that are not a part of sufficient pipelines
> - Define soft and hard upper bounds for pipeline membership
> - Create SCMAllocationManager that can be leveraged to get a candidate set of 
> datanodes based on placement policies
> - Add the datanodes to internal datastructures






[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966376#comment-16966376
 ] 

Hadoop QA commented on HDFS-14941:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
58s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
36s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 42s{color} | {color:orange} root: The patch generated 2 new + 542 unchanged 
- 1 fixed = 544 total (was 543) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
1s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 34s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 29s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
48s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}223m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.conf.TestCommonConfigurationFields |
|   | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.TestDecommissionWithStriped |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.server.namenode.ha.TestAddBlockTailing |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14941 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984743/HDFS-14941.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  

[jira] [Commented] (HDDS-2396) OM rocksdb core dump during writing

2019-11-03 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966372#comment-16966372
 ] 

Li Cheng commented on HDDS-2396:


[~bharat] My build is probably not up to date if HDDS-2379 was resolved within the last week. I 
can try again with the latest master. 

> OM rocksdb core dump during writing
> ---
>
> Key: HDDS-2396
> URL: https://issues.apache.org/jira/browse/HDDS-2396
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
>Reporter: Li Cheng
>Priority: Major
> Attachments: hs_err_pid9340.log
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a path 
> on VM0, reading data from VM0's local disk and writing to the mount path. The 
> dataset contains ~50,000 files of various sizes, from 0 bytes up to GB-level. 
>  
> RocksDB occasionally core dumps during writing. 
>  
> Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free 
> space=1018k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0
> C [librocksdbjni3192271038586903156.so+0x358fec] 
> rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, 
> rocksdb::Slice const&, rocksdb:
> :ValueType)+0x51c
> C [librocksdbjni3192271038586903156.so+0x359d17] 
> rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, 
> rocksdb::Slice const&)+0x17
> C [librocksdbjni3192271038586903156.so+0x3513bc] 
> rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c
> C [librocksdbjni3192271038586903156.so+0x354df9] 
> rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, 
> unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, 
> bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9
> C [librocksdbjni3192271038586903156.so+0x29fd79] 
> rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, 
> rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, 
> bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9
> C [librocksdbjni3192271038586903156.so+0x2a0431] 
> rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, 
> rocksdb::WriteBatch*)+0x21
> C [librocksdbjni3192271038586903156.so+0x1a064c] 
> Java_org_rocksdb_RocksDB_write0+0xcc
> J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe 
> [0x7f58f1872d00+0xbe]
> J 10093% C1 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V
>  (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc]
> j 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4
> j java.lang.Thread.run()V+11






[jira] [Commented] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966352#comment-16966352
 ] 

Lisheng Sun commented on HDFS-14942:


Thanks [~weichiu] for the good suggestions.

I updated the log message and uploaded the v003 patch.

> Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex
> 
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14942.001.patch, HDFS-14942.002.patch, 
> HDFS-14942.003.patch
>
>
> When a Hadoop 2.x cluster is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
> added, so the older JournalNodes throw "Unknown protocol". 
> The new InterQJournalProtocol is used to synchronize past log segments to 
> JNs that missed them, and this error does not affect normal 
> service. I think it should not be an ERROR log; a WARN log is more 
> reasonable.
> {code:java}
>  private void syncWithJournalAtIndex(int index) {
>   ...
> GetEditLogManifestResponseProto editLogManifest;
> try {
>   editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
>   nameServiceId, 0, false);
> } catch (IOException e) {
>   LOG.error("Could not sync with Journal at " +
>   otherJNProxies.get(journalNodeIndexForSync), e);
>   return;
> }
> {code}
> {code:java}
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
> at org.apache.hadoop.ipc.Client.call(Client.java:1511)
> at org.apache.hadoop.ipc.Client.call(Client.java:1421)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Updated] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14942:
---
Attachment: HDFS-14942.003.patch

> Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex
> 
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14942.001.patch, HDFS-14942.002.patch, 
> HDFS-14942.003.patch
>
>
> When a Hadoop 2.x cluster is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
> added, so the older JournalNodes throw "Unknown protocol". 
> The new InterQJournalProtocol is used to synchronize past log segments to 
> JNs that missed them, and this error does not affect normal 
> service. I think it should not be an ERROR log; a WARN log is more 
> reasonable.
> {code:java}
>  private void syncWithJournalAtIndex(int index) {
>   ...
> GetEditLogManifestResponseProto editLogManifest;
> try {
>   editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
>   nameServiceId, 0, false);
> } catch (IOException e) {
>   LOG.error("Could not sync with Journal at " +
>   otherJNProxies.get(journalNodeIndexForSync), e);
>   return;
> }
> {code}
> {code:java}
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
> at org.apache.hadoop.ipc.Client.call(Client.java:1511)
> at org.apache.hadoop.ipc.Client.call(Client.java:1421)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2019-11-03 Thread guojh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966312#comment-16966312
 ] 

guojh commented on HDFS-14768:
--

[~surendrasingh] I will add a patch for branch-3.1 and branch-3.2 sooner or later.

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch, 
> HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.011.patch, 
> HDFS-14768.jpg, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt
>
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission 
> indices [3,4] and increase the index-6 datanode's
> pendingReplicationWithoutTargets so that it becomes larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockManager, the liveBlockIndices are 
> [0,1,2,3,4,5,7,8] and the block counters are Live:7, Decommission:2. 
> In the method scheduleReconstruction of BlockManager, additionalReplRequired 
> is 9 - 7 = 2. After the Namenode chooses two target datanodes, it assigns an 
> erasure-coding task to the target datanode.
> When the datanode gets the task, it builds targetIndices from liveBlockIndices 
> and the target length; the code is below.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0]=6, and targetIndices[1] stays 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, which are 
> [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L 
> bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that reproduces this reliably.
> {code:java}
> // code placeholder
> private int replicationStreamsHardLimit = 
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>   .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   //
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
> BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new 
> Block(i),
> new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   

[jira] [Comment Edited] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2019-11-03 Thread guojh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966312#comment-16966312
 ] 

guojh edited comment on HDFS-14768 at 11/3/19 11:53 PM:


[~surendrasingh] I will add a patch for branch-3.1 and branch-3.2 later.


was (Author: gjhkael):
[~surendrasingh] I will add patch for branch3.1 and 3.2 soon or later.

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch, 
> HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.011.patch, 
> HDFS-14768.jpg, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt
>
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission 
> indices [3,4] and increase the index-6 datanode's
> pendingReplicationWithoutTargets so that it becomes larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockManager, the liveBlockIndices are 
> [0,1,2,3,4,5,7,8] and the block counters are Live:7, Decommission:2. 
> In the method scheduleReconstruction of BlockManager, additionalReplRequired 
> is 9 - 7 = 2. After the Namenode chooses two target datanodes, it assigns an 
> erasure-coding task to the target datanode.
> When the datanode gets the task, it builds targetIndices from liveBlockIndices 
> and the target length; the code is below.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0]=6, and targetIndices[1] stays 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, which are 
> [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L 
> bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that reproduces this reliably.
> {code:java}
> // code placeholder
> private int replicationStreamsHardLimit = 
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>   .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   //
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
> BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new 
> Block(i),
> new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be 

[jira] [Comment Edited] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-03 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966291#comment-16966291
 ] 

Konstantin Shvachko edited comment on HDFS-14941 at 11/3/19 11:08 PM:
--

Attached a patch, which fixes the problem and also provides a unit test to 
reproduce the race condition. It is based on [~vagarychen]'s original patch.
I also wanted to mention that {{updateBlockForPipeline()}} does not update the 
block's gen stamp on either Active or Standby. It is followed by 
{{updatePipeline()}}, which changes the gen stamp, so there is no race there.


was (Author: shv):
Attached a patch, which fixes the problem. Also provides a unit tests to 
reproduce the race condition. This is based on [~vagarychen]'s original patch.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch
>
>
> Recently we encountered an issue where, after a failover, the NameNode complains about 
> corrupted files/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on the SbN it is possible to receive block reports before the 
> corresponding edit tailing has happened, in which case the SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically, if the reported block has a future generation stamp, the DN report gets requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump the generation stamp on the 
> Standby,
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on the 
> Standby.
> The race condition is then as follows: imagine the Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different segments), and now a block report with the new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered a future block, so the guarding logic passes. But the block 
> actually hasn't been added to the blockmap, because the second edit is 
> yet to be tailed. So the block then gets added to the invalidated block list and 
> we see messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block until the next full block report, so after a 
> failover the NN marks it as corrupt.
> This issue won't happen if both of the edit entries get tailed 
> together, so that no IBR processing can happen in between. But in our case, we set the 
> edit tailing interval very low (to allow Standby reads), so under 
> high workload there is a much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-03 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-14941:
---
Status: Patch Available  (was: Open)

Attached a patch, which fixes the problem. It also provides a unit test to 
reproduce the race condition. This is based on [~vagarychen]'s original patch.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch
>
>
> Recently we encountered an issue where, after a failover, the NameNode 
> complains about corrupted files/missing blocks. The blocks did recover after 
> full block reports, so they are not actually missing. After further 
> investigation, we believe this is what happened:
> First of all, on the SbN it is possible to receive block reports before the 
> corresponding edit tailing has happened, in which case the SbN postpones 
> processing the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically, if the reported block has a future generation stamp, the DN report 
> gets requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump the generation stamp 
> on the Standby,
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> the Standby.
> Then the race condition is this: imagine the Standby has just processed 
> {{OP_SET_GENSTAMP_V2}} but not yet {{OP_ADD_BLOCK}} (if the two happen to land 
> in different segments), and now a block report with the new generation stamp 
> comes in.
> Since the genstamp bump has already been processed, the reported block may not 
> be considered a future block, so the guarding logic passes. But the block 
> actually hasn't been added to the block map, because the second edit is yet to 
> be tailed. The block then gets added to the invalidate list, and we see 
> messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block until the next full block report, so after a 
> failover the NN marks it as corrupt.
> This issue won't happen if both edit entries get tailed together, since no IBR 
> processing can happen in between. But in our case we set the edit tailing 
> interval very low (to allow Standby reads), so under high workload there is a 
> much higher chance that the two entries are tailed separately, causing the 
> issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-03 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-14941:
---
Attachment: HDFS-14941.001.patch

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch
>
>
> Recently we encountered an issue where, after a failover, the NameNode 
> complains about corrupted files/missing blocks. The blocks did recover after 
> full block reports, so they are not actually missing. After further 
> investigation, we believe this is what happened:
> First of all, on the SbN it is possible to receive block reports before the 
> corresponding edit tailing has happened, in which case the SbN postpones 
> processing the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically, if the reported block has a future generation stamp, the DN report 
> gets requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump the generation stamp 
> on the Standby,
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> the Standby.
> Then the race condition is this: imagine the Standby has just processed 
> {{OP_SET_GENSTAMP_V2}} but not yet {{OP_ADD_BLOCK}} (if the two happen to land 
> in different segments), and now a block report with the new generation stamp 
> comes in.
> Since the genstamp bump has already been processed, the reported block may not 
> be considered a future block, so the guarding logic passes. But the block 
> actually hasn't been added to the block map, because the second edit is yet to 
> be tailed. The block then gets added to the invalidate list, and we see 
> messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block until the next full block report, so after a 
> failover the NN marks it as corrupt.
> This issue won't happen if both edit entries get tailed together, since no IBR 
> processing can happen in between. But in our case we set the edit tailing 
> interval very low (to allow Standby reads), so under high workload there is a 
> much higher chance that the two entries are tailed separately, causing the 
> issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966197#comment-16966197
 ] 

Wei-Chiu Chuang commented on HDFS-14942:


Thanks [~leosun08]

bq. yes, this issue happens during rolling upgrade between Hadoop 2 and Hadoop 
3.
So, we didn't make the new protocol backward compatible. Fortunately it didn't 
stop the rolling upgrade.

Could you update the log message so that users know they can safely ignore it 
when it happens during a rolling upgrade?
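
For example, something along these lines (just a sketch of the level and 
wording, assuming the patch keeps the shape of the catch block quoted below; 
the exact text is of course up to you):
{code:java}
// Sketch only: same catch block as in the snippet quoted below, with the level
// downgraded and the message saying when it can be safely ignored.
} catch (IOException e) {
  LOG.warn("Could not sync with Journal at "
      + otherJNProxies.get(journalNodeIndexForSync)
      + ". If this occurs during a rolling upgrade from a release that does not"
      + " support InterQJournalProtocol, the message can be safely ignored.", e);
  return;
}
{code}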

> Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex
> 
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14942.001.patch, HDFS-14942.002.patch
>
>
> When Hadoop 2.x is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
> added, so an Unknown protocol error is thrown.
> The new InterQJournalProtocol is used to synchronize past log segments to JNs 
> that missed them, and an error here does not affect normal service. I think 
> this should not be an ERROR log; a WARN log is more reasonable.
> {code:java}
>  private void syncWithJournalAtIndex(int index) {
>   ...
> GetEditLogManifestResponseProto editLogManifest;
> try {
>   editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
>   nameServiceId, 0, false);
> } catch (IOException e) {
>   LOG.error("Could not sync with Journal at " +
>   otherJNProxies.get(journalNodeIndexForSync), e);
>   return;
> }
> {code}
> {code:java}
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
> at org.apache.hadoop.ipc.Client.call(Client.java:1511)
> at org.apache.hadoop.ipc.Client.call(Client.java:1421)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.

2019-11-03 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965699#comment-16965699
 ] 

Surendra Singh Lilhore commented on HDFS-14720:
---

No, this is not related to EC. The scenario is:
1. Create a file with replication factor 3.
2. Change the replication factor to 10.
3. Delete the file after replication has been scheduled. The file delete 
operation changes the file block length to Long.MAX_VALUE.
4. The DN rejects the replication task because the NN block length doesn't 
match the DN block length, and the DN reports the block as a bad block.
Please check HDFS-10453 for more detail; the scenario is the same.
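
For illustration, a minimal sketch of the kind of guard meant here (my 
assumption only, not necessarily how the attached patch does it): in the DN 
replication/transfer path, drop the stale command instead of reporting a bad 
block when the NameNode-recorded length is Long.MAX_VALUE.
{code:java}
// Sketch only; "block" stands for the block received with the replication
// command. A NameNode-recorded length of Long.MAX_VALUE means the file was
// already deleted, so the command is stale and should simply be dropped.
if (block.getNumBytes() == Long.MAX_VALUE) {
  LOG.info("Ignoring replication command for " + block
      + " because the NameNode recorded length is Long.MAX_VALUE,"
      + " i.e. the file was already deleted.");
  return;
}
{code}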

> DataNode shouldn't report block as bad block if the block length is 
> Long.MAX_VALUE.
> ---
>
> Key: HDFS-14720
> URL: https://issues.apache.org/jira/browse/HDFS-14720
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14720.001.patch
>
>
> {noformat}
> 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Can't replicate block 
> BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because 
> on-disk length 175085 is shorter than NameNode recorded length 
> 9223372036854775807.{noformat}
> If the block length is Long.MAX_VALUE, it means the file this block belongs 
> to has been deleted from the namenode and the DN got the command after the 
> file was deleted. In this case the command should be ignored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1515) Create ozone dev-support script to check hadolint violations

2019-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1515?focusedWorklogId=337815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337815
 ]

ASF GitHub Bot logged work on HDDS-1515:


Author: ASF GitHub Bot
Created on: 03/Nov/19 10:58
Start Date: 03/Nov/19 10:58
Worklog Time Spent: 10m 
  Work Description: akki commented on pull request #114: HDDS-1515. Add 
hadolint checks
URL: https://github.com/apache/hadoop-ozone/pull/114
 
 
   ## What changes were proposed in this pull request?
   
   Add Hadolint checks
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-1515
   
   ## How was this patch tested?
   
   Manually tested. Gives the following output currently
   ```
   Checking 
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:15
 DL3006 Always tag the version of an image explicitly
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:16
 DL3018 Pin versions in apk add. Instead of `apk add <package>` use `apk add <package>=<version>`
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:19
 DL3018 Pin versions in apk add. Instead of `apk add <package>` use `apk add <package>=<version>`
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:19
 DL3019 Use the `--no-cache` switch to avoid the need to use `--update` and 
remove `/var/cache/apk/*` when done installing packages
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:25
 DL3003 Use WORKDIR to switch to a directory
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:37
 DL4001 Either use Wget or Curl but not both
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:37
 DL4006 Set the SHELL option -o pipefail before RUN with a pipe in it
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:42
 DL4001 Either use Wget or Curl but not both
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:42
 DL4006 Set the SHELL option -o pipefail before RUN with a pipe in it
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:47
 DL3013 Pin versions in pip. Instead of `pip install <package>` use `pip install <package>==<version>`
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:47
 DL4001 Either use Wget or Curl but not both
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:50
 DL4001 Either use Wget or Curl but not both
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:54
 SC2086 Double quote to prevent globbing and word splitting.
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:54
 DL4001 Either use Wget or Curl but not both
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:63
 DL3003 Use WORKDIR to switch to a directory
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dev-support/docker/Dockerfile:63
 DL4001 Either use Wget or Curl but not both
   Checking 
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dist/src/main/compose/ozonescripts/Dockerfile
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dist/src/main/compose/ozonescripts/Dockerfile:16
 DL3006 Always tag the version of an image explicitly
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dist/src/main/compose/ozonescripts/Dockerfile:17
 DL3004 Do not use sudo as it leads to unpredictable behavior. Use a tool like 
gosu to enforce root
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dist/src/main/compose/ozonescripts/Dockerfile:19
 DL3004 Do not use sudo as it leads to unpredictable behavior. Use a tool like 
gosu to enforce root
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dist/src/main/compose/ozonescripts/Dockerfile:20
 DL3004 Do not use sudo as it leads to unpredictable behavior. Use a tool like 
gosu to enforce root
   
/opt/hadoop-ozone/hadoop-ozone/dev-support/checks/../../../hadoop-ozone/dist/src/main/compose/ozonescripts/Dockerfile:21
 DL3004 Do not use sudo as it leads to unpredictable behavior. Use a tool like 
gosu to enforce root
   

[jira] [Updated] (HDDS-1515) Create ozone dev-support script to check hadolint violations

2019-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-1515:
-
Labels: newbie pull-request-available  (was: newbie)

> Create ozone dev-support script to check hadolint violations
> -
>
> Key: HDDS-1515
> URL: https://issues.apache.org/jira/browse/HDDS-1515
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Priority: Major
>  Labels: newbie, pull-request-available
>
> The hadoop-ozone/dev-support/checks/ directory contains helper scripts to 
> execute different code quality checks locally.
> They are different from Yetus in that they can be executed easily and they 
> check _ALL_ the violations of the current code base.
> We need to create a new script to check the 
> [hadolint|https://github.com/hadolint/hadolint] errors in the hadoop-ozone 
> and hadoop-hdds projects.
> The contract of the check scripts:
> # Exit code should define the result (0: passed, non-zero: failed)
> # Violations should be printed to stdout
> We can assume that hadolint is part of the development environment. For 
> Jenkins we can put it into the image used for the dev builds.
> As the check introduces zero tolerance for hadolint violations, the biggest 
> issue here is eliminating all of the existing violations.
> Thanks to [~eyang] for reporting that it's still missing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965622#comment-16965622
 ] 

Hadoop QA commented on HDFS-14942:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}120m 15s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}184m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
|   | hadoop.hdfs.server.namenode.TestAddStripedBlockInFBR |
|   | hadoop.hdfs.TestDecommissionWithStriped |
|   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
|   | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14942 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984700/HDFS-14942.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 157ceda3ddc2 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d462308 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28233/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Commented] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965630#comment-16965630
 ] 

Hadoop QA commented on HDFS-14938:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 41s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}150m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14938 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984702/HDFS-14938.007.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 21c878405413 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d462308 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28235/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28235/testReport/ |
| Max. process+thread count | 3930 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28235/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add check if excludedNodes contain scope in 
> 

[jira] [Commented] (HDFS-14946) Erasure Coding: Block recovery failed during decommissioning

2019-11-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965615#comment-16965615
 ] 

Ayush Saxena commented on HDFS-14946:
-

Thanx for the update.
* I think rather than having the logic in {{ErasureCodingWork(..)}} and then 
setting the src nodes, we should pull the logic up into 
{{scheduleReconstruction()}} in {{BlockManager.java}}. This would save the 
effort of resetting the srcNodes again and the need for an extra method to 
explicitly set srcNodes.
* You may want to refactor the logic into a single private method and add a 
javadoc explaining the calculation logic, rather than putting it all inline in 
the main flow.
* Please check whether any modifications are required in the busyIndices 
calculation done as part of the recent HDFS-14768.
[~gjhkael] It would be good if you could help double-check.

> Erasure Coding: Block recovery failed during decommissioning
> 
>
> Key: HDFS-14946
> URL: https://issues.apache.org/jira/browse/HDFS-14946
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14946.001.patch, HDFS-14946.002.patch
>
>
> DataNode logs as follow
> {quote}
> org.apache.hadoop.HadoopIllegalArgumentException: No enough valid inputs are 
> provided, not recoverable
>   at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119)
>   at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.(ByteBufferDecodingState.java:47)
>   at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>   at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstructTargets(StripedBlockReconstructor.java:126)
>   at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:97)
>   at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> {quote}
> Block recovery always fails because the srcNodes are in the wrong order.
> Reproduction steps:
> # ec block (b0, b1, b2, b3, b4, b5, b6, b7, b8), b[0-8] are on dn[0-8], and 
> dn[0-3] are decommissioning
> # dn[1-3] are decommissioned and dn0 is still decommissioning; the ec block is 
> [b0(decommissioning), b[1-3](decommissioned), b[4-8](live), b[0-3](live)]
> # dn4 crashes and b4 needs to be recovered; the ec block is 
> [b0(decommissioning), b[1-3](decommissioned), null, b[5-8](live), b[0-3](live)]
> We can see the error log above, and b4 is not recovered successfully, because 
> the srcNodes transferred to the recovery datanode contain blocks [b0, b[5-8], 
> b[0-3]], and the datanode uses [b0, b[5-8], b0] (minRequiredSources readers to 
> reconstruct, where minRequiredSources = Math.min(cellsNum, dataBlkNum)) to 
> recover the missing block.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965627#comment-16965627
 ] 

Hadoop QA commented on HDFS-14938:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
58s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}119m 56s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}186m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
|   | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14938 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984701/HDFS-14938.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e226f9efc425 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d462308 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28234/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28234/testReport/ |
| Max. process+thread count | 2642 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965605#comment-16965605
 ] 

Lisheng Sun commented on HDFS-14942:


[~ayushtkn]

Yes, it only happens during a rolling upgrade.

The problem is mainly that the new interface does not exist in the old version.

The new InterQJournalProtocol is used, in a separate thread, to synchronize 
past log segments to JNs that missed them, and if it fails it does not affect 
normal service.

> Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex
> 
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14942.001.patch, HDFS-14942.002.patch
>
>
> When Hadoop 2.x is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
> added, so an Unknown protocol error is thrown.
> The new InterQJournalProtocol is used to synchronize past log segments to JNs 
> that missed them, and an error here does not affect normal service. I think 
> this should not be an ERROR log; a WARN log is more reasonable.
> {code:java}
>  private void syncWithJournalAtIndex(int index) {
>   ...
> GetEditLogManifestResponseProto editLogManifest;
> try {
>   editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
>   nameServiceId, 0, false);
> } catch (IOException e) {
>   LOG.error("Could not sync with Journal at " +
>   otherJNProxies.get(journalNodeIndexForSync), e);
>   return;
> }
> {code}
> {code:java}
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
> at org.apache.hadoop.ipc.Client.call(Client.java:1511)
> at org.apache.hadoop.ipc.Client.call(Client.java:1421)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-11-03 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965604#comment-16965604
 ] 

Lisheng Sun commented on HDFS-14938:


Thanks [~ayushtkn] for checking carefully.

I have updated the patch and uploaded v007.

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch, 
> HDFS-14938.003.patch, HDFS-14938.004.patch, HDFS-14938.005.patch, 
> HDFS-14938.006.patch, HDFS-14938.007.patch
>
>
> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-11-03 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14938:
---
Attachment: HDFS-14938.007.patch

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch, 
> HDFS-14938.003.patch, HDFS-14938.004.patch, HDFS-14938.005.patch, 
> HDFS-14938.006.patch, HDFS-14938.007.patch
>
>
> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-11-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965601#comment-16965601
 ] 

Ayush Saxena commented on HDFS-14938:
-


{code:java}
+   * Tests it should getting no node, if if a node from scope is
{code}
Double {{if}}

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch, 
> HDFS-14938.003.patch, HDFS-14938.004.patch, HDFS-14938.005.patch, 
> HDFS-14938.006.patch
>
>
> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-11-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965599#comment-16965599
 ] 

Ayush Saxena commented on HDFS-14942:
-

Does this happen only during a rolling upgrade? Issues with the JN are usually 
fatal. Please double-check that we don't end up hiding it in some critical 
situation too.

> Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex
> 
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14942.001.patch, HDFS-14942.002.patch
>
>
> When Hadoop 2.x is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
> added, so an Unknown protocol error is thrown.
> The new InterQJournalProtocol is used to synchronize past log segments to JNs 
> that missed them, and an error here does not affect normal service. I think 
> this should not be an ERROR log; a WARN log is more reasonable.
> {code:java}
>  private void syncWithJournalAtIndex(int index) {
>   ...
> GetEditLogManifestResponseProto editLogManifest;
> try {
>   editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
>   nameServiceId, 0, false);
> } catch (IOException e) {
>   LOG.error("Could not sync with Journal at " +
>   otherJNProxies.get(journalNodeIndexForSync), e);
>   return;
> }
> {code}
> {code:java}
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
> at org.apache.hadoop.ipc.Client.call(Client.java:1511)
> at org.apache.hadoop.ipc.Client.call(Client.java:1421)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
> at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-11-03 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965589#comment-16965589
 ] 

Lisheng Sun commented on HDFS-14938:


Sorry [~ayushtkn] [~elgoiri]

The v005 patch missed the javadoc for the UT.

The v006 patch includes the javadoc, and I have uploaded it.

Could you help review the v006 patch? Thank you.

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch, 
> HDFS-14938.003.patch, HDFS-14938.004.patch, HDFS-14938.005.patch, 
> HDFS-14938.006.patch
>
>
> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-11-03 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14938:
---
Attachment: HDFS-14938.006.patch

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch, 
> HDFS-14938.003.patch, HDFS-14938.004.patch, HDFS-14938.005.patch, 
> HDFS-14938.006.patch
>
>
> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org