[jira] [Commented] (HDFS-12615) Router-based HDFS federation phase 2

2018-07-01 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529366#comment-16529366
 ] 

Yiqun Lin commented on HDFS-12615:
--

Hi all, it looks like RBF work has been blocked for around a month or more. 
This also blocks some small bug fixes reported by our users. To be more 
concrete: since the RBF global quota feature was released in Hadoop 3.1.0, we 
have received feedback from users, and these issues are currently tracked 
under HDFS-13553. Checking in bug fixes for an existing feature is a safe 
change, and users can simply disable the feature if they think it is not 
stable.
 Can we go ahead with normal bug fixing for RBF while the implementation of 
new features is still being discussed? Looking forward to reaching an 
agreement. :)

Thanks

> Router-based HDFS federation phase 2
> 
>
> Key: HDFS-12615
> URL: https://issues.apache.org/jira/browse/HDFS-12615
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: RBF
>
> This umbrella JIRA tracks a set of improvements to the Router-based HDFS 
> federation (HDFS-10467).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13710) RBF: setQuota and getQuotaUsage should check the dfs.federation.router.quota.enable

2018-07-01 Thread Yiqun Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13710:
-
Issue Type: Sub-task  (was: Bug)
Parent: HDFS-13553

> RBF:  setQuota and getQuotaUsage should check the 
> dfs.federation.router.quota.enable
> 
>
> Key: HDFS-13710
> URL: https://issues.apache.org/jira/browse/HDFS-13710
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, hdfs
>Affects Versions: 2.9.1, 3.0.3
>Reporter: yanghuafeng
>Priority: Major
> Attachments: HDFS-13710.patch
>
>
> When I run the command below, an exception occurs.
>  
> {code:java}
> hdfs dfsrouteradmin -setQuota /tmp -ssQuota 1G 
> {code}
>  The command output is as follows.
> {code:java}
> Successfully set quota for mount point /tmp
> {code}
> It looks like the quota was set successfully, but an exception appears in 
> the RBF server log.
> {code:java}
> java.io.IOException: No remote locations available
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1002)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:967)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:940)
>     at org.apache.hadoop.hdfs.server.federation.router.Quota.setQuota(Quota.java:84)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.synchronizeQuota(RouterAdminServer.java:255)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.updateMountTableEntry(RouterAdminServer.java:238)
>     at org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolServerSideTranslatorPB.updateMountTableEntry(RouterAdminProtocolServerSideTranslatorPB.java:179)
>     at org.apache.hadoop.hdfs.protocol.proto.RouterProtocolProtos$RouterAdminProtocolService$2.callBlockingMethod(RouterProtocolProtos.java:259)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2111)
> {code}
> I found that dfs.federation.router.quota.enable is false by default, which 
> causes the problem. I think we should check this parameter when setQuota 
> and getQuotaUsage are called. 
>  
>  






[jira] [Commented] (HDFS-13710) RBF: setQuota and getQuotaUsage should check the dfs.federation.router.quota.enable

2018-07-01 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529348#comment-16529348
 ] 

Yiqun Lin commented on HDFS-13710:
--

Thanks [~hfyang20071] for reporting this. Some comments:

* Please update {{//check quota enable}} to {{// check if quota is enabled in 
Router}}
* I would prefer to reuse {{Router#isQuotaEnabled}} for this check and throw 
an IOException when quota isn't enabled. {{checkOperation(OperationCategory op, 
boolean supported)}} indicates that the method isn't implemented, which is 
not accurate here.
* Please add a unit test for this change.

In addition, when attaching further patches, please name them like 
HDFS-13710.001.patch, HDFS-13710.002.patch, so that we can distinguish them.
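
As a rough illustration of the second point, the check could look something like the sketch below. This is not the patch itself: {{RouterStub}} and {{QuotaGuardSketch}} are hypothetical stand-ins, the exception message is made up, and only the idea of consulting an {{isQuotaEnabled}} accessor and throwing an IOException comes from the comment above.

```java
import java.io.IOException;

/** Hypothetical stand-in for the Router; only the quota flag is modeled. */
class RouterStub {
  private final boolean quotaEnabled;

  RouterStub(boolean quotaEnabled) {
    this.quotaEnabled = quotaEnabled;
  }

  // Plays the role of the Router#isQuotaEnabled accessor named in the review.
  boolean isQuotaEnabled() {
    return quotaEnabled;
  }
}

/** Hypothetical stand-in for the quota entry points (not the real Quota class). */
class QuotaGuardSketch {
  private final RouterStub router;

  QuotaGuardSketch(RouterStub router) {
    this.router = router;
  }

  // Fail fast when the quota feature is switched off in the Router.
  private void checkQuotaEnabled(String op) throws IOException {
    if (!router.isQuotaEnabled()) {
      throw new IOException(
          "The quota system is disabled in Router; cannot run " + op);
    }
  }

  void setQuota(String path, long nsQuota, long ssQuota) throws IOException {
    checkQuotaEnabled("setQuota");
    // A real implementation would forward the quota to the namespaces here.
  }

  public static void main(String[] args) {
    QuotaGuardSketch disabled = new QuotaGuardSketch(new RouterStub(false));
    try {
      disabled.setQuota("/tmp", -1L, 1L << 30);
      System.out.println("quota set");
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With a guard like this, the admin command would surface an error to the caller instead of reporting success while the server logs an exception.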

> RBF:  setQuota and getQuotaUsage should check the 
> dfs.federation.router.quota.enable
> 
>
> Key: HDFS-13710
> URL: https://issues.apache.org/jira/browse/HDFS-13710
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation, hdfs
>Affects Versions: 2.9.1, 3.0.3
>Reporter: yanghuafeng
>Priority: Major
> Attachments: HDFS-13710.patch



[jira] [Commented] (HDFS-13710) RBF: setQuota and getQuotaUsage should check the dfs.federation.router.quota.enable

2018-07-01 Thread Fei Hui (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529332#comment-16529332
 ] 

Fei Hui commented on HDFS-13710:


Good catch! [~elgoiri] [~linyiqun] Could you give any suggestions?

> RBF:  setQuota and getQuotaUsage should check the 
> dfs.federation.router.quota.enable
> 
>
> Key: HDFS-13710
> URL: https://issues.apache.org/jira/browse/HDFS-13710
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation, hdfs
>Affects Versions: 2.9.1, 3.0.3
>Reporter: yanghuafeng
>Priority: Major
> Attachments: HDFS-13710.patch



[jira] [Commented] (HDFS-13688) Introduce msync API call

2018-07-01 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529284#comment-16529284
 ] 

Chen Liang commented on HDFS-13688:
---

There have been some offline discussions on msync; sharing some notes here for 
reference.
 1. It is preferred not to have AlignmentContext in DFSClient. The reason 
AlignmentContext is in DFSClient in the current WIP patch is that a fresh 
client with no initial state ID needs to check AlignmentContext to see whether 
a state ID is set; if not, DFSClient makes an RPC call to the ANN. Since 
DFSClient has to make this check, it needs to see the AlignmentContext 
instance. If there is an alternative way in which DFSClient does not need to 
make this check explicitly, there is no need to have AlignmentContext in 
DFSClient. We still need to investigate whether such an alternative exists.
 2. It is preferred not to check the method name for msync.
 3. We need to make sure the delegation token gets propagated to the Observer 
before the Observer node reacts to an msync call.
 4. As mentioned in a TODO in the WIP patch, the logic to trigger the client to 
make an msync call when an Observer node failover happens is still missing. In 
the context of the current WIP patch, this can be done by resetting the state 
ID of the AlignmentContext instance when switching observers in ProxyProvider.

These are based on discussions with [~shv], [~csun], [~zero45] and [~jnp]; 
thanks for the feedback!

> Introduce msync API call
> 
>
> Key: HDFS-13688
> URL: https://issues.apache.org/jira/browse/HDFS-13688
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13688-HDFS-12943.WIP.002.patch, 
> HDFS-13688-HDFS-12943.WIP.patch
>
>
> As mentioned in the design doc in HDFS-12943, to ensure consistent reads, we 
> need to introduce an RPC call {{msync}}. Specifically, a client can issue an 
> msync call to the Observer node along with a transactionID. The msync will 
> only return when the Observer's transactionID has caught up to the given ID. 
> This JIRA is to add this API.
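
The blocking behaviour described for msync can be sketched roughly as below. This is a minimal in-process model, not the WIP patch: {{ObserverStateSketch}}, {{applyUpTo}} and the timeout parameter are assumptions made for illustration, and the real call would be an RPC handled on the Observer.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of an Observer tracking its last applied transaction ID. */
class ObserverStateSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition advanced = lock.newCondition();
  private long lastAppliedTxId;

  // Called as the Observer applies edits; wakes any waiting msync callers.
  void applyUpTo(long txId) {
    lock.lock();
    try {
      if (txId > lastAppliedTxId) {
        lastAppliedTxId = txId;
        advanced.signalAll();
      }
    } finally {
      lock.unlock();
    }
  }

  // Returns true once lastAppliedTxId >= clientTxId, or false on timeout.
  // The timeout is an addition of this sketch, not part of the description.
  boolean msync(long clientTxId, long timeoutMs) throws InterruptedException {
    long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    lock.lock();
    try {
      while (lastAppliedTxId < clientTxId) {
        if (remaining <= 0L) {
          return false;
        }
        remaining = advanced.awaitNanos(remaining);
      }
      return true;
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) throws Exception {
    ObserverStateSketch observer = new ObserverStateSketch();
    Thread tailer = new Thread(() -> observer.applyUpTo(42L));
    tailer.start();
    System.out.println("caught up: " + observer.msync(42L, 5000L));
    tailer.join();
  }
}
```

The loop around {{awaitNanos}} guards against spurious wakeups, so the call only returns true when the Observer has actually reached the requested ID.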






[jira] [Commented] (HDFS-13381) [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path

2018-07-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529219#comment-16529219
 ] 

genericqa commented on HDFS-13381:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-10285 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
38s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} HDFS-10285 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}121m 39s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}176m 21s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNestedEncryptionZones |
|   | hadoop.hdfs.server.blockmanagement.TestPendingReconstruction |
|   | hadoop.hdfs.server.namenode.TestTruncateQuotaUpdate |
|   | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
|   | hadoop.hdfs.TestAclsEndToEnd |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery |
|   | hadoop.hdfs.server.namenode.TestReencryption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13381 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929876/HDFS-13381-HDFS-10285-03.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 472a42c11641 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh 

[jira] [Commented] (HDFS-13381) [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path

2018-07-01 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529178#comment-16529178
 ] 

Rakesh R commented on HDFS-13381:
-

Attached another patch fixing the SPS-related test failures and the checkstyle 
warning.

> [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path
> 
>
> Key: HDFS-13381
> URL: https://issues.apache.org/jira/browse/HDFS-13381
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13381-HDFS-10285-00.patch, 
> HDFS-13381-HDFS-10285-01.patch, HDFS-13381-HDFS-10285-02.patch, 
> HDFS-13381-HDFS-10285-03.patch
>
>
> This Jira task will address the following comments:
>  # Use DFSUtilClient::makePathFromFileId instead of generics (one for the 
> string path and another for the inodeId) as is done today.
>  # Only the context implementation differs between external and internal 
> SPS, so FileCollector and BlockMoveTaskHandler can simply be moved to the 
> Context interface.






[jira] [Updated] (HDFS-13381) [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path

2018-07-01 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13381:

Attachment: HDFS-13381-HDFS-10285-03.patch

> [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path
> 
>
> Key: HDFS-13381
> URL: https://issues.apache.org/jira/browse/HDFS-13381
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13381-HDFS-10285-00.patch, 
> HDFS-13381-HDFS-10285-01.patch, HDFS-13381-HDFS-10285-02.patch, 
> HDFS-13381-HDFS-10285-03.patch



[jira] [Commented] (HDDS-199) Implement ReplicationManager to replicate ClosedContainers

2018-07-01 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529126#comment-16529126
 ] 

Nanda kumar commented on HDDS-199:
--

Thanks [~elek] for working on this feature.
There are some compilation issues with this patch.
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hadoop-hdds-server-scm: Compilation failure: Compilation failure:
[ERROR] 
/Users/nvadivelu/codebase/apache/hadoop/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicateCommandWatcher.java:[18,8]
 org.apache.hadoop.hdds.scm.container.ReplicateCommandWatcher is not abstract 
and does not override abstract method 
onFinished(org.apache.hadoop.hdds.server.events.EventPublisher,org.apache.hadoop.hdds.scm.container.ReplicationManager.ReplicationRequestToRepeat)
 in org.apache.hadoop.hdds.server.events.EventWatcher
[ERROR] 
/Users/nvadivelu/codebase/apache/hadoop/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicateCommandWatcher.java:[33,3]
 method does not override or implement a method from a supertype
[ERROR] 
/Users/nvadivelu/codebase/apache/hadoop/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicateCommandWatcher.java:[41,3]
 method does not override or implement a method from a supertype
{noformat}
I have just started reviewing the patch; here is some initial feedback and a 
few suggestions.

LeaseManager: long sleepTime = 100L; seems very low.
>> when activeLeases empty, it starts to sleep until the end of the world
We have a {{leaseMonitorThread.interrupt()}} call inside 
{{LeaseManager#acquire}} that interrupts leaseMonitorThread when we create new 
leases, so this shouldn't be a problem.

ContainerPlacementPolicy: Rename suggestion: {{List 
existingReplicas}} to {{List excludedNodes}}. ContainerPlacementPolicy doesn't 
have to worry about replicas and their count; we can just tell it which nodes 
are to be excluded in the process.
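
To illustrate the rename suggestion, a placement method that only knows about excluded nodes might look roughly like this. It is a sketch under assumptions: {{PlacementSketch}} and {{chooseDatanodes}} are hypothetical names, and plain strings stand in for whatever datanode type the real interface uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical placement helper; strings stand in for datanode objects. */
class PlacementSketch {

  // Picks up to nodesRequired nodes, in order, skipping every excluded node.
  // The policy does not need to know why a node is excluded.
  static List<String> chooseDatanodes(List<String> healthyNodes,
      List<String> excludedNodes, int nodesRequired) {
    List<String> chosen = new ArrayList<>();
    for (String node : healthyNodes) {
      if (chosen.size() == nodesRequired) {
        break;
      }
      if (!excludedNodes.contains(node)) {
        chosen.add(node);
      }
    }
    return chosen;
  }

  public static void main(String[] args) {
    List<String> healthy = Arrays.asList("dn1", "dn2", "dn3", "dn4");
    // dn2 already holds a replica, so the caller passes it as excluded.
    List<String> excluded = Arrays.asList("dn2");
    System.out.println(chooseDatanodes(healthy, excluded, 2));
  }
}
```

The caller decides what goes into the excluded list (for example, nodes already holding a replica), which keeps replica accounting out of the policy.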

> Implement ReplicationManager to replicate ClosedContainers
> --
>
> Key: HDDS-199
> URL: https://issues.apache.org/jira/browse/HDDS-199
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-199.001.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific conditions 
> (the container is full, a node has failed) the container will be closed 
> and will be replicated in a different way. The replication of Open containers 
> is handled with Ratis and PipelineManger.
> The ReplicationManager should handle the replication of the ClosedContainers. 
> The replication information will be sent as an event 
> (UnderReplicated/OverReplicated). 
> The ReplicationManager will collect all of the events in a priority queue 
> (to replicate first the containers with the most missing replicas), calculate 
> the destination datanode (first with a very simple algorithm, later by 
> calculating scatter-width) and send the Copy/Delete container command to the 
> datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the 
> copy/delete in case of failure. This is an in-memory structure (based on 
> HDDS-195) which can requeue the underreplicated/overreplicated events to the 
> priority queue until the confirmation of the copy/delete command has arrived.
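
The priority-queue ordering described above (containers missing more replicas go first) could be sketched as follows. The class and field names are hypothetical and are not taken from the patch; this only shows one way to express the ordering.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical request type; the field names are illustrative only. */
class ReplicationRequestSketch {
  final long containerId;
  final int expectedReplicas;
  final int actualReplicas;

  ReplicationRequestSketch(long containerId, int expectedReplicas,
      int actualReplicas) {
    this.containerId = containerId;
    this.expectedReplicas = expectedReplicas;
    this.actualReplicas = actualReplicas;
  }

  int missingReplicas() {
    return expectedReplicas - actualReplicas;
  }
}

/** Hypothetical queue wrapper ordering requests by missing replica count. */
class ReplicationQueueSketch {
  // Larger missingReplicas() sorts first, so poll() returns the neediest one.
  private static final Comparator<ReplicationRequestSketch> MOST_MISSING_FIRST =
      (a, b) -> Integer.compare(b.missingReplicas(), a.missingReplicas());

  private final PriorityQueue<ReplicationRequestSketch> queue =
      new PriorityQueue<>(MOST_MISSING_FIRST);

  void add(ReplicationRequestSketch request) {
    queue.add(request);
  }

  ReplicationRequestSketch poll() {
    return queue.poll();
  }

  public static void main(String[] args) {
    ReplicationQueueSketch q = new ReplicationQueueSketch();
    q.add(new ReplicationRequestSketch(1L, 3, 2));
    q.add(new ReplicationRequestSketch(2L, 3, 0));
    q.add(new ReplicationRequestSketch(3L, 3, 1));
    ReplicationRequestSketch next;
    while ((next = q.poll()) != null) {
      System.out.println("container " + next.containerId
          + " missing " + next.missingReplicas());
    }
  }
}
```

A watcher that times out could requeue a request simply by calling {{add}} again with the same container.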






[jira] [Commented] (HDFS-13381) [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path

2018-07-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529115#comment-16529115
 ] 

genericqa commented on HDFS-13381:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-10285 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
53s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} HDFS-10285 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 366 unchanged - 0 fixed = 367 total (was 366) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}136m  7s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}211m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfierWithStripedFile |
|   | hadoop.hdfs.server.namenode.TestNestedEncryptionZones |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.server.namenode.TestTruncateQuotaUpdate |
|   | hadoop.hdfs.TestAclsEndToEnd |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.server.namenode.TestReencryption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13381 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929857/HDFS-13381-HDFS-10285-02.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 9dd85d90069c 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 

[jira] [Commented] (HDDS-206) default port number taken by ksm is 9862 while listing the volumes

2018-07-01 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529063#comment-16529063
 ] 

Nanda kumar commented on HDDS-206:
--

Thanks [~shashikant] for updating the patch. +1, LGTM, pending Jenkins.

> default port number taken by ksm is 9862 while listing the volumes
> --
>
> Key: HDDS-206
> URL: https://issues.apache.org/jira/browse/HDDS-206
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-206.00.patch, HDDS-206.01.patch
>
>
> Here is the output of the ozone -listVolume command when no port is 
> specified. By default, it chooses port 9862, which is not mentioned in 
> ozone-site.xml.
> {noformat}
> [root@ozone-vm bin]# ./ozone oz -listVolume o3://127.0.0.1/
> 2018-06-29 04:42:20,652 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 2018-06-29 04:42:21,914 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:22,915 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 1 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:23,917 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 2 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:24,925 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 3 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:25,928 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 4 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:26,931 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 5 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:27,932 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 6 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:28,934 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 7 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:29,935 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:30,938 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:31,075 [main] ERROR - Couldn't create protocol class 
> org.apache.hadoop.ozone.client.rpc.RpcClient exception:
> java.lang.reflect.InvocationTargetException
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:292)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:172)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:156)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:111)
>  at org.apache.hadoop.ozone.web.ozShell.Handler.verifyURI(Handler.java:96)
>  at 
> org.apache.hadoop.ozone.web.ozShell.volume.ListVolumeHandler.execute(ListVolumeHandler.java:80)
>  at org.apache.hadoop.ozone.web.ozShell.Shell.dispatch(Shell.java:395)
>  at org.apache.hadoop.ozone.web.ozShell.Shell.run(Shell.java:135)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  at 

[jira] [Commented] (HDFS-13381) [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path

2018-07-01 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529043#comment-16529043
 ] 

Rakesh R commented on HDFS-13381:
-

Attaching another patch in line with the external SPS merge discussion. This patch 
uses the {{Path filePath = DFSUtilClient.makePathFromFileId(inodeID);}} 
function and replaces the generics that were used to differentiate String/int file paths.
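
To illustrate the idea behind the comment above, here is a minimal stand-alone sketch of addressing a file by inode ID through the reserved namespace, so that callers need only one path type. It does not use Hadoop; the class and method names ({{InodePathSketch}}, {{pathForInode}}, {{inodeFromPath}}) are hypothetical, and the {{/.reserved/.inodes/<inodeId>}} layout is stated as an assumption about what {{DFSUtilClient.makePathFromFileId}} produces.

```java
// Hypothetical stand-alone sketch; it does not use or replace the Hadoop class.
final class InodePathSketch {
    // Assumed layout of the reserved inode namespace: /.reserved/.inodes/<inodeId>
    private static final String RESERVED_INODES_PREFIX = "/.reserved/.inodes/";

    private InodePathSketch() {
    }

    /** Builds the reserved path that addresses a file by its inode ID. */
    static String pathForInode(long inodeId) {
        if (inodeId < 0) {
            throw new IllegalArgumentException("inode ID must be non-negative: " + inodeId);
        }
        return RESERVED_INODES_PREFIX + inodeId;
    }

    /** Recovers the inode ID from a path built by pathForInode. */
    static long inodeFromPath(String path) {
        if (!path.startsWith(RESERVED_INODES_PREFIX)) {
            throw new IllegalArgumentException("not a reserved inode path: " + path);
        }
        return Long.parseLong(path.substring(RESERVED_INODES_PREFIX.length()));
    }
}
```

With a single path representation, code that previously needed one generic parameter for string paths and another for inode IDs can work on paths alone.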

> [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path
> 
>
> Key: HDFS-13381
> URL: https://issues.apache.org/jira/browse/HDFS-13381
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13381-HDFS-10285-00.patch, 
> HDFS-13381-HDFS-10285-01.patch, HDFS-13381-HDFS-10285-02.patch
>
>
> This Jira task will address the following comments:
>  # Use DFSUtilClient::makePathFromFileId instead of generics (one for string 
> path and another for inodeId), as is done today.
>  # Only the context impl differs for external/internal SPS, so FileCollector 
> and BlockMoveTaskHandler can simply be moved to the Context interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13381) [SPS]: Use DFSUtilClient#makePathFromFileId() to prepare satisfier file path

2018-07-01 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13381:

Attachment: HDFS-13381-HDFS-10285-02.patch







[jira] [Commented] (HDFS-13709) Report bad block to NN when transfer block encounter EIO exception

2018-07-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529018#comment-16529018
 ] 

genericqa commented on HDFS-13709:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 4s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 54s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 31s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}158m 13s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13709 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929708/HDFS-13709.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 30f58f52840e 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cdb0844 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24528/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24528/testReport/ |
| asflicense | https://builds.apache.org/job/PreCommit-HDFS-Build/24528/artifact/out/patch-asflicense-problems.txt |
| Max. process+thread count | 3076 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Updated] (HDFS-10285) Storage Policy Satisfier in HDFS

2018-07-01 Thread Uma Maheswara Rao G (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-10285:
---
Summary: Storage Policy Satisfier in HDFS  (was: Storage Policy Satisfier 
in Namenode)

> Storage Policy Satisfier in HDFS
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These 
> policies can be set on a directory or file to specify the user's preference for 
> where the physical blocks should be stored. When the user sets the storage policy 
> before writing data, the blocks can take advantage of the policy and are 
> stored accordingly. 
> If the user sets the storage policy after the file has been written and completed, 
> the blocks will already have been written with the default storage policy (that is, 
> DISK). The user then has to run the ‘Mover tool’ explicitly, specifying all such 
> file names as a list. In some distributed system scenarios (e.g. HBase) it 
> would be difficult to collect all the files and run the tool, as different 
> nodes can write files separately and the files can have different paths.
> Another scenario is a rename: when the user renames a file from a directory with 
> one effective storage policy (inherited from the parent directory) into a 
> directory with a different effective storage policy, the inherited policy is not 
> copied from the source, so the policy of the destination file/dir parent takes 
> effect. The rename operation is just a metadata change in the Namenode; the 
> physical blocks still remain placed according to the source storage policy.
> Tracking all such file names, which depend on business logic spread across 
> distributed nodes (e.g. region servers), and running the Mover tool could 
> therefore be difficult for admins. 
> The proposal here is to provide an API in the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and send them to the DNs as movement commands. 
> Will post the detailed design thoughts document soon. 
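
The proposal in the description (an API call that records a request, and a tracker that later turns pending requests into movement commands) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class {{SatisfierQueueSketch}}, its method names, and the string form of the "movement command" are all hypothetical and are not taken from the SPS patches.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of the tracking queue; not the actual SPS implementation.
final class SatisfierQueueSketch {
    // Inode IDs of files whose blocks may not match their storage policy yet.
    private final Queue<Long> pendingInodes = new ArrayDeque<>();

    /** API entry point: remember that this file needs its policy satisfied. */
    synchronized void satisfyStoragePolicy(long inodeId) {
        if (!pendingInodes.contains(inodeId)) {
            pendingInodes.add(inodeId);
        }
    }

    /**
     * One pass of the tracking thread: drain the queue and produce one
     * (placeholder) movement command per pending file.
     */
    synchronized List<String> processPending() {
        List<String> commands = new ArrayList<>();
        Long inodeId;
        while ((inodeId = pendingInodes.poll()) != null) {
            commands.add("MOVE_BLOCKS inode=" + inodeId);
        }
        return commands;
    }
}
```

In a real daemon, {{processPending}} would run periodically on a background thread and the commands would be delivered to datanodes; the sketch keeps it synchronous so the behavior is easy to follow.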






[jira] [Commented] (HDDS-206) default port number taken by ksm is 9862 while listing the volumes

2018-07-01 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529006#comment-16529006
 ] 

Shashikant Banerjee commented on HDDS-206:
--

Thanks [~nandakumar131] for the review. Patch v1 addresses your review 
comments. 

I ran the acceptance tests and they all passed.
{code:java}
==
Acceptance
==
Acceptance.Basic
==
Acceptance.Basic.Basic :: Smoketest ozone cluster startup
==
Test rest interface | PASS |
--
Check webui static resources | PASS |
--
Start freon testing | PASS |
--
Acceptance.Basic.Basic :: Smoketest ozone cluster startup | PASS |
3 critical tests, 3 passed, 0 failed
3 tests total, 3 passed, 0 failed
==
Acceptance.Basic.Ozone-Shell :: Test ozone shell CLI usage
==
RestClient without http port | PASS |
--
RestClient with http port | PASS |
--
RestClient without host name | PASS |
--
RpcClient with port | PASS |
--
RpcClient without host | PASS |
--
RpcClient without scheme | PASS |
--
Acceptance.Basic.Ozone-Shell :: Test ozone shell CLI usage | PASS |
6 critical tests, 6 passed, 0 failed
6 tests total, 6 passed, 0 failed
==
Acceptance.Basic | PASS |
9 critical tests, 9 passed, 0 failed
9 tests total, 9 passed, 0 failed
==
Acceptance.Ozonefs
==
Acceptance.Ozonefs.Ozonefs :: Ozonefs test
==
Create volume and bucket | PASS |
--
Check volume from ozonefs | PASS |
--
Create directory from ozonefs | PASS |
--
Acceptance.Ozonefs.Ozonefs :: Ozonefs test | PASS |
3 critical tests, 3 passed, 0 failed
3 tests total, 3 passed, 0 failed
==
Acceptance.Ozonefs | PASS |
3 critical tests, 3 passed, 0 failed
3 tests total, 3 passed, 0 failed
==
Acceptance | PASS |
12 critical tests, 12 passed, 0 failed
12 tests total, 12 passed, 0 failed
=={code}


[jira] [Updated] (HDDS-206) default port number taken by ksm is 9862 while listing the volumes

2018-07-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-206:
-
Attachment: HDDS-206.01.patch

> default port number taken by ksm is 9862 while listing the volumes
> --
>
> Key: HDDS-206
> URL: https://issues.apache.org/jira/browse/HDDS-206
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-206.00.patch, HDDS-206.01.patch
>
>
> Here is the output of the ozone -listVolume command when no port is specified.
> By default it uses port 9862, which is not mentioned in 
> ozone-site.xml:
> {noformat}
> [root@ozone-vm bin]# ./ozone oz -listVolume o3://127.0.0.1/
> 2018-06-29 04:42:20,652 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 2018-06-29 04:42:21,914 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:22,915 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 1 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:23,917 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 2 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:24,925 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 3 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:25,928 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 4 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:26,931 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 5 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:27,932 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 6 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:28,934 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 7 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:29,935 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:30,938 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9862. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2018-06-29 04:42:31,075 [main] ERROR - Couldn't create protocol class 
> org.apache.hadoop.ozone.client.rpc.RpcClient exception:
> java.lang.reflect.InvocationTargetException
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:292)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:172)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:156)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:111)
>  at org.apache.hadoop.ozone.web.ozShell.Handler.verifyURI(Handler.java:96)
>  at 
> org.apache.hadoop.ozone.web.ozShell.volume.ListVolumeHandler.execute(ListVolumeHandler.java:80)
>  at org.apache.hadoop.ozone.web.ozShell.Shell.dispatch(Shell.java:395)
>  at org.apache.hadoop.ozone.web.ozShell.Shell.run(Shell.java:135)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:114)
> Caused by: java.net.ConnectException: Call From 

[jira] [Commented] (HDDS-206) default port number taken by ksm is 9862 while listing the volumes

2018-07-01 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529000#comment-16529000
 ] 

Nanda kumar commented on HDDS-206:
--

Thanks [~shashikant] for working on this.
In {{OzoneClientFactory}}, we don't have to change the public API. We can 
create the {{Configuration}} object inside the method instead of adding it as a 
method argument.
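
The review suggestion above can be sketched as follows: the public method keeps its existing arguments and builds the configuration object internally. This is a minimal illustration, not the code in the patch. {{ClientFactorySketch}}, the key {{example.manager.port}}, and the use of {{java.util.Properties}} as a stand-in for Hadoop's {{Configuration}} are all assumptions; only the 9862 default comes from the issue description.

```java
import java.util.Properties;

// Hypothetical sketch; Properties stands in for Hadoop's Configuration.
final class ClientFactorySketch {
    // Hypothetical key name, not a real Ozone configuration property.
    static final String PORT_KEY = "example.manager.port";
    // Default port mentioned in the issue description.
    static final int DEFAULT_PORT = 9862;

    private ClientFactorySketch() {
    }

    /** Existing public signature: the caller supplies host and port only. */
    static String getRpcClientAddress(String host, int port) {
        return host + ":" + port;
    }

    /** No-port variant: the configuration is created inside the method. */
    static String getRpcClientAddress(String host) {
        Properties conf = loadConfiguration();
        int port = Integer.parseInt(
            conf.getProperty(PORT_KEY, String.valueOf(DEFAULT_PORT)));
        return getRpcClientAddress(host, port);
    }

    /** Stand-in for loading site configuration; reads one system property. */
    private static Properties loadConfiguration() {
        Properties conf = new Properties();
        String configured = System.getProperty(PORT_KEY);
        if (configured != null) {
            conf.setProperty(PORT_KEY, configured);
        }
        return conf;
    }
}
```

Because the configuration never appears in a method signature, existing callers compile unchanged while the no-port path still honors a configured value.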


[jira] [Commented] (HDDS-187) Command status publisher for datanode

2018-07-01 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528996#comment-16528996
 ] 

Nanda kumar commented on HDDS-187:
--

[~ajayydv], patch v03 has compilation errors in the {{DatanodeStateMachine}} class.
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hadoop-hdds-container-service: Compilation failure: Compilation failure:
[ERROR] 
/Users/nvadivelu/codebase/apache/hadoop/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java:[101,21]
 constructor ReplicateContainerCommandHandler in class 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ReplicateContainerCommandHandler
 cannot be applied to given types;
[ERROR]   required: no arguments
[ERROR]   found: 
org.apache.hadoop.ozone.container.common.statemachine.StateContext
[ERROR]   reason: actual and formal argument lists differ in length
[ERROR] 
/Users/nvadivelu/codebase/apache/hadoop/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java:[99,21]
 constructor DeleteBlocksCommandHandler in class 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler
 cannot be applied to given types;
[ERROR]   required: 
org.apache.hadoop.ozone.container.common.interfaces.ContainerManager,org.apache.hadoop.conf.Configuration
[ERROR]   found: 
org.apache.hadoop.ozone.container.common.interfaces.ContainerManager,org.apache.hadoop.conf.Configuration,org.apache.hadoop.ozone.container.common.statemachine.StateContext
[ERROR]   reason: actual and formal argument lists differ in length
[ERROR] 
/Users/nvadivelu/codebase/apache/hadoop/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java:[98,21]
 constructor CloseContainerCommandHandler in class 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler
 cannot be applied to given types;
[ERROR]   required: no arguments
[ERROR]   found: 
org.apache.hadoop.ozone.container.common.statemachine.StateContext
[ERROR]   reason: actual and formal argument lists differ in length
{noformat}

> Command status publisher for datanode
> -
>
> Key: HDDS-187
> URL: https://issues.apache.org/jira/browse/HDDS-187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-187.00.patch, HDDS-187.01.patch, HDDS-187.02.patch, 
> HDDS-187.03.patch
>
>
> Currently SCM sends a set of commands to the DataNode, which executes them via 
> CommandHandler. This JIRA intends to create a command status publisher which 
> will return the status of these commands to the SCM.
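
One way to picture the publisher described above is a small status table that handlers update and that is periodically snapshotted into a report. The sketch below is hypothetical throughout: {{CommandStatusSketch}}, its {{Status}} values, and the method names are illustrative assumptions, not the classes introduced by the HDDS-187 patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a per-datanode command status table.
final class CommandStatusSketch {
    enum Status { PENDING, EXECUTED, FAILED }

    // Command ID -> latest known status, in arrival order.
    private final Map<Long, Status> statuses = new LinkedHashMap<>();

    /** Called when a command arrives from the SCM. */
    synchronized void onCommandReceived(long commandId) {
        statuses.put(commandId, Status.PENDING);
    }

    /** Called by a command handler once it has finished the command. */
    synchronized void onCommandFinished(long commandId, boolean success) {
        statuses.put(commandId, success ? Status.EXECUTED : Status.FAILED);
    }

    /**
     * Builds the report to send back and forgets commands that are done,
     * so each final status is published once.
     */
    synchronized Map<Long, Status> publish() {
        Map<Long, Status> report = new LinkedHashMap<>(statuses);
        statuses.values().removeIf(s -> s != Status.PENDING);
        return report;
    }
}
```

Pending commands stay in the table across reports, so the receiver can tell the difference between "still running" and "never received".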






[jira] [Commented] (HDDS-167) Rename KeySpaceManager to OzoneManager

2018-07-01 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528986#comment-16528986
 ] 

Nanda kumar commented on HDDS-167:
--

Triggered precommit build manually.

> Rename KeySpaceManager to OzoneManager
> --
>
> Key: HDDS-167
> URL: https://issues.apache.org/jira/browse/HDDS-167
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Manager
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-167.01.patch, HDDS-167.02.patch, HDDS-167.03.patch, 
> HDDS-167.04.patch, HDDS-167.05.patch, HDDS-167.06.patch, HDDS-167.07.patch, 
> HDDS-167.08.patch
>
>
> The Ozone KeySpaceManager daemon was renamed to OzoneManager. There are some 
> more changes needed to complete the rename everywhere, e.g.:
> - command-line
> - documentation
> - unit tests
> - Acceptance tests






[jira] [Updated] (HDFS-13709) Report bad block to NN when transfer block encounter EIO exception

2018-07-01 Thread Chen Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-13709:
--
Status: Patch Available  (was: Open)

> Report bad block to NN when transfer block encounter EIO exception
> --
>
> Key: HDFS-13709
> URL: https://issues.apache.org/jira/browse/HDFS-13709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
> a bad disk track may cause data loss.
> For example, suppose there are 3 replicas on 3 machines A/B/C. If a bad track 
> occurs in A's replica data, and someday B and C crash at the same time, NN will 
> try to replicate the data from A and fail. The block is now corrupt but no one 
> knows, because NN thinks there is at least 1 healthy replica and keeps 
> trying to replicate it.
> When reading a replica which has data on a bad track, the OS returns an EIO 
> error. If the DN reports the bad block as soon as it gets an EIO, we can detect 
> this case ASAP and try to avoid data loss.
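
A rough sketch of the behavior proposed above: when a block transfer fails with an I/O error that looks like EIO, report the block as bad before propagating the failure. Everything here is hypothetical ({{EioReportSketch}}, {{BlockReader}}, the callback), and the detection rule is an assumption: it relies on the exception message containing the Linux error text "Input/output error", which may not hold on other platforms. It is not the logic of the attached patch.

```java
import java.io.IOException;
import java.util.function.LongConsumer;

// Hypothetical sketch; not the DataNode code from the patch.
final class EioReportSketch {
    // Assumption: EIO surfaces as an IOException carrying this message text.
    private static final String EIO_MESSAGE = "Input/output error";

    /** Stand-in for whatever reads the replica from disk and sends it. */
    interface BlockReader {
        void transfer(long blockId) throws IOException;
    }

    private EioReportSketch() {
    }

    /** Returns true if the exception looks like a disk-level EIO. */
    static boolean isEio(IOException e) {
        String message = e.getMessage();
        return message != null && message.contains(EIO_MESSAGE);
    }

    /** Transfers a block; on EIO, reports it as bad and then rethrows. */
    static void transferBlock(long blockId, BlockReader reader,
                              LongConsumer reportBadBlock) throws IOException {
        try {
            reader.transfer(blockId);
        } catch (IOException e) {
            if (isEio(e)) {
                reportBadBlock.accept(blockId);
            }
            throw e;
        }
    }
}
```

Rethrowing keeps the existing failure handling intact; the only new effect is that the bad replica becomes known as soon as the read fails.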


