[jira] [Work logged] (HDDS-2034) Async RATIS pipeline creation and destroy through heartbeat commands

2019-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2034?focusedWorklogId=320356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320356
 ]

ASF GitHub Bot logged work on HDDS-2034:


Author: ASF GitHub Bot
Created on: 30/Sep/19 05:54
Start Date: 30/Sep/19 05:54
Worklog Time Spent: 10m 
  Work Description: lokeshj1703 commented on pull request #1469: HDDS-2034. 
Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#discussion_r329416985
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SCMPipelineManager.java
 ##
 @@ -148,17 +166,20 @@ private void initializePipelineState() throws 
IOException {
   }
 
   @Override
-  public synchronized Pipeline createPipeline(
-  ReplicationType type, ReplicationFactor factor) throws IOException {
+  public synchronized Pipeline createPipeline(ReplicationType type,
+  ReplicationFactor factor) throws IOException {
 lock.writeLock().lock();
 try {
   Pipeline pipeline = pipelineFactory.create(type, factor);
   pipelineStore.put(pipeline.getId().getProtobuf().toByteArray(),
   pipeline.getProtobufMessage().toByteArray());
   stateManager.addPipeline(pipeline);
   nodeManager.addPipeline(pipeline);
-  metrics.incNumPipelineCreated();
-  metrics.createPerPipelineMetrics(pipeline);
+  metrics.incNumPipelineAllocated();
+  if (pipeline.isOpen()) {
+metrics.incNumPipelineCreated();
+metrics.createPerPipelineMetrics(pipeline);
+  }
 
 Review comment:
   Sorry! I did not get this point. According to the patch, 
createPipeline(ReplicationType type, ReplicationFactor factor) would also 
create a factor-ONE pipeline in the ALLOCATED state. Am I missing something?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 320356)
Time Spent: 10h 20m  (was: 10h 10m)

> Async RATIS pipeline creation and destroy through heartbeat commands
> 
>
> Key: HDDS-2034
> URL: https://issues.apache.org/jira/browse/HDDS-2034
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Currently, pipeline creation and destruction are synchronous operations: SCM 
> directly connects to each datanode of the pipeline through a gRPC channel to 
> create or destroy the pipeline.
> This task is to remove the gRPC channel and send the pipeline creation and 
> destroy actions to each datanode through heartbeat commands.
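
For illustration, a minimal sketch of the heartbeat-based flow described above. 
All names here (CommandQueue, addCommand, drainCommands) are hypothetical 
placeholders, not the actual HDDS classes:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: instead of SCM opening a gRPC channel to each datanode, it queues a
// command that is piggybacked on that datanode's next heartbeat response.
class CommandQueue {
  private final Map<String, Queue<String>> pending = new ConcurrentHashMap<>();

  // SCM side: queue a create/destroy action for a datanode of the pipeline.
  void addCommand(String datanodeUuid, String command) {
    pending.computeIfAbsent(datanodeUuid, k -> new ConcurrentLinkedQueue<>())
        .add(command);
  }

  // Heartbeat handler side: drain queued commands into the heartbeat response.
  List<String> drainCommands(String datanodeUuid) {
    Queue<String> q = pending.remove(datanodeUuid);
    return q == null ? new ArrayList<>() : new ArrayList<>(q);
  }
}
{code}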



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14810) review FSNameSystem editlog sync

2019-09-29 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940660#comment-16940660
 ] 

Xiaoqiao He commented on HDFS-14810:


[~jojochuang] Gentle reminder again. Any update here?

> review FSNameSystem editlog sync
> 
>
> Key: HDFS-14810
> URL: https://issues.apache.org/jira/browse/HDFS-14810
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14810.001.patch, HDFS-14810.002.patch, 
> HDFS-14810.003.patch, HDFS-14810.004.patch
>
>
> Refactor and unify the edit log sync calls in FSNamesystem, as HDFS-11246 
> mentioned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2034) Async RATIS pipeline creation and destroy through heartbeat commands

2019-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2034?focusedWorklogId=320353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320353
 ]

ASF GitHub Bot logged work on HDDS-2034:


Author: ASF GitHub Bot
Created on: 30/Sep/19 05:45
Start Date: 30/Sep/19 05:45
Worklog Time Spent: 10m 
  Work Description: lokeshj1703 commented on pull request #1469: HDDS-2034. 
Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#discussion_r329415448
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/OneReplicaPipelineSafeModeRule.java
 ##
 @@ -75,69 +66,59 @@ public OneReplicaPipelineSafeModeRule(String ruleName, 
EventQueue eventQueue,
 HDDS_SCM_SAFEMODE_ONE_NODE_REPORTED_PIPELINE_PCT  +
 " value should be >= 0.0 and <= 1.0");
 
+// Exclude CLOSED pipeline
 int totalPipelineCount =
 pipelineManager.getPipelines(HddsProtos.ReplicationType.RATIS,
-HddsProtos.ReplicationFactor.THREE).size();
+HddsProtos.ReplicationFactor.THREE, Pipeline.PipelineState.OPEN)
+.size() +
+pipelineManager.getPipelines(HddsProtos.ReplicationType.RATIS,
 
 Review comment:
   OneReplicaPipelineSafeModeRule, as per its description, tracks the open 
containers in the system so that at least one replica is available. So I guess 
if we are tracking one-node pipelines, we can track them in this rule as well?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 320353)
Time Spent: 10h 10m  (was: 10h)

> Async RATIS pipeline creation and destroy through heartbeat commands
> 
>
> Key: HDDS-2034
> URL: https://issues.apache.org/jira/browse/HDDS-2034
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Currently, pipeline creation and destruction are synchronous operations: SCM 
> directly connects to each datanode of the pipeline through a gRPC channel to 
> create or destroy the pipeline.
> This task is to remove the gRPC channel and send the pipeline creation and 
> destroy actions to each datanode through heartbeat commands.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2034) Async RATIS pipeline creation and destroy through heartbeat commands

2019-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2034?focusedWorklogId=320350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320350
 ]

ASF GitHub Bot logged work on HDDS-2034:


Author: ASF GitHub Bot
Created on: 30/Sep/19 05:36
Start Date: 30/Sep/19 05:36
Worklog Time Spent: 10m 
  Work Description: lokeshj1703 commented on pull request #1469: HDDS-2034. 
Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#discussion_r329413810
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/BlockManagerImpl.java
 ##
 @@ -188,6 +208,15 @@ public AllocatedBlock allocateBlock(final long size, 
ReplicationType type,
   // TODO: #CLUTIL Remove creation logic when all replication types and
   // factors are handled by pipeline creator
   pipeline = pipelineManager.createPipeline(type, factor);
+  // wait until pipeline is ready
+  long current = System.currentTimeMillis();
+  while (!pipeline.isOpen() && System.currentTimeMillis() <
+  (current + pipelineCreateWaitTimeout)) {
+try {
+  Thread.sleep(1000);
+} catch (InterruptedException e) {
+}
+  }
 
 Review comment:
   > Maybe the best way is not create pipeline in such case if we can make sure 
there are enough pipelines to use after exited safe mode.
   
   @ChenSammi I agree. It would be great if we could remove the pipeline 
creation from block allocation.
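
   For reference, a minimal sketch of the same bounded wait written so it does 
not swallow the interrupt, assuming the surrounding fields from the diff above 
(pipeline, pipelineCreateWaitTimeout):

{code:java}
// Illustrative alternative to the busy-wait in the diff: same bounded wait,
// but the interrupt status is restored instead of being silently dropped.
long deadline = System.currentTimeMillis() + pipelineCreateWaitTimeout;
while (!pipeline.isOpen() && System.currentTimeMillis() < deadline) {
  try {
    Thread.sleep(1000);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // preserve the interrupt
    break;
  }
}
{code}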
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 320350)
Time Spent: 10h  (was: 9h 50m)

> Async RATIS pipeline creation and destroy through heartbeat commands
> 
>
> Key: HDDS-2034
> URL: https://issues.apache.org/jira/browse/HDDS-2034
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Currently, pipeline creation and destruction are synchronous operations: SCM 
> directly connects to each datanode of the pipeline through a gRPC channel to 
> create or destroy the pipeline.
> This task is to remove the gRPC channel and send the pipeline creation and 
> destroy actions to each datanode through heartbeat commands.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14794) [SBN read] reportBadBlock is rejected by Observer.

2019-09-29 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940656#comment-16940656
 ] 

Surendra Singh Lilhore edited comment on HDFS-14794 at 9/30/19 5:32 AM:


The root cause of this bad block is HDFS-14720. We also saw the same in the 
Observer namenode.


was (Author: surendrasingh):
Root cause of this bad block is HDFS-14794. We also saw the same in Observer 
namenode.

> [SBN read] reportBadBlock is rejected by Observer.
> --
>
> Key: HDFS-14794
> URL: https://issues.apache.org/jira/browse/HDFS-14794
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> {{reportBadBlock}} is rejected by Observer via StandbyException
> {code}StandbyException: Operation category WRITE is not supported in state 
> observer{code}
> We should investigate what are the consequences of this and if we should 
> treat {{reportBadBlock}} as IBRs. Note that {{reportBadBlock}} is a part of 
> both {{ClientProtocol}} and {{DatanodeProtocol}}
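
As a rough, self-contained sketch of where the rejection comes from (names are 
simplified; the real check lives in the NameNode HA state machinery and throws 
StandbyException):

{code:java}
enum OperationCategory { READ, WRITE, UNCHECKED }

final class ObserverStateSketch {
  // An Observer serves READ operations; anything categorized as WRITE is
  // rejected, which is why reportBadBlock fails when it is treated as WRITE.
  static void checkOperation(OperationCategory op) {
    if (op == OperationCategory.WRITE) {
      throw new IllegalStateException(
          "Operation category WRITE is not supported in state observer");
    }
  }
}
{code}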



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14794) [SBN read] reportBadBlock is rejected by Observer.

2019-09-29 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940656#comment-16940656
 ] 

Surendra Singh Lilhore commented on HDFS-14794:
---

Root cause of this bad block is HDFS-14794. We also saw the same in Observer 
namenode.

> [SBN read] reportBadBlock is rejected by Observer.
> --
>
> Key: HDFS-14794
> URL: https://issues.apache.org/jira/browse/HDFS-14794
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> {{reportBadBlock}} is rejected by Observer via StandbyException
> {code}StandbyException: Operation category WRITE is not supported in state 
> observer{code}
> We should investigate what are the consequences of this and if we should 
> treat {{reportBadBlock}} as IBRs. Note that {{reportBadBlock}} is a part of 
> both {{ClientProtocol}} and {{DatanodeProtocol}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2204) Avoid buffer copying in checksum verification

2019-09-29 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDDS-2204:
-
Status: Patch Available  (was: Open)

> Avoid buffer copying in checksum verification
> 
>
> Key: HDDS-2204
> URL: https://issues.apache.org/jira/browse/HDDS-2204
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: o2204_20190930.patch
>
>
> In Checksum.verifyChecksum(ByteString, ..), it first converts the ByteString 
> to a byte array.  This leads to an unnecessary buffer copy.
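
A minimal sketch of the copy being avoided, assuming only the standard protobuf 
ByteString API (the demo class itself is hypothetical):

{code:java}
import com.google.protobuf.ByteString;
import java.nio.ByteBuffer;

class ByteStringCopyDemo {
  static void demo(ByteString data) {
    byte[] copied = data.toByteArray();             // allocates and copies
    ByteBuffer view = data.asReadOnlyByteBuffer();  // read-only view, no copy
    System.out.println(copied.length + " vs " + view.remaining());
  }
}
{code}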



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2056) Datanode unable to start command handler thread with security enabled

2019-09-29 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh reassigned HDDS-2056:
---

Assignee: Xiaoyu Yao

> Datanode unable to start command handler thread with security enabled
> -
>
> Key: HDDS-2056
> URL: https://issues.apache.org/jira/browse/HDDS-2056
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: 0.5.0
>
>
>  
> {code:java}
> 2019-08-29 02:50:23,536 ERROR 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> Critical Error : Command processor thread encountered an error. Thread: 
> Thread[Command processor thread,5,main]
> java.lang.IllegalArgumentException: Null user
>         at 
> org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1269)
>         at 
> org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1256)
>         at 
> org.apache.hadoop.hdds.security.token.BlockTokenVerifier.verify(BlockTokenVerifier.java:116)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.XceiverServer.submitRequest(XceiverServer.java:68)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.submitRequest(XceiverServerRatis.java:482)
>         at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler.handle(CloseContainerCommandHandler.java:109)
>         at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
>         at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:432)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
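
A minimal reproduction sketch of the failure mode in the trace, assuming only 
hadoop-common on the classpath (the class name is hypothetical):

{code:java}
import org.apache.hadoop.security.UserGroupInformation;

class NullUserRepro {
  public static void main(String[] args) {
    // createRemoteUser rejects a null user with IllegalArgumentException
    // ("Null user"), so a block token whose user field was never populated
    // kills the command processor thread as shown in the trace above.
    UserGroupInformation.createRemoteUser(null);
  }
}
{code}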



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-09-29 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940652#comment-16940652
 ] 

Xiaoqiao He commented on HDFS-14305:


Thanks [~shv], [~arp] very much for your feedback and work. It seems that all 
the work we did (including [^HDFS-14305.006.patch]) only reduces the probability 
of a conflict rather than avoiding it completely.
{quote}If you start the NNs in arbitrary order, you can get block token 
collisions because the ranges will change in 3.2.1 compared to 3.2.0.{quote}
This case does not seem to be eliminated with or without the 
[^HDFS-14305-007.patch] changes. Please correct me if I am wrong. Thanks [~shv] 
again.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14305-007.patch, HDFS-14305.001.patch, 
> HDFS-14305.002.patch, HDFS-14305.003.patch, HDFS-14305.004.patch, 
> HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping ranges 
> for serial numbers. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When a collision happens, DataNodes could overwrite an existing key, which 
> will cause clients to fail with an {{InvalidToken}} error.
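
A tiny sketch reproducing the overlap arithmetic above, with 
{{Integer.MAX_VALUE}} replaced by 100 and two NameNodes:

{code:java}
class SerialRangeDemo {
  public static void main(String[] args) {
    int max = 100, numNNs = 2;
    int intRange = max / numNNs;  // 50
    for (int nnIndex = 0; nnIndex < numNNs; nnIndex++) {
      int nnRangeStart = intRange * nnIndex;
      // serialNo % intRange lies in (-intRange, intRange) for a random int,
      // so the reachable range is [start - intRange + 1, start + intRange - 1].
      System.out.printf("nn%d -> [%d, %d]%n", nnIndex + 1,
          nnRangeStart - intRange + 1, nnRangeStart + intRange - 1);
    }
  }
}
{code}

This prints nn1 -> [-49, 49] and nn2 -> [1, 99], matching the overlapping 
ranges shown above.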



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2204) Avoid buffer coping in checksum verification

2019-09-29 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDDS-2204:
-
Attachment: o2204_20190930.patch

> Avoid buffer coping in checksum verification
> 
>
> Key: HDDS-2204
> URL: https://issues.apache.org/jira/browse/HDDS-2204
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: o2204_20190930.patch
>
>
> In Checksum.verifyChecksum(ByteString, ..), it first converts the ByteString 
> to a byte array.  It lead to an unnecessary buffer coping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14554) Avoid to process duplicate blockreport of datanode when namenode in startup safemode

2019-09-29 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940646#comment-16940646
 ] 

Xiaoqiao He commented on HDFS-14554:


Thanks [~lindongdong] for your comments. This ticket is not valid. If you also 
hit a bottleneck around NameNode restart, I just suggest trying the lifeline 
feature, updating the safemode checking mechanism (HDFS-14559 is one demo 
implementation), or turning off automatic safemode leave at NameNode startup. My 
own opinion, FYI.

> Avoid to process duplicate blockreport of datanode when namenode in startup 
> safemode
> 
>
> Key: HDFS-14554
> URL: https://issues.apache.org/jira/browse/HDFS-14554
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14554.001.patch
>
>
> When the NameNode restarts and enters startup safemode, its load can be very 
> high because of the storm of block report requests from all datanodes at once. 
> Sometimes a datanode may send its block report multiple times because the 
> NameNode doesn't respond in time and the DataNode hits a timeout exception and 
> retries, which further sharpens the load on the NameNode.
> So we should check for and filter duplicate block report requests from a 
> datanode to reduce the load on the NameNode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2019-09-29 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940643#comment-16940643
 ] 

Xiaoqiao He commented on HDFS-14090:


Thanks [~crh] for your work, and sorry for the late response.
1. Compiled the RBF sub-module and ran the rbf-related unit tests; all passed.
2. Checked the isolation logic in a test env (use cases: high load on one 
subcluster + unavailability of one subcluster, lightweight load on all 
subclusters, high load on all subclusters) and it meets my expectations.
3. Checked the code style; it is clean, as the Jenkins reports above show.
+1 for [^HDFS-14090.014.patch] from my side overall. Hope to step forward. 
Thanks [~crh] again.

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, RBF_ 
> Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help minimize the impact on clients of connecting to healthy clusters vs 
> unhealthy clusters.
> For example - if there are 2 name nodes downstream, and one of them is 
> heavily loaded with calls spiking rpc queue times, due to back pressure the 
> same will start reflecting on the router. As a result of this, clients 
> connecting to healthy/faster name nodes will also slow down, as the same rpc 
> queue is maintained for all calls at the router layer. Essentially the same IPC 
> thread pool is used by the router to connect to all name nodes.
> Currently the router uses one single rpc queue for all calls. Let's discuss how 
> we can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify the 
> downstream name node, and maintain a separate queue for each underlying name 
> node. Another simpler way is to maintain some sort of rate limiter configured 
> for each name node and let routers drop/reject/send error requests after a 
> certain threshold. 
> This won’t be a simple change as the router’s ‘Server’ layer would need 
> redesign and implementation. Currently this layer is the same as the name node.
> Opening this ticket to discuss, design and implement this feature.
>  
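
One hedged sketch of the simpler "static" option above (hypothetical names, not 
the actual RBF implementation): cap how many router handler threads may be 
outstanding against each downstream nameservice, and fail fast instead of 
queueing behind a slow name node.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

class StaticPermitLimiter {
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
  private final int permitsPerNameservice;

  StaticPermitLimiter(int permitsPerNameservice) {
    this.permitsPerNameservice = permitsPerNameservice;
  }

  // Returns false when the nameservice already has all its permits in use,
  // letting the router reject the call instead of blocking a shared handler.
  boolean tryAcquire(String nsId) {
    return permits
        .computeIfAbsent(nsId, k -> new Semaphore(permitsPerNameservice))
        .tryAcquire();
  }

  void release(String nsId) {
    permits.get(nsId).release();
  }
}
{code}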



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2019-09-29 Thread CR Hota (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940620#comment-16940620
 ] 

CR Hota commented on HDFS-14090:


[~elgoiri] [~xkrogen] Thanks for your patience. Let's try and close this in the 
coming week.

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, RBF_ 
> Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help minimize the impact on clients of connecting to healthy clusters vs 
> unhealthy clusters.
> For example - if there are 2 name nodes downstream, and one of them is 
> heavily loaded with calls spiking rpc queue times, due to back pressure the 
> same will start reflecting on the router. As a result of this, clients 
> connecting to healthy/faster name nodes will also slow down, as the same rpc 
> queue is maintained for all calls at the router layer. Essentially the same IPC 
> thread pool is used by the router to connect to all name nodes.
> Currently the router uses one single rpc queue for all calls. Let's discuss how 
> we can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify the 
> downstream name node, and maintain a separate queue for each underlying name 
> node. Another simpler way is to maintain some sort of rate limiter configured 
> for each name node and let routers drop/reject/send error requests after a 
> certain threshold. 
> This won’t be a simple change as the router’s ‘Server’ layer would need 
> redesign and implementation. Currently this layer is the same as the name node.
> Opening this ticket to discuss, design and implement this feature.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14814) RBF: RouterQuotaUpdateService supports inherited rule.

2019-09-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940618#comment-16940618
 ] 

Hadoop QA commented on HDFS-14814:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
5s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 15s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch 
generated 2 new + 5 unchanged - 0 fixed = 7 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 23m 
41s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | HDFS-14814 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981737/HDFS-14814.010.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c4b45f252042 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 760b523 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27985/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27985/testReport/ |
| Max. process+thread count | 1585 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27985/console |
| Powered by | Apache Yetus 0.8.0   

[jira] [Created] (HDDS-2204) Avoid buffer copying in checksum verification

2019-09-29 Thread Tsz-wo Sze (Jira)
Tsz-wo Sze created HDDS-2204:


 Summary: Avoid buffer copying in checksum verification
 Key: HDDS-2204
 URL: https://issues.apache.org/jira/browse/HDDS-2204
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Client
Reporter: Tsz-wo Sze
Assignee: Tsz-wo Sze


In Checksum.verifyChecksum(ByteString, ..), it first converts the ByteString to 
a byte array.  This leads to an unnecessary buffer copy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14814) RBF: RouterQuotaUpdateService supports inherited rule.

2019-09-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940608#comment-16940608
 ] 

Hadoop QA commented on HDFS-14814:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
18s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
19s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 19s{color} 
| {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 14s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch 
generated 2 new + 5 unchanged - 0 fixed = 7 total (was 5) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
20s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
22s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 21s{color} 
| {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | HDFS-14814 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981736/HDFS-14814.010.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4ddb4beddf90 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 760b523 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27984/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27984/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
| javac | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27984/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
| checkstyle | 

[jira] [Commented] (HDFS-14554) Avoid to process duplicate blockreport of datanode when namenode in startup safemode

2019-09-29 Thread lindongdong (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940607#comment-16940607
 ] 

lindongdong commented on HDFS-14554:


Hi [~hexiaoqiao], thanks for your work.

I tested your patch, and a lot of blocks are not reported. From my analysis, the 
problem happens when the DN reports blocks one disk at a time.

A boolean is not enough for this; maybe a Set is needed.
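
A minimal sketch of that suggestion (all names are hypothetical, not from the 
patch): track the storages that have already reported per datanode in a Set, so 
per-disk reports are not mistaken for duplicates.

{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ReportedStorageTracker {
  private final Map<String, Set<String>> reported = new ConcurrentHashMap<>();

  // Returns true only if this storage's block report was already processed.
  boolean isDuplicate(String datanodeUuid, String storageId) {
    return !reported
        .computeIfAbsent(datanodeUuid, k -> ConcurrentHashMap.newKeySet())
        .add(storageId);
  }
}
{code}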

> Avoid to process duplicate blockreport of datanode when namenode in startup 
> safemode
> 
>
> Key: HDFS-14554
> URL: https://issues.apache.org/jira/browse/HDFS-14554
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14554.001.patch
>
>
> When the NameNode restarts and enters startup safemode, its load can be very 
> high because of the storm of block report requests from all datanodes at once. 
> Sometimes a datanode may send its block report multiple times because the 
> NameNode doesn't respond in time and the DataNode hits a timeout exception and 
> retries, which further sharpens the load on the NameNode.
> So we should check for and filter duplicate block report requests from a 
> datanode to reduce the load on the NameNode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14814) RBF: RouterQuotaUpdateService supports inherited rule.

2019-09-29 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-14814:
---
Attachment: HDFS-14814.010.patch

> RBF: RouterQuotaUpdateService supports inherited rule.
> --
>
> Key: HDFS-14814
> URL: https://issues.apache.org/jira/browse/HDFS-14814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-14814.001.patch, HDFS-14814.002.patch, 
> HDFS-14814.003.patch, HDFS-14814.004.patch, HDFS-14814.005.patch, 
> HDFS-14814.006.patch, HDFS-14814.007.patch, HDFS-14814.008.patch, 
> HDFS-14814.009.patch, HDFS-14814.010.patch
>
>
> I want to add a rule *'The quota should be set the same as the nearest 
> parent'* to Global Quota. Supposing we have the mount table below.
> M1: /dir-a                            ns0->/dir-a     \{nquota=10,squota=20}
> M2: /dir-a/dir-b                 ns1->/dir-b     \{nquota=-1,squota=30}
> M3: /dir-a/dir-b/dir-c       ns2->/dir-c     \{nquota=-1,squota=-1}
> M4: /dir-d                           ns3->/dir-d     \{nquota=-1,squota=-1}
>  
> The quota for the remote locations on the namespaces should be:
>  ns0->/dir-a     \{nquota=10,squota=20}
>  ns1->/dir-b     \{nquota=10,squota=30}
>  ns2->/dir-c      \{nquota=10,squota=30}
>  ns3->/dir-d     \{nquota=-1,squota=-1}
>  
> The quota of the remote location is set the same as on the corresponding 
> MountTable, and if the MountTable has no quota then the quota is taken from 
> the nearest parent MountTable that has one.
>  
> It's easy to implement. In RouterQuotaUpdateService, each time we compute the 
> currentQuotaUsage we can get the quota info for each MountTable. We can then 
> check and fix every MountTable whose quota doesn't match the rule above.
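
A minimal sketch of the nearest-parent lookup described above (names are 
hypothetical; per the description, the real check would live in 
RouterQuotaUpdateService):

{code:java}
import java.util.HashMap;
import java.util.Map;

class NearestParentQuota {
  // Walk up the mount path until an entry with an explicit quota (!= -1)
  // is found; return -1 if no ancestor sets one.
  static long effectiveNsQuota(Map<String, Long> nsQuotaByMount, String path) {
    for (String p = path; !p.isEmpty();
         p = p.substring(0, Math.max(p.lastIndexOf('/'), 0))) {
      Long q = nsQuotaByMount.get(p);
      if (q != null && q != -1) {
        return q;  // nearest mount entry (self or ancestor) with a quota
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    Map<String, Long> nq = new HashMap<>();
    nq.put("/dir-a", 10L);
    nq.put("/dir-a/dir-b", -1L);
    nq.put("/dir-a/dir-b/dir-c", -1L);
    System.out.println(effectiveNsQuota(nq, "/dir-a/dir-b/dir-c"));  // 10
  }
}
{code}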



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14814) RBF: RouterQuotaUpdateService supports inherited rule.

2019-09-29 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-14814:
---
Attachment: (was: HDFS-14814.010.patch)

> RBF: RouterQuotaUpdateService supports inherited rule.
> --
>
> Key: HDFS-14814
> URL: https://issues.apache.org/jira/browse/HDFS-14814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-14814.001.patch, HDFS-14814.002.patch, 
> HDFS-14814.003.patch, HDFS-14814.004.patch, HDFS-14814.005.patch, 
> HDFS-14814.006.patch, HDFS-14814.007.patch, HDFS-14814.008.patch, 
> HDFS-14814.009.patch
>
>
> I want to add a rule *'The quota should be set the same as the nearest 
> parent'* to Global Quota. Supposing we have the mount table below.
> M1: /dir-a                            ns0->/dir-a     \{nquota=10,squota=20}
> M2: /dir-a/dir-b                 ns1->/dir-b     \{nquota=-1,squota=30}
> M3: /dir-a/dir-b/dir-c       ns2->/dir-c     \{nquota=-1,squota=-1}
> M4: /dir-d                           ns3->/dir-d     \{nquota=-1,squota=-1}
>  
> The quota for the remote locations on the namespaces should be:
>  ns0->/dir-a     \{nquota=10,squota=20}
>  ns1->/dir-b     \{nquota=10,squota=30}
>  ns2->/dir-c      \{nquota=10,squota=30}
>  ns3->/dir-d     \{nquota=-1,squota=-1}
>  
> The quota of the remote location is set the same as on the corresponding 
> MountTable, and if the MountTable has no quota then the quota is taken from 
> the nearest parent MountTable that has one.
>  
> It's easy to implement. In RouterQuotaUpdateService, each time we compute the 
> currentQuotaUsage we can get the quota info for each MountTable. We can then 
> check and fix every MountTable whose quota doesn't match the rule above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14814) RBF: RouterQuotaUpdateService supports inherited rule.

2019-09-29 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940593#comment-16940593
 ] 

Jinglun commented on HDFS-14814:


 
{quote}If you want to return a List of Entry, just create a new Map.
{quote}
Good idea! I'll return a TreeMap instead. Uploading v10 
and pending Jenkins.

> RBF: RouterQuotaUpdateService supports inherited rule.
> --
>
> Key: HDFS-14814
> URL: https://issues.apache.org/jira/browse/HDFS-14814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-14814.001.patch, HDFS-14814.002.patch, 
> HDFS-14814.003.patch, HDFS-14814.004.patch, HDFS-14814.005.patch, 
> HDFS-14814.006.patch, HDFS-14814.007.patch, HDFS-14814.008.patch, 
> HDFS-14814.009.patch, HDFS-14814.010.patch
>
>
> I want to add a rule *'The quota should be set the same as the nearest 
> parent'* to Global Quota. Supposing we have the mount table below.
> M1: /dir-a                            ns0->/dir-a     \{nquota=10,squota=20}
> M2: /dir-a/dir-b                 ns1->/dir-b     \{nquota=-1,squota=30}
> M3: /dir-a/dir-b/dir-c       ns2->/dir-c     \{nquota=-1,squota=-1}
> M4: /dir-d                           ns3->/dir-d     \{nquota=-1,squota=-1}
>  
> The quota for the remote locations on the namespaces should be:
>  ns0->/dir-a     \{nquota=10,squota=20}
>  ns1->/dir-b     \{nquota=10,squota=30}
>  ns2->/dir-c      \{nquota=10,squota=30}
>  ns3->/dir-d     \{nquota=-1,squota=-1}
>  
> The quota of the remote location is set the same as on the corresponding 
> MountTable, and if the MountTable has no quota then the quota is taken from 
> the nearest parent MountTable that has one.
>  
> It's easy to implement. In RouterQuotaUpdateService, each time we compute the 
> currentQuotaUsage we can get the quota info for each MountTable. We can then 
> check and fix every MountTable whose quota doesn't match the rule above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14814) RBF: RouterQuotaUpdateService supports inherited rule.

2019-09-29 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-14814:
---
Attachment: HDFS-14814.010.patch

> RBF: RouterQuotaUpdateService supports inherited rule.
> --
>
> Key: HDFS-14814
> URL: https://issues.apache.org/jira/browse/HDFS-14814
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-14814.001.patch, HDFS-14814.002.patch, 
> HDFS-14814.003.patch, HDFS-14814.004.patch, HDFS-14814.005.patch, 
> HDFS-14814.006.patch, HDFS-14814.007.patch, HDFS-14814.008.patch, 
> HDFS-14814.009.patch, HDFS-14814.010.patch
>
>
> I want to add a rule *'The quota should be set the same as the nearest 
> parent'* to Global Quota. Supposing we have the mount table below.
> M1: /dir-a                            ns0->/dir-a     \{nquota=10,squota=20}
> M2: /dir-a/dir-b                 ns1->/dir-b     \{nquota=-1,squota=30}
> M3: /dir-a/dir-b/dir-c       ns2->/dir-c     \{nquota=-1,squota=-1}
> M4: /dir-d                           ns3->/dir-d     \{nquota=-1,squota=-1}
>  
> The quota for the remote locations on the namespaces should be:
>  ns0->/dir-a     \{nquota=10,squota=20}
>  ns1->/dir-b     \{nquota=10,squota=30}
>  ns2->/dir-c      \{nquota=10,squota=30}
>  ns3->/dir-d     \{nquota=-1,squota=-1}
>  
> The quota of the remote location is set the same as on the corresponding 
> MountTable, and if the MountTable has no quota then the quota is taken from 
> the nearest parent MountTable that has one.
>  
> It's easy to implement. In RouterQuotaUpdateService, each time we compute the 
> currentQuotaUsage we can get the quota info for each MountTable. We can then 
> check and fix every MountTable whose quota doesn't match the rule above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-09-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940561#comment-16940561
 ] 

Hadoop QA commented on HDFS-14305:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 53s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
25s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 26s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}154m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Bad attempt to compute absolute value of signed random integer in new 
org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager(boolean, 
long, long, String, String, int, int, boolean, boolean)  At 
BlockTokenSecretManager.java:value of signed random integer in new 
org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager(boolean, 
long, long, String, String, int, int, boolean, boolean)  At 
BlockTokenSecretManager.java:[line 153] |
| Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestFailoverWithBlockTokensEnabled |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | HDFS-14305 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981621/HDFS-14305-007.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f0a018cd94ad 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 760b523 |
| maven | 

[jira] [Commented] (HDFS-13736) BlockPlacementPolicyDefault can not choose favored nodes when 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false

2019-09-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940533#comment-16940533
 ] 

Hadoop QA commented on HDFS-13736:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 40s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 13 new + 53 unchanged - 0 fixed = 66 total (was 53) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 25s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
32s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | 
hadoop.hdfs.server.blockmanagement.TestChooseFavoredNodesWithoutPreferLocal |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | HDFS-13736 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981719/HDFS-13736.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f204436d0cbf 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c0edc84 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27982/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27982/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27982/testReport/ |

[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-09-29 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940530#comment-16940530
 ] 

Hudson commented on HDFS-14305:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17414 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17414/])
Revert "HDFS-14305. Fix serial number calculation in (shv: rev 
760b523e58fd1069f0726ae853bed5d44e9d1dc6)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockTokenSecretManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailoverWithBlockTokensEnabled.java


> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14305-007.patch, HDFS-14305.001.patch, 
> HDFS-14305.002.patch, HDFS-14305.003.patch, HDFS-14305.004.patch, 
> HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping 
> serial number ranges. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.
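
To make the overlap concrete, here is a minimal, self-contained sketch of the 
rotation formula quoted above, using 100 as a stand-in for 
{{Integer.MAX_VALUE}} as in the example. The class and method names are 
illustrative only, not the actual {{BlockTokenSecretManager}} code.

{code:java}
// Hypothetical demo of the overlapping serial-number ranges described above.
public class SerialRangeDemo {
  static final int MAX = 100;     // stand-in for Integer.MAX_VALUE
  static final int NUM_NNS = 2;   // two NameNodes configured

  static int rotate(int serialNo, int nnIndex) {
    int intRange = MAX / NUM_NNS;            // 50
    int nnRangeStart = intRange * nnIndex;   // 0 for nn1, 50 for nn2
    return (serialNo % intRange) + nnRangeStart;
  }

  public static void main(String[] args) {
    // Java's % keeps the sign of the dividend, so a random negative
    // initial serialNo pushes the first NameNode's range below zero.
    System.out.println(rotate(-99, 0));  // -49: lower bound for nn1
    System.out.println(rotate( 99, 0));  //  49: upper bound for nn1
    System.out.println(rotate(-99, 1));  //   1: lower bound for nn2
    System.out.println(rotate( 99, 1));  //  99: upper bound for nn2
    // nn1 -> [-49, 49], nn2 -> [1, 99]: both can hand out values in [1, 49].
  }
}
{code}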



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-09-29 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-14305:
---
Status: Patch Available  (was: Reopened)

I just reverted this from trunk.
Will let Jenkins run on v07 patch.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.1.3, 3.2.1
>
> Attachments: HDFS-14305-007.patch, HDFS-14305.001.patch, 
> HDFS-14305.002.patch, HDFS-14305.003.patch, HDFS-14305.004.patch, 
> HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping 
> serial number ranges. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940521#comment-16940521
 ] 

Konstantin Shvachko commented on HDFS-14509:


Hey [~John Smith].
 Agreed we should apply to trunk and backport to 3.2, 3.1, and 2.10. With 2.10 
we are trying to make it a bridge release; it should allow upgrading to 3.x.

Let me clarify about the tests. Suppose that your fix is applied to 3.2 and 
2.10, but not to 2.9.
 # One test should make sure that when we upgrade from 2.10 to 3.2 the 
passwords are verified correctly on DNs running 2.10.
 # Another test should verify that when upgrading from 2.9 (which does not have 
the fix) to 2.10 (which does) the passwords are verified correctly on DNs 
running 2.9.

Hope it makes sense.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then delivers the 
> token to the DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(
>     new ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>       + " doesn't have the correct token password");
> }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields 
> and compute the wrong password.
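
The mismatch described above can be reproduced outside Hadoop: the NN computes 
the password as an HMAC over the full identifier bytes, while an older DN 
parses only the fields it knows and re-serializes them before recomputing. A 
minimal sketch follows, assuming a shared HMAC key and a pipe-separated 
identifier purely for illustration; none of these field names come from the 
Hadoop code.

{code:java}
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class TruncatedIdentifierDemo {
  public static void main(String[] args) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec("shared-block-key".getBytes(), "HmacSHA1"));

    // Password the 3.x NN puts into the token: HMAC over the full identifier.
    byte[] full = "expiry|keyId|user|blockId|storageTypes".getBytes();
    byte[] password = mac.doFinal(full);

    // A 2.x DN re-serializes only the fields it knows, silently dropping
    // the trailing storageTypes field before recomputing the password.
    byte[] truncated = "expiry|keyId|user|blockId".getBytes();
    byte[] recomputed = mac.doFinal(truncated);

    // The two differ, so checkAccess() would throw InvalidToken.
    System.out.println(Arrays.equals(password, recomputed)); // false
  }
}
{code}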



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2203) Race condition in ByteStringHelper.init()

2019-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2203?focusedWorklogId=320279=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320279
 ]

ASF GitHub Bot logged work on HDDS-2203:


Author: ASF GitHub Bot
Created on: 29/Sep/19 19:21
Start Date: 29/Sep/19 19:21
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1544: HDDS-2203 Race 
condition in ByteStringHelper.init()
URL: https://github.com/apache/hadoop/pull/1544#issuecomment-536332649
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 1740 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | -1 | mvninstall | 31 | hadoop-hdds in trunk failed. |
   | -1 | mvninstall | 39 | hadoop-ozone in trunk failed. |
   | -1 | compile | 21 | hadoop-hdds in trunk failed. |
   | -1 | compile | 16 | hadoop-ozone in trunk failed. |
   | -0 | checkstyle | 37 | The patch fails to run checkstyle in hadoop-ozone |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 918 | branch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 21 | hadoop-hdds in trunk failed. |
   | -1 | javadoc | 18 | hadoop-ozone in trunk failed. |
   | 0 | spotbugs | 1016 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | -1 | findbugs | 34 | hadoop-hdds in trunk failed. |
   | -1 | findbugs | 20 | hadoop-ozone in trunk failed. |
   ||| _ Patch Compile Tests _ |
   | -1 | mvninstall | 35 | hadoop-hdds in the patch failed. |
   | -1 | mvninstall | 35 | hadoop-ozone in the patch failed. |
   | -1 | compile | 23 | hadoop-hdds in the patch failed. |
   | -1 | compile | 17 | hadoop-ozone in the patch failed. |
   | -1 | javac | 23 | hadoop-hdds in the patch failed. |
   | -1 | javac | 17 | hadoop-ozone in the patch failed. |
   | -0 | checkstyle | 29 | The patch fails to run checkstyle in hadoop-ozone |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 721 | patch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 21 | hadoop-hdds in the patch failed. |
   | -1 | javadoc | 19 | hadoop-ozone in the patch failed. |
   | -1 | findbugs | 31 | hadoop-hdds in the patch failed. |
   | -1 | findbugs | 21 | hadoop-ozone in the patch failed. |
   ||| _ Other Tests _ |
   | -1 | unit | 28 | hadoop-hdds in the patch failed. |
   | -1 | unit | 25 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 32 | The patch does not generate ASF License warnings. |
   | | | 4090 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.2 Server=19.03.2 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1544/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1544 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux dbe4703bc166 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / c0edc84 |
   | Default Java | 1.8.0_222 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1544/2/artifact/out/branch-mvninstall-hadoop-hdds.txt
 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1544/2/artifact/out/branch-mvninstall-hadoop-ozone.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1544/2/artifact/out/branch-compile-hadoop-hdds.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1544/2/artifact/out/branch-compile-hadoop-ozone.txt
 |
   | checkstyle | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1544/2/artifact/out//home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1544/out/maven-branch-checkstyle-hadoop-ozone.txt
 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1544/2/artifact/out/branch-javadoc-hadoop-hdds.txt
 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1544/2/artifact/out/branch-javadoc-hadoop-ozone.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1544/2/artifact/out/branch-findbugs-hadoop-hdds.txt
 |
   | findbugs | 

[jira] [Commented] (HDDS-2203) Race condition in ByteStringHelper.init()

2019-09-29 Thread Istvan Fajth (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940499#comment-16940499
 ] 

Istvan Fajth commented on HDDS-2203:


I have some concerns about the design of this class, and I would like to have 
some insights from others, especially from [~aengineer] as he reviewed the 
change so far.
What do you think about removing this class and moving its parts back to the 
calling code, to
#1 avoid the synchronization entirely;
#2 eliminate the need for the init call before the conversion methods, which I 
think is an unfortunate design;
#3 move the logic to where it is used? It is used from two places, and each 
place calls a different method.

If you agree, I will create a new JIRA for that, and I am also happy to 
provide a PR with the changes.

> Race condition in ByteStringHelper.init()
> -
>
> Key: HDDS-2203
> URL: https://issues.apache.org/jira/browse/HDDS-2203
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, SCM
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Critical
>  Labels: pull-request-available, pull-requests-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current init method:
> {code}
> public static void init(boolean isUnsafeByteOperation) {
>   final boolean set = INITIALIZED.compareAndSet(false, true);
>   if (set) {
>     ByteStringHelper.isUnsafeByteOperationsEnabled =
>         isUnsafeByteOperation;
>   } else {
>     // already initialized, check values
>     Preconditions.checkState(isUnsafeByteOperationsEnabled
>         == isUnsafeByteOperation);
>   }
> }
> {code}
> In a scenario where two threads access this method and the execution order 
> is the following, the second thread runs into an exception from 
> Preconditions.checkState() in the else branch.
> Starting from an uninitialized state:
> - T1 arrives at the method with true as the parameter; 
> isUnsafeByteOperationsEnabled still holds its default value, false
> - T1 sets INITIALIZED to true
> - T2 arrives at the method with true as the parameter
> - T2 reads the INITIALIZED value and, as it is no longer false, takes the 
> else branch
> - T2 checks whether the internal boolean equals the true it wanted to set; 
> since T1 has not yet stored the value, checkState throws an 
> IllegalStateException.
> This happens in certain Hive query cases (it was found during that testing); 
> the exception we see there is the following:
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Map 2, vertexId=vertex_1569486223160_0334_1_02, 
> diagnostics=[Vertex vertex_1569486223160_0334_1_02 [Map 2] killed/failed
>  due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: item initializer failed, 
> vertex=vertex_1569486223160_0334_1_02 [Map 2], java.io.IOException: Couldn't 
> create RpcClient protocol
> at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:263)
> at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:239)
> at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:203)
> at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:165)
> at 
> org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.<init>(BasicOzoneClientAdapterImpl.java:158)
> at 
> org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.<init>(OzoneClientAdapterImpl.java:50)
> at 
> org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:102)
> at 
> org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:155)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3315)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:136)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3364)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3332)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:491)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1821)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:2002)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:781)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
> at 
> 

[jira] [Work logged] (HDDS-2203) Race condition in ByteStringHelper.init()

2019-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2203?focusedWorklogId=320245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320245
 ]

ASF GitHub Bot logged work on HDDS-2203:


Author: ASF GitHub Bot
Created on: 29/Sep/19 17:39
Start Date: 29/Sep/19 17:39
Worklog Time Spent: 10m 
  Work Description: fapifta commented on issue #1544: HDDS-2203 Race 
condition in ByteStringHelper.init()
URL: https://github.com/apache/hadoop/pull/1544#issuecomment-536323684
 
 
   I have fixed the checkstyle issues and removed volatile from the config 
holder boolean, as it is no longer necessary once init is synchronized.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 320245)
Time Spent: 40m  (was: 0.5h)
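
For reference, here is a minimal sketch of what the synchronized init() 
mentioned in the comment above might look like. It is an illustration of the 
approach, not the actual patch; it also folds the AtomicBoolean into a plain 
field, since the monitor now provides the needed atomicity and visibility.

{code:java}
// Sketch of ByteStringHelper.init() with the whole method synchronized,
// shown as a class fragment. Preconditions is Guava's
// com.google.common.base.Preconditions.
private static boolean initialized = false;
private static boolean isUnsafeByteOperationsEnabled;

public static synchronized void init(boolean isUnsafeByteOperation) {
  if (!initialized) {
    isUnsafeByteOperationsEnabled = isUnsafeByteOperation;
    initialized = true;
  } else {
    // Already initialized: every later caller must agree with the value
    // that won the first initialization.
    Preconditions.checkState(
        isUnsafeByteOperationsEnabled == isUnsafeByteOperation);
  }
}
{code}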

> Race condition in ByteStringHelper.init()
> -
>
> Key: HDDS-2203
> URL: https://issues.apache.org/jira/browse/HDDS-2203
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, SCM
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Critical
>  Labels: pull-request-available, pull-requests-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current init method:
> {code}
> public static void init(boolean isUnsafeByteOperation) {
>   final boolean set = INITIALIZED.compareAndSet(false, true);
>   if (set) {
>     ByteStringHelper.isUnsafeByteOperationsEnabled =
>         isUnsafeByteOperation;
>   } else {
>     // already initialized, check values
>     Preconditions.checkState(isUnsafeByteOperationsEnabled
>         == isUnsafeByteOperation);
>   }
> }
> {code}
> In a scenario where two threads access this method and the execution order 
> is the following, the second thread runs into an exception from 
> Preconditions.checkState() in the else branch.
> Starting from an uninitialized state:
> - T1 arrives at the method with true as the parameter; 
> isUnsafeByteOperationsEnabled still holds its default value, false
> - T1 sets INITIALIZED to true
> - T2 arrives at the method with true as the parameter
> - T2 reads the INITIALIZED value and, as it is no longer false, takes the 
> else branch
> - T2 checks whether the internal boolean equals the true it wanted to set; 
> since T1 has not yet stored the value, checkState throws an 
> IllegalStateException.
> This happens in certain Hive query cases (it was found during that testing); 
> the exception we see there is the following:
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Map 2, vertexId=vertex_1569486223160_0334_1_02, 
> diagnostics=[Vertex vertex_1569486223160_0334_1_02 [Map 2] killed/failed
>  due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: item initializer failed, 
> vertex=vertex_1569486223160_0334_1_02 [Map 2], java.io.IOException: Couldn't 
> create RpcClient protocol
> at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:263)
> at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:239)
> at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:203)
> at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:165)
> at 
> org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.<init>(BasicOzoneClientAdapterImpl.java:158)
> at 
> org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.<init>(OzoneClientAdapterImpl.java:50)
> at 
> org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:102)
> at 
> org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:155)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3315)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:136)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3364)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3332)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:491)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1821)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:2002)
> at 
> 

[jira] [Assigned] (HDDS-1615) ManagedChannel references are being leaked in ReplicationSupervisor.java

2019-09-29 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh reassigned HDDS-1615:
---

Assignee: Mukul Kumar Singh  (was: Hrishikesh Gadre)

> ManagedChannel references are being leaked in ReplicationSupervisor.java
> 
>
> Key: HDDS-1615
> URL: https://issues.apache.org/jira/browse/HDDS-1615
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> ManagedChannel references are being leaked in ReplicationSupervisor.java
> {code}
> May 30, 2019 8:10:56 AM 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
>  cleanQueue
> SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=1495, 
> target=192.168.0.3:49868} was not shutdown properly!!! ~*~*~*
> Make sure to call shutdown()/shutdownNow() and wait until 
> awaitTermination() returns true.
> java.lang.RuntimeException: ManagedChannel allocation site
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:103)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:53)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:44)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411)
> at 
> org.apache.hadoop.ozone.container.replication.GrpcReplicationClient.<init>(GrpcReplicationClient.java:65)
> at 
> org.apache.hadoop.ozone.container.replication.SimpleContainerDownloader.getContainerDataFromReplicas(SimpleContainerDownloader.java:87)
> at 
> org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:118)
> at 
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:115)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
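
As the warning itself suggests, each channel needs an explicit shutdown before 
its last reference is dropped. A hedged sketch of that discipline follows; the 
helper below is illustrative and not the actual GrpcReplicationClient code, 
though the ManagedChannel methods it calls are the ones named in the warning.

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.ratis.thirdparty.io.grpc.ManagedChannel;

// Illustrative close helper: shut the channel down, wait for termination,
// and escalate to shutdownNow() for stragglers.
final class ChannelCloser {
  static void close(ManagedChannel channel) throws InterruptedException {
    channel.shutdown();
    if (!channel.awaitTermination(5, TimeUnit.SECONDS)) {
      channel.shutdownNow();
      channel.awaitTermination(5, TimeUnit.SECONDS);
    }
  }
}
{code}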



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1569) Add ability to SCM for creating multiple pipelines with same datanode

2019-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1569?focusedWorklogId=320242=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320242
 ]

ASF GitHub Bot logged work on HDDS-1569:


Author: ASF GitHub Bot
Created on: 29/Sep/19 17:00
Start Date: 29/Sep/19 17:00
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1431: HDDS-1569 
Support creating multiple pipelines with same datanode
URL: https://github.com/apache/hadoop/pull/1431#issuecomment-536320655
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 36 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 18 new or modified test 
files. |
   ||| _ HDDS-1564 Compile Tests _ |
   | 0 | mvndep | 66 | Maven dependency ordering for branch |
   | -1 | mvninstall | 36 | hadoop-hdds in HDDS-1564 failed. |
   | -1 | mvninstall | 42 | hadoop-ozone in HDDS-1564 failed. |
   | -1 | compile | 17 | hadoop-hdds in HDDS-1564 failed. |
   | -1 | compile | 13 | hadoop-ozone in HDDS-1564 failed. |
   | +1 | checkstyle | 56 | HDDS-1564 passed |
   | +1 | mvnsite | 0 | HDDS-1564 passed |
   | +1 | shadedclient | 927 | branch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 18 | hadoop-hdds in HDDS-1564 failed. |
   | -1 | javadoc | 15 | hadoop-ozone in HDDS-1564 failed. |
   | 0 | spotbugs | 1009 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | -1 | findbugs | 29 | hadoop-hdds in HDDS-1564 failed. |
   | -1 | findbugs | 16 | hadoop-ozone in HDDS-1564 failed. |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 23 | Maven dependency ordering for patch |
   | -1 | mvninstall | 30 | hadoop-hdds in the patch failed. |
   | -1 | mvninstall | 34 | hadoop-ozone in the patch failed. |
   | -1 | compile | 19 | hadoop-hdds in the patch failed. |
   | -1 | compile | 15 | hadoop-ozone in the patch failed. |
   | -1 | javac | 19 | hadoop-hdds in the patch failed. |
   | -1 | javac | 15 | hadoop-ozone in the patch failed. |
   | +1 | checkstyle | 24 | hadoop-hdds: The patch generated 0 new + 0 
unchanged - 3 fixed = 0 total (was 3) |
   | +1 | checkstyle | 27 | The patch passed checkstyle in hadoop-ozone |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 2 | The patch has no ill-formed XML file. |
   | +1 | shadedclient | 776 | patch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 17 | hadoop-hdds in the patch failed. |
   | -1 | javadoc | 15 | hadoop-ozone in the patch failed. |
   | -1 | findbugs | 27 | hadoop-hdds in the patch failed. |
   | -1 | findbugs | 15 | hadoop-ozone in the patch failed. |
   ||| _ Other Tests _ |
   | -1 | unit | 24 | hadoop-hdds in the patch failed. |
   | -1 | unit | 22 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 28 | The patch does not generate ASF License warnings. |
   | | | 2469 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1431/16/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1431 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle xml |
   | uname | Linux 7423369e04b0 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | HDDS-1564 / 7b5a5fe |
   | Default Java | 1.8.0_222 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1431/16/artifact/out/branch-mvninstall-hadoop-hdds.txt
 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1431/16/artifact/out/branch-mvninstall-hadoop-ozone.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1431/16/artifact/out/branch-compile-hadoop-hdds.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1431/16/artifact/out/branch-compile-hadoop-ozone.txt
 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1431/16/artifact/out/branch-javadoc-hadoop-hdds.txt
 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1431/16/artifact/out/branch-javadoc-hadoop-ozone.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1431/16/artifact/out/branch-findbugs-hadoop-hdds.txt
 |
   | findbugs | 

[jira] [Comment Edited] (HDFS-14847) Erasure Coding: Blocks are over-replicated while EC decommissioning

2019-09-29 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940342#comment-16940342
 ] 

Fei Hui edited comment on HDFS-14847 at 9/29/19 10:56 AM:
--

The problem with *findLeavingServiceSources* is that the decommissioning node 
(from which the replica has already been transferred) keeps being selected, 
while the other decommissioning node is never selected (so nothing changes).
For example, the EC storage policy is RS-3-2.
There is a file containing 5 blocks: b0, b1, b2, b3, b4, stored on dn0, dn1, 
dn2, dn3, dn4.
We decommission dn0 and dn1.

# 1st: Create ErasureCodingWork; leavingServiceSources are dn0, dn1; there 
are 2 targets.
Assume that replicating b1 from dn1 failed while b0 from dn0 succeeded.
The file now has 6 replicas: b0, b1, b2, b3, b4, b0, on dn0, dn1, dn2, dn3, 
dn4, dn5.

# 2nd: Create ErasureCodingWork; leavingServiceSources are still dn0, dn1, 
and the target count is 1 (4 live blocks).
{quote}
final int num = Math.min(leavingServiceSources.size(), targets.length);
{quote}
This creates replication work for dn0 (*leavingServiceSources.get(0)*) and 
targets[0].

# 3rd: The file now has 7 replicas: b0, b1, b2, b3, b4, b0, b0, on dn0, dn1, 
dn2, dn3, dn4, dn5, dn6.
Again this creates replication work for dn0 (*leavingServiceSources.get(0)*) 
and targets[0].

# 4th: The file now has 8 replicas: b0, b1, b2, b3, b4, b0, b0, b0, on dn0, 
dn1, dn2, dn3, dn4, dn5, dn6, dn7.

# 5th: ...

The case occurs in our production environment.


was (Author: ferhui):
* The problem with *findLeavingServiceSources* is that the decommissioning 
node (from which the replica has already been transferred) keeps being 
selected, while the other decommissioning node is never selected (so nothing 
changes).
For example, the EC storage policy is RS-3-2.
There is a file containing 5 blocks: b0, b1, b2, b3, b4, stored on dn0, dn1, 
dn2, dn3, dn4.
We decommission dn0 and dn1.
1st: Create ErasureCodingWork; leavingServiceSources are dn0, dn1; there are 
2 targets.
Assume that replicating b1 from dn1 failed while b0 from dn0 succeeded.
The file now has 6 replicas: b0, b1, b2, b3, b4, b0, on dn0, dn1, dn2, dn3, 
dn4, dn5.

2nd: Create ErasureCodingWork; leavingServiceSources are still dn0, dn1, and 
the target count is 1 (4 live blocks).
{quote}
final int num = Math.min(leavingServiceSources.size(), targets.length);
{quote}
This creates replication work for dn0 (*leavingServiceSources.get(0)*) and 
targets[0].
3rd: The file now has 7 replicas: b0, b1, b2, b3, b4, b0, b0, on dn0, dn1, 
dn2, dn3, dn4, dn5, dn6.
Again this creates replication work for dn0 (*leavingServiceSources.get(0)*) 
and targets[0].
4th: The file now has 8 replicas: b0, b1, b2, b3, b4, b0, b0, b0, on dn0, 
dn1, dn2, dn3, dn4, dn5, dn6, dn7.
5th: ...
The case occurs in our production environment.
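
A hedged illustration of the selection flaw described above, with hypothetical 
names rather than the actual BlockManager code: because the loop always pairs 
the first entries of leavingServiceSources with the available targets, dn0's 
already-copied block keeps getting scheduled while dn1's never is.

{code:java}
import java.util.Arrays;
import java.util.List;

public class FavoredSourceDemo {
  public static void main(String[] args) {
    List<String> leavingServiceSources = Arrays.asList("dn0", "dn1");
    String[] targets = {"dn5"};  // only one target available this round

    // Same pairing as the quoted line: min(sources, targets) iterations,
    // always starting from index 0.
    final int num = Math.min(leavingServiceSources.size(), targets.length);
    for (int i = 0; i < num; i++) {
      // Round after round this schedules dn0 -> target, re-replicating b0,
      // while dn1 (whose b1 replication failed) is never retried.
      System.out.println("replicate from " + leavingServiceSources.get(i)
          + " to " + targets[i]);
    }
  }
}
{code}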

> Erasure Coding: Blocks are over-replicated while EC decommissioning
> ---
>
> Key: HDFS-14847
> URL: https://issues.apache.org/jira/browse/HDFS-14847
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Critical
> Attachments: HDFS-14847.001.patch, HDFS-14847.002.patch
>
>
> Found that some blocks are over-replicated during EC decommissioning. 
> Messages in the log are as follows:
> {quote}
> INFO BlockStateChange: Block: blk_-9223372035714984112_363779142, Expected 
> Replicas: 9, live replicas: 8, corrupt replicas: 0, decommissioned replicas: 
> 0, decommissioning replicas: 3, maintenance replicas: 0, live entering 
> maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes 
> having this block: 10.254.41.34:50010 10.254.54.53:50010 10.254.28.53:50010 
> 10.254.56.55:50010 10.254.32.21:50010 10.254.33.19:50010 10.254.63.17:50010 
> 10.254.31.19:50010 10.254.35.29:50010 10.254.51.57:50010 10.254.40.58:50010 
> 10.254.69.31:50010 10.254.47.18:50010 10.254.51.18:50010 10.254.43.57:50010 
> 10.254.50.47:50010 10.254.42.37:50010 10.254.57.29:50010 10.254.67.40:50010 
> 10.254.44.16:50010 10.254.59.38:50010 10.254.53.56:50010 10.254.45.11:50010 
> 10.254.39.22:50010 10.254.30.16:50010 10.254.35.53:50010 10.254.22.30:50010 
> 10.254.26.34:50010 10.254.17.58:50010 10.254.65.53:50010 10.254.60.39:50010 
> 10.254.61.20:50010 10.254.64.23:50010 10.254.21.13:50010 10.254.37.35:50010 
> 10.254.68.30:50010 10.254.62.37:50010 10.254.25.58:50010 10.254.52.54:50010 
> 10.254.58.31:50010 10.254.49.11:50010 10.254.55.52:50010 10.254.19.19:50010 
> 10.254.36.40:50010 10.254.18.30:50010 10.254.20.39:50010 10.254.66.52:50010 
> 10.254.56.32:50010 10.254.24.55:50010 10.254.34.11:50010 10.254.29.58:50010 
> 10.254.27.40:50010 10.254.46.33:50010 10.254.23.19:50010 10.254.74.12:50010 
> 10.254.74.13:50010 

[jira] [Comment Edited] (HDFS-14847) Erasure Coding: Blocks are over-replicated while EC decommissioning

2019-09-29 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940342#comment-16940342
 ] 

Fei Hui edited comment on HDFS-14847 at 9/29/19 10:56 AM:
--

The problem with *findLeavingServiceSources* is that the decommissioning node 
(from which the replica has already been transferred) keeps being selected, 
while the other decommissioning node is never selected (so nothing changes).
For example, the EC storage policy is RS-3-2.
There is a file containing 5 blocks: b0, b1, b2, b3, b4, stored on dn0, dn1, 
dn2, dn3, dn4.
We decommission dn0 and dn1.

* 1st: Create ErasureCodingWork; leavingServiceSources are dn0, dn1; there 
are 2 targets.
Assume that replicating b1 from dn1 failed while b0 from dn0 succeeded.
The file now has 6 replicas: b0, b1, b2, b3, b4, b0, on dn0, dn1, dn2, dn3, 
dn4, dn5.

* 2nd: Create ErasureCodingWork; leavingServiceSources are still dn0, dn1, 
and the target count is 1 (4 live blocks).
{quote}
final int num = Math.min(leavingServiceSources.size(), targets.length);
{quote}
This creates replication work for dn0 (*leavingServiceSources.get(0)*) and 
targets[0].

* 3rd: The file now has 7 replicas: b0, b1, b2, b3, b4, b0, b0, on dn0, dn1, 
dn2, dn3, dn4, dn5, dn6.
Again this creates replication work for dn0 (*leavingServiceSources.get(0)*) 
and targets[0].

* 4th: The file now has 8 replicas: b0, b1, b2, b3, b4, b0, b0, b0, on dn0, 
dn1, dn2, dn3, dn4, dn5, dn6, dn7.

* 5th: ...

The case occurs in our production environment.


was (Author: ferhui):
The problem with *findLeavingServiceSources* is that the decommissioning node 
(from which the replica has already been transferred) keeps being selected, 
while the other decommissioning node is never selected (so nothing changes).
For example, the EC storage policy is RS-3-2.
There is a file containing 5 blocks: b0, b1, b2, b3, b4, stored on dn0, dn1, 
dn2, dn3, dn4.
We decommission dn0 and dn1.

# 1st: Create ErasureCodingWork; leavingServiceSources are dn0, dn1; there 
are 2 targets.
Assume that replicating b1 from dn1 failed while b0 from dn0 succeeded.
The file now has 6 replicas: b0, b1, b2, b3, b4, b0, on dn0, dn1, dn2, dn3, 
dn4, dn5.

# 2nd: Create ErasureCodingWork; leavingServiceSources are still dn0, dn1, 
and the target count is 1 (4 live blocks).
{quote}
final int num = Math.min(leavingServiceSources.size(), targets.length);
{quote}
This creates replication work for dn0 (*leavingServiceSources.get(0)*) and 
targets[0].

# 3rd: The file now has 7 replicas: b0, b1, b2, b3, b4, b0, b0, on dn0, dn1, 
dn2, dn3, dn4, dn5, dn6.
Again this creates replication work for dn0 (*leavingServiceSources.get(0)*) 
and targets[0].

# 4th: The file now has 8 replicas: b0, b1, b2, b3, b4, b0, b0, b0, on dn0, 
dn1, dn2, dn3, dn4, dn5, dn6, dn7.

# 5th: ...

The case occurs in our production environment.

> Erasure Coding: Blocks are over-replicated while EC decommissioning
> ---
>
> Key: HDFS-14847
> URL: https://issues.apache.org/jira/browse/HDFS-14847
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Critical
> Attachments: HDFS-14847.001.patch, HDFS-14847.002.patch
>
>
> Found that some blocks are over-replicated during EC decommissioning. 
> Messages in the log are as follows:
> {quote}
> INFO BlockStateChange: Block: blk_-9223372035714984112_363779142, Expected 
> Replicas: 9, live replicas: 8, corrupt replicas: 0, decommissioned replicas: 
> 0, decommissioning replicas: 3, maintenance replicas: 0, live entering 
> maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes 
> having this block: 10.254.41.34:50010 10.254.54.53:50010 10.254.28.53:50010 
> 10.254.56.55:50010 10.254.32.21:50010 10.254.33.19:50010 10.254.63.17:50010 
> 10.254.31.19:50010 10.254.35.29:50010 10.254.51.57:50010 10.254.40.58:50010 
> 10.254.69.31:50010 10.254.47.18:50010 10.254.51.18:50010 10.254.43.57:50010 
> 10.254.50.47:50010 10.254.42.37:50010 10.254.57.29:50010 10.254.67.40:50010 
> 10.254.44.16:50010 10.254.59.38:50010 10.254.53.56:50010 10.254.45.11:50010 
> 10.254.39.22:50010 10.254.30.16:50010 10.254.35.53:50010 10.254.22.30:50010 
> 10.254.26.34:50010 10.254.17.58:50010 10.254.65.53:50010 10.254.60.39:50010 
> 10.254.61.20:50010 10.254.64.23:50010 10.254.21.13:50010 10.254.37.35:50010 
> 10.254.68.30:50010 10.254.62.37:50010 10.254.25.58:50010 10.254.52.54:50010 
> 10.254.58.31:50010 10.254.49.11:50010 10.254.55.52:50010 10.254.19.19:50010 
> 10.254.36.40:50010 10.254.18.30:50010 10.254.20.39:50010 10.254.66.52:50010 
> 10.254.56.32:50010 10.254.24.55:50010 10.254.34.11:50010 10.254.29.58:50010 
> 10.254.27.40:50010 10.254.46.33:50010 10.254.23.19:50010 10.254.74.12:50010 
> 

[jira] [Commented] (HDFS-14847) Erasure Coding: Blocks are over-replicated while EC decommissioning

2019-09-29 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940342#comment-16940342
 ] 

Fei Hui commented on HDFS-14847:


* The problem with *findLeavingServiceSources* is that the decommissioning 
node (from which the replica has already been transferred) keeps being 
selected, while the other decommissioning node is never selected (so nothing 
changes).
For example, the EC storage policy is RS-3-2.
There is a file containing 5 blocks: b0, b1, b2, b3, b4, stored on dn0, dn1, 
dn2, dn3, dn4.
We decommission dn0 and dn1.
1st: Create ErasureCodingWork; leavingServiceSources are dn0, dn1; there are 
2 targets.
Assume that replicating b1 from dn1 failed while b0 from dn0 succeeded.
The file now has 6 replicas: b0, b1, b2, b3, b4, b0, on dn0, dn1, dn2, dn3, 
dn4, dn5.

2nd: Create ErasureCodingWork; leavingServiceSources are still dn0, dn1, and 
the target count is 1 (4 live blocks).
{quote}
final int num = Math.min(leavingServiceSources.size(), targets.length);
{quote}
This creates replication work for dn0 (*leavingServiceSources.get(0)*) and 
targets[0].
3rd: The file now has 7 replicas: b0, b1, b2, b3, b4, b0, b0, on dn0, dn1, 
dn2, dn3, dn4, dn5, dn6.
Again this creates replication work for dn0 (*leavingServiceSources.get(0)*) 
and targets[0].
4th: The file now has 8 replicas: b0, b1, b2, b3, b4, b0, b0, b0, on dn0, 
dn1, dn2, dn3, dn4, dn5, dn6, dn7.
5th: ...
The case occurs in our production environment.

> Erasure Coding: Blocks are over-replicated while EC decommissioning
> ---
>
> Key: HDFS-14847
> URL: https://issues.apache.org/jira/browse/HDFS-14847
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Critical
> Attachments: HDFS-14847.001.patch, HDFS-14847.002.patch
>
>
> Found that some blocks are over-replicated during EC decommissioning. 
> Messages in the log are as follows:
> {quote}
> INFO BlockStateChange: Block: blk_-9223372035714984112_363779142, Expected 
> Replicas: 9, live replicas: 8, corrupt replicas: 0, decommissioned replicas: 
> 0, decommissioning replicas: 3, maintenance replicas: 0, live entering 
> maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes 
> having this block: 10.254.41.34:50010 10.254.54.53:50010 10.254.28.53:50010 
> 10.254.56.55:50010 10.254.32.21:50010 10.254.33.19:50010 10.254.63.17:50010 
> 10.254.31.19:50010 10.254.35.29:50010 10.254.51.57:50010 10.254.40.58:50010 
> 10.254.69.31:50010 10.254.47.18:50010 10.254.51.18:50010 10.254.43.57:50010 
> 10.254.50.47:50010 10.254.42.37:50010 10.254.57.29:50010 10.254.67.40:50010 
> 10.254.44.16:50010 10.254.59.38:50010 10.254.53.56:50010 10.254.45.11:50010 
> 10.254.39.22:50010 10.254.30.16:50010 10.254.35.53:50010 10.254.22.30:50010 
> 10.254.26.34:50010 10.254.17.58:50010 10.254.65.53:50010 10.254.60.39:50010 
> 10.254.61.20:50010 10.254.64.23:50010 10.254.21.13:50010 10.254.37.35:50010 
> 10.254.68.30:50010 10.254.62.37:50010 10.254.25.58:50010 10.254.52.54:50010 
> 10.254.58.31:50010 10.254.49.11:50010 10.254.55.52:50010 10.254.19.19:50010 
> 10.254.36.40:50010 10.254.18.30:50010 10.254.20.39:50010 10.254.66.52:50010 
> 10.254.56.32:50010 10.254.24.55:50010 10.254.34.11:50010 10.254.29.58:50010 
> 10.254.27.40:50010 10.254.46.33:50010 10.254.23.19:50010 10.254.74.12:50010 
> 10.254.74.13:50010 10.254.41.35:50010 10.254.67.58:50010 10.254.54.11:50010 
> 10.254.68.14:50010 10.254.27.14:50010 10.254.51.29:50010 10.254.45.21:50010 
> 10.254.50.56:50010 10.254.47.31:50010 10.254.40.14:50010 10.254.65.21:50010 
> 10.254.62.22:50010 10.254.57.16:50010 10.254.36.52:50010 10.254.30.13:50010 
> 10.254.35.12:50010 10.254.69.34:50010 10.254.34.58:50010 10.254.17.50:50010 
> 10.254.63.12:50010 10.254.28.21:50010 10.254.58.30:50010 10.254.24.57:50010 
> 10.254.33.50:50010 10.254.44.52:50010 10.254.32.48:50010 10.254.43.39:50010 
> 10.254.20.37:50010 10.254.56.59:50010 10.254.22.33:50010 10.254.60.34:50010 
> 10.254.49.19:50010 10.254.52.21:50010 10.254.23.59:50010 10.254.21.16:50010 
> 10.254.42.55:50010 10.254.29.33:50010 10.254.53.17:50010 10.254.19.14:50010 
> 10.254.64.51:50010 10.254.46.20:50010 10.254.66.22:50010 10.254.18.38:50010 
> 10.254.39.17:50010 10.254.37.57:50010 10.254.31.54:50010 10.254.55.33:50010 
> 10.254.25.17:50010 10.254.61.33:50010 10.254.26.40:50010 10.254.59.23:50010 
> 10.254.59.35:50010 10.254.66.48:50010 10.254.41.15:50010 10.254.54.31:50010 
> 10.254.61.50:50010 10.254.62.31:50010 10.254.17.56:50010 10.254.29.18:50010 
> 10.254.45.16:50010 10.254.63.48:50010 10.254.22.34:50010 10.254.37.51:50010 
> 10.254.65.49:50010 10.254.58.21:50010 10.254.42.12:50010 10.254.55.17:50010 
> 10.254.27.13:50010 10.254.57.17:50010 10.254.67.18:50010 

[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940329#comment-16940329
 ] 

Fei Hui commented on HDFS-14509:


[~John Smith]  Part of approach 1 would be good.
 We would take only the NN changes from approach 1, and with your fix we could 
upgrade to 3.x directly.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then delivers the 
> token to the DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(
>     new ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>       + " doesn't have the correct token password");
> }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields 
> and compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940327#comment-16940327
 ] 

Yuxuan Wang commented on HDFS-14509:


[~ferhui] Oh, it's the other approach that can do that, not my PR.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then delivers the 
> token to the DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(
>     new ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>       + " doesn't have the correct token password");
> }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields 
> and compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940326#comment-16940326
 ] 

Yuxuan Wang commented on HDFS-14509:


{quote}
As for unit tests I think we need two

that verifies the upgrade from 2.x to 3.x is possible.
that verifies the upgrade from 2.x-1 to 2.x is still possible.
{quote}
I don't understand that comment. What is 2.x-1? A version patched with this 
patch?
And isn't the upgrade from 2.x to 3.x currently impossible, rather than 
possible?
I'll improve the UT and fix the checkstyle issues together once I figure out how.
I don't know your GitHub ID; feel free to comment on the PR.
Thanks for your review, [~shv].

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then delivers the 
> token to the DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(
>     new ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>       + " doesn't have the correct token password");
> }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields 
> and compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940325#comment-16940325
 ] 

Fei Hui commented on HDFS-14509:


[~John Smith]
{quote}
NN 3.x does not include storage types into block token until the upgrade is 
finalized.
{quote}
During the upgrade, the NN does not use the fields from HDFS-6708 and 
HDFS-9807. After finalizing, the NN can use them; with this we could upgrade 
to 3.x directly.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then delivers the 
> token to the DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(
>     new ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>       + " doesn't have the correct token password");
> }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields 
> and compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13736) BlockPlacementPolicyDefault can not choose favored nodes when 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false

2019-09-29 Thread hu xiaodong (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940294#comment-16940294
 ] 

hu xiaodong commented on HDFS-13736:


Hello, [~ayushtkn]

   sorry for not responding for a long time.

    I have been a little busy recently.

    I have submitted the second patch, which contains a unit test.

    If we just add a parameter to {{chooseLocalStorage}} to denote it, I think 
lots of places would have to be modified.

> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false
> --
>
> Key: HDFS-13736
> URL: https://issues.apache.org/jira/browse/HDFS-13736
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: hu xiaodong
>Assignee: hu xiaodong
>Priority: Major
> Attachments: HDFS-13736.001.patch, HDFS-13736.002.patch
>
>
> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13736) BlockPlacementPolicyDefault can not choose favored nodes when 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false

2019-09-29 Thread hu xiaodong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hu xiaodong updated HDFS-13736:
---
Attachment: HDFS-13736.002.patch

> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false
> --
>
> Key: HDFS-13736
> URL: https://issues.apache.org/jira/browse/HDFS-13736
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: hu xiaodong
>Assignee: hu xiaodong
>Priority: Major
> Attachments: HDFS-13736.001.patch, HDFS-13736.002.patch
>
>
> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940277#comment-16940277
 ] 

Yuxuan Wang commented on HDFS-14509:


[~John Smith] If we add more fields in the future, we will still need this 
patch, so trunk is the better target.
According to Hadoop's doc, we should upgrade the NN first. At that point the 
block token will carry new fields that DNs which have not yet been upgraded 
can't recognize, so we have to backport this to the 2.x branch and upgrade 
the DNs before upgrading to 3.x.
Or perhaps I missed one of [~shv]'s comments. Can you quote it, [~ferhui]?
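
For what a general solution could look like, here is a hedged, 
self-contained sketch of the forward-compatible direction; I have not 
verified that the attached patch does exactly this. The idea is to verify 
the password over the raw identifier bytes the DN received, rather than over 
a re-serialization that silently drops unknown fields.

{code:java}
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy contrast of the two verification strategies (hypothetical identifier
// layout, not the actual HDFS-14509 diff).
public class RawBytesVerifySketch {
  static byte[] hmac(byte[] data, byte[] key) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(key, "HmacSHA1"));
    return mac.doFinal(data);
  }

  public static void main(String[] args) throws Exception {
    byte[] key = "shared-block-key".getBytes(StandardCharsets.UTF_8);
    // Raw identifier bytes exactly as issued by a newer NN, new field included.
    byte[] rawId = "user=alice,block=42,newField=x".getBytes(StandardCharsets.UTF_8);
    byte[] tokenPassword = hmac(rawId, key);

    // Old behavior: HMAC a lossy re-serialization -> mismatch.
    byte[] reserialized = "user=alice,block=42".getBytes(StandardCharsets.UTF_8);
    System.out.println(Arrays.equals(hmac(reserialized, key), tokenPassword)); // false

    // Forward-compatible behavior: HMAC the raw bytes received -> match,
    // regardless of fields the verifier does not understand.
    System.out.println(Arrays.equals(hmac(rawId, key), tokenPassword));        // true
  }
}
{code}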

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DNs are 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then presents that 
> token to a DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier gains new fields, the DN will drop those fields 
> when it re-serializes the identifier and will compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940272#comment-16940272
 ] 

Fei Hui commented on HDFS-14509:


[~John Smith] If it's a general solution, I think 3.x is the better target 
version, isn't it?
If it's combined with approach #1 that [~shv] described, we could upgrade to 
3.x directly and would not have to go through an intermediate version.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DNs are 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then presents that 
> token to a DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier gains new fields, the DN will drop those fields 
> when it re-serializes the identifier and will compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940266#comment-16940266
 ] 

Yuxuan Wang commented on HDFS-14509:


[~ferhui] If it were just for resolving the 2.x-to-3.x upgrade, we would need 
to land this patch before HDFS-6708 and HDFS-9807.
But I think it's a general solution, so I suggest targeting trunk and 
backporting to the 2.x branch.


> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DNs are 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then presents that 
> token to a DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier gains new fields, the DN will drop those fields 
> when it re-serializes the identifier and will compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940264#comment-16940264
 ] 

Fei Hui commented on HDFS-14509:


I have one question: is the target version 2.x only, or both 2.x and 3.x?

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DNs are 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then presents that 
> token to a DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier gains new fields, the DN will drop those fields 
> when it re-serializes the identifier and will compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-29 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940253#comment-16940253
 ] 

Yuxuan Wang commented on HDFS-14509:


[~brahmareddy] Yes, existing clusters might need to apply this patch before 
upgrading.
I have reopened the PR; once Yetus passes, we can go ahead.

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first, so there will be an intermediate state where 
> the NN is 3.x and the DNs are 2.x. At that moment, if a client reads (or 
> writes) a block, it gets a block token from the NN and then presents that 
> token to a DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier gains new fields, the DN will drop those fields 
> when it re-serializes the identifier and will compute the wrong password.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14752) backport HDFS-13709 to branch-2(Report bad block to NN when transfer block encounter EIO exception)

2019-09-29 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940251#comment-16940251
 ] 

Fei Hui commented on HDFS-14752:


Thanks for fixing HDFS-13709.
Maybe filing a new issue is not necessary: as far as I know, the backport 
patch should be posted on HDFS-13709 itself, with its fix version marked as 
2.x.

> backport HDFS-13709 to branch-2(Report bad block to NN when transfer block 
> encounter EIO exception) 
> 
>
> Key: HDFS-14752
> URL: https://issues.apache.org/jira/browse/HDFS-14752
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14752.branch-2.001.patch, 
> HDFS-14752.branch-2.002.patch
>
>
> backport HDFS-13709 (Report bad block to NN when transfer block encounter EIO 
> exception) to branch-2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org