[jira] [Updated] (HDFS-16063) Add toString to EditLogFileInputStream

2021-06-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-16063:
--
Labels: n00b newbie  (was: )

> Add toString to EditLogFileInputStream
> --
>
> Key: HDFS-16063
> URL: https://issues.apache.org/jira/browse/HDFS-16063
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Minor
>  Labels: n00b, newbie
>
> The class {{EditLogFileInputStream}} is logged at DEBUG level, but has no 
> {{toString}} method, so the logging is of limited value.  Also, put the DEBUG 
> statement behind some guards since it's printing an unbounded list of items.
> https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L895
> The {{toString}} just needs to cover the following fields:
> {code:java}
>   private final LogSource log;
>   private final long firstTxId;
>   private final long lastTxId;
>   private final boolean isInProgress;
>   private int maxOpSize;
>   private State state = State.UNINIT;
>   private int logVersion = 0;
> {code}
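
A minimal sketch of what such a {{toString}} and a guarded DEBUG statement might look like, built from the fields listed above. This is not the actual Hadoop patch: `StreamSketch` is a hypothetical stand-in for the real class, a plain `String` replaces `LogSource`, the `State` values other than `UNINIT` are assumed, and `java.util.logging` is used instead of Hadoop's logging API so the example is self-contained.

```java
import java.util.Collections;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class EditLogToStringSketch {
  private static final Logger LOG =
      Logger.getLogger(EditLogToStringSketch.class.getName());

  // Only UNINIT appears in the issue; the other values are assumptions.
  enum State { UNINIT, OPEN, CLOSED }

  /** Hypothetical stand-in for EditLogFileInputStream. */
  static final class StreamSketch {
    private final String log; // stands in for the LogSource field
    private final long firstTxId;
    private final long lastTxId;
    private final boolean isInProgress;
    private final int maxOpSize;
    private final State state = State.UNINIT;
    private final int logVersion = 0;

    StreamSketch(String log, long firstTxId, long lastTxId,
        boolean isInProgress, int maxOpSize) {
      this.log = log;
      this.firstTxId = firstTxId;
      this.lastTxId = lastTxId;
      this.isInProgress = isInProgress;
      this.maxOpSize = maxOpSize;
    }

    @Override
    public String toString() {
      return "StreamSketch [log=" + log
          + ", firstTxId=" + firstTxId
          + ", lastTxId=" + lastTxId
          + ", isInProgress=" + isInProgress
          + ", maxOpSize=" + maxOpSize
          + ", state=" + state
          + ", logVersion=" + logVersion + "]";
    }
  }

  /** Formats the (possibly very long) list only when the level is enabled. */
  static void logStreams(List<StreamSketch> streams) {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Edit log streams: " + streams);
    }
  }

  public static void main(String[] args) {
    StreamSketch s = new StreamSketch("edits_1-10", 1L, 10L, false, 1024);
    System.out.println(s);
    logStreams(Collections.singletonList(s));
  }
}
```

The guard matters here because the string concatenation would otherwise walk the whole list even when DEBUG output is discarded.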



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16063) Add toString to EditLogFileInputStream

2021-06-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-16063:
--
Description: 
The class {{EditLogFileInputStream}} is logged at DEBUG level, but has no 
{{toString}} method, so the logging is of limited value.  Also, put the DEBUG 
statement behind some guards since it's printing an unbounded list of items.

https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L895

The {{toString}} just needs to cover the following fields:

{code:java}
  private final LogSource log;
  private final long firstTxId;
  private final long lastTxId;
  private final boolean isInProgress;
  private int maxOpSize;
  private State state = State.UNINIT;
  private int logVersion = 0;
{code}

  was:
The class {{EditLogFileInputStream}} is logged at DEBUG level, but has no 
{{toString}} method, so the logging is of limited value.  Also, put the DEBUG 
statement behind some guards since it's printing an unbounded list of items.

https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L895


> Add toString to EditLogFileInputStream
> --
>
> Key: HDFS-16063
> URL: https://issues.apache.org/jira/browse/HDFS-16063
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Minor
>
> The class {{EditLogFileInputStream}} is logged at DEBUG level, but has no 
> {{toString}} method, so the logging is of limited value.  Also, put the DEBUG 
> statement behind some guards since it's printing an unbounded list of items.
> https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L895
> The {{toString}} just needs to cover the following fields:
> {code:java}
>   private final LogSource log;
>   private final long firstTxId;
>   private final long lastTxId;
>   private final boolean isInProgress;
>   private int maxOpSize;
>   private State state = State.UNINIT;
>   private int logVersion = 0;
> {code}






[jira] [Created] (HDFS-16063) Add toString to EditLogFileInputStream

2021-06-10 Thread David Mollitor (Jira)
David Mollitor created HDFS-16063:
-

 Summary: Add toString to EditLogFileInputStream
 Key: HDFS-16063
 URL: https://issues.apache.org/jira/browse/HDFS-16063
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: David Mollitor


The class {{EditLogFileInputStream}} is logged at DEBUG level, but has no 
{{toString}} method, so the logging is of limited value.  Also, put the DEBUG 
statement behind some guards since it's printing an unbounded list of items.

https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L895






[jira] [Commented] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2021-02-04 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278864#comment-17278864
 ] 

David Mollitor commented on HDFS-15790:
---

OK.  This looks fine to me.

As I said, in my original issue, both engines were loaded into the same JVM and 
they would fight at the point of registration.  It looks like things are now 
set up so that they both register in the same static way and don't explode 
when they both register.  Thanks.

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some things in the Apache Hive 
> project.  This was not an awesome thing to do between minor versions with 
> regard to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements; they have 
> some differences.  Also, Protobuf 2 is not deprecated, so let us have both 
> protocols available at the same time.  In Hadoop 4.x, Protobuf 2 support 
> can be dropped.






[jira] [Commented] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2021-01-28 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273800#comment-17273800
 ] 

David Mollitor commented on HDFS-15790:
---

There also needs to be some doc somewhere on how to allow third parties to 
leverage this functionality (if it's meant as a public vehicle).  Like I said, 
I haven't figured out where the import substitution happens in the build 
process.

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some things in the Apache Hive 
> project.  This was not an awesome thing to do between minor versions with 
> regard to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements; they have 
> some differences.  Also, Protobuf 2 is not deprecated, so let us have both 
> protocols available at the same time.  In Hadoop 4.x, Protobuf 2 support 
> can be dropped.






[jira] [Commented] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2021-01-28 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273799#comment-17273799
 ] 

David Mollitor commented on HDFS-15790:
---

Thanks [~vinayakumarb].  I don't mind adding this new capability, but it broke 
backwards compatibility of a public class.  Thanks for taking a look.  I hope 
this can be considered an add-on and not a replacement.

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some things in the Apache Hive 
> project.  This was not an awesome thing to do between minor versions with 
> regard to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements; they have 
> some differences.  Also, Protobuf 2 is not deprecated, so let us have both 
> protocols available at the same time.  In Hadoop 4.x, Protobuf 2 support 
> can be dropped.






[jira] [Commented] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2021-01-25 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271694#comment-17271694
 ] 

David Mollitor commented on HDFS-15790:
---

Also, to use this new engine, there is some wizard magic required to make the 
protobuf compiler import core Protobuf functionality from 
{{org.apache.hadoop.thirdparty.protobuf.*;}} instead of the core protobuf JARs, 
but I haven't been able to find any documentation on how to pull this off.

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some things in the Apache Hive 
> project.  This was not an awesome thing to do between minor versions with 
> regard to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements; they have 
> some differences.  Also, Protobuf 2 is not deprecated, so let us have both 
> protocols available at the same time.  In Hadoop 4.x, Protobuf 2 support 
> can be dropped.






[jira] [Moved] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2021-01-25 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor moved HADOOP-17494 to HDFS-15790:


Key: HDFS-15790  (was: HADOOP-17494)
Project: Hadoop HDFS  (was: Hadoop Common)

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> Changing from Protobuf 2 to Protobuf 3 broke some things in the Apache Hive 
> project.  This was not an awesome thing to do between minor versions with 
> regard to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements; they have 
> some differences.  Also, Protobuf 2 is not deprecated, so let us have both 
> protocols available at the same time.  In Hadoop 4.x, Protobuf 2 support 
> can be dropped.






[jira] [Commented] (HDFS-15621) Datanode DirectoryScanner uses excessive memory

2020-10-09 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211069#comment-17211069
 ] 

David Mollitor commented on HDFS-15621:
---

Another option would be to multi-thread the operation and use a blocking queue 
to regulate memory consumption.

Multiple threads scan directories and pump results into a queue.  One or more 
threads process the data in the queue.  If the queue is full, the scanners 
block.  In this way, the number of objects that exist at one time is 
controlled.
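
A minimal sketch of that idea, assuming plain strings in place of the real scan results. `BoundedScanSketch`, the `DONE` sentinel, and the item names are hypothetical and do not come from the DirectoryScanner code; the point is only that `put` blocks once the bounded queue is full, so at most `queueCapacity` results are waiting at any time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedScanSketch {
  // Sentinel each scanner enqueues when it has no more results.
  private static final String DONE = "\u0000DONE";

  /** Runs several producers and one consumer; returns the number of items processed. */
  static int scan(int scanners, int itemsPerScanner, int queueCapacity) {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
    ExecutorService pool = Executors.newFixedThreadPool(scanners);
    for (int s = 0; s < scanners; s++) {
      final int id = s;
      pool.execute(() -> {
        try {
          for (int i = 0; i < itemsPerScanner; i++) {
            // put() blocks while the queue is full, throttling the scanner.
            queue.put("scanner" + id + "/blk_" + i);
          }
          queue.put(DONE);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    int processed = 0;
    int finished = 0;
    try {
      // The calling thread acts as the single consumer.
      while (finished < scanners) {
        String item = queue.take();
        if (DONE.equals(item)) {
          finished++;
        } else {
          processed++;
        }
      }
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while consuming", e);
    }
    return processed;
  }

  public static void main(String[] args) {
    System.out.println(scan(4, 1000, 16));
  }
}
```

The queue capacity is the tuning knob: a smaller value caps memory more tightly at the cost of scanners spending more time blocked.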

> Datanode DirectoryScanner uses excessive memory
> ---
>
> Key: HDFS-15621
> URL: https://issues.apache.org/jira/browse/HDFS-15621
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Screenshot 2020-10-09 at 14.11.36.png, Screenshot 
> 2020-10-09 at 15.20.56.png
>
>
> We generally work to a rule of 1GB heap on a datanode per 1M blocks. For nodes 
> with a lot of blocks, this can mean a lot of heap.
> We recently captured a heapdump of a DN with about 22M blocks and found only 
> about 1.5GB was occupied by the ReplicaMap. Another 9GB of the heap is taken 
> by the DirectoryScanner ScanInfo objects. Most of this memory was allocated to 
> strings.
> Checking the strings in question, we can see two strings per scanInfo, 
> looking like:
> {code}
> /current/BP-671271071-10.163.205.13-1552020401842/current/finalized/subdir28/subdir17/blk_1180438785
> _106716708.meta
> {code}
> I will upload a screenshot from MAT showing this.
> For the first string especially, the part 
> "/current/BP-671271071-10.163.205.13-1552020401842/current/finalized/" will 
> be the same for every block in the block pool as the scanner is only 
> concerned about finalized blocks.
> We can probably also store just the subdir indexes "28" and "17" rather than 
> "subdir28/subdir17" and then construct the path when it is requested via the 
> getter.
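
The proposal can be sketched roughly as follows. `CompactScanInfo` and its field names are hypothetical and are not the actual `ScanInfo` layout; the indexes 28 and 17 and the block ID are taken from the example path in the description. The shared prefix is held as one reference, and the full path string exists only while a caller is using it.

```java
public class CompactScanInfoSketch {
  /** Hypothetical compact record: prefix shared by reference, subdirs kept as ints. */
  static final class CompactScanInfo {
    private final String finalizedDir; // one String instance shared by all blocks
    private final int subdir1;
    private final int subdir2;
    private final long blockId;

    CompactScanInfo(String finalizedDir, int subdir1, int subdir2, long blockId) {
      this.finalizedDir = finalizedDir;
      this.subdir1 = subdir1;
      this.subdir2 = subdir2;
      this.blockId = blockId;
    }

    /** Rebuilds the full block path on demand instead of storing it per block. */
    String getBlockPath() {
      return finalizedDir + "/subdir" + subdir1 + "/subdir" + subdir2
          + "/blk_" + blockId;
    }
  }

  public static void main(String[] args) {
    String prefix =
        "/current/BP-671271071-10.163.205.13-1552020401842/current/finalized";
    CompactScanInfo info = new CompactScanInfo(prefix, 28, 17, 1180438785L);
    System.out.println(info.getBlockPath());
  }
}
```

The trade-off is a small amount of string building on each getter call in exchange for not retaining a long path string per block.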






[jira] [Created] (HDFS-15393) Review of PendingReconstructionBlocks

2020-06-05 Thread David Mollitor (Jira)
David Mollitor created HDFS-15393:
-

 Summary: Review of PendingReconstructionBlocks
 Key: HDFS-15393
 URL: https://issues.apache.org/jira/browse/HDFS-15393
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


I started looking at this class based on [HDFS-15351].

* Uses {{java.sql.Time}} unnecessarily.  Confusing since Java ships with time 
formatters out of the box in JDK 8.  I believe this will cause issues later 
when trying to upgrade to JDK 9+ since SQL is a different module in Java.
* Remove code where appropriate
* Use Java Concurrent library for higher concurrent access to underlying map
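
As a rough illustration of the first and last bullets, the sketch below formats a timestamp with `java.time` instead of `java.sql.Time` and keeps entries in a `ConcurrentHashMap`. The class name, the `touch` helper, and the map contents are hypothetical and are not taken from `PendingReconstructionBlocks`; the zone is pinned to UTC only to make the output deterministic.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PendingTimestampSketch {
  // Immutable and thread-safe, so a single shared instance is enough.
  private static final DateTimeFormatter TIME =
      DateTimeFormatter.ofPattern("HH:mm:ss").withZone(ZoneOffset.UTC);

  /** Formats epoch milliseconds as a time of day without touching java.sql. */
  static String formatMillis(long epochMillis) {
    return TIME.format(Instant.ofEpochMilli(epochMillis));
  }

  // Hypothetical pending map: block name -> latest timestamp seen.
  private static final ConcurrentMap<String, Long> PENDING =
      new ConcurrentHashMap<>();

  /** Atomically records the newest timestamp for a block without external locking. */
  static long touch(String block, long now) {
    return PENDING.merge(block, now, Long::max);
  }

  public static void main(String[] args) {
    touch("blk_1", 3_661_000L);
    System.out.println(formatMillis(PENDING.get("blk_1")));
  }
}
```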






[jira] [Commented] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate

2020-06-05 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126748#comment-17126748
 ] 

David Mollitor commented on HDFS-15351:
---

Thanks for pinging me a few times, [~hemanthboyina]. I have been a bit all over 
the place, so thanks for your persistence and patience.

Probably should be using {{Collection}} classes instead of native arrays, but 
that's not for this ticket.
{code:java}
// Drop the block from the pending queue and release its scheduled slots.
PendingBlockInfo remove = pendingReconstruction.remove(lastBlock);
if (remove != null) {
  List<DatanodeStorageInfo> locations = remove.getTargets();
  DatanodeStorageInfo.decrementBlocksScheduled(
      locations.toArray(new DatanodeStorageInfo[0]));
}
{code}

> Blocks Scheduled Count was wrong on Truncate 
> -
>
> Key: HDFS-15351
> URL: https://issues.apache.org/jira/browse/HDFS-15351
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch, 
> HDFS-15351.003.patch
>
>
> On truncate and append, we remove the blocks from the Reconstruction Queue. 
> On removing the blocks from pending reconstruction, we need to decrement 
> Blocks Scheduled.






[jira] [Commented] (HDFS-14452) Make Op#valueOf() Public

2020-05-14 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107334#comment-17107334
 ] 

David Mollitor commented on HDFS-14452:
---

Hello Team,

 

Any more thoughts on this?  How do we move this forward?

> Make Op#valueOf() Public
> 
>
> Key: HDFS-14452
> URL: https://issues.apache.org/jira/browse/HDFS-14452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: hemanthboyina
>Priority: Minor
>  Labels: noob
> Attachments: HDFS-14452.patch
>
>
> Change signature of {{private static Op valueOf(byte code)}} to be public.  
> Right now, the only easy way to look up in Op is to pass in a {{DataInput}} 
> object, which is not all that flexible and efficient for other custom 
> implementations that want to store the Op code a different way.
> https://github.com/apache/hadoop/blob/8c95cb9d6bef369fef6a8364f0c0764eba90e44a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Op.java#L53
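
To illustrate what the signature change would enable, here is a small sketch with a hypothetical enum modelled on the description. It is not a copy of the real `Op` class: the two constants and their codes are illustrative, and the linear lookup is just the simplest way to show the idea. With a public `valueOf(byte)`, a caller that already holds the raw code no longer needs to wrap it in a `DataInput`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

public class OpLookupSketch {
  /** Hypothetical enum modelled on the description; constants and codes are illustrative. */
  enum Op {
    WRITE_BLOCK((byte) 80),
    READ_BLOCK((byte) 81);

    final byte code;

    Op(byte code) {
      this.code = code;
    }

    /** Public lookup by raw code; returns null when the code is unknown. */
    public static Op valueOf(byte code) {
      for (Op op : values()) {
        if (op.code == code) {
          return op;
        }
      }
      return null;
    }

    /** Stream-based entry point: read one byte, then delegate to valueOf. */
    public static Op read(DataInput in) throws IOException {
      return valueOf(in.readByte());
    }
  }

  public static void main(String[] args) throws IOException {
    // Direct lookup from a code stored some other way.
    System.out.println(Op.valueOf((byte) 81));
    // Lookup through a DataInput, as callers must do when valueOf is private.
    DataInput in = new DataInputStream(new ByteArrayInputStream(new byte[] {80}));
    System.out.println(Op.read(in));
  }
}
```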






[jira] [Commented] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-22 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021120#comment-17021120
 ] 

David Mollitor commented on HDFS-15115:
---

Thanks for looping me in.

 

I've never liked this setup with the {{StringBuilder}} from the start.  It's 
just not the right way to do DEBUG logging.  All the logging should be 
generated in one block and not concatenated piecemeal.  However, I submitted a 
patch (slightly updated from v1) so that the {{builder}} is always populated 
and will therefore not throw NPE.

 

However, please note that if the DEBUG logging is enabled sometime during 
execution, the first log message may be only partial... that is, the first few 
concatenations happen while DEBUG is disabled, and the last few happen while 
DEBUG is enabled, and then the {{StringBuilder}} is sent to the logging 
framework for output.
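
The failure mode and the partial-message caveat can be reproduced with a small sketch. Everything here is hypothetical: `DEBUG` stands in for a logger level that can change at runtime, `midway` simulates the level being flipped during the call, and neither method is taken from `BlockPlacementPolicyDefault` or from the attached patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class DebugBuilderSketch {
  // Stands in for a logger level that an operator can change at runtime.
  static final AtomicBoolean DEBUG = new AtomicBoolean(false);

  /** Fragile pattern: the builder exists only if DEBUG was on at entry. */
  static String fragile(Runnable midway) {
    StringBuilder builder = DEBUG.get() ? new StringBuilder() : null;
    if (DEBUG.get()) {
      builder.append("start;");
    }
    midway.run(); // the level may change here
    if (DEBUG.get()) {
      builder.append("end;"); // NullPointerException if DEBUG was enabled mid-call
    }
    return builder == null ? "" : builder.toString();
  }

  /** Always-populated builder: no NPE, but the message can be partial. */
  static String robust(Runnable midway) {
    StringBuilder builder = new StringBuilder();
    if (DEBUG.get()) {
      builder.append("start;");
    }
    midway.run();
    if (DEBUG.get()) {
      builder.append("end;");
    }
    return builder.toString();
  }

  public static void main(String[] args) {
    DEBUG.set(false);
    // Only the second half is recorded because DEBUG was off at the start.
    System.out.println(robust(() -> DEBUG.set(true)));
  }
}
```

Building the whole message in one guarded block, as suggested above, would avoid both the NPE and the partial output.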

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-15115.001.patch, HDFS-15115.2.patch
>
>
> To get debug info, we dynamically change the logger of 
> BlockPlacementPolicyDefault to debug while the namenode is running. However, 
> the Namenode crashes. From the log, we find some NPEs in 
> BlockPlacementPolicyDefault.chooseRandom, because *StringBuilder builder* 
> is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, 
> while the *builder* is only initialized in the first part of this method. If 
> we change the logger of BlockPlacementPolicyDefault to debug after that part, 
> the *builder* in the remaining part is *NULL* and causes an *NPE*.






[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-22 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-15115:
--
Status: Patch Available  (was: Open)

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-15115.001.patch, HDFS-15115.2.patch
>
>
> To get debug info, we dynamically change the logger of 
> BlockPlacementPolicyDefault to debug while the namenode is running. However, 
> the Namenode crashes. From the log, we find some NPEs in 
> BlockPlacementPolicyDefault.chooseRandom, because *StringBuilder builder* 
> is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, 
> while the *builder* is only initialized in the first part of this method. If 
> we change the logger of BlockPlacementPolicyDefault to debug after that part, 
> the *builder* in the remaining part is *NULL* and causes an *NPE*.






[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-22 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-15115:
--
Attachment: (was: HDFS-15115.1.patch)

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-15115.001.patch, HDFS-15115.2.patch
>
>
> To get debug info, we dynamically change the logger of 
> BlockPlacementPolicyDefault to debug while the namenode is running. However, 
> the Namenode crashes. From the log, we find some NPEs in 
> BlockPlacementPolicyDefault.chooseRandom, because *StringBuilder builder* 
> is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, 
> while the *builder* is only initialized in the first part of this method. If 
> we change the logger of BlockPlacementPolicyDefault to debug after that part, 
> the *builder* in the remaining part is *NULL* and causes an *NPE*.






[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-22 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-15115:
--
Attachment: HDFS-15115.2.patch

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-15115.001.patch, HDFS-15115.2.patch
>
>
> To get debug info, we dynamically change the logger of 
> BlockPlacementPolicyDefault to debug while the namenode is running. However, 
> the Namenode crashes. From the log, we find some NPEs in 
> BlockPlacementPolicyDefault.chooseRandom, because *StringBuilder builder* 
> is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, 
> while the *builder* is only initialized in the first part of this method. If 
> we change the logger of BlockPlacementPolicyDefault to debug after that part, 
> the *builder* in the remaining part is *NULL* and causes an *NPE*.






[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-22 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-15115:
--
Attachment: HDFS-15115.1.patch

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-15115.001.patch, HDFS-15115.2.patch
>
>
> To get debug info, we dynamically change the logger of 
> BlockPlacementPolicyDefault to debug while the namenode is running. However, 
> the Namenode crashes. From the log, we find some NPEs in 
> BlockPlacementPolicyDefault.chooseRandom, because *StringBuilder builder* 
> is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, 
> while the *builder* is only initialized in the first part of this method. If 
> we change the logger of BlockPlacementPolicyDefault to debug after that part, 
> the *builder* in the remaining part is *NULL* and causes an *NPE*.






[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-22 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-15115:
--
Attachment: (was: HDFS-14103.1.patch)

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-15115.001.patch, HDFS-15115.2.patch
>
>
> To get debug info, we dynamically change the logger of 
> BlockPlacementPolicyDefault to debug while the namenode is running. However, 
> the Namenode crashes. From the log, we find some NPEs in 
> BlockPlacementPolicyDefault.chooseRandom, because *StringBuilder builder* 
> is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, 
> while the *builder* is only initialized in the first part of this method. If 
> we change the logger of BlockPlacementPolicyDefault to debug after that part, 
> the *builder* in the remaining part is *NULL* and causes an *NPE*.






[jira] [Assigned] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-22 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HDFS-15115:
-

Assignee: David Mollitor

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14103.1.patch, HDFS-15115.001.patch
>
>
> To get debug info, we dynamically change the logger of 
> BlockPlacementPolicyDefault to debug while the namenode is running. However, 
> the Namenode crashes. From the log, we find some NPEs in 
> BlockPlacementPolicyDefault.chooseRandom, because *StringBuilder builder* 
> is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, 
> while the *builder* is only initialized in the first part of this method. If 
> we change the logger of BlockPlacementPolicyDefault to debug after that part, 
> the *builder* in the remaining part is *NULL* and causes an *NPE*.






[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-22 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-15115:
--
Attachment: HDFS-14103.1.patch

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14103.1.patch, HDFS-15115.001.patch
>
>
> To get debug info, we dynamically change the logger of 
> BlockPlacementPolicyDefault to debug while the namenode is running. However, 
> the Namenode crashes. From the log, we find some NPEs in 
> BlockPlacementPolicyDefault.chooseRandom, because *StringBuilder builder* 
> is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, 
> while the *builder* is only initialized in the first part of this method. If 
> we change the logger of BlockPlacementPolicyDefault to debug after that part, 
> the *builder* in the remaining part is *NULL* and causes an *NPE*.






[jira] [Commented] (HDFS-14902) RBF: NullPointer When Misconfigured

2019-11-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975302#comment-16975302
 ] 

David Mollitor commented on HDFS-14902:
---

I don't have a great way of checking this.  I was previously using the 
prepackaged Hadoop binaries with the default configuration.  I agree that it 
should not even start.

> RBF: NullPointer When Misconfigured
> ---
>
> Key: HDFS-14902
> URL: https://issues.apache.org/jira/browse/HDFS-14902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: Takanobu Asanuma
>Priority: Minor
> Attachments: HDFS-14902.001.patch, HDFS-14902.002.patch
>
>
> Admittedly the server was misconfigured, but the failure should be handled 
> more gracefully than this.
> {code:none}
> 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled exception updating NN registration for null:null
> java.lang.NullPointerException
>   at org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831)
>   at org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119)
>   at org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108)
>   at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>   at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
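
The trace suggests that a null address reached a protobuf setter, which rejects null values. One way to make this kind of misconfiguration fail early with a readable message is sketched below. This is only an illustration of the fail-fast idea, not the attached patch: {{FailFastConfigSketch}}, {{NamenodeAddresses}}, and the message texts are hypothetical.

```java
import java.util.Objects;

final class FailFastConfigSketch {
  /** Hypothetical holder for the addresses a router needs for one namenode. */
  static final class NamenodeAddresses {
    final String rpcAddress;
    final String serviceAddress;

    NamenodeAddresses(String rpcAddress, String serviceAddress) {
      // Reject missing values up front with a message that names the field,
      // instead of letting a null travel on to a later setter call.
      this.rpcAddress = Objects.requireNonNull(rpcAddress,
          "RPC address is not configured");
      this.serviceAddress = Objects.requireNonNull(serviceAddress,
          "service address is not configured");
    }
  }
}
```

With a check like this at construction time, the error points at the missing setting rather than at generated protobuf code.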






[jira] [Commented] (HDFS-14872) Read HDFS Blocks in Random Order

2019-10-22 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957343#comment-16957343
 ] 

David Mollitor commented on HDFS-14872:
---

It might be possible to build a new copy routine at a higher level on top of 
the existing HDFS FileSystem API.  I need to check:

https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileUtil.html#copy-java.io.File-org.apache.hadoop.fs.FileSystem-org.apache.hadoop.fs.Path-boolean-org.apache.hadoop.conf.Configuration-

> Read HDFS Blocks in Random Order
> 
>
> Key: HDFS-14872
> URL: https://issues.apache.org/jira/browse/HDFS-14872
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 2.8.5, 3.2.1
>Reporter: David Mollitor
>Priority: Major
>
> When the HDFS client is downloading (copying) an entire file, allow the 
> client to download the blocks in random order.  If a lot of clients are 
> reading the same file in parallel, they will all download the first block, 
> then the second block, and so on, stampeding down the line.
> It would be interesting to spread the load across all the available 
> DataNodes.
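
The idea in the description can be sketched without any HDFS classes: compute the start offset of every block, shuffle the offsets, and then read the blocks in that order. The sketch below only covers the ordering step. {{RandomBlockOrderSketch}} and {{shuffledBlockOffsets}} are hypothetical names, and how each block would actually be fetched is left open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class RandomBlockOrderSketch {
  /** Returns the start offset of every block of the file, in shuffled order. */
  static List<Long> shuffledBlockOffsets(long fileLength, long blockSize, Random rng) {
    if (fileLength < 0 || blockSize <= 0) {
      throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
    }
    List<Long> offsets = new ArrayList<>();
    for (long offset = 0; offset < fileLength; offset += blockSize) {
      offsets.add(offset);
    }
    // Each client shuffles independently, so concurrent readers of the same
    // file are unlikely to start on the same block.
    Collections.shuffle(offsets, rng);
    return offsets;
  }
}
```

A copy routine could then read each block with a positional read at its offset and write it to the same offset in the destination, so the final file does not depend on the order in which the blocks arrived.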






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-16 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952812#comment-16952812
 ] 

David Mollitor commented on HDFS-14854:
---

[~sodonnell] Thanks.  Looks good!

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch, HDFS-14854.009.patch, HDFS-14854.010.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources.
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952183#comment-16952183
 ] 

David Mollitor commented on HDFS-14854:
---

To restate my earlier point, now that I've dug into it a bit more: we should 
look at revamping the 
{{org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks}} class as 
part of this effort.

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch, HDFS-14854.009.patch






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952179#comment-16952179
 ] 

David Mollitor commented on HDFS-14854:
---

# Acquire the lock before the {{try}} block, not inside it: https://stackoverflow.com/questions/10868423/lock-lock-before-try
# Please grab the lock for {{dn.getStorageInfos()}} in its own block.  It is 
easier to reason about.
# Using a 'null' value in this way overloads the use of the {{Map}} class, 
and the comments do not clearly explain how this works.  I think it 
would be much cleaner to have {{processPendingNodes()}} return a list of nodes 
that need to be processed instead of populating the {{Map}} in this way.

{code:java}
  List pendingNodes;
  try {
    ...
    processCancelledNodes();
    pendingNodes = processPendingNodes();
  } finally {
    namesystem.writeUnlock();
  }
  ...
  check(pendingNodes);
{code}

4. 

bq. For nodes to be added to pendingNodes, that is always done under the 
namenode writeLock

Please state that as a requirement in the JavaDoc for the {{startTrackingNode}} 
method.

5.  I worry about the needless locking because this is a very hot lock that is 
used all over the place.  The time per iteration is configurable: the default 
is 30 seconds, but a user may lower it to 1 second, and nothing tells them 
that doing so increases contention on the lock even when there is nothing to 
replicate.
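
Points 1 and 3 can be sketched together in plain Java. This is a minimal illustration under stated assumptions, not code from the patch: {{LockBeforeTrySketch}}, {{addPending}}, {{runOnce}}, and the use of a {{ReentrantReadWriteLock}} with {{String}} node names are all hypothetical stand-ins for the namesystem lock and the real node type.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockBeforeTrySketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Not thread-safe on its own; every access happens under the write lock.
  private final Queue<String> pendingNodes = new ArrayDeque<>();

  void addPending(String node) {
    lock.writeLock().lock();
    try {
      pendingNodes.add(node);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Caller must hold the write lock. Returns the nodes to process next. */
  private List<String> processPendingNodes(int max) {
    List<String> batch = new ArrayList<>();
    while (batch.size() < max && !pendingNodes.isEmpty()) {
      batch.add(pendingNodes.poll());
    }
    return batch;
  }

  List<String> runOnce(int max) {
    List<String> batch;
    // Acquire before 'try': if lock() itself fails, 'finally' must not unlock.
    lock.writeLock().lock();
    try {
      batch = processPendingNodes(max);
    } finally {
      lock.writeLock().unlock();
    }
    // The returned batch can now be processed without holding the lock.
    return batch;
  }

  boolean isWriteLocked() {
    return lock.isWriteLocked();
  }
}
```

Returning the batch keeps the hand-off explicit: the caller sees exactly which nodes were selected, and no sentinel values have to be stored in a shared map.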


> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch, HDFS-14854.009.patch






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952069#comment-16952069
 ] 

David Mollitor commented on HDFS-14854:
---

[~sodonnell] [~elgoiri]  I provided some feedback for you to review regarding 
this specific patch.

However, I would like to draw your attention to something I was saying before...

I think it would be cool if we could also include the 
{{BlockManager#neededReconstruction}} in improving decommissioning.  There is a 
bunch of polling going on in this class, checking sizes and statuses.  I think 
some of that could be removed by making the 
{{BlockManager#neededReconstruction}} Collection a synchronized priority 
queue; perhaps it should just be its own priority queue-backed 
{{ExecutorService}}.  This would help in that requests from dead nodes would be 
prioritized ahead of requests for decommissioning.  You could probably also 
make it a {{BlockingQueue}} with a fixed size so that threads block if the 
queue gets too large.  In this way, there doesn't need to be batching.  Just 
figure out the next block to replicate, give up the global lock, try to add it 
to the {{neededReconstruction}} queue, and once complete, go find the next 
block to replicate.  Something like that.
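
A rough sketch of the bounded, prioritized queue idea follows. The JDK's {{PriorityBlockingQueue}} is unbounded, so this sketch pairs it with a {{Semaphore}} to get the fixed size; that pairing is an assumption of the sketch, not something from the patch. All names ({{BoundedPriorityQueueSketch}}, {{Reason}}, {{Request}}) are hypothetical.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.Semaphore;

final class BoundedPriorityQueueSketch {
  /** Declared in priority order: earlier constants are served first. */
  enum Reason { DEAD_NODE, DECOMMISSION }

  static final class Request implements Comparable<Request> {
    final Reason reason;
    final long blockId;

    Request(Reason reason, long blockId) {
      this.reason = reason;
      this.blockId = blockId;
    }

    @Override
    public int compareTo(Request other) {
      return reason.compareTo(other.reason);
    }
  }

  private final PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();
  private final Semaphore freeSlots;

  BoundedPriorityQueueSketch(int capacity) {
    this.freeSlots = new Semaphore(capacity);
  }

  /** Blocks the producer until a slot is free, then enqueues the request. */
  void put(Request request) throws InterruptedException {
    freeSlots.acquire();
    queue.put(request);
  }

  /** Non-blocking variant: returns false when the queue is full. */
  boolean tryPut(Request request) {
    if (!freeSlots.tryAcquire()) {
      return false;
    }
    queue.put(request);
    return true;
  }

  /** Blocks until a request is available; highest priority comes out first. */
  Request take() throws InterruptedException {
    Request request = queue.take();
    freeSlots.release();
    return request;
  }

  /** Non-blocking variant: returns null when the queue is empty. */
  Request poll() {
    Request request = queue.poll();
    if (request != null) {
      freeSlots.release();
    }
    return request;
  }
}
```

A producer would pick the next block under the global lock, release that lock, and only then call {{put}}, so a full queue slows the producer down without holding up other users of the lock.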

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952040#comment-16952040
 ] 

David Mollitor commented on HDFS-14854:
---

{code:java}
if (blockManager.blocksMap.getStoredBlock(block) == null) {
  LOG.trace("Removing unknown block {}", block);
  return true;
}

long bcId = block.getBlockCollectionId();
if (bcId == INodeId.INVALID_INODE_ID) {
  // Orphan block, will be invalidated eventually. Skip.
  return false;
}
{code}

I think it should return 'true' if the block is orphaned, no?  Orphaned blocks 
should be skipped in the same way that 'unknown' blocks are.

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952019#comment-16952019
 ] 

David Mollitor commented on HDFS-14854:
---

This code already knows the pendingCount value and the pendingRepLimit, so it 
should not grab the write lock if the function is going to return immediately 
anyway.

{code:java}
int pendingCount = getPendingCount();

try {
  namesystem.writeLock();
  long repQueueSize = blockManager.getLowRedundancyBlocksCount();
  ...
  if (pendingCount >= pendingRepLimit) {
    return;
  }
{code}
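
The suggested reordering could look roughly like the sketch below: do the cheap limit check first and take the write lock only when there is work to do. The class {{CheckBeforeLockSketch}}, the method {{moveBlocksToPending}}, and the acquisition counter are hypothetical and exist only to make the effect observable.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class CheckBeforeLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final AtomicInteger lockAcquisitions = new AtomicInteger();
  private final int pendingRepLimit;

  CheckBeforeLockSketch(int pendingRepLimit) {
    this.pendingRepLimit = pendingRepLimit;
  }

  /** Returns true if the locked section ran, false if the limit was already reached. */
  boolean moveBlocksToPending(int pendingCount) {
    if (pendingCount >= pendingRepLimit) {
      return false; // nothing to do, so the hot lock is never touched
    }
    lock.writeLock().lock();
    try {
      lockAcquisitions.incrementAndGet();
      // ... select more blocks to replicate here ...
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  int lockAcquisitions() {
    return lockAcquisitions.get();
  }
}
```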

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952014#comment-16952014
 ] 

David Mollitor commented on HDFS-14854:
---

Please remove this method.  It can be replaced with {{map.computeIfAbsent(key, 
k -> new LinkedList<>()).add(v);}}

{code:java}
private void addBlockToPending(DatanodeDescriptor dn, BlockInfo block) {
  List<BlockInfo> blockList = pendingRep.get(dn);
  if (blockList == null) {
    blockList = new LinkedList<>();
    pendingRep.put(dn, blockList);
  }
  blockList.add(block);
}
{code}

https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#computeIfAbsent-K-java.util.function.Function-
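
For reference, a self-contained version of that replacement is sketched here. {{String}} and {{Long}} stand in for {{DatanodeDescriptor}} and {{BlockInfo}}, and {{ComputeIfAbsentSketch}} and {{pendingFor}} are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

final class ComputeIfAbsentSketch {
  private final Map<String, List<Long>> pendingRep = new HashMap<>();

  void addBlockToPending(String datanode, long blockId) {
    // Creates the list on first use for this key, then appends to it.
    pendingRep.computeIfAbsent(datanode, k -> new LinkedList<>()).add(blockId);
  }

  List<Long> pendingFor(String datanode) {
    return pendingRep.getOrDefault(datanode, Collections.emptyList());
  }
}
```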

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952009#comment-16952009
 ] 

David Mollitor commented on HDFS-14854:
---

Nit: this is not very Java-like; {{toRemove}} is passed around as an output 
parameter and mutated by each call:
{code:java}
final List toRemove = new ArrayList<>();
...
processMaintenanceNodes(toRemove);
...

// Check if any nodes have reached zero blocks and also update the stats
// exposed via JMX for all nodes still being processed.
checkForCompletedNodes(toRemove);

// Finally move the nodes to their final state if they are ready.
processCompletedNodes(toRemove);
{code}
Better to remove the coupling by having each step return its own list:
{code:java}
final List maintenanceExpiredNodes = getMaintenanceNodes();
...

final List completedNodes = getCompletedNodes();

Iterable nodesToRemove = Iterables.unmodifiableIterable(
    Iterables.concat(maintenanceExpiredNodes, completedNodes));

// Finally move the nodes to their final state if they are ready.
processCompletedNodes(Lists.newArrayList(nodesToRemove));
{code}
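
The same decoupling can also be written with only the JDK. This is a small sketch with hypothetical names and placeholder data: {{ReturnListsSketch}}, the two getters, and the {{String}} node names are not from the patch, and {{Stream.concat}} is swapped in for the Guava calls above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ReturnListsSketch {
  // Each step returns its own result instead of appending to a shared list.
  static List<String> getMaintenanceExpiredNodes() {
    return Arrays.asList("dn1");
  }

  static List<String> getCompletedNodes() {
    return Arrays.asList("dn2", "dn3");
  }

  /** Combines both results, keeping maintenance-expired nodes first. */
  static List<String> nodesToRemove() {
    return Stream.concat(
            getMaintenanceExpiredNodes().stream(),
            getCompletedNodes().stream())
        .collect(Collectors.toList());
  }
}
```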

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952001#comment-16952001
 ] 

David Mollitor commented on HDFS-14854:
---

{code:java}
  private void processPendingNodes() {
    while (!pendingNodes.isEmpty() &&
        (maxConcurrentTrackedNodes == 0 ||
        outOfServiceNodeBlocks.size() < maxConcurrentTrackedNodes)) {
      outOfServiceNodeBlocks.put(pendingNodes.poll(), null);
    }
  }
{code}

This method is accessed by the local running Thread.  However, {{pendingNodes}} 
does not appear to be a thread-safe class.  Perhaps the collection cannot be 
modified because of the external locking of the {{writeLock}} but there is no 
requirement to have the lock stated in the {{startTrackingNode}} method.
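
One way to make such a locking requirement explicit is to document it and also check it at runtime. The sketch below assumes a plain {{ReentrantReadWriteLock}} in place of the namesystem lock; {{LockPreconditionSketch}} and its helper methods are hypothetical, and the real class may prefer an assertion or a different check.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockPreconditionSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Not a thread-safe collection; every access must happen under the write lock.
  private final Queue<String> pendingNodes = new ArrayDeque<>();

  /**
   * Queues a node for tracking.
   * Precondition: the calling thread must hold the write lock.
   */
  void startTrackingNode(String node) {
    if (!lock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException("write lock must be held");
    }
    pendingNodes.add(node);
  }

  void writeLock() {
    lock.writeLock().lock();
  }

  void writeUnlock() {
    lock.writeLock().unlock();
  }

  int pendingCount() {
    return pendingNodes.size();
  }
}
```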

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch






[jira] [Comment Edited] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952001#comment-16952001
 ] 

David Mollitor edited comment on HDFS-14854 at 10/15/19 3:04 PM:
-

{code:java}
  private void processPendingNodes() {
    while (!pendingNodes.isEmpty() &&
        (maxConcurrentTrackedNodes == 0 ||
        outOfServiceNodeBlocks.size() < maxConcurrentTrackedNodes)) {
      outOfServiceNodeBlocks.put(pendingNodes.poll(), null);
    }
  }
{code}

This method is accessed by the local running Thread.  However, {{pendingNodes}} 
does not appear to be a thread-safe Collection.  Perhaps the collection cannot 
be modified because of the external locking of the {{writeLock}} but there is 
no requirement to have the lock stated in the {{startTrackingNode}} method 
javadoc.


was (Author: belugabehr):
{code:java}
  private void processPendingNodes() {
    while (!pendingNodes.isEmpty() &&
        (maxConcurrentTrackedNodes == 0 ||
        outOfServiceNodeBlocks.size() < maxConcurrentTrackedNodes)) {
      outOfServiceNodeBlocks.put(pendingNodes.poll(), null);
    }
  }
{code}

This method is accessed by the local running Thread.  However, {{pendingNodes}} 
does not appear to be a thread-safe class.  Perhaps the collection cannot be 
modified because of the external locking of the {{writeLock}} but there is no 
requirement to have the lock stated in the {{startTrackingNode}} method.

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951991#comment-16951991
 ] 

David Mollitor commented on HDFS-14854:
---

The method {{scanDatanodeStorage}} uses the {{namesystem.readLock();}} in a 
pretty verbose and complicated way.

If the idea here is to grab the {{readLock}} for each DataNode, and unlock it 
after processing each DataNode, simply move the {{try...finally}} block inside 
the loop.
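
As a rough illustration of that suggestion (a minimal sketch, not the code from the patch: {{ReentrantReadWriteLock}} stands in for the namesystem lock, and the class, method, and node names are made up for the example), taking and releasing the read lock once per DataNode looks like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PerNodeReadLockSketch {
  // Hypothetical stand-in for the namesystem lock; not the FSNamesystem API.
  public static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

  /** Processes each node inside its own short read-locked section. */
  public static List<String> scanAll(List<String> nodes) {
    List<String> scanned = new ArrayList<>();
    for (String node : nodes) {
      LOCK.readLock().lock();        // acquire once per node
      try {
        scanned.add("scanned:" + node);
      } finally {
        LOCK.readLock().unlock();    // always released before the next node
      }
    }
    return scanned;
  }

  public static void main(String[] args) {
    System.out.println(scanAll(Arrays.asList("dn1", "dn2", "dn3")));
    System.out.println("read locks still held: " + LOCK.getReadLockCount());
  }
}
```

With the {{try...finally}} inside the loop, writers get a chance to acquire the lock between nodes, and no extra bookkeeping is needed to track whether the lock is currently held.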

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951985#comment-16951985
 ] 

David Mollitor commented on HDFS-14854:
---

{code:java}
while (!pendingNodes.isEmpty() &&
    (maxConcurrentTrackedNodes == 0 ||
    outOfServiceNodeBlocks.size() < maxConcurrentTrackedNodes)) {
  outOfServiceNodeBlocks.put(pendingNodes.poll(), null);
}
{code}

Using 'null' values is very much out of vogue.  It is better to put a new 
{{HashMap}} here, which simplifies the code because callers can assume that 
values will never be 'null'.  The cost of creating a {{HashMap}} is very low 
here, especially since there is only one per DataNode.
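
A minimal sketch of that idea, with assumed types (node names as {{String}} and a per-node {{Map<String, Integer>}} are placeholders, not the types used in the patch):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class EmptyMapInsteadOfNull {
  /**
   * Moves pending nodes into the tracked map until the limit is reached.
   * A limit of 0 means "no limit", mirroring the loop quoted above.
   */
  public static Map<String, Map<String, Integer>> admit(
      Queue<String> pendingNodes, int maxTracked) {
    Map<String, Map<String, Integer>> tracked = new HashMap<>();
    while (!pendingNodes.isEmpty()
        && (maxTracked == 0 || tracked.size() < maxTracked)) {
      // An empty map instead of null: readers never need a null check.
      tracked.put(pendingNodes.poll(), new HashMap<>());
    }
    return tracked;
  }

  public static void main(String[] args) {
    Queue<String> pending = new ArrayDeque<>(Arrays.asList("dn1", "dn2", "dn3"));
    Map<String, Map<String, Integer>> tracked = admit(pending, 2);
    for (Map.Entry<String, Map<String, Integer>> e : tracked.entrySet()) {
      // Safe without a null guard because every value is a real map.
      System.out.println(e.getKey() + " pending blocks: " + e.getValue().size());
    }
  }
}
```

Code that later reads the map can call {{isEmpty()}} or iterate directly; "nothing scanned yet" is represented by an empty map rather than by a missing value.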

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.






[jira] [Comment Edited] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951978#comment-16951978
 ] 

David Mollitor edited comment on HDFS-14854 at 10/15/19 2:43 PM:
-

The {{cancelledNodes}} data structure is a {{List}} but it should be a {{Queue}}

 

{code:java}
while (!queue.isEmpty()) {
  queue.poll();
}
{code}
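
A slightly fuller sketch of the same drain loop, assuming the cancelled nodes can be represented by plain {{String}} names and held in an {{ArrayDeque}} (both are assumptions for the example, not what the patch uses):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class CancelledNodesQueueSketch {
  /** Drains the queue in FIFO order and returns what was processed. */
  public static List<String> drain(Queue<String> cancelledNodes) {
    List<String> processed = new ArrayList<>();
    while (!cancelledNodes.isEmpty()) {
      // poll() removes and returns the head, so no index bookkeeping is needed.
      processed.add(cancelledNodes.poll());
    }
    return processed;
  }

  public static void main(String[] args) {
    Queue<String> cancelled = new ArrayDeque<>(Arrays.asList("dn1", "dn2"));
    System.out.println(drain(cancelled));
    System.out.println("left in queue: " + cancelled.size());
  }
}
```

A {{Queue}} states the intent (consume each element once, in order) directly in the type, whereas a {{List}} needs an iterate-then-clear or remove-by-index pattern to do the same thing.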


was (Author: belugabehr):
The {{cancelledNodes}} data structure is a {{List}} but it should be a {{Queue}}

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951978#comment-16951978
 ] 

David Mollitor commented on HDFS-14854:
---

The {{cancelledNodes}} data structure is a {{List}} but it should be a {{Queue}}

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.






[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951972#comment-16951972
 ] 

David Mollitor commented on HDFS-14854:
---

I'm looking at this now, but one nit:
{code:java|title=Currently}
try {
  namesystem.writeLock();
  ...
} finally {
  namesystem.writeUnlock();
}
{code}
Best practice is to grab the lock outside of the try statement, so that the 
finally block only runs once the lock is actually held.
 
[https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantLock.html]
{code:java|title=Suggested}
namesystem.writeLock();
try {
  ...
} finally {
  namesystem.writeUnlock();
}
{code}
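
To show why the ordering matters, here is a small self-contained sketch. It uses {{java.util.concurrent.locks.ReentrantReadWriteLock}} directly as a stand-in for the namesystem lock (the class and method names are invented for the example): releasing a write lock that the current thread does not hold throws {{IllegalMonitorStateException}}, which is what a {{finally}} block would trigger if the acquisition inside the {{try}} had failed.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOutsideTrySketch {
  // Hypothetical stand-in for the namesystem lock.
  public static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();
  private static int counter = 0;

  /** Recommended shape: acquire first, then try/finally around the work. */
  public static int increment() {
    LOCK.writeLock().lock();
    try {
      return ++counter;
    } finally {
      LOCK.writeLock().unlock();   // only reached when the lock is held
    }
  }

  /** Shows the failure mode: unlocking without holding the lock throws. */
  public static boolean unlockWithoutLockThrows() {
    try {
      LOCK.writeLock().unlock();
      return false;
    } catch (IllegalMonitorStateException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("counter after increment: " + increment());
    System.out.println("write locked afterwards: " + LOCK.isWriteLocked());
    System.out.println("unlock without lock throws: " + unlockWithoutLockThrows());
  }
}
```

With the acquisition inside the {{try}}, any exception thrown before the lock is held would be replaced by the exception from the unlock call in {{finally}}, hiding the original failure.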

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.






[jira] [Commented] (HDFS-14902) RBF: NullPointer When Misconfigured

2019-10-08 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947053#comment-16947053
 ] 

David Mollitor commented on HDFS-14902:
---

I just downloaded the Hadoop binaries and ran no-arg {{./hdfs dfsrouter}}

> RBF: NullPointer When Misconfigured
> ---
>
> Key: HDFS-14902
> URL: https://issues.apache.org/jira/browse/HDFS-14902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Priority: Minor
>
> Admittedly the server was mis-configured, but this should be a bit more 
> elegant.
> {code:none}
> 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled exception updating NN registration for null:null
> java.lang.NullPointerException
>   at org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831)
>   at org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119)
>   at org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108)
>   at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>   at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF

2019-10-08 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946988#comment-16946988
 ] 

David Mollitor commented on HDFS-14899:
---

[~goiri] Yes.  I also verified with a quick and simple proxy... shout-out to 
[~ayushsaxena] for the idea.

> Use Relative URLS in Hadoop HDFS RBF
> 
>
> Key: HDFS-14899
> URL: https://issues.apache.org/jira/browse/HDFS-14899
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14899.1.patch
>
>







[jira] [Created] (HDFS-14902) NullPointer When Misconfigured

2019-10-08 Thread David Mollitor (Jira)
David Mollitor created HDFS-14902:
-

 Summary: NullPointer When Misconfigured
 Key: HDFS-14902
 URL: https://issues.apache.org/jira/browse/HDFS-14902
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.2.0
Reporter: David Mollitor


Admittedly the server was mis-configured, but this should be a bit more elegant.

{code:none}
2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled exception updating NN registration for null:null
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831)
    at org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119)
    at org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108)
    at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259)
    at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223)
    at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
    at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}






[jira] [Updated] (HDFS-14902) NullPointer When Misconfigured

2019-10-08 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14902:
--
Priority: Minor  (was: Major)

> NullPointer When Misconfigured
> --
>
> Key: HDFS-14902
> URL: https://issues.apache.org/jira/browse/HDFS-14902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Priority: Minor
>
> Admittedly the server was mis-configured, but this should be a bit more 
> elegant.
> {code:none}
> 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled exception updating NN registration for null:null
> java.lang.NullPointerException
>   at org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831)
>   at org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119)
>   at org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108)
>   at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>   at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Updated] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS

2019-10-08 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14898:
--
Attachment: HDFS-14898.2.patch

> Use Relative URLS in Hadoop HDFS HTTP FS
> 
>
> Key: HDFS-14898
> URL: https://issues.apache.org/jira/browse/HDFS-14898
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14898.1.patch, HDFS-14898.2.patch
>
>







[jira] [Commented] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS

2019-10-08 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946925#comment-16946925
 ] 

David Mollitor commented on HDFS-14898:
---

[~ayushtkn] Thank you so much.  Super helpful.  I just used that method, using 
Caddy, to discover that my first patch was incorrect.  Thanks!

New patch supplied.

> Use Relative URLS in Hadoop HDFS HTTP FS
> 
>
> Key: HDFS-14898
> URL: https://issues.apache.org/jira/browse/HDFS-14898
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14898.1.patch, HDFS-14898.2.patch
>
>







[jira] [Updated] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS

2019-10-08 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14898:
--
Status: Open  (was: Patch Available)

> Use Relative URLS in Hadoop HDFS HTTP FS
> 
>
> Key: HDFS-14898
> URL: https://issues.apache.org/jira/browse/HDFS-14898
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14898.1.patch
>
>







[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF

2019-10-07 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14899:
--
Status: Open  (was: Patch Available)

> Use Relative URLS in Hadoop HDFS RBF
> 
>
> Key: HDFS-14899
> URL: https://issues.apache.org/jira/browse/HDFS-14899
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14899.1.patch
>
>







[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF

2019-10-07 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14899:
--
Attachment: HDFS-14899.1.patch

> Use Relative URLS in Hadoop HDFS RBF
> 
>
> Key: HDFS-14899
> URL: https://issues.apache.org/jira/browse/HDFS-14899
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14899.1.patch
>
>







[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF

2019-10-07 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14899:
--
Status: Patch Available  (was: Open)

> Use Relative URLS in Hadoop HDFS RBF
> 
>
> Key: HDFS-14899
> URL: https://issues.apache.org/jira/browse/HDFS-14899
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14899.1.patch
>
>







[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF

2019-10-07 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14899:
--
Attachment: (was: HDFS-14899.1.patch)

> Use Relative URLS in Hadoop HDFS RBF
> 
>
> Key: HDFS-14899
> URL: https://issues.apache.org/jira/browse/HDFS-14899
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14899.1.patch
>
>







[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF

2019-10-07 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14899:
--
Status: Patch Available  (was: Open)

> Use Relative URLS in Hadoop HDFS RBF
> 
>
> Key: HDFS-14899
> URL: https://issues.apache.org/jira/browse/HDFS-14899
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14899.1.patch
>
>







[jira] [Created] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF

2019-10-07 Thread David Mollitor (Jira)
David Mollitor created HDFS-14899:
-

 Summary: Use Relative URLS in Hadoop HDFS RBF
 Key: HDFS-14899
 URL: https://issues.apache.org/jira/browse/HDFS-14899
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HDFS-14899.1.patch








[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF

2019-10-07 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14899:
--
Attachment: HDFS-14899.1.patch

> Use Relative URLS in Hadoop HDFS RBF
> 
>
> Key: HDFS-14899
> URL: https://issues.apache.org/jira/browse/HDFS-14899
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14899.1.patch
>
>







[jira] [Updated] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS

2019-10-07 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14898:
--
Status: Patch Available  (was: Open)

> Use Relative URLS in Hadoop HDFS HTTP FS
> 
>
> Key: HDFS-14898
> URL: https://issues.apache.org/jira/browse/HDFS-14898
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14898.1.patch
>
>







[jira] [Updated] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS

2019-10-07 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14898:
--
Attachment: HDFS-14898.1.patch

> Use Relative URLS in Hadoop HDFS HTTP FS
> 
>
> Key: HDFS-14898
> URL: https://issues.apache.org/jira/browse/HDFS-14898
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14898.1.patch
>
>







[jira] [Created] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS

2019-10-07 Thread David Mollitor (Jira)
David Mollitor created HDFS-14898:
-

 Summary: Use Relative URLS in Hadoop HDFS HTTP FS
 Key: HDFS-14898
 URL: https://issues.apache.org/jira/browse/HDFS-14898
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: httpfs
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor









[jira] [Updated] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS

2019-10-07 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14898:
--
Flags: Patch

> Use Relative URLS in Hadoop HDFS HTTP FS
> 
>
> Key: HDFS-14898
> URL: https://issues.apache.org/jira/browse/HDFS-14898
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>







[jira] [Commented] (HDFS-14872) Read HDFS Blocks in Random Order

2019-09-26 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938614#comment-16938614
 ] 

David Mollitor commented on HDFS-14872:
---

[~sodonnell]

I imagine something like... the client looks up the size of the file in HDFS 
and pre-allocates the file on the local system, then it gets a list of all the 
blocks for the file, shuffles them, iterates over them, then starts writing 
blocks to the local file at the required offsets.  Once the list of blocks is 
exhausted, the file is complete and made available to the application.
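
A minimal sketch of that idea, under loud assumptions: a byte array stands in for the remote HDFS file, a fixed block size stands in for the block list, and no HDFS client API is used. The class and method names (RandomOrderCopy, copy, roundTrip) are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch only: a byte[] plays the role of the remote file; nothing here talks to HDFS. */
public class RandomOrderCopy {

  /** Copies source into target one block at a time, visiting the blocks in shuffled order. */
  public static void copy(byte[] source, int blockSize, Path target) {
    // Collect the start offset of every block, then shuffle the list.
    List<Integer> offsets = new ArrayList<>();
    for (int off = 0; off < source.length; off += blockSize) {
      offsets.add(off);
    }
    Collections.shuffle(offsets);

    try (RandomAccessFile out = new RandomAccessFile(target.toFile(), "rw")) {
      // Size the local file up front (this sets the length; it may be sparse on disk).
      out.setLength(source.length);
      for (int off : offsets) {
        int len = Math.min(blockSize, source.length - off);
        // Seek to the block's final position and write it there.
        out.seek(off);
        out.write(source, off, len);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Helper for trying it out: copies generated data and checks the result byte for byte. */
  public static boolean roundTrip(int size, int blockSize) {
    byte[] data = new byte[size];
    for (int i = 0; i < size; i++) {
      data[i] = (byte) (i * 31);
    }
    try {
      Path tmp = Files.createTempFile("random-order", ".bin");
      try {
        copy(data, blockSize, tmp);
        return Arrays.equals(data, Files.readAllBytes(tmp));
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("copy matches source: " + roundTrip(10_000, 1024));
  }
}
```

Because every block lands at its own offset, the order in which blocks arrive does not matter; the file is simply complete once the shuffled list is exhausted.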

The first use case that comes to mind is better supporting large files 
submitted to the cluster that are required for MapReduce / Spark applications.  
The jobs will not start unless all of the required files are first localized 
from HDFS into the local host by the YARN NodeManager.  If the job requires a 
large JAR file or, even more likely, a large dependency file, all of the nodes 
will fight with each other to download the blocks in order.

One could increase {{mapreduce.client.submit.file.replication}}, however this 
has its limitations as well.  In a large cluster, it may take a long time for 
the NameNode to schedule all of the replication required to get all of the 
blocks up to the requested replication.

https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/SharedCache.html
https://blog.cloudera.com/resource-localization-in-yarn-deep-dive/
https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

> Read HDFS Blocks in Random Order
> 
>
> Key: HDFS-14872
> URL: https://issues.apache.org/jira/browse/HDFS-14872
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 2.8.5, 3.2.1
>Reporter: David Mollitor
>Priority: Major
>
> When the HDFS client is downloading (copying) an entire file, allow the 
> client to download the blocks in random order.  If a lot of clients are 
> reading the same file, in parallel, they will all download the first block, 
> the second block, and so on, stampeding down the line.
> It would be interesting to spread the load across all the available 
> DataNodes.






[jira] [Commented] (HDFS-14863) Remove Synchronization From BlockPlacementPolicyDefault

2019-09-25 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938012#comment-16938012
 ] 

David Mollitor commented on HDFS-14863:
---

Different unit tests failed on the second Yetus run.  Flaky tests.

This particular data structure is accessed in a few places, but this is the 
only place it is synchronized on.  I just don't see a reason for it and it's 
not documented anywhere as to why this may be the case.

> Remove Synchronization From BlockPlacementPolicyDefault
> ---
>
> Key: HDFS-14863
> URL: https://issues.apache.org/jira/browse/HDFS-14863
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14863.1.patch, HDFS-14863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L1010
> The {{clusterMap}} has its own internal synchronization.  Also, these are 
> only read operations so any changes applied to the {{clusterMap}} from 
> another thread will be applied since no other thread synchronizes on the 
> {{clusterMap}} itself (that I could find).






[jira] [Updated] (HDFS-14863) Remove Synchronization From BlockPlacementPolicyDefault

2019-09-25 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14863:
--
Status: Open  (was: Patch Available)

> Remove Synchronization From BlockPlacementPolicyDefault
> ---
>
> Key: HDFS-14863
> URL: https://issues.apache.org/jira/browse/HDFS-14863
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14863.1.patch, HDFS-14863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L1010
> The {{clusterMap}} has its own internal synchronization.  Also, these are 
> only read operations so any changes applied to the {{clusterMap}} from 
> another thread will be applied since no other thread synchronizes on the 
> {{clusterMap}} itself (that I could find).






[jira] [Updated] (HDFS-14863) Remove Synchronization From BlockPlacementPolicyDefault

2019-09-25 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14863:
--
Status: Patch Available  (was: Open)

> Remove Synchronization From BlockPlacementPolicyDefault
> ---
>
> Key: HDFS-14863
> URL: https://issues.apache.org/jira/browse/HDFS-14863
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14863.1.patch, HDFS-14863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L1010
> The {{clusterMap}} has its own internal synchronization.  Also, these are 
> only read operations so any changes applied to the {{clusterMap}} from 
> another thread will be applied since no other thread synchronizes on the 
> {{clusterMap}} itself (that I could find).






[jira] [Updated] (HDFS-14863) Remove Synchronization From BlockPlacementPolicyDefault

2019-09-25 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14863:
--
Attachment: HDFS-14863.2.patch

> Remove Synchronization From BlockPlacementPolicyDefault
> ---
>
> Key: HDFS-14863
> URL: https://issues.apache.org/jira/browse/HDFS-14863
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14863.1.patch, HDFS-14863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L1010
> The {{clusterMap}} has its own internal synchronization.  Also, these are 
> only read operations so any changes applied to the {{clusterMap}} from 
> another thread will be applied since no other thread synchronizes on the 
> {{clusterMap}} itself (that I could find).






[jira] [Commented] (HDFS-14862) Review of MovedBlocks

2019-09-25 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937794#comment-16937794
 ] 

David Mollitor commented on HDFS-14862:
---

[~elgoiri] Thank you for taking a look.

I'll take a look at this again.

This class is a bit confusing, the {{getLocations()}} method included.  There 
is no reason that this method needs to be synchronized at all because the 
variable {{locations}} is defined as {{final}} in the constructor and therefore 
will never change.  Since it never changes, there's no need to synchronize.


There are no comments in the code, so it's a bit hard to understand how it's 
being used, but it may be a life-cycle thing.  That is, multiple threads may be 
used to add new locations to the block, but then at the end, only a single 
thread accesses the results (through {{getLocations()}}).

However, I just realized that the blocks are also being synchronized 
externally.

https://github.com/apache/hadoop/blob/1de25d134f64d815f9b43606fa426ece5ddbc430/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java#L831

I think I'll drop the external synchronization in {{Dispatcher.java}}, keep the 
synchronization on the collection (because it is protected), and put a comment 
on the {{getLocations()}} method that warns callers about interacting with the 
returned List: changing its contents is not thread safe, and iterating over it 
may throw a {{ConcurrentModificationException}} if the underlying collection is 
modified.
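
For comparison, the copy-based alternative mentioned in the issue description could look roughly like the sketch below. LocationsSketch is a hypothetical stand-in, not the real class in MovedBlocks.java; the field is made private here as an assumption so that subclasses cannot bypass the lock.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the Locations class quoted in the issue. */
public class LocationsSketch<L> {
  // Private (an assumption of this sketch), so every access goes through a synchronized method.
  private final List<L> locations = new ArrayList<>(3);

  /** Adds a location unless it is already present. */
  public synchronized void addLocation(L loc) {
    if (!locations.contains(loc)) {
      locations.add(loc);
    }
  }

  /** Clears all locations. */
  public synchronized void clearLocations() {
    locations.clear();
  }

  /** @return a snapshot copy; later changes to this object are not reflected in it */
  public synchronized List<L> getLocations() {
    return new ArrayList<>(locations);
  }
}
```

The copy costs an allocation per call, but callers can iterate over or modify the returned list without holding any lock and without risking a ConcurrentModificationException.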

> Review of MovedBlocks
> -
>
> Key: HDFS-14862
> URL: https://issues.apache.org/jira/browse/HDFS-14862
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14862.1.patch
>
>
> Internal data structure needs to be protected (synchronized) but is scoped as 
> {{protected}} so any sub-class could modify without a lock.  Synchronize the 
> collection itself for protection.  It also returns the internal data 
> structure in {{getLocations}} so the structure could be modified outside of 
> the lock.  Create a copy instead.
> {code:java}
> /** The locations of the replicas of the block. */
> protected final List locations = new ArrayList(3);
> 
> public Locations(Block block) {
>   this.block = block;
> }
> 
> /** clean block locations */
> public synchronized void clearLocations() {
>   locations.clear();
> }
> ...
> /** @return its locations */
> public synchronized List getLocations() {
>   return locations;
> }
> {code}
>  
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/MovedBlocks.java#L43]
> Also, remove a bunch of superfluous and complicated code.






[jira] [Commented] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-25 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937757#comment-16937757
 ] 

David Mollitor commented on HDFS-14865:
---

[~elgoiri]

So, I was looking at the code, looking to remove some synchronization wherever 
I can.

In this case, the threads require the {{namesystem}} write lock to interact 
with methods that modify the internal structure, so access is already pretty 
restrictive.  On top of that, I did not remove synchronization from the class 
per se, I simply pushed it down into the collection using a 
{{ConcurrentHashMap}}.  The {{ConcurrentHashMap}} is nifty because it 
locks internally at a fine granularity, so it can lock certain sections of the 
{{Map}} without locking the entire structure.

My aim here is to let the {{get*()}} methods of this class interact with the 
structure without serialized synchronization.
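
A minimal sketch of pushing the synchronization down into the collection. NodeRegistrySketch and its methods are hypothetical and are not the real DatanodeManager; the outer namesystem write lock that writers would hold is deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry used only to illustrate the pattern. */
public class NodeRegistrySketch {
  // The map provides its own thread safety, so the methods below are not synchronized.
  private final Map<String, String> hostToNode = new ConcurrentHashMap<>();

  /** Writer path; in the scenario described above the caller would also hold an outer write lock. */
  public void register(String host, String nodeId) {
    hostToNode.put(host, nodeId);
  }

  /** Reader path; never blocks behind other readers. */
  public String getNode(String host) {
    return hostToNode.get(host);
  }

  public int size() {
    return hostToNode.size();
  }

  /** Registers distinct hosts from several threads at once and returns the final map size. */
  public static int registerConcurrently(int threads, int perThread) {
    NodeRegistrySketch reg = new NodeRegistrySketch();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final int id = t;
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          reg.register("host-" + id + "-" + i, "node-" + i);
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return reg.size();
  }
}
```

This only covers single-map operations. If a method has to update several fields together, the map alone does not make that compound update atomic, and an outer lock is still needed.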

> Reduce Synchronization in DatanodeManager
> -
>
> Key: HDFS-14865
> URL: https://issues.apache.org/jira/browse/HDFS-14865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch, 
> HDFS-14865.3.patch
>
>







[jira] [Commented] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java

2019-09-24 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937147#comment-16937147
 ] 

David Mollitor commented on HDFS-14860:
---

Still getting {{java.lang.OutOfMemoryError: unable to create new native thread}}

Maybe the build containers need to be larger.  I'll try again in the next few 
days and see if the issue clears.

> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch, 
> HDFS-14860.3.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of 
> external synchronization.






[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14864:
--
Status: Patch Available  (was: Open)

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, 
> HDFS-14864.3.patch, HDFS-14864.4.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone effect.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.
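
A minimal sketch of the pattern described above, assuming a LinkedBlockingQueue as the backing queue. DrainSketch and its method names are hypothetical and are not taken from DatanodeDescriptor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical wrapper showing drainTo() with an empty-list result instead of null. */
public class DrainSketch<T> {
  private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();

  /** Adds an item to the tail of the queue. */
  public void offer(T item) {
    queue.offer(item);
  }

  /** Removes up to maxItems elements in FIFO order; returns an empty list, never null. */
  public List<T> poll(int maxItems) {
    List<T> drained = new ArrayList<>();
    // One bulk call replaces a loop of repeated poll() calls.
    queue.drainTo(drained, maxItems);
    return drained;
  }
}
```

Callers can then iterate over the result directly, with no null check before the loop.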






[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14864:
--
Attachment: HDFS-14864.4.patch

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, 
> HDFS-14864.3.patch, HDFS-14864.4.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone effect.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.






[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14864:
--
Status: Open  (was: Patch Available)

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, 
> HDFS-14864.3.patch, HDFS-14864.4.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone effect.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.






[jira] [Commented] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-24 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937006#comment-16937006
 ] 

David Mollitor commented on HDFS-14864:
---

{{java.lang.OutOfMemoryError: unable to create new native thread}} seems to be 
a common failure.  Doesn't look related, but I'll kick it off once more to see 
if the number of test failures comes down.

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, 
> HDFS-14864.3.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone effect.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.






[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java

2019-09-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14860:
--
Status: Patch Available  (was: Open)

> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch, 
> HDFS-14860.3.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of 
> external synchronization.






[jira] [Commented] (HDFS-14843) Double Synchronization in BlockReportLeaseManager

2019-09-24 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936816#comment-16936816
 ] 

David Mollitor commented on HDFS-14843:
---

[~elgoiri] Are you able to help me out on this one too?

> Double Synchronization in BlockReportLeaseManager
> -
>
> Key: HDFS-14843
> URL: https://issues.apache.org/jira/browse/HDFS-14843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14843.1.patch
>
>
> {code:java|title=BlockReportLeaseManager.java}
>   private synchronized long getNextId() {
> long id;
> do {
>   id = nextId++;
> } while (id == 0);
> return id;
>   }
> {code}
> This is a private method and is synchronized; however, it is only accessed 
> from an already-synchronized method.  No need to double-synchronize.
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227
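
A minimal sketch of the simplification, assuming the only caller is a synchronized method of the same object. LeaseIdSketch and requestLease are hypothetical names; the holdsLock check is an addition of this sketch to make the locking assumption explicit.

```java
/** Hypothetical reduction of the ID generator quoted above. */
public class LeaseIdSketch {
  private long nextId;

  public LeaseIdSketch(long firstId) {
    this.nextId = firstId;
  }

  /** Not synchronized itself: it relies on the caller already holding this object's monitor. */
  private long getNextId() {
    // Guard added for illustration; fails fast if the locking assumption is ever broken.
    if (!Thread.holdsLock(this)) {
      throw new IllegalStateException("caller must hold the lock");
    }
    long id;
    do {
      id = nextId++;
    } while (id == 0); // zero is skipped, as in the original snippet
    return id;
  }

  /** The single synchronized entry point; this is where the lock is taken. */
  public synchronized long requestLease() {
    return getNextId();
  }
}
```

Java monitors are reentrant, so the extra synchronized keyword on the private method was harmless; removing it just makes clear that the lock is acquired once, at the public entry point.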






[jira] [Commented] (HDFS-14837) Review of Block.java

2019-09-24 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936804#comment-16936804
 ] 

David Mollitor commented on HDFS-14837:
---

[~elgoiri] Are we good to move forward on this?

> Review of Block.java
> 
>
> Key: HDFS-14837
> URL: https://issues.apache.org/jira/browse/HDFS-14837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14837.1.patch, HDFS-14837.2.patch, 
> HDFS-14837.3.patch, HDFS-14837.4.patch
>
>
> The {{Block}} class is such a core class in the project, I just wanted to 
> make sure it was super clean and documentation was correct.






[jira] [Created] (HDFS-14872) Read HDFS Blocks in Random Order

2019-09-24 Thread David Mollitor (Jira)
David Mollitor created HDFS-14872:
-

 Summary: Read HDFS Blocks in Random Order
 Key: HDFS-14872
 URL: https://issues.apache.org/jira/browse/HDFS-14872
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 3.2.1, 2.8.5
Reporter: David Mollitor


When the HDFS client is downloading (copying) an entire file, allow the client 
to download the blocks in random order.  If a lot of clients are reading the 
same file, in parallel, they will all download the first block, the second 
block, and so on, stampeding down the line.

It would be interesting to spread the load across all the available 
DataNodes.






[jira] [Commented] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java

2019-09-23 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936141#comment-16936141
 ] 

David Mollitor commented on HDFS-14860:
---

I'll submit as many times as it takes :)

> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch, 
> HDFS-14860.3.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of 
> external synchronization.






[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java

2019-09-23 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14860:
--
Attachment: HDFS-14860.3.patch

> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch, 
> HDFS-14860.3.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of 
> external synchronization.






[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java

2019-09-23 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14860:
--
Status: Open  (was: Patch Available)

> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of 
> external synchronization.






[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-23 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14864:
--
Attachment: HDFS-14864.3.patch

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, 
> HDFS-14864.3.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone effect.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.






[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-23 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14864:
--
Status: Patch Available  (was: Open)

Same patch.  Different name to kick off CI.

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, 
> HDFS-14864.3.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone behavior.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.






[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-23 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14864:
--
Status: Open  (was: Patch Available)

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone behavior.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.






[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14865:
--
Attachment: HDFS-14865.3.patch

> Reduce Synchronization in DatanodeManager
> -
>
> Key: HDFS-14865
> URL: https://issues.apache.org/jira/browse/HDFS-14865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch, 
> HDFS-14865.3.patch
>
>







[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14865:
--
Status: Open  (was: Patch Available)

> Reduce Synchronization in DatanodeManager
> -
>
> Key: HDFS-14865
> URL: https://issues.apache.org/jira/browse/HDFS-14865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch, 
> HDFS-14865.3.patch
>
>







[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-21 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14865:
--
Status: Patch Available  (was: Open)

> Reduce Synchronization in DatanodeManager
> -
>
> Key: HDFS-14865
> URL: https://issues.apache.org/jira/browse/HDFS-14865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch, 
> HDFS-14865.3.patch
>
>







[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14865:
--
Status: Patch Available  (was: Open)

> Reduce Synchronization in DatanodeManager
> -
>
> Key: HDFS-14865
> URL: https://issues.apache.org/jira/browse/HDFS-14865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch
>
>







[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14865:
--
Status: Open  (was: Patch Available)

> Reduce Synchronization in DatanodeManager
> -
>
> Key: HDFS-14865
> URL: https://issues.apache.org/jira/browse/HDFS-14865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch
>
>







[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14865:
--
Attachment: HDFS-14865.2.patch

> Reduce Synchronization in DatanodeManager
> -
>
> Key: HDFS-14865
> URL: https://issues.apache.org/jira/browse/HDFS-14865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch
>
>







[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14864:
--
Status: Patch Available  (was: Open)

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone behavior.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.






[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14864:
--
Attachment: HDFS-14864.2.patch

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone behavior.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.






[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14864:
--
Status: Open  (was: Patch Available)

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone behavior.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.






[jira] [Commented] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934738#comment-16934738
 ] 

David Mollitor commented on HDFS-14864:
---

Something seems up with the CI build.  _TestHdfsNativeCodeLoader_ is a flaky 
failure for sure, and _shadedclient_ seems to be failing for all my recent 
patches.

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone behavior.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.






[jira] [Updated] (HDFS-14866) NameNode stopRequested is Marked volatile

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14866:
--
Status: Patch Available  (was: Open)

> NameNode stopRequested is Marked volatile
> -
>
> Key: HDFS-14866
> URL: https://issues.apache.org/jira/browse/HDFS-14866
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Trivial
> Attachments: HDFS-14866.1.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L405
> "Used for testing" so not a big deal, but it's a bit odd that it's scoped as 
> 'protected' and is not 'volatile'.  It could be accessed outside of a lock 
> and return a stale value.  Tighten that up a little.






[jira] [Updated] (HDFS-14866) NameNode stopRequested is Marked volatile

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14866:
--
Attachment: HDFS-14866.1.patch

> NameNode stopRequested is Marked volatile
> -
>
> Key: HDFS-14866
> URL: https://issues.apache.org/jira/browse/HDFS-14866
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Trivial
> Attachments: HDFS-14866.1.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L405
> "Used for testing" so not a big deal, but it's a bit odd that it's scoped as 
> 'protected' and is not 'volatile'.  It could be accessed outside of a lock 
> and return a stale value.  Tighten that up a little.






[jira] [Created] (HDFS-14866) NameNode stopRequested is Marked volatile

2019-09-20 Thread David Mollitor (Jira)
David Mollitor created HDFS-14866:
-

 Summary: NameNode stopRequested is Marked volatile
 Key: HDFS-14866
 URL: https://issues.apache.org/jira/browse/HDFS-14866
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L405

"Used for testing" so not a big deal, but it's a bit odd that it's scoped as 
'protected' and is not 'volatile'.  It could be accessed outside of a lock and 
return a stale value.  Tighten that up a little.
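
To illustrate the concern, here is a small sketch with a hypothetical class named StopFlagSketch (it is not the NameNode code). The flag is private and volatile, so a write from one thread is visible to a reader that polls it without holding a lock:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names are assumptions, not NameNode fields.
final class StopFlagSketch {

  // volatile makes the write in requestStop() visible to other threads
  // without any locking. It does not make compound updates atomic.
  private volatile boolean stopRequested = false;

  void requestStop() {
    stopRequested = true;
  }

  boolean isStopRequested() {
    return stopRequested;
  }

  // Starts a worker that spins on the flag, requests a stop, and reports
  // whether the worker observed the flag and exited within five seconds.
  static boolean runAndStop() {
    StopFlagSketch flag = new StopFlagSketch();
    Thread worker = new Thread(() -> {
      while (!flag.isStopRequested()) {
        Thread.yield();
      }
    });
    worker.start();
    flag.requestStop();
    try {
      worker.join(TimeUnit.SECONDS.toMillis(5));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !worker.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("worker stopped: " + runAndStop());
  }
}
```

Without volatile (or some other form of synchronization), the Java memory model does not guarantee that the worker thread ever sees the updated value.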






[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14860:
--
Status: Open  (was: Patch Available)

> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of 
> external synchronization.
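
A rough sketch of the second bullet, assuming a hypothetical queue of pending path IDs (the real StoragePolicySatisfyManager fields may differ). A ConcurrentLinkedQueue from {{java.util.concurrent}} is thread safe on its own, so callers need no synchronized block around it:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration only; names are assumptions.
final class PendingPathsSketch {

  // Thread-safe queue; no external lock is needed for add() or poll().
  private final Queue<Long> pendingPaths = new ConcurrentLinkedQueue<>();

  void addPath(long id) {
    pendingPaths.add(id);
  }

  // Returns the next ID, or null when the queue is empty.
  Long nextPath() {
    return pendingPaths.poll();
  }

  int size() {
    return pendingPaths.size();
  }

  // Adds perThread IDs from each of several threads and returns the final size.
  static int fillFromThreads(int threads, int perThread) {
    PendingPathsSketch sketch = new PendingPathsSketch();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final long base = (long) t * perThread;
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          sketch.addPath(base + i);
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return -1;
      }
    }
    return sketch.size();
  }

  public static void main(String[] args) {
    System.out.println(fillFromThreads(4, 1000)); // prints 4000
  }
}
```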






[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14860:
--
Attachment: HDFS-14860.2.patch

> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of 
> external synchronization.






[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14860:
--
Status: Patch Available  (was: Open)

> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of 
> external synchronization.






[jira] [Commented] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java

2019-09-20 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934684#comment-16934684
 ] 

David Mollitor commented on HDFS-14860:
---

[~elgoiri] 

Are you talking about this?

{code:java}
  LOG.debug("Storage policy satisfier service is running outside namenode,"
  + " ignoring");
{code}

Both operands are string literals, so the compiler joins them into a single 
constant at compile time.  No concatenation happens at runtime.  Not to worry.
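
A quick way to check this, sketched with a hypothetical class named ConstantFoldingSketch: a concatenation of string literals is a constant expression, and constant strings are interned, so the folded value is the same object as the equivalent single literal.

```java
// Hypothetical illustration only; the class name is an assumption.
final class ConstantFoldingSketch {

  // Two literals joined with '+': a compile-time constant expression.
  static final String MESSAGE =
      "Storage policy satisfier service is running outside namenode,"
      + " ignoring";

  // Identity comparison (==) is intentional here: it is true only because
  // both sides are compile-time constants and therefore interned.
  static boolean foldedAtCompileTime() {
    String single =
        "Storage policy satisfier service is running outside namenode, ignoring";
    return MESSAGE == single;
  }

  public static void main(String[] args) {
    System.out.println(foldedAtCompileTime()); // prints true
  }
}
```

A guard can still be worthwhile when a log argument is expensive to compute, but that is not the case for a constant message like this one.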

> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14860.1.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of 
> external synchronization.






[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14865:
--
Attachment: HDFS-14865.1.patch

> Reduce Synchronization in DatanodeManager
> -
>
> Key: HDFS-14865
> URL: https://issues.apache.org/jira/browse/HDFS-14865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14865.1.patch
>
>







[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14865:
--
Status: Patch Available  (was: Open)

> Reduce Synchronization in DatanodeManager
> -
>
> Key: HDFS-14865
> URL: https://issues.apache.org/jira/browse/HDFS-14865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-14865.1.patch
>
>







[jira] [Created] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-20 Thread David Mollitor (Jira)
David Mollitor created HDFS-14865:
-

 Summary: Reduce Synchronization in DatanodeManager
 Key: HDFS-14865
 URL: https://issues.apache.org/jira/browse/HDFS-14865
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor









[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14865:
--
Flags: Patch

> Reduce Synchronization in DatanodeManager
> -
>
> Key: HDFS-14865
> URL: https://issues.apache.org/jira/browse/HDFS-14865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>







[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue

2019-09-20 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14864:
--
Status: Patch Available  (was: Open)

> DatanodeDescriptor Use Concurrent BlockingQueue
> ---
>
> Key: HDFS-14864
> URL: https://issues.apache.org/jira/browse/HDFS-14864
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14864.1.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106
> This collection needs to be thread safe and it needs to repeatedly poll the 
> queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method 
> just for this purpose:
> {quote}
> This operation may be more efficient than repeatedly polling this queue.
> {quote}
> [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)]
> Also, the collection returns 'null' if there is nothing to drain from the 
> queue.  This is a confusing and error-prone behavior.  It should just return an 
> empty list.  I've also updated the code to be more consistent and to return a 
> java {{List}} in all places instead of a {{List}} in some and a native array 
> in others.  This will make the entire usage much more consistent and safe.





