[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949769#comment-16949769 ]

David Mollitor commented on YARN-9863:
--------------------------------------

[~szegedim] Thank you for your feedback.

The background here is that I am working with a large cluster that has one job 
in particular that is crushing it. This one job is required to localize many 
resources, of varying file sizes, for the job to complete. As I understand 
YARN, when a job is submitted to the cluster, a list of files to localize is 
sent to each NodeManager involved in the job. In this case, all nodes are 
involved. All NodeManagers receive a carbon copy of the list of files from the 
ResourceManager (or maybe it's the 'yarn' client?). That is, they all have the 
same list, with the same ordering. Each NodeManager then iterates through the 
list and requests that each file be localized.

So, it would seem to me that all of the NodeManagers would request from HDFS 
file1, file2, file3, ...

This would have a stampeding effect on the HDFS DataNodes.
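To make that concrete, here is a hypothetical sketch (the class and file names are illustrative, not the actual NodeManager code) of the current behavior, where every node requests the same files in the same order:

{code:java}
import java.util.Arrays;
import java.util.List;

// Every NodeManager receives the same ordered list, so they all ask the
// DataNodes for file1 first, then file2, and so on.
public class OrderedLocalization {
  public static void main(String[] args) {
    List<String> resources = Arrays.asList("file1", "file2", "file3");
    for (String resource : resources) {
      System.out.println("requesting " + resource); // identical order on every node
    }
  }
}
{code}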

I am familiar with {{mapreduce.client.submit.file.replication}}. I understand 
that this is used to pump up the replication of the submitted files so that 
they are available on more DataNodes. However, the way it works, as I 
understand it, is that the file is first written to the HDFS cluster with the 
default replication (usually 3), and then the client requests that the file be 
replicated up to the target replication factor in a separate request (setrep). 
This replication happens asynchronously. If 
{{mapreduce.client.submit.file.replication}} is set to 10, for example, the job 
may be submitted and finished before the file actually reaches a replication 
of 10. This is exacerbated on larger clusters. If a cluster has 1,000 nodes, 
the recommended value of {{mapreduce.client.submit.file.replication}} is 
sqrt(1000), or ~32. The default number of concurrent requests each DataNode 
can serve is 10 ({{dfs.datanode.handler.count}}). So, even if the desired 
replication is achieved, that is 32 DataNodes x 10 handlers = 320 reads served 
at once. With 1,000 NodeManagers all requesting the same file at the same 
time, that is going to stall.
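As a back-of-envelope check on those numbers (the values are the assumed defaults and the sqrt(n) recommendation above, not measurements from a cluster):

{code:java}
// Back-of-envelope arithmetic for the scenario above.
public class ReplicationMath {
  public static void main(String[] args) {
    int nodes = 1000;
    int replication = (int) Math.round(Math.sqrt(nodes)); // ~32 (submit.file.replication)
    int handlersPerDataNode = 10;                         // dfs.datanode.handler.count
    int concurrentReads = replication * handlersPerDataNode;
    System.out.println(replication + " replicas x " + handlersPerDataNode
        + " handlers = " + concurrentReads + " concurrent reads, versus "
        + nodes + " NodeManagers all requesting at once");
  }
}
{code}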

By simply randomizing the list on each NodeManager, the load is spread across 
many different sets of 32 DataNodes, which better supports this scenario.
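A minimal sketch of the idea (names are illustrative; the actual patch touches {{LocalResourceBuilder}}): shuffle each NodeManager's copy of the list before requesting, so the first reads are spread across all of the files' DataNodes instead of piling onto the ones holding file1.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffledLocalization {
  public static void main(String[] args) {
    List<String> resources =
        new ArrayList<>(Arrays.asList("file1", "file2", "file3"));
    // A plain Random is enough; SecureRandom is unnecessary here (see point 2).
    Collections.shuffle(resources, new Random());
    for (String resource : resources) {
      System.out.println("requesting " + resource); // different order per node
    }
  }
}
{code}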

For your questions:
 # I'm not sure how HDFS would manage this. The requests are generated by the 
NodeManagers; the HDFS cluster is simply serving them and has no way to 
randomize the incoming requests.
 # Regarding SecureRandom: this is not a security-sensitive operation. It only 
requires a fast, reasonably good randomization of the list to spread the load.
 # I believe that the parallelism of localization is configurable with 
{{yarn.nodemanager.localizer.fetch.thread-count}} (default 4), but I believe 
that the requests are submitted to a work queue in order, so there will still 
be some level of trampling, especially when there are more than 4 files to 
localize (as is the case in the scenario I am reviewing); see the sketch below.
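Here is a hedged sketch of point 3, with a plain executor standing in for the localizer's fetch pool: even with 4 threads, tasks leave the queue in submission order, so every NodeManager still starts on the same first 4 files.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FetchQueue {
  public static void main(String[] args) {
    // 4 threads, mirroring yarn.nodemanager.localizer.fetch.thread-count.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<String> resources = Arrays.asList(
        "file1", "file2", "file3", "file4", "file5", "file6");
    for (String resource : resources) {        // submitted in list order
      pool.submit(() -> System.out.println("fetching " + resource));
    }
    pool.shutdown(); // files 5 and 6 wait behind the first four
  }
}
{code}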

> Randomize List of Resources to Localize
> ---------------------------------------
>
>                 Key: YARN-9863
>                 URL: https://issues.apache.org/jira/browse/YARN-9863
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of 
> resources to be shuffled randomly.  This will allow the Localizer to spread 
> the load of requests so that not all of the NodeManagers are requesting to 
> localize the same files, in the same order, from the same DataNodes.


