[
https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963398#comment-16963398
]
Miklos Szegedi commented on YARN-9863:
--------------------------------------
[~belugabehr], thank you for the feedback. In 2017 I ran some end-to-end tests
replicating files of a few gigabytes. HDFS first copies the file to one data
node. Once the replication factor is set, it starts streaming over full-duplex
links, at least based on my results, so no data node requires more than one
connection. The final replication count should be proportional to the node
count, so that connections do not become the limit when localizing; in some
cases data-local mapping may help as well. I do not remember the details well,
but I used an API to check the current replica count and waited for it to reach
the target. I can look it up if you are interested in the details.
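A minimal sketch of the "wait for the replica count" step mentioned above, not
the code used in the 2017 tests. The class ReplicationWait, the method
awaitReplication, and its parameters are hypothetical names. The replica count
is supplied by the caller; on HDFS it could plausibly be derived from
FileSystem.getFileBlockLocations and BlockLocation.getHosts, but the comment
does not confirm that this was the API used.

```java
import java.util.function.IntSupplier;

// Hypothetical helper: polls a caller-supplied replica count until it
// reaches the target or the timeout expires.
final class ReplicationWait {

  /**
   * Returns true once currentReplicas reports at least target replicas,
   * false if the timeout elapses or the thread is interrupted first.
   */
  static boolean awaitReplication(IntSupplier currentReplicas, int target,
      long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (currentReplicas.getAsInt() >= target) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt status for the caller and give up.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A caller would raise the replication factor first (for example with
FileSystem.setReplication) and only start localization once this returns true.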
[~snemeth], do you think this feature is required?
> Randomize List of Resources to Localize
> ---------------------------------------
>
> Key: YARN-9863
> URL: https://issues.apache.org/jira/browse/YARN-9863
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of
> resources to be shuffled randomly. This will allow the Localizer to spread
> the load of requests so that not all of the NodeManagers are requesting to
> localize the same files, in the same order, from the same DataNodes.
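
The shuffling proposed in the description could look roughly like the sketch
below. It is not taken from the attached patches; the class
ShuffledLocalization, the method shuffledCopy, and the sample file names are
hypothetical, and plain strings stand in for the real resource objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of randomizing the order in which resources
// are requested for localization.
final class ShuffledLocalization {

  /** Returns a shuffled copy of the list; the input list is left unchanged. */
  static <T> List<T> shuffledCopy(List<T> resources, Random random) {
    List<T> copy = new ArrayList<>(resources);
    Collections.shuffle(copy, random);
    return copy;
  }

  public static void main(String[] args) {
    List<String> resources =
        Arrays.asList("job.jar", "job.xml", "libs.tar.gz", "dict.txt");
    // Each NodeManager would use its own Random, so the request order
    // differs from node to node.
    System.out.println(shuffledCopy(resources, new Random()));
  }
}
```

Shuffling a copy keeps the caller's list intact, so the original ordering stays
available when the new parameter is disabled.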