[ 
https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734931#comment-14734931
 ] 

Jason Lowe commented on YARN-2410:
----------------------------------

Thanks for updating the patch!

bq. The only reason was findbugs which does not allow more than 7 parameters in 
a function call
Normally a builder pattern is used to make the code more readable in those 
situations.  However I don't think we need more than 7.  ReduceContext really 
only needs mapIds, reduceId, channelCtx, user, infoMap, and outputBasePathStr.  
The other two parameters are either known to be zero (should not be passed) and 
can be derived from another (size of mapIds).  As such we don't need 
SendMapOutputParams.

bq. The reduceContext is a variable holds the value set by the setAttachment() 
method and is used by the getAttachment() answer. If I declare it in the test 
method, it needs be final which cannot be done due to it being used by the 
setter.
createMockChannel can simply have a ReduceContext parameter, marked final, and 
that should solve that problem.  But I thought we were getting rid of the use 
of channel attachments and just associating the context with the listener 
directly?

Related to the last comment, we're still using channel attachments.  sendMap 
can just take a ReduceContext parameter, and the listener can provide its 
context when calling it.  No need for channel attachments.

This can NPE since we're checking for null after we already use it:
{noformat}
+            nextMap = sendMapOutput(
+                reduceContext.getSendMapOutputParams().getCtx(),
+                reduceContext.getSendMapOutputParams().getCtx().getChannel(),
+                reduceContext.getSendMapOutputParams().getUser(), mapId,
+                reduceContext.getSendMapOutputParams().getReduceId(), info);
+            nextMap.addListener(new ReduceMapFileCount(reduceContext));
+            if (null == nextMap) {
{noformat}

maxSendMapCount should be cached during serviceInit like the other conf-derived 
settings so we aren't doing conf lookups on every shuffle.

The indentation in sendMap isn't correct, as code is indented after a 
conditional block at the same level as the contents of the conditional block.  
There's other places that are over-indented.

MockShuffleHandler only needs to override one thing, getShuffle, but the mock 
that method returns has to override a bunch of stuff.  It makes more sense to 
create a separate class for the mocked Shuffle than the mocked ShuffleHandler.

Should the mock Future stuff be part of creating the mocked channel?  Can 
simply pass the listener list to use as an arg to the method that mocks up the 
channel.

Nit: SHUFFLE_MAX_SEND_COUNT should probably be something like 
SHUFFLE_MAX_SESSION_OPEN_FILES to better match the property name.  Similarly 
maxSendMapCount could have a more appropriate name.

Nit: Format for 80 columns

Nit: There's still instances where we have a class definition immediately after 
variable definitions and a lack of whitespace between classes and methods or 
between methods. Whitespace would help readability in those places.

> Nodemanager ShuffleHandler can possible exhaust file descriptors
> ----------------------------------------------------------------
>
>                 Key: YARN-2410
>                 URL: https://issues.apache.org/jira/browse/YARN-2410
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Nathan Roberts
>            Assignee: Kuhu Shukla
>         Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, 
> YARN-2410-v3.patch, YARN-2410-v4.patch, YARN-2410-v5.patch, YARN-2410-v6.patch
>
>
> The async nature of the shufflehandler can cause it to open a huge number of
> file descriptors, when it runs out it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about same time asking for their
> outputs. Each reducer will ask for all 40 map outputs over a single socket in 
> a
> single request (not necessarily all 40 at once, but with coalescing it is
> likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an 
> async transfer of the particular portion of this file(). This will 
> theoretically
> happen 6000*40=240000 times which will run the NM out of file descriptors and 
> cause it to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to