Github user kmadhugit commented on the pull request:
https://github.com/apache/spark/pull/7461#issuecomment-140028952
@andrewor14 , @shivaram , @mateiz
I debugged a little more and found that it is a timing issue that causes
context.getExecutorStorageStatus.length to return 1 in my test case. Executors
are registered asynchronously, so when treeAggregate calls
context.getExecutorStorageStatus, the executor registration process was still
under way and none of the registrations had completed. If I add a
Thread.sleep(5000) before calling treeAggregate, the test case passes, because
it gives the executors sufficient time to register with the application's
SparkContext.
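Rather than a fixed Thread.sleep(5000), the test could poll until the expected number of executors has registered. A minimal sketch of that idea (the name waitForExecutors and the polling interval are hypothetical, not part of Spark's API; currentCount stands in for something like sc.getExecutorStorageStatus.length):

```scala
// Hypothetical sketch: poll an executor count until it reaches a target
// or a timeout expires, instead of sleeping for a fixed duration.
def waitForExecutors(currentCount: () => Int,
                     target: Int,
                     timeoutMs: Long): Boolean = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (currentCount() < target && System.currentTimeMillis() < deadline) {
    Thread.sleep(50) // back off briefly between checks
  }
  currentCount() >= target
}
```

This keeps the test fast when registration is quick and still bounds the wait when it is slow.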
This scenario would rarely occur in an actual customer application, and we can
always skip this optimization when the number of executors is below a threshold
value. Now, the question is: should we go ahead with the fix, adding a test
case with the delay and a sanity check for this specific case in treeAggregate?
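As a sketch of the kind of sanity check being discussed (the threshold of 4 and the log2-based depth formula are illustrative assumptions, not Spark's actual implementation; treeAggregate's default depth is 2):

```scala
// Illustrative sketch: fall back to treeAggregate's default depth when too
// few executors have registered, otherwise scale the depth with the
// (approximate) executor count. Threshold and formula are assumed values.
def chooseDepth(numExecutors: Int, minExecutors: Int = 4): Int =
  if (numExecutors < minExecutors) 2 // default treeAggregate depth
  else math.max(2, (math.log(numExecutors.toDouble) / math.log(2.0)).round.toInt)
```

Because the depth only needs to be roughly right, a slightly stale executor count from getExecutorStorageStatus would still produce a reasonable value here.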
Although there is no way to predict the exact number of executors, the count is
unlikely to change drastically between the point where we call
context.getExecutorStorageStatus.length and the reduceByKey() inside
treeAggregate. An approximate executor count should be sufficient for this
optimization, since a minor change in the number of executors should not cause
any optimization or functional issues.
Let me know.