Github user kmadhugit commented on the pull request:

    https://github.com/apache/spark/pull/7461#issuecomment-140028952
  
    @andrewor14 , @shivaram ,  @mateiz 
    
    I debugged a little more and found that it is a timing issue that causes 
context.getExecutorStorageStatus.length to return 1 in my test case. Executors 
are registered asynchronously, so when treeAggregate calls 
context.getExecutorStorageStatus, the executor registration process was still 
in progress and none of the registrations had completed. If I add a 
`Thread.sleep(5000)` before calling treeAggregate, the test case passes, since 
the delay gives the executors sufficient time to register with the 
application's SparkContext.
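    A fixed sleep in the test could be replaced by polling until the expected 
executor count is reached or a timeout expires. A minimal sketch, with 
`currentCount` standing in for `sc.getExecutorStorageStatus.length - 1` (the 
helper name and shape are illustrative, not Spark API):

```scala
// Poll until `expected` executors have registered or `timeoutMs` elapses.
// Returns true if the expected count was reached before the deadline.
def waitForExecutors(expected: Int, timeoutMs: Long)(currentCount: () => Int): Boolean = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (currentCount() < expected && System.currentTimeMillis() < deadline) {
    Thread.sleep(100) // back off briefly between checks
  }
  currentCount() >= expected
}
```

    This keeps the test fast when registration is quick and still bounds the 
wait when it is slow.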
    
    This scenario would rarely occur in an actual customer application, and we 
can always skip this optimization if the number of executors is below a 
threshold value. Now, the question is: should we go ahead with the fix by 
adding a test case with the delay and a sanity check for this specific case in 
treeAggregate?
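    The sanity check could look something like the following sketch, where the 
function name and parameters are illustrative rather than Spark API: if the 
reported executor count is below a threshold, fall back to a safe default 
instead of trusting a count taken while registration is still in progress.

```scala
// Hypothetical guard: choose the scale used by the optimization, falling
// back to a default when too few executors have registered yet.
def effectiveScale(numExecutors: Int, threshold: Int, default: Int): Int =
  if (numExecutors < threshold) default // registration likely still in progress
  else numExecutors                     // enough executors to trust the count
```

    With threshold 2, a premature count of 1 would simply yield the default, so 
the optimization degrades gracefully rather than misbehaving.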
    
    Though there is no way to predict the exact number of executors, the count 
is unlikely to change drastically between the point where we call 
context.getExecutorStorageStatus.length and the reduceByKey() inside 
treeAggregate. An approximate executor count should be sufficient for this 
optimization, since a minor change in the number of executors shouldn't cause 
any optimization or functional issues.
    
    Let me know.


