Ngone51 commented on pull request #28839:
URL: https://github.com/apache/spark/pull/28839#issuecomment-645234503


   Hi @sarutak, thanks for reporting and the fix.
   
   First of all, I think it's very unlikely that we hit the locality wait 
timeout (default 3s), since 3s is still very long for such a unit test. 
   
   After checking the 
[log](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124086/testReport/org.apache.spark.scheduler/BarrierTaskContextSuite/SPARK_31485__barrier_stage_should_fail_if_only_partial_tasks_are_launched/),
 I believe the real root cause should be:
   
   Two test cases from different test suites were submitted at the same time 
because of concurrent execution. In this particular case, the two test cases 
(from DistributedSuite and BarrierTaskContextSuite) both launch in 
local-cluster mode. The two applications were submitted at the SAME time, so 
they got the same application ID (app-20200615210132-0000). Thus, when the 
cluster of BarrierTaskContextSuite was launching executors, it failed to create 
the directories for executors 0 and 1, because the path 
(/home/jenkins/workspace/work/app-app-20200615210132-0000/0) was already in use 
by the cluster of DistributedSuite. Therefore, it had to launch executors 2 and 
3 instead. As a result, none of the tasks could get its preferred locality, so 
they were scheduled together, which led to the test failure.
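
   To illustrate the collision described above, here is a minimal sketch. It 
assumes the application ID is a second-granularity timestamp plus a per-master 
counter, which is inferred only from the shape of the observed ID. `FakeMaster` 
and `new_application_id` are hypothetical names for illustration, not Spark's 
actual classes.

```python
from datetime import datetime


class FakeMaster:
    """Hypothetical stand-in for a local-cluster master (not Spark's real class)."""

    def __init__(self):
        # Assumption: each master keeps its own counter, starting at 0.
        self.next_app_number = 0

    def new_application_id(self, submit_time):
        # Assumption: ID = "app-" + timestamp (to the second) + 4-digit counter.
        app_id = "app-%s-%04d" % (
            submit_time.strftime("%Y%m%d%H%M%S"),
            self.next_app_number,
        )
        self.next_app_number += 1
        return app_id


# Two suites, each with its own master, submit within the same second.
submit_time = datetime(2020, 6, 15, 21, 1, 32)
distributed_suite_master = FakeMaster()
barrier_suite_master = FakeMaster()

id_a = distributed_suite_master.new_application_id(submit_time)
id_b = barrier_suite_master.new_application_id(submit_time)

# Both masters hand out the same ID, so both clusters would target the same
# per-application work directory if they share a work root.
print(id_a, id_b, id_a == id_b)
```

   Under these assumptions, neither master can see the other's counter, so 
nothing prevents both from producing an identical ID within the same second.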
   
   You can download the log from 
`https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124086/artifact/core/`
 and search for appId app-20200615210132-0000 to confirm the root cause.
   
   
   The right fix, I think, is to use the dynamic executor ID from the 
SparkContext instead of hardcoding it. I'd like to open a separate PR for the 
fix if you don't mind.
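
   As a rough sketch of why the hardcoded IDs break, the snippet below compares 
locality preferences against the executors that actually came up. The helper 
`matching_preferences` is hypothetical and does not call any Spark API. It only 
models the idea that a preference is useful when it names a live executor.

```python
def matching_preferences(preferred_ids, live_ids):
    """Return the preferred executor IDs that correspond to a live executor."""
    live = set(live_ids)
    return [executor_id for executor_id in preferred_ids if executor_id in live]


# In the failing run, executors 0 and 1 could not start, so 2 and 3 were used.
live_ids = ["2", "3"]

# Hardcoded preferences assume the executors are always "0" and "1".
hardcoded = ["0", "1"]

# Dynamic preferences are derived from whatever executors are actually running
# (in the real test these would come from the SparkContext).
dynamic = list(live_ids)

print(matching_preferences(hardcoded, live_ids))  # no preference can be honored
print(matching_preferences(dynamic, live_ids))    # every preference can be honored
```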




