GitHub user rezasafi opened a pull request:

    https://github.com/apache/spark/pull/19848

    [SPARK-22162] Executors and the driver should use consistent JobIDs in the 
RDD commit protocol

    I have modified SparkHadoopWriter so that the executors and the driver 
always use consistent JobIDs during the Hadoop commit. Before SPARK-18191, 
Spark always used the rddId; the variable was just incorrectly named stageId. 
After SPARK-18191, it used the rddId on the driver and the stageId on the 
executors. With this change, FileCommitProtocol now has a commitTask method 
that receives the rddId as the JobID in addition to the stageId. Then, during 
the Hadoop commit protocol, the jobId is used by Hadoop while Spark can still 
use the stageId as before. This way the executors and the driver consistently 
use the stageId.
    In addition to the existing unit tests, a test has been added to check 
whether the executors and the driver use the same JobID. The test failed 
before this change and passes after applying this fix.
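    The ID mismatch and the fix can be sketched as follows. This is a minimal 
illustration under assumed names (CommitIdSketch and its methods are 
hypothetical, not Spark's actual FileCommitProtocol API):

```java
// Minimal sketch of the JobID mismatch fixed by this PR.
// All names here are hypothetical, not Spark's real classes.
public class CommitIdSketch {

    // Before the fix: the driver derived the Hadoop JobID from the rddId,
    // while executors derived it from the stageId, so they could disagree.
    static int driverJobId(int rddId) { return rddId; }
    static int executorJobIdOld(int stageId) { return stageId; }

    // After the fix: the driver's id is threaded through commitTask,
    // so executors use the same JobID during the Hadoop commit.
    static int executorJobIdNew(int jobIdFromDriver) { return jobIdFromDriver; }

    public static void main(String[] args) {
        int rddId = 42, stageId = 7;  // distinct ids, as in a real multi-stage job
        // Old behavior: the two sides commit under different JobIDs.
        System.out.println(driverJobId(rddId) == executorJobIdOld(stageId));
        // New behavior: both sides see the id chosen by the driver.
        System.out.println(driverJobId(rddId) == executorJobIdNew(rddId));
    }
}
```

    The point of the change is simply that the JobID is chosen once (on the 
driver) and passed down, rather than re-derived independently on each side.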

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rezasafi/spark stagerddsimple

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19848.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19848
    
----
commit 4dbdbe77435630e3b35581c59189ec75c9c2484d
Author: Reza Safi <rezas...@cloudera.com>
Date:   2017-11-28T23:03:37Z

    [SPARK-22162] Executors and the driver should use consistent JobIDs in the 
RDD commit protocol

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org