GitHub user httfighter opened a pull request:

    https://github.com/apache/spark/pull/22487

    [SPARK-25477] “INSERT OVERWRITE LOCAL DIRECTORY”, the data files 
allocated on the non-driver node will not be written to the specified output 
directory
    
    ## What changes were proposed in this pull request?
    The "INSERT OVERWRITE LOCAL DIRECTORY" feature uses a local staging 
directory to load data into the specified output directory. As a result, data 
files allocated on non-driver nodes are not written to the specified output 
directory. 
    In saveAsHiveFile.scala, the code uses the output directory to decide 
whether to use a local or a distributed staging directory. This change modifies 
the first argument passed to getStagingDir() from 
    "new Path(extURI.getScheme, extURI.getAuthority, extURI.getPath)" to "new 
Path(extURI.getPath)".
    
    If Spark is configured with a distributed storage system, the staging 
directory is created there; otherwise it is created on the local file system. 
The location is thus selected automatically instead of being dictated by the 
output directory.
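
    The effect of dropping the scheme and authority can be sketched without 
Hadoop on the classpath. The following is only an illustration, not the code in 
saveAsHiveFile.scala: it models path qualification with java.net.URI, and the 
names StagingPathSketch, defaultFs, qualify, stagingBaseBefore and 
stagingBaseAfter, as well as the hdfs://namenode:8020 address and the 
/tmp/out directory, are assumptions made up for the example.

```scala
import java.net.URI

object StagingPathSketch {
  // Hypothetical stand-in for the cluster's default file system (fs.defaultFS).
  val defaultFs: URI = new URI("hdfs://namenode:8020")

  // Simplified model of path qualification: a path that already carries a
  // scheme keeps it; a scheme-less path inherits the default file system.
  def qualify(path: URI): URI =
    if (path.getScheme != null) path
    else new URI(defaultFs.getScheme, defaultFs.getAuthority, path.getPath, null, null)

  // Old behaviour: scheme and authority of the output directory are carried over.
  def stagingBaseBefore(extURI: URI): URI =
    qualify(new URI(extURI.getScheme, extURI.getAuthority, extURI.getPath, null, null))

  // Proposed behaviour: only the path component of the output directory is kept.
  def stagingBaseAfter(extURI: URI): URI =
    qualify(new URI(null, null, extURI.getPath, null, null))

  def main(args: Array[String]): Unit = {
    val extURI = new URI("file:///tmp/out") // hypothetical local output directory
    println(s"before: ${stagingBaseBefore(extURI)}")
    println(s"after:  ${stagingBaseAfter(extURI)}")
  }
}
```

    In this model the "before" variant stays on the file scheme, so each 
executor would stage into its own local disk, while the "after" variant 
resolves against the default file system that all nodes share.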
    
    ## How was this patch tested?
    Manual tests.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/httfighter/spark SPARK-25477

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22487.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22487
    
----
commit 8fe6d095fd2ce1a1a129a46345b1cecf6df70d8c
Author: 韩田田00222924 <han.tiantian@...>
Date:   2018-09-20T07:57:06Z

    [SPARK-25477] “INSERT OVERWRITE LOCAL DIRECTORY”, the data files 
allocated on the non-driver node will not be written to the specified output 
directory

----


