GitHub user jinxing64 opened a pull request:

    https://github.com/apache/spark/pull/21289

    [SPARK-24240] Add a config to control whether InMemoryFileIndex should 
update cache when refresh.

    ## What changes were proposed in this pull request?
    In the current code
(https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L172),
after data is inserted, Spark always refreshes the file index and updates the
cache. If the target table has a very large number of files, the job can take
a long time and run into OOM issues. This PR proposes adding a config to
control whether `InMemoryFileIndex` updates the cache on refresh.
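
To illustrate the trade-off, here is a minimal toy model in Python. It is not Spark code: `ToyFileIndex`, `update_cache_on_refresh`, and `listing_count` are hypothetical names chosen for this sketch, and it assumes one possible reading of the proposal, namely that a refresh always drops stale entries but only re-lists files eagerly when the flag is on.

```python
class ToyFileIndex:
    """Toy stand-in for a cached file index (not Spark's InMemoryFileIndex)."""

    def __init__(self, list_files, update_cache_on_refresh=True):
        self._list_files = list_files              # callable doing the expensive listing
        self._update_cache_on_refresh = update_cache_on_refresh  # hypothetical flag
        self._cache = None                         # None means "nothing cached"
        self.listing_count = 0                     # how many expensive listings ran

    def _load(self):
        # Simulates the costly step: listing every file of the table.
        self.listing_count += 1
        self._cache = list(self._list_files())

    def refresh(self):
        # Always invalidate, so stale entries are never served.
        self._cache = None
        # Re-list eagerly only when the flag asks for it.
        if self._update_cache_on_refresh:
            self._load()

    def all_files(self):
        # Lazy path: list on first read after an invalidation.
        if self._cache is None:
            self._load()
        return self._cache


files = ["part-0"]
eager = ToyFileIndex(lambda: files, update_cache_on_refresh=True)
lazy = ToyFileIndex(lambda: files, update_cache_on_refresh=False)

files.append("part-1")   # simulates an insert writing a new file
eager.refresh()          # pays for a full listing right after the insert
lazy.refresh()           # only invalidates; no listing yet
```

In this sketch the eager index performs a listing on every refresh, while the lazy one defers that cost until `all_files()` is next called, which is the kind of saving the description is after for tables with many files.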
    
    ## How was this patch tested?
    
    To be added


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-24240

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21289
    
----
commit b49665d16be723a7abe9fdfa9ea600bd7be349df
Author: jinxing <jinxing6042@...>
Date:   2018-05-10T08:00:24Z

    [SPARK-24240] Add a config to control whether InMemoryFileIndex should 
update cache when refresh.

----

