Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5622#discussion_r32041847
  
    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java ---
    @@ -107,6 +107,14 @@ public void insertAll(Iterator<Product2<K, V>> records) throws IOException {
             blockManager.diskBlockManager().createTempShuffleBlock();
           final File file = tempShuffleBlockIdPlusFile._2();
           final BlockId blockId = tempShuffleBlockIdPlusFile._1();
    +      // Note that we purposely do not call open() on the disk writers here; DiskBlockObjectWriter
    +      // will automatically open() itself if necessary. This is an optimization to avoid file
    +      // creation and truncation for empty partitions; this optimization probably doesn't make sense
    +      // for most realistic production workloads, but it can make a large difference when playing
    +      // around with Spark SQL queries in spark-shell on toy datasets: if you performed a query over
    +      // an extremely small number of records then Spark SQL's default parallelism of 200 would
    +      // result in slower out-of-the-box performance due to these constant-factor overheads. This
    +      // optimization speeds up local microbenchmarking and SQL unit tests.
           partitionWriters[i] =
             blockManager.getDiskWriter(blockId, file, serInstance, fileBufferSize, writeMetrics).open();
    --- End diff ---
    
    looks like you are still calling `open()` here :)
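
    For reference, a minimal sketch of how that assignment would presumably read once the eager `open()` is dropped, so the code matches the new comment (an illustration against the diff above, not the actual follow-up change):

        // Create the writer but do not open it eagerly; per the comment above,
        // DiskBlockObjectWriter is expected to open() itself on first use, so
        // empty partitions never create or truncate a file.
        partitionWriters[i] =
          blockManager.getDiskWriter(blockId, file, serInstance, fileBufferSize, writeMetrics);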


