GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/22427

    [SPARK-25438][SQL][TEST] Fix FilterPushdownBenchmark to use the same memory assumption

    ## What changes were proposed in this pull request?
    
    This PR aims to fix three things in `FilterPushdownBenchmark`.
    
    **1. Use the same memory assumption.**
    The following configurations are used in ORC and Parquet.
    
    - Memory buffer for writing
      - parquet.block.size (default: 128MB)
      - orc.stripe.size (default: 64MB)
    
    - Compression chunk size
      - parquet.page.size (default: 1MB)
      - orc.compress.size (default: 256KB)
    
    SPARK-24692 used 1MB, the default value of `parquet.page.size`, for
    `parquet.block.size` and `orc.stripe.size`, but it did not set
    `orc.compress.size` to match. As a result, the current benchmark compares
    ORC with a 256KB compression chunk against Parquet with a 1MB one. For a
    fair comparison, both formats need consistent settings.
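
    As an illustration only, the shared memory assumption can be sketched as
    a pair of option maps that use one size for both the write buffer and the
    compression chunk. The helper name `writer_options` and the constant
    `BLOCK_SIZE` are hypothetical and are not taken from the benchmark code;
    only the four configuration keys come from the list above.

```python
# Hypothetical sketch: not the benchmark's actual code.
BLOCK_SIZE = 1024 * 1024  # 1MB, the default value of parquet.page.size


def writer_options(size_bytes=BLOCK_SIZE):
    """Return (orc_options, parquet_options) that share one size."""
    orc = {
        "orc.stripe.size": size_bytes,    # write buffer (default: 64MB)
        "orc.compress.size": size_bytes,  # compression chunk (default: 256KB)
    }
    parquet = {
        "parquet.block.size": size_bytes,  # write buffer (default: 128MB)
        "parquet.page.size": size_bytes,   # compression chunk (default: 1MB)
    }
    return orc, parquet


orc_opts, parquet_opts = writer_options()
```

    With PySpark, such maps could presumably be passed to a writer as
    `df.write.options(**orc_opts).orc(path)`; that usage is an assumption and
    is not part of this PR.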
    
    **2. Dictionary encoding should not be enforced for all cases.**
    SPARK-24206 enforced dictionary encoding for all test cases. This PR
    restores the default behavior in general and enforces dictionary encoding
    only for `prepareStringDictTable`.
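
    A minimal sketch of that toggle follows, assuming the ORC writer key
    `orc.dictionary.key.threshold` is the knob involved (ORC documents a
    default of 0.8 and treats 1.0 as "always use dictionary encoding"). The
    function name `dictionary_options` is hypothetical, and whether this PR
    uses exactly this key is an assumption.

```python
# Hypothetical sketch: the key choice is an assumption, not taken from the PR.
ORC_DEFAULT_DICT_THRESHOLD = 0.8  # ORC's documented default


def dictionary_options(use_dictionary=False):
    """Force dictionary encoding only when explicitly requested."""
    threshold = 1.0 if use_dictionary else ORC_DEFAULT_DICT_THRESHOLD
    return {"orc.dictionary.key.threshold": threshold}


default_opts = dictionary_options()                      # most test cases
string_dict_opts = dictionary_options(use_dictionary=True)  # dictionary table only
```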
    
    **3. Generate test result on AWS r3.xlarge**
    SPARK-24206 generated the result on AWS so that it can be reproduced and
    compared easily. For the same reason, this PR regenerates the result on
    the same machine. Specifically, AWS r3.xlarge with Instance Store is used.
    
    ## How was this patch tested?
    
    Manual. Enable the test cases and run `FilterPushdownBenchmark` on `AWS 
r3.xlarge`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-25438

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22427.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22427
    
----
commit fb14cd5829f431593db71b1b5ec06dd0957791ad
Author: Dongjoon Hyun <dongjoon@...>
Date:   2018-09-15T04:21:54Z

    [SPARK-25438][SQL][TEST] Fix FilterPushdownBenchmark to use the same memory assumption

----


---
