Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/758#issuecomment-43180729
  
    @marmbrus I worked around the test failure by adding a `SortedOperation` pattern that conservatively matches only *some* definitely sorted operations (it may produce false negatives, but never false positives). This may slow down the test suite a bit, but since most test outputs are empty or very small, it shouldn't be an issue right now.
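
    For reference, a minimal sketch of what such a conservative extractor could look like, assuming Catalyst's logical plan nodes (`Sort`, `Project`, `Filter`); the object name and the exact cases are illustrative, not the PR's actual code:

    ```scala
    import org.apache.spark.sql.catalyst.plans.logical._

    // Conservative extractor: it only claims "sorted" when the plan provably is.
    // Anything it is unsure about falls through to None (a false negative),
    // so an unsorted plan is never mistaken for a sorted one.
    object SortedOperation {
      def unapply(plan: LogicalPlan): Option[LogicalPlan] = plan match {
        case s: Sort           => Some(s)         // an explicit Sort is definitely sorted
        case Project(_, child) => unapply(child)  // projection preserves row order
        case Filter(_, child)  => unapply(child)  // filter preserves row order
        case _                 => None            // unknown: assume not sorted
      }
    }
    ```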
    
    Two new optimizations were applied (sketched below):
    
    - Using mutable pairs
    - Avoiding the extractor calls incurred by pattern matching (`Array.unapplySeq`)
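
    A minimal sketch of what these two changes amount to in a row-parsing hot loop, assuming Spark's `org.apache.spark.util.MutablePair`; the surrounding method is made up for illustration:

    ```scala
    import org.apache.spark.util.MutablePair

    // Before: `val Array(key, value) = line.split(",")` compiles into a call to
    // Array.unapplySeq plus a fresh Tuple2 per row.
    // After: plain array indexing and a single MutablePair reused across rows.
    def parsePairs(lines: Iterator[String]): Iterator[MutablePair[String, String]] = {
      val pair = new MutablePair[String, String]("", "")  // allocated once, mutated per row
      lines.map { line =>
        val fields = line.split(",")                      // no extractor call
        pair.update(fields(0), fields(1))                 // updates in place, returns the same pair
      }
    }
    ```

    (Reusing one mutable pair is only safe as long as the consumer doesn't hold on to previous rows, e.g. right before serialization.)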
    
    New micro benchmark data:
    
    ```
    Original:
    
    [info] CSV: 27676 ms, RCFile: 26415 ms
    [info] CSV: 27703 ms, RCFile: 26029 ms
    [info] CSV: 27511 ms, RCFile: 25962 ms
    
    Optimized:
    
    [info] CSV: 12357 ms, RCFile: 9283 ms
    [info] CSV: 12291 ms, RCFile: 9298 ms
    [info] CSV: 12325 ms, RCFile: 9242 ms
    ```
    
    As for Hive data unwrapping, I couldn't find a "static" way to eliminate it right now. Any hints?

