GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/19948

    [SPARK-19809][SQL][TEST] NullPointerException on zero-size ORC file

    ## What changes were proposed in this pull request?
    
    Until 2.2.1, Spark raises a `RuntimeException` on zero-size ORC files.
    These zero-size ORC files are usually generated by third-party applications
    such as Flume. Since Apache ORC 1.4.1 (ORC-162), the ORC library correctly
    treats such a file as an empty file.
    
    ```scala
    scala> sql("create table empty_orc(a int) stored as orc location '/tmp/empty_orc'")
    
    $ touch /tmp/empty_orc/zero.orc
    
    scala> sql("select * from empty_orc").show
    java.lang.RuntimeException: serious problem
    ```
    
    After [SPARK-22279](https://github.com/apache/spark/pull/19499), Apache
    Spark with the default configuration no longer has this bug. Although the
    Hive 1.2.1 library code path still has the problem, we should add test
    coverage for the current behavior to prevent future regressions.
    
    ## How was this patch tested?
    
    Verified by a newly added test case.
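    
    For illustration only, the scenario such a test covers can be sketched as a
    standalone program. This is not the test added in this PR: the object name
    `ZeroSizeOrcCheck`, the temporary directory, and the local `SparkSession`
    setup are all assumptions. The sketch reports the outcome rather than
    asserting one, because the result depends on which ORC code path the
    configuration selects.
    
    ```scala
    import java.nio.file.Files
    import scala.util.{Failure, Success, Try}
    import org.apache.spark.sql.SparkSession

    // Hypothetical reproduction harness; assumes spark-sql and spark-hive
    // are on the classpath so that `STORED AS ORC` tables can be created.
    object ZeroSizeOrcCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("zero-size-orc-check")
          .enableHiveSupport()
          .getOrCreate()

        // Scratch directory standing in for /tmp/empty_orc.
        val dir = Files.createTempDirectory("empty_orc")
        try {
          spark.sql(
            s"CREATE TABLE empty_orc(a INT) STORED AS ORC LOCATION '${dir.toUri}'")

          // Equivalent of `touch zero.orc`: a file containing no bytes at all.
          Files.createFile(dir.resolve("zero.orc"))

          // Report either the row count or the exception raised by the read.
          Try(spark.sql("SELECT * FROM empty_orc").count()) match {
            case Success(n) => println(s"read OK, row count = $n")
            case Failure(e) =>
              println(s"read failed: ${e.getClass.getName}: ${e.getMessage}")
          }
        } finally {
          spark.sql("DROP TABLE IF EXISTS empty_orc")
          spark.stop()
        }
      }
    }
    ```
    
    Running it once per ORC code path would show whether a given path treats
    the zero-size file as empty or fails on it.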

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-19809-EMPTY-FILE

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19948.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19948
    
----
commit 492a30ff673372b04455cd3bb454701961c65760
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2017-12-12T01:42:10Z

    [SPARK-19809][SQL][TEST] NullPointerException on zero-size ORC file

----


