[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 Hi, @cloud-fan . As you adviced, I will replace old ORC in the current namespace and will try to move to `sql/core` later. Although, we cannot switch among old ORC and new ORC, we can bring

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 So far, the current ORC related code looks too old and tightly integrated with `hive-exec-1.2.1.spark2.jar` and `hive` module side-by-side. The patch also need to touch every part because

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 @cloud-fan . I'll rethink about consolidation the old and the new. Thank you for the advice! --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 In case of `OrcFilters.scala`, the API is changed like the following. ``` - Some(builder.startAnd().isNull(attribute).end()) + Some(builder.startAnd().isNull(attribute,

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 The goal is using ORC with `-Phive`. You can build Spark and use ORC datasource. Previously, `org.apache.spark.sql.hive.orc.ORCFileFormat` is tightly coupled with Hive code outside

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18953 Are the ORC APIs changed a lot in 1.4? I was expecting a small patch to upgrade the current ORC data source, without moving it to sql/core. --- If your project is set up for it, you can reply to

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18953 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80722/ Test PASSed. ---

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80722/testReport)** for PR 18953 at commit

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 For the reader, there are three part. 1. OrcColumnarBatchReader: It's not included here. 2. **OrcRecordIterator**: It's included here. It doesn't not use Spark vectorization. 3.

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 This PR is about 1,100 lines and #17980 is about 3,833. I also updated #17980 today, too. If you want to review that PR, that is also great! --- If your project is set up for it, you can

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 Hi, @cloud-fan . In my email, I wrote in the following order . 1. SPARK-21422: Depend on Apache ORC 1.4.0 2. SPARK-20682: Add a new faster ORC data source based on

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18953 what's the project plan for this ORC stuff? shall we move the old orc data source to sql/core with orc 1.4 first, and then send a new PR for vectorized reader? --- If your project is set up for

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80722/testReport)** for PR 18953 at commit

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 Retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18953 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18953 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80721/ Test FAILed. ---

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80721 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80721/testReport)** for PR 18953 at commit

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 Hi, @cloud-fan , @gatorsmile , @rxin , @sameeragarwal , and @viirya . Could you review this ORC PR? I narrow down the focus and reduce the size of PR. For review purpose, I replace the

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18953 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80710/ Test PASSed. ---

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80710/testReport)** for PR 18953 at commit

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80721 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80721/testReport)** for PR 18953 at commit

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 Rebased to the master since #18640 is merged. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18953 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18953 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80707/ Test FAILed. ---

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80707/testReport)** for PR 18953 at commit

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80710/testReport)** for PR 18953 at commit

[GitHub] spark issue #18953: [SPARK-20682][SQL] Implement new ORC data source based o...

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80707/testReport)** for PR 18953 at commit