Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
Hi, @cloud-fan . As you advised, I will replace the old ORC in the current
namespace and will try to move it to `sql/core` later. Although we cannot switch
between the old ORC and the new ORC, we can bring
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
So far, the current ORC-related code looks too old and is tightly integrated
with `hive-exec-1.2.1.spark2.jar` and the `hive` module side by side.
The patch also needs to touch every part because
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
@cloud-fan . I'll rethink consolidating the old and the new. Thank
you for the advice!
---
If your project is set up for it, you can reply to this email and have your
reply appear on
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
In the case of `OrcFilters.scala`, the API has changed as follows.
```
- Some(builder.startAnd().isNull(attribute).end())
+ Some(builder.startAnd().isNull(attribute,
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
The goal is to use ORC with `-Phive`. You can build Spark and use the ORC
data source.
Previously, `org.apache.spark.sql.hive.orc.ORCFileFormat` was tightly
coupled with Hive code outside
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18953
Did the ORC APIs change a lot in 1.4? I was expecting a small patch to
upgrade the current ORC data source, without moving it to sql/core.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18953
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18953
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80722/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18953
**[Test build #80722 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80722/testReport)**
for PR 18953 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
For the reader, there are three parts.
1. OrcColumnarBatchReader: It's not included here.
2. **OrcRecordIterator**: It's included here. It doesn't use Spark
vectorization.
3.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
This PR is about 1,100 lines and #17980 is about 3,833.
I also updated #17980 today. If you want to review that PR, that would
also be great!
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
Hi, @cloud-fan .
In my email, I listed them in the following order.
1. SPARK-21422: Depend on Apache ORC 1.4.0
2. SPARK-20682: Add a new faster ORC data source based on
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18953
What's the project plan for this ORC work? Shall we move the old ORC data
source to sql/core with ORC 1.4 first, and then send a new PR for the vectorized
reader?
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18953
**[Test build #80722 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80722/testReport)**
for PR 18953 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
Retest this please.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18953
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18953
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80721/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18953
**[Test build #80721 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80721/testReport)**
for PR 18953 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
Hi, @cloud-fan , @gatorsmile , @rxin , @sameeragarwal , and @viirya .
Could you review this ORC PR? I narrowed the focus and reduced the size
of the PR.
For review purposes, I replaced the
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18953
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18953
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80710/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18953
**[Test build #80710 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80710/testReport)**
for PR 18953 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18953
**[Test build #80721 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80721/testReport)**
for PR 18953 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18953
Rebased onto master since #18640 was merged.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18953
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18953
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80707/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18953
**[Test build #80707 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80707/testReport)**
for PR 18953 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18953
**[Test build #80710 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80710/testReport)**
for PR 18953 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18953
**[Test build #80707 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80707/testReport)**
for PR 18953 at commit