[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you, @gatorsmile !!! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18640 Thanks! Merging to master.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you so much, @rxin , @cloud-fan , @sameeragarwal , @mridulm , @viirya !
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 lgtm
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @cloud-fan , @rxin , @sameeragarwal and @mridulm . Could you merge this PR?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you again, @viirya .
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18640 LGTM
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18640 LGTM besides some minor questions, @rxin any more comments on this?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @sameeragarwal and @mridulm . I cannot see any clear reason for the objection here. Also, there is positive feedback from @ash211 on dev@spark, too. This PR will definitely bring an improvement. Could you merge this PR so that Apache Spark can move forward?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80576/ Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80576/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80576/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Retest this please.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @mridulm, @sameeragarwal , and @rxin . Please let me know if there is something for me to do here.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you so much, @sameeragarwal .
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/18640 LGTM; unless @rxin still has some strong objections?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80466/ Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80466/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80466/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Retest this please.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 @rxin . Could you make a decision on this PR? Do we still need to put this into `sql/hive` for some reason?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Sure. Thank you so much, @omalley !
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user omalley commented on the issue: https://github.com/apache/spark/pull/18640 I would also comment that in the long term, Spark should move to using the vectorized reader in ORC's core. That would remove the dependence on ORC's mapreduce module, which provides row by row shims on top of the vectorized reader.
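To make the "row by row shims on top of the vectorized reader" remark concrete, here is a minimal, dependency-free sketch of the general pattern: a row iterator that pulls whole batches from an underlying reader and hands values out one at a time. Everything in it (`RowShimSketch`, `BatchReader`, `RowIterator`, `fromBatches`, `readAllRows`) is a hypothetical name invented for illustration; it is not ORC's or Spark's API and only assumes that a batch reader returns `null` when it is exhausted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class RowShimSketch {

    /** Hypothetical batch source: returns the next batch of values, or null when exhausted. */
    interface BatchReader {
        long[] nextBatch();
    }

    /** Row-by-row view layered on top of a batch reader (the "shim"). */
    static final class RowIterator implements Iterator<Long> {
        private final BatchReader reader;
        private long[] batch = new long[0]; // current batch; starts empty
        private int pos = 0;                // index of the next unread value in the batch
        private boolean done = false;       // set once the reader reports exhaustion

        RowIterator(BatchReader reader) {
            this.reader = reader;
        }

        @Override
        public boolean hasNext() {
            // Refill until a non-empty batch is available or the reader is exhausted.
            while (!done && pos >= batch.length) {
                long[] next = reader.nextBatch();
                if (next == null) {
                    done = true;
                } else {
                    batch = next;
                    pos = 0;
                }
            }
            return !done;
        }

        @Override
        public Long next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return batch[pos++]; // each value is boxed and returned individually
        }
    }

    /** Builds a batch reader over in-memory batches (stand-in for a real file reader). */
    static BatchReader fromBatches(List<long[]> batches) {
        Iterator<long[]> it = batches.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Drains a batch reader through the row-by-row shim. */
    static List<Long> readAllRows(BatchReader reader) {
        List<Long> rows = new ArrayList<>();
        RowIterator rowIterator = new RowIterator(reader);
        while (rowIterator.hasNext()) {
            rows.add(rowIterator.next());
        }
        return rows;
    }

    public static void main(String[] args) {
        List<long[]> batches = Arrays.asList(new long[] {1L, 2L}, new long[0], new long[] {3L});
        System.out.println(readAllRows(fromBatches(batches)));
    }
}
```

A consumer that works on batches directly can skip the per-value call and boxing that this wrapper adds, which is one plausible reading of why reading batches from the core module is suggested as the long-term direction.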
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640

Thank you again for coming and reviewing this PR, @rxin , @kiszk , @mridulm , @omalley . So far, we have discussed the following.

1. `Why are we adding this to core? Why not just the hive module?` (@rxin)
   - The `sql/core` module gives more benefit than `sql/hive`.
   - The Apache ORC library (`no-hive` version) is a general and reasonably small library designed for non-hive apps.
2. `Can we add smaller amount of new code to use this, too?` (@kiszk)
   - The previous #17980 , #17924, and #17943 are the complete examples containing this PR.
   - This PR focuses on the dependency only.
3. `Why don't we then create a separate orc module? Just copy a few of the files over?` (@rxin)
   - The Apache ORC library is the same as most of the other data sources (CSV, JDBC, JSON, PARQUET, TEXT), which live inside `sql/core`.
   - It's better to use it as a library instead of copying ORC files because the Apache ORC shaded jar has many files. We had better depend on the Apache ORC community's effort until an unavoidable reason for copying occurs.
4. `I do worry in the future whether ORC would bring in a lot more jars` (@rxin)
   - The ORC core library's dependency tree is aggressively kept as small as possible. I've gone through and excluded unnecessary jars from our dependencies. I also kick back pull requests that add unnecessary new dependencies. (@omalley)

I tried to include and summarize all the advice here, but please let me know if I missed any concerns.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 @rxin . How can I proceed with this PR now? Could you give me some advice again?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you, @omalley . @rxin . I think we had better depend on Apache ORC libraries as is in this PR.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user omalley commented on the issue: https://github.com/apache/spark/pull/18640 @rxin The ORC core library's dependency tree is aggressively kept as small as possible. I've gone through and excluded unnecessary jars from our dependencies. I also kick back pull requests that add unnecessary new dependencies.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640

Hi, @rxin

Since ORC 1.4.0, the ORC community provides small shaded jar files to improve usability for general purposes. This PR uses the following.
- orc-core-1.4.0-nohive.jar (1.4MB)
- orc-mapreduce-1.4.0-nohive.jar (739KB)

The size is due to including the [following](https://github.com/apache/orc/blob/master/java/pom.xml#L258-L259).
- com.google.protobuf:protobuf-java
- org.apache.hive:hive-storage-api

In terms of the number of files,
- ORC (354 files)
- ProtoBuf (247 files)
- Hive Storage API (92 files)

The bottom line is that some of the source code still originally comes from the `org.apache.hive` namespace. So, I'm wondering if this is the reason why you still want to put this into the `sql/hive` module and copy source code instead of using this shaded jar?
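For readers who want to see what depending on the two jars named in this comment might look like, here is a hedged Maven sketch. The coordinates are inferred from the jar file names (`orc-core-1.4.0-nohive.jar`, `orc-mapreduce-1.4.0-nohive.jar`) and the `org.apache.orc` group; this is not the PR's actual diff, which may use version properties, scopes, or exclusions that are not shown in this thread.

```xml
<!-- Illustrative only: coordinates inferred from the jar names quoted in the thread. -->
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <version>1.4.0</version>
  <!-- The "nohive" classifier selects the shaded jar discussed in the thread -->
  <classifier>nohive</classifier>
</dependency>
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-mapreduce</artifactId>
  <version>1.4.0</version>
  <classifier>nohive</classifier>
</dependency>
```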
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 I just checked the dependency size. They look pretty reasonable, roughly 2 MBs in total (although I do worry in the future whether ORC would bring in a lot more jars). cc @omalley any guidance on this topic?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 So far, I think ORC is the same as most of the other data sources (CSV, JDBC, JSON, PARQUET, TEXT), which live inside `sql/core` now. If that is an architectural plan of Apache Spark 2.3, I will. Are we going to move all data sources out into separate modules, e.g., `datasources/parquet`, in the Spark 2.3 timeframe? Or is there any other reason I am not catching here?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 Why don't we then create a separate orc module? Just copy a few of the files over?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640

I agree with the following, but this does not block those users. This is only better than putting the dependency on Hive because it also supports the other users who are using ML and storage, too. In addition, when we refactor the data source dependencies, this will help make the refactoring as clean as it is for Parquet.

> To the best of my knowledge almost everybody runs with Hive anyway and the vast majority of users that run ORC are Hive users.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 To the best of my knowledge almost everybody runs with Hive anyway and the vast majority of users that run ORC are Hive users. In hindsight we probably should have put most of the data source dependencies as separate packages similar to Presto.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/18640 LGTM, great to see progress on ORC support.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you for the review, @kiszk . The examples may be #17980 , #17924, and #17943 . If possible, in this PR, I want to focus only on the `Dependency on ORC` issue.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you for the review, @rxin . We can use ORC like Parquet now. Parquet is inside `sql/core`, not `sql/hive`.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18640 Can we add any smaller code to use this, too?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 Why are we adding this to core? Why not just the hive module?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @liancheng , @zhzhan , @rxin , @marmbrus . I'm pinging you since you worked on #6194 before.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80221/ Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80221/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80221/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Retest this please
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80055/ Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80055/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80055/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Retest this please.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79951/ Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #79951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79951/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #79951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79951/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Retest this please.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @rxin, @srowen, @sameeragarwal, @cloud-fan, @hvanhovell, @gatorsmile, @ueshin, @viirya, @kiszk. Could you review this small PR about a dependency change? This is the start of the upgrade to Apache ORC, which aims to reduce the old Hive dependency in Apache Spark 2.3 for the following issues.
- SPARK-20901 Feature parity for ORC with Parquet
- SPARK-20682 Support a new faster ORC data source based on Apache ORC
- SPARK-20728 Make ORCFileFormat configurable between sql/hive and sql/core
- SPARK-16060 Vectorized Orc Reader

I've heard from @sameeragarwal that Apache Spark will not drop the ORC data source. If so, could we take a small step forward like this?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79627/ Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #79627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79627/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #79627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79627/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 This aims to reduce the review scope for #17980. cc @kiszk.