[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19521 Thank you all for review and merge! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19521 Thanks, merging to master!
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19521 Thank you, @rxin!
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19521 LGTM
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19521 cc @srowen @rxin
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19521 Thank you for the review, @HyukjinKwon.
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19521 The empty-schema path is probably related to this, IIRC (not double-checked): https://github.com/apache/spark/blob/cca945b6aa679e61864c1cabae91e6ae7703362e/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala#L52-L58
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19521 LGTM too BTW.
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19521 Oh, I confused this with something else I have been looking at these days. Parquet doesn't support your example either. We may create an issue for both Parquet and ORC on empty schemas.

```scala
scala> val rddNoCols = sparkContext.parallelize(1 to 10).map(_ => Row.empty)
scala> val dfNoCols = spark.createDataFrame(rddNoCols, StructType(Seq.empty))
scala> dfNoCols.write.format("parquet").saveAsTable("px")
17/10/18 05:46:17 ERROR Utils: Aborting task
org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: message spark_schema {
}
```
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19521 `SPARK-15474` is the zero-row case. The above case is zero-column. Are they the same issue?
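To make the zero-row versus zero-column distinction concrete, here is a minimal sketch that builds one DataFrame of each shape. It assumes Spark SQL is on the classpath and runs in local mode; the object name `EmptyDataFrameShapes`, its helper methods, and the `id` column are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object EmptyDataFrameShapes {
  // Zero rows, one column: the shape SPARK-15474 is about.
  // The "id" column is a hypothetical example schema.
  def zeroRows(spark: SparkSession): DataFrame = {
    val schema = StructType(Seq(StructField("id", IntegerType, nullable = true)))
    spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
  }

  // Ten rows, zero columns: the shape used in the snippets in this thread.
  def zeroColumns(spark: SparkSession): DataFrame = {
    val rddNoCols = spark.sparkContext.parallelize(1 to 10).map(_ => Row.empty)
    spark.createDataFrame(rddNoCols, StructType(Seq.empty))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-df-shapes")
      .getOrCreate()
    try {
      val noRows = zeroRows(spark)
      val noCols = zeroColumns(spark)
      println(s"zero rows:    count=${noRows.count()}, columns=${noRows.columns.length}")
      println(s"zero columns: count=${noCols.count()}, columns=${noCols.columns.length}")
    } finally {
      spark.stop()
    }
  }
}
```

The two DataFrames are empty in different ways: one has a schema but no data, the other has rows but an empty `StructType`, which is why they may need separate handling in the file formats.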
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19521 Thank you for the review, @gatorsmile.

1. The test case was added in #15898 (SPARK-18457). I guess Parquet returns `null`, but we had better have explicit test cases. I will try to extend that test case to Parquet next time.
2. Thanks for bringing that up. Yes. We can resolve the empty ORC file issue, SPARK-15474 (ORC data source fails to write and read back empty dataframe), with the new ORC source by creating an empty file with the correct schema, not `struct<>`.

BTW, I've linked all related ORC issues to [SPARK-20901](https://issues.apache.org/jira/browse/SPARK-20901) and am working on it. You can monitor ORC progress there.
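The idea of "an empty file with the correct schema" can be sketched directly against the ORC core Java API. This is only an illustration of the idea, not the new ORC source's actual implementation; it assumes `orc-core` and the Hadoop client libraries are on the classpath, and the object name, the schema `struct<id:int,name:string>`, and the temporary path are hypothetical.

```scala
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.{OrcFile, TypeDescription}

object EmptyOrcFileSketch {
  // Hypothetical schema, for illustration only.
  val schema: TypeDescription =
    TypeDescription.fromString("struct<id:int,name:string>")

  // Open a writer with the real schema and close it without adding rows,
  // so the file footer records the schema rather than struct<>.
  def writeEmptyFile(path: Path, conf: Configuration): Unit = {
    val writer = OrcFile.createWriter(path, OrcFile.writerOptions(conf).setSchema(schema))
    writer.close()
  }

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val dir = Files.createTempDirectory("empty-orc-sketch")
    val path = new Path(dir.toString, "empty.orc")

    writeEmptyFile(path, conf)

    // Read the footer back to inspect the row count and the stored schema.
    val reader = OrcFile.createReader(path, OrcFile.readerOptions(conf))
    println(s"rows = ${reader.getNumberOfRows}, schema = ${reader.getSchema}")
  }
}
```

A reader that finds the schema in the footer has no need to infer it from data, which is the property the zero-row case depends on.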
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19521 We can save an empty DataFrame as an ORC table, but we are unable to fetch it from the table.

```scala
val rddNoCols = sparkContext.parallelize(1 to 10).map(_ => Row.empty)
val dfNoCols = spark.createDataFrame(rddNoCols, StructType(Seq.empty))
dfNoCols.write.format("orc").saveAsTable("t")
spark.sql("select 1 from t").show()
```

This is not related to this upgrade, but you might be interested in this.
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19521 Also LGTM. Regarding the test case you posted, does Parquet return `null` or an empty string?
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19521 Thank you for the review, @cloud-fan!
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19521 Looks good: no new dependencies introduced, just upgrading. cc @srowen to double-check. Thanks!
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19521 Hi, @gatorsmile and @cloud-fan. This will remove the regression in the ongoing ORC PRs.

- [test("Empty schema does not read data from ORC file")](https://github.com/apache/spark/pull/18953/files#diff-10e604a9a9d9c4bcc9cdc01049851095R609)

Could you review this?
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82853/ Test PASSed.
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19521 Merged build finished. Test PASSed.
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19521 **[Test build #82853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82853/testReport)** for PR 19521 at commit [`50ec007`](https://github.com/apache/spark/commit/50ec00769263bc7ea732cd0895091ac8701f84b0).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19521 **[Test build #82853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82853/testReport)** for PR 19521 at commit [`50ec007`](https://github.com/apache/spark/commit/50ec00769263bc7ea732cd0895091ac8701f84b0).