[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19521
  
Thank you all for the review and merge!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19521
  
Thanks, merging to master!





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19521
  
Thank you, @rxin !





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-18 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/19521
  
LGTM





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19521
  
cc @srowen @rxin 





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19521
  
Thank you for the review, @HyukjinKwon.





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19521
  
The empty-schema path is probably related to this, IIRC (not double-checked):


https://github.com/apache/spark/blob/cca945b6aa679e61864c1cabae91e6ae7703362e/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala#L52-L58





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19521
  
LGTM too BTW.





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19521
  
Oh, I confused it with what I have been looking at these days.

Parquet doesn't support your example either. We may want to create an issue
for both Parquet and ORC on empty schemas.
```scala
scala> val rddNoCols = sparkContext.parallelize(1 to 10).map(_ => Row.empty)
scala> val dfNoCols = spark.createDataFrame(rddNoCols, StructType(Seq.empty))
scala> dfNoCols.write.format("parquet").saveAsTable("px")
17/10/18 05:46:17 ERROR Utils: Aborting task
org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: message spark_schema {
}
```





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19521
  
`SPARK-15474` is about zero rows. The above case is about zero columns. Are
they the same issue?
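
For readers following the thread, the zero-row versus zero-column distinction can be sketched as below. This is only an illustration under stated assumptions, not code from the PR: the object name `EmptyDataFrameShapes`, the column name `id`, and the `local[1]` master are hypothetical, and the sketch only builds the two DataFrames in memory without writing them to ORC or Parquet.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical helper object, for illustration only.
object EmptyDataFrameShapes {
  // Zero rows: the schema has one column (assumed name "id"), but there is no data.
  def zeroRows(spark: SparkSession): DataFrame = {
    val schema = StructType(Seq(StructField("id", IntegerType, nullable = false)))
    spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
  }

  // Zero columns: ten rows, each an empty Row, under an empty schema.
  def zeroColumns(spark: SparkSession): DataFrame = {
    val rows = spark.sparkContext.parallelize(1 to 10).map(_ => Row.empty)
    spark.createDataFrame(rows, StructType(Seq.empty))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: a local session is enough for the sketch
      .appName("empty-shapes")
      .getOrCreate()
    try {
      val noRows = zeroRows(spark)
      val noCols = zeroColumns(spark)
      println(s"zero rows:    columns=${noRows.schema.length}, rows=${noRows.count()}")
      println(s"zero columns: columns=${noCols.schema.length}, rows=${noCols.count()}")
    } finally {
      spark.stop()
    }
  }
}
```

The first DataFrame has a non-empty schema and no data, while the second has data rows and an empty schema, so a file format can handle one case and still reject the other.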





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19521
  
Thank you for the review, @gatorsmile.

1. The test case was added in #15898 (SPARK-18457). I guess Parquet returns
`null`, but we had better have explicit test cases. I will try to extend that
test case to Parquet next time.
2. Thanks for bringing that up. Yes. We can resolve that empty ORC file
issue, SPARK-15474 (ORC data source fails to write and read back empty
dataframe), with the new ORC source by creating an empty file with the correct
schema, not `struct<>`.

BTW, I've linked all related ORC issues into 
[SPARK-20901](https://issues.apache.org/jira/browse/SPARK-20901) and am working 
on it. You can monitor ORC progress there.





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19521
  
We can save an empty DataFrame as an ORC table, but we are unable to fetch 
it from the table. 

```Scala
  val rddNoCols = sparkContext.parallelize(1 to 10).map(_ => Row.empty)
  val dfNoCols = spark.createDataFrame(rddNoCols, StructType(Seq.empty))
  dfNoCols.write.format("orc").saveAsTable("t")
  spark.sql("select 1 from t").show()
```

This is not related to this upgrade, but you might be interested in this.





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19521
  
Also LGTM

Regarding the test case you posted, does Parquet return `null` or an empty
string?





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19521
  
Thank you for the review, @cloud-fan!





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19521
  
Looks good: no new dependencies are introduced, it is just an upgrade. cc
@srowen to double-check. Thanks!





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19521
  
Hi, @gatorsmile and @cloud-fan .

This will remove the regression in the ongoing ORC PRs.
- [test("Empty schema does not read data from ORC file")](https://github.com/apache/spark/pull/18953/files#diff-10e604a9a9d9c4bcc9cdc01049851095R609)

Could you review this?





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19521
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82853/





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19521
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19521
  
**[Test build #82853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82853/testReport)** for PR 19521 at commit [`50ec007`](https://github.com/apache/spark/commit/50ec00769263bc7ea732cd0895091ac8701f84b0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19521
  
**[Test build #82853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82853/testReport)** for PR 19521 at commit [`50ec007`](https://github.com/apache/spark/commit/50ec00769263bc7ea732cd0895091ac8701f84b0).

