[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3

2018-02-16 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/spark/pull/20511 I'm frustrated with the direction this has gone. The new reader is much better than the old reader, which uses Hive 1.2. ORC 1.4.3 had a pair of important, but not large or complex, fixes

[GitHub] spark pull request #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3

2018-02-13 Thread omalley
Github user omalley commented on a diff in the pull request: https://github.com/apache/spark/pull/20511#discussion_r167950837 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala --- @@ -160,6 +160,16 @@ abstract class OrcSuite

[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-10 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/spark/pull/20511 Sorry, I forgot to transition the JIRA issues for ORC 1.4.3, so they didn't show up in the search from the notes. The list of JIRA issues closed by the 1.4.3 release is: https://s.apache.org

[GitHub] spark pull request #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread omalley
Github user omalley commented on a diff in the pull request: https://github.com/apache/spark/pull/18640#discussion_r133248648 --- Diff: sql/core/pom.xml --- @@ -87,6 +87,16 @@ + org.apache.orc + orc-core + ${orc.classifier

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-08 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/spark/pull/18640 I would also comment that in the long term, Spark should move to using the vectorized reader in ORC's core. That would remove the dependence on ORC's mapreduce module, which provides row by row

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-07 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/spark/pull/18640 @rxin The ORC core library's dependency tree is aggressively kept as small as possible. I've gone through and excluded unnecessary jars from our dependencies. I also kick back pull requests

[GitHub] spark issue #13257: [SPARK-15474][SQL]ORC data source fails to write and rea...

2017-03-01 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/spark/pull/13257 Ok, I see the problem. Hive's OrcInputFormat has that property, because it was getting the schema from the ObjectInspector, which only came with the values. When I get a chance, let me look at what