Github user omalley commented on the issue:
https://github.com/apache/spark/pull/13257
Ok, I see the problem. Hive's OrcInputFormat has that property, because it
was getting the schema from the ObjectInspector, which only came with the
values. When I get a chance, let me look at
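For context on the point above: the ORC schema is stored in the file footer, so it can be recovered without going through an ObjectInspector at all. A minimal sketch using the orc-core API (assumes orc-core and hadoop-common on the classpath; the path and field names are illustrative, not from the PR):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class SchemaDemo {
    static String footerSchema() throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(System.getProperty("java.io.tmpdir"), "schema-demo.orc");
        path.getFileSystem(conf).delete(path, false);

        // Write an empty file; the declared schema is still recorded in the footer.
        TypeDescription schema = TypeDescription.fromString("struct<id:bigint,name:string>");
        Writer writer = OrcFile.createWriter(path, OrcFile.writerOptions(conf).setSchema(schema));
        writer.close();

        // Read the schema straight from the footer -- no values, no ObjectInspector.
        Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(conf));
        return reader.getSchema().toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(footerSchema());
    }
}
```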
Github user omalley commented on the issue:
https://github.com/apache/spark/pull/20511
Sorry, I forgot to transition the JIRA issues for the ORC 1.4.3 release, so they
didn't show up in the search used to build the release notes.
The list of jiras closed by the 1.4.3 release is: https://s.apach
Github user omalley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20511#discussion_r167950837
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
---
@@ -160,6 +160,16 @@ abstract class OrcSuite
Github user omalley commented on the issue:
https://github.com/apache/spark/pull/20511
I'm frustrated with the direction this has gone.
The new reader is much better than the old reader, which uses Hive 1.2. ORC
1.4.3 had a pair of important, but not large or complex
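For readers following along (not part of the comment above): in Spark 2.3 the choice between the new native ORC reader and the old Hive 1.2 based one is controlled by the `spark.sql.orc.impl` configuration:

```sql
-- Select the native ORC reader; "hive" is the legacy Hive 1.2 path.
SET spark.sql.orc.impl=native;
```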
Github user omalley commented on the issue:
https://github.com/apache/spark/pull/18640
@rxin The ORC core library's dependency tree is aggressively kept as small
as possible. I've gone through and excluded unnecessary jars from our
dependencies. I also kick back pull req
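An illustration of the kind of exclusion being described, in Maven's standard form (the excluded artifact here is only an example, not the actual list from the PR):

```xml
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <exclusions>
    <!-- Illustrative only: drop a transitive jar the build does not need. -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```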
Github user omalley commented on the issue:
https://github.com/apache/spark/pull/18640
I would also comment that in the long term, Spark should move to using the
vectorized reader in ORC's core. That would remove the dependence on ORC's
mapreduce module, which provides
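The vectorized read path in ORC's core that the comment refers to can be sketched as follows. This is a hedged example, not Spark's implementation: it assumes orc-core, hive-storage-api, and hadoop-common on the classpath, and the file path and column are illustrative. Note that it uses only `org.apache.orc` classes, with no dependency on the orc-mapreduce module:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.RecordReader;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class VectorizedDemo {
    static long sumColumn() throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(System.getProperty("java.io.tmpdir"), "vector-demo.orc");
        path.getFileSystem(conf).delete(path, false);

        // Write three bigint rows: 0, 10, 20.
        TypeDescription schema = TypeDescription.fromString("struct<x:bigint>");
        Writer writer = OrcFile.createWriter(path, OrcFile.writerOptions(conf).setSchema(schema));
        VectorizedRowBatch out = schema.createRowBatch();
        LongColumnVector xs = (LongColumnVector) out.cols[0];
        for (int i = 0; i < 3; i++) {
            xs.vector[i] = i * 10L;
        }
        out.size = 3;
        writer.addRowBatch(out);
        writer.close();

        // Read back batch-by-batch with the core RecordReader.
        Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(conf));
        RecordReader rows = reader.rows();
        VectorizedRowBatch batch = reader.getSchema().createRowBatch();
        long sum = 0;
        while (rows.nextBatch(batch)) {
            LongColumnVector col = (LongColumnVector) batch.cols[0];
            for (int r = 0; r < batch.size; r++) {
                sum += col.vector[r];
            }
        }
        rows.close();
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumColumn());
    }
}
```

Processing a whole `VectorizedRowBatch` per call, rather than one row at a time, is what makes this path attractive compared with the row-oriented mapreduce wrappers.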
Github user omalley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18640#discussion_r133248648
--- Diff: sql/core/pom.xml ---
@@ -87,6 +87,16 @@
+ org.apache.orc
+ orc-core
+ ${orc.classifier