[GitHub] [spark] HyukjinKwon commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

GitBox Wed, 05 Aug 2020 19:30:07 -0700


HyukjinKwon commented on a change in pull request #29353:
URL: https://github.com/apache/spark/pull/29353#discussion_r466108888




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
##########
@@ -73,135 +75,157 @@ class OrcDeserializer(
    * Creates a writer to write ORC values to Catalyst data structure at the given ordinal.
    */
   private def newWriter(
-      dataType: DataType, updater: CatalystDataUpdater): (Int, WritableComparable[_]) => Unit =
+      dataType: DataType, reuseObj: Boolean)
+  : (CatalystDataUpdater, Int, WritableComparable[_]) => Unit =
     dataType match {
-      case NullType => (ordinal, _) =>
+      case NullType => (updater, ordinal, _) =>

Review comment:
       @msamirkhan sorry if I read this too quickly, but why do we need to
pull the updater out here? If the field writers are created once, we don't
have to worry about reusing the updater for nested types, since they are
created only once and are cheap.
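
To make the question concrete, here is a minimal sketch of the two writer shapes that appear in the diff above. It is an illustration only: `Updater`, `ArrayUpdater`, and `WriterShapes` are hypothetical stand-ins rather than Spark's `CatalystDataUpdater` or `OrcDeserializer`, and values are typed as `Any` instead of `WritableComparable[_]`.

```scala
// Hypothetical stand-in for Spark's CatalystDataUpdater (not the real trait).
trait Updater {
  def set(ordinal: Int, value: Any): Unit
}

// Illustrative updater that writes into a plain array.
final class ArrayUpdater(val row: Array[Any]) extends Updater {
  def set(ordinal: Int, value: Any): Unit = {
    row(ordinal) = value
  }
}

object WriterShapes {
  // Shape on the removed lines: the updater is captured when the writer is built,
  // so the returned function only needs (ordinal, value).
  def capturedWriter(updater: Updater): (Int, Any) => Unit =
    (ordinal, value) => updater.set(ordinal, value)

  // Shape on the added lines: the writer is built without an updater,
  // and the updater is supplied on every call.
  val passedWriter: (Updater, Int, Any) => Unit =
    (updater, ordinal, value) => updater.set(ordinal, value)
}
```

With the captured shape, a writer that is built once stays bound to the one updater it closed over. With the passed shape, the same writer can target a different updater on each call, at the cost of an extra argument per call.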




