msamirkhan commented on a change in pull request #29353:
URL: https://github.com/apache/spark/pull/29353#discussion_r466144939
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
##########
@@ -73,135 +75,157 @@ class OrcDeserializer(
* Creates a writer to write ORC values to Catalyst data structure at the given ordinal.
*/
private def newWriter(
- dataType: DataType, updater: CatalystDataUpdater): (Int, WritableComparable[_]) => Unit =
+ dataType: DataType, reuseObj: Boolean)
+ : (CatalystDataUpdater, Int, WritableComparable[_]) => Unit =
dataType match {
- case NullType => (ordinal, _) =>
+ case NullType => (updater, ordinal, _) =>
Review comment:
Only the top-level field writers are currently created once. At every level below, the writers are re-created for each data point. E.g., the struct case did the following:
`case st: StructType => (ordinal, value) =>`
`  ...`
`  val fieldConverters = st.map(_.dataType).map { dt => newWriter(dt, fieldUpdater) }.toArray`
`  ...`
i.e., it returns a function which, when called, creates the relevant fieldConverters. Instead, the field converters should be created outside the returned function so that they are created only once (see the sketch below).
Note that the updaters cannot always be reused, e.g., under an array or a map (i.e., whenever reuseObj becomes false).
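To illustrate the suggested restructuring, here is a minimal, self-contained Scala sketch of the pattern (the types `DataType`, `StructType`, `Updater`, `RowUpdater` and the overall shape are simplified stand-ins, not the actual Spark/ORC classes): the nested field writers are built once, when `newWriter` is called, the returned closure only applies them, and `reuseObj` decides whether a mutable row buffer may be shared across calls.

```scala
// Simplified model of the suggested restructuring (not the actual OrcDeserializer code).
object ConverterSketch {

  // Hypothetical stand-ins for Catalyst's DataType hierarchy and CatalystDataUpdater.
  sealed trait DataType
  case object IntType extends DataType
  case class StructType(fields: Seq[DataType]) extends DataType

  trait Updater { def set(ordinal: Int, value: Any): Unit }
  final class RowUpdater(val row: Array[Any]) extends Updater {
    def set(ordinal: Int, value: Any): Unit = row(ordinal) = value
  }

  // The writer takes the updater at call time, mirroring the new
  // (CatalystDataUpdater, Int, WritableComparable[_]) => Unit shape in the diff.
  type Writer = (Updater, Int, Any) => Unit

  def newWriter(dataType: DataType, reuseObj: Boolean): Writer = dataType match {
    case IntType => (updater, ordinal, value) =>
      updater.set(ordinal, value.asInstanceOf[Int])

    case StructType(fields) =>
      // Built once, when the writer is constructed, not on every call.
      val fieldWriters: Array[Writer] = fields.map(dt => newWriter(dt, reuseObj)).toArray

      if (reuseObj) {
        // One mutable row shared across calls; the consumer is expected to copy
        // it before the next value is written.
        val rowUpdater = new RowUpdater(new Array[Any](fields.length))
        (updater, ordinal, value) => {
          val elems = value.asInstanceOf[Seq[Any]]
          var i = 0
          while (i < fieldWriters.length) { fieldWriters(i)(rowUpdater, i, elems(i)); i += 1 }
          updater.set(ordinal, rowUpdater.row)
        }
      } else {
        // Under an array or map every element needs its own row, so allocate per
        // call; the fieldWriters themselves are still created only once.
        (updater, ordinal, value) => {
          val elems = value.asInstanceOf[Seq[Any]]
          val rowUpdater = new RowUpdater(new Array[Any](fields.length))
          var i = 0
          while (i < fieldWriters.length) { fieldWriters(i)(rowUpdater, i, elems(i)); i += 1 }
          updater.set(ordinal, rowUpdater.row)
        }
      }
  }
}
```

With this shape, the cost of building the converter tree is paid once per schema when the deserializer is constructed, rather than once per row (or per nested element) while reading.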