msamirkhan commented on a change in pull request #29354:
URL: https://github.com/apache/spark/pull/29354#discussion_r466000481



##########
File path: 
external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala
##########
@@ -638,90 +628,57 @@ class SparkAvroDatumReader[T](
 
   /** Helper functions to create objects */
 
-  private[this] def getRowCreator(st: StructType): () => InternalRow = {
-    val constructorsArray = new Array[() => MutableValue](st.fields.length)
-    var i = 0
-    while (i < st.fields.length) {
-      st.fields(i).dataType match {
-        case BooleanType => constructorsArray(i) = () => new MutableBoolean
-        case ByteType => constructorsArray(i) = () => new MutableByte
-        case ShortType => constructorsArray(i) = () => new MutableShort
-        // We use INT for DATE internally
-        case IntegerType | DateType => constructorsArray(i) = () => new MutableInt
-        // We use Long for Timestamp internally
-        case LongType | TimestampType => constructorsArray(i) = () => new MutableLong
-        case FloatType => constructorsArray(i) = () => new MutableFloat
-        case DoubleType => constructorsArray(i) = () => new MutableDouble
-        case _ => constructorsArray(i) = () => new MutableAny
-      }
-      i += 1
-    }
-    () => {
-      val array = new Array[MutableValue](constructorsArray.length)
-      var i = 0
-      while (i < constructorsArray.length) {
-        array(i) = constructorsArray(i).apply()
-        i += 1
-      }
-      new SpecificInternalRow(array)

Review comment:
       The profiler showed some time being spent in the SpecificInternalRow constructor, and we saw improvements after moving to this model: based on the schema, we fill in an array of constructors once, and then for each data point we call those constructors one by one. In retrospect, the change can instead be made in the SpecificInternalRow constructor itself, which will benefit #29353 as well, so I am reverting it here.
   
   The changes to the SpecificInternalRow constructor can be found here: https://github.com/apache/spark/pull/29366
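
   For reference, a minimal sketch of the constructor-array model described above (not the exact code being reverted; it assumes Spark's catalyst MutableValue classes and a hypothetical `rowCreator` helper). The revert instead lets SpecificInternalRow derive the per-field MutableValue types from the DataTypes itself, so the type dispatch happens once per schema inside that constructor rather than here.

   ```scala
   import org.apache.spark.sql.catalyst.InternalRow
   import org.apache.spark.sql.catalyst.expressions.{MutableAny, MutableBoolean,
     MutableDouble, MutableInt, MutableLong, MutableValue, SpecificInternalRow}
   import org.apache.spark.sql.types._

   // Sketch: choose each field's MutableValue constructor once from the schema,
   // then reuse those constructors for every record.
   def rowCreator(st: StructType): () => InternalRow = {
     val ctors: Array[() => MutableValue] = st.fields.map { f =>
       (f.dataType match {
         case BooleanType => () => new MutableBoolean
         case IntegerType | DateType => () => new MutableInt      // DATE is stored as INT
         case LongType | TimestampType => () => new MutableLong   // TIMESTAMP is stored as LONG
         case DoubleType => () => new MutableDouble
         case _ => () => new MutableAny                            // everything else is boxed
       }): () => MutableValue
     }
     // Per record: run the precomputed constructors and wrap the result.
     () => new SpecificInternalRow(ctors.map(_.apply()))
   }
   ```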



