mgaido91 opened a new pull request #24505: [SPARK-27607][SQL] Improve 
Row.toString performance
URL: https://github.com/apache/spark/pull/24505
 
 
   ## What changes were proposed in this pull request?
   
   `Row.toString` currently creates an unnecessary `Array` containing all the 
values in the row before generating the string representation of the row. 
This allocation adds considerable overhead.
   
   This PR removes that intermediate allocation to make `Row.toString` 
faster.
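   
   To illustrate the idea, here is a minimal sketch that contrasts the two 
approaches. It is not the code in this patch: `SimpleRow`, 
`toStringWithCopy`, and `toStringDirect` are hypothetical names made up for 
this example, and the real `Row` API may differ.
   
   ```scala
   // Hypothetical stand-in for a row type; this is NOT Spark's Row class.
   final class SimpleRow(values: Array[Any]) {
     def length: Int = values.length
     def get(i: Int): Any = values(i)

     // Copies every value into a fresh Array first, then joins that copy.
     def toStringWithCopy: String = {
       val copy = new Array[Any](length)
       var i = 0
       while (i < length) {
         copy(i) = get(i)
         i += 1
       }
       copy.mkString("[", ",", "]")
     }

     // Appends each value straight to a StringBuilder, so no intermediate
     // Array of values is allocated.
     def toStringDirect: String = {
       val sb = new StringBuilder("[")
       var i = 0
       while (i < length) {
         if (i > 0) sb.append(',')
         sb.append(get(i))
         i += 1
       }
       sb.append(']').toString
     }
   }

   object SimpleRowDemo {
     def main(args: Array[String]): Unit = {
       val row = new SimpleRow(Array[Any](1, 1.0, "1", true, null))
       // Both methods produce the same text; only the allocations differ.
       println(row.toStringWithCopy)
       println(row.toStringDirect)
     }
   }
   ```
   
   Both methods return the same string, so a change of this kind should not 
alter the output of `toString`, only the amount of work needed to produce it.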
   
   ## How was this patch tested?
   
   Ran the following micro-benchmark:
   
   ```
   test("Row toString perf test") {
     val n = 100000
     val rows = (1 to n).map { i =>
       Row(i, i.toDouble, i.toString, i.toShort, true, null)
     }
     // warmup
     (1 to 10).foreach { _ => rows.foreach(_.toString) }

     val times = (1 to 100).map { _ =>
       val t0 = System.nanoTime()
       rows.foreach(_.toString)
       val t1 = System.nanoTime()
       t1 - t0
     }
     // scalastyle:off println
     println(s"Avg time on ${times.length} iterations for $n toString:" +
       s" ${times.sum.toDouble / times.length / 1e6} ms")
     // scalastyle:on println
   }
   ```
   Before the PR:
   ```
   Avg time on 100 iterations for 100000 toString: 61.08408419 ms
   ```
   After the PR:
   ```
   Avg time on 100 iterations for 100000 toString: 48.18608 ms
   ```
   This means the new implementation is about 1.27X faster than the original 
one (61.08 ms / 48.19 ms ≈ 1.27), i.e. roughly a 21% reduction in average time.
   
   

