[GitHub] [spark] sunchao commented on a change in pull request #29654: [SPARK-32802][SQL] Avoid using SpecificInternalRow in RunLengthEncoding#Encoder

GitBox Mon, 14 Sep 2020 10:41:19 -0700


sunchao commented on a change in pull request #29654:
URL: https://github.com/apache/spark/pull/29654#discussion_r488110350




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala
##########
@@ -182,8 +181,7 @@ private[columnar] case object RunLengthEncoding extends CompressionScheme {
     private var _uncompressedSize = 0
     private var _compressedSize = 0
 
-    // Using `MutableRow` to store the last value to avoid boxing/unboxing cost.

Review comment:
       Thanks for the comments @cloud-fan, @dongjoon-hyun, and @maropu, and yes, 
my bad for not putting enough info in the PR description. The comment is pretty 
old and I'm not sure what has changed over the years. By examining a flamegraph 
from a test run, I observed boxing **both before and after the change**:
   
   Before:
   
   <img width="1196" alt="Screen Shot 2020-09-14 at 10 30 05 AM" 
src="https://user-images.githubusercontent.com/506679/93118585-5cfe5600-f675-11ea-916d-213260709b4a.png">
   
   After:
   
   <img width="1191" alt="Screen Shot 2020-09-14 at 10 37 05 AM" 
src="https://user-images.githubusercontent.com/506679/93119208-43114300-f676-11ea-875c-df497bacd520.png">
   
   That said, the time spent on boxing is noticeably lower after the change 
(though this could have something to do with my sample size).
   
   Perhaps we can open a JIRA to investigate this further? We could have more 
room for improvement if we can completely eliminate the boxing/unboxing.
   
   
   
   
   




