sunchao commented on a change in pull request #29654:
URL: https://github.com/apache/spark/pull/29654#discussion_r488110350
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala
##########
@@ -182,8 +181,7 @@ private[columnar] case object RunLengthEncoding extends CompressionScheme {
private var _uncompressedSize = 0
private var _compressedSize = 0
- // Using `MutableRow` to store the last value to avoid boxing/unboxing cost.
Review comment:
Thanks for the comments @cloud-fan @dongjoon-hyun and @maropu, and yes,
my bad for not putting enough info in the PR description. The comment is pretty
old and I'm not sure what has changed over the years. By examining a flamegraph
from a test run, I observed boxing **both before and after the change**:
Before:
<img width="1196" alt="Screen Shot 2020-09-14 at 10 30 05 AM"
src="https://user-images.githubusercontent.com/506679/93118585-5cfe5600-f675-11ea-916d-213260709b4a.png">
After:
<img width="1191" alt="Screen Shot 2020-09-14 at 10 37 05 AM"
src="https://user-images.githubusercontent.com/506679/93119208-43114300-f676-11ea-875c-df497bacd520.png">
That said, the time spent on boxing is noticeably lower after the change
(though this could have something to do with my sample size).
Perhaps we can open a JIRA to investigate this further? We could have more
room for improvement if we can completely eliminate the boxing/unboxing.
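To make the boxing concern concrete, here is a minimal sketch of the general pattern, not the actual `RunLengthEncoding` code. All names in it (`BoxingSketch`, `AnyLast`, `IntLast`, `countRunsBoxed`, `countRunsPrimitive`) are hypothetical and only for illustration. It assumes the "last value" is tracked either in an `Any`-typed field, which forces every `Int` to be boxed on the JVM, or in a primitive `Int` field, which does not:

```scala
// Hypothetical illustration only; none of these names exist in Spark.
object BoxingSketch {
  // Holder with an Any-typed field: storing an Int boxes it to java.lang.Integer.
  final class AnyLast {
    private var last: Any = null
    def set(v: Any): Unit = last = v
    def get: Any = last
  }

  // Holder with a primitive field: stays a JVM int, no boxing.
  final class IntLast {
    private var last: Int = 0
    def set(v: Int): Unit = last = v
    def get: Int = last
  }

  // Counts runs of equal adjacent values; every set and compare goes through a boxed value.
  def countRunsBoxed(values: Array[Int]): Int = {
    if (values.isEmpty) return 0
    val last = new AnyLast
    last.set(values(0))
    var runs = 1
    var i = 1
    while (i < values.length) {
      if (values(i) != last.get) { runs += 1; last.set(values(i)) }
      i += 1
    }
    runs
  }

  // Same logic, but the last value is kept as a primitive Int.
  def countRunsPrimitive(values: Array[Int]): Int = {
    if (values.isEmpty) return 0
    val last = new IntLast
    last.set(values(0))
    var runs = 1
    var i = 1
    while (i < values.length) {
      if (values(i) != last.get) { runs += 1; last.set(values(i)) }
      i += 1
    }
    runs
  }
}
```

Both functions return the same run count; the difference is only in whether boxing frames would be expected to show up in a profile. Whether the JIT removes the boxing in the first variant depends on escape analysis, so a flamegraph or benchmark is still needed to confirm the actual cost.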