tgravescs commented on issue #24515: [SPARK-14083][WIP] Basic bytecode analyzer 
to speed up Datasets
URL: https://github.com/apache/spark/pull/24515#issuecomment-496983351
 
 
   Thanks for posting this. I haven't looked at the code in detail yet, but we
are also interested in this area, in particular in converting lambdas and UDFs
into full Catalyst expressions. Once you have the Catalyst expression, you can
do more optimizations with it. In our case, we started looking at this for
columnar processing: if you have the Catalyst expression, you can map it to a
GPU operation. The longer you can keep the data on the GPU, the more
performance you gain, because copying back and forth is inefficient. I think
that applies to many types of columnar processing: the longer you can stay
columnar without switching back and forth to rows, the more benefit you get.
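Here is a minimal Java sketch of why a full expression is more useful than an opaque lambda. The `Expr`, `Col`, `Lit`, `Add` and `Mul` classes and the `simplify` pass are hypothetical toy stand-ins, not Catalyst's actual `Expression` API. The point is only that an engine can inspect and rewrite a tree, whereas it can only invoke a lambda row by row.

```java
import java.util.function.IntUnaryOperator;

public class ExprDemo {
    // Hypothetical toy expression tree (NOT Catalyst's API): its structure is visible to a planner.
    interface Expr { int eval(int x); }

    static final class Col implements Expr {           // the input column value
        public int eval(int x) { return x; }
        public String toString() { return "x"; }
    }

    static final class Lit implements Expr {           // an integer literal
        final int v;
        Lit(int v) { this.v = v; }
        public int eval(int x) { return v; }
        public String toString() { return Integer.toString(v); }
    }

    static final class Add implements Expr {
        final Expr l, r;
        Add(Expr l, Expr r) { this.l = l; this.r = r; }
        public int eval(int x) { return l.eval(x) + r.eval(x); }
        public String toString() { return "(" + l + " + " + r + ")"; }
    }

    static final class Mul implements Expr {
        final Expr l, r;
        Mul(Expr l, Expr r) { this.l = l; this.r = r; }
        public int eval(int x) { return l.eval(x) * r.eval(x); }
        public String toString() { return "(" + l + " * " + r + ")"; }
    }

    // Tiny optimizer pass: fold literal arithmetic and drop "+ 0" and "* 1".
    static Expr simplify(Expr e) {
        if (e instanceof Add) {
            Expr l = simplify(((Add) e).l), r = simplify(((Add) e).r);
            if (l instanceof Lit && r instanceof Lit) return new Lit(((Lit) l).v + ((Lit) r).v);
            if (r instanceof Lit && ((Lit) r).v == 0) return l;
            if (l instanceof Lit && ((Lit) l).v == 0) return r;
            return new Add(l, r);
        }
        if (e instanceof Mul) {
            Expr l = simplify(((Mul) e).l), r = simplify(((Mul) e).r);
            if (l instanceof Lit && r instanceof Lit) return new Lit(((Lit) l).v * ((Lit) r).v);
            if (r instanceof Lit && ((Lit) r).v == 1) return l;
            if (l instanceof Lit && ((Lit) l).v == 1) return r;
            return new Mul(l, r);
        }
        return e;
    }

    public static void main(String[] args) {
        // Opaque lambda: the engine can only call it.
        IntUnaryOperator opaque = x -> (x * (2 + -1)) + 0;

        // The same logic as a tree the engine can inspect and rewrite.
        Expr tree = new Add(new Mul(new Col(), new Add(new Lit(2), new Lit(-1))), new Lit(0));
        Expr optimized = simplify(tree);

        System.out.println(tree);                                          // ((x * (2 + -1)) + 0)
        System.out.println(optimized);                                     // x
        System.out.println(opaque.applyAsInt(21) == optimized.eval(21));   // true
    }
}
```

A real translation would target Catalyst expressions. Once there, the existing optimizer rules and any columnar or GPU backend can work on the result.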
   
   We had a few people start to look at this. They originally started with
Javassist but then switched to JVMCI (which is only available in JDK versions
newer than 8 and in very specific Oracle builds of JDK 8). The main reason they
moved away from Javassist is that when there are multiple lambdas within the
same class, they couldn't tell them apart, since the lambda classes generated
at runtime can't be reflected on or instrumented. With only one lambda per
class it worked fine. I'm not sure how often this will be an issue, but I
thought I would mention it here.
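One workaround for the multiple-lambdas-per-class problem is to require the functional interface to be `Serializable`. You then read the `SerializedLambda` through the runtime class's generated `writeReplace` method (Spark's ClosureCleaner relies on a similar trick for Scala 2.12 closures). This identifies the synthetic implementation method in the enclosing class, and that class's bytecode is available on the classpath. The sketch assumes the lambdas are serializable. `SerFn`, `serialized` and `describe` are hypothetical helper names, and the exact synthetic method names are a javac implementation detail.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

public class LambdaIdDemo {
    // Hypothetical interface: being Serializable makes the lambda metafactory
    // generate a private writeReplace() method on the runtime lambda class.
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // Recover the compile-time description of a runtime lambda object.
    static SerializedLambda serialized(Object lambda) throws ReflectiveOperationException {
        Method m = lambda.getClass().getDeclaredMethod("writeReplace");
        m.setAccessible(true);
        return (SerializedLambda) m.invoke(lambda);
    }

    // Enclosing class (internal name), synthetic method name, and JVM descriptor.
    static String describe(Object lambda) throws ReflectiveOperationException {
        SerializedLambda sl = serialized(lambda);
        return sl.getImplClass() + "." + sl.getImplMethodName() + sl.getImplMethodSignature();
    }

    public static void main(String[] args) throws Exception {
        // Two lambdas defined in the same enclosing class.
        SerFn<Integer, Integer> plusOne = x -> x + 1;
        SerFn<Integer, Integer> timesTwo = x -> x * 2;

        // The runtime class is synthetic (its name contains "$$Lambda") and has
        // no .class resource on the classpath to read bytecode from.
        System.out.println(plusOne.getClass().getName());

        // Each lambda maps to a distinct synthetic method in LambdaIdDemo,
        // whose bytecode can be loaded and analyzed.
        System.out.println(describe(plusOne));
        System.out.println(describe(timesTwo));
    }
}
```

This only helps when the lambda is serializable. For arbitrary user lambdas, the analyzer would still need another way to locate the implementation method, or it should give up.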
   Also, as you mentioned, Scala 2.12 doesn't always SAM-convert lambda
functions; some of that is documented here:
https://www.scala-lang.org/news/2.12.0/#java-8-style-bytecode-for-lambdas.
   
   I agree that many of the points you raised need to be discussed and decided
upon. I think if we keep it pluggable, as you are proposing, people can try
different things out. One of the main things is knowing when not to attempt a
conversion, or when to give up. If you do that quickly enough
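A pluggable design with an early bail-out could look roughly like this. Everything in it is hypothetical: `LambdaTranslator`, `ArithmeticTranslator` and `plan` are illustrative names. The input is a simplified list of opcode names rather than real bytecode, and the output is a string standing in for a Catalyst expression. Each translator returns `Optional.empty()` on the first construct it does not support, and the caller then falls back to running the original lambda.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class FallbackDemo {
    // Hypothetical plug-in point: produce an expression, or decline immediately.
    interface LambdaTranslator {
        Optional<String> tryTranslate(List<String> opcodes);
    }

    // Toy translator for straight-line int arithmetic on one argument.
    static final class ArithmeticTranslator implements LambdaTranslator {
        private static final Set<String> SUPPORTED =
                Set.of("ILOAD_0", "ICONST_1", "ICONST_2", "IADD", "IMUL", "IRETURN");

        public Optional<String> tryTranslate(List<String> opcodes) {
            List<String> stack = new ArrayList<>();  // simulated operand stack
            for (String op : opcodes) {
                if (!SUPPORTED.contains(op)) return Optional.empty();  // give up early
                switch (op) {
                    case "ILOAD_0": stack.add("x"); break;
                    case "ICONST_1": stack.add("1"); break;
                    case "ICONST_2": stack.add("2"); break;
                    case "IADD":
                    case "IMUL": {
                        if (stack.size() < 2) return Optional.empty();
                        String r = stack.remove(stack.size() - 1);
                        String l = stack.remove(stack.size() - 1);
                        stack.add("(" + l + (op.equals("IADD") ? " + " : " * ") + r + ")");
                        break;
                    }
                    case "IRETURN":
                        return stack.size() == 1 ? Optional.of(stack.get(0)) : Optional.empty();
                }
            }
            return Optional.empty();  // never reached a return: don't guess
        }
    }

    // Try each registered translator in order; otherwise run the lambda as-is.
    static String plan(List<LambdaTranslator> translators, List<String> opcodes) {
        for (LambdaTranslator t : translators) {
            Optional<String> expr = t.tryTranslate(opcodes);
            if (expr.isPresent()) return "expression: " + expr.get();
        }
        return "fallback: opaque lambda";
    }

    public static void main(String[] args) {
        List<LambdaTranslator> translators = List.of(new ArithmeticTranslator());
        // Simplified opcode lists (not actual javac output).
        List<String> arith = List.of("ILOAD_0", "ICONST_2", "IMUL", "ICONST_1", "IADD", "IRETURN");
        List<String> withCall = List.of("ALOAD_0", "INVOKEVIRTUAL", "ARETURN");
        System.out.println(plan(translators, arith));     // expression: ((x * 2) + 1)
        System.out.println(plan(translators, withCall));  // fallback: opaque lambda
    }
}
```

Rejecting on the first unsupported opcode keeps the cost of a failed attempt small. That property makes it reasonable to try conversion on every lambda.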
   
   I'm very curious about other people's experience here. @rednaxelafx, have
you had time to write up your thoughts from your previous experience?
