aokolnychyi commented on issue #24515: [SPARK-14083][WIP] Basic bytecode analyzer to speed up Datasets
URL: https://github.com/apache/spark/pull/24515#issuecomment-489523397
 
 
   I like the direction suggested by @rednaxelafx. That's exactly what we planned to do next: identify concrete use cases of the typed API and see how we can optimize them (e.g., bytecode analysis, rewriting to straightforward equivalent untyped operations, extending the untyped API, etc.). I believe this discussion should be driven by use cases.
   
   @viirya The question of which operations can be safely optimized is still mostly open. I think it is possible to support closures that have no side effects (such as writing to files or logging) and that operate on primitives, boxed values, and case classes/POJOs. Potentially, we could handle side effects via `Invoke`/`StaticInvoke`, but I am not sure we want to do that.
   
   @rxin I know of at least four companies that have tried something along these lines. This PR is an attempt to bring everyone to the same table and reach a consensus on the feasibility of bytecode analysis and future optimizations of the typed API. We all agree that there must be enough use cases that benefit from this and that the implementation must be reliable. Because of the complexity and maintenance overhead, I don't think this belongs in Spark itself; at most, it should be a separate package.
