aokolnychyi commented on issue #24515: [SPARK-14083][WIP] Basic bytecode analyzer to speed up Datasets
URL: https://github.com/apache/spark/pull/24515#issuecomment-489523397

I like the direction suggested by @rednaxelafx. That's exactly what we planned to do next: identify concrete use cases of the typed API and see how we can optimize them (e.g. bytecode analysis, straightforward equivalent untyped operations, extending the untyped API, etc.). I believe this discussion should be driven by use cases.

@viirya, the question of which operations can be safely optimized is mostly open. I think it is possible to support closures without side effects (e.g. no writing to files or logging) that operate on primitives, boxed values, and case classes/POJOs. Potentially, we could handle side effects via `Invoke`/`StaticInvoke`, but I am not sure we want to do that.

@rxin, I know of at least four companies that have tried something around this topic. This PR is an attempt to bring everyone to the same table and reach a consensus on the feasibility of bytecode analysis and future optimizations of the typed API. We are all aligned that there should be enough use cases to justify this and that the implementation must be reliable. Because of the complexity and maintenance overhead, I don't think this belongs in Spark directly. At most, a separate package.
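To make the idea concrete, here is a minimal Scala sketch (not from the PR) of the kind of rewrite such an analysis might target: a side-effect-free typed closure over a case class, and the equivalent untyped expression that Catalyst can already analyze and optimize without deserializing each row. The `Person` case class and the surrounding boilerplate are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical case class used only for this example.
case class Person(name: String, age: Int)

object TypedVsUntypedSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-vs-untyped-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Alice", 30), Person("Bob", 17)).toDS()

    // Typed API: the closures are opaque to Catalyst, so Spark has to
    // deserialize every row into a Person object before invoking them.
    val typed = people
      .filter(p => p.age >= 18)
      .map(p => p.name)

    // Untyped equivalent: the same logic as Column expressions, which
    // Catalyst can analyze, push down, and evaluate on the serialized form.
    val untyped = people
      .filter(col("age") >= 18)
      .select(col("name").as[String])

    // Comparing the plans shows the serialization/deserialization steps
    // that a bytecode analyzer could eliminate by emitting the second form.
    typed.explain()
    untyped.explain()

    spark.stop()
  }
}
```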
