rxin edited a comment on issue #24795: [SPARK-27945][SQL] Minimal changes to support columnar processing
URL: https://github.com/apache/spark/pull/24795#issuecomment-499869992

@revans2 I still feel this is pretty invasive. I think you can still accomplish your goal while reducing the scope quite significantly. If I understand your use case correctly, what you want to do is build some processing logic completely outside Spark. What you really need for that is quite small (not very different from an uber UDF interface):

1. The ability to inspect and transform Catalyst plans, and to create your own plans, which already exists (sketched below).
2. The ability for operators to read columnar input, which somewhat exists at the scan level for Parquet and ORC but is a bit hacky at the moment. It could be generalized to support arbitrary operators.
3. A conversion operator that generates ColumnarBatches (sketched below).

I think that's all you need? There is no need to define column-specific rules or to create a columnar expression interface. Those expressions live entirely outside Spark and don't really need to be Spark Expressions.

I also don't understand why you added reference counting to these column vectors. Are you doing that because you might have other things outside Spark that want to hold onto these vectors? If that's the case, I think that code outside Spark should just copy the data rather than holding onto existing vectors. Otherwise, Spark can no longer reuse these vectors, and it would also create bigger memory management challenges (why is Spark generating so many vectors that it's not releasing?).
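To make point 1 concrete, here is a minimal sketch of the plan-transformation hook that already exists: injecting a custom planner strategy through `SparkSessionExtensions`. `MyColumnarStrategy` and `MyExtensions` are hypothetical names, and the actual pattern-matching logic is elided; only `injectPlannerStrategy` and the `spark.sql.extensions` config are real Spark APIs.

```scala
import org.apache.spark.sql.{SparkSessionExtensions, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical strategy that would match a supported subplan and replace it
// with operators backed by an external columnar engine.
object MyColumnarStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = {
    // Pattern-match on `plan` and emit custom SparkPlan nodes here;
    // returning Nil falls through to Spark's built-in strategies.
    Nil
  }
}

// Registered via the existing extensions mechanism, e.g.
//   --conf spark.sql.extensions=com.example.MyExtensions
class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectPlannerStrategy(_ => MyColumnarStrategy)
  }
}
```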
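And a minimal sketch of the conversion operator in point 3: packing row input into fixed-size ColumnarBatches that an external columnar operator could consume. `toColumnarBatches` is an illustrative helper, not an API from this PR, and it assumes non-null IntegerType columns purely to keep the example short.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch}

// Illustrative helper: convert an iterator of rows into ColumnarBatches.
def toColumnarBatches(
    rows: Iterator[InternalRow],
    schema: StructType,
    batchSize: Int = 4096): Iterator[ColumnarBatch] = new Iterator[ColumnarBatch] {
  override def hasNext: Boolean = rows.hasNext
  override def next(): ColumnarBatch = {
    // One writable on-heap vector per column for this batch.
    val vectors = OnHeapColumnVector.allocateColumns(batchSize, schema)
    var n = 0
    while (rows.hasNext && n < batchSize) {
      val row = rows.next()
      var i = 0
      while (i < schema.length) {
        vectors(i).putInt(n, row.getInt(i)) // non-null IntegerType assumed
        i += 1
      }
      n += 1
    }
    val batch = new ColumnarBatch(vectors.asInstanceOf[Array[ColumnVector]])
    batch.setNumRows(n)
    batch
  }
}
```

Batching into fixed-size vectors like this is what lets a downstream columnar engine amortize per-row overhead, without Spark needing columnar rules or expressions of its own.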
