rxin edited a comment on issue #24795: [SPARK-27945][SQL] Minimal changes to 
support columnar processing
URL: https://github.com/apache/spark/pull/24795#issuecomment-499869992
 
 
   @revans2 I still feel this is pretty invasive. I think you can still 
accomplish your goal while reducing the scope quite significantly.
   
   If I understand your use case correctly, what you want to do is to build some 
processing logic completely outside Spark. What you really need for that is quite 
small (not very different from an uber UDF interface):
   
   1. The ability to inspect and transform Catalyst plans and to create your own 
plans, which already exists
   2. The ability for operators to read columnar input, which somewhat exists 
at the scan level for Parquet and ORC but is a bit hacky at the moment. It can 
be generalized to support arbitrary operators.
   3. A conversion operator that generates ColumnarBatches.
   
   I think that's all you need?
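   To make pieces 2 and 3 concrete, here is a rough sketch of what a 
row-to-columnar conversion operator could look like. All names here 
(`ColumnarBatchLite`, `RowToColumnarOp`) are hypothetical simplifications, not 
Spark's actual `ColumnarBatch`/`ColumnVector` API; this only illustrates the 
shape of the conversion step.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified column-oriented batch: one int[] per column.
// Spark's real ColumnarBatch/ColumnVector API is far richer; this only
// sketches the conversion operator from the list above.
class ColumnarBatchLite {
    final int[][] columns;   // columns[c][r] = value of column c at row r
    final int numRows;

    ColumnarBatchLite(int[][] columns, int numRows) {
        this.columns = columns;
        this.numRows = numRows;
    }
}

class RowToColumnarOp {
    // Transpose a batch of rows (each an int[]) into column vectors.
    static ColumnarBatchLite convert(List<int[]> rows, int numCols) {
        int[][] cols = new int[numCols][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            for (int c = 0; c < numCols; c++) {
                cols[c][r] = rows.get(r)[c];
            }
        }
        return new ColumnarBatchLite(cols, rows.size());
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
            new int[]{1, 10}, new int[]{2, 20}, new int[]{3, 30});
        ColumnarBatchLite batch = convert(rows, 2);
        System.out.println(Arrays.toString(batch.columns[0])); // [1, 2, 3]
        System.out.println(Arrays.toString(batch.columns[1])); // [10, 20, 30]
    }
}
```

   With the plan-transformation hook and a conversion operator like this, the 
external columnar processing can sit entirely behind one plan node, with no new 
rule or expression interfaces inside Spark.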
   
   There is no need to define column-specific rules, or to create a columnar 
expression interface. Those expressions live entirely outside Spark, and don't 
really need to be Spark Expressions.
   
   I also don't understand why you added reference counting to these column 
vectors. Are you doing that because you might have other things outside Spark 
that want to hold onto these vectors? If that's the case, I think that code 
outside Spark should just copy the data, rather than holding onto the existing 
vectors. Otherwise, Spark can no longer reuse these vectors, and it would also 
create bigger memory management challenges (why is Spark generating so many 
vectors that it's not releasing?).
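   The copy-instead-of-retain point can be sketched as follows. The names here 
(`ReusableIntVector`, `copyOut`) are hypothetical, not Spark's API; the point is 
that the engine recycles one buffer between batches, so a consumer that needs a 
value past the current batch must take its own copy rather than a reference.

```java
import java.util.Arrays;

// Hypothetical illustration of copy-vs-retain: the engine owns one reusable
// vector and recycles it between batches. A consumer that wants values beyond
// the current batch copies them out instead of holding a reference to the
// engine's buffer (which reference counting would force the engine to keep).
class ReusableIntVector {
    private final int[] data;
    ReusableIntVector(int capacity) { data = new int[capacity]; }
    void load(int[] batch) { System.arraycopy(batch, 0, data, 0, batch.length); }
    int get(int i) { return data[i]; }
    int[] copyOut(int len) { return Arrays.copyOf(data, len); } // consumer's own copy
}

class CopyVsRetain {
    public static void main(String[] args) {
        ReusableIntVector vec = new ReusableIntVector(4);

        vec.load(new int[]{1, 2, 3, 4});
        int[] kept = vec.copyOut(4);      // safe: the consumer owns this array

        vec.load(new int[]{5, 6, 7, 8});  // the engine reuses the same vector

        System.out.println(Arrays.toString(kept)); // [1, 2, 3, 4]
        System.out.println(vec.get(0));            // 5
    }
}
```

   Copying costs a little per batch, but it keeps vector lifetimes entirely 
inside the engine, which is what lets Spark keep recycling buffers.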
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]

