Github user kiszk commented on the pull request:

    https://github.com/apache/spark/pull/11956#issuecomment-215696918
  
    @davies I love your idea of using Apache Arrow or another in-memory format 
instead of ```Array[Byte]``` for the DataFrame cache. I would like to create a 
JIRA entry for that in the near future.
    This idea relates to Part 1 of this PR (getting data from columnar storage 
for the DataFrame cache under ```org.apache.spark.sql.execution.columnar```). 
Whichever in-memory format we use, the interface to code generation would be 
```ColumnVector```. As you can see, we have already succeeded in wrapping the 
```Array[Byte]``` and Parquet representations behind methods such as 
```getInt()``` in ```ColumnVector```.
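
    To illustrate the point about wrapping different representations behind 
one accessor, here is a minimal sketch. It is not Spark's actual 
```ColumnVector``` class: the names ```IntColumn```, ```ByteArrayIntColumn```, 
```IntArrayColumn``` and ```ColumnSketch``` are hypothetical, and the 4-byte 
little-endian layout of the byte array is an assumption made only for this 
example. The consumer (```sum```) stands in for generated code and touches the 
data only through ```getInt()```.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the accessor side of a column vector (not Spark's class).
interface IntColumn {
    int getInt(int rowId);
    int numRows();
}

// Column backed by a raw byte array; assumes 4 little-endian bytes per value.
final class ByteArrayIntColumn implements IntColumn {
    private final ByteBuffer buf;
    private final int numRows;

    ByteArrayIntColumn(byte[] bytes) {
        this.buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        this.numRows = bytes.length / 4;
    }

    @Override public int getInt(int rowId) { return buf.getInt(rowId * 4); }
    @Override public int numRows() { return numRows; }
}

// Column backed by a plain int[], standing in for some other in-memory format.
final class IntArrayColumn implements IntColumn {
    private final int[] values;

    IntArrayColumn(int[] values) { this.values = values; }

    @Override public int getInt(int rowId) { return values[rowId]; }
    @Override public int numRows() { return values.length; }
}

final class ColumnSketch {
    // Plays the role of generated code: it sees only the IntColumn interface.
    static long sum(IntColumn col) {
        long total = 0;
        for (int i = 0; i < col.numRows(); i++) {
            total += col.getInt(i);
        }
        return total;
    }

    // Helper that packs ints into the byte layout assumed above.
    static byte[] encode(int... values) {
        ByteBuffer b = ByteBuffer.allocate(values.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            b.putInt(v);
        }
        return b.array();
    }

    public static void main(String[] args) {
        IntColumn fromBytes = new ByteArrayIntColumn(encode(1, 2, 3));
        IntColumn fromInts = new IntArrayColumn(new int[] {1, 2, 3});
        // Both backings give the same result through the same accessor.
        System.out.println(sum(fromBytes) == sum(fromInts)); // prints true
    }
}
```

    Swapping in another backing format would then only require a new 
implementation of the accessor interface, with the consumer left unchanged.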
    
    For this PR, can we at least target Part 2 (codegen under 
```org.apache.spark.sql.execution```) for 2.0? Part 2 can accept any in-memory 
representation through ```ColumnVector```. It is also very small (fewer than 
200 lines) once the Part 1 code is dropped, so it should be easy to review.


