GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/14091
[SPARK-16412][SQL] Generate Java code that gets an array in each column of CachedBatch when DataFrame.cache() is called

## What changes were proposed in this pull request?

Waiting for #11956 to be merged.

This PR generates Java code that gets the array of each column directly from CachedBatch when DataFrame.cache() is called. This is done in whole-stage code generation.

When DataFrame.cache() is called, data is stored in column-oriented storage (columnar cache) in CachedBatch. This PR avoids the conversion from column-oriented storage to row-oriented storage. This PR handles an array type that is stored in a column.

This PR generates code for both row-oriented storage and column-oriented storage only if
- InMemoryColumnarTableScan exists in a plan sub-tree. The decision is made at runtime by checking whether a given iterator is a ColumnarIterator
- Sort or join does not exist in a plan sub-tree.

This PR generates Java code for the columnar cache only if the types of all columns accessed in operations are primitive or an array.

I will add benchmark suites [here](https://github.com/kiszk/spark/blob/SPARK-14098/sql/core/src/test/scala/org/apache/spark/sql/DataFrameCacheBenchmark.scala)

## How was this patch tested?
Added new tests into `DataFrameCacheSuite.scala`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-16412

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14091

----
commit 09af5a5851786b918f45c6f997b1c357745fe883
Author: Kazuaki Ishizaki <ishiz...@jp.ibm.com>
Date:   2016-07-07T10:36:14Z

    support codegen for an array in CachedBatch

commit 8e218e38d5acb6c04db221fcd3cd6d2483926552
Author: Kazuaki Ishizaki <ishiz...@jp.ibm.com>
Date:   2016-07-07T10:36:34Z

    update test suites

commit 54df41c8691f02dd9eac3eef3d816a130b87a5c9
Author: Kazuaki Ishizaki <ishiz...@jp.ibm.com>
Date:   2016-07-07T13:18:58Z

    remove debug print

----
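
For readers unfamiliar with the runtime decision mentioned in the description (checking whether the input iterator is a ColumnarIterator and otherwise falling back to row-oriented access), the following is a minimal, self-contained sketch of that idea. It is not the code this PR generates and does not use any Spark classes: `ToyColumnarIterator`, `ToyBatch`, and `sumColumn` are hypothetical names invented for illustration, and the column layout (one `int[]` per column) is an assumption.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model only: none of these types are Spark classes.
final class ColumnarDispatchSketch {

    // Hypothetical stand-in for an iterator that can also expose a whole column as an array.
    interface ToyColumnarIterator extends Iterator<Object[]> {
        int[] intColumn(int ordinal);
    }

    // Hypothetical columnar batch: one int[] per column.
    static final class ToyBatch implements ToyColumnarIterator {
        private final int[][] columns;
        private int row = 0;

        ToyBatch(int[][] columns) { this.columns = columns; }

        @Override public boolean hasNext() {
            return columns.length > 0 && row < columns[0].length;
        }

        // Row-oriented view: materializes one boxed row per call.
        @Override public Object[] next() {
            if (!hasNext()) throw new NoSuchElementException();
            Object[] r = new Object[columns.length];
            for (int c = 0; c < columns.length; c++) r[c] = columns[c][row];
            row++;
            return r;
        }

        @Override public int[] intColumn(int ordinal) { return columns[ordinal]; }
    }

    // Sums one int column, choosing the access path at runtime.
    static long sumColumn(Iterator<Object[]> input, int ordinal) {
        long sum = 0;
        if (input instanceof ToyColumnarIterator) {
            // Columnar path: read the primitive array directly, no row objects are built.
            for (int v : ((ToyColumnarIterator) input).intColumn(ordinal)) sum += v;
        } else {
            // Row path: consume one row at a time.
            while (input.hasNext()) sum += (Integer) input.next()[ordinal];
        }
        return sum;
    }

    public static void main(String[] args) {
        Iterator<Object[]> columnar = new ToyBatch(new int[][] {{1, 2, 3}, {10, 20, 30}});
        List<Object[]> rows = Arrays.asList(
            new Object[] {1, 10}, new Object[] {2, 20}, new Object[] {3, 30});
        System.out.println(sumColumn(columnar, 1));        // prints 60
        System.out.println(sumColumn(rows.iterator(), 1)); // prints 60
    }
}
```

Both calls return the same result; the difference is only in how the column is read. In the sketch the columnar branch skips the per-row boxing that the row branch performs, which is the kind of conversion cost the description says the PR avoids.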