Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/22222
@cloud-fan @rdblue
I want to leave some comments and thoughts after looking into this again;
I hope they help us decide on the next step.
Currently every plan assumes its input is `RDD[InternalRow]`, and the whole
framework treats columnar reads as a special case. Also, `inputRDDs` is called
not only in `WholeStageCodegenExec` but also by every parent physical node, so
it is very easy to get into a mess with nested plans while debugging this fix.
So we may have these 3 choices. The first two can remove the cast entirely but
may require many changes to `CodegenSupport`; the last one limits the changes
but still has the cast problem:
1. Erase the element type of `inputRDDs`, because we should allow both
`RDD[InternalRow]` and `RDD[ColumnarBatch]` to be passed, mainly for the case
where a parent physical plan calls its child. This is implemented as the last
commit in this PR:
https://github.com/apache/spark/pull/22222/files
2. Refactor the framework so that all plans deal with columnar batches.
3. Limit the changes to `ColumnarBatchScan` and don't change
`CodegenSupport`, which still leaves the cast problem. This is implemented as
the first two commits in this PR:
https://github.com/apache/spark/pull/22222/files/7e88599dfc2caf177d12e890d588be68bdd3bc8e
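To make the trade-off between choices 1 and 3 concrete, here is a minimal, Spark-free sketch. Everything in it is a hypothetical stand-in: `Row`, `Batch`, `RowTypedPlan`, `ErasedPlan` and `countRows` are illustration-only names, not the classes or signatures used in the PR, and plain `Seq` stands in for `RDD`. It only shows why a cast hidden behind a row-typed signature tends to fail far from where it was written, and what a parent would have to do if the signature stopped promising an element type.

```scala
import scala.util.Try

object InputTypeSketch {
  // Hypothetical stand-ins for illustration only; not Spark's InternalRow / ColumnarBatch.
  final case class Row(values: Seq[Any])
  final case class Batch(rows: Seq[Row])

  // Choice 3 in miniature: the signature promises rows, and the scan casts batches to fit.
  trait RowTypedPlan { def inputs(): Seq[Seq[Row]] }

  object CastingScan extends RowTypedPlan {
    private val batches: Seq[Batch] = Seq(Batch(Seq(Row(Seq(1)), Row(Seq(2)))))
    // Unchecked cast: element types are erased at runtime, so this line itself never fails.
    def inputs(): Seq[Seq[Row]] = Seq(batches.asInstanceOf[Seq[Row]])
  }

  // Choice 1 in miniature: the signature no longer claims an element type.
  trait ErasedPlan { def inputs(): Seq[Seq[Any]] }

  object ErasedScan extends ErasedPlan {
    def inputs(): Seq[Seq[Any]] = Seq(Seq(Batch(Seq(Row(Seq(1)), Row(Seq(2))))))
  }

  // A parent that has to handle both shapes explicitly instead of trusting the signature.
  def countRows(plan: ErasedPlan): Int =
    plan.inputs().flatten.map {
      case _: Row   => 1
      case b: Batch => b.rows.size
      case other    => throw new IllegalStateException(s"unexpected input: $other")
    }.sum

  def main(args: Array[String]): Unit = {
    // A parent that trusts the Row signature fails only when it touches an element.
    val trusted = Try(CastingScan.inputs().head.head.values)
    println(s"row-typed parent succeeded: ${trusted.isSuccess}")
    println(s"rows counted via erased signature: ${countRows(ErasedScan)}")
  }
}
```

In this toy version the cast in `CastingScan` is only discovered when some parent reads an element as a `Row`, which is the nested-plan debugging problem described above; the erased signature moves that decision into an explicit match in the parent, at the cost of touching every caller.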
If none of these makes sense, I'll just close this. Thanks.