[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

xuanyuanking Thu, 30 Aug 2018 00:25:12 -0700

Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/22222
  
    @cloud-fan @rdblue 
    I'd like to leave some comments and thoughts after looking into this again; 
I hope they help us decide on the next step.
    Currently every plan assumes its input is `RDD[InternalRow]`, and the whole 
framework treats columnar reads as a special case. In addition, `inputRDDs` is 
called not only by `WholeStageCodegenExec` but also by every parent physical 
node, so nested plans easily turn into a mess when debugging this fix. We have 
three options: the first two remove the cast entirely but may require many 
changes to `CodegenSupport`; the last one limits the changes but leaves the 
cast problem in place:
    1. Erase the type of `inputRDDs`, because both `RDD[InternalRow]` and 
`RDD[ColumnarBatch]` must be allowed to pass through, mainly when a parent 
physical plan calls its child. This is implemented in the last commit of this PR: 
https://github.com/apache/spark/pull/22222/files
    2. Refactor the framework so that all plans deal with columnar batches.
    3. Limit the changes to `ColumnarBatchScan` and leave `CodegenSupport` 
unchanged; the cast problem remains. This is implemented in the first two 
commits of this PR: 
https://github.com/apache/spark/pull/22222/files/7e88599dfc2caf177d12e890d588be68bdd3bc8e
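
    To make the cast problem concrete, here is a minimal, self-contained 
sketch. It does not use Spark: `Row`, `Batch`, `FakeRDD` and `ErasureSketch` are 
hypothetical stand-ins for `InternalRow`, `ColumnarBatch`, `RDD` and a scan 
node, and the code is only an assumption about the general shape of the hack, 
not the code in this PR. It shows that a cast between differently parameterized 
containers succeeds because the element type is erased, so the mismatch only 
shows up when a consumer touches an element.

```scala
import scala.util.Try

// Stand-in types for illustration only; these are NOT Spark classes.
final case class Row(value: Int)
final case class Batch(rows: Seq[Row])

// Minimal invariant container playing the role of RDD[T].
final class FakeRDD[T](val data: Seq[T])

object ErasureSketch {
  // Declared to return rows, but the columnar path really holds batches.
  // The cast compiles and succeeds at runtime because T is erased.
  def scan(columnar: Boolean): FakeRDD[Row] =
    if (columnar)
      new FakeRDD[Batch](Seq(Batch(Seq(Row(1), Row(2))))).asInstanceOf[FakeRDD[Row]]
    else
      new FakeRDD[Row](Seq(Row(1), Row(2)))

  // A consumer that knows about the hack must inspect elements at runtime.
  def countRows(rdd: FakeRDD[Row]): Int = {
    val elems: Seq[Any] = rdd.data // upcast so both shapes can be matched
    elems.map {
      case b: Batch => b.rows.size
      case _        => 1
    }.sum
  }

  // A consumer that trusts the declared type fails only when it reads an element.
  def firstValue(rdd: FakeRDD[Row]): Try[Int] = Try(rdd.data.head.value)

  def main(args: Array[String]): Unit = {
    println(countRows(scan(columnar = true)))
    println(countRows(scan(columnar = false)))
    println(firstValue(scan(columnar = true)).isFailure)
  }
}
```

    In this sketch the consumer that pattern-matches on the element works for 
both shapes, while the one that trusts the declared `FakeRDD[Row]` type gets a 
`ClassCastException` (captured by `Try`) on the columnar path. That is why the 
declared type of `inputRDDs` matters to every parent node that calls it.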
    
    If none of these options makes sense, I'll just close this. Thanks.


