AngersZhuuuu commented on issue #25028: [SPARK-28227][SQL] Support TRANSFORM with aggregation.
URL: https://github.com/apache/spark/pull/25028#issuecomment-548233362

> simpler test case:
>
> ```
> FROM (select 1 as key, 100 as value) src
> MAP src.*, CAST(src.key % 10 AS INT), src.value
> USING 'cat' AS (k, v, one, tvalue);
> ```

Found the reason. In the current transform code, here:

https://github.com/apache/spark/blob/095f7b05fd7ae8ce0d8a82f0c4bc26aa92853762/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L1098

the star is expanded against `t.child.output`. After `t.child` has been analyzed, its output is `src.key`, `src.value`, `gen_alias_2`, `src.value`, so the expand method takes every attribute in `t.child`'s output that matches `src` as the transform's input. That is what causes this error.

Change it to:

```
    // If the script transformation input contains Stars, expand it.
    case t: ScriptTransformation if containsStar(t.input) =>
      t.copy(
        input = t.child.output
      )
```

This change is reasonable since the transform's input is its child's output.
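
To make the mismatch concrete, here is a minimal toy sketch of the two behaviours. It does not use Spark's Catalyst classes: `Attr`, `Star`, `Named`, and `expandStar` are hypothetical stand-ins, and the qualifier filter is only an assumption about how the star gets matched against the child's output.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
final case class Attr(name: String, qualifier: Option[String])

sealed trait InputExpr
final case class Star(qualifier: String) extends InputExpr
final case class Named(attr: Attr) extends InputExpr

object StarExpansionSketch {
  // Child output after analysis, as listed in the comment above.
  val childOutput: Seq[Attr] = Seq(
    Attr("key", Some("src")),
    Attr("value", Some("src")),
    Attr("gen_alias_2", None),
    Attr("value", Some("src"))
  )

  // Transform input as written in the query: src.*, <cast expression>, src.value
  val input: Seq[InputExpr] = Seq(
    Star("src"),
    Named(Attr("gen_alias_2", None)),
    Named(Attr("value", Some("src")))
  )

  // Assumed matching rule: every child attribute qualified by `q` matches the star.
  def expandStar(q: String, output: Seq[Attr]): Seq[Attr] =
    output.filter(_.qualifier.contains(q))

  // Star expanded by qualifier: the trailing src.value is picked up by src.* as well.
  def expandedByQualifier: Seq[Attr] = input.flatMap {
    case Star(q)  => expandStar(q, childOutput)
    case Named(a) => Seq(a)
  }

  // Proposed behaviour: the transform's input is simply the child's output.
  def proposed: Seq[Attr] = childOutput

  def main(args: Array[String]): Unit = {
    println(s"qualifier-based expansion: ${expandedByQualifier.size} columns")
    println(s"child output:              ${proposed.size} columns")
  }
}
```

In this toy model the qualifier-based expansion yields five input columns, while the child's output has the four columns that `AS (k, v, one, tvalue)` expects.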
