AngersZhuuuu commented on issue #25028: [SPARK-28227][SQL] Support TRANSFORM 
with aggregation.
URL: https://github.com/apache/spark/pull/25028#issuecomment-548233362
 
 
   > simpler test case:
   > 
   > ```
   > FROM (select 1 as key, 100 as value) src
   > MAP src.*, CAST(src.key % 10 AS INT), src.value
   > USING 'cat' AS (k, v, one, tvalue);
   > ```
   
   Found the reason. In the current transform implementation, the star is expanded here:
   
https://github.com/apache/spark/blob/095f7b05fd7ae8ce0d8a82f0c4bc26aa92853762/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L1098
   
   This rule expands the star against `t.child.output`. After `t.child` has been 
analyzed, its output is `src.key`, `src.value`, `gen_alias_2`, `src.value`. The 
expand method then takes every attribute in `t.child`'s output that matches 
`src` as the transform's input, and that is what causes this error. 
   
   Change it to:
   
   ```
         // If the script transformation input contains Stars, expand it.
         case t: ScriptTransformation if containsStar(t.input) =>
           t.copy(
             input = t.child.output
           )
   ```
   
   This change is reasonable, since the transform's input is its child's output.
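
To make the difference concrete, here is a minimal toy model of the two behaviours. It does not use Spark's classes: `Attr`, `expandStar`, and `StarExpansionSketch` are hypothetical names, and the assumption that `gen_alias_2` carries no `src` qualifier is mine, not something confirmed in the Analyzer source.

```scala
// Toy stand-in for a resolved attribute; this is NOT Spark's AttributeReference.
final case class Attr(qualifier: Option[String], name: String)

object StarExpansionSketch {
  // Hypothetical model of expanding `q.*`: keep every attribute whose
  // qualifier equals q, in the order it appears in the plan's output.
  def expandStar(qualifier: String, output: Seq[Attr]): Seq[Attr] =
    output.filter(_.qualifier.contains(qualifier))

  // Output of the analyzed child, as listed in the comment above.
  // Assumption: the generated alias has no qualifier.
  val childOutput: Seq[Attr] = Seq(
    Attr(Some("src"), "key"),
    Attr(Some("src"), "value"),
    Attr(None, "gen_alias_2"),
    Attr(Some("src"), "value")
  )

  // Modeled current behaviour: `src.*` also picks up the trailing src.value.
  val expanded: Seq[Attr] = expandStar("src", childOutput)

  // Modeled proposed behaviour: use the child's output unchanged.
  val proposedInput: Seq[Attr] = childOutput

  def main(args: Array[String]): Unit = {
    println(expanded.map(_.name).mkString(", "))      // key, value, value
    println(proposedInput.map(_.name).mkString(", ")) // key, value, gen_alias_2, value
  }
}
```

In this sketch the star expansion yields three attributes where `src` itself has only two columns, while taking the child's output directly keeps all four columns in the order the `AS (k, v, one, tvalue)` clause expects.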
   
