AngersZhuuuu opened a new pull request #27983: [SPARK-15694][SQL][FOLLOW-UP] 
Implement ScriptTransformation in sql/core (part 1) 
URL: https://github.com/apache/spark/pull/27983
 
 
   ### What changes were proposed in this pull request?
   
    * Renamed `hive/execution/ScriptTransformationExec` to 
`hive/execution/script/HiveScriptTransformationExec`
    * Added `ScriptTransformationExec`, which runs the script operator in SQL
mode (without Hive).
   The script's output is read as a string, and column values are extracted
by splitting on a delimiter (default: tab character).
    * `ScriptTransformBase` has common code used across 
`ScriptTransformationExec` and `HiveScriptTransformationExec`
    * For the thread that writes data to the script,
`ScriptTransformationWriterThread` has the core logic.
`HiveScriptTransformationWriterThread` extends it with Hive-specific behavior.
    * `ScriptTransformationWriterThread` will be used for Spark SQL. It only
supports writing data to the script process by serializing column values as a
tab-delimited string.
    * `HiveScriptTransformationWriterThread` additionally supports Hive serde
    * Added a `Strategy` named `Scripts` that emits `ScriptTransformationExec`
in physical plans. It is used in non-Hive mode.
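
To illustrate the non-Hive data path described above, here is a minimal Python sketch, not the Scala code in this PR. It assumes each row is written to the script as one tab-delimited line and each output line is split back into string columns. The names `serialize_row`, `parse_output_line`, and `transform` are hypothetical, and null handling and escaping are not covered.

```python
import subprocess

FIELD_DELIMITER = "\t"  # default delimiter named in the PR description


def serialize_row(row, delimiter=FIELD_DELIMITER):
    """Join column values into one delimited line for the script's stdin."""
    return delimiter.join(str(value) for value in row)


def parse_output_line(line, delimiter=FIELD_DELIMITER):
    """Split one line of script output back into string column values."""
    return line.rstrip("\n").split(delimiter)


def transform(rows, command, delimiter=FIELD_DELIMITER):
    """Pipe rows through an external command and parse what it prints.

    Assumption: the command reads lines on stdin and writes lines on stdout.
    """
    payload = "".join(serialize_row(row, delimiter) + "\n" for row in rows)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    return [parse_output_line(line, delimiter) for line in result.stdout.splitlines()]
```

For example, on a system where `cat` is available, `transform([(1, "a")], ["cat"])` sends the line `1<TAB>a` to the process and reads every column back as a string. The actual operator writes to the script from a separate writer thread rather than buffering all input up front as this sketch does.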
   
   Todo list:
   
   - With Hive, only serdes must be used by default; without Hive, serdes
cannot be used
   - Clean up past hacks that have been observed (or that people have
suggested / reported)
          - Support using transform with aggregation
[SPARK-28227](https://issues.apache.org/jira/browse/SPARK-28227)
          - Support array/map as transform's input
[SPARK-22435](https://issues.apache.org/jira/browse/SPARK-22435)
   - Use code-gen projection to serialize rows to the output stream
   
   ### Why are the changes needed?
   To support running transform in SQL mode without Hive.
   
   
   ### Does this PR introduce any user-facing change?
   Yes
   
   
   ### How was this patch tested?
   Added unit tests.
   

