[GitHub] [spark] HeartSaVioR edited a comment on pull request #31296: [SPARK-34205][SQL][SS] Add pipe to Dataset to enable Streaming Dataset pipe

GitBox Fri, 22 Jan 2021 20:22:57 -0800


HeartSaVioR edited a comment on pull request #31296:
URL: https://github.com/apache/spark/pull/31296#issuecomment-765863456



   Could you please describe the actual use case? I would like to confirm that 
this works with complicated schemas such as array/map/nested struct with some 
binary columns, and to understand how the forked process can deserialize its 
inputs properly, apply operations, and serialize the results again (the last 
step wouldn't matter much, since the return type is `Dataset[String]`).
   
   In addition, this would incur a non-trivial serde cost on communication 
between the Spark process and the external process. We probably also need to 
revisit what benefits this gives us compared to what Spark provides now (UDFs, 
or other options I may have missed).
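
   To make the serde concern concrete, here is a minimal sketch in plain 
Python. It is not the API proposed in this PR and does not use Spark at all. 
It assumes rows are exchanged with the forked process as one JSON document per 
line, with binary values base64-encoded so that raw bytes (including newlines) 
cannot break the line framing. The helper names (`encode_row`, `decode_row`, 
`pipe_rows`) and the `__b64__` marker are hypothetical, chosen only for 
illustration, and the external process is a trivial echo.

```python
import base64
import json
import subprocess
import sys


def encode_row(row: dict) -> str:
    """Serialize one row to a single JSON line; bytes become base64 text."""
    def default(value):
        # json cannot encode bytes directly, so wrap them in a marker object.
        if isinstance(value, (bytes, bytearray)):
            return {"__b64__": base64.b64encode(value).decode("ascii")}
        raise TypeError(f"unsupported type: {type(value)!r}")
    return json.dumps(row, default=default)


def decode_row(line: str) -> dict:
    """Inverse of encode_row: restore bytes from the base64 marker objects."""
    def hook(obj):
        if set(obj) == {"__b64__"}:
            return base64.b64decode(obj["__b64__"])
        return obj
    return json.loads(line, object_hook=hook)


def pipe_rows(rows, command):
    """Send rows to an external process over stdin, return its stdout lines."""
    payload = "".join(encode_row(r) + "\n" for r in rows)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Hypothetical external process: copies stdin to stdout unchanged.
ECHO = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

rows = [
    {
        "id": 1,
        "tags": ["a", "b"],                    # array column
        "attrs": {"k": "v"},                   # map column
        "nested": {"payload": b"\x00\xff\n"},  # struct with a binary field
    }
]

out = pipe_rows(rows, ECHO)                     # plain strings come back
decoded = [decode_row(line) for line in out]    # the caller has to parse again
```

   Even in this toy form, every row is encoded once on the way out and parsed 
again on the way back, and the caller only gets strings unless it agrees on a 
format with the external process. That is the cost and the contract I would 
like to see spelled out for the real use case.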


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
