HeartSaVioR commented on pull request #31296:
URL: https://github.com/apache/spark/pull/31296#issuecomment-765881095


   I'm sorry, now I understand how this approach avoids that issue. It 
converts each InternalRow to an object before passing it to PipedRDD, so it 
would behave the same as PipedRDD as long as the type is NOT a Spark-internal 
representation (preferably a custom one). I have some concerns from a 
performance perspective, but I totally agree this matches how PipedRDD works.
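   For context, the pattern being discussed (PipedRDD renders each element as a 
text line and streams it through an external process) can be sketched without 
Spark. This is only an illustrative standalone sketch, not the PR's actual 
implementation; the record shape and the external command are made up:

   ```python
   import subprocess
   import sys

   # Records as they might look after converting InternalRow-like data to
   # plain objects (here, dicts). As with RDD.pipe, each element is rendered
   # as one text line for the external process.
   records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
   lines = "\n".join(f"{r['id']}\t{r['name']}" for r in records)

   # Stream the text through an external command; a tiny Python one-liner
   # that upper-cases stdin stands in for the user's script.
   proc = subprocess.run(
       [sys.executable, "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())"],
       input=lines, capture_output=True, text=True, check=True,
   )
   output = proc.stdout.splitlines()  # one output line per input record
   ```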
   
   It's still an open question which approach is better, a top-level API vs. a 
built-in one, given that the API has existed for such a long time (the lifetime 
of 2.x) and no one seems to have complained about it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]