HeartSaVioR commented on pull request #31296:
URL: https://github.com/apache/spark/pull/31296#issuecomment-765876628


   > Our internal client needs to pipe a streaming read from Kafka through a 
forked process, and currently users cannot do that with SS. I think the above 
question also applies to RDD.pipe. It's the users' responsibility to make sure 
the process can understand the input.
   
   Yes, the same question applies to RDD.pipe, but there the serialization is 
done via `PrintWriter.println`, which is relatively well known: it calls 
`String.valueOf(T)` and prints the result. That is easy to reason about, though 
it performs poorly and is tricky to deserialize if `toString` is implemented in 
a human-friendly way.
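
   A minimal sketch of that write path, to make the point concrete. It does 
not use Spark: it only assumes that each element is written with 
`println(Object)`, which emits `String.valueOf(element)` followed by a line 
separator. The `Reading` class and the `toPipedLine` helper are hypothetical 
names introduced for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class PipeLineFormatSketch {
    // Hypothetical element type whose toString is written for humans, not parsers.
    static final class Reading {
        final String sensor;
        final double value;

        Reading(String sensor, double value) {
            this.sensor = sensor;
            this.value = value;
        }

        @Override
        public String toString() {
            return "Reading of " + value + " from sensor " + sensor;
        }
    }

    // Assumed write path: println(Object) writes String.valueOf(element)
    // followed by the platform line separator.
    static String toPipedLine(Object element) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            out.println(element);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // A primitive-like value is trivial for the forked process to parse.
        System.out.print(toPipedLine(42));
        // A human-friendly toString forces the process to parse free text.
        System.out.print(toPipedLine(new Reading("a1", 3.5)));
    }
}
```

   The forked process sees exactly one text line per element, so the format is 
predictable, but recovering `sensor` and `value` from the second line requires 
parsing free-form text.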
   
   Here we are serializing via Encoder, which is a black box to others (have 
we documented how an encoder encodes an object?), so understanding the input 
can't be the users' responsibility. And once we do this, the encoding approach 
also becomes a kind of public API.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


