HeartSaVioR commented on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-765876628
> Our internal client needs to pipe streaming read from Kafka through a forked process, and currently with SS users cannot do it. I think the above question also applied to RDD.pipe. It's users' responsibility to make sure the process can understand the input.

Yes, the question applies to RDD.pipe as well, but there the serialization is done via `PrintWriter.println`, which is relatively "known": it calls String.valueOf(T) and prints the result. That is easy to reason about, though it performs poorly and is tricky to deserialize if toString is implemented in a human-friendly way.

Here we are serializing via Encoder, which is a black box to others (have we documented how an encoder encodes an object?), so that can't be the users' responsibility. And once we do this, the encoding approach also becomes a kind of public API.
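
To make the println-based behaviour concrete, here is a minimal sketch in plain Java. It is not Spark code: the `Event` class and the `serialize` helper are hypothetical names invented for illustration. It relies only on `java.io.PrintWriter`, whose `println(Object)` prints `String.valueOf(x)` followed by a line separator.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

class PipeSerializationSketch {
    // Hypothetical element type with a human-friendly toString.
    static final class Event {
        final String topic;
        final long offset;

        Event(String topic, long offset) {
            this.topic = topic;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return "Event(topic=" + topic + ", offset=" + offset + ")";
        }
    }

    // Writes one line per element; println(Object) uses String.valueOf(element).
    static <T> String serialize(List<T> elements) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        for (T element : elements) {
            out.println(element);
        }
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        // The forked process would receive exactly this text on stdin.
        System.out.print(serialize(Arrays.asList(new Event("clicks", 42L))));
    }
}
```

The receiving process sees only the toString text, so it has to parse a format such as `Event(topic=clicks, offset=42)` on its own. That is awkward, but anyone can predict the wire format by reading toString, which is not the case for an undocumented encoder layout.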
