HeartSaVioR commented on pull request #31296:
URL: https://github.com/apache/spark/pull/31296#issuecomment-766520145


   Is it too hard a requirement to explain the actual use case, especially 
since you've said you have an internal customer asking for this feature? I don't 
think my request requires anything that needs redaction. (If there is something 
sensitive, you can abstract the details or redact them yourself.) My first 
comment asked about the actual use case, and I have been asking consistently.
   
   I don't think `RDD.pipe` and `Dataset.pipe` are exactly the same, at least 
in the usability of the default `printRDDElement`. There are lots of users 
working with the "untyped" Dataset (DataFrame), for which the default 
`printRDDElement` would depend on the internal implementation (`Row` is just an 
interface). The default serializer implementation only works if the Dataset has 
a single column whose type maps to a Java/Scala type; otherwise users will 
always want to provide their own serializer implementation. Based on this, I 
wonder whether we should allow a default serializer at all - probably we want 
to require end users to provide a serializer, so that they know what they are 
doing.
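   For context, the existing `RDD.pipe` API already lets the caller supply the 
serialization via the `printRDDElement` parameter. A minimal sketch of the 
point above (the DataFrame contents and the tab-separated wire format are 
illustrative assumptions, not anything from the PR):

```scala
import org.apache.spark.sql.{Row, SparkSession}

object ExplicitPipeSerializer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("explicit-pipe-serializer")
      .getOrCreate()
    import spark.implicits._

    // A multi-column, "untyped" DataFrame: its elements are generic Row objects.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // Relying on the default would pipe Row.toString (an internal
    // representation the external command should not depend on).
    // Supplying printRDDElement makes the wire format an explicit contract:
    val piped = df.rdd.pipe(
      command = Seq("cat"),
      printRDDElement = (row: Row, printFn: String => Unit) =>
        printFn(s"${row.getInt(0)}\t${row.getString(1)}")
    )

    piped.collect().foreach(println)
    spark.stop()
  }
}
```

   This is the usability gap the comment describes: with a single typed column 
the default serialization is fine, but for a multi-column `Row` the caller 
effectively always has to write this function themselves.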


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


