HeartSaVioR commented on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-766520145
Is it too hard a requirement to explain the actual use case, especially since you've said you have an internal customer asking for this feature? I don't think my request requires anything that needs redaction. (If there is something sensitive, you can abstract the details or redact them yourself.) My first comment asked about the actual use case, and I have been asking consistently.

I don't think `RDD.pipe` and `Dataset.pipe` are exactly the same, at least regarding the usability of the default `printRDDElement`. Many users work with the "untyped" Dataset (DataFrame), for which the default `printRDDElement` would depend on the internal implementation (`Row` is just an interface). The default serializer implementation only works if the Dataset has a single column whose type matches a Java/Scala type; otherwise users always need to provide their own serializer implementation.

Based on this, I wonder whether we should allow a default serializer at all. Perhaps we should require end users to provide a serializer, so that they know what they are doing.
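To illustrate the concern about a default serializer, here is a minimal, self-contained Scala sketch. Note that `Row` and `defaultPrintElement` below are hypothetical stand-ins for illustration only, not the actual Spark API: a naive default that stringifies each element works predictably for single, primitive-typed values, but for a multi-column row it leaks an internal, ambiguous representation that the piped external process would have to parse by convention.

```scala
// Hypothetical stand-in for org.apache.spark.sql.Row (NOT the real class):
// its toString is an internal detail, not a stable wire format.
case class Row(values: Any*) {
  override def toString: String = values.mkString("[", ",", "]")
}

// Hypothetical default serializer: just calls toString on each element,
// mirroring what a default printRDDElement for Dataset.pipe might do.
def defaultPrintElement(elem: Any, printFunc: String => Unit): Unit =
  printFunc(elem.toString)

// A single primitive-typed value round-trips predictably:
defaultPrintElement("hello", println)              // prints: hello

// A multi-column row does not: the embedded comma in "a,b" makes the
// output ambiguous to any external process that splits on commas.
defaultPrintElement(Row(1, "a,b", null), println)  // prints: [1,a,b,null]
```

This is why requiring the caller to supply an explicit serializer (rather than silently falling back to `toString`) may be the safer API choice for untyped Datasets.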
