HeartSaVioR commented on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-766520145
Is it too hard a requirement to explain the actual use case, especially since you've said you have an internal customer asking for this feature? I don't think my request requires anything that needs redaction. (If there is something sensitive, you can abstract the details or redact them yourself.) My first comment asked about the actual use case, and I have been asking consistently.

I don't think `RDD.pipe` and `Dataset.pipe` are exactly the same, at least regarding the usability of the default `printRDDElement`. Many users work with the "untyped" Dataset (DataFrame), for which the default `printRDDElement` would depend on the internal implementation (`Row` is just an interface). The default serializer implementation only works if the Dataset has a single column whose type matches a Java/Scala type; otherwise users always need to provide their own serializer implementation.

Based on this, I wonder whether we should allow a default serializer at all. Perhaps we should require end users to provide a serializer, so that they know what they are doing.
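To illustrate the concern about a default serializer, here is a minimal, self-contained Scala sketch. Note that `Row` and `defaultPrintElement` below are hypothetical stand-ins for illustration only, not the actual Spark API: a naive default that stringifies each element works predictably for single, primitive-typed values, but for a multi-column row it leaks an internal, ambiguous representation that the piped external process would have to parse by convention.

```scala
// Hypothetical stand-in for org.apache.spark.sql.Row (NOT the real class):
// its toString is an internal detail, not a stable wire format.
case class Row(values: Any*) {
  override def toString: String = values.mkString("[", ",", "]")
}

// Hypothetical default serializer: just calls toString on each element,
// mirroring what a default printRDDElement for Dataset.pipe might do.
def defaultPrintElement(elem: Any, printFunc: String => Unit): Unit =
  printFunc(elem.toString)

// A single primitive-typed value round-trips predictably:
defaultPrintElement("hello", println)              // prints: hello

// A multi-column row does not: the embedded comma in "a,b" makes the
// output ambiguous to any external process that splits on commas.
defaultPrintElement(Row(1, "a,b", null), println)  // prints: [1,a,b,null]
```

This is why requiring the caller to supply an explicit serializer (rather than silently falling back to `toString`) may be the safer API choice for untyped Datasets.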
