viirya commented on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-765879767
> Map doesn't expose it because object of type T is just passed into the map function.
>
> Pipe would need to serialize T in order to pass it to the different process, wouldn't it? I think that's what @HeartSaVioR was trying to say.

No. As I have mentioned several times above, Dataset.pipe works like RDD.pipe. RDD.pipe has a parameter `printRDDElement: (T, String => Unit) => Unit`, which is similar to the map function and takes the domain object T. The function receives the object T and uses its second parameter, a print function, to write out the string data. So it is basically like Dataset.map. That is why I'm confused that @HeartSaVioR keeps raising that point.
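To make the comparison concrete, here is a minimal sketch of how `printRDDElement` is passed to `RDD.pipe`. The `Person` case class, the `PipeSketch` object, the sample data, and the use of `cat` as the external command are illustrative assumptions, not code from this PR; it also assumes a Unix-like environment where `cat` is available.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical domain type, used only for illustration.
case class Person(name: String, age: Int)

object PipeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pipe-sketch")
      .getOrCreate()

    val people = spark.sparkContext.parallelize(
      Seq(Person("Ann", 31), Person("Bob", 27)))

    // printRDDElement receives the domain object and a print function.
    // The caller decides how each Person becomes a line of text for the
    // external process, much like a user-supplied map function would.
    val piped = people.pipe(
      command = Seq("cat"), // assumption: `cat` exists on the executor
      printRDDElement = (p: Person, printLine: String => Unit) =>
        printLine(s"${p.name},${p.age}")
    )

    piped.collect().foreach(println)
    spark.stop()
  }
}
```

In this sketch the string form of each element is produced by the user's function rather than by any fixed serialization of `T`, which is the sense in which it resembles `Dataset.map`.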
