viirya commented on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-765878887
> It is an issue because encoder only specifies how an object would map to the internal physical structure of the row, and by exposing this pipe API, we are exposing the structure (serialization) itself.
>
> We can never change this serialization anymore.

So does Dataset.map also expose it? The pipe API only outputs the domain object T as a string to the forked process. The forked process doesn't touch the physical structure. It is just like Dataset.map, which outputs the domain object T to a user-provided function. The user-provided function doesn't know about the physical structure.
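To make the comparison concrete, here is a minimal sketch in plain Python. It is not Spark code and not the API proposed in this PR: `Person`, `map_objects`, and `pipe_objects` are hypothetical names used only for illustration. It shows that a map-style function receives the domain object itself, while a forked process receives only a string rendering of that object. Neither sees any internal row layout.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical domain object, standing in for T.
    name: str
    age: int


def map_objects(items, fn):
    # Stand-in for a map-style API: the user function receives domain objects.
    return [fn(item) for item in items]


def pipe_objects(items, command, to_line=str):
    # Stand-in for a pipe-style API (an assumption, not Spark's implementation):
    # each object is rendered to one text line and written to the child's stdin;
    # the child's stdout is returned as a list of lines.
    text = "".join(to_line(item) + "\n" for item in items)
    result = subprocess.run(
        command, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


people = [Person("ann", 30), Person("bob", 41)]

# The user-provided function works on Person objects.
names = map_objects(people, lambda p: p.name.upper())

# The forked process works on text lines only; here it upper-cases its stdin.
upper_cmd = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())",
]
piped = pipe_objects(people, upper_cmd, to_line=lambda p: f"{p.name},{p.age}")
```

In both calls the only contract is the domain object or the string produced from it. How the objects are stored before being handed over is not visible to the function or to the child process.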
