rxin commented on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-765879297
> > It is an issue because encoder only specifies how an object would map to the internal physical structure of the row, and by exposing this pipe API, we are exposing the structure (serialization) itself.
> >
> > We can never change this serialization anymore.
>
> So Dataset.map also exposes it? The pipe API only outputs the domain object T as string to forked process. The forked process doesn't know the physical structure. Just like Dataset.map outputs domain object T to user-provided function. The user-provided function doesn't know about physical structure.

Map doesn't expose it because the object of type T is just passed into the map function. Pipe would need to serialize T in order to pass it to a different process, wouldn't it? I think that's what @HeartSaVioR was trying to say.
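
To make the map-versus-pipe distinction concrete, here is a minimal sketch in plain Python. It is not Spark code and does not reflect the API proposed in this pull request: `Person`, `map_in_process`, `pipe_to_process`, and the comma-separated line format are all hypothetical names and choices made for illustration. The only point it tries to show is that an in-process map hands the object itself to the function, while a pipe has to turn the object into text before a forked process can read it.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Person:  # hypothetical domain object standing in for T
    name: str
    age: int


def map_in_process(items, fn):
    # Analogue of Dataset.map: fn receives the object itself,
    # so no wire format is ever visible to the function.
    return [fn(item) for item in items]


def pipe_to_process(items, to_line, command):
    # Analogue of a pipe operator: each object must first be turned into
    # text (to_line) before the forked process can read it from stdin.
    stdin_text = "".join(to_line(item) + "\n" for item in items)
    result = subprocess.run(
        command, input=stdin_text, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


people = [Person("Ann", 30), Person("Bob", 25)]

# The function sees Person objects directly.
names = map_in_process(people, lambda p: p.name)

# The child process only ever sees whatever text to_line produced.
# Here the child is a tiny Python script that upper-cases each input line.
child = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin: print(line.strip().upper())",
]
piped = pipe_to_process(people, lambda p: f"{p.name},{p.age}", child)
```

In this sketch the text format is chosen by the caller through `to_line`. Whichever side picks it, that format is what the external process comes to depend on, which seems to be the compatibility concern raised in the thread.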
