zero323 commented on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema.
URL: https://github.com/apache/spark/pull/26118#issuecomment-546617204

@HyukjinKwon To be honest, I have mixed feelings about this. It looks sensible as a _temporary workaround_, but I am not fond of the idea of enforcing the notion of `Row` being an unordered dictionary-like object (though with compact dict as the standard, that doesn't matter that much), especially when it is close to becoming completely obsolete.

Personally, I'd prefer to wait a moment and see where the discussion on SPARK-22232 goes. If the resolution is the introduction of a legacy mode, then the scope of this particular change could be conditioned on it and on the Python version. If not, I'd like to see some profiling data first (especially memory; timings might actually be better for now, as we skip all the nasty `obj[n]`, but that's not very meaningful*).

----

\* Is there any reason why we do this:

https://github.com/apache/spark/blob/2115bf61465b504bc21e37465cb34878039b5cb8/python/pyspark/sql/types.py#L615

instead of just `tuple(self)`? That's a huge performance bottleneck with wide schemas. Depending on the resolution of this one, that's something to fix, don't you think?
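
To illustrate the footnote's concern, here is a minimal sketch, not the actual pyspark code. It assumes that a by-name lookup such as `obj[n]` has to scan the field names to find the position, which is what would make a per-field conversion quadratic in schema width. `NamedRecord`, `by_name`, and `by_position` are hypothetical names introduced only for this example.

```python
import timeit


class NamedRecord(tuple):
    """Hypothetical stand-in for a row type: a tuple plus a list of field names."""

    def __new__(cls, names, values):
        rec = super().__new__(cls, values)
        rec._names = list(names)
        return rec

    def __getitem__(self, key):
        if isinstance(key, str):
            # Assumption: every by-name access scans the names linearly.
            return super().__getitem__(self._names.index(key))
        return super().__getitem__(key)


def by_name(rec, names):
    # One linear lookup per field, so quadratic in the number of fields.
    return tuple(rec[n] for n in names)


def by_position(rec):
    # A single pass over the underlying tuple, so linear in the number of fields.
    return tuple(rec)


if __name__ == "__main__":
    width = 500
    names = ["c%d" % i for i in range(width)]
    rec = NamedRecord(names, range(width))
    # Both conversions agree here because values are stored in the requested order.
    assert by_name(rec, names) == by_position(rec)
    print("by name:    ", timeit.timeit(lambda: by_name(rec, names), number=20))
    print("by position:", timeit.timeit(lambda: by_position(rec), number=20))
```

Note that the two conversions are interchangeable only when the stored order already matches the order of the requested names; if the fields can be stored in a different order, the by-name version is doing real work that `tuple(rec)` would skip.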
