Kimahriman commented on PR #41569: URL: https://github.com/apache/spark/pull/41569#issuecomment-1636752962
I attempted a PR for the Arrow issue: https://github.com/apache/arrow/pull/36701. After some digging, though, I think that issue was only causing one test to fail, a weird case that tries to convert a double to a string as part of the Arrow conversion. Arrow already supports converting a pandas Series of strings to the large_string type (when the numpy type is object), but not a numpy string list (when the numpy type is utf8). The former goes through https://github.com/apache/arrow/blob/main/python/pyarrow/src/arrow/python/numpy_to_arrow.cc#L324C9-L324C26 instead of the other `Visit` paths.

The other test failures were just due to Arrow not having large type support when looking up the numpy type for an Arrow type (I also added that to the above PR). That can be fixed on the Spark side by just using np.object explicitly for string and binary types, but I'm hitting a weird new test issue that I'm still trying to figure out.
