Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21546
@gatorsmile, this is just the format for Arrow IPC between the JVM and
Python process, and although it used the Arrow file format, nothing is
persisted. There is no real reason to keep both formats: the stream format is
better for our purposes and is already what `pandas_udf`s use, so a bug in
the Arrow format itself is unlikely. As with any change, a bug is possible,
but this has been tested pretty thoroughly, and trying to keep the old code
would get really messy and complicated.
---