GitHub user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/19349
Nice job on refactoring `PythonRunner`! I think we should just replace the
Arrow file format with the stream format for pandas UDFs instead of adding a
new conf to enable it, as long as all the issues are worked out. Along with
being a little faster, it's also easier on memory usage. I'd like to do the
same for `toPandas()`, but that can be a followup. Is it possible to do away
with the SQLConf and maybe rename some of these classes to be more general,
e.g. `ArrowStreamPythonUDFRunner` -> `ArrowPythonRunner`?