Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/15821
Updated to work with the latest Arrow to prepare for 0.3 release (tests
should fail because that artifact is not yet available). Also improved
consistency of ArrowConverters and did some cleanup. From @rxin's comments:
> Move ArrowConverters.scala somewhere else that's not top level, e.g.
> execution.arrow
It is now in the o.a.s.sql.execution.arrow package
> Update this to arrow 0.3
Ready to do this, should just need to update the pom again
> Use SQLConf rather than a parameter for toPandas.
I removed this flag and used the conf "spark.sql.execution.arrow.enable"
which defaults to "false"
> Handle failure gracefully if arrow is not installed (or somehow package it
> with Spark?)
It would be difficult to package with Spark, I think, because pyarrow also
depends on the native Arrow cpp library. I changed it to fail gracefully if
pyarrow is not available. The error message is:
```
ImportError: No module named pyarrow
note: pyarrow must be installed and available on calling Python process if
using spark.sql.execution.arrow.enable=true
```
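For reference, the graceful-failure behaviour can be sketched roughly as below. This is an illustration, not the code in the PR: the helper name `require_pyarrow` and its `module_name` parameter are hypothetical, and the note text is adapted from the message above.

```python
import importlib

# Conf name as quoted in this comment.
ARROW_CONF = "spark.sql.execution.arrow.enable"


def require_pyarrow(module_name="pyarrow"):
    """Import and return pyarrow, or raise ImportError with an extra hint.

    Hypothetical helper. `module_name` is a parameter only so that the
    failure path can be exercised without uninstalling pyarrow.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        note = ("note: pyarrow must be installed and available on calling "
                "Python process if using %s=true" % ARROW_CONF)
        # Re-raise with the original message plus the hint appended.
        raise ImportError("%s\n%s" % (exc, note))
```

A caller would invoke this only when the conf is enabled, so users who leave it at "false" never need pyarrow installed.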
> How are the memory managed? Who allocates the memory for the arrow
> records, and who's responsible for releasing them?
The Java side of Arrow requires using a BufferAllocator class that manages
the allocated memory. An instance of this must be used each time an
ArrowRecordBatch is created, and then the batch and allocator must be
released/closed after they have been processed. This is all handled in the
`ArrowConverters` functions. On the Python side, buffers are allocated from the
Arrow cpp library and cleaned up when reference counts to the objects drop to
zero. The end user does not have to worry about managing any memory.
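To illustrate the ownership pattern described above (allocate, process, close the batch, then close the allocator), here is a toy sketch. It does not use the real Arrow classes: `ToyAllocator`, `ToyBatch`, and `converted_batch` are hypothetical stand-ins that only model the bookkeeping.

```python
from contextlib import contextmanager


class ToyAllocator:
    """Hypothetical stand-in for an allocator that tracks outstanding bytes."""

    def __init__(self):
        self.allocated = 0

    def allocate(self, nbytes):
        self.allocated += nbytes
        return ToyBatch(self, nbytes)

    def close(self):
        # Refuse to close while buffers are still outstanding (a leak).
        if self.allocated != 0:
            raise RuntimeError("leak: %d bytes still allocated" % self.allocated)


class ToyBatch:
    """Hypothetical stand-in for a record batch backed by allocator memory."""

    def __init__(self, allocator, nbytes):
        self.allocator = allocator
        self.nbytes = nbytes
        self.closed = False

    def close(self):
        # Return the memory to the allocator exactly once.
        if not self.closed:
            self.allocator.allocated -= self.nbytes
            self.closed = True


@contextmanager
def converted_batch(nbytes):
    """Own the allocator and batch for one conversion; release both afterwards."""
    allocator = ToyAllocator()
    batch = allocator.allocate(nbytes)
    try:
        yield batch
    finally:
        batch.close()      # release the batch first
        allocator.close()  # then the allocator, which now has nothing outstanding
```

Because the release happens in a `finally` block inside the helper, code that consumes the batch never has to free anything itself, which is the property the converter functions are meant to give the end user.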