GitHub user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/15821
  
    ## Dependency Info
    
    This change does add Apache Arrow as a dependency, specifically the Java 
arrow-vector artifact.  For Python, Arrow usage is optional, and the tests are 
conditional on the ability to import pyarrow.  The dependency tree of the Java 
Arrow artifact is small and can be found 
[here](https://github.com/apache/spark/files/798133/arrow-vector_deptree.txt).  
It does include Netty 4.0.41, but this does not conflict with the netty-all 
4.0.42 that Spark currently uses.
    
    Changes to Spark APIs have been kept to a minimum, and all Arrow classes 
have been encapsulated within `o.a.s.sql.ArrowConverters`.  On the Scala side, 
a package-private method `toArrowPayloadBytes` has been added to perform the 
conversion to an Arrow 'payload' on the executor JVM.  This would also allow 
the conversion to be reused elsewhere, for instance from R.  On the Python side, 
the additions are a method `collectAsArrow`, which collects the Arrow payload 
and serves it to Python, and a flag on `toPandas` that, when enabled, makes use 
of `collectAsArrow`.
    
    I know Spark has been burned by other dependencies before, such as file 
formats, so I'll point out how this one is different.  Unlike a file format, 
Arrow is an in-memory format and is not meant to be persisted to disk, so many 
of the issues that arise when choosing a file format do not apply here.  A 
great deal of care went into defining the Arrow 
[format](https://github.com/apache/arrow/tree/master/format) up front so that 
it remains as stable as possible.  I have also heard from the Arrow community 
that they are fully committed to ensuring success in projects like Spark and to 
meeting their compatibility needs.  I can also attest from first-hand 
experience that they have been incredibly responsive to issues related to this 
PR.

