On 12/20/21 9:01 AM, Tao Wang wrote:
Hi,

I looked through Arrow's docs about its formats and APIs.

But I am still somewhat confused about the typical use cases of Arrow.

As I understand it, the goal of Arrow is to eliminate the (de)serialization
costs between different data analytics systems by giving them a common format.

But it still needs some data conversion between the Arrow format and a
language's native format, right? For example, you have to convert Arrow's
columnar format to a row-based format in C++. Or are there use cases that
run data analysis directly on Arrow's format?
Conversion may be required in some cases, but the hope is that for many data analytics applications, if the data can be described by the Arrow format, then no conversion is needed and data processing can run efficiently on that format directly. Please see the examples [1] and cookbook [2] for analytics demonstrations.

Best,
Tao

Hi Tao,
The documentation is still being updated. For an end user, the Python documentation [1][2] and the Ballista documentation [3] are probably of most interest. The original motivation for Arrow was to develop more efficient data frames that allow for interoperability [4].
Regards,
Benson

[1] https://arrow.apache.org/docs/python/index.html
[2] https://arrow.apache.org/cookbook/py/
[3] https://arrow.apache.org/blog/2021/04/12/ballista-donation/
[4] https://wesmckinney.com/blog/apache-arrow-pandas-internals/
