Hi Hongze,
I am not too familiar with distributed systems in general, but I did
work on using the Arrow Dataset API in the Python Dask library, which
can work in a distributed way (https://dask.org/).
For Dask, we used the second idea of sending serialized data to the
workers, but on the level of
Hi Hongze,
> Does anyone ever try using Arrow Dataset API in a distributed system?
My understanding is that the Dataset project was initially intended for
running on a single-node machine. It might be reasonable to extend it to
be useable in a distributed system, but I'll let the primary contrib