First off, huge +1 to a good integration with Arrow and Beam. I think
to fully realize the benefits we need to have deeper integration than
Arrow-frame-batches as elements, i.e. SDKs should be augmented to
understand Arrow frames as batches of individual elements, each with
(possibly) their own
On Thu, Mar 28, 2019 at 12:24 PM Brian Hulette wrote:
> > - Presumably there is a pandas counterpart in Java. Is there? Do you
> know?
> I think there are some dataframe libraries in Java we could look into. I'm
> not aware of anything that has the same popularity and arrow integration as
>
On Thu, Mar 28, 2019 at 12:24 PM Brian Hulette wrote:
> > I think splitting to new transforms rather than adding new options to
> existing IO transforms would be simpler for users. I think this would be a
> question that could be easier to answer with a PR.
> Ok I'll start working on one :)
>
>
On Wed, Mar 27, 2019 at 9:19 PM Kenneth Knowles wrote:
> Thinking about Arrow + Beam SQL + schemas:
>
> - Obviously many SQL operations could be usefully accelerated by Arrow /
> columnar. Especially in the analytical realm this is the new normal. For
> ETL, perhaps less so.
>
> - Beam SQL
Thinking about Arrow + Beam SQL + schemas:
- Obviously many SQL operations could be usefully accelerated by Arrow /
columnar. Especially in the analytical realm this is the new normal. For
ETL, perhaps less so.
- Beam SQL planner (pipeline construction) is implemented in Java, and so
the
Thank you Brian, this looks promising.
cc: +Chamikara Jayalath +Heejong Lee
On Wed, Mar 27, 2019 at 1:22 PM Brian Hulette wrote:
> Hi everyone,
> I've been doing some investigations into how Arrow might fit into Beam as
> a way to ramp up on the project. As I've gone about this I've
Hi everyone,
I've been doing some investigations into how Arrow might fit into Beam as a
way to ramp up on the project. As I've gone about this I've prototyped a
couple of additions to the Python SDK. I think these additions may be
useful for others so I'm considering cleaning them up and