Re: Python SDK Arrow Integrations

2019-03-29 Thread Robert Bradshaw
First off, huge +1 to a good integration with Arrow and Beam. I think to fully realize the benefits we need to have deeper integration than arrow-frame-batches as elements, i.e. SDKs should be augmented to understand arrow frames as batches of individual elements, each with (possibly) their own

Re: Python SDK Arrow Integrations

2019-03-28 Thread Kenneth Knowles
On Thu, Mar 28, 2019 at 12:24 PM Brian Hulette wrote:
> > - Presumably there is a pandas counterpart in Java. Is there? Do you
> know?
> I think there are some dataframe libraries in Java we could look into. I'm
> not aware of anything that has the same popularity and arrow integration as
>

Re: Python SDK Arrow Integrations

2019-03-28 Thread Ahmet Altay
On Thu, Mar 28, 2019 at 12:24 PM Brian Hulette wrote:
> > I think splitting to new transforms rather than adding new options to
> existing IO transforms would be simpler for users. I think this would be a
> question that could be easier to answer with a PR.
> Ok I'll start working on one :)
> >

Re: Python SDK Arrow Integrations

2019-03-28 Thread Chamikara Jayalath
On Wed, Mar 27, 2019 at 9:19 PM Kenneth Knowles wrote:
> Thinking about Arrow + Beam SQL + schemas:
>
> - Obviously many SQL operations could be usefully accelerated by arrow /
> columnar. Especially in the analytical realm this is the new normal. For
> ETL, perhaps less so.
>
> - Beam SQL

Re: Python SDK Arrow Integrations

2019-03-27 Thread Kenneth Knowles
Thinking about Arrow + Beam SQL + schemas:
- Obviously many SQL operations could be usefully accelerated by arrow / columnar. Especially in the analytical realm this is the new normal. For ETL, perhaps less so.
- Beam SQL planner (pipeline construction) is implemented in Java, and so the

Re: Python SDK Arrow Integrations

2019-03-27 Thread Ahmet Altay
Thank you Brian, this looks promising.
cc: +Chamikara Jayalath +Heejong Lee
On Wed, Mar 27, 2019 at 1:22 PM Brian Hulette wrote:
> Hi everyone,
> I've been doing some investigations into how Arrow might fit into Beam as
> a way to ramp up on the project. As I've gone about this I've

Python SDK Arrow Integrations

2019-03-27 Thread Brian Hulette
Hi everyone, I've been doing some investigations into how Arrow might fit into Beam as a way to ramp up on the project. As I've gone about this I've prototyped a couple of additions to the Python SDK. I think these additions may be useful for others so I'm considering cleaning them up and