Re: [DISCUSS] [Rust] Python-datafusion

2021-05-05 Thread Andy Grove
Wes, thanks for following up on this and making sure that we are following the process here. I have merged a PR to revert the previous revert, so the Python bindings are now back in the repo. On Tue, May 4, 2021 at 4:14 PM Wes McKinney wrote: > Based on the general@incubator thread, there isn't

Re: [DISCUSS] [Rust] Python-datafusion

2021-05-04 Thread Wes McKinney
Based on the general@incubator thread, there isn't a 100% consensus but I think we can accept the PR as is and move forward. I appreciate everyone's patience. On Tue, May 4, 2021 at 10:24 AM Wes McKinney wrote: > > See thread on general@incubator > > https://lists.apache.org/thread.html/r3108dd293

Re: [DISCUSS] [Rust] Python-datafusion

2021-05-04 Thread Wes McKinney
See thread on general@incubator https://lists.apache.org/thread.html/r3108dd293240967cab4d75a8003895b247b3b3b726a7e1e54f3d9b65%40%3Cgeneral.incubator.apache.org%3E On Tue, May 4, 2021 at 9:35 AM Wes McKinney wrote: > > I admit it's an unusual situation to have a single-author codebase > where th

Re: [DISCUSS] [Rust] Python-datafusion

2021-05-04 Thread Wes McKinney
I admit it's an unusual situation to have a single-author codebase where the developer is on the PMC. Let's determine what the protocol is for this kind of thing in the future so we don't create unnecessary work for ourselves. On Tue, May 4, 2021 at 9:15 AM Andy Grove wrote: > > I apologize. For

Re: [DISCUSS] [Rust] Python-datafusion

2021-05-04 Thread Andy Grove
I apologize. For some reason, I had thought that because Jorge was the only contributor (except for one contribution fixing a typo in the README) that the IP clearance process did not apply in this case. I will create a PR to revert. On Tue, May 4, 2021 at 8:06 AM Wes McKinney wrote: > Just to

Re: [DISCUSS] [Rust] Python-datafusion

2021-05-04 Thread Wes McKinney
Just to circle back on this. Since this was an independent codebase previously developed over a 10-month period, I had assumed we would be looking at an IP clearance vote, but instead it was just merged into arrow-datafusion. On Tue, Apr 27, 2021 at 10:50 AM Micah Kornfield wrote: > > Hi Jorge, >

Re: [DISCUSS] [Rust] Python-datafusion

2021-04-27 Thread Micah Kornfield
Hi Jorge, This all sounds good to me. It might be nice to test against both the pinned released version of pyarrow and at head if possible. I like the idea of not causing release churn as long as all the underlying libraries are compatible. Thanks for the write up. -Micah On Mon, Apr 26, 2021

Re: [DISCUSS] [Rust] Python-datafusion

2021-04-26 Thread Jorge Cardoso Leitão
Hi Micah, All testing is actually done from Python: create a record batch in pyarrow, push it to datafusion, consume it back in Python, and compare the result using pyarrow's equality. Sometimes parquet is used instead. The library is tested against pyarrow==1 from pypi: we can bump that, but if i

Re: [DISCUSS] [Rust] Python-datafusion

2021-04-26 Thread Alessandro Molina
Would "incorporate" mean that the codebase is moved into the arrow repository or is the plan to keep a separate repository for datafusion-python but under the apache org? On Sun, Apr 25, 2021 at 10:40 PM Daniël Heres wrote: > Hi Jorge, > > Awesome, I think this is a super valuable addition and m

Re: [DISCUSS] [Rust] Python-datafusion

2021-04-25 Thread Daniël Heres
Hi Jorge, Awesome, I think this is a super valuable addition and makes DataFusion much more accessible / approachable for anyone wanting to experiment with DataFusion. Would be very cool to update it to the latest version and include it in the project. Best, Daniël On Sun, Apr 25, 2021, 22:32 M

Re: [DISCUSS] [Rust] Python-datafusion

2021-04-25 Thread Micah Kornfield
Hi Jorge, I think this would certainly be a valuable contribution. How were you thinking of hosting (which repo)/publishing it (maintaining a separate wheel)? Also did you have thoughts on integration testing with pyarrow? Cheers, Micah On Sun, Apr 25, 2021 at 9:13 AM Jorge Cardoso Leitão < jo

[DISCUSS] [Rust] Python-datafusion

2021-04-25 Thread Jorge Cardoso Leitão
Hi, I filed a PR [1] to open up a discussion to incorporate python-datafusion [2] into the Apache Arrow project. Python-datafusion is a Python library [3] built on top of DataFusion that enables people to use DataFusion from Python. It leverages the C data interface for zero-cost copy between