Re: Rust bindings for Gandiva

2019-05-11 Thread Renjie Liu
I agree that this should be a separate project, so that it can be used by
other databases written in Rust, not only DataFusion. Let's start with an
implementation that binds to Gandiva, and build a pure Rust implementation
later.

On Sat, May 11, 2019 at 10:28 PM Andy Grove  wrote:

> Hi Renjie,
>
> I have not started on this but I would be interested in helping you with
> it.
>
> At a high level I think there are two main parts to this work:
>
> 1. Translating DataFusion expressions to Gandiva protobuf
> 2. Implementing the code to make the native C call to Gandiva
>
> I could help with #1 pretty easily.
>
> I am concerned about the packaging implications of this. I feel quite
> strongly that there should be a "pure Rust" version of DataFusion/Arrow and
> that the Gandiva integration should be opt-in, either as a separate project
> within the repository or as a feature controlled by a feature flag in
> Cargo.toml.
>
> Thanks,
>
> Andy.
>
>
>
>
> On Sat, May 11, 2019 at 3:10 AM Renjie Liu wrote:
>
>>
>> Hi,
>> @Andy Grove, are you developing this? I'm
>> interested in this and want to join the development.
>>
>> On Tue, Jan 8, 2019 at 3:18 PM Praveen Kumar  wrote:
>>
>>> Agree with Wes: the protobuf-based interface should be the language-neutral
>>> way to build expressions with Gandiva.
>>>
>>> On Mon, Jan 7, 2019 at 8:30 PM Andy Grove  wrote:
>>>
>>> > This makes sense to me now that I understand a little more about Gandiva.
>>> > This also fits well with my proposal to donate DataFusion in the other
>>> > thread. DataFusion can manage the overall logical query plan in Rust and
>>> > potentially delegate some subset of expression evaluation to Gandiva via
>>> > protobuf.
>>> >
>>> > Thanks,
>>> >
>>> > Andy.
>>> >
>>> > On Mon, Jan 7, 2019 at 7:51 AM Wes McKinney wrote:
>>> >
>>> > > Gandiva supports a Protobuf-based interface -- this is how Java
>>> > > interacts with it via JNI. Rust could do the same -- that would
>>> > > probably be easier than wrapping the C++ class structure. It would
>>> > > also help drive new feature requirements in the serialized
>>> > > projection/filter expression trees.
>>> > >
>>> > > - Wes
>>> > >
>>> > > On Mon, Jan 7, 2019 at 3:22 AM Krisztián Szűcs wrote:
>>> > > >
>>> > > > I'm not sure that a binding is a good idea. Both Arrow and Parquet
>>> > > > already have their own Rust implementations, and interfacing with
>>> > > > C++ isn't as easy and straightforward as it is with C. Otherwise
>>> > > > we could simply maintain bindings for all of the C++ libraries,
>>> > > > rather than having a hybrid solution.
>>> > > >
>>> > > > While we could spare the reimplementation of Gandiva, it would make
>>> > > > packaging more complicated and Rust development much less
>>> > > > welcoming to new contributors.
>>> > > >
>>> > > > On Fri, Jan 4, 2019 at 3:39 PM Andy Grove wrote:
>>> > > >
>>> > > > > Now that the Rust implementation of Arrow is maturing, I'm interested
>>> > > > > in having bindings for Gandiva for query execution, rather than
>>> > > > > duplicating this in Rust.
>>> > > > >
>>> > > > > I will likely start looking at this soon but wanted to see if anyone
>>> > > > > else here is particularly interested in this area of functionality?
>>> > > > >
>>> > > > > Thanks,
>>> > > > >
>>> > > > > Andy.
>>> > > > >
>>> > >
>>> >
>>>
>>
>>
>> --
>> Renjie Liu
>> Software Engineer, MVAD
>>
>

-- 
Renjie Liu
Software Engineer, MVAD
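Krisztián's point in the thread above (binding C++ is harder than binding C) and the second step of Andy's plan (making a native C call) both come down to calling a function across a plain C ABI. A minimal sketch of such a call from Python with `ctypes`, assuming a POSIX system; `strlen` from the C standard library is only a stand-in and is not a Gandiva entry point. The pattern is the same one a Rust binding would follow with an `extern "C"` block: declare the C signature, then call the symbol. C++ adds name mangling, classes and templates that an FFI cannot consume directly, which is one reason a flat C entry point is easier to bind.

```python
import ctypes
import ctypes.util

# Load the C standard library. If find_library returns None, CDLL(None) on POSIX
# falls back to the symbols already loaded into the running process (incl. libc).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature once: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call strlen through the plain C ABI (no C++ classes or name mangling)."""
    return libc.strlen(data)


print(c_strlen(b"gandiva"))  # 7
```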


Re: [ANNOUNCE] New Arrow committer: Neville Dipale

2019-05-11 Thread Philipp Moritz
Congrats Neville!

On Sat, May 11, 2019 at 6:09 PM Renjie Liu  wrote:

> Congrats!
>
> On Sun, May 12, 2019 at 12:38 AM Chao Sun wrote:
>
> > Congrats Neville!
> >
> > On Sat, May 11, 2019 at 9:36 AM Micah Kornfield 
> > wrote:
> >
> > > Congrats!!
> > >
> > > On Saturday, May 11, 2019, paddy horan  wrote:
> > >
> > > > Congrats Neville!  Thank you for your contributions!
> > > >
> > > > From: Andy Grove 
> > > > Sent: Saturday, May 11, 2019 11:23 AM
> > > > To: dev@arrow.apache.org
> > > > Subject: [ANNOUNCE] New Arrow committer: Neville Dipale
> > > >
> > > > On behalf of the Arrow PMC, I'm happy to announce that Neville has
> > > > accepted an invitation to become a committer on Apache Arrow.
> > > >
> > > > Welcome, and thank you for your contributions!
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Neville Dipale

2019-05-11 Thread Renjie Liu
Congrats!



[jira] [Created] (ARROW-5302) Memory leak when read_table().to_pandas().to_json(orient='records')

2019-05-11 Thread Jorge (JIRA)
Jorge created ARROW-5302:


 Summary: Memory leak when read_table().to_pandas().to_json(orient='records')
 Key: ARROW-5302
 URL: https://issues.apache.org/jira/browse/ARROW-5302
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.13.0
 Environment: Linux, Python 3.6.4 :: Anaconda, Inc.
Reporter: Jorge


The following code (running on Linux with Python 3.6 from Anaconda)
demonstrates a memory leak when reading data from disk.
{code:python}
import resource

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


# Some random data; some columns hold arrays (lists).
path = 'data.parquet'
batches = 5000
df = pd.DataFrame({
    'a': ['AA%d' % i for i in range(batches)],
    't': [list(range(0, 180 * 60, 5))] * batches,
    'v': list(np.random.normal(10, 0.1, size=(batches, 180 * 60 // 5))),
    'u': ['t'] * batches,
})

pq.write_table(pa.Table.from_pandas(df), path)

# Read the data above and convert it to JSON (e.g. the backend of a RESTful API).
for i in range(100):
    # Commenting out either of the two lines below makes the leak vanish.
    df = pq.read_table(path).to_pandas()
    df.to_json(orient='records')
    # Peak resident set size so far (kilobytes on Linux).
    print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

{code}
Result (peak RSS in kilobytes, as reported by ru_maxrss on Linux):
{code}
785560
1065460
1383532
1607676
1924820
...{code}
Relevant pip freeze:

pyarrow (0.13.0)

pandas (0.24.2)
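The reproduction above prints `ru_maxrss`, which is a high-water mark: on Linux it is the peak resident set size in kilobytes (macOS reports bytes), so it can never go down. A reading that keeps rising across identical iterations is therefore the signal to look for. A minimal stdlib-only sketch of that check, assuming a Unix system; the helper names `peak_rss_readings` and `keeps_growing` are hypothetical, and `leaky_step` simulates a leak instead of calling pyarrow or pandas.

```python
import resource
from typing import Callable, List


def peak_rss_readings(step: Callable[[], None], iterations: int) -> List[int]:
    """Run `step` repeatedly, recording peak RSS (ru_maxrss) after each run."""
    readings = []
    for _ in range(iterations):
        step()
        readings.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return readings


def keeps_growing(readings: List[int]) -> bool:
    """True if each peak-RSS reading is strictly higher than the previous one."""
    return all(later > earlier for earlier, later in zip(readings, readings[1:]))


# Simulated leak for demonstration only: every call keeps a 32 MiB buffer alive.
_retained = []


def leaky_step() -> None:
    _retained.append(b"\x01" * (32 * 1024 * 1024))


readings = peak_rss_readings(leaky_step, iterations=4)
print(readings, keeps_growing(readings))
```

To narrow down where such growth lives, the readings can be compared with `pyarrow.total_allocated_bytes()`, which reports the bytes currently allocated from Arrow's default memory pool; RSS growth that this number does not account for points away from Arrow-owned buffers.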

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New Arrow committer: Neville Dipale

2019-05-11 Thread Chao Sun
Congrats Neville!



Re: [ANNOUNCE] New Arrow committer: Neville Dipale

2019-05-11 Thread Micah Kornfield
Congrats!!



Re: [ANNOUNCE] New Arrow committer: Neville Dipale

2019-05-11 Thread paddy horan
Congrats Neville!  Thank you for your contributions!



[ANNOUNCE] New Arrow committer: Neville Dipale

2019-05-11 Thread Andy Grove
On behalf of the Arrow PMC, I'm happy to announce that Neville has
accepted an invitation to become a committer on Apache Arrow.

Welcome, and thank you for your contributions!


Re: Rust bindings for Gandiva

2019-05-11 Thread Andy Grove
Hi Renjie,

I have not started on this but I would be interested in helping you with
it.

At a high level I think there are two main parts to this work:

1. Translating DataFusion expressions to Gandiva protobuf
2. Implementing the code to make the native C call to Gandiva

I could help with #1 pretty easily.
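Step 1 of the plan above is essentially a recursive tree-to-tree translation. A minimal Python sketch, assuming hypothetical `Column`, `Literal` and `BinaryExpr` classes in place of DataFusion's real (Rust) expression types, and plain nested dicts serialized as JSON in place of Gandiva's actual protobuf messages. The names in `FUNCTION_NAMES` follow Gandiva's naming style (e.g. `add`, `greater_than`), but the mapping table itself is illustrative.

```python
import json
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for DataFusion's logical expressions (not its real API).
@dataclass
class Column:
    name: str


@dataclass
class Literal:
    value: int


@dataclass
class BinaryExpr:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Literal, BinaryExpr]

# Illustrative operator -> function-name mapping, using Gandiva-style names.
FUNCTION_NAMES = {
    "+": "add",
    "-": "subtract",
    ">": "greater_than",
    "<": "less_than",
    "=": "equal",
}


def to_tree(expr: Expr) -> dict:
    """Recursively translate an expression into a nested, serializable tree."""
    if isinstance(expr, Column):
        return {"field": {"name": expr.name}}
    if isinstance(expr, Literal):
        return {"literal": {"value": expr.value}}
    if isinstance(expr, BinaryExpr):
        if expr.op not in FUNCTION_NAMES:
            raise ValueError(f"no function mapping for operator {expr.op!r}")
        return {
            "function": {
                "name": FUNCTION_NAMES[expr.op],
                "args": [to_tree(expr.left), to_tree(expr.right)],
            }
        }
    raise TypeError(f"unsupported expression: {expr!r}")


# (a + b) > 10, serialized to JSON as a stand-in for protobuf bytes.
expr = BinaryExpr(">", BinaryExpr("+", Column("a"), Column("b")), Literal(10))
print(json.dumps(to_tree(expr)))
```

A real translator would emit Gandiva's protobuf types rather than dicts and would also need to attach type information to each node; the recursive shape of the translation is the part this sketch is meant to show.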

I am concerned about the packaging implications of this. I feel quite
strongly that there should be a "pure Rust" version of DataFusion/Arrow and
that the Gandiva integration should be opt-in, either as a separate project
within the repository or as a feature controlled by a feature flag in
Cargo.toml.

Thanks,

Andy.






[jira] [Created] (ARROW-5301) [Python] parquet documentation outdated on nthreads argument

2019-05-11 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5301:


 Summary: [Python] parquet documentation outdated on nthreads argument
 Key: ARROW-5301
 URL: https://issues.apache.org/jira/browse/ARROW-5301
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Joris Van den Bossche
 Fix For: 0.14.0


[https://arrow.apache.org/docs/python/parquet.html#multithreaded-reads] still mentions {{nthreads}} instead of {{use_threads}}.

From https://github.com/pandas-dev/pandas/issues/26340





Re: Rust bindings for Gandiva

2019-05-11 Thread Renjie Liu
Hi,
@Andy Grove, are you developing this? I'm interested
in this and want to join the development.



-- 
Renjie Liu
Software Engineer, MVAD