Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-24 Thread Kouhei Sutou
+1 (binding)

In "[VOTE] Accept donation of Rust DataFusion library for Apache Arrow" on Tue,
22 Jan 2019 19:05:16 -0600, Wes McKinney wrote:

> Dear all,
> 
> The developers of DataFusion, an analytical query engine written
> in Rust, based on the Arrow columnar memory format, are proposing
> to donate the code to Apache Arrow:
> 
> https://github.com/andygrove/datafusion
> 
> The community has had an opportunity to discuss this [1] and
> there do not seem to be objections to this. Andy Grove has staged
> the code donation in the form of a pull request:
> 
> https://github.com/apache/arrow/pull/3399
> 
> This vote is to determine if the Arrow PMC is in favor of accepting
> this donation. If the vote passes, the PMC and the authors of the code
> will work together to complete the ASF IP Clearance process
> (http://incubator.apache.org/ip-clearance/) and import this Rust
> codebase implementation into Apache Arrow.
> 
> [ ] +1 : Accept contribution of DataFusion Rust library
> [ ]  0 : No opinion
> [ ] -1 : Reject contribution because...
> 
> Here is my vote: +1
> 
> The vote will be open for at least 72 hours.
> 
> Thanks,
> Wes
> 
> [1]: 
> https://lists.apache.org/thread.html/2f6c14e9f5a9ab41b0b591b2242741b23e5528fb28e79ac0e2c9349a@%3Cdev.arrow.apache.org%3E


[jira] [Created] (ARROW-4366) [Docs] Change extension from format/README.md to format/README.rst

2019-01-24 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-4366:
---

 Summary: [Docs] Change extension from format/README.md to 
format/README.rst
 Key: ARROW-4366
 URL: https://issues.apache.org/jira/browse/ARROW-4366
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation
Reporter: Yosuke Shiro
Assignee: Yosuke Shiro


format/README.md is written in _reST_ syntax.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4365) [Rust] [Parquet] Implement RecordReader

2019-01-24 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4365:
-

 Summary: [Rust] [Parquet] Implement RecordReader
 Key: ARROW-4365
 URL: https://issues.apache.org/jira/browse/ARROW-4365
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


RecordReader reads logical records into memory; this is a prerequisite for 
ColumnReader.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-24 Thread Donald E. Foss
+1

On a side note, +1 for Rust in general.

Donald E. Foss (mobile-US ET)

> On Jan 23, 2019, at 6:26 AM, Neville Dipale  wrote:
> 
> Hi Andy,
> 
> +1 : Accept contribution of DataFusion Rust library
> 
> Thanks
> 
>> On Wed, 23 Jan 2019 at 03:05, Wes McKinney  wrote:
>> 
>> Dear all,
>> 
>> The developers of DataFusion, an analytical query engine written
>> in Rust, based on the Arrow columnar memory format, are proposing
>> to donate the code to Apache Arrow:
>> 
>> https://github.com/andygrove/datafusion
>> 
>> The community has had an opportunity to discuss this [1] and
>> there do not seem to be objections to this. Andy Grove has staged
>> the code donation in the form of a pull request:
>> 
>> https://github.com/apache/arrow/pull/3399
>> 
>> This vote is to determine if the Arrow PMC is in favor of accepting
>> this donation. If the vote passes, the PMC and the authors of the code
>> will work together to complete the ASF IP Clearance process
>> (http://incubator.apache.org/ip-clearance/) and import this Rust
>> codebase implementation into Apache Arrow.
>> 
>>[ ] +1 : Accept contribution of DataFusion Rust library
>>[ ]  0 : No opinion
>>[ ] -1 : Reject contribution because...
>> 
>> Here is my vote: +1
>> 
>> The vote will be open for at least 72 hours.
>> 
>> Thanks,
>> Wes
>> 
>> [1]:
>> https://lists.apache.org/thread.html/2f6c14e9f5a9ab41b0b591b2242741b23e5528fb28e79ac0e2c9349a@%3Cdev.arrow.apache.org%3E
>> 


Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-24 Thread Wes McKinney
hi Areg -- no one has worked on a bridge between Rust and C++, but it
should definitely be possible.

I figure we are destined to end up with multiple query engines in the
project, ranging from lighter-weight / smaller scope to heavier-weight /
larger scope. Having a first-class embeddable query engine natively in C++ is
very much a "must-do" goal for me.

On Thu, Jan 24, 2019 at 12:45 AM Melik-Adamyan, Areg
 wrote:
>
> +1 (non-binding)
>
> Is there a plan for C++ API?
>
> -Original Message-
> From: Renjie Liu [mailto:liurenjie2...@gmail.com]
> Sent: Wednesday, January 23, 2019 7:44 PM
> To: dev@arrow.apache.org
> Subject: Re: [VOTE] Accept donation of Rust DataFusion library for Apache 
> Arrow
>
> +1 (non-binding)
>
> I also tried to write a similar engine, and would be glad to merge with DataFusion
>
> paddy horan wrote on Thu, Jan 24, 2019 at 5:29 AM:
>
> > +1 (non-binding)
> >
> > Thanks Andy
> >
> > Get Outlook for iOS
> >
> > 
> > From: Chao Sun 
> > Sent: Wednesday, January 23, 2019 1:07 PM
> > To: dev@arrow.apache.org
> > Subject: Re: [VOTE] Accept donation of Rust DataFusion library for
> > Apache Arrow
> >
> > +1 (non-binding)
> >
> > Glad to see this coming and I think it is a great complement to
> > existing modules, e.g., Arrow and Parquet. It also aligns with the
> > overall direction that the project is going.
> >
> > Chao
> >
> > On Wed, Jan 23, 2019 at 9:30 AM Andy Grove  wrote:
> >
> > > As far as I know, the majority of the PMC are not actively using
> > > Rust, so as supporting evidence for interest in this donation from
> > > the Rust community, here is a Reddit thread where I talked about
> > > offering
> > DataFusion
> > > for donation recently:
> > >
> > >
> > >
> > https://www.reddit.com/r/rust/comments/aibk39/datafusion_060_inmemory_
> > query_engine_for_apache/
> > >
> > > There were 69 upvotes and many supportive comments, including a
> > > couple where people specifically mentioned that they liked the fact
> > > that DataFusion uses Arrow. I would hope that this donation leads to
> > > more
> > people
> > > contributing to Arrow.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > On Wed, Jan 23, 2019 at 4:26 AM Neville Dipale
> > > 
> > > wrote:
> > >
> > > > Hi Andy,
> > > >
> > > > +1 : Accept contribution of DataFusion Rust library
> > > >
> > > > Thanks
> > > >
> > > > On Wed, 23 Jan 2019 at 03:05, Wes McKinney 
> > wrote:
> > > >
> > > > > Dear all,
> > > > >
> > > > > The developers of DataFusion, an analytical query engine written
> > > > > in Rust, based on the Arrow columnar memory format, are
> > > > > proposing to donate the code to Apache Arrow:
> > > > >
> > > > > https://github.com/andygrove/datafusion
> > > > >
> > > > > The community has had an opportunity to discuss this [1] and
> > > > > there do not seem to be objections to this. Andy Grove has
> > > > > staged the code donation in the form of a pull request:
> > > > >
> > > > > https://github.com/apache/arrow/pull/3399
> > > > >
> > > > > This vote is to determine if the Arrow PMC is in favor of
> > > > > accepting this donation. If the vote passes, the PMC and the
> > > > > authors of the
> > code
> > > > > will work together to complete the ASF IP Clearance process
> > > > > (http://incubator.apache.org/ip-clearance/) and import this Rust
> > > > > codebase implementation into Apache Arrow.
> > > > >
> > > > > [ ] +1 : Accept contribution of DataFusion Rust library [ ] 0 :
> > > > > No opinion [ ] -1 : Reject contribution because...
> > > > >
> > > > > Here is my vote: +1
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > Thanks,
> > > > > Wes
> > > > >
> > > > > [1]:
> > > > >
> > > >
> > >
> > https://lists.apache.org/thread.html/2f6c14e9f5a9ab41b0b591b2242741b23
> > e5528fb28e79ac0e2c9349a@%3Cdev.arrow.apache.org%3E
> > > > >
> > > >
> > >
> >


Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-24 Thread Antoine Pitrou


Given the interest among the Rust community, +1 from me (binding).

Regards

Antoine.


On 23/01/2019 at 18:29, Andy Grove wrote:
> As far as I know, the majority of the PMC are not actively using Rust, so
> as supporting evidence for interest in this donation from the Rust
> community, here is a Reddit thread where I talked about offering DataFusion
> for donation recently:
> 
> https://www.reddit.com/r/rust/comments/aibk39/datafusion_060_inmemory_query_engine_for_apache/
> 
> There were 69 upvotes and many supportive comments, including a couple
> where people specifically mentioned that they liked the fact that
> DataFusion uses Arrow. I would hope that this donation leads to more people
> contributing to Arrow.
> 
> Thanks,
> 
> Andy.
> 
> On Wed, Jan 23, 2019 at 4:26 AM Neville Dipale 
> wrote:
> 
>> Hi Andy,
>>
>> +1 : Accept contribution of DataFusion Rust library
>>
>> Thanks
>>
>> On Wed, 23 Jan 2019 at 03:05, Wes McKinney  wrote:
>>
>>> Dear all,
>>>
>>> The developers of DataFusion, an analytical query engine written
>>> in Rust, based on the Arrow columnar memory format, are proposing
>>> to donate the code to Apache Arrow:
>>>
>>> https://github.com/andygrove/datafusion
>>>
>>> The community has had an opportunity to discuss this [1] and
>>> there do not seem to be objections to this. Andy Grove has staged
>>> the code donation in the form of a pull request:
>>>
>>> https://github.com/apache/arrow/pull/3399
>>>
>>> This vote is to determine if the Arrow PMC is in favor of accepting
>>> this donation. If the vote passes, the PMC and the authors of the code
>>> will work together to complete the ASF IP Clearance process
>>> (http://incubator.apache.org/ip-clearance/) and import this Rust
>>> codebase implementation into Apache Arrow.
>>>
>>> [ ] +1 : Accept contribution of DataFusion Rust library
>>> [ ]  0 : No opinion
>>> [ ] -1 : Reject contribution because...
>>>
>>> Here is my vote: +1
>>>
>>> The vote will be open for at least 72 hours.
>>>
>>> Thanks,
>>> Wes
>>>
>>> [1]:
>>>
>> https://lists.apache.org/thread.html/2f6c14e9f5a9ab41b0b591b2242741b23e5528fb28e79ac0e2c9349a@%3Cdev.arrow.apache.org%3E
>>>
>>
> 


Re: Round-trip of categorical data with Arrow and Parquet

2019-01-24 Thread Hatem Helal
Thanks Wes,

Glad to hear this is in your plan.

I probably should have done this earlier... but here are some JIRA tickets that 
seem to cover this:

https://issues.apache.org/jira/browse/ARROW-3772
https://issues.apache.org/jira/browse/ARROW-3325
https://issues.apache.org/jira/browse/ARROW-3769



On 1/24/19, 4:27 PM, "Wes McKinney"  wrote:

hi Hatem,

There are several issues open about this already (I'll have to dig
them up), so this is something that we have desired for a long time,
but have not gotten around to implementing.

Since many Parquet writers use dictionary encoding, it would make most
sense to have an option to return DictionaryArray (which can be
converted to pandas.Categorical) from any column, and internally we
will perform the conversion from the encoded Parquet format as
efficiently as possible.

There are many cases to consider:

* Dictionary encoded, but different dictionaries in each row group
(this is actually the most likely scenario)
* Dictionary encoded, but the same dictionary in all row groups
* PLAIN encoded data that we pass through DictionaryBuilder as it is
decoded to yield DictionaryArray
* Dictionary encoded, but switch over to PLAIN encoding mid-stream

Having column metadata to automatically "opt in" to the
DictionaryArray conversion sounds reasonable (so long as Arrow readers
have a way to opt out, probably via a global flag to ignore such
custom metadata fields) for usability.

Part of the reason this work was not done in the past was because some
of our hash table machinery was a bit immature. Antoine has recently
improved things significantly, so it should be a lot easier now to do
this work. This is a quite large project, though, and one that affects
a _lot_ of users, so I would be willing to take an initial pass on the
implementation.

Along with completing the nested data read/write path I would say this
is the 2nd highest priority project in parquet-cpp for Arrow users.

- Wes

On Thu, Jan 24, 2019 at 9:59 AM Hatem Helal  
wrote:
>
> Hi everyone,
>
> I wanted to gauge interest and feasibility for adding support for 
natively reading an arrow::DictionaryArray from a parquet file.  Currently, 
writing an arrow::DictionaryArray is read back as the native index type [0].  I 
came across a prior discussion for this problem in the context of pandas [1] 
but I think this would be useful for other arrow clients (C++ or otherwise).
>
> The solution I had in mind would be to add arrow type information as 
column metadata.  This metadata would then be used when reading back the 
parquet file to determine which arrow type to create for the column data.
>
> I’m willing to contribute this feature but first wanted to get some 
feedback on whether this would be generally useful and if the high-level 
proposed solution would make sense.
>
> Thanks!
>
> Hatem
>
>
> [0] This test demonstrates this behavior
> 
https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/arrow-reader-writer-test.cc#L1848
> [1] https://github.com/apache/arrow/issues/1688




Re: A renewed plea for help [was Re: Recruiting more maintainers for Apache Arrow]

2019-01-24 Thread Wes McKinney
hi Antoine,

On Wed, Jan 23, 2019 at 4:35 AM Antoine Pitrou  wrote:
>
> On Tue, 22 Jan 2019 16:57:42 -0600
> Wes McKinney  wrote:
> >
> > There were 1540 patches merged into the project in 2018 (excluding the
> > Parquet merge) -- that's more than 4 patches per day. Evidence
> > suggests that the overall patch count for 2019 will be even higher; if
> > I had to guess somewhere well over 2000. Out of last year's patches, I
> > merged 1028, i.e. 2 out of every 3. If we are to be able to take on
> > 2000 or more patches this year, we'll need more help. If you are
> > neither a committer nor a PMC member, you can still help with code
> > review and discussions to help contributors get their work into
> > merge-ready state.
>
> I generally try to review as many PRs as I feel competent to.
>
> What should be the guideline when some PRs for other implementations
> (such as C#, Java...) are lingering on?

I have generally taken the approach of merging patches when the builds
are passing and there has been some code review. In the case of C# as
an example, we don't have consistent reviewers so I will generally
glance through the code (5 minutes or less) to make sure I see nothing
terribly concerning, or to catch other problems like accidental
changes to other files in rebase conflicts, or binary files
accidentally checked in.

It is also helpful to ping people about stale PRs to keep them engaged.

>
> > I'll do what I have to in order to keep the patches flowing as fast as
> > possible into master, but contributors and other maintainers can help
> > with the Always Be Closing mindset -- the 80/20 rule or 90/10 rule
> > frequently applies. In many cases it is better to merge a patch and
> > open up a JIRA for follow up improvements if there is uncertainty
> > about whether something is "done".
>
> I'm quite wary of technical debt (which can quickly plague fast-growing
> projects) so I tend to be a bit demanding in my reviews :-)

I'm also wary of technical debt -- I am definitely not suggesting to
merge patches that you are not comfortable with! =)

I have noticed that sometimes patches may get left in a broken state
while also falling short of addressing all of the review comments. I
would prefer to see a 90% finished patch with a passing build than any
kind of broken build. Whether or not that last 10% needs to get done
in that patch or in a follow up patch depends.

I frequently will step in to "carry" patches when there are small
fixes necessary to get a passing build so something can be merged.
What "carry" means can depend a lot; e.g. rebasing or fixing lint
errors is common. Ideally contributors will take responsibility for
getting a patch into a merge-ready state in a timely fashion, but not
always.

- Wes

>
> Regards
>
> Antoine.
>
>


Re: Round-trip of categorical data with Arrow and Parquet

2019-01-24 Thread Wes McKinney
hi Hatem,

There are several issues open about this already (I'll have to dig
them up), so this is something that we have desired for a long time,
but have not gotten around to implementing.

Since many Parquet writers use dictionary encoding, it would make most
sense to have an option to return DictionaryArray (which can be
converted to pandas.Categorical) from any column, and internally we
will perform the conversion from the encoded Parquet format as
efficiently as possible.
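
For concreteness, a minimal pyarrow sketch of the DictionaryArray <->
pandas.Categorical mapping described above (attribute names such as indices
and dictionary follow current pyarrow and may vary slightly by version):

    import pandas as pd
    import pyarrow as pa

    # Dictionary-encode a dense string array: values become integer codes
    # plus a dictionary of the unique values.
    dense = pa.array(["low", "high", "low", "medium"])
    dict_arr = dense.dictionary_encode()
    print(dict_arr.indices)     # int32 codes: 0, 1, 0, 2
    print(dict_arr.dictionary)  # unique values: low, high, medium

    # The same structure maps directly onto a pandas Categorical.
    categorical = dict_arr.to_pandas()
    print(pd.Series(categorical).dtype)  # category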

There are many cases to consider:

* Dictionary encoded, but different dictionaries in each row group
(this is actually the most likely scenario)
* Dictionary encoded, but the same dictionary in all row groups
* PLAIN encoded data that we pass through DictionaryBuilder as it is
decoded to yield DictionaryArray
* Dictionary encoded, but switch over to PLAIN encoding mid-stream

Having column metadata to automatically "opt in" to the
DictionaryArray conversion sounds reasonable (so long as Arrow readers
have a way to opt out, probably via a global flag to ignore such
custom metadata fields) for usability.

Part of the reason this work was not done in the past was because some
of our hash table machinery was a bit immature. Antoine has recently
improved things significantly, so it should be a lot easier now to do
this work. This is a quite large project, though, and one that affects
a _lot_ of users, so I would be willing to take an initial pass on the
implementation.

Along with completing the nested data read/write path I would say this
is the 2nd highest priority project in parquet-cpp for Arrow users.

- Wes

On Thu, Jan 24, 2019 at 9:59 AM Hatem Helal  wrote:
>
> Hi everyone,
>
> I wanted to gauge interest and feasibility for adding support for natively 
> reading an arrow::DictionaryArray from a parquet file.  Currently, writing an 
> arrow::DictionaryArray is read back as the native index type [0].  I came 
> across a prior discussion for this problem in the context of pandas [1] but I 
> think this would be useful for other arrow clients (C++ or otherwise).
>
> The solution I had in mind would be to add arrow type information as column 
> metadata.  This metadata would then be used when reading back the parquet 
> file to determine which arrow type to create for the column data.
>
> I’m willing to contribute this feature but first wanted to get some feedback 
> on whether this would be generally useful and if the high-level proposed 
> solution would make sense.
>
> Thanks!
>
> Hatem
>
>
> [0] This test demonstrates this behavior
> https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/arrow-reader-writer-test.cc#L1848
> [1] https://github.com/apache/arrow/issues/1688


Round-trip of categorical data with Arrow and Parquet

2019-01-24 Thread Hatem Helal
Hi everyone,

I wanted to gauge interest and feasibility for adding support for natively 
reading an arrow::DictionaryArray from a parquet file.  Currently, an 
arrow::DictionaryArray that is written out is read back as the native index 
type [0].  I came across a prior discussion of this problem in the context of 
pandas [1], but I think this would be useful for other arrow clients (C++ or 
otherwise).

The solution I had in mind would be to add arrow type information as column 
metadata.  This metadata would then be used when reading back the parquet file 
to determine which arrow type to create for the column data.
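
A rough pyarrow sketch of this kind of round trip, using schema-level
key-value metadata for simplicity; the key name "ARROW:dictionary_columns" is
only a placeholder (not an existing convention), and newer pyarrow versions
may already restore the dictionary type on their own:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Write: dictionary-encode a string column and record which columns were
    # dictionary-encoded in the schema-level key-value metadata.
    arr = pa.array(["a", "b", "a", "c"]).dictionary_encode()
    table = pa.Table.from_arrays([arr], names=["category"])
    table = table.replace_schema_metadata({"ARROW:dictionary_columns": "category"})
    pq.write_table(table, "categories.parquet")

    # Read: at the time of writing, the column comes back with its dense value
    # type rather than a dictionary type, but the key-value metadata survives,
    # so a reader that honours it could re-encode the flagged columns (e.g.
    # via dictionary_encode()).
    restored = pq.read_table("categories.parquet")
    print(restored.schema[0].type)
    print(restored.schema.metadata[b"ARROW:dictionary_columns"])  # b'category'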

I’m willing to contribute this feature but first wanted to get some feedback on 
whether this would be generally useful and if the high-level proposed solution 
would make sense.

Thanks!

Hatem


[0] This test demonstrates this behavior
https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/arrow-reader-writer-test.cc#L1848
[1] https://github.com/apache/arrow/issues/1688


[jira] [Created] (ARROW-4364) [C++] Fix -weverything -wextra compilation errors

2019-01-24 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4364:
-

 Summary: [C++] Fix -weverything -wextra compilation errors
 Key: ARROW-4364
 URL: https://issues.apache.org/jira/browse/ARROW-4364
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.12.0
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 0.13.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4362) [Java] Test OpenJDK 11 in CI

2019-01-24 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4362:
--

 Summary: [Java] Test OpenJDK 11 in CI
 Key: ARROW-4362
 URL: https://issues.apache.org/jira/browse/ARROW-4362
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Java
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4363) [C++] Add CMake format checks

2019-01-24 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4363:
-

 Summary: [C++] Add CMake format checks
 Key: ARROW-4363
 URL: https://issues.apache.org/jira/browse/ARROW-4363
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Continuous Integration, Developer Tools
Affects Versions: 0.12.0
Reporter: Antoine Pitrou


We should try to standardize the formatting of our CMake files somehow.

The cmake-format utility (https://github.com/cheshirekow/cmake_format) could 
help.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-24 Thread Krisztián Szűcs
+1 (binding)

On Thu, Jan 24, 2019 at 11:19 AM Uwe L. Korn  wrote:

> +1 (binding) as the Rust community seems to support this.
>
> Uwe
>
> On Thu, Jan 24, 2019, at 7:45 AM, Melik-Adamyan, Areg wrote:
> > +1 (non-binding)
> >
> > Is there a plan for C++ API?
> >
> > -Original Message-
> > From: Renjie Liu [mailto:liurenjie2...@gmail.com]
> > Sent: Wednesday, January 23, 2019 7:44 PM
> > To: dev@arrow.apache.org
> > Subject: Re: [VOTE] Accept donation of Rust DataFusion library for
> Apache Arrow
> >
> > +1 (non-binding)
> >
> > I also tried to write a similar engine, and would be glad to merge with DataFusion
> >
> > paddy horan wrote on Thu, Jan 24, 2019 at 5:29 AM:
> >
> > > +1 (non-binding)
> > >
> > > Thanks Andy
> > >
> > > Get Outlook for iOS
> > >
> > > 
> > > From: Chao Sun 
> > > Sent: Wednesday, January 23, 2019 1:07 PM
> > > To: dev@arrow.apache.org
> > > Subject: Re: [VOTE] Accept donation of Rust DataFusion library for
> > > Apache Arrow
> > >
> > > +1 (non-binding)
> > >
> > > Glad to see this coming and I think it is a great complement to
> > > existing modules, e.g., Arrow and Parquet. It also aligns with the
> > > overall direction that the project is going.
> > >
> > > Chao
> > >
> > > On Wed, Jan 23, 2019 at 9:30 AM Andy Grove 
> wrote:
> > >
> > > > As far as I know, the majority of the PMC are not actively using
> > > > Rust, so as supporting evidence for interest in this donation from
> > > > the Rust community, here is a Reddit thread where I talked about
> > > > offering
> > > DataFusion
> > > > for donation recently:
> > > >
> > > >
> > > >
> > > https://www.reddit.com/r/rust/comments/aibk39/datafusion_060_inmemory_
> > > query_engine_for_apache/
> > > >
> > > > There were 69 upvotes and many supportive comments, including a
> > > > couple where people specifically mentioned that they liked the fact
> > > > that DataFusion uses Arrow. I would hope that this donation leads to
> > > > more
> > > people
> > > > contributing to Arrow.
> > > >
> > > > Thanks,
> > > >
> > > > Andy.
> > > >
> > > > On Wed, Jan 23, 2019 at 4:26 AM Neville Dipale
> > > > 
> > > > wrote:
> > > >
> > > > > Hi Andy,
> > > > >
> > > > > +1 : Accept contribution of DataFusion Rust library
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Wed, 23 Jan 2019 at 03:05, Wes McKinney 
> > > wrote:
> > > > >
> > > > > > Dear all,
> > > > > >
> > > > > > The developers of DataFusion, an analytical query engine written
> > > > > > in Rust, based on the Arrow columnar memory format, are
> > > > > > proposing to donate the code to Apache Arrow:
> > > > > >
> > > > > > https://github.com/andygrove/datafusion
> > > > > >
> > > > > > The community has had an opportunity to discuss this [1] and
> > > > > > there do not seem to be objections to this. Andy Grove has
> > > > > > staged the code donation in the form of a pull request:
> > > > > >
> > > > > > https://github.com/apache/arrow/pull/3399
> > > > > >
> > > > > > This vote is to determine if the Arrow PMC is in favor of
> > > > > > accepting this donation. If the vote passes, the PMC and the
> > > > > > authors of the
> > > code
> > > > > > will work together to complete the ASF IP Clearance process
> > > > > > (http://incubator.apache.org/ip-clearance/) and import this
> Rust
> > > > > > codebase implementation into Apache Arrow.
> > > > > >
> > > > > > [ ] +1 : Accept contribution of DataFusion Rust library [ ] 0 :
> > > > > > No opinion [ ] -1 : Reject contribution because...
> > > > > >
> > > > > > Here is my vote: +1
> > > > > >
> > > > > > The vote will be open for at least 72 hours.
> > > > > >
> > > > > > Thanks,
> > > > > > Wes
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > >
> > > >
> > > https://lists.apache.org/thread.html/2f6c14e9f5a9ab41b0b591b2242741b23
> > > e5528fb28e79ac0e2c9349a@%3Cdev.arrow.apache.org%3E
> > > > > >
> > > > >
> > > >
> > >
>


[jira] [Created] (ARROW-4361) [Website] Update committers list

2019-01-24 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-4361:
---

 Summary: [Website] Update committers list
 Key: ARROW-4361
 URL: https://issues.apache.org/jira/browse/ARROW-4361
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Website
Reporter: Yosuke Shiro
Assignee: Yosuke Shiro






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4360) [C++] Query homebrew for Thrift

2019-01-24 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4360:
--

 Summary: [C++] Query homebrew for Thrift
 Key: ARROW-4360
 URL: https://issues.apache.org/jira/browse/ARROW-4360
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.13.0


As is already done for LLVM, also search Homebrew for Thrift when on OSX and THRIFT_HOME is not set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4359) Column metadata is not saved or loaded in parquet

2019-01-24 Thread Seb Fru (JIRA)
Seb Fru created ARROW-4359:
--

 Summary: Column metadata is not saved or loaded in parquet
 Key: ARROW-4359
 URL: https://issues.apache.org/jira/browse/ARROW-4359
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Seb Fru


Hi all,

A while ago I posted this issue:

https://issues.apache.org/jira/browse/ARROW-3866

While working with PyArrow I encountered another potential bug related to 
column metadata: if I create a table containing columns with metadata, 
everything is fine. But after I save the table to Parquet and load it back as 
a table using pq.read_table, the column metadata is gone.

As of now I cannot say whether the metadata is not saved correctly or not 
loaded correctly, as I have no idea how to verify it. Unfortunately I also 
don't have the time to investigate much, but I wanted to let you know anyway.
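
A minimal sketch of the behaviour described above, assuming the metadata was
attached via pa.field(..., metadata=...); exact constructor signatures may
differ between pyarrow versions:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Attach key-value metadata to a single field/column.
    field = pa.field("x", pa.int64(), metadata={"unit": "seconds"})
    table = pa.Table.from_arrays([pa.array([1, 2, 3])], schema=pa.schema([field]))
    print(table.schema[0].metadata)      # {b'unit': b'seconds'}

    # Round-trip through Parquet and check whether the field metadata survived;
    # the report above says it comes back empty.
    pq.write_table(table, "meta.parquet")
    restored = pq.read_table("meta.parquet")
    print(restored.schema[0].metadata)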

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4358) [Gandiva][Crossbow] Trusty build broken

2019-01-24 Thread Praveen Kumar Desabandu (JIRA)
Praveen Kumar Desabandu created ARROW-4358:
--

 Summary: [Gandiva][Crossbow] Trusty build broken
 Key: ARROW-4358
 URL: https://issues.apache.org/jira/browse/ARROW-4358
 Project: Apache Arrow
  Issue Type: Task
Reporter: Praveen Kumar Desabandu
Assignee: Praveen Kumar Desabandu


As a side effect of 
https://github.com/apache/arrow/commit/1b8a7bc3baa4bce660c18a13934115d55f8733df,
 Java builds on Trusty are broken due to the removal of the Travis Maven 
installation in that commit.

This JIRA is to support both environments.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4357) arrow java build broken on trusty

2019-01-24 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-4357:
-

 Summary: arrow java build broken on trusty
 Key: ARROW-4357
 URL: https://issues.apache.org/jira/browse/ARROW-4357
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Pindikura Ravindra
Assignee: Pindikura Ravindra


[https://travis-ci.com/dremio/arrow-build/builds/98435917]

 
SLF4J: The requested version 1.5.6 by your slf4j binding is not compatible with 
[1.6, 1.7]
SLF4J: See http://www.slf4j.org/codes.html#version_mismatch for further details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-24 Thread Uwe L. Korn
+1 (binding) as the Rust community seems to support this.

Uwe

On Thu, Jan 24, 2019, at 7:45 AM, Melik-Adamyan, Areg wrote:
> +1 (non-binding)
> 
> Is there a plan for C++ API?
> 
> -Original Message-
> From: Renjie Liu [mailto:liurenjie2...@gmail.com] 
> Sent: Wednesday, January 23, 2019 7:44 PM
> To: dev@arrow.apache.org
> Subject: Re: [VOTE] Accept donation of Rust DataFusion library for Apache 
> Arrow
> 
> +1 (non-binding)
> 
> I also tried to write a similar engine, and would be glad to merge with DataFusion
> 
> paddy horan wrote on Thu, Jan 24, 2019 at 5:29 AM:
> 
> > +1 (non-binding)
> >
> > Thanks Andy
> >
> > Get Outlook for iOS
> >
> > 
> > From: Chao Sun 
> > Sent: Wednesday, January 23, 2019 1:07 PM
> > To: dev@arrow.apache.org
> > Subject: Re: [VOTE] Accept donation of Rust DataFusion library for 
> > Apache Arrow
> >
> > +1 (non-binding)
> >
> > Glad to see this coming and I think it is a great complement to 
> > existing modules, e.g., Arrow and Parquet. It also aligns with the 
> > overall direction that the project is going.
> >
> > Chao
> >
> > On Wed, Jan 23, 2019 at 9:30 AM Andy Grove  wrote:
> >
> > > As far as I know, the majority of the PMC are not actively using 
> > > Rust, so as supporting evidence for interest in this donation from 
> > > the Rust community, here is a Reddit thread where I talked about 
> > > offering
> > DataFusion
> > > for donation recently:
> > >
> > >
> > >
> > https://www.reddit.com/r/rust/comments/aibk39/datafusion_060_inmemory_
> > query_engine_for_apache/
> > >
> > > There were 69 upvotes and many supportive comments, including a 
> > > couple where people specifically mentioned that they liked the fact 
> > > that DataFusion uses Arrow. I would hope that this donation leads to 
> > > more
> > people
> > > contributing to Arrow.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > On Wed, Jan 23, 2019 at 4:26 AM Neville Dipale 
> > > 
> > > wrote:
> > >
> > > > Hi Andy,
> > > >
> > > > +1 : Accept contribution of DataFusion Rust library
> > > >
> > > > Thanks
> > > >
> > > > On Wed, 23 Jan 2019 at 03:05, Wes McKinney 
> > wrote:
> > > >
> > > > > Dear all,
> > > > >
> > > > > The developers of DataFusion, an analytical query engine written 
> > > > > in Rust, based on the Arrow columnar memory format, are 
> > > > > proposing to donate the code to Apache Arrow:
> > > > >
> > > > > https://github.com/andygrove/datafusion
> > > > >
> > > > > The community has had an opportunity to discuss this [1] and 
> > > > > there do not seem to be objections to this. Andy Grove has 
> > > > > staged the code donation in the form of a pull request:
> > > > >
> > > > > https://github.com/apache/arrow/pull/3399
> > > > >
> > > > > This vote is to determine if the Arrow PMC is in favor of 
> > > > > accepting this donation. If the vote passes, the PMC and the 
> > > > > authors of the
> > code
> > > > > will work together to complete the ASF IP Clearance process
> > > > > (http://incubator.apache.org/ip-clearance/) and import this Rust 
> > > > > codebase implementation into Apache Arrow.
> > > > >
> > > > > [ ] +1 : Accept contribution of DataFusion Rust library [ ] 0 : 
> > > > > No opinion [ ] -1 : Reject contribution because...
> > > > >
> > > > > Here is my vote: +1
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > Thanks,
> > > > > Wes
> > > > >
> > > > > [1]:
> > > > >
> > > >
> > >
> > https://lists.apache.org/thread.html/2f6c14e9f5a9ab41b0b591b2242741b23
> > e5528fb28e79ac0e2c9349a@%3Cdev.arrow.apache.org%3E
> > > > >
> > > >
> > >
> >


[jira] [Created] (ARROW-4356) [CI] Add integration (docker) test for turbodbc

2019-01-24 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4356:
--

 Summary: [CI] Add integration (docker) test for turbodbc
 Key: ARROW-4356
 URL: https://issues.apache.org/jira/browse/ARROW-4356
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Uwe L. Korn
 Fix For: 0.13.0


We regularly break our API so that {{turbodbc}} needs to make minor changes to 
support the new Arrow version. We should set up a small integration test to 
check, before a release, that {{turbodbc}} can easily upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4355) [C++] test-util functions are no longer part of libarrow

2019-01-24 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4355:
--

 Summary: [C++] test-util functions are no longer part of libarrow
 Key: ARROW-4355
 URL: https://issues.apache.org/jira/browse/ARROW-4355
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.12.0
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn


I have used these functions in other artifacts like {{turbodbc}}. I would like 
to have them back as part of libarrow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)