Question regarding scope of Arrow

2017-10-04 Thread paddy horan
Hi All,

I’m hoping someone on this list can comment on the scope of Arrow.  In the
interview with Wes for O’Reilly he spoke about an “operator kernel library”.
On the homepage it states that Arrow “enables execution engines to take
advantage of the latest SIMD…”.  Is this “operator kernel library” a part of
Arrow, or will it be a separate “execution engine” library built on top
of Arrow?  It seems to me that it will be a part of Arrow; is my understanding
correct?

If this is the case, what is the scope of such a library?  Taking pandas 2.0 as
an example, do you plan to have pandas be a wrapper around Arrow?  Might Arrow
be the “libpandas” referred to in the design document for pandas 2.0?

“we have not decided” is a valid response to any/all of the questions above.  
Apologies if these are basic questions.  I’m excited about the project and 
where it could go.  I’m an Actuary looking to build an Actuarial modeling 
library on top of Arrow and I would love to contribute.  However, I feel I have 
a lot to learn first.  Is there a better forum for basic questions from would-be
new contributors?  (I won’t be offended if you tell me that there is no
forum for basic questions; I understand that momentum is important and you are
all busy moving the project toward 1.0.)

Thanks for your time,
Paddy

Sent from Mail for Windows 10



Re: Rust bindings

2018-03-23 Thread paddy horan
Hi Andy,

I’m looking to get involved in contributing to the Rust implementation also, 
would love to see it in the arrow repo sooner rather than later.

Should we identify what needs to be added to iron-Arrow before it’s ready to be 
donated to the Apache repo?


Thanks,
Paddy

Get Outlook for iOS
_
From: Andy Grove 
Sent: Friday, March 23, 2018 9:08 AM
Subject: Rust bindings
To: 


Hi,

Congratulations on the release of the Go bindings for Arrow. I think Rust
should be next ;-)

I've been a bit distracted getting a release out in the day job but am now
working on iron-arrow and getting it ready to integrate with my project. I
hope to be able to put some time in this weekend on this. I don't think it
will be very hard to get to a point where I am at least using the Array
type.

I can commit to working on the Rust bindings moving forward (weekends
mostly) so I think we should go ahead and do this under the arrow repo if
everyone is in agreement.

Thanks,

Andy.




Update to crates.io

2018-10-21 Thread paddy horan
Hi All,


We did not update crates.io for the 0.11 release of the Rust implementation of
Apache Arrow.  I believe that the Rust version is on the same release cycle as
the other major implementations, since we did release 0.10 alongside the other
implementations.


I don't have permissions to do this, could someone (Andy maybe?) help?  Note, 
this is not urgent but we should keep crates.io updated so that people can see 
that we are making progress on the Rust implementation.


Thanks,

P


Re: Issue with GitHub PR

2018-10-22 Thread paddy horan
Ah, ok thanks.



From: Antoine Pitrou 
Sent: Monday, October 22, 2018 11:00 AM
To: dev@arrow.apache.org
Subject: Re: Issue with GitHub PR


On 22/10/2018 at 16:53, paddy horan wrote:
> Hey all,
>
> I created a PR for ARROW-3541. After addressing review comments I rebased and
> force pushed to my branch. GitHub seems to be having issues though: the PR is
> not updating and I don’t believe CI was re-triggered. Looking at the PR now,
> comments I made this morning are not showing up, and comments I deleted
> because GitHub posted them multiple times are back.

GitHub is currently having issues:
https://status.github.com/messages

There is no point in creating a new PR.

Regards

Antoine.


Issue with GitHub PR

2018-10-22 Thread paddy horan
Hey all,

I created a PR for ARROW-3541. After addressing review comments I rebased and
force pushed to my branch. GitHub seems to be having issues though: the PR is
not updating and I don’t believe CI was re-triggered. Looking at the PR now,
comments I made this morning are not showing up, and comments I deleted because
GitHub posted them multiple times are back.

I know we have tooling that relies on the PR name; for instance, the
pull-request-available tag has been added to the issue in JIRA.  Can I rename and
abandon the PR and open a new PR with the correct name to try to get CI to
trigger, or will this mess up our tooling?

Thanks,
P
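The tooling mentioned above keys off the PR name. As a minimal sketch of why a rename matters, the hypothetical `jira_key_from_title` below pulls the JIRA key out of a PR title, assuming the `ARROW-NNNN: [Component] summary` title convention. It is not the project's actual merge script.

```rust
/// Hypothetical helper: returns the JIRA key (e.g. "ARROW-3541") at the start of
/// a PR title, assuming the "ARROW-NNNN: [Component] summary" title convention.
fn jira_key_from_title(title: &str) -> Option<&str> {
    const PREFIX: &str = "ARROW-";
    if !title.starts_with(PREFIX) {
        return None;
    }
    // Count the ASCII digits immediately after the prefix.
    let digits = title[PREFIX.len()..]
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    // Require at least one digit, followed by the ':' separator.
    let end = PREFIX.len() + digits;
    if digits == 0 || !title[end..].starts_with(':') {
        return None;
    }
    Some(&title[..end])
}

fn main() {
    // Placeholder summary text; only the key format matters here.
    let title = "ARROW-3541: [Rust] example summary";
    println!("{:?}", jira_key_from_title(title)); // Some("ARROW-3541")
}
```

A title that does not follow the convention yields `None`, which is the case where automation keyed on the name would fail to link the PR to its issue.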



Re: Update to crates.io

2018-10-22 Thread paddy horan
Great, thanks Andy



From: Andy Grove 
Sent: Sunday, October 21, 2018 11:45 PM
To: dev@arrow.apache.org
Subject: Re: Update to crates.io

I'd be happy to take a look at this tomorrow. I will also write up docs on
the process. I should give permissions to some other committers too.

Thanks,

Andy.


On Sun, Oct 21, 2018 at 9:00 PM paddy horan  wrote:

> Hi All,
>
>
> We did not update crates.io for the 0.11 release of the Rust
> implementation of Apache Arrow. I believe that the Rust version is on the
> same release cycle as the other major implementations as we did release for
> 0.10 with the other implementations.
>
>
> I don't have permissions to do this, could someone (Andy maybe?) help?
> Note, this is not urgent but we should keep crates.io updated so that
> people can see that we are making progress on the Rust implementation.
>
>
> Thanks,
>
> P
>


Re: Update to crates.io

2018-10-23 Thread paddy horan
Thanks very much Andy



From: Andy Grove 
Sent: Tuesday, October 23, 2018 9:19 PM
To: dev@arrow.apache.org
Subject: Re: Update to crates.io

I have a PR to add instructions to the README (
https://github.com/apache/arrow/pull/2823) and I have published the 0.11.0
release to crates.io using these instructions.

Andy.



On Mon, Oct 22, 2018 at 5:36 AM paddy horan  wrote:

> Great, thanks Andy
>
>
> 
> From: Andy Grove 
> Sent: Sunday, October 21, 2018 11:45 PM
> To: dev@arrow.apache.org
> Subject: Re: Update to crates.io
>
> I'd be happy to take a look at this tomorrow. I will also write up docs on
> the process. I should give permissions to some other committers too.
>
> Thanks,
>
> Andy.
>
>
> On Sun, Oct 21, 2018 at 9:00 PM paddy horan 
> wrote:
>
> > Hi All,
> >
> >
> > We did not update crates.io for the 0.11 release of the Rust
> > implementation of Apache Arrow. I believe that the Rust version is on the
> > same release cycle as the other major implementations as we did release
> > for 0.10 with the other implementations.
> >
> >
> > I don't have permissions to do this, could someone (Andy maybe?) help?
> > Note, this is not urgent but we should keep crates.io updated so that
> > people can see that we are making progress on the Rust implementation.
> >
> >
> > Thanks,
> >
> > P
> >
>


Re: [Rust] move parquet into a separate sub-crate

2018-12-29 Thread paddy horan
Ok, thanks Chao.

Sounds good to me then.

P


From: Chao Sun 
Sent: Saturday, December 29, 2018 2:20 AM
To: dev@arrow.apache.org
Subject: Re: [Rust] move parquet into a separate sub-crate

Thanks Paddy. Similarly, I can't see a reason for arrow to reference
parquet since it is all about the in-memory data representation and tools
to build and process the data. If it ever does, we should probably move the
logic to the parquet side.

IMO the arrow-parquet integration (e.g., reading from parquet to arrow,
writing arrow into parquet) should happen on the parquet side as it involves
encoding/decoding mechanisms which are specific to the latter. Therefore,
parquet needs to depend on arrow. The cargo workspace is pretty flexible
about this so a sub-crate is allowed to depend on the main crate.
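The dependency direction Chao describes can be sketched in a single file, with modules standing in for the separate workspace crates. All names here (`Int32Array`, `read_int32_column`) are hypothetical, and the "reader" just copies already-decoded values instead of decoding Parquet pages.

```rust
mod arrow {
    /// In-memory columnar array; knows nothing about any on-disk format.
    pub struct Int32Array {
        pub values: Vec<i32>,
    }

    impl Int32Array {
        pub fn len(&self) -> usize {
            self.values.len()
        }
    }
}

mod parquet {
    // The parquet side depends on arrow, never the reverse, so no cycle forms.
    use super::arrow::Int32Array;

    /// Hypothetical stand-in for decoding a Parquet column chunk into Arrow.
    /// A real reader would handle encodings, definition levels, and pages.
    pub fn read_int32_column(decoded: &[i32]) -> Int32Array {
        Int32Array { values: decoded.to_vec() }
    }
}

fn main() {
    let array = parquet::read_int32_column(&[1, 2, 3]);
    println!("{}", array.len()); // 3
}
```

If `arrow` also needed to call into `parquet`, the two modules above could still compile, but as separate crates Cargo would reject the cycle; keeping format-specific logic on the parquet side avoids that.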

On Fri, Dec 28, 2018 at 6:55 PM paddy horan  wrote:

> This seems reasonable.  The flexibility with CI is a positive for sure.
>
> > 1. Cargo doesn't allow cyclic dependency. So if the parquet sub-crate
> > depends on arrow, we can't reference parquet in arrow.
>
> This is my only concern, the Rust implementation is evolving rapidly and
> adopting workspaces may reduce our flexibility, I can’t think of a specific
> situation right now but might this be a problem in the future?
>
> If we have this restriction we have to agree on which way dependencies
> flow, what is your preference?
>
> I have not used workspaces in anger but it seems it is designed for crates
> to flow up?  i.e. that arrow would be allowed to reference the parquet
> sub-crate.
>
> P
>
> From: Renjie Liu<mailto:liurenjie2...@gmail.com>
> Sent: Thursday, December 27, 2018 10:09 PM
> To: dev@arrow.apache.org<mailto:dev@arrow.apache.org>
> Subject: Re: [Rust] move parquet into a separate sub-crate
>
> Cool. It may also be worthwhile to put adapters into a separate crate.
>
> On Fri, Dec 28, 2018 at 4:10 AM Chao Sun  wrote:
>
> > Hi,
> >
> > It just occurred to me that it may be a better idea to move the parquet
> > module into a separate sub-crate by using cargo workspaces
> > <https://doc.rust-lang.org/book/ch14-03-cargo-workspaces.html>. The
> > advantage is that we can make the project more modular (in future, we may
> > want to add more sub-crates such as arrow/parquet_derive, orc, gandiva,
> > etc), and allow us to run CI jobs separately on each crate.
> >
> > Some small caveats:
> > 1. Cargo doesn't allow cyclic dependencies. So if the parquet sub-crate
> > depends on arrow, we can't reference parquet in arrow. This doesn't seem
> > like an issue though, since arrow itself should be independent of any
> > physical on-disk format. I also didn't see any reference to parquet in
> > cpp/src/arrow.
> > 2. The path dependency used in the workspace has to be changed to a version
> > number when we do "cargo publish". This should be added to the release
> > instructions, and the committer who performs the job should do the extra step.
> >
> > Thoughts?
> >
> > Chao
> >
>
>
> --
> Renjie Liu
> Software Engineer, MVAD
>
>


Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-23 Thread paddy horan
+1 (non-binding)

Thanks Andy



From: Chao Sun 
Sent: Wednesday, January 23, 2019 1:07 PM
To: dev@arrow.apache.org
Subject: Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

+1 (non-binding)

Glad to see this coming and I think it is a great complement to existing
modules, e.g., Arrow and Parquet. It also aligns with the overall direction
that the project is going.

Chao

On Wed, Jan 23, 2019 at 9:30 AM Andy Grove  wrote:

> As far as I know, the majority of the PMC are not actively using Rust, so
> as supporting evidence for interest in this donation from the Rust
> community, here is a Reddit thread where I talked about offering DataFusion
> for donation recently:
>
>
> https://www.reddit.com/r/rust/comments/aibk39/datafusion_060_inmemory_query_engine_for_apache/
>
> There were 69 upvotes and many supportive comments, including a couple
> where people specifically mentioned that they liked the fact that
> DataFusion uses Arrow. I would hope that this donation leads to more people
> contributing to Arrow.
>
> Thanks,
>
> Andy.
>
> On Wed, Jan 23, 2019 at 4:26 AM Neville Dipale 
> wrote:
>
> > Hi Andy,
> >
> > +1 : Accept contribution of DataFusion Rust library
> >
> > Thanks
> >
> > On Wed, 23 Jan 2019 at 03:05, Wes McKinney  wrote:
> >
> > > Dear all,
> > >
> > > The developers of DataFusion, an analytical query engine written
> > > in Rust, based on the Arrow columnar memory format, are proposing
> > > to donate the code to Apache Arrow:
> > >
> > > https://github.com/andygrove/datafusion
> > >
> > > The community has had an opportunity to discuss this [1] and
> > > there do not seem to be objections to this. Andy Grove has staged
> > > the code donation in the form of a pull request:
> > >
> > > https://github.com/apache/arrow/pull/3399
> > >
> > > This vote is to determine if the Arrow PMC is in favor of accepting
> > > this donation. If the vote passes, the PMC and the authors of the code
> > > will work together to complete the ASF IP Clearance process
> > > (http://incubator.apache.org/ip-clearance/) and import this Rust
> > > codebase implementation into Apache Arrow.
> > >
> > > [ ] +1 : Accept contribution of DataFusion Rust library
> > > [ ] 0 : No opinion
> > > [ ] -1 : Reject contribution because...
> > >
> > > Here is my vote: +1
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > Thanks,
> > > Wes
> > >
> > > [1]:
> > >
> >
> https://lists.apache.org/thread.html/2f6c14e9f5a9ab41b0b591b2242741b23e5528fb28e79ac0e2c9349a@%3Cdev.arrow.apache.org%3E
> > >
> >
>


Re: [Rust] Adding owners to crates.io for arrow and parquet crates

2019-01-19 Thread paddy horan
Hey Andy,

I assume you are looking for Arrow committers? (PMC members are probably ideal.)

Basically, I’m happy to help where I can.

P



From: Andy Grove 
Sent: Saturday, January 19, 2019 3:58 PM
To: dev@arrow.apache.org
Subject: [Rust] Adding owners to crates.io for arrow and parquet crates

Currently I am the sole owner of the arrow crate and therefore the only
person who can publish new versions.

It would be good to add some other committers as owners of this and the
parquet crate.

Could we get some volunteers to create accounts at crates.io and then let
me and Chao know your usernames so we can add you as owners?

Thanks,

Andy.


Re: Timeline for Arrow 0.12.0 release

2018-12-05 Thread paddy horan
I’m traveling at the moment, but I’ll look through all the outstanding JIRAs
tomorrow and organize them.

Paddy



From: Andy Grove 
Sent: Tuesday, December 4, 2018 9:58 PM
To: dev@arrow.apache.org
Subject: Re: Timeline for Arrow 0.12.0 release

I'd love to tackle the three related issues for supporting simple
math/comparison operations on primitive arrays and casting primitive arrays,
but since the change to use the Rust specialization feature I'm a bit stuck and
need some assistance applying the math operations to the numeric types and
not the boolean primitives. I have added a comment to
https://github.com/apache/arrow/pull/3033 ... if I can get help solving for
this PR then I should be able to handle the others. I'll also do some
research and try to figure this out myself.

Andy.
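Andy's question is how to make math operations available for numeric types but not for boolean primitives. One stable-Rust approach, offered here as an alternative to specialization and not as the code in PR 3033, is a trait bound: the hypothetical `NumericType` marker trait is implemented only for numeric types, so the arithmetic methods simply do not exist on a boolean array. `PrimitiveArray` is a simplified stand-in with no null bitmap.

```rust
use std::ops::Add;

// Hypothetical marker trait: implemented only for numeric primitives, so the
// kernels below are unavailable for bool arrays without needing specialization.
trait NumericType: Copy + Add<Output = Self> + PartialOrd {}
impl NumericType for i32 {}
impl NumericType for i64 {}
impl NumericType for f64 {}

// Simplified array: values only, no validity (null) bitmap.
struct PrimitiveArray<T> {
    values: Vec<T>,
}

impl<T: NumericType> PrimitiveArray<T> {
    /// Element-wise addition; panics if the lengths differ.
    fn add(&self, other: &Self) -> PrimitiveArray<T> {
        assert_eq!(self.values.len(), other.values.len(), "length mismatch");
        let values = self
            .values
            .iter()
            .zip(&other.values)
            .map(|(&a, &b)| a + b)
            .collect();
        PrimitiveArray { values }
    }

    /// Element-wise "less than" comparison, producing booleans.
    fn lt(&self, other: &Self) -> Vec<bool> {
        self.values.iter().zip(&other.values).map(|(a, b)| a < b).collect()
    }
}

fn main() {
    let a = PrimitiveArray { values: vec![1, 2, 3] };
    let b = PrimitiveArray { values: vec![10, 1, 5] };
    println!("{:?}", a.add(&b).values); // [11, 3, 8]
    println!("{:?}", a.lt(&b)); // [true, false, true]

    // Would not compile: bool does not implement NumericType.
    // let c = PrimitiveArray { values: vec![true] };
    // c.add(&c);
}
```

The trade-off is that each numeric type needs an explicit impl of the marker trait, but no unstable compiler features are required.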






On Tue, Dec 4, 2018 at 7:03 PM Wes McKinney  wrote:

> Andy, Paddy, or other Rust developers -- could you review the 6 issues
> in TODO in the 0.12 backlog and either assign them or move them to the
> next release if they aren't going to be completed this week or next?
>
>
> On Fri, Nov 30, 2018 at 4:34 PM Wes McKinney  wrote:
> >
> > hi folks,
> >
> > Tomorrow is December 1. The last major Arrow release (0.11.0) took
> > place on October 8. Given how much work has happened in the project in
> > the last ~2 months, I think it would be great to complete the next
> > major release before the end-of-year holidays set in.
> >
> > I've been curating the JIRA backlog the last couple of weeks, and have
> > just created a 0.12.0 release wiki page to help us stay organized
> >
> > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.12.0+Release
> >
> > Given that there are only 3 full working weeks between now and
> > Christmas, I think we should be in position to cut a release by the
> > end of the week of December 10, i.e. by Friday December 14. Not all of
> > the TODO issues have to be completed to make the release, but it would
> > be good to push to complete as much as possible. Please help by
> > reviewing the backlog, and if possible, assigning issues to yourself
> > that you'd like to pursue in the next 2 weeks.
> >
> > Let me know if this sounds reasonable, or any concerns.
> >
> > Thanks
> > Wes
>


[RUST] [DISCUSS] Changing type of array lengths

2018-12-06 Thread paddy horan
All,

As part of the PR for ARROW-3347 there was a discussion regarding the type that
should be used for anything that measures the length of an array, i.e. len and
capacity.

The result of this discussion was that the Rust implementation should switch to
using usize as the type for representing len and capacity.  This would mean
supporting a way to split larger arrays into smaller arrays when passing data
from one implementation to another.  The exact size of these smaller arrays
would depend on the implementation you are passing data to.  C++ supports
array lengths up to the i64 maximum, but **all** implementations support lengths
up to the i32 maximum, as specified by the spec.  The full discussion is here:
https://github.com/apache/arrow/pull/2858
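The splitting requirement can be made concrete with a small sketch. `chunk_ranges` is a hypothetical helper, not part of the Rust implementation: it only computes (offset, length) boundaries so that each piece fits a consumer's maximum length, here `i32::MAX`. A real implementation would presumably also need to slice the validity bitmap and data buffers.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: splits an array of `len` elements into (offset, length)
/// pieces whose lengths are each at most `max_chunk`.
fn chunk_ranges(len: usize, max_chunk: usize) -> Vec<(usize, usize)> {
    assert!(max_chunk > 0, "max_chunk must be positive");
    let mut ranges = Vec::new();
    let mut offset = 0;
    while offset < len {
        let chunk = (len - offset).min(max_chunk);
        ranges.push((offset, chunk));
        offset += chunk;
    }
    ranges
}

fn main() {
    // Largest length a consumer limited to i32 lengths can accept.
    let i32_limit = usize::try_from(i32::MAX).expect("usize narrower than i32");
    // An array two elements longer than that limit splits into two pieces.
    let ranges = chunk_ranges(i32_limit + 2, i32_limit);
    println!("{:?}", ranges); // [(0, 2147483647), (2147483647, 2)]
}
```

Using `usize::try_from` instead of an `as` cast keeps the i32-to-usize conversion explicit and checked.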

This is not a major change so I’ll push it to 0.13, but I wanted to open up the
discussion before making the change, since the previous debate was hidden in a
PR.  In particular, Andy and Chao, are you in favor of this change?

Paddy


RE: Timeline for Arrow 0.12.0 release

2018-12-06 Thread paddy horan
Other than Andy’s PR below, I’m going to try to find time to work on
ARROW-3827; I’ll bump it to 0.13 if I can’t find the time early next week.  There
is nothing else in the 0.12 backlog for Rust.  It would be nice to get the
parquet merge in, though.



Paddy




From: Andy Grove 
Sent: Thursday, December 6, 2018 10:20:48 AM
To: dev@arrow.apache.org
Subject: Re: Timeline for Arrow 0.12.0 release

I have PRs pending for all the Rust issues that I want to get into 0.12.0
and would appreciate some reviews so I can go ahead and merge:

https://github.com/apache/arrow/pull/3033 (covers ARROW-3880 and ARROW-3881
- add math and comparison operations to primitive arrays)
https://github.com/apache/arrow/pull/3096 (ARROW-3885 - Rust release
process)
https://github.com/apache/arrow/pull/3111 (ARROW-3838 - CSV Writer)

With these in place I plan on writing a tutorial for reading a CSV file,
performing some operations on primitive arrays and writing the output to a
new CSV file.

I am deferring ARROW-3882 (casting for primitive arrays) to 0.13.0

Thanks,

Andy.

On Tue, Dec 4, 2018 at 7:57 PM Andy Grove  wrote:

> I'd love to tackle the three related issues for supporting simple
> math/comparison operations on primitive arrays and casting primitive arrays
> but since the change to use Rust specialization feature I'm a bit stuck and
> need some assistance applying the math operations to the numeric types and
> not the boolean primitives. I have added a comment to
> https://github.com/apache/arrow/pull/3033 ... if I can get help solving
> for this PR then I should be able to handle the others. I'll also do some
> research and try and figure this out myself.
>
> Andy.
>
>
>
>
>
>
> On Tue, Dec 4, 2018 at 7:03 PM Wes McKinney  wrote:
>
>> Andy, Paddy, or other Rust developers -- could you review the 6 issues
>> in TODO in the 0.12 backlog and either assign them or move them to the
>> next release if they aren't going to be completed this week or next?
>>
>>
>> On Fri, Nov 30, 2018 at 4:34 PM Wes McKinney  wrote:
>> >
>> > hi folks,
>> >
>> > Tomorrow is December 1. The last major Arrow release (0.11.0) took
>> > place on October 8. Given how much work has happened in the project in
>> > the last ~2 months, I think it would be great to complete the next
>> > major release before the end-of-year holidays set in.
>> >
>> > I've been curating the JIRA backlog the last couple of weeks, and have
>> > just created a 0.12.0 release wiki page to help us stay organized
>> >
>> > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.12.0+Release
>> >
>> > Given that there are only 3 full working weeks between now and
>> > Christmas, I think we should be in position to cut a release by the
>> > end of the week of December 10, i.e. by Friday December 14. Not all of
>> > the TODO issues have to be completed to make the release, but it would
>> > be good to push to complete as much as possible. Please help by
>> > reviewing the backlog, and if possible, assigning issues to yourself
>> > that you'd like to pursue in the next 2 weeks.
>> >
>> > Let me know if this sounds reasonable, or any concerns.
>> >
>> > Thanks
>> > Wes
>>
>


Re: [RUST] [DISCUSS] Changing type of array lengths

2018-12-09 Thread paddy horan
Thanks All,

I didn't hear any strong opinions against this change so the PR is here:
https://github.com/apache/arrow/pull/3142

Thanks,
Paddy

From: Marco Neumann 
Sent: Friday, December 7, 2018 12:35 PM
To: dev@arrow.apache.org
Subject: Re: [RUST] [DISCUSS] Changing type of array lengths

On Windows it depends on whether it's a 32- or 64-bit binary, as on every other
system.

usize is usually used by Rust containers for indexing (see for example Vec in
the standard library), and I personally find it very annoying when libraries
break that rule, because in Rust you have to be explicit about integer
conversions. You don't have implicit narrowing or widening like in C/C++. So you
end up casting back and forth hundreds of times just for a single library you use.
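Marco's point about explicit conversions can be illustrated with a minimal sketch. `I64LenArray` is a hypothetical type that reports its length as `i64`; because `Vec::len()` returns `usize` and slice indexing takes `usize`, each crossing needs an explicit, checked conversion. The last line also relates to Wes's question: `usize` is pointer-sized, so its width follows the compilation target.

```rust
use std::convert::TryFrom;

/// Hypothetical array type that exposes its length as i64 instead of usize.
struct I64LenArray {
    values: Vec<u8>,
}

impl I64LenArray {
    fn len(&self) -> i64 {
        // Vec::len() returns usize, so even the getter needs a conversion.
        i64::try_from(self.values.len()).expect("length exceeds i64::MAX")
    }
}

fn main() {
    let arr = I64LenArray { values: vec![1, 2, 3] };
    // Indexing a Vec or slice requires usize, so callers must convert back.
    let last = usize::try_from(arr.len() - 1).expect("negative index");
    println!("{}", arr.values[last]); // 3

    // usize has the target's pointer width: 8 bytes on 64-bit targets
    // (including x86_64-pc-windows-msvc), 4 bytes on 32-bit targets.
    println!("{}", std::mem::size_of::<usize>());
}
```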

On December 7, 2018 6:18:42 PM GMT+01:00, Wes McKinney  
wrote:
>What would be the argument for using usize over i64/u64? Is usize 64
>bits in Rust when compiling on Windows?
>On Fri, Dec 7, 2018 at 9:48 AM Andy Grove 
>wrote:
>>
>> I am in favor of using usize.
>>
>> Thanks.
>>
>> On Thu, Dec 6, 2018 at 7:20 PM paddy horan 
>wrote:
>>
>> > All,
>> >
>> > As part of the PR for ARROW-3347 there was a discussion regarding the type
>> > that should be used for anything that measures the length of an array,
>> > i.e.  len and capacity.
>> >
>> > The result of this discussion was that the Rust implementation should
>> > switch to using usize as the type for representing len and capacity.  This
>> > would mean supporting a way to split larger arrays into smaller array when
>> > passing data from one implementation to another.  The exact size of these
>> > smaller arrays would depend on the implementation you are passing data to.
>> > C++ supports arrays up to size i64, but **all** implementations support
>> > lengths up to i32 as specified by the spec.  The full discussion is here:
>> > https://github.com/apache/arrow/pull/2858
>> >
>> > This is not a major change so I’ll push it to 0.13 but I wanted to open up
>> > the discussion before making the change, the previous debate was hidden in
>> > a PR.  In particular, Andy and Chao are you in favor of this change?
>> >
>> > Paddy
>> >


Re: [DISCUSS] Rust add adapter for parquet

2018-11-21 Thread paddy horan
I was using x86_64-pc-windows-msvc, but it was just a quick test; I’ll take a
closer look when I get a chance. I agree that lack of support for 32-bit should
not hold this up.

Is the change to the nightly compiler the kind of thing that the PMC should
vote on?  This could be done in advance of the code donation.  Specialization,
in particular, would be really useful within the existing code base.

Paddy


From: Chao Sun 
Sent: Wednesday, November 21, 2018 2:42 PM
To: dev@arrow.apache.org
Cc: Wes McKinney; Andy Grove; Ivan Sadikov; Parquet Dev
Subject: Re: [DISCUSS] Rust add adapter for parquet

> A bigger issue is windows support for parquet-rs, Chao – I don’t believe
> that parquet-rs supports windows, right? When I test it myself I get
> errors regarding clang and libclang which parquet-rs must use.

I think there are some issues regarding clang working with zstd on 32-bit
platforms. However, it was able to compile with target x86_64-pc-windows-msvc,
which seems to be the only one that arrow is using for Windows CI. So I
think we can move forward and address the 32-bit platform issue later.

On Wed, Nov 21, 2018 at 10:18 AM paddy horan  wrote:

> I think using nightly is fine for the reasons mentioned already. We
> should switch our CI to still run CI against stable (non fatal) so we know
> when we can make the move back to stable.
>
>
>
> A bigger issue is windows support for parquet-rs, Chao – I don’t believe
> that parquet-rs supports windows, right? When I test it myself I get
> errors regarding clang and libclang which parquet-rs must use.
>
>
>
> We have had windows support pretty much since the beginning for Rust. Is
> it possible to put parquet support behind a feature gate initially and only
> run CI for non-windows? I would be willing to help get windows support
> working after the fact, although I know very little about parquet right now.
>
>
>
> Are there other strategies for dealing with this?
>
>
>
>
>
>
> 
> From: Chao Sun 
> Sent: Wednesday, November 21, 2018 12:52:32 PM
> To: Wes McKinney
> Cc: Andy Grove; dev@arrow.apache.org; Ivan Sadikov; Parquet Dev
> Subject: Re: [DISCUSS] Rust add adapter for parquet
>
> > Can you remind us all why nightly is required?
>
> Here's a tracking issue <https://github.com/sunchao/parquet-rs/issues/119>
> for all the unstable features parquet-rs uses. I'm personally inclined to
> use nightly since some new features such as specialization makes
> development much easier. Like Andy mentioned, as we are still developing
> arrow + parquet I think we can stay with nightly and transition back to
> stable once major features are implemented and more and more people start
> to use it (hopefully by that time the unstable features are stabilized).
> Moreover, I've seen quite a few popular projects rely on nightly such as
> rocket, tikv, etc., so seems it is not uncommon in the Rust world.
>
> > The steps from here are for you all to get the codebase into a state
> that is ready for donation, including ASF license headers, etc. A pull
> request into apache/arrow would be the best thing
>
> Sure. I'll prepare a pull request in the next few days, and then we can
> proceed to the voting, ICLA, etc. Thanks.
>
> Chao
>
>
> On Wed, Nov 21, 2018 at 7:55 AM Wes McKinney  wrote:
>
> > The steps from here are for you all to get the codebase into a state
> > that is ready for donation, including ASF license headers, etc. A pull
> > request into apache/arrow would be the best thing
> >
> > Then we have to do the following
> >
> > * Vote on the Arrow mailing list
> > * Receive ICLAs from contributors
> > * Complete IP clearance
> > * Merge codebase
> >
> > Let me know when you are ready to move forward. From start to finish
> > that can get done in approximately 6 days if the code is ready
> >
> > Thanks
> > On Wed, Nov 21, 2018 at 9:45 AM Andy Grove 
> wrote:
> > >
> > > Renjie,
> > >
> > > Can you remind us all why nightly is required?
> > >
> > > My personal feeling is that stable is a nice-to-have, but Rust is still
> > moving fast and we are on the bleeding edge here so I'm OK with Arrow
> > relying on nightly for now. Maybe we can have a plan to transition back to
> > stable for a future release if we go with nightly now.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > >
> > > On Wed, Nov 21, 2018 at 7:37 AM Renjie Liu 
> > wrote:
>

RE: [DISCUSS] Rust add adapter for parquet

2018-11-21 Thread paddy horan
I think using nightly is fine for the reasons mentioned already.  We should
configure our CI to also run against stable (non-fatal) so we know when we can
make the move back to stable.



A bigger issue is Windows support for parquet-rs. Chao – I don’t believe that
parquet-rs supports Windows, right?  When I test it myself I get errors
regarding clang and libclang, which parquet-rs must use.



We have had Windows support pretty much since the beginning for Rust.  Is it
possible to put parquet support behind a feature gate initially and only run CI
for it on non-Windows platforms?  I would be willing to help get Windows support
working after the fact, although I know very little about parquet right now.



Are there other strategies for dealing with this?
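One way to express the feature-gate idea in code is with Rust's `cfg` attributes. This is a minimal sketch assuming a hypothetical Cargo feature named `parquet` (it would have to be declared under `[features]` in Cargo.toml); `parquet_adapter` and the helper functions are illustrative only.

```rust
// Compiled only when the crate is built with `--features parquet`
// (hypothetical feature name).
#[cfg(feature = "parquet")]
pub mod parquet_adapter {
    /// Placeholder for the real parquet integration (hypothetical).
    pub fn describe() -> &'static str {
        "parquet support compiled in"
    }
}

/// Reports whether the hypothetical "parquet" feature was enabled at build time.
pub fn parquet_support_enabled() -> bool {
    cfg!(feature = "parquet")
}

/// Whether parquet tests should run on this target: only when the feature is
/// on, and only on non-Windows targets until Windows support is sorted out.
pub fn run_parquet_tests() -> bool {
    cfg!(all(feature = "parquet", not(target_os = "windows")))
}

fn main() {
    println!("parquet enabled: {}", parquet_support_enabled());
    println!("run parquet tests: {}", run_parquet_tests());
}
```

Builds without the feature never compile the gated module, so Windows builds of the rest of the crate are unaffected by the clang/libclang issue.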







From: Chao Sun 
Sent: Wednesday, November 21, 2018 12:52:32 PM
To: Wes McKinney
Cc: Andy Grove; dev@arrow.apache.org; Ivan Sadikov; Parquet Dev
Subject: Re: [DISCUSS] Rust add adapter for parquet

> Can you remind us all why nightly is required?

Here's a tracking issue <https://github.com/sunchao/parquet-rs/issues/119>
for all the unstable features parquet-rs uses. I'm personally inclined to
use nightly since some new features such as specialization make
development much easier. Like Andy mentioned, as we are still developing
arrow + parquet I think we can stay with nightly and transition back to
stable once major features are implemented and more and more people start
to use it (hopefully by that time the unstable features are stabilized).
Moreover, I've seen quite a few popular projects rely on nightly such as
rocket, tikv, etc., so it does not seem uncommon in the Rust world.

> The steps from here are for you all to get the codebase into a state
> that is ready for donation, including ASF license headers, etc. A pull
> request into apache/arrow would be the best thing

Sure. I'll prepare a pull request in the next few days, and then we can
proceed to the voting, ICLA, etc. Thanks.

Chao


On Wed, Nov 21, 2018 at 7:55 AM Wes McKinney  wrote:

> The steps from here are for you all to get the codebase into a state
> that is ready for donation, including ASF license headers, etc. A pull
> request into apache/arrow would be the best thing
>
> Then we have to do the following
>
> * Vote on the Arrow mailing list
> * Receive ICLAs from contributors
> * Complete IP clearance
> * Merge codebase
>
> Let me know when you are ready to move forward. From start to finish
> that can get done in approximately 6 days if the code is ready
>
> Thanks
> On Wed, Nov 21, 2018 at 9:45 AM Andy Grove  wrote:
> >
> > Renjie,
> >
> > Can you remind us all why nightly is required?
> >
> > My personal feeling is that stable is a nice-to-have, but Rust is still
> moving fast and we are on the bleeding edge here so I'm OK with Arrow
> relying on nightly for now. Maybe we can have a plan to transition back to
> stable for a future release if we go with nightly now.
> >
> > Thanks,
> >
> > Andy.
> >
> >
> > On Wed, Nov 21, 2018 at 7:37 AM Renjie Liu wrote:
> >>
> >> That sounds great. But parquet-rs currently relies on nightly rust, that
> >> would be the first problem to resolve.
> >>
> >> On Wed, Nov 21, 2018 at 4:49 AM Andy Grove wrote:
> >>
> >> > This sounds like a great idea.
> >> >
> >> > With support for both CSV and Parquet in the Arrow crate, it would be nice
> >> > to design a standard interface for Arrow data sources. Maybe this is as
> >> > simple as implementing `Iterator`.
> >> >
> >> > Andy.
> >> >
> >> > On Tue, Nov 20, 2018 at 11:46 AM Chao Sun  wrote:
> >> >
> >> > > Yes, we'd be interested to move forward. I'm inclined to merge this into
> >> > > Arrow because of the issues that you pointed out with parquet c++ merge,
> >> > > and I do see a tight relationship between the two projects, and potential
> >> > > sharing of common libraries. @Ivan Sadikov what
> >> > > do you think?
> >> > >
> >> > > Chao
> >> > >
> >> > > On Tue, Nov 20, 2018 at 10:23 AM Wes McKinney wrote:
> >> > >
> >> > >> hi folks,
> >> > >>
> >> > >> Would you all be interested in moving forward the parquet-rs project?
> >> > >> I have a little more bandwidth to help with the code donation in the
> >> > >> next month or two.
> >> > >>
> >> > >> I know we voted on the Parquet mailing list about the donation
> >> > >> already. One big question is whether you want to create an
> >> > >> apache/parquet-rs repository or whether you want to co-develop
> >> > >> parquet-rs together with Arrow in Rust, similar to what we are doing
> >> > >> with C++. It's possible you might run into the same kinds of issues
> >> > >> that led us to consider the monorepo arrangement.
> >> > >>
> >> > >> Thanks
> >> > >> Wes
> >> > >> On Sun, Aug 19, 2018 at 11:11 PM Renjie Liu <liurenjie2...@gmail.com>
> >> > >> wrote:
> >> > >> >
> >> > >> > Hi, Chao:
> >> > >> > I've opened an 
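Andy's suggestion in the thread above, that a standard interface for Arrow data sources might be as simple as implementing `Iterator`, can be sketched as follows. `RecordBatch` here is a hypothetical single-column stand-in rather than the Arrow crate's type, and `CsvLikeSource` parses one integer per line instead of real CSV.

```rust
use std::num::ParseIntError;

/// Hypothetical, simplified stand-in for a record batch: one Int64 column.
#[derive(Debug, PartialEq)]
struct RecordBatch {
    column: Vec<i64>,
}

/// A CSV-like source yielding batches of at most `batch_size` rows.
struct CsvLikeSource<'a> {
    lines: std::str::Lines<'a>,
    batch_size: usize,
}

impl<'a> Iterator for CsvLikeSource<'a> {
    // Each batch carries its own read/parse error, if any.
    type Item = Result<RecordBatch, ParseIntError>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut column = Vec::with_capacity(self.batch_size);
        for line in self.lines.by_ref().take(self.batch_size) {
            match line.trim().parse::<i64>() {
                Ok(v) => column.push(v),
                Err(e) => return Some(Err(e)),
            }
        }
        if column.is_empty() {
            None
        } else {
            Some(Ok(RecordBatch { column }))
        }
    }
}

/// A consumer written only against the Iterator interface.
fn total_rows<I>(source: I) -> Result<usize, ParseIntError>
where
    I: Iterator<Item = Result<RecordBatch, ParseIntError>>,
{
    let mut rows = 0;
    for batch in source {
        rows += batch?.column.len();
    }
    Ok(rows)
}

fn main() {
    let source = CsvLikeSource { lines: "1\n2\n3\n4\n5".lines(), batch_size: 2 };
    println!("{:?}", total_rows(source)); // Ok(5)
}
```

Because `total_rows` depends only on the `Iterator` bound, a Parquet-backed source yielding the same item type could plug into it unchanged.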

Re: Rust bindings for Gandiva

2019-01-04 Thread paddy horan
Hey Andy,

I am very interested in this, I’m also looking into adding explicit SIMD to our 
existing “array_ops”.

Maybe we can plan out what is needed on the developer wiki so that we can all 
help out where we are able.

I’ve seen it mentioned here and there, but what is the current state of Gandiva 
on Windows?  I’m willing to help where I can, but I’m not very experienced with 
C++/CMake.

P

Get Outlook for iOS

From: Andy Grove 
Sent: Friday, January 4, 2019 9:39 AM
To: dev@arrow.apache.org
Subject: Rust bindings for Gandiva

Now that the Rust implementation of Arrow is maturing, I'm interested in
having bindings for Gandiva for query execution, rather than duplicating
this in Rust.

I will likely start looking at this soon but wanted to see if anyone else
here is particularly interested in this area of functionality?

Thanks,

Andy.


RE: [Rust] move parquet into a separate sub-crate

2018-12-28 Thread paddy horan
This seems reasonable.  The flexibility with CI is a positive for sure.

> 1. Cargo doesn't allow cyclic dependency. So if the parquet sub-crate
> depends on arrow, we can't reference parquet in arrow.

This is my only concern, the Rust implementation is evolving rapidly and 
adopting workspaces may reduce our flexibility, I can’t think of a specific 
situation right now but might this be a problem in the future?

If we have this restriction we have to agree on which way dependencies flow, 
what is your preference?

I have not used workspaces in anger but it seems it is designed for crates to 
flow up?  i.e. that arrow would be allowed to reference the parquet sub-crate.

P

From: Renjie Liu
Sent: Thursday, December 27, 2018 10:09 PM
To: dev@arrow.apache.org
Subject: Re: [Rust] move parquet into a separate sub-crate

Cool. It may also be worth putting the adapters into a separate crate.

On Fri, Dec 28, 2018 at 4:10 AM Chao Sun  wrote:

> Hi,
>
> It just occurred to me that it may be a better idea to move the parquet
> module into a separate sub-crate by using cargo workspaces. The
> advantage is that we can make the project more modular (in future, we may
> want to add more sub-crates such as arrow/parquet_derive, orc, gandiva,
> etc), and allow us to run CI jobs separately on each crate.
>
> Some small caveats:
> 1. Cargo doesn't allow cyclic dependency. So if the parquet sub-crate
> depends on arrow, we can't reference parquet in arrow. This doesn't seem
> like an issue though since arrow itself should be physical on-disk format
> independent. I also didn't see any reference on parquet in cpp/src/arrow.
> 2. The path dependency used in workspace has to be changed to a version
> number when we do "cargo publish". This should be added to the release
> instructions and committer who performs the job should do the extra step.
>
> Thoughts?
>
> Chao
>


--
Renjie Liu
Software Engineer, MVAD



Re: [ANNOUNCE] New Arrow committer: Chao Sun

2019-02-28 Thread paddy horan
Congrats Chao!

From: Uwe L. Korn 
Sent: Thursday, February 28, 2019 5:29 AM
To: dev@arrow.apache.org
Subject: [ANNOUNCE] New Arrow committer: Chao Sun

On behalf of the Arrow PMC, I'm happy to announce that Chao has
accepted an invitation to become a committer on Apache Arrow.

Welcome, and thank you for your contributions!


Re: [Rust] [DataFusion] Preferences on futures / threading crates?

2019-03-04 Thread paddy horan
No opposition here.

P


From: Andy Grove 
Sent: Sunday, March 3, 2019 11:55 PM
To: dev@arrow.apache.org
Subject: [Rust] [DataFusion] Preferences on futures / threading crates?

I have been working on a PoC of parallel query execution and it is working
well, and I am now starting to create PRs for the various refactors
necessary for this in DataFusion.

I haven't been following the async/await and futures/tokio developments
lately but for the PoC I used tokio-threadpool which seems simple to use.

I just wanted to give everyone a chance to give their thoughts on this
before I get too far with my batch of PRs. Is anyone opposed to using
tokio-threadpool?

Thanks,

Andy.
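Andy's PoC used tokio-threadpool. To keep the example free of external crates, the sketch below swaps in the standard library's `std::thread::scope` (stable since Rust 1.63) instead. It shows the general shape of partition-parallel execution: each partition is processed on its own thread, and the partial results are then merged. `parallel_sum` is a hypothetical name, and this is not DataFusion's implementation.

```rust
use std::thread;

/// Sum each partition on its own scoped thread, then merge the partial sums.
pub fn parallel_sum(partitions: &[Vec<i64>]) -> i64 {
    thread::scope(|s| {
        // One worker per partition; scoped threads may borrow `partitions`.
        let handles: Vec<_> = partitions
            .iter()
            .map(|p| s.spawn(move || p.iter().sum::<i64>()))
            .collect();
        // Merge step: wait for every worker and combine the partial results.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let partitions = vec![vec![1, 2, 3], vec![4, 5], vec![6]];
    println!("{}", parallel_sum(&partitions)); // prints 21
}
```

A thread pool such as tokio-threadpool would avoid spawning one OS thread per partition. The split/execute/merge structure stays the same.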


Assignee on Jira

2019-03-09 Thread paddy horan
Hi All,

Quick question.  I have merged two PRs in the last week (ARROW-2409 and 
ARROW-4791).  In both cases the issue was left unassigned in JIRA.  I think 
this should be updated for metrics, etc.?

When I go back and try to update it manually, I can't seem to find the users 
who contributed the PRs: I can find their usernames in JIRA, but they don't 
appear when I update the "assignee" field.

For example, for ARROW-2409 Owen Wilson submitted the patch, his user name is 
"theomn" but when I enter this in the assignee field it says "User 'theomn' 
cannot be assigned issues."

In both cases I believe that the contributors are new contributors, is there a 
change in permissions on their accounts that is needed to allow me to assign 
the issues?

Thanks,
Paddy

p.s. the other contributor was Yu Ding, username "dingelish"


Re: Assignee on Jira

2019-03-09 Thread paddy horan
Correction, the contributor's name was Owen Nelson.  I'd be surprised if Owen 
Wilson was contributing to Arrow...

P


Re: Assignee on Jira

2019-03-10 Thread paddy horan
Thanks Kou, appreciate it.

P


From: Kouhei Sutou 
Sent: Sunday, March 10, 2019 12:59 AM
To: dev@arrow.apache.org
Subject: Re: Assignee on Jira

Hi,

Yes. We need to add the user to the "contributor" role in
JIRA to assign issues to the user. Adding a user to the
"contributor" role requires the "administrators" role, which
PMC members have.

I've added Owen Nelson and Yu Ding to the "contributor" role
and assigned them to each issue.


Thanks,
--
kou

In 
"Re: Assignee on Jira" on Sat, 9 Mar 2019 20:02:53 -0800,
Micah Kornfield  wrote:

> I don't know the details, but you might need to make them a "contributor"
> in JIRA. It has been mentioned a few times on the mailing list in the past.


Re: [Rust] Rust 0.13.0 release

2019-02-12 Thread paddy horan
Hi All,

The focus for me for 0.13.0 is SIMD.  I would like to port all the "ops" in 
"array_ops" to the new "compute" module and leverage SIMD for them all.  I have 
most of this done in various forks.
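Here is a rough illustration of what an explicit-SIMD kernel in the proposed `compute` module might look like: element-wise `f32` addition. It assumes Rust's stable `std::arch::x86_64` SSE intrinsics (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`). SSE/SSE2 are part of the x86_64 baseline, so no runtime feature detection is needed. A scalar loop handles the tail and all other architectures. `add_f32` is a hypothetical name, not the arrow crate's implementation.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

/// Element-wise `out[i] = left[i] + right[i]`.
/// Panics if the three slices differ in length.
pub fn add_f32(left: &[f32], right: &[f32], out: &mut [f32]) {
    assert_eq!(left.len(), right.len());
    assert_eq!(left.len(), out.len());
    let len = left.len();
    let mut i = 0;

    #[cfg(target_arch = "x86_64")]
    {
        // Process 4 f32 lanes (128 bits) per iteration.
        while i + 4 <= len {
            // SAFETY: SSE is always available on x86_64, and the loop condition
            // guarantees 4 readable/writable elements starting at `i`.
            unsafe {
                let a = _mm_loadu_ps(left.as_ptr().add(i));
                let b = _mm_loadu_ps(right.as_ptr().add(i));
                _mm_storeu_ps(out.as_mut_ptr().add(i), _mm_add_ps(a, b));
            }
            i += 4;
        }
    }

    // Scalar remainder (the whole input on non-x86_64 targets).
    while i < len {
        out[i] = left[i] + right[i];
        i += 1;
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0, 4.0, 5.0];
    let b = [10.0f32, 20.0, 30.0, 40.0, 50.0];
    let mut out = [0.0f32; 5];
    add_f32(&a, &b, &mut out);
    println!("{:?}", out); // prints [11.0, 22.0, 33.0, 44.0, 55.0]
}
```

Unaligned loads (`loadu`/`storeu`) keep the sketch correct for arbitrary slices. Arrow's 64-byte-aligned buffers would allow aligned variants.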

Past 0.13.0 I would really like to work toward getting Rust running in the 
integration tests.  The thing I am most excited about regarding Arrow is the 
concept of defining computational libraries in say Rust and being able to use 
them from any implementation, pyarrow probably for me.  This all starts and 
ends with the integration tests.

Also, Gandiva is fascinating; I would love to have robust support for it in 
Rust (via bindings)...

Regards,
P



From: Neville Dipale 
Sent: Tuesday, February 12, 2019 11:33 AM
To: dev@arrow.apache.org
Subject: Re: [Rust] Rust 0.13.0 release

Thanks for bringing this up Andy.

I'm unemployed/on recovery leave, so I've had some surplus time to work on
Rust.

There are a lot of features that I've wanted to work on, some of which I've
spent time attempting but struggled with. A few of them block additional
work that I could contribute.

In 0.13.0 and the release thereafter, I'd like to see:

Date/time support. I've spent a lot of time trying to implement this, but I
get the feeling that my Rust isn't good enough yet to pull this together.
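For context on what date/time support involves: Arrow's `Date32` type stores a date as the number of days since the UNIX epoch (1970-01-01). Converting that integer to a calendar date needs no dependencies. The sketch below follows Howard Hinnant's well-known `civil_from_days` algorithm. The function is illustrative only and not part of the arrow crate.

```rust
/// Convert a day count since 1970-01-01 (the meaning of Arrow's `Date32`) into
/// (year, month, day) in the proleptic Gregorian calendar.
/// Follows Howard Hinnant's `civil_from_days` algorithm.
pub fn civil_from_days(days: i32) -> (i64, u32, u32) {
    // Shift the epoch to 0000-03-01 so the leap day falls at the end of a year.
    let z = days as i64 + 719_468;
    let era = z.div_euclid(146_097); // 400-year eras; floor division handles negatives
    let doe = z - era * 146_097; // day of era, in [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year, [0, 365]
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = doy - (153 * mp + 2) / 5 + 1; // [1, 31]
    let month = if mp < 10 { mp + 3 } else { mp - 9 }; // [1, 12]
    // January and February belong to the next civil year in the March-based count.
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month as u32, day as u32)
}

fn main() {
    println!("{:?}", civil_from_days(0)); // prints (1970, 1, 1)
}
```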

More IO support.
I'm working on JSON reader, and want to work on JSON and CSV (continuing
where you left off) writers after this.
With date/time support, I can also work on date/time parsing so we can have
these in CSV and JSON.
Parquet support isn't on my radar at the moment. JSON and CSV are more
commonly used, so I'm hoping that with concrete support for these, more
people using Rust can choose to integrate Arrow. That could bring us more
hands to help.

Array slicing (https://issues.apache.org/jira/browse/ARROW-3954). I tried
working on it but failed. Related to this would be array chunking.
I need these in order to be able to operate on "Tables" like CPP, Python
and others. I've got ChunkedArray, Column and Table roughly implemented in
my fork, but without zero-copy slicing, I can't upstream them.
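Zero-copy slicing (ARROW-3954) generally means that a slice shares its parent's buffer and records only an offset and a length. Below is a minimal sketch of that idea. It assumes a hypothetical `Int64Array` backed by an `Arc<Vec<i64>>` rather than the arrow crate's actual buffer types, and it ignores the validity bitmap.

```rust
use std::sync::Arc;

/// Hypothetical primitive array: a shared buffer plus a logical window on it.
#[derive(Debug, Clone)]
pub struct Int64Array {
    buffer: Arc<Vec<i64>>, // shared; never copied by `slice`
    offset: usize,
    len: usize,
}

impl Int64Array {
    pub fn from_vec(values: Vec<i64>) -> Self {
        let len = values.len();
        Int64Array { buffer: Arc::new(values), offset: 0, len }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Zero-copy slice: clones the `Arc` (a reference-count bump), not the data.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int64Array {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset, // offsets compose for nested slices
            len,
        }
    }

    /// The values visible through this array's window.
    pub fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}

fn main() {
    let array = Int64Array::from_vec(vec![10, 20, 30, 40, 50]);
    let sliced = array.slice(1, 3);
    let nested = sliced.slice(1, 2);
    println!("{:?} {:?}", sliced.values(), nested.values()); // prints [20, 30, 40] [30, 40]
    // All three arrays point at the same allocation.
    assert!(Arc::ptr_eq(&array.buffer, &nested.buffer));
}
```

Chunked arrays can then be represented as a list of such windows over one or more shared buffers.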

I've made good progress on scalar and array operations. I have trig
functions, some string operators and other functions that one can run on a
Spark-esque dataframe.
These will fit in well with DataFusion's SQL operations, but from a
decision-making perspective, I think it would help if we put our heads
together and think about the direction we want to take on compute.
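Scalar functions such as the trig functions mentioned above are commonly written as a unary kernel. The kernel maps a function over the non-null slots and propagates nulls. The sketch below is hypothetical: it uses `Vec<Option<f64>>` as a stand-in for a nullable array, whereas real Arrow arrays keep a separate validity bitmap.

```rust
/// Apply `op` to every non-null value, leaving nulls as nulls.
pub fn unary<F>(input: &[Option<f64>], op: F) -> Vec<Option<f64>>
where
    F: Fn(f64) -> f64,
{
    input.iter().map(|v| v.map(&op)).collect()
}

fn main() {
    let angles = vec![Some(0.0), None, Some(std::f64::consts::FRAC_PI_2)];
    // Any `fn(f64) -> f64` works as the operation, e.g. `f64::sin`.
    let sines = unary(&angles, f64::sin);
    println!("{:?}", sines);
}
```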

SIMD is great, and when Paddy's hashed out how it works, more of us will be
able to contribute SIMD compatible compute operators.

Thanks,
Neville

On Tue, 12 Feb 2019 at 18:12, Andy Grove  wrote:

> I was curious what our Rust committers and contributors are excited about
> for 0.13.0.
>
> The feature I would most like to see is the ability for DataFusion to run
> SQL against Parquet files again, as that would give me an excuse for a PoC
> in my day job using Arrow.
>
> I know there were some efforts underway to build arrow array readers for
> Parquet and it would make sense for me to help there.
>
> I would also like to start building out some benchmarks.
>
> I think the SIMD work is exciting too.
>
> I'd like to hear thoughts from everyone else though since we're all coming
> at this from different perspectives.
>
> Thanks,
>
> Andy.
>


Re: [Rust] code style: restrict line width to 90 characters?

2019-01-25 Thread paddy horan
+1 from me


From: Renjie Liu 
Sent: Friday, January 25, 2019 7:49 PM
To: dev@arrow.apache.org
Subject: Re: [Rust] code style: restrict line width to 90 characters?

+1 for this suggestion.

On Sat, Jan 26, 2019 at 2:39 AM Chao Sun wrote:

> Hi Neville, there's no limit today: you'll need to add
>
> max_width = 90
> comment_width = 90
>
> to rustfmt.toml to limit both source code and comment to 90 characters.
>
> Chao
>
> On Fri, Jan 25, 2019 at 10:34 AM Neville Dipale wrote:
>
> > Hi Chao,
> >
> > What's the current limit? I just ran rustfmt, and it seems like it's not
> > reformatting at 100 characters. I support changing whatever the current
> > width is to 90 characters.
> >
> > Regards
> > Neville
> >
> > On Fri, 25 Jan 2019 at 19:49, Chao Sun wrote:
> >
> > > Hi Rust developers,
> > >
> > > Just want to know if anyone likes the idea of restricting the line width to 90
> > > characters for Rust, similar to the C++ coding style. Personally I found it
> > > helpful when you need to keep multiple windows on a monitor. This can
> > > easily be enforced via rustfmt. If there's no objection, I can open a JIRA
> > > for this and apply the change to the existing codebase.
> > >
> > > Thanks,
> > > Chao
> > >
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Andy Grove

2019-02-04 Thread paddy horan
Congrats Andy


From: Wes McKinney 
Sent: Monday, February 4, 2019 10:39 AM
To: dev@arrow.apache.org
Subject: [ANNOUNCE] New Arrow PMC member: Andy Grove

The Project Management Committee (PMC) for Apache Arrow has invited
Andy Grove to become a PMC member and we are pleased to announce that
Andy has accepted.

Congratulations and welcome!


Re: [ANNOUNCE] New Arrow committer: Neville Dipale

2019-05-11 Thread paddy horan
Congrats Neville!  Thank you for your contributions!

From: Andy Grove 
Sent: Saturday, May 11, 2019 11:23 AM
To: dev@arrow.apache.org
Subject: [ANNOUNCE] New Arrow committer: Neville Dipale

On behalf of the Arrow PMC, I'm happy to announce that Neville has
accepted an invitation to become a committer on Apache Arrow.

Welcome, and thank you for your contributions!


Re: [ANNOUNCE] New Arrow PMC member: Sebastien Binet

2019-08-14 Thread paddy horan
Congrats Sebastien!

From: Wes McKinney 
Sent: Tuesday, August 13, 2019 4:54 PM
To: dev@arrow.apache.org
Subject: [ANNOUNCE] New Arrow PMC member: Sebastien Binet

The Project Management Committee (PMC) for Apache Arrow has invited
Sebastien Binet to become a PMC member and we are pleased to announce
that Sebastien has accepted.

Congratulations and welcome!


Re: [ANNOUNCE] New Arrow committer: David M Li

2019-08-31 Thread paddy horan
Congrats David

From: Ryan Murray 
Sent: Saturday, August 31, 2019 4:14:08 AM
To: dev@arrow.apache.org ; emkornfi...@gmail.com 

Subject: Re: [ANNOUNCE] New Arrow committer: David M Li

Congratulations David!

On Sat, 31 Aug 2019, 03:56 Micah Kornfield,  wrote:

> Congrats David, well deserved.
>
> On Fri, Aug 30, 2019 at 2:02 PM Bryan Cutler  wrote:
>
> > Congrats David!
> >
> > On Fri, Aug 30, 2019 at 10:19 AM Antoine Pitrou 
> > wrote:
> >
> > >
> > > Congratulations David and welcome to the team  :-)
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 30/08/2019 à 18:21, Wes McKinney a écrit :
> > > > On behalf of the Arrow PMC I'm happy to announce that David has
> > > > accepted an invitation to become an Arrow committer!
> > > >
> > > > Welcome, and thank you for your contributions!
> > > >
> > >
> >
>


Assigning Issues to New Users

2019-08-23 Thread paddy horan
Hi All,

I was going to merge a PR last night when I noticed that its JIRA issue was 
still unassigned.  I believe it is best practice to make sure all issues are 
assigned on JIRA before merging the corresponding PR?

However, I cannot assign the issue to the user, I believe that I need to change 
his permissions but I don't seem to be able to do this.

In short:
 - is it important to ensure that all issues are assigned on JIRA before 
merging the corresponding PR?
 - can someone with the correct permissions change the settings of the user 
below so that issues can be assigned please?

User: https://issues.apache.org/jira/secure/ViewProfile.jspa?name=andyscho

Thanks,
Paddy


Re: Assigning Issues to New Users

2019-08-23 Thread paddy horan
Thanks Wes,

I think it’s fine the way it is; I just wasn’t sure.  If it becomes a 
distraction for PMC members, we can change the policy so that committers can 
help, but it’s fairly infrequent so far.

Paddy

From: Wes McKinney 
Sent: Friday, August 23, 2019 12:11:46 PM
To: dev@arrow.apache.org 
Subject: Re: Assigning Issues to New Users

hi Paddy,

I just added andyscho to the "Contributor" role on JIRA so you can
assign them the issue now.

You need to be a JIRA administrator on the "Arrow" project to alter
roles -- currently only PMC members are admins. I am not opposed to
letting all committers be Admin on JIRA, but we have never formally
discussed it.

- Wes


New Users on JIRA

2019-09-05 Thread paddy horan
Hi All,

I have the same issue again where there is a new user (hengruo) that needs 
permissions changed so I can assign an issue.  I know that this was discussed 
recently which leads me to another question.

How do others find previous conversations in the mailing list archives?  I find 
it pretty tedious to navigate the archive when looking for specific threads.  
Do others keep the mail in their e-mail clients for future searching or is 
there some search functionality or tool I am missing?

Thanks,
Paddy


Re: New Users on JIRA

2019-09-05 Thread paddy horan
Thanks on both counts Wes!

From: Wes McKinney 
Sent: Thursday, September 5, 2019 10:52 PM
To: dev 
Subject: Re: New Users on JIRA

hi Paddy,

I keep all the e-mail in Gmail, it's easy to search there.

The Pony Mail interface works well too

https://lists.apache.org/list.html?dev@arrow.apache.org

To assign issues to new users

* Navigate to "JIRA Administration > Projects" in the top right
* Click on "Apache Arrow"
* Click "Users and Roles" on the left
* Click "Add users to role"
* Type the user's full name or username
* Make sure to select "Contributor"
* Click Add

I just took care of this one.

- Wes


[Rust] Long compile times causing CI to fail

2019-09-07 Thread paddy horan
Hi All,

We have recently had a lot of CI builds fail for Rust due to long compile 
times, this was first pointed out by Francois on the following PR:
https://github.com/apache/arrow/pull/5303

However, it seems unrelated to this change as the following PR's are failing 
for the same reason also:
https://github.com/apache/arrow/pull/5310
https://github.com/apache/arrow/pull/5309

I'm not sure what has changed (4 days ago I posted a PR that did not have this 
issue, https://github.com/apache/arrow/pull/5269), but this increase might be 
due to updates to the nightly compiler; I will try to find out.  Any other 
ideas?

Paddy






Options for running the integration tests

2019-08-07 Thread paddy horan
Hi All,

I have been away from Arrow for a while due to relocation of family and RSI.  
I'd like to start working toward getting Rust passing the integration tests.  
In the last few months a lot of work has been done to "dockerize" many of the 
build steps in the project, which I'm trying to figure out.

I started out using the 'arrow_integration_xenial_base' image and submitted a 
PR to allow it to be built from a windows host, but I noticed that there is a 
page in the pyarrow docs related to integration testing 
(https://arrow.apache.org/docs/developers/integration.html) that uses 
docker-compose from the top level of the project.  It seems that the 
'arrow_integration_xenial_base' image is replaced by this solution?

Is there a way to run the integration tests (integration_test.py) in a 
reproducible way via docker at this time?  If not I plan to add the 
dependencies for go and java etc to 'arrow_integration_xenial_base' so that I 
can run integration_test.py in a docker container.

Thanks,
Paddy




Re: Options for running the integration tests

2019-08-08 Thread paddy horan
Thanks Antoine,

> Personally I run C++ / Java integration tests locally, without any Docker 
> image. But I wouldn't be able to run the other integration tests...

Right, this is where I started, but I figured it's better to use Docker as I'm 
not too familiar with the other toolchains and the number of supported 
languages is expanding all the time.  I'm thinking Krisztian is planning to 
solve this with "ursabot"; I just wanted to make sure I wasn't missing 
anything.  I'll plug away with the "arrow_integration_xenial_base" image for now.

Paddy


From: Antoine Pitrou 
Sent: Thursday, August 8, 2019 10:50 AM
To: dev@arrow.apache.org 
Subject: Re: Options for running the integration tests

On Wed, 7 Aug 2019 20:29:13 +
paddy horan  wrote:

> Hi All,
>
> I have been away from Arrow for a while due to relocation of family and RSI.  
> I'd like to start working toward getting Rust passing the integration tests.  
> In the last few months a lot of work has been done to "dockerize" many of the 
> build steps in the project, which I'm trying to figure out.
>
> I started out using the 'arrow_integration_xenial_base' image and submitted a 
> PR to allow it to be built from a windows host, but I noticed that there is a 
> page in the pyarrow docs related to integration testing 
> (https://arrow.apache.org/docs/developers/integration.html) that uses 
> docker-compose from the top level of the project.

That documentation page may be confusing things.  It's entitled
"integration testing" but it doesn't seem to talk about integration
tests in the Arrow sense, rather regular unit tests.

> It seems that the 'arrow_integration_xenial_base' image is replaced
> by this solution?

I have no idea.  Perhaps Krisztian knows the answer?
Personally I run C++ / Java integration tests locally, without any
Docker image. But I wouldn't be able to run the other integration
tests...

Regards

Antoine.




Re: Options for running the integration tests

2019-08-08 Thread paddy horan
Thanks Krisztián,

I’ll take a look at setting it up.

P

From: Krisztián Szűcs 
Sent: Thursday, August 8, 2019 6:55 PM
To: dev@arrow.apache.org
Subject: Re: Options for running the integration tests

We indeed don't have a docker-compose image for the
"format integration" tests. We can either set it up with
ursabot or the docker-compose, the easiest solution
would be to have the python image as the base image
and install the other language backends, java, rust etc.
then simply run the "format integration" suite.

Because ursabot adoption is under discussion, setting
it up with docker-compose would require a docker
image like:

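```dockerfile
FROM python:3.6

# Java backend (assumed: Debian packages available in the python:3.6 base image)
RUN apt-get update && apt-get install -y default-jdk maven
# Rust backend via rustup (nightly toolchain assumed, as used by the Rust CI at the time)
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain nightly
ENV PATH="/root/.cargo/bin:${PATH}"
# ... other language backends ...

CMD python arrow/integration/integration_test.py
```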

and a corresponding entry in the docker-compose.yml.


Re: [ANNOUNCE] New Arrow PMC member: Micah Kornfield

2019-08-09 Thread paddy horan
Congrats Micah!

From: Wes McKinney 
Sent: Friday, August 9, 2019 11:12 AM
To: dev@arrow.apache.org
Subject: [ANNOUNCE] New Arrow PMC member: Micah Kornfield

The Project Management Committee (PMC) for Apache Arrow has invited
Micah Kornfield to become a PMC member and we are pleased to announce
that Micah has accepted.

Congratulations and welcome!


Re: [Help Needed] Arrow IPC Reader in Rust

2019-11-16 Thread paddy horan
Hey Neville,

I'll take a look if no-one beats me to it (I might not have time today or 
tomorrow).

P


From: Neville Dipale 
Sent: Saturday, November 16, 2019 1:42 AM
To: dev@arrow.apache.org 
Subject: [Help Needed] Arrow IPC Reader in Rust

Hi Arrow developers,

I'm "done" with the Arrow IPC Reader in Rust (for supported data types),
but am having issues with reading some of the test data.
Specifically, I've noticed that when reading the integration test data
(primitve_generated), where I expect an array with 17 values, the arrow
array contains 20 values.

To illustrate what's happening, I've added some debug statements to the
unit test, and the behaviour can be seen at (
https://ci.ursalabs.org/#/builders/93/builds/1550/steps/3/logs/stdio).
In the logs, there are a number of arrays which have a length of 17, but
have 20 printed values. 3 of those values are duplicated.

It's been hard trying to inspect the binary data to verify if there's an
issue with them, and I'm able to correctly read 17 values with Python, so I
suspect it has to be a Rust issue.
Would anyone have some time to look into this with me?

Thanks
Neville


Re: [Help Needed] Arrow IPC Reader in Rust

2019-11-18 Thread paddy horan
I should have mentioned that I pushed the fix to your branch.

P

From: paddy horan 
Sent: Monday, November 18, 2019 3:04 PM
To: dev@arrow.apache.org 
Subject: Re: [Help Needed] Arrow IPC Reader in Rust

Hey Neville,

I had a chance to look at this.  The debugging output is a separate, but 
misleading, issue.  The real cause is the precision of the 32-bit 
floating-point values.  The JSON data has 3 decimal places, but the array 
returned from the reader has more than 3; this might be because we read in 
64-bit floats and cast?

I implemented a quick fix to test it, and all tests pass locally, although I 
will leave it to you to change, as I'm not sure where in your process it's best 
to adjust the precision.

Regards,
Paddy
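Here is an illustration of the precision mismatch Paddy describes. A value such as 0.123 has no exact binary representation, so the nearest `f32`, widened to `f64`, is not equal to the nearest `f64`. Comparing in the array's own width avoids the spurious mismatch. This sketch shows one possible comparison; it is not the fix that was pushed to the branch, and `json_matches_f32` is a hypothetical helper.

```rust
/// Compare a value parsed from JSON (as f64) with a value read from an f32
/// array by narrowing the JSON value, rather than widening the array value.
pub fn json_matches_f32(json_value: f64, array_value: f32) -> bool {
    (json_value as f32) == array_value
}

fn main() {
    let json_value: f64 = "0.123".parse().unwrap();
    let array_value: f32 = 0.123;

    // Widening exposes the f32 rounding error, so the two f64 values differ.
    println!("{} vs {}", array_value as f64, json_value);
    assert_ne!(array_value as f64, json_value);

    // Narrowing the JSON value to f32 gives a matching comparison.
    assert!(json_matches_f32(json_value, array_value));
}
```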


Re: [ANNOUNCE] New Arrow committer: Eric Erhardt

2019-10-17 Thread paddy horan
Congrats Eric!


From: Micah Kornfield 
Sent: Thursday, October 17, 2019 12:45:15 PM
To: dev 
Subject: Re: [ANNOUNCE] New Arrow committer: Eric Erhardt

Congrats Eric!

On Thu, Oct 17, 2019 at 6:58 AM Wes McKinney  wrote:

> On behalf of the Arrow PMC, I'm happy to announce that Eric has
> accepted an invitation to become a committer on Apache Arrow.
>
> Welcome, and thank you for your contributions!
>


Re: Can't find myself in contributor list

2019-10-09 Thread paddy horan
It might also be due to our merge tool.  PRs are merged locally and pushed to 
master (with the corresponding PR on github being “closed” rather than 
“merged”).  This might not be reflected in the pulse view.

P


From: Wes McKinney 
Sent: Wednesday, October 9, 2019 4:06:59 PM
To: dev 
Subject: Re: Can't find myself in contributor list

GitHub only shows the top 100 contributors to the project in

https://github.com/apache/arrow/graphs/contributors

Similarly I think you need more commits to show up in the Pulse view

On Wed, Oct 9, 2019 at 2:58 PM Hengruo Zhang  wrote:
>
> Hi,
>
> My two PRs have already been merged to the master branch, but I cannot
> see myself in the contributor list on GitHub, even when I narrow down the
> time span so that there are fewer than 50 people. I can't even
> find my merged PRs in https://github.com/apache/arrow/pulse .
>
> Could you please provide some possible reasons for this?
>
> PRs:
> https://github.com/apache/arrow/pull/5577
> https://github.com/apache/arrow/pull/5303
>
> Thanks,
> Hengruo


Re: [Discuss][Rust] Policy regarding "unsafe"

2020-01-19 Thread paddy horan
I think we are all broadly thinking along the same lines.

I would mention that I don't see "unsafe" as a problem that needs to be removed 
per se; it has its place, especially in lower-level libraries like 
Arrow.

I do have a problem with the fact that "value" (which is safe) has a comment in 
the docstring that says "Note this doesn't do any bounds checking for 
performance reasons", I would prefer to see this marked as unsafe.
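A minimal sketch of the distinction, using a hypothetical, simplified `Int32Values` type rather than the real array API: the safe accessor checks bounds, while the unchecked one is an `unsafe fn` whose contract is documented in a `# Safety` section.

```rust
/// Hypothetical stand-in for a primitive array; not the arrow crate's type.
pub struct Int32Values {
    data: Vec<i32>,
}

impl Int32Values {
    pub fn new(data: Vec<i32>) -> Self {
        Self { data }
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// Returns the value at `i`, panicking if `i` is out of bounds.
    pub fn value(&self, i: usize) -> i32 {
        self.data[i]
    }

    /// Returns the value at `i` without bounds checking.
    ///
    /// # Safety
    /// The caller must ensure `i < self.len()`; otherwise behavior is undefined.
    pub unsafe fn value_unchecked(&self, i: usize) -> i32 {
        // SAFETY: the caller guarantees i < self.len().
        unsafe { *self.data.get_unchecked(i) }
    }
}

fn main() {
    let a = Int32Values::new(vec![10, 20, 30]);
    assert_eq!(a.value(1), 20);
    // SAFETY: 2 < a.len() == 3.
    assert_eq!(unsafe { a.value_unchecked(2) }, 30);
}
```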

I'll open some JIRAs and PRs and we can decide on a case-by-case basis.

Thanks,
Paddy


From: Andy Grove 
Sent: Friday, January 17, 2020 1:44 PM
To: dev 
Subject: Re: [Discuss][Rust] Policy regarding "unsafe"

I agree that we need to audit use of unsafe and think carefully about how
we use it.

This blog post is somewhat tangential since it is primarily about the
actix-web situation with the author quitting open source, but "unsafe"
played a large role in it, and I think this is worth reading.

 https://words.steveklabnik.com/a-sad-day-for-rust

Andy.

On Fri, Jan 17, 2020 at 11:15 AM Neville Dipale 
wrote:

> Hi Paddy, Arrow Developers,
>
> I've given this some thought, and I preliminarily think that perhaps we can
> audit our use of unsafe and evaluate where we can remove it, propagate it
> upwards (and provide safe alternatives) or provide some safety to callers.
>
> Looking at the 3 options that Paul Kernfeld raised in the linked JIRA:
>
>
>1. Add in bounds checking so that we don't need to deal with unsafe at
>all.
>2. Propagate the unsafes up through the code.
>3. Maintain a safe and unsafe version of each function that is currently
>unsafe.
>
>
> I think bounds checking would hurt performance, an example being the
> changes introduced in https://issues.apache.org/jira/browse/ARROW-4670. In
> ARROW-4670, I believe we were able to get the compiler to auto-vectorise
> due to Array::value() avoiding bounds checks. In the case of compute, we
> are in control of the array length, and so we know that it's safe to skip
> bounds checking. I presume this would largely be the case in tabular-data
> use-cases (because we assert that arrays in a record batch meet certain
> criteria).
>
> From a cursory glance, if we do find that we don't need explicit SIMD
> (still immature in Rust, I've found it difficult to implement in some
> cases), we could potentially reduce our unsafe count by around 20%. The
> flatbuffers generated files also introduce a lot of unsafe (~26%), so we'd
> need to maybe adopt option 2 from Paul on IPC once we're done with the
> basics.
>
> We'd then mainly be left with bit manipulation and `Buffer` (which has as
> much unsafe as the fbs generated files). I think the API around buffer
> would depend on whether we're expecting (based on what can be done with
> buffers) this to be exposed to users beyond those using Arrow as a
> development platform.
>
> The above are some of my thoughts, but it's important to note that I don't
> have a lot of experience with Rust, especially `unsafe` and the other dark
> corners of the language.
>
> Regards
> Neville


[Discuss][Rust] Policy regarding "unsafe"

2020-01-09 Thread paddy horan
Hi All,

This time last year there was a brief discussion on the usage of unsafe in Rust 
(a user on github raised the issue and I created the JIRA). [1]

So far we mostly avoid unsafe in the public APIs.  The thinking here is that 
Arrow is a "development platform", i.e. lower level than most libraries, and 
library builders will want to avoid any performance hit from bounds checking, etc.

This is not typical in the Rust community where unsafe is a clear signal that 
care is needed.  Although it might clutter the API a little more I would be in 
favor of having safe and unsafe variants of methods as needed.  For instance, 
"value" for array access would be changed to "value" and "value_unchecked" 
where the latter is unsafe and does not perform bounds checks.
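To illustrate how library builders could still avoid the per-element cost under this proposal, here is a sketch of a hypothetical `add` kernel: it validates lengths once, then uses unchecked access inside the loop. It works on plain slices so the example is self-contained, and whether the compiler vectorizes the loop is not guaranteed.

```rust
/// Hypothetical element-wise add kernel. Lengths are validated once up
/// front, so per-element accesses can skip bounds checks.
fn add(left: &[i32], right: &[i32]) -> Result<Vec<i32>, String> {
    if left.len() != right.len() {
        return Err(format!("length mismatch: {} vs {}", left.len(), right.len()));
    }
    let mut out = Vec::with_capacity(left.len());
    for i in 0..left.len() {
        // SAFETY: i < left.len() == right.len(), checked above.
        let (l, r) = unsafe { (*left.get_unchecked(i), *right.get_unchecked(i)) };
        // Overflow wraps; this is a choice made for the sketch.
        out.push(l.wrapping_add(r));
    }
    Ok(out)
}

fn main() {
    assert_eq!(add(&[1, 2, 3], &[10, 20, 30]), Ok(vec![11, 22, 33]));
    assert!(add(&[1], &[1, 2]).is_err());
}
```

In this particular case, iterating with `zip` would also let the compiler drop the bounds checks without `unsafe`; an unchecked accessor matters most where access is by index.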

We don't have a huge number of libraries building on top of Arrow in Rust at 
the moment, so it seems like a good time, before 1.0, to decide on this and avoid 
breaking changes to the public API after 1.0.

Thoughts?

Paddy

[1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557



[Discuss] [Rust] Common Trait(s) for iterating over RecordBatch's

2020-04-22 Thread paddy horan
Hi All,

I just opened ARROW-8559 [1] to consolidate the traits for record batch 
iterators.  I feel this needs to be done prior to 1.0, as we need to be clear about 
what external crates should implement to integrate with the Arrow ecosystem.  
This might be disruptive, though, so I wanted to bring it to the attention of 
the mailing list.
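A minimal sketch of what a single consolidated trait might look like: a reader is an `Iterator` over `Result<RecordBatch, _>` that also exposes its schema. `Schema`, `RecordBatch`, `ArrowError`, `VecReader` and `total_rows` below are simplified, hypothetical stand-ins, not the arrow crate's types.

```rust
use std::sync::Arc;

// Simplified stand-ins (assumptions for this sketch).
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub field_names: Vec<String>,
}
pub type SchemaRef = Arc<Schema>;

#[derive(Debug, PartialEq)]
pub struct RecordBatch {
    pub num_rows: usize,
}

#[derive(Debug)]
pub struct ArrowError(pub String);

/// One trait for external crates to implement: an iterator of batches
/// that also knows its schema up front.
pub trait RecordBatchReader: Iterator<Item = Result<RecordBatch, ArrowError>> {
    fn schema(&self) -> SchemaRef;
}

/// Hypothetical in-memory implementation.
pub struct VecReader {
    schema: SchemaRef,
    batches: std::vec::IntoIter<RecordBatch>,
}

impl VecReader {
    pub fn new(schema: SchemaRef, batches: Vec<RecordBatch>) -> Self {
        Self { schema, batches: batches.into_iter() }
    }
}

impl Iterator for VecReader {
    type Item = Result<RecordBatch, ArrowError>;
    fn next(&mut self) -> Option<Self::Item> {
        self.batches.next().map(Ok)
    }
}

impl RecordBatchReader for VecReader {
    fn schema(&self) -> SchemaRef {
        Arc::clone(&self.schema)
    }
}

/// Generic consumer: works with any reader implementing the trait.
pub fn total_rows<R: RecordBatchReader>(reader: R) -> Result<usize, ArrowError> {
    let mut total = 0;
    for batch in reader {
        total += batch?.num_rows;
    }
    Ok(total)
}

fn main() {
    let schema = Arc::new(Schema { field_names: vec!["a".to_string()] });
    let batches = vec![RecordBatch { num_rows: 3 }, RecordBatch { num_rows: 4 }];
    let reader = VecReader::new(schema, batches);
    assert_eq!(reader.schema().field_names, vec!["a".to_string()]);
    assert_eq!(total_rows(reader).unwrap(), 7);
}
```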

Paddy

[1] - https://issues.apache.org/jira/browse/ARROW-8559


Re: [ANNOUNCE] New Arrow committers: Ji Liu and Liya Fan

2020-06-11 Thread paddy horan
Congrats!


From: Micah Kornfield 
Sent: Thursday, June 11, 2020 12:59:32 PM
To: dev 
Subject: Re: [ANNOUNCE] New Arrow committers: Ji Liu and Liya Fan

Congratulations!

On Thu, Jun 11, 2020 at 9:32 AM David Li  wrote:

> Congrats Ji  & Liya!
>
> David
>
> On 6/11/20, siddharth teotia  wrote:
> > Congratulations!
> >
> > On Thu, Jun 11, 2020 at 7:51 AM Neal Richardson
> > 
> > wrote:
> >
> >> Congratulations, both!
> >>
> >> Neal
> >>
> >> On Thu, Jun 11, 2020 at 7:38 AM Wes McKinney 
> wrote:
> >>
> >> > On behalf of the Arrow PMC I'm happy to announce that Ji Liu and Liya
> >> > Fan have been invited to be Arrow committers and they have both
> >> > accepted.
> >> >
> >> > Welcome, and thank you for your contributions!
> >> >
> >>
> >
> >
> > --
> > *Best Regards,*
> > *SIDDHARTH TEOTIA*
> > *2008C6PS540G*
> > *BITS PILANI- GOA CAMPUS*
> >
> > *+91 87911 75932*
> >
>


Re: [Rust] Announcement: Rust is now part of the Arrow Flight Integration Test

2021-01-09 Thread paddy horan
This was a long time coming.  Congrats and thank you to all involved!



From: Andrew Lamb 
Sent: Saturday, January 9, 2021 6:08:47 AM
To: dev@arrow.apache.org ; carol.nich...@integer32.com 
; jake.gould...@integer32.com 
; Neville Dipale 
Subject: [Rust] Announcement: Rust is now part of the Arrow Flight Integration 
Test

As of this PR: 
https://github.com/apache/arrow/pull/9049
 we have enabled
Rust client and server in the ongoing archery integration tests, which I
think brings a new level of maturity to the Rust implementation and
strengthens the ecosystem as a whole.

Thanks to all the hard work from @carol.nich...@integer32.com
 and @jake.gould...@integer32.com
 and @Neville Dipale  for
making this happen!

Andrew


Re: [ANNOUNCE] New Arrow committer: Ian Cook

2021-04-28 Thread paddy horan
Congrats Ian!



From: Jorge Cardoso Leitão 
Sent: Wednesday, April 28, 2021 4:56:12 PM
To: dev@arrow.apache.org 
Subject: Re: [ANNOUNCE] New Arrow committer: Ian Cook

Congratulations and thank you for your contributions :)

On Wed, Apr 28, 2021 at 10:37 PM Neal Richardson <
neal.p.richard...@gmail.com> wrote:

> On behalf of the Arrow PMC, I'm happy to announce that Ian has accepted an
> invitation to become a committer on Apache Arrow. Welcome, and thank you
> for your contributions!
>
> Neal
>


Re: [ANNOUNCE] New Arrow committer: Daniël Heres

2021-04-28 Thread paddy horan
Congrats Daniël!



From: Andy Grove 
Sent: Wednesday, April 28, 2021 9:24:41 AM
To: dev 
Subject: [ANNOUNCE] New Arrow committer: Daniël Heres

On behalf of the Arrow PMC, I'm happy to announce that Daniël has
accepted an invitation to become a committer on Apache Arrow.

Welcome, and thank you for your contributions!


[DISCUSS] How to describe computation on Arrow data?

2021-03-18 Thread paddy horan
Hi All,

I do not have a computer science background, so I may not be asking this in the 
correct way or using the correct terminology, but I wonder if we can achieve 
some level of standardization when describing computation over Arrow data.

At the moment on the Rust side DataFusion clearly has a way to describe 
computation, I believe that Ballista adds the ability to serialize this to 
allow distributed computation.  On the C++ side work is starting on a similar 
query engine and we already have Gandiva.  Is there an opportunity to define a 
kind of IR for computation over Arrow data that could be adopted across 
implementations?

In this case DataFusion could easily incorporate Gandiva to generate optimized 
compute kernels if both used the same IR to describe computation.  
Applications built on Arrow could "describe" computation in any language and 
take advantage of innovations across the community; combined with Arrow's 
zero-copy data sharing, this could be a game changer in my mind.  I'm not 
someone who knows enough to drive this forward, but I would obviously like to 
get involved.  For some time I was playing around with TVM's Relay IR [1], 
applying it to Arrow data.
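As a toy illustration of the idea (a hypothetical, minimal expression IR, not DataFusion's logical plan, Gandiva's expression trees or TVM's Relay): if engines agreed on a tree like the one below, one engine could build it and another could interpret or compile it. Serialization, which distributed execution would need, is omitted.

```rust
/// Hypothetical minimal IR: expressions over f64 columns of one batch.
#[derive(Debug, Clone)]
pub enum Expr {
    /// Reference to an input column by index.
    Column(usize),
    /// A scalar constant, broadcast to every row.
    Literal(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Combine two equal-length columns element-wise.
fn zip_with(a: Vec<f64>, b: Vec<f64>, f: impl Fn(f64, f64) -> f64) -> Vec<f64> {
    a.into_iter().zip(b).map(|(x, y)| f(x, y)).collect()
}

/// A naive tree-walking interpreter. Another engine could instead compile
/// the same tree into a fused kernel; the IR itself does not care.
pub fn eval(expr: &Expr, columns: &[Vec<f64>], num_rows: usize) -> Vec<f64> {
    match expr {
        Expr::Column(i) => columns[*i].clone(),
        Expr::Literal(v) => vec![*v; num_rows],
        Expr::Add(l, r) => zip_with(eval(l, columns, num_rows), eval(r, columns, num_rows), |x, y| x + y),
        Expr::Mul(l, r) => zip_with(eval(l, columns, num_rows), eval(r, columns, num_rows), |x, y| x * y),
    }
}

fn main() {
    let columns = vec![vec![1.0, 2.0, 3.0], vec![10.0, 20.0, 30.0]];
    // (col0 + col1) * 2
    let expr = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Column(0)), Box::new(Expr::Column(1)))),
        Box::new(Expr::Literal(2.0)),
    );
    assert_eq!(eval(&expr, &columns, 3), vec![22.0, 44.0, 66.0]);
}
```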

As the Arrow memory format has now matured, I feel like this could be the next 
step.  Is there any plan for this kind of work, or are we going to allow 
sub-projects to "go their own way"?

Thanks,
Paddy

[1] - Introduction to Relay IR - tvm 0.8.dev0 documentation 
(apache.org)



Re: [VOTE] Accept donation of Rust Ballista project

2021-03-21 Thread paddy horan
+1 (non-binding)



From: Sutou Kouhei 
Sent: Sunday, March 21, 2021 4:34:43 PM
To: dev@arrow.apache.org 
Subject: Re: [VOTE] Accept donation of Rust Ballista project

+1 (binding)

In 
  "[VOTE] Accept donation of Rust Ballista project" on Sun, 21 Mar 2021 
09:56:32 -0600,
  Andy Grove  wrote:

> Dear all,
>
> On behalf of the Ballista community, I would like to propose that we donate
> Ballista to the Apache Arrow project.
>
> Ballista is a distributed scheduler based on Arrow standards (memory
> format, IPC, Flight) and supports distributed query execution with the
> DataFusion query engine.
>
> The community has had an opportunity to discuss this [1] and there do not
> seem to be objections to this.
>
> The code donation in the form of a pull request:
>
> https://github.com/apache/arrow/pull/9723
>
> This vote is to determine if the Arrow PMC is in favor of accepting this
> donation. If the vote passes, the PMC and the authors of the code will work
> together to complete the ASF IP Clearance process (
> http://incubator.apache.org/ip-clearance/)
>  and import this Rust codebase
> implementation into Apache Arrow.
>
> [ ] +1 : Accept contribution of Ballista
> [ ] 0 : No opinion
> [ ] -1 : Reject contribution because...
>
> Here is my vote: +1
>
> The vote will be open for at least 72 hours.
>
> Thanks,
>
> Andy.
>
> [1]
> https://lists.apache.org/x/thread.html/r09556898c9c94259c00e35c04ea051040931bbe9ce577cba60c148c8%40%3Cdev.arrow.apache.org%3E


Re: [ANNOUNCE] New Arrow PMC member: Andrew Lamb

2021-03-08 Thread paddy horan
Congrats Andrew!



From: Krisztián Szűcs 
Sent: Monday, March 8, 2021 3:14:03 PM
To: dev 
Subject: Re: [ANNOUNCE] New Arrow PMC member: Andrew Lamb

Congratulations Andrew!

On Mon, Mar 8, 2021 at 7:43 PM Daniël Heres  wrote:
>
> Congrats Andrew, well deserved!
>
> On Mon, 8 Mar 2021 at 19:25, Fernando Herrera <
> fernando.j.herr...@gmail.com> wrote:
>
> > Congrats Andrew
> >
> > On Mon, 8 Mar 2021, 17:26 Micah Kornfield,  wrote:
> >
> > > Congratulations Andrew!
> > >
> > > On Mon, Mar 8, 2021 at 9:23 AM Wes McKinney  wrote:
> > >
> > > > The Project Management Committee (PMC) for Apache Arrow has invited
> > > > Andrew Lamb to become a PMC member and we are pleased to announce
> > > > that Andrew has accepted.
> > > >
> > > > Congratulations and welcome!
> > > >
> > >
> >
>
>
> --
> Daniël Heres


Re: [ANNOUNCE] New Arrow PMC member: Jorge Leitão

2021-03-08 Thread paddy horan
Congrats Jorge!  Well deserved!



From: Krisztián Szűcs 
Sent: Monday, March 8, 2021 3:14 PM
To: dev
Subject: Re: [ANNOUNCE] New Arrow PMC member: Jorge Leitão

Congratulations Jorge!

On Mon, Mar 8, 2021 at 7:44 PM Daniël Heres  wrote:
>
> Congrats Jorge, well deserved!
>
> On Mon, 8 Mar 2021 at 19:25, Fernando Herrera <
> fernando.j.herr...@gmail.com> wrote:
>
> > Congrats Jorge
> >
> > On Mon, 8 Mar 2021, 17:26 Micah Kornfield,  wrote:
> >
> > > Congratulations Jorge!
> > >
> > > On Mon, Mar 8, 2021 at 9:25 AM Wes McKinney  wrote:
> > >
> > > > The Project Management Committee (PMC) for Apache Arrow has invited
> > > > Jorge Leitão to become a PMC member and we are pleased to announce
> > > > that Jorge has accepted.
> > > >
> > > > Congratulations and welcome!
> > > >
> > >
> >
>
>
> --
> Daniël Heres


Re: [VOTE] Move Rust components to new repos and process

2021-04-15 Thread paddy horan
+1



From: Joris Van den Bossche 
Sent: Thursday, April 15, 2021 10:07:27 AM
To: dev 
Subject: Re: [VOTE] Move Rust components to new repos and process

+1 (non-binding)

Joris

On Thu, 15 Apr 2021 at 15:42, Wes McKinney  wrote:

> +1 (binding)
>
> On Thu, Apr 15, 2021 at 7:31 AM Weston Steimel 
> wrote:
> >
> > +1
> >
> > On Thu, 15 Apr 2021 at 00:05, Andy Grove  wrote:
> >
> > > This vote is to determine if the Arrow PMC is in favor of the Rust
> > > community moving the Rust implementation of Apache Arrow as well as the
> > > related projects (such as Parquet, DataFusion, Ballista, etc) out of the
> > > monorepo and into two new repositories, as outlined in the proposal
> > > document [1].
> > >
> > > Please vote whether to accept the proposal and allow the Rust community to
> > > proceed with the work.
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 : Accept the proposal
> > >
> > > [ ] 0 : No opinion
> > >
> > > [ ] -1 : Reject proposal because...
> > >
> > > Here is my vote: +1
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > [1]
> > >
> > >
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.google.com%2Fdocument%2Fd%2F1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI%2Fedit%3Fusp%3Dsharingdata=04%7C01%7C%7Cb9f01171b45d4f9259da08d90017dc13%7C84df9e7fe9f640afb435%7C1%7C0%7C637540924766988151%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=izFW%2B4NiT%2F28S%2BtU6TOOrBrj1Vcfqu%2FVT3LvDi2tFvQ%3Dreserved=0
> > >
>


RE: [Discuss] [Rust] Arrow2/parquet2 going foward

2021-08-03 Thread paddy horan
Hi Jorge,

What do you think about moving Arrow2 into the main Arrow repo where it is only 
enabled via an "experimental" feature flag?  This would allow development of 
Arrow2 to proceed in the main repo but also this would be a clear signal that 
Arrow2 is <1.0.  When we feel ready (i.e. Arrow2 is 1.0) we can release it in 
the next main release with Arrow2 being the default and move the existing 
implementation behind a "legacy" feature flag.
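A minimal sketch of the mechanism, assuming a Cargo feature named `experimental` (it would be declared under `[features]` in `Cargo.toml`; the module and function names are hypothetical): the same crate compiles one implementation or the other depending on the flag.

```rust
// With `--features experimental`, the new implementation is compiled in;
// otherwise the existing one is used.
#[cfg(feature = "experimental")]
mod implementation {
    pub fn name() -> &'static str {
        "arrow2 (experimental, <1.0)"
    }
}

#[cfg(not(feature = "experimental"))]
mod implementation {
    pub fn name() -> &'static str {
        "arrow (current)"
    }
}

pub use implementation::name as implementation_name;

fn main() {
    println!("using {}", implementation_name());
}
```

Flipping the default later would amount to swapping which module sits behind the flag, with the existing code moving behind a `legacy` feature.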

Here is why I think this might work well:
 - People contributing to the Arrow project will naturally contribute to 
Arrow2.  At the moment, some people will still contribute to Arrow instead of 
Arrow2 just by virtue of it being the "official" implementation.  However, if 
both are in one repo people will want to contribute to the "future", i.e. 
Arrow2.
 - the experimental flag will be a clear signal to the existing Arrow community 
that Arrow2 is the future but that it is <1.0
 - existing users will be well supported in this transition
 - In general, I think the longer that development proceeds in separate repos 
the harder it will be to eventually merge the two in a way that supports 
existing users. 

Do you think this would work?

Paddy

-Original Message-
From: Jorge Cardoso Leitão  
Sent: Monday, August 2, 2021 1:59 PM
To: dev@arrow.apache.org
Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward

Hi,

Sorry for the delay.

If there is a path towards an official release under a <1.0.0 versioning schema 
aligned with the rest of the Rust ecosystem and in line with the stability of 
the API, then IMO we should move all development into the Apache experimental 
repos asap (I can handle this and the likely IP clearance round). If we require 
a >=1.X.Y release and/or a release schedule for it, then I prefer to keep 
expectations aligned and postpone any move.
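For context on why a <1.0 schema matters: under Cargo's default (caret) version requirements, a minor bump of a 0.x crate is treated as breaking, whereas for versions >= 1.0 only a major bump is. A small sketch of that rule (a hypothetical helper covering plain `MAJOR.MINOR.PATCH` versions, ignoring pre-release tags):

```rust
/// A plain semantic version (pre-release and build metadata ignored).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

pub fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

/// Returns true if `candidate` satisfies the caret requirement `^req`:
///   ^1.2.3 -> >=1.2.3, <2.0.0
///   ^0.2.3 -> >=0.2.3, <0.3.0
///   ^0.0.3 -> >=0.0.3, <0.0.4
pub fn caret_compatible(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        candidate.major == req.major
    } else if req.minor > 0 {
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        candidate.major == 0 && candidate.minor == 0 && candidate.patch == req.patch
    }
}

fn main() {
    // 0.x: a minor bump is a breaking release.
    assert!(caret_compatible(v(0, 4, 0), v(0, 4, 2)));
    assert!(!caret_compatible(v(0, 4, 0), v(0, 5, 0)));
    // >=1.0: minor bumps are compatible, major bumps are not.
    assert!(caret_compatible(v(4, 0, 0), v(4, 3, 0)));
    assert!(!caret_compatible(v(4, 0, 0), v(5, 0, 0)));
}
```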

Under the move scenario, I was thinking of something like the following:

* gradually stop maintaining "arrow" in crates, offering a maintenance window 
over which we release patches (*)
* work towards achieving feature parity on arrow2/parquet2 on the experimental 
repos.
* keep releasing arrow2/parquet2 under a 0.X model during the step above (**)
* migrate to arrow-rs and archive experimentals (***)
* break arrow2 in smaller crates so that we can version the APIs at a different 
cadence
* once a crate reaches some stability (this is always opinionated, but it is 
fine), we bump it to 1.0 and announce a maintenance plan à la tokio
(https://tokio.rs/blog/2020-12-tokio-1-0).

(*) e.g. "we will continue to patch the arrow crate up to at least 6 months 
starting after the first release of arrow2 that supports
a) nested parquet read and write
b) union array (including IPC integration tests)
c) map array (including IPC integration tests)"

(**) officially or un-officially (I would suggest officially so that we can 
acknowledge everyone's work on it, but no strong feelings)

(***) something like:
1. place arrow2 on top of a clear arrow repo so that the full contribution 
history up to that point is preserved
2. make arrow-rs the home of arrow2 (i.e. we start releasing arrow2 from 
arrow-rs) and archive the experimental repos; create arrow-rs-parquet or 
something for parquet2.

In summary, the core pain point for me is the current versioning of arrow, 
which I feel is incompatible with my goals for arrow2 and the ecosystem I 
envision it supporting :)

Best,
Jorge

On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney  wrote:

> I think it would also be fine to push "beta" arrow2 crates out of a 
> repo under apache/ so long as they are not marked on crates.io as 
> being Apache-official releases. There's a possible slippery slope 
> there, but as long as we are on a path to formalizing the releases I think it 
> is okay.
>
> On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb  wrote:
>
> > Jorge -- do you feel like we have a resolution on what to do with arrow2 in
> > the near term?
> >
> > The current state of affairs seems to me that arrow2 is released from
> > https://github.com/jorgecarleitao/arrow2 to crates.io (which is fine).
> > Are you happy with keeping development in the jorgecarleitao repo where
> > you will retain maximal control and flexibility until it is ready to
> > start integrating?
> >
> > Or would you prefer to put it into one of the 

RE: [Discuss] [Rust] Arrow2/parquet2 going foward

2021-08-03 Thread paddy horan
s code for the 
same or higher performance.

On the opposite side, merging the development of crates under the same repo 
leads to: more triaging of PRs; more work for releases and changelogging; 
tagging based on crates; multiple READMEs in subpaths of the repo; curation of 
the CI to accommodate this; a workspace with many crates, each with its own set 
of dependencies, increasing compilation and development time; mixed commit logs; 
difficulties in reverts and cherry-picks; more difficult to find stuff in the 
repo. See e.g. how tokio-rs does it:
https://github.com/tokio-rs,
 even for small crates like bytes 
<https://github.com/tokio-rs/bytes>.

Best,
Jorge


Re: [ANNOUNCE] New Arrow PMC member: Neville Dipale

2021-07-29 Thread paddy horan
Congrats Neville!

From: Wes McKinney 
Sent: Thursday, July 29, 2021 6:20 PM
To: dev
Subject: [ANNOUNCE] New Arrow PMC member: Neville Dipale

The Project Management Committee (PMC) for Apache Arrow has invited
Neville Dipale to become a PMC member and we are pleased to announce
that Neville has accepted.

Congratulations and welcome!


[jira] [Created] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR

2018-04-26 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-2516:
--

 Summary: AppVeyor Build Matrix should be specific to the changes 
made in a PR
 Key: ARROW-2516
 URL: https://issues.apache.org/jira/browse/ARROW-2516
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Paddy Horan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3035) [Rust] Examples in README.md do not run

2018-08-09 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3035:
--

 Summary: [Rust] Examples in README.md do not run
 Key: ARROW-3035
 URL: https://issues.apache.org/jira/browse/ARROW-3035
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-3088) [Rust] Use internal `Result` type instead of `Result

2018-08-19 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3088:
--

 Summary: [Rust] Use internal `Result` type instead of 
`Result
 Key: ARROW-3088
 URL: https://issues.apache.org/jira/browse/ARROW-3088
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
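The summary above is truncated in the archive. For readers unfamiliar with the general idiom of a crate-level `Result` alias, a sketch; `ArrowError` here is a simplified stand-in and `parse_len` is a hypothetical function:

```rust
use std::fmt;

/// Simplified stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
pub enum ArrowError {
    ParseError(String),
}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::ParseError(msg) => write!(f, "parse error: {}", msg),
        }
    }
}

impl std::error::Error for ArrowError {}

/// Crate-wide alias so signatures read `Result<T>` instead of
/// `std::result::Result<T, ArrowError>`.
pub type Result<T> = std::result::Result<T, ArrowError>;

/// Hypothetical function using the alias.
pub fn parse_len(s: &str) -> Result<usize> {
    s.trim()
        .parse::<usize>()
        .map_err(|e| ArrowError::ParseError(e.to_string()))
}

fn main() {
    assert_eq!(parse_len(" 17 "), Ok(17));
    assert!(parse_len("seventeen").is_err());
}
```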








[jira] [Created] (ARROW-3177) [Rust] Update expected error messages for tests that 'should panic'

2018-09-05 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3177:
--

 Summary: [Rust] Update expected error messages for tests that 
'should panic'
 Key: ARROW-3177
 URL: https://issues.apache.org/jira/browse/ARROW-3177
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-3174) [Rust] run examples as part of CI

2018-09-04 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3174:
--

 Summary: [Rust] run examples as part of CI
 Key: ARROW-3174
 URL: https://issues.apache.org/jira/browse/ARROW-3174
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-3173) [Rust] dynamic_types example does not run

2018-09-04 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3173:
--

 Summary: [Rust] dynamic_types example does not run
 Key: ARROW-3173
 URL: https://issues.apache.org/jira/browse/ARROW-3173
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-3172) [Rust] Update documentation for datatypes.rs

2018-09-04 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3172:
--

 Summary: [Rust] Update documentation for datatypes.rs
 Key: ARROW-3172
 URL: https://issues.apache.org/jira/browse/ARROW-3172
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-3207) Investigate Issue Template for Github Issues

2018-09-10 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3207:
--

 Summary: Investigate Issue Template for Github Issues
 Key: ARROW-3207
 URL: https://issues.apache.org/jira/browse/ARROW-3207
 Project: Apache Arrow
  Issue Type: Task
Reporter: Paddy Horan


I believe there is a way to specify a starting template for GitHub issues 
so that when new issues are opened, the template to fill out is already in the 
issue body.

This could be a useful way to explain the purpose of GitHub issues versus JIRA 
issues for this project, with a section marked "if you are still 
unsure whether your issue warrants a new JIRA issue, delete this section and report 
your issue" or similar.

This might reduce the number of times maintainers have to direct people to open 
an issue on JIRA; as the project grows, this is going to happen more and more.





[jira] [Created] (ARROW-2434) [Rust] Add windows support

2018-04-09 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-2434:
--

 Summary: [Rust] Add windows support
 Key: ARROW-2434
 URL: https://issues.apache.org/jira/browse/ARROW-2434
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
 Fix For: 0.10.0


Currently `cargo test` fails on Windows.


[jira] [Created] (ARROW-2436) [Rust] Add windows CI

2018-04-09 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-2436:
--

 Summary: [Rust] Add windows CI
 Key: ARROW-2436
 URL: https://issues.apache.org/jira/browse/ARROW-2436
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan


[jira] [Created] (ARROW-2502) [Rust] Restore Windows Compatibility

2018-04-23 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-2502:
--

 Summary: [Rust] Restore Windows Compatibility
 Key: ARROW-2502
 URL: https://issues.apache.org/jira/browse/ARROW-2502
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


Windows support is currently broken due to a call to `free` in builder.rs and
the memory_pool abstraction.


[jira] [Created] (ARROW-2474) [Rust] Windows build fails in memory pool abstraction

2018-04-18 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-2474:
--

 Summary: [Rust] Windows build fails in memory pool abstraction
 Key: ARROW-2474
 URL: https://issues.apache.org/jira/browse/ARROW-2474
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan


[jira] [Created] (ARROW-3537) [Rust] Implement Tensor Type

2018-10-16 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3537:
--

 Summary: [Rust] Implement Tensor Type
 Key: ARROW-3537
 URL: https://issues.apache.org/jira/browse/ARROW-3537
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


[jira] [Created] (ARROW-3573) [Rust] with_bitset does not set valid bits correctly

2018-10-19 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3573:
--

 Summary: [Rust] with_bitset does not set valid bits correctly
 Key: ARROW-3573
 URL: https://issues.apache.org/jira/browse/ARROW-3573
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


The boundary check is slightly off: `MutableBuffer::new(64).with_bitset(64, false);`
will fail. This only happens when the arguments to `new` and `with_bitset` are
the same and a multiple of 64.

In addition, `write_bytes` currently writes 1 instead of 255 when `val` is
`true`, so not all of the bits are set.
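
A minimal sketch of both fixes, assuming a plain byte slice stands in for
`MutableBuffer` and that `end` is a byte count; `fill_bitset` is a hypothetical
helper, not the crate's API. It also assumes the faulty check rejected
`end == capacity`. Setting every bit of a byte means writing 0xFF (255), not 1.

```rust
/// Hypothetical stand-in for `MutableBuffer::with_bitset`: set every bit in
/// the first `end` bytes of `buf` to `val`.
fn fill_bitset(buf: &mut [u8], end: usize, val: bool) {
    // Inclusive bound: `end == buf.len()` (e.g. 64 on a 64-byte buffer)
    // must be accepted.
    assert!(end <= buf.len(), "end exceeds buffer capacity");
    // 0xFF sets all eight bits of a byte; writing 1 would set only bit 0.
    let byte = if val { 0xFF } else { 0x00 };
    buf[..end].fill(byte);
}
```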


[jira] [Created] (ARROW-3541) [Rust] Update BufferBuilder to allow new bit-packed BooleanArray

2018-10-17 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3541:
--

 Summary: [Rust] Update BufferBuilder to allow new bit-packed BooleanArray
 Key: ARROW-3541
 URL: https://issues.apache.org/jira/browse/ARROW-3541
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


[jira] [Created] (ARROW-3540) [Rust] Incorporate BooleanArray into PrimitiveArray

2018-10-17 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3540:
--

 Summary: [Rust] Incorporate BooleanArray into PrimitiveArray
 Key: ARROW-3540
 URL: https://issues.apache.org/jira/browse/ARROW-3540
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


Currently we have a specific implementation for `BooleanArray` (bit-packing),
but because of the `ArrowPrimitiveType` trait, which we use as a trait bound in
many places, `PrimitiveArray` is still a valid type. `make_array` actually uses
`PrimitiveArray`, which may be a bug but would be fixed by this issue anyway.

I propose moving the implementation of `BooleanArray` into `PrimitiveArray`.
This would allow us to use the `ArrowPrimitiveType` trait as a bound more
consistently, i.e. `PrimitiveArrayBuilder` could return `PrimitiveArray`
instead of having a separate `BooleanArrayBuilder`.
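
For context, a minimal sketch of the bit-packed layout that any merged
implementation would need to preserve: value `i` is stored in bit `i % 8` of
byte `i / 8`, least-significant bit first as in the Arrow columnar format.
`BitPackedBools` is a hypothetical type, not the crate's `BooleanArray`.

```rust
/// Hypothetical bit-packed boolean storage (LSB bit numbering).
struct BitPackedBools {
    bytes: Vec<u8>,
    len: usize,
}

impl BitPackedBools {
    fn new() -> Self {
        BitPackedBools { bytes: Vec::new(), len: 0 }
    }

    /// Append one value; a new zeroed byte is started every 8 values.
    fn push(&mut self, v: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0);
        }
        if v {
            self.bytes[self.len / 8] |= 1u8 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Read value `i` by masking its bit out of the containing byte.
    fn value(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        self.bytes[i / 8] & (1u8 << (i % 8)) != 0
    }
}
```

Nine values occupy two bytes, versus nine bytes for one byte per `bool`.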


[jira] [Created] (ARROW-3644) [Rust] Implement ListArrayBuilder

2018-10-28 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3644:
--

 Summary: [Rust] Implement ListArrayBuilder
 Key: ARROW-3644
 URL: https://issues.apache.org/jira/browse/ARROW-3644
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


[jira] [Created] (ARROW-3643) Optimize `push_slice` of `BufferBuilder`

2018-10-28 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3643:
--

 Summary: Optimize `push_slice` of `BufferBuilder`
 Key: ARROW-3643
 URL: https://issues.apache.org/jira/browse/ARROW-3643
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan


The current implementation just calls `push` repeatedly; this should be optimized.
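
A minimal sketch of the difference, using a hypothetical `ByteBuilder` over
`Vec<u8>` rather than the crate's `BufferBuilder`: the naive version does a
capacity check (and possibly a reallocation) per element, while the slice
version hands the whole slice to `Vec::extend_from_slice`, which reserves once
and copies in bulk.

```rust
/// Hypothetical simplified builder; not the arrow crate's `BufferBuilder`.
struct ByteBuilder {
    buf: Vec<u8>,
}

impl ByteBuilder {
    fn push(&mut self, v: u8) {
        self.buf.push(v);
    }

    /// Naive version: one `push` (and capacity check) per element.
    fn push_slice_naive(&mut self, slice: &[u8]) {
        for &v in slice {
            self.push(v);
        }
    }

    /// Bulk version: a single reservation followed by one copy.
    fn push_slice(&mut self, slice: &[u8]) {
        self.buf.extend_from_slice(slice);
    }
}
```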


[jira] [Created] (ARROW-3687) [Rust] Anything measuring array slots should be `usize`

2018-11-01 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3687:
--

 Summary: [Rust] Anything measuring array slots should be `usize`
 Key: ARROW-3687
 URL: https://issues.apache.org/jira/browse/ARROW-3687
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


[jira] [Created] (ARROW-3658) [Rust] validation of offsets buffer is incorrect for `List`

2018-10-30 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3658:
--

 Summary: [Rust] validation of offsets buffer is incorrect for `List`
 Key: ARROW-3658
 URL: https://issues.apache.org/jira/browse/ARROW-3658
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


[jira] [Created] (ARROW-3713) [Rust] Implement BinaryArrayBuilder

2018-11-06 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3713:
--

 Summary: [Rust] Implement BinaryArrayBuilder
 Key: ARROW-3713
 URL: https://issues.apache.org/jira/browse/ARROW-3713
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


[jira] [Created] (ARROW-3776) [Rust] Mark methods that do not perform bounds checking as unsafe

2018-11-12 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3776:
--

 Summary: [Rust] Mark methods that do not perform bounds checking as unsafe
 Key: ARROW-3776
 URL: https://issues.apache.org/jira/browse/ARROW-3776
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan


[jira] [Created] (ARROW-3787) Implement From for BinaryArray

2018-11-13 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3787:
--

 Summary: Implement From for BinaryArray
 Key: ARROW-3787
 URL: https://issues.apache.org/jira/browse/ARROW-3787
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


[jira] [Created] (ARROW-3796) [Rust] Add Example for PrimitiveArrayBuilder

2018-11-14 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3796:
--

 Summary: [Rust] Add Example for PrimitiveArrayBuilder
 Key: ARROW-3796
 URL: https://issues.apache.org/jira/browse/ARROW-3796
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


[jira] [Created] (ARROW-3347) [Rust] Implement PrimitiveArrayBuilder

2018-09-27 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3347:
--

 Summary: [Rust] Implement PrimitiveArrayBuilder
 Key: ARROW-3347
 URL: https://issues.apache.org/jira/browse/ARROW-3347
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


This is a sub-task of ARROW-3089.


[jira] [Created] (ARROW-3398) [Rust] Update existing Builder to use MutableBuffer internally

2018-10-01 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3398:
--

 Summary: [Rust] Update existing Builder to use MutableBuffer internally
 Key: ARROW-3398
 URL: https://issues.apache.org/jira/browse/ARROW-3398
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


[jira] [Created] (ARROW-4271) [Rust] Move Parquet specific info to Parquet Readme

2019-01-15 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4271:
--

 Summary: [Rust] Move Parquet specific info to Parquet Readme
 Key: ARROW-4271
 URL: https://issues.apache.org/jira/browse/ARROW-4271
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


The Arrow README contains Parquet-specific information that was copied over
from the top-level README; it should be moved to the Parquet README.


[jira] [Created] (ARROW-4282) [Rust] builder benchmark is broken

2019-01-17 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4282:
--

 Summary: [Rust] builder benchmark is broken
 Key: ARROW-4282
 URL: https://issues.apache.org/jira/browse/ARROW-4282
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


This benchmark uses `push_slice`, which has been renamed to `append_slice`.


[jira] [Created] (ARROW-4071) [Rust] Add rustfmt as a pre-commit hook

2018-12-18 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4071:
--

 Summary: [Rust] Add rustfmt as a pre-commit hook
 Key: ARROW-4071
 URL: https://issues.apache.org/jira/browse/ARROW-4071
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust
Reporter: Paddy Horan


[jira] [Created] (ARROW-4072) [Rust] Set default value for PARQUET_TEST_DATA

2018-12-18 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4072:
--

 Summary: [Rust] Set default value for PARQUET_TEST_DATA
 Key: ARROW-4072
 URL: https://issues.apache.org/jira/browse/ARROW-4072
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust
Reporter: Paddy Horan


See the discussion [here](https://github.com/apache/arrow/pull/3210).


[jira] [Created] (ARROW-3827) [Rust] Implement UnionArray

2018-11-16 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3827:
--

 Summary: [Rust] Implement UnionArray
 Key: ARROW-3827
 URL: https://issues.apache.org/jira/browse/ARROW-3827
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan


[jira] [Created] (ARROW-4196) [Rust] Add explicit SIMD vectorization for ops in "array_ops"

2019-01-08 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4196:
--

 Summary: [Rust] Add explicit SIMD vectorization for ops in "array_ops"
 Key: ARROW-4196
 URL: https://issues.apache.org/jira/browse/ARROW-4196
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
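
A minimal sketch of what explicit SIMD vectorization of an element-wise op can
look like in stable Rust, assuming an x86_64 target (where SSE is part of the
baseline) and using `std::arch` intrinsics. The Rust implementation may use a
different SIMD library, and `add_f32` is a hypothetical function, not part of
"array_ops". Four lanes are processed per instruction and a scalar loop
handles the tail.

```rust
/// Hypothetical element-wise `out[i] = a[i] + b[i]`, four f32 lanes at a time.
fn add_f32(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len(), "length mismatch");
    let mut i = 0;
    #[cfg(target_arch = "x86_64")]
    {
        use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
        // SSE is always available on x86_64, so no runtime detection is needed.
        while i + 4 <= a.len() {
            unsafe {
                // Unaligned loads/stores: slices carry no 16-byte alignment guarantee.
                let va = _mm_loadu_ps(a.as_ptr().add(i));
                let vb = _mm_loadu_ps(b.as_ptr().add(i));
                _mm_storeu_ps(out.as_mut_ptr().add(i), _mm_add_ps(va, vb));
            }
            i += 4;
        }
    }
    // Scalar loop for the remaining elements (or all of them on other targets).
    while i < a.len() {
        out[i] = a[i] + b[i];
        i += 1;
    }
}
```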


[jira] [Created] (ARROW-4683) [Rust] Enable "#![deny(missing_docs)]"

2019-02-26 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4683:
--

 Summary: [Rust] Enable "#![deny(missing_docs)]"
 Key: ARROW-4683
 URL: https://issues.apache.org/jira/browse/ARROW-4683
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan


We are moving fast with the Rust implementation, and the docs are sometimes
neglected. The project is starting to become useful, so we should ensure that
the docs are up to scratch to avoid hurting adoption.
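
For reference, the lint is enabled with a crate-level inner attribute at the
top of `lib.rs`; once set, any public item without a doc comment (including
the crate itself) fails to compile. A minimal sketch with a hypothetical `sum`
function:

```rust
//! Crate-level documentation is also required once the lint is enabled.
#![deny(missing_docs)]

/// Returns the sum of all values in `values`.
/// Removing this doc comment would turn the build into an error.
pub fn sum(values: &[i64]) -> i64 {
    values.iter().sum()
}
```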


[jira] [Created] (ARROW-4752) [Rust] Add explicit SIMD vectorization for the divide kernel

2019-03-03 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4752:
--

 Summary: [Rust] Add explicit SIMD vectorization for the divide kernel
 Key: ARROW-4752
 URL: https://issues.apache.org/jira/browse/ARROW-4752
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 0.13.0


[jira] [Created] (ARROW-4656) [Rust] Implement CSV Writer

2019-02-21 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4656:
--

 Summary: [Rust] Implement CSV Writer
 Key: ARROW-4656
 URL: https://issues.apache.org/jira/browse/ARROW-4656
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan


[jira] [Created] (ARROW-4513) [Rust] Implement BitAnd/BitOr for and

2019-02-08 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4513:
--

 Summary: [Rust] Implement BitAnd/BitOr for  and
 Key: ARROW-4513
 URL: https://issues.apache.org/jira/browse/ARROW-4513
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 0.13.0


[jira] [Created] (ARROW-4591) [Rust] Add explicit SIMD vectorization for aggregation ops in "array_ops"

2019-02-16 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4591:
--

 Summary: [Rust] Add explicit SIMD vectorization for aggregation ops in "array_ops"
 Key: ARROW-4591
 URL: https://issues.apache.org/jira/browse/ARROW-4591
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 0.13.0


[jira] [Created] (ARROW-4590) [Rust] Add explicit SIMD vectorization for comparison ops in "array_ops"

2019-02-16 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4590:
--

 Summary: [Rust] Add explicit SIMD vectorization for comparison ops in "array_ops"
 Key: ARROW-4590
 URL: https://issues.apache.org/jira/browse/ARROW-4590
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 0.13.0


[jira] [Created] (ARROW-4586) [Rust] Remove arrow/mod.rs as it is not needed

2019-02-15 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4586:
--

 Summary: [Rust] Remove arrow/mod.rs as it is not needed
 Key: ARROW-4586
 URL: https://issues.apache.org/jira/browse/ARROW-4586
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 0.13.0


[jira] [Created] (ARROW-4488) [Rust] From AsRef<[u8]> for Buffer does not ensure correct padding

2019-02-05 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4488:
--

 Summary: [Rust] From AsRef<[u8]> for Buffer does not ensure correct padding
 Key: ARROW-4488
 URL: https://issues.apache.org/jira/browse/ARROW-4488
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Paddy Horan
Assignee: Paddy Horan
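
The Arrow columnar format recommends padding buffers to a multiple of 8 or 64
bytes; this sketch assumes 64. It shows a padded copy from any `AsRef<[u8]>`
source into a plain `Vec<u8>`, with the extra bytes zeroed. `PADDING`,
`padded_len`, and `padded_copy` are hypothetical names, not the crate's
`Buffer` API.

```rust
/// Assumed padding target (bytes).
const PADDING: usize = 64;

/// Round `len` up to the next multiple of `PADDING`.
fn padded_len(len: usize) -> usize {
    (len + PADDING - 1) / PADDING * PADDING
}

/// Copy `data` into a new vector whose length is a multiple of `PADDING`,
/// filling the padding bytes with zeros.
fn padded_copy<T: AsRef<[u8]>>(data: T) -> Vec<u8> {
    let bytes = data.as_ref();
    let mut out = vec![0u8; padded_len(bytes.len())];
    out[..bytes.len()].copy_from_slice(bytes);
    out
}
```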


[jira] [Created] (ARROW-4489) [Rust] PrimitiveArray.value_slice performs bounds checking when it should not

2019-02-05 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4489:
--

 Summary: [Rust] PrimitiveArray.value_slice performs bounds checking when it should not
 Key: ARROW-4489
 URL: https://issues.apache.org/jira/browse/ARROW-4489
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 0.13.0
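
The usual Rust convention behind this issue and ARROW-3776 is to offer a
checked accessor alongside a separate `unsafe` one that skips the check,
leaving the caller responsible for staying in bounds. A minimal sketch over a
plain slice; `Values` and both methods are hypothetical, not the crate's
`PrimitiveArray` API.

```rust
/// Hypothetical wrapper around a contiguous value buffer.
struct Values<'a> {
    data: &'a [i32],
}

impl<'a> Values<'a> {
    /// Checked: panics if `offset + len` exceeds the buffer.
    fn value_slice(&self, offset: usize, len: usize) -> &'a [i32] {
        let data = self.data;
        &data[offset..offset + len]
    }

    /// Unchecked: performs no bounds check.
    ///
    /// # Safety
    /// The caller must guarantee `offset + len <= self.data.len()`.
    unsafe fn value_slice_unchecked(&self, offset: usize, len: usize) -> &'a [i32] {
        unsafe { std::slice::from_raw_parts(self.data.as_ptr().add(offset), len) }
    }
}
```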


[jira] [Created] (ARROW-4490) [Rust] Add explicit SIMD vectorization for boolean ops in "array_ops"

2019-02-05 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-4490:
--

 Summary: [Rust] Add explicit SIMD vectorization for boolean ops in "array_ops"
 Key: ARROW-4490
 URL: https://issues.apache.org/jira/browse/ARROW-4490
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 0.13.0