[jira] [Created] (ARROW-7542) [CI][C++] nproc isn't available on macOS

2020-01-09 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-7542:
---

 Summary: [CI][C++] nproc isn't available on macOS
 Key: ARROW-7542
 URL: https://issues.apache.org/jira/browse/ARROW-7542
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Continuous Integration
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


https://github.com/apache/arrow/runs/38286#step:5:32

{noformat}
ci/scripts/cpp_test.sh: line 31: nproc: command not found
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

2020-01-09 Thread Micah Kornfield
Hi Wes,
I'm still interested in doing the work.  But I don't want to hold anybody up
if they have bandwidth.

In order to actually make progress on this, my plan will be to:
1.  Help with the current Java review backlog through early next week or so
(this has been taking the majority of my time allocated for Arrow
contributions for the last 6 months or so).
2.  Shift all my attention to trying to get this done (this means no
reviews other than closing out existing ones that I've started until it is
done).  Hopefully, other Java committers can help shrink the backlog
further (Jacques, thanks for your recent efforts here).

Thanks,
Micah

On Thu, Jan 9, 2020 at 8:16 AM Wes McKinney  wrote:

> hi folks,
>
> I think we have reached a point where the incomplete C++ Parquet
> nested data assembly/disassembly is harming the value of several
> other parts of the project, for example the Datasets API. As another
> example, it's possible to ingest nested data from JSON but not write
> it to Parquet in general.
>
> Implementing the nested data read and write path completely is a
> difficult project requiring at least several weeks of dedicated work,
> so it's not so surprising that it hasn't been accomplished yet. I know
> that several people have expressed interest in working on it, but I
> would like to see if anyone would be able to volunteer a commitment of
> time and guess on a rough timeline when this work could be done. It
> seems to me if this slips beyond 2020 it will significantly diminish the
> value being created by other parts of the project.
>
> Since I'm pretty familiar with all the Parquet code I'm one candidate
> person to take on this project (and I can dedicate the time, but it
> would come at the expense of other projects where I can also be
> useful). But Micah and others expressed interest in working on it, so
> I wanted to have a discussion about it to see what others think.
>
> Thanks
> Wes
>


[jira] [Created] (ARROW-7541) [GLib] Install license files

2020-01-09 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-7541:
---

 Summary: [GLib] Install license files
 Key: ARROW-7541
 URL: https://issues.apache.org/jira/browse/ARROW-7541
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-7540) [C++] License files aren't installed

2020-01-09 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-7540:
---

 Summary: [C++] License files aren't installed
 Key: ARROW-7540
 URL: https://issues.apache.org/jira/browse/ARROW-7540
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-7539) [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-01-09 Thread Ji Liu (Jira)
Ji Liu created ARROW-7539:
-

 Summary: [Java] FieldVector getFieldBuffers API should not set 
reader/writer indices
 Key: ARROW-7539
 URL: https://issues.apache.org/jira/browse/ARROW-7539
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


Per discussion 
[https://github.com/apache/arrow/pull/6133#discussion_r364906302].

The fact that we have reader/writer settings in {{getFieldBuffers}} is wrong. 
To clarify, {{getFieldBuffers}} is distinct from {{getBuffers}}. The former 
should be for getting access to underlying data for higher-performance 
algorithms. The latter is for sending the data over the wire. Seems we've mixed 
up use of both.

 

Currently {{VectorUnloader}} uses {{getFieldBuffers}} to create the 
{{ArrowRecordBatch}}, which is why {{getFieldBuffers}} sets the writer/reader 
indices. {{VectorUnloader}} should use {{getBuffers}} instead.





[jira] [Created] (ARROW-7538) Clarify actual and desired size in AllocationManager

2020-01-09 Thread David Li (Jira)
David Li created ARROW-7538:
---

 Summary: Clarify actual and desired size in AllocationManager
 Key: ARROW-7538
 URL: https://issues.apache.org/jira/browse/ARROW-7538
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


As a follow up to the review of ARROW-7329, we should clarify the different 
sizes (desired vs actual size) in AllocationManager: 
https://github.com/apache/arrow/pull/5973#discussion_r354729754





[Discuss][Rust] Policy regarding "unsafe"

2020-01-09 Thread paddy horan
Hi All,

This time last year there was a brief discussion on the usage of unsafe in Rust 
(a user on github raised the issue and I created the JIRA). [1]

So far we mostly avoid unsafe in the public APIs.  The thinking here is that 
Arrow is a "development platform", i.e. lower level than most libraries, and 
library builders will want to avoid any performance hit of bounds checking, etc.

This is not typical in the Rust community where unsafe is a clear signal that 
care is needed.  Although it might clutter the API a little more I would be in 
favor of having safe and unsafe variants of methods as needed.  For instance, 
"value" for array access would be changed to "value" and "value_unchecked" 
where the latter is unsafe and does not perform bounds checks.
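
As a rough sketch of what such a pair could look like (this uses a 
hypothetical Int32Array wrapper around a Vec<i32>, not the actual arrow crate 
type; the names new, len and sum are illustrative assumptions only):

```rust
/// Hypothetical stand-in for an Arrow primitive array (assumption: the
/// real arrow crate type has a different layout and constructor).
pub struct Int32Array {
    values: Vec<i32>,
}

impl Int32Array {
    pub fn new(values: Vec<i32>) -> Self {
        Self { values }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Safe accessor: bounds-checked, panics if `i >= self.len()`.
    pub fn value(&self, i: usize) -> i32 {
        self.values[i]
    }

    /// Unsafe accessor: skips the bounds check.
    ///
    /// # Safety
    /// The caller must guarantee that `i < self.len()`.
    pub unsafe fn value_unchecked(&self, i: usize) -> i32 {
        // SAFETY: the caller upholds `i < self.len()`.
        unsafe { *self.values.get_unchecked(i) }
    }
}

/// Example of a library-level kernel that opts into the unchecked variant.
pub fn sum(array: &Int32Array) -> i64 {
    let mut total = 0i64;
    for i in 0..array.len() {
        // SAFETY: `i` is always below `array.len()` because of the loop bound.
        total += i64::from(unsafe { array.value_unchecked(i) });
    }
    total
}
```

Casual users would call value and never see unsafe, while library builders 
who have already validated their indices can opt into value_unchecked and 
take on the documented safety contract.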

We don't have a huge number of libraries building on top of Arrow in Rust at 
the moment, so it seems like a good time, before 1.0, to decide on this and avoid 
breaking changes to the public API after 1.0.

Thoughts?

Paddy

[1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557



Re: Timeline for next major release [was Re: Looking to 1.0]

2020-01-09 Thread Jacques Nadeau
Understood and appreciated. Yeah, it can become a bit of a mess.

On Thu, Jan 9, 2020 at 12:22 PM Wes McKinney  wrote:

> Will do -- there were many C++ and Python-related issues that I think
> were put in 1.0.0 / 0.16.0 overly optimistically and so I removed the
> Fix Version entirely (some of these had been pushed off 3-4 major
> releases ago). I may have removed some Fix Versions from other
> components that should have been rolled over -- sorry about that. It's
> hard to judge on some issues that have been open for 6-12 months or
> more.
>
> In general I think we should try to be more conservative about what
> issues we pre-emptively assign fix versions -- there may be a more
> constructive way that we can prioritize issues and distinguish between
> "optimistic" / nice-to-have issues and "must do to release" issues.
>
> On Thu, Jan 9, 2020 at 12:42 PM Jacques Nadeau  wrote:
> >
> > It would be helpful that when something is assigned to a release and you
> > want to push it out, you push it to the next release as opposed to
> > removing a fix version entirely. Thanks!

[jira] [Created] (ARROW-7537) [CI][R] Nightly macOS autobrew job should be more verbose if it fails

2020-01-09 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7537:
--

 Summary: [CI][R] Nightly macOS autobrew job should be more verbose 
if it fails
 Key: ARROW-7537
 URL: https://issues.apache.org/jira/browse/ARROW-7537
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.16.0


Things like https://travis-ci.org/ursa-labs/crossbow/builds/634643469#L673-L676 
are hard to debug because the installation log is not printed.





[jira] [Created] (ARROW-7536) [Java] [Dev] `docker-compose pull debian-java` fails

2020-01-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7536:
-

 Summary: [Java] [Dev] `docker-compose pull debian-java` fails
 Key: ARROW-7536
 URL: https://issues.apache.org/jira/browse/ARROW-7536
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools, Java
Reporter: Antoine Pitrou
Assignee: Krisztian Szucs


I get the following error here:
{code}
$ docker-compose pull debian-java
Pulling debian-java ... error

ERROR: for debian-java  manifest for apache/arrow-dev:amd64-debian-9-java-8-maven-3.5.4 not found
ERROR: manifest for apache/arrow-dev:amd64-debian-9-java-8-maven-3.5.4 not found
{code}





Re: Timeline for next major release [was Re: Looking to 1.0]

2020-01-09 Thread Wes McKinney
Will do -- there were many C++ and Python-related issues that I think
were put in 1.0.0 / 0.16.0 overly optimistically and so I removed the
Fix Version entirely (some of these had been pushed off 3-4 major
releases ago). I may have removed some Fix Versions from other
components that should have been rolled over -- sorry about that. It's
hard to judge on some issues that have been open for 6-12 months or
more.

In general I think we should try to be more conservative about what
issues we pre-emptively assign fix versions -- there may be a more
constructive way that we can prioritize issues and distinguish between
"optimistic" / nice-to-have issues and "must do to release" issues.

On Thu, Jan 9, 2020 at 12:42 PM Jacques Nadeau  wrote:
>
> It would be helpful that when something is assigned to a release and you
> want to push it out, you push it to the next release as opposed to removing
> a fix version entirely. Thanks!

Re: Timeline for next major release [was Re: Looking to 1.0]

2020-01-09 Thread Jacques Nadeau
It would be helpful that when something is assigned to a release and you
want to push it out, you push it to the next release as opposed to removing
a fix version entirely. Thanks!

On Tue, Jan 7, 2020 at 10:26 AM Wes McKinney  wrote:

> I just renamed the 1.0.0 release version in JIRA to 0.16.0 and will
> work on removing issues that are not necessary to be able to release
> (others, please help). If we make miraculous progress with the 1.0.0
> columnar format blockers (per discussion below), we can change this
> back, but I think either way we should put ourselves on a critical
> path to have an RC cut by Friday January 24. Does that seem doable?
>
> On Tue, Jan 7, 2020 at 10:25 AM Wes McKinney  wrote:
> >
> > We absolutely should have a list of exactly what needs to be done to
> > put out the 1.0.0 release, but based on what we know needs to be done
> > I am not optimistic that it can all be accomplished before the end of
> > January. That doesn't mean that we should assume these things won't
> > get done before March/April time frame. If they get done sooner, let's
> > release 1.0.0 sooner.
> >
> > On Mon, Jan 6, 2020 at 6:03 PM Neal Richardson wrote:
> > >
> > > I'm all for maintaining a regular cadence of releases, but before we
> > > cast aside the idea of 1.0, I'd still encourage us to do the work of
> > > enumerating what truly must happen before we call a release 1.0 so
> > > that we can get it done. Otherwise, in April we're going to be talking
> > > about doing a 0.17 release.
> > >
> > > I believe I've found the issues that Wes referenced and added them as
> > > "blockers" to 1.0.0. That brings the total blocker count listed on
> > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+1.0.0+Release
> > > to 10 issues, though some may be overlapping/redundant. Do we think
> > > this is an exhaustive list of blockers? Should some of these be
> > > downgraded to not-blocking? If we were to resolve all 10 of these
> > > issues, would we have consensus that we're ready for 1.0?
> > >
> > > Would it help to update this wiki, which seems pretty stale at this
> > > point?
> > > https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone
> > >
> > > Thanks,
> > > Neal
> > >
> > >
> > > On Mon, Jan 6, 2020 at 11:40 AM Bryan Cutler wrote:
> > >
> > > > I agree on a 0.16.0 release. In the meantime I'll try to help out
> > > > with getting the Java side ready for 1.0.
> > > >
> > > > On Sat, Jan 4, 2020 at 7:21 PM Fan Liya wrote:
> > > >
> > > > > Hi Jacques,
> > > > >
> > > > > ARROW-4526 is interesting. I would like to try to resolve it.
> > > > > Thanks a lot for the information.
> > > > >
> > > > > Best,
> > > > > Liya Fan
> > > > >
> > > > >
> > > > > On Sun, Jan 5, 2020 at 6:14 AM Jacques Nadeau wrote:
> > > > >
> > > > > > The third ticket I was commenting on was ARROW-4526.
> > > > > >
> > > > > > Fan, do you want to take a shot at that one?
> > > > > >
> > > > > > On Fri, Jan 3, 2020 at 8:16 PM Fan Liya wrote:
> > > > > >
> > > > > > > Hi Jacques,
> > > > > > >
> > > > > > > I am interested in the issues, and if it is possible, I would
> > > > > > > like to try to resolve them.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > Liya Fan
> > > > > > >
> > > > > > > On Sat, Jan 4, 2020 at 7:16 AM Jacques Nadeau <jacq...@apache.org> wrote:
> > > > > > >
> > > > > > > > I identified three things in the java library that I think
> > > > > > > > are top of mind and should be fixed before 1.0 to avoid weird
> > > > > > > > incompatibility changes in the java apis (technical debt).
> > > > > > > > I've tagged them as pre-1.0 as I don't exactly see what is
> > > > > > > > the right way to tag/label a target release for a ticket.
> > > > > > > >
> > > > > > > > https://issues.apache.org/jira/browse/ARROW-7495?jql=labels%20%3D%20pre-1.0
> > > > > > > >
> > > > > > > > For the three tickets I identified, does anyone have interest
> > > > > > > > in trying to resolve?
> > > > > > > >
> > > > > > > > thanks,
> > > > > > > > Jacques
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jan 2, 2020 at 11:55 AM Neal Richardson <neal.p.richard...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > > Happy new year! As we look ahead to 2020, it's time to start
> > > > > > > > > mobilizing for the Arrow 1.0 release. At 0.15, I believe we
> > > > > > > > > decided that our next release should be 1.0, and it's been a
> > > > > > > > > couple of months since 0.15, so we're due to release again
> > > > > > > > > this month, give or take. (See [1] for when we most recently
> > > > > > > > > discussed doing 1.0 back in June, or if 

[jira] [Created] (ARROW-7534) Create a new java/contrib module

2020-01-09 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7534:
-

 Summary: Create a new java/contrib module
 Key: ARROW-7534
 URL: https://issues.apache.org/jira/browse/ARROW-7534
 Project: Apache Arrow
  Issue Type: Task
Reporter: Jacques Nadeau
Assignee: Liya Fan


To better clarify the status of java sub-modules, create a contrib module and 
move the following modules underneath it.

* algorithm
* adapter
* plasma





[jira] [Created] (ARROW-7533) [Java] Move ArrowBufPointer out of the Java memory package

2020-01-09 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7533:
-

 Summary: [Java] Move ArrowBufPointer out of the Java memory package
 Key: ARROW-7533
 URL: https://issues.apache.org/jira/browse/ARROW-7533
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Reporter: Jacques Nadeau
Assignee: Liya Fan


The memory package is focused on memory access and management. ArrowBufPointer 
should be moved to algorithm package as it isn't core to the Arrow memory 
management primitives. I would further suggest that it is an anti-pattern.





[jira] [Created] (ARROW-7532) [CI] Unskip brew test after Homebrew fixes it upstream

2020-01-09 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7532:
--

 Summary: [CI] Unskip brew test after Homebrew fixes it upstream
 Key: ARROW-7532
 URL: https://issues.apache.org/jira/browse/ARROW-7532
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson


Followup to ARROW-7492. See https://github.com/Homebrew/brew/issues/6908.





Re: Pending Java pull requests

2020-01-09 Thread Micah Kornfield
My time has been more limited lately, but I'll try to work through some of
these as well over the next couple of days.

On Thu, Jan 9, 2020 at 8:44 AM Jacques Nadeau  wrote:

> I think there are a decent chunk that are of questionable value. We need to
> be more willing to simply reject requests rather than leave them in
> no-man's land. I'll try to do a pass through and help dispatch, etc.
>
> On Thu, Jan 9, 2020 at 5:25 AM Krisztián Szűcs wrote:
>
> > Hi,
> >
> > Roughly 40% of the pending pull requests are tagged as Java [1].
> > Some of those have long threads and some of them have not been
> > reviewed yet. Considering the upcoming release it would be great
> > to close or proceed with them.
> > So any additional help from Java developers would be appreciated!
> >
> > Thanks, Krisztian
> >
> > [1]: https://github.com/apache/arrow/pulls?q=is%3Apr+is%3Aopen+java
> >
>


Re: [DRAFT] Apache Arrow Board Report January 2020

2020-01-09 Thread Jacques Nadeau
Posted with correction. Thanks to Wes, Antoine and Todd!

On Wed, Jan 8, 2020 at 10:15 AM Wes McKinney  wrote:

> Not sure what happened there. The two words after "grow" can be removed
>
> ## Description:
>
> The mission of Apache Arrow is the creation and maintenance of software
> related to columnar in-memory processing and data interchange
>
> ## Issues:
>
> There are no issues requiring board attention at this time.
>
> ## Membership Data:
> Apache Arrow was founded 2016-01-19 (4 years ago)
> There are currently 50 committers and 28 PMC members in this project.
> The Committer-to-PMC ratio is roughly 7:4.
>
> Community changes, past quarter:
> - No new PMC members. Last addition was Micah Kornfield on 2019-08-21.
> - Eric Erhardt was added as committer on 2019-10-18
> - Joris Van den Bossche was added as committer on 2019-12-06
>
> ## Project Activity:
>
> * We have completed our initial migration away from Travis CI for
>   continuous integration and patch validation to use the new
>   GitHub Actions (GHA) service. We are much happier with the
>   compute resource allocation provided by GitHub but longer term
>   we are concerned that the generous free allocation may not
>   continue and would be interested to know what kinds of
>   guarantees (if any) GitHub may make to the ASF regarding GHA.
> * We are not out of the woods on CI/CD as there are features of Apache
>   Arrow that we cannot test in GitHub Actions. We are still considering
>   options for running these optional test workloads as well as other kinds
>   of periodic workloads like benchmarking
> * We hope to make a 1.0.0 release of the project in early 2020. We had
>   thought that our next major release after 0.15.0 would be 1.0.0 but we
>   have not yet completed some necessary work items that the community has
>   agreed are essential to graduate to 1.0.0
>
> Recent releases:
> 0.15.0 was released on 2019-10-05.
> 0.14.1 was released on 2019-07-21.
> 0.14.0 was released on 2019-07-04.
>
> ## Community Health:
>
> The developer community is healthy and continues to grow.
>
> On Wed, Jan 8, 2020 at 12:12 PM Todd Hendricks wrote:
> >
> > Hi Wes,
> >
> > Looks like there is a cutoff sentence at the end of the Community Health
> > section.
> >
> > On Wed, Jan 8, 2020 at 10:01 AM Wes McKinney wrote:
> >
> > > Here is an updated draft. If there is no more feedback, this can be
> > > submitted to the board
> > >
> > > ## Description:
> > >
> > > The mission of Apache Arrow is the creation and maintenance of software
> > > related to columnar in-memory processing and data interchange
> > >
> > > ## Issues:
> > >
> > > There are no issues requiring board attention at this time.
> > >
> > > ## Membership Data:
> > > Apache Arrow was founded 2016-01-19 (4 years ago)
> > > There are currently 50 committers and 28 PMC members in this project.
> > > The Committer-to-PMC ratio is roughly 7:4.
> > >
> > > Community changes, past quarter:
> > > - No new PMC members. Last addition was Micah Kornfield on 2019-08-21.
> > > - Eric Erhardt was added as committer on 2019-10-18
> > > - Joris Van den Bossche was added as committer on 2019-12-06
> > >
> > > ## Project Activity:
> > >
> > > * We have completed our initial migration away from Travis CI for
> > >   continuous integration and patch validation to use the new
> > >   GitHub Actions (GHA) service. We are much happier with the
> > >   compute resource allocation provided by GitHub but longer term
> > >   we are concerned that the generous free allocation may not
> > >   continue and would be interested to know what kinds of
> > >   guarantees (if any) GitHub may make to the ASF regarding GHA.
> > > * We are not out of the woods on CI/CD as there are features of Apache
> > >   Arrow that we cannot test in GitHub Actions. We are still considering
> > >   options for running these optional test workloads as well as other
> > >   kinds of periodic workloads like benchmarking
> > > * We hope to make a 1.0.0 release of the project in early 2020. We had
> > >   thought that our next major release after 0.15.0 would be 1.0.0 but we
> > >   have not yet completed some necessary work items that the community
> > >   has agreed are essential to graduate to 1.0.0
> > >
> > > Recent releases:
> > > 0.15.0 was released on 2019-10-05.
> > > 0.14.1 was released on 2019-07-21.
> > > 0.14.0 was released on 2019-07-04.
> > >
> > > ## Community Health:
> > >
> > > The developer community is healthy and continues to grow.THe co
> > >
> > > On Mon, Jan 6, 2020 at 11:16 AM Antoine Pitrou wrote:
> > > >
> > > >
> > > > Perhaps also mention that we're dependent on enough capacity on
> > > > GitHub Actions currently.  I'm not sure how long their generosity
> > > > will last :-)
> > > >
> > > >
> > > > Le 06/01/2020 à 18:14, Wes McKinney a écrit :
> > > > > There is still the question of how to manage CI tasks (e.g.
> > > > > GPU-enabled, ARM-enabled) that are unable to be run 

Re: Pending Java pull requests

2020-01-09 Thread Jacques Nadeau
I think there are a decent chunk that are of questionable value. We need to
be more willing to simply reject requests rather than leave them in
no-man's land. I'll try to do a pass through and help dispatch, etc.

On Thu, Jan 9, 2020 at 5:25 AM Krisztián Szűcs wrote:

> Hi,
>
> Roughly 40% of the pending pull requests are tagged as Java [1].
> Some of those have long threads and some of them have not been
> reviewed yet. Considering the upcoming release it would be great
> to close or proceed with them.
> So any additional help from Java developers would be appreciated!
>
> Thanks, Krisztian
>
> [1]: https://github.com/apache/arrow/pulls?q=is%3Apr+is%3Aopen+java
>


[jira] [Created] (ARROW-7531) [C++] Investigate header cost reduction

2020-01-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7531:
-

 Summary: [C++] Investigate header cost reduction
 Key: ARROW-7531
 URL: https://issues.apache.org/jira/browse/ARROW-7531
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou


Using https://github.com/aras-p/ClangBuildAnalyzer we could find out 
the worst offenders in terms of header file parsing cost when compiling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

2020-01-09 Thread Wes McKinney
hi folks,

I think we have reached a point where the incomplete C++ Parquet
nested data assembly/disassembly is harming the value of several
other parts of the project, for example the Datasets API. As another
example, it's possible to ingest nested data from JSON but not write
it to Parquet in general.

Implementing the nested data read and write path completely is a
difficult project requiring at least several weeks of dedicated work,
so it's not so surprising that it hasn't been accomplished yet. I know
that several people have expressed interest in working on it, but I
would like to see if anyone would be able to volunteer a commitment of
time and estimate a rough timeline for when this work could be done. It
seems to me that if this slips beyond 2020 it will significantly diminish
the value being created by other parts of the project.

Since I'm pretty familiar with all the Parquet code I'm one candidate
person to take on this project (and I can dedicate the time, but it
would come at the expense of other projects where I can also be
useful). But Micah and others expressed interest in working on it, so
I wanted to have a discussion about it to see what others think.

Thanks
Wes


Re: Human-readable version of Arrow Schema?

2020-01-09 Thread Francois Saint-Jacques
The desired goal for this feature is trivial modifications, e.g.
within an editor, by data-scientists and researchers.

I'd go for the Flatbuffers JSON representation, as it is stable and
has native support in almost any language or editor due to the
ubiquity of JSON. The C interface schema string representation is
optimized for developers writing parsers/codecs and looks like
gibberish to anyone not familiar with Python's struct format string.
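
To illustrate why a JSON form suits hand editing, here is a minimal
sketch using only the Python standard library. The key names ("fields",
"name", "type", "nullable") and the helper functions are hypothetical
and chosen for illustration; they are not the actual Flatbuffers JSON
layout nor any Arrow API.

```python
import json

# Hypothetical hand-editable schema text; the key names are assumptions,
# not the real Flatbuffers JSON output.
schema_text = """
{
  "fields": [
    {"name": "id", "type": "int32", "nullable": true},
    {"name": "a", "type": "float64", "nullable": true}
  ]
}
"""


def load_schema(text):
    """Parse the JSON text into a list of (name, type, nullable) tuples."""
    doc = json.loads(text)
    return [(f["name"], f["type"], f.get("nullable", True)) for f in doc["fields"]]


def dump_schema(fields):
    """Serialize (name, type, nullable) tuples back to indented JSON text."""
    doc = {
        "fields": [
            {"name": name, "type": type_name, "nullable": nullable}
            for name, type_name, nullable in fields
        ]
    }
    return json.dumps(doc, indent=2)


fields = load_schema(schema_text)
# A "trivial modification" such as renaming a column survives a round trip.
fields[1] = ("alpha", "float64", True)
round_tripped = load_schema(dump_schema(fields))
```

Any editor with JSON support would give syntax checking for free on
such a file, which is the main attraction over a custom text format.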

François


On Wed, Jan 8, 2020 at 8:50 PM Kohei KaiGai  wrote:
>
> Hello,
>
> pg2arrow [*1] has '--dump' mode to print out schema definition of the
> given Apache Arrow file.
> Does it make sense for you?
>
> $ ./pg2arrow --dump ~/hoge.arrow
> [Footer]
> {Footer: version=V4, schema={Schema: endianness=little,
> fields=[{Field: name="id", nullable=true, type={Int32}, children=[],
> custom_metadata=[]}, {Field: name="a", nullable=true, type={Float64},
> children=[], custom_metadata=[]}, {Field: name="b", nullable=true,
> type={Decimal: precision=11, scale=7}, children=[],
> custom_metadata=[]}, {Field: name="c", nullable=true, type={Struct},
> children=[{Field: name="x", nullable=true, type={Int32}, children=[],
> custom_metadata=[]}, {Field: name="y", nullable=true, type={Float32},
> children=[], custom_metadata=[]}, {Field: name="z", nullable=true,
> type={Utf8}, children=[], custom_metadata=[]}], custom_metadata=[]},
> {Field: name="d", nullable=true, type={Utf8},
> dictionary={DictionaryEncoding: id=0, indexType={Int32},
> isOrdered=false}, children=[], custom_metadata=[]}, {Field: name="e",
> nullable=true, type={Timestamp: unit=us}, children=[],
> custom_metadata=[]}, {Field: name="f", nullable=true, type={Utf8},
> children=[], custom_metadata=[]}, {Field: name="random",
> nullable=true, type={Float64}, children=[], custom_metadata=[]}],
> custom_metadata=[{KeyValue: key="sql_command" value="SELECT *,random()
> FROM t"}]}, dictionaries=[{Block: offset=920, metaDataLength=184
> bodyLength=128}], recordBatches=[{Block: offset=1232,
> metaDataLength=648 bodyLength=386112}]}
> [Dictionary Batch 0]
> {Block: offset=920, metaDataLength=184 bodyLength=128}
> {Message: version=V4, body={DictionaryBatch: id=0, data={RecordBatch:
> length=6, nodes=[{FieldNode: length=6, null_count=0}],
> buffers=[{Buffer: offset=0, length=0}, {Buffer: offset=0, length=64},
> {Buffer: offset=64, length=64}]}, isDelta=false}, bodyLength=128}
> [Record Batch 0]
> {Block: offset=1232, metaDataLength=648 bodyLength=386112}
> {Message: version=V4, body={RecordBatch: length=3000,
> nodes=[{FieldNode: length=3000, null_count=0}, {FieldNode:
> length=3000, null_count=60}, {FieldNode: length=3000, null_count=62},
> {FieldNode: length=3000, null_count=0}, {FieldNode: length=3000,
> null_count=56}, {FieldNode: length=3000, null_count=66}, {FieldNode:
> length=3000, null_count=0}, {FieldNode: length=3000, null_count=0},
> {FieldNode: length=3000, null_count=64}, {FieldNode: length=3000,
> null_count=0}, {FieldNode: length=3000, null_count=0}],
> buffers=[{Buffer: offset=0, length=0}, {Buffer: offset=0,
> length=12032}, {Buffer: offset=12032, length=384}, {Buffer:
> offset=12416, length=24000}, {Buffer: offset=36416, length=384},
> {Buffer: offset=36800, length=48000}, {Buffer: offset=84800,
> length=0}, {Buffer: offset=84800, length=384}, {Buffer: offset=85184,
> length=12032}, {Buffer: offset=97216, length=384}, {Buffer:
> offset=97600, length=12032}, {Buffer: offset=109632, length=0},
> {Buffer: offset=109632, length=12032}, {Buffer: offset=121664,
> length=96000}, {Buffer: offset=217664, length=0}, {Buffer:
> offset=217664, length=12032}, {Buffer: offset=229696, length=384},
> {Buffer: offset=230080, length=24000}, {Buffer: offset=254080,
> length=0}, {Buffer: offset=254080, length=12032}, {Buffer:
> offset=266112, length=96000}, {Buffer: offset=362112, length=0},
> {Buffer: offset=362112, length=24000}]}, bodyLength=386112}
>
> [*1] https://heterodb.github.io/pg-strom/arrow_fdw/#using-pg2arrow
>
> On Sat, Dec 7, 2019 at 6:26, Christian Hudon wrote:
> >
> > Hi,
> >
> > For the uses I would like to make of Arrow, I would need a human-readable
> > and -writable version of an Arrow Schema, that could be converted to and
> > from the Arrow Schema C++ object. Going through the doc for 0.15.1, I don't
> > see anything to that effect, with the closest being the ToString() method
> > on DataType instances, but which is meant for debugging only. (I need an
> > expression of an Arrow Schema that people can read, and that can live
> > outside of the code for a particular operation.)
> >
> > Is a text representation of an Arrow Schema something that is being worked
> > on now? If not, would you folks be interested in me putting up an initial
> > proposal for discussion? Any design constraints I should pay attention to,
> > then?
> >
> > Thanks,
> >
> >   Christian
> > --
> >
> >
> > │ Christian Hudon
> >
> > │ Applied Research Scientist
> >
> >Element AI, 6650 Saint-Urbain #500
> >
> >Montréal, QC, H2S 3G9, Canada
> 

[NIGHTLY] Arrow Build Report for Job nightly-2020-01-09-0

2020-01-09 Thread Crossbow


Arrow Build Report for Job nightly-2020-01-09-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0

Failed Tasks:
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-travis-gandiva-jar-osx
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-travis-macos-r-autobrew
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-pandas-master
- wheel-manylinux2010-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-wheel-manylinux2010-cp38

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-centos-7
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-conda-win-vs2015-py38
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-azure-debian-stretch
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-travis-gandiva-jar-trusty
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.7
- test-conda-python-3.8-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-09-0-circle-test-conda-python-3.8-dask-master
- 

[jira] [Created] (ARROW-7530) [Developer] Do not include list of commits from PR in squashed summary message

2020-01-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7530:
---

 Summary: [Developer] Do not include list of commits from PR in 
squashed summary message
 Key: ARROW-7530
 URL: https://issues.apache.org/jira/browse/ARROW-7530
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Wes McKinney
 Fix For: 1.0.0


We might assess whether these messages add useful information to the project's 
commit history. Other projects like Apache Spark have stopped preserving this 
information. This came up in https://github.com/apache/arrow/pull/6136





Pending Java pull requests

2020-01-09 Thread Krisztián Szűcs
Hi,

Roughly 40% of the pending pull requests are tagged as Java [1].
Some of those have long threads and some of them have not been
reviewed yet. Considering the upcoming release it would be great
to close or proceed with them.
So any additional help from Java developers would be appreciated!

Thanks, Krisztian

[1]: https://github.com/apache/arrow/pulls?q=is%3Apr+is%3Aopen+java


[jira] [Created] (ARROW-7529) [C++][Gandiva] Handle utf8 characters for castVARCHAR(string, int) function

2020-01-09 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-7529:
-

 Summary: [C++][Gandiva] Handle utf8 characters for 
castVARCHAR(string, int) function
 Key: ARROW-7529
 URL: https://issues.apache.org/jira/browse/ARROW-7529
 Project: Apache Arrow
  Issue Type: Task
  Components: C++ - Gandiva
Reporter: Projjal Chanda
Assignee: Projjal Chanda








[jira] [Created] (ARROW-7528) [Python] The pandas.datetime class (import of datetime.datetime) is deprecated

2020-01-09 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7528:


 Summary: [Python] The pandas.datetime class (import of 
datetime.datetime) is deprecated
 Key: ARROW-7528
 URL: https://issues.apache.org/jira/browse/ARROW-7528
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Joris Van den Bossche
Assignee: Joris Van den Bossche
 Fix For: 0.16.0


{{pd.datetime}} was just an import of {{datetime.datetime}} and is being 
removed from pandas; the stdlib class should be used directly instead.
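
A minimal sketch of the replacement, assuming affected code only
constructs datetime objects through the pandas alias (the variable
name and the example date are illustrative):

```python
# Before (deprecated alias):  ts = pd.datetime(2020, 1, 9, 8, 16)
# After: use the standard library class directly.
import datetime

ts = datetime.datetime(2020, 1, 9, 8, 16)
print(ts.isoformat())  # prints 2020-01-09T08:16:00
```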





[jira] [Created] (ARROW-7527) [Python] pandas/feather tests failing on pandas master

2020-01-09 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7527:


 Summary: [Python] pandas/feather tests failing on pandas master
 Key: ARROW-7527
 URL: https://issues.apache.org/jira/browse/ARROW-7527
 Project: Apache Arrow
  Issue Type: Test
  Components: Python
Reporter: Joris Van den Bossche


Because I merged a PR in pandas to support Period dtype, some tests in pyarrow 
are now failing (they were using period dtype to test "unsupported" dtypes)





[jira] [Created] (ARROW-7526) [C++][Compute]: Optimize small integer sorting

2020-01-09 Thread Yibo Cai (Jira)
Yibo Cai created ARROW-7526:
---

 Summary: [C++][Compute]: Optimize small integer sorting
 Key: ARROW-7526
 URL: https://issues.apache.org/jira/browse/ARROW-7526
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Compute
Reporter: Yibo Cai
Assignee: Yibo Cai


The current sorting kernel handles all data types with the STL's 
stable_sort. This is suboptimal for small integers like Int8, for which 
counting sort is more suitable.
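
To make the idea concrete, here is a rough Python sketch of a stable
counting sort that returns sorted indices for values in the Int8 range.
It is only an illustration of the technique; the function name and
signature are hypothetical and do not reflect the Arrow C++ kernel.

```python
def counting_sort_indices(values, min_value=-128, max_value=127):
    """Return indices that stably sort `values`, all within [min_value, max_value]."""
    # counts[b + 1] holds the number of elements that fall in bucket b.
    counts = [0] * (max_value - min_value + 2)
    for v in values:
        if not min_value <= v <= max_value:
            raise ValueError(f"value {v} is outside [{min_value}, {max_value}]")
        counts[v - min_value + 1] += 1

    # Prefix sum: counts[b] becomes the first output position of bucket b.
    for b in range(1, len(counts)):
        counts[b] += counts[b - 1]

    # Scatter the original indices in input order, which keeps ties stable.
    indices = [0] * len(values)
    for i, v in enumerate(values):
        b = v - min_value
        indices[counts[b]] = i
        counts[b] += 1
    return indices


data = [3, -1, 3, 0, -128]
order = counting_sort_indices(data)
print(order)  # prints [4, 1, 3, 0, 2]
```

The work is O(n + k) for n values and k = 256 possible buckets, with no
comparisons, which is why it can beat a comparison-based stable sort
when the value range is this small.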


