[DISCUSS] Possibility of 12.0.2 release

2023-06-23 Thread Bryan Cutler
Hi All,

I recently became aware of the CVE advisory
https://github.com/advisories/GHSA-6mjq-h674-j845 affecting the Netty Java
libraries. Moving to the fixed Netty release, 4.1.94.Final, required a
patch for Arrow, which has already been merged in
https://github.com/apache/arrow/issues/36209.

I know the freeze for 13.0.0 is not too far away, but I wanted to check
whether there is any interest in a 12.0.2 in the meantime, and whether there
are any other pending issues that would make the minor release worthwhile?
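
For anyone checking whether their build is already on the patched Netty, here is a rough sketch of comparing a Netty version string against the fixed release; `is_fixed` is an invented helper name, not part of any Arrow tooling, and `sort -V` does the version comparison:

```shell
# Sketch only: report whether a Netty version string is at or above
# 4.1.94.Final, the first release with the fix for GHSA-6mjq-h674-j845.
is_fixed() {
  v=${1%.Final}          # strip the ".Final" suffix
  fixed=4.1.94
  # version-sort the two strings; if the fixed version sorts first,
  # the candidate is at or above it
  [ "$(printf '%s\n%s\n' "$fixed" "$v" | sort -V | head -n1)" = "$fixed" ]
}

is_fixed 4.1.94.Final && echo "4.1.94.Final: patched"
is_fixed 4.1.93.Final || echo "4.1.93.Final: vulnerable"
```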

Thanks,
Bryan


Re: [VOTE] Release Apache Arrow 8.0.0 - RC3

2022-05-05 Thread Bryan Cutler
+1 (non-binding)

I ran:
TEST_DEFAULT=0 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1
ARROW_GANDIVA=OFF ARROW_PLASMA=OFF dev/release/verify-release-candidate.sh
8.0.0 3
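
The TEST_* variables act as opt-in switches once TEST_DEFAULT=0 disables everything. A minimal sketch of that gating pattern (illustrative only; the real verify-release-candidate.sh has more test groups and internals that may differ):

```shell
# Illustrative sketch of the env-var gating used by the verification
# script: TEST_DEFAULT=0 disables all test groups, then individual
# TEST_* flags opt specific groups back in.
TEST_DEFAULT=0
TEST_INTEGRATION_CPP=1
TEST_INTEGRATION_JAVA=1

# Flags not set by the caller inherit TEST_DEFAULT.
: "${TEST_WHEELS:=${TEST_DEFAULT}}"
: "${TEST_JARS:=${TEST_DEFAULT}}"

for group in INTEGRATION_CPP INTEGRATION_JAVA WHEELS JARS; do
  eval "enabled=\${TEST_${group}}"
  if [ "${enabled}" -gt 0 ]; then
    echo "would run: ${group}"
  fi
done
```

With the invocation above, only the two integration groups are reported as enabled.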

On Wed, May 4, 2022 at 3:23 PM Sutou Kouhei  wrote:

> +1
>
> I ran the followings on Debian GNU/Linux sid:
>
>   * TEST_DEFAULT=0 \
>   TEST_SOURCE=1 \
>   LANG=C \
>   TZ=UTC \
>   ARROW_CMAKE_OPTIONS="-DBoost_NO_BOOST_CMAKE=ON
> -DCUDAToolkit_ROOT=/usr" \
>   dev/release/verify-release-candidate.sh 8.0.0 3
>
>   * TEST_DEFAULT=0 \
>   TEST_APT=1 \
>   LANG=C \
>   dev/release/verify-release-candidate.sh 8.0.0 3
>
>   * TEST_DEFAULT=0 \
>   TEST_BINARY=1 \
>   LANG=C \
>   dev/release/verify-release-candidate.sh 8.0.0 3
>
>   * TEST_DEFAULT=0 \
>   TEST_JARS=1 \
>   LANG=C \
>   dev/release/verify-release-candidate.sh 8.0.0 3
>
>   * TEST_DEFAULT=0 \
>   TEST_WHEELS=1 \
>   LANG=C \
>   dev/release/verify-release-candidate.sh 8.0.0 3
>
>   * TEST_DEFAULT=0 \
>   TEST_YUM=1 \
>   LANG=C \
>   dev/release/verify-release-candidate.sh 8.0.0 3
>
> with:
>
>   * .NET SDK (6.0.202)
>   * Python 3.10.4
>   * Python 3.9.12
>   * gcc (Debian 11.2.0-20) 11.2.0
>   * nvidia-cuda-dev 11.4.3-1+b2
>   * openjdk 11.0.14.1 2022-02-08
>   * ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-linux-gnu]
>
>
> Thanks,
> --
> kou
>
> In 
>   "[VOTE] Release Apache Arrow 8.0.0 - RC3" on Tue, 3 May 2022 22:07:55
> +0200,
>   Krisztián Szűcs  wrote:
>
> > Hi,
> >
> > I would like to propose the following release candidate (RC3) of Apache
> > Arrow version 8.0.0. This is a release consisting of 608
> > resolved JIRA issues[1].
> >
> > This release candidate is based on commit:
> > c3d031250a7fdcfee5e576833bf6f39097602c30 [2]
> >
> > The source release rc3 is hosted at [3].
> > The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
> > The changelog is located at [12].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. See [13] for how to validate a release
> candidate.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow 8.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow 8.0.0 because...
> >
> > [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%208.0.0
> > [2]:
> https://github.com/apache/arrow/tree/c3d031250a7fdcfee5e576833bf6f39097602c30
> > [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-8.0.0-rc3
> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> > [5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
> > [6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
> > [7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> > [8]: https://apache.jfrog.io/artifactory/arrow/java-rc/8.0.0-rc3
> > [9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/8.0.0-rc3
> > [10]: https://apache.jfrog.io/artifactory/arrow/python-rc/8.0.0-rc3
> > [11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> > [12]:
> https://github.com/apache/arrow/blob/c3d031250a7fdcfee5e576833bf6f39097602c30/CHANGELOG.md
> > [13]:
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>


Re: [ANNOUNCE] New Arrow committer: Liang-Chi Hsieh

2022-04-27 Thread Bryan Cutler
Congratulations!! That's great news and really glad to have you on the
project!

On Wed, Apr 27, 2022, 11:44 AM Andrew Lamb  wrote:

> On behalf of the Arrow PMC, I'm happy to announce that Liang-Chi Hsieh
> has accepted an invitation to become a committer on Apache
> Arrow. Welcome, and thank you for your contributions!
>
> Andrew
>


Re: [JAVA] JDK Support Policy?

2022-04-05 Thread Bryan Cutler
Thanks for bringing this up, Micah. Given that we have finite resources for
CI, I think the oldest active LTS version sounds pretty reasonable.
Ultimately it should be community driven, balancing the resources we have
against people's time to patch any issues that come up.

On Tue, Mar 29, 2022 at 1:09 PM David Li  wrote:

> Hey Micah,
>
> Looks like this hasn't gotten much attention, unfortunately. If Spark
> wants to support JDK8 for a while longer, since Spark is an important user
> of Arrow, we might also want/need to support JDK8, unless they're OK with
> using an older version of Arrow, or targeting different Arrow versions for
> different JDK versions? (Not sure if that is customary or reasonable, but
> as I understand it, Spark doesn't use too much of the Arrow APIs - so it
> might be workable?) Either way that would need support from maintainers and
> CI.
>
> -David
>
> On Wed, Mar 16, 2022, at 13:02, Micah Kornfield wrote:
> > I don't think we've ever discussed a formal policy on which JDK versions
> we
> > intend to support in Java.
> >
> > JDK8 is ending active support this month (but still has premium/security
> > support available).  Spark seems like it will continue to support JDK8
> > through its 3.x versions which are still under active development.
> >
> > As a data point I think Python generally tries to be compatible with
> > versions that aren't end of life (and at least some other big projects
> that
> > depend on Arrow also follow this policy).
> >
> > To a large extent this will boil down to contributors willing to set up
> and
> > maintain the necessary CI infrastructure to ensure that Arrow is working
> on
> > all the existing JDKs.
> >
> > My opinion is we should at least support the oldest LTS version that has
> > active support but would like to hear others' thoughts.
> >
> > Cheers,
> > Micah
>


Re: [VOTE] Extend Arrow Flight SQL with GetXdbcTypeInfo, SQL type info in schemas

2022-03-28 Thread Bryan Cutler
+1 (non-binding)

On Mon, Mar 28, 2022, 7:07 AM Andrew Lamb  wrote:

> Thank you David for pushing this through -- I think the overall FlightSQL
> story is very compelling for the Arrow ecosystem
>
> I am also +1 on the idea, but I haven't had enough time to study the
> implementation in detail yet. I hope to do so over the next few weeks
>
> Andrew
>
> On Sun, Mar 27, 2022 at 7:06 PM David Li  wrote:
>
> > Thanks Wes, and sorry about the lapse.
> >
> > On Sun, Mar 27, 2022, at 14:04, Wes McKinney wrote:
> > > Adding my +1 (binding) vote (technically votes need 3 binding +1's so
> > > this will pass)
> > >
> > > On Fri, Mar 25, 2022 at 4:12 PM David Li  wrote:
> > >>
> > >> The vote has been open for a while now without objection, so the vote
> > passes with 2 +1 votes (binding), 4 +1 votes (non-binding).
> > >>
> > >> Thanks to all the contributors and reviewers who worked on these
> > changes.
> > >>
> > >> On Wed, Mar 23, 2022, at 13:28, José Almeida wrote:
> > >> > Thanks for the reply David. Your answer is correct.
> > >> >
> > >> > We are not voting on the first PR [1] yet. It contains what we've
> > >> > built for the JDBC driver using Flight SQL so far. I don't recall
> > >> > whether we have already implemented the proposals from PRs [2] and [3].
> > >> > I believe we already have a draft of typeInfo and ColumnMetadata in the
> > >> > JDBC driver, but they will need changes after this is approved.
> > >> > Feel free to take a look at the JDBC PR and give us your feedback,
> > >> > Andrew. All feedback is welcome.
> > >> >
> > >> > The second PR [2] contains the metadata related to the columns, so
> > >> > some operations will be able to send it in responses and JDBC/ODBC
> > >> > drivers will have access to it. The metadata fields we are sending are
> > >> > the ones we identified, but perhaps there are more that we couldn't
> > >> > identify.
> > >> >
> > >> > The third PR [3] adds functionality that retrieves information about
> > >> > the types that the data source supports.
> > >> >
> > >> > Feel free to ask any questions you might have.
> > >> >
> > >> > On Tue, Mar 22, 2022 at 10:13 AM David Li 
> > wrote:
> > >> >
> > >> >> Maybe one of the contributors wants to chime in with more details,
> > but:
> > >> >>
> > >> >> PR#12254 isn't part of the vote, it's just the motivation for these
> > >> >> changes. I suppose it isn't fully in sync with the other PRs?
> > >> >> PR#11999 annotates fields with metadata that is used to support
> > JDBC/ODBC
> > >> >> drivers (e.g. the ability to tell what table a column originated
> > from)
> > >> >> PR#11982 is used to retrieve metadata about supported SQL data
> types.
> > >> >>
> > >> >> On Mon, Mar 21, 2022, at 16:08, Andrew Lamb wrote:
> > >> >> > BTW thank you all for your work in this matter (making JDBC/ODBC
> > >> >> clients)!
> > >> >> > I think it is super valuable for the overall ecosystem.
> > >> >> >
> > >> >> > I am sorry for missing the conversation, but I am not clear on
> > what we
> > >> >> are
> > >> >> > voting on. Can we please clarify what changes are proposed to
> > FlightSQL?
> > >> >> >
> > >> >> > The PRs appear to contain changes to FlightSql.proto that seem
> > somewhat
> > >> >> > redundant / contradictory. For example:
> > >> >> >
> > >> >> > Metadata named `CATALOG_NAME` on  [1]
> > >> >> > Metadata named `ARROW:FLIGHT:SQL:CATALOG_NAME` on [2]
> > >> >> > No metadata for catalog name on [3] (but does have other metadata
> > like
> > >> >> > auto_increment)
> > >> >> >
> > >> >> > Andrew
> > >> >> >
> > >> >> > [1] https://github.com/apache/arrow/pull/12254
> > >> >> > [2] https://github.com/apache/arrow/pull/11999/
> > >> >> > [3] https://github.com/apache/arrow/pull/11982
> > >> >> >
> > >> >> >
> > >> >> > On Mon, Mar 21, 2022 at 2:02 PM Antoine Pitrou <
> anto...@python.org
> > >
> > >> >> wrote:
> > >> >> >
> > >> >> >>
> > >> >> >> Moral +1 from me. I've posted minor comments on the specs
> changes
> > in the
> > >> >> >> PRs.
> > >> >> >>
> > >> >> >>
> > >> >> >> Le 16/03/2022 à 20:50, David Li a écrit :
> > >> >> >> > Hello,
> > >> >> >> >
> > >> >> >> > Jose Almeida and James Duong have proposed two additions to
> > Arrow
> > >> >> Flight
> > >> >> >> SQL, an experimental protocol for interacting with SQL databases
> > over
> > >> >> Arrow
> > >> >> >> Flight. The purpose of these additions is to provide necessary
> > metadata
> > >> >> for
> > >> >> >> implementing a JDBC driver on top of Flight SQL [1].
> > >> >> >> >
> > >> >> >> > The additions are as follows:
> > >> >> >> >
> > >> >> >> > - As part of returned schemas, include metadata about the
> > underlying
> > >> >> SQL
> > >> >> >> data type [2].
> > >> >> >> > - Add a new RPC endpoint, GetXdbcTypeInfo, to get metadata
> > about the
> > >> >> >> supported SQL data types [3].
> > >> >> >> >
> > >> >> >> > Both pull requests implement the additions in C++ and Java and
> > contain
> > >> >> >> integration tests.
> > >> >> >> >
> > >> >> 

Re: [ANNOUNCE] New Arrow committers: Raphael Taylor-Davies, Wang Xudong, Yijie Shen, and Kun Liu

2022-03-14 Thread Bryan Cutler
Congrats to all!

On Thu, Mar 10, 2022 at 12:11 AM Alenka Frim  wrote:

> Congratulations all!
>
> On Thu, Mar 10, 2022 at 1:55 AM Yang hao <1371656737...@gmail.com> wrote:
>
> > Congratulations to all!
> >
> > From: Benson Muite 
> > Date: Thursday, March 10, 2022 at 03:45
> > To: dev@arrow.apache.org 
> > Subject: Re: [ANNOUNCE] New Arrow committers: Raphael Taylor-Davies, Wang
> > Xudong, Yijie Shen, and Kun Liu
> > Congratulations!
> >
> > On 3/9/22 9:56 PM, David Li wrote:
> > > Congrats everyone!
> > >
> > > On Wed, Mar 9, 2022, at 13:47, Rok Mihevc wrote:
> > >> Congrats all!
> > >>
> > >> Rok
> > >>
> > >> On Wed, Mar 9, 2022 at 7:16 PM QP Hou  wrote:
> > >>>
> > >>> Congratulations to all, well deserved!
> > >>>
> > >>> On Wed, Mar 9, 2022 at 9:37 AM Daniël Heres 
> > wrote:
> > 
> >  Congratulations!
> > 
> >  On Wed, Mar 9, 2022, 18:26 LM  wrote:
> > 
> > > Congrats to you all!
> > >
> > > On Wed, Mar 9, 2022 at 9:19 AM Chao Sun 
> wrote:
> > >
> > >> Congrats all!
> > >>
> > >> On Wed, Mar 9, 2022 at 9:16 AM Micah Kornfield <
> > emkornfi...@gmail.com>
> > >> wrote:
> > >>>
> > >>> Congrats!
> > >>>
> > >>> On Wed, Mar 9, 2022 at 8:36 AM Weston Pace <
> weston.p...@gmail.com>
> > >> wrote:
> > >>>
> >  Congratulations to all of you!
> > 
> >  On Wed, Mar 9, 2022, 4:52 AM Matthew Turner <
> > >> matthew.m.tur...@outlook.com>
> >  wrote:
> > 
> > > Congrats all and thank you for your contributions! It's been
> > great
> > > to
> >  work
> > > with and learn from you all.
> > >
> > > -Original Message-
> > > From: Andrew Lamb 
> > > Sent: Wednesday, March 9, 2022 8:59 AM
> > > To: dev 
> > > Subject: [ANNOUNCE] New Arrow committers: Raphael
> Taylor-Davies,
> > > Wang
> > > Xudong, Yijie Shen, and Kun Liu
> > >
> > > On behalf of the Arrow PMC, I'm happy to announce that
> > >
> > > Raphael Taylor-Davies
> > > Wang Xudong
> > > Yijie Shen
> > > Kun Liu
> > >
> > > Have all accepted invitations to become committers on Apache
> > Arrow!
> > > Welcome, thank you for all your contributions so far, and we
> look
> > >> forward
> > > to continuing to drive Apache Arrow forward to an even better
> > place
> > >> in
> >  the
> > > future.
> > >
> > > This exciting growth in committers mirrors the growth of the
> > Arrow
> > >> Rust
> > > community.
> > >
> > > Andrew
> > >
> > > p.s. sorry for the somewhat impersonal email; I was trying to
> > avoid
> > > several very similar emails. I am truly excited for each of
> these
> > > individuals.
> > >
> > 
> > >>
> > >
> >
>


Re: [VOTE] Release Apache Arrow 7.0.0 - Java artifacts

2022-03-14 Thread Bryan Cutler
+1 (non-binding)

On Mon, Mar 14, 2022 at 10:26 AM David Li  wrote:

> My vote: +1 (binding)
>
> Are any other PMC members able to take a quick look?
>
> Thanks,
> David
>
> On Sat, Mar 12, 2022, at 07:31, Kun Liu wrote:
> > +1  non-binding
> > just uploaded the missing pom and don't need to release the 7.0.1
> >
> > Thanks,
> > Kun
> >
> >
> > Rafael Telles  于2022年3月12日周六 04:18写道:
> >
> >> +1
> >>
> >> Em sex., 11 de mar. de 2022 às 16:10, José Almeida <
> >> jose.alme...@simbioseventures.com> escreveu:
> >>
> >> > +1
> >> >
> >> > On Fri, Mar 11, 2022 at 4:09 PM Ray Lum  wrote:
> >> >
> >> > > +1
> >> > >
> >> > > On Fri, Mar 11, 2022 at 6:30 AM David Li 
> wrote:
> >> > >
> >> > > > Hello,
> >> > > >
> >> > > > I would like to propose releasing these additional Java artifacts
> for
> >> > > > Apache Arrow 7.0.0:
> >> > > >
> >> > > > arrow-flight-7.0.0.pom
> >> > > > flight-integration-tests-7.0.0-jar-with-dependencies.jar
> >> > > > flight-integration-tests-7.0.0-javadoc.jar
> >> > > > flight-integration-tests-7.0.0-sources.jar
> >> > > > flight-integration-tests-7.0.0-tests.jar
> >> > > > flight-integration-tests-7.0.0.jar
> >> > > > flight-integration-tests-7.0.0.pom
> >> > > > flight-sql-7.0.0-javadoc.jar
> >> > > > flight-sql-7.0.0-sources.jar
> >> > > > flight-sql-7.0.0-tests.jar
> >> > > > flight-sql-7.0.0.jar
> >> > > > flight-sql-7.0.0.pom
> >> > > >
> >> > > > This release is based on commit:
> >> > > > e90472e35b40f58b17d408438bb8de1641bfe6ef [1]
> >> > > >
> >> > > > Artifacts can be found in the staging Maven repository:
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://repository.apache.org/content/repositories/staging/org/apache/arrow/
> >> > > >
> >> > > > These artifacts were not uploaded during the initial release due
> to
> >> > > > ARROW-15746 [2], now fixed. Thanks Kou for uploading them!
> >> > > >
> >> > > > The vote will be open for at least 72 hours.
> >> > > >
> >> > > > [ ] +1 Release this as part of Apache Arrow 7.0.0
> >> > > > [ ] +0
> >> > > > [ ] -1 Do not release this as part of Apache Arrow 7.0.0
> because...
> >> > > >
> >> > > > [1]:
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/arrow/tree/e90472e35b40f58b17d408438bb8de1641bfe6ef
> >> > > > [2]: https://issues.apache.org/jira/browse/ARROW-15746
> >> > > >
> >> > > > -David
> >> > > >
> >> > > > P.S. If you would like to try building against the staging
> artifacts,
> >> > try
> >> > > > a POM like the following:
> >> > > >
> >> > > > <project xmlns="http://maven.apache.org/POM/4.0.0"
> >> > > >          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >> > > >          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
> >> > > >          http://maven.apache.org/xsd/maven-4.0.0.xsd">
> >> > > >   <modelVersion>4.0.0</modelVersion>
> >> > > >   <groupId>org.example</groupId>
> >> > > >   <artifactId>demo</artifactId>
> >> > > >   <version>1.0-SNAPSHOT</version>
> >> > > >   <properties>
> >> > > >     <arrow.version>7.0.0</arrow.version>
> >> > > >   </properties>
> >> > > >   <repositories>
> >> > > >     <repository>
> >> > > >       <id>apache-staging</id>
> >> > > >       <name>Apache Staging</name>
> >> > > >       <url>https://repository.apache.org/content/repositories/staging/</url>
> >> > > >     </repository>
> >> > > >   </repositories>
> >> > > >   <dependencies>
> >> > > >     <dependency>
> >> > > >       <groupId>org.apache.arrow</groupId>
> >> > > >       <artifactId>arrow-vector</artifactId>
> >> > > >       <version>${arrow.version}</version>
> >> > > >     </dependency>
> >> > > >     <dependency>
> >> > > >       <groupId>org.apache.arrow</groupId>
> >> > > >       <artifactId>arrow-memory-netty</artifactId>
> >> > > >       <version>${arrow.version}</version>
> >> > > >     </dependency>
> >> > > >     <dependency>
> >> > > >       <groupId>org.apache.arrow</groupId>
> >> > > >       <artifactId>arrow-format</artifactId>
> >> > > >       <version>${arrow.version}</version>
> >> > > >     </dependency>
> >> > > >     <dependency>
> >> > > >       <groupId>org.apache.arrow</groupId>
> >> > > >       <artifactId>flight-core</artifactId>
> >> > > >       <version>${arrow.version}</version>
> >> > > >     </dependency>
> >> > > >     <dependency>
> >> > > >       <groupId>org.apache.arrow</groupId>
> >> > > >       <artifactId>flight-grpc</artifactId>
> >> > > >       <version>${arrow.version}</version>
> >> > > >     </dependency>
> >> > > >   </dependencies>
> >> > > >   <build>
> >> > > >     <extensions>
> >> > > >       <extension>
> >> > > >         <groupId>kr.motd.maven</groupId>
> >> > > >         <artifactId>os-maven-plugin</artifactId>
> >> > > >         <version>1.7.0</version>
> >> > > >       </extension>
> >> > > >     </extensions>
> >> > > >   </build>
> >> > > > </project>
> >> > > >
> >> > > > On Fri, Mar 11, 2022, at 04:38, Sutou Kouhei wrote:
> >> > > > > Hi,
> >> > > > >
> >> > > > > I've uploaded the followings:
> >> > > > >
> >> > > > > arrow-flight-7.0.0.pom
> >> > > > > flight-integration-tests-7.0.0-jar-with-dependencies.jar
> >> > > > > flight-integration-tests-7.0.0-javadoc.jar
> >> > > > > flight-integration-tests-7.0.0-sources.jar
> >> > > > > flight-integration-tests-7.0.0-tests.jar
> >> > > > > flight-integration-tests-7.0.0.jar
> >> > > > > flight-integration-tests-7.0.0.pom
> >> > > > > flight-sql-7.0.0-javadoc.jar
> >> > > > > flight-sql-7.0.0-sources.jar
> >> > > > > flight-sql-7.0.0-tests.jar
> >> > > > > flight-sql-7.0.0.jar
> >> > > > > flight-sql-7.0.0.pom
> >> > > > >
> >> > > > > You can find them at
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> 

Re: Flight/FlightSQL Optimization for Small Results?

2022-03-01 Thread Bryan Cutler
I think this would be a useful feature and would be nice to have in Flight
core. For cases like previewing data, you usually just want to get a small
amount of data quickly. Would it make sense to make this part of DoGet,
since it would still be returning a record batch? Perhaps a Ticket could be
given an optional FlightDescriptor so it could serve as an all-in-one shot?

On Tue, Mar 1, 2022 at 8:44 AM David Li  wrote:

> I agree with something along the lines of Antoine's proposal, though:
> maybe we should be more structured with the flags (akin to what Micah
> mentioned with the Feature enum).
>
> Also, the flag could be embedded into the Flight SQL messages instead. (So
> in effect, Flight would only add the capability to return data with
> FlightInfo, and it's up to applications, like Flight SQL, to decide how
> they want to take advantage of that.)
>
> I think having a completely separate method and return type and having to
> poll for it beforehand somewhat defeats the purpose of having it/would be
> much harder of a transition.
>
> Also: it should be `repeated FlightData inline_data`, right? In case we
> also need dictionary batches?
>
> On Tue, Mar 1, 2022, at 11:39, Antoine Pitrou wrote:
> > Can we just add the following field to the FlightDescriptor message:
> >
> >   bool accept_inline_data = 4;
> >
> > and this one to the FlightInfo message:
> >
> >   FlightData inline_data = 100;
> >
> > Then new clients can set `accept_inline_data` to true (the default being
> > false if omitted) to signal servers that they may put the data in
> > `inline_data` if it is deemed small enough.
> >
> > (the `accept_inline_data` field could also be added to the Criteria
> > message)
> >
> >
> > Alternatively, if the FlightDescriptor expansion looks a bit dirty
> > (FlightDescriptor being used in other contexts where
> > `accept_inline_data` makes no sense), we can instead define a new
> > method:
> >
> >   rpc GetFlightInfoEx(GetFlightInfoRequest) returns (FlightInfo) {}
> >
> > with:
> >
> > message GetFlightInfoRequest {
> >   FlightDescriptor flight_descriptor = 1;
> >   bool accept_inline_data = 2;
> > }
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On Mon, 28 Feb 2022 11:29:12 -0800
> > James Duong  wrote:
> >> This seems reasonable; however, we need to account for existing Flight
> >> clients that were written before this.
> >>
> >> It seems like the server will need to still handle the ticket returned
> for
> >> getStream() for clients that are unaware of the small result
> optimization.
> >>
> >> On Mon, Feb 28, 2022 at 11:26 AM David Li  wrote:
> >>
> >> > Ah, that makes more sense, that would be a reasonable extension to
> Flight
> >> > overall. (While we're at it, I think it would help to have an
> app_metadata
> >> > field in FlightInfo as well.)
> >> >
> >> > On Mon, Feb 28, 2022, at 14:24, Micah Kornfield wrote:
> >> > >>
> >> > >> But it seems reasonable to add a one-shot query path using DoGet.
> >> > >
> >> > >
> >> > > I was thinking more of adding a bytes field to FlightInfo that
> could
> >> > store
> >> > > arrow data.  That way GetFlightInfo would be the only RPC necessary
> for
> >> > > small results when executing a CMD.  The client doesn't necessarily
> know
> >> > > whether a query will return large or small results.
> >> > >
> >> > > On Mon, Feb 28, 2022 at 11:04 AM David Li 
> wrote:
> >> > >
> >> > >> I think the focus was on large result sets (though I don't recall
> this
> >> > >> being discussed before) and supporting multi-node setups (hence
> >> > >> GetFlightInfo/DoGet are separated). But it seems reasonable to add
> a
> >> > >> one-shot query path using DoGet.
> >> > >>
> >> > >> On Mon, Feb 28, 2022, at 13:32, Adam Lippai wrote:
> >> > >> > I saw the same. A small, stateless query ability would be nice
> >> > >> (connection
> >> > >> > open, initialization, query in one message, the resultset in
> the
> >> > response
> >> > >> > in one message)
> >> > >> >
> >> > >> > On Mon, Feb 28, 2022, 13:12 Micah Kornfield <
> emkornfi...@gmail.com>
> >> > >> wrote:
> >> > >> >
> >> > >> >> I'm rereviewing the Flight SQL interfaces, and I'm not sure if
> I'm
> >> > >> missing
> >> > >> >> it but is there any optimization for small results?  My concern
> is
> >> > that
> >> > >> the
> >> > >> >> overhead of the RPCs for the DoGet after executing the query
> could
> >> > add
> >> > >> >> non-trivial latency for smaller results.
> >> > >> >>
> >> > >> >> Has anybody else thought about this/investigated it?  Am I
> >> > understanding
> >> > >> >> this correctly?
> >> > >> >>
> >> > >> >> Thanks,
> >> > >> >> Micah
> >> > >> >>
> >> > >>
> >> >
> >>
> >>
>


Re: Is 7.0.0 release missing the Java arrow-flight POM?

2022-02-21 Thread Bryan Cutler
Thanks Kou. It sounds like if the artifact is in that list, then the upload
should work. It would be good if we could make that less brittle, but in the
meantime I'll open a PR to add the pom. I made
https://issues.apache.org/jira/browse/ARROW-15746 to track this.
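
The brittleness here comes from the release tooling matching artifacts against an allow-list of filename patterns (in dev/tasks/tasks.yml), so a new artifact name that matches no pattern is silently skipped. A toy illustration of that failure mode — the pattern list and `matches` helper below are invented, not the real tooling:

```shell
# Toy illustration: artifacts are checked against an allow-list of glob
# patterns; a newly introduced name (arrow-flight-*.pom) that matches no
# pattern is silently dropped from the upload.
patterns="arrow-java-root-*.pom flight-core-*.jar flight-core-*.pom"

matches() {
  for p in ${patterns}; do
    case "$1" in
      ${p}) return 0 ;;   # $p is deliberately unquoted so it acts as a glob
    esac
  done
  return 1
}

for f in arrow-java-root-7.0.0.pom flight-core-7.0.0.jar arrow-flight-7.0.0.pom; do
  if matches "${f}"; then
    echo "upload:  ${f}"
  else
    echo "SKIPPED: ${f}"
  fi
done
```

Adding an `arrow-flight-*.pom` pattern to the list is then the whole fix.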

On Sat, Feb 19, 2022 at 9:39 PM Sutou Kouhei  wrote:

> Hi,
>
> I found that "dev/release/04-binary-download.sh 7.0.0 10
> --task-filter 'java-jars'" doesn't download
> arrow-flight*.pom.
>
> I think that we need to add arrow-flight*.pom to
> https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml#L761
> .
>
>
> Thanks,
> --
> kou
>
> In <20220220.142345.1095495044811966896@clear-code.com>
>   "Re: Is 7.0.0 release missing the Java arrow-flight POM?" on Sun, 20 Feb
> 2022 14:23:45 +0900 (JST),
>   Sutou Kouhei  wrote:
>
> > Hi,
> >
> > I tried "dev/release/06-java-upload.sh 7.0.0 10" and upload
> > the log to
> > https://gist.github.com/kou/b6d8aa2b9420baa086a7cf0763a9bf37
> > . It seems that arrow-flight isn't uploaded...
> >
> > You can see uploaded files at
> > https://repository.apache.org/#stagingRepositories with your
> > ASF account.
> > Note that you MUST not press the "Close" button! I'll remove
> > them by pressing "Drop" button when we fix this.
> >
> >
> > Thanks,
> > --
> > kou
> >
> > In 
> >   "Re: Is 7.0.0 release missing the Java arrow-flight POM?" on Fri, 18
> Feb 2022 12:51:59 -0800,
> >   Bryan Cutler  wrote:
> >
> >> I wasn't able to run the entire process, so I downloaded a few
> >> artifacts from the nightly java-jars and pointed the script there to see
> >> the output:
> >>
> >> dev/release/06-java-upload.sh 7.0.0 10
> >>
> >> deploy:deploy-file -Durl=
> >> https://repository.apache.org/service/local/staging/deploy/maven2
> >> -DrepositoryId=apache.releases.https
> >> -DpomFile=./java-jars/arrow-flight-8.0.0.dev82.pom
> >> -Dfile=./java-jars/arrow-flight-8.0.0.dev82.pom -Dfiles= -Dtypes=
> >> -Dclassifiers=
> >>
> >> deploy:deploy-file -Durl=
> >> https://repository.apache.org/service/local/staging/deploy/maven2
> >> -DrepositoryId=apache.releases.https
> >> -DpomFile=./java-jars/arrow-java-root-8.0.0.dev82.pom
> >> -Dfile=./java-jars/arrow-java-root-8.0.0.dev82.pom -Dfiles= -Dtypes=
> >> -Dclassifiers=
> >>
> >> deploy:deploy-file -Durl=
> >> https://repository.apache.org/service/local/staging/deploy/maven2
> >> -DrepositoryId=apache.releases.https
> >> -DpomFile=./java-jars/flight-core-8.0.0.dev82.pom
> >> -Dfile=./java-jars/flight-core-8.0.0.dev82.jar -Dfiles= -Dtypes=
> >> -Dclassifiers=
> >>
> >> Based on that, it looks like the maven command is correct, it should
> deploy
> >> it just like the arrow-java-root pom. Is there any way to see the log
> >> output for when the release artifacts were deployed?
> >>
> >> On Thu, Feb 17, 2022 at 10:06 PM Bryan Cutler 
> wrote:
> >>
> >>> Sure, I'll take a look at the script.
> >>>
> >>> On Thu, Feb 17, 2022 at 4:39 PM Sutou Kouhei 
> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Ah, arrow-flight-*.pom exists on our CI artifacts:
> >>>>
> >>>>
> https://github.com/ursacomputing/crossbow/releases/tag/nightly-2022-02-17-0-github-java-jars
> >>>>
> >>>> I don't know why our upload script
> >>>>
> https://github.com/apache/arrow/blob/master/dev/release/06-java-upload.sh
> >>>> doesn't upload it...
> >>>>
> >>>> Could you take a look at it?
> >>>>
> >>>>
> >>>> Thanks,
> >>>> --
> >>>> kou
> >>>>
> >>>> In <
> cabr4zata2bwatbhoic_enfwemzpynvhyys895+5rjxpcgur...@mail.gmail.com>
> >>>>   "Re: Is 7.0.0 release missing the Java arrow-flight POM?" on Thu, 17
> >>>> Feb 2022 14:14:52 -0800,
> >>>>   Bryan Cutler  wrote:
> >>>>
> >>>> > Yes, it's a little confusing. The original arrow-flight was when the
> >>>> Flight
> >>>> > implementation was a single module, then it got split into
> flight-core
> >>>> and
> >>>> > flight-grpc and arrow-flight was no more. Now it's back but only as a
> >>>> parent
> >>>> > POM. I thi

Re: Is 7.0.0 release missing the Java arrow-flight POM?

2022-02-18 Thread Bryan Cutler
I wasn't able to run the entire process, so I downloaded a few
artifacts from the nightly java-jars and pointed the script there to see
the output:

dev/release/06-java-upload.sh 7.0.0 10

deploy:deploy-file -Durl=
https://repository.apache.org/service/local/staging/deploy/maven2
-DrepositoryId=apache.releases.https
-DpomFile=./java-jars/arrow-flight-8.0.0.dev82.pom
-Dfile=./java-jars/arrow-flight-8.0.0.dev82.pom -Dfiles= -Dtypes=
-Dclassifiers=

deploy:deploy-file -Durl=
https://repository.apache.org/service/local/staging/deploy/maven2
-DrepositoryId=apache.releases.https
-DpomFile=./java-jars/arrow-java-root-8.0.0.dev82.pom
-Dfile=./java-jars/arrow-java-root-8.0.0.dev82.pom -Dfiles= -Dtypes=
-Dclassifiers=

deploy:deploy-file -Durl=
https://repository.apache.org/service/local/staging/deploy/maven2
-DrepositoryId=apache.releases.https
-DpomFile=./java-jars/flight-core-8.0.0.dev82.pom
-Dfile=./java-jars/flight-core-8.0.0.dev82.jar -Dfiles= -Dtypes=
-Dclassifiers=

Based on that, it looks like the Maven command is correct; it should deploy
the pom just like the arrow-java-root pom. Is there any way to see the log
output from when the release artifacts were deployed?

On Thu, Feb 17, 2022 at 10:06 PM Bryan Cutler  wrote:

> Sure, I'll take a look at the script.
>
> On Thu, Feb 17, 2022 at 4:39 PM Sutou Kouhei  wrote:
>
>> Hi,
>>
>> Ah, arrow-flight-*.pom exists on our CI artifacts:
>>
>> https://github.com/ursacomputing/crossbow/releases/tag/nightly-2022-02-17-0-github-java-jars
>>
>> I don't know why our upload script
>> https://github.com/apache/arrow/blob/master/dev/release/06-java-upload.sh
>> doesn't upload it...
>>
>> Could you take a look at it?
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: Is 7.0.0 release missing the Java arrow-flight POM?" on Thu, 17
>> Feb 2022 14:14:52 -0800,
>>   Bryan Cutler  wrote:
>>
>> > Yes, it's a little confusing. The original arrow-flight was when the
>> Flight
>> > implementation was a single module, then it got split into flight-core
>> and
>> > flight-grpc and arrow-flight was no more. Now it's back but only as a
>> parent
>> > POM. I think the current structure seems correct, so I'm not sure if we
>> > would want to go back to using flight-core as the parent. Is it
>> possible to
>> > deploy just the arrow-flight POM? Otherwise, people can't use Flight for
>> > this release without some kind of hack.
>> >
>> > On Thu, Feb 17, 2022 at 12:45 PM Sutou Kouhei 
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> It seems that arrow-flight isn't released after 0.15.1:
>> >>   https://repo1.maven.org/maven2/org/apache/arrow/arrow-flight/
>> >>
>> >> But flight-core is released since 0.16.0:
>> >>   https://repo1.maven.org/maven2/org/apache/arrow/flight-core/
>> >>
>> >> flight-grpc is also released:
>> >>   https://repo1.maven.org/maven2/org/apache/arrow/flight-grpc/
>> >>
>> >> Can we use flight-core (and flight-grpc) instead of
>> >> arrow-flight?
>> >>
>> >> Thanks,
>> >> --
>> >> kou
>> >>
>> >> In > >
>> >>   "Is 7.0.0 release missing the Java arrow-flight POM?" on Thu, 17 Feb
>> >> 2022 09:48:57 -0800,
>> >>   Bryan Cutler  wrote:
>> >>
>> >> > Hi All,
>> >> >
>> >> > Congrats on the 7.0.0 release! I was trying it out and got an error
>> not
>> >> > being able to find arrow-flight-7.0.0.pom. This looks like a new
>> parent
>> >> POM
>> >> > for Flight, so I checked maven central and don't see it deployed
>> there.
>> >> Not
>> >> > sure what could have happened, but maybe it's only me. Anyone else
>> seeing
>> >> > the same issue?
>> >> >
>> >> > Thanks,
>> >> > Bryan
>> >>
>>
>


Re: Is 7.0.0 release missing the Java arrow-flight POM?

2022-02-17 Thread Bryan Cutler
Sure, I'll take a look at the script.

On Thu, Feb 17, 2022 at 4:39 PM Sutou Kouhei  wrote:

> Hi,
>
> Ah, arrow-flight-*.pom exists on our CI artifacts:
>
> https://github.com/ursacomputing/crossbow/releases/tag/nightly-2022-02-17-0-github-java-jars
>
> I don't know why our upload script
> https://github.com/apache/arrow/blob/master/dev/release/06-java-upload.sh
> doesn't upload it...
>
> Could you take a look at it?
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: Is 7.0.0 release missing the Java arrow-flight POM?" on Thu, 17 Feb
> 2022 14:14:52 -0800,
>   Bryan Cutler  wrote:
>
> > Yes, it's a little confusing. The original arrow-flight was when the
> Flight
> > implementation was a single module, then it got split into flight-core
> and
> > flight-grpc and arrow-flight was no more. Now it's back but only as a
> parent
> > POM. I think the current structure seems correct, so I'm not sure if we
> > would want to go back to using flight-core as the parent. Is it possible
> to
> > deploy just the arrow-flight POM? Otherwise, people can't use Flight for
> > this release without some kind of hack.
> >
> > On Thu, Feb 17, 2022 at 12:45 PM Sutou Kouhei 
> wrote:
> >
> >> Hi,
> >>
> >> It seems that arrow-flight isn't released after 0.15.1:
> >>   https://repo1.maven.org/maven2/org/apache/arrow/arrow-flight/
> >>
> >> But flight-core is released since 0.16.0:
> >>   https://repo1.maven.org/maven2/org/apache/arrow/flight-core/
> >>
> >> flight-grpc is also released:
> >>   https://repo1.maven.org/maven2/org/apache/arrow/flight-grpc/
> >>
> >> Can we use flight-core (and flight-grpc) instead of
> >> arrow-flight?
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In 
> >>   "Is 7.0.0 release missing the Java arrow-flight POM?" on Thu, 17 Feb
> >> 2022 09:48:57 -0800,
> >>   Bryan Cutler  wrote:
> >>
> >> > Hi All,
> >> >
> >> > Congrats on the 7.0.0 release! I was trying it out and got an error
> not
> >> > being able to find arrow-flight-7.0.0.pom. This looks like a new
> parent
> >> POM
> >> > for Flight, so I checked maven central and don't see it deployed
> there.
> >> Not
> >> > sure what could have happened, but maybe it's only me. Anyone else
> seeing
> >> > the same issue?
> >> >
> >> > Thanks,
> >> > Bryan
> >>
>


Re: Is 7.0.0 release missing the Java arrow-flight POM?

2022-02-17 Thread Bryan Cutler
Yes, it's a little confusing. The original arrow-flight was when the Flight
implementation was a single module, then it got split into flight-core and
flight-grpc and arrow-flight was no more. Now it's back but only as a parent
POM. I think the current structure seems correct, so I'm not sure if we
would want to go back to using flight-core as the parent. Is it possible to
deploy just the arrow-flight POM? Otherwise, people can't use Flight for
this release without some kind of hack.

On Thu, Feb 17, 2022 at 12:45 PM Sutou Kouhei  wrote:

> Hi,
>
> It seems that arrow-flight isn't released after 0.15.1:
>   https://repo1.maven.org/maven2/org/apache/arrow/arrow-flight/
>
> But flight-core is released since 0.16.0:
>   https://repo1.maven.org/maven2/org/apache/arrow/flight-core/
>
> flight-grpc is also released:
>   https://repo1.maven.org/maven2/org/apache/arrow/flight-grpc/
>
> Can we use flight-core (and flight-grpc) instead of
> arrow-flight?
>
> Thanks,
> --
> kou
>
> In 
>   "Is 7.0.0 release missing the Java arrow-flight POM?" on Thu, 17 Feb
> 2022 09:48:57 -0800,
>   Bryan Cutler  wrote:
>
> > Hi All,
> >
> > Congrats on the 7.0.0 release! I was trying it out and got an error not
> > being able to find arrow-flight-7.0.0.pom. This looks like a new parent
> POM
> > for Flight, so I checked maven central and don't see it deployed there.
> Not
> > sure what could have happened, but maybe it's only me. Anyone else seeing
> > the same issue?
> >
> > Thanks,
> > Bryan
>


Is 7.0.0 release missing the Java arrow-flight POM?

2022-02-17 Thread Bryan Cutler
Hi All,

Congrats on the 7.0.0 release! I was trying it out and got an error not
being able to find arrow-flight-7.0.0.pom. This looks like a new parent POM
for Flight, so I checked maven central and don't see it deployed there. Not
sure what could have happened, but maybe it's only me. Anyone else seeing
the same issue?

Thanks,
Bryan


Re: [ANNOUNCE] New Arrow PMC chair: Kouhei Sutou

2022-01-27 Thread Bryan Cutler
Congratulations Kou, thanks for all your work!

On Thu, Jan 27, 2022, 4:36 PM Sutou Kouhei  wrote:

> Thanks everyone!!!
>
> In 
>   "[ANNOUNCE] New Arrow PMC chair: Kouhei Sutou" on Tue, 25 Jan 2022
> 11:32:56 -0500,
>   Wes McKinney  wrote:
>
> > I am pleased to announce that we have a new PMC chair and VP as per
> > our newly started tradition of rotating the chair once a year. I have
> > resigned and Kouhei was duly elected by the PMC and approved
> > unanimously by the board. Please join me in congratulating Kou!
> >
> > Thanks,
> > Wes
>


Re: [VOTE] Arrow should state a convention for encoding instants as Timestamp with "UTC" as the time zone

2021-06-30 Thread Bryan Cutler
+1 non-binding

On Wed, Jun 30, 2021, 2:53 AM Weston Pace  wrote:

> This vote is a result of previous discussion[1][2].  This vote is also
> a prerequisite for the PR in [5].
>
> ---
> Some date & time libraries have three temporal concepts.  For the sake
> of this document we will call them LocalDateTime, ZonedDateTime, and
> Instant.  An Instant is a timestamp that has no meaningful reference
> time zone (e.g. events that did not occur on Earth or columns of
> timestamps spanning more than one time zone). For more extensive
> definitions and a discussion of their semantics and uses see [3].
> Currently Arrow describes how to encode two of these three concepts
> into a Timestamp column and there is no guideline on how to store an
> Instant.
>
>
> This proposal states that Arrow should recommend that instants be encoded
> into timestamp columns by setting the timezone string to "UTC".
> ---
>
> For sample arguments (currently grouped as "for changing schema.fbs"
> and "against changing schema.fbs") see [4].  For a detailed definition
> of the terms LocalDateTime, ZonedDateTime, and Instant and a
> discussion of their semantics see [3].  For a straw poll on
> possible ways to handle instants see [2].
>
> This vote will be open for at least 72 hours.
>
> [ ] +1 Update schema.fbs to state the above convention
> [ ] +0
> [ ] -1 Do not make any change
>
> [1]
> https://lists.apache.org/thread.html/r8216e5de3efd2935e3907ad9bd20ce07e430952f84de69b36337e5eb%40%3Cdev.arrow.apache.org%3E
> [2]
> https://lists.apache.org/thread.html/r1bdffc76537ae9c12c37396880087fee9c0eec9000bf6ed4c9850c44%40%3Cdev.arrow.apache.org%3E
> [3]
> https://docs.google.com/document/d/1QDwX4ypfNvESc2ywcT1ygaf2Y1R8SmkpifMV7gpJdBI/edit?usp=sharing
> [4]
> https://docs.google.com/document/d/1xEKRhs-GUSMwjMhgmQdnCNMXwZrA10226AcXRoP8g9E/edit?usp=sharing
> [5] https://github.com/apache/arrow/pull/10629
>


Re: [STRAW POLL] (How) should Arrow define storage for "Instant"s

2021-06-28 Thread Bryan Cutler
C first choice, E second

On Mon, Jun 28, 2021, 8:40 AM Julian Hyde  wrote:

> D
>
> (2nd choice E if we’re doing ranked-choice voting)
>
> Julian
>
> > On Jun 24, 2021, at 12:24 PM, Weston Pace  wrote:
> >
> > The discussion in [1] led to the following question.  Before we
> > proceed on a vote it was decided we should do a straw poll to settle
> > on an approach (which can then be voted on in a +1/-1 fashion).
> >
> > ---
> > Some date & time libraries have three temporal concepts.  For the sake
> > of this document we will call them LocalDateTime, ZonedDateTime, and
> > Instant.  An Instant is a timestamp that has no meaningful reference
> > time zone (e.g. events that did not occur on Earth or columns of
> > timestamps spanning more than one time zone). For more extensive
> > definitions and a discussion of their semantics and uses see [1].
> > Currently Arrow describes how to define two of these three concepts
> > and there is no guideline on how to store an Instant (assuming the
> > proposal in [2] passes).
> >
> >
> > This proposal states that Arrow should define how to encode an Instant
> > into Arrow data.  There are several ways this could happen, some which
> > change schema.fbs and some which do not.
> > ---
> >
> > For sample arguments (currently grouped as "for changing schema.fbs"
> > and "against changing schema.fbs") see [2].  For a detailed definition
> > of the terms LocalDateTime, ZonedDateTime, and Instant and a
> > discussion of their semantics see [3].
> >
> > Options:
> >
> > A) Do nothing, don’t introduce the nuance of “instants” into Arrow
> > B) Do nothing, but update the comments in schema.fbs to acknowledge
> > the existence of the concept and explain that implementations are free
> > to decide if/how to support the type.
> > C) Define timestamp with timezone “UTC” as “instant”.
> > D) Add a first class instant type to schema.fbs
> > E) Add instant as a canonical extension type
> >
> > Note: This is just a straw poll and the results will not be binding in
> > any way but will help craft a future vote.  For example, if the
> > plurality of votes goes to C but a majority of votes is spread across
> > A & B then some flavor of A/B would likely be pursued.
> >
> > Vote for as many options as you would like.
> >
> > I will summarize and send out the results in 72 hours.
>


Re: [ANNOUNCE] New Arrow PMC member: David M Li

2021-06-23 Thread Bryan Cutler
Congrats David!

On Tue, Jun 22, 2021, 7:24 PM Micah Kornfield  wrote:

> Congrats David!
>
> On Tue, Jun 22, 2021 at 7:13 PM Fan Liya  wrote:
>
> > Congratulations David!
> >
> > Best,
> > Liya Fan
> >
> >
> > On Wed, Jun 23, 2021 at 9:44 AM Yibo Cai  wrote:
> >
> > > Congrats David!
> > >
> > > On 6/22/21 8:56 PM, David Li wrote:
> > > > Thanks everyone!
> > > >
> > > > I've learned a lot and had a great time contributing here, and I look
> > > > forward to continuing to work with everybody.
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On 2021/06/22 10:54:08, Krisztián Szűcs 
> > > wrote:
> > > >> Congrats David!
> > > >>
> > > >> On Tue, Jun 22, 2021 at 11:19 AM Rok Mihevc 
> > > wrote:
> > > >>>
> > > >>> Congrats David!
> > > >>>
> > > >>> On Tue, Jun 22, 2021 at 4:44 AM Micah Kornfield <
> > emkornfi...@gmail.com
> > > >
> > > >>> wrote:
> > > >>>
> > >  Congrats!
> > > 
> > >  On Mon, Jun 21, 2021 at 7:40 PM Weston Pace <
> weston.p...@gmail.com>
> > > wrote:
> > > 
> > > > Congratulations David!
> > > >
> > > > On Mon, Jun 21, 2021 at 2:24 PM Niranda Perera <
> > > niranda.per...@gmail.com
> > > >
> > > > wrote:
> > > >>
> > > >> Congrats David! :-)
> > > >>
> > > >> On Mon, Jun 21, 2021 at 6:32 PM Nate Bauernfeind <
> > > > nate.bauernfe...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Congratulations! Well earned!
> > > >>>
> > > >>> On Mon, Jun 21, 2021 at 4:20 PM Ian Cook <
> i...@ursacomputing.com>
> > > > wrote:
> > > >>>
> > >  Congratulations, David!
> > > 
> > >  Ian
> > > 
> > > 
> > >  On Mon, Jun 21, 2021 at 6:19 PM Wes McKinney <
> > wesmck...@gmail.com
> > > >
> > > >>> wrote:
> > > >
> > > > The Project Management Committee (PMC) for Apache Arrow has
> > >  invited
> > > > David M Li to become a PMC member and we are pleased to
> > announce
> > > > that David has accepted.
> > > >
> > > > Congratulations and welcome!
> > > 
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> Niranda Perera
> > > >> https://niranda.dev/
> > > >> @n1r44 
> > > >
> > > 
> > > >>
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Kazuaki Ishizaki

2021-06-07 Thread Bryan Cutler
Congratulations!!

On Sun, Jun 6, 2021, 7:28 PM Sutou Kouhei  wrote:

> Hi,
>
> On behalf of the Arrow PMC, I'm happy to announce that
> Kazuaki Ishizaki has accepted an invitation to become a
> committer on Apache Arrow. Welcome, and thank you for your
> contributions!
>
>
> Thanks,
> --
> kou
>


Re: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-06 Thread Bryan Cutler
Congrats Ben!

On Thu, May 6, 2021 at 12:05 PM Antoine Pitrou  wrote:

>
> Congratulations Ben :-)
>
>
> Le 06/05/2021 à 21:02, Rok Mihevc a écrit :
> > Congrats!
> >
> > On Thu, May 6, 2021 at 10:49 AM Krisztián Szűcs <
> szucs.kriszt...@gmail.com>
> > wrote:
> >
> >> Congrats Ben!
> >>
> >> On Thu, May 6, 2021 at 9:20 AM Joris Van den Bossche
> >>  wrote:
> >>>
> >>> Congrats!
> >>>
> >>> On Thu, 6 May 2021 at 07:03, Weston Pace 
> wrote:
> >>>
>  Congratulations Ben!
> 
>  On Wed, May 5, 2021 at 6:48 PM Micah Kornfield  >
>  wrote:
> 
> > Congrats!
> >
> > On Wed, May 5, 2021 at 4:33 PM David Li  wrote:
> >
> >> Congrats Ben! Well deserved.
> >>
> >> Best,
> >> David
> >>
> >> On Wed, May 5, 2021, at 19:22, Neal Richardson wrote:
> >>> Congrats Ben!
> >>>
> >>> Neal
> >>>
> >>> On Wed, May 5, 2021 at 4:16 PM Eduardo Ponce <
> >> edponc...@gmail.com
> >> > wrote:
> >>>
>  Great news! Congratulations Ben.
> 
>  ~Eduardo
> 
>  
>  From: Wes McKinney  > wesmckinn%40gmail.com
> 
>  Sent: Wednesday, May 5, 2021, 7:10 PM
>  To: dev
>  Subject: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman
> 
>  The Project Management Committee (PMC) for Apache Arrow has
> >> invited
>  Benjamin Kietzman to become a PMC member and we are pleased to
> > announce
>  that Benjamin has accepted.
> 
>  Congratulations and welcome!
> 
> 
> >>>
> >>
> >
> 
> >>
> >
>


Re: [C++][CI] Make "C++ on s390x" build mandatory?

2021-02-23 Thread Bryan Cutler
+1 sgtm

On Tue, Feb 23, 2021, 9:47 AM Micah Kornfield  wrote:

> +1, but let's keep an eye on it to make sure it remains stable.
>
> On Tue, Feb 23, 2021 at 5:34 AM Kazuaki Ishizaki 
> wrote:
>
> > Thank you. +1 for this proposal,
> >
> > Kazuaki Ishizaki
> >
> >
> >
> > From:   Benjamin Kietzman 
> > To: dev 
> > Date:   2021/02/23 21:19
> > Subject:[EXTERNAL] Re: [C++][CI] Make "C++ on s390x" build
> > mandatory?
> >
> >
> >
> > +1 for making it mandatory
> >
> > On Tue, Feb 23, 2021, 07:07 Krisztián Szűcs 
> > wrote:
> >
> > > Hi!
> > >
> > > On Tue, Feb 23, 2021 at 11:53 AM Antoine Pitrou 
> > > wrote:
> > > >
> > > >
> > > > Hello,
> > > >
> > > > For a while we've had a big endian (s390x-based) build on Travis-CI.
> > > > The build is optional, meaning errors don't actually fail the CI.
> > > >
> > > > The build has been reasonably stable for some time apart for some
> > > > occasional regressions, which often don't get spotted because the
> > build
> > > > is reported as "green" anyway (because it's optional).
> > > >
> > > > I propose we make the build mandatory, to avoid missing further
> > > > regressions.  What do you think?
> > > Sounds good to me.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > >
> >
> >
> >
> >
>


Re: [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-20 Thread Bryan Cutler
+1 (non-binding)

I verified binaries and source with the following:
ARROW_TMPDIR=/tmp/arrow-test ARROW_GANDIVA=0 ARROW_PLASMA=0 TEST_DEFAULT=0
TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1
TEST_INTEGRATION_JAVA=1 dev/release/verify-release-candidate.sh source
3.0.0 2

I also have Spark integration tests passing in PR
https://github.com/apache/arrow/pull/9210

On Wed, Jan 20, 2021 at 12:48 PM Sutou Kouhei  wrote:

> +1 (binding)
>
> I ran the followings on Debian GNU/Linux sid:
>
>   * TZ=UTC \
>   ARROW_CMAKE_OPTIONS="-DBoost_NO_BOOST_CMAKE=ON" \
>   CUDA_TOOLKIT_ROOT=/usr \
>   dev/release/verify-release-candidate.sh source 3.0.0 2
>   * dev/release/verify-release-candidate.sh binaries 3.0.0 2
>   * LANG=C dev/release/verify-release-candidate.sh wheels 3.0.0 2
>
> with:
>
>   * gcc (Debian 10.2.1-6) 10.2.1 20210110
>   * openjdk version "11.0.10-ea" 2021-01-19
>   * nvidia-cuda-dev 11.1.1-3
>
> Thanks,
> --
> kou
>
> In 
>   "[VOTE] Release Apache Arrow 3.0.0 - RC2" on Tue, 19 Jan 2021 04:49:13
> +0100,
>   Krisztián Szűcs  wrote:
>
> > Hi,
> >
> > I would like to propose the following release candidate (RC2) of Apache
> > Arrow version 3.0.0. This is a release consisting of 678
> > resolved JIRA issues[1].
> >
> > This release candidate is based on commit:
> > d613aa68789288d3503dfbd8376a41f2d28b6c9d [2]
> >
> > The source release rc2 is hosted at [3].
> > The binary artifacts are hosted at [4][5][6][7].
> > The changelog is located at [8].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. See [9] for how to validate a release candidate.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow 3.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow 3.0.0 because...
> >
> > [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%203.0.0
> > [2]:
> https://github.com/apache/arrow/tree/d613aa68789288d3503dfbd8376a41f2d28b6c9d
> > [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-3.0.0-rc2
> > [4]: https://bintray.com/apache/arrow/centos-rc/3.0.0-rc2
> > [5]: https://bintray.com/apache/arrow/debian-rc/3.0.0-rc2
> > [6]: https://bintray.com/apache/arrow/python-rc/3.0.0-rc2
> > [7]: https://bintray.com/apache/arrow/ubuntu-rc/3.0.0-rc2
> > [8]:
> https://github.com/apache/arrow/blob/d613aa68789288d3503dfbd8376a41f2d28b6c9d/CHANGELOG.md
> > [9]:
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>


Re: [Java] PR review for ARROW-11173

2021-01-19 Thread Bryan Cutler
Hi Nick,
I left a note in the PR that I will try to review soon, thanks!


On Sun, Jan 17, 2021 at 8:22 PM Nick Bruno  wrote:

> Hi All,
>
> I'd like to get feedback on the pull request I created a little over a
> week ago - https://github.com/apache/arrow/pull/9151
>
> It adds support for Map types in the readers / writers.
>
> Let me know how I can improve it.
>
> Thanks,
> Nick
>


Github check error with ORC JNI adapter

2020-11-03 Thread Bryan Cutler
There seems to be a Github check error with the Java JNI tests for the ORC
adapter that is affecting a lot of recent PRs, see
https://github.com/apache/arrow/pull/8577/checks?check_run_id=1346780145.
From the log, it looks like some env setting, but I can't tell what's
wrong. Anyone else know?

Thanks,
Bryan


Re: [ANNOUNCE] New Arrow PMC chair: Wes McKinney

2020-10-26 Thread Bryan Cutler
Congrats Wes, well deserved!

On Sun, Oct 25, 2020, 10:17 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Thanks a lot Jacques for taking the flag until now, and congratulations,
> Wes!
>
> On Sun, Oct 25, 2020 at 2:58 PM Wes McKinney  wrote:
>
> > Thanks all!
> >
> > On Sun, Oct 25, 2020 at 6:29 AM Krisztián Szűcs
> >  wrote:
> > >
> > > Congrats Wes!
> > >
> > > On Sun, Oct 25, 2020 at 2:40 AM David Li 
> wrote:
> > > >
> > > > Congratulations Wes!
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On 10/24/20, Li Jin  wrote:
> > > > > Congrats Wes!
> > > > >
> > > > > On Sat, Oct 24, 2020 at 10:05 AM Ying Zhou 
> > wrote:
> > > > >
> > > > >> Congratulations Wes! :)
> > > > >>
> > > > >> Ying
> > > > >>
> > > > >> > On Oct 23, 2020, at 7:35 PM, Jacques Nadeau  >
> > wrote:
> > > > >> >
> > > > >> > I am pleased to announce that we have a new PMC chair and VP as
> > per our
> > > > >> > newly started tradition of rotating the chair once a year. I
> have
> > > > >> resigned
> > > > >> > and Wes was duly elected by the PMC and approved unanimously by
> > the
> > > > >> board.
> > > > >> >
> > > > >> > Please join me in congratulating Wes!
> > > > >> >
> > > > >> > Jacques
> > > > >>
> > > > >>
> > > > >
> >
>


Re: [VOTE] Release Apache Arrow 2.0.0 - RC2

2020-10-14 Thread Bryan Cutler
+1 (non-binding)

I verified binaries and source with:
ARROW_TMPDIR=/tmp/arrow-test ARROW_GANDIVA=0 ARROW_PLASMA=0 TEST_DEFAULT=0
TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1
dev/release/verify-release-candidate.sh source 2.0.0 2

On Wed, Oct 14, 2020 at 2:02 PM Sutou Kouhei  wrote:

> Hi,
>
> I forgot to mention some notes:
>
>   * JavaScript test was failed with system Node.js
> v12.18.4. (INSTALL_NODE=0
> dev/release/verify-release-candidate.sh source)
>
> It works without INSTALL_NODE=0. (Node.js is installed
> by nvm.)
>
>   * Python 3.8 wheel's test was failed. It also failed with
> 1.0.0 and 1.0.1.
>
> https://gist.github.com/kou/62cae5dcf4dcdd8f044fd33c50e8a007
>
>
> Thanks,
> --
> kou
>
> In <20201014.122838.1320330358136283691@clear-code.com>
>   "Re: [VOTE] Release Apache Arrow 2.0.0 - RC2" on Wed, 14 Oct 2020
> 12:28:38 +0900 (JST),
>   Sutou Kouhei  wrote:
>
> > +1 (binding)
> >
> > I ran the followings on Debian GNU/Linux sid:
> >
> >   * TZ=UTC \
> >   ARROW_CMAKE_OPTIONS="-DgRPC_SOURCE=BUNDLED
> -DBoost_NO_BOOST_CMAKE=ON" \
> >   CUDA_TOOLKIT_ROOT=/usr \
> >   dev/release/verify-release-candidate.sh source 2.0.0 2
> >   * dev/release/verify-release-candidate.sh binaries 2.0.0 2
> >   * LANG=C dev/release/verify-release-candidate.sh wheels 2.0.0 2
> >
> > with:
> >
> >   * gcc (Debian 10.2.0-9) 10.2.0
> >   * openjdk version "11.0.8" 2020-07-14
> >   * nvidia-cuda-dev 10.2.89-4
> >
> > Thanks,
> > --
> > kou
> >
> >
> > In 
> >   "[VOTE] Release Apache Arrow 2.0.0 - RC2" on Tue, 13 Oct 2020 17:41:00
> +0200,
> >   Krisztián Szűcs  wrote:
> >
> >> Hi,
> >>
> >> I would like to propose the following release candidate (RC2*) of Apache
> >> Arrow version 2.0.0. This is a release consisting of 561
> >> resolved JIRA issues[1].
> >>
> >> This release candidate is based on commit:
> >> 478286658055bb91737394c2065b92a7e92fb0c1 [2]
> >>
> >> The source release rc2 is hosted at [3].
> >> The binary artifacts are hosted at [4][5][6][7].
> >> The changelog is located at [8].
> >>
> >> Please download, verify checksums and signatures, run the unit tests,
> >> and vote on the release. See [9] for how to validate a release
> candidate.
> >>
> >> The vote will be open for at least 72 hours.
> >>
> >> [ ] +1 Release this as Apache Arrow 2.0.0
> >> [ ] +0
> >> [ ] -1 Do not release this as Apache Arrow 2.0.0 because...
> >>
> >> * RC1 had a CMake issue which surfaced during the packaging builds so
> >> I had to create another release candidate.
> >> [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%202.0.0
> >> [2]:
> https://github.com/apache/arrow/tree/478286658055bb91737394c2065b92a7e92fb0c1
> >> [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-2.0.0-rc2
> >> [4]: https://bintray.com/apache/arrow/centos-rc/2.0.0-rc2
> >> [5]: https://bintray.com/apache/arrow/debian-rc/2.0.0-rc2
> >> [6]: https://bintray.com/apache/arrow/python-rc/2.0.0-rc2
> >> [7]: https://bintray.com/apache/arrow/ubuntu-rc/2.0.0-rc2
> >> [8]:
> https://github.com/apache/arrow/blob/478286658055bb91737394c2065b92a7e92fb0c1/CHANGELOG.md
> >> [9]:
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>


Re: conversion between pyspark.DataFrame and pyarrow.Table

2020-09-01 Thread Bryan Cutler
There isn't a direct conversion to/from Spark; I made
https://issues.apache.org/jira/browse/SPARK-29040 a while ago for
conversion to Spark from an Arrow table. If possible, make a comment there
for your use case which might help get support for it.

Bryan

On Mon, Aug 31, 2020, 9:12 PM Micah Kornfield  wrote:

> Hi Radu,
> I'm not a spark expert, but I haven't seen any documentation on direct
> conversion.  You might be better off asking the user@spark or dev@spark
> mailing lists.
>
> Thanks,
> Micah
>
>
> On Wed, Aug 26, 2020 at 1:46 PM Radu Teodorescu
>  wrote:
>
> > Hi,
> > I noticed that arrow is mentioned as an optional intermediary format for
> > converting between pandas DFs and spark DFs. Is there a way to explicitly
> > convert a pyarrow Table to a spark DataFrame and the other way around.
> > Absent that, going pyspark->pandas->pyarrow and back works but it's
> > obviously suboptimal.
> > Thank you
> > Radu
>


Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Bryan Cutler
I also think this would be a worthwhile addition and help the project
expand in more areas. Beyond the Apache Spark optimization use case, having
Arrow interoperability with the Python data science stack on BE would be
very useful. I have looked at the remaining PRs for Java and they seem
pretty minimal and straightforward. Implementing the equivalent record
batch swapping as done in C++ at [1] would be a little more involved, but
still reasonable. Would it make sense to create a branch to apply all
remaining changes with CI to get a better picture before deciding on
bringing into master branch?  I could help out with shepherding this effort
and assist in maintenance, if we decide to accept.

Bryan

[1] https://github.com/apache/arrow/pull/7507
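For readers unfamiliar with the kind of conversion under discussion, here is a toy byte-swapping sketch using numpy. It is only an analogy for the record batch swapping referenced above, not Arrow's actual implementation:

```python
import numpy as np

# Toy analogy only: reinterpret a big-endian int32 buffer and byte-swap
# it into the host's native order, preserving the logical values.
be = np.array([1, 2, 3], dtype=">i4")            # big-endian layout
native = be.astype(be.dtype.newbyteorder("="))   # swap to native order
```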

On Mon, Aug 31, 2020 at 1:42 PM Wes McKinney  wrote:

> I think it's well within the right of an implementation to reject BE
> data (or non-native-endian), but if an implementation chooses to
> implement and maintain the endianness conversions, then it does not
> seem so bad to me.
>
> On Mon, Aug 31, 2020 at 3:33 PM Jacques Nadeau  wrote:
> >
> > And yes, for those of you looking closely, I commented on ARROW-245 when
> it
> > was committed. I just forgot about it.
> >
> > It looks like I had mostly the same concerns then that I do now :) Now
> I'm
> > just more worried about format sprawl...
> >
> > On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau 
> wrote:
> >
> > > What do you mean?  The Endianness field (a Big|Little enum) was added 4
> > >> years ago:
> > >> https://issues.apache.org/jira/browse/ARROW-245
> > >
> > >
> > > I didn't realize that was done, my bad. Good example of format rot
> from my
> > > pov.
> > >
> > >
> > >
>


Re: change in pyarrow scalar equality?

2020-08-14 Thread Bryan Cutler
Thanks for the detailed response and background on this Joris! My case was
certainly not necessary to compare pyarrow scalars, so it would have been
better to just raise an error, but there are probably other cases where
that wouldn't be preferred. Anyway, I think it would be a good idea to
document this since I'm sure others will hit it. I made
https://issues.apache.org/jira/browse/ARROW-9750 for adding some docs.

On Thu, Aug 6, 2020 at 12:18 AM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> Hi Bryan,
>
> This indeed changed in 1.0. The full scalar implementation in pyarrow was
> refactored (there were two types of scalars before, see
> https://issues.apache.org/jira/browse/ARROW-9017 /
> https://github.com/apache/arrow/pull/7519).
>
> Due to that PR, there was discussion about what "==" should mean
> (originally triggered by comparison with Null returning Null, but then
> expanded to comparison in general, see the mailing list thread "Extremely
> dubious Python equality semantics" ->
>
> https://lists.apache.org/thread.html/rdd11d3635c751a3a626e14106f1a95f3cddba4dd3bf44247edefde49%40%3Cdev.arrow.apache.org%3E
> ).
> The options for "==" are: is it a strict "data structure / object" equality
> (like the '.equals(..)' method), or is it an "analytical/semantic" equality
> (like the element-wise 'equal' compute method)?
>
> In the end, we opted for the object equality, and then made it actually
> strict to only have it compare equal to actual pyarrow scalars (and not do
> automatic conversion of python scalars to pyarrow scalars). But note that
> even different types will not compare equal like that at the moment:
>
> >>> a = pa.array([1,2,3], type="int64")
> >>> b = pa.array([1,2,3], type="int32")
> >>> a[0] == b[0]
> False
> >>> a[0] == 1
> False
> >>> a[0].equals(1)
> ...
> TypeError: Argument 'other' has incorrect type (expected
> pyarrow.lib.Scalar, got int)
>
> Using the pyarrow.compute module, you _should_ get the analytical equality
> as you expected in this case. However, it seems that the "equal" kernel is
> not yet implemented for differing types (I suppose an automatic casting
> step is still missing):
>
> >>> import pyarrow.compute as pc
> >>> pc.equal(a[0], b[0])
> ...
> ArrowNotImplementedError: Function equal has no kernel matching input types
> (scalar[int64], scalar[int32])
> >>> pc.equal(a[0], 1)
> ...
> TypeError: Got unexpected argument type  for compute function
>
> For this last one, we should probably do an attempt to convert the python
> scalar to a pyarrow scalar, and maybe for the "a[0] == 1" case as well
> (however, coerce to which type if there are multiple possibilities (eg
> int64 vs int32)?)
>
> I agree the new behaviour might be confusing (if you expect semantic
> equality), but on the other hand is also clear avoiding dubious cases. But
> I don't think this is already set in stone, so more feedback is certainly
> welcome.
>
> Joris
>
> On Thu, 6 Aug 2020 at 01:12, Bryan Cutler  wrote:
>
> > Hi all,
> >
> > I came across a behavior change from 0.17.1 when comparing array scalar
> > values with python objects. This used to work for 0.17.1 and before, but
> in
> > 1.0.0 equals always returns false. I saw there was a previous discussion
> on
> > Python equality semantics, but not sure if the conclusion is the behavior
> > I'm seeing. For example:
> >
> > In [4]: a = pa.array([1,2,3])
> >
> >
> > In [5]: a[0] == 1
> >
> > Out[5]: False
> >
> > In [6]: a[0].as_py() == 1
> >
> > Out[6]: True
> >
> > I know the scalars can be converted with `as_py()`, but it does seem a
> > little strange to return False when compared with a python object. Is
> this
> > the expected behavior for 1.0.0+?
> >
> > Thanks,
> > Bryan
> >
>


change in pyarrow scalar equality?

2020-08-05 Thread Bryan Cutler
Hi all,

I came across a behavior change from 0.17.1 when comparing array scalar
values with python objects. This used to work for 0.17.1 and before, but in
1.0.0 equals always returns false. I saw there was a previous discussion on
Python equality semantics, but not sure if the conclusion is the behavior
I'm seeing. For example:

In [4]: a = pa.array([1,2,3])


In [5]: a[0] == 1

Out[5]: False

In [6]: a[0].as_py() == 1

Out[6]: True

I know the scalars can be converted with `as_py()`, but it does seem a
little strange to return False when compared with a python object. Is this
the expected behavior for 1.0.0+?

Thanks,
Bryan


Re: [VOTE] Release Apache Arrow 1.0.0 - RC2

2020-07-22 Thread Bryan Cutler
+1 (non-binding)

I ran release verification script with the following args
ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1
TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1
dev/release/verify-release-candidate.sh source 1.0.0 2

On Wed, Jul 22, 2020 at 1:19 PM Sutou Kouhei  wrote:

> >>   * Python 3.8 wheel's test is failed.
> > With the same Cython test issue?
>
> Yes.
>
> In 
>   "Re: [VOTE] Release Apache Arrow 1.0.0 - RC2" on Wed, 22 Jul 2020
> 12:33:49 +0200,
>   Krisztián Szűcs  wrote:
>
> > On Wed, Jul 22, 2020 at 3:19 AM Sutou Kouhei  wrote:
> >>
> >> Hi,
> >>
> >> +1 (binding)
> >>
> >> I ran the followings on Debian GNU/Linux sid:
> >>
> >>   * INSTALL_NODE=0 \
> >>   JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \
> >>   CUDA_TOOLKIT_ROOT=/usr \
> >>   ARROW_CMAKE_OPTIONS="-DgRPC_SOURCE=BUNDLED
> -DBoost_NO_BOOST_CMAKE=ON" \
> >> dev/release/verify-release-candidate.sh source 1.0.0 2
> >>   * dev/release/verify-release-candidate.sh binaries 1.0.0 2
> >>   * dev/release/verify-release-candidate.sh wheels 1.0.0 2
> >>
> >> with:
> >>
> >>   * gcc version 9.3.0 (Debian 9.3.0-15)
> >>   * openjdk version "1.8.0_252"
> >>   * Node.js v12.18.1
> >>   * nvidia-cuda-dev 10.1.243-6+b1
> >>
> >> Notes:
> >>
> >>   * JavaScript tests are failed without INSTALL_NODE=0 (with
> >> Node.js 14).
> >>
> >>   * Python 3.8 wheel's test is failed.
> > With the same Cython test issue?
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >>
> >> In 
> >>   "[VOTE] Release Apache Arrow 1.0.0 - RC2" on Tue, 21 Jul 2020
> 04:07:39 +0200,
> >>   Krisztián Szűcs  wrote:
> >>
> >> > Hi,
> >> >
> >> > I would like to propose the following release candidate (RC2) of
> Apache
> >> > Arrow version 1.0.0. This is a release consisting of 838
> >> > resolved JIRA issues[1].
> >> >
> >> > This release candidate is based on commit:
> >> > b0d623957db820de4f1ff0a5ebd3e888194a48f0 [2]
> >> >
> >> > The source release rc2 is hosted at [3].
> >> > The binary artifacts are hosted at [4][5][6][7].
> >> > The changelog is located at [8].
> >> >
> >> > Please download, verify checksums and signatures, run the unit tests,
> >> > and vote on the release. See [9] for how to validate a release
> candidate.
> >> >
> >> > The vote will be open for at least 72 hours.
> >> >
> >> > [ ] +1 Release this as Apache Arrow 1.0.0
> >> > [ ] +0
> >> > [ ] -1 Do not release this as Apache Arrow 1.0.0 because...
> >> >
> >> > [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%201.0.0
> >> > [2]:
> https://github.com/apache/arrow/tree/b0d623957db820de4f1ff0a5ebd3e888194a48f0
> >> > [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-1.0.0-rc2
> >> > [4]: https://bintray.com/apache/arrow/centos-rc/1.0.0-rc2
> >> > [5]: https://bintray.com/apache/arrow/debian-rc/1.0.0-rc2
> >> > [6]: https://bintray.com/apache/arrow/python-rc/1.0.0-rc2
> >> > [7]: https://bintray.com/apache/arrow/ubuntu-rc/1.0.0-rc2
> >> > [8]:
> https://github.com/apache/arrow/blob/b0d623957db820de4f1ff0a5ebd3e888194a48f0/CHANGELOG.md
> >> > [9]:
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>


Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Bryan Cutler
> > > > > > > > > In [15]: struct_arr = pa.StructArray.from_arrays([arr],
> > > > names=['f0'])
> > > > > > > > >
> > > > > > > > > In [16]: struct_arr
> > > > > > > > > Out[16]:
> > > > > > > > > 
> > > > > > > > > -- is_valid: all not null
> > > > > > > > > -- child 0 type: timestamp[ns, tz=America/Los_Angeles]
> > > > > > > > >   [
> > > > > > > > > 1970-01-01 00:00:00.0,
> > > > > > > > > 1970-01-01 00:00:00.1,
> > > > > > > > > 1970-01-01 00:00:00.2
> > > > > > > > >   ]
> > > > > > > > >
> > > > > > > > > In [17]: struct_arr.to_pandas()
> > > > > > > > > Out[17]:
> > > > > > > > > 0{'f0': 0}
> > > > > > > > > 1{'f0': 1}
> > > > > > > > > 2{'f0': 2}
> > > > > > > > > dtype: object
> > > > > > > > >
> > > > > > > > > All in all it appears that this part of the project needs
> > some
> > > > TLC
> > > > > > > > >
> > > > > > > > > On Sun, Jul 19, 2020 at 6:16 PM Wes McKinney <
> > > > wesmck...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Well, the problem is that time zones are really finicky
> > > > comparing
> > > > > > > > > > Spark (which uses a localtime interpretation of
> timestamps
> > > > without
> > > > > > > > > > time zone) and Arrow (which has naive timestamps -- a
> > concept
> > > > similar
> > > > > > > > > > but different from the SQL concept TIMESTAMP WITHOUT TIME
> > ZONE
> > > > -- and
> > > > > > > > > > tz-aware timestamps). So somewhere there is a time zone
> > being
> > > > > > > stripped
> > > > > > > > > > or applied/localized which may result in the transferred
> > data
> > > > to/from
> > > > > > > > > > Spark being shifted by the time zone offset. I think it's
> > > > important
> > > > > > > > > > that we determine what the problem is -- if it's a
> problem
> > > > that has
> > > > > > > to
> > > > > > > > > > be fixed in Arrow (and it's not clear to me that it is)
> > it's
> > > > worth
> > > > > > > > > > spending some time to understand what's going on to avoid
> > the
> > > > > > > > > > possibility of patch release on account of this.
> > > > > > > > > >
> > > > > > > > > > On Sun, Jul 19, 2020 at 6:12 PM Neal Richardson
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > If it’s a display problem, should it block the release?
> > > > > > > > > > >
> > > > > > > > > > > Sent from my iPhone
> > > > > > > > > > >
> > > > > > > > > > > > On Jul 19, 2020, at 3:57 PM, Wes McKinney <
> > > > wesmck...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > I opened https://issues.apache.org/
> > jira/browse/ARROW-9525
> > > > > > > about the
> > > > > > > > > > > > display problem. My guess is that there are other
> > problems
> > > > > > > lurking
> > > > > > > > > > > > here
> > > > > > > > > > > >
> > > > > > > > > > > >> On Sun, Jul 19, 2020 at 5:54 PM Wes McKinney <
> > > > > > > wesmck...@gmail.com> wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >> hi Bryan,
> > > > > > > > > >

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Bryan Cutler
+0 (non-binding)

I ran the verification script for binaries and then source, as below, and
both look good:
ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1
TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1
dev/release/verify-release-candidate.sh source 1.0.0 1

I tried to patch Spark locally to verify the recent change in nested
timestamps and was not able to get things working quite right, but I'm not
sure if the problem is in Spark, Arrow or my patch - hence my vote of +0.

Here is what I'm seeing

```
(Input as datetime)
datetime.datetime(2018, 3, 10, 0, 0)
datetime.datetime(2018, 3, 15, 0, 0)

(Struct Array)
-- is_valid: all not null
-- child 0 type: timestamp[us, tz=America/Los_Angeles]
  [
2018-03-10 00:00:00.00,
2018-03-10 00:00:00.00
  ]
-- child 1 type: timestamp[us, tz=America/Los_Angeles]
  [
2018-03-15 00:00:00.00,
2018-03-15 00:00:00.00
  ]

(Flattened Arrays)
types [TimestampType(timestamp[us, tz=America/Los_Angeles]),
TimestampType(timestamp[us, tz=America/Los_Angeles])]
[
[
  2018-03-10 00:00:00.00,
  2018-03-10 00:00:00.00
], 
[
  2018-03-15 00:00:00.00,
  2018-03-15 00:00:00.00
]]

(Pandas Conversion)
[
0   2018-03-09 16:00:00-08:00
1   2018-03-09 16:00:00-08:00
dtype: datetime64[ns, America/Los_Angeles],

0   2018-03-14 17:00:00-07:00
1   2018-03-14 17:00:00-07:00
dtype: datetime64[ns, America/Los_Angeles]]
```

Based on the output of an existing, correct timestamp udf, it looks like the
pyarrow StructArray values are wrong and that's carried through the
flattened arrays, causing the Pandas values to have a negative offset.

Here is output from a working udf with a timestamp; the pyarrow Array
displays in UTC time, I believe.

```
(Timestamp Array)
type timestamp[us, tz=America/Los_Angeles]
[
  [
1969-01-01 09:01:01.00
  ]
]

(Pandas Conversion)
0   1969-01-01 01:01:01-08:00
Name: _0, dtype: datetime64[ns, America/Los_Angeles]

(Timezone Localized)
0   1969-01-01 01:01:01
Name: _0, dtype: datetime64[ns]
```

I'll have to dig in further at another time and debug where the values go
wrong.
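To make the suspected interpretation mismatch concrete, here is a minimal standard-library sketch (variable names are made up for illustration): if a naive wall-clock value is stored as if it were UTC and only localized on display, it shifts by the UTC offset, which matches the -8:00 shift in the output above.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

la = ZoneInfo("America/Los_Angeles")

# The wall-clock value the user supplied (naive, no time zone attached).
wall = datetime(2018, 3, 10, 0, 0)

# Wrongly treating the naive value as UTC and localizing afterward
# shifts it by the offset: 2018-03-09 16:00:00-08:00.
stored_as_utc = wall.replace(tzinfo=timezone.utc)
shown_local = stored_as_utc.astimezone(la)

# Interpreting the naive value directly as local time keeps it intact:
# 2018-03-10 00:00:00-08:00.
correct = wall.replace(tzinfo=la)
```

If the StructArray path applies the first interpretation somewhere, the Pandas conversion would show exactly this kind of negative offset.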

On Sat, Jul 18, 2020 at 9:51 PM Micah Kornfield 
wrote:

> +1 (binding)
>
> Ran wheel and binary tests on ubuntu 19.04
>
> On Fri, Jul 17, 2020 at 2:25 PM Neal Richardson <
> neal.p.richard...@gmail.com>
> wrote:
>
> > +1 (binding)
> >
> > In addition to the usual verification on
> > https://github.com/apache/arrow/pull/7787, I've successfully staged the
> R
> > binary artifacts on Windows (
> > https://github.com/r-windows/rtools-packages/pull/126), macOS (
> > https://github.com/autobrew/homebrew-core/pull/12), and Linux (
> > https://github.com/ursa-labs/arrow-r-nightly/actions/runs/172977277)
> using
> > the release candidate.
> >
> > And I agree with the judgment about skipping a JS release artifact. Looks
> > like there hasn't been a code change since October so there's no point.
> >
> > Neal
> >
> > On Fri, Jul 17, 2020 at 10:37 AM Wes McKinney 
> wrote:
> >
> > > I see the JS failures as well. I think it is a failure localized to
> > > newer Node versions since our JavaScript CI works fine. I don't think
> > > it should block the release given the lack of development activity in
> > > JavaScript [1] -- if any JS devs are concerned about publishing an
> > > artifact then we can skip pushing it to NPM
> > >
> > > @Ryan it seems it may be something environment related on your
> > > machine, I'm on Ubuntu 18.04 and have not seen this.
> > >
> > > On
> > >
> > > > >   * The Python 3.8 wheel's tests fail; 3.5, 3.6 and 3.7
> > > > > pass. It seems that linking -larrow and -larrow_python for
> > > > > the Cython tests fails.
> > >
> > > I suspect this is related to
> > >
> > >
> >
> https://github.com/apache/arrow/commit/120c21f4bf66d2901b3a353a1f67bac3c3355924#diff-0f69784b44040448d17d0e4e8a641fe8
> > > ,
> > > but I don't think it's a blocking issue
> > >
> > > [1]: https://github.com/apache/arrow/commits/master/js
> > >
> > > On Fri, Jul 17, 2020 at 9:42 AM Ryan Murray  wrote:
> > > >
> > > > I've tested Java and it looks good. However the verify script keeps
> on
> > > > bailing with protobuf related errors:
> > > > 'cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_proto.pb.cc'
> and
> > > > friends can't find protobuf definitions. A bit odd as cmake can see
> > > protobuf
> > > > headers and builds directly off master work just fine. Has anyone
> else
> > > > experienced this? I am on ubuntu 18.04
> > > >
> > > > On Fri, Jul 17, 2020 at 10:49 AM Antoine Pitrou 
> > > wrote:
> > > >
> > > > >
> > > > > +1 (binding).  I tested on Ubuntu 18.04.
> > > > >
> > > > > * Wheels verification went fine.
> > > > > * Source verification went fine with CUDA enabled and
> > > > > TEST_INTEGRATION_JS=0 TEST_JS=0.
> > > > >
> > > > > I didn't test the binaries.
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> > > > >
> > > > > > On 17/07/2020 at 03:41, Krisztián Szűcs wrote:
> > > > > > Hi,
> > > > > >
> > 

Re: [VOTE] Add Decimal::bitWidth field to Schema.fbs for forward compatibility

2020-06-25 Thread Bryan Cutler
+1

On Wed, Jun 24, 2020, 10:38 AM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:

> +1 (binding)
>


Re: [ANNOUNCE] New Arrow committers: Ji Liu and Liya Fan

2020-06-12 Thread Bryan Cutler
Congratulations!

On Thu, Jun 11, 2020, 9:29 PM Fan Liya  wrote:

> Dear all,
>
> I want to thank you all for all your kind help.
> It is a great honor to work with you in this great community.
> I Hope we can contribute more and make the community better.
>
> Best,
> Liya Fan
>
> On Fri, Jun 12, 2020 at 12:02 PM Ji Liu  wrote:
>
> > Thanks everyone for the warm welcome!
> > It's a great honor for me to be a committer. Looking forward to
> > contributing more to the community.
> >
> > Thanks,
> > Ji Liu
> >
> >
> > > paddy horan  wrote on Fri, Jun 12, 2020 at 8:52 AM:
> >
> > > Congrats!
> > >
> > > 
> > > From: Micah Kornfield 
> > > Sent: Thursday, June 11, 2020 12:59:32 PM
> > > To: dev 
> > > Subject: Re: [ANNOUNCE] New Arrow committers: Ji Liu and Liya Fan
> > >
> > > Congratulations!
> > >
> > > On Thu, Jun 11, 2020 at 9:32 AM David Li 
> wrote:
> > >
> > > > Congrats Ji  & Liya!
> > > >
> > > > David
> > > >
> > > > On 6/11/20, siddharth teotia  wrote:
> > > > > Congratulations!
> > > > >
> > > > > On Thu, Jun 11, 2020 at 7:51 AM Neal Richardson
> > > > > 
> > > > > wrote:
> > > > >
> > > > >> Congratulations, both!
> > > > >>
> > > > >> Neal
> > > > >>
> > > > >> On Thu, Jun 11, 2020 at 7:38 AM Wes McKinney  >
> > > > wrote:
> > > > >>
> > > > >> > On behalf of the Arrow PMC I'm happy to announce that Ji Liu and
> > > Liya
> > > > >> > Fan have been invited to be Arrow committers and they have both
> > > > >> > accepted.
> > > > >> >
> > > > >> > Welcome, and thank you for your contributions!
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > *Best Regards,*
> > > > > *SIDDHARTH TEOTIA*
> > > > > *2008C6PS540G*
> > > > > *BITS PILANI- GOA CAMPUS*
> > > > >
> > > > > *+91 87911 75932*
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Release Apache Arrow 0.17.1 - RC1

2020-05-15 Thread Bryan Cutler
+1 (non-binding)

I ran:
ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1
TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1
dev/release/verify-release-candidate.sh source 0.17.1 1

On Fri, May 15, 2020 at 8:38 AM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:

> +1 binding, verified sources and binaries locally (no exclusions).
>
> On Fri, May 15, 2020 at 10:38 AM Neal Richardson
>  wrote:
> >
> > +1 (binding)
> >
> > Verification here: https://github.com/apache/arrow/pull/7170
> >
> > Still haven't worked out the Windows source verification job, but
> > everything else looks good.
> >
> > Neal
> >
> > On Thu, May 14, 2020 at 5:43 PM Sutou Kouhei  wrote:
> >
> > > +1 (binding)
> > >
> > > I ran the followings on Debian GNU/Linux sid:
> > >
> > >   * JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \
> > >   CUDA_TOOLKIT_ROOT=/usr \
> > >   ARROW_CMAKE_OPTIONS=-DgRPC_SOURCE=BUNDLED \
> > > dev/release/verify-release-candidate.sh source 0.17.1 1
> > >   * dev/release/verify-release-candidate.sh binaries 0.17.1 1
> > >   * dev/release/verify-release-candidate.sh wheels 0.17.1 1
> > >
> > > with:
> > >
> > >   * gcc (Debian 9.3.0-10) 9.3.0
> > >   * openjdk version "1.8.0_252"
> > >   * Node.JS v12.16.2
> > >   * nvidia-cuda-dev 10.1.168-8
> > >
> > > Thanks,
> > > --
> > > kou
> > > In  >
> > >   "[VOTE] Release Apache Arrow 0.17.1 - RC1" on Thu, 14 May 2020
> 18:23:43
> > > +0200,
> > >   Krisztián Szűcs  wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to propose the following release candidate (RC1) of
> Apache
> > > > Arrow version 0.17.1. This is a release consisting of 19
> > > > resolved JIRA issues[1].
> > > >
> > > > This release candidate is based on commit:
> > > > ff7ee06020949daf66ac05090753e1a17736d9fa [2]
> > > >
> > > > The source release rc1 is hosted at [3].
> > > > The binary artifacts are hosted at [4][5][6][7].
> > > > The changelog is located at [8].
> > > >
> > > > Please download, verify checksums and signatures, run the unit tests,
> > > > and vote on the release. See [9] for how to validate a release
> candidate.
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Release this as Apache Arrow 0.17.1
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow 0.17.1 because...
> > > >
> > > > [1]:
> > >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.17.1
> > > > [2]:
> > >
> https://github.com/apache/arrow/tree/ff7ee06020949daf66ac05090753e1a17736d9fa
> > > > [3]:
> > > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.17.1-rc1
> > > > [4]: https://bintray.com/apache/arrow/centos-rc/0.17.1-rc1
> > > > [5]: https://bintray.com/apache/arrow/debian-rc/0.17.1-rc1
> > > > [6]: https://bintray.com/apache/arrow/python-rc/0.17.1-rc1
> > > > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.17.1-rc1
> > > > [8]:
> > >
> https://github.com/apache/arrow/blob/ff7ee06020949daf66ac05090753e1a17736d9fa/CHANGELOG.md
> > > > [9]:
> > >
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> > >
>


Re: Python is there support for extension types in Parquet?

2020-04-24 Thread Bryan Cutler
Thanks for the tips Micah and Wes. The storage type is an int64 list, which
works in a roundtrip for parquet by itself. I'll look into it a bit more to
see what is going on.

On Fri, Apr 24, 2020 at 11:50 AM Wes McKinney  wrote:

> Extension types will round trip correctly through Parquet so long as
> the storage type can be roundtripped (as Micah pointed out support for
> reading all nested types is not yet available).
>
> Note for reinforcement that Feather V2 is exactly an Arrow IPC file --
> so IPC files could already do this prior to 0.17.0. People seem to
> like the name so I figured there wasn't much reason to discard the
> "brand" which already has a good reputation in the community.
>
> On Fri, Apr 24, 2020 at 1:26 PM Micah Kornfield 
> wrote:
> >
> > Hi Bryan,
> > Extension types isn't explicitly called out but
> > https://issues.apache.org/jira/browse/ARROW-1644 (and related subtasks)
> > might be a good place to track this.
> >
> > Thanks,
> > Micah
> >
> > On Fri, Apr 24, 2020 at 11:13 AM Bryan Cutler  wrote:
> >
> > > I've been trying out IO with Arrow's extension types and I was able to
> write a
> > > parquet file but reading it back causes an error:
> > > "pyarrow.lib.ArrowInvalid: Unsupported nested type: ...". Looking at
> the
> > > code for the parquet reader, it checks nested types and only allows a
> few
> > > specific ones. Is this a known limitation? I couldn't find a JIRA but
> I'll
> > > make one if it is. Alternatively, I was able to convert my extension
> array
> > > to/from a Pandas DataFrame and read/write to a Feather file, which is
> > > awesome - nice work!
> > >
> > > Thanks,
> > > Bryan
> > >
>


Python is there support for extension types in Parquet?

2020-04-24 Thread Bryan Cutler
I've been trying out IO with Arrow's extension types and I was able to write a
parquet file but reading it back causes an error:
"pyarrow.lib.ArrowInvalid: Unsupported nested type: ...". Looking at the
code for the parquet reader, it checks nested types and only allows a few
specific ones. Is this a known limitation? I couldn't find a JIRA but I'll
make one if it is. Alternatively, I was able to convert my extension array
to/from a Pandas DataFrame and read/write to a Feather file, which is
awesome - nice work!

Thanks,
Bryan


Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

2020-04-21 Thread Bryan Cutler
I really would like to see a 1.0.0 release with complete implementations
for C++ and Java. From my experience, that interoperability has been a
major selling point for the project. That being said, my time for
contributions has been pretty limited lately and I know that Java has been
lagging, so if the rest of the community would like to push forward with a
reduced scope, that is okay with me. I'll still continue to do what I can
on Java to fill in the gaps.

Bryan

On Tue, Apr 21, 2020 at 8:47 AM Wes McKinney  wrote:

> Hi all -- are there some opinions about this?
>
> Thanks
>
> On Thu, Apr 16, 2020 at 5:30 PM Wes McKinney  wrote:
> >
> > hi folks,
> >
> > Previously we had discussed a plan for making a 1.0.0 release based on
> > completeness of columnar format integration tests and making
> > forward/backward compatibility guarantees as formalized in
> >
> >
> https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst
> >
> > In particular, we wanted to demonstrate comprehensive Java/C++
> interoperability.
> >
> > As time has passed we have stalled out a bit on completing integration
> > tests for the "long tail" of data types and columnar format features.
> >
> >
> https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit?usp=sharing
> >
> > As such I wanted to propose a reduction in scope so that we can make a
> > 1.0.0 release sooner. The plan would be as follows:
> >
> > * Endeavor to have integration tests implemented and working in at
> > least one reference implementation (likely to be the C++ library). It
> > seems important to verify that what's in Columnar.rst is able to be
> > unambiguously implemented.
> > * Indicate in Versioning.rst or another place in the documentation the
> > list of data types or advanced columnar format features (like
> > delta/replacement dictionaries) that are not yet fully integration
> > tested.
> >
> > Some of the essential protocol stability details and all of the most
> > commonly used data types have been stable for a long time now,
> > particularly after the recent alignment change. The current list of
> > features that aren't being tested for cross-implementation
> > compatibility should not pose risk to downstream users.
> >
> > Thoughts about this? The 1.0.0 release is an important milestone for
> > the project and will help build continued momentum in developer and
> > user community growth.
> >
> > Thanks
> > Wes
>


Re: Trouble installing archery?

2020-04-13 Thread Bryan Cutler
I had the same problem and Antoine's suggestion was exactly what was wrong.

On Mon, Apr 13, 2020 at 1:27 AM Antoine Pitrou  wrote:

>
> On 13/04/2020 at 02:42, Micah Kornfield wrote:
> > When I follow the instructions at
> > https://arrow.apache.org/docs/developers/benchmarks.html
> >
> > "pip install -e dev/archery"
> >
> > I get a permission denied (error pasted at the end in full).  Are there
> > additional steps that need to happen when using virtualenv?
>
> Hmm, I don't think so.  Did you run `git clean -Xfd` in your checkout?
> Perhaps there are root-created files lying around... (this often happens
> with Docker)
>
> Regards
>
> Antoine.
>


[jira] [Created] (ARROW-8386) [Python] pyarrow.jvm raises error for empty Arrays

2020-04-09 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-8386:
---

 Summary: [Python] pyarrow.jvm raises error for empty Arrays
 Key: ARROW-8386
 URL: https://issues.apache.org/jira/browse/ARROW-8386
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.16.0
Reporter: Bryan Cutler
Assignee: Bryan Cutler


In the pyarrow.jvm module, when there is an empty array in Java, trying to 
create it in python raises a ValueError. This is because for an empty array, 
Java returns an empty list of buffers, then pyarrow.jvm attempts to create the 
array with pa.Array.from_buffers with an empty list.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Proposal to use Black for automatic formatting of Python code

2020-03-27 Thread Bryan Cutler
+1 for using black

On Fri, Mar 27, 2020 at 11:53 AM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> On Fri, 27 Mar 2020 at 18:49, Antoine Pitrou  wrote:
>
> >
> > I don't want to be the small minority opposing this so let's go for it.
> > One question though: will we continue to check Cython files using
> > flake8?
> >
>
> Yes, and I think we can continue to check flake8 for python files as well.
> At
> least that is what we do in e.g. pandas. There are a few things that flake8
> checks that Black doesn't fix automatically. For example comments that are
> too long are not reformatted by black, so it's good to keep flake8 working
> for that.
>
> Joris
>
>
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On Thu, 26 Mar 2020 20:37:01 +0100
> > Joris Van den Bossche  wrote:
> > > Hi all,
> > >
> > > I would like to propose adopting Black as code formatter within the
> > python
> > > project. There is an older JIRA issue about this (
> > > https://issues.apache.org/jira/browse/ARROW-5176), but bringing it to
> > the
> > > mailing list for wider attention.
> > >
> > > Black (https://github.com/ambv/black) is a tool for automatically
> > > formatting python code in ways which flake8 and our other linters
> approve
> > > of (and fill a similar role to clang-format for C++ and cmake-format
> for
> > > cmake). It can also be added to the linting checks on CI and to the
> > > pre-commit hooks like we now run flake8.
> > > Using it ensures python code will be formatted consistently, and more
> > > importantly automates this formatting, letting you focus on more
> > important
> > > matters.
> > >
> > > Black makes some specific formatting choices, and not everybody (me
> > > included) will always like those choices (that's how it goes with
> > something
> > > subjective like formatting). But my experience with using it in some
> > other
> > > big python projects (pandas, dask) has been very positive. You very
> > quickly
> > > get used to how it looks, while it is much nicer to not have to worry
> > about
> > > formatting anymore.
> > >
> > > Best,
> > > Joris
> > >
> >
> >
> >
> >
>


Re: [DISCUSS] Flight testing inconsistency for empty batches

2020-02-28 Thread Bryan Cutler
Thanks all, I agree with validating each record batch independently. I made
https://issues.apache.org/jira/browse/ARROW-7966 to ensure this, and that
will hopefully iron out any kinks in the different implementations.

Thanks,
Bryan

On Wed, Feb 26, 2020 at 3:13 PM Wes McKinney  wrote:

> I agree with independent validation.
>
> On Tue, Feb 25, 2020 at 2:55 PM David Li  wrote:
> >
> > Hey Bryan,
> >
> > Thanks for looking into this issue. I would vote that we should
> > validate each batch independently, so we can catch issues related to
> > the structure of the data and not just the content. C++ doesn't do any
> > detection of empty batches per se, but on both ends it reads all the
> > data into a table, which would eliminate any empty batches.
> >
> > It also wouldn't be reasonable to stop sending batches that are empty,
> > because Flight lets you attach metadata to batches, and so an empty
> > batch might still have metadata that the client or server wants.
> >
> > Best,
> > David
> >
> > On 2/24/20, Bryan Cutler  wrote:
> > > While looking into Null type testing for ARROW-7899, a couple small
> issues
> > > came up regarding Flight integration testing with empty batches (row
> count
> > > == 0) that could be worked out with a quick discussion. It seems there
> is a
> > > small difference between the C++ and Java Flight servers when there are
> > > empty record batches at the end of a stream, more details in PR
> > > https://github.com/apache/arrow/pull/6476.
> > >
> > > The Java server sends all record batches, even the empty ones, and the
> test
> > > client verifies each of these batches matches the batches read from a
> JSON
> > > file. The C++ servers seems to recognize if the end of the stream is
> only
> > > empty batches (please correct me if I'm wrong) and will not serve them.
> > > This seems reasonable, as there is no more actual data left in the
> stream.
> > > The C++ test client reads all batches into a table, does the same for
> the
> > > JSON file, and compares final Tables. I also noticed that empty
> batches in
> > > the middle of the stream will be served.  My questions are:
> > >
> > > 1) What is the expected behavior of a Flight server for empty record
> > > batches, can they be ignored and not sent to the Client?
> > >
> > > 2) Is it good enough to test against a final concatenation of all
> batches
> > > in the stream or should each batch be verified individually to ensure
> the
> > > server is sending out correctly batched data?
> > >
> > > Thanks,
> > > Bryan
> > >
>


[jira] [Created] (ARROW-7966) [Integration][Flight][C++] Client should verify each batch independently

2020-02-28 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7966:
---

 Summary: [Integration][Flight][C++] Client should verify each 
batch independently
 Key: ARROW-7966
 URL: https://issues.apache.org/jira/browse/ARROW-7966
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Bryan Cutler


Currently the C++ Flight test client in {{test_integration_client.cc}} reads 
all batches from JSON into a Table, reads all batches in the flight stream from 
the server into a Table, then compares the Tables for equality.  This is 
potentially a problem because a record batch might have specific information 
that is then lost in the conversion to a Table. For example, if the server 
sends empty batches, the resulting Table would not be different from one with 
no empty batches.

Instead, the client should check each record batch from the JSON file against 
each record batch from the server independently. 
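A toy sketch of the two verification strategies, with plain Python lists standing in for record batches (each batch is a list of rows; the names are illustrative, not the actual integration-test code):

```python
def tables_equal(batches_a, batches_b):
    # Table-level check: concatenating drops batch boundaries, so a
    # trailing empty batch becomes invisible.
    def flatten(batches):
        return [row for batch in batches for row in batch]
    return flatten(batches_a) == flatten(batches_b)

def batches_equal(batches_a, batches_b):
    # Batch-by-batch check: empty batches must match up too.
    return len(batches_a) == len(batches_b) and all(
        a == b for a, b in zip(batches_a, batches_b)
    )
```

Under the first check `[[1, 2], []]` and `[[1, 2]]` compare equal; under the second they do not, which is exactly the difference this issue targets.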





[DISCUSS] Flight testing inconsistency for empty batches

2020-02-24 Thread Bryan Cutler
While looking into Null type testing for ARROW-7899, a couple small issues
came up regarding Flight integration testing with empty batches (row count
== 0) that could be worked out with a quick discussion. It seems there is a
small difference between the C++ and Java Flight servers when there are
empty record batches at the end of a stream, more details in PR
https://github.com/apache/arrow/pull/6476.

The Java server sends all record batches, even the empty ones, and the test
client verifies each of these batches matches the batches read from a JSON
file. The C++ servers seems to recognize if the end of the stream is only
empty batches (please correct me if I'm wrong) and will not serve them.
This seems reasonable, as there is no more actual data left in the stream.
The C++ test client reads all batches into a table, does the same for the
JSON file, and compares final Tables. I also noticed that empty batches in
the middle of the stream will be served.  My questions are:

1) What is the expected behavior of a Flight server for empty record
batches, can they be ignored and not sent to the Client?

2) Is it good enough to test against a final concatenation of all batches
in the stream or should each batch be verified individually to ensure the
server is sending out correctly batched data?

Thanks,
Bryan


[jira] [Created] (ARROW-7933) [Java][Flight][Tests] Add roundtrip tests for Java Flight Test Client

2020-02-24 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7933:
---

 Summary: [Java][Flight][Tests] Add roundtrip tests for Java Flight 
Test Client
 Key: ARROW-7933
 URL: https://issues.apache.org/jira/browse/ARROW-7933
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Reporter: Bryan Cutler


There should be some built-in roundtrip tests for Java Flight 
IntegrationTestClient





Re: PR Dashboard for Java?

2020-02-12 Thread Bryan Cutler
Works now, thanks! I added a page for Java open PRs
https://cwiki.apache.org/confluence/display/ARROW/Java+Open+Patches

On Tue, Feb 11, 2020 at 12:08 PM Wes McKinney  wrote:

> Weird. Try now
>
> On Tue, Feb 11, 2020 at 1:03 PM Bryan Cutler  wrote:
> >
> > Wes, it doesn't seem to have worked. Could you double check the
> privileges
> > for me (cutlerb)? I'd also like to add something to the verify release
> > candidate page. It's weird, I made an edit before on another page a while
> > ago, not sure what happened.  Thanks!
> >
> > On Mon, Jan 27, 2020 at 2:23 PM Wes McKinney 
> wrote:
> >
> > > Bryan -- I just gave you (cutlerb) Confluence edit privileges. These
> > > have to be explicitly managed on a per-user basis to avoid spam
> > > problems
> > >
> > > On Mon, Jan 27, 2020 at 4:12 PM Bryan Cutler 
> wrote:
> > > >
> > > > Thanks Neal, but it doesn't look like I have confluence privileges.
> > > That's
> > > > fine though, the github interface is easy enough.
> > > >
> > > > On Mon, Jan 27, 2020 at 11:59 AM Neal Richardson <
> > > > neal.p.richard...@gmail.com> wrote:
> > > >
> > > > > If you have confluence privileges, duplicate a page like
> > > > >
> https://cwiki.apache.org/confluence/display/ARROW/Ruby+JIRA+Dashboard
> > > and
> > > > > then edit the Jira query (something like status in open/in
> > > > > progress/reopened, labels = pull-request-available, component =
> java,
> > > > > project = ARROW) if you want to make it Java issues that have pull
> > > requests
> > > > > open.
> > > > >
> > > > > Or you could bookmark
> > > > >
> > > > >
> > >
> https://github.com/apache/arrow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+%22%5BJava%5D%22
> > > > > or https://github.com/apache/arrow/labels/lang-java
> > > > >
> > > > > Neal
> > > > >
> > > > > On Mon, Jan 27, 2020 at 11:26 AM Bryan Cutler 
> > > wrote:
> > > > >
> > > > > > I saw on Confluence that other Arrow components have PR
> dashboards,
> > > but I
> > > > > > don't see one for Java? I think it would be helpful, is it
> difficult
> > > to
> > > > > add
> > > > > > one for Java? I'm happy to do it if someone could point me in the
> > > right
> > > > > > direction. Thanks!
> > > > > >
> > > > > > Bryan
> > > > > >
> > > > >
> > >
>
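Pulling together the pieces of the query Neal outlines above, the dashboard's JIRA filter would look roughly like this (a sketch; exact field values may differ from the wiki page that was eventually created):

```
project = ARROW
  AND component = Java
  AND status in (Open, "In Progress", Reopened)
  AND labels = pull-request-available
```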


Re: PR Dashboard for Java?

2020-02-11 Thread Bryan Cutler
Wes, it doesn't seem to have worked. Could you double check the privileges
for me (cutlerb)? I'd also like to add something to the verify release
candidate page. It's weird, I made an edit before on another page a while
ago, not sure what happened.  Thanks!

On Mon, Jan 27, 2020 at 2:23 PM Wes McKinney  wrote:

> Bryan -- I just gave you (cutlerb) Confluence edit privileges. These
> have to be explicitly managed on a per-user basis to avoid spam
> problems
>
> On Mon, Jan 27, 2020 at 4:12 PM Bryan Cutler  wrote:
> >
> > Thanks Neal, but it doesn't look like I have confluence privileges.
> That's
> > fine though, the github interface is easy enough.
> >
> > On Mon, Jan 27, 2020 at 11:59 AM Neal Richardson <
> > neal.p.richard...@gmail.com> wrote:
> >
> > > If you have confluence privileges, duplicate a page like
> > > https://cwiki.apache.org/confluence/display/ARROW/Ruby+JIRA+Dashboard
> and
> > > then edit the Jira query (something like status in open/in
> > > progress/reopened, labels = pull-request-available, component = java,
> > > project = ARROW) if you want to make it Java issues that have pull
> requests
> > > open.
> > >
> > > Or you could bookmark
> > >
> > >
> https://github.com/apache/arrow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+%22%5BJava%5D%22
> > > or https://github.com/apache/arrow/labels/lang-java
> > >
> > > Neal
> > >
> > > On Mon, Jan 27, 2020 at 11:26 AM Bryan Cutler 
> wrote:
> > >
> > > > I saw on Confluence that other Arrow components have PR dashboards,
> but I
> > > > don't see one for Java? I think it would be helpful, is it difficult
> to
> > > add
> > > > one for Java? I'm happy to do it if someone could point me in the
> right
> > > > direction. Thanks!
> > > >
> > > > Bryan
> > > >
> > >
>


Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-04 Thread Bryan Cutler
+1

I had some trouble due to ARROW-7760 at first, but after applying the same
patch it passed. I ran the command:
TMPDIR=/tmp/arrow TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1
TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1
dev/release/verify-release-candidate.sh source 0.16.0 2

On Tue, Feb 4, 2020 at 10:08 AM Wes McKinney  wrote:

> +1 (binding)
>
> Some patches were required to the verification scripts but I have run:
>
> * Full source verification on Ubuntu 18.04
> * Linux binary verification
> * Source verification on Windows 10 (needed ARROW-6757)
> * Windows binary verification. Note that Python 3.8 wheel is broken
> (see ARROW-7755). Whoever uploads the wheels to PyPI _SHOULD NOT_
> upload this 3.8 wheel until we know what's wrong (if we upload a
> broken wheel then `pip install pyarrow==0.16.0` will be permanently
> broken on Windows/Python 3.8)
>
> On Mon, Feb 3, 2020 at 9:26 PM Francois Saint-Jacques
>  wrote:
> >
> > Tested on ubuntu 18.04 for the source release.
> >
> > On Mon, Feb 3, 2020 at 10:07 PM Francois Saint-Jacques
> >  wrote:
> > >
> > > +1
> > >
> > > Binaries verification didn't have any issues.
> > > Sources verification worked with some local environment hiccups
> > >
> > > François
> > >
> > > On Mon, Feb 3, 2020 at 8:46 PM Andy Grove 
> wrote:
> > > >
> > > > +1 (binding) based on running the Rust tests
> > > >
> > > > Thanks.
> > > >
> > > > On Thu, Jan 30, 2020 at 8:13 PM Krisztián Szűcs <
> szucs.kriszt...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I would like to propose the following release candidate (RC2) of
> Apache
> > > > > Arrow version 0.16.0. This is a release consisting of 728
> > > > > resolved JIRA issues[1].
> > > > >
> > > > > This release candidate is based on commit:
> > > > > 729a7689fd87572e6a14ad36f19cd579a8b8d9c5 [2]
> > > > >
> > > > > The source release rc2 is hosted at [3].
> > > > > The binary artifacts are hosted at [4][5][6][7].
> > > > > The changelog is located at [8].
> > > > >
> > > > > Please download, verify checksums and signatures, run the unit
> tests,
> > > > > and vote on the release. See [9] for how to validate a release
> candidate.
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this as Apache Arrow 0.16.0
> > > > > [ ] +0
> > > > > [ ] -1 Do not release this as Apache Arrow 0.16.0 because...
> > > > >
> > > > > [1]:
> > > > >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.16.0
> > > > > [2]:
> > > > >
> https://github.com/apache/arrow/tree/729a7689fd87572e6a14ad36f19cd579a8b8d9c5
> > > > > [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.16.0-rc2
> > > > > [4]: https://bintray.com/apache/arrow/centos-rc/0.16.0-rc2
> > > > > [5]: https://bintray.com/apache/arrow/debian-rc/0.16.0-rc2
> > > > > [6]: https://bintray.com/apache/arrow/python-rc/0.16.0-rc2
> > > > > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.16.0-rc2
> > > > > [8]:
> > > > >
> https://github.com/apache/arrow/blob/729a7689fd87572e6a14ad36f19cd579a8b8d9c5/CHANGELOG.md
> > > > > [9]:
> > > > >
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> > > > >
>


[jira] [Created] (ARROW-7770) [Release] Archery does not use correct integration test args

2020-02-04 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7770:
---

 Summary: [Release] Archery does not use correct integration test 
args
 Key: ARROW-7770
 URL: https://issues.apache.org/jira/browse/ARROW-7770
 Project: Apache Arrow
  Issue Type: Bug
  Components: Archery
Reporter: Bryan Cutler
Assignee: Bryan Cutler


When using the release verification script and selecting integration tests, Archery 
ignores the selected tests and runs all of them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Java] Issues with IntelliJ + errorprone + OpenJDK

2020-02-04 Thread Bryan Cutler
Here is where it turned up; it looks to be installed in the m2 repository:

bryan@lm-P50 ~ $ find ~/ -name "failureaccess-*.jar" -type f
/home/bryan/.m2/repository/com/google/guava/failureaccess/1.0.1/failureaccess-1.0.1.jar
/home/bryan/.IdeaIC2019.2/system/download-cache/error-prone/2.3.3/failureaccess-1.0.1.jar

Also error-prone jar is in the IntelliJ plugin directory
find ~/ -name "error-prone*.jar" -type f
/home/bryan/.IdeaIC2019.2/config/plugins/error-prone/lib/error-prone.jar
/home/bryan/.IdeaIC2019.2/config/plugins/error-prone/lib/jps/error-prone-jps-plugin.jar




On Tue, Feb 4, 2020 at 7:44 AM Andy Grove  wrote:

> Actually, central.maven.org doesn't even exist ...
>
> On Tue, Feb 4, 2020 at 8:28 AM Andy Grove  wrote:
>
> > Thanks for the help but I followed the same instructions and get this
> > error:
> >
> > Error:Failed to download error-prone compiler JARs: Failed to download '
> >
> http://central.maven.org/maven2/com/google/guava/failureaccess/1.0.1/failureaccess-1.0.1.jar
> > ':
> > central.maven.org
> >
> > The issue is that this maven central no longer supports http and requires
> > https. Maybe I could manually install this file somewhere? I did try
> > installing in my local m2 repo but that didn't work.
> >
> > If anyone could scan their local drive for this file and let me know
> where
> > it is installed that could unblock me.
> >
> > Thanks,
> >
> > Andy.
> >
> >
> >
> > On Mon, Feb 3, 2020 at 6:24 PM Fan Liya  wrote:
> >
> >> I was having the same problem, and it was solved by
> >>
> >> 1. Install the "Error Prone Compiler" plugin to intellij
> >> 2. setting "Settings/Build, Execution, Deployment/Compiler/Java
> >> Compiler/Use compiler" to "Javac with error-prone"
> >>
> >> I am using Intellij 2019.3 (Community Edition)
> >>
> >> Best,
> >> Liya Fan
> >>
> >> On Tue, Feb 4, 2020 at 7:25 AM Bryan Cutler  wrote:
> >>
> >> > Ahh, now that you sent that link it jogged my memory. A while ago I
> >> think I
> >> > did see that error and installed the error prone compiler plugin
> >> mentioned.
> >> > It worked after that I believe, but I am on IntelliJ 2019.2.4 on
> Ubuntu,
> >> and
> >> > it was a while ago so maybe something changed. If there is anything I
> >> can
> >> > check to help you out, let me know.
> >> >
> >> > On Mon, Feb 3, 2020 at 12:22 PM Andy Grove 
> >> wrote:
> >> >
> >> > > So it turns out there are specific instructions [1] for using
> >> errorprone
> >> > > with IntelliJ. Unfortunately, this doesn't work due to a bug in
> >> IntelliJ
> >> > > that was fixed a few days ago but not released yet [2].
> >> > >
> >> > > [1] https://errorprone.info/docs/installation
> >> > > [2]
> >> > >
> >> > >
> >> >
> >>
> https://intellij-support.jetbrains.com/hc/en-us/community/posts/360007052380-error-prone-compile-plugin-cant-download-jar
> >> > >
> >> > >
> >> > >
> >> > > On Mon, Feb 3, 2020 at 1:10 PM Andy Grove 
> >> wrote:
> >> > >
> >> > > > Hi Bryan,
> >> > > >
> >> > > > Yes, I tried opening as a Maven project and got the same error.
> I'm
> >> > using
> >> > > > OpenJDK 1.8.0_232 on both Ubuntu 19.04 and macOS 10.14.6 and get
> the
> >> > same
> >> > > > error on both. I'm using IntelliJ Ultimate 2019.3.2. Building from
> >> the
> >> > > > command line with Maven works fine.
> >> > > >
> >> > > > Very odd. I guess I'll do a little more research on errorprone.
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Andy.
> >> > > >
> >> > > >
> >> > > > On Mon, Feb 3, 2020 at 12:50 PM Bryan Cutler 
> >> > wrote:
> >> > > >
> >> > > >> Hi Andy,
> >> > > >> What is your JDK version? I haven't seen that exact error, did
> you
> >> > open
> >> > > >> Arrow as a Maven project in Intellij?
> >> > > >>
> >> > > >> On Mon, Feb 3, 2020 at 7:47 AM Andy Grove  >
> >> > > wrote:
> >> > >

Re: [Java] Issues with IntelliJ + errorprone + OpenJDK

2020-02-03 Thread Bryan Cutler
Ahh, now that you sent that link it jogged my memory. A while ago I think I
did see that error and installed the error prone compiler plugin mentioned.
It worked after that I believe, but I am on IntelliJ 2019.2.4 on Ubuntu, and
it was a while ago so maybe something changed. If there is anything I can
check to help you out, let me know.

On Mon, Feb 3, 2020 at 12:22 PM Andy Grove  wrote:

> So it turns out there are specific instructions [1] for using errorprone
> with IntelliJ. Unfortunately, this doesn't work due to a bug in IntelliJ
> that was fixed a few days ago but not released yet [2].
>
> [1] https://errorprone.info/docs/installation
> [2]
>
> https://intellij-support.jetbrains.com/hc/en-us/community/posts/360007052380-error-prone-compile-plugin-cant-download-jar
>
>
>
> On Mon, Feb 3, 2020 at 1:10 PM Andy Grove  wrote:
>
> > Hi Bryan,
> >
> > Yes, I tried opening as a Maven project and got the same error. I'm using
> > OpenJDK 1.8.0_232 on both Ubuntu 19.04 and macOS 10.14.6 and get the same
> > error on both. I'm using IntelliJ Ultimate 2019.3.2. Building from the
> > command line with Maven works fine.
> >
> > Very odd. I guess I'll do a little more research on errorprone.
> >
> > Thanks,
> >
> > Andy.
> >
> >
> > On Mon, Feb 3, 2020 at 12:50 PM Bryan Cutler  wrote:
> >
> >> Hi Andy,
> >> What is your JDK version? I haven't seen that exact error, did you open
> >> Arrow as a Maven project in Intellij?
> >>
> >> On Mon, Feb 3, 2020 at 7:47 AM Andy Grove 
> wrote:
> >>
> >> > I'm working on the Java codebase and cannot run code inside IntelliJ
> >> and it
> >> > looks like some kind of compatibility issue between errorprone and the
> >> JDK
> >> > that IntelliJ is using. I'm hoping other Java committers have found a
> >> > solution already to this?
> >> >
> >> > Error:java: java.lang.RuntimeException: java.lang.NoSuchMethodError:
> >> >
> >> >
> >>
> com.sun.tools.javac.util.JavacMessages.add(Lcom/sun/tools/javac/util/JavacMessages$ResourceBundleHelper;)V
> >> > Error:java: Caused by: java.lang.NoSuchMethodError:
> >> >
> >> >
> >>
> com.sun.tools.javac.util.JavacMessages.add(Lcom/sun/tools/javac/util/JavacMessages$ResourceBundleHelper;)V
> >> > Error:java: at
> >> >
> >> >
> >>
> com.google.errorprone.BaseErrorProneJavaCompiler.setupMessageBundle(BaseErrorProneJavaCompiler.java:202)
> >> > Error:java: at
> >> >
> >> >
> >>
> com.google.errorprone.ErrorProneJavacPlugin.init(ErrorProneJavacPlugin.java:40)
> >> >
> >>
> >
>


Re: [Java] Issues with IntelliJ + errorprone + OpenJDK

2020-02-03 Thread Bryan Cutler
Hi Andy,
What is your JDK version? I haven't seen that exact error, did you open
Arrow as a Maven project in Intellij?

On Mon, Feb 3, 2020 at 7:47 AM Andy Grove  wrote:

> I'm working on the Java codebase and cannot run code inside IntelliJ and it
> looks like some kind of compatibility issue between errorprone and the JDK
> that IntelliJ is using. I'm hoping other Java committers have found a
> solution already to this?
>
> Error:java: java.lang.RuntimeException: java.lang.NoSuchMethodError:
>
> com.sun.tools.javac.util.JavacMessages.add(Lcom/sun/tools/javac/util/JavacMessages$ResourceBundleHelper;)V
> Error:java: Caused by: java.lang.NoSuchMethodError:
>
> com.sun.tools.javac.util.JavacMessages.add(Lcom/sun/tools/javac/util/JavacMessages$ResourceBundleHelper;)V
> Error:java: at
>
> com.google.errorprone.BaseErrorProneJavaCompiler.setupMessageBundle(BaseErrorProneJavaCompiler.java:202)
> Error:java: at
>
> com.google.errorprone.ErrorProneJavacPlugin.init(ErrorProneJavacPlugin.java:40)
>


Re: [VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-29 Thread Bryan Cutler
An update on Spark integration tests: the new error looks to be a
regression, so I made https://issues.apache.org/jira/browse/ARROW-7723 and
marked it as a blocker. It's possible to work around this bug, so I wouldn't
call it a hard blocker if we need to proceed with the release.
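
Since the workaround isn't spelled out in the thread, here is a minimal, hedged sketch (standard library only) of one way to recover the wall-clock time from the raw int64 that the struct conversion currently emits, using the nanosecond value from the ARROW-7723 example. The variable names are illustrative, not Arrow API:

```python
from datetime import datetime, timezone

# 1580297927848944000 is the int64 that to_pandas currently produces for the
# tz-aware timestamp in the ARROW-7723 example. Reinterpret it as nanoseconds
# since the Unix epoch; divmod avoids the precision loss of a single float
# division at this magnitude.
ns = 1580297927848944000
seconds, rem_ns = divmod(ns, 1_000_000_000)
dt = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=rem_ns // 1000)
print(dt.isoformat())  # 2020-01-29T11:38:47.848944+00:00
```

pandas' `tz_convert` would then take this UTC value to the original `America/New_York` zone; the proper fix, of course, is for the conversion to produce tz-aware timestamps directly.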

On Wed, Jan 29, 2020 at 7:45 AM Neal Richardson 
wrote:

> The place where the segfault is triggered in the R nightlies is a couple of
> tests after the one I added in that patch. If that patch is causing the
> segfaults, we can skip the new test (
>
> https://github.com/apache/arrow/blob/master/r/tests/testthat/test-parquet.R#L125
> )
> and investigate later. The patch is exercising previously existing
> codepaths that were not tested, so I don't think that identifying and
> fixing the segfault should be release blocking (though we should clearly
> fix it).
>
> Neal
>
>
>
> On Wed, Jan 29, 2020 at 7:33 AM David Li  wrote:
>
> > The Flight leak should be unrelated to that commit, the failing test
> > already existed before - it's a flaky test
> > https://issues.apache.org/jira/browse/ARROW-7721.
> >
> > I'm hoping to look at the issue this week but we might just want to
> > disable the test for now.
> >
> > David
> >
> > On 1/29/20, Krisztián Szűcs  wrote:
> > > Hi,
> > >
> > > - The fuzzit builds has been disabled by Neal on the current master.
> > > - Created a PR to resolve occasionally failing python dataset tests [1]
> > > - Merged the fix for C# timezone error [2]
> > > - Merged various fixes for the release scripts.
> > > - The nightly Gandiva OS X build is still failing, but because of a
> > travis
> > >   deployment timeout, which shouldn't block the release.
> > >
> > > We still have failing tests:
> > > - failing Spark integration test
> > > - failing nightly R builds (see the 2020-01-29 nightly report)
> > > - master reports a Java flight memory leak
> > >
> > > Spark:
> > > Joris created a fix for the immediate issue [3], but now we have a
> > > different
> > > spark test error, see the discussion in the PR [3].
> > > I put up a PR [6] to check the regressions 0.16 arrow release would
> > > introduce
> > > for spark interoperability, and it turns out that arrow 0.15.1 is
> > > compatible
> > > with neither spark 2.4.4 nor spark 2.4.5-rc1, so 0.16 arrow release
> could
> > > only be compatible with spark 3.0 or spark master which we have tests
> > for.
> > > So I'm a bit confused about how to interpret arrow backward compatibility
> with
> > > Spark, and thus what should and what should not block the release.
> > > Either way we'll need to fix the remaining spark issues and add nightly
> > > spark
> > > integration tests for both the next spark release and spark master.
> > >
> > > R:
> > > There is the same segfault in each R nightly builds [4]. There was a
> > single
> > > change [5] which could introduce the regression compared to the
> previous
> > > builds.
> > > I've tried to reproduce the builds using docker-compose, but locally
> > > 3.6-bionic
> > > has passed for me. I'm trying to wipe my local cache and rerun to see
> > > whether I can reproduce it.
> > >
> > > Java/Flight leak:
> > > The current master reports memory leak [6] which I guess is surfaced by
> > > change [7]
> > >
> > > If we manage to fix the issues above today then I can cut RC2 tomorrow.
> > >
> > > Thanks, Krisztian
> > >
> > > [1]: https://github.com/apache/arrow/pull/6319
> > > [2]: https://github.com/apache/arrow/pull/6309
> > > [3]: https://github.com/apache/arrow/pull/6312
> > > [4]: https://github.com/ursa-labs/crossbow/branches/all?query=r-base
> > > [5]:
> > >
> >
> https://github.com/apache/arrow/commit/8b7911b086d120359e2000fbedb0c38c0f13f683
> > > [6]: https://github.com/apache/arrow/runs/415037585#step:5:1533
> > > [7]:
> > >
> >
> https://github.com/apache/arrow/commit/8b42288f58caa84a40bb7a13c1731ff919c934f2
> > >
> > > On Wed, Jan 29, 2020 at 11:06 AM Sutou Kouhei 
> > wrote:
> > >>
> > >> Hi,
> > >>
> > >> > Thank you.  After the C# download fix, I have the following C# test
> > >> > failure:
> > >> > https://gist.github.com/pitrou/d82ed1ff80db43b63f0c3d5e5f2474a4
> > >>
> > >> https://github.com/apache/arrow/pull/6309
> > >> will fix it.
> > >>
> > >> I think that this is a test problem, not an implementation
> > >> problem.
> > >>
> > >>
> > >> Workaround:
> > >>
> > >>   TZ=UTC dev/release/verify-release-candidate.sh ...
> > >>
> > >>
> > >> Thanks,
> > >> --
> > >> kou
> > >>
> > >> In <2bd07a17-600b-7f49-3ea1-a0b1acc91...@python.org>
> > >>   "Re: [VOTE] Release Apache Arrow 0.16.0 - RC1" on Wed, 29 Jan 2020
> > >> 10:11:46 +0100,
> > >>   Antoine Pitrou  wrote:
> > >>
> > >> >
> > >> > Thank you.  After the C# download fix, I have the following C# test
> > >> > failure:
> > >> > https://gist.github.com/pitrou/d82ed1ff80db43b63f0c3d5e5f2474a4
> > >> >
> > >> > Regards
> > >> >
> > >> > Antoine.
> > >> >
> > >> >
> > >> >
> > >> > Le 29/01/2020 à 00:42, Sutou Kouhei a écrit :
> > >> >> Hi,
> > >> >>
> > >> >>> Source 

[jira] [Created] (ARROW-7723) [Python] StructArray timestamp type with timezone to_pandas convert error

2020-01-29 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7723:
---

 Summary: [Python] StructArray timestamp type with timezone 
to_pandas convert error
 Key: ARROW-7723
 URL: https://issues.apache.org/jira/browse/ARROW-7723
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler


When a {{StructArray}} has a child that is a timestamp with a timezone, the 
{{to_pandas}} conversion outputs an int64 instead of a timestamp
{code:java}
In [1]: import pyarrow as pa 
   ...: import pandas as pd 
   ...: arr = pa.array([{'start': pd.Timestamp.now(), 'end': 
pd.Timestamp.now()}]) 
   ...: 
 

In [2]: arr.to_pandas() 
  
Out[2]: 
0{'end': 2020-01-29 11:38:02.792681, 'start': 2...
dtype: object

In [3]: ts = pd.Timestamp.now() 
 

In [4]: arr2 = pa.array([ts], type=pa.timestamp('us', tz='America/New_York'))   
 

In [5]: arr2.to_pandas()
  
Out[5]: 
0   2020-01-29 06:38:47.848944-05:00
dtype: datetime64[ns, America/New_York]

In [6]: arr = pa.StructArray.from_arrays([arr2, arr2], ['start', 'stop'])   
 

In [7]: arr.to_pandas() 
  
Out[7]: 
0{'start': 1580297927848944000, 'stop': 1580297...
dtype: object

{code}
from https://github.com/apache/arrow/pull/6312





Re: [VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-28 Thread Bryan Cutler
The nightly Spark integration build was failing because a test was renamed
recently in master. Once I fixed that and ran it again, this surfaced. I
remember it passed after the `split_blocks` change not too long ago, so all
this is pretty recent.

On Tue, Jan 28, 2020 at 2:46 PM Wes McKinney  wrote:

> Bryan -- was this tested somewhere that we missed (eg a nightly)?
>
> On Tue, Jan 28, 2020, 4:31 PM Bryan Cutler  wrote:
>
> > -1
> > There is a bug in Pandas conversion for timestamps that looks to be a
> > regression, https://issues.apache.org/jira/browse/ARROW-7709
> >
> > On Tue, Jan 28, 2020 at 11:30 AM Wes McKinney 
> wrote:
> >
> > > I opened https://issues.apache.org/jira/browse/ARROW-7708.
> > >
> > > On Tue, Jan 28, 2020 at 1:24 PM Wes McKinney 
> > wrote:
> > > >
> > > > Hi Gawain -- since PARQUET issues are attached to a different project
> > > and fix version these have to be extracted from the git changelog. We
> can
> > > alter our scripts to scrape these commits from the git log output.
> > > >
> > > > On Tue, Jan 28, 2020, 1:06 PM Gawain Bolton 
> > > wrote:
> > > >>
> > > >> Hello,
> > > >>
> > > >> It would seem that the list of issues does not include any of the
> > issues
> > > >> in the Parquet project which were fixed in this release.
> > > >>
> > > >> Cheers,
> > > >>
> > > >> Gawain
> > > >>
> > > >> On 28/01/2020 11:46, Krisztián Szűcs wrote:
> > > >> > Sorry, the previous email is hardly readable.
> > > >> >
> > > >> > I would like to propose the following release candidate (RC1) of
> > > Apache
> > > >> > Arrow version 0.16.0. This is a release consisting of 710 resolved
> > > JIRA
> > > >> > issues[1].
> > > >> >
> > > >> > This release candidate is based on commit:
> > > >> > 188afde1f4298fb668e8ebadeacbc545e2de086f [2]
> > > >> >
> > > >> > The source release rc1 is hosted at [3].
> > > >> > The binary artifacts are hosted at [4][5][6][7].
> > > >> > The changelog is located at [8].
> > > >> >
> > > >> > Please download, verify checksums and signatures, run the unit
> > tests,
> > > >> > and vote on the release. See [9] for how to validate a release
> > > candidate.
> > > >> >
> > > >> > The vote will be open for at least 72 hours.
> > > >> >
> > > >> > [ ] +1 Release this as Apache Arrow 0.16.0
> > > >> > [ ] +0
> > > >> > [ ] -1 Do not release this as Apache Arrow 0.16.0 because...
> > > >> >
> > > >> > [1]:
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.16.0
> > > >> > [2]:
> > >
> >
> https://github.com/apache/arrow/tree/188afde1f4298fb668e8ebadeacbc545e2de086f
> > > >> > [3]:
> > > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.16.0-rc1
> > > >> > [4]: https://bintray.com/apache/arrow/centos-rc/0.16.0-rc1
> > > >> > [5]: https://bintray.com/apache/arrow/debian-rc/0.16.0-rc1
> > > >> > [6]: https://bintray.com/apache/arrow/python-rc/0.16.0-rc1
> > > >> > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.16.0-rc1
> > > >> > [8]:
> > >
> >
> https://github.com/apache/arrow/blob/188afde1f4298fb668e8ebadeacbc545e2de086f/CHANGELOG.md
> > > >> > [9]:
> > >
> >
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> > > >> >
> > > >> > On Tue, Jan 28, 2020 at 11:43 AM Krisztián Szűcs
> > > >> >  wrote:
> > > >> >> Hi,
> > > >> >>
> > > >> >> I would like to propose the following release candidate (RC1) of
> > > Apache
> > > >> >> Arrow version 0.16.0. This is a release consisting of 710
> > > >> >> resolved JIRA issues[1].
> > > >> >>
> > > >> >> This release candidate is based on commit:
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> 188afde1f4298fb668e8eb

Re: [VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-28 Thread Bryan Cutler
-1
There is a bug in Pandas conversion for timestamps that looks to be a
regression, https://issues.apache.org/jira/browse/ARROW-7709

On Tue, Jan 28, 2020 at 11:30 AM Wes McKinney  wrote:

> I opened https://issues.apache.org/jira/browse/ARROW-7708.
>
> On Tue, Jan 28, 2020 at 1:24 PM Wes McKinney  wrote:
> >
> > Hi Gawain -- since PARQUET issues are attached to a different project
> and fix version these have to be extracted from the git changelog. We can
> alter our scripts to scrape these commits from the git log output.
> >
> > On Tue, Jan 28, 2020, 1:06 PM Gawain Bolton 
> wrote:
> >>
> >> Hello,
> >>
> >> It would seem that the list of issues does not include any of the issues
> >> in the Parquet project which were fixed in this release.
> >>
> >> Cheers,
> >>
> >> Gawain
> >>
> >> On 28/01/2020 11:46, Krisztián Szűcs wrote:
> >> > Sorry, the previous email is hardly readable.
> >> >
> >> > I would like to propose the following release candidate (RC1) of
> Apache
> >> > Arrow version 0.16.0. This is a release consisting of 710 resolved
> JIRA
> >> > issues[1].
> >> >
> >> > This release candidate is based on commit:
> >> > 188afde1f4298fb668e8ebadeacbc545e2de086f [2]
> >> >
> >> > The source release rc1 is hosted at [3].
> >> > The binary artifacts are hosted at [4][5][6][7].
> >> > The changelog is located at [8].
> >> >
> >> > Please download, verify checksums and signatures, run the unit tests,
> >> > and vote on the release. See [9] for how to validate a release
> candidate.
> >> >
> >> > The vote will be open for at least 72 hours.
> >> >
> >> > [ ] +1 Release this as Apache Arrow 0.16.0
> >> > [ ] +0
> >> > [ ] -1 Do not release this as Apache Arrow 0.16.0 because...
> >> >
> >> > [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.16.0
> >> > [2]:
> https://github.com/apache/arrow/tree/188afde1f4298fb668e8ebadeacbc545e2de086f
> >> > [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.16.0-rc1
> >> > [4]: https://bintray.com/apache/arrow/centos-rc/0.16.0-rc1
> >> > [5]: https://bintray.com/apache/arrow/debian-rc/0.16.0-rc1
> >> > [6]: https://bintray.com/apache/arrow/python-rc/0.16.0-rc1
> >> > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.16.0-rc1
> >> > [8]:
> https://github.com/apache/arrow/blob/188afde1f4298fb668e8ebadeacbc545e2de086f/CHANGELOG.md
> >> > [9]:
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> >> >
> >> > On Tue, Jan 28, 2020 at 11:43 AM Krisztián Szűcs
> >> >  wrote:
> >> >> Hi,
> >> >>
> >> >> I would like to propose the following release candidate (RC1) of
> Apache
> >> >> Arrow version 0.16.0. This is a release consisting of 710
> >> >> resolved JIRA issues[1].
> >> >>
> >> >> This release candidate is based on commit:
> >> >>
> >> >>
> >> >>
> >> >> 188afde1f4298fb668e8ebadeacbc545e2de086f [2]
> >> >>
> >> >>
> >> >>
> >> >>   The source release rc1 is
> >> >> hosted at [3].
> >> >> The binary artifacts are hosted at [4][5][6][7].
> >> >>
> >> >>
> >> >>   The changelog is located at
> >> >> [8].
> >> >>
> >> >>
> >> >>
> >> >>   Please download, verify
> >> >> checksums and signatures, run the unit tests,
> >> >> and vote on the release. See [9] for how to validate a release
> >> >> candidate.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> The vote will be open for at least 72 hours.
> >> >>
> >> >>
> >> >>
> >> >>   [ ] +1 Release this as
> Apache
> >> >> Arrow 0.16.0
> >> >> [ ] +0
> >> >>
> >> >>
> >> >>   [ ] -1 Do not release this
> as
> >> >> Apache Arrow 0.16.0 because...
> >> >>
> >> >> [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.16.0
> >> >> [2]:
> https://github.com/apache/arrow/tree/188afde1f4298fb668e8ebadeacbc545e2de086f
> >> >> [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.16.0-rc1
> >> >> [4]: https://bintray.com/apache/arrow/centos-rc/0.16.0-rc1
> >> >> [5]: https://bintray.com/apache/arrow/debian-rc/0.16.0-rc1
> >> >> [6]: https://bintray.com/apache/arrow/python-rc/0.16.0-rc1
> >> >> [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.16.0-rc1
> >> >> [8]:
> https://github.com/apache/arrow/blob/188afde1f4298fb668e8ebadeacbc545e2de086f/CHANGELOG.md
> >> >> [9]:
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>


[jira] [Created] (ARROW-7709) [Python] Conversion from Table Column to Pandas loses name for Timestamps

2020-01-28 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7709:
---

 Summary: [Python] Conversion from Table Column to Pandas loses 
name for Timestamps
 Key: ARROW-7709
 URL: https://issues.apache.org/jira/browse/ARROW-7709
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler


When converting a Table timestamp column to Pandas, the name of the column is 
lost in the resulting series.
{code:java}
In [23]: a1 = pa.array([pd.Timestamp.now()])
 

In [24]: a2 = pa.array([1]) 
 

In [25]: t = pa.Table.from_arrays([a1, a2], ['ts', 'a'])
 

In [26]: for c in t: 
...: print(c.to_pandas()) 
...:
 
0   2020-01-28 13:17:26.738708
dtype: datetime64[ns]
01
Name: a, dtype: int64 {code}
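
A quick pandas-only sketch of the symptom and a stopgap (restoring the name by hand; the `table.schema[i].name` reference in the comment is an assumption about how one might wire this up with the pyarrow objects from the example above, which are not defined here):

```python
import pandas as pd

# Symptom from the example above: the Series produced for a timestamp
# column comes back unnamed.
s = pd.Series(pd.to_datetime(['2020-01-28 13:17:26.738708']))
assert s.name is None

# Stopgap sketch: set the name yourself, e.g. from the Arrow schema
# (series.name = table.schema[i].name with the pyarrow table above).
s.name = 'ts'
print(s.name)  # ts
```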





Re: PR Dashboard for Java?

2020-01-27 Thread Bryan Cutler
Thanks Neal, but it doesn't look like I have Confluence privileges. That's
fine, though; the GitHub interface is easy enough.

On Mon, Jan 27, 2020 at 11:59 AM Neal Richardson <
neal.p.richard...@gmail.com> wrote:

> If you have confluence privileges, duplicate a page like
> https://cwiki.apache.org/confluence/display/ARROW/Ruby+JIRA+Dashboard and
> then edit the Jira query (something like status in open/in
> progress/reopened, labels = pull-request-available, component = java,
> project = ARROW) if you want to make it Java issues that have pull requests
> open.
>
> Or you could bookmark
>
> https://github.com/apache/arrow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+%22%5BJava%5D%22
> or https://github.com/apache/arrow/labels/lang-java
>
> Neal
>
> On Mon, Jan 27, 2020 at 11:26 AM Bryan Cutler  wrote:
>
> > I saw on Confluence that other Arrow components have PR dashboards, but I
> > don't see one for Java? I think it would be helpful, is it difficult to
> add
> > one for Java? I'm happy to do it if someone could point me in the right
> > direction. Thanks!
> >
> > Bryan
> >
>


Re: [DISCUSS][JAVA] Correct the behavior of ListVector isEmpty

2020-01-27 Thread Bryan Cutler
Returning null might be more correct, since `getObject(int index)` also
returns a null value if not set, but I don't think it's worth making a more
complicated API for this. It should be fine to return `false` for a null
value.
+1 for treating nulls as empty.
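
For concreteness, the two candidate semantics can be sketched with plain Python lists standing in for the vector's contents (illustrative only, not the Arrow Java API):

```python
# The ListVector from Ji Liu's example: [1,2], null, [], [5,6].
values = [[1, 2], None, [], [5, 6]]

# Option A: nulls treated as empty (the behavior +1'd above).
nulls_as_empty = [v is None or len(v) == 0 for v in values]
print(nulls_as_empty)  # [False, True, True, False]

# Option B: nulls treated as non-empty.
nulls_as_nonempty = [v is not None and len(v) == 0 for v in values]
print(nulls_as_nonempty)  # [False, False, True, False]
```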

On Fri, Jan 24, 2020 at 9:12 AM Brian Hulette  wrote:

> What about returning null for a null list? It looks like now the function
> returns a primitive boolean, so I guess that would be a substantial change,
> but null seems more correct to me.
>
> On Thu, Jan 23, 2020, 21:38 Micah Kornfield  wrote:
>
> >  I would vote for treating nulls as empty.
> >
> > On Fri, Jan 10, 2020 at 12:36 AM Ji Liu 
> > wrote:
> >
> > > Hi all,
> > >
> > > Currently isEmpty API is always return false in
> BaseRepeatedValueVector,
> > > and its subclass ListVector did not overwrite this method.
> > > This will lead to incorrect results; for example, a ListVector with data
> > > [1,2], null, [], [5,6] would get [false, false, false, false] which is
> > not
> > > right.
> > > I opened a PR to fix this[1] and not sure what’s the right behavior for
> > > null value, should it return [false, false, true, false] or [false,
> true,
> > > true, false] ?
> > >
> > >
> > > Thanks,
> > > Ji Liu
> > >
> > >
> > > [1] https://github.com/apache/arrow/pull/6044
> > >
> > >
> >
>


PR Dashboard for Java?

2020-01-27 Thread Bryan Cutler
I saw on Confluence that other Arrow components have PR dashboards, but I
don't see one for Java? I think it would be helpful, is it difficult to add
one for Java? I'm happy to do it if someone could point me in the right
direction. Thanks!

Bryan


[jira] [Created] (ARROW-7693) [CI] Fix test-conda-python-3.7-spark-master nightly errors

2020-01-27 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7693:
---

 Summary: [CI] Fix test-conda-python-3.7-spark-master nightly errors
 Key: ARROW-7693
 URL: https://issues.apache.org/jira/browse/ARROW-7693
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Spark master renamed some tests; the nightly integration test needs to be updated.





Re: [Java] PR Reviewers

2020-01-27 Thread Bryan Cutler
Hi Micah, I don't have a ton of bandwidth at the moment, but I'll try to
review some more PRs. Anyone, please feel free to ping me too if you have a
stale PR that needs some help getting through. Outreach to other Java
communities sounds like a good idea - more Java users would definitely be a
good thing!

Bryan

On Mon, Jan 27, 2020 at 8:12 AM Andy Grove  wrote:

> I've now started working with the Java implementation of Arrow,
> specifically Flight, and would be happy to help although I do have limited
> time each week. I can at least review from a Java correctness point of
> view.
>
> Andy.
>
> On Thu, Jan 23, 2020 at 9:41 PM Micah Kornfield 
> wrote:
>
> > I mentioned this elsewhere but my intent is to stop doing java reviews
> for
> > the immediate future once I wrap up the few that I have requested change
> > on.
> >
> > I'm happy to try to triage incoming Java PRs, but in order to do this, I
> > need to know which committers have some bandwidth to do reviews (some of
> > the existing PRs I've tagged people who never responded).
> >
> > Thanks,
> > Micah
> >
>


Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

2020-01-24 Thread Bryan Cutler
Thanks Joris for clearing that up! It's correct that pyspark will allow the
user to do operations on the resulting DataFrame, so it doesn't sound like
I should set `split_blocks=True` in the conversion. You're right that the
unnecessary assignments can be easily avoided if not timestamps, so that
will be a big help. I'll link this discussion to the JIRA in case it could
help others. Thanks again.

Bryan

On Fri, Jan 24, 2020 at 2:10 AM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> Hi Bryan,
>
> For the case that the column is no timestamp and was not modified: I don't
> think it will take copies of the full dataframe by assigning columns in a
> loop like that. But it is still doing work (it will copy data for that
> column into the array holding those data for 2D blocks), and which can
> easily be avoided I think by only assigning back when the column was
> actually modified (eg by moving the is_datetime64tz_dtype inline in the
> loop iterating through all columns, so you can only write back if actually
> having tz-aware data).
>
> Further, even if you do the above to avoid writing back to the dataframe
> when not needed, I am not sure you should directly try to use the new
> zero-copy feature of the Table.to_pandas conversion (with
> split_blocks=True). It depends very much on what further happens with the
> converted dataframe. Once you do some operations in pandas, those splitted
> blocks will get combined (resulting in a memory copy then), and it also
> means you can't modify the dataframe (if this dataframe is used in python
> UDFs, it might limit what can be done in those UDFs. Just guessing here, I
> don't know the pyspark code well enough).
>
> Joris
>
>
> On Thu, 23 Jan 2020 at 21:03, Bryan Cutler  wrote:
>
> > Thanks for investigating this and the quick fix Joris and Wes!  I just
> have
> > a couple questions about the behavior observed here.  The pyspark code
> > assigns either the same series back to the pandas.DataFrame or makes some
> > modifications if it is a timestamp. In the case there are no timestamps,
> is
> > this potentially making extra copies or will it be unable to take
> advantage
> > of new zero-copy features in pyarrow? For the case of having timestamp
> > columns that need to be modified, is there a more efficient way to
> create a
> > new dataframe with only copies of the modified series?  Thanks!
> >
> > Bryan
> >
> > On Thu, Jan 16, 2020 at 11:48 PM Joris Van den Bossche <
> > jorisvandenboss...@gmail.com> wrote:
> >
> > > That sounds like a good solution. Having the zero-copy behavior
> depending
> > > on whether you have only 1 column of a certain type or not, might lead
> to
> > > surprising results. To avoid yet another keyword, only doing it when
> > > split_blocks=True sounds good to me (in practice, that's also when it
> > will
> > > happen mostly, except for very narrow dataframes with only few
> columns).
> > >
> > > Joris
> > >
> > > On Thu, 16 Jan 2020 at 22:44, Wes McKinney 
> wrote:
> > >
> > > > hi Joris,
> > > >
> > > > Thanks for investigating this. It seems there were some unintended
> > > > consequences of the zero-copy optimizations from ARROW-3789. Another
> > > > way forward might be to "opt in" to this behavior, or to only do the
> > > > zero copy optimizations when split_blocks=True. What do you think?
> > > >
> > > > - Wes
> > > >
> > > > On Thu, Jan 16, 2020 at 3:42 AM Joris Van den Bossche
> > > >  wrote:
> > > > >
> > > > > So the spark integration build started to fail, and with the
> > following
> > > > test
> > > > > error:
> > > > >
> > > > >
> > ==
> > > > > ERROR: test_toPandas_batch_order
> > > > > (pyspark.sql.tests.test_arrow.EncryptionArrowTests)
> > > > >
> > --
> > > > > Traceback (most recent call last):
> > > > >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 422,
> in
> > > > > test_toPandas_batch_order
> > > > > run_test(*case)
> > > > >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 409,
> in
> > > > run_test
> > > > > pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
> > > > >   File "/spark/python/pyspark/sql/tests/test_
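
The conditional write-back Joris recommends can be sketched in plain pandas (a minimal, hypothetical sketch: `localize_timestamps` here stands in for pyspark's `_check_dataframe_localize_timestamps`, and only tz-aware columns are assigned back into the DataFrame):

```python
import pandas as pd
from pandas.api.types import is_datetime64tz_dtype

def localize_timestamps(pdf, timezone):
    """Write a column back into the DataFrame only when it was actually
    modified (tz-aware), leaving all other columns untouched."""
    for column in pdf.columns:
        series = pdf[column]
        if is_datetime64tz_dtype(series.dtype):
            # Only tz-aware columns are converted and reassigned.
            pdf[column] = series.dt.tz_convert(timezone).dt.tz_localize(None)
    return pdf

pdf = pd.DataFrame({
    "a": [1, 2],
    "t": pd.to_datetime(["2020-01-01", "2020-01-02"]).tz_localize("UTC"),
})
out = localize_timestamps(pdf, "UTC")
```

Because the non-timestamp column "a" is never reassigned, the loop does no work for it, which is the optimization discussed above.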

Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

2020-01-23 Thread Bryan Cutler
Thanks for investigating this and the quick fix Joris and Wes!  I just have
a couple questions about the behavior observed here.  The pyspark code
assigns either the same series back to the pandas.DataFrame or makes some
modifications if it is a timestamp. In the case there are no timestamps, is
this potentially making extra copies or will it be unable to take advantage
of new zero-copy features in pyarrow? For the case of having timestamp
columns that need to be modified, is there a more efficient way to create a
new dataframe with only copies of the modified series?  Thanks!

Bryan

On Thu, Jan 16, 2020 at 11:48 PM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> That sounds like a good solution. Having the zero-copy behavior depending
> on whether you have only 1 column of a certain type or not, might lead to
> surprising results. To avoid yet another keyword, only doing it when
> split_blocks=True sounds good to me (in practice, that's also when it will
> happen mostly, except for very narrow dataframes with only few columns).
>
> Joris
>
> On Thu, 16 Jan 2020 at 22:44, Wes McKinney  wrote:
>
> > hi Joris,
> >
> > Thanks for investigating this. It seems there were some unintended
> > consequences of the zero-copy optimizations from ARROW-3789. Another
> > way forward might be to "opt in" to this behavior, or to only do the
> > zero copy optimizations when split_blocks=True. What do you think?
> >
> > - Wes
> >
> > On Thu, Jan 16, 2020 at 3:42 AM Joris Van den Bossche
> >  wrote:
> > >
> > > So the spark integration build started to fail, and with the following
> > test
> > > error:
> > >
> > > ==
> > > ERROR: test_toPandas_batch_order
> > > (pyspark.sql.tests.test_arrow.EncryptionArrowTests)
> > > --
> > > Traceback (most recent call last):
> > >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 422, in
> > > test_toPandas_batch_order
> > > run_test(*case)
> > >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 409, in
> > run_test
> > > pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
> > >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 152, in
> > > _toPandas_arrow_toggle
> > > pdf_arrow = df.toPandas()
> > >   File "/spark/python/pyspark/sql/pandas/conversion.py", line 115, in
> > toPandas
> > > return _check_dataframe_localize_timestamps(pdf, timezone)
> > >   File "/spark/python/pyspark/sql/pandas/types.py", line 180, in
> > > _check_dataframe_localize_timestamps
> > > pdf[column] = _check_series_localize_timestamps(series, timezone)
> > >   File
> > "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py",
> > > line 3487, in __setitem__
> > > self._set_item(key, value)
> > >   File
> > "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py",
> > > line 3565, in _set_item
> > > NDFrame._set_item(self, key, value)
> > >   File
> >
> "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/generic.py",
> > > line 3381, in _set_item
> > > self._data.set(key, value)
> > >   File
> >
> "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/managers.py",
> > > line 1090, in set
> > > blk.set(blk_locs, value_getitem(val_locs))
> > >   File
> >
> "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/blocks.py",
> > > line 380, in set
> > > self.values[locs] = values
> > > ValueError: assignment destination is read-only
> > >
> > >
> > > It's from a test that is doing conversions from spark to arrow to
> pandas
> > > (so calling pyarrow.Table.to_pandas here
> > > <
> >
> https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/conversion.py#L111-L115
> > >),
> > > and on the resulting DataFrame, it is iterating through all columns,
> > > potentially fixing timezones, and writing each column back into the
> > > DataFrame (here
> > > <
> >
> https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/types.py#L179-L181
> > >
> > > ).
> > >
> > > Since it is giving an error about read-only, it might be related to
> > > zero-copy behaviour of to_pandas, and thus might be related to the
> > refactor
> > > of the arrow->pandas conversion that landed yesterday (
> > > https://github.com/apache/arrow/pull/6067, it says it changed to do
> > > zero-copy for 1-column blocks if possible).
> > > I am not sure if something should be fixed in pyarrow for this, but the
> > > obvious thing that pyspark can do is specify they don't want zero-copy.
> > >
> > > Joris
> > >
> > > On Wed, 15 Jan 2020 at 14:32, Crossbow  wrote:
> > >
> >
>


Re: Looking to 1.0

2020-01-06 Thread Bryan Cutler
I agree on a 0.16.0 release. In the meantime I'll try to help out with
getting the Java side ready for 1.0.

On Sat, Jan 4, 2020 at 7:21 PM Fan Liya  wrote:

> Hi Jacques,
>
> ARROW-4526 is interesting. I would like to try to resolve it.
> Thanks a lot for the information.
>
> Best,
> Liya Fan
>
>
> On Sun, Jan 5, 2020 at 6:14 AM Jacques Nadeau  wrote:
>
> > The third ticket I was commenting on was ARROW-4526.
> >
> > Fan, do you want to take a shot at that one?
> >
> > On Fri, Jan 3, 2020 at 8:16 PM Fan Liya  wrote:
> >
> > >   Hi Jacques,
> > >
> > > I am interested in the issues, and if it is possible, I would like to
> try
> > > to resolve them.
> > >
> > > Thanks.
> > >
> > > Liya Fan
> > >
> > > On Sat, Jan 4, 2020 at 7:16 AM Jacques Nadeau 
> > wrote:
> > >
> > > > I identified three things in the java library that I think are top of
> > > mind
> > > > and should be fixed before 1.0 to avoid weird incompatibility changes
> > in
> > > > the java apis (technical debt). I've tagged them as pre-1.0 as I
> don't
> > > > exactly see what is the right way to tag/label a target release for a
> > > > ticket.
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/ARROW-7495?jql=labels%20%3D%20pre-1.0
> > > >
> > > > For the three tickets I identified, does anyone have interest in
> trying
> > > to
> > > > resolve?
> > > >
> > > > thanks,
> > > > Jacques
> > > >
> > > >
> > > >
> > > > On Thu, Jan 2, 2020 at 11:55 AM Neal Richardson <
> > > > neal.p.richard...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > > Happy new year! As we look ahead to 2020, it's time to start
> > mobilizing
> > > > for
> > > > > the Arrow 1.0 release. At 0.15, I believe we decided that our next
> > > > release
> > > > > should be 1.0, and it's been a couple of months since 0.15, so
> we're
> > > due
> > > > to
> > > > > release again this month, give or take. (See [1] for when we most
> > > > recently
> > > > > discussed doing 1.0 back in June, or if you're a fan of ancient
> > > history,
> > > > > see [2] for a similar discussion from July 2017.)
> > > > >
> > > > > Since there appeared to be consensus before that it is time for
> 1.0,
> > > > let's
> > > > > discuss how to get it done. One first step would be to make sure
> that
> > > > we've
> > > > > identified all format/specification issues we think we must resolve
> > > > before
> > > > > declaring 1.0. [3] shows 3 "blockers" for the 1.0 release already.
> > > There
> > > > > are an additional 14 "Format" issues ([4]); perhaps some of those
> > > should
> > > > > also be labeled blockers for 1.0.
> > > > >
> > > > > It would be great if folks could review Jira in their areas of
> > > expertise
> > > > > and make sure everything essential for 1.0 is ticketed and
> > prioritized
> > > > > appropriately. Once we've identified the required tasks for making
> a
> > > 1.0
> > > > > release, we can work together on burning those down.
> > > > >
> > > > > Neal
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/44a7a3d256ab5dbd62da6fe45b56951b435697426bf4adedb6520907@%3Cdev.arrow.apache.org%3E
> > > > >
> > > > > [2]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/0aca401e8906e1adbb37228b38569a9a7736b864da854007dad111c3%40%3Cdev.arrow.apache.org%3E
> > > > > [3]:
> > > >
> https://cwiki.apache.org/confluence/display/ARROW/Arrow+1.0.0+Release
> > > > > [4]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(%22In%20Review%22%2C%20Open%2C%20%22In%20Progress%22)%20AND%20fixVersion%20%3D%201.0.0%20AND%20component%20%3D%20Format
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (ARROW-7502) [Integration] Remove Spark Integration patch that is not needed anymore

2020-01-06 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7502:
---

 Summary: [Integration] Remove Spark Integration patch that is not 
needed anymore
 Key: ARROW-7502
 URL: https://issues.apache.org/jira/browse/ARROW-7502
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Apache Spark master has been updated to work with Arrow 0.15.1 after the binary 
protocol change, so patching Spark master is no longer necessary to build with 
current Arrow and the previous patch can be removed.





Re: [Discuss][Java] Provide default for io.netty.tryReflectionSetAccessible to prevent errors

2019-11-20 Thread Bryan Cutler
I'm not sure what the best way to handle this is.  Ideally we would use an
alternative that doesn't require setting a property, but I don't know Netty
well enough to make a recommendation. I also want to be careful not to
introduce anything that would hurt performance or cause any other side
effects. I made https://issues.apache.org/jira/browse/ARROW-7223 to track
this, we can continue the discussion there and I will try to do some
research into possible solutions.

On Wed, Nov 20, 2019 at 2:51 AM Fan Liya  wrote:

> Hi Bryan,
>
> Thanks for bringing this up.
> +1 for the change.
>
> I am not clear what is the right place to override the jvm property.
> It is possible that when we try to override it (possibly in a static
> block), the old property value has already been read by netty library.
> To avoid this problem, do we need to control the order of class loading?
>
> Best,
> Liya Fan
>
> On Mon, Nov 18, 2019 at 3:17 PM Micah Kornfield 
> wrote:
>
> > This sounds reasonable to me.  At this point I think having our consumers
> > have a better experience is more important then library purity concerns
> > I've had in the past.
> >
> > Do we need to handle jdk8 as a special case?  Do you think it pays to try
> > to find an alternate library that doesn't require special flags for
> > whatever we are using this functionality for?
> >
> > Thanks,
> > Micah
> >
> > On Sunday, November 17, 2019, Bryan Cutler  wrote:
> >
> > > After ARROW-3191 [1], consumers of Arrow Java with a JDK 9 and above
> are
> > > required to set the JVM property "io.netty.tryReflectionSetAccessible=
> > > true"
> > > at startup, each time Arrow code is run, as documented at [2]. Not
> doing
> > > this will result in the error "java.lang.UnsupportedOperationException:
> > > sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not
> available".
> > > This is due to a part of the Netty codebase, and I'm not sure if there is
> > any
> > > way around it, but I don't think it's the correct behavior for Arrow
> > > out-of-the-box to fail with a 3rd party error by default. This has come
> > up
> > > before in our own unit testing [3], and most recently when upgrading
> > Arrow
> > > in Spark [4].
> > >
> > > I'd like to propose that Arrow Java change to the following behavior:
> > >
> > > 1) check to see if the property io.netty.tryReflectionSetAccessible has
> > > been set
> > > 2) if not set, automatically set to "true"
> > > 3) else if set to "false", catch the Netty error and prepend the error
> > > message with the suggested setting of "true"
> > >
> > > What do other devs think?
> > >
> > > [1] https://issues.apache.org/jira/browse/ARROW-3191
> > > [2] https://github.com/apache/arrow/tree/master/java#java-properties
> > > [3] https://issues.apache.org/jira/browse/ARROW-5412
> > > [4] https://github.com/apache/spark/pull/26552
> > >
> >
>


[jira] [Created] (ARROW-7223) [Java] Provide default setting of io.netty.tryReflectionSetAccessible=true

2019-11-20 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7223:
---

 Summary: [Java] Provide default setting of 
io.netty.tryReflectionSetAccessible=true
 Key: ARROW-7223
 URL: https://issues.apache.org/jira/browse/ARROW-7223
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler


After ARROW-3191, consumers of Arrow Java with a JDK 9 and above are required 
to set the JVM property "io.netty.tryReflectionSetAccessible=true" at startup, 
each time Arrow code is run, as documented at 
https://github.com/apache/arrow/tree/master/java#java-properties. Not doing 
this will result in the error "java.lang.UnsupportedOperationException: 
sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available", making 
Arrow unusable out-of-the-box.

This proposes to automatically set the property if not already set in the 
following steps:

1) check to see if the property io.netty.tryReflectionSetAccessible has been set
2) if not set, automatically set to "true"
3) else if set to "false", catch the Netty error and prepend the error message 
with the suggested setting of "true"





[Discuss][Java] Provide default for io.netty.tryReflectionSetAccessible to prevent errors

2019-11-17 Thread Bryan Cutler
After ARROW-3191 [1], consumers of Arrow Java with a JDK 9 and above are
required to set the JVM property "io.netty.tryReflectionSetAccessible=true"
at startup, each time Arrow code is run, as documented at [2]. Not doing
this will result in the error "java.lang.UnsupportedOperationException:
sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available".
This is due to a part of the Netty codebase, and I'm not sure if there is any
way around it, but I don't think it's the correct behavior for Arrow
out-of-the-box to fail with a 3rd party error by default. This has come up
before in our own unit testing [3], and most recently when upgrading Arrow
in Spark [4].

I'd like to propose that Arrow Java change to the following behavior:

1) check to see if the property io.netty.tryReflectionSetAccessible has
been set
2) if not set, automatically set to "true"
3) else if set to "false", catch the Netty error and prepend the error
message with the suggested setting of "true"

What do other devs think?

[1] https://issues.apache.org/jira/browse/ARROW-3191
[2] https://github.com/apache/arrow/tree/master/java#java-properties
[3] https://issues.apache.org/jira/browse/ARROW-5412
[4] https://github.com/apache/spark/pull/26552


[jira] [Created] (ARROW-7173) Add test to verify Map field names can be arbitrary

2019-11-14 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7173:
---

 Summary: Add test to verify Map field names can be arbitrary
 Key: ARROW-7173
 URL: https://issues.apache.org/jira/browse/ARROW-7173
 Project: Apache Arrow
  Issue Type: Test
  Components: Integration
Reporter: Bryan Cutler


A Map has child fields and the format spec only recommends that they be named 
"entries", "key", and "value", but they could be named anything. Currently, 
integration tests for Map arrays verify the exchanged schema is equal, so the 
child fields are always named the same. There should be tests that use 
different names to verify implementations can accept this.





Re: [Java] Append multiple record batches together?

2019-11-08 Thread Bryan Cutler
I think having a chunked array with multiple vector buffers would be ideal,
similar to C++. It might take a fair amount of work to add this but would
open up a lot more functionality. As for the API,
VectorSchemaRoot.concat(Collection) seems good to me.

On Thu, Nov 7, 2019 at 12:09 AM Fan Liya  wrote:

> Hi Micah,
>
> Thanks for bringing this up.
>
> > 1.  An efficient solution already exists? It seems like TransferPair
> implementations could possibly be improved upon or have they already been
> optimized?
>
> Fundamentally, memory copy is unavoidable, IMO, because the source and
> target memory regions are likely to be in non-contiguous regions.
> An alternative is to make ArrowBuf support a number of non-contiguous
> memory regions. However, that would harm the performance of ArrowBuf, and
> ArrowBuf is the core of the Arrow library.
>
> > 2.  What the preferred API for doing this would be?  Some options i can
> think of:
>
> > * VectorSchemaRoot.concat(Collection)
> > * VectorSchemaRoot.from(Collection)
> > * VectorLoader.load(Collection)
>
> IMO, option 1 is required, as we have scenarios that need to concatenate
> vectors/VectorSchemaRoots (e.g. restore the complete dictionary from delta
> dictionaries).
> Options 2 and 3 are optional for us.
>
> Best,
> Liya Fan
>
> On Thu, Nov 7, 2019 at 3:44 PM Micah Kornfield 
> wrote:
>
> > Hi,
> > A colleague opened up https://issues.apache.org/jira/browse/ARROW-7048
> for
> > having similar functionality to the python APIs that allow for creating
> one
> > larger data structure from a series of record batches.  I just wanted to
> > surface it here in case:
> > 1.  An efficient solution already exists? It seems like TransferPair
> > implementations could possibly be improved upon or have they already been
> > optimized?
> > 2.  What the preferred API for doing this would be?  Some options i can
> > think of:
> >
> > * VectorSchemaRoot.concat(Collection)
> > * VectorSchemaRoot.from(Collection)
> > * VectorLoader.load(Collection)
> >
> > Thanks,
> > Micah
> >
>


Re: [VOTE] Release Apache Arrow 0.15.1 - RC0

2019-10-31 Thread Bryan Cutler
After changing it to `pip install archery` it will use my base conda env.
Then I get an error that `archery integration` is not a valid command:

+ INTEGRATION_TEST_ARGS=
+ '[' OFF = ON ']'
+
LD_LIBRARY_PATH=/tmp/arrow-0.15.1.0tRNC/apache-arrow-0.15.1/cpp/build/release:/tmp/arrow-0.15.1.0tRNC/install/lib:
+ archery integration --with-cpp=1 --with-java=1 --with-js=0 --with-go=0
Usage: archery [OPTIONS] COMMAND [ARGS]...
Try "archery --help" for help.

Error: No such command "integration".

Checking the install, it only has 2 options to benchmark or build:
$ archery --help
Usage: archery [OPTIONS] COMMAND [ARGS]...

  Apache Arrow developer utilities.

  See sub-commands help with `archery <command> --help`.

Options:
  --debug  Increase logging with debugging output.
  --pdbInvoke pdb on uncaught exception.
  -q, --quiet  Silence executed commands.
  --help   Show this message and exit.

Commands:
  benchmark  Arrow benchmarking.
  build  Initialize an Arrow C++ build

On Thu, Oct 31, 2019 at 2:38 PM Bryan Cutler  wrote:

> I am using a conda env, so that will install the package there. When
> archery runs the integration tests though, it looks like it uses my system
> python/pip3, which doesn't have any of the required packages. Not sure if
> I'm missing something to use the right environment.
>
> On Thu, Oct 31, 2019 at 2:17 PM Wes McKinney  wrote:
>
>> hi Bryan -- I think `pip3 install setuptools` will take care of it
>>
>> On Thu, Oct 31, 2019 at 2:06 PM Bryan Cutler  wrote:
>> >
>> > +1 (non-binding), although I could not complete the source verification
>> > script
>> >
>> > On Ubuntu 16.04 I ran
>> >  * verification script for binaries, no issues
>> >  * verification script for source, could not complete:
>> > TEST_DEFAULT=0 TEST_SOURCE=1 TEST_PYTHON=1 TEST_INTEGRATION_CPP=1
>> > TEST_INTEGRATION_JAVA=1 ARROW_FLIGHT=OFF
>> > dev/release/verify-release-candidate.sh source 0.15.1 0
>> > I had an issue building ORC adapter, and had to manually disable.
>> Probably
>> > an env issue for me.
>> > I ran into the following problem, I think this is due to using my system
>> > python instead of the active conda env, possibly caused by changes in
>> > https://github.com/apache/arrow/pull/5600
>> > + pip3 install -e dev/archery
>> > Obtaining file:///tmp/arrow-0.15.1.7OxLD/apache-arrow-0.15.1/dev/archery
>> > Complete output from command python setup.py egg_info:
>> > Traceback (most recent call last):
>> >   File "<string>", line 1, in <module>
>> > ImportError: No module named 'setuptools'
>> >
>> > Any way to get around this?
>> >
>> > On Thu, Oct 31, 2019 at 6:53 AM Francois Saint-Jacques <
>> > fsaintjacq...@gmail.com> wrote:
>> >
>> > > +1 (non-binding)
>> > >
>> > > Ubuntu 18.04
>> > > - Source release verified
>> > > - Binary release verified
>> > >
>> > > François
>> > >
>> > >
>> > > On Fri, Oct 25, 2019 at 2:43 PM Krisztián Szűcs
>> > >  wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > I would like to propose the following release candidate (RC0) of
>> Apache
>> > > > Arrow version 0.15.1. This is a patch release consisting of 36
>> resolved
>> > > > JIRA issues[1].
>> > > >
>> > > > This release candidate is based on commit:
>> > > > b789226ccb2124285792107c758bb3b40b3d082a [2]
>> > > >
>> > > > The source release rc0 is hosted at [3].
>> > > > The binary artifacts are hosted at [4][5][6][7].
>> > > > The changelog is located at [8].
>> > > >
>> > > > Please download, verify checksums and signatures, run the unit
>> tests,
>> > > > and vote on the release. See [9] for how to validate a release
>> candidate.
>> > > >
>> > > > The vote will be open for at least 72 hours.
>> > > >
>> > > > [ ] +1 Release this as Apache Arrow 0.15.1
>> > > > [ ] +0
>> > > > [ ] -1 Do not release this as Apache Arrow 0.15.1 because...
>> > > >
>> > > > [1]:
>> > >
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.15.1
>> > > > [2]:
>> > >
>> https://github.com/apache/arrow/tree/b789226ccb2124285792107c758bb3b40b3d082a
>> > > > [3]:
>> > > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.15.1-rc0
>> > > > [4]: https://bintray.com/apache/arrow/centos-rc/0.15.1-rc0
>> > > > [5]: https://bintray.com/apache/arrow/debian-rc/0.15.1-rc0
>> > > > [6]: https://bintray.com/apache/arrow/python-rc/0.15.1-rc0
>> > > > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.15.1-rc0
>> > > > [8]:
>> > >
>> https://github.com/apache/arrow/blob/b789226ccb2124285792107c758bb3b40b3d082a/CHANGELOG.md
>> > > > [9]:
>> > >
>> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>> > >
>>
>


Re: [VOTE] Release Apache Arrow 0.15.1 - RC0

2019-10-31 Thread Bryan Cutler
I am using a conda env, so that will install the package there. When
archery runs the integration tests though, it looks like it uses my system
python/pip3, which doesn't have any of the required packages. Not sure if
I'm missing something to use the right environment.

On Thu, Oct 31, 2019 at 2:17 PM Wes McKinney  wrote:

> hi Bryan -- I think `pip3 install setuptools` will take care of it
>
> On Thu, Oct 31, 2019 at 2:06 PM Bryan Cutler  wrote:
> >
> > +1 (non-binding), although I could not complete the source verification
> > script
> >
> > On Ubuntu 16.04 I ran
> >  * verification script for binaries, no issues
> >  * verification script for source, could not complete:
> > TEST_DEFAULT=0 TEST_SOURCE=1 TEST_PYTHON=1 TEST_INTEGRATION_CPP=1
> > TEST_INTEGRATION_JAVA=1 ARROW_FLIGHT=OFF
> > dev/release/verify-release-candidate.sh source 0.15.1 0
> > I had an issue building ORC adapter, and had to manually disable.
> Probably
> > an env issue for me.
> > I ran into the following problem, I think this is due to using my system
> > python instead of the active conda env, possibly caused by changes in
> > https://github.com/apache/arrow/pull/5600
> > + pip3 install -e dev/archery
> > Obtaining file:///tmp/arrow-0.15.1.7OxLD/apache-arrow-0.15.1/dev/archery
> > Complete output from command python setup.py egg_info:
> > Traceback (most recent call last):
> >   File "<string>", line 1, in <module>
> > ImportError: No module named 'setuptools'
> >
> > Any way to get around this?
> >
> > On Thu, Oct 31, 2019 at 6:53 AM Francois Saint-Jacques <
> > fsaintjacq...@gmail.com> wrote:
> >
> > > +1 (non-binding)
> > >
> > > Ubuntu 18.04
> > > - Source release verified
> > > - Binary release verified
> > >
> > > François
> > >
> > >
> > > On Fri, Oct 25, 2019 at 2:43 PM Krisztián Szűcs
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > I would like to propose the following release candidate (RC0) of
> Apache
> > > > Arrow version 0.15.1. This is a patch release consisting of 36
> resolved
> > > > JIRA issues[1].
> > > >
> > > > This release candidate is based on commit:
> > > > b789226ccb2124285792107c758bb3b40b3d082a [2]
> > > >
> > > > The source release rc0 is hosted at [3].
> > > > The binary artifacts are hosted at [4][5][6][7].
> > > > The changelog is located at [8].
> > > >
> > > > Please download, verify checksums and signatures, run the unit tests,
> > > > and vote on the release. See [9] for how to validate a release
> candidate.
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Release this as Apache Arrow 0.15.1
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow 0.15.1 because...
> > > >
> > > > [1]:
> > >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.15.1
> > > > [2]:
> > >
> https://github.com/apache/arrow/tree/b789226ccb2124285792107c758bb3b40b3d082a
> > > > [3]:
> > > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.15.1-rc0
> > > > [4]: https://bintray.com/apache/arrow/centos-rc/0.15.1-rc0
> > > > [5]: https://bintray.com/apache/arrow/debian-rc/0.15.1-rc0
> > > > [6]: https://bintray.com/apache/arrow/python-rc/0.15.1-rc0
> > > > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.15.1-rc0
> > > > [8]:
> > >
> https://github.com/apache/arrow/blob/b789226ccb2124285792107c758bb3b40b3d082a/CHANGELOG.md
> > > > [9]:
> > >
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> > >
>


Re: [VOTE] Release Apache Arrow 0.15.1 - RC0

2019-10-31 Thread Bryan Cutler
+1 (non-binding), although I could not complete the source verification
script

On Ubuntu 16.04 I ran
 * verification script for binaries, no issues
 * verification script for source, could not complete:
TEST_DEFAULT=0 TEST_SOURCE=1 TEST_PYTHON=1 TEST_INTEGRATION_CPP=1
TEST_INTEGRATION_JAVA=1 ARROW_FLIGHT=OFF
dev/release/verify-release-candidate.sh source 0.15.1 0
I had an issue building ORC adapter, and had to manually disable. Probably
an env issue for me.
I ran into the following problem, I think this is due to using my system
python instead of the active conda env, possibly caused by changes in
https://github.com/apache/arrow/pull/5600
+ pip3 install -e dev/archery
Obtaining file:///tmp/arrow-0.15.1.7OxLD/apache-arrow-0.15.1/dev/archery
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'setuptools'

Any way to get around this?

On Thu, Oct 31, 2019 at 6:53 AM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:

> +1 (non-binding)
>
> Ubuntu 18.04
> - Source release verified
> - Binary release verified
>
> François
>
>
> On Fri, Oct 25, 2019 at 2:43 PM Krisztián Szűcs
>  wrote:
> >
> > Hi,
> >
> > I would like to propose the following release candidate (RC0) of Apache
> > Arrow version 0.15.1. This is a patch release consisting of 36 resolved
> > JIRA issues[1].
> >
> > This release candidate is based on commit:
> > b789226ccb2124285792107c758bb3b40b3d082a [2]
> >
> > The source release rc0 is hosted at [3].
> > The binary artifacts are hosted at [4][5][6][7].
> > The changelog is located at [8].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. See [9] for how to validate a release candidate.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow 0.15.1
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow 0.15.1 because...
> >
> > [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.15.1
> > [2]:
> https://github.com/apache/arrow/tree/b789226ccb2124285792107c758bb3b40b3d082a
> > [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.15.1-rc0
> > [4]: https://bintray.com/apache/arrow/centos-rc/0.15.1-rc0
> > [5]: https://bintray.com/apache/arrow/debian-rc/0.15.1-rc0
> > [6]: https://bintray.com/apache/arrow/python-rc/0.15.1-rc0
> > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.15.1-rc0
> > [8]:
> https://github.com/apache/arrow/blob/b789226ccb2124285792107c758bb3b40b3d082a/CHANGELOG.md
> > [9]:
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>


Re: [DISCUSS][Java] Builders for java classes

2019-10-29 Thread Bryan Cutler
Just to clarify, how will this be different than the current vector writers
that they are wrapping? Is it just the ability to add multiple values at
once, or more efficiently?
Also, if we are going to be adding new APIs, maybe we can try to match more
closely the existing builders in C++?  I believe they are pretty similar,
they just use "append*", "appendValues", etc.

Bryan

On Sun, Oct 27, 2019 at 1:03 PM Jacques Nadeau  wrote:

> +1 on the idea of enhancing builder interfaces.
>
> >>IntVectorBuilder addAll(int[] values);
>
> Let's make sure that anything like the above is efficient. People will
> judge the quality of the project on the efficiency of the methods we
> provide. If everybody starts using int[] to build Arrow vectors, we should
> make sure it is good. The complexwriter tries to be dynamically typed in a
> statically typed language and had to give up some efficiency. For methods
> where we're adding multiple values (as above), we should make sure to
> improve the layers to maximize things.
>
> On Thu, Oct 24, 2019 at 3:08 AM Fan Liya  wrote:
>
> > Hi Micah,
> >
> > IMO, we need an adapter from on-heap array to off-heap array.
> > This is useful because many third-party Java libraries populate data to
> an
> > on-heap array.
> >
> > And I see this API in your design:
> >
> > IntVectorBuilder addAll(int[] values);
> >
> > So I am +1 for this.
> >
> > Best,
> > Liya Fan
> >
> > On Thu, Oct 24, 2019 at 12:31 PM Micah Kornfield 
> > wrote:
> >
> > > As part a PR Ji Liu has made to help populate data for test cases [1],
> > the
> > > question came up on whether we should provide a more  builder classes
> in
> > > java for ValueVectors.  The proposed implementation would wrap the
> > existing
> > > Writer classes.
> > >
> > > Do people think this would be a valuable addition to the java library?
> I
> > > imagine it would be a builder per ValueVectorType.  The main benefit I
> > see
> > > to this is making the library potentially slightly easier to use for
> > > new-comers, but might not be the most efficient.  A straw-man interface
> > is
> > > listed below.
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Micah
> > >
> > > class IntVectorBuilder {
> > >public IntVectorBuilder(BufferAllocator allocator);
> > >
> > >IntVectorBuilder add(int value);
> > > IntVectorBuilder addAll(int[] values);
> > > IntVectorBuilder addNull();
> > > // handles null values in array
> > > IntVectorBuilder addAll(Integer... values);
> > > IntVectorBuilder addAll(List<Integer> values);
> > > IntVector build(String name);
> > > }
> > >
> >
>
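[Editor's note: to make the straw-man above concrete, here is a language-neutral sketch of the append/build semantics the builder would wrap — written in Python with plain lists standing in for Arrow buffers; the names and behavior are assumptions, not the proposed Java implementation.]

```python
class IntVectorBuilder:
    """Sketch of the proposed builder: accumulate values plus validity, build once."""

    def __init__(self):
        self._values = []   # stand-in for the data buffer
        self._valid = []    # stand-in for the validity bitmap

    def add(self, value):
        self._values.append(value)
        self._valid.append(True)
        return self  # chainable, like the straw-man Java API

    def add_null(self):
        self._values.append(0)  # placeholder slot for a null entry
        self._valid.append(False)
        return self

    def add_all(self, values):
        # Handles None entries, mirroring the straw-man addAll(Integer...)
        for v in values:
            self.add_null() if v is None else self.add(v)
        return self

    def build(self):
        return list(zip(self._values, self._valid))


vec = IntVectorBuilder().add(1).add_all([2, None, 4]).build()
# vec == [(1, True), (2, True), (0, False), (4, True)]
```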


Re: [ANNOUNCE] New Arrow committer: Eric Erhardt

2019-10-18 Thread Bryan Cutler
Congrats!

On Thu, Oct 17, 2019, 6:26 PM Fan Liya  wrote:

> Congrats Eric!
>
> Best,
> Liya Fan
>
> On Fri, Oct 18, 2019 at 3:06 AM paddy horan 
> wrote:
>
> > Congrats Eric!
> >
> > 
> > From: Micah Kornfield 
> > Sent: Thursday, October 17, 2019 12:45:15 PM
> > To: dev 
> > Subject: Re: [ANNOUNCE] New Arrow committer: Eric Erhardt
> >
> > Congrats Eric!
> >
> > On Thu, Oct 17, 2019 at 6:58 AM Wes McKinney 
> wrote:
> >
> > > On behalf of the Arrow PMC, I'm happy to announce that Eric has
> > > accepted an invitation to become a committer on Apache Arrow.
> > >
> > > Welcome, and thank you for your contributions!
> > >
> >
>


[jira] [Created] (ARROW-6904) [Python] Implement MapArray and MapType

2019-10-16 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6904:
---

 Summary: [Python] Implement MapArray and MapType
 Key: ARROW-6904
 URL: https://issues.apache.org/jira/browse/ARROW-6904
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 1.0.0


Map arrays are already added to C++, need to expose them in the Python API also



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Apache Arrow 0.15.0 released

2019-10-07 Thread Bryan Cutler
Great work everyone!

On Sun, Oct 6, 2019 at 1:46 PM Wes McKinney  wrote:

> Congrats everyone!
>
> On Sat, Oct 5, 2019 at 10:09 AM Krisztián Szűcs  wrote:
> >
> > The Apache Arrow community is pleased to announce the 0.15.0 release.
> > The release includes 711 resolved issues ([1]) since the 0.14.0 release.
> >
> > The release is available now from our website, [2] and [3]:
> > http://arrow.apache.org/install/
> >
> > Release notes are available at:
> > https://arrow.apache.org/release/0.15.0.html
> >
> > What is Apache Arrow?
> > -
> >
> > Apache Arrow is a cross-language development platform for in-memory
> data. It
> > specifies a standardized language-independent columnar memory format for
> > flat
> > and hierarchical data, organized for efficient analytic operations on
> modern
> > hardware. It also provides computational libraries and zero-copy
> streaming
> > messaging and interprocess communication. Languages currently supported
> > include
> > C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.
> >
> > Please report any feedback to the mailing lists ([4])
> >
> > Regards,
> > The Apache Arrow community
> >
> > [1]: https://issues.apache.org/jira/projects/ARROW/versions/12345978
> > [2]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.15.0/
> > [3]: https://bintray.com/apache/arrow
> > [4]: https://lists.apache.org/list.html?dev@arrow.apache.org
>


[jira] [Created] (ARROW-6790) [Release] Automatically disable integration test cases in release verification

2019-10-03 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6790:
---

 Summary: [Release] Automatically disable integration test cases in 
release verification
 Key: ARROW-6790
 URL: https://issues.apache.org/jira/browse/ARROW-6790
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Bryan Cutler
Assignee: Bryan Cutler


If dev/release/verify-release-candidate.sh is run with selective testing and 
includes integration tests, the selected implementations should be the only 
ones enabled when running the integration test portion. For example:

TEST_DEFAULT=0 \
TEST_CPP=1 \
TEST_JAVA=1 \
TEST_INTEGRATION=1 \
dev/release/verify-release-candidate.sh source 0.15.0 2

Should run integration only for C++ and Java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
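[Editor's note: the selective gating requested in ARROW-6790 can be sketched as a small env-var check — a hypothetical helper for illustration, not the actual verify script, which is written in shell.]

```python
import os

def enabled_integration_impls(env=os.environ):
    """Return the implementations to enable for the integration step,
    honoring selective TEST_* flags as proposed in ARROW-6790."""
    impls = ["cpp", "java", "js", "go"]
    if env.get("TEST_DEFAULT", "1") == "1":
        return impls  # default behavior: everything enabled
    return [i for i in impls if env.get(f"TEST_{i.upper()}", "0") == "1"]

# With TEST_DEFAULT=0 TEST_CPP=1 TEST_JAVA=1, only C++ and Java run:
print(enabled_integration_impls({"TEST_DEFAULT": "0", "TEST_CPP": "1", "TEST_JAVA": "1"}))
# -> ['cpp', 'java']
```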


Re: Collecting Arrow critique and our roadmap on that

2019-10-03 Thread Bryan Cutler
A lot of good info here, I added a point that has come up often for me.

On Thu, Oct 3, 2019 at 10:03 AM Wes McKinney  wrote:

> I read through and left some comments.
>
> Would be great to turn into an FAQ section in the docs and add a link
> to the navigation on the front page of the website.
>
> On Mon, Sep 23, 2019 at 1:22 PM Uwe L. Korn  wrote:
> >
> > Thanks for all the contributions that already came in. I made some more
> additions and hope to turn this into a PR to the site soon.
> >
> > Uwe
> >
> > On Fri, Sep 20, 2019, at 10:46 AM, Micah Kornfield wrote:
> > > I think this is a good idea, as well.  I added comments and additions
> on
> > > the document.
> > >
> > > On Thu, Sep 19, 2019 at 11:47 AM Neal Richardson <
> > > neal.p.richard...@gmail.com> wrote:
> > >
> > > > Uwe, I think this is an excellent idea. I've started
> > > >
> > > >
> https://docs.google.com/document/d/1cgN7mYzH30URDTaioHsCP2d80wKKHDNs9f5s7vdb2mA/edit?usp=sharing
> > > > to collect some ideas and notes. Once we have gathered our thoughts
> > > > there, we can put them in the appropriate places.
> > > >
> > > > I think that some of the result will go into the FAQ, some into
> > > > documentation (maybe more "how-to" and "getting started" guides in
> the
> > > > respective language docs, as well as some "how to share Arrow data
> > > > from X to Y"), and other things that we haven't yet done should go
> > > > into a sort of Roadmap document on the main website. We have some
> very
> > > > outdated content related to a roadmap on the confluence wiki that
> > > > should be folded in as appropriate too.
> > > >
> > > > Neal
> > > >
> > > > On Thu, Sep 19, 2019 at 10:26 AM Uwe L. Korn 
> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > there has been a lot of public discussions lately with some
> mentions of
> > > > actually informed, valid critique of things in the Arrow project.
> From my
> > > > perspective, these things include "there is no STL-native C++ Arrow
> API",
> > > > "the base build requires too many dependencies", "the pyarrow
> package is
> > > > really huge and you cannot select single components". These are
> things we
> > > > cannot tackle at the moment due to the lack of contributors to the
> project.
> > > > But we can use this as a basis to point people that critique the
> project on
> > > > this that this is not intentional but a lack of resources as well as
> it
> > > > provides another point of entry for new contributors looking for
> work.
> > > > >
> > > > > Thus I would like to start a document (possibly on the website)
> where we
> > > > list the major critiques on Arrow, mention our long-term solution to
> that
> > > > and what JIRAs need to be done for that.
> > > > >
> > > > > Would that be something others would also see as valuable?
> > > > >
> > > > > There has also been a lot of uninformed criticism, I think that
> can be
> > > > best combated by documentation, blog posts and public appearances at
> > > > conferences and is not covered by this proposal.
> > > > >
> > > > > Uwe
> > > >
> > >
>


Re: Docker organization for development images

2019-10-03 Thread Bryan Cutler
Sounds good, thanks Krisztian!

On Thu, Oct 3, 2019 at 6:10 AM Krisztián Szűcs 
wrote:

> Hi,
>
> We've created a docker hub organisation called "arrowdev"
> to host the images defined in the docker-compose.yml, see
> the following commit [1].
> So now it is possible to speed up the image builds by pulling
> the layers first, I suggest to use the --pull flag for building
> images: `docker-compose build --pull cpp`
>
> We need to manually grant write access for committers and
> PMC members, so please send me your dockerhub username.
>
> Thanks, Krisztian
>
> P.S. Github has recently introduced its packaging feature, so
> we'll be able to experiment with hosting docker images on
> GitHub directly which would handle our permission settings
> out of the box. IMO we should try it once it is enabled for
> the apache/arrow repository.
>
> [1]:
>
> https://github.com/apache/arrow/commit/1165cdb85b92cefcf59ac39d35f42d168cc64517
>


Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-02 Thread Bryan Cutler
+1 (non-binding)

I ran the following on Ubuntu 16.04 4.15.0-64-generic:
> dev/release/verify-release-candidate.sh binaries 0.15.0 2
> ARROW_CUDA=OFF \
TEST_DEFAULT=0 \
TEST_SOURCE=1 \
TEST_CPP=1 \
TEST_PYTHON=1 \
TEST_JAVA=1 \
TEST_INTEGRATION=1 \
dev/release/verify-release-candidate.sh source 0.15.0 2

For source verification I set INTEGRATION_TEST_ARGS="--enable-js=0
--enable-go=0"

When attempting source verification with defaults, I got the below error
when building the ORC adapter. It looks like just a warning that is being
treated as error and seems to be only in

On Wed, Oct 2, 2019 at 7:53 AM Andy Grove  wrote:

> +1 (binding)
>
> On Mon, Sep 30, 2019 at 11:57 PM Krisztián Szűcs <
> szucs.kriszt...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I would like to propose the following release candidate (RC2) of Apache
> > Arrow version 0.15.0. This is a release consisting of 697
> > resolved JIRA issues[1].
> >
> > This release candidate is based on commit:
> > 40d468e162e88e1761b1e80b3ead060f0be927ee [2]
> >
> > The source release rc2 is hosted at [3].
> > The binary artifacts are hosted at [4][5][6][7].
> > The changelog is located at [8].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. See [9] for how to validate a release candidate.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow 0.15.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow 0.15.0 because...
> >
> > [1]:
> >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.15.0
> > [2]:
> >
> >
> https://github.com/apache/arrow/tree/40d468e162e88e1761b1e80b3ead060f0be927ee
> > [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.15.0-rc2
> > [4]: https://bintray.com/apache/arrow/centos-rc/0.15.0-rc2
> > [5]: https://bintray.com/apache/arrow/debian-rc/0.15.0-rc2
> > [6]: https://bintray.com/apache/arrow/python-rc/0.15.0-rc2
> > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.15.0-rc2
> > [8]:
> >
> >
> https://github.com/apache/arrow/blob/40d468e162e88e1761b1e80b3ead060f0be927ee/CHANGELOG.md
> > [9]:
> >
> >
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> >
>


Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-02 Thread Bryan Cutler
Accidentally sent too soon. The ORC build error I got was probably just an
env issue for me, but here it is in case anyone else had the same issue:

In file included from
/tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep/c++/src/wrap/orc-proto-wrapper.cc:44:0:
/tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_proto.pb.cc:970:13:
error: ‘dynamic_init_dummy_orc_5fproto_2eproto’
defined but not used [-Werror=unused-variable]
 static bool dynamic_init_dummy_orc_5fproto_2eproto = []() {
AddDescriptors_orc_5fproto_2eproto(); return true; }();
 ^
cc1plus: all warnings being treated as errors
make[5]: *** [c++/src/CMakeFiles/orc.dir/wrap/orc-proto-wrapper.cc.o] Error
1
make[5]: *** Waiting for unfinished jobs
make[4]: *** [c++/src/CMakeFiles/orc.dir/all] Error 2
make[3]: *** [all] Error 2

[ 29%] Performing build step for 'orc_ep'
CMake Error at
/tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep-stamp/orc_ep-build-RELEASE.cmake:16
(message):
  Command failed: 2

   'make'

  See also


/tmp/arrow-0.15.0.KxYbA/apache-arrow-0.15.0/cpp/build/orc_ep-prefix/src/orc_ep-stamp/orc_ep-build-*.log

CMakeFiles/orc_ep.dir/build.make:111: recipe for target
'orc_ep-prefix/src/orc_ep-stamp/orc_ep-build' failed
make[2]: *** [orc_ep-prefix/src/orc_ep-stamp/orc_ep-build] Error 1
CMakeFiles/Makefile2:1248: recipe for target 'CMakeFiles/orc_ep.dir/all'
failed
make[1]: *** [CMakeFiles/orc_ep.dir/all] Error 2

On Wed, Oct 2, 2019 at 4:12 PM Bryan Cutler  wrote:

> +1 (non-binding)
>
> I ran the following on Ubuntu 16.04 4.15.0-64-generic:
> > dev/release/verify-release-candidate.sh binaries 0.15.0 2
> > ARROW_CUDA=OFF \
> TEST_DEFAULT=0 \
> TEST_SOURCE=1 \
> TEST_CPP=1 \
> TEST_PYTHON=1 \
> TEST_JAVA=1 \
> TEST_INTEGRATION=1 \
> dev/release/verify-release-candidate.sh source 0.15.0 2
>
> For source verification I set INTEGRATION_TEST_ARGS="--enable-js=0
> --enable-go=0"
>
> When attempting source verification with defaults, I got the below error
> when building the ORC adapter. It looks like just a warning that is being
> treated as error and seems to be only in
>
> On Wed, Oct 2, 2019 at 7:53 AM Andy Grove  wrote:
>
>> +1 (binding)
>>
>> On Mon, Sep 30, 2019 at 11:57 PM Krisztián Szűcs <
>> szucs.kriszt...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I would like to propose the following release candidate (RC2) of Apache
>> > Arrow version 0.15.0. This is a release consisting of 697
>> > resolved JIRA issues[1].
>> >
>> > This release candidate is based on commit:
>> > 40d468e162e88e1761b1e80b3ead060f0be927ee [2]
>> >
>> > The source release rc2 is hosted at [3].
>> > The binary artifacts are hosted at [4][5][6][7].
>> > The changelog is located at [8].
>> >
>> > Please download, verify checksums and signatures, run the unit tests,
>> > and vote on the release. See [9] for how to validate a release
>> candidate.
>> >
>> > The vote will be open for at least 72 hours.
>> >
>> > [ ] +1 Release this as Apache Arrow 0.15.0
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Arrow 0.15.0 because...
>> >
>> > [1]:
>> >
>> >
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.15.0
>> > [2]:
>> >
>> >
>> https://github.com/apache/arrow/tree/40d468e162e88e1761b1e80b3ead060f0be927ee
>> > [3]:
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.15.0-rc2
>> > [4]: https://bintray.com/apache/arrow/centos-rc/0.15.0-rc2
>> > [5]: https://bintray.com/apache/arrow/debian-rc/0.15.0-rc2
>> > [6]: https://bintray.com/apache/arrow/python-rc/0.15.0-rc2
>> > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.15.0-rc2
>> > [8]:
>> >
>> >
>> https://github.com/apache/arrow/blob/40d468e162e88e1761b1e80b3ead060f0be927ee/CHANGELOG.md
>> > [9]:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>> >
>>
>


Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-24-0

2019-09-24 Thread Bryan Cutler
I'm able to pass Spark integration tests locally with the build patch from
https://github.com/apache/arrow/pull/5465, so I'm reasonably confident all
the issues have been resolved and it's just flaky timeouts now. We are
trying some things to fix the timeouts, but nothing to hold up the release
for.

On Tue, Sep 24, 2019 at 8:54 AM Micah Kornfield 
wrote:

> Hi Wes,
> Thanks, that makes sense, I'll pick a commit in a little bit to get started
> with.  Somehow I thought we had done so in the past.
>
> Thanks,
> Micah
>
> On Tue, Sep 24, 2019 at 7:59 AM Wes McKinney  wrote:
>
> > hi Micah -- we should not stop merging PRs. That's been our policy
> > with past releases. If you want to pick a commit to base your release
> > branch off that's fine -- we rebase master later after the release
> > vote closes.
> >
> > On Tue, Sep 24, 2019 at 9:39 AM Micah Kornfield 
> > wrote:
> > >
> > > OK at least Spark and Wheel builds look like they might just be flaky
> > > timeouts.  I agree with Fuzzit not being a blocker.  Are there any
> other
> > > blockers I should be aware of?  Otherwise, I will try to start the
> build
> > > process later today.
> > >
> > > On Tue, Sep 24, 2019 at 8:33 AM Antoine Pitrou 
> > wrote:
> > >
> > > >
> > > > At least for Fuzzit and the OS X Python wheel, I don't think those
> are
> > > > blockers.
> > > >
> > > > (IMHO the others shouldn't block the release either)
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > Le 24/09/2019 à 16:29, Micah Kornfield a écrit :
> > > > > Have the failures already been fixed (i.e. is this a timing
> > issue?).  If
> > > > > not could people chime in if they are looking at some of them?  I
> > assume
> > > > > these are blockers until 0.15.0?
> > > > >
> > > > > If people are OK with it, it might make sense to stop merging
> > > > non-blocking
> > > > > PRs until 0.15.0 is out the door.  Thoughts?
> > > > >
> > > > > On Tue, Sep 24, 2019 at 8:25 AM Crossbow 
> > wrote:
> > > > >
> > > > >>
> > > > >> Arrow Build Report for Job nightly-2019-09-24-0
> > > > >>
> > > > >> All tasks:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0
> > > > >>
> > > > >> Failed Tasks:
> > > > >> - docker-cpp-fuzzit:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-circle-docker-cpp-fuzzit
> > > > >> - docker-spark-integration:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-circle-docker-spark-integration
> > > > >> - gandiva-jar-osx:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-travis-gandiva-jar-osx
> > > > >> - docker-dask-integration:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-circle-docker-dask-integration
> > > > >> - wheel-osx-cp27m:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-travis-wheel-osx-cp27m
> > > > >>
> > > > >> Succeeded Tasks:
> > > > >> - wheel-manylinux2010-cp37m:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-travis-wheel-manylinux2010-cp37m
> > > > >> - docker-python-3.6:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-circle-docker-python-3.6
> > > > >> - docker-clang-format:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-circle-docker-clang-format
> > > > >> - homebrew-cpp:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-travis-homebrew-cpp
> > > > >> - docker-cpp-static-only:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-circle-docker-cpp-static-only
> > > > >> - wheel-osx-cp36m:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-travis-wheel-osx-cp36m
> > > > >> - homebrew-cpp-autobrew:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-travis-homebrew-cpp-autobrew
> > > > >> - docker-python-3.7:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-circle-docker-python-3.7
> > > > >> - docker-python-2.7-nopandas:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-circle-docker-python-2.7-nopandas
> > > > >> - wheel-win-cp35m:
> > > > >>   URL:
> > > > >>
> > > >
> >
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-24-0-appveyor-wheel-win-cp35m
> > 

[jira] [Created] (ARROW-6652) [Python] to_pandas conversion removes timezone from type

2019-09-21 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6652:
---

 Summary: [Python] to_pandas conversion removes timezone from type
 Key: ARROW-6652
 URL: https://issues.apache.org/jira/browse/ARROW-6652
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler
 Fix For: 0.15.0


Calling {{to_pandas}} on a {{pyarrow.Array}} with a timezone aware timestamp 
type, removes the timezone in the resulting {{pandas.Series}}.

{code}
>>> import pyarrow as pa
>>> a = pa.array([1], type=pa.timestamp('us', tz='America/Los_Angeles'))
>>> a.to_pandas()
0   1970-01-01 00:00:00.000001
dtype: datetime64[ns]
{code}

Previous behavior from 0.14.1 of converting a {{pyarrow.Column}} {{to_pandas}} 
retained the timezone.
{code}
In [4]: import pyarrow as pa 
   ...: a = pa.array([1], type=pa.timestamp('us', tz='America/Los_Angeles'))  
   ...: c = pa.Column.from_array('ts', a) 

In [5]: c.to_pandas()   
 
Out[5]: 
0   1969-12-31 16:00:00.000001-08:00
Name: ts, dtype: datetime64[ns, America/Los_Angeles]
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
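[Editor's note: for readers unfamiliar with the distinction reported above, the stdlib illustrates what is lost when a conversion drops the timezone — a tz-aware timestamp carries an offset, a naive one does not. This is a pure-Python illustration, not pyarrow's conversion path.]

```python
from datetime import datetime, timezone

# 1 microsecond after the epoch, as in the report above
aware = datetime.fromtimestamp(1e-6, tz=timezone.utc)
naive = aware.replace(tzinfo=None)

print(aware)   # timezone retained: offset suffix present
print(naive)   # timezone dropped: no offset information
```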


[jira] [Created] (ARROW-6534) [Java] Fix typos and spelling

2019-09-11 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6534:
---

 Summary: [Java] Fix typos and spelling
 Key: ARROW-6534
 URL: https://issues.apache.org/jira/browse/ARROW-6534
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 0.15.0


Fix typos and spelling, mostly in docs and tests.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-09-10 Thread Bryan Cutler
I have the patch for the EOS with Java writers up here
https://github.com/apache/arrow/pull/5345. Just to clarify, the EOS of
{0xFFFFFFFF, 0x00000000} is used for both stream and file formats, in
non-legacy writing mode.
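[Editor's note: the framing described in this thread can be sketched as a minimal reader of the IPC message prefix — a simplified sketch of the format as discussed here, not Arrow's actual reader.]

```python
import io
import struct

CONTINUATION = 0xFFFFFFFF

def read_message_length(stream):
    """Read an IPC message prefix; return the metadata length, or None at EOS.

    New format: 4-byte continuation marker 0xFFFFFFFF, then a 4-byte
    little-endian metadata length; {0xFFFFFFFF, 0x00000000} marks EOS.
    Legacy format: a bare 4-byte length; 0x00000000 alone marks EOS.
    """
    first = struct.unpack("<I", stream.read(4))[0]
    if first == CONTINUATION:
        length = struct.unpack("<I", stream.read(4))[0]  # consume all 8 bytes
        return None if length == 0 else length
    return None if first == 0 else first  # legacy stream

eos = io.BytesIO(struct.pack("<II", 0xFFFFFFFF, 0))
assert read_message_length(eos) is None  # full 8-byte EOS consumed
assert eos.read() == b""                 # nothing left unread
```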

On Mon, Sep 9, 2019 at 8:01 PM Bryan Cutler  wrote:

> Sounds good to me also and I don't think we need a vote either.
>
> On Sat, Sep 7, 2019 at 7:36 PM Micah Kornfield 
> wrote:
>
>> +1 on this, I also don't think a vote is necessary as long as we make the
>> change before 0.15.0
>>
>> On Saturday, September 7, 2019, Wes McKinney  wrote:
>>
>> > I see, thank you for catching this nuance.
>> >
>> > I agree that using {0xFFFFFFFF, 0x00000000} for EOS will resolve the
>> > issue while allowing implementations to be backwards compatible (i.e.
>> > handling the 4-byte EOS from older payloads).
>> >
>> > I'm not sure that we need to have a vote about this, what do others
>> think?
>> >
>> > On Sat, Sep 7, 2019 at 12:47 AM Ji Liu 
>> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > During the java code review[1], seems there is a problem with the
>> > current implementations(C++/Java etc) when reaching EOS, since the new
>> > format EOS is 8 bytes and the reader only reads 4 bytes when reach the
>> end
>> > of stream, and the additional 4 bytes will not be read which cause
>> problems
>> > for following up readings.
>> > >
>> > > There are some optional suggestions[2] as below; we should reach
>> > > consensus and fix this problem before the 0.15 release.
>> > > i. For the new format, an 8-byte EOS token should look like
>> > > {0xFFFFFFFF, 0x00000000}, so we read the continuation token first, and
>> > > then know to read the next 4 bytes, which are then 0 to signal EOS.
>> > > ii. Reader just remembers the state, so if it reads the continuation
>> > > token at the beginning, then reads all 8 bytes at the end.
>> > >
>> > > Thanks,
>> > > Ji Liu
>> > >
>> > > [1] https://github.com/apache/arrow/pull/5229
>> > > [2] https://github.com/apache/arrow/pull/5229#discussion_r321715682
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > From:Eric Erhardt 
> > > > Send Time: Sep 5, 2019 (Thu) 07:16
>> > > To:dev@arrow.apache.org ; Ji Liu <
>> > niki...@aliyun.com>
>> > > Cc:emkornfield ; Paul Taylor <
>> ptay...@apache.org>
>> > > Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > >
>> > > The C# PR is up.
>> > >
>> > > https://github.com/apache/arrow/pull/5280
>> > >
>> > > Eric
>> > >
>> > > -Original Message-
>> > > From: Eric Erhardt 
>> > > Sent: Wednesday, September 4, 2019 10:12 AM
>> > > To: dev@arrow.apache.org; Ji Liu 
>> > > Cc: emkornfield ; Paul Taylor <
>> ptay...@apache.org
>> > >
>> > > Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > >
>> > > I'm working on a PR for the C# bindings. I hope to have it up in the
>> > next day or two. Integration tests for C# would be a great addition at
>> some
>> > point - it's been on my backlog. For now I plan on manually testing it.
>> > >
>> > > -Original Message-
>> > > From: Wes McKinney 
>> > > Sent: Tuesday, September 3, 2019 10:17 PM
>> > > To: Ji Liu 
>> > > Cc: emkornfield ; dev ;
>> > Paul Taylor 
>> > > Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > >
>> > > hi folks,
>> > >
>> > > We now have patches up for Java, JS, and Go. How are we doing on the
>> > code reviews for getting these in?
>> > >
>> > > Since C# implements the binary protocol, the C# developers might want
>> to
>> > look at this before the 0.15.0 release also. Absent integration tests
>> it's
>> > difficult to verify the C# library, though
>> > >
>> > > Thanks
>> > >
>> > &g

[jira] [Created] (ARROW-6519) [Java] Use IPC continuation token to mark EOS

2019-09-10 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6519:
---

 Summary: [Java] Use IPC continuation token to mark EOS
 Key: ARROW-6519
 URL: https://issues.apache.org/jira/browse/ARROW-6519
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 0.15.0


For Arrow stream in non-legacy mode, the EOS identifier should be 
{0xFFFFFFFF, 0x00000000}. This way, all bytes sent by the writer can be read.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-09-09 Thread Bryan Cutler
--
> > > > From:Ji Liu  Send Time: Aug 28, 2019 (Wed)
> > > > 17:34 To:emkornfield ; dev
> > > >  Cc:Paul Taylor 
> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > > > 8-byte Flatbuffer alignment requirements (2nd vote)
> > > >
> > > > I could take the Java implementation and will take a close watch on
> > this issue in the next few days.
> > > >
> > > > Thanks,
> > > > Ji Liu
> > > >
> > > >
> > > > --
> > > > From:Micah Kornfield  Send
> Time: Aug 28, 2019 (Wed)
> > > > 17:14 To:dev  Cc:Paul Taylor
> > > > 
> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > > > 8-byte Flatbuffer alignment requirements (2nd vote)
> > > >
> > > > I should have integration tests with 0.14.1 generated binaries in the
> > > > next few days.  I think the one remaining unassigned piece of work in
> > > > the Java implementation, i can take that up next if no one else gets
> > to it.
> > > >
> > > > On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney 
> > wrote:
> > > >
> > > > > Here's the C++ changes
> > > > >
> > > > >
> > > > > https://github.com/apache/arrow/pull/5211
> > > > >
> > > > > I'm going to create an integration branch where we can merge each
> > > > > patch before merging to master
> > > > >
> > > > > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney 
> > wrote:
> > > > > >
> > > > > > It isn't implemented in C++ yet but I will try to get a patch up
> > > > > > for that soon (today maybe). I think we should create a branch
> > > > > > where we can stack the patches that implement this for each
> > language.
> > > > > >
> > > > > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor
> > > > > > 
> > > > > wrote:
> > > > > > >
> > > > > > > I'll do the JS updates. Is it safe to validate against the
> Arrow
> > > > > > > C++ integration tests?
> > > > > > >
> > > > > > >
> > > > > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > > > > I created
> > > > > > > > https://issues.apache.org/jira/browse/ARROW-6313 as a
> > > > > tracking
> > > > > > > > issue with sub-issues on the development work.  So far no-one
> > > > > > > > has
> > > > > claimed
> > > > > > > > Java and Javascript tasks.
> > > > > > > >
> > > > > > > > Would it make sense to have a separate dev branch for this
> > work?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Micah
> > > > > > > >
> > > > > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney
> > > > > > > > 
> > > > > wrote:
> > > > > > > >
> > > > > > > >> The vote carries with 4 binding +1 votes and 1 non-binding
> +1
> > > > > > > >>
> > > > > > > >> I'll merge the specification patch later today and we can
> > > > > > > >> begin working on implementations so we can get this done for
> > > > > > > >> 0.15.0
> > > > > > > >>

Re: [ANNOUNCE] New committers: Ben Kietzman, Kenta Murata, and Neal Richardson

2019-09-06 Thread Bryan Cutler
Congrats Ben, Kenta and Neal!

On Fri, Sep 6, 2019, 12:15 PM Krisztián Szűcs 
wrote:

> Congratulations!
>
> On Fri, Sep 6, 2019 at 8:12 PM Ben Kietzman 
> wrote:
>
> > Thanks!
> >
> > On Fri, Sep 6, 2019 at 1:09 PM Micah Kornfield 
> > wrote:
> >
> > > Congrats everyone! (apologies if I double sent this).
> > >
> > > On Fri, Sep 6, 2019 at 10:06 AM Neal Richardson <
> > > neal.p.richard...@gmail.com>
> > > wrote:
> > >
> > > > Thanks, y'all!
> > > >
> > > > On Fri, Sep 6, 2019 at 5:44 AM David Li 
> wrote:
> > > > >
> > > > > Congrats all! :)
> > > > >
> > > > > Best,
> > > > > David
> > > > >
> > > > > On 9/6/19, Francois Saint-Jacques  wrote:
> > > > > > Congrats to everyone!
> > > > > >
> > > > > > François
> > > > > >
> > > > > > On Fri, Sep 6, 2019 at 4:34 AM Kenta Murata 
> wrote:
> > > > > >>
> > > > > >> Thank you very much everyone!
> > > > > >> I'm very happy to join this community.
> > > > > >>
> > > > > >> Sep 6, 2019 (Fri) 12:39 Micah Kornfield :
> > > > > >>
> > > > > >> >
> > > > > >> > Congrats everyone.
> > > > > >> >
> > > > > >> > On Thu, Sep 5, 2019 at 7:06 PM Ji Liu
> >  > > >
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > Congratulations!
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Ji Liu
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >
> > > --
> > > > > >> > > From:Fan Liya 
> > > > > >> > > Send Time:Sep 6, 2019 (Friday) 09:28
> > > > > >> > > To:dev 
> > > > > >> > > Subject:Re: [ANNOUNCE] New committers: Ben Kietzman, Kenta
> > > Murata,
> > > > > >> > > and
> > > > > >> > > Neal Richardson
> > > > > >> > >
> > > > > >> > > Big congratulations to Ben, Kenta and Neal!
> > > > > >> > >
> > > > > >> > > Best,
> > > > > >> > > Liya Fan
> > > > > >> > >
> > > > > >> > > On Fri, Sep 6, 2019 at 5:33 AM Wes McKinney <
> > > wesmck...@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > hi all,
> > > > > >> > > >
> > > > > >> > > > on behalf of the Arrow PMC, I'm pleased to announce that
> > Ben,
> > > > > >> > > > Kenta,
> > > > > >> > > > and Neal have accepted invitations to become Arrow
> > committers.
> > > > > >> > > > Welcome
> > > > > >> > > > and thank you for all your contributions!
> > > > > >> > > >
> > > > > >> > >
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Kenta Murata
> > > > > >> OpenPGP FP = 1D69 ADDE 081C 9CC2 2E54  98C1 CEFE 8AFB 6081 B062
> > > > > >>
> > > > > >> I wrote a book!!
> > > > > >> 『Ruby 逆引きレシピ』 http://www.amazon.co.jp/dp/4798119881/mrkn-22
> > > > > >>
> > > > > >> E-mail: m...@mrkn.jp
> > > > > >> twitter: http://twitter.com/mrkn/
> > > > > >> blog: http://d.hatena.ne.jp/mrkn/
> > > > > >
> > > >
> > >
> >
>


[jira] [Created] (ARROW-6461) [Java] EchoServer can close socket before client has finished reading

2019-09-04 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6461:
---

 Summary: [Java] EchoServer can close socket before client has 
finished reading
 Key: ARROW-6461
 URL: https://issues.apache.org/jira/browse/ARROW-6461
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Bryan Cutler
 Fix For: 0.15.0


When the EchoServer finishes running the client connection, the socket is 
closed immediately. This causes a race condition and the client will fail with a
{noformat}
 SocketException: connection reset {noformat}
if it has not read all of the echoed batches.

This was consistently happening with the fix for ARROW-6315



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
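The race described in this issue can be avoided by having the server wait for the client's EOF before closing its end. Below is a minimal plain-socket sketch in Python of that pattern — an illustration of the fix's idea, not the actual Java EchoServer code:

```python
import socket
import threading

def echo_server(listener):
    """Echo bytes back, then wait for the client's EOF before closing.

    Closing right after the final send can reset the connection while the
    client is still draining its receive buffer -- the race described above.
    Waiting for recv() to return b"" (the client closed its side) guarantees
    the client has finished reading the echoed data.
    """
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)               # echo the batch back
        conn.shutdown(socket.SHUT_WR)    # signal we are done writing
        while conn.recv(1024):           # drain until the client closes
            pass

listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
t = threading.Thread(target=echo_server, args=(listener,))
t.start()

chunks = []
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"batch-0")
    while True:                          # read until the server's EOF
        chunk = client.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
t.join()
listener.close()
received = b"".join(chunks)
assert received == b"batch-0"
```

The key design point is the order of teardown: the server signals end-of-output with `shutdown(SHUT_WR)` but only calls `close()` after observing the client's own close, so the client can never hit a connection reset mid-read.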


Re: [ANNOUNCE] New Arrow committer: David M Li

2019-08-30 Thread Bryan Cutler
Congrats David!

On Fri, Aug 30, 2019 at 10:19 AM Antoine Pitrou  wrote:

>
> Congratulations David and welcome to the team  :-)
>
> Regards
>
> Antoine.
>
>
> > On 30/08/2019 at 18:21, Wes McKinney wrote:
> > On behalf of the Arrow PMC I'm happy to announce that David has
> > accepted an invitation to become an Arrow committer!
> >
> > Welcome, and thank you for your contributions!
> >
>


Re: [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-08-20 Thread Bryan Cutler
+1 (non-binding)

On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou  wrote:

>
> Sorry, had forgotten to send my vote on this.
>
> +1 from me.
>
> Regards
>
> Antoine.
>
>
> On Wed, 14 Aug 2019 17:42:33 -0500
> Wes McKinney  wrote:
> > hi all,
> >
> > As we've been discussing [1], there is a need to introduce 4 bytes of
> > padding into the preamble of the "encapsulated IPC message" format to
> > ensure that the Flatbuffers metadata payload begins on an 8-byte
> > aligned memory offset. The alternative to this would be for Arrow
> > implementations where alignment is important (e.g. C or C++) to copy
> > the metadata (which is not always small) into memory when it is
> > unaligned.
> >
> > Micah has proposed to address this by adding a
> > 4-byte "continuation" value at the beginning of the payload
> > having the value 0xFFFFFFFF. The reason to do it this way is that
> > old clients will see an invalid length (what is currently the
> > first 4 bytes of the message -- a 32-bit little endian signed
> > integer indicating the metadata length) rather than potentially
> > crashing on a valid length. We also propose to expand the "end of
> > stream" marker used in the stream and file format from 4 to 8
> > bytes. This has the additional effect of aligning the file footer
> > defined in File.fbs.
> >
> > This would be a backwards incompatible protocol change, so older Arrow
> > libraries would not be able to read these new messages. Maintaining
> > forward compatibility (reading data produced by older libraries) would
> > be possible as we can reason that a value other than the continuation
> > value was produced by an older library (and then validate the
> > Flatbuffer message of course). Arrow implementations could offer a
> > backward compatibility mode for the sake of old readers if they desire
> > (this may also assist with testing).
> >
> > Additionally with this vote, we want to formally approve the change to
> > the Arrow "file" format to always write the (new 8-byte) end-of-stream
> > marker, which enables code that processes Arrow streams to safely read
> > the file's internal messages as though they were a normal stream.
> >
> > The PR making these changes to the IPC documentation is here
> >
> > https://github.com/apache/arrow/pull/4951
> >
> > Please vote to accept these changes. This vote will be open for at
> > least 72 hours
> >
> > [ ] +1 Adopt these Arrow protocol changes
> > [ ] +0
> > [ ] -1 I disagree because...
> >
> > Here is my vote: +1
> >
> > Thanks,
> > Wes
> >
> > [1]:
> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> >
>
>
>
>
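For illustration, the framing change adopted in this vote can be sketched in a few lines of Python. This is a toy model under the assumptions stated in the proposal (a 4-byte 0xFFFFFFFF continuation value, an int32 metadata size, metadata padded to an 8-byte boundary), not Arrow's actual writer:

```python
import struct

# Old readers interpret this as an invalid (negative) length and stop,
# rather than crashing on a plausible-looking length.
CONTINUATION = 0xFFFFFFFF

def frame_message(metadata: bytes, body: bytes) -> bytes:
    """Frame one encapsulated message starting on an 8-byte boundary.

    Layout: <0xFFFFFFFF> <int32 metadata size> <padded metadata> <body>.
    The 8-byte prefix means the flatbuffer metadata itself starts 8-byte
    aligned, and padding keeps the body aligned as well.
    """
    pad = (8 - len(metadata) % 8) % 8
    padded = metadata + b"\x00" * pad
    return struct.pack("<II", CONTINUATION, len(padded)) + padded + body

msg = frame_message(b"\x01" * 10, b"\x02" * 8)
assert len(msg) % 8 == 0
# Metadata begins at offset 8 (aligned); the old 4-byte prefix left it at 4.
# An old reader interpreting the first word as a signed length sees -1:
assert struct.unpack("<i", msg[:4])[0] == -1
```

With the old framing, the metadata began 4 bytes into an 8-byte-aligned message, which is exactly the misalignment the vote addresses.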


Re: [VOTE] Proposed addition to Arrow Flight Protocol

2019-08-16 Thread Bryan Cutler
+1 (non-binding)

On Fri, Aug 16, 2019, 8:36 AM Micah Kornfield  wrote:

> My vote +1 (binding)
>
> On Friday, August 16, 2019, David Li  wrote:
>
> > +1 (non-binding)
> >
> > Thanks Ryan for working on this!
> >
> > Best,
> > David
> >
> > On 8/16/19, Micah Kornfield  wrote:
> > > Hello,
> > > Ryan Murray has proposed adding a GetFlightSchema RPC [1] to the Arrow
> > > Flight Protocol [2].  The purpose of this RPC is to allow decoupling
> > schema
> > > and endpoint retrieval as provided by the GetFlightInfo RPC.  The new
> > > definition provided is:
> > >
> > > message SchemaResult {
> > >   // Serialized Flatbuffer Schema message.
> > >   bytes schema = 1;
> > > }
> > > rpc GetSchema(FlightDescriptor) returns (SchemaResult) {}
> > >
> > > Ryan has also provided a PR demonstrating implementation of the new RPC
> > [3]
> > > in Java, C++ and Python which can be reviewed and merged after this
> > > addition is approved.
> > >
> > > Please vote whether to accept the addition. The vote will be open for
> at
> > > least 72 hours.
> > >
> > > [ ] +1 Accept this addition to the Flight protocol
> > > [ ] +0
> > > [ ] -1 Do not accept the changes because...
> > >
> > >
> > > Thanks,
> > > Micah
> > >
> > > [1]
> > > https://docs.google.com/document/d/1zLdFYikk3owbKpHvJrARLMlmYpi-
> > Ef6OJy7H90MqViA/edit
> > > [2] https://github.com/apache/arrow/blob/master/format/Flight.proto
> > > [3] https://github.com/apache/arrow/pull/4980
> > >
> >
>
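For context, the approved method sits alongside the existing Flight RPCs roughly as follows. This is a sketch of the .proto shape: only `GetSchema` and `SchemaResult` are the new additions quoted above; the `FlightService`/`GetFlightInfo` surroundings are paraphrased from the existing protocol:

```proto
service FlightService {
  // Existing RPC: returns the schema plus endpoints for retrieval.
  rpc GetFlightInfo(FlightDescriptor) returns (FlightInfo) {}

  // New RPC: returns only the schema, decoupled from endpoint retrieval.
  rpc GetSchema(FlightDescriptor) returns (SchemaResult) {}
}

message SchemaResult {
  // Serialized Flatbuffer Schema message.
  bytes schema = 1;
}
```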


[jira] [Created] (ARROW-6215) [Java] RangeEqualVisitor does not properly compare ZeroVector

2019-08-12 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-6215:
---

 Summary: [Java] RangeEqualVisitor does not properly compare 
ZeroVector
 Key: ARROW-6215
 URL: https://issues.apache.org/jira/browse/ARROW-6215
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


ZeroVector.accept with a RangeEqualVisitor always returns true, no matter what 
type of vector it is compared against



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS] Add GetFlightSchema to Flight RPC

2019-08-01 Thread Bryan Cutler
Sounds good to me, I would just echo what others have said.

On Thu, Aug 1, 2019 at 8:17 AM Ryan Murray  wrote:

> Thanks Wes,
>
> The descriptor is only there to maintain a bit of symmetry with
> GetFlightInfo. Happy to remove it; I don't think it's necessary, and a few
> people already agree. Likewise with the method name, I am neutral on the
> naming and can call it whatever the community is happy with.
>
> Best,
> Ryan
>
> On Thu, Aug 1, 2019 at 3:56 PM Wes McKinney  wrote:
>
> > I'm generally supportive of adding the new RPC endpoint.
> >
> > To make a couple points from the document
> >
> > * I'm not sure what the purpose of returning the FlightDescriptor is,
> > but I haven't thought too hard about it
> > * The Schema consists of a single IPC message -- dictionaries will
> > appear in the actual DoGet stream. To motivate why this is --
> > different endpoints might have different dictionaries corresponding to
> > fields in the schema, to have static/constant dictionaries in a
> > distributed Flight setting is likely to be impractical. I summarize
> > the issue as "dictionaries are data, not metadata".
> > * I would be OK calling this GetSchema instead of GetFlightSchema but
> > either is okay
> >
> > - Wes
> >
> > On Thu, Aug 1, 2019 at 8:08 AM David Li  wrote:
> > >
> > > Hi Ryan,
> > >
> > > Thanks for writing this up! I made a couple of minor comments in the
> > > doc/implementation, but overall I'm in favor of having this RPC
> > > method.
> > >
> > > Best,
> > > David
> > >
> > > On 8/1/19, Ryan Murray  wrote:
> > > > Hi All,
> > > >
> > > > Please see the attached document for a proposed addition to the
> Flight
> > > > RPC[1]. This is the result of a previous mailing list discussion[2].
> > > >
> > > > I have created the Pull Request[3] to make the proposal a little more
> > > > concrete.
> > > > 
> > > > Please let me know if you have any questions or concerns.
> > > >
> > > > Best,
> > > > Ryan
> > > >
> > > > [1]:
> > > >
> >
> https://docs.google.com/document/d/1zLdFYikk3owbKpHvJrARLMlmYpi-Ef6OJy7H90MqViA/edit?usp=sharing
> > > > [2]:
> > > >
> >
> https://lists.apache.org/thread.html/3539984493cf3d4d439bef25c150fa9e09e0b43ce0afb6be378d41df@%3Cdev.arrow.apache.org%3E
> > > > [3]: https://github.com/apache/arrow/pull/4980
> > > >
> >
>
>
> --
>
> Ryan Murray  | Principal Consulting Engineer
>
> +447540852009 | rym...@dremio.com
>
>


Re: [VOTE] Adopt FORMAT and LIBRARY SemVer-based version schemes for Arrow 1.0.0 and beyond

2019-07-31 Thread Bryan Cutler
+1 (non-binding)

On Wed, Jul 31, 2019 at 8:59 AM Uwe L. Korn  wrote:

> +1 from me.
>
> I really like the separate versions
>
> Uwe
>
> On Tue, Jul 30, 2019, at 2:21 PM, Antoine Pitrou wrote:
> >
> > +1 from me.
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
> > On Fri, 26 Jul 2019 14:33:30 -0500
> > Wes McKinney  wrote:
> > > hello,
> > >
> > > As discussed on the mailing list thread [1], Micah Kornfield has
> > > proposed a version scheme for the project to take effect starting with
> > > the 1.0.0 release. See document [2] containing a discussion of the
> > > issues involved.
> > >
> > > To summarize my understanding of the plan:
> > >
> > > 1. TWO VERSIONS: As of 1.0.0, we establish separate FORMAT and LIBRARY
> > > versions. Currently there is only a single version number.
> > >
> > > 2. SEMANTIC VERSIONING: We follow https://semver.org/ with regards to
> > > communicating library API changes. Given the project's pace of
> > > evolution, most releases are likely to be MAJOR releases according to
> > > SemVer principles.
> > >
> > > 3. RELEASES: Releases of the project will be named according to the
> > > LIBRARY version. A major release may or may not change the FORMAT
> > > version. When a LIBRARY version has been released for a new FORMAT
> > > version, the latter is considered to be released and official.
> > >
> > > 4. Each LIBRARY version will have a corresponding FORMAT version. For
> > > example, LIBRARY versions 2.0.0 and 3.0.0 may track FORMAT version
> > > 1.0.0. The idea is that FORMAT version will change less often than
> > > LIBRARY version.
> > >
> > > 5. BACKWARD COMPATIBILITY GUARANTEE: A newer versioned client library
> > > will be able to read any data and metadata produced by an older client
> > > library.
> > >
> > > 6. FORWARD COMPATIBILITY GUARANTEE: An older client library must be
> > > able to either read data generated from a new client library or detect
> > > that it cannot properly read the data.
> > >
> > > 7. FORMAT MINOR VERSIONS: An increase in the minor version of the
> > > FORMAT version, such as 1.0.0 to 1.1.0, indicates that 1.1.0 contains
> > > new features not available in 1.0.0. So long as these features are not
> > > used (such as a new logical data type), forward compatibility is
> > > preserved.
> > >
> > > 8. FORMAT MAJOR VERSIONS: A change in the FORMAT major version
> > > indicates a disruption to these compatibility guarantees in some way.
> > > Hopefully we don't have to do this many times in our respective
> > > lifetimes
> > >
> > > If I've misrepresented some aspect of the proposal it's fine to
> > > discuss more and we can start a new vote.
> > >
> > > Please vote to approve this proposal. I'd like to keep this vote open
> > > for 7 days (until Friday August 2) to allow for ample opportunities
> > > for the community to have a look.
> > >
> > > [ ] +1 Adopt these version conventions and compatibility guarantees as
> > > of Apache Arrow 1.0.0
> > > [ ] +0
> > > [ ] -1 I disagree because...
> > >
> > > Here is my vote: +1
> > >
> > > Thanks
> > > Wes
> > >
> > > [1]:
> https://lists.apache.org/thread.html/5715a4d402c835d22d929a8069c5c0cf232077a660ee98639d544af8@%3Cdev.arrow.apache.org%3E
> > > [2]:
> https://docs.google.com/document/d/1uBitWu57rDu85tNHn0NwstAbrlYqor9dPFg_7QaE-nc/edit#
> > >
> >
> >
> >
> >
>
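The compatibility guarantees in points 5-8 can be made concrete with a toy classifier. This assumes FORMAT versions are (major, minor) pairs and reflects one reading of the proposal, not project code:

```python
def reader_outcome(reader, data):
    """Classify what a reader at FORMAT version `reader` can do with data
    written at FORMAT version `data` (both (major, minor) tuples).

    A toy reading of rules 5-8 above: backward compatibility always holds
    for a newer reader; an older reader facing a newer minor only fails if
    new features are actually used; a newer major disrupts the guarantees.
    """
    if reader >= data:
        return "reads fully"                          # rule 5
    if reader[0] == data[0]:
        return "reads unless new features are used"   # rules 6-7
    return "must detect and reject"                   # rules 6, 8

assert reader_outcome((1, 1), (1, 0)) == "reads fully"
assert reader_outcome((1, 0), (1, 1)) == "reads unless new features are used"
assert reader_outcome((1, 0), (2, 0)) == "must detect and reject"
```

Note this models only the FORMAT version; under the proposal, LIBRARY versions move independently (e.g. libraries 2.0.0 and 3.0.0 may both track FORMAT 1.0.0).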


Re: [Discuss] Do a 0.15.0 release before 1.0.0?

2019-07-24 Thread Bryan Cutler
+1 on a 0.15.0 release. At the minimum, if we could detect the stream and
provide a clear error message for Python and Java I think that would help
the transition. If we are also able to implement readers/writers that can
fallback to 4-byte prefix, then that would be nice to have.

On Wed, Jul 24, 2019 at 1:27 PM Jacques Nadeau  wrote:

> I'm ok with the change and 0.15 release to better manage it.
>
>
> > I've always understood the metadata to be a few dozen/hundred KB, a
> > small percentage of the total message size. I could be underestimating
> > the ratios though -- is it common to have tables w/ 1000+ columns? I've
> > seen a few reports like that in cuDF, but I'm curious to hear
> > Jacques'/Dremio's experience too.
> >
>
> Metadata size has been an issue at different points for us. We do
> definitely see datasets with 1000+ columns. It is also compounded by the
> fact that as we add more columns, we typically decrease row count so that
> the individual batches are still easily pipelined--which further increases
> the relative ratio between data and metadata.
>


Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-18 Thread Bryan Cutler
Hey Wes,
I understand we don't want to burden 1.0 by maintaining compatibility and
that is fine with me. I'm just trying to figure out how best to handle this
situation so Spark users won't get a cryptic error message. It sounds like
it will need to be handled on the Spark side to not allow mixing 1.0 and
pre-1.0 versions. I'm not too sure how much a 0.15.0 release with
compatibility would help, it might depend on when things get released but
we can discuss that in another thread.

On Thu, Jul 18, 2019 at 12:03 PM Wes McKinney  wrote:

> hi Bryan -- well, the reason for the current 0.x version is precisely
> to avoid a situation where we are making decisions on the basis of
> maintaining forward / backward compatibility.
>
> One possible way forward on this is to make a 0.15.0 (or 0.14.2, so there
> is less trouble for Spark to upgrade) release that supports reading
> _both_ old and new variants of the protocol.
>
> On Thu, Jul 18, 2019 at 1:20 PM Bryan Cutler  wrote:
> >
> > Are we going to say that Arrow 1.0 is not compatible with any version
> > before?  My concern is that Spark 2.4.x might get stuck on Arrow Java
> > 0.14.1 and a lot of users will install PyArrow 1.0.0, which will not
> work.
> > In Spark 3.0.0, though it will be no problem to update both Java and
> Python
> > to 1.0. Having a compatibility mode so that new readers/writers can work
> > with old readers using a 4-byte prefix would solve the problem, but if we
> > don't want to do this will pyarrow be able to raise an error that clearly
> > the new version does not support the old protocol?  For example, would a
> > > pyarrow reader see the 0xFFFFFFFF and raise something like "PyArrow
> > detected an old protocol and cannot continue, please use a version <
> 1.0.0"?
> >
> > On Thu, Jul 11, 2019 at 12:39 PM Wes McKinney 
> wrote:
> >
> > > Hi Francois -- copying the metadata into memory isn't the end of the
> world
> > > but it's a pretty ugly wart. This affects every IPC protocol message
> > > everywhere.
> > >
> > > We have an opportunity to address the wart now but such a fix
> post-1.0.0
> > > will be much more difficult.
> > >
> > > On Thu, Jul 11, 2019, 2:05 PM Francois Saint-Jacques <
> > > fsaintjacq...@gmail.com> wrote:
> > >
> > > > If the data buffers are still aligned, then I don't think we should
> > > > add a breaking change just for avoiding the copy on the metadata? I'd
> > > > expect said metadata to be small enough that zero-copy doesn't really
> > > > affect performance.
> > > >
> > > > François
> > > >
> > > > On Sun, Jun 30, 2019 at 4:01 AM Micah Kornfield <
> emkornfi...@gmail.com>
> > > > wrote:
> > > > >
> > > > > While working on trying to fix undefined behavior for unaligned
> memory
> > > > > accesses [1], I ran into an issue with the IPC specification [2]
> which
> > > > > prevents us from ever achieving zero-copy memory mapping and having
> > > > aligned
> > > > > accesses (i.e. clean UBSan runs).
> > > > >
> > > > > Flatbuffer metadata needs 8-byte alignment to guarantee aligned
> > > accesses.
> > > > >
> > > > > In the IPC format we align each message to 8-byte boundaries.  We
> then
> > > > > write an int32_t integer to denote the size of the flatbuffer
> metadata,
> > > > > followed immediately by the flatbuffer metadata.  This means the
> > > > > flatbuffer metadata will never be 8 byte aligned.
> > > > >
> > > > > Do people care?  A simple fix  would be to use int64_t instead of
> > > int32_t
> > > > > for length.  However, any fix essentially breaks all previous
> client
> > > > > library versions or incurs a memory copy.
> > > > >
> > > > > [1] https://github.com/apache/arrow/pull/4757
> > > > > [2] https://arrow.apache.org/docs/ipc.html
> > > >
> > >
>


Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

2019-07-18 Thread Bryan Cutler
Are we going to say that Arrow 1.0 is not compatible with any version
before?  My concern is that Spark 2.4.x might get stuck on Arrow Java
0.14.1 and a lot of users will install PyArrow 1.0.0, which will not work.
In Spark 3.0.0, though it will be no problem to update both Java and Python
to 1.0. Having a compatibility mode so that new readers/writers can work
with old readers using a 4-byte prefix would solve the problem, but if we
don't want to do this will pyarrow be able to raise an error that clearly
the new version does not support the old protocol?  For example, would a
pyarrow reader see the 0xFFFFFFFF and raise something like "PyArrow
detected an old protocol and cannot continue, please use a version < 1.0.0"?

On Thu, Jul 11, 2019 at 12:39 PM Wes McKinney  wrote:

> Hi Francois -- copying the metadata into memory isn't the end of the world
> but it's a pretty ugly wart. This affects every IPC protocol message
> everywhere.
>
> We have an opportunity to address the wart now but such a fix post-1.0.0
> will be much more difficult.
>
> On Thu, Jul 11, 2019, 2:05 PM Francois Saint-Jacques <
> fsaintjacq...@gmail.com> wrote:
>
> > If the data buffers are still aligned, then I don't think we should
> > add a breaking change just for avoiding the copy on the metadata? I'd
> > expect said metadata to be small enough that zero-copy doesn't really
> > affect performance.
> >
> > François
> >
> > On Sun, Jun 30, 2019 at 4:01 AM Micah Kornfield 
> > wrote:
> > >
> > > While working on trying to fix undefined behavior for unaligned memory
> > > accesses [1], I ran into an issue with the IPC specification [2] which
> > > prevents us from ever achieving zero-copy memory mapping and having
> > aligned
> > > accesses (i.e. clean UBSan runs).
> > >
> > > Flatbuffer metadata needs 8-byte alignment to guarantee aligned
> accesses.
> > >
> > > In the IPC format we align each message to 8-byte boundaries.  We then
> > > write an int32_t integer to denote the size of the flatbuffer metadata,
> > > followed immediately by the flatbuffer metadata.  This means the
> > > flatbuffer metadata will never be 8 byte aligned.
> > >
> > > Do people care?  A simple fix would be to use int64_t instead of
> int32_t
> > > for length.  However, any fix essentially breaks all previous client
> > > library versions or incurs a memory copy.
> > >
> > > [1] https://github.com/apache/arrow/pull/4757
> > > [2] https://arrow.apache.org/docs/ipc.html
> >
>
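The detection Bryan asks about falls out of the new framing naturally: a reader can peek at the first four bytes and branch. A hypothetical sketch (the function name and return values are illustrative, not pyarrow's API):

```python
import struct

CONTINUATION = b"\xff\xff\xff\xff"

def classify_prefix(prefix: bytes) -> str:
    """Classify an IPC message by its first 4 bytes.

    New-protocol writers emit the 0xFFFFFFFF continuation marker first; old
    writers emit the int32 metadata length directly, which is non-negative.
    """
    if len(prefix) < 4:
        raise ValueError("need at least 4 bytes")
    if prefix[:4] == CONTINUATION:
        return "new protocol"
    (length,) = struct.unpack("<i", prefix[:4])
    if length >= 0:
        # Here a post-change reader could raise the clear error suggested
        # above, e.g. "stream uses the old protocol, please use an older
        # library version".
        return "old protocol"
    raise ValueError("not an Arrow IPC stream")

assert classify_prefix(b"\xff\xff\xff\xff\x10\x00\x00\x00") == "new protocol"
assert classify_prefix(b"\x10\x00\x00\x00") == "old protocol"
```

This is the forward-compatibility path discussed in the thread: old data remains detectable because a non-negative first word cannot be the continuation marker.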

