[jira] [Created] (ARROW-7714) [Release] Variable expansion is missing

2020-01-28 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-7714:
---

 Summary: [Release] Variable expansion is missing
 Key: ARROW-7714
 URL: https://issues.apache.org/jira/browse/ARROW-7714
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7713) [Java] TestLeak was put at the wrong location

2020-01-28 Thread Ji Liu (Jira)
Ji Liu created ARROW-7713:
-

 Summary: [Java] TestLeak was put at the wrong location
 Key: ARROW-7713
 URL: https://issues.apache.org/jira/browse/ARROW-7713
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Ji Liu
Assignee: Ji Liu


It seems {{TestLeak.java}} was put in the wrong place; we should move it into 
{{flight-core}}.





[jira] [Created] (ARROW-7712) [CI][Crossbow] Fix or delete fuzzit jobs

2020-01-28 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7712:
--

 Summary: [CI][Crossbow] Fix or delete fuzzit jobs
 Key: ARROW-7712
 URL: https://issues.apache.org/jira/browse/ARROW-7712
 Project: Apache Arrow
  Issue Type: Task
  Components: C++, Continuous Integration
Reporter: Neal Richardson


Not sure we need them now that we're using the OSS-Fuzz project, but they're 
broken. 





[jira] [Created] (ARROW-7711) [C#] Date32 test depends on system timezone

2020-01-28 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-7711:
---

 Summary: [C#] Date32 test depends on system timezone
 Key: ARROW-7711
 URL: https://issues.apache.org/jira/browse/ARROW-7711
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


The following failure occurred on 2020-01-29:08:47:33+09:00:

{noformat}
Starting test execution, please wait...
[xUnit.net 00:00:00.53] Apache.Arrow.Tests.Date32ArrayTests+Set.SetAndGet [FAIL]
  X Apache.Arrow.Tests.Date32ArrayTests+Set.SetAndGet [19ms]
  Error Message:
   Assert.Equal() Failure
Expected: 2020-01-28T00:00:00.000
Actual:   2020-01-27T00:00:00.000
  Stack Trace:
 at Apache.Arrow.Tests.Date32ArrayTests.Set.SetAndGet() in /tmp/arrow-0.16.0.mrKfP/apache-arrow-0.16.0/csharp/test/Apache.Arrow.Tests/Date32ArrayTests.cs:line 38
{noformat}
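
One way a test like this can become timezone dependent, sketched in Python rather than C#: Date32 stores a count of days since the UNIX epoch, so if a midnight value is interpreted as local time and converted to UTC before the day count is taken, any zone ahead of UTC lands on the previous day. The helper names {{to_date32}} and {{from_date32}} are hypothetical and are not the Apache.Arrow API, and whether this is the actual cause of the failure above is an assumption.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_date32(value: datetime) -> int:
    """Hypothetical helper: whole days since the UNIX epoch, counted in UTC."""
    return (value.astimezone(timezone.utc) - EPOCH).days


def from_date32(days: int) -> datetime:
    """Hypothetical helper: midnight of the stored day as a naive datetime."""
    return (EPOCH + timedelta(days=days)).replace(tzinfo=None)


midnight = datetime(2020, 1, 28)
jst = timezone(timedelta(hours=9))  # a +09:00 zone, as in the report

# Midnight read as local time in +09:00 is 15:00 UTC on the previous day.
shifted = from_date32(to_date32(midnight.replace(tzinfo=jst)))
# The same midnight read as UTC round-trips unchanged.
stable = from_date32(to_date32(midnight.replace(tzinfo=timezone.utc)))

print(shifted.isoformat(), stable.isoformat())
```

Pinning the value to UTC on both sides of the round trip is one way to make such a test independent of the machine's zone.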





[jira] [Created] (ARROW-7710) [Release][C#] .NET download URL is redirected

2020-01-28 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-7710:
---

 Summary: [Release][C#] .NET download URL is redirected
 Key: ARROW-7710
 URL: https://issues.apache.org/jira/browse/ARROW-7710
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


https://gist.github.com/pitrou/5c4a98387153ef415ef64b8aa2457e63

{noformat}

++ curl https://dotnet.microsoft.com/download/thank-you/dotnet-sdk-2.2.300-linux-x64-binaries
++ grep 'window\.open'
++ grep -E -o '[^"]+'
++ sed -n 2p
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
+ local dotnet_download_url=
+ curl
+ tar xzf - -C /tmp/arrow-0.16.0.iRp8b/apache-arrow-0.16.0/csharp/bin
curl: try 'curl --help' or 'curl --manual' for more information

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
{noformat}
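
In the trace above the extracted {{dotnet_download_url}} is empty, so the second curl runs with no URL and tar receives no data. As a minimal sketch, assuming the download page embeds the link in a {{window.open("...")}} call, the same extraction can be written so that it fails early when nothing is found. The function name {{extract_download_url}} and the example.invalid URL are hypothetical.

```python
import re


def extract_download_url(html: str) -> str:
    """Return the second quote-delimited piece on the window.open lines.

    This mirrors the grep/grep -o/sed pipeline in the trace, but raises
    instead of silently yielding an empty string.
    """
    pieces = []
    for line in html.splitlines():
        if "window.open" in line:
            # Like grep -o '[^"]+': every run of characters between quotes.
            pieces.extend(re.findall(r'[^"]+', line))
    if len(pieces) < 2:
        raise ValueError("no download URL found; was the page redirected?")
    return pieces[1]


# Hypothetical page fragment, not copied from the real site.
page = 'window.open("https://example.invalid/dotnet-sdk.tar.gz", "_self");'
print(extract_download_url(page))
```

If the page is now served through a redirect, letting curl follow it with {{-L}} may also be needed; that is a guess based on the issue title, not a confirmed fix.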





Re: [VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-28 Thread Bryan Cutler
The nightly Spark integration was failing because a test was renamed
recently in master. Once I fixed that and ran it again, this surfaced. I
remember it passed after the `split_blocks` change not too long ago, so all
this is pretty recent.

On Tue, Jan 28, 2020 at 2:46 PM Wes McKinney  wrote:

> Bryan -- was this tested somewhere that we missed (eg a nightly)?

Re: [VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-28 Thread Wes McKinney
Bryan -- was this tested somewhere that we missed (eg a nightly)?

On Tue, Jan 28, 2020, 4:31 PM Bryan Cutler  wrote:

> -1
> There is a bug in Pandas conversion for timestamps that looks to be a
> regression, https://issues.apache.org/jira/browse/ARROW-7709

Re: [VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-28 Thread Bryan Cutler
-1
There is a bug in Pandas conversion for timestamps that looks to be a
regression, https://issues.apache.org/jira/browse/ARROW-7709

On Tue, Jan 28, 2020 at 11:30 AM Wes McKinney  wrote:

> I opened https://issues.apache.org/jira/browse/ARROW-7708.


[jira] [Created] (ARROW-7709) [Python] Conversion from Table Column to Pandas loses name for Timestamps

2020-01-28 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7709:
---

 Summary: [Python] Conversion from Table Column to Pandas loses 
name for Timestamps
 Key: ARROW-7709
 URL: https://issues.apache.org/jira/browse/ARROW-7709
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Bryan Cutler


When converting a Table timestamp column to Pandas, the name of the column is 
lost in the resulting series.
{code:python}
In [23]: a1 = pa.array([pd.Timestamp.now()])

In [24]: a2 = pa.array([1])

In [25]: t = pa.Table.from_arrays([a1, a2], ['ts', 'a'])

In [26]: for c in t:
    ...:     print(c.to_pandas())
    ...:
0   2020-01-28 13:17:26.738708
dtype: datetime64[ns]
0    1
Name: a, dtype: int64
{code}





Re: [VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-28 Thread Wes McKinney
I opened https://issues.apache.org/jira/browse/ARROW-7708.

On Tue, Jan 28, 2020 at 1:24 PM Wes McKinney  wrote:
>
> Hi Gawain -- since PARQUET issues are attached to a different project and fix 
> version these have to be extracted from the git changelog. We can alter our 
> scripts to scrape these commits from the git log output.


[jira] [Created] (ARROW-7708) [Release] Include PARQUET commits from git changelog in release changelogs

2020-01-28 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7708:
---

 Summary: [Release] Include PARQUET commits from git changelog in 
release changelogs
 Key: ARROW-7708
 URL: https://issues.apache.org/jira/browse/ARROW-7708
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Wes McKinney
 Fix For: 1.0.0








Re: [VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-28 Thread Wes McKinney
Hi Gawain -- since PARQUET issues are attached to a different project and
fix version these have to be extracted from the git changelog. We can alter
our scripts to scrape these commits from the git log output.
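
A minimal sketch of what such scraping could look like, assuming commit subjects start with the issue key (for example "PARQUET-1234: ..."). The function name and the sample subjects are hypothetical; this is not the project's actual release script.

```python
import re

# Issue keys such as "PARQUET-1234" at the start of a commit subject.
PARQUET_KEY = re.compile(r"^(PARQUET-\d+)\b")


def parquet_issues(subjects):
    """Return PARQUET issue keys in first-seen order, without duplicates."""
    keys = []
    for subject in subjects:
        match = PARQUET_KEY.match(subject.strip())
        if match and match.group(1) not in keys:
            keys.append(match.group(1))
    return keys


# Hypothetical subjects, not taken from the real Arrow history.
sample = [
    "ARROW-0001: [C++] Example change",
    "PARQUET-0002: [C++] Example Parquet fix",
    "PARQUET-0002: [C++] Follow-up to the same issue",
    "PARQUET-0003: [C++] Another example",
]
print(parquet_issues(sample))
```

The subjects could be taken from `git log --pretty=format:%s` over the commit range of the release.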

On Tue, Jan 28, 2020, 1:06 PM Gawain Bolton  wrote:

> Hello,
>
> It would seem that the list of issues does not include any of the issues
> in the Parquet project which were fixed in this release.
>
> Cheers,
>
> Gawain


Re: [VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-28 Thread Gawain Bolton

Hello,

It would seem that the list of issues does not include any of the issues 
in the Parquet project which were fixed in this release.


Cheers,

Gawain

On 28/01/2020 11:46, Krisztián Szűcs wrote:

Sorry, the previous email is hardly readable.

I would like to propose the following release candidate (RC1) of Apache
Arrow version 0.16.0. This is a release consisting of 710 resolved JIRA
issues[1].

This release candidate is based on commit:
188afde1f4298fb668e8ebadeacbc545e2de086f [2]

The source release rc1 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7].
The changelog is located at [8].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [9] for how to validate a release candidate.
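
As a rough illustration of the checksum part of that request only (signature verification is not covered, and the guide at [9] remains the reference), a SHA-256 check could be sketched as follows. It assumes the checksum file begins with the hex digest, as `sha256sum` writes it; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path) -> bool:
    """Assumption: the checksum file's first token is the expected hex digest."""
    expected = checksum_file.read_text().split()[0].lower()
    return sha256_of(artifact) == expected
```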

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 0.16.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow 0.16.0 because...

[1]: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.16.0
[2]: 
https://github.com/apache/arrow/tree/188afde1f4298fb668e8ebadeacbc545e2de086f
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.16.0-rc1
[4]: https://bintray.com/apache/arrow/centos-rc/0.16.0-rc1
[5]: https://bintray.com/apache/arrow/debian-rc/0.16.0-rc1
[6]: https://bintray.com/apache/arrow/python-rc/0.16.0-rc1
[7]: https://bintray.com/apache/arrow/ubuntu-rc/0.16.0-rc1
[8]: 
https://github.com/apache/arrow/blob/188afde1f4298fb668e8ebadeacbc545e2de086f/CHANGELOG.md
[9]: 
https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates



[jira] [Created] (ARROW-7707) Build errors after building from source

2020-01-28 Thread Hossein R (Jira)
Hossein R created ARROW-7707:


 Summary: Build errors after building from source
 Key: ARROW-7707
 URL: https://issues.apache.org/jira/browse/ARROW-7707
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.15.1
Reporter: Hossein R


Hi there, I built the library from source and am now trying to write Parquet 
files, mainly using the example code, but I am getting link errors that I 
cannot seem to fix.

This is how I built the library:

 

{{-> % cmake .. -DARROW_PARQUET:BOOL=ON -DPARQUET_BUILD_EXAMPLES:BOOL=ON}}
{{-- Building using CMake version: 3.16.2}}
{{-- The C compiler identification is GNU 5.4.0}}
{{-- The CXX compiler identification is GNU 5.4.0}}
{{-- Check for working C compiler: /usr/bin/cc}}
{{-- Check for working C compiler: /usr/bin/cc -- works}}
{{-- Detecting C compiler ABI info}}
{{-- Detecting C compiler ABI info - done}}
{{-- Detecting C compile features}}
{{-- Detecting C compile features - done}}
{{-- Check for working CXX compiler: /usr/bin/c++}}
{{-- Check for working CXX compiler: /usr/bin/c++ -- works}}
{{-- Detecting CXX compiler ABI info}}
{{-- Detecting CXX compiler ABI info - done}}
{{-- Detecting CXX compile features}}
{{-- Detecting CXX compile features - done}}
{{-- Arrow version: 0.15.1 (full: '0.15.1')}}
{{-- Arrow SO version: 15 (full: 15.1.0)}}
{{-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1")}}
{{-- clang-tidy not found}}
{{-- clang-format not found}}
{{-- infer not found}}
{{-- Found PythonInterp: /home/hosanez/env/bin/python (found version "2.7.12")}}
{{-- Found cpplint executable at /home/hosanez/scratch/apache-arrow-0.15.1/cpp/build-support/cpplint.py}}
{{-- Compiler command: env LANG=C /usr/bin/c++ -v}}
{{-- Compiler version: Using built-in specs.}}
{{COLLECT_GCC=/usr/bin/c++}}
{{COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper}}
{{Target: x86_64-linux-gnu}}
{{Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
5.4.0-6ubuntu1~16.04.12' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs 
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-5 --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib 
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo 
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home 
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar 
--enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib 
--with-tune=generic --enable-checking=release --build=x86_64-linux-gnu 
--host=x86_64-linux-gnu --target=x86_64-linux-gnu}}
{{Thread model: posix}}
{{gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12) }}
{{-- Compiler id: GNU}}
{{Selected compiler gcc 5.4.0}}
{{-- Performing Test CXX_SUPPORTS_SSE4_2}}
{{-- Performing Test CXX_SUPPORTS_SSE4_2 - Success}}
{{-- Performing Test CXX_SUPPORTS_ALTIVEC}}
{{-- Performing Test CXX_SUPPORTS_ALTIVEC - Failed}}
{{-- Performing Test CXX_SUPPORTS_ARMCRC}}
{{-- Performing Test CXX_SUPPORTS_ARMCRC - Failed}}
{{-- Performing Test CXX_SUPPORTS_ARMV8_CRC_CRYPTO}}
{{-- Performing Test CXX_SUPPORTS_ARMV8_CRC_CRYPTO - Failed}}
{{-- Arrow build warning level: PRODUCTION}}
{{Using ld linker}}
{{Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE=\{release,debug,...})}}
{{-- Build Type: RELEASE}}
{{-- Using AUTO approach to find dependencies}}
{{-- AWSSDK_VERSION: 1.7.160}}
{{-- BOOST_VERSION: 1.67.0}}
{{-- BROTLI_VERSION: v1.0.7}}
{{-- BZIP2_VERSION: 1.0.8}}
{{-- CARES_VERSION: 1.15.0}}
{{-- DOUBLE_CONVERSION_VERSION: v3.1.5}}
{{-- FLATBUFFERS_VERSION: v1.11.0}}
{{-- GBENCHMARK_VERSION: v1.5.0}}
{{-- GFLAGS_VERSION: v2.2.0}}
{{-- GLOG_VERSION: v0.3.5}}
{{-- GRPC_VERSION: v1.20.0}}
{{-- GTEST_VERSION: 1.8.1}}
{{-- JEMALLOC_VERSION: 5.2.1}}
{{-- LZ4_VERSION: v1.8.3}}
{{-- MIMALLOC_VERSION: 270e765454f98e8bab9d42609b153425f749fff6}}
{{-- ORC_VERSION: 1.5.5}}
{{-- PROTOBUF_VERSION: v3.7.1}}
{{-- RAPIDJSON_VERSION: 2bbd33b33217ff4a73434ebf10cdac41e2ef5e34}}
{{-- RE2_VERSION: 2019-08-01}}
{{-- SNAPPY_VERSION: 1.1.7}}
{{-- THRIFT_VERSION: 0.12.0}}
{{-- THRIFT_MD5_CHECKSUM: 3deebbb4d1ca77dd9c9e009a1ea02183}}
{{-- URIPARSER_VERSION: 0.9.3}}
{{-- ZLIB_VERSION: 1.2.11}}
{{-- ZSTD_VERSION: v1.4.3}}
{{-- Looking for pthread.h}}
{{-- Looking for pthread.h - found}}
{{-- Performing Test CMAKE_HAVE_LIBC_PTHREAD}}
{{-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed}}
{{-- Check 

[jira] [Created] (ARROW-7706) saving a dataframe to the same partitioned location silently doubles the data

2020-01-28 Thread Tsvika Shapira (Jira)
Tsvika Shapira created ARROW-7706:
-

 Summary: saving a dataframe to the same partitioned location 
silently doubles the data
 Key: ARROW-7706
 URL: https://issues.apache.org/jira/browse/ARROW-7706
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.15.1
Reporter: Tsvika Shapira


When a user saves a dataframe:
{code:python}
df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow')
{code}
it creates sub-directories named "{{col_a=val1}}", "{{col_a=val2}}" in 
{{/tmp/table}}. Each of them contains one (or more?) Parquet files with 
random filenames.

If the user runs the same command again, the code reuses the existing 
sub-directories but writes new files with different (random) filenames. As a 
result, any data loaded from this folder will be wrong: each row will be 
present twice.

For example, when using
{code:python}
df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow')  # second time

df2 = pd.read_parquet('/tmp/table', engine='pyarrow')
assert len(df1) == len(df2)  # raises an error
{code}
This is a subtle change in the data that can pass unnoticed.

I would expect the code to prevent the user from using a non-empty 
destination as a partitioned target. An overwrite flag would also be useful.
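
The guard asked for above could look roughly like the following. This is only a 
sketch under assumptions: {{prepare_partitioned_target}} and its {{overwrite}} 
parameter are hypothetical names and are not part of pandas or pyarrow.

```python
import shutil
from pathlib import Path


def prepare_partitioned_target(path, overwrite=False):
    """Refuse to reuse a non-empty directory unless overwrite is requested.

    Hypothetical helper: neither pandas nor pyarrow is claimed to offer this.
    """
    target = Path(path)
    if target.exists() and any(target.iterdir()):
        if not overwrite:
            raise FileExistsError(
                f"{target} is not empty; pass overwrite=True to replace its contents"
            )
        # Drop the old partitions so stale files cannot be read back later.
        shutil.rmtree(target)
    target.mkdir(parents=True, exist_ok=True)
    return target


# Intended use before writing (pandas call shown only as a comment):
# prepare_partitioned_target('/tmp/table')
# df1.to_parquet('/tmp/table', partition_cols=['col_a'], engine='pyarrow')
```

Raising by default keeps the destructive path opt-in, so a repeated write fails 
loudly instead of silently doubling the rows.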



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7705) [Rust] Initial sort implementation

2020-01-28 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7705:
-

 Summary: [Rust] Initial sort implementation
 Key: ARROW-7705
 URL: https://issues.apache.org/jira/browse/ARROW-7705
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Neville Dipale


An initial sort implementation that allows sorting an array by various options 
(e.g. sort order). This is mainly to iterate on the design and inner workings 
of a sort algorithm.
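
To make "sorting an array by various options" concrete, here is a small Python 
sketch of an index-producing sort. The function name {{sort_to_indices}} and the 
options {{descending}} and {{nulls_first}} are assumptions chosen for 
illustration; they are not taken from the Rust implementation.

```python
from typing import List, Optional, Sequence


def sort_to_indices(
    values: Sequence[Optional[int]],
    descending: bool = False,
    nulls_first: bool = True,
) -> List[int]:
    """Return the indices that would sort `values`, with nulls grouped together."""
    null_idx = [i for i, v in enumerate(values) if v is None]
    valid_idx = [i for i, v in enumerate(values) if v is not None]
    # Python's sort is stable, so equal values keep their original order.
    valid_idx.sort(key=lambda i: values[i], reverse=descending)
    return null_idx + valid_idx if nulls_first else valid_idx + null_idx


print(sort_to_indices([3, None, 1, 2]))  # [1, 2, 3, 0]
```

Returning indices rather than sorted values lets the same result reorder 
several arrays consistently.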



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7704) [Rust] Support sort

2020-01-28 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7704:
-

 Summary: [Rust] Support sort
 Key: ARROW-7704
 URL: https://issues.apache.org/jira/browse/ARROW-7704
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Neville Dipale


This lays out the work needed to support sorting arrays and record batches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7703) [C++][Dataset] Give more informative error message for mismatching schemas for FileSystemSources

2020-01-28 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7703:


 Summary: [C++][Dataset] Give more informative error message for 
mismatching schemas for FileSystemSources
 Key: ARROW-7703
 URL: https://issues.apache.org/jira/browse/ARROW-7703
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Joris Van den Bossche


Currently, if you try to create a dataset from files with different schemas, 
you get this error:

{code}
ArrowInvalid: Unable to merge: Field a has incompatible types: int64 vs int32
{code}

If you are reading a directory of files, it would be very helpful if the error 
message indicated which files are involved (e.g. if you have a lot of files 
and only one of them has an error).

You can already inspect the schemas if you first make a SourceFactory 
manually, but that only gives a list of schemas, not mapped to the original 
files (this last item probably relates to ARROW-7608).
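
As an illustration of the kind of message that would help, here is a sketch that 
tracks which file each field type came from. It uses plain dicts as stand-ins 
for schemas; {{merge_schemas}} and the file names are hypothetical and do not 
reflect the Arrow dataset API.

```python
from typing import Dict, Tuple


def merge_schemas(schemas_by_file: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge {path: {field: type}} mappings, naming both files on a conflict."""
    merged: Dict[str, Tuple[str, str]] = {}  # field -> (type, file first seen in)
    for path, schema in schemas_by_file.items():
        for field, dtype in schema.items():
            if field not in merged:
                merged[field] = (dtype, path)
                continue
            seen_type, seen_path = merged[field]
            if seen_type != dtype:
                raise ValueError(
                    f"Unable to merge: Field {field} has incompatible types: "
                    f"{seen_type} (from {seen_path}) vs {dtype} (from {path})"
                )
    return {field: dtype for field, (dtype, _) in merged.items()}
```

With inputs such as {{{"part-0.parquet": {"a": "int64"}, "part-1.parquet": 
{"a": "int32"}}}}, the error names both files instead of only the two types.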



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7702) [C++][Dataset] Provide (optional) deterministic order of batches

2020-01-28 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7702:


 Summary: [C++][Dataset] Provide (optional) deterministic order of 
batches
 Key: ARROW-7702
 URL: https://issues.apache.org/jira/browse/ARROW-7702
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Dataset, Python
Reporter: Joris Van den Bossche


Example with Python:

{code}
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({'a': range(12)}) 
pq.write_table(table, "test_chunks.parquet", chunk_size=3) 

# reading with dataset
import pyarrow.dataset as ds
ds.dataset("test_chunks.parquet").to_table().to_pandas()
{code}

gives a non-deterministic result (the order of the row groups in the Parquet file changes between runs):

```
In [25]: ds.dataset("test_chunks.parquet").to_table().to_pandas()
Out[25]: 
     a
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  10
11  11

In [26]: ds.dataset("test_chunks.parquet").to_table().to_pandas()
Out[26]: 
     a
0    0
1    1
2    2
3    3
4    8
5    9
6   10
7   11
8    4
9    5
10   6
11   7

```
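
One generic way to get a deterministic order while still reading in parallel is 
to consume the results in submission order rather than completion order. The 
sketch below shows that idea with the Python standard library only; 
{{read_row_group}} is a hypothetical stand-in for reading one row group, and 
this is not a description of how the Arrow dataset scanner works.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_row_group(index, rows_per_group=3):
    """Stand-in for reading one row group; sleeps to simulate uneven I/O."""
    time.sleep(random.uniform(0, 0.01))
    start = index * rows_per_group
    return list(range(start, start + rows_per_group))


def scan(num_row_groups, ordered=True):
    """Read all row groups concurrently and flatten them into one list."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(read_row_group, i) for i in range(num_row_groups)]
        # Submission order is deterministic; completion order is not.
        source = futures if ordered else as_completed(futures)
        return [row for future in source for row in future.result()]
```

With {{ordered=True}} the rows always come back as 0 to 11; with 
{{ordered=False}} the groups may arrive in any order, as in the output above.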



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7701) [C++] [CI] Flight test error on macOS

2020-01-28 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7701:
-

 Summary: [C++] [CI] Flight test error on macOS
 Key: ARROW-7701
 URL: https://issues.apache.org/jira/browse/ARROW-7701
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration, FlightRPC
Reporter: Antoine Pitrou


See e.g. https://github.com/apache/arrow/pull/6295/checks?check_run_id=412748673

{code}
[ RUN  ] TestTls.DoAction
E0128 12:02:52.140841000 4447722944 ssl_security_connector.cc:275] 
Handshaker factory creation failed with TSI_INVALID_ARGUMENT.
E0128 12:02:52.14259 4447722944 server_secure_chttp2.cc:81]
{"created":"@1580212972.142576000","description":"Unable to create secure 
server with credentials of type 
Ssl.","file":"/Users/runner/runners/2.164.0/work/arrow/arrow/build/cpp/grpc_ep-prefix/src/grpc_ep/src/core/ext/transport/chttp2/server/secure/server_secure_chttp2.cc","file_line":63}
/Users/runner/runners/2.164.0/work/arrow/arrow/cpp/build-support/run-test.sh: 
line 97: 32477 Segmentation fault: 11  $TEST_EXECUTABLE "$@" 2>&1
 32478 Done| $ROOT/build-support/asan_symbolize.py
 32479 Done| ${CXXFILT:-c++filt}
 32480 Done| 
$ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE
 32481 Done| $pipe_cmd 2>&1
 32482 Done| tee $LOGFILE
~/runners/2.164.0/work/arrow/arrow/build/cpp/src/arrow/flight
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-28 Thread Krisztián Szűcs
Sorry, the previous email is hardly readable.

I would like to propose the following release candidate (RC1) of Apache
Arrow version 0.16.0. This is a release consisting of 710 resolved JIRA
issues[1].

This release candidate is based on commit:
188afde1f4298fb668e8ebadeacbc545e2de086f [2]

The source release rc1 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7].
The changelog is located at [8].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [9] for how to validate a release candidate.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 0.16.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow 0.16.0 because...

[1]: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.16.0
[2]: 
https://github.com/apache/arrow/tree/188afde1f4298fb668e8ebadeacbc545e2de086f
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.16.0-rc1
[4]: https://bintray.com/apache/arrow/centos-rc/0.16.0-rc1
[5]: https://bintray.com/apache/arrow/debian-rc/0.16.0-rc1
[6]: https://bintray.com/apache/arrow/python-rc/0.16.0-rc1
[7]: https://bintray.com/apache/arrow/ubuntu-rc/0.16.0-rc1
[8]: 
https://github.com/apache/arrow/blob/188afde1f4298fb668e8ebadeacbc545e2de086f/CHANGELOG.md
[9]: 
https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates

On Tue, Jan 28, 2020 at 11:43 AM Krisztián Szűcs
 wrote:
>
> Hi,
>
> I would like to propose the following release candidate (RC1) of Apache
> Arrow version 0.16.0. This is a release consisting of 710
> resolved JIRA issues[1].
>
> This release candidate is based on commit:
>
>
>
> 188afde1f4298fb668e8ebadeacbc545e2de086f [2]
>
>
>
>  The source release rc1 is
> hosted at [3].
> The binary artifacts are hosted at [4][5][6][7].
>
>
>  The changelog is located at
> [8].
>
>
>
>  Please download, verify
> checksums and signatures, run the unit tests,
> and vote on the release. See [9] for how to validate a release
> candidate.
>
>
>
>
>
>The vote will be open for at least 72 hours.
>
>
>
>  [ ] +1 Release this as Apache
> Arrow 0.16.0
> [ ] +0
>
>
>  [ ] -1 Do not release this as
> Apache Arrow 0.16.0 because...
>
> [1]: 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.16.0
> [2]: 
> https://github.com/apache/arrow/tree/188afde1f4298fb668e8ebadeacbc545e2de086f
> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.16.0-rc1
> [4]: https://bintray.com/apache/arrow/centos-rc/0.16.0-rc1
> [5]: https://bintray.com/apache/arrow/debian-rc/0.16.0-rc1
> [6]: https://bintray.com/apache/arrow/python-rc/0.16.0-rc1
> [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.16.0-rc1
> [8]: 
> https://github.com/apache/arrow/blob/188afde1f4298fb668e8ebadeacbc545e2de086f/CHANGELOG.md
> [9]: 
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates


[VOTE] Release Apache Arrow 0.16.0 - RC1

2020-01-28 Thread Krisztián Szűcs
Hi,

I would like to propose the following release candidate (RC1) of Apache
Arrow version 0.16.0. This is a release consisting of 710 resolved JIRA
issues[1].

This release candidate is based on commit:
188afde1f4298fb668e8ebadeacbc545e2de086f [2]

The source release rc1 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7].
The changelog is located at [8].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [9] for how to validate a release candidate.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 0.16.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow 0.16.0 because...

[1]: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.16.0
[2]: 
https://github.com/apache/arrow/tree/188afde1f4298fb668e8ebadeacbc545e2de086f
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.16.0-rc1
[4]: https://bintray.com/apache/arrow/centos-rc/0.16.0-rc1
[5]: https://bintray.com/apache/arrow/debian-rc/0.16.0-rc1
[6]: https://bintray.com/apache/arrow/python-rc/0.16.0-rc1
[7]: https://bintray.com/apache/arrow/ubuntu-rc/0.16.0-rc1
[8]: 
https://github.com/apache/arrow/blob/188afde1f4298fb668e8ebadeacbc545e2de086f/CHANGELOG.md
[9]: 
https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
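
For the "verify checksums" step, a minimal sketch of a SHA-256 check is shown
below. It assumes a checksum file whose first whitespace-separated token is the
hex digest (the layout written by common sha256sum-style tools); the function
names are hypothetical. It does not cover signature verification, and the
procedure in [9] remains the reference.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact, checksum_file):
    """Compare the artifact's digest with the first token of the checksum file."""
    # Assumption: the file looks like "<hex digest>  <file name>".
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha256_of(artifact) == expected
```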


[jira] [Created] (ARROW-7700) [Rust] All array types should have iterators and FromIterator support.

2020-01-28 Thread Andy Thomason (Jira)
Andy Thomason created ARROW-7700:


 Summary: [Rust] All array types should have iterators and 
FromIterator support.
 Key: ARROW-7700
 URL: https://issues.apache.org/jira/browse/ARROW-7700
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andy Thomason


Array types should have an Iterable trait that generates plain or nullable 
iterators.


{code}
pub trait Iterable<'a>
where
    Self::IterType: std::iter::Iterator,
{
    type IterType;

    fn iter(&'a self) -> Self::IterType;
    fn iter_nulls(&'a self) -> NullableIterator<Self::IterType>;
}
{code}

IterType depends on the array type: standard slice iterators for primitive 
types, string iterators for UTF8 types, and composite iterators (generating 
other iterators) for list, struct and dictionary types.

The NullableIterator type should bundle a null bitmap pointer with another 
iterator type to form a composite iterator that returns an option:

{code}
/// Convert any iterator to a nullable iterator by using the null bitmap.
#[derive(Debug, PartialEq, Clone)]
pub struct NullableIterator<T> {
    iter: T,
    i: usize,
    null_bitmap: *const u8,
}

impl<T> NullableIterator<T> {
    fn from(iter: T, null_bitmap: , offset: usize) -> Self;
}
{code}

For more details, some exploratory work has been done here: 
https://github.com/andy-thomason/arrow/blob/ARROW-iterators/rust/arrow/src/array/array.rs#L1711
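
To show what a nullable iterator does, here is a short Python sketch of the same 
idea: wrap any iterator and consult a validity bitmap, where bit i (least 
significant bit first, as in the Arrow columnar format) is set when slot i is 
valid. The name {{iter_nullable}} is hypothetical, and this is not the Rust 
code from the branch linked above.

```python
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def iter_nullable(
    values: Iterable[T], validity: Optional[bytes], offset: int = 0
) -> Iterator[Optional[T]]:
    """Yield each value, or None where the validity bitmap marks the slot null."""
    for i, value in enumerate(values, start=offset):
        # A missing bitmap means every slot is valid.
        if validity is None or (validity[i // 8] >> (i % 8)) & 1:
            yield value
        else:
            yield None


# Bits 0, 2 and 3 are set, so only slot 1 is null.
print(list(iter_nullable([10, 20, 30, 40], bytes([0b00001101]))))  # [10, None, 30, 40]
```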



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Java] PR Reviewers

2020-01-28 Thread Fan Liya
Hi Micah,

Thank you so much for investing huge amounts of effort in reviewing Java
PRs.

I understand that you will stop reviewing Java PRs and focus on higher
priority issues.

However, I still hope you can (if possible) participate in relatively
important Java discussions and give your valuable comments.

Best,
Liya Fan


On Tue, Jan 28, 2020 at 3:14 PM Micah Kornfield 
wrote:

> Thanks for the offers I'll try to do a triage pass in the next few days and
> tag some of the people who have volunteered.
>
> Cheers,
> Micah
>
> On Mon, Jan 27, 2020 at 10:40 AM Ryan Murray  wrote:
>
> > Hey all, I would love to help out. Is there any specific ones that are
> > relatively easy for me to get started on?
> >
> > On Mon, 27 Jan 2020, 18:31 Bryan Cutler,  wrote:
> >
> > > Hi Micah, I don't have a ton of bandwidth at the moment, but I'll try
> to
> > > review some more PRs. Anyone, please feel free to ping me too if you
> > have a
> > > stale PR that needs some help getting through. Outreach to other Java
> > > communities sounds like a good idea - more Java users would definitely
> > be a
> > > good thing!
> > >
> > > Bryan
> > >
> > > On Mon, Jan 27, 2020 at 8:12 AM Andy Grove 
> > wrote:
> > >
> > > > I've now started working with the Java implementation of Arrow,
> > > > specifically Flight, and would be happy to help although I do have
> > > limited
> > > > time each week. I can at least review from a Java correctness point
> of
> > > > view.
> > > >
> > > > Andy.
> > > >
> > > > On Thu, Jan 23, 2020 at 9:41 PM Micah Kornfield <
> emkornfi...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > I mentioned this elsewhere but my intent is to stop doing java
> > reviews
> > > > for
> > > > > the immediate future once I wrap up the few that I have requested
> > > change
> > > > > on.
> > > > >
> > > > > I'm happy to try to triage incoming Java PRs, but in order to do
> > this,
> > > I
> > > > > need to know which committers have some bandwidth to do reviews
> (some
> > > of
> > > > > the existing PRs I've tagged people who never responded).
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (ARROW-7699) [Java] Support concating dense union vectors in batch

2020-01-28 Thread Liya Fan (Jira)
Liya Fan created ARROW-7699:
---

 Summary: [Java] Support concating dense union vectors in batch
 Key: ARROW-7699
 URL: https://issues.apache.org/jira/browse/ARROW-7699
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


After supporting the dense union vector, we need to support concatenating 
dense union vectors in batch. 
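
The core of such a concatenation is that type ids can be appended as-is, while 
each offset has to be shifted by the number of values the matching child 
already holds. The Python sketch below models that with plain lists; 
{{DenseUnion}} and {{concat_dense_unions}} are hypothetical names, not the Java 
vector API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DenseUnion:
    """Toy dense union: slot i holds children[type_ids[i]][offsets[i]]."""
    type_ids: List[int]
    offsets: List[int]
    children: Dict[int, list]

    def to_list(self) -> list:
        return [self.children[t][o] for t, o in zip(self.type_ids, self.offsets)]


def concat_dense_unions(vectors: List[DenseUnion]) -> DenseUnion:
    out = DenseUnion([], [], {})
    for vec in vectors:
        # base[t] = number of values child t already holds in the output.
        base = {t: len(out.children.get(t, [])) for t in vec.children}
        out.type_ids.extend(vec.type_ids)
        out.offsets.extend(base[t] + o for t, o in zip(vec.type_ids, vec.offsets))
        for t, values in vec.children.items():
            out.children.setdefault(t, []).extend(values)
    return out
```

Processing the inputs one after another keeps each child contiguous, so only 
the offsets need rewriting.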



--
This message was sent by Atlassian Jira
(v8.3.4#803005)