Re: [Java] CI test failures

2019-09-11 Thread Micah Kornfield
I will try to take a look in the next couple of hours to see if I can fix it
quickly.

On Wed, Sep 11, 2019 at 4:54 AM Antoine Pitrou  wrote:

>
> Hello,
>
> Some Travis-CI builds are failing because of Java issues.  It would be
> good if a Java contributor or maintainer could take a look ASAP.
> https://issues.apache.org/jira/browse/ARROW-6509
>
> Thanks
>
> Antoine.
>


[jira] [Created] (ARROW-6547) [C++] valgrind errors in arrow-ipc-read-write-test

2019-09-11 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6547:
---

 Summary: [C++] valgrind errors in arrow-ipc-read-write-test
 Key: ARROW-6547
 URL: https://issues.apache.org/jira/browse/ARROW-6547
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.15.0


Not sure when these crept in, but I encountered them while looking into a
segfault in a build today.

https://gist.github.com/wesm/b388dda4f0e2e38a8aa77dfc9bd91914



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6546) [C++] Add missing FlatBuffers source dependency

2019-09-11 Thread Sutou Kouhei (Jira)
Sutou Kouhei created ARROW-6546:
---

 Summary: [C++] Add missing FlatBuffers source dependency
 Key: ARROW-6546
 URL: https://issues.apache.org/jira/browse/ARROW-6546
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Sutou Kouhei
Assignee: Sutou Kouhei








Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-09-11 Thread Wes McKinney
Thanks Bryan.

I merged the Java patch with the EOS change and submitted a C++ patch
which also updates the specification

https://github.com/apache/arrow/pull/5361

Let me know when the JS or C# patches are ready to go and I can merge those.

I updated https://issues.apache.org/jira/browse/ARROW-6545 to track
the Go change corresponding to this

On Wed, Sep 11, 2019 at 12:33 AM Bryan Cutler  wrote:
>
> I have the patch for the EOS with Java writers up here
> https://github.com/apache/arrow/pull/5345. Just to clarify, the EOS of
> {0x, 0x} is used for both stream and file formats, in
> non-legacy writing mode.
>
> On Mon, Sep 9, 2019 at 8:01 PM Bryan Cutler  wrote:
>
> > Sounds good to me also and I don't think we need a vote either.
> >
> > On Sat, Sep 7, 2019 at 7:36 PM Micah Kornfield 
> > wrote:
> >
> >> +1 on this, I also don't think a vote is necessary as long as we make the
> >> change before 0.15.0
> >>
> >> On Saturday, September 7, 2019, Wes McKinney  wrote:
> >>
> >> > I see, thank you for catching this nuance.
> >> >
> >> > I agree that using {0x, 0x} for EOS will resolve the
> >> > issue while allowing implementations to be backwards compatible (i.e.
> >> > handling the 4-byte EOS from older payloads).
> >> >
> >> > I'm not sure that we need to have a vote about this. What do others
> >> > think?
> >> >
> >> > On Sat, Sep 7, 2019 at 12:47 AM Ji Liu wrote:
> >> > >
> >> > > Hi all,
> >> > >
> >> > > During the Java code review [1], it seems there is a problem with the
> >> > > current implementations (C++, Java, etc.) when reaching EOS: the
> >> > > new-format EOS is 8 bytes, but the reader only reads 4 bytes when it
> >> > > reaches the end of the stream, so the additional 4 bytes are left
> >> > > unread, which causes problems for subsequent reads.
> >> > >
> >> > > There are two suggested options [2], listed below. We should reach
> >> > > consensus and fix this problem before the 0.15 release.
> >> > > i. For the new format, an 8-byte EOS token should look like {0x, 0x},
> >> > > so we read the continuation token first, and then know to read
> >> > > the next 4 bytes, which are then 0 to signal EOS.
> >> > > ii. The reader just remembers the state, so if it reads the
> >> > > continuation token at the beginning, it then reads all 8 bytes at
> >> > > the end.
> >> > >
> >> > > Thanks,
> >> > > Ji Liu
> >> > >
> >> > > [1] https://github.com/apache/arrow/pull/5229
> >> > > [2] https://github.com/apache/arrow/pull/5229#discussion_r321715682
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > From:Eric Erhardt 
> >> > > Send Time: Thursday, September 5, 2019, 07:16
> >> > > To:dev@arrow.apache.org ; Ji Liu <
> >> > niki...@aliyun.com>
> >> > > Cc:emkornfield ; Paul Taylor <
> >> ptay...@apache.org>
> >> > > Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
> >> > 8-byte Flatbuffer alignment requirements (2nd vote)
> >> > >
> >> > > The C# PR is up.
> >> > >
> >> > > https://github.com/apache/arrow/pull/5280
> >> > >
> >> > > Eric
> >> > >
> >> > > -Original Message-
> >> > > From: Eric Erhardt 
> >> > > Sent: Wednesday, September 4, 2019 10:12 AM
> >> > > To: dev@arrow.apache.org; Ji Liu 
> >> > > Cc: emkornfield ; Paul Taylor <
> >> ptay...@apache.org
> >> > >
> >> > > Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
> >> > 8-byte Flatbuffer alignment requirements (2nd vote)
> >> > >
> >> > > I'm working on a PR for the C# bindings. I hope to have it up in the
> >> > > next day or two. Integration tests for C# would be a great addition at
> >> > > some point; they have been on my backlog. For now I plan on testing it
> >> > > manually.
> >> > >
> >> > > -Original Message-
> >> > > From: Wes McKinney 
> >> > > Sent: Tuesday, September 3, 2019 10:17 PM
> >> > > To: Ji Liu 
> >> > > Cc: emkornfield ; dev ;
> >> > Paul Taylor 
> >> > > Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> >> > 8-byte Flatbuffer alignment requirements (2nd vote)
> >> > >
> >> > > hi folks,
> >> > >
> >> > > We now have patches up for Java, JS, and Go. How are we doing on the
> >> > > code reviews for getting these in?
> >> > >
> >> > > Since C# implements the binary protocol, the C# developers might want
> >> > > to look at this before the 0.15.0 release also. Absent integration
> >> > > tests it's difficult to verify the C# library, though.
> >> > >
> >> > > Thanks
> >> > >
> >> > > On Thu, Aug 29, 2019 at 8:13 AM Ji Liu  wrote:
> >> > > >
> >> > > > Here is the Java implementation
> >> > > >
> >> > > > https://github.com/apache/arrow/pull/5229
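
The first option discussed in this thread (read a continuation token, then a
4-byte length whose value 0 signals EOS, while still accepting the older
4-byte EOS) can be sketched roughly as follows. This is a minimal
illustration, not the code from any of the linked patches: the function name
read_message_length is hypothetical, and the marker value 0xFFFFFFFF and the
little-endian byte order are assumptions based on the published Arrow IPC
format rather than on the text of this thread.

```python
import io
import struct

CONTINUATION = 0xFFFFFFFF  # assumed marker value (see the note above)


def read_message_length(stream):
    """Return the next metadata length, or None at end of stream (EOS).

    Handles both the two-part 8-byte prefix and the legacy 4-byte prefix.
    """
    head = stream.read(4)
    if len(head) < 4:
        return None  # stream ended without an explicit EOS marker
    (value,) = struct.unpack("<I", head)
    if value == CONTINUATION:
        # New format: the real length follows the continuation token.
        tail = stream.read(4)
        if len(tail) < 4:
            raise ValueError("truncated length after continuation token")
        (value,) = struct.unpack("<I", tail)
    # A length of 0 signals EOS in both the new and the legacy layout.
    return None if value == 0 else value
```

Because the reader decides how many bytes to consume from the first 4 bytes
alone, it never leaves half of an 8-byte EOS unread, and it needs no extra
state to stay compatible with older 4-byte payloads.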

[jira] [Created] (ARROW-6545) [Go] Update Go IPC writer to use two-part EOS per mailing list discussion

2019-09-11 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6545:
---

 Summary: [Go] Update Go IPC writer to use two-part EOS per mailing 
list discussion
 Key: ARROW-6545
 URL: https://issues.apache.org/jira/browse/ARROW-6545
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Wes McKinney
Assignee: Sebastien Binet
 Fix For: 0.15.0


Per mailing list discussion





[jira] [Created] (ARROW-6544) [R] Documentation/polishing for 0.15 release

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6544:
--

 Summary: [R] Documentation/polishing for 0.15 release
 Key: ARROW-6544
 URL: https://issues.apache.org/jira/browse/ARROW-6544
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.15.0








[jira] [Created] (ARROW-6543) [R] Support LargeBinary and LargeString types

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6543:
--

 Summary: [R] Support LargeBinary and LargeString types
 Key: ARROW-6543
 URL: https://issues.apache.org/jira/browse/ARROW-6543
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0


See ARROW-750





[jira] [Created] (ARROW-6542) [R] Add View() method to array types

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6542:
--

 Summary: [R] Add View() method to array types
 Key: ARROW-6542
 URL: https://issues.apache.org/jira/browse/ARROW-6542
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0


See ARROW-6048





[jira] [Created] (ARROW-6541) [Format][C++] Use two-part EOS and amend Format documentation

2019-09-11 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6541:
---

 Summary: [Format][C++] Use two-part EOS and amend Format 
documentation
 Key: ARROW-6541
 URL: https://issues.apache.org/jira/browse/ARROW-6541
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Format
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 0.15.0


Per mailing list discussion





[jira] [Created] (ARROW-6540) [R] Add Validate() methods

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6540:
--

 Summary: [R] Add Validate() methods
 Key: ARROW-6540
 URL: https://issues.apache.org/jira/browse/ARROW-6540
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson


See ARROW-6174 and ARROW-6177





[jira] [Created] (ARROW-6539) [R] Provide mechanism to write out old format

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6539:
--

 Summary: [R] Provide mechanism to write out old format
 Key: ARROW-6539
 URL: https://issues.apache.org/jira/browse/ARROW-6539
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Romain François
 Fix For: 0.15.0


See ARROW-6474. {{sparklyr}} will have the same issue so we should make sure 
this is supported in R.





[jira] [Created] (ARROW-6538) [R] Add Abort() method to streams

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6538:
--

 Summary: [R] Add Abort() method to streams
 Key: ARROW-6538
 URL: https://issues.apache.org/jira/browse/ARROW-6538
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0


See ARROW-6300





[jira] [Created] (ARROW-6537) [R] Pass column_types to CSV reader

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6537:
--

 Summary: [R] Pass column_types to CSV reader
 Key: ARROW-6537
 URL: https://issues.apache.org/jira/browse/ARROW-6537
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


See also ARROW-6536. 





[jira] [Created] (ARROW-6536) [C++] CSV reader accept schema

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6536:
--

 Summary: [C++] CSV reader accept schema
 Key: ARROW-6536
 URL: https://issues.apache.org/jira/browse/ARROW-6536
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson


The CSV reader lets you specify {{column_types}}, but this is an 
{{unordered_map}} of column name and type. Why not accept a Schema instead? 
Isn't that essentially an ordered map? Seems that if you took a Schema, some of 
the validation of what's being passed in would already have been handled. Plus, 
I suspect that the Datasets project will want to do even more with passing a 
Schema (e.g. selecting a subset of columns). 

Thoughts [~pitrou] [~fsaintjacques] [~bkietz]?
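
To make the comparison concrete, here is a small sketch in plain Python (not
the Arrow API) contrasting a name-to-type mapping with an ordered,
schema-like list of fields. The names Field, schema_to_column_types and
select_columns, as well as the type strings, are hypothetical stand-ins used
only for illustration.

```python
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a schema field: a column name and its type."""
    name: str
    type: str


def schema_to_column_types(schema: List[Field]) -> Dict[str, str]:
    """Validate an ordered list of fields and derive a name -> type mapping."""
    column_types: Dict[str, str] = {}
    for field in schema:
        if field.name in column_types:
            raise ValueError(f"duplicate column name: {field.name!r}")
        column_types[field.name] = field.type
    return column_types


def select_columns(schema: List[Field], names: List[str]) -> List[Field]:
    """Pick a subset of columns, keeping the order requested by the caller."""
    by_name = {field.name: field for field in schema}
    return [by_name[name] for name in names]
```

The point of the sketch is that the ordered form can always be reduced to the
mapping the reader takes today, while also carrying column order and giving a
single place to validate the input.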





[jira] [Created] (ARROW-6535) [C++] Status::WithMessage should accept variadic parameters

2019-09-11 Thread Benjamin Kietzman (Jira)
Benjamin Kietzman created ARROW-6535:


 Summary: [C++] Status::WithMessage should accept variadic 
parameters
 Key: ARROW-6535
 URL: https://issues.apache.org/jira/browse/ARROW-6535
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Benjamin Kietzman


Currently only the Status factories like {{Status::Invalid}} accept variadic 
ostreamable parameters, but {{Status::WithMessage}} would also benefit from 
this pattern (and potentially other methods of Status or Result)
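
As a rough illustration of the pattern, written in Python rather than C++,
the sketch below builds a message from any number of printable arguments,
both in a factory and in a with_message method. The Status class here is a
hypothetical toy and does not reflect the actual signatures in Arrow.

```python
class Status:
    """Toy status object; a hypothetical stand-in, not Arrow's C++ Status."""

    def __init__(self, code: str, message: str = "") -> None:
        self.code = code
        self.message = message

    @staticmethod
    def _join(parts) -> str:
        # Mirrors streaming each argument into an ostream: stringify and concatenate.
        return "".join(str(part) for part in parts)

    @classmethod
    def invalid(cls, *parts) -> "Status":
        """Factory that accepts any number of printable arguments."""
        return cls("Invalid", cls._join(parts))

    def with_message(self, *parts) -> "Status":
        """Return a copy with the same code and a message built from the parts."""
        return Status(self.code, self._join(parts))
```

With this shape, a caller can add context in one call, for example
status.with_message("row ", 7, ": ", status.message), instead of formatting
the string by hand first.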





[jira] [Created] (ARROW-6534) [Java] Fix typos and spelling

2019-09-11 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6534:
---

 Summary: [Java] Fix typos and spelling
 Key: ARROW-6534
 URL: https://issues.apache.org/jira/browse/ARROW-6534
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 0.15.0


Fix typos and spelling, mostly in docs and tests.





[jira] [Created] (ARROW-6532) [R] Write parquet files with compression

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6532:
--

 Summary: [R] Write parquet files with compression
 Key: ARROW-6532
 URL: https://issues.apache.org/jira/browse/ARROW-6532
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0


Followup to ARROW-6360. See ARROW-6216 for the C++ side. `write_parquet()` 
should be able to write compressed files, including with a specified 
compression level.





[jira] [Created] (ARROW-6533) [R] Compression codec should take a "level"

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6533:
--

 Summary: [R] Compression codec should take a "level"
 Key: ARROW-6533
 URL: https://issues.apache.org/jira/browse/ARROW-6533
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0


See ARROW-6216 for the C++ side.





Re: Timeline for 0.15.0 release

2019-09-11 Thread Wes McKinney
hi Eric -- yes, that's correct. I'm planning to amend the Format docs
today regarding the EOS issue and also update the C++ library

On Wed, Sep 11, 2019 at 11:21 AM Eric Erhardt
 wrote:
>
> I assume the plan is to merge the ARROW-6313-flatbuffer-alignment branch into 
> master before the 0.15 release, correct?
>
> BTW - I believe the C# alignment changes are ready to be merged into the 
> alignment branch -  https://github.com/apache/arrow/pull/5280/
>
> Eric
>
> -Original Message-
> From: Micah Kornfield 
> Sent: Tuesday, September 10, 2019 10:24 PM
> To: Wes McKinney 
> Cc: dev ; niki.lj 
> Subject: Re: Timeline for 0.15.0 release
>
> I should have a little more bandwidth to help with some of the packaging 
> starting tomorrow and going into the weekend.
>
> On Tuesday, September 10, 2019, Wes McKinney  wrote:
>
> > Hi folks,
> >
> > With the state of nightly packaging and integration builds things
> > aren't looking too good for being in release readiness by the end of
> > this week but maybe I'm wrong. I'm planning to be working to close as
> > many issues as I can and also to help with the ongoing alignment fixes.
> >
> > Wes
> >
> > On Thu, Sep 5, 2019, 11:07 PM Micah Kornfield 
> > wrote:
> >
> >> Just for reference [1] has a dashboard of the current issues:
> >>
> >> https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.15.0+Release
> >>
> >> On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney  wrote:
> >>
> >>> hi all,
> >>>
> >>> It doesn't seem like we're going to be in a position to release at
> >>> the beginning of next week. I hope that one more week of work (or
> >>> less) will be enough to get us there. Aside from merging the
> >>> alignment changes, we need to make sure that our packaging jobs
> >>> required for the release candidate are all working.
> >>>
> >>> If folks could remove issues from the 0.15.0 backlog that they don't
> >>> think they will finish by end of next week that would help focus
> >>> efforts (there are currently 78 issues in 0.15.0 still). I am
> >>> looking to tackle a few small features related to dictionaries while
> >>> the release window is still open.
> >>>
> >>> - Wes
> >>>
> >>> On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney 
> >>> wrote:
> >>> >
> >>> > hi,
> >>> >
> >>> > I think we should try to release the week of September 9, so
> >>> > development work should be completed by end of next week.
> >>> >
> >>> > Does that seem reasonable?
> >>> >
> >>> > I plan to get up a patch for the protocol alignment changes for
> >>> > C++ in the next couple of days -- I think that getting the
> >>> > alignment work done is the main barrier to releasing.
> >>> >
> >>> > Thanks
> >>> > Wes
> >>> >
> >>> > On Mon, Aug 19, 2019 at 12:25 PM Ji Liu
> >>> > 
> >>> wrote:
> >>> > >
> >>> > > Hi Wes, on the Java side, I can think of several bugs that need
> >>> > > to be fixed or kept in mind.
> >>> > >
> >>> > > i. ARROW-6040: Dictionary entries are required in IPC streams
> >>> > > even when empty [1]
> >>> > > This one is under review now. However, through this PR we found
> >>> > > what seems to be a bug in how Java reads and writes dictionaries in
> >>> > > IPC, which is inconsistent with the spec [2] since it assumes all
> >>> > > dictionaries are at the start of the stream (see details in the PR
> >>> > > comments; this fix may not make it into version 0.15).
> >>> > > @Micah Kornfield
> >>> > >
> >>> > > ii. ARROW-1875: Write 64-bit ints as strings in integration test
> >>> > > JSON files [3]
> >>> > > The Java-side code is already checked in; the other implementations
> >>> > > do not seem to have it yet.
> >>> > >
> >>> > > iii. ARROW-6202: OutOfMemory in JdbcAdapter [4]
> >>> > > Caused by trying to load all records in one contiguous batch; fixed
> >>> > > by providing an iterator API for iterative reading in ARROW-6219 [5].
> >>> > >
> >>> > > Thanks,
> >>> > > Ji Liu
> >>> > >
> >>> > > [1] https://github.com/apache/arrow/pull/4960
> >>> > > [2] https://arrow.apache.org/docs/ipc.html
> >>> > > [3]
> >>> > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%
> >>> > > 

RE: Timeline for 0.15.0 release

2019-09-11 Thread Eric Erhardt
I assume the plan is to merge the ARROW-6313-flatbuffer-alignment branch into 
master before the 0.15 release, correct?

BTW - I believe the C# alignment changes are ready to be merged into the 
alignment branch -  https://github.com/apache/arrow/pull/5280/ 

Eric

-Original Message-
From: Micah Kornfield  
Sent: Tuesday, September 10, 2019 10:24 PM
To: Wes McKinney 
Cc: dev ; niki.lj 
Subject: Re: Timeline for 0.15.0 release

I should have a little more bandwidth to help with some of the packaging 
starting tomorrow and going into the weekend.

On Tuesday, September 10, 2019, Wes McKinney  wrote:

> Hi folks,
>
> With the state of nightly packaging and integration builds things 
> aren't looking too good for being in release readiness by the end of 
> this week but maybe I'm wrong. I'm planning to be working to close as 
> many issues as I can and also to help with the ongoing alignment fixes.
>
> Wes
>
> On Thu, Sep 5, 2019, 11:07 PM Micah Kornfield 
> wrote:
>
>> Just for reference [1] has a dashboard of the current issues:
>>
>> https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.15.0+Release
>>
>> On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney  wrote:
>>
>>> hi all,
>>>
>>> It doesn't seem like we're going to be in a position to release at 
>>> the beginning of next week. I hope that one more week of work (or 
>>> less) will be enough to get us there. Aside from merging the 
>>> alignment changes, we need to make sure that our packaging jobs 
>>> required for the release candidate are all working.
>>>
>>> If folks could remove issues from the 0.15.0 backlog that they don't 
>>> think they will finish by end of next week that would help focus 
>>> efforts (there are currently 78 issues in 0.15.0 still). I am 
>>> looking to tackle a few small features related to dictionaries while 
>>> the release window is still open.
>>>
>>> - Wes
>>>
>>> On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney 
>>> wrote:
>>> >
>>> > hi,
>>> >
>>> > I think we should try to release the week of September 9, so 
>>> > development work should be completed by end of next week.
>>> >
>>> > Does that seem reasonable?
>>> >
>>> > I plan to get up a patch for the protocol alignment changes for 
>>> > C++ in the next couple of days -- I think that getting the 
>>> > alignment work done is the main barrier to releasing.
>>> >
>>> > Thanks
>>> > Wes
>>> >
>>> > On Mon, Aug 19, 2019 at 12:25 PM Ji Liu 
>>> > 
>>> wrote:
>>> > >
>>> > > Hi Wes, on the Java side, I can think of several bugs that need
>>> > > to be fixed or kept in mind.
>>> > >
>>> > > i. ARROW-6040: Dictionary entries are required in IPC streams
>>> > > even when empty [1]
>>> > > This one is under review now. However, through this PR we found
>>> > > what seems to be a bug in how Java reads and writes dictionaries in
>>> > > IPC, which is inconsistent with the spec [2] since it assumes all
>>> > > dictionaries are at the start of the stream (see details in the PR
>>> > > comments; this fix may not make it into version 0.15).
>>> > > @Micah Kornfield
>>> > >
>>> > > ii. ARROW-1875: Write 64-bit ints as strings in integration test
>>> > > JSON files [3]
>>> > > The Java-side code is already checked in; the other implementations
>>> > > do not seem to have it yet.
>>> > >
>>> > > iii. ARROW-6202: OutOfMemory in JdbcAdapter [4]
>>> > > Caused by trying to load all records in one contiguous batch; fixed
>>> > > by providing an iterator API for iterative reading in ARROW-6219 [5].
>>> > >
>>> > > Thanks,
>>> > > Ji Liu
>>> > >
>>> > > [1] https://github.com/apache/arrow/pull/4960
>>> > > [2] https://arrow.apache.org/docs/ipc.html
>>> > > [3] https://issues.apache.org/jira/browse/ARROW-1875
>>> > > [4]
>>> > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%
>>> > > 

[jira] [Created] (ARROW-6531) [C++] Do not always close raw OutputStream in BufferedOutputStream::Close

2019-09-11 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6531:
---

 Summary: [C++] Do not always close raw OutputStream in 
BufferedOutputStream::Close
 Key: ARROW-6531
 URL: https://issues.apache.org/jira/browse/ARROW-6531
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.15.0


{{BufferedOutputStream::Close}} closes the raw file handle unconditionally. 
This may be undesirable in some circumstances.

Some alternatives:

* Do not close it
* Only close it if the {{use_count}} of the {{shared_ptr}} is 1, so we know 
that no one else has a copy of the shared_ptr
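
The second alternative can be sketched in Python with an explicit reference
count standing in for the use_count of a shared_ptr. Everything here
(SharedRaw, BufferedWriter and their methods) is hypothetical and only
illustrates the ownership check; it is not how the C++ classes are written.

```python
import io


class SharedRaw:
    """Hypothetical shared handle with an explicit count, like shared_ptr."""

    def __init__(self, raw):
        self.raw = raw
        self.use_count = 0

    def acquire(self):
        self.use_count += 1
        return self

    def release(self):
        self.use_count -= 1


class BufferedWriter:
    """Buffers writes and closes the raw stream only when it is the sole owner."""

    def __init__(self, shared: SharedRaw):
        self.shared = shared.acquire()
        self.buffer = bytearray()

    def write(self, data: bytes) -> None:
        self.buffer += data

    def close(self) -> None:
        # Always flush buffered bytes to the raw stream.
        self.shared.raw.write(bytes(self.buffer))
        self.buffer.clear()
        # Close the raw stream only if nobody else holds a reference.
        if self.shared.use_count == 1:
            self.shared.raw.close()
        self.shared.release()
```

In this sketch the flush always happens, so buffered data is never lost; only
the decision to close the underlying stream depends on whether another holder
still exists.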





[jira] [Created] (ARROW-6530) [CI][Crossbow][R] Nightly R job doesn't install all dependencies

2019-09-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6530:
--

 Summary: [CI][Crossbow][R] Nightly R job doesn't install all 
dependencies
 Key: ARROW-6530
 URL: https://issues.apache.org/jira/browse/ARROW-6530
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.15.0


https://circleci.com/gh/ursa-labs/crossbow/2802

{code}
* checking for file './DESCRIPTION' ... OK
* preparing 'arrow':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* running 'cleanup'
Error in loadVignetteBuilder(pkgdir, TRUE) : 
  vignette builder 'knitr' not found
Execution halted
Exited with code 1
{code}





[jira] [Created] (ARROW-6529) [C++] Feather: slow writing of NullArray

2019-09-11 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6529:


 Summary: [C++] Feather: slow writing of NullArray
 Key: ARROW-6529
 URL: https://issues.apache.org/jira/browse/ARROW-6529
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Joris Van den Bossche


From
https://stackoverflow.com/questions/57877017/pandas-feather-format-is-slow-when-writing-a-column-of-none

Here is a smaller example using just pyarrow. It seems that writing an array of
nulls takes much longer than writing an array of, for example, ints, which
seems a bit strange:

{code}
In [93]: arr = pa.array([1]*1000)  

In [94]: %%timeit 
...: w = pyarrow.feather.FeatherWriter('__test.feather') 
...: w.writer.write_array('x', arr) 
...: w.writer.close() 

31.4 µs ± 464 ns per loop (mean ± std. dev. of 7 runs, 1 loops each)

In [95]: arr = pa.array([None]*1000)  

In [96]: arr
Out[96]: 

1000 nulls

In [97]: %%timeit 
...: w = pyarrow.feather.FeatherWriter('__test.feather') 
...: w.writer.write_array('x', arr) 
...: w.writer.close() 

3.75 ms ± 64.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{code}

So writing a NullArray of the same length takes roughly 100x longer.





[jira] [Created] (ARROW-6528) [C++] Spurious Flight test failures (port allocation failure)

2019-09-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6528:
-

 Summary: [C++] Spurious Flight test failures (port allocation 
failure)
 Key: ARROW-6528
 URL: https://issues.apache.org/jira/browse/ARROW-6528
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


Seems like our port allocation scheme inside unit tests is still not very 
reliable :-/
https://ci.ursalabs.org/#/builders/71/builds/4147/steps/8/logs/stdio

{code}
[--] 3 tests from TestMetadata
[ RUN  ] TestMetadata.DoGet
E0905 12:45:40.322644527   10203 server_chttp2.cc:40] {"created":"@1567687540.322612245","description":"No address added out of total 1 resolved","file":"../src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":394,"referenced_errors":[{"created":"@1567687540.322609844","description":"Unable to configure socket","fd":7,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":217,"referenced_errors":[{"created":"@1567687540.322602634","description":"Address already in use","errno":98,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":190,"os_error":"Address already in use","syscall":"bind"}]}]}
../src/arrow/flight/flight_test.cc:429: Failure
Failed
'server->Init(options)' failed with Unknown error: Server did not start properly
/buildbot/AMD64_Conda_Python_3_7/cpp/build-support/run-test.sh: line 97: 10203 Segmentation fault  (core dumped) $TEST_EXECUTABLE "$@" 2>&1
 10204 Done| $ROOT/build-support/asan_symbolize.py
 10205 Done| ${CXXFILT:-c++filt}
 10206 Done| $ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE
 10207 Done| $pipe_cmd 2>&1
 10208 Done| tee $LOGFILE
/buildbot/AMD64_Conda_Python_3_7/cpp/build/src/arrow/flight
{code}
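
One common way to reduce "Address already in use" failures of this kind is to
bind to port 0 and let the operating system pick a free port, then read the
chosen port back from the socket. The sketch below shows that general
technique with Python's standard socket module. It is not taken from the
Arrow test suite, and the helper name bind_to_free_port is hypothetical.

```python
import socket


def bind_to_free_port(host: str = "127.0.0.1") -> socket.socket:
    """Bind a listening TCP socket to a port chosen by the operating system."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))  # port 0 asks the OS for any free port
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock
```

The chosen port is available as sock.getsockname()[1]. Keeping the socket
open, rather than probing for a free port and releasing it, avoids the window
in which another process can grab the port before the server binds it.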






[NIGHTLY] Arrow Build Report for Job nightly-2019-09-11-0

2019-09-11 Thread Crossbow


Arrow Build Report for Job nightly-2019-09-11-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0

Failed Tasks:
- docker-turbodbc-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-turbodbc-integration
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-gandiva-jar-trusty
- wheel-manylinux2010-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-manylinux2010-cp35m
- wheel-osx-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-osx-cp37m
- wheel-osx-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-osx-cp36m
- wheel-win-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-appveyor-wheel-win-cp35m
- wheel-osx-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-osx-cp35m
- docker-spark-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-spark-integration
- wheel-osx-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-osx-cp27m
- docker-r:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-r

Succeeded Tasks:
- docker-lint:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-lint
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-gandiva-jar-osx
- docker-iwyu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-iwyu
- docker-cpp-release:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-cpp-release
- wheel-win-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-appveyor-wheel-win-cp37m
- wheel-manylinux1-cp27mu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-manylinux1-cp27mu
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-azure-centos-7
- wheel-manylinux1-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-manylinux1-cp35m
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-appveyor-conda-win-vs2015-py36
- docker-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-python-3.6
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-azure-debian-buster
- docker-go:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-go
- wheel-manylinux2010-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-manylinux2010-cp36m
- wheel-manylinux1-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-manylinux1-cp36m
- docker-dask-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-dask-integration
- docker-cpp-cmake32:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-cpp-cmake32
- wheel-manylinux2010-cp27mu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-manylinux2010-cp27mu
- wheel-manylinux2010-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-manylinux2010-cp27m
- docker-python-2.7-nopandas:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-python-2.7-nopandas
- docker-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-python-2.7
- ubuntu-disco:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-azure-ubuntu-disco
- docker-hdfs-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-hdfs-integration
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-azure-conda-linux-gcc-py27
- wheel-manylinux1-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-travis-wheel-manylinux1-cp37m
- docker-cpp-fuzzit:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-11-0-circle-docker-cpp-fuzzit
- docker-python-3.7:
  URL: 

[Java] CI test failures

2019-09-11 Thread Antoine Pitrou


Hello,

Some Travis-CI builds are failing because of Java issues.  It would be
good if a Java contributor or maintainer could take a look ASAP.
https://issues.apache.org/jira/browse/ARROW-6509

Thanks

Antoine.


[jira] [Created] (ARROW-6527) [C++] Add OutputStream::Write() variant taking an owned buffer

2019-09-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6527:
-

 Summary: [C++] Add OutputStream::Write() variant taking an owned buffer
 Key: ARROW-6527
 URL: https://issues.apache.org/jira/browse/ARROW-6527
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


When Write() takes an arbitrary data pointer and needs to buffer it, it is 
mandatory to copy the data because the pointer may go stale, or the data may be 
overwritten.

But if the user has an immutable Buffer, then it should be enough to store the 
Buffer as necessary, without doing a memory copy. We could add a special 
Write() variant for that.
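
To make the difference between the two variants concrete, here is a minimal 
sketch. It does not use the Arrow API: ImmutableBuffer, QueuingOutputStream, 
Flush() and copies() are hypothetical names invented for illustration, and the 
real OutputStream interface may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an immutable, reference-counted buffer.
class ImmutableBuffer {
 public:
  explicit ImmutableBuffer(std::vector<uint8_t> bytes) : bytes_(std::move(bytes)) {}
  const uint8_t* data() const { return bytes_.data(); }
  std::size_t size() const { return bytes_.size(); }

 private:
  const std::vector<uint8_t> bytes_;
};

// Hypothetical stream that queues chunks until Flush() is called.
class QueuingOutputStream {
 public:
  // Raw-pointer variant: the caller may reuse or free `data` right after
  // the call, so the bytes have to be copied before queuing them.
  void Write(const void* data, std::size_t nbytes) {
    const auto* p = static_cast<const uint8_t*>(data);
    pending_.push_back(
        std::make_shared<ImmutableBuffer>(std::vector<uint8_t>(p, p + nbytes)));
    ++copies_;
  }

  // Owned-buffer variant: the shared_ptr keeps the immutable bytes alive,
  // so queuing the reference is enough and no copy is made here.
  void Write(std::shared_ptr<const ImmutableBuffer> buffer) {
    assert(buffer != nullptr);
    pending_.push_back(std::move(buffer));
  }

  // Concatenates the queued chunks (standing in for the real sink) and
  // releases the references held by the queue.
  std::vector<uint8_t> Flush() {
    std::vector<uint8_t> out;
    for (const auto& chunk : pending_) {
      out.insert(out.end(), chunk->data(), chunk->data() + chunk->size());
    }
    pending_.clear();
    return out;
  }

  // Number of defensive copies made while queuing.
  int copies() const { return copies_; }

 private:
  std::vector<std::shared_ptr<const ImmutableBuffer>> pending_;
  int copies_ = 0;
};
```

In this sketch only the raw-pointer overload pays for a copy at queuing time; 
the owned-buffer overload just bumps a reference count until the data is 
flushed.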





[jira] [Created] (ARROW-6526) [C++] Poison data in PoolBuffer destructor

2019-09-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6526:
-

 Summary: [C++] Poison data in PoolBuffer destructor
 Key: ARROW-6526
 URL: https://issues.apache.org/jira/browse/ARROW-6526
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


In debug mode, we could poison data (at least the first and last bytes?) in the 
PoolBuffer destructor so as to easily detect buffer lifetime issues.

(ASAN also helps, but this would act as a first defense barrier, e.g. for local 
development)
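
A rough sketch of the idea, under stated assumptions: SimplePool, ArenaPool, 
PoisoningBuffer and the 0xDE poison value are all hypothetical and are not 
Arrow's MemoryPool or PoolBuffer. The arena-backed pool exists only so that 
the poisoned bytes can be inspected legally after the buffer is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical poison value; any pattern unlikely to appear in valid data works.
constexpr uint8_t kPoisonByte = 0xDE;

// Hypothetical minimal pool interface (not Arrow's MemoryPool).
class SimplePool {
 public:
  virtual ~SimplePool() = default;
  virtual uint8_t* Allocate(std::size_t size) = 0;
  virtual void Free(uint8_t* data, std::size_t size) = 0;
};

// Pool backed by a fixed arena, so memory stays readable after Free().
class ArenaPool : public SimplePool {
 public:
  uint8_t* Allocate(std::size_t size) override {
    assert(used_ + size <= sizeof(arena_));
    uint8_t* out = arena_ + used_;
    used_ += size;
    return out;
  }
  void Free(uint8_t*, std::size_t) override {}  // arena memory is never returned

 private:
  uint8_t arena_[256] = {};
  std::size_t used_ = 0;
};

// Buffer that overwrites its first and last bytes before releasing them.
class PoisoningBuffer {
 public:
  PoisoningBuffer(SimplePool* pool, std::size_t size)
      : pool_(pool), data_(pool->Allocate(size)), size_(size) {}
  PoisoningBuffer(const PoisoningBuffer&) = delete;
  PoisoningBuffer& operator=(const PoisoningBuffer&) = delete;

  ~PoisoningBuffer() {
#ifndef NDEBUG
    // Debug builds only: a stale reader now sees the poison value
    // instead of plausible-looking old data.
    if (size_ > 0) {
      data_[0] = kPoisonByte;
      data_[size_ - 1] = kPoisonByte;
    }
#endif
    pool_->Free(data_, size_);
  }

  uint8_t* data() { return data_; }
  std::size_t size() const { return size_; }

 private:
  SimplePool* pool_;
  uint8_t* data_;
  std::size_t size_;
};
```

Poisoning only the two end bytes keeps the destructor cheap; filling the whole 
buffer would catch more stale reads at a higher cost.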





[jira] [Created] (ARROW-6525) [C++] CloseFromDestructor() should perhaps not crash

2019-09-11 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6525:
-

 Summary: [C++] CloseFromDestructor() should perhaps not crash
 Key: ARROW-6525
 URL: https://issues.apache.org/jira/browse/ARROW-6525
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou


When a stream object fails to close in its destructor, CloseFromDestructor() 
will abort the process with a fatal error. This may not be desirable on e.g. 
networked filesystems where failing to close isn't uncommon. Perhaps we 
should just log an error instead.

(stream users should generally call Close() explicitly, but in some cases they 
may fail to do so, e.g. when an error interrupts processing)
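
As an illustration of the log-instead-of-abort policy, a minimal sketch 
follows. SketchStream, CloseResult and ErrorLogger are hypothetical names, and 
the injected close function only simulates a failing filesystem; none of this 
is the existing Arrow implementation.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical result type standing in for a status object.
struct CloseResult {
  bool ok;
  std::string message;
};

using ErrorLogger = std::function<void(const std::string&)>;

// Hypothetical stream whose close operation is injected, so that a
// failing close (e.g. on a networked filesystem) can be simulated.
class SketchStream {
 public:
  explicit SketchStream(
      std::function<CloseResult()> close_impl,
      ErrorLogger logger = [](const std::string& m) { std::cerr << m << "\n"; })
      : close_impl_(std::move(close_impl)), logger_(std::move(logger)) {
    assert(static_cast<bool>(close_impl_));
  }

  ~SketchStream() { CloseFromDestructor(); }

  // Explicit close: the caller receives the error and decides what to do.
  CloseResult Close() {
    if (closed_) return {true, ""};
    closed_ = true;
    return close_impl_();
  }

 private:
  // Log-and-continue policy: a failed close is reported but does not
  // terminate the process. The abort policy would call std::abort() here.
  void CloseFromDestructor() {
    if (closed_) return;
    try {
      CloseResult result = Close();
      if (!result.ok) {
        logger_("error ignored when closing stream in destructor: " +
                result.message);
      }
    } catch (...) {
      // Never let an exception escape a destructor.
    }
  }

  std::function<CloseResult()> close_impl_;
  ErrorLogger logger_;
  bool closed_ = false;
};
```

With this policy, a stream that was closed explicitly reports its error to the 
caller, while one that is destroyed without Close() only leaves a log entry.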





[jira] [Created] (ARROW-6524) [Developer][Packaging] Nightly build report's subject should contain Arrow

2019-09-11 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6524:
--

 Summary: [Developer][Packaging] Nightly build report's subject should contain Arrow
 Key: ARROW-6524
 URL: https://issues.apache.org/jira/browse/ARROW-6524
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools, Packaging
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 0.15.0


Something like: "[NIGHTLY] Arrow build report for job "


