[jira] [Resolved] (ARROW-5456) [GLib][Plasma] Installed plasma-glib may be used on building document

2019-05-31 Thread Yosuke Shiro (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yosuke Shiro resolved ARROW-5456.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4425
[https://github.com/apache/arrow/pull/4425]

> [GLib][Plasma] Installed plasma-glib may be used on building document
> -
>
> Key: ARROW-5456
> URL: https://issues.apache.org/jira/browse/ARROW-5456
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Affects Versions: 0.13.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2719) [Python/C++] ArrowSchema not hashable

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2719:

Fix Version/s: (was: 0.14.0)

> [Python/C++] ArrowSchema not hashable
> -
>
> Key: ARROW-2719
> URL: https://issues.apache.org/jira/browse/ARROW-2719
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Florian Jetter
>Priority: Minor
>
> The arrow schema is immutable and should provide a way of hashing itself. 
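As an illustration of the requested behavior, here is a stdlib-only Python
sketch (the class and field names are hypothetical, not pyarrow's actual API):
an immutable schema can derive a stable hash from its field names and types, so
equal schemas hash equally and can be used as dict keys or set members.

```python
# Hypothetical sketch of a hashable, immutable schema -- NOT pyarrow's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # e.g. "int64", "string"


@dataclass(frozen=True)
class Schema:
    fields: tuple  # tuple of Field; immutability makes hashing safe

    def __hash__(self):
        # Hash the (name, type) pairs so equal schemas hash equally.
        return hash(tuple((f.name, f.type) for f in self.fields))


a = Schema(fields=(Field("x", "int64"), Field("y", "string")))
b = Schema(fields=(Field("x", "int64"), Field("y", "string")))
assert a == b
assert hash(a) == hash(b)
assert len({a, b}) == 1  # deduplicates in a set, as a hashable type should
```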



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2256) [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2256:

Fix Version/s: (was: 0.14.0)

> [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos
> 
>
> Key: ARROW-2256
> URL: https://issues.apache.org/jira/browse/ARROW-2256
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> I did a clean upgrade to 16.04 on one of my machines and ran into the problem 
> described here:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866087
> I think this can be resolved temporarily by symlinking the static library, 
> but we should document the problem so other devs know what to do when it 
> happens



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2256) [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853514#comment-16853514
 ] 

Wes McKinney commented on ARROW-2256:
-

I can't get fuzzing working at all on Ubuntu 19.04. The error looks like this

{code}
$ ./debug/arrow-ipc-fuzzing-test 
INFO: Seed: 3163524211
INFO: Loaded 1 modules   (33926 guards): 33926 [0xd15918, 0xd36b30), 
INFO: Loaded 1 modules   (143 inline 8-bit counters): 143 [0xd36b30, 0xd36bbf), 
INFO: Loaded 1 PC tables (143 PCs): 143 [0xd36bc0,0xd374b0), 
ERROR: The size of coverage PC tables does not match the
number of instrumented PCs. This might be a compiler bug,
please contact the libFuzzer developers.
Also check https://bugs.llvm.org/show_bug.cgi?id=34636
for possible workarounds (tl;dr: don't use the old GNU ld)
{code}

There's a long thread about it here 
https://groups.google.com/forum/#!topic/llvm-dev/fnDXbyduLjw

and https://github.com/google/oss-fuzz/issues/1042

> [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos
> 
>
> Key: ARROW-2256
> URL: https://issues.apache.org/jira/browse/ARROW-2256
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> I did a clean upgrade to 16.04 on one of my machines and ran into the problem 
> described here:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866087
> I think this can be resolved temporarily by symlinking the static library, 
> but we should document the problem so other devs know what to do when it 
> happens



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2248) [Python] Nightly or on-demand HDFS test builds

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2248:

Labels: nightly  (was: )

> [Python] Nightly or on-demand HDFS test builds
> --
>
> Key: ARROW-2248
> URL: https://issues.apache.org/jira/browse/ARROW-2248
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: nightly
> Fix For: 0.14.0
>
>
> We continue to acquire more functionality related to HDFS and Parquet. 
> Testing this, including tests that involve interoperability with other 
> systems, like Spark, will require some work outside of our normal CI 
> infrastructure.
> I suggest we start with testing the C++/Python HDFS integration, which will 
> help with validating patches like ARROW-1643 
> https://github.com/apache/arrow/pull/1668



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2248) [Python] Nightly or on-demand HDFS test builds

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2248:

Fix Version/s: (was: 0.14.0)
   0.15.0

> [Python] Nightly or on-demand HDFS test builds
> --
>
> Key: ARROW-2248
> URL: https://issues.apache.org/jira/browse/ARROW-2248
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: nightly
> Fix For: 0.15.0
>
>
> We continue to acquire more functionality related to HDFS and Parquet. 
> Testing this, including tests that involve interoperability with other 
> systems, like Spark, will require some work outside of our normal CI 
> infrastructure.
> I suggest we start with testing the C++/Python HDFS integration, which will 
> help with validating patches like ARROW-1643 
> https://github.com/apache/arrow/pull/1668



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1581:

Fix Version/s: (was: 0.14.0)
   0.15.0

> [Python] Set up nightly wheel builds for Linux, macOS
> -
>
> Key: ARROW-1581
> URL: https://issues.apache.org/jira/browse/ARROW-1581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: nightly
> Fix For: 0.15.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1581:

Labels: nightly  (was: )

> [Python] Set up nightly wheel builds for Linux, macOS
> -
>
> Key: ARROW-1581
> URL: https://issues.apache.org/jira/browse/ARROW-1581
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: nightly
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1582) [Python] Set up + document nightly conda builds for macOS

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1582:

Labels: nightly  (was: )

> [Python] Set up + document nightly conda builds for macOS
> -
>
> Key: ARROW-1582
> URL: https://issues.apache.org/jira/browse/ARROW-1582
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: nightly
> Fix For: 0.14.0
>
>
> It's already been great to be able to test the nightlies on Linux in conda; 
> it would be great to be able to do the same on macOS



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1582) [Python] Set up + document nightly conda builds for macOS

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1582:

Fix Version/s: (was: 0.14.0)
   0.15.0

> [Python] Set up + document nightly conda builds for macOS
> -
>
> Key: ARROW-1582
> URL: https://issues.apache.org/jira/browse/ARROW-1582
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: nightly
> Fix For: 0.15.0
>
>
> It's already been great to be able to test the nightlies on Linux in conda; 
> it would be great to be able to do the same on macOS



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5289) [C++] Move arrow/util/concatenate.h to arrow/array/

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5289:
--
Labels: pull-request-available  (was: )

> [C++] Move arrow/util/concatenate.h to arrow/array/
> ---
>
> Key: ARROW-5289
> URL: https://issues.apache.org/jira/browse/ARROW-5289
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> I think this would be a better location for array/columnar algorithms
> Please wait until after ARROW-3144 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-843) [C++] Parquet merging unequal but equivalent schemas

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-843:
---
Fix Version/s: (was: 0.14.0)
   0.15.0

> [C++] Parquet merging unequal but equivalent schemas
> 
>
> Key: ARROW-843
> URL: https://issues.apache.org/jira/browse/ARROW-843
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: dataset, parquet
> Fix For: 0.15.0
>
>
> Some Parquet datasets may contain schemas with mixed REQUIRED/OPTIONAL 
> repetition types. While such schemas aren't strictly equal, we will need to 
> consider them equivalent on the read path



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5475) [Python] Add Python binding for arrow::Concatenate

2019-05-31 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5475:
---

 Summary: [Python] Add Python binding for arrow::Concatenate
 Key: ARROW-5475
 URL: https://issues.apache.org/jira/browse/ARROW-5475
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.15.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1279) Integration tests for Map type

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1279:
--
Labels: pull-request-available  (was: )

> Integration tests for Map type
> --
>
> Key: ARROW-1279
> URL: https://issues.apache.org/jira/browse/ARROW-1279
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Integration, Java
>Reporter: Wes McKinney
>Assignee: Siddharth Teotia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5474) [C++] What version of Boost do we require now?

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853481#comment-16853481
 ] 

Wes McKinney commented on ARROW-5474:
-

We don't have much regular feedback about older Boost versions; it mainly 
happens around releases. It would not be a huge effort, though, to set up a 
Docker job to test with the Boost versions coming from different Linux 
distributions. Gandiva doesn't build with Boost 1.54 on Ubuntu 14.04, for 
example; see ARROW-4868

> [C++] What version of Boost do we require now?
> --
>
> Key: ARROW-5474
> URL: https://issues.apache.org/jira/browse/ARROW-5474
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One 
> possible cause for that error is that the local filesystem patch increased 
> the version of boost that we actually require. The boost version (1.54 vs 
> 1.58) was one difference between failure and success. 
> Another point of confusion was that CMake reported two different versions of 
> boost at different times. 
> If we require a minimum version of boost, can we document that better, check 
> for it more accurately in the build scripts, and fail with a useful message 
> if that minimum isn't met? Or something else helpful.
> If the actual cause of the failure was something else (e.g. compiler 
> version), we should figure that out too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4482) [Website] Add blog archive page

2019-05-31 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-4482:
--

Assignee: Neal Richardson

> [Website] Add blog archive page
> ---
>
> Key: ARROW-4482
> URL: https://issues.apache.org/jira/browse/ARROW-4482
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.15.0
>
>
> There's no easy way to get a bulleted list of all blog posts on the Arrow 
> website. See example archive on my personal blog 
> http://wesmckinney.com/archives.html
> It would be great to have such a generated archive on our website



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5474) [C++] What version of Boost do we require now?

2019-05-31 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5474:
--

 Summary: [C++] What version of Boost do we require now?
 Key: ARROW-5474
 URL: https://issues.apache.org/jira/browse/ARROW-5474
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
Assignee: Antoine Pitrou
 Fix For: 0.14.0


See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One possible 
cause for that error is that the local filesystem patch increased the version 
of boost that we actually require. The boost version (1.54 vs 1.58) was one 
difference between failure and success. 

Another point of confusion was that CMake reported two different versions of 
boost at different times. 

If we require a minimum version of boost, can we document that better, check 
for it more accurately in the build scripts, and fail with a useful message if 
that minimum isn't met? Or something else helpful.

If the actual cause of the failure was something else (e.g. compiler version), 
we should figure that out too.
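One way to make the requirement explicit is sketched below as a hypothetical
CMake fragment (not Arrow's actual build scripts; the variable name is
illustrative): declare the minimum Boost version in one place and fail the
configure step early with an actionable message.

```cmake
# Hypothetical sketch, not Arrow's actual CMakeLists.txt: declare the
# minimum Boost version once and fail configuration with a clear message.
set(ARROW_BOOST_REQUIRED_VERSION "1.58")
find_package(Boost ${ARROW_BOOST_REQUIRED_VERSION} COMPONENTS filesystem system)
if(NOT Boost_FOUND)
  message(FATAL_ERROR
          "Arrow requires Boost >= ${ARROW_BOOST_REQUIRED_VERSION}. "
          "Set BOOST_ROOT to a suitable installation, or use a vendored "
          "Boost build.")
endif()
```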



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5470.
-
Resolution: Fixed

Issue resolved by pull request 4443
[https://github.com/apache/arrow/pull/4443]

> [CI] C++ local filesystem patch breaks Travis R job
> ---
>
> Key: ARROW-5470
> URL: https://issues.apache.org/jira/browse/ARROW-5470
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and 
> required downstream bindings to be updated. Romain wasn't immediately 
> available to update R, so we marked the R job on Travis as an "allowed 
> failure". That failure looked like this: 
> [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ 
> library built fine, but then the R package failed to build because it didn't 
> line up with what's in C++.
> Then, the C++ local file system patch 
> (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, 
> though we were still ignoring the R build, which continued to fail. But, it 
> started failing differently. Here's what the R build failure looks like on 
> that PR, and on master since then: 
> [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ 
> library is failing to build, so we're not even getting to the expected R 
> failure.
> For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar 
> setup to the R build, and it's still passing. One difference between the two 
> jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which 
> sounds related to some open R issues, and `boost::filesystem` appears all 
> over the error in the R job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-1642) [GLib] Build GLib using Meson in Appveyor

2019-05-31 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-1642.
-
   Resolution: Duplicate
 Assignee: Kouhei Sutou
Fix Version/s: 0.13.0

Arrow GLib's AppVeyor build was added by ARROW-4353.
It uses MinGW, not Visual Studio. We should create a new JIRA issue when we 
need a Visual Studio build.

> [GLib] Build GLib using Meson in Appveyor
> -
>
> Key: ARROW-1642
> URL: https://issues.apache.org/jira/browse/ARROW-1642
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Wes McKinney
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4159) [C++] Check for -Wdocumentation issues

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4159.
-
Resolution: Fixed

Issue resolved by pull request 4441
[https://github.com/apache/arrow/pull/4441]

> [C++] Check for -Wdocumentation issues 
> ---
>
> Key: ARROW-4159
> URL: https://issues.apache.org/jira/browse/ARROW-4159
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I fixed some -Wdocumentation issues in ARROW-4157 that showed up on one Linux 
> distribution but not another, both with clang-6.0. Not sure why that is 
> exactly, but it would be good to try to reproduce and see if our CI can be 
> improved to catch these, or in worst case we could do it in one of our 
> docker-compose builds



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4159) [C++] Check for -Wdocumentation issues

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4159:
---

Assignee: Wes McKinney

> [C++] Check for -Wdocumentation issues 
> ---
>
> Key: ARROW-4159
> URL: https://issues.apache.org/jira/browse/ARROW-4159
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I fixed some -Wdocumentation issues in ARROW-4157 that showed up on one Linux 
> distribution but not another, both with clang-6.0. Not sure why that is 
> exactly, but it would be good to try to reproduce and see if our CI can be 
> improved to catch these, or in worst case we could do it in one of our 
> docker-compose builds



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853433#comment-16853433
 ] 

Wes McKinney commented on ARROW-1983:
-

Yes. I don't think it is necessary to resolve all of this in a single patch, so 
we can open a follow-up JIRA to implement the optimization to read a row group 
given a _metadata file. There is some other complexity there such as how to 
open the filepath (you need a FileSystem handle -- see the filesystem API work 
that is in process)

> [Python] Add ability to write parquet `_metadata` file
> --
>
> Key: ARROW-1983
> URL: https://issues.apache.org/jira/browse/ARROW-1983
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Jim Crist
>Priority: Major
>  Labels: beginner, parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently {{pyarrow.parquet}} can only write the {{_common_metadata}} file 
> (mostly just schema information). It would be useful to add the ability to 
> write a {{_metadata}} file as well. This should include information about 
> each row group in the dataset, including summary statistics. Having this 
> summary file would allow filtering of row groups without needing to access 
> each file beforehand.
> This would require that the user is able to get the written RowGroups out of 
> a {{pyarrow.parquet.write_table}} call and then give these objects as a list 
> to new function that then passes them on as C++ objects to {{parquet-cpp}} 
> that generates the respective {{_metadata}} file.
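The workflow described above can be sketched with a stdlib-only Python analogue
(JSON stands in for Parquet, and every function name here is illustrative, not
pyarrow's actual API): each file write returns per-row-group statistics, which
are collected and written to a single `_metadata` summary that readers can use
to skip files without opening them.

```python
# Stdlib-only sketch of the _metadata idea -- NOT pyarrow's API.
import json
import os
import tempfile


def write_file(path, rows):
    # Hypothetical writer: persists rows and returns row-group metadata
    # with min/max statistics, mimicking what write_table would expose.
    with open(path, "w") as f:
        json.dump(rows, f)
    return {"file": os.path.basename(path),
            "num_rows": len(rows),
            "min": min(rows), "max": max(rows)}


def write_metadata_file(root, collected):
    # Aggregate the collected per-file metadata into one summary file.
    with open(os.path.join(root, "_metadata"), "w") as f:
        json.dump(collected, f)


root = tempfile.mkdtemp()
collected = [write_file(os.path.join(root, "part-%d.json" % i), rows)
             for i, rows in enumerate([[1, 2, 3], [10, 20]])]
write_metadata_file(root, collected)

# A reader can now filter row groups using only the summary statistics:
with open(os.path.join(root, "_metadata")) as f:
    summary = json.load(f)
hits = [m["file"] for m in summary if m["min"] <= 15 <= m["max"]]
assert hits == ["part-1.json"]  # part-0 (max 3) is skipped without reading it
```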



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853433#comment-16853433
 ] 

Wes McKinney edited comment on ARROW-1983 at 5/31/19 9:28 PM:
--

Yes. I don't think it is necessary to resolve all of this in a single patch, so 
we can open a follow-up JIRA to implement the optimization to read a row group 
given a _metadata file. There is some other complexity there such as how to 
open the filepath (you need a FileSystem handle -- see the filesystem API work 
that is in process)


was (Author: wesmckinn):
Yes. I don't think it necessarily to resolve all of this in a single patch, so 
we can open a follow-up JIRA to implement the optimization to read a row group 
given a _metadata file. There is some other complexity there such as how to 
open the filepath (you need a FileSystem handle -- see the filesystem API work 
that is in process)

> [Python] Add ability to write parquet `_metadata` file
> --
>
> Key: ARROW-1983
> URL: https://issues.apache.org/jira/browse/ARROW-1983
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Jim Crist
>Priority: Major
>  Labels: beginner, parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently {{pyarrow.parquet}} can only write the {{_common_metadata}} file 
> (mostly just schema information). It would be useful to add the ability to 
> write a {{_metadata}} file as well. This should include information about 
> each row group in the dataset, including summary statistics. Having this 
> summary file would allow filtering of row groups without needing to access 
> each file beforehand.
> This would require that the user is able to get the written RowGroups out of 
> a {{pyarrow.parquet.write_table}} call and then give these objects as a list 
> to new function that then passes them on as C++ objects to {{parquet-cpp}} 
> that generates the respective {{_metadata}} file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1837) [Java] Unable to read unsigned integers outside signed range for bit width in integration tests

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1837:

Priority: Major  (was: Blocker)

> [Java] Unable to read unsigned integers outside signed range for bit width in 
> integration tests
> ---
>
> Key: ARROW-1837
> URL: https://issues.apache.org/jira/browse/ARROW-1837
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Wes McKinney
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: columnar-format-1.0, pull-request-available
> Fix For: 0.14.0
>
> Attachments: generated_primitive.json
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I believe this was introduced recently (perhaps in the refactors), but there 
> was a problem where the integration tests weren't being properly run that hid 
> the error from us
> see https://github.com/apache/arrow/pull/1294#issuecomment-345553066



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3571) [Wiki] Release management guide does not explain how to set up Crossbow or where to find instructions

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3571:

Priority: Major  (was: Blocker)

> [Wiki] Release management guide does not explain how to set up Crossbow or 
> where to find instructions
> -
>
> Key: ARROW-3571
> URL: https://issues.apache.org/jira/browse/ARROW-3571
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Wiki
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 0.14.0
>
>
> If you follow the guide, at one point it says "Launch a Crossbow build" but 
> provides no link to the setup instructions for this



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2019-05-31 Thread Rick Zamora (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853430#comment-16853430
 ] 

Rick Zamora commented on ARROW-1983:


Right, I see what you are saying. You can pass in a list of files to 
pq.ParquetDataset (obtained by calling read_metadata on the metadata file), but 
the footer metadata will be unnecessarily parsed a second time. For Dask, this 
is probably not much of an issue, because each worker will only be dealing with 
a subset of the global dataset. In many other cases this is clearly 
undesirable.

 

> [Python] Add ability to write parquet `_metadata` file
> --
>
> Key: ARROW-1983
> URL: https://issues.apache.org/jira/browse/ARROW-1983
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Jim Crist
>Priority: Major
>  Labels: beginner, parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently {{pyarrow.parquet}} can only write the {{_common_metadata}} file 
> (mostly just schema information). It would be useful to add the ability to 
> write a {{_metadata}} file as well. This should include information about 
> each row group in the dataset, including summary statistics. Having this 
> summary file would allow filtering of row groups without needing to access 
> each file beforehand.
> This would require that the user is able to get the written RowGroups out of 
> a {{pyarrow.parquet.write_table}} call and then give these objects as a list 
> to new function that then passes them on as C++ objects to {{parquet-cpp}} 
> that generates the respective {{_metadata}} file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-5143) [Flight] Enable integration testing of batches with dictionaries

2019-05-31 Thread David Li (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853425#comment-16853425
 ] 

David Li edited comment on ARROW-5143 at 5/31/19 9:05 PM:
--

[~wesmckinn] I tried in [https://github.com/apache/arrow/pull/4282], the 
non-nested dictionary case worked with some additional effort, so that PR 
enables that. I didn't look into why the nested case still fails.


was (Author: lidavidm):
[~wesmckinn] I tried in [https://github.com/apache/arrow/pull/4282,] the 
non-nested dictionary case worked with some additional effort, so that PR 
enables that. I didn't look into why the nested case still fails.

> [Flight] Enable integration testing of batches with dictionaries
> 
>
> Key: ARROW-5143
> URL: https://issues.apache.org/jira/browse/ARROW-5143
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC, Integration
>Reporter: David Li
>Priority: Major
>  Labels: flight
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5143) [Flight] Enable integration testing of batches with dictionaries

2019-05-31 Thread David Li (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853425#comment-16853425
 ] 

David Li commented on ARROW-5143:
-

[~wesmckinn] I tried in [https://github.com/apache/arrow/pull/4282], the 
non-nested dictionary case worked with some additional effort, so that PR 
enables that. I didn't look into why the nested case still fails.

> [Flight] Enable integration testing of batches with dictionaries
> 
>
> Key: ARROW-5143
> URL: https://issues.apache.org/jira/browse/ARROW-5143
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC, Integration
>Reporter: David Li
>Priority: Major
>  Labels: flight
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5396) [JS] Ensure reader and writer support files and streams with no RecordBatches

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5396.
-
Resolution: Fixed

Issue resolved by pull request 4373
[https://github.com/apache/arrow/pull/4373]

> [JS] Ensure reader and writer support files and streams with no RecordBatches
> -
>
> Key: ARROW-5396
> URL: https://issues.apache.org/jira/browse/ARROW-5396
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.13.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Re: https://issues.apache.org/jira/browse/ARROW-2119 and 
> [https://github.com/apache/arrow/pull/3871], the JS reader and writer should 
> support files and streams with a Schema but no RecordBatches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5055) [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby

2019-05-31 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-5055:

Fix Version/s: 0.15.0

> [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby
> 
>
> Key: ARROW-5055
> URL: https://issues.apache.org/jira/browse/ARROW-5055
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Ruby
>Affects Versions: 0.12.1
> Environment: windows, MSYS2
>Reporter: Dominic Sisneros
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.15.0
>
>
> MSYS2 doesn't include the Parquet libraries, so we cannot use red-parquet, 
> which uses gobject-introspection against libparquet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5055) [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby

2019-05-31 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853423#comment-16853423
 ] 

Kouhei Sutou commented on ARROW-5055:
-

0.14.0 includes Parquet support in the MinGW build.
We can close this when 0.14.0 is released and 
https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-arrow/PKGBUILD is 
updated.

> [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby
> 
>
> Key: ARROW-5055
> URL: https://issues.apache.org/jira/browse/ARROW-5055
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Ruby
>Affects Versions: 0.12.1
> Environment: windows, MSYS2
>Reporter: Dominic Sisneros
>Assignee: Kouhei Sutou
>Priority: Major
>
> MSYS2 doesn't include the Parquet libraries, so we cannot use red-parquet, 
> which uses gobject-introspection against libparquet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5055) [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby

2019-05-31 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-5055:

Summary: [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby  
(was: [Ruby][Msys2] libparquet needs to be installed in Msys2 for ruby)

> [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby
> 
>
> Key: ARROW-5055
> URL: https://issues.apache.org/jira/browse/ARROW-5055
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Ruby
>Affects Versions: 0.12.1
> Environment: windows, MSYS2
>Reporter: Dominic Sisneros
>Assignee: Kouhei Sutou
>Priority: Major
>
> MSYS2 doesn't include the Parquet libraries, so we cannot use red-parquet, 
> which uses gobject-introspection against libparquet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4021) [Ruby] Error building red-arrow on msys2

2019-05-31 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-4021.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

I think that this is a pkg-config gem issue, and it has been fixed.

> [Ruby] Error building red-arrow on msys2
> 
>
> Key: ARROW-4021
> URL: https://issues.apache.org/jira/browse/ARROW-4021
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Ruby
> Environment: windows 7, ruby 
>Reporter: Dominic Sisneros
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.14.0
>
> Attachments: gem_make.out, mkmf.log
>
>
> Trying to install red-arrow on Ruby 2.5.3 and it doesn't compile. I 
> installed Arrow with MSYS2:
> "mingw64/mingw-w64-x86_64-arrow 0.11.1-1 [installed]
>  Apache Arrow is a cross-language development platform for in-memory data 
> (mingw-w64)"
> C:\Users\Dominic E 
> Sisneros\Documents\work_new\projects\nexcom\dominic\SLCI_RTR\drawings\working>ruby
>  --version
> ruby 2.5.3p105 (2018-10-18 revision 65156) [x64-mingw32]
> E:\Sisneros\Documents\work_new\projects\nexcom\dominic\SLCI_RTR\drawings\working>gem
>  install red-arrow
> Installing required msys2 packages: mingw-w64-x86_64-glib2
> mingw-w64-x86_64-glib2-2.58.1-1 is up to date -- skipping
> Building native extensions. This could take a while...
> ERROR:  Error installing red-arrow:
> ERROR: Failed to build gem native extension.
> nt directory: 
> E:/rubies/rubyinstaller-2.5.3-1-x64/lib/ruby/gems/2.5.0/gems/glib2-3.3.0/ext/glib2
> /rubyinstaller-2.5.3-1-x64/bin/ruby.exe -r 
> ./siteconf20181213-23396-1gomjgx.rb extconf.rb
> checking for --enable-debug-build option... no
> checking for -Wall option to compiler... yes
> checking for -Waggregate-return option to compiler... yes
> checking for -Wcast-align option to compiler... yes
> checking for -Wextra option to compiler... yes
> checking for -Wformat=2 option to compiler... yes
> checking for -Winit-self option to compiler... yes
> checking for -Wlarger-than-65500 option to compiler... yes
> checking for -Wmissing-declarations option to compiler... yes
> checking for -Wmissing-format-attribute option to compiler... yes
> checking for -Wmissing-include-dirs option to compiler... yes
> checking for -Wmissing-noreturn option to compiler... yes
> checking for -Wmissing-prototypes option to compiler... yes
> checking for -Wnested-externs option to compiler... yes
> checking for -Wold-style-definition option to compiler... yes
> checking for -Wpacked option to compiler... yes
> checking for -Wp,-D_FORTIFY_SOURCE=2 option to compiler... yes
> checking for -Wpointer-arith option to compiler... yes
> checking for -Wswitch-default option to compiler... yes
> checking for -Wswitch-enum option to compiler... yes
> checking for -Wundef option to compiler... yes
> checking for -Wout-of-line-declaration option to compiler... no
> checking for -Wunsafe-loop-optimizations option to compiler... yes
> checking for -Wwrite-strings option to compiler... yes
> checking for Windows... yes
> checking for gobject-2.0 version (>= 2.12.0)... yes
> checking for gthread-2.0... yes
> checking for unistd.h... yes
> checking for io.h... yes
> checking for g_spawn_close_pid() in glib.h... no
> checking for g_thread_init() in glib.h... no
> checking for g_main_depth() in glib.h... no
> checking for g_listenv() in glib.h... no
> checking for rb_check_array_type() in ruby.h... yes
> checking for rb_check_hash_type() in ruby.h... yes
> checking for rb_exec_recursive() in ruby.h... yes
> checking for rb_errinfo() in ruby.h... yes
> checking for rb_thread_call_without_gvl() in ruby.h... yes
> checking for ruby_native_thread_p() in ruby.h... yes
> checking for rb_thread_call_with_gvl() in ruby.h... yes
> checking for rb_gc_register_mark_object() in ruby.h... yes
> checking for rb_exc_new_str() in ruby.h... yes
> checking for rb_enc_str_new_static() in ruby.h... yes
> checking for curr_thread in ruby.h,node.h... no
> checking for rb_curr_thread in ruby.h,node.h... no
> ruby-glib2.pc
> glib-enum-types.c
> glib-enum-types.h
> Makefile
> irectory: 
> E:/rubies/rubyinstaller-2.5.3-1-x64/lib/ruby/gems/2.5.0/gems/glib2-3.3.0/ext/glib2
> TDIR=" clean
> irectory: 
> E:/rubies/rubyinstaller-2.5.3-1-x64/lib/ruby/gems/2.5.0/gems/glib2-3.3.0/ext/glib2
> TDIR="
>  glib-enum-types.c
>  rbglib-bytes.c
>  rbglib-gc.c
> .c: In function 'gc_marker_mark_each':
> .c:26:30: warning: unused parameter 'key' [-Wunused-parameter]
> r_mark_each(gpointer key, gpointer value, gpointer user_data)
>  ~^~~
> .c:26:60: warning: unused parameter 'user_data' [-Wunused-parameter]
> r_mark_each(gpointer key, gpointer value, gpointer user_data)
>  ~^
> .c: At top level:
> .c:53:5: warning: missing initializer for field 'reserved' of 'struct 
> ' [-Wmissing-field-initializers]
> ncluded from E:/rubies/rubyinstaller-2.5.3-1-x64/include/ruby-2.5.0/ruby.h:33,
>  from rbgobject.h:27,
>  from rbgprivate.h:33,
>  from rbglib-gc.c:21:
> /rubyinstaller-2.5.3-1-x64/include/ruby-2.5.0/ruby/ruby.h:1088:8: note: 
> 'reserved' declared here
> eserved[2]; /* For future extension.
> ~~~
>  rbglib-variant-type.c
>  rbglib-variant.c
>  rbglib.c
>  In function 

[jira] [Assigned] (ARROW-5055) [Ruby][Msys2] libparquet needs to be installed in Msys2 for ruby

2019-05-31 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-5055:
---

Assignee: Kouhei Sutou

> [Ruby][Msys2] libparquet needs to be installed in Msys2 for ruby
> 
>
> Key: ARROW-5055
> URL: https://issues.apache.org/jira/browse/ARROW-5055
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Ruby
>Affects Versions: 0.12.1
> Environment: windows, MSYS2
>Reporter: Dominic Sisneros
>Assignee: Kouhei Sutou
>Priority: Major
>
> MSYS2 doesn't include the Parquet libraries, so we cannot use red-parquet, 
> which uses gobject-introspection against libparquet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5345) [C++] Relax Field hashing in DictionaryMemo

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5345:

Fix Version/s: (was: 0.14.0)

> [C++] Relax Field hashing in DictionaryMemo
> ---
>
> Key: ARROW-5345
> URL: https://issues.apache.org/jira/browse/ARROW-5345
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> Follow up to ARROW-3144
> Currently we associate dictionaries with a hash table mapping a Field's 
> memory address to a dictionary id. This poses an issue if two RecordBatches 
> are equal (equal field names, equal types) but were instantiated separately. 
> We don't have a hash function for Field in C++, so we should consider 
> implementing one and using it instead (if it is not too expensive), so that 
> fields that are equal but "different" (distinct C++ objects) won't blow up in 
> the user's face with an unintuitive error. This did in fact occur once in the 
> Python test suite; I'm not sure exactly why it wasn't a problem before, I 
> think it worked "by accident".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5138) [Python/C++] Row group retrieval doesn't restore index properly

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853418#comment-16853418
 ] 

Wes McKinney commented on ARROW-5138:
-

I think we should change the RangeIndex optimization so it only applies to a 
trivial RangeIndex starting at 0 with step 1. Then this issue is resolved.

> [Python/C++] Row group retrieval doesn't restore index properly
> ---
>
> Key: ARROW-5138
> URL: https://issues.apache.org/jira/browse/ARROW-5138
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.13.0
>Reporter: Florian Jetter
>Priority: Minor
>  Labels: parquet
> Fix For: 0.14.0
>
>
> When retrieving row groups, the index is no longer properly restored to its 
> initial value and is set to a RangeIndex starting at zero no matter what. 
> Version 0.12.1 restored an Int64Index with the correct index values.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> print(pa.__version__)
> df = pd.DataFrame(
> {"a": [1, 2, 3, 4]}
> )
> print("total DF")
> print(df.index)
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf, chunk_size=2)
> reader = pa.BufferReader(buf.getvalue().to_pybytes())
> parquet_file = pq.ParquetFile(reader)
> rg = parquet_file.read_row_group(1)
> df_restored = rg.to_pandas()
> print("Row group")
> print(df_restored.index)
> {code}
> Previous behavior
> {code:python}
> 0.12.1
> total DF
> RangeIndex(start=0, stop=4, step=1)
> Row group
> Int64Index([2, 3], dtype='int64')
> {code}
> Behavior now
> {code:python}
> 0.13.0
> total DF
> RangeIndex(start=0, stop=4, step=1)
> Row group
> RangeIndex(start=0, stop=2, step=1)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5082) [Python][Packaging] Reduce size of macOS and manylinux1 wheels

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853417#comment-16853417
 ] 

Wes McKinney commented on ARROW-5082:
-

In investigating the wheels I found that something is wrong with the shared 
library symlinks, causing the shared libs to be duplicated. I do not know how 
much this impacts the final wheel size. Until that issue is fixed, at least, 
I'm not comfortable releasing the project again.

> [Python][Packaging] Reduce size of macOS and manylinux1 wheels
> --
>
> Key: ARROW-5082
> URL: https://issues.apache.org/jira/browse/ARROW-5082
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Blocker
> Fix For: 0.14.0
>
>
> The wheels more than tripled in size from 0.12.0 to 0.13.0. I think this is 
> mostly because of LLVM but we should take a closer look to see if the size 
> can be reduced



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-5082) [Python][Packaging] Reduce size of macOS and manylinux1 wheels

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853417#comment-16853417
 ] 

Wes McKinney edited comment on ARROW-5082 at 5/31/19 8:52 PM:
--

In investigating the wheels I found that something is wrong with the shared 
library symlinks causing the shared libs to be duplicated. I do not know how 
much this impacts the final wheel size. Until that issue is fixed at least, I'm 
not comfortable releasing the project again


was (Author: wesmckinn):
In investigating the wheels I found that something is wrong wrong with the 
shared library symlinks causing the shared libs to be duplicated. I do not know 
how much this impacts the final wheel size. Until that issue is fixed at least, 
I'm not comfortable releasing the project again

> [Python][Packaging] Reduce size of macOS and manylinux1 wheels
> --
>
> Key: ARROW-5082
> URL: https://issues.apache.org/jira/browse/ARROW-5082
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Blocker
> Fix For: 0.14.0
>
>
> The wheels more than tripled in size from 0.12.0 to 0.13.0. I think this is 
> mostly because of LLVM but we should take a closer look to see if the size 
> can be reduced



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5082) [Python][Packaging] Reduce size of macOS and manylinux1 wheels

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5082:

Priority: Blocker  (was: Major)

> [Python][Packaging] Reduce size of macOS and manylinux1 wheels
> --
>
> Key: ARROW-5082
> URL: https://issues.apache.org/jira/browse/ARROW-5082
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Blocker
> Fix For: 0.14.0
>
>
> The wheels more than tripled in size from 0.12.0 to 0.13.0. I think this is 
> mostly because of LLVM but we should take a closer look to see if the size 
> can be reduced



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5415) [Release] Release script should update R version everywhere

2019-05-31 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-5415:
--

Assignee: Neal Richardson

> [Release] Release script should update R version everywhere
> ---
>
> Key: ARROW-5415
> URL: https://issues.apache.org/jira/browse/ARROW-5415
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: release
> Fix For: 0.14.0
>
>
> See [https://github.com/apache/arrow/pull/4322#discussion_r287151330]. There 
> are probably other places that should be updated (NEWS.md, which doesn't yet 
> exist but needs to).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5033) [C++] JSON table writer

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-5033:
---

Assignee: (was: Benjamin Kietzman)

> [C++] JSON table writer
> ---
>
> Key: ARROW-5033
> URL: https://issues.apache.org/jira/browse/ARROW-5033
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
>
> Users who need to emit JSON in line-delimited format currently cannot do so 
> using Arrow. It should be straightforward to implement this efficiently, and 
> it will be very helpful for testing and benchmarking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5033) [C++] JSON table writer

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5033:

Fix Version/s: (was: 0.14.0)

> [C++] JSON table writer
> ---
>
> Key: ARROW-5033
> URL: https://issues.apache.org/jira/browse/ARROW-5033
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Minor
>
> Users who need to emit JSON in line-delimited format currently cannot do so 
> using Arrow. It should be straightforward to implement this efficiently, and 
> it will be very helpful for testing and benchmarking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5036) [Plasma][C++] Serialization tests resort to memcpy to check equality

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5036:

Fix Version/s: (was: 0.14.0)

> [Plasma][C++] Serialization tests resort to memcpy to check equality
> 
>
> Key: ARROW-5036
> URL: https://issues.apache.org/jira/browse/ARROW-5036
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma
>Reporter: Francois Saint-Jacques
>Priority: Major
>
> {code:bash}
> 1: 
> /tmp/arrow-0.13.0.Q4czW/apache-arrow-0.13.0/cpp/src/plasma/test/serialization_tests.cc:193:
>  Failure
> 1: Expected equality of these values:
> 1:   memcmp(_objects[object_ids[0]], _objects_return[0], 
> sizeof(PlasmaObject))
> 1: Which is: 45
> 1:   0
> 1: [  FAILED  ] PlasmaSerialization.GetReply (0 ms)
> {code}
> The source of the problem is the stack-allocated random_plasma_object. As a 
> fix, I propose that PlasmaObject implement the `operator==` method and that 
> the memcmp equality check be dropped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5028) [Python][C++] Arrow to Parquet conversion drops and corrupts values

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853414#comment-16853414
 ] 

Wes McKinney commented on ARROW-5028:
-

[~marco.neumann.by] have you been able to make any progress with this?

> [Python][C++] Arrow to Parquet conversion drops and corrupts values
> ---
>
> Key: ARROW-5028
> URL: https://issues.apache.org/jira/browse/ARROW-5028
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1, 0.13.0
> Environment: python 3.6
>Reporter: Marco Neumann
>Priority: Major
>  Labels: parquet
> Fix For: 0.14.0
>
> Attachments: dct.pickle.gz
>
>
> I am sorry if this bug feels rather long and the reproduction data is large, 
> but I was not able to reduce the data any further while still triggering the 
> problem. I was able to trigger this behavior on master and on {{0.11.1}}.
> {code:python}
> import io
> import os.path
> import pickle
> import numpy as np
> import pyarrow as pa
> import pyarrow.parquet as pq
> def dct_to_table(index_dct):
> labeled_array = pa.array(np.array(list(index_dct.keys())))
> partition_array = pa.array(np.array(list(index_dct.values())))
> return pa.Table.from_arrays(
> [labeled_array, partition_array], names=['a', 'b']
> )
> def check_pq_nulls(data):
> fp = io.BytesIO(data)
> pfile = pq.ParquetFile(fp)
> assert pfile.num_row_groups == 1
> md = pfile.metadata.row_group(0)
> col = md.column(1)
> assert col.path_in_schema == 'b.list.item'
> assert col.statistics.null_count == 0  # fails
> def roundtrip(table):
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> data = buf.getvalue().to_pybytes()
> # this fails:
> #   check_pq_nulls(data)
> reader = pa.BufferReader(data)
> return pq.read_table(reader)
> with open(os.path.join(os.path.dirname(__file__), 'dct.pickle'), 'rb') as fp:
> dct = pickle.load(fp)
> # this does NOT help:
> #   pa.set_cpu_count(1)
> #   import gc; gc.disable()
> table = dct_to_table(dct)
> # this fixes the issue:
> #   table = pa.Table.from_pandas(table.to_pandas())
> table2 = roundtrip(table)
> assert table.column('b').null_count == 0
> assert table2.column('b').null_count == 0  # fails
> # if table2 is converted to pandas, you can also observe that some values at 
> the end of column b are `['']` which clearly is not present in the original 
> data
> {code}
> I would also be thankful for any pointers on where the bug comes from or on 
> how to reduce the test case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4999) [Doc][C++] Add examples on how to construct with ArrayData::Make instead of builder classes

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4999:

Summary: [Doc][C++] Add examples on how to construct with ArrayData::Make 
instead of builder classes  (was: [Doc] Add examples on how to construct with 
ArrayData::Make instead of builder classes)

> [Doc][C++] Add examples on how to construct with ArrayData::Make instead of 
> builder classes
> ---
>
> Key: ARROW-4999
> URL: https://issues.apache.org/jira/browse/ARROW-4999
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Francois Saint-Jacques
>Priority: Minor
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4888) [C++/Python] Test build with conda's defaults channel

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4888:

Fix Version/s: (was: 0.14.0)

> [C++/Python] Test build with conda's defaults channel
> -
>
> Key: ARROW-4888
> URL: https://issues.apache.org/jira/browse/ARROW-4888
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Packaging, Python
>Reporter: Uwe L. Korn
>Priority: Major
>
> We mostly use {{conda-forge}} as the developers of Arrow, but we also have 
> some users who would build with packages from {{defaults}}. As the versions 
> of packages there are a bit behind (and sometimes the contents differ), we 
> should also have a docker test for this channel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4917:

Fix Version/s: (was: 0.14.0)

> [C++] orc_ep fails in cpp-alpine docker
> ---
>
> Key: ARROW-4917
> URL: https://issues.apache.org/jira/browse/ARROW-4917
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Major
>
> Failure:
> {code:java}
> FAILED: c++/src/CMakeFiles/orc.dir/Timezone.cc.o
> /usr/bin/g++ -Ic++/include -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/include 
> -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem 
> /build/cpp/snappy_ep/src/snappy_ep-install/include -isystem 
> c++/libs/thirdparty/zlib_ep-install/include -isystem 
> c++/libs/thirdparty/lz4_ep-install/include -isystem 
> /arrow/cpp/thirdparty/protobuf_ep-install/include -fdiagnostics-color=always 
> -ggdb -O0 -g -fPIC -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror 
> -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror -O0 -g -MD -MT 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -MF 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o.d -o 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -c 
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc: In member function 
> 'void orc::TimezoneImpl::parseTimeVariants(const unsigned char*, uint64_t, 
> uint64_t, uint64_t, uint64_t)':
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: error: 'uint' 
> was not declared in this scope
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: note: 
> suggested alternative: 'rint'
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> rint
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: error: 
> 'nameStart' was not declared in this scope
> if (nameStart >= nameCount) {
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: note: 
> suggested alternative: 'nameCount'
> if (nameStart >= nameCount) {
> ^
> nameCount
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: error: 
> 'nameStart' was not declared in this scope
> + nameOffset + nameStart);
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: note: 
> suggested alternative: 'nameCount'
> + nameOffset + nameStart);
> ^
> nameCount{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4999) [Doc] Add examples on how to construct with ArrayData::Make instead of builder classes

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4999:

Component/s: C++

> [Doc] Add examples on how to construct with ArrayData::Make instead of 
> builder classes
> --
>
> Key: ARROW-4999
> URL: https://issues.apache.org/jira/browse/ARROW-4999
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Francois Saint-Jacques
>Priority: Minor
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4960) [R] Add crossbow task for r-arrow-feedstock

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4960:

Fix Version/s: (was: 0.14.0)

> [R] Add crossbow task for r-arrow-feedstock
> ---
>
> Key: ARROW-4960
> URL: https://issues.apache.org/jira/browse/ARROW-4960
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, R
>Reporter: Uwe L. Korn
>Priority: Major
>
> We also have an R package on conda-forge now: 
> [https://github.com/conda-forge/r-arrow-feedstock] This should be tested 
> using crossbow as we do with the other packages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-4884) [C++] conda-forge thrift-cpp package not available via pkg-config or cmake

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4884.
---
Resolution: Not A Problem

> [C++] conda-forge thrift-cpp package not available via pkg-config or cmake
> --
>
> Key: ARROW-4884
> URL: https://issues.apache.org/jira/browse/ARROW-4884
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Artifact of CMake refactor
> I opened https://github.com/conda-forge/thrift-cpp-feedstock/issues/35 about 
> investigating why Thrift does not export the correct files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4868) [C++][Gandiva] Build fails with system Boost on Ubuntu Trusty 14.04

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4868:

Fix Version/s: (was: 0.14.0)

> [C++][Gandiva] Build fails with system Boost on Ubuntu Trusty 14.04
> ---
>
> Key: ARROW-4868
> URL: https://issues.apache.org/jira/browse/ARROW-4868
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Wes McKinney
>Priority: Major
>
> It would be nice for things to work out of the box, but maybe not worth it. I 
> can use vendored Boost for now
> {code}
> /usr/include/boost/functional/hash/extensions.hpp:269:20: error: no matching 
> function for call to 'hash_value'
> return hash_value(val);
>^~
> /usr/include/boost/functional/hash/hash.hpp:249:17: note: in instantiation of 
> member function 'boost::hash 
> >::operator()' requested here
> seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
> ^
> /home/wesm/code/arrow/cpp/src/gandiva/filter_cache_key.h:40:12: note: in 
> instantiation of function template specialization 
> 'boost::hash_combine >' requested here
> boost::hash_combine(result, configuration);
>^
> /usr/include/boost/functional/hash/extensions.hpp:70:17: note: candidate 
> template ignored: could not match 'pair' against 'shared_ptr'
> std::size_t hash_value(std::pair<A, B> const& v)
> ^
> /usr/include/boost/functional/hash/extensions.hpp:79:17: note: candidate 
> template ignored: could not match 'vector' against 'shared_ptr'
> std::size_t hash_value(std::vector<T, A> const& v)
> ^
> /usr/include/boost/functional/hash/extensions.hpp:85:17: note: candidate 
> template ignored: could not match 'list' against 'shared_ptr'
> std::size_t hash_value(std::list<T, A> const& v)
> ^
> /usr/include/boost/functional/hash/extensions.hpp:91:17: note: candidate 
> template ignored: could not match 'deque' against 'shared_ptr'
> std::size_t hash_value(std::deque<T, A> const& v)
> ^
> /usr/include/boost/functional/hash/extensions.hpp:97:17: note: candidate 
> template ignored: could not match 'set' against 'shared_ptr'
> std::size_t hash_value(std::set<K, C, A> const& v)
> ^
> /usr/include/boost/functional/hash/extensions.hpp:103:17: note: candidate 
> template ignored: could not match 'multiset' against 'shared_ptr'
> std::size_t hash_value(std::multiset<K, C, A> const& v)
> ^
> /usr/include/boost/functional/hash/extensions.hpp:109:17: note: candidate 
> template ignored: could not match 'map' against 'shared_ptr'
> std::size_t hash_value(std::map<K, T, C, A> const& v)
> ^
> /usr/include/boost/functional/hash/extensions.hpp:115:17: note: candidate 
> template ignored: could not match 'multimap' against 'shared_ptr'
> std::size_t hash_value(std::multimap<K, T, C, A> const& v)
> ^
> /usr/include/boost/functional/hash/extensions.hpp:121:17: note: candidate 
> template ignored: could not match 'complex' against 'shared_ptr'
> std::size_t hash_value(std::complex<T> const& v)
> ^
> /usr/include/boost/functional/hash/hash.hpp:187:57: note: candidate template 
> ignored: substitution failure [with T = 
> std::shared_ptr]: no type named 'type' in 
> 'boost::hash_detail::basic_numbers >'
> typename boost::hash_detail::basic_numbers<T>::type hash_value(T v)
> ^
> /usr/include/boost/functional/hash/hash.hpp:193:56: note: candidate template 
> ignored: substitution failure [with T = 
> std::shared_ptr]: no type named 'type' in 
> 'boost::hash_detail::long_numbers >'
> typename boost::hash_detail::long_numbers<T>::type hash_value(T v)
>    ^
> /usr/include/boost/functional/hash/hash.hpp:199:57: note: candidate template 
> ignored: substitution failure [with T = 
> std::shared_ptr]: no type named 'type' in 
> 'boost::hash_detail::ulong_numbers >'
> typename boost::hash_detail::ulong_numbers<T>::type hash_value(T v)
> ^
> /usr/include/boost/functional/hash/hash.hpp:205:31: note: candidate template 
> ignored: disabled by 'enable_if' [with T = 
> std::shared_ptr]
> typename boost::enable_if<boost::is_enum<T>, std::size_t>::type
>   ^
> /usr/include/boost/functional/hash/hash.hpp:213:36: note: candidate template 
> ignored: could not match 'T *const' against 'const 
> std::shared_ptr'
> template <class T> std::size_t hash_value(T* const& v)
>^
> /usr/include/boost/functional/hash/hash.hpp:306:24: note: candidate template 
> ignored: could not match 'const T [N]' against 'const 
> std::shared_ptr'
> inline 
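The candidate dump above comes from boost::hash finding no hash_value overload for a std::shared_ptr-based type; the same hashability gap is what ARROW-2719 ([Python/C++] ArrowSchema not hashable) requests be fixed on the Python side. A stdlib-only sketch of the idea, with illustrative names that are not the pyarrow API: an immutable schema-like object can safely hash by content.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical stand-in for an immutable Arrow-like schema: a tuple of
# (field name, type name) pairs. Because the object is immutable, it can
# define __hash__ in terms of its contents, which is what ARROW-2719
# asks of pyarrow's Schema.
@dataclass(frozen=True)
class Schema:
    fields: Tuple[Tuple[str, str], ...]

    def __hash__(self) -> int:
        return hash(self.fields)

s1 = Schema((("id", "int64"), ("name", "utf8")))
s2 = Schema((("id", "int64"), ("name", "utf8")))
assert hash(s1) == hash(s2)   # equal schemas hash equally
assert s1 == s2
d = {s1: "table_a"}           # usable as a dict key
assert d[s2] == "table_a"
```

The same reasoning applies in C++: a `hash_value` overload (or `std::hash` specialization) for the schema type would satisfy Boost's lookup above.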

[jira] [Updated] (ARROW-3975) Find a better organizational scheme for inter-language integration / protocol tests and integration tests between Apache Arrow and third party projects

2019-05-31 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-3975:
---
Component/s: Continuous Integration

> Find a better organizational scheme for inter-language integration / protocol 
> tests and integration tests between Apache Arrow and third party projects
> ---
>
> Key: ARROW-3975
> URL: https://issues.apache.org/jira/browse/ARROW-3975
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Some integration tests with 3rd party projects have gotten moved to 
> https://github.com/apache/arrow/tree/master/integration which doesn't look 
> right to me. I suggest we either find a new home in the codebase for the 
> protocol integration tests or move the 3rd party integration tests somewhere 
> else



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5452) [R] Add documentation website (pkgdown)

2019-05-31 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5452:
---
Component/s: R
 Documentation

> [R] Add documentation website (pkgdown)
> ---
>
> Key: ARROW-5452
> URL: https://issues.apache.org/jira/browse/ARROW-5452
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> pkgdown ([https://pkgdown.r-lib.org/]) is the standard for R package 
> documentation websites. Build this for arrow and deploy it at 
> https://arrow.apache.org/docs/r.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4875) [C++] MSVC Boost warnings after CMake refactor on cmake 3.12

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4875:

Fix Version/s: (was: 0.14.0)

> [C++] MSVC Boost warnings after CMake refactor on cmake 3.12
> 
>
> Key: ARROW-4875
> URL: https://issues.apache.org/jira/browse/ARROW-4875
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> I haven't investigated if this was present before the refactor, but since we 
> set {{Boost_ADDITIONAL_VERSIONS}} in theory this "scary" warning should not 
> show up
> {code}
> CMake Warning at C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:847
>  (message):
>   New Boost version may have incorrect or missing dependencies and imported
>   targets
> Call Stack (most recent call first):
>   C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:959
>  (_Boost_COMPONENT_DEPENDENCIES)
>   C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:1618
>  (_Boost_MISSING_DEPENDENCIES)
>   cmake_modules/ThirdpartyToolchain.cmake:1893 (find_package)
>   CMakeLists.txt:536 (include)
> CMake Warning at C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:847
>  (message):
>   New Boost version may have incorrect or missing dependencies and imported
>   targets
> Call Stack (most recent call first):
>   C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:959
>  (_Boost_COMPONENT_DEPENDENCIES)
>   C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:1618
>  (_Boost_MISSING_DEPENDENCIES)
>   cmake_modules/ThirdpartyToolchain.cmake:1893 (find_package)
>   CMakeLists.txt:536 (include)
> CMake Warning at C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:847
>  (message):
>   New Boost version may have incorrect or missing dependencies and imported
>   targets
> Call Stack (most recent call first):
>   C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:959
>  (_Boost_COMPONENT_DEPENDENCIES)
>   C:/Program Files (x86)/Microsoft Visual 
> Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:1618
>  (_Boost_MISSING_DEPENDENCIES)
>   cmake_modules/ThirdpartyToolchain.cmake:1893 (find_package)
>   CMakeLists.txt:536 (include)
> -- Boost version: 1.69.0
> -- Found the following Boost libraries:
> --   regex
> --   system
> --   filesystem
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4864) [C++] gandiva-micro_benchmarks is broken in MSVC build

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4864:

Fix Version/s: (was: 0.14.0)

> [C++] gandiva-micro_benchmarks is broken in MSVC build
> --
>
> Key: ARROW-4864
> URL: https://issues.apache.org/jira/browse/ARROW-4864
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Pindikura Ravindra
>Priority: Major
>
> Not a blocking issue for 0.13. I encountered this when debugging the CMake 
> refactor branch with Visual Studio 2015



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4830) [Python] Remove backward compatibility hacks from pyarrow.pandas_compat

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4830:

Fix Version/s: (was: 0.14.0)

> [Python] Remove backward compatibility hacks from pyarrow.pandas_compat
> ---
>
> Key: ARROW-4830
> URL: https://issues.apache.org/jira/browse/ARROW-4830
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>
> This code is growing less maintainable. I think we can remove these backwards 
> compatibility hacks since there are released versions of pyarrow that can be 
> used to read old metadata and "fix" Parquet files if need be



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job

2019-05-31 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5470:
---
Component/s: Continuous Integration

> [CI] C++ local filesystem patch breaks Travis R job
> ---
>
> Key: ARROW-5470
> URL: https://issues.apache.org/jira/browse/ARROW-5470
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and 
> required downstream bindings to be updated. Romain wasn't immediately 
> available to update R, so we marked the R job on Travis as an "allowed 
> failure". That failure looked like this: 
> [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ 
> library built fine, but then the R package failed to build because it didn't 
> line up with what's in C++.
> Then, the C++ local file system patch 
> (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, 
> though we were still ignoring the R build, which continued to fail. But, it 
> started failing differently. Here's what the R build failure looks like on 
> that PR, and on master since then: 
> [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ 
> library is failing to build, so we're not even getting to the expected R 
> failure.
> For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar 
> setup to the R build, and it's still passing. One difference between the two 
> jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which 
> sounds related to some open R issues, and `boost::filesystem` appears all 
> over the error in the R job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4860) [C++] Build AWS C++ SDK for Windows in conda-forge

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4860:

Fix Version/s: (was: 0.14.0)
   0.15.0

> [C++] Build AWS C++ SDK for Windows in conda-forge
> --
>
> Key: ARROW-4860
> URL: https://issues.apache.org/jira/browse/ARROW-4860
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: filesystem
> Fix For: 0.15.0
>
>
> We need the aws-sdk-cpp package to be able to use the C++ SDK for S3 support. 
> It is currently available for Linux and macOS



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4809) [Python] import error with undefined symbol _ZNK5arrow6Status8ToStringB5cxx11Ev

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853411#comment-16853411
 ] 

Wes McKinney commented on ARROW-4809:
-

I'm inclined to close this unless there is something we can do in the Arrow 
project to help

> [Python] import error with undefined symbol 
> _ZNK5arrow6Status8ToStringB5cxx11Ev
> ---
>
> Key: ARROW-4809
> URL: https://issues.apache.org/jira/browse/ARROW-4809
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1
> Environment: RHELS 6.10; Python 3.7.2
>Reporter: David Schwab
>Priority: Major
>
> I installed conda 4.5.12 and created a new environment named test-env. I 
> activated this environment and installed several packages with conda, 
> including pyarrow. When I run a Python shell and import pyarrow, I get the 
> following error:
>  
> {code:java}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/test-env/lib/python3.7/site-packages/pyarrow/__init__.py", line 54, 
> in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> /test-env/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so:
>  undefined symbol: _ZNK5arrow6Status8ToStringB5cxx11Ev
> {code}
> From Googling, I believe this has to do with the compiler flags used to build 
> either pyarrow or one of its dependencies (libboost has been suggested); I 
> can build the package from source if I need to, but I'm not sure what flags I 
> would need to set to fix the error.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4810) [Format][C++] Add "LargeList" type with 64-bit offsets

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4810:

Fix Version/s: (was: 0.14.0)
   0.15.0

> [Format][C++] Add "LargeList" type with 64-bit offsets
> --
>
> Key: ARROW-4810
> URL: https://issues.apache.org/jira/browse/ARROW-4810
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Reporter: Wes McKinney
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Mentioned in https://github.com/apache/arrow/issues/3845



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4809) [Python] import error with undefined symbol _ZNK5arrow6Status8ToStringB5cxx11Ev

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4809:

Fix Version/s: (was: 0.14.0)

> [Python] import error with undefined symbol 
> _ZNK5arrow6Status8ToStringB5cxx11Ev
> ---
>
> Key: ARROW-4809
> URL: https://issues.apache.org/jira/browse/ARROW-4809
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1
> Environment: RHELS 6.10; Python 3.7.2
>Reporter: David Schwab
>Priority: Major
>
> I installed conda 4.5.12 and created a new environment named test-env. I 
> activated this environment and installed several packages with conda, 
> including pyarrow. When I run a Python shell and import pyarrow, I get the 
> following error:
>  
> {code:java}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/test-env/lib/python3.7/site-packages/pyarrow/__init__.py", line 54, 
> in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> /test-env/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so:
>  undefined symbol: _ZNK5arrow6Status8ToStringB5cxx11Ev
> {code}
> From Googling, I believe this has to do with the compiler flags used to build 
> either pyarrow or one of its dependencies (libboost has been suggested); I 
> can build the package from source if I need to, but I'm not sure what flags I 
> would need to set to fix the error.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4798) [C++] Re-enable runtime/references cpplint check

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4798:

Fix Version/s: (was: 0.14.0)

> [C++] Re-enable runtime/references cpplint check
> 
>
> Key: ARROW-4798
> URL: https://issues.apache.org/jira/browse/ARROW-4798
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> This will help keep the codebase clean.
> We might consider defining some custom filters for cpplint warnings we want 
> to suppress; for example, cpplint flags {{benchmark::State&}} because of the 
> non-const reference



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4787) [C++] Include "null" values (perhaps with an option to toggle on/off) in hash kernel actions

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853410#comment-16853410
 ] 

Wes McKinney commented on ARROW-4787:
-

FYI [~fsaintjacques] [~pitrou] -- it is important to be able to compute 
analytics for values occurring when hash keys are null, rather than dropping 
them

> [C++] Include "null" values (perhaps with an option to toggle on/off) in hash 
> kernel actions
> 
>
> Key: ARROW-4787
> URL: https://issues.apache.org/jira/browse/ARROW-4787
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>
> Null is a meaningful value in the context of analytics. We should have the 
> option of considering it distinctly in e.g. {{ValueCounts}} 
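The toggle described above can be sketched in plain Python; the function name and the drop_null flag are illustrative only, not an existing pyarrow API.

```python
from collections import Counter
from typing import Iterable, Optional

# Sketch of the behavior ARROW-4787 asks for: a value-counts kernel with
# a toggle for whether null (None here) is counted as its own bucket or
# dropped from the result.
def value_counts(values: Iterable[Optional[str]], *, drop_null: bool = True):
    counts = Counter(values)
    if drop_null:
        counts.pop(None, None)  # discard the null bucket if requested
    return dict(counts)

data = ["a", None, "b", "a", None]
assert value_counts(data) == {"a": 2, "b": 1}
assert value_counts(data, drop_null=False) == {"a": 2, "b": 1, None: 2}
```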



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4788) [C++] Develop less verbose API for constructing StructArray

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4788:

Fix Version/s: (was: 0.14.0)

> [C++] Develop less verbose API for constructing StructArray
> ---
>
> Key: ARROW-4788
> URL: https://issues.apache.org/jira/browse/ARROW-4788
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> See comment at 
> https://github.com/apache/arrow/pull/3579/files#diff-7a1bd8476ae3e687fa8d961059596f06R526



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4799) [C++] Propose alternative strategy for handling Operation logical output types

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4799:

Fix Version/s: (was: 0.14.0)

> [C++] Propose alternative strategy for handling Operation logical output types
> --
>
> Key: ARROW-4799
> URL: https://issues.apache.org/jira/browse/ARROW-4799
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> Currently in the prototype work in ARROW-4782, operations are being "boxed" 
> in a strongly typed Expr types. An alternative structure would be for an 
> operation to define a virtual
> {code}
> virtual std::shared_ptr<ArgType> out_type() const = 0;
> {code}
> Where {{ArgType}} is some class that encodes the arity (array vs. scalar 
> vs) and value type (if any) that is emitted by the operation.
> Operations emitting multiple pieces of data would need some kind of "tuple" 
> object output. We can iterate on this
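A rough Python rendering of the alternative sketched above, with all names illustrative: each operation reports its own logical output (arity plus value type) rather than being boxed in a strongly typed Expr.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Hypothetical ArgType: encodes the arity (array vs. scalar) and the
# value type emitted by an operation.
@dataclass(frozen=True)
class ArgType:
    arity: str       # "array" or "scalar"
    value_type: str  # e.g. "int64"

class Operation(ABC):
    @abstractmethod
    def out_type(self) -> ArgType: ...

class SumOp(Operation):
    def __init__(self, input_type: ArgType):
        self.input_type = input_type

    def out_type(self) -> ArgType:
        # Summing an array of int64 yields a scalar int64.
        return ArgType("scalar", self.input_type.value_type)

op = SumOp(ArgType("array", "int64"))
assert op.out_type() == ArgType("scalar", "int64")
```

Operations emitting multiple pieces of data would return a tuple-like ArgType; the single-method contract stays the same.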



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4761) [C++] Support zstandard<1

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4761:

Fix Version/s: (was: 0.14.0)

> [C++] Support zstandard<1
> -
>
> Key: ARROW-4761
> URL: https://issues.apache.org/jira/browse/ARROW-4761
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Uwe L. Korn
>Priority: Major
>
> To support building with as many system packages as possible on Ubuntu, we 
> should support building with zstandard 0.5.1 which is the one available on 
> Ubuntu Xenial. Given the size of our current code for Zstandard, this seems 
> feasible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4761) [C++] Support zstandard<1

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4761:
---

Assignee: (was: Uwe L. Korn)

> [C++] Support zstandard<1
> -
>
> Key: ARROW-4761
> URL: https://issues.apache.org/jira/browse/ARROW-4761
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.14.0
>
>
> To support building with as many system packages as possible on Ubuntu, we 
> should support building with zstandard 0.5.1 which is the one available on 
> Ubuntu Xenial. Given the size of our current code for Zstandard, this seems 
> feasible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5470:
--
Labels: pull-request-available  (was: )

> [CI] C++ local filesystem patch breaks Travis R job
> ---
>
> Key: ARROW-5470
> URL: https://issues.apache.org/jira/browse/ARROW-5470
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and 
> required downstream bindings to be updated. Romain wasn't immediately 
> available to update R, so we marked the R job on Travis as an "allowed 
> failure". That failure looked like this: 
> [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ 
> library built fine, but then the R package failed to build because it didn't 
> line up with what's in C++.
> Then, the C++ local file system patch 
> (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, 
> though we were still ignoring the R build, which continued to fail. But, it 
> started failing differently. Here's what the R build failure looks like on 
> that PR, and on master since then: 
> [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ 
> library is failing to build, so we're not even getting to the expected R 
> failure.
> For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar 
> setup to the R build, and it's still passing. One difference between the two 
> jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which 
> sounds related to some open R issues, and `boost::filesystem` appears all 
> over the error in the R job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4787) [C++] Include "null" values (perhaps with an option to toggle on/off) in hash kernel actions

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4787:

Fix Version/s: (was: 0.14.0)
   0.15.0

> [C++] Include "null" values (perhaps with an option to toggle on/off) in hash 
> kernel actions
> 
>
> Key: ARROW-4787
> URL: https://issues.apache.org/jira/browse/ARROW-4787
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>
> Null is a meaningful value in the context of analytics. We should have the 
> option of considering it distinctly in e.g. {{ValueCounts}} 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job

2019-05-31 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853409#comment-16853409
 ] 

Neal Richardson commented on ARROW-5470:


Just reinstalling the packages that got removed seems to fix it. We now get 
through to our expected R package build failure: 
[https://travis-ci.org/nealrichardson/arrow/jobs/539871131]

> [CI] C++ local filesystem patch breaks Travis R job
> ---
>
> Key: ARROW-5470
> URL: https://issues.apache.org/jira/browse/ARROW-5470
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Blocker
> Fix For: 0.14.0
>
>
> https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and 
> required downstream bindings to be updated. Romain wasn't immediately 
> available to update R, so we marked the R job on Travis as an "allowed 
> failure". That failure looked like this: 
> [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ 
> library built fine, but then the R package failed to build because it didn't 
> line up with what's in C++.
> Then, the C++ local file system patch 
> (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, 
> though we were still ignoring the R build, which continued to fail. But, it 
> started failing differently. Here's what the R build failure looks like on 
> that PR, and on master since then: 
> [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ 
> library is failing to build, so we're not even getting to the expected R 
> failure.
> For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar 
> setup to the R build, and it's still passing. One difference between the two 
> jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which 
> sounds related to some open R issues, and `boost::filesystem` appears all 
> over the error in the R job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4504) [C++] Reduce the number of unit test executables

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4504:
--
Labels: pull-request-available  (was: )

> [C++] Reduce the number of unit test executables
> 
>
> Key: ARROW-4504
> URL: https://issues.apache.org/jira/browse/ARROW-4504
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Link times are a significant drag in MSVC builds. They don't affect Linux 
> nearly as much when building with Ninja. I suggest we combine some of the 
> fast-running tests within logical units to see if we can cut down from 106 
> test executables to 70 or so
> {code}
> 100% tests passed, 0 tests failed out of 107
> Label Time Summary:
> arrow-tests   =  21.19 sec*proc (48 tests)
> arrow_python-tests=   0.26 sec*proc (1 test)
> example   =   0.05 sec*proc (1 test)
> gandiva-tests =  11.65 sec*proc (39 tests)
> parquet-tests =  35.81 sec*proc (18 tests)
> unittest  =  68.92 sec*proc (106 tests)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853383#comment-16853383
 ] 

Wes McKinney commented on ARROW-1983:
-

Well, one issue is how to use the _metadata file to read data from the files it 
lists without having to parse those files' respective metadata again. I think 
this may require a little bit of refactoring in the Parquet C++ library

> [Python] Add ability to write parquet `_metadata` file
> --
>
> Key: ARROW-1983
> URL: https://issues.apache.org/jira/browse/ARROW-1983
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Jim Crist
>Priority: Major
>  Labels: beginner, parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently {{pyarrow.parquet}} can only write the {{_common_metadata}} file 
> (mostly just schema information). It would be useful to add the ability to 
> write a {{_metadata}} file as well. This should include information about 
> each row group in the dataset, including summary statistics. Having this 
> summary file would allow filtering of row groups without needing to access 
> each file beforehand.
> This would require that the user is able to get the written RowGroups out of 
> a {{pyarrow.parquet.write_table}} call and then give these objects as a list 
> to new function that then passes them on as C++ objects to {{parquet-cpp}} 
> that generates the respective {{_metadata}} file.
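To illustrate why the row-group summary matters, here is a small stdlib-only sketch; the data layout and helper function are hypothetical, not the Parquet _metadata format itself. With per-row-group min/max statistics collected at write time, a reader can skip files without opening them.

```python
# Hypothetical aggregated metadata: per file, the row-group statistics
# that a _metadata summary file would carry.
dataset_metadata = {
    "part-0.parquet": [{"column": "x", "min": 0, "max": 9}],
    "part-1.parquet": [{"column": "x", "min": 10, "max": 19}],
}

def files_matching(metadata, column, value):
    """Return files whose row-group stats could contain `value`."""
    return [
        f for f, row_groups in metadata.items()
        if any(rg["column"] == column and rg["min"] <= value <= rg["max"]
               for rg in row_groups)
    ]

# Only part-1 can contain x == 12, so part-0 is never opened.
assert files_matching(dataset_metadata, "x", 12) == ["part-1.parquet"]
```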



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2019-05-31 Thread Rick Zamora (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853378#comment-16853378
 ] 

Rick Zamora commented on ARROW-1983:


I submitted a PR to perform the metadata aggregation and metadata-only file 
write ([https://github.com/apache/arrow/pull/4405]).  I just synchronized with 
the master branch, so hopefully I can address any suggestions/concerns people 
have relatively quickly.

Are there any additional features that we need for "utilizing" the metadata 
file within pyarrow.parquet itself?  I believe the existing read_metadata 
function should be sufficient for the needs of dask.

> [Python] Add ability to write parquet `_metadata` file
> --
>
> Key: ARROW-1983
> URL: https://issues.apache.org/jira/browse/ARROW-1983
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Jim Crist
>Priority: Major
>  Labels: beginner, parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently {{pyarrow.parquet}} can only write the {{_common_metadata}} file 
> (mostly just schema information). It would be useful to add the ability to 
> write a {{_metadata}} file as well. This should include information about 
> each row group in the dataset, including summary statistics. Having this 
> summary file would allow filtering of row groups without needing to access 
> each file beforehand.
> This would require that the user is able to get the written RowGroups out of 
> a {{pyarrow.parquet.write_table}} call and then give these objects as a list 
> to new function that then passes them on as C++ objects to {{parquet-cpp}} 
> that generates the respective {{_metadata}} file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja

2019-05-31 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853341#comment-16853341
 ] 

Wes McKinney commented on ARROW-5473:
-

The suspicious line is 
https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L1259

> [C++] Build failure on googletest_ep on Windows when using Ninja
> 
>
> Key: ARROW-5473
> URL: https://issues.apache.org/jira/browse/ARROW-5473
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> I consistently get this error when trying to use Ninja locally:
> {code}
> -- extracting...
>  
> src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz'
>  
> dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep'
> -- extracting... [tar xfz]
> -- extracting... [analysis]
> -- extracting... [rename]
> CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file):
>   file RENAME failed to rename
> 
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1
>   to
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep
>   because: Directory not empty
> [179/623] Building CXX object 
> src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj
> ninja: build stopped: subcommand failed.
> {code}
> I'm running within the cmder terminal emulator, so it's conceivable some 
> path modifications are causing issues.
> The CMake invocation is
> {code}
> cmake -G "Ninja" ^  -DCMAKE_BUILD_TYPE=Release ^  
> -DARROW_BUILD_TESTS=on ^  -DARROW_CXXFLAGS="/WX /MP" ^
>  -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON 
> -DARROW_VERBOSE_THIRDPARTY_BUILD=on ..
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja

2019-05-31 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5473:
---

 Summary: [C++] Build failure on googletest_ep on Windows when 
using Ninja
 Key: ARROW-5473
 URL: https://issues.apache.org/jira/browse/ARROW-5473
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.14.0


I consistently get this error when trying to use Ninja locally:

{code}
-- extracting...
 
src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz'
 
dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file):
  file RENAME failed to rename


C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1

  to

C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep

  because: Directory not empty



[179/623] Building CXX object 
src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj
ninja: build stopped: subcommand failed.
{code}

I'm running within the cmder terminal emulator, so it's conceivable some path 
modifications are causing issues.

The CMake invocation is

{code}
cmake -G "Ninja" ^  -DCMAKE_BUILD_TYPE=Release ^  
-DARROW_BUILD_TESTS=on ^  -DARROW_CXXFLAGS="/WX /MP" ^
 -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON 
-DARROW_VERBOSE_THIRDPARTY_BUILD=on ..
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4504) [C++] Reduce the number of unit test executables

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4504:
---

Assignee: Wes McKinney

> [C++] Reduce the number of unit test executables
> 
>
> Key: ARROW-4504
> URL: https://issues.apache.org/jira/browse/ARROW-4504
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Link times are a significant drag in MSVC builds. They don't affect Linux 
> nearly as much when building with Ninja. I suggest we combine some of the 
> fast-running tests within logical units to see if we can cut down from 106 
> test executables to 70 or so.
> {code}
> 100% tests passed, 0 tests failed out of 107
> Label Time Summary:
> arrow-tests   =  21.19 sec*proc (48 tests)
> arrow_python-tests=   0.26 sec*proc (1 test)
> example   =   0.05 sec*proc (1 test)
> gandiva-tests =  11.65 sec*proc (39 tests)
> parquet-tests =  35.81 sec*proc (18 tests)
> unittest  =  68.92 sec*proc (106 tests)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5433) [C++][Parquet] improve parquet-reader columns information

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5433.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4403
[https://github.com/apache/arrow/pull/4403]

> [C++][Parquet] improve parquet-reader columns information
> -
>
> Key: ARROW-5433
> URL: https://issues.apache.org/jira/browse/ARROW-5433
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Renat Valiullin
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Replace the column name with the column path and provide better type information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5433) [C++][Parquet] improve parquet-reader columns information

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-5433:
---

Assignee: Renat Valiullin

> [C++][Parquet] improve parquet-reader columns information
> -
>
> Key: ARROW-5433
> URL: https://issues.apache.org/jira/browse/ARROW-5433
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Renat Valiullin
>Assignee: Renat Valiullin
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Replace the column name with the column path and provide better type information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4343) [C++] Add as complete as possible Ubuntu Trusty / 14.04 build to docker-compose setup

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4343:

Fix Version/s: (was: 0.14.0)

> [C++] Add as complete as possible Ubuntu Trusty / 14.04 build to 
> docker-compose setup
> -
>
> Key: ARROW-4343
> URL: https://issues.apache.org/jira/browse/ARROW-4343
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> Until we formally stop supporting Trusty it would be useful to be able to 
> verify in Docker that builds work there. I still have an Ubuntu 14.04 machine 
> that I use (and I've been filing bugs that I find on it), but I'm not sure 
> for how much longer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4159) [C++] Check for -Wdocumentation issues

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4159:
--
Labels: pull-request-available  (was: )

> [C++] Check for -Wdocumentation issues 
> ---
>
> Key: ARROW-4159
> URL: https://issues.apache.org/jira/browse/ARROW-4159
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> I fixed some -Wdocumentation issues in ARROW-4157 that showed up on one Linux 
> distribution but not another, both with clang-6.0. Not sure why that is 
> exactly, but it would be good to try to reproduce and see if our CI can be 
> improved to catch these; in the worst case we could do it in one of our 
> docker-compose builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4097) [C++] Add function to "conform" a dictionary array to a target new dictionary

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4097:

Fix Version/s: (was: 0.14.0)
   0.15.0

> [C++] Add function to "conform" a dictionary array to a target new dictionary
> -
>
> Key: ARROW-4097
> URL: https://issues.apache.org/jira/browse/ARROW-4097
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>
> Follow up work to ARROW-554. 
> Unifying multiple dictionary-encoded arrays is one use case. Another is 
> rewriting a DictionaryArray to be based on another dictionary. For example, 
> this would be used to implement Cast from one dictionary type to another.
> This will need to be able to insert nulls where there are values that are not 
> found in the target dictionary
> see also discussion at 
> https://github.com/apache/arrow/pull/3165#discussion_r243025730



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5472) [Development] Add warning to PR merge tool if no JIRA component is set

2019-05-31 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5472:
---

 Summary: [Development] Add warning to PR merge tool if no JIRA 
component is set
 Key: ARROW-5472
 URL: https://issues.apache.org/jira/browse/ARROW-5472
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Wes McKinney
 Fix For: 0.14.0


This will help with JIRA hygiene (there are over 300 resolved issues at this 
moment with no component set)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5087) [Debian] APT repository no longer contains libarrow-dev

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5087:

Component/s: Packaging

> [Debian] APT repository no longer contains libarrow-dev
> ---
>
> Key: ARROW-5087
> URL: https://issues.apache.org/jira/browse/ARROW-5087
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Steven Fackler
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.13.0
>
>
> After following the Debian APT repository setup instructions in 
> [https://arrow.apache.org/install/], apt can no longer find the libarrow-dev 
> package:
> {noformat}
> root@674af4cba924:/# apt update
> Get:1 http://apt.llvm.org/stretch llvm-toolchain-stretch-7 InRelease [4235 B]
> Get:3 http://apt.llvm.org/stretch llvm-toolchain-stretch-7/main Sources [2506 
> B]
> Get:4 http://apt.llvm.org/stretch llvm-toolchain-stretch-7/main amd64 
> Packages [9063 B]
> Hit:2 http://security-cdn.debian.org/debian-security stretch/updates InRelease
> Ign:5 https://dl.bintray.com/apache/arrow/debian stretch InRelease
> Get:6 https://dl.bintray.com/apache/arrow/debian stretch Release [4087 B]
> Get:8 https://dl.bintray.com/apache/arrow/debian stretch Release.gpg [833 B]
> Ign:7 http://cdn-fastly.deb.debian.org/debian stretch InRelease
> Hit:9 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease
> Get:10 https://dl.bintray.com/apache/arrow/debian stretch/main amd64 Packages 
> [3036 B]
> Hit:11 http://cdn-fastly.deb.debian.org/debian stretch Release
> Fetched 23.8 kB in 0s (33.1 kB/s)
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> 1 package can be upgraded. Run 'apt list --upgradable' to see it.
> root@674af4cba924:/# apt install -y libarrow-dev
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> E: Unable to locate package libarrow-dev
> root@674af4cba924:/# apt search libarrow
> Sorting... Done
> Full Text Search... Done
> libarrow-cuda-glib-dev/unknown 0.13.0-1 amd64
> Apache Arrow is a data processing library for analysis
> libarrow-cuda-glib13/unknown 0.13.0-1 amd64
> Apache Arrow is a data processing library for analysis
> libarrow-cuda13/unknown 0.13.0-1 amd64
> Apache Arrow is a data processing library for analysis{noformat}
> This worked just fine last week, so I assume something bad happened with the 
> 0.13 release? The packages seem to be in bintray at least: 
> [https://bintray.com/apache/arrow/debian/0.13.0#files/debian%2Fpool%2Fstretch%2Fmain%2Fa%2Fapache-arrow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5218) [C++] Improve build when third-party library locations are specified

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5218:

Component/s: C++

> [C++] Improve build when third-party library locations are specified 
> -
>
> Key: ARROW-5218
> URL: https://issues.apache.org/jira/browse/ARROW-5218
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Deepak Majeti
>Assignee: Deepak Majeti
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The current CMake build system does not handle user specified third-party 
> library locations well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5418) [CI][R] Run code coverage and report to codecov.io

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5418:

Component/s: R

> [CI][R] Run code coverage and report to codecov.io
> --
>
> Key: ARROW-5418
> URL: https://issues.apache.org/jira/browse/ARROW-5418
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5430) [Python] Can read but not write parquet partitioned on large ints

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5430:
--
Labels: parquet pull-request-available  (was: parquet)

> [Python] Can read but not write parquet partitioned on large ints
> -
>
> Key: ARROW-5430
> URL: https://issues.apache.org/jira/browse/ARROW-5430
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.13.0
> Environment: Mac OSX 10.14.4, Python 3.7.1, x86_64.
>Reporter: Robin Kåveland
>Priority: Minor
>  Labels: parquet, pull-request-available
>
> Here's a contrived example that reproduces this issue using pandas:
> {code:java}
> import numpy as np
> import pandas as pd
> real_usernames = np.array(['anonymize', 'me'])
> usernames = pd.util.hash_array(real_usernames)
> login_count = [13, 9]
> df = pd.DataFrame({'user': usernames, 'logins': login_count})
> df.to_parquet('can_write.parq', partition_cols=['user'])
> # But not read
> pd.read_parquet('can_write.parq'){code}
> Expected behaviour:
>  * Either the write fails
>  * Or the read succeeds
> Actual behaviour: The read fails with the following error:
> {code:java}
> Traceback (most recent call last):
>   File "<stdin>", line 2, in <module>
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py",
>  line 282, in read_parquet
>     return impl.read(path, columns=columns, **kwargs)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py",
>  line 129, in read
>     **kwargs).to_pandas()
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 1152, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/filesystem.py",
>  line 181, in read_parquet
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 1014, in read
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 587, in read
>     dictionary = partitions.levels[i].dictionary
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 642, in dictionary
>     dictionary = lib.array(integer_keys)
>   File "pyarrow/array.pxi", line 173, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 104, in pyarrow.lib.check_status
> pyarrow.lib.ArrowException: Unknown error: Python int too large to convert to 
> C long{code}
> I set the priority to minor here because it's easy enough to work around this 
> in user code unless you really need the 64 bit hash (and you probably 
> shouldn't be partitioning on that anyway).
> I could take a stab at writing a patch for this if there's interest?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job

2019-05-31 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853274#comment-16853274
 ] 

Neal Richardson commented on ARROW-5470:


Using Xenial fixes the compilation error, but it breaks R by removing 
libgfortran. After successfully building the C++ library and moving on to 
installing R packages, R itself fails to start: 
{code:java}
$ Rscript -e 'install.packages(c("remotes"));if (!all(c("remotes") %in% 
installed.packages())) { q(status = 1, save = "no")}'
Error: package or namespace load failed for ‘stats’ in dyn.load(file, DLLpath = 
DLLpath, ...):
 unable to load shared object 
'/home/travis/R-bin/lib/R/library/stats/libs/stats.so':
  libgfortran.so.3: cannot open shared object file: No such file or directory
{code}
I confirmed that R is not broken before we build the C++ library: 
[https://github.com/nealrichardson/arrow/commit/d27d374488d500d329c67a58256e80d473b8]

It appears that on Xenial, `sudo apt-get install -q clang-7 clang-format-7 
clang-tidy-7` removes the gfortran packages 
[https://travis-ci.org/nealrichardson/arrow/jobs/539817605#L717-L750]:
{code:java}
The following packages will be REMOVED:
gfortran gfortran-5 libblas-dev libgfortran-5-dev libgfortran3 liblapack-dev
liblapack3
{code}
while on Trusty, no packages are removed: 
[https://travis-ci.org/apache/arrow/jobs/538795366#L1061-L1093]

> [CI] C++ local filesystem patch breaks Travis R job
> ---
>
> Key: ARROW-5470
> URL: https://issues.apache.org/jira/browse/ARROW-5470
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Blocker
> Fix For: 0.14.0
>
>
> https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and 
> required downstream bindings to be updated. Romain wasn't immediately 
> available to update R, so we marked the R job on Travis as an "allowed 
> failure". That failure looked like this: 
> [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ 
> library built fine, but then the R package failed to build because it didn't 
> line up with what's in C++.
> Then, the C++ local file system patch 
> (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, 
> though we were still ignoring the R build, which continued to fail. But, it 
> started failing differently. Here's what the R build failure looks like on 
> that PR, and on master since then: 
> [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ 
> library is failing to build, so we're not even getting to the expected R 
> failure.
> For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar 
> setup to the R build, and it's still passing. One difference between the two 
> jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which 
> sounds related to some open R issues, and `boost::filesystem` appears all 
> over the error in the R job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job

2019-05-31 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-5470:
--

Assignee: Neal Richardson

> [CI] C++ local filesystem patch breaks Travis R job
> ---
>
> Key: ARROW-5470
> URL: https://issues.apache.org/jira/browse/ARROW-5470
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Blocker
> Fix For: 0.14.0
>
>
> https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and 
> required downstream bindings to be updated. Romain wasn't immediately 
> available to update R, so we marked the R job on Travis as an "allowed 
> failure". That failure looked like this: 
> [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ 
> library built fine, but then the R package failed to build because it didn't 
> line up with what's in C++.
> Then, the C++ local file system patch 
> (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, 
> though we were still ignoring the R build, which continued to fail. But, it 
> started failing differently. Here's what the R build failure looks like on 
> that PR, and on master since then: 
> [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ 
> library is failing to build, so we're not even getting to the expected R 
> failure.
> For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar 
> setup to the R build, and it's still passing. One difference between the two 
> jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which 
> sounds related to some open R issues, and `boost::filesystem` appears all 
> over the error in the R job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job

2019-05-31 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853231#comment-16853231
 ] 

Neal Richardson commented on ARROW-5470:


Update: it wasn't the env var: 
[https://travis-ci.org/nealrichardson/arrow/jobs/539794848]

Next theory: OS/library version. R is using Trusty while GLib is on Xenial, so 
there are these version differences:

GLib:

-- Building using CMake version: 3.12.4
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- BOOST_VERSION: 1.67.0
(but later)
-- Boost version: 1.58.0

R:

-- Building using CMake version: 3.9.2
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- BOOST_VERSION: 1.67.0
(but later)
-- Boost version: 1.54.0

> [CI] C++ local filesystem patch breaks Travis R job
> ---
>
> Key: ARROW-5470
> URL: https://issues.apache.org/jira/browse/ARROW-5470
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Neal Richardson
>Priority: Blocker
> Fix For: 0.14.0
>
>
> https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and 
> required downstream bindings to be updated. Romain wasn't immediately 
> available to update R, so we marked the R job on Travis as an "allowed 
> failure". That failure looked like this: 
> [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ 
> library built fine, but then the R package failed to build because it didn't 
> line up with what's in C++.
> Then, the C++ local file system patch 
> (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, 
> though we were still ignoring the R build, which continued to fail. But, it 
> started failing differently. Here's what the R build failure looks like on 
> that PR, and on master since then: 
> [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ 
> library is failing to build, so we're not even getting to the expected R 
> failure.
> For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar 
> setup to the R build, and it's still passing. One difference between the two 
> jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which 
> sounds related to some open R issues, and `boost::filesystem` appears all 
> over the error in the R job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5471) [C++][Gandiva]Array offset is ignored in Gandiva projector

2019-05-31 Thread Zeyuan Shang (JIRA)
Zeyuan Shang created ARROW-5471:
---

 Summary: [C++][Gandiva]Array offset is ignored in Gandiva projector
 Key: ARROW-5471
 URL: https://issues.apache.org/jira/browse/ARROW-5471
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Zeyuan Shang


I used the test case in 
[https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_gandiva.py#L25],
 and found an issue when I was using the slice operator {{input_batch[1:]}}. It 
seems that the offset is ignored in the Gandiva projector.
{code:java}
import pyarrow as pa
import pyarrow.gandiva as gandiva

builder = gandiva.TreeExprBuilder()

field_a = pa.field('a', pa.int32())
field_b = pa.field('b', pa.int32())

schema = pa.schema([field_a, field_b])

field_result = pa.field('res', pa.int32())

node_a = builder.make_field(field_a)
node_b = builder.make_field(field_b)

condition = builder.make_function("greater_than", [node_a, node_b],
pa.bool_())
if_node = builder.make_if(condition, node_a, node_b, pa.int32())

expr = builder.make_expression(if_node, field_result)

projector = gandiva.make_projector(
schema, [expr], pa.default_memory_pool())

a = pa.array([10, 12, -20, 5], type=pa.int32())
b = pa.array([5, 15, 15, 17], type=pa.int32())
e = pa.array([10, 15, 15, 17], type=pa.int32())
input_batch = pa.RecordBatch.from_arrays([a, b], names=['a', 'b'])

r, = projector.evaluate(input_batch[1:])
print(r)
{code}
If we use the full record batch {{input_batch}}, the expected output is {{[10, 
15, 15, 17]}}. So if we use {{input_batch[1:]}}, the expected output should be 
{{[15, 15, 17]}}, however this script returned {{[10, 15, 15]}}. It seems that 
the projector ignores the offset and always reads from 0.

 

A corresponding issue is created in GitHub as well 
[https://github.com/apache/arrow/issues/4420]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5467) [Go] implement read/write IPC for Time32/Time64 arrays

2019-05-31 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet reassigned ARROW-5467:
--

Assignee: Sebastien Binet

> [Go] implement read/write IPC for Time32/Time64 arrays
> --
>
> Key: ARROW-5467
> URL: https://issues.apache.org/jira/browse/ARROW-5467
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5468) [Go] implement read/write IPC for Timestamp arrays

2019-05-31 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet reassigned ARROW-5468:
--

Assignee: Sebastien Binet

> [Go] implement read/write IPC for Timestamp arrays
> --
>
> Key: ARROW-5468
> URL: https://issues.apache.org/jira/browse/ARROW-5468
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5469) [Go] implement read/write IPC for Date32/Date64 arrays

2019-05-31 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet reassigned ARROW-5469:
--

Assignee: Sebastien Binet

> [Go] implement read/write IPC for Date32/Date64 arrays
> --
>
> Key: ARROW-5469
> URL: https://issues.apache.org/jira/browse/ARROW-5469
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5266) [Go] implement read/write IPC for Float16

2019-05-31 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet reassigned ARROW-5266:
--

Assignee: Sebastien Binet

> [Go] implement read/write IPC for Float16
> -
>
> Key: ARROW-5266
> URL: https://issues.apache.org/jira/browse/ARROW-5266
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job

2019-05-31 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5470:
--

 Summary: [CI] C++ local filesystem patch breaks Travis R job
 Key: ARROW-5470
 URL: https://issues.apache.org/jira/browse/ARROW-5470
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Neal Richardson
 Fix For: 0.14.0


https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and required 
downstream bindings to be updated. Romain wasn't immediately available to 
update R, so we marked the R job on Travis as an "allowed failure". That 
failure looked like this: 
[https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ library 
built fine, but then the R package failed to build because it didn't line up 
with what's in C++.

Then, the C++ local file system patch 
(https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, 
though we were still ignoring the R build, which continued to fail. But, it 
started failing differently. Here's what the R build failure looks like on that 
PR, and on master since then: 
[https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ library 
is failing to build, so we're not even getting to the expected R failure.

For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar 
setup to the R build, and it's still passing. One difference between the two 
jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which sounds 
related to some open R issues, and `boost::filesystem` appears all over the 
error in the R job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5469) [Go] implement read/write IPC for Date32/Date64 arrays

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5469:
--
Labels: pull-request-available  (was: )

> [Go] implement read/write IPC for Date32/Date64 arrays
> --
>
> Key: ARROW-5469
> URL: https://issues.apache.org/jira/browse/ARROW-5469
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5467) [Go] implement read/write IPC for Time32/Time64 arrays

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5467:
--
Labels: pull-request-available  (was: )

> [Go] implement read/write IPC for Time32/Time64 arrays
> --
>
> Key: ARROW-5467
> URL: https://issues.apache.org/jira/browse/ARROW-5467
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (ARROW-5468) [Go] implement read/write IPC for Timestamp arrays

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5468:
--
Labels: pull-request-available  (was: )

> [Go] implement read/write IPC for Timestamp arrays
> --
>
> Key: ARROW-5468
> URL: https://issues.apache.org/jira/browse/ARROW-5468
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (ARROW-5266) [Go] implement read/write IPC for Float16

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5266:
--
Labels: pull-request-available  (was: )

> [Go] implement read/write IPC for Float16
> -
>
> Key: ARROW-5266
> URL: https://issues.apache.org/jira/browse/ARROW-5266
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (ARROW-3571) [Wiki] Release management guide does not explain how to set up Crossbow or where to find instructions

2019-05-31 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-3571:
---
Priority: Blocker  (was: Major)

> [Wiki] Release management guide does not explain how to set up Crossbow or 
> where to find instructions
> -
>
> Key: ARROW-3571
> URL: https://issues.apache.org/jira/browse/ARROW-3571
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Wiki
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Blocker
> Fix For: 0.14.0
>
>
> If you follow the guide, at one point it says "Launch a Crossbow build" but 
> provides no link to the setup instructions for this




