[jira] [Resolved] (ARROW-5289) [C++] Move arrow/util/concatenate.h to arrow/array/
[ https://issues.apache.org/jira/browse/ARROW-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield resolved ARROW-5289. Resolution: Fixed Issue resolved by pull request 4445 [https://github.com/apache/arrow/pull/4445] > [C++] Move arrow/util/concatenate.h to arrow/array/ > --- > > Key: ARROW-5289 > URL: https://issues.apache.org/jira/browse/ARROW-5289 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 40m > Remaining Estimate: 0h > > I think this would be a better location for array/columnar algorithms > Please wait until after ARROW-3144 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5289) [C++] Move arrow/util/concatenate.h to arrow/array/
[ https://issues.apache.org/jira/browse/ARROW-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield reassigned ARROW-5289: -- Assignee: Wes McKinney > [C++] Move arrow/util/concatenate.h to arrow/array/ > --- > > Key: ARROW-5289 > URL: https://issues.apache.org/jira/browse/ARROW-5289 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 40m > Remaining Estimate: 0h > > I think this would be a better location for array/columnar algorithms > Please wait until after ARROW-3144 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5456) [GLib][Plasma] Installed plasma-glib may be used on building document
[ https://issues.apache.org/jira/browse/ARROW-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yosuke Shiro resolved ARROW-5456. - Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4425 [https://github.com/apache/arrow/pull/4425] > [GLib][Plasma] Installed plasma-glib may be used on building document > - > > Key: ARROW-5456 > URL: https://issues.apache.org/jira/browse/ARROW-5456 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Affects Versions: 0.13.0 >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2719) [Python/C++] ArrowSchema not hashable
[ https://issues.apache.org/jira/browse/ARROW-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2719: Fix Version/s: (was: 0.14.0) > [Python/C++] ArrowSchema not hashable > - > > Key: ARROW-2719 > URL: https://issues.apache.org/jira/browse/ARROW-2719 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Reporter: Florian Jetter >Priority: Minor > > The arrow schema is immutable and should provide a way of hashing itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
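The request above is the standard Python contract for immutable value objects: a type that defines equality should define a matching hash so it can serve as a dict key or set member. A minimal pure-Python sketch of that pattern, using an invented `ToySchema` class rather than pyarrow's actual `Schema`:

```python
# Toy stand-in for an immutable schema (not pyarrow's real Schema class):
# fields are stored as a tuple so the object's contents are fixed and
# hashable, and __hash__ is defined alongside __eq__ so equal schemas
# collapse to one entry in sets and dicts.
class ToySchema:
    def __init__(self, fields):
        self._fields = tuple(fields)  # e.g. [("id", "int64"), ...]

    def __eq__(self, other):
        if not isinstance(other, ToySchema):
            return NotImplemented
        return self._fields == other._fields

    def __hash__(self):
        # Equal schemas must produce equal hashes (the hash/eq contract).
        return hash(self._fields)

a = ToySchema([("id", "int64"), ("name", "utf8")])
b = ToySchema([("id", "int64"), ("name", "utf8")])
assert a == b and hash(a) == hash(b)
assert len({a, b}) == 1  # deduplicated when used as set members
```

Hashing the tuple of field contents rather than object identity is what makes two separately constructed but equal schemas interchangeable as keys.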
[jira] [Updated] (ARROW-2256) [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos
[ https://issues.apache.org/jira/browse/ARROW-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2256: Fix Version/s: (was: 0.14.0) > [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos > > > Key: ARROW-2256 > URL: https://issues.apache.org/jira/browse/ARROW-2256 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Major > > I did a clean upgrade to 16.04 on one of my machines and ran into the problem > described here: > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866087 > I think this can be resolved temporarily by symlinking the static library, > but we should document the problem so other devs know what to do when it > happens -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2256) [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos
[ https://issues.apache.org/jira/browse/ARROW-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853514#comment-16853514 ] Wes McKinney commented on ARROW-2256: - I can't get fuzzing working at all on Ubuntu 19.04. The error looks like this
{code}
$ ./debug/arrow-ipc-fuzzing-test
INFO: Seed: 3163524211
INFO: Loaded 1 modules (33926 guards): 33926 [0xd15918, 0xd36b30),
INFO: Loaded 1 modules (143 inline 8-bit counters): 143 [0xd36b30, 0xd36bbf),
INFO: Loaded 1 PC tables (143 PCs): 143 [0xd36bc0,0xd374b0),
ERROR: The size of coverage PC tables does not match the number of instrumented PCs. This might be a compiler bug, please contact the libFuzzer developers.
Also check https://bugs.llvm.org/show_bug.cgi?id=34636 for possible workarounds (tl;dr: don't use the old GNU ld)
{code}
There's a long thread about it here https://groups.google.com/forum/#!topic/llvm-dev/fnDXbyduLjw and https://github.com/google/oss-fuzz/issues/1042 > [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos > > > Key: ARROW-2256 > URL: https://issues.apache.org/jira/browse/ARROW-2256 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > I did a clean upgrade to 16.04 on one of my machines and ran into the problem > described here: > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866087 > I think this can be resolved temporarily by symlinking the static library, > but we should document the problem so other devs know what to do when it > happens -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2248) [Python] Nightly or on-demand HDFS test builds
[ https://issues.apache.org/jira/browse/ARROW-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2248: Labels: nightly (was: ) > [Python] Nightly or on-demand HDFS test builds > -- > > Key: ARROW-2248 > URL: https://issues.apache.org/jira/browse/ARROW-2248 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: nightly > Fix For: 0.14.0 > > > We continue to acquire more functionality related to HDFS and Parquet. > Testing this, including tests that involve interoperability with other > systems, like Spark, will require some work outside of our normal CI > infrastructure. > I suggest we start with testing the C++/Python HDFS integration, which will > help with validating patches like ARROW-1643 > https://github.com/apache/arrow/pull/1668 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2248) [Python] Nightly or on-demand HDFS test builds
[ https://issues.apache.org/jira/browse/ARROW-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2248: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Nightly or on-demand HDFS test builds > -- > > Key: ARROW-2248 > URL: https://issues.apache.org/jira/browse/ARROW-2248 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: nightly > Fix For: 0.15.0 > > > We continue to acquire more functionality related to HDFS and Parquet. > Testing this, including tests that involve interoperability with other > systems, like Spark, will require some work outside of our normal CI > infrastructure. > I suggest we start with testing the C++/Python HDFS integration, which will > help with validating patches like ARROW-1643 > https://github.com/apache/arrow/pull/1668 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS
[ https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1581: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Set up nightly wheel builds for Linux, macOS > - > > Key: ARROW-1581 > URL: https://issues.apache.org/jira/browse/ARROW-1581 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Labels: nightly > Fix For: 0.15.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS
[ https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1581: Labels: nightly (was: ) > [Python] Set up nightly wheel builds for Linux, macOS > - > > Key: ARROW-1581 > URL: https://issues.apache.org/jira/browse/ARROW-1581 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Labels: nightly > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1582) [Python] Set up + document nightly conda builds for macOS
[ https://issues.apache.org/jira/browse/ARROW-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1582: Labels: nightly (was: ) > [Python] Set up + document nightly conda builds for macOS > - > > Key: ARROW-1582 > URL: https://issues.apache.org/jira/browse/ARROW-1582 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: nightly > Fix For: 0.14.0 > > > It's already been great to be able to test the nightlies on Linux in conda; > it would be great to be able to do the same on macOS -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1582) [Python] Set up + document nightly conda builds for macOS
[ https://issues.apache.org/jira/browse/ARROW-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1582: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Set up + document nightly conda builds for macOS > - > > Key: ARROW-1582 > URL: https://issues.apache.org/jira/browse/ARROW-1582 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: nightly > Fix For: 0.15.0 > > > It's already been great to be able to test the nightlies on Linux in conda; > it would be great to be able to do the same on macOS -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5289) [C++] Move arrow/util/concatenate.h to arrow/array/
[ https://issues.apache.org/jira/browse/ARROW-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5289: -- Labels: pull-request-available (was: ) > [C++] Move arrow/util/concatenate.h to arrow/array/ > --- > > Key: ARROW-5289 > URL: https://issues.apache.org/jira/browse/ARROW-5289 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > I think this would be a better location for array/columnar algorithms > Please wait until after ARROW-3144 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-843) [C++] Parquet merging unequal but equivalent schemas
[ https://issues.apache.org/jira/browse/ARROW-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-843: --- Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Parquet merging unequal but equivalent schemas > > > Key: ARROW-843 > URL: https://issues.apache.org/jira/browse/ARROW-843 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: dataset, parquet > Fix For: 0.15.0 > > > Some Parquet datasets may contain schemas with mixed REQUIRED/OPTIONAL > repetition types. While such schemas aren't strictly equal, we will need to > consider them equivalent on the read path -- This message was sent by Atlassian JIRA (v7.6.3#76005)
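The read-path behavior described above — treating schemas that differ only in REQUIRED/OPTIONAL repetition as equivalent — can be sketched in plain Python. The `(name, type, nullable)` tuples below are an invented stand-in for the real C++ Parquet schema classes, with `nullable=False` modeling REQUIRED and `nullable=True` modeling OPTIONAL:

```python
# Each field is (name, type, nullable). Two schemas are "equivalent"
# for merging when they match field-for-field once nullability is ignored.
def schemas_equivalent(left, right):
    """Equal field names and types, ignoring nullability."""
    if len(left) != len(right):
        return False
    return all(lname == rname and ltype == rtype
               for (lname, ltype, _), (rname, rtype, _) in zip(left, right))

def merge_schemas(left, right):
    """Merge two equivalent schemas; OPTIONAL (nullable) wins over REQUIRED."""
    if not schemas_equivalent(left, right):
        raise ValueError("schemas are not equivalent")
    return [(name, typ, lnull or rnull)
            for (name, typ, lnull), (_, _, rnull) in zip(left, right)]

required = [("id", "int64", False), ("name", "utf8", False)]
optional = [("id", "int64", True), ("name", "utf8", False)]
assert merge_schemas(required, optional) == [
    ("id", "int64", True), ("name", "utf8", False)]
```

Letting the more permissive repetition win is the natural merge rule: a reader that tolerates nulls can always consume a REQUIRED column, but not the reverse.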
[jira] [Created] (ARROW-5475) [Python] Add Python binding for arrow::Concatenate
Wes McKinney created ARROW-5475: --- Summary: [Python] Add Python binding for arrow::Concatenate Key: ARROW-5475 URL: https://issues.apache.org/jira/browse/ARROW-5475 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 0.15.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1279) Integration tests for Map type
[ https://issues.apache.org/jira/browse/ARROW-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-1279: -- Labels: pull-request-available (was: ) > Integration tests for Map type > -- > > Key: ARROW-1279 > URL: https://issues.apache.org/jira/browse/ARROW-1279 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Integration, Java >Reporter: Wes McKinney >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5474) [C++] What version of Boost do we require now?
[ https://issues.apache.org/jira/browse/ARROW-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853481#comment-16853481 ] Wes McKinney commented on ARROW-5474: - We don't have much regular feedback about older Boost versions; it mainly happens around releases, though it would not be a huge effort to set up a Docker job to test with the Boost versions coming from different Linux distributions. Gandiva doesn't build with Boost 1.54 on Ubuntu 14.04, for example; see ARROW-4868 > [C++] What version of Boost do we require now? > -- > > Key: ARROW-5474 > URL: https://issues.apache.org/jira/browse/ARROW-5474 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Neal Richardson >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.14.0 > > > See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One > possible cause for that error is that the local filesystem patch increased > the version of boost that we actually require. The boost version (1.54 vs > 1.58) was one difference between failure and success. > Another point of confusion was that CMake reported two different versions of > boost at different times. > If we require a minimum version of boost, can we document that better, check > for it more accurately in the build scripts, and fail with a useful message > if that minimum isn't met? Or something else helpful. > If the actual cause of the failure was something else (e.g. compiler > version), we should figure that out too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4482) [Website] Add blog archive page
[ https://issues.apache.org/jira/browse/ARROW-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson reassigned ARROW-4482: -- Assignee: Neal Richardson > [Website] Add blog archive page > --- > > Key: ARROW-4482 > URL: https://issues.apache.org/jira/browse/ARROW-4482 > Project: Apache Arrow > Issue Type: Improvement > Components: Website >Reporter: Wes McKinney >Assignee: Neal Richardson >Priority: Major > Fix For: 0.15.0 > > > There's no easy way to get a bulleted list of all blog posts on the Arrow > website. See example archive on my personal blog > http://wesmckinney.com/archives.html > It would be great to have such a generated archive on our website -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5474) [C++] What version of Boost do we require now?
Neal Richardson created ARROW-5474: -- Summary: [C++] What version of Boost do we require now? Key: ARROW-5474 URL: https://issues.apache.org/jira/browse/ARROW-5474 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Neal Richardson Assignee: Antoine Pitrou Fix For: 0.14.0 See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One possible cause for that error is that the local filesystem patch increased the version of boost that we actually require. The boost version (1.54 vs 1.58) was one difference between failure and success. Another point of confusion was that CMake reported two different versions of boost at different times. If we require a minimum version of boost, can we document that better, check for it more accurately in the build scripts, and fail with a useful message if that minimum isn't met? Or something else helpful. If the actual cause of the failure was something else (e.g. compiler version), we should figure that out too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job
[ https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-5470. - Resolution: Fixed Issue resolved by pull request 4443 [https://github.com/apache/arrow/pull/4443] > [CI] C++ local filesystem patch breaks Travis R job > --- > > Key: ARROW-5470 > URL: https://issues.apache.org/jira/browse/ARROW-5470 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Blocker > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and > required downstream bindings to be updated. Romain wasn't immediately > available to update R, so we marked the R job on Travis as an "allowed > failure". That failure looked like this: > [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ > library built fine, but then the R package failed to build because it didn't > line up with what's in C++. > Then, the C++ local file system patch > (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, > though we were still ignoring the R build, which continued to fail. But, it > started failing differently. Here's what the R build failure looks like on > that PR, and on master since then: > [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ > library is failing to build, so we're not even getting to the expected R > failure. > For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar > setup to the R build, and it's still passing. One difference between the two > jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which > sounds related to some open R issues, and `boost::filesystem` appears all > over the error in the R job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-1642) [GLib] Build GLib using Meson in Appveyor
[ https://issues.apache.org/jira/browse/ARROW-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-1642. - Resolution: Duplicate Assignee: Kouhei Sutou Fix Version/s: 0.13.0 Arrow GLib's AppVeyor build was added by ARROW-4353. It uses MinGW, not Visual Studio. We should create a new JIRA issue when we need a Visual Studio build. > [GLib] Build GLib using Meson in Appveyor > - > > Key: ARROW-1642 > URL: https://issues.apache.org/jira/browse/ARROW-1642 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Wes McKinney >Assignee: Kouhei Sutou >Priority: Major > Fix For: 0.13.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4159) [C++] Check for -Wdocumentation issues
[ https://issues.apache.org/jira/browse/ARROW-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-4159. - Resolution: Fixed Issue resolved by pull request 4441 [https://github.com/apache/arrow/pull/4441] > [C++] Check for -Wdocumentation issues > --- > > Key: ARROW-4159 > URL: https://issues.apache.org/jira/browse/ARROW-4159 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I fixed some -Wdocumentation issues in ARROW-4157 that showed up on one Linux > distribution but not another, both with clang-6.0. Not sure why that is > exactly, but it would be good to try to reproduce and see if our CI can be > improved to catch these, or in worst case we could do it in one of our > docker-compose builds -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4159) [C++] Check for -Wdocumentation issues
[ https://issues.apache.org/jira/browse/ARROW-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-4159: --- Assignee: Wes McKinney > [C++] Check for -Wdocumentation issues > --- > > Key: ARROW-4159 > URL: https://issues.apache.org/jira/browse/ARROW-4159 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I fixed some -Wdocumentation issues in ARROW-4157 that showed up on one Linux > distribution but not another, both with clang-6.0. Not sure why that is > exactly, but it would be good to try to reproduce and see if our CI can be > improved to catch these, or in worst case we could do it in one of our > docker-compose builds -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file
[ https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853433#comment-16853433 ] Wes McKinney commented on ARROW-1983: - Yes. I don't think it is necessary to resolve all of this in a single patch, so we can open a follow-up JIRA to implement the optimization to read a row group given a _metadata file. There is some other complexity there such as how to open the filepath (you need a FileSystem handle -- see the filesystem API work that is in process) > [Python] Add ability to write parquet `_metadata` file > -- > > Key: ARROW-1983 > URL: https://issues.apache.org/jira/browse/ARROW-1983 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Jim Crist >Priority: Major > Labels: beginner, parquet, pull-request-available > Fix For: 0.14.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > Currently {{pyarrow.parquet}} can only write the {{_common_metadata}} file > (mostly just schema information). It would be useful to add the ability to > write a {{_metadata}} file as well. This should include information about > each row group in the dataset, including summary statistics. Having this > summary file would allow filtering of row groups without needing to access > each file beforehand. > This would require that the user is able to get the written RowGroups out of > a {{pyarrow.parquet.write_table}} call and then give these objects as a list > to new function that then passes them on as C++ objects to {{parquet-cpp}} > that generates the respective {{_metadata}} file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file
[ https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853433#comment-16853433 ] Wes McKinney edited comment on ARROW-1983 at 5/31/19 9:28 PM: -- Yes. I don't think it is necessary to resolve all of this in a single patch, so we can open a follow-up JIRA to implement the optimization to read a row group given a _metadata file. There is some other complexity there such as how to open the filepath (you need a FileSystem handle -- see the filesystem API work that is in process) was (Author: wesmckinn): Yes. I don't think it necessarily to resolve all of this in a single patch, so we can open a follow-up JIRA to implement the optimization to read a row group given a _metadata file. There is some other complexity there such as how to open the filepath (you need a FileSystem handle -- see the filesystem API work that is in process) > [Python] Add ability to write parquet `_metadata` file > -- > > Key: ARROW-1983 > URL: https://issues.apache.org/jira/browse/ARROW-1983 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Jim Crist >Priority: Major > Labels: beginner, parquet, pull-request-available > Fix For: 0.14.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > Currently {{pyarrow.parquet}} can only write the {{_common_metadata}} file > (mostly just schema information). It would be useful to add the ability to > write a {{_metadata}} file as well. This should include information about > each row group in the dataset, including summary statistics. Having this > summary file would allow filtering of row groups without needing to access > each file beforehand. > This would require that the user is able to get the written RowGroups out of > a {{pyarrow.parquet.write_table}} call and then give these objects as a list > to new function that then passes them on as C++ objects to {{parquet-cpp}} > that generates the respective {{_metadata}} file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1837) [Java] Unable to read unsigned integers outside signed range for bit width in integration tests
[ https://issues.apache.org/jira/browse/ARROW-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1837: Priority: Major (was: Blocker) > [Java] Unable to read unsigned integers outside signed range for bit width in > integration tests > --- > > Key: ARROW-1837 > URL: https://issues.apache.org/jira/browse/ARROW-1837 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Wes McKinney >Assignee: Micah Kornfield >Priority: Major > Labels: columnar-format-1.0, pull-request-available > Fix For: 0.14.0 > > Attachments: generated_primitive.json > > Time Spent: 0.5h > Remaining Estimate: 0h > > I believe this was introduced recently (perhaps in the refactors), but there > was a problem where the integration tests weren't being properly run that hid > the error from us > see https://github.com/apache/arrow/pull/1294#issuecomment-345553066 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3571) [Wiki] Release management guide does not explain how to set up Crossbow or where to find instructions
[ https://issues.apache.org/jira/browse/ARROW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3571: Priority: Major (was: Blocker) > [Wiki] Release management guide does not explain how to set up Crossbow or > where to find instructions > - > > Key: ARROW-3571 > URL: https://issues.apache.org/jira/browse/ARROW-3571 > Project: Apache Arrow > Issue Type: Improvement > Components: Wiki >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Fix For: 0.14.0 > > > If you follow the guide, at one point it says "Launch a Crossbow build" but > provides no link to the setup instructions for this -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file
[ https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853430#comment-16853430 ] Rick Zamora commented on ARROW-1983: Right, I see what you are saying. You can pass in a list of files to pq.ParquetDataset (obtained by calling read_metadata on the metadata file), but the footer metadata will be unnecessarily parsed a second time. For dask, this is probably not much of an issue, because each worker will only be dealing with a subset of the global dataset. In many other cases this is clearly undesirable. > [Python] Add ability to write parquet `_metadata` file > -- > > Key: ARROW-1983 > URL: https://issues.apache.org/jira/browse/ARROW-1983 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Jim Crist >Priority: Major > Labels: beginner, parquet, pull-request-available > Fix For: 0.14.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > Currently {{pyarrow.parquet}} can only write the {{_common_metadata}} file > (mostly just schema information). It would be useful to add the ability to > write a {{_metadata}} file as well. This should include information about > each row group in the dataset, including summary statistics. Having this > summary file would allow filtering of row groups without needing to access > each file beforehand. > This would require that the user is able to get the written RowGroups out of > a {{pyarrow.parquet.write_table}} call and then give these objects as a list > to new function that then passes them on as C++ objects to {{parquet-cpp}} > that generates the respective {{_metadata}} file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
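The motivation in the issue description — a summary file whose per-row-group statistics let readers skip files without opening them — can be sketched in plain Python. The dict layout below is invented for illustration and is not the actual Parquet `_metadata` structure:

```python
# Invented summary layout: one entry per row group, carrying the file it
# lives in plus per-column (min, max) statistics gathered at write time.
metadata = [
    {"file": "part-0.parquet", "stats": {"x": (0, 9)}},
    {"file": "part-1.parquet", "stats": {"x": (10, 19)}},
    {"file": "part-2.parquet", "stats": {"x": (20, 29)}},
]

def row_groups_for_range(metadata, column, lo, hi):
    """Keep row groups whose [min, max] for `column` overlaps [lo, hi],
    so only those files need to be opened and scanned."""
    hits = []
    for rg in metadata:
        mn, mx = rg["stats"][column]
        if mx >= lo and mn <= hi:
            hits.append(rg["file"])
    return hits

# A query for x in [12, 25] can skip part-0 entirely.
assert row_groups_for_range(metadata, "x", 12, 25) == [
    "part-1.parquet", "part-2.parquet"]
```

This is the row-group pruning that a written `_metadata` file would enable: one small read of the summary replaces a footer read per data file.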
[jira] [Comment Edited] (ARROW-5143) [Flight] Enable integration testing of batches with dictionaries
[ https://issues.apache.org/jira/browse/ARROW-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853425#comment-16853425 ] David Li edited comment on ARROW-5143 at 5/31/19 9:05 PM: -- [~wesmckinn] I tried in [https://github.com/apache/arrow/pull/4282], the non-nested dictionary case worked with some additional effort, so that PR enables that. I didn't look into why the nested case still fails. was (Author: lidavidm): [~wesmckinn] I tried in [https://github.com/apache/arrow/pull/4282,] the non-nested dictionary case worked with some additional effort, so that PR enables that. I didn't look into why the nested case still fails. > [Flight] Enable integration testing of batches with dictionaries > > > Key: ARROW-5143 > URL: https://issues.apache.org/jira/browse/ARROW-5143 > Project: Apache Arrow > Issue Type: Improvement > Components: FlightRPC, Integration >Reporter: David Li >Priority: Major > Labels: flight > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5143) [Flight] Enable integration testing of batches with dictionaries
[ https://issues.apache.org/jira/browse/ARROW-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853425#comment-16853425 ] David Li commented on ARROW-5143: - [~wesmckinn] I tried in [https://github.com/apache/arrow/pull/4282], the non-nested dictionary case worked with some additional effort, so that PR enables that. I didn't look into why the nested case still fails. > [Flight] Enable integration testing of batches with dictionaries > > > Key: ARROW-5143 > URL: https://issues.apache.org/jira/browse/ARROW-5143 > Project: Apache Arrow > Issue Type: Improvement > Components: FlightRPC, Integration >Reporter: David Li >Priority: Major > Labels: flight > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5396) [JS] Ensure reader and writer support files and streams with no RecordBatches
[ https://issues.apache.org/jira/browse/ARROW-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-5396. - Resolution: Fixed Issue resolved by pull request 4373 [https://github.com/apache/arrow/pull/4373] > [JS] Ensure reader and writer support files and streams with no RecordBatches > - > > Key: ARROW-5396 > URL: https://issues.apache.org/jira/browse/ARROW-5396 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: 0.13.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Re: https://issues.apache.org/jira/browse/ARROW-2119 and > [https://github.com/apache/arrow/pull/3871], the JS reader and writer should > support files and streams with a Schema but no RecordBatches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5055) [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby
[ https://issues.apache.org/jira/browse/ARROW-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-5055: Fix Version/s: 0.15.0 > [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby > > > Key: ARROW-5055 > URL: https://issues.apache.org/jira/browse/ARROW-5055 > Project: Apache Arrow > Issue Type: Bug > Components: Ruby >Affects Versions: 0.12.1 > Environment: windows, MSYS2 >Reporter: Dominic Sisneros >Assignee: Kouhei Sutou >Priority: Major > Fix For: 0.15.0 > > > MSYS2 doesn't include the parquet libraries so cannot use red-parquet which > uses gobject-introspection against libparquet -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5055) [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby
[ https://issues.apache.org/jira/browse/ARROW-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853423#comment-16853423 ] Kouhei Sutou commented on ARROW-5055: - 0.14.0 includes Parquet support with MinGW build. We can close this when 0.14.0 is released and https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-arrow/PKGBUILD is updated. > [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby > > > Key: ARROW-5055 > URL: https://issues.apache.org/jira/browse/ARROW-5055 > Project: Apache Arrow > Issue Type: Bug > Components: Ruby >Affects Versions: 0.12.1 > Environment: windows, MSYS2 >Reporter: Dominic Sisneros >Assignee: Kouhei Sutou >Priority: Major > > MSYS2 doesn't include the parquet libraries so cannot use red-parquet which > uses gobject-introspection against libparquet -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5055) [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby
[ https://issues.apache.org/jira/browse/ARROW-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-5055: Summary: [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby (was: [Ruby][Msys2] libparquet needs to be installed in Msys2 for ruby) > [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby > > > Key: ARROW-5055 > URL: https://issues.apache.org/jira/browse/ARROW-5055 > Project: Apache Arrow > Issue Type: Bug > Components: Ruby >Affects Versions: 0.12.1 > Environment: windows, MSYS2 >Reporter: Dominic Sisneros >Assignee: Kouhei Sutou >Priority: Major > > MSYS2 doesn't include the parquet libraries so cannot use red-parquet which > uses gobject-introspection against libparquet -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5055) [Ruby][Msys2] libparquet needs to be installed in Msys2 for ruby
[ https://issues.apache.org/jira/browse/ARROW-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-5055: --- Assignee: Kouhei Sutou > [Ruby][Msys2] libparquet needs to be installed in Msys2 for ruby > > > Key: ARROW-5055 > URL: https://issues.apache.org/jira/browse/ARROW-5055 > Project: Apache Arrow > Issue Type: Bug > Components: Ruby >Affects Versions: 0.12.1 > Environment: windows, MSYS2 >Reporter: Dominic Sisneros >Assignee: Kouhei Sutou >Priority: Major > > MSYS2 doesn't include the parquet libraries so cannot use red-parquet which > uses gobject-introspection against libparquet -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5345) [C++] Relax Field hashing in DictionaryMemo
[ https://issues.apache.org/jira/browse/ARROW-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5345: Fix Version/s: (was: 0.14.0) > [C++] Relax Field hashing in DictionaryMemo > --- > > Key: ARROW-5345 > URL: https://issues.apache.org/jira/browse/ARROW-5345 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > Follow up to ARROW-3144 > Currently we associate dictionaries with a hash table mapping a Field's > memory address to a dictionary id. This poses an issue if two RecordBatches > are equal (equal field names, equal types) but were instantiated separately. > We don't have a hash function in C++ for Field, so we should consider > implementing one and using that instead (if it is not too expensive) so that fields that are the same but "different" (distinct C++ objects) won't blow up in the user's face > with an unintuitive error (this did in fact occur once in the Python test > suite; I am not sure exactly why it wasn't a problem before, I think it worked "by > accident") -- This message was sent by Atlassian JIRA (v7.6.3#76005)
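The identity-versus-structural-hashing problem described above can be sketched outside C++. The following is a toy Python analogue (the `Field` class and its fields are invented for illustration, not pyarrow's actual types): keying the dictionary-id map on object identity fails for equal but separately instantiated fields, while a structural hash over name and type does not.

```python
from dataclasses import dataclass

# Hypothetical, simplified analogue of the DictionaryMemo issue: keying on
# object identity (like a memory address) breaks for equal-but-distinct
# Field objects, while a structural hash does not.

@dataclass(frozen=True)  # frozen=True generates __eq__ and __hash__ from the fields
class Field:
    name: str
    type: str  # stand-in for an Arrow DataType

id_keyed = {}      # keyed on identity, like the memory-address map
struct_keyed = {}  # keyed on structural equality

f1 = Field("codes", "dictionary<int8, utf8>")
f2 = Field("codes", "dictionary<int8, utf8>")  # equal, but a distinct object

id_keyed[id(f1)] = 0
struct_keyed[f1] = 0

print(id(f2) in id_keyed)   # False: the separately instantiated field is "unknown"
print(f2 in struct_keyed)   # True: the structural hash finds the same dictionary id
```

This is only the shape of the proposed fix; whether hashing a real Arrow `DataType` is cheap enough is exactly the open question in the issue.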
[jira] [Resolved] (ARROW-4021) [Ruby] Error building red-arrow on msys2
[ https://issues.apache.org/jira/browse/ARROW-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-4021. - Resolution: Fixed Fix Version/s: 0.14.0 I think that this is pkg-config gem issue. And it has been fixed. > [Ruby] Error building red-arrow on msys2 > > > Key: ARROW-4021 > URL: https://issues.apache.org/jira/browse/ARROW-4021 > Project: Apache Arrow > Issue Type: Bug > Components: Ruby > Environment: windows 7, ruby >Reporter: Dominic Sisneros >Assignee: Kouhei Sutou >Priority: Major > Fix For: 0.14.0 > > Attachments: gem_make.out, mkmf.log > > > Trying to install red-arrow on ruby version 2.5.3 and it doesn't compile. I > installed arrow with msys2 > "mingw64/mingw-w64-x86_64-arrow 0.11.1-1 [installed] > Apache Arrow is a cross-language development platform for in-memory data > (mingw-w64)" > C:\Users\Dominic E > Sisneros\Documents\work_new\projects\nexcom\dominic\SLCI_RTR\drawings\working>ruby > --version > ruby 2.5.3p105 (2018-10-18 revision 65156) [x64-mingw32] > E:\Sisneros\Documents\work_new\projects\nexcom\dominic\SLCI_RTR\drawings\working>gem > install red-arrow > g required msys2 packages: mingw-w64-x86_64-glib2 > mingw-w64-x86_64-glib2-2.58.1-1 is up to date -- skipping > native extensions. This could take a while... > rror installing red-arrow: > RROR: Failed to build gem native extension. > nt directory: > E:/rubies/rubyinstaller-2.5.3-1-x64/lib/ruby/gems/2.5.0/gems/glib2-3.3.0/ext/glib2 > /rubyinstaller-2.5.3-1-x64/bin/ruby.exe -r > ./siteconf20181213-23396-1gomjgx.rb extconf.rb > for --enable-debug-build option... no > for -Wall option to compiler... yes > for -Waggregate-return option to compiler... yes > for -Wcast-align option to compiler... yes > for -Wextra option to compiler... yes > for -Wformat=2 option to compiler... yes > for -Winit-self option to compiler... yes > for -Wlarger-than-65500 option to compiler... yes > for -Wmissing-declarations option to compiler... 
yes > for -Wmissing-format-attribute option to compiler... yes > for -Wmissing-include-dirs option to compiler... yes > for -Wmissing-noreturn option to compiler... yes > for -Wmissing-prototypes option to compiler... yes > for -Wnested-externs option to compiler... yes > for -Wold-style-definition option to compiler... yes > for -Wpacked option to compiler... yes > for -Wp,-D_FORTIFY_SOURCE=2 option to compiler... yes > for -Wpointer-arith option to compiler... yes > for -Wswitch-default option to compiler... yes > for -Wswitch-enum option to compiler... yes > for -Wundef option to compiler... yes > for -Wout-of-line-declaration option to compiler... no > for -Wunsafe-loop-optimizations option to compiler... yes > for -Wwrite-strings option to compiler... yes > for Windows... yes > for gobject-2.0 version (>= 2.12.0)... yes > for gthread-2.0... yes > for unistd.h... yes > for io.h... yes > for g_spawn_close_pid() in glib.h... no > for g_thread_init() in glib.h... no > for g_main_depth() in glib.h... no > for g_listenv() in glib.h... no > for rb_check_array_type() in ruby.h... yes > for rb_check_hash_type() in ruby.h... yes > for rb_exec_recursive() in ruby.h... yes > for rb_errinfo() in ruby.h... yes > for rb_thread_call_without_gvl() in ruby.h... yes > for ruby_native_thread_p() in ruby.h... yes > for rb_thread_call_with_gvl() in ruby.h... yes > for rb_gc_register_mark_object() in ruby.h... yes > for rb_exc_new_str() in ruby.h... yes > for rb_enc_str_new_static() in ruby.h... yes > for curr_thread in ruby.h,node.h... no > for rb_curr_thread in ruby.h,node.h... 
no > ruby-glib2.pc > glib-enum-types.c > glib-enum-types.h > Makefile > irectory: > E:/rubies/rubyinstaller-2.5.3-1-x64/lib/ruby/gems/2.5.0/gems/glib2-3.3.0/ext/glib2 > TDIR=" clean > irectory: > E:/rubies/rubyinstaller-2.5.3-1-x64/lib/ruby/gems/2.5.0/gems/glib2-3.3.0/ext/glib2 > TDIR=" > glib-enum-types.c > rbglib-bytes.c > rbglib-gc.c > .c: In function 'gc_marker_mark_each': > .c:26:30: warning: unused parameter 'key' [-Wunused-parameter] > r_mark_each(gpointer key, gpointer value, gpointer user_data) > ~^~~ > .c:26:60: warning: unused parameter 'user_data' [-Wunused-parameter] > r_mark_each(gpointer key, gpointer value, gpointer user_data) > ~^ > .c: At top level: > .c:53:5: warning: missing initializer for field 'reserved' of 'struct > ' [-Wmissing-field-initializers] > ncluded from E:/rubies/rubyinstaller-2.5.3-1-x64/include/ruby-2.5.0/ruby.h:33, > from rbgobject.h:27, > from rbgprivate.h:33, > from rbglib-gc.c:21: > /rubyinstaller-2.5.3-1-x64/include/ruby-2.5.0/ruby/ruby.h:1088:8: note: > 'reserved' declared here > eserved[2]; /* For future extension. > ~~~ > rbglib-variant-type.c > rbglib-variant.c > rbglib.c > In function 'rbg_scan_o
[jira] [Commented] (ARROW-5138) [Python/C++] Row group retrieval doesn't restore index properly
[ https://issues.apache.org/jira/browse/ARROW-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853418#comment-16853418 ] Wes McKinney commented on ARROW-5138: - I think we should change the RangeIndex optimization to only do so for a trivial RangeIndex starting at 0 and with step 1. Then this issue is resolved. > [Python/C++] Row group retrieval doesn't restore index properly > --- > > Key: ARROW-5138 > URL: https://issues.apache.org/jira/browse/ARROW-5138 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.13.0 >Reporter: Florian Jetter >Priority: Minor > Labels: parquet > Fix For: 0.14.0 > > > When retrieving row groups the index is no longer properly restored to its > initial value and is set to a range index starting at zero no matter what. > Version 0.12.1 restored an int64 index with the correct index values. > {code:python} > import pandas as pd > import pyarrow as pa > import pyarrow.parquet as pq > print(pa.__version__) > df = pd.DataFrame( > {"a": [1, 2, 3, 4]} > ) > print("total DF") > print(df.index) > table = pa.Table.from_pandas(df) > buf = pa.BufferOutputStream() > pq.write_table(table, buf, chunk_size=2) > reader = pa.BufferReader(buf.getvalue().to_pybytes()) > parquet_file = pq.ParquetFile(reader) > rg = parquet_file.read_row_group(1) > df_restored = rg.to_pandas() > print("Row group") > print(df_restored.index) > {code} > Previous behavior > {code:python} > 0.12.1 > total DF > RangeIndex(start=0, stop=4, step=1) > Row group > Int64Index([2, 3], dtype='int64') > {code} > Behavior now > {code:python} > 0.13.0 > total DF > RangeIndex(start=0, stop=4, step=1) > Row group > RangeIndex(start=0, stop=2, step=1) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
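The heuristic Wes proposes can be sketched in a few lines. This uses Python's built-in `range` as a stand-in for `pandas.RangeIndex` (an analogy only, not pyarrow's actual implementation): the optimization of dropping the index and regenerating it on read is lossless only for the 0-based, step-1 case, so anything else, including a row group's slice, must have its values stored.

```python
def is_trivial_range(index) -> bool:
    """True only for a 0-based, step-1 range: the one case where dropping
    the index and regenerating it on read loses no information."""
    return isinstance(index, range) and index.start == 0 and index.step == 1

# A full table's default index qualifies for the optimization...
print(is_trivial_range(range(0, 4)))  # True
# ...but the second row group's slice of it does not, so its values
# (here 2 and 3) would need to be written out explicitly.
print(is_trivial_range(range(2, 4)))  # False
```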
[jira] [Commented] (ARROW-5143) [Flight] Enable integration testing of batches with dictionaries
[ https://issues.apache.org/jira/browse/ARROW-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853419#comment-16853419 ] Wes McKinney commented on ARROW-5143: - This might work now, [~lidavidm] can you give it a try to see? > [Flight] Enable integration testing of batches with dictionaries > > > Key: ARROW-5143 > URL: https://issues.apache.org/jira/browse/ARROW-5143 > Project: Apache Arrow > Issue Type: Improvement > Components: FlightRPC, Integration >Reporter: David Li >Priority: Major > Labels: flight > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5082) [Python][Packaging] Reduce size of macOS and manylinux1 wheels
[ https://issues.apache.org/jira/browse/ARROW-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853417#comment-16853417 ] Wes McKinney commented on ARROW-5082: - In investigating the wheels I found that something is wrong with the shared library symlinks causing the shared libs to be duplicated. I do not know how much this impacts the final wheel size. Until that issue is fixed at least, I'm not comfortable releasing the project again > [Python][Packaging] Reduce size of macOS and manylinux1 wheels > -- > > Key: ARROW-5082 > URL: https://issues.apache.org/jira/browse/ARROW-5082 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Blocker > Fix For: 0.14.0 > > > The wheels more than tripled in size from 0.12.0 to 0.13.0. I think this is > mostly because of LLVM but we should take a closer look to see if the size > can be reduced -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-5082) [Python][Packaging] Reduce size of macOS and manylinux1 wheels
[ https://issues.apache.org/jira/browse/ARROW-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853417#comment-16853417 ] Wes McKinney edited comment on ARROW-5082 at 5/31/19 8:52 PM: -- In investigating the wheels I found that something is wrong with the shared library symlinks causing the shared libs to be duplicated. I do not know how much this impacts the final wheel size. Until that issue is fixed at least, I'm not comfortable releasing the project again was (Author: wesmckinn): In investigating the wheels I found that something is wrong wrong with the shared library symlinks causing the shared libs to be duplicated. I do not know how much this impacts the final wheel size. Until that issue is fixed at least, I'm not comfortable releasing the project again > [Python][Packaging] Reduce size of macOS and manylinux1 wheels > -- > > Key: ARROW-5082 > URL: https://issues.apache.org/jira/browse/ARROW-5082 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Blocker > Fix For: 0.14.0 > > > The wheels more than tripled in size from 0.12.0 to 0.13.0. I think this is > mostly because of LLVM but we should take a closer look to see if the size > can be reduced -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5082) [Python][Packaging] Reduce size of macOS and manylinux1 wheels
[ https://issues.apache.org/jira/browse/ARROW-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5082: Priority: Blocker (was: Major) > [Python][Packaging] Reduce size of macOS and manylinux1 wheels > -- > > Key: ARROW-5082 > URL: https://issues.apache.org/jira/browse/ARROW-5082 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Blocker > Fix For: 0.14.0 > > > The wheels more than tripled in size from 0.12.0 to 0.13.0. I think this is > mostly because of LLVM but we should take a closer look to see if the size > can be reduced -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5415) [Release] Release script should update R version everywhere
[ https://issues.apache.org/jira/browse/ARROW-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson reassigned ARROW-5415: -- Assignee: Neal Richardson > [Release] Release script should update R version everywhere > --- > > Key: ARROW-5415 > URL: https://issues.apache.org/jira/browse/ARROW-5415 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: release > Fix For: 0.14.0 > > > See [https://github.com/apache/arrow/pull/4322#discussion_r287151330.] There > are probably other places that should be updated (NEWS.md, which doesn't yet > exist but needs to). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5033) [C++] JSON table writer
[ https://issues.apache.org/jira/browse/ARROW-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-5033: --- Assignee: (was: Benjamin Kietzman) > [C++] JSON table writer > --- > > Key: ARROW-5033 > URL: https://issues.apache.org/jira/browse/ARROW-5033 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Benjamin Kietzman >Priority: Minor > > Users who need to emit JSON in line-delimited format currently cannot do so > using Arrow. It should be straightforward to implement this efficiently, and > it will be very helpful for testing and benchmarking. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5033) [C++] JSON table writer
[ https://issues.apache.org/jira/browse/ARROW-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5033: Fix Version/s: (was: 0.14.0) > [C++] JSON table writer > --- > > Key: ARROW-5033 > URL: https://issues.apache.org/jira/browse/ARROW-5033 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Minor > > Users who need to emit JSON in line-delimited format currently cannot do so > using Arrow. It should be straightforward to implement this efficiently, and > it will be very helpful for testing and benchmarking. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
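The requested format is what is often called newline-delimited JSON (one JSON object per row, one row per line). A minimal pure-Python sketch of the row-at-a-time emission from column-major data, which is how an Arrow table is laid out, might look like the following; this is an illustration of the format, not the proposed C++ API.

```python
import io
import json

def write_ndjson(columns, sink):
    """Emit one JSON object per row from column-major data
    (a dict mapping column name -> list of values), one line per row."""
    n_rows = len(next(iter(columns.values()), []))
    for i in range(n_rows):
        sink.write(json.dumps({name: col[i] for name, col in columns.items()}))
        sink.write("\n")

table = {"a": [1, 2], "b": ["x", "y"]}
out = io.StringIO()
write_ndjson(table, out)
print(out.getvalue())
# {"a": 1, "b": "x"}
# {"a": 2, "b": "y"}
```

An efficient C++ writer would avoid this per-row dict construction by walking the column buffers directly, which is presumably what "straightforward to implement efficiently" refers to.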
[jira] [Updated] (ARROW-5036) [Plasma][C++] Serialization tests resort to memcpy to check equality
[ https://issues.apache.org/jira/browse/ARROW-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5036: Fix Version/s: (was: 0.14.0) > [Plasma][C++] Serialization tests resort to memcpy to check equality > > > Key: ARROW-5036 > URL: https://issues.apache.org/jira/browse/ARROW-5036 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Plasma >Reporter: Francois Saint-Jacques >Priority: Major > > {code:bash} > 1: > /tmp/arrow-0.13.0.Q4czW/apache-arrow-0.13.0/cpp/src/plasma/test/serialization_tests.cc:193: > Failure > 1: Expected equality of these values: > 1: memcmp(&plasma_objects[object_ids[0]], &plasma_objects_return[0], > sizeof(PlasmaObject)) > 1: Which is: 45 > 1: 0 > 1: [ FAILED ] PlasmaSerialization.GetReply (0 ms) > {code} > The source of the problem is the random_plasma_object stack allocated object. > As a fix, I propose that PlasmaObject implements the `operator==` method and > drops the memcpy equality check. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
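The failure mode behind the flaky `memcmp` check above can be reproduced outside C++. Below is a minimal Python sketch using `ctypes` (the struct layout and field names are invented, not PlasmaObject's real definition): a 1-byte field followed by an 8-byte field leaves padding bytes that nothing initializes, so byte-for-byte comparison of two logically equal stack objects can differ, while field-wise equality, the behavior the proposed `operator==` would give, does not.

```python
import ctypes

class PlasmaObjectLike(ctypes.Structure):
    # Simplified stand-in: on typical 64-bit platforms the c_int64 member is
    # 8-byte aligned, leaving 7 padding bytes after device_num that no field
    # assignment ever touches.
    _fields_ = [("device_num", ctypes.c_uint8),
                ("data_size", ctypes.c_int64)]

a = PlasmaObjectLike()
ctypes.memset(ctypes.addressof(a), 0xFF, ctypes.sizeof(a))  # dirty the padding
a.device_num, a.data_size = 1, 42

b = PlasmaObjectLike()  # zero-initialized, so its padding is clean
b.device_num, b.data_size = 1, 42

print(bytes(a) == bytes(b))  # False: memcmp-style comparison sees the padding
print((a.device_num, a.data_size) == (b.device_num, b.data_size))  # True
```

This is exactly why the test's `memcmp(...) == 0` intermittently returns 45 instead of 0: the comparison includes padding whose contents depend on whatever was previously on the stack.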
[jira] [Commented] (ARROW-5028) [Python][C++] Arrow to Parquet conversion drops and corrupts values
[ https://issues.apache.org/jira/browse/ARROW-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853414#comment-16853414 ] Wes McKinney commented on ARROW-5028: - [~marco.neumann.by] have you been able to make any progress with this? > [Python][C++] Arrow to Parquet conversion drops and corrupts values > --- > > Key: ARROW-5028 > URL: https://issues.apache.org/jira/browse/ARROW-5028 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.11.1, 0.13.0 > Environment: python 3.6 >Reporter: Marco Neumann >Priority: Major > Labels: parquet > Fix For: 0.14.0 > > Attachments: dct.pickle.gz > > > I am sorry if this bug report feels rather long and the reproduction data is large, > but I was not able to reduce the data even further while still triggering the > problem. I was able to trigger this behavior on master and on {{0.11.1}}. > {code:python} > import io > import os.path > import pickle > import numpy as np > import pyarrow as pa > import pyarrow.parquet as pq > def dct_to_table(index_dct): > labeled_array = pa.array(np.array(list(index_dct.keys()))) > partition_array = pa.array(np.array(list(index_dct.values()))) > return pa.Table.from_arrays( > [labeled_array, partition_array], names=['a', 'b'] > ) > def check_pq_nulls(data): > fp = io.BytesIO(data) > pfile = pq.ParquetFile(fp) > assert pfile.num_row_groups == 1 > md = pfile.metadata.row_group(0) > col = md.column(1) > assert col.path_in_schema == 'b.list.item' > assert col.statistics.null_count == 0 # fails > def roundtrip(table): > buf = pa.BufferOutputStream() > pq.write_table(table, buf) > data = buf.getvalue().to_pybytes() > # this fails: > # check_pq_nulls(data) > reader = pa.BufferReader(data) > return pq.read_table(reader) > with open(os.path.join(os.path.dirname(__file__), 'dct.pickle'), 'rb') as fp: > dct = pickle.load(fp) > # this does NOT help: > # pa.set_cpu_count(1) > # import gc; gc.disable() > table = dct_to_table(dct) > # this fixes the issue: 
> # table = pa.Table.from_pandas(table.to_pandas()) > table2 = roundtrip(table) > assert table.column('b').null_count == 0 > assert table2.column('b').null_count == 0 # fails > # if table2 is converted to pandas, you can also observe that some values at > the end of column b are `['']`, which is clearly not present in the original > data > {code} > I would also be thankful for any pointers on where the bug comes from or on > how to reduce the test case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4999) [Doc][C++] Add examples on how to construct with ArrayData::Make instead of builder classes
[ https://issues.apache.org/jira/browse/ARROW-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4999: Summary: [Doc][C++] Add examples on how to construct with ArrayData::Make instead of builder classes (was: [Doc] Add examples on how to construct with ArrayData::Make instead of builder classes) > [Doc][C++] Add examples on how to construct with ArrayData::Make instead of > builder classes > --- > > Key: ARROW-4999 > URL: https://issues.apache.org/jira/browse/ARROW-4999 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation >Reporter: Francois Saint-Jacques >Priority: Minor > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4888) [C++/Python] Test build with conda's defaults channel
[ https://issues.apache.org/jira/browse/ARROW-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4888: Fix Version/s: (was: 0.14.0) > [C++/Python] Test build with conda's defaults channel > - > > Key: ARROW-4888 > URL: https://issues.apache.org/jira/browse/ARROW-4888 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Packaging, Python >Reporter: Uwe L. Korn >Priority: Major > > As Arrow developers we mostly use {{conda-forge}}, but we also have some > users that would build with packages from {{defaults}}. As the versions of the > packages there are a bit behind (and sometimes the contents are different), we > should also have a docker test for this channel. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker
[ https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4917: Fix Version/s: (was: 0.14.0) > [C++] orc_ep fails in cpp-alpine docker > --- > > Key: ARROW-4917 > URL: https://issues.apache.org/jira/browse/ARROW-4917 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Uwe L. Korn >Priority: Major > > Failure: > {code:java} > FAILED: c++/src/CMakeFiles/orc.dir/Timezone.cc.o > /usr/bin/g++ -Ic++/include -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/include > -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem > /build/cpp/snappy_ep/src/snappy_ep-install/include -isystem > c++/libs/thirdparty/zlib_ep-install/include -isystem > c++/libs/thirdparty/lz4_ep-install/include -isystem > /arrow/cpp/thirdparty/protobuf_ep-install/include -fdiagnostics-color=always > -ggdb -O0 -g -fPIC -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror > -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror -O0 -g -MD -MT > c++/src/CMakeFiles/orc.dir/Timezone.cc.o -MF > c++/src/CMakeFiles/orc.dir/Timezone.cc.o.d -o > c++/src/CMakeFiles/orc.dir/Timezone.cc.o -c > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc: In member function > 'void orc::TimezoneImpl::parseTimeVariants(const unsigned char*, uint64_t, > uint64_t, uint64_t, uint64_t)': > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: error: 'uint' > was not declared in this scope > uint nameStart = ptr[variantOffset + 6 * variant + 5]; > ^~~~ > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: note: > suggested alternative: 'rint' > uint nameStart = ptr[variantOffset + 6 * variant + 5]; > ^~~~ > rint > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: error: > 'nameStart' was not declared in this scope > if (nameStart >= nameCount) { > ^ > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: note: 
> suggested alternative: 'nameCount' > if (nameStart >= nameCount) { > ^ > nameCount > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: error: > 'nameStart' was not declared in this scope > + nameOffset + nameStart); > ^ > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: note: > suggested alternative: 'nameCount' > + nameOffset + nameStart); > ^ > nameCount{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4999) [Doc] Add examples on how to construct with ArrayData::Make instead of builder classes
[ https://issues.apache.org/jira/browse/ARROW-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4999: Component/s: C++ > [Doc] Add examples on how to construct with ArrayData::Make instead of > builder classes > -- > > Key: ARROW-4999 > URL: https://issues.apache.org/jira/browse/ARROW-4999 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation >Reporter: Francois Saint-Jacques >Priority: Minor > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4960) [R] Add crossbow task for r-arrow-feedstock
[ https://issues.apache.org/jira/browse/ARROW-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4960: Fix Version/s: (was: 0.14.0) > [R] Add crossbow task for r-arrow-feedstock > --- > > Key: ARROW-4960 > URL: https://issues.apache.org/jira/browse/ARROW-4960 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, R >Reporter: Uwe L. Korn >Priority: Major > > We also have an R package on conda-forge now: > [https://github.com/conda-forge/r-arrow-feedstock] This should be tested > using crossbow as we do with the other packages. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (ARROW-4884) [C++] conda-forge thrift-cpp package not available via pkg-config or cmake
[ https://issues.apache.org/jira/browse/ARROW-4884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-4884. --- Resolution: Not A Problem > [C++] conda-forge thrift-cpp package not available via pkg-config or cmake > -- > > Key: ARROW-4884 > URL: https://issues.apache.org/jira/browse/ARROW-4884 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > Artifact of CMake refactor > I opened https://github.com/conda-forge/thrift-cpp-feedstock/issues/35 about > investigating why Thrift does not export the correct files -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4868) [C++][Gandiva] Build fails with system Boost on Ubuntu Trusty 14.04
[ https://issues.apache.org/jira/browse/ARROW-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4868: Fix Version/s: (was: 0.14.0) > [C++][Gandiva] Build fails with system Boost on Ubuntu Trusty 14.04 > --- > > Key: ARROW-4868 > URL: https://issues.apache.org/jira/browse/ARROW-4868 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva >Reporter: Wes McKinney >Priority: Major > > It would be nice for things to work out of the box, but maybe not worth it. I > can use vendored Boost for now > {code} > /usr/include/boost/functional/hash/extensions.hpp:269:20: error: no matching > function for call to 'hash_value' > return hash_value(val); >^~ > /usr/include/boost/functional/hash/hash.hpp:249:17: note: in instantiation of > member function 'boost::hash > >::operator()' requested here > seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2); > ^ > /home/wesm/code/arrow/cpp/src/gandiva/filter_cache_key.h:40:12: note: in > instantiation of function template specialization > 'boost::hash_combine >' requested here > boost::hash_combine(result, configuration); >^ > /usr/include/boost/functional/hash/extensions.hpp:70:17: note: candidate > template ignored: could not match 'pair' against 'shared_ptr' > std::size_t hash_value(std::pair const& v) > ^ > /usr/include/boost/functional/hash/extensions.hpp:79:17: note: candidate > template ignored: could not match 'vector' against 'shared_ptr' > std::size_t hash_value(std::vector const& v) > ^ > /usr/include/boost/functional/hash/extensions.hpp:85:17: note: candidate > template ignored: could not match 'list' against 'shared_ptr' > std::size_t hash_value(std::list const& v) > ^ > /usr/include/boost/functional/hash/extensions.hpp:91:17: note: candidate > template ignored: could not match 'deque' against 'shared_ptr' > std::size_t hash_value(std::deque const& v) > ^ > /usr/include/boost/functional/hash/extensions.hpp:97:17: note: candidate > template ignored: could not 
match 'set' against 'shared_ptr' > std::size_t hash_value(std::set const& v) > ^ > /usr/include/boost/functional/hash/extensions.hpp:103:17: note: candidate > template ignored: could not match 'multiset' against 'shared_ptr' > std::size_t hash_value(std::multiset const& v) > ^ > /usr/include/boost/functional/hash/extensions.hpp:109:17: note: candidate > template ignored: could not match 'map' against 'shared_ptr' > std::size_t hash_value(std::map const& v) > ^ > /usr/include/boost/functional/hash/extensions.hpp:115:17: note: candidate > template ignored: could not match 'multimap' against 'shared_ptr' > std::size_t hash_value(std::multimap const& v) > ^ > /usr/include/boost/functional/hash/extensions.hpp:121:17: note: candidate > template ignored: could not match 'complex' against 'shared_ptr' > std::size_t hash_value(std::complex const& v) > ^ > /usr/include/boost/functional/hash/hash.hpp:187:57: note: candidate template > ignored: substitution failure [with T = > std::shared_ptr]: no type named 'type' in > 'boost::hash_detail::basic_numbers >' > typename boost::hash_detail::basic_numbers::type hash_value(T v) > ^ > /usr/include/boost/functional/hash/hash.hpp:193:56: note: candidate template > ignored: substitution failure [with T = > std::shared_ptr]: no type named 'type' in > 'boost::hash_detail::long_numbers >' > typename boost::hash_detail::long_numbers::type hash_value(T v) > ^ > /usr/include/boost/functional/hash/hash.hpp:199:57: note: candidate template > ignored: substitution failure [with T = > std::shared_ptr]: no type named 'type' in > 'boost::hash_detail::ulong_numbers >' > typename boost::hash_detail::ulong_numbers::type hash_value(T v) > ^ > /usr/include/boost/functional/hash/hash.hpp:205:31: note: candidate template > ignored: disabled by 'enable_if' [with T = > std::shared_ptr] > typename boost::enable_if, std::size_t>::type > ^ > /usr/include/boost/functional/hash/hash.hpp:213:36: note: candidate template > ignored: could not match 'T *const' 
against 'const > std::shared_ptr' > template std::size_t hash_value(T* const& v) >^ > /usr/include/boost/functional/hash/hash.hpp:306:24: note: candidate template > ignored: could not match 'const T [N]' against 'const > std::shared_ptr' > inline std::size
[jira] [Updated] (ARROW-3975) Find a better organizational scheme for inter-language integration / protocol tests and integration tests between Apache Arrow and third party projects
[ https://issues.apache.org/jira/browse/ARROW-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-3975: --- Component/s: Continuous Integration > Find a better organizational scheme for inter-language integration / protocol > tests and integration tests between Apache Arrow and third party projects > --- > > Key: ARROW-3975 > URL: https://issues.apache.org/jira/browse/ARROW-3975 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > Some integration tests with 3rd party projects have gotten moved to > https://github.com/apache/arrow/tree/master/integration which doesn't look > right to me. I suggest we either find a new home in the codebase for the > protocol integration tests or move the 3rd party integration tests somewhere > else -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5452) [R] Add documentation website (pkgdown)
[ https://issues.apache.org/jira/browse/ARROW-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-5452: --- Component/s: R Documentation > [R] Add documentation website (pkgdown) > --- > > Key: ARROW-5452 > URL: https://issues.apache.org/jira/browse/ARROW-5452 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > pkgdown ([https://pkgdown.r-lib.org/]) is the standard for R package > documentation websites. Build this for arrow and deploy it at > https://arrow.apache.org/docs/r. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4875) [C++] MSVC Boost warnings after CMake refactor on cmake 3.12
[ https://issues.apache.org/jira/browse/ARROW-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4875: Fix Version/s: (was: 0.14.0) > [C++] MSVC Boost warnings after CMake refactor on cmake 3.12 > > > Key: ARROW-4875 > URL: https://issues.apache.org/jira/browse/ARROW-4875 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > I haven't investigated if this was present before the refactor, but since we > set {{Boost_ADDITIONAL_VERSIONS}} in theory this "scary" warning should not > show up > {code} > CMake Warning at C:/Program Files (x86)/Microsoft Visual > Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:847 > (message): > New Boost version may have incorrect or missing dependencies and imported > targets > Call Stack (most recent call first): > C:/Program Files (x86)/Microsoft Visual > Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:959 > (_Boost_COMPONENT_DEPENDENCIES) > C:/Program Files (x86)/Microsoft Visual > Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:1618 > (_Boost_MISSING_DEPENDENCIES) > cmake_modules/ThirdpartyToolchain.cmake:1893 (find_package) > CMakeLists.txt:536 (include) > CMake Warning at C:/Program Files (x86)/Microsoft Visual > Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:847 > (message): > New Boost version may have incorrect or missing dependencies and imported > targets > Call Stack (most recent call first): > C:/Program Files (x86)/Microsoft Visual > Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:959 > (_Boost_COMPONENT_DEPENDENCIES) > C:/Program Files (x86)/Microsoft Visual > 
Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:1618 > (_Boost_MISSING_DEPENDENCIES) > cmake_modules/ThirdpartyToolchain.cmake:1893 (find_package) > CMakeLists.txt:536 (include) > CMake Warning at C:/Program Files (x86)/Microsoft Visual > Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:847 > (message): > New Boost version may have incorrect or missing dependencies and imported > targets > Call Stack (most recent call first): > C:/Program Files (x86)/Microsoft Visual > Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:959 > (_Boost_COMPONENT_DEPENDENCIES) > C:/Program Files (x86)/Microsoft Visual > Studio/2017/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.12/Modules/FindBoost.cmake:1618 > (_Boost_MISSING_DEPENDENCIES) > cmake_modules/ThirdpartyToolchain.cmake:1893 (find_package) > CMakeLists.txt:536 (include) > -- Boost version: 1.69.0 > -- Found the following Boost libraries: > -- regex > -- system > -- filesystem > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4864) [C++] gandiva-micro_benchmarks is broken in MSVC build
[ https://issues.apache.org/jira/browse/ARROW-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4864: Fix Version/s: (was: 0.14.0) > [C++] gandiva-micro_benchmarks is broken in MSVC build > -- > > Key: ARROW-4864 > URL: https://issues.apache.org/jira/browse/ARROW-4864 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Pindikura Ravindra >Priority: Major > > Not a blocking issue for 0.13. I encountered this when debugging the CMake > refactor branch with Visual Studio 2015 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4830) [Python] Remove backward compatibility hacks from pyarrow.pandas_compat
[ https://issues.apache.org/jira/browse/ARROW-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4830: Fix Version/s: (was: 0.14.0) > [Python] Remove backward compatibility hacks from pyarrow.pandas_compat > --- > > Key: ARROW-4830 > URL: https://issues.apache.org/jira/browse/ARROW-4830 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > > This code is growing less maintainable. I think we can remove these backwards > compatibility hacks since there are released versions of pyarrow that can be > used to read old metadata and "fix" Parquet files if need be -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job
[ https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-5470: --- Component/s: Continuous Integration > [CI] C++ local filesystem patch breaks Travis R job > --- > > Key: ARROW-5470 > URL: https://issues.apache.org/jira/browse/ARROW-5470 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Blocker > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 10m > Remaining Estimate: 0h > > https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and > required downstream bindings to be updated. Romain wasn't immediately > available to update R, so we marked the R job on Travis as an "allowed > failure". That failure looked like this: > [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ > library built fine, but then the R package failed to build because it didn't > line up with what's in C++. > Then, the C++ local file system patch > (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, > though we were still ignoring the R build, which continued to fail. But, it > started failing differently. Here's what the R build failure looks like on > that PR, and on master since then: > [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ > library is failing to build, so we're not even getting to the expected R > failure. > For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar > setup to the R build, and it's still passing. One difference between the two > jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which > sounds related to some open R issues, and `boost::filesystem` appears all > over the error in the R job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4860) [C++] Build AWS C++ SDK for Windows in conda-forge
[ https://issues.apache.org/jira/browse/ARROW-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4860: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Build AWS C++ SDK for Windows in conda-forge > -- > > Key: ARROW-4860 > URL: https://issues.apache.org/jira/browse/ARROW-4860 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: filesystem > Fix For: 0.15.0 > > > We need the aws-sdk-cpp package to be able to use the C++ SDK for S3 support. It > is currently available for Linux and macOS -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4809) [Python] import error with undefined symbol _ZNK5arrow6Status8ToStringB5xcc11Ev
[ https://issues.apache.org/jira/browse/ARROW-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853411#comment-16853411 ] Wes McKinney commented on ARROW-4809: - I'm inclined to close this unless there is something we can do in the Arrow project to help > [Python] import error with undefined symbol > _ZNK5arrow6Status8ToStringB5xcc11Ev > --- > > Key: ARROW-4809 > URL: https://issues.apache.org/jira/browse/ARROW-4809 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.11.1 > Environment: RHELS 6.10; Python 3.7.2 >Reporter: David Schwab >Priority: Major > > I installed conda 4.5.12 and created a new environment named test-env. I > activated this environment and installed several packages with conda, > including pyarrow. When I run a Python shell and import pyarrow, I get the > following error: > > {code:java} > Traceback (most recent call last): > File "", line 1, in > File "/test-env/lib/python3.7/site-packages/pyarrow/__init__.py", line 54, > in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: > /test-env/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so: > undefined symbol: _ZNK5arrow6Status8ToStringB5xcc11Ev > {code} > From Googling, I believe this has to do with the compiler flags used to build > either pyarrow or one of its dependencies (libboost has been suggested); I > can build the package from source if I need to, but I'm not sure what flags I > would need to set to fix the error. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4810) [Format][C++] Add "LargeList" type with 64-bit offsets
[ https://issues.apache.org/jira/browse/ARROW-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4810: Fix Version/s: (was: 0.14.0) 0.15.0 > [Format][C++] Add "LargeList" type with 64-bit offsets > -- > > Key: ARROW-4810 > URL: https://issues.apache.org/jira/browse/ARROW-4810 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Format >Reporter: Wes McKinney >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Mentioned in https://github.com/apache/arrow/issues/3845 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4809) [Python] import error with undefined symbol _ZNK5arrow6Status8ToStringB5xcc11Ev
[ https://issues.apache.org/jira/browse/ARROW-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4809: Fix Version/s: (was: 0.14.0) > [Python] import error with undefined symbol > _ZNK5arrow6Status8ToStringB5xcc11Ev > --- > > Key: ARROW-4809 > URL: https://issues.apache.org/jira/browse/ARROW-4809 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.11.1 > Environment: RHELS 6.10; Python 3.7.2 >Reporter: David Schwab >Priority: Major > > I installed conda 4.5.12 and created a new environment named test-env. I > activated this environment and installed several packages with conda, > including pyarrow. When I run a Python shell and import pyarrow, I get the > following error: > > {code:java} > Traceback (most recent call last): > File "", line 1, in > File "/test-env/lib/python3.7/site-packages/pyarrow/__init__.py", line 54, > in > from pyarrow.lib import cpu_count, set_cpu_count > ImportError: > /test-env/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so: > undefined symbol: _ZNK5arrow6Status8ToStringB5xcc11Ev > {code} > From Googling, I believe this has to do with the compiler flags used to build > either pyarrow or one of its dependencies (libboost has been suggested); I > can build the package from source if I need to, but I'm not sure what flags I > would need to set to fix the error. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4798) [C++] Re-enable runtime/references cpplint check
[ https://issues.apache.org/jira/browse/ARROW-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4798: Fix Version/s: (was: 0.14.0) > [C++] Re-enable runtime/references cpplint check > > > Key: ARROW-4798 > URL: https://issues.apache.org/jira/browse/ARROW-4798 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > This will help keep the codebase clean. > We might consider defining some custom filters for cpplint warnings we want > to suppress, like it doesn't like {{benchmark::State&}} because of the > non-const reference -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4787) [C++] Include "null" values (perhaps with an option to toggle on/off) in hash kernel actions
[ https://issues.apache.org/jira/browse/ARROW-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853410#comment-16853410 ] Wes McKinney commented on ARROW-4787: - FYI [~fsaintjacques] [~pitrou] -- it is important to be able to compute analytics for values occurring when hash keys are null, rather than dropping them > [C++] Include "null" values (perhaps with an option to toggle on/off) in hash > kernel actions > > > Key: ARROW-4787 > URL: https://issues.apache.org/jira/browse/ARROW-4787 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > Null is a meaningful value in the context of analytics. We should have the > option of considering it distinctly in e.g. {{ValueCounts}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4788) [C++] Develop less verbose API for constructing StructArray
[ https://issues.apache.org/jira/browse/ARROW-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4788: Fix Version/s: (was: 0.14.0) > [C++] Develop less verbose API for constructing StructArray > --- > > Key: ARROW-4788 > URL: https://issues.apache.org/jira/browse/ARROW-4788 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > See comment at > https://github.com/apache/arrow/pull/3579/files#diff-7a1bd8476ae3e687fa8d961059596f06R526 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4799) [C++] Propose alternative strategy for handling Operation logical output types
[ https://issues.apache.org/jira/browse/ARROW-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4799: Fix Version/s: (was: 0.14.0) > [C++] Propose alternative strategy for handling Operation logical output types > -- > > Key: ARROW-4799 > URL: https://issues.apache.org/jira/browse/ARROW-4799 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > Currently in the prototype work in ARROW-4782, operations are being "boxed" > in strongly typed Expr types. An alternative structure would be for an > operation to define a virtual > {code} > virtual std::shared_ptr<ArgType> out_type() const = 0; > {code} > Where {{ArgType}} is some class that encodes the arity (array vs. scalar > vs) and value type (if any) that is emitted by the operation. > Operations emitting multiple pieces of data would need some kind of "tuple" > object output. We can iterate on this -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4761) [C++] Support zstandard<1
[ https://issues.apache.org/jira/browse/ARROW-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4761: Fix Version/s: (was: 0.14.0) > [C++] Support zstandard<1 > - > > Key: ARROW-4761 > URL: https://issues.apache.org/jira/browse/ARROW-4761 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Packaging >Reporter: Uwe L. Korn >Priority: Major > > To support building with as many system packages as possible on Ubuntu, we > should support building with zstandard 0.5.1 which is the one available on > Ubuntu Xenial. Given the size of our current code for Zstandard, this seems > feasible. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4761) [C++] Support zstandard<1
[ https://issues.apache.org/jira/browse/ARROW-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-4761: --- Assignee: (was: Uwe L. Korn) > [C++] Support zstandard<1 > - > > Key: ARROW-4761 > URL: https://issues.apache.org/jira/browse/ARROW-4761 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Packaging >Reporter: Uwe L. Korn >Priority: Major > Fix For: 0.14.0 > > > To support building with as many system packages as possible on Ubuntu, we > should support building with zstandard 0.5.1 which is the one available on > Ubuntu Xenial. Given the size of our current code for Zstandard, this seems > feasible. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job
[ https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5470: -- Labels: pull-request-available (was: ) > [CI] C++ local filesystem patch breaks Travis R job > --- > > Key: ARROW-5470 > URL: https://issues.apache.org/jira/browse/ARROW-5470 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Blocker > Labels: pull-request-available > Fix For: 0.14.0 > > > https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and > required downstream bindings to be updated. Romain wasn't immediately > available to update R, so we marked the R job on Travis as an "allowed > failure". That failure looked like this: > [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ > library built fine, but then the R package failed to build because it didn't > line up with what's in C++. > Then, the C++ local file system patch > (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, > though we were still ignoring the R build, which continued to fail. But, it > started failing differently. Here's what the R build failure looks like on > that PR, and on master since then: > [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ > library is failing to build, so we're not even getting to the expected R > failure. > For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar > setup to the R build, and it's still passing. One difference between the two > jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which > sounds related to some open R issues, and `boost::filesystem` appears all > over the error in the R job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4787) [C++] Include "null" values (perhaps with an option to toggle on/off) in hash kernel actions
[ https://issues.apache.org/jira/browse/ARROW-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4787: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Include "null" values (perhaps with an option to toggle on/off) in hash > kernel actions > > > Key: ARROW-4787 > URL: https://issues.apache.org/jira/browse/ARROW-4787 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > Null is a meaningful value in the context of analytics. We should have the > option of considering it distinctly in e.g. {{ValueCounts}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job
[ https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853409#comment-16853409 ] Neal Richardson commented on ARROW-5470: Just reinstalling the packages that got removed seems to fix it. We now get through our expected R package build failure [https://travis-ci.org/nealrichardson/arrow/jobs/539871131] > [CI] C++ local filesystem patch breaks Travis R job > --- > > Key: ARROW-5470 > URL: https://issues.apache.org/jira/browse/ARROW-5470 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Blocker > Fix For: 0.14.0 > > > https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and > required downstream bindings to be updated. Romain wasn't immediately > available to update R, so we marked the R job on Travis as an "allowed > failure". That failure looked like this: > [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ > library built fine, but then the R package failed to build because it didn't > line up with what's in C++. > Then, the C++ local file system patch > (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, > though we were still ignoring the R build, which continued to fail. But, it > started failing differently. Here's what the R build failure looks like on > that PR, and on master since then: > [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ > library is failing to build, so we're not even getting to the expected R > failure. > For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar > setup to the R build, and it's still passing. One difference between the two > jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which > sounds related to some open R issues, and `boost::filesystem` appears all > over the error in the R job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4504) [C++] Reduce the number of unit test executables
[ https://issues.apache.org/jira/browse/ARROW-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4504: -- Labels: pull-request-available (was: ) > [C++] Reduce the number of unit test executables > > > Key: ARROW-4504 > URL: https://issues.apache.org/jira/browse/ARROW-4504 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > Link times are a significant drag in MSVC builds. They don't affect Linux > nearly as much when building with Ninja. I suggest we combine some of the > fast-running tests within logical units to see if we can cut down from 106 > test executables to 70 or so > {code} > 100% tests passed, 0 tests failed out of 107 > Label Time Summary: > arrow-tests = 21.19 sec*proc (48 tests) > arrow_python-tests= 0.26 sec*proc (1 test) > example = 0.05 sec*proc (1 test) > gandiva-tests = 11.65 sec*proc (39 tests) > parquet-tests = 35.81 sec*proc (18 tests) > unittest = 68.92 sec*proc (106 tests) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file
[ https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853383#comment-16853383 ] Wes McKinney commented on ARROW-1983: - Well, one issue is how to use the _metadata file to read data from the files listed within it without having to parse those files' respective metadata again. I think this may require a little bit of refactoring in the Parquet C++ library > [Python] Add ability to write parquet `_metadata` file > -- > > Key: ARROW-1983 > URL: https://issues.apache.org/jira/browse/ARROW-1983 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Jim Crist >Priority: Major > Labels: beginner, parquet, pull-request-available > Fix For: 0.14.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > Currently {{pyarrow.parquet}} can only write the {{_common_metadata}} file > (mostly just schema information). It would be useful to add the ability to > write a {{_metadata}} file as well. This should include information about > each row group in the dataset, including summary statistics. Having this > summary file would allow filtering of row groups without needing to access > each file beforehand. > This would require that the user is able to get the written RowGroups out of > a {{pyarrow.parquet.write_table}} call and then give these objects as a list > to a new function that then passes them on as C++ objects to {{parquet-cpp}} > that generates the respective {{_metadata}} file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file
[ https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853378#comment-16853378 ] Rick Zamora commented on ARROW-1983: I submitted a PR to perform the metadata aggregation and metadata-only file write ([https://github.com/apache/arrow/pull/4405]). I just synchronized with the master branch, so hopefully I can address any suggestions/concerns people have relatively quickly. Are there any additional features that we need for "utilizing" the metadata file within arrow.parquet itself? I believe the existing read_metadata function should be sufficient for the needs of dask. > [Python] Add ability to write parquet `_metadata` file > -- > > Key: ARROW-1983 > URL: https://issues.apache.org/jira/browse/ARROW-1983 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Jim Crist >Priority: Major > Labels: beginner, parquet, pull-request-available > Fix For: 0.14.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > Currently {{pyarrow.parquet}} can only write the {{_common_metadata}} file > (mostly just schema information). It would be useful to add the ability to > write a {{_metadata}} file as well. This should include information about > each row group in the dataset, including summary statistics. Having this > summary file would allow filtering of row groups without needing to access > each file beforehand. > This would require that the user is able to get the written RowGroups out of > a {{pyarrow.parquet.write_table}} call and then give these objects as a list > to a new function that then passes them on as C++ objects to {{parquet-cpp}} > that generates the respective {{_metadata}} file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja
[ https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853341#comment-16853341 ] Wes McKinney commented on ARROW-5473: - The suspicious line is https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L1259 > [C++] Build failure on googletest_ep on Windows when using Ninja > > > Key: ARROW-5473 > URL: https://issues.apache.org/jira/browse/ARROW-5473 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > I consistently get this error when trying to use Ninja locally: > {code} > -- extracting... > > src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz' > > dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep' > -- extracting... [tar xfz] > -- extracting... [analysis] > -- extracting... [rename] > CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file): > file RENAME failed to rename > > C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1 > to > C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep > because: Directory not empty > [179/623] Building CXX object > src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj > ninja: build stopped: subcommand failed. > {code} > I'm running within the cmder terminal emulator so it's conceivable there are some > path modifications that are causing issues. > The CMake invocation is > {code} > cmake -G "Ninja" ^ -DCMAKE_BUILD_TYPE=Release ^ > -DARROW_BUILD_TESTS=on ^ -DARROW_CXXFLAGS="/WX /MP" ^ > -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON > -DARROW_VERBOSE_THIRDPARTY_BUILD=on .. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja
Wes McKinney created ARROW-5473: --- Summary: [C++] Build failure on googletest_ep on Windows when using Ninja Key: ARROW-5473 URL: https://issues.apache.org/jira/browse/ARROW-5473 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney Fix For: 0.14.0 I consistently get this error when trying to use Ninja locally: {code} -- extracting... src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz' dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep' -- extracting... [tar xfz] -- extracting... [analysis] -- extracting... [rename] CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file): file RENAME failed to rename C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1 to C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep because: Directory not empty [179/623] Building CXX object src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj ninja: build stopped: subcommand failed. {code} I'm running within the cmder terminal emulator so it's conceivable there are some path modifications that are causing issues. The CMake invocation is {code} cmake -G "Ninja" ^ -DCMAKE_BUILD_TYPE=Release ^ -DARROW_BUILD_TESTS=on ^ -DARROW_CXXFLAGS="/WX /MP" ^ -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON -DARROW_VERBOSE_THIRDPARTY_BUILD=on .. {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4504) [C++] Reduce the number of unit test executables
[ https://issues.apache.org/jira/browse/ARROW-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-4504: --- Assignee: Wes McKinney > [C++] Reduce the number of unit test executables > > > Key: ARROW-4504 > URL: https://issues.apache.org/jira/browse/ARROW-4504 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > Link times are a significant drag in MSVC builds. They don't affect Linux > nearly as much when building with Ninja. I suggest we combine some of the > fast-running tests within logical units to see if we can cut down from 106 > test executables to 70 or so > {code} > 100% tests passed, 0 tests failed out of 107 > Label Time Summary: > arrow-tests = 21.19 sec*proc (48 tests) > arrow_python-tests= 0.26 sec*proc (1 test) > example = 0.05 sec*proc (1 test) > gandiva-tests = 11.65 sec*proc (39 tests) > parquet-tests = 35.81 sec*proc (18 tests) > unittest = 68.92 sec*proc (106 tests) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5433) [C++][Parquet] improve parquet-reader columns information
[ https://issues.apache.org/jira/browse/ARROW-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-5433. - Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4403 [https://github.com/apache/arrow/pull/4403] > [C++][Parquet] improve parquet-reader columns information > - > > Key: ARROW-5433 > URL: https://issues.apache.org/jira/browse/ARROW-5433 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Renat Valiullin >Priority: Trivial > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 40m > Remaining Estimate: 0h > > replace column name by column path and better type information -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5433) [C++][Parquet] improve parquet-reader columns information
[ https://issues.apache.org/jira/browse/ARROW-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-5433: --- Assignee: Renat Valiullin > [C++][Parquet] improve parquet-reader columns information > - > > Key: ARROW-5433 > URL: https://issues.apache.org/jira/browse/ARROW-5433 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Renat Valiullin >Assignee: Renat Valiullin >Priority: Trivial > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 40m > Remaining Estimate: 0h > > replace column name by column path and better type information -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4343) [C++] Add as complete as possible Ubuntu Trusty / 14.04 build to docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4343: Fix Version/s: (was: 0.14.0) > [C++] Add as complete as possible Ubuntu Trusty / 14.04 build to > docker-compose setup > - > > Key: ARROW-4343 > URL: https://issues.apache.org/jira/browse/ARROW-4343 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > Until we formally stop supporting Trusty it would be useful to be able to > verify in Docker that builds work there. I still have an Ubuntu 14.04 machine > that I use (and I've been filing bugs that I find on it) but not sure for how > much longer -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4159) [C++] Check for -Wdocumentation issues
[ https://issues.apache.org/jira/browse/ARROW-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4159: -- Labels: pull-request-available (was: ) > [C++] Check for -Wdocumentation issues > --- > > Key: ARROW-4159 > URL: https://issues.apache.org/jira/browse/ARROW-4159 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > I fixed some -Wdocumentation issues in ARROW-4157 that showed up on one Linux > distribution but not another, both with clang-6.0. Not sure why that is > exactly, but it would be good to try to reproduce and see if our CI can be > improved to catch these, or in worst case we could do it in one of our > docker-compose builds -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4097) [C++] Add function to "conform" a dictionary array to a target new dictionary
[ https://issues.apache.org/jira/browse/ARROW-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4097: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Add function to "conform" a dictionary array to a target new dictionary > - > > Key: ARROW-4097 > URL: https://issues.apache.org/jira/browse/ARROW-4097 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > Follow up work to ARROW-554. > Unifying multiple dictionary-encoded arrays is one use case. Another is > rewriting a DictionaryArray to be based on another dictionary. For example, > this would be used to implement Cast from one dictionary type to another. > This will need to be able to insert nulls where there are values that are not > found in the target dictionary > see also discussion at > https://github.com/apache/arrow/pull/3165#discussion_r243025730 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5472) [Development] Add warning to PR merge tool if no JIRA component is set
Wes McKinney created ARROW-5472: --- Summary: [Development] Add warning to PR merge tool if no JIRA component is set Key: ARROW-5472 URL: https://issues.apache.org/jira/browse/ARROW-5472 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Wes McKinney Fix For: 0.14.0 This will help with JIRA hygiene (there are over 300 resolved issues at this moment with no component set) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5218) [C++] Improve build when third-party library locations are specified
[ https://issues.apache.org/jira/browse/ARROW-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5218: Component/s: C++ > [C++] Improve build when third-party library locations are specified > - > > Key: ARROW-5218 > URL: https://issues.apache.org/jira/browse/ARROW-5218 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Deepak Majeti >Assignee: Deepak Majeti >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > The current CMake build system does not handle user specified third-party > library locations well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5418) [CI][R] Run code coverage and report to codecov.io
[ https://issues.apache.org/jira/browse/ARROW-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5418: Component/s: R > [CI][R] Run code coverage and report to codecov.io > -- > > Key: ARROW-5418 > URL: https://issues.apache.org/jira/browse/ARROW-5418 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5087) [Debian] APT repository no longer contains libarrow-dev
[ https://issues.apache.org/jira/browse/ARROW-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5087: Component/s: Packaging > [Debian] APT repository no longer contains libarrow-dev > --- > > Key: ARROW-5087 > URL: https://issues.apache.org/jira/browse/ARROW-5087 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Steven Fackler >Assignee: Kouhei Sutou >Priority: Major > Fix For: 0.13.0 > > > After following the Debian APT repository setup instructions in > [https://arrow.apache.org/install/], apt can no longer find the libarrow-dev > package: > {noformat} > root@674af4cba924:/# apt update > Get:1 http://apt.llvm.org/stretch llvm-toolchain-stretch-7 InRelease [4235 B] > Get:3 http://apt.llvm.org/stretch llvm-toolchain-stretch-7/main Sources [2506 > B] > Get:4 http://apt.llvm.org/stretch llvm-toolchain-stretch-7/main amd64 > Packages [9063 B] > Hit:2 http://security-cdn.debian.org/debian-security stretch/updates InRelease > Ign:5 https://dl.bintray.com/apache/arrow/debian stretch InRelease > Get:6 https://dl.bintray.com/apache/arrow/debian stretch Release [4087 B] > Get:8 https://dl.bintray.com/apache/arrow/debian stretch Release.gpg [833 B] > Ign:7 http://cdn-fastly.deb.debian.org/debian stretch InRelease > Hit:9 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease > Get:10 https://dl.bintray.com/apache/arrow/debian stretch/main amd64 Packages > [3036 B] > Hit:11 http://cdn-fastly.deb.debian.org/debian stretch Release > Fetched 23.8 kB in 0s (33.1 kB/s) > Reading package lists... Done > Building dependency tree > Reading state information... Done > 1 package can be upgraded. Run 'apt list --upgradable' to see it. > root@674af4cba924:/# apt install -y libarrow-dev > Reading package lists... Done > Building dependency tree > Reading state information... Done > E: Unable to locate package libarrow-dev > root@674af4cba924:/# apt search libarrow > Sorting... 
Done > Full Text Search... Done > libarrow-cuda-glib-dev/unknown 0.13.0-1 amd64 > Apache Arrow is a data processing library for analysis > libarrow-cuda-glib13/unknown 0.13.0-1 amd64 > Apache Arrow is a data processing library for analysis > libarrow-cuda13/unknown 0.13.0-1 amd64 > Apache Arrow is a data processing library for analysis{noformat} > This worked just fine last week, so I assume something bad happened with the > 0.13 release? The packages seem to be in bintray at least: > [https://bintray.com/apache/arrow/debian/0.13.0#files/debian%2Fpool%2Fstretch%2Fmain%2Fa%2Fapache-arrow] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5430) [Python] Can read but not write parquet partitioned on large ints
[ https://issues.apache.org/jira/browse/ARROW-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5430: -- Labels: parquet pull-request-available (was: parquet) > [Python] Can read but not write parquet partitioned on large ints > - > > Key: ARROW-5430 > URL: https://issues.apache.org/jira/browse/ARROW-5430 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.13.0 > Environment: Mac OSX 10.14.4, Python 3.7.1, x86_64. >Reporter: Robin Kåveland >Priority: Minor > Labels: parquet, pull-request-available > > Here's a contrived example that reproduces this issue using pandas: > {code:java} > import numpy as np > import pandas as pd > real_usernames = np.array(['anonymize', 'me']) > usernames = pd.util.hash_array(real_usernames) > login_count = [13, 9] > df = pd.DataFrame({'user': usernames, 'logins': login_count}) > df.to_parquet('can_write.parq', partition_cols=['user']) > # But not read > pd.read_parquet('can_write.parq'){code} > Expected behaviour: > * Either the write fails > * Or the read succeeds > Actual behaviour: The read fails with the following error: > {code:java} > Traceback (most recent call last): > File "", line 2, in > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py", > line 282, in read_parquet > return impl.read(path, columns=columns, **kwargs) > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py", > line 129, in read > **kwargs).to_pandas() > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py", > line 1152, in read_table > use_pandas_metadata=use_pandas_metadata) > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/filesystem.py", > line 181, in read_parquet > use_pandas_metadata=use_pandas_metadata) > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py", > line 1014, in read > 
use_pandas_metadata=use_pandas_metadata) > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py", > line 587, in read > dictionary = partitions.levels[i].dictionary > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py", > line 642, in dictionary > dictionary = lib.array(integer_keys) > File "pyarrow/array.pxi", line 173, in pyarrow.lib.array > File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array > File "pyarrow/error.pxi", line 104, in pyarrow.lib.check_status > pyarrow.lib.ArrowException: Unknown error: Python int too large to convert to > C long{code} > I set the priority to minor here because it's easy enough to work around this > in user code unless you really need the 64 bit hash (and you probably > shouldn't be partitioning on that anyway). > I could take a stab at writing a patch for this if there's interest? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
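The "Python int too large to convert to C long" error is consistent with the partition keys being unsigned 64-bit hashes: `pd.util.hash_array` returns uint64 values, and any value with the top bit set exceeds the signed 64-bit range that the reader's integer conversion appears to assume. A stdlib-only sketch of that mismatch (the hash value below is hypothetical, not an actual pandas hash):

```python
import struct

INT64_MAX = 2**63 - 1

# A hypothetical uint64 hash with the top bit set, as roughly half of
# pd.util.hash_array's outputs would have:
hashed = 0x9D8F3C2A1B4E5F60

assert hashed > INT64_MAX  # cannot be represented as a signed C long

# Packing as signed 64-bit (what the reader effectively attempts) fails,
# while the unsigned interpretation is fine:
try:
    struct.pack("<q", hashed)
    fits_signed = True
except struct.error:
    fits_signed = False

print(fits_signed)          # False
print(struct.pack("<Q", hashed) is not None)  # unsigned packing succeeds
```

A user-side workaround, pending a fix, would be to cast the partition column to string (or truncate it into the signed range) before writing.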
[jira] [Commented] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job
[ https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853274#comment-16853274 ] Neal Richardson commented on ARROW-5470: Using xenial fixes the compilation error, but it breaks R by removing libgfortran. After successfully building the C++ library and moving on to installing R packages, R itself fails to start: {code:java} $ Rscript -e 'install.packages(c("remotes"));if (!all(c("remotes") %in% installed.packages())) { q(status = 1, save = "no")}' Error: package or namespace load failed for ‘stats’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/home/travis/R-bin/lib/R/library/stats/libs/stats.so': libgfortran.so.3: cannot open shared object file: No such file or directory {code} I confirmed that R is not broken before we build the C++ library: [https://github.com/nealrichardson/arrow/commit/d27d374488d500d329c67a58256e80d473b8] It appears that on Xenial, `sudo apt-get install -q clang-7 clang-format-7 clang-tidy-7` removes fortran [https://travis-ci.org/nealrichardson/arrow/jobs/539817605#L717-L750]: {code:java} The following packages will be REMOVED: gfortran gfortran-5 libblas-dev libgfortran-5-dev libgfortran3 liblapack-dev liblapack3 {code} while on Trusty, no packages are removed: [https://travis-ci.org/apache/arrow/jobs/538795366#L1061-L1093] > [CI] C++ local filesystem patch breaks Travis R job > --- > > Key: ARROW-5470 > URL: https://issues.apache.org/jira/browse/ARROW-5470 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Blocker > Fix For: 0.14.0 > > > https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and > required downstream bindings to be updated. Romain wasn't immediately > available to update R, so we marked the R job on Travis as an "allowed > failure". 
That failure looked like this: > [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ > library built fine, but then the R package failed to build because it didn't > line up with what's in C++. > Then, the C++ local file system patch > (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, > though we were still ignoring the R build, which continued to fail. But, it > started failing differently. Here's what the R build failure looks like on > that PR, and on master since then: > [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ > library is failing to build, so we're not even getting to the expected R > failure. > For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar > setup to the R build, and it's still passing. One difference between the two > jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which > sounds related to some open R issues, and `boost::filesystem` appears all > over the error in the R job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job
[ https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson reassigned ARROW-5470: -- Assignee: Neal Richardson > [CI] C++ local filesystem patch breaks Travis R job > --- > > Key: ARROW-5470 > URL: https://issues.apache.org/jira/browse/ARROW-5470 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Blocker > Fix For: 0.14.0 > > > https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and > required downstream bindings to be updated. Romain wasn't immediately > available to update R, so we marked the R job on Travis as an "allowed > failure". That failure looked like this: > [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ > library built fine, but then the R package failed to build because it didn't > line up with what's in C++. > Then, the C++ local file system patch > (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, > though we were still ignoring the R build, which continued to fail. But, it > started failing differently. Here's what the R build failure looks like on > that PR, and on master since then: > [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ > library is failing to build, so we're not even getting to the expected R > failure. > For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar > setup to the R build, and it's still passing. One difference between the two > jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which > sounds related to some open R issues, and `boost::filesystem` appears all > over the error in the R job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job
[ https://issues.apache.org/jira/browse/ARROW-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853231#comment-16853231 ] Neal Richardson commented on ARROW-5470: Update: it wasn't the env var: [https://travis-ci.org/nealrichardson/arrow/jobs/539794848] Next theory: OS/library version. R is using Trusty while GLib is on Xenial, so there are these version differences: GLib: -- Building using CMake version: 3.12.4 -- The C compiler identification is GNU 5.4.0 -- The CXX compiler identification is GNU 5.4.0 -- BOOST_VERSION: 1.67.0 (but later) -- Boost version: 1.58.0 R: -- Building using CMake version: 3.9.2 -- The C compiler identification is GNU 4.8.4 -- The CXX compiler identification is GNU 4.8.4 -- BOOST_VERSION: 1.67.0 (but later) -- Boost version: 1.54.0 > [CI] C++ local filesystem patch breaks Travis R job > --- > > Key: ARROW-5470 > URL: https://issues.apache.org/jira/browse/ARROW-5470 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Neal Richardson >Priority: Blocker > Fix For: 0.14.0 > > > https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and > required downstream bindings to be updated. Romain wasn't immediately > available to update R, so we marked the R job on Travis as an "allowed > failure". That failure looked like this: > [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ > library built fine, but then the R package failed to build because it didn't > line up with what's in C++. > Then, the C++ local file system patch > (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, > though we were still ignoring the R build, which continued to fail. But, it > started failing differently. Here's what the R build failure looks like on > that PR, and on master since then: > [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ > library is failing to build, so we're not even getting to the expected R > failure. 
> For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar > setup to the R build, and it's still passing. One difference between the two > jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which > sounds related to some open R issues, and `boost::filesystem` appears all > over the error in the R job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5471) [C++][Gandiva] Array offset is ignored in Gandiva projector
Zeyuan Shang created ARROW-5471: --- Summary: [C++][Gandiva]Array offset is ignored in Gandiva projector Key: ARROW-5471 URL: https://issues.apache.org/jira/browse/ARROW-5471 Project: Apache Arrow Issue Type: Bug Reporter: Zeyuan Shang I used the test case in [https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_gandiva.py#L25], and found an issue when I was using the slice operator {{input_batch[1:]}}. It seems that the offset is ignored in the Gandiva projector. {code:java} import pyarrow as pa import pyarrow.gandiva as gandiva builder = gandiva.TreeExprBuilder() field_a = pa.field('a', pa.int32()) field_b = pa.field('b', pa.int32()) schema = pa.schema([field_a, field_b]) field_result = pa.field('res', pa.int32()) node_a = builder.make_field(field_a) node_b = builder.make_field(field_b) condition = builder.make_function("greater_than", [node_a, node_b], pa.bool_()) if_node = builder.make_if(condition, node_a, node_b, pa.int32()) expr = builder.make_expression(if_node, field_result) projector = gandiva.make_projector( schema, [expr], pa.default_memory_pool()) a = pa.array([10, 12, -20, 5], type=pa.int32()) b = pa.array([5, 15, 15, 17], type=pa.int32()) e = pa.array([10, 15, 15, 17], type=pa.int32()) input_batch = pa.RecordBatch.from_arrays([a, b], names=['a', 'b']) r, = projector.evaluate(input_batch[1:]) print(r) {code} If we use the full record batch {{input_batch}}, the expected output is {{[10, 15, 15, 17]}}. So if we use {{input_batch[1:]}}, the expected output should be {{[15, 15, 17]}}, however this script returned {{[10, 15, 15]}}. It seems that the projector ignores the offset and always reads from 0. A corresponding issue is created in GitHub as well [https://github.com/apache/arrow/issues/4420] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
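The expected vs. actual outputs above can be checked with a tiny pure-Python model of the expression (`if a > b then a else b`) evaluated over a sliced batch. This is only a sketch of the intended offset semantics, not Gandiva's implementation: a slice `[1:]` should shift where evaluation starts, whereas the buggy projector effectively always reads from index 0 and just shortens the result:

```python
a = [10, 12, -20, 5]
b = [5, 15, 15, 17]

def project(a_vals, b_vals, offset=0):
    # Offset-aware evaluation of: if a > b then a else b
    return [x if x > y else y for x, y in zip(a_vals[offset:], b_vals[offset:])]

full = project(a, b)               # evaluation over the full batch
sliced = project(a, b, offset=1)   # correct result for input_batch[1:]
buggy = project(a, b)[:3]          # ignoring the offset: first 3 of the full result

print(full)    # [10, 15, 15, 17]
print(sliced)  # [15, 15, 17]  -- what the script should return
print(buggy)   # [10, 15, 15]  -- what it actually returned
```

This matches the report: the returned `[10, 15, 15]` is the unsliced result truncated to the sliced length, confirming the projector reads from offset 0.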
[jira] [Assigned] (ARROW-5467) [Go] implement read/write IPC for Time32/Time64 arrays
[ https://issues.apache.org/jira/browse/ARROW-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastien Binet reassigned ARROW-5467: -- Assignee: Sebastien Binet > [Go] implement read/write IPC for Time32/Time64 arrays > -- > > Key: ARROW-5467 > URL: https://issues.apache.org/jira/browse/ARROW-5467 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Sebastien Binet >Assignee: Sebastien Binet >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5468) [Go] implement read/write IPC for Timestamp arrays
[ https://issues.apache.org/jira/browse/ARROW-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastien Binet reassigned ARROW-5468: -- Assignee: Sebastien Binet > [Go] implement read/write IPC for Timestamp arrays > -- > > Key: ARROW-5468 > URL: https://issues.apache.org/jira/browse/ARROW-5468 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Sebastien Binet >Assignee: Sebastien Binet >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5469) [Go] implement read/write IPC for Date32/Date64 arrays
[ https://issues.apache.org/jira/browse/ARROW-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastien Binet reassigned ARROW-5469: -- Assignee: Sebastien Binet > [Go] implement read/write IPC for Date32/Date64 arrays > -- > > Key: ARROW-5469 > URL: https://issues.apache.org/jira/browse/ARROW-5469 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Sebastien Binet >Assignee: Sebastien Binet >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5266) [Go] implement read/write IPC for Float16
[ https://issues.apache.org/jira/browse/ARROW-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastien Binet reassigned ARROW-5266: -- Assignee: Sebastien Binet > [Go] implement read/write IPC for Float16 > - > > Key: ARROW-5266 > URL: https://issues.apache.org/jira/browse/ARROW-5266 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Sebastien Binet >Assignee: Sebastien Binet >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5470) [CI] C++ local filesystem patch breaks Travis R job
Neal Richardson created ARROW-5470: -- Summary: [CI] C++ local filesystem patch breaks Travis R job Key: ARROW-5470 URL: https://issues.apache.org/jira/browse/ARROW-5470 Project: Apache Arrow Issue Type: Improvement Reporter: Neal Richardson Fix For: 0.14.0 https://issues.apache.org/jira/browse/ARROW-3144 changed a C++ API and required downstream bindings to be updated. Romain wasn't immediately available to update R, so we marked the R job on Travis as an "allowed failure". That failure looked like this: [https://travis-ci.org/apache/arrow/jobs/538795366#L3711-L3830] The C++ library built fine, but then the R package failed to build because it didn't line up with what's in C++. Then, the C++ local file system patch (https://issues.apache.org/jira/browse/ARROW-5378) landed. Travis passed, though we were still ignoring the R build, which continued to fail. But, it started failing differently. Here's what the R build failure looks like on that PR, and on master since then: [https://travis-ci.org/apache/arrow/jobs/539207245#L2520-L2640] The C++ library is failing to build, so we're not even getting to the expected R failure. For reference, the "C++ & GLib & Ruby w/ gcc 5.4" build has the most similar setup to the R build, and it's still passing. One difference between the two jobs is that the GLib one has `ARROW_TRAVIS_USE_VENDORED_BOOST=1`, which sounds related to some open R issues, and `boost::filesystem` appears all over the error in the R job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5469) [Go] implement read/write IPC for Date32/Date64 arrays
[ https://issues.apache.org/jira/browse/ARROW-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5469: -- Labels: pull-request-available (was: ) > [Go] implement read/write IPC for Date32/Date64 arrays > -- > > Key: ARROW-5469 > URL: https://issues.apache.org/jira/browse/ARROW-5469 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Sebastien Binet >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5467) [Go] implement read/write IPC for Time32/Time64 arrays
[ https://issues.apache.org/jira/browse/ARROW-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5467: -- Labels: pull-request-available (was: ) > [Go] implement read/write IPC for Time32/Time64 arrays > -- > > Key: ARROW-5467 > URL: https://issues.apache.org/jira/browse/ARROW-5467 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Sebastien Binet >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)