[jira] [Resolved] (ARROW-8224) [C++] Remove APIs deprecated prior to 0.16.0
[ https://issues.apache.org/jira/browse/ARROW-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-8224. - Resolution: Fixed Issue resolved by pull request 6735 [https://github.com/apache/arrow/pull/6735] > [C++] Remove APIs deprecated prior to 0.16.0 > > > Key: ARROW-8224 > URL: https://issues.apache.org/jira/browse/ARROW-8224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5585) [Go] rename arrow.TypeEquals into arrow.TypeEqual
[ https://issues.apache.org/jira/browse/ARROW-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5585: -- Labels: pull-request-available (was: ) > [Go] rename arrow.TypeEquals into arrow.TypeEqual > - > > Key: ARROW-5585 > URL: https://issues.apache.org/jira/browse/ARROW-5585 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Reporter: Sebastien Binet >Priority: Major > Labels: pull-request-available > > this is to follow Go' stdlib conventions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8249) [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent
Andy Grove created ARROW-8249: - Summary: [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent Key: ARROW-8249 URL: https://issues.apache.org/jira/browse/ARROW-8249 Project: Apache Arrow Issue Type: Improvement Components: Rust, Rust - DataFusion Reporter: Andy Grove Fix For: 1.0.0 We now have two similar APIs, Table and LogicalPlanBuilder; there are some differences between them, and it would be good to unify them. There is also code duplication, and it most likely makes sense for the Table API to delegate to the query builder API to build logical plans. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8246) [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors
[ https://issues.apache.org/jira/browse/ARROW-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-8246. - Resolution: Fixed Issue resolved by pull request 6743 [https://github.com/apache/arrow/pull/6743] > [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors > - > > Key: ARROW-8246 > URL: https://issues.apache.org/jira/browse/ARROW-8246 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > See > https://digitalkarabela.com/mingw-w64-how-to-fix-file-too-big-too-many-sections/ > This seems to be the MinGW equivalent of {{/bigobj}} in MSVC -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-8222) [C++] Use bcp to make a slim boost for bundled build
[ https://issues.apache.org/jira/browse/ARROW-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069092#comment-17069092 ] Wes McKinney edited comment on ARROW-8222 at 3/27/20, 10:53 PM: To collect some anecdotal evidence about problems with the current boost_ep. It seems the entire boostorg has some kind of rate limiting issue on Bintray and trying to access e.g. https://dl.bintray.com/boostorg/release/1.72.0/source/boost_1_72_0.tar.gz yields 403 Forbidden. So all the more reason to host our EP artifact on GitHub or some other place within our agency was (Author: wesmckinn): To collect some anecdotal evidence about problems with the current boost_ep. It seems the entire boostorg has some kind of rate limiting issue on Bintray and trying to access dl.bintray.com yields 403 Forbidden. So all the more reason to host our EP artifact on GitHub or some other place within our agency > [C++] Use bcp to make a slim boost for bundled build > > > Key: ARROW-8222 > URL: https://issues.apache.org/jira/browse/ARROW-8222 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We don't use much of Boost (just system, filesystem, and regex), but when we > do a bundled build, we still download and extract all of boost. The tarball > itself is 113mb, expanded is over 700mb. This can be slow, and it requires a > lot of free disk space that we don't really need. > [bcp|https://www.boost.org/doc/libs/1_72_0/tools/bcp/doc/html/index.html] is > a boost tool that lets you extract a subset of boost, resolving any of its > necessary dependencies across boost. The savings for us could be huge: > {code} > mkdir test > ./bcp system.hpp filesystem.hpp regex.hpp test > tar -czf test.tar.gz test/ > {code} > The resulting tarball is 885K (kilobytes!). 
> {{bcp}} also lets you re-namespace, so this would (IIUC) solve ARROW-4286 as > well. > We would need a place to host this tarball, and we would have to updated it > whenever we (1) bump the boost version or (2) add a new boost library > dependency. This patch would of course include a script that would generate > the tarball. Given the small size, we could also consider just vendoring it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8222) [C++] Use bcp to make a slim boost for bundled build
[ https://issues.apache.org/jira/browse/ARROW-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069092#comment-17069092 ] Wes McKinney commented on ARROW-8222: - To collect some anecdotal evidence about problems with the current boost_ep. It seems the entire boostorg has some kind of rate limiting issue on Bintray and trying to access dl.bintray.com yields 403 Forbidden. So all the more reason to host our EP artifact on GitHub or some other place within our agency > [C++] Use bcp to make a slim boost for bundled build > > > Key: ARROW-8222 > URL: https://issues.apache.org/jira/browse/ARROW-8222 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We don't use much of Boost (just system, filesystem, and regex), but when we > do a bundled build, we still download and extract all of boost. The tarball > itself is 113mb, expanded is over 700mb. This can be slow, and it requires a > lot of free disk space that we don't really need. > [bcp|https://www.boost.org/doc/libs/1_72_0/tools/bcp/doc/html/index.html] is > a boost tool that lets you extract a subset of boost, resolving any of its > necessary dependencies across boost. The savings for us could be huge: > {code} > mkdir test > ./bcp system.hpp filesystem.hpp regex.hpp test > tar -czf test.tar.gz test/ > {code} > The resulting tarball is 885K (kilobytes!). > {{bcp}} also lets you re-namespace, so this would (IIUC) solve ARROW-4286 as > well. > We would need a place to host this tarball, and we would have to updated it > whenever we (1) bump the boost version or (2) add a new boost library > dependency. This patch would of course include a script that would generate > the tarball. Given the small size, we could also consider just vendoring it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8248) [C++] vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib)
[ https://issues.apache.org/jira/browse/ARROW-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069085#comment-17069085 ] Wes McKinney commented on ARROW-8248: - This seems to be something that the vcpkg maintainers did on purpose https://github.com/microsoft/vcpkg/blob/master/ports/arrow/portfile.cmake#L46 You should probably report the problem to them directly > [C++] vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib) > -- > > Key: ARROW-8248 > URL: https://issues.apache.org/jira/browse/ARROW-8248 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Developer Tools >Affects Versions: 0.16.0 >Reporter: Scott Wilson >Priority: Major > > After installing Arrow via vcpkg, build the library per the steps below. > CMake builds the shared arrow library (.dll) and then the static arrow > library (.lib). It overwrites the shared arrow.lib (exports) with the static > arrow.lib. This results in multiple link/execution problems when using the vc > projects to build the example projects until you realize that shared arrow > needs to be rebuilt. (This took me two days.) > Also, many of the projects added with the extra -D flags (beyond > ARROW_BUILD_TESTS) don't build. > *** > "C:\Program Files (x86)\Microsoft Visual > Studio\2017\Professional\Common7\Tools\VsDevCmd.bat" -arch=amd64 > cd F:\Dev\vcpkg\buildtrees\arrow\src\row-0.16.0-872c330822\cpp > mkdir build > cd build > cmake .. -G "Visual Studio 15 2017 Win64" -DARROW_BUILD_TESTS=ON > -DARROW_BUILD_EXAMPLES=ON -DARROW_PARQUET=ON -DARROW_PYTHON=ON > -DCMAKE_BUILD_TYPE=Debug > cmake --build . --config Debug -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8248) [C++] vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib)
[ https://issues.apache.org/jira/browse/ARROW-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-8248: Summary: [C++] vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib) (was: vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib)) > [C++] vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib) > -- > > Key: ARROW-8248 > URL: https://issues.apache.org/jira/browse/ARROW-8248 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Developer Tools >Affects Versions: 0.16.0 >Reporter: Scott Wilson >Priority: Major > > After installing Arrow via vcpkg, build the library per the steps below. > CMake builds the shared arrow library (.dll) and then the static arrow > library (.lib). It overwrites the shared arrow.lib (exports) with the static > arrow.lib. This results in multiple link/execution problems when using the vc > projects to build the example projects until you realize that shared arrow > needs to be rebuilt. (This took me two days.) > Also, many of the projects added with the extra -D flags (beyond > ARROW_BUILD_TESTS) don't build. > *** > "C:\Program Files (x86)\Microsoft Visual > Studio\2017\Professional\Common7\Tools\VsDevCmd.bat" -arch=amd64 > cd F:\Dev\vcpkg\buildtrees\arrow\src\row-0.16.0-872c330822\cpp > mkdir build > cd build > cmake .. -G "Visual Studio 15 2017 Win64" -DARROW_BUILD_TESTS=ON > -DARROW_BUILD_EXAMPLES=ON -DARROW_PARQUET=ON -DARROW_PYTHON=ON > -DCMAKE_BUILD_TYPE=Debug > cmake --build . --config Debug -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8248) vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib)
Scott Wilson created ARROW-8248: --- Summary: vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib) Key: ARROW-8248 URL: https://issues.apache.org/jira/browse/ARROW-8248 Project: Apache Arrow Issue Type: Bug Components: C++, Developer Tools Affects Versions: 0.16.0 Reporter: Scott Wilson After installing Arrow via vcpkg, build the library per the steps below. CMake builds the shared arrow library (.dll) and then the static arrow library (.lib). It overwrites the shared arrow.lib (exports) with the static arrow.lib. This results in multiple link/execution problems when using the vc projects to build the example projects until you realize that shared arrow needs to be rebuilt. (This took me two days.) Also, many of the projects added with the extra -D flags (beyond ARROW_BUILD_TESTS) don't build. *** "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\Common7\Tools\VsDevCmd.bat" -arch=amd64 cd F:\Dev\vcpkg\buildtrees\arrow\src\row-0.16.0-872c330822\cpp mkdir build cd build cmake .. -G "Visual Studio 15 2017 Win64" -DARROW_BUILD_TESTS=ON -DARROW_BUILD_EXAMPLES=ON -DARROW_PARQUET=ON -DARROW_PYTHON=ON -DCMAKE_BUILD_TYPE=Debug cmake --build . --config Debug -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8241) [Rust] Add convenience methods to Schema
[ https://issues.apache.org/jira/browse/ARROW-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-8241. --- Resolution: Fixed Issue resolved by pull request 6740 [https://github.com/apache/arrow/pull/6740] > [Rust] Add convenience methods to Schema > > > Key: ARROW-8241 > URL: https://issues.apache.org/jira/browse/ARROW-8241 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 1h > Remaining Estimate: 0h > > I would like to add the following methods to Schema to make it easier to work > with. > > {code:java} > pub fn field_with_name(&self, name: &str) -> Result<&Field>; > pub fn index_of(&self, name: &str) -> Result<usize>; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7792) [R] read_feather does not close connection to file
[ https://issues.apache.org/jira/browse/ARROW-7792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069051#comment-17069051 ] Neal Richardson commented on ARROW-7792: Yes I've been waiting for your patch to land before tackling this. > [R] read_feather does not close connection to file > -- > > Key: ARROW-7792 > URL: https://issues.apache.org/jira/browse/ARROW-7792 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Martin >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 50m > Remaining Estimate: 0h > > x = as.data.frame(seq(1:100)) > pbFilename <- file.path(getwd(), "reproduceBug.feather") > arrow::write_feather(x = x, sink = pbFilename) > file.exists(pbFilename) > file.remove(pbFilename) > arrow::write_feather(x = x, sink = pbFilename) > tempDX <- arrow::read_feather(file = pbFilename, as_data_frame = T) > file.exists(pbFilename) > file.remove(pbFilename) > >Warning message: > >In file.remove(pbFilename) : > >cannot remove file > >'C:/Martin/Repo/ReinforcementLearner/reproduceBug.feather', reason > > 'Permission denied' > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7792) [R] read_feather does not close connection to file
[ https://issues.apache.org/jira/browse/ARROW-7792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069043#comment-17069043 ] Wes McKinney commented on ARROW-7792: - [~npr] we should probably revisit this in context with Feather V2 ARROW-5510. Now writing is boiled down to a single function {{arrow::ipc::feather::WriteTable}} which can fail. In Python we have this scenario guarded with try/except to make sure that the file handle is cleaned up (we got identical bug reports): https://github.com/apache/arrow/blob/master/python/pyarrow/feather.py#L179 > [R] read_feather does not close connection to file > -- > > Key: ARROW-7792 > URL: https://issues.apache.org/jira/browse/ARROW-7792 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Martin >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 50m > Remaining Estimate: 0h > > x = as.data.frame(seq(1:100)) > pbFilename <- file.path(getwd(), "reproduceBug.feather") > arrow::write_feather(x = x, sink = pbFilename) > file.exists(pbFilename) > file.remove(pbFilename) > arrow::write_feather(x = x, sink = pbFilename) > tempDX <- arrow::read_feather(file = pbFilename, as_data_frame = T) > file.exists(pbFilename) > file.remove(pbFilename) > >Warning message: > >In file.remove(pbFilename) : > >cannot remove file > >'C:/Martin/Repo/ReinforcementLearner/reproduceBug.feather', reason > > 'Permission denied' > -- This message was sent by Atlassian Jira (v8.3.4#803005)
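The guard Wes describes (close the handle even when the write fails, so the OS releases the lock on the file) can be sketched in plain Python. This is a minimal illustration of the pattern, not the actual pyarrow code; `write_with_cleanup` and `failing_write` are hypothetical names introduced here:

```python
import os
import tempfile

def write_with_cleanup(path, write_fn):
    """Write to `path` via `write_fn`, always releasing the file handle.

    Sketch of the try/except guard discussed above: if the write fails,
    the handle is closed and the partial file removed, so a later
    file.remove()-style call does not hit 'Permission denied'.
    `write_fn` is a hypothetical stand-in for the real writer.
    """
    sink = open(path, "wb")
    try:
        write_fn(sink)
    except Exception:
        sink.close()
        os.remove(path)  # drop the partial, now-unlocked file
        raise
    sink.close()

def failing_write(sink):
    # Simulate a writer that errors out mid-write.
    raise IOError("simulated mid-write failure")

path = os.path.join(tempfile.mkdtemp(), "reproduceBug.feather")
try:
    write_with_cleanup(path, failing_write)
except IOError:
    pass
# After the failed write, the file is gone and nothing holds it open.
print(os.path.exists(path))  # False
```

The same shape applies in R via `on.exit()` or `tryCatch(..., finally = ...)` around the connection.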
[jira] [Updated] (ARROW-7783) [C++] ARROW_DATASET should enable ARROW_COMPUTE
[ https://issues.apache.org/jira/browse/ARROW-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-7783: -- Labels: pull-request-available (was: ) > [C++] ARROW_DATASET should enable ARROW_COMPUTE > --- > > Key: ARROW-7783 > URL: https://issues.apache.org/jira/browse/ARROW-7783 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > > Currently, passing {{-DARROW_DATASET=ON}} to CMake doesn't enable > ARROW_COMPUTE, which leads to linker errors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7783) [C++] ARROW_DATASET should enable ARROW_COMPUTE
[ https://issues.apache.org/jira/browse/ARROW-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-7783: --- Assignee: Wes McKinney (was: Francois Saint-Jacques) > [C++] ARROW_DATASET should enable ARROW_COMPUTE > --- > > Key: ARROW-7783 > URL: https://issues.apache.org/jira/browse/ARROW-7783 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: Wes McKinney >Priority: Major > Fix For: 0.17.0 > > > Currently, passing {{-DARROW_DATASET=ON}} to CMake doesn't enable > ARROW_COMPUTE, which leads to linker errors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7605) [C++] Merge jemalloc and other BUNDLED dependencies into libarrow.a
[ https://issues.apache.org/jira/browse/ARROW-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-7605: Fix Version/s: (was: 0.17.0) 1.0.0 > [C++] Merge jemalloc and other BUNDLED dependencies into libarrow.a > --- > > Key: ARROW-7605 > URL: https://issues.apache.org/jira/browse/ARROW-7605 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > If ARROW_JEMALLOC=ON, then currently the libarrow.a cannot be used for static > linking without also obtaining libjemalloc_pic.a -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7605) [C++] Merge jemalloc and other BUNDLED dependencies into libarrow.a
[ https://issues.apache.org/jira/browse/ARROW-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069039#comment-17069039 ] Wes McKinney commented on ARROW-7605: - I haven't forgotten about this. This change is too risky to rush into 0.17.0, but I hope to have a patch ready in the near future so that we can make sure it is robust to different scenarios after 0.17.0 goes out. If someone wants to pick up the project from me, you are welcome to do so. > [C++] Merge jemalloc and other BUNDLED dependencies into libarrow.a > --- > > Key: ARROW-7605 > URL: https://issues.apache.org/jira/browse/ARROW-7605 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > If ARROW_JEMALLOC=ON, then currently the libarrow.a cannot be used for static > linking without also obtaining libjemalloc_pic.a -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7605) [C++] Merge jemalloc and other BUNDLED dependencies into libarrow.a
[ https://issues.apache.org/jira/browse/ARROW-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-7605: Summary: [C++] Merge jemalloc and other BUNDLED dependencies into libarrow.a (was: [C++] Merge private je_arrow symbols into produced libarrow.a) > [C++] Merge jemalloc and other BUNDLED dependencies into libarrow.a > --- > > Key: ARROW-7605 > URL: https://issues.apache.org/jira/browse/ARROW-7605 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > If ARROW_JEMALLOC=ON, then currently the libarrow.a cannot be used for static > linking without also obtaining libjemalloc_pic.a -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6528) [C++] Spurious Flight test failures (port allocation failure)
[ https://issues.apache.org/jira/browse/ARROW-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-6528. - Assignee: David Li Resolution: Fixed I'm closing as Resolved, the binding to port 0 changes should help -- if this occurs again we should reopen and then figure out where a port is failing to allocate > [C++] Spurious Flight test failures (port allocation failure) > - > > Key: ARROW-6528 > URL: https://issues.apache.org/jira/browse/ARROW-6528 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Assignee: David Li >Priority: Major > Fix For: 0.17.0 > > > Seems like our port allocation scheme inside unit tests is still not very > reliable :-/ > https://ci.ursalabs.org/#/builders/71/builds/4147/steps/8/logs/stdio > {code} > [--] 3 tests from TestMetadata > [ RUN ] TestMetadata.DoGet > E0905 12:45:40.322644527 10203 server_chttp2.cc:40] > {"created":"@1567687540.322612245","description":"No address added out of > total 1 > resolved","file":"../src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":394,"referenced_errors":[{"created":"@1567687540.322609844","description":"Unable > to configure > socket","fd":7,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":217,"referenced_errors":[{"created":"@1567687540.322602634","description":"Address > already in > use","errno":98,"file":"../src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":190,"os_error":"Address > already in use","syscall":"bind"}]}]} > ../src/arrow/flight/flight_test.cc:429: Failure > Failed > 'server->Init(options)' failed with Unknown error: Server did not start > properly > /buildbot/AMD64_Conda_Python_3_7/cpp/build-support/run-test.sh: line 97: > 10203 Segmentation fault (core dumped) $TEST_EXECUTABLE "$@" 2>&1 > 10204 Done| $ROOT/build-support/asan_symbolize.py > 10205 Done| ${CXXFILT:-c++filt} > 10206 Done| > $ROOT/build-support/stacktrace_addr2line.pl 
$TEST_EXECUTABLE > 10207 Done| $pipe_cmd 2>&1 > 10208 Done| tee $LOGFILE > /buildbot/AMD64_Conda_Python_3_7/cpp/build/src/arrow/flight > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6895) [C++][Parquet] parquet::arrow::ColumnReader: ByteArrayDictionaryRecordReader repeats returned values when calling `NextBatch()`
[ https://issues.apache.org/jira/browse/ARROW-6895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-6895. - Resolution: Fixed Issue resolved by pull request 6460 [https://github.com/apache/arrow/pull/6460] > [C++][Parquet] parquet::arrow::ColumnReader: ByteArrayDictionaryRecordReader > repeats returned values when calling `NextBatch()` > --- > > Key: ARROW-6895 > URL: https://issues.apache.org/jira/browse/ARROW-6895 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 > Environment: Linux 5.2.17-200.fc30.x86_64 (Docker) >Reporter: Adam Hooper >Assignee: Adam Hooper >Priority: Critical > Labels: pull-request-available > Fix For: 0.17.0 > > Attachments: 01-fix-arrow-6895.diff, bad.parquet, > reset-dictionary-on-read.diff, works.parquet > > Time Spent: 1.5h > Remaining Estimate: 0h > > Given most columns, I can run a loop like: > {code:cpp} > std::unique_ptr<parquet::arrow::ColumnReader> columnReader(/*...*/); > while (nRowsRemaining > 0) { > int n = std::min(100, nRowsRemaining); > std::shared_ptr<arrow::ChunkedArray> chunkedArray; > auto status = columnReader->NextBatch(n, &chunkedArray); > // ... and then use `chunkedArray` > nRowsRemaining -= n; > } > {code} > (The context is: "convert Parquet to CSV/JSON, with small memory footprint." > Used in https://github.com/CJWorkbench/parquet-to-arrow) > Normally, the first {{NextBatch()}} return value looks like {{val0...val99}}; > the second return value looks like {{val100...val199}}; and so on. > ... but with a {{ByteArrayDictionaryRecordReader}}, that isn't the case. The > first {{NextBatch()}} return value looks like {{val0...val100}}; the second > return value looks like {{val0...val99, val100...val199}} (ChunkedArray with > two arrays); the third return value looks like {{val0...val99, > val100...val199, val200...val299}} (ChunkedArray with three arrays); and so > on. The returned arrays are never cleared. > In sum: {{NextBatch()}} on a dictionary column reader returns the wrong > values. 
> I've attached a minimal Parquet file that presents this problem with the > above code; and I've written a patch that fixes this one case, to illustrate > where things are wrong. I don't think I understand enough edge cases to > decree that my patch is a correct fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-6895) [C++][Parquet] parquet::arrow::ColumnReader: ByteArrayDictionaryRecordReader repeats returned values when calling `NextBatch()`
[ https://issues.apache.org/jira/browse/ARROW-6895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-6895: --- Assignee: Adam Hooper > [C++][Parquet] parquet::arrow::ColumnReader: ByteArrayDictionaryRecordReader > repeats returned values when calling `NextBatch()` > --- > > Key: ARROW-6895 > URL: https://issues.apache.org/jira/browse/ARROW-6895 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 > Environment: Linux 5.2.17-200.fc30.x86_64 (Docker) >Reporter: Adam Hooper >Assignee: Adam Hooper >Priority: Critical > Labels: pull-request-available > Fix For: 0.17.0 > > Attachments: 01-fix-arrow-6895.diff, bad.parquet, > reset-dictionary-on-read.diff, works.parquet > > Time Spent: 1.5h > Remaining Estimate: 0h > > Given most columns, I can run a loop like: > {code:cpp} > std::unique_ptr<parquet::arrow::ColumnReader> columnReader(/*...*/); > while (nRowsRemaining > 0) { > int n = std::min(100, nRowsRemaining); > std::shared_ptr<arrow::ChunkedArray> chunkedArray; > auto status = columnReader->NextBatch(n, &chunkedArray); > // ... and then use `chunkedArray` > nRowsRemaining -= n; > } > {code} > (The context is: "convert Parquet to CSV/JSON, with small memory footprint." > Used in https://github.com/CJWorkbench/parquet-to-arrow) > Normally, the first {{NextBatch()}} return value looks like {{val0...val99}}; > the second return value looks like {{val100...val199}}; and so on. > ... but with a {{ByteArrayDictionaryRecordReader}}, that isn't the case. The > first {{NextBatch()}} return value looks like {{val0...val100}}; the second > return value looks like {{val0...val99, val100...val199}} (ChunkedArray with > two arrays); the third return value looks like {{val0...val99, > val100...val199, val200...val299}} (ChunkedArray with three arrays); and so > on. The returned arrays are never cleared. > In sum: {{NextBatch()}} on a dictionary column reader returns the wrong > values. 
> I've attached a minimal Parquet file that presents this problem with the > above code; and I've written a patch that fixes this one case, to illustrate > where things are wrong. I don't think I understand enough edge cases to > decree that my patch is a correct fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-6895) [C++][Parquet] parquet::arrow::ColumnReader: ByteArrayDictionaryRecordReader repeats returned values when calling `NextBatch()`
[ https://issues.apache.org/jira/browse/ARROW-6895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-6895: --- Assignee: (was: Wes McKinney) > [C++][Parquet] parquet::arrow::ColumnReader: ByteArrayDictionaryRecordReader > repeats returned values when calling `NextBatch()` > --- > > Key: ARROW-6895 > URL: https://issues.apache.org/jira/browse/ARROW-6895 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.15.0 > Environment: Linux 5.2.17-200.fc30.x86_64 (Docker) >Reporter: Adam Hooper >Priority: Critical > Labels: pull-request-available > Fix For: 0.17.0 > > Attachments: 01-fix-arrow-6895.diff, bad.parquet, > reset-dictionary-on-read.diff, works.parquet > > Time Spent: 1.5h > Remaining Estimate: 0h > > Given most columns, I can run a loop like: > {code:cpp} > std::unique_ptr<parquet::arrow::ColumnReader> columnReader(/*...*/); > while (nRowsRemaining > 0) { > int n = std::min(100, nRowsRemaining); > std::shared_ptr<arrow::ChunkedArray> chunkedArray; > auto status = columnReader->NextBatch(n, &chunkedArray); > // ... and then use `chunkedArray` > nRowsRemaining -= n; > } > {code} > (The context is: "convert Parquet to CSV/JSON, with small memory footprint." > Used in https://github.com/CJWorkbench/parquet-to-arrow) > Normally, the first {{NextBatch()}} return value looks like {{val0...val99}}; > the second return value looks like {{val100...val199}}; and so on. > ... but with a {{ByteArrayDictionaryRecordReader}}, that isn't the case. The > first {{NextBatch()}} return value looks like {{val0...val100}}; the second > return value looks like {{val0...val99, val100...val199}} (ChunkedArray with > two arrays); the third return value looks like {{val0...val99, > val100...val199, val200...val299}} (ChunkedArray with three arrays); and so > on. The returned arrays are never cleared. > In sum: {{NextBatch()}} on a dictionary column reader returns the wrong > values. 
> I've attached a minimal Parquet file that presents this problem with the > above code; and I've written a patch that fixes this one case, to illustrate > where things are wrong. I don't think I understand enough edge cases to > decree that my patch is a correct fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8231) [Rust] Parse key_value_metadata from parquet FileMetaData into arrow schema metadata
[ https://issues.apache.org/jira/browse/ARROW-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-8231. - Fix Version/s: 0.17.0 Resolution: Fixed Issue resolved by pull request 6742 [https://github.com/apache/arrow/pull/6742] > [Rust] Parse key_value_metadata from parquet FileMetaData into arrow schema > metadata > > > Key: ARROW-8231 > URL: https://issues.apache.org/jira/browse/ARROW-8231 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Jörn Horstmann >Assignee: Jörn Horstmann >Priority: Minor > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The parquet-format FileMetaData struct contains optional key value pairs with > additional metadata about the schema: > [https://docs.rs/parquet-format/2.6.0/src/parquet_format/parquet_format.rs.html#3821] > When the parquet file was generated using the java avro parquet writer, this > for example contains the original avro schema under the `parquet.avro.schema` > or `avro.schema` keys. > It would be nice if this metadata was accessible through the > `arrow::datatypes::Schema.metadata` field. > I'm willing to implement and create a pull request for this feature. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-8231) [Rust] Parse key_value_metadata from parquet FileMetaData into arrow schema metadata
[ https://issues.apache.org/jira/browse/ARROW-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-8231: --- Assignee: Jörn Horstmann > [Rust] Parse key_value_metadata from parquet FileMetaData into arrow schema > metadata > > > Key: ARROW-8231 > URL: https://issues.apache.org/jira/browse/ARROW-8231 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Jörn Horstmann >Assignee: Jörn Horstmann >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > The parquet-format FileMetaData struct contains optional key value pairs with > additional metadata about the schema: > [https://docs.rs/parquet-format/2.6.0/src/parquet_format/parquet_format.rs.html#3821] > When the parquet file was generated using the java avro parquet writer, this > for example contains the original avro schema under the `parquet.avro.schema` > or `avro.schema` keys. > It would be nice if this metadata was accessible through the > `arrow::datatypes::Schema.metadata` field. > I'm willing to implement and create a pull request for this feature. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8243) [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder
[ https://issues.apache.org/jira/browse/ARROW-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-8243. --- Resolution: Fixed Issue resolved by pull request 6741 [https://github.com/apache/arrow/pull/6741] > [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder > -- > > Key: ARROW-8243 > URL: https://issues.apache.org/jira/browse/ARROW-8243 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 20m > Remaining Estimate: 0h > > LogicalPlanBuilder project method takes a whereas other methods take a > Vec. It makes sense to take Vec and take ownership of these inputs since they > are being used to build the plan. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069004#comment-17069004 ] Caleb Overman commented on ARROW-8245: -- We're currently on 0.16.0 and have a patch to ignore directories with a . prefix. Happy to do a PR for this - are there any other known prefixes that should be ignored? > [Python][Parquet] Skip hidden directories when reading partitioned parquet > files > > > Key: ARROW-8245 > URL: https://issues.apache.org/jira/browse/ARROW-8245 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Caleb Overman >Priority: Minor > Labels: parquet > Fix For: 0.17.0 > > > When writing a partitioned parquet file Spark can create a temporary hidden > {{.spark-staging}} directory within the parquet file. Because it is a > directory and not a file, it is not skipped when trying to read the parquet > file. Pyarrow currently only skips directories prefixed with {{_}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8246) [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors
[ https://issues.apache.org/jira/browse/ARROW-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8246: -- Labels: pull-request-available (was: ) > [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors > - > > Key: ARROW-8246 > URL: https://issues.apache.org/jira/browse/ARROW-8246 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > > See > https://digitalkarabela.com/mingw-w64-how-to-fix-file-too-big-too-many-sections/ > This seems to be the MinGW equivalent of {{/bigobj}} in MSVC -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-8246) [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors
[ https://issues.apache.org/jira/browse/ARROW-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-8246: --- Assignee: Wes McKinney > [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors > - > > Key: ARROW-8246 > URL: https://issues.apache.org/jira/browse/ARROW-8246 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.17.0 > > > See > https://digitalkarabela.com/mingw-w64-how-to-fix-file-too-big-too-many-sections/ > This seems to be the MinGW equivalent of {{/bigobj}} in MSVC -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8217) [R][C++] Fix crashing test in test-dataset.R on 32-bit Windows from ARROW-7979
[ https://issues.apache.org/jira/browse/ARROW-8217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068983#comment-17068983 ] Wes McKinney commented on ARROW-8217: - It sort of suggests that the segfault is originating in the R bindings instead of the C++ library, which would be weird but I suppose it's possible. I think the debug build error can be resolved with ARROW-8246 > [R][C++] Fix crashing test in test-dataset.R on 32-bit Windows from ARROW-7979 > -- > > Key: ARROW-8217 > URL: https://issues.apache.org/jira/browse/ARROW-8217 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Reporter: Wes McKinney >Priority: Major > Fix For: 0.17.0 > > > If we can obtain a gdb backtrace from the failed test in > https://github.com/apache/arrow/pull/6638 then we can sort out what's wrong. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8247) [Python] Expose Parquet writing "engine" setting in pyarrow.parquet.write_table
[ https://issues.apache.org/jira/browse/ARROW-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-8247: Description: This is a follow up to ARROW-7741 so we have a path to the old Parquet writer logic in the event that bugs are reported and we need to give users a workaround. Eventually this option will be removed once the prior writing code is removed (was: This is a follow up to ARROW-7741 so we have a path to the old Parquet writer logic in the event that bugs are reported and we need to give users a workaround) > [Python] Expose Parquet writing "engine" setting in > pyarrow.parquet.write_table > --- > > Key: ARROW-8247 > URL: https://issues.apache.org/jira/browse/ARROW-8247 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.17.0 > > > This is a follow up to ARROW-7741 so we have a path to the old Parquet writer > logic in the event that bugs are reported and we need to give users a > workaround. Eventually this option will be removed once the prior writing > code is removed -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8247) [Python] Expose Parquet writing "engine" setting in pyarrow.parquet.write_table
Wes McKinney created ARROW-8247: --- Summary: [Python] Expose Parquet writing "engine" setting in pyarrow.parquet.write_table Key: ARROW-8247 URL: https://issues.apache.org/jira/browse/ARROW-8247 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney Fix For: 0.17.0 This is a follow up to ARROW-7741 so we have a path to the old Parquet writer logic in the event that bugs are reported and we need to give users a workaround -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8246) [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors
Wes McKinney created ARROW-8246: --- Summary: [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors Key: ARROW-8246 URL: https://issues.apache.org/jira/browse/ARROW-8246 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 0.17.0 See https://digitalkarabela.com/mingw-w64-how-to-fix-file-too-big-too-many-sections/ This seems to be the MinGW equivalent of {{/bigobj}} in MSVC -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-8245: Fix Version/s: 0.17.0 > [Python][Parquet] Skip hidden directories when reading partitioned parquet > files > > > Key: ARROW-8245 > URL: https://issues.apache.org/jira/browse/ARROW-8245 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Caleb Overman >Priority: Minor > Labels: parquet > Fix For: 0.17.0 > > > When writing a partitioned parquet file Spark can create a temporary hidden > {{.spark-staging}} directory within the parquet file. Because it is a > directory and not a file, it is not skipped when trying to read the parquet > file. Pyarrow currently only skips directories prefixed with {{_}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068974#comment-17068974 ] Wes McKinney commented on ARROW-8245: - Ah I see that the issue is that this exclusion is only applied to file paths. See https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L927 This should be easy to fix. Will also need to be handled in the C++ Datasets API cc [~jorisvandenbossche] > [Python][Parquet] Skip hidden directories when reading partitioned parquet > files > > > Key: ARROW-8245 > URL: https://issues.apache.org/jira/browse/ARROW-8245 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Caleb Overman >Priority: Minor > Labels: parquet > > When writing a partitioned parquet file Spark can create a temporary hidden > {{.spark-staging}} directory within the parquet file. Because it is a > directory and not a file, it is not skipped when trying to read the parquet > file. Pyarrow currently only skips directories prefixed with {{_}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068973#comment-17068973 ] Wes McKinney commented on ARROW-8245: - What version of the library are you using? https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L877 > [Python][Parquet] Skip hidden directories when reading partitioned parquet > files > > > Key: ARROW-8245 > URL: https://issues.apache.org/jira/browse/ARROW-8245 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Caleb Overman >Priority: Minor > Labels: parquet > > When writing a partitioned parquet file Spark can create a temporary hidden > {{.spark-staging}} directory within the parquet file. Because it is a > directory and not a file, it is not skipped when trying to read the parquet > file. Pyarrow currently only skips directories prefixed with {{_}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
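The prefix-skipping rule discussed in ARROW-8245 can be sketched as follows. This is a minimal illustration of the proposed behavior, not pyarrow's actual API; the helper name and prefix tuple are hypothetical.

```python
# Proposed rule: a partitioned-dataset reader should skip path components
# prefixed with "_" (already the case) and "." (the fix requested here, so
# hidden directories such as Spark's ".spark-staging" are ignored too).
EXCLUDED_PREFIXES = ("_", ".")

def is_hidden(name: str) -> bool:
    """Return True for names a partitioned parquet reader should skip."""
    return name.startswith(EXCLUDED_PREFIXES)

paths = ["part=1", "_metadata", ".spark-staging", "part=2"]
visible = [p for p in paths if not is_hidden(p)]
# visible == ["part=1", "part=2"]
```

Note the original exclusion was applied only to file paths; per Wes's comment, the same check needs to run on directory names as well.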
[jira] [Commented] (ARROW-3329) [Python] Error casting decimal(38, 4) to int64
[ https://issues.apache.org/jira/browse/ARROW-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068969#comment-17068969 ] Wes McKinney commented on ARROW-3329: - You need to clean temporary files out of the python/ directory with {{git clean -fdx python}}. This should be added to the documentation > [Python] Error casting decimal(38, 4) to int64 > -- > > Key: ARROW-3329 > URL: https://issues.apache.org/jira/browse/ARROW-3329 > Project: Apache Arrow > Issue Type: Bug > Components: Python > Environment: Python version : 3.6.5 > Pyarrow version : 0.10.0 >Reporter: Kavita Sheth >Assignee: Jacek Pliszka >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 7h 20m > Remaining Estimate: 0h > > Git issue Link : https://github.com/apache/arrow/issues/2627 > I want to cast pyarrow table column from decimal(38,4) to int64. > col.cast(pa.int64()) > Error: > File "pyarrow/table.pxi", line 443, in pyarrow.lib.Column.cast > File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: No cast implemented from decimal(38, > 4) to int64 > Python version : 3.6.5 > Pyarrow version : 0.10.0 > is it not implemented yet or am I not using it correctly? If not implemented > yet, then any workaround to cast columns? -- This message was sent by Atlassian Jira (v8.3.4#803005)
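While the decimal(38, 4) to int64 cast was unimplemented in pyarrow 0.10.0, one possible workaround is to truncate the values with Python's stdlib decimal module before handing them back to Arrow. This is a sketch only; it bypasses Arrow's cast kernels entirely and assumes truncation-toward-zero semantics are what the caller wants.

```python
from decimal import Decimal, ROUND_DOWN

# Workaround sketch: drop the 4 fractional digits in pure Python, then the
# resulting ints can be fed to an int64 Arrow array directly.
values = [Decimal("12345.6789"), Decimal("-7.5000")]
as_int64 = [int(v.to_integral_value(rounding=ROUND_DOWN)) for v in values]
# as_int64 == [12345, -7]
```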
[jira] [Resolved] (ARROW-8070) [C++] Cast segfaults on unsupported cast from list to utf8
[ https://issues.apache.org/jira/browse/ARROW-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-8070. - Resolution: Fixed Issue resolved by pull request 6738 [https://github.com/apache/arrow/pull/6738] > [C++] Cast segfaults on unsupported cast from list to utf8 > -- > > Key: ARROW-8070 > URL: https://issues.apache.org/jira/browse/ARROW-8070 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Daniel Nugent >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Was messing around with some nested arrays and found a pretty easy to > reproduce segfault: > {code:java} > Python 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) > [GCC 7.3.0] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np, pyarrow as pa > >>> pa.__version__ > '0.16.0' > >>> np.__version__ > '1.18.1' > >>> x=[np.array([b'a',b'b'])] > >>> a = pa.array(x,pa.list_(pa.binary())) > >>> a > > [ > [ > 61, > 62 > ] > ] > >>> a.cast(pa.string()) > Segmentation fault > {code} > I don't know if that cast makes sense, but I left the checks on, so I would > not expect a segfault from it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8242) [C++] Flight fails to compile on GCC 4.8
[ https://issues.apache.org/jira/browse/ARROW-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-8242. - Resolution: Fixed Issue resolved by pull request 6739 [https://github.com/apache/arrow/pull/6739] > [C++] Flight fails to compile on GCC 4.8 > > > Key: ARROW-8242 > URL: https://issues.apache.org/jira/browse/ARROW-8242 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Blocker > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 50m > Remaining Estimate: 0h > > See recent build log > https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=8944=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b=2186 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8217) [R][C++] Fix crashing test in test-dataset.R on 32-bit Windows from ARROW-7979
[ https://issues.apache.org/jira/browse/ARROW-8217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068931#comment-17068931 ] Neal Richardson commented on ARROW-8217: I did a build with -DARROW_CXXFLAGS="-g", no difference. That was on the C++ build only though, any reason to think it would matter to compile the R bindings with that too? Even after the hanging build was resolved on master, I still can't get a debug build: https://github.com/ursa-labs/arrow-r-nightly/runs/538449507?check_suite_focus=true#step:7:1126 {code} [ 16%] Building CXX object src/arrow/CMakeFiles/arrow_static.dir/record_batch.cc.obj C:/Rtools/mingw_64/bin/../lib/gcc/x86_64-w64-mingw32/4.9.3/../../../../x86_64-w64-mingw32/bin/as.exe: CMakeFiles/arrow_static.dir/array/diff.cc.obj: too many sections (43914) D:\a\_temp\msys\msys64\tmp\ccqm015L.s: Assembler messages: D:\a\_temp\msys\msys64\tmp\ccqm015L.s: Fatal error: can't write CMakeFiles/arrow_static.dir/array/diff.cc.obj: File too big C:/Rtools/mingw_64/bin/../lib/gcc/x86_64-w64-mingw32/4.9.3/../../../../x86_64-w64-mingw32/bin/as.exe: CMakeFiles/arrow_static.dir/array/diff.cc.obj: too many sections (43914) D:\a\_temp\msys\msys64\tmp\ccqm015L.s: Fatal error: can't close CMakeFiles/arrow_static.dir/array/diff.cc.obj: File too big {code} > [R][C++] Fix crashing test in test-dataset.R on 32-bit Windows from ARROW-7979 > -- > > Key: ARROW-8217 > URL: https://issues.apache.org/jira/browse/ARROW-8217 > Project: Apache Arrow > Issue Type: Bug > Components: C++, R >Reporter: Wes McKinney >Priority: Major > Fix For: 0.17.0 > > > If we can obtain a gdb backtrace from the failed test in > https://github.com/apache/arrow/pull/6638 then we can sort out what's wrong. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-8238) [C++][Compute] Failed to build compute tests on windows with msvc2015
[ https://issues.apache.org/jira/browse/ARROW-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068929#comment-17068929 ] Wes McKinney edited comment on ARROW-8238 at 3/27/20, 5:36 PM: --- Definitely weird since we build these tests in CI. If you are having a hard time figuring it out I can try to reproduce locally on my Windows 10 machine was (Author: wesmckinn): Definitely weird since we build these functions in CI. If you are having a hard time figuring it out I can try to reproduce locally on my Windows 10 machine > [C++][Compute] Failed to build compute tests on windows with msvc2015 > - > > Key: ARROW-8238 > URL: https://issues.apache.org/jira/browse/ARROW-8238 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Compute >Reporter: Yibo Cai >Priority: Minor > > Build Arrow compute tests on Windows10 with MSVC2015: > {code:bash} > cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DARROW_COMPUTE=ON > -DARROW_BUILD_TESTS=ON .. > ninja -j3 > {code} > Build failed with below message: > {code:bash} > [311/405] Linking CXX executable release\arrow-misc-test.exe > FAILED: release/arrow-misc-test.exe > cmd.exe /C "cd . 
&& > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\bin\cmake.exe -E > vs_link_exe --intdir=src\arrow\CMakeFiles\arrow-misc-test.dir > --rc=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\rc.exe > --mt=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\mt.exe --manifests -- > C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo > src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj > /out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib > /pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 > /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console > release\arrow_testing.lib release\arrow.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest.lib > googletest_ep-prefix\src\googletest_ep\lib\gmock.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib > Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib > ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ." 
> LINK: command "C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo > src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj > /out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib > /pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 > /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console > release\arrow_testing.lib release\arrow.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest.lib > googletest_ep-prefix\src\googletest_ep\lib\gmock.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib > Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib > ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST > /MANIFESTFILE:release\arrow-misc-test.exe.manifest" failed (exit code 1169) > with the following output: > arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl > std::vector >::vector std::allocator >(class std::initializer_list,class > std::allocator const &)" > (??0?$vector@HV?$allocator@H@std@@@std@@QEAA@V?$initializer_list@H@1@AEBV?$allocator@H@1@@Z) > already defined in result_test.cc.obj > arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl > std::vector >::~vector std::allocator >(void)" (??1?$vector@HV?$allocator@H@std@@@std@@QEAA@XZ) > already defined in result_test.cc.obj > arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: unsigned > __int64 __cdecl std::vector >::size(void)const > " (?size@?$vector@HV?$allocator@H@std@@@std@@QEBA_KXZ) already defined in > result_test.cc.obj > release\arrow-misc-test.exe : fatal error LNK1169: one or more multiply > defined symbols found > [313/405] Building 
CXX object > src\arrow\CMakeFiles\arrow-table-test.dir\table_builder_test.cc.obj > ninja: build stopped: subcommand failed. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8238) [C++][Compute] Failed to build compute tests on windows with msvc2015
[ https://issues.apache.org/jira/browse/ARROW-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068929#comment-17068929 ] Wes McKinney commented on ARROW-8238: - Definitely weird since we build these functions in CI. If you are having a hard time figuring it out I can try to reproduce locally on my Windows 10 machine > [C++][Compute] Failed to build compute tests on windows with msvc2015 > - > > Key: ARROW-8238 > URL: https://issues.apache.org/jira/browse/ARROW-8238 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Compute >Reporter: Yibo Cai >Priority: Minor > > Build Arrow compute tests on Windows10 with MSVC2015: > {code:bash} > cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DARROW_COMPUTE=ON > -DARROW_BUILD_TESTS=ON .. > ninja -j3 > {code} > Build failed with below message: > {code:bash} > [311/405] Linking CXX executable release\arrow-misc-test.exe > FAILED: release/arrow-misc-test.exe > cmd.exe /C "cd . && > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\bin\cmake.exe -E > vs_link_exe --intdir=src\arrow\CMakeFiles\arrow-misc-test.dir > --rc=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\rc.exe > --mt=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\mt.exe --manifests -- > C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo > src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj > /out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib > /pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 > /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console > release\arrow_testing.lib release\arrow.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest.lib > googletest_ep-prefix\src\googletest_ep\lib\gmock.lib > 
C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib > Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib > ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ." > LINK: command "C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo > src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj > /out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib > /pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 > /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console > release\arrow_testing.lib release\arrow.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest.lib > googletest_ep-prefix\src\googletest_ep\lib\gmock.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib > Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib > ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST > /MANIFESTFILE:release\arrow-misc-test.exe.manifest" failed (exit code 1169) > with the following output: > arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl > std::vector >::vector std::allocator >(class std::initializer_list,class > std::allocator const &)" > (??0?$vector@HV?$allocator@H@std@@@std@@QEAA@V?$initializer_list@H@1@AEBV?$allocator@H@1@@Z) > already defined in result_test.cc.obj > arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl > std::vector >::~vector std::allocator >(void)" (??1?$vector@HV?$allocator@H@std@@@std@@QEAA@XZ) > already defined in result_test.cc.obj > arrow_testing.lib(arrow_testing.dll) : error 
LNK2005: "public: unsigned > __int64 __cdecl std::vector >::size(void)const > " (?size@?$vector@HV?$allocator@H@std@@@std@@QEBA_KXZ) already defined in > result_test.cc.obj > release\arrow-misc-test.exe : fatal error LNK1169: one or more multiply > defined symbols found > [313/405] Building CXX object > src\arrow\CMakeFiles\arrow-table-test.dir\table_builder_test.cc.obj > ninja: build stopped: subcommand failed. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-7741) [C++][Parquet] Incorporate new level generation logic in parquet write path with a flag to revert back to old logic
[ https://issues.apache.org/jira/browse/ARROW-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-7741. - Fix Version/s: 0.17.0 Resolution: Fixed Issue resolved by pull request 6586 [https://github.com/apache/arrow/pull/6586] > [C++][Parquet] Incorporate new level generation logic in parquet write path > with a flag to revert back to old logic > --- > > Key: ARROW-7741 > URL: https://issues.apache.org/jira/browse/ARROW-7741 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Affects Versions: 0.17.0 >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Minor > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > This is likely going to be a decent amount of changes we should isolate them > behind a feature flag. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8061) [C++][Dataset] Ability to specify granularity of ParquetFileFragment (support row groups)
[ https://issues.apache.org/jira/browse/ARROW-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques resolved ARROW-8061. --- Fix Version/s: 0.17.0 Resolution: Fixed Issue resolved by pull request 6670 [https://github.com/apache/arrow/pull/6670] > [C++][Dataset] Ability to specify granularity of ParquetFileFragment (support > row groups) > - > > Key: ARROW-8061 > URL: https://issues.apache.org/jira/browse/ARROW-8061 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Dataset >Reporter: Joris Van den Bossche >Assignee: Ben Kietzman >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Specifically for parquet (not sure if it will be relevant for other file > formats as well, for IPC/feather potentially the record batch), it would be > useful to target row groups instead of files as fragments. > Quoting the original design documents: _"In datasets consisting of many > fragments, the dataset API must expose the granularity of fragments in a > public way to enable parallel processing, if desired. "._ > And a comment from Wes on that: _"a single Parquet file can "export" one or > more fragments based on settings. The default might be to split fragments > based on row group"_ > Currently, the level on which fragments are defined (at least in the typical > partitioned parquet dataset) is "1 file == 1 fragment". > Would it be possible or desirable to make this more fine grained, where you > could also opt to have a fragment per row group? > We could have a ParquetFragment that has this option, and a ParquetFileFormat > specific option to say what the granularity of a fragment is (file vs row > group)? > cc [~fsaintjacques] [~bkietz] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-7908) [R] Can't install package without setting LIBARROW_DOWNLOAD=true
[ https://issues.apache.org/jira/browse/ARROW-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-7908. Fix Version/s: 0.17.0 Assignee: Neal Richardson Resolution: Fixed > [R] Can't install package without setting LIBARROW_DOWNLOAD=true > > > Key: ARROW-7908 > URL: https://issues.apache.org/jira/browse/ARROW-7908 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.16.0 > Environment: Operating System: Red Hat Enterprise Linux Server 7.6 > (Maipo) > CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server > Kernel: Linux 3.10.0-957.35.2.el7.x86_64 > Architecture: x86-64 >Reporter: Taeke >Assignee: Neal Richardson >Priority: Major > Fix For: 0.17.0 > > > Hi, > Installing arrow in R does not work intuitively on our server. > {code:r} > install.packages("arrow")` > {code} > results in an error: > {code:sh} > Installing package into '/home//R/x86_64-redhat-linux-gnu-library/3.6' > (as 'lib' is unspecified) > trying URL 'https://cloud.r-project.org/src/contrib/arrow_0.16.0.2.tar.gz' > Content type 'application/x-gzip' length 216119 bytes (211 KB) > == > downloaded 211 KB > * installing *source* package 'arrow' ... 
> ** package 'arrow' successfully unpacked and MD5 sums checked > ** using staged installation > PKG_CFLAGS=-I/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/include > -DARROW_R_WITH_ARROW > PKG_LIBS=-L/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/lib > -larrow_dataset -lparquet -larrow -lthrift -lsnappy -lz -lzstd -llz4 > -lbrotlidec-static -lbrotlienc-static -lbrotlicommon-static > -lboost_filesystem -lboost_regex -lboost_system -ljemalloc_pic > ** libs > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/include > -DARROW_R_WITH_ARROW -I"/usr/lib64/R/library/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array.cpp -o array.o > In file included from array.cpp:18:0: > ./arrow_types.h:201:31: fatal error: arrow/dataset/api.h: No such file or > directory > {code} > It appears that the C++ code is not built. With arrow 0.16.0.1 things do work > out, because it tries to build the C++ code from source. With arrow 0.16.0.2 > such is no longer the case. I could finish the installation by setting the > environment variable LIBARROW_DOWNLOAD to 'true': > {code:java} > export LIBARROW_DOWNLOAD=true > {code} > That, apparently, triggers the build from source. I would have expected that > I would not need to set this variable explicitly. > I found that [between > versions|https://github.com/apache/arrow/commit/660d0e7cbaa1cfb51498299d445636fdd6a58420], > the default value of LIBARROW_DOWNLOAD has changed: > {code:sh} > - download_ok <- locally_installing && !env_is("LIBARROW_DOWNLOAD", "false") > + download_ok <- env_is("LIBARROW_DOWNLOAD", "true") > {code} > In our environment, that variable was _not_ set, resulting (accidentally?) 
in > download_ok being false and therefore the libraries not being installed and > finally the resulting error above. > > I can't quite figure out the logic behind all this, but it would be nice if > we'd be able to install the package without first having to set > LIBARROW_DOWNLOAD. > > Thank you for looking into this! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7908) [R] Can't install package without setting LIBARROW_DOWNLOAD=true
[ https://issues.apache.org/jira/browse/ARROW-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068828#comment-17068828 ] Neal Richardson commented on ARROW-7908: Glad to hear! > [R] Can't install package without setting LIBARROW_DOWNLOAD=true > > > Key: ARROW-7908 > URL: https://issues.apache.org/jira/browse/ARROW-7908 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.16.0 > Environment: Operating System: Red Hat Enterprise Linux Server 7.6 > (Maipo) > CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server > Kernel: Linux 3.10.0-957.35.2.el7.x86_64 > Architecture: x86-64 >Reporter: Taeke >Priority: Major > > Hi, > Installing arrow in R does not work intuitively on our server. > {code:r} > install.packages("arrow")` > {code} > results in an error: > {code:sh} > Installing package into '/home//R/x86_64-redhat-linux-gnu-library/3.6' > (as 'lib' is unspecified) > trying URL 'https://cloud.r-project.org/src/contrib/arrow_0.16.0.2.tar.gz' > Content type 'application/x-gzip' length 216119 bytes (211 KB) > == > downloaded 211 KB > * installing *source* package 'arrow' ... 
> ** package 'arrow' successfully unpacked and MD5 sums checked > ** using staged installation > PKG_CFLAGS=-I/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/include > -DARROW_R_WITH_ARROW > PKG_LIBS=-L/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/lib > -larrow_dataset -lparquet -larrow -lthrift -lsnappy -lz -lzstd -llz4 > -lbrotlidec-static -lbrotlienc-static -lbrotlicommon-static > -lboost_filesystem -lboost_regex -lboost_system -ljemalloc_pic > ** libs > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/include > -DARROW_R_WITH_ARROW -I"/usr/lib64/R/library/Rcpp/include" > -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 > -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 > -grecord-gcc-switches -m64 -mtune=generic -c array.cpp -o array.o > In file included from array.cpp:18:0: > ./arrow_types.h:201:31: fatal error: arrow/dataset/api.h: No such file or > directory > {code} > It appears that the C++ code is not built. With arrow 0.16.0.1 things do work > out, because it tries to build the C++ code from source. With arrow 0.16.0.2 > such is no longer the case. I could finish the installation by setting the > environment variable LIBARROW_DOWNLOAD to 'true': > {code:java} > export LIBARROW_DOWNLOAD=true > {code} > That, apparently, triggers the build from source. I would have expected that > I would not need to set this variable explicitly. > I found that [between > versions|https://github.com/apache/arrow/commit/660d0e7cbaa1cfb51498299d445636fdd6a58420], > the default value of LIBARROW_DOWNLOAD has changed: > {code:sh} > - download_ok <- locally_installing && !env_is("LIBARROW_DOWNLOAD", "false") > + download_ok <- env_is("LIBARROW_DOWNLOAD", "true") > {code} > In our environment, that variable was _not_ set, resulting (accidentally?) 
in > download_ok being false and therefore the libraries not being installed and > finally the resulting error above. > > I can't quite figure out the logic behind all this, but it would be nice if > we'd be able to install the package without first having to set > LIBARROW_DOWNLOAD. > > Thank you for looking into this! -- This message was sent by Atlassian Jira (v8.3.4#803005)
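The opt-out to opt-in flip in the quoted configure diff above can be sketched as follows (in Python rather than the package's R, with illustrative helper names, not the exact arrow build-script code):

```python
import os

def env_is(var, value, env=None):
    # Illustrative re-implementation of the env_is() helper quoted above
    # from the arrow R package's configure script.
    env = os.environ if env is None else env
    return env.get(var, "").lower() == value.lower()

def download_ok_before(locally_installing, env):
    # Old default: download unless LIBARROW_DOWNLOAD is explicitly "false".
    return locally_installing and not env_is("LIBARROW_DOWNLOAD", "false", env)

def download_ok_after(env):
    # New default: download only when LIBARROW_DOWNLOAD is explicitly "true".
    return env_is("LIBARROW_DOWNLOAD", "true", env)

# With the variable unset, the old logic downloaded and the new one does not,
# which is the behavior change the reporter hit:
print(download_ok_before(True, {}))  # True
print(download_ok_after({}))         # False
```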
[jira] [Commented] (ARROW-7908) [R] Can't install package without setting LIBARROW_DOWNLOAD=true
[ https://issues.apache.org/jira/browse/ARROW-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068827#comment-17068827 ] Taeke commented on ARROW-7908: -- Ah, I see. I did install from CRAN, but in order to work around the error of the missing codegen.R, I switched to GitHub, which led to the mismatch in version numbers. I could not find another way to get the installation started (other than setting LIBARROW_DOWNLOAD, which I tried to avoid). Indeed, with the nightly build it does install properly. Thanks a lot! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Overman updated ARROW-8245: - Description: When writing a partitioned parquet file Spark can create a temporary hidden {{.spark-staging}} directory within the parquet file. Because it is a directory and not a file, it is not skipped when trying to read the parquet file. Pyarrow currently only skips directories prefixed with {{_}}. (was: When writing a partitioned parquet file Spark can create a temporary hidden `.spark-staging` directory within the parquet file. Because it is a directory and not a file, it is not skipped when trying to read the parquet file. Pyarrow currently only skips directories prefixed with `_`.) > [Python][Parquet] Skip hidden directories when reading partitioned parquet > files > > > Key: ARROW-8245 > URL: https://issues.apache.org/jira/browse/ARROW-8245 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Caleb Overman >Priority: Minor > Labels: parquet > > When writing a partitioned parquet file Spark can create a temporary hidden > {{.spark-staging}} directory within the parquet file. Because it is a > directory and not a file, it is not skipped when trying to read the parquet > file. Pyarrow currently only skips directories prefixed with {{_}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
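The filtering change requested above can be sketched in a few lines; `should_skip` is a hypothetical helper, not pyarrow's actual implementation:

```python
def should_skip(entry_name):
    # Pyarrow currently skips only '_'-prefixed entries (per the report);
    # also skipping '.'-prefixed ones would exclude hidden directories
    # such as Spark's temporary .spark-staging directory.
    return entry_name.startswith(("_", "."))

entries = ["region=us", "_metadata", "_common_metadata", ".spark-staging", "region=eu"]
data_dirs = [e for e in entries if not should_skip(e)]
print(data_dirs)  # ['region=us', 'region=eu']
```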
[jira] [Commented] (ARROW-7908) [R] Can't install package without setting LIBARROW_DOWNLOAD=true
[ https://issues.apache.org/jira/browse/ARROW-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068806#comment-17068806 ] Neal Richardson commented on ARROW-7908: codegen.R and decor are not required, despite the error messages they throw: they're allowed to fail. I'm not sure why you're seeing a version number of .9000 unless you're installing from git/github, not CRAN. In any case, I believe installation should work now on the latest dev version. Could you try installing (no env vars required) from our nightly repository, {{install.packages("arrow", repos="https://dl.bintray.com/ursalabs/arrow-r")}}? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8245) [Python] Skip hidden directories when reading partitioned parquet files
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Overman updated ARROW-8245: - Labels: parquet (was: ) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Overman updated ARROW-8245: - Summary: [Python][Parquet] Skip hidden directories when reading partitioned parquet files (was: [Python] Skip hidden directories when reading partitioned parquet files) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8245) [Python] Skip hidden directories when reading partitioned parquet files
Caleb Overman created ARROW-8245: Summary: [Python] Skip hidden directories when reading partitioned parquet files Key: ARROW-8245 URL: https://issues.apache.org/jira/browse/ARROW-8245 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Caleb Overman When writing a partitioned parquet file Spark can create a temporary hidden `.spark-staging` directory within the parquet file. Because it is a directory and not a file, it is not skipped when trying to read the parquet file. Pyarrow currently only skips directories prefixed with `_`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068794#comment-17068794 ] Joris Van den Bossche commented on ARROW-8244: -- Thanks for opening the issue [~rjzamora]. Agreed this is a problem, and I think we should at least also return the path (so it can be fixed afterwards), or otherwise set it ourselves (optionally). Regarding those different options: starting to also return the path together with the metadata is not really backwards compatible, so we would need to add an additional keyword like `path_collector` in addition to `metadata_collector`. For simply always populating the file path, that might depend on whether there are other use cases for collecting this metadata (although I assume dask is the main user of this keyword). A GitHub search turned up dask, cudf and spatialpandas as users of the `metadata_collector` keyword. I assume `cudf` needs the same fix as dask. I didn't check yet how it's used in spatialpandas. I suppose optionally populating it is the safest; I am only doubtful that having it optional behind a new keyword is actually useful (whether there are use cases for not wanting to populate it). > [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" > metadata fields > --- > > Key: ARROW-8244 > URL: https://issues.apache.org/jira/browse/ARROW-8244 > Project: Apache Arrow > Issue Type: Wish > Components: Python >Reporter: Rick Zamora >Priority: Minor > Labels: parquet > Fix For: 0.17.0 > > > Prior to [dask#6023|https://github.com/dask/dask/pull/6023], Dask has been > using the `write_to_dataset` API to write partitioned parquet datasets. This > PR is switching to a (hopefully temporary) custom solution, because that API > makes it difficult to populate the "file_path" column-chunk metadata > fields that are returned within the optional `metadata_collector` kwarg. 
> Dask needs to set these fields correctly in order to generate a proper global > `"_metadata"` file. > Possible solutions to this problem: > # Optionally populate the file-path fields within `write_to_dataset` > # Always populate the file-path fields within `write_to_dataset` > # Return the file paths for the data written within `write_to_dataset` (up > to the user to manually populate the file-path fields) -- This message was sent by Atlassian Jira (v8.3.4#803005)
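Options 1/2 from the issue (populating file_path at write time) can be sketched as below. `FileMetaData` here is a minimal stand-in so the sketch runs without pyarrow, and `write_piece` is hypothetical, though pyarrow's real `FileMetaData` does expose a `set_file_path()` method:

```python
class FileMetaData:
    # Minimal stand-in for pyarrow.parquet.FileMetaData.
    def __init__(self):
        self.file_path = None

    def set_file_path(self, path):
        self.file_path = path

def write_piece(path, metadata_collector=None):
    # Hypothetical writer for one piece of a partitioned dataset: populate
    # the file_path field before handing the metadata to the collector, so
    # the caller can build a correct global "_metadata" file.
    md = FileMetaData()
    if metadata_collector is not None:
        md.set_file_path(path)
        metadata_collector.append(md)

collected = []
for p in ["year=2020/part-0.parquet", "year=2021/part-0.parquet"]:
    write_piece(p, metadata_collector=collected)
print([m.file_path for m in collected])
```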
[jira] [Updated] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8244: - Fix Version/s: 0.17.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8244: - Labels: parquet (was: ) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8244: - Component/s: Python -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rick Zamora updated ARROW-8244: --- Summary: [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields (was: [Python] Add `write_to_dataset` option to populate the "file_path" metadata fields) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8244) [Python] Add `write_to_dataset` option to populate the "file_path" metadata fields
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rick Zamora updated ARROW-8244: --- Summary: [Python] Add `write_to_dataset` option to populate the "file_path" metadata fields (was: Add `write_to_dataset` option to populate the "file_path" metadata fields) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8244) Add `write_to_dataset` option to populate the "file_path" metadata fields
Rick Zamora created ARROW-8244: -- Summary: Add `write_to_dataset` option to populate the "file_path" metadata fields Key: ARROW-8244 URL: https://issues.apache.org/jira/browse/ARROW-8244 Project: Apache Arrow Issue Type: Wish Reporter: Rick Zamora Prior to [dask#6023|https://github.com/dask/dask/pull/6023], Dask has been using the `write_to_dataset` API to write partitioned parquet datasets. This PR is switching to a (hopefully temporary) custom solution, because that API makes it difficult to populate the "file_path" column-chunk metadata fields that are returned within the optional `metadata_collector` kwarg. Dask needs to set these fields correctly in order to generate a proper global `"_metadata"` file. Possible solutions to this problem: # Optionally populate the file-path fields within `write_to_dataset` # Always populate the file-path fields within `write_to_dataset` # Return the file paths for the data written within `write_to_dataset` (up to the user to manually populate the file-path fields) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7941) [Rust] [DataFusion] Logical plan should support unresolved column references
[ https://issues.apache.org/jira/browse/ARROW-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-7941: -- Description: It should be possible to build a logical plan using column names rather than indices since it is more intuitive. There should be an optimizer rule that resolves the columns and replaces these unresolved columns with column indices. was: I made a mistake in the design of the logical plan. It is better to refer to columns by name rather than index. Benefits of making this change: * Allows for support for schemaless data sources e.g. JSON * Reduces the complexity of the optimizer rules > [Rust] [DataFusion] Logical plan should support unresolved column references > > > Key: ARROW-7941 > URL: https://issues.apache.org/jira/browse/ARROW-7941 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Affects Versions: 0.16.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > It should be possible to build a logical plan using column names rather than > indices since it is more intuitive. There should be an optimizer rule that > resolves the columns and replaces these unresolved columns with column > indices. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
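The optimizer rule described in the updated description can be sketched as follows (in Python rather than DataFusion's Rust; the tuple-based expression encoding is hypothetical):

```python
def resolve_columns(exprs, schema_fields):
    # Rewrite unresolved column-name references into positional column
    # indices by looking each name up in the input schema.
    index_of = {name: i for i, name in enumerate(schema_fields)}
    resolved = []
    for expr in exprs:
        if expr[0] == "unresolved_column":
            resolved.append(("column", index_of[expr[1]]))
        else:
            resolved.append(expr)
    return resolved

schema_fields = ["id", "name", "amount"]
plan_exprs = [("unresolved_column", "amount"), ("literal", 42)]
print(resolve_columns(plan_exprs, schema_fields))  # [('column', 2), ('literal', 42)]
```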
[jira] [Updated] (ARROW-7941) [Rust] [DataFusion] Logical plan should support unresolved column references
[ https://issues.apache.org/jira/browse/ARROW-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-7941: -- Summary: [Rust] [DataFusion] Logical plan should support unresolved column references (was: [Rust] [DataFusion] Logical plan should refer to columns by name not index) > [Rust] [DataFusion] Logical plan should support unresolved column references > > > Key: ARROW-7941 > URL: https://issues.apache.org/jira/browse/ARROW-7941 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Affects Versions: 0.16.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > I made a mistake in the design of the logical plan. It is better to refer to > columns by name rather than index. > Benefits of making this change: > * Allows for support for schemaless data sources e.g. JSON > * Reduces the complexity of the optimizer rules > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8243) [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder
[ https://issues.apache.org/jira/browse/ARROW-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8243: -- Component/s: Rust - DataFusion Rust > [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder > -- > > Key: ARROW-8243 > URL: https://issues.apache.org/jira/browse/ARROW-8243 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The LogicalPlanBuilder project method takes a &Vec<Expr> whereas other methods take a > Vec<Expr>. It makes sense to take Vec<Expr> and take ownership of these inputs since they > are being used to build the plan. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8243) [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder
[ https://issues.apache.org/jira/browse/ARROW-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8243: -- Fix Version/s: 0.17.0 > [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder > -- > > Key: ARROW-8243 > URL: https://issues.apache.org/jira/browse/ARROW-8243 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The LogicalPlanBuilder project method takes a &Vec<Expr> whereas other methods take a > Vec<Expr>. It makes sense to take Vec<Expr> and take ownership of these inputs since they > are being used to build the plan. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4957) [Rust] [DataFusion] Implement get_supertype correctly
[ https://issues.apache.org/jira/browse/ARROW-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-4957: -- Fix Version/s: (was: 0.17.0) 1.0.0 > [Rust] [DataFusion] Implement get_supertype correctly > - > > Key: ARROW-4957 > URL: https://issues.apache.org/jira/browse/ARROW-4957 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Affects Versions: 0.13.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: beginner, pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > The current implementation of get_supertype (used in type coercion logic) is > very hacky and should be re-implemented with better unit tests as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8205) [Rust] Arrow should enforce unique field names in a schema
[ https://issues.apache.org/jira/browse/ARROW-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8205: -- Description: There does not seem to be any validation to avoid schemas being created with duplicate field names. We should add this along with unit tests. This will require changing the signature of the constructors to try_new with a Result return type. was:There does not seem to be any validation to avoid schemas being created with duplicate field names. We should add this along with unit tests. > [Rust] Arrow should enforce unique field names in a schema > -- > > Key: ARROW-8205 > URL: https://issues.apache.org/jira/browse/ARROW-8205 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.16.0 >Reporter: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > There does not seem to be any validation to avoid schemas being created with > duplicate field names. We should add this along with unit tests. > This will require changing the signature of the constructors to try_new with > a Result return type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
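The `try_new` change proposed in ARROW-8205 can be sketched as follows. `Schema` and `Field` here are simplified stand-ins for the arrow types, and the error type is just a `String` for brevity; the point is the fallible constructor that rejects duplicate field names.

```rust
use std::collections::HashSet;

// Hedged sketch of the proposed change: a fallible constructor returning
// Result, which validates that field names are unique. The types below are
// simplified stand-ins, not the real arrow definitions.
#[derive(Debug)]
struct Field {
    name: String,
}

#[derive(Debug)]
struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    fn try_new(fields: Vec<Field>) -> Result<Self, String> {
        let mut seen = HashSet::new();
        for f in &fields {
            // insert() returns false if the name was already present.
            if !seen.insert(f.name.clone()) {
                return Err(format!("duplicate field name: {}", f.name));
            }
        }
        Ok(Schema { fields })
    }
}

fn main() {
    let ok = Schema::try_new(vec![
        Field { name: "id".into() },
        Field { name: "name".into() },
    ]);
    assert!(ok.is_ok());

    let dup = Schema::try_new(vec![
        Field { name: "id".into() },
        Field { name: "id".into() },
    ]);
    assert!(dup.is_err());
}
```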
[jira] [Updated] (ARROW-8205) [Rust] Arrow should enforce unique field names in a schema
[ https://issues.apache.org/jira/browse/ARROW-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8205: -- Fix Version/s: (was: 0.17.0) 1.0.0 > [Rust] Arrow should enforce unique field names in a schema > -- > > Key: ARROW-8205 > URL: https://issues.apache.org/jira/browse/ARROW-8205 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.16.0 >Reporter: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > There does not seem to be any validation to avoid schemas being created with > duplicate field names. We should add this along with unit tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8231) [Rust] Parse key_value_metadata from parquet FileMetaData into arrow schema metadata
[ https://issues.apache.org/jira/browse/ARROW-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8231: -- Labels: pull-request-available (was: ) > [Rust] Parse key_value_metadata from parquet FileMetaData into arrow schema > metadata > > > Key: ARROW-8231 > URL: https://issues.apache.org/jira/browse/ARROW-8231 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Jörn Horstmann >Priority: Minor > Labels: pull-request-available > > The parquet-format FileMetaData struct contains optional key value pairs with > additional metadata about the schema: > [https://docs.rs/parquet-format/2.6.0/src/parquet_format/parquet_format.rs.html#3821] > When the parquet file was generated using the java avro parquet writer, this > for example contains the original avro schema under the `parquet.avro.schema` > or `avro.schema` keys. > It would be nice if this metadata was accessible through the > `arrow::datatypes::Schema.metadata` field. > I'm willing to implement and create a pull request for this feature. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-7507) [Rust] Bump Thrift version to 0.13 in parquet-format and parquet
[ https://issues.apache.org/jira/browse/ARROW-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove closed ARROW-7507. - Resolution: Not A Problem This issue was resolved by running cargo update IIRC > [Rust] Bump Thrift version to 0.13 in parquet-format and parquet > > > Key: ARROW-7507 > URL: https://issues.apache.org/jira/browse/ARROW-7507 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.15.1 >Reporter: Mahmut Bulut >Assignee: Andy Grove >Priority: Major > Labels: parquet > Fix For: 0.17.0 > > > *Problem Description* > Currently, the `byteorder` crate changes are not incorporated in either the > `parquet-format` or `parquet` crate. Both should be updated consistently to > Thrift 0.13, in reverse order (first parquet-format, then parquet), so that > dependencies still using older versions are updated. > This causes version clashes with other crates that follow the upstream. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7681) [Rust] Explicitly seeking a BufReader will discard the internal buffer
[ https://issues.apache.org/jira/browse/ARROW-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068757#comment-17068757 ] Andy Grove commented on ARROW-7681: --- Deferring this to 1.0.0 due to the concerns about the PR adding further dependencies on unstable Rust features > [Rust] Explicitly seeking a BufReader will discard the internal buffer > -- > > Key: ARROW-7681 > URL: https://issues.apache.org/jira/browse/ARROW-7681 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Max Burke >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > This behavior was observed in the Parquet Rust file reader > (parquet/src/util/io.rs). > > Pull request: [https://github.com/apache/arrow/pull/6280] > > From the Rust documentation for BufReader: > > "Seeking always discards the internal buffer, even if the seek position would > otherwise fall within it. This guarantees that calling {{.into_inner()}} > immediately after a seek yields the underlying reader at the same position." > > [https://doc.rust-lang.org/std/io/struct.BufReader.html#impl-Seek] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7681) [Rust] Explicitly seeking a BufReader will discard the internal buffer
[ https://issues.apache.org/jira/browse/ARROW-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-7681: -- Fix Version/s: (was: 0.17.0) 1.0.0 > [Rust] Explicitly seeking a BufReader will discard the internal buffer > -- > > Key: ARROW-7681 > URL: https://issues.apache.org/jira/browse/ARROW-7681 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Max Burke >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > This behavior was observed in the Parquet Rust file reader > (parquet/src/util/io.rs). > > Pull request: [https://github.com/apache/arrow/pull/6280] > > From the Rust documentation for BufReader: > > "Seeking always discards the internal buffer, even if the seek position would > otherwise fall within it. This guarantees that calling {{.into_inner()}} > immediately after a seek yields the underlying reader at the same position." > > [https://doc.rust-lang.org/std/io/struct.BufReader.html#impl-Seek] -- This message was sent by Atlassian Jira (v8.3.4#803005)
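The behavior described in ARROW-7681 can be observed directly with the standard library: an explicit `Seek::seek` on a `BufReader` discards the internal buffer even when the target position falls inside it, which is why the linked PR looked at alternatives such as the then-unstable `seek_relative`.

```rust
use std::io::{BufRead, BufReader, Cursor, Seek, SeekFrom};

// Stdlib-only demonstration of the documented BufReader behavior: an
// explicit seek discards the internal buffer, forcing a refill on the
// next read even if the seek target was already buffered.
fn main() {
    let data = Cursor::new(vec![0u8; 1024]);
    let mut reader = BufReader::with_capacity(64, data);

    // Populate the internal buffer (64 bytes of the underlying reader).
    reader.fill_buf().unwrap();
    assert_eq!(reader.buffer().len(), 64);

    // Seek to a position that is *inside* the buffered range...
    reader.seek(SeekFrom::Start(1)).unwrap();

    // ...and the buffer is discarded anyway.
    assert_eq!(reader.buffer().len(), 0);
    println!("buffer discarded after seek, as documented");
}
```

For a Parquet reader that performs many small seeks within a column chunk, each such seek throws away buffered bytes and triggers a fresh read from the underlying file, which is the performance problem the issue addresses.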
[jira] [Updated] (ARROW-6583) [Rust] Question and Request for Examples of Array Operations
[ https://issues.apache.org/jira/browse/ARROW-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-6583: -- Fix Version/s: (was: 0.17.0) 1.0.0 > [Rust] Question and Request for Examples of Array Operations > > > Key: ARROW-6583 > URL: https://issues.apache.org/jira/browse/ARROW-6583 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Arthur Maciejewicz >Priority: Minor > Fix For: 1.0.0 > > > Hi all, thank you for your excellent work on Arrow. > As I was going through the example for the Rust Arrow implementation, > specifically the read_csv example > [https://github.com/apache/arrow/blob/master/rust/arrow/examples/read_csv.rs] > , as well as the generated Rustdocs, and unit tests, it was not quite clear > what the intended usage is for operations such as filtering and masking over > Arrays. > One particular use-case I'm interested in is finding all values in an Array > such that x >= N for all x. I came across arrow::compute::array_ops::filter, > which seems to be similar to what I want, although it's expecting a mask to > already be constructed before performing the filter operation, and it was not > obviously visible in the documentation, leading me to believe this might not > be idiomatic usage. > More generally, is the expectation for Arrays on the Rust side that they are > just simple data abstractions, without exposing higher-order methods such as > filtering/masking? Is the intent to leave that to users? If I missed some > piece of documentation, please let me know. 
For my use-case I ended up trying > something like: > {code:java} > let column = batch.column(0).as_any().downcast_ref::<Float64Array>().unwrap(); > let mut builder = BooleanBuilder::new(batch.num_rows()); > let N = 5.0; > for i in 0..batch.num_rows() { > if column.value(i) > N { > builder.append_value(true).unwrap(); > } else { > builder.append_value(false).unwrap(); > } > } > let mask = builder.finish(); > let filtered_column = filter(column, &mask);{code} > If possible, could you provide examples of intended usage of Arrays? Thank > you! > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
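The mask-then-filter shape of the code in the question can be expressed more compactly with iterators. This sketch uses plain `Vec<f64>` and `Vec<bool>` in place of arrow's `Float64Array` and `BooleanArray` so it stays dependency-free; the structure of the computation (build a boolean mask, then filter by it) is the same.

```rust
// Dependency-free sketch of the mask-and-filter pattern from the question.
// Plain slices stand in for arrow arrays.
fn build_mask(values: &[f64], threshold: f64) -> Vec<bool> {
    // One boolean per value: true where the value exceeds the threshold.
    values.iter().map(|&v| v > threshold).collect()
}

fn filter_by_mask(values: &[f64], mask: &[bool]) -> Vec<f64> {
    // Keep only the values whose corresponding mask entry is true.
    values
        .iter()
        .zip(mask.iter())
        .filter_map(|(&v, &keep)| if keep { Some(v) } else { None })
        .collect()
}

fn main() {
    let column = [1.0, 7.5, 3.0, 9.0];
    let mask = build_mask(&column, 5.0);
    assert_eq!(mask, vec![false, true, false, true]);
    assert_eq!(filter_by_mask(&column, &mask), vec![7.5, 9.0]);
    println!("filtered: {:?}", filter_by_mask(&column, &mask));
}
```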
[jira] [Updated] (ARROW-8242) [C++] Flight fails to compile on GCC 4.8
[ https://issues.apache.org/jira/browse/ARROW-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-8242: --- Priority: Blocker (was: Major) > [C++] Flight fails to compile on GCC 4.8 > > > Key: ARROW-8242 > URL: https://issues.apache.org/jira/browse/ARROW-8242 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Blocker > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > See recent build log > https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=8944=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b=2186 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8242) [C++] Flight fails to compile on GCC 4.8
[ https://issues.apache.org/jira/browse/ARROW-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-8242: --- Fix Version/s: 0.17.0 > [C++] Flight fails to compile on GCC 4.8 > > > Key: ARROW-8242 > URL: https://issues.apache.org/jira/browse/ARROW-8242 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Blocker > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 40m > Remaining Estimate: 0h > > See recent build log > https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=8944=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b=2186 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-6890) [Rust] [Parquet] ArrowReader fails with seg fault
[ https://issues.apache.org/jira/browse/ARROW-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove closed ARROW-6890. - Resolution: Fixed This was fixed some time ago > [Rust] [Parquet] ArrowReader fails with seg fault > - > > Key: ARROW-6890 > URL: https://issues.apache.org/jira/browse/ARROW-6890 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.16.0 >Reporter: Andy Grove >Assignee: Renjie Liu >Priority: Major > Fix For: 0.17.0 > > > ArrowReader fails with seg fault when trying to read an unsupported type, > like Utf8. We should have it return an Err instead of causing a segmentation > fault. > > See [https://github.com/apache/arrow/pull/5641] for a reproducible test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8243) [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder
[ https://issues.apache.org/jira/browse/ARROW-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8243: -- Labels: pull-request-available (was: ) > [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder > -- > > Key: ARROW-8243 > URL: https://issues.apache.org/jira/browse/ARROW-8243 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > > LogicalPlanBuilder project method takes a whereas other methods take a > Vec. It makes sense to take Vec and take ownership of these inputs since they > are being used to build the plan. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8243) [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder
Andy Grove created ARROW-8243: - Summary: [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder Key: ARROW-8243 URL: https://issues.apache.org/jira/browse/ARROW-8243 Project: Apache Arrow Issue Type: Improvement Reporter: Andy Grove Assignee: Andy Grove LogicalPlanBuilder project method takes a whereas other methods take a Vec. It makes sense to take Vec and take ownership of these inputs since they are being used to build the plan. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8241) [Rust] Add convenience methods to Schema
[ https://issues.apache.org/jira/browse/ARROW-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8241: -- Labels: pull-request-available (was: ) > [Rust] Add convenience methods to Schema > > > Key: ARROW-8241 > URL: https://issues.apache.org/jira/browse/ARROW-8241 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > > I would like to add the following methods to Schema to make it easier to work > with. > > {code:java} > pub fn field_with_name(&self, name: &str) -> Result<&Field>; > pub fn index_of(&self, name: &str) -> Result<usize>; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8242) [C++] Flight fails to compile on GCC 4.8
[ https://issues.apache.org/jira/browse/ARROW-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8242: -- Labels: pull-request-available (was: ) > [C++] Flight fails to compile on GCC 4.8 > > > Key: ARROW-8242 > URL: https://issues.apache.org/jira/browse/ARROW-8242 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > > See recent build log > https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=8944=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b=2186 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8242) [C++] Flight fails to compile on GCC 4.8
[ https://issues.apache.org/jira/browse/ARROW-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-8242: --- Summary: [C++] Flight fails to compile on GCC 4.8 (was: [C++] GCC 4.8 fails to compileFlight) > [C++] Flight fails to compile on GCC 4.8 > > > Key: ARROW-8242 > URL: https://issues.apache.org/jira/browse/ARROW-8242 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > > See recent build log > https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=8944=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b=2186 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8242) [C++] GCC 4.8 fails to compileFlight
[ https://issues.apache.org/jira/browse/ARROW-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-8242: --- Summary: [C++] GCC 4.8 fails to compileFlight (was: [C++] GCC 4.8 fails to compile Flight) > [C++] GCC 4.8 fails to compileFlight > > > Key: ARROW-8242 > URL: https://issues.apache.org/jira/browse/ARROW-8242 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > > See recent build log > https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=8944=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b=2186 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8242) [C++] GCC 4.8 fails to compile Flight
Krisztian Szucs created ARROW-8242: -- Summary: [C++] GCC 4.8 fails to compile Flight Key: ARROW-8242 URL: https://issues.apache.org/jira/browse/ARROW-8242 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Krisztian Szucs Assignee: Krisztian Szucs See recent build log https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=8944=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b=2186 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8241) [Rust] Add convenience methods to Schema
[ https://issues.apache.org/jira/browse/ARROW-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8241: -- Summary: [Rust] Add convenience methods to Schema (was: Add convenience methods to Schema) > [Rust] Add convenience methods to Schema > > > Key: ARROW-8241 > URL: https://issues.apache.org/jira/browse/ARROW-8241 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > I would like to add the following methods to Schema to make it easier to work > with. > > {code:java} > pub fn field_with_name(&self, name: &str) -> Result<&Field>; > pub fn index_of(&self, name: &str) -> Result<usize>; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8241) Add convenience methods to Schema
Andy Grove created ARROW-8241: - Summary: Add convenience methods to Schema Key: ARROW-8241 URL: https://issues.apache.org/jira/browse/ARROW-8241 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 I would like to add the following methods to Schema to make it easier to work with. {code:java} pub fn field_with_name(&self, name: &str) -> Result<&Field>; pub fn index_of(&self, name: &str) -> Result<usize>; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
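A minimal, self-contained sketch of how the two convenience methods proposed above could be implemented, over a simplified `Schema` that stands in for the arrow type (the real arrow `Result` and error types are replaced by a `String` error for brevity):

```rust
// Simplified stand-ins for the arrow types; not the real definitions.
#[derive(Debug)]
struct Field {
    name: String,
}

struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    // Return the position of the field with the given name, or an error.
    fn index_of(&self, name: &str) -> Result<usize, String> {
        self.fields
            .iter()
            .position(|f| f.name == name)
            .ok_or_else(|| format!("no field named {}", name))
    }

    // Return a reference to the field with the given name, reusing index_of.
    fn field_with_name(&self, name: &str) -> Result<&Field, String> {
        Ok(&self.fields[self.index_of(name)?])
    }
}

fn main() {
    let schema = Schema {
        fields: vec![
            Field { name: "id".into() },
            Field { name: "score".into() },
        ],
    };
    assert_eq!(schema.index_of("score").unwrap(), 1);
    assert_eq!(schema.field_with_name("id").unwrap().name, "id");
    assert!(schema.index_of("missing").is_err());
}
```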
[jira] [Commented] (ARROW-8240) [Python] New FS interface (pyarrow.fs) does not seem to work correctly for HDFS (Python 3.6, pyarrow 0.16.0)
[ https://issues.apache.org/jira/browse/ARROW-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068693#comment-17068693 ] Antoine Pitrou commented on ARROW-8240: --- cc [~kszucs] > [Python] New FS interface (pyarrow.fs) does not seem to work correctly for > HDFS (Python 3.6, pyarrow 0.16.0) > > > Key: ARROW-8240 > URL: https://issues.apache.org/jira/browse/ARROW-8240 > Project: Apache Arrow > Issue Type: Bug >Reporter: Yaqub Alwan >Priority: Major > > I'll preface this with the limited setup I had to do: > {{export CLASSPATH=$(hadoop classpath --glob)}} > {{export > ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib64}} > > Then I ran the following: > {code} > In [1]: import pyarrow.fs > > > > In [2]: c = pyarrow.fs.HadoopFileSystem() > > > > In [3]: sel = pyarrow.fs.FileSelector('/user/rwiumli') > > > > In [4]: c.get_target_stats(sel) > > > > --- > OSError Traceback (most recent call last) > in > > 1 c.get_target_stats(sel) > ~/tmp/venv/lib/python3.6/site-packages/pyarrow/_fs.pyx in > pyarrow._fs.FileSystem.get_target_stats() > ~/tmp/venv/lib/python3.6/site-packages/pyarrow/error.pxi in > pyarrow.lib.pyarrow_internal_check_status() > ~/tmp/venv/lib/python3.6/site-packages/pyarrow/error.pxi in > pyarrow.lib.check_status() > OSError: HDFS list directory failed, errno: 2 (No such file or directory) > In [5]: sel = pyarrow.fs.FileSelector('.') > > > > In [6]: c.get_target_stats(sel) > > > > Out[6]: > [, > , > ] > In [7]: !ls > > > > sample.py sandeep venv > In [8]: > {code} > It looks like the new hadoop fs interface is doing a local lookup? > Ok fine... 
> {code} > In [8]: sel = pyarrow.fs.FileSelector('hdfs:///user/rwiumli') # shouldnt have > to do this > > > In [9]: c.get_target_stats(sel) > > > > hdfsGetPathInfo(hdfs:///user/rwiumli): getFileInfo error: > IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: > file:///java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, > expected: file:/// > at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:662) > at > org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82) > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:593) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:811) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:588) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:432) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1418) > hdfsListDirectory(hdfs:///user/rwiumli): FileSystem#listStatus error: > IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: >
[jira] [Updated] (ARROW-8240) [Python] New FS interface (pyarrow.fs) does not seem to work correctly for HDFS (Python 3.6, pyarrow 0.16.0)
[ https://issues.apache.org/jira/browse/ARROW-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaqub Alwan updated ARROW-8240: --- Description: I'll preface this with the limited setup I had to do: {{export CLASSPATH=$(hadoop classpath --glob)}} {{export ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib64}} Then I ran the following: {code} In [1]: import pyarrow.fs In [2]: c = pyarrow.fs.HadoopFileSystem() In [3]: sel = pyarrow.fs.FileSelector('/user/rwiumli') In [4]: c.get_target_stats(sel) --- OSError Traceback (most recent call last) in > 1 c.get_target_stats(sel) ~/tmp/venv/lib/python3.6/site-packages/pyarrow/_fs.pyx in pyarrow._fs.FileSystem.get_target_stats() ~/tmp/venv/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status() ~/tmp/venv/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status() OSError: HDFS list directory failed, errno: 2 (No such file or directory) In [5]: sel = pyarrow.fs.FileSelector('.') In [6]: c.get_target_stats(sel) Out[6]: [, , ] In [7]: !ls sample.py sandeep venv In [8]: {code} It looks like the new hadoop fs interface is doing a local lookup? Ok fine... 
{code} In [8]: sel = pyarrow.fs.FileSelector('hdfs:///user/rwiumli') # shouldnt have to do this In [9]: c.get_target_stats(sel) hdfsGetPathInfo(hdfs:///user/rwiumli): getFileInfo error: IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:///java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:662) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82) at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:593) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:811) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:588) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:432) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1418) hdfsListDirectory(hdfs:///user/rwiumli): FileSystem#listStatus error: IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:///java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:662) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82) at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:410) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1566) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1609) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:667) --- OSError
[jira] [Commented] (ARROW-3329) [Python] Error casting decimal(38, 4) to int64
[ https://issues.apache.org/jira/browse/ARROW-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068688#comment-17068688 ] Jacek Pliszka commented on ARROW-3329: -- OK, tried that, but there are errors there too:
1. it is also inconsistent with pushd arrow
2. pushd arrow/cpp/build before cmake should be without build
3. libbz2 is missing even though it was not missing with pip
4. same error at the end:
{code}
python setup.py build_ext --inplace
-- Running cmake --build for pyarrow
cmake --build . --config release
-- Error: could not load cache
error: command 'cmake' failed with exit status 1
{code}
> [Python] Error casting decimal(38, 4) to int64 > -- > > Key: ARROW-3329 > URL: https://issues.apache.org/jira/browse/ARROW-3329 > Project: Apache Arrow > Issue Type: Bug > Components: Python > Environment: Python version : 3.6.5 > Pyarrow version : 0.10.0 >Reporter: Kavita Sheth >Assignee: Jacek Pliszka >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Git issue link : https://github.com/apache/arrow/issues/2627 > I want to cast pyarrow table column from decimal(38,4) to int64. > col.cast(pa.int64()) > Error: > File "pyarrow/table.pxi", line 443, in pyarrow.lib.Column.cast > File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: No cast implemented from decimal(38, > 4) to int64 > Python version : 3.6.5 > Pyarrow version : 0.10.0 > is it not implemented yet or I am not using it correctly? If not implemented > yet, then any work around to cast columns? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8240) [Python] New FS interface (pyarrow.fs) does not seem to work correctly for HDFS (Python 3.6, pyarrow 0.16.0)
Yaqub Alwan created ARROW-8240: -- Summary: [Python] New FS interface (pyarrow.fs) does not seem to work correctly for HDFS (Python 3.6, pyarrow 0.16.0) Key: ARROW-8240 URL: https://issues.apache.org/jira/browse/ARROW-8240 Project: Apache Arrow Issue Type: Bug Reporter: Yaqub Alwan I'll preface this with the limited setup I had to do: {{export CLASSPATH=$(hadoop classpath --glob)}} {{export ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib64}} Then I ran the following: {{code}} In [1]: import pyarrow.fs In [2]: c = pyarrow.fs.HadoopFileSystem() In [3]: sel = pyarrow.fs.FileSelector('/user/rwiumli') In [4]: c.get_target_stats(sel) --- OSError Traceback (most recent call last) in > 1 c.get_target_stats(sel) ~/tmp/venv/lib/python3.6/site-packages/pyarrow/_fs.pyx in pyarrow._fs.FileSystem.get_target_stats() ~/tmp/venv/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status() ~/tmp/venv/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status() OSError: HDFS list directory failed, errno: 2 (No such file or directory) In [5]: sel = pyarrow.fs.FileSelector('.') In [6]: c.get_target_stats(sel) Out[6]: [, , ] In [7]: !ls sample.py sandeep venv In [8]: {{code}} It looks like the new hadoop fs interface is doing a local lookup? Ok fine... 
{{code}} In [8]: sel = pyarrow.fs.FileSelector('hdfs:///user/rwiumli') # shouldnt have to do this In [9]: c.get_target_stats(sel) hdfsGetPathInfo(hdfs:///user/rwiumli): getFileInfo error: IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:///java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:662) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82) at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:593) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:811) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:588) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:432) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1418) hdfsListDirectory(hdfs:///user/rwiumli): FileSystem#listStatus error: IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:///java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:662) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82) at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:410) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1566) at
[jira] [Updated] (ARROW-8070) [C++] Cast segfaults on unsupported cast from list to utf8
[ https://issues.apache.org/jira/browse/ARROW-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8070: -- Labels: pull-request-available (was: ) > [C++] Cast segfaults on unsupported cast from list to utf8 > -- > > Key: ARROW-8070 > URL: https://issues.apache.org/jira/browse/ARROW-8070 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Daniel Nugent >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > > Was messing around with some nested arrays and found a pretty easy to > reproduce segfault: > {code:java} > Python 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) > [GCC 7.3.0] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np, pyarrow as pa > >>> pa.__version__ > '0.16.0' > >>> np.__version__ > '1.18.1' > >>> x=[np.array([b'a',b'b'])] > >>> a = pa.array(x,pa.list_(pa.binary())) > >>> a > > [ > [ > 61, > 62 > ] > ] > >>> a.cast(pa.string()) > Segmentation fault > {code} > I don't know if that cast makes sense, but I left the checks on, so I would > not expect a segfault from it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8070) [C++] Cast segfaults on unsupported cast from list to utf8
[ https://issues.apache.org/jira/browse/ARROW-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-8070: --- Summary: [C++] Cast segfaults on unsupported cast from list to utf8 (was: [Python] Array.cast segfaults on unsupported cast from list to utf8) > [C++] Cast segfaults on unsupported cast from list to utf8 > -- > > Key: ARROW-8070 > URL: https://issues.apache.org/jira/browse/ARROW-8070 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Daniel Nugent >Assignee: Krisztian Szucs >Priority: Major > Fix For: 0.17.0 > > > Was messing around with some nested arrays and found a pretty easy to > reproduce segfault: > {code:java} > Python 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) > [GCC 7.3.0] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np, pyarrow as pa > >>> pa.__version__ > '0.16.0' > >>> np.__version__ > '1.18.1' > >>> x=[np.array([b'a',b'b'])] > >>> a = pa.array(x,pa.list_(pa.binary())) > >>> a > > [ > [ > 61, > 62 > ] > ] > >>> a.cast(pa.string()) > Segmentation fault > {code} > I don't know if that cast makes sense, but I left the checks on, so I would > not expect a segfault from it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8070) [C++] Cast segfaults on unsupported cast from list to utf8
[ https://issues.apache.org/jira/browse/ARROW-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-8070: --- Component/s: (was: Python) C++ > [C++] Cast segfaults on unsupported cast from list to utf8 > -- > > Key: ARROW-8070 > URL: https://issues.apache.org/jira/browse/ARROW-8070 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Daniel Nugent >Assignee: Krisztian Szucs >Priority: Major > Fix For: 0.17.0 > > > Was messing around with some nested arrays and found a pretty easy to > reproduce segfault: > {code:java} > Python 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) > [GCC 7.3.0] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np, pyarrow as pa > >>> pa.__version__ > '0.16.0' > >>> np.__version__ > '1.18.1' > >>> x=[np.array([b'a',b'b'])] > >>> a = pa.array(x,pa.list_(pa.binary())) > >>> a > > [ > [ > 61, > 62 > ] > ] > >>> a.cast(pa.string()) > Segmentation fault > {code} > I don't know if that cast makes sense, but I left the checks on, so I would > not expect a segfault from it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7908) [R] Can't install package without setting LIBARROW_DOWNLOAD=true
[ https://issues.apache.org/jira/browse/ARROW-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068657#comment-17068657 ] Taeke commented on ARROW-7908: -- Hi, Sorry for the long silence. With ARROW_R_DEV=TRUE I get: {code:sh} trying URL 'https://cloud.r-project.org/src/contrib/arrow_0.16.0.2.tar.gz' Content type 'application/x-gzip' length 216119 bytes (211 KB) == downloaded 211 KB * installing *source* package ‘arrow’ ... ** package ‘arrow’ successfully unpacked and MD5 sums checked ** using staged installation *** Generating code with data-raw/codegen.R Fatal error: cannot open file 'data-raw/codegen.R': No such file or directory PKG_CFLAGS=-I/tmp/Rtmp7CrqGP/R.INSTALL1bebe61d5312e/arrow/libarrow/arrow-0.16.0.2/include -DARROW_R_WITH_ARROW PKG_LIBS=-L/tmp/Rtmp7CrqGP/R.INSTALL1bebe61d5312e/arrow/libarrow/arrow-0.16.0.2/lib -larrow_dataset -lparquet -larrow -lthrift -lsnappy -lz -lzstd -llz4 -lbrotlidec-static -lbrotlienc-static -lbrotlicommon-static -lboost_filesystem -lboost_regex -lboost_system -ljemalloc_pic ** libs g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I/tmp/Rtmp7CrqGP/R.INSTALL1bebe61d5312e/arrow/libarrow/arrow-0.16.0.2/include -DARROW_R_WITH_ARROW -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c array.cpp -o array.o In file included from array.cpp:18:0: ./arrow_types.h:201:31: fatal error: arrow/dataset/api.h: No such file or directory #include <arrow/dataset/api.h> ^ compilation terminated. make: *** [array.o] Error 1 ERROR: compilation failed for package ‘arrow’ {code} data-raw/codegen.R is missing because it is listed in .Rbuildignore. 
Removing that line from .Rbuildignore makes the installation run somewhat further, but data-raw/codegen.R then fails: {code:r} *** Generating code with data-raw/codegen.R Error in library(decor) : there is no package called ‘decor’ Calls: suppressPackageStartupMessages -> withCallingHandlers -> library Execution halted {code} That I could fix (manually, at least) by installing decor: {code:r} remotes::install_github("romainfrancois/decor") {code} That line is, understandably, commented out in data-raw/codegen.R. Finally, configure tries to run tools/linuxlibs.R, which tries to download_source(), but that fails due to an invalid version: {code:sh} trying URL 'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.16.0.9000.zip' Error in download.file(source_url, tf1, quiet = quietly) : (converted from warning) cannot open URL 'https://dl.bintray.com/ursalabs/arrow-r/libarrow/src/arrow-0.16.0.9000.zip': HTTP status was '404 Not Found' {code} This I could fix by changing the version in the DESCRIPTION to 0.16.0.2. After that, the installation completes as expected. In summary, for this to work: * data-raw/codegen.R must not be listed in .Rbuildignore * decor needs to become a dependency (?) * the version number in DESCRIPTION needs to be updated to correspond with an available download > [R] Can't install package without setting LIBARROW_DOWNLOAD=true > > > Key: ARROW-7908 > URL: https://issues.apache.org/jira/browse/ARROW-7908 > Project: Apache Arrow > Issue Type: Bug > Components: R >Affects Versions: 0.16.0 > Environment: Operating System: Red Hat Enterprise Linux Server 7.6 > (Maipo) > CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server > Kernel: Linux 3.10.0-957.35.2.el7.x86_64 > Architecture: x86-64 >Reporter: Taeke >Priority: Major > > Hi, > Installing arrow in R does not work intuitively on our server. 
> {code:r} > install.packages("arrow") > {code} > results in an error: > {code:sh} > Installing package into '/home//R/x86_64-redhat-linux-gnu-library/3.6' > (as 'lib' is unspecified) > trying URL 'https://cloud.r-project.org/src/contrib/arrow_0.16.0.2.tar.gz' > Content type 'application/x-gzip' length 216119 bytes (211 KB) > == > downloaded 211 KB > * installing *source* package 'arrow' ... > ** package 'arrow' successfully unpacked and MD5 sums checked > ** using staged installation > PKG_CFLAGS=-I/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/include > -DARROW_R_WITH_ARROW > PKG_LIBS=-L/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/lib > -larrow_dataset -lparquet -larrow -lthrift -lsnappy -lz -lzstd -llz4 > -lbrotlidec-static -lbrotlienc-static -lbrotlicommon-static > -lboost_filesystem -lboost_regex -lboost_system -ljemalloc_pic > ** libs > g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG > -I/tmp/Rtmp3v1BDf/R.INSTALL4a5d5d9f8bc8/arrow/libarrow/arrow-0.16.0.2/include >
[jira] [Commented] (ARROW-8238) [C++][Compute] Failed to build compute tests on windows with msvc2015
[ https://issues.apache.org/jira/browse/ARROW-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068499#comment-17068499 ] Antoine Pitrou commented on ARROW-8238: --- I don't think that would make a difference, but you can try it out. > [C++][Compute] Failed to build compute tests on windows with msvc2015 > - > > Key: ARROW-8238 > URL: https://issues.apache.org/jira/browse/ARROW-8238 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Compute >Reporter: Yibo Cai >Priority: Minor > > Build Arrow compute tests on Windows10 with MSVC2015: > {code:bash} > cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DARROW_COMPUTE=ON > -DARROW_BUILD_TESTS=ON .. > ninja -j3 > {code} > Build failed with below message: > {code:bash} > [311/405] Linking CXX executable release\arrow-misc-test.exe > FAILED: release/arrow-misc-test.exe > cmd.exe /C "cd . && > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\bin\cmake.exe -E > vs_link_exe --intdir=src\arrow\CMakeFiles\arrow-misc-test.dir > --rc=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\rc.exe > --mt=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\mt.exe --manifests -- > C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo > src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj > /out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib > /pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 > /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console > release\arrow_testing.lib release\arrow.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest.lib > googletest_ep-prefix\src\googletest_ep\lib\gmock.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib > 
Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib > ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ." > LINK: command "C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo > src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj > src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj > /out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib > /pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 > /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console > release\arrow_testing.lib release\arrow.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib > googletest_ep-prefix\src\googletest_ep\lib\gtest.lib > googletest_ep-prefix\src\googletest_ep\lib\gmock.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib > C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib > Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib > ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST > /MANIFESTFILE:release\arrow-misc-test.exe.manifest" failed (exit code 1169) > with the following output: > arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl > std::vector >::vector std::allocator >(class std::initializer_list,class > std::allocator const &)" > (??0?$vector@HV?$allocator@H@std@@@std@@QEAA@V?$initializer_list@H@1@AEBV?$allocator@H@1@@Z) > already defined in result_test.cc.obj > arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl > std::vector >::~vector std::allocator >(void)" (??1?$vector@HV?$allocator@H@std@@@std@@QEAA@XZ) > already defined in result_test.cc.obj > arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: unsigned > __int64 __cdecl std::vector >::size(void)const > " (?size@?$vector@HV?$allocator@H@std@@@std@@QEBA_KXZ) already defined in > 
result_test.cc.obj > release\arrow-misc-test.exe : fatal error LNK1169: one or more multiply > defined symbols found > [313/405] Building CXX object > src\arrow\CMakeFiles\arrow-table-test.dir\table_builder_test.cc.obj > ninja: build stopped: subcommand failed. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8238) [C++][Compute] Failed to build compute tests on windows with msvc2015
[ https://issues.apache.org/jira/browse/ARROW-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068495#comment-17068495 ] Yibo Cai commented on ARROW-8238: - For these local helper functions in unit test files, any reason we didn't define them as static? > [C++][Compute] Failed to build compute tests on windows with msvc2015 > - > > Key: ARROW-8238 > URL: https://issues.apache.org/jira/browse/ARROW-8238 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Compute >Reporter: Yibo Cai >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8238) [C++][Compute] Failed to build compute tests on windows with msvc2015
[ https://issues.apache.org/jira/browse/ARROW-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068490#comment-17068490 ] Yibo Cai commented on ARROW-8238: - [~apitrou], trying to nail down the issue by simplifying and manually running the link steps. Looks like symbol collisions. One finding is that this problem may be fixed (verified only through a simple test, not fully) by defining functions as static in all test sources. Will do more tests. [https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bit_util_test.cc#L52] > [C++][Compute] Failed to build compute tests on windows with msvc2015 > - > > Key: ARROW-8238 > URL: https://issues.apache.org/jira/browse/ARROW-8238 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Compute >Reporter: Yibo Cai >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8238) [C++][Compute] Failed to build compute tests on windows with msvc2015
[ https://issues.apache.org/jira/browse/ARROW-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068473#comment-17068473 ] Antoine Pitrou commented on ARROW-8238: --- This looks weird. Have you found a fix? > [C++][Compute] Failed to build compute tests on windows with msvc2015 > - > > Key: ARROW-8238 > URL: https://issues.apache.org/jira/browse/ARROW-8238 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Compute >Reporter: Yibo Cai >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8239) [Java] fix param checks in splitAndTransfer method
[ https://issues.apache.org/jira/browse/ARROW-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8239: -- Labels: pull-request-available (was: ) > [Java] fix param checks in splitAndTransfer method > -- > > Key: ARROW-8239 > URL: https://issues.apache.org/jira/browse/ARROW-8239 > Project: Apache Arrow > Issue Type: Bug >Reporter: Prudhvi Porandla >Assignee: Prudhvi Porandla >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8239) [Java] fix param checks in splitAndTransfer method
Prudhvi Porandla created ARROW-8239: --- Summary: [Java] fix param checks in splitAndTransfer method Key: ARROW-8239 URL: https://issues.apache.org/jira/browse/ARROW-8239 Project: Apache Arrow Issue Type: Bug Reporter: Prudhvi Porandla Assignee: Prudhvi Porandla -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8238) [C++][Compute] Failed to build compute tests on windows with msvc2015
Yibo Cai created ARROW-8238: --- Summary: [C++][Compute] Failed to build compute tests on windows with msvc2015 Key: ARROW-8238 URL: https://issues.apache.org/jira/browse/ARROW-8238 Project: Apache Arrow Issue Type: Bug Components: C++ - Compute Reporter: Yibo Cai Build Arrow compute tests on Windows10 with MSVC2015: {code:bash} cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DARROW_COMPUTE=ON -DARROW_BUILD_TESTS=ON .. ninja -j3 {code} Build failed with below message: {code:bash} [311/405] Linking CXX executable release\arrow-misc-test.exe FAILED: release/arrow-misc-test.exe cmd.exe /C "cd . && C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\bin\cmake.exe -E vs_link_exe --intdir=src\arrow\CMakeFiles\arrow-misc-test.dir --rc=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\mt.exe --manifests -- C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj /out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib /pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console release\arrow_testing.lib release\arrow.lib googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib googletest_ep-prefix\src\googletest_ep\lib\gtest.lib googletest_ep-prefix\src\googletest_ep\lib\gmock.lib C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ." 
LINK: command "C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj /out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib /pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console release\arrow_testing.lib release\arrow.lib googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib googletest_ep-prefix\src\googletest_ep\lib\gtest.lib googletest_ep-prefix\src\googletest_ep\lib\gmock.lib C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:release\arrow-misc-test.exe.manifest" failed (exit code 1169) with the following output: arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl std::vector >::vector >(class std::initializer_list,class std::allocator const &)" (??0?$vector@HV?$allocator@H@std@@@std@@QEAA@V?$initializer_list@H@1@AEBV?$allocator@H@1@@Z) already defined in result_test.cc.obj arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl std::vector >::~vector >(void)" (??1?$vector@HV?$allocator@H@std@@@std@@QEAA@XZ) already defined in result_test.cc.obj arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: unsigned __int64 __cdecl std::vector >::size(void)const " (?size@?$vector@HV?$allocator@H@std@@@std@@QEBA_KXZ) already defined in result_test.cc.obj release\arrow-misc-test.exe : fatal error LNK1169: one or more multiply defined symbols found [313/405] Building CXX object src\arrow\CMakeFiles\arrow-table-test.dir\table_builder_test.cc.obj ninja: build 
stopped: subcommand failed. {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)