[jira] [Created] (ARROW-4286) [C++/R] Namespace vendored Boost
Uwe L. Korn created ARROW-4286: -- Summary: [C++/R] Namespace vendored Boost Key: ARROW-4286 URL: https://issues.apache.org/jira/browse/ARROW-4286 Project: Apache Arrow Issue Type: New Feature Components: C++, Packaging, R Reporter: Uwe L. Korn Fix For: 0.13.0 For R, we vendor Boost and thus also include the symbols privately in our modules. While they are private, some things like virtual destructors can still interfere with other packages that vendor Boost. We should also namespace the vendored Boost as we do in the manylinux1 packaging: https://github.com/apache/arrow/blob/0f8bd747468dd28c909ef823bed77d8082a5b373/python/manylinux1/scripts/build_boost.sh#L28 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4280) [C++][Documentation] It looks like flex and bison are required for parquet
[ https://issues.apache.org/jira/browse/ARROW-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved ARROW-4280. Resolution: Fixed Fix Version/s: 0.12.0 Issue resolved by pull request 3417 [https://github.com/apache/arrow/pull/3417] > [C++][Documentation] It looks like flex and bison are required for parquet > -- > > Key: ARROW-4280 > URL: https://issues.apache.org/jira/browse/ARROW-4280 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Documentation >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Trivial > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > When trying to build parquet, it initially failed because it couldn't find > flex and bison. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4261) [C++] CMake paths for IPC, Flight, Thrift, and Plasma don't support using Arrow as a subproject
[ https://issues.apache.org/jira/browse/ARROW-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn resolved ARROW-4261.
Resolution: Fixed

Issue resolved by pull request 3396
[https://github.com/apache/arrow/pull/3396]

> [C++] CMake paths for IPC, Flight, Thrift, and Plasma don't support using Arrow as a subproject
>
> Key: ARROW-4261
> URL: https://issues.apache.org/jira/browse/ARROW-4261
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Michael Vilim
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.13.0
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Builds using Arrow as a CMake subproject (using add_subdirectory) will fail
> if the IPC, Flight, Thrift, or Plasma features are turned on. This issue is
> caused by the use of CMAKE_SOURCE_DIR and CMAKE_BINARY_DIR, which point to the
> top-level directories of the CMake project (source and output, respectively).
> In most of the cases where these paths are used, they are intended to point
> to the Arrow source and build dirs. Defining and using CMake variables for
> those top-level Arrow folders solves the issue.
> I will open a pull request to fix the issue.
> A project that demonstrates the issue and the patch can be found here:
> https://github.com/mvilim/arrow-as-subproject
> Note: there are several other locations in the repo where CMAKE_SOURCE_DIR
> and CMAKE_BINARY_DIR are used (outside of the main cpp build, the
> cmake_modules, and the Gandiva subproject, for example). I hesitate to change
> these without an easy way to test all the possible build paths. I am choosing a
> safe route here and changing only the most straightforward ones (and the ones
> most likely to be used with Arrow as a subproject). If you would prefer I try
> to change all uses of these variables, let me know (and let me know if you
> have a straightforward way to test the supported build configurations).
[jira] [Assigned] (ARROW-4261) [C++] CMake paths for IPC, Flight, Thrift, and Plasma don't support using Arrow as a subproject
[ https://issues.apache.org/jira/browse/ARROW-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn reassigned ARROW-4261:
Assignee: Michael Vilim

> [C++] CMake paths for IPC, Flight, Thrift, and Plasma don't support using Arrow as a subproject
>
> Key: ARROW-4261
> URL: https://issues.apache.org/jira/browse/ARROW-4261
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Michael Vilim
> Assignee: Michael Vilim
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.13.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Builds using Arrow as a CMake subproject (using add_subdirectory) will fail
> if the IPC, Flight, Thrift, or Plasma features are turned on. This issue is
> caused by the use of CMAKE_SOURCE_DIR and CMAKE_BINARY_DIR, which point to the
> top-level directories of the CMake project (source and output, respectively).
> In most of the cases where these paths are used, they are intended to point
> to the Arrow source and build dirs. Defining and using CMake variables for
> those top-level Arrow folders solves the issue.
> I will open a pull request to fix the issue.
> A project that demonstrates the issue and the patch can be found here:
> https://github.com/mvilim/arrow-as-subproject
> Note: there are several other locations in the repo where CMAKE_SOURCE_DIR
> and CMAKE_BINARY_DIR are used (outside of the main cpp build, the
> cmake_modules, and the Gandiva subproject, for example). I hesitate to change
> these without an easy way to test all the possible build paths. I am choosing a
> safe route here and changing only the most straightforward ones (and the ones
> most likely to be used with Arrow as a subproject). If you would prefer I try
> to change all uses of these variables, let me know (and let me know if you
> have a straightforward way to test the supported build configurations).
[jira] [Updated] (ARROW-4280) [C++][Documentation] It looks like flex and bison are required for parquet
[ https://issues.apache.org/jira/browse/ARROW-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated ARROW-4280: --- Fix Version/s: (was: 0.12.0) 0.13.0 > [C++][Documentation] It looks like flex and bison are required for parquet > -- > > Key: ARROW-4280 > URL: https://issues.apache.org/jira/browse/ARROW-4280 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Documentation >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Trivial > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When trying to build parquet, it initially failed because it couldn't find > flex and bison. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4287) [C++] Ensure minimal bison version on OSX for Thrift
Uwe L. Korn created ARROW-4287:

Summary: [C++] Ensure minimal bison version on OSX for Thrift
Key: ARROW-4287
URL: https://issues.apache.org/jira/browse/ARROW-4287
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
Fix For: 0.13.0

Thrift currently just uses the first bison it finds, but it actually needs a newer one. We should look for the minimal version required and explicitly fall back to Homebrew, using the newer version if it is available there.
Note: I'll add a fix in our CMake toolchain but will also try to upstream this to Thrift.
[jira] [Updated] (ARROW-4167) [Gandiva] switch to arrow/util/variant
[ https://issues.apache.org/jira/browse/ARROW-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4167: -- Labels: pull-request-available (was: ) > [Gandiva] switch to arrow/util/variant > -- > > Key: ARROW-4167 > URL: https://issues.apache.org/jira/browse/ARROW-4167 > Project: Apache Arrow > Issue Type: Task > Components: Gandiva >Reporter: Pindikura Ravindra >Assignee: Pindikura Ravindra >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > gandiva cpp uses boost variant. It should switch to arrow/util/variant. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4288) Installation instructions don't work on Ubuntu 18.04
Kirill Müller created ARROW-4288: Summary: Installation instructions don't work on Ubuntu 18.04 Key: ARROW-4288 URL: https://issues.apache.org/jira/browse/ARROW-4288 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Kirill Müller The R package seems to require statically linking to Boost. One way to achieve this on Ubuntu is to use the vendored Boost. See also ARROW-4286 which discusses namespacing Boost. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4288) [R] Installation instructions don't work on Ubuntu 18.04
[ https://issues.apache.org/jira/browse/ARROW-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Müller updated ARROW-4288: - Summary: [R] Installation instructions don't work on Ubuntu 18.04 (was: Installation instructions don't work on Ubuntu 18.04) > [R] Installation instructions don't work on Ubuntu 18.04 > > > Key: ARROW-4288 > URL: https://issues.apache.org/jira/browse/ARROW-4288 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Kirill Müller >Priority: Major > > The R package seems to require statically linking to Boost. One way to > achieve this on Ubuntu is to use the vendored Boost. > See also ARROW-4286 which discusses namespacing Boost. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4288) [R] Installation instructions don't work on Ubuntu 18.04
[ https://issues.apache.org/jira/browse/ARROW-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4288: -- Labels: pull-request-available (was: ) > [R] Installation instructions don't work on Ubuntu 18.04 > > > Key: ARROW-4288 > URL: https://issues.apache.org/jira/browse/ARROW-4288 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Kirill Müller >Priority: Major > Labels: pull-request-available > > The R package seems to require statically linking to Boost. One way to > achieve this on Ubuntu is to use the vendored Boost. > See also ARROW-4286 which discusses namespacing Boost. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4275) [C++] gandiva-decimal_single_test extremely slow
[ https://issues.apache.org/jira/browse/ARROW-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-4275:
Labels: pull-request-available (was: )

> [C++] gandiva-decimal_single_test extremely slow
>
> Key: ARROW-4275
> URL: https://issues.apache.org/jira/browse/ARROW-4275
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Continuous Integration, Gandiva
> Affects Versions: 0.11.1
> Reporter: Antoine Pitrou
> Assignee: Pindikura Ravindra
> Priority: Major
> Labels: pull-request-available
>
> {{gandiva-decimal_single_test}} is extremely slow on CI builds with Valgrind:
> {code}
> 99/100 Test #128: gandiva-decimal_single_test ... Passed 397.11 sec
> 100/100 Test #130: gandiva-decimal_single_test_static Passed 338.97 sec
> {code}
> (full log: https://travis-ci.org/apache/arrow/jobs/480198116#L2707)
> Something should be done to make it faster.
[jira] [Commented] (ARROW-4275) [C++] gandiva-decimal_single_test extremely slow
[ https://issues.apache.org/jira/browse/ARROW-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746293#comment-16746293 ]

Pindikura Ravindra commented on ARROW-4275:

Gandiva has a cache of JITs to solve this kind of issue, but there is a bug in the cache lookup.

> [C++] gandiva-decimal_single_test extremely slow
>
> Key: ARROW-4275
> URL: https://issues.apache.org/jira/browse/ARROW-4275
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Continuous Integration, Gandiva
> Affects Versions: 0.11.1
> Reporter: Antoine Pitrou
> Assignee: Pindikura Ravindra
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> {{gandiva-decimal_single_test}} is extremely slow on CI builds with Valgrind:
> {code}
> 99/100 Test #128: gandiva-decimal_single_test ... Passed 397.11 sec
> 100/100 Test #130: gandiva-decimal_single_test_static Passed 338.97 sec
> {code}
> (full log: https://travis-ci.org/apache/arrow/jobs/480198116#L2707)
> Something should be done to make it faster.
[jira] [Commented] (ARROW-4275) [C++] gandiva-decimal_single_test extremely slow
[ https://issues.apache.org/jira/browse/ARROW-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746337#comment-16746337 ]

Pindikura Ravindra commented on ARROW-4275:

After the fix:
99/99 Test #128: gandiva-decimal_single_test ... Passed 143.53 sec

> [C++] gandiva-decimal_single_test extremely slow
>
> Key: ARROW-4275
> URL: https://issues.apache.org/jira/browse/ARROW-4275
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Continuous Integration, Gandiva
> Affects Versions: 0.11.1
> Reporter: Antoine Pitrou
> Assignee: Pindikura Ravindra
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> {{gandiva-decimal_single_test}} is extremely slow on CI builds with Valgrind:
> {code}
> 99/100 Test #128: gandiva-decimal_single_test ... Passed 397.11 sec
> 100/100 Test #130: gandiva-decimal_single_test_static Passed 338.97 sec
> {code}
> (full log: https://travis-ci.org/apache/arrow/jobs/480198116#L2707)
> Something should be done to make it faster.
[jira] [Updated] (ARROW-4287) [C++] Ensure minimal bison version on OSX for Thrift
[ https://issues.apache.org/jira/browse/ARROW-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-4287:
Labels: pull-request-available (was: )

> [C++] Ensure minimal bison version on OSX for Thrift
>
> Key: ARROW-4287
> URL: https://issues.apache.org/jira/browse/ARROW-4287
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Uwe L. Korn
> Assignee: Uwe L. Korn
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Thrift currently just uses the first bison it finds, but it actually needs a
> newer one. We should look for the minimal version required and explicitly fall
> back to Homebrew, using the newer version if it is available there.
> Note: I'll add a fix in our CMake toolchain but will also try to upstream
> this to Thrift.
[jira] [Created] (ARROW-4289) [C++] Forward AR and RANLIB to thirdparty builds
Uwe L. Korn created ARROW-4289:

Summary: [C++] Forward AR and RANLIB to thirdparty builds
Key: ARROW-4289
URL: https://issues.apache.org/jira/browse/ARROW-4289
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn

On OSX Mojave, it seems that there are many versions of AR present. CMake seems to detect the right one, whereas some thirdparty tooling picks up the wrong one.
[jira] [Updated] (ARROW-4289) [C++] Forward AR and RANLIB to thirdparty builds
[ https://issues.apache.org/jira/browse/ARROW-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-4289:
Labels: pull-request-available (was: )

> [C++] Forward AR and RANLIB to thirdparty builds
>
> Key: ARROW-4289
> URL: https://issues.apache.org/jira/browse/ARROW-4289
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Uwe L. Korn
> Assignee: Uwe L. Korn
> Priority: Major
> Labels: pull-request-available
>
> On OSX Mojave, it seems that there are many versions of AR present. CMake
> seems to detect the right one, whereas some thirdparty tooling picks up the
> wrong one.
[jira] [Assigned] (ARROW-4290) [C++/Gandiva] Support detecting correct LLVM version in Homebrew
[ https://issues.apache.org/jira/browse/ARROW-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned ARROW-4290: -- Assignee: Uwe L. Korn > [C++/Gandiva] Support detecting correct LLVM version in Homebrew > > > Key: ARROW-4290 > URL: https://issues.apache.org/jira/browse/ARROW-4290 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Gandiva >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > > We should also search in homebrew for the matching LLVM version for Gandiva > on OSX. You can install it via {{brew install llvm@6}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4290) [C++/Gandiva] Support detecting correct LLVM version in Homebrew
Uwe L. Korn created ARROW-4290: -- Summary: [C++/Gandiva] Support detecting correct LLVM version in Homebrew Key: ARROW-4290 URL: https://issues.apache.org/jira/browse/ARROW-4290 Project: Apache Arrow Issue Type: New Feature Components: C++, Gandiva Reporter: Uwe L. Korn We should also search in homebrew for the matching LLVM version for Gandiva on OSX. You can install it via {{brew install llvm@6}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4290) [C++/Gandiva] Support detecting correct LLVM version in Homebrew
[ https://issues.apache.org/jira/browse/ARROW-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4290: -- Labels: pull-request-available (was: ) > [C++/Gandiva] Support detecting correct LLVM version in Homebrew > > > Key: ARROW-4290 > URL: https://issues.apache.org/jira/browse/ARROW-4290 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Gandiva >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > > We should also search in homebrew for the matching LLVM version for Gandiva > on OSX. You can install it via {{brew install llvm@6}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4167) [Gandiva] switch to arrow/util/variant
[ https://issues.apache.org/jira/browse/ARROW-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-4167. - Resolution: Fixed Fix Version/s: (was: 0.13.0) 0.12.0 Issue resolved by pull request 3425 [https://github.com/apache/arrow/pull/3425] > [Gandiva] switch to arrow/util/variant > -- > > Key: ARROW-4167 > URL: https://issues.apache.org/jira/browse/ARROW-4167 > Project: Apache Arrow > Issue Type: Task > Components: Gandiva >Reporter: Pindikura Ravindra >Assignee: Pindikura Ravindra >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 40m > Remaining Estimate: 0h > > gandiva cpp uses boost variant. It should switch to arrow/util/variant. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4291) [Dev] Support selecting features in release scripts
Uwe L. Korn created ARROW-4291: -- Summary: [Dev] Support selecting features in release scripts Key: ARROW-4291 URL: https://issues.apache.org/jira/browse/ARROW-4291 Project: Apache Arrow Issue Type: New Feature Components: Developer Tools, Packaging Reporter: Uwe L. Korn Sometimes not all components can be verified on a system. We should provide some environment variables to exclude them to proceed to the next step. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4291) [Dev] Support selecting features in release scripts
[ https://issues.apache.org/jira/browse/ARROW-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4291: -- Labels: pull-request-available (was: ) > [Dev] Support selecting features in release scripts > --- > > Key: ARROW-4291 > URL: https://issues.apache.org/jira/browse/ARROW-4291 > Project: Apache Arrow > Issue Type: New Feature > Components: Developer Tools, Packaging >Reporter: Uwe L. Korn >Priority: Major > Labels: pull-request-available > > Sometimes not all components can be verified on a system. We should provide > some environment variables to exclude them to proceed to the next step. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4291) [Dev] Support selecting features in release scripts
[ https://issues.apache.org/jira/browse/ARROW-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned ARROW-4291: -- Assignee: Uwe L. Korn > [Dev] Support selecting features in release scripts > --- > > Key: ARROW-4291 > URL: https://issues.apache.org/jira/browse/ARROW-4291 > Project: Apache Arrow > Issue Type: New Feature > Components: Developer Tools, Packaging >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Sometimes not all components can be verified on a system. We should provide > some environment variables to exclude them to proceed to the next step. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4264) [C++] Convert DCHECKs in that check compute/* input parameters to error statuses
[ https://issues.apache.org/jira/browse/ARROW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4264: Summary: [C++] Convert DCHECKs in that check compute/* input parameters to error statuses (was: Convert DCHECKs in that check compute/* input parameters to error statuses) > [C++] Convert DCHECKs in that check compute/* input parameters to error > statuses > > > Key: ARROW-4264 > URL: https://issues.apache.org/jira/browse/ARROW-4264 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Minor > > DCHECKs seem to be used where Status::Invalid is more appropriate (so > programs don't crash). See conversation on > https://github.com/apache/arrow/pull/3287/files -- This message was sent by Atlassian JIRA (v7.6.3#76005)
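The distinction this issue draws can be sketched as follows. This is an illustration only: the `Status` class below is a hypothetical, minimal stand-in (not `arrow::Status` itself), `TakeFirstN` and `TakeFirstNChecked` are made-up function names, and a plain `assert` plays the role of a DCHECK. The point is that a debug assertion turns bad caller input into a crash (or into unchecked behavior in release builds), while a returned error status lets the caller handle it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal stand-in for a status type (not arrow::Status itself).
class Status {
 public:
  static Status OK() { return Status(true, ""); }
  static Status Invalid(std::string msg) { return Status(false, std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

// Debug-assertion style: invalid input aborts a debug build and goes
// unchecked when assertions are compiled out.
void TakeFirstNChecked(const std::vector<int>& in, std::size_t n,
                       std::vector<int>* out) {
  assert(n <= in.size());  // plays the role of a DCHECK
  out->assign(in.begin(), in.begin() + static_cast<std::ptrdiff_t>(n));
}

// Status style: invalid input is reported to the caller and the program
// keeps running.
Status TakeFirstN(const std::vector<int>& in, std::size_t n,
                  std::vector<int>* out) {
  if (out == nullptr) {
    return Status::Invalid("output pointer is null");
  }
  if (n > in.size()) {
    return Status::Invalid("n exceeds input length");
  }
  out->assign(in.begin(), in.begin() + static_cast<std::ptrdiff_t>(n));
  return Status::OK();
}
```

A caller would check `ok()` on the returned value and surface `message()` to the user instead of terminating.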
[jira] [Assigned] (ARROW-4254) [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt
[ https://issues.apache.org/jira/browse/ARROW-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-4254:
Assignee: Wes McKinney

> [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt
>
> Key: ARROW-4254
> URL: https://issues.apache.org/jira/browse/ARROW-4254
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Fix For: 0.13.0
>
> These tests use an API that was not available in the Boost in Ubuntu 14.04;
> we can change them to use the more compatible API
> {code}
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc: In member function ‘virtual void gandiva::TestLruCache_TestLruBehavior_Test::TestBody()’:
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:188: error: ‘class boost::optional >’ has no member named ‘value’
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>    ^
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:203: error: template argument 1 is invalid
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>    ^
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:294: error: ‘class boost::optional >’ has no member named ‘value’
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>    ^
> make[2]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/lru_cache_test.cc.o] Error 1
> make[1]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/all] Error 2
> make[1]: *** Waiting for unfinished jobs
> {code}
[jira] [Closed] (ARROW-1918) [JS] Integration portion of verify-release-candidate.sh fails
[ https://issues.apache.org/jira/browse/ARROW-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney closed ARROW-1918.
Resolution: Not A Problem

Closing as stale. The integration part works now.

> [JS] Integration portion of verify-release-candidate.sh fails
>
> Key: ARROW-1918
> URL: https://issues.apache.org/jira/browse/ARROW-1918
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Affects Versions: 0.8.0
> Reporter: Wes McKinney
> Assignee: Brian Hulette
> Priority: Major
> Fix For: JS-0.5.0
>
> I'm going to temporarily disable this in my fixes in ARROW-1917
[jira] [Closed] (ARROW-2959) Dockerize verify-release-candidate.{sh,bat}
[ https://issues.apache.org/jira/browse/ARROW-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney closed ARROW-2959.
Resolution: Won't Fix

Closing as Won't Fix. The release verification process is regularly turning up issues that would be occluded by a Dockerized build (e.g. I found several problems on Ubuntu 14.04 for 0.12).
I think we should definitely make it more straightforward / simpler to set up the user environment to run the script, though.

> Dockerize verify-release-candidate.{sh,bat}
>
> Key: ARROW-2959
> URL: https://issues.apache.org/jira/browse/ARROW-2959
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Packaging
> Affects Versions: 0.9.0
> Reporter: Phillip Cloud
> Priority: Major
>
> There are a number of issues with the Linux version of this script that would
> disappear if the commands were all being run in a Docker container.
> Anyone with Docker installed should be able to verify the release candidate.
> We could probably do the same for Windows as well.
[jira] [Created] (ARROW-4292) [Release] Add script to test release verification script against master branch
Wes McKinney created ARROW-4292: --- Summary: [Release] Add script to test release verification script against master branch Key: ARROW-4292 URL: https://issues.apache.org/jira/browse/ARROW-4292 Project: Apache Arrow Issue Type: New Feature Components: Developer Tools Reporter: Wes McKinney Fix For: 0.13.0 This should enable us to find problems with the verification script well before releases happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4254) [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt
[ https://issues.apache.org/jira/browse/ARROW-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-4254:
Labels: pull-request-available (was: )

> [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt
>
> Key: ARROW-4254
> URL: https://issues.apache.org/jira/browse/ARROW-4254
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.13.0
>
> These tests use an API that was not available in the Boost in Ubuntu 14.04;
> we can change them to use the more compatible API
> {code}
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc: In member function ‘virtual void gandiva::TestLruCache_TestLruBehavior_Test::TestBody()’:
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:188: error: ‘class boost::optional >’ has no member named ‘value’
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>    ^
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:203: error: template argument 1 is invalid
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>    ^
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:294: error: ‘class boost::optional >’ has no member named ‘value’
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>    ^
> make[2]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/lru_cache_test.cc.o] Error 1
> make[1]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/all] Error 2
> make[1]: *** Waiting for unfinished jobs
> {code}
[jira] [Commented] (ARROW-4250) [C++][Gandiva] Use approximate comparisons for floating point numbers in gandiva-projector-test
[ https://issues.apache.org/jira/browse/ARROW-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746457#comment-16746457 ]

Wes McKinney commented on ARROW-4250:

Of course this failure has ended up being non-deterministic. I'm going to add an approximate version of {{AssertArraysEqual}} in arrow_testing so that we can use it here, and anywhere else we are doing floating point comparisons where equality can be within an acceptable tolerance (~1E-13 or so).

> [C++][Gandiva] Use approximate comparisons for floating point numbers in gandiva-projector-test
>
> Key: ARROW-4250
> URL: https://issues.apache.org/jira/browse/ARROW-4250
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Gandiva
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.13.0
>
> I experienced a failure due to floating point comparison when running the
> release verification script for 0.12.0 RC2.
> {code}
> [==] Running 13 tests from 1 test case.
> [--] Global test environment set-up.
> [--] 13 tests from TestProjector
> [ RUN ] TestProjector.TestProjectCache
> [ OK ] TestProjector.TestProjectCache (584 ms)
> [ RUN ] TestProjector.TestProjectCacheFieldNames
> [ OK ] TestProjector.TestProjectCacheFieldNames (319 ms)
> [ RUN ] TestProjector.TestProjectCacheDouble
> [ OK ] TestProjector.TestProjectCacheDouble (304 ms)
> [ RUN ] TestProjector.TestProjectCacheFloat
> [ OK ] TestProjector.TestProjectCacheFloat (305 ms)
> [ RUN ] TestProjector.TestIntSumSub
> [ OK ] TestProjector.TestIntSumSub (200 ms)
> [ RUN ] TestProjector.TestAllIntTypes
> [ OK ] TestProjector.TestAllIntTypes (1945 ms)
> [ RUN ] TestProjector.TestExtendedMath
> /tmp/arrow-0.12.0.a2ADf/apache-arrow-0.12.0/cpp/src/gandiva/tests/projector_test.cc:358: Failure
> Value of: (expected_cbrt)->Equals(outputs.at(0))
> Actual: false
> Expected: true
> expected array: [
>   2.51984,
>   2.15443,
>   -2.41014,
>   2.02469
> ]
> actual array: [
>   2.51984,
>   2.15443,
>   -2.41014,
>   2.02469
> ]
> {code}
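The failure above shows two arrays that print identically at six significant digits yet fail an exact equality check, which is what a tolerance-based comparison is meant to absorb. A minimal sketch of such a comparison follows. `ApproxEquals` is a hypothetical helper written for illustration, not the `AssertArraysEqual` variant mentioned in the comment, and the default tolerance simply reuses the ~1E-13 figure quoted there.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the Arrow testing API): returns true when both
// arrays have the same length and every pair of elements differs by at most
// abs_tol.
bool ApproxEquals(const std::vector<double>& expected,
                  const std::vector<double>& actual,
                  double abs_tol = 1e-13) {
  if (expected.size() != actual.size()) {
    return false;
  }
  for (std::size_t i = 0; i < expected.size(); ++i) {
    // Written as !(diff <= tol) so that a NaN on either side fails the check.
    if (!(std::fabs(expected[i] - actual[i]) <= abs_tol)) {
      return false;
    }
  }
  return true;
}
```

An absolute tolerance is the simplest choice for values of this magnitude; data spanning many orders of magnitude would likely need a relative tolerance instead.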
[jira] [Created] (ARROW-4293) [C++] Can't access parquet statistics on binary columns
Ildar created ARROW-4293:

Summary: [C++] Can't access parquet statistics on binary columns
Key: ARROW-4293
URL: https://issues.apache.org/jira/browse/ARROW-4293
Project: Apache Arrow
Issue Type: Bug
Reporter: Ildar

Hi,

I'm trying to use per-column statistics (min/max values) to filter out row groups while reading a Parquet file. But I don't see statistics built for binary columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} discards statistics that have sort order {{UNSIGNED}} and haven't been created by {{parquet-cpp}}. As I understand it, there used to be some issues in {{parquet-mr}} before. But do they still persist?

For example, I have a Parquet file created with {{parquet-mr}} version 1.10, and it seems to have correct min/max values for binary columns. And {{parquet-cpp}} works fine for me if I remove this code from the {{HasCorrectStatistics()}} function:

{{ if (SortOrder::SIGNED != sort_order && !max_equals_min) {}}
{{ return false; }}}
[jira] [Updated] (ARROW-4293) [C++] Can't access parquet statistics on binary columns
[ https://issues.apache.org/jira/browse/ARROW-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ildar updated ARROW-4293:

Description:
Hi,

I'm trying to use per-column statistics (min/max values) to filter out row groups while reading a Parquet file. But I don't see statistics built for binary columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} discards statistics that have sort order {{UNSIGNED}} and haven't been created by {{parquet-cpp}}. As I understand it, there used to be some issues in {{parquet-mr}} before. But do they still persist?

For example, I have a Parquet file created with {{parquet-mr}} version 1.10, and it seems to have correct min/max values for binary columns. And {{parquet-cpp}} works fine for me if I remove this code from the {{HasCorrectStatistics()}} function:

{code:java}
if (SortOrder::SIGNED != sort_order && !max_equals_min) {
  return false;
}
{code}

was:
Hi, I'm trying to use per-column statistics (min/max values) to filter out row groups while reading parquet file. But I don't see statistics built for binary columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} discards statistics that have sort order {{UNSIGNED }}and haven't been created by {{parquet-cpp}}. As I understand there used to be some issues in {{parquet-mr}} before. But do they still persist? For example, I have parquet file created with {{parquet-mr}} version 1.10, it seems to have correct min/max values for binary columns. And {{parquet-cpp}} works fine for me if I remove this code from {{HasCorrectStatistics()}} func:
{{ if (SortOrder::SIGNED != sort_order && !max_equals_min) {}}
{{ return false; }}}

> [C++] Can't access parquet statistics on binary columns
>
> Key: ARROW-4293
> URL: https://issues.apache.org/jira/browse/ARROW-4293
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Ildar
> Priority: Major
>
> Hi,
> I'm trying to use per-column statistics (min/max values) to filter out row
> groups while reading a Parquet file. But I don't see statistics built for
> binary columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}}
> discards statistics that have sort order {{UNSIGNED}} and haven't been created
> by {{parquet-cpp}}. As I understand it, there used to be some issues in
> {{parquet-mr}} before. But do they still persist?
> For example, I have a Parquet file created with {{parquet-mr}} version 1.10, and it
> seems to have correct min/max values for binary columns. And {{parquet-cpp}}
> works fine for me if I remove this code from the {{HasCorrectStatistics()}} function:
>
> {code:java}
> if (SortOrder::SIGNED != sort_order && !max_equals_min) {
>   return false;
> }
> {code}
[jira] [Commented] (ARROW-4293) [C++] Can't access parquet statistics on binary columns
[ https://issues.apache.org/jira/browse/ARROW-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746555#comment-16746555 ] Deepak Majeti commented on ARROW-4293: -- This should be a Parquet JIRA. [~wesmckinn] Can we move this Jira to the Parquet project? {{HasCorrectStatistics()}} has to be updated to accept all statistics written by parquet-mr 1.10.0. parquet-mr implemented the new, fixed min-max statistics in the following Jira, which went into the 1.10.0 release: https://issues.apache.org/jira/browse/PARQUET-1025 > [C++] Can't access parquet statistics on binary columns > --- > > Key: ARROW-4293 > URL: https://issues.apache.org/jira/browse/ARROW-4293 > Project: Apache Arrow > Issue Type: Bug >Reporter: Ildar >Priority: Major > > Hi, > I'm trying to use per-column statistics (min/max values) to filter out row > groups while reading a parquet file. But I don't see statistics built for > binary columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} > discards statistics that have sort order {{UNSIGNED}} and haven't been created > by {{parquet-cpp}}. As I understand, there used to be some issues in > {{parquet-mr}} before. But do they still persist? > For example, I have a parquet file created with {{parquet-mr}} version 1.10; it > seems to have correct min/max values for binary columns. And {{parquet-cpp}} > works fine for me if I remove this code from the {{HasCorrectStatistics()}} function: > > {code:java} > if (SortOrder::SIGNED != sort_order && !max_equals_min) { > return false; > }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
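As a rough illustration of the change discussed in ARROW-4293, the sketch below shows how a statistics check could accept unsigned-order min/max values written by parquet-mr 1.10.0 or later. It is not the parquet-cpp implementation: {{WriterVersion}}, {{StatisticsLookTrustworthy}} and the two-value {{SortOrder}} enum are hypothetical stand-ins, and the 1.10.0 cutoff is an assumption taken from the comment on PARQUET-1025.

```cpp
#include <string>

// Hypothetical, simplified stand-ins; these are NOT the parquet-cpp types.
enum class SortOrder { SIGNED, UNSIGNED };

struct WriterVersion {
  std::string application;  // e.g. "parquet-mr" or "parquet-cpp"
  int major_version;
  int minor_version;
  int patch_version;

  // True when this version is >= the given major.minor.patch triple.
  bool AtLeast(int major_v, int minor_v, int patch_v) const {
    if (major_version != major_v) return major_version > major_v;
    if (minor_version != minor_v) return minor_version > minor_v;
    return patch_version >= patch_v;
  }
};

// Returns true when min/max statistics may be used for row-group filtering.
bool StatisticsLookTrustworthy(const WriterVersion& writer, SortOrder sort_order,
                               bool max_equals_min) {
  // Same shortcut as the snippet quoted in the issue: signed ordering is
  // accepted, and min == max does not depend on the ordering at all.
  if (sort_order == SortOrder::SIGNED || max_equals_min) return true;

  // Unsigned ordering: the issue says statistics written by parquet-cpp are kept.
  if (writer.application == "parquet-cpp") return true;

  // Assumption from the comment above: parquet-mr fixed its min/max
  // statistics in the 1.10.0 release (PARQUET-1025).
  if (writer.application == "parquet-mr" && writer.AtLeast(1, 10, 0)) return true;

  return false;
}
```

With this shape, a binary column from a parquet-mr 1.10.0 file keeps its statistics, while the same column from an older parquet-mr release is still rejected unless min equals max.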
[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store
[ https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Khandelwal updated ARROW-4294: - Description: Currently, when Plasma needs storage space for additional objects, it evicts objects by deleting them from the Plasma store. This is a problem when it isn't possible to reconstruct the object or reconstructing it is expensive. Adding support for a pluggable external store that Plasma can evict objects to will address this issue. My proposal is described below. *Requirements* * Objects in Plasma should be evicted to an external store rather than being removed altogether * Communication with the external storage service should be through a very thin shim interface. At the same time, the interface should be general enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.) * Should be pluggable (e.g., it should be simple to add in or remove the external storage service for eviction, switch between different remote services, etc.) and easy to implement *Assumptions/Non-Requirements* * The external store has practically infinite storage * The external store's write operation is idempotent and atomic; this is needed to ensure there are no race conditions due to multiple concurrent evictions of the same object. *Proposed Implementation* * Define an ExternalStore interface with a Connect call. The call returns an ExternalStoreHandle that exposes Put and Get calls. Any external store that needs to be supported has to have this interface implemented. * In order to read or write data to the external store in a thread-safe manner, one ExternalStoreHandle should be created per thread. While the ExternalStoreHandle itself is not required to be thread-safe, multiple ExternalStoreHandles across multiple threads should be able to modify the external store in a thread-safe manner. * Replace the DeleteObjects method in the Plasma Store with an EvictObjects method. 
If an external store is specified for the Plasma store, the EvictObjects method would mark the object state as PLASMA_EVICTED, write the object data to the external store (via the ExternalStoreHandle) and reclaim the memory associated with the object data/metadata rather than remove the entry from the Object Table altogether. In case there is no valid external store, the eviction path would remain the same (i.e., the object entry is still deleted from the Object Table). * The Get method in Plasma Store now tries to fetch the object from external store if it is not found locally and there is an external store associated with the Plasma Store. The method tries to offload this to an external worker thread pool with a fire-and-forget model, but may need to do this synchronously if there are too many requests already enqueued. * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, which can be appended to with implementations of the ExternalStore and ExternalStoreHandle interfaces, which will then be compiled into the plasma_store_server executable. was: Currently, when Plasma needs storage space for additional objects, it evicts objects by deleting them from the Plasma store. This is a problem when it isn't possible to reconstruct the object or reconstructing it is expensive. Adding support for a pluggable external store that Plasma can evict objects to will address this issue. My proposal is described below. *Requirements* * Objects in Plasma should be evicted to a external store rather than being removed altogether * Communication to the external storage service should be through a very thin, shim interface. At the same time, the interface should be general enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.) * Should be pluggable (e.g., it should be simple to add in or remove the external storage service for eviction, switch between different remote services, etc.) 
and easy to implement *Assumptions/Non-Requirements* * The external store has practically infinite storage * The external store's write operation is idempotent and atomic; this is needed ensure there are no race conditions due to multiple concurrent evictions of the same object. *Proposed Implementation* * Define a ExternalStore interface with a Connect call. The call returns an ExternalStoreHandle, that exposes Put and Get calls. Any external store that needs to be supported has to have this interface implemented. * In order to read or write data to the external store in a thread-safe manner, one ExternalStoreHandle should be created per-thread. While the ExternalStoreHandle itself is not required to be thread-safe, multiple ExternalStoreHandles across multiple threads should be able to modify the external store in a thread-safe manner. * Replace the DeleteObjects method in the Plasma Store with an EvictObjects method. If an external store is specif
[jira] [Created] (ARROW-4294) [Plasma] Add support for evicting objects to external store
Anurag Khandelwal created ARROW-4294: Summary: [Plasma] Add support for evicting objects to external store Key: ARROW-4294 URL: https://issues.apache.org/jira/browse/ARROW-4294 Project: Apache Arrow Issue Type: New Feature Components: C++ Affects Versions: 0.11.1 Reporter: Anurag Khandelwal Fix For: 0.13.0 Currently, when Plasma needs storage space for additional objects, it evicts objects by deleting them from the Plasma store. This is a problem when it isn't possible to reconstruct the object or reconstructing it is expensive. Adding support for a pluggable external store that Plasma can evict objects to will address this issue. My proposal is described below. *Requirements* * Objects in Plasma should be evicted to an external store rather than being removed altogether * Communication with the external storage service should be through a very thin shim interface. At the same time, the interface should be general enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.) * Should be pluggable (e.g., it should be simple to add in or remove the external storage service for eviction, switch between different remote services, etc.) and easy to implement *Assumptions/Non-Requirements* * The external store has practically infinite storage * The external store's write operation is idempotent and atomic; this is needed to ensure there are no race conditions due to multiple concurrent evictions of the same object. *Proposed Implementation* * Define an ExternalStore interface with a Connect call. The call returns an ExternalStoreHandle that exposes Put and Get calls. Any external store that needs to be supported has to have this interface implemented. * In order to read or write data to the external store in a thread-safe manner, one ExternalStoreHandle should be created per thread. 
While the ExternalStoreHandle itself is not required to be thread-safe, multiple ExternalStoreHandles across multiple threads should be able to modify the external store in a thread-safe manner. * Replace the DeleteObjects method in the Plasma Store with an EvictObjects method. If an external store is specified for the Plasma store, the EvictObjects method would mark the object state as PLASMA_EVICTED, write the object data to the external store (via the ExternalStoreHandle) and reclaim the memory associated with the object data/metadata rather than remove the entry from the Object Table altogether. In case there is no valid external store, the eviction path would remain the same (i.e., the object entry is still deleted from the Object Table). * The Get method in Plasma Store now tries to fetch the object from external store if it is not found locally and there is an external store associated with the Plasma Store. The method tries to offload this to an external worker thread pool with a fire-and-forget model, but may need to do this synchronously if there are too many requests already enqueued. * *The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, which can be appended to with implementations of the ExternalStore and ExternalStoreHandle interfaces, which will then be compiled into the plasma_store_server executable.* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
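The interface described in the proposal could look roughly like the sketch below. Only the names ExternalStore, ExternalStoreHandle, Connect, Put and Get come from the issue; the signatures, the {{endpoint}} argument and the {{InMemoryExternalStore}} class are assumptions made for illustration, with an in-memory map standing in for a remote service such as S3 or Redis.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Per-thread handle; a single handle is not required to be thread-safe.
class ExternalStoreHandle {
 public:
  virtual ~ExternalStoreHandle() = default;
  virtual bool Put(const std::string& object_id, const std::string& data) = 0;
  virtual bool Get(const std::string& object_id, std::string* data) = 0;
};

class ExternalStore {
 public:
  virtual ~ExternalStore() = default;
  // Hypothetical signature: the issue only says that Connect returns a handle.
  virtual std::unique_ptr<ExternalStoreHandle> Connect(const std::string& endpoint) = 0;
};

// Illustrative backend: a mutex-protected map shared by all handles.
class InMemoryExternalStore : public ExternalStore {
 public:
  std::unique_ptr<ExternalStoreHandle> Connect(const std::string& /*endpoint*/) override {
    return std::unique_ptr<ExternalStoreHandle>(new Handle(state_));
  }

 private:
  struct State {
    std::mutex mutex;
    std::map<std::string, std::string> objects;
  };

  class Handle : public ExternalStoreHandle {
   public:
    explicit Handle(std::shared_ptr<State> state) : state_(std::move(state)) {}

    bool Put(const std::string& object_id, const std::string& data) override {
      std::lock_guard<std::mutex> lock(state_->mutex);
      // Overwriting under the lock keeps repeated Puts idempotent and atomic.
      state_->objects[object_id] = data;
      return true;
    }

    bool Get(const std::string& object_id, std::string* data) override {
      assert(data != nullptr);
      std::lock_guard<std::mutex> lock(state_->mutex);
      auto it = state_->objects.find(object_id);
      if (it == state_->objects.end()) return false;
      *data = it->second;
      return true;
    }

   private:
    std::shared_ptr<State> state_;
  };

  std::shared_ptr<State> state_ = std::make_shared<State>();
};
```

Each thread would call Connect once and keep its own handle; thread safety then lives in the backend (here the shared mutex) rather than in the handle object itself, which matches the per-thread handle requirement in the proposal.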
[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store
[ https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Khandelwal updated ARROW-4294: - Description: Currently, when Plasma needs storage space for additional objects, it evicts objects by deleting them from the Plasma store. This is a problem when it isn't possible to reconstruct the object or reconstructing it is expensive. Adding support for a pluggable external store that Plasma can evict objects to will address this issue. My proposal is described below. *Requirements* * Objects in Plasma should be evicted to an external store rather than being removed altogether * Communication with the external storage service should be through a very thin shim interface. At the same time, the interface should be general enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.) * Should be pluggable (e.g., it should be simple to add in or remove the external storage service for eviction, switch between different remote services, etc.) and easy to implement *Assumptions/Non-Requirements* * The external store has practically infinite storage * The external store's write operation is idempotent and atomic; this is needed to ensure there are no race conditions due to multiple concurrent evictions of the same object. *Proposed Implementation* * Define an ExternalStore interface with a Connect call. The call returns an ExternalStoreHandle that exposes Put and Get calls. Any external store that needs to be supported has to have this interface implemented. * In order to read or write data to the external store in a thread-safe manner, one ExternalStoreHandle should be created per thread. While the ExternalStoreHandle itself is not required to be thread-safe, multiple ExternalStoreHandles across multiple threads should be able to modify the external store in a thread-safe manner. These handles are most likely going to be wrappers around the external store client interfaces. 
* Replace the DeleteObjects method in the Plasma Store with an EvictObjects method. If an external store is specified for the Plasma store, the EvictObjects method would mark the object state as PLASMA_EVICTED, write the object data to the external store (via the ExternalStoreHandle) and reclaim the memory associated with the object data/metadata rather than remove the entry from the Object Table altogether. In case there is no valid external store, the eviction path would remain the same (i.e., the object entry is still deleted from the Object Table). * The Get method in Plasma Store now tries to fetch the object from external store if it is not found locally and there is an external store associated with the Plasma Store. The method tries to offload this to an external worker thread pool with a fire-and-forget model, but may need to do this synchronously if there are too many requests already enqueued. * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, which can be appended to with implementations of the ExternalStore and ExternalStoreHandle interfaces, which will then be compiled into the plasma_store_server executable. was: Currently, when Plasma needs storage space for additional objects, it evicts objects by deleting them from the Plasma store. This is a problem when it isn't possible to reconstruct the object or reconstructing it is expensive. Adding support for a pluggable external store that Plasma can evict objects to will address this issue. My proposal is described below. *Requirements* * Objects in Plasma should be evicted to a external store rather than being removed altogether * Communication to the external storage service should be through a very thin, shim interface. At the same time, the interface should be general enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.) 
* Should be pluggable (e.g., it should be simple to add in or remove the external storage service for eviction, switch between different remote services, etc.) and easy to implement *Assumptions/Non-Requirements* * The external store has practically infinite storage * The external store's write operation is idempotent and atomic; this is needed ensure there are no race conditions due to multiple concurrent evictions of the same object. *Proposed Implementation* * Define a ExternalStore interface with a Connect call. The call returns an ExternalStoreHandle, that exposes Put and Get calls. Any external store that needs to be supported has to have this interface implemented. * In order to read or write data to the external store in a thread-safe manner, one ExternalStoreHandle should be created per-thread. While the ExternalStoreHandle itself is not required to be thread-safe, multiple ExternalStoreHandles across multiple threads should be able to modify the external store in a thread-safe manner. * Replace the Dele
[jira] [Assigned] (ARROW-4253) [GLib] Cannot use non-system Boost specified with $BOOST_ROOT
[ https://issues.apache.org/jira/browse/ARROW-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-4253: --- Assignee: Pindikura Ravindra > [GLib] Cannot use non-system Boost specified with $BOOST_ROOT > - > > Key: ARROW-4253 > URL: https://issues.apache.org/jira/browse/ARROW-4253 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Reporter: Wes McKinney >Assignee: Pindikura Ravindra >Priority: Major > Fix For: 0.13.0 > > > When trying to verify the 0.12 RC2 with Boost installed in a separate > directory set to BOOST_ROOT, this directory is not added to the include path, > causing the build to fail to find {{boost/variant.hpp}}, which is leaked in > the Gandiva headers -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4253) [GLib] Cannot use non-system Boost specified with $BOOST_ROOT
[ https://issues.apache.org/jira/browse/ARROW-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-4253. - Resolution: Fixed Because ARROW-4167 is solved. > [GLib] Cannot use non-system Boost specified with $BOOST_ROOT > - > > Key: ARROW-4253 > URL: https://issues.apache.org/jira/browse/ARROW-4253 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Reporter: Wes McKinney >Assignee: Pindikura Ravindra >Priority: Major > Fix For: 0.13.0 > > > When trying to verify the 0.12 RC2 with Boost installed in a separate > directory set to BOOST_ROOT, this directory is not added to the include path, > causing the build to fail to find {{boost/variant.hpp}}, which is leaked in > the Gandiva headers -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4254) [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt
[ https://issues.apache.org/jira/browse/ARROW-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-4254. - Resolution: Fixed Issue resolved by pull request 3431 [https://github.com/apache/arrow/pull/3431] > [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt > -- > > Key: ARROW-4254 > URL: https://issues.apache.org/jira/browse/ARROW-4254 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > > These tests use an API that was not available in the Boost in Ubuntu 14.04; > we can change them to use the more compatible API > {code} > /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc: > In member function ‘virtual void > gandiva::TestLruCache_TestLruBehavior_Test::TestBody()’: > /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:188: > error: ‘class boost::optional >’ has no member named > ‘value’ >ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello"); > > > ^ > /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:203: > error: template argument 1 is invalid >ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello"); > > >^ > /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:294: > error: ‘class boost::optional >’ has no member named > ‘value’ >ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello"); > > > > ^ > make[2]: *** > [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/lru_cache_test.cc.o] Error > 1 > make[1]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/all] Error 2 > make[1]: *** Waiting for unfinished jobs > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store
[ https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4294: -- Labels: features pull-request-available (was: features) > [Plasma] Add support for evicting objects to external store > --- > > Key: ARROW-4294 > URL: https://issues.apache.org/jira/browse/ARROW-4294 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Affects Versions: 0.11.1 >Reporter: Anurag Khandelwal >Priority: Minor > Labels: features, pull-request-available > Fix For: 0.13.0 > > > Currently, when Plasma needs storage space for additional objects, it evicts > objects by deleting them from the Plasma store. This is a problem when it > isn't possible to reconstruct the object or reconstructing it is expensive. > Adding support for a pluggable external store that Plasma can evict objects > to will address this issue. > My proposal is described below. > *Requirements* > * Objects in Plasma should be evicted to an external store rather than being > removed altogether > * Communication with the external storage service should be through a very > thin shim interface. At the same time, the interface should be general > enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.) > * Should be pluggable (e.g., it should be simple to add in or remove the > external storage service for eviction, switch between different remote > services, etc.) and easy to implement > *Assumptions/Non-Requirements* > * The external store has practically infinite storage > * The external store's write operation is idempotent and atomic; this is > needed to ensure there are no race conditions due to multiple concurrent > evictions of the same object. > *Proposed Implementation* > * Define an ExternalStore interface with a Connect call. The call returns an > ExternalStoreHandle that exposes Put and Get calls. Any external store that > needs to be supported has to have this interface implemented. 
> * In order to read or write data to the external store in a thread-safe > manner, one ExternalStoreHandle should be created per-thread. While the > ExternalStoreHandle itself is not required to be thread-safe, multiple > ExternalStoreHandles across multiple threads should be able to modify the > external store in a thread-safe manner. These handles are most likely going > to be wrappers around the external store client interfaces. > * Replace the DeleteObjects method in the Plasma Store with an EvictObjects > method. If an external store is specified for the Plasma store, the > EvictObjects method would mark the object state as PLASMA_EVICTED, write the > object data to the external store (via the ExternalStoreHandle) and reclaim > the memory associated with the object data/metadata rather than remove the > entry from the Object Table altogether. In case there is no valid external > store, the eviction path would remain the same (i.e., the object entry is > still deleted from the Object Table). > * The Get method in Plasma Store now tries to fetch the object from external > store if it is not found locally and there is an external store associated > with the Plasma Store. The method tries to offload this to an external worker > thread pool with a fire-and-forget model, but may need to do this > synchronously if there are too many requests already enqueued. > * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, > which can be appended to with implementations of the ExternalStore and > ExternalStoreHandle interfaces, which will then be compiled into the > plasma_store_server executable. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
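A minimal sketch of the eviction and read paths proposed for ARROW-4294, assuming a single-threaded store and a plain in-memory map standing in for the external store. {{TinyStore}}, {{ObjectEntry}}, {{Seal}}, {{HasEntry}} and the {{PLASMA_SEALED}} state as used here are illustrative assumptions; only {{PLASMA_EVICTED}}, EvictObjects and Get are named in the proposal, and the worker thread pool mentioned there is left out.

```cpp
#include <map>
#include <string>
#include <vector>

// PLASMA_EVICTED is the state named in the proposal; PLASMA_SEALED is an
// assumed name for "object data is resident in memory".
enum class ObjectState { PLASMA_SEALED, PLASMA_EVICTED };

struct ObjectEntry {
  ObjectState state;
  std::string data;  // empty after the memory has been reclaimed
};

class TinyStore {
 public:
  // external_store may be nullptr, meaning no external store is configured.
  explicit TinyStore(std::map<std::string, std::string>* external_store)
      : external_store_(external_store) {}

  void Seal(const std::string& id, const std::string& data) {
    table_[id] = ObjectEntry{ObjectState::PLASMA_SEALED, data};
  }

  void EvictObjects(const std::vector<std::string>& ids) {
    for (const std::string& id : ids) {
      auto it = table_.find(id);
      if (it == table_.end()) continue;
      if (external_store_ == nullptr) {
        // No external store: unchanged behaviour, the entry is deleted.
        table_.erase(it);
        continue;
      }
      // Write the data out, keep the table entry, and drop the local copy.
      (*external_store_)[id] = it->second.data;
      it->second.state = ObjectState::PLASMA_EVICTED;
      it->second.data.clear();
    }
  }

  bool Get(const std::string& id, std::string* out) {
    auto it = table_.find(id);
    if (it == table_.end()) return false;
    if (it->second.state == ObjectState::PLASMA_EVICTED) {
      // An entry is only marked evicted when an external store exists,
      // so the pointer is non-null here. Fetch synchronously.
      auto ext = external_store_->find(id);
      if (ext == external_store_->end()) return false;
      it->second.data = ext->second;
      it->second.state = ObjectState::PLASMA_SEALED;
    }
    *out = it->second.data;
    return true;
  }

  bool HasEntry(const std::string& id) const { return table_.count(id) > 0; }

 private:
  std::map<std::string, std::string>* external_store_;
  std::map<std::string, ObjectEntry> table_;
};
```

The point of keeping the table entry is that a later Get can tell "evicted, fetch it back" apart from "never existed"; without an external store the two cases collapse, as they do today.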
[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store
[ https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Khandelwal updated ARROW-4294: - Component/s: Plasma (C++) > [Plasma] Add support for evicting objects to external store > --- > > Key: ARROW-4294 > URL: https://issues.apache.org/jira/browse/ARROW-4294 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Plasma (C++) >Affects Versions: 0.11.1 >Reporter: Anurag Khandelwal >Priority: Minor > Labels: features, pull-request-available > Fix For: 0.13.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, when Plasma needs storage space for additional objects, it evicts > objects by deleting them from the Plasma store. This is a problem when it > isn't possible to reconstruct the object or reconstructing it is expensive. > Adding support for a pluggable external store that Plasma can evict objects > to will address this issue. > My proposal is described below. > *Requirements* > * Objects in Plasma should be evicted to an external store rather than being > removed altogether > * Communication with the external storage service should be through a very > thin shim interface. At the same time, the interface should be general > enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.) > * Should be pluggable (e.g., it should be simple to add in or remove the > external storage service for eviction, switch between different remote > services, etc.) and easy to implement > *Assumptions/Non-Requirements* > * The external store has practically infinite storage > * The external store's write operation is idempotent and atomic; this is > needed to ensure there are no race conditions due to multiple concurrent > evictions of the same object. > *Proposed Implementation* > * Define an ExternalStore interface with a Connect call. The call returns an > ExternalStoreHandle that exposes Put and Get calls. 
Any external store that > needs to be supported has to have this interface implemented. > * In order to read or write data to the external store in a thread-safe > manner, one ExternalStoreHandle should be created per-thread. While the > ExternalStoreHandle itself is not required to be thread-safe, multiple > ExternalStoreHandles across multiple threads should be able to modify the > external store in a thread-safe manner. These handles are most likely going > to be wrappers around the external store client interfaces. > * Replace the DeleteObjects method in the Plasma Store with an EvictObjects > method. If an external store is specified for the Plasma store, the > EvictObjects method would mark the object state as PLASMA_EVICTED, write the > object data to the external store (via the ExternalStoreHandle) and reclaim > the memory associated with the object data/metadata rather than remove the > entry from the Object Table altogether. In case there is no valid external > store, the eviction path would remain the same (i.e., the object entry is > still deleted from the Object Table). > * The Get method in Plasma Store now tries to fetch the object from external > store if it is not found locally and there is an external store associated > with the Plasma Store. The method tries to offload this to an external worker > thread pool with a fire-and-forget model, but may need to do this > synchronously if there are too many requests already enqueued. > * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, > which can be appended to with implementations of the ExternalStore and > ExternalStoreHandle interfaces, which will then be compiled into the > plasma_store_server executable. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4295) [Plasma] Incorrect log message when evicting objects
[ https://issues.apache.org/jira/browse/ARROW-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4295: -- Labels: pull-request-available (was: ) > [Plasma] Incorrect log message when evicting objects > > > Key: ARROW-4295 > URL: https://issues.apache.org/jira/browse/ARROW-4295 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Plasma (C++) >Affects Versions: 0.11.1 >Reporter: Anurag Khandelwal >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > > When Plasma evicts objects on running out of memory, it prints log messages > of the form: > {quote}There is not enough space to create this object, so evicting x objects > to free up y bytes. The number of bytes in use (before this eviction) is > z.{quote} > However, the reported number of bytes in use (before this eviction) actually > reports the number of bytes *after* the eviction. A straightforward fix is to > simply replace z with (y+z). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4295) [Plasma] Incorrect log message when evicting objects
[ https://issues.apache.org/jira/browse/ARROW-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746918#comment-16746918 ] Anurag Khandelwal commented on ARROW-4295: -- cc [~pcmoritz] > [Plasma] Incorrect log message when evicting objects > > > Key: ARROW-4295 > URL: https://issues.apache.org/jira/browse/ARROW-4295 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Plasma (C++) >Affects Versions: 0.11.1 >Reporter: Anurag Khandelwal >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 10m > Remaining Estimate: 0h > > When Plasma evicts objects on running out of memory, it prints log messages > of the form: > {quote}There is not enough space to create this object, so evicting x objects > to free up y bytes. The number of bytes in use (before this eviction) is > z.{quote} > However, the reported number of bytes in use (before this eviction) actually > reports the number of bytes *after* the eviction. A straightforward fix is to > simply replace z with (y+z). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4294) [Plasma] Add support for evicting objects to external store
[ https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746919#comment-16746919 ] Anurag Khandelwal commented on ARROW-4294: -- cc [~pcmoritz] > [Plasma] Add support for evicting objects to external store > --- > > Key: ARROW-4294 > URL: https://issues.apache.org/jira/browse/ARROW-4294 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Plasma (C++) >Affects Versions: 0.11.1 >Reporter: Anurag Khandelwal >Priority: Minor > Labels: features, pull-request-available > Fix For: 0.13.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, when Plasma needs storage space for additional objects, it evicts > objects by deleting them from the Plasma store. This is a problem when it > isn't possible to reconstruct the object or reconstructing it is expensive. > Adding support for a pluggable external store that Plasma can evict objects > to will address this issue. > My proposal is described below. > *Requirements* > * Objects in Plasma should be evicted to an external store rather than being > removed altogether > * Communication with the external storage service should be through a very > thin shim interface. At the same time, the interface should be general > enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.) > * Should be pluggable (e.g., it should be simple to add in or remove the > external storage service for eviction, switch between different remote > services, etc.) and easy to implement > *Assumptions/Non-Requirements* > * The external store has practically infinite storage > * The external store's write operation is idempotent and atomic; this is > needed to ensure there are no race conditions due to multiple concurrent > evictions of the same object. > *Proposed Implementation* > * Define an ExternalStore interface with a Connect call. The call returns an > ExternalStoreHandle that exposes Put and Get calls. 
Any external store that > needs to be supported has to have this interface implemented. > * In order to read or write data to the external store in a thread-safe > manner, one ExternalStoreHandle should be created per-thread. While the > ExternalStoreHandle itself is not required to be thread-safe, multiple > ExternalStoreHandles across multiple threads should be able to modify the > external store in a thread-safe manner. These handles are most likely going > to be wrappers around the external store client interfaces. > * Replace the DeleteObjects method in the Plasma Store with an EvictObjects > method. If an external store is specified for the Plasma store, the > EvictObjects method would mark the object state as PLASMA_EVICTED, write the > object data to the external store (via the ExternalStoreHandle) and reclaim > the memory associated with the object data/metadata rather than remove the > entry from the Object Table altogether. In case there is no valid external > store, the eviction path would remain the same (i.e., the object entry is > still deleted from the Object Table). > * The Get method in Plasma Store now tries to fetch the object from external > store if it is not found locally and there is an external store associated > with the Plasma Store. The method tries to offload this to an external worker > thread pool with a fire-and-forget model, but may need to do this > synchronously if there are too many requests already enqueued. > * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, > which can be appended to with implementations of the ExternalStore and > ExternalStoreHandle interfaces, which will then be compiled into the > plasma_store_server executable. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4295) [Plasma] Incorrect log message when evicting objects
Anurag Khandelwal created ARROW-4295:

Summary: [Plasma] Incorrect log message when evicting objects
Key: ARROW-4295
URL: https://issues.apache.org/jira/browse/ARROW-4295
Project: Apache Arrow
Issue Type: Bug
Components: C++, Plasma (C++)
Affects Versions: 0.11.1
Reporter: Anurag Khandelwal
Fix For: 0.13.0

When Plasma evicts objects on running out of memory, it prints log messages of the form:
{quote}There is not enough space to create this object, so evicting x objects to free up y bytes. The number of bytes in use (before this eviction) is z.{quote}
However, the value printed as the number of bytes in use before the eviction (z) is actually the number of bytes in use *after* the eviction. A straightforward fix is to print (y+z) in place of z.
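The arithmetic behind the suggested fix for ARROW-4295 can be illustrated with a small sketch. The helper name and its parameters are hypothetical and are not taken from the Plasma source; only the message text comes from the report. Given the bytes freed (y) and the bytes in use after the eviction (z), the value to print is y + z.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper; the name and parameters are illustrative assumptions.
std::string EvictionLogMessage(int64_t num_objects, int64_t bytes_freed,
                               int64_t bytes_in_use_after) {
  // Bytes in use before the eviction = bytes still in use + bytes just freed.
  const int64_t bytes_in_use_before = bytes_in_use_after + bytes_freed;
  std::ostringstream out;
  out << "There is not enough space to create this object, so evicting "
      << num_objects << " objects to free up " << bytes_freed
      << " bytes. The number of bytes in use (before this eviction) is "
      << bytes_in_use_before << ".";
  return out.str();
}
```

For example, evicting 3 objects that free 100 bytes and leave 400 bytes in use should report 500 bytes in use before the eviction.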
[jira] [Commented] (ARROW-4296) [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled crashes due to improper memory alignment
[ https://issues.apache.org/jira/browse/ARROW-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746937#comment-16746937 ]

Anurag Khandelwal commented on ARROW-4296:
--
cc [~pcmoritz]

> [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled
> crashes due to improper memory alignment
> ---
>
> Key: ARROW-4296
> URL: https://issues.apache.org/jira/browse/ARROW-4296
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Plasma (C++)
> Affects Versions: 0.11.1
> Reporter: Anurag Khandelwal
> Priority: Minor
> Fix For: 0.13.0
>
>
> Starting Plasma with use_one_memory_mapped_file (-f flag) causes a crash,
> most likely due to improper memory alignment. This can be resolved by
> changing the dlmemalign call during initialization to request slightly less
> memory (about 8 KB less).
[jira] [Created] (ARROW-4296) [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled crashes due to improper memory alignment
Anurag Khandelwal created ARROW-4296:

Summary: [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled crashes due to improper memory alignment
Key: ARROW-4296
URL: https://issues.apache.org/jira/browse/ARROW-4296
Project: Apache Arrow
Issue Type: Bug
Components: C++, Plasma (C++)
Affects Versions: 0.11.1
Reporter: Anurag Khandelwal
Fix For: 0.13.0

Starting Plasma with use_one_memory_mapped_file (-f flag) causes a crash, most likely due to improper memory alignment. This can be resolved by changing the dlmemalign call during initialization to request slightly less memory (about 8 KB less).
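A minimal sketch of the size adjustment suggested in ARROW-4296 follows. The constant and function names are hypothetical, and the exact headroom is an assumption: the report only says "~8KB". The result would be the size passed to the aligned allocation made during initialization in place of the full memory limit; that call is not shown here because the report does not give its arguments.

```cpp
#include <cstddef>

// Hypothetical constant: the report only says "~8KB".
constexpr std::size_t kHeadroomBytes = 8 * 1024;

// Size to request for the initial aligned allocation: the configured limit
// minus a small headroom. Returns 0 when the limit is too small to leave any.
constexpr std::size_t InitialAllocationSize(std::size_t memory_limit) {
  return memory_limit > kHeadroomBytes ? memory_limit - kHeadroomBytes : 0;
}
```

The guard against small limits is a choice made for this sketch so that the subtraction cannot wrap around; the report does not discuss that case.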
[jira] [Updated] (ARROW-4296) [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled crashes due to improper memory alignment
[ https://issues.apache.org/jira/browse/ARROW-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-4296:
--
Labels: pull-request-available (was: )

> [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled
> crashes due to improper memory alignment
> ---
>
> Key: ARROW-4296
> URL: https://issues.apache.org/jira/browse/ARROW-4296
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Plasma (C++)
> Affects Versions: 0.11.1
> Reporter: Anurag Khandelwal
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Starting Plasma with use_one_memory_mapped_file (-f flag) causes a crash,
> most likely due to improper memory alignment. This can be resolved by
> changing the dlmemalign call during initialization to request slightly less
> memory (about 8 KB less).