[jira] [Created] (ARROW-4286) [C++/R] Namespace vendored Boost

2019-01-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4286:
--

 Summary: [C++/R] Namespace vendored Boost
 Key: ARROW-4286
 URL: https://issues.apache.org/jira/browse/ARROW-4286
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Packaging, R
Reporter: Uwe L. Korn
 Fix For: 0.13.0


For R, we vendor Boost and thus include its symbols privately in our modules.
Even though they are private, some symbols, such as virtual destructors, can
still interfere with other packages that vendor Boost. We should also namespace
the vendored Boost as we do in the manylinux1 packaging:
https://github.com/apache/arrow/blob/0f8bd747468dd28c909ef823bed77d8082a5b373/python/manylinux1/scripts/build_boost.sh#L28



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-4280) [C++][Documentation] It looks like flex and bison are required for parquet

2019-01-18 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4280.

   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3417
[https://github.com/apache/arrow/pull/3417]

> [C++][Documentation] It looks like flex and bison are required for parquet
> --
>
> Key: ARROW-4280
> URL: https://issues.apache.org/jira/browse/ARROW-4280
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Documentation
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When trying to build parquet, it initially failed because it couldn't find 
> flex and bison.





[jira] [Resolved] (ARROW-4261) [C++] CMake paths for IPC, Flight, Thrift, and Plasma don't support using Arrow as a subproject

2019-01-18 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-4261.

Resolution: Fixed

Issue resolved by pull request 3396
[https://github.com/apache/arrow/pull/3396]

> [C++] CMake paths for IPC, Flight, Thrift, and Plasma don't support using 
> Arrow as a subproject
> ---
>
> Key: ARROW-4261
> URL: https://issues.apache.org/jira/browse/ARROW-4261
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Michael Vilim
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Builds using Arrow as a CMake subproject (using add_subdirectory) will fail 
> if the IPC, Flight, Thrift, or Plasma features are turned on. This issue is 
> caused by the use of CMAKE_SOURCE_DIR and CMAKE_BINARY_DIR which point to the 
> top level directories of the CMake project (source and output, respectively).
> In most of the cases where these paths are used, they are intended to point 
> to the Arrow source and build dirs. Defining and using CMake variables for 
> those top level Arrow folders solves the issue.
> I will open a pull request to fix the issue.
> A project that demonstrates the issue and the patch can be found here: 
> https://github.com/mvilim/arrow-as-subproject
> Note: there are several other locations in the repo where CMAKE_SOURCE_DIR 
> and CMAKE_BINARY_DIR are used (outside of the main cpp build, the 
> cmake_modules, and the Gandiva subproject, for example). I hesitate to change 
> these without an easy way to test all the possible build paths. I am choosing a 
> safe route here and changing only the most straightforward ones (and ones 
> most likely to be used with Arrow as a subproject). If you would prefer I try 
> to change all uses of these variables, let me know (and let me know if you 
> have a straightforward way to test the supported build configurations).
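A sketch of the pattern the description proposes, with assumed variable names (`ARROW_SOURCE_DIR`/`ARROW_BINARY_DIR`); the fragment is written to a file here only to illustrate it:

```shell
# Write out the CMake pattern described above: project-scoped variables
# instead of CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR, which point at the *top-level*
# project and therefore break under add_subdirectory(). Names are assumptions.
cat > /tmp/arrow_paths.cmake <<'EOF'
# Correct even when Arrow is built as a subproject:
set(ARROW_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
set(ARROW_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
# ...then use ARROW_SOURCE_DIR below instead of CMAKE_SOURCE_DIR.
EOF
grep -c 'ARROW_SOURCE_DIR' /tmp/arrow_paths.cmake
```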





[jira] [Assigned] (ARROW-4261) [C++] CMake paths for IPC, Flight, Thrift, and Plasma don't support using Arrow as a subproject

2019-01-18 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-4261:
--

Assignee: Michael Vilim

> [C++] CMake paths for IPC, Flight, Thrift, and Plasma don't support using 
> Arrow as a subproject
> ---
>
> Key: ARROW-4261
> URL: https://issues.apache.org/jira/browse/ARROW-4261
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Michael Vilim
>Assignee: Michael Vilim
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Builds using Arrow as a CMake subproject (using add_subdirectory) will fail 
> if the IPC, Flight, Thrift, or Plasma features are turned on. This issue is 
> caused by the use of CMAKE_SOURCE_DIR and CMAKE_BINARY_DIR which point to the 
> top level directories of the CMake project (source and output, respectively).
> In most of the cases where these paths are used, they are intended to point 
> to the Arrow source and build dirs. Defining and using CMake variables for 
> those top level Arrow folders solves the issue.
> I will open a pull request to fix the issue.
> A project that demonstrates the issue and the patch can be found here: 
> https://github.com/mvilim/arrow-as-subproject
> Note: there are several other locations in the repo where CMAKE_SOURCE_DIR 
> and CMAKE_BINARY_DIR are used (outside of the main cpp build, the 
> cmake_modules, and the Gandiva subproject, for example). I hesitate to change 
> these without an easy way to test all the possible build paths. I am choosing a 
> safe route here and changing only the most straightforward ones (and ones 
> most likely to be used with Arrow as a subproject). If you would prefer I try 
> to change all uses of these variables, let me know (and let me know if you 
> have a straightforward way to test the supported build configurations).





[jira] [Updated] (ARROW-4280) [C++][Documentation] It looks like flex and bison are required for parquet

2019-01-18 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-4280:
---
Fix Version/s: (was: 0.12.0)
   0.13.0

> [C++][Documentation] It looks like flex and bison are required for parquet
> --
>
> Key: ARROW-4280
> URL: https://issues.apache.org/jira/browse/ARROW-4280
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Documentation
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When trying to build parquet, it initially failed because it couldn't find 
> flex and bison.





[jira] [Created] (ARROW-4287) [C++] Ensure minimal bison version on OSX for Thrift

2019-01-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4287:
--

 Summary: [C++] Ensure minimal bison version on OSX for Thrift
 Key: ARROW-4287
 URL: https://issues.apache.org/jira/browse/ARROW-4287
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.13.0


Thrift currently just uses the first bison it finds but actually needs a newer 
one. We should look for the minimal required version and fall back explicitly 
to Homebrew, using the newer version if it is available there.

Note: I'll add a fix in our CMake toolchain but will also try to upstream this 
to Thrift.
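A hedged sketch of the detection logic (the minimum version `3.0` and the Homebrew keg path are assumptions):

```shell
# Check whether the first bison on PATH meets an assumed minimum version and,
# if not (or if bison is missing), prefer Homebrew's keg-only copy when it
# exists -- mirroring what the CMake toolchain fix would do.
BISON_MIN="3.0"
bison_version=$(bison --version 2>/dev/null | head -n1 | grep -o '[0-9][0-9.]*$')
newest=$(printf '%s\n' "$BISON_MIN" "$bison_version" | sort -V | tail -n1)
if [ "$newest" = "$BISON_MIN" ] && [ "$bison_version" != "$BISON_MIN" ]; then
  # Too old or not found: fall back to Homebrew's bison if installed.
  if [ -x /usr/local/opt/bison/bin/bison ]; then
    BISON=/usr/local/opt/bison/bin/bison
  fi
else
  BISON=$(command -v bison)
fi
echo "using: ${BISON:-none}"
```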





[jira] [Updated] (ARROW-4167) [Gandiva] switch to arrow/util/variant

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4167:
--
Labels: pull-request-available  (was: )

> [Gandiva] switch to arrow/util/variant
> --
>
> Key: ARROW-4167
> URL: https://issues.apache.org/jira/browse/ARROW-4167
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> gandiva cpp uses boost variant. It should switch to arrow/util/variant.





[jira] [Created] (ARROW-4288) Installation instructions don't work on Ubuntu 18.04

2019-01-18 Thread Kirill Müller (JIRA)
Kirill Müller created ARROW-4288:


 Summary: Installation instructions don't work on Ubuntu 18.04
 Key: ARROW-4288
 URL: https://issues.apache.org/jira/browse/ARROW-4288
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Kirill Müller


The R package seems to require statically linking to Boost. One way to achieve 
this on Ubuntu is to use the vendored Boost.

See also ARROW-4286 which discusses namespacing Boost.
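One way this could look at configure time; the exact option names are assumptions about the 0.12-era C++ build, so treat this as a sketch rather than working instructions:

```shell
# Assumed flag names for vendoring Boost and linking it statically when
# configuring Arrow C++ before installing the R package on Ubuntu 18.04.
ARROW_BOOST_FLAGS="-DARROW_BOOST_VENDORED=ON -DARROW_BOOST_USE_SHARED=OFF"
echo "cmake $ARROW_BOOST_FLAGS /path/to/arrow/cpp"
```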





[jira] [Updated] (ARROW-4288) [R] Installation instructions don't work on Ubuntu 18.04

2019-01-18 Thread Kirill Müller (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Müller updated ARROW-4288:
-
Summary: [R] Installation instructions don't work on Ubuntu 18.04  (was: 
Installation instructions don't work on Ubuntu 18.04)

> [R] Installation instructions don't work on Ubuntu 18.04
> 
>
> Key: ARROW-4288
> URL: https://issues.apache.org/jira/browse/ARROW-4288
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Kirill Müller
>Priority: Major
>
> The R package seems to require statically linking to Boost. One way to 
> achieve this on Ubuntu is to use the vendored Boost.
> See also ARROW-4286 which discusses namespacing Boost.





[jira] [Updated] (ARROW-4288) [R] Installation instructions don't work on Ubuntu 18.04

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4288:
--
Labels: pull-request-available  (was: )

> [R] Installation instructions don't work on Ubuntu 18.04
> 
>
> Key: ARROW-4288
> URL: https://issues.apache.org/jira/browse/ARROW-4288
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Kirill Müller
>Priority: Major
>  Labels: pull-request-available
>
> The R package seems to require statically linking to Boost. One way to 
> achieve this on Ubuntu is to use the vendored Boost.
> See also ARROW-4286 which discusses namespacing Boost.





[jira] [Updated] (ARROW-4275) [C++] gandiva-decimal_single_test extremely slow

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4275:
--
Labels: pull-request-available  (was: )

> [C++] gandiva-decimal_single_test extremely slow
> 
>
> Key: ARROW-4275
> URL: https://issues.apache.org/jira/browse/ARROW-4275
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, Gandiva
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
>
> {{gandiva-decimal_single_test}} is extremely slow on CI builds with Valgrind:
> {code}
>  99/100 Test #128: gandiva-decimal_single_test ...   Passed  
> 397.11 sec
> 100/100 Test #130: gandiva-decimal_single_test_static    Passed  
> 338.97 sec
> {code}
> (full log: https://travis-ci.org/apache/arrow/jobs/480198116#L2707)
> Something should be done to make it faster.





[jira] [Commented] (ARROW-4275) [C++] gandiva-decimal_single_test extremely slow

2019-01-18 Thread Pindikura Ravindra (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746293#comment-16746293
 ] 

Pindikura Ravindra commented on ARROW-4275:
---

Gandiva has a cache of JITs to solve these kinds of issues, but there is a bug 
in the cache lookup.

> [C++] gandiva-decimal_single_test extremely slow
> 
>
> Key: ARROW-4275
> URL: https://issues.apache.org/jira/browse/ARROW-4275
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, Gandiva
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{gandiva-decimal_single_test}} is extremely slow on CI builds with Valgrind:
> {code}
>  99/100 Test #128: gandiva-decimal_single_test ...   Passed  
> 397.11 sec
> 100/100 Test #130: gandiva-decimal_single_test_static    Passed  
> 338.97 sec
> {code}
> (full log: https://travis-ci.org/apache/arrow/jobs/480198116#L2707)
> Something should be done to make it faster.





[jira] [Commented] (ARROW-4275) [C++] gandiva-decimal_single_test extremely slow

2019-01-18 Thread Pindikura Ravindra (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746337#comment-16746337
 ] 

Pindikura Ravindra commented on ARROW-4275:
---

After the fix:

{code}
99/99 Test #128: gandiva-decimal_single_test ... Passed 143.53 sec
{code}

> [C++] gandiva-decimal_single_test extremely slow
> 
>
> Key: ARROW-4275
> URL: https://issues.apache.org/jira/browse/ARROW-4275
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, Gandiva
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{gandiva-decimal_single_test}} is extremely slow on CI builds with Valgrind:
> {code}
>  99/100 Test #128: gandiva-decimal_single_test ...   Passed  
> 397.11 sec
> 100/100 Test #130: gandiva-decimal_single_test_static    Passed  
> 338.97 sec
> {code}
> (full log: https://travis-ci.org/apache/arrow/jobs/480198116#L2707)
> Something should be done to make it faster.





[jira] [Updated] (ARROW-4287) [C++] Ensure minimal bison version on OSX for Thrift

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4287:
--
Labels: pull-request-available  (was: )

> [C++] Ensure minimal bison version on OSX for Thrift
> 
>
> Key: ARROW-4287
> URL: https://issues.apache.org/jira/browse/ARROW-4287
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Thrift currently just uses the first bison it finds but actually needs a 
> newer one. We should look for the minimal required version and fall back 
> explicitly to Homebrew, using the newer version if it is available there.
> Note: I'll add a fix in our CMake toolchain but will also try to upstream 
> this to Thrift.





[jira] [Created] (ARROW-4289) [C++] Forward AR and RANLIB to thirdparty builds

2019-01-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4289:
--

 Summary: [C++] Forward AR and RANLIB to thirdparty builds
 Key: ARROW-4289
 URL: https://issues.apache.org/jira/browse/ARROW-4289
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn


On OSX Mojave, it seems that there are many versions of AR present. CMake seems 
to detect the right one, whereas some thirdparty tooling picks up the wrong one.
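A sketch of the forwarding idea (the child command here is a stand-in for a thirdparty configure/make invocation):

```shell
# Capture an ar/ranlib explicitly (here simply the first on PATH, standing in
# for whatever CMake detected) and pass them into a thirdparty build's
# environment so it cannot pick up a different ar on its own.
AR=${AR:-$(command -v ar || true)}
RANLIB=${RANLIB:-$(command -v ranlib || true)}
env AR="$AR" RANLIB="$RANLIB" sh -c 'echo "thirdparty build sees AR=$AR"'
```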





[jira] [Updated] (ARROW-4289) [C++] Forward AR and RANLIB to thirdparty builds

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4289:
--
Labels: pull-request-available  (was: )

> [C++] Forward AR and RANLIB to thirdparty builds
> 
>
> Key: ARROW-4289
> URL: https://issues.apache.org/jira/browse/ARROW-4289
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> On OSX Mojave, it seems that there are many versions of AR present. CMake 
> seems to detect the right one, whereas some thirdparty tooling picks up the 
> wrong one.





[jira] [Assigned] (ARROW-4290) [C++/Gandiva] Support detecting correct LLVM version in Homebrew

2019-01-18 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-4290:
--

Assignee: Uwe L. Korn

> [C++/Gandiva] Support detecting correct LLVM version in Homebrew
> 
>
> Key: ARROW-4290
> URL: https://issues.apache.org/jira/browse/ARROW-4290
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Gandiva
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>
> We should also search in homebrew for the matching LLVM version for Gandiva 
> on OSX. You can install it via {{brew install llvm@6}}.





[jira] [Created] (ARROW-4290) [C++/Gandiva] Support detecting correct LLVM version in Homebrew

2019-01-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4290:
--

 Summary: [C++/Gandiva] Support detecting correct LLVM version in 
Homebrew
 Key: ARROW-4290
 URL: https://issues.apache.org/jira/browse/ARROW-4290
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Gandiva
Reporter: Uwe L. Korn


We should also search in homebrew for the matching LLVM version for Gandiva on 
OSX. You can install it via {{brew install llvm@6}}.
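A hedged sketch of the lookup; the formula name `llvm@6` comes from the issue, while everything else (e.g. pointing CMake at `$LLVM_PREFIX/lib/cmake/llvm` via `LLVM_DIR`) is an assumption:

```shell
# Ask Homebrew where llvm@6 lives so the Gandiva build can be pointed at the
# matching LLVM; degrades gracefully when brew or the formula is absent.
LLVM_PREFIX=""
if command -v brew >/dev/null 2>&1; then
  LLVM_PREFIX=$(brew --prefix llvm@6 2>/dev/null || true)
fi
echo "LLVM_PREFIX=${LLVM_PREFIX:-<not found>}"
```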





[jira] [Updated] (ARROW-4290) [C++/Gandiva] Support detecting correct LLVM version in Homebrew

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4290:
--
Labels: pull-request-available  (was: )

> [C++/Gandiva] Support detecting correct LLVM version in Homebrew
> 
>
> Key: ARROW-4290
> URL: https://issues.apache.org/jira/browse/ARROW-4290
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Gandiva
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> We should also search in homebrew for the matching LLVM version for Gandiva 
> on OSX. You can install it via {{brew install llvm@6}}.





[jira] [Resolved] (ARROW-4167) [Gandiva] switch to arrow/util/variant

2019-01-18 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4167.
-
   Resolution: Fixed
Fix Version/s: (was: 0.13.0)
   0.12.0

Issue resolved by pull request 3425
[https://github.com/apache/arrow/pull/3425]

> [Gandiva] switch to arrow/util/variant
> --
>
> Key: ARROW-4167
> URL: https://issues.apache.org/jira/browse/ARROW-4167
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> gandiva cpp uses boost variant. It should switch to arrow/util/variant.





[jira] [Created] (ARROW-4291) [Dev] Support selecting features in release scripts

2019-01-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4291:
--

 Summary: [Dev] Support selecting features in release scripts
 Key: ARROW-4291
 URL: https://issues.apache.org/jira/browse/ARROW-4291
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Developer Tools, Packaging
Reporter: Uwe L. Korn


Sometimes not all components can be verified on a given system. We should 
provide environment variables to exclude them so that verification can proceed 
to the next step.
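A sketch of what such knobs might look like; the variable names (`TEST_CPP`, `TEST_JS`) and the default-on convention are assumptions:

```shell
# Per-component flags, defaulting to "on", that would let a verification
# script skip components which cannot be built on the current system.
TEST_CPP=${TEST_CPP:-1}
TEST_JS=0   # example: skip the JavaScript verification on this machine
verify() {  # verify <flag> <component>
  if [ "$1" = "1" ]; then echo "verifying $2"; else echo "skipping $2"; fi
}
verify "$TEST_CPP" cpp
verify "$TEST_JS" js
```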





[jira] [Updated] (ARROW-4291) [Dev] Support selecting features in release scripts

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4291:
--
Labels: pull-request-available  (was: )

> [Dev] Support selecting features in release scripts
> ---
>
> Key: ARROW-4291
> URL: https://issues.apache.org/jira/browse/ARROW-4291
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Developer Tools, Packaging
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> Sometimes not all components can be verified on a given system. We should 
> provide environment variables to exclude them so that verification can 
> proceed to the next step.





[jira] [Assigned] (ARROW-4291) [Dev] Support selecting features in release scripts

2019-01-18 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-4291:
--

Assignee: Uwe L. Korn

> [Dev] Support selecting features in release scripts
> ---
>
> Key: ARROW-4291
> URL: https://issues.apache.org/jira/browse/ARROW-4291
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Developer Tools, Packaging
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Sometimes not all components can be verified on a system. We should provide 
> some environment variables to exclude them to proceed to the next step.





[jira] [Updated] (ARROW-4264) [C++] Convert DCHECKs that check compute/* input parameters to error statuses

2019-01-18 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4264:

Summary: [C++] Convert DCHECKs that check compute/* input parameters to 
error statuses  (was: Convert DCHECKs that check compute/* input parameters 
to error statuses)

> [C++] Convert DCHECKs that check compute/* input parameters to error 
> statuses
> 
>
> Key: ARROW-4264
> URL: https://issues.apache.org/jira/browse/ARROW-4264
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>
> DCHECKs seem to be used where Status::Invalid is more appropriate (so 
> programs don't crash).  See conversation on 
> https://github.com/apache/arrow/pull/3287/files





[jira] [Assigned] (ARROW-4254) [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt

2019-01-18 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4254:
---

Assignee: Wes McKinney

> [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt
> --
>
> Key: ARROW-4254
> URL: https://issues.apache.org/jira/browse/ARROW-4254
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> These tests use an API that was not available in the Boost version shipped 
> in Ubuntu 14.04; we can change them to use the more compatible API
> {code}
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:
>  In member function ‘virtual void 
> gandiva::TestLruCache_TestLruBehavior_Test::TestBody()’:
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:188:
>  error: ‘class boost::optional >’ has no member named 
> ‘value’
>ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>   
>   
> ^
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:203:
>  error: template argument 1 is invalid
>ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>   
>   
>^
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:294:
>  error: ‘class boost::optional >’ has no member named 
> ‘value’
>ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>   
>   
>   
> ^
> make[2]: *** 
> [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/lru_cache_test.cc.o] Error 
> 1
> make[1]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/all] Error 2
> make[1]: *** Waiting for unfinished jobs
> {code}





[jira] [Closed] (ARROW-1918) [JS] Integration portion of verify-release-candidate.sh fails

2019-01-18 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-1918.
---
Resolution: Not A Problem

Closing as stale. The integration part works now.

> [JS] Integration portion of verify-release-candidate.sh fails
> -
>
> Key: ARROW-1918
> URL: https://issues.apache.org/jira/browse/ARROW-1918
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> I'm going to temporarily disable this in my fixes in ARROW-1917





[jira] [Closed] (ARROW-2959) Dockerize verify-release-candidate.{sh,bat}

2019-01-18 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2959.
---
Resolution: Won't Fix

Closing as Won't Fix. The release verification process is regularly turning up 
issues that would be occluded by a Dockerized build (e.g. I found several 
problems on Ubuntu 14.04 for 0.12).

I think we should definitely make it simpler to set up the user environment to 
run the script, though.

> Dockerize verify-release-candidate.{sh,bat}
> ---
>
> Key: ARROW-2959
> URL: https://issues.apache.org/jira/browse/ARROW-2959
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Priority: Major
>
> There are a number of issues with the Linux version of this script that would 
> disappear if the commands were all run in a Docker container.
> Anyone with Docker installed should be able to verify the release candidate.
> We could probably do the same for Windows as well.





[jira] [Created] (ARROW-4292) [Release] Add script to test release verification script against master branch

2019-01-18 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4292:
---

 Summary: [Release] Add script to test release verification script 
against master branch
 Key: ARROW-4292
 URL: https://issues.apache.org/jira/browse/ARROW-4292
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Developer Tools
Reporter: Wes McKinney
 Fix For: 0.13.0


This should enable us to find problems with the verification script well before 
releases happen.
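One way this could be wired up (the script path exists in the Arrow repo; the tarball-building step and its arguments are assumptions, shown as echoes only):

```shell
# Build a source tarball from the master branch and feed it to the release
# verification script, so breakage surfaces before an RC exists.
BRANCH=master
TARBALL="apache-arrow-${BRANCH}.tar.gz"
echo "git archive --format=tar.gz --prefix=apache-arrow/ -o $TARBALL $BRANCH"
echo "dev/release/verify-release-candidate.sh ..."
```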





[jira] [Updated] (ARROW-4254) [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4254:
--
Labels: pull-request-available  (was: )

> [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt
> --
>
> Key: ARROW-4254
> URL: https://issues.apache.org/jira/browse/ARROW-4254
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> These tests use an API that was not available in the Boost version shipped 
> in Ubuntu 14.04; we can change them to use the more compatible API
> {code}
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:
>  In member function ‘virtual void 
> gandiva::TestLruCache_TestLruBehavior_Test::TestBody()’:
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:188:
>  error: ‘class boost::optional >’ has no member named 
> ‘value’
>ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>   
>   
> ^
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:203:
>  error: template argument 1 is invalid
>ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>   
>   
>^
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:294:
>  error: ‘class boost::optional >’ has no member named 
> ‘value’
>ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>   
>   
>   
> ^
> make[2]: *** 
> [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/lru_cache_test.cc.o] Error 
> 1
> make[1]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/all] Error 2
> make[1]: *** Waiting for unfinished jobs
> {code}





[jira] [Commented] (ARROW-4250) [C++][Gandiva] Use approximate comparisons for floating point numbers in gandiva-projector-test

2019-01-18 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746457#comment-16746457
 ] 

Wes McKinney commented on ARROW-4250:
-

Of course this failure has ended up being non-deterministic.

I'm going to add an approximate version of {{AssertArraysEqual}} to 
arrow_testing so that we can use it here, and anywhere else we do floating 
point comparisons where equality should hold within an acceptable tolerance 
(~1E-13 or so).

> [C++][Gandiva] Use approximate comparisons for floating point numbers in 
> gandiva-projector-test
> ---
>
> Key: ARROW-4250
> URL: https://issues.apache.org/jira/browse/ARROW-4250
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Gandiva
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I experienced a failure due to floating point comparison when running the 
> release verification script for 0.12.0 RC2. 
> {code}
> [==] Running 13 tests from 1 test case.
> [--] Global test environment set-up.
> [--] 13 tests from TestProjector
> [ RUN  ] TestProjector.TestProjectCache
> [   OK ] TestProjector.TestProjectCache (584 ms)
> [ RUN  ] TestProjector.TestProjectCacheFieldNames
> [   OK ] TestProjector.TestProjectCacheFieldNames (319 ms)
> [ RUN  ] TestProjector.TestProjectCacheDouble
> [   OK ] TestProjector.TestProjectCacheDouble (304 ms)
> [ RUN  ] TestProjector.TestProjectCacheFloat
> [   OK ] TestProjector.TestProjectCacheFloat (305 ms)
> [ RUN  ] TestProjector.TestIntSumSub
> [   OK ] TestProjector.TestIntSumSub (200 ms)
> [ RUN  ] TestProjector.TestAllIntTypes
> [   OK ] TestProjector.TestAllIntTypes (1945 ms)
> [ RUN  ] TestProjector.TestExtendedMath
> /tmp/arrow-0.12.0.a2ADf/apache-arrow-0.12.0/cpp/src/gandiva/tests/projector_test.cc:358:
>  Failure
> Value of: (expected_cbrt)->Equals(outputs.at(0))
>   Actual: false
> Expected: true
> expected array: [
>   2.51984,
>   2.15443,
>   -2.41014,
>   2.02469
> ] actual array: [
>   2.51984,
>   2.15443,
>   -2.41014,
>   2.02469
> ]
> {code}





[jira] [Created] (ARROW-4293) [C++] Can't access parquet statistics on binary columns

2019-01-18 Thread Ildar (JIRA)
Ildar created ARROW-4293:


 Summary: [C++] Can't access parquet statistics on binary columns
 Key: ARROW-4293
 URL: https://issues.apache.org/jira/browse/ARROW-4293
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Ildar


Hi,

I'm trying to use per-column statistics (min/max values) to filter out row 
groups while reading a parquet file, but I don't see statistics built for binary 
columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} discards 
statistics that have sort order {{UNSIGNED}} and haven't been created by 
{{parquet-cpp}}. As I understand it, there used to be some issues in 
{{parquet-mr}}; do they still persist?

For example, I have a parquet file created with {{parquet-mr}} version 1.10 that 
seems to have correct min/max values for binary columns, and {{parquet-cpp}} 
works fine for me if I remove this code from the {{HasCorrectStatistics()}} function:

{code:java}
if (SortOrder::SIGNED != sort_order && !max_equals_min) {
    return false;
}{code}





[jira] [Updated] (ARROW-4293) [C++] Can't access parquet statistics on binary columns

2019-01-18 Thread Ildar (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ildar updated ARROW-4293:
-
Description: 
Hi,

I'm trying to use per-column statistics (min/max values) to filter out row 
groups while reading a parquet file, but I don't see statistics built for binary 
columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} discards 
statistics that have sort order {{UNSIGNED}} and haven't been created by 
{{parquet-cpp}}. As I understand it, there used to be some issues in 
{{parquet-mr}}; do they still persist?

For example, I have a parquet file created with {{parquet-mr}} version 1.10 that 
seems to have correct min/max values for binary columns, and {{parquet-cpp}} 
works fine for me if I remove this code from the {{HasCorrectStatistics()}} function:

 
{code:java}
if (SortOrder::SIGNED != sort_order && !max_equals_min) {
    return false;
}{code}
 

  was:
Hi,

I'm trying to use per-column statistics (min/max values) to filter out row 
groups while reading parquet file. But I don't see statistics built for binary 
columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} discards 
statistics that have sort order {{UNSIGNED }}and haven't been created by 
{{parquet-cpp}}. As I understand there used to be some issues in {{parquet-mr}} 
before. But do they still persist?

For example, I have parquet file created with {{parquet-mr}} version 1.10, it 
seems to have correct min/max values for binary columns. And {{parquet-cpp}} 
works fine for me if I remove this code from {{HasCorrectStatistics()}} func:

{{ if (SortOrder::SIGNED != sort_order && !max_equals_min) {}}
{{    return false; }}}


> [C++] Can't access parquet statistics on binary columns
> ---
>
> Key: ARROW-4293
> URL: https://issues.apache.org/jira/browse/ARROW-4293
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Ildar
>Priority: Major
>
> Hi,
> I'm trying to use per-column statistics (min/max values) to filter out row 
> groups while reading parquet file. But I don't see statistics built for 
> binary columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} 
> discards statistics that have sort order {{UNSIGNED and haven't been created 
> by parquet-cpp}}. As I understand there used to be some issues in 
> {{parquet-mr}} before. But do they still persist?
> For example, I have parquet file created with {{parquet-mr}} version 1.10, it 
> seems to have correct min/max values for binary columns. And {{parquet-cpp}} 
> works fine for me if I remove this code from {{HasCorrectStatistics()}} func:
>  
> {code:java}
> if (SortOrder::SIGNED != sort_order && !max_equals_min) {
>     return false;
> }{code}
>  





[jira] [Commented] (ARROW-4293) [C++] Can't access parquet statistics on binary columns

2019-01-18 Thread Deepak Majeti (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746555#comment-16746555
 ] 

Deepak Majeti commented on ARROW-4293:
--

This should be a Parquet JIRA. [~wesmckinn] Can we move this Jira to the 
Parquet project?

{{HasCorrectStatistics()}} has to be updated to accept all statistics written 
by parquet-mr 1.10.0.

parquet-mr implemented the new, fixed min-max statistics in the following JIRA, 
which went into the 1.10.0 release:

https://issues.apache.org/jira/browse/PARQUET-1025

> [C++] Can't access parquet statistics on binary columns
> ---
>
> Key: ARROW-4293
> URL: https://issues.apache.org/jira/browse/ARROW-4293
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Ildar
>Priority: Major
>
> Hi,
> I'm trying to use per-column statistics (min/max values) to filter out row 
> groups while reading parquet file. But I don't see statistics built for 
> binary columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} 
> discards statistics that have sort order {{UNSIGNED and haven't been created 
> by parquet-cpp}}. As I understand there used to be some issues in 
> {{parquet-mr}} before. But do they still persist?
> For example, I have parquet file created with {{parquet-mr}} version 1.10, it 
> seems to have correct min/max values for binary columns. And {{parquet-cpp}} 
> works fine for me if I remove this code from {{HasCorrectStatistics()}} func:
>  
> {code:java}
> if (SortOrder::SIGNED != sort_order && !max_equals_min) {
>     return false;
> }{code}
>  





[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread Anurag Khandelwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Khandelwal updated ARROW-4294:
-
Description: 
Currently, when Plasma needs storage space for additional objects, it evicts 
objects by deleting them from the Plasma store. This is a problem when it isn't 
possible to reconstruct the object or reconstructing it is expensive. Adding 
support for a pluggable external store that Plasma can evict objects to will 
address this issue. 

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to an external store rather than being 
removed altogether
 * Communication to the external storage service should be through a very thin, 
shim interface. At the same time, the interface should be general enough to 
support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
 * Should be pluggable (e.g., it should be simple to add in or remove the 
external storage service for eviction, switch between different remote 
services, etc.) and easy to implement

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is 
needed to ensure there are no race conditions due to multiple concurrent 
evictions of the same object.

*Proposed Implementation*
 * Define an ExternalStore interface with a Connect call. The call returns an 
ExternalStoreHandle, which exposes Put and Get calls. Any external store that 
needs to be supported has to implement this interface.
 * In order to read or write data to the external store in a thread-safe 
manner, one ExternalStoreHandle should be created per-thread. While the 
ExternalStoreHandle itself is not required to be thread-safe, multiple 
ExternalStoreHandles across multiple threads should be able to modify the 
external store in a thread-safe manner.
 * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
method. If an external store is specified for the Plasma store, the 
EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
object data to the external store (via the ExternalStoreHandle) and reclaim the 
memory associated with the object data/metadata rather than remove the entry 
from the Object Table altogether. In case there is no valid external store, the 
eviction path would remain the same (i.e., the object entry is still deleted 
from the Object Table).
 * The Get method in Plasma Store now tries to fetch the object from external 
store if it is not found locally and there is an external store associated with 
the Plasma Store. The method tries to offload this to an external worker thread 
pool with a fire-and-forget model, but may need to do this synchronously if 
there are too many requests already enqueued.
 * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, which 
can be appended to with implementations of the ExternalStore and 
ExternalStoreHandle interfaces, which will then be compiled into the 
plasma_store_server executable.

 

  was:
Currently, when Plasma needs storage space for additional objects, it evicts 
objects by deleting them from the Plasma store. This is a problem when it isn't 
possible to reconstruct the object or reconstructing it is expensive. Adding 
support for a pluggable external store that Plasma can evict objects to will 
address this issue. 

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to a external store rather than being 
removed altogether
 * Communication to the external storage service should be through a very thin, 
shim interface. At the same time, the interface should be general enough to 
support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
 * Should be pluggable (e.g., it should be simple to add in or remove the 
external storage service for eviction, switch between different remote 
services, etc.) and easy to implement

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is 
needed ensure there are no race conditions due to multiple concurrent evictions 
of the same object.

*Proposed Implementation*
 * Define a ExternalStore interface with a Connect call. The call returns an 
ExternalStoreHandle, that exposes Put and Get calls. Any external store that 
needs to be supported has to have this interface implemented.
 * In order to read or write data to the external store in a thread-safe 
manner, one ExternalStoreHandle should be created per-thread. While the 
ExternalStoreHandle itself is not required to be thread-safe, multiple 
ExternalStoreHandles across multiple threads should be able to modify the 
external store in a thread-safe manner.
 * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
method. If an external store is specif

[jira] [Created] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread Anurag Khandelwal (JIRA)
Anurag Khandelwal created ARROW-4294:


 Summary: [Plasma] Add support for evicting objects to external 
store
 Key: ARROW-4294
 URL: https://issues.apache.org/jira/browse/ARROW-4294
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Affects Versions: 0.11.1
Reporter: Anurag Khandelwal
 Fix For: 0.13.0


Currently, when Plasma needs storage space for additional objects, it evicts 
objects by deleting them from the Plasma store. This is a problem when it isn't 
possible to reconstruct the object or reconstructing it is expensive. Adding 
support for a pluggable external store that Plasma can evict objects to will 
address this issue. 

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to an external store rather than being 
removed altogether
 * Communication to the external storage service should be through a very thin, 
shim interface. At the same time, the interface should be general enough to 
support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
 * Should be pluggable (e.g., it should be simple to add in or remove the 
external storage service for eviction, switch between different remote 
services, etc.) and easy to implement

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is 
needed to ensure there are no race conditions due to multiple concurrent 
evictions of the same object.

*Proposed Implementation*
 * Define an ExternalStore interface with a Connect call. The call returns an 
ExternalStoreHandle, which exposes Put and Get calls. Any external store that 
needs to be supported has to implement this interface.
 * In order to read or write data to the external store in a thread-safe 
manner, one ExternalStoreHandle should be created per-thread. While the 
ExternalStoreHandle itself is not required to be thread-safe, multiple 
ExternalStoreHandles across multiple threads should be able to modify the 
external store in a thread-safe manner.
 * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
method. If an external store is specified for the Plasma store, the 
EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
object data to the external store (via the ExternalStoreHandle) and reclaim the 
memory associated with the object data/metadata rather than remove the entry 
from the Object Table altogether. In case there is no valid external store, the 
eviction path would remain the same (i.e., the object entry is still deleted 
from the Object Table).
 * The Get method in Plasma Store now tries to fetch the object from external 
store if it is not found locally and there is an external store associated with 
the Plasma Store. The method tries to offload this to an external worker thread 
pool with a fire-and-forget model, but may need to do this synchronously if 
there are too many requests already enqueued.
 * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, which 
can be appended to with implementations of the ExternalStore and 
ExternalStoreHandle interfaces, which will then be compiled into the 
plasma_store_server executable.

 





[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread Anurag Khandelwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Khandelwal updated ARROW-4294:
-
Description: 
Currently, when Plasma needs storage space for additional objects, it evicts 
objects by deleting them from the Plasma store. This is a problem when it isn't 
possible to reconstruct the object or reconstructing it is expensive. Adding 
support for a pluggable external store that Plasma can evict objects to will 
address this issue. 

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to an external store rather than being 
removed altogether
 * Communication to the external storage service should be through a very thin, 
shim interface. At the same time, the interface should be general enough to 
support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
 * Should be pluggable (e.g., it should be simple to add in or remove the 
external storage service for eviction, switch between different remote 
services, etc.) and easy to implement

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is 
needed to ensure there are no race conditions due to multiple concurrent 
evictions of the same object.

*Proposed Implementation*
 * Define an ExternalStore interface with a Connect call. The call returns an 
ExternalStoreHandle, which exposes Put and Get calls. Any external store that 
needs to be supported has to implement this interface.
 * In order to read or write data to the external store in a thread-safe 
manner, one ExternalStoreHandle should be created per-thread. While the 
ExternalStoreHandle itself is not required to be thread-safe, multiple 
ExternalStoreHandles across multiple threads should be able to modify the 
external store in a thread-safe manner. These handles are most likely going to 
be wrappers around the external store client interfaces.
 * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
method. If an external store is specified for the Plasma store, the 
EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
object data to the external store (via the ExternalStoreHandle) and reclaim the 
memory associated with the object data/metadata rather than remove the entry 
from the Object Table altogether. In case there is no valid external store, the 
eviction path would remain the same (i.e., the object entry is still deleted 
from the Object Table).
 * The Get method in Plasma Store now tries to fetch the object from external 
store if it is not found locally and there is an external store associated with 
the Plasma Store. The method tries to offload this to an external worker thread 
pool with a fire-and-forget model, but may need to do this synchronously if 
there are too many requests already enqueued.
 * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, which 
can be appended to with implementations of the ExternalStore and 
ExternalStoreHandle interfaces, which will then be compiled into the 
plasma_store_server executable.

 

  was:
Currently, when Plasma needs storage space for additional objects, it evicts 
objects by deleting them from the Plasma store. This is a problem when it isn't 
possible to reconstruct the object or reconstructing it is expensive. Adding 
support for a pluggable external store that Plasma can evict objects to will 
address this issue. 

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to a external store rather than being 
removed altogether
 * Communication to the external storage service should be through a very thin, 
shim interface. At the same time, the interface should be general enough to 
support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
 * Should be pluggable (e.g., it should be simple to add in or remove the 
external storage service for eviction, switch between different remote 
services, etc.) and easy to implement

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is 
needed ensure there are no race conditions due to multiple concurrent evictions 
of the same object.

*Proposed Implementation*
 * Define a ExternalStore interface with a Connect call. The call returns an 
ExternalStoreHandle, that exposes Put and Get calls. Any external store that 
needs to be supported has to have this interface implemented.
 * In order to read or write data to the external store in a thread-safe 
manner, one ExternalStoreHandle should be created per-thread. While the 
ExternalStoreHandle itself is not required to be thread-safe, multiple 
ExternalStoreHandles across multiple threads should be able to modify the 
external store in a thread-safe manner.
 * Replace the Dele

[jira] [Assigned] (ARROW-4253) [GLib] Cannot use non-system Boost specified with $BOOST_ROOT

2019-01-18 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-4253:
---

Assignee: Pindikura Ravindra

> [GLib] Cannot use non-system Boost specified with $BOOST_ROOT
> -
>
> Key: ARROW-4253
> URL: https://issues.apache.org/jira/browse/ARROW-4253
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Reporter: Wes McKinney
>Assignee: Pindikura Ravindra
>Priority: Major
> Fix For: 0.13.0
>
>
> When trying to verify the 0.12 RC2 with Boost installed in a separate 
> directory set to BOOST_ROOT, this directory is not added to the include path, 
> causing the build to fail to find {{boost/variant.hpp}}, which is leaked in 
> the Gandiva headers





[jira] [Resolved] (ARROW-4253) [GLib] Cannot use non-system Boost specified with $BOOST_ROOT

2019-01-18 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-4253.
-
Resolution: Fixed

Resolved because ARROW-4167 has been fixed.

> [GLib] Cannot use non-system Boost specified with $BOOST_ROOT
> -
>
> Key: ARROW-4253
> URL: https://issues.apache.org/jira/browse/ARROW-4253
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Reporter: Wes McKinney
>Assignee: Pindikura Ravindra
>Priority: Major
> Fix For: 0.13.0
>
>
> When trying to verify the 0.12 RC2 with Boost installed in a separate 
> directory set to BOOST_ROOT, this directory is not added to the include path, 
> causing the build to fail to find {{boost/variant.hpp}}, which is leaked in 
> the Gandiva headers





[jira] [Resolved] (ARROW-4254) [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt

2019-01-18 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-4254.
-
Resolution: Fixed

Issue resolved by pull request 3431
[https://github.com/apache/arrow/pull/3431]

> [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt
> --
>
> Key: ARROW-4254
> URL: https://issues.apache.org/jira/browse/ARROW-4254
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> These tests use an API that was not available in the Boost in Ubuntu 14.04; 
> we can change them to use the more compatible API
> {code}
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:
>  In member function ‘virtual void 
> gandiva::TestLruCache_TestLruBehavior_Test::TestBody()’:
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:188:
>  error: ‘class boost::optional >’ has no member named 
> ‘value’
>ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>   
>   
> ^
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:203:
>  error: template argument 1 is invalid
>ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>   
>   
>^
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:294:
>  error: ‘class boost::optional >’ has no member named 
> ‘value’
>ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
>   
>   
>   
> ^
> make[2]: *** 
> [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/lru_cache_test.cc.o] Error 
> 1
> make[1]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/all] Error 2
> make[1]: *** Waiting for unfinished jobs
> {code}





[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4294:
--
Labels: features pull-request-available  (was: features)

> [Plasma] Add support for evicting objects to external store
> ---
>
> Key: ARROW-4294
> URL: https://issues.apache.org/jira/browse/ARROW-4294
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Priority: Minor
>  Labels: features, pull-request-available
> Fix For: 0.13.0
>
>
> Currently, when Plasma needs storage space for additional objects, it evicts 
> objects by deleting them from the Plasma store. This is a problem when it 
> isn't possible to reconstruct the object or reconstructing it is expensive. 
> Adding support for a pluggable external store that Plasma can evict objects 
> to will address this issue. 
> My proposal is described below.
> *Requirements*
>  * Objects in Plasma should be evicted to a external store rather than being 
> removed altogether
>  * Communication to the external storage service should be through a very 
> thin, shim interface. At the same time, the interface should be general 
> enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
>  * Should be pluggable (e.g., it should be simple to add in or remove the 
> external storage service for eviction, switch between different remote 
> services, etc.) and easy to implement
> *Assumptions/Non-Requirements*
>  * The external store has practically infinite storage
>  * The external store's write operation is idempotent and atomic; this is 
> needed ensure there are no race conditions due to multiple concurrent 
> evictions of the same object.
> *Proposed Implementation*
>  * Define a ExternalStore interface with a Connect call. The call returns an 
> ExternalStoreHandle, that exposes Put and Get calls. Any external store that 
> needs to be supported has to have this interface implemented.
>  * In order to read or write data to the external store in a thread-safe 
> manner, one ExternalStoreHandle should be created per-thread. While the 
> ExternalStoreHandle itself is not required to be thread-safe, multiple 
> ExternalStoreHandles across multiple threads should be able to modify the 
> external store in a thread-safe manner. These handles are most likely going 
> to be wrappers around the external store client interfaces.
>  * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
> method. If an external store is specified for the Plasma store, the 
> EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
> object data to the external store (via the ExternalStoreHandle) and reclaim 
> the memory associated with the object data/metadata rather than remove the 
> entry from the Object Table altogether. In case there is no valid external 
> store, the eviction path would remain the same (i.e., the object entry is 
> still deleted from the Object Table).
>  * The Get method in Plasma Store now tries to fetch the object from external 
> store if it is not found locally and there is an external store associated 
> with the Plasma Store. The method tries to offload this to an external worker 
> thread pool with a fire-and-forget model, but may need to do this 
> synchronously if there are too many requests already enqueued.
>  * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, 
> which can be appended to with implementations of the ExternalStore and 
> ExternalStoreHandle interfaces, which will then be compiled into the 
> plasma_store_server executable.
>  





[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread Anurag Khandelwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Khandelwal updated ARROW-4294:
-
Component/s: Plasma (C++)

> [Plasma] Add support for evicting objects to external store
> ---
>
> Key: ARROW-4294
> URL: https://issues.apache.org/jira/browse/ARROW-4294
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Plasma (C++)
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Priority: Minor
>  Labels: features, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, when Plasma needs storage space for additional objects, it evicts 
> objects by deleting them from the Plasma store. This is a problem when it 
> isn't possible to reconstruct the object or reconstructing it is expensive. 
> Adding support for a pluggable external store that Plasma can evict objects 
> to will address this issue. 
> My proposal is described below.
> *Requirements*
>  * Objects in Plasma should be evicted to a external store rather than being 
> removed altogether
>  * Communication to the external storage service should be through a very 
> thin, shim interface. At the same time, the interface should be general 
> enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
>  * Should be pluggable (e.g., it should be simple to add in or remove the 
> external storage service for eviction, switch between different remote 
> services, etc.) and easy to implement
> *Assumptions/Non-Requirements*
>  * The external store has practically infinite storage
>  * The external store's write operation is idempotent and atomic; this is 
> needed ensure there are no race conditions due to multiple concurrent 
> evictions of the same object.
> *Proposed Implementation*
>  * Define a ExternalStore interface with a Connect call. The call returns an 
> ExternalStoreHandle, that exposes Put and Get calls. Any external store that 
> needs to be supported has to have this interface implemented.
>  * In order to read or write data to the external store in a thread-safe 
> manner, one ExternalStoreHandle should be created per-thread. While the 
> ExternalStoreHandle itself is not required to be thread-safe, multiple 
> ExternalStoreHandles across multiple threads should be able to modify the 
> external store in a thread-safe manner. These handles are most likely going 
> to be wrappers around the external store client interfaces.
>  * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
> method. If an external store is specified for the Plasma store, the 
> EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
> object data to the external store (via the ExternalStoreHandle) and reclaim 
> the memory associated with the object data/metadata rather than remove the 
> entry from the Object Table altogether. In case there is no valid external 
> store, the eviction path would remain the same (i.e., the object entry is 
> still deleted from the Object Table).
>  * The Get method in Plasma Store now tries to fetch the object from external 
> store if it is not found locally and there is an external store associated 
> with the Plasma Store. The method tries to offload this to an external worker 
> thread pool with a fire-and-forget model, but may need to do this 
> synchronously if there are too many requests already enqueued.
>  * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, 
> which can be appended to with implementations of the ExternalStore and 
> ExternalStoreHandle interfaces, which will then be compiled into the 
> plasma_store_server executable.
>  





[jira] [Updated] (ARROW-4295) [Plasma] Incorrect log message when evicting objects

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4295:
--
Labels: pull-request-available  (was: )

> [Plasma] Incorrect log message when evicting objects
> 
>
> Key: ARROW-4295
> URL: https://issues.apache.org/jira/browse/ARROW-4295
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Plasma (C++)
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When Plasma runs out of memory and evicts objects, it prints log messages
> of the form:
> {quote}There is not enough space to create this object, so evicting x objects 
> to free up y bytes. The number of bytes in use (before this eviction) is 
> z.{quote}
> However, the reported number of bytes in use (before this eviction) actually 
> reports the number of bytes *after* the eviction. A straightforward fix is to 
> simply replace z with (y+z).





[jira] [Commented] (ARROW-4295) [Plasma] Incorrect log message when evicting objects

2019-01-18 Thread Anurag Khandelwal (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746918#comment-16746918
 ] 

Anurag Khandelwal commented on ARROW-4295:
--

cc [~pcmoritz]

> [Plasma] Incorrect log message when evicting objects
> 
>
> Key: ARROW-4295
> URL: https://issues.apache.org/jira/browse/ARROW-4295
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Plasma (C++)
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When Plasma runs out of memory and evicts objects, it prints log messages
> of the form:
> {quote}There is not enough space to create this object, so evicting x objects 
> to free up y bytes. The number of bytes in use (before this eviction) is 
> z.{quote}
> However, the reported number of bytes in use (before this eviction) actually 
> reports the number of bytes *after* the eviction. A straightforward fix is to 
> simply replace z with (y+z).





[jira] [Commented] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread Anurag Khandelwal (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746919#comment-16746919
 ] 

Anurag Khandelwal commented on ARROW-4294:
--

cc [~pcmoritz]

> [Plasma] Add support for evicting objects to external store
> ---
>
> Key: ARROW-4294
> URL: https://issues.apache.org/jira/browse/ARROW-4294
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Plasma (C++)
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Priority: Minor
>  Labels: features, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, when Plasma needs storage space for additional objects, it evicts 
> objects by deleting them from the Plasma store. This is a problem when it 
> isn't possible to reconstruct the object or reconstructing it is expensive. 
> Adding support for a pluggable external store that Plasma can evict objects 
> to will address this issue. 
> My proposal is described below.
> *Requirements*
>  * Objects in Plasma should be evicted to an external store rather than being
> removed altogether
>  * Communication to the external storage service should be through a very 
> thin shim interface. At the same time, the interface should be general
> enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
>  * Should be pluggable (e.g., it should be simple to add in or remove the 
> external storage service for eviction, switch between different remote 
> services, etc.) and easy to implement
> *Assumptions/Non-Requirements*
>  * The external store has practically infinite storage
>  * The external store's write operation is idempotent and atomic; this is 
> needed to ensure there are no race conditions due to multiple concurrent
> evictions of the same object.
> *Proposed Implementation*
>  * Define an ExternalStore interface with a Connect call. The call returns an
> ExternalStoreHandle that exposes Put and Get calls. Any external store to be
> supported must implement this interface.
>  * In order to read or write data to the external store in a thread-safe 
> manner, one ExternalStoreHandle should be created per thread. While the
> ExternalStoreHandle itself is not required to be thread-safe, multiple 
> ExternalStoreHandles across multiple threads should be able to modify the 
> external store in a thread-safe manner. These handles are most likely going 
> to be wrappers around the external store client interfaces.
>  * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
> method. If an external store is specified for the Plasma store, the 
> EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
> object data to the external store (via the ExternalStoreHandle) and reclaim 
> the memory associated with the object data/metadata rather than remove the 
> entry from the Object Table altogether. In case there is no valid external 
> store, the eviction path would remain the same (i.e., the object entry is 
> still deleted from the Object Table).
>  * The Get method in Plasma Store now tries to fetch the object from external 
> store if it is not found locally and there is an external store associated 
> with the Plasma Store. The method tries to offload this to an external worker 
> thread pool with a fire-and-forget model, but may need to do this 
> synchronously if there are too many requests already enqueued.
>  * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, 
> which can be appended to with implementations of the ExternalStore and 
> ExternalStoreHandle interfaces, which will then be compiled into the 
> plasma_store_server executable.
>  





[jira] [Created] (ARROW-4295) [Plasma] Incorrect log message when evicting objects

2019-01-18 Thread Anurag Khandelwal (JIRA)
Anurag Khandelwal created ARROW-4295:


 Summary: [Plasma] Incorrect log message when evicting objects
 Key: ARROW-4295
 URL: https://issues.apache.org/jira/browse/ARROW-4295
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Plasma (C++)
Affects Versions: 0.11.1
Reporter: Anurag Khandelwal
 Fix For: 0.13.0


When Plasma runs out of memory and evicts objects, it prints log messages of
the form:

{quote}There is not enough space to create this object, so evicting x objects 
to free up y bytes. The number of bytes in use (before this eviction) is 
z.{quote}

However, the reported number of bytes in use (before this eviction) actually 
reports the number of bytes *after* the eviction. A straightforward fix is to 
simply replace z with (y+z).







[jira] [Commented] (ARROW-4296) [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled crashes due to improper memory alignment

2019-01-18 Thread Anurag Khandelwal (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746937#comment-16746937
 ] 

Anurag Khandelwal commented on ARROW-4296:
--

cc [~pcmoritz]

> [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled 
> crashes due to improper memory alignment
> ---
>
> Key: ARROW-4296
> URL: https://issues.apache.org/jira/browse/ARROW-4296
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Plasma (C++)
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Priority: Minor
> Fix For: 0.13.0
>
>
> Starting Plasma with use_one_memory_mapped_file (-f flag) causes a crash, 
> most likely due to improper memory alignment. This can be resolved by 
> changing the dlmemalign call during initialization to request slightly less
> memory (by ~8KB).





[jira] [Created] (ARROW-4296) [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled crashes due to improper memory alignment

2019-01-18 Thread Anurag Khandelwal (JIRA)
Anurag Khandelwal created ARROW-4296:


 Summary: [Plasma] Starting Plasma store with 
use_one_memory_mapped_file enabled crashes due to improper memory alignment
 Key: ARROW-4296
 URL: https://issues.apache.org/jira/browse/ARROW-4296
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Plasma (C++)
Affects Versions: 0.11.1
Reporter: Anurag Khandelwal
 Fix For: 0.13.0


Starting Plasma with use_one_memory_mapped_file (-f flag) causes a crash, most 
likely due to improper memory alignment. This can be resolved by changing the 
dlmemalign call during initialization to request slightly less memory (by ~8KB).





[jira] [Updated] (ARROW-4296) [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled crashes due to improper memory alignment

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4296:
--
Labels: pull-request-available  (was: )

> [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled 
> crashes due to improper memory alignment
> ---
>
> Key: ARROW-4296
> URL: https://issues.apache.org/jira/browse/ARROW-4296
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Plasma (C++)
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Starting Plasma with use_one_memory_mapped_file (-f flag) causes a crash, 
> most likely due to improper memory alignment. This can be resolved by 
> changing the dlmemalign call during initialization to request slightly less
> memory (by ~8KB).


