[jira] [Updated] (ARROW-2250) plasma_store process should cleanup on INT and TERM signals as well

2018-03-03 Thread Mitar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mitar updated ARROW-2250:
-
Description: 
Currently, if you send an INT or TERM signal to the parent plasma store process 
(the Python one), it terminates without cleaning up the child process. This 
makes it hard to run the plasma store in non-interactive mode. Inside a shell, 
Ctrl-C kills both processes.

Moreover, INT prints out an ugly KeyboardInterrupt exception. Something nicer 
should probably be done.

  was:Currently it cleans up on the INT signal. But if it gets the TERM signal, 
then it kills the parent process (the Python one) but not the binary process. I 
think both TERM and INT signals should be handled the same way.

Summary: plasma_store process should cleanup on INT and TERM signals as 
well  (was: plasma_store process should cleanup on TERM signal as well)

> plasma_store process should cleanup on INT and TERM signals as well
> ---
>
> Key: ARROW-2250
> URL: https://issues.apache.org/jira/browse/ARROW-2250
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Mitar
>Priority: Major
>
> Currently, if you send an INT or TERM signal to the parent plasma store 
> process (the Python one), it terminates without cleaning up the child process. 
> This makes it hard to run the plasma store in non-interactive mode. Inside a 
> shell, Ctrl-C kills both processes.
> Moreover, INT prints out an ugly KeyboardInterrupt exception. Something nicer 
> should probably be done.
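A minimal sketch of the requested behavior follows. It assumes the Python launcher starts the store binary through `subprocess`. The function name `run_store` is hypothetical and is not pyarrow's actual `plasma_store` entry point. INT and TERM are handled identically: both are forwarded to the child, and the parent waits for the child to exit. Because the default SIGINT handler is replaced, no `KeyboardInterrupt` traceback is printed. The sketch targets POSIX and Python 3.5+, where `wait()` is retried after a signal handler runs.

```python
import signal
import subprocess


def run_store(cmd):
    """Hypothetical launcher: run `cmd` as a child process and make SIGINT
    and SIGTERM both shut the child down before the parent returns."""
    child = subprocess.Popen(cmd)

    def forward(signum, frame):
        # Pass the signal on to the child; the wait() below then returns.
        # Replacing the default SIGINT handler also means the parent never
        # raises KeyboardInterrupt.
        if child.poll() is None:
            child.send_signal(signum)

    previous = {
        sig: signal.signal(sig, forward)
        for sig in (signal.SIGINT, signal.SIGTERM)
    }
    try:
        return child.wait()
    finally:
        # Never let the child outlive the parent, whatever happened above.
        if child.poll() is None:
            child.kill()
            child.wait()
        # Restore whatever handlers were installed before.
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)
```

Called with the store binary's command line, `run_store` returns the child's exit status. Following `subprocess` conventions, this is a negative signal number if the child was killed by a signal.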



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2250) plasma_store process should cleanup on INT and TERM signals

2018-03-03 Thread Mitar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mitar updated ARROW-2250:
-
Summary: plasma_store process should cleanup on INT and TERM signals  (was: 
plasma_store process should cleanup on INT and TERM signals as well)

> plasma_store process should cleanup on INT and TERM signals
> ---
>
> Key: ARROW-2250
> URL: https://issues.apache.org/jira/browse/ARROW-2250
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Mitar
>Priority: Major
>
> Currently, if you send an INT or TERM signal to the parent plasma store 
> process (the Python one), it terminates without cleaning up the child process. 
> This makes it hard to run the plasma store in non-interactive mode. Inside a 
> shell, Ctrl-C kills both processes.
> Moreover, INT prints out an ugly KeyboardInterrupt exception. Something nicer 
> should probably be done.





[jira] [Created] (ARROW-2250) plasma_store process should cleanup on TERM signal as well

2018-03-03 Thread Mitar (JIRA)
Mitar created ARROW-2250:


 Summary: plasma_store process should cleanup on TERM signal as well
 Key: ARROW-2250
 URL: https://issues.apache.org/jira/browse/ARROW-2250
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Mitar


Currently it cleans up on the INT signal. But if it gets the TERM signal, then it 
kills the parent process (the Python one) but not the binary process. I think both 
TERM and INT signals should be handled the same way.





[jira] [Commented] (ARROW-1919) Plasma hanging if object id is not 20 bytes

2018-03-03 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385004#comment-16385004
 ] 

Mitar commented on ARROW-1919:
--

Is there a plan for when this will be released in a new version?

> Plasma hanging if object id is not 20 bytes
> ---
>
> Key: ARROW-1919
> URL: https://issues.apache.org/jira/browse/ARROW-1919
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> This happens when Plasma's capability to put an object with a user-defined 
> object ID is used and the object ID is not 20 bytes long. Plasma will hang 
> upon get in that case; we should give an error instead.
> See https://github.com/ray-project/ray/issues/1315
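"Give an error instead" can be sketched as a client-side check that runs before any request reaches the store. This is a sketch, not Plasma's actual API. `validate_object_id` is a hypothetical helper, and the 20-byte size is taken from this issue.

```python
OBJECT_ID_SIZE = 20  # Plasma object IDs are 20 bytes, per this issue


def validate_object_id(object_id):
    """Hypothetical check: reject a malformed ID with an error up front
    instead of letting a later get() block forever."""
    if not isinstance(object_id, (bytes, bytearray)):
        raise TypeError(
            f"object ID must be bytes, not {type(object_id).__name__}")
    if len(object_id) != OBJECT_ID_SIZE:
        raise ValueError(
            f"object ID must be {OBJECT_ID_SIZE} bytes long, "
            f"got {len(object_id)}")
    return bytes(object_id)
```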





[jira] [Commented] (ARROW-2247) [Python] Statically-linking boost_regex in both libarrow and libparquet results in segfault

2018-03-03 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384879#comment-16384879
 ] 

Wes McKinney commented on ARROW-2247:
-

One possibility is that we could refactor the parquet-cpp build system to 
utilize the Arrow build system via a git submodule, so that the {{libparquet}} 
target can be used within a single unified build system. So we would go from 
two loosely-connected build systems to one. 

> [Python] Statically-linking boost_regex in both libarrow and libparquet 
> results in segfault
> ---
>
> Key: ARROW-2247
> URL: https://issues.apache.org/jira/browse/ARROW-2247
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Wes McKinney
>Priority: Major
>
> This is a backtrace from loading {{libparquet.so}} on Ubuntu 14.04 using Boost 
> 1.66.1 from conda-forge. Both libarrow and libparquet contain {{boost_regex}} 
> statically linked. 
> {code}
> In [1]: import ctypes
> In [2]: ctypes.CDLL('libparquet.so')
> Program received signal SIGSEGV, Segmentation fault.
> 0x7fffed4ad3fb in std::basic_string std::allocator >::basic_string(std::string const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (gdb) bt
> #0  0x7fffed4ad3fb in std::basic_string std::allocator >::basic_string(std::string const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #1  0x7fffed74c1fc in 
> boost::re_detail_106600::cpp_regex_traits_char_layer::init() ()
>from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #2  0x7fffed794803 in 
> boost::object_cache boost::re_detail_106600::cpp_regex_traits_implementation 
> >::do_get(boost::re_detail_106600::cpp_regex_traits_base const&, 
> unsigned long) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #3  0x7fffed79e62b in boost::basic_regex boost::cpp_regex_traits > >::do_assign(char const*, char const*, 
> unsigned int) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #4  0x7fffee58561b in boost::basic_regex boost::cpp_regex_traits > >::assign (this=0x7fff3780, 
> p1=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  
> p2=0x7fffee60064a "", f=0) at 
> /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:381
> #5  0x7fffee5855a7 in boost::basic_regex boost::cpp_regex_traits > >::assign (this=0x7fff3780, 
> p=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0)
> at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:366
> #6  0x7fffee5683f3 in boost::basic_regex boost::cpp_regex_traits > >::basic_regex (this=0x7fff3780, 
> p=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0)
> at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:335
> #7  0x7fffee5656d0 in parquet::ApplicationVersion::ApplicationVersion (
> Python Exception  There is no member named _M_dataplus.: 
> this=0x7fffee8f1fb8 
> , created_by=)
> at ../src/parquet/metadata.cc:452
> #8  0x7fffee41c271 in __cxx_global_var_init.1(void) () at 
> ../src/parquet/metadata.cc:35
> #9  0x7fffee41c44e in _GLOBAL__sub_I_metadata.tmp.wesm_desktop.4838.ii ()
>from /home/wesm/local/lib/libparquet.so
> #10 0x77dea1da in call_init (l=, argc=argc@entry=2, 
> argv=argv@entry=0x7fff5d88, 
> env=env@entry=0x7fff5da0) at dl-init.c:78
> #11 0x77dea2c3 in call_init (env=, argv= out>, argc=, 
> l=) at dl-init.c:36
> #12 _dl_init (main_map=main_map@entry=0x13fb220, argc=2, argv=0x7fff5d88, 
> env=0x7fff5da0)
> at dl-init.c:126
> {code}
> This seems to be caused by static initializations in libparquet:
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/metadata.cc#L34
> We should see if removing these static initializations makes the problem go 
> away. If not, then statically-linking boost_regex in both libraries is not 
> advisable.
> For this reason and more, I really wish that Arrow and Parquet shared a 
> common build system and monorepo structure -- it would make handling these 
> toolchain and build-related issues much simpler. 





[jira] [Commented] (ARROW-2247) [Python] Statically-linking boost_regex in both libarrow and libparquet results in segfault

2018-03-03 Thread Deepak Majeti (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384877#comment-16384877
 ] 

Deepak Majeti commented on ARROW-2247:
--

It is interesting that the same Boost version is causing an issue. Could it be a 
problem with the CDLL Python call?

Having a monorepo for Arrow and Parquet definitely simplifies the build 
toolchain. But this particular problem can only be solved if we have a single 
library containing both Arrow and Parquet, correct?



[jira] [Commented] (ARROW-2246) [Python] Use namespaced boost in manylinux1 package

2018-03-03 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384743#comment-16384743
 ] 

Uwe L. Korn commented on ARROW-2246:


Moved this to 0.9.0 as we need to ship our own dynamically linked Boost.

> [Python] Use namespaced boost in manylinux1 package
> ---
>
> Key: ARROW-2246
> URL: https://issues.apache.org/jira/browse/ARROW-2246
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Blocker
> Fix For: 0.9.0
>
>
> Boost provides the functionality to generate a namespaced copy of all its 
> implementations. This means that you can have a private copy of Boost in your 
> library that will not conflict with other Boost installations in your 
> environment. While in, e.g., conda-forge a good ecosystem exists that provides 
> a single, consistent Boost version, in the setting of the manylinux1 wheels we 
> have no control over which other Boost versions exist.
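Boost's `bcp` tool is one way to produce such a namespaced copy. The minimal Python sketch below only assembles the command line. It assumes `bcp`'s `--boost`, `--namespace` and `--namespace-alias` options; the function name, the `arrow_boost` namespace and the module list are illustrative only.

```python
def bcp_namespaced_command(boost_root, modules, out_dir,
                           namespace="arrow_boost"):
    """Hypothetical helper: build a `bcp` command line that copies the given
    Boost modules into out_dir with the `boost` namespace renamed."""
    return [
        "bcp",
        f"--boost={boost_root}",       # location of the Boost source tree
        f"--namespace={namespace}",    # rename namespace boost -> namespace
        # Make `boost` an alias of the renamed namespace so existing
        # `boost::` references keep compiling.
        "--namespace-alias",
        *modules,
        out_dir,
    ]
```

The resulting list could be passed to `subprocess.run(..., check=True)`; the output directory is assumed to exist beforehand.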





[jira] [Updated] (ARROW-2246) [Python] Use namespaced boost in manylinux1 package

2018-03-03 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2246:
---
Fix Version/s: (was: 0.10.0)
   0.9.0

> [Python] Use namespaced boost in manylinux1 package
> ---
>
> Key: ARROW-2246
> URL: https://issues.apache.org/jira/browse/ARROW-2246
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.0
>
>
> Boost provides the functionality to generate a namespaced copy of all its 
> implementations. This means that you can have a private copy of Boost in your 
> library that will not conflict with other Boost installations in your 
> environment. While in, e.g., conda-forge a good ecosystem exists that provides 
> a single, consistent Boost version, in the setting of the manylinux1 wheels we 
> have no control over which other Boost versions exist.





[jira] [Updated] (ARROW-2246) [Python] Use namespaced boost in manylinux1 package

2018-03-03 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2246:
---
Priority: Blocker  (was: Major)



[jira] [Created] (ARROW-2249) [Java/Python] in-process vector sharing from Java to Python

2018-03-03 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2249:
--

 Summary: [Java/Python] in-process vector sharing from Java to 
Python
 Key: ARROW-2249
 URL: https://issues.apache.org/jira/browse/ARROW-2249
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java - Vectors, Python
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.10.0


Currently, all applications of Arrow seem to use the IPC capabilities to move 
data between a Java process and a Python process. While this requires no 
serialization, it is not zero-copy. I'm going to have a first shot at exposing 
Java Vectors in Python as {{pyarrow.Array}}.

This issue can also be used as a tracker for the various sub-tasks that will 
need to be done to complete this rather large milestone.
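The difference between "no serialization" and "zero-copy" can be shown with the Python standard library alone. This is an analogy, not the JVM-to-Python bridge this issue proposes, and both function names are hypothetical. The IPC stand-in keeps the same byte layout but copies the data, while the in-process view shares the producer's memory.

```python
import array


def share_in_process(values: array.array) -> memoryview:
    """Zero-copy: the view refers to the same memory as `values`."""
    return memoryview(values)


def send_via_ipc(values: array.array) -> array.array:
    """Stand-in for an IPC round trip: the byte layout is unchanged (no
    serialization step), but the data is copied into new memory."""
    payload = values.tobytes()            # copy out of the producer
    received = array.array(values.typecode)
    received.frombytes(payload)           # copy into the consumer
    return received
```

After `source[0] = 42.0`, a view returned by `share_in_process(source)` sees the new value. A result of `send_via_ipc(source)` taken before the change still holds the old one.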


