[jira] [Commented] (ARROW-1664) [Python] Support for xarray.DataArray and xarray.Dataset

2019-09-18 Thread Mitar (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932797#comment-16932797
 ] 

Mitar commented on ARROW-1664:
--

It is like an extension of DataFrame to multiple dimensions.
{quote}Xarray introduces labels in the form of dimensions, coordinates and 
attributes on top of raw [NumPy|http://www.numpy.org/]-like arrays, which 
allows for a more intuitive, more concise, and less error-prone developer 
experience. The package includes a large and growing library of domain-agnostic 
functions for advanced analytics and visualization with these data structures.

Xarray was inspired by and borrows heavily from 
[pandas|http://pandas.pydata.org/], the popular data analysis package focused 
on labelled tabular data.
{quote}
So internally it is built on ndarrays. This is why I think serialization could 
be possible, similar to how pandas DataFrames use ndarrays internally.

> [Python] Support for xarray.DataArray and xarray.Dataset
> 
>
> Key: ARROW-1664
> URL: https://issues.apache.org/jira/browse/ARROW-1664
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Mitar
>Priority: Minor
>
> DataArray and Dataset are efficient in-memory representations for multi 
> dimensional data. It would be great if one could share them between processes 
> using Arrow.
> http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html#xarray.DataArray
> http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html#xarray.Dataset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-1664) [Python] Support for xarray.DataArray and xarray.Dataset

2019-09-18 Thread Mitar (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932788#comment-16932788
 ] 

Mitar commented on ARROW-1664:
--

I see. So why not then also have `pa.Table.from_xarray`?



[jira] [Commented] (ARROW-2051) [Python] Support serializing UUID objects to tables

2019-09-18 Thread Mitar (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932786#comment-16932786
 ] 

Mitar commented on ARROW-2051:
--

Sounds good. I will then explore how to do that through extension types.

> [Python] Support serializing UUID objects to tables
> ---
>
> Key: ARROW-2051
> URL: https://issues.apache.org/jira/browse/ARROW-2051
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Omer Katz
>Priority: Major
>
> UUID objects can be easily supported and can be represented as 128-bit 
> integers or a stream of bytes.
> The fastest way I know to construct a UUID object is by using its 128-bit 
> (16-byte) integer representation.
>  
> {code:java}
> %timeit uuid.UUID(int=24197857161011715162171839636988778104)
> 611 ns ± 6.27 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> %timeit uuid.UUID(bytes=b'\x124Vx\x124Vx\x124Vx\x124Vx')
> 1.17 µs ± 7.5 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> %timeit uuid.UUID('12345678-1234-5678-1234-567812345678')
> 1.47 µs ± 6.08 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> {code}
>  
> Right now I have to do this manually which is pretty tedious.





[jira] [Commented] (ARROW-1664) [Python] Support for xarray.DataArray and xarray.Dataset

2019-09-18 Thread Mitar (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932670#comment-16932670
 ] 

Mitar commented on ARROW-1664:
--

Nice. So Arrow's support for pandas DataFrame goes only through:

[https://github.com/pandas-dev/pandas/blob/34fff1f336d3b083dd09f5036c2bb9b80edfb619/pandas/core/arrays/integer.py#L370]

Is there no special handling of pandas DataFrame in Arrow?



[jira] [Commented] (ARROW-2051) [Python] Support serializing UUID objects to tables

2019-09-18 Thread Mitar (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932618#comment-16932618
 ] 

Mitar commented on ARROW-2051:
--

I mean, you have 128-bit numbers in Arrow? So why not support converting 
UUIDs to that?
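
A minimal sketch of that conversion, using only the standard library `uuid` module: a UUID round-trips losslessly through either a 128-bit integer or 16 big-endian bytes, which is the shape a 128-bit or fixed-size binary column would need. The helper names `uuid_to_bytes` and `bytes_to_uuid` are hypothetical and not part of Arrow; how the values would actually be stored in an Arrow column is left open here.

```python
import uuid


def uuid_to_bytes(value: uuid.UUID) -> bytes:
    """Return the 16-byte big-endian representation of a UUID."""
    return value.bytes


def bytes_to_uuid(raw: bytes) -> uuid.UUID:
    """Rebuild a UUID from its 16-byte representation (hypothetical helper)."""
    if len(raw) != 16:
        raise ValueError("a UUID needs exactly 16 bytes")
    return uuid.UUID(bytes=raw)


original = uuid.UUID('12345678-1234-5678-1234-567812345678')

# The same value as a 128-bit integer and as 16 raw bytes.
as_int = original.int
as_bytes = uuid_to_bytes(original)

# Both representations reconstruct the original UUID.
from_int = uuid.UUID(int=as_int)
from_bytes = bytes_to_uuid(as_bytes)
```

The integer and byte forms carry the same information, so the choice between them would mostly depend on which column type is cheaper to build and read back.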



[jira] [Commented] (ARROW-1664) [Python] Support for xarray.DataArray and xarray.Dataset

2019-09-18 Thread Mitar (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932606#comment-16932606
 ] 

Mitar commented on ARROW-1664:
--

As [~wesmckinn]  wrote: the idea is to get zero-copy reads. So serializing 
might be slow, but deserializing would be fast.

I think Pandas DataFrame also is not using "arrow under the hood" but arrow 
supports it. Why not then also work on supporting xarray?

It is maybe not a priority now, but it should/could be done, in my view. So I 
would ask to reopen this issue.
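
To make the zero-copy point concrete, here is a small standard-library illustration (not Arrow's API): writing values into one contiguous buffer copies them once, but a reader can then view that buffer as a typed, two-dimensional array without copying. The 2x2 shape and all variable names are assumptions chosen for the example.

```python
import array

# "Serialize": copy the values once into a contiguous bytes buffer.
values = array.array("d", [1.0, 2.0, 3.0, 4.0])
buffer = values.tobytes()

# "Deserialize": reinterpret the same bytes as a 2x2 array of doubles.
# memoryview.cast creates a view over the existing buffer, not a copy.
view = memoryview(buffer).cast("d", shape=[2, 2])

rows = view.tolist()               # nested lists, materialized only on request
shares_memory = view.obj is buffer  # the view still points at the same bytes
```

The write side pays for the copy; the read side only attaches shape and type information to bytes that already exist, which is the asymmetry described above.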



[jira] [Commented] (ARROW-3399) [Python] Cannot serialize numpy matrix object

2019-03-30 Thread Mitar (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806006#comment-16806006
 ] 

Mitar commented on ARROW-3399:
--

This is still happening in 0.12.1.

I think this should be fixed, because it will be quite some time before 
everyone stops using the matrix class, even if it is deprecated.

> [Python] Cannot serialize numpy matrix object
> -
>
> Key: ARROW-3399
> URL: https://issues.apache.org/jira/browse/ARROW-3399
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Mitar
>Priority: Major
>
> This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on 
> Linux.
> {code:java}
> from pyarrow import plasma
> import numpy
> import time
> import subprocess
> import os
> import signal
> m = numpy.matrix(numpy.array([[1, 2], [3, 4]]))
> process = subprocess.Popen(
>     ['plasma_store', '-m', '100', '-s', '/tmp/plasma', '-d', '/dev/shm'],
>     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
>     encoding='utf8', preexec_fn=os.setpgrp)
> time.sleep(5)
> client = plasma.connect('/tmp/plasma', '', 0)
> try:
>     client.put(m)
> finally:
>     client.disconnect()
>     os.killpg(os.getpgid(process.pid), signal.SIGTERM)
> {code}
> Error:
> {noformat}
>   File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put
>   File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize
>   File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum 
> recursion depth. It may contain itself recursively.{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3399) Cannot serialize numpy matrix object

2018-10-02 Thread Mitar (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635077#comment-16635077
 ] 

Mitar commented on ARROW-3399:
--

Oh, the difference is not between Arrow 0.9.0 and 0.10.0 but between numpy 
1.14.3 and 1.15.2. Upgrading numpy to latest version throws the error above, 
while it works on an older version.



[jira] [Created] (ARROW-3399) Cannot serialize numpy matrix object

2018-10-02 Thread Mitar (JIRA)
Mitar created ARROW-3399:


 Summary: Cannot serialize numpy matrix object
 Key: ARROW-3399
 URL: https://issues.apache.org/jira/browse/ARROW-3399
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mitar


This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on 
Linux.
{code:java}
from pyarrow import plasma
import numpy
import time
import subprocess
import os
import signal

m = numpy.matrix(numpy.array([[1, 2], [3, 4]]))

process = subprocess.Popen(
    ['plasma_store', '-m', '100', '-s', '/tmp/plasma', '-d', '/dev/shm'],
    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    encoding='utf8', preexec_fn=os.setpgrp)
time.sleep(5)
client = plasma.connect('/tmp/plasma', '', 0)

try:
    client.put(m)
finally:
    client.disconnect()
    os.killpg(os.getpgid(process.pid), signal.SIGTERM)
{code}
Error:
{noformat}
  File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum recursion 
depth. It may contain itself recursively.{noformat}





[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463610#comment-16463610
 ] 

Mitar commented on ARROW-2273:
--

Isn't it still open for debate whether it will be deprecated?

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Licht Takeuchi
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in 
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1





[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428518#comment-16428518
 ] 

Mitar commented on ARROW-2355:
--

Thanks for the explanation. I understand the issues here. And thank you for all 
the work on resolving them.

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Bradford W Littooy
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.9.1
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 





[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428433#comment-16428433
 ] 

Mitar commented on ARROW-2355:
--

How do I then declare a dependency on PyArrow so that it installs 0.9.0 on 
Linux and Windows, and 0.9.0.post1 on Mac OS X? I currently use strict (==) 
versions in my requirements.txt.
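
One way this could be expressed, sketched under the assumption that the installer understands PEP 508 environment markers: each pinned line carries a `sys_platform` marker, so only one of them applies on a given platform. The helper `pinned_pyarrow_requirements` is hypothetical, and the sketch does not check whether a `0.9.0.post1` release is actually available to install.

```python
def pinned_pyarrow_requirements(default="0.9.0", macos="0.9.0.post1"):
    """Build requirements.txt lines that pin pyarrow per platform.

    Hypothetical helper: the version strings are taken from the question
    above, not verified against any package index.
    """
    return [
        'pyarrow=={}; sys_platform != "darwin"'.format(default),
        'pyarrow=={}; sys_platform == "darwin"'.format(macos),
    ]


requirement_lines = pinned_pyarrow_requirements()

# These lines can be pasted into requirements.txt as they are.
print("\n".join(requirement_lines))
```

The same strings should also work as entries in `install_requires`, since they use the same requirement syntax.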



[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-04-06 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428427#comment-16428427
 ] 

Mitar commented on ARROW-2355:
--

pip install pyarrow==0.9.0.post1 
Collecting pyarrow==0.9.0.post1
  Could not find a version that satisfies the requirement pyarrow==0.9.0.post1 
(from versions: 0.2.0, 0.3.0, 0.4.0, 0.4.1, 0.5.0.post2, 0.6.0, 0.7.0, 0.7.1, 
0.8.0, 0.9.0)
No matching distribution found for pyarrow==0.9.0.post1

I do not see it here either: https://pypi.python.org/pypi/pyarrow



[jira] [Commented] (ARROW-2279) [Python] Better error message if lib cannot be found

2018-03-06 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388389#comment-16388389
 ] 

Mitar commented on ARROW-2279:
--

Related: https://issues.apache.org/jira/projects/ARROW/issues/ARROW-2269

While debugging and trying to understand what is going on in that issue, I found 
that a better error message to begin with would be helpful. I do not expect 
anyone to hit this once we resolve ARROW-2269, but I think it is still useful.

> [Python] Better error message if lib cannot be found
> 
>
> Key: ARROW-2279
> URL: https://issues.apache.org/jira/browse/ARROW-2279
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Mitar
>Priority: Major
> Fix For: 0.9.0
>
>






[jira] [Created] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-03-06 Thread Mitar (JIRA)
Mitar created ARROW-2273:


 Summary: Cannot deserialize pandas SparseDataFrame
 Key: ARROW-2273
 URL: https://issues.apache.org/jira/browse/ARROW-2273
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Mitar


>>> import pyarrow
>>> import pandas
>>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "serialization.pxi", line 441, in pyarrow.lib.deserialize
  File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
  File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
  File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", line 77, in _deserialize_pandas_dataframe
    return pdcompat.serialized_dict_to_dataframe(data)
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in serialized_dict_to_dataframe
    for block in data['blocks']]
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in <listcomp>
    for block in data['blocks']]
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 478, in _reconstruct_block
    block = _int.make_block(block_arr, placement=placement)
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 2957, in make_block
    return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 120, in __init__
    len(self.mgr_locs)))
ValueError: Wrong number of items passed 3, placement implies 1






[jira] [Commented] (ARROW-2269) Cannot build bdist_wheel for Python

2018-03-06 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387565#comment-16387565
 ] 

Mitar commented on ARROW-2269:
--

I opened a pull request making it easier to debug this: 
https://github.com/apache/arrow/pull/1712

> Cannot build bdist_wheel for Python
> ---
>
> Key: ARROW-2269
> URL: https://issues.apache.org/jira/browse/ARROW-2269
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Affects Versions: 0.9.0
>Reporter: Mitar
>Priority: Major
>
> I am trying current master.
> I ran:
> 
> python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
> --with-plasma --bundle-arrow-cpp bdist_wheel
> 
> Output:
> 
> running build_ext
> creating build
> creating build/temp.linux-x86_64-3.6
> -- Runnning cmake for pyarrow
> cmake -DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python  
> -DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on 
> -DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON 
> -DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python
> -- The C compiler identification is GNU 7.2.0
> -- The CXX compiler identification is GNU 7.2.0
> -- Check for working C compiler: /usr/bin/cc
> -- Check for working C compiler: /usr/bin/cc -- works
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Check for working CXX compiler: /usr/bin/c++
> -- Check for working CXX compiler: /usr/bin/c++ -- works
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> INFOCompiler command: /usr/bin/c++
> INFOCompiler version: Using built-in specs.
> COLLECT_GCC=/usr/bin/c++
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
> OFFLOAD_TARGET_NAMES=nvptx-none
> OFFLOAD_TARGET_DEFAULT=1
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
> 7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs 
> --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr 
> --with-gcc-major-version-only --program-suffix=-7 
> --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id 
> --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
> --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
> --enable-libstdcxx-debug --enable-libstdcxx-time=yes 
> --with-default-libstdcxx-abi=new --enable-gnu-unique-object 
> --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie 
> --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto 
> --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 
> --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic 
> --enable-offload-targets=nvptx-none --without-cuda-driver 
> --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu 
> --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2) 
> INFOCompiler id: GNU
> Selected compiler gcc 7.2.0
> -- Performing Test CXX_SUPPORTS_SSE3
> -- Performing Test CXX_SUPPORTS_SSE3 - Success
> -- Performing Test CXX_SUPPORTS_ALTIVEC
> -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed
> Configured for RELEASE build (set with cmake 
> -DCMAKE_BUILD_TYPE={release,debug,...})
> -- Build Type: RELEASE
> -- Build output directory: 
> .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/
> -- Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version 
> "3.6.3") 
> -- Searching for Python libs in 
> .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
> -- Looking for python3.6m
> -- Found Python lib 
> /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
> -- Found PythonLibs: 
> /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
> -- Found NumPy: version "1.14.1" 
> .../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include
> -- Searching for Python libs in 
> .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
> -- Looking for python3.6m
> -- Found Python lib 
> /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
> -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") 
> -- Checking for module 'arrow'
> --   Found arrow, version 0.9.0-SNAPSHOT
> -- Arrow ABI version: 0.0.0
> -- Arrow SO version: 0
> -- Found the Arrow core library: .../Temp/arrow/dist/lib/libarrow.so
> -- Found the Arrow Python library: .../Temp/arrow/dist/lib/libarrow_python.so
> -- Boost version: 1.63.0
> -- Found the following Boost libraries:
> --   system
> --   filesystem
> --   regex
> Added shared 

[jira] [Updated] (ARROW-2269) Cannot build bdist_wheel for Python

2018-03-06 Thread Mitar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mitar updated ARROW-2269:
-
Description: 
I am trying current master.

I ran:


python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
--with-plasma --bundle-arrow-cpp bdist_wheel


Output:


running build_ext
creating build
creating build/temp.linux-x86_64-3.6
-- Runnning cmake for pyarrow
cmake -DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python  
-DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on 
-DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON 
-DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python
-- The C compiler identification is GNU 7.2.0
-- The CXX compiler identification is GNU 7.2.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
INFOCompiler command: /usr/bin/c++
INFOCompiler version: Using built-in specs.
COLLECT_GCC=/usr/bin/c++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs 
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr 
--with-gcc-major-version-only --program-suffix=-7 
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie 
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto 
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic 
--enable-offload-targets=nvptx-none --without-cuda-driver 
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu 
--target=x86_64-linux-gnu
Thread model: posix
gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2) 

INFOCompiler id: GNU
Selected compiler gcc 7.2.0
-- Performing Test CXX_SUPPORTS_SSE3
-- Performing Test CXX_SUPPORTS_SSE3 - Success
-- Performing Test CXX_SUPPORTS_ALTIVEC
-- Performing Test CXX_SUPPORTS_ALTIVEC - Failed
Configured for RELEASE build (set with cmake 
-DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
-- Build output directory: 
.../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/
-- Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version 
"3.6.3") 
-- Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
-- Looking for python3.6m
-- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
-- Found PythonLibs: 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
-- Found NumPy: version "1.14.1" 
.../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include
-- Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
-- Looking for python3.6m
-- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") 
-- Checking for module 'arrow'
--   Found arrow, version 0.9.0-SNAPSHOT
-- Arrow ABI version: 0.0.0
-- Arrow SO version: 0
-- Found the Arrow core library: .../Temp/arrow/dist/lib/libarrow.so
-- Found the Arrow Python library: .../Temp/arrow/dist/lib/libarrow_python.so
-- Boost version: 1.63.0
-- Found the following Boost libraries:
--   system
--   filesystem
--   regex
Added shared library dependency arrow: .../Temp/arrow/dist/lib/libarrow.so
Added shared library dependency arrow_python: 
.../Temp/arrow/dist/lib/libarrow_python.so
-- Found the Parquet library: .../Temp/arrow/dist/lib/libparquet.so
Added shared library dependency parquet: .../Temp/arrow/dist/lib/libparquet.so
-- Checking for module 'plasma'
--   Found plasma, version 
-- Plasma ABI version: 0.0.0
-- Plasma SO version: 0
-- Found the Plasma core library: .../Temp/arrow/dist/lib/libplasma.so
-- Found Plasma executable: .../Temp/arrow/dist/bin/plasma_store
Added shared library dependency libplasma: .../Temp/arrow/dist/lib/libplasma.so
-- Configuring done
-- Generating done
-- Build 

[jira] [Updated] (ARROW-2269) Cannot build bdist_wheel for Python

2018-03-06 Thread Mitar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mitar updated ARROW-2269:
-
Description: 
I am trying current master.

I ran:

{{python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
--with-plasma --bundle-arrow-cpp bdist_wheel}}

Output:

running build_ext creating build creating build/temp.linux-x86_64-3.6 -- 
Runnning cmake for pyarrow cmake 
-DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python 
-DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on 
-DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON 
-DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python -- The C compiler 
identification is GNU 7.2.0 -- The CXX compiler identification is GNU 7.2.0 -- 
Check for working C compiler: /usr/bin/cc -- Check for working C compiler: 
/usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler 
ABI info - done -- Detecting C compile features -- Detecting C compile features 
- done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX 
compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting 
CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX 
compile features - done INFOCompiler command: /usr/bin/c++ INFOCompiler 
version: Using built-in specs. COLLECT_GCC=/usr/bin/c++ 
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper 
OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: 
x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs 
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr 
--with-gcc-major-version-only --program-suffix=-7 
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie 
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto 
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic 
--enable-offload-targets=nvptx-none --without-cuda-driver 
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu 
--target=x86_64-linux-gnu Thread model: posix gcc version 7.2.0 (Ubuntu 
7.2.0-8ubuntu3.2) INFOCompiler id: GNU Selected compiler gcc 7.2.0 -- 
Performing Test CXX_SUPPORTS_SSE3 -- Performing Test CXX_SUPPORTS_SSE3 - 
Success -- Performing Test CXX_SUPPORTS_ALTIVEC -- Performing Test 
CXX_SUPPORTS_ALTIVEC - Failed Configured for RELEASE build (set with cmake 
-DCMAKE_BUILD_TYPE=\{release,debug,...}) -- Build Type: RELEASE -- Build output 
directory: .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/ -- 
Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version "3.6.3") 
-- Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
 -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found 
PythonLibs: /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- 
Found NumPy: version "1.14.1" 
.../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include -- 
Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
 -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found 
PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- Checking for module 
'arrow' -- Found arrow, version 0.9.0-SNAPSHOT -- Arrow ABI version: 0.0.0 -- 
Arrow SO version: 0 -- Found the Arrow core library: 
.../Temp/arrow/dist/lib/libarrow.so -- Found the Arrow Python library: 
.../Temp/arrow/dist/lib/libarrow_python.so -- Boost version: 1.63.0 -- Found 
the following Boost libraries: -- system -- filesystem -- regex Added shared 
library dependency arrow: .../Temp/arrow/dist/lib/libarrow.so Added shared 
library dependency arrow_python: .../Temp/arrow/dist/lib/libarrow_python.so -- 
Found the Parquet library: .../Temp/arrow/dist/lib/libparquet.so Added shared 
library dependency parquet: .../Temp/arrow/dist/lib/libparquet.so -- Checking 
for module 'plasma' -- Found plasma, version -- Plasma ABI version: 0.0.0 -- 
Plasma SO version: 0 -- Found the Plasma core library: 
.../Temp/arrow/dist/lib/libplasma.so -- Found Plasma executable: 
.../Temp/arrow/dist/bin/plasma_store Added shared library dependency libplasma: 
.../Temp/arrow/dist/lib/libplasma.so -- Configuring done -- Generating 

[jira] [Updated] (ARROW-2269) Cannot build bdist_wheel for Python

2018-03-06 Thread Mitar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mitar updated ARROW-2269:
-
Description: 
I am trying current master.

I ran:

{{python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
--with-plasma --bundle-arrow-cpp bdist_wheel}}

Output:

{{running build_ext creating build creating build/temp.linux-x86_64-3.6 -- 
Runnning cmake for pyarrow cmake 
-DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python 
-DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on 
-DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON 
-DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python -- The C compiler 
identification is GNU 7.2.0 -- The CXX compiler identification is GNU 7.2.0 -- 
Check for working C compiler: /usr/bin/cc -- Check for working C compiler: 
/usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler 
ABI info - done -- Detecting C compile features -- Detecting C compile features 
- done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX 
compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting 
CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX 
compile features - done INFOCompiler command: /usr/bin/c++ INFOCompiler 
version: Using built-in specs. COLLECT_GCC=/usr/bin/c++ 
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper 
OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: 
x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs 
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr 
--with-gcc-major-version-only --program-suffix=-7 
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie 
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto 
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic 
--enable-offload-targets=nvptx-none --without-cuda-driver 
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu 
--target=x86_64-linux-gnu Thread model: posix gcc version 7.2.0 (Ubuntu 
7.2.0-8ubuntu3.2) INFOCompiler id: GNU Selected compiler gcc 7.2.0 -- 
Performing Test CXX_SUPPORTS_SSE3 -- Performing Test CXX_SUPPORTS_SSE3 - 
Success -- Performing Test CXX_SUPPORTS_ALTIVEC -- Performing Test 
CXX_SUPPORTS_ALTIVEC - Failed Configured for RELEASE build (set with cmake 
-DCMAKE_BUILD_TYPE=\{release,debug,...}) -- Build Type: RELEASE -- Build output 
directory: .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/ -- 
Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version "3.6.3") 
-- Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
 -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found 
PythonLibs: /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- 
Found NumPy: version "1.14.1" 
.../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include -- 
Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
 -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found 
PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- Checking for module 
'arrow' -- Found arrow, version 0.9.0-SNAPSHOT -- Arrow ABI version: 0.0.0 -- 
Arrow SO version: 0 -- Found the Arrow core library: 
.../Temp/arrow/dist/lib/libarrow.so -- Found the Arrow Python library: 
.../Temp/arrow/dist/lib/libarrow_python.so -- Boost version: 1.63.0 -- Found 
the following Boost libraries: -- system -- filesystem -- regex Added shared 
library dependency arrow: .../Temp/arrow/dist/lib/libarrow.so Added shared 
library dependency arrow_python: .../Temp/arrow/dist/lib/libarrow_python.so -- 
Found the Parquet library: .../Temp/arrow/dist/lib/libparquet.so Added shared 
library dependency parquet: .../Temp/arrow/dist/lib/libparquet.so -- Checking 
for module 'plasma' -- Found plasma, version -- Plasma ABI version: 0.0.0 -- 
Plasma SO version: 0 -- Found the Plasma core library: 
.../Temp/arrow/dist/lib/libplasma.so -- Found Plasma executable: 
.../Temp/arrow/dist/bin/plasma_store Added shared library dependency libplasma: 
.../Temp/arrow/dist/lib/libplasma.so -- Configuring done -- Generating 

[jira] [Created] (ARROW-2269) Cannot build bdist_wheel for Python

2018-03-06 Thread Mitar (JIRA)
Mitar created ARROW-2269:


 Summary: Cannot build bdist_wheel for Python
 Key: ARROW-2269
 URL: https://issues.apache.org/jira/browse/ARROW-2269
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Affects Versions: 0.9.0
Reporter: Mitar


I am trying current master.

I ran:

{{python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
--with-plasma --bundle-arrow-cpp bdist_wheel }}

Output:

{{running build_ext creating build creating build/temp.linux-x86_64-3.6 -- 
Runnning cmake for pyarrow cmake 
-DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python 
-DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on 
-DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON 
-DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python -- The C compiler 
identification is GNU 7.2.0 -- The CXX compiler identification is GNU 7.2.0 -- 
Check for working C compiler: /usr/bin/cc -- Check for working C compiler: 
/usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler 
ABI info - done -- Detecting C compile features -- Detecting C compile features 
- done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX 
compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting 
CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX 
compile features - done INFOCompiler command: /usr/bin/c++ INFOCompiler 
version: Using built-in specs. COLLECT_GCC=/usr/bin/c++ 
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper 
OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: 
x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs 
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr 
--with-gcc-major-version-only --program-suffix=-7 
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie 
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto 
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic 
--enable-offload-targets=nvptx-none --without-cuda-driver 
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu 
--target=x86_64-linux-gnu Thread model: posix gcc version 7.2.0 (Ubuntu 
7.2.0-8ubuntu3.2) INFOCompiler id: GNU Selected compiler gcc 7.2.0 -- 
Performing Test CXX_SUPPORTS_SSE3 -- Performing Test CXX_SUPPORTS_SSE3 - 
Success -- Performing Test CXX_SUPPORTS_ALTIVEC -- Performing Test 
CXX_SUPPORTS_ALTIVEC - Failed Configured for RELEASE build (set with cmake 
-DCMAKE_BUILD_TYPE=\{release,debug,...}) -- Build Type: RELEASE -- Build output 
directory: .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/ -- 
Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version "3.6.3") 
-- Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
 -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found 
PythonLibs: /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- 
Found NumPy: version "1.14.1" 
.../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include -- 
Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
 -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found 
PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- Checking for module 
'arrow' -- Found arrow, version 0.9.0-SNAPSHOT -- Arrow ABI version: 0.0.0 -- 
Arrow SO version: 0 -- Found the Arrow core library: 
.../Temp/arrow/dist/lib/libarrow.so -- Found the Arrow Python library: 
.../Temp/arrow/dist/lib/libarrow_python.so -- Boost version: 1.63.0 -- Found 
the following Boost libraries: -- system -- filesystem -- regex Added shared 
library dependency arrow: .../Temp/arrow/dist/lib/libarrow.so Added shared 
library dependency arrow_python: .../Temp/arrow/dist/lib/libarrow_python.so -- 
Found the Parquet library: .../Temp/arrow/dist/lib/libparquet.so Added shared 
library dependency parquet: .../Temp/arrow/dist/lib/libparquet.so -- Checking 
for module 'plasma' -- Found plasma, version -- Plasma ABI version: 0.0.0 -- 
Plasma SO version: 0 -- Found the Plasma core library: 
.../Temp/arrow/dist/lib/libplasma.so -- Found 

[jira] [Commented] (ARROW-2250) plasma_store process should cleanup on INT and TERM signals

2018-03-05 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386724#comment-16386724
 ] 

Mitar commented on ARROW-2250:
--

I observed that the parent process is unnecessary and made a pull request 
which simply replaces it with the plasma store executable. This way, all 
future signals are handled directly by that executable.

This makes everything cleaner and means that no signal passing or cleanup is 
needed.
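
A minimal sketch of the idea, not the code from the pull request: instead of 
spawning the store as a child, the Python launcher replaces its own process 
image with the executable via os.execvp, so INT and TERM are delivered straight 
to the store. The function names are hypothetical, and the -s (socket) and -m 
(memory) flags are assumed here rather than taken from this thread.

```python
import os


def plasma_store_command(executable, socket_path, memory_bytes):
    """Build the argv list for the store executable.

    The "-s" and "-m" flags are assumptions made for this sketch.
    """
    return [executable, "-s", socket_path, "-m", str(memory_bytes)]


def exec_plasma_store(executable, socket_path, memory_bytes):
    """Replace the current Python process with the store executable.

    os.execvp does not return on success: the process keeps its PID but
    runs the new program, so there is no parent left to forward signals
    or to clean up a child.
    """
    command = plasma_store_command(executable, socket_path, memory_bytes)
    os.execvp(command[0], command)
```

With this shape, a call such as exec_plasma_store("plasma_store", 
"/tmp/plasma", 10**9) would be the last thing the launcher does.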

> plasma_store process should cleanup on INT and TERM signals
> ---
>
> Key: ARROW-2250
> URL: https://issues.apache.org/jira/browse/ARROW-2250
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Mitar
>Priority: Major
>  Labels: pull-request-available
>
> Currently, if you send an INT or TERM signal to the parent plasma store 
> process (the Python one), it terminates without cleaning up the child 
> process. This makes it hard to run the plasma store in non-interactive mode. 
> Inside a shell, ctrl-c kills both processes.
> Moreover, INT prints out an ugly KeyboardInterrupt exception. Probably 
> something nicer should be done.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2264) Efficiently serialize numpy arrays with dtype of unicode fixed length string

2018-03-05 Thread Mitar (JIRA)
Mitar created ARROW-2264:


 Summary: Efficiently serialize numpy arrays with dtype of unicode 
fixed length string
 Key: ARROW-2264
 URL: https://issues.apache.org/jira/browse/ARROW-2264
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Mitar


Looking at the numpy array serialization code it seems that if I have a dtype 
like ">> np.array(['aaa', 'bbb'])}}
{{array(['aaa', 'bbb'], dtype='
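
For illustration only (this is not Arrow's serialization code): NumPy stores 
such an array as one contiguous fixed-width buffer, so it can be viewed as raw 
bytes and rebuilt without touching individual strings. The helper names to_raw 
and from_raw are hypothetical.

```python
import numpy as np


def to_raw(arr):
    """Return (uint8 view, dtype string, shape) for a fixed-length unicode array."""
    if arr.dtype.kind != "U":
        raise TypeError("expected a fixed-length unicode array")
    contiguous = np.ascontiguousarray(arr)
    # Viewing as uint8 reinterprets the same memory; no per-element work.
    return contiguous.view(np.uint8), contiguous.dtype.str, contiguous.shape


def from_raw(raw, dtype_str, shape):
    """Rebuild the unicode array from the raw view, dtype string and shape."""
    return raw.view(np.dtype(dtype_str)).reshape(shape)
```

The dtype string and shape are the only metadata needed alongside the buffer, 
which is what would make an efficient path plausible.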

[jira] [Updated] (ARROW-2250) plasma_store process should cleanup on INT and TERM signals as well

2018-03-03 Thread Mitar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mitar updated ARROW-2250:
-
Description: 
Currently, if you send an INT or TERM signal to the parent plasma store process 
(the Python one), it terminates without cleaning up the child process. This 
makes it hard to run the plasma store in non-interactive mode. Inside a shell, 
ctrl-c kills both processes.

Moreover, INT prints out an ugly KeyboardInterrupt exception. Probably something 
nicer should be done.

  was:Currently it cleans up on INT signal. But if it gets the TERM signal, 
then it kills the parent process (Python one) but not the binary process. I 
think both TERM and INT signals should be handled the same.

Summary: plasma_store process should cleanup on INT and TERM signals as 
well  (was: plasma_store process should cleanup on TERM signal as well)

> plasma_store process should cleanup on INT and TERM signals as well
> ---
>
> Key: ARROW-2250
> URL: https://issues.apache.org/jira/browse/ARROW-2250
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Mitar
>Priority: Major
>
> Currently, if you send an INT or TERM signal to the parent plasma store 
> process (the Python one), it terminates without cleaning up the child 
> process. This makes it hard to run the plasma store in non-interactive mode. 
> Inside a shell, ctrl-c kills both processes.
> Moreover, INT prints out an ugly KeyboardInterrupt exception. Probably 
> something nicer should be done.





[jira] [Updated] (ARROW-2250) plasma_store process should cleanup on INT and TERM signals

2018-03-03 Thread Mitar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mitar updated ARROW-2250:
-
Summary: plasma_store process should cleanup on INT and TERM signals  (was: 
plasma_store process should cleanup on INT and TERM signals as well)

> plasma_store process should cleanup on INT and TERM signals
> ---
>
> Key: ARROW-2250
> URL: https://issues.apache.org/jira/browse/ARROW-2250
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Mitar
>Priority: Major
>
> Currently, if you send an INT or TERM signal to the parent plasma store 
> process (the Python one), it terminates without cleaning up the child 
> process. This makes it hard to run the plasma store in non-interactive mode. 
> Inside a shell, ctrl-c kills both processes.
> Moreover, INT prints out an ugly KeyboardInterrupt exception. Probably 
> something nicer should be done.





[jira] [Created] (ARROW-2250) plasma_store process should cleanup on TERM signal as well

2018-03-03 Thread Mitar (JIRA)
Mitar created ARROW-2250:


 Summary: plasma_store process should cleanup on TERM signal as well
 Key: ARROW-2250
 URL: https://issues.apache.org/jira/browse/ARROW-2250
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Mitar


Currently it cleans up on the INT signal. But if it gets the TERM signal, it 
kills the parent process (the Python one) but not the binary process. I think 
both TERM and INT signals should be handled the same way.
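
One way to get identical handling, sketched under assumptions rather than 
taken from the plasma code: the Python parent installs the same handler for 
both signals and relays them to the child, then waits for it to exit. The 
function name run_with_signal_forwarding is hypothetical, and the handlers 
must be installed from the main thread.

```python
import signal
import subprocess


def run_with_signal_forwarding(argv):
    """Run argv as a child process, relay INT and TERM to it, return its exit code."""
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        # Relay the signal instead of dying and leaving the child behind.
        if child.poll() is None:
            child.send_signal(signum)

    forwarded = (signal.SIGINT, signal.SIGTERM)
    previous = {s: signal.signal(s, forward) for s in forwarded}
    try:
        # wait() resumes after the handler runs, so the parent exits
        # only once the child has actually terminated.
        return child.wait()
    finally:
        for s, handler in previous.items():
            # A handler not installed from Python is reported as None.
            signal.signal(s, signal.SIG_DFL if handler is None else handler)
```

Because the handler never raises, INT no longer surfaces as a 
KeyboardInterrupt traceback in the parent.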





[jira] [Commented] (ARROW-1919) Plasma hanging if object id is not 20 bytes

2018-03-03 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385004#comment-16385004
 ] 

Mitar commented on ARROW-1919:
--

Is there a plan for when this will be released in a new version?

> Plasma hanging if object id is not 20 bytes
> ---
>
> Key: ARROW-1919
> URL: https://issues.apache.org/jira/browse/ARROW-1919
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> This happens when plasma's capability to put an object with a user-defined 
> object id is used and the object id is not 20 bytes long. Plasma will hang 
> upon get in that case; we should give an error instead.
> See https://github.com/ray-project/ray/issues/1315





[jira] [Commented] (ARROW-1919) Plasma hanging if object id is not 20 bytes

2017-12-12 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288743#comment-16288743
 ] 

Mitar commented on ARROW-1919:
--

The error should probably be in put, no?
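
A toy sketch of what failing early in put could look like. This is not the 
plasma client API: ToyStore and check_object_id are hypothetical, and only the 
20-byte length comes from the issue.

```python
OBJECT_ID_LENGTH = 20  # length required by the issue description


def check_object_id(raw_id):
    """Return raw_id as bytes, or raise if it is not exactly 20 bytes long."""
    if not isinstance(raw_id, (bytes, bytearray)):
        raise TypeError("object id must be bytes")
    if len(raw_id) != OBJECT_ID_LENGTH:
        raise ValueError(
            "object id must be %d bytes, got %d" % (OBJECT_ID_LENGTH, len(raw_id))
        )
    return bytes(raw_id)


class ToyStore:
    """In-memory stand-in for a store, used only to show where the check sits."""

    def __init__(self):
        self._objects = {}

    def put(self, value, raw_id):
        # Reject a malformed id here, so a later get cannot hang on it.
        self._objects[check_object_id(raw_id)] = value

    def get(self, raw_id):
        return self._objects[check_object_id(raw_id)]
```

Validating in put means the caller sees the error at the point where the bad 
id is introduced.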

> Plasma hanging if object id is not 20 bytes
> ---
>
> Key: ARROW-1919
> URL: https://issues.apache.org/jira/browse/ARROW-1919
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Minor
>
> This happens when plasma's capability to put an object with a user-defined 
> object id is used and the object id is not 20 bytes long. Plasma will hang 
> upon get in that case; we should give an error instead.
> See https://github.com/ray-project/ray/issues/1315



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1664) Support for xarray.DataArray and xarray.Dataset

2017-10-10 Thread Mitar (JIRA)
Mitar created ARROW-1664:


 Summary: Support for xarray.DataArray and xarray.Dataset
 Key: ARROW-1664
 URL: https://issues.apache.org/jira/browse/ARROW-1664
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Mitar


DataArray and Dataset are efficient in-memory representations for 
multidimensional data. It would be great if one could share them between 
processes using Arrow.

http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html#xarray.DataArray
http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html#xarray.Dataset



