[jira] [Commented] (ARROW-1664) [Python] Support for xarray.DataArray and xarray.Dataset
[ https://issues.apache.org/jira/browse/ARROW-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932797#comment-16932797 ]

Mitar commented on ARROW-1664:
--

It is like an extension of DataFrame to multiple dimensions.

{quote}Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw [NumPy|http://www.numpy.org/]-like arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. The package includes a large and growing library of domain-agnostic functions for advanced analytics and visualization with these data structures. Xarray was inspired by and borrows heavily from [pandas|http://pandas.pydata.org/], the popular data analysis package focused on labelled tabular data.{quote}

So internally it consists of ndarrays. This is why I think serialization could be possible, similar to how pandas DataFrames internally use ndarrays.

> [Python] Support for xarray.DataArray and xarray.Dataset
>
> Key: ARROW-1664
> URL: https://issues.apache.org/jira/browse/ARROW-1664
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Mitar
> Priority: Minor
>
> DataArray and Dataset are efficient in-memory representations for multi
> dimensional data. It would be great if one could share them between processes
> using Arrow.
> http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html#xarray.DataArray
> http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html#xarray.Dataset

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-1664) [Python] Support for xarray.DataArray and xarray.Dataset
[ https://issues.apache.org/jira/browse/ARROW-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932788#comment-16932788 ]

Mitar commented on ARROW-1664:
--

I see. So why not also have `pa.Table.from_xarray`, then?
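To make the `pa.Table.from_xarray` idea concrete: an Arrow table needs equal-length 1-D columns, so an N-D labelled array would have to be flattened into a "long" layout, with one column per dimension plus one for the values. The sketch below does this with plain NumPy. `labelled_array_to_columns` and its parameters are hypothetical (neither Arrow nor xarray has such a function). Note also that the coordinate columns are materialized copies, so this path is not zero-copy.

```python
import numpy as np


def labelled_array_to_columns(values, dims, coords, name="value"):
    """Flatten an N-D labelled array into equal-length 1-D columns.

    Hypothetical helper, not an Arrow or xarray API.
    values: N-D array-like; dims: dimension names in axis order;
    coords: mapping from dimension name to a 1-D coordinate sequence.
    Returns a dict with one column per dimension plus one named `name`.
    """
    values = np.asarray(values)
    if len(dims) != values.ndim:
        raise ValueError("need exactly one dimension name per axis")
    axes = [np.asarray(coords[dim]) for dim in dims]
    for dim, axis, size in zip(dims, axes, values.shape):
        if axis.shape != (size,):
            raise ValueError(f"coordinate {dim!r} must have length {size}")
    # indexing="ij" gives every grid the same shape as `values`, so raveling
    # all of them in C order keeps each row's labels aligned with its value.
    grids = np.meshgrid(*axes, indexing="ij")
    columns = {dim: grid.ravel() for dim, grid in zip(dims, grids)}
    columns[name] = values.ravel()
    return columns


temperature = np.array([[11.0, 12.5, 13.0], [9.5, 10.0, 10.5]])
cols = labelled_array_to_columns(
    temperature,
    dims=("station", "day"),
    coords={"station": ["A", "B"], "day": [1, 2, 3]},
    name="temperature",
)
```

With a recent pyarrow, `pyarrow.table(cols)` should turn such a dict into a table. xarray itself already produces this long layout as a pandas DataFrame via `DataArray.to_dataframe(name=...)`, which `pyarrow.Table.from_pandas` can convert, so that route may be simpler in practice.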
[jira] [Commented] (ARROW-2051) [Python] Support serializing UUID objects to tables
[ https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932786#comment-16932786 ]

Mitar commented on ARROW-2051:
--

Sounds good. I will then explore how to do that through extension types.

> [Python] Support serializing UUID objects to tables
>
> Key: ARROW-2051
> URL: https://issues.apache.org/jira/browse/ARROW-2051
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.8.0
> Reporter: Omer Katz
> Priority: Major
>
> UUID objects can be easily supported and can be represented as 128-bit
> integers or a stream of bytes.
> The fastest way I know to construct a UUID object is by using its 128-bit
> (16 bytes) integer representation.
>
> {code:python}
> %timeit uuid.UUID(int=24197857161011715162171839636988778104)
> 611 ns ± 6.27 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> %timeit uuid.UUID(bytes=b'\x124Vx\x124Vx\x124Vx\x124Vx')
> 1.17 µs ± 7.5 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> %timeit uuid.UUID('12345678-1234-5678-1234-567812345678')
> 1.47 µs ± 6.08 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> {code}
>
> Right now I have to do this manually, which is pretty tedious.
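The description notes that a UUID can be stored as 16 bytes. The following is a minimal, standard-library-only sketch of that representation. It packs UUIDs into one contiguous buffer of fixed 16-byte slots (the layout a fixed-size binary column uses) and unpacks them again. The helper names are illustrative and are not part of pyarrow.

```python
import uuid

UUID_WIDTH = 16  # bytes per UUID (128 bits)


def pack_uuids(uuids):
    """Concatenate UUIDs into one buffer of 16-byte slots (big-endian byte order)."""
    return b"".join(u.bytes for u in uuids)


def unpack_uuids(buffer):
    """Inverse of pack_uuids: slice the buffer back into UUID objects."""
    if len(buffer) % UUID_WIDTH:
        raise ValueError("buffer length must be a multiple of 16")
    view = memoryview(buffer)
    return [
        uuid.UUID(bytes=bytes(view[i:i + UUID_WIDTH]))
        for i in range(0, len(buffer), UUID_WIDTH)
    ]


ids = [uuid.UUID("12345678-1234-5678-1234-567812345678"), uuid.UUID(int=0)]
buf = pack_uuids(ids)
```

In pyarrow, the matching storage type is a fixed-size binary type. For example, `pa.array([u.bytes for u in ids], type=pa.binary(16))` creates such an array. An extension type, as mentioned in the comment, would wrap a storage type like this and attach UUID semantics.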
[jira] [Commented] (ARROW-1664) [Python] Support for xarray.DataArray and xarray.Dataset
[ https://issues.apache.org/jira/browse/ARROW-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932670#comment-16932670 ]

Mitar commented on ARROW-1664:
--

Nice. So is Arrow support for pandas DataFrame provided only through [https://github.com/pandas-dev/pandas/blob/34fff1f336d3b083dd09f5036c2bb9b80edfb619/pandas/core/arrays/integer.py#L370]? Is there no special handling of pandas DataFrame in Arrow itself?
[jira] [Commented] (ARROW-2051) [Python] Support serializing UUID objects to tables
[ https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932618#comment-16932618 ]

Mitar commented on ARROW-2051:
--

I mean, you have 128-bit numbers in Arrow, right? So why not support converting UUIDs to them?
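There is one caveat when mapping UUIDs to Arrow's 128-bit numbers. Arrow's 128-bit numeric type is decimal128, whose precision tops out at 38 decimal digits. The largest UUID integer, 2**128 - 1, has 39 digits, so not every UUID fits. The following sketch shows one alternative that always fits: split the 128-bit integer into two unsigned 64-bit halves, for example to be stored as two uint64 columns. The helper names are illustrative only.

```python
import uuid

MASK64 = (1 << 64) - 1


def uuid_to_halves(u):
    """Split a UUID's 128-bit integer into (high, low) unsigned 64-bit parts."""
    n = u.int
    return n >> 64, n & MASK64


def halves_to_uuid(high, low):
    """Rebuild the UUID from its two 64-bit halves."""
    if not (0 <= high <= MASK64 and 0 <= low <= MASK64):
        raise ValueError("each half must fit in an unsigned 64-bit integer")
    return uuid.UUID(int=(high << 64) | low)


# decimal128 allows at most 38 digits; the largest UUID integer needs 39.
MAX_UUID_INT = (1 << 128) - 1
print(len(str(MAX_UUID_INT)))  # 39
```

Two fixed-width integer columns keep the values sortable and comparable. The 16-byte binary layout keeps each UUID in a single column.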
[jira] [Commented] (ARROW-1664) [Python] Support for xarray.DataArray and xarray.Dataset
[ https://issues.apache.org/jira/browse/ARROW-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932606#comment-16932606 ]

Mitar commented on ARROW-1664:
--

As [~wesmckinn] wrote, the idea is to get zero-copy reads: serializing might be slow, but deserializing would be fast. I think pandas DataFrame also does not use "arrow under the hood", yet Arrow supports it. So why not also work on supporting xarray? It may not be a priority now, but in my view it should, or at least could, be done. So I would ask to reopen this issue.
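The zero-copy argument can be illustrated without Arrow. Suppose an ndarray's raw bytes are written after a small header that holds the dtype and shape. A reader can then rebuild the array as a view over the received buffer, so serializing copies the data but deserializing does not. This hand-rolled format only illustrates the idea; it is not Arrow's IPC or tensor format, and `pack_array`/`unpack_array` are hypothetical names.

```python
import json

import numpy as np

ALIGNMENT = 8  # pad the header so the raw data starts on an 8-byte boundary


def pack_array(arr):
    """Serialize as: 8-byte header length, JSON header (dtype, shape), raw bytes.

    Copies the data once (tobytes). Only simple numeric/bool dtypes are handled.
    """
    arr = np.ascontiguousarray(arr)
    if arr.dtype.hasobject:
        raise TypeError("object arrays hold pointers, not raw values")
    header = json.dumps({"dtype": arr.dtype.str, "shape": arr.shape}).encode()
    header += b" " * (-len(header) % ALIGNMENT)  # JSON ignores trailing spaces
    return len(header).to_bytes(8, "little") + header + arr.tobytes()


def unpack_array(buffer):
    """Deserialize without copying: the result is a view over `buffer`."""
    view = memoryview(buffer)
    header_len = int.from_bytes(view[:8], "little")
    meta = json.loads(bytes(view[8:8 + header_len]))
    data = np.frombuffer(view, dtype=np.dtype(meta["dtype"]), offset=8 + header_len)
    return data.reshape(meta["shape"])


original = np.arange(12, dtype=np.float64).reshape(3, 4)
payload = pack_array(original)    # data copied once, into the payload
restored = unpack_array(payload)  # no copy: a read-only view into payload
```

Because `payload` is an immutable `bytes` object, the restored array is read-only. If the buffer were shared memory, readers in other processes could map it the same way without copying.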
[jira] [Commented] (ARROW-3399) [Python] Cannot serialize numpy matrix object
[ https://issues.apache.org/jira/browse/ARROW-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806006#comment-16806006 ]

Mitar commented on ARROW-3399:
--

This is still happening in 0.12.1. I think this should be fixed: even though the matrix class is deprecated, it will be quite some time before nobody uses it anymore.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3399) Cannot serialize numpy matrix object
[ https://issues.apache.org/jira/browse/ARROW-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635077#comment-16635077 ]

Mitar commented on ARROW-3399:
--

Oh, the difference is not between Arrow 0.9.0 and 0.10.0 but between numpy 1.14.3 and 1.15.2. Upgrading numpy to the latest version triggers the error reported in the description, while the code works with the older numpy version.
[jira] [Created] (ARROW-3399) Cannot serialize numpy matrix object
Mitar created ARROW-3399:

Summary: Cannot serialize numpy matrix object
Key: ARROW-3399
URL: https://issues.apache.org/jira/browse/ARROW-3399
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mitar

This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on Linux.

{code:python}
from pyarrow import plasma
import numpy
import time
import subprocess
import os
import signal

m = numpy.matrix(numpy.array([[1, 2], [3, 4]]))

# Start a Plasma store: -m memory limit (bytes), -s socket path,
# -d directory backing the shared memory.
process = subprocess.Popen(
    ['plasma_store', '-m', '100', '-s', '/tmp/plasma', '-d', '/dev/shm'],
    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    encoding='utf8', preexec_fn=os.setpgrp)

# Give the store time to start listening on the socket.
time.sleep(5)

client = plasma.connect('/tmp/plasma', '', 0)

try:
    # Serializing the numpy.matrix fails here.
    client.put(m)
finally:
    client.disconnect()
    # Stop the store, which runs in its own process group.
    os.killpg(os.getpgid(process.pid), signal.SIGTERM)
{code}

Error:

{noformat}
  File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum recursion depth. It may contain itself recursively.{noformat}
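Until this is fixed, one possible workaround is to store the matrix as a plain ndarray and restore the subclass after reading it back. This is a sketch, not an official fix. It works because `numpy.asarray` on a `numpy.matrix` returns a base-class ndarray view of the same data, and `numpy.asmatrix` wraps it again. The helpers below are hypothetical and only record whether the value was a matrix; the Plasma calls themselves are left out.

```python
import warnings

import numpy as np


def to_storable(value):
    """Return (plain_value, was_matrix); an np.matrix becomes a base ndarray view."""
    if isinstance(value, np.matrix):
        return np.asarray(value), True
    return value, False


def from_storable(value, was_matrix):
    """Re-wrap a stored ndarray as np.matrix if it originally was one."""
    return np.asmatrix(value) if was_matrix else value


with warnings.catch_warnings():
    # np.matrix is deprecated and emits PendingDeprecationWarning; silence it here.
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    m = np.matrix(np.array([[1, 2], [3, 4]]))
    plain, was_matrix = to_storable(m)
    restored = from_storable(plain, was_matrix)
```

In the reproduction from the description, one would `put` the `plain` array instead of `m`. The `was_matrix` flag would be kept alongside the object ID so the type can be restored after `get`.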
[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463610#comment-16463610 ]

Mitar commented on ARROW-2273:
--

Isn't it still open for debate whether it will be deprecated?
[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX
[ https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428518#comment-16428518 ]

Mitar commented on ARROW-2355:
--

Thanks for the explanation. I understand the issues here. And thank you for all the work on resolving them.

> [Python] Unable to import pyarrow [0.9.0] OSX
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Bradford W Littooy
> Assignee: Uwe L. Korn
> Priority: Major
> Fix For: 0.9.1
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to
> import pyarrow into a python3.6 interpreter, I get the following import error:
>
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue.
[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX
[ https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428433#comment-16428433 ]

Mitar commented on ARROW-2355:
--

How do I then declare a dependency on PyArrow so that it installs 0.9.0 on Linux and Windows, but 0.9.0.post1 on Mac OS X? I currently pin exact (==) versions in my requirements.txt.
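pip supports PEP 508 environment markers in requirements files, so platform-specific pins can sit side by side. For example, `pyarrow==0.9.0; sys_platform != "darwin"` and `pyarrow==0.9.0.post1; sys_platform == "darwin"` can both appear in the same file. This only helps once the `.post1` release is actually available on PyPI. The Python sketch below mimics how pip picks the applicable line for a given platform. `active_requirements` is a hypothetical, deliberately simplified evaluator that understands only `sys_platform ==` and `!=` markers, whereas pip implements the full PEP 508 grammar.

```python
import sys

# requirements.txt lines with PEP 508 environment markers (the part after ";").
REQUIREMENTS = [
    'pyarrow==0.9.0; sys_platform != "darwin"',
    'pyarrow==0.9.0.post1; sys_platform == "darwin"',
]


def active_requirements(lines, platform=sys.platform):
    """Return the requirement specs whose marker holds on `platform`.

    Hypothetical, simplified stand-in for pip's marker evaluation: only
    markers of the form `sys_platform == "x"` or `sys_platform != "x"`
    are understood; anything else raises ValueError.
    """
    selected = []
    for line in lines:
        spec, _, marker = line.partition(";")
        spec, marker = spec.strip(), marker.strip()
        if not marker:
            selected.append(spec)  # no marker: the line applies everywhere
            continue
        variable, op, value = marker.split(maxsplit=2)
        if variable != "sys_platform" or op not in ("==", "!="):
            raise ValueError(f"unsupported marker: {marker!r}")
        matches = platform == value.strip("\"'")
        if matches == (op == "=="):
            selected.append(spec)
    return selected


print(active_requirements(REQUIREMENTS, platform="linux"))   # ['pyarrow==0.9.0']
print(active_requirements(REQUIREMENTS, platform="darwin"))  # ['pyarrow==0.9.0.post1']
```

On Windows, `sys.platform` is `"win32"`, so the first line applies there as well.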
[jira] [Commented] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX
[ https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428427#comment-16428427 ]

Mitar commented on ARROW-2355:
--

pip install pyarrow==0.9.0.post1
Collecting pyarrow==0.9.0.post1
  Could not find a version that satisfies the requirement pyarrow==0.9.0.post1 (from versions: 0.2.0, 0.3.0, 0.4.0, 0.4.1, 0.5.0.post2, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 0.9.0)
No matching distribution found for pyarrow==0.9.0.post1

I do not see it here either: https://pypi.python.org/pypi/pyarrow
[jira] [Commented] (ARROW-2279) [Python] Better error message if lib cannot be found
[ https://issues.apache.org/jira/browse/ARROW-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388389#comment-16388389 ]

Mitar commented on ARROW-2279:
--

Related: https://issues.apache.org/jira/projects/ARROW/issues/ARROW-2269

While debugging and trying to understand what is going on in that issue, I found that a better error message would have helped from the start. I do not expect anyone to hit this once we resolve ARROW-2269, but I think it is still useful.

> [Python] Better error message if lib cannot be found
>
> Key: ARROW-2279
> URL: https://issues.apache.org/jira/browse/ARROW-2279
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Uwe L. Korn
> Assignee: Mitar
> Priority: Major
> Fix For: 0.9.0
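The following is a generic illustration of the kind of error message this issue asks for. It is not the change actually made in pyarrow's build scripts. The idea is that a failed lookup should name the library it was looking for and every location it tried. `locate_shared_library` is a hypothetical helper.

```python
import os


def locate_shared_library(filename, search_dirs):
    """Return the first existing path for `filename`, or fail with a descriptive error."""
    candidates = [os.path.join(directory, filename) for directory in search_dirs]
    for path in candidates:
        if os.path.isfile(path):
            return path
    tried = "\n  ".join(candidates) or "(no search directories given)"
    raise FileNotFoundError(
        f"Could not find shared library {filename!r}. Looked in:\n  {tried}"
    )
```

Listing every candidate path makes a misconfigured search directory visible immediately, instead of surfacing later as an unrelated failure.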
[jira] [Created] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
Mitar created ARROW-2273:

Summary: Cannot deserialize pandas SparseDataFrame
Key: ARROW-2273
URL: https://issues.apache.org/jira/browse/ARROW-2273
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.9.0
Reporter: Mitar

>>> import pyarrow
>>> import pandas
>>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "serialization.pxi", line 441, in pyarrow.lib.deserialize
  File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
  File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
  File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", line 77, in _deserialize_pandas_dataframe
    return pdcompat.serialized_dict_to_dataframe(data)
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in serialized_dict_to_dataframe
    for block in data['blocks']]
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in <listcomp>
    for block in data['blocks']]
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 478, in _reconstruct_block
    block = _int.make_block(block_arr, placement=placement)
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 2957, in make_block
    return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 120, in __init__
    len(self.mgr_locs)))
ValueError: Wrong number of items passed 3, placement implies 1
[jira] [Commented] (ARROW-2269) Cannot build bdist_wheel for Python
[ https://issues.apache.org/jira/browse/ARROW-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387565#comment-16387565 ]

Mitar commented on ARROW-2269:
--

I opened a pull request that makes this easier to debug: https://github.com/apache/arrow/pull/1712
[jira] [Updated] (ARROW-2269) Cannot build bdist_wheel for Python
[ https://issues.apache.org/jira/browse/ARROW-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mitar updated ARROW-2269:
-
Description:
I am trying current master.

I ran:

python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-plasma --bundle-arrow-cpp bdist_wheel

Output:

running build_ext
creating build
creating build/temp.linux-x86_64-3.6
-- Runnning cmake for pyarrow
cmake -DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python -DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON -DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python
-- The C compiler identification is GNU 7.2.0
-- The CXX compiler identification is GNU 7.2.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
INFOCompiler command: /usr/bin/c++
INFOCompiler version: Using built-in specs.
COLLECT_GCC=/usr/bin/c++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2)
INFOCompiler id: GNU
Selected compiler gcc 7.2.0
-- Performing Test CXX_SUPPORTS_SSE3
-- Performing Test CXX_SUPPORTS_SSE3 - Success
-- Performing Test CXX_SUPPORTS_ALTIVEC
-- Performing Test CXX_SUPPORTS_ALTIVEC - Failed
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
-- Build output directory: .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/
-- Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version "3.6.3")
-- Searching for Python libs in .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
-- Looking for python3.6m
-- Found Python lib /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
-- Found PythonLibs: /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
-- Found NumPy: version "1.14.1" .../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include
-- Searching for Python libs in .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
-- Looking for python3.6m
-- Found Python lib /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1")
-- Checking for module 'arrow'
-- Found arrow, version 0.9.0-SNAPSHOT
-- Arrow ABI version: 0.0.0
-- Arrow SO version: 0
-- Found the Arrow core library: .../Temp/arrow/dist/lib/libarrow.so
-- Found the Arrow Python library: .../Temp/arrow/dist/lib/libarrow_python.so
-- Boost version: 1.63.0
-- Found the following Boost libraries:
-- system
-- filesystem
-- regex
Added shared library dependency arrow: .../Temp/arrow/dist/lib/libarrow.so
Added shared library dependency arrow_python: .../Temp/arrow/dist/lib/libarrow_python.so
-- Found the Parquet library: .../Temp/arrow/dist/lib/libparquet.so
Added shared library dependency parquet: .../Temp/arrow/dist/lib/libparquet.so
-- Checking for module 'plasma'
-- Found plasma, version
-- Plasma ABI version: 0.0.0
-- Plasma SO version: 0
-- Found the Plasma core library: .../Temp/arrow/dist/lib/libplasma.so
-- Found Plasma executable: .../Temp/arrow/dist/bin/plasma_store
Added shared library dependency libplasma: .../Temp/arrow/dist/lib/libplasma.so
-- Configuring done
-- Generating done
-- Build
[jira] [Updated] (ARROW-2269) Cannot build bdist_wheel for Python
[ https://issues.apache.org/jira/browse/ARROW-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mitar updated ARROW-2269: - Description: I am trying current master. I ran: {{python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-plasma --bundle-arrow-cpp bdist_wheel}} Output: running build_ext creating build creating build/temp.linux-x86_64-3.6 -- Runnning cmake for pyarrow cmake -DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python -DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON -DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python -- The C compiler identification is GNU 7.2.0 -- The CXX compiler identification is GNU 7.2.0 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done INFOCompiler command: /usr/bin/c++ INFOCompiler version: Using built-in specs. 
COLLECT_GCC=/usr/bin/c++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2) INFOCompiler id: GNU Selected compiler gcc 7.2.0 -- Performing Test CXX_SUPPORTS_SSE3 -- Performing Test CXX_SUPPORTS_SSE3 - Success -- Performing Test CXX_SUPPORTS_ALTIVEC -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...}) -- Build Type: RELEASE -- Build output directory: .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/ -- Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version "3.6.3") -- Searching for Python libs in .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found PythonLibs: /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found NumPy: version "1.14.1" .../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include -- Searching for Python libs in .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu -- Looking for python3.6m -- Found Python lib /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- Checking for module 'arrow' -- Found arrow, version 0.9.0-SNAPSHOT -- Arrow ABI version: 0.0.0 -- Arrow SO version: 0 -- Found the Arrow core library: .../Temp/arrow/dist/lib/libarrow.so -- Found the Arrow Python library: .../Temp/arrow/dist/lib/libarrow_python.so -- Boost version: 1.63.0 -- Found the following Boost libraries: -- system -- filesystem -- regex Added shared library dependency arrow: .../Temp/arrow/dist/lib/libarrow.so Added shared library dependency arrow_python: .../Temp/arrow/dist/lib/libarrow_python.so -- Found the Parquet library: .../Temp/arrow/dist/lib/libparquet.so Added shared library dependency parquet: .../Temp/arrow/dist/lib/libparquet.so -- Checking for module 'plasma' -- Found plasma, version -- Plasma ABI version: 0.0.0 -- Plasma SO version: 0 -- Found the Plasma core library: .../Temp/arrow/dist/lib/libplasma.so -- Found Plasma executable: .../Temp/arrow/dist/bin/plasma_store Added shared library dependency libplasma: .../Temp/arrow/dist/lib/libplasma.so -- Configuring done -- Generating
[jira] [Updated] (ARROW-2269) Cannot build bdist_wheel for Python
[ https://issues.apache.org/jira/browse/ARROW-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mitar updated ARROW-2269: - Description: I am trying current master. I ran: {{python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-plasma --bundle-arrow-cpp bdist_wheel }} Output: {{running build_ext creating build creating build/temp.linux-x86_64-3.6 -- Runnning cmake for pyarrow cmake -DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python -DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON -DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python -- The C compiler identification is GNU 7.2.0 -- The CXX compiler identification is GNU 7.2.0 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done INFOCompiler command: /usr/bin/c++ INFOCompiler version: Using built-in specs. 
COLLECT_GCC=/usr/bin/c++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2) INFOCompiler id: GNU Selected compiler gcc 7.2.0 -- Performing Test CXX_SUPPORTS_SSE3 -- Performing Test CXX_SUPPORTS_SSE3 - Success -- Performing Test CXX_SUPPORTS_ALTIVEC -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...}) -- Build Type: RELEASE -- Build output directory: .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/ -- Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version "3.6.3") -- Searching for Python libs in .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found PythonLibs: /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found NumPy: version "1.14.1" .../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include -- Searching for Python libs in .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu -- Looking for python3.6m -- Found Python lib /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- Checking for module 'arrow' -- Found arrow, version 0.9.0-SNAPSHOT -- Arrow ABI version: 0.0.0 -- Arrow SO version: 0 -- Found the Arrow core library: .../Temp/arrow/dist/lib/libarrow.so -- Found the Arrow Python library: .../Temp/arrow/dist/lib/libarrow_python.so -- Boost version: 1.63.0 -- Found the following Boost libraries: -- system -- filesystem -- regex Added shared library dependency arrow: .../Temp/arrow/dist/lib/libarrow.so Added shared library dependency arrow_python: .../Temp/arrow/dist/lib/libarrow_python.so -- Found the Parquet library: .../Temp/arrow/dist/lib/libparquet.so Added shared library dependency parquet: .../Temp/arrow/dist/lib/libparquet.so -- Checking for module 'plasma' -- Found plasma, version -- Plasma ABI version: 0.0.0 -- Plasma SO version: 0 -- Found the Plasma core library: .../Temp/arrow/dist/lib/libplasma.so -- Found Plasma executable: .../Temp/arrow/dist/bin/plasma_store Added shared library dependency libplasma: .../Temp/arrow/dist/lib/libplasma.so -- Configuring done -- Generating
[jira] [Created] (ARROW-2269) Cannot build bdist_wheel for Python
Mitar created ARROW-2269: Summary: Cannot build bdist_wheel for Python Key: ARROW-2269 URL: https://issues.apache.org/jira/browse/ARROW-2269 Project: Apache Arrow Issue Type: Bug Components: Packaging Affects Versions: 0.9.0 Reporter: Mitar I am trying current master. I ran: {{python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-plasma --bundle-arrow-cpp bdist_wheel }} Output: {{running build_ext creating build creating build/temp.linux-x86_64-3.6 -- Runnning cmake for pyarrow cmake -DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python -DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON -DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python -- The C compiler identification is GNU 7.2.0 -- The CXX compiler identification is GNU 7.2.0 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done INFOCompiler command: /usr/bin/c++ INFOCompiler version: Using built-in specs. 
COLLECT_GCC=/usr/bin/c++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2) INFOCompiler id: GNU Selected compiler gcc 7.2.0 -- Performing Test CXX_SUPPORTS_SSE3 -- Performing Test CXX_SUPPORTS_SSE3 - Success -- Performing Test CXX_SUPPORTS_ALTIVEC -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...}) -- Build Type: RELEASE -- Build output directory: .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/ -- Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version "3.6.3") -- Searching for Python libs in .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found PythonLibs: /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found NumPy: version "1.14.1" .../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include -- Searching for Python libs in .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu -- Looking for python3.6m -- Found Python lib /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- Checking for module 'arrow' -- Found arrow, version 0.9.0-SNAPSHOT -- Arrow ABI version: 0.0.0 -- Arrow SO version: 0 -- Found the Arrow core library: .../Temp/arrow/dist/lib/libarrow.so -- Found the Arrow Python library: .../Temp/arrow/dist/lib/libarrow_python.so -- Boost version: 1.63.0 -- Found the following Boost libraries: -- system -- filesystem -- regex Added shared library dependency arrow: .../Temp/arrow/dist/lib/libarrow.so Added shared library dependency arrow_python: .../Temp/arrow/dist/lib/libarrow_python.so -- Found the Parquet library: .../Temp/arrow/dist/lib/libparquet.so Added shared library dependency parquet: .../Temp/arrow/dist/lib/libparquet.so -- Checking for module 'plasma' -- Found plasma, version -- Plasma ABI version: 0.0.0 -- Plasma SO version: 0 -- Found the Plasma core library: .../Temp/arrow/dist/lib/libplasma.so -- Found
[jira] [Commented] (ARROW-2250) plasma_store process should cleanup on INT and TERM signals
[ https://issues.apache.org/jira/browse/ARROW-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386724#comment-16386724 ] Mitar commented on ARROW-2250: -- I observed that the parent (Python) process is unnecessary and made a pull request that simply replaces it with the plasma store executable. This way, all subsequent signals are handled directly by the process running that executable. This makes everything cleaner and means that no signal forwarding or cleanup is needed. > plasma_store process should cleanup on INT and TERM signals > --- > > Key: ARROW-2250 > URL: https://issues.apache.org/jira/browse/ARROW-2250 > Project: Apache Arrow > Issue Type: Improvement >Affects Versions: 0.8.0 >Reporter: Mitar >Priority: Major > Labels: pull-request-available > > Currently, if you send an INT or TERM signal to the parent plasma store > process (the Python one), it terminates without cleaning up the child process. > This makes it hard to run the plasma store in non-interactive mode. Inside a shell, > ctrl-c kills both processes. > Moreover, INT prints out an ugly KeyboardInterrupt exception. Probably > something nicer should be done. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
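The comment above proposes replacing the Python parent with the store binary instead of supervising it. A minimal sketch of that idea, assuming a POSIX system; `exec_plasma_store` is a hypothetical helper, not the code from the pull request. Because `os.execv` keeps the current PID, INT and TERM signals sent to that PID reach the binary directly, so no Python parent remains to forward signals or print a `KeyboardInterrupt` traceback.

```python
import os
import sys


def exec_plasma_store(executable, args):
    """Replace the current Python process with `executable` (hypothetical helper).

    On success os.execv never returns: the new program keeps this PID, so
    SIGINT/SIGTERM sent to that PID are delivered to it directly.
    """
    # Python-level output buffers are discarded by exec, so flush them first.
    sys.stdout.flush()
    sys.stderr.flush()
    # By convention argv[0] is the program path itself.
    os.execv(executable, [executable, *args])
```

A call might look like `exec_plasma_store("/path/to/plasma_store", ["-m", "1000000000", "-s", "/tmp/plasma"])`; the path is a placeholder, and the memory-size and socket flags should be checked against the store binary of your Arrow version.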
[jira] [Created] (ARROW-2264) Efficiently serialize numpy arrays with dtype of unicode fixed length string
Mitar created ARROW-2264: Summary: Efficiently serialize numpy arrays with dtype of unicode fixed length string Key: ARROW-2264 URL: https://issues.apache.org/jira/browse/ARROW-2264 Project: Apache Arrow Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Mitar Looking at the numpy array serialization code it seems that if I have a dtype like ">> np.array(['aaa', 'bbb'])}} {{array(['aaa', 'bbb'], dtype='
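The issue concerns numpy's fixed-length unicode dtypes such as `<U3`, which store every element in the same number of bytes (4 bytes per character, UCS-4, padded with NULs), so an array of them is a single flat buffer that can be copied without creating per-element Python objects. A stdlib-only sketch of that layout, with hypothetical helper names and a hypothetical 8-byte header; it is not pyarrow's serialization format. With numpy available, `arr.tobytes()` and `np.frombuffer(buf, dtype=arr.dtype)` should give the same effect for a little-endian `<U` array.

```python
import struct

# Hypothetical header: (width in characters, number of elements).
_HEADER = struct.Struct("<II")


def pack_fixed_unicode(strings, width):
    """Pack strings into one buffer of `width` UCS-4 code units per element."""
    itemsize = 4 * width
    body = bytearray(itemsize * len(strings))  # zero-filled, i.e. NUL padding
    for i, s in enumerate(strings):
        if len(s) > width:
            raise ValueError(f"{s!r} is longer than {width} characters")
        encoded = s.encode("utf-32-le")  # 4 bytes per code point, no BOM
        start = i * itemsize
        body[start:start + len(encoded)] = encoded
    return _HEADER.pack(width, len(strings)) + bytes(body)


def unpack_fixed_unicode(data):
    """Inverse of pack_fixed_unicode; trailing NULs are stripped as numpy does."""
    width, count = _HEADER.unpack_from(data, 0)
    itemsize = 4 * width
    body = memoryview(data)[_HEADER.size:]
    out = []
    for i in range(count):
        chunk = body[i * itemsize:(i + 1) * itemsize].tobytes()
        out.append(chunk.decode("utf-32-le").rstrip("\x00"))
    return out
```

Like numpy's `U` dtype, this layout cannot represent strings that end in NUL characters; the trade-off is wasted padding for short strings in exchange for O(1) element access and a single memory copy.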
[jira] [Updated] (ARROW-2250) plasma_store process should cleanup on INT and TERM signals as well
[ https://issues.apache.org/jira/browse/ARROW-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mitar updated ARROW-2250: - Description: Currently, if you send an INT or TERM signal to the parent plasma store process (the Python one), it terminates without cleaning up the child process. This makes it hard to run the plasma store in non-interactive mode. Inside a shell, ctrl-c kills both processes. Moreover, INT prints out an ugly KeyboardInterrupt exception. Probably something nicer should be done. was: Currently it cleans up on an INT signal. But if it gets a TERM signal, it kills the parent process (Python one) but not the binary process. I think both TERM and INT signals should be handled the same way. Summary: plasma_store process should cleanup on INT and TERM signals as well (was: plasma_store process should cleanup on TERM signal as well) > plasma_store process should cleanup on INT and TERM signals as well > --- > > Key: ARROW-2250 > URL: https://issues.apache.org/jira/browse/ARROW-2250 > Project: Apache Arrow > Issue Type: Improvement >Affects Versions: 0.8.0 >Reporter: Mitar >Priority: Major > > Currently, if you send an INT or TERM signal to the parent plasma store > process (the Python one), it terminates without cleaning up the child process. > This makes it hard to run the plasma store in non-interactive mode. Inside a shell, > ctrl-c kills both processes. > Moreover, INT prints out an ugly KeyboardInterrupt exception. Probably > something nicer should be done. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2250) plasma_store process should cleanup on INT and TERM signals
[ https://issues.apache.org/jira/browse/ARROW-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mitar updated ARROW-2250: - Summary: plasma_store process should cleanup on INT and TERM signals (was: plasma_store process should cleanup on INT and TERM signals as well) > plasma_store process should cleanup on INT and TERM signals > --- > > Key: ARROW-2250 > URL: https://issues.apache.org/jira/browse/ARROW-2250 > Project: Apache Arrow > Issue Type: Improvement >Affects Versions: 0.8.0 >Reporter: Mitar >Priority: Major > > Currently, if you send an INT or TERM signal to the parent plasma store > process (the Python one), it terminates without cleaning up the child process. > This makes it hard to run the plasma store in non-interactive mode. Inside a shell, > ctrl-c kills both processes. > Moreover, INT prints out an ugly KeyboardInterrupt exception. Probably > something nicer should be done. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2250) plasma_store process should cleanup on TERM signal as well
Mitar created ARROW-2250: Summary: plasma_store process should cleanup on TERM signal as well Key: ARROW-2250 URL: https://issues.apache.org/jira/browse/ARROW-2250 Project: Apache Arrow Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Mitar Currently it cleans up on an INT signal. But if it gets a TERM signal, it kills the parent process (the Python one) but not the binary process. I think both TERM and INT signals should be handled the same way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
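The ARROW-2250 request asks for INT and TERM to be handled the same way by the Python parent. A minimal sketch of what such a parent could do, assuming a POSIX system and that it runs in the main thread (Python only installs signal handlers there); `run_with_cleanup` is a hypothetical name, not pyarrow's launcher. One handler is registered for both signals and terminates the child, so neither signal leaves the store binary orphaned or prints a `KeyboardInterrupt` traceback.

```python
import signal
import subprocess


def run_with_cleanup(argv):
    """Run `argv` as a child; on SIGINT or SIGTERM, stop the child as well.

    Returns the child's return code (negative signal number if it was killed).
    """
    child = None

    def _stop_child(signum, frame):
        # Identical handling for both signals: ask the child to exit.
        if child is not None and child.poll() is None:
            child.terminate()

    handled = (signal.SIGINT, signal.SIGTERM)
    previous = {sig: signal.signal(sig, _stop_child) for sig in handled}
    try:
        child = subprocess.Popen(argv)
        # wait() is retried after the handler runs (PEP 475), so this
        # returns only once the child has actually exited.
        return child.wait()
    finally:
        # Restore the caller's handlers and never leave the child running.
        for sig, handler in previous.items():
            signal.signal(sig, handler)
        if child is not None and child.poll() is None:
            child.kill()
            child.wait()
```

This forwarding approach is the alternative to the later suggestion in the thread of exec-ing the binary directly; it keeps a Python process around, which is only worthwhile if the parent has other work to do.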
[jira] [Commented] (ARROW-1919) Plasma hanging if object id is not 20 bytes
[ https://issues.apache.org/jira/browse/ARROW-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385004#comment-16385004 ] Mitar commented on ARROW-1919: -- Is there a plan for when this will be released in a new version? > Plasma hanging if object id is not 20 bytes > --- > > Key: ARROW-1919 > URL: https://issues.apache.org/jira/browse/ARROW-1919 > Project: Apache Arrow > Issue Type: Bug >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > > This happens when plasma's capability to put an object with a user-defined > object id is used with an object id that is not 20 bytes long. Plasma will hang > upon get in that case; we should give an error instead. > See https://github.com/ray-project/ray/issues/1315 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1919) Plasma hanging if object id is not 20 bytes
[ https://issues.apache.org/jira/browse/ARROW-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288743#comment-16288743 ] Mitar commented on ARROW-1919: -- The error should probably be raised in put, no? > Plasma hanging if object id is not 20 bytes > --- > > Key: ARROW-1919 > URL: https://issues.apache.org/jira/browse/ARROW-1919 > Project: Apache Arrow > Issue Type: Bug >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Minor > > This happens when plasma's capability to put an object with a user-defined > object id is used with an object id that is not 20 bytes long. Plasma will hang > upon get in that case; we should give an error instead. > See https://github.com/ray-project/ray/issues/1315 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
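Following the suggestion that the error belongs in put, here is a toy sketch of fail-fast validation: object IDs are checked against the 20-byte length when an object is stored, so a bad ID raises immediately instead of leading to a hang on a later get. `ToyObjectStore` and `OBJECT_ID_SIZE` are hypothetical names for illustration; this is an in-memory dict, not the plasma client API.

```python
OBJECT_ID_SIZE = 20  # plasma object IDs are 20 bytes long


class ToyObjectStore:
    """In-memory stand-in for an object store keyed by fixed-size IDs."""

    def __init__(self):
        self._objects = {}

    @staticmethod
    def _check_id(object_id):
        # Validate on the way in so the caller sees the error at put time.
        if not isinstance(object_id, bytes):
            raise TypeError("object ID must be bytes")
        if len(object_id) != OBJECT_ID_SIZE:
            raise ValueError(
                f"object ID must be {OBJECT_ID_SIZE} bytes, got {len(object_id)}"
            )

    def put(self, value, object_id):
        self._check_id(object_id)
        self._objects[object_id] = value
        return object_id

    def get(self, object_id):
        self._check_id(object_id)
        return self._objects[object_id]
```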
[jira] [Created] (ARROW-1664) Support for xarray.DataArray and xarray.Dataset
Mitar created ARROW-1664: Summary: Support for xarray.DataArray and xarray.Dataset Key: ARROW-1664 URL: https://issues.apache.org/jira/browse/ARROW-1664 Project: Apache Arrow Issue Type: Bug Reporter: Mitar DataArray and Dataset are efficient in-memory representations for multi-dimensional data. It would be great if one could share them between processes using Arrow. http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html#xarray.DataArray http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html#xarray.Dataset -- This message was sent by Atlassian JIRA (v6.4.14#64029)
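One way to share a labelled N-dimensional array through a tabular format such as an Arrow table is to flatten it into "long" columns: one column per dimension holding the coordinate labels, plus one value column. A pure-Python sketch of that mapping, assuming the values are given flat in row-major (C) order; `to_long_columns` is hypothetical and not an xarray or pyarrow API. With the real libraries, a similar shape can likely be obtained via `DataArray.to_dataframe()` followed by `pyarrow.Table.from_pandas`.

```python
import math
from itertools import product


def to_long_columns(dims, coords, values, value_name="value"):
    """Flatten a labelled N-d array into columns (hypothetical helper).

    dims:   dimension names, e.g. ("x", "y")
    coords: mapping from each dimension name to its coordinate labels
    values: the array's elements flattened in row-major (C) order
    """
    shape = [len(coords[d]) for d in dims]
    size = math.prod(shape)
    if len(values) != size:
        raise ValueError(
            f"expected {size} values for shape {shape}, got {len(values)}"
        )
    columns = {d: [] for d in dims}
    # product() varies the last dimension fastest, matching C order.
    for labels in product(*(coords[d] for d in dims)):
        for d, label in zip(dims, labels):
            columns[d].append(label)
    columns[value_name] = list(values)
    return columns
```

The long form repeats coordinate labels once per element, so it trades memory for a plain columnar layout; keeping the original N-d buffer plus separate coordinate arrays would avoid that duplication.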