[jira] [Assigned] (ARROW-1964) [Python] Expose Builder classes
[ https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Hagerman reassigned ARROW-1964:
------------------------------------
    Assignee: Alex Hagerman

> [Python] Expose Builder classes
> -------------------------------
>
>                 Key: ARROW-1964
>                 URL: https://issues.apache.org/jira/browse/ARROW-1964
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Uwe L. Korn
>            Assignee: Alex Hagerman
>            Priority: Major
>              Labels: beginner, pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Having the builder classes available from Python would be very helpful.
> Currently, constructing an Arrow array always needs a Python list or NumPy
> array as an intermediate. As the builders in combination with jemalloc are
> very efficient at building up non-chunked memory, it would be nice to use
> them directly in certain cases.
> The most useful builders are the
> [StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714]
> and
> [DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872],
> as they provide functionality to create columns that are not easily
> constructed using NumPy methods in Python.
> The basic approach would be to wrap the C++ classes in
> https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd
> so that they can be used from Cython. Afterwards, we should start a new file
> {{python/pyarrow/builder.pxi}} with classes that take typical Python objects
> like {{str}} and pass them on to the C++ classes. In the end, these classes
> should also return (Python-accessible) {{pyarrow.Array}} instances.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
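To make the append/finish pattern concrete, here is a minimal self-contained sketch of the builder idea the issue wants to expose. This is NOT the real arrow::StringBuilder API; the class and member names below are illustrative only, showing how values accumulate into contiguous offsets + data buffers, the layout Arrow uses for string arrays.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a string builder (hypothetical, not Arrow's API):
// Append() copies each value into a growing data buffer and records an
// end offset; Finish() hands back the accumulated buffers, mirroring how
// a real builder finalizes into an immutable Array.
class ToyStringBuilder {
 public:
  void Append(const std::string& value) {
    data_.insert(data_.end(), value.begin(), value.end());
    offsets_.push_back(static_cast<int32_t>(data_.size()));
  }

  std::pair<std::vector<int32_t>, std::string> Finish() {
    return {offsets_, std::string(data_.begin(), data_.end())};
  }

 private:
  std::vector<int32_t> offsets_{0};  // value i spans offsets_[i]..offsets_[i+1]
  std::vector<char> data_;
};
```

Appending "foo" then "bar" yields offsets {0, 3, 6} over the data buffer "foobar", with no per-element Python objects in between, which is the efficiency win the issue is after.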
[jira] [Created] (ARROW-2542) [Plasma] Refactor object notification code
Philipp Moritz created ARROW-2542:
-------------------------------------

             Summary: [Plasma] Refactor object notification code
                 Key: ARROW-2542
                 URL: https://issues.apache.org/jira/browse/ARROW-2542
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Philipp Moritz

Replace unique_ptr with vector.
[jira] [Created] (ARROW-2541) [Plasma] Clean up macro usage
Philipp Moritz created ARROW-2541:
-------------------------------------

             Summary: [Plasma] Clean up macro usage
                 Key: ARROW-2541
                 URL: https://issues.apache.org/jira/browse/ARROW-2541
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Philipp Moritz

There are still a lot of macros being used as constants in the plasma codebase. These should be cleaned up and replaced with constexpr constants (deprecating the macros where appropriate).
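The kind of cleanup the issue describes might look like the following sketch. The constant name and value are hypothetical, not actual Plasma identifiers.

```cpp
#include <cstdint>

// Before: a preprocessor macro used as a constant (hypothetical name).
// It has no type and no scope, and it leaks into every file that
// includes the header.
#define PLASMA_DEFAULT_RELEASE_DELAY 64

// After: a typed, scoped compile-time constant. During a transition the
// old macro can be kept (and deprecated) while new code uses the constant.
constexpr int64_t kDefaultReleaseDelay = PLASMA_DEFAULT_RELEASE_DELAY;
```

Besides type safety, constexpr constants participate in namespaces and show up properly in debuggers, which macros do not.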
[jira] [Resolved] (ARROW-2539) [Plasma] Use unique_ptr instead of raw pointer
[ https://issues.apache.org/jira/browse/ARROW-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philipp Moritz resolved ARROW-2539.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 0.10.0

Issue resolved by pull request 1993
[https://github.com/apache/arrow/pull/1993]

> [Plasma] Use unique_ptr instead of raw pointer
> ----------------------------------------------
>
>                 Key: ARROW-2539
>                 URL: https://issues.apache.org/jira/browse/ARROW-2539
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Plasma (C++)
>            Reporter: Zhijun Fu
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are some places in Plasma where explicit new & delete are used;
> forgetting to delete can cause a memory leak. Use unique_ptr instead when
> possible so that memory gets deleted automatically.
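A minimal sketch of the change the issue asks for, using an illustrative type rather than real Plasma code:

```cpp
#include <memory>

// Hypothetical resource type standing in for whatever Plasma allocates
// with `new` today.
struct Connection {
  explicit Connection(int fd) : fd(fd) {}
  int fd;
};

// Before (leak-prone):
//   Connection* conn = new Connection(5);
//   ...every exit path must remember to `delete conn`...

// After: ownership is explicit, and the delete happens automatically at
// scope exit, on every path (early return, exception, etc.).
int use_connection() {
  std::unique_ptr<Connection> conn = std::make_unique<Connection>(5);
  return conn->fd;
}  // conn is destroyed and freed here
```

std::make_unique (C++14) also avoids spelling `new` at all, which makes the remaining explicit allocations easy to grep for.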
[jira] [Updated] (ARROW-2285) [Python] Can't convert Numpy string arrays
[ https://issues.apache.org/jira/browse/ARROW-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-2285:
----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Can't convert Numpy string arrays
> ------------------------------------------
>
>                 Key: ARROW-2285
>                 URL: https://issues.apache.org/jira/browse/ARROW-2285
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> {code:python}
> >>> arr = np.array([b'foo', b'bar'], dtype='S3')
> >>> pa.array(arr, type=pa.binary(3))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.binary(3))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "array.pxi", line 77, in pyarrow.lib._ndarray_to_array
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> ArrowNotImplementedError:
> /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1661 code:
> converter.Convert()
> NumPyConverter doesn't implement conversion.
> {code}
[jira] [Updated] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-2273:
----------------------------------
    Labels: pull-request-available  (was: )

> Cannot deserialize pandas SparseDataFrame
> -----------------------------------------
>
>                 Key: ARROW-2273
>                 URL: https://issues.apache.org/jira/browse/ARROW-2273
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>            Reporter: Mitar
>            Assignee: Licht Takeuchi
>            Priority: Major
>              Labels: pull-request-available
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", line 77, in _deserialize_pandas_dataframe
>     return pdcompat.serialized_dict_to_dataframe(data)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in serialized_dict_to_dataframe
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in <listcomp>
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 478, in _reconstruct_block
>     block = _int.make_block(block_arr, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 2957, in make_block
>     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 120, in __init__
>     len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1
[jira] [Comment Edited] (ARROW-2535) [C++/Python] Provide pre-commit hooks that check flake8 et al.
[ https://issues.apache.org/jira/browse/ARROW-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464027#comment-16464027 ]

Krisztian Szucs edited comment on ARROW-2535 at 5/4/18 3:34 PM:
---------------------------------------------------------------
[~xhochy] Might be overkill, but https://pre-commit.com/ works pretty well. There are plugins for flake8 and clang-format: https://pre-commit.com/hooks.html
Sample config: https://github.com/daskos/daskos/blob/master/.pre-commit-config.yaml

was (Author: kszucs):
[~xhochy] Might be overkill, but https://pre-commit.com/ works pretty well. There are plugins for flake8 and clang-format: https://pre-commit.com/hooks.html

> [C++/Python] Provide pre-commit hooks that check flake8 et al.
> --------------------------------------------------------------
>
>                 Key: ARROW-2535
>                 URL: https://issues.apache.org/jira/browse/ARROW-2535
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: C++, Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>             Fix For: 0.10.0
>
> We should provide pre-commit hooks that users can install (optionally) that
> check e.g. flake8 and clang-format.
[jira] [Commented] (ARROW-2535) [C++/Python] Provide pre-commit hooks that check flake8 et al.
[ https://issues.apache.org/jira/browse/ARROW-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464027#comment-16464027 ]

Krisztian Szucs commented on ARROW-2535:
----------------------------------------
[~xhochy] Might be overkill, but https://pre-commit.com/ works pretty well. There are plugins for flake8 and clang-format: https://pre-commit.com/hooks.html
[jira] [Updated] (ARROW-2539) [Plasma] Use unique_ptr instead of raw pointer
[ https://issues.apache.org/jira/browse/ARROW-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-2539:
----------------------------------
    Labels: pull-request-available  (was: )
[jira] [Comment Edited] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463622#comment-16463622 ]

Licht Takeuchi edited comment on ARROW-2273 at 5/4/18 9:39 AM:
---------------------------------------------------------------
[~mitar], Yes, it is still there. {{SparseDataFrame}} is a naive implementation and has many bugs. I've spent a lot of time trying to fix these, but it is hard to fix them all. IMO, this is not the right time to support {{SparseDataFrame}} in pyarrow.

was (Author: licht-t):
[~mitar], Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and has many bugs. I've spent a lot of time trying to fix these, but it is hard to fix them all. IMO, this is not the right time to support {{SparseDataFrame}} in pyarrow.
[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463622#comment-16463622 ]

Licht Takeuchi commented on ARROW-2273:
---------------------------------------
[~mitar], Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and has many bugs. I've spent a lot of time trying to fix these, but it is hard to fix them all. IMO, it is not the right time yet to support this in pyarrow.
[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463610#comment-16463610 ]

Mitar commented on ARROW-2273:
------------------------------
Isn't it still open for debate whether it will be deprecated?
[jira] [Resolved] (ARROW-2459) pyarrow: Segfault with pyarrow.deserialize_pandas
[ https://issues.apache.org/jira/browse/ARROW-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn resolved ARROW-2459.
--------------------------------
       Resolution: Fixed
         Assignee: Licht Takeuchi
    Fix Version/s: 0.10.0

Thanks [~Licht-T] for testing this.

> pyarrow: Segfault with pyarrow.deserialize_pandas
> -------------------------------------------------
>
>                 Key: ARROW-2459
>                 URL: https://issues.apache.org/jira/browse/ARROW-2459
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>         Environment: OS X, Linux
>            Reporter: Travis Brady
>            Assignee: Licht Takeuchi
>            Priority: Major
>             Fix For: 0.10.0
>
> Following up from [https://github.com/apache/arrow/issues/1884], wherein I
> found that calling deserialize_pandas in the linked app.py script causes the
> app.py process to segfault.
> I initially observed this on OS X, but have since confirmed that the behavior
> exists on Linux as well.
> Repo containing example: [https://github.com/travisbrady/sanic-arrow]
> And more generally: what is the right way to get a Java-based HTTP
> microservice to talk to a Python-based HTTP microservice using Arrow as the
> serialization format? I'm exchanging DataFrame-type objects (they are
> pandas.DataFrames on the Python side) between the two services for real-time
> scoring in a few xgboost models implemented in Python.
[jira] [Comment Edited] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463576#comment-16463576 ]

Licht Takeuchi edited comment on ARROW-2273 at 5/4/18 8:54 AM:
---------------------------------------------------------------
Yes, I'll do that.

was (Author: licht-t):
Okay, I'll do that.
[jira] [Assigned] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Licht Takeuchi reassigned ARROW-2273:
-------------------------------------
    Assignee: Licht Takeuchi
[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463576#comment-16463576 ]

Licht Takeuchi commented on ARROW-2273:
---------------------------------------
Okay, I'll do that.
[jira] [Commented] (ARROW-2540) [Plasma] add constructor/destructor to make sure dlfree is called automatically
[ https://issues.apache.org/jira/browse/ARROW-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463558#comment-16463558 ] Zhijun Fu commented on ARROW-2540: -- PR is here: https://github.com/apache/arrow/pull/1996 > [Plasma] add constructor/destructor to make sure dlfree is called > automatically > --- > > Key: ARROW-2540 > URL: https://issues.apache.org/jira/browse/ARROW-2540 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Zhijun Fu >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add constructor & destructor to ObjectTableEntry structure to make sure > dlfree() is called for the pointer field when the object gets destructed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2540) [Plasma] add constructor/destructor to make sure dlfree is called automatically
[ https://issues.apache.org/jira/browse/ARROW-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2540: -- Labels: pull-request-available (was: ) > [Plasma] add constructor/destructor to make sure dlfree is called > automatically > --- > > Key: ARROW-2540 > URL: https://issues.apache.org/jira/browse/ARROW-2540 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Zhijun Fu >Priority: Minor > Labels: pull-request-available > > Add constructor & destructor to ObjectTableEntry structure to make sure > dlfree() is called for the pointer field when the object gets destructed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2540) [Plasma] add constructor/destructor to make sure dlfree is called automatically
Zhijun Fu created ARROW-2540: Summary: [Plasma] add constructor/destructor to make sure dlfree is called automatically Key: ARROW-2540 URL: https://issues.apache.org/jira/browse/ARROW-2540 Project: Apache Arrow Issue Type: Improvement Components: Plasma (C++) Reporter: Zhijun Fu Add constructor & destructor to ObjectTableEntry structure to make sure dlfree() is called for the pointer field when the object gets destructed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode
[ https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved ARROW-2478. Resolution: Fixed Issue resolved by pull request 1937 [https://github.com/apache/arrow/pull/1937] > [C++] Introduce a checked_cast function that performs a dynamic_cast in debug > mode > -- > > Key: ARROW-2478 > URL: https://issues.apache.org/jira/browse/ARROW-2478 > Project: Apache Arrow > Issue Type: Improvement >Affects Versions: 0.9.0 >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This would use {{static_cast}} in release mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2539) [Plasma] Use unique_ptr instead of raw pointer
[ https://issues.apache.org/jira/browse/ARROW-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463537#comment-16463537 ] Zhijun Fu commented on ARROW-2539: -- I've created a PR for this: [https://github.com/apache/arrow/pull/1993] > [Plasma] Use unique_ptr instead of raw pointer > -- > > Key: ARROW-2539 > URL: https://issues.apache.org/jira/browse/ARROW-2539 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Zhijun Fu >Priority: Minor > > There are some places in Plasma where explicit new & delete are used; > forgetting to delete can cause a memory leak. Use unique_ptr instead when > possible so that memory is freed automatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2539) [Plasma] Use unique_ptr instead of raw pointer
Zhijun Fu created ARROW-2539: Summary: [Plasma] Use unique_ptr instead of raw pointer Key: ARROW-2539 URL: https://issues.apache.org/jira/browse/ARROW-2539 Project: Apache Arrow Issue Type: Improvement Components: Plasma (C++) Reporter: Zhijun Fu There are some places in Plasma where explicit new & delete are used; forgetting to delete can cause a memory leak. Use unique_ptr instead when possible so that memory is freed automatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame
[ https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463520#comment-16463520 ] Uwe L. Korn commented on ARROW-2273: Should we then simply check for that on the serialize side and raise an error? > Cannot deserialize pandas SparseDataFrame > - > > Key: ARROW-2273 > URL: https://issues.apache.org/jira/browse/ARROW-2273 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.9.0 >Reporter: Mitar >Priority: Major > > >>> import pyarrow > >>> import pandas > >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, > >>> 9]}) > >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer()) > Traceback (most recent call last): > File "", line 1, in > File "serialization.pxi", line 441, in pyarrow.lib.deserialize > File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from > File "serialization.pxi", line 257, in > pyarrow.lib.SerializedPyObject.deserialize > File "serialization.pxi", line 174, in > pyarrow.lib.SerializationContext._deserialize_callback > File > ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", > line 77, in _deserialize_pandas_dataframe > return pdcompat.serialized_dict_to_dataframe(data) > File > ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", > line 450, in serialized_dict_to_dataframe > for block in data['blocks']] > File > ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", > line 450, in > for block in data['blocks']] > File > ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", > line 478, in _reconstruct_block > block = _int.make_block(block_arr, placement=placement) > File > ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", > line 2957, in make_block > return klass(values, ndim=ndim, fastpath=fastpath, placement=placement) > File > 
".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", > line 120, in __init__ > len(self.mgr_locs))) > ValueError: Wrong number of items passed 3, placement implies 1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2278) [Python] deserializing Numpy struct arrays raises
[ https://issues.apache.org/jira/browse/ARROW-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2278: -- Labels: pull-request-available (was: ) > [Python] deserializing Numpy struct arrays raises > - > > Key: ARROW-2278 > URL: https://issues.apache.org/jira/browse/ARROW-2278 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.8.0 >Reporter: Antoine Pitrou >Assignee: Licht Takeuchi >Priority: Major > Labels: pull-request-available > > {code:python} > >>> import numpy as np > >>> dt = np.dtype([('x', np.int8), ('y', np.float32)]) > >>> arr = np.arange(5*10, dtype=np.int8).view(dt) > >>> pa.deserialize(pa.serialize(arr).to_buffer()) > Traceback (most recent call last): > File "", line 1, in > pa.deserialize(pa.serialize(arr).to_buffer()) > File "serialization.pxi", line 441, in pyarrow.lib.deserialize > File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from > File "serialization.pxi", line 257, in > pyarrow.lib.SerializedPyObject.deserialize > File "serialization.pxi", line 174, in > pyarrow.lib.SerializationContext._deserialize_callback > File "/home/antoine/arrow/python/pyarrow/serialization.py", line 44, in > _deserialize_numpy_array_list > return np.array(data[0], dtype=np.dtype(data[1])) > TypeError: a bytes-like object is required, not 'int' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2538) [Java] Introduce BaseWriter.writeNull method
[ https://issues.apache.org/jira/browse/ARROW-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2538: -- Labels: pull-request-available (was: ) > [Java] Introduce BaseWriter.writeNull method > > > Key: ARROW-2538 > URL: https://issues.apache.org/jira/browse/ARROW-2538 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > > When a data set has null values in a complex data type, it's hard to call > the proper writeNull method, because writeNull is declared in > AbstractFieldWriter, which is package-local, not public. Moreover, > UnionListWriter has no implementation of writeNull, so it falls back to > AbstractFieldWriter.writeNull, which throws an IllegalArgumentException by > default. BaseWriter.writeNull is therefore required to write null values > inside complex data types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)