[jira] [Assigned] (ARROW-1964) [Python] Expose Builder classes

2018-05-04 Thread Alex Hagerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Hagerman reassigned ARROW-1964:


Assignee: Alex Hagerman

> [Python] Expose Builder classes
> ---
>
> Key: ARROW-1964
> URL: https://issues.apache.org/jira/browse/ARROW-1964
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Alex Hagerman
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Having the builder classes available from Python would be very helpful. 
> Currently, constructing an Arrow array always requires a Python list or 
> NumPy array as an intermediate. As the builders in combination with jemalloc 
> are very efficient at building up non-chunked memory, it would be nice to 
> use them directly in certain cases.
> The most useful builders are the 
> [StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714]
>  and 
> [DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872]
>  as they provide functionality to create columns that are not easily 
> constructed using NumPy methods in Python.
> The basic approach would be to wrap the C++ classes in 
> https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd
>  so that they can be used from Cython. Afterwards, we should start a new file 
> {{python/pyarrow/builder.pxi}} with classes that take typical Python objects 
> like {{str}} and pass them on to the C++ classes. In the end, these classes 
> should also return (Python-accessible) {{pyarrow.Array}} instances.
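For orientation, here is a minimal C++ sketch of the builder API such a {{builder.pxi}} wrapper would forward to. The function name and the appended values are illustrative only; the calls assume the {{StringBuilder}} interface from the builder.h linked above.

{code:cpp}
// Illustrative sketch: the C++ StringBuilder calls that a builder.pxi
// wrapper class would issue for incoming Python str objects.
#include <memory>
#include <string>

#include "arrow/array.h"
#include "arrow/builder.h"
#include "arrow/status.h"

arrow::Status BuildStringColumn(std::shared_ptr<arrow::Array>* out) {
  arrow::StringBuilder builder;
  // Values are appended one by one -- no Python list or NumPy array is
  // materialized as an intermediate.
  ARROW_RETURN_NOT_OK(builder.Append(std::string("foo")));
  ARROW_RETURN_NOT_OK(builder.Append(std::string("bar")));
  ARROW_RETURN_NOT_OK(builder.AppendNull());
  // Finish() hands over the accumulated, non-chunked memory as an Array.
  return builder.Finish(out);
}
{code}

A {{builder.pxi}} class would accept Python {{str}} objects, convert them to {{std::string}}, forward them to {{Append}}, and expose {{Finish}} as a method returning a {{pyarrow.Array}}.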



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2542) [Plasma] Refactor object notification code

2018-05-04 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2542:
-

 Summary: [Plasma] Refactor object notification code
 Key: ARROW-2542
 URL: https://issues.apache.org/jira/browse/ARROW-2542
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Replace unique_ptr with vector in the object notification code.
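A hedged sketch of the kind of change this implies; the element type and the function names below are illustrative, not taken from the Plasma notification code.

{code:cpp}
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Before: the buffer and its size have to travel separately, and the caller
// must keep them in sync.
std::unique_ptr<uint8_t[]> MakeNotificationOld(const uint8_t* data, size_t size) {
  std::unique_ptr<uint8_t[]> buf(new uint8_t[size]);
  std::memcpy(buf.get(), data, size);
  return buf;  // size is not carried along
}

// After: std::vector owns the memory and knows its own size.
std::vector<uint8_t> MakeNotificationNew(const uint8_t* data, size_t size) {
  return std::vector<uint8_t>(data, data + size);
}
{code}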



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2541) [Plasma] Clean up macro usage

2018-05-04 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2541:
-

 Summary: [Plasma] Clean up macro usage
 Key: ARROW-2541
 URL: https://issues.apache.org/jira/browse/ARROW-2541
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


There are still a lot of macros being used as constants in the Plasma codebase. 
These should be cleaned up and replaced with constexpr constants (deprecating 
the macros where appropriate).
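The intended shape of the change, sketched with a hypothetical constant (the real macro names live in the Plasma headers):

{code:cpp}
#include <cstdint>

// Before: a preprocessor macro used as a constant -- untyped and unscoped.
#define PLASMA_EXAMPLE_BUFFER_SIZE 4096

// After: a typed, scoped constant. The old macro can be kept temporarily as
// an alias and marked deprecated while callers migrate.
constexpr int64_t kExampleBufferSize = 4096;
{code}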



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2539) [Plasma] Use unique_ptr instead of raw pointer

2018-05-04 Thread Philipp Moritz (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-2539.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1993
[https://github.com/apache/arrow/pull/1993]

> [Plasma] Use unique_ptr instead of raw pointer
> --
>
> Key: ARROW-2539
> URL: https://issues.apache.org/jira/browse/ARROW-2539
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are some places in Plasma where explicit new & delete are used; 
> forgetting to delete can cause memory leaks. Use unique_ptr instead where 
> possible so that memory gets freed automatically.
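A minimal sketch of the pattern in question; the type name below is a hypothetical placeholder, not one of the actual Plasma classes.

{code:cpp}
#include <memory>

struct Notification {};  // hypothetical placeholder type

// Before: any early return or exception between new and delete leaks.
void ProcessOld() {
  Notification* n = new Notification();
  // ... work that may return early ...
  delete n;
}

// After: the unique_ptr releases the object on every exit path.
void ProcessNew() {
  auto n = std::make_unique<Notification>();
  // ... same work, no manual delete needed ...
}
{code}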



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2285) [Python] Can't convert Numpy string arrays

2018-05-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2285:
--
Labels: pull-request-available  (was: )

> [Python] Can't convert Numpy string arrays
> --
>
> Key: ARROW-2285
> URL: https://issues.apache.org/jira/browse/ARROW-2285
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> {code:python}
> >>> arr = np.array([b'foo', b'bar'], dtype='S3')
> >>> pa.array(arr, type=pa.binary(3))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> pa.array(arr, type=pa.binary(3))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "array.pxi", line 77, in pyarrow.lib._ndarray_to_array
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> ArrowNotImplementedError: 
> /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1661 code: 
> converter.Convert()
> NumPyConverter doesn't implement  conversion. 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2273:
--
Labels: pull-request-available  (was: )

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Licht Takeuchi
>Priority: Major
>  Labels: pull-request-available
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2535) [C++/Python] Provide pre-commit hooks that check flake8 et al.

2018-05-04 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464027#comment-16464027
 ] 

Krisztian Szucs edited comment on ARROW-2535 at 5/4/18 3:34 PM:


[~xhochy] Might be overkill, but https://pre-commit.com/ works pretty well. 
There are plugins for flake8 and clang-format: https://pre-commit.com/hooks.html

Sample config: 
https://github.com/daskos/daskos/blob/master/.pre-commit-config.yaml


was (Author: kszucs):
[~xhochy] Might be overkill, but https://pre-commit.com/ works pretty well. 
There are plugins for flake8 and clang-format: https://pre-commit.com/hooks.html

> [C++/Python] Provide pre-commit hooks that check flake8 et al.
> --
>
> Key: ARROW-2535
> URL: https://issues.apache.org/jira/browse/ARROW-2535
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> We should provide pre-commit hooks that users can install (optionally) that 
> check e.g. flake8 and clang-format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2535) [C++/Python] Provide pre-commit hooks that check flake8 et al.

2018-05-04 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464027#comment-16464027
 ] 

Krisztian Szucs commented on ARROW-2535:


[~xhochy] Might be overkill, but https://pre-commit.com/ works pretty well. 
There are plugins for flake8 and clang-format: https://pre-commit.com/hooks.html

> [C++/Python] Provide pre-commit hooks that check flake8 et al.
> --
>
> Key: ARROW-2535
> URL: https://issues.apache.org/jira/browse/ARROW-2535
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> We should provide pre-commit hooks that users can install (optionally) that 
> check e.g. flake8 and clang-format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2539) [Plasma] Use unique_ptr instead of raw pointer

2018-05-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2539:
--
Labels: pull-request-available  (was: )

> [Plasma] Use unique_ptr instead of raw pointer
> --
>
> Key: ARROW-2539
> URL: https://issues.apache.org/jira/browse/ARROW-2539
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
>
> There are some places in Plasma where explicit new & delete are used; 
> forgetting to delete can cause memory leaks. Use unique_ptr instead where 
> possible so that memory gets freed automatically.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463622#comment-16463622
 ] 

Licht Takeuchi edited comment on ARROW-2273 at 5/4/18 9:39 AM:
---

[~mitar],

Yes, it is still there. {{SparseDataFrame}} is a naive implementation and has 
many bugs. I've spent a lot of time fixing these, but it is hard to fix them 
all. IMO, this is not the right time to support {{SparseDataFrame}} in pyarrow.


was (Author: licht-t):
[~mitar],

Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and 
has many bugs. I've spent a lot of time fixing these, but it is hard to fix 
them all. IMO, this is not the right time to support {{SparseDataFrame}} in 
pyarrow.

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Licht Takeuchi
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463622#comment-16463622
 ] 

Licht Takeuchi edited comment on ARROW-2273 at 5/4/18 9:38 AM:
---

[~mitar],

Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and 
has many bugs. I've spent a lot of time fixing these, but it is hard to fix 
them all. IMO, it is not the right time to support this in pyarrow.


was (Author: licht-t):
[~mitar],

Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and 
has many bugs. I've spent a lot of time fixing these, but it is hard to fix 
them all. IMO, it is not the right time yet to support this in pyarrow.

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Licht Takeuchi
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463622#comment-16463622
 ] 

Licht Takeuchi edited comment on ARROW-2273 at 5/4/18 9:38 AM:
---

[~mitar],

Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and 
has many bugs. I've spent a lot of time fixing these, but it is hard to fix 
them all. IMO, this is not the right time to support {{SparseDataFrame}} in 
pyarrow.


was (Author: licht-t):
[~mitar],

Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and 
has many bugs. I've spent a lot of time fixing these, but it is hard to fix 
them all. IMO, it is not the right time to support this in pyarrow.

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Licht Takeuchi
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463622#comment-16463622
 ] 

Licht Takeuchi edited comment on ARROW-2273 at 5/4/18 9:37 AM:
---

[~mitar],

Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and 
has many bugs. I've spent a lot of time fixing these, but it is hard to fix 
them all. IMO, it is not the right time yet to support this in pyarrow.


was (Author: licht-t):
[~mitar],

Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and 
has many bugs. I've spent a lot of time fixing these but it is hard to fix 
them all. IMO, it is not the right time yet to support this in pyarrow.

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Licht Takeuchi
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463622#comment-16463622
 ] 

Licht Takeuchi commented on ARROW-2273:
---

[~mitar],

Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and 
has many bugs. I've spent a lot of time fixing these but it is hard to fix 
them all. IMO, it is not the right time yet to support this in pyarrow.

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Licht Takeuchi
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463610#comment-16463610
 ] 

Mitar commented on ARROW-2273:
--

Isn't it still open for debate whether it will be deprecated?

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Licht Takeuchi
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2459) pyarrow: Segfault with pyarrow.deserialize_pandas

2018-05-04 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2459.

   Resolution: Fixed
 Assignee: Licht Takeuchi
Fix Version/s: 0.10.0

Thanks [~Licht-T] for testing this.

> pyarrow: Segfault with pyarrow.deserialize_pandas
> -
>
> Key: ARROW-2459
> URL: https://issues.apache.org/jira/browse/ARROW-2459
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: OS X, Linux
>Reporter: Travis Brady
>Assignee: Licht Takeuchi
>Priority: Major
> Fix For: 0.10.0
>
>
> Following up from [https://github.com/apache/arrow/issues/1884] wherein I 
> found that calling deserialize_pandas in the linked app.py script in the repo 
> linked below causes the app.py process to segfault.
> I initially observed this on OS X, but have since confirmed that the behavior 
> exists on Linux as well.
> Repo containing example: [https://github.com/travisbrady/sanic-arrow] 
> And more generally: what is the right way to get a Java-based HTTP 
> microservice to talk to a Python-based HTTP microservice using Arrow as the 
> serialization format? I'm exchanging DataFrame-type objects (they are 
> pandas.DataFrame instances on the Python side) between the two services for 
> real-time scoring in a few xgboost models implemented in Python.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463576#comment-16463576
 ] 

Licht Takeuchi edited comment on ARROW-2273 at 5/4/18 8:54 AM:
---

Yes, I'll do that.


was (Author: licht-t):
Okay, I'll do that.

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Licht Takeuchi
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Licht Takeuchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Licht Takeuchi reassigned ARROW-2273:
-

Assignee: Licht Takeuchi

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Licht Takeuchi
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463576#comment-16463576
 ] 

Licht Takeuchi commented on ARROW-2273:
---

Okay, I'll do that.

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2540) [Plasma] add constructor/destructor to make sure dlfree is called automatically

2018-05-04 Thread Zhijun Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463558#comment-16463558
 ] 

Zhijun Fu commented on ARROW-2540:
--

PR is here: https://github.com/apache/arrow/pull/1996

> [Plasma] add constructor/destructor to make sure dlfree is called 
> automatically
> ---
>
> Key: ARROW-2540
> URL: https://issues.apache.org/jira/browse/ARROW-2540
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a constructor & destructor to the ObjectTableEntry structure to make sure 
> dlfree() is called for the pointer field when the object is destroyed.
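Roughly the idea, as a hedged sketch: the {{pointer}} field and the local {{dlfree}} declaration are assumptions made to keep the example self-contained, not code copied from the Plasma sources.

{code:cpp}
#include <cstdint>

// Declaration of Plasma's dlmalloc-based free, shown only so the sketch
// stands alone.
void dlfree(void* mem);

struct ObjectTableEntry {
  ObjectTableEntry() = default;
  ~ObjectTableEntry() {
    // The buffer is released exactly once, on every path that destroys
    // the entry, without each caller having to remember dlfree().
    if (pointer != nullptr) {
      dlfree(pointer);
    }
  }
  // Non-copyable, so the buffer cannot be freed twice.
  ObjectTableEntry(const ObjectTableEntry&) = delete;
  ObjectTableEntry& operator=(const ObjectTableEntry&) = delete;

  uint8_t* pointer = nullptr;  // memory obtained from Plasma's allocator
};
{code}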



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2540) [Plasma] add constructor/destructor to make sure dlfree is called automatically

2018-05-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2540:
--
Labels: pull-request-available  (was: )

> [Plasma] add constructor/destructor to make sure dlfree is called 
> automatically
> ---
>
> Key: ARROW-2540
> URL: https://issues.apache.org/jira/browse/ARROW-2540
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
>
> Add a constructor & destructor to the ObjectTableEntry structure to make sure 
> dlfree() is called for the pointer field when the object is destroyed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2540) [Plasma] add constructor/destructor to make sure dlfree is called automatically

2018-05-04 Thread Zhijun Fu (JIRA)
Zhijun Fu created ARROW-2540:


 Summary: [Plasma] add constructor/destructor to make sure dlfree 
is called automatically
 Key: ARROW-2540
 URL: https://issues.apache.org/jira/browse/ARROW-2540
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++)
Reporter: Zhijun Fu


Add a constructor & destructor to the ObjectTableEntry structure to make sure 
dlfree() is called for the pointer field when the object is destroyed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-05-04 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2478.

Resolution: Fixed

Issue resolved by pull request 1937
[https://github.com/apache/arrow/pull/1937]

> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This would use {{static_cast}} in release mode.
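A minimal sketch of the idea (not the code merged via the pull request): verify the downcast with {{dynamic_cast}} while assertions are enabled, and fall back to a plain {{static_cast}} when {{NDEBUG}} is defined.

{code:cpp}
#include <cassert>

template <typename Output, typename Input>
Output checked_cast(Input* value) {
#ifdef NDEBUG
  // Release mode: no RTTI lookup.
  return static_cast<Output>(value);
#else
  // Debug mode: a bad downcast of a non-null pointer yields nullptr,
  // which the assert catches immediately.
  Output result = dynamic_cast<Output>(value);
  assert(value == nullptr || result != nullptr);
  return result;
#endif
}

// Hypothetical usage: auto* strings = checked_cast<arrow::StringArray*>(array_ptr);
{code}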



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2539) [Plasma] Use unique_ptr instead of raw pointer

2018-05-04 Thread Zhijun Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463537#comment-16463537
 ] 

Zhijun Fu commented on ARROW-2539:
--

I've created a PR for this:

[https://github.com/apache/arrow/pull/1993]

 

> [Plasma] Use unique_ptr instead of raw pointer
> --
>
> Key: ARROW-2539
> URL: https://issues.apache.org/jira/browse/ARROW-2539
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Minor
>
> There are some places in Plasma where explicit new & delete are used; 
> forgetting to delete can cause memory leaks. Use unique_ptr instead where 
> possible so that memory gets freed automatically.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2539) [Plasma] Use unique_ptr instead of raw pointer

2018-05-04 Thread Zhijun Fu (JIRA)
Zhijun Fu created ARROW-2539:


 Summary: [Plasma] Use unique_ptr instead of raw pointer
 Key: ARROW-2539
 URL: https://issues.apache.org/jira/browse/ARROW-2539
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++)
Reporter: Zhijun Fu


There are some places in Plasma where explicit new & delete are used; 
forgetting to delete can cause memory leaks. Use unique_ptr instead where 
possible so that memory gets freed automatically.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463520#comment-16463520
 ] 

Uwe L. Korn commented on ARROW-2273:


Should we then simply check for that on the serialize side and raise an error?

> Cannot deserialize pandas SparseDataFrame
> -
>
> Key: ARROW-2273
> URL: https://issues.apache.org/jira/browse/ARROW-2273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Mitar
>Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 
> >>> 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
> line 77, in _deserialize_pandas_dataframe
> return pdcompat.serialized_dict_to_dataframe(data)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in serialized_dict_to_dataframe
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 450, in <listcomp>
> for block in data['blocks']]
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 478, in _reconstruct_block
> block = _int.make_block(block_arr, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 2957, in make_block
> return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File 
> ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
> line 120, in __init__
> len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2278) [Python] deserializing Numpy struct arrays raises

2018-05-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2278:
--
Labels: pull-request-available  (was: )

> [Python] deserializing Numpy struct arrays raises
> -
>
> Key: ARROW-2278
> URL: https://issues.apache.org/jira/browse/ARROW-2278
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Licht Takeuchi
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> import numpy as np
> >>> dt = np.dtype([('x', np.int8), ('y', np.float32)])
> >>> arr = np.arange(5*10, dtype=np.int8).view(dt)
> >>> pa.deserialize(pa.serialize(arr).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> pa.deserialize(pa.serialize(arr).to_buffer())
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in 
> pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in 
> pyarrow.lib.SerializationContext._deserialize_callback
>   File "/home/antoine/arrow/python/pyarrow/serialization.py", line 44, in 
> _deserialize_numpy_array_list
> return np.array(data[0], dtype=np.dtype(data[1]))
> TypeError: a bytes-like object is required, not 'int'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2538) [Java] Introduce BaseWriter.writeNull method

2018-05-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2538:
--
Labels: pull-request-available  (was: )

> [Java] Introduce BaseWriter.writeNull method
> 
>
> Key: ARROW-2538
> URL: https://issues.apache.org/jira/browse/ARROW-2538
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Vectors
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
>
> When a data set has null values in a complex data type, it's hard to call the 
> proper writeNull method, because writeNull is declared in 
> AbstractFieldWriter, which is package-local rather than public. Moreover, 
> UnionListWriter has no implementation of writeNull, so calls fall through to 
> AbstractFieldWriter.writeNull, which throws an IllegalArgumentException by 
> default. So BaseWriter.writeNull is required to write null values inside 
> complex data types.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)