[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463622#comment-16463622
 ] 

Licht Takeuchi commented on ARROW-2273:
---

[~mitar],

Yes, it is still there. But {{SparseDataFrame}} is a naive implementation and 
has many bugs. I've spent a lot of time fixing them, but it is hard to fix them 
all. IMO, it is not yet the right time to support this in pyarrow.

> Cannot deserialize pandas SparseDataFrame
> -
>
>                 Key: ARROW-2273
>                 URL: https://issues.apache.org/jira/browse/ARROW-2273
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>            Reporter: Mitar
>            Assignee: Licht Takeuchi
>            Priority: Major
>
> >>> import pyarrow
> >>> import pandas
> >>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
> >>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "serialization.pxi", line 441, in pyarrow.lib.deserialize
>   File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
>   File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
>   File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", line 77, in _deserialize_pandas_dataframe
>     return pdcompat.serialized_dict_to_dataframe(data)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in serialized_dict_to_dataframe
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in <listcomp>
>     for block in data['blocks']]
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 2957, in make_block
>     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
>   File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 120, in __init__
>     len(self.mgr_locs)))
> ValueError: Wrong number of items passed 3, placement implies 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Mitar (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463610#comment-16463610
 ] 

Mitar commented on ARROW-2273:
--

Isn't it still open for debate whether it will be deprecated?






[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463576#comment-16463576
 ] 

Licht Takeuchi commented on ARROW-2273:
---

Okay, I'll do that.






[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-04 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463520#comment-16463520
 ] 

Uwe L. Korn commented on ARROW-2273:


Should we then simply check for that on the serialize side and raise an error?
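
A minimal sketch of what such a serialize-side check could look like. This is 
not pyarrow's actual code: the names is_sparse_dataframe and 
check_serializable are hypothetical, and matching on the class name (rather 
than importing pandas) is an assumption made only to keep the example 
self-contained.

```python
def is_sparse_dataframe(obj):
    """Return True if obj's class, or any base class, is named SparseDataFrame.

    Matching on the class name is an illustrative shortcut so the sketch
    does not need pandas installed; a real check would use isinstance.
    """
    return any(cls.__name__ == "SparseDataFrame" for cls in type(obj).__mro__)


def check_serializable(obj):
    """Hypothetical guard to call before serializing obj.

    Raises NotImplementedError for SparseDataFrame instead of producing
    bytes that fail later, on the deserialize side.
    """
    if is_sparse_dataframe(obj):
        raise NotImplementedError(
            "Serializing pandas SparseDataFrame is not supported"
        )
    return obj


if __name__ == "__main__":
    # Stand-in class used only for this demo; it is not the pandas class.
    class SparseDataFrame:
        pass

    try:
        check_serializable(SparseDataFrame())
    except NotImplementedError as exc:
        print("rejected:", exc)
```

With a guard like this, the failure would surface when the object is 
serialized rather than as the ValueError in the traceback quoted in the issue.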






[jira] [Commented] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-05-03 Thread Licht Takeuchi (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463370#comment-16463370
 ] 

Licht Takeuchi commented on ARROW-2273:
---

SparseDataFrame is planned to be deprecated.
[https://github.com/pandas-dev/pandas/issues/19239]



