[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369640#comment-16369640 ]

ASF GitHub Bot commented on ARROW-2121:
---

wesm closed pull request #1581: ARROW-2121: [Python] Handle object arrays 
directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/python/README-benchmarks.md b/python/README-benchmarks.md
index 3fecb35cb..60fa88f4a 100644
--- a/python/README-benchmarks.md
+++ b/python/README-benchmarks.md
@@ -41,8 +41,6 @@ First you have to install ASV's development version:
 pip install git+https://github.com/airspeed-velocity/asv.git
 ```
 
-
-
 Then you need to set up a few environment variables:
 
 ```shell
diff --git a/python/benchmarks/convert_pandas.py b/python/benchmarks/convert_pandas.py
index c4a7a59cb..244b3dcc8 100644
--- a/python/benchmarks/convert_pandas.py
+++ b/python/benchmarks/convert_pandas.py
@@ -48,3 +48,23 @@ def setup(self, n, dtype):
 
     def time_to_series(self, n, dtype):
         self.arrow_data.to_pandas()
+
+
+class ZeroCopyPandasRead(object):
+
+    def setup(self):
+        # Transpose to make column-major
+        values = np.random.randn(10, 10)
+
+        df = pd.DataFrame(values.T)
+        ctx = pa.default_serialization_context()
+
+        self.serialized = ctx.serialize(df)
+        self.as_buffer = self.serialized.to_buffer()
+        self.as_components = self.serialized.to_components()
+
+    def time_deserialize_from_buffer(self):
+        pa.deserialize(self.as_buffer)
+
+    def time_deserialize_from_components(self):
+        pa.deserialize_components(self.as_components)
diff --git a/python/doc/source/ipc.rst b/python/doc/source/ipc.rst
index 9bf93ffe8..bce8b1ed1 100644
--- a/python/doc/source/ipc.rst
+++ b/python/doc/source/ipc.rst
@@ -317,9 +317,8 @@ An object can be reconstructed from its component-based representation using
 Serializing pandas Objects
 --
 
-We provide a serialization context that has optimized handling of pandas
-objects like ``DataFrame`` and ``Series``. This can be created with
-``pyarrow.pandas_serialization_context()``. Combined with component-based
+The default serialization context has optimized handling of pandas
+objects like ``DataFrame`` and ``Series``. Combined with component-based
 serialization above, this enables zero-copy transport of pandas DataFrame
 objects not containing any Python objects:
 
@@ -327,7 +326,7 @@ objects not containing any Python objects:
 
    import pandas as pd
    df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
-   context = pa.pandas_serialization_context()
+   context = pa.default_serialization_context()
    serialized_df = context.serialize(df)
    df_components = serialized_df.to_components()
    original_df = context.deserialize_components(df_components)
diff --git a/python/pyarrow/__init__.py b/python/pyarrow/__init__.py
index d95954ed3..15a37ca10 100644
--- a/python/pyarrow/__init__.py
+++ b/python/pyarrow/__init__.py
@@ -125,7 +125,6 @@
 localfs = LocalFileSystem.get_instance()
 
 from pyarrow.serialization import (default_serialization_context,
-   pandas_serialization_context,
register_default_serialization_handlers,
register_torch_serialization_handlers)
 
diff --git a/python/pyarrow/pandas_compat.py b/python/pyarrow/pandas_compat.py
index e8fa83fe7..6d4bf5e78 100644
--- a/python/pyarrow/pandas_compat.py
+++ b/python/pyarrow/pandas_compat.py
@@ -27,7 +27,7 @@
 import six
 
 import pyarrow as pa
-from pyarrow.compat import PY2, zip_longest  # noqa
+from pyarrow.compat import builtin_pickle, PY2, zip_longest  # noqa
 
 
 def infer_dtype(column):
@@ -424,11 +424,19 @@ def dataframe_to_serialized_dict(frame):
             block_data.update(dictionary=values.categories,
                               ordered=values.ordered)
             values = values.codes
-
         block_data.update(
             placement=block.mgr_locs.as_array,
             block=values
         )
+
+        # If we are dealing with an object array, pickle it instead. Note that
+        # we do not use isinstance here because _int.CategoricalBlock is a
+        # subclass of _int.ObjectBlock.
+        if type(block) == _int.ObjectBlock:
+            block_data['object'] = None
+            block_data['block'] = builtin_pickle.dumps(
+                values, protocol=builtin_pickle.HIGHEST_PROTOCOL)
+
         blocks.append(block_data)
 
     return {
@@ -463,6 +471,9 @@ def _reconstruct_block(item):
 block = _int.make_block(block_arr, placement=placement,
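
The hunk above explains why the patch compares `type(block) == _int.ObjectBlock` instead of calling `isinstance`: `_int.CategoricalBlock` is a subclass of `_int.ObjectBlock`, so an `isinstance` check would also pickle categorical blocks. A minimal sketch of the difference, using hypothetical stand-in classes rather than pandas' internal block types (which have changed considerably since this patch):

```python
# Hypothetical stand-ins for pandas' internal block classes; the real
# _int.ObjectBlock / _int.CategoricalBlock are not used here.
class ObjectBlock(object):
    pass


class CategoricalBlock(ObjectBlock):
    pass


def should_pickle_isinstance(block):
    # Matches ObjectBlock *and* every subclass, including CategoricalBlock.
    return isinstance(block, ObjectBlock)


def should_pickle_exact(block):
    # Matches only blocks whose class is exactly ObjectBlock, mirroring the
    # `type(block) == _int.ObjectBlock` check in the patch.
    return type(block) is ObjectBlock


blocks = [ObjectBlock(), CategoricalBlock()]
print([should_pickle_isinstance(b) for b in blocks])  # [True, True]
print([should_pickle_exact(b) for b in blocks])       # [True, False]
```

With the exact-type check, categorical blocks keep their codes-plus-dictionary encoding rather than being pickled. The sketch uses `is`, which behaves the same as the patch's `==` for ordinary classes.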

[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369638#comment-16369638 ]

ASF GitHub Bot commented on ARROW-2121:
---

wesm commented on issue #1581: ARROW-2121: [Python] Handle object arrays 
directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-366845422
 
 
   Merging this, since the last Appveyor build had passed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consider special casing object arrays in pandas serializers.
> 
>
> Key: ARROW-2121
> URL: https://issues.apache.org/jira/browse/ARROW-2121
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Robert Nishihara
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369616#comment-16369616 ]

ASF GitHub Bot commented on ARROW-2121:
---

robertnishihara commented on issue #1581: ARROW-2121: [Python] Handle object 
arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-366835235
 
 
   Thanks @wesm I *think* I've enabled it now.


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369606#comment-16369606 ]

ASF GitHub Bot commented on ARROW-2121:
---

wesm commented on issue #1581: ARROW-2121: [Python] Handle object arrays 
directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-366831288
 
 
   @robertnishihara would you mind enabling appveyor on your fork when you have 
a chance? 


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369590#comment-16369590 ]

ASF GitHub Bot commented on ARROW-2121:
---

wesm commented on issue #1581: ARROW-2121: [Python] Handle object arrays 
directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-366821860
 
 
   Sorry for the delay, looking now, and may as well add a benchmark for 
zero-copy DataFrame while I'm at it


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-12 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360836#comment-16360836 ]

ASF GitHub Bot commented on ARROW-2121:
---

wesm commented on issue #1581: ARROW-2121: [Python] Handle object arrays 
directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-364946568
 
 
   Yep, I have this on deck to look at today


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-09 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359097#comment-16359097 ]

ASF GitHub Bot commented on ARROW-2121:
---

robertnishihara commented on issue #1581: ARROW-2121: [Python] Handle object 
arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-364599769
 
 
   Ok, I'm pretty happy with this now. @wesm @pcmoritz let me know if you have 
any comments.


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-09 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359080#comment-16359080 ]

ASF GitHub Bot commented on ARROW-2121:
---

robertnishihara commented on issue #1581: ARROW-2121: [Python] Handle object 
arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-364595801
 
 
   Let's not merge this just yet, I'd like to brainstorm other approaches a 
little.


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-09 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358991#comment-16358991 ]

ASF GitHub Bot commented on ARROW-2121:
---

robertnishihara commented on a change in pull request #1581: ARROW-2121: 
[Python] Handle object arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#discussion_r167156435
 
 

 ##
 File path: python/pyarrow/pandas_compat.py
 ##
 @@ -421,11 +421,18 @@ def dataframe_to_serialized_dict(frame):
             block_data.update(dictionary=values.categories,
                               ordered=values.ordered)
             values = values.codes
-
         block_data.update(
             placement=block.mgr_locs.as_array,
             block=values
         )
+
+        # If we are dealing with an object array, pickle it instead. Note that
+        # we do not use isinstance here because _int.CategoricalBlock is a
+        # subclass of _int.ObjectBlock.
+        if type(block) == _int.ObjectBlock:
+            block_data['object'] = None
+            block_data['block'] = builtin_pickle.dumps(values)
 
 Review comment:
   Should we be using `_pickle_to_buffer` here? Does that make a difference?


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-09 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358992#comment-16358992 ]

ASF GitHub Bot commented on ARROW-2121:
---

robertnishihara commented on a change in pull request #1581: ARROW-2121: 
[Python] Handle object arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#discussion_r167350906
 
 

 ##
 File path: python/pyarrow/pandas_compat.py
 ##
 @@ -421,11 +421,19 @@ def dataframe_to_serialized_dict(frame):
             block_data.update(dictionary=values.categories,
                               ordered=values.ordered)
             values = values.codes
-
         block_data.update(
             placement=block.mgr_locs.as_array,
             block=values
         )
+
+        # If we are dealing with an object array, pickle it instead. Note that
+        # we do not use isinstance here because _int.CategoricalBlock is a
+        # subclass of _int.ObjectBlock.
+        if type(block) == _int.ObjectBlock:
+            block_data['object'] = None
+            block_data['block'] = builtin_pickle.dumps(
+                values, protocol=builtin_pickle.HIGHEST_PROTOCOL)
 
 Review comment:
   Should we be using `_pickle_to_buffer` here? Does that make a difference?
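
Independent of the `_pickle_to_buffer` question, this revision of the hunk adds `protocol=builtin_pickle.HIGHEST_PROTOCOL`. That matters most on Python 2, which pyarrow still supported at the time and where `pickle.dumps` defaults to the text-based protocol 0; the binary protocols generally produce smaller payloads that load faster. A stdlib-only sketch comparing the two on an illustrative mixed int/str object list (the data is made up, not taken from the PR):

```python
import pickle

# Illustrative object column: interleaved ints and strings, loosely modeled on
# the mixed-type columns in the benchmark DataFrame.
values = [x for i in range(1000) for x in (i, str(i))]

text_payload = pickle.dumps(values, protocol=0)
binary_payload = pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL)

# Both payloads round-trip to the same list; only the encoding differs.
assert pickle.loads(text_payload) == values
assert pickle.loads(binary_payload) == values

print(len(text_payload), len(binary_payload))
```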


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-09 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358990#comment-16358990 ]

ASF GitHub Bot commented on ARROW-2121:
---

robertnishihara commented on issue #1581: ARROW-2121: [Python] Handle object 
arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-364573786
 
 
   Some performance numbers. The numbers are somewhat variable if you run the 
benchmarks multiple times.
   
   ```python
   import pyarrow as pa
   import pandas as pd
   df = pd.DataFrame(data={str(i): [i, str(i)] for i in range(10 ** 6)})
   ```
   
   Before this PR
   
   ```python
   context = pa.pandas_serialization_context()
   
   %time s = pa.serialize(df, context=context).to_buffer()  # 570ms
   %time d = pa.deserialize(s, context=context)  # 485ms
   
   %timeit s = pa.serialize(df, context=context).to_buffer()  # 482ms
   %timeit d = pa.deserialize(s, context=context)  # 376ms
   ```
   
   After this PR
   
   ```python
   %time s = pa.serialize(df).to_buffer()  # 577ms
   %time d = pa.deserialize(s)  # 672ms
   
   %timeit s = pa.serialize(df).to_buffer()  # 467ms
   %timeit d = pa.deserialize(s)  # 349ms
   ```
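
The numbers above vary between runs, which is why the comment reports both `%time` and `%timeit`. Outside IPython, a similar effect can be had with `timeit.repeat`, keeping the fastest run. A hedged stdlib-only sketch: it times `pickle` round trips on a stand-in object list rather than `pa.serialize` / `pa.deserialize` (that serialization API was deprecated in later pyarrow releases), so substitute the calls you actually want to measure:

```python
import pickle
import timeit

# Stand-in payload: many small mixed int/str values, like an object column.
values = [x for i in range(100000) for x in (i, str(i))]
payload = pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL)


def best_of(stmt, repeat=5, number=1):
    """Return the fastest of `repeat` timings, in seconds per call."""
    timings = timeit.repeat(stmt, repeat=repeat, number=number)
    return min(timings) / number


serialize_s = best_of(
    lambda: pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL))
deserialize_s = best_of(lambda: pickle.loads(payload))
print("serialize:   %.1f ms" % (serialize_s * 1e3))
print("deserialize: %.1f ms" % (deserialize_s * 1e3))
```

The minimum is reported because slower runs mostly reflect interference from other processes rather than the code being timed.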


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358038#comment-16358038 ]

ASF GitHub Bot commented on ARROW-2121:
---

robertnishihara commented on a change in pull request #1581: ARROW-2121: 
[Python] Handle object arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#discussion_r167156435
 
 

 ##
 File path: python/pyarrow/pandas_compat.py
 ##
 @@ -421,11 +421,18 @@ def dataframe_to_serialized_dict(frame):
             block_data.update(dictionary=values.categories,
                               ordered=values.ordered)
             values = values.codes
-
         block_data.update(
             placement=block.mgr_locs.as_array,
             block=values
         )
+
+        # If we are dealing with an object array, pickle it instead. Note that
+        # we do not use isinstance here because _int.CategoricalBlock is a
+        # subclass of _int.ObjectBlock.
+        if type(block) == _int.ObjectBlock:
+            block_data['object'] = None
+            block_data['block'] = builtin_pickle.dumps(values)
 
 Review comment:
   Should we be using `_pickle_to_buffer` here? Does that make a difference?


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357976#comment-16357976 ]

ASF GitHub Bot commented on ARROW-2121:
---

robertnishihara commented on a change in pull request #1581: [WIP] ARROW-2121: 
[Python] Handle object arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#discussion_r167148817
 
 

 ##
 File path: python/pyarrow/pandas_compat.py
 ##
 @@ -421,11 +421,16 @@ def dataframe_to_serialized_dict(frame):
             block_data.update(dictionary=values.categories,
                               ordered=values.ordered)
             values = values.codes
-
         block_data.update(
             placement=block.mgr_locs.as_array,
             block=values
         )
+
+        # If we are dealing with an object array, pickle it instead.
+        if isinstance(block, _int.ObjectBlock):
+            block_data['object'] = None
+            block_data['block'] = builtin_pickle.dumps(values)
 
 Review comment:
   Should we be using `_pickle_to_buffer` here? Does that make a difference?


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357967#comment-16357967 ]

ASF GitHub Bot commented on ARROW-2121:
---

wesm commented on issue #1581: [WIP] ARROW-2121: [Python] Handle object arrays 
directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-364344672
 
 
   Well, we need to preserve the zero-copy pandas reads. Now that our ASV 
benchmarking setup has been rehabilitated we should be able to do that in this 
patch to verify performance
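
The point here is that the patch must not regress zero-copy reads, and the merged diff adds an ASV benchmark (`ZeroCopyPandasRead`) for exactly that. ASV benchmarks are plain classes: ASV calls `setup()` before timing and times each method whose name starts with `time_`. A hypothetical stdlib-only analogue that contrasts a zero-copy read (a `memoryview` over an existing buffer) with a copying read (a `bytes` slice); the class name and data size are illustrative assumptions:

```python
import array


class ZeroCopyBufferRead(object):
    """Hypothetical ASV-style benchmark: ASV runs setup(), then times time_*."""

    def setup(self):
        # One million doubles (8 MB) standing in for a numeric DataFrame block.
        self.raw = array.array('d', range(1000000)).tobytes()

    def time_zero_copy_view(self):
        # Slicing a memoryview shares the underlying memory; nothing is copied.
        # cast('d') reinterprets the bytes as doubles, again without copying.
        return memoryview(self.raw)[8:].cast('d')

    def time_copying_slice(self):
        # Slicing bytes allocates a new object and copies about 8 MB.
        return self.raw[8:]
```

Outside ASV, the methods can be called directly after `setup()`. Under ASV, the copying slice would be expected to scale with the buffer size while the view stays roughly constant.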


[jira] [Commented] (ARROW-2121) Consider special casing object arrays in pandas serializers.

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357951#comment-16357951 ]

ASF GitHub Bot commented on ARROW-2121:
---

robertnishihara opened a new pull request #1581: [WIP] ARROW-2121: [Python] 
Handle object arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581
 
 
   The goal here is to get the best of both the `pandas_serialization_context` 
(speed at serializing pandas dataframes containing strings and other objects) 
and the `default_serialization_context` (correctly serializing a large class of 
numpy object arrays).
   
   This PR sort of messes up the function 
`pa.pandas_compat.dataframe_to_serialized_dict`. Is that function just a helper 
function for implementing the custom pandas serializers? Or is it intended to 
be used in other places?
   
   TODO in this PR (assuming you think this approach is reasonable):
   
   - [ ] remove `pandas_serialization_context`
   - [ ] make sure this code path is tested
   - [ ] double check that performance is good
   
   cc @wesm @pcmoritz @devin-petersohn 
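
The approach described above, and merged in the diff at the top of this thread, keeps numeric blocks as raw buffers (so they can be transported without copying) and pickles only object blocks. A simplified, stdlib-only sketch of that split under stated assumptions: a "frame" is modeled as a hypothetical dict mapping column names to either an `array.array` (numeric) or a Python list (objects), and `serialize_columns` / `deserialize_columns` are illustrative names, not pyarrow's API:

```python
import array
import pickle


def serialize_columns(columns):
    """Hypothetical, simplified analogue of dataframe_to_serialized_dict."""
    out = {}
    for name, values in columns.items():
        if isinstance(values, array.array):
            # Numeric column: keep a byte view of the existing buffer (no copy).
            out[name] = {'typecode': values.typecode,
                         'buffer': memoryview(values).cast('B')}
        else:
            # Object column: pickle it and mark it with an 'object' key,
            # mirroring the block_data['object'] flag in the patch.
            out[name] = {'object': None,
                         'block': pickle.dumps(
                             values, protocol=pickle.HIGHEST_PROTOCOL)}
    return out


def deserialize_columns(serialized):
    columns = {}
    for name, item in serialized.items():
        if 'object' in item:
            columns[name] = pickle.loads(item['block'])
        else:
            # Reinterpret the bytes with the original typecode; cast() is a view.
            columns[name] = item['buffer'].cast(item['typecode'])
    return columns


frame = {'x': array.array('d', [1.0, 2.0, 3.0]), 's': ['a', 1, None]}
restored = deserialize_columns(serialize_columns(frame))

frame['x'][0] = 42.0           # mutate the source array in place
print(restored['x'].tolist())  # [42.0, 2.0, 3.0]: the view shares memory
print(restored['s'])           # ['a', 1, None]
```

In a real transport the byte views would be written to shared memory or a socket; in this in-process sketch, the zero-copy property shows up as the restored numeric column sharing memory with the original array.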

