[jira] [Commented] (ARROW-1950) [Python] pandas_type in pandas metadata incorrect for List types
[ https://issues.apache.org/jira/browse/ARROW-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356405#comment-16356405 ] ASF GitHub Bot commented on ARROW-1950: --- cpcloud commented on issue #1571: ARROW-1950: [Python] pandas_type in pandas metadata incorrect for List types URL: https://github.com/apache/arrow/pull/1571#issuecomment-363983929 Sweet. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] pandas_type in pandas metadata incorrect for List types > > > Key: ARROW-1950 > URL: https://issues.apache.org/jira/browse/ARROW-1950 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > see https://github.com/pandas-dev/pandas/pull/18201#issuecomment-353042438 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1950) [Python] pandas_type in pandas metadata incorrect for List types
[ https://issues.apache.org/jira/browse/ARROW-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356401#comment-16356401 ] ASF GitHub Bot commented on ARROW-1950: --- cpcloud closed pull request #1571: ARROW-1950: [Python] pandas_type in pandas metadata incorrect for List types URL: https://github.com/apache/arrow/pull/1571 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/cpp/src/arrow/python/arrow_to_pandas.cc b/cpp/src/arrow/python/arrow_to_pandas.cc index fcf05f833..a17d14bf6 100644 --- a/cpp/src/arrow/python/arrow_to_pandas.cc +++ b/cpp/src/arrow/python/arrow_to_pandas.cc @@ -56,8 +56,8 @@ namespace arrow { namespace py { -using internal::kPandasTimestampNull; using internal::kNanosecondsInDay; +using internal::kPandasTimestampNull; using compute::Datum; @@ -90,7 +90,6 @@ struct WrapBytes { static inline bool ListTypeSupported(const DataType& type) { switch (type.id()) { -case Type::NA: case Type::UINT8: case Type::INT8: case Type::UINT16: @@ -104,6 +103,7 @@ static inline bool ListTypeSupported(const DataType& type) { case Type::BINARY: case Type::STRING: case Type::TIMESTAMP: +case Type::NA: // empty list // The above types are all supported. return true; case Type::LIST: { @@ -696,7 +696,6 @@ class ObjectBlock : public PandasBlock { } else if (type == Type::LIST) { auto list_type = std::static_pointer_cast(col->type()); switch (list_type->value_type()->id()) { -CONVERTLISTSLIKE_CASE(FloatType, NA) CONVERTLISTSLIKE_CASE(UInt8Type, UINT8) CONVERTLISTSLIKE_CASE(Int8Type, INT8) CONVERTLISTSLIKE_CASE(UInt16Type, UINT16) @@ -711,6 +710,7 @@ class ObjectBlock : public PandasBlock { CONVERTLISTSLIKE_CASE(BinaryType, BINARY) CONVERTLISTSLIKE_CASE(StringType, STRING) CONVERTLISTSLIKE_CASE(ListType, LIST) +CONVERTLISTSLIKE_CASE(NullType, NA) default: { std::stringstream ss; ss << "Not implemented type for conversion from List to Pandas ObjectBlock: " diff --git a/python/pyarrow/pandas_compat.py b/python/pyarrow/pandas_compat.py index 987bb7555..f5e56a9b2 100644 --- a/python/pyarrow/pandas_compat.py +++ b/python/pyarrow/pandas_compat.py @@ -45,7 +45,7 @@ def get_logical_type_map(): if not _logical_type_map: _logical_type_map.update({ -pa.lib.Type_NA: 'float64', # NaNs +pa.lib.Type_NA: 'empty', pa.lib.Type_BOOL: 'bool', pa.lib.Type_INT8: 'int8', pa.lib.Type_INT16: 'int16', diff --git a/python/pyarrow/tests/test_array.py b/python/pyarrow/tests/test_array.py index 1d5d30071..efbcef5e1 100644 --- a/python/pyarrow/tests/test_array.py +++ b/python/pyarrow/tests/test_array.py @@ -455,7 +455,7 @@ def test_simple_type_construction(): @pytest.mark.parametrize( ('type', 'expected'), [ -(pa.null(), 'float64'), +(pa.null(), 'empty'), (pa.bool_(), 'bool'), (pa.int8(), 'int8'), (pa.int16(), 'int16'), diff --git a/python/pyarrow/tests/test_convert_pandas.py b/python/pyarrow/tests/test_convert_pandas.py index 4f0a68729..7dbf0d7ed 100644 --- a/python/pyarrow/tests/test_convert_pandas.py +++ b/python/pyarrow/tests/test_convert_pandas.py @@ -1404,6 +1404,57 @@ def test_empty_list_roundtrip(self): tm.assert_frame_equal(result, df) +def test_empty_list_metadata(self): +# Create table with array of empty lists, forced to have type +# list(string) in pyarrow +c1 = [["test"], ["a", "b"], None] +c2 = [[], [], []] +arrays = OrderedDict([ +('c1', pa.array(c1, type=pa.list_(pa.string(, +('c2', pa.array(c2, type=pa.list_(pa.string(, +]) +rb = pa.RecordBatch.from_arrays( +list(arrays.values()), +list(arrays.keys()) +) +tbl = pa.Table.from_batches([rb]) + +# First roundtrip changes schema, because pandas cannot preserve the +# type of empty lists +df = tbl.to_pandas() +tbl2 = pa.Table.from_pandas(df, preserve_index=True) +md2 = json.loads(tbl2.schema.metadata[b'pandas'].decode('utf8')) + +# Second roundtrip +df2 = tbl2.to_pandas() +expected = pd.DataFrame(OrderedDict([('c1', c1), ('c2', c2)])) + +tm.assert_frame_equal(df2, expected) + +assert md2['columns'] == [ +{ +'name': 'c1', +'field_name': 'c1', +'metadata': None, +'numpy_type': 'object', +'pandas_type': 'list[unicode]', +}, +
[jira] [Commented] (ARROW-1950) [Python] pandas_type in pandas metadata incorrect for List types
[ https://issues.apache.org/jira/browse/ARROW-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356039#comment-16356039 ] ASF GitHub Bot commented on ARROW-1950: --- cpcloud commented on issue #1571: ARROW-1950: [Python] pandas_type in pandas metadata incorrect for List types URL: https://github.com/apache/arrow/pull/1571#issuecomment-363902634 @xhochy do you mind if i merge this one when it passes? i want to see if my gitbox powers are working. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] pandas_type in pandas metadata incorrect for List types > > > Key: ARROW-1950 > URL: https://issues.apache.org/jira/browse/ARROW-1950 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > see https://github.com/pandas-dev/pandas/pull/18201#issuecomment-353042438 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1950) [Python] pandas_type in pandas metadata incorrect for List types
[ https://issues.apache.org/jira/browse/ARROW-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355935#comment-16355935 ] ASF GitHub Bot commented on ARROW-1950: --- cpcloud commented on a change in pull request #1571: ARROW-1950: [Python] pandas_type in pandas metadata incorrect for List types URL: https://github.com/apache/arrow/pull/1571#discussion_r166729132 ## File path: python/pyarrow/tests/test_convert_pandas.py ## @@ -1404,6 +1404,57 @@ def test_empty_list_roundtrip(self): tm.assert_frame_equal(result, df) +def test_empty_list_metadata(self): +# Create table with array of empty lists, forced to have type +# list(string) in pyarrow +c1 = [["test"], ["a", "b"], None] +c2 = [[], [], []] +arrays = { Review comment: Yep, thanks! I saw that on the CI :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] pandas_type in pandas metadata incorrect for List types > > > Key: ARROW-1950 > URL: https://issues.apache.org/jira/browse/ARROW-1950 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > see https://github.com/pandas-dev/pandas/pull/18201#issuecomment-353042438 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1950) [Python] pandas_type in pandas metadata incorrect for List types
[ https://issues.apache.org/jira/browse/ARROW-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355924#comment-16355924 ] ASF GitHub Bot commented on ARROW-1950: --- xhochy commented on a change in pull request #1571: ARROW-1950: [Python] pandas_type in pandas metadata incorrect for List types URL: https://github.com/apache/arrow/pull/1571#discussion_r166728024 ## File path: python/pyarrow/tests/test_convert_pandas.py ## @@ -1404,6 +1404,57 @@ def test_empty_list_roundtrip(self): tm.assert_frame_equal(result, df) +def test_empty_list_metadata(self): +# Create table with array of empty lists, forced to have type +# list(string) in pyarrow +c1 = [["test"], ["a", "b"], None] +c2 = [[], [], []] +arrays = { Review comment: You will need to use `OrderedDict` here probably to get the same ordering of the results in all Python versions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] pandas_type in pandas metadata incorrect for List types > > > Key: ARROW-1950 > URL: https://issues.apache.org/jira/browse/ARROW-1950 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > see https://github.com/pandas-dev/pandas/pull/18201#issuecomment-353042438 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1950) [Python] pandas_type in pandas metadata incorrect for List types
[ https://issues.apache.org/jira/browse/ARROW-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355713#comment-16355713 ] ASF GitHub Bot commented on ARROW-1950: --- cpcloud commented on issue #1571: ARROW-1950: [Python] pandas_type in pandas metadata incorrect for List types URL: https://github.com/apache/arrow/pull/1571#issuecomment-363835406 I'm going to add a C++ test for empty list typed arrays as well. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] pandas_type in pandas metadata incorrect for List types > > > Key: ARROW-1950 > URL: https://issues.apache.org/jira/browse/ARROW-1950 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > see https://github.com/pandas-dev/pandas/pull/18201#issuecomment-353042438 -- This message was sent by Atlassian JIRA (v7.6.3#76005)