Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-27 Thread via GitHub


bretttully commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2504975262

   🚀  Thanks for all your help here @jorisvandenbossche!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-27 Thread via GitHub


conbench-apache-arrow[bot] commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2504894479

   After merging your PR, Conbench analyzed the 3 benchmarking runs that have 
been run so far on merge-commit 8548c2261b4753f4394ec20aa9c1f35f5e0b870e.
   
   There were 132 benchmark results with an error:
   
   - Commit Run on `arm64-t4g-2xlarge-linux` at [2024-11-27 
18:04:38Z](https://conbench.ursa.dev/compare/runs/979df96ff09c41a2bd69ba53cb73dd60...eab4f6f95e38465fb6ae7b0271d967de/)
     - [`tpch` (R) with engine=arrow, format=parquet, language=R, 
memory_map=False, query_id=TPCH-07, 
scale_factor=1](https://conbench.ursa.dev/benchmark-results/067476d564267c988000ed4ac788c08a)
     - [`tpch` (R) with engine=arrow, format=native, language=R, 
memory_map=False, query_id=TPCH-09, 
scale_factor=1](https://conbench.ursa.dev/benchmark-results/067476d7d038754180003cc466d663d8)
   - and 130 more (see the report linked below)
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/33627078459) 
has more details. It also includes information about 1 possible false positive 
for unstable benchmarks that are known to sometimes produce them.





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-27 Thread via GitHub


jorisvandenbossche merged PR #44720:
URL: https://github.com/apache/arrow/pull/44720





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-27 Thread via GitHub


jorisvandenbossche commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2504271277

   > Is there anything else for me to do here?
   
   No, just me getting back to merge it!





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-25 Thread via GitHub


raulcd commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2497346892

   > Is there anything else for me to do here?
   
   I don't think so. I am not comfortable with this area of our codebase so 
I'll let @jorisvandenbossche merge once he's happy about it, but as he already 
approved, he might do that soon.





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-23 Thread via GitHub


bretttully commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2495762911

   Is there anything else for me to do here?





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-21 Thread via GitHub


github-actions[bot] commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490984364

   Revision: 685167fb8dc28190f9fd8600bca5df7799663e5a
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-524e782c26](https://github.com/ursacomputing/crossbow/branches/all?query=actions-524e782c26)
   
   |Task|Status|
   |----|------|
   |example-python-minimal-build-fedora-conda|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-524e782c26-github-example-python-minimal-build-fedora-conda)](https://github.com/ursacomputing/crossbow/actions/runs/11953010576/job/33319986570)|
   |example-python-minimal-build-ubuntu-venv|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-524e782c26-github-example-python-minimal-build-ubuntu-venv)](https://github.com/ursacomputing/crossbow/actions/runs/11953010367/job/33319986020)|





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-21 Thread via GitHub


raulcd commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490977517

   @github-actions crossbow submit example-python-minimal-build-*





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-21 Thread via GitHub


bretttully commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490873819

   I have merged `upstream/main` and pushed tags. Let's see if this works...





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-21 Thread via GitHub


jorisvandenbossche commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490844257

   Thanks for investigating that! 
   
   So then to resolve this here, @bretttully should fetch the upstream tags and 
push them to his fork? Something like
   ```
   git fetch upstream
   git push origin --tags
   ```
   
   (assuming upstream is apache/arrow and origin is bretttully/arrow)





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-21 Thread via GitHub


raulcd commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490666176

   I've opened an issue because we should find a way to not fail if the dev tag 
is not present:
   - https://github.com/apache/arrow/issues/44803





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-21 Thread via GitHub


raulcd commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490360277

   > The logs indicate "Successfully installed pyarrow-0.1.dev16896+ge3b9892"
   
   From the git checkout, I see it is pulling from the fork (`Syncing 
repository: bretttully/arrow`). I recall an issue where, if the dev tags are 
not present, we are unable to detect the correct version. The remote doesn't 
seem to have other branches or tags.
   





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-21 Thread via GitHub


jorisvandenbossche commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490325537

   (the other failures are the known nightly dlpack failures)





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-21 Thread via GitHub


jorisvandenbossche commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490314967

   @raulcd it seems something is going wrong with the minimal test builds (e.g. 
example-python-minimal-build-fedora-conda). The logs indicate "Successfully 
installed pyarrow-0.1.dev16896+ge3b9892", which then breaks pandas' detection 
of the pyarrow version (for its pyarrow integration, pandas checks whether 
pyarrow is recent enough and errors otherwise), causing some test failures. 
   
   (I'm also not entirely sure how this PR causes this issue, since I don't see 
the nightlies failing for the minimal builds at the moment.)
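
A minimal sketch of why a version string such as `0.1.dev16896+ge3b9892` would fail a "recent enough" check. This is not pandas' actual check, which is not shown in this thread; `release_tuple`, `is_recent_enough`, and the minimum version used here are hypothetical and only illustrate the comparison.

```python
import re

# Hypothetical minimum version, chosen only for illustration.
HYPOTHETICAL_MINIMUM = "10.0.1"


def release_tuple(version):
    """Return the leading numeric release segment as a tuple of ints.

    Anything after the release segment (".dev...", "+g<hash>") is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_recent_enough(installed, minimum):
    """Compare release segments only; a simplification of real version rules."""
    return release_tuple(installed) >= release_tuple(minimum)


# The version reported in the build logs parses as release 0.1, so it looks
# far older than any realistic minimum.
print(release_tuple("0.1.dev16896+ge3b9892"))
print(is_recent_enough("0.1.dev16896+ge3b9892", HYPOTHETICAL_MINIMUM))
```

With the dev tags missing, the build falls back to a `0.1.dev...` version, so any comparison of this kind reports the freshly built pyarrow as too old.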





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-20 Thread via GitHub


github-actions[bot] commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2489757867

   Revision: e3b9892e5663f4888bc79c38b9d36bbefcdaf2b4
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-e01b93275b](https://github.com/ursacomputing/crossbow/branches/all?query=actions-e01b93275b)
   
   |Task|Status|
   ||--|
   |example-python-minimal-build-fedora-conda|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-example-python-minimal-build-fedora-conda)](https://github.com/ursacomputing/crossbow/actions/runs/11943636788/job/33293053196)|
   |example-python-minimal-build-ubuntu-venv|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-example-python-minimal-build-ubuntu-venv)](https://github.com/ursacomputing/crossbow/actions/runs/11943636730/job/33293052678)|
   |test-conda-python-3.10|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10)](https://github.com/ursacomputing/crossbow/actions/runs/11943636596/job/33293051382)|
   |test-conda-python-3.10-cython2|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10-cython2)](https://github.com/ursacomputing/crossbow/actions/runs/11943636683/job/33293052512)|
   |test-conda-python-3.10-hdfs-2.9.2|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10-hdfs-2.9.2)](https://github.com/ursacomputing/crossbow/actions/runs/11943636650/job/33293052279)|
   |test-conda-python-3.10-hdfs-3.2.1|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10-hdfs-3.2.1)](https://github.com/ursacomputing/crossbow/actions/runs/11943636573/job/33293051473)|
   |test-conda-python-3.10-pandas-latest-numpy-latest|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10-pandas-latest-numpy-latest)](https://github.com/ursacomputing/crossbow/actions/runs/11943636852/job/33293053431)|
   |test-conda-python-3.10-substrait|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10-substrait)](https://github.com/ursacomputing/crossbow/actions/runs/11943636844/job/33293053233)|
   |test-conda-python-3.11|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11)](https://github.com/ursacomputing/crossbow/actions/runs/11943636757/job/33293053197)|
   |test-conda-python-3.11-dask-latest|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-dask-latest)](https://github.com/ursacomputing/crossbow/actions/runs/11943636907/job/33293053925)|
   |test-conda-python-3.11-dask-upstream_devel|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-dask-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/1194363/job/33293052227)|
   |test-conda-python-3.11-hypothesis|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-hypothesis)](https://github.com/ursacomputing/crossbow/actions/runs/11943636858/job/33293053951)|
   |test-conda-python-3.11-pandas-latest-numpy-1.26|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-pandas-latest-numpy-1.26)](https://github.com/ursacomputing/crossbow/actions/runs/11943636845/job/33293053428)|
   |test-conda-python-3.11-pandas-latest-numpy-latest|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-pandas-latest-numpy-latest)](https://github.com/ursacomputing/crossbow/actions/runs/11943636971/job/33293054692)|
   |test-conda-python-3.11-pandas-nightly-numpy-nightly|[![GitHub 
Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-pandas-nightly-numpy-nightly)](https://github.com/ursacomputing/crossbow/actions/runs/11943636614/job/33293051378)|
   |test-conda-python-3.11-pandas-upstream_devel-numpy-nightly|[![GitHub 
A

Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-20 Thread via GitHub


jorisvandenbossche commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2489754835

   @github-actions crossbow submit -g python





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-20 Thread via GitHub


jorisvandenbossche commented on code in PR #44720:
URL: https://github.com/apache/arrow/pull/44720#discussion_r1851119942


##
python/pyarrow/tests/test_pandas.py:
##
@@ -4411,6 +4411,32 @@ def test_to_pandas_extension_dtypes_mapping():
     assert isinstance(result['a'].dtype, pd.PeriodDtype)
 
 
+ 

Review Comment:
   ```suggestion
   ```
   
   Small whitespace issue the linter is complaining about






Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-20 Thread via GitHub


jorisvandenbossche commented on code in PR #44720:
URL: https://github.com/apache/arrow/pull/44720#discussion_r1849308797


##
python/pyarrow/tests/test_pandas.py:
##
@@ -4411,6 +4411,29 @@ def test_to_pandas_extension_dtypes_mapping():
     assert isinstance(result['a'].dtype, pd.PeriodDtype)
 
 
+
+def test_to_pandas_extension_dtypes_mapping_complex_type():

Review Comment:
   ```suggestion
   def test_to_pandas_extension_dtypes_mapping_complex_type():
       # https://github.com/apache/arrow/pull/44720
   ```






Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-20 Thread via GitHub


raulcd commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2488018513

   > is the process that I can merge this following approval, or is that done 
by a core maintainer?
   
   A committer will merge, probably @jorisvandenbossche in this specific case, 
once everything is running and addressed. I've triggered CI for the latest 
changes.





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-19 Thread via GitHub


bretttully commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2487214002

   Thanks @jorisvandenbossche -- is the process that I can merge this following 
approval, or is that done by a core maintainer?





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-14 Thread via GitHub


bretttully commented on code in PR #44720:
URL: https://github.com/apache/arrow/pull/44720#discussion_r1843171313


##
python/pyarrow/tests/test_pandas.py:
##
@@ -4411,6 +4412,31 @@ def test_to_pandas_extension_dtypes_mapping():
     assert isinstance(result['a'].dtype, pd.PeriodDtype)
 
 
+
+def test_to_pandas_extension_dtypes_mapping_complex_type():
+    pa_type = pa.struct(
+        [
+            pa.field("bar", pa.bool_(), nullable=False),
+            pa.field("baz", pa.float32(), nullable=True),
+        ],
+    )
+    pd_type = pd.ArrowDtype(pa_type)
+    schema = pa.schema([pa.field("foo", pa_type)])
+    df0 = pd.DataFrame(
+        [
+            {"foo": {"bar": True, "baz": np.float32(1)}},
+            {"foo": {"bar": True, "baz": None}},
+        ],
+    ).astype({"foo": pd_type})
+
+    # Round trip df0 into df1
+    with io.BytesIO() as stream:
+        df0.to_parquet(stream, schema=schema)
+        stream.seek(0)
+        df1 = pd.read_parquet(stream, dtype_backend="pyarrow")

Review Comment:
   Addressed in 
https://github.com/apache/arrow/pull/44720/commits/1c23076fb37f8903762c7416db12e8e453e23366






Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-14 Thread via GitHub


bretttully commented on code in PR #44720:
URL: https://github.com/apache/arrow/pull/44720#discussion_r1842850169


##
python/pyarrow/tests/test_pandas.py:
##
@@ -4411,6 +4412,31 @@ def test_to_pandas_extension_dtypes_mapping():
     assert isinstance(result['a'].dtype, pd.PeriodDtype)
 
 
+
+def test_to_pandas_extension_dtypes_mapping_complex_type():
+    pa_type = pa.struct(
+        [
+            pa.field("bar", pa.bool_(), nullable=False),
+            pa.field("baz", pa.float32(), nullable=True),
+        ],
+    )
+    pd_type = pd.ArrowDtype(pa_type)
+    schema = pa.schema([pa.field("foo", pa_type)])
+    df0 = pd.DataFrame(
+        [
+            {"foo": {"bar": True, "baz": np.float32(1)}},
+            {"foo": {"bar": True, "baz": None}},
+        ],
+    ).astype({"foo": pd_type})
+
+    # Round trip df0 into df1
+    with io.BytesIO() as stream:
+        df0.to_parquet(stream, schema=schema)
+        stream.seek(0)
+        df1 = pd.read_parquet(stream, dtype_backend="pyarrow")

Review Comment:
   The error only gets thrown once the pandas metadata is added to the table. 
That's why I have used a round-trip test. Is there another way to generate that 
metadata and set it on the table before calling to_pandas?






Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-14 Thread via GitHub


jorisvandenbossche commented on code in PR #44720:
URL: https://github.com/apache/arrow/pull/44720#discussion_r1842857876


##
python/pyarrow/tests/test_pandas.py:
##
@@ -4411,6 +4412,31 @@ def test_to_pandas_extension_dtypes_mapping():
     assert isinstance(result['a'].dtype, pd.PeriodDtype)
 
 
+
+def test_to_pandas_extension_dtypes_mapping_complex_type():
+    pa_type = pa.struct(
+        [
+            pa.field("bar", pa.bool_(), nullable=False),
+            pa.field("baz", pa.float32(), nullable=True),
+        ],
+    )
+    pd_type = pd.ArrowDtype(pa_type)
+    schema = pa.schema([pa.field("foo", pa_type)])
+    df0 = pd.DataFrame(
+        [
+            {"foo": {"bar": True, "baz": np.float32(1)}},
+            {"foo": {"bar": True, "baz": None}},
+        ],
+    ).astype({"foo": pd_type})
+
+    # Round trip df0 into df1
+    with io.BytesIO() as stream:
+        df0.to_parquet(stream, schema=schema)
+        stream.seek(0)
+        df1 = pd.read_parquet(stream, dtype_backend="pyarrow")

Review Comment:
   The pandas `to_parquet` method essentially just does a `table = 
pa.Table.from_pandas(df)` and then writes that to parquet (`pa.table(df)` 
is a shorter, less explicit version of that, but you can also use 
`Table.from_pandas`).






Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-14 Thread via GitHub


jorisvandenbossche commented on code in PR #44720:
URL: https://github.com/apache/arrow/pull/44720#discussion_r1842855604


##
python/pyarrow/tests/test_pandas.py:
##
@@ -4411,6 +4412,31 @@ def test_to_pandas_extension_dtypes_mapping():
     assert isinstance(result['a'].dtype, pd.PeriodDtype)
 
 
+
+def test_to_pandas_extension_dtypes_mapping_complex_type():
+    pa_type = pa.struct(
+        [
+            pa.field("bar", pa.bool_(), nullable=False),
+            pa.field("baz", pa.float32(), nullable=True),
+        ],
+    )
+    pd_type = pd.ArrowDtype(pa_type)
+    schema = pa.schema([pa.field("foo", pa_type)])
+    df0 = pd.DataFrame(
+        [
+            {"foo": {"bar": True, "baz": np.float32(1)}},
+            {"foo": {"bar": True, "baz": None}},
+        ],
+    ).astype({"foo": pd_type})
+
+    # Round trip df0 into df1
+    with io.BytesIO() as stream:
+        df0.to_parquet(stream, schema=schema)
+        stream.seek(0)
+        df1 = pd.read_parquet(stream, dtype_backend="pyarrow")

Review Comment:
   The metadata gets added on the pyarrow side, so `table = pa.table(df)` will 
do that






Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-14 Thread via GitHub


jorisvandenbossche commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2477364376

   You can ignore the `test_dlpack` failure in the tests 
(https://github.com/apache/arrow/issues/44728)





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-14 Thread via GitHub


bretttully commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2477346521

   > > By switching the logical ordering, it means that we don't need to call 
`_pandas_api.pandas_dtype(dtype)` when using the pyarrow backend,
   > 
   > And because you added a `name not in ext_columns` to the subsequent 
methods to fill `ext_columns`, this should preserve the priority of the 
different methods to determine the pandas dtype? (metadata < pyarrow extension 
type < types_mapper)
   
   Yes, exactly. Priority remains the same, but functions are skipped if the 
field already has a type, meaning that the code causing the error is no longer 
called if types_mapper is provided.
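
The ordering described above can be sketched in plain Python. This is an illustration under stated assumptions, not the pyarrow implementation: `resolve_ext_columns`, its three dict arguments, and `failing_metadata_lookup` are hypothetical stand-ins for the three sources of a pandas dtype (`types_mapper`, pyarrow extension type, pandas metadata).

```python
def resolve_ext_columns(names, types_mapper, extension_types, metadata):
    """Pick a dtype per column, consulting the highest-priority source first.

    Each source maps a column name to a zero-argument callable that returns
    the dtype. A lower-priority callable is never invoked for a column that
    a higher-priority source has already resolved.
    """
    ext_columns = {}
    # Highest priority first: types_mapper > extension type > metadata.
    for source in (types_mapper, extension_types, metadata):
        for name in names:
            if name not in ext_columns and name in source:
                ext_columns[name] = source[name]()
    return ext_columns


def failing_metadata_lookup():
    # Simulates a metadata-based lookup that raises for a complex type.
    raise TypeError("simulated failure while resolving dtype from metadata")


resolved = resolve_ext_columns(
    ["foo", "bar", "baz"],
    types_mapper={"foo": lambda: "from types_mapper"},
    extension_types={"bar": lambda: "from extension type"},
    metadata={"foo": failing_metadata_lookup, "baz": lambda: "from metadata"},
)
print(resolved)
```

Because `foo` is already resolved by the `types_mapper` stand-in, the failing metadata lookup is skipped, which mirrors the effect described in the comment above while keeping the same priority.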





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-14 Thread via GitHub


jorisvandenbossche commented on code in PR #44720:
URL: https://github.com/apache/arrow/pull/44720#discussion_r1842832234


##
python/pyarrow/tests/test_pandas.py:
##
@@ -4411,6 +4412,31 @@ def test_to_pandas_extension_dtypes_mapping():
     assert isinstance(result['a'].dtype, pd.PeriodDtype)
 
 
+
+def test_to_pandas_extension_dtypes_mapping_complex_type():
+    pa_type = pa.struct(
+        [
+            pa.field("bar", pa.bool_(), nullable=False),
+            pa.field("baz", pa.float32(), nullable=True),
+        ],
+    )
+    pd_type = pd.ArrowDtype(pa_type)
+    schema = pa.schema([pa.field("foo", pa_type)])
+    df0 = pd.DataFrame(
+        [
+            {"foo": {"bar": True, "baz": np.float32(1)}},
+            {"foo": {"bar": True, "baz": None}},
+        ],
+    ).astype({"foo": pd_type})
+
+    # Round trip df0 into df1
+    with io.BytesIO() as stream:
+        df0.to_parquet(stream, schema=schema)
+        stream.seek(0)
+        df1 = pd.read_parquet(stream, dtype_backend="pyarrow")

Review Comment:
   You might not need the roundtrip to parquet, but a `table = pa.table(df); 
result = table.to_pandas(types_mapper=pd.ArrowDtype)` should be sufficient to 
test this?
   
   I know this doesn't test exactly `pd.read_parquet` in its entirety, but it 
should test the relevant part on the pyarrow side, and an actual 
pd.read_parquet test can still be added to pandas






Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-14 Thread via GitHub


jorisvandenbossche commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2477313281

   > By switching the logical ordering, it means that we don't need to call 
`_pandas_api.pandas_dtype(dtype)` when using the pyarrow backend,
   
   And because you added a `name not in ext_columns` to the subsequent methods 
to fill `ext_columns`, this should preserve the priority of the different 
methods to determine the pandas dtype? (metadata < pyarrow extension type < 
types_mapper)





Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]

2024-11-13 Thread via GitHub


github-actions[bot] commented on PR #44720:
URL: https://github.com/apache/arrow/pull/44720#issuecomment-2475073119

   :warning: GitHub issue #39914 **has been automatically assigned in GitHub** 
to PR creator.

