Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
bretttully commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2504975262 🚀 Thanks for all your help here @jorisvandenbossche! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
conbench-apache-arrow[bot] commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2504894479
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 8548c2261b4753f4394ec20aa9c1f35f5e0b870e.

There were 132 benchmark results with an error:

- Commit Run on `arm64-t4g-2xlarge-linux` at [2024-11-27 18:04:38Z](https://conbench.ursa.dev/compare/runs/979df96ff09c41a2bd69ba53cb73dd60...eab4f6f95e38465fb6ae7b0271d967de/)
  - [`tpch` (R) with engine=arrow, format=parquet, language=R, memory_map=False, query_id=TPCH-07, scale_factor=1](https://conbench.ursa.dev/benchmark-results/067476d564267c988000ed4ac788c08a)
  - [`tpch` (R) with engine=arrow, format=native, language=R, memory_map=False, query_id=TPCH-09, scale_factor=1](https://conbench.ursa.dev/benchmark-results/067476d7d038754180003cc466d663d8)
  - and 130 more (see the report linked below)

There were no benchmark performance regressions. 🎉

The [full Conbench report](https://github.com/apache/arrow/runs/33627078459) has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche merged PR #44720: URL: https://github.com/apache/arrow/pull/44720
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2504271277
> Is there anything else for me to do here?

No, just me getting back to merge it!
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
raulcd commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2497346892
> Is there anything else for me to do here?

I don't think so. I am not comfortable with this area of our codebase, so I'll let @jorisvandenbossche merge once he's happy with it; since he has already approved, he might do that soon.
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
bretttully commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2495762911 Is there anything else for me to do here?
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
github-actions[bot] commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490984364
Revision: 685167fb8dc28190f9fd8600bca5df7799663e5a

Submitted crossbow builds: [ursacomputing/crossbow @ actions-524e782c26](https://github.com/ursacomputing/crossbow/branches/all?query=actions-524e782c26)

|Task|Status|
|--|--|
|example-python-minimal-build-fedora-conda|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-524e782c26-github-example-python-minimal-build-fedora-conda)](https://github.com/ursacomputing/crossbow/actions/runs/11953010576/job/33319986570)|
|example-python-minimal-build-ubuntu-venv|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-524e782c26-github-example-python-minimal-build-ubuntu-venv)](https://github.com/ursacomputing/crossbow/actions/runs/11953010367/job/33319986020)|
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
raulcd commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490977517 @github-actions crossbow submit example-python-minimal-build-*
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
bretttully commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490873819 I have merged `upstream/main` and pushed tags. Let's see if this works...
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490844257
Thanks for investigating that! So then, to resolve this here, @bretttully should fetch the upstream tags and push them to his fork? Something like
```
git fetch upstream
git push origin --tags
```
(assuming upstream is apache/arrow and origin is bretttully/arrow)
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
raulcd commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490666176 I've opened an issue because we should find a way to not fail if the dev tag is not present: - https://github.com/apache/arrow/issues/44803
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
raulcd commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490360277
> The logs indicate "Successfully installed pyarrow-0.1.dev16896+ge3b9892"

From the git checkout, I see it is pulling from the remote `Syncing repository: bretttully/arrow`. I recall an issue where, if dev tags are not present, we are unable to detect the correct version. The remote doesn't seem to have other branches and/or tags.
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490325537 (the other failures are the known nightly dlpack failures)
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2490314967 @raulcd it seems something is going wrong with the minimal test builds (e.g. example-python-minimal-build-fedora-conda). The logs indicate "Successfully installed pyarrow-0.1.dev16896+ge3b9892", which then messes up pandas' detection of the pyarrow version (for its pyarrow integration, pandas checks whether pyarrow is recent enough and otherwise errors), giving some test failures. (But I'm also not entirely sure how this PR causes this issue, since I don't see the nightlies failing for the minimal builds at the moment.)
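To make the failure mode concrete: a build installed as `0.1.dev16896+ge3b9892` looks, numerically, like an ancient pyarrow, so any "is pyarrow recent enough?" gate rejects it. The sketch below is a simplified stand-in, not pandas' actual version check; `MINIMUM_PYARROW`, `release_tuple`, `looks_like_untagged_build` and `pyarrow_recent_enough` are hypothetical names, the minimum version is arbitrary, and the `0.1.devN` pattern is assumed to be the setuptools_scm-style fallback used when no release tag is reachable from the checkout.

```python
import re
from typing import Tuple

# Hypothetical minimum version, chosen only for illustration.
MINIMUM_PYARROW: Tuple[int, ...] = (10, 0, 1)


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '19.0.0.dev5' -> (19, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def looks_like_untagged_build(version: str) -> bool:
    """Heuristic for the fallback version produced when no tag is reachable."""
    return re.match(r"0\.1\.dev\d+", version) is not None


def pyarrow_recent_enough(version: str) -> bool:
    """Simplified minimum-version gate: compare release tuples element-wise."""
    return release_tuple(version) >= MINIMUM_PYARROW


# The first string is from the CI log in this thread; the second is made up.
for v in ["0.1.dev16896+ge3b9892", "19.0.0.dev123+g8548c22"]:
    print(v, looks_like_untagged_build(v), pyarrow_recent_enough(v))
```

Tuple comparison is enough here because only the leading numeric segment matters for a minimum-version gate; a real check would use a full PEP 440 parser.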
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
github-actions[bot] commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2489757867
Revision: e3b9892e5663f4888bc79c38b9d36bbefcdaf2b4

Submitted crossbow builds: [ursacomputing/crossbow @ actions-e01b93275b](https://github.com/ursacomputing/crossbow/branches/all?query=actions-e01b93275b)

|Task|Status|
|--|--|
|example-python-minimal-build-fedora-conda|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-example-python-minimal-build-fedora-conda)](https://github.com/ursacomputing/crossbow/actions/runs/11943636788/job/33293053196)|
|example-python-minimal-build-ubuntu-venv|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-example-python-minimal-build-ubuntu-venv)](https://github.com/ursacomputing/crossbow/actions/runs/11943636730/job/33293052678)|
|test-conda-python-3.10|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10)](https://github.com/ursacomputing/crossbow/actions/runs/11943636596/job/33293051382)|
|test-conda-python-3.10-cython2|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10-cython2)](https://github.com/ursacomputing/crossbow/actions/runs/11943636683/job/33293052512)|
|test-conda-python-3.10-hdfs-2.9.2|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10-hdfs-2.9.2)](https://github.com/ursacomputing/crossbow/actions/runs/11943636650/job/33293052279)|
|test-conda-python-3.10-hdfs-3.2.1|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10-hdfs-3.2.1)](https://github.com/ursacomputing/crossbow/actions/runs/11943636573/job/33293051473)|
|test-conda-python-3.10-pandas-latest-numpy-latest|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10-pandas-latest-numpy-latest)](https://github.com/ursacomputing/crossbow/actions/runs/11943636852/job/33293053431)|
|test-conda-python-3.10-substrait|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.10-substrait)](https://github.com/ursacomputing/crossbow/actions/runs/11943636844/job/33293053233)|
|test-conda-python-3.11|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11)](https://github.com/ursacomputing/crossbow/actions/runs/11943636757/job/33293053197)|
|test-conda-python-3.11-dask-latest|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-dask-latest)](https://github.com/ursacomputing/crossbow/actions/runs/11943636907/job/33293053925)|
|test-conda-python-3.11-dask-upstream_devel|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-dask-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/1194363/job/33293052227)|
|test-conda-python-3.11-hypothesis|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-hypothesis)](https://github.com/ursacomputing/crossbow/actions/runs/11943636858/job/33293053951)|
|test-conda-python-3.11-pandas-latest-numpy-1.26|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-pandas-latest-numpy-1.26)](https://github.com/ursacomputing/crossbow/actions/runs/11943636845/job/33293053428)|
|test-conda-python-3.11-pandas-latest-numpy-latest|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-pandas-latest-numpy-latest)](https://github.com/ursacomputing/crossbow/actions/runs/11943636971/job/33293054692)|
|test-conda-python-3.11-pandas-nightly-numpy-nightly|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-e01b93275b-github-test-conda-python-3.11-pandas-nightly-numpy-nightly)](https://github.com/ursacomputing/crossbow/actions/runs/11943636614/job/33293051378)|
|test-conda-python-3.11-pandas-upstream_devel-numpy-nightly|[![GitHub A
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2489754835 @github-actions crossbow submit -g python
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on code in PR #44720: URL: https://github.com/apache/arrow/pull/44720#discussion_r1851119942
## python/pyarrow/tests/test_pandas.py: ##
@@ -4411,6 +4411,32 @@ def test_to_pandas_extension_dtypes_mapping():
     assert isinstance(result['a'].dtype, pd.PeriodDtype)
+
Review Comment:
```suggestion
```
Small whitespace issue the linter is complaining about.
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on code in PR #44720: URL: https://github.com/apache/arrow/pull/44720#discussion_r1849308797
## python/pyarrow/tests/test_pandas.py: ##
@@ -4411,6 +4411,29 @@ def test_to_pandas_extension_dtypes_mapping():
     assert isinstance(result['a'].dtype, pd.PeriodDtype)
+
+def test_to_pandas_extension_dtypes_mapping_complex_type():
Review Comment:
```suggestion
def test_to_pandas_extension_dtypes_mapping_complex_type():
    # https://github.com/apache/arrow/pull/44720
```
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
raulcd commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2488018513
> is the process that I can merge this following approval, or is that done by a core maintainer?

A committer will merge, probably @jorisvandenbossche in this specific case, once everything is running and addressed. I've triggered CI for the latest changes.
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
bretttully commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2487214002 Thanks @jorisvandenbossche -- is the process that I can merge this following approval, or is that done by a core maintainer?
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
bretttully commented on code in PR #44720: URL: https://github.com/apache/arrow/pull/44720#discussion_r1843171313
## python/pyarrow/tests/test_pandas.py: ##
@@ -4411,6 +4412,31 @@ def test_to_pandas_extension_dtypes_mapping():
     assert isinstance(result['a'].dtype, pd.PeriodDtype)
+
+def test_to_pandas_extension_dtypes_mapping_complex_type():
+    pa_type = pa.struct(
+        [
+            pa.field("bar", pa.bool_(), nullable=False),
+            pa.field("baz", pa.float32(), nullable=True),
+        ],
+    )
+    pd_type = pd.ArrowDtype(pa_type)
+    schema = pa.schema([pa.field("foo", pa_type)])
+    df0 = pd.DataFrame(
+        [
+            {"foo": {"bar": True, "baz": np.float32(1)}},
+            {"foo": {"bar": True, "baz": None}},
+        ],
+    ).astype({"foo": pd_type})
+
+    # Round trip df0 into df1
+    with io.BytesIO() as stream:
+        df0.to_parquet(stream, schema=schema)
+        stream.seek(0)
+        df1 = pd.read_parquet(stream, dtype_backend="pyarrow")
Review Comment:
Addressed in https://github.com/apache/arrow/pull/44720/commits/1c23076fb37f8903762c7416db12e8e453e23366
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
bretttully commented on code in PR #44720: URL: https://github.com/apache/arrow/pull/44720#discussion_r1842850169
## python/pyarrow/tests/test_pandas.py: ## (same `@@ -4411,6 +4412,31 @@` hunk as quoted above)
+        df1 = pd.read_parquet(stream, dtype_backend="pyarrow")
Review Comment:
The error only gets thrown once the pandas metadata is added to the table. That's why I have used a round-trip test. Is there another way to generate that metadata and set it on the table before calling to_pandas?
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on code in PR #44720: URL: https://github.com/apache/arrow/pull/44720#discussion_r1842857876
## python/pyarrow/tests/test_pandas.py: ## (same `@@ -4411,6 +4412,31 @@` hunk as quoted above)
+        df1 = pd.read_parquet(stream, dtype_backend="pyarrow")
Review Comment:
The pandas `to_parquet` method essentially just does a `table = pa.Table.from_pandas(df)` and then writes that to Parquet (and `pa.table(df)` is a shorter, less explicit version of that, but you can also use `Table.from_pandas`).
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on code in PR #44720: URL: https://github.com/apache/arrow/pull/44720#discussion_r1842855604
## python/pyarrow/tests/test_pandas.py: ## (same `@@ -4411,6 +4412,31 @@` hunk as quoted above)
+        df1 = pd.read_parquet(stream, dtype_backend="pyarrow")
Review Comment:
The metadata gets added on the pyarrow side, so `table = pa.table(df)` will do that
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2477364376 The `test_dlpack` failure in the tests you can ignore (https://github.com/apache/arrow/issues/44728)
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
bretttully commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2477346521
> > By switching the logical ordering, it means that we don't need to call `_pandas_api.pandas_dtype(dtype)` when using the pyarrow backend,
>
> And because you added a `name not in ext_columns` to the subsequent methods to fill `ext_columns`, this should preserve the priority of the different methods to determine the pandas dtype? (metadata < pyarrow extension type < types_mapper)

Yes, exactly. Priority remains the same, but functions are skipped if the field already has a type, meaning that the code causing the error is no longer called if types_mapper is provided.
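The reordering can be modelled in a few lines: consult the highest-priority source first and let each later source fill only columns that are still unresolved. The result keeps the priority "metadata < pyarrow extension type < types_mapper", but the metadata parser (standing in for the `_pandas_api.pandas_dtype(dtype)` call mentioned in the quote) is never reached for a column that `types_mapper` already claimed. This is a hypothetical, simplified model written from the description in this thread, not pyarrow's actual implementation: `resolve_extension_dtypes` and `unparseable_struct_dtype` are made-up names, and plain strings stand in for `pa.DataType` and pandas dtype objects.

```python
from typing import Callable, Dict, Optional

# Strings stand in for pa.DataType / pandas dtype objects in this sketch.
TypesMapper = Callable[[str], Optional[str]]


def resolve_extension_dtypes(
    arrow_types: Dict[str, str],
    types_mapper: Optional[TypesMapper],
    extension_dtypes: Dict[str, str],
    metadata_parsers: Dict[str, Callable[[], str]],
) -> Dict[str, str]:
    """Hypothetical model of the reordered dtype lookup."""
    ext_columns: Dict[str, str] = {}

    # 1. types_mapper has the highest priority, so it is consulted first.
    if types_mapper is not None:
        for name, arrow_type in arrow_types.items():
            dtype = types_mapper(arrow_type)
            if dtype is not None:
                ext_columns[name] = dtype

    # 2. Extension types only fill columns that are still unresolved.
    for name, dtype in extension_dtypes.items():
        if name not in ext_columns:
            ext_columns[name] = dtype

    # 3. Metadata has the lowest priority; its parser (the step that could
    #    raise) only runs for columns nothing else has claimed.
    for name, parse in metadata_parsers.items():
        if name not in ext_columns:
            ext_columns[name] = parse()

    return ext_columns


def unparseable_struct_dtype() -> str:
    # Stands in for pandas failing to parse a "struct<...>[pyarrow]" string.
    raise TypeError("cannot parse dtype string 'struct<...>[pyarrow]'")


result = resolve_extension_dtypes(
    {"foo": "struct<bar: bool, baz: float>"},
    types_mapper=lambda arrow_type: f"{arrow_type}[pyarrow]",
    extension_dtypes={},
    metadata_parsers={"foo": unparseable_struct_dtype},
)
print(result)
```

With the old order (metadata first, later sources overwriting), `unparseable_struct_dtype` would run for `"foo"` and raise even though `types_mapper` would have supplied the final dtype anyway.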
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on code in PR #44720: URL: https://github.com/apache/arrow/pull/44720#discussion_r1842832234
## python/pyarrow/tests/test_pandas.py: ## (same `@@ -4411,6 +4412,31 @@` hunk as quoted above)
+        df1 = pd.read_parquet(stream, dtype_backend="pyarrow")
Review Comment:
You might not need the round trip to Parquet; a `table = pa.table(df); result = table.to_pandas(types_mapper=pd.ArrowDtype)` should be sufficient to test this? I know this doesn't test `pd.read_parquet` in its entirety, but it should test the relevant part on the pyarrow side, and an actual `pd.read_parquet` test can still be added to pandas.
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
jorisvandenbossche commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2477313281
> By switching the logical ordering, it means that we don't need to call `_pandas_api.pandas_dtype(dtype)` when using the pyarrow backend,

And because you added a `name not in ext_columns` to the subsequent methods to fill `ext_columns`, this should preserve the priority of the different methods to determine the pandas dtype? (metadata < pyarrow extension type < types_mapper)
Re: [PR] GH-39914: [pyarrow] Reorder to_pandas extension dtype mapping [arrow]
github-actions[bot] commented on PR #44720: URL: https://github.com/apache/arrow/pull/44720#issuecomment-2475073119 :warning: GitHub issue #39914 **has been automatically assigned in GitHub** to PR creator.