Chen Ming created ARROW-10140:
---------------------------------

             Summary: No data for map column of a parquet file created from pyarrow and pandas
                 Key: ARROW-10140
                 URL: https://issues.apache.org/jira/browse/ARROW-10140
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 1.0.1
            Reporter: Chen Ming
         Attachments: test_map.py

Hi,

I'm having problems reading Parquet files that contain a 'map' column created by pyarrow.

I followed [https://stackoverflow.com/questions/63553715/pyarrow-data-types-for-columns-that-have-lists-of-dictionaries] to convert a pandas DataFrame to an Arrow table, then called write_table to output a Parquet file. (We also referred to https://issues.apache.org/jira/browse/ARROW-9812.)

{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

print(f'PyArrow Version = {pa.__version__}')
print(f'Pandas Version = {pd.__version__}')

# Each row of col1 is a list of (key, value) tuples, to be stored as a map.
df = pd.DataFrame({
    'col1': pd.Series([
        [('id', 'something'), ('value2', 'else')],
        [('id', 'something2'), ('value', 'else2')],
    ]),
    'col2': pd.Series(['foo', 'bar'])
})

# Declare col1 explicitly as map<string, string>.
udt = pa.map_(pa.string(), pa.string())
schema = pa.schema([pa.field('col1', udt), pa.field('col2', pa.string())])

table = pa.Table.from_pandas(df, schema)
pq.write_table(table, './test_map.parquet')
{code}

The above code (attached as test_map.py) runs without errors on my development machine:

{code:java}
PyArrow Version = 1.0.1
Pandas Version = 1.1.2
{code}

It generated the test_map.parquet file (attached as test_map.parquet) successfully.

I then used parquet-tools (1.11.1) to read the file, but the map column shows no keys or values:

{code:java}
$ java -jar parquet-tools-1.11.1.jar head test_map.parquet
col1:
.key_value:
.key_value:
col2 = foo

col1:
.key_value:
.key_value:
col2 = bar
{code}

I also checked the schema of the Parquet file:

{code:java}
$ java -jar parquet-tools-1.11.1.jar schema test_map.parquet
message schema {
  optional group col1 (MAP) {
    repeated group key_value {
      required binary key (STRING);
      optional binary value (STRING);
    }
  }
  optional binary col2 (STRING);
}
{code}

Am I doing something wrong? We need to output the data as Parquet files and query them later.