Wes McKinney updated ARROW-6573:
--------------------------------
    Summary: [Python] Segfault when writing to parquet  (was: Segfault when writing to parquet)

> [Python] Segfault when writing to parquet
> -----------------------------------------
>
>                 Key: ARROW-6573
>                 URL: https://issues.apache.org/jira/browse/ARROW-6573
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.14.1
>         Environment: Ubuntu 16.04. Pyarrow 0.14.1 installed through pip. Using Anaconda distribution of Python 3.7.
>            Reporter: Josh Weinstock
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When attempting to write out a pyarrow table to parquet I am observing a segfault when there is a mismatch between the schema and the datatypes. Here is a reproducible example:
>
> {code:java}
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> data = dict()
> data["key"] = [0, 1, 2, 3]           # segfault
> # data["key"] = ["0", "1", "2", "3"] # no segfault
>
> schema = pa.schema({"key": pa.string()})
> table = pa.Table.from_pydict(data, schema=schema)
> print("now writing out test file")
> pq.write_table(table, "test.parquet")
> {code}
>
> This results in a segfault when writing the table. Running
>
> {code:java}
> gdb -ex r --args python test.py
> {code}
>
> Yields
>
> {noformat}
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007fffe8173917 in virtual thunk to
> parquet::DictEncoderImpl<parquet::DataType<(parquet::Type::type)6> >::Put(parquet::ByteArray const*, int) ()
> from /net/fantasia/home/jweinstk/anaconda3/lib/python3.7/site-packages/pyarrow/libparquet.so.14
> {noformat}
>
> Thanks for all of your arrow work,
> Josh
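For anyone hitting this on pyarrow 0.14.x (the issue is marked Fix For 0.15.0 above), a minimal workaround sketch is to convert the Python values so they actually match the declared schema before the table is built; the str() conversion and the file name below are illustrative only, not the fix that landed in Arrow itself:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema({"key": pa.string()})

# Make the column data agree with the declared string type before the
# table is constructed, so the Parquet writer never sees a mismatch
# between the schema and the underlying values.
data = {"key": [str(x) for x in [0, 1, 2, 3]]}

table = pa.Table.from_pydict(data, schema=schema)
pq.write_table(table, "test.parquet")
{code}

With matching types the same write_table call completes normally; the crash in the report appears to come from the writer treating integer storage as byte-array (string) data, as the DictEncoderImpl over BYTE_ARRAY in the Put frame of the backtrace suggests.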