Josh Weinstock created ARROW-6573:
-------------------------------------

             Summary: Segfault when writing to parquet
                 Key: ARROW-6573
                 URL: https://issues.apache.org/jira/browse/ARROW-6573
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.14.1
         Environment: Ubuntu 16.04. Pyarrow 0.14.1 installed through pip. Using Anaconda distribution of Python 3.7.
            Reporter: Josh Weinstock
When attempting to write a pyarrow table to parquet, I observe a segfault when there is a mismatch between the schema and the data types. Here is a reproducible example:

{code:java}
import pyarrow as pa
import pyarrow.parquet as pq

data = dict()
data["key"] = [0, 1, 2, 3]  # segfault
#data["key"] = ["0", "1", "2", "3"]  # no segfault

schema = pa.schema({"key" : pa.string()})
table = pa.Table.from_pydict(data, schema = schema)

print("now writing out test file")
pq.write_table(table, "test.parquet")
{code}

This results in a segfault when writing the table. Running

{code:java}
gdb -ex r --args python test.py
{code}

yields

{noformat}
Program received signal SIGSEGV, Segmentation fault.
0x00007fffe8173917 in virtual thunk to parquet::DictEncoderImpl<parquet::DataType<(parquet::Type::type)6> >::Put(parquet::ByteArray const*, int) ()
   from /net/fantasia/home/jweinstk/anaconda3/lib/python3.7/site-packages/pyarrow/libparquet.so.14
{noformat}

Thanks for all of your arrow work,

Josh
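In the meantime, here is a sketch of a possible workaround (not verified against 0.14.1; it assumes pa.array rejects an int value against an explicit string type with ArrowTypeError/ArrowInvalid): build each column through pa.array with the field's declared type, so the mismatch surfaces as a Python exception before the table ever reaches the parquet writer.

{code:java}
import pyarrow as pa
import pyarrow.parquet as pq

data = dict()
data["key"] = [0, 1, 2, 3]

schema = pa.schema({"key": pa.string()})

# Convert each column with an explicit type; a value that does not match the
# declared type (int vs. string here) should raise ArrowTypeError/ArrowInvalid
# instead of producing a table whose schema disagrees with its data.
arrays = [pa.array(data[field.name], type=field.type) for field in schema]

table = pa.Table.from_arrays(arrays, names=[field.name for field in schema])
pq.write_table(table, "test.parquet")
{code}

With the integer data above this should fail immediately at the pa.array call; with the string data it writes test.parquet as expected.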