David Li created ARROW-11596:
--------------------------------

Summary: [C++][Python][Dataset] SIGSEGV when executing scan tasks with Python executors
Key: ARROW-11596
URL: https://issues.apache.org/jira/browse/ARROW-11596
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 3.0.0
Reporter: David Li
Assignee: David Li
This crashes for me with a segfault:

{code:python}
import concurrent.futures
import queue

import numpy as np
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.fs as fs
import pyarrow.parquet as pq

# Write a small single-column Parquet file to scan
schema = pa.schema([("foo", pa.float64())])
table = pa.table([np.random.uniform(size=1024)], schema=schema)
path = "/tmp/foo.parquet"
pq.write_table(table, path)

dataset = ds.FileSystemDataset.from_paths(
    [path],
    schema=schema,
    format=ds.ParquetFileFormat(),
    filesystem=fs.LocalFileSystem(),
)

with concurrent.futures.ThreadPoolExecutor(2) as executor:
    tasks = dataset.scan()
    q = queue.Queue()

    def _prebuffer():
        # Start each scan task on the worker thread and pull its first batch
        for task in tasks:
            iterator = task.execute()
            next(iterator)
            q.put(iterator)

    executor.submit(_prebuffer).result()
    # Resuming the prebuffered iterator on the main thread segfaults
    next(q.get())
{code}

{noformat}
$ uname -a
Linux chaconne 5.10.4-arch2-1 #1 SMP PREEMPT Fri, 01 Jan 2021 05:29:53 +0000 x86_64 GNU/Linux
$ pip freeze
numpy==1.20.1
pyarrow==3.0.0
{noformat}
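The repro relies on a simple hand-off: advance an iterator once on a worker thread, then resume it from the main thread. In pure Python this is legal (a generator may be resumed from a different thread as long as it is never resumed concurrently), so iterators returned by pyarrow would be expected to tolerate the same pattern. The sketch below shows the pattern with a hypothetical `make_batches` generator standing in for `task.execute()`; it does not use pyarrow and is not a fix for the crash.

{code:python}
import concurrent.futures
import queue


def make_batches(n):
    # Hypothetical stand-in for task.execute(): yields n small "batches"
    for i in range(n):
        yield [i] * 4


def prebuffer(task_sizes, q):
    # Create each iterator on the worker thread, advance it once, hand it off
    for n in task_sizes:
        it = make_batches(n)
        next(it)
        q.put(it)


q = queue.Queue()
with concurrent.futures.ThreadPoolExecutor(2) as executor:
    executor.submit(prebuffer, [3], q).result()
    # Resume the prebuffered iterator on the main thread
    second = next(q.get())

print(second)  # [1, 1, 1, 1]
{code}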