yurikoomiga created ARROW-16642:
-----------------------------------

             Summary: An Error Occurred While Reading Parquet File Using C++ - GetRecordBatchReader - Corrupt snappy compressed data
                 Key: ARROW-16642
                 URL: https://issues.apache.org/jira/browse/ARROW-16642
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
    Affects Versions: 8.0.0
         Environment: C++, arrow 7.0.0, snappy 1.1.8, arrow 8.0.0, pyarrow 7.0.0, ubuntu 9.4.0, python3.8
            Reporter: yurikoomiga
         Attachments: test_std_02.py

Hi All,

When I read a Parquet file with Arrow as follows:

```
auto st = parquet::arrow::FileReader::Make(
    arrow::default_memory_pool(),
    parquet::ParquetFileReader::Open(_parquet, _properties),
    &_reader);
arrow::Status status = _reader->GetRecordBatchReader(
    {_current_group}, _parquet_column_ids, &_rb_batch);
_reader->set_batch_size(65536);
_reader->set_use_threads(true);
status = _rb_batch->ReadNext(&_batch);
```

the status is not OK and an error occurs:

`IOError: Corrupt snappy compressed data.`

When I comment out the statement `_reader->set_use_threads(true);`, the program runs normally and I can read the Parquet file without problems. The error only occurs when I read multiple columns with `_reader->set_use_threads(true);`; reading a single column does not trigger it.

The test Parquet file is created with pyarrow. It uses a single row group containing 3,000,000 records and has 20 columns, including int and string types. You can create a test Parquet file with the attached Python script.

Reading the file: C++, arrow 7.0.0, snappy 1.1.8
Writing the file: python3.8, pyarrow 7.0.0

Looking forward to your reply. Thank you!

--
This message was sent by Atlassian Jira
(v8.20.7#820007)