Martin Morgan created ARROW-14677:
-------------------------------------

             Summary: macOS R package arrow segfault on `open_dataset()`
                 Key: ARROW-14677
                 URL: https://issues.apache.org/jira/browse/ARROW-14677
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 6.0.0
            Reporter: Martin Morgan

Following a slack post (https://ropensci.slack.com/archives/C026GCWKA/p1636588933095400), accessing a public bucket with the R client

{code:java}
df <- arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
{code}

leads to a segfault

{code:java}
 *** caught segfault ***
address 0x0, cause 'unknown'

Traceback:
 1: dataset__DatasetFactory_Finish1(self, unify_schemas)
 2: factory$Finish(schema, isTRUE(unify_schemas))
 3: doTryCatch(return(expr), name, parentenv, handler)
 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 5: tryCatchList(expr, classes, parentenv, handlers)
 6: tryCatch(factory$Finish(schema, isTRUE(unify_schemas)), error = function(e) { handle_parquet_io_error(e, format)} )
 7: arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
{code}

The arrow portion of the lldb traceback is

{code:java}
(lldb) thread backtrace
thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x000000012ab2029c libthrift-0.15.0.dylib`std::__1::shared_ptr<apache::thrift::async::TAsyncProcessor>::~shared_ptr() + 46
    frame #1: 0x0000000128bb6ac2 arrow.so`void parquet::DeserializeThriftUnencryptedMsg<parquet::format::FileMetaData>(unsigned char const*, unsigned int*, parquet::format::FileMetaData*) + 309
    frame #2: 0x0000000128bb5f49 arrow.so`parquet::FileMetaData::FileMetaDataImpl::FileMetaDataImpl(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 517
    frame #3: 0x0000000128bace0d arrow.so`parquet::FileMetaData::FileMetaData(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 85
    frame #4: 0x0000000128bacd1b arrow.so`parquet::FileMetaData::Make(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 89
    frame #5: 0x0000000128b9cb4a arrow.so`parquet::SerializedFile::ParseUnencryptedFileMetadata(std::__1::shared_ptr<arrow::Buffer> const&, unsigned int) + 118
    frame #6: 0x0000000128b9df43 arrow.so`parquet::SerializedFile::ParseMetaData() + 607
    frame #7: 0x0000000128b9dc6c arrow.so`parquet::ParquetFileReader::Contents::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::__1::shared_ptr<parquet::FileMetaData>) + 214
    frame #8: 0x0000000128b9eb72 arrow.so`parquet::ParquetFileReader::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::__1::shared_ptr<parquet::FileMetaData>) + 58
    frame #9: 0x0000000128c8a988 arrow.so`arrow::dataset::ParquetFileFormat::GetReader(arrow::dataset::FileSource const&, arrow::dataset::ScanOptions*) const + 286
    frame #10: 0x0000000128c8a72e arrow.so`arrow::dataset::ParquetFileFormat::Inspect(arrow::dataset::FileSource const&) const + 44
    frame #11: 0x0000000128c0b994 arrow.so`arrow::dataset::FileSystemDatasetFactory::InspectSchemas(arrow::dataset::InspectOptions) + 336
    frame #12: 0x0000000128c09079 arrow.so`arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions) + 43
    frame #13: 0x0000000128c0c1cf arrow.so`arrow::dataset::FileSystemDatasetFactory::Finish(arrow::dataset::FinishOptions) + 541
    frame #14: 0x0000000128a66805 arrow.so`dataset__DatasetFactoryFinish1(std::__1::shared_ptr<arrow::dataset::DatasetFactory> const&, bool) + 69
    frame #15: 0x0000000128a105aa arrow.so`arrow_dataset_DatasetFactory_Finish1 + 154
{code}

arrow was installed from source on

{code:java}
> sessionInfo()
R Under development (unstable) (2021-10-28 r81109)
Platform: x86_64-apple-darwin19.6.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Users/ma38727/bin/R-devel/lib/libRblas.dylib
LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] arrow_6.0.0.2

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1    bit_4.0.4           compiler_4.2.0
 [4] BiocManager_1.30.16 magrittr_2.0.1      assertthat_0.2.1
 [7] R6_2.5.1            glue_1.5.0          bit64_4.0.5
[10] vctrs_0.3.8         rlang_0.4.12        purrr_0.3.4
{code}

During package installation, the one step that was 'new' to me was the use of autobrew

{code:java}
*** Downloading apache-arrow
Using autobrew bundle: apache-arrow-6.0.0-high_sierra.tar.xz
{code}

I'm not sure how to validate that this use is consistent with my brew installation.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)