internal::checked_pointer_cast isn't really anything special. It simply
dispatches to std::dynamic_pointer_cast<T> in debug builds and
std::static_pointer_cast<T> in release builds, so in your own code you can
call whichever of the two fits how confident you are in the type you are
casting to.
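For illustration, here is a minimal sketch of that pattern (a sketch, not
Arrow's exact implementation; the real one lives in
arrow/util/checked_cast.h):

    #include <cassert>
    #include <memory>

    // Debug builds: dynamic_pointer_cast, and assert that the downcast
    // actually succeeded.
    // Release builds (NDEBUG defined): static_pointer_cast, skipping the
    // RTTI check entirely.
    template <class T, class U>
    std::shared_ptr<T> checked_pointer_cast(std::shared_ptr<U> r) {
    #ifdef NDEBUG
      return std::static_pointer_cast<T>(std::move(r));
    #else
      auto p = std::dynamic_pointer_cast<T>(std::move(r));
      assert(p != nullptr);
      return p;
    #endif
    }

The point of the pattern is that the checked (dynamic) cast only costs
anything in debug builds, where a bad cast fails loudly instead of silently
invoking undefined behavior.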
On Sat, May 22, 2021 at 9:23 PM Xander Dunn <[email protected]> wrote:

> Alright, I got it working:
>
>     parquet::WriterProperties::Builder file_writer_options_builder;
>     file_writer_options_builder.compression(arrow::Compression::BROTLI);
>     // file_writer_options_builder.compression(arrow::Compression::UNCOMPRESSED);
>     std::shared_ptr<parquet::WriterProperties> props =
>         file_writer_options_builder.build();
>
>     std::shared_ptr<ds::FileWriteOptions> file_write_options =
>         format->DefaultWriteOptions();
>     auto parquet_options =
>         arrow::internal::checked_pointer_cast<ds::ParquetFileWriteOptions>(
>             file_write_options);
>     parquet_options->writer_properties = props;
>
>     arrow::dataset::FileSystemDatasetWriteOptions write_options;
>     write_options.file_write_options = parquet_options;
>
> But surely a call to arrow::internal is not the intended usage?
>
> On Sat, May 22, 2021 at 8:52 PM Xander Dunn <[email protected]> wrote:
>
>> I see how to compress writes to a particular file using
>> arrow::io::CompressedOutputStream::Make, but I’m having difficulty
>> figuring out how to make Dataset writes compressed. I have my code set up
>> similar to the CreateExampleParquetHivePartitionedDataset example here
>> <https://github.com/apache/arrow/blob/master/cpp/examples/arrow/dataset_documentation_example.cc#L113>.
>>
>> I suspect there is some option on FileSystemDatasetWriteOptions to
>> specify compression, but I haven’t been able to uncover it:
>>
>>     ds::FileSystemDatasetWriteOptions write_options;
>>     write_options.file_write_options = format->DefaultWriteOptions();
>>     write_options.filesystem = filesystem;
>>     write_options.base_dir = base_path;
>>     write_options.partitioning = partitioning;
>>     write_options.basename_template = "part{i}.parquet";
>>     ABORT_ON_FAILURE(ds::FileSystemDataset::Write(write_options, scanner));
>>
>> FileSystemDatasetWriteOptions is defined here
>> <https://github.com/apache/arrow/blob/602a76ac58bc8de60a353648f02cf11891563e77/cpp/src/arrow/dataset/file_base.h#L331>
>> and doesn’t have a compression option.
>>
>> The file_write_options property is a ParquetFileWriteOptions, which is
>> defined here
>> <https://github.com/apache/arrow/blob/8b4942728e7347dc921a2d423e996fea5f9e2102/cpp/src/arrow/dataset/file_parquet.h#L222>
>> and has a parquet::WriterProperties and a parquet::ArrowWriterProperties.
>> It’s created here:
>>
>>     std::shared_ptr<FileWriteOptions> ParquetFileFormat::DefaultWriteOptions() {
>>       std::shared_ptr<ParquetFileWriteOptions> options(
>>           new ParquetFileWriteOptions(shared_from_this()));
>>       options->writer_properties = parquet::default_writer_properties();
>>       options->arrow_writer_properties =
>>           parquet::default_arrow_writer_properties();
>>       return options;
>>     }
>>
>> parquet::WriterProperties can be created with a compression specified
>> like this:
>>
>>     parquet::WriterProperties::Builder file_writer_options_builder;
>>     file_writer_options_builder.compression(arrow::Compression::BROTLI);
>>     std::shared_ptr<parquet::WriterProperties> props =
>>         file_writer_options_builder.build();
>>
>> However, I have been unable to create a FileWriteOptions that includes
>> this WriterProperties. What is shared_from_this()? Creating a
>> FileWriteOptions with std::make_shared<> doesn’t compile. Any pointers
>> on creating a FileWriteOptions in my project, or a better way to specify
>> the compression type on a dataset write?
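Putting the pieces of this thread together, a minimal end-to-end sketch
looks like the following. It assumes `format` is a
std::shared_ptr<ds::ParquetFileFormat> and that `filesystem`, `base_path`,
`partitioning`, `scanner`, and the ABORT_ON_FAILURE macro are set up as in
the linked dataset example; per the note above, std::dynamic_pointer_cast
works in place of arrow::internal::checked_pointer_cast:

    #include "arrow/dataset/api.h"
    #include "parquet/properties.h"

    namespace ds = arrow::dataset;

    // Build Parquet writer properties with the desired compression codec.
    parquet::WriterProperties::Builder props_builder;
    props_builder.compression(arrow::Compression::BROTLI);
    std::shared_ptr<parquet::WriterProperties> props = props_builder.build();

    // Start from the format's default write options, then downcast to the
    // Parquet-specific subclass to attach the writer properties. The cast
    // cannot fail here, since a ParquetFileFormat hands back
    // ParquetFileWriteOptions.
    auto parquet_options =
        std::dynamic_pointer_cast<ds::ParquetFileWriteOptions>(
            format->DefaultWriteOptions());
    parquet_options->writer_properties = props;

    // Wire the customized options into the dataset write.
    ds::FileSystemDatasetWriteOptions write_options;
    write_options.file_write_options = parquet_options;
    write_options.filesystem = filesystem;
    write_options.base_dir = base_path;
    write_options.partitioning = partitioning;
    write_options.basename_template = "part{i}.parquet";
    ABORT_ON_FAILURE(ds::FileSystemDataset::Write(write_options, scanner));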
