You can create an InMemoryDataset from a RecordBatch. See [1] for docs and [2] for example code. You may be able to find something similar for filtering tables.
[1]: https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset15InMemoryDataset15InMemoryDatasetENSt10shared_ptrI6SchemaEE17RecordBatchVector
[2]: https://gitlab.com/skyhookdm/skytether-singlecell/-/blob/mainline/src/cpp/processing/operators.cpp#L50

Aldrin Montana
Computer Science PhD Student
UC Santa Cruz

On Mon, Jul 25, 2022 at 8:49 PM 1057445597 <[email protected]> wrote:

> I use the following code to filter a table, but it always core dumps at
> scanner_builder->Filter(filter_expression_). Is there a better way to
> filter a table or a RecordBatch?
>
> By the way, dataset::ScannerBuilder always core dumps when I use it in tfio
> to create a TensorFlow dataset. It is most likely buggy.
>
> // Read file columns and build a table
> std::shared_ptr<::arrow::Table> table;
> CHECK_ARROW(reader->ReadTable(column_indices_, &table));
> // Convert the table to a sequence of batches
> auto tr = std::make_shared<arrow::TableBatchReader>(*table.get());
>
> // filter
> auto scanner_builder =
>     arrow::dataset::ScannerBuilder::FromRecordBatchReader(tr);
> if (!dataset()->filter_.empty()) {
>   std::cout << filter_expression_.ToString() << std::endl;
>   scanner_builder->Filter(filter_expression_);
> }
>
> ------------------------------
> 1057445597
> [email protected]
