You can create an InMemoryDataset from a RecordBatch. See [1] for docs and
[2] for example code. You may be able to find something similar for
filtering tables.

[1]:
https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset15InMemoryDataset15InMemoryDatasetENSt10shared_ptrI6SchemaEE17RecordBatchVector
[2]:
https://gitlab.com/skyhookdm/skytether-singlecell/-/blob/mainline/src/cpp/processing/operators.cpp#L50

Aldrin Montana
Computer Science PhD Student
UC Santa Cruz


On Mon, Jul 25, 2022 at 8:49 PM 1057445597 <[email protected]> wrote:

> I use the following code to filter a table, but it always core dumps at
> scanner_builder->Filter(filter_expression_). Is there a better way to
> filter a Table or a RecordBatch?
>
> By the way, dataset::ScannerBuilder always core dumps when I use it in tfio
> to create a TensorFlow dataset, so it is most likely buggy.
>
>
> // Read file columns and build a table
> std::shared_ptr<::arrow::Table> table;
> CHECK_ARROW(reader->ReadTable(column_indices_, &table));
> // Convert the table to a sequence of batches
> auto tr = std::make_shared<arrow::TableBatchReader>(*table);
>
> // filter
> auto scanner_builder =
>     arrow::dataset::ScannerBuilder::FromRecordBatchReader(tr);
> if (!dataset()->filter_.empty()) {
>   std::cout << filter_expression_.ToString() << std::endl;
>   scanner_builder->Filter(filter_expression_);
> }
>
> ------------------------------
> 1057445597
> [email protected]
