I'm using C++/Arrow to filter large Parquet files, and I'm able to do this
successfully. The current proof-of-concept implementation is based on nested
for loops, which I would like to avoid. Instead, I'd like to use the built-in
filter/take functions, or get recommendations on how to extract arrays of
indices or booleans (for use with take?) to filter out rows.
The input (data) array/column type is MapArray[key: String,
value: StructArray[id: String, …]].
The input filter is {filter_key: "some string", filter_ids: ["aaa", "bee",
"see", ...]}
- where filter_key is to match a map key and filter_ids the id values of the
  corresponding entries in the input MapArray.
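A minimal sketch of the row-matching logic, assuming a row matches when at least one of its map entries has key == filter_key and an id contained in filter_ids. To stay self-contained it models the MapArray's buffers with std::vector; `MapColumnModel` and `MatchRows` are hypothetical names, not Arrow API. In Arrow terms, `offsets` corresponds to `MapArray::value_offset(i)`, `keys` to `MapArray::keys()`, and `ids` to the `id` child of the `StructArray` returned by `MapArray::items()` (e.g. via `GetFieldByName("id")`). The idea is to build a mask over the flattened entries once and then reduce it per row through the offsets, so each entry is visited once instead of looping over rows × entries × ids.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the buffers behind
// MapArray[key: String, value: Struct[id: String, ...]].
// Row i owns the flattened entries in [offsets[i], offsets[i + 1]).
// Assumes keys.size() == ids.size() (one id per map entry).
struct MapColumnModel {
  std::vector<int32_t> offsets;   // length = num_rows + 1
  std::vector<std::string> keys;  // flattened map keys, one per entry
  std::vector<std::string> ids;   // flattened "id" field of the value struct
};

// Returns one bool per row: true if any entry in that row has
// key == filter_key and an id contained in filter_ids.
std::vector<bool> MatchRows(const MapColumnModel& col,
                            const std::string& filter_key,
                            const std::vector<std::string>& filter_ids) {
  // Build the id lookup set once (the role "is_in" plays in Arrow compute).
  const std::unordered_set<std::string> id_set(filter_ids.begin(),
                                               filter_ids.end());

  // Step 1: per-entry mask over the flattened children.
  std::vector<bool> entry_match(col.keys.size());
  for (std::size_t j = 0; j < col.keys.size(); ++j) {
    entry_match[j] = col.keys[j] == filter_key && id_set.count(col.ids[j]) > 0;
  }

  // Step 2: reduce entries to rows using the offsets ("any" per row).
  // Each entry is visited at most once, so the total work is O(rows + entries).
  const std::size_t num_rows =
      col.offsets.empty() ? 0 : col.offsets.size() - 1;
  std::vector<bool> row_match(num_rows, false);
  for (std::size_t i = 0; i < num_rows; ++i) {
    for (int32_t j = col.offsets[i]; j < col.offsets[i + 1]; ++j) {
      if (entry_match[j]) {
        row_match[i] = true;
        break;
      }
    }
  }
  return row_match;
}
```

In Arrow, the per-entry step is what the `equal` and `is_in` compute functions (combined with `and`) can do vectorized on the flattened key and id arrays; only the per-row "any" reduction over the offsets remains as a simple loop.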
The output I'm looking for is either an array of booleans or the indices of
the input rows that match the input filter.
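Once a per-row boolean mask exists, the two output forms are interchangeable. The sketch below, again plain C++ with hypothetical helper names `MaskToIndices` and `TakeRows`, shows the conversion and the take step. In Arrow, the mask can be passed directly to `arrow::compute::Filter`, or turned into indices (recent versions provide, to my knowledge, an `indices_nonzero` compute function) and passed to `arrow::compute::Take`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a per-row boolean mask into the indices of the true rows.
std::vector<int64_t> MaskToIndices(const std::vector<bool>& mask) {
  std::vector<int64_t> indices;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i]) indices.push_back(static_cast<int64_t>(i));
  }
  return indices;
}

// Gather the values at the given indices (the "take" operation).
// Throws std::out_of_range if an index is past the end of values.
template <typename T>
std::vector<T> TakeRows(const std::vector<T>& values,
                        const std::vector<int64_t>& indices) {
  std::vector<T> out;
  out.reserve(indices.size());
  for (int64_t idx : indices) {
    out.push_back(values.at(static_cast<std::size_t>(idx)));
  }
  return out;
}
```

Filtering with the mask and taking with the derived indices select the same rows; indices are mainly useful when the matches are sparse or need to be reused across several columns.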
Thank you