Hi,
Yes, the indexing DAG can support this today, and even if not, it can be
easily fixed. The main issue would be how to encode the mapping well.
For example, if we want to map from user_id to all events that belong to
that user, we need a different, scalable way of storing this mapping.
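To make the encoding question concrete, here is a minimal Python sketch
that contrasts a unique-key mapping (one location per key) with a
non-unique one (a list of locations per key). The names Location,
build_unique_index and build_multi_index are hypothetical and are not
Hudi APIs; the in-memory dict only illustrates the shape of the mapping,
not a scalable storage layout.

```python
from collections import defaultdict
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical pointer to where a record lives (not a Hudi type)."""
    file_id: str
    position: int


def build_unique_index(records):
    """Map each key to exactly one location; a repeated key is an error."""
    index = {}
    for key, loc in records:
        if key in index:
            raise ValueError(f"duplicate key: {key}")
        index[key] = loc
    return index


def build_multi_index(records):
    """Map each key to every location where it occurs (non-unique keys)."""
    index = defaultdict(list)
    for key, loc in records:
        index[key].append(loc)
    return dict(index)


# Illustrative data: user_1 has events in two different files.
events = [
    ("user_1", Location("file_a", 0)),
    ("user_2", Location("file_a", 1)),
    ("user_1", Location("file_b", 0)),
]

multi = build_multi_index(events)
```

With this data, multi["user_1"] holds two locations, while
build_unique_index(events) raises ValueError on the second user_1 entry.
The per-key list is the part that can grow without bound, which is why
the storage encoding matters.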
I can organize this
Hello Prashant, thanks for your time.
> With non unique keys how would tagging of records (for updates /
> deletes) work?
Currently, both GLOBAL_SIMPLE/BLOOM work out of the box in the mentioned
context. See the PySpark script and results below. As for the
implementation, the tagLocationBacktoRecords
Hi Nicolas,
The RI feature is designed for maximum performance, since it operates at
record-count scale. Hence, the schema is kept simple and minimal.
With non-unique keys, how would tagging of records (for updates / deletes)
work? How would the record index know which mapping of the array to return
for a given
Hi there,
I just tested the preview of RLI (rfc-08); amazing feature. Soon the fast
COW (rfc-68) will be based on RLI to get the Parquet offsets and allow
targeting Parquet row groups.
RLI is a global index; it therefore assumes that a Hudi key is present in
at most one Parquet file. As a result in the