Hi, I could use some input.
I am periodically appending additional Arrow data to an Arrow table.
The data always comes from the same place, so the schema is the same.
On each poll I create a fresh table from the incoming buffer, add some
client-side columns, and then concatenate the new table onto the primary
table.

This works fantastically until I use the countBy method, which appears to
use only the dictionary of the last batch, i.e. the one from the most
recent poll. That dictionary might account for 1% of the data in the
table, so it is definitely not a delta covering the earlier batches.

What's my next step? I'm close to just fixing the countBy function, but
that doesn't solve the core problem: the last batch dictionary is supposed
to be the most complete delta of the previous batches. As things stand,
any use of the batch dictionaries will be invalid, because each one
reflects only its own batch.
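
To make the countBy workaround concrete, here is a minimal sketch in plain
TypeScript. It does not use the Arrow JS API: DictBatch and countByValue
are hypothetical names, and the assumption is simply that each batch
carries its own dictionary plus one index per row. Counts are keyed by the
decoded value instead of the dictionary index, so batches with different
dictionaries can be combined safely.

```typescript
// Hypothetical stand-in for one dictionary-encoded batch; not the Arrow JS API.
interface DictBatch {
  dictionary: string[]; // this batch's own dictionary
  indices: number[];    // one dictionary index per row
}

// Count rows by decoded value so that differing per-batch dictionaries
// cannot be mistaken for one another.
function countByValue(batches: DictBatch[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const batch of batches) {
    // Tally per local index first, then fold into the value-keyed totals.
    const perIndex = new Array<number>(batch.dictionary.length).fill(0);
    for (const i of batch.indices) {
      perIndex[i] += 1;
    }
    perIndex.forEach((n, i) => {
      if (n > 0) {
        const key = batch.dictionary[i];
        counts.set(key, (counts.get(key) ?? 0) + n);
      }
    });
  }
  return counts;
}

// Two polls whose dictionaries overlap only on "b".
const polledBatches: DictBatch[] = [
  { dictionary: ["a", "b"], indices: [0, 1, 1] },
  { dictionary: ["b", "c"], indices: [0, 0, 1] },
];
const valueCounts = countByValue(polledBatches);
```

Counting against only the last dictionary would see just "b" and "c" here;
keying on the value keeps "a" from the first poll.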

I've tried concatenating batches/chunks, and also retaining all buffers
from every poll iteration and loading them at the same time via a
batchreader.all (hoping some logic I've not seen would unify the batch
dictionaries).
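
For reference, the kind of unification I was hoping for can be sketched in
plain TypeScript as below. Again this is not the Arrow JS API:
EncodedColumn and unifyDictionaries are hypothetical names, and string
dictionary values are an assumption. It builds one dictionary across all
batches and rewrites each batch's indices to point into it.

```typescript
// Hypothetical stand-in for a dictionary-encoded column; not the Arrow JS API.
interface EncodedColumn {
  dictionary: string[];
  indices: number[];
}

// Merge per-batch dictionaries into one and remap every row index to it.
function unifyDictionaries(batches: EncodedColumn[]): EncodedColumn {
  const unified: string[] = [];
  const position = new Map<string, number>();
  const indices: number[] = [];
  for (const batch of batches) {
    // Map each local dictionary slot to its slot in the unified dictionary.
    const remap = batch.dictionary.map((value) => {
      let slot = position.get(value);
      if (slot === undefined) {
        slot = unified.length;
        unified.push(value);
        position.set(value, slot);
      }
      return slot;
    });
    for (const i of batch.indices) {
      indices.push(remap[i]);
    }
  }
  return { dictionary: unified, indices };
}

const mergedColumn = unifyDictionaries([
  { dictionary: ["a", "b"], indices: [0, 1, 1] },
  { dictionary: ["b", "c"], indices: [0, 0, 1] },
]);
```

After this step every index refers to the same dictionary, so anything
that reads a single dictionary sees the whole column.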

Thanks,
-Dan Lustig
