I'm fairly new to Trident and am looking for some help making a
persistentAggregate group values over a defined set of tuples.
groupBy() followed by a regular (combiner) aggregate accumulates values for
the duration of a batch, then emits them all onto an output stream, which
is very convenient.
I'd like to make a persistentAggregate behave similarly, except that it
aggregates across batches, then outputs all its aggregates onto an output
stream when some external signal occurs, and then clears its database to
begin the next aggregation period.
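
To make the desired behavior concrete, here is a minimal plain-Java sketch
of the semantics I'm after. It is only a model, not Trident code: the class
PeriodAggregator and its methods add() and flush() are names I made up for
illustration, and it ignores partitioning and fault tolerance entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the desired behavior (hypothetical names, not Trident API).
class PeriodAggregator {
    // Running sums keyed by (DIMENSION_1, DIMENSION_2); they survive across batches.
    private final Map<List<String>, Long> sums = new LinkedHashMap<>();

    // Fold one tuple into the running sum for its group.
    void add(String dim1, String dim2, long metric) {
        sums.merge(Arrays.asList(dim1, dim2), metric, Long::sum);
    }

    // On the external signal: emit every aggregate, then clear for the next period.
    List<String> flush() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<List<String>, Long> e : sums.entrySet()) {
            out.add(e.getKey().get(0) + "," + e.getKey().get(1) + "=" + e.getValue());
        }
        sums.clear();
        return out;
    }
}
```

The key point is that add() may be called from many batches before a single
flush() emits everything and resets the state.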
One approach would be to have the spout put a "Done" flag in each tuple,
e.g. emit tuples like ("DIMENSION_1", "DIMENSION_2",
"METRIC_TO_BE_AGGREGATED", "DONE"), while grouping with .groupBy(new
Fields("DIMENSION_1", "DIMENSION_2")). The final tuple of each aggregation
period would have the Done flag set. Or, maybe better, the spout would send
a final batch consisting of a single tuple with null values for all the
other columns and the Done flag set, to indicate that all previous batches
are complete.
The problem with that is I'm not sure how to get the tuple with the "Done"
flag to go to all the partitions of the persistentAggregate. Is there any
way to broadcast a specific tuple to all partitions?
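
Here is a small plain-Java sketch of the routing problem as I understand
it. It assumes that grouped tuples are sent to a partition chosen by
hashing the group fields; the class DoneRouting and its methods are
hypothetical names for illustration, not Trident API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the routing problem (hypothetical names, not Trident API).
class DoneRouting {
    // Assumed routing for a grouped tuple: hash of its group key picks one partition.
    static int partitionFor(List<String> groupKey, int numPartitions) {
        return Math.floorMod(groupKey.hashCode(), numPartitions);
    }

    // Partitions that receive a tuple: exactly one for a hashed tuple,
    // all of them if the tuple could be broadcast.
    static List<Integer> targets(List<String> groupKey, boolean broadcast, int numPartitions) {
        List<Integer> out = new ArrayList<>();
        if (broadcast) {
            for (int p = 0; p < numPartitions; p++) {
                out.add(p); // what I want for the "Done" tuple
            }
        } else {
            out.add(partitionFor(groupKey, numPartitions)); // hashed like any data tuple
        }
        return out;
    }
}
```

Under that assumption, a single "Done" tuple with null dimensions hashes to
just one partition, so the other partitions never learn that the period has
ended.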
Another approach would be to split another stream off the spout that
watches for the "Done" flag and generates a state query when it sees it.
But it doesn't appear to be possible for a single state query to generate
more than a single tuple of output. It appears to want to be given a list
of keys to query for, and to return the tuples for those keys. I don't see
a way for a state query to dump the whole DB.
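
To illustrate the gap, here is a plain-Java model of a keyed state. Again
this is only a sketch with made-up names (KeyedStateModel, multiGet,
dumpAll), not the actual Trident state interfaces: multiGet() stands for
the key-based lookup a state query seems to expect, and dumpAll() for the
operation I can't find.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a keyed state (hypothetical names, not Trident API).
class KeyedStateModel {
    private final Map<String, Long> db = new TreeMap<>();

    void put(String key, long value) {
        db.put(key, value);
    }

    // Key-based query: one result per requested key, null if the key is absent.
    // The caller has to know the keys in advance.
    List<Long> multiGet(List<String> keys) {
        List<Long> out = new ArrayList<>();
        for (String k : keys) {
            out.add(db.get(k));
        }
        return out;
    }

    // The operation I'm missing: return every entry without being given any keys.
    Map<String, Long> dumpAll() {
        return new TreeMap<>(db);
    }
}
```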
Is there any way to do what I'm trying to do?