Hi,
I'm reasonably new to using Accumulo so I apologise if some of my terminology
is incorrect.
A bit of overview
We have an Accumulo table that ingests data in daily increments and ages off
data in daily increments. For each unique rowid we maintain a daily max and
min value and a count, using the MinCombiner, MaxCombiner and SummingCombiner.
When a user queries the table for a rowid, scan iterators are added to
calculate the min, max and count across the entire table by adding up the daily
summaries of min, max and count.
The timestamp is truncated to a day's timestamp, e.g. 1111100000000 in the
example below. This approach allows us to age off a day's worth of data without
having to recalculate the summary data, because the totals are calculated by
the scan iterators.
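To make the intent concrete, here is a minimal plain-Java sketch of the idea,
independent of Accumulo. The class and method names (DailySummarySketch,
Summary, truncateToDay, combine) and the per-day numbers are made up for
illustration, and I am assuming days are cut on UTC millisecond boundaries; it
is not the actual ingest or iterator code.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

public class DailySummarySketch {

    public static final long DAY_MS = TimeUnit.DAYS.toMillis(1);

    // Hypothetical holder for one summary (a single day, or the combined total).
    public static final class Summary {
        public final long min;
        public final long max;
        public final long count;

        public Summary(long min, long max, long count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }
    }

    // Truncate a millisecond timestamp to the start of its day
    // (assumption: day boundaries are multiples of 24h since the epoch).
    public static long truncateToDay(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, DAY_MS);
    }

    // Combine daily summaries: min of mins, max of maxes, sum of counts.
    // For an empty input this returns min = Long.MAX_VALUE,
    // max = Long.MIN_VALUE and count = 0.
    public static Summary combine(Collection<Summary> days) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long count = 0;
        for (Summary d : days) {
            min = Math.min(min, d.min);
            max = Math.max(max, d.max);
            count += d.count;
        }
        return new Summary(min, max, count);
    }

    public static void main(String[] args) {
        // Made-up daily summaries for one rowid, keyed by truncated timestamp.
        Map<Long, Summary> byDay = new TreeMap<>();
        byDay.put(truncateToDay(1234L), new Summary(999, 5000, 1));
        byDay.put(truncateToDay(DAY_MS + 5678L), new Summary(1200, 12500, 3));

        Summary total = combine(byDay.values());
        System.out.println(total.min + " " + total.max + " " + total.count);

        // Ageing off the oldest day only drops its entry; the remaining
        // days are simply combined again at query time.
        byDay.remove(0L);
        Summary afterAgeOff = combine(byDay.values());
        System.out.println(afterAgeOff.min + " " + afterAgeOff.max + " "
            + afterAgeOff.count);
    }
}
```

The point of the design is that the per-day entries are never rewritten: the
overall min, max and count only ever exist at query time.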
The problem
The issue I have come across is that when the scan iterators are added I get
different results depending on the relative priorities of the MinCombiner and
MaxCombiner. The SummingCombiner's result seems unaffected when I change its
priority. If the MinCombiner's priority is higher (smaller number) than the
MaxCombiner's the result is correct, but if I switch the priorities and give
the MaxCombiner the higher priority the result is incorrect and the MinCombiner
is not run.
This looks like
----------------------------------------------------------------------------
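import java.util.Collections;
import java.util.Map.Entry;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.user.MaxCombiner;
import org.apache.accumulo.core.iterators.user.MinCombiner;
import org.apache.accumulo.core.iterators.user.SummingCombiner;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

Range range = new Range("harry", "harry~");
// Set up the MIN combiner on the "min" column family
IteratorSetting isTotalMin = new IteratorSetting(15, "Min Calc",
    MinCombiner.class);
MinCombiner.setColumns(isTotalMin, Collections.singletonList(
    new IteratorSetting.Column("min")));
MinCombiner.setEncodingType(isTotalMin, MinCombiner.Type.STRING);
// Set up the MAX combiner on the "max" column family
IteratorSetting isTotalMax = new IteratorSetting(16, "Max Calc",
    MaxCombiner.class);
MaxCombiner.setColumns(isTotalMax, Collections.singletonList(
    new IteratorSetting.Column("max")));
MaxCombiner.setEncodingType(isTotalMax, MaxCombiner.Type.STRING);
// Set up the COUNT (summing) combiner on the "count" column family
IteratorSetting isTotalCount = new IteratorSetting(17, "Count Calc",
    SummingCombiner.class);
SummingCombiner.setColumns(isTotalCount, Collections.singletonList(
    new IteratorSetting.Column("count")));
SummingCombiner.setEncodingType(isTotalCount, SummingCombiner.Type.STRING);
// connector and tableName are defined elsewhere
Scanner s = connector.createScanner(tableName,
    new Authorizations("L1", "L2"));
s.addScanIterator(isTotalCount);
s.addScanIterator(isTotalMin);
s.addScanIterator(isTotalMax);
s.setRange(range);
s.fetchColumnFamily(new Text("count"));
s.fetchColumnFamily(new Text("min"));
s.fetchColumnFamily(new Text("max"));
// Print row, column family, column qualifier and value for each entry
for (Entry<Key, Value> e : s) {
  System.out.println(e.getKey().getRow() + ", "
      + e.getKey().getColumnFamily() + ", "
      + e.getKey().getColumnQualifier() + ", VALUE: " + e.getValue());
}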
--------------------------------------------------------------
If I run the above I get:
harry, count, 1111100000000, VALUE: 4
harry, max, 1111100000000, VALUE: 12500
harry, min, 1111100000000, VALUE: 999
This is correct.
However if I alter the priority of the MaxCombiner to be 14 and leave the
MinCombiner at 15 I get:
harry, count, 1111100000000, VALUE: 4
harry, max, 1111100000000, VALUE: 12500
I lose the min value altogether. I have tested altering the priority of the
SummingCombiner but it doesn't seem to have any effect.
This may be due to the way I have set up the iterators, or it could be an
Accumulo bug.
Keen to hear any thoughts.
Thanks in advance,
Matt