Hi,

I'm reasonably new to using Accumulo so I apologise if some of my terminology 
is incorrect.

A bit of overview

We have an Accumulo table that ingests data in daily increments and ages off 
data in daily increments.  For each unique rowid we maintain a daily max and 
min value and a count, using the MinCombiner, MaxCombiner and SummingCombiner.  
When a user queries the table for a rowid, scan iterators are added to 
calculate the min, max and count across the entire table by combining the 
daily summaries: the smallest daily min, the largest daily max and the sum of 
the daily counts.
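
To make the intended result concrete, here is a minimal plain-Java sketch of 
the roll-up the scan iterators are meant to produce. It does not use Accumulo 
at all; the DailySummary holder, the rollUp method and the daily values are 
hypothetical and only there for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class DailySummaryRollup {

    /** One day's summary for a single rowid (hypothetical holder type). */
    static final class DailySummary {
        final long min;
        final long max;
        final long count;

        DailySummary(long min, long max, long count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }
    }

    /** Folds daily summaries into one: min of mins, max of maxes, sum of counts. */
    static DailySummary rollUp(List<DailySummary> days) {
        if (days.isEmpty()) {
            throw new IllegalArgumentException("need at least one daily summary");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long count = 0;
        for (DailySummary d : days) {
            min = Math.min(min, d.min);
            max = Math.max(max, d.max);
            count += d.count;
        }
        return new DailySummary(min, max, count);
    }

    public static void main(String[] args) {
        // Made-up daily values, for illustration only.
        List<DailySummary> days = Arrays.asList(
                new DailySummary(1200, 12500, 1),
                new DailySummary(999, 4000, 3));
        DailySummary total = rollUp(days);
        // prints: 999 12500 4
        System.out.println(total.min + " " + total.max + " " + total.count);
    }
}
```

Because each of the three operations is associative, dropping a whole day from 
the input simply changes the totals on the next query; nothing has to be 
recomputed at age-off time.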

The timestamp is truncated to the day, e.g. 1111100000000 in the example 
below.  This approach allows us to age off a day's worth of data without 
having to recalculate the summary data, because the totals are calculated by 
the scan iterators at query time.
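
As a rough sketch of the truncation step, the following assumes timestamps are 
epoch milliseconds and that a "day" means a UTC day; both are assumptions for 
the sake of the example and may not match how the ingest actually truncates. 
The class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class DayTruncation {

    // Number of milliseconds in one day (86,400,000).
    static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

    /** Rounds an epoch-millisecond timestamp down to the start of its UTC day. */
    static long truncateToDay(long epochMillis) {
        // floorDiv rounds towards negative infinity, so pre-1970 values
        // are also rounded down to the start of their day.
        return Math.floorDiv(epochMillis, MILLIS_PER_DAY) * MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        long ts = 3 * MILLIS_PER_DAY + 12_345L; // some time during day 3
        // prints 259200000
        System.out.println(truncateToDay(ts));
    }
}
```

Every entry written during the same day then carries the same truncated value, 
so a day's worth of data can be identified and aged off as a unit.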

The problem

The issue I have come across is that, when the scan iterators are added, I get 
different results depending on the relative priorities of the MinCombiner and 
MaxCombiner.  Changing the priority of the SummingCombiner seems to make no 
difference.  If the MinCombiner has a higher priority (smaller number) than 
the MaxCombiner, the result is correct, but if I switch the priorities and 
give the MaxCombiner the higher priority, the result is incorrect: the min 
entry is missing, as if the MinCombiner is not run.


The code looks like this:
----------------------------------------------------------------------------

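import java.util.Collections;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.user.MaxCombiner;
import org.apache.accumulo.core.iterators.user.MinCombiner;
import org.apache.accumulo.core.iterators.user.SummingCombiner;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

// (inside a method; connector and tableName are set up elsewhere)
Range range = new Range("harry", "harry~");

// Set up the MIN: priority 15, applied to the "min" column family
IteratorSetting isTotalMin = new IteratorSetting(15, "Min Calc",
    MinCombiner.class);
MinCombiner.setColumns(isTotalMin, Collections.singletonList(
    new IteratorSetting.Column("min")));
MinCombiner.setEncodingType(isTotalMin, MinCombiner.Type.STRING);

// Set up the MAX: priority 16, applied to the "max" column family
IteratorSetting isTotalMax = new IteratorSetting(16, "Max Calc",
    MaxCombiner.class);
MaxCombiner.setColumns(isTotalMax, Collections.singletonList(
    new IteratorSetting.Column("max")));
MaxCombiner.setEncodingType(isTotalMax, MaxCombiner.Type.STRING);

// Set up the COUNT: priority 17, applied to the "count" column family
IteratorSetting isTotalCount = new IteratorSetting(17, "Count Calc",
    SummingCombiner.class);
SummingCombiner.setColumns(isTotalCount, Collections.singletonList(
    new IteratorSetting.Column("count")));
SummingCombiner.setEncodingType(isTotalCount, SummingCombiner.Type.STRING);

Scanner s = connector.createScanner(tableName, new Authorizations("L1", "L2"));
s.addScanIterator(isTotalCount);
s.addScanIterator(isTotalMin);
s.addScanIterator(isTotalMax);
s.setRange(range);
s.fetchColumnFamily(new Text("count"));
s.fetchColumnFamily(new Text("min"));
s.fetchColumnFamily(new Text("max"));

// Print row, column family, column qualifier and value for each entry
for (Entry<Key, Value> e : s) {
  System.out.println(e.getKey().getRow() + ", " + e.getKey().getColumnFamily()
      + ", " + e.getKey().getColumnQualifier() + ", VALUE: " + e.getValue());
}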

--------------------------------------------------------------

If I run the above I get:

harry, count, 1111100000000, VALUE: 4
harry, max, 1111100000000, VALUE: 12500
harry, min, 1111100000000, VALUE: 999

This is correct.

However if I alter the priority of the MaxCombiner to be 14 and leave the 
MinCombiner at 15 I get:

harry, count, 1111100000000, VALUE: 4
harry, max, 1111100000000, VALUE: 12500

I lose the min value altogether.  I have tested altering the priority of the 
SummingCombiner but it doesn't seem to have any effect.

This may be due to the way I have set up the iterators, or it could be an 
Accumulo bug.

Keen to hear any thoughts.

Thanks in advance,
Matt

