You should be able to roll up on keys with a condition similar to:
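if (source.hasTop()) {
    // Copy the top key: the iterator may reuse the Key instance.
    Key start = new Key(source.getTopKey());
    long count = 0;
    // Sum every cell that shares start's row, colfam and colqual.
    // Visibility is left out of the comparison so the sum spans all
    // colvis values the scan is authorized to see.
    while (source.hasTop()
            && start.equals(source.getTopKey(), PartialKey.ROW_COLFAM_COLQUAL)) {
        // deserialize/serialize are placeholders for your value encoding
        count += deserialize(source.getTopValue());
        source.next();
    }
    Value new_top_value = serialize(count);
    // start can represent the top key of the iterator
}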
We can flesh this out further if you run into issues. I think that we may
need to set the start key's timestamp to 0: timestamps sort in descending
order, so that makes it sort after all the other cells with the same prefix.
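
To check the roll-up logic without a running instance, here is a minimal plain-Java sketch of the same loop. It does not use the Accumulo API: the Cell class, the rollUp and sample methods, and the single-label visibility check are all stand-ins I made up for illustration (real visibility expressions can be boolean combinations of labels). It assumes the cells arrive already sorted, as they would from a scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RollupSketch {

    /** Hypothetical stand-in for an Accumulo key/value pair; not the real API. */
    static final class Cell {
        final String row, colfam, colqual, colvis;
        final long value;

        Cell(String row, String colfam, String colqual, String colvis, long value) {
            this.row = row;
            this.colfam = colfam;
            this.colqual = colqual;
            this.colvis = colvis;
            this.value = value;
        }

        /** Same idea as comparing two keys up to the column qualifier. */
        boolean samePrefix(Cell other) {
            return row.equals(other.row)
                    && colfam.equals(other.colfam)
                    && colqual.equals(other.colqual);
        }
    }

    /**
     * Sums values over runs of cells that share (row, colfam, colqual).
     * Cells whose colvis label is not in auths are skipped, which mimics
     * the visibility filtering applied before a scan-time iterator runs.
     */
    static List<Cell> rollUp(List<Cell> sorted, Set<String> auths) {
        List<Cell> out = new ArrayList<>();
        Cell start = null;
        long count = 0;
        for (Cell c : sorted) {
            if (!auths.contains(c.colvis)) {
                continue; // the client cannot see this cell
            }
            if (start == null || !start.samePrefix(c)) {
                if (start != null) {
                    // Emit the finished run with an empty visibility.
                    out.add(new Cell(start.row, start.colfam, start.colqual, "", count));
                }
                start = c;
                count = 0;
            }
            count += c.value;
        }
        if (start != null) {
            out.add(new Cell(start.row, start.colfam, start.colqual, "", count));
        }
        return out;
    }

    /** The six cells from the example table, in sorted order. */
    static List<Cell> sample() {
        return List.of(
                new Cell("student1", "TAKES_CLASS_WITH", "prof1", "COM_SCI_DEPT", 1),
                new Cell("student1", "TAKES_CLASS_WITH", "prof1", "COM_SCI_DEPT", 1),
                new Cell("student1", "TAKES_CLASS_WITH", "prof1", "MATH_DEPT", 1),
                new Cell("student1", "TAKES_CLASS_WITH", "prof1", "MATH_DEPT", 1),
                new Cell("student2", "TAKES_CLASS_WITH", "prof1", "COM_SCI_DEPT", 1),
                new Cell("student2", "TAKES_CLASS_WITH", "prof1", "MATH_DEPT", 1));
    }

    public static void main(String[] args) {
        // Dean's view: both labels are visible.
        for (Cell c : rollUp(sample(), Set.of("MATH_DEPT", "COM_SCI_DEPT"))) {
            System.out.println(c.row + " " + c.colfam + " " + c.colqual + " " + c.value);
        }
    }
}
```

With both labels the sketch yields 4 for student1 and 2 for student2; with only MATH_DEPT it yields 2 and 1. A real iterator would also have to handle seeks and re-seeks that land in the middle of a run, which this sketch ignores.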
On Tue, Jul 1, 2014 at 10:41 PM, Matthew Purdy <
[email protected]> wrote:
>
>
> USE CASE: on scan only; want to have a "summing combiner" that rolls
> up by (rowId, colfam, colqual) on all row keys where the client has
> visibility.
>
> below is a simple example that expresses the use case.
>
> accumulo table holding student to professor relationship by departments
>
>
> +----------+------------------+-----------+--------------+-----+
> | rowId    | colfam           | colqual   | colvis       | val |
> +----------+------------------+-----------+--------------+-----+
> | student1 | TAKES_CLASS_WITH | prof1     | MATH_DEPT    | 1   |
> | student1 | TAKES_CLASS_WITH | prof1     | MATH_DEPT    | 1   |
> | student1 | TAKES_CLASS_WITH | prof1     | COM_SCI_DEPT | 1   |
> | student1 | TAKES_CLASS_WITH | prof1     | COM_SCI_DEPT | 1   |
> | student2 | TAKES_CLASS_WITH | prof1     | MATH_DEPT    | 1   |
> | student2 | TAKES_CLASS_WITH | prof1     | COM_SCI_DEPT | 1   |
> +----------+------------------+-----------+--------------+-----+
>
>
> with the summing combiner the results would be
>
> +----------+------------------+-----------+--------------+-----+
> | rowId    | colfam           | colqual   | colvis       | val |
> +----------+------------------+-----------+--------------+-----+
> | student1 | TAKES_CLASS_WITH | prof1     | MATH_DEPT    | 2   |
> | student1 | TAKES_CLASS_WITH | prof1     | COM_SCI_DEPT | 2   |
> | student2 | TAKES_CLASS_WITH | prof1     | MATH_DEPT    | 1   |
> | student2 | TAKES_CLASS_WITH | prof1     | COM_SCI_DEPT | 1   |
> +----------+------------------+-----------+--------------+-----+
>
> - the math department can only see math department totals
> - the com sci department can only see com sci department totals
> - the office of the dean has access to both
>
> therefore when scanning (it wouldn't work for compaction), how
> can you sum over colvis?
>
> assuming you had both colvis access the desired results would be:
>
> +----------+------------------+-----------+-----+
> | rowId    | colfam           | colqual   | val |
> +----------+------------------+-----------+-----+
> | student1 | TAKES_CLASS_WITH | prof1     | 4   |
> | student2 | TAKES_CLASS_WITH | prof1     | 2   |
> +----------+------------------+-----------+-----+
>
>
>