Scans during Compaction

Dylan Hutchison Mon, 23 Feb 2015 09:37:56 -0800

Hello all,

When I initiate a full major compaction (with flushing turned on) manually via
the Accumulo API
<https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#compact(java.lang.String, org.apache.hadoop.io.Text, org.apache.hadoop.io.Text, java.util.List, boolean, boolean)>,
how does the table appear to

1. clients that started scanning the table before the major compaction
began;
2. clients that start scanning during the major compaction?

I'm interested in the case where an iterator attached to the full major
compaction modifies entries while preserving their sorted order.

The best possible answer for my use case, with case #2 more important than
case #1 and *low latency* more important than high throughput, is that

1. clients that started scanning before the compaction began would not
see entries altered by the compaction-time iterator;
2. clients that start scanning during the major compaction receive entries
as the major compaction finishes processing them, such that the clients
*only* see entries that have passed through the compaction-time iterator.

How accurate are these descriptions? If #2 really worked the way I would
like, then a scan on the range (-inf,+inf) started after the compaction
begins would "monitor compaction progress": the first batch of entries would
reach the scanner as soon as the major compaction makes it available, and
the scanner would finish (receive all entries) exactly when the compaction
finishes. If this is not possible, I may build something to that effect by
calling the blocking version of compact() and scanning afterwards.
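
A minimal sketch of that fallback, written as a toy model rather than against
a live Accumulo instance. ToyTable, compactBlocking and compactThenScan are
hypothetical names invented for illustration, not Accumulo classes or
methods. In the real API the blocking call would presumably be compact(...)
with its last boolean argument (wait) set to true. The sketch only shows the
ordering guarantee the fallback relies on: the scan starts after the
compaction call returns, so it can only see transformed entries. It does not
give the low-latency streaming described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

public class BlockingCompactThenScan {

    /** Toy stand-in for a sorted table. Hypothetical; NOT an Accumulo class. */
    public static final class ToyTable {
        private TreeMap<String, String> entries = new TreeMap<>();

        public void put(String key, String value) {
            entries.put(key, value);
        }

        /**
         * Hypothetical blocking compaction: rewrites every value through the
         * given transform (keys, and therefore sort order, are untouched) and
         * returns only once all entries have been rewritten.
         */
        public void compactBlocking(UnaryOperator<String> valueTransform) {
            TreeMap<String, String> rewritten = new TreeMap<>();
            for (Map.Entry<String, String> e : entries.entrySet()) {
                rewritten.put(e.getKey(), valueTransform.apply(e.getValue()));
            }
            entries = rewritten;
        }

        /** Full-range scan, returning "key=value" strings in key order. */
        public List<String> scanAll() {
            List<String> out = new ArrayList<>();
            for (Map.Entry<String, String> e : entries.entrySet()) {
                out.add(e.getKey() + "=" + e.getValue());
            }
            return out;
        }
    }

    /** Compact first (blocking), then scan: the scan sees only transformed entries. */
    public static List<String> compactThenScan(ToyTable table,
                                               UnaryOperator<String> valueTransform) {
        table.compactBlocking(valueTransform);
        return table.scanAll();
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("row2", "b");
        table.put("row1", "a");
        // The transform plays the role of the compaction-time iterator.
        for (String entry : compactThenScan(table, v -> v.toUpperCase())) {
            System.out.println(entry);
        }
    }
}
```

The cost of this approach is latency: no entry reaches the client until the
whole compaction has finished, which is the opposite of the streaming
behavior hoped for in case #2.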

Bonus: how does cancelCompaction()
<https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#cancelCompaction(java.lang.String)>
affect clients scanning in case #1 and case #2?

Regards,
Dylan Hutchison
