Hello all,
When I initiate a full major compaction (with flushing turned on) manually via
the Accumulo API
https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#compact(java.lang.String,
org.apache.hadoop.io.Text, org.apache.hadoop.io.Text,
Dylan,
The effect of a major compaction is never seen in queries before the major
compaction completes. At the end of the major compaction there is a
multi-phase commit which eventually replaces all of the old files with the
new file. At that point the major compaction will have completely
Dylan,
I think the way this is generally solved is by using an idempotent iterator
that can be applied at both full major compaction and query scopes to give
a consistent view. Aggregation, age-off filtering, and all the other
standard iterators have the property that you can leave them in place
Thanks Adam and Keith.
I see the following as a potential solution that achieves (1) low latency
for clients that want to see entries after an iterator and (2) the entries
from that iterator persisting in the Accumulo table.
1. Start a major compaction in thread T1 of a client with the
Is the iterator that rewrites data during compaction idempotent?
If you can apply the same function (the iterator) multiple times over
the data (maybe only in the scan, maybe in the scan and by a major
compaction), the only concern is doing a bit more work in the server.
Given that you
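
To make the idempotency point concrete, here is a minimal sketch in plain Java. It does not use the Accumulo API: the class name, the `ageOff` helper, and the row-to-timestamp map are hypothetical stand-ins for an age-off style filter. It only illustrates that applying such a function once (scan only) or twice (major compaction, then scan) gives the same view.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class IdempotentFilterSketch {
    // Hypothetical stand-in for an age-off filter (not an Accumulo iterator):
    // keeps only entries whose timestamp is at or after the cutoff.
    static SortedMap<String, Long> ageOff(SortedMap<String, Long> entries, long cutoff) {
        SortedMap<String, Long> kept = new TreeMap<>();
        for (Map.Entry<String, Long> e : entries.entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up rows and timestamps, for illustration only.
        SortedMap<String, Long> table = new TreeMap<>();
        table.put("row1", 100L);
        table.put("row2", 500L);
        table.put("row3", 900L);
        long cutoff = 400L;

        // Filter applied once, as at scan scope only.
        SortedMap<String, Long> scanOnly = ageOff(table, cutoff);
        // Filter applied twice, as at major compaction scope and then scan scope.
        SortedMap<String, Long> compactedThenScanned = ageOff(ageOff(table, cutoff), cutoff);

        // Idempotent: both paths yield the same entries.
        System.out.println(scanOnly.equals(compactedThenScanned)); // prints "true"
    }
}
```

Because the second application changes nothing, the only cost of leaving such a filter configured at both scopes is the extra work on the server.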
Good suggestion; I will follow up with a design document in the next few
days.
Creating idempotency via indicator entries (in the column family,
timestamp, or something else) is one way to accommodate an iterator that
should run only once over a table's entries. I think we may have the
opportunity