Scans during Compaction

2015-02-23 Thread Dylan Hutchison
Hello all. When I initiate a full major compaction (with flushing turned on) manually via the Accumulo API https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#compact(java.lang.String, org.apache.hadoop.io.Text, org.apache.hadoop.io.Text, ...

Re: Scans during Compaction

2015-02-23 Thread Adam Fuchs
Dylan, The effect of a major compaction is never seen in queries before the major compaction completes. At the end of the major compaction there is a multi-phase commit which eventually replaces all of the old files with the new file. At that point the major compaction will have completely ...
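Adam's point — readers see either the complete old file set or the complete new one, never a mix — can be modeled as a single atomic reference swap. This is a simplified sketch, not Accumulo's actual mechanism (which involves the metadata table and a multi-phase commit); the file names are invented:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class AtomicSwapSketch {
    // Readers always observe a complete file set, never a partial one,
    // because the replacement is a single atomic reference update.
    static final AtomicReference<List<String>> files =
            new AtomicReference<>(List.of("A.rf", "B.rf", "C.rf"));

    static void commitCompaction(List<String> compacted) {
        files.set(compacted); // all old files replaced in one step
    }

    public static void main(String[] args) {
        List<String> before = files.get();
        commitCompaction(List.of("D.rf")); // compaction output: one new file
        List<String> after = files.get();
        System.out.println(before + " -> " + after);
    }
}
```

Any scan that grabbed the reference before the swap keeps reading the old files; any scan after the swap reads only the new one — matching the "never seen in queries before it completes" behavior described above.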

Re: Scans during Compaction

2015-02-23 Thread Adam Fuchs
Dylan, I think the way this is generally solved is by using an idempotent iterator that can be applied at both full major compaction and query scopes to give a consistent view. Aggregation, age-off filtering, and all the other standard iterators have the property that you can leave them in place ...
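The property Adam describes can be seen with an age-off filter: applying it a second time (at scan time, over data a compaction already filtered) changes nothing. A minimal stand-in sketch — this is not Accumulo's `AgeOffFilter`, just a map-based model of the same idea:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class AgeOffSketch {
    // Hypothetical age-off: drop entries whose timestamp is older than
    // a cutoff. Running it twice equals running it once (idempotent),
    // so it is safe at both the compaction and scan scopes.
    static Map<String, Long> ageOff(Map<String, Long> entries, long cutoff) {
        return entries.entrySet().stream()
                .filter(e -> e.getValue() >= cutoff)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, TreeMap::new));
    }

    public static void main(String[] args) {
        Map<String, Long> table = Map.of("row1", 50L, "row2", 150L, "row3", 200L);
        Map<String, Long> once = ageOff(table, 100L);       // e.g. at compaction
        Map<String, Long> twice = ageOff(once, 100L);       // e.g. again at scan
        System.out.println(once.equals(twice)); // idempotent: prints true
    }
}
```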

Re: Scans during Compaction

2015-02-23 Thread Dylan Hutchison
Thanks Adam and Keith. I see the following as a potential solution that achieves (1) low latency for clients that want to see entries after an iterator and (2) the entries from that iterator persisting in the Accumulo table. 1. Start a major compaction in thread T1 of a client with the ...
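Dylan's two goals fit together when the iterator is idempotent: scans apply the transform on read (low latency) while a background compaction persists the same transform, and the client's view is identical before and after the compaction commits. A self-contained model of that scheme, with an invented uppercase transform standing in for the iterator:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ScanDuringCompactionSketch {
    // Hypothetical model: the same idempotent transform runs at scan
    // time (for low latency) and in a background "compaction" thread
    // (to persist the result in the table).
    static final Map<String, String> store = new ConcurrentHashMap<>(
            Map.of("r1", "alpha", "r2", "beta"));

    static String transform(String v) { return v.toUpperCase(); }

    // Scan-time view: apply the iterator on read, whether or not the
    // compaction has already rewritten the entry underneath.
    static String scan(String row) { return transform(store.get(row)); }

    public static void main(String[] args) throws InterruptedException {
        String before = scan("r1");          // view before compaction commits
        Thread t1 = new Thread(() ->         // T1: persist the transform
                store.replaceAll((k, v) -> transform(v)));
        t1.start();
        t1.join();
        String after = scan("r1");           // same view after: idempotent
        System.out.println(before + " " + after + " " + store.get("r1"));
    }
}
```

The scan result is "ALPHA" in both states; only the stored value changes once T1 finishes.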

Re: Scans during Compaction

2015-02-23 Thread Josh Elser
Is your iterator which is rewriting data during compaction idempotent? If you can apply the same function (the iterator) multiple times over the data (maybe only in the scan, maybe in the scan and by a major compaction), the only concern is doing a bit more work in the server. Given that you ...
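Josh's question can be checked directly: a transform f is safe to run at both scan and compaction scopes only if f(f(x)) equals f(x) for the entries it may see. A quick property-check sketch (the two sample transforms are invented for illustration):

```java
import java.util.function.UnaryOperator;

public class IdempotenceCheck {
    // Returns true if applying f a second time to f(sample) changes
    // nothing -- the idempotence property Josh asks about.
    static <T> boolean isIdempotentOn(UnaryOperator<T> f, T sample) {
        T once = f.apply(sample);
        return f.apply(once).equals(once);
    }

    public static void main(String[] args) {
        UnaryOperator<String> truncate = s -> s.length() > 3 ? s.substring(0, 3) : s;
        UnaryOperator<String> appendX  = s -> s + "x"; // NOT idempotent
        System.out.println(isIdempotentOn(truncate, "abcdef")); // true
        System.out.println(isIdempotentOn(appendX, "abc"));     // false
    }
}
```

An iterator like `appendX` would corrupt data if it ran once at compaction and again at scan time; one like `truncate` only costs the server a little extra work, which is Josh's point.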

Re: Scans during Compaction

2015-02-23 Thread Dylan Hutchison
Good suggestion; I will follow up with a design document in the next few days. Creating idempotency via indicator entries (in the column family, timestamp or something else) is one option to work in an iterator that should run once over a table's entries. I think we may have the opportunity ...
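The indicator-entry idea makes a run-once transform idempotent by tagging processed entries so later applications pass them through unchanged. A minimal sketch, assuming a hypothetical "done:" marker prefixed to the value (Dylan suggests the column family or timestamp as alternative places for the indicator):

```java
import java.util.Map;
import java.util.TreeMap;

public class IndicatorEntrySketch {
    // Hypothetical indicator: a marker prefix distinguishes entries the
    // transform has already processed, so re-running it is a no-op.
    static final String DONE = "done:";

    static String runOnce(String value) {
        if (value.startsWith(DONE)) return value;   // already processed: skip
        return DONE + value.toUpperCase();          // process and mark
    }

    public static void main(String[] args) {
        Map<String, String> table = new TreeMap<>(Map.of("r1", "alpha"));
        table.replaceAll((k, v) -> runOnce(v));     // e.g. at compaction
        String once = table.get("r1");
        table.replaceAll((k, v) -> runOnce(v));     // e.g. again at scan
        System.out.println(once.equals(table.get("r1")) + " " + table.get("r1"));
    }
}
```

In a real table the marker would need to survive compaction and be cheap to test per entry; that trade-off is presumably what the promised design document would work out.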