Good suggestion; I will follow up with a design document in the next few
days.
Achieving idempotency via indicator entries (in the column family,
timestamp, or something else) is one way to accommodate an iterator that
should run once over a table's entries. I think we may have the
opportunity to solve a more general problem---scans merging data from
multiple table sources with user-defined merge & compute functions---in
addition to my use case by re-approaching the problem. Think
selective-scan->transform->join->transform->write-out. Analytics on the
server.
My specific use case is table-table multiplication, treating table rows,
column qualifiers and values as the components of a matrix. We view a
table as a /sparse/ matrix by treating non-present (row, colQ, value)
entries as zeros in the matrix. Accumulo offers an advantage when we
only want to run /on selected ranges/ from the input tables, as opposed
to running on whole tables where YARN/MapReduce may work better. The
table-table multiplication should run on tables in Accumulo, the result
/persisting/ to Accumulo, so that we never need return values to the
client. We value /low latency/ over high throughput, so that we can
perform multiplications interactively. The user should have a way to
/monitor/ multiplication progress, perhaps by a live count of the number
of entries processed or (harder) a live sample of result entries from
the multiplication. The user should be able to /stop/ an operation
midway once they decide enough entries have been processed. In addition
to interactivity, we may want to perform multiplications /in series/ and
in parallel. They form base /building blocks/ for higher-level algorithms.
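To make the matrix view concrete, here is a minimal sketch in plain Python, not Accumulo code. It assumes a table can be modeled as a dict from (row, colQ) to a numeric value, and the names table_multiply and row_range are hypothetical, chosen only for illustration. Absent entries count as zero, so only stored entries are visited.

```python
from collections import defaultdict

def table_multiply(a, b, row_range=None):
    """Multiply two sparse tables modeled as {(row, colQ): value}.

    Assumption: this dict model is only an illustration of the matrix view;
    it is not how Accumulo stores or scans data.
    """
    # Index B by its row so each A entry can find matching partners quickly.
    b_by_row = defaultdict(list)
    for (k, j), bv in b.items():
        b_by_row[k].append((j, bv))

    result = defaultdict(int)
    for (i, k), av in sorted(a.items()):
        # Optionally restrict the work to a selected range of A's rows.
        if row_range is not None and not (row_range[0] <= i <= row_range[1]):
            continue
        for j, bv in b_by_row.get(k, ()):
            result[(i, j)] += av * bv  # accumulate partial products
    return dict(result)

A = {("r1", "c1"): 2, ("r1", "c2"): 3, ("r2", "c2"): 4}
B = {("c1", "x"): 5, ("c2", "x"): 1, ("c2", "y"): 7}
C = table_multiply(A, B)
C_r2 = table_multiply(A, B, row_range=("r2", "r2"))
```

The row_range argument stands in for running only /on selected ranges/ of the input; a real implementation would push that restriction down into the scan rather than filter in a loop.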
I promise I will write these details up more formally, including how I
made them work so far and putting them in more general context. Will
post in a separate thread.
Regards,
Dylan Hutchison
On Mon, Feb 23, 2015 at 2:16 PM, Adam Fuchs wrote:
Dylan,
I think the way this is generally solved is by using an idempotent
iterator that can be applied at both full major compaction and query
scopes to give a consistent view. Aggregation, age-off filtering,
and all the other "standard" iterators have the property that you
can leave them in place and get a consistent answer even if they are
applied multiple times. Major compaction and query-time iterators
are even simpler than the general case, since you don't really need
to worry about partial views of the underlying data. In your case I
think you are trying to use an iterator that needs to be applied
exactly once to a complete stream of data (either at query time or
major compaction time). What we should probably do is look at
options for more generally supporting that type of iterator. You
could help us a ton by describing exactly what you want your
iterator to do, and we can all propose a few ideas for how this
might be implemented. Here are a couple off the top of my head:
1. If you can reform your iterator so that it is idempotent then you
can apply it liberally. This might be possible using some sort of
flag that the major compactor puts in the data and the query-time
iterator looks for to determine if the compaction has already
happened. We often use version numbers in column families to this
effect. Special row keys at the beginning of the tablet might also
be an option. This would be doable without changes to Accumulo.
2. We could build a mechanism into core Accumulo that applies an
iterator with exactly-once semantics, such that the user submits a
transformation as an iterator and it gets applied similarly to how
you described. The query-time reading of results of the major
compaction might be overkill, but that would be a possible
optimization that we could think about engineering in a second pass.
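As a rough illustration of idea 1, the sketch below is plain Python rather than an Accumulo iterator. It assumes entries are (row, colF, colQ, value) tuples and uses a hypothetical version prefix "v2:" on the column family as the flag. Entries that already carry the flag pass through unchanged, so applying the transform a second time is a no-op.

```python
MARKER = "v2:"  # hypothetical version tag placed in front of the column family

def apply_idempotently(entries, transform):
    """Yield transformed entries, skipping any that are already marked.

    Assumption: entries are (row, colF, colQ, value) tuples in sorted order,
    and a given stream is either entirely marked or entirely unmarked, so
    adding the same prefix everywhere keeps the output sorted.
    """
    for row, colf, colq, value in entries:
        if colf.startswith(MARKER):
            # Already processed (for example by the compaction): leave as is.
            yield (row, colf, colq, value)
        else:
            yield (row, MARKER + colf, colq, transform(value))

raw = [("r1", "cf", "q1", 1), ("r1", "cf", "q2", 2)]
once = list(apply_idempotently(raw, lambda v: v * 10))
twice = list(apply_idempotently(once, lambda v: v * 10))
```

Here once and twice are equal, which is the property that lets the same logic sit at both compaction and scan scopes.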
Adam
On Mon, Feb 23, 2015 at 1:42 PM, Dylan Hutchison wrote:
Thanks Adam and Keith.
I see the following as a potential solution that achieves (1)
low latency for clients that want to see entries after an
iterator and (2) the entries from that iterator persisting in
the Accumulo table.
1. Start a major compaction in thread T1 of a client with the
iterator set, blocking until the compaction completes.
2. Start scanning in thread T2 of the client with the same
iterator now set at scan-time scope. Use an isolated scanner
to make sure we do not read the results of the major
compaction committing, though this is not foolproof because of
timing and because the isolated scanner isolates only row-wise.
3. Eventually, T1 unblocks and signals that the compaction
has completed. T1 interrupts T2.
4. Thread T2 stops scanning, removes the scan-time iterator,
and starts scanning again at the point it last left off, now
seeing the results of the major compaction which already
passed through the iterator.
The whole scheme is only necessary if the client wants results
faster than the major compaction completes. A disadvantage is
duplicated work -- the iterator runs at scan-time and at
compaction-time until the compaction finishes. This may strain
server resources.
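The T1/T2 handoff can be sketched with a toy in-memory table in plain Python. Everything here (ToyTable, read_after, scan_with_handoff, transform) is hypothetical and not part of the Accumulo API. One simplification matters: the toy returns each entry together with an "already compacted" flag under a single lock, which sidesteps the timing problem mentioned in step 2. A real client would not get that guarantee for free.

```python
import threading

def transform(value):
    # Hypothetical stand-in for the iterator's per-entry work.
    return value * 10

class ToyTable:
    """In-memory stand-in for a table; compaction swaps the data atomically."""
    def __init__(self, entries):
        self._lock = threading.Lock()
        self._entries = dict(entries)
        self.compacted = False

    def compact(self, fn):
        # T1's job: build the new data off to the side, then commit in one step.
        new_entries = {k: fn(v) for k, v in self._entries.items()}
        with self._lock:
            self._entries = new_entries
            self.compacted = True

    def read_after(self, last_key):
        # Return (key, value, already_transformed) for the next key, or None.
        with self._lock:
            for key in sorted(self._entries):
                if last_key is None or key > last_key:
                    return key, self._entries[key], self.compacted
        return None

def scan_with_handoff(table, on_entry=None):
    """T2's loop: apply the iterator at scan time until the compaction commits,
    then continue from the last key without it."""
    seen, last_key = [], None
    while True:
        item = table.read_after(last_key)
        if item is None:
            return seen
        key, value, already_transformed = item
        seen.append((key, value if already_transformed else transform(value)))
        last_key = key
        if on_entry is not None:
            on_entry(key)

table = ToyTable({"a": 1, "b": 2, "c": 3})
t1 = threading.Thread(target=table.compact, args=(transform,))
t1.start()
out = scan_with_handoff(table)  # plays the role of T2
t1.join()
```

Whenever the compaction happens to commit, each entry in out has passed through transform exactly once. The duplicated work noted above is visible too: entries read before the commit are transformed by both the scan and the compaction.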
Will think about other schemes. If only we could attach an
apply-once scan-time iterator that also persists its results to
an Accumulo table in a streaming fashion. Or on the flip side,
a one-time compaction iterator that streams results, such that
we could scan from them right away instead of needing to wait
for the entire compaction to complete.
Regards,
Dylan Hutchison
On Mon, Feb 23, 2015 at 12:48 PM, Adam Fuchs wrote:
Dylan,
The effect of a major compaction is never seen in queries
before the major compaction completes. At the end of the
major compaction there is a multi-phase commit which
eventually replaces all of the old files with the new file.
At that point the major compaction will have completely
processed the given tablet's data (although other tablets
may not be synchronized). For long-running non-isolated
queries (more than a second or so) the iterator tree is
occasionally rebuilt and re-seeked. When it is rebuilt it
will use whatever is the latest file set, which will include
the results of a completed major compaction.
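A toy model of this behavior, in plain Python with hypothetical names (ToyTablet, commit_compaction, scan), may help. It assumes the file set can be modeled as one immutable tuple that the commit replaces in a single step, and it rebuilds the scanner's view after every batch, whereas the real rebuild is occasional and time-based.

```python
class ToyTablet:
    """Hypothetical model: the tablet's file set is one immutable snapshot."""
    def __init__(self, entries):
        self.file_set = tuple(sorted(entries))  # sorted (key, value) pairs

    def commit_compaction(self, fn):
        # The swap is a single assignment: readers see the old set or the new
        # one, never a half-written mix.
        self.file_set = tuple((k, fn(v)) for k, v in self.file_set)

def scan(tablet, batch_size, between_batches=None):
    """Non-isolated scan that re-seeks on the latest file set for each batch."""
    out, last_key = [], None
    while True:
        snapshot = tablet.file_set  # view rebuilt on whatever is current
        batch = [(k, v) for k, v in snapshot
                 if last_key is None or k > last_key][:batch_size]
        if not batch:
            return out
        out.extend(batch)
        last_key = batch[-1][0]
        if between_batches is not None:
            between_batches()

tablet = ToyTablet({"a": 1, "b": 2, "c": 3, "d": 4}.items())
pending = [lambda v: v * 10]  # one compaction waiting to commit

def commit_once():
    # Commit the pending compaction the first time the scan pauses.
    if pending:
        tablet.commit_compaction(pending.pop())

result = scan(tablet, batch_size=2, between_batches=commit_once)
```

In this run the first batch comes from the old file set and the second from the compacted one, so a long-running non-isolated scan can return a mix of pre-compaction and post-compaction entries.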
In your case #1 that's a tricky guarantee to make across a
whole tablet, but it can be made one row at a time by using
an isolated iterator.
To make your case #2 work, you probably will have to
implement some higher-level logic to only start your query
after the major compaction has completed, using an external
mechanism to track the completion of your transformation.
Adam
On Mon, Feb 23, 2015 at 12:35 PM, Dylan Hutchison wrote:
Hello all,
When I initiate a full major compaction (with flushing
turned on) manually via the Accumulo API
<https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#compact(java.lang.String,%20org.apache.hadoop.io.Text,%20org.apache.hadoop.io.Text,%20java.util.List,%20boolean,%20boolean)>,
how does the table appear to
1. clients that started scanning the table before the
major compaction began;
2. clients that start scanning during the major compaction?
I'm interested in the case where there is an iterator
attached to the full major compaction that modifies
entries (respecting sorted order of entries).
The best possible answer for my use case, with case #2
more important than case #1 and *low latency* more
important than high throughput, is that
1. clients that started scanning before the compaction
began would not see entries altered by
the compaction-time iterator;
2. clients that start scanning during the major
compaction stream back entries as they finish
processing from the major compaction, such that the
clients /only/ see entries that have passed through
the compaction-time iterator.
How accurate are these descriptions? If #2 really were
as I would like it to be, then a scan on the range
(-inf,+inf) started after compaction would "monitor
compaction progress," such that the first entry batch
transmits to the scanner as soon as it is available from
the major compaction, and the scanner finishes (receives
all entries) exactly when the compaction finishes. If
this is not possible, I may make something to that
effect by calling the blocking version of compact().
Bonus: how does cancelCompaction()
<https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#cancelCompaction(java.lang.String)>
affect clients scanning in case #1 and case #2?
Regards,
Dylan Hutchison
--
www.cs.stevens.edu/~dhutchis <http://www.cs.stevens.edu/~dhutchis>