Hi Niclas,

1. By default, Accumulo configures a VersioningIterator on every table.  It
ensures that you see only the latest version of a particular entry, defined
as the entry with the highest timestamp.  Older versions of the same key
(row ID + family + qualifier + visibility) are dropped during compactions
and will eventually be deleted, even if you never issue an explicit delete.
To keep history, set the number of versions you want to keep to something
other than the default of 1 (see
https://accumulo.apache.org/1.9/accumulo_user_manual.html#_versioning_iterators_and_timestamps
).

2. Related to #1, Accumulo will update the value to the latest version of
the entry.  It does not compare the new value with the old one, so I believe
that if you keep writing the same entry with the same data over and over
again, you'll see every copy if you are keeping more than one version of
the same entry.  AFAIK there is no "put if absent" behavior without reading
for every write.  You can, of course, configure an existing iterator or
write your own to implement whatever logic you want for which versions to
keep of which columns in your data model.

3. The "Scanner" will return entries in sorted key order, so rows are never
interleaved.  Related to #1, it will only return the latest version of an
entry (by default).  If you are keeping more versions of the same entry,
then you would see the newest version first.  The "BatchScanner" is
multi-threaded and communicates with several tablet servers at once, so it
returns entries out of order.  One common pattern is to use the
WholeRowIterator when scanning.  This iterator serializes all entries with
the same row into one entry on the server side; you can then deserialize
that entry on the client side to view the entire contents of a row at once.
The order of the rows themselves is still undefined when using a
BatchScanner due to the multi-threaded nature of the scanner.
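
To make the ordering and versioning in #1 and #3 concrete, here is a small
toy model in plain Java.  It does not use the Accumulo API at all: the Entry
class, SCAN_ORDER and scan() are names I made up for illustration, and the
visibility part of the key is left out to keep it short.  It only mimics
what a plain Scanner returns when a table keeps at most maxVersions versions
of each row/family/qualifier.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: Entry is a hypothetical stand-in, not Accumulo's Key/Value.
final class VersionOrderSketch {

    static final class Entry {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Entry(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        // True when both entries are versions of the same row/family/qualifier.
        boolean sameColumn(Entry other) {
            return row.equals(other.row)
                && family.equals(other.family)
                && qualifier.equals(other.qualifier);
        }
    }

    // Row, family and qualifier ascending; timestamp descending (newest first).
    static final Comparator<Entry> SCAN_ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp);
        return c;
    };

    // Sort everything that was written, then keep at most maxVersions
    // of each row/family/qualifier, newest versions first.
    static List<Entry> scan(List<Entry> written, int maxVersions) {
        List<Entry> sorted = new ArrayList<>(written);
        sorted.sort(SCAN_ORDER);
        List<Entry> result = new ArrayList<>();
        Entry previous = null;
        int kept = 0;
        for (Entry e : sorted) {
            if (previous == null || !e.sameColumn(previous)) {
                kept = 0; // a new column starts, so reset the version counter
            }
            if (kept < maxVersions) {
                result.add(e);
                kept++;
            }
            previous = e;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> written = new ArrayList<>();
        written.add(new Entry("row2", "cf", "metadata", 10L, "static"));
        written.add(new Entry("row1", "cf", "metadata", 10L, "static"));
        written.add(new Entry("row1", "cf", "metadata", 30L, "static"));
        written.add(new Entry("row1", "cf", "state", 20L, "v1"));
        for (Entry e : scan(written, 1)) {
            System.out.println(e.row + " " + e.family + ":" + e.qualifier
                + " @" + e.timestamp + " = " + e.value);
        }
    }
}
```

With maxVersions set to 1, the rewritten "metadata" entry for row1 shows up
once, with the newest timestamp.  With a larger value, both copies come
back, newest first, even though their values are identical.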

Hope this helps!
--Adam

On Mon, Apr 13, 2020 at 12:57 AM Niclas Hedhman <nic...@apache.org> wrote:

> Hi,
> I am steaming new on Accumulo, but tasked to put it into what used to be
> Apache Polygene (now in Attic) as an entity store, one that keeps history.
>
> I have a couple of questions:
> 1. Assuming that I can guarantee that no one executes any explicit
> deletes, can I rely on the mutation sequences not disappearing over time?
>
> 2. As part of storing a row, I have a "metadata" qualifier that contains
> static information. But since I don't know whether the row exists without
> reading it first, then IIUC I will fill the "metadata" with the same
> information over and over again.... OR, does Accumulo realize that this is
> the same byte[] as before and won't update the value, alternatively
> creating a new Key, but pointing to the same Value?  I effectively want a
> "putIfAbsent()"
>
> 3. The Scanner can fetch multiple rows, constrained by CF and
> qualifier. I think that is quite clear. But what does the iterator()
> actually return? I presume that it is many key/value pairs, of ALL
> timestamped values. But what are the order guarantees here? I get the
> impression that within a row->cf->qualifier, the returned values are in
> timestamp order, newest first. And I think that within a row, I am
> guaranteed that the order is maintained, i.e. row -> cf -> qualifier (all
> ascending). But am I also guaranteed that the iterator is "done" with a row
> when the row has changed? Or can rows be interleaved in the iterator?
>
> Thanks in advance
> Niclas
>
