limitVersion = false would *not* set the default VersioningIterator,
effectively keeping every entry you write to Accumulo.  Sounds like it meets
your requirement of "versions never to be removed", though keep in mind
that every rewrite of your static "metadata" qualifier would also be kept,
since nothing would version or delete those entries either.
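
To make the difference concrete, here is a rough toy model in plain Java.
It does not use any Accumulo classes: VersioningSketch, Entry, scan and
sampleWrites are names made up for illustration, and column visibility is
left out for brevity.  It only mimics the sort order described in this
thread (row, family, qualifier ascending, newest timestamp first) plus a
"keep the newest N versions" filter, where N <= 0 stands in for a table
created with limitVersion = false.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: none of these types are Accumulo classes.
public class VersioningSketch {

    // Hypothetical stand-in for one key/value entry (visibility omitted).
    public static final class Entry {
        public final String row, family, qualifier, value;
        public final long timestamp;

        public Entry(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Row, family, qualifier ascending; timestamp descending (newest first).
    public static final Comparator<Entry> KEY_ORDER =
        Comparator.comparing((Entry e) -> e.row)
            .thenComparing((Entry e) -> e.family)
            .thenComparing((Entry e) -> e.qualifier)
            .thenComparing(Comparator.comparingLong((Entry e) -> e.timestamp).reversed());

    // Returns the entries a scan would show in this model.
    // maxVersions <= 0 models "no versioning iterator": every entry is kept.
    public static List<Entry> scan(List<Entry> written, int maxVersions) {
        List<Entry> sorted = new ArrayList<>(written);
        sorted.sort(KEY_ORDER);

        List<Entry> out = new ArrayList<>();
        Entry prev = null;
        int seen = 0; // versions seen so far for the current column
        for (Entry e : sorted) {
            boolean sameColumn = prev != null
                && prev.row.equals(e.row)
                && prev.family.equals(e.family)
                && prev.qualifier.equals(e.qualifier);
            seen = sameColumn ? seen + 1 : 1;
            if (maxVersions <= 0 || seen <= maxVersions) {
                out.add(e);
            }
            prev = e;
        }
        return out;
    }

    // One row whose static "metadata" column is rewritten on every update.
    public static List<Entry> sampleWrites() {
        List<Entry> writes = new ArrayList<>();
        writes.add(new Entry("e1", "f", "metadata", 1L, "static"));
        writes.add(new Entry("e1", "f", "state", 1L, "v1"));
        writes.add(new Entry("e1", "f", "metadata", 2L, "static"));
        writes.add(new Entry("e1", "f", "metadata", 3L, "static"));
        writes.add(new Entry("e1", "f", "state", 3L, "v2"));
        return writes;
    }

    public static void main(String[] args) {
        List<Entry> writes = sampleWrites();
        System.out.println("newest version only: " + scan(writes, 1).size() + " entries");
        System.out.println("no version limit:    " + scan(writes, 0).size() + " entries");
        for (Entry e : scan(writes, 0)) {
            System.out.println(e.row + " " + e.family + ":" + e.qualifier
                + " @" + e.timestamp + " = " + e.value);
        }
    }
}
```

In this model the unlimited scan returns every copy of the identical
"metadata" value alongside the real history of "state", which is the
trade-off mentioned above.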

On Mon, Apr 13, 2020 at 8:47 PM Niclas Hedhman <nic...@apache.org> wrote:

> Ah! I had some misunderstandings implanted in me, and good to get
> corrected.
>
> For
>
> connector.tableOperations().create(String tableName, boolean limitVersion);
>
>
> Will limitVersion=false disable versioning completely and I will always
> only have one version, or will it have a "no limit" and "no removal" policy
> of versions?
>
> Well, to be clear, I am looking for "versions never to be removed", a
> requirement that made me smile and remember "Accumulo can do that
> automatically", rather than implement that at a higher level.
>
> Thanks
>
> On Tue, Apr 14, 2020 at 12:55 AM Adam J. Shook <adamjsh...@gmail.com>
> wrote:
>
>> Hi Niclas,
>>
>> 1. Accumulo uses a VersioningIterator for all tables which ensures that
>> you see the latest version of a particular entry, defined as the entry that
>> has the highest value for the timestamp.  Older versions of the same key
>> (row ID + family + qualifier + visibility) are compacted away by Accumulo
>> and will eventually be deleted.  You can set the number of versions you
>> want to keep to something other than the default of 1 (see
>> https://accumulo.apache.org/1.9/accumulo_user_manual.html#_versioning_iterators_and_timestamps
>> ).
>>
>> 2. Related to #1, Accumulo will update the value to the latest version of
>> the entry.  I believe if you keep writing the same entry with the same data
>> over and over again, you'll see them if you are keeping more than one
>> version of the same entry.  AFAIK there is no "put if absent" behavior
>> without reading for every write.  You can, of course, configure an existing
>> iterator or write your own to achieve whatever logic you want as far as
>> what versions to keep of what columns of your data model.
>>
>> 3. The "Scanner" will return entries in order.  Related to #1, it will
>> only return the latest version of an entry (by default).  If you are
>> keeping more versions of the same entry, then you would see the newest
>> entry first.  The "BatchScanner" is multi-threaded and communicates to
>> several tablets at once, returning entries out of order.  One common
>> pattern is to use the WholeRowIterator when scanning.  This iterator
>> serializes all entries with the same row into one entry on the server side,
>> then you can deserialize the row on the client side to view the entire
>> contents of a row at once.  The order of the rows themselves is still
>> undefined when using a BatchScanner due to the multi-threaded nature of the
>> scanner.
>>
>> Hope this helps!
>> --Adam
>>
>> On Mon, Apr 13, 2020 at 12:57 AM Niclas Hedhman <nic...@apache.org>
>> wrote:
>>
>>> Hi,
>>> I am brand new to Accumulo, but tasked to put it into what used to be
>>> Apache Polygene (now in Attic) as an entity store, one that keeps history.
>>>
>>> I have a couple of questions:
>>> 1. Assuming that I can guarantee that no one executes any explicit
>>> deletes, can I rely on the mutation sequences not disappearing over time?
>>>
>>> 2. Part of storing a row, I have a "metadata" qualifier, that contains
>>> static information. But since I don't know whether the row exists without
>>> reading it first, then IIUC I will fill the "metadata" with the same
>>> information over and over again.... OR, does Accumulo realize that this is
>>> the same byte[] as before and won't update the value, alternatively
>>> creating a new Key, but pointing to the same Value?  I effectively want a
>>> "putIfAbsent()"
>>>
>>> 3. The Scanner can fetch multiple rows, and constrained by CF and
>>> qualifier. I think that is quite clear. But what does the iterator()
>>> actually return? I presume that it is many key/value pairs, of ALL
>>> timestamped values. But what are the ordering guarantees here? I get the
>>> impression that within a row->cf->qualifier, the returned values are in
>>> timestamp order, newest first. And I think that within a row, I am
>>> guaranteed that the order is maintained, i.e. row -> cf -> qualifier (all
>>> ascending). But am I also guaranteed that the iterator is "done" with a row
>>> when the row has changed? Or can rows be interleaved in the iterator?
>>>
>>> Thanks in advance
>>> Niclas
>>>
>>
