The `du` command should report sizes in bytes. Keep in mind that Accumulo
compresses the data in its files, so the total can be much smaller than the
raw size of what you wrote. If the number doesn't match what you see for the
*.rf files in Hadoop, there may be a bug. Please let us know if you find this
to be the case.
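As a rough illustration of why a byte count can look "too small", here is a stdlib-only Java sketch that gzips a batch of hypothetical, highly repetitive event-store rows and compares raw and compressed sizes. It assumes gzip-style compression (I believe `gz` is the default `table.file.compress.type`, but check your table config). RFile's real block layout, indexes and key encoding are not modelled, and the sample rows are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CompressionEstimate {

    // Gzip-compress the input in memory and return the compressed size in bytes.
    static int gzipSize(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return out.size();
    }

    // Hypothetical sample data: many rows that differ only in their entity id.
    static byte[] sampleRows(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("entity-").append(i)
              .append(" event:created metadata {\"type\":\"Order\",\"version\":1}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleRows(10_000);
        int compressed = gzipSize(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes (%.1f%%)%n",
                raw.length, compressed, 100.0 * compressed / raw.length);
    }
}
```

With data this repetitive the compressed size is a small fraction of the raw size, which is the same kind of effect that can make `du` look small next to what you think you wrote.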

On Tue, Apr 14, 2020 at 10:30 PM Niclas Hedhman wrote:
>
> Yes, a bit of experimentation and I figured that out.
>
> As for the "putIfAbsent": I can actually figure that out from the data being
> written in this case, since it is effectively an event store and every row
> starts with a "created" event.
>
> One more small question:
> there is a "du" command; does it really report storage space in bytes, or is
> it kB? The number seems too small for bytes, but if it is kB, then it is more
> than the physical disk usage reported by HDFS...
>
> Cheers
> Niclas
>
> On Tue, Apr 14, 2020 at 9:49 PM Adam J. Shook wrote:
>>
>> limitVersion = false would *not* set the default VersioningIterator,
>> effectively keeping every entry you write to Accumulo. Sounds like it meets
>> your requirement of "versions never to be removed", though keep in mind that
>> every copy of your static "metadata" qualifier would also be kept forever.
>>
>> On Mon, Apr 13, 2020 at 8:47 PM Niclas Hedhman wrote:
>>>
>>> Ah! I had some misunderstandings implanted in me, and it's good to have them corrected.
>>>
>>> For
>>>
>>> connector.tableOperations().create(String tableName, boolean limitVersion);
>>>
>>>
>>> Will limitVersion=false disable versioning completely, so that I only ever
>>> have one version, or will it apply a "no limit" and "no removal" policy to
>>> versions?
>>>
>>> Well, to be clear, I am looking for "versions never to be removed", a
>>> requirement that made me smile and remember "Accumulo can do that
>>> automatically", rather than implementing it at a higher level.
>>>
>>> Thanks
>>>
>>> On Tue, Apr 14, 2020 at 12:55 AM Adam J. Shook wrote:
>>>>
>>>> Hi Niclas,
>>>>
>>>> 1. By default, Accumulo configures a VersioningIterator on tables, which
>>>> ensures that you see the latest version of a particular entry, defined as
>>>> the entry with the highest timestamp. Older versions of the same key (row
>>>> ID + family + qualifier + visibility) are hidden when you scan and are
>>>> eventually removed by Accumulo during compactions. You can set the number
>>>> of versions you want to keep to something other than the default of 1 (see
>>>> https://accumulo.apache.org/1.9/accumulo_user_manual.html#_versioning_iterators_and_timestamps).
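A toy, in-memory model of the behaviour in point 1, in plain Java with no Accumulo dependency: entries are sorted the way Accumulo orders keys (row, family, qualifier, visibility ascending, then timestamp newest first), and only the newest `maxVersions` per key are kept. The class and method names are hypothetical. Passing `Integer.MAX_VALUE` stands in for a table with no VersioningIterator (e.g. `limitVersion = false`). Real keys are byte arrays and also carry a delete flag, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class VersioningModel {

    // Hypothetical stand-in for an Accumulo key/value entry (real keys are byte arrays).
    static final class Entry {
        final String row, family, qualifier, visibility, value;
        final long timestamp;

        Entry(String row, String family, String qualifier, String visibility,
              long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
            this.value = value;
        }

        // Entries are versions of the same key when everything except the timestamp matches.
        boolean sameKey(Entry o) {
            return row.equals(o.row) && family.equals(o.family)
                    && qualifier.equals(o.qualifier) && visibility.equals(o.visibility);
        }
    }

    // Row, family, qualifier, visibility ascending; timestamp descending (newest first).
    static final Comparator<Entry> KEY_ORDER = Comparator.comparing((Entry e) -> e.row)
            .thenComparing(e -> e.family)
            .thenComparing(e -> e.qualifier)
            .thenComparing(e -> e.visibility)
            .thenComparing(Comparator.comparingLong((Entry e) -> e.timestamp).reversed());

    // What a scan would return if at most maxVersions newest versions per key are kept.
    static List<Entry> scan(List<Entry> written, int maxVersions) {
        List<Entry> sorted = new ArrayList<>(written);
        sorted.sort(KEY_ORDER);
        List<Entry> visible = new ArrayList<>();
        Entry previous = null;
        int versionsSeen = 0;
        for (Entry e : sorted) {
            versionsSeen = (previous != null && previous.sameKey(e)) ? versionsSeen + 1 : 1;
            if (versionsSeen <= maxVersions) {
                visible.add(e);
            }
            previous = e;
        }
        return visible;
    }
}
```

With `maxVersions = 1` only the newest "metadata" value survives; with `Integer.MAX_VALUE` every write stays visible, which is the "versions never removed" behaviour discussed in this thread.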
>>>>
>>>> 2. Related to #1, Accumulo will update the value to the latest version of
>>>> the entry. I believe if you keep writing the same entry with the same data
>>>> over and over again, you'll see every copy if you are keeping more than
>>>> one version of the same entry. AFAIK there is no "put if absent" behavior
>>>> without reading before every write. You can, of course, configure an
>>>> existing iterator or write your own to achieve whatever logic you want as
>>>> far as which versions to keep of which columns of your data model.
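A minimal sketch of the read-before-write pattern from point 2, using a hypothetical in-memory table that keeps every version (no VersioningIterator) so the difference is visible: blind writes of the same "metadata" value pile up as extra versions, while checking first writes it only once. All names are made up; in real code the check would be a Scanner lookup of that row and column, and a separate read followed by a write is not atomic across clients.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadBeforeWrite {

    // Hypothetical toy table: row -> column -> every value ever written (no version limit).
    private final Map<String, Map<String, List<String>>> table = new HashMap<>();

    // Blind write: always appends a new version, even if the bytes are identical.
    void put(String row, String column, String value) {
        table.computeIfAbsent(row, r -> new HashMap<>())
             .computeIfAbsent(column, c -> new ArrayList<>())
             .add(value);
    }

    // Client-side "put if absent": read first, write only if the column has no value yet.
    // Not atomic: two clients can both see "absent" and both write.
    boolean putIfAbsent(String row, String column, String value) {
        if (versionCount(row, column) > 0) {
            return false;
        }
        put(row, column, value);
        return true;
    }

    // Number of stored versions for one row/column (the "read" in read-before-write).
    int versionCount(String row, String column) {
        List<String> versions = table.getOrDefault(row, Map.of()).get(column);
        return versions == null ? 0 : versions.size();
    }
}
```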
>>>>
>>>> 3. The "Scanner" will return entries in sorted order. Related to #1, it
>>>> will only return the latest version of an entry (by default). If you are
>>>> keeping more versions of the same entry, then you will see the newest
>>>> entry first. The "BatchScanner" is multi-threaded and communicates with
>>>> several tablets at once, returning entries out of order. One common
>>>> pattern is to use the WholeRowIterator when scanning. This iterator
>>>> serializes all entries with the same row into one entry on the server
>>>> side, then you can deserialize the row on the client side to view the
>>>> entire contents of a row at once. The order of the rows themselves is
>>>> still undefined when using a BatchScanner due to the multi-threaded nature
>>>> of the scanner.
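The WholeRowIterator idea from point 3, sketched as a stdlib-only toy: the "server side" packs all column/value pairs of one row into a single byte[] value, and the "client side" unpacks it again. The length-prefixed format and the names are assumptions for illustration, not the iterator's actual wire format; with the real iterator you would decode on the client with `WholeRowIterator.decodeRow(Key, Value)`. This toy keeps one value per column, so it does not model multiple versions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class RowPacking {

    // "Server side" (toy): pack every column -> value pair of one row into one byte[].
    // writeUTF limits each string to 64 KB of modified UTF-8, fine for a sketch.
    static byte[] encodeRow(Map<String, String> columns) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(columns.size());      // number of entries in the row
            for (Map.Entry<String, String> e : columns.entrySet()) {
                out.writeUTF(e.getKey());      // length-prefixed column name
                out.writeUTF(e.getValue());    // length-prefixed value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    // "Client side" (toy): unpack the single value back into the row's entries, in order.
    static Map<String, String> decodeRow(byte[] packed) {
        Map<String, String> columns = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                String column = in.readUTF();
                columns.put(column, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return columns;
    }
}
```

Because each row travels as one entry, its columns cannot be interleaved with another row's, even though a BatchScanner may still deliver the rows themselves in any order.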
>>>>
>>>> Hope this helps!
>>>> --Adam
>>>>
>>>> On Mon, Apr 13, 2020 at 12:57 AM Niclas Hedhman wrote:
>>>>>
>>>>> Hi,
>>>>> I am brand new to Accumulo, but tasked to put it into what used to be
>>>>> Apache Polygene (now in the Attic) as an entity store, one that keeps history.
>>>>>
>>>>> I have a couple of questions:
>>>>> 1. Assuming that I can guarantee that no one executes any explicit 
>>>>> deletes, can I rely on the mutation sequences not disappearing over time?
>>>>>
>>>>> 2. As part of storing a row, I have a "metadata" qualifier that contains
>>>>> static information. But since I don't know whether the row exists without
>>>>> reading it first, IIUC I will fill the "metadata" with the same
>>>>> information over and over again... OR, does Accumulo realize that this
>>>>> is the same byte[] as before and skip updating the value, or alternatively
>>>>> create a new Key that points to the same Value? I effectively want a
>>>>> "putIfAbsent()".
>>>>>
>>>>> 3. The Scanner can fetch multiple rows, constrained by CF and
>>>>> qualifier. I think that is quite clear. But what does the iterator()
>>>>> actually return? I presume that it is many key/value pairs, covering ALL
>>>>> timestamped values. But what are the ordering guarantees here? I get the
>>>>> impression that within a row->cf->qualifier, the returned values are in
>>>>> timestamp order, newest first. And I think that within a row, I am
>>>>> guaranteed that the order is maintained, i.e. row -> cf -> qualifier (all
>>>>> ascending). But am I also guaranteed that the iterator is "done" with a
>>>>> row once the row has changed? Or can rows be interleaved in the iterator?
>>>>>
>>>>> Thanks in advance
>>>>> Niclas
