The `du` command should report sizes in bytes. Keep in mind that Accumulo compresses the data in its files. If the number doesn't match the total size of the *.rf files you see in HDFS, there may be a bug; please let us know if that turns out to be the case.
On Tue, Apr 14, 2020 at 10:30 PM Niclas Hedhman wrote:
>
> Yes, a bit of experimentation and I figured that out.
>
> As for the "putIfAbsent"; I can actually figure that out from the data being
> written in this case, effectively an event store, and all rows start with a
> "created" event.
>
> One more small question;
> there is a "du" command, does it really report "bytes" or is it kB, of
> storage space needed? The number seems too small for bytes, and if in kB then
> it is over the hdfs physical disk usage...
>
> Cheers
> Niclas
>
> On Tue, Apr 14, 2020 at 9:49 PM Adam J. Shook wrote:
>>
>> limitVersion = false would *not* set the default VersioningIterator,
>> effectively keeping every entry you write to Accumulo. Sounds like it hits
>> your requirement of "versions never to be removed", though keep in mind that
>> your static "metadata" qualifier would also never be versioned/deleted.
>>
>> On Mon, Apr 13, 2020 at 8:47 PM Niclas Hedhman wrote:
>>>
>>> Ah! I had some misunderstandings implanted in me, and good to get corrected.
>>>
>>> For
>>>
>>> connector.tableOperations().create(String tableName, boolean limitVersion);
>>>
>>>
>>> Will limitVersion=false disable versioning completely and I will always
>>> only have one version, or will it have a "no limit" and "no removal" policy
>>> of versions?
>>>
>>> Well, to be clear, I am looking for "versions never to be removed", a
>>> requirement that made me smile and remember "Accumulo can do that
>>> automatically", rather than implement that at a higher level.
>>>
>>> Thanks
>>>
>>> On Tue, Apr 14, 2020 at 12:55 AM Adam J. Shook wrote:
>>>>
>>>> Hi Niclas,
>>>>
>>>> 1. Accumulo uses a VersioningIterator for all tables which ensures that
>>>> you see the latest version of a particular entry, defined as the entry
>>>> that has the highest value for the timestamp. Older versions of the same
>>>> key (row ID + family + qualifier + visibility) are compacted away by
>>>> Accumulo and will eventually be deleted. You can set the number of
>>>> versions you want to keep to something other than the default of 1 (see
>>>> https://accumulo.apache.org/1.9/accumulo_user_manual.html#_versioning_iterators_and_timestamps).
>>>>
>>>> 2. Related to #1, Accumulo will update the value to the latest version of
>>>> the entry. I believe if you keep writing the same entry with the same data
>>>> over and over again, you'll see them if you are keeping more than one
>>>> version of the same entry. AFAIK there is no "put if absent" behavior
>>>> without reading for every write. You can, of course, configure an
>>>> existing iterator or write your own to achieve whatever logic you want as
>>>> far as what versions to keep of what columns of your data model.
>>>>
>>>> 3. The "Scanner" will return entries in order. Related to #1, it will
>>>> only return the latest version of an entry (by default). If you are
>>>> keeping more versions of the same entry, then you would see the newest
>>>> entry first. The "BatchScanner" is multi-threaded and communicates to
>>>> several tablets at once, returning entries out of order. One common
>>>> pattern is to use the WholeRowIterator when scanning. This iterator
>>>> serializes all entries with the same row into one entry on the server
>>>> side, then you can deserialize the row on the client side to view the
>>>> entire contents of a row at once. The order of the rows themselves is
>>>> still undefined when using a BatchScanner due to the multi-threaded nature
>>>> of the scanner.
>>>>
>>>> Hope this helps!
>>>> --Adam
>>>>
>>>> On Mon, Apr 13, 2020 at 12:57 AM Niclas Hedhman wrote:
>>>>>
>>>>> Hi,
>>>>> I am brand new to Accumulo, but tasked to put it into what used to be
>>>>> Apache Polygene (now in Attic) as an entity store, one that keeps history.
>>>>>
>>>>> I have a couple of questions;
>>>>> 1. Assuming that I can guarantee that no one executes any explicit
>>>>> deletes, can I rely on the mutation sequences not disappearing over time?
>>>>>
>>>>> 2. As part of storing a row, I have a "metadata" qualifier that contains
>>>>> static information. But since I don't know whether the row exists without
>>>>> reading it first, then IIUC I will fill the "metadata" with the same
>>>>> information over and over again.... OR, does Accumulo realize that this
>>>>> is the same byte[] as before and won't update the value, alternatively
>>>>> creating a new Key, but pointing to the same Value? I effectively want a
>>>>> "putIfAbsent()"
>>>>>
>>>>> 3. The Scanner can fetch multiple rows, and be constrained by CF and
>>>>> qualifier. I think that is quite clear. But what does the iterator()
>>>>> actually return? I presume that it is many key/value pairs, of ALL
>>>>> timestamped values. But what are the order guarantees here? I get the
>>>>> impression that within a row->cf->qualifier, the returned values are in
>>>>> timestamp order, newest first. And I think that within a row, I am
>>>>> guaranteed that the order is maintained, i.e. row -> cf -> qualifier (all
>>>>> ascending). But am I also guaranteed that the iterator is "done" with a
>>>>> row when the row has changed? Or can rows be interleaved in the iterator?
>>>>>
>>>>> Thanks in advance
>>>>> Niclas
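
For anyone reading this thread later, the ordering and versioning behaviour discussed above (row, family and qualifier ascending, newest timestamp first, and an optional cap on versions per column) can be modelled in a few lines of plain Java. This is only a toy sketch and does not use the Accumulo API: `KeyOrderSketch`, `Entry`, `KEY_ORDER` and `scan` are hypothetical names made up for illustration, visibility is left out of the key, and `String.compareTo` stands in for Accumulo's byte-wise comparison (the two agree for plain ASCII values, which is all the sketch assumes).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of the sort order and version limiting described in the thread. */
final class KeyOrderSketch {

    /** Hypothetical stand-in for a key/value pair (visibility omitted). */
    static final class Entry {
        final String row, family, qualifier, value;
        final long timestamp;

        Entry(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        /** True when both entries share row, family and qualifier. */
        boolean sameColumn(Entry other) {
            return row.equals(other.row)
                && family.equals(other.family)
                && qualifier.equals(other.qualifier);
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " @" + timestamp + " -> " + value;
        }
    }

    /** Row, family, qualifier ascending; timestamp descending (newest first). */
    static final Comparator<Entry> KEY_ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp); // reversed on purpose
        return c;
    };

    /** Sorts the entries and keeps at most maxVersions of each column. */
    static List<Entry> scan(List<Entry> written, int maxVersions) {
        List<Entry> sorted = new ArrayList<>(written);
        sorted.sort(KEY_ORDER);
        List<Entry> out = new ArrayList<>();
        Entry prev = null;
        int seen = 0; // versions seen so far for the current column
        for (Entry e : sorted) {
            seen = (prev != null && prev.sameColumn(e)) ? seen + 1 : 1;
            if (seen <= maxVersions) {
                out.add(e);
            }
            prev = e;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> written = Arrays.asList(
            new Entry("row2", "event", "created", 10L, "c2"),
            new Entry("row1", "event", "updated", 30L, "u-new"),
            new Entry("row1", "event", "created", 10L, "c1"),
            new Entry("row1", "event", "updated", 20L, "u-old"));

        System.out.println("All versions kept:");
        for (Entry e : scan(written, Integer.MAX_VALUE)) System.out.println("  " + e);

        System.out.println("Only the newest version kept:");
        for (Entry e : scan(written, 1)) System.out.println("  " + e);
    }
}
```

In this model, `scan(written, Integer.MAX_VALUE)` plays the role of a table with no version limit, and `scan(written, 1)` plays the role of the default single-version behaviour. Because the list is fully sorted, all entries of one row come out together before the next row starts, which corresponds to the single-threaded Scanner case; it says nothing about the BatchScanner, whose row order is undefined.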
