Re: visibility expression & column compression

Josh Elser Mon, 24 Aug 2015 12:37:32 -0700

Visibility labels are not replaced with any other types of identifiers which means that, considering nothing else, a visibility label which has 20 characters will take up more space than one that only has 2 characters. This is a conscious decision to make sure it is completely obvious what the label on some data is without an external lookup table.

Accumulo uses two strategies to reduce the size of data on disk: run-length encoding and a compression algorithm. The run-length encoding is used to prevent common prefixes in sequential Keys from being stored multiple times. For example, given the following Keys:


row1 cf:cq []
row2 cf:cq []

the RLE would prevent "row" from being stored a second time. Families and qualifiers would only be replaced with a back-reference if there is a common Key-prefix that extends into the family or qualifier.
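To make the idea concrete, here is a minimal sketch of eliding a shared prefix between consecutive Keys. It is only an illustration: it treats each Key as one flat byte string, and the function names (common_prefix_len, encode, decode) are made up for this example. It is not Accumulo's actual on-disk format.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Count the leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode(keys):
    """Store each key as (bytes shared with the previous key, remaining suffix)."""
    prev = b""
    out = []
    for key in keys:
        shared = common_prefix_len(prev, key)
        out.append((shared, key[shared:]))
        prev = key
    return out


def decode(entries):
    """Rebuild the full keys from the (shared, suffix) pairs."""
    prev = b""
    keys = []
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = [b"row1 cf:cq", b"row2 cf:cq"]
encoded = encode(keys)
print(encoded)  # the second entry carries only the suffix after "row"
```

Note that in this sketch the "cf:cq" part of the second Key is still written out in full, because the shared prefix ends at "row". That matches the point above about families and qualifiers.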

A compression algorithm, GZ by default, is then applied to the result of the encoding. Snappy is another common compression algorithm used by Accumulo instances.
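If you want a rough feel for how much a general-purpose compressor recovers from repeated long labels, something like the following sketch may help. The assumptions are mine: Python's zlib stands in for the GZ codec, the entries are synthetic text lines rather than real RFile blocks, and build_entries and sizes are hypothetical helpers. Measure on your own data before drawing conclusions.

```python
import zlib


def build_entries(label: str, count: int = 1000) -> bytes:
    """Build synthetic entries that differ only in the row and share one label."""
    lines = [f"row{i:04d} cf:cq [{label}]\n" for i in range(count)]
    return "".join(lines).encode("utf-8")


def sizes(label: str) -> tuple:
    """Return (raw size, zlib-compressed size) in bytes for the given label."""
    raw = build_entries(label)
    return len(raw), len(zlib.compress(raw))


long_raw, long_packed = sizes("VeryLargeLargeLarge&AlsoLargeLargeLarge")
short_raw, short_packed = sizes("A&B")

print("long label :", long_raw, "raw ->", long_packed, "compressed")
print("short label:", short_raw, "raw ->", short_packed, "compressed")
```

Comparing the two raw sizes with the two compressed sizes gives an estimate of how much of the label overhead survives compression for this particular synthetic layout.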


- Josh

[email protected] wrote:

Hi there,

My question is how Accumulo compression works in regards to visibility
labels.

Is there any difference between “VeryLargeLargeLarge &
AlsoLargeLargeLarge” and “A&B” expressions? Will it be internally
compiled to a low data consuming structure?

Same question applies to column and qualifier names. Is there any
difference?

The reason for this question is simple – we are trying to find out what
would be the data utilization overhead for different approaches.

Regards

Roman

