Visibility labels are not replaced with any other type of identifier, which means that, all else being equal, a 20-character visibility label takes up more space than a 2-character one. This is a conscious decision, so that the label on a piece of data is completely obvious without an external lookup table.

Accumulo uses two strategies to reduce the size of data on disk: run-length encoding (RLE) and a compression algorithm. The run-length encoding prevents prefixes shared by sequential Keys from being stored multiple times. For example, given the following Keys:

row1 cf:cq []
row2 cf:cq []

the RLE would prevent "row" from being stored a second time. Families and qualifiers would only be replaced with a back-reference if there is a common Key-prefix that extends into the family or qualifier.
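
To make the idea concrete, here is a minimal sketch of dropping the prefix shared with the previous Key. It is only an illustration in Python, not Accumulo's actual on-disk format: the function names are made up for this example, and it works on the row portion alone rather than on whole Keys.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(rows):
    """Hypothetical encoder: store (shared_length, suffix) for each row.

    shared_length is how many leading bytes are reused from the previous
    row, so only the differing suffix is written out.
    """
    encoded = []
    prev = b""
    for row in rows:
        shared = common_prefix_len(prev, row)
        encoded.append((shared, row[shared:]))
        prev = row
    return encoded


def prefix_decode(encoded):
    """Rebuild the original rows from (shared_length, suffix) pairs."""
    rows = []
    prev = b""
    for shared, suffix in encoded:
        row = prev[:shared] + suffix
        rows.append(row)
        prev = row
    return rows


rows = [b"row1", b"row2"]
encoded = prefix_encode(rows)
print(encoded)  # the second entry reuses 3 bytes ("row") and stores only b"2"
```

Because only the previous Key is consulted, this helps when the data is sorted, which is how Accumulo stores Keys.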

A compression algorithm, GZ by default, is then applied to the result of the encoding. Snappy is another common compression algorithm used by Accumulo instances.
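
If you want a rough feel for how much a general-purpose compressor helps with repeated labels, something like the sketch below can be tried. It is an assumption-laden illustration: Python's zlib (DEFLATE) stands in for the GZ codec, the text layout of each entry is invented for the example, and the numbers it prints say nothing definitive about real RFile sizes.

```python
import zlib


def sample_block(label: bytes, count: int = 1000) -> bytes:
    """Build a made-up block of entries that all carry the same label.

    The "row cf:cq [label]" layout is only for illustration; it is not
    how Accumulo serializes Keys.
    """
    lines = [b"row%06d cf:cq [%s]\n" % (i, label) for i in range(count)]
    return b"".join(lines)


short_raw = sample_block(b"A&B")
long_raw = sample_block(b"VeryLargeLargeLarge&AlsoLargeLargeLarge")

# DEFLATE at the default level, standing in for the GZ codec.
short_gz = zlib.compress(short_raw)
long_gz = zlib.compress(long_raw)

print("short label:", len(short_raw), "bytes raw,", len(short_gz), "compressed")
print("long label: ", len(long_raw), "bytes raw,", len(long_gz), "compressed")
```

Running this against a sample of your own data is the most reliable way to measure the overhead of each approach.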

- Josh

[email protected] wrote:
Hi there,

My question is how Accumulo compression works in regards to visibility
labels.

Is there any difference between “VeryLargeLargeLarge &
AlsoLargeLargeLarge” and “A&B” expressions? Will it be internally
compiled to a low data consuming structure?

Same question applies to column and qualifier names. Is there any
difference?

The reason for this question is simple – we are trying to find out what
would be the data utilization overhead for different approaches.

Regards

Roman
