Resending (see below) due to a brief ASF email outage.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Mon, Aug 24, 2015 at 2:54 PM, Christopher <[email protected]> wrote:
> Accumulo has a few kinds of compression inside RFiles that apply to
> visibility expressions.
>
> First, there's the block compression in the file. This is going to be
> gzip, or another supported compression type. But before that, we have
> a couple of ways to reduce the size of the data written:
>
> 1. If the visibility expression of one key is exactly the same as that
> of the key which immediately preceded it, VE(K) == VE(K-1), the RFile
> writer stores a flag which instructs the reader to re-use the previous
> visibility expression, in lieu of the visibility expression itself.
>
> 2. In the case of non-exact matches, the RFile writer stores the
> number of bytes the expression shares with the previous key's
> expression as a common prefix, and then the remaining bytes, which differ.
>
> (Note: these optimizations actually apply to the row, colfam, and
> colqual too, but you specifically asked about colvis.)
>
> What we don't do is create a lookup table or anything like that. We
> think it's really important that the visibility be stored with the
> data it protects, so that the visibility is always there for
> determining authorization to read it. So, we don't do anything beyond
> the few small optimizations during serialization, and certainly
> nothing that would separate the data too far from its visibility
> expression.
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Mon, Aug 24, 2015 at 12:58 PM, [email protected]
> <[email protected]> wrote:
>> Hi there,
>>
>> My question is how Accumulo compression works with regard to visibility
>> labels.
>>
>> Is there any difference between “VeryLargeLargeLarge & AlsoLargeLargeLarge”
>> and “A&B” expressions? Will it be internally compiled to a more compact
>> structure?
>>
>> The same question applies to column and qualifier names. Is there any
>> difference?
>>
>> The reason for this question is simple: we are trying to find out what
>> the data utilization overhead would be for different approaches.
>>
>> Regards
>>
>> Roman
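To make the two serialization optimizations concrete, here is a minimal Python sketch. It is a toy model under stated assumptions and is not Accumulo's actual RFile format. The record tuples and the helper names (encode, decode, stored_bytes) are hypothetical. The sketch also ignores two things a real writer deals with: the flag and length bytes it stores, and the block compression (gzip, etc.) applied afterwards.

```python
"""Toy model of the per-field optimizations described in the thread.

Hypothetical record format (NOT Accumulo's real RFile layout):
  ("same",)               -> reuse the previous value verbatim
  ("prefix", n, suffix)   -> first n bytes of the previous value + suffix
"""
from typing import List, Optional, Tuple

Record = Tuple  # ("same",) or ("prefix", int, bytes)


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode(values: List[bytes]) -> List[Record]:
    """Encode each value relative to the one immediately before it."""
    out: List[Record] = []
    prev: Optional[bytes] = None
    for v in values:
        if prev is not None and v == prev:
            # Optimization 1: exact repeat, store only a "same" marker.
            out.append(("same",))
        else:
            # Optimization 2: store shared-prefix length plus differing bytes.
            n = common_prefix_len(prev, v) if prev is not None else 0
            out.append(("prefix", n, v[n:]))
        prev = v
    return out


def decode(records: List[Record]) -> List[bytes]:
    """Rebuild the original values from the relative records."""
    out: List[bytes] = []
    prev = b""
    for rec in records:
        if rec[0] == "same":
            value = prev
        else:
            _, n, suffix = rec
            value = prev[:n] + suffix
        out.append(value)
        prev = value
    return out


def stored_bytes(records: List[Record]) -> int:
    """Rough payload size: counts only suffix bytes, ignoring flags/lengths."""
    return sum(len(r[2]) for r in records if r[0] == "prefix")


# Usage: three identical long labels followed by one that shares a prefix.
long_labels = [b"VeryLargeLargeLarge&AlsoLargeLargeLarge"] * 3 + [
    b"VeryLargeLargeLarge&Other"
]
short_labels = [b"A&B"] * 3 + [b"A&C"]
print(stored_bytes(encode(long_labels)))   # 39 + 5 = 44
print(stored_bytes(encode(short_labels)))  # 3 + 1 = 4
```

With these inputs, the long labels cost 44 stored bytes instead of 156 (4 × 39), and the short labels cost 4 instead of 12. In this toy model, repeats are nearly free. Each distinct label still pays for the bytes it does not share with its predecessor, so shorter expressions remain cheaper before block compression is applied.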
