Thanks Eric.  That helps.  With regard to the repeating-key piece, does this 
only apply to successive cells?  In other words, if I use the same cf in 
every row of a table, does that cf get written out again for each row, or does 
the cf-repetition encoding work across rows?  I hope that makes sense.

Thanks,

Tejay

From: Eric Newton [mailto:[email protected]]
Sent: Monday, June 25, 2012 4:46 PM
To: [email protected]
Subject: EXTERNAL: Re: RFile details

Here's my high-level understanding.  Let me know which aspect you would like to 
know more about.

RFile is built on top of BCFile, so you would need to dig up documentation on 
that. Most of the compression is performed at that layer.

However, RFile uses a few bits in each key/value to encode any row, cf, cq, or 
cv information that repeats from one key to the next.  This is helpful when a 
file contains just one row, or when most of the data has the same visibility.
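A toy illustration of relative-key encoding, with hypothetical flag names and a Python dict standing in for the serialized bytes (this is not RFile's actual wire format): each key is compared with the key written just before it, one flag bit is set for every field that matches, and only the fields that differ are stored.

```python
from typing import NamedTuple, Optional


class Key(NamedTuple):
    row: str
    cf: str
    cq: str
    cv: str
    ts: int


# Hypothetical flag bits, one per field that may repeat from the previous key.
ROW_SAME, CF_SAME, CQ_SAME, CV_SAME, TS_SAME = 1, 2, 4, 8, 16
FIELDS = (("row", ROW_SAME), ("cf", CF_SAME), ("cq", CQ_SAME),
          ("cv", CV_SAME), ("ts", TS_SAME))


def encode(keys):
    """Encode each key relative to the previous one:
    a flag byte plus only the fields that changed."""
    out = []
    prev: Optional[Key] = None
    for key in keys:
        flags = 0
        changed = {}
        for name, bit in FIELDS:
            value = getattr(key, name)
            if prev is not None and getattr(prev, name) == value:
                flags |= bit  # field repeats: store only the flag
            else:
                changed[name] = value  # field differs: store the value
        out.append((flags, changed))
        prev = key
    return out


def decode(encoded):
    """Rebuild full keys by copying flagged fields from the previous key."""
    keys = []
    prev: Optional[Key] = None
    for flags, changed in encoded:
        fields = {name: getattr(prev, name) if flags & bit else changed[name]
                  for name, bit in FIELDS}
        prev = Key(**fields)
        keys.append(prev)
    return keys


keys = [Key("r1", "attr", "a", "public", 5),
        Key("r1", "attr", "b", "public", 5),
        Key("r2", "attr", "a", "public", 5)]
encoded = encode(keys)
# encoded[2] == (CF_SAME | CV_SAME | TS_SAME, {"row": "r2", "cq": "a"})
```

Because this sketch compares against the immediately preceding key regardless of row, a cf shared by consecutive keys is elided even across a row boundary (the third key keeps only its new row and cq). It does not model data-block boundaries, where a real writer may need to store a full key.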

BTW, the "R" in RFile stands for "Relative Key."

Column families are grouped together into locality groups, and any families 
falling outside of every defined locality group go into the "default" locality 
group.  Column family -> locality group mappings are written to metadata at the 
end of the RFile.  Locality groups are stored in successive sections of the 
file.  During compactions, the input is re-scanned multiple times to produce 
locality groups that match the table's family -> group mapping at the time of 
the compaction.
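A simplified sketch of how a compaction might lay out locality groups, assuming cells arrive as sorted `(row, cf, cq, value)` tuples and that the mapping is a plain dict from column family to group name (both hypothetical): each named group gets its own pass over the input, a final pass collects unmapped families into the default group, and the mapping itself is returned as the trailing metadata.

```python
DEFAULT_GROUP = ""  # hypothetical name for the default locality group


def compact(cells, family_to_group):
    """One pass over the input per locality group; each pass yields one section."""
    sections = []
    # named groups, ordered alphabetically here only for determinism
    for name in sorted(set(family_to_group.values())):
        # re-scan the whole input, keeping only families mapped to this group
        section = [c for c in cells if family_to_group.get(c[1]) == name]
        sections.append((name, section))
    # final pass: families not mapped to any group go to the default group
    default = [c for c in cells if c[1] not in family_to_group]
    sections.append((DEFAULT_GROUP, default))
    # the cf -> group mapping is kept as metadata after the sections
    return {"sections": sections, "metadata": dict(family_to_group)}
```

Filtering a sorted input preserves its order, so each section comes out sorted. Writing the default group last and ordering named groups alphabetically are choices made for this sketch.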

In 1.3, index information is stored in one large block at the end of the file.  
In 1.4, the index blocks are hierarchical, to support incremental loading of 
the index.

-Eric

On Mon, Jun 25, 2012 at 1:11 PM, Cardon, Tejay E 
<[email protected]> wrote:
All,
Can anyone point me to a design paper or other source with some detail on how 
RFiles work?  I'm curious about the compression under the covers as well as the 
on-disk layout of column families, etc.

Thanks,
Tejay Cardon
