I believe the relative key encoding only occurs on keys that are adjacent. So a cf repeated in different rows will not get relative encoding unless there was nothing else between those keys in that range.
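To make the adjacency point concrete, here is a minimal Python sketch of relative key encoding under a simplified model. Each key is compared field by field only with the key immediately before it, and a matching field is written as a "same as previous" marker (`None` here) instead of its value. The `Key` fields and the `encode_relative`/`decode_relative` names are hypothetical. The real RFile packs these flags into bits and also covers the timestamp and delete flag, so treat this as an illustration of the idea, not the on-disk layout.

```python
from typing import List, NamedTuple, Optional, Tuple


class Key(NamedTuple):
    """Hypothetical simplified key; a real Accumulo key also has a timestamp and delete flag."""
    row: str
    cf: str
    cq: str
    cv: str


# Each encoded field is either its value or None, meaning
# "identical to the same field of the immediately preceding key".
Encoded = Tuple[Optional[str], ...]


def encode_relative(keys: List[Key]) -> List[Encoded]:
    """Encode sorted keys, each relative to the previous key only."""
    encoded: List[Encoded] = []
    prev: Optional[Key] = None
    for key in keys:
        if prev is None:
            # The first key has nothing to be relative to, so it is written in full.
            encoded.append(tuple(key))
        else:
            # Only the previous key is consulted; older keys are never looked at.
            encoded.append(tuple(None if cur == old else cur
                                 for cur, old in zip(key, prev)))
        prev = key
    return encoded


def decode_relative(encoded: List[Encoded]) -> List[Key]:
    """Rebuild full keys by filling each None from the previous decoded key."""
    keys: List[Key] = []
    for fields in encoded:
        if keys:
            prev = keys[-1]
            fields = tuple(old if cur is None else cur
                           for cur, old in zip(fields, prev))
        keys.append(Key(*fields))
    return keys


adjacent = [Key("r1", "attr", "a", ""), Key("r2", "attr", "a", "")]
print(encode_relative(adjacent))
# [('r1', 'attr', 'a', ''), ('r2', None, None, None)]

separated = [Key("r1", "attr", "a", ""), Key("r1", "meta", "b", ""),
             Key("r2", "attr", "a", "")]
print(encode_relative(separated))
# [('r1', 'attr', 'a', ''), (None, 'meta', 'b', None), ('r2', 'attr', 'a', None)]
```

In the second list the `attr` family in row `r2` has to be written out again because the `meta` key sits between the two `attr` keys, which matches the behavior described above.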
Now, keep in mind that after relative key encoding, we still run a compression algorithm, so highly repetitive, non-adjacent keys should still end up tiny on disk.

On Jun 25, 2012 7:54 PM, "Cardon, Tejay E" <[email protected]> wrote:
> Thanks Eric. That helps. With regard to the repeating key piece, does
> this only happen for successive cells? In other words, if I use the same
> cf in every row of a table, does that cf get repeated each time, or does
> this cf repetition work across rows? I hope that makes sense.
>
> Thanks,
>
> Tejay
>
> From: Eric Newton [mailto:[email protected]]
> Sent: Monday, June 25, 2012 4:46 PM
> To: [email protected]
> Subject: EXTERNAL: Re: RFile details
>
> Here's my high-level understanding. Let me know which aspect you would
> like to know more about.
>
> RFile is built on top of BCFile, so you would need to dig up documentation
> on that. Most of the compression is performed at that layer.
>
> However, RFile uses a few bits of each key/value to encode any repeating
> row, cf, cq, cv information. This is helpful when a file contains just one
> row, or when most of the data has the same visibility.
>
> BTW, "R" in RFile stands for "Relative Key."
>
> Column families are grouped together into locality groups, and those
> families falling outside of any defined locality group go in the "default"
> locality group. Column family -> locality group mappings are written to
> metadata at the end of the RFile. Locality groups are stored in successive
> sections of a file. Input is re-scanned multiple times during compactions
> to produce locality groups that match a table's family->group mapping at the
> time of the compaction.
>
> In 1.3, index information is stored in one large block at the end of the
> file.
> In 1.4, the index blocks are hierarchical, to support incremental
> loading of the index.
>
> -Eric
>
> On Mon, Jun 25, 2012 at 1:11 PM, Cardon, Tejay E <[email protected]>
> wrote:
>
> All,
>
> Can anyone point me to a design paper or other source of
> some detail on how RFiles work? I’m curious about the compression under
> the covers as well as the layout on disk of column families, etc.
>
> Thanks,
>
> Tejay Cardon
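The claim that compression still shrinks repetition that relative encoding misses can be checked with a rough experiment. The sketch below interleaves two column families so that no two `attributes` keys are ever adjacent, serializes the keys naively, and compresses the buffer with Python's `zlib` (DEFLATE, the algorithm behind gzip). Assumptions: the key layout and the `make_keys`/`serialize` helpers are hypothetical, and a real RFile is compressed block by block inside BCFile rather than as one buffer. I believe gzip is Accumulo's default codec (the `table.file.compress.type` property), but check your version.

```python
import zlib
from typing import List, Tuple

# Hypothetical simplified key: (row, cf, cq, cv).
Key = Tuple[str, str, str, str]


def make_keys(num_rows: int) -> List[Key]:
    """Sorted keys with two families per row, so identical
    'attributes' keys are never adjacent to each other."""
    keys: List[Key] = []
    for i in range(num_rows):
        row = f"row{i:05d}"
        keys.append((row, "attributes", "name", "public"))
        keys.append((row, "metadata", "created", "public"))
    return keys


def serialize(keys: List[Key]) -> bytes:
    """Naive layout: fields joined by NUL bytes, keys separated by newlines."""
    return b"\n".join(b"\x00".join(f.encode("utf-8") for f in key) for key in keys)


raw = serialize(make_keys(1000))
compressed = zlib.compress(raw, 6)
print(len(raw), len(compressed))
```

On data like this the compressed buffer should come out at a small fraction of the raw size, even though none of the `attributes` keys could be relatively encoded against one another. The exact ratio depends on the data and the codec, so treat any particular number as illustrative.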
