I found this useful article that explains how HFile stores data internally:

http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
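
To make the key layout asked about further down in this thread concrete, here
is a minimal sketch in Python. It is only an illustration under my own
assumptions, not HBase code: CellKey, build_store and read_row are
hypothetical names, the "store" is a plain sorted list rather than a real
HFile, and real HFile keys are byte arrays that carry more fields (for
example a key type). The one detail taken from HBase is the sort order: row,
family and qualifier ascending, timestamp descending, so the newest version
of a cell comes first.

```python
from bisect import bisect_left
from typing import NamedTuple


class CellKey(NamedTuple):
    """Hypothetical stand-in for the key of one HFile key/value pair."""
    row: str
    family: str
    qualifier: str
    timestamp: int


def sort_key(key):
    # Row, family and qualifier ascending; timestamp descending (newest first).
    return (key.row, key.family, key.qualifier, -key.timestamp)


def build_store(cells):
    """Return the (CellKey, value) pairs in sorted key order."""
    return sorted(cells, key=lambda kv: sort_key(kv[0]))


def read_row(store, row):
    """Collect every key/value pair that belongs to one row."""
    keys = [sort_key(k) for k, _ in store]
    # (row,) sorts before any full key of that row, so this finds its first cell.
    start = bisect_left(keys, (row,))
    result = []
    for key, value in store[start:]:
        if key.row != row:
            break  # keys are sorted, so the row's cells are contiguous
        result.append((key, value))
    return result


# The three cells from the example in this thread: two versions of one
# column in rowkey1, one version in rowkey2.
store = build_store([
    (CellKey("rowkey1", "family1", "qualifier1", 1), "value1"),
    (CellKey("rowkey1", "family1", "qualifier1", 2), "value2"),
    (CellKey("rowkey2", "family1", "qualifier1", 1), "value3"),
])

for key, value in read_row(store, "rowkey1"):
    print(key, value)
```

In this sketch, reading rowkey1 does touch two key/value pairs, one per
version, but because the keys are sorted they sit next to each other and are
read in one sequential pass.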

On Tue, Mar 22, 2011 at 11:31 AM, Weishung Chung <[email protected]> wrote:

> I also found this informative article
>
> http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html
>
> Is the key/value pair laid out like this? E.g., for column family1 with
> one qualifier (qualifier1) and 2 versions:
>
> key1:   rowkey1+column family1:qualifier1+timestamp1
> value1: corresponding cell value 1
> key2:   rowkey1+column family1:qualifier1+timestamp2
> value2: corresponding cell value 2
> key3:   rowkey2+column family1:qualifier1+timestamp1
> value3: corresponding cell value 3
>
> On Tue, Mar 22, 2011 at 10:58 AM, Vivek Krishna <[email protected]> wrote:
>
>> http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
>> might help.
>>
>> Viv
>>
>>
>>
>>
>> On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <[email protected]> wrote:
>>
>>> My fellow superb hbase experts,
>>>
>>> I am looking at the HFile specs and have some questions.
>>> How is a particular cell in an HBase table represented in the
>>> HFile? Does the key of the key/value pair represent the rowkey+column
>>> family:qualifier+timestamp and the value represent the corresponding cell
>>> value? If so, do multiple key/value pairs have to be read in order to
>>> read a row?
>>>
>>> Thank you :)
>>>
>>>
>>> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <[email protected]>
>>> wrote:
>>>
>>> > Thank you, I will definitely take a look. Also, the TFile spec below
>>> > helps me understand more. What exciting work!
>>> >
>>> >
>>> >
>>> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>>> >
>>> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <[email protected]>
>>> wrote:
>>> >
>>> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>>> >> > I am browsing through the hadoop.io package and was wondering what
>>> >> > other file formats are available in hadoop other than SequenceFile
>>> >> > and TFile? Is all data written through hadoop including those from
>>> >> > hbase saved in the above formats? It seems like SequenceFile is in
>>> >> > key value pair format.
>>> >>
>>> >> Avro includes a file format that works with Hadoop.
>>> >>
>>> >>
>>> >>
>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>>> >>
>>> >> Doug
>>> >>
>>> >
>>> >
>>>
>>
>>
>
