Re: Hbase performance with HDFS

Doug Meil Thu, 07 Jul 2011 12:39:09 -0700

Hi there-

You should read the architecture section...


http://hbase.apache.org/book.html#architecture

re:  "blobs"

http://hbase.apache.org/book.html#supported.datatypes




On 7/7/11 3:30 PM, "Mohit Anchlia" <[email protected]> wrote:

>Thanks, that helps! Just a few more questions:
>
>You mentioned compactions. When do those occur and what triggers
>them? Do they cause additional space usage when they happen? If they
>do, it would mean you always need to have much more disk than you
>really need.
>
>Since HDFS is mostly write once, how are updates/deletes handled?
>
>Is HBase also suitable for blobs?
>
>On Thu, Jul 7, 2011 at 12:11 PM, Andrew Purtell <[email protected]>
>wrote:
>> Some thoughts off the top of my head. Lars' architecture material
>> might/should cover this too. Pretty sure his book will.
>> Regarding reads:
>> One does not have to read a whole HDFS block. You can request arbitrary
>> byte ranges within the block, via positioned reads. (It is also true
>> that HDFS can be improved for better random read performance in ways
>> not necessarily yet committed to trunk, or especially to a 0.20.x
>> branch with append support for HBase. See
>> https://issues.apache.org/jira/browse/HDFS-1323)
>> HBase holds the indexes of the store files in HDFS in memory. We also
>> open all store files at the HDFS layer and stash those references.
>> Additionally, users can specify the use of Bloom filters to improve
>> query time performance through wholesale skipping of HFile reads if
>> they are known not to contain data that satisfies the query. Bloom
>> filters are held in memory as well.
>> So with indexes resident in memory, when handling Gets we know the byte
>> ranges within HDFS block(s) that contain the data of interest. With
>> positioned reads we retrieve only those bytes from a DataNode. With
>> optional Bloom filters we avoid whole HFiles entirely.
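
A toy sketch of the read path described above, in Python rather than HBase's Java. It is an illustration only, not HBase or HDFS code: the names ToyBloom and ToyStoreFile are hypothetical, a local file stands in for an HDFS block, os.pread (POSIX only) stands in for an HDFS positioned read, and the index is per record where HBase indexes HFile blocks.

```python
import hashlib
import os
import tempfile


class ToyBloom:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.field |= 1 << pos

    def might_contain(self, key):
        return all((self.field >> pos) & 1 for pos in self._positions(key))


class ToyStoreFile:
    """Hypothetical stand-in for a store file with an in-memory index."""

    def __init__(self, path, rows):
        self.index = {}        # key -> (offset, length), held in memory
        self.bloom = ToyBloom()
        self.bytes_read = 0    # bytes actually fetched from the file
        offset = 0
        with open(path, "wb") as out:
            for key, value in sorted(rows.items()):
                out.write(value)
                self.index[key] = (offset, len(value))
                self.bloom.add(key)
                offset += len(value)
        # Keep the file open, like stashing the store file reference.
        self.fd = os.open(path, os.O_RDONLY)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None        # skip the file without touching it
        entry = self.index.get(key)
        if entry is None:
            return None        # Bloom filter false positive
        offset, length = entry
        data = os.pread(self.fd, length, offset)  # positioned read
        self.bytes_read += len(data)
        return data

    def close(self):
        os.close(self.fd)


def demo():
    rows = {b"row-%03d" % i: b"value-%03d" % i for i in range(100)}
    with tempfile.TemporaryDirectory() as tmp:
        store = ToyStoreFile(os.path.join(tmp, "toy.store"), rows)
        try:
            hit = store.get(b"row-042")
            miss = store.get(b"no-such-row")
            return hit, miss, store.bytes_read
        finally:
            store.close()
```

Calling demo() looks up one existing and one missing key in a 100-row file. Only the bytes of the matching value are read, and the missing key costs no file I/O at all.
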
>> Regarding writes:
>> I think you should consult the Bigtable paper again if you are still
>> asking about the write path. The database is log structured. Writes are
>> accumulated in memory, and flushed all at once. Later flush files are
>> compacted as needed, because as you point out GFS and HDFS are
>> optimized for streaming sequential reads and writes.
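
A toy sketch of the log-structured write path described above, again in Python and again not HBase code. ToyLogStructuredStore and both thresholds are hypothetical, flush files are kept as in-memory tuples where a real system would write sequential files, and deletes and the write-ahead log are left out.

```python
class ToyLogStructuredStore:
    """Writes go to memory, are flushed in bulk, and are later compacted."""

    def __init__(self, flush_threshold=2, compact_threshold=3):
        self.flush_threshold = flush_threshold      # assumed trigger, in rows
        self.compact_threshold = compact_threshold  # assumed trigger, in files
        self.memstore = {}      # recent writes, accumulated in memory
        self.flush_files = []   # immutable sorted runs, oldest first

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the whole memstore out at once as one sorted, immutable run."""
        if not self.memstore:
            return
        self.flush_files.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}
        if len(self.flush_files) >= self.compact_threshold:
            self.compact()

    def compact(self):
        """Merge all flush files into one; newer values overwrite older ones."""
        merged = {}
        for run in self.flush_files:   # oldest first, so later runs win
            merged.update(run)
        self.flush_files = [tuple(sorted(merged.items()))]

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for run in reversed(self.flush_files):   # newest run first
            for k, v in run:
                if k == key:
                    return v
        return None
```

Every flush and every compaction here writes a whole new run and never modifies an existing one, so the underlying storage only ever sees sequential writes. While a compaction runs, the old runs and the merged run exist side by side, which is why compaction needs some temporary extra space.
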
>>
>> Best regards,
>>
>>   - Andy
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>> ________________________________
>> From: Mohit Anchlia <[email protected]>
>> To: [email protected]; Andrew Purtell <[email protected]>
>> Sent: Thursday, July 7, 2011 11:53 AM
>> Subject: Re: Hbase performance with HDFS
>>
>> I have looked at Bigtable and its SSTables etc. But my question is
>> directly related to how it's used with HDFS. HDFS recommends large
>> files, bigger blocks, write once, and many sequential reads. But
>> accessing and writing small rows is more random and different from
>> the inherent design of HDFS. How do these two go together and still
>> provide good performance?
>>
>> On Thu, Jul 7, 2011 at 11:22 AM, Andrew Purtell <[email protected]>
>>wrote:
>>> Hi Mohit,
>>>
>>> Start here: http://labs.google.com/papers/bigtable.html
>>>
>>> Best regards,
>>>
>>>
>>>     - Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>>
>>>
>>>>________________________________
>>>>From: Mohit Anchlia <[email protected]>
>>>>To: [email protected]
>>>>Sent: Thursday, July 7, 2011 11:12 AM
>>>>Subject: Hbase performance with HDFS
>>>>
>>>>I've been trying to understand how HBase can provide good performance
>>>>using HDFS when HDFS is designed for sequential access to large blocks,
>>>>which is inherently different from HBase, where access is more random
>>>>and row sizes might be very small.
>>>>
>>>>I am reading this, but it doesn't answer my question. It does say that
>>>>the HFile block size is different, but how it really works with HDFS is
>>>>what I am trying to understand.
>>>>
>>>>http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>>>>
>>>>
>>>>
>>
>>
>>
