Got it. I was thinking compaction happened locally on each node, but it makes sense from what you have explained.
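Andy explains below that compactions are triggered by an algorithm monitoring the number and size of flush files in a store. A minimal sketch of that check, with made-up names and thresholds (the real algorithm is more involved and configurable, e.g. via `hbase.hstore.compactionThreshold`; this only shows the shape of the idea):

```python
def needs_compaction(store_file_sizes, min_files=3, max_total_bytes=256 * 1024 * 1024):
    """Return True when a store has accumulated enough flush files to
    justify a compaction, or when their total size passes a bound.
    Thresholds here are illustrative, not HBase defaults verbatim."""
    if len(store_file_sizes) >= min_files:
        return True
    return sum(store_file_sizes) >= max_total_bytes

print(needs_compaction([64, 64]))       # False: only two small flush files
print(needs_compaction([64, 64, 64]))   # True: file-count threshold hit
```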
On Thu, Jul 7, 2011 at 3:12 PM, Andrew Purtell <[email protected]> wrote:
>> 1) When compactions occur on Node A, would they also include b2 and b3,
>> which are actually redundant copies? My guess is yes.
>
> I don't follow your question.
>
> HDFS files are read by opening an input stream. This stream is fed data from
> block replicas chosen at random. One block replica for each block. The reader
> doesn't see "redundant copies".
>
>> 2) Now compaction occurs and creates HFile3, which as you said is
>> replicated. But what happens to HFile1 and HFile2? I am assuming they
>> get deleted.
>
> They are deleted.
>
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via
> Tom White)
>
> ----- Original Message -----
>> From: Mohit Anchlia <[email protected]>
>> To: [email protected]
>> Cc:
>> Sent: Thursday, July 7, 2011 3:02 PM
>> Subject: Re: Hbase performance with HDFS
>>
>> Thanks! I understand what you mean, however I have a little confusion.
>> Does it mean there are unused blocks sitting around? For example:
>>
>> HFile1 with 3 blocks spread across 3 nodes: Node A: (b1), b2, b3; Node
>> B: b1, (b2), b3; and Node C: b1, b2, (b3).
>>
>> HFile2 with 3 blocks spread across 3 nodes: Node A: (b1), b2, b3; Node
>> B: b1, (b2), b3; and Node C: b1, b2, (b3).
>>
>> I have 2 questions:
>>
>> 1) When compactions occur on Node A, would they also include b2 and b3,
>> which are actually redundant copies? My guess is yes.
>> 2) Now compaction occurs and creates HFile3, which as you said is
>> replicated. But what happens to HFile1 and HFile2? I am assuming they
>> get deleted.
>>
>> Thanks for everyone's patience!
>>
>> On Thu, Jul 7, 2011 at 2:43 PM, Buttler, David <[email protected]> wrote:
>>> The nice part of using HDFS as the file system is that the replication is
>>> taken care of by the file system. So, when the compaction finishes, that
>>> means the replication has already taken place.
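The exchange above (HFile1 + HFile2 merge into HFile3, then the old files are deleted) can be sketched roughly as follows. This is not HBase code: store files stand in as sorted lists of (row, value) pairs, and "newer file wins" stands in for HBase's newest-timestamp-wins rule.

```python
import heapq

def compact(older, newer):
    """Merge-sort two store files (sorted lists of (row, value) pairs)
    into one. For duplicate rows the newer file's entry wins."""
    merged = {}
    # heapq.merge streams both sorted inputs in order; with a key function,
    # equal rows are yielded from the first (older) iterable first, so
    # entries from the newer file overwrite older ones below.
    for row, value in heapq.merge(older, newer, key=lambda kv: kv[0]):
        merged[row] = value
    return sorted(merged.items())

hfile1 = [("row1", "a"), ("row3", "c")]    # older flush file
hfile2 = [("row1", "a2"), ("row2", "b")]   # newer flush file
hfile3 = compact(hfile1, hfile2)
print(hfile3)  # [('row1', 'a2'), ('row2', 'b'), ('row3', 'c')]
# Once HFile3 is written (and replicated by HDFS like any other file),
# HFile1 and HFile2 are deleted, as the answer above says.
```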
>>>
>>> -----Original Message-----
>>> From: Mohit Anchlia [mailto:[email protected]]
>>> Sent: Thursday, July 07, 2011 2:02 PM
>>> To: [email protected]; Andrew Purtell
>>> Subject: Re: Hbase performance with HDFS
>>>
>>> Thanks Andrew. Really helpful. I think I have one more question right
>>> now :) Underneath, HDFS replicates blocks, by default 3 times. Not sure how
>>> that relates to HFiles and compactions. When compaction occurs, is it also
>>> happening on the replica blocks on other nodes? If not, then how does
>>> it work when one node fails?
>>>
>>> On Thu, Jul 7, 2011 at 1:53 PM, Andrew Purtell <[email protected]> wrote:
>>>>> You mentioned compactions; when do those occur and what triggers
>>>>> them?
>>>>
>>>> Compactions are triggered by an algorithm that monitors the number of
>>>> flush files in a store and their size, and is configurable in several
>>>> dimensions.
>>>>
>>>>> Does it cause additional space usage when that happens
>>>>
>>>> Yes.
>>>>
>>>>> if it does it would mean you always need to have much more disk than you
>>>>> really need.
>>>>
>>>> Not all regions are compacted at once. Each region by default is
>>>> constrained to 256 MB. Not all regions will hold the full amount of data.
>>>> The result is not a perfect copy (doubling) if some data has been deleted
>>>> or is associated with TTLs that have expired. The merge-sorted result is
>>>> moved into place and the old files are deleted as soon as the compaction
>>>> completes. So how much more is "much more"? You can't write to any kind of
>>>> data store on a (nearly) full volume anyway, no matter HBase/HDFS, or
>>>> MySQL, or...
>>>>
>>>>> Since HDFS is mostly write once how are updates/deletes handled?
>>>>
>>>> Not mostly, only write once.
>>>>
>>>> From the BigTable paper, section 5.3: "A valid read operation is
>>>> executed on a merged view of the sequence of SSTables and the memtable.
>>>> Since the SSTables and the memtable are lexicographically sorted data
>>>> structures, the merged view can be formed efficiently." So what this means
>>>> is that all the store files and the memstore serve effectively as change
>>>> logs sorted in reverse chronological order.
>>>>
>>>> Deletes are just another write, but one that writes tombstones
>>>> "covering" data with older timestamps.
>>>>
>>>> When serving queries, HBase searches store files back in time until it
>>>> finds data at the coordinates requested, or a tombstone.
>>>>
>>>> The process of compaction not only merge sorts a bunch of accumulated
>>>> store files (from flushes) into fewer store files (or one) for read
>>>> efficiency, it also performs housekeeping, dropping data "covered" by the
>>>> delete tombstones. Incidentally this is also how TTLs are supported:
>>>> expired values are dropped as well.
>>>>
>>>> Best regards,
>>>>
>>>>    - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
>>>>
>>>>> ________________________________
>>>>> From: Mohit Anchlia <[email protected]>
>>>>> To: Andrew Purtell <[email protected]>
>>>>> Cc: "[email protected]" <[email protected]>
>>>>> Sent: Thursday, July 7, 2011 12:30 PM
>>>>> Subject: Re: Hbase performance with HDFS
>>>>>
>>>>> Thanks, that helps! Just a few more questions:
>>>>>
>>>>> You mentioned compactions; when do those occur and what triggers
>>>>> them? Does it cause additional space usage when that happens? If it
>>>>> does, it would mean you always need to have much more disk than you
>>>>> really need.
>>>>>
>>>>> Since HDFS is mostly write once, how are updates/deletes handled?
>>>>>
>>>>> Is HBase also suitable for blobs?
>>>>>
>>>>> On Thu, Jul 7, 2011 at 12:11 PM, Andrew Purtell <[email protected]> wrote:
>>>>>> Some thoughts off the top of my head. Lars' architecture material
>>>>>> might/should cover this too. Pretty sure his book will.
>>>>>> Regarding reads:
>>>>>>
>>>>>> One does not have to read a whole HDFS block. You can request arbitrary
>>>>>> byte ranges within the block, via positioned reads. (It is true also
>>>>>> that HDFS can be improved for better random reading performance in ways
>>>>>> not necessarily yet committed to trunk, or especially to a 0.20.x branch
>>>>>> with append support for HBase. See
>>>>>> https://issues.apache.org/jira/browse/HDFS-1323)
>>>>>>
>>>>>> HBase holds indexes to store files in HDFS in memory. We also open all
>>>>>> store files at the HDFS layer and stash those references. Additionally,
>>>>>> users can specify the use of bloom filters to improve query time
>>>>>> performance through wholesale skipping of HFile reads if they are known
>>>>>> not to contain data that satisfies the query. Bloom filters are held in
>>>>>> memory as well.
>>>>>>
>>>>>> So with indexes resident in memory, when handling Gets we know the byte
>>>>>> ranges within HDFS block(s) that contain the data of interest. With
>>>>>> positioned reads we retrieve only those bytes from a DataNode. With
>>>>>> optional bloom filters we avoid whole HFiles entirely.
>>>>>>
>>>>>> Regarding writes:
>>>>>>
>>>>>> I think you should consult the Bigtable paper again if you are still
>>>>>> asking about the write path. The database is log structured. Writes are
>>>>>> accumulated in memory, and flushed all at once. Later, flush files are
>>>>>> compacted as needed, because as you point out GFS and HDFS are optimized
>>>>>> for streaming sequential reads and writes.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>>    - Andy
>>>>>>
>>>>>> Problems worthy of attack prove their worth by hitting back.
>>>>>> - Piet Hein (via Tom White)
>>>>>>
>>>>>> ________________________________
>>>>>> From: Mohit Anchlia <[email protected]>
>>>>>> To: [email protected]; Andrew Purtell <[email protected]>
>>>>>> Sent: Thursday, July 7, 2011 11:53 AM
>>>>>> Subject: Re: Hbase performance with HDFS
>>>>>>
>>>>>> I have looked at Bigtable and its SSTables etc. But my question is
>>>>>> directly related to how it's used with HDFS. HDFS recommends large
>>>>>> files, bigger blocks, write once, and read-many sequential reads. But
>>>>>> accessing small rows and writing small rows is more random and
>>>>>> different from the inherent design of HDFS. How do these 2 go together,
>>>>>> and is it able to provide performance?
>>>>>>
>>>>>> On Thu, Jul 7, 2011 at 11:22 AM, Andrew Purtell <[email protected]> wrote:
>>>>>>> Hi Mohit,
>>>>>>>
>>>>>>> Start here: http://labs.google.com/papers/bigtable.html
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>>    - Andy
>>>>>>>
>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>>>> (via Tom White)
>>>>>>>
>>>>>>>> ________________________________
>>>>>>>> From: Mohit Anchlia <[email protected]>
>>>>>>>> To: [email protected]
>>>>>>>> Sent: Thursday, July 7, 2011 11:12 AM
>>>>>>>> Subject: Hbase performance with HDFS
>>>>>>>>
>>>>>>>> I've been trying to understand how HBase can provide good performance
>>>>>>>> using HDFS, when the purpose of HDFS is sequential reads of large
>>>>>>>> blocks, which is inherently different from HBase, where access is
>>>>>>>> more random and row sizes might be very small.
>>>>>>>>
>>>>>>>> I am reading this, but it doesn't answer my question. It does say that
>>>>>>>> the HFile block size is different, but how it really works with HDFS
>>>>>>>> is what I am trying to understand.
>>>>>>>>
>>>>>>>> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
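Earlier in the thread Andy describes reads as searching the memstore and then store files back in time until data or a tombstone is found. A toy sketch of that merged-view lookup (not HBase code; `TOMBSTONE` is a stand-in for a delete marker, and each layer is simplified to a dict):

```python
TOMBSTONE = object()  # stand-in for a delete marker ("tombstone")

def get(row, memstore, store_files):
    """Search the memstore, then store files from newest to oldest.
    Return the value, or None if the row is absent or covered by a
    tombstone -- mirroring the read path described in the thread."""
    for layer in [memstore] + store_files:  # newest first
        if row in layer:
            value = layer[row]
            return None if value is TOMBSTONE else value
    return None

store_files = [{"row1": "old-value"}]      # older flush file
memstore = {"row1": TOMBSTONE}             # delete written as a tombstone
print(get("row1", memstore, store_files))  # None: tombstone covers old value
print(get("row1", {}, store_files))        # old-value
```

Compaction, as Andy notes, later merge-sorts these layers and drops data covered by tombstones (and expired TTLs) for good.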

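Andy also mentions bloom filters letting HBase skip entire HFiles known not to contain matching data. A toy sketch of the idea (real HBase blooms hash row or row+column keys into an on-disk/in-memory bit array; the class and parameters here are made up for illustration):

```python
class TinyBloom:
    """Minimal bloom filter: k hash positions set in a bit array.
    A negative answer is definitive (the HFile read can be skipped);
    a positive answer may be a false positive (the HFile is checked)."""

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = 0  # bit array packed into one int

    def _positions(self, key):
        for i in range(self.hashes):
            yield hash((i, key)) % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.array |= 1 << p

    def might_contain(self, key):
        return all(self.array >> p & 1 for p in self._positions(key))

bloom = TinyBloom()
bloom.add("row42")
print(bloom.might_contain("row42"))  # True: this HFile must be read
```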