Hi there -

You should read the architecture section:

http://hbase.apache.org/book.html#architecture

Re: "blobs":

http://hbase.apache.org/book.html#supported.datatypes


On 7/7/11 3:30 PM, "Mohit Anchlia" <[email protected]> wrote:

>Thanks, that helps! Just a few more questions:
>
>You mentioned compactions. When do those occur, and what triggers
>them? Do they cause additional space usage when they happen? If so,
>it would mean you always need much more disk than you really need.
>
>Since HDFS is mostly write-once, how are updates/deletes handled?
>
>Is HBase also suitable for blobs?
>
>On Thu, Jul 7, 2011 at 12:11 PM, Andrew Purtell <[email protected]> wrote:
>> Some thoughts off the top of my head. Lars' architecture material
>> might/should cover this too. Pretty sure his book will.
>> Regarding reads:
>> One does not have to read a whole HDFS block. You can request arbitrary
>> byte ranges within the block, via positioned reads. (It is also true that
>> HDFS can be improved for better random read performance in ways not
>> necessarily yet committed to trunk, or especially to a 0.20.x branch with
>> append support for HBase. See https://issues.apache.org/jira/browse/HDFS-1323)
>> HBase keeps the indexes for its store files (which live in HDFS) in memory.
>> We also open all store files at the HDFS layer and stash those references.
>> Additionally, users can specify the use of bloom filters to improve query
>> time performance by skipping reads of entire HFiles that are known not to
>> contain data that satisfies the query. Bloom filters are held in memory
>> as well.
>> So with indexes resident in memory, when handling Gets we know the byte
>> ranges within HDFS block(s) that contain the data of interest. With
>> positioned reads we retrieve only those bytes from a DataNode. With
>> optional bloom filters we avoid whole HFiles entirely.
>> Regarding writes:
>> I think you should consult the Bigtable paper again if you are still asking
>> about the write path. The database is log structured.
>> Writes are accumulated in memory and flushed all at once. Later, flush
>> files are compacted as needed, because, as you point out, GFS and HDFS are
>> optimized for streaming sequential reads and writes.
>>
>> Best regards,
>>
>> - Andy
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>> ________________________________
>> From: Mohit Anchlia <[email protected]>
>> To: [email protected]; Andrew Purtell <[email protected]>
>> Sent: Thursday, July 7, 2011 11:53 AM
>> Subject: Re: Hbase performance with HDFS
>>
>> I have looked at Bigtable and its SSTables etc. But my question is
>> directly related to how it's used with HDFS. HDFS recommends large
>> files, bigger blocks, write once and read many sequential reads. But
>> accessing small rows and writing small rows is more random and
>> different from the inherent design of HDFS. How do these two go together
>> and still provide good performance?
>>
>> On Thu, Jul 7, 2011 at 11:22 AM, Andrew Purtell <[email protected]> wrote:
>>> Hi Mohit,
>>>
>>> Start here: http://labs.google.com/papers/bigtable.html
>>>
>>> Best regards,
>>>
>>> - Andy
>>>
>>>>________________________________
>>>>From: Mohit Anchlia <[email protected]>
>>>>To: [email protected]
>>>>Sent: Thursday, July 7, 2011 11:12 AM
>>>>Subject: Hbase performance with HDFS
>>>>
>>>>I've been trying to understand how HBase can provide good performance
>>>>using HDFS, when HDFS is designed for sequential access with large block
>>>>sizes. That is inherently different from HBase, where access is more
>>>>random and row sizes might be very small.
>>>>
>>>>I am reading this, but it doesn't answer my question. It does say that
>>>>the HFile block size is different, but how it really works with HDFS is
>>>>what I am trying to understand.
>>>>
>>>>http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
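A rough sketch of the Get path Andrew describes above (in-memory block index, optional bloom filter, positioned read of only the needed bytes). It is a toy under stated assumptions, not HBase code: `StoreFile` and `BloomFilter` are hypothetical names, the on-disk layout (tab-separated records grouped into fixed-size blocks) is invented for illustration and is not the real HFile format, and a local file read with `os.pread` (Unix only) stands in for an HDFS positioned read.

```python
import bisect
import hashlib
import os
import tempfile


class BloomFilter:
    """Small Bloom filter: k bit positions per key, taken from one SHA-256 digest."""

    def __init__(self, m_bits: int = 1024, k: int = 3) -> None:
        self.m = m_bits
        self.k = k  # at most 8, since each position uses 4 of the 32 digest bytes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: bytes):
        digest = hashlib.sha256(key).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class StoreFile:
    """Hypothetical stand-in for an HFile: sorted records grouped into blocks,
    with the block index and Bloom filter held in memory.
    Assumes keys and values contain no tab or newline bytes."""

    def __init__(self, path: str, rows: dict, records_per_block: int = 2) -> None:
        self.index_keys = []    # first key of each block, in sorted order
        self.index_ranges = []  # (offset, length) of each block in the file
        self.bloom = BloomFilter()
        self.bytes_read = 0     # bytes fetched by positioned reads so far
        items = sorted(rows.items())
        offset = 0
        with open(path, "wb") as f:
            for start in range(0, len(items), records_per_block):
                block = items[start:start + records_per_block]
                data = b"".join(k + b"\t" + v + b"\n" for k, v in block)
                f.write(data)
                self.index_keys.append(block[0][0])
                self.index_ranges.append((offset, len(data)))
                offset += len(data)
                for k, _ in block:
                    self.bloom.add(k)
        # Open once and keep the handle, like stashing opened store file references.
        self.fd = os.open(path, os.O_RDONLY)

    def get(self, key: bytes):
        if not self.bloom.might_contain(key):
            return None  # skip this file without touching disk
        i = bisect.bisect_right(self.index_keys, key) - 1
        if i < 0:
            return None  # key sorts before the first block
        offset, length = self.index_ranges[i]
        # Positioned read: fetch only this block's byte range.
        block = os.pread(self.fd, length, offset)
        self.bytes_read += len(block)
        for line in block.splitlines():
            k, v = line.split(b"\t", 1)
            if k == key:
                return v
        return None

    def close(self) -> None:
        os.close(self.fd)


with tempfile.TemporaryDirectory() as d:
    rows = {b"row1": b"a", b"row2": b"b", b"row3": b"c", b"row4": b"d", b"row5": b"e"}
    sf = StoreFile(os.path.join(d, "store-1"), rows)
    print(sf.get(b"row4"))  # b'd'
    print(sf.get(b"row9"))  # None
    sf.close()
```

Because the index and the filter live in memory, a lookup in this sketch costs at most one small ranged read per store file, and none at all when the filter says the key is absent. In HDFS itself the corresponding call is the positioned `read(position, buffer, offset, length)` on `FSDataInputStream`.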

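Similarly, a minimal sketch of the log-structured write path from the thread ("Writes are accumulated in memory and flushed all at once. Later, flush files are compacted as needed"). `LogStructuredStore`, its flush threshold and the list-of-runs representation are hypothetical; a real store would persist each flushed run as a file and stream a merge during compaction rather than building a dict in memory.

```python
import bisect


class LogStructuredStore:
    """Hypothetical toy of a log-structured write path: buffer writes in memory,
    flush the buffer all at once as an immutable sorted run, compact runs later."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold
        self.memstore = {}  # mutable in-memory buffer: key -> latest value
        self.runs = []      # flushed, immutable sorted lists of (key, value), oldest first

    def put(self, key, value) -> None:
        self.memstore[key] = value  # an update simply overwrites in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if self.memstore:
            # Write the whole buffer out in one go, sorted by key.
            self.runs.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for run in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        merged = {}
        for run in self.runs:  # oldest first, so newer values overwrite older ones
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


store = LogStructuredStore(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)   # buffer full: flushed as the first run
store.put("a", 10)  # newer value for "a" lands in memory
store.put("c", 3)   # flushed as the second run
print(len(store.runs), store.get("a"))                  # 2 10
store.compact()
print(len(store.runs), store.get("a"), store.get("b"))  # 1 10 2
```

Each flush in this sketch is a single sequential write of a sorted run, which is the access pattern HDFS is built for; the work of reconciling many small updates is deferred to compaction.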