Got it. I was thinking compaction happened locally on each node, but it makes sense from what you have explained.
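Andy explains below that compactions are triggered by an algorithm monitoring the number and size of flush files in a store. A minimal sketch of that check, with made-up names and thresholds (the real algorithm is more involved and configurable, e.g. via `hbase.hstore.compactionThreshold`; this only shows the shape of the idea):

```python
def needs_compaction(store_file_sizes, min_files=3, max_total_bytes=256 * 1024 * 1024):
    """Return True when a store has accumulated enough flush files to
    justify a compaction, or when their total size passes a bound.
    Thresholds here are illustrative, not HBase defaults verbatim."""
    if len(store_file_sizes) >= min_files:
        return True
    return sum(store_file_sizes) >= max_total_bytes

print(needs_compaction([64, 64]))       # False: only two small flush files
print(needs_compaction([64, 64, 64]))   # True: file-count threshold hit
```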
On Thu, Jul 7, 2011 at 3:12 PM, Andrew Purtell <[email protected]> wrote:
>> 1) When compactions occur on Node A, would they also include b2 and b3,
>> which are actually redundant copies? My guess is yes.
>
> I don't follow your question.
>
> HDFS files are read by opening an input stream. This stream is fed data from
> block replicas chosen at random. One block replica for each block. The reader
> doesn't see "redundant copies".
>
>> 2) Now compaction occurs and creates HFile3, which as you said is
>> replicated. But what happens to HFile1 and HFile2? I am assuming they
>> get deleted.
>
> They are deleted.
>
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via
> Tom White)
>
> ----- Original Message -----
>> From: Mohit Anchlia <[email protected]>
>> To: [email protected]
>> Cc:
>> Sent: Thursday, July 7, 2011 3:02 PM
>> Subject: Re: Hbase performance with HDFS
>>
>> Thanks! I understand what you mean, however I have a little confusion.
>> Does it mean there are unused blocks sitting around? For example:
>>
>> HFile1 with 3 blocks spread across 3 nodes: Node A: (b1), b2, b3; Node
>> B: b1, (b2), b3; and Node C: b1, b2, (b3).
>>
>> HFile2 with 3 blocks spread across 3 nodes: Node A: (b1), b2, b3; Node
>> B: b1, (b2), b3; and Node C: b1, b2, (b3).
>>
>> I have 2 questions:
>>
>> 1) When compactions occur on Node A, would they also include b2 and b3,
>> which are actually redundant copies? My guess is yes.
>> 2) Now compaction occurs and creates HFile3, which as you said is
>> replicated. But what happens to HFile1 and HFile2? I am assuming they
>> get deleted.
>>
>> Thanks for everyone's patience!
>>
>> On Thu, Jul 7, 2011 at 2:43 PM, Buttler, David <[email protected]> wrote:
>>> The nice part of using HDFS as the file system is that the replication is
>>> taken care of by the file system. So, when the compaction finishes, that
>>> means the replication has already taken place.
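The exchange above (HFile1 + HFile2 merge into HFile3, then the old files are deleted) can be sketched roughly as follows. This is not HBase code: store files stand in as sorted lists of (row, value) pairs, and "newer file wins" stands in for HBase's newest-timestamp-wins rule.

```python
import heapq

def compact(older, newer):
    """Merge-sort two store files (sorted lists of (row, value) pairs)
    into one. For duplicate rows the newer file's entry wins."""
    merged = {}
    # heapq.merge streams both sorted inputs in order; with a key function,
    # equal rows are yielded from the first (older) iterable first, so
    # entries from the newer file overwrite older ones below.
    for row, value in heapq.merge(older, newer, key=lambda kv: kv[0]):
        merged[row] = value
    return sorted(merged.items())

hfile1 = [("row1", "a"), ("row3", "c")]    # older flush file
hfile2 = [("row1", "a2"), ("row2", "b")]   # newer flush file
hfile3 = compact(hfile1, hfile2)
print(hfile3)  # [('row1', 'a2'), ('row2', 'b'), ('row3', 'c')]
# Once HFile3 is written (and replicated by HDFS like any other file),
# HFile1 and HFile2 are deleted, as the answer above says.
```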
>>>
>>> -----Original Message-----
>>> From: Mohit Anchlia [mailto:[email protected]]
>>> Sent: Thursday, July 07, 2011 2:02 PM
>>> To: [email protected]; Andrew Purtell
>>> Subject: Re: Hbase performance with HDFS
>>>
>>> Thanks Andrew. Really helpful. I think I have one more question right
>>> now :) Underneath, HDFS replicates blocks, by default 3 times. Not sure how
>>> that relates to HFiles and compactions. When compaction occurs, is it also
>>> happening on the replica blocks on other nodes? If not, then how does
>>> it work when one node fails?
>>>
>>> On Thu, Jul 7, 2011 at 1:53 PM, Andrew Purtell <[email protected]> wrote:
>>>>> You mentioned compactions; when do those occur and what triggers
>>>>> them?
>>>>
>>>> Compactions are triggered by an algorithm that monitors the number of
>>>> flush files in a store and their size, and is configurable in several
>>>> dimensions.
>>>>
>>>>> Does it cause additional space usage when that happens
>>>>
>>>> Yes.
>>>>
>>>>> if it does it would mean you always need to have much more disk than you
>>>>> really need.
>>>>
>>>> Not all regions are compacted at once. Each region by default is
>>>> constrained to 256 MB. Not all regions will hold the full amount of data.
>>>> The result is not a perfect copy (doubling) if some data has been deleted
>>>> or is associated with TTLs that have expired. The merge-sorted result is
>>>> moved into place and the old files are deleted as soon as the compaction
>>>> completes. So how much more is "much more"? You can't write to any kind of
>>>> data store on a (nearly) full volume anyway, no matter HBase/HDFS, or
>>>> MySQL, or...
>>>>
>>>>> Since HDFS is mostly write once how are updates/deletes handled?
>>>>
>>>> Not mostly, only write once.
>>>>
>>>> From the BigTable paper, section 5.3: "A valid read operation is
>>>> executed on a merged view of the sequence of SSTables and the memtable.
>>>> Since the SSTables and the memtable are lexicographically sorted data
>>>> structures, the merged view can be formed efficiently." So what this means
>>>> is that all the store files and the memstore serve effectively as change
>>>> logs sorted in reverse chronological order.
>>>>
>>>> Deletes are just another write, but one that writes tombstones
>>>> "covering" data with older timestamps.
>>>>
>>>> When serving queries, HBase searches store files back in time until it
>>>> finds data at the coordinates requested, or a tombstone.
>>>>
>>>> The process of compaction not only merge sorts a bunch of accumulated
>>>> store files (from flushes) into fewer store files (or one) for read
>>>> efficiency, it also performs housekeeping, dropping data "covered" by the
>>>> delete tombstones. Incidentally this is also how TTLs are supported:
>>>> expired values are dropped as well.
>>>>
>>>> Best regards,
>>>>
>>>>    - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
>>>>
>>>>> ________________________________
>>>>> From: Mohit Anchlia <[email protected]>
>>>>> To: Andrew Purtell <[email protected]>
>>>>> Cc: "[email protected]" <[email protected]>
>>>>> Sent: Thursday, July 7, 2011 12:30 PM
>>>>> Subject: Re: Hbase performance with HDFS
>>>>>
>>>>> Thanks, that helps! Just a few more questions:
>>>>>
>>>>> You mentioned compactions; when do those occur and what triggers
>>>>> them? Does it cause additional space usage when that happens? If it
>>>>> does, it would mean you always need to have much more disk than you
>>>>> really need.
>>>>>
>>>>> Since HDFS is mostly write once, how are updates/deletes handled?
>>>>>
>>>>> Is HBase also suitable for blobs?
>>>>>
>>>>> On Thu, Jul 7, 2011 at 12:11 PM, Andrew Purtell <[email protected]> wrote:
>>>>>> Some thoughts off the top of my head. Lars' architecture material
>>>>>> might/should cover this too. Pretty sure his book will.
>>>>>> Regarding reads:
>>>>>>
>>>>>> One does not have to read a whole HDFS block. You can request arbitrary
>>>>>> byte ranges within the block, via positioned reads. (It is true also
>>>>>> that HDFS can be improved for better random reading performance in ways
>>>>>> not necessarily yet committed to trunk, or especially to a 0.20.x branch
>>>>>> with append support for HBase. See
>>>>>> https://issues.apache.org/jira/browse/HDFS-1323)
>>>>>>
>>>>>> HBase holds indexes to store files in HDFS in memory. We also open all
>>>>>> store files at the HDFS layer and stash those references. Additionally,
>>>>>> users can specify the use of bloom filters to improve query time
>>>>>> performance through wholesale skipping of HFile reads if they are known
>>>>>> not to contain data that satisfies the query. Bloom filters are held in
>>>>>> memory as well.
>>>>>>
>>>>>> So with indexes resident in memory, when handling Gets we know the byte
>>>>>> ranges within HDFS block(s) that contain the data of interest. With
>>>>>> positioned reads we retrieve only those bytes from a DataNode. With
>>>>>> optional bloom filters we avoid whole HFiles entirely.
>>>>>>
>>>>>> Regarding writes:
>>>>>>
>>>>>> I think you should consult the Bigtable paper again if you are still
>>>>>> asking about the write path. The database is log structured. Writes are
>>>>>> accumulated in memory, and flushed all at once. Later, flush files are
>>>>>> compacted as needed, because as you point out GFS and HDFS are optimized
>>>>>> for streaming sequential reads and writes.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>>    - Andy
>>>>>>
>>>>>> Problems worthy of attack prove their worth by hitting back.
>>>>>> - Piet Hein (via Tom White)
>>>>>>
>>>>>> ________________________________
>>>>>> From: Mohit Anchlia <[email protected]>
>>>>>> To: [email protected]; Andrew Purtell <[email protected]>
>>>>>> Sent: Thursday, July 7, 2011 11:53 AM
>>>>>> Subject: Re: Hbase performance with HDFS
>>>>>>
>>>>>> I have looked at Bigtable and its SSTables etc. But my question is
>>>>>> directly related to how it's used with HDFS. HDFS recommends large
>>>>>> files, bigger blocks, write once, and read-many sequential reads. But
>>>>>> accessing small rows and writing small rows is more random and
>>>>>> different from the inherent design of HDFS. How do these 2 go together,
>>>>>> and is it able to provide performance?
>>>>>>
>>>>>> On Thu, Jul 7, 2011 at 11:22 AM, Andrew Purtell <[email protected]> wrote:
>>>>>>> Hi Mohit,
>>>>>>>
>>>>>>> Start here: http://labs.google.com/papers/bigtable.html
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>>    - Andy
>>>>>>>
>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>>>> (via Tom White)
>>>>>>>
>>>>>>>> ________________________________
>>>>>>>> From: Mohit Anchlia <[email protected]>
>>>>>>>> To: [email protected]
>>>>>>>> Sent: Thursday, July 7, 2011 11:12 AM
>>>>>>>> Subject: Hbase performance with HDFS
>>>>>>>>
>>>>>>>> I've been trying to understand how HBase can provide good performance
>>>>>>>> using HDFS, when the purpose of HDFS is sequential reads of large
>>>>>>>> blocks, which is inherently different from HBase, where access is
>>>>>>>> more random and row sizes might be very small.
>>>>>>>>
>>>>>>>> I am reading this, but it doesn't answer my question. It does say that
>>>>>>>> the HFile block size is different, but how it really works with HDFS
>>>>>>>> is what I am trying to understand.
>>>>>>>>
>>>>>>>> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
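Earlier in the thread Andy describes reads as searching the memstore and then store files back in time until data or a tombstone is found. A toy sketch of that merged-view lookup (not HBase code; `TOMBSTONE` is a stand-in for a delete marker, and each layer is simplified to a dict):

```python
TOMBSTONE = object()  # stand-in for a delete marker ("tombstone")

def get(row, memstore, store_files):
    """Search the memstore, then store files from newest to oldest.
    Return the value, or None if the row is absent or covered by a
    tombstone -- mirroring the read path described in the thread."""
    for layer in [memstore] + store_files:  # newest first
        if row in layer:
            value = layer[row]
            return None if value is TOMBSTONE else value
    return None

store_files = [{"row1": "old-value"}]      # older flush file
memstore = {"row1": TOMBSTONE}             # delete written as a tombstone
print(get("row1", memstore, store_files))  # None: tombstone covers old value
print(get("row1", {}, store_files))        # old-value
```

Compaction, as Andy notes, later merge-sorts these layers and drops data covered by tombstones (and expired TTLs) for good.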

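Andy also mentions bloom filters letting HBase skip entire HFiles known not to contain matching data. A toy sketch of the idea (real HBase blooms hash row or row+column keys into an on-disk/in-memory bit array; the class and parameters here are made up for illustration):

```python
class TinyBloom:
    """Minimal bloom filter: k hash positions set in a bit array.
    A negative answer is definitive (the HFile read can be skipped);
    a positive answer may be a false positive (the HFile is checked)."""

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = 0  # bit array packed into one int

    def _positions(self, key):
        for i in range(self.hashes):
            yield hash((i, key)) % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.array |= 1 << p

    def might_contain(self, key):
        return all(self.array >> p & 1 for p in self._positions(key))

bloom = TinyBloom()
bloom.add("row42")
print(bloom.might_contain("row42"))  # True: this HFile must be read
```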