Elliott, is there a video or slides? I guess I have to register to view it?
On Sat, Jun 2, 2012 at 2:18 PM, Elliott Clark <[email protected]> wrote:

> If you want to get into the really nitty gritty I found Lars' presentation
> really insightful.
>
> http://www.hbasecon.com/sessions/learning-hbase-internals/
>
> On Sat, Jun 2, 2012 at 6:13 AM, Doug Meil <[email protected]> wrote:
>
> > Hi there, I think you probably want to look at this...
> >
> > HBase catalog metadata...
> >
> > http://hbase.apache.org/book.html#arch.catalog
> >
> > How data is stored internally...
> >
> > http://hbase.apache.org/book.html#regions.arch
> >
> > Lots of versioning description here...
> >
> > http://hbase.apache.org/book.html#datamodel
> >
> > Long story short, client talks directly to RegionServers, HBase looks at
> > multiple StoreFiles.
> >
> > On 6/1/12 4:27 PM, "S Ahmed" <[email protected]> wrote:
> >
> > >(reference:
> > >http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html)
> > >
> > >A row consists of a key, and column families, along with a timestamp.
> > >
> > >So for example:
> > >
> > >key = com.example.com/some/path
> > >
> > >cf: outboundlinks {
> > >  com.example.com/link1,
> > >  com.example.com/link2,
> > >  ..
> > >}
> > >
> > >Data is stored like this:
> > >
> > >Region Server -> Store -> StoreFile -> HFile
> > >
> > >Now when a client requests a particular key, the HMaster figures out which
> > >region server holds the data, this information is returned to the client
> > >(which saves it locally), and then it makes a request to the region
> > >server.
> > >
> > >Now since the actual data files are immutable, if you modify a particular
> > >value in a CF, it is tombstoned (not sure how that works but understand
> > >it at a high level).
> > >
> > >So if I make a request for a given key, going with the example above, a
> > >particular url on the website example.com, and I want all the
> > >outboundlinks, I reference the column family "outboundlinks" which can
> > >store millions of urls.
> > >
> > >What process/service/class is in charge of assembling the various files to
> > >get all the correct data?
> > >
> > >Summary of my question:
> > >What I am trying to understand is, if a particular CF has millions of
> > >values, and if a single value is mutated, a new file has to be created.
> > >So this means, if I query for that value, i.e. it is included in my result
> > >set, how does HBase know where to look for the latest data?
> > >
> > >So basically from what I understand, making a get request for a particular
> > >key, cf will have to potentially look at more than one StoreFile (or
> > >HFile?) correct?
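
For what it's worth, here is how I picture the "HBase looks at multiple StoreFiles" part. This is only a toy model in Python, not HBase code: `Cell`, `sort_key` and `get_latest` are names I made up, and the delete handling is simplified (the newest marker for a column hides everything older). The assumption is that each source (the in-memory store plus every immutable file) is already sorted by row, column, and newest timestamp first, so a read can merge them and take the first cell it finds for the requested column.

```python
import heapq
from typing import Iterable, List, NamedTuple, Optional


class Cell(NamedTuple):
    """Hypothetical stand-in for one stored cell (not an HBase class)."""
    row: str
    column: str
    timestamp: int
    kind: str                      # "put" or "delete"
    value: Optional[str] = None


def sort_key(cell: Cell):
    # Order by row, then column, then newest timestamp first.
    # At equal timestamps a delete marker sorts ahead of a put.
    return (cell.row, cell.column, -cell.timestamp, 0 if cell.kind == "delete" else 1)


def get_latest(row: str, column: str, sources: List[Iterable[Cell]]) -> Optional[str]:
    """Return the newest visible value for (row, column), or None.

    Every source must already be sorted by sort_key. No source is
    rewritten; the newest cell wins purely by merge order.
    """
    for cell in heapq.merge(*sources, key=sort_key):
        if cell.row == row and cell.column == column:
            # First match is the newest cell across all sources.
            return cell.value if cell.kind == "put" else None
    return None


row = "com.example.com/some/path"
memstore = [Cell(row, "outboundlinks:link1", 3, "put", "v3")]
store_file_a = [
    Cell(row, "outboundlinks:link1", 1, "put", "v1"),
    Cell(row, "outboundlinks:link2", 1, "put", "old"),
]
store_file_b = [Cell(row, "outboundlinks:link2", 2, "delete")]
sources = [memstore, store_file_a, store_file_b]

print(get_latest(row, "outboundlinks:link1", sources))
print(get_latest(row, "outboundlinks:link2", sources))
```

In this sketch the update to link1 lives only in the newest source and the old file is never touched, while the delete of link2 is just another cell that happens to sort ahead of the older put. A real read would seek within each file rather than scan everything; the linear scan here is only to keep the example short.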
