There are slides. I think you have to register with an email and first/last name to download the ppt.
On Sun, Jun 3, 2012 at 12:21 PM, S Ahmed <[email protected]> wrote:

> Elliot,
>
> Is there a video or slides? I guess I have to register to view it?
>
> On Sat, Jun 2, 2012 at 2:18 PM, Elliott Clark <[email protected]> wrote:
>
> > If you want to get into the really nitty gritty I found Lars' presentation
> > really insightful.
> >
> > http://www.hbasecon.com/sessions/learning-hbase-internals/
> >
> > On Sat, Jun 2, 2012 at 6:13 AM, Doug Meil <[email protected]> wrote:
> >
> > > Hi there, I think you probably want to look at this...
> > >
> > > HBase catalog metadata...
> > >
> > > http://hbase.apache.org/book.html#arch.catalog
> > >
> > > How data is stored internally...
> > >
> > > http://hbase.apache.org/book.html#regions.arch
> > >
> > > Lots of versioning description here...
> > >
> > > http://hbase.apache.org/book.html#datamodel
> > >
> > > Long story short, client talks directly to RegionServers, HBase looks at
> > > multiple StoreFiles.
> > >
> > > On 6/1/12 4:27 PM, "S Ahmed" <[email protected]> wrote:
> > >
> > > >(reference:
> > > >http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html)
> > > >
> > > >A row consists of a key, and column families, along with a timestamp.
> > > >
> > > >So for example:
> > > >
> > > >key = com.example.com/some/path
> > > >
> > > >cf: outboundlinks {
> > > >    com.example.com/link1,
> > > >    com.example.com/link2,
> > > >    ..
> > > >}
> > > >
> > > >Data is stored like this:
> > > >
> > > >Region Server -> Store -> StoreFile -> HFile
> > > >
> > > >Now when a client requests a particular key, the hmaster figures out which
> > > >region server holds the data, this information is returned to the client
> > > >(which saves it locally), and then it makes a request to the region
> > > >server.
> > > >
> > > >Now since the actual data files are immutable, if you modify a particular
> > > >value in a CF, it is tombstoned (not sure how that works but understand
> > > >it at a high level).
> > > >
> > > >So if I make a request for a given key, going with the example above, a
> > > >particular url on the website example.com, and I want all the
> > > >outboundlinks, I reference the column family "outboundlinks" which can
> > > >store millions of urls.
> > > >
> > > >What process/service/class is in charge of assembling the various files to
> > > >get all the correct data?
> > > >
> > > >Summary of my question:
> > > >What I am trying to understand is, if a particular CF has millions of
> > > >values, and if a single value is mutated, a new file has to be created.
> > > >So this means, if I query for that value, i.e. it is included in my result
> > > >set, how does hbase know where to look for the latest data?
> > > >
> > > >So basically from what I understand, making a get request for a particular
> > > >key, cf will have to potentially look at more than one StoreFile (or
> > > >HFile?) correct?
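
On the last question in the quoted thread (does a get have to look at more than one StoreFile?), here is a rough Python sketch of the idea, not HBase's actual code. It assumes a toy model where the in-memory store and each immutable store file are plain dicts keyed by (row, family, qualifier), every cell carries a timestamp, and a value of None stands in for a delete marker. All names (get_latest, get_row_family, memstore, older_file, newer_file) are made up for illustration, and letting the delete win on a timestamp tie is a choice of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (an assumption, not HBase's on-disk format):
# (row, family, qualifier) -> list of (timestamp, value).
# A value of None is a tombstone, i.e. a delete marker.
Key = Tuple[str, str, str]
Cell = Tuple[int, Optional[str]]
Store = Dict[Key, List[Cell]]


def get_latest(row: str, family: str, qualifier: str,
               memstore: Store, store_files: List[Store]) -> Optional[str]:
    """Return the newest visible value for one column, or None."""
    candidates: List[Cell] = []
    # Every source may hold a version of the cell, so all are consulted.
    for source in [memstore, *store_files]:
        candidates.extend(source.get((row, family, qualifier), []))
    if not candidates:
        return None
    # Newest timestamp wins; on a tie the tombstone wins (sketch's choice).
    _, value = max(candidates, key=lambda c: (c[0], c[1] is None))
    return value  # None when the newest cell is a tombstone


def get_row_family(row: str, family: str,
                   memstore: Store, store_files: List[Store]) -> Dict[str, str]:
    """Assemble all live columns of one family for a row."""
    sources = [memstore, *store_files]
    qualifiers = {q for src in sources
                  for (r, f, q) in src if r == row and f == family}
    result: Dict[str, str] = {}
    for q in sorted(qualifiers):
        value = get_latest(row, family, q, memstore, store_files)
        if value is not None:
            result[q] = value
    return result


ROW = "com.example.com/some/path"
CF = "outboundlinks"

# Oldest file: both links written at timestamp 1.
older_file: Store = {
    (ROW, CF, "link1"): [(1, "com.example.com/link1")],
    (ROW, CF, "link2"): [(1, "com.example.com/link2")],
}
# A later flush: link1 was rewritten at timestamp 2.
newer_file: Store = {
    (ROW, CF, "link1"): [(2, "com.example.com/link1-v2")],
}
# Not yet flushed: link2 was deleted at timestamp 3.
memstore: Store = {
    (ROW, CF, "link2"): [(3, None)],
}

links = get_row_family(ROW, CF, memstore, [older_file, newer_file])
print(links)
```

In this model nothing is ever edited in place: the rewrite of link1 lives in a newer file, the delete of link2 is just another cell, and the read side sorts it out by timestamp. As far as I understand it, the real region server does the equivalent with a merged scan over the MemStore and the StoreFiles of the Store, and compactions later fold the files together so fewer of them have to be consulted.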
