Hi Jacques, I did consider that. That overhead increases the on-disk size of my data by 3-4x (i.e. 600-800MB). It still does not explain why reading one row (~4MB with overhead) takes 5 seconds. As for serialization/deserialization on the client side: it happens on a different thread, out of a buffer, and most of the time that thread is just idling.
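
For reference, a rough back-of-the-envelope sketch of the per-cell overhead arithmetic. It counts only the three components mentioned in this thread (row key, timestamp, column qualifier); the real on-disk cell encoding has additional fields that are ignored here. The function name `row_size_bytes` is made up for illustration, and the 4-byte qualifier and 8-byte value sizes are assumptions, not measurements from my table.

```python
# Back-of-the-envelope estimate of key overhead for one wide row.
# Only the components named in this thread are counted; HBase's actual
# cell encoding carries extra fields that this sketch leaves out.

ROW_KEY_BYTES = 8        # from the thread
TIMESTAMP_BYTES = 8      # from the thread
CELLS_PER_ROW = 200_000  # from the thread


def row_size_bytes(qualifier_bytes, value_bytes, cells=CELLS_PER_ROW):
    """Return (payload_bytes, total_bytes) for a single row."""
    per_cell_overhead = ROW_KEY_BYTES + TIMESTAMP_BYTES + qualifier_bytes
    payload = cells * value_bytes
    total = cells * (per_cell_overhead + value_bytes)
    return payload, total


if __name__ == "__main__":
    # Assumed sizes (hypothetical): 4-byte qualifiers, 8-byte values.
    payload, total = row_size_bytes(qualifier_bytes=4, value_bytes=8)
    print(f"payload: {payload / 1e6:.1f} MB")
    print(f"with key overhead: {total / 1e6:.1f} MB")
    print(f"blow-up factor: {total / payload:.1f}x")
```

Changing the two assumed sizes shows how quickly the blow-up factor moves when the values themselves are small relative to the keys.
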
Gurjeet

On Sun, Aug 12, 2012 at 2:05 PM, Jacques <[email protected]> wrote:
> Something to consider is that HBase stores and retrieves the row key (8
> bytes in your case) + timestamp (8 bytes) + column qualifier (?) for every
> single value. The schemaless nature of HBase generally means that this
> data has to be stored for each row (certain kinds of newer block level
> compression can make this less). So depending on your column qualifiers,
> you're going to be looking at potentially a huge amount of overhead when
> you're dealing with 200,000 cells in a single row. I also wonder whether
> you're dealing with a large amount of overhead simply on the
> serialization/deserialization/instantiation side if you're pulling back
> that many values.
>
> I'm not sure how many people are using that many cells in a single row and
> trying to read or write them all at once.
>
> Others may have more thoughts.
>
> Jacques
>
> On Sun, Aug 12, 2012 at 7:23 AM, Gurjeet Singh <[email protected]> wrote:
>
>> Hi Ted,
>>
>> Yes, I am using the cloudera distribution 3.
>>
>> Gurjeet
>>
>> Sent from my iPad
>>
>> On Aug 12, 2012, at 7:11 AM, Ted Yu <[email protected]> wrote:
>>
>> > Gurjeet:
>> > Can you tell us which HBase version you are using ?
>> >
>> > Thanks
>> >
>> > On Sun, Aug 12, 2012 at 5:32 AM, Gurjeet Singh <[email protected]> wrote:
>> >
>> >> Thanks for the reply Stack. My comments are inline.
>> >>
>> >>> You've checked out the perf section of the refguide?
>> >>>
>> >>> http://hbase.apache.org/book.html#performance
>> >>
>> >> Yes. HBase has 8GB RAM both on my cluster as well as my dev machine.
>> >> Both configurations are backed by SSDs and Hbase options are set to
>> >>
>> >> HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>> >>
>> >> The data that I am dealing with is static. The table never changes
>> >> after the first load.
>> >>
>> >> Even some of my GET requests are taking up to a full 60 seconds when
>> >> the row sizes reach ~10MB. In general, taking 5 seconds to fetch a
>> >> single row (~1MB) seems extremely high to me.
>> >>
>> >> Thanks again for your help.
