Something to consider is that HBase stores and retrieves the row key (8 bytes in your case) + timestamp (8 bytes) + column qualifier (?) alongside every single value. Because HBase is schemaless, this key data has to be stored with each cell (certain newer kinds of block-level encoding can reduce this on disk). So depending on your column qualifiers, you could be looking at a huge amount of overhead when you're dealing with 200,000 cells in a single row. I also wonder whether you're paying a large cost simply on the serialization/deserialization/instantiation side if you're pulling back that many values.
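For a rough sense of scale, here's a back-of-envelope sketch of that per-cell overhead, based on the classic HBase KeyValue on-wire layout (key length, value length, row length, row key, family length, family, qualifier, timestamp, key type). The 1-byte family name and 4-byte qualifier below are illustrative assumptions, not figures from this thread:

```python
# Per-cell key overhead in HBase's KeyValue format:
# key length (4) + value length (4) + row length (2) + row key
# + family length (1) + family + qualifier + timestamp (8) + key type (1).
# Family and qualifier sizes here are assumed for illustration.

def keyvalue_overhead(row_key_len, family_len, qualifier_len):
    """Bytes stored per cell in addition to the value itself."""
    return 4 + 4 + 2 + row_key_len + 1 + family_len + qualifier_len + 8 + 1

# Assumed: 8-byte row key, 1-byte family name, 4-byte qualifiers.
per_cell = keyvalue_overhead(8, 1, 4)
total = per_cell * 200_000          # 200,000 cells in one row
print(per_cell, total)              # 33 bytes/cell -> 6,600,000 bytes
```

Even with these small assumed sizes, the key overhead alone is in the megabytes per row before any values are counted, which lines up with the multi-MB row sizes reported below.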
I'm not sure how many people are using that many cells in a single row and trying to read or write them all at once. Others may have more thoughts.

Jacques

On Sun, Aug 12, 2012 at 7:23 AM, Gurjeet Singh <[email protected]> wrote:
> Hi Ted,
>
> Yes, I am using the Cloudera distribution 3.
>
> Gurjeet
>
> Sent from my iPad
>
> On Aug 12, 2012, at 7:11 AM, Ted Yu <[email protected]> wrote:
>
> > Gurjeet:
> > Can you tell us which HBase version you are using?
> >
> > Thanks
> >
> > On Sun, Aug 12, 2012 at 5:32 AM, Gurjeet Singh <[email protected]> wrote:
> >
> >> Thanks for the reply Stack. My comments are inline.
> >>
> >>> You've checked out the perf section of the refguide?
> >>>
> >>> http://hbase.apache.org/book.html#performance
> >>
> >> Yes. HBase has 8GB RAM both on my cluster and on my dev machine.
> >> Both configurations are backed by SSDs, and the HBase options are set to
> >>
> >> HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
> >>
> >> The data that I am dealing with is static. The table never changes
> >> after the first load.
> >>
> >> Even some of my GET requests are taking up to a full 60 seconds when
> >> the row sizes reach ~10MB. In general, taking 5 seconds to fetch a
> >> single row (~1MB) seems extremely high to me.
> >>
> >> Thanks again for your help.
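One way to avoid reading all 200,000 cells in one shot is to fetch the wide row in bounded chunks, so each round trip only deserializes a limited number of cells. The sketch below shows the chunking idea only; the chunk size and the fetch callback are assumptions, and in HBase you would back something like this with a batched Scan or with Gets restricted to slices of qualifiers:

```python
# Sketch: read one very wide row in bounded batches instead of all at once,
# so no single call has to materialize 200,000 cells.  `fetch` stands in for
# whatever issues the actual read (an assumed callback, not an HBase API).

def read_wide_row(qualifiers, fetch, batch_size=5000):
    """Fetch a wide row's cells in bounded batches and merge the results."""
    row = {}
    for i in range(0, len(qualifiers), batch_size):
        row.update(fetch(qualifiers[i:i + batch_size]))
    return row

# Toy fetch that pretends every qualifier maps to a value.
cells = read_wide_row([f"q{i}" for i in range(200_000)],
                      fetch=lambda qs: {q: b"v" for q in qs})
print(len(cells))  # 200000
```

This keeps per-request memory and deserialization cost flat regardless of how wide the row is, at the price of more round trips.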
