What advantage would you gain by compressing? Less space? It would also add compression/decompression overhead. That is a trade-off, but not an especially significant gain, as space is cheap and redundancy is OK with such data stores.
Having said that, more importantly, what are your read use cases or access patterns? Those should drive your decision about row key design.

Regards,
Shahab

On Thu, Aug 29, 2013 at 5:21 AM, Wasim Karani <[email protected]> wrote:

> I am using HBase to store webtable content, like how Google is using
> Bigtable.
> For reference of Google Bigtable
> My question is on the RowKey and how we should be forming it.
> What Google is doing is saving the URL in reverse order, as you can see in
> the PDF document ("com.cnn.www"), so that all the links associated with
> cnn.com will be managed in the same block of GFS, which will be a lot
> easier to scan.
> I can use the same thing Google is using, but wouldn't it be cool if I
> used some algorithm to compress the URL?
>
> For example:
>
> RowKey                               | Google Bigtable                      | Algorithm output
> www.cnn.com/index.php                | com.cnn.www/index.php                | 12as/435
> www.cnn.com/news/business/index.html | com.cnn.www/news/business/index.html | 12as/2as/dcx/asd
> www.cnn.com/news/sports/index.html   | com.cnn.www/news/sports/index.html   | 12as/2as/eds/scf
>
> The reason behind doing this is that the rowkey will be shorter, as per the
> HBase schema design guidance (mentioned in topic 6.3.2.3. Rowkey Length).
>
> So what I need from you guys is to know whether I am correct here.
> Also, if I am correct, which algorithm should I be using? I am using Python
> over Thrift as a programming language so code will be overwhelming for me...
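
For what it is worth, here is a minimal Python sketch of the reversed-domain row key described in the quoted message. The function name `reversed_domain_row_key` is made up for illustration, and it assumes plain URLs without a port or user info in the host part. It only builds the key string; it does not talk to HBase or Thrift.

```python
def reversed_domain_row_key(url: str) -> str:
    """Build a row key by reversing the host labels of a URL.

    Hypothetical helper for illustration only. Assumes the host part
    carries no port or user info (e.g. "www.cnn.com", not "www.cnn.com:80").
    """
    # Drop a scheme such as "http://" if one is present.
    if "://" in url:
        url = url.split("://", 1)[1]

    # Split the host from the path at the first "/".
    host, sep, path = url.partition("/")

    # "www.cnn.com" -> "com.cnn.www"
    reversed_host = ".".join(reversed(host.split(".")))

    return reversed_host + sep + path


urls = [
    "www.cnn.com/news/sports/index.html",
    "www.example.org/index.html",
    "money.cnn.com/index.html",
    "www.cnn.com/index.php",
]

# HBase orders rows lexicographically by key bytes; for ASCII keys,
# Python's sorted() gives the same order.
keys = sorted(reversed_domain_row_key(u) for u in urls)
for key in keys:
    print(key)
```

After sorting, all cnn.com pages sit next to each other, so a single prefix scan on "com.cnn." covers them. A hashed or compressed key would generally lose that ordering, which is why the access pattern should be settled before picking a shorter key.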
