The "HBase: The Definitive Guide" answer seems pretty, um, definitive to me. The only reason I would even consider going against that advice is if I knew for certain that no user could ever have more than 100,000 emails. Even then, it seems like a hard design decision to justify. What does putting every email in one row let you do that one row per email doesn't?
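To put numbers on this, here is a small Java sketch. It makes no HBase client calls, and every name in it is hypothetical. It redoes the 1 GB / 10 KB arithmetic from the question and shows one possible tall-narrow row key, where each email is its own row keyed by user ID, reversed timestamp and message ID. That key layout is an assumption for illustration, not the book's exact schema. Because each email is a separate row, HBase can place a region boundary between two emails of the same heavy user. It cannot do that inside a single wide row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical illustration only: no HBase API is used here.
 * (1) Cells-per-region arithmetic from the question.
 * (2) An assumed tall-narrow row key with one row per email.
 */
public class EmailTableSketch {

    /** How many cells of cellBytes fit before one row alone reaches regionBytes. */
    static long maxCellsPerRegion(long regionBytes, long cellBytes) {
        return regionBytes / cellBytes;
    }

    /**
     * Assumed key layout: userId, '|', the reversed timestamp zero-padded
     * to 19 digits (so a user's newest emails sort first), '|', messageId
     * (so two emails with the same timestamp still get distinct keys).
     */
    static String tallNarrowRowKey(String userId, long timestampMillis, String messageId) {
        long reversed = Long.MAX_VALUE - timestampMillis;
        return String.format("%s|%019d|%s", userId, reversed, messageId);
    }

    public static void main(String[] args) {
        // Decimal units, as in the question: 1 GB / 10 KB.
        System.out.println(maxCellsPerRegion(1_000_000_000L, 10_000L)); // 100000
        // Binary units: 1 GiB / 10 KiB.
        System.out.println(maxCellsPerRegion(1L << 30, 10L * 1024));    // 104857

        List<String> keys = new ArrayList<>();
        keys.add(tallNarrowRowKey("alice", 1_314_892_380_000L, "m1"));
        keys.add(tallNarrowRowKey("alice", 1_314_892_440_000L, "m2"));
        keys.add(tallNarrowRowKey("bob", 1_314_892_400_000L, "m3"));
        // HBase keeps rows in lexicographic key order. For these ASCII keys,
        // String order matches byte order: alice/m2 (newer), alice/m1, bob/m3.
        Collections.sort(keys);
        keys.forEach(System.out::println);
    }
}
```

Treat the 100,000 figure as a rough ceiling, not a precise limit. Each stored cell also carries its row key, column family, qualifier and timestamp, so its on-disk size exceeds the 10 KB value. Compression pushes in the other direction. With the tall-narrow key, a prefix scan on "alice|" returns one user's mail newest-first, and the heavy users spread across regions like everyone else.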
Dave

-----Original Message-----
From: Srikanth P. Shreenivas [mailto:[email protected]]
Sent: Thursday, September 01, 2011 11:53 AM
To: [email protected]
Subject: Tall-Narrow vs. Flat-Wide Tables

Hi,

Chapter 9 of the book "HBase: The Definitive Guide" discusses tall-narrow vs. flat-wide tables (http://ofps.oreilly.com/titles/9781449396107/advanced.html). It seems to propose that tall-narrow tables (more rows, fewer columns) are the better design. One of the issues it raises with flat-wide tables (fewer rows, more columns) is:

...
In addition, HBase can only split at row boundaries, which also enforces the recommendation to go with tall-narrow tables. Imagine you have all emails of a user in a single row. This will work for the majority of users, but there will be outliers that will have magnitudes of emails more in their inbox. So much so that a single row could outgrow the maximum file/region size and work against the region split facility.
...

So my question is: is it a bad idea to have a table like the one in the example above, where emails are stored by adding columns? I have a similar table in my application, with a region size of 1 GB and a cell value size of 10 KB. Will I run into the region-split issue mentioned above after 100,000 columns (1 GB / 10 KB = 100,000)?

Regards,
Srikanth
