Folks,

I’m new to HBase (but not new to these sorts of data stores). I think HBase 
would be a good fit for a project I’m working on, except for one thing: the 
amount of data we’re talking about here is far smaller than what’s usually 
recommended for HBase. As I read the docs, though, it seems the main argument 
against small datasets is replication: HDFS requires a number of nodes right 
from the start, and that’s overkill for my use case.

So, what’s the motivation behind labeling standalone HBase deployments “dev 
only”? If all I really need is a table of key/value data, all of which will fit 
comfortably on a single node, and if I have my own backup solution (literally, 
backing up the VM on which it’ll run), why bother with HDFS and distributed 
HBase?
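
For concreteness, here’s roughly what I picture for hbase-site.xml in 
standalone mode (the paths are just placeholders I made up):

    <configuration>
      <!-- Keep data on the local filesystem instead of HDFS. -->
      <property>
        <name>hbase.rootdir</name>
        <value>file:///data/hbase</value>
      </property>
      <!-- Standalone: master, region server, and ZooKeeper in one JVM. -->
      <property>
        <name>hbase.cluster.distributed</name>
        <value>false</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/data/zookeeper</value>
      </property>
    </configuration>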

(As an aside, I could go with something like Berkeley DB, but then I don’t get 
all the nice coprocessors, filters, and so on, not to mention cell-level 
security. Because I work with patient data, the latter is definitely a huge win.)
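
To make that aside concrete, here’s a rough sketch of the kind of thing I’d 
want to do with visibility labels (the table name, column names, row key, and 
labels are placeholders, and I realize the VisibilityController coprocessor 
has to be enabled for this to work):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.security.visibility.Authorizations;
    import org.apache.hadoop.hbase.security.visibility.CellVisibility;
    import org.apache.hadoop.hbase.util.Bytes;

    public class VisibilitySketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("patients"))) {

          // Tag a cell with a visibility expression at write time.
          Put put = new Put(Bytes.toBytes("patient-123"));
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("diagnosis"),
              Bytes.toBytes("example value"));
          put.setCellVisibility(new CellVisibility("PHI&CLINICIAN"));
          table.put(put);

          // On read, only cells whose expressions match the labels
          // granted to the calling user are returned.
          Get get = new Get(Bytes.toBytes("patient-123"));
          get.setAuthorizations(new Authorizations("PHI", "CLINICIAN"));
          Result result = table.get(get);
          System.out.println(result);
        }
      }
    }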

Thanks for your help.


Joseph Rose
Intelligent Health Laboratory
Boston Children’s Hospital
