Folks, I’m new to HBase (but not new to these sorts of data stores). I think HBase would be a good fit for a project I’m working on, except for one thing: the amount of data we’re talking about here is far smaller than what’s usually recommended for HBase. As I read the docs, though, the main argument against small datasets seems to be replication: HDFS requires a cluster of nodes right from the start, and that’s overkill for my use case.
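
For concreteness, this is roughly the kind of standalone setup I have in mind — just a sketch on my part, with placeholder paths — where hbase.rootdir points at the local filesystem instead of HDFS and the cluster stays non-distributed:

    <!-- hbase-site.xml (standalone sketch; paths are placeholders) -->
    <configuration>
      <!-- Store HBase data on the local filesystem, no HDFS -->
      <property>
        <name>hbase.rootdir</name>
        <value>file:///var/hbase/data</value>
      </property>
      <!-- Run everything (master, regionserver, ZK) in one JVM -->
      <property>
        <name>hbase.cluster.distributed</name>
        <value>false</value>
      </property>
      <!-- Local directory for the embedded ZooKeeper -->
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/var/hbase/zookeeper</value>
      </property>
    </configuration>
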
So, what’s the motivation behind labeling standalone HBase deployments “dev only”? If all I really need is a table full of keys, all of it fits comfortably on a single node, and I have my own backup solution (literally backing up the VM it will run on), why bother with HDFS and distributed HBase?

(As an aside, I could switch to something like Berkeley DB, but then I’d lose all the nice coprocessors and filters, not to mention cell-level security. Since I work with patient data, the latter is definitely a huge win.)

Thanks for your help.

Joseph Rose
Intelligent Health Laboratory
Boston Children’s Hospital
