My use of HBase is essentially what Stack describes: I serialize little log entry objects with (mostly) protobuf and store each one in a single HBase cell. I did this at first because it was easy, and made a note to go back and break the fields out into their own columns, and in some cases into multiple column families. When I did go back and try that, I found the 'exploded' schema was actually slower to scan than the 'blob' schema, and filters didn't seem to help all that much. This was in the 0.20 days, IIRC. All of which is to say: +1 on storing blobs in HBase.
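For anyone wanting to try it, the write path is roughly the sketch below. This is against the current client API (back in the 0.20 days it was HTable); the "logs" table, the "d"/"pb" family and qualifier, and LogEntry (a stand-in for your protobuf-generated class) are all made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BlobIngest {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("logs"))) {
          // LogEntry stands in for whatever protobuf-generated class you use.
          LogEntry entry = LogEntry.newBuilder()
              .setTs(System.currentTimeMillis())
              .setMsg("example")
              .build();
          // Row key is whatever you want to sort and scan on; here, the timestamp.
          Put put = new Put(Bytes.toBytes(entry.getTs()));
          // The whole serialized entry lands in one cell: family "d", qualifier "pb".
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("pb"), entry.toByteArray());
          table.put(put);
        }
      }
    }

I suspect the single cell is a big part of why the blob schema scanned faster for me: N exploded columns means N KeyValues per row, each carrying its own key overhead, versus one per row here.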
I don't know if this would work for you, but what's worked well for me is to write side files for Hive to read as I ingest entries into HBase. I like HBase for durability, random access, sorting, and scanning, and I'll continue to use it to store the golden copy for the foreseeable future, but I've found that Hive over plain text files is at least a couple of times faster than MapReduce against an HBase source for my batch jobs. If what you need from the Hive schema changes over time, you can simply nuke the files and recreate them with a MapReduce job against the golden copy in HBase (sketch at the end).
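Concretely, the side-file piece is nothing fancy: as each entry goes into HBase I also append a delimited line to a text file under a directory that an external Hive table points at. A rough sketch, again with the hypothetical LogEntry and made-up fields (ts, level, msg):

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Matching external table, declared once in Hive:
    //   CREATE EXTERNAL TABLE logs_text (ts BIGINT, level STRING, msg STRING)
    //   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    //   LOCATION '/warehouse/logs_text';
    public class HiveSideFile implements AutoCloseable {
      private final FSDataOutputStream out;

      public HiveSideFile(Configuration conf, Path file) throws Exception {
        out = FileSystem.get(conf).create(file);
      }

      // Called right after the corresponding Put succeeds against HBase.
      public void append(LogEntry entry) throws Exception {
        String line = entry.getTs() + "\t" + entry.getLevel() + "\t"
            + entry.getMsg() + "\n";
        out.write(line.getBytes(StandardCharsets.UTF_8));
      }

      @Override
      public void close() throws Exception {
        out.close();
      }
    }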

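And for completeness, the nuke-and-rebuild job is just a map-only scan over the HBase table that re-emits the delimited lines. A sketch, assuming the blob lives in d:pb as above:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class RebuildHiveFiles {

      static class ExplodeMapper extends TableMapper<Text, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
            throws IOException, InterruptedException {
          // Deserialize the blob cell and emit one delimited line per row.
          byte[] blob = value.getValue(Bytes.toBytes("d"), Bytes.toBytes("pb"));
          LogEntry entry = LogEntry.parseFrom(blob); // hypothetical protobuf class
          ctx.write(new Text(entry.getTs() + "\t" + entry.getLevel() + "\t"
              + entry.getMsg()), NullWritable.get());
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "rebuild-hive-side-files");
        job.setJarByClass(RebuildHiveFiles.class);
        Scan scan = new Scan();
        scan.setCaching(500);        // bigger scanner batches for MR throughput
        scan.setCacheBlocks(false);  // don't pollute the region server block cache
        TableMapReduceUtil.initTableMapperJob(
            "logs", scan, ExplodeMapper.class, Text.class, NullWritable.class, job);
        job.setNumReduceTasks(0);    // map-only; text goes straight to HDFS
        FileOutputFormat.setOutputPath(job, new Path("/warehouse/logs_text"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Sandy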