Sorry for the late bump...
It is quite nice to store JSON as strings in HBase, i.e. use for
example JSONObject to convert to something like '{ "name" : "lars" }'
and then Bytes.toBytes(jsonString). Since Hive now has an HBase handler
you can use Hive and its built-in JSON support to query cells like so:

  select get_json_object(hbase_table.value, '$.name') from hbase_table
  where key = <some-key>;

and it returns "lars".
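
In case it is useful, the write path in Java looks roughly like the
sketch below (minimal and untested; the table name "hbase_table", the
column family "cf" and the qualifier "value" are just placeholders, and
it uses the old HTable client API):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.json.JSONObject;

  public class JsonPutExample {
    public static void main(String[] args) throws Exception {
      // Serialize a small object to a JSON string.
      JSONObject json = new JSONObject();
      json.put("name", "lars");
      String jsonString = json.toString();   // {"name":"lars"}

      // Store the JSON string as the value of a single cell.
      HTable table = new HTable(HBaseConfiguration.create(), "hbase_table");
      Put put = new Put(Bytes.toBytes("some-key"));          // row key
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"),   // family, qualifier
              Bytes.toBytes(jsonString));
      table.put(put);
      table.flushCommits();
      table.close();
    }
  }

Hive then reads that same cell back with get_json_object as shown above.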
Lars
On Mon, Jan 31, 2011 at 10:15 PM, Sandy Pratt <[email protected]> wrote:
> My use of HBase is essentially what Stack describes: I serialize little log
> entry objects with (mostly) protobuf and store them in a single cell in
> HBase. I did this at first because it was easy, and made a note to go back
> and break out the fields into their own columns, and in fact into multiple
> column families in some cases. When I went back and did this, I found that
> my 'exploded' schema was actually slower to scan than the 'blob' schema was,
> and filters didn't seem to help all that much. This was in the 0.20 days,
> IIRC. So this is to say, +1 on storing blobs in HBase.
>
> I don't know if this would work for you, but what's worked well for me is to
> write side files for Hive to read as I ingest entries into HBase. I like
> HBase for durability, random access, sorting, and scanning, and I'll continue
> to use it to store the golden copy for the foreseeable future, but I've found
> that Hive against text files is at least a couple of times faster than MR
> against an HBase source for my map reduce needs. If you find that what you
> need from the Hive schema changes over time, you can simply nuke the files
> and recreate them with a map reduce against the golden copy in HBase.
>
> Sandy
>