Sorry for the late bump...

It is quite nice to store JSON as strings in HBase, i.e. use for example
JSONObject to convert a record to something like { "name" : "lars" } and
then store Bytes.toBytes(jsonString). Since Hive now has an HBase handler,
you can use Hive and its built-in JSON support to query such cells like so:

select get_json_object(hbase_table.value, '$.name') from hbase_table
where key = <some-key>;

and it returns "lars".
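
For completeness, here is a rough sketch of the write side of that pattern
in Java. The table name "hbase_table", the column "cf:value", and the row
key are just placeholders I made up, and it assumes the 0.90-era HTable
client plus org.json's JSONObject; swap in whatever client and JSON library
you actually use. For the Hive query above to work, the HBase table also
has to be mapped into Hive via the HBase storage handler so that the JSON
cell shows up as the string column "value".

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.json.JSONObject;

public class JsonCellWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // "hbase_table" is a made-up table name for this example
    HTable table = new HTable(conf, "hbase_table");
    try {
      // Build the JSON blob, e.g. { "name" : "lars" }
      JSONObject json = new JSONObject();
      json.put("name", "lars");

      // Store the whole JSON string in a single cell (cf:value here)
      Put put = new Put(Bytes.toBytes("some-key"));  // placeholder row key
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"),
              Bytes.toBytes(json.toString()));
      table.put(put);
    } finally {
      table.close();
    }
  }
}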

Lars

On Mon, Jan 31, 2011 at 10:15 PM, Sandy Pratt <[email protected]> wrote:
> My use of HBase is essentially what Stack describes: I serialize little log 
> entry objects with (mostly) protobuf and store them in a single cell in 
> HBase.  I did this at first because it was easy, and made a note to go back 
> and break out the fields into their own columns, and in fact into multiple 
> column families in some cases.  When I went back and did this, I found that 
> my 'exploded' schema was actually slower to scan than the 'blob' schema was, 
> and filters didn't seem to help all that much.  This was in the 0.20 days, 
> IIRC.  So this is to say, +1 on storing blobs in HBase.
>
> I don't know if this would work for you, but what's worked well for me is to 
> write side files for Hive to read as I ingest entries into HBase.  I like 
> HBase for durability, random access, sorting, and scanning, and I'll continue 
> to use it to store the golden copy for the foreseeable future, but I've found 
> that Hive against text files is at least a couple of times faster than MR 
> against an HBase source for my MapReduce needs.  If you find that what you 
> need from the Hive schema changes over time, you can simply nuke the files 
> and recreate them with a MapReduce job against the golden copy in HBase.
>
> Sandy
>
