The requirement is to be able to search by a list of tags.  Each record can 
have a possibly large number of tags, and there would be more than one tag 
field.

An example might be 3 different hashtag fields.  They do have to be kept 
separate; we can't have just one tag cloud.

The data size is large, so we need to be able to search the tag clouds over 
large numbers of records: millions but not billions (for now).

e.g:

I was wondering what the best method would be:

1) A column per tag value, e.g.
ID, name, some_attributes..., type1_tag_1, type1_tag_2

While HBase is happy with many columns, I can't see how to index this.

2) A tag join table.  Maybe just a single row key of ID + single tag.  Then it 
becomes a straight join of ID + tag, and thus it would be indexed.

3) Is there a crafty way of using column families?  Could that be indexed 
efficiently?
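
To make option 2 concrete, here is a rough sketch of how I picture the lookup 
working.  It is not HBase client code: TagIndex is a hypothetical in-memory 
stand-in that only mimics a table whose row keys are kept in sorted order and 
scanned by prefix.  I have also assumed the key is built as tag field + tag + 
ID (tag first, rather than ID first as written above) so that all IDs for one 
tag sit next to each other, and that a NUL byte never appears in field names, 
tags or IDs.

```python
import bisect

SEP = "\x00"  # assumed never to occur in field names, tags, or IDs


def index_key(tag_field, tag, record_id):
    """Build an index row key: tag field, then tag, then record ID."""
    return SEP.join((tag_field, tag, record_id))


class TagIndex:
    """Hypothetical in-memory stand-in for a table with sorted row keys."""

    def __init__(self):
        self._keys = []  # row keys, kept in lexicographic order

    def put(self, tag_field, tag, record_id):
        """Insert one index row per (tag field, tag, record) triple."""
        key = index_key(tag_field, tag, record_id)
        pos = bisect.bisect_left(self._keys, key)
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)

    def scan(self, tag_field, tag):
        """Prefix scan: IDs of all records carrying `tag` in `tag_field`."""
        prefix = tag_field + SEP + tag + SEP
        pos = bisect.bisect_left(self._keys, prefix)
        ids = []
        while pos < len(self._keys) and self._keys[pos].startswith(prefix):
            ids.append(self._keys[pos][len(prefix):])
            pos += 1
        return ids

    def search(self, tag_field, tags):
        """IDs of records that carry every tag in `tags` (set intersection)."""
        id_sets = [set(self.scan(tag_field, t)) for t in tags]
        return sorted(set.intersection(*id_sets)) if id_sets else []


index = TagIndex()
index.put("type1", "hbase", "rec1")
index.put("type1", "hbase", "rec2")
index.put("type1", "index", "rec2")
print(index.search("type1", ["hbase", "index"]))
```

Putting the tag field at the front of the key keeps the separate tag clouds 
apart, and a search on several tags becomes one prefix scan per tag followed 
by an intersection of the ID sets.  Whether that intersection is cheap enough 
at millions of records is exactly the part I am unsure about.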

Any tips/tricks gratefully received.

Simon
