Hey Josh

Thanks much for the feedback.  Sounds good

S
________________________________
From: Josh Elser <els...@apache.org>
Sent: 23 February 2021 9:14 AM
To: user@phoenix.apache.org <user@phoenix.apache.org>
Subject: Re: Advice wanted on supporting a tag feature for searching an HBase Table via Phoenix

I had a similar sort of issue (granted, at a smaller data scale), and I went
with option 2.

If you put the rowkey of your "data" table plus the tag itself into the
rowkey for your other table/index, you should be able to grow without
running into HBase scalability limits (though pulling 10GB of tags for one
lookup would be crazy slow :P). It's a fast rowkey prefix scan to pull
all the tags for the "data record".

Just don't forget that HBase won't split a single row across multiple
Regions. That's the important part in designing this table.
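
A rough sketch of the rowkey layout described above, in plain Python rather
than HBase or Phoenix calls. A sorted list stands in for HBase's ordered
rowkeys. The separator byte, the helper names (tag_rowkey, prefix_scan,
tags_for) and the sample records are all assumptions made up for
illustration; none of them come from the thread or from any HBase API.

```python
import bisect

# Assumed separator byte; a real key encoding must guarantee that this byte
# cannot appear inside the data rowkey itself.
SEP = b"\x00"


def tag_rowkey(data_rowkey: bytes, tag: bytes) -> bytes:
    """Build one tag-table rowkey: data rowkey, separator, tag."""
    return data_rowkey + SEP + tag


def prefix_scan(sorted_keys, prefix):
    """Return every key starting with prefix, relying on sorted order."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing later can match
        matches.append(key)
    return matches


def tags_for(sorted_keys, data_rowkey):
    """Pull all tags for one data record with a single prefix scan."""
    prefix = data_rowkey + SEP
    return [key[len(prefix):] for key in prefix_scan(sorted_keys, prefix)]


# Hypothetical sample data: one small row per (record, tag) pair.
keys = sorted(
    tag_rowkey(record_id, tag)
    for record_id, tag in [
        (b"rec1", b"hbase"),
        (b"rec1", b"phoenix"),
        (b"rec10", b"sql"),
        (b"rec2", b"hbase"),
    ]
)

print(tags_for(keys, b"rec1"))
```

Because every (record, tag) pair is its own small row, no single row grows
with the number of tags, which is what matters given that a row is never
split across Regions.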

On 2/21/21 11:51 PM, Simon Mottram wrote:
> The requirement is to be able to search from a list of tags; each record
> can have a possibly large number of tags.  There would be more than one
> tag field.
>
> An example might be 3 different hashtag fields.  They do have to be
> different; we can't have just one tag cloud.
>
> The data size is large so we need to be able to search the tag clouds
> over large numbers.  Millions but not billions (for now)
>
> e.g:
>
> I was wondering what the best method would be
>
> 1) a column per tag value.
> ID, name, some_attributes..., type1_tag_1,  type1_tag_2
>
> While HBase is happy with many columns, I can't see how to index this
>
> 2) A tag join table.  Maybe just a single row key  ID + single tag.
> Then it becomes a straight join of ID + tag.   Thus it would be indexed.
>
> 3) Is there a crafty way of using column families?  Could that be
> indexed efficiently?
>
> Any tips/tricks gratefully received
>
> Simon
