Hey Josh,

Thanks very much for the feedback. Sounds good.
S
________________________________
From: Josh Elser <els...@apache.org>
Sent: 23 February 2021 9:14 AM
To: user@phoenix.apache.org <user@phoenix.apache.org>
Subject: Re: Advice wanted on supporting a tag feature for searching an HBase Table via Phoenix

I had a similar sort of issue (granted, at a smaller data scale), and I went with option 2.

If you put the rowkey of your "data" table plus the tag itself into the rowkey for your other table/index, you should be able to grow without running into HBase scalability limits (though pulling 10GB of tags for one lookup would be crazy slow :P). It's a fast rowkey prefix scan to pull all the tags for the "data record".

Just don't forget that HBase won't split a single row across multiple Regions. That's the important part in designing this table.

On 2/21/21 11:51 PM, Simon Mottram wrote:
> The requirement is to be able to search from a list of tags; each record
> can have a possibly large number of tags. There would be more than one
> tag field.
>
> An example might be 3 different hashtag fields. They do have to be
> different; we can't have just one tag cloud.
>
> The data size is large, so we need to be able to search the tag clouds
> over large numbers. Millions but not billions (for now).
>
> e.g:
>
> I was wondering what the best method would be:
>
> 1) A column per tag value.
> ID, name, some_attributes..., type1_tag_1, type1_tag_2
>
> While HBase is happy with many columns, I can't see how to index this.
>
> 2) A tag join table. Maybe just a single row key: ID + single tag.
> Then it becomes a straight join of ID + tag. Thus it would be indexed.
>
> 3) Is there a crafty way of using column families? Could that be
> indexed efficiently?
>
> Any tips/tricks gratefully received
>
> Simon
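
A note on option 2 in the thread above: the layout Josh describes (the "data" rowkey plus the tag packed into the rowkey of a second table) can be illustrated with a small sketch. This is plain Python, not Phoenix or HBase code. A sorted list stands in for HBase's sorted rowkeys, and the names make_key, prefix_scan, SEP and the sample records are all hypothetical. The second ordering (tag first, then record ID) is an assumption about what a search by tag would need; the thread itself only describes the record-first ordering.

```python
from bisect import bisect_left

# Hypothetical separator; assumes it never occurs inside an ID or a tag.
SEP = "\x00"

def make_key(*parts):
    """Join key components into one composite rowkey string."""
    return SEP.join(parts)

def prefix_scan(sorted_keys, *prefix_parts):
    """Return the components of every key that starts with prefix_parts.

    Mimics a rowkey prefix scan: seek to the first key >= prefix,
    then read forward until the prefix no longer matches.
    """
    prefix = make_key(*prefix_parts) + SEP
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key.split(SEP))
    return matches

# Made-up sample data: record ID -> tags.
records = {
    "rec1": ["hbase", "phoenix"],
    "rec2": ["phoenix"],
    "rec3": ["sql", "hbase"],
}

# Ordering from the thread: data rowkey first, then tag (one row per pair).
by_record = sorted(make_key(rid, tag)
                   for rid, tags in records.items() for tag in tags)

# Assumed reverse ordering for searching by tag: tag first, then data rowkey.
by_tag = sorted(make_key(tag, rid)
                for rid, tags in records.items() for tag in tags)

tags_of_rec1 = [parts[1] for parts in prefix_scan(by_record, "rec1")]
records_with_phoenix = [parts[1] for parts in prefix_scan(by_tag, "phoenix")]
```

Because each (record, tag) pair is its own row, no single row grows with the number of tags, which is the point of the reminder that HBase won't split one row across Regions.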