Hi, I have a general question about strategy and schema design.

I've got a lot of varied data in a relational database that I'm trying
to make searchable. One example is searching for people by email
address. There are 6 tables involved, possibly tens of millions of
records, and none of it is standardized: values are mixed case, and a
field may hold multiple emails or something that isn't an email
address at all.
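
To make the cleanup concrete, here is a minimal sketch of the kind of
normalization I mean, in Python. The function name extract_emails and
the regular expression are just illustrations, and the pattern is
deliberately loose rather than a full RFC 5322 validator.

```python
import re

# Loose "looks like an email" pattern (an assumption, not full RFC 5322
# validation): local part, "@", then at least two dot-separated labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(raw):
    """Return the distinct lowercase email-like tokens in a raw field value.

    Handles empty fields, fields holding several addresses, and fields
    that contain no address at all (which yield an empty list).
    """
    if not raw:
        return []
    found = []
    for match in EMAIL_RE.findall(raw):
        email = match.lower()
        if email not in found:  # keep first-seen order, drop duplicates
            found.append(email)
    return found
```

A field with several addresses yields several keys, and a field with
no address yields none, so junk values simply drop out of the index.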

Doing that as a one-off isn't too bad, but data will keep being added,
and PKs will get phased out, split into multiple PKs, etc. I also
want this for a number of other fields, each of which will need
different transformations applied and will come from its own set of
tables.
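
Roughly the shape I have in mind, as an in-memory Python sketch: one
normalizer per searchable field, and an index from (field, normalized
value) to the set of (table, PK) rows it came from. All names here
(NORMALIZERS, index_row, replace_pk, the "phone" field) are
hypothetical, and the dict only stands in for whatever store ends up
holding the index.

```python
from collections import defaultdict


def normalize_email(raw):
    # Simplified on purpose: split on "," or ";" and keep tokens with "@".
    parts = (raw or "").replace(";", ",").split(",")
    return [p.strip().lower() for p in parts if "@" in p]


def normalize_phone(raw):
    # Hypothetical second field: keep digits only.
    digits = "".join(ch for ch in (raw or "") if ch.isdigit())
    return [digits] if digits else []


# Each searchable field gets its own transformation.
NORMALIZERS = {"email": normalize_email, "phone": normalize_phone}


def index_row(index, field, table, pk, raw):
    """Register a source row under every key its raw value normalizes to."""
    for key in NORMALIZERS[field](raw):
        index[(field, key)].add((table, pk))


def replace_pk(index, table, old_pk, new_pks):
    """Repoint references when a PK is phased out or split into several.

    This scans every entry; a real store would want a reverse lookup.
    """
    for refs in index.values():
        if (table, old_pk) in refs:
            refs.discard((table, old_pk))
            refs.update((table, pk) for pk in new_pks)


index = defaultdict(set)
index_row(index, "email", "customers", 1, "Bob@Example.com; bob@work.example")
index_row(index, "email", "orders", 7, "BOB@example.com")
replace_pk(index, "customers", 1, [10, 11])
print(sorted(index[("email", "bob@example.com")]))
```

The point is only that the index stores references back to the source
rows, so a PK change becomes an update to the references rather than a
full rebuild.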

I could do this a number of ways, but I'm not satisfied with any of
them, and I'd expect a problem this generic to already have tools at
least somewhat suited to it.

HBase may not be the best tool for this, but I'd like to put my HBase
cluster to work on it and have the data available to MR jobs.

Best, James
