Hi, I have a general strategy and schema design question. I've got a lot of different data in a relational DB that I'm trying to make searchable. One example is searching for people by email address. I have 6 tables holding maybe tens of millions of records, and none of it is standardized: values are mixed case, and a single field may contain multiple emails or something that isn't an email address at all.
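To make the cleanup concrete, here is a minimal Python sketch of the kind of per-field normalization I mean. The function name `extract_emails` and the regex are assumptions made up for illustration; the pattern is deliberately loose and is not a full email validator.

```python
import re

# Loose, illustrative pattern (an assumption, not an RFC 5322 validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(raw):
    """Return the distinct, lowercased email-like tokens in one raw field value."""
    if not raw:
        return []
    seen = []
    for match in EMAIL_RE.findall(raw):
        email = match.lower()  # fold case so lookups are case-insensitive
        if email not in seen:  # keep first-seen order, drop duplicates
            seen.append(email)
    return seen
```

Each value returned could then serve as a lookup key pointing back at the source table and PK, so a field holding several addresses yields several keys and a field holding junk yields none.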
Doing that as a one-off isn't too bad, but the data will keep being added to, and PKs will get phased out, split into multiple PKs, etc. I also want this for a number of other fields, each of which needs different transformations applied and comes from its own set of tables. I could do this a number of ways, but I'm not satisfied with any of them, and I doubt that such a generic problem has no existing tools reasonably suited to it. HBase may not be the best tool for this, but I'd like to put my HBase cluster to work on it and have the result available to MR jobs.

Best,
James
