Hi James,

This looks like a search problem over non-standardized documents. I think Solr (or something similar) may meet your requirements. Good luck.
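Whichever search tool ends up holding the index, the email fields described in the quoted message (mixed case, several addresses in one field, or no address at all) would need a normalization pass first. A minimal sketch of that step follows. The function names, the deliberately loose regex, and the `(email, table, pk)` entry layout are all assumptions made for illustration; nothing here comes from Solr or HBase.

```python
import re

# Deliberately loose pattern (an assumption): it finds tokens that look like
# an email address and does not try to validate them against any RFC.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(raw):
    """Return the distinct lower-cased email-like tokens found in a raw field."""
    if not raw:
        return []
    seen = []
    for match in EMAIL_RE.findall(raw):
        email = match.lower()
        if email not in seen:  # keep first-seen order, drop duplicates
            seen.append(email)
    return seen


def index_entries(table, pk, raw):
    """Yield one (normalized_email, table, pk) tuple per address in a source row.

    Keeping the source table and primary key in each entry (hypothetical
    layout) lets a later job re-point or drop entries when PKs change.
    """
    for email in extract_emails(raw):
        yield (email, table, pk)
```

Each resulting tuple could then be fed to whatever index is chosen, for example as a Solr document or as an HBase row keyed by the normalized address. Other fields would need their own extraction function in place of `extract_emails`.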
2013/12/3 James Pettyjohn <[email protected]>

> Hi, general strategy and schemata approach question.
>
> I've got a lot of different data in a relational db I'm trying to make
> searchable. One thing for example is searching for people by email
> address. I have 6 tables that might be, 10s of millions of records
> and none of it standardized. So it's mixed case and may have multiple
> emails in one field or something which isn't an email address at all.
>
> To do that as a one off isn't too bad but the data will be added to,
> and PKs will get phased out and split into multiple PKs etc. Also I
> want this on a number of other fields too that will need different
> transformations applied to the data and come from their own set of
> tables.
>
> I could do this a number of ways but I'm not satisfied with any of them
> and I don't think that such a generic proposition has no tools already
> somewhat suited for this task.
>
> The best tools for this may not be HBase but I'd like to
> put my HBase cluster to work on this and have it available to
> MR jobs.
>
> Best, James
>
