Take a look at Solr and Lucene. You should be able to do a text search on the HBase data written via Phoenix. The indexing works via the HBase replication mechanism, so it should be near-real time. I think you would have to use the Solr API to do the initial search, which would get you the HBase rowkey, which you could parse and use in a follow-up Phoenix query for additional data. Note that I haven't done any of the above myself, so your mileage may vary.
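To make the two-step lookup a bit more concrete, here is a rough, untested-against-a-cluster sketch in Python. It only builds the Solr request URL, pulls the ids out of a Solr JSON response, and builds the follow-up Phoenix statement; it does not open any connections. Several things here are assumptions on my part and not from the setup described above: the Solr collection name `people`, the host and port, that the Solr document `id` holds the rowkey, and that the table has a single VARCHAR primary key column (called `ID` here) so the rowkey can be used directly. With a composite or non-string key you would have to decode the rowkey first.

```python
import json
from urllib.parse import urlencode


def build_solr_select_url(base_url, collection, field, text):
    """Build a URL for Solr's standard /select handler.

    Runs a phrase query on one field and asks only for the `id` field back.
    Assumes `text` contains no double quotes or backslashes (no escaping here).
    """
    params = {"q": f'{field}:"{text}"', "fl": "id", "wt": "json"}
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


def extract_rowkeys(solr_response_text):
    """Pull the `id` of every matching document out of a Solr JSON response.

    Assumption: the indexer stored the HBase rowkey as the document id.
    """
    body = json.loads(solr_response_text)
    return [doc["id"] for doc in body["response"]["docs"]]


def build_phoenix_lookup(table, pk_column, rowkeys):
    """Build a parameterized Phoenix query fetching the full rows by key.

    Returns (sql, params) so the values go in as bind parameters ('?').
    `table` and `pk_column` are trusted identifiers, not user input.
    """
    if not rowkeys:
        raise ValueError("no rowkeys to look up")
    placeholders = ", ".join("?" for _ in rowkeys)
    sql = f"SELECT * FROM {table} WHERE {pk_column} IN ({placeholders})"
    return sql, list(rowkeys)


if __name__ == "__main__":
    # Hypothetical host, collection, and column names, for illustration only.
    url = build_solr_select_url("http://localhost:8983", "people", "NAME", "Joe Smith")
    print(url)

    # A hand-written response in the shape Solr returns for wt=json.
    sample = '{"response": {"numFound": 1, "start": 0, "docs": [{"id": "row-001"}]}}'
    sql, params = build_phoenix_lookup("PEOPLE", "ID", extract_rowkeys(sample))
    print(sql, params)
```

You would fetch the URL with whatever HTTP client you like and run the returned statement and parameters through your Phoenix JDBC or Python driver. Matching variants such as 'Smith,Joe' or 'Joseph Smith' would then be a matter of how the Solr field is analyzed (tokenizers, synonyms), not of anything on the Phoenix side.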
> On Apr 3, 2017, at 6:27 PM, Randy <ruw...@gmail.com> wrote:
>
> Wondering if anyone knows whether there is an approach to swap in custom
> indexing implementation, while leveraging all other functionalities of
> Phoenix. The initial goal is just in SELECT query, but would be nice to make
> custom index maintenance integrated in record life cycle as well.
>
> Phoenix supports secondary index already, but need to be more flexible with
> real large data set when the format and quality varies.
>
> For example, assuming we have a table "PEOPLE" which has a column "NAME"
> stored person's name. If there is a record with "Joe Smith" as the value of
> "NAME" column, it would be really powerful if we can find it by variants or
> partial name as criteria. Ideally all the following query would find the same
> record if we can plug-in a custom indexing implementation in Phoenix:
>
> SELECT * FROM PEOPLE WHERE NAME='Joe Smith';
> SELECT * FROM PEOPLE WHERE NAME='Smith,Joe';
> SELECT * FROM PEOPLE WHERE NAME='Joseph Smith';
>
> Given the secondary index has global vs. local implementation. I would
> imagine there is some level of abstraction already on consuming the index.
> Not expecting it would be official supported API, just some guidance on where
> to start would be greatly appreciated.
>
> Thanks,
>
> Randy