Hi all,

I have around 20 million company names that I want to index. Currently I tokenize each name, apply Metaphone 3 to each token, and store each token in HBase. When I get a new query (a company to match), I tokenize it and apply Metaphone 3 the same way I did when storing the names. Then I query HBase for each token and collate the results.
This seems inefficient and still has some issues, even after I implemented the functionality of WordDelimiterFilterFactory<http://stackoverflow.com/questions/17707733/worddelimiterfilterfactory-not-including-all-permutations> and singleFilter factory. I am thinking of indexing these company names in Solr instead, since all of this functionality is already there. Do we have support for Spark?
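
For reference, the tokenize / encode / store / collate flow described above could be sketched roughly as follows. This is only an illustration under loud assumptions: an in-memory dict stands in for HBase, `phonetic_key` is a crude placeholder and not Metaphone 3, and the names `TokenIndex`, `tokenize` and `phonetic_key` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(name):
    """Lowercase a company name and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", name.lower()) if t]


def phonetic_key(token):
    """Crude placeholder for a phonetic encoder. This is NOT Metaphone 3.

    Keeps the first character, drops later vowels, then collapses
    adjacent duplicates in what remains.
    """
    out = token[0]
    for ch in token[1:]:
        if ch in "aeiou":
            continue
        if ch != out[-1]:
            out += ch
    return out


class TokenIndex:
    """In-memory stand-in for the token table (assumption: HBase in practice)."""

    def __init__(self):
        self.postings = defaultdict(set)  # phonetic key -> company ids
        self.names = {}                   # company id -> original name

    def add(self, company_id, name):
        self.names[company_id] = name
        for token in tokenize(name):
            self.postings[phonetic_key(token)].add(company_id)

    def search(self, query):
        """One lookup per distinct query key, collated by number of matching keys."""
        hits = Counter()
        for key in {phonetic_key(t) for t in tokenize(query)}:
            for company_id in self.postings.get(key, ()):
                hits[company_id] += 1
        return [(self.names[cid], n) for cid, n in hits.most_common()]


index = TokenIndex()
index.add(1, "Acme Holdings Ltd")
index.add(2, "Akme Logistics")
index.add(3, "Globex Corporation")

results = index.search("Acme Holdngs")
print(results)
```

Each query token costs one lookup, and the collation step has to merge the posting sets by hand, which is the part a search engine's inverted index and scoring would normally take care of.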