I'm asking this question because I think it might be something Lucene.Net 
can do, but I'm not sure.  I have a list of 8,000+ words that the NCI 
considers cancer terms:
https://www.cancer.gov/publications/dictionaries/cancer-terms?expand=A
I have the terms stored locally, but I need to index articles that I have 
downloaded and count the number of times each term appears in each article.  
The purpose is to determine whether the article is cancer related.  I work for 
an NCI-Designated Cancer Center, and I need a way to analyze the publications 
of our researchers who are members of the Cancer Center.  I know that a slow 
way to do this is to loop over every term and check whether IndexOf returns a 
non-negative result.  I have also found a suggestion to build a single Regex 
match pattern from all 8,000 terms.
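
For reference, here is roughly what I understand the Regex suggestion to mean. 
This is only a minimal sketch, written in Python for brevity rather than in C#. 
The function names, the four sample terms, and the sample sentence are made up 
for illustration and are not my real data.  It assumes the term list is not 
empty, that matching should be case-insensitive, and that a longer term such as 
"bone marrow transplant" should win over the shorter "bone marrow".

```python
import re
from collections import Counter


def build_term_pattern(terms):
    """Compile one case-insensitive pattern that matches any of the terms."""
    # Longest terms first, so multi-word terms win over their shorter prefixes.
    ordered = sorted({t.lower() for t in terms}, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    # \b keeps "tumor" from matching inside a longer word such as "tumorigenesis".
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def count_terms(pattern, article_text):
    """Count how often each term appears in one article (keys are lowercased)."""
    return Counter(m.group(0).lower() for m in pattern.finditer(article_text))


# Hypothetical sample data, standing in for the 8,000+ NCI terms.
terms = ["tumor", "chemotherapy", "bone marrow", "bone marrow transplant"]
pattern = build_term_pattern(terms)

article = (
    "Chemotherapy shrank the tumor. A bone marrow transplant followed; "
    "the tumor did not return."
)
counts = count_terms(pattern, article)
print(counts)
```

The pattern is compiled once and reused for every article, so each article is 
scanned in a single pass instead of once per term.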

But I feel that if I index the cancer terms using Lucene.Net, I should be able 
to do the same thing, only faster?

If I'm totally off the mark, just let me know.  I've been on the user group for 
over 15 years and love the potential.

Thanks,
Bill




