I am about to try using the Porter stemmer with the ConceptMapper and wonder if anyone has any experience with this. Any suggestions, caveats, etc. would be most welcome.

A couple of questions:

* I presume I will need to stem the lookup dictionary when I build it. Or can I do that at some other point in the pipeline?
* I plan to use the Lucene implementation of the Porter stemmer and wrap it with a class that implements the interface required by ConceptMapper. Unless someone knows of a version of the Porter stemmer that already implements that interface?
* Will I also need to stem the stop-words dictionary?
* I see the following comment preceding the stem() method in TokenNormalizer. I assume this is not really true, because the default stemmer does not appear to be a Porter stemmer implementation.

      * If the stemming flag is true, then return the stemmed form of the supplied word using the
      * Porter stemmer.

* Is there anything else I should be aware of, such as how this might affect the search strategy?
* Is it possible to get to the stemmed form of the word/phrase that matched? For instance, could it be copied to the token?
* Does anyone have experience with stemming medical terms? I would be running this against clinical notes typed by a physician about a patient. My dictionary was built from SNOMED concepts. Will stemming even help?

Thanks,
Larry Kline
