Re: Fwd: Re: Some questions about Dictionary and DictionaryNameFinder

Jim - FooBar(); Fri, 24 Feb 2012 03:10:46 -0800

On 24/02/12 05:09, James Kosin wrote:

Jim,


Maybe the problem is how you have created the dictionary.  The
DictionaryNameFinder's find() method is a greedy method that will match
as many tokens as possible.
If it isn't matching more than one token, then that is probably all the
dictionary contains per entry.
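
To make the greedy behaviour concrete, here is a minimal standalone sketch of longest-match dictionary lookup over a tokenized sentence. It is only an illustration of the idea, not OpenNLP's actual implementation: the class name `GreedyDictionaryMatch` and its methods are hypothetical, and it assumes each dictionary entry is stored as a list of single tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of greedy (longest-match) lookup; NOT OpenNLP code.
public class GreedyDictionaryMatch {

    /** Copies the tokens, lower-casing them when matching is case-insensitive. */
    static List<String> normalize(List<String> tokens, boolean caseSensitive) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(caseSensitive ? t : t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Returns [start, end) token spans, preferring the longest entry at each position. */
    static List<int[]> find(String[] sentence, List<List<String>> entries, boolean caseSensitive) {
        Set<List<String>> keys = new HashSet<>();
        int maxLen = 0;
        for (List<String> entry : entries) {
            keys.add(normalize(entry, caseSensitive));
            maxLen = Math.max(maxLen, entry.size());
        }
        List<String> tokens = normalize(Arrays.asList(sentence), caseSensitive);
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matchedEnd = -1;
            // Try the longest candidate first, then shrink one token at a time.
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 1; len--) {
                if (keys.contains(tokens.subList(i, i + len))) {
                    matchedEnd = i + len;
                    break;
                }
            }
            if (matchedEnd > 0) {
                spans.add(new int[] {i, matchedEnd});
                i = matchedEnd; // continue after the match, so spans never overlap
            } else {
                i++;
            }
        }
        return spans;
    }
}
```

In this sketch, an entry stored as the two tokens "Dornase" and "Alfa" matches the sentence tokens "dornase", "alfa" as one span, while an entry stored as the single token "Dornase Alfa" can never match, because no single sentence token contains a space.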

Look at DictionaryNameFinderTest.java, in the test packages for
opennlp.tools.namefind in the source packages.

It has a good, simple example.

James


Hi James,

Well, I created the dictionary manually... Basically, I extracted all the drug names from drugbank.xml and wrote them to a txt file (one entry per line). Then I processed that text file to produce the XML version of the proper dictionary. What I have after doing all that is a file with contents of this type:


<?xml version="1.0" encoding="UTF-8"?>
<dictionary case_sensitive="false">
<entry><token>Lepirudin</token></entry>
<entry><token>Cetuximab</token></entry>
<entry><token>Dornase Alfa</token></entry>
<entry><token>Denileukin diftitox</token></entry>
<entry><token>Etanercept</token></entry>
<entry><token>Bivalirudin</token></entry>
<entry><token>Leuprolide</token></entry>
<entry><token>Peginterferon alfa-2a</token></entry>
<entry><token>Alteplase</token></entry>
......
......
......etc etc

As you can see, some drugs are multi-word entities, and the first character of each word is capitalized. Whenever I call the find() method, all I'm getting are the exact matches, which means that the case-sensitivity setting does not work either! For example, I'm getting "Cetuximab" but not "cetuximab".

So the problem is twofold. Firstly, and more importantly, I cannot find multi-word entities even though they do exist in the dictionary and the test data. Secondly, even though I'm setting case_sensitive="false" in both the XML file and the constructor of the DictionaryNameFinder, the actual results that I'm getting are always case-sensitive!
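
One thing that may be worth checking, going by James's remark about what the dictionary "contains per entry", is whether each `<token>` element should hold exactly one token. The sketch below shows how the conversion step could emit one `<token>` per word instead of one per line. It rests on assumptions: the class `DictionaryXmlWriter` and its methods are hypothetical, and splitting on whitespace is assumed to agree with the tokenizer applied to the test data (an entry such as "Peginterferon alfa-2a" may be tokenized differently).

```java
import java.util.List;

// Hypothetical converter from "one entry per line" text to dictionary XML.
public class DictionaryXmlWriter {

    /** Escapes the characters that are special inside XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds the XML, writing every whitespace-separated word as its own <token>. */
    static String toXml(List<String> lines, boolean caseSensitive) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<dictionary case_sensitive=\"").append(caseSensitive).append("\">\n");
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines in the input file
            }
            xml.append("<entry>");
            // Assumption: whitespace splitting matches the tokenizer used on the text.
            for (String word : trimmed.split("\\s+")) {
                xml.append("<token>").append(escape(word)).append("</token>");
            }
            xml.append("</entry>\n");
        }
        xml.append("</dictionary>\n");
        return xml.toString();
    }
}
```

With this sketch, the line "Dornase Alfa" becomes `<entry><token>Dornase</token><token>Alfa</token></entry>` rather than a single `<token>` containing a space.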


Can you see any problems with the xml file?

Jim
