Re: Dictionary lookup novice question

James Kosin Sat, 09 Mar 2013 07:14:53 -0800

Hi Dan,

The dictionary element is there to add to the name recognizer, to help find names that don't match or to help enforce name recognition. I'm not exactly sure if this is quite what you want to do.

There is a lesser-used Dictionary name finder that may be more suited to what you are wanting to do... I think. But the current version in 1.5.2 has a few bugs. To help with the problems, you can get a pre-release of our next release here: http://people.apache.org/~colen/releases/opennlp-1.5.3/rc2/

The dictionary format is fairly straightforward... though not well documented. There are also several CLI tools to convert files to a dictionary format.


I guess I'll try to better the documentation here.... :-)

<?xml version="1.0" encoding="UTF-8"?>
<dictionary case_sensitive="true">
<entry>
<token>Patrick</token>
</entry>
</dictionary>
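
To make the layout concrete, here is a minimal JDK-only sketch that reads a file in the format above and lists the tokens of each entry. It is only an illustration of the format, not how OpenNLP loads it (OpenNLP's own Dictionary class is presumably the right tool for that). The class name DictionaryXmlSketch and the method readEntries are made up for this example, and the two-token "New York" entry in the demo is an assumption about how a multi-word entry would look.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of OpenNLP: reads the XML layout shown above.
final class DictionaryXmlSketch {

    // Returns one token list per <entry>, in document order.
    static List<List<String>> readEntries(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            List<List<String>> entries = new ArrayList<>();
            NodeList entryNodes = doc.getElementsByTagName("entry");
            for (int i = 0; i < entryNodes.getLength(); i++) {
                Element entry = (Element) entryNodes.item(i);
                NodeList tokenNodes = entry.getElementsByTagName("token");
                List<String> tokens = new ArrayList<>();
                for (int j = 0; j < tokenNodes.getLength(); j++) {
                    tokens.add(tokenNodes.item(j).getTextContent());
                }
                entries.add(tokens);
            }
            return entries;
        } catch (Exception e) {
            // Parser setup, I/O and malformed XML all end up here.
            throw new IllegalStateException("Could not read dictionary XML", e);
        }
    }

    public static void main(String[] args) {
        // The second entry is an assumed example of a multi-token entry.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<dictionary case_sensitive=\"true\">"
                + "<entry><token>Patrick</token></entry>"
                + "<entry><token>New</token><token>York</token></entry>"
                + "</dictionary>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (List<String> entry : readEntries(in)) {
            System.out.println(entry);
        }
    }
}
```

The sketch ignores the case_sensitive attribute; a real loader would need to honor it.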

The dictionary contains one entry per term, each listing that term's tokens. When the DictionaryNameFinder is called, it will attempt to find the longest matching series of tokens from the dictionary in the document.

This sort of dictionary is best for keywords, some names and special words. You could populate this type of dictionary with the keywords for C/C++ and it could parse and tag a program file with all the keywords.
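
The longest-match idea can be sketched in plain Java as follows. This is not the DictionaryNameFinder code, just a small stand-in under stated assumptions: matching is exact and case-sensitive, entries are lists of tokens, and the names LongestMatchSketch and find are invented for the example. At each position it tries the longest possible run of tokens first and shrinks until an entry matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a dictionary lookup; not OpenNLP code.
final class LongestMatchSketch {

    // Returns {start, end} token index pairs (end exclusive), left to right.
    static List<int[]> find(String[] tokens, Set<List<String>> entries) {
        // No entry is longer than this, so longer candidates need not be tried.
        int maxLen = 0;
        for (List<String> entry : entries) {
            maxLen = Math.max(maxLen, entry.size());
        }

        List<String> tokenList = Arrays.asList(tokens);
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchedEnd = -1;
            int limit = Math.min(tokens.length, start + maxLen);
            // Try the longest candidate first, then shrink one token at a time.
            for (int end = limit; end > start; end--) {
                if (entries.contains(tokenList.subList(start, end))) {
                    matchedEnd = end;
                    break;
                }
            }
            if (matchedEnd != -1) {
                spans.add(new int[] {start, matchedEnd});
                start = matchedEnd; // continue after the match, no overlaps
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Set<List<String>> entries = new HashSet<>();
        entries.add(Arrays.asList("Patrick"));
        entries.add(Arrays.asList("New", "York"));
        entries.add(Arrays.asList("New", "York", "City"));

        String[] tokens = "Patrick moved to New York City .".split(" ");
        for (int[] span : find(tokens, entries)) {
            System.out.println(span[0] + ".." + span[1] + " "
                    + String.join(" ", Arrays.copyOfRange(tokens, span[0], span[1])));
        }
    }
}
```

With both "New York" and "New York City" in the dictionary, the longer entry wins for the sample sentence, which is the behavior described above.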


Let me know if I'm headed down the wrong path here....

Thanks,
James

On 3/8/2013 11:56 PM, Daniel Franc wrote:

Hi James,

Thanks for your reply.  Maybe my questions are too elementary so sorry!
I was running through the OpenNLP manual and went through the "tokenizer" step (http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.tokenizer).
Then when running through the "name finder" step it alluded to an alternative separate dictionary lookup step (end of this section: http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind.recognition.api)
I was able to create a dictionary for lookup, but I can't figure out how to load it up or search with it.
My eventual goal is to have a method to look up a set of terms within a document as an alternative way to classify or tag the document and not necessarily use the statistical name finder. I'm not familiar with JWNL but I could give that a try. It seems that I could manually code a text search through a document, but I thought I'd try to use OpenNLP first.
Thanks again -- Dan
On Fri, Mar 8, 2013 at 4:22 PM, James Kosin <[email protected]> wrote:
    Dan,

    I'm guessing when you say tokenized you mean with POS values.  If
    so, a better approach would be to use the JWNL library to look up
    the dictionary terms.  We use this with our coref component and it
    isn't hard to get working.  The biggest thing with POS is
    selecting the right one.  It may be better to build a model for
    the POS tokenizer than to build a dictionary for this.  Unless you
    mean this for a different language.

    I guess I need more information from you on what you are trying to
    accomplish?

    James


    On 3/8/2013 6:05 PM, Daniel Franc wrote:

        Hello friends,

        I am at a novice level for both OpenNLP and Java and have been
        fumbling around to put together a working version of the
        software with some success thanks to the documentation
        provided!  My eventual goal is partially to look up terms
        within a pre-defined dictionary, and I've been able to use the
        dictionary creator to create a basic dictionary to look up
        from, as here:

            dictionary.serialize(new FileOutputStream(
                "/Applications/apache-opennlp-1.5.2-incubating/dictionarynames.txt"));

        My particular questions are:

        1. Can someone help me with loading this dictionary after it
        was previously created?

        2. Is there a straightforward way to implement a basic lookup
        mechanism for tokenized text?

        Thanks for your help!
        -Dan
