Jorn,

1) The last commit should have said non-static, not static.

2) The UIMA Annotators don't seem to use the dictionary's case sensitivity, so I've set the serializer to pass true for the case sensitivity flag and to ignore the returned result. Otherwise, we would really want the UIMA StringDictionary to be a Dictionary from OpenNLP tools instead.

3) The POSDictionary is an interesting situation. There are actually multiple issues:
  public String[] getTags(String word) {
    if (isCaseSensitive) {
      return dictionary.get(word);
    }
    else {
      return dictionary.get(word.toLowerCase());
    }
  }
This section of code is broken for the following reasons: (a) it depends entirely on how the dictionary was originally built; if it was built case sensitive, you can't use the lookup above to implement a case-insensitive one, or vice versa; (b) otherwise, it will never find the correct item. The original code was only set up for the caseSensitive flag to always be true. We need to look over the POSDictionary, determine how we want this to work, and outline a plan. Can I get a vote on any ideas?
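
To make the problem concrete, here is a minimal sketch of one possible direction: normalize the key the same way when an entry is added and when it is looked up, so the flag and the stored keys can never disagree. This is only an illustration, not the actual POSDictionary code; the class name CaseAwareTagDictionary and its put/normalize methods are hypothetical, and Locale.ROOT is my assumption for the lowercasing.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is NOT the OpenNLP POSDictionary class.
class CaseAwareTagDictionary {

  private final boolean caseSensitive;
  private final Map<String, String[]> entries = new HashMap<>();

  CaseAwareTagDictionary(boolean caseSensitive) {
    this.caseSensitive = caseSensitive;
  }

  // Single place where the key form is decided, shared by put and getTags.
  private String normalize(String word) {
    return caseSensitive ? word : word.toLowerCase(Locale.ROOT);
  }

  // Stores the tags under the normalized key, so a case-insensitive
  // dictionary never holds mixed-case keys that a lookup would miss.
  void put(String word, String... tags) {
    entries.put(normalize(word), tags.clone());
  }

  // Returns the tags for the word, or null if the word is unknown.
  String[] getTags(String word) {
    String[] tags = entries.get(normalize(word));
    return tags == null ? null : tags.clone();
  }
}
```

With this shape, a dictionary created with caseSensitive=false finds "The" when asked for "the" or "THE", and one created with caseSensitive=true only matches the exact form. One open question the sketch does not settle: in the case-insensitive mode, entries for "Apple" and "apple" collapse onto one key and the later put overwrites the earlier one; merging the tag sets might be preferable.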

Thanks,
James

On 8/8/2011 9:54 PM, James Kosin (JIRA) wrote:
     [ https://issues.apache.org/jira/browse/OPENNLP-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081362#comment-13081362 ]

James Kosin commented on OPENNLP-239:
-------------------------------------

Jorn,

Sorry, I've been busy...  The default was because I didn't know the
default state for many of the models.  Most of the time it is based on
how they are created.  I can fix that easily; so that it gets set to
true if not present.

The static was required because the function is static and doesn't have
access to the non-static members.  I agree it was a nasty compromise.
The other way to go would be to add the serializing to the Dictionary
object itself... but, I don't know the problems with growing the
Dictionary class too large... lastly, we could have the serializing as
non-static meaning we would need to create a DictionarySerializer to use.

James

I'm also not familiar with the



Case Sensitive Flag & Custom Tag Dictionary
--------------------------------------------

                 Key: OPENNLP-239
                 URL: https://issues.apache.org/jira/browse/OPENNLP-239
             Project: OpenNLP
          Issue Type: New Feature
          Components: Parser
    Affects Versions: tools-1.5.1-incubating
            Reporter: mark meiklejohn
            Assignee: James Kosin
             Fix For: tools-1.5.2-incubating


Unable to set case sensitive flag as per TreebankParser 1.3.1 or use a custom 
tag dictionary
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


