[ 
https://issues.apache.org/jira/browse/OPENNLP-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078718#comment-13078718
 ] 

James Kosin commented on OPENNLP-239:
-------------------------------------

Okay,

Here are some of the issues:
(a)  I didn't add a check for duplicates; it was only a statement of fact that 
there is only one.  It does matter when building the dictionary, though: a 
dictionary that is not case sensitive may want to keep duplicates out, 
especially if whoever builds it just adds entries as they originally appear, 
which may cause issues later.

(b)  I looked into removing the isCaseSensitive entry from the 
StringListWrapper.  Unfortunately, there are a few small problems: (1) the 
StringListWrapper is a static class (most likely to increase performance), 
which means it doesn't have access to the Dictionary caseSensitive flag that 
its entries would need to follow.  (2) Even if I change that, a user would be 
allowed to change a Dictionary that was created to be case insensitive into a 
case sensitive one, which will most likely produce undesirable results [which 
may be what is happening now].
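
To illustrate point (1), here is a minimal sketch of why a static nested class cannot see a flag on its enclosing object.  DictionarySketch, StaticWrapper and InnerWrapper are made-up names for illustration only; this is not the actual OpenNLP Dictionary or StringListWrapper code.

```java
// Hypothetical sketch; none of these names come from OpenNLP.
final class DictionarySketch {
    private final boolean caseSensitive;

    DictionarySketch(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // A static nested class has no implicit reference to a DictionarySketch
    // instance, so it cannot read the outer caseSensitive field.  It has to
    // carry its own copy of the flag.
    static final class StaticWrapper {
        private final String token;
        private final boolean caseSensitive;

        StaticWrapper(String token, boolean caseSensitive) {
            this.token = token;
            this.caseSensitive = caseSensitive;
        }

        boolean matches(String other) {
            return caseSensitive ? token.equals(other) : token.equalsIgnoreCase(other);
        }
    }

    // A non-static inner class keeps a hidden reference to its enclosing
    // instance, so it can follow the dictionary-level flag directly.
    final class InnerWrapper {
        private final String token;

        InnerWrapper(String token) {
            this.token = token;
        }

        boolean matches(String other) {
            return caseSensitive ? token.equals(other) : token.equalsIgnoreCase(other);
        }
    }
}
```

The inner-class variant costs one extra reference per entry, and every entry then follows whatever the dictionary flag says, which is where problem (2) comes from.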

The current implementation allows a bit more flexibility:
Dictionary   Entry   Other Entry   Result
----------   -----   -----------   ------
true         true    true          a case sensitive comparison is done with equals()
false        true    false         a case insensitive comparison is done with equals()
xxxx         false   xxxx          a case insensitive comparison is done with equals()

This can be changed later if needed.  In plain English: if either one is case 
insensitive when doing a compare, then the compare is done as a case 
insensitive comparison; i.e., 'Hello' and 'hello' are then always the same 
under equals().
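
The entry-to-entry rule can be sketched roughly as follows.  EntrySketch is a hypothetical stand-in, not the real StringListWrapper, and it models only the two entry flags (not the Dictionary flag): the compare is case sensitive only when both entries are case sensitive.

```java
// Hypothetical stand-in for a dictionary entry; not the OpenNLP implementation.
final class EntrySketch {
    private final String token;
    private final boolean caseSensitive;

    EntrySketch(String token, boolean caseSensitive) {
        this.token = token;
        this.caseSensitive = caseSensitive;
    }

    // Case-folded form, used for insensitive compares and for hashing.
    private String folded() {
        return token.toLowerCase(java.util.Locale.ROOT);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof EntrySketch)) return false;
        EntrySketch other = (EntrySketch) obj;
        // Case sensitive only when BOTH entries ask for it.
        if (caseSensitive && other.caseSensitive) {
            return token.equals(other.token);
        }
        // If either entry is case insensitive, ignore case.
        return folded().equals(other.folded());
    }

    @Override
    public int hashCode() {
        // Hash the folded form so entries that can compare equal share a hash.
        return folded().hashCode();
    }
}

class CaseRuleDemo {
    public static void main(String[] args) {
        EntrySketch sensitive = new EntrySketch("Hello", true);
        EntrySketch insensitive = new EntrySketch("hello", false);
        System.out.println(sensitive.equals(insensitive));
        System.out.println(sensitive.equals(new EntrySketch("hello", true)));
    }
}
```

One thing to keep in mind with a rule like this: the equality is not transitive.  A case sensitive 'Hello' and a case sensitive 'HELLO' both equal a case insensitive 'hello', but not each other, so hash-based collections may behave in surprising ways when the flags are mixed.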

I kept them separate for now so that we could later create a Dictionary class 
that allows mixed comparison with other strings.  For example, if we had a 
string we know to always be capitalized and case sensitive, like 'ASCII', we 
could put that entry in a Dictionary that also contains case insensitive 
entries, with the other words carrying false for the case attribute.

The bad part about putting the flag only in the Dictionary is that we then 
have only one flag and we lose the original Dictionary meaning; having the 
flag at all then also comes into question for building the Dictionary.

Sorry, I don't really mean to sound ugly... I'm just trying to get some 
discussion on this and what we could do about the problem.  I'm okay with 
moving the attribute to the dictionary attributes section instead of the entry 
attributes section.  I'm just trying to keep options open, since it doesn't 
look like moving the flag is easily done without causing more issues.


> Case Sensitivie Flag & Custom Tag Dictionary
> --------------------------------------------
>
>                 Key: OPENNLP-239
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-239
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Parser
>    Affects Versions: tools-1.5.1-incubating
>            Reporter: mark meiklejohn
>            Assignee: James Kosin
>             Fix For: tools-1.5.2-incubating
>
>
> Unable to set case sensitive flag as per TreebankParser 1.3.1 or use a custom 
> tag dictionary

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
