[
https://issues.apache.org/jira/browse/OPENNLP-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078718#comment-13078718
]
James Kosin commented on OPENNLP-239:
-------------------------------------
Okay,
Here are some of the issues:
(a) I didn't add a check for duplicates; it was only a statement of fact that
there is only one. But case sensitivity does matter when building the
dictionary: a dictionary that is not case sensitive may want to keep duplicates
out, especially if entries are just added as they originally appear, which may
cause issues later.
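A minimal Java sketch of the duplicate handling described in (a), assuming
entries are plain strings collected in the order they are read. DedupSketch
and buildEntries are hypothetical names, not OpenNLP API. For a case
insensitive dictionary it uses a TreeSet ordered by String.CASE_INSENSITIVE_ORDER,
so later spellings that differ only in case are dropped and the first spelling
seen is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DedupSketch {

    // Hypothetical helper: collects entries while a dictionary is being built.
    // Case-insensitive dictionaries drop entries that differ only in case;
    // case-sensitive dictionaries only drop exact repeats.
    static List<String> buildEntries(List<String> raw, boolean caseSensitive) {
        Set<String> seen;
        if (caseSensitive) {
            seen = new HashSet<>();
        } else {
            // A TreeSet decides equality with its comparator, so "Hello"
            // and "hello" count as the same element here.
            seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        }
        List<String> kept = new ArrayList<>();
        for (String entry : raw) {
            if (seen.add(entry)) {
                kept.add(entry); // the first spelling seen wins
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("Hello", "hello", "ASCII", "HELLO");
        System.out.println(buildEntries(raw, true));  // [Hello, hello, ASCII, HELLO]
        System.out.println(buildEntries(raw, false)); // [Hello, ASCII]
    }
}
```

Keeping the first spelling is an arbitrary choice for the sketch; a builder
could just as well report the conflict instead of silently dropping it.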
(b) I looked into a way to remove the isCaseSensitive entry from the
StringListWrapper... unfortunately, there are a few small problems: (1) the
StringListWrapper is a static class (most likely to increase performance), which
means the StringListWrapper doesn't have access to the Dictionary's
caseSensitive flag, so the entries can't follow it. (2) Even if I change that, a
user would then be allowed to change a Dictionary that was created as case
insensitive into a case sensitive one, which will most likely produce
undesirable results [this may be what is happening now].
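The Java rule behind point (1) can be shown with a hypothetical Dict class (not
the real Dictionary or StringListWrapper). A static nested class has no
reference to an enclosing instance, so the flag has to be copied into every
wrapper; an inner (non-static) class can read the enclosing flag directly, at
the cost of one hidden reference per wrapper.

```java
public class NestedClassSketch {

    // Hypothetical stand-in for a dictionary with a dictionary-wide flag.
    static class Dict {
        private final boolean caseSensitive;

        Dict(boolean caseSensitive) {
            this.caseSensitive = caseSensitive;
        }

        // Static nested class: there is no enclosing Dict instance, so
        // caseSensitive is not visible here and the flag must be copied
        // into each wrapper (the per-entry isCaseSensitive approach).
        static class StaticWrapper {
            final String token;
            final boolean isCaseSensitive;

            StaticWrapper(String token, boolean isCaseSensitive) {
                this.token = token;
                this.isCaseSensitive = isCaseSensitive;
            }

            boolean matches(String other) {
                return isCaseSensitive ? token.equals(other) : token.equalsIgnoreCase(other);
            }
        }

        // Inner class: holds a hidden reference to its Dict, so it reads the
        // dictionary's flag directly.
        class InnerWrapper {
            final String token;

            InnerWrapper(String token) {
                this.token = token;
            }

            boolean matches(String other) {
                return caseSensitive ? token.equals(other) : token.equalsIgnoreCase(other);
            }
        }
    }

    public static void main(String[] args) {
        Dict insensitive = new Dict(false);
        Dict.InnerWrapper inner = insensitive.new InnerWrapper("Hello");
        Dict.StaticWrapper copied = new Dict.StaticWrapper("Hello", false);
        System.out.println(inner.matches("hello"));  // true
        System.out.println(copied.matches("hello")); // true
    }
}
```

Because InnerWrapper reads the live field, making caseSensitive settable would
change how every existing wrapper compares, which is the risk raised in point
(2); declaring the field final rules that out.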
The current implementation allows a bit more flexibility (xxxx = either value):
  Dictionary   Entry   Other Entry   Result
  ----------   -----   -----------   -------------------------------------------
  true         true    true          a case sensitive comparison is done with equals()
  false        true    false         a case insensitive comparison is done with equals()
  xxxx         false   xxxx          a case insensitive comparison is done with equals()
This can be changed later if needed, but in plain English: if either one is
case insensitive when doing a compare, the comparison is done case
insensitively, i.e. 'Hello' and 'hello' are then always equal under equals().
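A sketch of the rule in the table, assuming a hypothetical Entry class that
pairs a token array with a per-entry flag (it is not the actual
StringListWrapper code). equals() compares exactly only when both sides are
case sensitive; hashCode() always hashes a case-folded form so that it stays
consistent with equals() in every mode.

```java
import java.util.Arrays;

public class EntryEqualitySketch {

    // Hypothetical entry modeled on the discussion: a token sequence plus a
    // per-entry case-sensitivity flag. Not the actual OpenNLP StringListWrapper.
    static final class Entry {
        private final String[] tokens;
        private final boolean caseSensitive;

        Entry(boolean caseSensitive, String... tokens) {
            this.caseSensitive = caseSensitive;
            this.tokens = tokens.clone();
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (!(obj instanceof Entry)) {
                return false;
            }
            Entry other = (Entry) obj;
            if (tokens.length != other.tokens.length) {
                return false;
            }
            // Exact comparison only if BOTH sides are case sensitive;
            // if either side is case insensitive, case is ignored.
            boolean exact = caseSensitive && other.caseSensitive;
            for (int i = 0; i < tokens.length; i++) {
                boolean same = exact
                        ? tokens[i].equals(other.tokens[i])
                        : tokens[i].equalsIgnoreCase(other.tokens[i]);
                if (!same) {
                    return false;
                }
            }
            return true;
        }

        // Always hashes a case-folded form, so entries that are equal either
        // exactly or ignoring case end up with the same hash code.
        @Override
        public int hashCode() {
            int h = 1;
            for (String token : tokens) {
                int[] folded = token.codePoints()
                        .map(cp -> Character.toLowerCase(Character.toUpperCase(cp)))
                        .toArray();
                h = 31 * h + Arrays.hashCode(folded);
            }
            return h;
        }
    }

    public static void main(String[] args) {
        Entry a = new Entry(true, "Hello");
        Entry b = new Entry(true, "hello");
        Entry c = new Entry(false, "hello");
        System.out.println(a.equals(b)); // false: both sides case sensitive
        System.out.println(a.equals(c)); // true: c is case insensitive
        System.out.println(c.equals(b)); // true: c is case insensitive
    }
}
```

One property worth weighing here: with mixed flags this equals() is not
transitive (a equals c and c equals b, yet a does not equal b), while the
java.lang.Object.equals contract requires transitivity. In a HashSet of mixed
entries, what ends up stored can therefore depend on insertion order.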
I kept them separate for now so that we could later create a Dictionary class
that allows mixed comparisons with other strings. For example, if we had a
string we know is always capitalized and case sensitive, like 'ASCII', we could
put that entry in a Dictionary whose other entries are case insensitive, i.e.
the other words would contain false for the case attribute.
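A minimal sketch of that mixed Dictionary, assuming single-token entries, a
linear scan instead of a real index, and the same either-side rule as the
table. MixedDictionarySketch and its methods are hypothetical. Queries are
passed with queryCaseSensitive = true (an assumption) so that each stored
entry's own flag decides the match: 'ASCII' only matches exactly, while 'hello'
matches any casing.

```java
import java.util.ArrayList;
import java.util.List;

public class MixedDictionarySketch {

    // Hypothetical single-token entry carrying its own case-sensitivity flag.
    static final class Entry {
        final String token;
        final boolean caseSensitive;

        Entry(String token, boolean caseSensitive) {
            this.token = token;
            this.caseSensitive = caseSensitive;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String token, boolean caseSensitive) {
        entries.add(new Entry(token, caseSensitive));
    }

    // Same rule as the table: exact match only when both the stored entry and
    // the query are case sensitive; otherwise case is ignored.
    // Linear scan for clarity; a real dictionary would index its entries.
    boolean contains(String query, boolean queryCaseSensitive) {
        for (Entry e : entries) {
            boolean exact = e.caseSensitive && queryCaseSensitive;
            boolean match = exact ? e.token.equals(query) : e.token.equalsIgnoreCase(query);
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        MixedDictionarySketch dict = new MixedDictionarySketch();
        dict.add("ASCII", true);  // always capitalized: must match exactly
        dict.add("hello", false); // ordinary word: any casing matches
        System.out.println(dict.contains("ASCII", true)); // true
        System.out.println(dict.contains("ascii", true)); // false
        System.out.println(dict.contains("HELLO", true)); // true
    }
}
```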
The bad part about putting the flag only on the Dictionary is that we then have
only one flag, we lose the original Dictionary meaning, and having the flag at
all also comes into question when building the Dictionary.
Sorry, I don't mean to sound harsh... I'm just trying to start some discussion
on this and what we could do about the problem. I'm okay with moving the
attribute to the dictionary attributes section instead of the entry attributes
section. I'm just trying to keep options open, since it doesn't look like the
flag can be moved easily without causing more issues.
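For comparison, a sketch of the alternative where case sensitivity is a
dictionary-wide attribute, assuming a hypothetical CaseAwareDictionary class.
The flag is final, so a dictionary built as case insensitive cannot later be
switched, and tokens are normalized with toLowerCase(Locale.ROOT) (one simple
choice of normalization) on insertion and lookup so plain String equality works
inside a HashSet. The trade-off is the one described above: there is no way to
mark a single entry such as 'ASCII' as case sensitive.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DictionaryLevelFlagSketch {

    // Hypothetical dictionary where case sensitivity is a single,
    // dictionary-wide attribute fixed at construction time.
    static final class CaseAwareDictionary {
        private final boolean caseSensitive; // final: cannot be flipped later
        private final Set<String> keys = new HashSet<>();

        CaseAwareDictionary(boolean caseSensitive) {
            this.caseSensitive = caseSensitive;
        }

        // Normalize on the way in and on lookup, so plain String
        // equals()/hashCode() behave correctly inside the HashSet.
        private String key(String token) {
            return caseSensitive ? token : token.toLowerCase(Locale.ROOT);
        }

        // Returns false when an equivalent entry is already present,
        // which also keeps case-variant duplicates out.
        boolean add(String token) {
            return keys.add(key(token));
        }

        boolean contains(String token) {
            return keys.contains(key(token));
        }

        boolean isCaseSensitive() {
            return caseSensitive;
        }
    }

    public static void main(String[] args) {
        CaseAwareDictionary dict = new CaseAwareDictionary(false);
        System.out.println(dict.add("Hello"));      // true
        System.out.println(dict.add("hello"));      // false: duplicate ignoring case
        System.out.println(dict.contains("HELLO")); // true
    }
}
```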
> Case Sensitivie Flag & Custom Tag Dictionary
> --------------------------------------------
>
> Key: OPENNLP-239
> URL: https://issues.apache.org/jira/browse/OPENNLP-239
> Project: OpenNLP
> Issue Type: New Feature
> Components: Parser
> Affects Versions: tools-1.5.1-incubating
> Reporter: mark meiklejohn
> Assignee: James Kosin
> Fix For: tools-1.5.2-incubating
>
>
> Unable to set case sensitive flag as per TreebankParser 1.3.1 or use a custom
> tag dictionary