[ https://issues.apache.org/jira/browse/OPENNLP-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Boris Galitsky updated OPENNLP-413:
-----------------------------------
Attachment: patch.OPENNLP-413.txt
Two boundary cases are demonstrated in
ParserChunker2MatcherProcessorTest:

"How to deduct rental expense from income"
vs. "How to deduct repair expense from rental income."
[[ [NN-expense IN-from NN-income ], [JJ-rental NN-* ], [NN-income ]], [ [TO-to VB-deduct JJ-rental NN-* ], [VB-deduct NN-expense IN-from NN-income ]]]
MatchScore is adequate (= 2.8), while bagOfWordsScore = 5.0 is too high.

"Way to minimize medical expense for my daughter"
vs. "Means to deduct educational expense for my son"
[[ [JJ-* NN-expense IN-for PRP$-my NN-* ], [PRP$-my NN-* ]], [ [TO-to VB-* JJ-* NN-expense IN-for PRP$-my NN-* ]]]
MatchScore is adequate (= 2.2), while bagOfWordsScore = 1.0 is too low.
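
For readers without the patch at hand, here is a minimal sketch of a bag-of-words overlap score of the kind compared above. It is not the code from patch.OPENNLP-413.txt: the names `STOPWORDS`, `content_words` and `bag_of_words_score` are hypothetical, the stop list is an assumption made for illustration, and the score is assumed to be a plain count of shared non-stopword tokens.

```python
import re

# Assumed stop list, for illustration only; the list used in the patch is not shown here.
STOPWORDS = {"to", "from", "for", "my"}


def content_words(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def bag_of_words_score(a, b):
    """Count the distinct non-stopword tokens that both sentences share."""
    return float(len(content_words(a) & content_words(b)))


if __name__ == "__main__":
    # Similar words, different meaning: overlap is high.
    print(bag_of_words_score(
        "How to deduct rental expense from income",
        "How to deduct repair expense from rental income."))
    # Different words, similar meaning: overlap is low.
    print(bag_of_words_score(
        "Way to minimize medical expense for my daughter",
        "Means to deduct educational expense for my son"))
```

Under these assumptions the two calls print 5.0 and 1.0. A pure word-overlap count ignores which word modifies which, so it cannot tell "rental expense" from "rental income", and it sees almost nothing in common between two sentences that share a structure but not a vocabulary.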
> demonstration how sensitive syntactic match is compared to bag-of-words
> approach
> --------------------------------------------------------------------------------
>
> Key: OPENNLP-413
> URL: https://issues.apache.org/jira/browse/OPENNLP-413
> Project: OpenNLP
> Issue Type: Improvement
> Components: Similarity
> Reporter: Boris Galitsky
> Assignee: Boris Galitsky
> Attachments: patch.OPENNLP-413.txt
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Per Jason's recommendation ("have you done standard similarity based on
> the standard bag-of-words model?"), I use a simple bag-of-words score
> with its own list of stopwords and compare the two approaches on a pair
> of cases:
> 1) similar words but different meaning
> 2) different words but similar meaning