[ https://issues.apache.org/jira/browse/STANBOL-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rupert Westenthaler resolved STANBOL-624.
-----------------------------------------
Resolution: Fixed
implemented with 1341438
> The NamedEntityTagging engine should use confidence values between [0..1]
> -------------------------------------------------------------------------
>
> Key: STANBOL-624
> URL: https://issues.apache.org/jira/browse/STANBOL-624
> Project: Stanbol
> Issue Type: Bug
> Components: Enhancer
> Affects Versions: 0.9.0-incubating
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
> Fix For: 0.10.0-incubating
>
>
> Currently the Solr result scores are used as confidence values. Only exact
> matches are sorted in front of partial matches. However, Solr result scores
> are not within the range [0..1], which makes it hard for clients to process
> confidence values.
> The suggestion is to use the following algorithm to "normalize" the
> confidence values of this engine:
> * score ... the Solr result score of the current entity
> * maxScore ... the highest Solr result score
> * maxExactScore ... the highest Solr result score of an Entity that exactly
> matches the fise:selected-text
> * levenshteinSimilarity ... 1 -
> LevenshteinDistance(selectedText,label)/Math.max(selectedText.length(),label.length()),
> so that an exact match has a similarity of 1
> The normalized score is calculated as follows:
>     if(levenshteinSimilarity == 1) //exact match
>         score = score/maxExactScore;
>     else //partial match
>         score = score*levenshteinSimilarity/maxScore;
> This ensures that:
> * If there is an exact match, the best one will have the confidence 1.0
> * If there are multiple exact matches they will be rated based on the Solr
> result scores (normalized to 1 using the result score of the best exact match
> as base)
> * All partial matches will have a score <= the levenshteinSimilarity
> * Partial matches are normalized by using the max result score (regardless of
> whether the result with the max Solr result score is an exact match or not).
> Note: This resembles a disambiguation based on the label of the Entity as
> well as possible Document Boosts in the Solr index. This is NOT intended to
> be a real Entity Disambiguation algorithm.
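
The normalization described in the issue could be sketched roughly as
follows. This is only an illustration, not the code committed to Stanbol: the
class and method names are hypothetical, the similarity is assumed to be
1 - distance/maxLength (so that an exact match yields 1), label comparison is
assumed to be case sensitive, and all Solr scores are assumed to be positive.

```java
final class ConfidenceNormalizer {

    /** Classic Levenshtein edit distance (insert, delete, substitute). */
    static int levenshteinDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed definition: 1 for identical strings, 0 for fully different ones. */
    static double levenshteinSimilarity(String selectedText, String label) {
        int maxLen = Math.max(selectedText.length(), label.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshteinDistance(selectedText, label) / maxLen;
    }

    /**
     * Normalizes Solr result scores to confidence values in [0..1].
     * labels[i] and scores[i] describe the i-th suggested entity;
     * scores are assumed to be greater than zero.
     */
    static double[] normalize(String selectedText, String[] labels, double[] scores) {
        double maxScore = 0.0;
        double maxExactScore = 0.0;
        double[] similarity = new double[labels.length];
        for (int i = 0; i < labels.length; i++) {
            similarity[i] = levenshteinSimilarity(selectedText, labels[i]);
            maxScore = Math.max(maxScore, scores[i]);
            if (similarity[i] == 1.0) { // exact match
                maxExactScore = Math.max(maxExactScore, scores[i]);
            }
        }
        double[] confidence = new double[labels.length];
        for (int i = 0; i < labels.length; i++) {
            if (similarity[i] == 1.0) {
                // exact matches are rated relative to the best exact match
                confidence[i] = scores[i] / maxExactScore;
            } else {
                // partial matches are scaled by similarity and the overall max score
                confidence[i] = scores[i] * similarity[i] / maxScore;
            }
        }
        return confidence;
    }
}
```

For the selected text "Paris" and the suggestions "Paris" (score 2.0),
"Paris" (score 1.0) and "Paris Hilton" (score 4.0), this sketch rates the two
exact matches relative to the best exact match and keeps the partial match at
or below its similarity, even though it has the highest raw Solr score.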
--
This message is automatically generated by JIRA.