Hi all,
I have implemented two improvements for the KeywordLinkingEngine:
1. Improved POS tag processing (STANBOL-685): The minimum POS tag
probability required to filter a token as a non-noun was cut in half,
so the new algorithm is more likely to filter non-nouns.
2. Added a configuration option for the minimum similarity required
for tokens of the text to be considered a match for tokens in the
label of an Entity (STANBOL-686).
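
To make the two thresholds more concrete, here is a rough sketch of the
idea. It is NOT the engine's actual code: the class name, the method
names, the prefix-based similarity measure and all numeric values are
assumptions chosen only for illustration.

```java
import java.util.Locale;

// Hypothetical illustration only; none of these names exist in Stanbol.
final class ThresholdSketch {

    // A token is filtered as a non-noun only if the tagger assigned a
    // non-noun tag with at least the required probability. Lowering the
    // required probability means more non-nouns get filtered.
    static boolean filterAsNonNoun(boolean nounTag, double tagProbability,
                                   double minProbability) {
        return !nounTag && tagProbability >= minProbability;
    }

    // Assumed similarity measure: length of the common prefix divided by
    // the length of the longer token (case-insensitive).
    static double similarity(String textToken, String labelToken) {
        String a = textToken.toLowerCase(Locale.ROOT);
        String b = labelToken.toLowerCase(Locale.ROOT);
        int max = Math.max(a.length(), b.length());
        if (max == 0) {
            return 1.0; // two empty tokens are treated as identical
        }
        int common = 0;
        int min = Math.min(a.length(), b.length());
        while (common < min && a.charAt(common) == b.charAt(common)) {
            common++;
        }
        return (double) common / max;
    }

    // A text token matches a label token if the similarity reaches the
    // configured minimum.
    static boolean tokensMatch(String textToken, String labelToken,
                               double minSimilarity) {
        return similarity(textToken, labelToken) >= minSimilarity;
    }

    public static void main(String[] args) {
        double oldMin = 0.8;         // assumed value, for illustration
        double newMin = oldMin / 2;  // the requirement cut in half
        System.out.println(filterAsNonNoun(false, 0.5, oldMin));
        System.out.println(filterAsNonNoun(false, 0.5, newMin));
        System.out.println(tokensMatch("universities", "university", 0.6));
    }
}
```

With these assumed numbers, a token tagged as a non-noun with
probability 0.5 is kept under the old requirement but filtered under
the halved one, and a lower minimum similarity lets inflected forms
match the tokens of an Entity label.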
The documentation of the KeywordLinkingEngine [1] was also updated to
reflect these changes.
In addition, I added support for the Spanish POS model (STANBOL-688)
available via
https://github.com/utcompling/OpenNLP-Models
As a result, the KeywordLinkingEngine can now use POS tags to improve
Entity extraction from Spanish texts.
Feedback on these changes is very welcome!
best
Rupert
[1]
http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html
Links to the Issues:
https://issues.apache.org/jira/browse/STANBOL-685
https://issues.apache.org/jira/browse/STANBOL-686
https://issues.apache.org/jira/browse/STANBOL-688
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen