Improvements for the KeywordLinkingEngine and POS tagging support for Spanish

Rupert Westenthaler Thu, 12 Jul 2012 01:38:09 -0700

Hi all

I have implemented two improvements for the KeywordLinkingEngine:


1. Improved POS tag processing (STANBOL-685): The new algorithm is now
more likely to filter non-nouns, because it halves the minimum POS tag
probability required to do so.
2. Added a configuration option for the minimum similarity required for
tokens in the text to be considered a match for tokens in the label of
an Entity (STANBOL-686).
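
To make the two points above a bit more concrete, here is a rough
Python sketch. It is not the engine's actual code: the function names,
the noun tag set, the threshold values and the use of difflib as the
similarity measure are all assumptions made for illustration. The only
things taken from the changes themselves are that non-nouns are
filtered at half the minimum POS tag probability, and that token
matching is controlled by a configurable minimum similarity.

```python
from difflib import SequenceMatcher

# Hypothetical values for illustration only; not the engine's defaults.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MIN_POS_PROB = 0.8
MIN_TOKEN_SIMILARITY = 0.6


def keep_token(pos_tag, prob, min_prob=MIN_POS_PROB):
    """Decide whether a token is kept as a candidate for linking.

    Assumption: tokens tagged as nouns are always kept. A token tagged
    as a non-noun is filtered as soon as the tagger's probability
    reaches half of min_prob; below that it is kept, because the tag
    is considered too uncertain to act on.
    """
    if pos_tag in NOUN_TAGS:
        return True
    return prob < min_prob / 2


def tokens_match(text_token, label_token, min_similarity=MIN_TOKEN_SIMILARITY):
    """Check whether a text token matches a token of an Entity label.

    Assumption: similarity is the case-insensitive difflib ratio. The
    real engine may use a different measure.
    """
    ratio = SequenceMatcher(None, text_token.lower(), label_token.lower()).ratio()
    return ratio >= min_similarity
```

With these assumed values, a verb tagged with probability 0.5 is
filtered (0.5 >= 0.4), while one tagged with probability 0.3 is kept.
Raising min_similarity makes the label matching stricter, lowering it
tolerates more variation such as inflected forms.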

The documentation of the KeywordLinkingEngine [1] was also updated to
reflect these changes.

In addition I added support for the Spanish POS model (STANBOL-688)
available via

    https://github.com/utcompling/OpenNLP-Models

So the KeywordLinkingEngine can now use POS tags to improve Entity
extraction from Spanish texts.


Feedback related to those changes is very welcome!

best
Rupert


[1] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html

Links to the Issues:

https://issues.apache.org/jira/browse/STANBOL-685
https://issues.apache.org/jira/browse/STANBOL-686
https://issues.apache.org/jira/browse/STANBOL-688


-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
