On 25.03.2012, at 16:46, Allel Benbrahim wrote:
> Hello
> The results we get from Stanbol are quite often fuzzy.
> For instance, we have in a text an occurrence of "Jean-Luc Mélenchon", who is
> a candidate in the French elections, and the result returned by Stanbol for
> this is "People -> Jean-Luc Godard", who is a famous French filmmaker.
>
Do you get this result by using the "NER engine -> NamedEntity linking engine"
or the "KeywordLinkingEngine"?
> It seems that this issue is similar to the one reported by Mathieu d'Aquin
> and for which a Jira case has been opened in September 2011 with an update
> on the 6th of March 2012.
>
Are you referring to
http://markmail.org/message/jifnvswo7rlq2epv and
https://issues.apache.org/jira/browse/STANBOL-320 ?
> Could you confirm that this issue is still open?
> Would it be more relevant if we extracted results in English rather than
> French?
English is definitely better supported than French, because OpenNLP has both
NER models and POS (part of speech) models for English and nothing for French.
> Is French support planned on the roadmap?
I am unsure whether we should invest much time in filtering and post-processing
of Enhancement results, as such "optimizations" are rather application-specific.
However, some common sources of false suggestions (such as the one referenced
by STANBOL-320) might be exceptions to that.
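
To illustrate what such application-side filtering could look like, here is a
minimal sketch. It is not Stanbol code: the Suggestion class, the thresholds
and the sample confidence values are all hypothetical, and it assumes the
application has already pulled the mention text, the suggested entity label
and a confidence score out of the enhancement results. It drops suggestions
whose label is too dissimilar from the mention, which would catch a case like
"Jean-Luc Mélenchon" -> "Jean-Luc Godard".

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Suggestion:
    """Hypothetical container; not a Stanbol class."""
    mention: str       # text found in the document
    label: str         # label of the suggested entity
    confidence: float  # score reported for the suggestion


def label_similarity(mention: str, label: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, mention.lower(), label.lower()).ratio()


def filter_suggestions(suggestions, min_confidence=0.5, min_similarity=0.8):
    """Keep suggestions that pass both (assumed) thresholds."""
    return [
        s for s in suggestions
        if s.confidence >= min_confidence
        and label_similarity(s.mention, s.label) >= min_similarity
    ]


# Made-up sample data; the confidence values are purely illustrative.
candidates = [
    Suggestion("Jean-Luc Mélenchon", "Jean-Luc Godard", 0.7),
    Suggestion("Jean-Luc Godard", "Jean-Luc Godard", 0.9),
]
kept = filter_suggestions(candidates)
```

A plain string similarity is only a heuristic and will reject legitimate
matches such as abbreviations or alternate names, which is one reason why this
kind of tuning tends to stay application-specific.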
I think in the long run investing in good entity disambiguation algorithms is
the better way to go - and yes there are plans in that direction.
best
Rupert
> Thanks