If you are using DBpedia as a source of enhancement possibilities, I wonder if that has to do more with a bias in the DBpedia dataset than any bias in Stanbol?
---
A. Soroka
Software & Systems Engineering :: Online Library Environment
the University of Virginia Library

On Mar 14, 2012, at 1:20 PM, Mathieu D'Aquin wrote:

> Hi Rupert,
>
> Thanks for the quick answer and the pointer.
> In summary, if I understand well, it is the enhancer's normal behaviour to
> return such entities (e.g., that everybody called Sean will be recognised as
> Sean Connery) and the only thing for me to do is to apply some post
> processing/filtering.
>
> Would there be some documentation explaining more comprehensively what kind
> of filters should be applied for different types of entities? I noticed for
> example that the enhancer is biased towards American presidents and American
> universities. Actually, generally, it is quite biased towards American
> things.
>
> Thanks!
> Mathieu.
>
> On 14 Mar 2012, at 12:00, Rupert Westenthaler wrote:
>
>> Hi
>> On 14.03.2012, at 12:25, Mathieu D'Aquin wrote:
>>
>>> Hi All,
>>>
>>> I'm trying to use the enhancer service, currently with the default
>>> settings, but it seems to be behaving rather funnily.
>>> (note that I only care about EntityAnnotation's with references to dbpedia
>>> entities).
>>>
>>> For example, I have tried with the text of the page
>>> http://sssw.org/2012/invited-speakers-tutors/
>>>
>>> And it gives very weird (even random looking) results, such as "Sean
>>> Connery" or "Nazi Germany".
>>>
>> If you find "Germany" as a location Stanbol will return three suggested
>> entities. In this case this will be
>>
>> 1. http://dbpedia.org/resource/Germany (confidence: 1704736.125)
>> 2. http://dbpedia.org/resource/Nazi_Germany (confidence: 121766.984)
>> 3. http://dbpedia.org/resource/West_Germany (confidence: 38052.215)
>>
>> (confidence values for the NamedEntityTaggingEngine are the Solr scores for
>> the used query)
>>
>> I guess this is the reason why you are getting Nazi_Germany as a suggestion
>> for a lot of pages.
>>
>> For Persons the problem is with cases where OpenNLP NER (Named Entity
>> Recognition) marks a Person in the text, but only provides the given or
>> family name (e.g. "sean"). In this case the Entity linking will provide you
>> with the most prominent person in DBpedia with that name - in your case
>> "Sean Connery".
>>
>> This problem is also described by
>> [STANBOL-320](https://issues.apache.org/jira/browse/STANBOL-320).
>>
>>> This weird behaviour is not limited to this page. I have processed several
>>> thousand pages and clearly the results have not been what we would have
>>> expected (very often, for example, it gives us the entity "Jesus" for no
>>> obvious reason).
>>>
>>
>> Jesus is also a "Person" in DBpedia. So I assume that this is similar to
>> "sean" -> "Sean Connery"
>>
>>> Am I doing something wrong?
>>> Do the default enhancer services need some kind of configuration?
>>>
>>
>> Related to this, I would suggest to
>>
>> * only consider the suggestion with the highest confidence
>> * ignore TextAnnotations with "dc:type=dbp-ont:Person" if the
>>   "fise:selected-text" property only has a given or family name
>>
>>
>> best
>> Rupert
>>
>>> I have looked at the documentation but couldn't find anything that seemed
>>> to be helpful in this respect.
>>>
>>> Thanks!
>>> Mathieu.
>>>
>>> --
>>> The Open University is incorporated by Royal Charter (RC 000391), an exempt
>>> charity in England & Wales and a charity registered in Scotland (SC 038302).
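The two post-processing rules Rupert suggests in the quoted thread (keep only the highest-confidence suggestion, and skip Person annotations whose selected text is a bare given or family name) could be sketched roughly as follows. This is only an illustration under stated assumptions: Stanbol returns RDF, so the plain-dict shape, the `filter_mentions` helper, the `dbp-ont:Place` type label, and the confidence of 1.0 for "Sean" are all hypothetical. Treating any single-token name as "given or family name only" is also a simplification. The Germany scores are the ones quoted in the thread.

```python
# Hypothetical, simplified shape: Stanbol actually returns RDF, so each
# mention here is assumed to be pre-flattened into a plain dict.
PERSON_TYPE = "dbp-ont:Person"


def filter_mentions(mentions):
    """Keep one top-scoring suggestion per mention, skipping weak person matches."""
    kept = []
    for mention in mentions:
        text = mention["selected_text"].strip()
        # A single token such as "Sean" is treated as a bare given or family name.
        if mention["type"] == PERSON_TYPE and len(text.split()) < 2:
            continue
        if not mention["suggestions"]:
            continue
        # Rule 1: only consider the suggestion with the highest confidence.
        best = max(mention["suggestions"], key=lambda s: s["confidence"])
        kept.append((text, best["entity"]))
    return kept


mentions = [
    {"selected_text": "Germany", "type": "dbp-ont:Place", "suggestions": [
        {"entity": "http://dbpedia.org/resource/Germany", "confidence": 1704736.125},
        {"entity": "http://dbpedia.org/resource/Nazi_Germany", "confidence": 121766.984},
        {"entity": "http://dbpedia.org/resource/West_Germany", "confidence": 38052.215},
    ]},
    # Illustrative confidence value, not taken from a real Stanbol response.
    {"selected_text": "Sean", "type": "dbp-ont:Person", "suggestions": [
        {"entity": "http://dbpedia.org/resource/Sean_Connery", "confidence": 1.0},
    ]},
]

print(filter_mentions(mentions))
```

With this sample input, the "Sean" mention is dropped and only the top-ranked Germany entity is kept. A stricter filter might also require a minimum gap between the first and second score, but that goes beyond what the thread proposes.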
