Hi all,

on 01.06. Pablo and I gave you an update on the DBpedia Spotlight integration in Stanbol (I am replying to that mail here, so you can review it below if you wish). We had implemented an EnhancementEngine that performed the entire annotation process (spotting, candidate selection and disambiguation) to demo the functionality DBpedia Spotlight would bring to Stanbol. In the last two weeks, apart from running evaluations, we created three further EnhancementEngines, which allow DBpedia Spotlight functionality to be combined with other Stanbol engines in a chain context, as follows:

URL abbreviations:
engineURL: http://spotlight.dbpedia.org/stanbol/enhancer/engine/
chainURL: http://spotlight.dbpedia.org/stanbol/enhancer/chain/

engineURL/dbpspotlightannotate
---------------------
This is the EnhancementEngine we were referring to above. The only change is that the parameter "onlyNER", which could be used to do only spotting, was removed, because we extracted spotting into a separate engine. This engine performs the entire annotation process, from natural language text to DBpedia URIs.

engineURL/dbpspotlightspot
---------------------
This engine does only spotting (NER and other kinds of phrase recognition [1]) and stores the results as TextAnnotations, which can serve as input to a Stanbol engine that does linking.

engineURL/dbpspotlightdisambiguate
---------------------
This engine reads TextAnnotations stored by a Stanbol NER engine and does candidate selection, disambiguation and linking.

engineURL/dbpspotlightcandidates
---------------------
This engine is equivalent to /dbpspotlightannotate, except that all possible disambiguations for a TextAnnotation are returned, and not only the best one.

Apart from the EnhancementEngines, the following EnhancementChains were defined:

chainURL/dbpspotlight
---------------------
This chain replicates the functionality of /dbpspotlightannotate by chaining /dbpspotlightspot and /dbpspotlightdisambiguate.
Note: langid is run first, and only English texts are processed.

chainURL/dbpspotlightonlyspot
---------------------
Demonstrates the use of /dbpspotlightspot with a different linker, in this case /dbpediaLinking.

chainURL/dbpspotlightonlydisambiguate
---------------------
Demonstrates the use of /dbpspotlightdisambiguate with a different NER engine, in this case /ner.

We would greatly appreciate feedback and suggestions for improvement.
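
For anyone who wants to try the chains from a script, a request could look roughly like the sketch below. It assumes that a chain accepts the text as a plain-text POST body and that Turtle can be requested via the Accept header, which is how the Stanbol enhancer REST interface is normally called. The helper names build_enhance_request and enhance are made up for this example, and the demo server may not be reachable.

```python
import urllib.request

# chainURL as abbreviated in this mail; availability of the server is not guaranteed.
CHAIN_URL = "http://spotlight.dbpedia.org/stanbol/enhancer/chain/"


def build_enhance_request(chain, text, base=CHAIN_URL, accept="text/turtle"):
    """Build (but do not send) a POST request carrying plain text to a chain.

    Hypothetical helper: the Accept value is an assumption about which RDF
    serializations the server is configured to return.
    """
    return urllib.request.Request(
        base + chain,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,
        },
        method="POST",
    )


def enhance(chain, text, **kwargs):
    """Send the request and return the serialized enhancement results."""
    request = build_enhance_request(chain, text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


# Usage (needs network access to the demo server):
#   print(enhance("dbpspotlight", "Berlin is the capital of Germany."))
#   print(enhance("dbpspotlightonlyspot", "Berlin is the capital of Germany."))
```

Switching between the three chains only changes the last path segment, so the same text can be sent to each of them to compare the results.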

[1] https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Spotting

Kind regards,
Iavor

On 01.06.2012 17:37, Pablo Mendes wrote:
> Hi all,
>
> Iavor [2] and I [1] have been working on integrating DBpedia Spotlight
> functionality into Stanbol as part of the EAP [3]. This message is to
> give you a status update.
>
> In DBpedia Spotlight, content enhancement goes through Spotting and
> Disambiguation. First, spotting recognizes phrases that should be
> annotated in the input text. If all you care about are named entities,
> you can use a named entity recognizer (NER) in this step. DBpedia
> Spotlight includes NER, Keyphrase Extraction, and other spotting
> techniques that can find phrases referring to all 3.5M entities in
> DBpedia. Second, disambiguation (a.k.a. Entity Linking) attempts to map
> each recognized phrase to its correct identifier in DBpedia. The entire
> content enhancement process is implemented in DBpedia Spotlight via the
> /annotate REST endpoint.
>
> We have developed a new enhancement engine for Stanbol using the
> /annotate endpoint in DBpedia Spotlight. The current version is being
> tested and improved, and can be accessed at:
> http://spotlight.dbpedia.org/stanbol/enhancer
> Configuration:
> http://spotlight.dbpedia.org/stanbol/system/console
> The account data for the Felix configuration is the default one. The
> engine can be configured there as well. For instance, the option onlyNER
> (boolean value, inspired by the OpenCalais plug-in) runs only the phrase
> recognition (spotting) stage and creates TextAnnotations with the
> results, which can be used in a chain context by other engines (e.g.
> DBpedia Linking).
>
> Our next step is to create an enhancement chain with two enhancement
> engines: DBpedia Spotlight Spotting and DBpedia Spotlight Disambiguation.
>
> We have performed preliminary evaluations of the new enhancement engine
> using the Stanbol Benchmark Component (SBC). The SBC allows evaluating
> content enhancement engines based on examples of desired and undesired
> behavior defined through Benchmark Definition Language (BDL) statements.
> We have transformed the dataset from Kulkarni et al. 2009 [4] into BDL.
> The BDL data set is available from:
> http://spotlight.dbpedia.org/download/stanbol/
>
> The SBC is a nice way to perform manual inspection of the behavior of
> the enhancement chain for different examples in the evaluation dataset.
> However, for evaluations with several hundreds of examples, it would be
> interesting to have scores that summarize the performance for the entire
> dataset. We are in the process of conducting large-scale experiments
> with existing datasets, aiming at producing precision and recall figures
> for different enhancement chains.
>
> Any comments are very welcome.
>
> Cheers,
> Pablo
>
> [1] http://www.iks-project.eu/community/people/pablo-n-mendes
> [2] http://www.iks-project.eu/community/people/iavor-jelev
> [3] http://wiki.iks-project.eu/index.php/GzEvD_Proposal
> [4] http://www.cse.iitb.ac.in/soumen/doc/QCQ/
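
As an illustration of the precision and recall figures mentioned in the quoted message, dataset-level scores could be computed along the lines of the sketch below. This is not the SBC's or DBpedia Spotlight's evaluation code. It assumes each annotation is reduced to a (document id, surface form, DBpedia URI) tuple and that only exact matches count as correct; the function name and the sample data are invented for the example.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for two collections of annotation tuples.

    Assumption: an annotation is correct only if the exact same
    (document id, surface form, URI) tuple appears in the gold standard.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented sample data: three gold annotations, four predicted ones.
gold = {
    ("doc1", "Berlin", "http://dbpedia.org/resource/Berlin"),
    ("doc1", "Germany", "http://dbpedia.org/resource/Germany"),
    ("doc2", "Paris", "http://dbpedia.org/resource/Paris"),
}
predicted = {
    ("doc1", "Berlin", "http://dbpedia.org/resource/Berlin"),
    # Wrong disambiguation: counts against both precision and recall.
    ("doc1", "Germany", "http://dbpedia.org/resource/Germany_national_football_team"),
    ("doc2", "Paris", "http://dbpedia.org/resource/Paris"),
    # Spurious annotation that is not in the gold standard.
    ("doc2", "France", "http://dbpedia.org/resource/France"),
}

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Running the same comparison per chain would give one precision/recall pair for each enhancement chain on the whole dataset.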
