Hi Rupert,

> 1. Do you think it might make sense to allow multiple EngineInstances
> using different Spotting algorithms?

You mean that instead of "/spot" you would have /lingpipespot, /ner and /keyphrases? That can be done, but do you have a use case for it? Could this also be done as some sort of URL rewrite, e.g. from /spot?spotter=LingPipeSpotter to /lingpipespotter?

> > Our next step is to create an enhancement chain with two enhancement
> > engines: DBpedia Spotlight Spotting and DBpedia Spotlight Disambiguation.
>
> So basically to split this engine into separate ones, right?

Correct. Split this engine into two, and have a chain connecting the engines.

> I have not yet had time to look at the examples in detail, but if the
> license of [4] allows it and you agree, we could think about making them
> available as part of the Stanbol Enhancer.

I could not find licensing information. I can ask the authors directly.

> This is completely true. Can you start a Jira issue about that? I
> will definitely help with implementing this.

Sure. Here: https://issues.apache.org/jira/browse/STANBOL-652

I have already implemented quite a bit of evaluation code. If you want, you can use it as a starting point. My approach is to go over the dataset and write out a log of each annotation attempt. Based on this log, I run a series of R scripts to interpret the evaluation results.

Cheers,
Pablo

On Mon, Jun 4, 2012 at 11:45 AM, Rupert Westenthaler <[email protected]> wrote:

> Hi Pablo
>
> I made some tests and the spotting looks great. I also tried some of
> the different Spotting algorithms (NER, LingPipeSpotter (very slow)
> and Kea).
>
> Here are some questions/suggestions related to the engine.
>
> 1. Do you think it might make sense to allow multiple EngineInstances
> using different Spotting algorithms?
>
> 2. I noticed that created TextAnnotations do not have "dc-terms:type"
> information. This property is used by the Stanbol Enhancement Structure
> to represent the "nature" (e.g. Person, Organisation, Place in the case
> of Named Entities). So if such information is available, it would be
> great to set it.
>
> 3. I would suggest adding support for the type suggestion filter
> feature as shown in the 2nd example of the user manual [1].
>
> [1] http://wiki.dbpedia.org/spotlight/usersmanual#h139-10
>
> On Fri, Jun 1, 2012 at 5:37 PM, Pablo Mendes <[email protected]>
> wrote:
> > Our next step is to create an enhancement chain with two enhancement
> > engines: DBpedia Spotlight Spotting and DBpedia Spotlight Disambiguation.
>
> So basically to split this engine into separate ones, right?
>
> > We have performed preliminary evaluations of the new enhancement engine
> > using the Stanbol Benchmark Component (SBC). The SBC allows evaluating
> > content enhancement engines based on examples of desired and undesired
> > behavior defined through Benchmark Definition Language (BDL) statements.
> > We have transformed the dataset from Kulkarni et al. 2009 [4] into BDL.
> > The BDL data set is available from:
> > http://spotlight.dbpedia.org/download/stanbol/
>
> I have not yet had time to look at the examples in detail, but if the
> license of [4] allows it and you agree, we could think about making them
> available as part of the Stanbol Enhancer.
>
> > The SBC is a nice way to perform manual inspection of the behavior of the
> > enhancement chain for different examples in the evaluation dataset.
> > However, for evaluations with several hundred examples, it would be
> > interesting to have scores that summarize the performance for the entire
> > dataset. We are in the process of conducting large-scale experiments with
> > existing datasets, aiming at producing precision and recall figures for
> > different enhancement chains.
>
> This is completely true. Can you start a Jira issue about that? I
> will definitely help with implementing this.
>
> best
> Rupert
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
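As an illustration of the dataset-wide precision and recall figures discussed in this thread, here is a minimal Python sketch that summarizes a per-attempt annotation log. It is not the existing evaluation code or the R scripts mentioned above: the record type `AnnotationAttempt` and its fields (`doc_id`, `offset`, `gold_uri`, `predicted_uri`) are hypothetical assumptions about what such a log might contain, and an attempt counts as correct only when the predicted DBpedia URI exactly matches the gold URI for the same mention.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class AnnotationAttempt:
    """One entry of a hypothetical evaluation log (layout assumed, not a real format)."""
    doc_id: str
    offset: int                   # character offset of the mention in the document
    gold_uri: Optional[str]       # expected DBpedia URI, or None if the dataset has no annotation here
    predicted_uri: Optional[str]  # URI returned by the enhancement chain, or None if nothing was annotated


def precision_recall_f1(log: Iterable[AnnotationAttempt]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1, pooling counts over all logged attempts."""
    true_pos = predicted = gold = 0
    for attempt in log:
        if attempt.predicted_uri is not None:
            predicted += 1
        if attempt.gold_uri is not None:
            gold += 1
        # A hit requires an annotation whose URI equals the gold URI exactly.
        if attempt.predicted_uri is not None and attempt.predicted_uri == attempt.gold_uri:
            true_pos += 1
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / gold if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up example entries, only to show the counting rules.
    sample_log = [
        AnnotationAttempt("d1", 0, "dbpedia:Berlin", "dbpedia:Berlin"),        # correct
        AnnotationAttempt("d1", 10, "dbpedia:Paris", "dbpedia:Paris_Hilton"),  # wrong URI
        AnnotationAttempt("d1", 20, "dbpedia:Apple_Inc.", None),               # missed
        AnnotationAttempt("d2", 5, None, "dbpedia:Java"),                      # spurious
    ]
    p, r, f = precision_recall_f1(sample_log)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.33 recall=0.33 f1=0.33
```

Here a wrong URI counts against both precision and recall, which is one common convention; with a separate spotting engine and disambiguation engine in the chain, spotting and disambiguation could also be scored separately.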
