Re: DBpedia Spotlight and Stanbol

Pablo Mendes Mon, 04 Jun 2012 03:51:55 -0700

Hi Rupert,

> 1. Do you think it might make sense to allow multiple EngineInstances
> using different Spotting algorithms?
>


You mean that instead of a single "/spot" you would have /lingpipespotter, /ner and
/keyphrases? That can be done, but do you have a use case for it? Could this
also be done as some sort of URL rewrite, e.g. from
/spot?spotter=LingPipeSpotter to /lingpipespotter?
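A minimal sketch of the URL-rewrite idea, assuming a hypothetical alias table that maps per-spotter paths onto the single /spot endpoint. Only "LingPipeSpotter" is taken from this thread as a spotter value; "NER" and "Kea" are placeholders, not confirmed DBpedia Spotlight parameter values.

```python
from urllib.parse import urlencode

# Hypothetical alias table: per-spotter path -> value of the "spotter" parameter.
# "LingPipeSpotter" comes from this thread; "NER" and "Kea" are placeholders.
SPOTTER_ALIASES = {
    "/lingpipespotter": "LingPipeSpotter",
    "/ner": "NER",
    "/keyphrases": "Kea",
}


def rewrite(path: str) -> str:
    """Rewrite a per-spotter path to the generic /spot endpoint.

    Matching ignores case and a trailing slash; paths that are not
    aliases are returned unchanged.
    """
    spotter = SPOTTER_ALIASES.get(path.rstrip("/").lower())
    if spotter is None:
        return path
    return "/spot?" + urlencode({"spotter": spotter})


print(rewrite("/lingpipespotter"))  # /spot?spotter=LingPipeSpotter
print(rewrite("/spot"))             # /spot
```

Keeping one /spot endpoint and layering aliases on top means a new spotter only needs a new table entry, not a new engine instance.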

> > Our next step is to create an enhancement chain with two enhancement
> > engines: DBpedia Spotlight Spotting and DBpedia Spotlight Disambiguation.
>
> So basically to split this engine into separate ones, right?


Correct. Split this engine into two, and have a chain connecting the
engines.

> I have not yet had time to look at the examples in detail, but if the license
> of [4] allows it and you agree, we could think about making them
> available as part of the Stanbol Enhancer.


I could not find licensing information. I can ask the authors directly.


> This is completely true. Can you open a Jira issue about that? I
> will definitely help with implementing this.


Sure. Here: https://issues.apache.org/jira/browse/STANBOL-652
I have already implemented quite a bit of evaluation code. If you want, you
can use it as a starting point. My approach is to go over the dataset and
write out a log of each annotation attempt. Based on this log, I run a
series of R scripts to interpret the evaluation results.
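To illustrate the log-then-score approach, here is a minimal Python sketch (in place of the R scripts mentioned above, which are not shown in this thread). It assumes a hypothetical log record holding a surface form, an offset, the gold DBpedia URI and the URI returned by the chain; the actual log format is not described here. Precision is correct annotations over returned annotations; recall is correct annotations over gold annotations.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Attempt:
    """One record of a hypothetical annotation log."""
    surface_form: str
    offset: int
    gold_uri: Optional[str]       # expected resource; None if the dataset has none here
    predicted_uri: Optional[str]  # resource chosen by the chain; None if nothing returned


def precision_recall(attempts: Iterable[Attempt]) -> Tuple[float, float]:
    """Micro-averaged precision and recall over all logged attempts."""
    predicted = gold = correct = 0
    for a in attempts:
        if a.predicted_uri is not None:
            predicted += 1
        if a.gold_uri is not None:
            gold += 1
        if a.predicted_uri is not None and a.predicted_uri == a.gold_uri:
            correct += 1
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    return precision, recall


log = [
    Attempt("Berlin", 0, "http://dbpedia.org/resource/Berlin",
            "http://dbpedia.org/resource/Berlin"),       # correct
    Attempt("Paris", 10, "http://dbpedia.org/resource/Paris",
            "http://dbpedia.org/resource/Paris_Hilton"),  # wrong resource
    Attempt("Obama", 20, "http://dbpedia.org/resource/Barack_Obama",
            None),                                        # missed
]
p, r = precision_recall(log)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.33
```

Because every attempt is logged individually, the same log can also be sliced per document or per entity type before scoring.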

Cheers,
Pablo


On Mon, Jun 4, 2012 at 11:45 AM, Rupert Westenthaler <[email protected]> wrote:

> Hi Pablo
>
> I ran some tests and the spotting looks great. I also tried some of
> the different spotting algorithms (NER, LingPipeSpotter (very slow)
> and Kea).
>
> Here are some questions/suggestions related to the engine.
>
> 1. Do you think it might make sense to allow multiple EngineInstances
> using different Spotting algorithms?
>
> 2. I noticed that the created TextAnnotations do not have "dc-terms:type"
> information. The Stanbol Enhancement Structure uses this property to
> represent the "nature" of a text annotation (e.g. Person, Organisation,
> Place in the case of Named Entities). So if such information is available,
> it would be great to set it.
>
> 3. I would suggest adding support for the type suggestion filter
> feature as shown in the 2nd example of the user manual [1].
>
> [1] http://wiki.dbpedia.org/spotlight/usersmanual#h139-10
>
> On Fri, Jun 1, 2012 at 5:37 PM, Pablo Mendes <[email protected]>
> wrote:
> > Our next step is to create an enhancement chain with two enhancement
> > engines: DBpedia Spotlight Spotting and DBpedia Spotlight Disambiguation.
>
> So basically to split this engine into separate ones, right?
>
> > We have performed preliminary evaluations of the new enhancement engine
> > using the Stanbol Benchmark Component (SBC). The SBC allows evaluating
> > content enhancement engines based on examples of desired and undesired
> > behavior defined through Benchmark Definition Language (BDL) statements.
> > We have transformed the dataset from Kulkarni et al. 2009 [4] into BDL. The
> > BDL data set is available from:
> > http://spotlight.dbpedia.org/download/stanbol/
> >
>
> I have not yet had time to look at the examples in detail, but if the license
> of [4] allows it and you agree, we could think about making them
> available as part of the Stanbol Enhancer.
>
> > The SBC is a nice way to perform manual inspection of the behavior of the
> > enhancement chain for different examples in the evaluation dataset.
> > However, for evaluations with several hundred examples, it would be
> > interesting to have scores that summarize the performance for the entire
> > dataset. We are in the process of conducting large-scale experiments with
> > existing datasets, aiming at producing precision and recall figures for
> > different enhancement chains.
> >
>
> This is completely true. Can you open a Jira issue about that? I
> will definitely help with implementing this.
>
> best
> Rupert
>
>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
