Fwd: [Dbp-spotlight-users] [GSoC 2012] Project Proposal for "Integrate DBpedia Spotlight as Enhancement Engine within Apache Stanbol" (Siwei Yu)

Rupert Westenthaler Fri, 23 Mar 2012 09:21:51 -0700

Hi Stanbol community

Let me forward this very good discussion and proposal for integrating DBpedia 
Spotlight with Apache Stanbol.


Feedback is very welcome!

best
Rupert Westenthaler

> From: Pablo Mendes <[email protected]>
> Subject: Re: [Dbp-spotlight-users] [GSoC 2012] Project Proposal for 
> "Integrate DBpedia Spotlight as Enhancement Engine within Apache Stanbol" 
> (Siwei Yu)
> Date: 23 March 2012 16:02:24 CET
> To: Siwei Yu <[email protected]>
> Cc: Rupert Westenthaler <[email protected]>, 
> [email protected]
> 
> 
> Hi Siwei, (switching to dbp-spotlight-developers to avoid spamming users
> of dbp-spotlight-users)
> Please see answers below.
> 
> On Fri, Mar 23, 2012 at 3:51 PM, Siwei Yu <[email protected]> wrote:
> Dear Pablo and Rupert,
> 
> I'm sorry for sending an incomplete email just now. Please ignore the
> previous email.
> 
> No problem. I figured it was an accidental ctrl+enter.
>  
> 
> Thanks a lot for your instructions! Based on your comments, let me
> summarise the current status of the services, mapped to the four stages
> (1) Spotting, (2) Candidate Selection, (3) Disambiguation, (4)
> Filtering:
> /annotate: (1), (2), (3) first candidate only, (4)
> /candidate: (1), (2), (3) all candidates
> /disambiguate: (3)
> /feedback: not implemented
> Please let me know if the previous summary is incorrect.
> 
> 
> Correct.
> 
>  
> 
> However, in Apache Stanbol each Enhancement Engine in an Enhancement
> Chain handles a single task (Rupert, is that correct?). The
> functions of Enhancement Engines are not supposed to overlap.
> We need to adjust the services of DBpedia Spotlight as follows:
> /spot: (1), to be implemented in this project, for DBpediaSpotlightSpotEngine
> 
> 
> It is likely that we will implement /spot for release v0.6, which may happen 
> before GSoC starts.
> 
> 
> /candidate: (2), to be refactored from current status, for
> DBpediaSpotlightCandidateEngine
> /disambiguate: (3), to be refactored from current status, for
> DBpediaSpotlightDisambiguateEngine
> 
> 
> We would probably provide a wrapper, rather than a refactored version.
> 
>  
> /filter: (4), to be implemented in this project, for
> DBpediaSpotlightFilterEngine
> As to /annotate, I think it's a complex service that does not fit
> Apache Stanbol's "single task for each Enhancement Engine"
> requirement. But we can retain it in DBpedia Spotlight for other
> users (i.e. not for Apache Stanbol).
> 
> Sounds like /annotate would be an enhancement chain.
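The one-engine-per-stage decomposition discussed above can be illustrated with a minimal Java sketch. It is a simplified model, not the Stanbol API: the `StageEngine` interface, the `Mention` class and the toy lexicon are hypothetical, and the shared mention list stands in for the enhancement metadata that engines in a Stanbol chain add to and read from. Each engine performs one of the four stages, and running them in order reproduces, in miniature, what /annotate does in a single call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SpotlightChainSketch {

    /** Hypothetical: one detected mention; fields are filled in by successive stages. */
    static final class Mention {
        final String surfaceForm;
        final int offset;
        List<String> candidates = new ArrayList<>();
        String resource;          // chosen DBpedia URI, null until disambiguated
        boolean accepted = true;  // cleared by the filtering stage

        Mention(String surfaceForm, int offset) {
            this.surfaceForm = surfaceForm;
            this.offset = offset;
        }
    }

    /** Hypothetical single-task engine: reads and extends the shared mention list. */
    interface StageEngine {
        void process(String text, List<Mention> mentions);
    }

    /** Runs the engines in order; each one sees what the previous ones wrote. */
    static List<Mention> runChain(String text, List<StageEngine> chain) {
        List<Mention> mentions = new ArrayList<>();
        for (StageEngine engine : chain) {
            engine.process(text, mentions);
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Toy lexicon standing in for the spotter dictionary and the candidate index.
        Map<String, List<String>> lexicon = Map.of(
            "Berlin", List.of("http://dbpedia.org/resource/Berlin",
                              "http://dbpedia.org/resource/Berlin,_New_Hampshire"));

        // (1) Spotting: find known surface forms in the text.
        StageEngine spot = (text, ms) -> lexicon.keySet().forEach(sf -> {
            int i = text.indexOf(sf);
            if (i >= 0) ms.add(new Mention(sf, i));
        });
        // (2) Candidate selection: attach all candidate resources.
        StageEngine select = (text, ms) -> ms.forEach(m ->
            m.candidates = new ArrayList<>(lexicon.getOrDefault(m.surfaceForm, List.of())));
        // (3) Disambiguation: toy rule, simply take the first candidate.
        StageEngine disambiguate = (text, ms) -> ms.forEach(m ->
            m.resource = m.candidates.isEmpty() ? null : m.candidates.get(0));
        // (4) Filtering: drop mentions that could not be resolved.
        StageEngine filter = (text, ms) -> ms.forEach(m -> m.accepted = m.resource != null);

        List<Mention> result = runChain("Siwei visited Berlin.",
            List.of(spot, select, disambiguate, filter));
        for (Mention m : result) {
            if (m.accepted) {
                System.out.println(m.surfaceForm + "@" + m.offset + " -> " + m.resource);
            }
        }
    }
}
```

Because each stage only reads what earlier stages wrote, a single engine (for example the filter) could be replaced or reused in another chain without touching the others.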
>  
> The /feedback API could be interesting, and I'd like to try to
> implement it. More details should be discussed beforehand. However, I'm
> not sure there's enough time to complete it within the two months of the summer.
> 
> I don't think wrapping DBpedia Spotlight classes is enough for a
> summer-long coding project.
> You should include /feedback in your project to make it stronger.
> This API should accept feedback from any CMS, as Stanbol is CMS-agnostic.
> It should be able to store that feedback and later let engines query it, in
> order to learn from their mistakes.
> You could think, for example, about filtering implementations that would use
> feedback data to stop making the same mistakes.
> This is potentially the most interesting part of this project idea.
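A minimal sketch of how stored feedback could drive a filtering engine, as suggested above. The names `FeedbackStore`, `reportIncorrect` and `filterWithFeedback` are hypothetical, and an in-memory map keyed by surface form is assumed in place of a persistent, CMS-agnostic store.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class FeedbackSketch {

    /** Hypothetical: an annotation as produced by the disambiguation stage. */
    record Annotation(String surfaceForm, String resource) {}

    /** Hypothetical store of (surface form, wrong resource) pairs reported by any client/CMS. */
    static final class FeedbackStore {
        private final Map<String, Set<String>> rejected = new ConcurrentHashMap<>();

        /** Records that linking this surface form to this resource was judged wrong. */
        void reportIncorrect(String surfaceForm, String resource) {
            rejected.computeIfAbsent(surfaceForm, k -> ConcurrentHashMap.newKeySet()).add(resource);
        }

        boolean wasRejected(String surfaceForm, String resource) {
            return rejected.getOrDefault(surfaceForm, Set.of()).contains(resource);
        }
    }

    /** Filtering stage: drops annotations that users have already marked as wrong. */
    static List<Annotation> filterWithFeedback(List<Annotation> in, FeedbackStore store) {
        return in.stream()
                 .filter(a -> !store.wasRejected(a.surfaceForm(), a.resource()))
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FeedbackStore store = new FeedbackStore();
        // A user (through any CMS) reports that "Paris" was wrongly linked to Paris Hilton.
        store.reportIncorrect("Paris", "http://dbpedia.org/resource/Paris_Hilton");

        List<Annotation> produced = List.of(
            new Annotation("Paris", "http://dbpedia.org/resource/Paris_Hilton"),
            new Annotation("Berlin", "http://dbpedia.org/resource/Berlin"));
        filterWithFeedback(produced, store).forEach(System.out::println);
    }
}
```

Matching on exact (surface form, resource) pairs is the simplest possible policy; a real implementation would probably need to take the mention's context into account, since the same link can be right in one document and wrong in another.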
> 
>  
> 
> If the project scopes discussed above are generally OK, I'd like to
> think about the project plan and come up with a project proposal
> draft.
> 
> By the way, I have two small questions about DBpedia Spotlight spotting
> and Enhancement Chains:
> 1. For Pablo: [3] mentions three implementations for Spotting:
> Ling Pipe Spotter, Trie Spotter, Ling Pipe Chunk Spotter. How does
> /annotate determine which implementation is best for a service request?
> Can the user choose among them manually by sending different parameter(s)?
> 
> By now we also have 4 other implementations. We have to update the
> documentation.
> Please see: 
> http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/Mendes-Daiber-Rajapakse-Sasaki-Bizer-DBpediaSpotlight-LREC2012.pdf
>  
> 2. For Rupert: could you please show me some examples of Enhancement
> Chains? I've studied some Enhancement Engines here [1]. I can
> understand how an individual Enhancement Engine works and how to
> implement a new one. After studying [2], I find Enhancement Chains a
> little confusing. Could you please point me to the source code of a
> concrete Enhancement Chain implementation? I want to know the data
> I/O interface from one Enhancement Engine to another. In other
> words, how does the output of one Enhancement Engine become the input
> of another one?
> 
> Best regards,
> Siwei Yu
> 
> [1] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/list.html
> [2] http://incubator.apache.org/stanbol/docs/trunk/enhancer/chains/
> [3] http://wiki.dbpedia.org/spotlight/technicaldocumentation?v=3qy
> 
> > On Wed, Mar 21, 2012 at 4:27 PM, Rupert Westenthaler
> > <[email protected]> wrote:
> >>
> >> Hi Siwei Yu, Pablo
> >>
> >> see my comments inline. To make it more readable, I also removed the
> >> parts of the mail that are not relevant to my comments.
> >>
> >> On Wed, Mar 21, 2012 at 12:01 AM, Pablo Mendes <[email protected]> 
> >> wrote:
> >> > On Tue, Mar 20, 2012 at 4:24 PM, Siwei Yu <[email protected]> wrote:
> >> >> 2. Should I develop one Enhancement Engine containing three services,
> >> >> or three engines (i.e. each service as an engine)? It's maybe related
> >> >> to the service function granularity. What's your opinion?
> >> >
> >> >
> >> > We could have one engine for each task separately, and an enhancement
> >> > chain should connect them together. We should also introduce a REST API
> >> > /spot for (1). We could perhaps make /candidates implement only (2) and
> >> > make /annotate accept a &verbose=on to act like the current /candidates
> >> > does.
> >> >
> >> > Besides all of this reorganization that has to happen, Rupert is the guy
> >> > from Stanbol that can help you position your application in that regard.
> >> >
> >>
> >> I fully agree with that.
> >>
> >> Having separate EnhancementEngines for spotting, candidate selection
> >> and disambiguation would provide a lot of additional flexibility to
> >> experienced Stanbol users, as they could even use parts of the DBpedia
> >> Spotlight functionality within their existing enhancement engines.
> >>
> >> The definition of a DBpedia Spotlight EnhancementChain ensures that
> >> typical users can use Spotlight without needing to know its inner
> >> workings. Users would just need to send enhancement requests to
> >> "http://{host}:{port}/enhancer/chain/dbpedia", assuming that the DBpedia
> >> Spotlight chain is called "dbpedia". It would even be possible to
> >> make the DBpedia Spotlight EnhancementChain the default enhancement
> >> chain, so that requests to "/enhancer" would be processed
> >> by it.
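A minimal client-side sketch of sending text to a named chain, using `java.net.http` (Java 11+). The host, port and chain name `dbpedia` are assumptions taken from the example above, and requesting `application/rdf+xml` is assumed as one RDF serialization the enhancer can return.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChainRequestSketch {

    /** Builds a POST to /enhancer/chain/{chain}; host, port and chain name are assumptions. */
    static HttpRequest buildRequest(String host, int port, String chain, String text) {
        URI uri = URI.create("http://" + host + ":" + port + "/enhancer/chain/" + chain);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "text/plain; charset=UTF-8")
                .header("Accept", "application/rdf+xml")  // assumed RDF output format
                .POST(HttpRequest.BodyPublishers.ofString(text))  // body encoded as UTF-8
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = buildRequest("localhost", 8080, "dbpedia", "Siwei visited Berlin.");
        // Requires a running Stanbol instance with a chain named "dbpedia".
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode());
        System.out.println(resp.body());
    }
}
```

Posting the same request to /enhancer instead would use whichever chain is configured as the default.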
> >>
> >> >>
> >> >> By the way, my name is Siwei Yu. I have good knowledge of semantic
> >> >> technologies such as RDF, OWL and SPARQL. I'm also familiar with the
> >> >> mainstream Java-based RDF/OWL processing tools like owlapi, Jena,
> >> >> Sesame and AllegroGraph. I have strong Java coding skills and good
> >> >> knowledge of software design patterns. My research background
> >> >> meets the requirements very well. I believe it'll be a wonderful
> >> >> summer working with the DBpedia Spotlight community.
> >> >
> >> >
> >> > It would be good if you leveraged some of your Semantic Web background in
> >> > your application. The idea of a /feedback API, which receives corrections
> >> > made by the users could fit well in this regard.
> >> >
> >>
> >> A feedback API is also something that would be interesting for the
> >> Stanbol Enhancer.
> >>
> >> best
> >> Rupert Westenthaler
> >>
> >> --
> >> | Rupert Westenthaler             [email protected]
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
>
