Hi Stanbol community,

Let me forward this very good discussion and proposal for integrating DBpedia Spotlight with Apache Stanbol.
Feedback is very welcome!

best
Rupert Westenthaler

> From: Pablo Mendes <[email protected]>
> Subject: Re: [Dbp-spotlight-users] [GSoC 2012] Project Proposal for
> "Integrate DBpedia Spotlight as Enhancement Engine within Apache Stanbol"
> (Siwei Yu)
> Date: 23 March 2012 16:02:24 CET
> To: Siwei Yu <[email protected]>
> Cc: Rupert Westenthaler <[email protected]>,
> [email protected]
>
> Hi Siwei, (switching to dbp-spotlight-developers, so as to avoid spamming users
> in dbp-spotlight-users)
> Please see answers below.
>
> On Fri, Mar 23, 2012 at 3:51 PM, Siwei Yu <[email protected]> wrote:
> Dear Pablo and Rupert,
>
> I'm sorry to post an incomplete email just now. Please ignore the
> previous email.
>
> No problem. I figured it was an accidental ctrl+enter.
>
> Thanks a lot for your instructions! According to your comments, let me
> summarise the current status of the services mapped to the four stages:
> (1) Spotting, (2) Candidate Selection, (3) Disambiguation, (4)
> Filtering
> /annotate: (1), (2), (3) first candidate, (4)
> /candidate: (1), (2), (3) all candidates
> /disambiguate: (3)
> /feedback: not implemented
> Please let me know if the previous summary is incorrect.
>
> Correct.
>
> However, in Apache Stanbol each Enhancement Engine in an Enhancement
> Chain handles a single task (Rupert, is that true?). The
> functions of Enhancement Engines are not supposed to overlap.
> We need to adjust the services of DBpedia Spotlight as follows:
> /spot: (1), to be implemented in this project, for DBpediaSpotlightSpotEngine
>
> It is likely that we will implement /spot for release v0.6, which may happen
> before GSoC starts.
>
> /candidate: (2), to be refactored from current status, for
> DBpediaSpotlightCandidateEngine
> /disambiguate: (3), to be refactored from current status, for
> DBpediaSpotlightDisambiguateEngine
>
> We would probably provide a wrapper, rather than a refactored version.
>
> /filter: (4), to be implemented in this project, for
> DBpediaSpotlightFilterEngine
> As to /annotate, I think it's a complicated service which is not
> applicable for Apache Stanbol's "single task for each Enhancement
> Engine" requirement. But we can retain it for DBpedia Spotlight for
> other users (i.e. not for Apache Stanbol).
>
> Sounds like /annotate would be an enhancement chain.
>
> The /feedback API could be interesting, which I'd like to try to
> implement. More details should be discussed beforehand. However, I'm
> not sure there's enough time to complete it in this two-month summer.
>
> I don't feel like wrapping DBpedia Spotlight classes is enough for a
> summer-long coding project.
> You should include the /feedback API in your project to make it stronger.
> This API should take in feedback from any CMS, as Stanbol is CMS-agnostic.
> It should be able to store the feedback and later let engines query it, in
> order to learn from their mistakes.
> You could think, for example, about filtering implementations that would use
> feedback data to stop making the same mistakes.
> This is potentially the most interesting part of this project idea.
>
> If the project scopes discussed above are generally OK, I'd like to
> think about the project plan and come up with a project proposal
> draft.
>
> By the way, I have two small questions about DBpedia Spotlight Spotting
> and the Enhancement Chain:
> 1. For Pablo, it's mentioned in [3] that there are three
> implementations for Spotting: Ling Pipe Spotter, Trie Spotter, Ling
> Pipe Chunk Spotter. How does /annotate determine which implementation
> is the best for a service request? Can the user choose among
> them manually by sending different parameter(s)?
>
> We also have by now 4 other implementations. We have to update the
> documentation.
> Please see:
> http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/Mendes-Daiber-Rajapakse-Sasaki-Bizer-DBpediaSpotlight-LREC2012.pdf
>
> 2. For Rupert, could you please show me some examples of an Enhancement
> Chain? I've studied some Enhancement Engines here [1]. I can
> understand how an individual Enhancement Engine works and how to
> implement a new one. After studying [2], I find the Enhancement Chain a
> little confusing. Could you please point me to the source code of the
> implementation of a concrete Enhancement Chain? I want to know the
> data I/O interface from one Enhancement Engine to another. In other
> words, how does the output of an Enhancement Engine become the input of
> another one?
>
> Best regards,
> Siwei Yu
>
> [1] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/list.html
> [2] http://incubator.apache.org/stanbol/docs/trunk/enhancer/chains/
> [3] http://wiki.dbpedia.org/spotlight/technicaldocumentation?v=3qy
>
> On Wed, Mar 21, 2012 at 4:27 PM, Rupert Westenthaler <[email protected]> wrote:
> >>
> >> Hi Siwei Yu, Pablo
> >>
> >> see my comments inline. To make it more readable I also removed the
> >> parts of the mail that are not relevant to my comments.
> >>
> >> On Wed, Mar 21, 2012 at 12:01 AM, Pablo Mendes <[email protected]> wrote:
> >> > On Tue, Mar 20, 2012 at 4:24 PM, Siwei Yu <[email protected]> wrote:
> >> >> 2. Should I develop one Enhancement Engine containing three services,
> >> >> or three engines (i.e. each service as an engine)? It's maybe related
> >> >> to the service function granularity. What's your opinion?
> >> >
> >> > We could have one engine for each task separately, and an enhancement
> >> > chain should connect them together. We should also introduce a REST API
> >> > /spot for (1). We could perhaps make /candidates implement only (2) and
> >> > make /annotate accept a &verbose=on to act like the current /candidates
> >> > does.
> >> >
> >> > Besides all of this reorganization that has to happen, Rupert is the guy
> >> > from Stanbol that can help you position your application in that regard.
> >> >
> >>
> >> I fully agree with that.
> >>
> >> Having separate EnhancementEngines for spotting, candidate selection
> >> and disambiguation would provide a lot of additional flexibility to
> >> experienced Stanbol users as they could even use parts of the DBpedia
> >> Spotlight functionalities within their existing enhancement engines.
> >>
> >> The definition of a DBpedia Spotlight EnhancementChain ensures that
> >> typical users can use Spotlight without the need to know the inner
> >> workings. Users would just need to send enhancement requests to
> >> "http://{host}:{port}/enhancer/chain/dbpedia" assuming that the DBpedia
> >> Spotlight chain is called "dbpedia". There would even be the
> >> possibility to make the DBpedia Spotlight EnhancementChain the default
> >> enhancement chain so that requests to "/enhancer" would be processed
> >> by it.
> >>
> >> >>
> >> >> By the way, my name is Siwei Yu. I have good knowledge of semantic
> >> >> technologies, such as RDF, OWL, SPARQL. I'm also familiar with the
> >> >> mainstream Java based RDF/OWL processing tools like owlapi, Jena,
> >> >> Sesame, AllegroGraph. I have strong Java coding skills with good
> >> >> knowledge of software design patterns. My research background
> >> >> meets the requirements very well. I believe it'll be a wonderful
> >> >> summer working with the DBpedia Spotlight community.
> >> >
> >> > It would be good if you leveraged some of your Semantic Web background in
> >> > your application. The idea of a /feedback API, which receives corrections
> >> > made by the users, could fit well in this regard.
> >> >
> >>
> >> A feedback API is also something that would be interesting for the
> >> Stanbol Enhancer.
> >>
> >> best
> >> Rupert Westenthaler
> >>
> >> --
> >> | Rupert Westenthaler [email protected]
> >> | Bodenlehenstraße 11 ++43-699-11108907
> >> | A-5500 Bischofshofen
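
For anyone following the thread, the proposed one-endpoint-per-stage split and the chain URL can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stage numbers, endpoint paths, and engine names are the ones suggested in the discussion, none of them exists yet as an API, and the names PROPOSED, chain_order, and chain_url, as well as the host "localhost" and port 8080, are hypothetical and chosen just for this example.

```python
# Stage numbers and names as listed in the thread.
STAGES = {
    1: "Spotting",
    2: "Candidate Selection",
    3: "Disambiguation",
    4: "Filtering",
}

# Proposed split: one endpoint per stage, with the engine names suggested
# in the thread. This is a proposal, not an existing API.
PROPOSED = {
    "/spot": (1, "DBpediaSpotlightSpotEngine"),
    "/candidate": (2, "DBpediaSpotlightCandidateEngine"),
    "/disambiguate": (3, "DBpediaSpotlightDisambiguateEngine"),
    "/filter": (4, "DBpediaSpotlightFilterEngine"),
}


def chain_order():
    """Return the proposed engine names sorted by stage number."""
    return [engine for _, engine in sorted(PROPOSED.values())]


def chain_url(host, port, chain="dbpedia"):
    """Build the chain endpoint URL, assuming the chain is named 'dbpedia'."""
    return f"http://{host}:{port}/enhancer/chain/{chain}"


if __name__ == "__main__":
    # Hypothetical host and port, for illustration only.
    print(" -> ".join(chain_order()))
    print(chain_url("localhost", 8080))
```

The sketch only orders the engines and builds the request URL; how one engine's output reaches the next is the open question raised in the thread and is deliberately left out.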
