Hi all,

> I think you should start with a really simple solution for this and then
> improve this first simple algorithm.


This was exactly the approach taken by the DBpedia Spotlight project. We
have built a few entity linkers (a.k.a. disambiguators) based on Lucene
first, and started incrementally making them more sophisticated. If you are
a fan of not repeating work, please feel free to look at what we've done.

http://spotlight.dbpedia.org

Our disambiguators will be integrated as EnhancementEngines in Stanbol
within the next couple of months.

If you're a fan of reimplementing things to make them better, I'd say you
should look elsewhere. There are some interesting approaches out there that
have not been open sourced, but that have papers describing their
algorithms. Implementing them would probably be more beneficial for the
community than reimplementing what we did.

Cheers,
Pablo

On Mon, Apr 23, 2012 at 3:56 PM, kritarth anand <[email protected]> wrote:

> Thanks a lot, Fabian, for your input. I'll definitely incorporate it into my
> proposal.
>
> On Mon, Apr 23, 2012 at 7:23 PM, Fabian Christ <[email protected]> wrote:
>
> > Hi Kritarth,
> >
> > I have read your proposal and building such a disambiguation engine is
> > a challenging task. Here are some thoughts:
> >
> > - Have you thought about restricting the domain, or the kind of text
> > that this engine should work best for? It is often the case that you
> > cannot implement a single engine that always works well. So maybe you
> > should think a little about the kind of content that you would like
> > to support.
> >
> > - Do you have access to any scientific network? Perhaps looking in the
> > scientific world for published papers about entity disambiguation may
> > give you some ideas and would widen your view on the problem.
> >
> > - I think you should start with a really simple solution for this and
> > then improve this first simple algorithm. Having a trivial solution
> > gives you a baseline to compare against. Sometimes the advanced
> > algorithms turn out to be no better than the trivial ones. So try it ;)
> >
> > Best,
> >  - Fabian
> >
> > On 18 April 2012 11:02, kritarth anand <[email protected]> wrote:
> > > Hi guys,
> > >
> > > Hope you're doing well. I was advised by my supervisor, Dr. Rupert, that
> > > to interest people in my application I should provide a short summary of
> > > my proposal. Please have a look at it below, in case you find it
> > > interesting or would like to suggest something. You may read
> > > the entire documents
> > >
> > > My proposal is Entity Disambiguation as an Enhancement Engine in Stanbol.
> > > You can have a look at its JIRA page,
> > > https://issues.apache.org/jira/browse/STANBOL-223 . I propose to build it
> > > during the summer as part of Google Summer of Code. Any advice from you
> > > guys is most welcome.
> > >
> > > Kritarth
> > >
> > > On Tue, Apr 17, 2012 at 8:36 PM, kritarth anand <[email protected]> wrote:
> > >
> > >> Hi Guys,
> > >>
> > >> Hope you're doing well. Please take a few minutes to have a look at my
> > >> proposal. Your feedback is extremely valuable to me.
> > >>
> > >> Kritarth
> > >>
> > >>
> > >> On Mon, Apr 16, 2012 at 12:23 AM, kritarth anand <[email protected]> wrote:
> > >>
> > >>> Dear Fabian,
> > >>>
> > >>> Thanks for pointing it out.
> > >>>
> > >>> @All
> > >>>
> > >>> I have attached the PDF versions of my proposal and background info to
> > >>> this mail. You may also find the proposal in this Google Document:
> > >>>
> > >>> https://docs.google.com/document/d/1BA0x9craA2kiFn0jM-66HSS7SFCk5Q5U5gyEWaRftIk/edit
> > >>>
> > >>> It is editable, so you can add comments directly there and build on
> > >>> someone else's advice too. You can also mail me at any time.
> > >>>
> > >>> Kritarth Anand
> > >>>
> > >>> On Mon, Apr 16, 2012 at 12:13 AM, Fabian Christ <[email protected]> wrote:
> > >>>
> > >>>> Hi Kritarth,
> > >>>>
> > >>>> and welcome to Stanbol. Could you share the proposal in an open
> > >>>> format like PDF, HTML, or plain text, or via a URL? Not all of us have
> > >>>> access to the newest M$ office suite.
> > >>>>
> > >>>> Thanks, and looking forward to your contribution!
> > >>>>
> > >>>> Best,
> > >>>>  - Fabian
> > >>>>
> > >>>> On 15 April 2012 10:21, kritarth anand <[email protected]> wrote:
> > >>>> > Hi,
> > >>>> >
> > >>>> > I would like to convey my warm greetings to the entire Stanbol
> > >>>> > community. My name is Kritarth Anand. I study Computer Science at the
> > >>>> > Indian Institute of Technology Delhi. I am a potential candidate working
> > >>>> > on “Entity disambiguation in Stanbol enhancement engines” as part of
> > >>>> > Google Summer of Code. If I am successful, I'll be coordinating with you
> > >>>> > guys.
> > >>>> >
> > >>>> >
> > >>>> > I am writing to you all to request some feedback on my proposal, which
> > >>>> > I have provided below. Your suggestions could help me improve the
> > >>>> > proposal, incorporate details, omit unnecessary ones, and arrive at a
> > >>>> > more realistic timeline than the one I have suggested.
> > >>>> >
> > >>>> >
> > >>>> > Please feel free to discuss any matters whenever you like. I have
> > >>>> > attached two documents to this mail. One of the two is the proposal
> > >>>> > itself and the other gives a few details about my background.
> > >>>> >
> > >>>> >
> > >>>> > Kritarth Anand
> > >>>> >
> > >>>> > www.cse.iitd.ac.in/~cs5080213
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Fabian
> > >>>> http://twitter.com/fctwitt
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
> >
> > --
> > Fabian
> > http://twitter.com/fctwitt
> >
>
