My philosophy for the similarity component is that an engineer without a background in linguistics should be able to do text processing. He or she would install OpenNLP and call the assessRelevance(text1, text2) function without any knowledge of what is happening inside. That would significantly extend the user base of OpenNLP.

The problem domains I used for illustration are search (a standard domain for linguistic apps) and content generation (a state-of-the-art technology, in my opinion). Again, to incorporate these into their apps, users do not need to know anything about parsing, chunking, etc.

Regards
Boris
> Date: Fri, 2 Dec 2011 13:10:23 +0100
> From: [email protected]
> To: [email protected]
> Subject: Re: any hints on how to get chunking info from Parse?
>
> On 12/1/11 8:08 PM, Boris Galitsky wrote:
> > I spent last couple of weeks understanding how OpenNLP parser does
> > chunking, how chunking occurs separately in opennlp.tools.chunker, and I
> > came to conclusion that using independently trained chunker on the results
> > of parser gives significantly higher accuracy of resultant parsing, and
> > therefore makes 'similarity' component much more accurate as a result.
> > Lets look at an example (I added stars):
> > two NP& VP are extracted, but what kills similarity component is the last
> > part of the latter:
> > ****to-TO drive-NN****
> > Parse Tree Chunk list = [NP [Its-PRP$ classy-JJ design-NN and-CC the-DT
> > Mercedes-NNP name-NN ], VP [make-VBP it-PRP a-DT very-RB cool-JJ vehicle-NN
> > *******to-TO drive-NN**** ]]
> >
> > When I apply the chunker which has its own problems ( but most importantly
> > was trained independently) I can then apply rules to fix these cases for
> > matching with other sub-VP like 'to-VB'.
> > I understand it works slower that way.
> > I would propose we have two version of similarity, one that just does
> > without chunker and one which uses it (and also an additional 'correction'
> > algo ? ).
> > I have now both versions, but only the latter passes current tests.
>
> Ok, sounds good to me, but we should assume that the user can run the
> parser and chunker them self. Your similarity component simply accepts
> a parse tree in one case and a parse tree plus chunks in the other case.
>
> What do you think?
>
> Jörn
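
For illustration only, the 'correction' step discussed in the quoted message could be sketched roughly as below. This is not the code in the similarity component and it uses no OpenNLP classes: the class name ChunkCorrectionSketch, the method correctInfinitives, and the rule itself (retag a token from NN to VB when it directly follows a TO token) are all assumptions made for this example. Tokens are plain "word-TAG" strings, as in the chunk list above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: none of these names come from OpenNLP. */
final class ChunkCorrectionSketch {

    /**
     * Hypothetical correction rule: a token tagged NN that directly follows
     * a token tagged TO is retagged as VB, so "to-TO drive-NN" becomes
     * "to-TO drive-VB". Tokens without a '-' separator are passed through.
     */
    static List<String> correctInfinitives(List<String> taggedTokens) {
        List<String> fixed = new ArrayList<>(taggedTokens.size());
        boolean previousWasTo = false;
        for (String token : taggedTokens) {
            // Split on the last '-' so words containing hyphens keep their form.
            int dash = token.lastIndexOf('-');
            if (dash < 0) {
                fixed.add(token);
                previousWasTo = false;
                continue;
            }
            String word = token.substring(0, dash);
            String tag = token.substring(dash + 1);
            if (previousWasTo && tag.equals("NN")) {
                tag = "VB";
            }
            fixed.add(word + "-" + tag);
            previousWasTo = tag.equals("TO");
        }
        return fixed;
    }

    public static void main(String[] args) {
        // The VP chunk from the example in the quoted message.
        List<String> vp = List.of("make-VBP", "it-PRP", "a-DT", "very-RB",
                "cool-JJ", "vehicle-NN", "to-TO", "drive-NN");
        System.out.println(correctInfinitives(vp));
    }
}
```

A rule this simple would also retag genuine nouns after a prepositional "to" (for example "to school"), so a real correction step would presumably need the chunker output or further context to decide when it applies.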
