Re: POS tagging in Lucene

2016-10-19 Thread Tommaso Teofili
I think it might be helpful to handle POS tags as TypeAttributes, so that the input and output texts would be cleaner and you can still filter and retrieve tokens by type (e.g. with TypeTokenFilter). My 2 cents, Tommaso On Wed, 19 Oct 2016 at 11:56, Niki Pavlopoulou wrote
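To illustrate the idea of keeping the tag out of the token text, here is a minimal plain-Java sketch. It does not use the Lucene API: the `TypedTokens` class, its `Token` holder, and the `parse` and `termsOfType` helpers are hypothetical names made up for this example, and the fallback type "word" for untagged input is just an illustrative choice. It only models what storing the POS tag as a token type and then filtering by type would look like.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TypedTokens {
    // Hypothetical holder: the surface term plus its POS tag kept as a separate "type".
    static final class Token {
        final String term;
        final String type;

        Token(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    // Splits "word[TAG]" into term "word" and type "TAG".
    // Input without a trailing "[TAG]" keeps its text and gets the type "word".
    static Token parse(String tagged) {
        int open = tagged.lastIndexOf('[');
        if (open > 0 && tagged.endsWith("]")) {
            String term = tagged.substring(0, open);
            String type = tagged.substring(open + 1, tagged.length() - 1);
            return new Token(term, type);
        }
        return new Token(tagged, "word");
    }

    // Returns the clean terms of all tokens whose type is in allowedTypes.
    static List<String> termsOfType(List<String> taggedTokens, List<String> allowedTypes) {
        List<String> out = new ArrayList<>();
        for (String tagged : taggedTokens) {
            Token token = parse(tagged);
            if (allowedTypes.contains(token.type)) {
                out.add(token.term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("I[PRP]", "am[VBP]", "using[VBG]", "Lucene[NNP]");
        // Keep only pronouns and proper nouns; the terms come back without the tag text.
        System.out.println(termsOfType(input, Arrays.asList("PRP", "NNP")));
    }
}
```

In an actual analyzer the same split would presumably happen inside a custom token filter that sets the type on each token, with the type-based selection left to TypeTokenFilter.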

Re: POS tagging in Lucene

2016-10-19 Thread Niki Pavlopoulou
Hi Steve, thank you for your answer. I created a custom Lucene Analyser in the end. Just to clarify what I mean: Lucene works perfectly for pure words, but since it does not support POS tagging, a workaround is needed for the analysis of tokens with POS tags. For example: Input

Re: POS tagging in Lucene

2016-10-18 Thread Steve Rowe
Hi Niki,
> On Oct 18, 2016, at 7:27 AM, Niki Pavlopoulou wrote:
>
> Hi all,
>
> I am using Lucene and OpenNLP for POS tagging. I would like to support
> biGrams with POS tags as well. For example, I would like something like
> that:
>
> Input: (I[PRP], am[VBP], using[VBG],

POS tagging in Lucene

2016-10-18 Thread Niki Pavlopoulou
Hi all, I am using Lucene and OpenNLP for POS tagging. I would like to support biGrams with POS tags as well. For example, I would like something like that:
Input: (I[PRP], am[VBP], using[VBG], Lucene[NNP])
Output: (I[PRP] am[VBP], am[VBP] using[VBG], using[VBG] Lucene[NNP])
The problem above
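For reference, the Input/Output pair in the question can be reproduced with a few lines of plain Java. This is only a sketch of the desired transformation, not Lucene code: the `PosBigrams` class and its `bigrams` method are hypothetical names, and the tokens are assumed to arrive already tagged in the `word[TAG]` form shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PosBigrams {
    // Joins each pair of adjacent tagged tokens with a single space.
    // A list with fewer than two tokens yields no bigrams.
    static List<String> bigrams(List<String> taggedTokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < taggedTokens.size(); i++) {
            out.add(taggedTokens.get(i) + " " + taggedTokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("I[PRP]", "am[VBP]", "using[VBG]", "Lucene[NNP]");
        // prints [I[PRP] am[VBP], am[VBP] using[VBG], using[VBG] Lucene[NNP]]
        System.out.println(bigrams(input));
    }
}
```

The sketch treats each tagged token as an opaque string, so it says nothing about how the tags survive tokenization inside an analysis chain.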