I think it might be helpful to handle POS tags as TypeAttributes, so that
the input and output texts would be cleaner and you could still filter and
retrieve tokens by type (e.g. with TypeTokenFilter).
My 2 cents,
Tommaso
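
A rough sketch of the idea above, for illustration only. It is plain Java and
does not use the Lucene API: the Token class, parse() and termsOfType() are
hypothetical names that only mimic what a TypeAttribute plus a type-based
filter would give you. The "word" fallback for untagged tokens is also an
assumption. The point is that the tag moves out of the term text and into a
separate type field, so terms stay clean but can still be selected by tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TypedTokens {
    // Hypothetical stand-in for a token carrying a term plus a type (the POS tag).
    static final class Token {
        final String term;
        final String type;

        Token(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    // Splits "using[VBG]" into term "using" and type "VBG".
    static Token parse(String tagged) {
        int open = tagged.lastIndexOf('[');
        if (open < 0 || !tagged.endsWith("]")) {
            // No tag found: fall back to a generic type (an assumption of this sketch).
            return new Token(tagged, "word");
        }
        String term = tagged.substring(0, open);
        String type = tagged.substring(open + 1, tagged.length() - 1);
        return new Token(term, type);
    }

    // Returns the clean terms of all tokens whose type is in keepTypes, in order.
    static List<String> termsOfType(List<Token> tokens, Set<String> keepTypes) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            if (keepTypes.contains(t.type)) {
                out.add(t.term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        for (String s : List.of("I[PRP]", "am[VBP]", "using[VBG]", "Lucene[NNP]")) {
            tokens.add(parse(s));
        }
        // Keep only proper nouns and gerunds; the printed terms carry no tag suffix.
        System.out.println(termsOfType(tokens, Set.of("NNP", "VBG")));
    }
}
```

In a real analysis chain the type would presumably be set by a custom
TokenFilter fed from the OpenNLP tagger, and the selection done by
TypeTokenFilter rather than by a helper like termsOfType().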
On Wed, 19 Oct 2016 at 11:56, Niki Pavlopoulou wrote:
Hi Steve,
thank you for your answer. I created a custom Lucene Analyser in the end.
Just to clarify what I mean: Lucene works perfectly for pure words, but
since it does not support POS tagging, a workaround is needed for
the analysis of tokens with POS tags. For example:
Input
Hi Niki,
> On Oct 18, 2016, at 7:27 AM, Niki Pavlopoulou wrote:
>
> Hi all,
>
> I am using Lucene and OpenNLP for POS tagging. I would like to support
> biGrams with POS tags as well. For example, I would like something like
> that:
>
> Input: (I[PRP], am[VBP], using[VBG],
Hi all,
I am using Lucene and OpenNLP for POS tagging. I would like to support
biGrams with POS tags as well. For example, I would like something like
that:
Input: (I[PRP], am[VBP], using[VBG], Lucene[NNP])
Output: (I[PRP] am[VBP], am[VBP] using[VBG], using[VBG] Lucene[NNP])
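
For reference, the Input/Output pair above can be reproduced with a few lines
of plain Java. This is only a sketch of the desired transformation, not a
Lucene analyzer: PosBigrams and bigrams() are hypothetical names, and the
tagged tokens are treated as opaque strings. Inside Lucene, ShingleFilter is
the usual way to build word n-grams from a token stream; it is not used here.

```java
import java.util.ArrayList;
import java.util.List;

public class PosBigrams {
    // Joins each adjacent pair of tagged tokens with a single space.
    // A list with fewer than two tokens yields no bigrams.
    static List<String> bigrams(List<String> taggedTokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < taggedTokens.size(); i++) {
            out.add(taggedTokens.get(i) + " " + taggedTokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("I[PRP]", "am[VBP]", "using[VBG]", "Lucene[NNP]");
        // Prints: [I[PRP] am[VBP], am[VBP] using[VBG], using[VBG] Lucene[NNP]]
        System.out.println(bigrams(input));
    }
}
```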
The problem above