The text is English.
I am using Lucene's StandardAnalyzer, and it indexes the words at the correct
positions. Can we map the token positions from OpenNLP to Lucene positions?
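
A minimal sketch of the mapping discussed in this thread, under stated
assumptions: it takes the token array that was passed to NameFinderME, treats
any token matching ".*[A-Za-z0-9]+.*" as a word, and numbers only those. It
assumes Lucene gives every word (stop words included) its own position and
drops punctuation, which is what the example positions quoted below suggest.
The class name, the method name, and the sample sentence are hypothetical, not
part of OpenNLP or Lucene.

```java
import java.util.regex.Pattern;

class TokenPositionMapper {
    // Assumption from this thread: a token is a word if it contains
    // at least one ASCII letter or digit; anything else is punctuation.
    private static final Pattern WORD = Pattern.compile(".*[A-Za-z0-9]+.*");

    // Returns, for each token index, its word position in the text,
    // or -1 when the token is punctuation and has no word position.
    static int[] toWordPositions(String[] tokens) {
        int[] positions = new int[tokens.length];
        int nextPosition = 0;
        for (int i = 0; i < tokens.length; i++) {
            if (WORD.matcher(tokens[i]).matches()) {
                positions[i] = nextPosition++;
            } else {
                positions[i] = -1;
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical tokenizer output for illustration only.
        String[] tokens = {"Barack", "Obama", ",", "the", "president", "of", "US", ",", "he"};
        int[] positions = toWordPositions(tokens);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(i + "\t" + tokens[i] + "\t" + positions[i]);
        }
    }
}
```

A name span from the name finder could then be translated by looking up its
start token index in the returned array. This only holds while both tokenizers
split words the same way; if they disagree (hyphens or apostrophes, for
example), comparing character offsets (Span offsets from tokenizePos on the
OpenNLP side, OffsetAttribute on the Lucene side) would likely be more robust.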

Tri.

On Sun, Nov 6, 2011 at 7:28 AM, James Kosin <[email protected]> wrote:

> Tri,
>
> Unfortunately, it depends on the input language.  The only thing I've found
> is that it may be better to find the tokens that are punctuation.  A hint is
> that most punctuation tokens are a single character wide.  But, again, that
> may not be the case depending on the encoding and the punctuation.  Words
> are usually a bit longer.
>
> James
>
> On 11/5/2011 2:14 PM, Tri Nguyen wrote:
> > Thank you James,
> > I don't count the token having pattern ".*[A-Za-z0-9]+.*" and, in the
> > cases I checked, it works.
> > A token that does not match that pattern may be punctuation. Is that
> > pattern enough to cover a keyword?
> > Can we integrate Lucene and OpenNLP so that the keyword positions and
> > the Named Entity positions are compatible?
> >
> >
> > On Sun, Nov 6, 2011 at 12:22 AM, James Kosin <[email protected]> wrote:
> >
> >> Tri,
> >>
> >> You could just subtract the number of punctuation tokens from the
> >> offsets you get.
> >> On 11/5/2011 1:08 PM, Tri Nguyen wrote:
> >>> On Sat, Nov 5, 2011 at 11:30 PM, Jörn Kottmann <[email protected]> wrote:
> >>>> On 11/5/11 4:53 PM, Tri Nguyen wrote:
> >>>>
> >>>>> Obama is correct, but Bill Gates is not, since NameFinderME returns
> >>>>> the token index (the position in the token array), not the keyword
> >>>>> position (the position of the keyword in the text). I want it to work
> >>>>> together with the keyword positions in Lucene.
> >>>>>
> >>>> What is a keyword position?
> >>>>
> >>> It is the order of the word in the text.
> >>> Ex:
> >>> Barack: 0
> >>> Obama: 1
> >>> president: 3
> >>> US: 5
> >>> he: 6
> >>> 1961: 11
> >>> Bill: 12
> >>>
> >>>> Jörn
> >>>>
> >>
>
>
