Re: getting original indexes from JCas Text (Stanford NER and UIMA)

Christopher Manning Sat, 07 Nov 2009 13:20:41 -0800

Jörn Kottmann <kottm...@...> writes:
> 
> Julio C. wrote:
> > Hi everybody,
> >
> > I'm working with the Stanford NER and UIMA and I was wondering if there's an
> > easy and clean way to get the begin/end position of each word in the original
> > JCas document (from .getDocumentText()) after it has been converted into
> > List<List<CoreLabel>> and processed by the NER.
> >   
> Maybe you can keep an array of your word annotations and
> then use the absolute index of a CoreLabel to map back to
> the word annotation, which can then be used to retrieve its
> offset and length.
> 
> Otherwise you could use a map from each CoreLabel to its
> word annotation.
> 
> Jörn


Hi Julio,

I'm not sure of the UIMA end of things (whose UIMA wrapper of Stanford NER
are you using? Florian Laws'?).

But the CoreLabel objects can store begin and end character offsets.  A
CoreLabel is just a map.  So if the wrapper doesn't already do so, it can be
adapted to store the character offsets (under a key such as
CharacterOffsetStart), and then you can read them back from the output.
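
As a rough illustration of the idea, here is a minimal sketch in plain Java.
It does not use the Stanford classes: each token is a java.util.Map standing
in for a CoreLabel, the whitespace tokenizer is only a placeholder for
whatever the wrapper really does, and the class name OffsetSketch and the key
CharacterOffsetEnd are made up for the example.  The point is only that the
offsets are recorded when the tokens are created and are still there after
tagging.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each token is a plain map, playing the role of a CoreLabel.
class OffsetSketch {
    // Key names are illustrative assumptions, not the library's actual keys.
    static final String TEXT = "Text";
    static final String BEGIN = "CharacterOffsetStart";
    static final String END = "CharacterOffsetEnd";

    // Split on whitespace, recording each token's begin (inclusive) and
    // end (exclusive) character offsets in the original document text.
    static List<Map<String, Object>> tokenize(String doc) {
        List<Map<String, Object>> tokens = new ArrayList<>();
        int i = 0;
        int n = doc.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(doc.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int begin = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(doc.charAt(i))) {
                i++;
            }
            Map<String, Object> token = new HashMap<>();
            token.put(TEXT, doc.substring(begin, i));
            token.put(BEGIN, begin);
            token.put(END, i);
            tokens.add(token);
        }
        return tokens;
    }

    // Recover the original span of a token from the document text,
    // using only the offsets stored on the token.
    static String originalSpan(String doc, Map<String, Object> token) {
        int begin = (Integer) token.get(BEGIN);
        int end = (Integer) token.get(END);
        return doc.substring(begin, end);
    }
}
```

With offsets stored this way (end exclusive), a begin/end pair read from a
token after tagging can be passed straight to substring() on the text
returned by getDocumentText(), or used as the begin/end of a UIMA annotation.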

Chris.
