On 8/2/11 7:42 PM, [email protected] wrote:
Hi,

For the application I am developing, it is important to know the head of a
chunk.

I added a * to the chunk tag to mark tokens that are the head of the phrase.
For example I have:

Me pron-pers *B-NP
pergunto v-fin B-VP
sempre adv *B-ADVP
quem pron-indp *B-NP
podia v-fin B-VP
ter v-inf I-VP
sido v-pcp I-VP
aquele pron-det B-NP
jovem adj I-NP
alemão n *I-NP
. . O

It is working OK and the F-1 is almost the same as if there were no head
mark.
But I have some issues. With this mark, the method Chunker.chunkAsSpans() and
the UIMA Chunker do not work properly because the current implementation
does not know how to handle the * while computing the spans.
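
To make the span problem concrete, here is a minimal sketch of span
computation that tolerates the * prefix. It is plain Java and does not use
OpenNLP; the class HeadAwareSpans, its nested Chunk type, and the method
toChunks are hypothetical names chosen for illustration. It assumes the
marker is always a single leading * on the tag, as in the example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of OpenNLP: BIO chunks from head-marked tags. */
final class HeadAwareSpans {

    /** A chunk over tokens [start, end); head is a token index, or -1 if unmarked. */
    static final class Chunk {
        final int start;
        final int end;
        final String type;
        final int head;

        Chunk(int start, int end, String type, int head) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.head = head;
        }
    }

    /** Groups tags such as "*B-NP", "I-NP", "O" into chunks and records head positions. */
    static List<Chunk> toChunks(String[] tags) {
        List<Chunk> chunks = new ArrayList<>();
        int start = -1;
        int head = -1;
        String type = null; // type of the currently open chunk, or null

        for (int i = 0; i < tags.length; i++) {
            // Assumption: a leading '*' marks the head token of its chunk.
            boolean isHead = tags[i].startsWith("*");
            String tag = isHead ? tags[i].substring(1) : tags[i];

            boolean continues = tag.startsWith("I-")
                    && type != null
                    && tag.substring(2).equals(type);

            if (!continues) {
                // Close the open chunk, if any.
                if (type != null) {
                    chunks.add(new Chunk(start, i, type, head));
                }
                type = null;
                head = -1;
                // "B-X" opens a chunk; a stray "I-X" is leniently treated the same way.
                if (tag.startsWith("B-") || tag.startsWith("I-")) {
                    start = i;
                    type = tag.substring(2);
                }
            }

            // If several tokens of one chunk are marked, the last one wins.
            if (isHead && type != null) {
                head = i;
            }
        }

        if (type != null) {
            chunks.add(new Chunk(start, tags.length, type, head));
        }
        return chunks;
    }
}
```

For the sentence above this yields six chunks; the final NP covers
"aquele jovem alemão" with the head index pointing at "alemão", and the
chunks whose tags carry no * get a head index of -1.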

I would like to ask you if adding it to OpenNLP is a good idea. If yes, I
would change the trunk code to handle this head symbol, or maybe you could
give me some advice on how to do that without changing the current
implementation.


Would it be available for other languages as well?
Maybe most people who need this would just use the parser.

Anyway, it should be easy to extend the ChunkerME class so that modifying
the labels as you did is possible without changing the OpenNLP code.
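
A rough sketch of that idea, written as a wrapper rather than a subclass so
that it runs without OpenNLP on the classpath. The names HeadMarkSplitter
and Result are hypothetical, and the BiFunction is only a stand-in for
whatever call produces the tag array (presumably something like
ChunkerME.chunk(tokens, posTags) on a model trained with the * labels). The
wrapper hands back standard B-/I-/O tags plus a separate head flag per
token, so existing span code never sees the marker.

```java
import java.util.function.BiFunction;

/** Hypothetical wrapper, not OpenNLP API: separates head marks from chunk tags. */
final class HeadMarkSplitter {

    /** Standard chunk tags plus one head flag per token. */
    static final class Result {
        final String[] tags;
        final boolean[] heads;

        Result(String[] tags, boolean[] heads) {
            this.tags = tags;
            this.heads = heads;
        }
    }

    // Stand-in for the underlying chunker: (tokens, posTags) -> raw tags.
    private final BiFunction<String[], String[], String[]> chunker;

    HeadMarkSplitter(BiFunction<String[], String[], String[]> chunker) {
        this.chunker = chunker;
    }

    /** Runs the wrapped chunker and strips the assumed leading '*' head marker. */
    Result chunk(String[] tokens, String[] posTags) {
        String[] raw = chunker.apply(tokens, posTags);
        String[] tags = new String[raw.length];
        boolean[] heads = new boolean[raw.length];

        for (int i = 0; i < raw.length; i++) {
            heads[i] = raw[i].startsWith("*");
            tags[i] = heads[i] ? raw[i].substring(1) : raw[i];
        }
        return new Result(tags, heads);
    }
}
```

The cleaned tags can then be passed to the usual span computation, while
the heads array keeps the extra information for the application.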

Jörn
