Hi Svetoslav,
thanks for your reply.
Adopting universal tags would be a solution for the future, but for now I need to work with OpenNLP as it currently stands :-)
That's why I need a deeper understanding of what's available today.

About summarization: I'm experimenting with Paco Nathan's implementation of TextRank (https://github.com/ceteri/textrank/). I removed all WordNet references, but there is still a difference between the noun tags in English (NN) and Spanish (NC).
So I guess other languages might differ as well.
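
To make the problem concrete, here is a minimal sketch of the kind of per-language predicate I seem to need. It is only an illustration: the names NOUN_TAG_PREFIXES, is_noun and nouns are hypothetical, the language codes are my own choice, and only the "NN" and "NC" tags come from what I have seen so far. Matching on a prefix is an assumption too, on the guess that longer tags (plural or proper-noun variants, for example) start with the same letters.

```python
# Hypothetical table of noun-tag prefixes per language. Only "NN" (English)
# and "NC" (Spanish) are taken from this thread; the language codes and the
# prefix-matching rule are assumptions made for illustration.
NOUN_TAG_PREFIXES = {
    "en": ("NN",),
    "es": ("NC",),
}


def is_noun(tag, lang):
    """Return True if `tag` looks like a noun tag for language `lang`."""
    try:
        prefixes = NOUN_TAG_PREFIXES[lang]
    except KeyError:
        # Fail loudly instead of silently treating every token as a non-noun.
        raise ValueError("no noun tags registered for language %r" % lang)
    # str.startswith accepts a tuple, so one call checks every prefix.
    return tag.upper().startswith(prefixes)


def nouns(tagged_tokens, lang):
    """Keep the tokens whose tag is a noun tag, given (token, tag) pairs."""
    return [token for token, tag in tagged_tokens if is_noun(tag, lang)]
```

With something like this, the ranking code only calls is_noun(tag, lang) and each new language is one more entry in the table, but the table itself still has to be filled in by hand for every tagset.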

Regards,

Renzo

On 24/01/2013 12:06, Svetoslav Marinov wrote:
You may look at Google's initiative for universal tagsets:

http://code.google.com/p/universal-pos-tags/

Could you share a bit more about your work in document summarization? I am
personally interested in this.

Best,
Svetoslav

On 2013-01-24 11:59, "Renzo"<[email protected]>  wrote:

Hi all,
I'm pretty new to OpenNLP.
My main interest is in producing document summaries using
algorithms such as TextRank.
This task requires sentence and token splitting - here's where OpenNLP
enters the game.
I also need some degree of POS tagging to detect nouns, verbs and so on, in
order to add some linguistic support to the ranking process.

It was fairly surprising to discover that noun tags - for example - are
language dependent. Thus an "isNoun" predicate needs a specific answer
for each language. It's "NN" for English, but it may be different for
others.

I just wonder if there is a common (i.e. language-independent) way to
answer this kind of question.

Furthermore, is the logical format of the available binary files documented
anywhere? Is there any way to browse those files to inspect the
tag list they use?
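
One way to start poking at the files might be the sketch below. It rests on an assumption that should be verified: that the model .bin files are ordinary zip archives with a few named entries inside. The function names list_model_entries and read_model_entry are hypothetical helpers, not part of OpenNLP, and the sketch only shows what is packaged in the file; it does not decode the trained model itself, so it may not reveal the full tag list.

```python
import zipfile


def list_model_entries(model_path):
    """Return the names of the entries packaged in the model file.

    Assumes the file is a zip archive; zipfile.BadZipFile is raised if not.
    """
    with zipfile.ZipFile(model_path) as archive:
        return archive.namelist()


def read_model_entry(model_path, entry_name):
    """Return one entry decoded as text, e.g. a properties or XML file.

    Bytes that are not valid UTF-8 are replaced rather than raising, since
    some entries may be binary.
    """
    with zipfile.ZipFile(model_path) as archive:
        return archive.read(entry_name).decode("utf-8", errors="replace")
```

For example, list_model_entries("en-pos-maxent.bin") would show which entries exist (the path is only an example), and any text entry could then be read with read_model_entry to look for tag names.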
Thanks,

Renzo




