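You may want to look at Google's universal POS tagset project, which maps language-specific tagsets onto a small common set of categories (NOUN, VERB, and so on): http://code.google.com/p/universal-pos-tags/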
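To make the idea concrete, here is a minimal sketch of a language-independent "isNoun" check built on such a mapping. The tagset keys ("en-ptb", "de-stts"), the function names, and the handful of tag entries are my own illustrative assumptions, not the project's actual files or any OpenNLP API; the real mapping files cover many more tags.

```python
# Partial, hand-written excerpts of tag mappings (illustrative only).
# The keys "en-ptb" and "de-stts" are hypothetical labels for the
# Penn Treebank (English) and STTS (German) tagsets.
UNIVERSAL_MAP = {
    "en-ptb": {
        "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
        "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
        "JJ": "ADJ",
    },
    "de-stts": {
        "NN": "NOUN", "NE": "NOUN",
        "VVFIN": "VERB",
        "ADJA": "ADJ",
    },
}


def to_universal(tag, tagset):
    """Map a language-specific tag to a universal one; unknown tags become "X"."""
    return UNIVERSAL_MAP[tagset].get(tag, "X")


def is_noun(tag, tagset):
    """Answer "is this a noun?" without hard-coding "NN" in the ranking code."""
    return to_universal(tag, tagset) == "NOUN"
```

With this in place the ranking code only ever sees the universal categories, e.g. is_noun("NNS", "en-ptb") and is_noun("NE", "de-stts") are answered by the same predicate, and supporting another language means adding one more mapping table.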
Could you share a bit more about your work in document summarization? I am personally interested in this.

Best,
Svetoslav

On 2013-01-24 11:59, "Renzo" <[email protected]> wrote:

>Hi all,
>I'm pretty new to OpenNLP.
>My interest is almost related to fetch document summaries using
>algorithms such as TextRank.
>This task requires sentence and token splitting - here's where OpenNLP
>enters the game.
>I also need some degree of POS to detect nouns, verbs and so on, in
>order to add some linguistic support to the ranking process.
>
>It was fairly surprising to discover that noun tags - for example - are
>language dependent. Thus an "isNoun" predicate needs a specific answer
>for each language. It's "NN" for English, but it may be different for
>others.
>
>I just wonder if there is a common (e.g. language-independent) way to
>answer such a kind of questions.
>
>Furthermore, is the logical format of available binary files documented
>anywhere ? Is there any way to browse those files to inspect the used
>tag list ?
>Thanks,
>
>Renzo
