Re: about fetching common-meaning tags

Jörn Kottmann Thu, 24 Jan 2013 03:42:29 -0800

On 01/24/2013 11:59 AM, Renzo wrote:

Hi all,
I'm pretty new to OpenNLP.
My interest is mostly in producing document summaries using algorithms such as TextRank. This task requires sentence and token splitting - here's where OpenNLP enters the game. I also need some degree of POS tagging to detect nouns, verbs and so on, in order to add some linguistic support to the ranking process.
It was fairly surprising to discover that noun tags - for example - are language dependent. Thus an "isNoun" predicate needs a specific answer for each language. It's "NN" for English, but it may be different for others.
I just wonder if there is a common (i.e. language-independent) way to answer this kind of question.
Furthermore, is the logical format of the available binary files documented anywhere? Is there any way to browse those files to inspect the tag list they use?

No, we did not write up a specification of our model formats. Though, you can find lots of information about it in various places. All the models are zip files, which contain simple artifacts, e.g. XML dictionaries and maxent models. You can find an explanation of the maxent model format somewhere in the maxent project, but usually that is used like a black box, because the model can't really be modified after training.
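Since the models are plain zip files, any zip tool can list what is inside one. A minimal sketch in Python, using only the standard library and not any OpenNLP API; the function name list_model_entries is made up for illustration, and which entries show up depends on the model:

```python
import zipfile


def list_model_entries(model_path):
    """Return (entry name, uncompressed size in bytes) for each artifact in the archive."""
    with zipfile.ZipFile(model_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]
```

Calling list_model_entries with the path to a downloaded model file returns one pair per artifact. Text artifacts such as XML dictionaries can then be read directly with zipfile.ZipFile.read, while the maxent model entries are better treated as opaque, as noted above.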

Let us know if you have more questions about the formats; it's probably easier if we discuss them component by component, depending on your needs.

Tokenization, sentence splitting and POS tagging are usually easy to get performing well, especially when you do some training. The existing models are mostly trained on news articles and might not perform that well on other domains.


Jörn
