Re: OpenNLP for Indonesian Language Processing

Jörn Kottmann Tue, 03 May 2011 04:54:22 -0700

On 5/3/11 1:24 PM, Muhammad Dhito wrote:

Hi,


I have been working on OpenNLP recently for my final project. I'm
trying to adapt OpenNLP for Indonesian language processing. However, I'm
only adapting four components: the sentence detector, tokenizer,
part-of-speech tagger, and chunker.

Is it enough if I just provide Indonesian models so that I can use
OpenNLP to process Indonesian text?

It would of course be nice if you provided the models to others. We might not be able
to redistribute them here, but maybe you can put them somewhere.

Which corpus do you train on? If it is publicly available, it would be nice
to add support in OpenNLP for reading it directly, as we have already done for a couple
of other corpora. Your contribution here would be very welcome.
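As a rough illustration of the kind of conversion involved, here is a minimal Python sketch for the POS tagger. It assumes a hypothetical corpus with one sentence per line and tokens written as word/TAG; OpenNLP's POS tagger training data instead uses one sentence per line with each token written as word_TAG. The function names and the example tags are hypothetical, and a real format reader inside OpenNLP would be written in Java.

```python
from typing import Iterable, Iterator


def convert_line(line: str, sep: str = "/") -> str:
    """Convert one sentence of hypothetical 'word/TAG' tokens to OpenNLP's 'word_TAG' form."""
    out = []
    for token in line.split():
        # Split on the LAST separator so words containing '/' (e.g. "1/2") stay intact.
        word, found, tag = token.rpartition(sep)
        if not found or not word or not tag:
            raise ValueError(f"malformed token: {token!r}")
        out.append(f"{word}_{tag}")
    return " ".join(out)


def convert_corpus(lines: Iterable[str]) -> Iterator[str]:
    """Yield one converted sentence per non-blank input line."""
    for line in lines:
        if line.strip():
            yield convert_line(line)
```

For example, `convert_line("Saya/PRP makan/VB nasi/NN ./.")` returns `"Saya_PRP makan_VB nasi_NN ._."`. Splitting on the last separator means a token such as `1/2/CD` becomes `1/2_CD` rather than being rejected.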

Should I make some changes to
OpenNLP's source code to account for Indonesian grammar, for example by adding some
language-specific features?


Maybe you will get better results with language-specific features. We should
support that, and we have already taken first steps to make it easier, e.g. the language
is stored inside our models.

Please feel free to propose new features that are specific to Indonesian, and we
will see how they could be integrated.
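To make "language-specific features" a little more concrete: Indonesian marks much of its morphology with affixes (for example the prefixes ber- and di-, the prefix meN- that surfaces as me-, mem-, men-, meng- or meny-, and the suffixes -kan, -an and -i), so affix indicators are one plausible candidate. The Python sketch below only shows what such feature strings might look like. The affix lists are a small, incomplete sample, the `pre=`/`suf=` naming and the minimum stem length are hypothetical choices, and it does not use or represent any OpenNLP API.

```python
# Hypothetical, non-exhaustive affix samples; longest first so "meng" wins over "me".
PREFIXES = sorted(("meng", "meny", "mem", "men", "me", "ber", "ter", "di", "ke", "se", "pe"),
                  key=len, reverse=True)
SUFFIXES = sorted(("kan", "nya", "lah", "kah", "an", "i"), key=len, reverse=True)

# Hypothetical heuristic: require more than this many characters left after the affix.
MIN_STEM = 2


def affix_features(token: str) -> list:
    """Return feature strings for the longest matching prefix and suffix of a token."""
    word = token.lower()
    features = []
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p) + MIN_STEM:
            features.append(f"pre={p}")
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s) + MIN_STEM:
            features.append(f"suf={s}")
            break
    return features
```

For instance, `affix_features("membacakan")` yields `["pre=mem", "suf=kan"]`, while a short word like `"meja"` gets no features because of the stem-length check. Inside OpenNLP, features like these would have to be produced by the relevant component's feature generation code in Java.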

Thanks,
Jörn
