Thanks for the answers.

Unfortunately, there is still no publicly available Indonesian corpus.
My lecturer and I have been trying to create our own.

Regarding language-specific features, where can I implement them in
OpenNLP? I mean, in which class exactly?

Thanks,
Dhito


On 5/3/11, Jörn Kottmann <[email protected]> wrote:
> On 5/3/11 1:24 PM, Muhammad Dhito wrote:
>> Hi,
>>
>> I have been working on OpenNLP recently for my final project. I'm
>> trying to adapt OpenNLP for Indonesian language processing, but I'm
>> adapting only four components: sentence detector, tokenizer,
>> part-of-speech tagger, and chunker.
>>
>> Is it enough to provide just the Indonesian models so that I can use
>> OpenNLP to process Indonesian text?
>
> It is of course nice if you provide the models to others. We might not
> be able to redistribute them here, but maybe you can just put them
> somewhere.
>
> On which corpus do you train? If it is publicly available, it would be
> nice to add support for parsing it directly to OpenNLP, like we did with
> a couple of corpora already. Your contribution here would be very welcome.
>
>> Should I make some changes in
>> OpenNLP's source code according to Indonesian grammar by adding some
>> language-specific features?
>>
>
> Maybe you get better results with language-specific features. We should
> support that, and we already took first steps to make that easier, e.g.
> the language is stored inside our models.
>
> Please feel free to propose new features which are specific to
> Indonesian; we will see how they could be integrated.
>
> Thanks,
> Jörn
>
>
