Thanks for the answers. Unfortunately, there is still no publicly available Indonesian corpus, so my lecturer and I have been trying to create our own.
About language-specific features, where can I implement them in OpenNLP? I mean, in which class exactly?

Thanks,
Dhito

On 5/3/11, Jörn Kottmann <[email protected]> wrote:
> On 5/3/11 1:24 PM, Muhammad Dhito wrote:
>> Hi,
>>
>> I have been working on OpenNLP recently for my final project. I'm
>> trying to adapt OpenNLP for Indonesian language processing. But, I'm
>> just adapting four components: sentence detector, tokenizer,
>> part-of-speech tagger, and chunker.
>>
>> Is it enough if I'm just providing the Indonesian model so I could use
>> OpenNLP to process Indonesian text?
>
> It is of course nice if you provide the models to others, we might not
> be able to redistribute them here, but maybe you can just put them
> somewhere.
>
> On which corpus do you train? If they are publicly available it would be
> nice to add support to parse it directly to OpenNLP like we did with a
> couple of corpora already. Your contribution here would be very welcome.
>
>> Should I make some changes in
>> OpenNLP's source code according to Indonesian grammar by adding some
>> language-specific features?
>>
>
> Maybe you get better results with language specific features, we should
> support that and already did first steps to make that easier, e.g. the
> language is stored inside our models.
>
> Please feel free to propose new features which are specific for
> Indonesian, we will see how they could be integrated.
>
> Thanks,
> Jörn
