Re: obtaining lang.train via DoccatConverter

Jörn Kottmann Fri, 24 Aug 2012 00:41:54 -0700

Hello,

Try a newer version of OpenNLP. If I remember correctly,
this was added in 1.5.2; otherwise, have a look at the trunk version.
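
If the goal is only sentence detector training data, a small script may be enough as a stopgap. As far as I know, the sentence detector trainer expects plain text with one sentence per line. The sketch below assumes each line of the Leipzig sentences.txt has the form "id<TAB>sentence"; that layout and the function name leipzig_to_sentences are assumptions for illustration, not part of OpenNLP, and this is not what DoccatConverter itself does.

```python
import sys


def leipzig_to_sentences(lines):
    """Yield the sentence text from lines assumed to look like 'id<TAB>sentence'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Assumed layout: a numeric id, one tab, then the sentence.
        _, sep, sentence = line.partition("\t")
        if sep and sentence.strip():
            yield sentence.strip()  # lines without a tab are dropped


def convert(src_path, dst_path):
    """Write one sentence per line to dst_path."""
    with open(src_path, encoding="utf-8") as fin, \
         open(dst_path, "w", encoding="utf-8") as fout:
        for sentence in leipzig_to_sentences(fin):
            fout.write(sentence + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python convert.py sentences.txt tr-sent.train
    convert(sys.argv[1], sys.argv[2])
```

The output file name tr-sent.train is only a placeholder. Check the file against the training format described in the manual for your OpenNLP version before using it.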


Jörn

On 08/22/2012 03:34 PM, Duygu aralıoğlu wrote:

I need to obtain training data for Turkish to use in Sentence Detector
training, in order to get tr-sent.bin, which will later be used in both
OpenNLP and wikipediaminer.

I have downloaded corpora for Turkish from
http://corpora.uni-leipzig.de/download.html. Then I used the command:

$ bin/opennlp DoccatConverter leipzig -lang tr -data Leipzig/tr100k/sentences.txt >> lang.train

However, there is no DoccatConverter tool. How can I obtain the training data
from sentences.txt? Btw, I am working with opennlp-1.5.0.

Thanks in advance...

Duygu
