Re: sentence model

Jim foo.bar Wed, 09 Jan 2013 04:12:21 -0800

On 09/01/13 08:40, ali koyuncu wrote:

And then I will save them in a binary file
such as en-sent.bin. Is that right? (InputStream modelIn = new FileInputStream("en-sent.bin");)

No, this is not accurate... You have not really read the documentation, have you? First of all you need to train a model... This is why you need the data (correctly identified sentences). Refer to the 'SentenceDetector Training' section here:

http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.sentdetect.detection

So the procedure goes something like this:

1. find or construct the data you need (turk-sent.train) *
2. train the sentence detector
3. save the trained model (turk-sent.bin)
4. finally use it

There is documentation and there are code snippets on the above website... make sure you read it first. If I were you I would copy some news articles from a Turkish newspaper and do the sentence splitting with my own eyes in order to produce the train-set...


*for English the data would be something like this (1 sentence per line):

Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate.
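
If it helps with step 1, here is a rough sketch in plain Java (no OpenNLP calls at all) of how you could turn sentences you have split by hand into that one-sentence-per-line file. The class name TrainFileWriter and its two helper methods are made up for illustration, and turk-sent.train is just the placeholder file name from the list above. It only collapses line breaks inside a sentence and skips empty entries; the actual splitting is still up to you.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: writes hand-split sentences
// to a training file with exactly one sentence per line.
public class TrainFileWriter {

    // Collapse line breaks and runs of whitespace so the sentence fits on one line.
    public static String normalize(String sentence) {
        return sentence.trim().replaceAll("\\s+", " ");
    }

    // Normalize every sentence and drop entries that end up empty.
    public static List<String> toTrainingLines(List<String> sentences) {
        List<String> lines = new ArrayList<>();
        for (String s : sentences) {
            String line = normalize(s);
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Sentences identified by hand; the first one contains a stray line break.
        List<String> sentences = Arrays.asList(
            "Pierre Vinken, 61 years old, will join the board as a nonexecutive director\nNov. 29.",
            "Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.");

        Path out = Paths.get("turk-sent.train"); // placeholder file name
        Files.write(out, toTrainingLines(sentences), StandardCharsets.UTF_8);
    }
}
```

Writing the file as UTF-8 is my assumption here; whatever encoding you pick, use the same one when you train.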


Hope that helps,

Jim
