On 09/01/13 08:40, ali koyuncu wrote:
> And then I will save them in a binary file
> such as en-sent.bin. Is that right? (InputStream modelIn = new FileInputStream("en-sent.bin");)
No, this is not accurate... You have not really read the documentation,
have you? First of all, you need to train a model. This is why you need
the data (correctly identified sentences). Refer to the 'Sentence
Detector Training' section here:
http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.sentdetect.detection
So the procedure goes something like this:
1. find or construct the data you need (turk-sent.train) *
2. train the sentence detector
3. save the trained model (turk-sent.bin)
4. finally use it
There are documentation and code snippets on the above website... make
sure you read them first. If I were you, I would copy some news articles
from a Turkish newspaper and do the sentence splitting with my own eyes
in order to produce the training set...
*for English the data would be something like this (1 sentence per line):
Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate.
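
If you split the sentences by hand, one easy mistake is letting a long sentence wrap onto a second line, which the trainer would then treat as two sentences. Here is a rough sketch of how you could write the train-set so that each sentence really ends up on its own line. It uses only the standard Java library; the class name SentenceTrainFile and its two methods are made up for illustration and are not part of OpenNLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an OpenNLP class): writes hand-split sentences
// to a training file with exactly one sentence per line.
final class SentenceTrainFile {

    // Collapse every run of whitespace, including line breaks left over from
    // wrapped text, into a single space so the sentence fits on one line.
    static String toTrainingLine(String sentence) {
        return sentence.trim().replaceAll("\\s+", " ");
    }

    // Normalize each sentence, skip blank entries, and write the lines as UTF-8.
    // Returns the lines that were written, in order.
    static List<String> write(Path target, List<String> sentences) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String s : sentences) {
            String line = toTrainingLine(s);
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
        return lines;
    }
}
```

You would call something like SentenceTrainFile.write(Paths.get("turk-sent.train"), sentences) with your hand-split Turkish sentences, and then feed that file to the training step described in the manual. Writing the file as UTF-8 is my assumption here; whatever encoding you choose, use the same one when you train.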
Hope that helps,
Jim