Carlos,
It is possible to train a sentence detector to separate sentences;
however, you will have to provide your own training set. In that
training set the text would have no punctuation, and each sentence would
be on a separate line.
Be warned: you will need a lot of training data in this case because of
the absence of punctuation.
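One way to build such a training set is to start from text that is already punctuated, split it into sentences, and then strip the punctuation so it looks like a transcript. The class below is a minimal sketch of that step, assuming plain English input; the class name `UnpunctuatedTrainingData` and its method are hypothetical helpers, not part of OpenNLP. The sentence split is a naive regex, so abbreviations such as "Dr." will be mis-split, and lower-casing is an assumption about what the target transcripts look like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class UnpunctuatedTrainingData {
    // Naive split point: whitespace that follows '.', '!' or '?'.
    private static final Pattern SENTENCE_END = Pattern.compile("(?<=[.!?])\\s+");
    // Punctuation removed to imitate transcripts; apostrophes are kept so "don't" survives.
    private static final Pattern PUNCTUATION = Pattern.compile("[.,!?;:\"]");

    /** Returns one unpunctuated, lower-cased sentence per list entry (one line each in the file). */
    public static List<String> toTrainingLines(String punctuatedText) {
        List<String> lines = new ArrayList<>();
        for (String sentence : SENTENCE_END.split(punctuatedText.trim())) {
            String bare = PUNCTUATION.matcher(sentence).replaceAll("")
                    .replaceAll("\\s+", " ")
                    .trim()
                    .toLowerCase(Locale.ROOT);
            if (!bare.isEmpty()) {
                lines.add(bare);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "Hello all. Is there any better way to do this? I don't know, really.";
        // Prints each training sentence on its own line.
        for (String line : toTrainingLines(text)) {
            System.out.println(line);
        }
    }
}
```

Writing the returned lines to a file, one per line, gives the layout described above; the original punctuated text is what tells you where the boundaries are.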
The harder part will be getting a model to add the proper punctuation.
In English, keywords such as How, When, Where, Who, and What help
identify questions. Other languages use other cues to mark questions,
statements, and exclamations.
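As a baseline before training anything, the keyword idea can be turned into a simple rule: end a segmented sentence with a question mark if its first word is a question cue, otherwise with a period. This is a minimal sketch, not an OpenNLP feature; the class `NaivePunctuator` is hypothetical, and the auxiliaries in the word list (is, are, do, can, ...) are my own assumption added alongside the wh-words mentioned above. It will be wrong for cases like "what a great day", which is exactly why a trained model is the harder but better route.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NaivePunctuator {
    // Wh-words plus a few auxiliaries (assumed) that often open English questions.
    private static final Set<String> QUESTION_WORDS = new HashSet<>(Arrays.asList(
            "how", "when", "where", "who", "what", "why", "which",
            "is", "are", "do", "does", "did", "can", "could", "will", "would"));

    /** Capitalizes the sentence and appends '?' or '.' based on its first word. */
    public static String punctuate(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return trimmed;
        }
        String firstWord = trimmed.split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
        String mark = QUESTION_WORDS.contains(firstWord) ? "?" : ".";
        // Transcripts are often lower-case, so capitalize the first letter.
        return Character.toUpperCase(trimmed.charAt(0)) + trimmed.substring(1) + mark;
    }

    public static void main(String[] args) {
        for (String s : List.of("where is the station", "the train leaves at noon")) {
            System.out.println(punctuate(s));
        }
    }
}
```

Running `main` should print "Where is the station?" followed by "The train leaves at noon."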
Hopefully you don't have to work with English, because in most cases it
isn't easy to determine sentence boundaries from grammar or word
composition alone, and English is particularly bad about that.
Good luck; it sounds like you have an interesting problem.
James Kosin
On 12/25/2015 1:15 AM, Carlos A wrote:
Hello all,
Is there any better way to separate sentences that have NO punctuation
with OpenNLP?
The sentence parser will not work in some cases.
In other words, I would like to be able to separate phrases by doing some
sort of sentence boundary disambiguation on transcripts that have no
punctuation. Then, once the sentences are separated, I would add the
punctuation: a period, or a question mark if the sentence starts as a
question.
Something like using the chunker, so that I can determine the sentences
from their chunk patterns (NP VP, NP VP NP, and so on).
Thank you.
C.