You might try the sentence boundary detector from the International Components for Unicode project:
http://icu-project.org/userguide/boundaryAnalysis.html

This implements the rules from Unicode Standard Annex #29 (expressed as regular expressions), and it also detects boundaries for characters, words, and lines. I haven't tried it myself, so I don't know how well it works. However, the documentation does say that these are relatively simple rules and that some applications may require more sophisticated linguistic analysis. On the other hand, the rules cover many languages.

Greg Holmberg

-------------- Original message ----------------------
From: jonathan doklovic <[EMAIL PROTECTED]>
> Hi,
>
> I've been playing around with the OpenNLP wrappers and will probably
> make use of the entity detection, but I was wondering about the sentence
> and token detection.
>
> It seems that a model-based (statistical) approach may be overkill and
> more of a pain to correct errors in.
>
> I was wondering if there's any reason not to use a rule-based
> sentence/token detector that then feeds the OpenNLP POS and entity
> model-based annotators?
>
> Any thoughts are welcome.
>
> - Jonathan
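For what it's worth, a minimal sketch of this kind of rule-based sentence splitting, using the JDK's built-in java.text.BreakIterator (whose sentence rules follow the same UAX #29 style of boundary analysis that ICU implements; the class name SentenceSplitter is just for illustration):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {

    // Split text into sentences using the JDK's rule-based sentence
    // BreakIterator; no statistical model or training data required.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<String>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE;
                start = end, end = it.next()) {
            out.add(text.substring(start, end).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you? Fine!";
        for (String s : sentences(text, Locale.US)) {
            System.out.println(s);
        }
    }
}
```

The resulting sentence spans could then be handed off to the model-based POS and entity annotators, which is the hybrid pipeline Jonathan describes. Note that plain UAX #29-style rules will still mis-split around some abbreviations (e.g. "Dr. Smith"), which is where the "more sophisticated linguistic analysis" caveat comes in.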
