It probably depends on your target data more than anything. If you are looking at newswire you'll have different requirements than if you are looking at e-mail. The other thing to consider is the cost of getting it wrong: you could conceivably get good part-of-speech tagging results even without great sentence boundaries, but if you are trying to run a deep parser on top, it could break down a lot.
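
As a rough illustration of what a rule-based splitter can look like (this sketch is not from the thread; the class name and the regex rule are just assumptions), a few lines of plain Java are enough to produce sentence strings that a downstream POS tagger could consume:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Minimal rule-based sentence splitter (hypothetical name): breaks on
// '.', '!' or '?' followed by whitespace and an upper-case letter or
// opening quote/paren. Real rules would also need abbreviation handling.
public class SimpleSentenceSplitter {

    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=[.!?])\\s+(?=[A-Z\"'(])");

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : BOUNDARY.split(text)) {
            String trimmed = s.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String doc = "Dr. Smith arrived at 9 a.m. He left early. Did anyone notice?";
        for (String s : split(doc)) {
            System.out.println(s);
        }
    }
}

Note that on abbreviation-heavy text like newswire this naive pattern over-splits (it breaks right after "Dr." in the example), which is exactly the kind of error cost described above; fixing that with rules means maintaining an abbreviation list, whereas a trained model picks much of it up from data.
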
-----Original Message-----
From: jonathan doklovic [mailto:[EMAIL PROTECTED]
Sent: Friday, December 14, 2007 12:55 PM
To: UIMA User
Subject: Sentence Rules vs. Models

Hi,

I've been playing around with the OpenNLP wrappers and will probably make use of the entity detection, but I was wondering about the sentence and token detection.

It seems that a model (statistical) based approach may be overkill and more of a pain to correct errors in. I was wondering if there's any reason not to use a rule-based sentence/token detector that then feeds the OpenNLP POS and entity model-based annotators?

Any thoughts are welcome.

- Jonathan
