TokenizerMEEvaluatorTool.java

Jörn Kottmann Tue, 12 Jul 2011 06:35:32 -0700

On 7/12/11 3:11 PM, [email protected] wrote:

Added: incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/BasicEvaluationParameters.java
URL:http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/BasicEvaluationParameters.java?rev=1145578&view=auto
==============================================================================
--- incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/BasicEvaluationParameters.java (added)
+++ incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/BasicEvaluationParameters.java Tue Jul

...

+
+  @ParameterDescription(valueName = "charsetName", description = "specifies the encoding which should be used for reading and writing text")
+  @OptionalParameter(defaultValue="UTF-8")
+  Charset getEncoding();


We should decide how we handle this, and do it consistently.
The trainers declare it as a mandatory parameter, the evaluators declare
it as optional now and take UTF-8 as default.

In my opinion we should either force the user to specify it, so that he
has to think about the encoding, or use the platform default encoding,
because that is the default a user would expect by convention, since
software tools usually operate with the platform default encoding.

Or is there a good reason to use UTF-8 as a default?
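
To make the three options concrete, here is a minimal sketch of how each
one would resolve the encoding. It is only an illustration: the names
EncodingPolicy, Policy and resolve are hypothetical and not part of
OpenNLP, and the real tools read the value through the parameter
interfaces rather than a helper like this.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not OpenNLP code.
public class EncodingPolicy {

    // The three ways of handling a missing encoding argument.
    public enum Policy { MANDATORY, PLATFORM_DEFAULT, UTF8_DEFAULT }

    // encodingArg is the value the user passed, or null if it was omitted.
    public static Charset resolve(String encodingArg, Policy policy) {
        if (encodingArg != null) {
            // An explicit value always wins, whatever the policy is.
            return Charset.forName(encodingArg);
        }
        switch (policy) {
            case MANDATORY:
                // Force the user to think about the encoding.
                throw new IllegalArgumentException("the encoding must be specified");
            case PLATFORM_DEFAULT:
                // Whatever the JVM reports as its default charset.
                return Charset.defaultCharset();
            default:
                // Fixed default, independent of the machine.
                return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, Policy.PLATFORM_DEFAULT));
        System.out.println(resolve(null, Policy.UTF8_DEFAULT));
        System.out.println(resolve("ISO-8859-1", Policy.MANDATORY));
    }
}
```

With the fixed default the result is the same on every machine; with
the platform default it depends on how the JVM and the operating system
are configured.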

I know that this decision is difficult to get right. As far as I know,
we have been criticized for the current approach because people don't
want to pass the encoding parameter all the time.

Jörn

Re: svn commit: r1145578 - in /incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline: BasicEvaluationParameters.java sentdetect/SentenceDetectorEvaluatorTool.java tokenizer/TokenizerMEEvaluatorTool.java
