On 7/12/11 3:11 PM, [email protected] wrote:
Added:
incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/BasicEvaluationParameters.java
URL:http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/BasicEvaluationParameters.java?rev=1145578&view=auto
==============================================================================
--- incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/BasicEvaluationParameters.java (added)
+++ incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/BasicEvaluationParameters.java Tue Jul
...
+
+ @ParameterDescription(valueName = "charsetName", description = "specifies the encoding which should be used for reading and writing text")
+ @OptionalParameter(defaultValue="UTF-8")
+ Charset getEncoding();
We should decide how to handle this and then do it consistently.
The trainers declare the encoding as a mandatory parameter, while the
evaluators now declare it as optional and default to UTF-8.
In my opinion we should do one of two things. Either we force the user
to specify the encoding, so that they have to think about it, or we use
the platform default encoding, because that is what a user would expect
by convention: software tools usually operate with the platform default
encoding.
Or is there a good reason to use UTF-8 as the default?
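
To make the three options concrete, here is a minimal sketch in plain Java. The class EncodingPolicy and its method names are hypothetical and not part of OpenNLP; charsetName stands for the raw value from the command line, with null meaning the user did not pass the parameter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not OpenNLP code.
final class EncodingPolicy {

    // Option 1: mandatory parameter. A missing value is an error,
    // so the user has to think about the encoding.
    static Charset requireEncoding(String charsetName) {
        if (charsetName == null) {
            throw new IllegalArgumentException("the encoding parameter is mandatory");
        }
        return Charset.forName(charsetName);
    }

    // Option 2: optional parameter that falls back to the platform default.
    static Charset platformDefaultEncoding(String charsetName) {
        return charsetName == null ? Charset.defaultCharset() : Charset.forName(charsetName);
    }

    // Option 3: optional parameter that falls back to a fixed UTF-8,
    // which is what the committed evaluator parameter does.
    static Charset utf8DefaultEncoding(String charsetName) {
        return charsetName == null ? StandardCharsets.UTF_8 : Charset.forName(charsetName);
    }

    public static void main(String[] args) {
        // Treat the first argument, if any, as the user-supplied charset name.
        String charsetName = args.length > 0 ? args[0] : null;
        System.out.println("platform default policy: " + platformDefaultEncoding(charsetName));
        System.out.println("UTF-8 default policy:    " + utf8DefaultEncoding(charsetName));
    }
}
```

Options 2 and 3 only differ when the parameter is omitted, and then only on machines whose platform default is not UTF-8. That is exactly the case where the same command line would read the same file differently on two machines.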
I know this is a difficult decision to get right. As far as I know, we
have been criticized for the current approach because people don't want
to pass the encoding parameter every time.
Jörn