Re: svn commit: r1587969 - /opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/doccat/NGramFeatureGenerator.java

2014-04-16 Thread William Colen
What do you think of this change?

It can break compatibility with old Doccat models that were trained with the
NGramFeatureGenerator, because the generated feature strings change.
But the old models probably do not work anyway.
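
To make the compatibility concern concrete, here is a minimal sketch that puts the two loop bodies from the diff below side by side. The class and method names (NGramPrefixSketch, oldFeatures, newFeatures) are made up for illustration and are not part of OpenNLP; only the two string concatenations are taken from the commit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: this class is hypothetical and not part of OpenNLP.
class NGramPrefixSketch {

  // Bigram features as built before r1587969: "tokenA tokenB".
  static List<String> oldFeatures(String[] text) {
    List<String> features = new ArrayList<String>();
    for (int i = 0; i < text.length - 1; i++) {
      features.add(text[i] + " " + text[i + 1]);
    }
    return features;
  }

  // Bigram features as built after r1587969: "ng=tokenA:tokenB".
  static List<String> newFeatures(String[] text) {
    List<String> features = new ArrayList<String>();
    for (int i = 0; i < text.length - 1; i++) {
      features.add("ng=" + text[i] + ":" + text[i + 1]);
    }
    return features;
  }

  public static void main(String[] args) {
    String[] text = {"a", "b", "c"};
    System.out.println(oldFeatures(text)); // [a b, b c]
    System.out.println(newFeatures(text)); // [ng=a:b, ng=b:c]
  }
}
```

Since a trained model refers to its features by these strings, a model trained on the old form would presumably find no match for any of the new "ng=" features at classification time.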

Thank you
William


2014-04-16 13:39 GMT-03:00 :

> Author: colen
> Date: Wed Apr 16 16:39:40 2014
> New Revision: 1587969
>
> URL: http://svn.apache.org/r1587969
> Log:
> OPENNLP-673 Added prefix to the NGram feature generator
>
> Modified:
>
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/doccat/NGramFeatureGenerator.java
>
> Modified:
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/doccat/NGramFeatureGenerator.java
> URL:
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/doccat/NGramFeatureGenerator.java?rev=1587969&r1=1587968&r2=1587969&view=diff
>
> ==
> ---
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/doccat/NGramFeatureGenerator.java
> (original)
> +++
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/doccat/NGramFeatureGenerator.java
> Wed Apr 16 16:39:40 2014
> @@ -28,7 +28,7 @@ public class NGramFeatureGenerator imple
>  List<String> features = new ArrayList<String>();
>
>  for (int i = 0; i < text.length - 1; i++) {
> -  features.add(text[i] + " " + text[i + 1]);
> +  features.add("ng=" + text[i] + ":" + text[i + 1]);
>  }
>
>  return features;
>
>
>


Re: svn commit: r1587944 [1/2] - in /opennlp/trunk/opennlp-tools/src: main/java/opennlp/tools/cmdline/doccat/ main/java/opennlp/tools/doccat/ main/java/opennlp/tools/sentdetect/ main/java/opennlp/tool

2014-04-16 Thread William Colen
Jörn,

Can you please review my change to the ExtensionLoader? I modified it to
also accept singletons (classes with a private constructor and an INSTANCE
field).
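
For readers who want the idea spelled out, the following is a rough sketch of how a loader could accept both ordinary extensions and singletons. It is not the ExtensionLoader code from r1587944; the class SingletonLoaderSketch, the method instantiate, and the two example extension classes are hypothetical. It assumes the singleton exposes a public static field named INSTANCE.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Illustrative only: not the actual opennlp.tools.util.ext.ExtensionLoader.
class SingletonLoaderSketch {

  // Hypothetical singleton extension: private constructor, public INSTANCE field.
  public static class SingletonExtension {
    public static final SingletonExtension INSTANCE = new SingletonExtension();
    private SingletonExtension() {}
  }

  // Hypothetical ordinary extension with a public no-arg constructor.
  public static class PlainExtension {
    public PlainExtension() {}
  }

  // Tries the public no-arg constructor first, then falls back to INSTANCE.
  static <T> T instantiate(Class<T> superType, Class<?> extClass) {
    if (!superType.isAssignableFrom(extClass)) {
      throw new IllegalArgumentException(
          extClass.getName() + " is not a " + superType.getName());
    }
    try {
      try {
        return superType.cast(extClass.getConstructor().newInstance());
      } catch (NoSuchMethodException e) {
        // No public no-arg constructor: look for a public static INSTANCE field.
        Field instance = extClass.getField("INSTANCE");
        if (!Modifier.isStatic(instance.getModifiers())) {
          throw new IllegalArgumentException("INSTANCE must be static");
        }
        return superType.cast(instance.get(null));
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load " + extClass.getName(), e);
    }
  }

  public static void main(String[] args) {
    Object plain = instantiate(Object.class, PlainExtension.class);
    Object single = instantiate(Object.class, SingletonExtension.class);
    System.out.println(plain instanceof PlainExtension);       // true
    System.out.println(single == SingletonExtension.INSTANCE); // true
  }
}
```

Trying the constructor first keeps the behaviour for existing extensions unchanged, and the INSTANCE lookup only applies to classes that could not be instantiated before.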

Thank you,
William


2014-04-16 12:26 GMT-03:00 :

> Author: colen
> Date: Wed Apr 16 15:26:24 2014
> New Revision: 1587944
>
> URL: http://svn.apache.org/r1587944
> Log:
> OPENNLP-674 Added factory to Doccat
>
> Added:
>
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/doccat/DoccatFactory.java
>   (with props)
>
> opennlp/trunk/opennlp-tools/src/test/java/opennlp/tools/doccat/DoccatFactoryTest.java
>   (with props)
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/doccat/
>
> opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/doccat/DoccatSample.txt
>   (with props)
> Modified:
>
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatCrossValidatorTool.java
>
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatTrainerTool.java
>
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/TrainingParams.java
>
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/doccat/DoccatCrossValidator.java
>
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/doccat/DoccatModel.java
>
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/doccat/DocumentCategorizerME.java
>
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/sentdetect/SentenceDetectorFactory.java
>
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/util/ext/ExtensionLoader.java
>
> Modified:
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatCrossValidatorTool.java
> URL:
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatCrossValidatorTool.java?rev=1587944&r1=1587943&r2=1587944&view=diff
>
> ==
> ---
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatCrossValidatorTool.java
> (original)
> +++
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatCrossValidatorTool.java
> Wed Apr 16 15:26:24 2014
> @@ -34,8 +34,10 @@ import opennlp.tools.cmdline.doccat.Docc
>  import opennlp.tools.cmdline.params.CVParams;
>  import opennlp.tools.doccat.DoccatCrossValidator;
>  import opennlp.tools.doccat.DoccatEvaluationMonitor;
> +import opennlp.tools.doccat.DoccatFactory;
>  import opennlp.tools.doccat.DocumentSample;
>  import opennlp.tools.doccat.FeatureGenerator;
> +import opennlp.tools.tokenize.Tokenizer;
>  import opennlp.tools.util.eval.EvaluationMonitor;
>  import opennlp.tools.util.model.ModelUtil;
>
> @@ -88,13 +90,18 @@ public final class DoccatCrossValidatorT
>  FeatureGenerator[] featureGenerators = DoccatTrainerTool
>  .createFeatureGenerators(params.getFeatureGenerators());
>
> +Tokenizer tokenizer = DoccatTrainerTool.createTokenizer(params
> +.getTokenizer());
> +
>  DoccatEvaluationMonitor[] listenersArr = listeners
>  .toArray(new DoccatEvaluationMonitor[listeners.size()]);
>
>  DoccatCrossValidator validator;
>  try {
> +  DoccatFactory factory = DoccatFactory.create(params.getFactory(),
> +  tokenizer, featureGenerators);
>    validator = new DoccatCrossValidator(params.getLang(), mlParams,
> -  featureGenerators, listenersArr);
> +  factory, listenersArr);
>
>    validator.evaluate(sampleStream, params.getFolds());
>  } catch (IOException e) {
>
> Modified:
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatTrainerTool.java
> URL:
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatTrainerTool.java?rev=1587944&r1=1587943&r2=1587944&view=diff
>
> ==
> ---
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatTrainerTool.java
> (original)
> +++
> opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/doccat/DoccatTrainerTool.java
> Wed Apr 16 15:26:24 2014
> @@ -26,16 +26,19 @@ import opennlp.tools.cmdline.TerminateTo
>  import opennlp.tools.cmdline.doccat.DoccatTrainerTool.TrainerToolParams;
>  import opennlp.tools.cmdline.params.TrainingToolParams;
>  import opennlp.tools.doccat.BagOfWordsFeatureGenerator;
> +import opennlp.tools.doccat.DoccatFactory;
>  import opennlp.tools.doccat.DoccatModel;
>  import opennlp.tools.doccat.DocumentCategorizerME;
>  import opennlp.tools.doccat.DocumentSample;
>  import opennlp.tools.doccat.FeatureGenerator;
> +import opennlp.tools.tokenize.Tokenizer;
> +import opennlp.tools.tokenize.WhitespaceTokenizer;
>  import opennlp.tools.util.ext.ExtensionLoader;
>  import opennlp.tools.util.model.ModelUtil;
>
>  public class DoccatTrainerTool
>  extends AbstractTrainerTool<DocumentSample, TrainerToolParams> {
> -
> +
>    interface TrainerToolParams extends TrainingParams, TrainingToolParams {
>    }