[jira] [Closed] (OPENNLP-1179) BRAT Annotator service Fails to start
[ https://issues.apache.org/jira/browse/OPENNLP-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ closed OPENNLP-1179. > BRAT Annotator service Fails to start > -- > > Key: OPENNLP-1179 > URL: https://issues.apache.org/jira/browse/OPENNLP-1179 > Project: OpenNLP > Issue Type: Bug > Components: Applications >Affects Versions: 1.8.4 > Environment: Linux (CentOS) >Reporter: Daniel Russ > Fix For: 1.8.5 > > > me@machine$ brat-annotation-service > Error: Could not find or load main class opennlp.bratann.NameFinderAnnService > caused by: > /usr/bin/java -Xmx1024m -cp 'lib/*' opennlp.bratann.NameFinderAnnService > should be: > $JAVACMD -Xmx1024m -cp $(echo $OPENNLP_HOME/lib/*.jar | tr ' ' ':') > opennlp.bratann.NameFinderAnnService $@ > as is found in the opennlp script. I thought this was fixed already, but > it clearly was not. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OPENNLP-1179) BRAT Annotator service Fails to start
[ https://issues.apache.org/jira/browse/OPENNLP-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ resolved OPENNLP-1179. -- Resolution: Fixed The startup file now uses "-cp ${OpenNLPHome}/lib/*", which includes all the jars in the lib directory. > BRAT Annotator service Fails to start > -- > > Key: OPENNLP-1179 > URL: https://issues.apache.org/jira/browse/OPENNLP-1179 > Project: OpenNLP > Issue Type: Bug > Components: Applications >Affects Versions: 1.8.4 > Environment: Linux (CentOS) >Reporter: Daniel Russ > Fix For: 1.8.5 > > > me@machine$ brat-annotation-service > Error: Could not find or load main class opennlp.bratann.NameFinderAnnService > caused by: > /usr/bin/java -Xmx1024m -cp 'lib/*' opennlp.bratann.NameFinderAnnService > should be: > $JAVACMD -Xmx1024m -cp $(echo $OPENNLP_HOME/lib/*.jar | tr ' ' ':') > opennlp.bratann.NameFinderAnnService $@ > as is found in the opennlp script. I thought this was fixed already, but > it clearly was not. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OPENNLP-1179) BRAT Annotator service Fails to start
Daniel Russ created OPENNLP-1179: Summary: BRAT Annotator service Fails to start Key: OPENNLP-1179 URL: https://issues.apache.org/jira/browse/OPENNLP-1179 Project: OpenNLP Issue Type: Bug Components: Applications Affects Versions: 1.8.4 Environment: Linux (CentOS) Reporter: Daniel Russ Fix For: 1.8.5 me@machine$ brat-annotation-service Error: Could not find or load main class opennlp.bratann.NameFinderAnnService caused by: /usr/bin/java -Xmx1024m -cp 'lib/*' opennlp.bratann.NameFinderAnnService should be: $JAVACMD -Xmx1024m -cp $(echo $OPENNLP_HOME/lib/*.jar | tr ' ' ':') opennlp.bratann.NameFinderAnnService $@ as is found in the opennlp script. I thought this was fixed already, but it clearly was not. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (OPENNLP-1172) Add Annotator notes to BratAnnotation
[ https://issues.apache.org/jira/browse/OPENNLP-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ closed OPENNLP-1172. Resolution: Fixed > Add Annotator notes to BratAnnotation > - > > Key: OPENNLP-1172 > URL: https://issues.apache.org/jira/browse/OPENNLP-1172 > Project: OpenNLP > Issue Type: Improvement > Components: Formats >Affects Versions: 1.8.3 >Reporter: Daniel Russ >Assignee: Daniel Russ >Priority: Minor > Fix For: 1.8.4 > > > The Brat Annotator allows annotators to add notes to entities/relations. The > BratAnnotation class should reflect this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OPENNLP-1172) Add Annotator notes to BratAnnotation
Daniel Russ created OPENNLP-1172: Summary: Add Annotator notes to BratAnnotation Key: OPENNLP-1172 URL: https://issues.apache.org/jira/browse/OPENNLP-1172 Project: OpenNLP Issue Type: Improvement Components: Formats Affects Versions: 1.8.3 Reporter: Daniel Russ Priority: Minor Fix For: 1.8.4 The Brat Annotator allows annotators to add notes to entities/relations. The BratAnnotation class should reflect this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OPENNLP-1158) The Brat Annotation Service does not serialize results appropriately
[ https://issues.apache.org/jira/browse/OPENNLP-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261394#comment-16261394 ] Daniel Russ commented on OPENNLP-1158: -- Making the fields in opennlp.bratann.NameFinderResource$NameAnn public solves this problem. > The Brat Annotation Service does not serialize results appropriately > > > Key: OPENNLP-1158 > URL: https://issues.apache.org/jira/browse/OPENNLP-1158 > Project: OpenNLP > Issue Type: Bug > Components: Applications >Affects Versions: 1.8.3 >Reporter: Daniel Russ > Fix For: 1.8.4 > > > After starting up the BratAnnotatorService NameFinderAnnService, BRAT passes > text to the service, but it never returns. > curl -v -H "Content-type: text/plain" -H "Accept: application/json" -X > POST -d "I am a fireman" localhost:8123/ner > * About to connect() to localhost port 8123 (#0) > * Trying 127.0.0.1... connected > * Connected to localhost (127.0.0.1) port 8123 (#0) > > POST /ner HTTP/1.1 > > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.27.1 > > zlib/1.2.3 libidn/1.18 libssh2/1.4.2 > > Host: localhost:8123 > > Content-type: text/plain > > Accept: application/json > > Content-Length: 14 > > > < HTTP/1.1 400 Bad Request > < Content-Type: text/plain > < Date: Tue, 21 Nov 2017 19:43:15 GMT > < Connection: close > < Content-Length: 247 > < > * Closing connection #0 > No serializer found for class opennlp.bratann.NameFinderResource$NameAnn and > no properties discovered to create BeanSerializer (to avoid exception, > disable SerializationFeature.FAIL_ON_EMPTY_BEANS) (through reference chain: > java.util.HashMap["0"] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OPENNLP-1158) The Brat Annotation Service does not serialize results appropriately
Daniel Russ created OPENNLP-1158: Summary: The Brat Annotation Service does not serialize results appropriately Key: OPENNLP-1158 URL: https://issues.apache.org/jira/browse/OPENNLP-1158 Project: OpenNLP Issue Type: Bug Components: Applications Affects Versions: 1.8.3 Reporter: Daniel Russ Fix For: 1.8.4 After starting up the BratAnnotatorService NameFinderAnnService, BRAT passes text to the service, but it never returns. curl -v -H "Content-type: text/plain" -H "Accept: application/json" -X POST -d "I am a fireman" localhost:8123/ner * About to connect() to localhost port 8123 (#0) * Trying 127.0.0.1... connected * Connected to localhost (127.0.0.1) port 8123 (#0) > POST /ner HTTP/1.1 > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.27.1 > zlib/1.2.3 libidn/1.18 libssh2/1.4.2 > Host: localhost:8123 > Content-type: text/plain > Accept: application/json > Content-Length: 14 > < HTTP/1.1 400 Bad Request < Content-Type: text/plain < Date: Tue, 21 Nov 2017 19:43:15 GMT < Connection: close < Content-Length: 247 < * Closing connection #0 No serializer found for class opennlp.bratann.NameFinderResource$NameAnn and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) (through reference chain: java.util.HashMap["0"] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OPENNLP-1133) The Gaussian smoother cannot be used
Daniel Russ created OPENNLP-1133: Summary: The Gaussian smoother cannot be used Key: OPENNLP-1133 URL: https://issues.apache.org/jira/browse/OPENNLP-1133 Project: OpenNLP Issue Type: Bug Components: Machine Learning Affects Versions: 1.8.2 Reporter: Daniel Russ Fix For: 1.8.3 In the GISTrainer, the variable useGaussianSmoothing cannot be set using the TrainingParameters -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OPENNLP-1116) Add Concatenate Stream method for Collections of streams
Daniel Russ created OPENNLP-1116: Summary: Add Concatenate Stream method for Collections of streams Key: OPENNLP-1116 URL: https://issues.apache.org/jira/browse/OPENNLP-1116 Project: OpenNLP Issue Type: Improvement Components: Machine Learning Affects Versions: 1.8.1 Reporter: Daniel Russ Priority: Trivial Fix For: 1.8.2 Minor change to opennlp.tools.util.ObjectStreamUtils. First, rename createObjectStream(final ObjectStream<T>... streams) to concatenateObjectStream(final ObjectStream<T>... streams), and add a method concatenateObjectStream(final Collection<ObjectStream<T>> streams). The reason is that I often pull data from multiple files; although it is possible to create an array of ObjectStreams, it is easier to work with Lists. The new name is also clearer: the method concatenates a list/array of ObjectStreams, as opposed to createObjectStream(final Collection<T> collection), which makes an ObjectStream of the items in the collection. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
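The concatenation requested in OPENNLP-1116 could look roughly like the sketch below. It is only an illustration: SimpleStream is a hypothetical stand-in for opennlp.tools.util.ObjectStream (whose read() additionally declares IOException), and StreamConcat, of and drain are names invented for this example, not OpenNLP API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for ObjectStream: read() returns null once exhausted.
interface SimpleStream<T> {
  T read();
}

final class StreamConcat {

  // Illustrative helper: exposes a list as a stream.
  static <T> SimpleStream<T> of(List<T> items) {
    Iterator<T> it = items.iterator();
    return () -> it.hasNext() ? it.next() : null;
  }

  // Collection variant: drains each stream in iteration order, then moves on.
  static <T> SimpleStream<T> concatenate(Collection<? extends SimpleStream<T>> streams) {
    Iterator<? extends SimpleStream<T>> it = streams.iterator();
    return new SimpleStream<T>() {
      private SimpleStream<T> current = it.hasNext() ? it.next() : null;

      @Override
      public T read() {
        while (current != null) {
          T item = current.read();
          if (item != null) {
            return item;
          }
          // Current stream is exhausted; advance to the next one, if any.
          current = it.hasNext() ? it.next() : null;
        }
        return null;
      }
    };
  }

  // Varargs variant simply delegates to the Collection variant.
  @SafeVarargs
  static <T> SimpleStream<T> concatenate(SimpleStream<T>... streams) {
    return concatenate(Arrays.asList(streams));
  }

  // Illustrative helper: reads a stream to the end into a list.
  static <T> List<T> drain(SimpleStream<T> stream) {
    List<T> out = new ArrayList<>();
    for (T item = stream.read(); item != null; item = stream.read()) {
      out.add(item);
    }
    return out;
  }
}
```

Having the varargs overload delegate to the Collection overload keeps one implementation while letting callers pass whichever container they already hold.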
[jira] [Closed] (OPENNLP-1077) Make BratNameSampleStream constructors public
[ https://issues.apache.org/jira/browse/OPENNLP-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ closed OPENNLP-1077. Resolution: Fixed > Make BratNameSampleStream constructors public > - > > Key: OPENNLP-1077 > URL: https://issues.apache.org/jira/browse/OPENNLP-1077 > Project: OpenNLP > Issue Type: Improvement > Components: Formats >Affects Versions: 1.8.0 >Reporter: Daniel Russ > Fix For: 1.8.1 > > > Reading in Brat formatted data without using the CLI or the > BratAnnotationService is difficult because the constructor is not public and the > factory requires the program to supply the command line arguments as a > String[]. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (OPENNLP-1079) Refactor BratNameSampleStream
[ https://issues.apache.org/jira/browse/OPENNLP-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ closed OPENNLP-1079. Resolution: Fixed > Refactor BratNameSampleStream > - > > Key: OPENNLP-1079 > URL: https://issues.apache.org/jira/browse/OPENNLP-1079 > Project: OpenNLP > Issue Type: Improvement >Reporter: Daniel Russ >Priority: Minor > > Create a BratAnnotationParser that parses a BratDocument and creates a > List<NameSample>. The NameSampleStream.read() method would call this > directly. > Consider making the same changes for the other formats as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (OPENNLP-1079) Refactor BratNameSampleStream
Daniel Russ created OPENNLP-1079: Summary: Refactor BratNameSampleStream Key: OPENNLP-1079 URL: https://issues.apache.org/jira/browse/OPENNLP-1079 Project: OpenNLP Issue Type: Improvement Reporter: Daniel Russ Priority: Minor Create a BratAnnotationParser that parses a BratDocument and creates a List<NameSample>. The NameSampleStream.read() method would call this directly. Consider making the same changes for the other formats as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (OPENNLP-1077) Make BratNameSampleStream constructors public
Daniel Russ created OPENNLP-1077: Summary: Make BratNameSampleStream constructors public Key: OPENNLP-1077 URL: https://issues.apache.org/jira/browse/OPENNLP-1077 Project: OpenNLP Issue Type: Improvement Components: Formats Affects Versions: 1.8.0 Reporter: Daniel Russ Fix For: 1.8.1 Reading in Brat formatted data without using the CLI or the BratAnnotationService is difficult because the constructor is not public and the factory requires the program to supply the command line arguments as a String[]. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (OPENNLP-1056) DictionaryLemmatizer throws a NullPointerException
[ https://issues.apache.org/jira/browse/OPENNLP-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ closed OPENNLP-1056. Resolution: Fixed > DictionaryLemmatizer throws a NullPointerException > -- > > Key: OPENNLP-1056 > URL: https://issues.apache.org/jira/browse/OPENNLP-1056 > Project: OpenNLP > Issue Type: Bug > Components: Lemmatizer >Affects Versions: 1.8.0 >Reporter: Daniel Russ > Labels: easyfix > Fix For: 1.8.0 > > > If the word/POS combination are not found in the dictionary, the Lemmatizer > throws a NPE. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (OPENNLP-1056) DictionaryLemmatizer throws a NullPointerException
Daniel Russ created OPENNLP-1056: Summary: DictionaryLemmatizer throws a NullPointerException Key: OPENNLP-1056 URL: https://issues.apache.org/jira/browse/OPENNLP-1056 Project: OpenNLP Issue Type: Bug Components: Lemmatizer Affects Versions: 1.8.0 Reporter: Daniel Russ Fix For: 1.8.0 If the word/POS combination are not found in the dictionary, the Lemmatizer throws a NPE. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
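One common way to avoid this kind of NullPointerException is to fall back to a placeholder when the word/POS pair is missing. The sketch below is not the DictionaryLemmatizer code; SafeLemmaLookup, its methods, and the "O" placeholder are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a dictionary-backed lemmatizer; not the OpenNLP class.
final class SafeLemmaLookup {

  // Placeholder returned for unknown entries (an assumption made for this sketch).
  static final String UNKNOWN = "O";

  private final Map<String, String> lemmas = new HashMap<>();

  // Registers a lemma for a word/POS combination.
  void add(String word, String pos, String lemma) {
    lemmas.put(key(word, pos), lemma);
  }

  // Returns the stored lemma, or UNKNOWN when the word/POS pair is absent,
  // so callers never receive null.
  String lemmatize(String word, String pos) {
    return lemmas.getOrDefault(key(word, pos), UNKNOWN);
  }

  private static String key(String word, String pos) {
    return word + "\t" + pos;
  }
}
```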
[jira] [Closed] (OPENNLP-987) TrainerFactory.getEventTrainer() does not take a TrainingParameter
[ https://issues.apache.org/jira/browse/OPENNLP-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ closed OPENNLP-987. --- Resolution: Duplicate Already fixed. Not in the released version yet. > TrainerFactory.getEventTrainer() does not take a TrainingParameter > -- > > Key: OPENNLP-987 > URL: https://issues.apache.org/jira/browse/OPENNLP-987 > Project: OpenNLP > Issue Type: Bug > Components: Machine Learning >Affects Versions: 1.7.2 >Reporter: Daniel Russ >Priority: Minor > Fix For: 1.8.0 > > > When creating a trainer, the client creates a TrainingParameter object. > Unfortunately, the TrainerFactory does not use it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (OPENNLP-987) TrainerFactory.getEventTrainer() does not take a TrainingParameter
Daniel Russ created OPENNLP-987: --- Summary: TrainerFactory.getEventTrainer() does not take a TrainingParameter Key: OPENNLP-987 URL: https://issues.apache.org/jira/browse/OPENNLP-987 Project: OpenNLP Issue Type: Bug Components: Machine Learning Affects Versions: 1.7.2 Reporter: Daniel Russ Priority: Minor Fix For: 1.8.0 When creating a trainer, the client creates a TrainingParameter object. Unfortunately, the TrainerFactory does not use it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (OPENNLP-981) Training EventStream Hash was removed from event trainer
[ https://issues.apache.org/jira/browse/OPENNLP-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ resolved OPENNLP-981. - Resolution: Fixed Also added a test in PerceptronPrepAttachTest.java to check that the reportMap contained the hash. > Training EventStream Hash was removed from event trainer > > > Key: OPENNLP-981 > URL: https://issues.apache.org/jira/browse/OPENNLP-981 > Project: OpenNLP > Issue Type: Bug > Components: Machine Learning >Affects Versions: 1.7.2 >Reporter: Daniel Russ >Priority: Minor > Fix For: 1.8.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (OPENNLP-981) Training EventStream Hash was removed from event trainer
Daniel Russ created OPENNLP-981: --- Summary: Training EventStream Hash was removed from event trainer Key: OPENNLP-981 URL: https://issues.apache.org/jira/browse/OPENNLP-981 Project: OpenNLP Issue Type: Bug Components: Machine Learning Affects Versions: 1.7.2 Reporter: Daniel Russ Priority: Minor Fix For: 1.8.0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (OPENNLP-977) TrainerFactory uses Deprecated methods
[ https://issues.apache.org/jira/browse/OPENNLP-977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ resolved OPENNLP-977. - Resolution: Fixed > TrainerFactory uses Deprecated methods > -- > > Key: OPENNLP-977 > URL: https://issues.apache.org/jira/browse/OPENNLP-977 > Project: OpenNLP > Issue Type: Bug > Components: Machine Learning >Affects Versions: 1.7.2 >Reporter: Daniel Russ >Priority: Minor > Fix For: 1.8.0 > > > getEventTrainer/getEventModelSequenceTrainer use maps instead of > TrainingParameters. Also EventModelSequenceTrainer uses maps instead of > TrainingParameters. This is not an outward facing interface to most users. > All init(Map<String,String>, Map<String,String>) methods should be deprecated > in favor of init(TrainingParameters, Map<String,String>). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (OPENNLP-974) DataIndexer Param can't be set to a class name
[ https://issues.apache.org/jira/browse/OPENNLP-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854581#comment-15854581 ] Daniel Russ commented on OPENNLP-974: - this may have been addressed in OpenNLP-977 > DataIndexer Param can't be set to a class name > -- > > Key: OPENNLP-974 > URL: https://issues.apache.org/jira/browse/OPENNLP-974 > Project: OpenNLP > Issue Type: Bug > Components: Machine Learning >Affects Versions: 1.7.2 >Reporter: Joern Kottmann >Priority: Minor > Fix For: 1.7.3 > > > The DataIndexer factory is capable of loading a DataIndexer by its class > name. The TrainerFactory validation method rejects that as invalid. The > validation should be updated to allow this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (OPENNLP-971) Remove static training methods from GIS
[ https://issues.apache.org/jira/browse/OPENNLP-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ resolved OPENNLP-971. - Resolution: Won't Fix Fix Version/s: (was: 1.7.3) Since the class was deprecated and all functionality of GIS was moved to GISTrainer, this fix is no longer needed. > Remove static training methods from GIS > --- > > Key: OPENNLP-971 > URL: https://issues.apache.org/jira/browse/OPENNLP-971 > Project: OpenNLP > Issue Type: Improvement > Components: Machine Learning >Affects Versions: 1.7.1 >Reporter: Daniel Russ >Priority: Minor > > The pluggable TrainingParameters has been implemented. There is no reason to > call the static train methods on GIS. They should be deprecated in 1.7.3, > and removed in a later version. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (OPENNLP-929) GIS not indexing
[ https://issues.apache.org/jira/browse/OPENNLP-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ updated OPENNLP-929: Attachment: GisIndexTest.java > GIS not indexing > > > Key: OPENNLP-929 > URL: https://issues.apache.org/jira/browse/OPENNLP-929 > Project: OpenNLP > Issue Type: Bug > Components: Machine Learning >Reporter: Daniel Russ > Attachments: GisIndexTest.java > > > If the user calls the GIS.train(ObjectStream) methods, the trainer is > not indexing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OPENNLP-929) GIS not indexing
[ https://issues.apache.org/jira/browse/OPENNLP-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815589#comment-15815589 ] Daniel Russ commented on OPENNLP-929: - I am not sure why this was not caught by any of the JUnit tests. I have attached a JUnit test that fails > GIS not indexing > > > Key: OPENNLP-929 > URL: https://issues.apache.org/jira/browse/OPENNLP-929 > Project: OpenNLP > Issue Type: Bug > Components: Machine Learning >Reporter: Daniel Russ > > If the user calls the GIS.train(ObjectStream) methods, the trainer is > not indexing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OPENNLP-929) GIS not indexing
Daniel Russ created OPENNLP-929: --- Summary: GIS not indexing Key: OPENNLP-929 URL: https://issues.apache.org/jira/browse/OPENNLP-929 Project: OpenNLP Issue Type: Bug Components: Machine Learning Reporter: Daniel Russ If the user calls the GIS.train(ObjectStream) methods, the trainer is not indexing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OPENNLP-927) Merge TrainingParameters and PluggableParameters
Daniel Russ created OPENNLP-927: --- Summary: Merge TrainingParameters and PluggableParameters Key: OPENNLP-927 URL: https://issues.apache.org/jira/browse/OPENNLP-927 Project: OpenNLP Issue Type: New Feature Components: Machine Learning Affects Versions: 1.7.0 Reporter: Daniel Russ Priority: Minor The PluggableParameters class was added to pull out the get(Int/String/Boolean)Parameters() methods from the AbstractTrainer. Merge the functionality of the PluggableParameters into the TrainingParameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OPENNLP-900) DictionaryLemmatizer dictionary and LemmatizerME training format different
Daniel Russ created OPENNLP-900: --- Summary: DictionaryLemmatizer dictionary and LemmatizerME training format different Key: OPENNLP-900 URL: https://issues.apache.org/jira/browse/OPENNLP-900 Project: OpenNLP Issue Type: Bug Components: Lemmatizer Affects Versions: 1.6.0, 1.7.0 Reporter: Daniel Russ Priority: Minor Fix For: 1.7.0 The LemmatizerME training data has a format of word\tpos\tlemma. The DictionaryLemmatizer format is word\tlemma\tpos. Can we make them the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
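Until the two formats agree, a dictionary line can be rewritten into the training column order by swapping the last two tab-separated fields. This is a minimal sketch under the column orders stated in the issue; LemmaFormat and dictionaryToTraining are hypothetical names, not part of OpenNLP.

```java
// Hypothetical helper for converting between the two column orders in OPENNLP-900.
final class LemmaFormat {

  // Converts "word\tlemma\tpos" (dictionary order) to "word\tpos\tlemma" (training order).
  static String dictionaryToTraining(String line) {
    // The -1 limit keeps trailing empty columns so malformed lines are detected.
    String[] cols = line.split("\t", -1);
    if (cols.length != 3) {
      throw new IllegalArgumentException("Expected 3 tab-separated columns: " + line);
    }
    return cols[0] + "\t" + cols[2] + "\t" + cols[1];
  }
}
```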
[jira] [Commented] (OPENNLP-880) Refactor the GIS trainer integration
[ https://issues.apache.org/jira/browse/OPENNLP-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768423#comment-15768423 ] Daniel Russ commented on OPENNLP-880: - Pull request made. > Refactor the GIS trainer integration > > > Key: OPENNLP-880 > URL: https://issues.apache.org/jira/browse/OPENNLP-880 > Project: OpenNLP > Issue Type: Improvement >Reporter: Joern Kottmann >Priority: Minor > Attachments: patch.txt > > > The GIS code was never reshaped to fit properly into the new Training API. > There are a couple of issues e.g. not using parameters which should be fixed. > TODO: Update this description and list the changes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OPENNLP-880) Refactor the GIS trainer integration
[ https://issues.apache.org/jira/browse/OPENNLP-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ updated OPENNLP-880: Attachment: patch.txt > Refactor the GIS trainer integration > > > Key: OPENNLP-880 > URL: https://issues.apache.org/jira/browse/OPENNLP-880 > Project: OpenNLP > Issue Type: Improvement >Reporter: Joern Kottmann >Priority: Minor > Attachments: patch.txt > > > The GIS code was never reshaped to fit properly into the new Training API. > There are a couple of issues e.g. not using parameters which should be fixed. > TODO: Update this description and list the changes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OPENNLP-870) Please Make ContextGenerator a Generic Type
[ https://issues.apache.org/jira/browse/OPENNLP-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595197#comment-15595197 ] Daniel Russ commented on OPENNLP-870: - /** Here is a Patch **/
diff --git a/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/BasicContextGenerator.java b/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/BasicContextGenerator.java
index 1175ecc..ec90d60 100644
--- a/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/BasicContextGenerator.java
+++ b/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/BasicContextGenerator.java
@@ -29,7 +29,7 @@ package opennlp.tools.ml.maxent;
  * cp_1 cp_2 ... cp_n
  *
  */
-public class BasicContextGenerator implements ContextGenerator {
+public class BasicContextGenerator implements ContextGenerator<String> {
 
   private String separator = " ";
 
@@ -42,8 +42,7 @@ public class BasicContextGenerator implements ContextGenerator {
   /**
    * Builds up the list of contextual predicates given a String.
    */
-  public String[] getContext(Object o) {
-    String s = (String) o;
+  public String[] getContext(String s) {
     return (String[]) s.split(separator);
   }
 
diff --git a/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/ContextGenerator.java b/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/ContextGenerator.java
index 0582323..fa92846 100644
--- a/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/ContextGenerator.java
+++ b/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/ContextGenerator.java
@@ -22,11 +22,11 @@ package opennlp.tools.ml.maxent;
 /**
  * Generate contexts for maxent decisions.
  */
-public interface ContextGenerator {
+public interface ContextGenerator<T> {
 
   /**
    * Builds up the list of contextual predicates given an Object.
    */
-  public String[] getContext(Object o);
+  public String[] getContext(T o);
 }
diff --git a/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/RealBasicEventStream.java b/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/RealBasicEventStream.java
index 97ff167..682185a 100644
--- a/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/RealBasicEventStream.java
+++ b/opennlp-tools/src/main/java/opennlp/tools/ml/maxent/RealBasicEventStream.java
@@ -26,7 +26,7 @@ import opennlp.tools.ml.model.RealValueFileEventStream;
 import opennlp.tools.util.ObjectStream;
 
 public class RealBasicEventStream implements ObjectStream<Event> {
-  ContextGenerator cg = new BasicContextGenerator();
+  ContextGenerator<String> cg = new BasicContextGenerator();
   ObjectStream<String> ds;
 
   public RealBasicEventStream(ObjectStream<String> ds) {
> Please Make ContextGenerator a Generic Type > --- > > Key: OPENNLP-870 > URL: https://issues.apache.org/jira/browse/OPENNLP-870 > Project: OpenNLP > Issue Type: Improvement > Components: Machine Learning >Reporter: Daniel Russ >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > public interface ContextGenerator<T> { > /** >* Builds up the list of contextual predicates given an Object. >*/ > public String[] getContext(T o); > } > If this is a generic type, it makes ContextGenerators easier to > debug at compile time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
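The compile-time benefit argued for in OPENNLP-870 can be seen in a small stand-alone sketch. ContextGen and WhitespaceContextGen are hypothetical names chosen to avoid implying these are the OpenNLP types; only the shape of the interface follows the issue text.

```java
// Hypothetical generic interface mirroring the shape proposed in the issue.
interface ContextGen<T> {
  // Builds the list of contextual predicates for an input of type T.
  String[] getContext(T input);
}

// A String-typed implementation: no cast from Object is needed, and passing
// a non-String argument is rejected by the compiler instead of failing at runtime.
final class WhitespaceContextGen implements ContextGen<String> {
  @Override
  public String[] getContext(String s) {
    return s.split(" ");
  }
}
```

With the non-generic signature getContext(Object o), a call such as getContext(42) compiles and fails only when the cast to String runs; with the generic form the same mistake is a compile error.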
[jira] [Updated] (OPENNLP-786) Depricate EventStream
[ https://issues.apache.org/jira/browse/OPENNLP-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ updated OPENNLP-786: Description: GIS No longer accepts an EventStream in GIS.trainModel(...). There is no need for the EventStream class anymore. There are 2 AbstractEventStream classes. opennlp.tools.ml.model.AbstractEventStream and opennlp.tools.utils.EventStream. Remove or Deprecate opennlp.tools.ml.model.AbstractEventStream to avoid confusion. (was: GIS No longer accepts an EventStream in GIS.trainModel(...). ) Depricate EventStream - Key: OPENNLP-786 URL: https://issues.apache.org/jira/browse/OPENNLP-786 Project: OpenNLP Issue Type: Improvement Components: Machine Learning Affects Versions: 1.6.0 Reporter: Daniel Russ Fix For: 1.6.0 GIS No longer accepts an EventStream in GIS.trainModel(...). There is no need for the EventStream class anymore. There are 2 AbstractEventStream classes. opennlp.tools.ml.model.AbstractEventStream and opennlp.tools.utils.EventStream. Remove or Deprecate opennlp.tools.ml.model.AbstractEventStream to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OPENNLP-759) Speed up GIS training by saving Executor in the GISTrainer
[ https://issues.apache.org/jira/browse/OPENNLP-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ updated OPENNLP-759: Attachment: GISTrainer.patch Sorry for the early problems. I used svn diff GISTrainer.patch to create this patch. I hope it works. Thank you. Dan Speed up GIS training by saving Executor in the GISTrainer -- Key: OPENNLP-759 URL: https://issues.apache.org/jira/browse/OPENNLP-759 Project: OpenNLP Issue Type: Improvement Components: Machine Learning Affects Versions: maxent-3.0.3 Reporter: Daniel Russ Assignee: Joern Kottmann Priority: Minor Labels: Patch Attachments: GISTrainer.patch In GISTrainer.nextIteration(double) an ExecutorService is created and shutdown. I don't see a reason to create a ExecutorService for each iteration. If you create the ExecutorService in the TrainModels(int, dataindexer, Prior, int, int) method you can save it as a field in GISTrainer or pass it as an argument to findParameters(int, double). To test it out, I made a MyGIS and MyGISTrainer classes with the fixes. There was a 5% speedup with 100 my dataset and 100 iterations of GIS. I would be happy to share the code with you. (I can not share my data though, sorry data-use agreements). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OPENNLP-759) Speed up GIS training by saving Executor in the GISTrainer
[ https://issues.apache.org/jira/browse/OPENNLP-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Russ updated OPENNLP-759: Attachment: (was: GISTrainer.patch) Speed up GIS training by saving Executor in the GISTrainer -- Key: OPENNLP-759 URL: https://issues.apache.org/jira/browse/OPENNLP-759 Project: OpenNLP Issue Type: Improvement Components: Machine Learning Affects Versions: maxent-3.0.3 Reporter: Daniel Russ Assignee: Joern Kottmann Priority: Minor Labels: Patch Attachments: GISTrainer.patch In GISTrainer.nextIteration(double) an ExecutorService is created and shutdown. I don't see a reason to create a ExecutorService for each iteration. If you create the ExecutorService in the TrainModels(int, dataindexer, Prior, int, int) method you can save it as a field in GISTrainer or pass it as an argument to findParameters(int, double). To test it out, I made a MyGIS and MyGISTrainer classes with the fixes. There was a 5% speedup with 100 my dataset and 100 iterations of GIS. I would be happy to share the code with you. (I can not share my data though, sorry data-use agreements). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OPENNLP-759) Speed up GIS training by saving Executor in the GISTrainer
[ https://issues.apache.org/jira/browse/OPENNLP-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347201#comment-14347201 ] Daniel Russ commented on OPENNLP-759: - Added the file GISTrainer.patch. This is my first patch file. If it is not formatted properly, let me know and I can either give you the new code or you could give me a hand making the patch. Speed up GIS training by saving Executor in the GISTrainer -- Key: OPENNLP-759 URL: https://issues.apache.org/jira/browse/OPENNLP-759 Project: OpenNLP Issue Type: Improvement Components: Machine Learning Affects Versions: maxent-3.0.3 Reporter: Daniel Russ Priority: Minor Attachments: GISTrainer.patch In GISTrainer.nextIteration(double) an ExecutorService is created and shutdown. I don't see a reason to create a ExecutorService for each iteration. If you create the ExecutorService in the TrainModels(int, dataindexer, Prior, int, int) method you can save it as a field in GISTrainer or pass it as an argument to findParameters(int, double). To test it out, I made a MyGIS and MyGISTrainer classes with the fixes. There was a 5% speedup with 100 my dataset and 100 iterations of GIS. I would be happy to share the code with you. (I can not share my data though, sorry data-use agreements). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OPENNLP-759) Speed up GIS training by saving Executor in the GISTrainer
Daniel Russ created OPENNLP-759: --- Summary: Speed up GIS training by saving Executor in the GISTrainer Key: OPENNLP-759 URL: https://issues.apache.org/jira/browse/OPENNLP-759 Project: OpenNLP Issue Type: Improvement Components: Machine Learning Affects Versions: maxent-3.0.3 Reporter: Daniel Russ Priority: Minor In GISTrainer.nextIteration(double) an ExecutorService is created and shut down. I don't see a reason to create an ExecutorService for each iteration. If you create the ExecutorService in the TrainModels(int, dataindexer, Prior, int, int) method, you can save it as a field in GISTrainer or pass it as an argument to findParameters(int, double). To test it out, I made MyGIS and MyGISTrainer classes with the fixes. There was a 5% speedup with 100 my dataset and 100 iterations of GIS. I would be happy to share the code with you. (I cannot share my data though, sorry; data-use agreements.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
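The idea in OPENNLP-759, creating the ExecutorService once and reusing it across iterations, can be sketched as follows. This is not the GISTrainer code: ReusedExecutorTrainer and its methods are hypothetical, and the per-iteration work (summing slices of an array) is only a placeholder for the real parameter update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical trainer that keeps one thread pool for all iterations.
final class ReusedExecutorTrainer {
  private final int threads;
  private final ExecutorService executor; // created once, stored as a field

  ReusedExecutorTrainer(int threads) {
    this.threads = threads;
    this.executor = Executors.newFixedThreadPool(threads);
  }

  // One iteration: split the data into one slice per thread and combine results.
  long nextIteration(int[] data) {
    int chunk = (data.length + threads - 1) / threads;
    List<Future<Long>> parts = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      final int from = Math.min(t * chunk, data.length);
      final int to = Math.min(from + chunk, data.length);
      parts.add(executor.submit(() -> {
        long sum = 0;
        for (int i = from; i < to; i++) {
          sum += data[i];
        }
        return sum;
      }));
    }
    long total = 0;
    try {
      for (Future<Long> part : parts) {
        total += part.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
    return total;
  }

  // Runs all iterations on the same pool and shuts it down exactly once.
  long train(int[] data, int iterations) {
    long last = 0;
    try {
      for (int i = 0; i < iterations; i++) {
        last = nextIteration(data);
      }
    } finally {
      executor.shutdown();
    }
    return last;
  }
}
```

Shutting the pool down in a finally block keeps the single-pool design from leaking threads if an iteration throws.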