[jira] [Commented] (JOSHUA-260) Integrate IoC (Inversion of Control) into Joshua
[ https://issues.apache.org/jira/browse/JOSHUA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267687#comment-15267687 ]

Kellen Sunderland commented on JOSHUA-260:
------------------------------------------

This isn't the kind of change that can be made overnight, so don't worry about not looking into it by June. It's a longer-term consideration, and I can try to sell you a bit more on it next week.

If we use Guice alone, the benefit is that all of our implementations will be configured and hooked up in a single class at launch time, based on our launch configuration. We won't need branch points in the codebase to handle the different arguments that were passed in when the library was launched. An example of code that could be simplified (in Decoder.java) would be:

    if (joshuaConfiguration.amortized_sorting) {
      Decoder.LOG(1, "Grammar sorting happening lazily on-demand.");
    } else {
      long pre_sort_time = System.currentTimeMillis();
      for (Grammar grammar : this.grammars) {
        grammar.sortGrammar(this.featureFunctions);
      }
      Decoder.LOG(1, String.format("Grammar sorting took %d seconds.",
          (System.currentTimeMillis() - pre_sort_time) / 1000));
    }

We could replace this kind of code with a subclass of Decoder that is used automatically when a configuration option is set (in this case, when the option amortized_sorting is false). This would help keep a class like Decoder small: it spreads the logic out across subclasses and automatically chooses the correct subclass at launch time.

So that's the benefit of just using Guice and doing some OO refactoring, but there are some nice libraries that will do some of the things you have on your wish list. I think we can use some combination of args4j and Typesafe Config to accomplish most of the functionality you want. Args4j in particular will make it easy to generate documentation and help for any CLI arguments (it looks like this is already somewhat the case for the GrammarPacker). Typesafe Config also allows you to override any configuration from the CLI as an argument.

We of course don't have to make these changes all at once. We can gradually introduce Guice and args4j and then consider how to update the config aspects of Joshua.

> Integrate IoC (Inversion of Control) into Joshua
> ------------------------------------------------
>
>     Key: JOSHUA-260
>     URL: https://issues.apache.org/jira/browse/JOSHUA-260
>     Project: Joshua
>     Issue Type: Improvement
>     Reporter: Kellen Sunderland
>
> I'd like to propose we investigate using Guice
> (https://github.com/google/guice) in conjunction with Joshua's configuration
> system. I believe it would give us a nice way to map what is in the
> configuration to the code paths and implementations used within Joshua. It
> would also go a long way toward allowing us to integrate unit tests throughout
> all the important classes in Joshua. What does everyone think? Would IoC be
> a good pattern to adopt? Is everyone OK with using Guice (versus, say, some
> other IoC library)?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
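
For readers following the thread, the subclass idea in the comment above can be sketched in plain Java without pulling in Guice. This is only an illustration under stated assumptions: SketchGrammar, SketchDecoder, EagerSortingDecoder and SketchWiring are hypothetical names, not Joshua classes, and the single factory method stands in for what a Guice module would do at launch time.

```java
import java.util.List;

// Hypothetical stand-in for Joshua's Grammar; it only records whether it was sorted.
final class SketchGrammar {
    private boolean sorted = false;

    void sortGrammar() { sorted = true; }

    boolean isSorted() { return sorted; }
}

// Hypothetical base decoder: corresponds to amortized_sorting == true,
// so it does nothing up front and leaves sorting to happen on demand.
class SketchDecoder {
    protected final List<SketchGrammar> grammars;

    SketchDecoder(List<SketchGrammar> grammars) { this.grammars = grammars; }

    void prepareGrammars() { /* lazy: nothing to do at startup */ }
}

// Hypothetical subclass: corresponds to amortized_sorting == false,
// so every grammar is sorted once at startup.
final class EagerSortingDecoder extends SketchDecoder {
    EagerSortingDecoder(List<SketchGrammar> grammars) { super(grammars); }

    @Override
    void prepareGrammars() {
        for (SketchGrammar grammar : grammars) {
            grammar.sortGrammar();
        }
    }
}

// The one place where configuration is mapped to an implementation.
// With Guice, this decision would live in a module instead (assumption).
final class SketchWiring {
    private SketchWiring() {}

    static SketchDecoder decoderFor(boolean amortizedSorting, List<SketchGrammar> grammars) {
        SketchDecoder decoder = amortizedSorting
            ? new SketchDecoder(grammars)
            : new EagerSortingDecoder(grammars);
        decoder.prepareGrammars();
        return decoder;
    }
}
```

The point of the sketch is that the if/else on the configuration flag appears once, in the wiring code, and the decoder classes themselves contain no branch on it.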
[jira] [Commented] (JOSHUA-260) Integrate IoC (Inversion of Control) into Joshua
[ https://issues.apache.org/jira/browse/JOSHUA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267566#comment-15267566 ]

Matt Post commented on JOSHUA-260:
----------------------------------

This looks cool. I am not going to be able to look into it until June, but we could chat about it next week.

Can you say more about how this interacts with the config system? I'd love to see that overhauled. It would be really nice to do better argument processing. The features I like in the current system are:

- being able to list all parameters in a config file, but then override them on the command line
- (nice but less important) collapsing different arguments to equivalence classes (e.g., "top-n" = "topn" = "topN", etc.)

It would be nice to have:

- built-in documentation for each parameter
- the ability to invoke the decoder with -help

My 20-second look at Guice seems to suggest it is something quite different, though?
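
The two features listed above as worth keeping (config-file values overridden from the command line, and argument names collapsed into equivalence classes) could look roughly like the following. This is a minimal sketch, not Joshua's actual configuration code: SketchConfig is a hypothetical class, the normalization rule (drop '-' and '_', then lowercase) is an assumption chosen so that "top-n", "topn" and "topN" coincide, and arguments are assumed to come as "-name value" pairs.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper illustrating key normalization and command-line overrides.
final class SketchConfig {
    private SketchConfig() {}

    // Assumed rule: remove dashes and underscores, then lowercase,
    // so "-top-n", "topn" and "topN" all map to "topn".
    static String normalizeKey(String key) {
        return key.replaceAll("[-_]", "").toLowerCase(Locale.ROOT);
    }

    // Start from the config-file parameters, then let "-name value"
    // pairs from the command line replace any matching entries.
    static Map<String, String> merge(Map<String, String> fileParams, String[] args) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fileParams.entrySet()) {
            merged.put(normalizeKey(entry.getKey()), entry.getValue());
        }
        for (int i = 0; i + 1 < args.length; i += 2) {
            merged.put(normalizeKey(args[i]), args[i + 1]);
        }
        return merged;
    }
}
```

Flags that take no value (such as a bare -help) are deliberately out of scope here; a library like args4j would be the natural place to handle those along with per-parameter documentation.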
[jira] [Commented] (JOSHUA-172) Speed up grammar file reading with memory-mapped files
[ https://issues.apache.org/jira/browse/JOSHUA-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267568#comment-15267568 ]

Matt Post commented on JOSHUA-172:
----------------------------------

Agreed.

> Speed up grammar file reading with memory-mapped files
> ------------------------------------------------------
>
>     Key: JOSHUA-172
>     URL: https://issues.apache.org/jira/browse/JOSHUA-172
>     Project: Joshua
>     Issue Type: Bug
>     Reporter: Matt Post
>     Fix For: 6.1
>
> [This document|http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly]
> should be helpful.
[jira] [Commented] (JOSHUA-145) Add truecasing
[ https://issues.apache.org/jira/browse/JOSHUA-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267521#comment-15267521 ]

Matt Post commented on JOSHUA-145:
----------------------------------

Reclassified. I recently added a related feature to Joshua. If you invoke the decoder with -lowercase, all the input sentence tokens will be lowercased, and the grammar lookups will use the lowercased version. It then adds an annotation on each token of the form

    lettercase = {lower, upper, all-upper}

This is available to any feature function, for example. If you also invoke the decoder with "-project-case", it will use word-level alignments to project source-language case to the target language, according to the following logic:

- If aligned to the first word, case is only projected if it is "all-upper"
- Otherwise, project the source-language case

This does things like project all caps and the capitalization of names (including when they are OOVs). It's different from truecasing or recasing. I haven't done a thorough comparison, but this was the method that helped put a relatively simple Joshua system in first place for WMT 2016 en-tr.

> Add truecasing
> --------------
>
>     Key: JOSHUA-145
>     URL: https://issues.apache.org/jira/browse/JOSHUA-145
>     Project: Joshua
>     Issue Type: New Feature
>     Reporter: Matt Post
>     Assignee: Matt Post
>     Fix For: 6.1
>
> Joshua currently lowercases all data; a better approach is truecasing, where
> the most frequent capitalization pattern is used for each token.
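
The projection rule described in the comment above can be illustrated with a small standalone sketch. It is not Joshua's implementation: CaseProjectionSketch and its methods are hypothetical, each target token is assumed to align to at most one source token (-1 meaning unaligned), "first word" is read as the first source token, and the way a token is classified as lower, upper or all-upper is an assumption.

```java
import java.util.Locale;

// Hypothetical illustration of projecting source-side case onto lowercased output.
final class CaseProjectionSketch {
    enum LetterCase { LOWER, UPPER, ALL_UPPER }

    private CaseProjectionSketch() {}

    // Assumed classification: all-upper needs at least two characters and no
    // lowercase letters; upper means only that the first character is uppercase.
    static LetterCase classify(String token) {
        String upper = token.toUpperCase(Locale.ROOT);
        boolean hasCase = !upper.equals(token.toLowerCase(Locale.ROOT));
        if (token.length() > 1 && hasCase && token.equals(upper)) {
            return LetterCase.ALL_UPPER;
        }
        if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
            return LetterCase.UPPER;
        }
        return LetterCase.LOWER;
    }

    static String applyCase(String token, LetterCase letterCase) {
        switch (letterCase) {
            case ALL_UPPER:
                return token.toUpperCase(Locale.ROOT);
            case UPPER:
                return token.isEmpty()
                    ? token
                    : Character.toUpperCase(token.charAt(0)) + token.substring(1);
            default:
                return token;
        }
    }

    // targetToSource[t] is the source index aligned to target token t, or -1.
    static String[] project(String[] source, String[] target, int[] targetToSource) {
        String[] out = new String[target.length];
        for (int t = 0; t < target.length; t++) {
            out[t] = target[t];
            int s = targetToSource[t];
            if (s < 0 || s >= source.length) {
                continue; // unaligned: leave the token as decoded
            }
            LetterCase letterCase = classify(source[s]);
            // Sentence-initial capitalization carries no information,
            // so the first source word only projects when it is all caps.
            if (s == 0 && letterCase != LetterCase.ALL_UPPER) {
                continue;
            }
            out[t] = applyCase(target[t], letterCase);
        }
        return out;
    }
}
```

With this sketch, a capitalized name in the middle of the source sentence keeps its capital in the output even if the decoder treated it as an OOV, while an ordinary capitalized first word does not force a capital onto whatever it aligns to.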
[jira] [Commented] (JOSHUA-172) Speed up grammar file reading with memory-mapped files
[ https://issues.apache.org/jira/browse/JOSHUA-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267193#comment-15267193 ]

Kellen Sunderland commented on JOSHUA-172:
------------------------------------------

This ticket shouldn't still be open, should it? In the current source it seems that the grammar is already being memory-mapped.
[jira] [Commented] (JOSHUA-259) Integration tests are failing
[ https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267086#comment-15267086 ]

Matt Post commented on JOSHUA-259:
----------------------------------

The Hadoop test currently tests rolling out its own Hadoop cluster. The ability to set up its own infrastructure is something I'd like to remove from Joshua, so I am going to change the test so that it just tests your current cluster, exiting without failure if $HADOOP is not defined, unless there are any objections.

> Integration tests are failing
> -----------------------------
>
>     Key: JOSHUA-259
>     URL: https://issues.apache.org/jira/browse/JOSHUA-259
>     Project: Joshua
>     Issue Type: Bug
>     Reporter: Kellen Sunderland
>
> Several integration tests are currently failing with Joshua. I have a quick
> fix coming for one of the tests but just in case we need more discussion
> around the failures I'll open a bug.
> The currently failing tests for me:
> test/decoder/too-long
> test/server/http
> test/server/tcp-text
> test/thrax/extraction
> and
> test/decoder/moses-compat (but this is easy to fix, simple extra space in the
> expected file)
> These are failing under OS X 10.11. If working under other environments feel
> free to post a 'works for me'.
[jira] [Commented] (JOSHUA-259) Integration tests are failing
[ https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267080#comment-15267080 ]

Matt Post commented on JOSHUA-259:
----------------------------------

I am having some failures, but not all of yours.

- OS X 10.11: test/server/http and test/server/tcp-text
- CentOS 6.7: test/thrax/extraction, test/server/http, and test/server/tcp-text

(For test/decoder/too-long: did you recompile after pulling?)

For most of these, the failure is spurious often enough that I have just ignored them, which is bad practice. I can fix these later today.
[jira] [Created] (JOSHUA-259) Integration tests are failing
Kellen Sunderland created JOSHUA-259:
-------------------------------------

    Summary: Integration tests are failing
    Key: JOSHUA-259
    URL: https://issues.apache.org/jira/browse/JOSHUA-259
    Project: Joshua
    Issue Type: Bug
    Reporter: Kellen Sunderland

Several integration tests are currently failing with Joshua. I have a quick fix coming for one of the tests, but in case we need more discussion around the failures I'll open a bug.

The currently failing tests for me:

- test/decoder/too-long
- test/server/http
- test/server/tcp-text
- test/thrax/extraction
- test/decoder/moses-compat (but this is easy to fix: a simple extra space in the expected file)

These are failing under OS X 10.11. If they work for you under other environments, feel free to post a 'works for me'.