[jira] [Commented] (JOSHUA-260) Integrate IoC (Inversion of Control) into Joshua

2016-05-02 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267687#comment-15267687
 ] 

Kellen Sunderland commented on JOSHUA-260:
--

This isn't the kind of change that can be made overnight, so don't worry about 
not looking into it by June.  It's more of a long-term consideration, and I can 
try to sell you a bit more on it next week.  

If we use Guice alone, the benefit is that all of our implementations would be 
configured and hooked up in a single class at launch time, based on our launch 
configuration.  We wouldn't need branch points in the codebase to handle the 
different arguments that were passed in when the library was launched.  An 
example of code that could be simplified (in Decoder.java) would be:

  if (joshuaConfiguration.amortized_sorting) {
    Decoder.LOG(1, "Grammar sorting happening lazily on-demand.");
  } else {
    long pre_sort_time = System.currentTimeMillis();
    for (Grammar grammar : this.grammars) {
      grammar.sortGrammar(this.featureFunctions);
    }
    Decoder.LOG(1, String.format("Grammar sorting took %d seconds.",
        (System.currentTimeMillis() - pre_sort_time) / 1000));
  }

We could replace this kind of code with a subclass of Decoder that is used 
automatically when a configuration option is set (in this case, when the 
option amortized_sorting is false).  This would help keep a class like Decoder 
small: the logic is spread across several subclasses, and the correct subclass 
is chosen automatically at launch time.
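
A minimal sketch of that idea in plain Java, without Guice itself.  It uses an 
injected strategy object rather than a Decoder subclass, which is a variation 
on the suggestion above, not the proposal as written.  Every name here 
(SketchConfig, SketchGrammar, SortPolicy, SketchDecoder, Wiring) is 
hypothetical and only stands in for the real Joshua classes.  With Guice, the 
choice made in Wiring.createDecoder would instead be expressed as a binding 
inside a module's configure() method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Joshua's configuration object. */
class SketchConfig {
  final boolean amortizedSorting;
  SketchConfig(boolean amortizedSorting) { this.amortizedSorting = amortizedSorting; }
}

/** Hypothetical stand-in for a grammar; it only records whether it was sorted. */
class SketchGrammar {
  boolean sorted = false;
  void sort() { sorted = true; }
}

/** The behaviour that currently lives in the if/else branch. */
interface SortPolicy {
  void prepare(List<SketchGrammar> grammars);
}

/** Used when amortized_sorting is true: sorting is left to happen on demand. */
class LazySortPolicy implements SortPolicy {
  @Override public void prepare(List<SketchGrammar> grammars) {
    // Nothing to do up front.
  }
}

/** Used when amortized_sorting is false: every grammar is sorted at launch. */
class EagerSortPolicy implements SortPolicy {
  @Override public void prepare(List<SketchGrammar> grammars) {
    for (SketchGrammar grammar : grammars) {
      grammar.sort();
    }
  }
}

/** The decoder no longer inspects the configuration flag itself. */
class SketchDecoder {
  final List<SketchGrammar> grammars = new ArrayList<>();
  private final SortPolicy sortPolicy;
  SketchDecoder(SortPolicy sortPolicy) { this.sortPolicy = sortPolicy; }
  void initialize() { sortPolicy.prepare(grammars); }
}

/** The single place where configuration is mapped to implementations. */
class Wiring {
  static SketchDecoder createDecoder(SketchConfig config) {
    SortPolicy policy = config.amortizedSorting
        ? new LazySortPolicy()
        : new EagerSortPolicy();
    return new SketchDecoder(policy);
  }
}
```

Because the policy is passed in through the constructor, a unit test can hand 
the decoder a fake SortPolicy without touching any configuration file.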

So that's the benefit of just using Guice and doing some OO refactoring, but 
there are some nice libraries that will do some of the things you have on your 
wish-list.  I think we can use some combination of args4j and Typesafe Config 
to accomplish most of the functionality you want.  Args4j in particular will 
make it easy to generate documentation and help for any CLI arguments (it looks 
like this is already somewhat the case for the GrammarPacker).  Typesafe Config 
also allows you to override any configuration from the CLI as an arg.

We of course don't have to make these changes all at once.  We can gradually 
introduce Guice and Args4j and then consider how to update the config aspects 
of Joshua.


> Integrate IoC (Inversion of Control) into Joshua
> 
>
> Key: JOSHUA-260
> URL: https://issues.apache.org/jira/browse/JOSHUA-260
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Kellen Sunderland
>
> I'd like to propose we investigate looking into using guice 
> (https://github.com/google/guice) in conjunction with joshua's configuration 
> system.  I believe it would give us a nice way to map what is in the 
> configuration to the code paths, and implementations used within Joshua.  It 
> also would go a long way to allowing us to integrate unit tests throughout 
> all the important classes in Joshua.  What does everyone think?  Would IoC be 
> a good pattern to adopt?  Is everyone ok with using guice (versus say some 
> other IoC library).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (JOSHUA-260) Integrate IoC (Inversion of Control) into Joshua

2016-05-02 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267566#comment-15267566
 ] 

Matt Post commented on JOSHUA-260:
--

This looks cool. I am not going to be able to look into it until June, but we 
could chat about it next week. 

Can you say more about how this interacts with the config system? I'd love to 
see that overhauled. It would be really nice to do better argument processing. 
The features I like in the current system are:

- being able to list all parameters in a config file, but then to override them 
on the command line
- (nice but less important) collapsing different arguments to equiv. classes 
(e.g., "top-n" = "topn" = "topN" etc)
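
The two behaviours listed above can be sketched in a few lines of plain Java. 
This is an illustration only, not Joshua's current argument handling: the 
class name ConfigMerge, the "key = value" file syntax, and the "-name value" 
command-line form are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: config-file values overridden by command-line values. */
class ConfigMerge {

  /** Collapses spelling variants, so "top-n", "topn", "topN" and "-top_n" all map to "topn". */
  static String normalize(String key) {
    return key.replace("-", "").replace("_", "").toLowerCase(Locale.ROOT);
  }

  /** Parses "key = value" lines (assumed syntax); blank lines and '#' comments are skipped. */
  static Map<String, String> parseConfig(Iterable<String> lines) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String raw : lines) {
      String line = raw.trim();
      int eq = line.indexOf('=');
      if (line.isEmpty() || line.startsWith("#") || eq < 0) {
        continue;
      }
      values.put(normalize(line.substring(0, eq).trim()), line.substring(eq + 1).trim());
    }
    return values;
  }

  /** Parses "-name value" pairs (assumed form); flags without a value are not handled. */
  static Map<String, String> parseArgs(String[] args) {
    Map<String, String> values = new LinkedHashMap<>();
    for (int i = 0; i < args.length; i += 2) {
      if (!args[i].startsWith("-") || i + 1 >= args.length) {
        throw new IllegalArgumentException("expected '-name value' at: " + args[i]);
      }
      values.put(normalize(args[i]), args[i + 1]);
    }
    return values;
  }

  /** Command-line values win over values from the config file. */
  static Map<String, String> merge(Map<String, String> fileValues, Map<String, String> cliValues) {
    Map<String, String> merged = new LinkedHashMap<>(fileValues);
    merged.putAll(cliValues);
    return merged;
  }
}
```

Normalizing keys before they are stored means the override works even when 
the file and the command line spell the same parameter differently.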

It would be nice to have:

- builtin documentation to each parameter
- the ability to invoke the decoder with -help

My 20-second look at Guice seems to suggest this is something quite different, 
though?






[jira] [Commented] (JOSHUA-172) Speed up grammar file reading with memory-mapped files

2016-05-02 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267568#comment-15267568
 ] 

Matt Post commented on JOSHUA-172:
--

Agreed.

> Speed up grammar file reading with memory-mapped files
> --
>
> Key: JOSHUA-172
> URL: https://issues.apache.org/jira/browse/JOSHUA-172
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
> Fix For: 6.1
>
>
> [This 
> document|http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly]
>  should be helpful.





[jira] [Commented] (JOSHUA-145) Add truecasing

2016-05-02 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267521#comment-15267521
 ] 

Matt Post commented on JOSHUA-145:
--

Reclassified.

I recently added a related feature to Joshua. If you invoke the decoder with 
-lowercase, all the input sentence tokens will be lowercased, and the grammar 
lookups will use the lowercased version. It then adds an annotation on each 
token of the form

lettercase = {lower, upper, all-upper}

This is available to any feature function, for example. If you also invoke the 
decoder with "-project-case", it will use word-level alignments to project 
source-language case to the target language, according to the following logic:

- If aligned to the first word, case is only projected if it is "all-upper"
- Otherwise, project the source-language case
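
A minimal sketch of the projection rule, under stated assumptions.  This is 
not the decoder's code: the names CaseProjection, annotate, and project are 
hypothetical, "aligned to the first word" is read as "aligned to the source 
token at index 0", and a single capital letter is treated as "upper" rather 
than "all-upper".

```java
import java.util.Locale;

/** Hypothetical illustration of the lettercase annotation and its projection. */
class CaseProjection {

  enum LetterCase { LOWER, UPPER, ALL_UPPER }

  /** Classifies a source token as it appeared before lowercasing. */
  static LetterCase annotate(String token) {
    if (token.isEmpty() || !Character.isUpperCase(token.charAt(0))) {
      return LetterCase.LOWER;
    }
    // Assumption: one capital letter on its own ("A", "I") is UPPER, not ALL_UPPER.
    boolean allUpper = token.length() > 1 && token.equals(token.toUpperCase(Locale.ROOT));
    return allUpper ? LetterCase.ALL_UPPER : LetterCase.UPPER;
  }

  /**
   * Applies the source token's case to the aligned (lowercased) target token.
   * Locale.ROOT is used for simplicity; language-specific casing rules are ignored.
   */
  static String project(String targetToken, LetterCase sourceCase, int sourceIndex) {
    // A sentence-initial capital says little about the word itself, so at
    // index 0 only ALL_UPPER is projected.
    if (sourceIndex == 0 && sourceCase != LetterCase.ALL_UPPER) {
      return targetToken;
    }
    switch (sourceCase) {
      case ALL_UPPER:
        return targetToken.toUpperCase(Locale.ROOT);
      case UPPER:
        return targetToken.isEmpty()
            ? targetToken
            : targetToken.substring(0, 1).toUpperCase(Locale.ROOT) + targetToken.substring(1);
      default:
        return targetToken;
    }
  }
}
```

For example, a target token aligned to "Ankara" in mid-sentence is 
capitalized, while one aligned to a sentence-initial "The" stays lowercase.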

This does things like project all caps, and capitalization of names (including 
if they were OOVs). It's different from true-casing or re-casing. I haven't 
done a thorough comparison, but this was the method that helped put a 
relatively simple Joshua system in first place for WMT 2016 en-tr.

> Add truecasing
> --
>
> Key: JOSHUA-145
> URL: https://issues.apache.org/jira/browse/JOSHUA-145
> Project: Joshua
>  Issue Type: New Feature
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.1
>
>
> Joshua currently lowercases all data; a better approach is truecasing, where 
> the most frequent capitalization pattern is used for each token.





[jira] [Commented] (JOSHUA-172) Speed up grammar file reading with memory-mapped files

2016-05-02 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267193#comment-15267193
 ] 

Kellen Sunderland commented on JOSHUA-172:
--

This ticket shouldn't be open should it?  In the current source it seems that 
the grammar is being memory mapped.
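
For reference, a minimal sketch of what memory-mapped reading looks like in 
Java.  It is not Joshua's grammar-loading code; the class name MappedRead and 
the byte-sum "parsing" are placeholders chosen for the example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of reading a file through a memory mapping. */
class MappedRead {

  /** Maps the whole file read-only and sums its bytes as a stand-in for real parsing. */
  static long checksum(Path file) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      // A single mapping is limited to Integer.MAX_VALUE bytes; larger files
      // would have to be mapped in several regions.
      MappedByteBuffer buffer =
          channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
      long sum = 0;
      while (buffer.hasRemaining()) {
        sum += buffer.get() & 0xFF;  // treat each byte as unsigned
      }
      return sum;
    }
  }
}
```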






[jira] [Commented] (JOSHUA-259) Integration tests are failing

2016-05-02 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267086#comment-15267086
 ] 

Matt Post commented on JOSHUA-259:
--

For the Hadoop test, it currently tests rolling out its own Hadoop cluster. 
This is something I'd like to remove from Joshua (the ability to set up its own 
infrastructure), so I am going to change it so that it just tests your current 
one, exiting without failure if $HADOOP is not defined. Unless there are any 
objections.

> Integration tests are failing
> -
>
> Key: JOSHUA-259
> URL: https://issues.apache.org/jira/browse/JOSHUA-259
> Project: Joshua
>  Issue Type: Bug
>Reporter: Kellen Sunderland
>
> Several integration tests are currently failing with Joshua.  I have a quick 
> fix coming for one of the tests but just in case we need more discussion 
> around the failures I'll open a bug.
> The currently failing tests for me:
> test/decoder/too-long
> test/server/http
> test/server/tcp-text
> test/thrax/extraction
> and 
> test/decoder/moses-compat (but this is easy to fix, simple extra space in the 
> expected file)
> These are failing under OS X 10.11.  If working under other environments feel 
> free to post a 'works for me'.





[jira] [Commented] (JOSHUA-259) Integration tests are failing

2016-05-02 Thread Matt Post (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267080#comment-15267080
 ] 

Matt Post commented on JOSHUA-259:
--

I am having some failures, but not all of yours.

- OS X 10.11: test/server/http and test/server/tcp-text
- CentOS 6.7: test/thrax/extraction test/server/http test/server/tcp-text

(for test/decoder/too-long: did you recompile after pulling?)

Most of these fail often enough that I have just been ignoring them, which is 
bad practice. I can fix these later today.






[jira] [Created] (JOSHUA-259) Integration tests are failing

2016-05-02 Thread Kellen Sunderland (JIRA)
Kellen Sunderland created JOSHUA-259:


 Summary: Integration tests are failing
 Key: JOSHUA-259
 URL: https://issues.apache.org/jira/browse/JOSHUA-259
 Project: Joshua
  Issue Type: Bug
Reporter: Kellen Sunderland


Several integration tests are currently failing with Joshua.  I have a quick 
fix coming for one of the tests but just in case we need more discussion around 
the failures I'll open a bug.

The currently failing tests for me:
test/decoder/too-long
test/server/http
test/server/tcp-text
test/thrax/extraction

and 

test/decoder/moses-compat (but this is easy to fix, simple extra space in the 
expected file)

These are failing under OS X 10.11.  If working under other environments feel 
free to post a 'works for me'.



