[jira] [Comment Edited] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues

2016-11-30 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708118#comment-15708118
 ] 

Kellen Sunderland edited comment on JOSHUA-324 at 11/30/16 10:15 AM:
-

Opening a pull request shortly to add a license header to [7].  Sorry for missing this one.

The binary file highlighted in [11] is used in the regression test 
org.apache.joshua.decoder.ff.lm.berkeley_lm.LMGrammarBerkeleyTest.  I think 
it's valuable to include it as part of the test suite.  I don't think it 
includes any executable code or compiled source if that's a concern.  It's just 
a serialized POJO.  We can remove this test if needed for the release, but I'm 
going to read up on the binary policy to see if there's some way we can leave 
it in.


was (Author: kellen.sunderland):
Opening a pull shortly to add a license to [7].  Sorry for missing this one.

The binary file highlighted in [11] is used in the regression test 
org.apache.joshua.decoder.ff.lm.berkeley_lm.LMGrammarBerkeleyTest.  I think 
it's valuable to include it as part of the test suite.  I don't think it 
includes any executable code if that's a concern.  It's just a serialized POJO. 
 We can remove this test if needed for the release, but I'm going to read up on 
the binary policy to see if there's some way we can leave it in.

> Address Apache Joshua 6.1 RC#2 Issues
> -
>
> Key: JOSHUA-324
> URL: https://issues.apache.org/jira/browse/JOSHUA-324
> Project: Joshua
>  Issue Type: Task
>Affects Versions: 6.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 6.1
>
>
> Feedback from [~jmclean] (thank you Justin) on our RC#2 is as follows
> {code}
> ==
> - You're missing "incubating" in the release artifact's name. [1]
> - There are a number of binary files in the source release that look to be
> compiled source code.
> I checked:
> - name doesn’t include incubating
> - signatures and hashes correct
> - DISCLAIMER exists
> - LICENSE is missing a few things (see below)
> - a source file is missing an Apache header [7]
> - Several unexpected binary files are contained in the source release
> [8][9][10][11]
> - Can compile from source
> License is missing:
> - MIT licensed normalize.css v3.0.3 bundled in [5]
> - glyph icon fonts [6]
> Not an issue but it's a little odd to have LICENSE and NOTICE.txt - usually
> both are bare or both have .txt extension.
> Also while looking at your site I noticed that the download links on your
> incubating site [2] point to GitHub; please change them to point to the
> official release area.
> Also the 6.1 release has already been tagged and is available for public
> download on GitHub [4] before this vote is finished. This is IMO against
> Apache release policy [3]; please remove it.
> I also notice you recently released the language packs (18th Nov) but there
> doesn’t seem to have been a vote for that? Any reason for this?
> ===
> [1] http://incubator.apache.org/incubation/Incubation_Policy.html#Releases
> [2] 
> https://cwiki.apache.org/confluence/display/JOSHUA/Apache+Joshua+%28Incubating%29+Home
> [3] http://www.apache.org/dev/release.html#what
> [4] https://github.com/apache/incubator-joshua/releases
> [5] ./demo/bootstrap/css/bootstrap.min.css
> [6] apache-joshua-6.1/demo/bootstrap/fonts/*
> [7] ./src/test/java/org/apache/joshua/decoder/ff/tm/OwnerMapTest.java
> [8] ./bin/GIZA++
> [9] ./bin/mkcls
> [10] ./bin/snt2cooc.out
> [11] ./src/test/resources/berkeley_lm/lm.berkeleylm.gz
> [12] http://www.mail-archive.com/general%40incubator.apache.org/msg57543.html
> [13] http://www.mail-archive.com/general%40incubator.apache.org/msg57551.html
> {code}
> This is a blocking issue and until addressed we cannot release 6.1-incubating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (JOSHUA-296) Refactor threading code

2016-08-29 Thread Kellen Sunderland (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kellen Sunderland resolved JOSHUA-296.
--
Resolution: Fixed

Fixed in this PR https://github.com/apache/incubator-joshua/pull/45

> Refactor threading code
> ---
>
> Key: JOSHUA-296
> URL: https://issues.apache.org/jira/browse/JOSHUA-296
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Matt Post
>Assignee: Kellen Sunderland
>Priority: Minor
> Fix For: 6.1
>
>
> The thread-handling code is a bit more complicated than it needs to be. We'd 
> like to simplify this using Executors while maintaining the current 
> stream-based processing features:
> - Input stream: decoding starts and is multithreaded even before the whole 
> input has been received (e.g., so that STDIN works)
> - Multithreading: translations are automatically assigned across threads in a 
> thread pool
> - Output stream: decoding returns right away and callers can block while 
> waiting for translations to assemble
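
The three streaming requirements above can be sketched with a plain java.util.concurrent thread pool. This is only an illustration under assumptions: StreamingDecoderSketch, decodeAll, and collect are hypothetical names, the translate function stands in for the decoder, and none of it is taken from the actual Joshua code or its pull requests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical sketch; these names are illustrative and are not Joshua's API.
class StreamingDecoderSketch {

  // Sentinel marking the end of the result stream (compared by identity).
  private static final Future<String> END = CompletableFuture.completedFuture(null);

  /** Starts decoding and returns right away; results are queued in input order. */
  static BlockingQueue<Future<String>> decodeAll(
      Iterator<String> input, Function<String, String> translate, int threads) {
    BlockingQueue<Future<String>> results = new LinkedBlockingQueue<>();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    // The feeder submits each sentence as soon as it is read, so decoding
    // begins before the whole input has been received.
    Thread feeder = new Thread(() -> {
      try {
        while (input.hasNext()) {
          String sentence = input.next();
          Callable<String> task = () -> translate.apply(sentence);
          results.add(pool.submit(task));
        }
      } finally {
        results.add(END);
        pool.shutdown(); // already-submitted tasks still run to completion
      }
    });
    feeder.setDaemon(true);
    feeder.start();
    return results;
  }

  /** Blocks on each pending translation in turn until the end marker is seen. */
  static List<String> collect(BlockingQueue<Future<String>> results) {
    List<String> out = new ArrayList<>();
    try {
      for (Future<String> f = results.take(); f != END; f = results.take()) {
        out.add(f.get());
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    Iterator<String> input = Arrays.asList("a", "b", "c").iterator();
    System.out.println(collect(decodeAll(input, String::toUpperCase, 2)));
  }
}
```

In this sketch decodeAll returns immediately, and the caller only blocks in collect while a given translation is still pending. Results come back in input order because the futures are queued in submission order.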





[jira] [Commented] (JOSHUA-307) Java-based tokenization and normalization

2016-08-29 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447105#comment-15447105
 ] 

Kellen Sunderland commented on JOSHUA-307:
--

+1.  This would be great, and could go into the CLI module.

> Java-based tokenization and normalization
> -
>
> Key: JOSHUA-307
> URL: https://issues.apache.org/jira/browse/JOSHUA-307
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Matt Post
>Priority: Minor
> Fix For: 6.2
>
>
> Currently, Joshua expects data to be lowercased, normalized, and tokenized 
> consistent with the way the training data was prepared before being passed 
> in. This requires calling Perl scripts on the input data. It would be nice if 
> these Perl scripts (located under $JOSHUA/scripts/preparation) were rewritten 
> in Java (under org.apache.joshua.util) so that Joshua could do this 
> normalization itself. This would be particularly useful for the language 
> packs.





[jira] [Commented] (JOSHUA-285) Not all RuntimeExceptions are caught

2016-08-29 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446080#comment-15446080
 ] 

Kellen Sunderland commented on JOSHUA-285:
--

This is fixed in PR https://github.com/apache/incubator-joshua/pull/45 .  Any 
uncaught exception will now be propagated from the threadpool thread that it 
occurs on, back to the main thread that is iterating over translation results.  
The main thread can have control over how to handle these failures, but they 
will likely be fatal.  In the case of the CLI tool for example we can just 
crash with a stack trace.

There's also a test specifically causing a runtime exception on a worker thread 
and ensuring that it propagates to the main response thread.
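
A minimal sketch of this propagation pattern, assuming a plain java.util.concurrent executor. ExceptionPropagationSketch and runOnWorker are hypothetical names chosen for illustration; the code in the PR may be structured quite differently.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the code from the pull request.
class ExceptionPropagationSketch {

  /** Runs a task on a worker thread and rethrows any failure on the calling thread. */
  static String runOnWorker(Callable<String> task) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<String> result = pool.submit(task);
      // get() blocks until the worker finishes and surfaces its failure, if any.
      return result.get();
    } catch (ExecutionException e) {
      // e.getCause() is the exception originally thrown on the worker thread.
      throw new RuntimeException("Translation failed on worker thread", e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException("Interrupted while waiting for translation", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    try {
      runOnWorker(() -> { throw new IllegalStateException("boom"); });
    } catch (RuntimeException e) {
      System.out.println("Caught on main thread: " + e.getCause());
    }
  }
}
```

The caller decides what to do with the rethrown exception; a CLI tool could simply let it escape and exit with a stack trace.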

> Not all RuntimeExceptions are caught
> 
>
> Key: JOSHUA-285
> URL: https://issues.apache.org/jira/browse/JOSHUA-285
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Kellen Sunderland
> Fix For: 6.1
>
>
> In many instances Joshua threads will throw a RuntimeException that is not 
> caught, causing the decoder to hang indefinitely. These should be caught and, 
> if serious enough, cause the decoder to die. An example of an error that is 
> caught is running out of memory.





[jira] [Commented] (JOSHUA-95) Vocabulary locking

2016-08-18 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426055#comment-15426055
 ] 

Kellen Sunderland commented on JOSHUA-95:
-

Yes, all the contention on Vocabulary has been removed.  

> Vocabulary locking
> --
>
> Key: JOSHUA-95
> URL: https://issues.apache.org/jira/browse/JOSHUA-95
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Juri Ganitkevitch
> Fix For: 6.2
>
>
> Vocabulary::id() is still synchronized and a potential point of contention. 
> It would be nice to resolve this.
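
For illustration, one common way to avoid a synchronized id() method is a concurrent map plus an atomic counter. The thread does not say how the contention was actually removed, so the sketch below is an assumption, and VocabularySketch is a hypothetical class rather than Joshua's Vocabulary.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not Joshua's Vocabulary implementation.
class VocabularySketch {
  private final ConcurrentHashMap<String, Integer> ids = new ConcurrentHashMap<>();
  private final AtomicInteger nextId = new AtomicInteger(0);

  /** Returns the id for a word, assigning a fresh one on first sight. */
  int id(String word) {
    // computeIfAbsent is atomic per key, so no single global lock is needed
    // and each word receives exactly one id even under concurrent calls.
    return ids.computeIfAbsent(word, w -> nextId.getAndIncrement());
  }

  public static void main(String[] args) {
    VocabularySketch vocabulary = new VocabularySketch();
    System.out.println(vocabulary.id("the") == vocabulary.id("the"));
  }
}
```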





[jira] [Updated] (JOSHUA-295) Revamp dependency organization in Joshua

2016-08-18 Thread Kellen Sunderland (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kellen Sunderland updated JOSHUA-295:
-
Affects Version/s: 6.2

> Revamp dependency organization in Joshua
> 
>
> Key: JOSHUA-295
> URL: https://issues.apache.org/jira/browse/JOSHUA-295
> Project: Joshua
>  Issue Type: Improvement
>Affects Versions: 6.2
>Reporter: Kellen Sunderland
>
> We would like to separate dependencies in Joshua by creating a multi-module 
> Maven project.  This will allow us to decouple our codebase and make it more 
> modular.  This means consumers of Joshua who are only interested in a core 
> library do not have to pull in dependencies for things like HTTP servers or 
> database clients.





[jira] [Created] (JOSHUA-303) Simplify feature handling code within Joshua

2016-08-18 Thread Kellen Sunderland (JIRA)
Kellen Sunderland created JOSHUA-303:


 Summary: Simplify feature handling code within Joshua
 Key: JOSHUA-303
 URL: https://issues.apache.org/jira/browse/JOSHUA-303
 Project: Joshua
  Issue Type: Improvement
Affects Versions: 6.2, 7
Reporter: Kellen Sunderland


There's currently a lot of code branching and special cases necessary in Joshua 
to properly handle sparse versus dense features.  We could refactor this code 
to remove the distinction which would simplify many classes (FeatureVector, 
Rule, etc.).





[jira] [Commented] (JOSHUA-221) ArrayIndexOutOfBoundsException when passing arguments to JoshuaDecoder.main

2016-08-18 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426045#comment-15426045
 ] 

Kellen Sunderland commented on JOSHUA-221:
--

Maybe we could resolve this by using args4j for the JoshuaDecoder.main?

> ArrayIndexOutOfBoundsException when passing arguments to JoshuaDecoder.main
> ---
>
> Key: JOSHUA-221
> URL: https://issues.apache.org/jira/browse/JOSHUA-221
> Project: Joshua
>  Issue Type: Bug
>Reporter: Lewis John McGibbney
> Fix For: 6.2
>
>
> {code}
> lmcgibbn@LMC-032857 /usr/local/joshua(master) $ java -jar class/joshua.jar 
> -version
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
>   at joshua.decoder.ArgsParser.<init>(ArgsParser.java:43)
>   at joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:30)
> lmcgibbn@LMC-032857 /usr/local/joshua(master) $ java -jar class/joshua.jar 
> -version -v
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
>   at joshua.decoder.ArgsParser.<init>(ArgsParser.java:43)
>   at joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:30)
> {code}





[jira] [Resolved] (JOSHUA-287) KenLM.java catches UnsatisfiedLinkError when attempting to load libken.so (libken.dylib on OSX)

2016-08-13 Thread Kellen Sunderland (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kellen Sunderland resolved JOSHUA-287.
--
Resolution: Fixed

UnsatisfiedLinkErrors are now wrapped in a descriptive RuntimeException.  The 
message has been updated to indicate that KenLM has not been found, but that 
this may not be a fatal error (e.g. if you're using BerkeleyLM).  Tests that 
rely on KenLM are now skipped when the library is missing, by wrapping that 
descriptive RuntimeException in a TestNG SkipException.
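
A rough sketch of the wrapping step, using only the JDK. The exception name KenLMLoadException comes from the discussion on this issue, but its constructor, the NativeLoaderSketch class, and the message text are assumptions made for illustration.

```java
// Hypothetical sketch; not the code that was merged for this issue.
class NativeLoaderSketch {

  /** Loads a native library, turning a link failure into a descriptive exception. */
  static void load(String libraryName) {
    try {
      System.loadLibrary(libraryName);
    } catch (UnsatisfiedLinkError e) {
      throw new KenLMLoadException(
          "Could not load native library '" + libraryName + "' from java.library.path; "
              + "this is only fatal if a KenLM language model is actually required", e);
    }
  }

  public static void main(String[] args) {
    try {
      load("ken");
      System.out.println("KenLM native library loaded");
    } catch (KenLMLoadException e) {
      System.out.println("KenLM unavailable: " + e.getMessage());
    }
  }
}

/** Unchecked, so callers choose whether a missing library is fatal. */
class KenLMLoadException extends RuntimeException {
  KenLMLoadException(String message, Throwable cause) {
    super(message, cause);
  }
}
```

In a TestNG test, the catch site could rethrow this as an org.testng.SkipException so that the test is reported as skipped rather than failed.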

> KenLM.java catches UnsatisfiedLinkError when attempting to load libken.so 
> (libken.dylib on OSX)
> ---
>
> Key: JOSHUA-287
> URL: https://issues.apache.org/jira/browse/JOSHUA-287
> Project: Joshua
>  Issue Type: Bug
>  Components: core, kenlm
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Assignee: Kellen Sunderland
> Fix For: 6.1
>
>
> As explained in 
> http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg01189.html 
> currently we have an issue, where, when checked out from master the following 
> RuntimeException is thrown.
> {code}
> ---
>  T E S T S
> ---
> Running TestSuite
> WARN - sentence 0 too long 401, truncating to length 200
> WARN - sentence 0 too long 401, truncating to length 200
> WARN - sentence 0 too long 401, truncating to length 200
> WARN - sentence 0 too long 401, truncating to length 200
> tm_pt_0=-2.000 tm_glue_0=3.000 lm_0=-206.718 lm_0_oov=2.000 
> OOVPenalty=-200.000 | -198.000
> ERROR - * FATAL: Can't find libken.so (libken.dylib on OS X) in $JOSHUA/lib
> ERROR - *This probably means that the KenLM library didn't compile.
> ERROR - *Make sure that BOOST_ROOT is set to the root of your boost
> ERROR - *installation (it's not /opt/local/, the default), change to
> ERROR - *$JOSHUA, and type 'ant kenlm'. If problems persist, see the
> ERROR - *website (joshua-decoder.org).
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> {code}
> We need to fix this such that we can run static source code analysis via 
> sonar and have our results available on analysis.apache.org.





[jira] [Comment Edited] (JOSHUA-287) KenLM.java catches UnsatisfiedLinkError when attempting to load libken.so (libken.dylib on OSX)

2016-07-28 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397701#comment-15397701
 ] 

Kellen Sunderland edited comment on JOSHUA-287 at 7/28/16 3:42 PM:
---

I should have addressed this in my latest PR.  
https://github.com/apache/incubator-joshua/pull/33

I think the correct behaviour is to throw a RuntimeException here.  There are 
two environments to consider: testing and the general case.  At test time my 
feeling is that if a user doesn't have libkenlm we should simply skip any tests 
that rely on KenLM.  We shouldn't force users to download/compile KenLM just to 
run unit tests.  In general, if we make a call into the KenLM class and the 
library is not on our java.library.path, it's a serious error.  We should throw 
a descriptive exception (now a KenLMLoadException, which extends 
RuntimeException) and let the caller deal with it.  

Can you provide some details on how this breaks Sonar?  Is it still broken 
after the PR?


was (Author: kellen.sunderland):
I should have addressed this in my latest PR.  
https://github.com/apache/incubator-joshua/pull/33

I think the correct behaviour is to throw a RuntimeException here.  There are 
two environments to consider: testing and the general case.  At test time my 
feeling is that we should simply skip any tests that rely on KenLM.  We 
shouldn't force users to download/compile KenLM just to run unit tests.  In 
general, if we make a call into the KenLM class and the library is not on our 
java.library.path, it's a serious error.  We should throw a descriptive 
exception (now a KenLMLoadException, which extends RuntimeException) and let 
the caller deal with it.  

Can you provide some details on how this breaks Sonar?  Is it still broken 
after the PR?

> KenLM.java catches UnsatisfiedLinkError when attempting to load libken.so 
> (libken.dylib on OSX)
> ---
>
> Key: JOSHUA-287
> URL: https://issues.apache.org/jira/browse/JOSHUA-287
> Project: Joshua
>  Issue Type: Bug
>  Components: core, kenlm
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
> Fix For: 6.1





[jira] [Commented] (JOSHUA-287) KenLM.java catches UnsatisfiedLinkError when attempting to load libken.so (libken.dylib on OSX)

2016-07-28 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397701#comment-15397701
 ] 

Kellen Sunderland commented on JOSHUA-287:
--

I should have addressed this in my latest PR.  
https://github.com/apache/incubator-joshua/pull/33

I think the correct behaviour is to throw a RuntimeException here.  There are 
two environments to consider: testing and the general case.  At test time my 
feeling is that we should simply skip any tests that rely on KenLM.  We 
shouldn't force users to download/compile KenLM just to run unit tests.  In 
general, if we make a call into the KenLM class and the library is not on our 
java.library.path, it's a serious error.  We should throw a descriptive 
exception (now a KenLMLoadException, which extends RuntimeException) and let 
the caller deal with it.  

Can you provide some details on how this breaks Sonar?  Is it still broken 
after the PR?

> KenLM.java catches UnsatisfiedLinkError when attempting to load libken.so 
> (libken.dylib on OSX)
> ---
>
> Key: JOSHUA-287
> URL: https://issues.apache.org/jira/browse/JOSHUA-287
> Project: Joshua
>  Issue Type: Bug
>  Components: core, kenlm
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
> Fix For: 6.1





[jira] [Commented] (JOSHUA-279) Cannot build Joshua master branch

2016-07-11 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370387#comment-15370387
 ] 

Kellen Sunderland commented on JOSHUA-279:
--

Hey Lewis, sorry about these tests failing.  We should not be breaking master 
like this; I'll try to ensure this doesn't happen in the future.  I'd propose 
that in the short term we ignore any tests that rely on KenLM.  

There are a few options we can look at in the long term:  

*  Figure out an acceptable way to grab the KenLM binaries as a dependency when 
our project is built.  This would mean downloading them from a reliable source, 
and ensuring the binaries match our platform.
*  Download KenLM source as a dependency and build it locally when Joshua 
builds (but don't include KenLM source in our repository).
*  Mock out the language model for these tests; I was going to look into the 
feasibility of this.  I'm skeptical that it will work, as there are likely 
millions of calls that would need to be mocked, even in the case of a small 
integration test example.  That being said, maybe we can perform some kind of 
simplification to allow the tests to be useful and still be mockable.  
*  Use a different, Java-based LM for unit tests.  We could for example use 
BerkeleyLM, or a very simple Java LM implementation.



> Cannot build Joshua master branch
> -
>
> Key: JOSHUA-279
> URL: https://issues.apache.org/jira/browse/JOSHUA-279
> Project: Joshua
>  Issue Type: Bug
>  Components: build, documentation, tests
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 6.1
>
>
> Hi Folks,
> We need to be cautious of whatever is committed to master branch... the build 
> has been broken for quite some time and there are constant Javadoc issues 
> which make the build unstable as well.
> For example, when i make an attempt to build master branch we have failing 
> tests
> {code}
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ mvn clean install
> ...
> ---
>  T E S T S
> ---
> Running TestSuite
> tm_pt_0=-2.000 tm_glue_0=3.000 lm_0=-206.718 lm_0_oov=2.000 
> OOVPenalty=-200.000 | -198.000
> ERROR - * FATAL: Can't find libken.so (libken.dylib on OS X) in $JOSHUA/lib
> ERROR - *This probably means that the KenLM library didn't compile.
> ERROR - *Make sure that BOOST_ROOT is set to the root of your boost
> ERROR - *installation (it's not /opt/local/, the default), change to
> ERROR - *$JOSHUA, and type 'ant kenlm'. If problems persist, see the
> ERROR - *website (joshua-decoder.org).
> WARN - sentence 0 too long 401, truncating to length 200
> WARN - sentence 0 too long 401, truncating to length 200
> WARN - sentence 0 too long 401, truncating to length 200
> WARN - sentence 0 too long 401, truncating to length 200
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> WARN - no grammars supplied!  Supplying dummy glue grammar.
> %
> %
> %
> %
> %
> %
> %
> %
> %
> Tests run: 126, Failures: 1, Errors: 0, Skipped: 6, Time elapsed: 1.818 sec 
> <<< FAILURE! - in TestSuite
> setUp(org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest)  
> Time elapsed: 0.075 sec  <<< FAILURE!
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52)
> Caused by: java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: no ken 
> in java.library.path
>   at 
> org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52)
> Caused by: java.lang.UnsatisfiedLinkError: no ken in java.library.path
>   at 
> org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52)
> Results :
> Failed tests:
> org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest)
>   Run 1: ClassBasedLanguageModelTest.setUp:52 » ExceptionInInitializer
>   Run 2: PASS
> Tests run: 124, Failures: 1, Errors: 0, Skipped: 4
> [INFO] 
> 
> [INFO] BUILD FAILURE
> {code}
> As a workaround I thought I will try to build the project without running the 
> 

[jira] [Commented] (JOSHUA-274) Use another HTTPServer other than Suns

2016-05-31 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307596#comment-15307596
 ] 

Kellen Sunderland commented on JOSHUA-274:
--

I would propose we don't put anything web-server-specific in the core Joshua 
jar.  In my mind the Joshua package should really contain just a translation 
library.  We could provide other jars with CLIs and a RESTful service consuming 
this library (maybe even built by default).  

My reasoning is that if you are consuming Joshua strictly as a translation 
library you may prefer there be no web-service code there.  If you're already 
hosting this code in your own web service it could be quite confusing to have 
similar functionality in the library you are exposing.  Worse would be the case 
where, through a configuration error, you accidentally turn on a second web 
service (maybe on a different port).  

The other point I'd make is that by including HTTP functionality in the main 
package we're adding a bunch of dependencies on things like JSON libs, etc.  
These dependencies could conflict with those of anyone wanting to use the 
package as a service. 

> Use another HTTPServer other than Suns
> --
>
> Key: JOSHUA-274
> URL: https://issues.apache.org/jira/browse/JOSHUA-274
> Project: Joshua
>  Issue Type: Improvement
>  Components: decoders
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Priority: Critical
> Fix For: 6.1
>
>
> This issue concerns the use of the 
> [HttpServer|https://github.com/apache/incubator-joshua/blob/master/src/joshua/decoder/JoshuaDecoder.java#L31]
>  within JoshuaDecoder.java. 
> We should replace the com.sun.net.httpserver.HttpServer implementation and 
> other Sun classes with ones from the Java API.





[jira] [Updated] (JOSHUA-275) Revamp the Configuration System

2016-05-27 Thread Kellen Sunderland (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kellen Sunderland updated JOSHUA-275:
-
Description: 
I'd like to propose we centralize Joshua's configuration system to make use of 
typesafe/config https://github.com/typesafehub/config .  This config system 
looks like JSON but with comments, so it's easy to read.  Because it's 
JSON-like, it supports hierarchies of configurations, lists of configurations, 
etc. quite easily.  It has some nice features like parsing time values 
automatically.  The main advantage here though is that we have a standard 
config system that doesn't have to be manually parsed.

Here's a quick example of how we can use it:

{code:java}
@Inject
public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
 String grammar_dir,
 @TypesafeConfig("PackedGrammar.span_limit")
 int span_limit, 
 String owner, 
 String type) throws FileNotFoundException, IOException 
...
{code}

and then a config similar to

\# Joshua configuration file
{code:javascript}
config = {
default-non-terminal = X
goal-symbol = GOAL
...

PackedGrammar: {
type: thrax,
grammar_dir: /local/grammars/...
span_limit: 50
}
...
}
{code}

Version: TBD, but it's a breaking change so we may consider putting it in 
Joshua 7.

Totally open to other config / injection systems if others want to suggest any 
of their favorites.

  was:
I'd like to propose we centralize Joshua's configuration system to make use of 
typesafe/config https://github.com/typesafehub/config .  This config system 
looks like JSON but with comments so it's easy to read.  Because it's JSON it 
supports hierarchies of configurations, lists of configuration etc quite 
easily.  It has some nice features like parsing time automatically.  The main 
advantage here though is that we have a standard config system that doesn't 
have to be manually parsed.

Here's a quick example of how we can use it:

@Inject
public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
 String grammar_dir,
 @TypesafeConfig("PackedGrammar.span_limit")
 int span_limit, 
 String owner, 
 String type) throws FileNotFoundException, IOException 
...

and then a config similar to

\# Joshua configuration file
config = {
default-non-terminal = X
goal-symbol = GOAL
...

PackedGrammar: {
type: thrax,
grammar_dir: /local/grammars/...
span_limit: 50
}
...
}

Version: TBD, but it's a breaking change so we may consider putting it in 
Joshua 7.

Totally open to other config / injection systems if others want to suggest any 
of their favorites.


> Revamp the Configuration System
> ---
>
> Key: JOSHUA-275
> URL: https://issues.apache.org/jira/browse/JOSHUA-275
> Project: Joshua
>  Issue Type: Improvement
>Affects Versions: 6.1, 6.2, 7
>Reporter: Kellen Sunderland
>
> I'd like to propose we centralize Joshua's configuration system to make use 
> of typesafe/config https://github.com/typesafehub/config .  This config 
> system looks like JSON but with comments, so it's easy to read.  Because it's 
> JSON-like, it supports hierarchies of configurations, lists of 
> configurations, etc. quite easily.  It has some nice features like parsing 
> time values automatically.  The main advantage here though is that we have a 
> standard config system that doesn't have to be manually parsed.
> Here's a quick example of how we can use it:
> {code:java}
> @Inject
> public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
>  String grammar_dir,
>  @TypesafeConfig("PackedGrammar.span_limit")
>  int span_limit, 
>  String owner, 
>  String type) throws FileNotFoundException, 
> IOException ...
> {code}
> and then a config similar to
> \# Joshua configuration file
> {code:javascript}
> config = {
> default-non-terminal = X
> goal-symbol = GOAL
> ...
> 
> PackedGrammar: {
> type: thrax,
> grammar_dir: /local/grammars/...
> span_limit: 50
> }
> ...
> }
> {code}
> Version: TBD, but it's a breaking change so we may consider putting it in 
> Joshua 7.
> Totally open to other config / injection systems if others want to suggest 
> any of their favorites.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (JOSHUA-275) Revamp the Configuration System

2016-05-27 Thread Kellen Sunderland (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kellen Sunderland updated JOSHUA-275:
-
Description: 
I'd like to propose we centralize Joshua's configuration system to make use of 
typesafe/config https://github.com/typesafehub/config .  This config system 
looks like JSON but allows comments, so it's easy to read.  Because it's 
JSON-like, it supports hierarchies of configurations, lists of configurations, 
etc. quite easily.  It has some nice features, like parsing time values 
automatically.  The main advantage here, though, is that we have a standard 
config system that doesn't have to be manually parsed.

Here's a quick example of how we can use it:

@Inject
public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
 String grammar_dir,
 @TypesafeConfig("PackedGrammar.span_limit")
 int span_limit, 
 String owner, 
 String type) throws FileNotFoundException, IOException 
...

and then a config similar to

\# Joshua configuration file
config = {
default-non-terminal = X
goal-symbol = GOAL
...

PackedGrammar: {
type: thrax,
grammar_dir: /local/grammars/...
span_limit: 50
}
...
}

Version: TBD, but it's a breaking change so we may consider putting it in 
Joshua 7.

Totally open to other config / injection systems if others want to suggest any 
of their favorites.

  was:
I'd like to propose we centralize Joshua's configuration system to make use of 
typesafe/config https://github.com/typesafehub/config .  This config system 
looks like JSON but allows comments, so it's easy to read.  Because it's 
JSON-like, it supports hierarchies of configurations, lists of configurations, 
etc. quite easily.  It has some nice features, like parsing time values 
automatically.  The main advantage here, though, is that we have a standard 
config system that doesn't have to be manually parsed.

Here's a quick example of how we can use it:

@Inject
public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
 String grammar_dir,
 @TypesafeConfig("PackedGrammar.span_limit")
 int span_limit, 
 String owner, 
 String type) throws FileNotFoundException, IOException 
...

and then a config similar to

\# Joshua configuration file
config = {
default-non-terminal = X
goal-symbol = GOAL
...

tm: {
type: thrax,
grammar_dir: /local/grammars/...
span_limit: 50
}
...
}

Version: TBD, but it's a breaking change so we may consider putting it in 
Joshua 7.

Totally open to other config / injection systems if others want to suggest any 
of their favorites.


> Revamp the Configuration System
> ---
>
> Key: JOSHUA-275
> URL: https://issues.apache.org/jira/browse/JOSHUA-275
> Project: Joshua
>  Issue Type: Improvement
>Affects Versions: 6.1, 6.2, 7
>Reporter: Kellen Sunderland
>
> I'd like to propose we centralize Joshua's configuration system to make use 
> of typesafe/config https://github.com/typesafehub/config .  This config 
> system looks like JSON but allows comments, so it's easy to read.  Because 
> it's JSON-like, it supports hierarchies of configurations, lists of 
> configurations, etc. quite easily.  It has some nice features, like parsing 
> time values automatically.  The main advantage here, though, is that we have 
> a standard config system that doesn't have to be manually parsed.
> Here's a quick example of how we can use it:
> @Inject
> public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
>  String grammar_dir,
>  @TypesafeConfig("PackedGrammar.span_limit")
>  int span_limit, 
>  String owner, 
>  String type) throws FileNotFoundException, 
> IOException ...
> and then a config similar to
> \# Joshua configuration file
> config = {
> default-non-terminal = X
> goal-symbol = GOAL
> ...
> 
> PackedGrammar: {
> type: thrax,
> grammar_dir: /local/grammars/...
> span_limit: 50
> }
> ...
> }
> Version: TBD, but it's a breaking change so we may consider putting it in 
> Joshua 7.
> Totally open to other config / injection systems if others want to suggest 
> any of their favorites.





[jira] [Updated] (JOSHUA-275) Revamp the Configuration System

2016-05-27 Thread Kellen Sunderland (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kellen Sunderland updated JOSHUA-275:
-
Description: 
I'd like to propose we centralize Joshua's configuration system to make use of 
typesafe/config https://github.com/typesafehub/config .  This config system 
looks like JSON but allows comments, so it's easy to read.  Because it's 
JSON-like, it supports hierarchies of configurations, lists of configurations, 
etc. quite easily.  It has some nice features, like parsing time values 
automatically.  The main advantage here, though, is that we have a standard 
config system that doesn't have to be manually parsed.

Here's a quick example of how we can use it:

@Inject
public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
 String grammar_dir,
 @TypesafeConfig("PackedGrammar.span_limit")
 int span_limit, 
 String owner, 
 String type) throws FileNotFoundException, IOException 
...

and then a config similar to

\# Joshua configuration file
config = {
default-non-terminal = X
goal-symbol = GOAL
...

tm: {
type: thrax,
grammar_dir: /local/grammars/...
span_limit: 50
}
...
}

Version: TBD, but it's a breaking change so we may consider putting it in 
Joshua 7.

Totally open to other config / injection systems if others want to suggest any 
of their favorites.

  was:
I'd like to propose we centralize Joshua's configuration system to make use of 
typesafe/config https://github.com/typesafehub/config .  This config system 
looks like JSON but allows comments, so it's easy to read.  Because it's 
JSON-like, it supports hierarchies of configurations, lists of configurations, 
etc. quite easily.  It has some nice features, like parsing time values 
automatically.  The main advantage here, though, is that we have a standard 
config system that doesn't have to be manually parsed.

Here's a quick example of how we can use it:

@Inject
public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
 String grammar_dir,
 @TypesafeConfig("PackedGrammar.span_limit")
 int span_limit, 
 String owner, 
 String type) throws FileNotFoundException, IOException 
...

and then a config similar to

\# Joshua configuration file
config = {
\# Joshua configuration file
default-non-terminal = X
goal-symbol = GOAL
...

tm: {
type: thrax,
grammar_dir: /local/grammars/...
span_limit: 50
}
...
}

Version: TBD, but it's a breaking change so we may consider putting it in 
Joshua 7.

Totally open to other config / injection systems if others want to suggest any 
of their favorites.


> Revamp the Configuration System
> ---
>
> Key: JOSHUA-275
> URL: https://issues.apache.org/jira/browse/JOSHUA-275
> Project: Joshua
>  Issue Type: Improvement
>Affects Versions: 6.1, 6.2, 7
>Reporter: Kellen Sunderland
>
> I'd like to propose we centralize Joshua's configuration system to make use 
> of typesafe/config https://github.com/typesafehub/config .  This config 
> system looks like JSON but allows comments, so it's easy to read.  Because 
> it's JSON-like, it supports hierarchies of configurations, lists of 
> configurations, etc. quite easily.  It has some nice features, like parsing 
> time values automatically.  The main advantage here, though, is that we have 
> a standard config system that doesn't have to be manually parsed.
> Here's a quick example of how we can use it:
> @Inject
> public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
>  String grammar_dir,
>  @TypesafeConfig("PackedGrammar.span_limit")
>  int span_limit, 
>  String owner, 
>  String type) throws FileNotFoundException, 
> IOException ...
> and then a config similar to
> \# Joshua configuration file
> config = {
> default-non-terminal = X
> goal-symbol = GOAL
> ...
> 
> tm: {
> type: thrax,
> grammar_dir: /local/grammars/...
> span_limit: 50
> }
> ...
> }
> Version: TBD, but it's a breaking change so we may consider putting it in 
> Joshua 7.
> Totally open to other config / injection systems if others want to suggest 
> any of their favorites.





[jira] [Updated] (JOSHUA-275) Revamp the Configuration System

2016-05-27 Thread Kellen Sunderland (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kellen Sunderland updated JOSHUA-275:
-
Description: 
I'd like to propose we centralize Joshua's configuration system to make use of 
typesafe/config https://github.com/typesafehub/config .  This config system 
looks like JSON but allows comments, so it's easy to read.  Because it's 
JSON-like, it supports hierarchies of configurations, lists of configurations, 
etc. quite easily.  It has some nice features, like parsing time values 
automatically.  The main advantage here, though, is that we have a standard 
config system that doesn't have to be manually parsed.

Here's a quick example of how we can use it:

@Inject
public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
 String grammar_dir,
 @TypesafeConfig("PackedGrammar.span_limit")
 int span_limit, 
 String owner, 
 String type) throws FileNotFoundException, IOException 
...

and then a config similar to

\# Joshua configuration file
config = {
\# Joshua configuration file
default-non-terminal = X
goal-symbol = GOAL
...

tm: {
type: thrax,
grammar_dir: /local/grammars/...
span_limit: 50
}
...
}

Version: TBD, but it's a breaking change so we may consider putting it in 
Joshua 7.

Totally open to other config / injection systems if others want to suggest any 
of their favorites.

  was:
I'd like to propose we centralize Joshua's configuration system to make use of 
typesafe/config https://github.com/typesafehub/config .  This config system 
looks like JSON but allows comments, so it's easy to read.  Because it's 
JSON-like, it supports hierarchies of configurations, lists of configurations, 
etc. quite easily.  It has some nice features, like parsing time values 
automatically.  The main advantage here, though, is that we have a standard 
config system that doesn't have to be manually parsed.

Here's a quick example of how we can use it:

@Inject
public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
 String grammar_dir,
 @TypesafeConfig("PackedGrammar.span_limit")
 int span_limit, 
 String owner, 
 String type) throws FileNotFoundException, IOException 
...

and then a config similar to

# Joshua configuration file
config = {
# Joshua configuration file
default-non-terminal = X
goal-symbol = GOAL
...

tm: {
type: thrax,
grammar_dir: /local/grammars/...
span_limit: 50
}
...
}

Version: TBD, but it's a breaking change so we may consider putting it in 
Joshua 7.

Totally open to other config / injection systems if others want to suggest any 
of their favorites.


> Revamp the Configuration System
> ---
>
> Key: JOSHUA-275
> URL: https://issues.apache.org/jira/browse/JOSHUA-275
> Project: Joshua
>  Issue Type: Improvement
>Affects Versions: 6.1, 6.2, 7
>Reporter: Kellen Sunderland
>
> I'd like to propose we centralize Joshua's configuration system to make use 
> of typesafe/config https://github.com/typesafehub/config .  This config 
> system looks like JSON but allows comments, so it's easy to read.  Because 
> it's JSON-like, it supports hierarchies of configurations, lists of 
> configurations, etc. quite easily.  It has some nice features, like parsing 
> time values automatically.  The main advantage here, though, is that we have 
> a standard config system that doesn't have to be manually parsed.
> Here's a quick example of how we can use it:
> @Inject
> public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir")
>  String grammar_dir,
>  @TypesafeConfig("PackedGrammar.span_limit")
>  int span_limit, 
>  String owner, 
>  String type) throws FileNotFoundException, 
> IOException ...
> and then a config similar to
> \# Joshua configuration file
> config = {
> \# Joshua configuration file
> default-non-terminal = X
> goal-symbol = GOAL
> ...
> 
> tm: {
> type: thrax,
> grammar_dir: /local/grammars/...
> span_limit: 50
> }
> ...
> }
> Version: TBD, but it's a breaking change so we may consider putting it in 
> Joshua 7.
> Totally open to other config / injection systems if others want to suggest 
> any of their favorites.





[jira] [Created] (JOSHUA-266) Refactor key interfaces and core code for a future release.

2016-05-13 Thread Kellen Sunderland (JIRA)
Kellen Sunderland created JOSHUA-266:


 Summary: Refactor key interfaces and core code for a future 
release. 
 Key: JOSHUA-266
 URL: https://issues.apache.org/jira/browse/JOSHUA-266
 Project: Joshua
  Issue Type: Improvement
Reporter: Kellen Sunderland
Priority: Minor


We've discussed making some modifications to the key interfaces.  This ticket 
can focus on making large changes to the codebase for a future release.  This 
work will likely take some time and some collaboration.  I'd suggest the code 
for this live in a separate release branch.

Some issues we can work on:
*  I'd propose we conform to the SOLID principles for our major interfaces.  
https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)  . 
*  We can look at Sparse / Dense feature vectors and how to handle them 
naturally in Joshua.
*  Refactor objects that may now be used more broadly than was originally 
intended (for example Vocabulary class).
*  We should have a general discussion around what parts of the codebase are 
responsible for what functions.  We should clearly define what logic should be 
a part of the Grammar versus the Feature Functions for example, and make sure 
logic doesn't leak from one of these objects to the others.

 





[jira] [Created] (JOSHUA-265) Refactor key interfaces and core code for a future release.

2016-05-13 Thread Kellen Sunderland (JIRA)
Kellen Sunderland created JOSHUA-265:


 Summary: Refactor key interfaces and core code for a future 
release. 
 Key: JOSHUA-265
 URL: https://issues.apache.org/jira/browse/JOSHUA-265
 Project: Joshua
  Issue Type: Improvement
Reporter: Kellen Sunderland
Priority: Minor


We've discussed making some modifications to the key interfaces.  This ticket 
can focus on making large changes to the codebase for a future release.  This 
work will likely take some time and some collaboration.  I'd suggest the code 
for this live in a separate release branch.

Some issues we can work on:
*  I'd propose we conform to the SOLID principles for our major interfaces.  
https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)  . 
*  We can look at Sparse / Dense feature vectors and how to handle them 
naturally in Joshua.
*  Refactor objects that may now be used more broadly than was originally 
intended (for example Vocabulary class).
*  We should have a general discussion around what parts of the codebase are 
responsible for what functions.  We should clearly define what logic should be 
a part of the Grammar versus the Feature Functions for example, and make sure 
logic doesn't leak from one of these objects to the others.

 





[jira] [Created] (JOSHUA-264) Remove system exits and replace with RuntimeExceptions

2016-05-13 Thread Kellen Sunderland (JIRA)
Kellen Sunderland created JOSHUA-264:


 Summary: Remove system exits and replace with RuntimeExceptions
 Key: JOSHUA-264
 URL: https://issues.apache.org/jira/browse/JOSHUA-264
 Project: Joshua
  Issue Type: Improvement
Reporter: Kellen Sunderland


When Joshua is used as a library it's much more convenient to get 
RuntimeExceptions when a fatal error happens.  This way the host process can 
possibly handle the error or take some appropriate action (alarm, log, etc).
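
A minimal sketch of the idea, using only the JDK.  The names DecoderException, grammarSize and describe are made up for illustration and are not existing Joshua classes or methods.  The library method throws an unchecked exception where it might otherwise have called System.exit, and the host decides what to do with it:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FatalErrorExample {

  /** Hypothetical unchecked exception for fatal errors (not an existing Joshua class). */
  static class DecoderException extends RuntimeException {
    DecoderException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Library side: reports a fatal error by throwing instead of calling System.exit(1). */
  static long grammarSize(String grammarPath) {
    try {
      return Files.size(Paths.get(grammarPath));
    } catch (IOException e) {
      throw new DecoderException("Could not read grammar: " + grammarPath, e);
    }
  }

  /** Host side: the embedding process chooses how to react (log, alarm, retry, ...). */
  static String describe(String grammarPath) {
    try {
      return "loaded " + grammarSize(grammarPath) + " bytes";
    } catch (DecoderException e) {
      return "failed: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    // The JVM keeps running even though the grammar is missing.
    System.out.println(describe("/no/such/dir/grammar.packed"));
  }
}
```

A command-line entry point can still catch the exception at the top level and exit with a non-zero status there, so the CLI behaviour need not change.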







[jira] [Closed] (JOSHUA-263) Standardize logging across Joshua

2016-05-13 Thread Kellen Sunderland (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kellen Sunderland closed JOSHUA-263.

Resolution: Duplicate

> Standardize logging across Joshua
> -
>
> Key: JOSHUA-263
> URL: https://issues.apache.org/jira/browse/JOSHUA-263
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Kellen Sunderland
>






[jira] [Updated] (JOSHUA-263) Standardize logging across Joshua

2016-05-13 Thread Kellen Sunderland (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kellen Sunderland updated JOSHUA-263:
-
Description: (was: We would like to standardize logging across Joshua.  
The purpose is to provide a very loose coupling to concrete logging systems, 
such that organizations can plug in whatever loggers they want at runtime.

There's also a surprisingly large performance consideration that can be 
addressed here.  There are a few cases where a ton of the CPU work we're 
doing actually goes into building strings that never get logged at the logging 
levels we're running under.  Lazy evaluation of these strings should prevent 
this issue.)

> Standardize logging across Joshua
> -
>
> Key: JOSHUA-263
> URL: https://issues.apache.org/jira/browse/JOSHUA-263
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Kellen Sunderland
>






[jira] [Created] (JOSHUA-263) Standardize logging across Joshua

2016-05-13 Thread Kellen Sunderland (JIRA)
Kellen Sunderland created JOSHUA-263:


 Summary: Standardize logging across Joshua
 Key: JOSHUA-263
 URL: https://issues.apache.org/jira/browse/JOSHUA-263
 Project: Joshua
  Issue Type: Improvement
Reporter: Kellen Sunderland


We would like to standardize logging across Joshua.  The purpose is to provide 
a very loose coupling to concrete logging systems, such that organizations can 
plug in whatever loggers they want at runtime.

There's also a surprisingly large performance consideration that can be 
addressed here.  There are a few cases where a ton of the CPU work we're 
doing actually goes into building strings that never get logged at the logging 
levels we're running under.  Lazy evaluation of these strings should prevent 
this issue.
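
As a rough illustration of the lazy-evaluation point, here is a sketch using java.util.logging from the JDK as a stand-in; which logging facade Joshua would actually adopt is not decided here.  The class, the BUILDS counter and expensiveDescription are hypothetical names for this example.  The message is passed as a Supplier, so the string is only built when the level is enabled:

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLoggingExample {

  static final Logger LOG = Logger.getLogger(LazyLoggingExample.class.getName());

  /** Counts how often the expensive message is actually built (for illustration only). */
  static final AtomicInteger BUILDS = new AtomicInteger();

  /** Stand-in for an expensive string, e.g. dumping a large structure. */
  static String expensiveDescription(int[] scores) {
    BUILDS.incrementAndGet();
    return Arrays.toString(scores);
  }

  static void logScores(int[] scores) {
    // Eager form: the string is built even if FINE is disabled.
    //   LOG.fine("scores: " + expensiveDescription(scores));
    // Lazy form: the supplier only runs if FINE is loggable.
    LOG.fine(() -> "scores: " + expensiveDescription(scores));
  }

  public static void main(String[] args) {
    int[] scores = {1, 2, 3};

    LOG.setLevel(Level.INFO);
    logScores(scores);
    System.out.println("builds with FINE disabled: " + BUILDS.get());

    LOG.setLevel(Level.FINE);
    logScores(scores);
    System.out.println("builds with FINE enabled: " + BUILDS.get());
  }
}
```

Parameterized messages in other logging APIs achieve a similar effect for cheap arguments; the supplier form also covers arguments that are themselves expensive to compute.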





[jira] [Assigned] (JOSHUA-260) Integrate IoC (Inversion of Control) into Joshua

2016-05-13 Thread Kellen Sunderland (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kellen Sunderland reassigned JOSHUA-260:


Assignee: Kellen Sunderland

> Integrate IoC (Inversion of Control) into Joshua
> 
>
> Key: JOSHUA-260
> URL: https://issues.apache.org/jira/browse/JOSHUA-260
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Kellen Sunderland
>Assignee: Kellen Sunderland
>
> I'd like to propose we look into using Guice 
> (https://github.com/google/guice) in conjunction with Joshua's configuration 
> system.  I believe it would give us a nice way to map what is in the 
> configuration to the code paths and implementations used within Joshua.  It 
> would also go a long way toward allowing us to integrate unit tests throughout 
> all the important classes in Joshua.  What does everyone think?  Would IoC be 
> a good pattern to adopt?  Is everyone ok with using Guice (versus, say, some 
> other IoC library)?





[jira] [Commented] (JOSHUA-260) Integrate IoC (Inversion of Control) into Joshua

2016-05-02 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267687#comment-15267687
 ] 

Kellen Sunderland commented on JOSHUA-260:
--

This isn't the kind of change that can be made overnight, so don't worry about 
not looking into it by June.  It's a longer-term consideration, and I can 
try to sell you a bit more on it next week.  

If we use Guice alone, the benefit is that all of our implementations will be 
configured and hooked up in a single class at launch time, based on our launch 
configuration.  We won't need branch points in the codebase to handle the 
different arguments that were passed in when the library was launched.  An 
example of code that could be simplified (in Decoder.java) would be:

  if (joshuaConfiguration.amortized_sorting) {
    Decoder.LOG(1, "Grammar sorting happening lazily on-demand.");
  } else {
    long pre_sort_time = System.currentTimeMillis();
    for (Grammar grammar : this.grammars) {
      grammar.sortGrammar(this.featureFunctions);
    }
    Decoder.LOG(1, String.format("Grammar sorting took %d seconds.",
        (System.currentTimeMillis() - pre_sort_time) / 1000));
  }

We could replace this kind of code with a subclass of Decoder that is 
automatically used when a configuration option is set (in this case when the 
option amortized_sorting is false).  This would help keep the size of a class 
like Decoder small: it spreads the logic out across various subclasses and 
automatically chooses the correct subclass at launch time.

So that's the benefit of just using Guice and doing some OO refactoring, but 
there are some nice libraries that will do some of the things you have on your 
wish-list.  I think we can use some combination of args4j and typesafe config 
to accomplish most of the functionality you want.  Args4j in particular will 
make it easy to generate documentation and help for any CLI arguments (looks 
like this is already somewhat the case for the GrammarPacker).  Typesafe config 
also allows you to override any configuration from the CLI as an arg.

We of course don't have to make these changes all at once.  We can gradually 
introduce Guice and Args4j and then consider how to update the config aspects 
of Joshua.


> Integrate IoC (Inversion of Control) into Joshua
> 
>
> Key: JOSHUA-260
> URL: https://issues.apache.org/jira/browse/JOSHUA-260
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Kellen Sunderland
>
> I'd like to propose we look into using Guice 
> (https://github.com/google/guice) in conjunction with Joshua's configuration 
> system.  I believe it would give us a nice way to map what is in the 
> configuration to the code paths and implementations used within Joshua.  It 
> would also go a long way toward allowing us to integrate unit tests throughout 
> all the important classes in Joshua.  What does everyone think?  Would IoC be 
> a good pattern to adopt?  Is everyone ok with using Guice (versus, say, some 
> other IoC library)?





[jira] [Commented] (JOSHUA-172) Speed up grammar file reading with memory-mapped files

2016-05-02 Thread Kellen Sunderland (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267193#comment-15267193
 ] 

Kellen Sunderland commented on JOSHUA-172:
--

This ticket shouldn't be open, should it?  In the current source it seems that 
the grammar is being memory-mapped.
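
For anyone following along, this is roughly what memory-mapped reading looks like with the JDK's FileChannel.  It is a generic sketch, not the code in Joshua's grammar classes; the class and method names are made up, and the temp file only stands in for a packed grammar file:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MemoryMapExample {

  /** Maps the whole file read-only and sums its bytes straight from the mapping. */
  static long sumBytes(Path file) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      MappedByteBuffer buffer =
          channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
      long sum = 0;
      while (buffer.hasRemaining()) {
        sum += buffer.get();
      }
      return sum;
    }
  }

  /** Writes the given bytes to a temp file and reads them back through a mapping. */
  static long sumOfTempFile(byte[] data) {
    try {
      Path file = Files.createTempFile("grammar-sketch", ".bin");
      file.toFile().deleteOnExit();
      Files.write(file, data);
      return sumBytes(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(sumOfTempFile(new byte[] {1, 2, 3, 4}));
  }
}
```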

> Speed up grammar file reading with memory-mapped files
> --
>
> Key: JOSHUA-172
> URL: https://issues.apache.org/jira/browse/JOSHUA-172
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
> Fix For: 6.1
>
>
> [This 
> document|http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly]
>  should be helpful.





[jira] [Created] (JOSHUA-259) Integration tests are failing

2016-05-02 Thread Kellen Sunderland (JIRA)
Kellen Sunderland created JOSHUA-259:


 Summary: Integration tests are failing
 Key: JOSHUA-259
 URL: https://issues.apache.org/jira/browse/JOSHUA-259
 Project: Joshua
  Issue Type: Bug
Reporter: Kellen Sunderland


Several of Joshua's integration tests are currently failing.  I have a quick 
fix coming for one of the tests, but in case we need more discussion around 
the failures I'm opening a bug.

The currently failing tests for me:
test/decoder/too-long
test/server/http
test/server/tcp-text
test/thrax/extraction

and 

test/decoder/moses-compat (but this is easy to fix: there's just an extra 
space in the expected file)

These are failing under OS X 10.11.  If they pass for you in other 
environments, feel free to post a 'works for me'.



