[jira] [Commented] (JOSHUA-335) Consider using thrax2
[ https://issues.apache.org/jira/browse/JOSHUA-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606589#comment-16606589 ] Lewis John McGibbney commented on JOSHUA-335: - [~mjwall] bq. Either by replacing it in the pipeline.pl this is the quickest solution bq. or reworking the pipeline as I have seen discussed in other tickets. This is what needs done. It is hellish and I think we should invest a GSoC project in trying to achieve it. Can you reference the ticket here please if there is one? > Consider using thrax2 > - > > Key: JOSHUA-335 > URL: https://issues.apache.org/jira/browse/JOSHUA-335 > Project: Joshua > Issue Type: Improvement > Components: thrax >Reporter: Michael Wall >Priority: Minor > > Ran across this https://github.com/jweese/thrax2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (JOSHUA-335) Consider using thrax2
[ https://issues.apache.org/jira/browse/JOSHUA-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606465#comment-16606465 ] Lewis John McGibbney commented on JOSHUA-335: - [~mjwall] this is interesting. IIRC Thrax is always used when building LM's in Joshua. Are you planning on taking this on? > Consider using thrax2 > - > > Key: JOSHUA-335 > URL: https://issues.apache.org/jira/browse/JOSHUA-335 > Project: Joshua > Issue Type: Improvement > Components: thrax >Reporter: Michael Wall >Priority: Minor > > Ran across this https://github.com/jweese/thrax2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (JOSHUA-334) Update Homebrew Formular with all language pack options
[ https://issues.apache.org/jira/browse/JOSHUA-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351304#comment-16351304 ] Lewis John McGibbney commented on JOSHUA-334: - Progress can be seen at https://github.com/lewismc/homebrew-core/tree/joshua_language_packs There is still a lot of SHA256 calculation and remote URL resolution to do for Dropbox, but we are getting there. > Update Homebrew Formular with all language pack options > --- > > Key: JOSHUA-334 > URL: https://issues.apache.org/jira/browse/JOSHUA-334 > Project: Joshua > Issue Type: Improvement > Components: homebrew-formula, language packs >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Major > Fix For: 6.2 > > > When I originally wrote the [Homebrew > Formula|https://github.com/Homebrew/homebrew-core/blob/00eea5b204b069416142352ca24314c024f5d6c7/Formula/joshua.rb#L18-L20], > I added options for installing the old *with-es-en-phrase-pack*, > *with-ar-en-phrase-pack* and *with-zh-en-hiero-pack* language packs. > Back then, these were staged on Matt's server at Johns Hopkins but they have > since been relocated to Tom's Dropbox. Additionally, we now have a wealth of > other language packs which are not currently available through the Formula. > This issue is pretty large in scope, but in essence will update the Formula > to provide options for installing [all of our language > packs|https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs]. > Once this is done, it will be very powerful and extremely useful tooling for > Joshua. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
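The SHA256 calculation mentioned in the comment above can be scripted. The following is a minimal sketch in Python, assuming each language pack archive has already been downloaded to a local path. The function name `sha256_of_file` is hypothetical and is not part of the Formula or of Joshua.

```python
import hashlib
import sys


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage (file names are supplied by the caller): python sha256.py pack1.tgz pack2.tgz
    for archive in sys.argv[1:]:
        print("{}  {}".format(sha256_of_file(archive), archive))
```

Each printed digest can then be copied by hand into the corresponding resource entry of the Formula.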
[jira] [Updated] (JOSHUA-328) failure when glue grammar is listed first
[ https://issues.apache.org/jira/browse/JOSHUA-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-328: Fix Version/s: (was: 6.1) 6.2 > failure when glue grammar is listed first > - > > Key: JOSHUA-328 > URL: https://issues.apache.org/jira/browse/JOSHUA-328 > Project: Joshua > Issue Type: Bug >Affects Versions: 6.1 >Reporter: Matt Post >Priority: Major > Fix For: 6.2 > > > If doing CKY-decoding (-search cky), listing the glue grammar before the > packed grammar results in a parsing failure. E.g., the following lines in the > config file: > tm = thrax -maxspan -1 -owner glue -path model/glue.grammar > tm = thrax -maxspan 20 -path model/grammar.packed -owner pt > will result in failed decoding every time, and a printing of the following > error message: > ERROR - the goal_bin does not have exactly one item -- This message was sent by Atlassian JIRA (v7.6.3#76005)
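Until the decoder handles this ordering, a config file can be checked for it before decoding. The sketch below is an illustration only: it assumes the `tm = ... -owner NAME ...` line syntax shown in the issue, and the function name `glue_listed_before_other_grammar` is hypothetical. It flags only the ordering the issue reports as failing.

```python
import shlex


def glue_listed_before_other_grammar(config_text):
    """Return True if a `tm` line owned by `glue` precedes a non-glue `tm` line."""
    owners = []
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "tm":
            continue  # not a translation-model line
        tokens = shlex.split(value)
        owner = None
        if "-owner" in tokens:
            idx = tokens.index("-owner")
            if idx + 1 < len(tokens):
                owner = tokens[idx + 1]
        owners.append(owner)

    seen_glue = False
    for owner in owners:
        if owner == "glue":
            seen_glue = True
        elif seen_glue:
            return True
    return False
```

Called on the two `tm` lines quoted in the issue, the function returns True. With the two lines swapped it returns False.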
[jira] [Commented] (JOSHUA-332) Merge 7 branch into master
[ https://issues.apache.org/jira/browse/JOSHUA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320535#comment-16320535 ] Lewis John McGibbney commented on JOSHUA-332: - If this is an entire PITA I would just leave it and close it as not an issue. > Merge 7 branch into master > --- > > Key: JOSHUA-332 > URL: https://issues.apache.org/jira/browse/JOSHUA-332 > Project: Joshua > Issue Type: Task > Components: core >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 7 > > > As discussed on the mailing list, let's branch _master_ into a _6x_ branch > and merge branch _7_ into _master_ in order to keep developing on top of the > latest in the main branch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (JOSHUA-333) The English-English Language Pack download links are broken.
[ https://issues.apache.org/jira/browse/JOSHUA-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313425#comment-16313425 ] Lewis John McGibbney commented on JOSHUA-333: - [~bugg_tb] were these files copied when we migrated from [~post]'s server to Dropbox? > The English-English Language Pack download links are broken. > > > Key: JOSHUA-333 > URL: https://issues.apache.org/jira/browse/JOSHUA-333 > Project: Joshua > Issue Type: Bug >Reporter: David Gonzalez > > On the Apache Joshua English-English wiki page the ruleset (PPDB v2) > downloads are all broken (404). > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65142863 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (JOSHUA-332) Merge 7 branch into master
[ https://issues.apache.org/jira/browse/JOSHUA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220815#comment-16220815 ] Lewis John McGibbney commented on JOSHUA-332: - Damn Tommaso. Is there still a lot of work to do? > Merge 7 branch into master > --- > > Key: JOSHUA-332 > URL: https://issues.apache.org/jira/browse/JOSHUA-332 > Project: Joshua > Issue Type: Task > Components: core >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 7 > > > As discussed on the mailing list, let's branch _master_ into a _6x_ branch > and merge branch _7_ into _master_ in order to keep developing on top of the > latest in the main branch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (JOSHUA-332) Merge 7 branch into master
[ https://issues.apache.org/jira/browse/JOSHUA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219517#comment-16219517 ] Lewis John McGibbney commented on JOSHUA-332: - [~teofili] I see that your recent [link mailing list discussion|https://lists.apache.org/thread.html/b43cdffd8f3ea7b7c70929eed4aaa989af31bcdc5b5e8320ff412dd4@%3Cdev.joshua.apache.org%3E] may have not been resolved yet. Is this preventing the replacement of current master with 7 branch? Thanks > Merge 7 branch into master > --- > > Key: JOSHUA-332 > URL: https://issues.apache.org/jira/browse/JOSHUA-332 > Project: Joshua > Issue Type: Task > Components: core >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 7 > > > As discussed on the mailing list, let's branch _master_ into a _6x_ branch > and merge branch _7_ into _master_ in order to keep developing on top of the > latest in the main branch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues
[ https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876442#comment-15876442 ] Lewis John McGibbney commented on JOSHUA-324: - [~teofili] yes thank you very much, please do. > Address Apache Joshua 6.1 RC#2 Issues > - > > Key: JOSHUA-324 > URL: https://issues.apache.org/jira/browse/JOSHUA-324 > Project: Joshua > Issue Type: Task >Affects Versions: 6.1 >Reporter: Lewis John McGibbney >Assignee: Tommaso Teofili >Priority: Blocker > Fix For: 6.1 > > > Feedback from [~jmclean] (thank you Justin) on our RC#2 is as follows > {code} > == > - You're missing "incubating" in the release artifacts name. [1] > - There are a number of binary files in the source release that look to be > compiled source code. > I checked: > - name doesn’t include incubating > - signatures and hashes correct > - DISCLAIMER exists > - LICENSE is missing a few things (see below) > - a source file is missing an Apache header [7] > - Several unexpected binary files are contained in the source release > [8][9][10][11] > - Can compile from source > License is missing: > - MIT licensed normalize.css v3.0.3 bundled in [5] > - glyph icon fonts [6] > Not an issue but it's a little odd to have LICENSE and NOTICE.txt - usually > both are bare or both have .txt extension. > Also while looking at your site I noticed that the download links of your > incubating site [2] point to GitHub, please change to point to the official > release area. > Also the 6.1 release has already been tagged and is available for public > download on GitHub [4] before this vote is finished. This is IMO against > Apache release policy [3] please remove. > I also notice you recently released the language packs (18th Nov) but there > doesn’t seem to have been a vote for that? Any reason for this? 
> === > [1] http://incubator.apache.org/incubation/Incubation_Policy.html#Releases > [2] > https://cwiki.apache.org/confluence/display/JOSHUA/Apache+Joshua+%28Incubating%29+Home > [3] http://www.apache.org/dev/release.html#what > [4] https://github.com/apache/incubator-joshua/releases > [5] ./demo/bootstrap/css/bootstrap.min.css > [6] apache-joshua-6.1/demo/bootstrap/fonts/* > [7] ./src/test/java/org/apache/joshua/decoder/ff/tm/OwnerMapTest.java > [8] ./bin/GIZA++ > [9] ./bin/mkcls > [10] ./bin/snt2cooc.out > [11] ./src/test/resources/berkeley_lm/lm.berkeleylm.gz > [12] http://www.mail-archive.com/general%40incubator.apache.org/msg57543.html > [13] http://www.mail-archive.com/general%40incubator.apache.org/msg57551.html > {code} > This is a blocking issue and until addressed we cannot release 6.1-incubating -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues
[ https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838087#comment-15838087 ] Lewis John McGibbney commented on JOSHUA-324: - [~post] the only pending issue is the mvn assembly issue I described at http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg02023.html I'll have a crack today and try to resolve it. > Address Apache Joshua 6.1 RC#2 Issues > - > > Key: JOSHUA-324 > URL: https://issues.apache.org/jira/browse/JOSHUA-324 > Project: Joshua > Issue Type: Task >Affects Versions: 6.1 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues
[ https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827360#comment-15827360 ] Lewis John McGibbney commented on JOSHUA-324: - I'll be finishing my QA and producing an RC#3 tomorrow folks. Thanks. I've just committed {code} commit ae755a8bc0b1de9475285fcc8d35d8a8b5f00a6f Author: Lewis John McGibbney Date: Tue Jan 17 19:12:10 2017 -0800 JOSHUA-324 Address Apache Joshua 6.1 RC#2 Issues {code} > Address Apache Joshua 6.1 RC#2 Issues > - > > Key: JOSHUA-324 > URL: https://issues.apache.org/jira/browse/JOSHUA-324 > Project: Joshua > Issue Type: Task >Affects Versions: 6.1 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues
[ https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706577#comment-15706577 ] Lewis John McGibbney commented on JOSHUA-324: - Hi Folks, I've assigned this to myself and will begin working on a pull request to incrementally address the above issues. > Address Apache Joshua 6.1 RC#2 Issues > - > > Key: JOSHUA-324 > URL: https://issues.apache.org/jira/browse/JOSHUA-324 > Project: Joshua > Issue Type: Task >Affects Versions: 6.1 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues
[ https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned JOSHUA-324: --- Assignee: Lewis John McGibbney > Address Apache Joshua 6.1 RC#2 Issues > - > > Key: JOSHUA-324 > URL: https://issues.apache.org/jira/browse/JOSHUA-324 > Project: Joshua > Issue Type: Task >Affects Versions: 6.1 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues
Lewis John McGibbney created JOSHUA-324: --- Summary: Address Apache Joshua 6.1 RC#2 Issues Key: JOSHUA-324 URL: https://issues.apache.org/jira/browse/JOSHUA-324 Project: Joshua Issue Type: Task Affects Versions: 6.1 Reporter: Lewis John McGibbney Priority: Blocker Fix For: 6.1 Feedback from [~jmclean] (thank you Justin) on our RC#2 is as follows {code} == - You're missing "incubating" in the release artifacts name. [1] - There are a number of binary files in the source release that look to be compiled source code. I checked: - name doesn’t include incubating - signatures and hashes correct - DISCLAIMER exists - LICENSE is missing a few things (see below) - a source file is missing an Apache header [7] - Several unexpected binary files are contained in the source release [8][9][10][11] - Can compile from source License is missing: - MIT licensed normalize.css v3.0.3 bundled in [5] - glyph icon fonts [6] Not an issue but it's a little odd to have LICENSE and NOTICE.txt - usually both are bare or both have .txt extension. Also while looking at your site I noticed that the download links of your incubating site [2] point to GitHub, please change to point to the official release area. Also the 6.1 release has already been tagged and is available for public download on GitHub [4] before this vote is finished. This is IMO against Apache release policy [3] please remove. I also notice you recently released the language packs (18th Nov) but there doesn’t seem to have been a vote for that? Any reason for this? 
=== [1] http://incubator.apache.org/incubation/Incubation_Policy.html#Releases [2] https://cwiki.apache.org/confluence/display/JOSHUA/Apache+Joshua+%28Incubating%29+Home [3] http://www.apache.org/dev/release.html#what [4] https://github.com/apache/incubator-joshua/releases [5] ./demo/bootstrap/css/bootstrap.min.css [6] apache-joshua-6.1/demo/bootstrap/fonts/* [7] ./src/test/java/org/apache/joshua/decoder/ff/tm/OwnerMapTest.java [8] ./bin/GIZA++ [9] ./bin/mkcls [10] ./bin/snt2cooc.out [11] ./src/test/resources/berkeley_lm/lm.berkeleylm.gz [12] http://www.mail-archive.com/general%40incubator.apache.org/msg57543.html [13] http://www.mail-archive.com/general%40incubator.apache.org/msg57551.html {code} This is a blocking issue and until addressed we cannot release 6.1-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
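The "source file is missing an Apache header" item in the feedback above is the kind of check that can be run before cutting a release candidate. The sketch below is a rough illustration, not the project's release tooling: it assumes the header contains the phrase "Licensed to the Apache Software Foundation" near the top of each file, and the function name `files_missing_header` is hypothetical. A dedicated tool such as Apache RAT is the usual choice for a real audit.

```python
import os

# Phrase assumed to appear in the standard ASF source header.
HEADER_MARKER = "Licensed to the Apache Software Foundation"


def files_missing_header(root, extensions=(".java",), probe_chars=2048):
    """Return sorted paths under `root` whose first `probe_chars` lack the marker."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue  # only inspect the requested source types
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                head = handle.read(probe_chars)
            if HEADER_MARKER not in head:
                missing.append(path)
    return sorted(missing)
```

Running it on the `src` directory of an unpacked source release would list candidates for manual review.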
[jira] [Updated] (JOSHUA-315) Thrax keeps all rules
[ https://issues.apache.org/jira/browse/JOSHUA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-315: Fix Version/s: (was: 6.2) 6.1 > Thrax keeps all rules > - > > Key: JOSHUA-315 > URL: https://issues.apache.org/jira/browse/JOSHUA-315 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post > Fix For: 6.1 > > > When extracting rules, Thrax keeps *all* target-side options for each source side. For > large bitexts and common source sides (e.g., "de" for Spanish–English), there > can be tens of thousands of translations, due to errors in the alignments and > phenomena like garbage collection. The decoder throws out all but the top > num_translation_options of these (default 20), but before doing so, it has to > score all the target side options with all feature functions, including the > language model. This slows down "warming up" of the model and means that the > first sentences to use these items are very slow to translate. > I have updated scripts/training/filter-rules.pl to filter out using Thrax's > rarity penalty field, but it would be much better if Thrax were to keep only > the 100 most frequent translation options for each source side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
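The pruning proposed in the issue above, keeping only the most frequent translation options per source side, can be sketched as follows. This is an illustration under stated assumptions: rules are represented as hypothetical `(source, target, count)` tuples, which is neither Thrax's internal data model nor the logic of `filter-rules.pl`, and the function name `keep_top_k` is made up for the example.

```python
import heapq
from collections import defaultdict


def keep_top_k(rules, k=100):
    """Keep the k highest-count target sides for each source side.

    `rules` is an iterable of (source, target, count) tuples.
    """
    by_source = defaultdict(list)
    for source, target, count in rules:
        by_source[source].append((count, target))

    kept = []
    for source, options in by_source.items():
        # nlargest compares (count, target) tuples, so count decides first.
        for count, target in heapq.nlargest(k, options):
            kept.append((source, target, count))
    return kept
```

With `k=100` this matches the cutoff suggested in the issue. The decoder would still apply its own `num_translation_options` limit afterwards.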
[jira] [Updated] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'
[ https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-316: Fix Version/s: (was: 6.2) 6.1 > run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a > bytes-like object is required, not 'str' > - > > Key: JOSHUA-316 > URL: https://issues.apache.org/jira/browse/JOSHUA-316 > Project: Joshua > Issue Type: Bug > Components: bundler >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Critical > Fix For: 6.1 > > > {code} > [glue-tune] rebuilding... > > dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/create_glue_grammar.sh > /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed > > /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > took 1 seconds (1s) > [tune-bundle] rebuilding... 
> > dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/tune/model/run-joshua.sh > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force > --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir > /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > /usr/local/joshua_resources/russian_experiments/exp2/tune/model > --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" > -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 > tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function > "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm > /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed --tm > /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > JOB FAILED (return code 1) > * Running the copy-config.pl script with the command: > /usr/local/incubator-joshua/scripts/copy-config.pl -top-n 300 -output-format > "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 > tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " > -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue > Traceback (most recent call last): > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 748, in main > operations = collect_operations(opts) > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 637, in collect_operations > opts.copy_config_options > File 
"/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 202, in filter_through_copy_config_script > result, err = p.communicate(config_text) > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1072, > in communicate > stdout, stderr = self._communicate(input, endtime, timeout) > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1700, > in _communicate > input_view = memoryview(self._input) > TypeError: memoryview: a bytes-like object is required, not 'str' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 760, in > main(sys.argv) > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 751, in main > error_quit(e.message) > AttributeError: 'TypeError' object has no attribute 'message' > * WARNING: no key 'outputformat' found in config file (appending to end) > * WARNING: no key 'search' found in config file (appending to end) > * WARNING: no key 'topn' found in config file (appending to end) > * WARNING: no key 'markoovs' found in config file (appending to end) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
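The traceback above shows two Python 3 incompatibilities: `Popen.communicate` receives a `str` while the pipe is in binary mode, and the error handler reads `e.message`, which Python 3 exceptions do not have. The sketch below shows one way to handle both. It is not the patch that was applied to `run_bundler.py`; the function names `filter_through` and `describe` are hypothetical, and a small Python child process stands in for `copy-config.pl`.

```python
import subprocess
import sys


def filter_through(cmd, text):
    """Pipe `text` through `cmd` and return the command's stdout as str."""
    p = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Pipes are binary by default on Python 3, so encode the input
    # and decode whatever comes back.
    out, err = p.communicate(text.encode("utf-8"))
    if p.returncode != 0:
        raise RuntimeError(err.decode("utf-8", errors="replace"))
    return out.decode("utf-8")


def describe(exc):
    """Portable replacement for `exc.message`, which Python 3 lacks."""
    return str(exc)


if __name__ == "__main__":
    # Stand-in child process (an assumption for the demo): upper-cases stdin.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(filter_through(child, "tm = thrax"))
```

Passing `universal_newlines=True` to `Popen` is an alternative that keeps the pipes in text mode, so `communicate` accepts and returns `str` directly.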
[jira] [Resolved] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
[ https://issues.apache.org/jira/browse/JOSHUA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved JOSHUA-317. - Resolution: Fixed > SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391 > > > Key: JOSHUA-317 > URL: https://issues.apache.org/jira/browse/JOSHUA-317 > Project: Joshua > Issue Type: Bug > Components: tuner >Affects Versions: 6.0.5 > Environment: Python 3.5 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 6.1 > > > {code} > [tune-bundle] rebuilding... > > dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force > --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir > /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > /usr/local/joshua_resources/russian_experiments/exp3/tune/model > --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" > -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 > tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function > "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm > /usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue > took 0 seconds (0s) > [mert-1] rebuilding... 
> > dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > [CHANGED] > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > [CHANGED] > dep=tune/model/grammar.packed/slice_0.source [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru > --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner > mert --decoder > /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command > --decoder-config > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > --decoder-output-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest > --decoder-log-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log > --iterations 10 --metric 'BLEU 4 closest' > JOB FAILED (return code 1) > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391 > 'ITERATIONS': `iterations`, > ^ > SyntaxError: invalid syntax > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
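Note on the SyntaxError above: the caret points at the backticks around iterations. In Python 2, backticks were shorthand for repr(); Python 3 removed that syntax, so the file no longer parses under the Python 3.5 interpreter listed in the Environment field. The following is a minimal sketch of a spelling that parses on both versions. It assumes the dictionary maps template placeholders to string values; the function name build_params is hypothetical and is not taken from run_tuner.py.

```python
def build_params(iterations):
    """Return template substitutions (hypothetical stand-in for the dict at line 391)."""
    return {
        # Python 2 accepted `iterations` in backticks as shorthand for repr(iterations).
        # Python 3 rejects the backticks, so repr() is called explicitly instead.
        'ITERATIONS': repr(iterations),
    }


if __name__ == '__main__':
    print(build_params(10)['ITERATIONS'])
```

Using repr() keeps the Python 2 behaviour; str() would be an equally plausible choice if the value is only ever an integer.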
[jira] [Resolved] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'
[ https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved JOSHUA-316. - Resolution: Fixed > run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a > bytes-like object is required, not 'str' > - > > Key: JOSHUA-316 > URL: https://issues.apache.org/jira/browse/JOSHUA-316 > Project: Joshua > Issue Type: Bug > Components: bundler >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Critical > Fix For: 6.1 > > > {code} > [glue-tune] rebuilding... > > dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/create_glue_grammar.sh > /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed > > /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > took 1 seconds (1s) > [tune-bundle] rebuilding... 
> > dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/tune/model/run-joshua.sh > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force > --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir > /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > /usr/local/joshua_resources/russian_experiments/exp2/tune/model > --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" > -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 > tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function > "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm > /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed --tm > /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > JOB FAILED (return code 1) > * Running the copy-config.pl script with the command: > /usr/local/incubator-joshua/scripts/copy-config.pl -top-n 300 -output-format > "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 > tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " > -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue > Traceback (most recent call last): > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 748, in main > operations = collect_operations(opts) > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 637, in collect_operations > opts.copy_config_options > File 
"/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 202, in filter_through_copy_config_script > result, err = p.communicate(config_text) > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1072, > in communicate > stdout, stderr = self._communicate(input, endtime, timeout) > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1700, > in _communicate > input_view = memoryview(self._input) > TypeError: memoryview: a bytes-like object is required, not 'str' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 760, in > main(sys.argv) > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 751, in main > error_quit(e.message) > AttributeError: 'TypeError' object has no attribute 'message' > * WARNING: no key 'outputformat' found in config file (appending to end) > * WARNING: no key 'search' found in config file (appending to end) > * WARNING: no key 'topn' found in config file (appending to end) > * WARNING: no key 'markoovs' found in config file (appending to end) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
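Note on the two tracebacks above: the first failure comes from handing a str to communicate() on a pipe that carries bytes under Python 3, and the second from e.message, an attribute that exceptions had on Python 2 but do not have on Python 3. The following is a minimal sketch of both adjustments, not the actual change made to run_bundler.py. The helper names filter_through and describe are hypothetical, and the echo command merely stands in for copy-config.pl.

```python
import subprocess
import sys


def filter_through(cmd, text):
    """Pipe text through cmd and return its stdout as str (hypothetical helper)."""
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    # Pipes opened without text mode carry bytes on Python 3, so encode first.
    out, err = p.communicate(text.encode('utf-8'))
    if p.returncode != 0:
        raise RuntimeError(err.decode('utf-8', errors='replace'))
    return out.decode('utf-8')


def describe(exc):
    """str(exc) works on Python 2 and 3; exc.message existed only on Python 2."""
    return str(exc)


if __name__ == '__main__':
    # Echo stdin back using the current interpreter, standing in for copy-config.pl.
    echo = [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdin.read())']
    print(filter_through(echo, 'top-n = 300'))
    print(describe(TypeError('a bytes-like object is required')))
```

Passing the command as a list avoids shell quoting issues; whether run_bundler.py builds its command that way is not shown in the traceback.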
[jira] [Updated] (JOSHUA-290) Provide Joshua artifact as a bundle
[ https://issues.apache.org/jira/browse/JOSHUA-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-290: Fix Version/s: 6.2 > Provide Joshua artifact as a bundle > --- > > Key: JOSHUA-290 > URL: https://issues.apache.org/jira/browse/JOSHUA-290 > Project: Joshua > Issue Type: Task > Components: build >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 6.2 > > > I think it'd be good if we could make the Joshua artifact an OSGi _bundle_. > This would have no impact on plain java applications but would give the > following benefits: > - make it possible to install it in OSGi environments > - optionally introduce semantic versioning (in addition with the baseline > plugin) that would help track e.g. if changes in APIs break backward > compatibility -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-51) add jhclark/bigfatlm
[ https://issues.apache.org/jira/browse/JOSHUA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-51: --- Fix Version/s: 6.1 > add jhclark/bigfatlm > > > Key: JOSHUA-51 > URL: https://issues.apache.org/jira/browse/JOSHUA-51 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Matt Post > Fix For: 6.2 > > > It would be nice to leverage more Hadoop tools in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-314) Enable set structured-output from config file
[ https://issues.apache.org/jira/browse/JOSHUA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-314: Fix Version/s: 6.2 > Enable set structured-output from config file > - > > Key: JOSHUA-314 > URL: https://issues.apache.org/jira/browse/JOSHUA-314 > Project: Joshua > Issue Type: Improvement > Components: core >Reporter: Tommaso Teofili > Fix For: 6.2 > > > Currently if one sets _use-structured-output = true_ in joshua.config that > results in error when parsing the config as it's not explicitly handled by > {{JoshuaConfiguration#readConfig}} (it can only be set programmatically), I > think it'd be nice to be able to configure it from config file too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-51) add jhclark/bigfatlm
[ https://issues.apache.org/jira/browse/JOSHUA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-51: --- Fix Version/s: (was: 6.1) 6.2 > add jhclark/bigfatlm > > > Key: JOSHUA-51 > URL: https://issues.apache.org/jira/browse/JOSHUA-51 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Matt Post > Fix For: 6.2 > > > It would be nice to leverage more Hadoop tools in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (JOSHUA-323) Joshua 6.1 Release Management
[ https://issues.apache.org/jira/browse/JOSHUA-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved JOSHUA-323. - Resolution: Fixed > Joshua 6.1 Release Management > - > > Key: JOSHUA-323 > URL: https://issues.apache.org/jira/browse/JOSHUA-323 > Project: Joshua > Issue Type: Task > Components: build, release >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > This is a governing ticket for reference more than anything else. We need to > add all release specific build additions to parent pom.xml which enable us to > roll a release candidate. > The process is also being documented over at > https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-323) Joshua 6.1 Release Management
[ https://issues.apache.org/jira/browse/JOSHUA-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656783#comment-15656783 ] Lewis John McGibbney commented on JOSHUA-323: - All licensing is now addressed and merged into master. I have some work to do with regards to release packaging which is not quite up to scratch but I will work on that tomorrow. > Joshua 6.1 Release Management > - > > Key: JOSHUA-323 > URL: https://issues.apache.org/jira/browse/JOSHUA-323 > Project: Joshua > Issue Type: Task > Components: build, release >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > This is a governing ticket for reference more than anything else. We need to > add all release specific build additions to parent pom.xml which enable us to > roll a release candidate. > The process is also being documented over at > https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-323) Joshua 6.1 Release Management
[ https://issues.apache.org/jira/browse/JOSHUA-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656193#comment-15656193 ] Lewis John McGibbney commented on JOSHUA-323: - Progress going well. RAT license headers are taking a wee while but will have them cracked for tomorrow. Following files are outstanding. Progress can be tracked over on https://github.com/apache/incubator-joshua/pull/76 {code} Files with unapproved licenses: scripts/analysis/sentence-by-sentence.pl scripts/analysis/tree_visualizer scripts/copy-config.pl scripts/distributedLM/config.template scripts/distributedLM/create_remote_sym_tbl.pl scripts/distributedLM/filter_lm.pl scripts/distributedLM/get_grammar_eng_voc.pl scripts/distributedLM/get_grammar_eng_voc_from_cn_voc.pl scripts/distributedLM/global_symol_list scripts/distributedLM/lm.list.withweights scripts/ems/config.ghkm scripts/ems/config.hiero scripts/ems/config.phrase scripts/ems/experiment.meta scripts/language-pack/build_lp.sh scripts/language-pack/README.template scripts/misc/canonical_path scripts/misc/iso639 scripts/preparation/detokenize.pl scripts/preparation/lowercase.pl scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.ca scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.cs scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.de scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.el scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.en scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.es scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.fr scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.hu scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.is scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.it scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.lv scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.nl scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.pl 
scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.pt scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.ro scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.ru scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.sk scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.sl scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.sv scripts/preparation/normalize.pl scripts/preparation/tokenize.pl scripts/support/bbn2plf.pl scripts/support/extract-1best scripts/support/grammar-packer.pl scripts/support/moses2joshua.pl scripts/support/moses2joshua_grammar.pl scripts/support/phrase2hiero.py scripts/support/score-hypothesis.pl scripts/support/split2files scripts/training/add-OOVs.pl scripts/training/build-vocab.pl scripts/training/cachepipe/bashrc scripts/training/cachepipe/CachePipe.pm scripts/training/filter-empty-lines.pl scripts/training/filter-rules.pl scripts/training/get_grammar_features.pl scripts/training/lowercase-leaves.pl scripts/training/mira/feature_label_munger.pl scripts/training/mira/run-mira.pl scripts/training/paralign.pl scripts/training/parallelize/LocalConfig.pm scripts/training/parallelize/Makefile scripts/training/parallelize/parallelize.pl scripts/training/parallelize/sentclient.c scripts/training/parallelize/sentserver.c scripts/training/parallelize/sentserver.h scripts/training/paste scripts/training/run-giza.pl scripts/training/scat scripts/training/summarize.pl scripts/training/templates/alignment/jacana/resources/model/tagdict scripts/training/templates/alignment/word-align.conf scripts/training/templates/glue-grammar scripts/training/templates/glue-grammar.itg scripts/training/templates/hadoop/core-site.xml scripts/training/templates/hadoop/hdfs-site.xml scripts/training/templates/hadoop/mapred-site.xml scripts/training/templates/hadoop/masters scripts/training/templates/hadoop/slaves scripts/training/templates/thrax-hiero.conf scripts/training/templates/thrax-phrasal.conf 
scripts/training/templates/thrax-phrase-gt.conf scripts/training/templates/thrax-phrase.conf scripts/training/templates/thrax-samt.conf scripts/training/templates/tune/decoder_command scripts/training/templates/tune/decoder_command.qsub scripts/training/templates/tune/joshua.config scripts/training/TODO scripts/training/trim_parallel_corpus.pl scripts/training/unmap-html.pl {code} > Joshua 6.1 Release Management > - > > Key: JOSHUA-323 > URL: https://issues.apache.org/jira/browse/JOSHUA-323 > Project: Joshua > Issue Type: Task > Components: build, release >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > This is a
[jira] [Updated] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
[ https://issues.apache.org/jira/browse/JOSHUA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-317: Fix Version/s: (was: 6.1) 6.2 > SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391 > > > Key: JOSHUA-317 > URL: https://issues.apache.org/jira/browse/JOSHUA-317 > Project: Joshua > Issue Type: Bug > Components: tuner >Affects Versions: 6.0.5 > Environment: Python 3.5 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 6.2 > > > {code} > [tune-bundle] rebuilding... > > dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force > --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir > /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > /usr/local/joshua_resources/russian_experiments/exp3/tune/model > --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" > -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 > tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function > "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm > /usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue > took 0 seconds (0s) > [mert-1] rebuilding... 
> > dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > [CHANGED] > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > [CHANGED] > dep=tune/model/grammar.packed/slice_0.source [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru > --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner > mert --decoder > /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command > --decoder-config > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > --decoder-output-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest > --decoder-log-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log > --iterations 10 --metric 'BLEU 4 closest' > JOB FAILED (return code 1) > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391 > 'ITERATIONS': `iterations`, > ^ > SyntaxError: invalid syntax > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-323) Joshua 6.1 Release Management
Lewis John McGibbney created JOSHUA-323: --- Summary: Joshua 6.1 Release Management Key: JOSHUA-323 URL: https://issues.apache.org/jira/browse/JOSHUA-323 Project: Joshua Issue Type: Task Components: release, build Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Priority: Blocker Fix For: 6.1 This is a governing ticket for reference more than anything else. We need to add all release specific build additions to parent pom.xml which enable us to roll a release candidate. The process is also being documented over at https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-321) Add JOSHUA env to ./bin/bleu and ./bin/extract-1best bash scripts
Lewis John McGibbney created JOSHUA-321: --- Summary: Add JOSHUA env to ./bin/bleu and ./bin/extract-1best bash scripts Key: JOSHUA-321 URL: https://issues.apache.org/jira/browse/JOSHUA-321 Project: Joshua Issue Type: Bug Affects Versions: 6.0.5 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Priority: Trivial Fix For: 6.1 Right now neither bleu nor extract-1best sets the required $JOSHUA env variable, which results in an error if it is not set within the user's environment. This currently breaks the Homebrew install, amongst other things, so we should add it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-318) scripts/training/run_tuner.py should enable configurable memory usage when invoking joshua-decoder
[ https://issues.apache.org/jira/browse/JOSHUA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629835#comment-15629835 ] Lewis John McGibbney commented on JOSHUA-318: - Agreed, it's set for fix 6.2... if we ever release 6.2. > scripts/training/run_tuner.py should enable configurable memory usage when > invoking joshua-decoder > --- > > Key: JOSHUA-318 > URL: https://issues.apache.org/jira/browse/JOSHUA-318 > Project: Joshua > Issue Type: Improvement > Components: tuner >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney > Fix For: 6.2 > > > When I run the run_tuner.py script I can easily run into the following > {code} > [mert-1] rebuilding... > dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > [CHANGED] > dep=tune/model/grammar.gz.packed/slice_0.source [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru > --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner > mert --decoder > /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command > --decoder-config > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > --decoder-output-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest > --decoder-log-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log > --iterations 10 --metric 'BLEU 4 closest' > JOB FAILED (return code 1) > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space > at > org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.initializeFeatureStructures(PackedGrammar.java:385) > at > 
org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.(PackedGrammar.java:368) > at > org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.(PackedGrammar.java:153) > at > org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:458) > at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:389) > at org.apache.joshua.decoder.Decoder.(Decoder.java:128) > at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69) > Traceback (most recent call last): > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 553, > in > main(sys.argv) > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 536, > in main > run_zmert(opts.tunedir, opts.source, opts.target, opts.decoder, > opts.decoder_config, opts.decoder_output_file, opts) > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 417, > in run_zmert > opts.metric, opts.iterations or 10) > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 399, > in setup_configs > for feature,weight in get_features(config): > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 351, > in get_features > output = check_output("%s/bin/joshua-decoder -c %s -show-weights -v 0" % > (JOSHUA, config_file), shell=True) > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 626, in > check_output > **kwargs).stdout > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 708, in > run > output=stdout, stderr=stderr) > subprocess.CalledProcessError: Command > '/usr/local/incubator-joshua/bin/joshua-decoder -c > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > -show-weights -v 0' returned non-zero exit status 1 > {code} > This is because, by default, the joshua-decoder script runs with 4g of memory. > The run_tuner.py script should be flexible enough to continue with the memory > allocation provided when the pipeline was initially invoked. 
This value should then > be passed to the joshua-decoder script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
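Note on the request above: the traceback shows get_features building the decoder command as a fixed string, so there is nowhere for a memory setting to go. The following is a rough sketch of how the value could be threaded through, under stated assumptions. The function name and the memory parameter are hypothetical, and the "-m" flag is an assumption about the joshua-decoder wrapper that should be checked against the script before use.

```python
import shlex


def decoder_show_weights_cmd(joshua_home, config_file, memory=None):
    """Build the -show-weights command, optionally forwarding a heap size."""
    args = [joshua_home + '/bin/joshua-decoder']
    if memory:
        # ASSUMPTION: the decoder wrapper accepts "-m <size>"; verify before relying on it.
        args += ['-m', memory]
    args += ['-c', config_file, '-show-weights', '-v', '0']
    # Quote each argument so the string is safe to run with shell=True.
    return ' '.join(shlex.quote(a) for a in args)


if __name__ == '__main__':
    print(decoder_show_weights_cmd('/usr/local/incubator-joshua', 'joshua.config', '10g'))
```

With memory left as None the command is the same as the one in the traceback, so existing callers would not change behaviour.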
[jira] [Commented] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
[ https://issues.apache.org/jira/browse/JOSHUA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629831#comment-15629831 ] Lewis John McGibbney commented on JOSHUA-317: - lmcgibbn@LMC-056430 /usr/local/joshua_resources/russian_experiments $ python --version Python 3.5.2 :: Continuum Analytics, Inc. > SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391 > > > Key: JOSHUA-317 > URL: https://issues.apache.org/jira/browse/JOSHUA-317 > Project: Joshua > Issue Type: Bug > Components: tuner >Affects Versions: 6.0.5 > Environment: Python 3.5 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 6.1 > > > {code} > [tune-bundle] rebuilding... > > dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force > --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir > /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > /usr/local/joshua_resources/russian_experiments/exp3/tune/model > --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" > -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 > tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function > "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm > /usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue > took 0 seconds (0s) > [mert-1] rebuilding... 
> > dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > [CHANGED] > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > [CHANGED] > dep=tune/model/grammar.packed/slice_0.source [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru > --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner > mert --decoder > /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command > --decoder-config > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > --decoder-output-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest > --decoder-log-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log > --iterations 10 --metric 'BLEU 4 closest' > JOB FAILED (return code 1) > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391 > 'ITERATIONS': `iterations`, > ^ > SyntaxError: invalid syntax > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (JOSHUA-319) test-decode decoder_command results in java.lang.NumberFormatException: For input string: "MAXSPAN"
[ https://issues.apache.org/jira/browse/JOSHUA-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved JOSHUA-319. - Resolution: Not A Problem The issue is produced by the pipeline failing on the mert stage! The mert stage is then cached as a pseudo-complete status however the final config file is never truly produced. This causes the subsequent decoding task to fail. I re-ran another pipeline, which just finished flawlessly. > test-decode decoder_command results in java.lang.NumberFormatException: For > input string: "MAXSPAN" > --- > > Key: JOSHUA-319 > URL: https://issues.apache.org/jira/browse/JOSHUA-319 > Project: Joshua > Issue Type: Bug > Components: decoders >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 6.1 > > > When I run the following command > {code} > /usr/local/incubator-joshua/bin/pipeline.pl --rundir . --type hiero --corpus > /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en --tune > /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune > --test > /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test > --source en --target ru --readme "Experiment 3 Run 1 of ru --> en model > training" --aligner berkeley --hadoop-mem 10g --tmp > /usr/local/hadoop-2.5.2/hadoop_tmp_dir --first-step test --grammar > /usr/local/joshua_resources/russian_experiments/exp3/grammar.gz --joshua-mem > 10g > {code} > I end up with the following message. 
> {code} > INFO - Parameters read from configuration file: joshua.config > INFO - tm = 'TYPE -maxspan MAXSPAN -owner OWNER -path > /usr/local/joshua_resources/russian_experiments/exp3/test/1/model/grammar.gz.packed' > INFO - tm = 'thrax -maxspan -1 -owner glue -path > /usr/local/joshua_resources/russian_experiments/exp3/test/1/model/grammar.glue' > INFO - defaultnonterminal = 'X' > INFO - goalsymbol = 'GOAL' > INFO - markoovs = 'false' > INFO - search = 'cky' > INFO - pop-limit: 5000 > INFO - poplimit = '5000' > INFO - topn = '300' > INFO - useuniquenbest = 'true' > INFO - outputformat = '%i ||| %s ||| %f ||| %c' > INFO - includealignindex = 'false' > INFO - featurefunction = 'OOVPenalty' > INFO - featurefunction = 'WordPenalty' > INFO - c = 'joshua.config' > INFO - threads = '1' > INFO - topn = '0' > INFO - outputformat = '%s' > INFO - Read 3 weights (0 of them dense) > Exception in thread "main" java.lang.NumberFormatException: For input string: > "MAXSPAN" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:580) > at java.lang.Integer.parseInt(Integer.java:615) > at > org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:451) > at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:389) > at org.apache.joshua.decoder.Decoder.(Decoder.java:128) > at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
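Note on the log above: the first tm line still contains the template tokens TYPE, MAXSPAN and OWNER, which fits the explanation in the resolution that the final config was never truly produced. The following is a small sketch of a pre-flight check that would surface this before the decoder starts. It assumes config lines of the form "tm = ..." and treats any all-capitals token of three or more letters as an unfilled placeholder; that heuristic and the function name are illustrative only and not part of Joshua.

```python
import re

# Tokens made only of capital letters, e.g. MAXSPAN in "tm = TYPE -maxspan MAXSPAN -owner OWNER".
PLACEHOLDER = re.compile(r'^[A-Z]{3,}$')


def unfilled_placeholders(config_text):
    """Return placeholder-looking tokens found on tm lines of a config (heuristic)."""
    found = []
    for line in config_text.splitlines():
        key, sep, value = line.partition('=')
        if sep and key.strip() == 'tm':
            found += [tok for tok in value.split() if PLACEHOLDER.match(tok)]
    return found


if __name__ == '__main__':
    sample = ("tm = TYPE -maxspan MAXSPAN -owner OWNER -path grammar.gz.packed\n"
              "tm = thrax -maxspan -1 -owner glue -path grammar.glue\n")
    print(unfilled_placeholders(sample))
```

A legitimately upper-case owner name would also be flagged, so a check like this is better suited to a warning than a hard failure.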
[jira] [Created] (JOSHUA-320) --joshua-mem pipeline parameter is not populated to mert processes
Lewis John McGibbney created JOSHUA-320: --- Summary: --joshua-mem pipeline parameter is not populated to mert processes Key: JOSHUA-320 URL: https://issues.apache.org/jira/browse/JOSHUA-320 Project: Joshua Issue Type: Bug Components: mert, pipeline Affects Versions: 6.0.5 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 6.2 As we've discussed on the Joshua mailing list at http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg01765.html it is not realistic to reserve only 4g for several tasks which are executed as part of a typical pipeline run. In particular, MERT runs with 4g, which is not enough. We should increase this to something like 8g or more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-319) test-decode decoder_command results in java.lang.NumberFormatException: For input string: "MAXSPAN"
[ https://issues.apache.org/jira/browse/JOSHUA-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610743#comment-15610743 ] Lewis John McGibbney commented on JOSHUA-319: - Some supplementary reading folks http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg01769.html > test-decode decoder_command results in java.lang.NumberFormatException: For > input string: "MAXSPAN" > --- > > Key: JOSHUA-319 > URL: https://issues.apache.org/jira/browse/JOSHUA-319 > Project: Joshua > Issue Type: Bug > Components: decoders >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 6.1 > > > When I run the following command > {code} > /usr/local/incubator-joshua/bin/pipeline.pl --rundir . --type hiero --corpus > /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en --tune > /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune > --test > /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test > --source en --target ru --readme "Experiment 3 Run 1 of ru --> en model > training" --aligner berkeley --hadoop-mem 10g --tmp > /usr/local/hadoop-2.5.2/hadoop_tmp_dir --first-step test --grammar > /usr/local/joshua_resources/russian_experiments/exp3/grammar.gz --joshua-mem > 10g > {code} > I end up with the following message. 
> {code} > INFO - Parameters read from configuration file: joshua.config > INFO - tm = 'TYPE -maxspan MAXSPAN -owner OWNER -path > /usr/local/joshua_resources/russian_experiments/exp3/test/1/model/grammar.gz.packed' > INFO - tm = 'thrax -maxspan -1 -owner glue -path > /usr/local/joshua_resources/russian_experiments/exp3/test/1/model/grammar.glue' > INFO - defaultnonterminal = 'X' > INFO - goalsymbol = 'GOAL' > INFO - markoovs = 'false' > INFO - search = 'cky' > INFO - pop-limit: 5000 > INFO - poplimit = '5000' > INFO - topn = '300' > INFO - useuniquenbest = 'true' > INFO - outputformat = '%i ||| %s ||| %f ||| %c' > INFO - includealignindex = 'false' > INFO - featurefunction = 'OOVPenalty' > INFO - featurefunction = 'WordPenalty' > INFO - c = 'joshua.config' > INFO - threads = '1' > INFO - topn = '0' > INFO - outputformat = '%s' > INFO - Read 3 weights (0 of them dense) > Exception in thread "main" java.lang.NumberFormatException: For input string: > "MAXSPAN" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:580) > at java.lang.Integer.parseInt(Integer.java:615) > at > org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:451) > at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:389) > at org.apache.joshua.decoder.Decoder.(Decoder.java:128) > at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-319) test-decode decoder_command results in java.lang.NumberFormatException: For input string: "MAXSPAN"
Lewis John McGibbney created JOSHUA-319: --- Summary: test-decode decoder_command results in java.lang.NumberFormatException: For input string: "MAXSPAN" Key: JOSHUA-319 URL: https://issues.apache.org/jira/browse/JOSHUA-319 Project: Joshua Issue Type: Bug Components: decoders Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 6.1 When I run the following command {code} /usr/local/incubator-joshua/bin/pipeline.pl --rundir . --type hiero --corpus /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en --tune /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune --test /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test --source en --target ru --readme "Experiment 3 Run 1 of ru --> en model training" --aligner berkeley --hadoop-mem 10g --tmp /usr/local/hadoop-2.5.2/hadoop_tmp_dir --first-step test --grammar /usr/local/joshua_resources/russian_experiments/exp3/grammar.gz --joshua-mem 10g {code} I end up with the following message. 
{code} INFO - Parameters read from configuration file: joshua.config INFO - tm = 'TYPE -maxspan MAXSPAN -owner OWNER -path /usr/local/joshua_resources/russian_experiments/exp3/test/1/model/grammar.gz.packed' INFO - tm = 'thrax -maxspan -1 -owner glue -path /usr/local/joshua_resources/russian_experiments/exp3/test/1/model/grammar.glue' INFO - defaultnonterminal = 'X' INFO - goalsymbol = 'GOAL' INFO - markoovs = 'false' INFO - search = 'cky' INFO - pop-limit: 5000 INFO - poplimit = '5000' INFO - topn = '300' INFO - useuniquenbest = 'true' INFO - outputformat = '%i ||| %s ||| %f ||| %c' INFO - includealignindex = 'false' INFO - featurefunction = 'OOVPenalty' INFO - featurefunction = 'WordPenalty' INFO - c = 'joshua.config' INFO - threads = '1' INFO - topn = '0' INFO - outputformat = '%s' INFO - Read 3 weights (0 of them dense) Exception in thread "main" java.lang.NumberFormatException: For input string: "MAXSPAN" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:580) at java.lang.Integer.parseInt(Integer.java:615) at org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:451) at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:389) at org.apache.joshua.decoder.Decoder.(Decoder.java:128) at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-318) scripts/training/run_tuner.py should enable configurable memory usage when invioking joshua-decoder
[ https://issues.apache.org/jira/browse/JOSHUA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609503#comment-15609503 ] Lewis John McGibbney commented on JOSHUA-318: - The following code is where the sh*t hits the fan
{code}
def get_features(config_file):
    """Queries the decoder for all dense features that will be fired by the feature functions activated in the config file"""
    output = check_output("%s/bin/joshua-decoder -c %s -show-weights -v 0" % (JOSHUA, config_file), shell=True)
    features = []
    for index, item in enumerate(output.split('\n')):
        if item != "":
            features.append(tuple(item.split()))
    return features
{code}
> scripts/training/run_tuner.py should enable configurable memory usage when > invioking joshua-decoder > --- > > Key: JOSHUA-318 > URL: https://issues.apache.org/jira/browse/JOSHUA-318 > Project: Joshua > Issue Type: Improvement > Components: tuner >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney > Fix For: 6.2 > > > When I run the run_tuner.py script I can easily run into the following > {code} > [mert-1] rebuilding... 
> dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > [CHANGED] > dep=tune/model/grammar.gz.packed/slice_0.source [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru > --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner > mert --decoder > /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command > --decoder-config > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > --decoder-output-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest > --decoder-log-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log > --iterations 10 --metric 'BLEU 4 closest' > JOB FAILED (return code 1) > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space > at > org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.initializeFeatureStructures(PackedGrammar.java:385) > at > org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.(PackedGrammar.java:368) > at > org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.(PackedGrammar.java:153) > at > org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:458) > at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:389) > at org.apache.joshua.decoder.Decoder.(Decoder.java:128) > at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69) > Traceback (most recent call last): > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 553, > in > main(sys.argv) > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 536, > in main > run_zmert(opts.tunedir, opts.source, 
opts.target, opts.decoder, > opts.decoder_config, opts.decoder_output_file, opts) > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 417, > in run_zmert > opts.metric, opts.iterations or 10) > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 399, > in setup_configs > for feature,weight in get_features(config): > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 351, > in get_features > output = check_output("%s/bin/joshua-decoder -c %s -show-weights -v 0" % > (JOSHUA, config_file), shell=True) > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 626, in > check_output > **kwargs).stdout > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 708, in > run > output=stdout, stderr=stderr) > subprocess.CalledProcessError: Command > '/usr/local/incubator-joshua/bin/joshua-decoder -c > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > -show-weights -v 0' returned non-zero exit status 1 > {code} > This is because, by default, the joshua-decoder script runs with 4g of memory. > The run_tuner.py script should be flexible enough to reuse the memory > allocation provided when the pipeline was initially invoked. This value should then > be passed to the joshua-decoder script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
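One way the behaviour requested in JOSHUA-318 could look is sketched below in Python. It is only a sketch under stated assumptions: it assumes the joshua-decoder wrapper script accepts a leading `-m <size>` option for the JVM heap, which this ticket does not confirm, and `build_show_weights_command`, `parse_features` and the `memory` parameter are hypothetical names that do not exist in run_tuner.py.

```python
import shlex


def build_show_weights_command(joshua_home, config_file, memory=None):
    """Build the argument list for the decoder's -show-weights query.

    `memory` is a hypothetical pass-through of the pipeline's --joshua-mem
    value (e.g. "10g"). The "-m" flag is an ASSUMPTION about the
    joshua-decoder wrapper script and should be checked before use.
    """
    cmd = ["%s/bin/joshua-decoder" % joshua_home]
    if memory:
        # Only override the wrapper's default heap when a value was given.
        cmd += ["-m", memory]
    cmd += ["-c", config_file, "-show-weights", "-v", "0"]
    return cmd


def parse_features(output):
    """Split decoder output into tuples of fields, skipping blank lines."""
    return [tuple(line.split()) for line in output.split("\n") if line.strip()]


if __name__ == "__main__":
    cmd = build_show_weights_command(
        "/usr/local/incubator-joshua", "joshua.config", "10g")
    # Print a shell-safe rendering of the command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list means it could be handed to `subprocess.check_output` without `shell=True`; passing `universal_newlines=True` there would also return `str` rather than `bytes` on Python 3, which the `split('\n')` in `get_features` relies on.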
[jira] [Created] (JOSHUA-318) scripts/training/run_tuner.py should enable configurable memory usage when invioking joshua-decoder
Lewis John McGibbney created JOSHUA-318: --- Summary: scripts/training/run_tuner.py should enable configurable memory usage when invioking joshua-decoder Key: JOSHUA-318 URL: https://issues.apache.org/jira/browse/JOSHUA-318 Project: Joshua Issue Type: Improvement Components: tuner Affects Versions: 6.0.5 Reporter: Lewis John McGibbney Fix For: 6.2 When I run the run_tuner.py script I can easily run into the following {code} [mert-1] rebuilding... dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config [CHANGED] dep=tune/model/grammar.gz.packed/slice_0.source [CHANGED] dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final [NOT FOUND] cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner mert --decoder /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command --decoder-config /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config --decoder-output-file /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest --decoder-log-file /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log --iterations 10 --metric 'BLEU 4 closest' JOB FAILED (return code 1) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.initializeFeatureStructures(PackedGrammar.java:385) at org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.(PackedGrammar.java:368) at org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.(PackedGrammar.java:153) at org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:458) at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:389) at 
org.apache.joshua.decoder.Decoder.(Decoder.java:128) at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69) Traceback (most recent call last): File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 553, in main(sys.argv) File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 536, in main run_zmert(opts.tunedir, opts.source, opts.target, opts.decoder, opts.decoder_config, opts.decoder_output_file, opts) File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 417, in run_zmert opts.metric, opts.iterations or 10) File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 399, in setup_configs for feature,weight in get_features(config): File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 351, in get_features output = check_output("%s/bin/joshua-decoder -c %s -show-weights -v 0" % (JOSHUA, config_file), shell=True) File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 626, in check_output **kwargs).stdout File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 708, in run output=stdout, stderr=stderr) subprocess.CalledProcessError: Command '/usr/local/incubator-joshua/bin/joshua-decoder -c /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config -show-weights -v 0' returned non-zero exit status 1 {code} This is because, by default, the joshua-decoder script runs with 4g of memory. The run_tuner.py script should be flexible enough to reuse the memory allocation provided when the pipeline was initially invoked. This value should then be passed to the joshua-decoder script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
[ https://issues.apache.org/jira/browse/JOSHUA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-317: Component/s: (was: er) tuner > SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391 > > > Key: JOSHUA-317 > URL: https://issues.apache.org/jira/browse/JOSHUA-317 > Project: Joshua > Issue Type: Bug > Components: tuner >Affects Versions: 6.0.5 > Environment: Python 3.5 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 6.1 > > > {code} > [tune-bundle] rebuilding... > > dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force > --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir > /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > /usr/local/joshua_resources/russian_experiments/exp3/tune/model > --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" > -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 > tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function > "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm > /usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue > took 0 seconds (0s) > [mert-1] rebuilding... 
> > dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > [CHANGED] > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > [CHANGED] > dep=tune/model/grammar.packed/slice_0.source [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en > /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru > --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner > mert --decoder > /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command > --decoder-config > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config > --decoder-output-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest > --decoder-log-file > /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log > --iterations 10 --metric 'BLEU 4 closest' > JOB FAILED (return code 1) > File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391 > 'ITERATIONS': `iterations`, > ^ > SyntaxError: invalid syntax > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
Lewis John McGibbney created JOSHUA-317: --- Summary: SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391 Key: JOSHUA-317 URL: https://issues.apache.org/jira/browse/JOSHUA-317 Project: Joshua Issue Type: Bug Components: er Affects Versions: 6.0.5 Environment: Python 3.5 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Fix For: 6.1 {code} [tune-bundle] rebuilding... dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config [CHANGED] dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source [CHANGED] dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh [NOT FOUND] cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config /usr/local/joshua_resources/russian_experiments/exp3/tune/model --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file /usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm" -tm0/type hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm /usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm /usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue took 0 seconds (0s) [mert-1] rebuilding... 
dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en [CHANGED] dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config [CHANGED] dep=tune/model/grammar.packed/slice_0.source [CHANGED] dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final [NOT FOUND] cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner mert --decoder /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command --decoder-config /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config --decoder-output-file /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest --decoder-log-file /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log --iterations 10 --metric 'BLEU 4 closest' JOB FAILED (return code 1) File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391 'ITERATIONS': `iterations`, ^ SyntaxError: invalid syntax {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
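The caret in the JOSHUA-317 log points at a backtick expression. Backticks were Python 2 shorthand for `repr()` and were removed in Python 3, which is consistent with the Python 3.5 environment listed on the issue. A minimal sketch of the portable spelling follows; `template_values` and its dictionary are illustrative only and are not the actual contents of run_tuner.py.

```python
def template_values(iterations):
    """Return substitution values using repr(), valid on Python 2 and 3.

    In Python 2, writing iterations between backticks meant
    repr(iterations); Python 3 rejects that form with a SyntaxError.
    """
    return {
        'ITERATIONS': repr(iterations),
    }


if __name__ == "__main__":
    print(template_values(10))
```

Whether `repr()` or `str()` is the better replacement depends on how the value is substituted into the tuner's config template, which the log does not show.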
[jira] [Closed] (JOSHUA-259) Integration tests are failing
[ https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed JOSHUA-259. --- Resolution: Not A Problem > Integration tests are failing > - > > Key: JOSHUA-259 > URL: https://issues.apache.org/jira/browse/JOSHUA-259 > Project: Joshua > Issue Type: Bug >Reporter: Kellen Sunderland > Fix For: 6.1 > > > Several integration tests are currently failing with Joshua. I have a quick > fix coming for one of the tests but just in case we need more discussion > around the failures I'll open a bug. > The currently failing tests for me: > test/decoder/too-long > test/server/http > test/server/tcp-text > test/thrax/extraction > and > test/decoder/moses-compat (but this is easy to fix, simple extra space in the > expected file) > These are failing under OS X 10.11. If working under other environments feel > free to post a 'works for me'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (JOSHUA-259) Integration tests are failing
[ https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reopened JOSHUA-259: - > Integration tests are failing > - > > Key: JOSHUA-259 > URL: https://issues.apache.org/jira/browse/JOSHUA-259 > Project: Joshua > Issue Type: Bug >Reporter: Kellen Sunderland > Fix For: 6.1 > > > Several integration tests are currently failing with Joshua. I have a quick > fix coming for one of the tests but just in case we need more discussion > around the failures I'll open a bug. > The currently failing tests for me: > test/decoder/too-long > test/server/http > test/server/tcp-text > test/thrax/extraction > and > test/decoder/moses-compat (but this is easy to fix, simple extra space in the > expected file) > These are failing under OS X 10.11. If working under other environments feel > free to post a 'works for me'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-259) Integration tests are failing
[ https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-259: Fix Version/s: (was: 6.2) 6.1 > Integration tests are failing > - > > Key: JOSHUA-259 > URL: https://issues.apache.org/jira/browse/JOSHUA-259 > Project: Joshua > Issue Type: Bug >Reporter: Kellen Sunderland > Fix For: 6.1 > > > Several integration tests are currently failing with Joshua. I have a quick > fix coming for one of the tests but just in case we need more discussion > around the failures I'll open a bug. > The currently failing tests for me: > test/decoder/too-long > test/server/http > test/server/tcp-text > test/thrax/extraction > and > test/decoder/moses-compat (but this is easy to fix, simple extra space in the > expected file) > These are failing under OS X 10.11. If working under other environments feel > free to post a 'works for me'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (JOSHUA-71) OS X installation depends on coreutils to run thrax test
[ https://issues.apache.org/jira/browse/JOSHUA-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reopened JOSHUA-71: > OS X installation depends on coreutils to run thrax test > > > Key: JOSHUA-71 > URL: https://issues.apache.org/jira/browse/JOSHUA-71 > Project: Joshua > Issue Type: Bug >Reporter: Luke Orland > Fix For: 6.1 > > > the {{gstat}} command from coreutils is not installed in Darwin by default. > One must resolve that dependency via Homebrew, Macports, etc. > The {{test/thrax/test.sh}} test will fail on an OS X system that does not > have coreutils installed. We should either change the test so that it does > not require coreutils in Darwin or make it clear in the (developer) > installation/setup instructions that coreutils are required for this test, > check for coreutils when running the thrax test, and output a helpful message > instructing the developer to go install coreutils if {{gstat}} is not found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (JOSHUA-95) Vocabulary locking
[ https://issues.apache.org/jira/browse/JOSHUA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reopened JOSHUA-95: > Vocabulary locking > -- > > Key: JOSHUA-95 > URL: https://issues.apache.org/jira/browse/JOSHUA-95 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Juri Ganitkevitch > Fix For: 6.1 > > > Vocabulary::id() is still synchronized and a potential point of contention. > It would be nice to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (JOSHUA-100) Add Shen et al. (2008) dependency LM
[ https://issues.apache.org/jira/browse/JOSHUA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reopened JOSHUA-100: - > Add Shen et al. (2008) dependency LM > > > Key: JOSHUA-100 > URL: https://issues.apache.org/jira/browse/JOSHUA-100 > Project: Joshua > Issue Type: New Feature >Reporter: Matt Post >Assignee: Matt Post > Fix For: 6.1 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (JOSHUA-107) Verbosity levels
[ https://issues.apache.org/jira/browse/JOSHUA-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed JOSHUA-107. --- > Verbosity levels > > > Key: JOSHUA-107 > URL: https://issues.apache.org/jira/browse/JOSHUA-107 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Matt Post > Fix For: 6.1 > > > Joshua should support verbosity levels with a command-line switch, so it's > easy to shut it up with something like {{-v 0}} or {{-q}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (JOSHUA-100) Add Shen et al. (2008) dependency LM
[ https://issues.apache.org/jira/browse/JOSHUA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed JOSHUA-100. --- Resolution: Fixed > Add Shen et al. (2008) dependency LM > > > Key: JOSHUA-100 > URL: https://issues.apache.org/jira/browse/JOSHUA-100 > Project: Joshua > Issue Type: New Feature >Reporter: Matt Post >Assignee: Matt Post > Fix For: 6.1 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-100) Add Shen et al. (2008) dependency LM
[ https://issues.apache.org/jira/browse/JOSHUA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-100: Fix Version/s: (was: 6.2) 6.1 > Add Shen et al. (2008) dependency LM > > > Key: JOSHUA-100 > URL: https://issues.apache.org/jira/browse/JOSHUA-100 > Project: Joshua > Issue Type: New Feature >Reporter: Matt Post >Assignee: Matt Post > Fix For: 6.1 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-95) Vocabulary locking
[ https://issues.apache.org/jira/browse/JOSHUA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-95: --- Fix Version/s: (was: 6.2) 6.1 > Vocabulary locking > -- > > Key: JOSHUA-95 > URL: https://issues.apache.org/jira/browse/JOSHUA-95 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Juri Ganitkevitch > Fix For: 6.1 > > > Vocabulary::id() is still synchronized and a potential point of contention. > It would be nice to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (JOSHUA-22) Parallelize MBR computation
[ https://issues.apache.org/jira/browse/JOSHUA-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reopened JOSHUA-22: > Parallelize MBR computation > --- > > Key: JOSHUA-22 > URL: https://issues.apache.org/jira/browse/JOSHUA-22 > Project: Joshua > Issue Type: Bug >Reporter: Joshua Decoder > Fix For: 6.1 > > > MBR should be multithreaded. This would be easy to add following the model > used in the InputManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-22) Parallelize MBR computation
[ https://issues.apache.org/jira/browse/JOSHUA-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-22: --- Fix Version/s: (was: 6.2) 6.1 > Parallelize MBR computation > --- > > Key: JOSHUA-22 > URL: https://issues.apache.org/jira/browse/JOSHUA-22 > Project: Joshua > Issue Type: Bug >Reporter: Joshua Decoder > Fix For: 6.1 > > > MBR should be multithreaded. This would be easy to add following the model > used in the InputManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (JOSHUA-95) Vocabulary locking
[ https://issues.apache.org/jira/browse/JOSHUA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed JOSHUA-95. -- Resolution: Fixed > Vocabulary locking > -- > > Key: JOSHUA-95 > URL: https://issues.apache.org/jira/browse/JOSHUA-95 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Juri Ganitkevitch > Fix For: 6.1 > > > Vocabulary::id() is still synchronized and a potential point of contention. > It would be nice to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'
[ https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-316: Fix Version/s: (was: 6.2) 6.1 > run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a > bytes-like object is required, not 'str' > - > > Key: JOSHUA-316 > URL: https://issues.apache.org/jira/browse/JOSHUA-316 > Project: Joshua > Issue Type: Bug > Components: bundler >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Priority: Critical > Fix For: 6.1 > > > {code} > [glue-tune] rebuilding... > > dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/create_glue_grammar.sh > /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed > > /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > took 1 seconds (1s) > [tune-bundle] rebuilding... 
> > dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/tune/model/run-joshua.sh > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force > --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir > /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > /usr/local/joshua_resources/russian_experiments/exp2/tune/model > --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" > -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 > tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function > "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm > /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed --tm > /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > JOB FAILED (return code 1) > * Running the copy-config.pl script with the command: > /usr/local/incubator-joshua/scripts/copy-config.pl -top-n 300 -output-format > "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 > tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " > -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue > Traceback (most recent call last): > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 748, in main > operations = collect_operations(opts) > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 637, in collect_operations > opts.copy_config_options > File 
"/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 202, in filter_through_copy_config_script > result, err = p.communicate(config_text) > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1072, > in communicate > stdout, stderr = self._communicate(input, endtime, timeout) > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1700, > in _communicate > input_view = memoryview(self._input) > TypeError: memoryview: a bytes-like object is required, not 'str' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 760, in > main(sys.argv) > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 751, in main > error_quit(e.message) > AttributeError: 'TypeError' object has no attribute 'message' > * WARNING: no key 'outputformat' found in config file (appending to end) > * WARNING: no key 'search' found in config file (appending to end) > * WARNING: no key 'topn' found in config file (appending to end) > * WARNING: no key 'markoovs' found in config file (appending to end) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'
[ https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-316: Summary: run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str' (was: run_bundler.py returning JOB FAILED (return code 1)) > run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a > bytes-like object is required, not 'str' > - > > Key: JOSHUA-316 > URL: https://issues.apache.org/jira/browse/JOSHUA-316 > Project: Joshua > Issue Type: Bug > Components: bundler >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Priority: Critical > Fix For: 6.2 > > > {code} > [glue-tune] rebuilding... > > dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/create_glue_grammar.sh > /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed > > /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > took 1 seconds (1s) > [tune-bundle] rebuilding... 
> > dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source > [CHANGED] > > dep=/usr/local/joshua_resources/russian_experiments/exp2/tune/model/run-joshua.sh > [NOT FOUND] > cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force > --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir > /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config > /usr/local/joshua_resources/russian_experiments/exp2/tune/model > --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" > -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 > tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function > "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm > /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed --tm > /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue > JOB FAILED (return code 1) > * Running the copy-config.pl script with the command: > /usr/local/incubator-joshua/scripts/copy-config.pl -top-n 300 -output-format > "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 > tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " > -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file > /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type > hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue > Traceback (most recent call last): > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 748, in main > operations = collect_operations(opts) > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 637, in collect_operations > opts.copy_config_options > File 
"/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 202, in filter_through_copy_config_script > result, err = p.communicate(config_text) > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1072, > in communicate > stdout, stderr = self._communicate(input, endtime, timeout) > File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1700, > in _communicate > input_view = memoryview(self._input) > TypeError: memoryview: a bytes-like object is required, not 'str' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 760, in > main(sys.argv) > File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line > 751, in main > error_quit(e.message) > AttributeError: 'TypeError' object has no attribute 'message' > * WARNING: no key 'outputformat' found in config file (appending to end) > * WARNING: no key 'search' found in config file (appending to end) > * WARNING: no key 'topn' found in config file (appending to end) > * WARNING: no key 'markoovs' found in config file (appending to end) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1)
Lewis John McGibbney created JOSHUA-316: --- Summary: run_bundler.py returning JOB FAILED (return code 1) Key: JOSHUA-316 URL: https://issues.apache.org/jira/browse/JOSHUA-316 Project: Joshua Issue Type: Bug Components: bundler Affects Versions: 6.0.5 Reporter: Lewis John McGibbney Priority: Critical Fix For: 6.2 {code} [glue-tune] rebuilding... dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source [CHANGED] dep=/usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue [NOT FOUND] cmd=/usr/local/incubator-joshua/scripts/support/create_glue_grammar.sh /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed > /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue took 1 seconds (1s) [tune-bundle] rebuilding... dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config [CHANGED] dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source [CHANGED] dep=/usr/local/joshua_resources/russian_experiments/exp2/tune/model/run-joshua.sh [NOT FOUND] cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config /usr/local/joshua_resources/russian_experiments/exp2/tune/model --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed --tm /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue JOB FAILED (return code 1) * Running the copy-config.pl script with the command: 
/usr/local/incubator-joshua/scripts/copy-config.pl -top-n 300 -output-format "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue Traceback (most recent call last): File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 748, in main operations = collect_operations(opts) File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 637, in collect_operations opts.copy_config_options File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 202, in filter_through_copy_config_script result, err = p.communicate(config_text) File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1072, in communicate stdout, stderr = self._communicate(input, endtime, timeout) File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1700, in _communicate input_view = memoryview(self._input) TypeError: memoryview: a bytes-like object is required, not 'str' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 760, in main(sys.argv) File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 751, in main error_quit(e.message) AttributeError: 'TypeError' object has no attribute 'message' * WARNING: no key 'outputformat' found in config file (appending to end) * WARNING: no key 'search' found in config file (appending to end) * WARNING: no key 'topn' found in config file (appending to end) * WARNING: no key 'markoovs' found in config file (appending to end) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
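The two tracebacks above are consistent with Python 2 code running under the Python 3.5 interpreter shown in the paths: `Popen.communicate` is handed a `str` where Python 3 expects bytes, and the error handler then reads `e.message`, which Python 3 exceptions do not have. A minimal sketch of both fixes follows. The function names `filter_through` and `describe_error` are hypothetical and are not taken from run_bundler.py.

```python
import subprocess
import sys


def filter_through(cmd, config_text):
    """Pipe config_text (a str) through cmd and return its stdout as a str."""
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    # Under Python 3 these pipes are binary by default, so encode the input
    # and decode the output explicitly.
    out, err = p.communicate(config_text.encode("utf-8"))
    if p.returncode != 0:
        raise RuntimeError(err.decode("utf-8", errors="replace"))
    return out.decode("utf-8")


def describe_error(exc):
    """Return a printable message for an exception."""
    # Python 3 exceptions have no .message attribute; str(exc) works on both
    # Python 2 and Python 3.
    return str(exc)
```

Passing `universal_newlines=True` to `Popen` would be an alternative that keeps the pipes in text mode; the explicit encode/decode is used here only because it makes the encoding visible.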
[jira] [Commented] (JOSHUA-312) Even though alignment is cached, it is always re-done in pipeline re-execution
[ https://issues.apache.org/jira/browse/JOSHUA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586111#comment-15586111 ] Lewis John McGibbney commented on JOSHUA-312: - boom goes the dynamite :) Thanks [~post] > Even though alignment is cached, it is always re-done in pipeline re-execution > -- > > Key: JOSHUA-312 > URL: https://issues.apache.org/jira/browse/JOSHUA-312 > Project: Joshua > Issue Type: Improvement > Components: alignment >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Critical > Fix For: 6.1 > > > Say if a pipeline fails after alignment. The alignment result is never cached > and it becomes necessary to undertake alignment... again! > We should investigate the process for caching alignments as it would really > speed up rerunning end-to-end pipelines for large input datasets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (JOSHUA-312) Even though alignment is cached, it is always re-done in pipeline re-execution
[ https://issues.apache.org/jira/browse/JOSHUA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned JOSHUA-312: --- Assignee: Lewis John McGibbney > Even though alignment is cached, it is always re-done in pipeline re-execution > -- > > Key: JOSHUA-312 > URL: https://issues.apache.org/jira/browse/JOSHUA-312 > Project: Joshua > Issue Type: Improvement > Components: alignment >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Critical > Fix For: 6.2 > > > Say if a pipeline fails after alignment. The alignment result is never cached > and it becomes necessary to undertake alignment... again! > We should investigate the process for caching alignments as it would really > speed up rerunning end-to-end pipelines for large input datasets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-312) Even though alignment is cached, it is always re-done in pipeline re-execution
[ https://issues.apache.org/jira/browse/JOSHUA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573593#comment-15573593 ] Lewis John McGibbney commented on JOSHUA-312: - Okey doke... I managed to reproduce this today. One of my pipelines just failed because I screwed up my paths; however, the failure came after alignment with the Berkeley aligner. When I went to re-run the pipeline as follows, alignment was not pulled from the cache... it was completely re-run {code} lmcgibbn@LMC-056430 /usr/local/joshua_resources/russian_experiments $ ls -al total 8 drwxr-xr-x 7 lmcgibbn wheel 238 Oct 13 16:48 . drwxr-xr-x 22 lmcgibbn wheel 748 Oct 13 12:09 .. drwxr-xr-x 29 lmcgibbn wheel 986 Oct 13 16:48 .cachepipe -rw-r--r-- 1 lmcgibbn wheel 47 Oct 13 12:24 README drwxr-xr-x 5 lmcgibbn wheel 170 Oct 13 16:48 alignments drwxr-xr-x 12 lmcgibbn wheel 408 Oct 13 12:23 data drwxr-xr-x 6 lmcgibbn wheel 204 Oct 13 12:24 scripts lmcgibbn@LMC-056430 /usr/local/joshua_resources/russian_experiments $ /usr/local/incubator-joshua/bin/pipeline.pl --rundir . --type hiero --corpus /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en --tune /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune --test /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test --source en --target ru --readme "Experiment 1 Run 1 of ru --> en model training" --aligner berkeley [train-copy-and-filter] cached, skipping... [train-tokenize-en] cached, skipping... [train-tokenize-ru] cached, skipping... [train-trim] cached, skipping... [train-lowercase-en] cached, skipping... [train-lowercase-ru] cached, skipping... [train-vocab-en] cached, skipping... [train-vocab-ru] cached, skipping... [tune-copy-and-filter] cached, skipping... [tune-tokenize-en] cached, skipping... [tune-tokenize-ru] cached, skipping... [tune-lowercase-en] cached, skipping... [tune-lowercase-ru] cached, skipping... [tune-vocab-en] cached, skipping... 
[tune-vocab-ru] cached, skipping... [test-copy-and-filter] cached, skipping... [test-tokenize-en] cached, skipping... [test-tokenize-ru] cached, skipping... [test-lowercase-en] cached, skipping... [test-lowercase-ru] cached, skipping... [test-vocab-en] cached, skipping... [test-vocab-ru] cached, skipping... [source-numlines] cached, skipping... [source-numlines] retrieved cached result => 817962 [berkeley-aligner-chunk-0] rebuilding... dep=alignments/0/word-align.conf dep=/usr/local/joshua_resources/russian_experiments/data/train/splits/corpus.en.0 [NOT FOUND] dep=/usr/local/joshua_resources/russian_experiments/data/train/splits/corpus.ru.0 [NOT FOUND] dep=alignments/0/training.align [NOT FOUND] cmd=java -d64 -Xmx10g -jar /usr/local/incubator-joshua/ext/berkeleyaligner/distribution/berkeleyaligner.jar ++alignments/0/word-align.conf {code} The aligner looks as follows {code} lmcgibbn@LMC-056430 /usr/local $ tail -f joshua_resources/russian_experiments/alignments/0/log main() { Execution directory: alignments/0 Preparing Training Data { ERROR: No files found at source /dev/null } [23s, cum. 23s] 817962 training sentences, 0 test sentences Training models: 2 stages { Training stage 1: MODEL1 and MODEL1 jointly for 5 iterations { Initializing forward model [1m16s, cum. 1m16s] Initializing reverse model [1m36s, cum. 2m53s] Joint Train: 817962 sentences, jointly { Iteration 1/5 { Sentence 1/817962 Sentence 2/817962 Sentence 3/817962 Sentence 11/817962 Sentence 40/817962 Sentence 146/817962 ... {code} It would therefore appear to me that YES, the pipeline is cached, however on re-runs, the cache is not consulted and therefore alignment is repeated. 
> Even though alignment is cached, it is always re-done in pipeline re-execution > -- > > Key: JOSHUA-312 > URL: https://issues.apache.org/jira/browse/JOSHUA-312 > Project: Joshua > Issue Type: Improvement > Components: alignment >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Priority: Critical > Fix For: 6.2 > > > Say if a pipeline fails after alignment. The alignment result is never cached > and it becomes necessary to undertake alignment... again! > We should investigate the process for caching alignments as it would really > speed up rerunning end-to-end pipelines for large input datasets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
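In the log above, the `berkeley-aligner-chunk-0` step lists the two corpus splits and `alignments/0/training.align` as `[NOT FOUND]`, which would explain why that step is rebuilt while the earlier steps are skipped. The sketch below shows one way a dependency-based cache can arrive at such a decision. It is an illustration under assumed rules (content hashes stored in a JSON file) and is not the actual CachePipe implementation; all function names are hypothetical.

```python
import hashlib
import json
import os


def file_signature(path):
    """Return a SHA-1 of the file contents, or None if the file is missing."""
    if not os.path.isfile(path):
        return None
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dependency_status(deps, cache_file):
    """Map each dependency path to 'OK', 'CHANGED' or 'NOT FOUND'."""
    recorded = {}
    if os.path.isfile(cache_file):
        with open(cache_file) as handle:
            recorded = json.load(handle)
    status = {}
    for dep in deps:
        current = file_signature(dep)
        if current is None:
            status[dep] = "NOT FOUND"
        elif recorded.get(dep) != current:
            status[dep] = "CHANGED"
        else:
            status[dep] = "OK"
    return status


def needs_rebuild(deps, cache_file):
    """A step is skipped only when every dependency is present and unchanged."""
    return any(state != "OK" for state in dependency_status(deps, cache_file).values())


def record(deps, cache_file):
    """Store the current signatures after a step has completed successfully."""
    with open(cache_file, "w") as handle:
        json.dump({dep: file_signature(dep) for dep in deps}, handle)
```

Under these assumed rules, a step whose inputs or outputs have been removed from disk is always rebuilt, even if a signature for it was recorded on an earlier run.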
[jira] [Updated] (JOSHUA-312) Even though alignment is cached, it is always re-done in pipeline re-execution
[ https://issues.apache.org/jira/browse/JOSHUA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-312: Summary: Even though alignment is cached, it is always re-done in pipeline re-execution (was: Alignment is never cached) > Even though alignment is cached, it is always re-done in pipeline re-execution > -- > > Key: JOSHUA-312 > URL: https://issues.apache.org/jira/browse/JOSHUA-312 > Project: Joshua > Issue Type: Improvement > Components: alignment >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Priority: Critical > Fix For: 6.2 > > > Say if a pipeline fails after alignment. The alignment result is never cached > and it becomes necessary to undertake alignment... again! > We should investigate the process for caching alignments as it would really > speed up rerunning end-to-end pipelines for large input datasets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-312) Alignment is never cached
Lewis John McGibbney created JOSHUA-312: --- Summary: Alignment is never cached Key: JOSHUA-312 URL: https://issues.apache.org/jira/browse/JOSHUA-312 Project: Joshua Issue Type: Improvement Components: alignment Affects Versions: 6.0.5 Reporter: Lewis John McGibbney Priority: Critical Fix For: 6.2 Say if a pipeline fails after alignment. The alignment result is never cached and it becomes necessary to undertake alignment... again! We should investigate the process for caching alignments as it would really speed up rerunning end-to-end pipelines for large input datasets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-299) Move regression tests to proper unit tests
[ https://issues.apache.org/jira/browse/JOSHUA-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477498#comment-15477498 ] lewis john mcgibbney commented on JOSHUA-299: - Running mvn clean test is the way to go -- http://home.apache.org/~lewismc/ @hectorMcSpector http://www.linkedin.com/in/lmcgibbney > Move regression tests to proper unit tests > -- > > Key: JOSHUA-299 > URL: https://issues.apache.org/jira/browse/JOSHUA-299 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Lewis John McGibbney > Fix For: 6.1 > > > Many of the regression tests (test*.sh under src/test/resources) have been > moved to proper unit tests, but this move should be completed, and the > regression tests should be deleted. This should be done for 6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-299) Move regression tests to proper unit tests
[ https://issues.apache.org/jira/browse/JOSHUA-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471850#comment-15471850 ] lewis john mcgibbney commented on JOSHUA-299: - Nope, I did not, sorry. Please proceed! -- http://home.apache.org/~lewismc/ @hectorMcSpector http://www.linkedin.com/in/lmcgibbney > Move regression tests to proper unit tests > -- > > Key: JOSHUA-299 > URL: https://issues.apache.org/jira/browse/JOSHUA-299 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Lewis John McGibbney > Fix For: 6.1 > > > Many of the regression tests (test*.sh under src/test/resources) have been > moved to proper unit tests, but this move should be completed, and the > regression tests should be deleted. This should be done for 6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner
[ https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446876#comment-15446876 ] Lewis John McGibbney commented on JOSHUA-304: - [~post] np at all. No need for sorry. I just tested after a clean download of the third-party deps, and this works a charm. Thanks for looking into it, I really appreciate it. I am +1 for merging into master and resolving this as fixed [~post] > word-align.conf alignment template file not compatible with berkeley aligner > > > Key: JOSHUA-304 > URL: https://issues.apache.org/jira/browse/JOSHUA-304 > Project: Joshua > Issue Type: Bug > Components: alignment, berkeley, templates >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > It took me quite some time to debug what was going on and why pipelines > were failing when using the berkeley aligner. > It turns out that the word-align.conf template provided at > https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf > is not compatible with the berkeley aligner. 
> In particular the following lines are non compatible > https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf#L12-L15 > Evidence of this is provided below > {code} > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Invalid enum: 'MODEL1, HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Invalid enum: 'JOINT JOINT'; valid choices: FORWARD|REVERSE|BOTH_INDEP|JOINT > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Exception in thread "main" java.lang.NumberFormatException: For input string: > "5 5" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:580) > at 
java.lang.Integer.parseInt(Integer.java:615) > at > edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:143) > at > edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:240) > at edu.berkeley.nlp.fig.basic.OptInfo.set(OptionsParser.java:294) > at > edu.berkeley.nlp.fig.basic.OptionsParser.readOptionsFile(OptionsParser.java:555) > at > edu.berkeley.nlp.fig.basic.OptionsParser.doParse(OptionsParser.java:604) > at edu.berkeley.nlp.fig.exec.Execution.init(Execution.java:293) > at edu.berkeley.nlp.wordAlignment.Main.main(Main.java:149) > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Cannot create directory: alignments/0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
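The errors quoted above all have the same shape: an option that accepts a single value is given two ('MODEL1 HMM', 'JOINT JOINT', "5 5"). A pre-flight check along the following lines could surface that before the JVM is launched. The choice sets are copied from the error messages above; the function names are hypothetical, and which word-align.conf option each set belongs to is not shown in the output, so no option names are assumed here.

```python
# Valid choices as printed in the aligner's own error messages.
MODEL_CHOICES = {"MODEL1", "MODEL2", "HMM", "SYNTACTIC", "NONE"}
MODE_CHOICES = {"FORWARD", "REVERSE", "BOTH_INDEP", "JOINT"}


def check_enum(value, choices):
    """Return None if value is exactly one valid choice, else an error message."""
    if value in choices:
        return None
    return "Invalid enum: '%s'; valid choices: %s" % (value, "|".join(sorted(choices)))


def check_int(value):
    """Return None if value parses as a single integer, else an error message."""
    try:
        int(value)
    except ValueError:
        return "Not an integer: '%s'" % value
    return None
```

For example, `check_enum("MODEL1 HMM", MODEL_CHOICES)` returns an error message while `check_enum("MODEL1", MODEL_CHOICES)` returns `None`, and `check_int("5 5")` is rejected in the same way the aligner's `Integer.parseInt` call rejects it.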
[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner
[ https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446643#comment-15446643 ] Lewis John McGibbney commented on JOSHUA-304: - Hi [~post] What new steps did you actually add? I've wiped everything that was generated by Joshua. I've rebuilt JOSHUA-304 branch. I'm getting the following {code} $JOSHUA/bin/pipeline.pl --type hiero --rundir /usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0 --readme "Baseline Hiero run 0 --lm-gen berkeleylm --lm berkeleylm --aligner berkeley JOSHUA-304" --source es --target en --lm-gen berkeleylm --lm berkeleylm --aligner berkeley --corpus $SPANISH/corpus/asr/callhome_train --corpus $SPANISH/corpus/asr/fisher_train --tune $SPANISH/corpus/asr/fisher_dev --test $SPANISH/corpus/asr/callhome_devtest ... snip ... [test-vocab-es] rebuilding... dep=/usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/test/corpus.es [CHANGED] dep=/usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/test/vocab.es [NOT FOUND] cmd=cat /usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/test/corpus.es | /usr/local/incubator-joshua/scripts/training/build-vocab.pl > /usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/test/vocab.es took 0 seconds (0s) [test-vocab-en] rebuilding... dep=/usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/test/corpus.en [CHANGED] dep=/usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/test/vocab.en [NOT FOUND] cmd=cat /usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/test/corpus.en | /usr/local/incubator-joshua/scripts/training/build-vocab.pl > /usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/test/vocab.en took 0 seconds (0s) [source-numlines] rebuilding... 
dep=/usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/train/corpus.es [CHANGED] cmd=cat /usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/train/corpus.es | wc -l took 0 seconds (0s) [source-numlines] retrieved cached result => 151810 [berkeley-aligner-chunk-0] rebuilding... dep=alignments/0/word-align.conf [CHANGED] dep=/usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/train/splits/corpus.es.0 [NOT FOUND] dep=/usr/local/jpl/xdata/joshua_experiments/fisher_callhome_experiment/0/data/train/splits/corpus.en.0 [NOT FOUND] dep=alignments/0/training.align [NOT FOUND] cmd=java -d64 -Xmx10g -jar /usr/local/incubator-joshua/ext/berkeleyaligner/distribution/berkeleyaligner.jar ++alignments/0/word-align.conf JOB FAILED (return code 1) [aligner-combine] rebuilding... dep=alignments/0/training.en-es.align [NOT FOUND] dep=alignments/training.align [NOT FOUND] cmd=cat alignments/0/training.en-es.align > alignments/training.align JOB FAILED (return code 1) cat: alignments/0/training.en-es.align: No such file or directory {code} > word-align.conf alignment template file not compatible with berkeley aligner > > > Key: JOSHUA-304 > URL: https://issues.apache.org/jira/browse/JOSHUA-304 > Project: Joshua > Issue Type: Bug > Components: alignment, berkeley, templates >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > It takes me quite some time to debug what was going on and why pipeline's > were failing when using the berkeley aligner. > It turns out that the word-align.conf template provided at > https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf > is not compatible with the berkeley aligner. 
> In particular the following lines are non compatible > https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf#L12-L15 > Evidence of this is provided below > {code} > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Invalid enum: 'MODEL1, HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar >
[jira] [Commented] (JOSHUA-297) List supported versions of Hadoop
[ https://issues.apache.org/jira/browse/JOSHUA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436177#comment-15436177 ] Lewis John McGibbney commented on JOSHUA-297: - The supported version is 2.5.2 https://github.com/joshua-decoder/thrax/blob/master/.classpath#L8 > List supported versions of Hadoop > - > > Key: JOSHUA-297 > URL: https://issues.apache.org/jira/browse/JOSHUA-297 > Project: Joshua > Issue Type: Task >Reporter: Bob Paulin >Assignee: Matt Post >Priority: Minor > Fix For: 6.1 > > Attachments: thrax-hadoop0.20.2.log, thrax-hadoop2.6.4.log > > > When working through the training tutorial I noticed that no version of > Hadoop was listed so I tried the latest Hadoop 2.6.4. The Thrax Job failed > on this version. It worked however with 0.20.2 . I found this on > http://joshua.incubator.apache.org/6.0/pipeline.html by hovering over a link > on the Hadoop section. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (JOSHUA-305) joshua-6.1-SNAPSHOT-source-release.zip takes ages to build
[ https://issues.apache.org/jira/browse/JOSHUA-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved JOSHUA-305. - Resolution: Not A Bug This was due to a large language model being present within the joshua directory. This is not an issue. > joshua-6.1-SNAPSHOT-source-release.zip takes ages to build > -- > > Key: JOSHUA-305 > URL: https://issues.apache.org/jira/browse/JOSHUA-305 > Project: Joshua > Issue Type: Bug > Components: build, core >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > When someone runs mvn clean install, the > joshua-6.1-SNAPSHOT-source-release.zip step takes absolutely ages to build. > We should investigate why this is the case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
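Since the slow build was traced to a large language model sitting inside the joshua directory, a quick scan for oversized files before running mvn clean install can catch the same situation. The helper below is a hypothetical sketch, not part of the Joshua build; the 100 MB default threshold is an arbitrary assumption.

```python
import os


def largest_files(root, min_bytes=100 * 1024 * 1024):
    """Return (size, path) pairs under root at or above min_bytes, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Skip symlinks so linked models are not counted.
                continue
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
            if size >= min_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)
```

Calling `largest_files(".")` from the checkout root lists candidates to move out of the tree before packaging.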
[jira] [Created] (JOSHUA-305) joshua-6.1-SNAPSHOT-source-release.zip takes ages to build
Lewis John McGibbney created JOSHUA-305: --- Summary: joshua-6.1-SNAPSHOT-source-release.zip takes ages to build Key: JOSHUA-305 URL: https://issues.apache.org/jira/browse/JOSHUA-305 Project: Joshua Issue Type: Bug Components: build, core Affects Versions: 6.0.5 Reporter: Lewis John McGibbney Priority: Blocker Fix For: 6.1 When someone runs mvn clean install, the joshua-6.1-SNAPSHOT-source-release.zip step takes absolutely ages to build. We should investigate why this is the case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner
[ https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435615#comment-15435615 ]

Lewis John McGibbney commented on JOSHUA-304:
-

ACK will do.

> word-align.conf alignment template file not compatible with berkeley aligner
>
> Key: JOSHUA-304
> URL: https://issues.apache.org/jira/browse/JOSHUA-304
> Project: Joshua
> Issue Type: Bug
> Components: alignment, berkeley, templates
> Affects Versions: 6.0.5
> Reporter: Lewis John McGibbney
> Priority: Blocker
> Fix For: 6.1
>
> It took me quite some time to debug what was going on and why pipelines
> were failing when using the berkeley aligner.
> It turns out that the word-align.conf template provided at
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf
> is not compatible with the berkeley aligner.
> In particular, the following lines are incompatible:
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf#L12-L15
> Evidence of this is provided below:
> {code}
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1, HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'JOINT JOINT'; valid choices: FORWARD|REVERSE|BOTH_INDEP|JOINT
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Exception in thread "main" java.lang.NumberFormatException: For input string: "5 5"
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:580)
> at java.lang.Integer.parseInt(Integer.java:615)
> at edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:143)
> at edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:240)
> at edu.berkeley.nlp.fig.basic.OptInfo.set(OptionsParser.java:294)
> at edu.berkeley.nlp.fig.basic.OptionsParser.readOptionsFile(OptionsParser.java:555)
> at edu.berkeley.nlp.fig.basic.OptionsParser.doParse(OptionsParser.java:604)
> at edu.berkeley.nlp.fig.exec.Execution.init(Execution.java:293)
> at edu.berkeley.nlp.wordAlignment.Main.main(Main.java:149)
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Cannot create directory: alignments/0
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
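The failures quoted above all come from options that carry two values where the installed jar accepts one. As a rough illustration only, the check could be sketched in Python as below. The allowed values are copied from the error messages in the {code} block above; the function name check_word_align_conf and the parsing rules (whitespace-separated key and value, "#" starts a comment) are assumptions made for this sketch and are not part of berkeleyaligner or Joshua.

```python
# Sketch only: allowed values are copied from the aligner's error messages;
# the names and parsing rules here are assumptions, not berkeleyaligner code.
ENUM_CHOICES = {
    "forwardModels": {"MODEL1", "MODEL2", "HMM", "SYNTACTIC", "NONE"},
    "reverseModels": {"MODEL1", "MODEL2", "HMM", "SYNTACTIC", "NONE"},
    "mode": {"FORWARD", "REVERSE", "BOTH_INDEP", "JOINT"},
}
INT_OPTIONS = {"iters"}


def check_word_align_conf(text):
    """Return (line_number, message) pairs for values the jar would reject."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        # Drop comments, then skip lines that are empty afterwards.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key in ENUM_CHOICES and value not in ENUM_CHOICES[key]:
            problems.append((number, f"{key}: invalid enum {value!r}"))
        elif key in INT_OPTIONS:
            try:
                int(value)
            except ValueError:
                problems.append((number, f"{key}: not an integer {value!r}"))
    return problems
```

Run against the four template lines referenced in the issue, a check like this would flag each of them, while single-valued settings such as "forwardModels HMM" pass. Other builds of the aligner may accept multi-valued options, so the single-value rule applies only to the jar described in this report.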
[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner
[ https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435133#comment-15435133 ] Lewis John McGibbney commented on JOSHUA-304: - It may help for me to post the options available within the current berkeley aligner jar which was built when I installed Joshua {code} lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ java -jar ./lib/berkeleyaligner.jar -help Usage: log.maxIndLevel< int> : Maximum indent level. [10] log.msPerLine < int> : Maximum number of milliseconds between consecutive lines of output. [1000] log.file < str> : File to write log. [] log.stdout < bool> : Whether to output to the console. [true] log.note < str> : Dummy placeholder for a comment [] log.forcePrint < bool> : Force printing from logs* [false] log.maxPrintErrors < int> : Maximum number of errors (via error()) to print [1] EMWordAligner.nullProb < dbl> : How to assign null-word probabilities (=1 means 1/n) [1.0E-6] EMWordAligner.usePosteriorDecoding < bool> : Use posterior decoding (recommended for best performance). [true] EMWordAligner.posteriorDecodingThreshold < dbl> : Threshold in [0,1] for deciding whether an alignment should exist. [0.5] EMWordAligner.mergeConsiderNull < bool> : When merging expected sufficient statistics, take into account the NULL (fix). [false] EMWordAligner.handleUnknownWords < bool> : Don't crash with unknown words (better to train on test set). [false] EMWordAligner.priorFraction< dbl> : Fraction of a count to add for links in dictionary prior (1 works well). [0.0] EMWordAligner.numThreads < int> : Number of concurrent threads to use during E-step (set to number of processors). [1] EMWordAligner.safeConcurrency < bool> : Safe concurrency (gets rid of concurrency warnings at the expense of speed) [false] EMWordAligner.evaluateDuringTraining < bool> : Whether to evaluate the model after each training iteration (slower, more memory). 
[false] TreeWalkModel.usePushProbabilities < bool> : Separate parameters for moving and pushing. [true] TreeWalkModel.conditionOnTag < bool> : Whether to condition distortion on the tag types. [true] TreeWalkModel.cacheTreePaths < bool> : Whether to cache paths through trees (uses lots of memory; faster). [false] Evaluator.searchForThreshold < bool> : Evaluate using line search [false] Evaluator.thresholdIntervals < int> : Sets the number of intervals for posterior threshold line search [20] Evaluator.saveAlignmentObjects < bool> : Save object files for proposed alignments (large files) [false] Main.trainSources < str*> : Directories or files containing training files. [example/train] Main.testSources < str*> : Directory or file containing testing files. [example/test] Main.sentences < int> : Maximum number of the training sentences to use [2147483647] Main.offsetTrainingSentences < int> : Skip this number of the first training sentences [0] Main.maxTestSentences < int> : Maximum number of the test sentences to use [2147483647] Main.offsetTestSentences < int> : Skip this number of the first test sentences [0] Main.foreignSuffix < str> : Foreign language file suffix [f] Main.englishSuffix < str> : English language file suffix [e] Main.itgTrainTestSplitPoint< int> : When writing test (ITG) posteriors, where to divide train/test data? [0] Main.itgInputDir < str> : What directory should we dump ITG test data to? [] Main.reverseAlignments < bool> : Reverse test set alignments (i.e., foreign to english) [false] Main.oneIndexed< bool> : Are alignments one-indexed (default == no, 0-indexed) [false] Main.lowercaseWords< bool> : Convert all words to lowercase [false] Main.leaveTrainingOnDisk < bool> : Don't load and store the training set upfront (slower, but less memory) [false] Main.saveRejects < bool> : Save rejected sentence pairs [false] Main.forwardModels: Which word alignment model to use in the forward direction. 
[MODEL1 HMM] Main.reverseModels : Which word alignment model to use in the backward direction. [MODEL1 HMM] Main.iters < int*> : Number of iterations to run the model. [5 5] Main.mode : Whether to train the two models jointly or independently. [JOINT JOINT] Main.trainingCacheMaxSize < int> : Max sentence length for caching the HMM trellis (efficiency only). [100] Main.loadParamsDir < str> : Directory to load parameters from. [] Main.loadLexicalModelOnly < bool> : When true, the
[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner
[ https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434164#comment-15434164 ]

Lewis John McGibbney commented on JOSHUA-304:
-

It should be noted that, to get past the exceptions thrown above, I ended up with a template that looks like the following:
{code}
## word-align.conf
## --
## This is an example training script for the Berkeley
## word aligner. In this configuration it uses two HMM
## alignment models trained jointly and then decoded
## using the competitive thresholding heuristic.

##
# Training: Defines the training regimen
##

forwardModels HMM
reverseModels HMM
mode JOINT
iters 5

###
# Execution: Controls output and program flow
###

execDir alignments/0
create
saveParams false
numThreads 1
msPerLine 1
alignTraining

#
# Language/Data
#

foreignSuffix es.0
englishSuffix en.0

# Choose the training sources, which can either be directories or files that list files/directories
trainSources /usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/data/train/splits/corpus
sentences MAX
testSources /dev/null
overwriteExecDir true

#
# 1-best output
#

competitiveThresholding
{code}

> word-align.conf alignment template file not compatible with berkeley aligner
>
> Key: JOSHUA-304
> URL: https://issues.apache.org/jira/browse/JOSHUA-304
> Project: Joshua
> Issue Type: Bug
> Components: alignment, berkeley, templates
> Affects Versions: 6.0.5
> Reporter: Lewis John McGibbney
> Priority: Blocker
> Fix For: 6.1
>
> It took me quite some time to debug what was going on and why pipelines
> were failing when using the berkeley aligner.
> It turns out that the word-align.conf template provided at
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf
> is not compatible with the berkeley aligner.
> In particular the following lines are non compatible > https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf#L12-L15 > Evidence of this is provided below > {code} > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Invalid enum: 'MODEL1, HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Invalid enum: 'JOINT JOINT'; valid choices: FORWARD|REVERSE|BOTH_INDEP|JOINT > lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 > -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar > ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf > Exception in thread "main" java.lang.NumberFormatException: For input string: > "5 5" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:580) > at 
java.lang.Integer.parseInt(Integer.java:615) > at > edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:143) > at > edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:240) > at edu.berkeley.nlp.fig.basic.OptInfo.set(OptionsParser.java:294) > at > edu.berkeley.nlp.fig.basic.OptionsParser.readOptionsFile(OptionsParser.java:555) > at > edu.berkeley.nlp.fig.basic.OptionsParser.doParse(OptionsParser.java:604) > at
[jira] [Commented] (JOSHUA-299) Move regression tests to proper unit tests
[ https://issues.apache.org/jira/browse/JOSHUA-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432062#comment-15432062 ] Lewis John McGibbney commented on JOSHUA-299: - I'll scope this issue tomorrow [~post] and see if I can get a PR together. > Move regression tests to proper unit tests > -- > > Key: JOSHUA-299 > URL: https://issues.apache.org/jira/browse/JOSHUA-299 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Lewis John McGibbney > Fix For: 6.1 > > > Many of the regression tests (test*.sh under src/test/resources) have been > moved to proper unit tests, but this move should be completed, and the > regression tests should be deleted. This should be done for 6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (JOSHUA-299) Move regression tests to proper unit tests
[ https://issues.apache.org/jira/browse/JOSHUA-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned JOSHUA-299: --- Assignee: Lewis John McGibbney > Move regression tests to proper unit tests > -- > > Key: JOSHUA-299 > URL: https://issues.apache.org/jira/browse/JOSHUA-299 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Lewis John McGibbney > Fix For: 6.1 > > > Many of the regression tests (test*.sh under src/test/resources) have been > moved to proper unit tests, but this move should be completed, and the > regression tests should be deleted. This should be done for 6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-287) KenLM.java catches UnsatisfiedLinkError when attempting to load libken.so (libken.dylib on OSX)
[ https://issues.apache.org/jira/browse/JOSHUA-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420151#comment-15420151 ] lewis john mcgibbney commented on JOSHUA-287: - Brilliant Kellen thank you > KenLM.java catches UnsatisfiedLinkError when attempting to load libken.so > (libken.dylib on OSX) > --- > > Key: JOSHUA-287 > URL: https://issues.apache.org/jira/browse/JOSHUA-287 > Project: Joshua > Issue Type: Bug > Components: core, kenlm >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Assignee: Kellen Sunderland > Fix For: 6.1 > > > As explained in > http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg01189.html > currently we have an issue, where, when checked out from master the following > RuntimeException is thrown. > {code} > --- > T E S T S > --- > Running TestSuite > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > tm_pt_0=-2.000 tm_glue_0=3.000 lm_0=-206.718 lm_0_oov=2.000 > OOVPenalty=-200.000 | -198.000 > ERROR - * FATAL: Can't find libken.so (libken.dylib on OS X) in $JOSHUA/lib > ERROR - *This probably means that the KenLM library didn't compile. > ERROR - *Make sure that BOOST_ROOT is set to the root of your boost > ERROR - *installation (it's not /opt/local/, the default), change to > ERROR - *$JOSHUA, and type 'ant kenlm'. If problems persist, see the > ERROR - *website (joshua-decoder.org). > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. 
> WARN - no grammars supplied! Supplying dummy glue grammar. > {code} > We need to fix this such that we can run static source code analysis via > sonar and have our results available on analysis.apache.org. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
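The FATAL message above says the decoder looks for libken.so (libken.dylib on OS X) in $JOSHUA/lib. A quick pre-flight check along those lines might look like the Python sketch below. The helper names are hypothetical, and the only facts taken from the log are the two file names and the lib directory; this is not how KenLM.java itself resolves the library.

```python
import os
import sys


def expected_kenlm_library(joshua_home, platform=sys.platform):
    """Path where the error message says the KenLM library should live."""
    # Per the log: libken.dylib on OS X, libken.so otherwise.
    name = "libken.dylib" if platform == "darwin" else "libken.so"
    return os.path.join(joshua_home, "lib", name)


def kenlm_library_present(joshua_home, platform=sys.platform):
    """True if the expected library file exists under joshua_home/lib."""
    return os.path.isfile(expected_kenlm_library(joshua_home, platform))
```

Calling kenlm_library_present(os.environ["JOSHUA"]) before running the test suite would give an early hint that the KenLM build step has not produced the library yet.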
[jira] [Commented] (JOSHUA-249) Joshua Logo
[ https://issues.apache.org/jira/browse/JOSHUA-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420149#comment-15420149 ]

Lewis John McGibbney commented on JOSHUA-249:
-

Cool, can you please resolve this issue?

> Joshua Logo
> ---
>
> Key: JOSHUA-249
> URL: https://issues.apache.org/jira/browse/JOSHUA-249
> Project: Joshua
> Issue Type: Task
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 6.1
>
> Attachments: apache_joshua_logo.png, apache_joshua_logo.xcf
>
> As we discussed on the mailing lists, this issue should gather all proposed
> Joshua logos so we can VOTE on one or more of them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-287) KenLM.java catches UnsatisfiedLinkError when attempting to load libken.so (libken.dylib on OSX)
[ https://issues.apache.org/jira/browse/JOSHUA-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-287: Issue Type: Bug (was: Improvement) > KenLM.java catches UnsatisfiedLinkError when attempting to load libken.so > (libken.dylib on OSX) > --- > > Key: JOSHUA-287 > URL: https://issues.apache.org/jira/browse/JOSHUA-287 > Project: Joshua > Issue Type: Bug > Components: core, kenlm >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney > Fix For: 6.1 > > > As explained in > http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg01189.html > currently we have an issue, where, when checked out from master the following > RuntimeException is thrown. > {code} > --- > T E S T S > --- > Running TestSuite > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > tm_pt_0=-2.000 tm_glue_0=3.000 lm_0=-206.718 lm_0_oov=2.000 > OOVPenalty=-200.000 | -198.000 > ERROR - * FATAL: Can't find libken.so (libken.dylib on OS X) in $JOSHUA/lib > ERROR - *This probably means that the KenLM library didn't compile. > ERROR - *Make sure that BOOST_ROOT is set to the root of your boost > ERROR - *installation (it's not /opt/local/, the default), change to > ERROR - *$JOSHUA, and type 'ant kenlm'. If problems persist, see the > ERROR - *website (joshua-decoder.org). > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. 
> {code} > We need to fix this such that we can run static source code analysis via > sonar and have our results available on analysis.apache.org. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-283) Implement fast_align as one of the available alignment options
Lewis John McGibbney created JOSHUA-283:
---

Summary: Implement fast_align as one of the available alignment options
Key: JOSHUA-283
URL: https://issues.apache.org/jira/browse/JOSHUA-283
Project: Joshua
Issue Type: Bug
Components: alignment, pipeline
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 6.1

For some time now, I've been having issues using GIZA++ for alignment whilst running a Joshua pipeline.
Whilst I was looking for an alternative, [~post] and [~kellen.sunderland] mentioned the berkeley aligner and fast_align respectively.
Because 1) the berkeley aligner has not been touched in ~9 years, and 2) no artifact currently exists on Maven Central, I am taking the advice and attempting to use fast_align.
This issue will augment the alignment code in Joshua to permit use of fast_align, which is ALv2.0 licensed.
https://github.com/clab/fast_align

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (JOSHUA-281) split2files.pl support script no longer exists hence pipeline fails
[ https://issues.apache.org/jira/browse/JOSHUA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed JOSHUA-281. --- Resolution: Invalid This is not a bug at all, my input parameters for the pipeline.pl invocation were incorrect. > split2files.pl support script no longer exists hence pipeline fails > --- > > Key: JOSHUA-281 > URL: https://issues.apache.org/jira/browse/JOSHUA-281 > Project: Joshua > Issue Type: Bug > Components: pipeline >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > When I attempt to run a pipeline, I get the following > {code} > lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ ../bin/pipeline.pl > --rundir . --type hiero --corpus > /usr/local/jpl/xdata/joshua_experiments/russian_model/commoncrawl.ru-en > --tune > /usr/local/jpl/xdata/joshua_experiments/russian_model/commoncrawl.ru-en.tune > --test > /usr/local/jpl/xdata/joshua_experiments/russian_model/commoncrawl.ru-en.test > --source en --target ru --rundir experiment_1/1 --readme "Russian model > generation experiment 1 run 1" --mbr > [train-copy-and-filter] rebuilding... 
> > dep=/usr/local/jpl/xdata/joshua_experiments/russian_model/commoncrawl.ru-en.en > [CHANGED] > > dep=/usr/local/jpl/xdata/joshua_experiments/russian_model/commoncrawl.ru-en.ru > [CHANGED] > dep=/usr/local/incubator-joshua/experiment_1/1/data/train/train.en [NOT > FOUND] > dep=/usr/local/incubator-joshua/experiment_1/1/data/train/train.ru [NOT > FOUND] > cmd=/usr/local/incubator-joshua/scripts/training/paste > /usr/local/jpl/xdata/joshua_experiments/russian_model/commoncrawl.ru-en.en > /usr/local/jpl/xdata/joshua_experiments/russian_model/commoncrawl.ru-en.ru | > /usr/local/incubator-joshua/scripts/training/filter-empty-lines.pl | > /usr/local/incubator-joshua/scripts/training/split2files.pl > /usr/local/incubator-joshua/experiment_1/1/data/train/train.en > /usr/local/incubator-joshua/experiment_1/1/data/train/train.ru > JOB FAILED (return code 127) > /bin/bash: /usr/local/incubator-joshua/scripts/training/split2files.pl: No > such file or directory > {code} > The following commit changed the name of the file > {code} > Repository: incubator-joshua > Updated Branches: > refs/heads/master 09fb6a2d3 -> f02bd279e > combined split2files implementations > Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo > Commit: > http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/f02bd279 > Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/f02bd279 > Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/f02bd279 > Branch: refs/heads/master > Commit: f02bd279e892408c9eca2a2a241f21f59cb105e9 > Parents: 09fb6a2 > Author: Matt Post> Authored: Wed May 18 09:12:07 2016 -0400 > Committer: Matt Post > Committed: Wed May 18 09:12:07 2016 -0400 > -- > scripts/support/split2files | 44 +++ > scripts/support/splittabs.pl | 42 - > scripts/training/pipeline.pl | 8 ++--- > scripts/training/split2files.pl | 38 --- > scripts/training/trim_parallel_corpus.pl | 2 +- > 5 files changed, 49 insertions(+), 85 deletions(-) > -- > {code} > I'll 
submit a PR to do the simple string replace... which is hopefully all > that is wrong here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-280) Existing Language packs not compatible with Joshua master
[ https://issues.apache.org/jira/browse/JOSHUA-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359696#comment-15359696 ]

Lewis John McGibbney commented on JOSHUA-280:
-

The existing Chinese language pack works just fine:
{code}
lmcgibbn@LMC-032857 /usr/local/Cellar/joshua/HEAD/libexec/zh-en-hiero-pack-2016-01(NUTCH-2089) $ ./run-joshua-server.sh
Parameters read from configuration file:
tm = 'thrax -path grammar.packed -maxspan 20 -owner pt'
tm = 'thrax -path grammar.glue -maxspan -1 -owner glue'
defaultnonterminal = 'X'
goalsymbol = 'GOAL'
featurefunction = 'LanguageModel -lm_order 5 -lm_type berkeleylm -lm_file lm.berkeleylm'
markoovs = 'false'
search = 'cky'
poplimit = '100'
topn = '0'
useuniquenbest = 'true'
outputformat = '%S'
includealignindex = 'false'
featurefunction = 'OOVPenalty'
featurefunction = 'WordPenalty'
Parameters overridden from the command line:
server-port: 5674
serverport = '5674'
c = 'joshua.config'
Read 10 weights (0 of them dense)
Reading vocabulary: grammar.packed/vocabulary
Read 300317 entries from the vocabulary
Reading packed config: grammar.packed/config
102030405060708090.100%
Reading encoder configuration: grammar.packed/encoding
Loaded 62685418 rules
Reading grammar from file grammar.glue...
MemoryBasedBatchGrammar: Read 4 rules with 4 distinct source sides from 'grammar.glue'
Memory used 3447.1 MB
Grammar loading took: 39 seconds.
Stateful object with state index 0
Loading Berkeley LM from binary lm.berkeleylm
FEATURE: tm_pt (weight 0.000)
FEATURE: tm_glue (weight 0.000)
FEATURE: lm_0, order 5 (weight 0.194)
FEATURE: OOVPenalty (weight 0.015)
FEATURE: WordPenalty (weight -0.460)
Grammar sorting happening lazily on-demand.
Model loading took 42 seconds
Memory used 4355.5 MB
** TCP Server running and listening on port 5674.
{code} > Existing Language packs not compatible with Joshua master > - > > Key: JOSHUA-280 > URL: https://issues.apache.org/jira/browse/JOSHUA-280 > Project: Joshua > Issue Type: Bug > Components: language packs >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Priority: Critical > Fix For: 6.1 > > > When I work with the existing Spanish --> English language pack at > http://cs.jhu.edu/~post/language-packs/language-pack-es-en-phrase-2015-03-06.tgz, > I get the following error > {code} > lmcgibbn@LMC-032857 > /usr/local/Cellar/joshua/HEAD/libexec/language-pack-es-en-phrase-2015-03-06(NUTCH-2089) > $ ./run-joshua-server.sh > INFO - Parameters read from configuration file: joshua.config > INFO - tm = 'moses -owner pt -maxspan 0 -path phrase-table.packed > -max-source-len 5' > INFO - defaultnonterminal = 'X' > INFO - goalsymbol = 'GOAL' > INFO - featurefunction = 'StateMinimizingLanguageModel -lm_type kenlm > -lm_order 5 -lm_file lm.kenlm' > INFO - markoovs = 'false' > INFO - search = 'stack' > INFO - pop-limit: 100 > INFO - poplimit = '100' > INFO - topn = '0' > INFO - useuniquenbest = 'true' > INFO - outputformat = '%s' > INFO - includealignindex = 'false' > INFO - featurefunction = 'OOVPenalty' > INFO - featurefunction = 'WordPenalty' > INFO - featurefunction = 'Distortion' > INFO - featurefunction = 'PhrasePenalty' > INFO - c = 'joshua.config' > INFO - server-port: 5674 > INFO - serverport = '5674' > INFO - Read 9 weights (0 of them dense) > INFO - Reading vocabulary: phrase-table.packed/vocabulary > INFO - Read 191983 entries from the vocabulary > INFO - Reading packed config: phrase-table.packed/config > 102030405060708090.100% > Exception in thread "main" java.lang.RuntimeException: The grammar at > phrase-table.packed was packed with packer version 0, but the earliest > supported version is 3 > at > org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.readConfig(PackedGrammar.java:1061) > at > 
org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.(PackedGrammar.java:143) > at > org.apache.joshua.decoder.phrase.PhraseTable.(PhraseTable.java:65) > at > org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:603) > at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:514) > at org.apache.joshua.decoder.Decoder.(Decoder.java:126) > at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
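The exception above boils down to a minimum-version guard on the packed grammar. The Python sketch below mirrors only that comparison, using the numbers from the message (pack version 0, earliest supported version 3). How the decoder actually reads the version from phrase-table.packed/config is not shown in this thread, so the sketch assumes the caller already has the version as an integer; the function and constant names are hypothetical.

```python
# Value taken from the exception message; everything else is illustrative.
EARLIEST_SUPPORTED_VERSION = 3


def check_packer_version(grammar_path, packed_version,
                         earliest=EARLIEST_SUPPORTED_VERSION):
    """Raise if a packed grammar is older than the decoder accepts."""
    if packed_version < earliest:
        raise RuntimeError(
            f"The grammar at {grammar_path} was packed with packer version "
            f"{packed_version}, but the earliest supported version is {earliest}"
        )
    return packed_version
```

Under this reading, the 2015-03-06 Spanish pack would need to be repacked with a current packer, whereas a pack that already reports version 3 or later, presumably including the Chinese pack that loads fine above, passes the guard.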
[jira] [Commented] (JOSHUA-280) Existing Spanish --> English Language pack not compatible with Joshua master
[ https://issues.apache.org/jira/browse/JOSHUA-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359690#comment-15359690 ] Lewis John McGibbney commented on JOSHUA-280: - [~post] any idea whats up here? Thanks > Existing Spanish --> English Language pack not compatible with Joshua master > > > Key: JOSHUA-280 > URL: https://issues.apache.org/jira/browse/JOSHUA-280 > Project: Joshua > Issue Type: Bug > Components: language packs >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Priority: Critical > Fix For: 6.1 > > > When I work with the existing Spanish --> English language pack at > http://cs.jhu.edu/~post/language-packs/language-pack-es-en-phrase-2015-03-06.tgz, > I get the following error > {code} > lmcgibbn@LMC-032857 > /usr/local/Cellar/joshua/HEAD/libexec/language-pack-es-en-phrase-2015-03-06(NUTCH-2089) > $ ./run-joshua-server.sh > INFO - Parameters read from configuration file: joshua.config > INFO - tm = 'moses -owner pt -maxspan 0 -path phrase-table.packed > -max-source-len 5' > INFO - defaultnonterminal = 'X' > INFO - goalsymbol = 'GOAL' > INFO - featurefunction = 'StateMinimizingLanguageModel -lm_type kenlm > -lm_order 5 -lm_file lm.kenlm' > INFO - markoovs = 'false' > INFO - search = 'stack' > INFO - pop-limit: 100 > INFO - poplimit = '100' > INFO - topn = '0' > INFO - useuniquenbest = 'true' > INFO - outputformat = '%s' > INFO - includealignindex = 'false' > INFO - featurefunction = 'OOVPenalty' > INFO - featurefunction = 'WordPenalty' > INFO - featurefunction = 'Distortion' > INFO - featurefunction = 'PhrasePenalty' > INFO - c = 'joshua.config' > INFO - server-port: 5674 > INFO - serverport = '5674' > INFO - Read 9 weights (0 of them dense) > INFO - Reading vocabulary: phrase-table.packed/vocabulary > INFO - Read 191983 entries from the vocabulary > INFO - Reading packed config: phrase-table.packed/config > 102030405060708090.100% > Exception in thread "main" java.lang.RuntimeException: The grammar at > 
phrase-table.packed was packed with packer version 0, but the earliest > supported version is 3 > at > org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.readConfig(PackedGrammar.java:1061) > at > org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.(PackedGrammar.java:143) > at > org.apache.joshua.decoder.phrase.PhraseTable.(PhraseTable.java:65) > at > org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:603) > at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:514) > at org.apache.joshua.decoder.Decoder.(Decoder.java:126) > at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (JOSHUA-279) Cannot build Joshua master branch
[ https://issues.apache.org/jira/browse/JOSHUA-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned JOSHUA-279: --- Assignee: Lewis John McGibbney > Cannot build Joshua master branch > - > > Key: JOSHUA-279 > URL: https://issues.apache.org/jira/browse/JOSHUA-279 > Project: Joshua > Issue Type: Bug > Components: build, documentation, tests >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > Hi Folks, > We need to be cautious of whatever is committed to master branch... the build > has been broken for quite some time and there are constant Javadoc issues > which make the build unstable as well. > For example, when i make an attempt to build master branch we have failing > tests > {code} > lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ mvn clean install > ... > --- > T E S T S > --- > Running TestSuite > tm_pt_0=-2.000 tm_glue_0=3.000 lm_0=-206.718 lm_0_oov=2.000 > OOVPenalty=-200.000 | -198.000 > ERROR - * FATAL: Can't find libken.so (libken.dylib on OS X) in $JOSHUA/lib > ERROR - *This probably means that the KenLM library didn't compile. > ERROR - *Make sure that BOOST_ROOT is set to the root of your boost > ERROR - *installation (it's not /opt/local/, the default), change to > ERROR - *$JOSHUA, and type 'ant kenlm'. If problems persist, see the > ERROR - *website (joshua-decoder.org). > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! 
Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > % > % > % > % > % > % > % > % > % > Tests run: 126, Failures: 1, Errors: 0, Skipped: 6, Time elapsed: 1.818 sec > <<< FAILURE! - in TestSuite > setUp(org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest) > Time elapsed: 0.075 sec <<< FAILURE! > java.lang.ExceptionInInitializerError > at > org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52) > Caused by: java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: no ken > in java.library.path > at > org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52) > Caused by: java.lang.UnsatisfiedLinkError: no ken in java.library.path > at > org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52) > Results : > Failed tests: > org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest) > Run 1: ClassBasedLanguageModelTest.setUp:52 » ExceptionInInitializer > Run 2: PASS > Tests run: 124, Failures: 1, Errors: 0, Skipped: 4 > [INFO] > > [INFO] BUILD FAILURE > {code} > As a workaround I thought I will try to build the project without running the > test suite, however now Javadoc issues prevent me from doing so! > {code} > lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ mvn clean install > -DskipTests > ... 
> 1 error > 14 warnings > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 28.144 s > [INFO] Finished at: 2016-07-01T14:11:42-07:00 > [INFO] Final Memory: 37M/303M > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-javadoc-plugin:2.8:jar (attach-javadocs) on > project joshua: MavenReportException: Error while creating archive: > [ERROR] Exit code: 1 - > /usr/local/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/LanguageModelFF.java:217: > warning: no @param for rule > [ERROR] public int[] getRuleIds(final Rule rule) { > [ERROR] ^ > [ERROR] > /usr/local/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/LanguageModelFF.java:217: > warning: no
[jira] [Commented] (JOSHUA-279) Cannot build Joshua master branch
[ https://issues.apache.org/jira/browse/JOSHUA-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359676#comment-15359676 ] Lewis John McGibbney commented on JOSHUA-279: - commit 342312e309ec1bb9b1074688c1fbd3897783bc49 Author: Lewis John McGibbneyDate: Fri Jul 1 14:40:44 2016 -0700 JOSHUA-279 Cannot build Joshua master branch The above commit fixes the Javadoc and I can now build. The test suite is still failing so I am still building with the -DskipTests flag > Cannot build Joshua master branch > - > > Key: JOSHUA-279 > URL: https://issues.apache.org/jira/browse/JOSHUA-279 > Project: Joshua > Issue Type: Bug > Components: build, documentation, tests >Reporter: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > Hi Folks, > We need to be cautious of whatever is committed to master branch... the build > has been broken for quite some time and there are constant Javadoc issues > which make the build unstable as well. > For example, when i make an attempt to build master branch we have failing > tests > {code} > lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ mvn clean install > ... > --- > T E S T S > --- > Running TestSuite > tm_pt_0=-2.000 tm_glue_0=3.000 lm_0=-206.718 lm_0_oov=2.000 > OOVPenalty=-200.000 | -198.000 > ERROR - * FATAL: Can't find libken.so (libken.dylib on OS X) in $JOSHUA/lib > ERROR - *This probably means that the KenLM library didn't compile. > ERROR - *Make sure that BOOST_ROOT is set to the root of your boost > ERROR - *installation (it's not /opt/local/, the default), change to > ERROR - *$JOSHUA, and type 'ant kenlm'. If problems persist, see the > ERROR - *website (joshua-decoder.org). > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > WARN - sentence 0 too long 401, truncating to length 200 > WARN - no grammars supplied! Supplying dummy glue grammar. 
> WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > WARN - no grammars supplied! Supplying dummy glue grammar. > % > % > % > % > % > % > % > % > % > Tests run: 126, Failures: 1, Errors: 0, Skipped: 6, Time elapsed: 1.818 sec > <<< FAILURE! - in TestSuite > setUp(org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest) > Time elapsed: 0.075 sec <<< FAILURE! > java.lang.ExceptionInInitializerError > at > org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52) > Caused by: java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: no ken > in java.library.path > at > org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52) > Caused by: java.lang.UnsatisfiedLinkError: no ken in java.library.path > at > org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52) > Results : > Failed tests: > org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest) > Run 1: ClassBasedLanguageModelTest.setUp:52 » ExceptionInInitializer > Run 2: PASS > Tests run: 124, Failures: 1, Errors: 0, Skipped: 4 > [INFO] > > [INFO] BUILD FAILURE > {code} > As a workaround I thought I will try to build the project without running the > test suite, however now Javadoc issues prevent me from doing so! > {code} > lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ mvn clean install > -DskipTests > ... 
> 1 error > 14 warnings > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 28.144 s > [INFO] Finished at: 2016-07-01T14:11:42-07:00 > [INFO] Final Memory: 37M/303M > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-javadoc-plugin:2.8:jar (attach-javadocs) on > project joshua: MavenReportException: Error while creating archive: > [ERROR] Exit code: 1 - >
[jira] [Created] (JOSHUA-279) Cannot build Joshua master branch
Lewis John McGibbney created JOSHUA-279: --- Summary: Cannot build Joshua master branch Key: JOSHUA-279 URL: https://issues.apache.org/jira/browse/JOSHUA-279 Project: Joshua Issue Type: Bug Components: tests, build, documentation Reporter: Lewis John McGibbney Priority: Blocker Fix For: 6.1 Hi Folks, We need to be cautious of whatever is committed to master branch... the build has been broken for quite some time and there are constant Javadoc issues which make the build unstable as well. For example, when i make an attempt to build master branch we have failing tests {code} lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ mvn clean install ... --- T E S T S --- Running TestSuite tm_pt_0=-2.000 tm_glue_0=3.000 lm_0=-206.718 lm_0_oov=2.000 OOVPenalty=-200.000 | -198.000 ERROR - * FATAL: Can't find libken.so (libken.dylib on OS X) in $JOSHUA/lib ERROR - *This probably means that the KenLM library didn't compile. ERROR - *Make sure that BOOST_ROOT is set to the root of your boost ERROR - *installation (it's not /opt/local/, the default), change to ERROR - *$JOSHUA, and type 'ant kenlm'. If problems persist, see the ERROR - *website (joshua-decoder.org). WARN - sentence 0 too long 401, truncating to length 200 WARN - sentence 0 too long 401, truncating to length 200 WARN - sentence 0 too long 401, truncating to length 200 WARN - sentence 0 too long 401, truncating to length 200 WARN - no grammars supplied! Supplying dummy glue grammar. WARN - no grammars supplied! Supplying dummy glue grammar. WARN - no grammars supplied! Supplying dummy glue grammar. WARN - no grammars supplied! Supplying dummy glue grammar. WARN - no grammars supplied! Supplying dummy glue grammar. WARN - no grammars supplied! Supplying dummy glue grammar. WARN - no grammars supplied! Supplying dummy glue grammar. WARN - no grammars supplied! Supplying dummy glue grammar. % % % % % % % % % Tests run: 126, Failures: 1, Errors: 0, Skipped: 6, Time elapsed: 1.818 sec <<< FAILURE! 
- in TestSuite setUp(org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest) Time elapsed: 0.075 sec <<< FAILURE! java.lang.ExceptionInInitializerError at org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52) Caused by: java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: no ken in java.library.path at org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52) Caused by: java.lang.UnsatisfiedLinkError: no ken in java.library.path at org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(ClassBasedLanguageModelTest.java:52) Results : Failed tests: org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest.setUp(org.apache.joshua.decoder.ff.lm.class_lm.ClassBasedLanguageModelTest) Run 1: ClassBasedLanguageModelTest.setUp:52 » ExceptionInInitializer Run 2: PASS Tests run: 124, Failures: 1, Errors: 0, Skipped: 4 [INFO] [INFO] BUILD FAILURE {code} As a workaround I thought I will try to build the project without running the test suite, however now Javadoc issues prevent me from doing so! {code} lmcgibbn@LMC-032857 /usr/local/incubator-joshua(master) $ mvn clean install -DskipTests ... 
1 error 14 warnings [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 28.144 s [INFO] Finished at: 2016-07-01T14:11:42-07:00 [INFO] Final Memory: 37M/303M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.8:jar (attach-javadocs) on project joshua: MavenReportException: Error while creating archive: [ERROR] Exit code: 1 - /usr/local/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/LanguageModelFF.java:217: warning: no @param for rule [ERROR] public int[] getRuleIds(final Rule rule) { [ERROR] ^ [ERROR] /usr/local/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/LanguageModelFF.java:217: warning: no @return [ERROR] public int[] getRuleIds(final Rule rule) { [ERROR] ^ [ERROR] /usr/local/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/LanguageModelFF.java:231: warning: no @param for words [ERROR] public int getOovs(final int[] words) { [ERROR] ^ [ERROR] /usr/local/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/LanguageModelFF.java:231: warning: no @return [ERROR] public int
[jira] [Resolved] (JOSHUA-269) Fix Javadoc in JOSHUA-252 branch to comply with JDK1.8 Spec
[ https://issues.apache.org/jira/browse/JOSHUA-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved JOSHUA-269. - Resolution: Fixed > Fix Javadoc in JOSHUA-252 branch to comply with JDK1.8 Spec > --- > > Key: JOSHUA-269 > URL: https://issues.apache.org/jira/browse/JOSHUA-269 > Project: Joshua > Issue Type: Improvement > Components: documentation >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > When we build the JOSHUA-252 codebase on Jira, we get the following > {code} > [INFO] > > [ERROR] BUILD ERROR > [INFO] > > [INFO] An error has occurred in JavaDocs report generation: > Exit code: 1 - > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/oracle/OracleExtractionHG.java:629: > warning: no @param for tbl > public void get_ngrams(HashMaptbl, int order, > ArrayList wrds, > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/oracle/OracleExtractionHG.java:629: > warning: no @param for order > public void get_ngrams(HashMap tbl, int order, > ArrayList wrds, > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/oracle/OracleExtractionHG.java:629: > warning: no @param for wrds > public void get_ngrams(HashMap tbl, int order, > ArrayList wrds, > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/oracle/OracleExtractionHG.java:629: > warning: no @param for ignore_null_equiv_symbol > public void get_ngrams(HashMap tbl, int order, > ArrayList wrds, > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/oracle/OracleExtractionHG.java:45: > error: malformed HTML > * @author Zhifei Li, (Johns Hopkins University) > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/oracle/OracleExtractionHG.java:45: > error: bad use of '>' > * @author Zhifei Li, 
(Johns Hopkins University) > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/oracle/OracleExtractionHG.java:91: > warning: no description for @param >* @param lm_feat_id_ > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/oracle/SplitHg.java:33: > error: malformed HTML > * @author Zhifei Li, (Johns Hopkins University) > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/oracle/SplitHg.java:33: > error: bad use of '>' > * @author Zhifei Li, (Johns Hopkins University) > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/ui/tree_visualizer/browser/Browser.java:77: > error: @param name not found >* @param args the paths to the source, reference, and n-best files > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/ui/tree_visualizer/browser/Browser.java:79: > warning: no @param for argv > public static void main(String[] argv) throws IOException { > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/ui/tree_visualizer/browser/Browser.java:79: > warning: no @throws for java.io.IOException > public static void main(String[] argv) throws IOException { > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/ui/tree_visualizer/tree/Tree.java:165: > warning: no @return > public int size() { > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/ui/tree_visualizer/tree/Tree.java:172: > warning: no @return > public Node root() { > ^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/ui/tree_visualizer/tree/Tree.java:51: > error: malformed HTML > * @author Jonny Weese >^ > /home/jenkins/jenkins-slave/workspace/joshua_maven/src/main/java/org/apache/joshua/ui/tree_visualizer/tree/Tree.java:51: > error: bad use of '>' > * @author Jonny Weese > ^ >
[jira] [Updated] (JOSHUA-275) Revamp the Configuration System
[ https://issues.apache.org/jira/browse/JOSHUA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-275: Fix Version/s: (was: 6,2) 6.2 > Revamp the Configuration System > --- > > Key: JOSHUA-275 > URL: https://issues.apache.org/jira/browse/JOSHUA-275 > Project: Joshua > Issue Type: Improvement >Affects Versions: 6.1, 6.2, 7 >Reporter: Kellen Sunderland > Fix For: 6.2 > > > I'd like to propose we centralize Joshua's configuration system to make use > of typesafe/config https://github.com/typesafehub/config . This config > system looks like JSON but with comments so it's easy to read. Because it's > JSON it supports hierarchies of configurations, lists of configuration etc > quite easily. It has some nice features like parsing time automatically. > The main advantage here though is that we have a standard config system that > doesn't have to be manually parsed. > Here's a quick example of how we can use it: > {code:java} > @Inject > public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir") > String grammar_dir, > @TypesafeConfig("PackedGrammar.span_limit") > int span_limit, > String owner, > String type) throws FileNotFoundException, > IOException ... > {code} > and then a config similar to > \# Joshua configuration file > {code:javascript} > config = { > default-non-terminal = X > goal-symbol = GOAL > ... > > PackedGrammar: { > type: thrax, > grammar_dir: /local/grammars/... > span_limit: 50 > } > ... > } > {code} > Version: TBD, but it's a breaking change so we may consider putting it in > Joshua 7. > Totally open to other config / injection systems if others want to suggest > any of their favorites. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-265) Refactor key interfaces and core code for a future release.
[ https://issues.apache.org/jira/browse/JOSHUA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-265: Fix Version/s: 6.2 > Refactor key interfaces and core code for a future release. > > > Key: JOSHUA-265 > URL: https://issues.apache.org/jira/browse/JOSHUA-265 > Project: Joshua > Issue Type: Improvement >Reporter: Kellen Sunderland >Priority: Minor > Fix For: 6.2 > > > We've discussed making some modifications to the key interfaces. This ticket > can focus on making large changes to the codebase for a future release. This > work will likely take some time and some collaboration. I'd suggest the > code for this go in a separate release branch. > Some issues we can work on: > * I'd propose we conform to the SOLID principles for our major interfaces. > https://en.wikipedia.org/wiki/SOLID_(object-oriented_design) . > * We can look at Sparse / Dense feature vectors and how to handle them > naturally in Joshua. > * Refactor objects that may now be used more broadly than was originally > intended (for example the Vocabulary class). > * We should have a general discussion around which parts of the codebase are > responsible for which functions. We should clearly define what logic should > be a part of the Grammar versus the Feature Functions, for example, and make > sure logic doesn't leak from one of these objects to the others. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-275) Revamp the Configuration System
[ https://issues.apache.org/jira/browse/JOSHUA-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-275: Fix Version/s: 6,2 > Revamp the Configuration System > --- > > Key: JOSHUA-275 > URL: https://issues.apache.org/jira/browse/JOSHUA-275 > Project: Joshua > Issue Type: Improvement >Affects Versions: 6.1, 6.2, 7 >Reporter: Kellen Sunderland > Fix For: 6,2 > > > I'd like to propose we centralize Joshua's configuration system to make use > of typesafe/config https://github.com/typesafehub/config . This config > system looks like JSON but with comments so it's easy to read. Because it's > JSON it supports hierarchies of configurations, lists of configuration etc > quite easily. It has some nice features like parsing time automatically. > The main advantage here though is that we have a standard config system that > doesn't have to be manually parsed. > Here's a quick example of how we can use it: > {code:java} > @Inject > public PackedGrammar(@TypesafeConfig("PackedGrammar.grammar_dir") > String grammar_dir, > @TypesafeConfig("PackedGrammar.span_limit") > int span_limit, > String owner, > String type) throws FileNotFoundException, > IOException ... > {code} > and then a config similar to > \# Joshua configuration file > {code:javascript} > config = { > default-non-terminal = X > goal-symbol = GOAL > ... > > PackedGrammar: { > type: thrax, > grammar_dir: /local/grammars/... > span_limit: 50 > } > ... > } > {code} > Version: TBD, but it's a breaking change so we may consider putting it in > Joshua 7. > Totally open to other config / injection systems if others want to suggest > any of their favorites. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-268) Phrase-based model error (NullPointerException)
[ https://issues.apache.org/jira/browse/JOSHUA-268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-268: Fix Version/s: 6.2 > Phrase-based model error (NullPointerException) > --- > > Key: JOSHUA-268 > URL: https://issues.apache.org/jira/browse/JOSHUA-268 > Project: Joshua > Issue Type: Bug > Components: decoders >Affects Versions: 6.0.5 > Environment: fedora 23 >Reporter: Kyle Richardson >Priority: Minor > Fix For: 6.2 > > > I'm trying to run the phrase.sh example script (the only modification I made > was to take out the --optimizer-runs option, because the system says that > this is an "Unknown option"). > The error comes at the tuning stage (specifically, it fails at some point in > the tuning then complains that it cannot find the "joshua.config.final" > file). > Looking into the log file (tune/joshua.log), it seems to translate and tune a > number of sentences, then it raises the following NullPointerException: > Memory used after sentence 7 is 42.5 MB > Translation 7: -30.617 good how is fine > Input 2: Collecting options took 0.000 seconds > Input 8: Collecting options took 0.000 seconds > Input 2: FATAL UNCAUGHT EXCEPTION: null > java.lang.NullPointerException > at joshua.decoder.phrase.Candidate.score(Candidate.java:214) > at joshua.decoder.phrase.Candidate.compareTo(Candidate.java:136) > at joshua.decoder.phrase.Candidate.compareTo(Candidate.java:19) > at java.util.HashMap.compareComparables(HashMap.java:371) > at java.util.HashMap$TreeNode.treeify(HashMap.java:1920) > at java.util.HashMap.treeifyBin(HashMap.java:771) > at java.util.HashMap.putVal(HashMap.java:643) > at java.util.HashMap.put(HashMap.java:611) > at java.util.HashSet.add(HashSet.java:219) > at joshua.decoder.phrase.Stack.addCandidate(Stack.java:125) > at joshua.decoder.phrase.Stacks.search(Stacks.java:166) > at joshua.decoder.DecoderThread.translate(DecoderThread.java:113) > 
at joshua.decoder.Decoder$DecoderThreadRunner.run(Decoder.java:218) > There's nothing informative in the tune/mert.log; it just says that it exited > prematurely. The other processes seem to work as expected (although in the > giza.log, there are a number of "Sentence mismatch error! Line " warnings). > I'm running this on Fedora 23 with Moses. I had no problems training the > hiero model. > note--- > There appears to be an open ticket for more or less the same problem > (JOSHUA-267). The difference, however, is that in that ticket the > tuner appears to fail on the first input, whereas here it already > decodes/tunes several inputs before failing (see above). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (JOSHUA-253) Enable execution of Unit tests
[ https://issues.apache.org/jira/browse/JOSHUA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved JOSHUA-253. - Resolution: Fixed yeah we fixed it in the Maven work > Enable execution of Unit tests > -- > > Key: JOSHUA-253 > URL: https://issues.apache.org/jira/browse/JOSHUA-253 > Project: Joshua > Issue Type: Test >Affects Versions: 6.0 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 6.1 > > Attachments: JOSHUA-253.patch > > > As per our [discussion on this > topic|http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg00270.html], > [~teofili] correctly identified that unit level tests are not executed. > We need to fix this such that they are. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (JOSHUA-253) Enable execution of Unit tests
[ https://issues.apache.org/jira/browse/JOSHUA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned JOSHUA-253: --- Assignee: Lewis John McGibbney > Enable execution of Unit tests > -- > > Key: JOSHUA-253 > URL: https://issues.apache.org/jira/browse/JOSHUA-253 > Project: Joshua > Issue Type: Test >Affects Versions: 6.0 >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 6.1 > > Attachments: JOSHUA-253.patch > > > As per our [discussion on this > topic|http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg00270.html], > [~teofili] correctly identified that unit level tests are not executed. > We need to fix this such that they are. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-276) Trivial fixes to 1.8 Javadoc
Lewis John McGibbney created JOSHUA-276: --- Summary: Trivial fixes to 1.8 Javadoc Key: JOSHUA-276 URL: https://issues.apache.org/jira/browse/JOSHUA-276 Project: Joshua Issue Type: Bug Components: core Affects Versions: 6.0.5 Reporter: Lewis John McGibbney Priority: Trivial Fix For: 6.1 There are some trivial Javadoc issues to be fixed in now master branch {code} [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 37.358s [INFO] Finished at: Wed Jun 01 03:28:40 UTC 2016 [INFO] Final Memory: 40M/861M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.8:aggregate (default-cli) on project joshua: An error has occurred in JavaDocs report generation: [ERROR] Exit code: 1 - /home/jenkins/jenkins-slave/workspace/joshua_master/src/main/java/org/apache/joshua/decoder/StructuredTranslationFactory.java:47: warning: no description for @param [ERROR] * @param sourceSentence [ERROR] ^ [ERROR] /home/jenkins/jenkins-slave/workspace/joshua_master/src/main/java/org/apache/joshua/decoder/StructuredTranslationFactory.java:48: warning: no description for @param [ERROR] * @param hypergraph [ERROR] ^ [ERROR] /home/jenkins/jenkins-slave/workspace/joshua_master/src/main/java/org/apache/joshua/decoder/StructuredTranslationFactory.java:49: warning: no description for @param [ERROR] * @param featureFunctions [ERROR] ^ [ERROR] /home/jenkins/jenkins-slave/workspace/joshua_master/src/main/java/org/apache/joshua/decoder/ff/FeatureVector.java:80: error: reference not found [ERROR] * features) and in {@link org.apache.joshua.decoder.ff.tm.BilingualRule#estimateRuleCost(java.util.List)} [ERROR] ^ [ERROR] [ERROR] Command line was: /home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.8/jre/../bin/javadoc @options @packages [ERROR] [ERROR] Refer to the generated Javadoc files in '/home/jenkins/jenkins-slave/workspace/joshua_master/target/site/apidocs' dir. 
[ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException Build step 'Invoke top-level Maven targets' marked build as failure Publishing Javadoc {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-252) Make it possible to use Maven to build Joshua
[ https://issues.apache.org/jira/browse/JOSHUA-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309166#comment-15309166 ] Lewis John McGibbney commented on JOSHUA-252: - ACK done https://builds.apache.org/view/H-L/view/Joshua/job/joshua_master/ There is a transient build slave error which I'll try and sort out. [~post] NICE WORK :) > Make it possible to use Maven to build Joshua > - > > Key: JOSHUA-252 > URL: https://issues.apache.org/jira/browse/JOSHUA-252 > Project: Joshua > Issue Type: Improvement > Components: build >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 6.1 > > > As per discussion on the dev@ list for now Ant is the official build tool for > Joshua however we would like to possibly switch to Maven if / when someone is > able to do so. > Assigning to me for now as I could be able to look into this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-271) Thrax invocation should not rely upon $HADOOP being set
Lewis John McGibbney created JOSHUA-271: --- Summary: Thrax invocation should not rely upon $HADOOP being set Key: JOSHUA-271 URL: https://issues.apache.org/jira/browse/JOSHUA-271 Project: Joshua Issue Type: Bug Components: pipeline, thrax Affects Versions: 6.0.5 Reporter: Lewis John McGibbney Fix For: 6.1 Right now one cannot run Thrax unless the $HADOOP environment variable is defined. Every invocation of the hadoop script hard-codes the path as $HADOOP/bin/hadoop. But what happens if you are using a VM (Vagrant) to connect to a cluster for which no $HADOOP environment variable is defined? The hadoop script should be on the PATH and usable from there. The only check needed is whether hadoop is available on the PATH; if it is not, the start_hadoop_cluster subroutine can be called. This reduces code and makes more sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
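To make the proposed check concrete, here is a minimal sketch of a "is hadoop on the PATH?" lookup. It is written in Java purely for illustration; the real pipeline is Perl, and the class name HadoopLocator and method findOnPath are hypothetical names that do not exist in Joshua. It assumes the only requirement is a regular, executable file called hadoop in one of the PATH directories.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of Joshua: looks a tool up on a PATH-style string.
class HadoopLocator {

    /** Returns the first executable regular file called `name` in the given PATH value. */
    static Optional<Path> findOnPath(String name, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> hadoop = findOnPath("hadoop", System.getenv("PATH"));
        if (hadoop.isPresent()) {
            System.out.println("Using hadoop at " + hadoop.get());
        } else {
            // This is where the pipeline would fall back to start_hadoop_cluster.
            System.out.println("hadoop not found on PATH");
        }
    }
}
```

With a lookup like this, $HADOOP never needs to be consulted; the fallback branch is the single place where a local cluster would be started.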
[jira] [Created] (JOSHUA-270) pipeline.pl needs major refactoring
Lewis John McGibbney created JOSHUA-270: --- Summary: pipeline.pl needs major refactoring Key: JOSHUA-270 URL: https://issues.apache.org/jira/browse/JOSHUA-270 Project: Joshua Issue Type: Bug Components: pipeline Affects Versions: 6.0.5 Reporter: Lewis John McGibbney Fix For: 6.1 Right now [pipeline.pl|https://github.com/apache/incubator-joshua/blob/master/scripts/training/pipeline.pl] is well over 2000 lines long and extremely difficult to navigate. I propose the following: * All ENV handling is refactored into a pipeline_environment file * All command-line parsing and definitions are refactored into a pipeline_cli file * Sanity checking is refactored into a pipeline_sanity_check file * Dependent variable checking is refactored into a pipeline_dependent_variable_setting file * Filtering and preprocessing of corpora is refactored into a pipeline_filter_preprocess_corpora file * pipeline_subsampling becomes a file * pipeline_alignment becomes a file * pipeline_parsing becomes a file * pipeline_thrax becomes a file * pipeline_tuning becomes a file * pipeline_testing becomes a file * pipeline_subroutines becomes a file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-262) Implement all logging as Slf4j over Log4j
[ https://issues.apache.org/jira/browse/JOSHUA-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294656#comment-15294656 ] Lewis John McGibbney commented on JOSHUA-262: - I honestly have no idea. > Implement all logging as Slf4j over Log4j > - > > Key: JOSHUA-262 > URL: https://issues.apache.org/jira/browse/JOSHUA-262 > Project: Joshua > Issue Type: Improvement > Components: core >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Assignee: Thamme Gowda N > Fix For: 6.1 > > > [~hsaputra] suggested that we implement all logging as Slf4j over Log4j. If > we use [parameterized logging > notation|http://www.slf4j.org/faq.html#logging_performance] we can have good > logging in place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-252) Make it possible to use Maven to build Joshua
[ https://issues.apache.org/jira/browse/JOSHUA-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283123#comment-15283123 ] Lewis John McGibbney commented on JOSHUA-252: - [~teofili] I am working on this today and will post a pull request ASAP. > Make it possible to use Maven to build Joshua > - > > Key: JOSHUA-252 > URL: https://issues.apache.org/jira/browse/JOSHUA-252 > Project: Joshua > Issue Type: Improvement > Components: build >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 6.1 > > > As per discussion on the dev@ list for now Ant is the official build tool for > Joshua however we would like to possibly switch to Maven if / when someone is > able to do so. > Assigning to me for now as I could be able to look into this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-252) Make it possible to use Maven to build Joshua
[ https://issues.apache.org/jira/browse/JOSHUA-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-252: Component/s: build > Make it possible to use Maven to build Joshua > - > > Key: JOSHUA-252 > URL: https://issues.apache.org/jira/browse/JOSHUA-252 > Project: Joshua > Issue Type: Improvement > Components: build >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > Fix For: 6.1 > > > As per discussion on the dev@ list for now Ant is the official build tool for > Joshua however we would like to possibly switch to Maven if / when someone is > able to do so. > Assigning to me for now as I could be able to look into this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-259) Integration tests are failing
[ https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-259: Fix Version/s: 6.1 > Integration tests are failing > - > > Key: JOSHUA-259 > URL: https://issues.apache.org/jira/browse/JOSHUA-259 > Project: Joshua > Issue Type: Bug >Reporter: Kellen Sunderland > Fix For: 6.1 > > > Several integration tests are currently failing with Joshua. I have a quick > fix coming for one of the tests but just in case we need more discussion > around the failures I'll open a bug. > The currently failing tests for me: > test/decoder/too-long > test/server/http > test/server/tcp-text > test/thrax/extraction > and > test/decoder/moses-compat (but this is easy to fix, simple extra space in the > expected file) > These are failing under OS X 10.11. If working under other environments feel > free to post a 'works for me'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-260) Integrate IoC (Inversion of Control) into Joshua
[ https://issues.apache.org/jira/browse/JOSHUA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-260: Fix Version/s: 6.1 > Integrate IoC (Inversion of Control) into Joshua > > > Key: JOSHUA-260 > URL: https://issues.apache.org/jira/browse/JOSHUA-260 > Project: Joshua > Issue Type: Improvement >Reporter: Kellen Sunderland >Assignee: Kellen Sunderland > Fix For: 6.1 > > > I'd like to propose we investigate using Guice > (https://github.com/google/guice) in conjunction with Joshua's configuration > system. I believe it would give us a nice way to map what is in the > configuration to the code paths and implementations used within Joshua. It > would also go a long way toward allowing us to integrate unit tests throughout > all the important classes in Joshua. What does everyone think? Would IoC be > a good pattern to adopt? Is everyone OK with using Guice (versus, say, some > other IoC library)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
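For illustration only, the pattern proposed in JOSHUA-260 can be sketched as constructor injection in plain Java, without Guice, so that it runs standalone. Every name here (`LanguageModel`, `Decoder`, `Wiring`, the `lm_type` key) is hypothetical and not taken from Joshua's code base. With Guice, the mapping done by hand in `Wiring.fromConfig` would presumably live in a module instead.

```java
import java.util.Map;

// Hypothetical interface; not part of Joshua's actual API.
interface LanguageModel {
    double score(String sentence);
}

// Hypothetical implementation chosen when the configuration says "uniform".
class UniformLanguageModel implements LanguageModel {
    public double score(String sentence) {
        return 0.0;
    }
}

// Hypothetical implementation chosen when the configuration says "length".
class LengthLanguageModel implements LanguageModel {
    public double score(String sentence) {
        return -sentence.length();
    }
}

// The decoder receives its dependency through the constructor instead of
// building it itself, so a unit test can pass in any LanguageModel it likes.
class Decoder {
    private final LanguageModel lm;

    Decoder(LanguageModel lm) {
        this.lm = lm;
    }

    double decodeScore(String sentence) {
        return lm.score(sentence);
    }
}

// Hand-written wiring: maps a configuration value to an implementation.
class Wiring {
    static Decoder fromConfig(Map<String, String> config) {
        String type = config.getOrDefault("lm_type", "uniform");
        LanguageModel lm;
        switch (type) {
            case "length":
                lm = new LengthLanguageModel();
                break;
            case "uniform":
                lm = new UniformLanguageModel();
                break;
            default:
                throw new IllegalArgumentException("unknown lm_type: " + type);
        }
        return new Decoder(lm);
    }
}
```

Because `Decoder` depends only on the interface, a test can construct it directly with a stub, which is the unit-testing benefit the issue description mentions.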
[jira] [Updated] (JOSHUA-262) Implement all logging as Slf4j over Log4j
[ https://issues.apache.org/jira/browse/JOSHUA-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated JOSHUA-262: Component/s: core > Implement all logging as Slf4j over Log4j > - > > Key: JOSHUA-262 > URL: https://issues.apache.org/jira/browse/JOSHUA-262 > Project: Joshua > Issue Type: Improvement > Components: core >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney > Fix For: 6.1 > > > [~hsaputra] suggested that we implement all logging as Slf4j over Log4j. If > we use [parameterized logging > notation|http://www.slf4j.org/faq.html#logging_performance] we can have good > logging in place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
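As a rough illustration of why the parameterized notation mentioned in JOSHUA-262 helps, the sketch below uses a small stand-in logger written against the Java standard library only. `TinyLogger` and `CostlyHypothesis` are hypothetical names, and this is not the SLF4J API itself. With SLF4J, the equivalent call would be along the lines of `logger.debug("Best hypothesis: {}", hyp)`. The point shown is that the message is assembled only when the level is enabled, so a disabled debug call never pays for `toString()` on its arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an SLF4J-style logger; not the SLF4J API.
class TinyLogger {
    private final boolean debugEnabled;
    final List<String> lines = new ArrayList<>();

    TinyLogger(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // The template is filled in only if debug is enabled, so disabled calls
    // never invoke toString() on the arguments.
    void debug(String template, Object... args) {
        if (!debugEnabled) {
            return;
        }
        lines.add(format(template, args));
    }

    // Replaces each "{}" in order with the corresponding argument.
    static String format(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int at = template.indexOf("{}", from);
            if (at < 0) {
                break;
            }
            out.append(template, from, at).append(arg);
            from = at + 2;
        }
        return out.append(template.substring(from)).toString();
    }
}

// An argument whose toString() stands in for expensive rendering;
// it counts how often it is rendered.
class CostlyHypothesis {
    int renderCount = 0;

    @Override
    public String toString() {
        renderCount++;
        return "hyp#42";
    }
}
```

By contrast, a call written with string concatenation, such as `"Best hypothesis: " + hyp`, builds the string before the logger is even invoked, whether or not debug output is enabled.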
[jira] [Created] (JOSHUA-262) Implement all logging as Slf4j over Log4j
Lewis John McGibbney created JOSHUA-262: --- Summary: Implement all logging as Slf4j over Log4j Key: JOSHUA-262 URL: https://issues.apache.org/jira/browse/JOSHUA-262 Project: Joshua Issue Type: Improvement Affects Versions: 6.0.5 Reporter: Lewis John McGibbney Fix For: 6.1 [~hsaputra] suggested that we implement all logging as Slf4j over Log4j. If we use [parameterized logging notation|http://www.slf4j.org/faq.html#logging_performance] we can have good logging in place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-261) Remove ext directory from source tree
[ https://issues.apache.org/jira/browse/JOSHUA-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276506#comment-15276506 ] Lewis John McGibbney commented on JOSHUA-261: - In all honesty, the code can remain in the source tree in SCM but we just can't ship it with a release. > Remove ext directory from source tree > - > > Key: JOSHUA-261 > URL: https://issues.apache.org/jira/browse/JOSHUA-261 > Project: Joshua > Issue Type: Task >Affects Versions: 6.0.5 >Reporter: Lewis John McGibbney >Priority: Blocker > Fix For: 6.1 > > > Right now we have a bunch of code bundled into the > [ext|https://github.com/apache/incubator-joshua/tree/master/ext] directory. I > don't think any of this code can be shipped with an Apache Joshua > (Incubating) release so we need to think about a mechanism for removing it > and making Joshua work in other ways. -- This message was sent by Atlassian JIRA (v6.3.4#6332)