[jira] [Commented] (JOSHUA-323) Joshua 6.1 Release Management
[ https://issues.apache.org/jira/browse/JOSHUA-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656191#comment-15656191 ]

ASF GitHub Bot commented on JOSHUA-323:
---------------------------------------

GitHub user lewismc opened a pull request:

    https://github.com/apache/incubator-joshua/pull/76

    Release

    This issue addresses https://issues.apache.org/jira/browse/JOSHUA-323

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lewismc/incubator-joshua release

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-joshua/pull/76.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #76

commit f45c3fd39ae74604a40046a97b98bb3154ca5ba7
Author: Lewis John McGibbney
Date:   2016-11-11T04:32:22Z

    JOSHUA-323 Joshua 6.1 Release Management

commit 7f2ef263cbaee94ba82cff4e8145a288180a3afa
Author: Lewis John McGibbney
Date:   2016-11-11T04:53:47Z

    JOSHUA-323 Joshua 6.1 Release Management

commit 10bfa31993a8859adc06b200fa173fb5f18e7919
Author: Lewis John McGibbney
Date:   2016-11-11T04:54:59Z

    JOSHUA-323 Joshua 6.1 Release Management

commit c733f45a020034bd05318a8dd31ff4d84aebe19a
Author: Lewis John McGibbney
Date:   2016-11-11T04:59:23Z

    JOSHUA-323 Joshua 6.1 Release Management

commit c2c59d8435ebb8d7d5dd8191ff2dec24352a4838
Author: Lewis John McGibbney
Date:   2016-11-11T05:06:26Z

    JOSHUA-323 Joshua 6.1 Release Management

commit eb994e311a0c7593ad1899df6153b65585760cc8
Author: Lewis John McGibbney
Date:   2016-11-11T05:20:32Z

    JOSHUA-323 Joshua 6.1 Release Management

> Joshua 6.1 Release Management
> -----------------------------
>
>                 Key: JOSHUA-323
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-323
>             Project: Joshua
>          Issue Type: Task
>          Components: build, release
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Blocker
>             Fix For: 6.1
>
>
> This is a governing ticket for reference more than anything else. We need to
> add all release specific build additions to parent pom.xml which enable us to
> roll a release candidate.
> The process is also being documented over at
> https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (JOSHUA-323) Joshua 6.1 Release Management
[ https://issues.apache.org/jira/browse/JOSHUA-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656193#comment-15656193 ]

Lewis John McGibbney commented on JOSHUA-323:
---------------------------------------------

Progress going well. RAT license headers are taking a wee while but will have them cracked for tomorrow. Following files are outstanding. Progress can be tracked over on https://github.com/apache/incubator-joshua/pull/76

{code}
Files with unapproved licenses:

  scripts/analysis/sentence-by-sentence.pl
  scripts/analysis/tree_visualizer
  scripts/copy-config.pl
  scripts/distributedLM/config.template
  scripts/distributedLM/create_remote_sym_tbl.pl
  scripts/distributedLM/filter_lm.pl
  scripts/distributedLM/get_grammar_eng_voc.pl
  scripts/distributedLM/get_grammar_eng_voc_from_cn_voc.pl
  scripts/distributedLM/global_symol_list
  scripts/distributedLM/lm.list.withweights
  scripts/ems/config.ghkm
  scripts/ems/config.hiero
  scripts/ems/config.phrase
  scripts/ems/experiment.meta
  scripts/language-pack/build_lp.sh
  scripts/language-pack/README.template
  scripts/misc/canonical_path
  scripts/misc/iso639
  scripts/preparation/detokenize.pl
  scripts/preparation/lowercase.pl
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.ca
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.cs
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.de
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.el
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.en
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.es
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.fr
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.hu
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.is
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.it
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.lv
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.nl
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.pl
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.pt
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.ro
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.ru
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.sk
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.sl
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.sv
  scripts/preparation/normalize.pl
  scripts/preparation/tokenize.pl
  scripts/support/bbn2plf.pl
  scripts/support/extract-1best
  scripts/support/grammar-packer.pl
  scripts/support/moses2joshua.pl
  scripts/support/moses2joshua_grammar.pl
  scripts/support/phrase2hiero.py
  scripts/support/score-hypothesis.pl
  scripts/support/split2files
  scripts/training/add-OOVs.pl
  scripts/training/build-vocab.pl
  scripts/training/cachepipe/bashrc
  scripts/training/cachepipe/CachePipe.pm
  scripts/training/filter-empty-lines.pl
  scripts/training/filter-rules.pl
  scripts/training/get_grammar_features.pl
  scripts/training/lowercase-leaves.pl
  scripts/training/mira/feature_label_munger.pl
  scripts/training/mira/run-mira.pl
  scripts/training/paralign.pl
  scripts/training/parallelize/LocalConfig.pm
  scripts/training/parallelize/Makefile
  scripts/training/parallelize/parallelize.pl
  scripts/training/parallelize/sentclient.c
  scripts/training/parallelize/sentserver.c
  scripts/training/parallelize/sentserver.h
  scripts/training/paste
  scripts/training/run-giza.pl
  scripts/training/scat
  scripts/training/summarize.pl
  scripts/training/templates/alignment/jacana/resources/model/tagdict
  scripts/training/templates/alignment/word-align.conf
  scripts/training/templates/glue-grammar
  scripts/training/templates/glue-grammar.itg
  scripts/training/templates/hadoop/core-site.xml
  scripts/training/templates/hadoop/hdfs-site.xml
  scripts/training/templates/hadoop/mapred-site.xml
  scripts/training/templates/hadoop/masters
  scripts/training/templates/hadoop/slaves
  scripts/training/templates/thrax-hiero.conf
  scripts/training/templates/thrax-phrasal.conf
  scripts/training/templates/thrax-phrase-gt.conf
  scripts/training/templates/thrax-phrase.conf
  scripts/training/templates/thrax-samt.conf
  scripts/training/templates/tune/decoder_command
  scripts/training/templates/tune/decoder_command.qsub
  scripts/training/templates/tune/joshua.config
  scripts/training/TODO
  scripts/training/trim_parallel_corpus.pl
  scripts/training/unmap-html.pl
{code}

> Joshua 6.1 Release Management
> -----------------------------
>
>                 Key: JOSHUA-323
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-323
>             Project: Joshua
>          Issue Type: Task
>          Components: build, release
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Blocker
>             Fix For: 6.1
>
>
> This is a
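
For anyone wanting a quick local approximation of the report above before re-running the real audit, here is a rough sketch in Python. It is not Apache RAT and makes no attempt to reproduce its rules: it only assumes that a compliant file carries the phrase "Licensed to the Apache Software Foundation" somewhere in its first 30 lines. The function names and the 30-line window are made up for illustration.

```python
import os

# Assumption: the standard ASF source header contains this phrase.
ASF_MARKER = "Licensed to the Apache Software Foundation"


def has_license_header(path, max_lines=30):
    """Return True if ASF_MARKER appears within the first max_lines lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for _, line in zip(range(max_lines), handle):
            if ASF_MARKER in line:
                return True
    return False


def files_without_header(root):
    """Walk root and return the sorted relative paths lacking the marker."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not has_license_header(path):
                missing.append(os.path.relpath(path, root))
    return sorted(missing)
```

Calling `files_without_header("scripts")` from the top of a checkout would give a list to compare against the RAT output; files that RAT excludes by configuration, or that use a different approved license, will show up as false positives here.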
[jira] [Updated] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
[ https://issues.apache.org/jira/browse/JOSHUA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated JOSHUA-317:
---------------------------------------
    Fix Version/s:     (was: 6.1)
                   6.2

> SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
> ---------------------------------------------------------------------
>
>                 Key: JOSHUA-317
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-317
>             Project: Joshua
>          Issue Type: Bug
>          Components: tuner
>    Affects Versions: 6.0.5
>         Environment: Python 3.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 6.2
>
>
> {code}
> [tune-bundle] rebuilding...
>   dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config /usr/local/joshua_resources/russian_experiments/exp3/tune/model --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file /usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm" -tm0/type hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm /usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm /usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue
>   took 0 seconds (0s)
> [mert-1] rebuilding...
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config [CHANGED]
>   dep=tune/model/grammar.packed/slice_0.source [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner mert --decoder /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command --decoder-config /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config --decoder-output-file /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest --decoder-log-file /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log --iterations 10 --metric 'BLEU 4 closest'
>   JOB FAILED (return code 1)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391
>     'ITERATIONS': `iterations`,
>                   ^
> SyntaxError: invalid syntax
> {code}
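
For context on the traceback above: backticks around an expression were Python 2 shorthand for `repr()`, and Python 3 removed that syntax, which is why Python 3.5 rejects line 391. The sketch below shows the likely shape of a fix, not the actual patch. The `params` dictionary is a hypothetical stand-in for whatever `run_tuner.py` builds at that line, and `compiles` is a helper invented here to demonstrate the difference.

```python
def compiles(source):
    """Return True if source is syntactically valid for this interpreter."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False


# The Python 2 spelling from line 391, and a spelling valid in both 2 and 3.
OLD_LINE = "params = {'ITERATIONS': `iterations`}"
NEW_LINE = "params = {'ITERATIONS': repr(iterations)}"

iterations = 10  # matches the --iterations 10 flag in the log

# Hypothetical stand-in for the dictionary built in run_tuner.py.
params = {
    'ITERATIONS': repr(iterations),  # was: `iterations`
}

if __name__ == "__main__":
    print("old form compiles:", compiles(OLD_LINE))
    print("new form compiles:", compiles(NEW_LINE))
    print(params)
```

Since `repr()` behaves the same on Python 2 and 3 for an integer, this kind of change should not alter the values substituted into the tuner templates, though any other backtick uses in the script would need the same treatment.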
[jira] [Created] (JOSHUA-323) Joshua 6.1 Release Management
Lewis John McGibbney created JOSHUA-323:
-------------------------------------------

             Summary: Joshua 6.1 Release Management
                 Key: JOSHUA-323
                 URL: https://issues.apache.org/jira/browse/JOSHUA-323
             Project: Joshua
          Issue Type: Task
          Components: release, build
            Reporter: Lewis John McGibbney
            Assignee: Lewis John McGibbney
            Priority: Blocker
             Fix For: 6.1

This is a governing ticket for reference more than anything else. We need to add all release specific build additions to parent pom.xml which enable us to roll a release candidate.
The process is also being documented over at https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure
Re: Lewis Volunteering for 6.1 Release Manager
Just landing back in the States from Berlin. This sounds great, Lewis!

matt (from my phone)

> On 10 Nov 2016, at 12:02, lewis john mcgibbney wrote:
>
> Hi Folks,
> I would like to put myself forward as release manager for 6.1.
> I've got a lot of experience working with Incubating releases and have been
> successful in the position of release manager resulting in the release of
> around 20-30 official incubating and top level projects here at Apache.
> I'll make sure to document the entire release procedure on our wiki for
> future reference.
> Does anyone object? If not then I will get to it today.
> Lewis
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
Re: Lewis Volunteering for 6.1 Release Manager
You rock, Lewis. I'll be sure to test it and am eager!

++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-502
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++

On 11/10/16, 9:02 AM, "lewis john mcgibbney" wrote:

    Hi Folks,
    I would like to put myself forward as release manager for 6.1.
    I've got a lot of experience working with Incubating releases and have been
    successful in the position of release manager resulting in the release of
    around 20-30 official incubating and top level projects here at Apache.
    I'll make sure to document the entire release procedure on our wiki for
    future reference.
    Does anyone object? If not then I will get to it today.
    Lewis

    --
    http://home.apache.org/~lewismc/
    @hectorMcSpector
    http://www.linkedin.com/in/lmcgibbney
Re: Joshua Model Input Format(s) and LM Loading
I've done a fair amount of measurement and profiling in this area over the
last year, so I can offer a little bit of advice as well.

First of all, make sure you're not relying on a Java profiler alone if you're
looking at a model with KenLM. The performance impact of language model calls
will be under-reported. If you want to optimize LM calls using a Java
profiler, I'd recommend using a BerkeleyLM model and measuring the reduction
in LM calls there. If you can reduce the amount of work (LM calls) with
BerkeleyLM, the speed improvements should carry over to KenLM as well
(provided, of course, that the optimizations are in the Joshua code).

If you want to optimize KenLM models you're better off using some combination
of a native profiler and a JVM profiler. Even then I'd consider the impact of
the calls to the LM as under-reported. This is because making frequent JNI
calls amplifies the amount of work required in GC (I think we've discussed
that before).

Another idea would be to call KenLM through an RPC framework (like gRPC).
It'll likely be slower, but then you could measure the Java process with a
Java profiler (without ill effects on GC). You could also then independently
measure the KenLM process with a native profiler. This might give you a
fairly accurate view of what to optimize.

-Kellen

On Wed, Oct 26, 2016 at 9:34 AM, lewis john mcgibbney wrote:

> I hear ye loud and clear Matt :) Thank you for the response.
>
> On Wed, Oct 26, 2016 at 12:30 AM, <dev-digest-h...@joshua.incubator.apache.org> wrote:
>
> > From: Matt Post
> > To: dev@joshua.incubator.apache.org
> > Cc:
> > Date: Tue, 25 Oct 2016 08:49:19 -0400
> > Subject: Re: Joshua Model Input Format(s) and LM Loading
> > Hi Lewis,
> >
> > Joshua supports two language model representation packages: KenLM [0] and
> > BerkeleyLM [1]. These were both developed at about the same time, and
> > represented huge gains in doing this task efficiently, over what had
> > previously been the standard approach (SRILM). Ken Heafield (who has
> > contributed a lot to Joshua) went on to contribute a lot of other
> > improvements to language model representation, decoder integration, and
> > also the actual construction of language models and their efficient
> > interpolation. His goal for a while was to make SRILM completely
> > unnecessary, and I think he succeeded.
> >
> > BerkeleyLM was more of a one-off project. It is slower than KenLM and
> > hasn't been touched in years. If you want to understand, your efforts are
> > probably best spent looking into KenLM papers. But it's also worth noting
> > that Ken is a crack C++ programmer who has spent years hacking away on
> > these problems, and your chances of finding any further efficiencies there
> > are probably quite limited unless you have a lot of background in the area.
> > But even if you did, I would recommend you not spend your time that way; I
> > basically consider the LM representation problem to have been solved by
> > KenLM. That's not to say that there are no improvements to be had on the
> > Joshua / JNI bridge, but even there, there are probably better things to do.
> >
> > matt
> >
> > [0] KenLM: Faster and Smaller Language Model Queries
> > http://www.kheafield.com/professional/avenue/kenlm.pdf
> >
> > [1] Faster and Smaller N-Gram Language Models
> > http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf
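
To make Kellen's point about counting LM calls concrete, here is a small illustrative sketch in Python. It is not Joshua, KenLM or BerkeleyLM code: `CountingLM`, `toy_score` and the cache size are all invented for the example, and the only idea it carries is that a counter around the scoring call gives a backend-independent measure of how much LM work the decoder requests, which a cache (or any other optimization on the calling side) should bring down.

```python
from functools import lru_cache


class CountingLM:
    """Wrap a scoring function, cache results, and count real invocations."""

    def __init__(self, score_fn, cache_size=4096):
        self.calls = 0  # number of times score_fn was actually executed

        def counted(ngram):
            self.calls += 1
            return score_fn(ngram)

        # Repeated n-grams are answered from the cache and not counted again.
        self._score = lru_cache(maxsize=cache_size)(counted)

    def score(self, ngram):
        # Tuples are hashable, so they can serve as cache keys.
        return self._score(tuple(ngram))


def toy_score(ngram):
    """Hypothetical stand-in for a real LM query; longer n-grams score lower."""
    return -float(len(ngram))


if __name__ == "__main__":
    lm = CountingLM(toy_score)
    for query in (["the", "cat"], ["the", "cat"], ["sat"]):
        lm.score(query)
    print("underlying LM calls:", lm.calls)
```

The count is the quantity to watch: if a change on the decoder side lowers it with one backend, the same saving in requested work should apply whichever LM sits behind the call, even though the per-call cost differs.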