[jira] [Commented] (JOSHUA-323) Joshua 6.1 Release Management

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656191#comment-15656191
 ] 

ASF GitHub Bot commented on JOSHUA-323:
---

GitHub user lewismc opened a pull request:

https://github.com/apache/incubator-joshua/pull/76

Release

This issue addresses https://issues.apache.org/jira/browse/JOSHUA-323

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/incubator-joshua release

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-joshua/pull/76.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #76


commit f45c3fd39ae74604a40046a97b98bb3154ca5ba7
Author: Lewis John McGibbney 
Date:   2016-11-11T04:32:22Z

JOSHUA-323 Joshua 6.1 Release Management

commit 7f2ef263cbaee94ba82cff4e8145a288180a3afa
Author: Lewis John McGibbney 
Date:   2016-11-11T04:53:47Z

JOSHUA-323 Joshua 6.1 Release Management

commit 10bfa31993a8859adc06b200fa173fb5f18e7919
Author: Lewis John McGibbney 
Date:   2016-11-11T04:54:59Z

JOSHUA-323 Joshua 6.1 Release Management

commit c733f45a020034bd05318a8dd31ff4d84aebe19a
Author: Lewis John McGibbney 
Date:   2016-11-11T04:59:23Z

JOSHUA-323 Joshua 6.1 Release Management

commit c2c59d8435ebb8d7d5dd8191ff2dec24352a4838
Author: Lewis John McGibbney 
Date:   2016-11-11T05:06:26Z

JOSHUA-323 Joshua 6.1 Release Management

commit eb994e311a0c7593ad1899df6153b65585760cc8
Author: Lewis John McGibbney 
Date:   2016-11-11T05:20:32Z

JOSHUA-323 Joshua 6.1 Release Management




> Joshua 6.1 Release Management
> -
>
>        Key: JOSHUA-323
>        URL: https://issues.apache.org/jira/browse/JOSHUA-323
>    Project: Joshua
> Issue Type: Task
> Components: build, release
>   Reporter: Lewis John McGibbney
>   Assignee: Lewis John McGibbney
>   Priority: Blocker
>    Fix For: 6.1
>
>
> This is a governing ticket for reference more than anything else. We need to
> add all release-specific build additions to the parent pom.xml that enable us
> to roll a release candidate.
> The process is also being documented at
> https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (JOSHUA-323) Joshua 6.1 Release Management

2016-11-10 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656193#comment-15656193
 ] 

Lewis John McGibbney commented on JOSHUA-323:
-

Progress is going well. The RAT license headers are taking a wee while, but I
will have them cracked by tomorrow. The following files are outstanding.
Progress can be tracked at
https://github.com/apache/incubator-joshua/pull/76
{code}
Files with unapproved licenses:

  scripts/analysis/sentence-by-sentence.pl
  scripts/analysis/tree_visualizer
  scripts/copy-config.pl
  scripts/distributedLM/config.template
  scripts/distributedLM/create_remote_sym_tbl.pl
  scripts/distributedLM/filter_lm.pl
  scripts/distributedLM/get_grammar_eng_voc.pl
  scripts/distributedLM/get_grammar_eng_voc_from_cn_voc.pl
  scripts/distributedLM/global_symol_list
  scripts/distributedLM/lm.list.withweights
  scripts/ems/config.ghkm
  scripts/ems/config.hiero
  scripts/ems/config.phrase
  scripts/ems/experiment.meta
  scripts/language-pack/build_lp.sh
  scripts/language-pack/README.template
  scripts/misc/canonical_path
  scripts/misc/iso639
  scripts/preparation/detokenize.pl
  scripts/preparation/lowercase.pl
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.ca
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.cs
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.de
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.el
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.en
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.es
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.fr
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.hu
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.is
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.it
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.lv
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.nl
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.pl
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.pt
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.ro
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.ru
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.sk
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.sl
  scripts/preparation/nonbreaking_prefixes/nonbreaking_prefix.sv
  scripts/preparation/normalize.pl
  scripts/preparation/tokenize.pl
  scripts/support/bbn2plf.pl
  scripts/support/extract-1best
  scripts/support/grammar-packer.pl
  scripts/support/moses2joshua.pl
  scripts/support/moses2joshua_grammar.pl
  scripts/support/phrase2hiero.py
  scripts/support/score-hypothesis.pl
  scripts/support/split2files
  scripts/training/add-OOVs.pl
  scripts/training/build-vocab.pl
  scripts/training/cachepipe/bashrc
  scripts/training/cachepipe/CachePipe.pm
  scripts/training/filter-empty-lines.pl
  scripts/training/filter-rules.pl
  scripts/training/get_grammar_features.pl
  scripts/training/lowercase-leaves.pl
  scripts/training/mira/feature_label_munger.pl
  scripts/training/mira/run-mira.pl
  scripts/training/paralign.pl
  scripts/training/parallelize/LocalConfig.pm
  scripts/training/parallelize/Makefile
  scripts/training/parallelize/parallelize.pl
  scripts/training/parallelize/sentclient.c
  scripts/training/parallelize/sentserver.c
  scripts/training/parallelize/sentserver.h
  scripts/training/paste
  scripts/training/run-giza.pl
  scripts/training/scat
  scripts/training/summarize.pl
  scripts/training/templates/alignment/jacana/resources/model/tagdict
  scripts/training/templates/alignment/word-align.conf
  scripts/training/templates/glue-grammar
  scripts/training/templates/glue-grammar.itg
  scripts/training/templates/hadoop/core-site.xml
  scripts/training/templates/hadoop/hdfs-site.xml
  scripts/training/templates/hadoop/mapred-site.xml
  scripts/training/templates/hadoop/masters
  scripts/training/templates/hadoop/slaves
  scripts/training/templates/thrax-hiero.conf
  scripts/training/templates/thrax-phrasal.conf
  scripts/training/templates/thrax-phrase-gt.conf
  scripts/training/templates/thrax-phrase.conf
  scripts/training/templates/thrax-samt.conf
  scripts/training/templates/tune/decoder_command
  scripts/training/templates/tune/decoder_command.qsub
  scripts/training/templates/tune/joshua.config
  scripts/training/TODO
  scripts/training/trim_parallel_corpus.pl
  scripts/training/unmap-html.pl
{code}
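
As a rough illustration of how the outstanding headers could be added in bulk, here is a minimal Python sketch. It is an assumption-laden example, not the script used for the release: the helper names (`comment_block`, `add_header`) are hypothetical, and it only handles files whose comment syntax is a line prefix such as `#`. Templates and data files in the list above may be better handled through RAT excludes than through headers.

```python
# Sketch only: prepend the standard ASF source header to '#'-comment files.
# Helper names are hypothetical; adjust the comment prefix per file type.

ASF_HEADER = """\
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
"""

MARKER = "Licensed to the Apache Software Foundation"


def comment_block(prefix="#"):
    """Return the header with every line turned into a comment line."""
    lines = [(prefix + " " + line).rstrip() for line in ASF_HEADER.splitlines()]
    return "\n".join(lines) + "\n"


def add_header(source, prefix="#"):
    """Return source with the header added, keeping any shebang line first."""
    if MARKER in source:
        return source  # already licensed, leave untouched
    header = comment_block(prefix)
    if source.startswith("#!"):
        shebang, _, rest = source.partition("\n")
        return shebang + "\n" + header + "\n" + rest
    return header + "\n" + source


if __name__ == "__main__":
    print(add_header("#!/usr/bin/perl\nprint 1;\n"))
```

Working on strings rather than on files keeps the sketch easy to check; a real run would read each path, call `add_header`, and write the result back only if it changed.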


[jira] [Updated] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391

2016-11-10 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-317:

Fix Version/s: 6.2 (was: 6.1)

> SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
> 
>
>              Key: JOSHUA-317
>              URL: https://issues.apache.org/jira/browse/JOSHUA-317
>          Project: Joshua
>       Issue Type: Bug
>       Components: tuner
> Affects Versions: 6.0.5
>      Environment: Python 3.5
>         Reporter: Lewis John McGibbney
>         Assignee: Lewis John McGibbney
>          Fix For: 6.2
>
>
> {code}
> [tune-bundle] rebuilding...
>   dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config /usr/local/joshua_resources/russian_experiments/exp3/tune/model --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file /usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm"  -tm0/type hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm /usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm /usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue
>   took 0 seconds (0s)
> [mert-1] rebuilding...
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config [CHANGED]
>   dep=tune/model/grammar.packed/slice_0.source [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner mert --decoder /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command --decoder-config /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config --decoder-output-file /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest --decoder-log-file /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log --iterations 10 --metric 'BLEU 4 closest'
>   JOB FAILED (return code 1)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391
>     'ITERATIONS': `iterations`,
>                   ^
> SyntaxError: invalid syntax
> {code}
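
For reference, the traceback above is what Python 3 reports for backtick expressions: in Python 2, `x` in backticks was shorthand for repr(x), and Python 3 removed that syntax. A minimal sketch of the difference follows. The `params` dictionary and the `compiles` helper are hypothetical stand-ins for illustration, not the actual code in run_tuner.py.

```python
# Sketch only: shows why backticks fail on Python 3 and the portable spelling.

def compiles(source):
    """Return True if the running interpreter accepts this source text."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


iterations = 10

# Python 2 only:   'ITERATIONS': `iterations`,
# Works on both Python 2 and Python 3:
params = {
    'ITERATIONS': repr(iterations),
}

if __name__ == "__main__":
    print(compiles("x = `1`"))     # rejected by a Python 3 interpreter
    print(compiles("x = repr(1)"))
    print(params)
```

Replacing each backtick expression with `repr(...)` (or `str(...)` where a plain string is wanted) keeps the script usable under both interpreter lines.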





[jira] [Created] (JOSHUA-323) Joshua 6.1 Release Management

2016-11-10 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created JOSHUA-323:
---

   Summary: Joshua 6.1 Release Management
       Key: JOSHUA-323
       URL: https://issues.apache.org/jira/browse/JOSHUA-323
   Project: Joshua
Issue Type: Task
Components: release, build
  Reporter: Lewis John McGibbney
  Assignee: Lewis John McGibbney
  Priority: Blocker
   Fix For: 6.1


This is a governing ticket for reference more than anything else. We need to
add all release-specific build additions to the parent pom.xml that enable us
to roll a release candidate.
The process is also being documented at
https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure





Re: Lewis Volunteering for 6.1 Release Manager

2016-11-10 Thread Matt Post
Just landing back in the States from Berlin. This sounds great, Lewis!

matt (from my phone)

> On 10 Nov 2016, at 12:02, lewis john mcgibbney wrote:
> 
> Hi Folks,
> I would like to put myself forward as release manager for 6.1.
> I've got a lot of experience working with Incubating releases and have been
> successful in the position of release manager, resulting in the release of
> around 20-30 official incubating and top-level projects here at Apache.
> I'll make sure to document the entire release procedure on our wiki for
> future reference.
> Does anyone object? If not then I will get to it today.
> Lewis
> 
> -- 
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney



Re: Lewis Volunteering for 6.1 Release Manager

2016-11-10 Thread Mattmann, Chris A (3010)
You rock, Lewis. I’ll be sure to test it and am eager to!

++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-502
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++
 

On 11/10/16, 9:02 AM, "lewis john mcgibbney" wrote:

> Hi Folks,
> I would like to put myself forward as release manager for 6.1.
> I've got a lot of experience working with Incubating releases and have been
> successful in the position of release manager, resulting in the release of
> around 20-30 official incubating and top-level projects here at Apache.
> I'll make sure to document the entire release procedure on our wiki for
> future reference.
> Does anyone object? If not then I will get to it today.
> Lewis
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney




Re: Joshua Model Input Format(s) and LM Loading

2016-11-10 Thread kellen sunderland
I've done a fair amount of measurement and profiling in this area over the
last year, so I can offer a little bit of advice as well.

First of all, make sure you're not just using a Java profiler if you're
looking at a model with KenLM.  The perf impact of language model calls
will be under-reported.  If you want to optimize LM calls using a Java
profiler, I'd recommend using a BerkeleyLM model and measuring the reduction
in LM calls there.  If you can reduce the amount of work (LM calls) sent to
BerkeleyLM, the speed improvements should carry over to KenLM as well
(provided, of course, that the optimizations are in the Joshua code).

If you want to optimize KenLM models you're better off using some
combination of a native profiler and a JVM profiler.  Even then I'd
consider the impact of the calls to the LM as under-reported.  This is
because making frequent JNI calls amplifies the amount of work required in
GC (I think we've discussed that before).

Another idea would be to call KenLM through an RPC framework (like gRPC).
It'll likely be slower, but then you could measure the Java process with a
Java profiler (without ill effects on GC).  You could also then
independently measure the KenLM process with a native profiler.  This might
give you a fairly accurate view of what to optimize.
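
To make the "count the LM calls" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `toy_score` stands in for a real language model query, and `CountingLM` is a hypothetical wrapper, not part of Joshua, KenLM or BerkeleyLM. Caching is used only as an example of an optimization whose effect on the call count can be measured.

```python
# Sketch only: count how many queries reach a (stand-in) language model,
# with and without a decoder-side optimization such as caching.
from functools import lru_cache


class CountingLM:
    """Wraps any callable score(ngram) and counts how often it is invoked."""

    def __init__(self, score_fn):
        self._score_fn = score_fn
        self.calls = 0

    def score(self, ngram):
        self.calls += 1
        return self._score_fn(ngram)


def toy_score(ngram):
    """Hypothetical scoring function; a real LM lookup would go here."""
    return -float(len(ngram))


def run(queries, cache=False):
    """Score all queries and return (total score, calls that reached the LM)."""
    lm = CountingLM(toy_score)
    score = lru_cache(maxsize=None)(lm.score) if cache else lm.score
    total = sum(score(q) for q in queries)
    return total, lm.calls


queries = [("the", "cat"), ("the", "cat"), ("cat", "sat"), ("the", "cat")]
baseline_total, baseline_calls = run(queries)
cached_total, cached_calls = run(queries, cache=True)

if __name__ == "__main__":
    print("baseline calls:", baseline_calls)
    print("cached calls:", cached_calls)
```

The useful check is that the total score is unchanged while the number of calls reaching the LM drops; a count like this does not depend on which LM backend sits behind the wrapper.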

-Kellen


On Wed, Oct 26, 2016 at 9:34 AM, lewis john mcgibbney 
wrote:

> I hear ye loud and clear Matt :) Thank you for the response.
>
> On Wed, Oct 26, 2016 at 12:30 AM, <
> dev-digest-h...@joshua.incubator.apache.org> wrote:
>
> >
> > From: Matt Post 
> > To: dev@joshua.incubator.apache.org
> > Cc:
> > Date: Tue, 25 Oct 2016 08:49:19 -0400
> > Subject: Re: Joshua Model Input Format(s) and LM Loading
> > Hi Lewis,
> >
> > Joshua supports two language model representation packages: KenLM [0] and
> > BerkeleyLM [1]. These were both developed at about the same time, and
> > represented huge gains in doing this task efficiently, over what had
> > previously been the standard approach (SRILM). Ken Heafield (who has
> > contributed a lot to Joshua) went on to contribute a lot of other
> > improvements to language model representation, decoder integration, and
> > also the actual construction of language models and their efficient
> > interpolation. His goal for a while was to make SRILM completely
> > unnecessary, and I think he succeeded.
> >
> > BerkeleyLM was more of a one-off project. It is slower than KenLM and
> > hasn't been touched in years. If you want to understand this area, your
> > efforts are probably best spent looking into the KenLM papers. But it's
> > also worth noting that Ken is a crack C++ programmer who has spent years
> > hacking away on these problems, and your chances of finding any further
> > efficiencies there are probably quite limited unless you have a lot of
> > background in the area. But even if you did, I would recommend you not
> > spend your time that way: I basically consider the LM representation
> > problem to have been solved by KenLM. That's not to say that there aren't
> > some improvements to be had on the Joshua / JNI bridge, but even there,
> > there are probably better things to do.
> >
> > matt
> >
> > [0] KenLM: Faster and Smaller Language Model Queries
> > http://www.kheafield.com/professional/avenue/kenlm.pdf
> >
> > [1] Faster and Smaller N-Gram Language Models
> > http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf
> >
> >
>