[jira] [Commented] (JOSHUA-304) word-align.conf alignment template file not compatible with berkeley aligner

2016-08-23 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434164#comment-15434164
 ] 

Lewis John McGibbney commented on JOSHUA-304:
-

To get past the exceptions reported in this issue, I ended up with the 
following template:
{code}
## word-align.conf
## --
## This is an example training script for the Berkeley
## word aligner.  In this configuration it uses two HMM
## alignment models trained jointly and then decoded 
## using the competitive thresholding heuristic.

##
# Training: Defines the training regimen 
##

forwardModels   HMM
reverseModels   HMM
mode    JOINT
iters   5

###
# Execution: Controls output and program flow 
###

execDir alignments/0
create
saveParams  false
numThreads  1
msPerLine   1
alignTraining

#
# Language/Data 
#

foreignSuffix   es.0
englishSuffix   en.0

# Choose the training sources, which can either be directories or files that list files/directories
trainSources    /usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/data/train/splits/corpus
sentences   MAX
testSources /dev/null
overwriteExecDir true

#
# 1-best output 
#

competitiveThresholding

{code}
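
A small pre-flight check along these lines could catch a template that puts more than one value on a single-valued option before the aligner is launched. This is only a sketch: the class name WordAlignConfCheck and the set of single-valued options are assumptions drawn from the errors quoted in this issue (the sample values 'MODEL1 HMM', 'JOINT JOINT' and '5 5' come from those messages, and pairing them with forwardModels, mode and iters is my guess). None of it is part of Joshua or the Berkeley aligner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Joshua or the Berkeley aligner.
final class WordAlignConfCheck {

    // Assumed to accept exactly one value each, based on the errors in this issue.
    static final Set<String> SINGLE_VALUED =
            Set.of("forwardModels", "reverseModels", "mode", "iters");

    // Returns the names of single-valued options whose value contains whitespace.
    static List<String> findMultiValued(List<String> lines) {
        List<String> bad = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // Split into the option name and the rest of the line.
            String[] parts = line.split("\\s+", 2);
            if (parts.length == 2
                    && SINGLE_VALUED.contains(parts[0])
                    && parts[1].trim().split("\\s+").length > 1) {
                bad.add(parts[0]);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> template = List.of(
                "## word-align.conf",
                "forwardModels   MODEL1 HMM",
                "reverseModels   HMM",
                "mode    JOINT JOINT",
                "iters   5 5");
        System.out.println(findMultiValued(template)); // prints [forwardModels, mode, iters]
    }
}
```

Bare flags such as create or alignTraining have no value and pass through untouched.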

> word-align.conf alignment template file not compatible with berkeley aligner
> 
>
> Key: JOSHUA-304
> URL: https://issues.apache.org/jira/browse/JOSHUA-304
> Project: Joshua
>  Issue Type: Bug
>  Components: alignment, berkeley, templates
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Priority: Blocker
> Fix For: 6.1
>
>
> It took me quite some time to debug why pipelines were failing when using 
> the Berkeley aligner.
> It turns out that the word-align.conf template provided at
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf
> is not compatible with the Berkeley aligner. 
> In particular, the following lines are incompatible:
> https://github.com/apache/incubator-joshua/blob/master/scripts/training/templates/alignment/word-align.conf#L12-L15
> Evidence of this is provided below:
> {code}
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1, HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'MODEL1 HMM'; valid choices: MODEL1|MODEL2|HMM|SYNTACTIC|NONE
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Invalid enum: 'JOINT JOINT'; valid choices: FORWARD|REVERSE|BOTH_INDEP|JOINT
> lmcgibbn@LMC-032857 /usr/local/incubator-joshua/lib(master) $ java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/6/alignments/0/word-align.conf
> Exception in thread "main" java.lang.NumberFormatException: For input string: "5 5"
>   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:143)
>   at edu.berkeley.nlp.fig.basic.OptInfo.interpretValue(OptionsParser.java:240)
>   at edu.berkeley.nlp.fig.basic.OptInfo.set(OptionsParser.java:294)
>   at edu.berkeley.nlp.fig.basic.OptionsParser.readOptionsFile(OptionsParser.java:555)
>   at edu.berkeley.nlp.fig.basic.OptionsParser.doParse(OptionsParser.java:604)
>   at 

Re: Sentence mismatch error when attempting to regenerate Fisher Spanish CALLHOME corpus

2016-08-23 Thread lewis john mcgibbney
This is as far as I've got. If possible, I'd appreciate it if we could move
the conversation over to the Jira ticket:
https://issues.apache.org/jira/browse/JOSHUA-304
Lewis


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Sentence mismatch error when attempting to regenerate Fisher Spanish CALLHOME corpus

2016-08-23 Thread lewis john mcgibbney
Hi dev@,
I ran into a bit of bother whilst attempting to complete the example at [0].
Joshua master is installed correctly.
The problem I am having is almost exactly the one described at [1].

I attempted to build the model using the following parameters:

$JOSHUA/bin/pipeline.pl --type hiero --rundir 1 \
  --readme "Baseline Hiero run" --source es --target en --witten-bell \
  --corpus $SPANISH/corpus/asr/callhome_train \
  --corpus $SPANISH/corpus/asr/fisher_train \
  --tune $SPANISH/corpus/asr/fisher_dev \
  --test $SPANISH/corpus/asr/callhome_devtest --lm-order 3

The initial stages of the pipeline run and complete successfully, with the
following output:

[source-numlines] retrieved cached result =>   151810

However, when the pipeline progresses to alignment with GIZA, the generated
log shows a fatal error that I have never seen before [1].
As you can see, there are many sentence mismatch errors in the final
alignment phase, with the following log output:

ERROR: Can't generate symmetrized alignment file

I then tried changing the aligner to berkeley (and the language model to
berkeleylm), as suggested in [1] and in some advice Matt gave in a more
recent thread, as follows:

$JOSHUA/bin/pipeline.pl --type hiero --rundir 3 \
  --readme "Baseline Hiero run 3" --source es --target en \
  --lm-gen berkeleylm --lm berkeleylm --aligner berkeley \
  --corpus $SPANISH/corpus/asr/callhome_train \
  --corpus $SPANISH/corpus/asr/fisher_train \
  --tune $SPANISH/corpus/asr/fisher_dev \
  --test $SPANISH/corpus/asr/callhome_devtest --lm-order 3

However, this results in the following output in the early stages of the
pipeline:

[source-numlines] retrieved cached result =>   151810
[berkeley-aligner-chunk-0] rebuilding...
  dep=alignments/0/word-align.conf [CHANGED]
  dep=/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/4/data/train/splits/corpus.es.0 [CHANGED]
  dep=/usr/local/incubator-joshua/experiments/fisher_callhome_experiment/4/data/train/splits/corpus.en.0 [CHANGED]
  dep=alignments/0/training.align [NOT FOUND]
  cmd=java -d64 -Xmx10g -jar /usr/local/incubator-joshua/lib/berkeleyaligner.jar ++alignments/0/word-align.conf
  JOB FAILED (return code 1)
[aligner-combine] rebuilding...
  dep=alignments/0/training.align [NOT FOUND]
  dep=alignments/training.align [NOT FOUND]
  cmd=cat alignments/0/training.align > alignments/training.align
  JOB FAILED (return code 1)
cat: alignments/0/training.align: No such file or directory

It turns out, of course, that '++alignments/0/word-align.conf' is not
present. I am looking for that bug in the codebase right now and will
try to submit a PR.

Lewis

[0] https://github.com/apache/incubator-joshua/tree/master/examples#building-a-spanishenglish-translation-model-using-the-fisher-spanish-callhome-corpus
[1] https://groups.google.com/forum/#!topic/joshua_support/CvNjIRboixc
[2] https://paste.apache.org/wjm9

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Build failed in Jenkins: joshua_master #96

2016-08-23 Thread Apache Jenkins Server
See 

Changes:

[lewis.mcgibbney] Update examples README formatting and links.

[lewis.mcgibbney] Update examples README pipeline invocation parameters

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-us1 (Ubuntu golang-ppa ubuntu-us ubuntu) in 
workspace 
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/incubator-joshua.git # timeout=10
Fetching upstream changes from 
https://git-wip-us.apache.org/repos/asf/incubator-joshua.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/incubator-joshua.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
Checking out Revision 0744ebf56906dbe70292737cd50a39652407869d 
(refs/remotes/origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0744ebf56906dbe70292737cd50a39652407869d
 > git rev-list ff410c297a149400db3cb553b11a930ad01dc7ed # timeout=10
[joshua_master] $ /home/jenkins/tools/maven/latest3/bin/mvn clean install 
javadoc:aggregate
Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared memory 
file:
   26586
Try using the -Djava.io.tmpdir= option to select an alternate temp location.

[INFO] Scanning for projects...
[INFO] 
[INFO] 
[INFO] Building Apache Joshua Machine Translation Toolkit 6.1-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ joshua ---
[INFO] Deleting 
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.2.1:process (default) @ joshua ---
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ joshua ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ joshua ---
[INFO] Compiling 266 source files to 

[INFO] 
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ 
joshua ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 349 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ 
joshua ---
[INFO] Compiling 38 source files to 

[INFO] 
[INFO] --- maven-surefire-plugin:2.19.1:test (default-test) @ joshua ---

---
 T E S T S
---
Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared memory 
file:
   27154
Try using the -Djava.io.tmpdir= option to select an alternate temp location.
Running TestSuite
102030405060708090.100%
ERROR - Can't find libken.so (libken.dylib on OS X) on the Java library path.
tm_pt_0=-2.000 tm_glue_0=3.000 lm_0=-206.718 lm_0_oov=2.000 OOVPenalty=-200.000 
| -198.000
ERROR - Can't find libken.so (libken.dylib on OS X) on the Java library path.
ERROR - Can't find libken.so (libken.dylib on OS X) on the Java library path.
ERROR - Can't find libken.so (libken.dylib on OS X) on the Java library path.
Tests run: 48, Failures: 1, Errors: 0, Skipped: 9, Time elapsed: 4.219 sec <<< 
FAILURE! - in TestSuite
externalizeVocabulary(org.apache.joshua.util.io.BinaryTest)  Time elapsed: 0.01 
sec  <<< FAILURE!
java.io.IOException: No space left on device
at 
org.apache.joshua.util.io.BinaryTest.externalizeVocabulary(BinaryTest.java:56)


Results :

Failed tests: 
  BinaryTest.externalizeVocabulary:56 » IO No space left on device

Tests run: 46, Failures: 1, Errors: 0, Skipped: 7

[INFO] 
[INFO] 
[INFO] Skipping Apache Joshua Machine Translation Toolkit
[INFO] This project has been banned from the build due to previous failures.
[INFO] 
[INFO] 
[INFO] BUILD FAILURE
[INFO] 

[GitHub] incubator-joshua pull request #:

2016-08-23 Thread mjpost
Github user mjpost commented on the pull request:


https://github.com/apache/incubator-joshua/commit/d4fdbfd88bab99e244d3ed1fc9cff4ba5e6d124c#commitcomment-18742703
  
The whole static global feature index is a mess. You've gotten rid of the 
dense feature indices on the 7 branch. What if we get rid of the global state 
index as well? We could fix this in initializeFeatureFunctions, with the 
following test:

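    int stateIndex = 0;  // declared once, outside whatever loop adds the features
    ...
    this.featureFunctions.add(feature);
    // give each stateful feature the next index, local to this instance
    if (feature instanceof StatefulFF) {
      ((StatefulFF) feature).setStateIndex(stateIndex);
      stateIndex++;
    }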
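
To make the idea concrete, here is a minimal self-contained sketch of a per-instance state counter replacing a static global one. The classes below are simplified stand-ins I made up for illustration (StateIndexSketch, the nested FeatureFunction and StatefulFF, addFeature, getStateIndex); they are not Joshua's actual classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only; not Joshua's real configuration or feature classes.
final class StateIndexSketch {

    static class FeatureFunction {
        final String name;
        FeatureFunction(String name) { this.name = name; }
    }

    static class StatefulFF extends FeatureFunction {
        private int stateIndex = -1; // -1 means "not assigned yet"
        StatefulFF(String name) { super(name); }
        void setStateIndex(int index) { this.stateIndex = index; }
        int getStateIndex() { return stateIndex; }
    }

    private final List<FeatureFunction> featureFunctions = new ArrayList<>();
    private int nextStateIndex = 0; // instance field, so there is no global state

    // Registers a feature and, if it is stateful, assigns the next local index.
    void addFeature(FeatureFunction feature) {
        featureFunctions.add(feature);
        if (feature instanceof StatefulFF) {
            ((StatefulFF) feature).setStateIndex(nextStateIndex);
            nextStateIndex++;
        }
    }

    public static void main(String[] args) {
        StateIndexSketch first = new StateIndexSketch();
        StatefulFF lmA = new StatefulFF("lm_0");
        first.addFeature(new FeatureFunction("tm_pt_0"));
        first.addFeature(lmA);

        // A second instance starts counting from zero again.
        StateIndexSketch second = new StateIndexSketch();
        StatefulFF lmB = new StatefulFF("lm_0");
        second.addFeature(lmB);

        System.out.println(lmA.getStateIndex() + " " + lmB.getStateIndex()); // prints "0 0"
    }
}
```

Because the counter lives on the instance, two configurations built in the same JVM (for example in separate tests) would not see each other's indices.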

Thoughts?

