Yes, the phrase-based decoder supports lattice decoding:
http://www.statmt.org/moses/?n=Moses.WordLattices
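The input format is PLF (Python Lattice Format), described on that page: each input line is one lattice, written as a Python tuple of nodes, where each node is a tuple of outgoing edges. A minimal sketch (the words and probabilities are illustrative; the assignment is just so it reads as Python):

    # one lattice per input line; a node is a tuple of edges,
    # and an edge is (word, probability, nodes-to-skip-ahead)
    lattice = (
        (('these', 1.0, 1),),                # single outgoing edge
        (('are', 0.6, 1), ('is', 0.4, 1)),   # two alternative edges
        (('lattices', 1.0, 1),),
    )

The decoder is then run with lattice input enabled (something like moses -f moses.ini -inputtype 2, as the page describes). Note that standard lattice decoding finds the highest model-score path; extracting the highest-BLEU (oracle) path against a reference is a different computation.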
Hieu Hoang
http://moses-smt.org/
On 1 February 2017 at 23:10, Angli Liu wrote:
> Hi all,
>
> Is there a way to do lattice decoding with BLEU in Moses? I.e.,
Hi all,
Is there a way to do lattice decoding with BLEU in Moses? I.e., given a
word lattice, find the path that represents the highest BLEU score? If so,
what function to call and in what format should I feed a lattice in?
Thanks!
Angli
Hi - I trained a phrase-based system from a low-resource language to
English, and got *13.6633* as the BLEU score. However, when I tested on the
same dev set and computed BLEU against the English corpus in the dev set, I
only got *3.69*. Then I did a manual grid search over the parameter space
in
Hi Liang,
mteval-v13a.pl does some internal tokenization and probably splits those
"~~" words into " ~ ~ ". If this is happening,
it explains the difference in your calculated BLEU scores.
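A rough Python analogue of that tokenization rule (not the actual Perl, just to illustrate the effect on the "~~" words):

    import re

    def split_punct(text):
        # mteval-v13a.pl-style behaviour: put spaces around characters
        # that are not letters, digits, or whitespace, then retokenize
        return re.sub(r'([^\w\s])', r' \1 ', text).split()

    print(split_punct('house~~17'))   # ['house', '~', '~', '17']

Every such split changes the n-gram counts, so the two scorers are effectively evaluating different token sequences.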
Cheers,
Matthias
On Mon, 2016-01-18 at 17:01 +0800, 姚亮 wrote:
> Dear Moses Support Team,
>
>I
Dear Moses Support Team,
I added a source context-dependent translation feature to the Moses baseline
system.
In order to avoid modifying the source code, I append a unique identifier
to every word in the test/dev source file.
For example, a source file with two lines like the
-
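For anyone trying the same trick, a minimal sketch of the round trip (the "~~" separator follows this thread; the helper names are made up):

    def add_ids(line):
        # append a position identifier to every token, e.g. 'house' -> 'house~~3'
        return ' '.join(f'{tok}~~{i}' for i, tok in enumerate(line.split()))

    def strip_ids(line):
        # remove the identifiers again before scoring, so BLEU sees plain words
        return ' '.join(tok.split('~~', 1)[0] for tok in line.split())

    assert strip_ids(add_ids('the green house')) == 'the green house'

Stripping the identifiers from the decoder output before running mteval also sidesteps the tokenization issue discussed above.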
> From: tah...@precisiontranslationtools.com
> Date: Sun, 11 Oct 2015 12:53:37 +0700
> To: moses-support@mit.edu
> Subject: Re: [Moses-support] BLEU score difference about 0.13 for one
> dataset is normal?
>
>
> Yes. Each tuning with the same test set will give you
Subject: Re: [Moses-support] BLEU score difference about 0.13 for one
dataset is normal?
Yes. Each tuning with the same test set will give you small variations
in the final BLEU. Yours looks like they're in a normal range.
Date: Sun, 11 Oct 2015 04:23:56 +
From: Davood Mohammadifar
my dataset for Persian to English includes:
Training: about 24 sentences
Tune: 1000 sentences
Test: 1000 sentences
Date: Sun, 11 Oct 2015 04:23:56 +
From: Davood Mohammadifar <davood...@hotmail.com>
Subject: [Moses-support] BLEU score difference about 0.13 f
Hello everyone,
I noticed different BLEU scores for the same dataset. The difference is not
large, though: about 0.13.
I trained on my dataset and tuned on the development set for Persian-English
translation. After testing, the score was 21.95. The second time I did the
same process and obtained
Hi Davood,
Optimizers like MERT will give you a slightly different result every time
you run them, leading to variance in BLEU score. It's generally a good
idea to use multiple optimizer runs, especially when comparing two
systems. There's a good paper on hypothesis testing for MT that goes into
this in more depth.
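A quick way to report this, as a sketch (the scores here are made up for illustration):

    import statistics

    # BLEU from several independent tuning runs on the same test set
    runs = [21.95, 22.08, 21.87, 22.13, 21.99]   # illustrative values

    print(f'mean {statistics.mean(runs):.2f} '
          f'+/- {statistics.stdev(runs):.2f} '
          f'(range {max(runs) - min(runs):.2f})')

Reporting the mean and spread over several runs is far more informative than a single score.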
Hi All!
This is my first post here, and first I want to apologize for my English,
but I would like to ask you some questions. I finished a full phrase-based
Moses training of an EN-PL (English-Polish) corpus (a few million sentences
from free sources + half a million sentences from a commercial TMX).
Hi Tomek,
4.5% definitely indicates that there was an error in your pipeline (or
test data?). However, there are so many places where things could go
wrong that, based on the little information you gave us, I could not even
start guessing. Check that your line numbers match, and that you use tokenized
Hi Tomek
Yes, that's quite a low score. Have a look at the translation output, do
the sentences have lots of English words in them, are they very long,
very short, or scrambled in some other way?
The commonest problem is that something went wrong in corpus
preparation, for example the
Now that I think of it, truecasing should not change file sizes; after
all, it only substitutes single letters with their lowercase versions, so
the file should stay the same size. Unless Samoan has some unusual UTF-8
letters that have different byte sizes between capitalized and
uncapitalized forms.
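You can check that quickly; a small sketch (the particular characters are just examples, not necessarily ones that occur in Samoan):

    # some characters change UTF-8 byte length under case conversion,
    # so case changes alone can change a file's size
    for upper in ['A', 'İ', 'ẞ', 'K']:   # ASCII, dotted I, capital sharp s, Kelvin sign
        lower = upper.lower()
        print(upper, len(upper.encode('utf-8')), '->',
              lower, len(lower.encode('utf-8')))

For plain ASCII the sizes match, but the other three all change byte length when lowercased.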
I checked some of my experiments, and I get nearly identical BLEU
scores when using the standard weights; the differences are in the second
decimal place, if at all. These results now seem more likely,
though there is still variance.
I am still wondering why truecasing would produce
I think you are good now. That's what I am getting for a 500-sentence
test set, trained on 10,000 sentences. Similar to your results. For a
larger test set (4,000 sentences) and the same training data there is
nearly no variance: 12.89 vs. 12.91. So now you need to scale up and tune.
BLEU =
Nice, thanks. Yeah, the truecased files I checked had about 18 or so
differences where one file would capitalise the first letter and the other
file wouldn't. I am going to try and compile more data, but I think I will
only manage to get about 10k to 15k parallel segments altogether. Took me
quite a
Hi, I delete all the files (I think) generated during a training job before
rerunning the entire training. Do you think this could cause variation? Here
are the commands I run to delete:
rm ~/corpus/train.tok.en
rm ~/corpus/train.tok.sm
rm ~/corpus/train.true.en
rm ~/corpus/train.true.sm
rm
I don't think so. However, when you repeat those experiments, you might
try to identify where two trainings start to diverge by pairwise
comparisons of the same files between two runs. Maybe then we can
deduce something.
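A sketch of how I would do that pairwise check, assuming two run directories run1/ and run2/ with the same layout (substitute your own file names):

    import hashlib
    from pathlib import Path

    def digest(path):
        # hash a file so identical intermediate outputs are easy to spot
        return hashlib.md5(Path(path).read_bytes()).hexdigest()

    for name in ['corpus/train.tok.en', 'corpus/train.true.en',
                 'corpus/train.clean.en']:
        same = digest(f'run1/{name}') == digest(f'run2/{name}')
        print('same  ' if same else 'DIFFER', name)

The first file that differs tells you which pipeline step introduced the divergence.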
On 23.06.2015 00:25, Hokage Sama wrote:
Hi I delete all the
Ok will do
On 22 June 2015 at 17:47, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
I don't think so. However, when you repeat those experiments, you might
try to identify where two trainings start to diverge by pairwise
comparisons of the same files between two runs. Maybe then we
Hi,
I think the average is OK; your variance, however, is quite high. Did you
retrain the entire system or just optimize parameters a couple of times?
Two useful papers on the topic:
https://www.cs.cmu.edu/~jhclark/pubs/significance.pdf
http://www.mt-archive.info/MTS-2011-Cettolo.pdf
On
Hm. That's interesting. The language should not matter.
1) Do not report results without tuning. They are meaningless. There is
a whole thread on that; look for "Major bug found in Moses". If you
ignore the trollish aspects, it contains many good descriptions of why this
is a mistake.
2) Assuming it
Thanks Marcin. It's for a new resource-poor language, so I only trained it
with what I could collect so far (i.e. only 190,630 words of parallel
data). I retrained the entire system each time, without any tuning.
On 22 June 2015 at 01:00, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
Hi,
I
Difficult to tell with that little data. Once you get beyond 100,000
segments (or 50,000 at least), I would say 2000 each for the dev (tuning)
and test sets, and the rest for training. With that few segments it's hard
to give you any recommendations, since it might just not give meaningful
results. It's
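For the mechanics, a minimal sketch of such a split, assuming plain parallel text files (the file names are examples):

    import random

    # split a parallel corpus: 2000 dev, 2000 test, rest training
    with open('corpus.en') as f_en, open('corpus.sm') as f_sm:
        pairs = list(zip(f_en, f_sm))

    random.seed(1)          # fixed seed, so the split is reproducible
    random.shuffle(pairs)

    splits = {'dev': pairs[:2000], 'test': pairs[2000:4000],
              'train': pairs[4000:]}
    for name, part in splits.items():
        with open(f'{name}.en', 'w') as out_en, open(f'{name}.sm', 'w') as out_sm:
            for en, sm in part:
                out_en.write(en)
                out_sm.write(sm)

Make sure the dev and test sets never overlap with the training data, or the scores will be inflated.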
You're welcome. Take another close look at those varying BLEU scores
though. That would make me worry if it happened to me for the same data
and the same weights.
On 22.06.2015 10:31, Hokage Sama wrote:
Ok thanks. Appreciate your help.
On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt
Ok thanks. Appreciate your help.
On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
Difficult to tell with that little data. Once you get beyond 100,000
segments (or 50,000 at least), I would say 2000 each for the dev (tuning)
and test sets, and the rest for training. With that few
Yes, the language model was built earlier, when I first went through the
manual to build a French-English baseline system. So I just reused it for
my Samoan-English system.
Yes, for all three runs I used the same training and testing files.
How can I determine how much parallel data I should set
Ok I will.
On 22 June 2015 at 03:35, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
You're welcome. Take another close look at those varying BLEU scores
though. That would make me worry if it happened to me for the same data and
the same weights.
On 22.06.2015 10:31, Hokage Sama wrote:
Wow, that was a long read. Still reading though :) but I see that tuning is
essential. I am fairly new to Moses, so could you please check whether the
commands I ran were correct (minus the tuning part)? I just modified the
commands on the Moses website for building a baseline system. Below are the
I don't see any reason for indeterminism here, unless mgiza is less stable
for small data than I thought. The LM lm/news-commentary-v8.fr-en.blm.en
was built earlier somewhere?
And to be sure: for all three runs you used exactly the same data,
training and test sets?
On 22.06.2015 09:34,
Ok, my scores don't vary so much when I just run tokenisation, truecasing,
and cleaning once. I found some differences beginning with the truecased
files. Here are my results now:
BLEU = 16.85, 48.7/21.0/11.7/6.7 (BP=1.000, ratio=1.089, hyp_len=3929,
ref_len=3609)
BLEU = 16.82, 48.6/21.1/11.6/6.7
Hi,
Since MT training is non-convex, the BLEU score varies from run to run. Which
score should I use for my system? I trained my system three times using the
same data and obtained the three different scores below. Should I take the
average or the best score?
BLEU = 17.84, 49.1/22.0/12.5/7.5 (BP=1.000,
Hello everyone,
I executed all the instructions for building a baseline system
(French-to-English SMT, based on the Moses manual) with the following setup:
CPU: Core i5 2400
RAM: 4GB
OS: Ubuntu 14.04 32-bit
Moses: v3.0 (I used the virtual machine that statmt.org released)
After executing all of the commands
This email may explain it for you:
https://www.mail-archive.com/moses-support%40mit.edu/msg00901.html
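As a cross-check on the numbers in your score line: the headline BLEU is the brevity penalty times the geometric mean of the four n-gram precisions. A sketch recombining the printed values:

    import math

    precisions = [86.2, 77.1, 70.7, 64.8]   # 1- to 4-gram precisions, in percent
    bp = 1.000                              # brevity penalty, as printed

    bleu = bp * math.exp(sum(math.log(p / 100) for p in precisions) / 4)
    print(f'{100 * bleu:.2f}')   # ~74.3, matching the reported 74.25 up to rounding

hyp_len and ref_len are the hypothesis and reference lengths used for the brevity penalty, and ratio is hyp_len/ref_len.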
On 13/11/14 10:46, Maria Marpaung wrote:
Hi, please help me,
I have been able to run Moses and get a BLEU score: BLEU = 74.25,
86.2/77.1/70.7/64.8 (BP=1.000, ratio=1.000, hyp_len=59134,
Hi, please help me,
I have been able to run Moses and get a BLEU score: BLEU = 74.25,
86.2/77.1/70.7/64.8 (BP=1.000, ratio=1.000, hyp_len=59134, ref_len=59144),
but I don't understand these values. Can you explain the meaning of each of
these values? Therefore, I have to explain my current
, moses-support-requ...@mit.edu wrote:
Date: Wed, 23 Apr 2014 10:32:40 +0530
From: Kalyani Baruah kajubarua...@gmail.com
Subject: [Moses-support] Bleu Score
To: moses-support@mit.edu moses-support@mit.edu
Message-ID:
CAJZ5LDdFk+4u0bjpkzEmvbySy7kufkoK4GSSyOSWZV=m+ed...@mail.gmail.com
Hi all,
What is the BLEU score?
How is it calculated on our corpus?
Please explain in detail.
Regards,
*Kalyanee Kanchan Baruah*
Institute of Science and Technology,
Gauhati University,Guwahati,India
Phone- +91-9706242124