Yes, the phrase-based decoder supports lattice decoding:
http://www.statmt.org/moses/?n=Moses.WordLattices
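The input format is PLF (Python Lattice Format), described on that page: each input line is one lattice, written as a Python tuple of nodes, where each node is a tuple of outgoing edges. A minimal sketch (the words and probabilities are illustrative; the assignment is just so it reads as Python):

    # one lattice per input line; a node is a tuple of edges,
    # and an edge is (word, probability, nodes-to-skip-ahead)
    lattice = (
        (('these', 1.0, 1),),                # single outgoing edge
        (('are', 0.6, 1), ('is', 0.4, 1)),   # two alternative edges
        (('lattices', 1.0, 1),),
    )

The decoder is then run with lattice input enabled (something like moses -f moses.ini -inputtype 2, as the page describes). Note that standard lattice decoding finds the highest model-score path; extracting the highest-BLEU (oracle) path against a reference is a different computation.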
Hieu Hoang
http://moses-smt.org/
On 1 February 2017 at 23:10, Angli Liu wrote:
> Hi all,
>
> Is there a way to do lattice decoding with BLEU in Moses? I.e.,
Hi all,
Is there a way to do lattice decoding with BLEU in Moses? I.e., given a
word lattice, find the path that represents the highest BLEU score? If so,
what function to call and in what format should I feed a lattice in?
Thanks!
Angli
Hi - I trained a phrase-based system from a low-resource language to
English, and got *13.6633* as the BLEU score. However, when I tested on the
same dev set and computed BLEU against the English corpus in the dev set, I
only got *3.69*. Then I did a manual grid search over the parameter space
in
Hi Liang,
mteval-v13a.pl does some internal tokenization and probably splits those
"~~" words into " ~ ~ ". If this is happening,
it explains the difference in your calculated BLEU scores.
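A rough Python analogue of that tokenization rule (not the actual Perl, just to illustrate the effect on the "~~" words):

    import re

    def split_punct(text):
        # mteval-v13a.pl-style behaviour: put spaces around characters
        # that are not letters, digits, or whitespace, then retokenize
        return re.sub(r'([^\w\s])', r' \1 ', text).split()

    print(split_punct('house~~17'))   # ['house', '~', '~', '17']

Every such split changes the n-gram counts, so the two scorers are effectively evaluating different token sequences.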
Cheers,
Matthias
On Mon, 2016-01-18 at 17:01 +0800, 姚亮 wrote:
> Dear Moses Support Team,
>
>I
Dear Moses Support Team,
I added a source context-dependent translation feature to the Moses baseline
system.
In order to avoid modifying the source code, I append a unique identifier
to every word in the test/dev source file.
For example, a source file with two lines like the
-
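For anyone trying the same trick, a minimal sketch of the round trip (the "~~" separator follows this thread; the helper names are made up):

    def add_ids(line):
        # append a position identifier to every token, e.g. 'house' -> 'house~~3'
        return ' '.join(f'{tok}~~{i}' for i, tok in enumerate(line.split()))

    def strip_ids(line):
        # remove the identifiers again before scoring, so BLEU sees plain words
        return ' '.join(tok.split('~~', 1)[0] for tok in line.split())

    assert strip_ids(add_ids('the green house')) == 'the green house'

Stripping the identifiers from the decoder output before running mteval also sidesteps the tokenization issue discussed above.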
> From: tah...@precisiontranslationtools.com
> Date: Sun, 11 Oct 2015 12:53:37 +0700
> To: moses-support@mit.edu
> Subject: Re: [Moses-support] BLEU score difference about 0.13 for one
> dataset is normal?
>
>
> Yes. Each tuning with the same test set will give you
Subject: Re: [Moses-support] BLEU score difference about 0.13 for one
dataset is normal?
Yes. Each tuning with the same test set will give you small variations
in the final BLEU. Yours looks like they're in a normal range.
Date: Sun, 11 Oct 2015 04:23:56 +
From: Davood Mohammadifar
my dataset for Persian to English includes:
Training: about 24 sentences
Tune: 1000 sentences
Test: 1000 sentences
Date: Sun, 11 Oct 2015 04:23:56 +
From: Davood Mohammadifar <davood...@hotmail.com>
Subject: [Moses-support] BLEU score difference about 0.13 f
Hello everyone,
I noticed different BLEU scores for the same dataset. The difference is not
large, though: about 0.13.
I trained on my dataset and tuned on the development set for Persian-English
translation. After testing, the score was 21.95. The second time I did the
same process and obtained
Hi Davood,
Optimizers like MERT will give you a slightly different result every time
you run them, leading to variance in BLEU score. It's generally a good
idea to use multiple optimizer runs, especially when comparing two
systems. There's a good paper on hypothesis testing for MT that goes into
this in more depth.
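A quick way to report this, as a sketch (the scores here are made up for illustration):

    import statistics

    # BLEU from several independent tuning runs on the same test set
    runs = [21.95, 22.08, 21.87, 22.13, 21.99]   # illustrative values

    print(f'mean {statistics.mean(runs):.2f} '
          f'+/- {statistics.stdev(runs):.2f} '
          f'(range {max(runs) - min(runs):.2f})')

Reporting the mean and spread over several runs is far more informative than a single score.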
Hi All!
This is my first post here, and first I want to apologize for my English,
but I would like to ask you some questions. I finished a full phrase-based
Moses training of an EN-PL (English-Polish) corpus (a few million sentences
from free sources + half a million sentences from a commercial TMX).
Hi Tomek,
4.5% definitely indicates that there was an error in your pipeline (or
test data?). However, there are so many places where things could go
wrong that, based on the little information you gave us, I could not even
start guessing. Check that your line numbers match, and that you use tokenized
Hi Tomek
Yes, that's quite a low score. Have a look at the translation output, do
the sentences have lots of English words in them, are they very long,
very short, or scrambled in some other way?
The commonest problem is that something went wrong in corpus
preparation, for example the
Now that I think of it, truecasing should not change file sizes; after
all, it only substitutes single letters with their lowercase versions, so
the file should stay the same size. Unless Samoan has some unusual UTF-8
letters that have different byte sizes between capitalized and
uncapitalized forms.
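You can check that quickly; a small sketch (the particular characters are just examples, not necessarily ones that occur in Samoan):

    # some characters change UTF-8 byte length under case conversion,
    # so case changes alone can change a file's size
    for upper in ['A', 'İ', 'ẞ', 'K']:   # ASCII, dotted I, capital sharp s, Kelvin sign
        lower = upper.lower()
        print(upper, len(upper.encode('utf-8')), '->',
              lower, len(lower.encode('utf-8')))

For plain ASCII the sizes match, but the other three all change byte length when lowercased.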
I checked some of my experiments, and I get nearly identical BLEU
scores when using the standard weights; the differences are in the second
decimal place, if at all. These results now seem more likely,
though there is still variance.
I am still wondering why truecasing would produce
I think you are good now. That's what I am getting for a 500-sentence
test set, trained on 10,000 sentences. Similar to your results. For a
larger test set (4,000 sentences) and the same training data there is
nearly no variance: 12.89 vs. 12.91. So now you need to scale up and tune.
BLEU =
Nice, thanks. Yeah, the truecased files I checked had about 18 or so
differences where one file would capitalise the first letter and the other
file wouldn't. I am going to try and compile more data, but I think I will
only manage to get about 10k to 15k parallel segments altogether. Took me
quite a
Hi, I delete all the files (I think) generated during a training job before
rerunning the entire training. Do you think this could cause variation? Here
are the commands I run to delete:
rm ~/corpus/train.tok.en
rm ~/corpus/train.tok.sm
rm ~/corpus/train.true.en
rm ~/corpus/train.true.sm
rm
I don't think so. However, when you repeat those experiments, you might
try to identify where two trainings start to diverge by pairwise
comparisons of the same files between two runs. Maybe then we can
deduce something.
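A sketch of how I would do that pairwise check, assuming two run directories run1/ and run2/ with the same layout (substitute your own file names):

    import hashlib
    from pathlib import Path

    def digest(path):
        # hash a file so identical intermediate outputs are easy to spot
        return hashlib.md5(Path(path).read_bytes()).hexdigest()

    for name in ['corpus/train.tok.en', 'corpus/train.true.en',
                 'corpus/train.clean.en']:
        same = digest(f'run1/{name}') == digest(f'run2/{name}')
        print('same  ' if same else 'DIFFER', name)

The first file that differs tells you which pipeline step introduced the divergence.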
On 23.06.2015 00:25, Hokage Sama wrote:
Hi I delete all the
Ok will do
On 22 June 2015 at 17:47, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
I don't think so. However, when you repeat those experiments, you might
try to identify where two trainings start to diverge by pairwise
comparisons of the same files between two runs. Maybe then we
Hi,
I think the average is OK; your variance, however, is quite high. Did you
retrain the entire system or just optimize parameters a couple of times?
Two useful papers on the topic:
https://www.cs.cmu.edu/~jhclark/pubs/significance.pdf
http://www.mt-archive.info/MTS-2011-Cettolo.pdf
On
Hm. That's interesting. The language should not matter.
1) Do not report results without tuning. They are meaningless. There is
a whole thread on that; look for "Major bug found in Moses". If you
ignore the trollish aspects, it contains many good descriptions of why this
is a mistake.
2) Assuming it
Thanks Marcin. It's for a new resource-poor language, so I only trained it
with what I could collect so far (i.e. only 190,630 words of parallel
data). I retrained the entire system each time, without any tuning.
On 22 June 2015 at 01:00, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
Hi,
I
Difficult to tell with that little data. Once you get beyond 100,000
segments (or 50,000 at least), I would say 2000 each for the dev (tuning)
and test sets, and the rest for training. With that few segments it's hard
to give you any recommendations, since it might just not give meaningful
results. It's
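For the mechanics, a minimal sketch of such a split, assuming plain parallel text files (the file names are examples):

    import random

    # split a parallel corpus: 2000 dev, 2000 test, rest training
    with open('corpus.en') as f_en, open('corpus.sm') as f_sm:
        pairs = list(zip(f_en, f_sm))

    random.seed(1)          # fixed seed, so the split is reproducible
    random.shuffle(pairs)

    splits = {'dev': pairs[:2000], 'test': pairs[2000:4000],
              'train': pairs[4000:]}
    for name, part in splits.items():
        with open(f'{name}.en', 'w') as out_en, open(f'{name}.sm', 'w') as out_sm:
            for en, sm in part:
                out_en.write(en)
                out_sm.write(sm)

Make sure the dev and test sets never overlap with the training data, or the scores will be inflated.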
You're welcome. Take another close look at those varying BLEU scores
though. That would make me worry if it happened to me for the same data
and the same weights.
On 22.06.2015 10:31, Hokage Sama wrote:
Ok thanks. Appreciate your help.
On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt
Ok thanks. Appreciate your help.
On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
Difficult to tell with that little data. Once you get beyond 100,000
segments (or 50,000 at least), I would say 2000 each for the dev (tuning)
and test sets, and the rest for training. With that few
Yes, the language model was built earlier, when I first went through the
manual to build a French-English baseline system. So I just reused it for
my Samoan-English system.
Yes, for all three runs I used the same training and testing files.
How can I determine how much parallel data I should set
Ok I will.
On 22 June 2015 at 03:35, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
You're welcome. Take another close look at those varying BLEU scores
though. That would make me worry if it happened to me for the same data and
the same weights.
On 22.06.2015 10:31, Hokage Sama wrote:
Wow, that was a long read. Still reading though :) but I see that tuning is
essential. I am fairly new to Moses, so could you please check whether the
commands I ran were correct (minus the tuning part)? I just modified the
commands on the Moses website for building a baseline system. Below are the
I don't see any reason for indeterminism here, unless mgiza is less stable
for small data than I thought. The LM lm/news-commentary-v8.fr-en.blm.en
was built earlier somewhere?
And to be sure: for all three runs you used exactly the same data,
training and test sets?
On 22.06.2015 09:34,
Ok, my scores don't vary so much when I just run tokenisation, truecasing,
and cleaning once. I found some differences beginning with the truecased
files. Here are my results now:
BLEU = 16.85, 48.7/21.0/11.7/6.7 (BP=1.000, ratio=1.089, hyp_len=3929,
ref_len=3609)
BLEU = 16.82, 48.6/21.1/11.6/6.7
Hi,
Since MT training is non-convex, the BLEU score varies from run to run. Which
score should I use for my system? I trained my system three times using the
same data and obtained the three different scores below. Should I take the
average or the best score?
BLEU = 17.84, 49.1/22.0/12.5/7.5 (BP=1.000,
Hello everyone,
I executed all the instructions for building a baseline system
(French-to-English SMT, based on the Moses manual) with the following setup:
CPU: Core i5 2400
RAM: 4GB
OS: Ubuntu 14.04 32-bit
Moses: v3.0 (I used the virtual machine that statmt.org released)
After executing all of the commands
This email may explain it for you:
https://www.mail-archive.com/moses-support%40mit.edu/msg00901.html
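As a cross-check on the numbers in your score line: the headline BLEU is the brevity penalty times the geometric mean of the four n-gram precisions. A sketch recombining the printed values:

    import math

    precisions = [86.2, 77.1, 70.7, 64.8]   # 1- to 4-gram precisions, in percent
    bp = 1.000                              # brevity penalty, as printed

    bleu = bp * math.exp(sum(math.log(p / 100) for p in precisions) / 4)
    print(f'{100 * bleu:.2f}')   # ~74.3, matching the reported 74.25 up to rounding

hyp_len and ref_len are the hypothesis and reference lengths used for the brevity penalty, and ratio is hyp_len/ref_len.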
On 13/11/14 10:46, Maria Marpaung wrote:
Hi, please help me,
I have been able to run Moses and get a BLEU score: BLEU = 74.25,
86.2/77.1/70.7/64.8 (BP=1.000, ratio=1.000, hyp_len=59134,
Hi, please help me,
I have been able to run Moses and get a BLEU score: BLEU = 74.25,
86.2/77.1/70.7/64.8 (BP=1.000, ratio=1.000, hyp_len=59134, ref_len=59144),
but I don't understand these values. Can you explain the meaning of each of
these values? Therefore, I have to explain my current
, moses-support-requ...@mit.edu wrote:
Date: Wed, 23 Apr 2014 10:32:40 +0530
From: Kalyani Baruah kajubarua...@gmail.com
Subject: [Moses-support] Bleu Score
To: moses-support@mit.edu moses-support@mit.edu
Message-ID:
CAJZ5LDdFk+4u0bjpkzEmvbySy7kufkoK4GSSyOSWZV=m+ed...@mail.gmail.com
Hi all,
What is the BLEU score?
How is it calculated on our corpus?
Please explain in detail.
Regards,
*Kalyanee Kanchan Baruah*
Institute of Science and Technology,
Gauhati University,Guwahati,India
Phone- +91-9706242124