Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-23 Thread Hokage Sama
Nice thanks. Yeah the truecased files I checked had about 18 or so differences where one file would capitalise the first letter and the other file wouldn't. I am going to try and compile more data. But I think I will only manage to get about 10k to 15k parallel segments altogether. Took me quite a

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-23 Thread Marcin Junczys-Dowmunt
I think you are good now. That's what I am getting for a 500-sentence test set, trained on 10,000 sentences. Similar to your results. For a larger test set (4000 sentences) and the same training data there is nearly no variance, 12.89 vs. 12.91. So now you need to scale up and tune. BLEU = 12.

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-23 Thread Marcin Junczys-Dowmunt
Now that I think of it, truecasing should not change file sizes. After all, it only substitutes single letters with their lowercase versions, so the file should stay the same size. Unless Samoan has some weird UTF-8 letters that have different byte sizes between capitalized and uncapitalized versi
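
The byte-size question above can be checked directly. The following is a minimal sketch, not part of the original thread: `case_byte_changes` is a hypothetical helper, and the list of Samoan capitals (plain and macron vowels plus the consonants) is an assumption about the orthography used in the corpus. It reports every character whose UTF-8 encoding changes length when lowercased.

```python
def case_byte_changes(text):
    """Return (char, bytes_before, bytes_after) for every distinct character
    in `text` whose UTF-8 length changes when it is lowercased."""
    changes = []
    for ch in sorted(set(text)):
        lower = ch.lower()
        if lower == ch:
            continue  # no case mapping, nothing to compare
        before = len(ch.encode("utf-8"))
        after = len(lower.encode("utf-8"))
        if before != after:
            changes.append((ch, before, after))
    return changes


# Assumed inventory of Samoan capital letters (an assumption, not from the thread).
samoan_upper = "AEIOU\u0100\u0112\u012a\u014c\u016aFGLMNPSTVHKR"
print(case_byte_changes(samoan_upper))

# Letters from other scripts where lowercasing does change the byte length:
# U+0130 (dotted capital I), U+023A (A with stroke), U+1E9E (capital sharp s).
print(case_byte_changes("\u0130\u023a\u1e9e"))
```

If the first call prints an empty list, case changes alone would not explain a size difference for those letters, and the two truecased files would have to differ in some other way. Running the same function over the actual corpus text would cover any characters missing from the assumed inventory.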

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
I checked for some of my experiments and I get nearly identical BLEU scores when using the standard weights; differences are in the second decimal place, if at all. These results now seem more likely, though there is still variance. I am still wondering why would truecasing produce dif

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Hokage Sama
Ok my scores don't vary so much when I just run tokenisation, truecasing, and cleaning once. Found some differences beginning from the truecased files. Here are my results now:
BLEU = 16.85, 48.7/21.0/11.7/6.7 (BP=1.000, ratio=1.089, hyp_len=3929, ref_len=3609)
BLEU = 16.82, 48.6/21.1/11.6/6.7 (BP
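
To compare runs numerically, the score lines in this message can be parsed and summarised. This is a rough sketch under stated assumptions: the regular expression is written against the line format as it appears here (which looks like the output of Moses' multi-bleu.perl), `parse_bleu_line` and `summarize` are hypothetical helper names, and only the two scores visible in this message are used because the second line is cut off.

```python
import re
import statistics

# Pattern written against the score line as it appears in this thread.
BLEU_LINE = re.compile(
    r"BLEU = (?P<bleu>[\d.]+), (?P<precisions>[\d./]+) "
    r"\(BP=(?P<bp>[\d.]+), ratio=(?P<ratio>[\d.]+), "
    r"hyp_len=(?P<hyp_len>\d+), ref_len=(?P<ref_len>\d+)\)"
)


def parse_bleu_line(line):
    """Split one complete score line into its numeric fields."""
    m = BLEU_LINE.search(line)
    if m is None:
        raise ValueError("not a complete BLEU score line: %r" % line)
    return {
        "bleu": float(m["bleu"]),
        "precisions": [float(p) for p in m["precisions"].split("/")],
        "bp": float(m["bp"]),
        "ratio": float(m["ratio"]),
        "hyp_len": int(m["hyp_len"]),
        "ref_len": int(m["ref_len"]),
    }


def summarize(scores):
    """Mean and sample standard deviation of BLEU scores from repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)


first = parse_bleu_line(
    "BLEU = 16.85, 48.7/21.0/11.7/6.7 "
    "(BP=1.000, ratio=1.089, hyp_len=3929, ref_len=3609)"
)
# The reported ratio is hyp_len / ref_len: 3929 / 3609 rounds to 1.089.
print(round(first["hyp_len"] / first["ref_len"], 3))

# The two scores visible above; the second line is truncated after its score.
mean, spread = summarize([first["bleu"], 16.82])
print(mean, spread)
```

With only two runs the standard deviation is a very rough estimate, so it is best read as a sanity check on run-to-run spread, not as a significance test.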

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Hokage Sama
Ok will do.
On 22 June 2015 at 17:47, Marcin Junczys-Dowmunt wrote:
> I don't think so. However, when you repeat those experiments, you might
> try to identify where two trainings are starting to diverge by pairwise
> comparisons of the same files between two runs. Maybe then we can deduce
> som

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
I don't think so. However, when you repeat those experiments, you might try to identify where two trainings are starting to diverge by pairwise comparisons of the same files between two runs. Maybe then we can deduce something.
On 23.06.2015 00:25, Hokage Sama wrote:
> Hi I delete all the file
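
One way to do the pairwise comparison suggested above is to hash every file in two run directories and list the ones that differ. This is only a sketch: it assumes each training run's outputs were kept in their own directory with the same relative layout, and `file_digest` and `compare_runs` are hypothetical helper names, not Moses tools.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in chunks so large models fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(run_a, run_b):
    """Map each file under run_a (by relative path) to 'identical',
    'differs', or 'missing in second run' with respect to run_b."""
    run_a, run_b = Path(run_a), Path(run_b)
    report = {}
    for path_a in sorted(p for p in run_a.rglob("*") if p.is_file()):
        rel = path_a.relative_to(run_a)
        path_b = run_b / rel
        if not path_b.is_file():
            report[rel.as_posix()] = "missing in second run"
        elif file_digest(path_a) == file_digest(path_b):
            report[rel.as_posix()] = "identical"
        else:
            report[rel.as_posix()] = "differs"
    return report
```

Calling `compare_runs("run1", "run2")` on two such directories (the names are placeholders) and reading the result in pipeline order, tokenised files first, then truecased, then cleaned, and so on, points at the earliest step whose output is not identical, which is where the two trainings start to diverge.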

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Hokage Sama
Hi I delete all the files (I think) generated during a training job before rerunning the entire training. You think this could cause variation? Here are the commands I run to delete:
rm ~/corpus/train.tok.en
rm ~/corpus/train.tok.sm
rm ~/corpus/train.true.en
rm ~/corpus/train.true.sm
rm ~/corpus/tra

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Hokage Sama
Ok I will.
On 22 June 2015 at 03:35, Marcin Junczys-Dowmunt wrote:
> You're welcome. Take another close look at those varying bleu scores
> though. That would make me worry if it happened to me for the same data and
> the same weights.
>
> On 22.06.2015 10:31, Hokage Sama wrote:
>
>> Ok thanks.

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
You're welcome. Take another close look at those varying bleu scores though. That would make me worry if it happened to me for the same data and the same weights.
On 22.06.2015 10:31, Hokage Sama wrote:
> Ok thanks. Appreciate your help.
>
> On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Hokage Sama
Ok thanks. Appreciate your help.
On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt wrote:
> Difficult to tell with that little data. Once you get beyond 100,000
> segments (or 50,000 at least) I would say 2000 per dev (for tuning) and
> test set, rest for training. With that few segments it's har

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
Difficult to tell with that little data. Once you get beyond 100,000 segments (or 50,000 at least) I would say 2000 each for the dev set (for tuning) and the test set, and the rest for training. With so few segments it's hard to give you any recommendations since it might just not give meaningful results. It's curren
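
The suggested split (2000 segments for dev, 2000 for test, the rest for training) could be sketched as follows. Shuffling with a fixed seed is an assumption added here so that repeated runs see exactly the same split; the message itself does not say how to pick the held-out segments, and `split_parallel` is a hypothetical helper.

```python
import random


def split_parallel(src, tgt, dev_size=2000, test_size=2000, seed=0):
    """Split aligned source/target segment lists into train, dev and test.

    The pairs are shuffled with a fixed seed (an assumption, chosen for
    reproducibility) and each source segment stays with its translation.
    """
    if len(src) != len(tgt):
        raise ValueError("source and target must have the same number of segments")
    if dev_size + test_size >= len(src):
        raise ValueError("not enough segments left over for training")
    pairs = list(zip(src, tgt))
    random.Random(seed).shuffle(pairs)
    dev = pairs[:dev_size]
    test = pairs[dev_size:dev_size + test_size]
    train = pairs[dev_size + test_size:]
    return train, dev, test


# Placeholder corpus of 50,000 aligned segments, for illustration only.
source = ["src segment %d" % i for i in range(50000)]
target = ["tgt segment %d" % i for i in range(50000)]
train, dev, test = split_parallel(source, target)
print(len(train), len(dev), len(test))
```

Keeping the seed fixed matters for the variance question in this thread: if the split changed between runs, score differences could come from the data and not from training itself.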

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Hokage Sama
Yes the language model was built earlier when I first went through the manual to build a French-English baseline system. So I just reused it for my Samoan-English system. Yes for all three runs I used the same training and testing files. How can I determine how much parallel data I should set aside

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Marcin Junczys-Dowmunt
Don't see any reason for indeterminism here. Unless mgiza is less stable for small data than I thought. The lm lm/news-commentary-v8.fr-en.blm.en has been built earlier somewhere? And to be sure: for all three runs you used exactly the same data, training and test set?
On 22.06.2015 09:34, Hok

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-22 Thread Hokage Sama
Wow that was a long read. Still reading though :) but I see that tuning is essential. I am fairly new to Moses so could you please check if the commands I ran were correct (minus the tuning part). I just modified the commands on the Moses website for building a baseline system. Below are the comman

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-21 Thread Marcin Junczys-Dowmunt
Hm. That's interesting. The language should not matter.
1) Do not report results without tuning. They are meaningless. There is a whole thread on that, look for "Major bug found in Moses". If you ignore the trollish aspects it contains many good descriptions of why this is a mistake.
2) Assuming i

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-21 Thread Hokage Sama
Thanks Marcin. It's for a new resource-poor language so I only trained it with what I could collect so far (i.e. only 190,630 words of parallel data). I retrained the entire system each time without any tuning.
On 22 June 2015 at 01:00, Marcin Junczys-Dowmunt wrote:
> Hi,
> I think the average is

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-21 Thread Marcin Junczys-Dowmunt
Hi, I think the average is OK; your variance, however, is quite high. Did you retrain the entire system or just optimize parameters a couple of times? Two useful papers on the topic:
https://www.cs.cmu.edu/~jhclark/pubs/significance.pdf
http://www.mt-archive.info/MTS-2011-Cettolo.pdf
On 22.06.20

[Moses-support] BLEU Score Variance: Which score to use?

2015-06-21 Thread Hokage Sama
Hi, Since MT training is non-convex and thus the BLEU score varies, which score should I use for my system? I trained my system three times using the same data and obtained the three different scores below. Should I take the average or the best score?
BLEU = 17.84, 49.1/22.0/12.5/7.5 (BP=1.000, r