[Moses-support] Final Call for Papers: DiscoMT at EMNLP 2015

2015-06-23 Thread Jorg Tiedemann
EMNLP 2015 Workshop on Discourse in Machine Translation (DiscoMT'15) (http://www.idiap.ch/workshop/DiscoMT). 17 September 2015, Lisbon, Portugal. Final call for papers. Submission deadline: 28 June 2015. It is well known that texts have properties that go beyond those

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-23 Thread Marcin Junczys-Dowmunt
Now that I think of it, truecasing should not change file sizes. After all, it only substitutes single letters with their lowercase versions, so the file should stay the same size. Unless Samoan has some weird UTF-8 letters that have different byte sizes between capitalized and uncapitalized

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-23 Thread Marcin Junczys-Dowmunt
I checked some of my experiments and I get nearly identical BLEU scores when using the standard weights; differences are in the second decimal place, if at all. These results now seem more likely, though there is still variance. I am still wondering why truecasing would produce

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-23 Thread Marcin Junczys-Dowmunt
I think you are good now. That's what I am getting for a 500-sentence test set, trained on 10,000 sentences. Similar to your results. For a larger test set (4000 sentences) and the same training data there is nearly no variance, 12.89 vs. 12.91. So now you need to scale up and tune. BLEU =

Re: [Moses-support] BLEU Score Variance: Which score to use?

2015-06-23 Thread Hokage Sama
Nice, thanks. Yeah, the truecased files I checked had about 18 or so differences where one file would capitalise the first letter and the other file wouldn't. I am going to try and compile more data. But I think I will only manage to get about 10k to 15k parallel segments altogether. Took me quite a