Re: [Moses-support] Data collection

2016-04-19 Thread Philipp Koehn
Hi, the common training pipeline limits sentences to at most 80 words. This is due to limitations in GIZA++. There can be any mix of sentence lengths - long sentences, short sentences, single words. There is a good chance for the system to translate "I eat an apple" correctly, if it a training
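The 80-word limit is applied by a corpus-cleaning step before word alignment. Moses ships a script for this, `scripts/training/clean-corpus-n.perl`. The Python below is not that script. It is a minimal sketch of the length filter only, and it assumes whitespace tokenization and line-aligned source/target pairs. The function name `filter_by_length` is hypothetical. Short sentences and single words pass through. Only pairs with an empty or over-long side are dropped:

```python
from typing import Iterable, List, Tuple

MAX_WORDS = 80  # limit mentioned in the thread (due to GIZA++)
MIN_WORDS = 1   # single-word entries are still allowed


def filter_by_length(
    pairs: Iterable[Tuple[str, str]],
    min_words: int = MIN_WORDS,
    max_words: int = MAX_WORDS,
) -> List[Tuple[str, str]]:
    """Hypothetical helper: keep sentence pairs whose source and target
    both have between min_words and max_words whitespace tokens."""
    kept = []
    for src, tgt in pairs:
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if min_words <= n_src <= max_words and min_words <= n_tgt <= max_words:
            kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    corpus = [
        ("I eat an apple", "je mange une pomme"),  # short sentence: kept
        ("apple", "pomme"),                        # single word: kept
        ("word " * 81, "mot " * 81),               # 81 tokens: dropped
        ("", "vide"),                              # empty source: dropped
    ]
    for src, tgt in filter_by_length(corpus):
        print(src, "|||", tgt)
```

Mixing lengths in this way is harmless. The filter only removes pairs the aligner cannot handle.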

[Moses-support] Data collection

2016-04-19 Thread Sanjanashree Palanivel
Hi, how should the data be collected for training Moses? I wish to know how long and how short the sentences can be for training Moses. What will happen if simple sentences like "I eat an apple" are given for training along with longer sentences? And what if I give a word as a

Re: [Moses-support] KenLM scoring of long target phrases

2016-04-19 Thread Kenneth Heafield
Hi, any words beyond N-1 have full context and are included in the phrase's score. So it's hypothesis + target phrase + adjustments, and the routine you cite is computing the adjustments.
Kenneth
On 04/19/16 10:50, Evgeny Matusov wrote:
> Hi,
> my colleagues and I noticed the following
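The "hypothesis + target phrase + adjustments" decomposition can be checked on a toy model. This is a minimal sketch, not KenLM's API and not the code in Ken.cpp. It assumes a hypothetical `logprob(word, context)` that, like any N-gram model, looks only at the last N-1 context words. It also assumes a made-up scoring formula and ignores sentence-begin markers and backoff. When the phrase is scored in isolation, words at position N-1 and beyond already have full context. Appending the phrase to the hypothesis changes only the first N-1 terms, and their difference is the adjustment:

```python
import math
from typing import Sequence

N = 3  # hypothetical LM order


def logprob(word: str, context: Sequence[str]) -> float:
    """Toy stand-in for an N-gram LM: the score depends only on the word
    and the last N-1 context words (a made-up deterministic formula)."""
    ctx = tuple(context[-(N - 1):])
    return -math.log(2 + len(word) + sum(len(w) for w in ctx))


def phrase_score(phrase: Sequence[str], context: Sequence[str] = ()) -> float:
    """Sum of per-word log-probs of `phrase`, given the preceding `context`."""
    history = list(context)
    total = 0.0
    for w in phrase:
        total += logprob(w, history)
        history.append(w)
    return total


def adjustment(phrase: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Correction for the first N-1 phrase words, which lacked full
    context when the phrase was scored in isolation."""
    head = phrase[: N - 1]
    return phrase_score(head, hypothesis) - phrase_score(head)


if __name__ == "__main__":
    hyp = ["the", "cat"]
    phrase = ["sat", "on", "the", "mat"]  # longer than N
    full = phrase_score(hyp + phrase)
    split = phrase_score(hyp) + phrase_score(phrase) + adjustment(phrase, hyp)
    print(math.isclose(full, split))  # True
```

Because only the first N-1 words are rescored, the isolated phrase score can be computed once and reused for every hypothesis it extends.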

[Moses-support] KenLM scoring of long target phrases

2016-04-19 Thread Evgeny Matusov
Hi, my colleagues and I noticed the following in the KenLM code when a Hypo is evaluated with the LM:
https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/Ken.cpp#L203
Do we understand it correctly that because of this line, for phrases longer than the LM order N only the first N