Cc: moses-support@mit.edu
Sent: Wednesday, 20 May, 2015 20:50:41
Subject: Re: [Moses-support] When to truecase
Got it. So then, how was casing handled in the mbr/mp column? Was all of
the data lowercased, then models trained, then recasing applied after
decoding? Or something else?
On Wed
recaser: builds a Moses model for word translation from lowercased to cased
and also uses a language model. Input to recaser is lowercased.
truecaser: builds a casing model based on the number of times each version
appears in text (e.g. rivet (4/8) Rivet (3) RIVET (1)). Input to truecaser
is as
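A minimal sketch of the count-based truecasing idea described above, assuming whitespace-tokenized input. The function names (train_casing_model, preferred_form, truecase_sentence) are hypothetical, and this is not the Moses truecaser script. The sketch skips sentence-initial tokens during counting, on the assumption that their capitalization is positional; that is a simplification.

```python
from collections import Counter, defaultdict


def train_casing_model(sentences):
    """Count how often each cased form of every word appears."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        # Assumption: ignore the first token, whose capital letter
        # may only reflect its position in the sentence.
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return counts


def preferred_form(counts, word):
    """Return the most frequent cased form, or the word itself if unseen."""
    forms = counts.get(word.lower())
    if not forms:
        return word
    return forms.most_common(1)[0][0]


def truecase_sentence(counts, sentence):
    """Rewrite only the sentence-initial token to its preferred form."""
    tokens = sentence.split()
    if tokens:
        tokens[0] = preferred_form(counts, tokens[0])
    return " ".join(tokens)
```

With a toy corpus matching the example above (rivet 4 times, Rivet 3 times, RIVET once), preferred_form picks "rivet", so a sentence starting with "Rivet" is rewritten to start with "rivet" while the remaining tokens are left unchanged.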
Philipp (and others),
I'm wondering what people's experience is regarding when truecasing is
applied.
One option is to truecase the training data, then train your TM and LM
using that truecased data. Another option would be to lowercase the data,
train TM and LM on the lowercased data, and then
Hi,
no, the changes are made incrementally.
So the recased baseline is the previous mbr/mp column.
-phi
On Wed, May 20, 2015 at 2:01 PM, Lane Schwartz dowob...@gmail.com wrote:
Philipp,
In Table 2 of the WMT 2009 paper, are the baseline and truecased
columns directly comparable? In other
Got it. So then, how was casing handled in the mbr/mp column? Was all of
the data lowercased, then models trained, then recasing applied after
decoding? Or something else?
On Wed, May 20, 2015 at 1:30 PM, Philipp Koehn p...@jhu.edu wrote:
Hi,
no, the changes are made incrementally.
So the
Hi,
yes, this is what the RECASER section in EMS enables.
-phi
On Wed, May 20, 2015 at 2:50 PM, Lane Schwartz dowob...@gmail.com wrote:
Got it. So then, how was casing handled in the mbr/mp column? Was all
of the data lowercased, then models trained, then recasing applied after
decoding?
Hi,
see Section 2.2 in our WMT 2009 submission:
http://www.statmt.org/wmt09/pdf/WMT-0929.pdf
One practical reason to avoid recasing is the need
for a second large cased language model.
But there is of course also the practical issue with
having a unique truecasing scheme for each data
condition,
Philipp,
In Table 2 of the WMT 2009 paper, are the baseline and truecased
columns directly comparable? In other words, do the two columns indicate
identical conditions other than a single variable (how and/or when casing
was handled)?
In the baseline condition, how and when was casing handled?