Re: [Moses-support] decoding a confusion network using Moses' API

2012-04-26 Thread Sylvain Raybaud
Wild guess here: in TranslationTask::Run I see there are several
alternatives for processing the sentence, like doLatticeMBR etc., not
just running Manager::ProcessSentence().
Maybe one of these alternatives must be used when processing confusion
networks?
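
Another thing I want to double-check: whether GetTargetPhraseStringRep()
on the best hypothesis really gives back the whole translation, or only
the phrase attached to that last hypothesis. If it is the latter, walking
the back-pointers should rebuild the full output. A sketch of what I mean
(written from memory of Hypothesis.h, so GetPrevHypo() /
GetCurrTargetPhrase() / GetStringRep() may not match the exact
signatures):

static string moses_get_full_hyp(const Hypothesis* hypo,
                                 const vector<FactorType>& outputFactorOrder) {
  /* collect the target phrase of every hypothesis in the chain,
     from the final one back to the initial empty hypothesis */
  vector<string> phrases;
  for (; hypo != NULL; hypo = hypo->GetPrevHypo()) {
    const TargetPhrase& tp = hypo->GetCurrTargetPhrase();
    if (tp.GetSize() > 0)
      phrases.push_back(tp.GetStringRep(outputFactorOrder));
  }
  /* phrases were collected last-to-first, so emit them in reverse */
  string out;
  for (vector<string>::reverse_iterator it = phrases.rbegin();
       it != phrases.rend(); ++it) {
    if (!out.empty()) out += " ";
    out += *it;
  }
  return out;
}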

cheers

Sylvain


On 26/04/12 15:53, Sylvain Raybaud wrote:
 Hi Barry
 
   Thanks for the tip, that sounds likely indeed. I'll try it again but
 last time I ran the software through valgrind, I got so many errors in
 external libs that I just gave up.
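 (Maybe I should build a suppressions file first: if I remember the
 valgrind options correctly, --gen-suppressions=all will print a
 suppression for each reported error, so the noise from the external libs
 can be recorded once and then silenced on later runs with
 --suppressions=FILE, where FILE is whatever name I give it.)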
 
 In the meantime, here is the complete function that handles the
 decoding, in case someone sees something obviously wrong in it...
 
 static void moses_translate_phonemes(manager_data_t * pool,
                                      translation_pair_t * pair) {
   debug("starting");

   const TranslationSystem& system =
     StaticData::Instance().GetTranslationSystem(TranslationSystem::DEFAULT);
   /* there is only one translation system for now */
   const StaticData& staticData = StaticData::Instance();
   const vector<FactorType>& inputFactorOrder =
     staticData.GetInputFactorOrder();

   /* build the confusion network from the 1-best phoneme string and the
      confusion matrix */
   MyConfusionNet * cn =
     phonemes_to_cn(pool->mp_engine->phonemes_cm, pair->source->phonemes,
                    pool->mp_config->cn_width, pool->mp_config->cn_thresh,
                    inputFactorOrder);

   Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(),
                                   &system);
   manager->ProcessSentence();
   const Hypothesis* hypo = manager->GetBestHypothesis();

   /* copy the target string into a C buffer owned by the caller */
   string hyp = moses_get_hyp(hypo);
   char * hyp_ret = (char*) malloc((strlen(hyp.c_str()) + 1) * sizeof(char));
   strcpy(hyp_ret, hyp.c_str());

   pair->translation_score = UntransformScore(hypo->GetScore());
   translation_pair_set_target(pair, hyp_ret, NULL);

   delete manager;
   delete cn;
 }
 
 cheers,
 
 Sylvain
 
 On 26/04/12 13:49, Barry Haddow wrote:
 Hi Sylvain

 I'm not familiar with this part of the code, but the strange score
 suggests that there's some uninitialised memory. You could try running
 it through valgrind and it might give some clues,

 cheers - Barry

 On Thursday 26 Apr 2012 12:24:11 Sylvain Raybaud wrote:
 Hi all

   I'm using the Moses API for decoding a confusion network. The CN is
 created from the output of an ASR engine and a confusion matrix. More
 precisely (even though it's probably irrelevant to my problem), the ASR
 engine provides a string of phonemes (1-best) and the confusion matrix
 provides alternatives for each phoneme (the idea is described in Jiang
 et al., _Phonetic representation based speech translation_, MT Summit
 XIII, 2011).

 When the CN is dumped into a file and I use
 moses -f moses.phonemes.cn.ini < CN
 to decode it, everything is fine.

 But when I use the Moses API (loading the same configuration file), I
 get incomplete translations, like:

 ASR output (French): nous font sont toujours chimistes plume
 rassembleront ch je trouve que le office de ce tout de suite
 Phonetic representation: n u f on s on t t u ge u r ch i m i s t z p l
 y m r a s an b l swa r on ch ge swa t r u v k swa l swa oh f i s swa d
 swa s swa t u d s h i t
 Translation: of
 score: 903011968.00

 Note that the transcription is poor (I haven't really tuned the ASR
 engine), but still, the translation ought to be more than just "of".
 Sometimes it's several words; I guess that's then a phrase from the
 phrase table. The output generally seems to be the translation of a
 single word in the source sentence.
 When I use moses on the command line to translate either the 1-best or
 the CN, I get a reasonable translation. When I use the API to translate
 the 1-best phonetic representation, I also get a reasonable translation.
 I think the CN object is created correctly, because moses loads it and
 prints it prior to decoding (this is the normal verbose behavior). I
 also tried to create a PCN object, and got exactly the same results. So
 I guess the problem is either in how I tell moses to decode it or in how
 I extract the result from the Hypothesis object. But I'm clueless about
 what the problem is, since the code works when I just translate a
 string. The translation score seems ridiculously high too.
 The corresponding code is given below.

 Decoding and hypothesis extraction:
 ***
 [...]
 Manager * manager = new Manager(*cn, staticData.GetSearchAlgorithm(),
                                 &system);
 manager->ProcessSentence();
 const Hypothesis* hypo = manager->GetBestHypothesis();
 string hyp = moses_get_hyp(hypo);
 [...]
 pair->translation_score = UntransformScore(hypo->GetScore());
 [...]

 string moses_get_hyp(const Hypothesis* hypo) {
   return hypo->GetTargetPhraseStringRep();
 }


 Creation of the CN:
 ***

 /** new class derived from ConfusionNet, with a new method for directly
 creating a CN */
 class MyConfusionNet : public ConfusionNet {
   public:
 void addCol(Column);
 };

 void MyConfusionNet::addCol(Column col) {
 data.push_back(col);
 }

 /** create a column of the CN */
 static MyConfusionNet::Column create_phoneme_col(confusion_matrix_t *
 cm, const char * ph, int 

[Moses-support] Higher BLEU/METEOR score than usual for EN-DE

2012-04-26 Thread Daniel Schaut
Hi all,

I'm running some experiments for my thesis, and I've been told by a more
experienced user that the BLEU/METEOR scores achieved by my MT engine
are too good to be true. Since this is the very first MT engine I've ever
built and I am not experienced with interpreting scores, I really don't
know how to judge them. The first test set achieves a BLEU score of 0.6508
(v13). METEOR's final score is 0.7055 (v1.3, exact, stem, paraphrase). A
second test set indicated a slightly lower BLEU score of 0.6267 and a METEOR
score of 0.6748.

Here are some basic facts about my system:
Decoding direction: EN-DE
Training corpus: 1.8 million sentences
Tuning runs: 5
Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)
LM type: trigram
TM type: unfactored

I'm now trying to figure out whether these scores are realistic at all, as
various papers report far lower BLEU scores, e.g. Koehn and Hoang 2011.
Any comments regarding this decoding direction and the related scores
would be much appreciated.

Best,
Daniel


Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

2012-04-26 Thread Barry Haddow
Hi Daniel

BLEU scores do vary according to test set, but the scores you report are much 
higher than usual.

The most likely thing is that you have some of your test set included in your 
training set,
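
A quick way to check is to count how many test sentences occur verbatim
in the training data, e.g. with a small standalone program like the
sketch below (the file names are just whatever you used for your
corpora):

#include <fstream>
#include <iostream>
#include <set>
#include <string>

// Count test sentences that also occur verbatim in the training corpus.
// Usage: overlap_check train.de test.de
int main(int argc, char** argv) {
  if (argc != 3) {
    std::cerr << "usage: " << argv[0] << " train test" << std::endl;
    return 1;
  }
  std::ifstream train(argv[1]), test(argv[2]);
  std::set<std::string> trainLines;
  std::string line;
  while (std::getline(train, line)) trainLines.insert(line);

  size_t total = 0, overlap = 0;
  while (std::getline(test, line)) {
    ++total;
    if (trainLines.count(line)) ++overlap;
  }
  std::cout << overlap << " of " << total
            << " test sentences also occur in the training data" << std::endl;
  return 0;
}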

cheers - Barry

On Thursday 26 April 2012 19:18:33 Daniel Schaut wrote:
 [...]
 
--
Barry Haddow
University of Edinburgh
+44 (0) 131 651 3173




Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

2012-04-26 Thread Francis Tyers
On Thursday 26 April 2012 at 20:18 +0200, Daniel Schaut wrote:
 [...]

Did you try looking at the sentences? With 1,000 there are few enough to
eyeball them. Have you tried the same system with a different corpus
(e.g. EuroParl)? Have you checked that your test set and your training
set do not intersect?

If the scores don't seem believable, then probably they aren't :)

Fran



Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

2012-04-26 Thread John D Burger
I =think= I recall that pairwise BLEU scores for human translators are usually 
around 0.50, so anything much better than that is indeed suspect.

- JB

On Apr 26, 2012, at 14:18 , Daniel Schaut wrote:

 [...]




Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

2012-04-26 Thread Miles Osborne
Very short sentences will give you high scores.

Also, multiple references will boost them.
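
Both effects follow directly from how BLEU is defined (standard
formulation, independent of the particular scoring script):

  \mathrm{BLEU} = \mathrm{BP}\cdot\exp\Big(\sum_{n=1}^{4} w_n \log p_n\Big),
  \qquad
  \mathrm{BP} = \min\big(1,\ e^{\,1 - r/c}\big)

where the p_n are the modified n-gram precisions, c is the total candidate
length and r the effective reference length. Short, formulaic sentences
push the p_n up, and with multiple references an n-gram only has to match
any one of them, so the precisions (and the score) rise again.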

Miles
On Apr 26, 2012 8:13 PM, John D Burger j...@mitre.org wrote:

 [...]



Re: [Moses-support] Merging language models with IRSTLM..?

2012-04-26 Thread Marcello Federico
Hi,

we are currently working on a project that includes incremental training
of LMs. Hence, there are plans to introduce quick adaptation in IRSTLM,
but not soon.

The question is indeed how often you need to adapt the LM. If you are
working with large news LMs, then it seems that adapting once a week is
enough (you simply do not collect enough data in fewer days to
significantly change the LM).

If you want to continuously update the LM, you can also consider using
external interpolation: you interpolate two distinct LMs, one fixed and
one smaller that is continuously retrained (which should be fast), using
the interpolate-lm command (see the manual).
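
At query time the two models are then combined linearly, i.e. something
along the lines of

  P(w \mid h) = \lambda\, P_{\text{fixed}}(w \mid h)
              + (1-\lambda)\, P_{\text{incr}}(w \mid h)

with the weight \lambda estimated on held-out data (the subscripts here
are just labels for the two LMs, not IRSTLM terminology).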

Greetings,
Marcello

 

On Apr 22, 2012, at 9:12 PM, Pratyush Banerjee wrote:

 Hi,
 
 I have recently been trying to create incrementally adapted language
 models using IRSTLM.

 I have an in-domain data set on which the mixture adaptation weights are
 computed using the -lm=mix option, and a larger out-of-domain dataset
 from which I incrementally add data to create adapted LMs of different
 sizes.
 
 Currently, every time saveBIN is called, the entire lmtable is estimated
 and saved, which makes the process slow...
 
 Is there functionality in IRSTLM to incrementally train/save adapted
 language models?
 
 Secondly, given an existing adapted language model in ARPA format (old)
 and another small language model built on the incremental data (new),
 would it be safe to update the smoothed probabilities (f*) using the
 following formula:

 c_sum(wh) = c_old(wh) + c_new(wh)
 f*(w|h) = f*_old(w|h)*(c_old(wh)/c_sum(wh)) + f*_new(w|h)*(c_new(wh)/c_sum(wh))
 
 where the c_old and c_new counts are estimated from the ngram tables?
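
 For illustration, the combination for a single (history, word) pair
 would be computed as in the standalone sketch below (made-up numbers;
 it does not touch any IRSTLM data structures):

 #include <iostream>

 // Combine two smoothed probabilities f*_old(w|h) and f*_new(w|h),
 // weighting each by the relative frequency of the history h in the old
 // and new data, as in the formula above.
 double combine_fstar(double fstar_old, double c_old_wh,
                      double fstar_new, double c_new_wh) {
   const double c_sum = c_old_wh + c_new_wh;
   if (c_sum == 0.0) return 0.0;  // history unseen in both data sets
   return fstar_old * (c_old_wh / c_sum) + fstar_new * (c_new_wh / c_sum);
 }

 int main() {
   // toy example: history seen 90 times in the old data, 10 in the new
   std::cout << combine_fstar(0.20, 90.0, 0.35, 10.0) << std::endl;  // 0.215
   return 0;
 }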
 
 
 Thanks and Regards,
 
 Pratyush
 
 

