Re: [Moses-support] Phrase Extraction Problem
I did it, following Barry's suggestion. I tested on a tiny corpus of 2 sentence pairs, and it generates 800 bilingual phrases :-D

Thanks to you, Barry, and Prof. Marcello Federico.

On Thu, Jan 31, 2013 at 4:08 AM, Barry Haddow bhad...@staffmail.ed.ac.uk wrote:

Hi Cuong

If you pass the aligned sentences through your phrase extraction, and through Moses phrase extraction, one at a time, then you should be able to see where the difference is. As Marcello said, it could be in the handling of unaligned words,

cheers - Barry

On 30/01/13 16:39, Cuong Hoang wrote:

Hi all,

I wrote a phrase extractor using the simple rule from Koehn et al., 2003:

``We collect all aligned phrase pairs that are consistent with the word alignment: The words in a legal phrase pair are only aligned to each other, and not to words outside.''

I tested it on a fairly large bilingual corpus containing 500,000 sentence pairs and obtained 33 million phrase pairs. However, when I use Moses to extract phrases, I obtain around 90 million pairs. Does Moses use some other rules, or is something wrong with my implementation?

Thanks,
C. Hoang

--
Best Regards,
C. Hoang
{Mimosa, SMT}@Addict

___
Moses-support mailing list
moses-supp...@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] Phrase Extraction Problem
Hi,

the total number of extracted phrases in a sentence pair depends on:
- the particular word alignment you are considering
- the heuristic you adopt for the words left unaligned or aligned with the null word

Greetings,
Marcello

---
Short from my mobile phone
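To make the second point concrete, here is a minimal Python sketch of consistency-based extraction with a switch for one possible unaligned-word heuristic. The function name `extract_phrases`, the `extend_unaligned` flag, and the `max_len` limit are all made up for this illustration; this is not the Moses code, and the exact heuristic Moses applies should be checked against its own documentation or source.

```python
def extract_phrases(src, tgt, alignment, max_len=7, extend_unaligned=True):
    """Collect phrase pairs consistent with a word alignment.

    src, tgt: lists of tokens.
    alignment: set of (i, j) pairs linking src[i] to tgt[j].
    extend_unaligned: hypothetical switch. If True, phrases may start or end
    with unaligned words; if False, only "tight" phrases whose boundary
    words are all aligned are kept.
    """
    src_aligned = {i for i, _ in alignment}
    tgt_aligned = {j for _, j in alignment}
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            if not extend_unaligned and (i1 not in src_aligned or i2 not in src_aligned):
                continue  # tight variant: skip unaligned source boundaries
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue  # a phrase pair needs at least one alignment point
            j1, j2 = min(linked), max(linked)
            # Consistency: no target word inside [j1, j2] may be aligned
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            # Optionally grow the target span over neighbouring unaligned words.
            js = j1
            while True:
                je = j2
                while True:
                    if je - js + 1 <= max_len:
                        pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[js:je + 1])))
                    je += 1
                    if not extend_unaligned or je >= len(tgt) or je in tgt_aligned:
                        break
                js -= 1
                if not extend_unaligned or js < 0 or js in tgt_aligned:
                    break
    return pairs


# Toy sentence pair: "b" and "y" are left unaligned.
src = ["a", "b", "c"]
tgt = ["x", "y", "z"]
alignment = {(0, 0), (2, 2)}

loose = extract_phrases(src, tgt, alignment, extend_unaligned=True)
tight = extract_phrases(src, tgt, alignment, extend_unaligned=False)
print(len(loose), len(tight))
```

On this toy pair the tight variant yields 3 phrase pairs and the extended variant 9, so the choice of heuristic alone can change the count by a large factor.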
Re: [Moses-support] Phrase Extraction Problem
Hi Cuong

If you pass the aligned sentences through your phrase extraction, and through Moses phrase extraction, one at a time, then you should be able to see where the difference is. As Marcello said, it could be in the handling of unaligned words,

cheers - Barry
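A rough sketch of that one-sentence-at-a-time comparison in Python follows. The helper names `parse_pairs` and `diff_phrase_sets` are hypothetical, and the sketch assumes each extractor writes one phrase pair per line with fields separated by `|||` (source phrase first, target phrase second); adjust the parsing if your files look different.

```python
def parse_pairs(lines):
    """Read (source phrase, target phrase) pairs from '|||'-separated lines.

    Assumption: the first two fields are the source and target phrases;
    any further fields (e.g. alignment points) are ignored.
    """
    pairs = set()
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            pairs.add((fields[0], fields[1]))
    return pairs


def diff_phrase_sets(mine, reference):
    """Report which phrase pairs appear in only one of the two extractions."""
    mine, reference = set(mine), set(reference)
    return {
        "only_mine": sorted(mine - reference),
        "only_reference": sorted(reference - mine),
        "shared": len(mine & reference),
    }


# Made-up output for a single sentence pair from the two extractors.
my_output = ["a ||| x", "c ||| z"]
other_output = ["a ||| x ||| 0-0", "a ||| x y ||| 0-0", "c ||| z ||| 0-0"]

report = diff_phrase_sets(parse_pairs(my_output), parse_pairs(other_output))
print(report)
```

If the pairs listed under `only_reference` all begin or end with an unaligned word, the gap most likely comes from the unaligned-word handling rather than from a bug in the consistency check.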