Re: [Moses-support] Phrase Extraction Problem

2013-01-31 Thread Cuong Hoang
I did it, following Barry's suggestion.
I tested on a tiny corpus of 2 sentence pairs and generated 800
bilingual phrases :-D
Thanks to you, Barry, and to Prof. Marcello Federico.
On Thu, Jan 31, 2013 at 4:08 AM, Barry Haddow bhad...@staffmail.ed.ac.uk wrote:

  Hi Cuong

 If you pass the aligned sentences through your phrase extraction and
 through Moses phrase extraction, one at a time, then you should be able to
 see where the difference is. As Marcello said, it could be in the handling
 of unaligned words.

 cheers - Barry


 On 30/01/13 16:39, Cuong Hoang wrote:

 Hi all,
 I wrote a phrase extractor following the simple rule from Koehn et al.,
 2003:

 ``We collect all aligned phrase pairs that are consistent with the word
 alignment: The words in a legal phrase pair are only aligned to each other,
 and not to words outside.''

 I tested it on a fairly large bilingual corpus containing 500,000 sentence
 pairs and obtained 33 million phrase pairs.
 However, when I use Moses to extract phrases, I obtain around 90 million
 pairs.

 Does Moses use some other rules, or is there something wrong with my
 implementation?

  Thanks,
 C. Hoang

-- 
Best Regards,
C. Hoang

{Mimosa, SMT}@Addict
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Phrase Extraction Problem

2013-01-30 Thread Marcello Federico
Hi, the total number of extracted phrases in a sentence pair depends on:
- the particular word alignment you are considering
- the heuristic you adopt for the words left unaligned or aligned with the null word
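
To illustrate the second point, here is a minimal Python sketch (not Moses code, and not the poster's implementation). It applies the consistency rule quoted in this thread and has one switch, `extend_unaligned`, a hypothetical name introduced here. With the switch off, only "tight" phrase pairs whose boundary words are all aligned are kept. With it on, spans may also grow over neighbouring unaligned words, which is a common convention. The thread does not confirm that this accounts for the 33 million vs. 90 million gap; the sketch only shows that the choice can multiply the count.

```python
def extract_phrases(src_len, tgt_len, alignment, max_len=7, extend_unaligned=True):
    """Return consistent phrase pairs as ((s1, s2), (t1, t2)) inclusive spans.

    alignment is a set of (source_index, target_index) links.
    extend_unaligned is an illustrative switch, not a Moses option.
    """
    src_aligned = {s for s, _ in alignment}
    tgt_aligned = {t for _, t in alignment}
    pairs = set()
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            # Target positions linked to the source span [s1, s2].
            linked = [t for s, t in alignment if s1 <= s <= s2]
            if not linked:
                continue  # require at least one alignment point
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no word in [t1, t2] may be linked outside [s1, s2].
            if any(t1 <= t <= t2 and not (s1 <= s <= s2) for s, t in alignment):
                continue
            if not extend_unaligned:
                # Tight variant: source boundary words must be aligned too.
                if s1 in src_aligned and s2 in src_aligned:
                    pairs.add(((s1, s2), (t1, t2)))
                continue
            # Grow the target span over adjacent unaligned target words.
            lo = t1
            while lo >= 0 and (lo == t1 or lo not in tgt_aligned):
                hi = t2
                while hi < tgt_len and (hi == t2 or hi not in tgt_aligned):
                    if hi - lo + 1 <= max_len:
                        pairs.add(((s1, s2), (lo, hi)))
                    hi += 1
                lo -= 1
    return pairs


# Toy sentence pair: 3 words on each side, middle word unaligned on both sides.
links = {(0, 0), (2, 2)}
tight = extract_phrases(3, 3, links, extend_unaligned=False)
extended = extract_phrases(3, 3, links, extend_unaligned=True)
print(len(tight), len(extended))  # 3 9
```

On this toy pair, the same alignment yields 3 phrase pairs under the tight reading and 9 once unaligned words may be absorbed, so a factor of roughly three between two extractors is plausible from this heuristic alone.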

Greetings,

Marcello

---
Short from my mobile phone



Re: [Moses-support] Phrase Extraction Problem

2013-01-30 Thread Barry Haddow

Hi Cuong

If you pass the aligned sentences through your phrase extraction and 
through Moses phrase extraction, one at a time, then you should be able 
to see where the difference is. As Marcello said, it could be in the 
handling of unaligned words.
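
The per-sentence comparison suggested above could be sketched as follows. This is a minimal Python sketch under stated assumptions: each extractor's output for one sentence pair is available as text lines of the form `source phrase ||| target phrase ||| ...`, and the helper names `parse_extract_line` and `diff_phrase_pairs` are hypothetical, introduced only for illustration.

```python
def parse_extract_line(line):
    """Return (source_phrase, target_phrase) from a '|||'-separated line.

    Assumes the first two fields are the phrases; later fields are ignored.
    """
    fields = [field.strip() for field in line.split("|||")]
    return fields[0], fields[1]


def diff_phrase_pairs(my_lines, reference_lines):
    """Report the phrase pairs that only one of the two extractors produced."""
    mine = {parse_extract_line(line) for line in my_lines}
    reference = {parse_extract_line(line) for line in reference_lines}
    return {
        "only_mine": sorted(mine - reference),
        "only_reference": sorted(reference - mine),
    }


# Made-up output for a single sentence pair, for illustration only.
my_output = ["das Haus ||| the house"]
reference_output = [
    "das Haus ||| the house ||| 0-0 1-1",
    "das ||| the ||| 0-0",
]
report = diff_phrase_pairs(my_output, reference_output)
print(report["only_reference"])  # [('das', 'the')]
```

Running this on one sentence pair at a time keeps the difference small enough to inspect by eye, for example to check whether every missing pair has an unaligned word at a phrase boundary.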


cheers - Barry
