
Re: [Moses-support] differences between moses and moses2 output

2016-10-13 Thread Hieu Hoang

hiya

Hieu Hoang
http://www.hoang.co.uk/hieu

On 13 October 2016 at 15:08, Vito Mandorino <
vito.mandor...@linguacustodia.com> wrote:

> We haven't checked the probingpt + minlexr speedup yet, however we have
> found some further differences in the output with respect to the standard
> Moses decoder.
>
> Sometimes the placeholders are replaced with the actual numbers in the
> wrong order. For instance:
>
> moses2 output: as of december 2012 , 31
> moses output: as of december 31 , 2012
>
> moses2 output: à jour au 2013 février 15
> moses output: à jour au 15 février 2013
>
> Is this the expected behavior?
>
no, they should work the same way. Model files and example input would be
good so I can replicate it
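
To narrow down a small example input, something like the following could help. It is only a sketch in Python, not part of Moses; the function name is made up for illustration, and it assumes both decoders were run on the same input and write one output line per source sentence.

```python
from itertools import zip_longest


def differing_sentences(source_lines, moses_lines, moses2_lines):
    """Collect every sentence for which the two decoders disagree.

    Returns a list of (line_number, source, moses_output, moses2_output)
    tuples, with line numbers starting at 1.
    """
    diffs = []
    rows = zip_longest(source_lines, moses_lines, moses2_lines, fillvalue="")
    for number, (src, out1, out2) in enumerate(rows, start=1):
        # Compare token sequences so that differences in spacing or a
        # trailing "\r" / "\n" do not count as a mismatch.
        if out1.split() != out2.split():
            diffs.append((number, src.strip(), out1.strip(), out2.strip()))
    return diffs
```

The three arguments can be open file handles or plain lists of strings. The source sentences it returns are the ones worth sending along with the model files.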

>
> Another minor difference is the handling of the carriage return character
> ("\r") . It seems to be deleted by standard Moses and converted into
> newline by Moses2.
>
there's no explicit handling of this in either moses or moses2, so whatever
happens now is incidental and not guaranteed. You're better off preprocessing
the input to remove \r and other non-printing characters
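
A minimal sketch of such a preprocessing step, in Python. This is not something shipped with Moses. The function name is hypothetical, and turning tabs into spaces (rather than deleting them) is my own assumption so that tokens stay separated.

```python
import unicodedata


def strip_nonprinting(line: str) -> str:
    """Clean one input line (without its trailing newline) for the decoder."""
    cleaned = []
    for ch in line:
        if ch == "\t":
            # Assumption: keep tokens apart by turning a tab into a space.
            cleaned.append(" ")
        elif unicodedata.category(ch) == "Cc":
            # "Cc" is the Unicode control category: \r, \x00, \x7f, ...
            continue
        else:
            cleaned.append(ch)
    return "".join(cleaned)
```

Apply it to each line after stripping the trailing "\n" and write the result back out with a single "\n", so both decoders see identical input.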

>
> Best,
> Vito
>
> 2016-10-07 17:24 GMT+02:00 Hieu Hoang :
>
>> yep, it should give you a big speedup compared to probingpt + minlexr
>> model
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 7 October 2016 at 16:21, Vito Mandorino <vito.mandor...@linguacustodia.com> wrote:
>>
>>> Yes I modified the line in the moses.ini . My comparison was with
>>> respect to probingPT + minlexr reordering model (rather than .gz reordering
>>> model)
>>>
>>> 2016-10-07 16:25 GMT+02:00 Hieu Hoang :
>>>
 weird. it should be a massive speedup (~500%). You have to change the
 moses.ini file slightly

   [feature]
   LexicalReordering … path=reordering-table.msd-bidirectional-fe.0.5.0-0.gz
 to
   [feature]
   LexicalReordering … property-index=0


 Hieu Hoang
 http://www.hoang.co.uk/hieu

 On 7 October 2016 at 15:02, Vito Mandorino <
 vito.mandor...@linguacustodia.com> wrote:

> Yes, that worked for me as well, thank you. There is a little
> improvement in speed but not that much actually (about 5% faster using 30
> threads).
>
> 2016-10-04 11:44 GMT+02:00 Hieu Hoang :
>
>> yes - the script expects the files to be gzipped.
>> It runs ok for me. I executed this:
>>
>> MOSES_DIR=~/workspace/github/mosesdecoder.perf
>>
>> $MOSES_DIR/scripts/generic/binarize4moses2.perl
>> --phrase-table=phrase-table.gz 
>> --lex-ro=reordering-table.wbe-msd-bidirectional-fe.gz
>> --output-dir=integrated_phrase-reordering/ --num-lex-scores=6
>>
>> Got this:
>>
>> Executing: gzip -dc phrase-table.gz | /home/hieu/workspace/github/mosesdecoder.perf/scripts/generic/../../contrib/sigtest-filter/filter-pt -n 0 | gzip -c > ./tmp.14373/pt.gz
>> ...
>> Reading phrase table finished, writing remaining files to disk.
>>
>> $ ll integrated_phrase-reordering/
>> total 24688
>> drwxrwxr-x 2 hieu hieu 4096 Oct  4 10:38 ./
>> drwxrwxr-x 5 hieu hieu 4096 Oct  4 10:42 ../
>> -rw-rw-r-- 1 hieu hieu   917861 Oct  4 10:42 Alignments.dat
>> -rw-rw-r-- 1 hieu hieu  2267885 Oct  4 10:42 cache
>> -rw-rw-r-- 1 hieu hieu   76 Oct  4 10:42 config
>> -rw-rw-r-- 1 hieu hieu  3146720 Oct  4 10:42 probing_hash.dat
>> -rw-rw-r-- 1 hieu hieu   333856 Oct  4 10:42 source_vocabids
>> -rw-rw-r-- 1 hieu hieu 18429920 Oct  4 10:42 TargetColl.dat
>> -rw-rw-r-- 1 hieu hieu   121401 Oct  4 10:42 TargetVocab.dat
>>
>>
>> On 04/10/2016 09:06, Vito Mandorino wrote:
>>
>> The command was
>>
>> perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl
>> --phrase-table=/home/vito/phrase-table.sorted
>> --lex-ro=/home/vito/reordering-table.sorted
>> --output-dir=/home/vito/integrated_phrase-reordering/
>> --num-lex-scores=6
>>
>> The tables in the command are sorted with LC_ALL . I attach them in
>> .gz format. Should one use the .gz format also in the command above?
>>
>> Vito
>>
>>
>>
>
>
> --
> *M**. Vito MANDORINO -- Chief Scientist*
>
>
> [image: Description : Description : lingua_custodia_final full logo]
>
>  *The Translation Trustee*
>
> *1, Place Charles de Gaulle, **78180 Montigny-le-Bretonneux*
>
> *Tel : +33 1 30 44 04 23   Mobile : +33 6 84 65 68 89
> <%2B33%206%2084%2065%2068%2089>*
>
> *Email :*  *vito.mandor...@linguacustodia.com
> *
>
> *Website :*
> *www.linguacustodia.finance *
>


>>>
>>>

Re: [Moses-support] differences between moses and moses2 output

2016-10-13 Thread Vito Mandorino

We haven't checked the probingpt + minlexr speedup yet, however we have
found some further differences in the output with respect to the standard
Moses decoder.

Sometimes the placeholders are replaced with the actual numbers in the
wrong order. For instance:

moses2 output: as of december 2012 , 31
moses output: as of december 31 , 2012

moses2 output: à jour au 2013 février 15
moses output: à jour au 15 février 2013

Is this the expected behavior?

Another minor difference is the handling of the carriage return character
("\r") . It seems to be deleted by standard Moses and converted into
newline by Moses2.

Best,
Vito

2016-10-07 17:24 GMT+02:00 Hieu Hoang :

> yep, it should give you a big speedup compared to probingpt + minlexr model
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 7 October 2016 at 16:21, Vito Mandorino <vito.mandor...@linguacustodia.com> wrote:
>
>> Yes I modified the line in the moses.ini . My comparison was with respect
>> to probingPT + minlexr reordering model (rather than .gz reordering model)
>>
>> 2016-10-07 16:25 GMT+02:00 Hieu Hoang :
>>
>>> weird. it should be a massive speedup (~500%). You have to change the
>>> moses.ini file slightly
>>>
>>>   [feature]
>>>   LexicalReordering … path=reordering-table.msd-bidirectional-fe.0.5.0-0.gz
>>> to
>>>   [feature]
>>>   LexicalReordering … property-index=0
>>>
>>>
>>> Hieu Hoang
>>> http://www.hoang.co.uk/hieu
>>>
>>> On 7 October 2016 at 15:02, Vito Mandorino <
>>> vito.mandor...@linguacustodia.com> wrote:
>>>
 Yes, that worked for me as well, thank you. There is a little
 improvement in speed but not that much actually (about 5% faster using 30
 threads).

 2016-10-04 11:44 GMT+02:00 Hieu Hoang :

> yes - the script expects the files to be gzipped.
> It runs ok for me. I executed this:
>
> MOSES_DIR=~/workspace/github/mosesdecoder.perf
>
> $MOSES_DIR/scripts/generic/binarize4moses2.perl
> --phrase-table=phrase-table.gz 
> --lex-ro=reordering-table.wbe-msd-bidirectional-fe.gz
> --output-dir=integrated_phrase-reordering/ --num-lex-scores=6
>
> Got this:
>
> Executing: gzip -dc phrase-table.gz | /home/hieu/workspace/github/mosesdecoder.perf/scripts/generic/../../contrib/sigtest-filter/filter-pt -n 0 | gzip -c > ./tmp.14373/pt.gz
> ...
> Reading phrase table finished, writing remaining files to disk.
>
> $ ll integrated_phrase-reordering/
> total 24688
> drwxrwxr-x 2 hieu hieu 4096 Oct  4 10:38 ./
> drwxrwxr-x 5 hieu hieu 4096 Oct  4 10:42 ../
> -rw-rw-r-- 1 hieu hieu   917861 Oct  4 10:42 Alignments.dat
> -rw-rw-r-- 1 hieu hieu  2267885 Oct  4 10:42 cache
> -rw-rw-r-- 1 hieu hieu   76 Oct  4 10:42 config
> -rw-rw-r-- 1 hieu hieu  3146720 Oct  4 10:42 probing_hash.dat
> -rw-rw-r-- 1 hieu hieu   333856 Oct  4 10:42 source_vocabids
> -rw-rw-r-- 1 hieu hieu 18429920 Oct  4 10:42 TargetColl.dat
> -rw-rw-r-- 1 hieu hieu   121401 Oct  4 10:42 TargetVocab.dat
>
>
> On 04/10/2016 09:06, Vito Mandorino wrote:
>
> The command was
>
> perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl
> --phrase-table=/home/vito/phrase-table.sorted
> --lex-ro=/home/vito/reordering-table.sorted
> --output-dir=/home/vito/integrated_phrase-reordering/
> --num-lex-scores=6
>
> The tables in the command are sorted with LC_ALL . I attach them in
> .gz format. Should one use the .gz format also in the command above?
>
> Vito
>
>
>



>>>
>>>
>>
>>
>>
>
>



Re: [Moses-support] differences between moses and moses2 output

2016-10-07 Thread Vito Mandorino

Yes I modified the line in the moses.ini . My comparison was with respect
to probingPT + minlexr reordering model (rather than .gz reordering model)

2016-10-07 16:25 GMT+02:00 Hieu Hoang :

> weird. it should be a massive speedup (~500%). You have to change the
> moses.ini file slightly
>
>   [feature]
>   LexicalReordering … path=reordering-table.msd-bidirectional-fe.0.5.0-0.gz
> to
>   [feature]
>   LexicalReordering … property-index=0
>
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 7 October 2016 at 15:02, Vito Mandorino <vito.mandor...@linguacustodia.com> wrote:
>
>> Yes, that worked for me as well, thank you. There is a little improvement
>> in speed but not that much actually (about 5% faster using 30 threads).
>>
>> 2016-10-04 11:44 GMT+02:00 Hieu Hoang :
>>
>>> yes - the script expects the files to be gzipped.
>>> It runs ok for me. I executed this:
>>>
>>> MOSES_DIR=~/workspace/github/mosesdecoder.perf
>>>
>>> $MOSES_DIR/scripts/generic/binarize4moses2.perl
>>> --phrase-table=phrase-table.gz 
>>> --lex-ro=reordering-table.wbe-msd-bidirectional-fe.gz
>>> --output-dir=integrated_phrase-reordering/ --num-lex-scores=6
>>>
>>> Got this:
>>>
>>> Executing: gzip -dc phrase-table.gz | /home/hieu/workspace/github/mosesdecoder.perf/scripts/generic/../../contrib/sigtest-filter/filter-pt -n 0 | gzip -c > ./tmp.14373/pt.gz
>>> ...
>>> Reading phrase table finished, writing remaining files to disk.
>>>
>>> $ ll integrated_phrase-reordering/
>>> total 24688
>>> drwxrwxr-x 2 hieu hieu 4096 Oct  4 10:38 ./
>>> drwxrwxr-x 5 hieu hieu 4096 Oct  4 10:42 ../
>>> -rw-rw-r-- 1 hieu hieu   917861 Oct  4 10:42 Alignments.dat
>>> -rw-rw-r-- 1 hieu hieu  2267885 Oct  4 10:42 cache
>>> -rw-rw-r-- 1 hieu hieu   76 Oct  4 10:42 config
>>> -rw-rw-r-- 1 hieu hieu  3146720 Oct  4 10:42 probing_hash.dat
>>> -rw-rw-r-- 1 hieu hieu   333856 Oct  4 10:42 source_vocabids
>>> -rw-rw-r-- 1 hieu hieu 18429920 Oct  4 10:42 TargetColl.dat
>>> -rw-rw-r-- 1 hieu hieu   121401 Oct  4 10:42 TargetVocab.dat
>>>
>>>
>>> On 04/10/2016 09:06, Vito Mandorino wrote:
>>>
>>> The command was
>>>
>>> perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl
>>> --phrase-table=/home/vito/phrase-table.sorted
>>> --lex-ro=/home/vito/reordering-table.sorted
>>> --output-dir=/home/vito/integrated_phrase-reordering/ --num-lex-scores=6
>>>
>>> The tables in the command are sorted with LC_ALL . I attach them in .gz
>>> format. Should one use the .gz format also in the command above?
>>>
>>> Vito
>>>
>>>
>>>
>>
>>
>>
>
>


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

Re: [Moses-support] differences between moses and moses2 output

2016-10-07 Thread Hieu Hoang

yep, it should give you a big speedup compared to probingpt + minlexr model

Hieu Hoang
http://www.hoang.co.uk/hieu

On 7 October 2016 at 16:21, Vito Mandorino <
vito.mandor...@linguacustodia.com> wrote:

> Yes I modified the line in the moses.ini . My comparison was with respect
> to probingPT + minlexr reordering model (rather than .gz reordering model)
>
> 2016-10-07 16:25 GMT+02:00 Hieu Hoang :
>
>> weird. it should be a massive speedup (~500%). You have to change the
>> moses.ini file slightly
>>
>>   [feature]
>>   LexicalReordering … path=reordering-table.msd-bidirectional-fe.0.5.0-0.gz
>> to
>>   [feature]
>>   LexicalReordering … property-index=0
>>
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 7 October 2016 at 15:02, Vito Mandorino <vito.mandor...@linguacustodia.com> wrote:
>>
>>> Yes, that worked for me as well, thank you. There is a little
>>> improvement in speed but not that much actually (about 5% faster using 30
>>> threads).
>>>
>>> 2016-10-04 11:44 GMT+02:00 Hieu Hoang :
>>>
 yes - the script expects the files to be gzipped.
 It runs ok for me. I executed this:

 MOSES_DIR=~/workspace/github/mosesdecoder.perf

 $MOSES_DIR/scripts/generic/binarize4moses2.perl
 --phrase-table=phrase-table.gz 
 --lex-ro=reordering-table.wbe-msd-bidirectional-fe.gz
 --output-dir=integrated_phrase-reordering/ --num-lex-scores=6

 Got this:

 Executing: gzip -dc phrase-table.gz | /home/hieu/workspace/github/mosesdecoder.perf/scripts/generic/../../contrib/sigtest-filter/filter-pt -n 0 | gzip -c > ./tmp.14373/pt.gz
 ...
 Reading phrase table finished, writing remaining files to disk.

 $ ll integrated_phrase-reordering/
 total 24688
 drwxrwxr-x 2 hieu hieu 4096 Oct  4 10:38 ./
 drwxrwxr-x 5 hieu hieu 4096 Oct  4 10:42 ../
 -rw-rw-r-- 1 hieu hieu   917861 Oct  4 10:42 Alignments.dat
 -rw-rw-r-- 1 hieu hieu  2267885 Oct  4 10:42 cache
 -rw-rw-r-- 1 hieu hieu   76 Oct  4 10:42 config
 -rw-rw-r-- 1 hieu hieu  3146720 Oct  4 10:42 probing_hash.dat
 -rw-rw-r-- 1 hieu hieu   333856 Oct  4 10:42 source_vocabids
 -rw-rw-r-- 1 hieu hieu 18429920 Oct  4 10:42 TargetColl.dat
 -rw-rw-r-- 1 hieu hieu   121401 Oct  4 10:42 TargetVocab.dat


 On 04/10/2016 09:06, Vito Mandorino wrote:

 The command was

 perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl
 --phrase-table=/home/vito/phrase-table.sorted
 --lex-ro=/home/vito/reordering-table.sorted
 --output-dir=/home/vito/integrated_phrase-reordering/
 --num-lex-scores=6

 The tables in the command are sorted with LC_ALL . I attach them in .gz
 format. Should one use the .gz format also in the command above?

 Vito



>>>
>>>
>>>
>>
>>
>
>

Re: [Moses-support] differences between moses and moses2 output

2016-10-07 Thread Hieu Hoang

weird. it should be a massive speedup (~500%). You have to change the
moses.ini file slightly

  [feature]
  LexicalReordering … path=reordering-table.msd-bidirectional-fe.0.5.0-0.gz
to
  [feature]
  LexicalReordering … property-index=0


Hieu Hoang
http://www.hoang.co.uk/hieu

On 7 October 2016 at 15:02, Vito Mandorino <
vito.mandor...@linguacustodia.com> wrote:

> Yes, that worked for me as well, thank you. There is a little improvement
> in speed but not that much actually (about 5% faster using 30 threads).
>
> 2016-10-04 11:44 GMT+02:00 Hieu Hoang :
>
>> yes - the script expects the files to be gzipped.
>> It runs ok for me. I executed this:
>>
>> MOSES_DIR=~/workspace/github/mosesdecoder.perf
>>
>> $MOSES_DIR/scripts/generic/binarize4moses2.perl
>> --phrase-table=phrase-table.gz 
>> --lex-ro=reordering-table.wbe-msd-bidirectional-fe.gz
>> --output-dir=integrated_phrase-reordering/ --num-lex-scores=6
>>
>> Got this:
>>
>> Executing: gzip -dc phrase-table.gz | /home/hieu/workspace/github/mosesdecoder.perf/scripts/generic/../../contrib/sigtest-filter/filter-pt -n 0 | gzip -c > ./tmp.14373/pt.gz
>> ...
>> Reading phrase table finished, writing remaining files to disk.
>>
>> $ ll integrated_phrase-reordering/
>> total 24688
>> drwxrwxr-x 2 hieu hieu 4096 Oct  4 10:38 ./
>> drwxrwxr-x 5 hieu hieu 4096 Oct  4 10:42 ../
>> -rw-rw-r-- 1 hieu hieu   917861 Oct  4 10:42 Alignments.dat
>> -rw-rw-r-- 1 hieu hieu  2267885 Oct  4 10:42 cache
>> -rw-rw-r-- 1 hieu hieu   76 Oct  4 10:42 config
>> -rw-rw-r-- 1 hieu hieu  3146720 Oct  4 10:42 probing_hash.dat
>> -rw-rw-r-- 1 hieu hieu   333856 Oct  4 10:42 source_vocabids
>> -rw-rw-r-- 1 hieu hieu 18429920 Oct  4 10:42 TargetColl.dat
>> -rw-rw-r-- 1 hieu hieu   121401 Oct  4 10:42 TargetVocab.dat
>>
>>
>> On 04/10/2016 09:06, Vito Mandorino wrote:
>>
>> The command was
>>
>> perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl
>> --phrase-table=/home/vito/phrase-table.sorted
>> --lex-ro=/home/vito/reordering-table.sorted
>> --output-dir=/home/vito/integrated_phrase-reordering/ --num-lex-scores=6
>>
>> The tables in the command are sorted with LC_ALL . I attach them in .gz
>> format. Should one use the .gz format also in the command above?
>>
>> Vito
>>
>>
>>
>
>

Re: [Moses-support] differences between moses and moses2 output

2016-10-07 Thread Vito Mandorino

Yes, that worked for me as well, thank you. There is a little improvement
in speed but not that much actually (about 5% faster using 30 threads).

2016-10-04 11:44 GMT+02:00 Hieu Hoang :

> yes - the script expects the files to be gzipped.
> It runs ok for me. I executed this:
>
> MOSES_DIR=~/workspace/github/mosesdecoder.perf
>
> $MOSES_DIR/scripts/generic/binarize4moses2.perl
> --phrase-table=phrase-table.gz 
> --lex-ro=reordering-table.wbe-msd-bidirectional-fe.gz
> --output-dir=integrated_phrase-reordering/ --num-lex-scores=6
>
> Got this:
>
> Executing: gzip -dc phrase-table.gz | /home/hieu/workspace/github/mosesdecoder.perf/scripts/generic/../../contrib/sigtest-filter/filter-pt -n 0 | gzip -c > ./tmp.14373/pt.gz
> ...
> Reading phrase table finished, writing remaining files to disk.
>
> $ ll integrated_phrase-reordering/
> total 24688
> drwxrwxr-x 2 hieu hieu 4096 Oct  4 10:38 ./
> drwxrwxr-x 5 hieu hieu 4096 Oct  4 10:42 ../
> -rw-rw-r-- 1 hieu hieu   917861 Oct  4 10:42 Alignments.dat
> -rw-rw-r-- 1 hieu hieu  2267885 Oct  4 10:42 cache
> -rw-rw-r-- 1 hieu hieu   76 Oct  4 10:42 config
> -rw-rw-r-- 1 hieu hieu  3146720 Oct  4 10:42 probing_hash.dat
> -rw-rw-r-- 1 hieu hieu   333856 Oct  4 10:42 source_vocabids
> -rw-rw-r-- 1 hieu hieu 18429920 Oct  4 10:42 TargetColl.dat
> -rw-rw-r-- 1 hieu hieu   121401 Oct  4 10:42 TargetVocab.dat
>
>
> On 04/10/2016 09:06, Vito Mandorino wrote:
>
> The command was
>
> perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl
> --phrase-table=/home/vito/phrase-table.sorted
> --lex-ro=/home/vito/reordering-table.sorted 
> --output-dir=/home/vito/integrated_phrase-reordering/
> --num-lex-scores=6
>
> The tables in the command are sorted with LC_ALL . I attach them in .gz
> format. Should one use the .gz format also in the command above?
>
> Vito
>
>
>



Re: [Moses-support] differences between moses and moses2 output

2016-10-04 Thread Hieu Hoang


yes - the script expects the files to be gzipped.

It runs ok for me. I executed this:

MOSES_DIR=~/workspace/github/mosesdecoder.perf

$MOSES_DIR/scripts/generic/binarize4moses2.perl 
--phrase-table=phrase-table.gz 
--lex-ro=reordering-table.wbe-msd-bidirectional-fe.gz 
--output-dir=integrated_phrase-reordering/ --num-lex-scores=6


Got this:

Executing: gzip -dc phrase-table.gz | /home/hieu/workspace/github/mosesdecoder.perf/scripts/generic/../../contrib/sigtest-filter/filter-pt -n 0 | gzip -c > ./tmp.14373/pt.gz

...
Reading phrase table finished, writing remaining files to disk.

$ ll integrated_phrase-reordering/
total 24688
drwxrwxr-x 2 hieu hieu 4096 Oct  4 10:38 ./
drwxrwxr-x 5 hieu hieu 4096 Oct  4 10:42 ../
-rw-rw-r-- 1 hieu hieu   917861 Oct  4 10:42 Alignments.dat
-rw-rw-r-- 1 hieu hieu  2267885 Oct  4 10:42 cache
-rw-rw-r-- 1 hieu hieu   76 Oct  4 10:42 config
-rw-rw-r-- 1 hieu hieu  3146720 Oct  4 10:42 probing_hash.dat
-rw-rw-r-- 1 hieu hieu   333856 Oct  4 10:42 source_vocabids
-rw-rw-r-- 1 hieu hieu 18429920 Oct  4 10:42 TargetColl.dat
-rw-rw-r-- 1 hieu hieu   121401 Oct  4 10:42 TargetVocab.dat
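
Since the script expects gzipped tables, a quick check before running it may save a failed run. The sketch below is Python and not part of the Moses scripts; the helper names are invented for illustration. It looks at the two gzip magic bytes (0x1f 0x8b) and writes a compressed copy next to the file if needed.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def ensure_gzipped(path):
    """Return a path to a gzipped version of the file, compressing if needed."""
    path = Path(path)
    if is_gzipped(path):
        return path
    gz_path = path.with_name(path.name + ".gz")
    # Stream the copy so large phrase tables are not loaded into memory.
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

The returned paths are what would then go into --phrase-table and --lex-ro. The original text file is left in place.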


On 04/10/2016 09:06, Vito Mandorino wrote:

The command was

perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl 
--phrase-table=/home/vito/phrase-table.sorted 
--lex-ro=/home/vito/reordering-table.sorted 
--output-dir=/home/vito/integrated_phrase-reordering/ --num-lex-scores=6


The tables in the command are sorted with LC_ALL . I attach them in 
.gz format. Should one use the .gz format also in the command above?


Vito



Re: [Moses-support] differences between moses and moses2 output

2016-10-03 Thread Hieu Hoang

I guess you ran binarize4moses2.perl, so please send me the phrase-table
and reordering model, and the exact command you ran

Hieu Hoang
http://www.hoang.co.uk/hieu

On 3 October 2016 at 15:20, Vito Mandorino <
vito.mandor...@linguacustodia.com> wrote:

> I have managed to replicate the issue on a smaller corpus. Do you need the
> training corpus, the tables (phrase- and reordering-), or all of them?
>
> Vito
>
> 2016-09-30 13:30 GMT+02:00 Hieu Hoang :
>
>> wow, that looks like a serious problem.
>>
>> I've not seen this before. If you can make the data file available for
>> download, it would be much appreciated
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 30 September 2016 at 09:11, Vito Mandorino <
>> vito.mandor...@linguacustodia.com> wrote:
>>
>>> I tried the following command:
>>>
>>> perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl
>>> --phrase-table=/home/vito/phrase-table.sorted
>>> --lex-ro=/home/vito/reordering-table.sorted
>>> --output-dir=/home/vito/integrated_phrase-reordering/ --num-lex-scores=6
>>>
>>> but it gets stuck with the following message:
>>>
>>> Reading phrase table finished, writing remaining files to disk.
>>> terminate called after throwing an instance of
>>> 'util::ProbingSizeException'
>>>   what():  ./util/probing_hash_table.hh:150 in
>>> util::ProbingHashTable::Entry*
>>> util::ProbingHashTable::Insert(const T&) [with
>>> T = Moses2::Entry; EntryT = Moses2::Entry; HashT = boost::hash>> unsigned int>; EqualT = std::equal_to; ModT =
>>> util::DivMod; util::ProbingHashTable>> ModT>::MutableIterator = Moses2::Entry*; util::ProbingHashTable>> HashT, EqualT, ModT>::Entry = Moses2::Entry] threw ProbingSizeException
>>> because `++entries_ >= buckets_'.
>>> Hash table with 1 buckets is full.
>>>
>>>
>>>
>>>
>>> 2016-09-29 16:25 GMT+02:00 Hieu Hoang :
>>>
 use the script:
scripts/generic/binarize4moses2.perl
 It takes as input the (text) phrase-table and the (text) lexro model.
 It will give you the probing pt which contains the info for both.

 To use this script, Moses MUST be compiled with the flag --with-cmph.
 Also, the program in contrib/sigtest-filter MUST have been successfully
 compiled.



 Hieu Hoang
 http://www.hoang.co.uk/hieu

 On 29 September 2016 at 15:19, Vito Mandorino <
 vito.mandor...@linguacustodia.com> wrote:

> Ok thank you, I'll check that. Do you know how to perform the
> integration? Juxtaposing the 4 phrase-table scores and the 6 reordering
> scores before calling CreateProbingPT2 would be enough?
>
> (I have used the CreateProbingPT2 binary and not CreateProbingPT so
> far)
>
> Vito
>
> 2016-09-29 16:07 GMT+02:00 Hieu Hoang :
>
>> you will get another big speedup from integrating the lexro into the
>> pt
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 29 September 2016 at 15:03, Vito Mandorino <
>> vito.mandor...@linguacustodia.com> wrote:
>>
>>> Yes the model includes a lexicalised reordering model but is not
>>> integrated into the probingPT. The size of the LM is 1.8G.
>>>
>>> 2016-09-29 15:59 GMT+02:00 Hieu Hoang :
>>>
 ps. how big is your LM?

 Hieu Hoang
 http://www.hoang.co.uk/hieu

 On 29 September 2016 at 14:58, Hieu Hoang wrote:

> great, thanks. Do you use the lexicalised reordering model, and is
> it integrated into the phrase-table in Moses2?
>
> There is latency in communicating with the server. As Moses2 is
> much faster now, the client can't feed it fast enough. You should see
> that the moses2 command line will max out the CPU, whereas the server
> won't. I'm thinking of extending the server to process multiple
> sentences at a time to speed it up
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 29 September 2016 at 14:49, Vito Mandorino <
> vito.mandor...@linguacustodia.com> wrote:
>
>> Yes, here are some data:
>>
>> Average source sentence length: 29 tokens
>> Phrase-table size, probingPT: 11G
>> Phrase-table size, compact phrase-table: 2.1G
>>
>> Translation time Moses2 with 32 threads: 1m36.511s
>> Translation time Moses with 32 threads: 6m14.248s
>> Translation time Moses2 with 32 threads in server mode: 16m30.137s
>> Translation time Moses with 32 threads in server mode: 62m33.208s
>>
>> Ram consumption during decoding: 4G for Moses2, 5G for Moses
>>
>> So Moses2 is 4 times faster, and 3 times

Re: [Moses-support] differences between moses and moses2 output

2016-10-03 Thread Vito Mandorino

I have managed to replicate the issue on a smaller corpus. Do you need the
training corpus, the tables (phrase- and reordering-), or all of them?

Vito

2016-09-30 13:30 GMT+02:00 Hieu Hoang :

> wow, that looks like a serious problem.
>
> I've not seen this before. If you can make the data file available for
> download, it would be much appreciated
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 30 September 2016 at 09:11, Vito Mandorino <vito.mandor...@linguacustodia.com> wrote:
>
>> I tried the following command:
>>
>> perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl
>> --phrase-table=/home/vito/phrase-table.sorted
>> --lex-ro=/home/vito/reordering-table.sorted
>> --output-dir=/home/vito/integrated_phrase-reordering/ --num-lex-scores=6
>>
>> but it gets stuck with the following message:
>>
>> Reading phrase table finished, writing remaining files to disk.
>> terminate called after throwing an instance of
>> 'util::ProbingSizeException'
>>   what():  ./util/probing_hash_table.hh:150 in
>> util::ProbingHashTable::Entry*
>> util::ProbingHashTable::Insert(const T&) [with
>> T = Moses2::Entry; EntryT = Moses2::Entry; HashT = boost::hash> unsigned int>; EqualT = std::equal_to; ModT =
>> util::DivMod; util::ProbingHashTable> ModT>::MutableIterator = Moses2::Entry*; util::ProbingHashTable> HashT, EqualT, ModT>::Entry = Moses2::Entry] threw ProbingSizeException
>> because `++entries_ >= buckets_'.
>> Hash table with 1 buckets is full.
>>
>>
>>
>>
>> 2016-09-29 16:25 GMT+02:00 Hieu Hoang :
>>
>>> use the script:
>>>scripts/generic/binarize4moses2.perl
>>> It takes as input the (text) phrase-table and the (text) lexro model. It
>>> will give you the probing pt which contains the info for both.
>>>
>>> To use this script, Moses MUST be compiled with the flag --with-cmph.
>>> Also, the program in contrib/sigtest-filter MUST have been successfully
>>> compiled.
>>>
>>>
>>>
>>> Hieu Hoang
>>> http://www.hoang.co.uk/hieu
>>>
>>> On 29 September 2016 at 15:19, Vito Mandorino <
>>> vito.mandor...@linguacustodia.com> wrote:
>>>
 Ok thank you, I'll check that. Do you know how to perform the
 integration? Juxtaposing the 4 phrase-table scores and the 6 reordering
 scores before calling CreateProbingPT2 would be enough?

 (I have used the CreateProbingPT2 binary and not CreateProbingPT so far)

 Vito

 2016-09-29 16:07 GMT+02:00 Hieu Hoang :

> you will get another big speedup from integrating the lexro into the
> pt
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 29 September 2016 at 15:03, Vito Mandorino <
> vito.mandor...@linguacustodia.com> wrote:
>
>> Yes the model includes a lexicalised reordering model but is not
>> integrated into the probingPT. The size of the LM is 1.8G.
>>
>> 2016-09-29 15:59 GMT+02:00 Hieu Hoang :
>>
>>> ps. how big is your LM?
>>>
>>> Hieu Hoang
>>> http://www.hoang.co.uk/hieu
>>>
>>> On 29 September 2016 at 14:58, Hieu Hoang wrote:
>>>
 great, thanks. Do you use the lexicalised reordering model, and is
 it integrated into the phrase-table in Moses2?

 There is latency in communicating with the server. As Moses2 is
 much faster now, the client can't feed it fast enough. You should see
 that the moses2 command line will max out the CPU, whereas the server
 won't. I'm thinking of extending the server to process multiple
 sentences at a time to speed it up

 Hieu Hoang
 http://www.hoang.co.uk/hieu

 On 29 September 2016 at 14:49, Vito Mandorino <
 vito.mandor...@linguacustodia.com> wrote:

> Yes, here are some data:
>
> Average source sentence length: 29 tokens
> Phrase-table size, probingPT: 11G
> Phrase-table size, compact phrase-table: 2.1G
>
> Translation time Moses2 with 32 threads: 1m36.511s
> Translation time Moses with 32 threads: 6m14.248s
> Translation time Moses2 with 32 threads in server mode: 16m30.137s
> Translation time Moses with 32 threads in server mode: 62m33.208s
>
> Ram consumption during decoding: 4G for Moses2, 5G for Moses
>
> So Moses2 is 4 times faster, and 3 times faster in server mode.
>
> Do you know why in server mode the speed is so much slower with
> respect to batch mode, for both Moses and Moses2?
>
> Best regards,
> Vito
>
> 2016-09-28 18:52 GMT+02:00 Hieu Hoang :
>
>> cool. do you have any indications of speed, especially when using
>> multiple

Re: [Moses-support] differences between moses and moses2 output

2016-09-30 Thread Vito Mandorino

I tried the following command:

perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl \
  --phrase-table=/home/vito/phrase-table.sorted \
  --lex-ro=/home/vito/reordering-table.sorted \
  --output-dir=/home/vito/integrated_phrase-reordering/ \
  --num-lex-scores=6

but it gets stuck with the following message:

Reading phrase table finished, writing remaining files to disk.
terminate called after throwing an instance of 'util::ProbingSizeException'
  what():  ./util/probing_hash_table.hh:150 in
util::ProbingHashTable::Entry*
util::ProbingHashTable::Insert(const T&) [with
T = Moses2::Entry; EntryT = Moses2::Entry; HashT = boost::hash; EqualT = std::equal_to; ModT =
util::DivMod; util::ProbingHashTable::MutableIterator = Moses2::Entry*; util::ProbingHashTable::Entry = Moses2::Entry] threw ProbingSizeException
because `++entries_ >= buckets_'.
Hash table with 1 buckets is full.
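
The exception reports a hash table with a single bucket, which may indicate that far fewer entries were seen at that stage than the table really contains. The thread does not establish the cause. One inexpensive thing to rule out is malformed input, so here is a minimal Python sketch that counts the lines of a text table and flags any line whose score field does not hold the expected number of values (4 for the phrase-table and 6 for the reordering table, as discussed in this thread). The name `check_table` and the sample lines are hypothetical, and the layout `source ||| target ||| scores ...` is assumed to be the usual Moses text format.

```python
def check_table(lines, expected_scores):
    """Return (line_count, bad_line_numbers) for a Moses-style text table.

    Assumes fields are separated by '|||' and that the third field holds
    the scores. This is an assumption about the input files, not something
    taken from binarize4moses2.perl.
    """
    count = 0
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        count += 1
        fields = line.split("|||")
        scores = fields[2].split() if len(fields) > 2 else []
        if len(scores) != expected_scores:
            bad.append(number)
    return count, bad


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    phrase_table = [
        "la maison ||| the house ||| 0.5 0.4 0.3 0.2 ||| 0-0 1-1 ||| 1 1 1 ||| |||",
        "maison ||| house ||| 0.5 0.4 0.3 ||| 0-0 ||| 1 1 1 ||| |||",  # only 3 scores
    ]
    print(check_table(phrase_table, expected_scores=4))  # (2, [2])
```

A line count of zero, or a long list of flagged lines, would point at the input files rather than at the binarizer.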




2016-09-29 16:25 GMT+02:00 Hieu Hoang :

> use the script:
>scripts/generic/binarize4moses2.perl
> It takes as input the (text) phrase-table and the (text) lexro model. It
> will give you the probing pt which contains the info for both.
>
> To use this script, Moses MUST be compiled with the flag --with-cmph.
> Also, the program in contrib/sigtest-filter MUST have been successfully
> compiled.
>
>
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 29 September 2016 at 15:19, Vito Mandorino <
> vito.mandor...@linguacustodia.com> wrote:
>
>> Ok thank you, I'll check that. Do you know how to perform the
>> integration? Juxtaposing the 4 phrase-table scores and the 6 reordering
>> scores before calling CreateProbingPT2 would be enough?
>>
>> (I have used the CreateProbingPT2 binary and not CreateProbingPT so far)
>>
>> Vito
>>
>> 2016-09-29 16:07 GMT+02:00 Hieu Hoang :
>>
>>> you will get another big speedup from integrating the lexro into the pt
>>>
>>> Hieu Hoang
>>> http://www.hoang.co.uk/hieu
>>>
>>> On 29 September 2016 at 15:03, Vito Mandorino <
>>> vito.mandor...@linguacustodia.com> wrote:
>>>
 Yes the model includes a lexicalised reordering model but is not
 integrated into the probingPT. The size of the LM is 1.8G.

 2016-09-29 15:59 GMT+02:00 Hieu Hoang :

> ps. how big is your LM?
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 29 September 2016 at 14:58, Hieu Hoang  wrote:
>
>> great, thanks. Do you use the lexicalised reordering model, and is it
>> integrated into the phrase-table in Moses2?
>>
>> There is latency in communicating with the server. As Moses2 is much
>> faster now, the client can't feed it fast enough. You should see that
>> the moses2 command line will max out the CPU, whereas the server won't.
>> I'm thinking of extending the server to process multiple sentences at
>> a time to speed it up
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 29 September 2016 at 14:49, Vito Mandorino <
>> vito.mandor...@linguacustodia.com> wrote:
>>
>>> Yes, here are some data:
>>>
>>> Average source sentence length: 29 tokens
>>> Phrase-table size, probingPT: 11G
>>> Phrase-table size, compact phrase-table: 2.1G
>>>
>>> Translation time Moses2 with 32 threads: 1m36.511s
>>> Translation time Moses with 32 threads: 6m14.248s
>>> Translation time Moses2 with 32 threads in server mode: 16m30.137s
>>> Translation time Moses with 32 threads in server mode: 62m33.208s
>>>
>>> Ram consumption during decoding: 4G for Moses2, 5G for Moses
>>>
>>> So Moses2 is 4 times faster, and 3 times faster in server mode.
>>>
>>> Do you know why in server mode the speed is so much slower with
>>> respect to batch mode, for both Moses and Moses2?
>>>
>>> Best regards,
>>> Vito
>>>
>>> 2016-09-28 18:52 GMT+02:00 Hieu Hoang :
>>>
 cool. do you have any indications of speed, especially when using
 multiple threads? model sizes and average input sentence length are 
 also
 relevant.



>>>
>>>
>>> --
>>> *M**. Vito MANDORINO -- Chief Scientist*
>>>
>>>
>>> [image: Description : Description : lingua_custodia_final full logo]
>>>
>>>  *The Translation Trustee*
>>>
>>> *1, Place Charles de Gaulle, **78180 Montigny-le-Bretonneux*
>>>
>>> *Tel : +33 1 30 44 04 23   Mobile : +33 6 84 65 68 89
>>> <%2B33%206%2084%2065%2068%2089>*
>>>
>>> *Email :*  *vito.mandor...@linguacustodia.com
>>> *
>>>
>>> *Website :*
>>> *www.linguacustodia.finance *
>>>
>>
>>
>



Re: [Moses-support] differences between moses and moses2 output

2016-09-29 Thread Hieu Hoang

use the script:
   scripts/generic/binarize4moses2.perl
It takes as input the (text) phrase-table and the (text) lexro model. It
will give you the probing pt which contains the info for both.

To use this script, Moses MUST be compiled with the flag --with-cmph. Also,
the program in contrib/sigtest-filter MUST have been successfully compiled.
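
For scripting the call, a small Python sketch that assembles the command line may help. It only uses the flag names that appear elsewhere in this thread (`--phrase-table`, `--lex-ro`, `--output-dir`, `--num-lex-scores`); the helper name `build_binarize_cmd` and all paths are hypothetical, and nothing here checks that Moses was built with --with-cmph or that contrib/sigtest-filter is compiled.

```python
import shlex


def build_binarize_cmd(moses_dir, phrase_table, lex_ro, output_dir, num_lex_scores):
    """Build the argv list for binarize4moses2.perl (flags as seen in this thread)."""
    script = f"{moses_dir}/scripts/generic/binarize4moses2.perl"
    return [
        "perl",
        script,
        f"--phrase-table={phrase_table}",
        f"--lex-ro={lex_ro}",
        f"--output-dir={output_dir}",
        f"--num-lex-scores={num_lex_scores}",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_binarize_cmd(
        "/path/to/mosesdecoder",
        "/path/to/phrase-table.sorted",
        "/path/to/reordering-table.sorted",
        "/path/to/output-dir",
        6,
    )
    # Print a shell-safe version; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the arguments in a list avoids shell-quoting problems if a path contains spaces.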



Hieu Hoang
http://www.hoang.co.uk/hieu

On 29 September 2016 at 15:19, Vito Mandorino <
vito.mandor...@linguacustodia.com> wrote:

> Ok thank you, I'll check that. Do you know how to perform the integration?
> Juxtaposing the 4 phrase-table scores and the 6 reordering scores before
> calling CreateProbingPT2 would be enough?
>
> (I have used the CreateProbingPT2 binary and not CreateProbingPT so far)
>
> Vito
>
> 2016-09-29 16:07 GMT+02:00 Hieu Hoang :
>
>> you will get another big speedup from integrating the lexro into the pt
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 29 September 2016 at 15:03, Vito Mandorino <
>> vito.mandor...@linguacustodia.com> wrote:
>>
>>> Yes the model includes a lexicalised reordering model but is not
>>> integrated into the probingPT. The size of the LM is 1.8G.
>>>
>>> 2016-09-29 15:59 GMT+02:00 Hieu Hoang :
>>>
 ps. how big is your LM?

 Hieu Hoang
 http://www.hoang.co.uk/hieu

 On 29 September 2016 at 14:58, Hieu Hoang  wrote:

> great, thanks. Do you use the lexicalised reordering model, and is it
> integrated into the phrase-table in Moses2?
>
> There is latency in communicating with the server. As Moses2 is much
> faster now, the client can't feed it fast enough. You should see that
> the moses2 command line will max out the CPU, whereas the server won't.
> I'm thinking of extending the server to process multiple sentences at
> a time to speed it up
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 29 September 2016 at 14:49, Vito Mandorino <
> vito.mandor...@linguacustodia.com> wrote:
>
>> Yes, here are some data:
>>
>> Average source sentence length: 29 tokens
>> Phrase-table size, probingPT: 11G
>> Phrase-table size, compact phrase-table: 2.1G
>>
>> Translation time Moses2 with 32 threads: 1m36.511s
>> Translation time Moses with 32 threads: 6m14.248s
>> Translation time Moses2 with 32 threads in server mode: 16m30.137s
>> Translation time Moses with 32 threads in server mode: 62m33.208s
>>
>> Ram consumption during decoding: 4G for Moses2, 5G for Moses
>>
>> So Moses2 is 4 times faster, and 3 times faster in server mode.
>>
>> Do you know why in server mode the speed is so much slower with
>> respect to batch mode, for both Moses and Moses2?
>>
>> Best regards,
>> Vito
>>
>> 2016-09-28 18:52 GMT+02:00 Hieu Hoang :
>>
>>> cool. do you have any indications of speed, especially when using
>>> multiple threads? model sizes and average input sentence length are also
>>> relevant.
>>>
>>>
>>>
>>
>>
>
>

>>>
>>>
>>
>>
>
>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

Re: [Moses-support] differences between moses and moses2 output

2016-09-29 Thread Hieu Hoang

you will get another big speedup from integrating the lexro into the pt

Hieu Hoang
http://www.hoang.co.uk/hieu

On 29 September 2016 at 15:03, Vito Mandorino <
vito.mandor...@linguacustodia.com> wrote:

> Yes the model includes a lexicalised reordering model but is not
> integrated into the probingPT. The size of the LM is 1.8G.
>
> 2016-09-29 15:59 GMT+02:00 Hieu Hoang :
>
>> ps. how big is your LM?
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 29 September 2016 at 14:58, Hieu Hoang  wrote:
>>
>>> great, thanks. Do you use the lexicalised reordering model, and is it
>>> integrated into the phrase-table in Moses2?
>>>
>>> There is latency in communicating with the server. As Moses2 is much
>>> faster now, the client can't feed it fast enough. You should see that
>>> moses2 command line will max out the CPU, whereas the server won't. I'm
>>> thinking of extending the server to processing multiple sentences at a time
>>> to speed it up
>>>
>>> Hieu Hoang
>>> http://www.hoang.co.uk/hieu
>>>
>>> On 29 September 2016 at 14:49, Vito Mandorino <
>>> vito.mandor...@linguacustodia.com> wrote:
>>>
 Yes, here are some data:

 Average source sentence length: 29 tokens
 Phrase-table size, probingPT: 11G
 Phrase-table size, compact phrase-table: 2.1G

 Translation time Moses2 with 32 threads: 1m36.511s
 Translation time Moses with 32 threads: 6m14.248s
 Translation time Moses2 with 32 threads in server mode: 16m30.137s
 Translation time Moses with 32 threads in server mode: 62m33.208s

 Ram consumption during decoding: 4G for Moses2, 5G for Moses

 So Moses2 is 4 times faster, and 3 times faster in server mode.

 Do you know why in server mode the speed is so much slower with respect
 to batch mode, for both Moses and Moses2?

 Best regards,
 Vito

 2016-09-28 18:52 GMT+02:00 Hieu Hoang :

> cool. do you have any indications of speed, especially when using
> multiple threads? model sizes and average input sentence length are also
> relevant.
>
>
>



>>>
>>>
>>
>
>

Re: [Moses-support] differences between moses and moses2 output

2016-09-29 Thread Vito Mandorino

Ok, thank you, I'll check that. Do you know how to perform the integration?
Would juxtaposing the 4 phrase-table scores and the 6 reordering scores
before calling CreateProbingPT2 be enough?

(I have used the CreateProbingPT2 binary and not CreateProbingPT so far)

Vito

2016-09-29 16:07 GMT+02:00 Hieu Hoang :

> you will get another big speedup from integrating the lexro into the pt
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 29 September 2016 at 15:03, Vito Mandorino <
> vito.mandor...@linguacustodia.com> wrote:
>
>> Yes the model includes a lexicalised reordering model but is not
>> integrated into the probingPT. The size of the LM is 1.8G.
>>
>> 2016-09-29 15:59 GMT+02:00 Hieu Hoang :
>>
>>> ps. how big is your LM?
>>>
>>> Hieu Hoang
>>> http://www.hoang.co.uk/hieu
>>>
>>> On 29 September 2016 at 14:58, Hieu Hoang  wrote:
>>>
 great, thanks. Do you use the lexicalised reordering model, and is it
 integrated into the phrase-table in Moses2?

 There is latency in communicating with the server. As Moses2 is much
 faster now, the client can't feed it fast enough. You should see that
 moses2 command line will max out the CPU, whereas the server won't. I'm
 thinking of extending the server to processing multiple sentences at a time
 to speed it up

 Hieu Hoang
 http://www.hoang.co.uk/hieu

 On 29 September 2016 at 14:49, Vito Mandorino <
 vito.mandor...@linguacustodia.com> wrote:

> Yes, here are some data:
>
> Average source sentence length: 29 tokens
> Phrase-table size, probingPT: 11G
> Phrase-table size, compact phrase-table: 2.1G
>
> Translation time Moses2 with 32 threads: 1m36.511s
> Translation time Moses with 32 threads: 6m14.248s
> Translation time Moses2 with 32 threads in server mode: 16m30.137s
> Translation time Moses with 32 threads in server mode: 62m33.208s
>
> Ram consumption during decoding: 4G for Moses2, 5G for Moses
>
> So Moses2 is 4 times faster, and 3 times faster in server mode.
>
> Do you know why in server mode the speed is so much slower with
> respect to batch mode, for both Moses and Moses2?
>
> Best regards,
> Vito
>
> 2016-09-28 18:52 GMT+02:00 Hieu Hoang :
>
>> cool. do you have any indications of speed, especially when using
>> multiple threads? model sizes and average input sentence length are also
>> relevant.
>>
>>
>>
>
>
>


>>>
>>
>>
>>
>
>



Re: [Moses-support] differences between moses and moses2 output

2016-09-29 Thread Vito Mandorino

Yes, the model includes a lexicalised reordering model, but it is not
integrated into the probingPT. The size of the LM is 1.8G.

2016-09-29 15:59 GMT+02:00 Hieu Hoang :

> ps. how big is your LM?
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 29 September 2016 at 14:58, Hieu Hoang  wrote:
>
>> great, thanks. Do you use the lexicalised reordering model, and is it
>> integrated into the phrase-table in Moses2?
>>
>> There is latency in communicating with the server. As Moses2 is much
>> faster now, the client can't feed it fast enough. You should see that
>> moses2 command line will max out the CPU, whereas the server won't. I'm
>> thinking of extending the server to processing multiple sentences at a time
>> to speed it up
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 29 September 2016 at 14:49, Vito Mandorino <
>> vito.mandor...@linguacustodia.com> wrote:
>>
>>> Yes, here are some data:
>>>
>>> Average source sentence length: 29 tokens
>>> Phrase-table size, probingPT: 11G
>>> Phrase-table size, compact phrase-table: 2.1G
>>>
>>> Translation time Moses2 with 32 threads: 1m36.511s
>>> Translation time Moses with 32 threads: 6m14.248s
>>> Translation time Moses2 with 32 threads in server mode: 16m30.137s
>>> Translation time Moses with 32 threads in server mode: 62m33.208s
>>>
>>> Ram consumption during decoding: 4G for Moses2, 5G for Moses
>>>
>>> So Moses2 is 4 times faster, and 3 times faster in server mode.
>>>
>>> Do you know why in server mode the speed is so much slower with respect
>>> to batch mode, for both Moses and Moses2?
>>>
>>> Best regards,
>>> Vito
>>>
>>> 2016-09-28 18:52 GMT+02:00 Hieu Hoang :
>>>
 cool. do you have any indications of speed, especially when using
 multiple threads? model sizes and average input sentence length are also
 relevant.



>>>
>>>
>>>
>>
>>
>



Re: [Moses-support] differences between moses and moses2 output

2016-09-29 Thread Hieu Hoang

great, thanks. Do you use the lexicalised reordering model, and is it
integrated into the phrase-table in Moses2?

There is latency in communicating with the server. As Moses2 is much faster
now, the client can't feed it fast enough. You should see that the moses2
command line will max out the CPU, whereas the server won't. I'm thinking
of extending the server to process multiple sentences at a time to speed
it up
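
To illustrate the point about the client not feeding the server fast enough, here is a minimal Python sketch that compares one request at a time with several requests in flight. `fake_translate` is a stand-in that just sleeps to mimic a round trip; it is not the Moses server API, and `translate_all` is a hypothetical helper. In practice `translate_fn` would wrap whatever RPC call the client already makes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def translate_all(sentences, translate_fn, workers=1):
    """Translate sentences with up to `workers` requests in flight.

    Results come back in input order because Executor.map preserves order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(translate_fn, sentences))


def fake_translate(sentence, latency=0.02):
    """Stand-in for one server round trip: sleep, then echo in upper case."""
    time.sleep(latency)
    return sentence.upper()


if __name__ == "__main__":
    sentences = [f"sentence {i}" for i in range(16)]
    for workers in (1, 8):
        start = time.perf_counter()
        translate_all(sentences, fake_translate, workers)
        elapsed = time.perf_counter() - start
        print(f"{workers} worker(s): {elapsed:.2f}s")
```

With a single worker the total time is roughly the sum of all round trips, so a fast decoder sits idle between requests; with several workers the round trips overlap.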

Hieu Hoang
http://www.hoang.co.uk/hieu

On 29 September 2016 at 14:49, Vito Mandorino <
vito.mandor...@linguacustodia.com> wrote:

> Yes, here are some data:
>
> Average source sentence length: 29 tokens
> Phrase-table size, probingPT: 11G
> Phrase-table size, compact phrase-table: 2.1G
>
> Translation time Moses2 with 32 threads: 1m36.511s
> Translation time Moses with 32 threads: 6m14.248s
> Translation time Moses2 with 32 threads in server mode: 16m30.137s
> Translation time Moses with 32 threads in server mode: 62m33.208s
>
> Ram consumption during decoding: 4G for Moses2, 5G for Moses
>
> So Moses2 is 4 times faster, and 3 times faster in server mode.
>
> Do you know why in server mode the speed is so much slower with respect to
> batch mode, for both Moses and Moses2?
>
> Best regards,
> Vito
>
> 2016-09-28 18:52 GMT+02:00 Hieu Hoang :
>
>> cool. do you have any indications of speed, especially when using
>> multiple threads? model sizes and average input sentence length are also
>> relevant.
>>
>>
>>
>
>

Re: [Moses-support] differences between moses and moses2 output

2016-09-29 Thread Hieu Hoang

ps. how big is your LM?

Hieu Hoang
http://www.hoang.co.uk/hieu

On 29 September 2016 at 14:58, Hieu Hoang  wrote:

> great, thanks. Do you use the lexicalised reordering model, and is it
> integrated into the phrase-table in Moses2?
>
> There is latency in communicating with the server. As Moses2 is much
> faster now, the client can't feed it fast enough. You should see that
> moses2 command line will max out the CPU, whereas the server won't. I'm
> thinking of extending the server to processing multiple sentences at a time
> to speed it up
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 29 September 2016 at 14:49, Vito Mandorino <
> vito.mandor...@linguacustodia.com> wrote:
>
>> Yes, here are some data:
>>
>> Average source sentence length: 29 tokens
>> Phrase-table size, probingPT: 11G
>> Phrase-table size, compact phrase-table: 2.1G
>>
>> Translation time Moses2 with 32 threads: 1m36.511s
>> Translation time Moses with 32 threads: 6m14.248s
>> Translation time Moses2 with 32 threads in server mode: 16m30.137s
>> Translation time Moses with 32 threads in server mode: 62m33.208s
>>
>> Ram consumption during decoding: 4G for Moses2, 5G for Moses
>>
>> So Moses2 is 4 times faster, and 3 times faster in server mode.
>>
>> Do you know why in server mode the speed is so much slower with respect
>> to batch mode, for both Moses and Moses2?
>>
>> Best regards,
>> Vito
>>
>> 2016-09-28 18:52 GMT+02:00 Hieu Hoang :
>>
>>> cool. do you have any indications of speed, especially when using
>>> multiple threads? model sizes and average input sentence length are also
>>> relevant.
>>>
>>>
>>>
>>
>>

Re: [Moses-support] differences between moses and moses2 output

2016-09-29 Thread Vito Mandorino

Yes, here are some data:

Average source sentence length: 29 tokens
Phrase-table size, probingPT: 11G
Phrase-table size, compact phrase-table: 2.1G

Translation time Moses2 with 32 threads: 1m36.511s
Translation time Moses with 32 threads: 6m14.248s
Translation time Moses2 with 32 threads in server mode: 16m30.137s
Translation time Moses with 32 threads in server mode: 62m33.208s

RAM consumption during decoding: 4G for Moses2, 5G for Moses

So Moses2 is nearly 4 times faster in both batch mode (about 3.9x) and
server mode (about 3.8x).

Do you know why in server mode the speed is so much slower with respect to
batch mode, for both Moses and Moses2?
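
For reference, the ratios can be worked out from the four timings above. This is a small Python sketch; the helper `to_seconds` and the variable names are only for illustration.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style string such as '1m36.511s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings reported above (32 threads in every case).
moses2_batch = to_seconds("1m36.511s")
moses_batch = to_seconds("6m14.248s")
moses2_server = to_seconds("16m30.137s")
moses_server = to_seconds("62m33.208s")

batch_speedup = moses_batch / moses2_batch      # Moses2 vs Moses, batch mode
server_speedup = moses_server / moses2_server   # Moses2 vs Moses, server mode
moses2_penalty = moses2_server / moses2_batch   # server vs batch, Moses2
moses_penalty = moses_server / moses_batch      # server vs batch, Moses

print(f"batch speedup:  {batch_speedup:.2f}x")   # 3.88x
print(f"server speedup: {server_speedup:.2f}x")  # 3.79x
print(f"Moses2 server/batch: {moses2_penalty:.2f}x")  # 10.26x
print(f"Moses server/batch:  {moses_penalty:.2f}x")   # 10.03x
```

So the speedup is close to 4x in both modes, and server mode costs roughly a factor of 10 for either decoder.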

Best regards,
Vito

2016-09-28 18:52 GMT+02:00 Hieu Hoang :

> cool. do you have any indications of speed, especially when using multiple
> threads? model sizes and average input sentence length are also relevant.
>
>
>



Re: [Moses-support] differences between moses and moses2 output

2016-09-28 Thread Hieu Hoang

cool. do you have any indications of speed, especially when using 
multiple threads? model sizes and average input sentence length are also 
relevant.



Re: [Moses-support] differences between moses and moses2 output

2016-09-28 Thread Vito Mandorino

Now it works! Thanks. On 6000 test sentences the Moses2 output is now
actually 100% identical to the standard Moses output.
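
A comparison of this kind can be scripted in a few lines. The following Python sketch computes the fraction of sentences on which two output files agree; the function name `identical_fraction` is hypothetical, and it assumes one translation per line in the same order in both outputs.

```python
def identical_fraction(lines_a, lines_b):
    """Fraction of positions where the two outputs agree (ignoring edge whitespace)."""
    if len(lines_a) != len(lines_b):
        raise ValueError("outputs have different numbers of sentences")
    if not lines_a:
        return 1.0  # two empty outputs are trivially identical
    same = sum(a.strip() == b.strip() for a, b in zip(lines_a, lines_b))
    return same / len(lines_a)


if __name__ == "__main__":
    # Hypothetical outputs, for illustration only.
    moses_out = ["the house", "a small test"]
    moses2_out = ["the house", "a little test"]
    print(identical_fraction(moses_out, moses2_out))  # 0.5
```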

Vito

2016-09-28 16:12 GMT+02:00 Hieu Hoang :

> hi Vito,
>
> please git pull and try decoding again. I've just pushed a fix
>    https://github.com/hieuhoang/mosesdecoder/commit/0005e98b2674906162ce7945c5edd6a42c9ca418
> Basically, I've changed the behaviour of the pugi call so that it
> doesn't unescape the  words
>
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 28 September 2016 at 14:33, Hieu Hoang  wrote:
>
>> ah ok. do you have a moses.ini and example input sentence to go with that.
>>
>> pugixml.cpp is used to parse the input sentence for XML markups for
>> placeholders, forced-translation etc. You shouldn't change the code for
>> pugixml 'cos it's an imported library that we don't control and we may
>> reimport in future if there are new releases. The problem seems to be
>> Moses2' use of the library so it should be fixed in Moses2
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 28 September 2016 at 14:22, Vito Mandorino <
>> vito.mandor...@linguacustodia.com> wrote:
>>
>>> We are able to replicate the issue with the probingPT version of this
>>> phrase-table:
>>>
>>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>>
>>> If we understand well, the origin of the issue is in the function
>>> strconv_escape in ./contrib/moses2/pugixml.cpp  which replaces some of
>>> these entities with the actual symbol. Commenting out that part seems to
>>> fix the problem, but we wonder if this may cause any issues elsewhere since
>>> we don't know the purpose of the entity replacement.
>>>
>>> Best regards,
>>> Vito
>>>
>>> 2016-09-28 11:19 GMT+02:00 Hieu Hoang :
>>>
 Can you make your model files available for download?

 Moses and Moses2 aren't guaranteed to give exactly the same answer.
 However, they should be the same quality overall

 Hieu Hoang
 http://www.hoang.co.uk/hieu

 On 28 September 2016 at 09:53, Vito Mandorino <
 vito.mandor...@linguacustodia.com> wrote:

> Hi,
>
> we are testing moses2 and we find a decrease in quality which seems to
> be related to apostrophes. For instance:
>
> Source segment 1:
> mise à disposition des actionnaires des documents d&apos; information
> relatifs à la sicav
>
> MT Moses:
> provision shareholders of the briefing material for the sicav
>
> MT Moses2:
> provision of shareholders documents d' information concerning the fund
>
>
> Source segment 2:
> tout titre qui deviendrait spéculatif à la suite d&apos; une
> rétrogradation après son acquisition par le fonds ne sera pas liquidé , à
> moins que le conseiller en investissement n&apos; estime qu&apos; il y va
> de l&apos; intérêt des actionnaires .
>
> MT Moses:
> any security that would become speculative following a downgrading
> after its takeover by the fund will not be liquidated , unless the
> investment adviser believes it is in the interest of shareholders .
>
> MT Moses2:
> any security that would become speculative following a possible
> downgrade d' by the fund after its acquisition will not be liquidated ,
> unless the investment advisor believes n' stake qu' l' interest of
> shareholders .
>
> It is actually strange that the raw MT output contains the apostrophe
> symbol instead of the &apos; entity . What could the reason be?
>
> Best regards,
> Vito
>
>
>
>

>>>
>>>

Re: [Moses-support] differences between moses and moses2 output

2016-09-28 Thread Hieu Hoang

hi Vito,

please git pull and try decoding again. I've just pushed a fix

https://github.com/hieuhoang/mosesdecoder/commit/0005e98b2674906162ce7945c5edd6a42c9ca418
Basically, I've changed the behaviour of the pugi call so that it
doesn't unescape the  words
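
Why unescaping matters can be shown with a toy example. In this Python sketch, `html.unescape` stands in for the XML parser's entity handling (it is not the pugixml code path), and `MODEL_VOCAB` is a hypothetical source-side vocabulary stored in escaped form, as the training data would be. Once the input is unescaped, the apostrophe tokens no longer match anything in the model.

```python
import html

# Hypothetical source-side vocabulary, kept in escaped form.
MODEL_VOCAB = {"d&apos;", "information", "l&apos;", "intérêt"}


def unknown_tokens(sentence, unescape):
    """Return the tokens that the toy model would not recognise."""
    tokens = sentence.split()
    if unescape:
        # Stand-in for a parser that converts entities to real characters.
        tokens = [html.unescape(token) for token in tokens]
    return [token for token in tokens if token not in MODEL_VOCAB]


if __name__ == "__main__":
    sentence = "d&apos; information"
    print(unknown_tokens(sentence, unescape=False))  # []
    print(unknown_tokens(sentence, unescape=True))   # ["d'"]
```

An unknown token is typically copied to the output untranslated, which would be consistent with the stray `d'`, `n'`, `qu'` and `l'` seen in the Moses2 output earlier in the thread.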


Hieu Hoang
http://www.hoang.co.uk/hieu

On 28 September 2016 at 14:33, Hieu Hoang  wrote:

> ah ok. do you have a moses.ini and example input sentence to go with that?
>
> pugixml.cpp is used to parse the input sentence for XML markups for
> placeholders, forced-translation etc. You shouldn't change the code for
> pugixml 'cos it's an imported library that we don't control and we may
> reimport in future if there are new releases. The problem seems to be
> Moses2's use of the library so it should be fixed in Moses2
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 28 September 2016 at 14:22, Vito Mandorino <
> vito.mandor...@linguacustodia.com> wrote:
>
>> We are able to replicate the issue with the probingPT version of this
>> phrase-table:
>>
>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>>
>> If we understand well, the origin of the issue is in the function
>> strconv_escape in ./contrib/moses2/pugixml.cpp  which replaces some of
>> these entities with the actual symbol. Commenting out that part seems to
>> fix the problem, but we wonder if this may cause any issues elsewhere since
>> we don't know the purpose of the entity replacement.
>>
>> Best regards,
>> Vito
>>
>> 2016-09-28 11:19 GMT+02:00 Hieu Hoang :
>>
>>> Can you make your model files available for download?
>>>
>>> Moses and Moses2 aren't guaranteed to give exactly the same answer.
>>> However, they should be the same quality overall
>>>
>>> Hieu Hoang
>>> http://www.hoang.co.uk/hieu
>>>
>>> On 28 September 2016 at 09:53, Vito Mandorino <
>>> vito.mandor...@linguacustodia.com> wrote:
>>>
 Hi,

 we are testing moses2 and we find a decrease in quality which seems to
 be related to apostrophes. For instance:

 Source segment 1:
 mise à disposition des actionnaires des documents d&apos; information
 relatifs à la sicav

 MT Moses:
 provision shareholders of the briefing material for the sicav

 MT Moses2:
 provision of shareholders documents d' information concerning the fund


 Source segment 2:
 tout titre qui deviendrait spéculatif à la suite d&apos; une
 rétrogradation après son acquisition par le fonds ne sera pas liquidé , à
 moins que le conseiller en investissement n&apos; estime qu&apos; il y va
 de l&apos; intérêt des actionnaires .

 MT Moses:
 any security that would become speculative following a downgrading
 after its takeover by the fund will not be liquidated , unless the
 investment adviser believes it is in the interest of shareholders .

 MT Moses2:
 any security that would become speculative following a possible
 downgrade d' by the fund after its acquisition will not be liquidated ,
 unless the investment advisor believes n' stake qu' l' interest of
 shareholders .

 It is actually strange that the raw MT output contains the apostrophe
 symbol instead of the  entity . What could the reason be?

 Best regards,
 Vito



Re: [Moses-support] differences between moses and moses2 output

2016-09-28 Thread Hieu Hoang

ah ok. do you have a moses.ini and example input sentence to go with that?

pugixml.cpp is used to parse the input sentence for XML markup for
placeholders, forced translation, etc. You shouldn't change the code for
pugixml 'cos it's an imported library that we don't control, and we may
reimport it in future if there are new releases. The problem seems to be
Moses2's use of the library, so it should be fixed in Moses2
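
To illustrate fixing the caller rather than the library: an XML parser is expected to decode predefined entities such as &apos;, so the calling code can re-escape the plain text before the phrase-table lookup. The sketch below uses Python's standard library as a stand-in for pugixml. The function name tokens_for_lookup and the wrapper element <s> are hypothetical, and this is not the actual Moses2 fix.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Characters that Moses' escaping step replaces in addition to the
# XML defaults (&, <, >), which escape() already handles.
EXTRA = {"'": "&apos;", '"': "&quot;", "|": "&#124;", "[": "&#91;", "]": "&#93;"}

def tokens_for_lookup(line):
    """Parse one input line for XML markup, then re-escape the plain text.

    The parser decodes entities (e.g. &apos; becomes '), so the text is
    escaped again to match tokens stored in escaped form.
    """
    root = ET.fromstring("<s>" + line + "</s>")  # hypothetical wrapper element
    decoded = "".join(root.itertext())
    return escape(decoded, EXTRA).split()

print(tokens_for_lookup("documents d&apos; information"))
```

The library stays untouched, and only the code that consumes the parsed text changes, which is the division of responsibility described above.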

Hieu Hoang
http://www.hoang.co.uk/hieu

On 28 September 2016 at 14:22, Vito Mandorino <
vito.mandor...@linguacustodia.com> wrote:

> We are able to replicate the issue with the probingPT version of this
> phrase-table:
>
>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>  |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
>
> If we understand correctly, the origin of the issue is in the function
> strconv_escape in ./contrib/moses2/pugixml.cpp  which replaces some of
> these entities with the actual symbol. Commenting out that part seems to
> fix the problem, but we wonder if this may cause any issues elsewhere since
> we don't know the purpose of the entity replacement.
>
> Best regards,
> Vito
>
> 2016-09-28 11:19 GMT+02:00 Hieu Hoang :
>
>> Can you make your model files available for download?
>>
>> Moses and Moses2 aren't guaranteed to give exactly the same answer.
>> However, they should be the same quality overall
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 28 September 2016 at 09:53, Vito Mandorino <
>> vito.mandor...@linguacustodia.com> wrote:
>>
>>> Hi,
>>>
>>> we are testing moses2 and we find a decrease in quality which seems to
>>> be related to apostrophes. For instance:
>>>
>>> Source segment 1:
>>> mise à disposition des actionnaires des documents d&apos; information
>>> relatifs à la sicav
>>>
>>> MT Moses:
>>> provision shareholders of the briefing material for the sicav
>>>
>>> MT Moses2:
>>> provision of shareholders documents d' information concerning the fund
>>>
>>>
>>> Source segment 2:
>>> tout titre qui deviendrait spéculatif à la suite d&apos; une
>>> rétrogradation après son acquisition par le fonds ne sera pas liquidé , à
>>> moins que le conseiller en investissement n&apos; estime qu&apos; il y va
>>> de l&apos; intérêt des actionnaires .
>>>
>>> MT Moses:
>>> any security that would become speculative following a downgrading after
>>> its takeover by the fund will not be liquidated , unless the investment
>>> adviser believes it is in the interest of shareholders .
>>>
>>> MT Moses2:
>>> any security that would become speculative following a possible
>>> downgrade d' by the fund after its acquisition will not be liquidated ,
>>> unless the investment advisor believes n' stake qu' l' interest of
>>> shareholders .
>>>
>>> It is actually strange that the raw MT output contains the apostrophe
>>> symbol instead of the &apos; entity. What could the reason be?
>>>
>>> Best regards,
>>> Vito
>>>
>>>

Re: [Moses-support] differences between moses and moses2 output

2016-09-28 Thread Vito Mandorino

We are able to replicate the issue with the probingPT version of this
phrase-table:

 |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
 |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
 |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
 |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
 |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
 |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||
 |||  ||| 1 1 1 1 ||| 0-0 ||| 1 1 1 ||| |||

If we understand correctly, the origin of the issue is in the function
strconv_escape in ./contrib/moses2/pugixml.cpp  which replaces some of
these entities with the actual symbol. Commenting out that part seems to
fix the problem, but we wonder if this may cause any issues elsewhere since
we don't know the purpose of the entity replacement.
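
A toy illustration of why the entity replacement can hurt here, assuming the phrase table stores tokens in Moses' escaped form (as written by escape-special-chars.perl): once &apos; has been replaced by ', the token no longer matches its entry and is passed through as unknown. The table contents and function names below are invented for the example.

```python
# Toy phrase table keyed by escaped source tokens (entries are invented).
PHRASE_TABLE = {"d&apos;": "of", "information": "information"}

def unescape_apos(token):
    """Mimic an entity replacement step that turns &apos; into '."""
    return token.replace("&apos;", "'")

def translate(tokens):
    """Look each token up; unknown tokens are copied through unchanged."""
    return [PHRASE_TABLE.get(t, t) for t in tokens]

source = ["d&apos;", "information"]
print(translate(source))                              # escaped tokens match
print(translate([unescape_apos(t) for t in source]))  # d' is copied through
```

This would be consistent with untranslated fragments such as d' and qu' showing up in the Moses2 output, though it is only a guess at the mechanism.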

Best regards,
Vito

2016-09-28 11:19 GMT+02:00 Hieu Hoang :

> Can you make your model files available for download?
>
> Moses and Moses2 aren't guaranteed to give exactly the same answer.
> However, they should be the same quality overall
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 28 September 2016 at 09:53, Vito Mandorino <vito.mandor...@linguacustodia.com> wrote:
>
>> Hi,
>>
>> we are testing moses2 and we find a decrease in quality which seems to be
>> related to apostrophes. For instance:
>>
>> Source segment 1:
>> mise à disposition des actionnaires des documents d&apos; information
>> relatifs à la sicav
>>
>> MT Moses:
>> provision shareholders of the briefing material for the sicav
>>
>> MT Moses2:
>> provision of shareholders documents d' information concerning the fund
>>
>>
>> Source segment 2:
>> tout titre qui deviendrait spéculatif à la suite d&apos; une
>> rétrogradation après son acquisition par le fonds ne sera pas liquidé , à
>> moins que le conseiller en investissement n&apos; estime qu&apos; il y va
>> de l&apos; intérêt des actionnaires .
>>
>> MT Moses:
>> any security that would become speculative following a downgrading after
>> its takeover by the fund will not be liquidated , unless the investment
>> adviser believes it is in the interest of shareholders .
>>
>> MT Moses2:
>> any security that would become speculative following a possible downgrade
>> d' by the fund after its acquisition will not be liquidated , unless the
>> investment advisor believes n' stake qu' l' interest of shareholders .
>>
>> It is actually strange that the raw MT output contains the apostrophe
>> symbol instead of the &apos; entity. What could the reason be?
>>
>> Best regards,
>> Vito
>>
>>

Re: [Moses-support] differences between moses and moses2 output

2016-09-28 Thread Hieu Hoang

Can you make your model files available for download?

Moses and Moses2 aren't guaranteed to give exactly the same answer.
However, they should be the same quality overall

Hieu Hoang
http://www.hoang.co.uk/hieu

On 28 September 2016 at 09:53, Vito Mandorino <
vito.mandor...@linguacustodia.com> wrote:

> Hi,
>
> we are testing moses2 and we find a decrease in quality which seems to be
> related to apostrophes. For instance:
>
> Source segment 1:
> mise à disposition des actionnaires des documents d&apos; information
> relatifs à la sicav
>
> MT Moses:
> provision shareholders of the briefing material for the sicav
>
> MT Moses2:
> provision of shareholders documents d' information concerning the fund
>
>
> Source segment 2:
> tout titre qui deviendrait spéculatif à la suite d&apos; une
> rétrogradation après son acquisition par le fonds ne sera pas liquidé , à
> moins que le conseiller en investissement n&apos; estime qu&apos; il y va
> de l&apos; intérêt des actionnaires .
>
> MT Moses:
> any security that would become speculative following a downgrading after
> its takeover by the fund will not be liquidated , unless the investment
> adviser believes it is in the interest of shareholders .
>
> MT Moses2:
> any security that would become speculative following a possible downgrade
> d' by the fund after its acquisition will not be liquidated , unless the
> investment advisor believes n' stake qu' l' interest of shareholders .
>
> It is actually strange that the raw MT output contains the apostrophe
> symbol instead of the &apos; entity. What could the reason be?
>
> Best regards,
> Vito
>
>
