[Moses-support] OpenNMT workshop March 2 2018

2018-02-01 Thread Vincent Nguyen
Dear all, In case one would like a good excuse to visit Paris March 2-3 2018, there will be a workshop on OpenNMT. Here is the registration website. http://workshop-paris-2018.opennmt.net/ Cheers, Vincent ___ Moses-support mailing list

Re: [Moses-support] NCv12 number of lines mismatch

2017-09-14 Thread Vincent Nguyen
nano also gives the "right" number, 270769, but I have a script that finds a difference. On 14/09/2017 at 08:48, Vincent Nguyen wrote: > okay really weird. > wc gives me the same numbers as you, but gedit gives 2 other, different > numbers for each file. Must be special c

Re: [Moses-support] NCv12 number of lines mismatch

2017-09-14 Thread Vincent Nguyen
* >>   270769 news-commentary-v12.de-en.de >>   270769 news-commentary-v12.de-en.en >>   541538 total > > What are you running that shows you different line numbers? > > cheers - Barry > > On 12/09/17 10:06, Vincent Nguyen wrote: >> Hi, >> Is there an

[Moses-support] NCv12 number of lines mismatch

2017-09-12 Thread Vincent Nguyen
Hi, Is there an updated version of NCv12 for this: http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz ? The number of lines for de-en is not the same in the 2 languages. Cheers, Vincent

[Moses-support] Chinese tokenizer / detokenizer (segmenter / unsegmenter)

2017-05-29 Thread Vincent Nguyen
Hello team, I have read many posts and it looks like most people tend to use the Stanford segmenter. Do you have good experience with other tools? Also, what "detokenizer" do you actually use? It seems that it is not just a question of removing spaces, especially when Chinese target

Re: [Moses-support] Looking for a tool for training csv delimited and aligned data

2017-04-26 Thread Vincent Nguyen
I think you mixed up input/output because in your example at the end, you would like to get the pronunciation of a given new word. The input is the left-hand side and the output is the pron. If you are able to rework the right-hand side of your data a little bit (you need to stretch the phones one by

Re: [Moses-support] Training backward LM?

2016-10-20 Thread Vincent Nguyen
Hi Michael, I am checking whether your tests on this subject were successful or not. Can you follow up? Thanks

Re: [Moses-support] News monolingual corpus question

2016-10-05 Thread Vincent Nguyen
re de-duping, and before we > didn't. > > I would say if you want to compare to recent WMT experiments, take the > most recent version of the data, > > cheers - Barry > > On 04/10/16 21:34, Vincent Nguyen wrote: >> >> ok >> this one http://www.statmt.org/wmt11/t

Re: [Moses-support] News monolingual corpus question

2016-10-04 Thread Vincent Nguyen
sed files? > > cheers - Barry > > On 04/10/16 14:40, Vincent Nguyen wrote: >> Hi, >> >> on this link: >> >> http://www.statmt.org/wmt11/translation-task.html >> >> on the download section for monolingual data, there is : >> &

[Moses-support] News monolingual corpus question

2016-10-04 Thread Vincent Nguyen
Hi, on this link: http://www.statmt.org/wmt11/translation-task.html on the download section for monolingual data, there is : one big file : http://www.statmt.org/wmt11/training-monolingual.tgz And separate files, of which news crawls per year. However, when you take a single file for a

Re: [Moses-support] EMS question - no recasing no truecasing

2016-05-30 Thread Vincent Nguyen
2016 at 9:57 AM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: Hi, I have a basic question on EMS. If I want no recasing and no truecasing, I just put IGNORE next to the 2 sections. However I have the feeling it does n

[Moses-support] EMS question - no recasing no truecasing

2016-05-30 Thread Vincent Nguyen
Hi, I have a basic question on EMS. If I want no recasing and no truecasing, I just put IGNORE next to the 2 sections. However I have the feeling it does not eliminate this step for the EVALUATION step, and there is no ignore within this one. Is this the case ? Thanks, Vincent

[Moses-support] UN V1.0 corpus / Europarl - first shot... EN=>FR

2016-05-30 Thread Vincent Nguyen
First, many thanks for the huge work. It opens up some new language possibilities not in Europarl. I just made one test comparison : Config 1: Corpus UN v1.0 LM : UN V1.0 + News2014FR DEV+TEST=Newsdiscuss2015 Nist=29.61 Config 2: Corpus Europarl LM : Europarl + News2014FR

Re: [Moses-support] loading time for large LMs

2016-04-10 Thread Vincent Nguyen
SSD drive ? if not, then forget it. try cat > NULL On 10/04/2016 08:29, Jorg Tiedemann wrote: Hi, I have a large language model from the common crawl data set and it takes forever to load when running moses. My model is a trigram kenlm binarized with quantization, trie structures and

Re: [Moses-support] language models options

2016-04-06 Thread Vincent Nguyen
of phrase tables and language models matter, too, but not as much, and it seems that in your scenario you are just wondering about splitting up a fixed pool of data. -phi On Wed, Apr 6, 2016 at 6:50 AM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: Hi,

[Moses-support] language models options

2016-04-06 Thread Vincent Nguyen
Hi, What are (in terms of performance) the difference between the 3 following solutions : 2 corpus, 2 LM, 2 weights calculated at tuning time 2 corpus merged into one, 1 LM 2 corpus, 2 LM interpolated into 1 LM with tuning Will the results be different in the end ? thanks.
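One way to see why the first and third options need not give identical results is to compare how the two LM probabilities are combined. The sketch below is only an illustration under stated assumptions: the probabilities and weights are made-up numbers, and the function names are hypothetical, not part of Moses. Two LMs used as separate features contribute a weighted sum of log-probabilities, whereas a linearly interpolated LM contributes the log of a weighted average.

```python
import math


def log_linear_score(p1, p2, w1, w2):
    """Two LMs as separate features: weighted sum of log-probabilities."""
    return w1 * math.log(p1) + w2 * math.log(p2)


def interpolated_score(p1, p2, lam, w):
    """One linearly interpolated LM, then a single feature weight."""
    return w * math.log(lam * p1 + (1.0 - lam) * p2)


# Hypothetical case: LM 1 likes the n-gram, LM 2 has almost never seen it.
p1, p2 = 0.1, 1e-6
print(log_linear_score(p1, p2, w1=0.5, w2=0.5))
print(interpolated_score(p1, p2, lam=0.5, w=1.0))
```

With these invented numbers the separate-feature score is penalised heavily by the LM that has barely seen the n-gram, while the interpolated score is dominated by the LM that knows it. When both LMs agree, the two combinations coincide, so any difference in practice would depend on how much the corpora differ.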

Re: [Moses-support] Translating words with apostrophies

2016-04-03 Thread Vincent Nguyen
Apostrophes are tricky to handle properly. The tokenizer is language sensitive (see the -l option). In French : l'été => l&apos; été [with a space between ; and é] in English : today's story => today &apos;s story BUT the issue is that sometimes in corpora you will find some misplaced spaces before or after the

Re: [Moses-support] Maximum Phrase Table length

2016-04-01 Thread Vincent Nguyen
, 2016 at 2:58 PM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: Hello, Does someone have some support to this (found in the doc) : Maximum Phrase Length The maximum length of phrases is limited to 7 words. The maximum phrase length impa

[Moses-support] Maximum Phrase Table length

2016-03-31 Thread Vincent Nguyen
Hello, Does someone have some support to this (found in the doc) : Maximum Phrase Length The maximum length of phrases is limited to 7 words. The maximum phrase length impacts the size of the phrase translation table, so shorter limits may be desirable, if phrase table size is an issue.

[Moses-support] reordering issue

2016-03-21 Thread Vincent Nguyen
Hi, I have been fighting with some reordering issues. I have tried both LM interpolation and OSM but with no luck. Here is an example Source English : Canada remains very active within the Working Group, and our law enforcement officials also participate in the Working Group’s informal law

Re: [Moses-support] apostrophe: detokenization or corpus issue ?

2016-03-14 Thread Vincent Nguyen
output of the decoder, and see how it is changed by the detokenizer. -phi On Wed, Mar 9, 2016 at 11:44 AM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: Hi, I got the following situation: This group age is translated sometimes in: ce groupe

Re: [Moses-support] apostrophe: detokenization or corpus issue ?

2016-03-10 Thread Vincent Nguyen
d see how it is changed by the detokenizer. -phi On Wed, Mar 9, 2016 at 11:44 AM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: Hi, I got the following situation: This group age is translated sometimes in: ce groupe d'âge (correct)

[Moses-support] apostrophe: detokenization or corpus issue ?

2016-03-09 Thread Vincent Nguyen
Hi, I got the following situation: This group age is translated sometimes in: ce groupe d'âge (correct) ce groupe d" âge (incorrect) ce groupe d "âge (incorrect) I am wondering if this is more a detokenizer issue or a corpus issue, or both. Technically in French, there shouldn't be any space

[Moses-support] philosophical question ....NMT/SMT

2016-03-08 Thread Vincent Nguyen
Guys, I got a question to the mathematicians that you all are :) I have been working and testing Moses as well as Groundhog for months now. When I compare results (when comparability is possible, using same corpus, in-domain, blablabla, ...) I do not see much difference in both systems. So

Re: [Moses-support] bleu-annotation / analysis.perl

2016-03-05 Thread Vincent Nguyen
However I believe this is still not right for unigram sentences. ____ From: "Vincent Nguyen" Date: 26 Feb 2016 22:21:59 To: moses-support@mit.edu <mailto:moses-support@mit.edu> Subject: Re: [Moses-support] bleu-annotation / analysis.perl

Re: [Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-28 Thread Vincent Nguyen
owever be somewhat faster than only a single thread. On 17.02.2016 22:44, Vincent Nguyen wrote: I have the feeling it's not.

Re: [Moses-support] bleu-annotation / analysis.perl

2016-02-26 Thread Vincent Nguyen
Am I correct in saying that when the sentence length is less than or equal to 4 tokens, the BLEU score should be 1 for exact matches and 0 otherwise? (by the definition in http://www1.cs.columbia.edu/nlp/sgd/bleu.pdf) On 26/02/2016 10:02, Vincent Nguyen wrote: > Hi, > > I w
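The question can be checked with a small sketch of unsmoothed sentence-level BLEU. This is an assumption-laden illustration, not the code of analysis.perl or mteval: it uses a single reference, n-gram orders 1 to 4, no smoothing, and hypothetical function names. It returns None when the hypothesis is too short to contain any 4-gram, because the 4-gram precision is then 0/0.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Unsmoothed BLEU for one hypothesis against one reference.

    Returns None if the hypothesis has no n-grams of some order,
    since that precision would be undefined (0/0).
    """
    log_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        total = sum(hyp_counts.values())
        if total == 0:
            return None
        ref_counts = ngrams(ref, n)
        # Clipped matches: each n-gram counts at most as often as in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if matched == 0:
            return 0.0
        log_sum += math.log(matched / total)
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1.0 - len(ref) / len(hyp))
    return bp * math.exp(log_sum / max_n)


print(sentence_bleu("a b c d".split(), "a b c d".split()))
print(sentence_bleu("a b c e".split(), "a b c d".split()))
print(sentence_bleu("a b c".split(), "a b c".split()))
```

Under these assumptions, a 4-token hypothesis against a 4-token reference scores either 1 (exact match) or 0, while anything shorter than 4 tokens has no defined score unless some smoothing or back-off is applied.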

[Moses-support] bleu-annotation / analysis.perl

2016-02-26 Thread Vincent Nguyen
Hi, I would like to better understand the analysis.perl script that generates the bleu-annotation file. Is there an easy way to get the uncased BLEU score of each line instead of the cased calculation? Am I right that this script recomputes its own BLEU score without calling the Nist-Bleu nor

Re: [Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-18 Thread Vincent Nguyen
in Junczys-Dowmunt wrote: >> It is, just not very well done. It generally does not make sense to have >> more than 8-10 threads. That should however be somewhat faster than only >> a single thread. >> >> On 17.02.2016 22:44, V

[Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-17 Thread Vincent Nguyen
I have the feeling it's not.

[Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Vincent Nguyen
did you add -exec at the end (behind -continue 1) ? On 08/01/2016 18:16, Nicholas Ruiz wrote: > Thanks, Tomasz. Unfortunately modifying the config file in the steps > directory didn't work for me. My block looks something like this: > > [EVALUATION:test4] > > tokenized-input =

Re: [Moses-support] How much tuning data?

2015-12-28 Thread Vincent Nguyen
This is fine for tuning. If you want to make it quicker, cut it down to 1000 sentences. On 28/12/2015 16:37, Read, James C wrote: Hi, I'm setting up some Moses baseline systems for various language pairs to compare the systems against my own work. I've largely been following the

Re: [Moses-support] easy steps for beginners

2015-12-11 Thread Vincent Nguyen
You managed to install it, so you will need to make a little effort to learn the basics by yourself. Here is the starting point : http://www.statmt.org/moses/?n=Moses.Baseline On 10/12/2015 19:03, Shaimaa Marzouk wrote: > Dear support team, > > I would be extremely grateful, if you could help me with

Re: [Moses-support] decoder question

2015-12-05 Thread Vincent Nguyen
ively > using across Windows and Posix systems. > > Tom > > > On 12/5/2015 6:13 AM, moses-support-requ...@mit.edu wrote: >> Date: Fri, 4 Dec 2015 23:13:10 + >> From: Ulrich Germann<ulrich.germ...@gmail.com> >> Subject: Re: [Moses-support] decoder questi

[Moses-support] decoder question

2015-12-04 Thread Vincent Nguyen
Actually I don't know if this is a decoder question or such. Here is my issue Let's say I have a text string with 2 sentences, with a period ending the first sentence, but no CR+LF, just a space before the second sentence. When I pass the full string to the pipe : tokenizer + truecaser + moses
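The situation described here (two sentences on one line, separated only by a space) can be handled by splitting into sentences before the tokenizer + truecaser + moses pipe. The following is a deliberately naive sketch: the boundary rule (end punctuation, whitespace, then an uppercase letter) and the function name are assumptions for illustration, and the rule will mis-split after abbreviations such as "Mr.".

```python
import re
from typing import List

# Assumed boundary: ., ! or ? followed by whitespace and an uppercase letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÀ-ÖØ-Þ])")


def split_sentences(text: str) -> List[str]:
    """Split a string into sentences at the assumed boundaries."""
    return [s for s in BOUNDARY.split(text.strip()) if s]


# Each sentence goes on its own line, ready to be piped to the tokenizer.
for sentence in split_sentences("This is one. This is two."):
    print(sentence)
```

Each resulting line can then be passed through the pipeline separately, so the decoder receives one sentence per line.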

Re: [Moses-support] decoder question

2015-12-04 Thread Vincent Nguyen
n I have the feeling that we really need to "sentence-tokenize" first before word-tokenizing. On 04/12/2015 13:52, John D Burger wrote: > I think you're asking if Moses translates one sentence at a time. The answer > is yes. > > - John Burger > MITRE >

[Moses-support] normalize punctuation

2015-12-01 Thread Vincent Nguyen
Hieu, here : http://www.statmt.org/moses/RELEASE-3.0/models/fr-en/config.pb.recase I read : input-tokenizer = "$moses-script-dir/tokenizer/normalize-punctuation.perl $input-extension | $moses-script-dir/tokenizer/tokenizer.perl -a -l $input-extension" output-tokenizer =

[Moses-support] Language model question

2015-11-26 Thread Vincent Nguyen
Hi all, I have a question regarding LMs. Let's take the example of news.2014.shuffle.en When we process it through punctuation normalization for the English language, it will for instance put a " " before an apostrophe "it is'nt" => "it is 'nt" BUT it contains some noise, for instance there is

Re: [Moses-support] Moses on SGE clarification

2015-11-04 Thread Vincent Nguyen
no relative paths. And of course, the binaries need to be executable on all nodes as well. -phi On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: OK guys, not easy stuff ... I fought to get the prerequisites working but

Re: [Moses-support] Moses on SGE clarification

2015-10-30 Thread Vincent Nguyen
out : How do the Moses steps deal with "Nb of Jobs submitted" versus -threads in the various steps ? On 29/10/2015 17:45, Vincent Nguyen wrote: > Ken, > > I just did some further testing on the master node that HAS all installed. > same error as is. > > /netshr/m

Re: [Moses-support] Moses on SGE clarification

2015-10-29 Thread Vincent Nguyen
) On 29/10/2015 15:18, Philipp Koehn wrote: Hi, make sure that all the paths are valid on all the nodes --- so definitely no relative paths. And of course, the binaries need to be executable on all nodes as well. -phi On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <vngu...@neuf

Re: [Moses-support] Moses on SGE clarification

2015-10-29 Thread Vincent Nguyen
n Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: Hi there, I need some clarification before screwing up some files. I just setup a SGE cluster with a Master + 2 Nodes. to make it clear let say my cluster name i

Re: [Moses-support] Moses on SGE clarification

2015-10-29 Thread Vincent Nguyen
clear, it runs correctly on the local machine but not when you > run it through SGE? In that case, I suspect it's library version > differences. > > On 10/29/2015 03:09 PM, Vincent Nguyen wrote: >> I get this error : >> >> moses@sgenode1:/netshr/working-en-fr$ /net

Re: [Moses-support] Moses on SAMBA filesystem

2015-10-29 Thread Vincent Nguyen
es on SAMBA is pretty low > priority. However, if you can provide a backtrace (after compiling with > "debug" added to the command) I can try to turn that segfault into an > error message. > > Kenneth > > On 10/29/2015 08:15 PM, Vincent Nguyen wrote: >> it's

Re: [Moses-support] Moses on SAMBA filesystem

2015-10-29 Thread Vincent Nguyen
tuning now so working fine so far btw, in SMB there was another issue with the split command in extraction. On 29/10/2015 21:44, Vincent Nguyen wrote: > I'll mount NFS instead and will confirm if working. > thanks > > On 29/10/2015 21:31, Kenneth Heafield wrote: >> Hi,

[Moses-support] Moses on SGE clarification

2015-10-28 Thread Vincent Nguyen
Hi there, I need some clarification before screwing up some files. I just set up an SGE cluster with a Master + 2 Nodes. To make it clear, let's say my cluster name is "default", my master headnode is "master", my 2 other nodes are "node1" and "node2" for EMS : I opened the default

[Moses-support] tokenizer / detokenizer

2015-10-12 Thread Vincent Nguyen
Hello, Pretty sure there is no academic importance to this, but : For the tokenizer we have the -x option to skip XML/HTML tags For the detokenizer it WILL SKIP whatever. cf : while(<STDIN>) { if (/^<.+>$/ || /^\s*$/) { #don't try to detokenize XML/HTML tag lines
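To make the effect of the quoted condition concrete, here is a small sketch that applies the same two patterns to a few sample lines. The sample lines and the function name are invented for illustration; only the two regular expressions come from the snippet above. Any line that starts with `<` and ends with `>` matches, including a line that contains ordinary text between an opening and a closing tag.

```python
import re

# The two patterns from the quoted Perl condition.
TAG_LINE = re.compile(r"^<.+>$")
BLANK_LINE = re.compile(r"^\s*$")


def is_passed_through(line: str) -> bool:
    """True if the quoted condition would leave the line undetokenized."""
    return bool(TAG_LINE.search(line) or BLANK_LINE.search(line))


samples = [
    "<seg> Hello , world ! </seg>\n",   # text wrapped in tags
    "Hello , <b> world </b> !\n",       # tags inside the line only
    "\n",                               # blank line
]
for line in samples:
    print(repr(line), is_passed_through(line))
```

So a whole segment wrapped in tags is skipped, text included, while a line with inline tags is detokenized as usual.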

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-08 Thread Vincent Nguyen
Michael, what score-setting do you use to achieve these results ? if search algo= 1 what cube pruning number ? On 08/10/2015 19:05, Michael Denkowski wrote: Hi all, I extended the multi_moses.py script to support multi-threaded moses instances for cases where memory limits the number of

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-08 Thread Vincent Nguyen
LEU/TER/Meteor but this is just one data point and a fairly simple system. I would be curious to see how things work out in other users' systems. Best, Michael On Thu, Oct 8, 2015 at 2:34 PM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: out of curiosity, what gain do

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-05 Thread Vincent Nguyen
After many tests, as mentioned before I had made these changes in EMS score-settings = "--GoodTuring --MinScore 2:0.001" and pop limit cube pruning at 400 (instead of 5000 in EMS ) speed is much much higher (without impact on translation) On 05/10/2015 17:20, Philipp Koehn wrote: Hi,

[Moses-support] truecase.perl

2015-09-26 Thread Vincent Nguyen
Hello, Quick question regarding this script behavior. Les Banques de la zone Euro sont soumises à : becomes les banques de la zone euro sont soumises à : lowercasing is fine the space between >Les is fine but it did not insert a space between the after the : in : any clue ? Vincent

Re: [Moses-support] truecase.perl

2015-09-26 Thread Vincent Nguyen
actually after > space is always inserted, but before < never inserted. On 26/09/2015 16:37, Vincent Nguyen wrote: > Hello, > > Quick question regarding this script behavior. > > Les Banques de la zone Euro sont soumises à : > > becomes > > les banque

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Vincent Nguyen
/15 at 16:50, Vincent Nguyen wrote: I agree and would like to. But this is tricky, look at the first 30 lines of my phrase table below. and this happens a lot in the first line of tables where there are or weird codes, EN/FR pairs do not match. ! ! ! ! ||| ! ! ! ! ||| 0.103413 0.132185

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Vincent Nguyen
e used: >> 1 ||| One Million Roofs >> >> oui ||| no >> >> To use this list, add the following to your moses.ini file >> >> [feature] >> DeleteRules path=/path/to/list >> >> Not tested. >> >> >> &

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Vincent Nguyen
er bad translation > options which pop up. > > On Thu, 2015-09-24 at 16:08 +0200, Vincent Nguyen wrote: >> Matthias, >> >> Pruning : >> I use the cube pop limit at 400 instead of default values (1000 or 5000) >> I use the MinScore 0.001 > It seems to me th

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Vincent Nguyen
try modified Moore-Lewis filtering for data selection. > https://aclweb.org/anthology/D/D11/D11-1033.pdf > > > Cheers, > Matthias > > > On Thu, 2015-09-24 at 18:19 +0200, Vincent Nguyen wrote: >> This is an interesting subject .. >> >> As a matter

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-23 Thread Vincent Nguyen
ct: Re: [Moses-support] is there a way to remove a bad entry in the phrase table ? To: Vincent Nguyen<vngu...@neuf.fr> Cc: moses-support<moses-support@mit.edu> Hi, you can remove it manually (just edit the text file), there will be no negative consequences. However, it

[Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-22 Thread Vincent Nguyen
Hi, I was wondering: if, after an analysis of the BLEU-Annotation file, we realize that there must be a bad entry in the phrase table, can we remove it manually or in some other way? Thanks. V.

Re: [Moses-support] Help on pipeline ....

2015-09-17 Thread Vincent Nguyen
. big debate ? On 16/09/2015 17:30, Vincent Nguyen wrote: I am struggling with a pipeline . Here is the text1.txt file I would like to translate from FR to EN Les banques de la zone euro sont soumises : au ratio de capital lié à la détention d’actifs risqués (nous nous intéressons

[Moses-support] Help on pipeline ....

2015-09-16 Thread Vincent Nguyen
I am struggling with a pipeline . Here is the text1.txt file I would like to translate from FR to EN Les banques de la zone euro sont soumises : au ratio de capital lié à la détention d’actifs risqués (nous nous intéressons ici au crédit) ; au ratio de levier, qui détermine le capital

[Moses-support] analysis.perl / mteval-v13a.pl / BLEU-annotation

2015-09-14 Thread Vincent Nguyen
Guys, While running EMS with a big test file I realized that the analysis.perl was executed very quickly while the actual Nist-Bleu was much much longer. Also one thing is that the file "BLEU-Annotation" generated during analysis does not contain the right line numbering. it takes 0 as the

Re: [Moses-support] sgm generation for personalized test sets

2015-09-14 Thread Vincent Nguyen
gt; On 9/13/2015 11:01 PM, moses-support-requ...@mit.edu wrote: >> Date: Sun, 13 Sep 2015 10:44:02 +0200 >> From: Vincent Nguyen<vngu...@neuf.fr> >> Subject: Re: [Moses-support] sgm generation for personalized test sets >> To: moses-support<moses-support@mit.edu>

Re: [Moses-support] sgm generation for personalized test sets

2015-09-13 Thread Vincent Nguyen
In order to use makemteval.py we need to remove 0D and E2 80 A8 from txt files. Python handles them as additional line breaks. On 12/09/2015 22:07, Vincent Nguyen wrote: > Hi, > > What script do you guys use to generate sgm sets based on txt file ? > > I have tried makemteva
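The behaviour described above can be reproduced with a short sketch. It assumes the text contains a bare carriage return (byte 0D) and a U+2028 LINE SEPARATOR (UTF-8 bytes E2 80 A8); the sample string and the helper names are made up for illustration. Python's `str.splitlines()` treats both as line boundaries, whereas counting newline characters, as `wc -l` does, ignores them.

```python
# Invented sample: two real lines, with U+2028 and a bare CR inside the first.
sample = "first\u2028still first\rstill first\nsecond\n"


def count_like_wc(text: str) -> int:
    """Count newline characters only, as `wc -l` does."""
    return text.count("\n")


def count_like_splitlines(text: str) -> int:
    """Count lines the way str.splitlines() sees them."""
    return len(text.splitlines())


def strip_extra_breaks(text: str) -> str:
    """Remove CR and U+2028 so that only newline separates lines."""
    return text.replace("\r", "").replace("\u2028", "")


print(count_like_wc(sample))
print(count_like_splitlines(sample))
print(count_like_splitlines(strip_extra_breaks(sample)))
```

After the two characters are stripped, both counting methods agree, which is one plausible explanation for tools reporting different line counts on the same parallel files.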

[Moses-support] sgm generation for personalized test sets

2015-09-12 Thread Vincent Nguyen
Hi, What script do you guys use to generate sgm sets based on a txt file ? I have tried makemteval.py in contrib but there are a few issues. I think these lines: lines = [l.replace('&quot;','\"').replace('&apos;','\'').replace('&gt;','>').replace('&lt;','<').replace('&amp;','&') for l in filein.read().splitlines()]

[Moses-support] Incremental / combination theory question

2015-09-07 Thread Vincent Nguyen
Hi experts, I have a question about the phrase table theory. If we take a corpus A to create a TM model TMA and a LM model LMA. if we consider a corpus B. Method 1 : We add corpus B to A => corpus AB => TM-AB and LM-AB Method 2: We process corpus B => TMB and LMB then we combine TMA + TMB and

Re: [Moses-support] Decoding Speed perfomance - suggestion and question

2015-09-04 Thread Vincent Nguyen
, 2015 at 10:33 AM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: is there any benchmark on what value / what impact ? what should I start with as a test 0.001 ? the standard value 0.0001 seems really really low to me maybe I am not getting what t

[Moses-support] Translation Model binarizing step in EMS - multicore ?

2015-09-02 Thread Vincent Nguyen
Hi, Unless I am mistaken, it seems that the TM binarizing step in EMS is not multi-core. ttable-binarizer = "$moses-bin-dir/processPhraseTableMin" [training] training-options = "-mgiza -mgiza-cpus 8 -sort-compress gzip -sort-parallel 4 -cores 4" binarize-all =

[Moses-support] Several Issues with Baseline and EMS

2015-09-02 Thread Vincent Nguyen
If you're new to Linux you will fight forever. I would probably go to Slate instead for sure. On 02/09/2015 17:34, Anita Pal wrote: For the time being, I'm trying to finish building the baseline system. I've just been following the commands as listed on the Moses website. It's still not

Re: [Moses-support] really weird phrase table crash .....

2015-09-02 Thread Vincent Nguyen
On 01/09/2015 17:41, Christophe Servan wrote: > Hello Vincent, > Did you check whether you have enough disk space? > > Best, > > Christophe > > > -Original Message- > From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On > behalf

Re: [Moses-support] Decoding Speed perfomance - suggestion and question

2015-09-01 Thread Vincent Nguyen
-orphan-phrase-pairs-from-reordering-table.perl -phi On Mon, Aug 31, 2015 at 10:50 AM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: thanks, will try and post results. just to be clear: I can re-use the previous extract file I have to rebuild the

Re: [Moses-support] clarification CBPT vs MMSAPT

2015-09-01 Thread Vincent Nguyen
2015 at 1:11 PM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: Hi Uli, For your point3. here is what I would like to do / understand : I have an LM and a TM built with EMS but alignment being done by FastAlign. So there is no vcb files for the base

Re: [Moses-support] really weird phrase table crash .....

2015-09-01 Thread Vincent Nguyen
yes plenty. On 01/09/2015 17:41, Christophe Servan wrote: > Hello Vincent, > Did you check whether you have enough disk space? > > Best, > > Christophe > > > -Original Message- > From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mi

[Moses-support] Decoding Speed perfomance - suggestion and question

2015-08-31 Thread Vincent Nguyen
Hi, Here are some results with several values with cube pruning pop limit : (pop limit / decoding time for 3000 sentences / BLEU score) 5000 - 15m45 - 29.59 1000 - 4m27 - 29.59 500 - 3m35 - 29.59 200 - 3m15 - 29.51 100 - 3m00 - 29.40 Therefore I took 400 - 3m19 - 29.58 If I am not mistaken
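The trade-off in these figures is easier to read as speed-up and BLEU change relative to the pop limit of 5000. The sketch below only re-expresses the numbers reported in the message; the variable and function names are hypothetical.

```python
# (pop limit, decoding time for 3000 sentences, BLEU) as reported above.
RESULTS = [
    (5000, "15m45", 29.59),
    (1000, "4m27", 29.59),
    (500, "3m35", 29.59),
    (400, "3m19", 29.58),
    (200, "3m15", 29.51),
    (100, "3m00", 29.40),
]


def to_seconds(duration: str) -> int:
    """Convert a 'MmSS' string such as '15m45' to seconds."""
    minutes, seconds = duration.split("m")
    return int(minutes) * 60 + int(seconds)


def summarize(results):
    """Return (pop limit, speed-up, BLEU delta) relative to the first row."""
    _, base_time, base_bleu = results[0]
    base_seconds = to_seconds(base_time)
    return [
        (limit,
         round(base_seconds / to_seconds(duration), 2),
         round(bleu - base_bleu, 2))
        for limit, duration, bleu in results
    ]


for limit, speedup, delta in summarize(RESULTS):
    print(f"pop limit {limit:>5}: x{speedup} faster, BLEU {delta:+.2f}")
```

Read this way, most of the speed gain is already obtained by the time the pop limit drops to 500 or 400, while the BLEU loss only becomes visible at 200 and below.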

Re: [Moses-support] Decoding Speed perfomance - suggestion and question

2015-08-31 Thread Vincent Nguyen
nScore 2:0.0001" in EMS. -phi On Mon, Aug 31, 2015 at 3:03 AM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote: Hi, Here are some results with several values with cube pruning pop limit : (pop limit / decoding time for 3000 sentences / BLEU

Re: [Moses-support] Decoding Speed perfomance - suggestion and question

2015-08-31 Thread Vincent Nguyen
: Hi, 0.0001 should have no impact on translation quality, 0.001 will have some impact 0.01 is probably a bit too drastic. But that's the range you should explore. -phi On Mon, Aug 31, 2015 at 10:33 AM, Vincent Nguyen <vngu...@neuf.fr <mailto:vngu...@neuf.fr>> wrote:

Re: [Moses-support] MMSAPT in EMS questions

2015-08-27 Thread Vincent Nguyen
: - EMS includes the mmsapt option to train and binarize the arrays - EMS does NOT include the part of incrementally adding the new data in an automated way. Has to be done manually. Am I understanding things properly ? On 23/08/2015 09:06, Vincent Nguyen wrote: Hello, I have a few

[Moses-support] removed OutputPassthroughInformation by mistake ?

2015-08-25 Thread Vincent Nguyen
Guys, I tried the mt adaptive server package from Matecat and I have been fighting with it for the past 3 days, but I think I now know why. The mt adaptive application uses some undocumented -print-passthrough option in moses. Then I saw some functions to actually output the passthrough info to STDOUT in

Re: [Moses-support] removed OutputPassthroughInformation by mistake ?

2015-08-25 Thread Vincent Nguyen
, Prashant Mathur a écrit : Hi Vincent, Forgot to tell you that the adaptive MT server works with Moses Release 1.0 There is another version on github which works with the latest version. Try this out. https://github.com/hlt-mt/adaptiveMT —Prashant On Aug 25, 2015, at 9:39 AM, Vincent

Re: [Moses-support] removed OutputPassthroughInformation by mistake ?

2015-08-25 Thread Vincent Nguyen
more about it. I am not familiar with the other parts of code. —Prashant On Aug 25, 2015, at 11:02 AM, Vincent Nguyen vngu...@neuf.fr mailto:vngu...@neuf.fr wrote: well 2 things : - I still don't see any of the methods OutputPassthroughInformation in the previous version of moses

[Moses-support] MMSAPT in EMS questions

2015-08-23 Thread Vincent Nguyen
Hello, I have a few questions on running MMSAPT within EMS. I am referring to the doc here : http://www.statmt.org/moses/?n=Advanced.Incremental and to the sections of the config.basic file of EMS. 1) the doc says initial training run EMS as usual but use modified version of Giza++ and add

Re: [Moses-support] clarification CBPT vs MMSAPT

2015-08-20 Thread Vincent Nguyen
/~riezler/publications/papers/MTJOURNAL2014.pdf http://www.cl.uni-heidelberg.de/%7Eriezler/publications/papers/MTJOURNAL2014.pdf [2] http://mt4cat.org/software/adaptive-mt-server On Wed, Aug 19, 2015 at 6:53 PM, Vincent Nguyen vngu...@neuf.fr mailto:vngu...@neuf.fr wrote

Re: [Moses-support] sigtest filtering reordering

2015-08-19 Thread Vincent Nguyen
-entries.perl (something like that, I am writing this from memory.). You give the pruned phrase-table and the unpruned reordering model to the script, and the script takes care that the contents match. The good thing is, it hardly requires any RAM. Best, Marcin On 2015-08-19 13:44, Vincent Nguyen

[Moses-support] sigtest filtering reordering

2015-08-19 Thread Vincent Nguyen
Hi, it crashed (whereas the sigtest filtering ttable continues ...) and no message for disk space nor out of memory. just a simple killed at the end of the stderr, any clue ? -l = a+e P(f|e) filter limit: 50 Loading Vocabulary... Loading existing vocabulary file:

[Moses-support] clarification CBPT vs MMSAPT

2015-08-19 Thread Vincent Nguyen
Hello support, Going into advanced features of Moses, I am a bit confused by the differences and therefore which path to follow, regarding the 2 features CBPT and MMSAPT. I have the feeling the ultimate goal of both is the same but maybe I am wrong. Can someone explain the actual difference

[Moses-support] OSM in EMS error

2015-08-16 Thread Vincent Nguyen
the build-osm crashes in EMS with following error any clue ? 23396000 23397000 23398000 23399000 2340Converting Bilingual Sentence Pair into Operation Corpus Executing: /home/moses/mosesdecoder/bin/generateSequences /home/moses/working/model/OSM.2//e /home/moses/working/model/OSM.2//f

Re: [Moses-support] OSM in EMS error

2015-08-16 Thread Vincent Nguyen
ran out of disk space. Can you find the stderr of lmplz? Kenneth On 08/16/2015 11:11 AM, Vincent Nguyen wrote: the build-osm crashes in EMS with following error any clue ? 23396000 23397000 23398000 23399000 2340Converting Bilingual Sentence Pair into Operation Corpus Executing

Re: [Moses-support] OSM in EMS error

2015-08-16 Thread Vincent Nguyen
/2015 20:02, Vincent Nguyen wrote: right but the config file is the config.basic from which I uncommented the 3 lines for OSM. So I guess the parameters are redundant with what is in the perl script. which one to keep ? either way there is something to correct in the github. Le 16/08/2015 17

Re: [Moses-support] OSM in EMS error

2015-08-16 Thread Vincent Nguyen
a double declaration of -S when running lmplz. That's either a mistake in the config file or in the script On 16/08/2015 14:11, Vincent Nguyen wrote: the build-osm crashes in EMS with following error any clue ? 23396000 23397000 23398000 23399000 2340Converting Bilingual Sentence Pair

Re: [Moses-support] Domain adaptation

2015-08-14 Thread Vincent Nguyen
selection, instance weighting, model interpolation and domain features are different methods that give you the benefits of out-of-domain data, but reduce its harmful effects, and are often better than just concatenating all the data you have. best wishes, Rico On 14/08/15 16:22, Vincent

[Moses-support] Easiest way to tune with several data sets ?

2015-08-12 Thread Vincent Nguyen
Hi, I am wondering if I could get better results with a larger tuning data set. Is there a way in EMS to accumulate several data set files, or do I need to concatenate the sets? If the latter, how can I do this easily ? Just concat the sgm files ? Thanks, Vincent

Re: [Moses-support] EMS results - makes sense ?

2015-08-10 Thread Vincent Nguyen
, Vincent Nguyen wrote: thanks for your insights. I am just stuck by the Bleu difference between my 26 and the 30 of WMT11, and some results of WMT14 close to 36 or even 39 I am currently having trouble with hierarchical rule set instead of lexical reordering wondering if I will get better

Re: [Moses-support] EMS results - makes sense ?

2015-08-10 Thread Vincent Nguyen
for the system description (like in table 6 in the UEDIN paper). best wishes, Rico On 10/08/15 08:32, Vincent Nguyen wrote: similarly reading the WMT14 paper from UEDIN, If not mistaken I read : 35.9 in the matrix : http://matrix.statmt.org/systems/show/2106 31.76 for B1 best system on page

Re: [Moses-support] how much disk sapce for the Giga fr-en corpus ?

2015-08-09 Thread Vincent Nguyen
and no more, you're gonna have a hard time doing the rest of the experiments. Hieu Hoang Researcher New York University, Abu Dhabi http://www.hoang.co.uk/hieu On 8 August 2015 at 13:55, Vincent Nguyen vngu...@neuf.fr mailto:vngu...@neuf.fr wrote: Hi, I keep adding 100GB on my space

Re: [Moses-support] EMS results - makes sense ?

2015-08-09 Thread Vincent Nguyen
, Vincent Nguyen wrote: Hi, Just a heads up on some EMS results, to get your experienced opinions. Corpus: Europarlv7 + NC2010 fr = en Evaluation NC2011. 1) IRSTLM vs KenLM is much slower for training / tuning. that sounds right. KenLM is also multithreaded, IRSTLM can only be used

Re: [Moses-support] how much disk sapce for the Giga fr-en corpus ?

2015-08-09 Thread Vincent Nguyen
some newstest data sets from several years for tuning. does it help a lot to tune with bigger sets ? Cheers, Vincent On 09/08/2015 13:47, Vincent Nguyen wrote: I think at 400GB I was not very far. 500GB was more than enough without the -sort-compress gzip options. Now it's binarizing

[Moses-support] how much disk sapce for the Giga fr-en corpus ?

2015-08-08 Thread Vincent Nguyen
Hi, I keep adding 100GB to my disk space; even at 400GB it crashed at sorting time after the extract tables. Now trying 500GB. Will I need more ? Is there a rule ? Cheers, Vincent

Re: [Moses-support] EMS results - makes sense ?

2015-08-06 Thread Vincent Nguyen
it running with mgiza it will still take a week or so. Just add fast-align-settings = -d -o -v to the TRAINING section of ems, and make sure that fast_align is in your external-bin-dir. cheers - Barry On 06/08/15 08:40, Vincent Nguyen wrote: so I dropped my hierarchical model since I got

Re: [Moses-support] EMS results - makes sense ?

2015-08-06 Thread Vincent Nguyen
external-bin-dir. cheers - Barry On 06/08/15 08:40, Vincent Nguyen wrote: so I dropped my hierarchical model since I got an error. Switched back to the more data by adding the Giga FR EN source but now another error pops up running Giza Inverse : Using SCRIPTS_ROOTDIR: /home

Re: [Moses-support] EMS results - makes sense ?

2015-08-06 Thread Vincent Nguyen
if you manage to get it running with mgiza it will still take a week or so. Just add fast-align-settings = -d -o -v to the TRAINING section of ems, and make sure that fast_align is in your external-bin-dir. cheers - Barry On 06/08/15 08:40, Vincent Nguyen wrote

Re: [Moses-support] EMS results - makes sense ?

2015-08-06 Thread Vincent Nguyen
you will need more disk. For fr-en/en-fr it's probably not worth the extra effort, cheers - Barry On 04/08/15 15:58, Vincent Nguyen wrote: thanks for your insights. I am just stuck by the Bleu difference between my 26 and the 30 of WMT11, and some results of WMT14 close to 36 or even 39 I am
