Re: [Moses-support] decoder question

2015-12-04 Thread John D Burger
I think you're asking if Moses translates one sentence at a time. The answer is yes. - John Burger MITRE > On Dec 4, 2015, at 04:43, Vincent Nguyen wrote: > > Actually I don't know if this is a decoder question or such. > > Here is my issue > > Let's say I have a text

Re: [Moses-support] Major bug found in Moses

2015-06-24 Thread John D. Burger
On Jun 24, 2015, at 10:47 , Read, James C jcr...@essex.ac.uk wrote: So you still think it's fine that the default would perform at 37 BLEU points less than just selecting the most likely translation of each phrase? Yes, I'm pretty sure we all think that's fine, because one of the steps of

Re: [Moses-support] phrase table

2015-01-15 Thread John D Burger
I've observed this as well. It seems to me there are several competing pressures affecting the number of ngram types in a corpus. On the one hand, as the size of the corpus increases, so does the vocabulary. This obviously increases the number of unigram types (which is the same as the
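The relationship between corpus size and n-gram type counts described above can be sketched with a few lines of Python; the toy corpus and helper name are illustrative, not anything from the thread:

```python
def ngram_types(tokens, n):
    """Number of distinct n-gram types in a token list."""
    return len(set(zip(*(tokens[i:] for i in range(n)))))

corpus = "the cat sat on the mat the cat ran".split()
# unigram types == vocabulary size
print(ngram_types(corpus, 1))  # 6
print(ngram_types(corpus, 2))  # 7
```

As the corpus grows, the unigram type count (the vocabulary) keeps climbing, while higher-order type counts are also bounded by the number of running n-grams in the text.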

Re: [Moses-support] Moses tokenizer treats combining diaeresis inconsistently

2014-12-29 Thread John D Burger
This is also a reason to turn Unicode normalization on. If the tokenizer did NFKC at the beginning, then the problem would go away. If I understand the situation correctly, this would only fix this particular example and a few others like it. There are many base+combining grapheme clusters
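The normalization being discussed can be demonstrated with Python's standard library; this is a minimal sketch of what NFKC does to a base+combining cluster, not the Moses tokenizer itself:

```python
import unicodedata

# "ü" typed as base "u" followed by U+0308 COMBINING DIAERESIS
decomposed = "u\u0308ber"
composed = unicodedata.normalize("NFKC", decomposed)

print(len(decomposed))  # 5: the diaeresis is its own code point
print(len(composed))    # 4: NFKC composes it into precomposed U+00FC
```

Note the caveat in the message: composition only helps where a precomposed code point exists; many base+combining clusters have no precomposed form and survive normalization unchanged.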

Re: [Moses-support] Problems installing IRSTLM

2014-04-10 Thread John D. Burger
...@ed.ac.uk wrote: what OS are you on and do you have libtool (or glibtool on macport/osx)? i sometimes see this on older machines On 8 April 2014 18:52, John D. Burger j...@mitre.org wrote: I should add that simply creating the subdirectory doesn't work, later steps expect to find something

[Moses-support] Problems installing IRSTLM

2014-04-08 Thread John D. Burger
Hi - I'm having autotools troubles while installing irstlm-5.80.03 per the directions here: http://www.statmt.org/moses/?n=Moses.Baseline On the very first step I get this: ./regenerate-makefiles.sh Calling /usr/bin/libtoolize You should add the contents of

Re: [Moses-support] Problems installing IRSTLM

2014-04-08 Thread John D. Burger
I should add that simply creating the subdirectory doesn't work, later steps expect to find something there. - JB On Apr 8, 2014, at 13:40 , John D. Burger j...@mitre.org wrote: Hi - I'm having autotools troubles while installing irstlm-5.80.03 per the directions here: http

Re: [Moses-support] No Moses translation without model =3

2014-03-06 Thread John D. Burger
On Mar 6, 2014, at 16:00 , Momo Jeng momo_j...@outlook.com wrote: I'm having a problem getting results from Moses, although I think it's really a problem with GIZA++; please let me know if there's a better place for GIZA questions. When I run Moses instructing GIZA++ to only do model1

Re: [Moses-support] Warning during tokenizing Urdu Corpus

2013-12-27 Thread John D. Burger
The default tokenizer script only knows specific rules for a few languages. The fallback (English) rules may suffice for your purposes, they do the obvious thing with spaces and English punctuation, and also handle some special cases for abbreviations like Mr. and Mrs.. I'd suggest you

Re: [Moses-support] Regarding BLEU Score

2013-12-02 Thread John D. Burger
BLEU scores are not commensurate even between different corpora in the same translation direction. BLEU is really only comparable for different systems or system variants on the exact same data. In the case of the same corpus in two directions, an imperfect analogy might be gas mileage between

Re: [Moses-support] -lm training parameter

2013-11-04 Thread John D. Burger
We've done something like this in the past. The fact that the check for a non-empty LM happens at the very beginning is somewhat annoying if you have a setup that builds the phrase models and language models in parallel, for instance on a cluster. - JB On Nov 4, 2013, at 07:48 , Tom Hoar

Re: [Moses-support] Trivial Sentence Alignment

2013-07-22 Thread John D. Burger
If you treat entire paragraphs as segments, then you'll presumably end up with very long segments. This will make it difficult to get good alignments, and so the resulting models may be of poor quality. Also note that there will be nothing to prevent the extracted phrases from spanning

Re: [Moses-support] Tuning

2013-03-15 Thread John D. Burger
We did some experiments a long time ago on tuning set size (for Chinese to English). For the standard Moses setup, there are only a dozen or so meta-features to find weights for, so it's no surprise that improvements asymptote sharply after the tuning set gets much bigger than 1-2000 segment

Re: [Moses-support] eos marker

2013-02-12 Thread John D. Burger
This sounds like our workaround. Just to make sure I understand, Tom, it sounds like you add your own extra markers to everything, both for alignment and language modeling, so the parallel files look like this (using ss and /ss instead of your music symbols): ss das ist ein kleines haus .
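The workaround described above amounts to wrapping every training segment in explicit boundary tokens before alignment and LM training. A minimal sketch, using the "ss" placeholders from the message (the function name is illustrative):

```python
def mark_segment(segment, open_tag="ss", close_tag="/ss"):
    """Wrap one training segment in explicit boundary tokens."""
    return f"{open_tag} {segment} {close_tag}"

print(mark_segment("das ist ein kleines haus ."))
# ss das ist ein kleines haus . /ss
```

The same wrapping would be applied to both sides of the parallel data and to the language-modeling text so the tokens align and get n-gram statistics.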

Re: [Moses-support] Creating Language Model from google 1gram file

2013-01-24 Thread John D. Burger
If you move the count field to the beginning of the line, you can use the -text-has-weights switch of ngram-count: -text-has-weights Treat the first field in each text input line as a weight factor by which the N-gram counts for that line are to be multiplied. More here:
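The suggested preprocessing step is a one-line field swap. A hedged sketch, assuming the Google 1-gram lines are tab-separated as "ngram, count" (the function name is illustrative):

```python
def count_first(line):
    """Reorder an 'ngram<TAB>count' line so the count comes first,
    as SRILM's -text-has-weights option expects."""
    ngram, count = line.rstrip("\n").split("\t")
    return f"{count} {ngram}"

print(count_first("hello\t42"))  # 42 hello
```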

Re: [Moses-support] Placeholder drift

2012-07-31 Thread John D Burger
Are there any such placeholders in your language modeling data and your parallel training data? If not, all the models are going to treat them as unknown words. In the case of the language model, it doesn't surprise me too much that the placeholders all get pushed together, as that will

Re: [Moses-support] Placeholder drift

2012-07-31 Thread John D Burger
-----Original Message----- From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of John D Burger Sent: 31 July 2012 16:09 To: Henry Hu Cc: moses-support@mit.edu Subject: Re: [Moses-support] Placeholder drift Are there any such placeholders in your language

Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

2012-04-27 Thread John D Burger
Daniel Schaut wrote: To conclude, one could say that I’ve created an engine suitable for a specific domain? However, the engine’s performance outside my domain equals almost to zero? This is always a problem, especially with statistical MT. For example, we've evaluated high-performing

Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

2012-04-26 Thread John D Burger
I =think= I recall that pairwise BLEU scores for human translators are usually around 0.50, so anything much better than that is indeed suspect. - JB On Apr 26, 2012, at 14:18 , Daniel Schaut wrote: Hi all, I’m running some experiments for my thesis and I’ve been told by a more

Re: [Moses-support] train-model.perl not fully --parallel

2011-12-05 Thread John D Burger
Mark Fishel wrote: the --parallel switch of the train-model.perl script is only effective during the first 2 steps -- is there a good reason not to make the phrase scoring (step 6) parallel? Currently it contains a 'for my $direction (f2e,e2f)...', and on a large corpus the scoring can take

[Moses-support] Binary IRSTLM models

2008-08-12 Thread John D. Burger
Hi - We switched to using IRSTLM recently, in order to build bigger language models. I am starting to think, however, that the entire model is still being loaded into memory. Here's part of what Moses prints out now: Start loading LanguageModel /net/tidesserver/tidesserver_raid7/clasr/

[Moses-support] Fwd: decoding: reordering only

2008-08-05 Thread John D. Burger
Oops, forgot to CC the list. From: John D. Burger [EMAIL PROTECTED] Date: August 4, 2008 13:30:30 EDT To: [EMAIL PROTECTED] Subject: Re: [Moses-support] decoding: reordering only Sanne Korzec wrote: Is there a way to force the moses or pharaoh decoder, to use a certain set of phrases

Re: [Moses-support] Trying to debug reduced performance with new Moses

2008-08-05 Thread John D. Burger
Hi - I'm still trying to debug my differences between old and new versions of Moses, which (for us) use SRILM and IRSTLM respectively. My current puzzle is over the very different sizes of the language models resulting from SRILM and IRSTLM - the latter has 5 times as many 5-grams, for

Re: [Moses-support] Trying to debug reduced performance with new Moses

2008-08-05 Thread John D. Burger
Miles Osborne wrote: by default the srilm prunes singletons OK, that's good to know. But when I prune the IRST LM, I still get lots =more= 4-grams than the SRI LM, but lots =fewer= 5-grams (although less than a factor of two in either case). But perhaps I'm a bit in the weeds here ... :)

Re: [Moses-support] Trying to debug reduced performance with new Moses

2008-08-02 Thread John D. Burger
comparing it against? It's almost exactly a year old, sadly. What's the easiest way to tell what version it is? Miles asked about the size of the tuning set - it's 812 segments. That's not that small, is it? Thanks for your prompt replies and suggestions. - John D. Burger MITRE

Re: [Moses-support] Trying to debug reduced performance with new Moses

2008-08-02 Thread John D. Burger
Miles Osborne wrote: i'd check to see how unknown words are handled in either the SRILM or in IRSTLM --that may explain the differences Ah, good suggestion, thanks - OOV is very high in this data. (as for the size of a tuning set, the more the better; right now i'm doing Europarl runs

Re: [Moses-support] recaser

2008-07-17 Thread John D. Burger
Sanne Korzec wrote: I am having trouble understanding what the recaser is doing exactly when evaluating a (dev) test set. Why do we need to train a recaser? Because the default setup in Moses is to train caseless models. This is done by lowercasing the parallel corpus before anything

Re: [Moses-support] Non-deterministic GIZA?

2008-07-16 Thread John D. Burger
installed it, but I'm certain it was before that. I will try the newer version - thanks! - John D. Burger MITRE ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support

Re: [Moses-support] OT: LDC2004E12

2008-07-14 Thread John D. Burger
Ham, Michael wrote: Those escape numbers are Unicode characters. The Chinese character set does not exist in ASCII, so you have to use UTF-8. Sorry if I wasn't clear: I'm talking about the Chinese side of LDC2004E12, which is not in ASCII or Unicode, it's in GB18030. Apparently,
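Transcoding the GB18030 side into UTF-8 is straightforward with Python's built-in codec; this sketch is generic, not specific to LDC2004E12:

```python
def gb18030_to_utf8(data: bytes) -> bytes:
    """Decode GB18030-encoded bytes and re-encode them as UTF-8."""
    return data.decode("gb18030").encode("utf-8")

# round-trip check with a couple of Chinese characters
sample = "中文".encode("gb18030")
print(gb18030_to_utf8(sample).decode("utf-8"))  # 中文
```

GB18030 is a superset of GBK/GB2312 and covers all of Unicode, so the decode step should not lose characters; undecodable bytes would indicate the file is in some other encoding.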

Re: [Moses-support] mert parallel takes longer than non-parallel?

2008-06-06 Thread John D. Burger
Yee Seng Chan wrote: However, when I tried to parallelize it by submitting say.. 10 jobs, I don’t get faster MERT iterations. In fact, it’s slower. Sometimes, a job can be stuck on one of the grid nodes and after hours, it’s still not completed. Its corresponding output-file e.g…

Re: [Moses-support] Giza HMM errors - NAN

2008-03-25 Thread John D. Burger
Chris Dyer wrote: I haven't looked into what's causing the particular problem on this corpus, but another known problem with the GIZA HMM model is that it doesn't do a fairly standard kind of normalization in the forward-backward training, which causes underflow errors in some sentences