Re: [Moses-support] Moses seems to hang

2010-05-26 Thread Philipp Koehn
Hi, the lm switch takes four items for each language model: factor, order, filename, and type. So, the trailing ":0" you are referring to is the type (0=srilm). If it is omitted, the value "0" is assumed. -phi On Wed, May 26, 2010 at 3:54 AM, wrote: > Recently, these two threads have disc
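A rough sketch of how such a colon-separated specification could be read, with the type defaulting to 0 when it is left out. The function name parse_lm_spec and the example path are hypothetical, and this is not the training script's own parsing code.

```python
def parse_lm_spec(spec):
    """Split 'factor:order:filename[:type]' into its fields.

    Hypothetical helper for illustration only. The type field
    defaults to 0 (SRILM) when it is omitted.
    """
    parts = spec.split(":")
    if len(parts) not in (3, 4):
        raise ValueError("expected factor:order:filename[:type], got %r" % spec)
    factor = int(parts[0])
    order = int(parts[1])
    filename = parts[2]
    lm_type = int(parts[3]) if len(parts) == 4 else 0
    return {"factor": factor, "order": order, "filename": filename, "type": lm_type}


# Both spellings describe the same language model.
print(parse_lm_spec("0:3:/path/to/lm.srilm"))
print(parse_lm_spec("0:3:/path/to/lm.srilm:0"))
```

Note that this simple split would not cope with a filename that itself contains a colon.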

Re: [Moses-support] EMS and data preprocessing

2010-05-25 Thread Philipp Koehn
ercase+recase or truecase+detruecase? > > Suzy > > On 25/05/10 8:47 PM, Philipp Koehn wrote: >> >> Hi Suzy, >> >> I could re-produce this error in a way that I assume is what you did. >> You changed the specification of the CORPUS, but you did not >> di

Re: [Moses-support] consolidate-training-data.perl: not found

2010-05-25 Thread Philipp Koehn
model to limited memory such as on-disk > loading of phrase-based model. > > I mean the latest released version (moses-2010-04-26). > > -- > Hwidong Na > KLE lab, POSTECH, KOREA > > > 2010-05-24 (월), 13:06 +0100, Philipp Koehn: >> Hi, >> >> t

Re: [Moses-support] EMS and data preprocessing

2010-05-25 Thread Philipp Koehn
Hi Suzy, I could re-produce this error in a way that I assume is what you did. You changed the specification of the CORPUS, but you did not disable the truecaser. You need to comment out the following settings: [TRUECASER] ### script to train truecaser models # #trainer = $moses-script-dir/reca

Re: [Moses-support] running two moses versions

2010-05-24 Thread Philipp Koehn
Hi, yes, it is no problem to have multiple versions of the decoder, as long as they are in different directories. -phi On Mon, May 24, 2010 at 9:02 PM, Korzec, Sanne wrote: > Hi All, > > I have a checkout from the repository on my system from 2008. > > Is it possible for me to do another checko

Re: [Moses-support] consolidate-training-data.perl: not found

2010-05-24 Thread Philipp Koehn
Hi, there have been some updates over the last two weeks, but the current version should work - as you also indicate. Let me know, if there are any other bugs in the latest revision. What is the "stable version" you are referring to? -phi On Mon, May 24, 2010 at 9:28 AM, Hwidong Na wrote: > Th

Re: [Moses-support] experiment management system and Moses scripts

2010-05-18 Thread Philipp Koehn
Hi, thanks for trying out experiment.perl, please let me know of any other problems you encounter. The two solutions you indicate are both correct. I just updated in the main branch ~/mosesdecoder/scripts/released_files so that should be up to date now. -phi On Tue, May 18, 2010 at 3:23 AM, Suz

Re: [Moses-support] Many possible translations of one word

2010-04-28 Thread Philipp Koehn
> best word, it chooses any word which is not necessarily the good one. > > > Regards > S > > > > 2010/4/27, Philipp Koehn : >> Hi, >> >> I am not entirely sure what the state of the code is here. >> Looking at the code (XmlOption.cpp), it seems to

Re: [Moses-support] Many possible translations of one word

2010-04-27 Thread Philipp Koehn
Hi, I am not entirely sure what the state of the code is here. Looking at the code (XmlOption.cpp), it seems to require to use double bars ("||") to separate the options. See if that works, otherwise please help out with getting the code into shape. -phi On Wed, Apr 21, 2010 at 10:18 AM, SG wr

Re: [Moses-support] convert files from treetiger parser to moses forma for syntax trees

2010-04-19 Thread Philipp Koehn
Hi, this seems to be the output of a tagger and not a syntactic parser, so it is not suitable for a tree-based model. -phi On Mon, Apr 19, 2010 at 6:39 PM, haithem afli wrote: > > Hi; > I'm trying to train Syntax Model . I use Treetiger to annotate the english > trainning data and Amira for ara

Re: [Moses-support] About lexical step

2010-04-16 Thread Philipp Koehn
Hi, in all likelihood this is the result of noisy characters that are not visible but are treated as tokens. -phi On Fri, Apr 16, 2010 at 8:57 AM, Thu, Vuong Hoai wrote: > Hi Hieu, > > When I read carefully output from training model, I determined that in > lexical file (e2f and f2e) have some

Re: [Moses-support] changing moses.ini manually!

2010-04-07 Thread Philipp Koehn
Hi, On Sat, Apr 3, 2010 at 7:26 AM, Somayeh Bakhshaei wrote: > 1. I have used  5 order  LM, and run the tuning step but in moses.ini file > still the LM order is 3! How it is may? The training script gets the language model from the specified user settings, so you need to specify the right orde

Re: [Moses-support] Experiment.perl publications & documentation

2010-04-06 Thread Philipp Koehn
Hi, this is still work in progress - the documentation at the time of the MT Marathon is here: http://www.statmt.org/mtm4/?n=Main.EMSDocumentation -phi On Tue, Apr 6, 2010 at 3:14 PM, Lane Schwartz wrote: > Hi, > > At the MT Marathon, Jon and the other LoonyBin folks presented a paper > descri

Re: [Moses-support] What have I to do?

2010-04-05 Thread Philipp Koehn
Hi, if you work on a particular language pair, I suggest to build a baseline system, analyze the mistakes and consider what needs to be done to improve it. There is typically something that can be done with regard to morphology or reordering. At least you will learn something about the problem of

Re: [Moses-support] beam search limit & stack size

2010-04-05 Thread Philipp Koehn
Hi, the relative limit -b is set so permissively that it has no practical impact. The stack size -s is 200 by default. You can set -s to values around 10-1000 and the parameter -b to 0.001-0.5. See for yourself what the effect is (speed/quality). You may also want to look into cube pru

Re: [Moses-support] Threads, moses, and expensive features

2010-03-30 Thread Philipp Koehn
Hi Lane, multi-threading in Moses works the same: different sentences are distributed to different threads. Adding stateful feature functions has two costs: one is the calculation of the function and one is the additional state splitting which hurts recombination in the beam search. The first cos

Re: [Moses-support] training fails on 1.4million fr-en sentence pairs

2010-03-30 Thread Philipp Koehn
Hi, if snt2cooc is causing trouble, I suggest to run the training script with the additional option "--parts 4", which splits up the data for snt2cooc. -phi On Mon, Mar 29, 2010 at 2:43 PM, John Burger wrote: >> C:\cygwin\home\moses\tools\bin\snt2cooc.out: *** fatal error - cmalloc >> would hav

Re: [Moses-support] different bleu scores from nist and moses scripts

2010-03-20 Thread Philipp Koehn
Hi, the NIST script does internal tokenization, while the multi-bleu script assumes that the data is already tokenized. There is also a difference with the brevity penalty in the case of multiple reference translations. -phi On Fri, Mar 19, 2010 at 7:49 PM, Adam Lopez wrote: > IIRC, the princip
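To illustrate why the brevity penalty can differ once there are several references, the sketch below computes the standard BLEU brevity penalty under two common conventions for picking the reference length: the shortest reference, or the reference closest in length to the output. Which script follows which convention is not stated in the message, so treat the pairing as open; the function names are made up for this example.

```python
import math


def brevity_penalty(candidate_len, reference_len):
    """Standard BLEU brevity penalty for one effective reference length."""
    if candidate_len == 0:
        return 0.0
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - float(reference_len) / candidate_len)


def shortest_ref_len(candidate_len, ref_lens):
    """Convention A: always use the shortest reference."""
    return min(ref_lens)


def closest_ref_len(candidate_len, ref_lens):
    """Convention B: use the reference closest in length (ties go to the shorter)."""
    return min(ref_lens, key=lambda r: (abs(r - candidate_len), r))


cand, refs = 10, [8, 11]
print(brevity_penalty(cand, shortest_ref_len(cand, refs)))  # no penalty
print(brevity_penalty(cand, closest_ref_len(cand, refs)))   # penalised
```

With a 10-word output and references of 8 and 11 words, the first convention gives no penalty while the second does, which is enough to move the final score.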

Re: [Moses-support] Can't change reordering model

2010-03-20 Thread Philipp Koehn
Hi, thanks for pointing out the error - I fixed the web page. -phi On Fri, Mar 19, 2010 at 10:00 AM, Sara Stymne wrote: > Hi Maria! > > The model msd-bidirectional-e will not work, since you can only > condition lexical reordering models on either the foreign phrase (f) or > both the foreign an

Re: [Moses-support] Increase in LM order results in Lower Bleu score !

2010-03-17 Thread Philipp Koehn
Hi, there is also that a larger language model leads to state splitting during decoding, so you may have to try larger beam sizes to see gains. -phi On Wed, Mar 17, 2010 at 8:14 PM, Hieu Hoang wrote: > hi somayeh > > if you change the LM order, you should change the ini file so the decoder > k

Re: [Moses-support] About mert-moses in mose-chart

2010-03-16 Thread Philipp Koehn
Hi, lexicalized reordering cannot be used in hierarchical models. -phi On Tue, Mar 16, 2010 at 8:49 AM, Bui Hung wrote: > Dear Sir, > > When I use mert-moses in Mose-chart with the command: > nohop nice ./mert-moses.pl > working-dir/tuning/nc-dev2007.lowercased.frworking-dir/tuning/lowercased

Re: [Moses-support] Moses-support Digest, Vol 41, Issue 3

2010-03-04 Thread Philipp Koehn
Hi, there is probably something wrong with the meta data in the xml files. Do they have matching IDs, language names, etc.? Check the NIST web site for the proper format. -phi On Tue, Mar 2, 2010 at 1:27 PM, Somayeh Bakhshaei wrote: > Hi every body, > > I am at the end of running moses, :) > my

Re: [Moses-support] about the moses-chart reordering

2010-02-24 Thread Philipp Koehn
table > Best regards! > > Jie Jiang > CNGL, School of Computing, > Dublin City University, > Glasnevin, Dublin 9. > Tel: +353 (0)1 700 6724 > > > > > 2010/2/24 Philipp Koehn >> >> Hi, >> >> the relationship between the Xs is encoded as wor

Re: [Moses-support] about the moses-chart reordering

2010-02-24 Thread Philipp Koehn
could be calculated from another > reordering-table that is generated in the training phase? > > Best regards! > > Jie Jiang > CNGL, School of Computing, > Dublin City University, > Glasnevin, Dublin 9. > Tel: +353 (0)1 700 6724 > > > > > 2010/2/24 Philipp Koehn

Re: [Moses-support] about the moses-chart reordering

2010-02-24 Thread Philipp Koehn
Hi, the best reference would be David Chiang's Computational Linguistics article on hierarchical phrase-based models. -phi On Wed, Feb 24, 2010 at 11:21 AM, Jie Jiang wrote: > Dear all: > > Could you tell me where can I find the details of moses-chart reordering? > It seems that the reordering

Re: [Moses-support] sentence score and confidence indicator

2010-02-17 Thread Philipp Koehn
Hi Francois, thanks for raising this interesting problem. The translation probability that Moses provides is unlikely to be a good indicator of the quality of the translation, since it will be dominated by the language model component score. In other words, it is more an indicator of how many unu

Re: [Moses-support] Clean-corpus and tokenization problem

2010-02-09 Thread Philipp Koehn
Hi, which corpus are you talking about - where did you get it from and what kind of processing did you do besides tokenization? At some point the number of lines got out of sync, and you should narrow down when that happened. -phi On Tue, Feb 9, 2010 at 9:24 AM, Pavani Y wrote: > Hi All, > >

Re: [Moses-support] moses_chart: usage information for phrase_extract

2010-02-08 Thread Philipp Koehn
Hi, thanks for catching this - I fixed it. -phi On Fri, Feb 5, 2010 at 1:01 AM, Christof Pintaske wrote: > Hi, > > here's a minor discovery in phrase_extract > > phrase_extract does not give any usage information, even though it seems > somebody had the intention to do so: > >    if (argc < 1)

Re: [Moses-support] hypothesis scores

2010-02-03 Thread Philipp Koehn
Hi, yes, GetScore() does not include the future score, just the weighted partial score computed so far. The difference to hypo->GetPrevHypo()->GetScore() is the transition cost from the prior best hypothesis. This number should be negative, except in rare cases (for instance adding a common word

Re: [Moses-support] Moses liscencing terms when used in a commercial product

2010-02-01 Thread Philipp Koehn
Hi, Moses has a very liberal license (LGPL) that allows it to be used in commercial products free of charge. We would appreciate an appropriate mention of Moses. -phi On Mon, Feb 1, 2010 at 7:09 AM, wrote: > Hi Philipp, > >  We are very interested to use moses for our language translation purpo

Re: [Moses-support] moses-web : problem printing the web page while translation done OK on the moses server

2010-01-21 Thread Philipp Koehn
Hi, are you using your own installation, or are you referring to http://demo.statmt.org/ ? -phi On Thu, Jan 21, 2010 at 9:24 PM, besacier wrote: > hi > > i am experiencing  a problem using moses-web : sometimes, i don't get > the translated web page on the navigator  while the translation is fu

Re: [Moses-support] moses-chart: "buggy line" in extract.o.sorted

2010-01-20 Thread Philipp Koehn
Hi, it seems you are using hierarchical rules and lexicalized reordering at the same time. This is asking for trouble... -phi On Wed, Jan 20, 2010 at 11:05 PM, Christof Pintaske wrote: > Hi, > > my "extract.o.gz" respectively "extract.o.sorted" produce a large number > of error messages: "buggy

Re: [Moses-support] Is moses manual open-source with tex?

2010-01-06 Thread Philipp Koehn
Hi Bill, here you go: http://www.statmt.org/moses/manual-tex.tgz This is the current snapshot, we will not update it. -phi On Wed, Dec 23, 2009 at 6:05 AM, Bill_Lang(Gmail) wrote: > Hi friends, >   I know that moses manual is daily compiled by tex. Is it open-source > also? If possible, I

Re: [Moses-support] POS LM

2009-12-29 Thread Philipp Koehn
Hi Doren, please read the tutorial on factored models: http://www.statmt.org/moses/?n=Moses.FactoredTutorial -phi On Tue, Dec 29, 2009 at 5:23 AM, EILMT Project wrote: > Hi > > There is pos.lm of the target language in factored model training. I want > to know the steps involved in preparing t

Re: [Moses-support] tuning tree-based models

2009-12-28 Thread Philipp Koehn
Hi, this should work... Does the moses process generate a proper n-best list file? There may be something wrong with running the decoder. Regarding the section "non-terminals" in the moses.ini file - don't worry, this is just a list of special non-terminals that are used for unknown words etc. -

Re: [Moses-support] running giza in parts

2009-12-25 Thread Philipp Koehn
Hi, the running in parts option only affects the "cooc" file creation - which is mostly for memory efficiency, so GIZA++ does not run out of memory. It only makes sense to use this option if the cooc file creation runs out of memory. -phi On Wed, Dec 23, 2009 at 9:32 PM, Mark Fishel wrote: >

Re: [Moses-support] moses threads compilation problem (with RandLM)

2009-12-16 Thread Philipp Koehn
Hi Alex, unfortunately the randomized language model implementation is not yet thread-safe, so it is incompatible with the multi-threaded Moses. Only the SRILM interface is currently supported. -phi On Wed, Dec 16, 2009 at 4:28 PM, Alexander Fraser wrote: > Hi Barry and other folks, > > I'm als

Re: [Moses-support] About the hierarchical model of Moses

2009-12-14 Thread Philipp Koehn
Hi, one way to reduce the size of the rule table is to enforce a lower span size for the rules, for instance: train-model.perl [...] -extract-options="--MaxSpan 8" The default is 12. -phi 2009/12/14 zhmmc : > Hi > Now I find a problem when I'm training a hierarchical model with script > of

Re: [Moses-support] compiling moses 3 chart

2009-12-08 Thread Philipp Koehn
Hi, try to compile without the --with-berkeleydb switch and everything should work. -phi On Tue, Dec 8, 2009 at 4:19 PM, John Morgan wrote: > Hello, > I guess this question is for Hugh. > I'm trying to compile the new moses chart parsing decoder in the > mt3_chart directory. > I have the berkel

[Moses-support] Hierarchical and syntax-based decoding in Moses

2009-12-07 Thread Philipp Koehn
smoothing for better probabilities of low-count phrase pairs. Almost all of the tree-based code was written by Hieu Hoang, who deserves full credit for this. Regards, Philipp Koehn ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu

Re: [Moses-support] Aligned phrase counts

2009-12-07 Thread Philipp Koehn
Hi, there is no option to report this. I suggest to change the score.cpp code so that the counts are reported somewhere as they are processed. Or, sort the extract files and count up phrase pairs. -phi On Mon, Dec 7, 2009 at 7:32 AM, Delip Rao wrote: > Hi, > Is there a way to find absolut
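Counting up phrase pairs from an extract file could look roughly like the sketch below. It assumes each line has the form "source phrase ||| target phrase ||| alignment", which is an assumption about the file layout, and the function name is hypothetical. It counts in memory instead of sorting first, so it only suits files that fit in RAM.

```python
from collections import Counter


def count_phrase_pairs(lines):
    """Count how often each (source, target) phrase pair occurs.

    Assumes ' ||| '-separated fields with source first and target
    second; lines with fewer than two fields are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 2:
            continue
        counts[(fields[0], fields[1])] += 1
    return counts


sample = [
    "das Haus ||| the house ||| 0-0 1-1",
    "das Haus ||| the house ||| 0-0 1-1",
    "das ||| the ||| 0-0",
]
for pair, n in count_phrase_pairs(sample).most_common():
    print(n, pair)
```

For very large extract files, sorting on disk and counting adjacent duplicates, as suggested in the message, avoids holding all pairs in memory.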

Re: [Moses-support] Need help in Moses implementation

2009-12-06 Thread Philipp Koehn
Hi, I do not fully understand your tokenization issues, but you should look into writing a tokenizer that suits your needs. The Moses decoder is agnostic about tokenization. Regarding the demo: Look at the documentation on the Moses web site on the Moses server implementation. You can also spee

[Moses-support] Call for Participation: ACL WMT 2010 Machine Translation Shared Task

2009-12-04 Thread Philipp Koehn
4: Training data released * February 15: Test data released (available on this web site) * February 19: Results submissions * March 26: Short paper submissions (4 pages) Organizers * Chris Callison-Burch (Johns Hopkins University) * Philipp Koehn (University of Edinburgh

Re: [Moses-support] Binarized SRILM

2009-12-02 Thread Philipp Koehn
Hi, if you generate an n-best list, the types of the different feature functions are more transparent, but it is a safe bet that your example "-312.951" is the unweighted language model score. -phi On Wed, Dec 2, 2009 at 11:44 PM, marco turchi wrote: > Hi Hieu, > I'll check...  and I'll let u

Re: [Moses-support] -nbest-list option in decoder

2009-12-01 Thread Philipp Koehn
Hi, you should not specify an n-best-list option to the MERT script, since it uses its own file naming convention for the n-best list files it produces. If you want to change the size of the n-best lists to 200 use the option "-nbest 200". -phi On Fri, Nov 27, 2009 at 2:55 PM, Samidh Chatterjee

Re: [Moses-support] moses decoder results on cygwin and dos

2009-11-26 Thread Philipp Koehn
Hi, I have seen text files under Windows that add a starting byte to indicate the encoding of the file. Since the first word is a problem, this may be the cause. -phi On Thu, Nov 26, 2009 at 2:34 PM, Ivan Uemlianin wrote: > Hieu > > Thanks for your comment. > > How can this be a line-ending issu
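If the leading marker is a UTF-8 byte order mark (an assumption; the message does not say which encoding marker was involved), it can be detected and removed before decoding, as in this small sketch. The function name is hypothetical.

```python
import codecs


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = codecs.BOM_UTF8 + "hello world\n".encode("utf-8")
print(strip_utf8_bom(raw))
```

Opening the file with the "utf-8-sig" encoding in Python has the same effect when reading text.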

Re: [Moses-support] Building POS language model with SRILM

2009-11-24 Thread Philipp Koehn
Hi, you are correct that for POS LMs the lower order n-gram counts are very different and smoothing is less relevant. You could train a 7-gram LM with Good Turing smoothing for the lower order n-grams and Kneser-Ney for the higher order n-grams. I have done this occasionally. -phi On Tue, Nov

Re: [Moses-support] Moses decoder with two phrase-tables - Need urgent help

2009-11-23 Thread Philipp Koehn
Hi, On Sat, Nov 21, 2009 at 12:50 PM, Kranthi Achanta wrote: > Hi, > > We are trying to run the Moses decoder with two different phrase-tables. We > gone through the instructions in the Moses site and modified the moses.ini > file according to that but, we couldn’t able to get the output and it a

Re: [Moses-support] about paremetres of Moses'config file

2009-11-23 Thread Philipp Koehn
Hi, Yes. For each weight specified in the config file, there are feature values that are represented in the n-best list. -phi 2009/11/22 iamzcy_hit iamzcy_hit : > Hi,all > I have a question . > Whether the paremetres in Moses' config file are one-to-one > correspondence with the featur

Re: [Moses-support] RDBMS for the decoder?

2009-11-20 Thread Philipp Koehn
Hi, I have no idea how easy it is to use a conventional database to store the models. Regarding Google translate: they use very similar methods, but it is implemented in a parallel way. If you are interested in that - check out Hadoop, or this tutorial: http://clear.colorado.edu/NAACLHLT2009/tutori

Re: [Moses-support] Any help on moses scoring steps

2009-11-19 Thread Philipp Koehn
Hi, you will need a source and reference file that is also in the sgm file format. If you do not have this, you will have to build it yourself. See the test file for the WMT task for examples: http://www.statmt.org/wmt09/ With this file, you can run the script: wrap-xml.perl LANGUAGE SOURCE-FILE

Re: [Moses-support] The phrase-table taking much time

2009-11-18 Thread Philipp Koehn
Hi, I assume that you are talking about the GIZA++ training. This takes easily 1-2 days with large corpora. -phi On Wed, Nov 18, 2009 at 4:06 AM, Pavani Y wrote: > Hi all, > > > >   I have given the corpus of French(12MB approximately) & > English(11MB approximately) which I got from Mo

Re: [Moses-support] Format of phrase reordering file extract.o.gz

2009-11-11 Thread Philipp Koehn
therwise.) Does the last source phrase with regard to the following > context have the same policy? If you don't know off the top of your > head, I'll dig into the data and figure it out. > > Thanks, > John > > On Wed, Nov 11, 2009 at 11:59 AM, Philipp Koehn wr

Re: [Moses-support] Format of phrase reordering file extract.o.gz

2009-11-11 Thread Philipp Koehn
Hi, the determination in training, whether a phrase is swap (with regard to previous phrase or next) is based on alignment points around the phrase. Slide 112 in this tutorial defines which alignment points are looked at: http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/tutorial2006.pdf So, yes
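A minimal sketch of the usual word-alignment-based orientation test with regard to the previous phrase follows. The exact alignment points checked are an assumption based on the common monotone/swap/other definition; the tutorial slide referenced above is the authoritative description. All names are hypothetical.

```python
def orientation_to_previous(alignment, f_start, f_end, e_start):
    """Classify a phrase pair's orientation with regard to the previous phrase.

    alignment: set of (source_index, target_index) alignment points.
    f_start, f_end: inclusive source span of the phrase.
    e_start: first target position of the phrase.
    Assumed rule: monotone if the point diagonally before the phrase is
    aligned, swap if the point after the source span on the previous
    target word is aligned, otherwise 'other'.
    """
    if (f_start - 1, e_start - 1) in alignment:
        return "mono"
    if (f_end + 1, e_start - 1) in alignment:
        return "swap"
    return "other"


# Source words 0 and 1 are translated in reversed order.
swapped = {(1, 0), (0, 1)}
print(orientation_to_previous(swapped, f_start=0, f_end=0, e_start=1))
```

The check with regard to the next phrase would mirror this, looking at the alignment points just after the phrase's last target word.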

Re: [Moses-support] The flag -early-discarding-threshold in moses

2009-11-09 Thread Philipp Koehn
Hi, we may even remove this option, since cube pruning is doing something very similar, and it is not clear if there are tangible benefits to the early discarding. -phi On Mon, Nov 9, 2009 at 3:24 PM, Chris Dyer wrote: > This functionality is broken in the tip of the trunk. There was a > proje

Re: [Moses-support] Get the input - output word mapping

2009-11-05 Thread Philipp Koehn
Hi, if you use the new training code that is stored away in the mt3_chart branch, you can run train-factored-phrase-model.perl with the additional option: -score-options "--WordAlignment FILE" which will generate a word alignment file (with the specified file name) for all the phrase pairs.

Re: [Moses-support] Efforts to port Moses to Windows without Cygwin?

2009-11-03 Thread Philipp Koehn
Hi, yes, boost is only needed for the multi-threaded decoder, not the single-threaded decoder. -phi On Tue, Nov 3, 2009 at 4:32 PM, Hieu Hoang wrote: > MySQL - no. That was taken out a few years ago. > Boost - i think you need it only if you compile the new code in svn > which implements multi-

Re: [Moses-support] Efforts to port Moses to Windows without Cygwin?

2009-11-03 Thread Philipp Koehn
Hi, I am not aware of any plans to port the training code to Windows, since we are not using Windows here at the university. The decoder itself should run on Windows identically. -phi On Tue, Nov 3, 2009 at 1:51 AM, Chia Tee Kiah wrote: > Hello all, > > I read from the Moses FAQ that the Moses

Re: [Moses-support] lex.f2e has wrong number of tokens, skipping

2009-11-02 Thread Philipp Koehn
Hi, it looks like your training corpus has some noisy ASCII characters that are handled differently by C++ and Perl. You will need to clean up your corpus to remove them. -phi On Mon, Nov 2, 2009 at 12:29 PM, Ivan Uemlianin wrote: > Dear All > > I have Moses running fine on MacOSX. Now
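One way to clean such a corpus is to replace control characters with spaces and collapse whitespace, as sketched below. Which characters actually caused the problem is not stated in the message, so the choice of Unicode category "Cc" is an assumption, and the function name is hypothetical.

```python
import unicodedata


def clean_line(line):
    """Replace control characters with spaces and collapse runs of whitespace.

    The trailing newline is a control character too, so the result has
    no line ending; add one back when writing the cleaned corpus.
    """
    replaced = "".join(
        " " if unicodedata.category(ch) == "Cc" else ch for ch in line
    )
    return " ".join(replaced.split())


print(repr(clean_line("das\x01 ist\x0b gut\n")))
```

Both sides of the parallel corpus should be cleaned the same way so that the line counts stay in sync.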

Re: [Moses-support] Hierarchical rule extraction

2009-10-28 Thread Philipp Koehn
Hi Lane, this is described here: http://www.statmt.org/moses/?n=Moses.ChartDecoding Extracting hierarchical rules is pretty straightforward, just add "-hierarchical" when you run train-factored-phrase-model.perl. We are not entirely sure if there is any sanity to the defaults (span-size 15, for ins

Re: [Moses-support] again , an error

2009-10-24 Thread Philipp Koehn
Hi, yes, it is possible to have a single binary with SRILM and IRSTLM, and even RandLM - there used to be problems but that is resolved now. I tried to find on the web page where that is still mentioned as a problem. If there is indeed such a mention, can you point me to it, so we can remove that

Re: [Moses-support] problem with moses config

2009-10-21 Thread Philipp Koehn
Hi, On Wed, Oct 21, 2009 at 1:32 PM, Bakhshaei wrote: > 2. i can't run moses configuration file . when type ./configure > --with-srilm=/path-to-srilm i get this error: > > bash: ./configure: No such file or directory > > Where is the problem? Can you help me please? You have to run sh regenera

Re: [Moses-support] xml-input support with factorised model?

2009-10-19 Thread Philipp Koehn
Hi, these should be working with the factored model without problems. Have you tried it out? -phi On Mon, Oct 19, 2009 at 12:21 PM, Pouliquen, Bruno wrote: > As advanced features of Moses (wonderful tool!), the XML markup “” > and “” tags are very handy, however, according to the manual they ar

Re: [Moses-support] Mark-up Unkown Words

2009-10-18 Thread Philipp Koehn
Hi, there is currently no such option, although it would be relatively easy to implement this. If you need some pointers to the code that handles unknown words, please let me know. -phi On Fri, Oct 16, 2009 at 9:28 AM, miguel wrote: > Dear list, > > Does moses feature any option that allows the

Re: [Moses-support] Moses: Error in training phrase model

2009-10-09 Thread Philipp Koehn
;." --- end of > sentence marker). > > Could you please let me know if there is a limit on the max length of > sentences - I gave a length of 1 - 60 while running the script. > In addition, is there any limit on the max allowable difference in sentence > length of the parallel

Re: [Moses-support] Moses: Error in training phrase model

2009-10-07 Thread Philipp Koehn
ientation > Executing: > ./tools/moses-scripts//scripts-20091002-0031//training/phrase-extract/extract > ./work3/corpus/IRL-clean.en2 ./work3/corpus/IRL-clean.hi2 > ./work3//model/aligned.grow-diag-final-and ./work3//model/extract 7 > --NoFileLimit orientation > PhraseExtract v1.4, wri

Re: [Moses-support] Moses server dies with invalid xml input

2009-10-05 Thread Philipp Koehn
Hi, ok, I am late on that latest change. Please remove all the abort() statements then. In general, it would be better to ensure that no faulty XML is generated in the first place. Graceful decay also leads to errors that are hard to track down. -phi 2009/10/5 "Münt, Bernd" : >> Von: phko...@gmai

Re: [Moses-support] Moses server dies with invalid xml input

2009-10-05 Thread Philipp Koehn
Hi, if you want the decoder to just be less picky and gracefully decay on faulty XML input, you can edit the source file moses/src/XmlOption.cpp and remove all "return false;" statements after "TRACE_ERR("ERROR:..." lines. This way, the XML is processed all the way through. -phi 2009/10/1 "Münt

Re: [Moses-support] Train a factored model

2009-09-17 Thread Philipp Koehn
Hi, you need to specify the factors that you want to use. Check for details on how to do this here: http://www.statmt.org/moses/?n=Moses.FactoredTutorial -phi On Mon, Aug 24, 2009 at 6:25 AM, wrote: > Hi all, > > I'm new to Moses and kinda new to the SMT field in itself. I wish to train a >

Re: [Moses-support] POS translation

2009-09-16 Thread Philipp Koehn
Hi, if I understand this correctly, you want to preserve the XML tags verbatim and only translate the content. The easiest solution would be to strip out the XML, translate the text, and then using the phrase-alignment (and word alignment within phrases) to determine the positions where the tags s

Re: [Moses-support] phrase penalty

2009-09-16 Thread Philipp Koehn
Hi, the phrase penalty is part of the translation model - in a very crude way: each phrase pair entry has a 5th scoring component which is 2.718, which is almost e. Hence if there are 17 phrase pairs used, the log score is 16.9982 (log 2.718^17). -phi 2009/9/16 Felipe Sánchez Martínez > Hi there
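The arithmetic can be checked directly: a log score of 16.9982 for 17 phrase pairs corresponds to a per-phrase constant of 2.718 under the natural logarithm (the log base is an assumption here). The function name is hypothetical.

```python
import math


def phrase_penalty_log_score(num_phrases, penalty=2.718):
    """Natural-log score contributed by the phrase penalty component."""
    return num_phrases * math.log(penalty)


print(round(phrase_penalty_log_score(17), 4))  # 16.9982
```

Because 2.718 is slightly smaller than e, each phrase contributes just under 1.0, so the score is effectively a count of the phrase pairs used.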

Re: [Moses-support] Multi Moses - what's required?

2009-09-15 Thread Philipp Koehn
Hi, just to chime in: when translating from L1-L2 and L2-L1 you will certainly need different language models, but there is conceivably some efficiency in sharing the same translation model. The way the translation model is parameterized (both forwards and backward probabilities), it is quite con

Re: [Moses-support] Moses step 1 - data preparation step

2009-09-01 Thread Philipp Koehn
> take so long? Am I missing something here? > > James > >> -phi >> >> On Tue, Sep 1, 2009 at 3:59 PM, James Read wrote: >>> >>> Hi, >>> >>> Quoting Philipp Koehn : >>> >>>> Hi, >>>> >>>> ye

Re: [Moses-support] Moses step 1 - data preparation step

2009-09-01 Thread Philipp Koehn
Hi, the computationally expensive part is creating the *.classes files. -phi On Tue, Sep 1, 2009 at 3:59 PM, James Read wrote: > Hi, > > Quoting Philipp Koehn : > >> Hi, >> >> yes, it is correct that step 1 is doing just the data preparation for >> GIZA++. >>

Re: [Moses-support] Moses step 1 - data preparation step

2009-08-31 Thread Philipp Koehn
Hi, yes, it is correct that step 1 is doing just the data preparation for GIZA++. The most time-consuming step is running mkcls to create the classes for the relative distortion models. -phi On Mon, Aug 31, 2009 at 4:39 PM, James Read wrote: > Hi, > > does anyone know what step 1 of the moses tr

Re: [Moses-support] help for Moses-Decoder

2009-08-20 Thread Philipp Koehn
Hi, either the decoding or the mert optimization must have crashed at some point. Check how many run* files in the tuning temp directory have been created. If tuning ran for 10+ iterations, it is probably safe to use the newest weights. Otherwise, you can continue tuning by running the tuning scri

Re: [Moses-support] How to Create Factored Models

2009-08-19 Thread Philipp Koehn
Hi, you will need an external tool to give you this type of information (for instance MXPOST or Brill's tagger) - and reformat it into the Moses format. -phi On Mon, Aug 17, 2009 at 4:10 PM, Anand Kumar wrote: > hi, > > I am using Moses for Translating English to Tamil .I hav a parallel corpus .

Re: [Moses-support] no translation due to intermediate step

2009-08-05 Thread Philipp Koehn
Hi, you are looking at the STDERR output of the decoder which gives some additional information about the translation. For instance, it flags one of the words as a unknown word which is copied verbatim into the output. The actual translation is passed to STDOUT, where no such additional informatio

Re: [Moses-support] for help

2009-07-29 Thread Philipp Koehn
Hi, one thing to check would be whether your corpus has any bars "|" in it, which are special symbols for the multiple factors... It would be helpful to know what values "count" and "factorType" have at the point of the error, and if this happened on the first sentence or somewhere down the r

Re: [Moses-support] EM Model 1 question

2009-07-27 Thread Philipp Koehn
:30 PM, James Read wrote: > In that case I really don't see how the code is guaranteed to give results > which add up to 1. > > Quoting Philipp Koehn : > >> Hi, >> >> this is LaTex {algorithmic} code. >> >> count($e|f$) += $\frac{t(e|f)}{\text{s-tota

Re: [Moses-support] EM Model 1 question

2009-07-27 Thread Philipp Koehn
t[f][e] / total[f]; > } > } > > > Is this the kind of thing you mean? > > Thanks > James > > Quoting Philipp Koehn : > >> Hi, >> >> I think there was a flaw in some versions of the pseudo code. >> The probabilities certainly need to add up to one. The

Re: [Moses-support] EM Model 1 question

2009-07-27 Thread Philipp Koehn
Hi, I think there was a flaw in some versions of the pseudo code. The probabilities certainly need to add up to one. There are two normalizations going on in the algorithm: one on the sentence level (so the probability of all alignments add up to one) and one on the word level. Here the most rece
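The two normalizations can be seen in a compact EM loop for IBM Model 1, sketched below. This is a generic textbook-style version written for illustration, not the pseudo code referred to in the thread; it omits the NULL word, and all names are made up.

```python
from collections import defaultdict


def train_model1(corpus, iterations=10):
    """Estimate t(e|f) with EM; returns a dict keyed by (e, f).

    corpus: list of (foreign_words, english_words) sentence pairs.
    """
    e_vocab = {e for _, es in corpus for e in es}
    uniform = 1.0 / len(e_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for fs, es in corpus:
            for e in es:
                # Sentence-level normalisation: the alignment
                # probabilities for this English word sum to one.
                s_total = sum(t[(e, f)] for f in fs)
                for f in fs:
                    fraction = t[(e, f)] / s_total
                    count[(e, f)] += fraction
                    total[f] += fraction
        # Word-level normalisation: t(.|f) sums to one for each f.
        t = defaultdict(float, {
            (e, f): count[(e, f)] / total[f] for (e, f) in count
        })
    return t


corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_model1(corpus)
print(sum(p for (e, f), p in t.items() if f == "das"))
```

After every iteration the values t(e|f) for a fixed foreign word f add up to one, which is the property discussed in the thread.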

Re: [Moses-support] urgent query..pls help

2009-07-27 Thread Philipp Koehn
Hi, does the phrase table file exist? Is it listed in moses.ini? -phi On Mon, Jul 27, 2009 at 6:39 AM, nikita joshi wrote: > can anyone pls help me out. > > > after training the model , in the end i get > (9) create moses.ini > > > but when i test for a sample data, i get the following err

Re: [Moses-support] why is the same betweem the source language and the target language?

2009-07-26 Thread Philipp Koehn
Hi, if you run the decoder with the option -v 3 you will see in detail what the decoder is doing. In your case, I would assume that it is not reading in the phrase table at all. -phi On Sun, Jul 26, 2009 at 3:07 AM, 美娜 宋 wrote: > Hi all, >Now I'm working on using Moses for the Chinese to

Re: [Moses-support] Turning off debugging in IRSTLM

2009-07-23 Thread Philipp Koehn
Hi, the IRSTLM code is managed by Marcello Federico and Nicola Bertoldi at IRST, so they would have to check that in... -phi On Thu, Jul 23, 2009 at 3:59 PM, Francis Tyers wrote: > Hello all, > > I have a question about the debugging output in IRSTLM. Is there a way > to turn it off ? If I set

Re: [Moses-support] urgent query..pls help

2009-07-22 Thread Philipp Koehn
Hi, I checked the integrity of the file by downloading it myself, and it is fine. Could you please try to download it again - that seems to be the problem. Also, there is a version 3 available. -phi On Wed, Jul 22, 2009 at 9:19 AM, nikita joshi wrote: > > > i am facing a very serious problem.

Re: [Moses-support] alignment and confidence questions

2009-07-20 Thread Philipp Koehn
Hi Scott, Moses currently does not output confidence measures. In terms of alignment, using the trace option "-t" the decoder outputs phrase alignments. Getting word alignments is a bit more difficult, please read the manual section on outputting word alignment. It might be easier to compute word

Re: [Moses-support] Moses killed during phrase table loading

2009-07-18 Thread Philipp Koehn
Hi, this is most likely because the machine runs out of memory. Try filtering and/or binarizing the phrase table. -phi On Sat, Jul 18, 2009 at 6:32 PM, Girard Ramsay wrote: > Hello, > > I am going through the "Moses Installation and Training Run-Through" > document, and have built Moses with SR

Re: [Moses-support] alignment problem

2009-06-23 Thread Philipp Koehn
Hi, I have no idea what an ERROR 2 error is, but if I recall correctly then GIZA has a hard limit on mapping at most 9 words to 1 during alignment. If this is standard in your data, then it is bound to cause some severe problems. You may be able to change this either by a switch in GIZA or by chan
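
As a rough pre-check for the situation described above, one could scan the parallel corpus for sentence pairs whose lengths differ by more than a factor of 9. This is only a sketch: the function name `flag_skewed_pairs` is hypothetical, the threshold of 9 is taken from the recollection in the message and not verified against GIZA, and a length ratio is only a proxy for how many words end up aligned to a single word.

```python
def flag_skewed_pairs(pairs, max_ratio=9):
    """Return indices of sentence pairs that are empty on one side or whose
    token counts differ by more than max_ratio.

    pairs: list of (source_sentence, target_sentence) strings,
    tokenized by whitespace.
    """
    flagged = []
    for idx, (src, tgt) in enumerate(pairs):
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        shorter, longer = min(n_src, n_tgt), max(n_src, n_tgt)
        if shorter == 0 or longer > max_ratio * shorter:
            flagged.append(idx)
    return flagged


example_pairs = [
    ("a b c", "x y"),                 # ratio 1.5, kept
    ("a b c d e f g h i j", "x"),     # ratio 10, flagged
    ("", "x"),                        # empty side, flagged
]
skewed = flag_skewed_pairs(example_pairs)
```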

Re: [Moses-support] Question concerning Factored Tutorial

2009-06-09 Thread Philipp Koehn
Hi, so this would be a translation of Der|DET Mann|NN -> the man. If you do this in one translation step, you would need to specify --translation-factors 0,1-0 --decoding-steps t0 -phi On Tue, Jun 9, 2009 at 1:53 PM, Catharine Oertel wrote: > Hi there, > > I have a question concerning the fac
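
To make the factor notation concrete, here is a small sketch of how a factored token such as Der|DET splits into numbered factors. The helper names `parse_factored` and `select_factors` are hypothetical and are not part of Moses. Under the reading assumed here, "0,1-0" means source factors 0 (surface form) and 1 (part of speech) are translated into target factor 0 (surface form).

```python
def parse_factored(sentence, sep="|"):
    """Split a factored sentence into one tuple of factors per token."""
    return [tuple(token.split(sep)) for token in sentence.split()]


def select_factors(tokens, indices):
    """Keep only the factors at the given positions, in the given order."""
    return [tuple(token[i] for i in indices) for token in tokens]


source = parse_factored("Der|DET Mann|NN")
source_side = select_factors(source, [0, 1])   # input factors "0,1"
surface_only = select_factors(source, [0])     # factor 0 alone
```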

Re: [Moses-support] Binaries for Ubuntu 9.04

2009-05-27 Thread Philipp Koehn
Hi, if you can send us the diffs, we can check them in. -phi On Wed, May 27, 2009 at 1:42 AM, Eric Nichols wrote: > Greetings, > > Ubuntu has made some fairly large changes to the default build > environment in 8.10, > and I have been unable to build packages for that version and 9.04. > > To b

Re: [Moses-support] no individual alignment files

2009-05-23 Thread Philipp Koehn
Hi, the original two alignment files are generated by GIZA++ and are in the giza++ directories. -phi On Thu, May 21, 2009 at 8:25 PM, Scott Ledbetter wrote: > I am using Moses up to the word alignment step in training (steps 1-3), and > I'm not sure why in the resulting model directory, there i

Re: [Moses-support] can I use a character language model in translation?

2009-05-21 Thread Philipp Koehn
Hi, if the translation model also maps into individual Chinese characters as words in the translation table, then this works out of the box. If you use different tokenization for translation model and language model, you need to add some code in the language model scoring, which should not too

Re: [Moses-support] what is Dyn.PT?

2009-05-17 Thread Philipp Koehn
Hi, I have no idea - where did you hear about it? -phi 2009/5/11 dongxinghua0213 : > HI,in the model of "SMT has the last word",source text--rule-based MT > engines--Hypotheses--Dyn.PT--SMT decoder--target text,what is Dyn.PT? how > can I get it?thank u. > > > > > ___

Re: [Moses-support] filtering

2009-05-17 Thread Philipp Koehn
Hi, the "filter-phrase-table.perl" script that you may be referring to removes all phrase pairs for which the source phrase does not occur in the specified test set. No probabilities are adjusted. During training, phrases up to the length of 7 are used, and that number may be increased (and may
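
The filtering idea described above can be sketched as follows. This is an illustration of the principle, not the Perl script itself: the function names are hypothetical, and the sketch assumes one phrase pair per line with fields separated by " ||| " and the source phrase in the first field. Lines are kept or dropped whole, so no scores change.

```python
def source_ngrams(sentences, max_len=7):
    """Collect every word n-gram of up to max_len words from the test set."""
    grams = set()
    for sentence in sentences:
        words = sentence.split()
        for start in range(len(words)):
            for end in range(start + 1, min(start + max_len, len(words)) + 1):
                grams.add(" ".join(words[start:end]))
    return grams


def filter_phrase_table(lines, test_sentences, max_len=7):
    """Keep only phrase pairs whose source phrase occurs in the test set."""
    wanted = source_ngrams(test_sentences, max_len)
    kept = []
    for line in lines:
        source_phrase = line.split(" ||| ")[0].strip()
        if source_phrase in wanted:
            kept.append(line)  # the line is copied unchanged
    return kept


phrase_table = [
    "das Haus ||| the house ||| 0.5 0.4 0.6 0.3 2.718",
    "ein Buch ||| a book ||| 0.4 0.3 0.5 0.2 2.718",
]
filtered = filter_phrase_table(phrase_table, ["das Haus ist klein"])
```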

Re: [Moses-support] Moses how to know the language model is trained with or not?

2009-05-17 Thread Philipp Koehn
Hi, the decoder is not aware of whether the language model was trained with -unk. It is recommended to do so. The decoder uses a floor of -100 for low language model log probabilities, which may happen with unseen words if <unk> is not in the model. Here is the part of LanguageModelSRI.cpp where

Re: [Moses-support] grow algorithm on wiki

2009-05-12 Thread Philipp Koehn
Hi, this is correct - this was in fact a mistake in the description on the Wiki. I fixed it to "or" instead of "and". As far as I know, there has not been all that much work on different heuristics of this kind; the Och&Ney Computational Linguistics journal paper comes to mind. Most of the work that advan

Re: [Moses-support] lexical weighting and inverse probabilities

2009-05-07 Thread Philipp Koehn
Hi, they are all multiplied together, after applying an exponential weight. -phi On Thu, May 7, 2009 at 4:51 PM, Sanne Korzec wrote: > Hi, > > > > The final phrase pair table usually has a score vector of length 5: > > > > The components are: probability, lexical weights, inverse probability, >
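
The combination described above (each component raised to its weight, then all multiplied) can be written out as a short sketch. The feature values and weights below are made up for illustration, and the function names are hypothetical. In log space the same computation becomes a weighted sum of log scores.

```python
import math


def phrase_score(features, weights):
    """Multiply all components after raising each to its weight."""
    score = 1.0
    for value, weight in zip(features, weights):
        score *= value ** weight
    return score


def phrase_log_score(features, weights):
    """Same quantity in log space: a weighted sum of log components."""
    return sum(weight * math.log(value)
               for value, weight in zip(features, weights))


# Hypothetical five-component score vector and weights.
example_features = [0.5, 0.4, 0.6, 0.3, 2.718]
example_weights = [0.2, 0.2, 0.2, 0.2, 0.2]

direct = phrase_score(example_features, example_weights)
via_logs = math.exp(phrase_log_score(example_features, example_weights))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.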

Re: [Moses-support] MBR Decoding

2009-05-06 Thread Philipp Koehn
Hi, The InnerProduct function multiplies the individual feature scores path.GetScoreBreakdown() with the weights StaticData::Instance().GetAllWeights() -phi On Fri, Apr 24, 2009 at 11:10 PM, K.Taraka Rama wrote: > I have a doubt on how the individuals' models' probabilities are combined to

Re: [Moses-support] final phrase-table.gz

2009-05-06 Thread Philipp Koehn
Hi, the order of the entries in the file does not matter and it has no effect on performance. -phi On Wed, May 6, 2009 at 4:13 PM, Sanne Korzec wrote: > Hi, > > > > I have another question regarding the final phrase-table. > > > > Does the ordering of phrase pairs matter? Is it allowed to sort
