Hi,
the lm switch takes four items for each language model:
factor, order, filename, and type.
So, the trailing ":0" you are referring to is the type
(0=SRILM). If it is omitted, the value "0" is assumed.
-phi
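As an illustration (a sketch only; the path is made up, and the exact switch syntax should be checked against the Moses documentation for your version), a language model specification might look like:

```shell
train-model.perl [...] --lm 0:3:/path/to/lm.gz:0
# equivalent, since type 0 (SRILM) is assumed when omitted:
train-model.perl [...] --lm 0:3:/path/to/lm.gz
```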
On Wed, May 26, 2010 at 3:54 AM, wrote:
> Recently, these two threads have disc
ercase+recase or truecase+detruecase?
>
> Suzy
>
> On 25/05/10 8:47 PM, Philipp Koehn wrote:
>>
>> Hi Suzy,
>>
>> I could reproduce this error in a way that I assume is what you did.
>> You changed the specification of the CORPUS, but you did not
>> di
model to limited memory such as on-disk
> loading of phrase-based model.
>
> I mean the latest released version (moses-2010-04-26).
>
> --
> Hwidong Na
> KLE lab, POSTECH, KOREA
>
>
> 2010-05-24 (Mon), 13:06 +0100, Philipp Koehn:
>> Hi,
>>
>> t
Hi Suzy,
I could reproduce this error in a way that I assume is what you did.
You changed the specification of the CORPUS, but you did not
disable the truecaser.
You need to comment out the following settings:
[TRUECASER]
### script to train truecaser models
#
#trainer = $moses-script-dir/reca
Hi,
yes, it is no problem to have multiple versions of the decoder, as long
as they are in different directories.
-phi
On Mon, May 24, 2010 at 9:02 PM, Korzec, Sanne wrote:
> Hi All,
>
> I have a checkout from the repository on my system from 2008.
>
> Is it possible for me to do another checko
Hi,
there have been some updates over the last two weeks, but the current version
should work - as you also indicate. Let me know if there are any other bugs
in the latest revision.
What is the "stable version" you are referring to?
-phi
On Mon, May 24, 2010 at 9:28 AM, Hwidong Na wrote:
> Th
Hi,
thanks for trying out experiment.perl; please let me know of any
other problems you encounter.
The two solutions you indicate are both correct. I just updated
~/mosesdecoder/scripts/released_files in the main branch, so it
should be up to date now.
-phi
On Tue, May 18, 2010 at 3:23 AM, Suz
> best word, it chooses any word which is not necessarily the good one.
>
>
> Regards
> S
>
>
>
> 2010/4/27, Philipp Koehn :
>> Hi,
>>
>> I am not entirely sure what the state of the code is here.
>> Looking at the code (XmlOption.cpp), it seems to
Hi,
I am not entirely sure what the state of the code is here.
Looking at the code (XmlOption.cpp), it seems to require
double bars ("||") to separate the options.
See if that works, otherwise please help out with getting
the code into shape.
-phi
On Wed, Apr 21, 2010 at 10:18 AM, SG wr
Hi,
this seems to be the output of a tagger and not a syntactic parser,
so it is not suitable for a tree-based model.
-phi
On Mon, Apr 19, 2010 at 6:39 PM, haithem afli wrote:
>
> Hi;
> I'm trying to train Syntax Model . I use Treetiger to annotate the english
> trainning data and Amira for ara
Hi,
in all likelihood this is the result of noisy characters that are not visible
but are treated as tokens.
-phi
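Such invisible characters can be hunted down with a short script (a sketch, not part of Moses; the suspect-character list is an assumption about what commonly sneaks into corpora):

```python
# Sketch: find invisible characters that a tokenizer may emit as stray
# tokens, e.g. control characters, zero-width spaces, or a BOM.
import unicodedata

SUSPECT = {"\u200b", "\u00a0", "\ufeff"}  # zero-width space, NBSP, BOM

def suspicious_chars(line):
    """Return the invisible/suspect characters found in a line."""
    found = []
    for ch in line:
        if ch in SUSPECT or unicodedata.category(ch) in ("Cc", "Cf"):
            if ch not in ("\n", "\t"):
                found.append(ch)
    return found

# Example: a sentence with a zero-width space hidden between the words
line = "the\u200b house"
print(suspicious_chars(line))  # the zero-width space is reported
```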
On Fri, Apr 16, 2010 at 8:57 AM, Thu, Vuong Hoai wrote:
> Hi Hieu,
>
> When I read carefully output from training model, I determined that in
> lexical file (e2f and f2e) have some
Hi,
On Sat, Apr 3, 2010 at 7:26 AM, Somayeh Bakhshaei wrote:
> 1. I have used 5 order LM, and run the tuning step but in moses.ini file
> still the LM order is 3! How it is may?
The training script gets the language model from the specified user settings,
so you need to specify the right orde
Hi,
this is still work in progress - the documentation at the time of
the MT Marathon is here:
http://www.statmt.org/mtm4/?n=Main.EMSDocumentation
-phi
On Tue, Apr 6, 2010 at 3:14 PM, Lane Schwartz wrote:
> Hi,
>
> At the MT Marathon, Jon and the other LoonyBin folks presented a paper
> descri
Hi,
if you work on a particular language pair, I suggest building a
baseline system, analyzing the mistakes, and considering what needs
to be done to improve it.
There is typically something that can be done with regard to
morphology or reordering. At least you will learn something about
the problem of
Hi,
the relative limit -b is set so permissively that
it has no practical impact. The stack size -s is 200 by default.
You can set -s to values around 10-1000 and the parameter
-b to 0.001-0.5. See yourself what the effect is (speed/quality).
You may also want to look into cube pru
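A sketch of such a comparison (hypothetical file names; check the decoder's help output for the exact option names):

```shell
moses -f moses.ini -s 100  -b 0.1   < test.in > test.out.fast
moses -f moses.ini -s 1000 -b 0.001 < test.in > test.out.slow
```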
Hi Lane,
multi-threading in Moses works the same: different sentences
are distributed to different threads.
Adding stateful feature functions has two costs: one is the
calculation of the function and one is the additional state splitting
which hurts recombination in the beam search. The first cos
Hi,
if snt2cooc is causing trouble, I suggest running the training
script with the additional option "--parts 4", which splits up
the data for snt2cooc.
-phi
On Mon, Mar 29, 2010 at 2:43 PM, John Burger wrote:
>> C:\cygwin\home\moses\tools\bin\snt2cooc.out: *** fatal error - cmalloc
>> would hav
Hi,
the NIST script does internal tokenization, while the multi-bleu script
assumes that the data is already tokenized. There is also a difference
with the brevity penalty in the case of multiple reference translations.
-phi
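The brevity-penalty difference can be sketched as follows (a simplified illustration with made-up lengths, not the actual scripts' code): with multiple references, scorers can disagree on which reference length enters the penalty, e.g. the closest versus the shortest reference.

```python
import math

def brevity_penalty(cand_len, ref_len):
    # Standard BLEU brevity penalty: exp(1 - r/c) when the candidate is short.
    if cand_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / cand_len)

def closest_ref_len(cand_len, ref_lens):
    # One convention: the reference length closest to the candidate's
    # (ties broken toward the shorter reference).
    return min(ref_lens, key=lambda r: (abs(r - cand_len), r))

def shortest_ref_len(cand_len, ref_lens):
    # Another convention: always the shortest reference.
    return min(ref_lens)

cand, refs = 9, [10, 6]
bp_closest = brevity_penalty(cand, closest_ref_len(cand, refs))    # r = 10
bp_shortest = brevity_penalty(cand, shortest_ref_len(cand, refs))  # r = 6
print(bp_closest, bp_shortest)  # the two conventions penalize differently
```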
On Fri, Mar 19, 2010 at 7:49 PM, Adam Lopez wrote:
> IIRC, the princip
Hi,
thanks for pointing out the error - I fixed the web page.
-phi
On Fri, Mar 19, 2010 at 10:00 AM, Sara Stymne wrote:
> Hi Maria!
>
> The model msd-bidirectional-e will not work, since you can only
> condition lexical reordering models on either the foreign phrase (f) or
> both the foreign an
Hi,
there is also the issue that a larger language model leads to more
state splitting during decoding, so you may have to try larger beam
sizes to see gains.
-phi
On Wed, Mar 17, 2010 at 8:14 PM, Hieu Hoang wrote:
> hi somayeh
>
> if you change the LM order, you should change the ini file so the decoder
> k
Hi,
lexicalized reordering cannot be used in hierarchical models.
-phi
On Tue, Mar 16, 2010 at 8:49 AM, Bui Hung wrote:
> Dear Sir,
>
> When I use mert-moses in Mose-chart with the command:
> nohop nice ./mert-moses.pl
> working-dir/tuning/nc-dev2007.lowercased.frworking-dir/tuning/lowercased
Hi,
there is probably something wrong with the meta data in the xml files.
Do they have matching IDs, language names, etc.?
Check the NIST web site for the proper format.
-phi
On Tue, Mar 2, 2010 at 1:27 PM, Somayeh Bakhshaei wrote:
> Hi every body,
>
> I am at the end of running moses, :)
> my
table
> Best regards!
>
> Jie Jiang
> CNGL, School of Computing,
> Dublin City University,
> Glasnevin, Dublin 9.
> Tel: +353 (0)1 700 6724
>
>
>
>
> 2010/2/24 Philipp Koehn
>>
>> Hi,
>>
>> the relationship between the Xs is encoded as wor
could be calculated from another
> reordering-table that is generated in the training phase?
>
> Best regards!
>
> Jie Jiang
> CNGL, School of Computing,
> Dublin City University,
> Glasnevin, Dublin 9.
> Tel: +353 (0)1 700 6724
>
>
>
>
> 2010/2/24 Philipp Koehn
Hi,
the best reference would be David Chiang's Computational Linguistics
article on hierarchical phrase-based models.
-phi
On Wed, Feb 24, 2010 at 11:21 AM, Jie Jiang wrote:
> Dear all:
>
> Could you tell me where can I find the details of moses-chart reordering?
> It seems that the reordering
Hi Francois,
thanks for raising this interesting problem.
The translation probability that Moses provides is unlikely to be a
good indicator of the quality of the translation, since it will be
dominated by the language model component score. In other words, it
is more an indicator of how many unu
Hi,
which corpus are you talking about - where did you get it from
and what kind of processing did you do besides tokenization?
At some point the number of lines got out of sync, and you should
narrow down when that happened.
-phi
On Tue, Feb 9, 2010 at 9:24 AM, Pavani Y wrote:
> Hi All,
>
>
Hi,
thanks for catching this - I fixed it.
-phi
On Fri, Feb 5, 2010 at 1:01 AM, Christof Pintaske
wrote:
> Hi,
>
> here's a minor discovery in phrase_extract
>
> phrase_extract does not give any usage information, even though it seems
> somebody had the intention to do so:
>
> if (argc < 1)
Hi,
yes, GetScore() does not include the future score, just the weighted
partial score computed so far. The difference to hypo->GetPrevHypo()->GetScore()
is the transition cost from the prior best hypothesis. This number should
be negative, except in rare cases (for instance adding a common word
Hi,
Moses has a very liberal license (LGPL) that allows it to be used
in commercial products free of charge. We would appreciate an
appropriate mention of Moses.
-phi
On Mon, Feb 1, 2010 at 7:09 AM, wrote:
> Hi Philipp,
>
> We are very interested to use moses for our language translation purpo
Hi,
are you using your own installation, or are you referring to
http://demo.statmt.org/ ?
-phi
On Thu, Jan 21, 2010 at 9:24 PM, besacier wrote:
> hi
>
> i am experiencing a problem using moses-web : sometimes, i don't get
> the translated web page on the navigator while the translation is fu
Hi,
it seems you are using hierarchical rules and lexicalized
reordering at the same time. This is asking for trouble...
-phi
On Wed, Jan 20, 2010 at 11:05 PM, Christof Pintaske
wrote:
> Hi,
>
> my "extract.o.gz" respectively "extract.o.sorted" produce a large number
> of error messages: "buggy
Hi Bill,
here you go:
http://www.statmt.org/moses/manual-tex.tgz
This is the current snapshot; we will not update it.
-phi
On Wed, Dec 23, 2009 at 6:05 AM, Bill_Lang(Gmail) wrote:
> Hi friends,
> I know that moses manual is daily compiled by tex. Is it open-source
> also? If possible, I
Hi Doren,
please read the tutorial on factored models:
http://www.statmt.org/moses/?n=Moses.FactoredTutorial
-phi
On Tue, Dec 29, 2009 at 5:23 AM, EILMT Project wrote:
> Hi
>
> There is pos.lm of the target language in factored model training. I want
> to know the steps involved in preparing t
Hi,
this should work... Does the moses process generate a proper n-best list file?
There may be something wrong with running the decoder.
Regarding the section "non-terminals" in the moses.ini file - don't worry, this
is just a list of special non-terminals that are used for unknown words etc.
-
Hi,
the run-in-parts option only affects the "cooc" file creation -
which is mostly for memory efficiency, so GIZA++ does not
run out of memory. It only makes sense to use this option
if the cooc file creation runs out of memory.
-phi
On Wed, Dec 23, 2009 at 9:32 PM, Mark Fishel wrote:
>
Hi Alex,
unfortunately the randomized language model implementation is not yet
thread-safe, so it is incompatible with the multi-threaded Moses. Only the
SRILM interface is currently supported.
-phi
On Wed, Dec 16, 2009 at 4:28 PM, Alexander Fraser
wrote:
> Hi Barry and other folks,
>
> I'm als
Hi,
one way to reduce the size of the rule table is to enforce a lower
span size for the rules, for instance:
train-model.perl [...] -extract-options="--MaxSpan 8"
The default is 12.
-phi
2009/12/14 zhmmc :
> Hi
> Now I find a problem when I'm training a hierarchical model with script
> of
Hi,
try to compile without the --with-berkeleydb switch and
everything should work.
-phi
On Tue, Dec 8, 2009 at 4:19 PM, John Morgan wrote:
> Hello,
> I guess this question is for Hugh.
> I'm trying to compile the new moses chart parsing decoder in the
> mt3_chart directory.
> I have the berkel
smoothing for
better probabilities of low-count phrase pairs.
Almost all of the tree-based code was written by Hieu Hoang,
who deserves full credit for this.
Regards,
Philipp Koehn
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu
Hi,
there is no option to report this.
I suggest changing the score.cpp code so that the counts
are reported somewhere as they are processed.
Or, sort the extract files and count up phrase pairs.
-phi
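The sort-and-count suggestion can be sketched with standard Unix tools (hypothetical file name, assuming one phrase pair per line in the extract file):

```shell
# Count how often each extracted phrase pair occurs.
printf 'the house ||| das haus\nthe house ||| das haus\nthe man ||| der mann\n' > extract.sample
sort extract.sample | uniq -c | sort -rn
```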
On Mon, Dec 7, 2009 at 7:32 AM, Delip Rao wrote:
> Hi,
> Is there a way to find absolut
Hi,
I do not fully understand your tokenization issues, but you should look
into writing a tokenizer that suits your needs. The Moses decoder is
agnostic about tokenization.
Regarding the demo: Look at the documentation on the Moses web site
on the Moses server implementation. You can also spee
4: Training data released
* February 15: Test data released (available on this web site)
* February 19: Results submissions
* March 26: Short paper submissions (4 pages)
Organizers
* Chris Callison-Burch (Johns Hopkins University)
* Philipp Koehn (University of Edinburgh
Hi,
if you generate an n-best list, the types of the different feature functions
are more transparent, but it is a safe bet that your example "-312.951"
is the unweighted language model score.
-phi
On Wed, Dec 2, 2009 at 11:44 PM, marco turchi wrote:
> Hi Hieu,
> I'll check... and I'll let u
Hi,
you should not specify an n-best-list option to the MERT script,
since it uses its own file naming convention for the n-best list
files it produces. If you want to change the size of the n-best lists
to 200 use the option "-nbest 200".
-phi
On Fri, Nov 27, 2009 at 2:55 PM, Samidh Chatterjee
Hi,
I have seen text files under Windows that add a starting byte to
indicate the encoding of the file. Since the first word is the problem,
this may be the cause.
-phi
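That leading byte sequence is a byte order mark (BOM); a minimal sketch for detecting and stripping it, assuming UTF-8 data:

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark, written by some Windows editors

def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM so the first token is not corrupted."""
    return data[len(BOM):] if data.startswith(BOM) else data

raw = BOM + "resumption of the session".encode("utf-8")
print(strip_bom(raw).decode("utf-8"))  # resumption of the session
```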
On Thu, Nov 26, 2009 at 2:34 PM, Ivan Uemlianin
wrote:
> Hieu
>
> Thanks for your comment.
>
> How can this be a line-ending issu
Hi,
you are correct that for POS LMs the lower order n-gram counts
are very different and smoothing is less relevant.
You could train a 7-gram LM with Good Turing smoothing for the lower
order n-grams and Kneser-Ney for the higher order n-grams.
I have done this occasionally.
-phi
On Tue, Nov
Hi,
On Sat, Nov 21, 2009 at 12:50 PM, Kranthi Achanta
wrote:
> Hi,
>
> We are trying to run the Moses decoder with two different phrase-tables. We
> gone through the instructions in the Moses site and modified the moses.ini
> file according to that but, we couldn’t able to get the output and it a
Hi,
Yes. For each weight specified in the config file, there are feature
values that are represented in the n-best list.
-phi
2009/11/22 iamzcy_hit iamzcy_hit :
> Hi,all
> I have a question .
> Whether the paremetres in Moses' config file are one-to-one
> correspondence with the featur
Hi,
I have no idea how easy it is to use a conventional database
to store the models.
Regarding Google translate: they use very similar methods,
but it is implemented in a parallel way.
If you are interested in that - check out Hadoop,
or this tutorial:
http://clear.colorado.edu/NAACLHLT2009/tutori
Hi,
you will need a source and reference file that is also
in the sgm file format. If you do not have this, you will
have to build it yourself. See the test file for the WMT task
for examples: http://www.statmt.org/wmt09/
With this file, you can run the script:
wrap-xml.perl LANGUAGE SOURCE-FILE
Hi,
I assume that you are talking about the GIZA++ training.
This easily takes 1-2 days with large corpora.
-phi
On Wed, Nov 18, 2009 at 4:06 AM, Pavani Y wrote:
> Hi all,
>
>
>
> I have given the corpus of French(12MB approximately) &
> English(11MB approximately) which I got from Mo
therwise.) Does the last source phrase with regard to the following
> context have the same policy? If you don't know off the top of your
> head, I'll dig into the data and figure it out.
>
> Thanks,
> John
>
> On Wed, Nov 11, 2009 at 11:59 AM, Philipp Koehn wr
Hi,
the determination in training of whether a phrase is a swap (with regard
to the previous or next phrase) is based on alignment points around the phrase.
Slide 112 in this tutorial defines which alignment points are looked at:
http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/tutorial2006.pdf
So, yes
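A simplified sketch of the idea (illustrative only, not the actual extraction code; the corner-point heuristic follows the tutorial slide referenced above):

```python
def orientation(alignment, s_start, s_end, t_start):
    """Orientation of a phrase with regard to the previous phrase.

    alignment is a set of (source_pos, target_pos) alignment points;
    the phrase spans source positions [s_start, s_end] and starts at
    target position t_start. A point diagonally adjacent to the phrase's
    top-left corner signals monotone order, one adjacent to the top-right
    corner signals a swap; otherwise the orientation is discontinuous.
    """
    if (s_start - 1, t_start - 1) in alignment:
        return "monotone"
    if (s_end + 1, t_start - 1) in alignment:
        return "swap"
    return "discontinuous"

# Fully monotone toy alignment: word 0 <-> 0, word 1 <-> 1.
align = {(0, 0), (1, 1)}
print(orientation(align, 1, 1, 1))  # monotone
```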
Hi,
we may even remove this option, since cube pruning is doing something
very similar, and it is not clear if there are tangible benefits to the
early discarding.
-phi
On Mon, Nov 9, 2009 at 3:24 PM, Chris Dyer wrote:
> This functionality is broken in the tip of the trunk. There was a
> proje
Hi,
if you use the new training code that is stored away in the mt3_chart branch,
you can run
train-factored-phrase-model.perl
with the additional option:
-score-options "--WordAlignment FILE"
which will generate a word alignment file (with the specified file name)
for all the phrase pairs.
Hi,
yes, boost is only needed for the multi-threaded decoder, not the
single-threaded decoder.
-phi
On Tue, Nov 3, 2009 at 4:32 PM, Hieu Hoang wrote:
> MySQL - no. That was taken out a few years ago.
> Boost - i think you need it only if you compile the new code in svn
> which implements multi-
Hi,
I am not aware of any plans to port the training code to Windows,
since we are not using Windows here at the university. The decoder
itself should run on Windows identically.
-phi
On Tue, Nov 3, 2009 at 1:51 AM, Chia Tee Kiah wrote:
> Hello all,
>
> I read from the Moses FAQ that the Moses
Hi,
it looks like your training corpus has some noisy ASCII characters
that are handled differently by C++ and Perl. You will need to clean up
your corpus to remove them.
-phi
On Mon, Nov 2, 2009 at 12:29 PM, Ivan Uemlianin
wrote:
> Dear All
>
> I have Moses running fine on MacOSX. Now
Hi Lane,
this is described here:
http://www.statmt.org/moses/?n=Moses.ChartDecoding
Extracting hierarchical rules is pretty straightforward - just add
"-hierarchical" when you run train-factored-phrase-model.perl.
We are not entirely sure if there is any sanity to the defaults
(span-size 15, for ins
Hi,
yes, it is possible to have a single binary with SRILM and IRSTLM,
and even RandLM - there used to be problems, but those are resolved now.
I tried to find where on the web page this is still mentioned as a problem.
If there is indeed such a mention, can you point me to it, so we can
remove that
Hi,
On Wed, Oct 21, 2009 at 1:32 PM, Bakhshaei wrote:
> 2. i can't run moses configuration file . when type ./configure
> --with-srilm=/path-to-srilm i get this error:
>
> bash: ./configure: No such file or directory
>
> Where is the problem? Can you help me please?
You have to run
sh regenera
Hi,
these should be working with the factored model without problems.
Have you tried it out?
-phi
On Mon, Oct 19, 2009 at 12:21 PM, Pouliquen, Bruno
wrote:
> As advanced features of Moses (wonderful tool!), the XML markup “”
> and “” tags are very handy, however, according to the manual they ar
Hi,
there is currently no such option, although it would be relatively
easy to implement this. If you need some pointers to the code
that handles unknown words, please let me know.
-phi
On Fri, Oct 16, 2009 at 9:28 AM, miguel wrote:
> Dear list,
>
> Does moses feature any option that allows the
;." --- end of
> sentence marker).
>
> Could you please let me know if there is a limit on the max length of
> sentences - I gave a length of 1 - 60 while running the script.
> In addition, is there any limit on the max allowable difference in sentence
> length of the parallel
ientation
> Executing:
> ./tools/moses-scripts//scripts-20091002-0031//training/phrase-extract/extract
> ./work3/corpus/IRL-clean.en2 ./work3/corpus/IRL-clean.hi2
> ./work3//model/aligned.grow-diag-final-and ./work3//model/extract 7
> --NoFileLimit orientation
> PhraseExtract v1.4, wri
Hi,
ok, I am late on that latest change. Please remove all the
abort() statements then.
In general, it would be better to ensure that no faulty XML
is generated in the first place. Graceful decay also leads
to errors that are hard to track down.
-phi
2009/10/5 "Münt, Bernd" :
>> Von: phko...@gmai
Hi,
if you want the decoder to just be less picky and gracefully
decay on faulty XML input, you can edit the source file
moses/src/XmlOption.cpp
and remove all "return false;" statements after "TRACE_ERR("ERROR:..."
lines.
This way, the XML is processed all the way through.
-phi
2009/10/1 "Münt
Hi,
you need to specify the factors that you want to use.
Check for details on how to do this here:
http://www.statmt.org/moses/?n=Moses.FactoredTutorial
-phi
On Mon, Aug 24, 2009 at 6:25 AM, wrote:
> Hi all,
>
> I'm new to Moses and kinda new to the SMT field in itself. I wish to train a
>
Hi,
if I understand this correctly, you want to preserve the XML tags
verbatim and only translate the content. The easiest solution would be
to strip out the XML, translate the text, and then use the
phrase alignment (and word alignment within phrases) to
determine the positions where the tags s
Hi,
the phrase penalty is part of the translation model - in a very
crude way: each phrase pair entry has a 5th scoring component
which is 2.718, which is e. Hence if there are 17
phrase pairs used, the log score is 16.9982 (log 2.718^17).
-phi
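The arithmetic can be checked directly (a sketch, using the rounded constant 2.718 as in the phrase table):

```python
import math

# Each used phrase pair contributes a constant factor of ~e (2.718) as the
# phrase penalty; with 17 phrase pairs the log score is 17 * ln(2.718).
phrase_penalty = 2.718
n_phrases = 17
log_score = n_phrases * math.log(phrase_penalty)
print(round(log_score, 4))  # 16.9982, i.e. essentially 17
```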
2009/9/16 Felipe Sánchez Martínez
> Hi there
Hi,
just to chime in:
when translating from L1-L2 and L2-L1 you will certainly need
different language models, but there is conceivably some
efficiency in sharing the same translation model. The way the
translation model is parameterized (both forwards and backward
probabilities), it is quite con
> take so long? Am I missing something here?
>
> James
>
>> -phi
>>
>> On Tue, Sep 1, 2009 at 3:59 PM, James Read wrote:
>>>
>>> Hi,
>>>
>>> Quoting Philipp Koehn :
>>>
>>>> Hi,
>>>>
>>>> ye
Hi,
the computationally expensive part is the creation of the *.classes files.
-phi
On Tue, Sep 1, 2009 at 3:59 PM, James Read wrote:
> Hi,
>
> Quoting Philipp Koehn :
>
>> Hi,
>>
>> yes, it is correct that step 1 is doing just the data preparation for
>> GIZA++.
>&g
Hi,
yes, it is correct that step 1 is doing just the data preparation for GIZA++.
The most time-consuming step is running mkcls to create the classes
for the relative distortion models.
-phi
On Mon, Aug 31, 2009 at 4:39 PM, James Read wrote:
> Hi,
>
> does anyone know what step 1 of the moses tr
Hi,
either the decoding or the mert optimization must have crashed at some
point. Check how many run* files in the tuning temp directory have been
created. If tuning ran for 10+ iterations, it is probably safe to use the newest
weights. Otherwise, you can continue tuning by running the tuning
scri
Hi,
you will need an external tool to give you this type of information
(for instance MXPOST or Brill's tagger) - and reformat it into the
Moses format.
-phi
On Mon, Aug 17, 2009 at 4:10 PM, Anand Kumar wrote:
> hi,
>
> I am using Moses for Translating English to Tamil .I hav a parallel corpus .
Hi,
you are looking at the STDERR output of the decoder which gives
some additional information about the translation. For instance, it
flags one of the words as an unknown word which is copied verbatim
into the output. The actual translation is passed to STDOUT, where
no such additional informatio
Hi,
one thing to check would be if your corpus has any bars "|"
in it, which are special symbols for the multiple factors...
It would be helpful to know what values "count" and "factorType" have
at the point of the error, and if this happened on the first sentence or
somewhere down the r
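A quick corpus check for stray bars can be sketched like this (hypothetical example line):

```python
# Sketch: locate lines containing a bar "|", which the decoder interprets
# as a factor separator and which therefore corrupts factored input.
def lines_with_bars(lines):
    return [(i, line) for i, line in enumerate(lines, start=1) if "|" in line]

corpus = ["a clean sentence", "prices rose 5 |up| percent"]
print(lines_with_bars(corpus))  # reports line 2 with its content
```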
:30 PM, James Read wrote:
> In that case I really don't see how the code is guaranteed to give results
> which add up to 1.
>
> Quoting Philipp Koehn :
>
>> Hi,
>>
>> this is LaTex {algorithmic} code.
>>
>> count($e|f$) += $\frac{t(e|f)}{\text{s-tota
t[f][e] / total[f];
> }
> }
>
>
> Is this the kind of thing you mean?
>
> Thanks
> James
>
> Quoting Philipp Koehn :
>
>> Hi,
>>
>> I think there was a flaw in some versions of the pseudo code.
>> The probabilities certainly need to add up to one. The
Hi,
I think there was a flaw in some versions of the pseudo code.
The probabilities certainly need to add up to one. There are
two normalizations going on in the algorithm: one on the sentence
level (so the probability of all alignments add up to one) and
one on the word level.
Here the most rece
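The two normalizations can be sketched for IBM Model 1 (a minimal illustration with toy data, not the original pseudo code):

```python
from collections import defaultdict

def em_iteration(t, sentence_pairs):
    """One IBM Model 1 EM iteration with both normalizations.

    t[f][e] holds the current translation probabilities t(e|f);
    sentence_pairs is a list of (english_words, foreign_words) pairs.
    """
    count = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for es, fs in sentence_pairs:
        # Sentence-level normalization: for each e, the alignment
        # probabilities over all f sum to one.
        s_total = {e: sum(t[f][e] for f in fs) for e in es}
        for e in es:
            for f in fs:
                c = t[f][e] / s_total[e]
                count[f][e] += c
                total[f] += c
    # Word-level normalization: for each f, the t(e|f) sum to one.
    for f in count:
        for e in count[f]:
            t[f][e] = count[f][e] / total[f]
    return t

# Toy uniform start: two English words, two foreign words.
t = {"f1": {"e1": 0.5, "e2": 0.5}, "f2": {"e1": 0.5, "e2": 0.5}}
t = em_iteration(t, [(["e1", "e2"], ["f1", "f2"])])
print(sum(t["f1"].values()))  # 1.0 -- the probabilities add up to one
```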
Hi,
does the phrase table file exist?
Is it listed in moses.ini?
-phi
On Mon, Jul 27, 2009 at 6:39 AM, nikita joshi wrote:
> can anyone pls help me out.
>
>
> after training the model , in the end i get
> (9) create moses.ini
>
>
> but when i test for a sample data, i get the following err
Hi,
if you run the decoder with the option -v 3 you will see in detail
what the decoder is doing. In your case, I would assume that
it is not reading in the phrase table at all.
-phi
On Sun, Jul 26, 2009 at 3:07 AM, 美娜 宋 wrote:
> Hi all,
>Now I'm working on using Moses for the Chinese to
Hi,
the IRSTLM code is managed by Marcello Federico and Nicola Bertoldi
at IRST, so they would have to check that in...
-phi
On Thu, Jul 23, 2009 at 3:59 PM, Francis Tyers wrote:
> Hello all,
>
> I have a question about the debugging output in IRSTLM. Is there a way
> to turn it off ? If I set
Hi,
I checked the integrity of the file by downloading it myself, and it
is fine. Could you please try to download it again - that seems to
be the problem.
Also, there is a version 3 available.
-phi
On Wed, Jul 22, 2009 at 9:19 AM, nikita joshi wrote:
>
>
> i am facing a very serious problem.
Hi Scott,
Moses currently does not output confidence measures.
In terms of alignment, using the trace option "-t" the decoder outputs
phrase alignments.
Getting word alignments is a bit more difficult; please read the manual
section on outputting word alignment. It might be easier to compute word
Hi,
this is most likely because the machine runs out of
memory. Try filtering and/or binarizing the phrase
table.
-phi
On Sat, Jul 18, 2009 at 6:32 PM, Girard Ramsay wrote:
> Hello,
>
> I am going through the "Moses Installation and Training Run-Through"
> document, and have built Moses with SR
Hi,
I have no idea what an ERROR 2 error is, but if I recall correctly
then GIZA has a hard limit on mapping at most 9 words to 1
during alignment. If this is standard in your data, then it is bound to
cause some severe problems. You may be able to change this
either by a switch in GIZA or by chan
Hi,
so this would be a translation of
Der|DET Mann|NN -> the man
If you do this in one translation step,
you would need to specify
--translation-factors 0,1-0 --decoding-steps t0
-phi
On Tue, Jun 9, 2009 at 1:53 PM, Catharine Oertel
wrote:
> Hi there,
>
> I have a question concerning the fac
Hi,
if you can send us the diffs, we can check them in.
-phi
On Wed, May 27, 2009 at 1:42 AM, Eric Nichols wrote:
> Greetings,
>
> Ubuntu has made some fairly large changes to the default build
> environment in 8.10,
> and I have been unable to build packages for that version and 9.04.
>
> To b
Hi,
the original two alignment files are generated by GIZA++ and are
in the giza++ directories.
-phi
On Thu, May 21, 2009 at 8:25 PM, Scott Ledbetter wrote:
> I am using Moses up to the word alignment step in training (steps 1-3), and
> I'm not sure why in the resulting model directory, there i
Hi,
if the translation model also maps into individual Chinese
characters as words in the translation table, then this
works out of the box. If you use different tokenization for translation
model and language model, you need to add some code
in the language model scoring, which should not too
Hi,
I have no idea - where did you hear about it?
-phi
2009/5/11 dongxinghua0213 :
> HI,in the model of "SMT has the last word",source text--rule-based MT
> engines--Hypotheses--Dyn.PT--SMT decoder--target text,what is Dyn.PT? how
> can I get it?thank u.
>
>
>
>
> ___
Hi,
the "filter-phrase-table.perl" script that you may be referring to
removes all phrase pairs for which the source phrase does not
occur in the specified test set. No probabilities are adjusted.
During training, phrases up to the length of 7 are used, and that
number may be increased (and may
Hi,
the decoder is not aware of whether the language model
was trained with -unk. It is recommended to do so. The decoder
uses a floor of -100 log for low language model probabilities,
which may happen with unseen words if <unk> is not in the model.
Here is the part of LanguageModelSRI.cpp where
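The floor behavior can be sketched as follows (an illustration of the rule described above, not the actual LanguageModelSRI.cpp code; the log base is an assumption for the example):

```python
# Unseen words get a log probability no lower than -100.
LOG_FLOOR = -100.0

def floored_lm_score(lm_logprob):
    """Clamp extremely low language model log probabilities to the floor."""
    return max(lm_logprob, LOG_FLOOR)

print(floored_lm_score(-340.2))  # -100.0, the floor kicks in
print(floored_lm_score(-3.5))    # -3.5, ordinary scores pass through
```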
Hi,
this is correct - this was in fact a mistake in the description on the Wiki.
I fixed it to "or" instead of "and". As far as I know, there has not been
all that much work on different such heuristics; the Och&Ney Computational
Linguistics journal paper comes to mind. Most of the work that advan
Hi,
they are all multiplied together, after applying an exponential weight.
-phi
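In log space, multiplying exponentially weighted component scores is a weighted sum (a sketch with made-up component values and weights):

```python
import math

def combine(scores, weights):
    """Log-linear model: product of h_i ** w_i == exp(sum w_i * log h_i)."""
    return math.exp(sum(w * math.log(h) for h, w in zip(scores, weights)))

# Hypothetical component probabilities and tuned weights.
scores = [0.5, 0.2, 0.7, 0.1, 0.3]
weights = [0.2, 0.2, 0.3, 0.2, 0.1]

# Direct product of weighted factors agrees with the log-space form.
product = 1.0
for h, w in zip(scores, weights):
    product *= h ** w
print(combine(scores, weights), product)
```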
On Thu, May 7, 2009 at 4:51 PM, Sanne Korzec wrote:
> Hi,
>
>
>
> The final phrase pair table usually has a score vector of length 5:
>
>
>
> The components are: probability, lexical weights, inverse probability,
>
Hi,
The InnerProduct function multiplies the individual feature weights
path.GetScoreBreakdown()
with the weights
StaticData::Instance().GetAllWeights()
-phi
On Fri, Apr 24, 2009 at 11:10 PM, K.Taraka Rama
wrote:
> I have a doubt on how the individuals' models' probabilities are combined to
Hi,
the order of the entries in the file does not matter and it
has no effect on performance.
-phi
On Wed, May 6, 2009 at 4:13 PM, Sanne Korzec wrote:
> Hi,
>
>
>
> I have another question regarding the final phrase-table.
>
>
>
> Does the ordering of phrase pairs matter? Is it allowed to sort