LDHT is not really supported, but looking at your error message it seems
that you need to install Google Sparse Hash.
On Wed Nov 19 2014 at 12:47:27 PM Hieu Hoang hieu.ho...@ed.ac.uk wrote:
There is a script within the randlm project that compiles just the library
needed to integrate the
I would model them as feature functions over phrases. You might imagine
that you can exploit vector similarity to do smoothing.
Good luck
Miles
this perl snippet:
$line =~ tr/\040-\176/ /c;
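as a complete command it might look like this (a sketch; the file names are
just placeholders):
# replace every character outside printable ASCII (octal 040-176) with a space
perl -pe 'tr/\040-\176/ /c' < input.txt > input.clean.txt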
gonzález
to
gonz lez
On 30 May 2014 17:22, Miles Osborne mi...@inf.ed.ac.uk wrote:
this perl snippet:
$line =~ tr/\040-\176/ /c;
friday nlp malaise
On 30 May 2014 17:51, Miles Osborne mi...@inf.ed.ac.uk wrote:
it is trivial to change it to say a ? mark.
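for example, the same substitution writing a question mark instead of a space
would be:
$line =~ tr/\040-\176/?/c;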
but I'm not sure what you want as output now. the original request
was for removing non-printable characters, which the Perl does,
Miles
On 30 May 2014 12:43
you can get kenlm to report perplexity as follows:
bin/query foo.arpa text | tail -n 1
note that you need to be careful with OOVs if you are comparing models
that do not all use the same vocabulary.
(SRILM is broken in this respect in that an OOV will give you a
probability of one)
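for example (a sketch; the model and test-set names are placeholders -- score
the same test text with each model so the comparison is like for like):
bin/query model_a.arpa test.txt | tail -n 1
bin/query model_b.arpa test.txt | tail -n 1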
Miles
--
SMT systems such as Moses do not guarantee that they can reproduce
the training set. For example, phrases might be pruned due to
frequencies being too low, not all words might be aligned, the
decoder might discard the true translation during search, etc.
This doesn't really have much to do with
Incremental training in Moses is based upon work we did a few years back:
http://homepages.inf.ed.ac.uk/miles/papers/naacl10b.pdf
Table 3 shows that there is essentially no quality difference between
incremental training and standard GIZA++ training. incremental (re)
training is a lot faster.
If I recall the decoder was modified to allow batching of LM requests.
Miles
On 25 September 2013 10:22, Hieu Hoang hieuho...@gmail.com wrote:
I'm not sure how to compile LDHT but when i compiled randlm from svn, i had
to change 2 minor things to get it to compile on my mac:
1.
this functionality.
Cheers,
Lane
On Wed, Sep 25, 2013 at 10:24 AM, Miles Osborne mi...@inf.ed.ac.uk wrote:
If I recall the decoder was modified to allow batching of LM requests.
Miles
On 25 September 2013 10:22, Hieu Hoang hieuho...@gmail.com wrote:
I'm not sure how to compile LDHT but when i compiled
For a long time now I've wanted to see Moses on a small device. Apart from
all of the extra functionality that isn't needed, one would also need to
work on shrinking the phrase table and perhaps also the search graph.
KenLM / RandLM already deal with making the language model smaller.
An
this is a fairly typical result for MERT. i notice you are using
MIRA, which is claimed to be more reliable. see
http://www.aclweb.org/anthology/N/N09/N09-1025.pdf
note that getting MIRA to work takes a lot of tweaking, so read the
fine print carefully
Miles
On 25 July 2012 17:24, Cristina
the way weights are estimated, translation changes when I add
new features with zero weight (not in development but in test). They
shouldn't contribute to score the final translation, right?
Cristina
On Wed, 25 Jul 2012, Miles Osborne wrote:
this is a fairly typical result for MERT. i notice
then something is wrong
Miles
On 25 July 2012 19:42, Cristina cristi...@lsi.upc.edu wrote:
mmm... but the others were optimised together, without the new ones to which I'm
giving a weight zero...
On Wed, 25 Jul 2012, Miles Osborne wrote:
if you have non-zero feature values at training time
The standard way to do this is pretend that each word pair in a dictionary
is a little sentence. Append this to the usual parallel corpus and train
with Giza
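a minimal sketch, assuming a tab-separated dictionary (source word, tab, target
word) and the usual corpus.src / corpus.tgt pair (all file names are
placeholders):
# append each dictionary entry as a tiny one-word sentence pair
cut -f1 dict.tsv >> corpus.src
cut -f2 dict.tsv >> corpus.tgt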
Miles
On May 1, 2012 5:53 PM, Abby Levenberg leven...@gmail.com wrote:
Hi,
I assume the answer is no but wanted to be sure.
Thanks,
Very short sentences will give you high scores.
Also multiple references will boost them
Miles
On Apr 26, 2012 8:13 PM, John D Burger j...@mitre.org wrote:
I =think= I recall that pairwise BLEU scores for human translators are
usually around 0.50, so anything much better than that is indeed
no it works as I just verified.
On 20 April 2012 11:29, sara hamza sarahamz...@gmail.com wrote:
Good morning everyone,
Can anyone tell me please where I can get the mteval-v11b.pl used in
evaluation? I found this URL in some documentation: ftp://
incremental training for Giza is distinct from incremental training
for the language model.
we have worked on both --see Abby Levenberg's PhD
http://homepages.inf.ed.ac.uk/miles/phd-projects/levenberg.pdf
the short answer is yes, but I don't think the incremental LM code
has migrated from
Oliver is in the process of finishing it.
Miles
On Feb 14, 2012 3:45 PM, Lane Schwartz dowob...@gmail.com wrote:
Miles,
Just ran across this email and thought I'd follow up. How is this
coming along? :)
Cheers,
Lane
On Thu, Nov 17, 2011 at 11:31 AM, Miles Osborne mi...@inf.ed.ac.uk
source
segments was translated redundantly by four different Turkers.
Note that we have translated paragraphs, so the data should be of
interest to researchers looking at discourse as well as machine
translation.
http://homepages.inf.ed.ac.uk/miles/babel.html
Miles Osborne (Edinburgh)
Chris Callison
this can be done, but it tends to not save much space. also it does
not help deal with OOVs, which the language model can still score even
though they are not in the parallel set.
if you are worried about saving space then you should either look at
KenLM or RandLM
Miles
On 24 November 2011
--in general, Machine Translation training is non-convex. this means
that there are multiple solutions and each time you run a full
training job, you will get different results. in particular, you will
see different results when running Giza++ (any flavour) and MERT.
Is there no way to
re: not tuning on training data, in principle this shouldn't matter
(especially if the tuning set is large and/or representative of the
task).
in reality, Moses will assign far too much weight to these examples,
at the detriment of the others. (it will drastically overfit). this
is why the
we have been working on making distributed LMs efficient. stay tuned
Miles
On 17 November 2011 13:53, Hieu Hoang hieuho...@gmail.com wrote:
hi peter
i think christian federmann worked on the remote LM :
Question: do you think it's better to run mert-moses.pl more times
with smaller sets, or fewer times with larger sets?
you should run tuning with larger sets, multiple times
no amount of rerunning tuning on a small set will tell you anything
Miles
On 7 November 2011 13:45, Tom Hoar
that doesn't work, as all of the locking code etc would still be invoked.
you really want something like
--threads 0
which should bypass everything and truly run in single threaded mode
Miles
On 22 September 2011 10:26, Kenneth Heafield mo...@kheafield.com wrote:
-threads 1 ?
On 09/22/11
compile-time does a better job at meeting a goal that I don't buy into.
On 09/22/11 10:31, Miles Osborne wrote:
that doesn't work, as all of the locking code etc would still be invoked.
you really want something like
--threads 0
which should bypass everything and truly run in single threaded
On 22 September 2011 11:28, Kenneth Heafield mo...@kheafield.com wrote:
But I don't see a use case for it. I can run gdb just fine on a
multithreaded program that happens to be running one thread. And the
stderr output will be in order.
On 09/22/11 11:21, Miles Osborne wrote:
should someone
exactly, the only correct way to get real probabilities out would be
to compute the normalising constant and renormalise the dot products
for each phrase pair.
remember that this is best thought of as a set of scores, weighted
such that the relative proportions of each model are balanced
Miles
question is: What is that metric best indicative of?
--
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
Server: freenode
On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
exactly, the only correct way to get real probabilities out would be
to compute
yes
On 6 September 2011 17:28, Cyrine NASRI cyrine.na...@gmail.com wrote:
Hi all,
Is it possible to use a 5-gram language model built by SRILM with Moses?
Thanks
Best
Cyrine
--
Cyrine
Ph.D. Student in Computer Science
for the SRILM, you use the -unk flag; RandLM does this by default if I
recall
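for example (a sketch; corpus and model names are placeholders):
# -unk builds an open-vocabulary model, so <unk> gets an explicit entry
ngram-count -order 5 -unk -text corpus.txt -lm model.lm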
Miles
On 16 August 2011 06:28, Tom Hoar tah...@precisiontranslationtools.com wrote:
Ken,
Does the online moses documentation refer to how to ensure the language
model has <unk> in the vocabulary? I've never seen it.
good to see the variance reduction.
why not repeat this with more features? you should see a greater effect
this way. an easy way to do this is to just add more language models.
Miles
On 11 August 2011 19:53, Philipp Koehn pko...@inf.ed.ac.uk wrote:
Hi,
I added a number of improvements to
that isn't the expected answer here. i think the OP wants some kind of
incremental (re) training.
firstly: it is not really possible to guarantee that performance is not
degraded when running from subsets up to the full set (compared with just
running it on the full set).
secondly, you may
it is this:
Abby Levenberg, Chris Callison-Burch and Miles Osborne. Stream-based
Translation Models for Statistical Machine Translation.
NAACL, Los Angeles, USA, 2010.
http://homepages.inf.ed.ac.uk/miles/papers/naacl10b.pdf
Miles
On 15
the simplest approach would be to use another character to join words
together. the tokeniser thinks you have hyphenated words, which is
probably what you don't want.
Miles
On 13 June 2011 18:39, Anna c annac...@hotmail.com wrote:
Hi,
I've tried what you suggested, but I'm not sure if I'm
is this after running with SRILM?
if so, then look for the script which creates the LM and delete it.
that should force it to be re-created, using IRSLM
Miles
On 27 May 2011 09:16, Greg Wilson gre...@gmail.com wrote:
Hi, first let me thank the people who are making Moses available, your
work
It looks like you are using 64 bit versions eg srilm. Make sure everything
is 32 bit
Miles
On 21 May 2011 13:45, Bartosz Grabski bartosz.grab...@gmail.com wrote:
Hello,
I'm using quite fresh Ubuntu 11.04 (on a 32bit machine).
I downloaded and compiled latest srilm and irstlm (not without some
naturally, the parallel data could be down-sampled (eg use 1/2 of it).
you probably won't see a significant degradation in translation
quality and the whole training process will use less RAM and will be
quicker.
Miles
On 18 April 2011 15:05, Tom Hoar tah...@precisiontranslationtools.com wrote:
There is work published on making mert more stable (on the train so can't
easily dig it up)
Miles
sent using Android
On 25 Mar 2011 12:49, Lane Schwartz dowob...@gmail.com wrote:
We know that there is nondeterminism during optimization, yet virtually all
papers report results based on a single
/mtmarathon2010/ProjectFinalPresentation/MERT/StabilizingMert.pdf
On Friday 25 March 2011 13:02, Miles Osborne wrote:
There is work published on making mert more s...
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number S
to add to Barry's excellent answer, we are currently working on a
client-server language model. this will mean that a cluster of
machines can be used, with a shared resource. it should also work
with multicore
but in the short-term, you are probably better off with multicore
Miles
On 2
supply a weights file, eg
weight-config = /home/miles/nist09/run9.moses.ini
add this to the TUNING section.
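i.e. something like this in your experiment.perl config (the path is of course
a placeholder for your own tuned ini file):
[TUNING]
weight-config = /home/miles/nist09/run9.moses.ini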
Miles
On 31 January 2011 21:22, John Morgan johnjosephmor...@gmail.com wrote:
--
Regards,
John J Morgan
Hello,
I'd like to run an experiment with the ems without tuning. Is it
?
Is there a way to use pass-unless, ignore-unless, or template-if for this?
Thanks,
John
On 1/31/11, Miles Osborne mi...@inf.ed.ac.uk wrote:
supply a weights file, eg
weight-config = /home/miles/nist09/run9.moses.ini
add this to the TUNING section.
Miles
On 31 January 2011 21:22, John Morgan
Not yet
Miles
sent using Android
On 15 Jan 2011 10:00, Sébastien Druon s.dr...@ml-technologies.com wrote:
Thanks!
Do you approximately know in what time frame?
Regards,
Sebastien
On Wed, 2011-01-12 at 09:44 +, Miles Osborne wrote:
sorry, the code is not publicly available
Sebastien
On 12 Jan 2011 09:21, Miles Osborne mi...@inf.ed.ac.uk wrote:
yes. we have done this for both Giza++ and for the language model:
Stream-based Translation Models for Statistical Machine Translation,
Abby Levenberg, Chris Callison-Burch and Miles Osborne, NAACL 2010
Stream-based
in general you should send SRILM requests to their mailing list and
not to this one.
but i can tell you straight away that the ngram server is behaving
correctly. it waits for requests ...
Miles
On 26 November 2010 11:28, Korzec, Sanne sanne.kor...@wur.nl wrote:
Hi,
I have compiled SRILM on
i second this.
but can I make another suggestion. make the default be *non* factored
input. i reckon that most people using Moses don't actually use
factors (hands-up if you do).
this means, plain input, with absolutely no meta chars in them.
and if you are going to use meta-chars, why not
i implemented this years ago (the idea then was to see if for
free-word-order languages, phrases could be generalised). at the time
it didn't seem that there was a more efficient way to do it than just
generate permutations and score them.
and if you think about it, this is essentially the
this sounds risky to me. it would be better to allow the user to
specify the behaviour; for your suggestions, you would add an extra
flag which would enable this. the default would be for truecasing to
operate as it used to.
Miles
On 25 October 2010 17:37, Ben Gottesman
ah, my apologies --I didn't realise you also wanted morphological
information. in that case, you will need something like Fran's
suggestion
Miles
On 20 October 2010 11:12, Francis Tyers fty...@prompsit.com wrote:
You could use the morphological analysers from the Apertium project.
note also that NIST changed to IBM BLEU recently which has a different
treatment of multiple references.
(mteval 13 uses IBM BLEU if i recall)
generally the BLEU scores will be a little lower than before, but MERT
performance should be more robust
Miles
On 17 October 2010 09:57, liu chang
the phrase length refers to the number of words in a phrase and the
number of scores to the number of feature function, per phrase.
they have nothing to do with each other
On 6 October 2010 11:31, supp...@precisiontranslationtools.com wrote:
I found this message below, which mentions the
clearly changing the configuration will change the alignment results.
i suggest that before mailing the list again, you read this article:
A Systematic Comparison of Various Statistical Alignment Models.
Franz Josef Och, Hermann Ney.
http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf
Miles
looking at your output:
[ERROR] Malformed input at
Expected input to have words composed of 1 factor(s) (form FAC1|FAC2|...)
but instead received input with 0 factor(s).
sh: line 1: 5114 Aborted
make sure you have no bar (|) characters in the data
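a quick way to check (the file name is a placeholder):
# list any lines in the input that contain a bar character
grep -n '|' corpus.src | head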
Miles
On 27 September 2010 14:45, Souhir
it is probably more helpful to give the number of sentences you used
for language model training (and other details, eg ngram order).
but at first glance that looks like a tiny amount of language model
data --i would expect to see something closer to 2GB or so, depending
upon representation
don't really understand
how the setup works.
Thanks again,
Suzy
On 2/09/10 8:26 PM, Miles Osborne wrote:
a better setup would be to have a loop which did the following:
--for a given version number and step, check for STDERR, STDOUT and DONE
--if they are all found, exit
--otherwise sleep
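something like the following would do it (a sketch only -- the file naming
below is an assumption about the EMS layout, not taken from experiment.perl):
#!/usr/bin/perl
# poll until the STDERR, STDOUT and DONE files for a given step exist
use strict; use warnings;
my ($dir, $step, $run) = @ARGV;   # working dir, step name, run number
while (1) {
    last if -e "$dir/steps/$run/$step.$run.STDERR"
         && -e "$dir/steps/$run/$step.$run.STDOUT"
         && -e "$dir/steps/$run/$step.$run.DONE";
    sleep 30;
}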
this is after a crash I presume?
if so, then you should delete the step which creates the first config
file. this will force it to be recreated, using the current version.
below is a small perl script I use (for an older version of
experiment.perl, but it should work for you too). this was
see here:
http://jeremy.zawodny.com/blog/archives/010546.html
for a discussion of utf8 v UTF8
... now off to see England triumphant against Germany
Miles
On 27 June 2010 13:23, Miles Osborne mi...@inf.ed.ac.uk wrote:
on the subject of UTF8, i think the Moses tokeniser may be using
On 11 May 2010 17:33, Christian Hardmeier c...@rax.ch wrote:
For my purposes, even a hard-coded assumption of 1, along with a more
transparent error message if the model isn't found, would do. Does
anybody actually decode with in-memory phrase tables in real life?
(well, I suppose some people
MADA can create tokens that are bar characters (ie | )
you need to rename them to something like BAR. Moses treats these as
factor delimiters, hence the message you are seeing
(i've been using MADA+TOKAN for Arabic, using the D2 setting)
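for example (a sketch; the file names are placeholders):
# replace any token that is a bare | with BAR before decoding
perl -lane 'print join(" ", map { $_ eq "|" ? "BAR" : $_ } @F)' < mada.tok > moses.in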
Miles
On 7 May 2010 23:26, David Edelstein
there is a large amount of randomness involved with parameter tuning. each
time you run it (using the same language resources) you might get different
parameters,
also, the parameters are not scaled. this means that one run might give you
these values:
10 20 30
and the next run might give you
this means you have run out of memory.
you can either:
--get more memory
--use less data
--use a lower-order LM
--use RandLM, which can easily handle this amount of data (i am
currently building LMs using more than 30 billion words with it for
example)
Miles
On 21 April 2010 09:57, Zahurul
a quick question. will this break compatibility with existing training runs?
also, adding new features --even if they are not used-- can impact
upon MERT and may slow things down / make things worse. have you
verified (using multiple runs) that this new feature doesnt' make
things worse than
re: adding dictionary entries, this is certainly a hack. but the
standard trick is to pretend that the dictionary actually consists of
tiny parallel sentences. you therefore just append each word-entry as
a new sentence pair. don't bother with that -d option.
Miles
On 23 February 2010 18:34,
this is a standard error. you need to build SRILM using 64-bit
support (i686-m64)
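i.e. (a sketch -- run this from the top of the SRILM directory):
make MACHINE_TYPE=i686-m64 World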
Miles
On 22 February 2010 11:40, Marce van Velden marcevanvelde...@gmail.com wrote:
Hi,
I get the following error when trying to compile moses on an intel64 pc. What
could cause the liboolm.a to be incompatible?
How words are tokenised / segmented etc is crucial when using small
amounts of data. For the vast numbers of people using Moses (people
not training-up on millions of sentence pairs) this is the kind of
thing that needs to be done correctly.
It would be a service to extend the Moses tokeniser to
it looks to me like you have not correctly compiled / installed the srilm.
Miles
2010/1/27 christopher taylor christopher.paul.tay...@gmail.com:
hello everyone!
i'm currently trying to build an instance of moses to support
crisiscommons.org's machine translation project (i'm currently the
you should also look at RandLM, as it will enable you to run a
language model in small space.
that aside, i would look hard at pruning the various tables (eg phrase
tables, reordering, language models) so you keep just the core that you
need. this will make for faster loading etc. note also that
2010-01-11
From: Miles Osborne
Sent: 2010-01-11 16:12:38
To: 李贤华
Cc: moses-support
Subject: Re: [Moses-support] different servers + different time -
different result?
Giza++ and MERT both can produce different results, even when using
the same
randlm is already in a binary format, so there is no extra conversion.
loading randomised models faster is not something that we have really
looked at.
Miles
2009/12/23 Arda Tezcan arda...@yahoo.com:
Hi,
I would really appreciate it if you could help me with the following
question I have:
I
Making RandLM thread-safe is something I've been thinking about.
There are a number of bug fixes which need dealing with too, so
perhaps at some point I'll push out a new release.
Miles
2009/12/17 Alexander Fraser fra...@ims.uni-stuttgart.de:
Hi Barry and Philipp,
Philipp is correct,
/anthology-new/W/W02/W02-1018.pdf
On Tue, Oct 27, 2009 at 10:51 PM, Catalin Braescu cata...@braescu.com wrote:
Then I wonder how can aligning be done automatically for phrases? And
what's the accuracy of such process?
Catalin Braescu
On Wed, Oct 28, 2009 at 12:36 AM, Miles Osborne mi
data to do well at Chinese-English than with Spanish-English.
Miles
2009/10/27 Catalin Braescu cata...@braescu.com:
Then I wonder how can aligning be done automatically for phrases? And
what's the accuracy of such process?
Catalin Braescu
On Wed, Oct 28, 2009 at 12:36 AM, Miles Osborne mi
you can't supply language models for both directions: you need to
supply them for the target and not the source
Miles
2009/10/22 Ivan Uemlianin i.uemlia...@bangor.ac.uk:
Dear All
I am using Moses with irstlm. The language pair I am developing is
English and Welsh. I have built language
the only other source of lots of parallel data (I know about) is the LDC:
http://www.ldc.upenn.edu/
but this is not free ...
Miles
2009/9/6 Catalin Braescu cata...@braescu.com:
Thanks, Miles! From your link I got http://www.statmt.org/europarl/
Any other such goodies?
Catalin
--
the good thing about probabilities is that they should sum to one
(but you can get numerical errors giving you slightly more / less ...)
Miles
2009/7/27 James Read j.rea...@sms.ed.ac.uk
Ok. Thanks. I think I understand this now. I also think I have found
the bug in the code which was causing
and don't forget to look at RandLM -this can save you a lot of memory
for your language model (a lot more than IRSTLM)
plug over!
Miles
2009/5/5 Marcin Miłkowski milek...@o2.pl:
Jan Helak writes:
I have one last question. The final version will be built with approx. 50 MB
of Polish text and 50 MB
actually, i think Jan wants a speedup, not a space saving.
your best bet is to reduce the size of the beam:
http://www.statmt.org/moses/?n=Moses.Tutorial#ntoc6
Miles
2009/5/4 Francis Tyers fty...@prompsit.com:
On Mon, 04-05-2009 at 14:54 +0200, Jan Helak wrote:
Hello everyone :)
I try
Miłkowski milek...@o2.pl:
Miles Osborne writes:
filtering etc might give you a speed-up (eg a constant one --less
stuff to load) but if filtering is safe w.r.t. the source data, then
you shouldn't see much here.
(pruning the table should make it faster since there will be fewer
options to consider
also see fewer page faults and the like with a
smaller model and that will help matters.
but in general, the beam size is the most direct way to make it faster.
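for example (a sketch; whether your moses build accepts -stack under that name
may vary, and the file names are placeholders):
# smaller hypothesis stacks decode faster, at some cost in quality
moses -f moses.ini -stack 50 < input.txt > output.txt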
Miles
2009/5/4 Francis Tyers fty...@prompsit.com:
On Mon, 04-05-2009 at 14:08 +0100, Miles Osborne wrote:
actually, i think Jan
there are many factors here. firstly, the randomised LM makes errors
as a function of the false positive rate and the value quantisation
level. roughly, the smaller these parameters are, the smaller your LM
will be, but there may be a performance drop.
secondly, the default count-based
in general, when you compile a c or c++ program, you add the switch
-g
to the options (usually in a Makefile). this will tell the compiler
to add stuff to the program so that it works with gdb.
you then do:
gdb moses
and you will see a prompt. you then run moses within that prompt, but
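a full session might look like this (a sketch; the ini and input file names
are placeholders):
gdb moses
(gdb) run -f moses.ini < input.txt
(gdb) bt        # after a crash, print the backtrace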
assuming the current version hasn't been fixed to deal with the LM
problem affecting older versions of gcc:
--check-out the code using SVN as usual, ie
svn co https://svn.sourceforge.net/svnroot/mosesdecoder/trunk mosesdecoder
then look at the SVN logs:
svn log | less
find some version
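and then check out (or update to) a revision from before the problem, eg
(the revision number is a placeholder):
svn co -r 1234 https://svn.sourceforge.net/svnroot/mosesdecoder/trunk mosesdecoder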
a couple of points:
--you are asking ngram for perplexities scores, but Moses uses log probs
--Moses will append <s> and </s> pseudo-words to the start and end of
a sentence; this will change the probabilities
Miles
2009/3/5 Carlos Henriquez carlo...@gps.tsc.upc.es:
Hi all.
I'm making some
one thing to remember is that the link between AER and BLEU is not
obvious; in my view at least AER-like scores should be treated with
skepticism and the real merit of an alignment approach should be the
corresponding translation performance (BLEU etc).
can you provide associated BLEU scores for
anyway. this is, I guess, because it's better on
recall. AER seems to strongly prefer precision.
jorg
On Wed, 4 Mar 2009 13:46:36 +
Miles Osborne mi...@inf.ed.ac.uk wrote:
one thing to remember is that the link between AER and BLEU is not
obvious; in my view at least AER-like scores
there is a related bug with randlm which i'm looking at now.
whilst i'm doing this, can you verify that it is some mac-specific
problem and not say something due to the gcc version you are using?
Miles
2009/3/4 Kemal Oflazer k...@cs.cmu.edu:
Dear All
I just install moses on a large mac
ok, i'll try to work it out.
can you:
--mail me your moses.ini file
--mail me the commands you ran to create your language model
--tell me exactly how much language model data you used and what it
is; if it is europarl then that should be ok
Miles
2009/2/24 Michael Zuckerman
to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Torbjorn Granlund and Richard M. Stallman.
Miles
2009/2/19 Miles Osborne mi...@inf.ed.ac.uk:
what happens when you run this?
Miles
2009/2/19 Michael Zuckerman michael90...@gmail.com:
Hi,
I am
that might be it. but i seem to have it working here, using a
non-gzipped version of Europarl.
in any case, Michael: tell us if it works when the corpus is gzipped
Miles
2009/2/19 Barry Haddow bhad...@inf.ed.ac.uk:
Hi
I've seen this error before. The short answer is that you need to use a
ah, ok. i think David hit it on the head: randlm is currently in the
very first release and to my knowledge hasn't been extensively tested
under various setups.
we'll gather together these problems and add them into the next release.
Miles
2008/12/7 Radek Bartoň xbart...@stud.fit.vutbr.cz:
which version of unix are you using?
Miles
2008/11/28 Radek Bartoň [EMAIL PROTECTED]:
Hello.
Since there is no RandLM mailing list (at least I haven't found one) I'm
posting here. When creating a language model with the cat compressor, buildlm
fails (on my system) with error:
cat: invalid
it could be due to things like the way ties are broken, floating-point
errors and the like
Miles
2008/11/21 Hieu Hoang [EMAIL PROTECTED]:
that would be worrying. are you sure all parameters are the same? loading
the models and memory shouldn't affect the results.
there may rarely be
about this, then look at our ACL and EMNLP papers:
David Talbot and Miles Osborne. Smoothed Bloom filter language
models: Tera-Scale LMs on the Cheap. EMNLP, Prague, Czech Republic
2007.
http://www.iccs.informatics.ed.ac.uk/~osborne/papers/emnlp07.pdf
David Talbot and Miles Osborne. Randomised
firstly, do MERT and make sure that everything has reasonable parameters!
this is how to think about testing. you are trying to estimate the
error of your model (which you trained-up in the usual way). when
estimating this error, the *training set* is the test set. so, the
more `training'
(my message bounced as it was too long ... here is a truncated version)
Miles
-- Forwarded message --
From: Miles Osborne [EMAIL PROTECTED]
Date: 2008/8/14
Subject: Re: [Moses-support] Fwd: Moses: Prepare Data, Build Language Model
and Train Model
To: Llio Humphreys [EMAIL
building language models (using for example ngram-count) is computationally
expensive. from what you tell the list, it seems that you don't have enough
physical memory to run it properly.
you have a number of options:
--specify a lower order model (eg 4 rather than 5, or even 3); depending
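for example (a sketch; file names are placeholders):
# a 3-gram model needs far less memory to build than a 5-gram one
ngram-count -order 3 -kndiscount -interpolate -text corpus.txt -lm corpus.3gram.lm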
an ugly hack is to simply create a soft link to the i686-m64 directory (as i
recently did on a new 64 bit machine)
Miles
2008/8/13 Sara Stymne [EMAIL PROTECTED]
Hi!
When we installed SRILM and Moses on our 64-bit Ubuntu machine we had
some troubles with getting the machine type right. What