What is it? RandLM ("randomised language modelling") is yet another language model for Moses. However, it is designed to be very space-efficient indeed: depending upon settings, it can represent an SRILM language model in about 1/10 of the space. The code can either estimate LMs directly from raw text (similar to SRILM's "ngram-count") or load pre-built ARPA files. Best compression results are obtained when building LMs from raw text.
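Much of the space saving comes from randomised data structures. The EMNLP paper cited in this post builds on Bloom filters, which store set membership in a fixed bit array without keeping the n-gram strings themselves, at the cost of occasional false positives (never false negatives). The sketch below is a generic, minimal Bloom filter in Python, not RandLM's code. The class name, the SHA-1 salting scheme and the "8 bits per n-gram" setting are illustrative assumptions. A plain Bloom filter only answers "was this n-gram seen?"; the papers extend the idea to store smoothed counts and probabilities, which this sketch does not attempt.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter (illustrative only, not RandLM's implementation).

    m bits, k hash functions; lookups can return false positives
    but never false negatives.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Assumption: derive k bit positions by salting SHA-1 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every one of the k bits is set.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def expected_false_positive_rate(m: int, k: int, n: int) -> float:
    """Standard approximation (1 - e^{-kn/m})^k for n items in m bits."""
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    ngrams = ["the cat sat", "cat sat on", "sat on the", "on the mat"]
    bits_per_ngram = 8  # hypothetical setting: one byte per n-gram
    bf = BloomFilter(num_bits=bits_per_ngram * len(ngrams), num_hashes=5)
    for g in ngrams:
        bf.add(g)
    assert all(g in bf for g in ngrams)  # stored n-grams are always found
    print("the dog sat" in bf)  # usually False, but may be a false positive
    print(expected_false_positive_rate(bf.m, bf.k, len(ngrams)))
```

The storage cost depends only on the number of bits per item, not on how long the n-grams are; with about 8 bits per item and 5 hash functions, the approximation gives a false-positive rate of roughly 2%. Trading a small, controllable error rate for a large memory saving is the general idea behind randomised LMs.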
You can get the code here: http://sourceforge.net/projects/randlm (This is the first public release, and there are sure to be bugs.)

Read BUILDING_WITH_MOSES.txt for Moses integration and README for general information on building the release.

Note that Moses can support SRILM and RandLM LMs at the same time; just use:

./configure --with-randlm=/path/to/randlm --with-srilm=/path/to/srilm

If you want to read more about this, then look at our ACL and EMNLP papers:

David Talbot and Miles Osborne. Smoothed Bloom filter language models: Tera-Scale LMs on the Cheap. EMNLP, Prague, Czech Republic, 2007. http://www.iccs.informatics.ed.ac.uk/~osborne/papers/emnlp07.pdf

David Talbot and Miles Osborne. Randomised Language Modelling for Statistical Machine Translation. ACL, Prague, Czech Republic, 2007. http://www.iccs.informatics.ed.ac.uk/~osborne/papers/acl07.pdf

Miles