[Moses-support] Announcement: RandLM
What is it?

RandLM (randomised language modelling) is yet another language model for Moses. However, it is designed to be very space-efficient: depending on settings, it can represent an SRILM language model in about 1/10 of the space. The code can either estimate LMs from raw text (similar to SRILM's ngram-count) or load pre-built ARPA files. The best compression results are obtained when building LMs from raw text.

You can get the code here: http://sourceforge.net/projects/randlm
(This is the first public release and there are sure to be bugs.)

Read BUILDING_WITH_MOSES.txt for Moses integration and README for general information on building the release.

Note that Moses can support SRILM and RandLM LMs at the same time; just use:

./configure --with-randlm=/path/to/randlm --with-srilm=/path/to/srilm

If you want to read more about this, then look at our ACL and EMNLP papers:

David Talbot and Miles Osborne. Smoothed Bloom filter language models: Tera-Scale LMs on the Cheap. EMNLP, Prague, Czech Republic 2007. http://www.iccs.informatics.ed.ac.uk/~osborne/papers/emnlp07.pdf

David Talbot and Miles Osborne. Randomised Language Modelling for Statistical Machine Translation. ACL, Prague, Czech Republic 2007. http://www.iccs.informatics.ed.ac.uk/~osborne/papers/acl07.pdf

Miles

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
[Moses-support] Help me
When I built the translation model, the warnings below appeared. The parallel corpus contains both single words and sentences. When does this warning occur, and what does it mean?

WARNING: sentence 3 has alignment point (3, 0) out of bounds (3, 1)
E: 大きな木
F: a big tree
WARNING: sentence 4 has alignment point (3, 0) out of bounds (1, 1)
E: 原っぱ
F: field
WARNING: sentence 5 has alignment point (6, 0) out of bounds (1, 1)
E: 道
F: road

Regards,
--
Yuta Takemoto [EMAIL PROTECTED]
Re: [Moses-support] train-factored-phrase-model.perl error
Hi,

you may have already seen this from emails from last week: this is now fixed, so please check out the latest version.

-phi

On Tue, Oct 21, 2008 at 9:25 AM, Radek Bartoň [EMAIL PROTECTED] wrote:

Hello. I checked out today's svn trunk and compiled it with SRILM support. Then I trained on a corpus using the procedure described here:
http://www.statmt.org/wmt08/baseline.html
but it fails with the following message:

Executing: /mnt/data/Projekty/NLP/tools2/moses/scripts/training/phrase-extract/score ./model/extract.inv.sorted ./model/lex.e2f ./model/phrase-table.half.e2f inverse
PhraseScore v1.4 written by Philipp Koehn
phrase scoring methods for extracted phrases
using inverse mode
Loading lexical translation table from ./model/lex.e2f
Executing: rm -f ./model/extract.inv.sorted
(6.5) sorting inverse e2f table@ Tue Oct 21 09:32:42 CEST 2008
Executing: LC_ALL=C sort -T ./model ./model/phrase-table.half.e2f ./model/phrase-table.half.e2f.sorted
Executing: rm -f ./model/phrase-table.half.e2f
(6.6) consolidating the two halves @ Tue Oct 21 09:32:42 CEST 2008
Executing: rm -f ./model/phrase-table.half.*
(7) learn reordering model @ Tue Oct 21 09:32:43 CEST 2008
Executing: gunzip ./model/extract.o.gz | LC_ALL=C sort -T ./model ./model/extract.o.sorted
(7.2) building tables @ Tue Oct 21 09:32:43 CEST 2008
Executing: rm ./model/extract.o.sorted
(8) learn generation model @ Tue Oct 21 09:32:43 CEST 2008
no generation model requested, skipping step
(9) create moses.ini @ Tue Oct 21 09:32:43 CEST 2008
After default: -l mem_free=0.5G -hard
Using SCRIPTS_ROOTDIR: /mnt/data/Projekty/NLP/tools2/moses/scripts
checking weight-count for ttable-file
checking weight-count for lmodel-file
checking weight-count for distortion-file
moses.ini:31:File does not exist or empty: /mnt/data/Projekty/NLP/corpora/test/model/msd-table.0-0.bi.fe.0.5.gz

There is no such file in that directory; there is only the file /mnt/data/Projekty/NLP/corpora/test/reordering-table.msd-bidirectional-fe.0.5.gz, which should not be there.
I think there is a regression bug in the train-factored-phrase-model.perl script. Could you confirm and fix it, please?

--
Ing. Radek Bartoň
Faculty of Information Technology
Department of Computer Graphics and Multimedia
Brno University of Technology
E-mail: [EMAIL PROTECTED]
Web: http://blackhex.no-ip.org
Jabber: [EMAIL PROTECTED]
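For anyone who hits a similar "File does not exist or empty" message, the kind of check behind it can be reproduced by hand. The following is only a minimal sketch in Python, not the check that the Moses scripts actually perform: the function name `missing_or_empty_paths` is hypothetical, and it assumes that file references in the configuration appear as whitespace-separated tokens starting with `/` and that lines starting with `#` are comments.

```python
import os
from typing import List, Tuple


def missing_or_empty_paths(config_path: str) -> List[Tuple[int, str]]:
    """Return (line number, path) for every absolute path in the config
    file that is not a regular file or that has size zero.

    Assumption: paths are whitespace-separated tokens beginning with '/'.
    """
    problems = []
    with open(config_path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            # Skip comment lines (assumed to start with '#').
            if line.lstrip().startswith("#"):
                continue
            for token in line.split():
                if not token.startswith("/"):
                    continue
                # Report files that are missing or present but empty.
                if not os.path.isfile(token) or os.path.getsize(token) == 0:
                    problems.append((line_no, token))
    return problems
```

Calling `missing_or_empty_paths("moses.ini")` returns a list of `(line number, path)` pairs, which makes it easy to see which table the configuration expects and to compare that name with the files the training run actually wrote.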
Re: [Moses-support] train-factored-phrase-model.perl error
On Monday 03 of November 2008 11:53:54 Philipp Koehn wrote:

Hi, you may have already seen this from emails from last week, this is now fixed, so please check out the latest version. -phi

Yes, it's working now, many thanks!

--
Ing. Radek Bartoň
Faculty of Information Technology
Department of Computer Graphics and Multimedia
Brno University of Technology
E-mail: [EMAIL PROTECTED]
Web: http://blackhex.no-ip.org
Jabber: [EMAIL PROTECTED]
[Moses-support] 3rd MT Marathon - Call for participation
CALL FOR PARTICIPATION

THIRD MACHINE TRANSLATION MARATHON 2009

The MT Marathon 2009, organized by the Institute of Formal and Applied Linguistics of the Charles University in Prague, Czech Republic, is the third in a series of MT Marathons organized by the EU EuroMatrix research project on Machine Translation. The EuroMatrix consortium invites researchers, developers, students, and users of machine translation to participate.

The event will feature:
- Winter School classes on current methods in statistical MT
- Research showcase
- Open source convention on resources for machine translation, this time with an open call for papers (separate announcement will follow)
- Lab hands-on experience for system developers, students and programmers
- Workshop on evaluation of European language translation

Where: Prague's Lesser Town, the newly renovated historical building of the Computer Science School of the Charles University in Prague
When: January 26-31, 2009
How: Registration is now possible!
How much: Attendance is free of charge, but limited.

For more information and online registration please go to http://ufal.mff.cuni.cz/euromatrix/mtmarathon.

About the MT Marathon

MT Marathon is organized yearly by the EuroMatrix machine translation research project, funded by the European Union under its Cooperation programme as STREP project FP6-IST-5-034291-STP. The January 2009 event will be the third MT Marathon organized by EuroMatrix. MT Marathon consists of several events held at the same place to allow for a free flow of thoughts and exchange of information and experience: a spring school (this time more like a winter school) with associated lab lessons, invited research talks, and hands-on experience with Open Source MT tools. Participants will also gain experience in evaluating Machine Translation systems (with some hands-on experience in actual subjective evaluation of MT systems taking part in the WMT 2009 competition; see more at http://www.statmt.org/wmt09/).
This year, talks presenting some of the available Open Source tools in more detail are also planned throughout the week (see the call for papers). Please find more about the current MT Marathon and the previous ones at http://ufal.mff.cuni.cz/euromatrix/mtmarathon.

About EuroMatrix

The EuroMatrix project (http://www.euromatrix.net) aims at a major push in machine translation (MT) technology, applying the most advanced MT technologies systematically to all pairs of EU languages. Special attention is being paid to the languages of the new and near-term prospective member states. As part of this application development, EuroMatrix designs and investigates novel combinations of statistical techniques and linguistic knowledge sources as well as hybrid MT architectures. EuroMatrix addresses urgent European economic and social needs by concentrating on European languages and on high-quality translation to be employed for the publication of technical, social, legal and political documents. EuroMatrix aims at enriching the statistical MT approach with novel learning paradigms and at experimenting with new combinations of methods and resources from statistical MT, rule-based MT, shallow language processing and computational lexicography/morphology.
The main objectives of the project are:
* Translation systems for all pairs of EU languages, with a special focus on the languages of new and near-term prospective member states
* Efficient inclusion of linguistic knowledge into statistical machine translation
* The development and testing of hybrid architectures for the integration of rule-based and statistical approaches
* Organization, analysis and interpretation of a competitive annual international evaluation of machine translation with a strong focus on European economic and social needs
* The provision of open source machine translation technology including research tools, software and data
* A systematically compiled and constantly updated detailed survey of the state of MT technology for all EU language pairs based on the developed systematic translation between all EU languages, the comparative MT evaluations and an inventory of available and needed tools, components, lingware and data.
Re: [Moses-support] Help me
Thank you for the reply. There are no empty sentences in the parallel corpus, and I didn't use '|'. I still don't understand the problem well, but many thanks.

2008/11/4 Ondrej Bojar [EMAIL PROTECTED]

Dear Yuta,

my guess is that it's some basic issue with your parallel corpus. Something has just got out of sync. The meaning of the error is that the alignment links between words try to refer to words beyond the end of the sentence, which clearly means you're trying to apply the alignments to the wrong sentence.

Have you removed all sentence pairs where one of the sentences is empty? Are you sure there is no '|' character in your data? Are you sure you're using exactly the same files for phrase extraction as you used for GIZA? Have a look at the sentences (i.e. lines 3, 4, and 5) of your corpus files and of the alignment files, and check manually whether they fit together.

Best, Ondrej.

竹元勇太 wrote:

When i made the translation model, the warning occurred as follows. A parallel corpus contained the word and the sentence. At what time will this warning occur? What meaning is this warning?

WARNING: sentence 3 has alignment point (3, 0) out of bounds (3, 1)
E: 大きな木
F: a big tree
WARNING: sentence 4 has alignment point (3, 0) out of bounds (1, 1)
E: 原っぱ
F: field
WARNING: sentence 5 has alignment point (6, 0) out of bounds (1, 1)
E: 道
F: road

Regards,
--
Yuta Takemoto [EMAIL PROTECTED]

--
竹元勇太 [EMAIL PROTECTED]
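The manual checks suggested in this thread can be sketched as a short script. This is only a minimal illustration in Python, not part of Moses: the function names are hypothetical, sentences are assumed to be whitespace-tokenised, and alignments are assumed to be written as zero-based `i-j` pairs (for example `0-0 1-2`), where `i` indexes the first sentence and `j` the second.

```python
from typing import List, Tuple


def parse_alignment(line: str) -> List[Tuple[int, int]]:
    """Parse 'i-j' pairs such as '0-0 1-2' into integer tuples (assumed format)."""
    points = []
    for token in line.split():
        left, right = token.split("-")
        points.append((int(left), int(right)))
    return points


def out_of_bounds(points: List[Tuple[int, int]],
                  len_a: int, len_b: int) -> List[Tuple[int, int]]:
    """Return the alignment points that refer to a word beyond either sentence."""
    return [(i, j) for i, j in points
            if not (0 <= i < len_a and 0 <= j < len_b)]


def check_pair(sent_a: str, sent_b: str, align_line: str) -> List[str]:
    """Run the three checks from the thread on one sentence pair."""
    problems = []
    tokens_a, tokens_b = sent_a.split(), sent_b.split()
    # Check 1: neither side of the pair may be empty.
    if not tokens_a or not tokens_b:
        problems.append("empty sentence")
    # Check 2: the '|' character must not appear in the data.
    if "|" in sent_a or "|" in sent_b:
        problems.append("contains '|'")
    # Check 3: every alignment point must fall inside both sentences.
    bad = out_of_bounds(parse_alignment(align_line), len(tokens_a), len(tokens_b))
    for i, j in bad:
        problems.append(
            f"alignment point ({i}, {j}) out of bounds "
            f"({len(tokens_a)}, {len(tokens_b)})"
        )
    return problems
```

For example, `check_pair("a big tree", "大きな木", "3-0")` returns `["alignment point (3, 0) out of bounds (3, 1)"]`: the first sentence has three tokens (valid indices 0 to 2), so an alignment point with index 3 can only have come from a different, longer sentence. Running such a check line by line over the corpus files and the alignment file should show where the files stop fitting together.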