[Moses-support] Announcement: RandLM

2008-11-03 Thread Miles Osborne
What is it?

RandLM (randomised language modelling) is yet another language model
for Moses. However, it is designed to be very space-efficient:
depending on the settings, it can represent an SRILM language model in
about 1/10 of the space. The code can either estimate LMs from raw
text (similar to SRILM's ngram-count) or load pre-built ARPA files.
The best compression is obtained when building LMs from raw text.
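
The papers cited in this announcement describe Bloom filter language models. As a rough illustration of why such a structure is compact, here is a minimal sketch of a plain Bloom filter that records which n-grams were seen in a text. It is not RandLM's code or data layout: the class name, the SHA-256 based hashing and the sizes are assumptions chosen for the example, and unlike a real language model it stores no counts or probabilities.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: set membership with false positives, no false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # one bit per slot

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(seed.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(tokens, n):
    """Return all n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Store every bigram of a toy sentence in 1024 bits (128 bytes).
bf = BloomFilter(num_bits=1024, num_hashes=3)
for gram in ngrams("the cat sat on the mat".split(), 2):
    bf.add(gram)

print("the cat" in bf)  # an n-gram that was added is always reported as present
```

The filter never stores the n-gram strings themselves, only a fixed number of bits, which is where the space saving comes from. The price is that a lookup for an unseen n-gram can occasionally return a false positive.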

You can get the code here:

http://sourceforge.net/projects/randlm

(This is the first public release, and there are sure to be bugs.)

Read the files:

BUILDING_WITH_MOSES.txt

for Moses integration and:

README

for general information on building the release.

Note that Moses can support SRILM and RandLM LMs at the same time; just use

./configure --with-randlm=/path/to/randlm --with-srilm=/path/to/srilm

If you want to read more about this, then look at our ACL and EMNLP papers:

David Talbot and Miles Osborne.  Smoothed Bloom filter language
models: Tera-Scale LMs on the Cheap. EMNLP, Prague, Czech Republic
2007.
http://www.iccs.informatics.ed.ac.uk/~osborne/papers/emnlp07.pdf

David Talbot and Miles Osborne. Randomised Language Modelling for
Statistical Machine Translation. ACL, Prague, Czech Republic 2007.
http://www.iccs.informatics.ed.ac.uk/~osborne/papers/acl07.pdf

Miles

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Help me

2008-11-03 Thread 竹元勇太
When I built the translation model, the warnings shown below appeared.
The parallel corpus contains both single words and full sentences.

When does this warning occur?
What does this warning mean?


WARNING: sentence 3 has alignment point (3, 0) out of bounds (3, 1)
E: 大きな木
F: a big tree
WARNING: sentence 4 has alignment point (3, 0) out of bounds (1, 1)
E: 原っぱ
F: field
WARNING: sentence 5 has alignment point (6, 0) out of bounds (1, 1)
E: 道
F: road



Regards,


-- 
Yuta Takemoto
[EMAIL PROTECTED]


Re: [Moses-support] train-factored-phrase-model.perl error

2008-11-03 Thread Philipp Koehn
Hi,

you may have already seen this in last week's emails:
this is now fixed, so please check out the latest version.

-phi

On Tue, Oct 21, 2008 at 9:25 AM, Radek Bartoň
[EMAIL PROTECTED] wrote:
 Hello.

 I checked out today's svn trunk and compiled it with SRILM support. Then I
 trained on a corpus using the procedure described at
 http://www.statmt.org/wmt08/baseline.html but training fails with the
 following message:

 Executing:
 /mnt/data/Projekty/NLP/tools2/moses/scripts/training/phrase-extract/score
 ./model/extract.inv.sorted ./model/lex.e2f ./model/phrase-table.half.e2f
 inverse

 PhraseScore v1.4 written by Philipp Koehn

 phrase scoring methods for extracted phrases

 using inverse mode

 Loading lexical translation table from ./model/lex.e2f

 Executing: rm -f ./model/extract.inv.sorted

 (6.5) sorting inverse e2f table@ Tue Oct 21 09:32:42 CEST 2008

 Executing: LC_ALL=C sort -T ./model ./model/phrase-table.half.e2f >
 ./model/phrase-table.half.e2f.sorted

 Executing: rm -f ./model/phrase-table.half.e2f

 (6.6) consolidating the two halves @ Tue Oct 21 09:32:42 CEST 2008

 Executing: rm -f ./model/phrase-table.half.*

 (7) learn reordering model @ Tue Oct 21 09:32:43 CEST 2008

 Executing: gunzip < ./model/extract.o.gz | LC_ALL=C sort -T ./model >
 ./model/extract.o.sorted

 (7.2) building tables @ Tue Oct 21 09:32:43 CEST 2008

 Executing: rm ./model/extract.o.sorted

 (8) learn generation model @ Tue Oct 21 09:32:43 CEST 2008

 no generation model requested, skipping step

 (9) create moses.ini @ Tue Oct 21 09:32:43 CEST 2008

 After default: -l mem_free=0.5G -hard

 Using SCRIPTS_ROOTDIR: /mnt/data/Projekty/NLP/tools2/moses/scripts

 checking weight-count for ttable-file

 checking weight-count for lmodel-file

 checking weight-count for distortion-file

 moses.ini:31:File does not exist or empty:
 /mnt/data/Projekty/NLP/corpora/test/model/msd-table.0-0.bi.fe.0.5.gz

 There is no such file in that directory. There is only the file
 /mnt/data/Projekty/NLP/corpora/test/reordering-table.msd-bidirectional-fe.0.5.gz
 which should not be there. I think there is a regression bug in the
 train-factored-phrase-model.perl script.

 Could you confirm and fix it, please?

 --

 Ing. Radek Bartoň

 Faculty of Information Technology

 Department of Computer Graphics and Multimedia

 Brno University of Technology

 E-mail: [EMAIL PROTECTED]

 Web: http://blackhex.no-ip.org

 Jabber: [EMAIL PROTECTED]






Re: [Moses-support] train-factored-phrase-model.perl error

2008-11-03 Thread Radek Bartoň
On Monday 03 of November 2008 11:53:54 Philipp Koehn wrote:
 Hi,

 you may have already seen this from emails from last week,
 this is now fixed, so please check out the latest version.

 -phi


Yes it's working now, many thanks!

-- 
Ing. Radek Bartoň

Faculty of Information Technology
Department of Computer Graphics and Multimedia
Brno University of Technology

E-mail: [EMAIL PROTECTED]
Web: http://blackhex.no-ip.org
Jabber: [EMAIL PROTECTED]



[Moses-support] 3rd MT Marathon - Call for participation

2008-11-03 Thread Philipp Koehn
 CALL FOR PARTICIPATION

THIRD MACHINE TRANSLATION MARATHON 2009

The MT Marathon 2009, organized by the Institute of Formal and
Applied Linguistics of the Charles University in Prague, Czech
Republic, is the third in a series of MT Marathons organized by
the EU EuroMatrix research project on Machine Translation.

The EuroMatrix consortium invites researchers, developers,
students, and users of machine translation for participation. The
event will feature

- Winter School classes on current methods in statistical MT
- Research showcase
- Open source convention on resources for machine translation,
 this time with an open call for papers (separate announcement
 will follow)
- Lab hands-on experience for system developers, students and
 programmers
- Workshop on evaluation of European language translation

Where:
  Prague's Lesser Town, the newly renovated historical
  building of the Computer Science School of the Charles
  University in Prague
When:  January 26-31, 2009
How:   Registration is now possible!
How much: Attendance is free of charge, but limited.

For more information and online registration please go to
http://ufal.mff.cuni.cz/euromatrix/mtmarathon.


About the MT Marathon

MT Marathon is organized yearly by the EuroMatrix machine
translation research project funded by the European Union under
its Cooperation programme as a STREP project
FP6-IST-5-034291-STP. The January 2009 event will be the third MT
Marathon organized by EuroMatrix. MT Marathon consists of several
events held at the same venue to allow for a free flow of
thoughts and exchange of information and experience: a spring
school (this time more like a winter school) with associated
lab lessons, invited research talks, and a hands-on experience
with Open Source MT tools. Participants will also experience
evaluating Machine Translation systems (with some hands-on
experience in actual subjective evaluation of MT systems taking
part in the WMT 2009 competition - see more at
http://www.statmt.org/wmt09/). This year, talks presenting some of
the available open source tools in more detail are also planned
throughout the week (see the call for papers). Please find more
about the current MT Marathon and the previous ones at
http://ufal.mff.cuni.cz/euromatrix/mtmarathon.

About EuroMatrix

The EuroMatrix project (http://www.euromatrix.net) aims at a major
push in machine translation (MT) technology applying the most
advanced MT technologies systematically to all pairs of EU
languages. Special attention is being paid to the languages of the
new and near-term prospective member states. As part of this
application development, EuroMatrix designs and investigates novel
combinations of statistical techniques and linguistic knowledge
sources as well as hybrid MT architectures. EuroMatrix addresses
urgent European economic and social needs by concentrating on
European languages and on high-quality translation to be employed
for the publication of technical, social, legal and political
documents. EuroMatrix aims at enriching the statistical MT
approach with novel learning paradigms and at experimenting with new
combinations of methods and resources from statistical MT,
rule-based MT, shallow language processing and computational
lexicography/morphology.

The main objectives of the project are:
* Translation systems for all pairs of EU languages, with a
 special focus on the languages of new and near-term prospective
 member states
* Efficient inclusion of linguistic knowledge into statistical
 machine translation
* The development and testing of hybrid architectures for the
 integration of rule-based and statistical approaches
* Organization, analysis and interpretation of a competitive
 annual international evaluation of machine translation with a
 strong focus on European economic and social needs
* The provision of open source machine translation technology
 including research tools, software and data
* A systematically compiled and constantly updated detailed survey
 of the state of MT technology for all EU language pairs based on
 the developed systematic translation between all EU languages,
 the comparative MT evaluations and an inventory of available and
 needed tools, components, lingware and data.



Re: [Moses-support] Help me

2008-11-03 Thread 竹元勇太
Thank you for the reply.

There are no empty sentences in the parallel corpus,
and I did not use the '|' character.
I still do not fully understand the cause, but many thanks.

2008/11/4 Ondrej Bojar [EMAIL PROTECTED]

 Dear Yuta,

 my guess is that it's some basic issue with your parallel corpus. Something
 has just got out of sync.
 The meaning of the error is that the alignment links between words try to
 refer to words beyond the end of the sentence, which clearly means you're
 trying to apply the alignments to the wrong sentence.

 Have you removed all sentence pairs where one of the sentences is empty?

 Are you sure there is no '|' character in your data?

 Are you sure you're using the exact files for phrase extraction as you used
 for GIZA?

 Have a look at the sentences (i.e. lines 3, 4, and 5) of your corpus files
 and of the alignment files, and check manually whether they fit together.

 Best, Ondrej.
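
The checks suggested above (empty lines, '|' characters, and alignment points that fall outside their sentence) can also be scripted. The following is a minimal sketch, not part of Moses. It assumes the alignment file holds one line per sentence pair, made of whitespace-separated i-j pairs of 0-based word indices, where i indexes the first corpus file and j the second; the function names and the toy data are hypothetical.

```python
def find_suspect_lines(lines):
    """Return 1-based numbers of lines that are empty or contain '|'."""
    return [n for n, line in enumerate(lines, start=1)
            if not line.strip() or "|" in line]


def find_out_of_bounds(src_lines, tgt_lines, align_lines):
    """Return (sentence number, point, lengths) for each bad alignment point."""
    problems = []
    rows = zip(src_lines, tgt_lines, align_lines)
    for n, (src, tgt, align) in enumerate(rows, start=1):
        src_len, tgt_len = len(src.split()), len(tgt.split())
        for point in align.split():
            # Assumed format: "i-j" with 0-based token indices.
            i, j = (int(x) for x in point.split("-"))
            if i >= src_len or j >= tgt_len:
                problems.append((n, (i, j), (src_len, tgt_len)))
    return problems


# Toy data: the second alignment line belongs to some longer sentence pair.
src = ["a big tree", "field"]
tgt = ["大きな木", "原っぱ"]
align = ["0-0 1-0 2-0", "3-0"]

print(find_suspect_lines(src + tgt))
print(find_out_of_bounds(src, tgt, align))
```

If the second function reports points for many consecutive sentences, the corpus and alignment files have most likely drifted out of step, for example because lines were dropped from one file after the alignment was computed.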

 竹元勇太 wrote:

 When i made the translation model, the warning occurred as follows.
 A parallel corpus contained the word and the sentence.

 At what time will this warning occur?
 What meaning is this warning?


 WARNING: sentence 3 has alignment point (3, 0) out of bounds (3, 1)
 E: 大きな木
 F: a big tree
 WARNING: sentence 4 has alignment point (3, 0) out of bounds (1, 1)
 E: 原っぱ
 F: field
 WARNING: sentence 5 has alignment point (6, 0) out of bounds (1, 1)
 E: 道
 F: road



 Regards,


 --
 Yuta Takemoto
 [EMAIL PROTECTED]


 






-- 
竹元勇太
[EMAIL PROTECTED]