Re: [Moses-support] Tuning a factored model -> crash

2020-03-09 Thread Hieu Hoang
Hieu Hoang
http://statmt.org/hieu


On Mon, 9 Mar 2020 at 10:08, Haukur Páll Jónsson wrote:

> Hi all,
>
>
> For the last few weeks, I have been trying to train and tune a factored
> model. I have found it difficult to implement and I now seek some
> assistance.
>
>
> I am trying to build a simple factored model from English to Icelandic:
> T0-0, T0,1-1, where factor 0 is the surface form and factor 1 is the POS
> tag. The training data (source and target) has three factors,
> `surface|pos|lemma`; the lemma is ignored for now.
>
>
> When tuning a factored model I run into problems. My first question is:
> which factors should be in the tuning data? It seems that I can have all
> factors in the input/source, but I'm unsure about the output/target.
>
The tuning data also needs to have factors 0 and 1. You should pre-process
the training, tuning and test data in the same way.
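
A minimal sketch of one way to apply that advice, assuming the data are whitespace-tokenised and every token has the form `surface|pos|lemma` as described in the question. The function name `keep_surface_pos` and the file names in the commented loop are hypothetical and not part of Moses; the filter simply keeps factors 0 and 1 and drops the rest.

```shell
# Hypothetical helper (not part of Moses): keep only the first two
# '|'-separated factors (surface and POS) of every whitespace-separated
# token read from stdin, and write the result to stdout.
keep_surface_pos() {
  awk '{
    for (i = 1; i <= NF; i++) {
      n = split($i, f, "|")
      if (n >= 2) $i = f[1] "|" f[2]   # drop the lemma and any later factors
    }
    print
  }'
}

# Example with assumed file names: run the same filter over every data set
# so that training, tuning and test data are pre-processed identically.
# for split in train dev test; do
#   keep_surface_pos < "$split.factored.is" > "$split.surface-pos.is"
# done
```

For instance, `printf 'soft|JJ|soft hands|NNS|hand\n' | keep_surface_pos` prints `soft|JJ hands|NNS`. Running one shared filter over all three data sets is one simple way to make sure they cannot drift apart in their factor layout.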

> I run the tuning like so (using 10 threads):
>
>
> "$MOSESDECODER"/scripts/training/mert-moses.pl \
> "$DEV_DATA_IN"."$LANG_FROM" \
> "$DEV_DATA_OUT"."$LANG_TO" \
> "$MOSESDECODER"/bin/moses "$BASE_MOSES_INI" \
> --mertdir "$MOSESDECODER"/bin \
> --working-dir "$TUNE_DIR" \
> --decoder-flags="-threads $THREADS"
>
>
> But then, when it starts decoding, the decoder crashes.
>
>
> Line Line 9: Initialize search took leave|VBP0.103 seconds total
> soft|JJ hands|NNS ,|, Jess|NNP you|PRP very|RB weak|JJ ,|, including|VBG
> 4: Initialize search took  .|.
>  : Collecting options took 0.003severe|JJ ,|, his|PRP$I|PRP loved|VBD
> Sean|NNP .|. you|PRP know|VBP .|.
>   seconds at moses/Manager.cpp Line 0.129 seconds total
> joint|JJ examination|NN Line 2: Initialize search took 141
>  Segmentation fault
> Exit code: 139
>
>
> I am monitoring the memory usage and the decoder is only using about 4 GB
> of the 32 GB allocated when it crashes. Why is the decoder crashing? Are
> there any recommended settings for training a factored model?


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Final Call: EAMT Best Thesis Award (2 last days)

2020-03-09 Thread Carol Scarton
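=========================================================
The Anthony C. Clarke Award for the 2019 EAMT Best Thesis
=========================================================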

The European Association for Machine Translation (EAMT, http://www.eamt.org)
is an organization that serves the growing community of people interested
in MT and translation tools, including users, developers, and researchers
of this increasingly viable technology.

The EAMT invites entries for its eighth EAMT Best Thesis Award for a PhD or
equivalent thesis on a topic related to machine translation.

=== Eligibility ===

Researchers who

- have completed a PhD (or equivalent) thesis on a relevant topic in a
European, Northern African or Middle Eastern institution within calendar
year 2019 and
- have not previously won another international award for that thesis

are invited to submit their theses to the EAMT for consideration.

=== Panel ===

The submissions will be judged by a panel of experts who will be
specifically appointed as part of the EAMT 2020 programme committee and
which will be ratified by the Executive Board of the EAMT.

=== Selection criteria ===

Each thesis will be judged according to how challenging the problem was,
how relevant the results are for machine translation as a field, and the
strength of its impact in terms of scientific publications.

=== Scope ===

The scope of the thesis need not be confined to a technical area, and
applications are also invited from students who carried out their research
into commercial and management aspects of machine translation.

Possible areas of research include:

- development of machine translation or advanced computer-assisted
translation: methods, software or resources
- machine translation for less-resourced languages
- the use of these systems in professional environments (freelance
translators, translation agencies, localisation, etc.)
- the increasing impact of machine translation on non-professional Internet
users and its impact in communications, social networking, etc.
- spoken language translation
- the integration of machine translation and translation memory systems
- the integration of machine translation software in larger IT applications
- the evaluation of machine translation systems in real tasks such as those
above
- the cross-fertilisation between machine translation and other language
technologies

=== Prize ===

The winner will be announced at the same time as the accepted papers for
EAMT 2020, the 22nd Annual Conference of the European Association for
Machine Translation (Lisbon, Portugal, 4-6 May 2020), and will receive a
prize of €500, together with an inscribed certificate. The recipient of the
award will be required to briefly present their research at EAMT 2020. In
order to facilitate this, the EAMT will waive the winner's registration
costs, and will make available a travel bursary of €200 to enable the
recipient of the award to attend the said conference. The prize includes
complimentary membership in the EAMT for 2020 and 2021.

=== Submission ===

Candidates will submit using EasyChair:
https://easychair.org/conferences/?conf=eamt2020 (Submission type: Thesis
Award), a single PDF file containing:

- a 2-page summary of your thesis in English, containing:
  - your full contact details,
  - the name and contact details of your supervisor(s),
- a copy of your CV in English (at most one page, plus a complete list of
publications directly related to the thesis),
- an electronic copy of your thesis,
- optionally, an appendix with any other relevant information on the thesis.

By submitting their work, authors

- agree that, in case they are granted the award, any subsequently
published version of the thesis should carry the citation "The Anthony C.
Clarke Award for the 2019 EAMT Best Thesis" and
- acknowledge the right of the EAMT to publicize the granting of the award.

=== Closing date ===

The closing date for submissions will be the same as the deadline for EAMT
2020 research papers: March 11th (23.59 CEST).

-- 
*Carolina Scarton*
Academic Fellow
Department of Computer Science
University of Sheffield
http://staffwww.dcs.shef.ac.uk/people/C.Scarton/


[Moses-support] Tuning a factored model -> crash

2020-03-09 Thread Haukur Páll Jónsson
Hi all,


For the last few weeks, I have been trying to train and tune a factored model. 
I have found it difficult to implement and I now seek some assistance.


I am trying to build a simple factored model from English to Icelandic: T0-0,
T0,1-1, where factor 0 is the surface form and factor 1 is the POS tag. The
training data (source and target) has three factors, `surface|pos|lemma`; the
lemma is ignored for now.


When tuning a factored model I run into problems. My first question is: which
factors should be in the tuning data? It seems that I can have all factors in
the input/source, but I'm unsure about the output/target.


I run the tuning like so (using 10 threads):


"$MOSESDECODER"/scripts/training/mert-moses.pl \
"$DEV_DATA_IN"."$LANG_FROM" \
"$DEV_DATA_OUT"."$LANG_TO" \
"$MOSESDECODER"/bin/moses "$BASE_MOSES_INI" \
--mertdir "$MOSESDECODER"/bin \
--working-dir "$TUNE_DIR" \
--decoder-flags="-threads $THREADS"


But then, when it starts decoding, the decoder crashes.


Line Line 9: Initialize search took leave|VBP0.103 seconds total
soft|JJ hands|NNS ,|, Jess|NNP you|PRP very|RB weak|JJ ,|, including|VBG 4: 
Initialize search took  .|.
 : Collecting options took 0.003severe|JJ ,|, his|PRP$I|PRP loved|VBD Sean|NNP 
.|. you|PRP know|VBP .|.
  seconds at moses/Manager.cpp Line 0.129 seconds total
joint|JJ examination|NN Line 2: Initialize search took 141
 Segmentation fault
Exit code: 139


I am monitoring the memory usage and the decoder is only using about 4 GB of
the 32 GB allocated when it crashes. Why is the decoder crashing? Are there
any recommended settings for training a factored model?



Haukur Páll Jónsson
Rannsóknarsérfræðingur | Tölvunarfræðideild
Research Specialist | School of Computer Science
Póstfang / E-mail: hauku...@ru.is
Háskólinn í Reykjavík | Reykjavik University
Menntavegur 1 | 101 Reykjavík | Iceland
Sími/Tel: +354 599 6200
www.hr.is

