[Moses-support] Final Call for Papers: EMNLP 2020 Fifth Conference on Machine Translation (WMT20)

2020-07-28 Thread Barry Haddow

EMNLP 2020 FIFTH CONFERENCE ON MACHINE TRANSLATION (WMT20)

November 19-20th, Online


*** CALL FOR PAPERS ***

Submission: https://www.softconf.com/emnlp2020/WMT2020/

Website: http://www.statmt.org/wmt20


We invite the submission of scientific papers on topics related to MT.
Topics of interest include, but are not limited to:

 * MT models (neural, statistical, etc.)
 * analysis of neural models for MT
 * using comparable corpora for MT
 * selection and preparation of training data for MT
 * incorporating linguistic information into MT
 * decoding
 * system combination
 * error analysis
 * manual and automatic methods for evaluating MT
 * quality estimation for MT

PAPER SUBMISSION INFORMATION

Submissions will consist of full research papers of 6-10 pages, plus
additional pages for references, formatted following the EMNLP 2020
guidelines. In addition, shared task participants are invited to
submit short papers (suggested length 4-6 pages) describing their
systems or their evaluation metrics. Both submission and review processes
will be handled electronically.

We encourage individuals who are submitting research papers to
evaluate their approaches using the training resources provided by
this workshop and past workshops, so that their experiments can be
repeated by others using these publicly available corpora.

Double submission (including EMNLP) is permitted this year. Please note this
when submitting. There is no arXiv blackout period.


IMPORTANT DATES

Paper submissions:

Paper submission deadline: August 15th, 2020
Notification of acceptance: September 29th, 2020
Camera-ready deadline: October 10th, 2020
Online conference: November 19-20th, 2020



Barry Haddow
(On behalf of the organisers)

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Call for Papers: EMNLP 2020 Fifth Conference on Machine Translation (WMT20)

2020-05-14 Thread Barry Haddow

EMNLP 2020 FIFTH CONFERENCE ON MACHINE TRANSLATION (WMT20)

November 19-20th, Online

http://www.statmt.org/wmt20


*** CALL FOR PAPERS ***

We invite the submission of scientific papers on topics related to MT.
Topics of interest include, but are not limited to:

 * MT models (neural, statistical, etc.)
 * analysis of neural models for MT
 * using comparable corpora for MT
 * selection and preparation of data for MT
 * semi-supervised and unsupervised learning for MT, transfer learning
 * multilingual MT
 * incorporating linguistic information into MT
 * MT inference
 * manual and automatic methods for evaluating MT
 * quality estimation for MT


SHARED TASKS

There are several MT-related shared tasks associated with the workshop. 
These include several translation tasks, automatic post-editing, 
lifelong learning, automatic evaluation, targeted evaluation (test 
suites) and quality estimation. See the conference website for more 
details, and subscribe to the mailing list 
(https://groups.google.com/forum/#!forum/wmt-tasks).


PAPER SUBMISSION INFORMATION

Submissions will consist of full research papers of 6-10 pages, plus
additional pages for references, formatted following the EMNLP 2020
guidelines. In addition, shared task participants will be invited to
submit short papers (suggested length 4-6 pages) describing their
systems or their evaluation metrics. Both submission and review processes
will be handled electronically.

We encourage individuals who are submitting research papers to
evaluate their approaches using the training resources provided by
this workshop and past workshops, so that their experiments can be
repeated by others using these publicly available corpora.

IMPORTANT DATES

Paper submissions:

Paper submission deadline: August 15th, 2020
Notification of acceptance: September 29th, 2020
Camera-ready deadline: October 10th, 2020
Online conference: November 19-20th, 2020



Barry Haddow
(On behalf of the organisers)

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] PMIndia - A Collection of Parallel Corpora of Languages of India

2020-01-29 Thread Barry Haddow
Hi All

We have released new sentence-aligned corpora pairing English with 13 
different languages spoken in India. Up to 56k sentence pairs are 
available for each language pair. The Indian languages covered are 
Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Manipuri, 
Marathi, Odia, Punjabi, Tamil, Telugu and Urdu. We also provide a larger 
version of the corpus, which is document-aligned only.

The corpus is available here: http://data.statmt.org/pmindia/
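
If you just want to grab one language pair from the command line, a 
minimal sketch (the archive name below is illustrative only -- check the 
directory listing at http://data.statmt.org/pmindia/ for the actual 
file names):

  # hypothetical file name; the two columns are the two languages
  wget http://data.statmt.org/pmindia/v1/parallel/pmindia.v1.hi-en.tsv
  cut -f1 pmindia.v1.hi-en.tsv > pmindia.en   # inspect a line or two first
  cut -f2 pmindia.v1.hi-en.tsv > pmindia.hi   # to confirm the column order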

There is an accompanying paper which describes the construction of the 
corpus, a comparison of alignment methods, and some initial MT results.

https://arxiv.org/abs/2001.09907


Barry Haddow and Faheem Kirefu




-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Call for Papers: ACL 2019 Fourth Conference on Machine Translation (WMT19)

2019-02-01 Thread Barry Haddow

ACL 2019 FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT19)

** Submission now open https://www.softconf.com/acl2019/wmt **

1st-2nd August, 2019, in conjunction with ACL 2019 in Florence, Italy

http://www.statmt.org/wmt19

*** CALL FOR PAPERS ***

We invite the submission of scientific papers on topics related to MT.
Topics of interest include, but are not limited to:

 * MT models (neural, statistical, etc.)
 * analysis of neural models for MT
 * using comparable corpora for MT
 * selection and preparation of training data for MT
 * incorporating linguistic information into MT
 * decoding
 * system combination
 * error analysis
 * manual and automatic methods for evaluating MT
 * quality estimation for MT


SHARED TASKS

There are several MT-related shared tasks associated with the workshop, 
all of which are starting or just about to start. These include 
translation, robustness, automatic post-editing, automatic evaluation, 
targeted evaluation (test suites) and quality estimation. See the 
conference website for more details, and subscribe to the mailing list 
(https://groups.google.com/forum/#!forum/wmt-tasks).


PAPER SUBMISSION INFORMATION

Submissions will consist of full research papers of 6-10 pages, plus
additional pages for references, formatted following the ACL 2019
guidelines. In addition, shared task participants will be invited to
submit short papers (suggested length 4-6 pages) describing their
systems or their evaluation metrics. Both submission and review processes
will be handled electronically.

We encourage individuals who are submitting research papers to
evaluate their approaches using the training resources provided by
this workshop and past workshops, so that their experiments can be
repeated by others using these publicly available corpora.

IMPORTANT DATES

Paper submissions:

Paper submission deadline: May 17th, 2019
Notification of acceptance: June 7th, 2019
Camera-ready deadline: June 17th, 2019
Conference in Florence: August 1-2, 2019




Barry Haddow
(On behalf of the organisers)

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] First Call for Participation: WMT19 Machine Translation related Shared Tasks

2018-12-13 Thread Barry Haddow

ACL 2019 FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT19)

Shared Tasks on translation, MT evaluation, and automated post-editing.


August 1-2, 2019, in conjunction with ACL 2019 in Florence, Italy

As part of WMT, as in previous years, we will be organising a collection 
of shared tasks related to machine translation. We hope that both 
beginners and established research groups will participate. This year we 
have so far confirmed the following tasks:


- Translation tasks
    - News
    - Biomedical
    - Closely related languages
- Evaluation tasks
    - Metrics
    - Quality estimation
- Other tasks
    - Automatic post-editing

Further information, including task rationale, timetables and data will 
be posted on the WMT19 website (http://www.statmt.org/wmt19) in due 
course. Tasks will be launched in January/February with test weeks in 
April.


Intending participants are encouraged to register with the mailing list 
for further announcements 
(https://groups.google.com/forum/#!forum/wmt-tasks).


For all tasks, participants will also be invited to submit a short 
paper describing their system.



Best wishes
Barry Haddow
(On behalf of the organisers)
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Call for Papers: EMNLP Third Conference on Machine Translation (WMT18)

2018-06-11 Thread Barry Haddow

EMNLP 2018 THIRD CONFERENCE ON MACHINE TRANSLATION (WMT18)

Featuring shared tasks on translation, evaluation, automated
post-editing and parallel corpus cleaning.

31st October - 1st November, in conjunction with EMNLP 2018 in Brussels, Belgium

http://www.statmt.org/wmt18

*** CALL FOR PAPERS ***

We invite the submission of scientific papers on topics related to MT.
Topics of interest include, but are not limited to:


* MT models (neural, statistical, etc.)
* analysis of neural models for MT
* using comparable corpora for MT
* selecting and preparing training data for MT
* incorporating linguistic information into MT
* decoding
* system combination and selection
* error analysis
* manual and automatic methods for evaluating MT
* quality estimation of MT

SHARED TASKS

The workshop will feature seven shared tasks:

* Translation tasks
 * News
 * Biomedical
 * Multimodal

* Evaluation tasks
 * Metrics
 * Quality estimation

* Other tasks
 * Automatic post-editing
 * Corpus cleaning

The tasks have been announced separately and more information is
available on the workshop website.

PAPER SUBMISSION INFORMATION

Submissions will consist of full research papers of 6-10 pages, plus
additional pages for references, formatted following the EMNLP 2018
guidelines. In addition, shared task participants will be invited to
submit short papers (suggested length 4-6 pages) describing their
systems, test sets, or their evaluation metrics. Both submission and
review processes will be handled electronically.

We encourage individuals who are submitting research papers to
evaluate their approaches using the training resources provided by
this workshop and past workshops, so that their experiments can be
repeated by others using these publicly available corpora.

IMPORTANT DATES

Paper submissions:

Paper submission deadline: July 27th, 2018
Notification of acceptance: August 18th, 2018
Camera-ready deadline: August 31st, 2018
Conference in Brussels preceding EMNLP: October 31st - November 1st, 2018

For shared task timetables, see website.



Barry Haddow
(On behalf of the organisers)

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] First Call for Participation: WMT18 Machine Translation related Shared Tasks

2017-12-20 Thread Barry Haddow

EMNLP 2018 THIRD CONFERENCE ON MACHINE TRANSLATION (WMT18)

Shared Tasks on translation, MT evaluation, and automated post-editing.


October 31 - November 1, 2018, in conjunction with EMNLP 2018 in 
Brussels, Belgium


As part of WMT, as in previous years, we will be organising a collection 
of shared tasks related to machine translation. We hope that both 
beginners and established research groups will participate. This year we 
have so far confirmed the following tasks:


- Translation tasks
    - News
    - Biomedical
    - Multimodal
- Evaluation tasks
    - Metrics
    - Quality estimation
- Other tasks
    - Automatic post-editing

Further information, including task rationale, timetables and data will 
be posted on the WMT18 website (http://www.statmt.org/wmt18) in due 
course. Tasks will be launched in January/February with test weeks in 
May/June.


Intending participants are encouraged to register with the mailing list 
for further announcements 
(https://groups.google.com/forum/#!forum/wmt-tasks).


For all tasks, participants will also be invited to submit a short 
paper describing their system.



Best wishes
Barry Haddow
(On behalf of the organisers)
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Deploying large models

2017-12-12 Thread Barry Haddow
Hi

Yes, that's true. From Liling's description it sounds like a 
pathologically long sentence is causing Moses to blow up. However, he 
states that it happens on random lines -- could it be that there are so 
many threads that the amount of data translated is random, but it's the 
same problem line each time?

cheers - Barry
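
PS Regarding Marcin's 10K-piece suggestion below, a concrete sketch 
(file names, paths and the piece size are assumptions):

  # split the input into 10,000-line pieces and translate each one separately
  split -l 10000 -d input.tok input.part.
  for f in input.part.*; do
    /path/to/mosesdecoder/bin/moses -f moses.ini -threads 56 < "$f" > "$f.out"
  done
  cat input.part.*.out > output.tok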

On 12/12/17 09:24, Marcin Junczys-Dowmunt wrote:
> Hi,
> I think the important part is that Liling actually manages to translate
> several tens of thousands of sentences before that happens. A quick fix
> would be to break your corpus into pieces of 10K sentences each and loop
> over the files. I usually have bad experience with trying to translate
> large batches of text with moses.
>
> Is it still trying to load the entire corpus into memory? It used to do that.
>
> On 12.12.2017 at 10:16, Barry Haddow wrote:
>> Hi Liling
>>
>> The short answer is you need to prune/filter your phrase table
>> prior to creating the compact phrase table. I don't mean "filter model
>> given input", because that won't make much difference if you have a
>> very large input, I mean getting rid of rare translations which won't
>> be used anyway.
>>
>> The compact phrase table does not do pruning; it ends up being done in
>> memory, so if you have 750,000 translations of the full stop in your
>> model then they all get loaded into memory before Moses selects the
>> top 20.
>>
>> You can use prunePhraseTable from Moses (which bizarrely needs to load
>> a phrase table in order to parse the config file, last time I looked).
>> You could also apply Johnson / entropic pruning, whatever works for you,
>>
>> cheers - Barry
>>
>> On 11/12/17 09:20, liling tan wrote:
>>> Dear Moses community/developers,
>>>
>>> I have a question on how to handle large models created using moses.
>>>
>>> I have a vanilla phrase-based model with:
>>>
>>>* PhraseDictionary num-features=4 input-factor=0 output-factor=0
>>>* LexicalReordering num-features=6 input-factor=0 output-factor=0
>>>* KENLM order=5 factor=0
>>>
>>> The size of the model is:
>>>
>>>* compressed phrase table is 5.4GB,
>>>* compressed reordering table is 1.9GB and
>>>* quantized LM is 600MB
>>>
>>>
>>> I'm running on a single 56 cores machine with 256GB RAM. Whenever I'm
>>> decoding I use -threads 56 parameter.
>>>
>>> It takes really long to load the table, and after loading it breaks
>>> inconsistently at different lines when decoding. I notice that the
>>> RAM goes into swap before it breaks.
>>>
>>> I've tried the compact phrase table and get a
>>>
>>>* 3.2GB .minphr
>>>* 1.5GB .minlexr
>>>
>>> And the same kind of random breakage happens when RAM goes into swap
>>> after loading the phrase-table.
>>>
>>> Strangely, it still manages to decode ~500K sentences before it breaks.
>>>
>>> Then I've tried with ondisk phrasetable and it's around 37GB
>>> uncompressed. Using the ondisk PT didn't cause breakage but the
>>> decoding time is significantly increased, now it can only decode 15K
>>> sentences in an hour.
>>>
>>> The setup is a little different from normal where we have the
>>> train/dev/test split. Currently, my task is to decode the train set.
>>> I've tried filtering the table against the train set with
>>> filter-model-given-input.pl, but
>>> the size of the compressed table didn't really decrease much.
>>>
>>> The entire training set is made up of 5M sentence pairs and it's
>>> taking 3+ days just to decode ~1.5M sentences with ondisk PT.
>>>
>>>
>>> My questions are:
>>>
>>>   - Are there best practices with regards to deploying large Moses models?
>>>   - Why does the 5+GB phrase table take up > 250GB RAM when decoding?
>>>   - How else should I filter/compress the phrase table?
>>>   - Is it normal to decode only ~500K sentences a day given the machine
>>> specs and the model size?
>>>
>>> I understand that I could split the train set up into two and train 2
>>> models then cross-decode but if the training size is 10M sentence
>>> pairs, we'll face the same issues.
>>>
>>> Thank you for reading the long post and thank you in advance for any
>>> answers, discussions and enlightenment on this issue =)
>>>
>>> Regards

Re: [Moses-support] Deploying large models

2017-12-12 Thread Barry Haddow

Hi Liling

The short answer is you need to prune/filter your phrase table 
prior to creating the compact phrase table. I don't mean "filter model 
given input", because that won't make much difference if you have a very 
large input, I mean getting rid of rare translations which won't be used 
anyway.


The compact phrase table does not do pruning; it ends up being done in memory, 
so if you have 750,000 translations of the full stop in your model then 
they all get loaded into memory before Moses selects the top 20.


You can use prunePhraseTable from Moses (which bizarrely needs to load a 
phrase table in order to parse the config file, last time I looked). You 
could also apply Johnson / entropic pruning, whatever works for you,
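
A minimal sketch of that workflow, assuming the sigtest-filter tool from 
the Moses contrib directory and suffix arrays already built over the 
training data (paths and thresholds below are illustrative, not a 
recommendation):

  # Johnson-style significance pruning, keeping at most 30 translations
  # per source phrase
  zcat model/phrase-table.gz | \
    /path/to/mosesdecoder/contrib/sigtest-filter/filter-pt \
      -e /path/to/corpus.en.suffix -f /path/to/corpus.fr.suffix -l a+e -n 30 \
    > model/phrase-table.pruned

  # then rebuild the compact table from the pruned version
  /path/to/mosesdecoder/bin/processPhraseTableMin \
    -in model/phrase-table.pruned -out model/phrase-table -nscores 4 -threads 8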


cheers - Barry

On 11/12/17 09:20, liling tan wrote:

Dear Moses community/developers,

I have a question on how to handle large models created using moses.

I have a vanilla phrase-based model with:

  * PhraseDictionary num-features=4 input-factor=0 output-factor=0
  * LexicalReordering num-features=6 input-factor=0 output-factor=0
  * KENLM order=5 factor=0

The size of the model is:

  * compressed phrase table is 5.4GB,
  * compressed reordering table is 1.9GB and
  * quantized LM is 600MB


I'm running on a single 56 cores machine with 256GB RAM. Whenever I'm 
decoding I use -threads 56 parameter.


It takes really long to load the table, and after loading it breaks 
inconsistently at different lines when decoding. I notice that the RAM 
goes into swap before it breaks.


I've tried the compact phrase table and get a

  * 3.2GB .minphr
  * 1.5GB .minlexr

And the same kind of random breakage happens when RAM goes into swap 
after loading the phrase-table.


Strangely, it still manages to decode ~500K sentences before it breaks.

Then I've tried with ondisk phrasetable and it's around 37GB 
uncompressed. Using the ondisk PT didn't cause breakage but the 
decoding time is significantly increased, now it can only decode 15K 
sentences in an hour.


The setup is a little different from normal where we have the 
train/dev/test split. Currently, my task is to decode the train set. 
I've tried filtering the table against the train set with 
filter-model-given-input.pl, but 
the size of the compressed table didn't really decrease much.


The entire training set is made up of 5M sentence pairs and it's 
taking 3+ days just to decode ~1.5M sentences with ondisk PT.



My questions are:

 - Are there best practices with regards to deploying large Moses models?
 - Why does the 5+GB phrase table take up > 250GB RAM when decoding?
 - How else should I filter/compress the phrase table?
 - Is it normal to decode only ~500K sentences a day given the machine 
specs and the model size?


I understand that I could split the train set up into two and train 2 
models then cross-decode but if the training size is 10M sentence 
pairs, we'll face the same issues.


Thank you for reading the long post and thank you in advance for any 
answers, discussions and enlightenment on this issue =)


Regards,
Liling


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] EMS for the neural age?

2017-11-27 Thread Barry Haddow
Hi All

I did produce a version of experiment.perl for Groundhog (remember 
that?) but it's not much use for any other NMT system. The problem (well, 
actually the big advantage!) of NMT is that the pipeline is too simple 
for a tool like experiment.perl. And the experiments that do need tool 
support (e.g. parameter sweeps) were never supported by experiment.perl 
anyway. Sounds as though eman is much better for this ...

For the marian usage examples, I would go for the lowest common 
denominator - shell scripts. Generally more readable than Makefiles.

cheers - Barry

On 26/11/17 14:56, Marcin Junczys-Dowmunt wrote:
> Do you have a URL or repo for that?
>
> On 26.11.2017 at 14:58, Ondrej Bojar wrote:
>> I can really recommend Eman to people interested in spawning dozens of 
>> similar experiments, by deriving new variations from older ones. Eman will 
>> take care of reusing reusable bits and creating new bits as needed. So e.g. 
>> corpus preprocessing, BPE etc. could be reused when you want to try new 
>> training parameters.
>>
>> The main reason for my little hesitation is that there has been no cleanup 
>> and release of Eman and friends for quite a while. We are actively using it, 
>> but such polishing would be better done before trying to get fresh publicity.
>>
>> Please have a look at the old eman PBML paper and if you think this would 
>> match the intended use for your aimed user group, it would make a great 
>> incentive for us to do this cleanup.
>>
>> Thanks, O.
>>
>>
>> On 26 November 2017 at 14:31:40 CET, Marcin Junczys-Dowmunt
>> wrote:
>>> Hi Ondrej,
>>> you do not seem confident enough to recommend Eman :)
>>>
>>> I now took another look at duct tape. That does not look too bad,
>>> basically Make with multi-targets and easier reuse of existing recipes
>>> (which is a nightmare in GNU make).
>>> Is anyone still using duct tape? The commit dates are from two years ago.
>>>
>>> W dniu 26.11.2017 o 13:30, Ondrej Bojar pisze:
 Hi, Marcin.

 I am afraid you are correct. I have my Eman and a couple of my
>>> students are using it (we have Neural Monkey, Nematus, t2t and probably
>>> already also Marian in), but it has a rather steep learning curve and
>>> it generally has other bells and whistles than what someone with data
>>> and a desire for a single model would ask for.
 There were also Makefiles for Moses, but I never tried those.

 And Neural Monkey has most of the pre-processing and evaluation in
>>> itself.
 I guess that commented one-liner snippets are the best thing you can
>>> do.
 Cheers, O.


 On 26 November 2017 at 10:41:16 CET, Marcin Junczys-Dowmunt
>>> wrote:
> Hi list,
>
> I am preparing a couple of usage examples for my NMT toolkit and got
> hung up on all the preprocessing and other evil stuff. I am wondering:
> is there now anything decent around for doing preprocessing, running
> experiments and evaluation? Or is the best thing still GNU make (isn't
> that embarrassing)?
>
> Best,
>
> Marcin
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] extract minimal phrase pairs

2017-11-10 Thread Barry Haddow

Hi Jorg

Since the operation sequence model is based on minimal phrase pairs, its 
training code should be able to do the extraction (although I'm not 
familiar with this code).


cheers - Barry

On 08/11/17 19:12, Jorg Tiedemann wrote:

Hi,

Can I use moses extract or any other tool to extract only minimal 
phrase pairs from word-aligned bitexts?

I need an efficient implementations that I can run on large data sets ...

Jörg



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] NCv12 number of lines mismatch

2017-09-13 Thread Barry Haddow
Hi Vincent

Looks fine to me:

> wc -l news-commentary-v12.de-en.*
>   270769 news-commentary-v12.de-en.de
>   270769 news-commentary-v12.de-en.en
>   541538 total

What are you running that shows you different line numbers?

cheers - Barry

On 12/09/17 10:06, Vincent Nguyen wrote:
> Hi,
> Is there an updated version of NCv12 for this
> http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz
>
> the number of lines for de-en is not the same in the 2 languages.
>
> Cheers,
> Vincent
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Final Call For Papers: Second Conference on Machine Translation (WMT17)

2017-05-25 Thread Barry Haddow
EMNLP 2017 SECOND CONFERENCE ON MACHINE TRANSLATION (WMT17)

** Submission now open: https://www.softconf.com/emnlp2017/wmt/ **


7-8th September, 2017, in conjunction with EMNLP 2017 in Copenhagen, Denmark

http://www.statmt.org/wmt17

*** CALL FOR PAPERS ***

We invite the submission of scientific papers on topics related to MT.
Topics of interest include, but are not limited to:

* word-based, phrase-based, syntax-based SMT
* using comparable corpora for SMT
* incorporating linguistic information into SMT
* decoding
* system combination and selection
* error analysis
* manual and automatic methods for evaluating MT
* quality estimation of MT
* scaling MT to very large data sets
* neural MT


PAPER SUBMISSION INFORMATION

Submissions will consist of full research papers of 6-10 pages, plus
additional pages for references, formatted following the EMNLP 2017
guidelines. In addition, shared task participants will be invited to
submit short papers (suggested length 4-6 pages) describing their
systems or their evaluation metrics. Both submission and review processes
will be handled electronically.

We encourage individuals who are submitting research papers to
evaluate their approaches using the training resources provided by
this workshop and past workshops, so that their experiments can be
repeated by others using these publicly available corpora.

IMPORTANT DATES

Paper submissions:

Paper submission deadline: June 9th, 2017
Extended deadline (metrics papers only): June 15th, 2017
Notification of acceptance: June 30th, 2017
Camera-ready deadline: July 14th, 2017
Workshop in Copenhagen preceding EMNLP: September 7-8th, 2017




Barry Haddow
(On behalf of the organisers)


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] SMT decoding complexity

2017-02-27 Thread Barry Haddow
Hi Amir

You could also try this paper for a derivation of the complexity of PBMT 
decoding
https://www.aclweb.org/anthology/E/E09/E09-1061v2.pdf
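
(A rough worked illustration of Philipp's point below: the coverage 
vectors alone number 2^n, so for a 20-word sentence that is already 
2^20, about a million, subsets of translated source words, before the 
T^n choice of translation options is even considered. With a beam of 
size b per stack and a limited distortion window, the decoder only 
expands on the order of n * b hypotheses in total, which is why real 
implementations behave linearly in the sentence length.)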

cheers - Barry

On 27/02/17 15:54, Philipp Koehn wrote:
> Hi,
>
> I am not sure I follow your question - in the formula you cite,
> there are exponential terms: 2^n and T^n.
>
> The Knight paper is worth trying to understand (it's on IBM Models,
> but applies similarly to phrase-based models).
>
> Also keep in mind that limited reordering windows and beam search
> make actual decoding algorithm implementations linear.
>
> -phi
>
> On Sun, Feb 26, 2017 at 1:16 PM, amir haghighi
>  wrote:
>> Hi all,
>>
>> In the Moses manual and also in SMT textbooks it is mentioned that the
>> decoding complexity for PB-SMT is exponential in the source sentence length.
>> If we have a source sentence of length n, in decoding by hypothesis
>> expansion we have 2^n states, each of which can be reordered in n!
>> orders, and each state can be translated in T^n ways, where T is the
>> number of translation options, right?
>> So the decoder complexity is 2^n * n! * T^n -- so why is it mentioned
>> that the complexity is exponential?
>>
>> Could someone please explain to me how the decoder complexity is
>> calculated?
>> I've read the Knight (1999) paper, but I couldn't understand it. Could you
>> please introduce another reference?
>>
>> Thanks
>>
>>
>> ___
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] First Call for Participation: WMT17 Machine Translation related Shared Tasks

2016-12-06 Thread Barry Haddow

EMNLP 2017 SECOND CONFERENCE ON MACHINE TRANSLATION (WMT17)
Shared Tasks on translation, evaluation, training and automated 
post-editing.


http://www.statmt.org/wmt17/index.html
September 7-8 2017, in conjunction with EMNLP 2017 in Copenhagen, Denmark

As part of WMT, as in previous years, we will be organising a collection 
of shared tasks related to machine translation. We hope that both 
beginners and established research groups will participate. This year we 
have so far confirmed the following tasks:


- Translation tasks
- News
- Biomedical
- Multimodal
- Evaluation tasks
- Metrics
- Quality estimation
- Other tasks
- Bandit learning
- Neural MT training
- Automatic post-editing

Further information, including task rationale, timetables and data will 
be posted on the WMT17 website, in time for the task launches in 
January/February. Intending participants are encouraged to register 
with the mailing list for further announcements 
(https://groups.google.com/forum/#!forum/wmt-tasks).


For all tasks, participants will also be invited to submit a short 
paper describing their system.



Best wishes
Barry Haddow
(On behalf of the organisers)







The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] EMS dry-run flag?

2016-12-05 Thread Barry Haddow

In steps/0
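
For concreteness, a minimal dry-run sketch (config name and paths are 
assumptions):

  cd /path/to/experiment-dir
  /path/to/mosesdecoder/scripts/ems/experiment.perl -config config.basic
  ls steps/*/        # the generated step scripts; nothing has been executed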

On 05/12/16 22:36, Fred Blain wrote:

hi Lane,

if you omit the '-exec' in your call to experiment.perl, it will only
generate the required scripts without running anything. You will find the
scripts under the steps/ folder.

best,


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] NMT vs Moses

2016-11-24 Thread Barry Haddow

Hi Nat

Imagine it's a translator using MT and somehow he/she has translated 
the sentence before and just wants the exact translation. A TM would 
solve the problem and Moses surely could emulate the TM, but NMT tends 
to get overly creative and produces something else.

Then just use a TM for this. Fast and simple.
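
A minimal sketch of such an exact-match lookup, assuming tokenized, 
tab-free train.src/train.trg and a test.src to translate (sentences not 
found fall through marked UNSEEN, so they can be routed to the MT 
system):

  paste train.src train.trg > tm.tsv
  awk -F'\t' 'NR==FNR { tm[$1] = $2; next }
              { if ($0 in tm) print tm[$0]; else print "UNSEEN\t" $0 }' \
      tm.tsv test.src > test.out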

You can probably create a seq2seq model which will do the copying when 
appropriate (see e.g. 
https://www.aclweb.org/anthology/P/P16/P16-1154.pdf), but in the 
scenario you describe I think there is really no need.


cheers - Barry

On 24/11/16 10:22, Nat Gillin wrote:

Dear Moses Community,

This seems to be a prickly topic to discuss, but my experiments on a 
different kind of data set than WMT or WAT (Workshop on Asian 
Translation) have not been able to achieve the stellar scores that 
recent advances in MT have been reporting.


Using a state-of-the-art encoder-attention-decoder framework, just by 
running things like lamtram or tensorflow, I'm unable to beat Moses' 
scores on sentences that appear in both the train and test data.


Imagine it's a translator using MT and somehow he/she has translated 
the sentence before and just wants the exact translation. A TM would 
solve the problem and Moses surely could emulate the TM, but NMT tends 
to get overly creative and produces something else. Although it is 
consistent in giving the same output for the same sentence, it's just 
unable to regurgitate the sentence that was seen in the training data. 
Moses handles this pretty well.


For sentences that are in the test set but not in training, NMT does 
about the same as or sometimes better than Moses.


So the question is 'has anyone encountered similar problems?' Is the 
solution simply to do a fetch in the train set before translating? Or 
a system/output chooser to rerank outputs?


Are there any other ways to resolve such a problem? What could have 
happened such that NMT is not "remembering"? (Maybe it needs some 
memberberries)


Any tips/hints/discussion on this is much appreciated.

Regards,
Nat


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] error in mosesdecoder tuning

2016-11-02 Thread Barry Haddow

Hi Hasan

Your log is full of messages about incorrect alignments. It looks like 
something went wrong at an earlier stage. I would start again from the 
beginning with a small data set, in order to debug your setup,


cheers - Barry

On 02/11/16 15:32, Hasan Sait ARSLAN wrote:

Hi Barry,

I have rerun training starting from step 5, and I kept the log file. I 
have checked the log file, and it doesn't show any bug about creating 
the phrase table, but the phrase table is still not being created. I am 
attaching the log file; could you please check it and show me where the 
problem is?


Thanks,
log.gz 
<https://drive.google.com/file/d/0BxvJK3H5ZKsnYzJiZmhjUWI0Qlk/view?usp=drive_web>


2016-11-02 14:28 GMT+02:00 Barry Haddow <bhad...@staffmail.ed.ac.uk>:


Adding
-first-step 5 -last-step 5
will just run step 5 (phrase extraction)


On 02/11/16 12:01, Hasan Sait ARSLAN wrote:

For instance, could you show me an example?

Thanks,

2016-11-02 13:57 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

You can use the first-step and last-step arguments to run
steps manually.

cheers - Barry


On 02/11/16 11:46, Hasan Sait ARSLAN wrote:

Sorry,

As I see my model folder, I can say that the Lexical translation
part also went well.

So, the only steps I have to run manually are:

  * 5 Extract phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ExtractPhrases>

  * 6 Score phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases>

  * 7 Reordering model

<http://www.statmt.org/moses/?n=FactoredTraining.BuildReorderingModel>

  * 8 Generation model

<http://www.statmt.org/moses/?n=FactoredTraining.BuildGenerationModel>

  * 9 Configuration file

<http://www.statmt.org/moses/?n=FactoredTraining.CreateConfigurationFile>






2016-11-02 13:43 GMT+02:00 Hasan Sait ARSLAN
<hasan.sait.ars...@gmail.com>:

Hi Barry,

Actually I know where it got stuck.

As I check my train folder, I can see all the files
which show that

  * 1 Prepare data
<http://www.statmt.org/moses/?n=FactoredTraining.PrepareData>

  * 2 Run GIZA
<http://www.statmt.org/moses/?n=FactoredTraining.RunGIZA>

  * 3 Align words
<http://www.statmt.org/moses/?n=FactoredTraining.AlignWords>


went well.

So, is there any way to run the remaining


  * 4 Lexical translation

<http://www.statmt.org/moses/?n=FactoredTraining.GetLexicalTranslationTable>

  * 5 Extract phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ExtractPhrases>

  * 6 Score phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases>

  * 7 Reordering model

<http://www.statmt.org/moses/?n=FactoredTraining.BuildReorderingModel>

  * 8 Generation model

<http://www.statmt.org/moses/?n=FactoredTraining.BuildGenerationModel>

  * 9 Configuration file

<http://www.statmt.org/moses/?n=FactoredTraining.CreateConfigurationFile>


steps manually?


2016-11-02 13:27 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

It's not really possible to debug without a log
file, so I think that unfortunately you should start
again.

I suggest you start with a much smaller corpus to
try to debug your setup. If you use 100,000
sentences it should train within hours.

cheers - Barry


On 02/11/16 11:13, Hasan Sait ARSLAN wrote:

Hi Barry,

    Unfortunately I didn't keep the log file. Is it
really a hopeless situation?

2016-11-02 13:10 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

You should have run train_model.perl something
like this:

nohup nice /path/to/train_model.perl
ARGS_FOR_TRAINING &> log &

You just need the log file

cheers - Barry


On 02/11/16 11:00, Hasan Sait ARSLAN wrote:

    Hi Barry,

I

Re: [Moses-support] error in mosesdecoder tuning

2016-11-02 Thread Barry Haddow

Adding
-first-step 5 -last-step 5
will just run step 5 (phrase extraction)
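
For example, a sketch re-running only phrase extraction (the other 
arguments must match your original training call; the values here are 
placeholders):

  /path/to/mosesdecoder/scripts/training/train_model.perl \
    -root-dir working_tr-en/train -corpus corpus/train.clean -f tr -e en \
    -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
    -lm 0:5:/path/to/lm.en.blm:8 \
    -first-step 5 -last-step 5 &> step5.log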

On 02/11/16 12:01, Hasan Sait ARSLAN wrote:

For instance, could you show me an example?

Thanks,

2016-11-02 13:57 GMT+02:00 Barry Haddow <bhad...@staffmail.ed.ac.uk>:


Hi Hasan

You can use the first-step and last-step arguments to run steps
manually.

cheers - Barry


On 02/11/16 11:46, Hasan Sait ARSLAN wrote:

Sorry,

As I see my model folder, I can say that the Lexical translation part
also went well.

So, the only steps I have to run manually are:

  * 5 Extract phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ExtractPhrases>
  * 6 Score phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases>
  * 7 Reordering model
<http://www.statmt.org/moses/?n=FactoredTraining.BuildReorderingModel>

  * 8 Generation model
<http://www.statmt.org/moses/?n=FactoredTraining.BuildGenerationModel>

  * 9 Configuration file

<http://www.statmt.org/moses/?n=FactoredTraining.CreateConfigurationFile>






2016-11-02 13:43 GMT+02:00 Hasan Sait ARSLAN
<hasan.sait.ars...@gmail.com>:

Hi Barry,

Actually I know where it got stuck.

As I check my train folder, I can see all the files which
show that

  * 1 Prepare data
<http://www.statmt.org/moses/?n=FactoredTraining.PrepareData>

  * 2 Run GIZA
<http://www.statmt.org/moses/?n=FactoredTraining.RunGIZA>
  * 3 Align words
<http://www.statmt.org/moses/?n=FactoredTraining.AlignWords>

went well.

So, is there any way to run the remaining


  * 4 Lexical translation

<http://www.statmt.org/moses/?n=FactoredTraining.GetLexicalTranslationTable>

  * 5 Extract phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ExtractPhrases>

  * 6 Score phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases>

  * 7 Reordering model

<http://www.statmt.org/moses/?n=FactoredTraining.BuildReorderingModel>

  * 8 Generation model

<http://www.statmt.org/moses/?n=FactoredTraining.BuildGenerationModel>

  * 9 Configuration file

<http://www.statmt.org/moses/?n=FactoredTraining.CreateConfigurationFile>


steps manually?


2016-11-02 13:27 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

It's not really possible to debug without a log file, so
I think that unfortunately you should start again.

I suggest you start with a much smaller corpus to try to
debug your setup. If you use 100,000 sentences it should
train within hours.

cheers - Barry


On 02/11/16 11:13, Hasan Sait ARSLAN wrote:

Hi Barry,

Unfortunately I didn't keep the log file. Is it really a
hopeless situation?

2016-11-02 13:10 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

You should have run train_model.perl something like
this:

nohup nice /path/to/train_model.perl
ARGS_FOR_TRAINING &> log &

You just need the log file

cheers - Barry


On 02/11/16 11:00, Hasan Sait ARSLAN wrote:

Hi Barry,

    I don't have any idea about where it is saved.

Cheers,

2016-11-02 12:55 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

If your phrase table is empty, that would
explain why tuning didn't work. Something went
wrong earlier in the process. Could you post
your log file from train_model.perl,

cheers - Barry


On 02/11/16 10:45, Hasan Sait ARSLAN wrote:

Currently I have a bigger problem. I have
trained my data for 5 days, and the folder
"train" is 39 GB, but there are no phrases
saved in the phrase table. It is annoying. What
should I do now? I hope I won't need to rerun
everything from scratch.

2016-11-02 12:26 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Re: [Moses-support] error in mosesdecoder tuning

2016-11-02 Thread Barry Haddow

Hi Hasan

You can use the first-step and last-step arguments to run steps manually.

cheers - Barry

On 02/11/16 11:46, Hasan Sait ARSLAN wrote:

Sorry,

As I see my model folder, I can say that the Lexical translation part also 
went well.

So, the only steps I have to run manually are:

  * 5 Extract phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ExtractPhrases>
  * 6 Score phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases>
  * 7 Reordering model
<http://www.statmt.org/moses/?n=FactoredTraining.BuildReorderingModel>

  * 8 Generation model
<http://www.statmt.org/moses/?n=FactoredTraining.BuildGenerationModel>

  * 9 Configuration file
<http://www.statmt.org/moses/?n=FactoredTraining.CreateConfigurationFile>






2016-11-02 13:43 GMT+02:00 Hasan Sait ARSLAN 
<hasan.sait.ars...@gmail.com>:


Hi Barry,

Actually I know where it got stuck.

As I check my train folder, I can see all the files which show
that

  * 1 Prepare data
<http://www.statmt.org/moses/?n=FactoredTraining.PrepareData>
  * 2 Run GIZA
<http://www.statmt.org/moses/?n=FactoredTraining.RunGIZA>
  * 3 Align words
<http://www.statmt.org/moses/?n=FactoredTraining.AlignWords>

went well.

So, is there any way to run the remaining


  * 4 Lexical translation

<http://www.statmt.org/moses/?n=FactoredTraining.GetLexicalTranslationTable>

  * 5 Extract phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ExtractPhrases>
  * 6 Score phrases
<http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases>
  * 7 Reordering model
<http://www.statmt.org/moses/?n=FactoredTraining.BuildReorderingModel>

  * 8 Generation model
<http://www.statmt.org/moses/?n=FactoredTraining.BuildGenerationModel>

  * 9 Configuration file

<http://www.statmt.org/moses/?n=FactoredTraining.CreateConfigurationFile>


steps manually?


2016-11-02 13:27 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

It's not really possible to debug without a log file, so I
think that unfortunately you should start again.

I suggest you start with a much smaller corpus to try to debug
your setup. If you use 100,000 sentences it should train
within hours.

cheers - Barry


On 02/11/16 11:13, Hasan Sait ARSLAN wrote:

Hi Barry,

Unfortunately I didn't keep the log file. Is it really a
hopeless situation?

2016-11-02 13:10 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

You should have run train_model.perl something like this:

nohup nice /path/to/train_model.perl ARGS_FOR_TRAINING &>
log &

You just need the log file

cheers - Barry


On 02/11/16 11:00, Hasan Sait ARSLAN wrote:

Hi Barry,

    I don't have any idea about where it is saved.

Cheers,

2016-11-02 12:55 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

If your phrase table is empty, that would explain
why tuning didn't work. Something went wrong earlier
in the process. Could you post your log file from
train_model.perl,

cheers - Barry


On 02/11/16 10:45, Hasan Sait ARSLAN wrote:

Currently I have a bigger problem. I have
trained my data for 5 days, and the folder "train"
is 39 GB, but there are no phrases saved in the
phrase table. It is annoying. What should I do now?
I hope I won't need to rerun everything from
scratch.

2016-11-02 12:26 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

The error message should be written into
filterphrases.err, inside your working directory,

cheers - Barry


On 02/11/16 10:02, Hasan Sait ARSLAN wrote:

Hi Hieu,

I did it the way you want. Plus, I am sure
the path and file names are correctly spelt.
But still, I get the same error.

2016-11-01 21:41 GMT+02:00 Hasan Sait ARSLAN
<hasan.sait.ars...@gmail.com>:


Re: [Moses-support] error in mosesdecoder tuning

2016-11-02 Thread Barry Haddow

Hi Hasan

It's not really possible to debug without a log file, so I think that 
unfortunately you should start again.


I suggest you start with a much smaller corpus to try to debug your 
setup. If you use 100,000 sentences it should train within hours.
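
For example (file names assumed), take a parallel 100k-sentence slice 
for a quick debugging run:

  head -n 100000 full.tr > small.tr
  head -n 100000 full.en > small.en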


cheers - Barry

On 02/11/16 11:13, Hasan Sait ARSLAN wrote:

Hi Barry,

Unfortunately I didn't keep the log file. Is it really a hopeless 
situation?


2016-11-02 13:10 GMT+02:00 Barry Haddow <bhad...@staffmail.ed.ac.uk>:


Hi Hasan

You should have run train_model.perl something like this:

nohup nice /path/to/train_model.perl ARGS_FOR_TRAINING &> log &

You just need the log file

cheers - Barry


On 02/11/16 11:00, Hasan Sait ARSLAN wrote:

Hi Barry,

I don't have any idea about where it is saved.

Cheers,

2016-11-02 12:55 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

If your phrase table is empty, that would explain why tuning
didn't work. Something went wrong earlier in the process.
Could you post your log file from train_model.perl,

cheers - Barry


On 02/11/16 10:45, Hasan Sait ARSLAN wrote:

Currently I have a bigger problem. I have trained my
data for 5 days, and the folder "train" is 39 GB, but there
are no phrases saved in the phrase table. It is annoying.
What should I do now? I hope I won't need to rerun
everything from scratch.

    2016-11-02 12:26 GMT+02:00 Barry Haddow
<bhad...@staffmail.ed.ac.uk>:

Hi Hasan

The error message should be written into
filterphrases.err, inside your working directory,

cheers - Barry


On 02/11/16 10:02, Hasan Sait ARSLAN wrote:

Hi Hieu,

I did it the way you want. Plus, I am sure the path
and file names are correctly spelt.
But still, I get the same error.

2016-11-01 21:41 GMT+02:00 Hasan Sait ARSLAN
<hasan.sait.ars...@gmail.com>:

Thank you Hieu, I will do it, and give you feedback
about it.

Kind Regards,

2016-11-01 21:37 GMT+02:00 Hieu Hoang
<hieuho...@gmail.com>:

The command you execute looks OK. The problem
is likely to be with your tuning data.

You should look at your tuning data, make sure
you've spelt the path and filename correctly,
use a very small subset (e.g. start with 1
parallel sentence) and increase the number of
sentences until you find the problem.



On 01/11/2016 12:30, Hasan Sait ARSLAN wrote:

Hello,

I have trained the dataset for 5 days, and for 2-3
days I have been dealing with a bug in tuning, but
still couldn't solve the problem.

I use the following command to run:

/home/sait/mosesdecoder/scripts/training/mert-moses.pl
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.tr
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.en
/home/sait/mosesdecoder/bin/moses
/home/sait/Kairit/Task3/working_tr-en/train/model/moses.ini
--working-dir
/home/sait/Kairit/Task3/working_tr-en/mert
--decoder-flags "--threads 32"
Then I get the following error:

Using SCRIPTS_ROOTDIR: /home/sait/mosesdecoder/scripts
Assuming --mertdir=/home/sait/mosesdecoder/bin
filtering the phrase tables... T nov1 19:17:39 EET 2016
exec: /home/sait/mosesdecoder/scripts/training/filter-model-given-input.pl
./filtered
/home/sait/Kairit/Task3/working_tr-en/train/model/moses.ini
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.tr
Executing: /home/sait/mosesdecoder/scripts/training/filter-model-given-input.pl
./filtered
/home/sait/Kairit/Task3/working_tr-e

Re: [Moses-support] error in mosesdecoder tuning

2016-11-02 Thread Barry Haddow

Hi Hasan

If your phrase table is empty, that would explain why tuning didn't 
work. Something went wrong earlier in the process. Could you post your 
log file from train_model.perl,


cheers - Barry

On 02/11/16 10:45, Hasan Sait ARSLAN wrote:
Currently I have a bigger problem. I have trained my data for 5 
days, and the folder "train" is 39 GB, but there are no phrases 
saved in the phrase table. It is annoying. What should I do now? I hope I 
won't need to rerun everything from scratch.


2016-11-02 12:26 GMT+02:00 Barry Haddow <bhad...@staffmail.ed.ac.uk>:


Hi Hasan

The error message should be written into filterphrases.err, inside
your working directory,

cheers - Barry


On 02/11/16 10:02, Hasan Sait ARSLAN wrote:

Hi Hieu,

I did it the way you want. Plus, I am sure the path and file
names are correctly spelt.
But still, I get the same error.

2016-11-01 21:41 GMT+02:00 Hasan Sait ARSLAN
<hasan.sait.ars...@gmail.com>:

Thank you Hieu, I will do it, and give you feedback about it.

Kind Regards,

2016-11-01 21:37 GMT+02:00 Hieu Hoang <hieuho...@gmail.com>:

The command you execute looks OK. The problem is likely
to be with your tuning data.

You should look at your tuning data, make sure you've
spelt the path and filename correctly, use a very small
subset (e.g. start with 1 parallel sentence) and increase
the number of sentences until you find the problem.



On 01/11/2016 12:30, Hasan Sait ARSLAN wrote:

Hello,

I have trained the dataset for 5 days, and for 2-3 days I have been
dealing with a bug in tuning, but still couldn't solve
the problem.

I use the following command to run:

/home/sait/mosesdecoder/scripts/training/mert-moses.pl
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.tr
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.en
/home/sait/mosesdecoder/bin/moses
/home/sait/Kairit/Task3/working_tr-en/train/model/moses.ini
--working-dir /home/sait/Kairit/Task3/working_tr-en/mert
--decoder-flags "--threads 32"
Then I get the following error:

Using SCRIPTS_ROOTDIR: /home/sait/mosesdecoder/scripts
Assuming --mertdir=/home/sait/mosesdecoder/bin
filtering the phrase tables... T nov1 19:17:39 EET 2016
exec: /home/sait/mosesdecoder/scripts/training/filter-model-given-input.pl
./filtered
/home/sait/Kairit/Task3/working_tr-en/train/model/moses.ini
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.tr
Executing: /home/sait/mosesdecoder/scripts/training/filter-model-given-input.pl
./filtered
/home/sait/Kairit/Task3/working_tr-en/train/model/moses.ini
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.tr > filterphrases.out 2> filterphrases.err
Exit code: 255
ERROR: Failed to run
'/home/sait/mosesdecoder/scripts/training/filter-model-given-input.pl
./filtered
/home/sait/Kairit/Task3/working_tr-en/train/model/moses.ini
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.tr'. at
/home/sait/mosesdecoder/scripts/training/mert-moses.pl line 1748.
I have searched the internet for this error, but
unfortunately I couldn't find any working solution.

It is so annoying and eats my time; please help me
solve my problem.

Thanks,


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support






___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support



The University of Edinburgh is a charitable body, registered in
Scotland, with r

Re: [Moses-support] error in mosesdecoder tuning

2016-11-02 Thread Barry Haddow

Hi Hasan

The error message should be written into filterphrases.err, inside your 
working directory,
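
e.g., taking the --working-dir from the command you posted:

  tail -n 50 /home/sait/Kairit/Task3/working_tr-en/mert/filterphrases.err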


cheers - Barry

On 02/11/16 10:02, Hasan Sait ARSLAN wrote:

Hi Hieu,

I did it the way you want. Plus, I am sure the path and file names 
are correctly spelt.

But still, I get the same error.

2016-11-01 21:41 GMT+02:00 Hasan Sait ARSLAN:


Thank you Hieu, I will do it, and give you feedback about it.

Kind Regards,

2016-11-01 21:37 GMT+02:00 Hieu Hoang:

The command you execute looks OK. The problem is likely to be
with your tuning data.

You should look at your tuning data, make sure you've spelt
the path and filename correctly, use a very small subset (e.g.
start with 1 parallel sentence) and increase the number of
sentences until you find the problem.



On 01/11/2016 12:30, Hasan Sait ARSLAN wrote:

Hello,

I have trained the dataset for 5 days, and for 2-3 days I have been
dealing with a bug in tuning, but still couldn't solve the
problem.

I use the following command to run:

/home/sait/mosesdecoder/scripts/training/mert-moses.pl
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.tr
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.en
/home/sait/mosesdecoder/bin/moses
/home/sait/Kairit/Task3/working_tr-en/train/model/moses.ini
--working-dir /home/sait/Kairit/Task3/working_tr-en/mert
--decoder-flags "--threads 32"
Then I get the following error:

Using SCRIPTS_ROOTDIR: /home/sait/mosesdecoder/scripts
Assuming --mertdir=/home/sait/mosesdecoder/bin
filtering the phrase tables... T nov1 19:17:39 EET 2016
exec: /home/sait/mosesdecoder/scripts/training/filter-model-given-input.pl
./filtered
/home/sait/Kairit/Task3/working_tr-en/train/model/moses.ini
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.tr
Executing: /home/sait/mosesdecoder/scripts/training/filter-model-given-input.pl
./filtered
/home/sait/Kairit/Task3/working_tr-en/train/model/moses.ini
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.tr > filterphrases.out 2> filterphrases.err
Exit code: 255
ERROR: Failed to run
'/home/sait/mosesdecoder/scripts/training/filter-model-given-input.pl
./filtered
/home/sait/Kairit/Task3/working_tr-en/train/model/moses.ini
/home/sait/Kairit/Task3/dataset_tur_en/dev.clean.tr'. at
/home/sait/mosesdecoder/scripts/training/mert-moses.pl line 1748.
I have searched the internet for this error, but
unfortunately I couldn't find any working solution.

It is so annoying and eats my time; please help me solve
my problem.

Thanks,


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] News monolingual corpus question

2016-10-04 Thread Barry Haddow
Hi Vincent

Could you say exactly which files you are comparing?
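
If the per-year downloads are gzipped while the copies inside the big
tarball are not, a quick check along these lines (file name
illustrative) would confirm that the difference is just compression:

gzip -l news.2008.en.shuffled.gz
# or: gzip -cd news.2008.en.shuffled.gz | wc -c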

cheers - Barry

On 04/10/16 21:20, Vincent Nguyen wrote:
>
> no, but my mistake - I was comparing with that link for the per-year
> files: http://www.statmt.org/wmt15/translation-task.html
>
> what is the difference (with the wmt11 files)?
>
>
>
> On 04/10/2016 at 21:46, Barry Haddow wrote:
>> Hi Vincent
>>
>> Are you comparing compressed with uncompressed files?
>>
>> cheers - Barry
>>
>> On 04/10/16 14:40, Vincent Nguyen wrote:
>>> Hi,
>>>
>>> on this link:
>>>
>>> http://www.statmt.org/wmt11/translation-task.html
>>>
>>> on the download section for monolingual data, there is :
>>>
>>> one big file : http://www.statmt.org/wmt11/training-monolingual.tgz
>>>
>>> And separate files, of which news crawls per year.
>>>
>>> However, when you take a single file for a specific year, it is not the
>>> same size as the same name file in the big download.
>>>
>>> expanded size for the English corpus:
>>>
>>> news2008: 4.3GB vs 1.6GB for the single download
>>> news2009: 5.3GB vs 1.8GB for the single download
>>>
>>> etc...
>>>
>>> can someone please explain the difference?
>>>
>>> thanks
>>>
>>> Vincent.
>>>
>>>
>>> ___
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>>
>
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] News monolingual corpus question

2016-10-04 Thread Barry Haddow
Hi Vincent

Are you comparing compressed with uncompressed files?

cheers - Barry

On 04/10/16 14:40, Vincent Nguyen wrote:
> Hi,
>
> on this link:
>
> http://www.statmt.org/wmt11/translation-task.html
>
> on the download section for monolingual data, there is :
>
> one big file : http://www.statmt.org/wmt11/training-monolingual.tgz
>
> And separate files, of which news crawls per year.
>
> However, when you take a single file for a specific year, it is not the
> same size as the same name file in the big download.
>
> expanded size for the English corpus:
>
> news2008: 4.3GB vs 1.6GB for the single download
> news2009: 5.3GB vs 1.8GB for the single download
>
> etc...
>
> can someone please explain the difference?
>
> thanks
>
> Vincent.
>
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Fwd: moses.ini file for sparse features

2016-08-20 Thread Barry Haddow

Hi Arefeh

Are your features tuneable?

I think if you run
moses --show-weights -f path-to-ini
it should tell you

cheers - Barry

On 20/08/16 15:52, arefeh kazemi wrote:

Hi Barry

Thanks so much,
moses.ini seems ok, but my feature isn't in *.best files.
These are the lines in my moses.ini:

[feature]
Aref trainFilePath=/files/entr/entr-dd-surf-tune.csv 
decodeFilePath=/files/entr/entr_Tune_Decoding_WithIndex.txt

UnknownWordPenalty

# dense weights for feature functions
[weight-file]
/old-scratch/akazemi/experiments/entr-mymodel/dd-sur/sparse/weights.txt

This is the command that I use for tuning:
nohup nice $SCRIPTS_ROOTDIR/training/mert-moses.pl entr-tune.en entr-tune.tr \

$MOSESCHART $WorkingDir/moses.ini --mertdir $MERTDIR \
--rootdir $SCRIPTS_ROOTDIR \
--batch-mira --return-best-dev --decoder-flags '-threads all -v 2' > 
mert.out


This is a line from the debug output; the named sparse feature values
at the end are my features:
BEST TRANSLATION: 639354 S  -> S  :0-0 : c=-0.024 
core=(0.000,-1.000,1.000,0.000,0.000,0.000,0.000,0.000,0.000) 
sourcePhrase=X   [0..28] 638176 [total=-5.857] 
core=(0.000,-33.000,41.000,-34.703,-48.214,-30.033,-39.420,16.998,-96.448) 
m=-3.414 prepprepm=-9.457 unalignedPenalty=79.000 
cohesionPenalty=61.000 amodprepm=-10.328 detpreps=-3.297 
amodamodm=-7.082 detnnm=-7.316

Best Hypothesis Generation Time: : [401.400] seconds
WRITING 100 TRANSLATION ALTERNATIVES TO run1.best100.out
N-Best Hypotheses Generation Time: : [401.645] seconds
Sentence Decoding Time: : [401.650] seconds
Translation took 1907.342 seconds


Should I check anything else?

Regards
Arefeh


On Sat, Aug 20, 2016 at 4:53 PM, Barry Haddow 
<bhad...@staffmail.ed.ac.uk> wrote:


Hi Arefeh

Attached.

If you look at the files produced in your tuning run, you should
see the following produced for each iteration:

run2.best100.out.gz
run2.dense
run2.extract.err
run2.extract.out
run2.features.dat
run2.init.opt
run2.mert.log
run2.mert.out
run2.mira.out
run2.moses.ini
run2.out
run2.scores.dat
run2.sparse-weights
run2.weights.txt


(run1 may be different)

In run2.best100.out.gz you should see the sparse feature values.
In run2.mert.out you should see the sparse feature weights.


cheers - Barry


On 20/08/16 12:47, arefeh kazemi wrote:

Hi Barry

Thanks,
I tried verbose and I can see my sparse feature and its scores
for each sentence, but the weights.txt file is still empty.
(I use kbmira for tuning).
I think there is a problem with moses.ini file.
Could you please send an example .ini file with a sparse feature?

Thanks
Arefeh

On Thu, Aug 18, 2016 at 6:13 PM, Barry Haddow
<bhad...@staffmail.ed.ac.uk>
wrote:

Hi Arefeh

The quickest way to see if Moses is using your feature is to
put a debug message in it to see if it gets called. You can
also increase the debug of Moses (try -v 2) to see if your
feature's scores appear in the breakdown.

To populate the weights file, you will need to run tuning
(kbmira or pro). If you just decode with an empty weights
file, all the weights will be set to 0 and Moses will not
update the file,

cheers - Barry


On 18/08/16 14:15, arefeh kazemi wrote:

Hi Barry

Thanks.
I created an empty weights.txt file and wrote its path in
moses.ini. Moses runs normally but the weights file remains
empty. It seems Moses doesn't use my feature.

Regards
Arefeh

    On Wed, Aug 17, 2016 at 12:57 PM, Barry Haddow
<bhad...@staffmail.ed.ac.uk> wrote:

Hi Arefeh

That seems OK. Tuning (with kbmira or pro) will create a
weights file for the sparse features, which you can add
with:

[weight-file]
/path/to/sparse/weights

What goes wrong when you run moses?

cheers - Barry


On 17/08/16 07:50, arefeh kazemi wrote:

Hi
This is just a kind reminder that I'm waiting for a
response.

Thanks
Arefe
-- Forwarded message --
From: arefeh kazemi <akazem...@gmail.com>
Date: Tue, Aug 9, 2016 at 7:28 PM
Subject: moses.ini file for sparse features
To: Moses-support <moses-support@mit.edu>


Hi

I've implemented a sparse feature function in the Moses
Hiero system but I don't know what the parameters
in the moses.ini file are for a sparse feature.
  

Re: [Moses-support] Fwd: moses.ini file for sparse features

2016-08-20 Thread Barry Haddow

Hi Arefeh

Attached.

If you look at the files produced in your tuning run, you should see the 
following produced for each iteration:


run2.best100.out.gz
run2.dense
run2.extract.err
run2.extract.out
run2.features.dat
run2.init.opt
run2.mert.log
run2.mert.out
run2.mira.out
run2.moses.ini
run2.out
run2.scores.dat
run2.sparse-weights
run2.weights.txt


(run1 may be different)

In run2.best100.out.gz you should see the sparse feature values. In 
run2.mert.out you should see the sparse feature weights.



cheers - Barry

On 20/08/16 12:47, arefeh kazemi wrote:

Hi Barry

Thanks,
I tried verbose and I can see my sparse feature and its scores for
each sentence, but the weights.txt file is still empty.

(I use kbmira for tuning).
I think there is a problem with moses.ini file.
Could you please send an example .ini file with a sparse feature?

Thanks
Arefeh

On Thu, Aug 18, 2016 at 6:13 PM, Barry Haddow 
<bhad...@staffmail.ed.ac.uk> wrote:


Hi Arefeh

The quickest way to see if Moses is using your feature is to put a
debug message in it to see if it gets called. You can also
increase the debug of Moses (try -v 2) to see if your feature's
scores appear in the breakdown.

To populate the weights file, you will need to run tuning (kbmira
or pro). If you just decode with an empty weights file, all the
weights will be set to 0 and Moses will not update the file,

cheers - Barry


On 18/08/16 14:15, arefeh kazemi wrote:

Hi Barry

Thanks.
I created an empty weights.txt file and wrote its path in
moses.ini. Moses runs normally but the weights file remains empty. It
seems Moses doesn't use my feature.

Regards
Arefeh

On Wed, Aug 17, 2016 at 12:57 PM, Barry Haddow
<bhad...@staffmail.ed.ac.uk>
wrote:

Hi Arefeh

That seems OK. Tuning (with kbmira or pro) will create a
weights file for the sparse features, which you can add with:

[weight-file]
/path/to/sparse/weights

What goes wrong when you run moses?

cheers - Barry


On 17/08/16 07:50, arefeh kazemi wrote:

Hi
This is just a kind reminder that I'm waiting for a response.

Thanks
Arefe
-- Forwarded message --
From: arefeh kazemi <akazem...@gmail.com>
Date: Tue, Aug 9, 2016 at 7:28 PM
Subject: moses.ini file for sparse features
To: Moses-support <moses-support@mit.edu>


Hi

I've implemented a sparse feature function in the Moses Hiero
system but I don't know what the parameters in the moses.ini
file are for a sparse feature.
For the dense version of my feature, I had these lines in my
ini file:
[features]
Aref num-features=4 ...
[weight]
Aref= 0.2 0.2 0.2 0.2

Now, what should I write for my sparse feature in the .ini file?
I have removed the weights for Aref and also "num-features"
from the ini file, but it doesn't work.

Regards
Arefeh Kazemi



-- 
Arefeh Kazemi



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

The University of Edinburgh is a charitable body, registered
in Scotland, with registration number SC005336. 

-- 
Arefeh Kazemi

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336. 


--
Arefeh Kazemi
# MERT optimized configuration
# decoder /home/bhaddow/moses.new/dist/977e8ea/bin/moses
# BLEU 0.283399 on dev 
/home/bhaddow/data/wmt16/qt21-ro-en/newsdev2016.1.en-ro.true.en
# We were before running iteration 4
# finished Wed Apr 13 10:50:37 BST 2016
### MOSES CONFIG FILE ###
#

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

[distortion-limit]
6

# additional settings

[feature]
TargetWordInsertionFeature name=TWI factor=0 
path=/fs/magni0/bhaddow/experiments/wmt16/en-ro/model/sparse-features.24.ro.top50
SourceWordDeletionFeature name=SWD factor=0 
path=/fs/magni0/bhaddow/experiments/wmt16/en-ro/model/sparse-features.24.en.top50
WordTranslationFeature name=WT input-factor=0 output-factor=0 simple=1 
source-context=0 target-context=0 
source-path=/fs/magni0/bhaddow/experiments/wmt16/en-ro/model/sparse-features.24.en.top50
 
target-path=/fs/magni0/bhaddow/experiments/wmt16/en-ro/model/sparse-features.24.ro.top50
PhraseLengthFeature name=PL



# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryC

Re: [Moses-support] Fwd: moses.ini file for sparse features

2016-08-18 Thread Barry Haddow

Hi Arefeh

The quickest way to see if Moses is using your feature is to put a debug 
message in it to see if it gets called. You can also increase the debug 
of Moses (try -v 2) to see if your feature's scores appear in the breakdown.
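
For a quick check without touching the code, something along these
lines (binary path and file names illustrative; the feature name is
the one from this thread) shows whether the feature's scores ever
appear:

head -1 entr-tune.en | moses -f moses.ini -v 2 2>&1 | grep -i aref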


To populate the weights file, you will need to run tuning (kbmira or 
pro). If you just decode with an empty weights file, all the weights 
will be set to 0 and Moses will not update the file,


cheers - Barry

On 18/08/16 14:15, arefeh kazemi wrote:

Hi Barry

Thanks.
I created an empty weights.txt file and wrote its path in
moses.ini. Moses runs normally but the weights file remains empty. It
seems Moses doesn't use my feature.


Regards
Arefeh

On Wed, Aug 17, 2016 at 12:57 PM, Barry Haddow 
<bhad...@staffmail.ed.ac.uk> wrote:


Hi Arefeh

That seems OK. Tuning (with kbmira or pro) will create a weights
file for the sparse features, which you can add with:

[weight-file]
/path/to/sparse/weights

What goes wrong when you run moses?

cheers - Barry


On 17/08/16 07:50, arefeh kazemi wrote:

Hi
This is just a kind reminder that I'm waiting for a response.

Thanks
Arefe
-- Forwarded message --
From: arefeh kazemi <akazem...@gmail.com>
Date: Tue, Aug 9, 2016 at 7:28 PM
Subject: moses.ini file for sparse features
To: Moses-support <moses-support@mit.edu>


Hi

I've implemented a sparse feature function in the Moses Hiero system
but I don't know what the parameters in the moses.ini file are for a
sparse feature.
For the dense version of my feature, I had these lines in my ini
file:
[features]
Aref num-features=4 ...
[weight]
Aref= 0.2 0.2 0.2 0.2

Now, What should I write for my sparse feature in .ini file? I
have removed the weights for Aref and also "num-features" from
ini file, but it doesn't work.

Regards
Arefeh Kazemi



-- 
Arefeh Kazemi



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support



The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




--
Arefeh Kazemi


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Fwd: moses.ini file for sparse features

2016-08-17 Thread Barry Haddow

Hi Arefeh

That seems OK. Tuning (with kbmira or pro) will create a weights file 
for the sparse features, which you can add with:


[weight-file]
/path/to/sparse/weights
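
For reference, the sparse weights file is plain text with one "name
weight" pair per line. Using illustrative names in the style of the
sparse features discussed in this thread, it would look something
like:

m 0.05
prepprepm -0.02
unalignedPenalty -0.1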

What goes wrong when you run moses?

cheers - Barry

On 17/08/16 07:50, arefeh kazemi wrote:

Hi
This is just a kind reminder that I'm waiting for a response.

Thanks
Arefe
-- Forwarded message --
From: arefeh kazemi
Date: Tue, Aug 9, 2016 at 7:28 PM
Subject: moses.ini file for sparse features
To: Moses-support


Hi

I've implemented a sparse feature function in the Moses Hiero system, but I
don't know what the parameters in the moses.ini file are for a sparse
feature.

For the dense version of my feature, I had these lines in my ini file:
[features]
Aref num-features=4 ...
[weight]
Aref= 0.2 0.2 0.2 0.2

Now, what should I write for my sparse feature in the .ini file? I have
removed the weights for Aref and also "num-features" from the ini file,
but it doesn't work.


Regards
Arefeh Kazemi



--
Arefeh Kazemi


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Incremental tuning?

2016-08-01 Thread Barry Haddow
Hi Bogdan

Why do you set the maximum phrase length to 20? Such long phrases are 
unlikely to be useful, and could be the cause of the excessive resource 
usage.

Other than that, the system you describe should not be using up 192G ram.
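
For what it's worth, the incremental scheme described below could be
scripted roughly like this (untested sketch; file names and sample
size are illustrative, and the sampling must keep source and reference
lines paired):

ini=train/model/moses.ini
for i in 1 2 3; do
  paste tune.src tune.ref | shuf -n 2000 > sample.$i.tsv
  cut -f1 sample.$i.tsv > sample.$i.src
  cut -f2 sample.$i.tsv > sample.$i.ref
  ~/mosesdecoder/scripts/training/mert-moses.pl sample.$i.src sample.$i.ref \
    ~/mosesdecoder/bin/moses $ini --working-dir $PWD/mert.$i \
    --decoder-flags "-threads 8"
  ini=mert.$i/moses.ini   # assuming the tuned ini is written here
done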

cheers - Barry

On 01/08/16 20:40, Bogdan Vasilescu wrote:
> Thanks Hieu,
>
> It runs out of memory around 3,000 sentences when n-best is the
> default 100. It seems to do a little bit better if I set n-best to 10
> (5,000 sentences or so). The machine I'm running this on has 192 GB
> RAM. I'm using the binary moses from
> http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/
>
> My phrase table was built on 1,200,000 sentences (phrase length at
> most 20). My language model is a 5-gram, built on close to 500,000,000
> sentences.
>
> Still, the question remains. Is there a way to perform tuning incrementally?
>
> I'm thinking:
> - tune on a sample of my original tuning corpora; this generates an
> updated moses.ini, with "better" weights
> - use this moses.ini as input for a second tuning phase, on another
> sample of my tuning corpora
> - repeat until there is convergence in the weights
>
> Bogdan
>
>
> On Mon, Aug 1, 2016 at 11:43 AM, Hieu Hoang wrote:
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 29 July 2016 at 18:57, Bogdan Vasilescu wrote:
>>> Hi,
>>>
>>> I've trained a model and I'm trying to tune it using mert-moses.pl.
>>>
>>> I tried different size tuning corpora, and as soon as I exceed a
>>> certain size (this seems to vary between consecutive runs, as well as
>>> with other tuning parameters like --nbest), the process gets killed:
>> it should work with any size tuning corpus. The only thing I can think of is
>> if the tuning corpus is very large (1,000,000 sentences say) or the n-best
>> list is very large (1,000,000 say); then the decoder or the mert script may
>> use a lot of memory
>>>
>>> Killed
>>> Exit code: 137
>>> The decoder died. CONFIG WAS -weight-overwrite ...
>>>
>>> Looking into the kernel logs in /var/log/kern.log suggests I'm running
>>> out of memory:
>>>
>>> kernel: [98464.080899] Out of memory: Kill process 15848 (moses) score
>>> 992 or sacrifice child
>>> kernel: [98464.080920] Killed process 15848 (moses)
>>> total-vm:414130312kB, anon-rss:194915316kB, file-rss:0kB
>>>
>>> Is there a way to perform tuning incrementally?
>>>
>>> I'm thinking:
>>> - tune on a sample of my original tuning corpora; this generates an
>>> updated moses.ini, with "better" weights
>>> - use this moses.ini as input for a second tuning phase, on another
>>> sample of my tuning corpora
>>> - repeat until there is convergence in the weights
>>>
>>> Would this work?
>>>
>>> Many thanks in advance,
>>> Bogdan
>>>
>>> --
>>> Bogdan (博格丹) Vasilescu
>>> Postdoctoral Researcher
>>> Davis Eclectic Computational Analytics Lab
>>> University of California, Davis
>>> http://bvasiles.github.io
>>> http://decallab.cs.ucdavis.edu/
>>> @b_vasilescu
>>>
>>> ___
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Tuning crashed

2016-06-08 Thread Barry Haddow

Hi Tomasz

The error message about missing the ini file is a consequence of the 
tuning crash, so just ignore this.


To find out why Moses is failing, run it again in the console like this:

/home/moses/src/mosesdecoder/bin/moses -threads 16 -v 0  -config 
/home/moses/working/experiments/NGRAM5/model/moses.bin.ini.2 -show-weights


If necessary, increase the verbosity of the debug (-v).
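
For example, the same command with more verbose output:

/home/moses/src/mosesdecoder/bin/moses -threads 16 -v 2 \
  -config /home/moses/working/experiments/NGRAM5/model/moses.bin.ini.2 -show-weights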

cheers - Barry

On 08/06/16 08:42, Tomasz Gawryl wrote:


Hi,

I have a problem with tuning crashing. It seems that moses.ini is
missing in the temporary folder but I have no idea why. I attached a link to
my config file.


Please help.

Regards

Thomas

moses@smtserver:~/working/experiments/NGRAM5/steps/2$ more 
TUNING_tune.2.STDERR


Using SCRIPTS_ROOTDIR: /home/moses/src/mosesdecoder/scripts

Asking moses for feature names and values from 
/home/moses/working/experiments/NGRAM5/model/moses.bin.ini.2


Executing: /home/moses/src/mosesdecoder/bin/moses -threads 16 -v 0 
-config /home/moses/working/experiments/NGRAM5/model/moses.bin.ini.2 
-show-weights


exec: /home/moses/src/mosesdecoder/bin/moses -threads 16 -v 0 -config 
/home/moses/working/experiments/NGRAM5/model/moses.bin.ini.2 -show-weights


Executing: /home/moses/src/mosesdecoder/bin/moses -threads 16 -v 0 
-config /home/moses/working/experiments/NGRAM5/model/moses.bin.ini.2 
-show-weights > ./features.list 2> /dev/null


Exit code: 1

ERROR: Failed to run '/home/moses/src/mosesdecoder/bin/moses -threads 
16 -v 0 -config 
/home/moses/working/experiments/NGRAM5/model/moses.bin.ini.2 
-show-weights'. at 
/home/moses/src/mosesdecoder/scripts/training/mert-moses.pl line 1748.


cp: cannot stat 
‘/home/moses/working/experiments/NGRAM5/tuning/tmp.2/moses.ini’: No 
such file or directory


https://docs.google.com/document/d/1gI7YVUx8VoktIfIQvvU54jKSm5Ta6UZjYcszBFtP-V8/edit?usp=sharing



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Finding the top 5 most ambiguous words

2016-05-13 Thread Barry Haddow

Hi Joe

You could also look at the entropy of the distribution. I'll leave Matt 
to post the one-liner for that one,
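
A rough sketch of what that might look like, assuming the default
Moses score ordering (where the third score on each line is the
direct phrase probability p(e|f)), and ignoring that a pruned table's
probabilities may not sum to one:

gzip -cd model/phrase-table.gz \
  | awk -F' \|\|\| ' '{split($3,s," "); p=s[3]; if (p>0) h[$1]-=p*log(p)}
      END {for (src in h) print h[src], src}' \
  | sort -gr | head -n5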


cheers - Barry

On 13/05/16 15:10, Matt Post wrote:
gzip -cd model/phrase-table.gz | cut -d\| -f1 | sort | uniq -c | sort 
-nr | head -n5


(according to one definition of "ambiguous")

On May 11, 2016, at 2:53 AM, Joe Jean wrote:


Hello,

How would you go about finding the top 5 most ambiguous words in a 
translation system just by looking at the phrase table and the 
lexical translation tables? Thanks.



___
Moses-support mailing list
Moses-support@mit.edu 
http://mailman.mit.edu/mailman/listinfo/moses-support




___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Best alignment algorithm

2016-05-11 Thread Barry Haddow
Hi Dorra

I think this is the classic paper

http://dl.acm.org/citation.cfm?id=778824

Although a quick google turned up this paper, which is more specific to 
your question

http://www.mt-archive.info/MTS-2007-Wu.pdf

cheers - Barry

On 10/05/16 23:51, haoua...@iro.umontreal.ca wrote:
> Hi,
>
> In Moses, there are several alignment algorithms, like grow-diag-final-and,
> grow-diag-final, etc.
> Alignment has an impact on the extracted phrases and hence on the translation
> quality.
> My question is: what is the best alignment algorithm among those available
> in Moses? Is there an article comparing the performance of the several
> alignment algorithms?
>
> Thank you,
>
> Dorra
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Random segfaults with alternative decoding paths

2016-04-13 Thread Barry Haddow

Hi Ales

Well, bitPos=18446744073708512633  looks bogus. Marcin?

cheers - Barry

On 13/04/16 17:23, Aleš Tamchyna wrote:

Hi all,

sorry for the delay. I'm attaching the debug backtrace.

Best,
Ales

On Wed, Apr 13, 2016 at 1:49 PM, Barry Haddow 
<bhad...@staffmail.ed.ac.uk> wrote:


Hi

The backtrace would be more informative if you run with a debug
build (add variant=debug to bjam). Sometimes this makes bugs go
away, or new bugs appear, but if not then it will give more
information. You can run with core files enabled (ulimit -c
unlimited) to save having to run Moses inside gdb.

If the bug is random, but not thread related, then it could well
be memory corruption. Running Moses in valgrind can help track
this down (again, using a debug build is better). Note that the
suffix arrays crash valgrind (last time I checked) so don't build
them in,

cheers - Barry


On 13/04/16 11:25, Ales Tamchyna wrote:

Hi,
Let me add some more information to this: when running Moses in gdb, I get 
the following backtrace:
#0  0x006e3ba4 in 
Moses::PhraseDecoder::CreateTargetPhraseCollection(Moses::Phrase const&, bool, 
bool) ()
#1  0x005cd2a7 in 
Moses::PhraseDictionaryCompact::GetTargetPhraseCollectionNonCacheLEGACY(Moses::Phrase
 const&) const ()
#2  0x0048efe4 in 
Moses::PhraseDictionary::GetTargetPhraseCollectionLEGACY(Moses::Phrase const&) 
const ()
#3  0x0048e6a0 in 
Moses::PhraseDictionary::GetTargetPhraseCollectionBatch(std::vector<Moses::InputPath*, 
std::allocator<Moses::InputPath*> > const&) const ()
#4  0x00560948 in 
Moses::TranslationOptionCollection::GetTargetPhraseCollectionBatch() ()
#5  0x00551a39 in 
Moses::TranslationOptionCollectionText::CreateTranslationOptions() ()
#6  0x004bddfc in Moses::Manager::Decode() ()
#7  0x00433bd4 in Moses::TranslationTask::Run() ()
#8  0x00496088 in Moses::ThreadPool::Execute() ()
#9  0x007cbdba in thread_proxy ()
#10 0x7fffc210c182 in start_thread (arg=0x7ffc23a5d700) at 
pthread_create.c:312
#11 0x7fffc1e3947d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
This suggests the problem is somewhere in loading phrase translations from 
the compact phrase table.
I’m not sure why the LEGACY functions are called but I’m assuming that 
these are “future” legacy methods and that they are in fact still used by 
phrase dictionary implementations (?).
Best,
Ales
From: Ondrej Bojar
Sent: Wednesday, 13 April 2016 12:19
To: moses-support@mit.edu
Cc: Roman Sudarikov; Ales Tamchyna
Subject: Random segfaults with alternative decoding paths


Hi,

we're experiencing random segfaults when we use two phrase tables in 
alternative decoding paths. The exact commit of moses we use is 
6a06e7776a58b09e4ed5b1cf11eb64fbdd6b02a2, from April 1.

We do have test runs on the exact same 200 input sentences, exact same 
moses.ini, on the very same machine, where one of the runs succeeds and the 
other dies after 45 sentences.


Would anyone have any idea what should we be chasing?

- it doesn't seem to be thread-related (segfault experienced with -threads 
1 as well as -threads 8)
- not related to nbest-list construction (we first had this problem in mert 
tuning so we isolated this)
- not related to more LMs (we first had several LMs in the setup, we get 
the crash with just one as well)
- not related to -search, the bug is there with -search set to 0, 1 or 4
- seems related to data or data size: when we trained the first ttable on 
just a very small corpus, we did not get the segfault (yet)
- not related to translation options caching, the bug is there even with 
-no-cache
- not related to the specification of output-factors; left unspecified or set to
0,1, the bug is there


Here is the moses.ini:

[input-factors]
0

[mapping]
0 T 0
1 T 1

[distortion-limit]
6

[feature]
Distortion
KENLM lazyken=0 name=LM0 factor=0 path=lm.1.trie.lm order=4
PhraseDictionaryCompact name=TranslationModel0 num-features=4 
path=phrase-table.0-0,1.1.1 input-factor=0 output-factor=0,1 table-limit=100
PhraseDictionaryCompact name=TranslationModel1 num-features=4 
path=phrase-table.0-0,1.2.1 input-factor=0 output-factor=0,1 table-limit=100
PhrasePenalty
UnknownWordPenalty
WordPenalty

[weight]
Distortion0= 0.3
LM0= 0.5
PhrasePenalty0= 0.2
TranslationModel0= 0.2 0.2 0.2 0.2
TranslationModel1= 0.2 0.2 0.2 0.2
UnknownWordPenalty0= 1
WordPenalty0= -1


The large setup that shows these crashes uses these big files:

-rw-r--r-- 1 bojar ufal 584M Apr 13 09:19 lm.1.trie.lm
-rw-r--r-- 1 bojar ufal 1.1G Apr 13 09:24 phrase-table.0-0,1.1.1.minphr
-rw-r--r-- 1 bojar ufal 5.7M Apr 13 09:24 phrase-table.0-0,1.2.1.minphr

Re: [Moses-support] Random segfaults with alternative decoding paths

2016-04-13 Thread Barry Haddow

Hi

The backtrace would be more informative if you run with a debug build 
(add variant=debug to bjam). Sometimes this makes bugs go away, or new 
bugs appear, but if not then it will give more information. You can run 
with core files enabled (ulimit -c unlimited) to save having to run 
Moses inside gdb.


If the bug is random, but not thread related, then it could well be 
memory corruption. Running Moses in valgrind can help track this down 
(again, using a debug build is better). Note that the suffix arrays 
crash valgrind (last time I checked) so don't build them in,
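
Concretely, something along these lines (paths illustrative):

cd ~/mosesdecoder
./bjam variant=debug -j8              # debug build, as above
ulimit -c unlimited                   # enable core dumps
bin/moses -f moses.ini < input.txt    # run until it segfaults, then:
gdb bin/moses core -ex bt             # backtrace from the core file

# or, for suspected memory corruption:
valgrind --track-origins=yes bin/moses -f moses.ini < input.txt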


cheers - Barry

On 13/04/16 11:25, Ales Tamchyna wrote:

Hi,
Let me add some more information to this: when running Moses in gdb, I get the 
following backtrace:
#0  0x006e3ba4 in 
Moses::PhraseDecoder::CreateTargetPhraseCollection(Moses::Phrase const&, bool, 
bool) ()
#1  0x005cd2a7 in 
Moses::PhraseDictionaryCompact::GetTargetPhraseCollectionNonCacheLEGACY(Moses::Phrase
 const&) const ()
#2  0x0048efe4 in 
Moses::PhraseDictionary::GetTargetPhraseCollectionLEGACY(Moses::Phrase const&) 
const ()
#3  0x0048e6a0 in 
Moses::PhraseDictionary::GetTargetPhraseCollectionBatch(std::vector<Moses::InputPath*, 
std::allocator<Moses::InputPath*> > const&) const ()
#4  0x00560948 in 
Moses::TranslationOptionCollection::GetTargetPhraseCollectionBatch() ()
#5  0x00551a39 in 
Moses::TranslationOptionCollectionText::CreateTranslationOptions() ()
#6  0x004bddfc in Moses::Manager::Decode() ()
#7  0x00433bd4 in Moses::TranslationTask::Run() ()
#8  0x00496088 in Moses::ThreadPool::Execute() ()
#9  0x007cbdba in thread_proxy ()
#10 0x7fffc210c182 in start_thread (arg=0x7ffc23a5d700) at 
pthread_create.c:312
#11 0x7fffc1e3947d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
This suggests the problem is somewhere in loading phrase translations from the 
compact phrase table.
I’m not sure why the LEGACY functions are called but I’m assuming that these 
are “future” legacy methods and that they are in fact still used by phrase 
dictionary implementations (?).
Best,
Ales
From: Ondrej Bojar
Sent: Wednesday, 13 April 2016 12:19
To: moses-support@mit.edu
Cc: Roman Sudarikov; Ales Tamchyna
Subject: Random segfaults with alternative decoding paths


Hi,

we're experiencing random segfaults when we use two phrase tables in 
alternative decoding paths. The exact commit of moses we use is 
6a06e7776a58b09e4ed5b1cf11eb64fbdd6b02a2, from April 1.

We do have test runs on the exact same 200 input sentences, exact same 
moses.ini, on the very same machine, where one of the runs succeeds and the 
other dies after 45 sentences.


Would anyone have any idea what should we be chasing?

- it doesn't seem to be thread-related (segfault experienced with -threads 1 as 
well as -threads 8)
- not related to nbest-list construction (we first had this problem in mert 
tuning so we isolated this)
- not related to more LMs (we first had several LMs in the setup, we get the 
crash with just one as well)
- not related to -search, the bug is there with -search set to 0, 1 or 4
- seems related to data or data size: when we trained the first ttable on just 
a very small corpus, we did not get the segfault (yet)
- not related to translation options caching, the bug is there even with 
-no-cache
- not related to the specification of output-factors; left unspecified or set to
0,1, the bug is there


Here is the moses.ini:

[input-factors]
0

[mapping]
0 T 0
1 T 1

[distortion-limit]
6

[feature]
Distortion
KENLM lazyken=0 name=LM0 factor=0 path=lm.1.trie.lm order=4
PhraseDictionaryCompact name=TranslationModel0 num-features=4 
path=phrase-table.0-0,1.1.1 input-factor=0 output-factor=0,1 table-limit=100
PhraseDictionaryCompact name=TranslationModel1 num-features=4 
path=phrase-table.0-0,1.2.1 input-factor=0 output-factor=0,1 table-limit=100
PhrasePenalty
UnknownWordPenalty
WordPenalty

[weight]
Distortion0= 0.3
LM0= 0.5
PhrasePenalty0= 0.2
TranslationModel0= 0.2 0.2 0.2 0.2
TranslationModel1= 0.2 0.2 0.2 0.2
UnknownWordPenalty0= 1
WordPenalty0= -1


The large setup that shows these crashes uses these big files:

-rw-r--r-- 1 bojar ufal 584M Apr 13 09:19 lm.1.trie.lm
-rw-r--r-- 1 bojar ufal 1.1G Apr 13 09:24 phrase-table.0-0,1.1.1.minphr
-rw-r--r-- 1 bojar ufal 5.7M Apr 13 09:24 phrase-table.0-0,1.2.1.minphr


Thanks,
   Ondrej.




___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Missing /usr/lib64/lib64/libboost_thread-mt.so.5 for multithreaded GIZA

2016-03-19 Thread Barry Haddow

Hi Sergey

Just the executable, which will be either mgiza or mgizapp
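
For example (the source path depends on where you built mgiza;
illustrative):

cp ~/mgiza/mgizapp/bin/mgiza ~/mosesdecoder/tools/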

cheers - Barry

On 19/03/16 15:07, Sergey A. wrote:

Hi Barry,

Thank you for the swift response. I do need mgiza - this is why this
thread was created. I compiled it, but apparently didn't copy the
needed files to the Moses directory. Which files are missing from the
tools/ directory?


Thanks.

Sergey

2016-03-19 17:01 GMT+02:00 Barry Haddow <bhad...@staffmail.ed.ac.uk>:


Hi Sergey

It's looking for mgiza, which you don't have. Either install mgiza
into your tools directory, or remove the mgiza arguments from your
train-model.perl command line.

cheers - Barry


On 19/03/16 13:56, Sergey A. wrote:

Hello Hieu Hoang.

Thank you for your suggestion, everything worked! Now to the
next problem: I'm trying to use it in the training of the
translation system. I've copied the binaries and the script from the
compiled mgiza, but it doesn't work (specifying
--external-bin-dir didn't help either):

$ ~/mosesdecoder/scripts/training/train-model.perl -root-dir
train -corpus corpus/train.tags.he-en.clean -f he -e en
-alignment grow-diag-final-and -reordering msd-bidirectional-fe
-lm 0:3:/home/altshus/he-en/lm/train.blm.en:8 -external-bin-dir
~/mosesdecoder/tools -cores 12 -mgiza-cpus 12 -mgiza 2>&1
Use of implicit split to @_ is deprecated at 
/home/altshus/mosesdecoder/scripts/training/train-model.perl line 2103.
Using SCRIPTS_ROOTDIR: /home/altshus/mosesdecoder/scripts
Using multi-thread GIZA
using gzip
Use of uninitialized value $GIZA in -x at 
/home/altshus/mosesdecoder/scripts/training/train-model.perl line 489.
ERROR:Cannot find mkcls, GIZA++/mgiza, & snt2cooc.out/snt2cooc 
in/home/altshus/mosesdecoder/tools.
You MUST specify the parameter -external-bin-dir at 
/home/altshus/mosesdecoder/scripts/training/train-model.perl line 489.

$ ll /home/altshus/mosesdecoder/tools
total 1996
-rwxr-xr-x. 1 altshus altshus 1103791 Mar  5 19:14 GIZA++
-rwxr-xr-x. 1 altshus altshus3291 Mar 19 15:41 merge_alignment.py
-rwxr-xr-x. 1 altshus altshus  277836 Mar 19 15:42 mkcls
-rwxr-xr-x. 1 altshus altshus   43595 Mar 19 15:42 plain2snt
-rwxr-xr-x. 1 altshus altshus   42358 Mar 19 15:42 snt2cooc
-rwxr-xr-x. 1 altshus altshus  439934 Mar  5 19:14 snt2cooc.out
-rwxr-xr-x. 1 altshus altshus   30883 Mar 19 15:42 snt2coocrmp
-rwxr-xr-x. 1 altshus altshus   35005 Mar 19 15:42 snt2plain
-rwxr-xr-x. 1 altshus altshus   50385 Mar 19 15:42 symal

Should I open a separate topic for it, or is it okay to ask in
this thread?

Thanks.

Sergey Altshuller

2016-03-10 17:19 GMT+02:00 Hieu Hoang <hieuho...@gmail.com>:

not sure, to be honest.

If you tire of fighting cmake/make problems, you can try
   ./manual-compile/compile.sh
It has its own problems, but they're problems that you can see
and fix yourself

Hieu Hoang
http://www.hoang.co.uk/hieu

On 9 March 2016 at 18:20, Sergey A. <www.se...@gmail.com> wrote:

Hi and thank you for your time. I also just cloned it
from the path you've provided, and ran these commands,
from the mgizapp directory inside the repo. I'm getting this:

[ 94%] Building CXX object
src/CMakeFiles/mgiza_lib.dir/vocab.cpp.o
Linking CXX static library ../lib/libmgiza.a
[ 94%] Built target mgiza_lib
Scanning dependencies of target d4norm
Scanning dependencies of target hmmnorm
Scanning dependencies of target mgiza
make[2]: *** No rule to make target
`/usr/lib64/lib64/libboost_thread-mt.so.5', needed by
`bin/d4norm'. Stop.
make[2]: *** Waiting for unfinished jobs
make[2]: *** No rule to make target
`/usr/lib64/lib64/libboost_thread-mt.so.5', needed by
`bin/hmmnorm'. Stop.
make[2]: *** Waiting for unfinished jobs
[ 96%] make[2]: *** No rule to make target
`/usr/lib64/lib64/libboost_thread-mt.so.5', needed by
`bin/mgiza'.  Stop.
make[2]: *** Waiting for unfinished jobs
[ 98%] [100%] Building CXX object
src/CMakeFiles/d4norm.dir/d4norm.cxx.o
Building CXX object src/CMakeFiles/hmmnorm.dir/hmmnorm.cxx.o
Building CXX object src/CMakeFiles/mgiza.dir/main.cpp.o
make[1]: *** [src/CMakeFiles/hmmnorm.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs
make[1]: *** [src/CMakeFiles/d4norm.dir/all] Error 2
make[1]: *** [src/CMakeFiles/mgiza.dir/all] Error 2
make: *** [all] Error 2


  

Re: [Moses-support] Missing /usr/lib64/lib64/libboost_thread-mt.so.5 for multithreaded GIZA

2016-03-19 Thread Barry Haddow

Hi Sergey

It's looking for mgiza, which you don't have. Either install mgiza into 
your tools directory, or remove the mgiza arguments from your 
train-model.perl command line.


cheers - Barry

On 19/03/16 13:56, Sergey A. wrote:

Hello Hieu Hoang.

Thank you for your suggestion, everything worked! Now to the next
problem: I'm trying to use it in the training of the translation
system. I've copied the binaries and the script from the compiled mgiza,
but it doesn't work (specifying --external-bin-dir didn't help
either):


$ ~/mosesdecoder/scripts/training/train-model.perl -root-dir train
-corpus corpus/train.tags.he-en.clean -f he -e en -alignment
grow-diag-final-and -reordering msd-bidirectional-fe -lm
0:3:/home/altshus/he-en/lm/train.blm.en:8 -external-bin-dir
~/mosesdecoder/tools -cores 12 -mgiza-cpus 12 -mgiza 2>&1

Use of implicit split to @_ is deprecated at 
/home/altshus/mosesdecoder/scripts/training/train-model.perl line 2103.
Using SCRIPTS_ROOTDIR: /home/altshus/mosesdecoder/scripts
Using multi-thread GIZA
using gzip
Use of uninitialized value $GIZA in -x at 
/home/altshus/mosesdecoder/scripts/training/train-model.perl line 489.
ERROR:Cannot find mkcls, GIZA++/mgiza, & snt2cooc.out/snt2cooc 
in/home/altshus/mosesdecoder/tools.
You MUST specify the parameter -external-bin-dir at 
/home/altshus/mosesdecoder/scripts/training/train-model.perl line 489.

$ ll /home/altshus/mosesdecoder/tools
total 1996
-rwxr-xr-x. 1 altshus altshus 1103791 Mar  5 19:14 GIZA++
-rwxr-xr-x. 1 altshus altshus3291 Mar 19 15:41 merge_alignment.py
-rwxr-xr-x. 1 altshus altshus  277836 Mar 19 15:42 mkcls
-rwxr-xr-x. 1 altshus altshus   43595 Mar 19 15:42 plain2snt
-rwxr-xr-x. 1 altshus altshus   42358 Mar 19 15:42 snt2cooc
-rwxr-xr-x. 1 altshus altshus  439934 Mar  5 19:14 snt2cooc.out
-rwxr-xr-x. 1 altshus altshus   30883 Mar 19 15:42 snt2coocrmp
-rwxr-xr-x. 1 altshus altshus   35005 Mar 19 15:42 snt2plain
-rwxr-xr-x. 1 altshus altshus   50385 Mar 19 15:42 symal

Should I open a separate topic for it, or is it okay to ask in this 
thread?


Thanks.

Sergey Altshuller

2016-03-10 17:19 GMT+02:00 Hieu Hoang:


not sure, to be honest.

If you tire of fighting cmake/make problems, you can try
   ./manual-compile/compile.sh
It has its own problems, but they're problems that you can see and
fix yourself

Hieu Hoang
http://www.hoang.co.uk/hieu

On 9 March 2016 at 18:20, Sergey A. wrote:

Hi and thank you for your time. I also just cloned it from the
path you've provided, and ran these commands, from the mgizapp
directory inside the repo. I'm getting this:

[ 94%] Building CXX object
src/CMakeFiles/mgiza_lib.dir/vocab.cpp.o
Linking CXX static library ../lib/libmgiza.a
[ 94%] Built target mgiza_lib
Scanning dependencies of target d4norm
Scanning dependencies of target hmmnorm
Scanning dependencies of target mgiza
make[2]: *** No rule to make target
`/usr/lib64/lib64/libboost_thread-mt.so.5', needed by
`bin/d4norm'.  Stop.
make[2]: *** Waiting for unfinished jobs
make[2]: *** No rule to make target
`/usr/lib64/lib64/libboost_thread-mt.so.5', needed by
`bin/hmmnorm'.  Stop.
make[2]: *** Waiting for unfinished jobs
[ 96%] make[2]: *** No rule to make target
`/usr/lib64/lib64/libboost_thread-mt.so.5', needed by
`bin/mgiza'.  Stop.
make[2]: *** Waiting for unfinished jobs
[ 98%] [100%] Building CXX object
src/CMakeFiles/d4norm.dir/d4norm.cxx.o
Building CXX object src/CMakeFiles/hmmnorm.dir/hmmnorm.cxx.o
Building CXX object src/CMakeFiles/mgiza.dir/main.cpp.o
make[1]: *** [src/CMakeFiles/hmmnorm.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs
make[1]: *** [src/CMakeFiles/d4norm.dir/all] Error 2
make[1]: *** [src/CMakeFiles/mgiza.dir/all] Error 2
make: *** [all] Error 2


2016-03-07 11:33 GMT+02:00 Hieu Hoang:

where did you get the code for mgiza from and what was the
exact commands you used to compile?

I just tested mgiza compilation with the code from
https://github.com/moses-smt/mgiza
using the commands
   cmake .
   cmake .
   make -j4
It seems to compile ok


On 06/03/2016 20:02, Sergey A. wrote:

Hello.

I'm trying to compile mgiza,
but getting the below error. What am I doing wrong? Note
that I don't have root access on the machine, so if
libraries are needed I'd like to know how to download 

Re: [Moses-support] EMS appears to require SRILM for OSM, even when not interpolating

2016-02-16 Thread Barry Haddow

Hi Lane

SRILM is no longer required, since Nadir made some EMS updates last 
October. Try upgrading to a recent version,


cheers - Barry

On 16/02/16 15:04, Lane Schwartz wrote:

Hi,

This is mostly an FYI, but I thought I'd point it out. The OSM 
documentation (http://www.statmt.org/moses/?n=Advanced.Models#ntoc3) 
mentions that SRILM is required when training an interpolated OSM 
model. This makes sense, because KenLM currently doesn't support 
interpolation.


However, the documentation doesn't state that SRILM is required for 
(non-interpolated) OSM training. This also makes sense, because 
regular OSM training can use KenLM.


The problem comes when running EMS using OSM when srilm-dir is not 
defined in your config file. My experience is that doing so results in 
the error: "ERROR: you need to define GENERAL:srilm-dir".


operation-sequence-model = "yes"
operation-sequence-model-order = 5
operation-sequence-model-settings = ""


 As far as I can tell, this is because the build-osm section of 
experiment.meta references the variable $srilm-dir:


build-osm
in: corpus word-alignment
out: osm-model
ignore-unless: operation-sequence-model
rerun-on-change: operation-sequence-model training-options
script giza-settings operation-sequence-model-settings
template: $moses-script-dir/OSM/OSM-Train.perl --corpus-f
IN0.$input-extension --corpus-e IN0.$output-extension --alignment
IN1.$alignment-symmetrization-method --order
$operation-sequence-model-order --out-dir OUT --moses-src-dir
$moses-src-dir --srilm-dir $srilm-dir
$operation-sequence-model-settings
default-name: model/OSM


I'm not terribly familiar with this code, but it seems that a solution 
would be to specify --lmplz instead of --srilm-dir in the build-osm 
section of experiment.meta, since OSM-Train.perl accepts --lmplz as an 
alternative to --srilm-dir.
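
Concretely, the template line would then presumably end in something
like (untested):

--moses-src-dir $moses-src-dir --lmplz $moses-src-dir/bin/lmplz
$operation-sequence-model-settings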


Thanks,
Lane



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Segmentation fault on hierarchical model with moses in server mode

2016-02-01 Thread Barry Haddow

Hi Martin

In your original mail, you passed different inputs to the server and 
command-line version. The following are extracted from your debug:


command-line:
Translating: <s> das ist ein haus </s>

server:
Translating: <s> dies ist ein haus . </s>

(Although you specify the command-line in the first case as "echo 'dies 
ist ein haus' | moses -f string-to-tree/moses.ini", which contradicts 
the debug. I assume you spliced together different runs.)


Could you show an example where you pass the *same* string to the two 
different Moses versions and get different outputs?


cheers - Barry

On 01/02/16 12:47, Martin Baumgärtner wrote:
With the current fix the crash is gone, but there is still an
unexpected difference between server mode and command line moses: the
first now produces an empty string, whereas the latter produces "this
is a house". I tried another engine (english->chinese) and got the
same behaviour - empty string vs. correct translation.


Cheers,
Martin

On 29.01.2016 at 22:37, Matthias Huck wrote:

On Fri, 2016-01-29 at 21:26 +0000, Hieu Hoang wrote:

The decoder should handle no translation without falling over. But
yes, the model is too toy

Normally the decoder would always produce some translation. (The translation 
could be an empty sentence, of course.) If it's misconfigured, it should tell 
you about it. But maybe not with a segmentation fault. :-)



On 29 Jan 2016 9:15 pm, "Matthias Huck" <mh...@inf.ed.ac.uk> wrote:

Hi,

It seems to me that this toy string-to-tree setup is either outdated,
or it always had issues. It should be replaced.

Under real-world conditions, the decoder should always be able to
produce some hypothesis. We would therefore usually extract a whole set
of glue rules. And we would typically also add an [unknown-lhs] section
to the moses.ini that would tell the decoder which left-hand side
non-terminal labels to use for out-of-vocabulary words. To my knowledge,
these two techniques are crucial for being able to parse any input
sentence provided to the chart decoder in syntax-based translation.

So, in my opinion, the problem is most likely neither the server
implementation nor the syntax-based decoder, but a problematic setup.
I would consider it okay for the server to crash (or at least print a
warning) under such circumstances. You don't want it to silently not
translate complete sentences.

(I must admit that I didn't look into it in too much detail, but it
should be easy to confirm.)

Cheers,
Matthias


On Fri, 2016-01-29 at 20:28 +0000, Barry Haddow wrote:

Hi All

I think I see what happened now.

When you give the input "dies ist ein haus" to the sample model, the
"dies" is unknown, and there is no translation. The server did not check
for this condition, and got a seg fault. I have added a check, so if you
pull and try again it should not crash.

In the log pasted by Martin, he passed "das ist ein haus" to
command-line Moses, which works, and gives a translation.

I think ideally the sample models should handle unknown words, and give
a translation. Maybe adding a glue rule would be sufficient?

cheers - Barry

On 29/01/16 11:13, Barry Haddow wrote:

Hi

When I run command-line Moses, I get the output below - i.e. no best
translation. The server crashes for me since it does not check for the
null pointer, but the command-line version does.

I think there should be a translation for this example.

cheers - Barry

[gna]bhaddow: echo 'dies ist ein haus' | ~/moses.new/bin/moses -f
string-to-tree/moses.ini
Defined parameters (per moses.ini or switch):
   config: string-to-tree/moses.ini
   cube-pruning-pop-limit: 1000
   feature: KENLM name=LM factor=0 order=3 num-features=1
path=lm/europarl.srilm.gz WordPenalty UnknownWordPenalty
PhraseDictionaryMemory input-factor=0 output-factor=0
path=string-to-tree/rule-table num-features=1 table-limit=20
   input-factors: 0
   inputtype: 3
   mapping: 0 T 0
   max-chart-span: 20 1000
   non-terminals: X S
   search-algorithm: 3
   translation-details: translation-details.log
   weight: WordPenalty0= 0 LM= 0.5 PhraseDictionaryMemory0= 0.5

line=KENLM name=LM factor=0 order=3 num-features=1 path=lm/europarl.srilm.gz
Loading the LM will be faster if you build a binary file.
Reading lm/europarl.srilm.gz
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
**The ARPA file is missing <unk>.  Substituting log10 probability -100.000.
**

FeatureFunction: LM start: 0 end: 0
line=WordPenalty
FeatureFunction: WordPenalty0 start: 1 end: 1
line=UnknownWordPenalty
FeatureFunction: UnknownWordPenalty0 start: 2 end: 2
line=PhraseDictionaryMemory input-factor=0 output-factor=0
path=string-to-tree/rule-table num-features=1

Re: [Moses-support] Segmentation fault on hierarchical model with moses in server mode

2016-02-01 Thread Barry Haddow
end: 3
Loading LM
Loading WordPenalty0
Loading UnknownWordPenalty0
Loading PhraseDictionaryMemory0
Start loading text phrase table. Moses format : [0.747] seconds
Reading string-to-tree/rule-table

5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100


max-chart-span: 20
RUN SERVER at pid 0
[moses/server/Server.cpp:49] Listening on port 8080
[moses/server/TranslationRequest.cpp:315] Input: das ist ein haus
Translating:  das ist ein haus 
  0   1   2   3   4   5
  0   3   2   2   1   0
0   0   0   0   0
  0   0   0   0
0   0   0
  0   0
0
Translation took 0.000 seconds


=> XML-RPC result after calling (see moses server output just above) ...

% curl --data @rpc.xml '192.168.178.14:8080/RPC2'






<?xml version="1.0" encoding="UTF-8"?>
<methodResponse><params><param><value><struct>
<member><name>text</name><value><string></string></value></member>
</struct></value></param></params></methodResponse>





... with rpc.xml ...



<?xml version="1.0"?>
<methodCall>
  <methodName>translate</methodName>
  <params><param><value><struct>
    <member>
      <name>text</name>
      <value><string>das ist ein haus</string></value>
    </member>
  </struct></value></param></params>
</methodCall>






Cheers,
Martin


On 01.02.2016 at 14:16, Barry Haddow wrote:

Hi Martin

In your original mail, you passed different inputs to the server and 
command-line version. The following are extracted from your debug:


command-line:
Translating: <s> das ist ein haus </s>

server:
Translating: <s> dies ist ein haus . </s>

(Although you specify the command-line in the first case as "echo 
'dies ist ein haus' | moses -f string-to-tree/moses.ini", which 
contradicts the debug. I assume you spliced together different runs.)


Could you show an example where you pass the *same* string to the two 
different Moses versions and get different outputs?


cheers - Barry

On 01/02/16 12:47, Martin Baumgärtner wrote:
With the current fix the crash is gone, but there is still an
unexpected difference between server mode and command line moses: the
first now produces an empty string, whereas the latter produces
"this is a house". I tried another engine (english->chinese) and got
the same behaviour - empty string vs. correct translation.


Cheers,
Martin

On 29.01.2016 at 22:37, Matthias Huck wrote:

On Fri, 2016-01-29 at 21:26 +0000, Hieu Hoang wrote:

The decoder should handle no translation without falling over. But
yes, the model is too toy

Normally the decoder would always produce some translation. (The translation 
could be an empty sentence, of course.) If it's misconfigured, it should tell 
you about it. But maybe not with a segmentation fault. :-)



On 29 Jan 2016 9:15 pm, "Matthias Huck" <mh...@inf.ed.ac.uk> wrote:

Hi,

It seems to me that this toy string-to-tree setup is either outdated,
or it always had issues. It should be replaced.

Under real-world conditions, the decoder should always be able to
produce some hypothesis. We would therefore usually extract a whole set
of glue rules. And we would typically also add an [unknown-lhs] section
to the moses.ini that would tell the decoder which left-hand side
non-terminal labels to use for out-of-vocabulary words. To my knowledge,
these two techniques are crucial for being able to parse any input
sentence provided to the chart decoder in syntax-based translation.

So, in my opinion, the problem is most likely neither the server
implementation nor the syntax-based decoder, but a problematic setup.
I would consider it okay for the server to crash (or at least print a
warning) under such circumstances. You don't want it to silently not
translate complete sentences.

(I must admit that I didn't look into it in too much detail, but it
should be easy to confirm.)

Cheers,
Matthias


On Fri, 2016-01-29 at 20:28 +0000, Barry Haddow wrote:

Hi All

I think I see what happened now.

When you give the input "dies ist ein haus" to the sample model, the
"dies" is unknown, and there is no translation. The server did not check
for this condition, and got a seg fault. I have added a check, so if you
pull and try again it should not crash.

In the log pasted by Martin, he passed "das ist ein haus" to
command-line Moses, which works, and gives a translation.

I think ideally the sample models should handle unknown words, and give
a translation. Maybe adding a glue rule would be sufficient?

cheers - Barry

On 29/01/16 11:13, Barry Haddow wrote:

Hi

When I run command-line Moses, I get the output below - i.e. no best
translation. The server crashes for me since it does not check for the
null pointer, but the command-line version does.

I think there should be a translation for this example.

cheers - Barry

[gna]bhaddow: echo 'dies ist ein haus' | ~/moses.new/bin/moses -f
string-to-tree/moses.ini
Defined parameters (per moses.ini or switch):
   config: string-to-tree/mo

Re: [Moses-support] Segmentation fault on hierarchical model with moses in server mode

2016-01-29 Thread Barry Haddow
Hi

When I run command-line Moses, I get the output below - i.e. no best 
translation. The server crashes for me since it does not check for the 
null pointer, but the command-line version does.

I think there should be a translation for this example.

cheers - Barry

[gna]bhaddow: echo 'dies ist ein haus' | ~/moses.new/bin/moses  -f 
string-to-tree/moses.ini
Defined parameters (per moses.ini or switch):
 config: string-to-tree/moses.ini
 cube-pruning-pop-limit: 1000
 feature: KENLM name=LM factor=0 order=3 num-features=1 
path=lm/europarl.srilm.gz WordPenalty UnknownWordPenalty 
PhraseDictionaryMemory input-factor=0 output-factor=0 
path=string-to-tree/rule-table num-features=1 table-limit=20
 input-factors: 0
 inputtype: 3
 mapping: 0 T 0
 max-chart-span: 20 1000
 non-terminals: X S
 search-algorithm: 3
 translation-details: translation-details.log
 weight: WordPenalty0= 0 LM= 0.5 PhraseDictionaryMemory0= 0.5
line=KENLM name=LM factor=0 order=3 num-features=1 path=lm/europarl.srilm.gz
Loading the LM will be faster if you build a binary file.
Reading lm/europarl.srilm.gz
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
**The ARPA file is missing <unk>.  Substituting log10 probability -100.000.
**
FeatureFunction: LM start: 0 end: 0
line=WordPenalty
FeatureFunction: WordPenalty0 start: 1 end: 1
line=UnknownWordPenalty
FeatureFunction: UnknownWordPenalty0 start: 2 end: 2
line=PhraseDictionaryMemory input-factor=0 output-factor=0 
path=string-to-tree/rule-table num-features=1 table-limit=20
FeatureFunction: PhraseDictionaryMemory0 start: 3 end: 3
Loading LM
Loading WordPenalty0
Loading UnknownWordPenalty0
Loading PhraseDictionaryMemory0
Start loading text phrase table. Moses format : [3.038] seconds
Reading string-to-tree/rule-table
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

max-chart-span: 20
Created input-output object : [3.041] seconds
Line 0: Initialize search took 0.000 seconds total
Translating: <s> dies ist ein haus </s>  ||| [0,0]=X (1) [0,1]=X (1) 
[0,2]=X (1) [0,3]=X (1) [0,4]=X (1) [0,5]=X (1) [1,1]=X (1) [1,2]=X (1) 
[1,3]=X (1) [1,4]=X (1) [1,5]=X (1) [2,2]=X (1) [2,3]=X (1) [2,4]=X (1) 
[2,5]=X (1) [3,3]=X (1) [3,4]=X (1) [3,5]=X (1) [4,4]=X (1) [4,5]=X (1) 
[5,5]=X (1)

   0   1   2   3   4   5
   0   1   2   2   1   0
 0   0   0   2   0
   0   0   4   0
 0   0   0
   0   0
 0
Line 0: Additional reporting took 0.000 seconds total
Line 0: Translation took 0.002 seconds total
Translation took 0.000 seconds
Name:moses  VmPeak:74024 kB VmRSS:11084 kB  RSSMax:36832 kB 
user:2.972  sys:0.048   CPU:3.020   real:3.058


On 29/01/16 00:40, Hieu Hoang wrote:
> If it works ok on the command line but crashes when using the server,
> then that suggest a server issue.
>
> I don't know much about the server code, to be honest.
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Segmentation fault on hierarchical model with moses in server mode

2016-01-29 Thread Barry Haddow
Hi All

I think I see what happened now.

When you give the input "dies ist ein haus" to the sample model, the 
"dies" is unknown, and there is no translation. The server did not check 
for this condition, and got a seg fault. I have added a check, so if you 
pull and try again it should not crash.
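
Until the fix is pulled, a client can also guard against this failure mode on
its own side. Below is a minimal sketch of a defensive client, assuming the
standard mosesserver XML-RPC interface from contrib/server (a "translate"
method taking a struct with key "text"); the URL, port and error handling are
illustrative, not the project's sample code:

  import xmlrpc.client

  proxy = xmlrpc.client.ServerProxy("http://localhost:8080/RPC2")
  try:
      result = proxy.translate({"text": "dies ist ein haus"})
      # An empty or missing "text" field would mean no translation was found.
      print(result.get("text", ""))
  except (xmlrpc.client.Fault, ConnectionError) as err:
      # A faulting or crashed server surfaces here instead of killing the client.
      print("server failed on this input:", err)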

In the log pasted by Martin, he passed "das ist ein haus" to 
command-line Moses, which works, and gives a translation.

I think ideally the sample models should handle unknown words, and give 
a translation. Maybe adding a glue rule would be sufficient?

cheers - Barry

On 29/01/16 11:13, Barry Haddow wrote:
> Hi
>
> When I run command-line Moses, I get the output below - i.e. no best
> translation. The server crashes for me since it does not check for the
> null pointer, but the command-line version does.
>
> I think there should be a translation for this example.
>
> cheers - Barry
>
> [gna]bhaddow: echo 'dies ist ein haus' | ~/moses.new/bin/moses  -f
> string-to-tree/moses.ini
> Defined parameters (per moses.ini or switch):
>   config: string-to-tree/moses.ini
>   cube-pruning-pop-limit: 1000
>   feature: KENLM name=LM factor=0 order=3 num-features=1
> path=lm/europarl.srilm.gz WordPenalty UnknownWordPenalty
> PhraseDictionaryMemory input-factor=0 output-factor=0
> path=string-to-tree/rule-table num-features=1 table-limit=20
>   input-factors: 0
>   inputtype: 3
>   mapping: 0 T 0
>   max-chart-span: 20 1000
>   non-terminals: X S
>   search-algorithm: 3
>   translation-details: translation-details.log
>   weight: WordPenalty0= 0 LM= 0.5 PhraseDictionaryMemory0= 0.5
> line=KENLM name=LM factor=0 order=3 num-features=1 path=lm/europarl.srilm.gz
> Loading the LM will be faster if you build a binary file.
> Reading lm/europarl.srilm.gz
> 5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> **The ARPA file is missing <unk>.  Substituting log10 probability -100.000.
> **
> FeatureFunction: LM start: 0 end: 0
> line=WordPenalty
> FeatureFunction: WordPenalty0 start: 1 end: 1
> line=UnknownWordPenalty
> FeatureFunction: UnknownWordPenalty0 start: 2 end: 2
> line=PhraseDictionaryMemory input-factor=0 output-factor=0
> path=string-to-tree/rule-table num-features=1 table-limit=20
> FeatureFunction: PhraseDictionaryMemory0 start: 3 end: 3
> Loading LM
> Loading WordPenalty0
> Loading UnknownWordPenalty0
> Loading PhraseDictionaryMemory0
> Start loading text phrase table. Moses format : [3.038] seconds
> Reading string-to-tree/rule-table
> 5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> 
> max-chart-span: 20
> Created input-output object : [3.041] seconds
> Line 0: Initialize search took 0.000 seconds total
> Translating:  dies ist ein haus   ||| [0,0]=X (1) [0,1]=X (1)
> [0,2]=X (1) [0,3]=X (1) [0,4]=X (1) [0,5]=X (1) [1,1]=X (1) [1,2]=X (1)
> [1,3]=X (1) [1,4]=X (1) [1,5]=X (1) [2,2]=X (1) [2,3]=X (1) [2,4]=X (1)
> [2,5]=X (1) [3,3]=X (1) [3,4]=X (1) [3,5]=X (1) [4,4]=X (1) [4,5]=X (1)
> [5,5]=X (1)
>
> 0   1   2   3   4   5
> 0   1   2   2   1   0
>   0   0   0   2   0
> 0   0   4   0
>   0   0   0
> 0   0
>   0
> Line 0: Additional reporting took 0.000 seconds total
> Line 0: Translation took 0.002 seconds total
> Translation took 0.000 seconds
> Name:moses  VmPeak:74024 kB VmRSS:11084 kB  RSSMax:36832 kB
> user:2.972  sys:0.048   CPU:3.020   real:3.058
>
>
> On 29/01/16 00:40, Hieu Hoang wrote:
>> If it works ok on the command line but crashes when using the server,
>> then that suggests a server issue.
>>
>> I don't know much about the server code, to be honest.
>>
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] kbmira died with SIGABRT when tuning

2016-01-20 Thread Barry Haddow
Hi Dingyuan

What platform are you running on? I could not reproduce your error on 
Ubuntu 12.04, and valgrind is clean,

cheers - Barry

On 19/01/16 16:31, Barry Haddow wrote:
> Hi Dingyuan
>
> I ran for over 200 iterations and saw no problem. I tried with your LANG
> and LANGUAGE settings (I don't have the right packages for the other
> settings) and still saw no failure.
>
> Maybe it is a random pointer/memory problem like you suggested. I have
> started running your model with valgrind, but nothing so far,
>
> cheers - Barry
>
> On 19/01/16 14:26, Dingyuan Wang wrote:
>> Hi Barry,
>>
>> It usually hits an error in about 1~10 iterations on my laptop. I don't
>> know what triggers that, so it may be a probabilistic problem.
>>
>> Disabling xml-input won't help. I think I should use verbose output.
>>
>> My locale settings are:
>>
>> LANG=zh_CN.UTF-8
>> LANGUAGE=zh_CN.UTF-8:zh_TW.UTF-8:zh_HK.utf8:en_US.utf8
>> LC_CTYPE="zh_CN.UTF-8"
>> LC_NUMERIC="zh_CN.UTF-8"
>> LC_TIME="zh_CN.UTF-8"
>> LC_COLLATE="zh_CN.UTF-8"
>> LC_MONETARY="zh_CN.UTF-8"
>> LC_MESSAGES="zh_CN.UTF-8"
>> LC_PAPER="zh_CN.UTF-8"
>> LC_NAME="zh_CN.UTF-8"
>> LC_ADDRESS="zh_CN.UTF-8"
>> LC_TELEPHONE="zh_CN.UTF-8"
>> LC_MEASUREMENT="zh_CN.UTF-8"
>> LC_IDENTIFICATION="zh_CN.UTF-8"
>> LC_ALL=
>>
>> On 19/01/16 19:20, Barry Haddow wrote:
>>> Hi Dingyuan
>>>
>>> I have your script and model running, but so far it has not reported any
>>> errors. It's at iteration 27, and I'm using the latest Moses from git.
>>>
>>> How long should I expect it to run before it hits an error? Could it be
>>> affected by the locale setting?
>>>
>>> Have you tried running without xml-input to see if you still have the
>>> problem?
>>>
>>> cheers - Barry
>>>
>>> On 19/01/16 05:43, Dingyuan Wang wrote:
>>>> Hi Barry,
>>>>
>>>> I've uploaded the model:
>>>> https://mega.nz/#!UsVSBCBJ!e5IATFvLqrCb5zhmDekLn8NOGw4PSD9RRQLGQeKEvNY
>>>>
>>>> To test the model, I included a script 'repeatnbest.sh' which runs moses
>>>> repeatedly until encoding error occurs.
>>>>
>>>> The file run7.best100.out and run7.out in the archive is the last run
>>>> that produces the error.
>>>>
>>>> It seems that it is WordTranslationFeature that causes the problem.
>>>>
>>>> On 19/01/16 00:03, Barry Haddow wrote:
>>>>> Hi Dingyuan
>>>>>
>>>>> Something is going wrong with the construction or outputting of feature
>>>>> names, and it looks like it's WordTranslationFeature that's the problem.
>>>>> Does the problem go away if you do not use word translation features?
>>>>>
>>>>> If you could make available a model that reproduces the nbest list
>>>>> construction then I would have a chance to debug it,
>>>>>
>>>>> cheers - Barry
>>>>>
>>>>> On 18/01/16 15:32, Dingyuan Wang wrote:
>>>>>> Hi Barry,
>>>>>>
>>>>>> I've checked all the models and corpora with the script, without
>>>>>> finding
>>>>>> any encoding problem.
>>>>>>
>>>>>> I also find that all such errors in the nbest list occur only in the
>>>>>> feature list (3 different samples), without affecting the translation
>>>>>> result. Therefore, the phrase table or training corpus may not be the
>>>>>> problem.
>>>>>>
>>>>>> On 18/01/16 23:04, Barry Haddow wrote:
>>>>>>> Hi Dingyuan
>>>>>>>
>>>>>>> Are these encoding errors present in your phrase table? Are they
>>>>>>> present
>>>>>>> in your training corpus? Since they appear in the word translation
>>>>>>> features, and you are using a shortlist, are they in the shortlist
>>>>>>> files
>>>>>>> in the model directory? (These have names with "topn" in them afaik).
>>>>>>>
>>>>>>> File-system errors are unlikely, and for the most part Moses treats
>>>>>>> text
>>>>>>> as byte strings so encoding errors usually trace back to the source
>>>>>>> text

[Moses-support] Job: Researcher in Statistical MT at the University of Edinburgh

2016-01-20 Thread Barry Haddow
Hi

We are looking for a new researcher to join the statmt group in Edinburgh

Link to the advert:
https://www.vacancies.ed.ac.uk/pls/corehrrecruit/erq_jobspec_version_4.jobspec?p_id=035233

About the group:
http://www.statmt.org/ued/

cheers - Barry

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] kbmira died with SIGABRT when tuning

2016-01-19 Thread Barry Haddow
Hi Dingyuan

I have your script and model running, but so far it has not reported any 
errors. It's at iteration 27, and I'm using the latest Moses from git.

How long should I expect it to run before it hits an error? Could it be 
affected by the locale setting?

Have you tried running without xml-input to see if you still have the 
problem?

cheers - Barry

On 19/01/16 05:43, Dingyuan Wang wrote:
> Hi Barry,
>
> I've uploaded the model:
> https://mega.nz/#!UsVSBCBJ!e5IATFvLqrCb5zhmDekLn8NOGw4PSD9RRQLGQeKEvNY
>
> To test the model, I included a script 'repeatnbest.sh' which runs moses
> repeatedly until encoding error occurs.
>
> The file run7.best100.out and run7.out in the archive is the last run
> that produces the error.
>
> It seems that it is WordTranslationFeature that causes the problem.
>
> On 19/01/16 00:03, Barry Haddow wrote:
>> Hi Dingyuan
>>
>> Something is going wrong with the construction or outputting of feature
>> names, and it looks like it's WordTranslationFeature that's the problem.
>> Does the problem go away if you do not use word translation features?
>>
>> If you could make available a model that reproduces the nbest list
>> construction then I would have a chance to debug it,
>>
>> cheers - Barry
>>
>> On 18/01/16 15:32, Dingyuan Wang wrote:
>>> Hi Barry,
>>>
>>> I've checked all the models and corpora with the script, without finding
>>> any encoding problem.
>>>
>>> I also find that all such errors in the nbest list occur only in the
>>> feature list (3 different samples), without affecting the translation
>>> result. Therefore, the phrase table or training corpus may not be the
>>> problem.
>>>
>>> On 18/01/16 23:04, Barry Haddow wrote:
>>>> Hi Dingyuan
>>>>
>>>> Are these encoding errors present in your phrase table? Are they present
>>>> in your training corpus? Since they appear in the word translation
>>>> features, and you are using a shortlist, are they in the shortlist files
>>>> in the model directory? (These have names with "topn" in them afaik).
>>>>
>>>> File-system errors are unlikely, and for the most part Moses treats text
>>>> as byte strings so encoding errors usually trace back to the source
>>>> text.
>>>>
>>>> cheers - Barry
>>>>
>>>> On 18/01/16 14:56, Dingyuan Wang wrote:
>>>>> Hi Barry,
>>>>>
>>>>> "The ones starting with the "@"" are due to corrupted bytes in the
>>>>> nbest
>>>>> list.
>>>>>
>>>>> This kind of corruption occurs from time to time. I wonder if it comes
>>>>> from memory errors or filesystem failure or some kind of
>>>>> pointer/encoding problem in moses.
>>>>>
>>>>> I've written a script to find such corrupted lines:
>>>>>
>>>>> https://gist.github.com/gumblex/0d9d0848b435e4f9818f
>>>>>
>>>>> On 18/01/16 20:42, Barry Haddow wrote:
>>>>>> Hi Dingyuan
>>>>>>
>>>>>> The extractor expects feature names to contain an underscore (not sure
>>>>>> exactly why) but some of yours don't, and Moses skips them,
>>>>>> interpreting
>>>>>> their values as extra dense features.
>>>>>>
>>>>>> The attached screenshot shows my view of the offending names. The ones
>>>>>> starting with the "@" are the problem. So it does look like the nbest
>>>>>> list is corrupted. Can you run the decoder on just that sentence, to
>>>>>> create an uncompressed version of the nbest list?
>>>>>>
>>>>>> cheers - Barry
>>>>>>
>>>>>> On 18/01/16 12:02, Dingyuan Wang wrote:
>>>>>>> Hi Barry,
>>>>>>>
>>>>>>> Attached is the zgrep result.
>>>>>>> I found that in the middle of line 61 a few bytes are corrupted. Is
>>>>>>> that
>>>>>>> a moses problem or my memory has a problem?
>>>>>>>
>>>>>>> I also checked other files using iconv, they are all OK in UTF-8.
>>>>>>>
>>>>>>> On 18/01/16 19:32, Barry Haddow wrote:
>>>>>>>> Hi Dingyuan
>>>>>>>>
>>>>>>>> Yes, that's very possible. The error could be in extracting

Re: [Moses-support] kbmira died with SIGABRT when tuning

2016-01-18 Thread Barry Haddow
Hi Dingyuan

Are these encoding errors present in your phrase table? Are they present 
in your training corpus? Since they appear in the word translation 
features, and you are using a shortlist, are they in the shortlist files 
in the model directory? (These have names with "topn" in them afaik).

File-system errors are unlikely, and for the most part Moses treats text 
as byte strings so encoding errors usually trace back to the source text.
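
A quick way to check any of these files is to try decoding them line by line.
A minimal sketch of such a check (generic Python 3, independent of the gist
linked in the quoted message below):

  import sys

  # Report lines that fail to decode as UTF-8, with the failing byte offset.
  with open(sys.argv[1], "rb") as f:
      for lineno, raw in enumerate(f, 1):
          try:
              raw.decode("utf-8")
          except UnicodeDecodeError as err:
              print("line %d: invalid UTF-8 at byte %d" % (lineno, err.start))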

cheers - Barry

On 18/01/16 14:56, Dingyuan Wang wrote:
> Hi Barry,
>
> "The ones starting with the "@"" are due to corrupted bytes in the nbest
> list.
>
> This kind of corruption occurs from time to time. I wonder if it comes
> from memory errors or filesystem failure or some kind of
> pointer/encoding problem in moses.
>
> I've written a script to find such corrupted lines:
>
> https://gist.github.com/gumblex/0d9d0848b435e4f9818f
>
> On 18/01/16 20:42, Barry Haddow wrote:
>> Hi Dingyuan
>>
>> The extractor expects feature names to contain an underscore (not sure
>> exactly why) but some of yours don't, and Moses skips them, interpreting
>> their values as extra dense features.
>>
>> The attached screenshot shows my view of the offending names. The ones
>> starting with the "@" are the problem. So it does look like the nbest
>> list is corrupted. Can you run the decoder on just that sentence, to
>> create an uncompressed version of the nbest list?
>>
>> cheers - Barry
>>
>> On 18/01/16 12:02, Dingyuan Wang wrote:
>>> Hi Barry,
>>>
>>> Attached is the zgrep result.
>>> I found that in the middle of line 61 a few bytes are corrupted. Is that
>>> a moses problem or my memory has a problem?
>>>
>>> I also checked other files using iconv, they are all OK in UTF-8.
>>>
>>> On 18/01/16 19:32, Barry Haddow wrote:
>>>> Hi Dingyuan
>>>>
>>>> Yes, that's very possible. The error could be in extracting features.dat
>>>> from the nbest list. Are you able to post the nbest list? Or at least
>>>> the entries for sentence 16?
>>>>
>>>> Run something like
>>>>
>>>> zgrep "^16 " tuning/tmp.1/run7.best100.out.gz
>>>>
>>>> cheers - Barry
>>>>
>>>> On 18/01/16 11:24, Dingyuan Wang wrote:
>>>>> Hi Barry,
>>>>>
>>>>> I have rerun the ems after the first email, and then posted the recent
>>>>> results, so the line changed.
>>>>>
>>>>> I just use the latest code, and the EMS script. Pretty much default
>>>>> settings. The EMS setting is:
>>>>>
>>>>> sparse-features = "target-word-insertion top 50, source-word-deletion
>>>>> top 50, word-translation top 50 50, phrase-length"
>>>>>
>>>>> I suspect there is something unexpected in the extractor.
>>>>>
>>>>>
>>>>> On 18/01/16 19:03, Barry Haddow wrote:
>>>>>> Hi Dingyuan
>>>>>>
>>>>>> In fact it is not the sparse features nor the Asian characters that
>>>>>> are
>>>>>> the problem. The offending line has 17 dense features, yet your model
>>>>>> has 14 dense features.
>>>>>>
>>>>>> The string "1 1 1" appears directly after the language model
>>>>>> feature in
>>>>>> line 1694, in your attachment, adding the extra 3 features. Note that
>>>>>> this is not the line you mentioned in your earlier email.
>>>>>>
>>>>>> I have no idea why there are extra features. Have you made changes to
>>>>>> any of the core Moses features?
>>>>>>
>>>>>> best wishes
>>>>>> Barry
>>>>>>
>>>>>> The offending line:
>>>>>> what():  Error in line "-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1 1 -39
>>>>>> 18 -26.2331 -40.6736 -44.3698 -82.5072 WT_,~,=3 WT_:~:=1 WT_“~“=1
>>>>>> WT_”~”=1 WT_曰~说=1 PL_s3=5 PL_3,2=2 PL_3,3=3 PL_2,3=4 PL_t3=7 PL_s1=5
>>>>>> PL_1,2=2 PL_1,1=3 PL_t1=4 PL_2,2=3 PL_t2=7 PL_s2=8 PL_2,1=1 WT_有~有=1
>>>>>> WT_!~!=1 WT_其~的=1 WT_其~他=1 WT_不~也=1 WT_不~没=1 WT_而~而=1
>>>>>> WT_而~
>>>>>> 却=1 WT_祖逖~逖=1 WT_祖逖~祖=1 WT_逖~祖=1 WT_逖~逖=1 WT_大~大江=1
>>>>>> WT_者~
>>>>>> 的=1 WT_者~人=1 WT_江~大江=1 WT_渡~渡过=1 WT_复~又=1 WT_余~有=1 WT_

Re: [Moses-support] kbmira died with SIGABRT when tuning

2016-01-18 Thread Barry Haddow
Hi Dingyuan

Is it possible to attach the features.dat file that is causing the 
error? Almost certainly Moses is failing to parse the line because of 
the Asian characters in the feature names,

cheers - Barry

On 16/01/16 15:58, Dingyuan Wang wrote:
> I ran
>
> ~/software/moses/bin/kbmira -J 75  --dense-init run7.dense --sparse-init
> run7.sparse-weights  --ffile run1.features.dat --ffile run2.features.dat
> --ffile run3.features.dat --ffile run4.features.dat --ffile
> run5.features.dat --ffile run6.features.dat --ffile run7.features.dat
> --scfile run1.scores.dat --scfile run2.scores.dat --scfile
> run3.scores.dat --scfile run4.scores.dat --scfile run5.scores.dat
> --scfile run6.scores.dat --scfile run7.scores.dat -o /tmp/mert.out
>
> in the tuning/tmp.1 directory, which will certainly replicate the error.
>
> On 16/01/16 23:42, Hieu Hoang wrote:
>> The mert script prints out every command it runs. You should be able to
>> replicate the error by running the last command
>>
>> On 16 Jan 2016 14:18, "Dingyuan Wang" > > wrote:
>>
>>  Sorry, but I can't reliably replicate the same problem when running
>>  TUNING_tune.1 alone. There is no character '_' in the test set or top50
>>  list.
>>
>>  I'm using sparse-features = "target-word-insertion top 50,
>>  source-word-deletion top 50, word-translation top 50 50, phrase-length"
>>
>>  I've attached some related files from EMS and the EMS config.
>>
>>  https://mega.nz/#!xs0SFKxL!M_RTBp1JGX24-b4xlYYLP-bLXKiC_Sl-p96x55avAB4
>>
>>  On 16/01/16 02:45, Hieu Hoang wrote:
>>  > could you make your model files available for download so I can
>>  > replicate this problem.
>>  >
>>  > it seems like you're using a feature function with sparse scores. I
>>  > think the character '_' must be escaped.
>>  >
>>  >
>>  > On 12/01/16 04:00, Dingyuan Wang wrote:
>>  >> Hi all,
>>  >>
>>  >> I'm using EMS for doing experiments. Every time the kbmira died with
>>  >> SIGABRT when tuning on one direction, while tuning on the opposite
>>  >> direction (same config and test set) was successful.
>>  >>
>>  >> The mert.log (stderr) shows follows:
>>  >>
>>  >>
>>  >> kbmira with c=0.01 decay=0.999 no_shuffle=0
>>  >> Initialising random seed from system clock
>>  >> Found 15323 initial sparse features
>>  >> terminate called after throwing an instance of
>>  >> 'MosesTuning::FileFormatException'
>>  >>what():  Error in line "-4.51933 0 0 -6.09733 0 0 0 -121.556 2
>>  -20 12
>>  >> -31.6201 -38.5211 -26.5112 -60.6166 WT_,~,=2 WT_?~?=1 PL_s1=4
>>  >> PL_s3=1 PL_3,3=1 PL_2,2=3 PL_1,2=1 PL_2,1=3 PL_t1=6 PL_t2=4 PL_t3=2
>>  >> PL_2,3=1 PL_s2=7 PL_1,1=3 WT_未~没有=1 WT_何~怎么=1 WT_何~能=1
>>  WT_方~正
>>  >> 在=1 WT_又~还=1 WT_君~您=2 WT_趣~向=1 WT_趣~奔=1 WT_有~没有=1 WT_
>>  往~去=1
>>  >> WT_官~官员=1 WT_假~借=1 WT_檄~檄文=1 WT_文~文告=1 WT_上~上级=1 WT_为~
>>  >> 呢=1 WT_在~正在=1 " of run7.features.dat
>>  >> Aborted
>>  >>
>>  >>
>>  >> I think since run7.scores.dat is generated by some scripts, I wouldn't
>>  >> be responsible for making the bad format. Last time it also died, I
>>  >> removed the likely offending line in the test set, but this time
>>  >> another line appears.
>>  >>
>>  >> --
>>  >> Dingyuan Wang
>>  >> ___
>>  >> Moses-support mailing list
>>  >> Moses-support@mit.edu 
>>  >> http://mailman.mit.edu/mailman/listinfo/moses-support
>>  >
>>
>>  --
>>  Dingyuan Wang (gumblex)
>>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] kbmira died with SIGABRT when tuning

2016-01-18 Thread Barry Haddow
Hi Dingyuan

Yes, that's very possible. The error could be in extracting features.dat 
from the nbest list. Are you able to post the nbest list? Or at least 
the entries for sentence 16?

Run something like

zgrep "^16 " tuning/tmp.1/run7.best100.out.gz

cheers - Barry

On 18/01/16 11:24, Dingyuan Wang wrote:
> Hi Barry,
>
> I have rerun the ems after the first email, and then posted the recent
> results, so the line changed.
>
> I just use the latest code, and the EMS script. Pretty much default
> settings. The EMS setting is:
>
> sparse-features = "target-word-insertion top 50, source-word-deletion
> top 50, word-translation top 50 50, phrase-length"
>
> I suspect there is something unexpected in the extractor.
>
>
> On 18/01/16 19:03, Barry Haddow wrote:
>> Hi Dingyuan
>>
>> In fact it is not the sparse features nor the Asian characters that are
>> the problem. The offending line has 17 dense features, yet your model
>> has 14 dense features.
>>
>> The string "1 1 1" appears directly after the language model feature in
>> line 1694, in your attachment, adding the extra 3 features. Note that
>> this is not the line you mentioned in your earlier email.
>>
>> I have no idea why there are extra features. Have you made changes to
>> any of the core Moses features?
>>
>> best wishes
>> Barry
>>
>> The offending line:
>> what():  Error in line "-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1 1 -39
>> 18 -26.2331 -40.6736 -44.3698 -82.5072 WT_,~,=3 WT_:~:=1 WT_“~“=1
>> WT_”~”=1 WT_曰~说=1 PL_s3=5 PL_3,2=2 PL_3,3=3 PL_2,3=4 PL_t3=7 PL_s1=5
>> PL_1,2=2 PL_1,1=3 PL_t1=4 PL_2,2=3 PL_t2=7 PL_s2=8 PL_2,1=1 WT_有~有=1
>> WT_!~!=1 WT_其~的=1 WT_其~他=1 WT_不~也=1 WT_不~没=1 WT_而~而=1 WT_而~
>> 却=1 WT_祖逖~逖=1 WT_祖逖~祖=1 WT_逖~祖=1 WT_逖~逖=1 WT_大~大江=1 WT_者~
>> 的=1 WT_者~人=1 WT_江~大江=1 WT_渡~渡过=1 WT_复~又=1 WT_余~有=1 WT_誓~发
>> 誓=1 WT_楫~木=1 WT_江~长江=1 WT_击~击=1 WT_将~带领=1 WT_济~成功=1 WT_中
>> 原~中原=1 WT_清~廓清=1 WT_如~像=1 WT_楫~戢=1 WT_能~能=1 WT_中~中流=1 WT_
>> 流~中流=1 WT_部曲~部下=1 " of ...
>>
>>
>> On 18/01/16 10:37, Dingyuan Wang wrote:
>>> Hi,
>>>
>>> I've attached that. The line number is 1694.
>>>
>>> On 18/01/16 16:43, Barry Haddow wrote:
>>>> Hi Dingyuan
>>>>
>>>> Is it possible to attach the features.dat file that is causing the
>>>> error? Almost certainly Moses is failing to parse the line because of
>>>> the Asian characters in the feature names,
>>>>
>>>> cheers - Barry
>>>>
>>>> On 16/01/16 15:58, Dingyuan Wang wrote:
>>>>> I ran
>>>>>
>>>>> ~/software/moses/bin/kbmira -J 75  --dense-init run7.dense
>>>>> --sparse-init
>>>>> run7.sparse-weights  --ffile run1.features.dat --ffile
>>>>> run2.features.dat
>>>>> --ffile run3.features.dat --ffile run4.features.dat --ffile
>>>>> run5.features.dat --ffile run6.features.dat --ffile run7.features.dat
>>>>> --scfile run1.scores.dat --scfile run2.scores.dat --scfile
>>>>> run3.scores.dat --scfile run4.scores.dat --scfile run5.scores.dat
>>>>> --scfile run6.scores.dat --scfile run7.scores.dat -o /tmp/mert.out
>>>>>
>>>>> in the tuning/tmp.1 directory, which will certainly replicate the
>>>>> error.
>>>>>
>>>>>> On 16/01/16 23:42, Hieu Hoang wrote:
>>>>>> The mert script prints out every command it runs. You should be able to
>>>>>> replicate the error by running the last command
>>>>>>
>>>>>> On 16 Jan 2016 14:18, "Dingyuan Wang" <abcdoyle...@gmail.com> wrote:
>>>>>>
>>>>>>Sorry, but I can't reliably replicate the same problem when
>>>>>> running
>>>>>>TUNING_tune.1 alone. There is no character '_' in the test set
>>>>>> or top50
>>>>>>list.
>>>>>>
>>>>>>I'm using sparse-features = "target-word-insertion top 50,
>>>>>>source-word-deletion top 50, word-translation top 50 50,
>>>>>> phrase-length"
>>>>>>
>>>>>>I've attached some related files from EMS and the EMS config.
>>>>>>
>>>>>>  
>>>>>> https://mega.nz/#!xs0SFKxL!M_RTBp1JGX24-b4xlYYLP-bLXKiC_Sl-p96x55avAB4
>>

Re: [Moses-support] kbmira died with SIGABRT when tuning

2016-01-18 Thread Barry Haddow

Hi Dingyuan

The extractor expects feature names to contain an underscore (not sure 
exactly why) but some of yours don't, and Moses skips them, interpreting 
their values as extra dense features.


The attached screenshot shows my view of the offending names. The ones 
starting with the "@" are the problem. So it does look like the nbest 
list is corrupted. Can you run the decoder on just that sentence, to 
create an uncompressed version of the nbest list?
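
A check along these lines can also be scripted. The sketch below flags
sparse-feature names without an underscore in a kbmira features.dat file; it
assumes the file's convention that dense scores are bare numbers and sparse
features are NAME=VALUE tokens (the input path is whatever you pass in):

  import sys

  for lineno, line in enumerate(open(sys.argv[1], encoding="utf-8",
                                     errors="replace"), 1):
      for tok in line.split():
          # Sparse features carry "="; a name without "_" is suspicious.
          if "=" in tok and "_" not in tok.split("=", 1)[0]:
              print("line %d: suspicious feature name %r" % (lineno, tok))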


cheers - Barry

On 18/01/16 12:02, Dingyuan Wang wrote:

Hi Barry,

Attached is the zgrep result.
I found that in the middle of line 61 a few bytes are corrupted. Is that
a moses problem or my memory has a problem?

I also checked other files using iconv, they are all OK in UTF-8.

On 18/01/16 19:32, Barry Haddow wrote:

Hi Dingyuan

Yes, that's very possible. The error could be in extracting features.dat
from the nbest list. Are you able to post the nbest list? Or at least
the entries for sentence 16?

Run something like

zgrep "^16 " tuning/tmp.1/run7.best100.out.gz

cheers - Barry

On 18/01/16 11:24, Dingyuan Wang wrote:

Hi Barry,

I have rerun the ems after the first email, and then posted the recent
results, so the line changed.

I just use the latest code, and the EMS script. Pretty much default
settings. The EMS setting is:

sparse-features = "target-word-insertion top 50, source-word-deletion
top 50, word-translation top 50 50, phrase-length"

I suspect there is something unexpected in the extractor.


On 18/01/16 19:03, Barry Haddow wrote:

Hi Dingyuan

In fact it is not the sparse features nor the Asian characters that are
the problem. The offending line has 17 dense features, yet your model
has 14 dense features.

The string "1 1 1" appears directly after the language model feature in
line 1694, in your attachment, adding the extra 3 features. Note that
this is not the line you mentioned in your earlier email.

I have no idea why there are extra features. Have you made changes to
any of the core Moses features?

best wishes
Barry

The offending line:
what():  Error in line "-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1 1 -39
18 -26.2331 -40.6736 -44.3698 -82.5072 WT_,~,=3 WT_:~:=1 WT_“~“=1
WT_”~”=1 WT_曰~说=1 PL_s3=5 PL_3,2=2 PL_3,3=3 PL_2,3=4 PL_t3=7 PL_s1=5
PL_1,2=2 PL_1,1=3 PL_t1=4 PL_2,2=3 PL_t2=7 PL_s2=8 PL_2,1=1 WT_有~有=1
WT_!~!=1 WT_其~的=1 WT_其~他=1 WT_不~也=1 WT_不~没=1 WT_而~而=1 WT_而~
却=1 WT_祖逖~逖=1 WT_祖逖~祖=1 WT_逖~祖=1 WT_逖~逖=1 WT_大~大江=1 WT_者~
的=1 WT_者~人=1 WT_江~大江=1 WT_渡~渡过=1 WT_复~又=1 WT_余~有=1 WT_誓~发
誓=1 WT_楫~木=1 WT_江~长江=1 WT_击~击=1 WT_将~带领=1 WT_济~成功=1 WT_中
原~中原=1 WT_清~廓清=1 WT_如~像=1 WT_楫~戢=1 WT_能~能=1 WT_中~中流=1 WT_
流~中流=1 WT_部曲~部下=1 " of ...


On 18/01/16 10:37, Dingyuan Wang wrote:

Hi,

I've attached that. The line number is 1694.

On 18/01/16 16:43, Barry Haddow wrote:

Hi Dingyuan

Is it possible to attach the features.dat file that is causing the
error? Almost certainly Moses is failing to parse the line because of
the Asian characters in the feature names,

cheers - Barry

On 16/01/16 15:58, Dingyuan Wang wrote:

I ran

~/software/moses/bin/kbmira -J 75 --dense-init run7.dense --sparse-init
run7.sparse-weights --ffile run1.features.dat --ffile run2.features.dat
--ffile run3.features.dat --ffile run4.features.dat --ffile
run5.features.dat --ffile run6.features.dat --ffile run7.features.dat
--scfile run1.scores.dat --scfile run2.scores.dat --scfile
run3.scores.dat --scfile run4.scores.dat --scfile run5.scores.dat
--scfile run6.scores.dat --scfile run7.scores.dat -o /tmp/mert.out

in the tuning/tmp.1 directory, which will certainly replicate the
error.

On 16/01/16 23:42, Hieu Hoang wrote:

The mert script prints out every command it runs. You should be able to
replicate the error by running the last command

On 16 Jan 2016 14:18, "Dingyuan Wang" <abcdoyle...@gmail.com> wrote:

Sorry, but I can't reliably replicate the same problem when running
TUNING_tune.1 alone. There is no character '_' in the test set or top50
list.

I'm using sparse-features = "target-word-insertion top 50,
source-word-deletion top 50, word-translation top 50 50, phrase-length"

I've attached some related files from EMS and the EMS config.

https://mega.nz/#!xs0SFKxL!M_RTBp1JGX24-b4xlYYLP-bLXKiC_Sl-p96x55avAB4

On 16/01/16 02:45, Hieu Hoang wrote:
> could you make your model files available for download so I can
> replicate this problem.
>
> it seems like you're using a feature function with sparse scores. I
> think the character '_' must be escaped.
>
> On 12/01/16 04:00, Dingyuan Wang wrote:
>> Hi all,
>>
>> I'm using EMS for doing experiments. Every time the kbmira died with
>> SIGABRT when tuning on the

Re: [Moses-support] kbmira died with SIGABRT when tuning

2016-01-18 Thread Barry Haddow
Hi Dingyuan

In fact it is not the sparse features nor the Asian characters that are 
the problem. The offending line has 17 dense features, yet your model 
has 14 dense features.

The string "1 1 1" appears directly after the language model feature in 
line 1694, in your attachment, adding the extra 3 features. Note that 
this is not the line you mentioned in your earlier email.

I have no idea why there are extra features. Have you made changes to 
any of the core Moses features?
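
Mismatches like this are easy to scan for mechanically. A minimal sketch that
counts the leading dense features on every line of a features.dat file,
assuming dense values are bare numbers preceding any NAME=VALUE sparse
features (set EXPECTED to your model's count, 14 in this thread):

  import sys

  EXPECTED = 14
  for lineno, line in enumerate(open(sys.argv[1], encoding="utf-8",
                                     errors="replace"), 1):
      dense = 0
      for tok in line.split():
          if "=" in tok:    # first sparse feature ends the dense block
              break
          dense += 1
      if dense != EXPECTED:
          print("line %d: %d dense features (expected %d)"
                % (lineno, dense, EXPECTED))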

best wishes
Barry

The offending line:
what():  Error in line "-5.44027 0 0 -5.34901 0 0 0 -224.872 1 1 1 -39 
18 -26.2331 -40.6736 -44.3698 -82.5072 WT_,~,=3 WT_:~:=1 WT_“~“=1 
WT_”~”=1 WT_曰~说=1 PL_s3=5 PL_3,2=2 PL_3,3=3 PL_2,3=4 PL_t3=7 PL_s1=5 
PL_1,2=2 PL_1,1=3 PL_t1=4 PL_2,2=3 PL_t2=7 PL_s2=8 PL_2,1=1 WT_有~有=1 
WT_!~!=1 WT_其~的=1 WT_其~他=1 WT_不~也=1 WT_不~没=1 WT_而~而=1 WT_而~ 
却=1 WT_祖逖~逖=1 WT_祖逖~祖=1 WT_逖~祖=1 WT_逖~逖=1 WT_大~大江=1 WT_者~ 
的=1 WT_者~人=1 WT_江~大江=1 WT_渡~渡过=1 WT_复~又=1 WT_余~有=1 WT_誓~发 
誓=1 WT_楫~木=1 WT_江~长江=1 WT_击~击=1 WT_将~带领=1 WT_济~成功=1 WT_中 
原~中原=1 WT_清~廓清=1 WT_如~像=1 WT_楫~戢=1 WT_能~能=1 WT_中~中流=1 WT_ 
流~中流=1 WT_部曲~部下=1 " of ...


On 18/01/16 10:37, Dingyuan Wang wrote:
> Hi,
>
> I've attached that. The line number is 1694.
>
> On 18/01/16 16:43, Barry Haddow wrote:
>> Hi Dingyuan
>>
>> Is it possible to attach the features.dat file that is causing the
>> error? Almost certainly Moses is failing to parse the line because of
>> the Asian characters in the feature names,
>>
>> cheers - Barry
>>
>> On 16/01/16 15:58, Dingyuan Wang wrote:
>>> I ran
>>>
>>> ~/software/moses/bin/kbmira -J 75  --dense-init run7.dense --sparse-init
>>> run7.sparse-weights  --ffile run1.features.dat --ffile run2.features.dat
>>> --ffile run3.features.dat --ffile run4.features.dat --ffile
>>> run5.features.dat --ffile run6.features.dat --ffile run7.features.dat
>>> --scfile run1.scores.dat --scfile run2.scores.dat --scfile
>>> run3.scores.dat --scfile run4.scores.dat --scfile run5.scores.dat
>>> --scfile run6.scores.dat --scfile run7.scores.dat -o /tmp/mert.out
>>>
>>> in the tuning/tmp.1 directory, which will certainly replicate the error.
>>>
>>> On 16/01/16 23:42, Hieu Hoang wrote:
>>>> The mert script prints out every command it runs. You should be able to
>>>> replicate the error by running the last command
>>>>
>>>> On 16 Jan 2016 14:18, "Dingyuan Wang" <abcdoyle...@gmail.com> wrote:
>>>>
>>>>   Sorry, but I can't reliably replicate the same problem when running
>>>>   TUNING_tune.1 alone. There is no character '_' in the test set
>>>> or top50
>>>>   list.
>>>>
>>>>   I'm using sparse-features = "target-word-insertion top 50,
>>>>   source-word-deletion top 50, word-translation top 50 50,
>>>> phrase-length"
>>>>
>>>>   I've attached some related files from EMS and the EMS config.
>>>>
>>>>  
>>>> https://mega.nz/#!xs0SFKxL!M_RTBp1JGX24-b4xlYYLP-bLXKiC_Sl-p96x55avAB4
>>>>
>>>>   On 16/01/16 02:45, Hieu Hoang wrote:
>>>>   > could you make your model files available for download so I can
>>>>   > replicate this problem.
>>>>   >
>>>>   > it seems like you're using a feature function with sparse
>>>> scores. I
>>>>   > think the character '_' must be escaped.
>>>>   >
>>>>   >
>>>>   > On 12/01/16 04:00, Dingyuan Wang wrote:
>>>>   >> Hi all,
>>>>   >>
>>>>   >> I'm using EMS for doing experiments. Every time the kbmira
>>>> died with
>>>>   >> SIGABRT when turning on one direction, while tuning on the
>>>> opposite
>>>>   >> direction (same config and test set) was successful.
>>>>   >>
>>>>   >> The mert.log (stderr) shows follows:
>>>>   >>
>>>>   >>
>>>>   >> kbmira with c=0.01 decay=0.999 no_shuffle=0
>>>>   >> Initialising random seed from system clock
>>>>   >> Found 15323 initial sparse features
>>>>   >> terminate called after throwing an instance of
>>>>   >> 'MosesTuning::FileFormatException'
>>>>   >>what():  Error in line "-4.51933 0 0 -6.09733 0 0 0

Re: [Moses-support] kbmira died with SIGABRT when tuning

2016-01-18 Thread Barry Haddow
Hi Dingyuan

Something is going wrong with the construction or outputting of feature 
names, and it looks like it's WordTranslationFeature that's the problem. 
Does the problem go away if you do not use word translation features?

If you could make available a model that reproduces the nbest list 
construction then I would have a chance to debug it,

cheers - Barry

On 18/01/16 15:32, Dingyuan Wang wrote:
> Hi Barry,
>
> I've checked all the models and corpora with the script, without finding
> any encoding problem.
>
> I also find that all such errors in the nbest list occur only in the
> feature list (3 different samples), without affecting the translation
> result. Therefore, the phrase table or training corpus may not be the
> problem.
>
> On 18/01/16 23:04, Barry Haddow wrote:
>> Hi Dingyuan
>>
>> Are these encoding errors present in your phrase table? Are they present
>> in your training corpus? Since they appear in the word translation
>> features, and you are using a shortlist, are they in the shortlist files
>> in the model directory? (These have names with "topn" in them afaik).
>>
>> File-system errors are unlikely, and for the most part Moses treats text
>> as byte strings so encoding errors usually trace back to the source text.
>>
>> cheers - Barry
>>
>> On 18/01/16 14:56, Dingyuan Wang wrote:
>>> Hi Barry,
>>>
>>> "The ones starting with the "@"" are due to corrupted bytes in the nbest
>>> list.
>>>
>>> This kind of corruption occurs from time to time. I wonder if it comes
>>> from memory errors or filesystem failure or some kind of
>>> pointer/encoding problem in moses.
>>>
>>> I've written a script to find such corrupted lines:
>>>
>>> https://gist.github.com/gumblex/0d9d0848b435e4f9818f
>>>
>>> On 18/01/16 20:42, Barry Haddow wrote:
>>>> Hi Dingyuan
>>>>
>>>> The extractor expects feature names to contain an underscore (not sure
>>>> exactly why) but some of yours don't, and Moses skips them, interpreting
>>>> their values as extra dense features.
>>>>
>>>> The attached screenshot shows my view of the offending names. The ones
>>>> starting with the "@" are the problem. So it does look like the nbest
>>>> list is corrupted. Can you run the decoder on just that sentence, to
>>>> create an uncompressed version of the nbest list?
>>>>
>>>> cheers - Barry
>>>>
>>>> On 18/01/16 12:02, Dingyuan Wang wrote:
>>>>> Hi Barry,
>>>>>
>>>>> Attached is the zgrep result.
>>>>> I found that in the middle of line 61 a few bytes are corrupted. Is
>>>>> that
>>>>> a moses problem or my memory has a problem?
>>>>>
>>>>> I also checked other files using iconv, they are all OK in UTF-8.
>>>>>
>>>>> On 18/01/16 19:32, Barry Haddow wrote:
>>>>>> Hi Dingyuan
>>>>>>
>>>>>> Yes, that's very possible. The error could be in extracting
>>>>>> features.dat
>>>>>> from the nbest list. Are you able to post the nbest list? Or at least
>>>>>> the entries for sentence 16?
>>>>>>
>>>>>> Run something like
>>>>>>
>>>>>> zgrep "^16 " tuning/tmp.1/run7.best100.out.gz
>>>>>>
>>>>>> cheers - Barry
>>>>>>
>>>>>> On 18/01/16 11:24, Dingyuan Wang wrote:
>>>>>>> Hi Barry,
>>>>>>>
>>>>>>> I have rerun the ems after the first email, and then posted the
>>>>>>> recent
>>>>>>> results, so the line changed.
>>>>>>>
>>>>>>> I just use the latest code, and the EMS script. Pretty much default
>>>>>>> settings. The EMS setting is:
>>>>>>>
>>>>>>> sparse-features = "target-word-insertion top 50, source-word-deletion
>>>>>>> top 50, word-translation top 50 50, phrase-length"
>>>>>>>
>>>>>>> I suspect there is something unexpected in the extractor.
>>>>>>>
>>>>>>>
>>>>>>> On 18/01/16 19:03, Barry Haddow wrote:
>>>>>>>> Hi Dingyuan
>>>>>>>>
>>>>>>>> In fact it is not t

Re: [Moses-support] Error when running moses server with sample code

2016-01-12 Thread Barry Haddow

Hi Lane

Can you get a stack trace to see which line the message is coming from? 
That error message is repeated in a few files.


From looking at the code, I'd guess that the OutputFactorOrder is not 
being initialised correctly. Possibly due to the refactoring of the 
config code. Does your example work with an earlier version of Moses?


cheers - Barry

On 11/01/16 21:43, Lane Schwartz wrote:

Hi,

I'm trying out mosesserver for the first time. I have a config file 
for fr-en that uses a smallish TM and LM, that work fine when run with 
Moses. When I try running the same config using mosesserver, and then 
use the sample Perl or Python code in contrib/server, mosesserver dies 
with the following error:


Defined parameters (per moses.ini or switch):
config: moses.tuned.ini.2.probing.noLexRO
cube-pruning-pop-limit: 400
distortion-limit: 6
feature: UnknownWordPenalty WordPenalty PhrasePenalty ProbingPT
name=TranslationModel0 num-features=4
path=phrase-table.2.pruned100.probing/ input-factor=0
output-factor=0 table-limit=20 Distortion KENLM lazyken=1 name=LM0
factor=0 path=europarl.kenlm order=5
input-factors: 0
mapping: 0 T 0
max-phrase-length: 20
n-best-list: nbest.txt 111
output-hypo-score: 1
search-algorithm: 0
server:
threads: 1
weight: Distortion0= 0.0222366 LM0= 0.0834208 WordPenalty0=
-0.0654626 PhrasePenalty0= 0.0220686 TranslationModel0= 0.0520176
0.0415173 0.124293 0.027126 UnknownWordPenalty0= 1

line=UnknownWordPenalty
FeatureFunction: UnknownWordPenalty0 start: 0 end: 0
line=WordPenalty
FeatureFunction: WordPenalty0 start: 1 end: 1
line=PhrasePenalty
FeatureFunction: PhrasePenalty0 start: 2 end: 2
line=ProbingPT name=TranslationModel0 num-features=4
path=phrase-table.2.pruned100.probing/ input-factor=0
output-factor=0 table-limit=20
FeatureFunction: TranslationModel0 start: 3 end: 6
line=Distortion
FeatureFunction: Distortion0 start: 7 end: 7
line=KENLM lazyken=1 name=LM0 factor=0 path=europarl.kenlm order=5
FeatureFunction: LM0 start: 8 end: 8
Loading UnknownWordPenalty0
Loading WordPenalty0
Loading PhrasePenalty0
Loading Distortion0
Loading LM0
Loading TranslationModel0
Initialized successfully!

RUN SERVER at pid 1733327039
[moses/server/Server.cpp:49] Listening on port 8080
[moses/server/TranslationRequest.cpp:281] Input: il a souhaité que
la présidence trace à nice le chemin pour l' avenir .
Translating: il a souhaité que la présidence trace à nice le
chemin pour l' avenir .

Line 0: Collecting options took 0.038 seconds at moses/Manager.cpp
Line 141
Line 0: Search took 0.674 seconds
terminate called after throwing an instance of 'util::Exception'
  what():  No factor 1 at position 0


Any idea what's going on here? I'm using a basic single-factor model, 
so I don't get why it would be complaining about factors. I'm using 
the latest moses from git.


Thanks,
Lane


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Call For Papers: First Conference on Machine Translation (WMT16)

2016-01-11 Thread Barry Haddow

ACL 2016 FIRST CONFERENCE ON MACHINE TRANSLATION (WMT16)

Featuring shared tasks on translation, evaluation, automated 
post-editing and document alignment.


11-12th August 2016, in conjunction with ACL 2016 in Berlin, Germany


http://www.statmt.org/wmt16/

*** CALL FOR PAPERS ***

We invite the submission of scientific papers on topics related to MT.
Topics of interest include, but are not limited to:

* word-based, phrase-based, syntax-based SMT
* using comparable corpora for SMT
* incorporating linguistic information into SMT
* decoding
* system combination and selection
* error analysis
* manual and automatic methods for evaluating MT
* quality estimation of MT
* scaling MT to very large data sets
* neural networks in MT

SHARED TASKS

The workshop will feature ten shared tasks:

* Translation tasks

* News

* IT-domain

* Biomedical

* Multimodal

* Pronoun

* Evaluation tasks

* Metrics

* Quality estimation

* Tuning

* Other tasks

* Automatic post-editing

* Bilingual document alignment
The tasks have been announced separately and more information is
available on the workshop website.

PAPER SUBMISSION INFORMATION

Submissions will consist of regular full papers of 6-10 pages, plus
additional pages for references, formatted following the ACL 2016
guidelines. In addition, shared task participants will be invited to
submit short papers (4-6 pages) describing their systems or their
evaluation metrics. Both submission and review processes will be
handled electronically.

We encourage individuals who are submitting research papers to
evaluate their approaches using the training resources provided by
this workshop and past workshops, so that their experiments can be
repeated by others using these publicly available corpora.

IMPORTANT DATES

Paper submissions:

Paper submission deadline: May 8th, 2016
Notification of acceptance: June 5th, 2016
Camera-ready deadline: June 22nd, 2016

Workshop in Berlin following ACL: August 11-12th, 2016

For shared task timetable, see website.



Barry Haddow
(On behalf of the organisers)
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Call for papers: LREC 2016 Workshop "Translation Evaluation – From fragmented tools and data sets to an integrated ecosystem"

2016-01-06 Thread Barry Haddow
 Bruno Kessler (FBK), Italy
Christian Federmann, Microsoft, USA
Rosa Gaudio, Higher Functions, Portugal
Josef van Genabith, Deutsches Forschungszentrum für Künstliche 
Intelligenz (DFKI), Germany

Barry Haddow, University of Edinburgh, UK
Jan Hajic, Charles University in Prague, Czech Republic
Kim Harris, text&form, Germany
Matthias Heyn, SDL, Belgium
Philipp Koehn, Johns Hopkins University, USA, and University of 
Edinburgh, UK

Christian Lieske, SAP, Germany
Lena Marg, Welocalize, UK
Katrin Marheinecke, text&form, Germany
Matteo Negri, Fondazione Bruno Kessler (FBK), Italy
Martin Popel, Charles University in Prague, Czech Republic
Jörg Porsiel, Volkswagen AG, Germany
Georg Rehm, Deutsches Forschungszentrum für Künstliche Intelligenz 
(DFKI), Germany

Rubén Rodriguez de la Fuente, PayPal, Spain
Lucia Specia, University of Sheffield, UK
Marco Turchi, Fondazione Bruno Kessler (FBK), Italy
Hans Uszkoreit, Deutsches Forschungszentrum für Künstliche Intelligenz 
(DFKI), Germany


http://www.cracking-the-language-barrier.eu/mt-eval-workshop-2016/

This workshop is a joint activity of the EU projects QT21 and CRACKER.

– apologies for cross-posting –
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] First Call for Participation: WMT16 Machine Translated related Shared Tasks

2015-12-14 Thread Barry Haddow

ACL 2016 FIRST CONFERENCE ON MACHINE TRANSLATION (WMT16)
Shared Tasks on translation, evaluation, automated post-editing and 
document alignment.

August 2016, in conjunction with ACL 2016 in Berlin, Germany

http://www.statmt.org/wmt16

As part of WMT, as in previous years, we will be organising a collection 
of shared tasks related to machine translation.  We hope that both 
beginners and established research groups will participate. This year we 
are pleased to present the following 10 tasks:


- Translation tasks
- News
- IT-domain
- Biomedical
- Multimodal
- Pronoun
- Evaluation tasks
- Metrics
- Quality estimation
- Tuning
- Other tasks
- Automatic post-editing
- Bilingual document alignment

Further information, including task rationale, timetables and data will 
be posted on the WMT16 website, and fully announced in January. Brief 
descriptions of each task are given below. Intending participants are 
encouraged to register with the mailing list for further announcements 
(https://groups.google.com/forum/#!forum/wmt-tasks)


For all tasks,  participants will also be  invited to submit a short 
paper describing their system.


News Translation Task
-
This is the translation task run at most of the past WMT editions. This 
year the language pairs will be English to/from Czech, Finnish, German, 
Romanian, Russian and Turkish. Sponsorship for the task comes from the 
EU H2020 projects QT21 and Cracker, Yandex and the University of Helsinki.


IT Domain Translation Task
-
This guest task will involve translation of queries and their responses, 
on the topic of information technology. It will cover English to/from 
Bulgarian, Czech, German, Spanish, Basque, Dutch and Portuguese, and be 
sponsored by the EU FP7 project QTLeap.


Biomedical Translation Task
-
This guest task will focus on the translation of biomedical research 
abstracts from English to and from Spanish, Portuguese and French.


Multimodal Translation Task
-
This task will aim at generating image descriptions in a target 
language, given equivalent descriptions in one or more languages. The 
dataset will consist of 30,000 image-description tuples in three 
languages -- English, German and French.


Pronoun Translation Task
-
This will be similar to the task run last year as part of the DiscoMT 
workshop (https://www.idiap.ch/workshop/DiscoMT/shared-task)


Metrics
--
The idea here is that participants propose evaluation metrics for 
machine translation, which compare the MT output against a reference. 
The metrics will be correlated against the human judgements produced in 
the news translation task. This task is sponsored by QT21.


Quality Estimation
-
This consists of several sub-tasks, all of which are concerned with the 
idea of assessing the quality of MT output without using a reference, at 
different levels of granularity: word, phrase, sentence and document. 
This task is sponsored by QT21.


Tuning
-
Participants in this task are asked to come up with algorithms and 
objectives (i.e. metrics) for tuning the parameters of a given MT system.


Automatic Post-editing
---
In this task participants will aim to create systems that can 
automatically correct machine translation outputs, given a corpus of 
human post-edits. This task is sponsored by QT21.


Bilingual document alignment

The aim is to find translated document pairs from a large collection of 
documents in two languages.


Best wishes
Barry Haddow
(On behalf of the organisers)







The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] different versions of moses yielding different translations

2015-11-26 Thread Barry Haddow

Hi Vito

It's clear from your example that PhraseDictionaryMultiModel is giving 
different scores in each version (compare the 1st hypothesis of old with 
the 3rd of new), and that should not happen.
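
Comparisons like this can be automated by pairing identical hypotheses from
the two nbest files and printing those whose feature strings differ. A
minimal sketch, assuming the standard "id ||| hypothesis ||| features |||
score" nbest format and the file names used in the commands quoted below:

  def load(path):
      entries = {}
      for line in open(path, encoding="utf-8"):
          fields = [f.strip() for f in line.split("|||")]
          sid, text, feats = fields[0], fields[1], fields[2]
          entries[(sid, text)] = feats
      return entries

  old = load("nbest_oldMoses")
  new = load("nbest_newMoses")
  for key in sorted(old.keys() & new.keys()):
      if old[key] != new[key]:
          print(key[1])
          print("  old:", old[key])
          print("  new:", new[key])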


I'm not familiar with the changes made to this class, so maybe someone 
that is can suggest where to look. Hieu?


cheers - Barry

On 26/11/15 13:39, Vito Mandorino wrote:
Yes, the moses.ini is the same in the two cases. I don't see any 
difference other than the moses version. Here is the 5-best list for 
the segment 'test' in the two cases. The phrase-table scores are 
different and the rankings change accordingly.


---

echo 'test' | old_mosesdecoder/bin/moses -f ../moses.ini -mp 
-n-best-list nbest_oldMoses 5


0 ||| test  ||| LexicalReordering0= -0.859778 0 0 0 0 0 Distortion0= 0 
LM0= -46.0739 WordPenalty0= -1 PhrasePenalty0= 1 
PhraseDictionaryMultiModel0= -1.36401 -1.16642 -2.38112 -1.93671 ||| 
-1.72803
0 ||| épreuve  ||| LexicalReordering0= 0 0 0 0 0 0 Distortion0= 0 LM0= 
-33.438 WordPenalty0= -1 PhrasePenalty0= 1 
PhraseDictionaryMultiModel0= -2.40188 -2.66496 -4.1693 -4.0067 ||| 
-2.04003
0 ||| test sur  ||| LexicalReordering0= -5.1761 0 0 0 0 0 Distortion0= 
0 LM0= -55.2752 WordPenalty0= -2 PhrasePenalty0= 1 
PhraseDictionaryMultiModel0= -1.63217 -2.48226 -3.3935 -4.16375 ||| 
-2.2293
0 ||| test sur les  ||| LexicalReordering0= -4.29043 0 0 0 0 0 
Distortion0= 0 LM0= -57.1853 WordPenalty0= -3 PhrasePenalty0= 1 
PhraseDictionaryMultiModel0= -1.71765 -2.88763 -4.30406 -7.17601 ||| 
-2.37806
0 ||| tester  ||| LexicalReordering0= 0 0 0 0 0 0 Distortion0= 0 LM0= 
-47.6457 WordPenalty0= -1 PhrasePenalty0= 1 
PhraseDictionaryMultiModel0= -2.1744 -2.19974 -4.52128 -4.49808 ||| 
-2.49226


---

echo 'test' | new_mosesdecoder/bin/moses -f ../moses.ini -mp 
-n-best-list nbest_newMoses 5


0 ||| test sur les  ||| LexicalReordering0= -4.29043 0 0 0 0 0 
Distortion0= 0 LM0= -57.1853 WordPenalty0= -3 PhrasePenalty0= 1 
PhraseDictionaryMultiModel0= -0.968934 -1.36372 -1.54406 -1.60562 ||| 
-1.3334
0 ||| critère de la  ||| LexicalReordering0= -4.14314 0 0 0 0 0 
Distortion0= 0 LM0= -58.2007 WordPenalty0= -3 PhrasePenalty0= 1 
PhraseDictionaryMultiModel0= -0.916291 -1.4366 -1.55314 -1.60891 ||| 
-1.35742
0 ||| test  ||| LexicalReordering0= -0.859778 0 0 0 0 0 Distortion0= 0 
LM0= -46.0739 WordPenalty0= -1 PhrasePenalty0= 1 
PhraseDictionaryMultiModel0= -1.17713 -1.00259 -1.42326 -1.24867 ||| 
-1.4286
0 ||| test sur  ||| LexicalReordering0= -5.1761 0 0 0 0 0 Distortion0= 
0 LM0= -55.2752 WordPenalty0= -2 PhrasePenalty0= 1 
PhraseDictionaryMultiModel0= -0.92759 -1.26035 -1.45418 -1.53457 ||| 
-1.53375
0 ||| critère de  ||| LexicalReordering0= -4.41886 0 0 0 0 0 
Distortion0= 0 LM0= -57.5157 WordPenalty0= -2 PhrasePenalty0= 1 
PhraseDictionaryMultiModel0= -1.09861 -1.4366 -1.53505 -1.59179 ||| 
-1.6079



Vito

2015-11-26 12:16 GMT+01:00 Barry Haddow <bhad...@inf.ed.ac.uk>:


Hi Vito

The tcmalloc message is normal.

Are you absolutely sure you are using the same model (and same
pre- and post-processing)? A difference of 5 or 14 bleu should be
quite visible in the output. What do the outputs look like?

cheers - Barry


On 26/11/15 09:58, Vito Mandorino wrote:

Hi Barry,

actually with the OnDisk table there is virtually no difference (0.2
average difference whether or not re-tuning has been done).
With the compact phrase-table, however, the difference is larger. The
latest test this morning yields a loss of 14 BLEU points
without re-tuning. I don't know what the cause could be.
Sometimes there is this message when loading the phrase-tables:
tcmalloc: large alloc 1149427712 bytes == 0x28a54000 @

After re-tuning however the difference in BLEU score gets smaller
even with compact phrase-table.

Best regards,
Vito

2015-11-25 21:23 GMT+01:00 Barry Haddow <bhad...@inf.ed.ac.uk>:

Hi Vito

The 0.2 difference is after retuning? That's normal then.

But a difference of 5 bleu without retuning suggests a bug.
Did you say that this only happens with
PhraseDictionaryMultiModel?

cheers - Barry


On 25/11/15 13:53, Vito Mandorino wrote:

Thank you. In our tests it seems that with the OnDisk table
the quality is basically the same between the two versions
of Moses (an average 0.2 difference in BLEU score), but for the
CompactPhraseTable the difference is larger (2 BLEU points
lost on average after re-tuning with the new version of
Moses, and more than 5 BLEU points on average without
re-tuning).
Do you think a better quality would be obtained by running a
complete re-training of the model with the new version of Moses?


Best regards,
Vito

2015-11-24 16:31 GMT+01:00 Hieu Hoang <hieuho...@gmail.com

Re: [Moses-support] different versions of moses yielding different translations

2015-11-25 Thread Barry Haddow

Hi Vito

The 0.2 difference is after retuning? That's normal then.

But a difference of 5 bleu without retuning suggests a bug. Did you say 
that this only happens with PhraseDictionaryMultiModel?


cheers - Barry

On 25/11/15 13:53, Vito Mandorino wrote:
Thank you. In our tests it seems that with the OnDisk table the 
quality is basically the same between the two versions of Moses 
(an average 0.2 difference in BLEU score), but for the CompactPhraseTable 
the difference is larger (2 BLEU points lost on average after 
re-tuning with the new version of Moses, and more than 5 BLEU points 
on average without re-tuning).
Do you think a better quality would be obtained by running a complete 
re-training of the model with the new version of Moses?



Best regards,
Vito

2015-11-24 16:31 GMT+01:00 Hieu Hoang <hieuho...@gmail.com>:


There was a change in the underlying datastructure for stacks, it
changed from std::set (ordered) to boost::unordered_set.

https://github.com/moses-smt/mosesdecoder/commit/6b182ee5e987a5b2823aea7eaaa7ef0457c6a30d
This got some speed gains

        1          5          10         15         20         25         30         35

56 (13/10 baseline)
real    4m57.795s  1m19.005s  0m51.636s  0m49.624s  0m49.869s  0m52.475s  0m53.806s  0m54.957s
user    4m41.255s  5m45.086s  6m34.053s  8m12.430s  8m10.667s  8m16.486s  8m10.592s  8m13.859s
sys     0m16.514s  0m35.494s  0m54.513s  1m10.643s  1m18.449s  1m21.738s  1m23.133s  1m25.048s

57 (56 + unordered set stack)
real    4m41.148s  1m16.002s  0m50.747s  0m48.711s  0m49.130s  0m51.473s  0m53.141s  0m54.513s
user    4m23.968s  5m30.356s  6m26.167s  7m39.286s  7m56.229s  7m52.669s  7m56.978s  7m56.216s
sys     0m17.231s  0m35.063s  0m54.081s  1m10.137s  1m17.194s  1m22.912s  1m25.948s  1m26.247s


However, the hypotheses are now added to the stack in a different
order so there will be slight differences in results
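
The mechanism is easy to see in miniature: when pruning keeps the top-k of a
beam and several hypotheses tie on score, the order in which they are
enumerated decides which survive. A toy illustration in Python (not Moses
code):

  def prune(hyps, k):
      # Stable sort: hypotheses with equal scores keep their input order,
      # so reordering the input changes which tied hypothesis survives.
      return sorted(hyps, key=lambda h: -h[1])[:k]

  hyps = [("a", -1.0), ("b", -1.0), ("c", -2.0)]
  print(prune(hyps, 1))        # [('a', -1.0)]
  print(prune(hyps[::-1], 1))  # [('b', -1.0)]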


On 24/11/2015 13:53, Vito Mandorino wrote:

Hi,

in some of our tests a recent version of Moses (pulled from
github last week) and an older one do not give the same
translations on the same source segment (with the same moses.ini).
Here is the 5-best list for the translation of 'test' with the
last week version:

0 ||| test  ||| LexicalReordering0= -1.1969 0 0 0 0 0
Distortion0= 0 LM0= -51.1788 WordPenalty0= -1 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -3.03811 -2.5834 -2.08503 -1.83075
||| -1.27754
0 ||| testing  ||| LexicalReordering0= 0 0 0 0 0 0 Distortion0= 0
LM0= -35.1495 WordPenalty0= -1 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -5.21045 -5.04877 -4.71131 -4.66382
||| -1.70337
0 ||| funds  ||| LexicalReordering0= -3.1355 0 0 0 0 0
Distortion0= 0 LM0= -11.3753 WordPenalty0= -1 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -10.8209 -10.6835 -5.14555 -5.73388
||| -1.77009
0 ||| known as a  ||| LexicalReordering0= -3.1355 0 0 0 0 0
Distortion0= 0 LM0= -58.8877 WordPenalty0= -3 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -4.42285 -11.9339 -5.14555 -18.0392
||| -1.89152
0 ||| as a  ||| LexicalReordering0= -3.1355 0 0 0 0 0
Distortion0= 0 LM0= -35.5353 WordPenalty0= -2 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -9.34698 -11.9339 -5.14555 -9.14874
||| -1.89159

and with the older version of Moses:

0 ||| funds  ||| LexicalReordering0= -3.1355 0 0 0 0 0
Distortion0= 0 LM0= -11.3753 WordPenalty0= -1 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -2.52548 -2.52544 -2.45544 -2.48609
||| -0.815668
0 ||| as a  ||| LexicalReordering0= -3.1355 0 0 0 0 0
Distortion0= 0 LM0= -35.5353 WordPenalty0= -2 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -2.52464 -2.52565 -2.45544 -2.5244
||| -0.953799
0 ||| as  ||| LexicalReordering0= -3.1355 0 0 0 0 0 Distortion0=
0 LM0= -34.1633 WordPenalty0= -1 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -2.5256 -2.52565 -2.45544 -2.48609
||| -1.07254
0 ||| known as a  ||| LexicalReordering0= -3.1355 0 0 0 0 0
Distortion0= 0 LM0= -58.8877 WordPenalty0= -3 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -2.38597 -2.52565 -2.45544 -2.52573
||| -1.07536
0 ||| is known as a  ||| LexicalReordering0= -3.1355 0 0 0 0 0
Distortion0= 0 LM0= -80.8518 WordPenalty0= -4 PhrasePenalty0= 1
PhraseDictionaryMultiModel0= -2.37158 -2.52565 -2.45544 -2.52573
||| -1.18753

This looks very strange. The only difference is in the
phrase-table scores. Do you have any idea what is going on?
The only possibility 

Re: [Moses-support] ems: PhraseDictionaryOnDisk binarization

2015-11-25 Thread Barry Haddow

Hi Nick

The best solution is to use the compact phrase table, and for this just add

ttable-binarizer = $moses-bin-dir/processPhraseTableMin

to the general section.

If you need to use the ondisk phrase table (sparse features, properties 
etc.) then replace the above with


ttable-binarizer = "$moses-bin-dir/CreateOnDiskPt 1 1 4 100 2"

where the "4" is the number of dense features in your phrase table.

It's a bit strange that binarize-all gives you 
PhraseDictionaryBitextSampling; maybe that's the default these days? 
Anyway, if you just want to decode a dev & test set then you don't want 
to set binarize-all.
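
For concreteness, the relevant lines in an EMS config might end up 
looking like this (a sketch; both variants appear above, pick one):

[GENERAL]
# compact phrase table (recommended):
ttable-binarizer = $moses-bin-dir/processPhraseTableMin
# or, for the ondisk table ("4" = number of dense features):
# ttable-binarizer = "$moses-bin-dir/CreateOnDiskPt 1 1 4 100 2"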


cheers - Barry



On 25/11/15 01:44, Nicholas Ruiz wrote:

Hi all,

I'm a bit behind on my moses versions and I'm using EMS for the first 
time. I trained a toy model, which gave me a PhraseTableMemory 
translation table. I'd like to binarize the phrase table and 
reordering models. I'm still operating back in the PhraseTableBinary 
days, but obviously the codebase has changed quite a bit since then.


How do I binarize the phrase table as a PhraseDictionaryOnDisk? I had 
tried uncommenting the binarize-all setting, but that gave me 
a PhraseDictionaryBitextSampling. However, I don't need to do 
incremental training -- and the tuning phase is failing anyway. Help 
would be appreciated about how to do a simple binarization in EMS.


Thanks!
Nick

zınɹ ʞɔıu


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Pruning phrase-table

2015-11-24 Thread Barry Haddow

Hi

You're better off using the Johnson pruning method 
http://www.statmt.org/moses/?n=Advanced.RuleTables#ntoc5 . The 
relent-filter code is no longer maintained,


cheers - Barry

On 24/11/15 05:42, Sanjanashree Palanivel wrote:


Dear All,

   I just tried to prune the phrase table using relent-filter 
inside mosesdecoder. I used the command as mentioned on the moses 
site (http://www.statmt.org/moses/?n=Advanced.RuleTables). But I am 
getting the following error: "Use of uninitialized value $_[0] in 
substitution (s///) at /usr/share/perl/5.18/File/Basename.pm line 341.
fileparse(): need a valid pathname at 
/home/sanjana/Documents/SMT/mosesdecoder/contrib/relent-filter/scripts/calcPruningScores.pl 
line 140." Where am I going wrong?



--
Thanks and regards,

Sanjanasri J.P


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Pruning phrase-table

2015-11-24 Thread Barry Haddow

Hi

The documentation suggests:

    A good setting is -l a+e -n 30

Try this first,
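
As a sketch of the usual invocation (paths and suffix-array file names 
are placeholders; see the sigtest-filter README for the exact options):

zcat model/phrase-table.gz \
  | ./filter-pt -e /path/to/corpus.en.sa -f /path/to/corpus.fr.sa -l a+e -n 30 \
  | gzip > model/phrase-table.filtered.gz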

cheers - Barry

On 24/11/15 10:00, Sanjanashree Palanivel wrote:


Dear Barry,

Thank you for your earnest response. I am using the sigtest filter 
now. For the sigtest filter method, how should I tune the parameters -l and -n?


On Nov 24, 2015 2:44 PM, "Barry Haddow" <bhad...@staffmail.ed.ac.uk> wrote:


Hi

You're better off using the Johnson pruning method
http://www.statmt.org/moses/?n=Advanced.RuleTables#ntoc5 . The
relent code is no longer maintained,

cheers - Barry

On 24/11/15 05:42, Sanjanashree Palanivel wrote:


Dear All,

   I just tried to prune the phrase table using relent-filter
inside mosesdecoder. I used the command as mentioned on the
moses site (http://www.statmt.org/moses/?n=Advanced.RuleTables).
But I am getting the following error: "Use of uninitialized value
$_[0] in substitution (s///) at
/usr/share/perl/5.18/File/Basename.pm line 341.
fileparse(): need a valid pathname at

/home/sanjana/Documents/SMT/mosesdecoder/contrib/relent-filter/scripts/calcPruningScores.pl
line 140." Where am I going wrong?


-- 
Thanks and regards,


Sanjanasri J.P


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Pruning phrase-table

2015-11-24 Thread Barry Haddow

Hi

To be honest, I have not experimented much with the settings. If the 
filtering does not work for you, it could be that your training data is 
small or out-of-domain, so that the poorly attested phrase pairs are 
actually important.


Can you see from looking at the translations what changes when you 
filter at default settings? (note that the length drops).


Do you really need this kind of pruning? Maybe histogram pruning, and/or 
the compact data structures can reduce the size of your model sufficiently?
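
As a sketch, histogram pruning with the compact table comes down to the 
table-limit setting on the phrase-table line in moses.ini (current 
feature-function syntax assumed; the path is a placeholder):

PhraseDictionaryCompact name=TranslationModel0 num-features=4 input-factor=0 output-factor=0 path=/path/to/phrase-table.minphr table-limit=20

table-limit=20 keeps only the 20 highest-scoring translations per 
source phrase.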


cheers - Barry

On 24/11/15 10:19, Sanjanashree Palanivel wrote:

Dear Barry,

I tried it. I got a decrease in BLEU score, from 16.39 to 14.35, 
but the size of the PT was greatly reduced. When I tried some positive 
values the BLEU score varied. The following is a sample table.



[inline image: the sample score table was not preserved in the archive]

On Tue, Nov 24, 2015 at 3:40 PM, Barry Haddow 
<bhad...@staffmail.ed.ac.uk> wrote:


Hi

The documentation suggests:

    A good setting is -l a+e -n 30

Try this first,

cheers - Barry


On 24/11/15 10:00, Sanjanashree Palanivel wrote:


Dear Barry,

Thank you for your earnest response. I am using the sigtest
filter now. For the sigtest filter method, how should I tune the
parameters -l and -n?

On Nov 24, 2015 2:44 PM, "Barry Haddow" <bhad...@staffmail.ed.ac.uk>
wrote:

Hi

You're better off using the Johnson pruning method
http://www.statmt.org/moses/?n=Advanced.RuleTables#ntoc5 .
The relent code is no longer maintained,

cheers - Barry

On 24/11/15 05:42, Sanjanashree Palanivel wrote:


Dear All,

   I just tried to prune the phrase table using
relent-filter inside mosesdecoder. I used the command as
mentioned on the moses site
(http://www.statmt.org/moses/?n=Advanced.RuleTables). But I
am getting the following error: "Use of uninitialized value
$_[0] in substitution (s///) at
/usr/share/perl/5.18/File/Basename.pm line 341.
fileparse(): need a valid pathname at

/home/sanjana/Documents/SMT/mosesdecoder/contrib/relent-filter/scripts/calcPruningScores.pl
line 140." Where am I going wrong?


-- 
Thanks and regards,


Sanjanasri J.P


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




--
Thanks and regards,

Sanjanasri J.P


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Moses-support Digest, Vol 109, Issue 19

2015-11-12 Thread Barry Haddow
Hi Tomasz

The mosesserver is just the decoder, so it doesn't do any of the pre- and 
post-processing steps that you also need. In particular, it does not do 
tokenisation. You need to send it tokenised text, and then de-tokenise 
the output,
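
As an illustration, the wrapping a client needs to do amounts to 
something like this (a sketch with the plain decoder; language codes 
and paths are placeholders, and truecasing would be handled the same way):

cat input.en \
  | $MOSES/scripts/tokenizer/tokenizer.perl -l en \
  | $MOSES/bin/moses -f moses.ini \
  | $MOSES/scripts/tokenizer/detokenizer.perl -l pl \
  > output.pl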

cheers - Barry

On 12/11/15 13:40, Tomasz Gawryl wrote:
> Hi Ulrich,
>
> I have a question about Moses server too. I'm testing it as a wrapper for
> the Across server, to check pre-translation possibilities. It generally works,
> but there is one problem. Input segments are translated without tokenization,
> so every word next to a special character (for example `this is small house.`)
> remains untranslated ('to jest mały house.'). I was searching the list archive
> and I found a similar question here:
> http://comments.gmane.org/gmane.comp.nlp.moses.user/14020  but for me it's
> not yet answered. I would appreciate any information on this subject.
>
> Best regards,
> Tomek
>
> -Original Message-
> From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu]
> On Behalf Of moses-support-requ...@mit.edu
> Sent: Wednesday, November 11, 2015 10:31 AM
> To: moses-support@mit.edu
> Subject: Moses-support Digest, Vol 109, Issue 19
>
>
> Today's Topics:
>
> 1. Re: use placeholder with mosesserver (Ulrich Germann)
> 2. Re: use placeholder with mosesserver (Evgeny Matusov)
>
>
> --
>
> Message: 1
> Date: Wed, 11 Nov 2015 01:58:40 +
> From: Ulrich Germann 
> Subject: Re: [Moses-support] use placeholder with mosesserver
> To: Evgeny Matusov 
> Cc: "moses-support@mit.edu" 
>
> Hi all,
>
> I've just pushed what I believe might address a few of the issues in this
> thread:
>
> - the more fine-grained configuration options for request handling and
> queuing, server timeouts etc. (added in August due to threading issue) have
> been transferred to the main moses executable.
>
> - the server now pays attention to the xml-input option specified via json;
> the range of accepted values is the same as when specified on the command
> line. I have not written the xml-input handling and do not actively use it,
> so it may or may not work. I don't think there are any regression tests that
> test this right now. Reports from the trenches are welcome.
>
> - mosesserver.cpp is deprecated. It is now merely a shell around the regular
> moses call with --server. I did not remove it from the code base entirely,
> as I assume that there's a plethora of setups out there that rely on the
> existence of mosesserver. What the wrapper does is add --server to the
> options and then run regular moses.
>
> - anyone adding stuff to mosesserver.cpp from now on owes me a lifetime
> supply of the finest Laphroaig. Just send me a quarter cask every year for
> Burns Nicht for the rest of my life if you do. If I haven't pushed anything
> for two years, you may assume I'm dead.
>
>
> - Uli
>
> On Tue, Nov 10, 2015 at 2:58 PM, Ulrich Germann wrote:
>
>> Hi all,
>>
>> mosesserver is deprecated and should not be used any more. I'll
>> transfer the threading-related changes to the server implementation in
>> the regular moses executable and let you know once I'm done so that
>> other things (like
>> passthrough) can be added. By the looks of it, the changes are fairly
>> straightforward, so it shouldn't take long. However, I can't guarantee
>> that the new server will do everything the old server did, (or do it
>> the same way).
>>
>> It would be fantastic if a few people could design and contribute test
>> cases so that we can do some regression testing for the server.
>> Ideally a test case should provide:
>>
>> - tiny models to work with (or we may be able to recycle some that
>> already
>> exist)
>> - sample input (json)
>> - expected output (json)
>>
>> Cheers - Uli
>>
>> On Tue, Nov 10, 2015 at 11:37 AM, Evgeny Matusov wrote:
>>
>>> Hi,
>>>
>>> can any of the more active recent developers advise what is the
>>> latest stable mosesserver implementation?
>>>
>>> It seems to be the one in moses/server, but the  one in in
>>> contrib/server/mosesserver.cpp has been updated in August of this
>>> year with an important fix related to multiple threads:
>>>
>>>
>>> 

Re: [Moses-support] Correct form of using Mira

2015-10-29 Thread Barry Haddow

Hi Davood

The first command you give has a quote missing at the end - is this correct?

Another difference is that you have "-v 0", so moses will run silently.

What was the actual output when you ran this command? What you have 
below looks correct to me.
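
For reference, the manual's command with the quoting fixed would read:

$MOSES_SCRIPTS/training/mert-moses.pl work/dev.fr work/dev.en \
    $MOSES_BIN/moses work/model/moses.ini --mertdir $MOSES_BIN \
    --rootdir $MOSES_SCRIPTS --batch-mira --return-best-dev \
    --batch-mira-args '-J 300' --decoder-flags '-threads 8 -v 0'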


cheers - Barry

On 28/10/15 21:57, Davood Mohammadifar wrote:

Hello everyone

Because of variations in BLEU score when using normal MERT, I decided 
to use MIRA instead. The Moses manual (updated on 28 October 2015) tells me 
to use this command:


$MOSES_SCRIPTS/training/mert-moses.pl work/dev.fr work/dev.en 
$MOSES_BIN/moses work/model/moses.ini --mertdir $MOSES_BIN --rootdir 
$MOSES_SCRIPTS --batch-mira --return-best-dev --batch-mira-args '-J 
300' --decoder-flags '-threads 8 -v 0


But this command does not work for me. When I execute the command, I 
just see some options for it and nothing happens. So I wanted to 
change the command. Based on the usual MERT command, I changed it to this:


$MOSES_SCRIPTS/training/mert-moses.pl 
/home/mohammadifar/corpus/tune.true.fa 
/home/mohammadifar/corpus/tune.true.en $MOSES_BIN/moses 
/home/mohammadifar/First/train/model/moses.ini --mertdir 
$MOSES_BIN --rootdir $MOSES_SCRIPTS --batch-mira --return-best-dev 
--batch-mira-args="-J 300" --decoder-flags="-threads all"


The difference between the two commands is at the end. The latter works 
very well for me. BLEU variations on the test set are very slight (often 
<0.1 and rarely about 0.2, over 3 runs of the whole translation 
pipeline on the same dataset). So I want to be sure: is this form of using 
MIRA correct? (Moses v3.0)


Regards
Davood


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-05 Thread Barry Haddow

Hi Hieu

That's exactly why I took to pre-pruning the phrase table, as I 
mentioned on Friday. I had something like 750,000 translations of the 
most common word, and it took half-an-hour to get the first sentence 
translated.


cheers - Barry

On 05/10/15 15:48, Hieu Hoang wrote:
what pt implementation did you use, and had it been pre-pruned so that 
there's a limit on how many target phrases there are for a particular source 
phrase? i.e. don't have 10,000 entries for 'the'.


I've been digging around multithreading in the last few weeks. I've 
noticed that the compact pt is VERY bad at handling an unpruned pt.

                     Cores:  1     5     10    15    20    25
Unpruned   compact pt       143    42    32    38    52    62
           probing pt       245    58    33    25    24    21
Pruned     compact pt       119    24    15    10    10    10
           probing pt       117    25    25    10    10    10



Hieu Hoang
http://www.hoang.co.uk/hieu

On 5 October 2015 at 15:15, Michael Denkowski wrote:


Hi all,

Like some other Moses users, I noticed diminishing returns from
running Moses with several threads.  To work around this, I added
a script to run multiple single-threaded instances of moses
instead of one multi-threaded instance.  In practice, this sped
things up by about 2.5x for 16 cpus and using memory mapped models
still allowed everything to fit into memory.

If anyone else is interested in using this, you can prefix a moses
command with scripts/generic/multi_moses.py.  To use multiple
instances in mert-moses.pl, specify --multi-moses and control the
number of parallel instances with --decoder-flags='-threads N'.
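
For example, a standalone decoding run might look like this (a sketch; 
model path and file names are placeholders, and it assumes the 
instance count is taken from the usual -threads flag, as in tuning):

scripts/generic/multi_moses.py bin/moses -f model/moses.ini -threads 16 \
    < test.tok.fr > test.out.en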

Below is a benchmark on WMT fr-en data (2M training sentences,
400M words mono, suffix array PT, compact reordering, 5-gram
KenLM) testing default stack decoding vs cube pruning without and
with the parallelization script (+multi):

---
1cpu     sent/sec
stack    1.04
cube     2.10
---
16cpu    sent/sec
stack    7.63
+multi   12.20
cube     7.63
+multi   18.18
---

--Michael

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: [Moses-support] prune phrase table

2015-10-02 Thread Barry Haddow
And there's prunePhraseTable, which prunes according to weighted TM 
score (as Moses does at runtime).


Some day there will be one pruner to rule them all ...

On 02/10/15 18:39, Philipp Koehn wrote:

Hi,

there is also scripts/training/threshold-filter.perl
which filters out phrase pairs based on minimum scores
- which is not quite what you want but similar enough.

There is also the corresponding
remove-orphan-phrase-pairs-from-reordering-table.perl
which removes phrase pairs from the reordering table
that were removed from the phrase table.

-phi

On Fri, Oct 2, 2015 at 11:43 AM, Marcin Junczys-Dowmunt wrote:


You can use filter-pt from contrib/sigtest-filter without the
suffix arrays; it needs SALM to compile though. When you only specify
-n 100 it will prune according to p(t|s)

On 2015-10-02 17:35, Hieu Hoang wrote:


I can't remember, but is there a script that prunes the pt,
keeping just the best x rules, according to p(t|s)?

Hieu Hoang
http://www.hoang.co.uk/hieu

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: [Moses-support] Regarding Parallel Corpus Repository

2015-09-27 Thread Barry Haddow
Hi Nakul

The Emille project released parallel corpora for several South Asian 
languages
http://catalog.elra.info/product_info.php?products_id=696

cheers - Barry

On 27/09/15 15:45, nakul sharma wrote:
> Dear All,
>
> Is there any online repository of parallel corpora for Indian regional
> languages? Building one from scratch is a very tedious and error-prone
> task. I am looking for English to any North Indian language pair
> (Punjabi, Hindi, Urdu).
>
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Bilingual neural lm, log-likelihood: -nan

2015-09-21 Thread Barry Haddow

Hi Jian

You could also try using dropout. Adding something like

--dropout 0.8 --input_dropout 0.9 --null_index 1

to nplm training can help - look at your vocabulary file to see what the 
null index should be set to. This works with the Moses version of nplm,


cheers - Barry

On 21/09/15 08:45, Nikolay Bogoychev wrote:


Hey Jian,

I have encountered this problem with nplm myself and couldn't really 
find a solution that works every time.


Basically what happens is that there is a token that occurs very 
frequently in the same position, and its weights become huge and 
eventually not-a-number, which propagates to the rest of the data. This 
usually happens with the beginning-of-sentence token, especially if 
your source and target context sizes are big. One thing you could do 
is to decrease the source and target context sizes (doesn't always 
work). Another thing you could do is to lower the learning rate 
(always works, but you might need to set it quite low, like 0.25).


The proper solution to this, according to Ashish Vaswani, the 
creator of nplm, is to use gradient clipping, which is commented out in 
his code. You should contact him, because this is an nplm issue.


Cheers,

Nick


On Sat, Sep 19, 2015 at 8:58 PM, jian zhang wrote:


Hi all,

I got

Epoch 
Current learning rate: 1
Training minibatches: Validation log-likelihood: -nan
   perplexity: nan

during bilingual neural lm training.

I use command:
/home/user/tools/nplm-master-rsennrich/src/trainNeuralNetwork
--train_file work_dir/blm/train.numberized --num_epochs 30
--model_prefix work_dir/blm/train.10k.model.nplm --learning_rate 1
--minibatch_size 1000 --num_noise_samples 100 --num_hidden 2
--input_embedding_dimension 512 --output_embedding_dimension 192
--num_threads 6 --loss_function log --activation_function tanh
--validation_file work_dir/blm/valid.numberized
--validation_minibatch_size 10

where the train.numberized and valid.numberized files are split
from the file generated by the
script ${moses}/scripts/training/bilingual-lm/extract_training.py.

Training/Validation numbers are:
Number of training instances: 4128195
Number of validation instances: 217274


Thanks,

Jian

Jian Zhang
Centre for Next Generation Localisation (CNGL)

Dublin City University 

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: [Moses-support] BLEU score

2015-09-07 Thread Barry Haddow

Hi Tomek

Yes, that's quite a low score. Have a look at the translation output: do 
the sentences have lots of English words in them, are they very long, 
very short, or scrambled in some other way?


The commonest problem is that something went wrong in corpus 
preparation, for example the corpora weren't correctly aligned, some 
parts got swapped around accidentally, they were not consistently 
tokenised or truecased, etc.
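
A couple of quick sanity checks on the prepared corpora (a sketch; file 
names are placeholders):

# both sides must have exactly the same number of lines
wc -l corpus.tok.en corpus.tok.pl

# spot-check that a given line really is a translation pair
sed -n '1234p' corpus.tok.en
sed -n '1234p' corpus.tok.pl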


Did you run tuning? If so, double-check that you passed the correct 
files (input and reference) to tuning,


cheers - Barry

On 07/09/15 08:15, Tomasz Gawryl wrote:


Hi All!

This is my first post here, and at first I want to apologize for my 
English, but I would like to ask you some questions. I finished a full 
phrase-based Moses training on an EN-PL (English-Polish) corpus (a few 
million sentences from free sources + half a million sentences from a 
commercial tmx). The training pipeline always ends with a test translation 
and BLEU score. I didn’t expect a first score around 30%, but my 
result of 4.5% surprised me. Why is my result so bad? Is it a consequence 
of the chosen language pair? Polish is very flexible – we can 
interchange words in a sentence without losing the sense. What should I do 
to improve this result? Or maybe that’s all I can get ;).


Regards,

Tomek



The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Domain adaptation

2015-08-14 Thread Barry Haddow
You could try this tutorial

http://www.statmt.org/mtma15/uploads/mtma15-domain-adaptation.pdf

On 14/08/15 20:20, Vincent Nguyen wrote:
 I had read this section, which deals with translation model combination.
 not much on language model or tuning.

 For instance : if I want to make sure that a specific expression
 titres is translated in equities from French to English.

 These 2 words have specifically to be in the Monolingual corpus of the
 language model, or in the parallel corpus ?

 the fact that 2 parallel expressions are in the tuning set but not
 present in the parallel corpora nor the monolingual LM, can it trigger a
 good translation ?

 I am not sure to be clear 

 thanks again for your help.


 On 14/08/2015 20:52, Rico Sennrich wrote:
 Hi Vincent,

 this section describes some domain adaptation methods that are
 implemented in Moses: http://www.statmt.org/moses/?n=Advanced.Domain

 It is incomplete (focusing on parallel data and the translation model),
 and does not recommend best practices.

 In general, my recommendation is to use in-domain data whenever possible
 (for the language model, translation model, and held-out in-domain data
 for tuning/testing). Out-of-domain data can help, but also hurt your
 system: the effect depends on your domains and the amount of data you
 have for each. Data selection, instance weighting, model interpolation
 and domain features are different methods that give you the benefits of
 out-of-domain data, but reduce its harmful effects, and are often better
 than just concatenating all the data you have.

 best wishes,
 Rico


 On 14/08/15 16:22, Vincent Nguyen wrote:
 Hi,

 I can't find a sort of tutorial  on domain adaptation path to follow.
 I read this in the doc :
 The language model should be trained on a corpus that is suitable to the
 domain. If the translation model is trained on a parallel corpus, then
 the language model should be trained on the output side of that corpus,
 although using additional training data is often beneficial.

 And in the training section of the EMS, there is a sub section with
 domain-features=

 What is the best practice ?

 Let's say, for instance, that I would like to specialize my model in
 finance translation, with a specific corpus.

 Should I train the language model with finance material?
 Should I include a parallel finance corpus in the translation model training?
 Should I tune with financial data sets?

 Please help me to understand.
 Vincent

 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] EMS results - makes sense ?

2015-08-06 Thread Barry Haddow

Hi Vincent

It's a SIGKILL. Probably means it ran out of memory.

I'd recommend fast_align for this data set. Even if you manage to get it 
running with mgiza it will still take a week or so.


Just add
fast-align-settings = -d -o -v
to the TRAINING section of ems, and make sure that fast_align is in your 
external-bin-dir.
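
So the relevant parts of the config might end up as (a sketch; the 
path is a placeholder):

[GENERAL]
external-bin-dir = /path/containing/fast_align

[TRAINING]
fast-align-settings = "-d -o -v"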


cheers - Barry

On 06/08/15 08:40, Vincent Nguyen wrote:


So I dropped my hierarchical model since I got an error.
I switched back to using more data by adding the Giga FR-EN source,
but now another error pops up when running Giza Inverse:

Using SCRIPTS_ROOTDIR: /home/moses/mosesdecoder/scripts
Using multi-thread GIZA
using gzip
(2) running giza @ Wed Aug  5 21:03:56 CEST 2015
(2.1a) running snt2cooc fr-en @ Wed Aug  5 21:03:56 CEST 2015
Executing: mkdir -p /home/moses/working/training/giza-inverse.7
Executing: /home/moses/working/bin/training-tools/mgizapp/snt2cooc 
/home/moses/working/training/giza-inverse.7/fr-en.cooc 
/home/moses/working/training/prepared.7/en.vcb 
/home/moses/working/training/prepared.7/fr.vcb 
/home/moses/working/training/prepared.7/fr-en-int-train.snt

line 1000
line 2000

...
line 6609000
line 661
ERROR: Execution of: 
/home/moses/working/bin/training-tools/mgizapp/snt2cooc 
/home/moses/working/training/giza-inverse.7/fr-en.cooc 
/home/moses/working/training/prepared.7/en.vcb 
/home/moses/working/training/prepared.7/fr.vcb 
/home/moses/working/training/prepared.7/fr-en-int-train.snt

  died with signal 9, without coredump


Any clue what signal 9 means?



On 04/08/2015 17:28, Barry Haddow wrote:

Hi Vincent

If you are comparing to the results of WMT11, then you can look at 
the system descriptions to see what the authors did. In fact it's 
worth looking at the WMT14 descriptions (WMT15 will be available next 
month) to see how state-of-the-art systems are built.


For fr-en or en-fr, the first thing to look at is the data. There are 
some large data sets released for WMT and you can get a good gain 
from just crunching more data (monolingual and parallel). 
Unfortunately this takes more resources (disk, cpu etc) so you may 
run into trouble here.


The hierarchical models are much bigger so yes you will need more 
disk. For fr-en/en-fr it's probably not worth the extra effort,


cheers - Barry

On 04/08/15 15:58, Vincent Nguyen wrote:

Thanks for your insights.

I am just struck by the BLEU difference between my 26 and the 30 of
WMT11, and some results of WMT14 close to 36 or even 39.

I am currently having trouble with a hierarchical rule set instead of
lexical reordering, wondering if I will get better results, but I get an
error message "filesystem root low disk space" before it crashes.
Does this model take more disk space in some way?

I will next try to use more corpora, including in-domain data from my
internal TMX.


thanks for your answers.

On 04/08/2015 16:02, Hieu Hoang wrote:


On 03/08/2015 13:00, Vincent Nguyen wrote:

Hi,

Just a heads up on some EMS results, to get your experienced 
opinions.


Corpus: Europarlv7 + NC2010
fr = en
Evaluation NC2011.

1) IRSTLM vs KenLM: IRSTLM is much slower for training / tuning.

that sounds right. KenLM is also multithreaded, IRSTLM can only be
used in single-threaded decoding.
2) BLEU results are almost the same (25.7 with Irstlm, 26.14 with 
KenLM)

true
3) Compact Mode is faster than onDisk in a short test (77
segments: 96 seconds vs 126 seconds)

true

4) One last thing I do not understand though:
For the sake of checking, I replaced NC2011 by NC2010 in the
evaluation (I know, since NC2010 is part of the training data, this
should not be relevant). I got roughly the same BLEU score. I would
have expected a higher score with a test set included in the training
corpus.

Makes sense?


Next steps:
What path should I use to get better scores? I read the 'optimize'
section of the website, which deals more with speed; of course I will
apply all of this, but I was interested in tips to get more quality if
possible.

look into domain adaptation if you have multiple training corpora,
some of which is in-domain and some out-of-domain.

Other than that, getting a good BLEU score is an open research question.

Well done on getting this far


Thanks



The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] EMS results - makes sense ?

2015-08-04 Thread Barry Haddow
Hi Vincent

If you are comparing to the results of WMT11, then you can look at the 
system descriptions to see what the authors did. In fact it's worth 
looking at the WMT14 descriptions (WMT15 will be available next month) 
to see how state-of-the-art systems are built.

For fr-en or en-fr, the first thing to look at is the data. There are 
some large data sets released for WMT and you can get a good gain from 
just crunching more data (monolingual and parallel). Unfortunately this 
takes more resources (disk, cpu etc) so you may run into trouble here.

The hierarchical models are much bigger so yes you will need more disk. 
For fr-en/en-fr it's probably not worth the extra effort,

cheers - Barry

On 04/08/15 15:58, Vincent Nguyen wrote:
 thanks for your insights.

 I am just struck by the BLEU difference between my 26 and the 30 of
 WMT11, and some results of WMT14 close to 36 or even 39.

 I am currently having trouble with a hierarchical rule set instead of
 lexical reordering, wondering if I will get better results, but I get an
 error message "filesystem root low disk space" before it crashes.
 Does this model take more disk space in some way?

 I will next try to use more corpora, including in-domain data from my internal TMX.

 thanks for your answers.

 On 04/08/2015 16:02, Hieu Hoang wrote:

 On 03/08/2015 13:00, Vincent Nguyen wrote:
 Hi,

 Just a heads up on some EMS results, to get your experienced opinions.

 Corpus: Europarlv7 + NC2010
 fr = en
 Evaluation NC2011.

 1) IRSTLM vs KenLM: IRSTLM is much slower for training / tuning.
 that sounds right. KenLM is also multithreaded, IRSTLM can only be
 used in single-threaded decoding.
 2) BLEU results are almost the same (25.7 with Irstlm, 26.14 with KenLM)
 true
 3) Compact Mode is faster than onDisk with a short test (77 segments 96
 seconds, vs 126 seconds)
 true
 4) One last thing I do not understand though :
 For sake of checking, I replaced NC2011 by NC2010 in the evaluation (I
 know since NC2010 is part of training, should not be relevant)
 I got roughly the same BLEU score. I would have expected a higher score
 with a test set included in the training corpus.

 makes sense ?


 Next steps :
 What path should I use to get better scores ? I read the 'optimize'
 section of the website which deals more with speed
 and of course I will apply all of this but I was interested in tips to
 get more quality if possible.
 look into domain adaptation if you have multiple training corpora,
 some of which is in-domain and some out-of-domain.

 Other than that, getting a good BLEU score is an open research question.

 Well done on getting this far

 Thanks



 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] blm napalm weights name in ems

2015-08-03 Thread Barry Haddow
Hi John

 Is there a reason the example weight file has this feature name that I’m 
 missing?
My fault I'm afraid. I streamlined bilingual-lm in EMS, but didn't 
realise that the example bypassed tuning. I've fixed it now according to 
your suggestion,

cheers - Barry

On 02/08/15 15:46, John Joseph Morgan wrote:
 Hello,
 I just wanted to give a heads-up that in order to get the ems to run 
 completely with the config.toy.bilinguallm config file I had to make a minor 
 change.
 Instead of tuning, the config file points to a file called 
 weight_bilinguallm.ini.
 I changed the feature name “BilingualNPLM0” to “BLMbilingual-lm”.
 I’m running the same setup now with tuning; I’m hoping the feature names get 
 written correctly.
 Is there a reason the example weight file has this feature name that I’m 
 missing?
 Thanks,
 John



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] blm napalm weights name in ems

2015-08-03 Thread Barry Haddow
I took out the dash - does it work now?

On 03/08/15 18:55, John Morgan wrote:
 Thanks Barry,
 I think there's still a problem with the feature name.
 I think the subroutine get_order_of_scores_from_nbestlist in
 mert-moses.pl does not expect a dash in the feature name.
 John


 On 8/3/15, Barry Haddow bhad...@inf.ed.ac.uk wrote:
 Hi John

 Is there a reason the example weight file has this feature name that I’m
 missing?
 My fault I'm afraid. I streamlined bilingual-lm in EMS, but didn't
 realise that the example bypassed tuning. I've fixed it now according to
 your suggestion,

 cheers - Barry

 On 02/08/15 15:46, John Joseph Morgan wrote:
 Hello,
 I just wanted to give a heads up that in order to get the ems to run
 completely with the config.toy.bilinguallm config file I had to make a
 minor change.
 Instead of tuning the config file points to a file called
 weight_bilinguallm.ini.
 I changed the feature name “BilingualNPLM0” to “BLMbilingual-lm”.
 I’m running the same setup now with tuning, I’m hoping the feature names
 get written correctly.
 Is there a reason the example weight file has this feature name that I’m
 missing?
 Thanks,
 John



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] EMS help

2015-07-31 Thread Barry Haddow

Hi Vincent

The compact phrase table is a direct replacement for the on-disk phrase 
table. Just use processPhraseTableMin as your binarizer (no arguments 
required).
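
i.e. in the EMS config simply:

ttable-binarizer = $moses-bin-dir/processPhraseTableMin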


cheers - Barry

On 30/07/15 21:01, Vincent Nguyen wrote:

Barry,
If I want the end result to be Compact Tables and not OnDisk.
Do  I have to binarize first or can I convert directly to Compact ? 
(ie can I skip the CreateOnDisk stuff)

if so is there a predefined script or should do it manually ?
thanks


On 28/07/2015 15:44, Barry Haddow wrote:

Hi Vincent


I think the quotes are getting stripped off further down the 
pipeline. You could work around by changing to the compact phrase 
table. Or try editing binarize-model.perl to change


safesystem("$RealBin/filter-model-given-input.pl $targetdir 
$input_config /dev/null $hierarchical -nofilter -Binarizer $binarizer") 
|| die "binarising failed";

to

safesystem("$RealBin/filter-model-given-input.pl $targetdir 
$input_config /dev/null $hierarchical -nofilter -Binarizer 
\"$binarizer\"") || die "binarising failed";

Note the escaped quotes around the $binarizer.

cheers - Barry

On 28/07/15 14:09, Vincent Nguyen wrote:

same error:

#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
cd /home/moses/working
echo 'starting at '`date`' on '`hostname`
mkdir -p /home/moses/working/training
mkdir -p /home/moses/working/model
/home/moses/mosesdecoder/scripts/training/binarize-model.perl 
/home/moses/working/model/moses.ini.5 
/home/moses/working/model/moses.bin.ini.6 -Binarizer 
/home/moses/mosesdecoder/bin/CreateOnDiskPt 1 1 4 100 2


echo 'finished at '`date`
touch /home/moses/working/steps/6/TRAINING_binarize-config.6.DONE




On 28/07/2015 14:47, Barry Haddow wrote:

Hi Vincent

It could be a bug. Could you edit 
mosesdecoder/scripts/ems/experiment.meta and change the line:


  template: $binarize-all IN OUT -Binarizer $ttable-binarizer

to

  template: $binarize-all IN OUT -Binarizer "$ttable-binarizer"

Note that I have added quotes. Then you'll have to delete the most 
recent run, and re-run experiment.perl. If it works, fine. If it 
doesn't, could you post the steps/6/TRAINING_binarize-config.6 
script (hopefully I got the name right - you may need to change the 
number)


cheers - Barry


On 28/07/15 13:11, Vincent Nguyen wrote:

I know but this is what I have in my config.basic now:
# conversion of rule table into binary on-disk format
ttable-binarizer = $moses-bin-dir/CreateOnDiskPt 1 1 4 100 2
binarize-all = $moses-script-dir/training/binarize-model.perl

I don't know where else I can add the 5 arguments, or if I need to 
reference ttable-binarizer somewhere.



On 28/07/2015 13:49, Barry Haddow wrote:

Hi Vincent

If you look at the error log, you will see:

Usage: /home/moses/mosesdecoder/bin/CreateOnDiskPt 
numSourceFactors numTargetFactors numScores tableLimit 
sortScoreIndex inputPath outputPath 
You are missing the first 5 arguments to CreateOnDiskPt, as given 
in config.basic.


cheers - Barry

On 28/07/15 12:37, Vincent Nguyen wrote:

I don't know why, but the binarize step crashes, see below 




in my working directory I have 2 subdir,
tuning with inside moses.filtered.ini.5 moses.ini.5 
moses.tuned.ini.5

and
model with inside moses.ini.5 (apparently this one does not 
have the

tuned weights)

those in the tuning subdir : the tuned one moses.tuned.ini.5 
generated
after the moses.ini.5 seems to point on phrase-table.5.gz not 
binarized

and the moses.5.ini seem to point on the binarized within
tuning/filtered.5/...
unclear to me on which one I should use.
If you run EMS, there will be a filtered ini file inside the 
evaluation directory which can be used to translate the test 
set using the tuned weights. However this model is filtered for 
the test set, so you cannot use it on other sentences.


If you want the full model binarised, then you should add:

binarize-all = $moses-script-dir/training/binarize-model.perl

to the [GENERAL] section of the EMS config and rerun EMS. In 
this case the moses.tuned.ini in tuning can be used to 
translate any sentences.





Executing: 
/home/moses/mosesdecoder/scripts/training/filter-model-given-input.pl 
/home/moses/working/model/moses.bin.ini.6.tables 
/home/moses/working/model/moses.ini.5 /dev/null -nofilter 
-Binarizer /home/moses/mosesdecoder/bin/CreateOnDiskPt
Executing: mkdir -p 
/home/moses/working/model/moses.bin.ini.6.tables

Stripping XML...
Executing: 
/home/moses/mosesdecoder/scripts/training/../generic/strip-xml.perl 
 /dev/null  
/home/moses/working/model/moses.bin.ini.6.tables/input.34384
pt:PhraseDictionaryMemory name=TranslationModel0 num-features=4 
path=/home/moses/working/model/phrase-table.5 input-factor=0 
output-factor=0

Considering factor 0
ro:LexicalReordering name=LexicalReordering0 num-features=6 
type=wbe-msd-bidirectional-fe-allff input-factor=0 
output-factor=0 
path=/home/moses/working/model/reordering-table.5.wbe-msd-bidirectional-fe.gz 


Considering

Re: [Moses-support] Contrib Web - translate.cgi

2015-07-30 Thread Barry Haddow
Try using the -b option in the tokenizer / detokenizer to disable buffering.
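
For example, inside translate.cgi the calls would become something like 
(a sketch; -l sets the language and the paths depend on your installation):

$MOSES/scripts/tokenizer/tokenizer.perl -b -l fr
$MOSES/scripts/tokenizer/detokenizer.perl -b -l en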

On 29/07/15 18:47, Vincent Nguyen wrote:
 Hi,

 As is, it was working fine, except the tokenizer / detokenizer perl code
 is outdated.
 It causes problems with the apostrophe in French.

 So I changed the translate.cgi file to run the two perl files from
 moses/scripts/share/tokenizer,
 but it does not work at all.
 Not the same parameters?

 Cheers,
 Vincent
-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] EMS help

2015-07-28 Thread Barry Haddow
Hi Vincent

 2 bugs report :
 in the LM Corpus definition for Europarl : the $pair-extension is
 missing before .$output-extension
 in the step 5 (maybe for others too) generation of the moses.tuned.ini.5
 file there is a missing .gz at the end of phrase-table.5
 in the PhraseDictionaryMemory definition.
These seem OK to me. For europarl, it points to the monolingual corpus, 
and for the phrase table the .gz is implicitly added. Did they not work 
for you?

 in my working directory I have 2 subdir,
 tuning with inside moses.filtered.ini.5  moses.ini.5 moses.tuned.ini.5
 and
 model with inside moses.ini.5 (apparently this one does not have the
 tuned weights)

 those in the tuning subdir : the tuned one moses.tuned.ini.5 generated
 after the moses.ini.5 seems to point on phrase-table.5.gz not binarized
 and the moses.5.ini seem to point on the binarized within
 tuning/filtered.5/...
 unclear to me on which one I should use.
If you run EMS, there will be a filtered ini file inside the evaluation 
directory which can be used to translate the test set using the tuned 
weights. However this model is filtered for the test set, so you cannot 
use it on other sentences.

If you want the full model binarised, then you should add:

binarize-all = $moses-script-dir/training/binarize-model.perl

to the [GENERAL] section of the EMS config and rerun EMS. In this case 
the moses.tuned.ini in tuning can be used to translate any sentences.

 I tried to remove the IGNORE for the Interpolated-LM section
 I am still using KenLM.
 BUT I get a message saying I need to define srilm-dir
 is SRILM mandatory to turn on the interpolated-lm with KenLM only ?
That's right, the interpolated LM uses some code from SRILM. You can 
still use KenLM to create the individual language models, and use KenLM 
during decoding,

cheers - Barry

On 26/07/15 08:36, Vincent Nguyen wrote:
 Hi,

 I worked with the config.basic file

 2 bug reports:
 in the LM Corpus definition for Europarl : the $pair-extension is
 missing before .$output-extension
 in the step 5 (maybe for others too) generation of the moses.tuned.ini.5
 file there is a missing .gz at the end of phrase-table.5
 in the PhraseDictionaryMemory definition.

 Then questions :

 in my working directory I have 2 subdir,
 tuning with inside moses.filtered.ini.5  moses.ini.5 moses.tuned.ini.5
 and
 model with inside moses.ini.5 (apparently this one does not have the
 tuned weights)

 those in the tuning subdir : the tuned one moses.tuned.ini.5 generated
 after the moses.ini.5 seems to point on phrase-table.5.gz not binarized
 and the moses.5.ini seem to point on the binarized within
 tuning/filtered.5/...

 unclear to me on which one I should use.


 Last question :
 I tried to remove the IGNORE for the Interpolated-LM section.
 I am still using KenLM,
 BUT I get a message saying I need to define srilm-dir.
 Is SRILM mandatory to turn on the interpolated-lm with KenLM only?





-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] EMS help

2015-07-28 Thread Barry Haddow

Hi Vincent


I think the quotes are getting stripped off further down the pipeline. 
You could work around by changing to the compact phrase table. Or try 
editing binarize-model.perl to change


safesystem("$RealBin/filter-model-given-input.pl $targetdir 
$input_config /dev/null $hierarchical -nofilter -Binarizer $binarizer") 
|| die "binarising failed";

to

safesystem("$RealBin/filter-model-given-input.pl $targetdir 
$input_config /dev/null $hierarchical -nofilter -Binarizer 
\"$binarizer\"") || die "binarising failed";

Note the escaped quotes around the $binarizer.

cheers - Barry

On 28/07/15 14:09, Vincent Nguyen wrote:

same error:

#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
cd /home/moses/working
echo 'starting at '`date`' on '`hostname`
mkdir -p /home/moses/working/training
mkdir -p /home/moses/working/model
/home/moses/mosesdecoder/scripts/training/binarize-model.perl 
/home/moses/working/model/moses.ini.5 
/home/moses/working/model/moses.bin.ini.6 -Binarizer 
/home/moses/mosesdecoder/bin/CreateOnDiskPt 1 1 4 100 2


echo 'finished at '`date`
touch /home/moses/working/steps/6/TRAINING_binarize-config.6.DONE




On 28/07/2015 14:47, Barry Haddow wrote:

Hi Vincent

It could be a bug. Could you edit 
mosesdecoder/scripts/ems/experiment.meta and change the line:


  template: $binarize-all IN OUT -Binarizer $ttable-binarizer

to

  template: $binarize-all IN OUT -Binarizer "$ttable-binarizer"

Note that I have added quotes. Then you'll have to delete the most 
recent run, and re-run experiment.perl. If it works, fine. If it 
doesn't, could you post the steps/6/TRAINING_binarize-config.6 script 
(hopefully I got the name right - you may need to change the number)


cheers - Barry


On 28/07/15 13:11, Vincent Nguyen wrote:

I know but this is what I have in my config.basic now:
# conversion of rule table into binary on-disk format
ttable-binarizer = $moses-bin-dir/CreateOnDiskPt 1 1 4 100 2
binarize-all = $moses-script-dir/training/binarize-model.perl

I don't know where else I can add the 5 arguments, or if I need to 
reference ttable-binarizer somewhere.



On 28/07/2015 13:49, Barry Haddow wrote:

Hi Vincent

If you look at the error log, you will see:

Usage: /home/moses/mosesdecoder/bin/CreateOnDiskPt 
numSourceFactors numTargetFactors numScores tableLimit 
sortScoreIndex inputPath outputPath 
You are missing the first 5 arguments to CreateOnDiskPt, as given 
in config.basic.


cheers - Barry

On 28/07/15 12:37, Vincent Nguyen wrote:

I don't know why, but the binarize step crashes, see below 




in my working directory I have 2 subdir,
tuning with inside moses.filtered.ini.5  moses.ini.5 
moses.tuned.ini.5

and
model with inside moses.ini.5 (apparently this one does not 
have the

tuned weights)

those in the tuning subdir : the tuned one moses.tuned.ini.5 
generated
after the moses.ini.5 seems to point on phrase-table.5.gz not 
binarized

and the moses.5.ini seem to point on the binarized within
tuning/filtered.5/...
unclear to me on which one I should use.
If you run EMS, there will be a filtered ini file inside the 
evaluation directory which can be used to translate the test set 
using the tuned weights. However this model is filtered for the 
test set, so you cannot use it on other sentences.


If you want the full model binarised, then you should add:

binarize-all = $moses-script-dir/training/binarize-model.perl

to the [GENERAL] section of the EMS config and rerun EMS. In this 
case the moses.tuned.ini in tuning can be used to translate any 
sentences.





Executing: 
/home/moses/mosesdecoder/scripts/training/filter-model-given-input.pl 
/home/moses/working/model/moses.bin.ini.6.tables 
/home/moses/working/model/moses.ini.5 /dev/null  -nofilter 
-Binarizer /home/moses/mosesdecoder/bin/CreateOnDiskPt

Executing: mkdir -p /home/moses/working/model/moses.bin.ini.6.tables
Stripping XML...
Executing: 
/home/moses/mosesdecoder/scripts/training/../generic/strip-xml.perl  
/dev/null  
/home/moses/working/model/moses.bin.ini.6.tables/input.34384
pt:PhraseDictionaryMemory name=TranslationModel0 num-features=4 
path=/home/moses/working/model/phrase-table.5 input-factor=0 
output-factor=0

Considering factor 0
ro:LexicalReordering name=LexicalReordering0 num-features=6 
type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 
path=/home/moses/working/model/reordering-table.5.wbe-msd-bidirectional-fe.gz 


Considering factor 0
Filtering files...
filtering /home/moses/working/model/phrase-table.5 - 
/home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1... 

Executing: ln -s /home/moses/working/model/phrase-table.5.gz 
/home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.gz 


binarizing...
Executing: /home/moses/mosesdecoder/bin/CreateOnDiskPt 
/home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.gz 
/home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.bin 

Usage

Re: [Moses-support] EMS help

2015-07-28 Thread Barry Haddow
Hi Vincent

If you look at the error log, you will see:

 Usage: /home/moses/mosesdecoder/bin/CreateOnDiskPt numSourceFactors 
 numTargetFactors numScores tableLimit sortScoreIndex inputPath outputPath 
You are missing the first 5 arguments to CreateOnDiskPt, as given in 
config.basic.

cheers - Barry

On 28/07/15 12:37, Vincent Nguyen wrote:
 I don't know why, but the binarize step crashes, see below 


 in my working directory I have 2 subdir,
 tuning with inside moses.filtered.ini.5  moses.ini.5 
 moses.tuned.ini.5
 and
 model with inside moses.ini.5 (apparently this one does not have the
 tuned weights)

 those in the tuning subdir : the tuned one moses.tuned.ini.5 
 generated
 after the moses.ini.5 seems to point on phrase-table.5.gz not binarized
 and the moses.5.ini seem to point on the binarized within
 tuning/filtered.5/...
 unclear to me on which one I should use.
 If you run EMS, there will be a filtered ini file inside the 
 evaluation directory which can be used to translate the test set 
 using the tuned weights. However this model is filtered for the test 
 set, so you cannot use it on other sentences.

 If you want the full model binarised, then you should add:

 binarize-all = $moses-script-dir/training/binarize-model.perl

 to the [GENERAL] section of the EMS config and rerun EMS. In this 
 case the moses.tuned.ini in tuning can be used to translate any 
 sentences.



 Executing: 
 /home/moses/mosesdecoder/scripts/training/filter-model-given-input.pl 
 /home/moses/working/model/moses.bin.ini.6.tables 
 /home/moses/working/model/moses.ini.5 /dev/null  -nofilter -Binarizer 
 /home/moses/mosesdecoder/bin/CreateOnDiskPt
 Executing: mkdir -p /home/moses/working/model/moses.bin.ini.6.tables
 Stripping XML...
 Executing: 
 /home/moses/mosesdecoder/scripts/training/../generic/strip-xml.perl  
 /dev/null  /home/moses/working/model/moses.bin.ini.6.tables/input.34384
 pt:PhraseDictionaryMemory name=TranslationModel0 num-features=4 
 path=/home/moses/working/model/phrase-table.5 input-factor=0 
 output-factor=0
 Considering factor 0
 ro:LexicalReordering name=LexicalReordering0 num-features=6 
 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 
 path=/home/moses/working/model/reordering-table.5.wbe-msd-bidirectional-fe.gz 

 Considering factor 0
 Filtering files...
 filtering /home/moses/working/model/phrase-table.5 - 
 /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1...
 Executing: ln -s /home/moses/working/model/phrase-table.5.gz 
 /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.gz
 binarizing...
 Executing: /home/moses/mosesdecoder/bin/CreateOnDiskPt 
 /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.gz 
 /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.bin
 Usage: /home/moses/mosesdecoder/bin/CreateOnDiskPt numSourceFactors 
 numTargetFactors numScores tableLimit sortScoreIndex inputPath outputPath
 Exit code: 1
 Can't binarize at 
 /home/moses/mosesdecoder/scripts/training/filter-model-given-input.pl 
 line 417.
 Exit code: 1
 binarising failed at 
 /home/moses/mosesdecoder/scripts/training/binarize-model.perl line 43.



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] EMS help

2015-07-28 Thread Barry Haddow
Hi Vincent

It could be a bug. Could you edit 
mosesdecoder/scripts/ems/experiment.meta and change the line:

   template: $binarize-all IN OUT -Binarizer $ttable-binarizer

to

   template: $binarize-all IN OUT -Binarizer "$ttable-binarizer"

Note that I have added quotes. Then you'll have to delete the most 
recent run, and re-run experiment.perl. If it works, fine. If it 
doesn't, could you post the steps/6/TRAINING_binarize-config.6 script 
(hopefully I got the name right - you may need to change the number)

cheers - Barry


On 28/07/15 13:11, Vincent Nguyen wrote:
 I know but this is what I have in my config.basic now:
 # conversion of rule table into binary on-disk format
 ttable-binarizer = $moses-bin-dir/CreateOnDiskPt 1 1 4 100 2
 binarize-all = $moses-script-dir/training/binarize-model.perl

 I don't know where else I can add the 5 arguments, or if I need to reference 
 ttable-binarizer somewhere.


 On 28/07/2015 13:49, Barry Haddow wrote:
 Hi Vincent

 If you look at the error log, you will see:

 Usage: /home/moses/mosesdecoder/bin/CreateOnDiskPt numSourceFactors 
 numTargetFactors numScores tableLimit sortScoreIndex inputPath 
 outputPath 
 You are missing the first 5 arguments to CreateOnDiskPt, as given in 
 config.basic.

 cheers - Barry

 On 28/07/15 12:37, Vincent Nguyen wrote:
 I don't know why, but the binarization crashes - see below.


 in my working directory I have 2 subdirs:
 tuning, containing moses.filtered.ini.5, moses.ini.5 and
 moses.tuned.ini.5,
 and
 model, containing moses.ini.5 (apparently this one does not have the
 tuned weights).

 Of those in the tuning subdir: the tuned one, moses.tuned.ini.5,
 generated after moses.ini.5, seems to point at phrase-table.5.gz (not
 binarized), and the moses.5.ini seems to point at the binarized one
 within tuning/filtered.5/...
 It is unclear to me which one I should use.
 If you run EMS, there will be a filtered ini file inside the 
 evaluation directory which can be used to translate the test set 
 using the tuned weights. However this model is filtered for the 
 test set, so you cannot use it on other sentences.

 If you want the full model binarised, then you should add:

 binarize-all = $moses-script-dir/training/binarize-model.perl

 to the [GENERAL] section of the EMS config and rerun EMS. In this 
 case the moses.tuned.ini in tuning can be used to translate any 
 sentences.



 Executing: 
 /home/moses/mosesdecoder/scripts/training/filter-model-given-input.pl 
 /home/moses/working/model/moses.bin.ini.6.tables 
 /home/moses/working/model/moses.ini.5 /dev/null  -nofilter 
 -Binarizer /home/moses/mosesdecoder/bin/CreateOnDiskPt
 Executing: mkdir -p /home/moses/working/model/moses.bin.ini.6.tables
 Stripping XML...
 Executing: 
 /home/moses/mosesdecoder/scripts/training/../generic/strip-xml.perl 
 < /dev/null > 
 /home/moses/working/model/moses.bin.ini.6.tables/input.34384
 pt:PhraseDictionaryMemory name=TranslationModel0 num-features=4 
 path=/home/moses/working/model/phrase-table.5 input-factor=0 
 output-factor=0
 Considering factor 0
 ro:LexicalReordering name=LexicalReordering0 num-features=6 
 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 
 path=/home/moses/working/model/reordering-table.5.wbe-msd-bidirectional-fe.gz
  

 Considering factor 0
 Filtering files...
 filtering /home/moses/working/model/phrase-table.5 -> 
 /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1...

 Executing: ln -s /home/moses/working/model/phrase-table.5.gz 
 /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.gz 

 binarizing...
 Executing: /home/moses/mosesdecoder/bin/CreateOnDiskPt 
 /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.gz 
 /home/moses/working/model/moses.bin.ini.6.tables/phrase-table.0-0.1.1.bin 

 Usage: /home/moses/mosesdecoder/bin/CreateOnDiskPt numSourceFactors 
 numTargetFactors numScores tableLimit sortScoreIndex inputPath 
 outputPath
 Exit code: 1
 Can't binarize at 
 /home/moses/mosesdecoder/scripts/training/filter-model-given-input.pl line 
 417.
 Exit code: 1
 binarising failed at 
 /home/moses/mosesdecoder/scripts/training/binarize-model.perl line 43.






-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Problem in translation

2015-07-28 Thread Barry Haddow

Hi Fatma

I don't see any error in the file. What do you mean, the output was 
wrong?


cheers - Barry

On 28/07/15 19:13, fatma elzahraa Eltaher wrote:

Dear All,

I tried to build a model but I got the attached error file. Does this 
mean that there is a problem in the model? I tested it on words from the 
training data and the output was wrong.


kindly find the attached file.

thank you,



Fatma El-Zahraa El -Taher

Teaching Assistant at the Computer & Systems Department

 Faculty of Engineering, Azhar University

Email: fatmaelta...@gmail.com
mobile: +201141600434



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Performance issue using Moses Server with Moses 3 (probably same as Oren's)

2015-07-24 Thread Barry Haddow

Hi Martin

Thanks for the detailed information. It's a bit strange since 
command-line Moses uses the same threadpool, and we always overload the 
threadpool since the entire test set is read in and queued.


The server was refactored somewhat recently - which git revision are you 
using?


In the case where Moses takes a long time, and cpu activity is low, it 
could be either waiting on IO, or waiting on locks. If the former, I 
don't know why it works fine for command-line Moses, and if the latter 
then it's odd how it eventually frees itself.


Is it possible to run scenario 2, then attach a debugger whilst Moses is 
in the low-CPU phase to see what it is doing? (You can do this in gdb 
with info threads)
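Something along these lines, assuming the process is called mosesserver:

    gdb -p $(pgrep mosesserver)    # attach to the running server
    (gdb) info threads             # list every thread and its state
    (gdb) thread apply all bt      # backtrace of all threads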


cheers - Barry

On 24/07/15 12:07, Martin Baumgärtner wrote:

Hi,

followed your discussion about mosesserver performance issue with much 
interest so far.


We're having similar behaviour in our performance tests with a current 
github master clone. Both mosesserver and the complete engine run on the 
same local machine, i.e. no NFS. The machine is virtualized CentOS 7 
using Hyper-V:


$ lscpu

Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):8
On-line CPU(s) list:   0-7
Thread(s) per core:1
Core(s) per socket:8
Socket(s): 1
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 30
Model name:Intel(R) Core(TM) i7 CPU 860  @ 2.80GHz
Stepping:  5
CPU MHz:   2667.859
BogoMIPS:  5335.71
Hypervisor vendor: Microsoft
Virtualization type:   full
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  8192K


The following experiments use an engine with 75000 segments for TM/LM 
(--minphr-memory, --minlexr-memory):


1.)
server: --threads: 8
client: shoots 8 threads = about 12 seconds, server shows full CPU 
workload = OK


2.)
server: --threads: 8
client: shoots 10 threads = about 85 seconds, server shows mostly low 
activity, full CPU workload only near end of process = NOT OK


3.)
server: --threads: 16
client: shoots 10 threads = about 12 seconds, server shows busy CPU 
workload = OK


4.)
server: --threads: 16
client: shoots 16 threads = about 11 seconds, server shows busy CPU 
workload = OK


5.)
server: --threads: 16
client: shoots 20 threads = about 40-60 seconds (depending), server 
shows mostly low activity, full CPU workload only near end of process 
= NOT OK



We've always seen a breakdown in performance when the client threads 
exceed the number of threads given by the --threads param.


Kind regards,
Martin

-- 

*STAR Group* http://www.star-group.net

*Martin Baumgärtner*

STAR Language Technology & Solutions GmbH
Umberto-Nobile-Straße 19 | 71063 Sindelfingen | Germany
Tel. +49 70 31-4 10 92-0 | martin.baumgaert...@star-group.net
Fax +49 70 31-4 10 92-70 | www.star-group.net
Geschäftsführer: Oliver Rau, Bernd Barth
Handelsregister Stuttgart HRB 245654 | St.-Nr. 56098/11677



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Performance issue using Moses Server with Moses 3 (probably same as Oren's)

2015-07-24 Thread Barry Haddow

Hi Martin

So it looks like it was the abyss connection limit that was causing the 
problem? I'm not sure why this should be, either it should queue the 
jobs up or discard them.


Probably Moses server should allow users to configure the number of 
abyss connections directly rather than tying it to the number of Moses 
threads.


cheers - Barry

On 24/07/15 14:17, Martin Baumgärtner wrote:

Hi Barry,

thanks for your quick reply!

We're currently testing on SHA 
e53ad4085942872f1c4ce75cb99afe66137e1e17 (master, from 2015-07-23). 
This version includes the fix for mosesserver recently mentioned by 
Hieu in the performance thread.


Following my first intuition, I re-ran the critical experiments after 
modifying mosesserver.cpp to simply double the given --threads value, 
but only for the Abyss server: .maxConn((unsigned int)numThreads*2):


2.)
server: --threads: 8 (i.e. abyss: 16)
client: shoots 10 threads = about 11 seconds, server shows busy CPU 
workload = OK


5.)
server: --threads: 16 (i.e. abyss: 32)
client: shoots 20 threads = about 11 seconds, server shows busy CPU 
workload = OK


Helps. :-)
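For anyone wanting to reproduce this, the change amounts to a one-liner 
in mosesserver.cpp - a sketch based on the description above (variable 
and option names may differ slightly in the actual source):

    // Let Abyss accept twice as many connections as decoder threads,
    // so surplus requests queue up instead of stalling the server.
    // numThreads holds the value of the --threads option.
    xmlrpc_c::serverAbyss myAbyssServer(
        xmlrpc_c::serverAbyss::constrOpt()
            .registryP(&myRegistry)
            .portNumber(port)
            .maxConn((unsigned int)numThreads * 2));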

Best wishes,
Martin

On 24.07.2015 at 13:26, Barry Haddow wrote:

Hi Martin

Thanks for the detailed information. It's a bit strange since 
command-line Moses uses the same threadpool, and we always overload 
the threadpool since the entire test set is read in and queued.


The server was refactored somewhat recently - which git revision are 
you using?


In the case where Moses takes a long time, and cpu activity is low, 
it could be either waiting on IO, or waiting on locks. If the former, 
I don't know why it works fine for command-line Moses, and if the 
latter then it's odd how it eventually frees itself.


Is it possible to run scenario 2, then attach a debugger whilst Moses 
is in the low-CPU phase to see what it is doing? (You can do this in 
gdb with info threads)


cheers - Barry

On 24/07/15 12:07, Martin Baumgärtner wrote:

Hi,

followed your discussion about mosesserver performance issue with 
much interest so far.


We're having similar behaviour in our performance tests with a 
current github master clone. Both mosesserver and the complete engine 
run on the same local machine, i.e. no NFS. The machine is virtualized 
CentOS 7 using Hyper-V:


$ lscpu

Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):8
On-line CPU(s) list:   0-7
Thread(s) per core:1
Core(s) per socket:8
Socket(s): 1
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 30
Model name:Intel(R) Core(TM) i7 CPU 860  @ 2.80GHz
Stepping:  5
CPU MHz:   2667.859
BogoMIPS:  5335.71
Hypervisor vendor: Microsoft
Virtualization type:   full
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  8192K


The following experiments use an engine with 75000 segments for TM/LM 
(--minphr-memory, --minlexr-memory):


1.)
server: --threads: 8
client: shoots 8 threads = about 12 seconds, server shows full CPU 
workload = OK


2.)
server: --threads: 8
client: shoots 10 threads = about 85 seconds, server shows mostly 
low activity, full CPU workload only near end of process = NOT OK


3.)
server: --threads: 16
client: shoots 10 threads = about 12 seconds, server shows busy CPU 
workload = OK


4.)
server: --threads: 16
client: shoots 16 threads = about 11 seconds, server shows busy CPU 
workload = OK


5.)
server: --threads: 16
client: shoots 20 threads = about 40-60 seconds (depending), server 
shows mostly low activity, full CPU workload only near end of 
process = NOT OK



We've always seen a breakdown in performance when the client threads 
exceed the number of threads given by the --threads param.


Kind regards,
Martin

-- 

*STAR Group* http://www.star-group.net

*Martin Baumgärtner*

STAR Language Technology & Solutions GmbH
Umberto-Nobile-Straße 19 | 71063 Sindelfingen | Germany
Tel. +49 70 31-4 10 92-0 | martin.baumgaert...@star-group.net
Fax +49 70 31-4 10 92-70 | www.star-group.net
Geschäftsführer: Oliver Rau, Bernd Barth
Handelsregister Stuttgart HRB 245654 | St.-Nr. 56098/11677



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support




The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


-- 

*STAR Group* http://www.star-group.net

*Martin Baumgärtner*

STAR Language Technology & Solutions GmbH
Umberto-Nobile-Straße 19 | 71063 Sindelfingen | Germany
Tel. +49 70 31-4 10 92-0 | martin.baumgaert...@star-group.net
Fax +49 70 31-4 10 92-70

Re: [Moses-support] Performance issues using Moses Server with Moses 3

2015-07-23 Thread Barry Haddow

Hi Oren

You can fit a lot of model in 50G RAM. It's worth looking at the compact 
phrase and reordering models, pruning options, and kenlm quantisation, 
and you might fit the models on one machine.


As to the slowdown, a delay of 20s or so suggests that it's waiting on 
I/O, or perhaps xmlrpc-c is queueing up requests or connections.


cheers - Barry

On 23/07/15 15:06, Oren wrote:

More details...

Our NFS is mounted at /mnt/storage .

The command to run moses server is:

/mnt/storage/Common/mosesdecoder3/bin/mosesserver -f 
/mnt/storage/Common/models/translation_models/moses3_model/mert-work/moses.ini 
-mark-unknown -xml-input exclusive -threads 18


Here is the complete moses.ini (excluding comments):


[input-factors]
0

[mapping]
0 T 0

[distortion-limit]
6

[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryMemory name=TranslationModel0 num-features=4 
path=/mnt/storage/Common/models/translation_models/moses3_model/train/model/phrase-table.gz 
input-factor=0 output-factor=0
LexicalReordering name=LexicalReordering0 num-features=6 
type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 
path=/mnt/storage/Common/models/translation_models/moses3_model/train/model/reordering-table.wbe-msd-bidirectional-fe.gz

Distortion
KENLM lazyken=0 name=LM0 
path=/mnt/storage/Common/models/language_models/langmod.5gram.blm order=5


[threads]
6

[weight]

LexicalReordering0= 0.0268311 -0.0146878 0.0261305 0.0380759 
0.0118265 0.07479

Distortion0= 0.074665
LM0= 0.0972206
WordPenalty0= -0.18469
PhrasePenalty0= -0.212528
TranslationModel0= 0.0184105 0.091358 0.112618 0.0161682
UnknownWordPenalty0= 1


On Wednesday, July 22, 2015, Oren mooshif...@gmail.com 
wrote:


Yes, we use a lot of RAM in our setup. But the improved response
time justifies it.

Our language model is on an NFS, but we've been working this way with
Moses 1 for some time with no problems (ten different machines
using the same language model file over NFS). Same for the
reordering model. Neither of them is in memory. Loading these
models into memory would mean raising our already excessive RAM
requirements...

Thanks again for the help.

On Wednesday, July 22, 2015, Barry Haddow
bhad...@staffmail.ed.ac.uk wrote:

Hi Oren

I'm not aware of any threading problems with
PhraseDictionaryMemory, but not many people use it since it
takes up too much memory. Moses command line runs
multi-threaded using a thread pool.

Is your language model on a local file system? Running it over
nfs can be bad. What about your reordering model? Are you
using the compact or the memory version?

cheers - Barry

On 22/07/15 15:09, Oren wrote:

We have no swapping issues. I was asking if the use of
an in-memory translation model might cause multithreading
problems.


I'm not sure how to replicate the problem on cmd moses, since
it's purely a multithreading issue. Can you run cmd moses
multithreaded?

I can't attach the complete moses.ini because it's on a
separate network... But I copied below the stuff that looked
relevant. I also tried to change the [threads] setting to 18,
with no apparent effect.

[input-factors]
0

[mapping]
0 T 0

[distortion-limit]
6

[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryMemory name=TranslationModel0 num-features=4
path=path/phrase-table.gz input-factor=0 output-factor=0
LexicalReordering parameters
Distortion
KENLM lazyken=0 name=LM0 path=path order=5

[threads]
6

[weight]

weight parameters

On Tuesday, July 21, 2015, Hieu Hoang hieuho...@gmail.com
wrote:

is it possible you can make your moses.ini file available
for us to see?

do you know if the same problem occurs if you use the
command line moses, rather than mosesserver?


Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 21 July 2015 at 18:07, Barry Haddow
bhad...@staffmail.ed.ac.uk wrote:


On 21/07/15 14:51, Oren wrote:

I am using the in-memory mode, using about 50GB of
RAM. (No swap issues as far as I can tell.) Could
that cause issues?


Yes, swapping would definitely cause issues - was
that your question?




I looked at the commit you linked to, but it doesn't
seem to be something configurable beyond the
-threads switch. Am I missing something?


The commit enables

Re: [Moses-support] Performance issues using Moses Server with Moses 3

2015-07-22 Thread Barry Haddow

Hi Oren

I'm not aware of any threading problems with PhraseDictionaryMemory, but 
not many people use it since it takes up too much memory. Moses command 
line runs multi-threaded using a thread pool.


Is your language model on a local file system? Running it over nfs can 
be bad. What about your reordering model? Are you using the compact or 
the memory version?


cheers - Barry

On 22/07/15 15:09, Oren wrote:
We have no swapping issues. I was asking if the use of an in-memory 
translation model might cause multithreading problems.



I'm not sure how to replicate the problem on cmd moses, since it's 
purely a multithreading issue. Can you run cmd moses multithreaded?


I can't attach the complete moses.ini because it's on a separate 
network... But I copied below the stuff that looked relevant. I also 
tried to change the [threads] setting to 18, with no apparent effect.


[input-factors]
0

[mapping]
0 T 0

[distortion-limit]
6

[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryMemory name=TranslationModel0 num-features=4 
path=path/phrase-table.gz input-factor=0 output-factor=0

LexicalReordering parameters
Distortion
KENLM lazyken=0 name=LM0 path=path order=5

[threads]
6

[weight]

weight parameters

On Tuesday, July 21, 2015, Hieu Hoang hieuho...@gmail.com 
wrote:


is it possible you can make your moses.ini file available for us
to see?

do you know if the same problem occurs if you use the command line
moses, rather than mosesserver?


Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 21 July 2015 at 18:07, Barry Haddow
bhad...@staffmail.ed.ac.uk wrote:


On 21/07/15 14:51, Oren wrote:

I am using the in-memory mode, using about 50GB of RAM. (No
swap issues as far as I can tell.) Could that cause issues?


Yes, swapping would definitely cause issues - was that your
question?




I looked at the commit you linked to, but it doesn't seem to
be something configurable beyond the -threads switch. Am
I missing something?


The commit enables you to set the maximum number of
connections to be the same as the maximum number of threads.



On Tuesday, July 21, 2015, Barry Haddow
bhad...@staffmail.ed.ac.uk wrote:

Hi Oren

Does your host have 18 threads available? It could also
be that xmlrpc-c is limiting the number of connections -
this can now be configured:

https://github.com/moses-smt/mosesdecoder/commit/b3baade7f022edbcea2969679a40616683f63523

Slowdowns in Moses are often caused by disk access
bottlenecks. You can use --minphr-memory and
--minlexr-memory to make sure that the phrase and
reordering tables are loaded into memory, rather than
being accessed on-demand. Make sure your host has enough
RAM and is not swapping. As I mentioned before there are
various ways to make your models smaller
(http://www.statmt.org/moses/?n=Advanced.RuleTables),
which can make a big difference to speed depending on
your setup.

cheers - Barry

On 21/07/15 09:30, Oren wrote:

Hi Barry,

Thanks for the quick response.

I added the switch -threads 18 to the command to raise
moses server. The slowness issue persists but in a
different form. Most requests return right away, even
under heavy load, but some requests (about 5%) take far
longer - about 15-20 seconds.

Perhaps there are other relevant switches?

Thanks again.

On Monday, July 20, 2015, Barry Haddow
bhad...@staffmail.ed.ac.uk wrote:

Hi Oren

The threading model is different. In v1, the server
created a new thread for every request, v3 uses a
thread pool. Try increasing the number of threads.

Also, make sure you use the compact phrase table and
KenLM as they are normally faster, and pre-pruning
your phrase table can help,

cheers - Barry

On 20/07/15 12:01, Oren wrote:

Hi all,

We are in the process of migrating from Moses 1 to
Moses 3. We have noticed a significant slowdown
when sending many requests at once to Moses Server.
The first request will actually finish about 25%
faster than a single request using Moses 1, but as
more requests accumulate there is a marked
slowdown, until requests take 5 times longer or more.

Is this a known issue? Is it specific to Moses
Server? What can we do

Re: [Moses-support] Performance issues using Moses Server with Moses 3

2015-07-21 Thread Barry Haddow

Hi Oren

Does your host have 18 threads available? It could also be that xmlrpc-c 
is limiting the number of connections - this can now be configured:

https://github.com/moses-smt/mosesdecoder/commit/b3baade7f022edbcea2969679a40616683f63523

Slowdowns in Moses are often caused by disk access bottlenecks. You can 
use --minphr-memory and --minlexr-memory to make sure that the phrase 
and reordering tables are loaded into memory, rather than being accessed 
on-demand. Make sure your host has enough RAM and is not swapping. As I 
mentioned before there are various ways to make your models smaller 
(http://www.statmt.org/moses/?n=Advanced.RuleTables), which can make a 
big difference to speed depending on your setup.
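As a concrete sketch, with illustrative file names (these are the 
standard Moses tools for the compact formats and KenLM binarisation):

    # compact phrase table and compact lexical reordering table
    ~/mosesdecoder/bin/processPhraseTableMin -in phrase-table.gz \
        -out phrase-table -nscores 4 -threads 8
    ~/mosesdecoder/bin/processLexicalTableMin \
        -in reordering-table.wbe-msd-bidirectional-fe.gz \
        -out reordering-table -threads 8
    # binarised, quantised KenLM (8 bits per probability)
    ~/mosesdecoder/bin/build_binary -q 8 trie langmod.5gram.arpa langmod.5gram.blm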


cheers - Barry

On 21/07/15 09:30, Oren wrote:

Hi Barry,

Thanks for the quick response.

I added the switch -threads 18 to the command to raise moses server. 
The slowness issue persists but in a different form. Most requests 
return right away, even under heavy load, but some requests (about 5%) 
take far longer - about 15-20 seconds.


Perhaps there are other relevant switches?

Thanks again.

On Monday, July 20, 2015, Barry Haddow bhad...@staffmail.ed.ac.uk 
wrote:


Hi Oren

The threading model is different. In v1, the server created a new
thread for every request, v3 uses a thread pool. Try increasing
the number of threads.

Also, make sure you use the compact phrase table and KenLM as they
are normally faster, and pre-pruning your phrase table can help,

cheers - Barry

On 20/07/15 12:01, Oren wrote:

Hi all,

We are in the process of migrating from Moses 1 to Moses 3. We
have noticed a significant slowdown when sending many requests at
once to Moses Server. The first request will actually finish
about 25% faster than a single request using Moses 1, but as more
requests accumulate there is a marked slowdown, until requests
take 5 times longer or more.

Is this a known issue? Is it specific to Moses Server? What can
we do about it?

Thanks!

Oren.


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support




___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Performance issues using Moses Server with Moses 3

2015-07-21 Thread Barry Haddow


On 21/07/15 14:51, Oren wrote:
I am using the in-memory mode, using about 50GB of RAM. (No swap 
issues as far as I can tell.) Could that cause issues?


Yes, swapping would definitely cause issues - was that your question?




I looked at the commit you linked to, but it doesn't seem to be 
something configurable beyond the -threads switch. Am I missing something?


The commit enables you to set the maximum number of connections to be 
the same as the maximum number of threads.




On Tuesday, July 21, 2015, Barry Haddow bhad...@staffmail.ed.ac.uk 
wrote:


Hi Oren

Does your host have 18 threads available? It could also be that
xmlrpc-c is limiting the number of connections - this can now be
configured:

https://github.com/moses-smt/mosesdecoder/commit/b3baade7f022edbcea2969679a40616683f63523

Slowdowns in Moses are often caused by disk access bottlenecks.
You can use --minphr-memory and --minlexr-memory to make sure that
the phrase and reordering tables are loaded into memory, rather
than being accessed on-demand. Make sure your host has enough RAM
and is not swapping. As I mentioned before there are various ways
to make your models smaller
(http://www.statmt.org/moses/?n=Advanced.RuleTables), which can
make a big difference to speed depending on your setup.

cheers - Barry

On 21/07/15 09:30, Oren wrote:

Hi Barry,

Thanks for the quick response.

I added the switch -threads 18 to the command to raise moses
server. The slowness issue persists but in a different form. Most
requests return right away, even under heavy load, but some
requests (about 5%) take far longer - about 15-20 seconds.

Perhaps there are other relevant switches?

Thanks again.

On Monday, July 20, 2015, Barry Haddow
bhad...@staffmail.ed.ac.uk wrote:

Hi Oren

The threading model is different. In v1, the server created a
new thread for every request, v3 uses a thread pool. Try
increasing the number of threads.

Also, make sure you use the compact phrase table and KenLM as
they are normally faster, and pre-pruning your phrase table
can help,

cheers - Barry

On 20/07/15 12:01, Oren wrote:

Hi all,

We are in the process of migrating from Moses 1 to Moses 3.
We have noticed a significant slowdown when sending many
requests at once to Moses Server. The first request will
actually finish about 25% faster than a single request using
Moses 1, but as more requests accumulate there is a marked
slowdown, until requests take 5 times longer or more.

Is this a known issue? Is it specific to Moses Server? What
can we do about it?

Thanks!

Oren.


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support




___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support




The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Performance issues using Moses Server with Moses 3

2015-07-20 Thread Barry Haddow

Hi Oren

The threading model is different. In v1, the server created a new thread 
for every request, v3 uses a thread pool. Try increasing the number of 
threads.


Also, make sure you use the compact phrase table and KenLM as they are 
normally faster, and pre-pruning your phrase table can help,


cheers - Barry

On 20/07/15 12:01, Oren wrote:

Hi all,

We are in the process of migrating from Moses 1 to Moses 3. We have 
noticed a significant slowdown when sending many requests at once to 
Moses Server. The first request will actually finish about 25% faster 
that a single request using Moses 1, but as more requests accumulate 
there is a marked slowdown, until requests take 5 times longer or more.


Is this a known issue? Is it specific to Moses Server? What can we do 
about it?


Thanks!

Oren.


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Call for papers: Deep Machine Translation Workshop 2015

2015-06-29 Thread Barry Haddow
(Sent on behalf of Jan Hajic)

We cordially invite you to take part in the first Deep Machine 
Translation Workshop, which will take place in Prague, Czech Republic, 
on 3rd-4th September 2015.

https://ufal.mff.cuni.cz/events/deep-machine-translation-workshop

This is the first workshop on Deep Machine Translation. Its aim is to 
bring together researchers and students working on machine translation 
approaches and technology using deep understanding (not necessarily 
using Deep Neural Networks, as the name might suggest, but certainly not 
excluding them either). Adding more linguistics has long been 
considered a possible way to boost the quality of current, mainly 
(PB)SMT-based systems. However, there are many ways to do so, and it was 
felt that a forum was needed where experience can be shared among people 
working on such systems.

Moreover, we welcome submissions on any aspects of deep language 
analysis, generation and natural language understanding, even if the 
connection to machine translation might be indirect.

Finally we welcome submissions on query translation and other aspects of 
multilingual Question Answering (such as an NLP interface to an IT 
helpdesk) and/or Cross-lingual Information Retrieval.

We would like to attract submissions also from running or past EU 
projects on MT (QT21, HimL, QTLeap, TraMOOC, MMT, Khresmoi, KConnect, 
...) to share their experience about pursuing higher quality in MT - 
even if they do not use linguistic aspects and features directly.

Papers on original and unpublished research are welcome on any of the 
topics listed above in general, and specifically on any of the following:

- General approaches to the use of linguistic knowledge for Machine 
Translation
- Semantics for Machine Translation
- Combination of statistical and manual approaches to Machine 
Translation, hybrid systems
- Innovative use of manually built lexical resources in Machine 
Translation (monolingual, bilingual)
- Deep linguistic representation of meaning / semantics, including 
semantic graphs, logical representation, temporal and spatial 
representation and grounding
- Deep linguistic analysis and generation
- Joint linguistic and distributional modeling (analysis, generation, 
transfer)
- Analysis, generation and transfer using graph-based meaning representation
- Incorporating co-reference, named entity recognition, words sense 
disambiguation, or any other linguistically motivated features into the 
MT chain
- Multilingual question-answering and CLIR approaches, including 
specific methods for query translation and query matching in a 
multilingual setting
- Evaluation methods for standard text translation, query translation, 
and CLIR

Schedule:
- CFP released: June 26, 2015
- Submission deadline: July 20, 2015
- Announcement of acceptance: August 12, 2015
- Camera Ready due: August 27, 2015
- Workshop dates: September 3-4, 2015

Venue:
Institute of Formal and Applied Linguistics, Faculty of Mathematics and 
Physics, Charles University in Prague
Malostranske nam. 25
11800 Prague 1
Czech Republic

The maximum submission length is 8 pages (A4), plus two extra pages for 
references, following a one-column ACL-like format, as specified on the 
workshop webpage.

Papers shall be submitted in English. As the reviewing will be 
double-blind, papers must be anonymized with regard to the authors 
and/or their institution (no author-identifying information on the title 
page nor anywhere in the paper), including referencing style as usual. 
Authors should also ensure that identifying meta-information is removed 
from files submitted for review. Papers must conform to official DMTW 
2015 style guidelines, which are available on the workshop webpage. 
Submission and reviewing will be managed online by the EasyChair system. 
The only accepted format for submitted papers is Adobe PDF.

Papers that are being submitted in parallel to other conferences or 
workshops must indicate this on the title page. Papers that contain 
significant overlap with previously published work must also signal that.

Papers will be published online by the time of the Workshop, assigned an 
ISBN as regular proceedings published by the UFAL / Charles University 
publishing house, and listed in the ACL Anthology.

Mode of presentation will be decided by the Program Committee based on 
the submitted papers - either as an oral presentation or as a poster, 
based on suitability for the given presentation mode, not quality; all 
papers will be given the same space in the proceedings, and there will 
be no other distinction in the proceedings between research papers 
presented orally vs. as posters, either. Papers will be reviewed by at 
least three members of the Program Committee.

Program Committee:
- Jan Hajic (chair)
- António Branco (co-chair)
- Eneko Agirre
- Martin Popel
- Gertjan van Noord
- Aljoscha Burchardt
- Kiril Simov
- Petya Osenova
- Rosa Del Gaudio
- Eva Hajicova
- Khalil Sima'an
- Dekai Wu
- Deyi Xiong

Re: [Moses-support] How to re-run tuning using EMS

2015-06-22 Thread Barry Haddow
Just remove steps/1/TUNING_tune.1.DONE (replacing 1 with your experiment 
id) and then re-run.


It would be nice if EMS supported multiple tuning runs without 
intervention, but afaik it doesn't.


On 22/06/15 16:15, Lane Schwartz wrote:
Given a successful run of EMS, what do I need to do to configure a new 
run that re-uses all of the training, but re-runs tuning?


Thanks,
Lane



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Different phrase tables with same dataset

2015-06-17 Thread Barry Haddow

Hi Davood

From line 20113 onwards there's a whole bunch of error messages 
indicating that the giza alignment didn't run properly, so then the 
resulting phrase extraction didn't work. I can't actually see why giza 
failed though - possibly the corpus was not preprocessed correctly. I'm 
not familiar with the arabic tool chain,


cheers - Barry

On 16/06/15 18:24, Davood Mohammadifar wrote:

Thanks Barry.

I attached the log file. The file reports two training phases (after (9) 
create moses.ini, the second training report has been appended).


I executed the following command for both:

nohup nice 
/home/hieu/workspace/github/mosesdecoder/scripts/training/train-model.perl 
-mgiza -mgiza-cpus 2 -parallel -sort-batch-size 253 -sort-compress 
gzip -root-dir /home/hieu/train -corpus 
/home/hieu/corpus/training/training.clean -f fa -e en -alignment 
grow-diag-final-and -reordering msd-bidirectional-fe -lm 
0:3:/home/hieu/lm/training.blm.en:8 -external-bin-dir 
/home/hieu/workspace/github/mosesdecoder/tools




Is there any error or unusual thing in it?


Date: Tue, 16 Jun 2015 13:01:10 +0100
From: bhad...@staffmail.ed.ac.uk
To: davood...@hotmail.com; moses-support@mit.edu
Subject: Re: [Moses-support] Different phrase tables with same dataset

Hi Davood

It isn't normal to get such large differences in phrase table size or 
quality, on the same data set, although small variations are possible. 
You should check carefully that you used exactly the same settings in 
each run, and check if anything went wrong during training (errors in 
the log file),


cheers - Barry

On 16/06/15 12:00, Davood Mohammadifar wrote:

Hello everyone

 I used Moses 3 for training my parallel corpus. I got different
 BLEU scores (18.5-22.5), so I tried to find the reason. Finally, I
 realised that the phrase tables are different from each other. I
 trained 50k parallel sentences; the size of the phrase table was
 about 39MB (gz format) the first time and about 59MB (gz format)
 the second time. The phrase tables' contents are also somewhat
 different (in scores and entries).

 I used MGIZA and followed the instructions for the baseline system in
 the Moses manual. The problem remained when using GIZA++, too.

 The problem also remained when training 15 sentences.

 Is a different phrase table size normal?

Thank you


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support




The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Different phrase tables with same dataset

2015-06-17 Thread Barry Haddow
Do you think that my medium system is adequate? (Core i5 2400, 4GB 
RAM, Ubuntu 32bit 14.04). Of course I wanted to train about 50k 
sentences. 
For a small data set of 50k sentences, this should work. You could try 
on 10k sentences to be sure.



On 17/06/15 13:46, Davood Mohammadifar wrote:

Thanks a lot Barry

 I do not think the problem is related to the Persian side of the 
 corpus, because my problem remains when I run with the French/English 
 sample corpus (its link is in the Moses manual). Based on your 
 comments, I think that I should check whether the truecasing, recasing 
 and cleaning tools work properly in preprocessing.


 Do you think that my medium system is adequate? (Core i5 2400, 4GB 
 RAM, Ubuntu 32bit 14.04). Of course I wanted to train about 50k 
 sentences.



Date: Wed, 17 Jun 2015 12:34:59 +0100
From: bhad...@staffmail.ed.ac.uk
To: davood...@hotmail.com; moses-support@mit.edu
Subject: Re: [Moses-support] Different phrase tables with same dataset

Hi Davood

From line 20113 onwards there's a whole bunch of error messages 
indicating that the giza alignment didn't run properly, so then the 
resulting phrase extraction didn't work. I can't actually see why giza 
failed though - possibly the corpus was not preprocessed correctly. 
I'm not familiar with the arabic tool chain,


cheers - Barry

On 16/06/15 18:24, Davood Mohammadifar wrote:

Thanks Barry.

I attached the log file. The file reports two training phases (after
(9) create moses.ini, the second training report has been
appended).

I executed the following command for both:

nohup nice
/home/hieu/workspace/github/mosesdecoder/scripts/training/train-model.perl
-mgiza -mgiza-cpus 2 -parallel -sort-batch-size 253 -sort-compress
gzip -root-dir /home/hieu/train -corpus
/home/hieu/corpus/training/training.clean -f fa -e en -alignment
grow-diag-final-and -reordering msd-bidirectional-fe -lm
0:3:/home/hieu/lm/training.blm.en:8 -external-bin-dir
/home/hieu/workspace/github/mosesdecoder/tools



Is there any error or unusual thing in it?


Date: Tue, 16 Jun 2015 13:01:10 +0100
From: bhad...@staffmail.ed.ac.uk
To: davood...@hotmail.com; moses-support@mit.edu
Subject: Re: [Moses-support] Different phrase tables with same dataset

Hi Davood

It isn't normal to get such large differences in phrase table size
or quality, on the same data set, although small variations are
possible. You should check carefully that you used exactly the
same settings in each run, and check if anything went wrong during
training (errors in the log file),

cheers - Barry

On 16/06/15 12:00, Davood Mohammadifar wrote:

Hello everyone

I used Moses 3 for training my parallel corpus. I got
different BLEU scores (18.5-22.5), so I tried to find the
reason. Finally, I realised that the phrase tables are different
from each other. I trained 50k parallel sentences; the
size of the phrase table was about 39MB (gz format) the first
time and about 59MB (gz format) the second time.
The phrase tables' contents are also somewhat different (in
scores and entries).

I used MGIZA and followed the instructions for the baseline system
in the Moses manual. The problem remained when using GIZA++, too.

The problem also remained when training 15 sentences.

Is a different phrase table size normal?

Thank you


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support





The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Different phrase tables with same dataset

2015-06-16 Thread Barry Haddow

Hi Davood

It isn't normal to get such large differences in phrase table size or 
quality, on the same data set, although small variations are possible. 
You should check carefully that you used exactly the same settings in 
each run, and check if anything went wrong during training (errors in 
the log file),


cheers - Barry

On 16/06/15 12:00, Davood Mohammadifar wrote:

Hello everyone

 I used Moses 3 for training my parallel corpus. I got different 
 BLEU scores (18.5-22.5), so I tried to find the reason. Finally, I 
 realised that the phrase tables are different from each other. I 
 trained 50k parallel sentences; the size of the phrase table was 
 about 39MB (gz format) the first time and about 59MB (gz format) the 
 second time. The phrase tables' contents are also somewhat different 
 (in scores and entries).


 I used MGIZA and followed the instructions for the baseline system in 
 the Moses manual. The problem remained when using GIZA++, too.


 The problem also remained when training 15 sentences.

 Is a different phrase table size normal?

Thank you


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Come work at the birthplace of the Moses decoder for a year or more to bring MT to the next level!

2015-05-26 Thread Barry Haddow

Hi All

The deadline for this has been extended to June 2nd. We are looking for 
a research associate to join the Edinburgh SMT group, initially for 12 
months; this could be extended if current funding applications are 
successful.


The advert mentions specific projects, but we can be quite flexible. We 
have several projects going on at the moment 
(http://www.statmt.org/ued/?n=Public.Projects) and are looking for 
someone interested in MT with strong software development skills.


Feel free to drop myself or Uli a line if you want to know more,

cheers - Barry

On 13/05/15 01:33, Ulrich Germann wrote:
We have an open position for a research associate in machine 
translation for 12 months initially, with the possibility of extension 
(depending on funding).


https://www.vacancies.ed.ac.uk/pls/corehrrecruit/erq_jobspec_version_4.jobspec?p_id=033241

- Uli


--
Ulrich Germann
Senior Researcher
School of Informatics
University of Edinburgh


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Error compiling Moses

2015-05-21 Thread Barry Haddow

Hi Marius

It looks like you're missing the bz2 package. Try installing libbz2-dev 
(on debian-based systems) or bzip2-devel (rpm-based systems).


You're also using your own boost installation, as opposed to the system 
one. Usually it's easier to use the system one as the correct 
dependencies will be there,
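On a Debian-based system, for example, that would be something like:

    sudo apt-get install libbz2-dev libboost-all-dev
    ./bjam -j8    # with the system boost, no --with-boost needed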


cheers - Barry

On 21/05/15 09:04, Marius Oliver Gheorghita wrote:

Hi,
Please advise about the error that I get when compiling Moses. Thank 
you very much in anticipation,

The exact command that I executed when getting this error is:

./bjam --with-boost=~/home/pangeanic/MariusMoses//boost_1_58_0 -j8

All the best,
Marius Gheorghita


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Error compiling Moses

2015-05-21 Thread Barry Haddow

Hi Marius

 No such file or directory while opening lm/europarl.srilm.gz

This file is missing. Note that you have a relative path, so you have to 
be in the correct directory,


cheers - Barry

On 21/05/15 12:56, Marius Oliver Gheorghita wrote:

Hi Barry,
Thanks again for the previous help. After compiling I have run Moses 
for the first time and I get this error:


Exception: util/file.cc:68 in int util::OpenReadOrThrow(const char*) 
threw ErrnoException because `-1 == (ret = open(name, 00))'.

No such file or directory while opening lm/europarl.srilm.gz

Can you help me identify the source of the error? Thanks so much.
Cheers,
Marius


*From:* Barry Haddow bhad...@staffmail.ed.ac.uk
*To:* Marius Oliver Gheorghita redwir...@yahoo.com; moses-support@mit.edu

*Sent:* Thursday, 21 May 2015, 12:56
*Subject:* Re: [Moses-support] Error compiling Moses

Hi Marius

It looks like you're missing the bz2 package. Try installing 
libbz2-dev (on debian-based systems) or bzip2-devel (rpm-based systems).


You're also using your own boost installation, as opposed to the 
system one. Usually it's easier to use the system one as the correct 
dependencies will be there,


cheers - Barry



On 21/05/15 09:04, Marius Oliver Gheorghita wrote:

Hi,
Please advise about the error that I get when compiling Moses. Thank 
you very much in anticipation,

The exact command that I executed when getting this error is:

./bjam --with-boost=~/home/pangeanic/MariusMoses//boost_1_58_0 -j8

All the best,
Marius Gheorghita


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support



The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Problem with qsub when running experiments

2015-05-07 Thread Barry Haddow

Hi Carla

I don't think your second error is qsub-related, you need to look at 
filterphrases.err to see what is going on.


For the EMS errors, I can't really see why it is detecting that you have 
a cluster. As a workaround, I would suggest that you comment out the 
call to detect_if_cluster() (line 44 in current master) in 
experiment.perl and see if that helps,
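i.e., something like this around line 44 of experiment.perl (the exact 
line and call syntax may differ in your checkout):

    # &detect_if_cluster();   # disabled: stop EMS treating this
                              # multicore host as a cluster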


cheers - Barry

On 06/05/15 14:59, carla.pa...@hermestrans.com wrote:

Hi again,

this is the command I use: ~/mosesdecoder/scripts/ems/experiment.perl 
-config config.toy -exec


 I have also tried a modification of the configuration file 
 config.placeables. This is when the problem started. To make sure 
 that it was not the config file, I tried with an older config file 
 from another experiment which had finished without problems (just 
 changing the path for the experiment). That older experiment, which 
 used to work, also failed this time with the same error.


I attach both files, so that you can see them.

I don't know if it might help, but I also tried to run the experiment 
step-by-step and at decoding I also got an error:


/home/hermesta/mosesdecoder/scripts/training/mert-moses.pl 
/home/hermesta/Exps/KES_newDev_placeholders/data/dev/KES10.dev.preproc.tok.true.en 
/home/hermesta/Exps/KES_newDev_placeholders/data/dev/KES10.dev.preproc.tok.true.es 
/home/hermesta/mosesdecoder/bin/moses 
/home/hermesta/Exps/KES_newDev_placeholders/working/train/model/moses.ini 
--mertdir /home/hermesta/mosesdecoder/bin/


Using SCRIPTS_ROOTDIR: /home/hermesta/mosesdecoder/scripts
filtering the phrase tables... Wed May  6 15:58:15 CEST 2015
exec: 
/home/hermesta/mosesdecoder/scripts/training/filter-model-given-input.pl 
./filtered 
/home/hermesta/Exps/KES_newDev_placeholders/working/train/model/moses.ini 
/home/hermesta/Exps/KES_newDev_placeholders/data/dev/KES10.dev.preproc.tok.true.en 

Executing: 
/home/hermesta/mosesdecoder/scripts/training/filter-model-given-input.pl 
./filtered 
/home/hermesta/Exps/KES_newDev_placeholders/working/train/model/moses.ini 
/home/hermesta/Exps/KES_newDev_placeholders/data/dev/KES10.dev.preproc.tok.true.en 
 > filterphrases.out 2> filterphrases.err

Exit code: 1
ERROR: Failed to run 
'/home/hermesta/mosesdecoder/scripts/training/filter-model-given-input.pl 
./filtered 
/home/hermesta/Exps/KES_newDev_placeholders/working/train/model/moses.ini 
/home/hermesta/Exps/KES_newDev_placeholders/data/dev/KES10.dev.preproc.tok.true.en'. 
at /home/hermesta/mosesdecoder/scripts/training/mert-moses.pl line 1719.


When checking line 1719 of mert-moses.pl, I realized it is also 
related to qsub.


Thank you so much!
Carla

On 06.05.2015 15:27, Barry Haddow wrote:

Hi Carla

Not sure what's going on, and no reason why things should change when
you installed asiya. Something else must have changed.

Could you post your EMS config file, and the exact command you use to 
run EMS?


cheers - Barry

On 06/05/15 13:32, carla.pa...@hermestrans.com wrote:

Hi Barry,

thanks for your prompt reply. If I am not wrong the name of the 
server is hermesta-Z10PE-D8-WS (I have taken it from the machine 
information, I attach a screenshot). If I should look somewhere else 
please let me know.


Thanks,
Carla

On 06.05.2015 14:10, Barry Haddow wrote:

Hi Carla

What's your server called?

There's a hard-coded list of Edinburgh machines in ems, so I'm
wondering if it collides with one of them,

cheers - Barry

On 06/05/15 12:56, carla.pa...@hermestrans.com wrote:

Hi everyone,

First of all, thanks for reading and hopefully giving me some useful
pointer. I am running several SMT experiments on an Ubuntu 
machine. It
is a multicore machine, but I have commented out the options for 
running

experiments on multicore machines in the config.file.

Up to last week, I was able to run experiments without problems.
However, since yesterday I get the error:

Can't exec qsub: No such file or directory at
/home/hermesta/mosesdecoder/scripts/ems/experiment.perl line 1291.

Does anyone know what could be going on? The only thing I did was
installing asiya to assess MT output. My guess is that somehow MOSES
detects that it is a multicore machine and tries to parallelize jobs.
However, I don't understand why this was not happening last week, for
instance. I have also tried to update MOSES by running git pull 
and I
also reinstalled MOSES hoping this would fix the problem. My 
background

is linguistics, and thus I am a bit lost now.

Thank you very much,

Carla Parra Escartín
Marie Curie ER - EXPERT ITN
Hermes Traducciones


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336

Re: [Moses-support] lattice MBR - what happened to my nbest features?

2015-05-07 Thread Barry Haddow
Hi Jeremy

It probably won't be hard - but I haven't looked at the code in a long time.

Note that we didn't see any gain from lmbr. We only ever implemented it as 
reranking - not lattice rescoring - but according to the original paper that 
should still help.

Cheers - Barry 
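(For context: the map and w scores come from the linearised corpus BLEU 
gain of Tromble et al. (2008), roughly

    G(E, E') = \theta_0 |E'| + \sum_{n=1}^{4} \theta_n \sum_{u \in N_n} \#_u(E') \delta_u(E)

where \#_u(E') counts occurrences of the n-gram u in the hypothesis E' 
and \delta_u(E) is 1 if u occurs in the evidence E, 0 otherwise. The five 
values after w: in the n-best list below would then be the component 
scores that this gain combines.)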

On 7 May 2015 19:14:18 BST, Jeremy Gwinnup jer...@gwinnup.org wrote:
How hard would it be to append the LMBR scores to the list of features
instead of overwriting it? Maybe I can tackle this at MTMA15 next week.
I’m not too worried about the long runtime at least initially.

 On May 6, 2015, at 5:01 PM, Barry Haddow bhad...@inf.ed.ac.uk
wrote:
 
 Hi Jeremy
 
 It's been a long time since I looked at this, but I think these are
the component scores in the linearised corpus bleu used in Lattice MBR
(see Tromble et al (2008), section 5). They are used to rescore the
nbest list, and must be implemented by replacing the original feature
vectors.
 
 So it looks like lattice MBR doesn't work with nbest lists, or at
least doesn't give you what you need for tuning. Do you really want to
tune with lattice MBR? It's going to be very slow,
 
 cheers - Barry
 
 On 06/05/15 20:49, Jeremy Gwinnup wrote:
 Hi,
 
 I’ve been attempting to experiment with lattice MBR with various
settings and I see something weird happen to my nbest lists:
 
 0 ||| the prime ministers of india and japan meet in tokyo ||| Distortion0= … etc etc

 becomes

 0 ||| the prime ministers of india and japan meet in tokyo ||| map: 0 w: 12 11.19 8.06 6.63 5.39
 
 My feature weights get replaced by a map feature and a w feature with 5 weights
 
 I’m setting the following as moses.ini parameters:
 lminimum-bayes-risk
 lmbr-p
 lmbr-r
 mbr-scale
 lmbr-pruning-factor
 
 Anything extra that I need to know, such as what map and w are, or
how to get my normal features back in the nbest list?
 
 Thanks!
 -Jeremy
 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support
 
 
 -- 
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.
 

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Fwd: Fwd: Server development

2015-05-06 Thread Barry Haddow

Hi Tomas

The thread pool fixes the issue you mention, but it was never backported 
to v2.1.1. If you can move to v3, that would be the best way to go,


cheers - Barry

On 06/05/15 08:38, Tomas Fulajtar wrote:


Hi Barry,


  Thanks for the explanation - I was referring to the "[Moses-support]
  mosesserver parallelization issue" thread in April 2014:
  https://www.mail-archive.com/search?l=moses-support@mit.edu&q=subject:%22%5C%5BMoses%5C-support%5C%5D+mosesserver+parallelization+issue%22&o=newest


  Do you think the thread pool would also be the solution for this
  issue reported on the 2.1.1 branch?


  Thanks,

Tomas

*From:*Barry Haddow [mailto:bhad...@inf.ed.ac.uk]
*Sent:* Tuesday, May 5, 2015 9:27 PM
*To:* Hieu Hoang; moses-support
*Subject:* Re: [Moses-support] Fwd: Fwd: Server development

HI Tomas

There were some issues in v2 with the way that caching was done in the 
binarised phrase table. It used a cache per thread, and mosesserver 
used a thread per request, so caching was effectively broken in the 
server. Since last Autumn, mosesserver uses a thread pool ... and the 
binarised phrase table is gone now anyway,


cheers - Barry



On 05/05/15 18:27, Hieu Hoang wrote:

What limitations are you referring to?

-- Forwarded message --
From: Ulrich Germann ulrich.germ...@gmail.com
Date: 5 May 2015 19:49
Subject: [Moses-support] Fwd: Server development
To: moses-support@mit.edu
Cc:

This response was meant to go to moses-support as well, Tomas.

-- Forwarded message --
From: *Tomas Fulajtar* toma...@moravia.com
Date: Fri, Apr 3, 2015 at 5:03 PM
Subject: RE: [Moses-support] Server development
To: ugerm...@inf.ed.ac.uk

Hi Ulrich,

Thanks for the thorough explanation -  the idea of merging the
server code back to moses is great.

Apart from this (and I know it is a huge workload), were there any
changes in the thread support? I know this part had some
limitations – as discussed on the forum.

Kind regards,

Tomas

*From:* Ulrich Germann [mailto:ulrich.germ...@gmail.com]
*Sent:* Thursday, April 2, 2015 12:57 AM
*To:* Tomas Fulajtar
*Subject:* Re: [Moses-support] Server development

Hi Tomas,

the plan is to fold server capabilities into the main moses
executable. In fact, that has already happened (in the sense that
you can run the main moses executable in server mode), but
functional equivalence with the old code has not been tested.

There are currently no server tests included in the regression
tests, so I left the old code mostly intact (adjusting only for
changes in the API of functions called) for legacy reasons, but
adding new functionality to mosesserver is extremely strongly
DIScouraged.

Supplying regression tests for server functionality, on the other
hand, is equally strongly ENcouraged. In a nutshell, what you get
back from calling mosesserver and moses --server should be identical.

The long-term plan is to offer through RPC calls (almost)
everything that moses offers in batch mode (i.e., send search and
output parameters through JSON/RPC calls and have them noticed and
respected). Notice the “long-term” there.

So mosesserver is on its way out, and moses --server-port=port
--server will replace the old call to mosesserver.
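
(For anyone trying the new entry point, a minimal client sketch in
Python – the translate method and its "text" field follow the
documented mosesserver XML-RPC interface; the host, port and input
text are placeholders:)

  import xmlrpc.client

  # Assumes a server started with something like:
  #   moses -f moses.ini --server --server-port 8080
  proxy = xmlrpc.client.ServerProxy("http://localhost:8080/RPC2")

  # "translate" takes a struct with a "text" field and returns one too.
  result = proxy.translate({"text": "der schnelle braune fuchs"})
  print(result["text"])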

Best regards - Uli

On Wed, Apr 1, 2015 at 9:48 AM, Tomas Fulajtar
toma...@moravia.com mailto:toma...@moravia.com wrote:

Dear all,

I have spotted that there were numerous commits in the server-side
development – could the developers share the news/goals with
the forum? I think it might be interesting for more users,
especially those outside the core team.

Thank you,

*Tomáš Fulajtár* | Researcher
*T:* +420-545-552-340
toma...@moravia.com | http://www.moravia.com/ | *Skype:* tomasfulajtar


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support



-- 
Ulrich Germann
Senior Researcher
School of Informatics
University of Edinburgh



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

Re: [Moses-support] Problem with qsub when running experiments

2015-05-06 Thread Barry Haddow
Hi Carla

Not sure what's going on – there's no reason why things should change 
just because you installed asiya. Something else must have changed.

Could you post your EMS config file, and the exact command you use to 
run EMS?

cheers - Barry

On 06/05/15 13:32, carla.pa...@hermestrans.com wrote:
 Hi Barry,

 thanks for your prompt reply. If I am not wrong, the name of the server
 is hermesta-Z10PE-D8-WS (I have taken it from the machine
 information; I attach a screenshot). If I should look somewhere else,
 please let me know.

 Thanks,
 Carla

 El 06.05.2015 14:10, Barry Haddow escribió:
 Hi Carla

 What's your server called?

 There's a hard-coded list of Edinburgh machines in ems, so I'm
 wondering if it collides with one of them,

 cheers - Barry

 On 06/05/15 12:56, carla.pa...@hermestrans.com wrote:
 Hi everyone,

 First of all, thanks for reading and hopefully giving me some useful
 pointers. I am running several SMT experiments on an Ubuntu machine. It
 is a multicore machine, but I have commented out the options for
 running experiments on multicore machines in the config.file.

 Up to last week, I was able to run experiments without problems.
 However, since yesterday I get the error:

 Can't exec qsub: No existe el archivo o el directorio (“No such file
 or directory”) at
 /home/hermesta/mosesdecoder/scripts/ems/experiment.perl line 1291.

 Does anyone know what could be going on? The only thing I did was
 install asiya to assess MT output. My guess is that somehow MOSES
 detects that it is a multicore machine and tries to parallelize jobs.
 However, I don't understand why this was not happening last week, for
 instance. I have also tried to update MOSES by running git pull and I
 also reinstalled MOSES hoping this would fix the problem. My background
 is linguistics, and thus I am a bit lost now.

 Thank you very much,

 Carla Parra Escartín
 Marie Curie ER - EXPERT ITN
 Hermes Traducciones


 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Problem with qsub when running experiments

2015-05-06 Thread Barry Haddow
Hi Carla

What's your server called?

There's a hard-coded list of Edinburgh machines in ems, so I'm wondering 
if it collides with one of them,
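
(For anyone hitting the same error, two quick checks along these lines
may help; the paths assume a standard ~/mosesdecoder checkout, and
experiment.machines is the file where EMS keeps that machine list –
worth verifying against your own checkout:)

  # Is qsub actually installed? EMS only execs it when it thinks
  # it is running on a cluster head node.
  which qsub || echo "qsub not found on PATH"

  # Does your hostname (or a prefix of it) appear in the hard-coded list?
  hostname
  cat ~/mosesdecoder/scripts/ems/experiment.machines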

cheers - Barry

On 06/05/15 12:56, carla.pa...@hermestrans.com wrote:
 Hi everyone,

 First of all, thanks for reading and hopefully giving me some useful
 pointers. I am running several SMT experiments on an Ubuntu machine. It
 is a multicore machine, but I have commented out the options for running
 experiments on multicore machines in the config.file.

 Up to last week, I was able to run experiments without problems.
 However, since yesterday I get the error:

 Can't exec qsub: No existe el archivo o el directorio (“No such file
 or directory”) at
 /home/hermesta/mosesdecoder/scripts/ems/experiment.perl line 1291.

 Does anyone know what could be going on? The only thing I did was
 install asiya to assess MT output. My guess is that somehow MOSES
 detects that it is a multicore machine and tries to parallelize jobs.
 However, I don't understand why this was not happening last week, for
 instance. I have also tried to update MOSES by running git pull and I
 also reinstalled MOSES hoping this would fix the problem. My background
 is linguistics, and thus I am a bit lost now.

 Thank you very much,

 Carla Parra Escartín
 Marie Curie ER - EXPERT ITN
 Hermes Traducciones


 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

