[Moses-support] Extract list of n-grams from Trie Language Model that contains a certain word

2016-06-03 Thread Graeme Kidd
Hi,

This is all still very new to me, so apologies if this is not the correct
place to ask this question.

I want to take the English trie language model (5.5 TB) built from the
Common Crawl data set:

http://data.statmt.org/ngrams/lm/en.trie

I then want to extract all n-grams that contain a certain word, and to do
this for a list of 100 words. For example, if I were looking for all n-grams
containing the word "discombobulated", I would want an output file listing
each n-gram that contains that word and the number of times that n-gram
occurs:

word1 discombobulated 25

word1 discombobulated word3 40

Given the size of the file, I am keen to get this right the first time.
Could someone give me an example of how this can be done, and would this
kind of query be feasible with 64 GB of RAM?
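
A minimal sketch of the kind of filter described above, for illustration
only. It assumes the n-grams are already available as plain text, one per
line, with the count as the last whitespace-separated field (as in the
example output); it does not read the binary en.trie file itself. The
function name filter_ngrams is hypothetical. Because it streams line by
line, memory use depends on the size of the word list rather than on the
size of the input.

```python
def filter_ngrams(lines, targets):
    """Yield (ngram, count) for every line whose n-gram contains a target word.

    Assumption: each line looks like 'w1 w2 ... wn COUNT'.
    """
    targets = set(targets)
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines (need at least one word and a count).
        if len(fields) < 2 or not fields[-1].isdigit():
            continue
        *words, count = fields
        # Keep the n-gram if any of its words is in the target list.
        if targets.intersection(words):
            yield " ".join(words), int(count)


if __name__ == "__main__":
    sample = [
        "word1 discombobulated 25",
        "word1 discombobulated word3 40",
        "hello world 7",
    ]
    for ngram, count in filter_ngrams(sample, ["discombobulated"]):
        print(ngram, count)
```

To run it over a large text dump, the same function can be given an open
file object or sys.stdin instead of the in-memory sample list.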

Thanks,

Graeme

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Phrase align given word alignments

2016-06-03 Thread Rajen Chatterjee
Hi Motoki,

What do you mean by word alignments from a monolingual corpus?

I assume you have word alignments for a parallel corpus and want to
extract phrase pairs using your alignment file. If so, have a look at
http://www.statmt.org/moses/?n=FactoredTraining.TrainingParameters
Scroll down to "if you may have your own method to generate a word
alignment, you want to skip  ."
You have to pass "--first-step 4" and "--alignment-file
/your-alignment-file" to train-model.perl.
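
To illustrate what a phrase extraction step does with alignment points such
as "0-0 0-1 1-2", here is a simplified sketch of the textbook consistency
check: a source span and a target span form a phrase pair only if no word
inside either span is aligned to a word outside the other. This is not the
Moses extraction code; the names parse_alignment, extract_phrases and
max_len are hypothetical, and the usual extension over unaligned words is
left out.

```python
def parse_alignment(line):
    """Parse '0-0 0-1 1-2' (commas tolerated) into a set of (src, tgt) index pairs."""
    points = set()
    for token in line.replace(",", " ").split():
        src, tgt = token.split("-")
        points.add((int(src), int(tgt)))
    return points


def extract_phrases(src_words, tgt_words, points, max_len=7):
    """Return phrase pairs consistent with the alignment points."""
    pairs = []
    for s_start in range(len(src_words)):
        for s_end in range(s_start, min(s_start + max_len, len(src_words))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in points if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start >= max_len:
                continue
            # Reject the pair if a target word in the span is aligned
            # to a source word outside the source span.
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for (s, t) in points):
                continue
            pairs.append((" ".join(src_words[s_start:s_end + 1]),
                          " ".join(tgt_words[t_start:t_end + 1])))
    return pairs


if __name__ == "__main__":
    points = parse_alignment("0-0, 0-1, 1-2")
    for pair in extract_phrases(["x", "y"], ["p", "q", "r"], points):
        print(pair)
```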



On Fri, Jun 3, 2016 at 12:42 AM, Motoki Wu wrote:

> Hi,
>
> I have the word alignments (e.g. 0-0, 0-1, 1-2...) from a monolingual
> corpus using another program.
>
> I'd like to only use the phrase align part of the Moses pipeline directly.
>
> Is it possible to do so?
>
> Thanks
>
> - Motoki



-- 
-Regards,
 Rajen Chatterjee.


Re: [Moses-support] mosesserver and placeholder compatibility with tree-based models

2016-06-03 Thread Vito Mandorino
Hi Hieu,

There is also the mosesserver issue, so I cannot say that is the only
difficulty with using tree-based models. If you can have a look, it would be
great anyway. Speed is also an issue, though much less so for hierarchical
than for syntactic models in our tests.

Thank you and best regards,
Vito

2016-06-02 14:44 GMT+02:00 Hieu Hoang:

> I believe placeholders can be used, but no one has really tried. If that's
> the only thing holding you back from using hiero/syntax models, I can look
> into it for you.
>
> However, there are likely to be other issues, such as speed and memory
> consumption, which may make these models unsuitable for commercial use.
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 2 June 2016 at 11:48, Vito Mandorino wrote:
>
>> Dear all,
>>
>> I am exploring hierarchical and syntactic models and I have two questions:
>>
>> 1. Is it possible to decode using mosesserver instead of moses_chart ?
>> According to  " /mosesdecoder/bin/mosesserver --help "  one can choose
>> different --search-algorithm values. I tried for instance, for a
>> target-syntax model,
>>
>> /mosesdecoder/bin/mosesserver -f moses.ini --search-algorithm 6
>>
>> but got no translation in return (and actually no error either).
>>
>> 2. Is it possible to use placeholders with a hierarchical or syntactic
>> model?
>> I tried something like
>>
>> echo '@num@ test' |
>> /mosesdecoder/bin/moses_chart -f moses.ini -xml-input inclusive
>>
>> which got me a Segmentation Fault. Is there any additional preprocessing
>> step needed to make them work together?
>>
>>
>> Thank you,
>>
>> Vito
>>
>> --
>> M. Vito MANDORINO -- Chief Scientist
>> The Translation Trustee
>> 1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
>> Tel: +33 1 30 44 04 23   Mobile: +33 6 84 65 68 89
>> Email: vito.mandor...@linguacustodia.com
>> Website: www.linguacustodia.finance

