Re: [Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-09 Thread Hèctor Alòs i Font
Yes, my own experience is that more or less simultaneous update of the dictionaries is the quickest option. I usually work on a spreadsheet with words in decreasing order of frequency, and I write a script that reads it and generates the XML code for inserting in the dictionaries. It's quick and
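A minimal sketch of the spreadsheet-to-XML workflow described above, not the script actually used. It assumes the spreadsheet is exported as tab-separated text with hypothetical columns `freq`, `src`, `tgt`, `pos` and `paradigm`. The words and paradigm names in `SAMPLE` are placeholders, and the whole lemma is used as the stem, which a real monodix entry would usually refine. Multiword lemmas (which need `<b/>` for spaces) are not handled.

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr


def bidix_entry(src, tgt, pos):
    """Build one bilingual-dictionary <e> element for a lemma pair."""
    tag = f"<s n={quoteattr(pos)}/>"
    return f"<e><p><l>{escape(src)}{tag}</l><r>{escape(tgt)}{tag}</r></p></e>"


def monodix_entry(lemma, paradigm):
    """Build one monolingual-dictionary <e> element.

    Assumption: the full lemma doubles as the stem inside <i>.
    """
    return (
        f"<e lm={quoteattr(lemma)}>"
        f"<i>{escape(lemma)}</i><par n={quoteattr(paradigm)}/></e>"
    )


def rows_by_frequency(tsv_text):
    """Parse the exported spreadsheet and sort rows by decreasing frequency."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sorted(reader, key=lambda row: int(row["freq"]), reverse=True)


# Placeholder data: column names, words and paradigm names are hypothetical.
SAMPLE = (
    "freq\tsrc\ttgt\tpos\tparadigm\n"
    "12\tsrc_word_b\ttgt_word_b\tn\tplaceholder_b__n\n"
    "40\tsrc_word_a\ttgt_word_a\tn\tplaceholder_a__n\n"
)

if __name__ == "__main__":
    for row in rows_by_frequency(SAMPLE):
        # Source-language monodix entry, then the matching bidix entry.
        print(monodix_entry(row["src"], row["paradigm"]))
        print(bidix_entry(row["src"], row["tgt"], row["pos"]))
```

Generating the monodix and bidix entries from the same row keeps the two dictionaries in step, which is the point of updating them simultaneously.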

Re: [Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-09 Thread Sevilay Bayatlı
Hi Anuradha, You need to update your proposal based on what Hèctor suggested. Yes, it is better to work on both the monodix and the bidix simultaneously, but for a good lexicon you need to take a small corpus, analyze the sentences, and add words. Sevilay On Thu, Apr 8, 2021 at 9:24 AM Anuradha

Re: [Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-08 Thread Anuradha Pandey
Thank you for your response, Hèctor. I read the proposal for the Hindi-Bengali translator. There aren't open-source dictionaries for the Bhojpuri language (though there are resources for getting a Bhojpuri corpus), so I was using a hardcopy of a BHO-HIN dictionary for manually adding the pairs. I

Re: [Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-07 Thread Hèctor Alòs i Font
Hi, Anuradha. Thanks for your proposal draft. First, I would like to tell you that if Apertium is a rule-based translation system, it is because this paradigm still makes sense for many languages (indeed, for the vast majority of them). If Bhojpuri has extensive electronic language resources and,

Re: [Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-07 Thread Kevin Brubeck Unhammer
Rajarshi Roychoudhury wrote: > Bhojpuri and Hindi are very closely related language pairs > As far as I know (correct me if I am wrong), apart from some minor > phonetic changes they can be considered identical pairs. Seems like a good fit for Apertium then :) considering one of the most

Re: [Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-07 Thread Rajarshi Roychoudhury
# in the grammar On Wed, Apr 7, 2021, 19:34 Rajarshi Roychoudhury wrote: > Please give an example where CFG vary significantly in the 2 languages > > On Wed, Apr 7, 2021, 19:25 Anuradha Pandey > wrote: > >> Yes, I did look into the constraint grammar and the two languages vary >> significantly

Re: [Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-07 Thread Rajarshi Roychoudhury
Please give an example where the CFG varies significantly between the two languages. On Wed, Apr 7, 2021, 19:25 Anuradha Pandey wrote: > Yes, I did look into the constraint grammar and the two languages vary > significantly though lemmas in Bhojpuri are mostly an extension to those in > Hindi. So what would

Re: [Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-07 Thread Anuradha Pandey
Yes, I did look into the constraint grammar, and the two languages vary significantly, though lemmas in Bhojpuri are mostly an extension of those in Hindi. So what would you suggest? Should I translate it to Marathi instead? Since in terms of linguistics, I am proficient in Hindi, English, Marathi,

Re: [Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-07 Thread Rajarshi Roychoudhury
Bhojpuri and Hindi are a very closely related language pair. As far as I know (correct me if I am wrong), apart from some minor phonetic changes they can be considered identical. Have you tried building disambiguation rules? What are their structures? On Wed, Apr 7, 2021, 18:57 Anuradha

[Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-07 Thread Anuradha Pandey
Hello everyone, I am Anuradha Pandey, a sophomore student at BITS Pilani. I am interested in participating in GSoC 2021, on the project "*Develop a prototype MT system for a strategic language pair*". I have prepared a rough draft for the same and I am planning to build Bhojpuri(BHO)-Hindi(HIN)