Hi Sankarshan,

> I will take a look at this over the weekend (I am between cities and
> it is a bit difficult for me now). Is the code for the implementation
> available as well?

Sure, not a problem; we can discuss more over the weekend. Sampark is a government-funded project and, as far as I know, the code for the implementation is not available. We can look into the details.

> The initial idea was to check if a system like Moses (Statistical MT)
> could be enhanced to be able to handle translation of content at
> scale. I'd look forward to what you think is possible.

We can start by looking at how Moses performs, do an error analysis, and make improvements on that basis using the necessary methods. What data are we using for training? Can you provide more details about the corpus we have, in particular the number of sentences?

I was also thinking it would be a good idea to first survey other English-Bengali systems and use what we learn from them. Two important systems I found are:

1) http://tdil-dc.in/components/com_mtsystem/CommonUI/homeMT.php
This is a government project based on a hybrid, pipeline-style architecture. I know the architecture and other details of this system, so we can discuss them as needed.

2) Anubadok (http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl)
This seems to be an open source project, and it uses some of the resources built by the Ankur organization, such as the English-Bengali dictionary (http://www.bengalinux.org/cgi-bin/abhidhan/statistics.pl). If you have more details about it, that would be great. I have downloaded the Anubadok system and am trying to get some hands-on experience with it and look into the source code.

Apart from these, there is also an Apertium project (http://wiki.apertium.org/wiki/Apertium-bn-en) for the English-Bengali language pair, which has some tools and resources available.

I have a few queries. What are we aiming for with this project? As far as I can see, there are 3 different possibilities:

1) Begin from scratch: use statistical MT, see how it works for the English-Bengali language pair, and on top of this statistical approach use other knowledge to learn rules and build a translation model / prototype.
2) Survey the openly available models and resources, such as chunkers and POS taggers, and build an MT system by combining them.
3) Take one of the existing systems and improve on it using statistical approaches.

Sorry for the long mail, but I wanted to cover all the details. Looking forward to hearing from you.

Regards
Piyush

On Wed, Apr 17, 2013 at 9:47 PM, Sankarshan Mukhopadhyay <[email protected]> wrote:
> On Wed, Apr 17, 2013 at 6:04 PM, piyush arora <[email protected]> wrote:
> > I have done some projects on natural language processing, machine
> > translations and information retrieval. I have worked on the Machine
> > Translation project
> > http://sampark.iiit.ac.in/sampark/web/index.php/content
> > where the aim is to build MT system for 18 Indian language pairs.
>
> I will take a look at this over the weekend (I am between cities and
> it is a bit difficult for me now). Is the code for the implementation
> available as well?
>
> > I worked on the similar lines of transfer-grammar rules. I have a bit of
> > experience with transfer rules for Hindi, Telugu and a bit of Bengali
> > Language.
> >
> > So would be great if can get more information about the project and other
> > details.
>
> The initial idea was to check if a system like Moses (Statistical MT)
> could be enhanced to be able to handle translation of content at
> scale. I'd look forward to what you think is possible.
>
> --
> sankarshan mukhopadhyay
> <https://twitter.com/#!/sankarshan>
_______________________________________________
Project-ideas mailing list
[email protected]
http://lists.ankur.org.in/listinfo.cgi/project-ideas-ankur.org.in
