Re: [Apertium-stuff] GSOC 2020 idea

2020-03-27 Thread Rajarshi Roychoudhury
I have modified the proposal to explain the process better. Kindly take a look at it. The bilingual dictionary still needs some work; I didn't have time to complete it as I was busy determining the sentiment tags. I will try to incorporate it as soon as possible. Please suggest if any

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-27 Thread Rajarshi Roychoudhury
The sentiment tags will help to form more detailed and diverse patterns, which can help to form better rules for disambiguation, lexical selection and reordering. As for those languages where a SentiWordNet does not exist, a linguist will be able to determine sentiment polarity. Since I have the

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-27 Thread Tanmai Khanna
Hey, I have one doubt: for the examples given of mistranslation, I didn't quite understand how sentiment analysis would fix those. Also, what about languages for which a SentiWordNet doesn't exist? Thanks and Regards, Tanmai On Fri, Mar 27, 2020 at 3:56 PM Rajarshi Roychoudhury <

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-27 Thread Rajarshi Roychoudhury
Hi, I have finished writing my proposal, written code showing how to do sentiment analysis with character embeddings as a coding challenge, added words to the monolingual and bilingual dictionaries, and designed a constraint grammar. I am working on building the bidix and lrx files for now. Would be very

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-23 Thread Tino Didriksen
"A randomly generated password for Rroychoudhury has been sent to rroychoudhu...@gmail.com." -- Tino Didriksen On Mon, 23 Mar 2020 at 03:10, Rajarshi Roychoudhury < rroychoudhu...@gmail.com> wrote: > I have completed writing my gsoc proposal, can I get a wiki account? > > Username:

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-22 Thread Rajarshi Roychoudhury
I have completed writing my GSoC proposal; can I get a wiki account? Username: rroychoudhury email: rroychoudhu...@gmail.com On Fri, Mar 6, 2020, 21:40 Rajarshi Roychoudhury wrote: > One is in .odt format, the other in .pdf. Kindly give it a read and give > suggestions. > Best, > Rajarshi > > On

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-06 Thread Rajarshi Roychoudhury
One is in .odt format, the other in .pdf. Kindly give it a read and give suggestions. Best, Rajarshi On Fri, 6 Mar 2020 at 21:15, Francis Tyers wrote: > On 2020-03-06 15:35, Scoop Gracie wrote: > > Sending it as .odt would be great. > > > > On Fri, Mar 6, 2020, 07:27 Rajarshi Roychoudhury > >

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-06 Thread Francis Tyers
On 2020-03-06 15:35, Scoop Gracie wrote: Sending it as .odt would be great. On Fri, Mar 6, 2020, 07:27 Rajarshi Roychoudhury wrote: Then how should I send the file? I don't know if there is anyone to mentor this, since this is not from the list of ideas mentioned. On Fri, Mar 6, 2020,

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-06 Thread Scoop Gracie
Sending it as .odt would be great. On Fri, Mar 6, 2020, 07:27 Rajarshi Roychoudhury wrote: > Then how should I send the file? I don't know if there is anyone to mentor > this, since this is not from the list of ideas mentioned. > > On Fri, Mar 6, 2020, 20:49 Francis Tyers wrote: > >> El

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-06 Thread Rajarshi Roychoudhury
Then how should I send the file? I don't know if there is anyone to mentor this, since this is not from the list of ideas mentioned. On Fri, Mar 6, 2020, 20:49 Francis Tyers wrote: > On 2020-03-06 08:40, Rajarshi Roychoudhury wrote: > > Hi, > > I have written my idea in the attached file.

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-06 Thread Francis Tyers
On 2020-03-06 08:40, Rajarshi Roychoudhury wrote: Hi, I have written my idea in the attached file. It is just the idea, not the project proposal. Kindly read the idea and give feedback on whether this can be a feasible GSoC project. Best, Rajarshi Please do not use proprietary formats

Re: [Apertium-stuff] GSOC 2020 idea

2020-03-06 Thread Rajarshi Roychoudhury
Hi, I have written my idea in the attached file. It is just the idea, not the project proposal. Kindly read the idea and give feedback on whether this can be a feasible GSoC project. Best, Rajarshi On Fri, 28 Feb 2020 at 06:31, Rajarshi Roychoudhury < rroychoudhu...@gmail.com> wrote: > Here

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Rajarshi Roychoudhury
Here are some published papers on how character embeddings are used for classification. https://arxiv.org/abs/1810.03595 https://lsm.media.mit.edu/papers/tweet2vec_vvr.pdf

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Kevin Brubeck Unhammer
Tino Didriksen wrote: > One major issue specific to Apertium is that the source information is no > longer available in the target generation step. It might make sense to have something like this right after bilingual dictionary lookup (as an alternative or complement to lrx-proc). Perhaps a

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Tanmai Khanna
How exactly can characters predict sentiment? Don’t you still need some training data for pairs? English, Hindi, Bangla aren’t really low resource languages. Anyway, we can continue this discussion on the IRC so that it’ll be easier and more people can contribute to the discussion. Tanmai

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Rajarshi Roychoudhury
To answer the question of how to analyse sentiment in a low-resource language, I think character embeddings would be the best option. The set of words in a corpus is never exhaustive, but the number of unique characters is small and fixed. We can figure out the embedding weight for each
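The point about a fixed character inventory can be illustrated with a minimal sketch of the input-encoding step only. It is not the proposal's actual code; `build_char_vocab` and `encode` are hypothetical names, and the learned embedding weights themselves are not shown.

```python
def build_char_vocab(words):
    """Map every distinct character to an integer id.

    Id 0 is reserved for characters never seen while building the vocabulary.
    """
    chars = sorted({ch for word in words for ch in word})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(word, vocab):
    """Turn a word into the list of character ids an embedding layer would index."""
    return [vocab.get(ch, 0) for ch in word]


# Toy corpus, for illustration only.
corpus = ["good", "bad", "god"]
vocab = build_char_vocab(corpus)

# A word that never occurred in the corpus can still be encoded,
# because all of its characters are already in the vocabulary.
print(encode("dog", vocab))
```

The vocabulary grows with the number of distinct characters rather than the number of distinct words, which is why unseen words remain representable.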

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Rajarshi Roychoudhury
As I mentioned earlier, I would like to work on English-Hindi or English-Bengali translation. The dataset can be obtained from SentiWordNet for Indian languages, https://amitavadas.com/sentiwordnet.php, which is by far the richest dataset available for sentiment analysis. It contains data

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Tanmai Khanna
Hi, I have a few questions about this: 1. How would you analyse the sentiment of the source text, considering that the language pairs Apertium deals with are low-resource languages? 2. As Tino mentions, is there a problem of sentiment loss in Apertium? Any examples of this? 3. Doesn't the

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Rajarshi Roychoudhury
The effect won't be very evident on simple sentences; I think it would be more effective on sentences where the choice of words can decide the quality of the translation. It's not about whether "Watch out" could be "be careful", it's about choosing words that can retain the urgency in "watch out".

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Scoop Gracie
So, "Watch out!" could become "Be careful"? On Thu, Feb 27, 2020, 10:13 Rajarshi Roychoudhury wrote: > It is not just about minimizing loss of sentiment, it is about using > that information for better translation. A very trivial example would be > that for some situations, sentences can

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Rajarshi Roychoudhury
It is not just about minimizing loss of sentiment, it is about using that information for better translation. A very trivial example would be that for some situations, sentences can project a strong sentiment and simple translation may not always yield the best result. However, if we can use the

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Tino Didriksen
My first question would be, is this actually a problem for rule-based machine translation? I am not a linguist, but given how RBMT works I can't really see where sentiment would be lost in the process, especially because Apertium is designed for related languages where sentiment is mostly the

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Rajarshi Roychoudhury
I just need to know which libraries are used (STL, if any) to store the words and how the translation is actually done. I plan to use an ordered map to store the word as key and the sentiment value as value. I can choose the one with the best sentiment by running an iterative search. Or a better idea would
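The map-plus-search idea above can be sketched as follows. This is an illustration under stated assumptions, not Apertium code: `pick_candidate` is a hypothetical name, the scores are invented for the example, and "best sentiment" is interpreted here as "closest to the sentiment score of the source word".

```python
def pick_candidate(source_score, candidates, lexicon, default=0.0):
    """Return the candidate whose sentiment score is closest to source_score.

    Words missing from the lexicon are treated as neutral (score `default`).
    On a tie, the earliest candidate in the list wins.
    """
    return min(candidates, key=lambda w: abs(lexicon.get(w, default) - source_score))


# Invented scores in [-1, 1], for illustration only.
lexicon = {"careful": 0.1, "beware": -0.6, "look": 0.0}

# A source word with a strongly negative (urgent) score selects "beware".
print(pick_candidate(-0.7, ["careful", "beware", "look"], lexicon))
```

A plain dictionary is used instead of an ordered map because the search scans only the candidates for one word, so key ordering is not needed.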

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Scoop Gracie
Oh okay. That should be fine. On Thu, Feb 27, 2020, 08:24 Rajarshi Roychoudhury wrote: > No, I just need Python to get the result, which can be written to a text > file and read using C++. It won't depend on Python. > > On Thu, Feb 27, 2020, 21:52 Scoop Gracie wrote: > >> Oh, okay. So Python

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Rajarshi Roychoudhury
No, I just need Python to get the result, which can be written to a text file and read using C++. It won't depend on Python. On Thu, Feb 27, 2020, 21:52 Scoop Gracie wrote: > Oh, okay. So Python would not be needed at runtime? > > On Thu, Feb 27, 2020, 08:20 Rajarshi Roychoudhury < >

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Scoop Gracie
Oh, okay. So Python would not be needed at runtime? On Thu, Feb 27, 2020, 08:20 Rajarshi Roychoudhury wrote: > I just need to write the dictionary I would get in Python to a file and > read it using C++. I guess I can use a map for this purpose. > > On Thu, Feb 27, 2020, 21:40 Scoop Gracie

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Rajarshi Roychoudhury
I just need to write the dictionary I would get in Python to a file and read it using C++. I guess I can use a map for this purpose. On Thu, Feb 27, 2020, 21:40 Scoop Gracie wrote: > I believe it must use C++, so NLTK won't work. > > On Wed, Feb 26, 2020, 23:17 Rajarshi Roychoudhury < >
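A minimal sketch of the offline export step described here, assuming a tab-separated layout of one `word<TAB>score` pair per line. The file layout and the names `write_lexicon` and `read_lexicon` are assumptions for illustration; the thread does not specify a format. The reader is included only to show that the file round-trips; the C++ side is not shown.

```python
import os
import tempfile


def write_lexicon(path, lexicon):
    """Write one 'word<TAB>score' line per entry, sorted by word."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for word, score in sorted(lexicon.items()):
            f.write(f"{word}\t{score:.3f}\n")


def read_lexicon(path):
    """Read the file back into a dict (mirrors what a runtime reader would do)."""
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, score = line.rstrip("\n").split("\t")
            lexicon[word] = float(score)
    return lexicon


# Round-trip demo with invented scores.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "lexicon.tsv")
    write_lexicon(demo_path, {"good": 0.75, "bad": -0.5})
    print(read_lexicon(demo_path))
```

Sorting the entries keeps the file stable between runs, and a plain line-oriented format can be read at runtime with nothing beyond the standard library of the consuming language.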

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-27 Thread Scoop Gracie
I believe it must use C++, so NLTK won't work. On Wed, Feb 26, 2020, 23:17 Rajarshi Roychoudhury wrote: > Formally, I present my idea in this form: > From my understanding of RBMT, > > The RBMT system contains: > >- a *SL morphological analyser* - analyses a source language word and >

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-26 Thread Rajarshi Roychoudhury
Formally, I present my idea in this form: From my understanding of RBMT, the RBMT system contains: - a *SL morphological analyser* - analyses a source language word and provides the morphological information; - a *SL parser* - is a syntax analyser which analyses source language

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-26 Thread Scoop Gracie
It is absolutely fine to use languages you are most comfortable with. On Wed, Feb 26, 2020, 22:18 Rajarshi Roychoudhury wrote: > I need to study more about RBMT to develop an idea of how to preserve > sentiment while translating, which I think can increase the efficiency of > translation. It

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-26 Thread Rajarshi Roychoudhury
I need to study more about RBMT to develop an idea of how to preserve sentiment while translating, which I think can improve the quality of translation. It will also help my research; thank you so much for suggesting it. Also, will it be okay if I work on languages I am comfortable with? Say

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-26 Thread Scoop Gracie
I think it is worth looking into; it is just that anything that needs a neural network is not possible. I'm sure sentiment translation is possible in RBMT too. On Wed, Feb 26, 2020, 21:58 Rajarshi Roychoudhury wrote: > OK, then I won't pursue this idea and will look for one in the idea list. > >

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-26 Thread Rajarshi Roychoudhury
OK, then I won't pursue this idea and will look for one in the idea list. On Thu, 27 Feb 2020 at 11:10, Scoop Gracie wrote: > The main problem is that I don't believe there is a way to send > information down the pipeline without breaking stuff. > > On Wed, Feb 26, 2020, 21:37 Rajarshi

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-26 Thread Scoop Gracie
The main problem is that I don't believe there is a way to send information down the pipeline without breaking stuff. On Wed, Feb 26, 2020, 21:37 Rajarshi Roychoudhury wrote: > Thank you so much for the feedback. I will try to think of another way of > doing this without using neural networks

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-26 Thread Rajarshi Roychoudhury
Thank you so much for the feedback. I will try to think of another way of doing this without using neural networks, or propose a new project. http://wiki.apertium.org/wiki/Apertium_for_Dummies#The_units_of_translation is an excellent starting point for beginners; however, it would be very helpful if

Re: [Apertium-stuff] GSOC 2020 idea

2020-02-26 Thread Scoop Gracie
I'm not an expert in this, but given the non-neural nature of Apertium, this does not seem feasible to me, at least in the way you described. On Wed, Feb 26, 2020, 21:02 Rajarshi Roychoudhury wrote: > Hi, > I am Rajarshi Roychoudhury, a second-year undergraduate student at Jadavpur >

[Apertium-stuff] GSOC 2020 idea

2020-02-26 Thread Rajarshi Roychoudhury
Hi, I am Rajarshi Roychoudhury, a second-year undergraduate student at Jadavpur University, Kolkata, India. I have done many projects in Natural Language Processing, mainly focusing on sentiment analysis and machine translation. Most machine translation systems have no explicit preservation of the