Re: [Apertium-stuff] Registration for wiki page

2020-03-27 Thread Ayush
Hey,
I want to submit a proposal for the robust tokenisation task and would like to
request a wiki page with the name ayushPradhan or ayush0209.
Thanks and regards,
Ayush Pradhan
From: Flammie A Pirinen
Sent: 23 March 2020 05:56 PM
To: apertium-stuff@lists.sourceforge.net
Subject: Re: [Apertium-stuff] Registration for wiki page

On Mon, Mar 23, 2020 at 04:46:06PM +0530, Ayush wrote:
> Dear sir,
> Actually, I have got almost nowhere while going through lttoolbox. Can
> you please help me with making a schedule for the proposal, and also with what
> things I would be working on for the robust tokenisation task? I know
> that I have to update lttoolbox to be fully Unicode, but how?

Hi,
the lttoolbox part of the code is also not my area of
expertise, and it would be a good thing for the application to recruit a
co-mentor or advisor who knows lttoolbox internals. That said, I would
suggest starting by figuring out just the user's point of view of
tokenisation for now: take a handful of languages from the current
Apertium set, e.g. English, Finnish, Kazakh, Norwegian, German, and
maybe some spaceless script if there are any. Find test cases that show
how they work currently and where they could improve, and approach the
GSoC schedule as a test-driven software engineering project. It may be
hard to spread such a schedule over a three-month timeline, but once you have
some targets uncovered like this, we can discuss which additional steps are
likely to take time.
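
A rough sketch of what such a test-driven setup could look like, assuming a
plain table of test cases per language. Everything here is hypothetical: the
baseline_tokenise placeholder just splits on whitespace and stands in for the
real Apertium pipeline, and the expected token lists are illustrative targets,
not actual Apertium output.

```python
from typing import Callable, List, NamedTuple, Tuple


class Case(NamedTuple):
    lang: str            # language code the case belongs to
    text: str            # raw input text
    expected: List[str]  # tokens we would like to get (hypothetical targets)


def baseline_tokenise(text: str) -> List[str]:
    # Placeholder only: whitespace splitting, standing in for the real pipeline.
    return text.split()


# Hypothetical cases; a real list would be collected per language.
CASES = [
    Case("eng", "a simple test", ["a", "simple", "test"]),
    Case("eng", "don't stop", ["do", "n't", "stop"]),
]


def run_cases(
    tokenise: Callable[[str], List[str]], cases: List[Case]
) -> List[Tuple[Case, List[str]]]:
    # Collect every case whose actual tokens differ from the expected ones.
    failures = []
    for case in cases:
        got = tokenise(case.text)
        if got != case.expected:
            failures.append((case, got))
    return failures


failures = run_cases(baseline_tokenise, CASES)
for case, got in failures:
    print(f"[{case.lang}] {case.text!r}: expected {case.expected}, got {got}")
print(f"{len(CASES) - len(failures)}/{len(CASES)} cases pass")
```

Each failing case is then a candidate target for the schedule, and the pass
count gives a simple measure of progress.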

-- 
Regards, Flammie 
(Please note, that I will often include my replies inline instead of
top or bottom of the mail)

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


[Apertium-stuff] GSoC Proposal Evaluation

2020-03-27 Thread Rajarshi Roychoudhury
Hi,
I have made some changes to the proposal to fix some errors and to
incorporate the minor changes in the GSoC timeline.
Kindly review the proposal and give me your feedback on whether it is ready
to be submitted.
http://wiki.apertium.org/wiki/User:Rroychoudhury/GSoC_2020_Proposal

Best regards,
Rajarshi


Re: [Apertium-stuff] Please review my proposal draft

2020-03-27 Thread 杨伟哲
IMO the most difficult part of tokenizing CJK text, especially
Chinese, is word segmentation, because characters and words are
not separated by delimiters: text always appears as an unbroken
string of characters. Another problem is that in Chinese the same
sentence can have entirely different meanings depending on how
it is segmented.

Apertium already has a *Chinese dictionary*[1], and I have
compiled it and tested it with lt-comp and lt-proc before.

Apertium's tokenization of Chinese seems to go something like this:
a dictionary is prepared with commonly used characters and commonly
used words. After the program reads a string of characters, if
several consecutive characters form a word that is in the dictionary,
those characters are treated as a whole and are tokenized and analyzed
as one word. All other characters, which do not combine into
dictionary words, are tokenized and analyzed individually, each as a
lexeme. As far as I know, Apertium does not yet implement
translation from Chinese to other languages.
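
The behaviour described above can be sketched roughly as a greedy
left-to-right longest match against a word list. This is only an
illustration under that assumption, not lttoolbox's actual implementation,
and TOY_LEXICON is a hypothetical handful of entries, not taken from
apertium-zho.

```python
from typing import List, Set


def segment(text: str, lexicon: Set[str]) -> List[str]:
    """Greedy left-to-right longest match; unmatched characters become single tokens."""
    max_len = max(map(len, lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary entry
        # matches; a single character is always accepted as a fallback.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy lexicon of common characters and words.
TOY_LEXICON = {"我", "喜欢", "北京", "北京大学", "大学", "学生"}

print(segment("我喜欢北京大学", TOY_LEXICON))  # ['我', '喜欢', '北京大学']
print(segment("北京大学生", TOY_LEXICON))      # ['北京大学', '生']
```

The second call shows the segmentation problem: the greedy match takes
北京大学 and leaves 生 on its own, although 北京 followed by 大 and 学生 would
also be a possible reading with this lexicon.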

Weizhe

[1] https://github.com/apertium/apertium-zho

On Fri, Mar 27, 2020 at 9:49 PM Tommi A Pirinen <
tommi.antero.piri...@uni-hamburg.de> wrote:

> On Fri, Mar 27, 2020 at 09:58:53AM +0800, 杨伟哲 wrote:
> >
> > Of course, as a Chinese student, I would also be very happy to work
> > on the CJK. We can keep communicating about the tweaks of the plan
> > and the other details.
>
> Awesome, could you perhaps then make even a small example of how
> Apertium would currently tokenise any Chinese language and how that
> could be improved? If there is no existing Apertium dictionary, you
> can make a toy example with just a handful of words; this would be very
> interesting.
>
>
> --
> Doktor Tommi A Pirinen, Computational Linguist, Universität
> Hamburg, Hamburger Zentrum für Sprachkorpora. CLARIN-D
> Entwickler. President of ACL SIGUR SIG for Uralic languages.
> I tend to follow inline-posting style in desktop e-mail messages.


Re: [Apertium-stuff] Guidance for hin-pan language pair development

2020-03-27 Thread Priyank Modi
Hi all,
I've completed the preliminary draft of my proposal and would really
appreciate your comments/suggestions on it:
http://wiki.apertium.org/wiki/Pmodi/GSOC_2020_proposal:_Hindi-Punjabi

Francis (firstly, sorry for cc'ing you personally), since you have been
managing the repo, could you review my coding challenge? (I believe you know
the script.)

Warm Regards,
Priyank Modi

On Sat, Mar 21, 2020 at 1:11 PM Hèctor Alòs i Font 
wrote:

> Hi Priyank,
>
> Yes, I now see that the Hindi गलत__adj paradigm is like this, and the
> Punjabi ਗਲਤ__adj seems to be a copy of it.
>
> I can only say that we do it differently in the Romance languages I work
> with. I can't say that the "Hindi method" is bad. It works for Hindi-Urdu,
> doesn't it? This makes morphological disambiguation harder, but probably
> transfer is easier.
>
> I agree with you that, since apertium-urd-hin is released, apertium-hin
> should be quite reliable, so you should concentrate on Punjabi.
> Nevertheless, in my experience, it is not unusual for a language
> package with just one released pair to need some improvement too. This
> happens especially in cases like Urdu-Hindi, where the other language in
> the pair is extremely closely related. For instance, if morphological
> disambiguation is only superficially done, there won't be any problem for
> a translation into Urdu because almost all the time the same ambiguity will
> exist in Urdu too. But when translating into a less closely related
> language, problems arise, and more work on disambiguation has to be done.
>
> Best,
> Hèctor
>
> Message from Priyank Modi on Sat, 21 March 2020 at 9:22:
>
>>> By the way, it seems strange that you have 9 analyses for this adjective.
>>> Usually in these cases we put only the first analysis in the dictionary.
>>> The others, if really needed, can be added as .
>>
>>
>> Regarding this, I found a number of such anomalies in the Hindi monodix,
>> and tried to resolve some of them by asking mentors on IRC. But since
>> Urdu-Hindi is a released pair (and hence the Hindi monodix should have been
>> reviewed), I have tried to add similar rules in the Punjabi monodix as well.
>> This will have to be fixed in the final version. Following your
>> suggestion, I'll add this to my list of (possible) errors found in the
>> current hin and hin-pan dictionaries and report them in the proposal. This
>> will also help me get quick feedback on most of these, so that I can at least
>> bring the Hindi monodix up to a reviewed and correct state during the
>> period between proposal submission and acceptance. :D
>>
>> Does this look good?
>> Thanks.
>>
>> On Sat, Mar 21, 2020 at 11:37 AM Priyank Modi 
>> wrote:
>>
>>> Hi Hector,
>>> Thank you so much for taking the time to look at my challenge in detail and
>>> providing the feedback. I already understand this error and will work on
>>> removing all '#' symbols in the final submission of my coding challenge. To
>>> start with, the number of '#'s was at least 3-4 times what I have
>>> currently. Quite a few of these still exist because these words were
>>> already added to the bidix but the monodix for Punjabi was almost empty when I
>>> started off (you can check the original repo in the incubator).
>>> Anyway, this has been really helpful and I'll make sure to improve on
>>> this. Since you couldn't read the script, I should tell you that I'm able
>>> to achieve close to human translation for most of these test sentences (as
>>> said earlier, I'll be including an analysis in my proposal explaining the
>>> translations in IPA, which I'll need your help in reviewing as well).
>>>
>>> I was able to find some dictionaries and parallel texts for both
>>> languages. Is there anything else I can do right now? Could you help me
>>> with some references on the use of case markers during translation as well?
>>> :)
>>>
>>> Thank you again.
>>>
>>> Warm regards,
>>> Priyank
>>>
>>>
>>> On Sat 21 Mar, 2020, 10:49 AM Hèctor Alòs i Font, 
>>> wrote:
>>>
>>>> Hi Priyank,
>>>>
>>>> I've been looking at your coding challenge. I can't understand anything,
>>>> but I see the symbol # relatively often. That is annoying. See:
>>>> http://wiki.apertium.org/wiki/Apertium_stream_format#Special
>>>>
>>>> This happens, for instance, when in the bidix the target word has a
>>>> given gender and/or case, but in the monodix it has another. The lemma is
>>>> recognized, but there isn't any information for generating the surface form
>>>> as received from the bidix + transfer.
>>>>
>>>> Using apertium-viewer, I analysed this case:
>>>>
>>>> सब
>>>> ^सब/सब/सब/सब/सब/सब/सब/सब/सब/सब/सब/सब$
>>>>
>>>> ^सब/सब/सब$
>>>> ^सब$
>>>> ^सब/ਸਭ$
>>>> ^default{^ਸਭ$}$
>>>> ^ਸਭ$
>>>> #ਸਭ
>>>>
>>>> As expected, the problem is that ^ਸਭ$
>>>> cannot be generated.
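
A minimal sketch of the failure mode described above, assuming a toy
generator table keyed on lemma plus tags. The entries and the failing tag
sequence are hypothetical and not taken from apertium-pan, and this is not
how lt-proc is implemented; it only mimics the behaviour that the '#' mark
signals.

```python
from typing import Dict, Tuple

Analysis = Tuple[str, Tuple[str, ...]]

# Hypothetical generator entries: (lemma, tags) -> surface form.
TOY_GENERATOR: Dict[Analysis, str] = {
    ("ਸਭ", ("adj", "m", "sg", "nom")): "ਸਭ",
    ("ਸਭ", ("adj", "m", "sg", "obl")): "ਸਭ",
}


def generate(lemma: str, tags: Tuple[str, ...]) -> str:
    # If the monodix has no form for exactly these tags, the lemma is
    # emitted with a leading '#' to flag the generation failure.
    surface = TOY_GENERATOR.get((lemma, tags))
    if surface is None:
        return "#" + lemma
    return surface


# Tags that match an entry generate normally.
print(generate("ਸਭ", ("adj", "m", "sg", "nom")))
# Hypothetical mismatch: transfer asks for a gender the monodix lacks.
print(generate("ਸਭ", ("adj", "f", "sg", "nom")))
```

Making the tags sent by bidix + transfer agree with the monodix paradigm is
what removes the '#' from the output.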

>>>> Then I do:
>>>> apertium-pan$ echo "ਸਭ" | apertium -d . pan_Guru-disam
>>>> "<ਸਭ>"
>>>> "ਸਭ" adj mfn sp
>>>> "ਸਭ" adj m sg nom
>>>> "ਸਭ" adj m sg obl
>>>> "ਸਭ" adj m pl nom
>>>> "ਸਭ" adj m pl obl
 

Re: [Apertium-stuff] Please review my proposal draft

2020-03-27 Thread Tommi A Pirinen
On Fri, Mar 27, 2020 at 09:58:53AM +0800, 杨伟哲 wrote:
> 
> Of course, as a Chinese student, I would also be very happy to work
> on the CJK. We can keep communicating about the tweaks of the plan
> and the other details.

Awesome, could you perhaps then make even a small example of how
Apertium would currently tokenise any Chinese language and how that
could be improved? If there is no existing Apertium dictionary, you
can make a toy example with just a handful of words; this would be very
interesting.


-- 
Doktor Tommi A Pirinen, Computational Linguist, Universität
Hamburg, Hamburger Zentrum für Sprachkorpora. CLARIN-D
Entwickler. President of ACL SIGUR SIG for Uralic languages.
I tend to follow inline-posting style in desktop e-mail messages.




Re: [Apertium-stuff] GSOC 2020 idea

2020-03-27 Thread Rajarshi Roychoudhury
I have modified the proposal to explain the process better. Kindly
take a look at it. The bilingual dictionary still needs some work; I
didn't have time to complete it as I was busy determining the sentiment tags. I
will try to incorporate it as soon as possible. Please suggest any
changes that need to be made.
http://wiki.apertium.org/wiki/User:Rroychoudhury/GSoC_2020_Proposal

Rajarshi



On Fri, Mar 27, 2020, 16:34 Rajarshi Roychoudhury 
wrote:

> The sentiment tags will help to form more detailed and diverse patterns,
> which can help to form better rules for disambiguation, lexical selection and
> reordering.
> As for those languages where a SentiWordNet does not exist, a linguist
> will be able to determine sentiment polarity. Since I have the resource, I
> can use that for a neural network method.
>
> On Fri, 27 Mar 2020 at 16:22, Tanmai Khanna 
> wrote:
>
>> Hey, I have one doubt:
>> I didn't quite understand how sentiment analysis would fix the
>> examples given for mistranslation.
>> Also what about languages for which a SentiWordNet doesn't exist?
>>
>> Thanks and Regards,
>> Tanmai
>>
>> On Fri, Mar 27, 2020 at 3:56 PM Rajarshi Roychoudhury <
>> rroychoudhu...@gmail.com> wrote:
>>
>>> Hi,
>>> I have finished writing my proposal, written code for sentiment
>>> analysis with character embeddings as a coding challenge, added
>>> words to the monolingual and bilingual dictionaries, and designed a constraint
>>> grammar. I am working on building the bidix and lrx files for now. It would
>>> be very helpful if someone could review my application and give feedback.
>>> http://wiki.apertium.org/wiki/User:Rroychoudhury/GSoC_2020_Proposal
>>>
>>> On Mon, 23 Mar 2020 at 15:45, Tino Didriksen 
>>> wrote:
>>>
>>>> "A randomly generated password for Rroychoudhury has been sent to
>>>> rroychoudhu...@gmail.com."
>>>>
>>>> -- Tino Didriksen
>>>>
>>>>
>>>> On Mon, 23 Mar 2020 at 03:10, Rajarshi Roychoudhury <
>>>> rroychoudhu...@gmail.com> wrote:
>>>>
>>>>> I have completed writing my gsoc proposal, can I get a wiki account?
>>>>>
>>>>> Username: rroychoudhury
>>>>> email: rroychoudhu...@gmail.com
>>>>>
>>>
>>
>>
>> --
>> *Khanna, Tanmai*


Re: [Apertium-stuff] GSOC 2020 idea

2020-03-27 Thread Rajarshi Roychoudhury
The sentiment tags will help to form more detailed and diverse patterns,
which can help to form better rules for disambiguation, lexical selection and
reordering.
As for those languages where a SentiWordNet does not exist, a linguist
will be able to determine sentiment polarity. Since I have the resource, I
can use that for a neural network method.

On Fri, 27 Mar 2020 at 16:22, Tanmai Khanna  wrote:

> Hey, I have one doubt:
> I didn't quite understand how sentiment analysis would fix the
> examples given for mistranslation.
> Also what about languages for which a SentiWordNet doesn't exist?
>
> Thanks and Regards,
> Tanmai
>
> On Fri, Mar 27, 2020 at 3:56 PM Rajarshi Roychoudhury <
> rroychoudhu...@gmail.com> wrote:
>
>> Hi,
>> I have finished writing my proposal, written code for sentiment
>> analysis with character embeddings as a coding challenge, added words to
>> the monolingual and bilingual dictionaries, and designed a constraint grammar. I
>> am working on building the bidix and lrx files for now. It would be very
>> helpful if someone could review my application and give feedback.
>> http://wiki.apertium.org/wiki/User:Rroychoudhury/GSoC_2020_Proposal
>>
>> On Mon, 23 Mar 2020 at 15:45, Tino Didriksen 
>> wrote:
>>
>>> "A randomly generated password for Rroychoudhury has been sent to
>>> rroychoudhu...@gmail.com."
>>>
>>> -- Tino Didriksen
>>>
>>>
>>> On Mon, 23 Mar 2020 at 03:10, Rajarshi Roychoudhury <
>>> rroychoudhu...@gmail.com> wrote:
>>>
>>>> I have completed writing my gsoc proposal, can I get a wiki account?
>>>>
>>>> Username: rroychoudhury
>>>> email: rroychoudhu...@gmail.com
>>>>
>>
>
>
> --
> *Khanna, Tanmai*


Re: [Apertium-stuff] GSOC 2020 idea

2020-03-27 Thread Tanmai Khanna
Hey, I have one doubt:
I didn't quite understand how sentiment analysis would fix the
examples given for mistranslation.
Also what about languages for which a SentiWordNet doesn't exist?

Thanks and Regards,
Tanmai

On Fri, Mar 27, 2020 at 3:56 PM Rajarshi Roychoudhury <
rroychoudhu...@gmail.com> wrote:

> Hi,
> I have finished writing my proposal, written code for sentiment
> analysis with character embeddings as a coding challenge, added words to
> the monolingual and bilingual dictionaries, and designed a constraint grammar. I
> am working on building the bidix and lrx files for now. It would be very
> helpful if someone could review my application and give feedback.
> http://wiki.apertium.org/wiki/User:Rroychoudhury/GSoC_2020_Proposal
>
> On Mon, 23 Mar 2020 at 15:45, Tino Didriksen 
> wrote:
>
>> "A randomly generated password for Rroychoudhury has been sent to
>> rroychoudhu...@gmail.com."
>>
>> -- Tino Didriksen
>>
>>
>> On Mon, 23 Mar 2020 at 03:10, Rajarshi Roychoudhury <
>> rroychoudhu...@gmail.com> wrote:
>>
>>> I have completed writing my gsoc proposal, can I get a wiki account?
>>>
>>> Username: rroychoudhury
>>> email: rroychoudhu...@gmail.com
>>>
>


-- 
*Khanna, Tanmai*


Re: [Apertium-stuff] GSOC 2020 idea

2020-03-27 Thread Rajarshi Roychoudhury
Hi,
I have finished writing my proposal, written code for sentiment
analysis with character embeddings as a coding challenge, added words to
the monolingual and bilingual dictionaries, and designed a constraint grammar. I
am working on building the bidix and lrx files for now. It would be very
helpful if someone could review my application and give feedback.
http://wiki.apertium.org/wiki/User:Rroychoudhury/GSoC_2020_Proposal

On Mon, 23 Mar 2020 at 15:45, Tino Didriksen  wrote:

> "A randomly generated password for Rroychoudhury has been sent to
> rroychoudhu...@gmail.com."
>
> -- Tino Didriksen
>
>
> On Mon, 23 Mar 2020 at 03:10, Rajarshi Roychoudhury <
> rroychoudhu...@gmail.com> wrote:
>
>> I have completed writing my gsoc proposal, can I get a wiki account?
>>
>> Username: rroychoudhury
>> email: rroychoudhu...@gmail.com
>>


Re: [Apertium-stuff] GSoC--Apertium Website Development

2020-03-27 Thread Mohit Kumar Verma
Hi

So now I have my GSoC proposal. Please let me know what is
missing and what I should exclude.
Here is the link:
http://wiki.apertium.org/wiki/User:Yaimgr8/GSoC_2020_Proposal

Thanks

On Fri, Mar 27, 2020 at 3:27 AM Mohit Kumar Verma  wrote:

> Thanks. This is what I was looking for.
> New to wiki so was not able to create pages.
>
>
> On Thu, 26 Mar, 2020, 10:31 PM Scoop Gracie, 
> wrote:
>
>> You can edit
>> http://wiki.apertium.org/wiki/User:Yaimgr8/GSoC_2020_Proposal
>>
>> On Thu, Mar 26, 2020, 08:56 Xavi Ivars  wrote:
>>
>>> Don't you have an account already? Please go ahead and start working on
>>> it.
>>>
>>> Message from Mohit Kumar Verma on Thu, 26 March 2020 at 16:37:
>>>
>>>> I want to ask where the option to create a draft proposal is in the
>>>> wiki.
>>>>
>>>> Thanks.
>>>>
>>>> On Thu, Mar 26, 2020 at 8:57 PM Xavi Ivars
>>>> wrote:
>>>>
>>>>> hi Mohit,
>>>>>
>>>>> The best thing is that you write a draft proposal in the wiki, and
>>>>> send a link to the list, so mentors can discuss the proposal with you.
>>>>>
>>>>> If you said you have new ideas to implement, please add them to the
>>>>> proposal as a starting point for the conversation.
>>>>>
>>>>> Xavi
>>>>>
>>>>> Message from Mohit Kumar Verma on Thu, 26 March 2020
>>>>> at 14:17:
>>>>>
>>>>>> Hi
>>>>>> I wanted to ask what you think can be accomplished in the
>>>>>> project idea: Improvements to the Apertium Website.
>>>>>> The more I think, the more new ideas I get to implement, but they are
>>>>>> just too much to be done in a 3-month period.
>>>>>> Can you please suggest how many tasks and what type of tasks I should
>>>>>> include in the timeline?
>>>>>>
>>>>>> Thanks
>>>>>> Mohit
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 26, 2020 at 3:49 PM Shrey Modi
>>>>>> wrote:
>>>>>>
>>>>>>> Hey Mohit
>>>>>>> For review, send it to the mailing list.
>>>>>>>
>>>>>>> All The Best
>>>>>>> Shrey Modi
>>>>>>>
>>>>>>> On Thu, 26 Mar 2020 at 14:03, Mohit Kumar Verma
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>> My GSoC proposal is ready. I want to send it for a review before
>>>>>>>> putting it on the GSoC website. Where should I send it?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Mohit
>>>>>>>>
>>>>>>>> On Tue, Mar 24, 2020 at 6:35 PM Tino Didriksen <
>>>>>>>> m...@tinodidriksen.com> wrote:
>>>>>>>>
>>>>>>>>> "A randomly generated password for Yaimgr8 has been sent to
>>>>>>>>> yaim...@gmail.com."
>>>>>>>>>
>>>>>>>>> -- Tino Didriksen
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 24 Mar 2020 at 13:21, Mohit Kumar Verma
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>> I am interested in the project Apertium Website Development and I
>>>>>>>>>> will send a proposal for it.
>>>>>>>>>> I am requesting a wiki account.
>>>>>>>>>> username:  yaimgr8
>>>>>>>>>> email id: yaim...@gmail.com
>>>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> < Xavi Ivars >
>>>>> < http://xavi.ivars.me >

>>>
>>>
>>> --
>>> < Xavi Ivars >
>>> < http://xavi.ivars.me >
>>
>