[Wikimedia-l] Security: phishing attack via Google Docs

2017-05-03 Thread Pine W
There are reports of a "sophisticated" phishing attack on Google Docs users 
today. I believe that many WMF staff, Wikimedia affiliate staff, and individual 
Wikimedians make use of Google apps, so I'm forwarding this news to Wikimedia 
lists. Victims of this attack may have had their data accessed.
http://money.cnn.com/2017/05/03/technology/google-docs-phishing-attack/
https://www.theverge.com/2017/5/3/15537064/google-docs-phishing-attack-fixed
Pine
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l


Re: [Wikimedia-l] Recognition of the Wikivoyage Association

2017-05-03 Thread Lodewijk
Very pleased to hear of the recognition of the Wikivoyage Association, which
has existed longer than most chapters and predates the adoption of
Wikivoyage as a Wikimedia project. Recognition well deserved! I hope this
will help expand their userbase and activities.

Lodewijk

On Wed, May 3, 2017 at 9:17 PM, Kirill Lokshin 
wrote:

> Hi everyone!
>
> I'm very happy to announce that the Affiliations Committee has recognized
> the Wikivoyage Association [1] as a Wikimedia User Group.  The group plans
> to support Wikivoyage in various ways, including fundraising, promotion,
> and technical development.
>
> Please join me in congratulating the members of this new user group!
>
> Regards,
> Kirill Lokshin
> Chair, Affiliations Committee
>
> [1] https://meta.wikimedia.org/wiki/Wikivoyage_Association


[Wikimedia-l] [Wikimedia Announcements] Voting has begun in 2017 Wikimedia Foundation Board of Trustees elections

2017-05-03 Thread Joe Sutherland
This is a message from the Wikimedia Foundation Elections Committee.[1]
Translations are available: <
https://meta.wikimedia.org/wiki/Special:MyLanguage/Wikimedia_Foundation_elections/2017/Updates/Board_voting_has_begun
>

Voting has begun for eligible voters in the 2017 elections for the
Wikimedia Foundation Board of Trustees.

Direct voting link: <
https://meta.wikimedia.org/wiki/Special:SecurePoll/vote/341>
Check that you are eligible: <
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections/2017#Requirements
>

The Wikimedia Foundation Board of Trustees[2] is the ultimate governing
authority of the Wikimedia Foundation, a 501(c)(3) non-profit organisation
registered in the United States. The Wikimedia Foundation manages many
diverse projects such as Wikipedia and Commons.

The voting phase lasts from 00:00 UTC May 1 to 23:59 UTC May 14.

*Vote here: >*

More information on the candidates and the elections can be found on the
2017 Board of Trustees election page on Meta-Wiki.[3]

On behalf of the Elections Committee,
Katie Chan, Chair, Wikimedia Foundation Elections Committee[4]
Joe Sutherland, Community Advocate, Wikimedia Foundation[5]

[1]
https://meta.wikimedia.org/wiki/Special:MyLanguage/Wikimedia_Foundation_elections_committee
[2] https://meta.wikimedia.org/wiki/Special:MyLanguage/Board_of_Trustees
[3]
https://meta.wikimedia.org/wiki/Special:MyLanguage/Wikimedia_Foundation_elections/2017/Board_of_Trustees
[4] https://meta.wikimedia.org/wiki/User:KTC
[5] https://meta.wikimedia.org/wiki/User:JSutherland_(WMF)

--
Joe Sutherland
Community Advocate
Wikimedia Foundation
___
Please note: all replies sent to this mailing list will be immediately directed 
to Wikimedia-l, the public mailing list of the Wikimedia community. For more 
information about Wikimedia-l:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
___
WikimediaAnnounce-l mailing list
wikimediaannounc...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaannounce-l


[Wikimedia-l] Recognition of the Wikivoyage Association

2017-05-03 Thread Kirill Lokshin
Hi everyone!

I'm very happy to announce that the Affiliations Committee has recognized
the Wikivoyage Association [1] as a Wikimedia User Group.  The group plans
to support Wikivoyage in various ways, including fundraising, promotion,
and technical development.

Please join me in congratulating the members of this new user group!

Regards,
Kirill Lokshin
Chair, Affiliations Committee

[1] https://meta.wikimedia.org/wiki/Wikivoyage_Association


Re: [Wikimedia-l] machine translation

2017-05-03 Thread Yaroslav Blanter
My idea was that adding an extra button to press makes it less likely that
the whole process is carried through. If someone is determined to
systematically add bad machine translations to the main namespace, I guess
only blocks could help. On the other hand, an extra button gives at least an
opportunity to read the result and reflect on it.

Cheers
Yaroslav


[Wikimedia-l] Wikimedia District of Columbia: New Institutional Partnerships Manager

2017-05-03 Thread Robert Fernandez
Wikimedia District of Columbia is pleased to announce the hiring of our
first employee.   Ariel Cetrone has joined WMDC as our Institutional
Partnerships Manager, where she will plan and facilitate events with our
many institutional partners, including cultural, academic, and professional
organizations.  Her work will help expand our outreach role in DC and
surrounding states and maximize the effectiveness of our volunteer time and
energy. A native of Philadelphia, Cetrone is a graduate of George
Washington University and Drexel University.  She previously worked for
Historic RittenhouseTown, a Philadelphia nonprofit organization and
historic site, and the DC Commission on the Arts and Humanities.


Re: [Wikimedia-l] What's making you happy this week? (Week of 30 April 2017)

2017-05-03 Thread James Heilman
Marielle, there are a lot of great medical images in that textbook. Fae, is
there a way to upload the images to Commons by bot?

James

On Wed, May 3, 2017 at 1:06 AM, Marielle Volz 
wrote:

> What made me happy this week was the discovery of some good scientific
> imagery that was openly licensed!
>
> The USDA has created a bunch of identification sites for species of
> agricultural interest and released the images into the public domain.
> I was looking for images of a particular mite and discovered the Bee
> Mite site has released most of their images and all of their text to
> the PD [1]. (I have uploaded to commons although done a bit of a hack
> job on it). There are other sites which would also be a candidate for
> batch upload, which are listed here: http://idtools.org/identify.php
> (anyone interested in molluscs?)
>
> I have also discovered this Clinical Skills textbook licensed under CC
> by 4 attribution.[2] I am in the process of adding some high quality
> medical diagrams to articles on wiki. This same website hosts a bunch
> of other open text books which may be a similarly good source of
> content: https://opentextbc.ca/
>
> [1] http://idtools.org/id/mites/beemites/
> [2] https://opentextbc.ca/clinicalskills/
>
> On Wed, May 3, 2017 at 7:08 AM, Kalliope Tsouroupidou
>  wrote:
> > +1 on this.
> > News of the newly recognised User Group put a smile on my face :)
> >
> > K.
> >
> > On Wed, May 3, 2017 at 4:25 AM, Pine W  wrote:
> >
> >> I'm happy to see the development of the Commons Photographers User Group.
> >>
> >> Personal background story (feel free to skip reading this):
> >>
> >> The first DSLR I touched was easy to use with the automatic settings for
> >> indoor photography in good lighting. Based on this limited experience, I
> >> concluded that photography with a DSLR was easy. Some time later I bought
> >> my own first DSLR, and quickly got lost. The menus were not intuitive to me
> >> as a DSLR newbie, there were new terms like "aperture" and "f-stop", the
> >> manual was written for someone who already had good technical knowledge of
> >> how cameras work, and my lens wouldn't focus like I wanted. Wikipedia has
> >> some helpful articles about photography concepts, but what would have
> >> helped me a lot is spending time with an experienced photographer. After a
> >> few years of trial and error, and asking questions of more knowledgeable
> >> people, I'm happy with my skill level as a photography hobbyist in a
> >> variety of situations. I hope that the new user Commons Photographers group
> >> will facilitate knowledge exchange, improve camaraderie, and consider ways
> >> to improve access to equipment -- especially for photographers in
> >> situations where resources are scarce and potential for valuable
> >> open-source contributions are very high.
> >>
> >> What's making you happy this week?
> >>
> >> Pine
> >
> >
> >
> >
> > --
> > Kalliope Tsouroupidou
> > Community Advocate
> > Wikimedia Foundation




-- 
James Heilman
MD, CCFP-EM, Wikipedian

The Wikipedia Open Textbook of Medicine


Re: [Wikimedia-l] [Wikitech-l] Tech Talk: A Gentle Introduction to Wikidata for Absolute Beginners [including non-techies!]

2017-05-03 Thread יגאל חיטרון
Thanks a lot!
Igal

On May 3, 2017 07:56, "Asaf Bartov"  wrote:

> Hello again.
>
> I'd like to let you know that thanks to Victor Grigas, the Commons video of
> this talk (link below) now has English subtitles, synced to the talk.  This
> should make it easier to *translate the subtitles *to make this video
> useful for fellow Wikimedians in other languages.
>
> It's a very long tutorial, so this would be a significant effort, perhaps
> best taken by a group, or piece by piece.  If you do complete the
> translation in any language, I'd love to hear about it.
>
> Cheers,
>
>Asaf
>
> On Thu, Feb 9, 2017 at 10:32 PM Asaf Bartov  wrote:
>
> > Here's the (3-hour) footage of the detailed Wikidata tutorial delivered
> > today:
> >
> > on Commons:
> > https://commons.wikimedia.org/wiki/File:A_Gentle_Introduction_to_Wikidata_for_Absolute_Beginners_(including_non-techies!).webm
> >
> > on YouTube: https://www.youtube.com/watch?v=eVrAx3AmUvA
> >
> > the slides:
> > https://commons.wikimedia.org/wiki/File:Wikidata_-_A_Gentle_Introduction_for_Complete_Beginners_(WMF_February_2017).pdf
> >
> > It covers what Wikidata is (00:00), how to contribute new data to Wikidata
> > (1:09:34), how to create an entirely new item on Wikidata (1:27:07), how to
> > embed data from Wikidata into pages on other wikis (1:52:54), tools like
> > the Wikidata Game (1:39:20), Article Placeholder (2:01:01), Reasonator
> > (2:54:15) and Mix-and-match (2:57:05), and how to query Wikidata (including
> > SPARQL examples) (starting 2:05:05).
> >
> > Share and enjoy. :)
> >
> >A.
> >
> > On Fri, Feb 3, 2017 at 4:35 PM Rachel Farrand 
> > wrote:
> >
> >> Please join for the following talk:
> >>
> >> *Tech Talk:* A Gentle Introduction to Wikidata for Absolute Beginners
> >> [including non-techies!]
> >> *Presenter:* Asaf Bartov
> >> *Date:* February 09, 2017
> >> *Time:* 19:00 UTC
> >> <https://www.timeanddate.com/worldclock/fixedtime.html?msg=Tech+Talk%3A+A+Gentle+Introduction+to+Wikidata+for+Absolute+Beginners+%5Bincluding+non-techies%21%5D+=20170209T19=1440=3>
> >> Link to live YouTube stream <https://www.youtube.com/watch?v=eVrAx3AmUvA>
> >> *IRC channel for questions/discussion:* #wikimedia-office
> >>
> >> *Summary:* This talk will introduce you to the Wikimedia Movement's latest
> >> major wiki project: Wikidata. We will cover what Wikidata is, how to
> >> contribute, how to embed Wikidata into articles on other wikis, tools like
> >> the Wikidata Game, and how to query Wikidata (including SPARQL examples).


Re: [Wikimedia-l] machine translation

2017-05-03 Thread Amir E. Aharoni
2017-05-03 14:06 GMT+03:00 David Cuenca Tudela :

> Perhaps it would be a good idea to compare the translated text to the text
> that the user wants to save.
>
> If they are more than 95% the same, that means that the user didn't take
> the effort to correct the text.
>
> Cheers,
> Micru
>
>
As I noted, this already exists. Set at 75%. Can be changed.
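
For illustration, the kind of comparison discussed here can be sketched in a
few lines of Python. This is only a minimal sketch under stated assumptions:
it uses a character-level similarity ratio from the standard difflib module,
the function names are hypothetical, and it is not how ContentTranslation
actually measures unmodified machine translation. The 0.75 default simply
mirrors the 75% figure mentioned above.

```python
from difflib import SequenceMatcher


def similarity(machine_text: str, saved_text: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means the texts are identical.

    Assumption: a character-level ratio stands in for whatever measure
    the real tool uses.
    """
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent characters in long sequences.
    return SequenceMatcher(None, machine_text, saved_text, autojunk=False).ratio()


def looks_unedited(machine_text: str, saved_text: str,
                   threshold: float = 0.75) -> bool:
    """Flag a saved text that is still at least `threshold` similar
    to the raw machine translation (hypothetical helper)."""
    return similarity(machine_text, saved_text) >= threshold


if __name__ == "__main__":
    raw = "Oppland er en fylket i Norge."
    fixed = "Oppland er et fylke i Norge."
    # A correct fix can change only a few characters, so a purely
    # numerical threshold may still flag a carefully corrected sentence.
    print(round(similarity(raw, fixed), 2), looks_unedited(raw, fixed))
```

Note that a small but correct fix leaves the ratio high, so a fixed threshold
cannot by itself tell careful editing from none.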


--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬


Re: [Wikimedia-l] machine translation

2017-05-03 Thread Ziko van Dijk
Hello,
This seems to me like a social problem, rather than a technical one.
Shutting down the tool would be a disadvantage for those people who benefit
from the tool and do good things with it.
What is the general opinion among the Norwegians about this issue? Is there
consensus about how to deal with this kind of "article"? If most people
agree they should be speedy-deleted, would this not be a useful deterrent
for those who are not careful enough when using the tool?
Kind regards
Ziko



2017-05-03 13:22 GMT+02:00 John Erling Blad :

> Agree! I also wonder if translators adapt to specific errors if they are
> repeated too often. I wonder if it works like priming the brain to a
> specific pattern.
>
> On Wed, May 3, 2017 at 1:15 PM, Lodewijk wrote:
>
> > Reading this, I get a strong impression the problem may very well be in
> > setting expectations for the users of this translation tool. If they expect
> > the automated translation to be rather good, they may get fed up more
> > easily than when they consider it primarily a glorified dictionary.
> >
> > Lodewijk

Re: [Wikimedia-l] machine translation

2017-05-03 Thread John Erling Blad
Agree! I also wonder if translators adapt to specific errors if they are
repeated too often. I wonder if it works like priming the brain to a
specific pattern.

On Wed, May 3, 2017 at 1:15 PM, Lodewijk 
wrote:

> Reading this, I get a strong impression the problem may very well be in
> setting expectations for the users of this translation tool. If they expect
> the automated translation to be rather good, they may get fed up more
> easily than when they consider it primarily a glorified dictionary.
>
> Lodewijk

Re: [Wikimedia-l] machine translation

2017-05-03 Thread John Erling Blad
Note that some language pairs could easily be 100% correct.

On Wed, May 3, 2017 at 1:06 PM, David Cuenca Tudela wrote:

> Perhaps it would be a good idea to compare the translated text to the text
> that the user wants to save.
>
> If they are more than 95% the same, that means that the user didn't take
> the effort to correct the text.
>
> Cheers,
> Micru
>
> On Wed, May 3, 2017 at 10:31 AM, Wojciech Pędzich wrote:
>
> > It does depend a lot on the engagement level of the human behind the
> > keyboard. When I deal with machine-translated text, I simply wonder
> > whether the person behind the keyboard took the effort to actually read
> > the piece.
> >
> > Now whether this would work if limited to namespaces outside "main" - I
> > do not want to demonise the issue, but if the person submitting the text
> > for machine translation does not read it, what will stop them from a
> > quick ctrl+c / ctrl+v? Just asking.
> >
> > Wojciech
> >
> > On 2017-05-03 at 09:33, Yaroslav Blanter wrote:
> >
> > > Creating machine translations only in the draft space (or in the user
> > > space in the projects which do not have draft) could help.
> > >
> > > Cheers
> > > Yaroslav
> > >
> > > On Tue, May 2, 2017 at 10:16 PM, Pharos wrote:
> > >
> > > > I think it all depends on the level of engagement of the human
> > > > translator.
> > > >
> > > > When the tool is used in the right way, it is a fantastic tool.
> > > >
> > > > Maybe we can find better methods to nudge people toward taking their
> > > > time and really doing work on their translations.
> > > >
> > > > Thanks,
> > > > Pharos
> > > >
> > > > On Tue, May 2, 2017 at 4:09 PM, Bodhisattwa Mandal <
> > > > bodhisattwa.rg...@gmail.com> wrote:
> > > >
> > > > > Content translation with Yandex is also a problem in Bengali
> > > > > Wikipedia. Some users have grown a tendency to create machine
> > > > > translated meaningless articles with this extension to increase
> > > > > edit count and article count. This has increased the workloads of
> > > > > admins to find and delete those articles.
> > > > >
> > > > > Yandex is not ready for many languages and it is better to shut
> > > > > it. We don't need it in Bengali.
> > > > >
> > > > > Regards
> > > > >
> > > > > On May 3, 2017 12:17 AM, "John Erling Blad" wrote:
> > > > >
> > > > > > Actually this _is_ about turning ContentTranslation off, that is
> > > > > > what several users in the community want. They block people
> > > > > > using the extension and delete the translated articles. Use of
> > > > > > ContentTranslation has become a rather contentious case.
> > > > > >
> > > > > > Yandex as a general translation engine to be able to read some
> > > > > > alien language is quite good, but as an engine to produce
> > > > > > written text it is not very good at all. In fact it often
> > > > > > creates quite horrible Norwegian, even for closely related
> > > > > > languages. One quite common problem is reordering of words into
> > > > > > meaningless constructs, another problem is reordering lexical
> > > > > > gender in weird ways. The English preposition "a" is often
> > > > > > translated as "en" in a prepositional phrase, and then the
> > > > > > gender is added to the following phrase. That gives a
> > > > > > translation of "Oppland is a county in…" into something like
> > > > > > "Oppland er en fylket i…" This should be "Oppland er et fylke
> > > > > > i…".
> > > > > >
> > > > > > (I just checked and it seems like Yandex messes up a lot less
> > > > > > now than previously, but it is still pretty bad.)
> > > > > >
> > > > > > Apertium works because the language is closely related, Yandex
> > > > > > does not work because it is used between very different
> > > > > > languages. People try to use Yandex and get disappointed, and
> > > > > > falsely conclude that all language translations are equally
> > > > > > weird. They are not, but Yandex translations are weird.
> > > > > >
> > > > > > The numerical threshold does not work. The reason is simple, the
> > > > > > number of fixes depends on language constructs that fail, and
> > > > > > that is simply not a constant for small text fragments. Perhaps
> > > > > > if we could flag specific language constructs that are known to
> > > > > > give a high percentage of failures, and if the translator must
> > > > > > check those sentences. One such language construct is
> > > > > > disappearances between the preposition and the gender of the
> > > > > > following term in a prepositional phrase. If they are not
> > > > > > similar, then the sentence must be checked. It is not always
> > > > > > wrong to write "en jenta" in Norwegian, but it is likely to be
> > > > > > wrong.
> > > > > >
> > > > > > A language model could be a statistical model for the language

Re: [Wikimedia-l] machine translation

2017-05-03 Thread Lodewijk
Reading this, I get a strong impression the problem may very well be in
setting expectations for the users of this translation tool. If they expect
the automated translation to be rather good, they may get fed up more
easily than when they consider it primarily a glorified dictionary.

Lodewijk

On Wed, May 3, 2017 at 1:06 PM, David Cuenca Tudela 
wrote:

> Perhaps it would be a good idea to compare the translated text to the text
> that the user wants to save.
>
> If they are more than 95% the same, that means that the user didn't take
> the effort to correct the text.
>
> Cheers,
> Micru
>
> On Wed, May 3, 2017 at 10:31 AM, Wojciech Pędzich 
> wrote:
>
> > It does depend a lot on the engagement level of the human behind the
> > keyboard. When I deal with machine-translated text, I simply wonder whether
> > the person behind the keyboard made the effort to actually read the piece.
> >
> > Now whether this would work if limited to namespaces outside "main" - I do
> > not want to demonise the issue, but if the person submitting the text for
> > machine translation does not read it, what will stop them from a quick
> > ctrl+c / ctrl+v? Just asking.
> >
> > Wojciech
> >
> > On 2017-05-03 at 09:33, Yaroslav Blanter wrote:
> >
> > > Creating machine translations only in the draft space (or in the user space
> > > in the projects which do not have draft) could help.
> > >
> > > Cheers
> > > Yaroslav

Re: [Wikimedia-l] machine translation

2017-05-03 Thread David Cuenca Tudela
Perhaps it would be a good idea to compare the translated text to the text
that the user wants to save.

If they are more than 95% the same, the user most likely didn't make
the effort to correct the text.
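
A minimal sketch of that comparison, assuming a word-level similarity ratio from Python's standard difflib module. The function names and the 0.95 default are illustrative only and are not part of ContentTranslation:

```python
from difflib import SequenceMatcher


def similarity(machine_text: str, saved_text: str) -> float:
    """Return a 0..1 similarity ratio, comparing the texts word by word."""
    return SequenceMatcher(None, machine_text.split(), saved_text.split()).ratio()


def looks_unedited(machine_text: str, saved_text: str, threshold: float = 0.95) -> bool:
    """True when the saved text is more than `threshold` similar to the machine output."""
    return similarity(machine_text, saved_text) > threshold


machine = "Oppland er en fylket i Norge"
edited = "Oppland er et fylke i Norge"
print(looks_unedited(machine, machine))  # saved without any change
print(looks_unedited(machine, edited))   # two of six words corrected
```

Note that a single fix in a very short fragment moves the ratio a long way, while the same fix in a long paragraph barely registers, so any fixed percentage would behave differently depending on text length.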

Cheers,
Micru

On Wed, May 3, 2017 at 10:31 AM, Wojciech Pędzich 
wrote:

> It does depend a lot on the engagement level of the human behind the
> keyboard. When I deal with machine-translated text, I simply wonder whether
> the someone behind the keyboard took efforts to actually read the piece.
>
> Now whether this would work if limited to namespaces outside "main" - I do
> not want to demonise the issue, but if the person submitting the text for
> machine translation does not read it, what will stop them from a quick
> ctrl+c / ctrl+v? Just asking.
>
> Wojciech
>
> On 2017-05-03 at 09:33, Yaroslav Blanter wrote:
>
>> Creating machine translations only in the draft space (or in the user space
>> in the projects which do not have draft) could help.
>>
>> Cheers
>> Yaroslav

Re: [Wikimedia-l] machine translation

2017-05-03 Thread Amir E. Aharoni
[ Meta-comment: We usually call it "CX" and not "CT".[1] ]

2017-05-03 13:37 GMT+03:00 John Erling Blad :

> > More seriously, it's quite possible that they already used some of the
> > translations made by the Norwegian Wikipedia community. In addition to
> > being published as an article, each translated paragraph is saved into
> > parallel corpora, and machine translation developers read the edited text
> > and use it to improve their software. This is completely open and usable by
> > all machine translation developers, not only for Yandex.
>
> It is quite possible the Yandex people have done something, as the
> translations are a lot better now than previously. It also implies that it is
> really important to correct the text inside CT.
>

Absolutely.

All CX users must be encouraged to do this. Translation is done by humans.
That's the whole point. Content Translation is not a machine translation
tool. It's an article creation tool, which includes optional machine
translation for some language pairs. The Content Translation user interface
has three warning messages that discourage publishing unedited machine
translation,[2][3][4] and several of the CX FAQs address this as well.[1]

If a user publishes an unedited machine translation, it should be handled
just like any other problematic page: it must be edited, moved to a draft,
or deleted, and the creating user should be warned.

[1] https://www.mediawiki.org/wiki/Content_translation/Documentation/FAQ
[2]
https://translatewiki.net/w/i.php?title=Special:Translations=MediaWiki%3ACx-tools-instructions-text4%2Fhe
[3]
https://translatewiki.net/w/i.php?title=Special:Translations=MediaWiki%3ACx-mt-abuse-warning-title%2Fhe
[4]
https://translatewiki.net/w/i.php?title=Special:Translations=MediaWiki%3ACx-mt-abuse-warning-text%2Fhe

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬


Re: [Wikimedia-l] machine translation

2017-05-03 Thread John Erling Blad
>
> More seriously, it's quite possible that they already used some of the
> translations made by the Norwegian Wikipedia community. In addition to
> being published as an article, each translated paragraph is saved into
> parallel corpora, and machine translation developers read the edited text
> and use it to improve their software. This is completely open and usable by
> all machine translation developers, not only for Yandex.


It is quite possible the Yandex people have done something, as the
translations are a lot better now than previously. It also implies that it is
really important to correct the text inside CT.

> The question is how would we do it with our software. I simply cannot
> imagine doing it with the current MediaWiki platform, unless we develop a
> sophisticated NLP engine, although it's possible I'm exaggerating or
> forgetting something.


There are several places this can be inserted, both in VE and in MW. What I
want is a kind of rather simple language model, but Aharoni proposed
LanguageTool in private communication. That library is very interesting.
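
As a rough illustration of the kind of rule I have in mind, here is a sketch that flags an indefinite article followed by a noun in the wrong gender or in the definite form. The lexicon is a hypothetical toy table and the article mapping is an assumed simplification for Bokmål; this is not LanguageTool's API, only the idea of one rule:

```python
# Hypothetical toy lexicon: noun form -> (gender, is_definite_form).
LEXICON = {
    "fylke": ("n", False),
    "fylket": ("n", True),
    "jente": ("f", False),
    "jenta": ("f", True),
}

# Indefinite articles and the noun genders they may precede (assumed simplification).
ARTICLES = {"en": {"m", "f"}, "ei": {"f"}, "et": {"n"}}


def flag_article_noun(sentence: str) -> list[tuple[str, str]]:
    """Return (article, noun) pairs that a translator should check by hand."""
    words = sentence.lower().split()
    flagged = []
    for article, noun in zip(words, words[1:]):
        if article in ARTICLES and noun in LEXICON:
            gender, definite = LEXICON[noun]
            # Suspicious: definite noun after an indefinite article, or gender mismatch.
            if definite or gender not in ARTICLES[article]:
                flagged.append((article, noun))
    return flagged


print(flag_article_noun("Oppland er en fylket i Norge"))
print(flag_article_noun("Oppland er et fylke i Norge"))
```

The sketch only marks sentences for review; it does not decide that they are wrong, and it ignores punctuation and any noun missing from the table.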

> Perhaps one day some AI/machine-learning system like ORES would be able to
> do it. Maybe it could be an extension to ORES itself.


I've seen language models implemented as neural nets, but it is not
necessary to do it like that. Actually it is more common to do it with
plain statistics.
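
A minimal sketch of the plain-statistics variant, assuming the "text base" is just a list of existing sentences in the destination language and that words are split on whitespace. The function names are made up for the example:

```python
def trigrams(text: str) -> list[tuple[str, str, str]]:
    """Split a text into lowercase word tri-grams."""
    words = text.lower().split()
    return list(zip(words, words[1:], words[2:]))


def unseen_trigrams(candidate: str, corpus: list[str]) -> list[tuple[str, str, str]]:
    """Return the tri-grams of `candidate` that never occur in `corpus`."""
    known = {t for sentence in corpus for t in trigrams(sentence)}
    return [t for t in trigrams(candidate) if t not in known]


corpus = ["Oppland er et fylke i Norge", "Hedmark er et fylke i Norge"]
print(unseen_trigrams("Oppland er et fylke i Norge", corpus))
print(unseen_trigrams("Oppland er en fylket i Norge", corpus))
```

Every tri-gram that comes back is only a possible error. A real text base would be far larger, and rare but correct constructs would still be marked, so the result is a hint for the translator and not a verdict.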

On Tue, May 2, 2017 at 9:25 PM, Amir E. Aharoni <
amir.ahar...@mail.huji.ac.il> wrote:

> 2017-05-02 21:47 GMT+03:00 John Erling Blad :
>
> > Yandex as a general translation engine to be able to read some alien
> > language is quite good, but as an engine to produce written text it is not
> > very good at all.
>
>
> ... Nor is it supposed to be.
>
> A translator is a person. Machine translation software is not a person,
> it's software. It's a tool that is supposed to help a human translator
> produce a good written text more quickly. If it doesn't make this work
> faster, it can and should be disabled. If no translator
>
>
> > In fact it often creates quite horrible Norwegian, even
> > for closely related languages. One quite common problem is reordering of
> > words into meaningless constructs, an other problem is reordering lexical
> > gender in weird ways. The English preposition "a" is often translated as
> > "en" in a propositional phrase, and then the gender is added to the
> > following phrase. That gives a translation of  "Oppland is a county in…"
> >  into something like "Oppland er en fylket i…" This should be "Oppland er
> > et fylke i…".
> >
>
> I suggest making a page with a list of such examples, so that the machine
> translation developers could read it.
>
>
> > (I just checked and it seems like Yandex messes up a lot less now than
> > previously, but it is still pretty bad.)
> >
>
> I guess that this is something that Yandex developers will be happy to hear
> :)
>
> More seriously, it's quite possible that they already used some of the
> translations made by the Norwegian Wikipedia community. In addition to
> being published as an article, each translated paragraph is saved into
> parallel corpora, and machine translation developers read the edited text
> and use it to improve their software. This is completely open and usable by
> all machine translation developers, not only for Yandex.
>
>
>
> > The numerical threshold does not work. The reason is simple, the number of
> > fixes depends on language constructs that fails, and that is simply not a
> > constant for small text fragments. Perhaps if we could flag specific
> > language constructs that is known to give a high percentage of failures,
> > and if the translator must check those sentences. One such language
> > construct is disappearances between the preposition and the gender of the
> > following term in a prepositional phrase.
> >
>
> The question is how would we do it with our software. I simply cannot
> imagine doing it with the current MediaWiki platform, unless we develop a
> sophisticated NLP engine, although it's possible I'm exaggerating or
> forgetting something.
>
>
> > A language model could be a statistical model for the language itself, not
> > for the translation into that language. We don't want a perfect language
> > model, but a sufficient language model to mark weird constructs. A very
> > simple solution could simply be to mark tri-grams that does not already
> > exist in the text base for the destination as possible errors. It is not
> > necessary to do a live check, but at least do it before the page can be
> > saved.
> >
>
> See above—we don't have support for plugging something like that into our
> workflow.
>
> Perhaps one day some AI/machine-learning system like ORES would be able to
> do it. Maybe it could be an extension to ORES itself.
>
>
> > Note the difference in what Yandex do and what we want to achieve; Yandex
> > translates a text between two different languages, without any clear
> reason
> > why. It is not to important 

Re: [Wikimedia-l] Turn the extension for ContentTranslation off?

2017-05-03 Thread John Erling Blad
I don't think it is useful to discuss projects and people; let's discuss
processes and fixes.

On Wed, May 3, 2017 at 12:15 AM, Lodewijk 
wrote:

> Hi John,
>
> Could you provide a bit more context? From which language are you drawing
> these experiences? Did you consider filing a phabricator request for the
> technical component that can be improved (if so, could you link to it)?
> Could you also provide some links to these discussions that are causing the
> internal fighting you refer to?
>
> I'd be curious to understand better what you're talking about before taking
> a position. Thanks!
>
> Best,
> Lodewijk
>
> 2017-05-02 17:20 GMT+02:00 John Erling Blad :
>
> > Yes, I wonder if the extension for content translation should be turned
> > off. Not because it is really bad, but because it allows creating
> > translations that aren't quite good enough, and those translations create
> > fierce internal fighting between contributors.
> >
> > Some people use CT and make fairly good translations. Some are even
> > excellent, especially some of those based on machine translations through
> > the Apertium engine. Some are done manually and are usually fairly good,
> > but those done with the Yandex engine are usually very poor. Sometimes it
> > seems like the Yandex engine produces so many weird constructs that the
> > translators simply give up, but sometimes it also seems like the most
> > common errors simply pass through. I guess people simply get used to
> > seeing those errors and do not view them as "errors" anymore.
> >
> > Brute force solution; turn the ContentTranslation off. Really stupid
> > solution. The next solution; turn the Yandex engine off. That would solve
> > a part of the problem. Kind of lousy solution though.
> >
> > What about adding a language model that warns when the language
> > constructs get too weird? It is like a "test" for the translation. The CT
> > is used for creating a translation, but the language model is used for
> > verifying if the translation is good enough. If it does not validate
> > against the language model it should simply not be published to the main
> > namespace. It will still be possible to create a draft, but then the user
> > is completely aware that the translation isn't good enough.
> >
> > Such a language model should be available as a test for any article, as
> > it can be used as a quality measure for the article. It is really a
> > quantity measure for the well-spokenness of the article, but that isn't
> > quite so intuitive.
> >
> > The measure could simply be to color code the language constructs by how
> > common they are, with a white background for common constructs and a
> > yellow background for really awful constructs.
> >
> > It could also use hints from other measurements, like readability,
> > confusion and perplexity. Perhaps even such things as punctuation and
> > markup.
> >
> > I believe users will get the idea pretty fast; only publish texts that
> > are "white". It is a bit like tests for developers; they don't publish
> > code that goes "red".
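
The color coding described in the quoted message could look roughly like the sketch below. It assumes word bi-grams as the "language constructs" and a made-up cut-off (`min_count`) for what counts as common; both are illustrative choices, not an existing MediaWiki or CT feature:

```python
from collections import Counter


def bigrams(text: str) -> list[tuple[str, str]]:
    """Split a text into lowercase word bi-grams."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def color_code(candidate: str, corpus: list[str], min_count: int = 2) -> list[tuple[str, str]]:
    """Label each bi-gram 'white' if it is common in the corpus, else 'yellow'."""
    counts = Counter(b for sentence in corpus for b in bigrams(sentence))
    return [
        (" ".join(b), "white" if counts[b] >= min_count else "yellow")
        for b in bigrams(candidate)
    ]


def publishable(candidate: str, corpus: list[str], min_count: int = 2) -> bool:
    """Only texts that are entirely 'white' would go to the main namespace."""
    return all(color == "white" for _, color in color_code(candidate, corpus, min_count))


corpus = ["Oppland er et fylke i Norge", "Hedmark er et fylke i Norge"]
print(color_code("er en fylket i Norge", corpus))
print(publishable("er et fylke i Norge", corpus))
```

A text that is not publishable under this check could still be saved as a draft, which matches the intent of the proposal.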


Re: [Wikimedia-l] machine translation

2017-05-03 Thread Wojciech Pędzich
It does depend a lot on the engagement level of the human behind the
keyboard. When I deal with machine-translated text, I simply wonder
whether the person behind the keyboard made the effort to actually read
the piece.


Now whether this would work if limited to namespaces outside "main" - I 
do not want to demonise the issue, but if the person submitting the text 
for machine translation does not read it, what will stop them from a 
quick ctrl+c / ctrl+v? Just asking.


Wojciech

On 2017-05-03 at 09:33, Yaroslav Blanter wrote:

> Creating machine translations only in the draft space (or in the user space
> in the projects which do not have draft) could help.
>
> Cheers
> Yaroslav


Re: [Wikimedia-l] What's making you happy this week? (Week of 30 April 2017)

2017-05-03 Thread Marielle Volz
What made me happy this week was the discovery of some good scientific
imagery that was openly licensed!

The USDA has created a bunch of identification sites for species of
agricultural interest and released the images into the public domain.
I was looking for images of a particular mite and discovered that the Bee
Mite site has released most of its images and all of its text into the
public domain [1]. (I have uploaded them to Commons, although I did a bit
of a hack job of it.) Other sites that would also be candidates for
batch upload are listed here: http://idtools.org/identify.php
(anyone interested in molluscs?)

I have also discovered this Clinical Skills textbook, licensed under
CC BY 4.0 (Attribution).[2] I am in the process of adding some high-quality
medical diagrams to articles on wiki. The same website hosts a bunch
of other open textbooks which may be a similarly good source of
content: https://opentextbc.ca/

[1] http://idtools.org/id/mites/beemites/
[2] https://opentextbc.ca/clinicalskills/

On Wed, May 3, 2017 at 7:08 AM, Kalliope Tsouroupidou
 wrote:
> +1 on this.
> News of the newly recognised User Group put a smile on my face :)
>
> K.
>
> On Wed, May 3, 2017 at 4:25 AM, Pine W  wrote:
>
>> I'm happy to see the development of the Commons Photographers User Group.
>>
>> Personal background story (feel free to skip reading this):
>>
>> The first DSLR I touched was easy to use with the automatic settings for
>> indoor photography in good lighting. Based on this limited experience, I
>> concluded that photography with a DSLR was easy. Some time later I bought
>> my own first DSLR, and quickly got lost. The menus were not intuitive to me
>> as a DSLR newbie, there were new terms like "aperture" and "f-stop", the
>> manual was written for someone who already had good technical knowledge of
>> how cameras work, and my lens wouldn't focus like I wanted. Wikipedia has
>> some helpful articles about photography concepts, but what would have
>> helped me a lot is spending time with an experienced photographer. After a
>> few years of trial and error, and asking questions of more knowledgeable
>> people, I'm happy with my skill level as a photography hobbyist in a
>> variety of situations. I hope that the new user Commons Photographers group
>> will facilitate knowledge exchange, improve camaraderie, and consider ways
>> to improve access to equipment -- especially for photographers in
>> situations where resources are scarce and potential for valuable
>> open-source contributions are very high.
>>
>> What's making you happy this week?
>>
>> Pine
>
>
>
>
> --
> Kalliope Tsouroupidou
> Community Advocate
> Wikimedia Foundation


Re: [Wikimedia-l] machine translation

2017-05-03 Thread Yaroslav Blanter
Creating machine translations only in the draft space (or in the user space
in the projects which do not have draft) could help.

Cheers
Yaroslav

On Tue, May 2, 2017 at 10:16 PM, Pharos 
wrote:

> I think it all depends on the level of engagement of the human translator.
>
> When the tool is used in the right way, it is a fantastic tool.
>
> Maybe we can find better methods to nudge people toward taking their time
> and really doing work on their translations.
>
> Thanks,
> Pharos
>
> On Tue, May 2, 2017 at 4:09 PM, Bodhisattwa Mandal <
> bodhisattwa.rg...@gmail.com> wrote:
>
> > Content translation with Yandex is also a problem in Bengali Wikipedia.
> > Some users have grown a tendency to create machine translated meaningless
> > articles with this extension to increase edit count and article count.
> This
> > has increased the workloads of admins to find and delete those articles.
> >
> > Yandex is not ready for many languages and it is better to shut it. We
> > don't need it in Bengali.
> >
> > Regards
> > On May 3, 2017 12:17 AM, "John Erling Blad"  wrote:
> >
> > > Actually this _is_ about turning ContentTranslation off, that is what
> > > several users in the community want. They block people using the
> > extension
> > > and delete the translated articles. Use of ContentTranslation has
> become
> > a
> > >  rather contentious case.
> > >
> > > Yandex as a general translation engine to be able to read some alien
> > > language is quite good, but as an engine to produce written text it is
> > not
> > > very good at all. In fact it often creates quite horrible Norwegian,
> even
> > > for closely related languages. One quite common problem is reordering
> of
> > > words into meaningless constructs, an other problem is reordering
> lexical
> > > gender in weird ways. The English preposition "a" is often translated
> as
> > > "en" in a propositional phrase, and then the gender is added to the
> > > following phrase. That gives a translation of  "Oppland is a county
> in…"
> > >  into something like "Oppland er en fylket i…" This should be "Oppland
> er
> > > et fylke i…".
> > >
> > > (I just checked and it seems like Yandex messes up a lot less now than
> > > previously, but it is still pretty bad.)
> > >
> > > Apertium works because the language is closely related, Yandex does not
> > > work because it is used between very different languages. People try to
> > use
> > > Yandex and gets disappointed, and falsely conclude that all language
> > > translations are equally weird. They are not, but Yandex translations
> are
> > > weird.
> > >
> > > > The numerical threshold does not work. The reason is simple: the number
> > > > of fixes depends on which language constructs fail, and that is simply
> > > > not a constant for small text fragments. Perhaps we could flag specific
> > > > language constructs that are known to give a high percentage of
> > > > failures, and require the translator to check those sentences. One such
> > > > language construct is a discrepancy between the article and the gender
> > > > of the following term in a noun phrase. If they do not agree, then the
> > > > sentence must be checked. It is not always wrong to write "en jenta" in
> > > > Norwegian, but it is likely to be wrong.
> > >
> > > > A language model could be a statistical model for the language itself,
> > > > not for the translation into that language. We don't want a perfect
> > > > language model, but a sufficient language model to mark weird
> > > > constructs. A very simple solution could simply be to mark tri-grams
> > > > that do not already exist in the text base for the destination as
> > > > possible errors. It is not necessary to do a live check, but at least
> > > > do it before the page can be saved.
> > >
> > > > Note the difference in what Yandex does and what we want to achieve;
> > > > Yandex translates a text between two different languages, without any
> > > > clear reason why. It is not too important if there are weird constructs
> > > > in the text, as long as it is usable in "some" context. We translate a
> > > > text for the purpose of republishing it. The text should be usable and
> > > > easily readable in that language.
> > >
> > >
> > >
> > > On Tue, May 2, 2017 at 7:07 PM, Amir E. Aharoni <
> > > amir.ahar...@mail.huji.ac.il> wrote:
> > >
> > > > 2017-05-02 18:20 GMT+03:00 John Erling Blad :
> > > >
> > > > > > Brute force solution; turn the ContentTranslation off. Really
> > > > > > stupid solution.
> > > >
> > > >
> > > > ... Then I guess you don't mind that I'm changing the thread name :)
> > > >
> > > >
> > > > > The next solution; turn the Yandex engine off. That would solve a
> > > > > part of the problem. Kind of lousy solution though.
> > > > >
> > > >
> > > > > > What about adding a language model that warns when the language
> > > > > > constructs get too weird? It is like a "test" for the translation.
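
The tri-gram check suggested earlier in this thread (marking tri-grams that do
not already exist in the destination-language text base as possible errors)
could look roughly like the sketch below. This is only an illustration under
stated assumptions: the function names (tokenize, build_trigram_set,
flag_unseen_trigrams) are hypothetical and not part of ContentTranslation, and
the two-sentence "text base" is a toy stand-in for a real corpus such as the
existing articles on the destination wiki.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def trigrams(tokens):
    """Return all consecutive three-word windows of a token list."""
    return list(zip(tokens, tokens[1:], tokens[2:]))


def build_trigram_set(corpus_sentences):
    """Collect every tri-gram seen in the destination-language text base."""
    seen = set()
    for sentence in corpus_sentences:
        seen.update(trigrams(tokenize(sentence)))
    return seen


def flag_unseen_trigrams(translation, known):
    """Return the tri-grams of a translation that never occur in the text base.

    A flagged tri-gram is only a *possible* error that a human should check.
    """
    return [t for t in trigrams(tokenize(translation)) if t not in known]


# Toy text base (assumption: a real one would be far larger).
corpus = [
    "Oppland er et fylke i Norge",
    "Hedmark er et fylke i Norge",
]
known = build_trigram_set(corpus)

# Machine-translated sentence with the article/gender problem from the thread.
flagged = flag_unseen_trigrams("Oppland er en fylket i Norge", known)
for tri in flagged:
    print("check:", " ".join(tri))
```

With a text base this small almost every unseen phrase gets flagged, so in
practice the check would only be useful against a large corpus, and it could
run once before the page is saved rather than live.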

Re: [Wikimedia-l] What's making you happy this week? (Week of 30 April 2017)

2017-05-03 Thread Kalliope Tsouroupidou
+1 on this.
News of the newly recognised User Group put a smile on my face :)

K.

On Wed, May 3, 2017 at 4:25 AM, Pine W  wrote:

> I'm happy to see the development of the Commons Photographers User Group.
>
> Personal background story (feel free to skip reading this):
>
> The first DSLR I touched was easy to use with the automatic settings for
> indoor photography in good lighting. Based on this limited experience, I
> concluded that photography with a DSLR was easy. Some time later I bought
> my own first DSLR, and quickly got lost. The menus were not intuitive to me
> as a DSLR newbie, there were new terms like "aperture" and "f-stop", the
> manual was written for someone who already had good technical knowledge of
> how cameras work, and my lens wouldn't focus like I wanted. Wikipedia has
> some helpful articles about photography concepts, but what would have
> helped me a lot is spending time with an experienced photographer. After a
> few years of trial and error, and asking questions of more knowledgeable
> people, I'm happy with my skill level as a photography hobbyist in a
> variety of situations. I hope that the new Commons Photographers User Group
> will facilitate knowledge exchange, improve camaraderie, and consider ways
> to improve access to equipment -- especially for photographers in
> situations where resources are scarce and the potential for valuable
> open-source contributions is very high.
>
> What's making you happy this week?
>
> Pine

-- 
Kalliope Tsouroupidou
Community Advocate
Wikimedia Foundation