date:20100728

Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread Shiju Alex


 We welcome automation in translation, but not at the expense of
 introducing incorrect and messy content on wikipedia. We'd rather stay
 small and hand-craft than allow an experimental tool and unskilled
 paid translators creating a big mess.



Yes. This is the answer that you will get from most of the active wiki
communities (small wikis) where this project is going on. Many of the small
wiki communities are not as worried about the numbers as some big Wikipedias
are. Quality is more important for small wikis, where the number of contributors
is small. *Many of us will use this quality matrix* itself to bring in more
people.

My real concern is about the rift that is opening in a language community
due to this project. Issues of a language wiki are taken outside the wiki to
prove some points against its contributors.  Two types of communities are
evolving out of this project: *Google's Wiki community* and *Wiki's wiki
community*. :) This is really annoying as far as small wikis are concerned.

So, some sort of intervention is required to make sure this project runs
smoothly on different Wikipedias.


~Shiju


On Wed, Jul 28, 2010 at 1:38 AM, Ragib Hasan ragibha...@gmail.com wrote:

 As an admin in Bengali wikipedia, I had to deal with this issue a lot
 (some of which were discussed with the Telegraph (India) newspaper
 article). But I'd like to elaborate our stance here:

 (The tool used was Google Translator Toolkit, not Google Translate.
 There is a distinction between these two tools. Google Translator
 Toolkit (GTT) is a translation-memory based semi-manual translation
 tool. That is, it learns translation skills as you gradually translate
 articles by hand. Later, this can be used to automate translation.)

 Issues:
 1. Community involvement: First of all, the local community was not at
 all involved or informed about this project. All of a sudden, we found
 new users signing up, dropping a large article on a random topic, and
 moving away. These users never responded to any talk page messages, so
 we first assumed these were just random users experimenting with
 wikipedia.

 Even now, no one from Google has contacted us in Bengali wikipedia and
 informed us about Google's intentions. This is not a problem by itself,
 but see the following points.

 2. Translation quality: The quality of the translations was awful. The
 translations added to Bengali wikipedia were artificial, dry, and used
 obscure words and phrases. It looked as if a non-native speaker sat
 down with a dictionary in hand, and mechanically translated each
 sentence word by word. That led to sentences which are hard to
 understand, or downright nonsensical.

 The articles were half-done. Numerals were not translated at all. The
 punctuation symbol for Bengali language (the danda symbol: । ) was
 not used. (apparently, GTT and/or the google transliteration tool does
 not support that).
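The numeral and danda complaints above describe mechanical substitutions, so they can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names are hypothetical, nothing here reflects how GTT works internally, and running such a pass blindly over wikitext would also rewrite digits inside URLs and template parameters.

```python
import re

# Bengali digits occupy U+09E6 (zero) through U+09EF (nine).
BENGALI_DIGITS = str.maketrans(
    "0123456789",
    "".join(chr(0x09E6 + i) for i in range(10)),
)

# The danda (U+0964) is the sentence-final punctuation mark used in Bengali.
DANDA = "\u0964"


def localize_digits(text):
    """Replace ASCII digits with the corresponding Bengali digits."""
    return text.translate(BENGALI_DIGITS)


def localize_full_stops(text):
    """Replace a full stop with a danda when it ends a sentence.

    A full stop counts as sentence-final here only if it is followed by
    whitespace or the end of the text, so a decimal such as 3.5 is untouched.
    """
    return re.sub(r"\.(?=\s|$)", DANDA, text)
```

A human reviewer would still need to check the result; the sketch only shows that these two items are the cheapest part of the cleanup, not that the translations become usable.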

 The articles were also full of spelling mistakes. The paid translator
 misspelled many simple words, or even used different spellings for the
 same word in different parts of the article.

 Finally, different languages have different sentence structures.
 Sometimes, a complex sentence is better expressed if broken up into two
 sentences in another language. We found that the translators simply
 translated sentences preserving their English language structure. This
 made the resulting Bengali sentences awkward and artificial to read.
 For example, we do not write "If x then y" in Bengali just by
 replacing "if" and "then" with the corresponding Bengali words. But the
 translators did that; apparently this is an artifact of using GTT.


 3. Lack of follow up: When we found the above problems, naturally, we
 asked the contributor to fix them. Got no reply. It is NOT the task of
 volunteers to clean up the mess after the one-night-standish paid
 translators. Given the small number of volunteers active at any given
 moment, it will take enormous effort on our part to go through these
 articles and fix the punctuation, spelling, and grammar issues. Not to
 mention the awkward language style used by the translators.

 So, after getting a cold shoulder from the paid translators about
 fixing their mess, we had to ban such edits outright. We didn't know
 who was behind this, until the Wikimania talk from Google. Not that it
 matters ... even now, we won't allow these half done and badly
 translated articles on bengali wikipedia.

 Bengali wikipedia is small (21k articles), but we do not want to
 populate it overnight with badly translated content, some of which
 won't even qualify as grammatically correct Bengali. While wikipedia
 may be a perpetual work in progress, that does not mean we need to be
 guinea-pigs of some careless experiments. So, our stance is, "Thanks,
 but NO Thanks!" Unless, of course, they can put enough commitment
 into the translations and fix mistakes.

 We welcome automation in translation, but not at the expense of
 introducing incorrect and messy

Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread Ziko van Dijk

Dear colleagues,

My experiences with the Translate Kit are negative, too. It happened
just too often that a sentence was so twisted that I did not
understand it. Checking it against the original took me a lot of time, so
I decided that doing the translation myself is much quicker and more
reliable. It is good for nobody to read Wikipedia articles in
gibberish.
The idea that the translation tool does the work and that a human
being just has to make some little corrections has simply failed.
What I found especially negative is that the Translator kit encourages you
to translate sentence by sentence.
I don't want to do injustice to anyone, but in my view there are two
groups of Wikipedians:
- those who want to see huge article numbers and believe that any
article with any content is good, in any quality, and that the
Wikipedians are sufficient to do the rest.
- those who believe that (at least a minimum) quality is important and
that articles below a certain level do damage to a Wikipedia. The
small numbers of Wikipedians cannot cope with the work. They do not
welcome just any content, but content that meets the likely interests of
their readers.
It seems to me that the first group is mainly populated by computer
specialists and natives of English. The second group consists of
language specialists and non natives of English. But of course there
are many exceptions.

Kind regards
Ziko van Dijk


Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread praveenp

Consider the Malayalam sentence "വിക്കിപീഡിയ ഒരു നല്ല വിജ്ഞാനകോശം ആണ്", which
means "Wikipedia is a good encyclopedia". How can anyone understand it if a
translator picks the meanings of the Malayalam words and creates an English
sentence like "wikipedia one good encyclopedia is"? Please think about
more complex sentences. The sentence structure of Indian languages is
completely different from that of English or other European languages.
Google's current attempt puts extra weight on tiny communities by pushing
them into complete rewriting (the easiest way is deletion, because some
sentences do not make any sense at all). I am not against machine translation,
but Google must improve their tool or toolkit before trying it on
small Wikipedias.
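
The word-order problem in the Malayalam example can be reproduced with a few lines of code. The sketch below is a deliberately naive word-by-word glosser, not a description of how any Google tool works; the five glosses are taken from the example sentence in this message, and the names SENTENCE, GLOSSES, LEXICON and gloss_word_by_word are hypothetical.

```python
# The example sentence from this message and its word-for-word glosses,
# in the original Malayalam word order (the verb comes last).
SENTENCE = "വിക്കിപീഡിയ ഒരു നല്ല വിജ്ഞാനകോശം ആണ്"
GLOSSES = ["wikipedia", "one", "good", "encyclopedia", "is"]

# Build the toy lexicon from the sentence itself so that keys match exactly.
LEXICON = dict(zip(SENTENCE.split(), GLOSSES))


def gloss_word_by_word(sentence):
    """Replace each word with its gloss and keep the source word order.

    Unknown words are kept in square brackets. No reordering is done,
    which is exactly why the output is not natural English.
    """
    return " ".join(LEXICON.get(word, f"[{word}]") for word in sentence.split())


print(gloss_word_by_word(SENTENCE))
```

Every word is "correctly" translated, yet the result keeps the verb-final order of the source. Producing "Wikipedia is a good encyclopedia" requires reordering and choosing an article, which a dictionary lookup cannot do.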

On Sunday 25 July 2010 09:01 PM, Andreas Kolbe wrote:
--- On Sun, 25/7/10, Fajro fai...@gmail.com wrote:

Machine translation is always unsuitable to produce usable
articles, but can
help to start new ones in smaller wikipedias.

I second that. About 50% of machine translation output is gibberish, or
worse, plausible-sounding text that actually says the opposite of what the
original said. To get it into readable form takes about as long as starting
from scratch.

Translation memory software only helps where content is repetitive.

___
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread Pedro Sanchez


Neither Google nor the WMF is creating articles automatically via machine
translation.
Google is not pushing translated articles.

Toolkit is a page where you can see a (sometimes not good)
translation, and you (if you want to) are able to complete or fix it.

When you believe it is complete, you upload it to wikipedia, just like
you would upload a fully manual translation when you consider it
complete.

Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread Mark Williamson

Ziko, again, we are not talking about machine translations; Google
doesn't have machine translation for Bangla, Malayalam, Tamil etc.
yet. This is about translation memory.
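
For readers unfamiliar with the term, here is a minimal sketch of what a translation memory does. It is an illustration only, not GTTK's implementation: the class name, the 0.8 threshold and the use of Python's difflib for fuzzy matching are all assumptions made for the example.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: stores human-translated segments and
    suggests a stored translation when a new segment is similar enough."""

    def __init__(self, threshold=0.8):
        # Minimum similarity (0.0 to 1.0) for a stored segment to be suggested.
        self.threshold = threshold
        self.segments = {}

    def add(self, source, target):
        """Record a segment pair that a human translator has approved."""
        self.segments[source] = target

    def lookup(self, source):
        """Return (suggestion, score); suggestion is None below the threshold."""
        best_score, best_target = 0.0, None
        for known, target in self.segments.items():
            score = SequenceMatcher(None, source, known).ratio()
            if score > best_score:
                best_score, best_target = score, target
        if best_score >= self.threshold:
            return best_target, best_score
        return None, best_score


tm = TranslationMemory()
tm.add("The band was formed in 1960.", "<human translation of this segment>")
suggestion, score = tm.lookup("The band was formed in 1962.")
```

The memory only returns what a person has already translated for a similar sentence. For a sentence unlike anything stored, it offers nothing, so the quality of the output depends entirely on the people feeding and reviewing it.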

One of the things about MAT, whose use in the professional translator
community is still debated but most popular for translations of
time-dependent things like news, is that the original is often a very
rough translation that requires a _lot_ of editing. The biggest
problem is not the toolkit itself (with some exceptions - punctuation
and templates, for example) but the translators who do not bother to
use it properly, creating poor translations with lots of spelling
mistakes and leaving behind a wasteland of poor quality articles.

GTTK can be used as a force of good if someone puts in the appropriate
time and effort; when used _properly_ by a careful, knowledgeable
translator who gives ample time for proofreading, articles created
with it should be virtually indistinguishable from any other article.

It is my thought that the huge problem here is lack of engagement with
communities. Essentially, Google swooped down and started dropping
large amounts of poor quality content on our projects without engaging
the people from those communities. The people in Google's contest also
didn't engage the communities, nor did they respond to requests to
improve their content.

-m.


On Wed, Jul 28, 2010 at 7:18 AM, Ziko van Dijk zvand...@googlemail.com wrote:
 2010/7/28 Nathan nawr...@gmail.com:
 Just to be sure I understand...

 It's good that you ask, indeed. :-)

 No, it's not about free software, and the Wikimedians are not too
 snobby or lazy to correct poor language. That is what I frequently do
 in de.WP and eo.WP, and I suppose Ragib and many others as well. The
 point is: The machine translated articles are often so bad that I
 simply don't understand them. I *cannot* correct them, because I don't
 know what they are saying.

 Kind regards
 Ziko



 What's happening here is that human
 beings, using a software tool, are translating articles from the
 English Wikipedia into a variety of other languages and posting them
 on the comparatively small Wikipedia projects in these languages. The
 articles, of unknown intrinsic quality, are usually mid to low quality
 translations.

 In the projects with an active community, some have rejected these
 articles because they are not high quality and because the community
 refuses to be responsible for fixing punctuation and other errors made
 by editors who are not members of the community. In the projects
 without an active community, Wikimedians (who may not speak any of the
 languages affected by the Google initiative) are objecting for a
 variety of other reasons - because the software used to assist
 translation isn't free, because the effort is managed by a commercial
 organization or because the endeavor wasn't cleared with the Wikimedia
 community first. Some are also concerned that these new articles will
 somehow deter new editors from becoming involved, despite clear
 evidence that a larger base of content attracts more readers, and more
 readers plus imperfect content leads to more editors.

 What I find interesting is that few seem to be interested in keeping
 or improving the translated articles; Google's attempt to provide
 content in under-served languages is actually offending Wikimedians,
 despite our ostensible commitment to the same goal. Concerns like
 bureaucratic pre-approval, using free software, etc. are somehow more
 important than reaching more people with more content. It all seems
 strange and un-Wikimedian like to me. Obviously there are things
 Google should have done differently. Maybe working with them to
 improve their process should be the focus here?

 --
 Ziko van Dijk
 Niederlande

Re: [Foundation-l] Push translation

2010-07-28 Thread Mark Williamson

Yes, of course if it's not actually reviewed and corrected by a human
it's going to be bad. What I said was that if it's used as it was
meant to be used, the results should be indistinguishable from a
normal human translation, regardless of the language involved because
all mistakes would be fixed by a person. People often neglect to do
that, but that doesn't make the tool inherently evil.

-m.

On Tue, Jul 27, 2010 at 2:35 PM, Aphaia aph...@gmail.com wrote:
 Ah, I omitted T, and I meant Toolkit. A toolkit with garbage can still be
 called a toolkit, but that doesn't change that it is useless; it cannot deal
 with syntax properly, i.e. conjugation etc., at this moment.  "Intended
 to be reviewed and corrected by a human" doesn't assure that it was really
 reviewed and corrected by a human to a sufficient extent. It could
 be enough for your target language, but not for mine. Thanks.

 On Wed, Jul 28, 2010 at 5:15 AM, Casey Brown li...@caseybrown.org wrote:
 On Tue, Jul 27, 2010 at 3:44 PM, Mark Williamson node...@gmail.com wrote:
 Aphaia, Shiju Alex and I are referring to Google Translator Toolkit,
 not Google Translate. If the person using the Toolkit uses it as it
 was _meant_ to be used, the results should be as good as a human
 translation because they've been reviewed and corrected by a human.

 But if the program were being used by a human who speaks the language,
 wouldn't it be *pull* translation and not *push* translation?

 --
 Casey Brown
 Cbrown1023

 --
 KIZU Naoko
 http://d.hatena.ne.jp/Britty (in Japanese)
 Quote of the Day (English): http://en.wikiquote.org/wiki/WQ:QOTD

Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread Ziko van Dijk

Mark Williamson:
 GTTK can be used as a force of good if someone puts in the appropriate
 time and effort; when used _properly_ by a careful, knowledgeable

 It is my thought that the huge problem here is lack of engagement with
 communities. Essentially, Google swooped down and started dropping

Agreed. Again, in my experience it is quicker and delivers more
quality to translate on your own. If others have different experiences
(it may depend on the language), okay. It seems that something went
very wrong when telling people how to contribute to a Wikipedia
language version. Could you report more about that, Mark?

Kind regards
Ziko

-- 
Ziko van Dijk
Niederlande

Re: [Foundation-l] Push translation

2010-07-28 Thread Amir E. Aharoni

Is anyone from Google reading this thread?

Because of this thread i tried to play with the Google Translator Toolkit a
little and found some technical problems. When i tried to send bug reports
about them through the Contact us form, i received after a few minutes a
bounce message from the translation-editor-supp...@google.com address.

I love reporting bugs, and developers are supposed to love reading them, but
it looks like i'm stuck here...

2010/7/27 Mark Williamson node...@gmail.com

 Aphaia, Shiju Alex and I are referring to Google Translator Toolkit,
 not Google Translate. If the person using the Toolkit uses it as it
 was _meant_ to be used, the results should be as good as a human
 translation because they've been reviewed and corrected by a human.

 --
אָמִיר אֱלִישָׁע אַהֲרוֹנִי
Amir Elisha Aharoni

http://aharoni.wordpress.com

We're living in pieces,
 I want to live in peace. - T. Moore

[Foundation-l] I didn't know we're on the BBC!

2010-07-28 Thread Bod Notbod

I've just discovered that the BBC's music site [1] is using our
content for their biographies of musicians/bands [2].

This makes me happy.

[1] http://www.bbc.co.uk/music/
[2] http://www.bbc.co.uk/music/faqs#why_is_the_bbc_using_wikipedia

Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread praveenp

On Wednesday 28 July 2010 09:19 PM, Pedro Sanchez wrote:

 Neither Google nor the WMF is creating articles automatically via machine
 translation.
 Google is not pushing translated articles.

 Toolkit is a page where you can see a (sometimes not good)
 translation, and you (if you want to) are able to complete or fix it.

 When you believe it is complete, you upload it to wikipedia, just like
 you would upload a fully manual translation when you consider it
 complete.

Unfortunately, that is not what is actually happening. It looks like
somebody is hiring people, and they are creating a database of words. :(




Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread Ragib Hasan

From my experience in Bengali wikipedia, many GTT-assisted edits are
unsalvageable. This is not a fault of GTT per se, but rather a fault
of the model Google followed here. Of course GTT does not provide a
translation magically, but the translators hired by Google did an
awful job of the first draft of translation, and never fixed that. If
you show a volunteer a 1 para stub with problems, they are happy to go
and fix it. But when you bring a 100 KB full article where every
sentence needs fixing, the volunteers just give up. Even seasoned
wikipedians are not willing to devote several hours in doing a
complete rewrite of the article ... a manual translation from scratch
takes a much shorter time.

Of course, last week, one of the translators came back with a much
better version of an article, and we allowed the translator to create
it in the user space. If the translation passes the community's
standards, we will move it to the main namespace. So, we are not
completely blocking/banning such paid translations, rather we banned
bad, unfixed, unreadable translations and translators that were not
willing to fix their problems.

--
Ragib

Re: [Foundation-l] I didn't know we're on the BBC!

2010-07-28 Thread quiddity

On Wed, Jul 28, 2010 at 10:01 AM, Bod Notbod bodnot...@gmail.com wrote:
 I've just discovered that the BBC's music site [1] is using our
 content for their biographies of musicians/bands [2].

 This makes me happy.

 [1] http://www.bbc.co.uk/music/
 [2] http://www.bbc.co.uk/music/faqs#why_is_the_bbc_using_wikipedia


Marvelous! They appear to be using just the lead section from our
articles, and then from Musicbrainz.org they're pulling the metadata
(band members, collaborations) and external links list.

Examples (they use the same unfortunate url scheme as musicbrainz...)
The Beatles
http://www.bbc.co.uk/music/artists/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
Ludwig van Beethoven
http://www.bbc.co.uk/music/artists/1f9df192-a621-4f54-8850-2c5373b7eac9
Portishead
http://www.bbc.co.uk/music/artists/8f6bd1e4-fbe1-4f50-aa9b-94c450ec0f11
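
The URL scheme in these examples can be sketched as follows: the last path segment of each BBC artist URL is a UUID, which appears to be the artist's MusicBrainz identifier (MBID). The helper names below are hypothetical, and the musicbrainz.org URL pattern is an assumption made for illustration, not something confirmed by the BBC FAQ.

```python
import uuid
from urllib.parse import urlparse

BBC_ARTIST_PREFIX = "/music/artists/"


def mbid_from_bbc_url(url):
    """Extract the UUID at the end of a BBC Music artist URL.

    Raises ValueError if the path has a different shape or the last
    segment is not a well-formed UUID.
    """
    path = urlparse(url).path.rstrip("/")
    if not path.startswith(BBC_ARTIST_PREFIX):
        raise ValueError(f"not a BBC artist URL: {url}")
    candidate = path[len(BBC_ARTIST_PREFIX):]
    # uuid.UUID validates the string and normalises it to lowercase.
    return str(uuid.UUID(candidate))


def musicbrainz_artist_url(mbid):
    """Build the matching MusicBrainz artist page URL (assumed pattern)."""
    return f"https://musicbrainz.org/artist/{mbid}"


beatles = mbid_from_bbc_url(
    "http://www.bbc.co.uk/music/artists/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"
)
```

Opaque identifiers like this are hard to read, but unlike names they stay stable when an artist page is renamed or disambiguated.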


-
It's not fully automated. They're missing a few obscure bands (that we
and musicbrainz have the relevant content for) such as Lullatone.
Also they're occasionally missing Wikipedia's content when we've
disambiguated the page name, such as Solex
http://www.bbc.co.uk/music/artists/e064f6f6-76a8-4efe-a94b-09bec8942347
http://en.wikipedia.org/wiki/Solex_%28musician%29
On the other hand, they have pages for artists that we don't cover
yet, such as Ogurusu Norihide.

-
Relatedly, all their album reviews (including items from c.2002) seem
to be released under CC BY-NC-SA


-
Separately,
I tried to send the BBC this bug report, but the webform refused to
send because is not plain text... Gah! Maybe someone here can pass
it along to their webmaster?

Broken link in FAQ:
The link here:
http://www.bbc.co.uk/music/faqs#what_happens_if_wikipedias_vandalised
currently points to:
http://en.wikipedia.org/wiki/Wikipedia_vandalism
which is incorrect. It should point to:
http://en.wikipedia.org/wiki/Wikipedia:Vandalism




Thanks for the pointer, Bod.

Quiddity

Re: [Foundation-l] I didn't know we're on the BBC!

2010-07-28 Thread James Alexander

Did we set this up? It even links to the edit page, with a statement asking
you to edit if you find anything problematic with it. Very nice!

James

Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread Ray Saintonge

Fajro wrote:
 On Tue, Jul 27, 2010 at 8:38 PM, Ragib Hasan ragibha...@gmail.com wrote:
   
 (The tool used was Google Translator Toolkit, not Google Translate.
 There is a distinction between these two tools. Google Translator
 Toolkit (GTT) is a translation-memory based semi-manual translation
 tool. That is, it learns translation skills as you gradually translate
 articles by hand. Later, this can be used to automate translation.)
 
 Another issue: The resulting translation memory is not free

This is a red herring.  Some real and important issues have been raised
about machine translations, but this is not one of them.

The fact that the source code for the translation processes is not
free does not make the results of such machine translations unfree.  Key
to anything being copyrightable is that the material must be original and not
the result of a mechanical process.  Machine translations are mechanical
processes.  Another person using the same software with the same text
should get the same results.

It is also important that the allegedly infringing text must have been
fixed in some medium.  A person issuing a takedown order must show, as
a necessary element of that order, where the material in question was
previously published.  Two identical texts by different authors need not
be copies of each other.  With human efforts two such identical texts
are highly improbable, but this need not be the case with machine
translation. Indeed, if the same software keeps producing different
results I would question its reliability.

Ray

[Foundation-l] Pending Changes update for July 28

2010-07-28 Thread Rob Lanphier

Hi folks,

It's been a little while since I've sent out an update (sorry about that).
 The Pending Changes trial continues apace, with 1,382 articles configured
to use the feature as of this writing.

Most of the work on the software that powers Pending Changes is focused on
refactoring and stability.  Some of the performance problems associated with
this feature have been fixed, and we believe we have fixed all of the
user-visible performance problems.  Looking at our backend systems, there are
some areas where this feature is still causing more load than it should,
which is where our work is focused now.

Aaron Schulz, who has done the lion's share of the development to date
(thanks Aaron!) continues to stay involved, but at a much reduced level as
he focuses on non-Wikimedia stuff, while Chad Horohoe ramps up.

We'll soon be publishing statistics that outline per-page metrics on
revisions under Pending Changes.  Nimish Gautam and Devin Finzer (Devin is
an intern who is working for the Wikimedia Foundation this summer) are
preparing them.  More discussion is here:
http://en.wikipedia.org/wiki/Wikipedia_talk:Pending_changes/Metrics

It will be time for a vote soon about whether to keep Pending Changes
enabled on en.wikipedia.org.  We'll be pinging folks in the community about
the post-trial discussion.  If we're rigidly following the proposal, the
trial will end on August 15, regardless of whether a vote has happened.
 However, we're probably already running late for making a decision by then.
 For a variety of operational reasons, we plan to leave the feature running
while the community decides whether to keep the feature on, assuming that
process lasts no more than a month or so after August 15.

The main discussion area for this feature is here:
http://en.wikipedia.org/wiki/Wikipedia:Pending_changes/Feedback

If you have comments/suggestions/questions, that's a good place to post
them.

Rob

Re: [Foundation-l] Pending Changes update for July 28

2010-07-28 Thread Chad

On Wed, Jul 28, 2010 at 1:43 PM, Rob Lanphier ro...@wikimedia.org wrote:
  For a variety of operational reasons, we plan to leave the feature running
 while the community decides whether to keep the feature on, assuming that
 process lasts no more than a month or so after August 15.


Wanted to expand on this point a bit. The justification is that turning
off Pending Changes is quite a bit of work and would clutter the logs
(all of the Pending Changes pages would have to be semi-protected so
that they're not all immediate targets for vandalism).

It would be a lot of work with no net benefit if the community decides
to keep the feature on. This applies to both the operations staff and
the community (who would have to mark everything with Pending
Changes again after it was reactivated). If the community decides to
not keep the feature, a little extra time of leaving it on during the
discussion period wouldn't hurt--and would give people a chance to do
some last-minute evaluation if they're on the fence.


-Chad


[Foundation-l] A prerequisite for the neutral, notable sum of all human knowledge

2010-07-28 Thread Brian J Mingus

The WMF mission is to provide free knowledge to the world. Wikipedia, in
particular, hopes to summarize all notable topics into a neutral sum.

Accomplishing this goal means Wikipedia and the WMF will have to evolve.
Consider the implications of the mission: Every single work that contains
notable topics must have complete coverage in Wikipedia. While every article
need not cite every work, every article must accurately summarize every
notable opinion of every notable topic in every work.

Some have interpreted the role of the proposed citations project as one of
merely centralizing the citations that already exist in Wikipedia. The
mission, however, calls for a broader vision. This new project should have a
bibliography of all works since that is the scope of the mission. The nature
of knowledge further calls for us to understand the links between items
containing knowledge, their categorical context and their abstract
relationships. This broad, unambiguous view of works and their topics will
allow us to explicate them neutrally and select only the most notable ones
for inclusion. It will, in the limit of time, prevent our judgment from
being clouded by the limited, local view of knowledge that we currently
have.

The proposed new project has the following features: It is a bibliography of
all kinds of works that fall under the umbrella of the WMF mission. Works
and collections of works contain disambiguating user contributed text and
media. Works can link to other works. Works come together to form
categories. People can use this site as their personal bibliography,
encouraging participation of a much greater community of users and curation
of the bibliography by them.

There are many challenges to creating a project of such scale, but in order
to accomplish our goals of freeing knowledge we must strive to collect it
and understand it in a more nuanced way than we currently do.

Brian Mingus
Graduate student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder

Re: [Foundation-l] Push translation

2010-07-28 Thread Mark Williamson

Google is, in my experience, very difficult for regular people to
get in touch with. Sometimes, when a product is in beta, they give you
a way to contact them. They used to have an e-mail address where you
could contact them if you had information about bilingual corpora, so
they could use it to improve Google Translate (I found one such corpus
online, from the Nunavut parliament, for English and Inuktitut, but it
now looks like they've removed the address).

I think they intentionally have a relatively small support staff. I
read somewhere that that had turned out to be a huge problem for the
mobile phone they produced - people might not expect great support for
a huge website like Google, but when they buy electronics, they
certainly do expect to have someone they can call and talk to within
24 hours.

I don't think that's completely unwise, though. I'm sure they get tons
of crackpot e-mails all the time. I was reading an official blog about
Google Translate, and in the post about their Wikipedia contests,
someone wrote an angry comment that Google must hate Spain because
the Spanish language wasn't mentioned in that particular post. Now
multiply that by millions, and that is part of the reason (or so I
imagine) that Google makes it difficult to contact them.

-m.

On Wed, Jul 28, 2010 at 9:14 AM, Amir E. Aharoni
amir.ahar...@mail.huji.ac.il wrote:
 Is anyone from Google reading this thread?

 Because of this thread i tried to play with the Google Translator Toolkit a
 little and found some technical problems. When i tried to send bug reports
 about them through the Contact us form, i received after a few minutes a
 bounce message from the translation-editor-supp...@google.com address.

 I love reporting bugs, and developers are supposed to love reading them, but
 it looks like i'm stuck here...

 2010/7/27 Mark Williamson node...@gmail.com

 Aphaia, Shiju Alex and I are referring to Google Translator Toolkit,
 not Google Translate. If the person using the Toolkit uses it as it
 was _meant_ to be used, the results should be as good as a human
 translation because they've been reviewed and corrected by a human.

 --
 אָמִיר אֱלִישָׁע אַהֲרוֹנִי
 Amir Elisha Aharoni

 http://aharoni.wordpress.com

 We're living in pieces,
  I want to live in peace. - T. Moore

Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread Mark Williamson

Well, my impression, and I'm by no means an expert in this (I'm not
associated with Google), is that they emphasized quantity over quality
and forgot to mention the importance of community to our projects.

I heard that for the Swahili Wikipedia contest at least, they gave
away prizes... but perhaps they should've included a requirement that
the articles they created be rated as good by the community, not
full of errors and nonsense sentences, and that all project
participants who want any chance at winning must respond to all
talkpage messages within 72 hours (or something like that).

I think telling a group of newbies that they'll get a big prize if
they translate the most articles is a recipe for disaster. What
incentive do they have to make sure their translation is of good
quality? What incentive do they have to stick around afterwards?

-m.

On Wed, Jul 28, 2010 at 9:10 AM, Ziko van Dijk zvand...@googlemail.com wrote:
 Mark Williamson:
 GTTK can be used as a force of good if someone puts in the appropriate
 time and effort; when used _properly_ by a careful, knowledgeable

 It is my thought that the huge problem here is lack of engagement with
 communities. Essentially, Google swooped down and started dropping

 Agreed. Again, in my experience it is quicker and delivers more
 quality to translate on your own. If others have different experiences
 (it may depend on the language), okay. It seems that something went
 very wrong when telling people how to contribute to a Wikipedia
 language version. Could you report more about that, Mark?

 Kind regards
 Ziko

 --
 Ziko van Dijk
 Niederlande


Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread Mark Williamson

I'm not sure that's exactly the question. Rather, by using GTTK,
people are contributing to building [[Translation memory]] for Google,
which they can in turn use to build their statistical models. It's not
that we're using non-free software, but rather that we're contributing
to it.

-m.

On Wed, Jul 28, 2010 at 1:09 PM, Ray Saintonge sainto...@telus.net wrote:
 Fajro wrote:
 On Tue, Jul 27, 2010 at 8:38 PM, Ragib Hasan ragibha...@gmail.com wrote:

 (The tool used was Google Translation Toolkit. (not Google Translate).
 There is a distinction between these two tools. Google Translation
 Toolkit (GTT) is a translation-memory based semi-manual translation
 tool. That is, it learns translation skills as you gradually translate
 articles by hand. Later, this can be used to automate translation.)

 Another issue: The resulting translation memory is not free

 This is a red herring.  Some real and important issues have been raised
 about machine translations, but this is not one of them.

 The fact that the source code for the translation process is not
 free does not make the results of such machine translations unfree.  Key
 to anything being copyrightable is that the material must be original and
 not the result of a mechanical process.  Machine translation is a
 mechanical process.  Another person using the same software with the same
 text should get the same results.

 It is also important that the allegedly infringing text must have been
 fixed in some medium.  A person issuing a takedown order must show, as
 a necessary element of that order, where the material in question was
 previously published.  Two identical texts by different authors need not
 be copies of each other.  With human efforts two such identical texts
 are highly improbable, but this need not be the case with machine
 translation. Indeed if the same software keeps producing different
 results I would question its reliability.

 Ray

