Re: [Wikimedia-l] Fwd: Fwd: Re: Swedish Wikipedia reach 1 million (with support of bots)
On 18/06/13 01:04, Martin Rulsch wrote:
> As far as I know, that's even planned by the Wikidata team.

It isn't exactly planned by the Wikidata team; a volunteer would need to do it.

> 2013/6/18 Ziko van Dijk vand...@wmnederland.nl
>> Hello, I am also unhappy with the mail from Hubertl, and also with some remarks that could be understood as criticism of the German Wikipedia editing community. Actually, both opinions coexist in de.WP as well, although the anti-bot faction is obviously stronger. My concern is that bot articles usually stay the same and don't grow much. They give a bad impression of a Wikipedia language version, and there is no one to update them. Maybe it would be better to support Wikidata and later find a solution with Wikidata to provide data to small or large

___
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
Re: [Wikimedia-l] Swedish Wikipedia reach 1 million (with support of bots)
On 16/06/13 15:24, Johan Jönsson wrote:
> 2013/6/16 Ilario Valdelli valde...@gmail.com
>> I think that Anders is saying that the result of the Swedish Wikipedia is due to the work of bots as well as the work of people. It means that this result is contrary to the WMF strategy, which would have more people and more contributors. The next millions of articles will be reached by the Polish Wikipedia, but also by the Cebuano Wikipedia and the Waray-Waray Wikipedia. Maybe it's time to have only bots write in Wikipedia? I hope that in the future the number of articles will be counted considering at least a small amount of content and not only a template in a page, because use like this will discourage the communities of editors.
>
> I would say our experience is that it doesn't affect the number of human editors at all. A couple of people run bots that create very short articles about taxons and other things that, to be honest, probably wouldn't have been created otherwise. These articles have very few readers. Why on Earth would this discourage us? Our main problem is that browsing Swedish Wikipedia using the random article button isn't as fun as it used to be. That's probably fixable.

I am again using the opportunity to remind everyone that all of this will soon be completely unnecessary, since it should be possible to generate the articles on the fly from Wikidata data.
Re: [Wikimedia-l] WikiData and WikiSpecies
On 03/06/13 11:40, Federico Leva (Nemo) wrote:
> Ting Chen, 03/06/2013 11:29:
>> I happened to work with a few biology interwikis on Wikidata today and saw the taxonomical data on it. Given that Wikidata is growing and has more potential, would it not be a good idea to merge Wikispecies data into Wikidata? Yes, and close Wikispecies (I hope now there will be no stones or rotten tomatoes flying for this naive question ;-) )?
>
> No.
> https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2013/02#Include_Wikispecies_into_Wikidata
> http://lists.wikimedia.org/pipermail/wikispecies-l/2013-January/thread.html
> TL;DR: Wikidata offers only a partial way to store the information and no real interface at all for browsing it.

What if this interface existed? I believe it could be made really quickly and easily.
Re: [Wikimedia-l] All the lakes in Sweden
On 16/05/13 23:18, Andy Mabbett wrote:
> On 16 May 2013 18:36, Anders Wennersten m...@anderswennersten.se wrote:
>> All lakes (3, all the way down to pond-sized ones with no name) are now produced, based on lake data from the Swedish meteorology institute and lake environment data from a newly set up authority demanded by the EU, in order to register and track data on lakes in all of Europe. The articles are generated by AWB, with some manual effort to take care of text in existing articles, and a major effort to take care of all the lakes with the same name (Little lake, Black lake, etc.).
>
> Are these all on OpenStreetMap, too?

We will know when people start filling in http://www.wikidata.org/wiki/Property:P402
Re: [Wikimedia-l] The case for supporting open source machine translation
On 26/04/13 19:38, Bjoern Hoehrmann wrote:
> * Andrea Zanni wrote:
>> At the moment, Wikisource could be an interesting corpus and laboratory for improving and enhancing OCR, as the OCR-generated text is always proofread and corrected by humans.
> Try also Distributed Proofreaders. It is my impression that Wikisource's proofreading standards are not always up to par.
>> As part of our project (http://wikisource.org/wiki/Wikisource_vision_development), Micru was looking for a GSoC candidate to study the reinsertion of proofread text into DjVus [1], but so far hasn't found an interested student. We have some contacts with people at Google working on Tesseract, and they were available for mentoring. [1] We thought about this both for OCR enhancement purposes and for updating files on Commons and the Internet Archive (which is off topic here).
> I built various tools that could be fairly easily adapted for this; my notes are available at http://www.google.com/search?q=site:lists.w3.org+intitle:hoehrmann+ocr . One of the tools, for instance, is a diff tool; see the image at http://lists.w3.org/Archives/Public/www-archive/2012Apr/0031 .

This is a very interesting approach :)
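The core of the proofread-text-reinsertion idea, comparing raw OCR output against its human-corrected version, can be sketched with Python's standard `difflib`. The two strings below are invented examples; the real pipeline would pull page text from Wikisource and its DjVu source.

```python
import difflib

# Hypothetical raw OCR output and its human-proofread correction.
ocr_text = "Tlie quick brown f0x jumps over the lazy dog."
proofread = "The quick brown fox jumps over the lazy dog."

# SequenceMatcher yields the edit operations needed to turn the OCR
# text into the proofread text; these (wrong, right) pairs are the
# kind of signal an OCR engine's training could use, and the same
# opcodes can drive a visual diff tool like the one linked above.
matcher = difflib.SequenceMatcher(None, ocr_text, proofread)
corrections = [
    (ocr_text[i1:i2], proofread[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag == "replace"
]
print(corrections)
```

Here the matcher isolates the two misrecognized spans ("li" for "h", "0" for "o") while leaving the matching text untouched.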
Re: [Wikimedia-l] Wikidata Stubs: Threat or Menace?
Since I have been thinking about how to do this for some time, I wrote some developers' notes at http://meta.wikimedia.org/wiki/Wikidata/Notes/Article_generation so feel free to comment if anything is not clear or not desirable.

On 26/04/13 14:10, Jane Darnell wrote:
> Well, I am going to come out of the closet here and admit that I, for one, will sometimes want to read that machine-generated text over the human-written English one. Sometimes, to uncover the real little gems of Wikipedia, you need to have a lot of patience with Google Translate options.
>
> 2013/4/26, Delirium delir...@hackish.org:
>> This is a very interesting proposal. I think how well it will work may vary considerably based on the language. The strongest case in favor of machine-generating stubs, imo, is in languages where there are many monolingual speakers and the Wikipedia is already quite large and active. In that case, machine-generated stubs can help promote expansion into not-yet-covered areas, plus provide monolingual speakers with information they would otherwise either not get, or have to get in worse form via a machine-translated article.
>>
>> At the other end of the spectrum you have quite small Wikipedias, and Wikipedias which are both small and read/written mostly or entirely by bilingual readers. In these Wikipedias, article-writing tends to focus on things more specifically relevant to a certain culture and history. Suddenly creating tens or hundreds of thousands of stubs in such languages might serve to dilute a small Wikipedia more than strengthen it: if you take a Wikipedia with 10,000 articles and it gains 500,000 machine-generated stubs, *almost every* article that comes up in search engines will be machine-generated, making it much less obvious which parts of the encyclopedia are actually active and human-written amidst the sea of auto-generated content. Plus, from a reader's perspective, it may not even improve the availability of information. For example, I doubt there are many speakers of Bavarian who would prefer to read a machine-generated bar.wiki article over a human-written de.wiki article. That may even be true for some less-related languages: most Danes I know would prefer a human-written English article over a machine-generated Danish one. -Mark
>>
>> On 4/25/13 8:16 PM, Erik Moeller wrote:
>>> Millions of Wikidata stubs invade small Wikipedias .. Volapük Wikipedia now best curated source on asteroids .. new editors flood small wikis .. Google spokesperson: "This is out of control. We will shut it down."
>>>
>>> Denny suggested: "II) develop a feature that blends into Wikipedia's search if an article about a topic does not exist yet, but we have data on Wikidata about that topic"
>>>
>>> Andrew Gray responded: "I think this would be amazing. A software hook that says we know article X does not exist yet, but it is matched to topic Y on Wikidata, and pulls out core information, along with a set of localised descriptions... we gain all the benefit of having stub articles (scope, coverage) without the problems of a small community having to curate a million pages. It's not the same as hand-written content, but it's immeasurably better than no content, or even an attempt at machine-translating free text. XXX is [a species of: fish] [in the: Y family]. It [is found in: Laos, Vietnam]. It [grows to: 20 cm]. (pictures)"
>>>
>>> This seems very doable. Is it desirable? For many languages, it would allow hundreds of thousands of pseudo-stubs (not real articles stored in the DB, but generated from Wikidata) to be served to readers and crawlers that would otherwise not exist in that language. Looking back 10 years, User:Ram-Man was one of the first to generate thousands of en.wp articles from, in this case, US census data. It was controversial at the time, and it stuck. Other Wikipedias have since then either allowed or prohibited bot-creation of articles on a project-by-project basis. It tends to lead to frustration when folks compare article counts and see artificial inflation by bot-created content.
>>>
>>> Does anyone know if the impact of bot-creation on (new) editor behavior has been studied? I do know that many of the Rambot articles were expanded over time, and I suspect many wouldn't have been if they hadn't turned up in search engines in the first place. On the flip side, a large surface area of content being indexed by search engines will likely also attract a fair bit of drive-by vandalism that may not be detected because those pages aren't watched. A model like the proposed one might offer a solution to a lot of these challenges.
>>>
>>> How I imagine it could work:
>>> * Templates could be defined for different Wikidata entities. We could make it possible to let users add links from items in Wikidata to Wikipedia articles that don't exist yet. (Currently this is prohibited.) If such a link is added, _and_ a relevant template is defined for the Wikidata entity type (perhaps through an entity type-template mapping), WP will render an article using that
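The bracketed template Andrew Gray sketches can be made concrete in a few lines: a per-entity-type sentence pattern filled from Wikidata-style property/value pairs, rendered on the fly rather than stored as a wiki page. The item dictionary and template below are invented for illustration; real data would come from the Wikidata API.

```python
# A Wikidata-style item, flattened to plain property/value pairs.
# All values here are made up for the example.
item = {
    "label": "Hypsibarbus vernayi",
    "instance of": "fish",
    "family": "Cyprinidae",
    "found in": ["Laos", "Vietnam"],
    "length": "20 cm",
}

# One template per entity type; this is the "species" template.
TEMPLATE = (
    "{label} is a species of {instance_of} in the {family} family. "
    "It is found in {countries}. It grows to {length}."
)

def render_stub(item):
    """Render a pseudo-stub from structured data instead of storing
    a page in the database."""
    return TEMPLATE.format(
        label=item["label"],
        instance_of=item["instance of"],
        family=item["family"],
        countries=" and ".join(item["found in"]),
        length=item["length"],
    )

print(render_stub(item))
```

The hard parts this sketch glosses over are exactly the ones the proposal has to solve: localising the template text per language, and handling items that lack one of the expected properties.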
Re: [Wikimedia-l] The case for supporting open source machine translation
On 24/04/13 12:35, Denny Vrandečić wrote:
> Current machine translation research aims at using massive machine-learning-supported systems. They usually require big parallel corpora. We do not have big parallel corpora (Wikipedia articles are not translations of each other, in general), especially not for many languages, and there is no

Could you define "big"? If 10% of Wikipedia articles are translations of each other, we have 2 million translation pairs. Assuming ten sentences per average article, this is 20 million sentence pairs. An average Wikipedia with 100,000 articles would have 10,000 translations and 100,000 sentence pairs; a large Wikipedia with 1,000,000 articles would have 100,000 translations and 1,000,000 sentence pairs. Is this not enough to kickstart a massive machine-learning-supported system? (Consider also that the articles are somewhat similar in structure and less rich than general text; future tense is rarely used, for example.)
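The arithmetic behind these corpus-size estimates is simple enough to write down. The helper below just encodes the two assumptions stated above (10% of articles are translation pairs, ten sentences per article); both numbers are the post's guesses, not measured values.

```python
def corpus_estimate(articles, translated_share=0.10, sentences_per_article=10):
    """Estimate (translation pairs, sentence pairs) for a Wikipedia
    of the given size, under the stated assumptions."""
    pairs = int(articles * translated_share)
    return pairs, pairs * sentences_per_article

# The average and large Wikipedias from the post:
print(corpus_estimate(100_000))     # (10000, 100000)
print(corpus_estimate(1_000_000))   # (100000, 1000000)
# All Wikipedias combined, ~20 million articles, gives the headline
# figure of 2 million translation pairs / 20 million sentence pairs:
print(corpus_estimate(20_000_000))  # (2000000, 20000000)
```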
Re: [Wikimedia-l] The case for supporting open source machine translation
On 24/04/13 12:35, Denny Vrandečić wrote:
> In summary, I see four calls for action right now (and for all of them this means to first actually think more, write down a project plan, and gather input on that), that could and should be tackled in parallel if possible:
> I) develop a structured Wiktionary
> II) develop a feature that blends into Wikipedia's search if an article about a topic does not exist yet, but we have data on Wikidata about that topic
> III) develop a multilingual search, tagging, and structuring environment for Commons
> IV) develop structured wiki content using natural language as a surface syntax, with extensible parsers and serializers
> None of these goals would require tens of millions or decades of research and development. I think we could have an actionable plan developed within a month or two for all four goals, and my gut feeling is we could reach them all by 2015 or '16, depending on when we actually start implementing them.

I fully support this! This is fully within Wikimedia's current infrastructure, and generally was planned to be done anyway.
Re: [Wikimedia-l] The case for supporting open source machine translation
On 24/04/13 08:29, Erik Moeller wrote:
> Could open source MT be such a strategic investment? I don't know, but I'd like to at least raise the question. I think the alternative will be, for the foreseeable future, to accept that this piece of technology will be proprietary, and to rely on goodwill for any integration that concerns Wikimedia. Not the worst outcome, but also not the best one. Are there open source MT efforts that are close enough to merit scrutiny?

In order to be able to provide a high-quality result, you would need not only a motivated, well-intentioned group of people, but some of the smartest people in the field working on it. I doubt we could do more than kickstart an effort, but perhaps financial backing at significant scale could at least help a non-profit, open source effort develop enough critical mass to go somewhere.

A huge and worthwhile effort on its own, and in any case a necessary step for creating free MT software, would be to build a free (as in freedom) parallel translation corpus. This corpus could then be used as the starting point by people and groups producing free MT software, either under the WMF or on their own. This could be done by creating a new project where volunteers would compare Wikipedia articles and other freely licensed translated texts and mark sentences that are translations of other sentences. By the way, I believe Google Translate's corpus was created in this way. Perhaps this could best be achieved by teaming up with www.zooniverse.org or www.pgdp.net, who have experience with this kind of project. It would require specialized non-wiki software, and I don't think that the Foundation has enough experience in developing it. (By the way, similar things that could be similarly useful include free OCR training data or free fully annotated text.)
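The volunteer marking step would not have to start from scratch: the software could pre-pair sentences and only ask humans to confirm or reject. A crude version of this, pairing sentences in order and flagging pairs whose character lengths diverge too much (a much-simplified cousin of the classic Gale-Church length heuristic), might look like the sketch below. The sentences are invented placeholders.

```python
def propose_pairs(src_sentences, tgt_sentences, max_ratio=1.8):
    """Pair sentences positionally and mark each pair as a plausible
    translation candidate if the length ratio is within max_ratio."""
    candidates = []
    for s, t in zip(src_sentences, tgt_sentences):
        ratio = max(len(s), len(t)) / max(1, min(len(s), len(t)))
        candidates.append((s, t, ratio <= max_ratio))
    return candidates

# Invented example: an English article and its German counterpart.
src = ["The lake is deep.", "It freezes in winter."]
tgt = ["Der See ist tief.", "Er friert im Winter zu."]
for s, t, ok in propose_pairs(src, tgt):
    print(ok, "|", s, "<->", t)
```

Volunteers would then see each proposed pair and simply vote yes or no, which is exactly the kind of micro-task Zooniverse-style platforms are built around.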
Re: [Wikimedia-l] Are there plans for interactions between wikidata and wiktionaries ?
On 11/03/13 16:14, Gerard Meijssen wrote:
> What we are asking for is to ensure that the structures that exist in OmegaWiki are replicated in Wikidata, for reasons that are clear and obvious. Technically, there are a few things that make sense to have. For instance, in the Dutch language we have nouns, verbs, and adjectives; we do not have a "country" in this class. A noun can be male, female, or neuter; we do not have a "stupid". We have singular and plural, and we do not have the dual, as in Arabic.

Uh-oh. How will this square with http://blog.wikimedia.de/2013/02/22/restricting-the-world/ ? :) Perhaps this could be a feature of Wikibase that would not be turned on on Wikidata itself, but could be for other installations, such as the new Wiktionary.
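The per-language restrictions Gerard describes amount to each language declaring which values a grammatical feature may take, so that Dutch rejects "dual" while Arabic allows it. A minimal sketch of such a schema, with deliberately simplified feature tables that are illustrative rather than complete:

```python
# Per-language feature tables: which values each grammatical feature
# may take. These entries are simplified examples, not a full schema.
LANGUAGE_FEATURES = {
    "nl": {
        "gender": {"male", "female", "neuter"},
        "number": {"singular", "plural"},
    },
    "ar": {
        "number": {"singular", "dual", "plural"},
    },
}

def valid_feature(lang, feature, value):
    """Check whether a feature value is permitted for a language."""
    allowed = LANGUAGE_FEATURES.get(lang, {}).get(feature)
    return allowed is not None and value in allowed

print(valid_feature("nl", "number", "dual"))  # False
print(valid_feature("ar", "number", "dual"))  # True
```

Whether such constraints belong in Wikibase itself or in an optional layer is exactly the "restricting the world" question raised above.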
Re: [Wikimedia-l] Harvard urges Elsevier boycott
On 15/01/13 09:50, Federico Leva (Nemo) wrote:
> David Gerard, 15/01/2013 09:30:
>> On 15 January 2013 08:26, Andrea Zanni zanni.andre...@gmail.com wrote:
>>> * the folks at archiveteam set up this: http://aaronsw.archiveteam.org/ It is not properly legal; read it all.
>> Note, btw, that this page was set up for humorous purposes: the papers it "liberates" are the public domain papers JSTOR has recently made freely available. The one "not properly legal" thing the user does is something outside the JSTOR terms of service.
> I'm also an ArchiveTeam member, and the JSTOR liberator was definitely NOT set up for humorous purposes: it's a serious project, and everyone should consider joining. For the occasion, it also gives people the option to add a memorial message about Aaron.

I'd just like to point out that while people who liberate documents do violate JSTOR's TOS, people who subsequently access the documents have never agreed to the TOS and are not bound by it, so everything should be perfectly legal for them.
Re: [Wikimedia-l] Editor retention (was Re: Big data benefits and limitations (relevance: WMF editor engagement, fundraising, and HR practices))
On 15/01/13 00:21, Richard Farmbrough wrote:
> Of course any effort to make article source more readable meets with opposition, in the case of references in particular. And not only from those who cite CITEVAR legitimately, but from at least one admin who will block for putting references in numerical order. These are the sorts of things which would not have lasted long in the (admittedly slightly mythical) Good Old Days.

Unfortunately, even this admin has some justification for what he's doing: he probably encountered someone who was using reordering to introduce subtle vandalism (since it can't be checked in a diff). Again I see that part of the problem is that there are too few people guarding too much content, but I don't see what could be done to change this.
Re: [Wikimedia-l] Editor retention (was Re: Big data benefits and limitations (relevance: WMF editor engagement, fundraising, and HR practices))
On 09/01/13 10:03, Kim Bruning wrote:
> On Wed, Jan 09, 2013 at 07:45:41AM +, David Gerard wrote:
>> Right. So anyone in this thread going into detail about en:wp policies is actually not addressing this, and the problem is on a higher level? :-/ Back to the drawing board.
> That actually makes the problem a lot harder! (It does mean we know where to start looking, though.)

I am not sure that Facebook is the problem. http://www.google.com/trends/explore#q=wikipedia,facebook does show that Facebook overtook Wikipedia sometime in 2007, but that happened relatively slowly. Having said that, there have been suggestions to introduce social networking features into Wikipedia. WikiLove is a step in that direction. So, what could be the next step? Befriending users and seeing their edits and new articles? Liking edits and articles?
Re: [Wikimedia-l] Editor retention (was Re: Big data benefits and limitations (relevance: WMF editor engagement, fundraising, and HR practices))
On 05/01/13 04:47, Tim Starling wrote:
> For example, requiring phone number verification for new users from developed countries would be less damaging.

I don't see how this is supposed to help (and I don't think most new users would want to do this; I certainly wouldn't).
Re: [Wikimedia-l] Editor retention (was Re: Big data benefits and limitations (relevance: WMF editor engagement, fundraising, and HR practices))
On 08/01/13 11:35, Federico Leva (Nemo) wrote:
> Nikola Smolenski, 08/01/2013 10:30:
>> On 05/01/13 04:47, Tim Starling wrote:
>>> For example, requiring phone number verification for new users from developed countries would be less damaging.
>> I don't see how this is supposed to help (and I don't think most new users would want to do this; I certainly wouldn't).
> Not to say that it would be a good idea, but Google does it already, and phone verification is probably less painful than our CAPTCHAs are to non-English users (https://bugzilla.wikimedia.org/show_bug.cgi?id=5309).

It's not that it's painful; it's that I don't want various organizations to know my phone number.

> In general, as far as we know, captchas are currently not stopping spammers at all, while effectively stopping many legitimate (less

Care to elaborate? Do we know how spammers are avoiding captchas (by software or by humans)? How come other websites don't have this problem?
Re: [Wikimedia-l] photography restrictions at the Olympics
On 27/07/12 03:47, birgitte...@yahoo.com wrote:
> On Jul 26, 2012, at 4:23 AM, wiki-l...@phizz.demon.co.uk wrote:
>> There is a contractual arrangement between the IOC and the photographer, as specified in the terms and conditions on the ticket. If someone makes photos available commercially, then they may be sued by the IOC under the terms of that contract. The issue isn't about copyright but about the contractual agreement and personal liability between the photographer and the IOC.
>
> This is a contract within the ticket fine print. But I don't see how that contract could actually bind the photographs. Certainly it prevents you, the contractually bound ticket holder, from using media you produced under this contract in a commercial manner. However, the IOC cannot possibly extend the contract beyond the ticket holder, nor force the ticket holder to police third parties. Let's run through a few possibilities:
>
> Ticket-holder (TH) places an own-work photo on Facebook. It goes viral across the Internet, and eventually posters of the photo are found in the marketplace. The IOC wishes to end poster sales. Your position that the contract must be effective against third parties would mean that if TH fails to hire a lawyer and vigorously enforce their copyrights, then they have broken the terms of the contract with the IOC and are liable for damages. This is not how contracts work. If TH does not choose to enforce their copyrights, then the IOC can do nothing.
>
> TH has a great photo; their sister owns a bookstore. TH informally licenses the photo to Sis to use in advertising. The IOC does not even have the standing to discover whether Sis has a license to use the photo or is instead infringing on the creator's copyright. Only the copyright holder has standing to contest the use of their work. The IOC can do nothing.
>
> TH dies. Daughter inherits the copyrights and sells photos taken at last month's Olympics. The IOC can do nothing.
>
> TH donates the full copyrights on all photos they created at the Games to a non-profit organization, on the condition that their identity is not revealed. The non-profit, now the copyright holder, licenses the entire collection CC-SA. The IOC can do nothing.

An excellent list :) I'd like to add: you sneak into the stadium without paying for a ticket. The IOC can do nothing. Seriously, if the IOC decides to go after someone, don't they first have to prove that he bought the ticket? And how can they prove that?
Re: [Wikimedia-l] Russian Wikipedia goes on strike
On 11/07/12 09:40, Milos Rancic wrote:
> Yep, I forgot it. BTW, note the comments below the RIA Novosti news on the Russian Wikipedia strike [1]. That baseline fluctuates a lot :)
> [1] http://en.ria.ru/society/20120710/174509543.html

By the way, Western media are spinning this as an anti-Putin protest; see e.g. http://www.guardian.co.uk/world/2012/jul/10/russian-wikipedia-shut-down-protest
Re: [Wikimedia-l] Russian Wikipedia goes on strike
On 11/07/12 09:57, Fred Bauder wrote:
> Try http://www.guardian.co.uk/world/2012/jul/10/russian-wikipedia-shut-down-protest?INTCMP=SRCH It is quite possible that, as in China, political censorship is the actual purpose, and pornography, and whatever, is just the excuse.

Yes, but this has nothing to do with Putin. First, it doesn't seem that this law is being pushed personally by Putin. Second, Russian Wikipedians would be against the law regardless of whether it were pushed by Putin or not. Third, an anti-Putin, pro-Western government could be expected to be even worse in this regard.
Re: [Wikimedia-l] Russian Wikipedia goes on strike
On 10/07/12 08:16, Milos Rancic wrote:
> On Tue, Jul 10, 2012 at 7:58 AM, Keegan Peterzell keegan.w...@gmail.com wrote:
>> Okay, I'll bite. This is just my opinion, based on SOPA in the United States and what our government represents.
> Thanks! I am responding as a non-cognitivist moral skeptic nihilist.

I thought you were an egoist.
Re: [Wikimedia-l] Russian Wikipedia goes on strike
On 10/07/12 15:45, Keegan Peterzell wrote:
> "They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." ~ Benjamin Franklin

The blackout was exactly the opposite: a little temporary (one day) safety (all the content was still available at countless mirrors) was given up in order to obtain an essential liberty.