Re: [Wikidata-l] Using Special:Unconnected Pages? Please read.
Special:UnconnectedPages should be rewritten to use the QueryPage class, so that 1) it can use better caching, and 2) it can expose its results through the API (which it currently can't, and that's why we don't have Special:UnconnectedPages support in Pywikibot). I started to do this a while ago but got busy, so I deferred this changeset of mine, https://gerrit.wikimedia.org/r/167795, until I have time for it; fortunately I will work on it in May. What do you think? Best

On Thu, Apr 30, 2015 at 3:37 AM Romaine Wiki romaine.w...@gmail.com wrote:

Hi Lydia,

The Dutch community (on the Dutch Wikipedia) held a vote in which it was decided to get rid of all the interwikis on articles, categories, templates, project pages and help pages. The moving or removal of all interwikis was completed a few days ago, and along the way we have tried to fix all the interwiki conflicts we found. We also have an active abuse filter that prevents users from adding local interwikis to these pages.

Instead, we actively communicate in person with every user who creates a new article that has not been added to Wikidata. We aim for every article to be added to Wikidata with a label in Dutch and at least the property *instance of*, and for many subjects multiple statements. When we actively tell users that they should add their new article to Wikidata, the most common reaction is that they did not know it is good to have it added to Wikidata. The second most common issue we hear is that they find it difficult to add articles to Wikidata (including adding statements). Users who were used to adding interwikis in the past are now actively adding their articles to existing Wikidata items.

We use this special page to find out which articles people forgot to add to (existing) items on Wikidata. I have a templated message that notifies those users if they forgot to add their article to an existing item and reminds them that this is needed. For users who are writing articles about typically Dutch subjects which are not expected to have an article in another language Wikipedia, this is new, and I have a templated message that explains why it is needed to create an item on Wikidata, tells them how to do it, and asks them to add some statements.

We use the special page UnconnectedPages to see which articles have not yet been added to Wikidata. Another group of pages we try to find with this special page are the ones that have been removed from an item without being added to another item. The special page makes it possible for me to simply select the list with my mouse and then compare it in AutoWikiBrowser (still a pity we can't use AWB for items on Wikidata) to see which of the unconnected pages is in which category, or uses which template, so we can add them to Wikidata group-wise.

So to conclude, this special page is essential for our work to get all the Wikipedia articles into Wikidata. The adding, by the way, we like to do by hand so that we can at least add some basic statements depending on the subject. Sometimes it is possible to use a tool to create items and add statements for a group of articles, but most of them are done by hand. (If it is done by bot/tool we get something like 1000 items with only P17=Q55, and that is as bad as nothing.) By doing it by hand we make sure the most basic properties are added.

The biggest issue we face is that the page is not updated in real time, and that pages that have already been added to Wikidata are still in the list. We then have to do a null edit on the article to get them removed from the list.
Another delay is that articles that were removed from an item do not show up directly. (Was it removed correctly from that item, or was it an error or vandalism? It takes a lot of searching to find that out, if it can be found at all.)

Another issue we have, though we can work around it by just ignoring them, are articles that are newly created and nominated for deletion as non-encyclopaedic or too badly written to be kept. The procedure for this kind of article is that it gets a template saying it is nominated, and after two weeks an admin looks at it and judges whether it should be kept, deleted, or whether the nomination period should be extended to give the writers extra time to fix the issues. During this period of two weeks the article is on the special page of unconnected pages, even though part of these articles are expected to be deleted. During these two weeks the articles nominated for deletion make the special page a bit cluttered. (Maybe it is an idea to have a magic word added to the deletion templates so that articles that are nominated for deletion and not connected on Wikidata show up at the bottom of the list. I think those articles nominated for deletion should stay on the special page, because I can imagine users would accidentally add such a magic word to an article, and a magic word that would prevent being shown on the list is
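As an illustration of what the QueryPage rewrite proposed at the top of this thread would buy us: once the special page is QueryPage-based it becomes reachable through the standard querypage API module, roughly as in this sketch (the qppage name 'UnconnectedPages' is an assumption; the module itself is standard MediaWiki):

    # Sketch: fetch unconnected pages through the API querypage module,
    # assuming Special:UnconnectedPages is rewritten on top of QueryPage.
    import requests

    def unconnected_pages(api_url, limit=50):
        params = {
            'action': 'query',
            'list': 'querypage',
            'qppage': 'UnconnectedPages',   # assumed name once the page is QueryPage-based
            'qplimit': limit,
            'format': 'json',
        }
        data = requests.get(api_url, params=params).json()
        return [row['title'] for row in data['query']['querypage']['results']]

    # Example:
    # print(unconnected_pages('https://nl.wikipedia.org/w/api.php'))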
Re: [Wikidata-l] Auto-transliteration of names of humans
Firstly, note the name of the thread: it's about transliterating names of humans, not transliterating in general, so translation doesn't make sense at all in this case. Secondly, I can transliterate names of Chinese people into Dutch or other Latin-script languages too; it works well, and each language gets a completely separate label on the item (so an item can easily have a Dutch label and an English label at the same time). Thirdly, an ISO standard is something we can use in some cases, but most of the time how common the transliteration is matters more. By some standards Smith should be transliterated as اسمیث, but اسمیت is more common, the latter is used everywhere in Persian, and the bot works that way. (Note that transliteration to Arabic is something completely different.) (P.S. Arabic is another language I can work on too.) Fourthly: the country of citizenship of the person is important too (my bot considers this as well). Why? For example, the Michael in Michael Jackson is transliterated as مایکل (maay-kel), but the Michael in Michael Schumacher is میشائل (mi-shaa-el), because this name is pronounced differently in different languages. The same goes for James Bond and James Rodriguez (the first is جیمز, Jeymez, and the latter is خامس, Khaa-mes). Best

On Thu, Apr 23, 2015 at 11:36 AM Erics wikiadres wikiza...@xs4all.nl wrote:

Thanks for your reaction, Stas. I understand what you're saying, but I think it's quite arbitrary to use English transliteration. In my opinion transliteration should be unbiased and impartial, like the international (ISO) standards are. Wikidata provides the possibility to enter the national standards Shchedrin (English), Sjtsjedrin (Dutch), Chtchedrin (French) and Schtschedrin (German) as aliases. So why not use the ISO standard as the main form? best regards, Eric.

Stas Malyshev schreef op 2015-04-22 22:10:

Hi! maybe French and German too). Problem could be that these standardized forms lead to diacritic characters (e.g. hacek on 'c' 's'; Чайковский --- Čajkovskij and Щедрин --- Ščedrin). Is wikidata able to deal with these?

As a native Russian speaker I can say a transliteration like Ščedrin would look very unusual to a Russian-speaking person (assuming they have any experience at all with non-Cyrillic transliterations, which most internet users do). Something like Shchedrin looks more familiar and seems to be much more common. While letters like ч and щ can indeed generate some long combinations which are not very visually appealing, I think that is still more common than diacritics, which I think most people would struggle with. As for Hebrew, there are standard transliteration rules, which look a bit weird since they are not phonetic but rather based on spelling, and they distinguish some letters that have all but lost their phonetic distinction in modern Hebrew (such as kaf and kuf) - but they are frequently used for signs, street names, maps, etc. These rules have been recently updated but the old ones are still used from time to time. See more at https://en.wikipedia.org/wiki/Romanization_of_Hebrew

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
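To illustrate the citizenship-dependent choice described above, a minimal Python sketch; the lookup table contains only the examples from this mail (with the corresponding country items), and a real bot would of course need a much larger mapping plus fallbacks:

    # Sketch: pick a Persian transliteration of a given name based on the
    # person's country of citizenship (P27), using the examples from this mail.
    TRANSLITERATIONS = {
        ('Michael', 'Q30'): 'مایکل',    # United States -> maay-kel
        ('Michael', 'Q183'): 'میشائل',  # Germany -> mi-shaa-el
        ('James', 'Q30'): 'جیمز',       # Jeymez
        ('James', 'Q739'): 'خامس',      # Colombia -> Khaa-mes
    }

    def transliterate_given_name(name, citizenship_qid):
        # Return None when no rule is known, so the bot can skip the item.
        return TRANSLITERATIONS.get((name, citizenship_qid))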
[Wikidata-l] Auto-transliteration of names of humans
Hello, I started a bot for auto-transliterating names of humans, initially for Persian and English (as a pair) since I know both and can debug. After some modifications, in the last check of more than several hundred edits I couldn't find any errors. I want to expand this bot to other languages, but first I need the opinions of people who know the rules for transliterating names in those languages. I tried to work out the rules myself, and this is my result, but I need someone familiar with each language to confirm:

*Chinese: Instead of a space it uses the · character (it's not a period), but the order is the same. E.g. Alan Turing is: 艾伦·图灵 https://zh.wikipedia.org/wiki/%E8%89%BE%E4%BC%A6%C2%B7%E5%9B%BE%E7%81%B5 in which 艾伦 means Alan and 图灵 means Turing.
*Japanese: the same, but with a different separator: ・, e.g. アラン・チューリング https://ja.wikipedia.org/wiki/%E3%82%A2%E3%83%A9%E3%83%B3%E3%83%BB%E3%83%81%E3%83%A5%E3%83%BC%E3%83%AA%E3%83%B3%E3%82%B0
*Russian: The separator is a space character, but the order is FamilyName, GivenName, e.g. Тьюринг, Алан https://ru.wikipedia.org/wiki/%D0%A2%D1%8C%D1%8E%D1%80%D0%B8%D0%BD%D0%B3,_%D0%90%D0%BB%D0%B0%D0%BD is Turing, Alan. Handling names with more than two words would be pretty complicated (I skip them).
*I checked Hebrew and Greek and both are simple languages like Persian: same order, space as separator.

If you can help me, it would make a great difference in the number of labels in your language. Things you can help with are: 1- Confirm or correct the rules for these languages and add other rules if needed. 2- Suggest more languages. I thought about Sanskrit, Hindi, and Telugu but I don't know anyone who can check the rules; if you do, please help me. 3- For any language I will do an initial run just to test; if you can check the edits of the bot (which is pretty easy, e.g. see this https://www.wikidata.org/w/index.php?title=Special:Contributions/Dexbot&dir=prev&offset=20150422000329&target=Dexbot) it would be awesome. Thanks, Best

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
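A small sketch of the ordering/separator rules listed above, assuming the given name and family name have already been transliterated into the target script (the Russian separator is written as ', ' because the example label is 'Тьюринг, Алан'):

    # Sketch: assemble a person label from transliterated name parts,
    # following the per-language order/separator rules described above.
    RULES = {
        'zh': {'separator': '\u00B7', 'order': 'given-first'},   # middle dot, not a period
        'ja': {'separator': '\u30FB', 'order': 'given-first'},   # katakana middle dot
        'ru': {'separator': ', ',     'order': 'family-first'},
        'fa': {'separator': ' ',      'order': 'given-first'},
        'he': {'separator': ' ',      'order': 'given-first'},
        'el': {'separator': ' ',      'order': 'given-first'},
    }

    def build_label(lang, given, family):
        rule = RULES[lang]
        if rule['order'] == 'family-first':
            return family + rule['separator'] + given
        return given + rule['separator'] + family

    # build_label('zh', '艾伦', '图灵')      -> '艾伦·图灵'
    # build_label('ru', 'Алан', 'Тьюринг')  -> 'Тьюринг, Алан'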
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
One mistake I just found via the report: https://www.wikidata.org/wiki/Q2963097 The article on the French Wikipedia is about a French type of cheese, but it is connected to an article on the Russian Wikipedia about a French playwright. Best

On Fri, Mar 20, 2015 at 3:59 AM Amir Ladsgroup ladsgr...@gmail.com wrote: Try to download it, or change the character encoding to UTF-8 or Unicode. And yes, it's based on dumps. :)

On Fri, Mar 20, 2015 at 3:51 AM Ricordisamoa ricordisa...@openmailbox.org wrote: Il 20/03/2015 01:11, Amir Ladsgroup ha scritto:

OK, I have some news: 1- Today I rewrote some parts of Kian and now it automatically chooses the regularization parameter (lambda), so predictions are more accurate. I wanted to push the changes to GitHub but it seems my ssh has issues. It'll be there soon. 2- (Important) I wrote code that can find possible mistakes in Wikidata based on Kian. The code will be on GitHub soon. Check out this link http://tools.wmflabs.org/dexbot/possible_mistakes_fr.txt. It's the result of comparing the French Wikipedia against Wikidata. E.g. this line: Q2994923: 1 (d), 0.257480420229 (w) [0, 0, 1, 2, 0] — 1 (d) means Wikidata thinks it's a human; 0.25... (w) means the French Wikipedia suggests it's not a human (with 74.3% certainty). And if you check the link you can see it's a mistake in Wikidata. Please check the other results and fix them. Tell me if you want this test to be run for another language too. 3- I used Kian to import unconnected pages from the French Wikipedia and created about 1900 items. The result is here http://tools.wmflabs.org/dexbot/kian_res_fr.txt; please check if anything in this list is not a human and tell me, and I'll run some error analysis. Best

Great! Unfortunately, some files seem to have been published with the wrong character encoding. E.g. the first name shows up as Échécrate in my browsers.

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
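For anyone who wants to work through the report programmatically, a rough sketch that parses lines like the one quoted above (the exact file format is inferred from that single example):

    # Sketch: parse a line of the possible-mistakes report, e.g.
    #   Q2994923: 1 (d), 0.257480420229 (w) [0, 0, 1, 2, 0]
    # (d) = what Wikidata says (1 = human), (w) = Kian's estimate from Wikipedia.
    import re

    LINE = re.compile(r'^(Q\d+): ([01]) \(d\), ([\d.]+) \(w\)')

    def parse_report_line(line):
        m = LINE.match(line)
        if not m:
            return None
        qid, wikidata_says, wiki_score = m.groups()
        return {
            'item': qid,
            'wikidata_human': wikidata_says == '1',
            'wiki_human_probability': float(wiki_score),
        }

    # parse_report_line('Q2994923: 1 (d), 0.257480420229 (w) [0, 0, 1, 2, 0]')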
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
OK, I have some news: 1- Today I rewrote some parts of Kian and now it automatically chooses the regularization parameter (lambda), so predictions are more accurate. I wanted to push the changes to GitHub but it seems my ssh has issues. It'll be there soon. 2- (Important) I wrote code that can find possible mistakes in Wikidata based on Kian. The code will be on GitHub soon. Check out this link http://tools.wmflabs.org/dexbot/possible_mistakes_fr.txt. It's the result of comparing the French Wikipedia against Wikidata. E.g. this line: Q2994923: 1 (d), 0.257480420229 (w) [0, 0, 1, 2, 0] — 1 (d) means Wikidata thinks it's a human; 0.25... (w) means the French Wikipedia suggests it's not a human (with 74.3% certainty). And if you check the link you can see it's a mistake in Wikidata. Please check the other results and fix them. Tell me if you want this test to be run for another language too. 3- I used Kian to import unconnected pages from the French Wikipedia and created about 1900 items. The result is here http://tools.wmflabs.org/dexbot/kian_res_fr.txt; please check if anything in this list is not a human and tell me, and I'll run some error analysis. Best

On Mon, Mar 16, 2015 at 9:50 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: Thanks Sjoerddebruin, I'm working on this so I can write a system to find possible mistakes, and it will find and report mistakes made by Dexbot or others. It works more precisely as time goes by. Best

On Sun, Mar 15, 2015 at 8:51 PM Sjoerd de Bruin sjoerddebr...@me.com wrote: Now the gender game is working again, I noticed there were a lot of issues with the following category: https://nl.wikipedia.org/wiki/Categorie:Danceact As you can see, it's about musical groups but they all were marked as human. Greetings, Sjoerd de Bruin sjoerddebr...@me.com

Op 14 mrt. 2015, om 14:18 heeft Amir Ladsgroup ladsgr...@gmail.com het volgende geschreven: I'm writing a parser so I can feed gender classification to Kian. It'll be done soon and you can use it :)

On Sat, Mar 14, 2015 at 12:53 PM Sjoerd de Bruin sjoerddebr...@me.com wrote: Hm, the Wikidata Game is really slow. Magnus, if you read this: do you know what's going on? I play the gender game with only nlwiki articles, but it never loads. It was working yesterday with just 50 items, so it should work now imo. Greetings, Sjoerd de Bruin sjoerddebr...@me.com

Op 14 mrt. 2015, om 09:39 heeft Sjoerd de Bruin sjoerddebr...@me.com het volgende geschreven: I've corrected two lists (Lijst van voorzitters van de SER and Lijst van voorzitters van de WRR) and a music group (Viper (Belgische danceact)). Will play the gender game the next few days to check them. Greetings, Sjoerd de Bruin sjoerddebr...@me.com

Op 14 mrt. 2015, om 00:51 heeft Amir Ladsgroup ladsgr...@gmail.com het volgende geschreven: Sorry for the late answer, got busy in the real world. This is the result for unconnected pages of the Dutch Wikipedia: http://tools.wmflabs.org/dexbot/kian_res_nl.txt Please check and tell me if any of them are not human. I'm producing results for empty items related to the Dutch Wikipedia.

On Thu, Mar 12, 2015 at 2:58 PM Amir Ladsgroup ladsgr...@gmail.com wrote: Sure, tonight it will be done. Best

On Thu, Mar 12, 2015 at 2:08 AM, Sjoerd de Bruin sjoerddebr...@me.com wrote: I'm ready for it! All existing humans on nlwiki have a gender now, so it's easy to review this batch. Bring it on.

Op 11 mrt. 2015, om 22:14 heeft Maarten Dammers maar...@mdammers.nl het volgende geschreven: Hi Amir, Amir Ladsgroup schreef op 9-3-2015 om 22:40: Result for English Wikipedia (6366 articles classified as human) https://tools.wmflabs.org/dexbot/kian_res_en.txt Sounds like fun! Can you run it on the Dutch Wikipedia too? On https://tools.wmflabs.org/multichill/queries/wikidata/noclaims_nlwiki.txt I have a list of items without claims (linking them to other items). Maarten

-- Amir

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
Try to download it, or change the character encoding to utf-8 or unicode. And yes it's based on dumps. :) On Fri, Mar 20, 2015 at 3:51 AM Ricordisamoa ricordisa...@openmailbox.org wrote: Il 20/03/2015 01:11, Amir Ladsgroup ha scritto: OK, I have some news: 1- Today I rewrote some parts of Kian and now it automatically chooses regulation parameter (lambda), thus predictions are more accurate. I wanted to push changes to the github but It seems my ssh has issues. It'll be there soon 2- (Important) I wrote a code that can find possible mistakes in Wikidata based on Kian. The code will be in github soon. Check out this link http://tools.wmflabs.org/dexbot/possible_mistakes_fr.txt. It's result from comparing French Wikipedia against Wikidata e.g. this line: Q2994923: 1 (d), 0.257480420229 (w) [0, 0, 1, 2, 0] 1 (d) means Wikidata thinks it's a human 0.25... (w) means French Wikipedia thinks it's not a human (with 74.3% certainty) And if you check the link you can see it's a mistake in Wikidata. Please check other results and fix them. Tell me if you want this test to be ran from another language too. 3- I used Kian to import unconnected pages from French Wikipedia and created about 1900 items. The result is here http://tools.wmflabs.org/dexbot/kian_res_fr.txt and please check if anything in this list is not human and tell me and I run some error analysis. Best Great! Unfortunately, some files seem to have been published with the wrong character encoding. E.g. the first name shows up as Échécrate in my browsers. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
Thanks Sjoerddebruin, I'm working on this so I can write a system to find possible mistakes, and it will find and report mistakes made by Dexbot or others. It works more precisely as time goes by. Best

On Sun, Mar 15, 2015 at 8:51 PM Sjoerd de Bruin sjoerddebr...@me.com wrote: Now the gender game is working again, I noticed there were a lot of issues with the following category: https://nl.wikipedia.org/wiki/Categorie:Danceact As you can see, it's about musical groups but they all were marked as human. Greetings, Sjoerd de Bruin sjoerddebr...@me.com

Op 14 mrt. 2015, om 14:18 heeft Amir Ladsgroup ladsgr...@gmail.com het volgende geschreven: I'm writing a parser so I can feed gender classification to Kian. It'll be done soon and you can use it :)

On Sat, Mar 14, 2015 at 12:53 PM Sjoerd de Bruin sjoerddebr...@me.com wrote: Hm, the Wikidata Game is really slow. Magnus, if you read this: do you know what's going on? I play the gender game with only nlwiki articles, but it never loads. It was working yesterday with just 50 items, so it should work now imo. Greetings, Sjoerd de Bruin sjoerddebr...@me.com

Op 14 mrt. 2015, om 09:39 heeft Sjoerd de Bruin sjoerddebr...@me.com het volgende geschreven: I've corrected two lists (Lijst van voorzitters van de SER and Lijst van voorzitters van de WRR) and a music group (Viper (Belgische danceact)). Will play the gender game the next few days to check them. Greetings, Sjoerd de Bruin sjoerddebr...@me.com

Op 14 mrt. 2015, om 00:51 heeft Amir Ladsgroup ladsgr...@gmail.com het volgende geschreven: Sorry for the late answer, got busy in the real world. This is the result for unconnected pages of the Dutch Wikipedia: http://tools.wmflabs.org/dexbot/kian_res_nl.txt Please check and tell me if any of them are not human. I'm producing results for empty items related to the Dutch Wikipedia.

On Thu, Mar 12, 2015 at 2:58 PM Amir Ladsgroup ladsgr...@gmail.com wrote: Sure, tonight it will be done. Best

On Thu, Mar 12, 2015 at 2:08 AM, Sjoerd de Bruin sjoerddebr...@me.com wrote: I'm ready for it! All existing humans on nlwiki have a gender now, so it's easy to review this batch. Bring it on.

Op 11 mrt. 2015, om 22:14 heeft Maarten Dammers maar...@mdammers.nl het volgende geschreven: Hi Amir, Amir Ladsgroup schreef op 9-3-2015 om 22:40: Result for English Wikipedia (6366 articles classified as human) https://tools.wmflabs.org/dexbot/kian_res_en.txt Sounds like fun! Can you run it on the Dutch Wikipedia too? On https://tools.wmflabs.org/multichill/queries/wikidata/noclaims_nlwiki.txt I have a list of items without claims (linking them to other items). Maarten

-- Amir

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
I just published the code: https://github.com/Ladsgroup/Kian I would really appreciate any comments on or changes to the code.

On Sun, Mar 15, 2015 at 2:30 PM Amir Ladsgroup ladsgr...@gmail.com wrote: I'm cleaning up and PEP8-ifying the code to publish it.

On Sat, Mar 14, 2015 at 8:42 PM Cristian Consonni kikkocrist...@gmail.com wrote: 2015-03-08 12:56 GMT+01:00 Ricordisamoa ricordisa...@openmailbox.org: Sounds promising! It'd be good to have the code publicly viewable. +1 Cristian

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
I'm writing a parser so I can feed gender classification to Kian, It'll be done soon and you can use it :) On Sat, Mar 14, 2015 at 12:53 PM Sjoerd de Bruin sjoerddebr...@me.com wrote: Hm, the Wikidata Game is really slow. Magnus, if you read this: do you know what's going on? I play the gender game with only nlwiki articles, but it never loads. It was working yesterday with just 50 items, so it should work now imo. Greetings, Sjoerd de Bruin sjoerddebr...@me.com Op 14 mrt. 2015, om 09:39 heeft Sjoerd de Bruin sjoerddebr...@me.com het volgende geschreven: I've corrected two lists (Lijst van voorzitters van de SER and Lijst van voorzitters van de WRR) and a music group (Viper (Belgische danceact)). Will play the gender game the next few days to check them. Greetings, Sjoerd de Bruin sjoerddebr...@me.com Op 14 mrt. 2015, om 00:51 heeft Amir Ladsgroup ladsgr...@gmail.com het volgende geschreven: Sorry for the late answer, got busy in the real world. This is the result for unconnected pages of Dutch Wikipedia. http://tools.wmflabs.org/dexbot/kian_res_nl.txt Please check and tell me when they are not human. I'm producing result for empty items related to Dutch Wikipedia. On Thu, Mar 12, 2015 at 2:58 PM Amir Ladsgroup ladsgr...@gmail.com wrote: Sure, tonight it will be done. Best On Thu, Mar 12, 2015 at 2:08 AM, Sjoerd de Bruin sjoerddebr...@me.com wrote: I'm ready for it! All existing humans on nlwiki have a gender now, so it's easy to review this batch. Bring it on. Op 11 mrt. 2015, om 22:14 heeft Maarten Dammers maar...@mdammers.nl het volgende geschreven: Hi Amir, Amir Ladsgroup schreef op 9-3-2015 om 22:40: Result for English Wikipedia (6366 articles classified as human) https://tools.wmflabs.org/dexbot/kian_res_en.txt Sounds like fun! Can you run it on the Dutch Wikipedia too? On https://tools.wmflabs.org/multichill/queries/wikidata/noclaims_nlwiki.txt I have a list of items without claims (linking them to other items). Maarten ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
Sorry for the late answer, got busy in the real world. This is the result for unconnected pages of Dutch Wikipedia. http://tools.wmflabs.org/dexbot/kian_res_nl.txt Please check and tell me when they are not human. I'm producing result for empty items related to Dutch Wikipedia. On Thu, Mar 12, 2015 at 2:58 PM Amir Ladsgroup ladsgr...@gmail.com wrote: Sure, tonight it will be done. Best On Thu, Mar 12, 2015 at 2:08 AM, Sjoerd de Bruin sjoerddebr...@me.com wrote: I'm ready for it! All existing humans on nlwiki have a gender now, so it's easy to review this batch. Bring it on. Op 11 mrt. 2015, om 22:14 heeft Maarten Dammers maar...@mdammers.nl het volgende geschreven: Hi Amir, Amir Ladsgroup schreef op 9-3-2015 om 22:40: Result for English Wikipedia (6366 articles classified as human) https://tools.wmflabs.org/dexbot/kian_res_en.txt Sounds like fun! Can you run it on the Dutch Wikipedia too? On https://tools.wmflabs.org/multichill/queries/wikidata/noclaims_nlwiki.txt I have a list of items without claims (linking them to other items). Maarten ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
Sure, tonight it will be done. Best On Thu, Mar 12, 2015 at 2:08 AM, Sjoerd de Bruin sjoerddebr...@me.com wrote: I'm ready for it! All existing humans on nlwiki have a gender now, so it's easy to review this batch. Bring it on. Op 11 mrt. 2015, om 22:14 heeft Maarten Dammers maar...@mdammers.nl het volgende geschreven: Hi Amir, Amir Ladsgroup schreef op 9-3-2015 om 22:40: Result for English Wikipedia (6366 articles classified as human) https://tools.wmflabs.org/dexbot/kian_res_en.txt Sounds like fun! Can you run it on the Dutch Wikipedia too? On https://tools.wmflabs.org/multichill/queries/wikidata/noclaims_nlwiki.txt I have a list of items without claims (linking them to other items). Maarten ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
Hey,

On Mon, Mar 9, 2015 at 5:50 AM, Tom Morris tfmor...@gmail.com wrote: On Sun, Mar 8, 2015 at 7:34 AM, Amir Ladsgroup ladsgr...@gmail.com wrote: This is the result for German Wikipedia: ... so I got list of articles in German Wikipedia that doesn't have item in Wikidata. There were 16K articles ... When the number is below 0.50 it is obvious that they are not human. Between 0.50-0.61 there are 78 articles that the bot can't determine whether it's a human or not [1] and articles with more than 0.61 is definitely human. I used 0.62 just to be sure and created 3600 items with P31:5 in them.

Definitely human in this context means that you did 100% verification of the 3600 items and one (or more?) human(s) agreed with the bot's judgement in these cases? Or that you validated a statistically significant sample of the 3600? Or something else?

I meant that Kian classified them as definitely human. I can't check all of them, but their names https://tools.wmflabs.org/dexbot/kian_res_de.txt seem to be OK. Please take a look and examine this list any way you want. Best

Tom

-- Amir

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
Result for English Wikipedia (6366 articles classified as human) https://tools.wmflabs.org/dexbot/kian_res_en.txt On Mon, Mar 9, 2015 at 2:39 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: Hey, On Mon, Mar 9, 2015 at 5:50 AM, Tom Morris tfmor...@gmail.com wrote: On Sun, Mar 8, 2015 at 7:34 AM, Amir Ladsgroup ladsgr...@gmail.com wrote: This is the result for German Wikipedia: ... so I got list of articles in German Wikipedia that doesn't have item in Wikidata. There were 16K articles ... When the number is below 0.50 it is obvious that they are not human. Between 0.50-0.61 there are 78 articles that the bot can't determine whether it's a human or not [1] and articles with more than 0.61 is definitely human. I used 0.62 just to be sure and created 3600 items with P31:5 in them. Definitely human in this context means that you did 100% verification of the 3600 items and one (or more?) human(s) agreed with the bots judgement in these cases? Or that you validated a statistically significant sample of the 3600? Or something else? I meant Kian classified them as Definitely human, I can't check all of them but their names https://tools.wmflabs.org/dexbot/kian_res_de.txt seems to be ok. please take a look and examine this list any way you want. Best Tom ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
This is the result for the German Wikipedia: I ran the bot for German and I wanted to add P31:5, but it seems more than 90% of Wikidata items already have a P31 statement (how?) and there was nothing I could do there, so instead I got the list of articles in the German Wikipedia that don't have an item in Wikidata. There were 16K articles, and the output of the bot for each one of them is here https://tools.wmflabs.org/dexbot/kian_res2.txt. If you plot it, you get this https://tools.wmflabs.org/dexbot/kian2.png. When the number is below 0.50 it is obvious that the article is not about a human. Between 0.50 and 0.61 there are 78 articles for which the bot can't determine whether it's a human or not [1], and articles scoring above 0.61 are definitely human. I used 0.62 just to be sure and created 3600 items with P31:5 in them. Imagine if I do something like that for the English Wikipedia. [1]: They are probably about a cat or a tree with categories of humans in them. Best

On Sun, Mar 8, 2015 at 3:07 AM, Amir Ladsgroup ladsgr...@gmail.com wrote: On Sat, Mar 7, 2015 at 9:19 PM, Jeroen De Dauw jeroended...@gmail.com wrote: Hey, Yay, neural nets are definitely fun! Am I right in understanding this is software you created for the specific purpose of doing tasks in Wikidata? Yes, in Wikidata and Wikipedia. Congratulations for this bold step towards the Singularity :-) Don't worry, it'll be some time before AI can actually ingest Wikidata, see https://dl.dropboxusercontent.com/u/7313450/entropy/aitraining.png Cheers -- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3

-- Amir

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
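The thresholds described above, as a tiny sketch; 0.50/0.61/0.62 were picked empirically for this German run and would differ for other wikis:

    # Sketch: classify Kian's output score using the thresholds from this run.
    def classify(score, lower=0.50, upper=0.61):
        if score < lower:
            return 'not human'
        if score <= upper:
            return 'undetermined'   # needs a human to look at it
        return 'human'              # safe enough to add P31:Q5

    # classify(0.257480420229) -> 'not human'
    # classify(0.62)           -> 'human'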
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
Hey Markus, Thanks for your insight :)

On Sat, Mar 7, 2015 at 9:52 PM, Markus Krötzsch mar...@semantic-mediawiki.org wrote: Hi Amir, In spite of all due enthusiasm, please evaluate your results (with humans!) before making automated edits. In fact, I would contradict Magnus here and say that such an approach would best be suited to provide meaningful (pre-filtered) *input* to people who play a Wikidata game, rather than bypassing the game (and humans) altogether. The expected error rates are quite high for such an approach, but it can still save a lot of work for humans.

There is a certainty factor, and it can save a lot of work without making such errors by using that certainty factor.

As for the next steps, I would suggest that you have a look at the work that others have done already. Try Google Scholar: https://scholar.google.com/scholar?q=machine+learning+wikipedia As you can see, there are countless works on using machine learning techniques on Wikipedia, both for information extraction (e.g., understanding link semantics) and for things like vandalism detection. I am sure that one could get a lot of inspiration from there, both on potential applications and on technical hints on how to improve result quality.

Yes, I will definitely use them, thanks.

You will find that people are using many different approaches in these works. The good old ANN is still a relevant algorithm in practice, but there are many other techniques, such as SVMs, Markov models, or random forests, which have been found to work better than ANNs in many cases. Not saying that a three-layer feed-forward ANN cannot do some jobs as well, but I would not restrict myself to one ML approach if I had a whole arsenal of algorithms available, most of them pre-implemented in libraries (the first Google hit has a lot of relevant projects listed: http://daoudclarke.github.io/machine%20learning%20in%20practice/2013/10/08/machine-learning-libraries/). I would certainly recommend that you don't implement any of the standard ML algorithms from scratch.

I use the backpropagation algorithm, and I use Octave for my personal ML work, but for Wikipedia I use Python (for two main reasons: integration with other Wikipedia-related tools like pywikibot, and the bad performance of Octave and Matlab on big sets of data), and I had to write those parts from scratch since I couldn't find any suitable library in Python. Even algorithms like BFGS are not there (I could find one in scipy, but I wasn't sure it works correctly, and there is no documentation).

In practice, the most challenging task for successful ML is often feature engineering: the question of which features you use as input to your learning algorithm. This is far more important than the choice of algorithm. Wikipedia in particular offers you so many relevant pieces of information with each article that are not just mere keywords (links, categories, in-links, ...) and it is not easy to decide which of these to feed into your learner. This will be different for each task you solve (subject classification is fundamentally different from vandalism detection, and even different types of vandalism would require very different techniques). You should pick hard or very large tasks to make sure that the tweaking you need in each case takes less time than you would need as a human to solve the task manually ;-)

Yes, feature engineering is the most important thing and it can be tricky, but feature engineering in Wikidata is a lot easier than in Wikipedia, and Wikipedia itself is easier than other places. Anti-vandalism bots are a lot easier in Wikidata than in Wikipedia: edits in Wikidata are limited to certain kinds (like removing a sitelink, etc.), but that is not the case in Wikipedia.

Anyway, it's an interesting field, and we could certainly use some effort to exploit the countless works in this field for Wikidata. But you should be aware that this is no small challenge and that there is no universal solution that will work well even for all the tasks that you have mentioned in your email.

Of course. I spent lots of time studying this, and I would be happy if anyone who knows about neural networks or AI can contribute too.

Best wishes, Markus

On 07.03.2015 18:21, Magnus Manske wrote: Congratulations for this bold step towards the Singularity :-) As for tasks, basically everything us mere humans do in the Wikidata game: https://tools.wmflabs.org/wikidata-game/ Some may require text parsing. Not sure how to get that working; haven't spent much time with (artificial) neural nets in a while.

On Sat, Mar 7, 2015 at 12:36 PM Amir Ladsgroup ladsgr...@gmail.com wrote: Some useful tasks that I'm looking for a way to do are: *Anti-vandal bot (or how we can quantify an edit). *Auto labeling for humans (That's the next task). *Add more :) On Sat, Mar 7, 2015 at 3:54 PM, Amir Ladsgroup
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
On Sat, Mar 7, 2015 at 10:25 PM, Emw emw.w...@gmail.com wrote: Amir, What is the false positive rate of your algorithm when dealing with fictitious humans and (non-fictitious) non-human organisms? That is, how often does your program classify such non-humans as humans?

I'll give you an exact number for the German Wikipedia in a few hours.

Regarding the latter, note that items about individual dogs, elephants, chimpanzees and even trees can use properties that are otherwise extremely skewed towards humans. For example, Prometheus (Q590010) [1], an extremely old tree, has claims for *date of birth* (P569), *date of death* (P570), even *killed by* (P157). Non-human animals can also have kinship claims (e.g. *mother*, *brother, child*), among other properties typically used on humans.

The trick to avoiding such errors is to give a big negative score for having a group E or group D category. Feature engineering for this task is a little complicated. First I group the categories of a wiki by how many of their members are human articles. If more than 80% of the members of a category are known to be humans, it's a group A category, and so on (group D = 0%). An article can then be parameterized by the number of categories it has in each group. E.g. an article about a human usually looks like 5,3,2,0,0 and an article about a tree can look like 1,0,0,6,7, and having one or several group A categories alongside several group D categories prevents the bot from making such false statements. How is this possible, and how can a bot do that? Because of the huge set of data (the training set) we already have, and the neural network algorithms. Best

Best, Eric https://www.wikidata.org/wiki/User:Emw 1. Prometheus. https://www.wikidata.org/wiki/Q590010

On Sat, Mar 7, 2015 at 1:44 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: Hey Markus, Thanks for your insight :) On Sat, Mar 7, 2015 at 9:52 PM, Markus Krötzsch mar...@semantic-mediawiki.org wrote: Hi Amir, In spite of all due enthusiasm, please evaluate your results (with humans!) before making automated edits. In fact, I would contradict Magnus here and say that such an approach would best be suited to provide meaningful (pre-filtered) *input* to people who play a Wikidata game, rather than bypassing the game (and humans) altogether. The expected error rates are quite high for such an approach, but it can still save a lot of work for humans. There is a certainty factor, and it can save a lot of work without making such errors by using that certainty factor. As for the next steps, I would suggest that you have a look at the work that others have done already. Try Google Scholar: https://scholar.google.com/scholar?q=machine+learning+wikipedia As you can see, there are countless works on using machine learning techniques on Wikipedia, both for information extraction (e.g., understanding link semantics) and for things like vandalism detection. I am sure that one could get a lot of inspiration from there, both on potential applications and on technical hints on how to improve result quality. Yes, I will definitely use them, thanks. You will find that people are using many different approaches in these works. The good old ANN is still a relevant algorithm in practice, but there are many other techniques, such as SVMs, Markov models, or random forests, which have been found to work better than ANNs in many cases.
Not saying that a three-layer feed-forward ANN cannot do some jobs as well, but I would not restrict myself to one ML approach if I had a whole arsenal of algorithms available, most of them pre-implemented in libraries (the first Google hit has a lot of relevant projects listed: http://daoudclarke.github.io/machine%20learning%20in%20practice/2013/10/08/machine-learning-libraries/). I would certainly recommend that you don't implement any of the standard ML algorithms from scratch. I use the backpropagation algorithm, and I use Octave for my personal ML work, but for Wikipedia I use Python (for two main reasons: integration with other Wikipedia-related tools like pywikibot, and the bad performance of Octave and Matlab on big sets of data), and I had to write those parts from scratch since I couldn't find any suitable library in Python. Even algorithms like BFGS are not there (I could find one in scipy, but I wasn't sure it works correctly, and there is no documentation). In practice, the most challenging task for successful ML is often feature engineering: the question of which features you use as input to your learning algorithm. This is far more important than the choice of algorithm. Wikipedia in particular offers you so many relevant pieces of information with each article that are not just mere keywords (links, categories, in-links, ...) and it is not easy to decide which of these to feed into your learner. This will be different for each task you solve (subject classification is fundamentally different from vandalism detection, and even different
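A sketch of the category grouping described in the reply to Eric above; the 80% cut-off for group A and "group D = 0%" come from that mail, while the other bucket boundaries are assumptions added for illustration:

    # Sketch of the category grouping: bucket each category by the share of its
    # members already known to be human, then describe an article by how many
    # of its categories fall into each bucket.
    def category_group(human_members, total_members):
        if total_members == 0:
            return 'E'                           # category unseen in the training data (assumption)
        ratio = human_members / total_members
        if ratio >= 0.80:
            return 'A'                           # more than 80% human members, as described above
        if ratio == 0.0:
            return 'D'                           # no human members at all, as described above
        return 'B' if ratio >= 0.50 else 'C'     # intermediate buckets; cut-off is an assumption

    def article_features(article_categories, category_stats):
        # category_stats maps a category name to (human_members, total_members).
        counts = {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 0}
        for cat in article_categories:
            human, total = category_stats.get(cat, (0, 0))
            counts[category_group(human, total)] += 1
        # A biography tends to look like [5, 3, 2, 0, 0]; a tree like [1, 0, 0, 6, 7].
        return [counts[g] for g in 'ABCDE']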
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
:)

On Sat, Mar 7, 2015 at 8:51 PM, Magnus Manske magnusman...@googlemail.com wrote: Congratulations for this bold step towards the Singularity :-) As for tasks, basically everything us mere humans do in the Wikidata game: https://tools.wmflabs.org/wikidata-game/ Some may require text parsing. Not sure how to get that working; haven't spent much time with (artificial) neural nets in a while.

Gender is easy; occupation, country of citizenship, DoB and DoD are also possible but a little bit tricky. I should think more about the other games. A question: how do you determine that two items are candidates for merging? Maybe it would be easy to start merging them. Best

On Sat, Mar 7, 2015 at 12:36 PM Amir Ladsgroup ladsgr...@gmail.com wrote: Some useful tasks that I'm looking for a way to do are: *Anti-vandal bot (or how we can quantify an edit). *Auto labeling for humans (That's the next task). *Add more :)

On Sat, Mar 7, 2015 at 3:54 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: Hey, I spent last few weeks working on this lights off [1] and now it's ready to work! Kian is a three-layered neural network with flexible number of inputs and outputs. So if we can parametrize a job, we can teach him easily and get the job done. For example and as the first job. We want to add P31:5 (human) to items of Wikidata based on categories of articles in Wikipedia. The only thing we need to is get list of items with P31:5 and list of items of not-humans (P31 exists but not 5 in it). then get list of category links in any wiki we want[2] and at last we feed these files to Kian and let him learn. Afterwards if we give Kian other articles and their categories, he classifies them as human, not human, or failed to determine. As test I gave him categories of ckb wiki (a small wiki) and worked pretty well and now I'm creating the training set from German Wikipedia and the next step will be English Wikipedia. Number of P31:5 will drastically increase this week. I would love comments or ideas for tasks that Kian can do. [1]: Because I love surprises [2]: select pp_value, cl_to from page_props join categorylinks on pp_page = cl_from where pp_propname = 'wikibase_item'; Best -- Amir

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
In technical terms, a machine that uses forward and backward propagation to make an approximate prediction [1] is called a neural network, whether I agree with the name or not. BTW: I use BFGS, not gradient descent. [1]: https://en.wikipedia.org/wiki/Artificial_neural_network

On Sat, Mar 7, 2015 at 9:48 PM, Markus Krötzsch mar...@semantic-mediawiki.org wrote: On 07.03.2015 18:21, Magnus Manske wrote: Congratulations for this bold step towards the Singularity :-)

Lol. The word neural in the name of the algorithm is infinitely more attractive and inspiring than something abstract like Support Vector Machine, isn't it? -- although we know that both approaches are much more similar to each other than to any biological neural system. ;-) However, since this is a general mailing list, it may be fair to clarify that this is just a gradient-descent based optimization procedure that we are dealing with here, and that it has nothing to do with a thinking general AI. I know that you know this, but not all of our readers may ... Cheers, Markus

-- Amir

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
[Wikidata-l] Kian: The first neural network to serve Wikidata
Hey, I spent the last few weeks working on this with the lights off [1] and now it's ready to work! Kian is a three-layered neural network with a flexible number of inputs and outputs, so if we can parametrize a job, we can teach him easily and get the job done. As an example, and as the first job: we want to add P31:5 (human) to Wikidata items based on the categories of articles in Wikipedia. The only thing we need to do is get the list of items with P31:5 and the list of non-humans (P31 exists, but without 5 in it), then get the list of category links in any wiki we want [2], and finally feed these files to Kian and let him learn. Afterwards, if we give Kian other articles and their categories, he classifies them as human, not human, or failed to determine. As a test I gave him the categories of the ckb wiki (a small wiki) and it worked pretty well; now I'm creating the training set from the German Wikipedia, and the next step will be the English Wikipedia. The number of P31:5 statements will increase drastically this week. I would love comments or ideas for tasks that Kian can do. [1]: Because I love surprises [2]: select pp_value, cl_to from page_props join categorylinks on pp_page = cl_from where pp_propname = 'wikibase_item'; Best -- Amir

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
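A sketch of how the query in [2] might be run against the Labs database replicas to produce a training file; pymysql, the host/database naming and the TSV output format are assumptions for illustration, only the SQL itself is from the mail:

    # Sketch: dump (wikidata item, category) pairs for a wiki from the
    # Labs/Toolforge database replicas, to feed Kian as a training set.
    import os
    import pymysql

    QUERY = ("select pp_value, cl_to from page_props "
             "join categorylinks on pp_page = cl_from "
             "where pp_propname = 'wikibase_item'")

    def dump_category_links(dbname, outfile):
        # Host and database names follow the Labs replica convention (assumption).
        conn = pymysql.connect(host=dbname + '.labsdb',
                               db=dbname + '_p',
                               read_default_file=os.path.expanduser('~/replica.my.cnf'),
                               charset='utf8')
        try:
            with conn.cursor() as cur, open(outfile, 'w') as out:
                cur.execute(QUERY)
                for item, category in cur:
                    # Columns are varbinary, so decode if the driver returns bytes.
                    if isinstance(item, bytes):
                        item, category = item.decode('utf-8'), category.decode('utf-8')
                    out.write('%s\t%s\n' % (item, category))
        finally:
            conn.close()

    # dump_category_links('dewiki', 'dewiki_categorylinks.tsv')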
Re: [Wikidata-l] Kian: The first neural network to serve Wikidata
Some useful tasks that I'm looking for a way to do are: *Anti-vandal bot (or how we can quantify an edit). *Auto labeling for humans (That's the next task). *Add more :) On Sat, Mar 7, 2015 at 3:54 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: Hey, I spent last few weeks working on this lights off [1] and now it's ready to work! Kian is a three-layered neural network with flexible number of inputs and outputs. So if we can parametrize a job, we can teach him easily and get the job done. For example and as the first job. We want to add P31:5 (human) to items of Wikidata based on categories of articles in Wikipedia. The only thing we need to is get list of items with P31:5 and list of items of not-humans (P31 exists but not 5 in it). then get list of category links in any wiki we want[2] and at last we feed these files to Kian and let him learn. Afterwards if we give Kian other articles and their categories, he classifies them as human, not human, or failed to determine. As test I gave him categories of ckb wiki (a small wiki) and worked pretty well and now I'm creating the training set from German Wikipedia and the next step will be English Wikipedia. Number of P31:5 will drastically increase this week. I would love comments or ideas for tasks that Kian can do. [1]: Because I love surprises [2]: select pp_value, cl_to from page_props join categorylinks on pp_page = cl_from where pp_propname = 'wikibase_item'; Best -- Amir -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata wins the ODI Open Data Award
congrats Lydia :)

On Wed, Nov 5, 2014 at 1:23 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: Hey everyone, Wow! We won! \o/ This is incredible. I am so happy to see this recognition of all the work we've put into Wikidata together. Magnus and I had a blast at the award ceremony yesterday night. Here's a picture of us on stage with Sir Nigel Shadbolt and Sir Tim Berners-Lee: https://twitter.com/opencorporates/status/529721444549550080 Here's Wikidata written in huge letters above the heads of the audience: https://twitter.com/marin_dim/status/529721419580854272 And here are Magnus and me with the prize: https://twitter.com/wikidata/status/529723778558083074 I am delighted that they especially recognized the breadth of topics Wikidata covers and that it has been developed in the open since the beginning. This award is for everyone in this community. We should be proud! This is a perfect start into year 3 of Wikidata :) Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

-- Amir

___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Birthday gift: Missing Wikipedia links (was Re: Wikidata turns two!)
I can connect all of them by bot but I'm not sure it should be done automatically. Happy birthday Wikidata :) On 10/29/14, James Forrester jdforres...@gmail.com wrote: On Wed Oct 29 2014 at 10:56:42 Denny Vrandečić vrande...@google.com wrote: There’s a small tool on WMF labs that you can use to verify the links (it displays the articles side by side from a language pair you select, and then you can confirm or contradict the merge): https://tools.wmflabs.org/yichengtry This is really fun, and so useful too. Thank you so much, Denny, Jiang Bian, Si Li, and Yicheng Huang – Denny and the Googlers is a new band name if ever there was one. -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Report on Link FA/GA removal in Wikipedia
Yes, that's why I started and then reverted my bot on the Italian Wikipedia: https://it.wikipedia.org/wiki/Speciale:Contributi/Dexbot Wikis with a similar situation are pt, fr, es, ru, ca.

On 9/2/14, Ricordisamoa ricordisa...@openmailbox.org wrote: I wrote a small script for the same task and ran it on a few pages on itwiki. Badges work correctly, but the categories added by it:Template:Link V https://it.wikipedia.org/wiki/Template:Link_V and it:Template:Link VdQ https://it.wikipedia.org/wiki/Template:Link_VdQ obviously aren't shown. So we need an easy way to track articles by badges in real time /before/ starting the removal.

Il 02/09/2014 09:09, Amir Ladsgroup ha scritto: Hey, My bot finished the initial part of removing all Link GA and Link FA templates on the nl, sv and pl wikis (and some other wikis). I only removed badges that are already in Wikidata. My bot cleaned up more than 12/13 of the articles on the Dutch Wikipedia. The ratio was not as good for sv and pl, for reasons that I have fixed for the next run (e.g. on the Polish Wikipedia they usually use link FA instead of Link FA). I ran a script on the nl wiki to get the list of remaining Link FA templates by target language (to see which badges are missing). To see the report, these are the links: https://tools.wmflabs.org/dexbot/link_fa_nl.txt https://tools.wmflabs.org/dexbot/link_ga_nl.txt In those files, gv: [Q183] means that in the article Germany on the Dutch Wikipedia this template exists: {{Link FA|gv}}, because my bot didn't remove it. I managed to sort them by the number of missing badges in the target language: {1: [u'gv', u'am', u'km', u'srn', u'la', u'nah', u'lt', u'frr', u'be', u'wa', u'fy', u'fa', u'oc', u'sk', u'sw'], 2: [u'th', u'arz', u'yi', u'ceb', u'bn', u'bat-smg', u'lmo', u'fo', u'ml', u'tt', u'mr'], 3: [u'af', u'mwl', u'sl', u'ms', u'uz', u'no'], 4: [u'sco', u'az', u'lv', u'ast', u'kl'], 5: [u'cs', u'nap', u'bg', u'fi', u'zh-classical', u'ka', u'bar'], 6: [u'sr'], 7: [u'zh', u'simple', u'el', u'hy', u'sq', u'hr'], 8: [u'tr'], 10: [u'vi', u'eu', u'sh', u'hu', u'sa'], 11: [u'gl', u'ko'], 13: [u'ca'], 14: [u'zh-yue', u'uk'], 15: [u'nds-nl'], 16: [u'bs', u'als', u'ar'], 19: [u'ro', u'be-x-old'], 22: [u'pt', u'vls'], 23: [u'ja'], 24: [u'ru', u'ur'], 25: [u'eo'], 29: [u'sv'], 32: [u'en'], 35: [u'de'], 41: [u'mk'], 43: [u'id'], 46: [u'scn'], 52: [u'it'], 65: [u'he'], 79: [u'es'], 80: [u'pl'], 95: [u'fr']} For a wiki with more than 1335 featured articles, 95 missing badges seems normal. (Don't get me wrong: a missing badge in this case doesn't mean we should add the badge to Wikidata; it can be caused by a former featured article, vandalism, or lots of other things.) But for some wikis the number isn't normal, and I checked these languages carefully and added the badges that were missing from Wikidata: *ur, vls, be-x-old, nds-nl, zh-yue, zh-classical, nap, bat-smg. Overall about 300 badges were added to Wikidata today. For cleaning up wikis from former featured articles and former good articles I wrote a feature in featured.py about four years ago, but only some wikis use it (like nl and pl) and some don't (like sv). I need to talk to them about this. If your wiki has a similar situation, just run this: python pwb.py featured -fromall -former (or python featured.py ... in compat). I fixed some other issues and I'm rerunning it on lots of wikis. I can get this report for the English Wikipedia and the Persian Wikipedia by tomorrow. If anything else needs to be done, a wiki is missing, or anything else comes up, please share it here.
Best -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
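For wikis that still need cleaning, the matching involved is roughly the following sketch; it also covers the lowercase 'link FA' spelling mentioned above for the Polish Wikipedia:

    # Sketch: find Link FA / Link GA templates left in wikitext, including the
    # lowercase 'link FA' variant seen on the Polish Wikipedia.
    import re

    LINK_BADGE = re.compile(r'\{\{\s*[Ll]ink\s+(FA|GA)\s*\|\s*([a-z][a-z\-]*)\s*\}\}')

    def remaining_badges(wikitext):
        # Returns e.g. [('FA', 'gv'), ('GA', 'de')]
        return LINK_BADGE.findall(wikitext)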
Re: [Wikidata-l] Python bot framework for wikidata
Yes, Core has better support Best On 8/29/14, Maarten Dammers maar...@mdammers.nl wrote: Don't use compat, use core. Amir Ladsgroup schreef op 29-8-2014 2:27: Hey, It's pywikibot: https://www.mediawiki.org/wiki/PWB Both branches support Wikidata Best On 8/29/14, Benjamin Good ben.mcgee.g...@gmail.com wrote: Which python framework should a new developer use to make a wikidata editing bot? thanks -Ben ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Python bot framework for wikidata
Hey, It's pywikibot: https://www.mediawiki.org/wiki/PWB Both branches support Wikidata Best On 8/29/14, Benjamin Good ben.mcgee.g...@gmail.com wrote: Which python framework should a new developer use to make a wikidata editing bot? thanks -Ben -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
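A minimal sketch of a first Wikidata edit with the core branch (the item id points at the Wikidata sandbox item; the method names are the ones pywikibot core exposes, though details may vary between versions):

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

item = pywikibot.ItemPage(repo, "Q4115189")  # the Wikidata sandbox item
item.get()  # fetch labels, descriptions, sitelinks and claims
item.editLabels({"en": "Wikidata Sandbox"}, summary="Bot: testing a label edit")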
Re: [Wikidata-l] [Wikidata-tech] [Wikimedia-l] Compare person data
Report 2.1 https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2.1 is now colored, red means no source for the statement in Wikidata, yellow means source is P143 (probably Wikipedia), and otherwise it's green. I also published version Report 2.2 https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2.2 which is version 2.1 but imprecise dates are filtered out. And Report 2.3 https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2.3 which is version 2 with filtered out imprecise dates. i.e.:
Version | Colored? | Wikidata against 3 WPs | With imprecise dates?
Report 2 https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2 | No | No | Yes
Report 2.1 https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2.1 | Yes | Yes | Yes
Report 2.2 https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2.2 | Yes | Yes | No
Report 2.3 https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2.3 | Yes | No | No
Best On Wed, Aug 20, 2014 at 8:00 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: Yes, I have two options, I can just skip them or I can mark them in another color like pink or something else. What do you think? and what color if you recommend the latter. Best On Wed, Aug 20, 2014 at 7:48 PM, Magnus Manske magnusman...@googlemail.com wrote: Thanks for this. You might want to filter it, though: For example https://www.wikidata.org/wiki/Q85256 states born in 1606 (no month or day given), but your report at https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2/26 gives it as 1606-01-01, which then conflicts with the date given in Wikipedias (1606-03-18). Wikidata is less precise than Wikipedia here, but not actually wrong. Maybe these cases should be treated separately from the potential errors. Cheers, Magnus On Wed, Aug 20, 2014 at 3:54 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: I started this report, you can find it here https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2. Best On Tue, Aug 19, 2014 at 3:27 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: It's possible and rather easy to add them .just several regexes and list of months in that language are needed. but the issue is no major wikis except these three are using a person data template (Almost all of them have the template but almost none of them are using it widely) If any other wiki is using a person data template widely, I would be happy to implement it. Best On Tue, Aug 19, 2014 at 2:29 PM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote: Can we join more Wikipedias to the comparison? -- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore 2014-08-19 10:38 GMT+03:00 Gerard Meijssen gerard.meijs...@gmail.com : Hoi, Amir has created functionality that compares data from en.wp de.wp and it.wp. It is data about humans and it only shows differences where they exist. It compares those four Wikipedias with information in Wikidata. The idea is that the report will be updated regularly. The problem we face is: what should it actually look like. Should it just splatter the info on a page or is more needed. At this time we just have data [1]. Please help us with something that works easily for now. Once we have something, it can be prettified and more functional.
Thanks, GerardM [1] http://paste.ubuntu.com/8079742/ ___ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines wikimedi...@lists.wikimedia.org https://meta.wikimedia.org/wiki/Mailing_lists/guidelineswikimedi...@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe -- Amir -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-tech mailing list wikidata-t...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech -- Amir -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
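The imprecise-date handling in Reports 2.2/2.3 hinges on the precision field of Wikibase time values (9 = year, 10 = month, 11 = day). A simplified sketch of that kind of filtering, assuming a pared-down time value and a pre-parsed Wikipedia date (this is not the actual report code):

# A Wikidata time value for "1606" (year precision) looks roughly like this in the JSON output:
wd_value = {"time": "+1606-01-01T00:00:00Z", "precision": 9}

def conflicts(wd_value, wp_date):
    """wp_date: (year, month, day) parsed from a Wikipedia persondata template."""
    year = int(wd_value["time"][1:5])
    if wd_value["precision"] < 11:
        # Only the year (or month) is asserted; don't treat 1606-01-01 as a real date.
        return year != wp_date[0]
    month = int(wd_value["time"][6:8])
    day = int(wd_value["time"][9:11])
    return (year, month, day) != wp_date

print(conflicts(wd_value, (1606, 3, 18)))  # False: the year matches, so no report entry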
Re: [Wikidata-l] [Wikimedia-l] Compare person data
I started this report, you can find it here https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2. Best On Tue, Aug 19, 2014 at 3:27 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: It's possible and rather easy to add them .just several regexes and list of months in that language are needed. but the issue is no major wikis except these three are using a person data template (Almost all of them have the template but almost none of them are using it widely) If any other wiki is using a person data template widely, I would be happy to implement it. Best On Tue, Aug 19, 2014 at 2:29 PM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote: Can we join more Wikipedias to the comparison? -- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore 2014-08-19 10:38 GMT+03:00 Gerard Meijssen gerard.meijs...@gmail.com: Hoi, Amir has created functionality that compares data from en.wp de.wp and it.wp. It is data about humans and it only shows differences where they exist. It compares those four Wikipedias with information in Wikidata. The idea is that the report will be updated regularly. The problem we face is: what should it actually look like. Should it just splatter the info on a page or is more needed. At this time we just have data [1]. Please help us with something that works easily for now. Once we have something, it can be prettified and more functional. Thanks, GerardM [1] http://paste.ubuntu.com/8079742/ ___ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines wikimedi...@lists.wikimedia.org https://meta.wikimedia.org/wiki/Mailing_lists/guidelineswikimedi...@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe -- Amir -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] [Wikidata-tech] [Wikimedia-l] Compare person data
Yes, I have two options, I can just skip them or I can mark them in another color like pink or something else. What do you think? and what color if you recommend the latter. Best On Wed, Aug 20, 2014 at 7:48 PM, Magnus Manske magnusman...@googlemail.com wrote: Thanks for this. You might want to filter it, though: For example https://www.wikidata.org/wiki/Q85256 states born in 1606 (no month or day given), but your report at https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2/26 gives it as 1606-01-01, which then conflicts with the date given in Wikipedias (1606-03-18). Wikidata is less precise than Wikipedia here, but not actually wrong. Maybe these cases should be treated separately from the potential errors. Cheers, Magnus On Wed, Aug 20, 2014 at 3:54 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: I started this report, you can find it here https://www.wikidata.org/wiki/User:Ladsgroup/Birth_date_report2. Best On Tue, Aug 19, 2014 at 3:27 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: It's possible and rather easy to add them .just several regexes and list of months in that language are needed. but the issue is no major wikis except these three are using a person data template (Almost all of them have the template but almost none of them are using it widely) If any other wiki is using a person data template widely, I would be happy to implement it. Best On Tue, Aug 19, 2014 at 2:29 PM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote: Can we join more Wikipedias to the comparison? -- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore 2014-08-19 10:38 GMT+03:00 Gerard Meijssen gerard.meijs...@gmail.com: Hoi, Amir has created functionality that compares data from en.wp de.wp and it.wp. It is data about humans and it only shows differences where they exist. It compares those four Wikipedias with information in Wikidata. The idea is that the report will be updated regularly. The problem we face is: what should it actually look like. Should it just splatter the info on a page or is more needed. At this time we just have data [1]. Please help us with something that works easily for now. Once we have something, it can be prettified and more functional. Thanks, GerardM [1] http://paste.ubuntu.com/8079742/ ___ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines wikimedi...@lists.wikimedia.org https://meta.wikimedia.org/wiki/Mailing_lists/guidelineswikimedi...@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe -- Amir -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-tech mailing list wikidata-t...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Items without any label
I always get the list and run my labeler bot on them; it adds labels where a sitelink exists. I can work on it and make it better so it fixes more labels. Best On 8/14/14, Lukas Benedix lukas.bene...@fu-berlin.de wrote: Hi, I found ~16,000 items without any label. I have no idea how it's possible to create those and how to fix this problem, so here is the list for whoever can: http://tools.wmflabs.org/lbenedix/wikidata/items_without_any_label_20140811.txt Lukas ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
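A rough sketch of such a labeler pass with pywikibot core (not the actual bot; it assumes sitelinks can be read as a simple site-id-to-title mapping and that a parenthesised disambiguation suffix should be dropped):

import pywikibot

def label_from_sitelinks(item_id, langs=("en", "de", "nl", "fa")):
    repo = pywikibot.Site("wikidata", "wikidata").data_repository()
    item = pywikibot.ItemPage(repo, item_id)
    data = item.get()
    new_labels = {}
    for lang in langs:
        sitelink = data["sitelinks"].get(lang + "wiki")
        if sitelink and lang not in data["labels"]:
            title = str(sitelink)
            new_labels[lang] = title.split(" (")[0]  # drop a "(disambiguation)" style suffix
    if new_labels:
        item.editLabels(new_labels, summary="Bot: adding labels from sitelinks")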
Re: [Wikidata-l] JSON dumps available
Thank you, let me check On 7/23/14, Tom Morris tfmor...@gmail.com wrote: On Wed, Jul 23, 2014 at 7:03 AM, Amir Ladsgroup ladsgr...@gmail.com wrote: but since the file is huge, it causes me a lot of problems if I want to check it myself and answer these questions :( curl http://dumps.wikimedia.org/other/wikidata/20140721.json.gz | zcat | head will get you a sample in a few seconds. Tom -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
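The same sampling can be done from Python by streaming the gzipped dump, assuming the usual layout of one JSON entity per line inside a top-level array (the file name is taken from the message above):

import gzip
import json

def sample_entities(path, limit=5):
    """Yield the first few entities from a wikidata-*.json.gz dump."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        count = 0
        for line in f:
            line = line.strip().rstrip(",")
            if line in ("[", "]") or not line:
                continue
            yield json.loads(line)
            count += 1
            if count >= limit:
                break

for entity in sample_entities("20140721.json.gz"):
    print(entity["id"], entity.get("labels", {}).get("en", {}).get("value"))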
Re: [Wikidata-l] Bot request: 250+ thousands person data
Do you know why this edit isn't shown correctly? https://www.wikidata.org/w/index.php?title=Q4119465&diff=123932128&oldid=123931985 Best On Tue, Apr 29, 2014 at 9:34 PM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote: On 29.04.2014 17:25, David Cuenca wrote: Is it possible to have just a lower bound, leaving the upper one open? No. It's a precision interval, not a range. Range Snaks may be introduced in the future, but for now, you should use dedicated properties for start and end to express a range. I am thinking of uses like https://www.wikidata.org/wiki/Wikidata:Property_proposal/Generic#earliest_date That's exactly the kind of property that *doesn't* need an interval or open precision: the earliest date is a precise point. For things like circa I don't see any clear solution other than inventing some ranges... Yes. I think it's reasonable to do that along the same lines that you do when reading ca 1850: I would read that as +/- 10 years. ca 2014 is probably +/- 1 year, and around August 1986 is +/- 1 month, while around August 10 is probably +/- a week or so. -- daniel -- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
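For reference, the precision interval lives in the time value itself; a "circa 1850, +/- 10 years" reading could in principle be written with the before/after fields of the data model (an illustration only; support for these fields was limited at the time):

# Wikibase time value (simplified): "1850, give or take 10 years, at year precision".
circa_1850 = {
    "time": "+1850-01-01T00:00:00Z",
    "precision": 9,   # 9 = year
    "before": 10,     # expressed in units of the precision field
    "after": 10,
    "timezone": 0,
    "calendarmodel": "http://www.wikidata.org/entity/Q1985727",  # proleptic Gregorian
}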
Re: [Wikidata-l] Bot request: 250+ thousands person data
this problem is being tracked in https://bugzilla.wikimedia.org/show_bug.cgi?id=60999 Best On Wed, Apr 30, 2014 at 8:20 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: On Wed, Apr 30, 2014 at 11:45 AM, Amir Ladsgroup ladsgr...@gmail.com wrote: Do you know why this edit isn't shown correctly? https://www.wikidata.org/w/index.php?title=Q4119465&diff=123932128&oldid=123931985 Will have a look. Thx. Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Bot request: 250+ thousands person data
It's easy: I skip pages with more than one bio template. I'm working on harvesting the information right now and I'll start very soon. Best On Sun, Apr 27, 2014 at 3:03 PM, David Cuenca dacu...@gmail.com wrote: Maybe it is possible to identify those cases and not use wd data for them? They must represent a very tiny percentage of the total... Cheers, Micru On Sun, Apr 27, 2014 at 12:28 PM, Amir Ladsgroup ladsgr...@gmail.com wrote: there are some problems in using the bio template, for example they used it for a group of people https://it.wikipedia.org/wiki/Fratelli_Wright On Sun, Apr 27, 2014 at 2:51 PM, David Cuenca dacu...@gmail.com wrote: @Nemo, Apper: Do you think you could import that data into the wd-repo AND make use of it via an inclusion template? In the past there was some animosity against importing data without their sources, but if it is data that is being used and displayed on Wikipedia, then I guess it would be regarded differently. Anyhow, these kinds of discussions are better on-wiki. On Sun, Apr 27, 2014 at 11:16 AM, Christian Thiele ap...@apper.de wrote: The German dates have to be parsed, and I think the uncertainty for many persons is the main problem (born 5th of July 1716 or 15th of July 1718, between 1918 and 1921). But of course for a lot of persons there is accurate information, which could be imported to Wikidata. For these kinds of situations you have the before and after of the time datatype: https://www.wikidata.org/wiki/Special:ListDatatypes The problem is that it is not visible yet... https://bugzilla.wikimedia.org/show_bug.cgi?id=61909 Cheers, Micru ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Etiamsi omnes, ego non ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Bot request: 250+ thousands person data
I started my bot https://www.wikidata.org/wiki/Special:Contributions/Dexbot on: P31 (instance of), P21 (gender), P19 (place of birth), and P20 (place of death) I also wrote the code to import dates of birth and death, but I'm not running it yet because there is one important question: What is the calendar model you use for dates of birth and death? In some places the Gregorian calendar wasn't common until 1912, so I can't add dates before 1912 because the bot can't be sure about the calendar model of those dates. Best On Sun, Apr 27, 2014 at 3:32 PM, Luca Martinelli martinellil...@gmail.com wrote: On 27/apr/2014 12:59 Federico Leva (Nemo) nemow...@gmail.com wrote: David Cuenca, 27/04/2014 12:21: @Nemo, Apper: Do you think you could import that data into the wd-repo AND make use of it via an inclusion template? The Italian Wikipedia has a track record of early adoption of Wikidata as a source. Almost everything that was added to Wikidata was immediately put into use (most recent big example, I think, the {{interprogetto}}). It wouldn't take long before {{bio}} starts using the data once it's available (probably days or weeks), it's been discussed several times and nobody appeared to dislike the idea. If I'm not mistaken, there are (or were) already some experiments going on with {{Bio}} using data from Wikidata, possibly for the image field. Anyway, if Amir (thanks!) is really going to upload that data, nobody is preventing us from trying to make an experiment on a large scale. I'll talk with the Italian community about it. L. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
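To make the calendar concern concrete: the calendar model is part of every time value, so an importer can simply refuse to guess for old dates. A minimal sketch (the 1912 cutoff mirrors the message above; the helper name is made up):

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"
JULIAN = "http://www.wikidata.org/entity/Q1985786"

def birth_date_value(year, month, day, source_calendar=None):
    """Build a Wikibase-style time value, or return None when the calendar is uncertain."""
    if source_calendar is None and year < 1912:
        return None  # don't guess: the source wiki may have used a non-Gregorian calendar
    return {
        "time": "+%04d-%02d-%02dT00:00:00Z" % (year, month, day),
        "precision": 11,  # day
        "before": 0, "after": 0, "timezone": 0,
        "calendarmodel": source_calendar or GREGORIAN,
    }

print(birth_date_value(1980, 5, 1))
print(birth_date_value(1850, 5, 1))  # None: skipped, as described above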
Re: [Wikidata-l] Complete list of properties?
hi, you can use this: https://www.wikidata.org/w/index.php?title=Special%3AAllPages&from=&to=&namespace=120 On Fri, Feb 14, 2014 at 10:46 AM, Jeff Thompson j...@thefirst.org wrote: Hello. Is there a way to see an automatically-generated list of properties with their name and data type? (I tried the following page but it doesn't seem to be up-to-date, so that's why I asked if there is an automatically-generated page.) http://www.wikidata.org/wiki/Wikidata:List_of_properties/all Thank you, - Jeff ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
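A small sketch of generating such a list automatically from the API, using list=allpages for namespace 120 and wbgetentities to read each property's datatype (result limits and error handling omitted):

import requests

API = "https://www.wikidata.org/w/api.php"

def list_properties(limit=50):
    # Step 1: enumerate property pages (namespace 120).
    pages = requests.get(API, params={
        "action": "query", "list": "allpages", "apnamespace": 120,
        "aplimit": limit, "format": "json"}).json()["query"]["allpages"]
    ids = [p["title"].split(":", 1)[1] for p in pages]  # "Property:P31" -> "P31"
    # Step 2: fetch English labels and datatypes for those ids.
    ents = requests.get(API, params={
        "action": "wbgetentities", "ids": "|".join(ids),
        "props": "labels|datatype", "languages": "en", "format": "json"}).json()["entities"]
    for pid, ent in ents.items():
        label = ent.get("labels", {}).get("en", {}).get("value", "")
        print(pid, ent.get("datatype"), label)

list_properties()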
Re: [Wikidata-l] next sister project: Wikisource
On Sun, Nov 3, 2013 at 11:32 PM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote: On 03.11.2013 19:59, Federico Leva (Nemo) wrote: *Wikisource has a big difference in interwiki mapping: you can map an item in Wikisource to several items in another language (we have an open bug in Bugzilla for Pywikibot about it: https://bugzilla.wikimedia.org/show_bug.cgi?id=55090). Are you going to let people add several links to the same site in one Wikidata item, or vice versa, add one link to several items? Both seem technically impossible to me because your output is a dictionary (when you call the API) and dictionaries can't have several values for a key. What are you going to do about it? There's a bug report about this in wikidata too but no time now to find the number. From a technical perspective, sitelinks have to be unique (per project), otherwise they wouldn't function as sitelinks as defined by the data model (changing the data model in that regard would have quite a few annoying consequences). It's entirely possible to link multiple pages on another wiki using properties/statements, though. If you're using statements instead of sitelinks, how can we trace an item from a Wikisource article to Wikidata, and how can we show the interwikis in the Wikisource article? About the first one I mean wgWikibaseItemId, and about the second one I mean the sidebar; is there a way to control the sidebar through statements instead of sitelinks? -- daniel ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] next sister project: Wikisource
Two questions: *Why don't you add Wikiquote? It's pretty similar to Wikipedia in concept (you don't have to make so many new items when you add Wikiquote as a client of Wikidata). *Wikisource has a big difference in interwiki mapping: you can map an item in Wikisource to several items in another language (we have an open bug in Bugzilla for Pywikibot about it: https://bugzilla.wikimedia.org/show_bug.cgi?id=55090). Are you going to let people add several links to the same site in one Wikidata item, or vice versa, add one link to several items? Both seem technically impossible to me because your output is a dictionary (when you call the API) and dictionaries can't have several values for a key. What are you going to do about it? Best On 11/2/13, David Cuenca dacu...@gmail.com wrote: Excellent news! :) For the author pages it is quite straight-forward. For the bibliographic metadata the easiest would be to connect wikidata items with the Book: page generated by the (planned) Book Manager extension. The Book: page is supposed to provide the book structure and act as a metadata hub for both books with scans (those with Index: page) and books without scans (there was no solution for those yet) Project page: https://meta.wikimedia.org/wiki/Book_management Bugzilla page: https://bugzilla.wikimedia.org/show_bug.cgi?id=15071 Example: http://tools.wmflabs.org/bookmanagerv2/mediawiki/index.php?title=Book:The_Interpretation_of_Dreams&action=edit Problem: the extension is not finished yet and neither Molly nor Raylton have time to keep working on it. Some bugs are still open and the fields in the template would need to be mapped to Wikidata properties. All this is not relevant for phase 1 (if it is done only for books), but it will become relevant for phase 2. Is there anyone that could volunteer as an OPW mentor to help a potential student to finish this project? Cheers, Micru On Sat, Nov 2, 2013 at 4:30 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: Hey everyone, The next sister project to get language links via Wikidata is Wikisource. We're currently planning this for January 13. The coordination is happening at https://www.wikidata.org/wiki/Wikidata:Wikisource On this page we're also looking for ambassadors to help spread the messages to the different language editions of Wikisource. Please help if you can. Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Etiamsi omnes, ego non -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Dexbot problem
I fixed all of the mistakes (they were just 7 edits): https://www.wikidata.org/w/index.php?title=User:Dexbot/Possible_mistakes&action=history Best On 9/24/13, Amir Ladsgroup ladsgr...@gmail.com wrote: Hello, Yes that WAS a parsing error. I noticed it (someone reported a similar error on the talk page of my bot a while ago), I fixed it, and I'm writing the correcting bot. Because I have put some quality control units(!) in my code, the death place has to be a geographical feature (P107=618123) or have no P107 at all, and the latter happened here. So I need to make my bot more precise. I should add that the task of my bot is harvesting information from infoboxes, i.e. turning human-written data into machine-processable data, and this bot is working on a very large scale (it has 5.3M edits), so it's not strange that the bot makes some errors. Best On 9/23/13, Maarten Dammers maar...@mdammers.nl wrote: Hi Andy, On 23-9-2013 17:18, Andy Mabbett wrote: There's a problem with Dexbot adding bogus values. For example, place of death = Welsh People on the entry for Catrin Collier (still very much alive): http://www.wikidata.org/w/index.php?title=Q13416998&diff=72279065&oldid=49769674 Since I have no broadband, due to an ISP failure, I can't post about this on-Wiki. Looking at the source: | death_date = <!-- {{Death date and age||MM|DD|1948|MM|DD|df=y}} --> | death_place = | residence = | nationality = [[Welsh people|Welsh]] Probably parsing error here. Amir, can you have a look at it? Is this standard or custom code doing the template parsing? Maarten -- Amir -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Dexbot problem
Hello, Yes that WAS a parsing error. I noticed it (someone reported a similar error on the talk page of my bot a while ago), I fixed it, and I'm writing the correcting bot. Because I have put some quality control units(!) in my code, the death place has to be a geographical feature (P107=618123) or have no P107 at all, and the latter happened here. So I need to make my bot more precise. I should add that the task of my bot is harvesting information from infoboxes, i.e. turning human-written data into machine-processable data, and this bot is working on a very large scale (it has 5.3M edits), so it's not strange that the bot makes some errors. Best On 9/23/13, Maarten Dammers maar...@mdammers.nl wrote: Hi Andy, On 23-9-2013 17:18, Andy Mabbett wrote: There's a problem with Dexbot adding bogus values. For example, place of death = Welsh People on the entry for Catrin Collier (still very much alive): http://www.wikidata.org/w/index.php?title=Q13416998&diff=72279065&oldid=49769674 Since I have no broadband, due to an ISP failure, I can't post about this on-Wiki. Looking at the source: | death_date = <!-- {{Death date and age||MM|DD|1948|MM|DD|df=y}} --> | death_place = | residence = | nationality = [[Welsh people|Welsh]] Probably parsing error here. Amir, can you have a look at it? Is this standard or custom code doing the template parsing? Maarten -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] 20% of interwiki conflicts solved at nl-wiki, others?
In the Persian Wikipedia, we solved almost all of the interwiki conflicts, out of two or three thousand conflicts. Best regards On Jun 18, 2013 12:42 AM, יונה בנדלאק bende...@walla.com wrote: I looked for a list with old interwikis. Now I have one, I have started working on it, and I hope to get some help from Hebrew Wikipedians on the Hebrew Wikipedia. Y. Bendelac From: *Jan Dudík* Subject: *Re: [Wikidata-l] 20% of interwiki conflicts solved at nl-wiki, others?* I am solving conflicts on cs.wiki (and some on sk.wiki too); now there are only about 200 problematic pages (at the beginning there were about 2000 pages), so by the end of June all of them might be solved. http://cs.wikipedia.org/wiki/Kategorie:Údržba:Interwiki_konflikty The number of pages with remaining old interwiki links you can see here http://tools.wmflabs.org/addshore/addbot/iwlinks/ JAnD 2013/6/17 Jane Darnell **: I noticed this when I clicked on a few notorious Dutch-English wikipages that I know of - thanks for all the good work! I for one, appreciate it. I think the only reason this can be done is thanks to WikiData, which now easily shows what other interwiki possibilities are when you select an item. How exactly are you solving these? I would be interested in helping for pages in Portal:Arts 2013/6/17, Svavar Kjarrval **: I'd like to see the status on my wiki. How can one detect those conflicts? - Svavar Kjarrval On 17/06/13 00:13, Romaine Wiki wrote: In the past weeks the community of the Dutch Wikipedia has been working hard to solve interwiki conflicts. A few months ago we had more than 14000 interwiki conflicts, today less than 10800. With fixing these interwiki conflicts, often we also fix them on other wikis. But are other Wikipedias also active on massively fixing interwiki conflicts? Romaine ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Walla! Mail - Get your free unlimited mail today http://newmail.walla.co.il ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Question about wikipedia categories.
Guys! You can continue this conversion in a more public place like WD:PC It's bothering for people like me to receive e-mail every five minutes in a topic which I'm not interested So please continue this in a somewhere else On 5/7/13, Jane Darnell jane...@gmail.com wrote: What is interesting about categories, is that no matter how shaky the system is, these are pretty much the only meta data that there is for articles, because as I said before, just about every article has one. The weakness of DBpedia is that it is only programmed to pick up articles with infoboxes, and there just aren't that many of those. 2013/5/7, Michael Hale hale.michael...@live.com: Pardon the spam, but it is only 2000 categories. Four steps would be 25000. From: hale.michael...@live.com To: wikidata-l@lists.wikimedia.org Date: Tue, 7 May 2013 12:10:51 -0400 Subject: Re: [Wikidata-l] Question about wikipedia categories. I spoke too soon. That is the only loop at two steps. But if you go out three steps (25000 categories) you find another 23 loops. Organizational studies - organizations, housing - household behavior and family economics - home - housing, religious pluralism - religious persecution, secularism - religious pluralism, learning - inductive reasoning - scientific theories - sociological theories - social systems - society - education - learning, etc. From: hale.michael...@live.com To: wikidata-l@lists.wikimedia.org Date: Tue, 7 May 2013 11:31:24 -0400 Subject: Re: [Wikidata-l] Question about wikipedia categories. I don't know if these are useful, but if we go two steps from the fundamental categories on the English Wikipedia we find several loops. Knowledge contains information and information contains knowledge, for example. Not allowing loops might force you to have to give different ranks to two categories that are equally important. Date: Tue, 7 May 2013 16:41:45 +0200 From: hellm...@informatik.uni-leipzig.de To: wikidata-l@lists.wikimedia.org Subject: Re: [Wikidata-l] Question about wikipedia categories. Am 07.05.2013 14:01, schrieb emw: Yes, there is and should be more than one ontology, and that is already the case with categories, which are so flexible they can loop around and become their own grandfather. Can someone give an example of where it would be useful to have a cycle in an ontology? Navigation! How else are you going to find back where you came from ;) Wikipieda categories were invented originally for navigation, right? Cycles are not soo bad, then... Now we live in a new era. -- Sebastian To my knowledge cycles are considered a problem in categorization, and would be a problem in a large-scaled ontology-based classification system as well. My impression has been that Wikidata's ontology would be a directed acyclic graph (DAG) with a single root at entity (thing). On Tue, May 7, 2013 at 3:03 AM, Mathieu Stumpf psychosl...@culture-libre.org wrote: Le 2013-05-06 18:13, Jane Darnell a écrit : Yes, there is and should be more than one ontology, and that is already the case with categories, which are so flexible they can loop around and become their own grandfather. To my mind, categories indeed feet better how we think. I'm not sure grandfather is a canonical term in such a graph, I think it's simply a cycle[1]. [1] https://en.wikipedia.org/wiki/Cycle_%28graph_theory%29 Dbpedia complaints should be discussed on that list, I am not a dbpedia user, though I think it's a useful project to have around. Sorry I didn't want to make off topic messages, nor sound complaining. 
I just wanted to give my feedback, hopefully a constructive one, on a message posted on this list. I transferred my message to the DBpedia mailing list. Sent from my iPad On May 6, 2013, at 12:00 PM, Jona Christopher Sahnwaldt j...@sahnwaldt.de wrote: Hi Mathieu, I think the DBpedia mailing list is a better place for discussing the DBpedia ontology: https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion Drop us a message if you have questions or concerns. I'm sure someone will answer your questions. I am not an ontology expert, so I'll just leave it at that. JC
[Wikidata-l] Changing claims via PWB
Hello, I made commits r11209-11212 for making easier to change claims via API you can see the manual here: http://www.mediawiki.org/wiki/Manual:Pywikipediabot/Wikidata#Changing_or_creating_claims.2Fstatements you simply can call and change properties and give them values I tested it on two categories and it worked ( http://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Dexbot_2) I haven't work on errors yet (for example if new and old values are same the code returns error instead of simply not saving the change like mediawiki) but if there is any bugs, tell me Cheers -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
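For comparison, the equivalent claim edit in the pywikibot core branch looks roughly like this (a sketch of the core interface, not the compat feature described above; the item, property, and target are placeholders):

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()
item = pywikibot.ItemPage(repo, "Q4115189")   # sandbox item
item.get()

claim = pywikibot.Claim(repo, "P31")          # instance of
claim.setTarget(pywikibot.ItemPage(repo, "Q5"))
item.addClaim(claim, summary="Bot: adding a test claim")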
Re: [Wikidata-l] weekly summary 46
I want to add it's now possible to import another wiki's articles via pywikipediabot: https://www.mediawiki.org/wiki/Special:Code/pywikipedia/11103 More info and examples: http://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Dexbot The code is very simple: # -*- coding: utf-8 -*- import wikipedia site=wikipedia.getSite('fa',fam='wikipedia') listofarticles=[uعماد خراسانی,uکوری جهاز] for name in listofarticles: page=wikipedia.Page(site,name) data=wikipedia.DataPage(page) try: items=data.get() except wikipedia.NoPage: print The item doesn't exist. Creating... data.createitem(Bot: Importing article from Persian wikipedia) else: print It has been created already. Skipping... On 2/22/13, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: Heya folks :) Here's your weekly summary of everything fancy happening around Wikidata. It's was quite a week! The wiki version is at http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_02_22 = Development = * Deployed new features and bugfixes including diffs for statements and the ability to enter items and properties by their ID: http://www.wikidata.org/wiki/Wikidata:Project_chat#New_features_and_bugfixes_rolled_out * Updated demo system (http://wikidata-test-repo.wikimedia.de/wiki/Testwiki:Main_Page) * Database maintainance (Wikidata was in read-only mode for a bit) * Implemented first version of a string data type * Worked on better error reporting from the API * Ported Lua function mw.wikibase.getEntity to Wikibase extension * Worked on making the search box suggest items and properties while typing * Improved the behaviour of the entity selector (thingy that lets you select items and entities) * Improved debugging experience in JavaScript when dealing with prototypical inheritance * Worked on cleanup of local entity store * Generalized generation of localizable edit summaries See http://meta.wikimedia.org/wiki/Wikidata/Development/Current_sprint for what we’re working on next. You can follow our commits at https://gerrit.wikimedia.org/r/#/q/(status:open+project:mediawiki/extensions/Wikibase)+OR+(status:merged+project:mediawiki/extensions/Wikibase),n,z and view the subset awaiting review at https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions/Wikibase,n,z You can see all open bugs related to Wikidata at https://bugzilla.wikimedia.org/buglist.cgi?emailcc1=1list_id=151540resolution=---emailtype1=exactemailassigned_to1=1query_format=advancedemail1=wikidata-bugs%40lists.wikimedia.org = Discussions/Press = * Wikidata development will continue in 2013: http://blog.wikimedia.de/2013/02/20/the-future-of-wikidata/ * Wikidata Phase 2 in Full Swing: http://semanticweb.com/wikidata-phase-2-in-full-swing_b35445 * RFC about the Wikidata API: http://lists.wikimedia.org/pipermail/wikidata-l/2013-February/001763.html * Lots of discussions about certain properties and how they should be used. 
Current state is at http://www.wikidata.org/wiki/Wikidata:List_of_properties and new ones are being discussed at http://www.wikidata.org/wiki/Wikidata:Property_proposal * RCF about amending the global bot policy for Wikidata: http://meta.wikimedia.org/wiki/Requests_for_comment/Bot_policy * Proposed changes to Wikidata’s notability guidelines: http://www.wikidata.org/wiki/Wikidata:Project_chat#Proposed_guideline_change:_Notability_rules = Events = see http://meta.wikimedia.org/wiki/Wikidata/Events * office hour including a report on the current status of Wikidata (log is at http://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2013-02-16) * upcoming: Wikipedia Day NYC = Other Noteworthy Stuff = * Deployment of phase 1 (language links) on all remaining Wikipedias and a small update on Wikidata are planned for March 6 * Great page for editors to learn about what phase 1 means for them: http://en.wikipedia.org/wiki/Wikipedia:Wikidata * Cool tool visualizing family relations based on data in Wikidata: http://toolserver.org/~magnus/ts2/geneawiki/?q=Q1339 * “Restricting the World” (first in a series about some design decisions behind Wikidata): http://blog.wikimedia.de/2013/02/22/restricting-the-world/ * List of most used properties: http://www.wikidata.org/wiki/Wikidata:Database_reports/Popular_properties * OmegaWiki is using Wikidata to get links to Wikipedia articles: http://omegawiki.blogspot.de/2013/02/using-wikidata-to-display-links-to.html * Nyan Nyan Wikidata Nyan Nyan http://commons.wikimedia.org/wiki/File:Wikidata-Cat.gif * We’ve hit http://www.wikidata.org/wiki/Q500 * Had a look at http://www.wikidata.org/wiki/Wikidata:Tools lately? = Open Tasks for You = * Hack on one of https://bugzilla.wikimedia.org/buglist.cgi?keywords=need-volunteer%2C%20keywords_type=allwordsemailcc1=1resolution=---emailtype1=exactemailassigned_to1=1query_format=advancedemail1=wikidata-bugs%40lists.wikimedia.orglist_id=162515 *
Re: [Wikidata-l] phase 1 live on the English Wikipedia
When? On Thu, Feb 14, 2013 at 2:08 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: On Thu, Feb 14, 2013 at 11:34 AM, Yuri Astrakhan yuriastrak...@gmail.com wrote: I guess it only makes sense if the deployment will be few languages at a time. If at some point you will start switching them in bulk, it might make sense not to do it. The current plan is to do all the remaining ones in one go. Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] phase 1 live on the English Wikipedia
I did this: https://www.mediawiki.org/wiki/Special:Code/pywikipedia/11073 so updated bots are not a concern anymore :) BTW: congrats. On Thu, Feb 14, 2013 at 1:11 AM, Katie Chan k...@ktchan.info wrote: On 13/02/2013 21:31, Denny Vrandečić wrote: You have examples of that? Did not happen to my edits (so far). Just once so far - https://en.wikipedia.org/w/index.php?title=Papal_conclave,_2013&diff=prev&oldid=538105382 -- Experience is a good school but the fees are high. - Heinrich Heine ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] And we're live on the Hebrew and Italian Wikipedia \o/
Congrats. Is iw bots working correctly? I made r11016 in pwb to disable them but i'm not sure it's working On Thursday, January 31, 2013, Marco Fleckinger marco.fleckin...@wikipedia.at wrote: Hey, congratulation for this important step. With this I may also use it though Italian is the only language out of the the present three WP-Versions I can understand ;-) Cheers Marco On 01/30/2013 08:59 PM, Lydia Pintscher wrote: Heya folks :) We're now live on the Hebrew and Italian Wikipedia as well. Woh! More details are in this blog post: http://blog.wikimedia.de/2013/01/30/wikidata-coming-to-the-next-two-wikipedias (including planned dates for the next deployments!) At the same time we updated the code on the Hungarian Wikipedia. These are however only minor fixes and changes. Thanks to everyone who helped and to the Hebrew and Italian Wikipedia for joining the Hungarian Wikipedia in a very elite club of awesome :P Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Phase 1
What is exact time of the next deployment (it and he)? And what time you think is best to disable interwiki bots? On Mon, Jan 28, 2013 at 2:12 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: On Mon, Jan 28, 2013 at 10:16 AM, Jan Kučera kozuc...@gmail.com wrote: How is the Hungarian pilot running? When will you deploy Phase 1 to other wikis? Hi Jan, Things seem fine on the Hungarian Wikipedia except for a few hickups/bugs and missing things that are being worked on now. Maybe some of the people from the Hungrian Wikipedia who are on this list want to chime in as well. The planned date for the next deployment (on the Italian and Hebrew Wikipedias) is 30th of this month. Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
[Wikidata-l] Wikidata and PWB
Finally the bug is solved: you can get and set items on Wikidata via PWB. First you must define a wikidataPage; these are the methods you can use. I'll try to expand it and add more methods, but I don't know which ones are needed, so your help will be very useful. It supports the same interface as Page, with the following added methods: setitem: setting item(s) on a page; getentity: getting item(s) of a page. These examples worked for me:
site = wikipedia.getSite('wikidata', fam='wikidata')
page = wikipedia.wikidataPage(site, "Helium")
text = page.getentity()
page.setitem(summary=u"BOT: TESTING FOO", items={'type': u'item', 'label': 'fa', 'value': 'OK'})
page.setitem(summary=u"BOT: TESTING GOO", items={'type': u'description', 'language': 'en', 'value': 'OK'})
page.setitem(summary=u"BOT: TESTING BOO", items={'type': u'sitelink', 'site': 'de', 'title': 'BAR'})
Cheers! -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] deployed new code
Thanks Amir and Lydia. The RTL bugs were really annoying. On 12/11/12, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: On Mon, Dec 10, 2012 at 10:37 PM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote: Yay, all my right-to-left fixes are live :) Thank you! Thank you for writing them! Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata and PWB
There is no need to use q### anymore. For other languages it's not possible for now, but I'm going to add it soon. On 12/11/12, Luca Martinelli martinellil...@gmail.com wrote: 2012/12/10 Amir Ladsgroup ladsgr...@gmail.com: Finally the bug is solved. you can get and set items on wikidata via PWB Woo hoo! http://i.imgur.com/0D00v.gif -- Luca Sannita Martinelli http://it.wikipedia.org/wiki/Utente:Sannita ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Bots and the new API
On Thu, Nov 22, 2012 at 10:43 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: On Thu, Nov 22, 2012 at 8:11 PM, Sven Manguard svenmangu...@gmail.com wrote: I'm getting contradictory messages from Wikidata staff then. I mean we already knew that we could, the issue is whether or not we should. There is no technical reason at the moment to not run bots. Beyond that it's not the staff's call to make. I wanted to run my bot as a test but it gave me error 10061 (the server actively refused the connection)! Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] wikidata.org is live (with some caveats)
why wbgetentity does not exist in wikidata.org's API? bots have problems now http://wikidata.org/w/api.php On Tue, Oct 30, 2012 at 9:45 PM, Gregor Hagedorn g.m.haged...@gmail.comwrote: Great work, my congratulations! --- Some first impressions: Changing the language does not really work, the title of the item pages remain in English. http://www.wikidata.org/wiki/Special:RecentChanges?setlang=de - (Unterschied | Versionen) . . Sweden (Q34); 17:48 . . (+32) . . Aplasia (Diskussion | Beiträge) (Fügte websitespezifischen [itwiki] Link hinzu: Paesi Bassi) although http://www.wikidata.org/wiki/Q34 does list de: Schweden. I am not sure whether this is a bug or by design? - In German, translation of Item with Datenelement = data element seems odd, a data element is usually something much smaller and atomic. See http://de.wikipedia.org/wiki/Datenelement Proposal: Artikel or Datenobjekt - Change http://www.wikidata.org/ to http://wikidata.org/ ? I think the logo image (upper left corner) looks a bit lost, 10% larger perhaps? It is not even left aligned with the menu text below, which has ample margin Finally: The Q### will become the public face of Wikidata, whereever it is re-used. I think this brand should be less cryptic and use: http://wikidata.org/wiki/WD34 or http://wikidata.org/wiki/W34 (providing a memnonic link to Wikidata) instead of current http://wikidata.org/wiki/Q34 Gregor ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] New release of Pywikidata: feature requests?
What is the main purpose of pywikidata? Everything it does can be done by PWB. On 10/25/12, Joan Creus joan.creu...@gmail.com wrote: Yep, I saw the code and that is completely unrelated to pywikidata. I don't think we should be dividing the efforts, and focus on either of them. I personally think it would be best to keep Pywikidata as a separate project for a while, until pywikidata and wikidata mature. We can't be changing pywikipedia constantly for users to find a broken pywikidata. As I see it, Pywikipedia is already mature and (py)wikidata is still beta. Regardless, I think implementing something like Pywikidata where items are a Python class and properties can be accessed like item.sitelinks['cawiki'] is better than directly handling low-level JSON (like PWB+WD is iirc) or something similar. The library should abstract those things and provide utility functions like wikidata.api.getItemById or wikidata.api.getItemByInterwiki, or whatever. Just some thoughts. Joan Creus. 2012/10/25 Amir Ladsgroup ladsgr...@gmail.com Lydia is right. I changed PWB to make it run on wikidata. Now I can write code that integrates data from Wikipedia and adds it to Wikidata DIRECTLY. I say again, read this: http://www.mediawiki.org/wiki/Manual:Pywikipediabot/Wikidata I suggest we merge pywikidata and PWB On 10/25/12, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: On Wed, Oct 24, 2012 at 7:08 PM, Joan Creus joan.creu...@gmail.com wrote: Hi all, I've released a new version of Pywikidata @ https://github.com/jcreus/pywikidata (not really a release, since there is no version numbering; I should begin, maybe...). Yay! :) New features include: Changes pushed to the server are only the properties which have changed. This means that it is really faster, and it is the way it should be (according to the spec). 'uselang' is no longer accepted, per spec. Bot flag allowed. Right now there's some more stuff left to do (including adapting to changes to the API, and adding the claim system [still beta]), but is there any feature request of something which would be useful for bots? In regards to Pywikipediabot integration I think someone is working on it (thanks!); yet I think it would be better to wait. Pywikipedia is a mature project and Pywikidata is still evolving constantly (mostly due to API changes, which break it). So I'd wait until Pywikidata has matured too, and Wikidata is deployed to WMF wikis. As far as I understood it this isn't really integration but instead implementing this directly in Pywikipediabot. Amir: Can you please clarify? Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
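The kind of abstraction described here, items as Python objects instead of raw JSON, can be sketched in a few lines (an illustration of the idea only, not Pywikidata's real class layout):

class Item:
    """Thin wrapper around the JSON blob returned by the Wikidata API."""
    def __init__(self, data):
        self._data = data

    @property
    def labels(self):
        return {lang: v["value"] for lang, v in self._data.get("labels", {}).items()}

    @property
    def sitelinks(self):
        return {site: v["title"] for site, v in self._data.get("sitelinks", {}).items()}

# Usage: item = Item(api_response["entities"]["Q42"]); item.sitelinks["cawiki"]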
Re: [Wikidata-l] New release of Pywikidata: feature requests?
The old code needed the id but the new one doesn't. I tried this and it worked:
site = wikipedia.getSite('wikidata', fam='wikidata')
page = wikipedia.Page(site, "Helium")
page.put(u"", u"BOT: TESTING BOO", wikidata={'type': u'sitelink', 'site': 'de', 'title': 'BAR'})
Are you sure you are using the updated code? On 10/25/12, Joan Creus joan.creu...@gmail.com wrote: It's okay, I always said it would be best to merge it ;). But I still think it should wait (I don't care if it's merged now, but I think it would be better to wait; however, do whatever you want). The ID isn't necessary, you also have wikidata.api.getItemByInterwiki (defined in the Wikidata API). In regards to wikidata-is-still-mediawiki: yes, but it is structured data, thus we must treat it as such (much more useful for end users). Maybe the complexity now is low, but then, Claims will appear, and dealing with them in a separate class would certainly be useful. It might be useful to handle it directly as you said in page.get()? Yes, sometimes. In that case we might provide both options (though I prefer the structured one). (BUT the user should be warned that (s)he is editing JSON data as a string! If (s)he wants to replace wiki by woki, (s)he'll probably break it). Finally, I tried your code and it seems to require the id, also (as in q320 as page title) :S. Joan Creus.
Re: [Wikidata-l] features needed by langlink bots (was Re: update from the Hebrew Wikipedia)
Is it helpful? http://www.mediawiki.org/wiki/Manual:Pywikipediabot/Wikidata On 10/24/12, Merlissimo m...@toolserver.org wrote: Just to answer these questions for my _java_ interwiki bot MerlIwBot: Am 12.10.2012 16:45, schrieb Amir E. Aharoni: Will the bots be smart enough not to do anything to articles that are already listed in the repository and have the correct links displayed? Will the bots be smart enough not to do anything with articles that have interwiki conflicts (multiple links, non-1-to-1 linking etc.)? yes, because this is wikidata independent for my java bot and worked already before. So running my bot is always save on these wikis. non-1-to-1 langlinks can be moved partly to wikidata because mixing if wikidata and local langlinks is possible. But please note that since change https://gerrit.wikimedia.org/r/#/c/25232/ which is live with 1.21.wmf2 no multiple langlinks are display anymore. Will the bots be smart enough to update the repo in the transition period, when some Wikipedias have Wikidata and some don't? My bot can update the repository but will cause much load on local wikis. That's because currently there is no possibility to know if a langlinks is stored local or at wikidata. Local langlinks can be stored on main page or any included page. So the whole source code of every included page must be checked first. I created a feature request which could solve this problem: https://bugzilla.wikimedia.org/show_bug.cgi?id=41345 . If hope this will be added before wikidata goes live. To update the repository my bot need to know the corresponding repository script url for a local wiki. Currently http://wikidata-test-repo.wikimedia.de/w/api.php is hard coded at my bot framework. But this repository url will change for hewiki. There is currently no way to request this info using api. I also created a feature requests for this: https://bugzilla.wikimedia.org/show_bug.cgi?id=41347 Merlissimo ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
[Wikidata-l] Fwd: Wikidata bug (or semi-bug ;)
Sumana says I should forward this e-mail to wikidata-l, so I did :) -- Forwarded message -- From: Amir Ladsgroup ladsgr...@gmail.com Date: Fri, 19 Oct 2012 03:18:57 +0330 Subject: Wikidata bug (or semi-bug ;) To: wikitec...@lists.wikimedia.org Hello I'm working on running PWB on wikidata and I want to do some edits via the API, so I did this: http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&id=392&data={%22labels%22:{%22ar%22:{%22language%22:%22ar%22,%22value%22:%22Bar%22}}}&format=jsonfm but it gives me this: http://wikidata-test-repo.wikimedia.de/w/index.php?title=Q392&diff=105675&oldid=33972 As you can see, the label changed but the change is not shown in the "List of pages linked to this item (79 entries)" section. I think it's because they are not the same, but it would be better if they were! And besides, does anybody know how I can get the page id? I mean, I give Tantalum and get back 392 or Q392. Best wishes -- Amir -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
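On the last question, resolving a title to an item id, the repository API can do the lookup directly via wbgetentities with the sites and titles parameters; a small sketch against the test repository mentioned above (the site id and the expected result are assumptions):

import requests

API = "http://wikidata-test-repo.wikimedia.de/w/api.php"

def item_id_for_title(title, site="enwiki"):
    """Return e.g. 'Q392' for the page linked as `title` on `site`, or None."""
    r = requests.get(API, params={
        "action": "wbgetentities", "sites": site, "titles": title,
        "props": "info", "format": "json"}).json()
    for entity_id, entity in r.get("entities", {}).items():
        if "missing" not in entity:
            return entity_id
    return None

print(item_id_for_title("Tantalum"))  # expected: the item id, e.g. 'Q392' on the test repo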
Re: [Wikidata-l] Wikidata bug (or semi-bug ;)
Is there anybody who can help me? On 10/19/12, Amir Ladsgroup ladsgr...@gmail.com wrote: Sumana says I should forward this e-mail to wikidata-l, so I did :) -- Forwarded message -- From: Amir Ladsgroup ladsgr...@gmail.com Date: Fri, 19 Oct 2012 03:18:57 +0330 Subject: Wikidata bug (or semi-bug ;) To: wikitec...@lists.wikimedia.org Hello I'm working on running PWB on wikidata and I want to do some edits via the API, so I did this: http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&id=392&data={%22labels%22:{%22ar%22:{%22language%22:%22ar%22,%22value%22:%22Bar%22}}}&format=jsonfm but it gives me this: http://wikidata-test-repo.wikimedia.de/w/index.php?title=Q392&diff=105675&oldid=33972 As you can see, the label changed but the change is not shown in the "List of pages linked to this item (79 entries)" section. I think it's because they are not the same, but it would be better if they were! And besides, does anybody know how I can get the page id? I mean, I give Tantalum and get back 392 or Q392. Best wishes -- Amir -- Amir -- Amir ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l