Re: [Wikidata-l] Bot request: 250+ thousands person data
Do you know why this edit isn't shown correctly?
https://www.wikidata.org/w/index.php?title=Q4119465&diff=123932128&oldid=123931985

Best,
Amir

On Tue, Apr 29, 2014 at 9:34 PM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
> On 29.04.2014 17:25, David Cuenca wrote:
>> Is it possible to have just a lower bound, leaving the upper one open?
>
> No. It's a precision interval, not a range. Range Snaks may be introduced in the future, but for now, you should use dedicated properties for start and end to express a range.
>
>> I am thinking of uses like
>> https://www.wikidata.org/wiki/Wikidata:Property_proposal/Generic#earliest_date
>
> That's exactly the kind of property that *doesn't* need an interval or open precision: the earliest date is a precise point.
>
>> For things like "circa" I don't see any clear solution other than inventing some ranges...
>
> Yes. I think it's reasonable to do that along the same lines that you do when reading "ca 1850": I would read that as +/- 10 years. "ca 2014" is probably +/- 1 year, and "around August 1986" is +/- 1 month, while "around August 10" is probably +/- a week or so.
>
> -- daniel
>
> --
> Daniel Kinzler
> Senior Software Developer
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
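The way Daniel reads "circa" dates could be sketched roughly as follows. This is only an illustration: the function name `circa_margin` is made up, the precision codes 9/10/11 are the year/month/day values from the precision table posted elsewhere in this thread, and the rule "a year ending in 0 gets ten years of slack" is an assumption about why "ca 1850" and "ca 2014" are read differently.

```python
# Precision codes for year, month and day, as listed elsewhere in this thread.
PRECISION_YEAR, PRECISION_MONTH, PRECISION_DAY = 9, 10, 11


def circa_margin(year, month=None, day=None):
    """Return (precision, margin) for a "circa" date.

    Hypothetical helper: the margin is meant to be used for both the
    "before" and the "after" field, counted in units of the precision.
    """
    if day is not None:
        # "around August 10": roughly a week either way
        return PRECISION_DAY, 7
    if month is not None:
        # "around August 1986": one month either way
        return PRECISION_MONTH, 1
    if year % 10 == 0:
        # Assumption: a round year such as "ca 1850" means ten years either way
        return PRECISION_YEAR, 10
    # "ca 2014": one year either way
    return PRECISION_YEAR, 1


print(circa_margin(1850))
print(circa_margin(1986, 8))
```

Any real import would probably want these margins to be agreed on-wiki first, since they are a reading convention rather than something stored in the source data.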
Re: [Wikidata-l] Bot request: 250+ thousands person data
On Wed, Apr 30, 2014 at 11:45 AM, Amir Ladsgroup ladsgr...@gmail.com wrote:
> Do you know why this edit isn't shown correctly?
> https://www.wikidata.org/w/index.php?title=Q4119465&diff=123932128&oldid=123931985

Will have a look. Thx.

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Re: [Wikidata-l] Bot request: 250+ thousands person data
This problem is being tracked in https://bugzilla.wikimedia.org/show_bug.cgi?id=60999

Best,
Amir

On Wed, Apr 30, 2014 at 8:20 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote:
> On Wed, Apr 30, 2014 at 11:45 AM, Amir Ladsgroup ladsgr...@gmail.com wrote:
>> Do you know why this edit isn't shown correctly?
>> https://www.wikidata.org/w/index.php?title=Q4119465&diff=123932128&oldid=123931985
>
> Will have a look. Thx.
>
> Cheers
> Lydia
Re: [Wikidata-l] Bot request: 250+ thousands person data
Hi,

When Wikipedia has an approach to specific articles that is not compatible with Wikidata, we can create items that fit our needs and keep the original item for what it is, for instance a list of people (in the case of the Wright brothers). The notion that Wikidata defers to Wikipedia is not one we can keep, because there are bound to be Wikipedias that differ in their approach and have an article for both Wilbur and Orville Wright.

Yes, it is good to have hope for algorithms in the future. In the meantime, consider what percentage is wrong, and that quite often not having data is more damaging than having data that can be manipulated with queries and tools. No data is no grip at all. We do have queries in WDQ/Autolist and we have tools in ToolScript and pywikipedia.

IMHO the most important thing we should do to get better quality is report on differences. This helps all projects involved in an import / export / comparison.

Thanks,
Gerard

On 29 April 2014 09:15, John Mark Vandenberg jay...@gmail.com wrote:
> On Sun, Apr 27, 2014 at 8:28 PM, Amir Ladsgroup ladsgr...@gmail.com wrote:
>> There are some problems in using the bio template; for example, it has been used for a group of people:
>> https://it.wikipedia.org/wiki/Fratelli_Wright
>
> This is quite a difficult problem.
>
> Also look for infoboxes that are not at the top of a page because the Wikipedia page contains two concepts. Here is an example with {{Bio}}:
> https://it.wikipedia.org/wiki/Slashdot
>
> In the journals area, I faced this many times: the article about a society has no infobox for the society, but includes an infobox in a section for its primary journal.
>
> My bot has some very hacky code to detect the infobox type in a few languages:
> https://www.wikidata.org/wiki/User:JVbot/periodicalbot.py (the first function)
>
> It would be good if we could create an algorithm that detects all these anomalies, or a special hidden parameter added to the invocation to exclude those templates from automated parsing. It should also list all pages like this, so that those pages can be split on the Wikipedias (unless notability rules prevent the split).
>
> --
> John Vandenberg
Re: [Wikidata-l] Bot request: 250+ thousands person data
2014-04-28 17:08 GMT+02:00 David Cuenca dacu...@gmail.com:
> On Mon, Apr 28, 2014 at 3:10 PM, Luca Martinelli martinellil...@gmail.com wrote:
>> I recalled the fact quite correctly: https://it.wikipedia.org/wiki/Modulo:Bio takes dates of birth and death from Wikidata. I think we can discuss extending this to gender, and later to other fields.
>
> That's perfect, because that means that the bot can just delete the text on import.

I would say -1 for the moment. We first need to talk about it and create hidden categories in order to control the retrievals. There's time to delete. :)

--
Luca "Sannita" Martinelli
http://it.wikipedia.org/wiki/Utente:Sannita
Re: [Wikidata-l] Bot request: 250+ thousands person data
On 29.04.2014 03:53, Amir Ladsgroup wrote:
> It's not a big deal, parsing it would be no problem. I can use it in parsing data from the Bio template in the Italian Wikipedia, but I have to use the precision argument in the snak. Am I right?

Yes, exactly.

> What value do I have to set for precision if I have just the year (and no month and day)?

If you just have the year, the precision value would be 9. This is arbitrary and obscure, sorry. I have filed a bug to fix this: https://bugzilla.wikimedia.org/show_bug.cgi?id=64593

For reference, here is the table of precisions to be used for time values, as defined in the TimeValue class:

    const PRECISION_Ga = 0; // Gigayear
    const PRECISION_100Ma = 1; // 100 Megayears
    const PRECISION_10Ma = 2; // 10 Megayears
    const PRECISION_Ma = 3; // Megayear
    const PRECISION_100ka = 4; // 100 Kiloyears
    const PRECISION_10ka = 5; // 10 Kiloyears
    const PRECISION_ka = 6; // Kiloyear
    const PRECISION_100a = 7; // 100 years
    const PRECISION_10a = 8; // 10 years
    const PRECISION_YEAR = 9;
    const PRECISION_MONTH = 10;
    const PRECISION_DAY = 11;
    const PRECISION_HOUR = 12;
    const PRECISION_MINUTE = 13;
    const PRECISION_SECOND = 14;

If you have something like "between 1846 and 1855", you can use the "before" and "after" fields of the time value:

    time: +0001850-00-00T00:00:00Z, precision: 9, before: 4, after: 5

This means the main value is 1850, given as a year, with a lower bound four years before and an upper bound five years after the main value ("before" and "after" are given in the unit specified by the precision value). The main value is what is going to be displayed by default; it will also be used for sorting query results (once we have queries).

This is a bit complicated, but should allow you to actually represent uncertain dates. We made it so you can be precise about the uncertainty :)

HTH
Daniel

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
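As a rough illustration of the before/after scheme Daniel describes, the sketch below turns a range like "between 1846 and 1855" into the four fields from his example. The function name `year_range_to_time_value`, the dict layout and the choice of the rounded-down midpoint as main value are assumptions made for illustration, not part of any Wikibase client library. The timestamp padding simply copies the example in the message and may not match what the API actually expects; only positive years are handled.

```python
# Precision code for "year", taken from the table in the message above.
PRECISION_YEAR = 9


def year_range_to_time_value(earliest, latest, main=None):
    """Build a year-precision time value covering earliest..latest.

    Hypothetical helper: "before" and "after" are counted in years
    because the precision is PRECISION_YEAR.
    """
    if earliest > latest:
        raise ValueError("earliest must not be later than latest")
    if main is None:
        # Assumption: use the rounded-down midpoint as the displayed value.
        main = (earliest + latest) // 2
    if not earliest <= main <= latest:
        raise ValueError("main value must lie inside the range")
    return {
        # Timestamp layout copied from the example in the message.
        "time": f"+{main:07d}-00-00T00:00:00Z",
        "precision": PRECISION_YEAR,
        "before": main - earliest,
        "after": latest - main,
    }


value = year_range_to_time_value(1846, 1855)
print(value)
```

With the range from the message, the midpoint rule happens to pick 1850, so the result has the same main value, "before" and "after" as Daniel's example. A caller who prefers a different displayed year can pass `main` explicitly.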
Re: [Wikidata-l] Bot request: 250+ thousands person data
On 29 Apr 2014 09:31, Gerard Meijssen gerard.meijs...@gmail.com wrote:
> Hi,
> When Wikipedia has an approach to specific articles that is not compatible with Wikidata, we can create items that fit our needs and keep the original item for what it is, for instance a list of people (in the case of the Wright brothers). The notion that Wikidata defers to Wikipedia is not one we can keep, because there are bound to be Wikipedias that differ in their approach and have an article for both Wilbur and Orville Wright.

Exactly, I kinda had the same problem with Sacco and Vanzetti when I was uploading Italian authority codes. They have two different codes in the Italian national library system, but have a joint article on Wikipedia.

L.
Re: [Wikidata-l] Bot request: 250+ thousands person data
On Tue, Apr 29, 2014 at 12:48 PM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
> If you have something like "between 1846 and 1855", you can use the "before" and "after" fields of the time value:
>
>     time: +0001850-00-00T00:00:00Z, precision: 9, before: 4, after: 5
>
> This means the main value is 1850, given as a year, with a lower bound four years before and an upper bound five years after the main value ("before" and "after" are given in the unit specified by the precision value). The main value is what is going to be displayed by default; it will also be used for sorting query results (once we have queries).

Is it possible to have just a lower bound, leaving the upper one open? I am thinking of uses like
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Generic#earliest_date

For things like "circa" I don't see any clear solution other than inventing some ranges...

Micru
Re: [Wikidata-l] Bot request: 250+ thousands person data
On Mon, Apr 28, 2014 at 3:10 PM, Luca Martinelli martinellil...@gmail.com wrote:
> I recalled the fact quite correctly: https://it.wikipedia.org/wiki/Modulo:Bio takes dates of birth and death from Wikidata. I think we can discuss extending this to gender, and later to other fields.

That's perfect, because that means that the bot can just delete the text on import. What is missing are the birth/death date "after" and "before" fields for uncertain dates (DataNascitaDopo/DataNascitaPrima?). I'm curious to see that working :)

Cheers,
Micru
Re: [Wikidata-l] Bot request: 250+ thousands person data
Hi,

On 25.04.2014 at 18:31, Federico Leva (Nemo) nemow...@gmail.com wrote:
> Wikidata feels empty... even data on people is almost non-existent. The Italian Wikipedia has the most complete persondata dataset in the Wikimedia world, ready for import. Legoktm's bot was almost ready to parse the {{bio}} template; some code tweaking will be needed. No takers, really? This is very low-hanging fruit for bots.

The German Wikipedia has persondata for over 525,000 persons. The raw data can be found as CSV at http://tools.wmflabs.org/persondata/data/pd_dump.txt

The German dates have to be parsed, and I think the uncertainty for many persons is the main problem ("born 5th of July 1716 or 15th of July 1718", "between 1918 and 1921"). But of course for a lot of persons there is accurate information, which could be imported to Wikidata.

Greets,
Christian / APPER
Re: [Wikidata-l] Bot request: 250+ thousands person data
Christian Thiele, 27/04/2014 11:16:
> The German Wikipedia has persondata for over 525,000 persons. The raw data can be found as CSV at http://tools.wmflabs.org/persondata/data/pd_dump.txt

Sure, that's a useful source as well. Not as complete for each of the items, though.

> The German dates have to be parsed, and I think the uncertainty for many persons is the main problem ("born 5th of July 1716 or 15th of July 1718", "between 1918 and 1921").

Hm. No such problem exists with it.wiki's {{bio}}, which is very restrictive in what you can enter in it.

Nemo
Re: [Wikidata-l] Bot request: 250+ thousands person data
Maybe it is possible to identify those cases and not use Wikidata data for them? They must represent a very tiny percentage of the total...

Cheers,
Micru

On Sun, Apr 27, 2014 at 12:28 PM, Amir Ladsgroup ladsgr...@gmail.com wrote:
> There are some problems in using the bio template; for example, it has been used for a group of people:
> https://it.wikipedia.org/wiki/Fratelli_Wright
>
> On Sun, Apr 27, 2014 at 2:51 PM, David Cuenca dacu...@gmail.com wrote:
>> @Nemo, Apper: Do you think you could import that data into the wd-repo AND make use of it via an inclusion template? In the past there was some animosity against importing data without sources, but if it is data that is being used and displayed on Wikipedia, then I guess it would be regarded differently. Anyhow, these kinds of discussions are better on-wiki.
>>
>> On Sun, Apr 27, 2014 at 11:16 AM, Christian Thiele ap...@apper.de wrote:
>>> The German dates have to be parsed, and I think the uncertainty for many persons is the main problem ("born 5th of July 1716 or 15th of July 1718", "between 1918 and 1921"). But of course for a lot of persons there is accurate information, which could be imported to Wikidata.
>>
>> For this kind of situation you have the "before" and "after" of the time datatype:
>> https://www.wikidata.org/wiki/Special:ListDatatypes
>>
>> The problem is that it is not visible yet...
>> https://bugzilla.wikimedia.org/show_bug.cgi?id=61909
>>
>> Cheers,
>> Micru

--
Etiamsi omnes, ego non
Re: [Wikidata-l] Bot request: 250+ thousands person data
It's easy: I skip pages with more than one bio template. I'm working on harvesting the information right now and I'll start very soon.

Best,
Amir

On Sun, Apr 27, 2014 at 3:03 PM, David Cuenca dacu...@gmail.com wrote:
> Maybe it is possible to identify those cases and not use Wikidata data for them? They must represent a very tiny percentage of the total...
>
> Cheers,
> Micru
>
> On Sun, Apr 27, 2014 at 12:28 PM, Amir Ladsgroup ladsgr...@gmail.com wrote:
>> There are some problems in using the bio template; for example, it has been used for a group of people:
>> https://it.wikipedia.org/wiki/Fratelli_Wright
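The check Amir mentions (skip pages with more than one bio template) might look something like the sketch below. It is not Amir's actual bot code: the regular expression, the function names and the sample wikitext are all made up for illustration, and a real bot would more likely rely on a proper template parser than on a regex.

```python
import re

# Matches the opening of a {{Bio ...}} transclusion, case-insensitively.
# The trailing [|}] keeps names such as "Biografia" from matching.
BIO_START = re.compile(r"\{\{\s*bio\s*[|}]", re.IGNORECASE)


def count_bio_templates(wikitext):
    """Count how many times the Bio template is opened in the wikitext."""
    return len(BIO_START.findall(wikitext))


def should_harvest(wikitext):
    """Harvest only pages that carry exactly one Bio template."""
    return count_bio_templates(wikitext) == 1


# Invented sample pages, not real article text.
single = "{{Bio\n|Nome = Orville\n|Cognome = Wright\n}}\nSome article text."
double = "{{Bio|Nome = Wilbur}}\n{{Bio|Nome = Orville}}"

print(should_harvest(single))
print(should_harvest(double))
```

Note that this only covers the "several people on one page" case. A page with a single {{Bio}} placed in a section about a secondary topic, as in the Slashdot example raised elsewhere in this thread, would still pass the check.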
Re: [Wikidata-l] Bot request: 250+ thousands person data
David Cuenca, 27/04/2014 12:21:
> @Nemo, Apper: Do you think you could import that data into the wd-repo AND make use of it via an inclusion template?

The Italian Wikipedia has a track record of early adoption of Wikidata as a source. Almost everything that was added to Wikidata was immediately put into use (the most recent big example, I think, being {{interprogetto}}). It wouldn't take long before {{bio}} starts using the data once it's available (probably days or weeks); it's been discussed several times and nobody appeared to dislike the idea.

Nemo
Re: [Wikidata-l] Bot request: 250+ thousands person data
On 27 Apr 2014 12:59, Federico Leva (Nemo) nemow...@gmail.com wrote:
> David Cuenca, 27/04/2014 12:21:
>> @Nemo, Apper: Do you think you could import that data into the wd-repo AND make use of it via an inclusion template?
>
> The Italian Wikipedia has a track record of early adoption of Wikidata as a source. Almost everything that was added to Wikidata was immediately put into use (the most recent big example, I think, being {{interprogetto}}). It wouldn't take long before {{bio}} starts using the data once it's available (probably days or weeks); it's been discussed several times and nobody appeared to dislike the idea.

If I'm not mistaken, there are (or were) already some experiments going on with {{Bio}} using data from Wikidata, possibly for the image field. Anyway, if Amir (thanks!) is really going to upload that data, nobody is preventing us from trying an experiment on a large scale. I'll talk with the Italian community about it.

L.
Re: [Wikidata-l] Bot request: 250+ thousands person data
I started my bot (https://www.wikidata.org/wiki/Special:Contributions/Dexbot) on: P31 (instance of), P21 (gender), P19 (place of birth), and P20 (place of death).

I also wrote the code to import dates of birth and death, but I'm not running it yet because there is one important question: which calendar model do you use for dates of birth and death? In some places the Gregorian calendar wasn't common until 1912, so I can't add dates before 1912 because the bot can't be sure about the calendar model of these dates.

Best,
Amir

On Sun, Apr 27, 2014 at 3:32 PM, Luca Martinelli martinellil...@gmail.com wrote:
> If I'm not mistaken, there are (or were) already some experiments going on with {{Bio}} using data from Wikidata, possibly for the image field. Anyway, if Amir (thanks!) is really going to upload that data, nobody is preventing us from trying an experiment on a large scale. I'll talk with the Italian community about it.
>
> L.
Re: [Wikidata-l] Bot request: 250+ thousands person data
I was stumped by the same question and couldn't find an answer anywhere either. As I recall, I just picked the default option, whichever it was.

2014-04-27 14:04 GMT+02:00, Amir Ladsgroup ladsgr...@gmail.com:
> I started my bot (https://www.wikidata.org/wiki/Special:Contributions/Dexbot) on: P31 (instance of), P21 (gender), P19 (place of birth), and P20 (place of death).
>
> I also wrote the code to import dates of birth and death, but I'm not running it yet because there is one important question: which calendar model do you use for dates of birth and death? In some places the Gregorian calendar wasn't common until 1912, so I can't add dates before 1912 because the bot can't be sure about the calendar model of these dates.
>
> Best
Re: [Wikidata-l] Bot request: 250+ thousands person data
Amir Ladsgroup, 27/04/2014 14:04:
> I also wrote the code to import dates of birth and death, but I'm not running it yet because there is one important question: which calendar model do you use for dates of birth and death?

Consensus has mostly been to force the Gregorian calendar everywhere. I'll add more details on https://www.wikidata.org/wiki/Wikidata:Bot_requests/Italian_Wikipedia_person_data in a moment; please ask specific questions there so that we can edit the data mapping gradually. :)

Nemo
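One way a bot could combine Amir's concern with the "Gregorian everywhere" convention is sketched below. It is only an illustration under stated assumptions: the function name `calendar_for_import` and the idea of flagging early dates for manual review are made up here, the 1912 cutoff is simply the year Amir mentions, and the URI is the Wikidata item for the proleptic Gregorian calendar.

```python
# Wikidata item for the proleptic Gregorian calendar.
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def calendar_for_import(year, review_before=1912):
    """Return (calendar_model_uri, needs_review) for an imported date.

    Hypothetical policy: always store the date as Gregorian, following the
    consensus described above, but flag dates earlier than the cutoff so
    that someone can check whether the source meant another calendar.
    """
    return GREGORIAN, year < review_before


print(calendar_for_import(1871))
print(calendar_for_import(1950))
```

Whether flagged dates should be imported and listed for review, or held back entirely, is a policy question for the bot request page rather than something the code can decide.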
Re: [Wikidata-l] Bot request: 250+ thousands person data
David Cuenca, 27/04/2014 15:38:
> One of the things I would like to see in Wikidata is the replacement of "imported from: Wikipedia X" by another property (or function) that would show "data shown on: Wikipedia X".

That's like a crosswiki WhatLinksHere or a GlobalUsage for data. I can't find a bug for it, please file one.

Nemo