Re: [Wikidata-l] Wikidata just got 10 times easier to use
Hi Lydia,

Two questions:

* Is it possible to see if an added statement was suggested to the user?
* Do you have some kind of versioning for the UI, to see if features you add have a positive effect on the way Wikidata is used? I'm not familiar with the tag system, but it looks like it can be exploited to see where users edit Wikidata. I can see that edits made with the Wikidata game or other tools from Magnus have a Widar[1.3] tag; maybe this should be split into one tag for each application.
* Can you send a link to the thesis you mentioned?

Lukas

On Tue 01.07.2014 21:20, Lydia Pintscher wrote:
> Hey folks :)
>
> We have just deployed the entity suggester. This helps you with suggesting properties. So when you now add a new statement to an item it will suggest what should most likely be added to that item. One example: You are on an item about a person but it doesn't have a date of birth yet. Since a lot of other items about persons have a date of birth it will suggest you also add one to this item. This will make it a lot easier for you to figure out what the hell is missing on an item and which property to use.
>
> Thank you so much to the student team who worked on this as part of their bachelor thesis over the last months as well as everyone who gave feedback and helped them along the way.
>
> I'm really happy to see this huge improvement towards making Wikidata easier to use. I hope so are you.
>
> Cheers
> Lydia

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata just got 10 times easier to use
On Wed, Jul 2, 2014 at 12:40 PM, Lukas Benedix bene...@zedat.fu-berlin.de wrote:
> Hi Lydia,
>
> Two questions:
>
> * Is it possible to see if an added statement was suggested to the user?

Not currently, and unfortunately I don't think it'd be trivial to do.

> * Do you have some kind of versioning for the UI, to see if features you add have a positive effect on the way Wikidata is used?

I have the dates we rolled out certain features and I can look at how some key metrics change over time. So yes, I can see if this has the effect we all intend it to have.

> I'm not familiar with the tag system, but it looks like it can be exploited to see where the users edit Wikidata. I can see edits made with the wikidata game or other tools from Magnus have a Widar[1.3] Tag, maybe this should be split into one tag for each application.

I think it is mostly a matter of convenience so you don't have to reauthenticate each of those individual applications. Technically it could be done. But this is something for Magnus to decide if he wants it.

> * Can you send a link to the thesis you mentioned?

I don't think it is published yet. Once the students have published it they can send an email here.

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Re: [Wikidata-l] Wikidata just got 10 times easier to use
Markus Krötzsch mar...@semantic-mediawiki.org writes:
> I guess the new property suggester rather errs on the other side, being tricked into suggesting very frequent properties even in places that don't need them.

I found some, most notably "date of death" for living people, which is likely inevitable even if you add a filter of the type "too young to be dead", I presume ;-)

Yet I am wondering more about properties that are not suggested where they could be, such as various IDs for people (like VIAF, ORCID, etc.)

Purodha
Re: [Wikidata-l] Wikidata just got 10 times easier to use
On 02/07/14 16:29, David Cuenca wrote:
> On Tue, Jul 1, 2014 at 11:07 PM, Markus Krötzsch mar...@semantic-mediawiki.org wrote:
>> My hope is that with my other suggestion (using P31 values as features to correlate with), the property suggester will already be able to outperform my little toy algorithm anyway. One could also combine the two (my algorithm is really simple [1]), but maybe this is not needed.
>
> Interesting. That could also help to identify values with a high deviation, and perhaps even do a better job than some template constraints.
>
> I was trying to check more classes, but the server seems to have trouble:
> Error: could not load file 'classes/Classes.csv'
> http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q2087181

Strange. Works for me. But we had some temporary service problems at WMF Labs recently, so maybe there was some aftermath of these. In any case, I should update the software -- Yaron has further improved Miga to lower the initial load times significantly. I'll send another email when I have new code/new data there.

> Anyhow, many thanks for working on this.

My pleasure. :-)

Markus

> Micru
[Wikidata-l] Wikidata just got 10 times easier to use
Hey folks :) We have just deployed the entity suggester. This helps you with suggesting properties. So when you now add a new statement to an item it will suggest what should most likely be added to that item. One example: You are on an item about a person but it doesn't have a date of birth yet. Since a lot of other items about persons have a date of birth it will suggest you also add one to this item. This will make it a lot easier for you to figure out what the hell is missing on an item and which property to use. Thank you so much to the student team who worked on this as part of their bachelor thesis over the last months as well as everyone who gave feedback and helped them along the way. I'm really happy to see this huge improvement towards making Wikidata easier to use. I hope so are you. Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata just got 10 times easier to use
On 1 July 2014 20:20, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: We have just deployed the entity suggester. This helps you with suggesting properties. So when you now add a new statement to an item it will suggest what should most likely be added to that item. One example: You are on an item about a person but it doesn't have a date of birth yet. Since a lot of other items about persons have a date of birth it will suggest you also add one to this item. This is a great idea, but I've just tried it on Q4810979 (about an historic building) and it prompted me for a date of birth, gender, taxon rank or taxon name. Teething troubles? -- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata just got 10 times easier to use
> We still need to tweak it a bit here and there, yeah. We're working on that right now. Also it will get smarter as more statements are added to items.

Even with some somewhat off suggestions this will be a wonderful tool.

Thank you to everyone who worked on making this happen! This really is going to make Wikidata so much easier to use.

Is there any documentation on how it chooses which entities to suggest?

Thank you,
Derric Atzrott
Re: [Wikidata-l] Wikidata just got 10 times easier to use
On Tue, Jul 1, 2014 at 9:52 PM, Derric Atzrott datzr...@alizeepathology.com wrote:
>> We still need to tweak it a bit here and there, yeah. We're working on that right now. Also it will get smarter as more statements are added to items.
>
> Even with some somewhat off suggestions this will be a wonderful tool.

\o/ We've just changed the threshold a bit so it should give fewer but more fitting suggestions. We'll play around with that setting a bit more over the next days to find the one that's right for us.

> Thank you to everyone who worked on making this happen! This really is going to make Wikidata so much easier to use.
>
> Is there any documentation on how it chooses which entities to suggest?

It basically creates a table of correlations for properties over all items in Wikidata. So if, say, date of birth and place of birth are used together a lot, they get a high correlation. When you then have an item with a date of birth but no place of birth, it will suggest place of birth because of the high correlation.

Cheers
Lydia
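The correlation table described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not the deployed suggester's code: the names `build_cooccurrence` and `suggest`, the toy items, the scoring rule (average share of items with a present property that also use the candidate), and the threshold values are all hypothetical.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(items):
    """Count how often each property is used and how often two are used together."""
    prop_count = Counter()
    pair_count = Counter()
    for props in items.values():
        prop_count.update(props)
        # Ordered pairs, so pair_count[(a, b)] can be divided by prop_count[a].
        pair_count.update(permutations(props, 2))
    return prop_count, pair_count

def suggest(present, prop_count, pair_count, threshold=0.5):
    """Rank missing properties by how strongly they co-occur with present ones.

    Assumed score: the average, over the properties already on the item,
    of the share of items using that property that also use the candidate.
    """
    present = set(present)
    scores = {}
    for cand in prop_count:
        if cand in present:
            continue
        shares = [pair_count[(p, cand)] / prop_count[p]
                  for p in present if prop_count[p]]
        if shares:
            score = sum(shares) / len(shares)
            if score >= threshold:
                scores[cand] = score
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical): item ID -> set of properties used on that item.
toy_items = {
    "Q1": {"date of birth", "place of birth", "sex or gender"},
    "Q2": {"date of birth", "place of birth"},
    "Q3": {"date of birth"},
    "Q4": {"taxon name", "taxon rank"},
}
prop_count, pair_count = build_cooccurrence(toy_items)
strict = suggest({"date of birth"}, prop_count, pair_count, threshold=0.5)
loose = suggest({"date of birth"}, prop_count, pair_count, threshold=0.3)
```

With this toy data, the stricter threshold keeps only "place of birth" for an item that has a date of birth, while the looser one also lets "sex or gender" through, which mirrors the effect of raising the threshold to get fewer but more fitting suggestions.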
Re: [Wikidata-l] Wikidata just got 10 times easier to use
On 01/07/14 21:47, Lydia Pintscher wrote:
> On Tue, Jul 1, 2014 at 9:44 PM, Andy Mabbett a...@pigsonthewing.org.uk wrote:
>> On 1 July 2014 20:20, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote:
>>> We have just deployed the entity suggester. This helps you with suggesting properties. So when you now add a new statement to an item it will suggest what should most likely be added to that item. One example: You are on an item about a person but it doesn't have a date of birth yet. Since a lot of other items about persons have a date of birth it will suggest you also add one to this item.
>>
>> This is a great idea, but I've just tried it on Q4810979 (about an historic building) and it prompted me for a date of birth, gender, taxon rank or taxon name. Teething troubles?
>
> We still need to tweak it a bit here and there, yeah. We're working on that right now. Also it will get smarter as more statements are added to items.

I hope tweaking will suffice. At least it seems that there is already enough data to find slightly more related properties ;-). Here is the list of properties that I get for the two classes of Q4810979 (recall that I compute related properties for each class).

(1) historic house museum
http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q2087181
Related properties: English Heritage list number, OS grid reference, owned by, inspired by, coordinate location, visitors per year, Commons category, architect, mother house, manager/director, country, commissioned by, architectural style, MusicBrainz place ID, use, date of foundation or creation, street

(2) Grade I listed building
http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q15700818
Related properties: English Heritage list number, masts, Minor Planet Center observatory code, home port, coordinate location, OS grid reference, mother house, architect, manager/director, Emporis ID, MusicBrainz place ID, country, architectural style, visitors per year, Commons category, Structurae ID (structure), officially opened by, floors above ground, inspired by, religious order, number of platforms, street, owned by, diocese

These are computed fully automatically from the data, with no manual filtering or user input.

But don't get me wrong -- great work! Brilliant to have such a thing integrated into the UI. In any case, my algorithm for computing the related properties is certainly very different from theirs; I am sure it also has its glitches.

Cheers,
Markus
Re: [Wikidata-l] Wikidata just got 10 times easier to use
I noticed that sometimes it is a bit sluggish when showing the suggestions; maybe it is my network, no idea. And it would be nice if it could suggest the three basic properties (instance of, subclass of, part of) when the item is empty and suggest further properties based on that initial value.

Other than that, good job!

--
Etiamsi omnes, ego non
Re: [Wikidata-l] Wikidata just got 10 times easier to use
On 01/07/14 22:14, Markus Krötzsch wrote:
> ...
> (2) Grade I listed building
> http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q15700818
> Related properties: English Heritage list number, masts, Minor Planet Center observatory code, home port, coordinate location, OS grid reference, mother house, architect, manager/director, Emporis ID, MusicBrainz place ID, country, architectural style, visitors per year, Commons category, Structurae ID (structure), officially opened by, floors above ground, inspired by, religious order, number of platforms, street, owned by, diocese
>
> These are computed fully automatically from the data, with no manual filtering or user input.
>
> But don't get me wrong -- great work! Brilliant to have such a thing integrated into the UI. In any case, my algorithm for computing the related properties is certainly very different from theirs; I am sure it also has its glitches.

P.S. One weakness of my algorithm you can already see: it has trouble estimating the relevance of very rare properties, such as "Minor Planet Center observatory code" above. A single wrong annotation may then lead to wrong suggestions. Also, it seems from my list under (2) that some Grade I listed buildings are ships. This seems to be an error that is amplified by the fact that the property "masts" is used only 11 times in the dataset I evaluated (last week's data).

I guess the new property suggester rather errs on the other side, being tricked into suggesting very frequent properties even in places that don't need them.

Markus
Re: [Wikidata-l] Wikidata just got 10 times easier to use
Markus, could your algorithm work together with human direction? Like, if we entered which properties are common for a class, and then a user creates an instance of that class, would the algorithm be able to sort those properties based on how often they appear in the database?

Thanks,
Micru

--
Etiamsi omnes, ego non
Re: [Wikidata-l] Wikidata just got 10 times easier to use
On 01/07/14 22:43, Bene* wrote:
> On 01.07.2014 22:23, Markus Krötzsch wrote:
>> P.S. One weakness of my algorithm you can already see: it has troubles estimating the relevance of very rare properties, such as Minor Planet Center observatory code above. A single wrong annotation may then lead to wrong suggestions. Also, it seems from my list under (2) that some Grade I listed buildings are ships. This seems to be an error that is amplified by the fact that property masts is used only 11 times in the dataset I evaluated (last week's data).
>>
>> I guess the new property suggester rather errs on the other side, being tricked into suggesting very frequent properties even in places that don't need them.
>
> However, it is obviously better if the algorithm performs well for frequently used properties. Isn't it possible to combine those two systems so they improve each other? One could check how often the property is used and then rely on Markus' or the students' algorithm.

My hope is that with my other suggestion (using P31 values as features to correlate with), the property suggester will already be able to outperform my little toy algorithm anyway. One could also combine the two (my algorithm is really simple [1]), but maybe this is not needed.

Cheers
Markus

[1] For each class C and property P, I count:

* #C: the number of items in class C
* #P: the number of items using property P
* #PC: the number of items in class C using the property P
* #items: the total number of items

Then I compute two rates:

* rateCP = #PC / #C (fraction of items in a class with the property)
* rateP = #P / #items (fraction of all items with the property)

I then rank the properties for each class by the ratio rateCP/rateP (intuitively: by what factor does the probability of P increase for items in C?). Moreover, I apply two sigmoid functions [2] to the rates as additional factors, so as to ensure that properties are less relevant if they have very high or very low values for the rates. I don't care about things that almost everything/almost nothing has. Obviously, one can tweak this if one wants to include properties that almost everything has anyway.

[2] https://www.google.com/search?sclient=psy-abq=1+%2F+%281+%2B+exp%286+*+%28-2+*+x+%2B+0.5%29%29%29btnG=
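The ranking in footnote [1] can be sketched as below. This is a minimal illustration, not Markus's actual code. The counts and the two rates follow the footnote, and the sigmoid is the expression encoded in the query of link [2], which decodes to 1 / (1 + exp(6 * (-2 * x + 0.5))). How exactly the two sigmoid factors are applied is not stated in the thread, so the damping used here (one factor penalising a very low rateCP, one penalising a very high rateP) is an assumption, as are the function names and the toy data.

```python
import math
from collections import Counter

def sigmoid(x):
    """Curve from link [2]: 1 / (1 + exp(6 * (-2 * x + 0.5)))."""
    return 1.0 / (1.0 + math.exp(6.0 * (-2.0 * x + 0.5)))

def related_properties(items, cls):
    """Rank the properties seen on items of class `cls`.

    `items` maps an item ID to a pair (set of classes, set of properties).
    """
    n_items = len(items)                                   # #items
    n_c = sum(1 for classes, _ in items.values() if cls in classes)  # #C
    if n_c == 0:
        return []
    count_p = Counter()    # #P: items using property P
    count_pc = Counter()   # #PC: items in class C using property P
    for classes, props in items.values():
        count_p.update(props)
        if cls in classes:
            count_pc.update(props)
    scores = {}
    for prop, n_pc in count_pc.items():
        rate_cp = n_pc / n_c
        rate_p = count_p[prop] / n_items
        # Assumed damping: small when almost nothing in the class has the
        # property, and small when almost every item has it anyway.
        damping = sigmoid(rate_cp) * sigmoid(1.0 - rate_p)
        scores[prop] = (rate_cp / rate_p) * damping
    # Highest score first; ties broken by name to keep the order stable.
    return sorted(scores, key=lambda p: (-scores[p], p))

# Toy data (hypothetical): item ID -> (classes, properties).
toy_items = {
    "Q1": ({"house museum"}, {"architect", "country"}),
    "Q2": ({"house museum"}, {"architect", "country"}),
    "Q3": ({"human"}, {"date of birth", "country"}),
    "Q4": ({"human"}, {"date of birth"}),
}
ranking = related_properties(toy_items, "house museum")
```

In the toy data both properties appear on every house museum, but "country" is also common elsewhere, so its ratio rateCP/rateP is lower and it ranks below "architect". With very small counts a single stray statement dominates the ratio, which is the weakness with rare properties discussed earlier in the thread.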