Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-02 Thread Lukas Benedix
Hi Lydia,

Two questions:

* Is it possible to see if an added statement was suggested to the user?
* Do you have some kind of versioning for the UI, to see if features
you add have a positive effect on the way Wikidata is used?

I'm not familiar with the tag system, but it looks like it can be
exploited to see where users edit Wikidata. I can see that edits made
with the Wikidata Game or other tools from Magnus have a Widar[1.3]
tag; maybe this should be split into one tag for each application.

* Can you send a link to the thesis you mentioned?


Lukas


On Tue 01.07.2014 21:20, Lydia Pintscher wrote:
 Hey folks :)

 We have just deployed the entity suggester. This helps you with
 suggesting properties. So when you now add a new statement to an item
 it will suggest what should most likely be added to that item. One
 example: You are on an item about a person but it doesn't have a date
 of birth yet. Since a lot of other items about persons have a date of
 birth it will suggest you also add one to this item. This will make it
 a lot easier for you to figure out what the hell is missing on an item
 and which property to use.

 Thank you so much to the student team who worked on this as part of
 their bachelor thesis over the last months as well as everyone who
 gave feedback and helped them along the way.

 I'm really happy to see this huge improvement towards making Wikidata
 easier to use. I hope you are too.


 Cheers
 Lydia



___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-02 Thread Lydia Pintscher
On Wed, Jul 2, 2014 at 12:40 PM, Lukas Benedix
bene...@zedat.fu-berlin.de wrote:
 Hi Lydia,

 Two questions:

 * Is it possible to see if an added statement was suggested to the user?

Not currently, and unfortunately I don't think it'd be trivial to do.

 * Do you have some kind of versioning for the UI, to see if features
 you add have a positive effect on the way Wikidata is used?

I have the dates we rolled out certain features and I can look at how
some key metrics change over time. So yes I can see if this has the
effect we all intend it to have.

 I'm not familiar with the tag system, but it looks like it can be
 exploited to see where users edit Wikidata. I can see that edits made
 with the Wikidata Game or other tools from Magnus have a Widar[1.3]
 tag; maybe this should be split into one tag for each application.

I think it is mostly a matter of convenience so you don't have to
reauthenticate each of those individual applications. Technically it
could be done. But this is something for Magnus to decide if he wants
it.

 * Can you send a link to the thesis you mentioned?

I don't think it is published yet. Once the students have published it
they can send an email here.


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.



Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-02 Thread P. Blissenbach
Markus Krötzsch mar...@semantic-mediawiki.org writes:

 I guess the new property 
 suggester rather errs on the other side, being tricked into suggesting 
 very frequent properties even in places that don't need them.

I found some, most notably date of death for living people, which is
likely inevitable even if you add a filter of the type
"too young to be dead", I presume ;-)

Yet I am wondering more about properties not suggested where they could be,
such as various IDs for people (like VIAF, ORCID, etc.).

Purodha



Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-02 Thread Markus Krötzsch

On 02/07/14 16:29, David Cuenca wrote:

On Tue, Jul 1, 2014 at 11:07 PM, Markus Krötzsch
mar...@semantic-mediawiki.org
wrote:

My hope is that with my other suggestion (using P31 values as
features to correlate with), the property suggester will already be
able to outperform my little toy algorithm anyway. One could also
combine the two (my algorithm is really simple [1]), but maybe this
is not needed.


Interesting. That could also help to identify values with a high
deviation, and perhaps even do a better job than some template constraints.
I was trying to check more classes, but the server seems to have
trouble: Error: could not load file 'classes/Classes.csv'
http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q2087181


Strange. Works for me. But we had some temporary service problems at WMF 
Labs recently, so maybe there was some aftermath of these.


In any case, I should update the software -- Yaron has further improved 
Miga to lower the initial load times significantly. I'll send another 
email when I have new code/new data there.




Anyhow, many thanks for working on this.


My pleasure. :-)

Markus



Micru




[Wikidata-l] Wikidata just got 10 times easier to use

2014-07-01 Thread Lydia Pintscher
Hey folks :)

We have just deployed the entity suggester. This helps you with
suggesting properties. So when you now add a new statement to an item
it will suggest what should most likely be added to that item. One
example: You are on an item about a person but it doesn't have a date
of birth yet. Since a lot of other items about persons have a date of
birth it will suggest you also add one to this item. This will make it
a lot easier for you to figure out what the hell is missing on an item
and which property to use.

Thank you so much to the student team who worked on this as part of
their bachelor thesis over the last months as well as everyone who
gave feedback and helped them along the way.

I'm really happy to see this huge improvement towards making Wikidata
easier to use. I hope you are too.


Cheers
Lydia



Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-01 Thread Andy Mabbett
On 1 July 2014 20:20, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote:
 We have just deployed the entity suggester. This helps you with
 suggesting properties. So when you now add a new statement to an item
 it will suggest what should most likely be added to that item. One
 example: You are on an item about a person but it doesn't have a date
 of birth yet. Since a lot of other items about persons have a date of
 birth it will suggest you also add one to this item.

This is a great idea, but I've just tried it on Q4810979 (about an
historic building) and it prompted me for a date of birth, gender,
taxon rank or taxon name.

Teething troubles?

-- 
Andy Mabbett
@pigsonthewing
http://pigsonthewing.org.uk



Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-01 Thread Derric Atzrott
 We still need to tweak it a bit here and there, yeah. We're working on
 that right now. Also it will get smarter as more statements are added
 to items.

Even with some somewhat off suggestions this will be a wonderful tool.

Thank you to everyone who worked on making this happen!  This really
is going to make Wikidata so much easier to use.

Is there any documentation on how it chooses which entities to
suggest?

Thank you,
Derric Atzrott




Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-01 Thread Lydia Pintscher
On Tue, Jul 1, 2014 at 9:52 PM, Derric Atzrott
datzr...@alizeepathology.com wrote:
 We still need to tweak it a bit here and there, yeah. We're working on
 that right now. Also it will get smarter as more statements are added
 to items.

 Even with some somewhat off suggestions this will be a wonderful tool.

\o/
We've just changed the threshold a bit so it should give fewer but more
fitting suggestions. We'll play around with that setting a bit more
over the next few days to find the one that's right for us.

 Thank you to everyone who worked on making this happen!  This really
 is going to make Wikidata so much easier to use.

 Is there any documentation on how it chooses which entities to
 suggest?

It basically creates a table of correlations for properties over all
items in Wikidata. So if, say, date of birth and place of birth are used
together a lot, they get a high correlation. When you then have an item
with a date of birth but no place of birth, it will suggest place of
birth because of the high correlation.
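
The correlation-table idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the deployed suggester's code, which this thread does not show: the function names, the scoring rule (average co-occurrence rate with the properties already on the item), the threshold value, and the tiny sample data are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(items):
    """Count property usage and pairwise co-usage over all items.

    `items` is an iterable of sets of property names (hypothetical input format).
    """
    prop_count = Counter()
    pair_count = Counter()
    for props in items:
        prop_count.update(props)
        # Ordered pairs (a, b) with a != b that appear on the same item.
        pair_count.update(permutations(props, 2))
    return prop_count, pair_count


def suggest(item_props, prop_count, pair_count, threshold=0.5):
    """Suggest missing properties that often co-occur with the present ones.

    Assumed scoring rule: for each candidate, average the fraction of items
    using an existing property that also use the candidate.
    """
    scores = {}
    for candidate in prop_count:
        if candidate in item_props:
            continue
        rates = [
            pair_count[(p, candidate)] / prop_count[p]
            for p in item_props
            if prop_count[p]
        ]
        if rates:
            score = sum(rates) / len(rates)
            if score >= threshold:
                scores[candidate] = score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Made-up sample data: three person-like items and one taxon-like item.
items = [
    {"date of birth", "place of birth", "sex or gender"},
    {"date of birth", "place of birth"},
    {"date of birth", "sex or gender"},
    {"taxon name", "taxon rank"},
]
prop_count, pair_count = build_cooccurrence(items)
person_suggestions = suggest({"date of birth"}, prop_count, pair_count)
```

In this toy data an item that only has a date of birth gets place of birth and sex or gender suggested, but no taxon properties. Raising `threshold` yields fewer suggestions, which matches the kind of tuning mentioned above.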


Cheers
Lydia



Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-01 Thread Markus Krötzsch

On 01/07/14 21:47, Lydia Pintscher wrote:

On Tue, Jul 1, 2014 at 9:44 PM, Andy Mabbett a...@pigsonthewing.org.uk wrote:

On 1 July 2014 20:20, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote:

We have just deployed the entity suggester. This helps you with
suggesting properties. So when you now add a new statement to an item
it will suggest what should most likely be added to that item. One
example: You are on an item about a person but it doesn't have a date
of birth yet. Since a lot of other items about persons have a date of
birth it will suggest you also add one to this item.


This is a great idea, but I've just tried it on Q4810979 (about an
historic building) and it prompted me for a date of birth, gender,
taxon rank or taxon name.

Teething troubles?


We still need to tweak it a bit here and there, yeah. We're working on
that right now. Also it will get smarter as more statements are added
to items.


I hope tweaking will suffice. At least it seems that there is already 
enough data to find slightly more related properties ;-). Here 
is the list of properties that I get for the two classes of Q4810979 
(recall that I compute related properties for each class).


(1) historic house museum
http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q2087181

Related properties: English Heritage list number, OS grid reference, 
owned by, inspired by, coordinate location, visitors per year, Commons 
category, architect, mother house, manager/director, country, 
commissioned by, architectural style, MusicBrainz place ID, use, date of 
foundation or creation, street


(2) Grade I listed building
http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q15700818

Related properties: English Heritage list number, masts, Minor Planet 
Center observatory code, home port, coordinate location, OS grid 
reference, mother house, architect, manager/director, Emporis ID, 
MusicBrainz place ID, country, architectural style, visitors per year, 
Commons category, Structurae ID (structure), officially opened by, 
floors above ground, inspired by, religious order, number of platforms, 
street, owned by, diocese


These are computed fully automatically from the data, with no manual 
filtering or user input. But don't get me wrong -- great work! Brilliant 
to have such a thing integrated into the UI. In any case, my algorithm 
for computing the related properties is certainly very different from 
theirs; I am sure it also has its glitches.


Cheers,

Markus





Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-01 Thread David Cuenca
I noticed that sometimes it is a bit sluggish when showing the suggestions;
maybe it is my network, no idea. And it would be nice if it could suggest
the three basic properties (instance of, subclass of, part of) when the
item is empty and suggest further properties based on that initial value.

Other than that, good job!



On Tue, Jul 1, 2014 at 10:00 PM, Lydia Pintscher 
lydia.pintsc...@wikimedia.de wrote:

 On Tue, Jul 1, 2014 at 9:52 PM, Derric Atzrott
 datzr...@alizeepathology.com wrote:
  We still need to tweak it a bit here and there, yeah. We're working on
  that right now. Also it will get smarter as more statements are added
  to items.
 
  Even with some somewhat off suggestions this will be a wonderful tool.

 \o/
 We've just changed the threshold a bit so it should give less but more
 fitting suggestions. We'll play around with that setting a bit more
 over the next days to find the one that's right for us.

  Thank you to everyone who worked on making this happen!  This really
  is going to make Wikidata so much easier to use.
 
  Is there any documentation on how it chooses which entities to
  suggest?

 It basically creates a table of correlations for properties over all
 items in Wikidata. So if say date of birth and place of birth are used
 together a lot they get a high correlation. When you then have an item
 with no place of birth but a date of birth it will suggest that
 because of the high correlation.


 Cheers
 Lydia





-- 
Etiamsi omnes, ego non


Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-01 Thread Markus Krötzsch

On 01/07/14 22:14, Markus Krötzsch wrote:
...


(2) Grade I listed building
http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q15700818


Related properties: English Heritage list number, masts, Minor Planet
Center observatory code, home port, coordinate location, OS grid
reference, mother house, architect, manager/director, Emporis ID,
MusicBrainz place ID, country, architectural style, visitors per year,
Commons category, Structurae ID (structure), officially opened by,
floors above ground, inspired by, religious order, number of platforms,
street, owned by, diocese

These are computed fully automatically from the data, with no manual
filtering or user input. But don't get me wrong -- great work! Brilliant
to have such a thing integrated into the UI. In any case, my algorithm
for computing the related properties is certainly very different from
theirs; I am sure it also has its glitches.


P.S. One weakness of my algorithm you can already see: it has trouble 
estimating the relevance of very rare properties, such as Minor Planet 
Center observatory code above. A single wrong annotation may then lead 
to wrong suggestions. Also, it seems from my list under (2) that some 
Grade I listed buildings are ships. This seems to be an error that is 
amplified by the fact that the property masts is used only 11 times in 
the dataset I evaluated (last week's data). I guess the new property 
suggester errs rather on the other side, being tricked into suggesting 
very frequent properties even in places that don't need them.


-- Markus




Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-01 Thread David Cuenca
Markus, could your algorithm work together with human direction? For
example, if we entered which properties are common for a class, and then a
user created an instance of that class, would the algorithm be able to sort
those properties based on how often they appear in the database?

Thanks,
Micru


On Tue, Jul 1, 2014 at 10:23 PM, Markus Krötzsch 
mar...@semantic-mediawiki.org wrote:

 On 01/07/14 22:14, Markus Krötzsch wrote:
 ...


 (2) Grade I listed building
 http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q15700818


 Related properties: English Heritage list number, masts, Minor Planet
 Center observatory code, home port, coordinate location, OS grid
 reference, mother house, architect, manager/director, Emporis ID,
 MusicBrainz place ID, country, architectural style, visitors per year,
 Commons category, Structurae ID (structure), officially opened by,
 floors above ground, inspired by, religious order, number of platforms,
 street, owned by, diocese

 These are computed fully automatically from the data, with no manual
 filtering or user input. But don't get me wrong -- great work! Brilliant
 to have such a thing integrated into the UI. In any case, my algorithm
 for computing the related properties is certainly very different from
 theirs; I am sure it also has its glitches.


 P.S. One weakness of my algorithm you can already see: it has troubles
 estimating the relevance of very rare properties, such as Minor Planet
 Center observatory code above. A single wrong annotation may then lead to
 wrong suggestions. Also, it seems from my list under (2) that some Grade I
 listed buildings are ships. This seems to be an error that is amplified by
 the fact that property masts is used only 11 times in the dataset I
 evaluated (last week's data). I guess the new property suggester rather
 errs on the other side, being tricked into suggesting very frequent
 properties even in places that don't need them.

 -- Markus







-- 
Etiamsi omnes, ego non


Re: [Wikidata-l] Wikidata just got 10 times easier to use

2014-07-01 Thread Markus Krötzsch

On 01/07/14 22:43, Bene* wrote:

On 01.07.2014 22:23, Markus Krötzsch wrote:

P.S. One weakness of my algorithm you can already see: it has troubles
estimating the relevance of very rare properties, such as Minor
Planet Center observatory code above. A single wrong annotation may
then lead to wrong suggestions. Also, it seems from my list under (2)
that some Grade I listed buildings are ships. This seems to be an
error that is amplified by the fact that property masts is used only
11 times in the dataset I evaluated (last week's data). I guess the
new property suggester rather errs on the other side, being tricked
into suggesting very frequent properties even in places that don't
need them.

However, it is obviously better if the algorithm performs well for
frequently used properties. Isn't it possible to combine those two
systems so they improve each other? One could check how often the
property is used and then rely on Markus' or the students' algorithm.


My hope is that with my other suggestion (using P31 values as features 
to correlate with), the property suggester will already be able to 
outperform my little toy algorithm anyway. One could also combine the 
two (my algorithm is really simple [1]), but maybe this is not needed.


Cheers

Markus

[1] For each class C and property P, I count:

* #C: the number of items in class C
* #P: the number of items using property P
* #PC: the number of items in class C using the property P
* #items: the total number of items

Then I compute two rates:

* rateCP = #PC / #C (fraction of items in a class with the property)
* rateP = #P / #items (fraction of all items with the property)

I then rank the properties for each class by the ratio rateCP/rateP 
(intuitively: by what factor does the probability of P increase for items 
in C?). Moreover, I apply two sigmoid functions [2] to the rates as 
additional factors, so as to ensure that properties are less relevant 
if they have very high or very low values for the rates. I don't care 
about things that almost everything/almost nothing has. Obviously, one 
can tweak this if one wants to include properties that almost 
everything has anyway.


[2] 
https://www.google.com/search?sclient=psy-abq=1+%2F+%281+%2B+exp%286+*+%28-2+*+x+%2B+0.5%29%29%29btnG=
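
The ranking described in [1] can be sketched as below. The counts and the ratio rateCP/rateP follow the description above, and the sigmoid is the expression encoded in the query string of [2], 1 / (1 + exp(6 * (-2 * x + 0.5))). How exactly the two sigmoid factors are applied is not spelled out in this thread, so the combination used here (one factor damping properties that are rare within the class, one damping properties that nearly every item has) is an assumption, as are the function names and the made-up counts.

```python
import math


def sigmoid(x):
    """Sigmoid from footnote [2]: 1 / (1 + exp(6 * (-2 * x + 0.5)))."""
    return 1.0 / (1.0 + math.exp(6.0 * (-2.0 * x + 0.5)))


def rank_properties(n_c, n_items, n_p, n_pc):
    """Rank properties for one class C.

    n_c     -- #C, number of items in class C
    n_items -- #items, total number of items
    n_p     -- dict: property -> #P, items using the property
    n_pc    -- dict: property -> #PC, items in C using the property
    """
    scores = {}
    for prop, pc in n_pc.items():
        if n_p.get(prop, 0) == 0:
            continue  # property never used; nothing to rank
        rate_cp = pc / n_c            # fraction of items in C with the property
        rate_p = n_p[prop] / n_items  # fraction of all items with the property
        lift = rate_cp / rate_p
        # Assumed use of the two sigmoid factors (not confirmed by the thread):
        # damp properties that are rare in C and properties almost everything has.
        scores[prop] = lift * sigmoid(rate_cp) * sigmoid(1.0 - rate_p)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical counts for a class with 100 items in a dataset of 1000 items.
n_p = {"architect": 50, "country": 950, "masts": 11}
n_pc = {"architect": 40, "country": 98, "masts": 1}
ranking = rank_properties(n_c=100, n_items=1000, n_p=n_p, n_pc=n_pc)
```

With these invented numbers, architect ranks first because it is much more common inside the class than overall, while country (which almost every item has) and masts (a single occurrence in the class) are pushed down by the sigmoid factors.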

