Re: [Wikidata] Preferred rank -- choices for infoboxes, versus SPARQL

2015-11-27 Thread Gerard Meijssen
Hoi,
A big city is what? A city with more than a given number of inhabitants? If
so it is redundant because it can be inferred.
Thanks,
 GerardM

On 28 November 2015 at 06:12, Peter F. Patel-Schneider <
pfpschnei...@gmail.com> wrote:

> It seems to me that a whitelist is the preferred solution to the problem of
> displaying too many classes that an item belongs to.  Any blacklist solution
> is going to need revision as new classes are added to Wikidata.  Any
> preference data is going to have problems with different languages and
> cultures.  Any solution based on specificity is going to have problems with
> classes like big city Q1549591.  Any solution that solely depends on whether
> the class has a language-specific Wikipedia page also will have problems with
> big city.

Re: [Wikidata] Preferred rank -- choices for infoboxes, versus SPARQL

2015-11-27 Thread Peter F. Patel-Schneider
It seems to me that a whitelist is the preferred solution to the problem of
displaying too many classes that an item belongs to.  Any blacklist solution
is going to need revision as new classes are added to Wikidata.  Any
preference data is going to have problems with different languages and
cultures.  Any solution based on specificity is going to have problems with
classes like big city Q1549591.  Any solution that solely depends on whether
the class has a language-specific Wikipedia page also will have problems with
big city.

It may be that in many cases a combination of a whitelist that is not
language/culture specific combined with only showing classes that have a
language-specific Wikipedia page will work well enough.
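
The combination described above (a language-neutral whitelist, filtered by whether the class has a page on the local wiki, with a fallback) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the shape of the `sitelinks` mapping and the sample data are hypothetical, and Q515 ("city") is used as the default that Markus suggests.

```python
from typing import Iterable, List, Mapping, Set


def classes_to_display(
    instance_of: Iterable[str],
    whitelist: Set[str],
    sitelinks: Mapping[str, Set[str]],
    wiki: str,
    default: str = "Q515",  # "city", used when nothing else qualifies
) -> List[str]:
    """Pick the "instance of" classes an infobox on `wiki` should show.

    A class is shown only if it is on the whitelist AND has a page on the
    given wiki. `sitelinks` maps a class QID to the set of wikis that have
    a page for it (hypothetical structure, for illustration only).
    """
    shown = [
        qid
        for qid in instance_of
        if qid in whitelist and wiki in sitelinks.get(qid, set())
    ]
    # Fall back to the generic class if every candidate was filtered out.
    return shown or [default]


# Hypothetical sample data: "big city" (Q1549591) has no Spanish page here.
sample_classes = ["Q1549591", "Q515", "Q999999"]
sample_whitelist = {"Q515", "Q1549591"}
sample_sitelinks = {"Q515": {"enwiki", "eswiki"}, "Q1549591": {"dewiki"}}

print(classes_to_display(sample_classes, sample_whitelist, sample_sitelinks, "eswiki"))
```

Because the whitelist itself carries no language preference, the per-wiki variation comes only from the sitelink check, which is the property the paragraph above relies on.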



peter



On 11/27/2015 07:41 AM, Markus Krötzsch wrote:
> Hi James,
[...]
> Possible options for solving your problem:
> 
> * Make a whitelist of classes you want to show at all in the template, and
> default to "city" if none of them occurs.
> * Make a blacklist of classes you want to hide.
> * Instead of blacklist or whitelist, show only classes that have a Wikipedia
> page in your language; default to "city" if there are none.
> * Try to generalise overly specific classes (change "big city" to "city"
> etc.). I don't know if there is a good programmatic approach for this, or if
> you would have to make a substitution list or something, which would not be
> very maintainable.
> * Do not use instance-of information like this in the infobox. It might sound
> radical, but I am not sure if "instance of" is really working very well for
> labelling things in the way you expect. Instance-of can refer to many
> orthogonal properties of an object, in essentially random order, while a label
> should probably focus on certain aspects only.
> 
> For obvious reasons, ranks of statements cannot be used to record
> language-specific preferences.
> 
> Cheers,
> 
> Markus
> 
> On 27.11.2015 15:58, James Heald wrote:
>> Some items have quite a lot of "instance of" statements, connecting them
>> to quite a few different classes.
>>
>> For example, Frankfurt is currently an instance of seven different classes,
>>  https://www.wikidata.org/wiki/Q1794
>>
>> and Glasgow is currently an instance of five different classes:
>>  https://www.wikidata.org/wiki/Q4093
>>
>> This can produce quite a pile-up of descriptions in the
>> description/subtitle section of an infobox -- for example, as on the
>> Spanish page for Frankfurt at
>>  https://es.wikipedia.org/wiki/Fr%C3%A1ncfort_del_Meno
>> in the section between the infobox title and the picture.
>>
>>
>> Question:
>>
>> Is it an appropriate use of ranking, to choose a few of the values to
>> display, and set those values to be "preferred rank" ?
>>
>> It would be useful to have wider input as to whether it is a good thing,
>> and as to whether this is done widely.
>>
>> Discussions are open at
>> https://www.wikidata.org/wiki/Wikidata:Project_chat#Preferred_and_normal_rank
>>
>> and
>> https://www.wikidata.org/wiki/Wikidata:Bistro#Rang_pr.C3.A9f.C3.A9r.C3.A9
>>
>> -- but these have so far been inconclusive, and have got slightly taken
>> over by questions such as
>>
>> * how well terms really do map from one language to another --
>> near-equivalences that may be near enough for sitelinks may be jarring
>> or insufficient when presented boldly up-front in an infobox.
>>
>> (For example, the French translation "ville" is rather unspecific, and
>> perhaps inadequate in what it conveys, compared to "city" in English or
>> "ciudad" in Spanish; "town" in English (which might have over 100,000
>> inhabitants) doesn't necessarily match "bourg" in French or "Kleinstadt"
>> in German).
>>
>> * whether different-language wikis may seek different degrees of
>> generalisation or specificity in such sub-title areas, depending on how
>> "close" the subject is to that wiki.
>>
>> (For readers in some languages, some fine distinctions may be highly
>> relevant and familiar, whereas for other language groups that level of
>> detail may be undesirably obscure).
>>
>>
>> There is also the question of the effect of promoting some values to
>> "preferred rank" for the visibility of other values in SPARQL -- in
>> particular when queries are written assuming they can get away with
>> using just the simple "truthy" wdt:... form of properties.
>>
>> However, making eg the value "city" preferred for Glasgow means that it
>> will no longer be returned in searches for its other values, if these
>> have been written using "wdt:..." -- so it will now be missed in a
>> simple-level query for "council areas", the current top-level
>> administrative subdivisions of Scotland, or for historically-based
>> "registration counties" -- and this problem will become more pronounced
>> if the practice becomes more widespread of making some values
>> "preferred" (and so other values invisible, at least for queries using
>> wdt:...).
>>
>>  From a SPARQL point of view, what would actually b
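
The effect James describes, where promoting one value to preferred rank hides the others from "truthy" wdt:... queries, can be sketched in a few lines. This is a simplified model, not the query service's actual code: it assumes the usual best-rank rule (preferred statements if any exist, otherwise normal ones, never deprecated ones), and the Glasgow values are the labels mentioned in the message rather than data fetched from Wikidata.

```python
from typing import List, Tuple

Statement = Tuple[str, str]  # (value label, rank)


def truthy_values(statements: List[Statement]) -> List[str]:
    """Return the values a simple wdt:-style lookup would see.

    Assumed rule: if any statement is "preferred", only preferred values
    are visible; otherwise all "normal" values are. "deprecated" values
    are never visible.
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    return [value for value, rank in statements if rank == "normal"]


# Illustrative "instance of" statements for Glasgow, before and after
# "city" is promoted to preferred rank.
glasgow_before = [
    ("city", "normal"),
    ("council area", "normal"),
    ("registration county", "normal"),
]
glasgow_after = [
    ("city", "preferred"),
    ("council area", "normal"),
    ("registration county", "normal"),
]

print(truthy_values(glasgow_before))
print(truthy_values(glasgow_after))
```

In this model a simple query for council areas would stop finding Glasgow once "city" is preferred; queries that go through the full statement nodes (p:/ps:) rather than wdt: are not affected by the promotion.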

Re: [Wikidata] Odd results from wdqs

2015-11-27 Thread Magnus Manske
The "absolute" limit was the char[] buffer size, which I had set to ~1MB back
in the day. Now that the code uses the STL string type, it supports any string
that fits in memory.
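
For anyone who wants to check whether their own import code might hit a similar fixed-size limit, a rough way to find the oversized entities in a dump is sketched below. It is not Magnus's code: it assumes the JSON dump layout of one entity per line inside a single JSON array (each line ending in a comma), and the function name and the sample data are made up for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple


def oversized_entities(dump: TextIO, limit_bytes: int) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, size_in_bytes) for entity lines above the limit.

    Assumes one JSON entity per line, wrapped in "[" and "]" lines, with a
    trailing comma after each entity except the last.
    """
    for line_number, line in enumerate(dump, start=1):
        record = line.strip().rstrip(",")
        if record in ("[", "]", ""):
            continue  # array brackets and blank lines are not entities
        size = len(record.encode("utf-8"))
        if size > limit_bytes:
            yield line_number, size


# Tiny in-memory stand-in for a dump file; a real run would pass an open
# file object and a limit such as 1_000_000 (the ~1MB mentioned above).
sample_dump = io.StringIO(
    '[\n{"id":"Q1"},\n{"id":"Q2","x":"' + "a" * 50 + '"}\n]\n'
)
for line_number, size in oversized_entities(sample_dump, limit_bytes=30):
    print(line_number, size)
```

Reading line by line keeps memory use bounded by the largest single entity rather than by the whole dump.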

On Fri, Nov 27, 2015 at 3:24 PM Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 27.11.2015 15:22, Magnus Manske wrote:
> > It was the "absolute terms" problem here  ;-)
>
> But 3MB uncompressed string data does not seem to be so big in absolute
> terms, or are you referring to something else (I got this number from
> the long pages special)? Parsing a 3MB string may need some extra
> memory, but the data you get in the end should not be much bigger than
> the original string, or should it?
>
> Markus
>
> >
> > On Fri, Nov 27, 2015 at 2:12 PM Markus Krötzsch
> > <mar...@semantic-mediawiki.org> wrote:
> >
> > On 25.11.2015 16:05, Lydia Pintscher wrote:
> >  > On Mon, Nov 23, 2015 at 10:54 PM, Magnus Manske wrote:
> >  >> Well, my import code chokes on the last two JSON dumps (16th and
> >  >> 23rd). As it fails about half an hour or so in, debugging is ...
> >  >> inefficient. Unless there is something that has changed with the
> >  >> dump itself (new data type or so), and someone tells me, it will
> >  >> be quite some time (days, weeks) until I figure it out.
> >  >
> >  > To update everyone here as well: Magnus has been able to pinpoint
> >  > the problem and fix the tools. They're catching up again. The issue
> >  > was one of the extremely big pages that have recently been created
> >  > for research papers: https://www.wikidata.org/wiki/Special:LongPages
> >
> > Thanks for explaining. This explains why we did not see any problems
> > or unusual behaviour in Wikidata Toolkit. I guess Java simply does not
> > care about how long pages are, as long as they are not very big in
> > absolute terms.
> >
> > Markus
> >
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org 
> > https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Mix'n'match: how to preserve manually audited items for posterity?

2015-11-27 Thread Dario Taraborelli
oh I see, what a mess those Grisulfs, the family relationships are totally 
messed up, off to clean them up. 

> On Nov 27, 2015, at 10:38 AM, Gerard Meijssen wrote:
> 
> Hoi,
> I do not know how to as there are two candidates. I do not have your book 
> that helps pick the right one.  I have added some statements so that 
> disambiguation is even easier. Reasonator is a great tool :)
> Thanks,
>  GerardM
> 
> On 27 November 2015 at 19:35, Dario Taraborelli wrote:
> err…point me to the correct item or fix it then? WP:BOLD
> 
>> On Nov 27, 2015, at 10:33 AM, Gerard Meijssen wrote:
>> 
>> Hoi,
>> It is highly likely that your Lombard duke already existed. So I think you 
>> got it wrong.
>> Thanks,
>>  GerardM
>> 
>> On 27 November 2015 at 19:31, Dario Taraborelli wrote:
>> Gerard – I think you’re missing my point. I’m not suggesting this as a 
>> display feature (which would be welcome and can always be generated by any 
>> tool querying Wikidata labels) but as a contribution stored to avoid future 
>> errors.
>> 
>>> On Nov 27, 2015, at 10:29 AM, Gerard Meijssen wrote:
>>> 
>>> Hoi,
>>> Why not use Reasonator?
>>> https://tools.wmflabs.org/reasonator/?find=Grasulfo
>>> 
>>> Thanks,
>>>  GerardM
>>> 
>>> On 27 November 2015 at 19:26, Dario Taraborelli wrote:
>>> Magnus, this is fantastic and works as expected, thanks a lot.
>>> 
>>> One last note regarding the use of different from (P1889). While I agree
>>> with you that it would be overkill to generate all these relations for
>>> common homonyms, for new items created by Mix’n’match with the above
>>> tweak, where a single other notable individual was previously missing
>>> from Wikidata (and when no matching label can be found), it would be
>>> tremendously useful to automatically add a two-way relation (see for
>>> example Grasulfo (Q3775839) <-> different from (P1889) <-> Grasulfo
>>> (Q21571734)). Having this property added would save me 2 extra edits and
>>> permanently store disambiguation signal for future reference.
>>> 
>>> Thoughts?
>>> 
>>>> On Nov 24, 2015, at 9:54 AM, Luca Martinelli wrote:
>>>> 
>>>> <3
>>>> 
>>>> L.
>>>> 
>>>> On 23 Nov 2015 at 21:05, "Magnus Manske" wrote:
>>>> Done.
>>>> 
>>>> On Mon, Nov 23, 2015 at 12:25 PM Asaf Bartov wrote:
>>>> On Sat, Nov 21, 2015 at 10:45 AM, Dario Taraborelli
>>>> <dtarabore...@wikimedia.org> wrote:
>>>> On Nov 21, 2015, at 10:31, Magnus Manske wrote:
>>>>> A solution could be to change the "not on Wikidata" button (or link) to a 
>>>>> "create new item" button. The new item would have a label, a description 
>>>>> (maybe), a statement with the catalog ID (if there is an associated 
>>>>> Wikidata property!), and "instance of:human" if the entry is internally 
>>>>> marked as "person", but nothing else.
>>>>> 
>>>>> Would that be welcomed by "mix'n'matchers", and Wikidata people? I think 
>>>>> it would make sense, for catalogs with a Wikidata property at least.
>>>> 
>>>> I would strongly support this, with the restrictions you suggest.
>>>> 
>>>> +1.  This would be good.
>>>> 
>>>> A.
>>>> 
>>>> -- 
>>>> Asaf Bartov
>>>> Wikimedia Foundation
>>>> 
>>>> Imagine a world in which every single human being can freely share in the 
>>>> sum of all knowledge. Help us make it a reality!
>>>> https://donate.wikimedia.org
>>> 
>>> Dario Taraborelli  Head of Research, Wikimedia Foundation
>>> wikimediafoundation.org • nitens.org

Re: [Wikidata] Mix'n'match: how to preserve manually audited items for posterity?

2015-11-27 Thread Gerard Meijssen
Hoi,
I do not know how to as there are two candidates. I do not have your book
that helps pick the right one.  I have added some statements so that
disambiguation is even easier. Reasonator is a great tool :)
Thanks,
 GerardM


Re: [Wikidata] Mix'n'match: how to preserve manually audited items for posterity?

2015-11-27 Thread Dario Taraborelli
err…point me to the correct item or fix it then? WP:BOLD
 

Re: [Wikidata] Mix'n'match: how to preserve manually audited items for posterity?

2015-11-27 Thread Gerard Meijssen
Hoi,
It is highly likely that your Lombard duke already existed. So I think you
got it wrong.
Thanks,
 GerardM



Re: [Wikidata] Mix'n'match: how to preserve manually audited items for posterity?

2015-11-27 Thread Dario Taraborelli
Gerard – I think you’re missing my point. I’m not suggesting this as a display 
feature (which would be welcome and can always be generated by any tool 
querying Wikidata labels) but as a contribution stored to avoid future errors.
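
The stored two-way relation proposed for the two Grasulfo items can be pictured as a pair of mirrored statements. The sketch below only builds the (subject, property, value) triples that a tool would need to write; it does not call any Wikidata API, and the function name is hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject QID, property ID, value QID)


def different_from_pair(qid_a: str, qid_b: str, prop: str = "P1889") -> List[Triple]:
    """Build both directions of a "different from" (P1889) relation."""
    if qid_a == qid_b:
        raise ValueError("an item cannot be marked as different from itself")
    return [(qid_a, prop, qid_b), (qid_b, prop, qid_a)]


# The example pair from the thread: the existing item and the newly created one.
for triple in different_from_pair("Q3775839", "Q21571734"):
    print(triple)
```

Emitting both directions in one step is what would save the two extra manual edits mentioned in the proposal.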




Dario Taraborelli  Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter


Re: [Wikidata] Mix'n'match: how to preserve manually audited items for posterity?

2015-11-27 Thread Gerard Meijssen
Hoi,
Why not use Reasonator?
https://tools.wmflabs.org/reasonator/?find=Grasulfo
Thanks,
 GerardM

On 27 November 2015 at 19:26, Dario Taraborelli 
wrote:

> Magnus, this is fantastic and works as expected, thanks a lot.
>
> One last note regarding the use of *different from* (P1889). While I agree
> with you that it would be overkill to generate all these relations for common
> homonyms, for new items created by Mix’n’match with the above tweak, where
> a single other notable individual was previously missing from Wikidata (and
> when no matching label can be found), it would be tremendously useful to
> automatically add a two-way relation (see for example *Grasulfo* (Q3775839)
> <-> *different from* (P1889) <-> *Grasulfo* (Q21571734)). Having this
> property added would save me 2 extra edits and permanently store a
> disambiguation signal for future reference.
>
> Thoughts?
>
> On Nov 24, 2015, at 9:54 AM, Luca Martinelli 
> wrote:
>
> <3
>
> L.
> On 23 Nov 2015 21:05, "Magnus Manske" wrote:
>
>> Done.
>>
>> On Mon, Nov 23, 2015 at 12:25 PM Asaf Bartov 
>> wrote:
>>
>>> On Sat, Nov 21, 2015 at 10:45 AM, Dario Taraborelli <
>>> dtarabore...@wikimedia.org> wrote:
>>>
>>>> On Nov 21, 2015, at 10:31, Magnus Manske wrote:
>>>>
>>>>> A solution could be to change the "not on Wikidata" button (or link) to
>>>>> a "create new item" button. The new item would have a label, a description
>>>>> (maybe), a statement with the catalog ID (if there is an associated
>>>>> Wikidata property!), and "instance of:human" if the entry is internally
>>>>> marked as "person", but nothing else.
>>>>>
>>>>> Would that be welcomed by "mix'n'matchers", and Wikidata people? I
>>>>> think it would make sense, for catalogs with a Wikidata property at least.
>>>>
>>>> I would strongly support this, with the restrictions you suggest.

>>>
>>> +1.  This would be good.
>>>
>>> A.
>>>
>>> --
>>> Asaf Bartov
>>> Wikimedia Foundation 
>>>
>>> Imagine a world in which every single human being can freely share
>>> in the sum of all knowledge. Help us make it a reality!
>>> https://donate.wikimedia.org
>
>
>
>
> *Dario Taraborelli  *Head of Research, Wikimedia Foundation
> wikimediafoundation.org • nitens.org • @readermeter
> 
>
>


Re: [Wikidata] Mix'n'match: how to preserve manually audited items for posterity?

2015-11-27 Thread Dario Taraborelli
Magnus, this is fantastic and works as expected, thanks a lot.

One last note regarding the use of different from (P1889). While I agree with 
you that it would be overkill to generate all these relations for common 
homonyms, for new items created by Mix’n’match with the above tweak, where a 
single other notable individual was previously missing from Wikidata (and when 
no matching label can be found), it would be tremendously useful to 
automatically add a two-way relation (see for example Grasulfo (Q3775839) 
<-> different from (P1889) <-> Grasulfo (Q21571734)). Having this property 
added would save me 2 extra edits and permanently store a disambiguation 
signal for future reference.

Thoughts?

> On Nov 24, 2015, at 9:54 AM, Luca Martinelli  wrote:
> 
> <3
> 
> L.
> 
> On 23 Nov 2015 21:05, "Magnus Manske" wrote:
> Done.
> 
> On Mon, Nov 23, 2015 at 12:25 PM Asaf Bartov wrote:
> On Sat, Nov 21, 2015 at 10:45 AM, Dario Taraborelli 
> <dtarabore...@wikimedia.org> wrote:
> On Nov 21, 2015, at 10:31, Magnus Manske wrote:
>> A solution could be to change the "not on Wikidata" button (or link) to a 
>> "create new item" button. The new item would have a label, a description 
>> (maybe), a statement with the catalog ID (if there is an associated Wikidata 
>> property!), and "instance of:human" if the entry is internally marked as 
>> "person", but nothing else.
> 
>> 
>> Would that be welcomed by "mix'n'matchers", and Wikidata people? I think it 
>> would make sense, for catalogs with a Wikidata property at least.
> 
> I would strongly support this, with the restrictions you suggest. 
> 
> +1.  This would be good.
> 
> A.
> 
> -- 
> Asaf Bartov
> Wikimedia Foundation 
> 
> Imagine a world in which every single human being can freely share in the sum 
> of all knowledge. Help us make it a reality!
> https://donate.wikimedia.org 



Dario Taraborelli, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter


Re: [Wikidata] Preferred rank -- choices for infoboxes, versus SPARQL

2015-11-27 Thread Markus Krötzsch

On 27.11.2015 17:05, Tobias Schönberg wrote:

> @Markus, James:
> In my opinion it is better to make the query ask for the most recent
> population number. People just need to start using time-qualifiers for
> things like census-report numbers.


Unfortunately, this is not sufficient for census number selections, 
since the most recent number might be less accurate than another 
somewhat-recent number, which is therefore considered "preferred". I 
have no idea how to come up with a reasonable SPARQL query to evaluate 
this situation.


Similarly, ignoring the instance-of statements that are historic if 
other statements may have no times associated whatsoever, and picking 
the most recent instance-of statement if all of them have times 
associated would require an amount of computation that you really don't 
want to encode in SPARQL. Feel free to prove me wrong by posting the 
SPARQL query here, but I think it won't be feasible. SPARQL is not a 
programming language to implement arbitrarily complex selection rules 
in. The current rank-based system, in spite of its necessary 
limitations, is in fact highly effective for solving a huge number of 
such issues in a pragmatic way. You may need to use the exact data for 
many applications (we completely agree there), but ranks will always be 
of great use to keep the rest of your query as simple as possible.
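
The difference between the two selection rules can be sketched outside SPARQL. This is a minimal illustration only: the Statement class, the function names and the population figures are all hypothetical, and the rank rule shown (preferred statements if any exist, otherwise normal ones) is the usual "best rank" reading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statement:
    value: int
    rank: str = "normal"                 # "preferred", "normal" or "deprecated"
    point_in_time: Optional[int] = None  # year taken from a time qualifier


def most_recent(statements):
    """Pick the dated, non-deprecated statement with the latest year."""
    dated = [s for s in statements
             if s.point_in_time is not None and s.rank != "deprecated"]
    return max(dated, key=lambda s: s.point_in_time) if dated else None


def best_rank(statements):
    """Return the preferred statements if any exist, otherwise the normal ones."""
    preferred = [s for s in statements if s.rank == "preferred"]
    return preferred or [s for s in statements if s.rank == "normal"]


# Hypothetical figures: the 2011 census is marked preferred because it is
# considered more accurate than the later estimate.
population = [
    Statement(700_000, "normal", 2005),
    Statement(717_000, "preferred", 2011),
    Statement(725_000, "normal", 2014),
]

print(most_recent(population).value)             # 725000
print([s.value for s in best_rank(population)])  # [717000]
```

The two rules disagree exactly in the case described above: the newest number is not the one editors judged most reliable.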




> And the other issue is one of standardized vocabulary and that is always
> a sourcing problem in my opinion. A query could say "get the
> instance-of-statement" that has a supporting source from the Spanish
> Geographic Society. Then the infobox would only include standardized
> vocabulary by that organization. But I acknowledge that large parts of
> the world are not covered by standardized vocabulary organizations.


Yes, it seems we need to let the use of references evolve a little more 
until such things will be feasible and lead to good coverage.




> If that doesn't solve it we could at least think about language specific
> rank-overrides.


Storing ranks per language will not be feasible or desirable. I think 
the solutions I gave can go a long way. In the end, any 
language-specific way to define the classes you want to display/hide 
will do. For example, a SPARQL query for all super classes that have an 
article in a given Wikipedia is rather easy (querying for the most 
specific such superclasses is another matter of course ...).
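
As a rough sketch of the "rather easy" query, the helper below builds the SPARQL text for the classes of an item (direct or inherited) that have an article in a given Wikipedia. The function name is hypothetical; the query assumes the standard query-service prefixes (wd:, wdt:, schema:), with P31 for "instance of" and P279 for "subclass of". It only builds the string and does not contact any endpoint.

```python
def superclasses_with_article_query(item_id: str, wiki_url: str) -> str:
    """Build a query for all classes of item_id that have an article on wiki_url."""
    return f"""
SELECT DISTINCT ?class ?article WHERE {{
  wd:{item_id} wdt:P31/wdt:P279* ?class .
  ?article schema:about ?class ;
           schema:isPartOf <{wiki_url}> .
}}
""".strip()


# Frankfurt (Q1794) and the Spanish Wikipedia, as in the example under discussion.
query = superclasses_with_article_query("Q1794", "https://es.wikipedia.org/")
print(query)
```

Restricting the result to the most specific such classes would need an extra filter that excludes any class with a subclass in the same result set, which is the harder part mentioned above.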


Markus



2015-11-27 16:41 GMT+01:00 Markus Krötzsch <mar...@semantic-mediawiki.org>:

Hi James,

I would immediately agree to the following measures to alleviate
your problem:

(1) If some instance-of statements are historic (i.e., no longer
valid), then one should make the current ones "preferred" and leave
the historic ones "normal", just like for, e.g., population numbers.
This would get rid of the rather inappropriate "Free imperial city"
label for Frankfurt.

(2) If some classes are redundant, they could be removed (e.g., if
we already have "Big city" we do not need "city"). However,
community might decide to prefer the direct use of a main class
(such as "Human"), even if redundant.

The other issues you mention are more tricky. Especially issues of
translation/cultural specificity. The most specific classes are not
always the ones that all languages would want to see, e.g., if the
concept of the class is not known in that language.

Possible options for solving your problem:

* Make a whitelist of classes you want to show at all in the
template, and default to "city" if none of them occurs.
* Make a blacklist of classes you want to hide.
* Instead of blacklist or whitelist, show only classes that have a
Wikipedia page in your language; default to "city" if there are none.
* Try to generalise overly specific classes (change "big city" to
"city" etc.). I don't know if there is a good programmatic approach
for this, or if you would have to make a substitution list or
something, which would not be very maintainable.
* Do not use instance-of information like this in the infobox. It
might sound radical, but I am not sure if "instance of" is really
working very well for labelling things in the way you expect.
Instance-of can refer to many orthogonal properties of an object, in
essentially random order, while a label should probably focus on
certain aspects only.

For obvious reasons, ranks of statements cannot be used to record
language-specific preferences.

Cheers,

Markus


On 27.11.2015 15:58, James Heald wrote:

Some items have quite a lot of "instance of" statements,
connecting them
to quite a few different classes.

For example, Frankfurt is currently an instance of seven
different classes,
https://www.wikidata.org/wiki/Q1794

and Glasgow is currently an instance of five different classes:
https://www.wikidata.org

Re: [Wikidata] Preferred rank -- choices for infoboxes, versus SPARQL

2015-11-27 Thread Tobias Schönberg
@Markus, James:
In my opinion it is better to make the query ask for the most recent
population number. People just need to start using time-qualifiers for
things like census-report numbers.

And the other issue is one of standardized vocabulary and that is always a
sourcing problem in my opinion. A query could say "get the
instance-of-statement" that has a supporting source from the Spanish
Geographic Society. Then the infobox would only include standardized
vocabulary by that organization. But I acknowledge that large parts of the
world are not covered by standardized vocabulary organizations.

If that doesn't solve it we could at least think about language specific
rank-overrides.

-Tobias


2015-11-27 16:41 GMT+01:00 Markus Krötzsch:

> Hi James,
>
> I would immediately agree to the following measures to alleviate your
> problem:
>
> (1) If some instance-of statements are historic (i.e., no longer valid),
> then one should make the current ones "preferred" and leave the historic
> ones "normal", just like for, e.g., population numbers. This would get rid
> of the rather inappropriate "Free imperial city" label for Frankfurt.
>
> (2) If some classes are redundant, they could be removed (e.g., if we
> already have "Big city" we do not need "city"). However, community might
> decide to prefer the direct use of a main class (such as "Human"), even if
> redundant.
>
> The other issues you mention are more tricky. Especially issues of
> translation/cultural specificity. The most specific classes are not always
> the ones that all languages would want to see, e.g., if the concept of the
> class is not known in that language.
>
> Possible options for solving your problem:
>
> * Make a whitelist of classes you want to show at all in the template, and
> default to "city" if none of them occurs.
> * Make a blacklist of classes you want to hide.
> * Instead of blacklist or whitelist, show only classes that have a
> Wikipedia page in your language; default to "city" if there are none.
> * Try to generalise overly specific classes (change "big city" to "city"
> etc.). I don't know if there is a good programmatic approach for this, or
> if you would have to make a substitution list or something, which would not
> be very maintainable.
> * Do not use instance-of information like this in the infobox. It might
> sound radical, but I am not sure if "instance of" is really working very
> well for labelling things in the way you expect. Instance-of can refer to
> many orthogonal properties of an object, in essentially random order, while
> a label should probably focus on certain aspects only.
>
> For obvious reasons, ranks of statements cannot be used to record
> language-specific preferences.
>
> Cheers,
>
> Markus
>
>
> On 27.11.2015 15:58, James Heald wrote:
>
>> Some items have quite a lot of "instance of" statements, connecting them
>> to quite a few different classes.
>>
>> For example, Frankfurt is currently an instance of seven different
>> classes,
>>  https://www.wikidata.org/wiki/Q1794
>>
>> and Glasgow is currently an instance of five different classes:
>>  https://www.wikidata.org/wiki/Q4093
>>
>> This can produce quite a pile-up of descriptions in the
>> description/subtitle section of an infobox -- for example, as on the
>> Spanish page for Frankfurt at
>>  https://es.wikipedia.org/wiki/Fr%C3%A1ncfort_del_Meno
>> in the section between the infobox title and the picture.
>>
>>
>> Question:
>>
>> Is it an appropriate use of ranking, to choose a few of the values to
>> display, and set those values to be "preferred rank" ?
>>
>> It would be useful to have wider input, as to whether it is a good thing
>> as to whether this is done widely.
>>
>> Discussions are open at
>>
>> https://www.wikidata.org/wiki/Wikidata:Project_chat#Preferred_and_normal_rank
>>
>> and
>> https://www.wikidata.org/wiki/Wikidata:Bistro#Rang_pr.C3.A9f.C3.A9r.C3.A9
>>
>> -- but these have so far been inconclusive, and have got slightly taken
>> over by questions such as
>>
>> * how well terms really do map from one language to another --
>> near-equivalences that may be near enough for sitelinks may be jarring
>> or insufficient when presented boldly up-front in an infobox.
>>
>> (For example, the French translation "ville" is rather unspecific, and
>> perhaps inadequate in what it conveys, compared to "city" in English or
>> "ciudad" in Spanish; "town" in English (which might have over 100,000
>> inhabitants) doesn't necessarily match "bourg" in French or "Kleinstadt"
>> in German).
>>
>> * whether different-language wikis may seek different degrees of
>> generalisation or specificity in such sub-title areas, depending on how
>> "close" the subject is to that wiki.
>>
>> (For readers in some languages, some fine distinctions may be highly
>> relevant and familiar, whereas for other language groups that level of
>> detail may be undesirably obscure).
>>
>>
>> There is also the question of the effect of promoting some values to
>

Re: [Wikidata] Preferred rank -- choices for infoboxes, versus SPARQL

2015-11-27 Thread Markus Krötzsch

Hi James,

I would immediately agree to the following measures to alleviate your 
problem:


(1) If some instance-of statements are historic (i.e., no longer valid), 
then one should make the current ones "preferred" and leave the historic 
ones "normal", just like for, e.g., population numbers. This would get 
rid of the rather inappropriate "Free imperial city" label for Frankfurt.


(2) If some classes are redundant, they could be removed (e.g., if we 
already have "Big city" we do not need "city"). However, community might 
decide to prefer the direct use of a main class (such as "Human"), even 
if redundant.


The other issues you mention are more tricky. Especially issues of 
translation/cultural specificity. The most specific classes are not 
always the ones that all languages would want to see, e.g., if the 
concept of the class is not known in that language.


Possible options for solving your problem:

* Make a whitelist of classes you want to show at all in the template, 
and default to "city" if none of them occurs.

* Make a blacklist of classes you want to hide.
* Instead of blacklist or whitelist, show only classes that have a 
Wikipedia page in your language; default to "city" if there are none.
* Try to generalise overly specific classes (change "big city" to "city" 
etc.). I don't know if there is a good programmatic approach for this, 
or if you would have to make a substitution list or something, which 
would not be very maintainable.
* Do not use instance-of information like this in the infobox. It might 
sound radical, but I am not sure if "instance of" is really working very 
well for labelling things in the way you expect. Instance-of can refer 
to many orthogonal properties of an object, in essentially random order, 
while a label should probably focus on certain aspects only.
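
The whitelist option and the "has a local Wikipedia page" option above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the class labels and the sitelink table are hypothetical, and a real template would work with item IDs rather than English labels.

```python
def classes_to_show(item_classes, whitelist, default="city"):
    """Keep only whitelisted classes; fall back to the default if none match."""
    shown = [c for c in item_classes if c in whitelist]
    return shown or [default]


def classes_with_local_article(item_classes, sitelinks, wiki, default="city"):
    """Keep only classes that have an article on the given wiki."""
    shown = [c for c in item_classes if wiki in sitelinks.get(c, set())]
    return shown or [default]


# Hypothetical instance-of values for one item.
frankfurt = ["big city", "free imperial city", "financial centre", "city"]

whitelist = {"city", "big city", "town"}
print(classes_to_show(frankfurt, whitelist))               # ['big city', 'city']
print(classes_to_show(["free imperial city"], whitelist))  # ['city']

# Hypothetical sitelink table: class label -> wikis that have an article.
sitelinks = {"city": {"eswiki", "enwiki"}, "big city": {"dewiki"}}
print(classes_with_local_article(frankfurt, sitelinks, "eswiki"))  # ['city']
```

Either filter lives in the template of one language community, so it leaves the statement ranks in Wikidata untouched.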


For obvious reasons, ranks of statements cannot be used to record 
language-specific preferences.


Cheers,

Markus

On 27.11.2015 15:58, James Heald wrote:

Some items have quite a lot of "instance of" statements, connecting them
to quite a few different classes.

For example, Frankfurt is currently an instance of seven different classes,
 https://www.wikidata.org/wiki/Q1794

and Glasgow is currently an instance of five different classes:
 https://www.wikidata.org/wiki/Q4093

This can produce quite a pile-up of descriptions in the
description/subtitle section of an infobox -- for example, as on the
Spanish page for Frankfurt at
 https://es.wikipedia.org/wiki/Fr%C3%A1ncfort_del_Meno
in the section between the infobox title and the picture.


Question:

Is it an appropriate use of ranking, to choose a few of the values to
display, and set those values to be "preferred rank" ?

It would be useful to have wider input, as to whether it is a good thing
as to whether this is done widely.

Discussions are open at
https://www.wikidata.org/wiki/Wikidata:Project_chat#Preferred_and_normal_rank

and
https://www.wikidata.org/wiki/Wikidata:Bistro#Rang_pr.C3.A9f.C3.A9r.C3.A9

-- but these have so far been inconclusive, and have got slightly taken
over by questions such as

* how well terms really do map from one language to another --
near-equivalences that may be near enough for sitelinks may be jarring
or insufficient when presented boldly up-front in an infobox.

(For example, the French translation "ville" is rather unspecific, and
perhaps inadequate in what it conveys, compared to "city" in English or
"ciudad" in Spanish; "town" in English (which might have over 100,000
inhabitants) doesn't necessarily match "bourg" in French or "Kleinstadt"
in German).

* whether different-language wikis may seek different degrees of
generalisation or specificity in such sub-title areas, depending on how
"close" the subject is to that wiki.

(For readers in some languages, some fine distinctions may be highly
relevant and familiar, whereas for other language groups that level of
detail may be undesirably obscure).


There is also the question of the effect of promoting some values to
"preferred rank" for the visibility of other values in SPARQL -- in
particular when so queries are written assuming they can get away with
using just the simple "truthy" wdt:... form of properties.

However, making eg the value "city" preferred for Glasgow means that it
will no longer be returned in searches for its other values, if these
have been written using "wdt:..." -- so it will now be missed in a
simple-level query for "council areas", the current top-level
administrative subdivisions of Scotland, or for historically-based
"registration counties" -- and this problem will become more pronounced
if the practice becomes more widespread of making some values
"preferred" (and so other values invisible, at least for queries using
wdt:...).

 From a SPARQL point of view, what would actually be very helpful would
to add a (new) fourth rank -- "misleading without qualifier", below
"normal" but above "deprecated" -- for statements that *are* 

Re: [Wikidata] Preferred rank -- choices for infoboxes, versus SPARQL

2015-11-27 Thread Tobias Schönberg
@James
As you mention yourself, using ranks is a very limiting approach, and I
think that we shouldn't modify the data to help the queries, but try to
make the queries more intelligent. - Once conflicting and time-dependent
statements are added to each item, the return values of simple queries will
be huge lists, or chunks of the data-tree. - So I think even the infoboxes
have to make some decisions on how they want to deal with the complexity,
and those decisions might not be the same in every language community. - I
also think we need to communicate more clearly that something like "Mayor of
Barcelona" might get 1 result now, but is actually bad practice and in
Wikidata's future will likely return 100s of values.
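
To illustrate the point about time-dependent statements, here is a minimal sketch of a consumer that filters office holders by their time qualifiers instead of expecting a single value. The Holder class, the function name and the placeholder names and years are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Holder:
    name: str
    start: Optional[int] = None  # year from a "start time" qualifier
    end: Optional[int] = None    # year from an "end time" qualifier; None = open


def holders_in(holders, year):
    """Return the names of holders whose term covers the given year."""
    return [h.name for h in holders
            if (h.start is None or h.start <= year)
            and (h.end is None or year < h.end)]


# Placeholder data, not real office holders.
mayors = [
    Holder("Mayor A", 1995, 2003),
    Holder("Mayor B", 2003, 2011),
    Holder("Mayor C", 2011),
]

print(holders_in(mayors, 2015))  # ['Mayor C']
print(holders_in(mayors, 2000))  # ['Mayor A']
```

A statement with no qualifiers at all would match every year here, which is one reason consistent use of time qualifiers matters for this kind of query.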

-Tobias

2015-11-27 15:58 GMT+01:00 James Heald :

> Some items have quite a lot of "instance of" statements, connecting them
> to quite a few different classes.
>
> For example, Frankfurt is currently an instance of seven different classes,
> https://www.wikidata.org/wiki/Q1794
>
> and Glasgow is currently an instance of five different classes:
> https://www.wikidata.org/wiki/Q4093
>
> This can produce quite a pile-up of descriptions in the
> description/subtitle section of an infobox -- for example, as on the
> Spanish page for Frankfurt at
> https://es.wikipedia.org/wiki/Fr%C3%A1ncfort_del_Meno
> in the section between the infobox title and the picture.
>
>
> Question:
>
> Is it an appropriate use of ranking, to choose a few of the values to
> display, and set those values to be "preferred rank" ?
>
> It would be useful to have wider input, as to whether it is a good thing
> as to whether this is done widely.
>
> Discussions are open at
>
> https://www.wikidata.org/wiki/Wikidata:Project_chat#Preferred_and_normal_rank
> and
> https://www.wikidata.org/wiki/Wikidata:Bistro#Rang_pr.C3.A9f.C3.A9r.C3.A9
>
> -- but these have so far been inconclusive, and have got slightly taken
> over by questions such as
>
> * how well terms really do map from one language to another --
> near-equivalences that may be near enough for sitelinks may be jarring or
> insufficient when presented boldly up-front in an infobox.
>
> (For example, the French translation "ville" is rather unspecific, and
> perhaps inadequate in what it conveys, compared to "city" in English or
> "ciudad" in Spanish; "town" in English (which might have over 100,000
> inhabitants) doesn't necessarily match "bourg" in French or "Kleinstadt" in
> German).
>
> * whether different-language wikis may seek different degrees of
> generalisation or specificity in such sub-title areas, depending on how
> "close" the subject is to that wiki.
>
> (For readers in some languages, some fine distinctions may be highly
> relevant and familiar, whereas for other language groups that level of
> detail may be undesirably obscure).
>
>
> There is also the question of the effect of promoting some values to
> "preferred rank" for the visibility of other values in SPARQL -- in
> particular when so queries are written assuming they can get away with
> using just the simple "truthy" wdt:... form of properties.
>
> However, making eg the value "city" preferred for Glasgow means that it
> will no longer be returned in searches for its other values, if these have
> been written using "wdt:..." -- so it will now be missed in a simple-level
> query for "council areas", the current top-level administrative
> subdivisions of Scotland, or for historically-based "registration counties"
> -- and this problem will become more pronounced if the practice becomes
> more widespread of making some values "preferred" (and so other values
> invisible, at least for queries using wdt:...).
>
> From a SPARQL point of view, what would actually be very helpful would to
> add a (new) fourth rank -- "misleading without qualifier", below "normal"
> but above "deprecated" -- for statements that *are* true (with the
> qualifiers), but could be misleading without them
> * for example, for a town that was the county town of a shire once, but
> hasn't been for two centuries
> * or for an administrative area that is partly located in one higher-level
> division, and partly in another -- this is very valuable information to be
> able to note, but it's important to be able to exclude it from being all
> included in a recursive search for the places in one (but not the other) of
> that higher-level division.
>
> The statements shouldn't be marked "deprecated", because they are true
> (unlike a widely-given but incorrect date of birth, for example).  At the
> moment one can sort of work round the issue, if one can find another
> statement to make "preferred", so that the qualified statement becomes
> invisible to a simple search without qualifiers.  However, if "preferred"
> status is going to be used just to select things to show in infoboxes, it
> becomes very desirable that "wdt:..." searches should retrieve things at
> normal rank as well -- creating a need for a new rank for statement

Re: [Wikidata] Odd results from wdqs

2015-11-27 Thread Markus Krötzsch

On 27.11.2015 15:22, Magnus Manske wrote:

It was the "absolute terms" problem here  ;-)


But 3MB uncompressed string data does not seem to be so big in absolute 
terms, or are you referring to something else (I got this number from 
the long pages special)? Parsing a 3MB string may need some extra 
memory, but the data you get in the end should not be much bigger than 
the original string, or should it?


Markus



On Fri, Nov 27, 2015 at 2:12 PM Markus Krötzsch
<mar...@semantic-mediawiki.org> wrote:

On 25.11.2015 16:05, Lydia Pintscher wrote:
> On Mon, Nov 23, 2015 at 10:54 PM, Magnus Manske
> <magnusman...@googlemail.com> wrote:
>> Well, my import code chokes on the last two JSON dumps (16th and 23rd). As
>> it fails about half an hour or so in, debugging is ... inefficient. Unless
>> there is something that has changed with the dump itself (new data type or
>> so), and someone tells me, it will be quite some time (days, weeks) until I
>> figure it out.
>
> To update everyone here as well: Magnus has been able to pinpoint the
> problem and fix the tools. They're catching up again. The issue was
> one of the extremely big pages that have recently been created for
> research papers: https://www.wikidata.org/wiki/Special:LongPages

Thanks for explaining. This explains why we did not see any problems or
unusual behaviour in Wikidata Toolkit. I guess Java simply does not care
about how long pages are, as long as they are not very big in absolute
terms.

Markus




[Wikidata] Preferred rank -- choices for infoboxes, versus SPARQL

2015-11-27 Thread James Heald
Some items have quite a lot of "instance of" statements, connecting them 
to quite a few different classes.


For example, Frankfurt is currently an instance of seven different classes,
https://www.wikidata.org/wiki/Q1794

and Glasgow is currently an instance of five different classes:
https://www.wikidata.org/wiki/Q4093

This can produce quite a pile-up of descriptions in the 
description/subtitle section of an infobox -- for example, as on the 
Spanish page for Frankfurt at

https://es.wikipedia.org/wiki/Fr%C3%A1ncfort_del_Meno
in the section between the infobox title and the picture.


Question:

Is it an appropriate use of ranking, to choose a few of the values to 
display, and set those values to be "preferred rank" ?


It would be useful to have wider input as to whether it is a good thing 
for this to be done widely.


Discussions are open at
https://www.wikidata.org/wiki/Wikidata:Project_chat#Preferred_and_normal_rank
and
https://www.wikidata.org/wiki/Wikidata:Bistro#Rang_pr.C3.A9f.C3.A9r.C3.A9

-- but these have so far been inconclusive, and have got slightly taken 
over by questions such as


* how well terms really do map from one language to another -- 
near-equivalences that may be near enough for sitelinks may be jarring 
or insufficient when presented boldly up-front in an infobox.


(For example, the French translation "ville" is rather unspecific, and 
perhaps inadequate in what it conveys, compared to "city" in English or 
"ciudad" in Spanish; "town" in English (which might have over 100,000 
inhabitants) doesn't necessarily match "bourg" in French or "Kleinstadt" 
in German).


* whether different-language wikis may seek different degrees of 
generalisation or specificity in such sub-title areas, depending on how 
"close" the subject is to that wiki.


(For readers in some languages, some fine distinctions may be highly 
relevant and familiar, whereas for other language groups that level of 
detail may be undesirably obscure).



There is also the question of the effect of promoting some values to 
"preferred rank" for the visibility of other values in SPARQL -- in 
particular when queries are written assuming they can get away with 
using just the simple "truthy" wdt:... form of properties.


However, making e.g. the value "city" preferred for Glasgow means that it 
will no longer be returned in searches for its other values, if these 
have been written using "wdt:..." -- so it will now be missed in a 
simple-level query for "council areas", the current top-level 
administrative subdivisions of Scotland, or for historically-based 
"registration counties" -- and this problem will become more pronounced 
if the practice becomes more widespread of making some values 
"preferred" (and so other values invisible, at least for queries using 
wdt:...).
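
The effect can be simulated in a few lines. This is a sketch under assumptions: the truthy() helper is hypothetical and simply mimics the "best rank" rule behind wdt: (preferred statements if any exist, otherwise normal ones, never deprecated), and the values are labels rather than item IDs.

```python
def truthy(statements):
    """statements: list of (value, rank) pairs. Return the best-rank values."""
    preferred = [v for v, r in statements if r == "preferred"]
    return preferred or [v for v, r in statements if r == "normal"]


# All instance-of values at normal rank: every one is visible to wdt: queries.
glasgow = [
    ("city", "normal"),
    ("council area", "normal"),
    ("registration county", "normal"),
]
print("council area" in truthy(glasgow))   # True

# Promote "city" to preferred: the other values drop out of the truthy set.
promoted = [("city", "preferred")] + glasgow[1:]
print(truthy(promoted))                    # ['city']
print("council area" in truthy(promoted))  # False
```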


From a SPARQL point of view, what would actually be very helpful would 
be to add a (new) fourth rank -- "misleading without qualifier", below 
"normal" but above "deprecated" -- for statements that *are* true (with 
the qualifiers), but could be misleading without them
* for example, for a town that was the county town of a shire once, but 
hasn't been for two centuries
* or for an administrative area that is partly located in one 
higher-level division, and partly in another -- this is very valuable 
information to be able to note, but it's important to be able to exclude 
it from being all included in a recursive search for the places in one 
(but not the other) of that higher-level division.


The statements shouldn't be marked "deprecated", because they are true 
(unlike a widely-given but incorrect date of birth, for example).  At 
the moment one can sort of work round the issue, if one can find another 
statement to make "preferred", so that the qualified statement becomes 
invisible to a simple search without qualifiers.  However, if 
"preferred" status is going to be used just to select things to show in 
infoboxes, it becomes very desirable that "wdt:..." searches should 
retrieve things at normal rank as well -- creating a need for a new rank 
for statements which are true, but misleading if read without qualifiers.
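
For comparison, a query can already ask for every non-deprecated statement instead of the truthy form, at the cost of being more verbose. The sketch below only builds the two query strings side by side. The function names are hypothetical, and the prefixes (p:, ps:, wikibase:rank) are those of the present-day query service, which is an assumption with respect to the service as it existed at the time of this thread. Q1549591 is the "big city" class mentioned elsewhere in this discussion.

```python
def truthy_query(prop: str, class_id: str) -> str:
    """Simple form: only best-rank statements are matched."""
    return f"SELECT DISTINCT ?item WHERE {{ ?item wdt:{prop} wd:{class_id} . }}"


def all_ranks_query(prop: str, class_id: str) -> str:
    """Full-statement form: preferred and normal statements, not deprecated."""
    return f"""
SELECT DISTINCT ?item WHERE {{
  ?item p:{prop} ?statement .
  ?statement ps:{prop} wd:{class_id} ;
             wikibase:rank ?rank .
  FILTER(?rank != wikibase:DeprecatedRank)
}}
""".strip()


print(truthy_query("P31", "Q1549591"))
print(all_ranks_query("P31", "Q1549591"))
```

This does not replace the proposed rank: the longer form still returns statements that are true only with their qualifiers, so the caller has to inspect those qualifiers separately.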



What *is* needed though, is a view on whether trying to tailor what is 
shown in infoboxes is an appropriate reason to alter statement rankings.


It would be good to get a view on this.

The Spanish guys who started doing this have temporarily put further 
rank-changes on hold, for the issue to be discussed; but so far what 
they have done has only just scratched the surface of what could be done 
-- there are still a lot more cases of multiple values they would like 
to tidy.


So: is this the kind of thing that "preferred rank" is envisaged for ?

Or, should some statements not be marked as less preferred than others, 
if this is the only reason ?



   --  James.


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/

Re: [Wikidata] Odd results from wdqs

2015-11-27 Thread Magnus Manske
It was the "absolute terms" problem here  ;-)

On Fri, Nov 27, 2015 at 2:12 PM Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 25.11.2015 16:05, Lydia Pintscher wrote:
> > On Mon, Nov 23, 2015 at 10:54 PM, Magnus Manske wrote:
> >> Well, my import code chokes on the last two JSON dumps (16th and 23rd).
> >> As it fails about half an hour or so in, debugging is ... inefficient.
> >> Unless there is something that has changed with the dump itself (new
> >> data type or so), and someone tells me, it will be quite some time
> >> (days, weeks) until I figure it out.
> >
> > To update everyone here as well: Magnus has been able to pinpoint the
> > problem and fix the tools. They're catching up again. The issue was
> > one of the extremely big pages that have recently been created for
> > research papers: https://www.wikidata.org/wiki/Special:LongPages
>
> Thanks for explaining. This explains why we did not see any problems or
> unusual behaviour in Wikidata Toolkit. I guess Java simply does not care
> about how long pages are, as long as they are not very big in absolute
> terms.
>
> Markus


Re: [Wikidata] Odd results from wdqs

2015-11-27 Thread Markus Krötzsch

On 25.11.2015 16:05, Lydia Pintscher wrote:
> On Mon, Nov 23, 2015 at 10:54 PM, Magnus Manske wrote:
>> Well, my import code chokes on the last two JSON dumps (16th and 23rd). As
>> it fails about half an hour or so in, debugging is ... inefficient. Unless
>> there is something that has changed with the dump itself (new data type or
>> so), and someone tells me, it will be quite some time (days, weeks) until I
>> figure it out.
>
> To update everyone here as well: Magnus has been able to pinpoint the
> problem and fix the tools. They're catching up again. The issue was
> one of the extremely big pages that have recently been created for
> research papers: https://www.wikidata.org/wiki/Special:LongPages


Thanks for explaining. This explains why we did not see any problems or 
unusual behaviour in Wikidata Toolkit. I guess Java simply does not care 
about how long pages are, as long as they are not very big in absolute 
terms.


Markus




[Wikidata] looking for speaker for conference in Vienna

2015-11-27 Thread Lydia Pintscher
Hey folks :)

I am looking for someone who can represent Wikidata here:
http://www.oeaw.ac.at/acdh/de/node/396
This is on the 3rd of December so unfortunately short notice. If you
are willing and able to talk about Wikidata there, let me know and I'll
put you in touch with the right people.


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit
by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata