@James
As you mention yourself, using ranks is a very limiting approach, and I
think that we shouldn't modify the data to help the queries, but try to
make the queries more intelligent. Once conflicting and time-dependent
statements are added to each item, the return values of simple queries will
be huge lists, or chunks of the data tree. So I think even the infoboxes
have to make some decisions on how they want to deal with the complexity,
and those decisions might not be the same in every language community. I
also think we need to communicate more clearly that something like "Mayor of
Barcelona" might return 1 result now, but relying on that is bad practice,
and in Wikidata's future such a query will likely return hundreds of values.
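
To make the difference concrete, here is a minimal Python sketch of the two
approaches. It is only a toy model, not Wikidata's implementation: the
Statement class, the end_year field (a stand-in for an "end time" qualifier)
and the "town" item are assumptions made up for illustration, and the ranks
given to Glasgow are the hypothetical ones from the message quoted below.
truthy() mimics what the simple wdt:... form returns (preferred statements
if there are any, otherwise the normal-rank ones, never deprecated ones),
while current() is one possible decision a consumer such as an infobox could
make for itself without anyone touching the ranks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """Toy stand-in for one Wikidata statement (illustrative only)."""
    value: str
    rank: str = "normal"            # "preferred", "normal" or "deprecated"
    end_year: Optional[int] = None  # hypothetical "end time" qualifier


def truthy(statements: List[Statement]) -> List[str]:
    """Best-rank values, modelling the simple wdt:... form of a property."""
    preferred = [s.value for s in statements if s.rank == "preferred"]
    if preferred:
        # Any preferred statement hides all normal-rank ones.
        return preferred
    return [s.value for s in statements if s.rank == "normal"]


def current(statements: List[Statement]) -> List[str]:
    """Consumer-side choice: non-deprecated values that have not ended."""
    return [
        s.value
        for s in statements
        if s.rank != "deprecated" and s.end_year is None
    ]


# Ranks as in the hypothetical from the quoted message: "city" preferred.
glasgow = [
    Statement("city", rank="preferred"),
    Statement("council area"),
    Statement("registration county"),
]

# Invented item: a statement that is true only together with its qualifier.
town = [
    Statement("town"),
    Statement("county town", end_year=1800),
]

print(truthy(glasgow))   # only the preferred value survives
print(current(glasgow))  # all three values are still visible
print(truthy(town))      # includes the long-expired "county town"
print(current(town))     # the consumer filters it out itself
```

The point of the second function is that the filtering rule lives in the
query or the infobox, so each language community can pick its own rule
while the data stays the same for everybody.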

-Tobias

2015-11-27 15:58 GMT+01:00 James Heald <[email protected]>:

> Some items have quite a lot of "instance of" statements, connecting them
> to quite a few different classes.
>
> For example, Frankfurt is currently an instance of seven different classes,
>     https://www.wikidata.org/wiki/Q1794
>
> and Glasgow is currently an instance of five different classes:
>     https://www.wikidata.org/wiki/Q4093
>
> This can produce quite a pile-up of descriptions in the
> description/subtitle section of an infobox -- for example, as on the
> Spanish page for Frankfurt at
>     https://es.wikipedia.org/wiki/Fr%C3%A1ncfort_del_Meno
> in the section between the infobox title and the picture.
>
>
> Question:
>
> Is it an appropriate use of ranking, to choose a few of the values to
> display, and set those values to be "preferred rank" ?
>
> It would be useful to have wider input as to whether it is a good thing
> for this to be done widely.
>
> Discussions are open at
>
> https://www.wikidata.org/wiki/Wikidata:Project_chat#Preferred_and_normal_rank
> and
> https://www.wikidata.org/wiki/Wikidata:Bistro#Rang_pr.C3.A9f.C3.A9r.C3.A9
>
> -- but these have so far been inconclusive, and have got slightly taken
> over by questions such as
>
> * how well terms really do map from one language to another --
> near-equivalences that may be near enough for sitelinks may be jarring or
> insufficient when presented boldly up-front in an infobox.
>
> (For example, the French translation "ville" is rather unspecific, and
> perhaps inadequate in what it conveys, compared to "city" in English or
> "ciudad" in Spanish; "town" in English (which might have over 100,000
> inhabitants) doesn't necessarily match "bourg" in French or "Kleinstadt" in
> German).
>
> * whether different-language wikis may seek different degrees of
> generalisation or specificity in such sub-title areas, depending on how
> "close" the subject is to that wiki.
>
> (For readers in some languages, some fine distinctions may be highly
> relevant and familiar, whereas for other language groups that level of
> detail may be undesirably obscure).
>
>
> There is also the question of the effect of promoting some values to
> "preferred rank" for the visibility of other values in SPARQL -- in
> particular when queries are written assuming they can get away with
> using just the simple "truthy" wdt:... form of properties.
>
> However, making eg the value "city" preferred for Glasgow means that it
> will no longer be returned in searches for its other values, if these have
> been written using "wdt:..." -- so it will now be missed in a simple-level
> query for "council areas", the current top-level administrative
> subdivisions of Scotland, or for historically-based "registration counties"
> -- and this problem will become more pronounced if the practice becomes
> more widespread of making some values "preferred" (and so other values
> invisible, at least for queries using wdt:...).
>
> From a SPARQL point of view, what would actually be very helpful would be to
> add a (new) fourth rank -- "misleading without qualifier", below "normal"
> but above "deprecated" -- for statements that *are* true (with the
> qualifiers), but could be misleading without them
> * for example, for a town that was the county town of a shire once, but
> hasn't been for two centuries
> * or for an administrative area that is partly located in one higher-level
> division, and partly in another -- this is very valuable information to be
> able to note, but it's important to be able to exclude the area from being
> included wholesale in a recursive search for the places in one (but not the
> other) of those higher-level divisions.
>
> The statements shouldn't be marked "deprecated", because they are true
> (unlike a widely-given but incorrect date of birth, for example).  At the
> moment one can sort of work round the issue, if one can find another
> statement to make "preferred", so that the qualified statement becomes
> invisible to a simple search without qualifiers.  However, if "preferred"
> status is going to be used just to select things to show in infoboxes, it
> becomes very desirable that "wdt:..." searches should retrieve things at
> normal rank as well -- creating a need for a new rank for statements which
> are true, but misleading if read without qualifiers.
>
>
> What *is* needed though, is a view on whether trying to tailor what is
> shown in infoboxes is an appropriate reason to alter statement rankings.
>
> It would be good to get a view on this.
>
> The Spanish guys who started doing this have temporarily put further
> rank-changes on hold, for the issue to be discussed; but so far what they
> have done has only just scratched the surface of what could be done --
> there are still a lot more cases of multiple values they would like to tidy.
>
> So: is this the kind of thing that "preferred rank" is envisaged for ?
>
> Or, should some statements not be marked as less preferred than others, if
> this is the only reason ?
>
>
>    --  James.
>
>
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>