@James As you mention yourself, using ranks is a very limiting approach, and I think that we shouldn't modify the data to help the queries, but try to make the queries more intelligent.

Once conflicting and time-dependent statements are added to each item, the return values of simple queries will be huge lists, or chunks of the data tree.

So I think even the infoboxes have to make some decisions on how they want to deal with the complexity, and those decisions might not be the same in every language community.

I also think we need to communicate more clearly that something like "Mayor of Barcelona" might return 1 result now, but relying on that is actually bad practice, and in Wikidata's future it will likely return hundreds of values.
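As an illustration of what a "more intelligent" consumer could look like, here is a minimal Python sketch that filters on time qualifiers instead of relying on rank. It assumes each statement carries optional start and end dates (as "start time" and "end time" qualifiers would provide). The `Statement` class, the `valid_on` helper and the sample office holders are all hypothetical and are not real Wikidata data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """Hypothetical stand-in for one statement with time qualifiers."""
    value: str
    start: Optional[date] = None  # None means "no known start"
    end: Optional[date] = None    # None means "no known end / still valid"


def valid_on(statements: List[Statement], day: date) -> List[Statement]:
    """Keep only the statements whose [start, end] interval contains `day`."""
    return [
        s for s in statements
        if (s.start is None or s.start <= day)
        and (s.end is None or day <= s.end)
    ]


# Made-up office holders, purely for illustration.
holders = [
    Statement("Holder A", date(1995, 1, 1), date(2003, 12, 31)),
    Statement("Holder B", date(2004, 1, 1), date(2011, 12, 31)),
    Statement("Holder C", date(2012, 1, 1)),
]

current = valid_on(holders, date(2015, 11, 27))
print([s.value for s in current])  # prints ['Holder C']
```

Each infobox (or language community) could pick its own reference date or its own tie-breaking rule here, without anyone having to change ranks in the data.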
-Tobias

2015-11-27 15:58 GMT+01:00 James Heald <[email protected]>:

> Some items have quite a lot of "instance of" statements, connecting them to quite a few different classes.
>
> For example, Frankfurt is currently an instance of seven different classes,
> https://www.wikidata.org/wiki/Q1794
>
> and Glasgow is currently an instance of five different classes:
> https://www.wikidata.org/wiki/Q4093
>
> This can produce quite a pile-up of descriptions in the description/subtitle section of an infobox -- for example, as on the Spanish page for Frankfurt at
> https://es.wikipedia.org/wiki/Fr%C3%A1ncfort_del_Meno
> in the section between the infobox title and the picture.
>
>
> Question:
>
> Is it an appropriate use of ranking, to choose a few of the values to display, and set those values to be "preferred rank" ?
>
> It would be useful to have wider input, as to whether it is a good thing as to whether this is done widely.
>
> Discussions are open at
>
> https://www.wikidata.org/wiki/Wikidata:Project_chat#Preferred_and_normal_rank
> and
> https://www.wikidata.org/wiki/Wikidata:Bistro#Rang_pr.C3.A9f.C3.A9r.C3.A9
>
> -- but these have so far been inconclusive, and have got slightly taken over by questions such as
>
> * how well terms really do map from one language to another -- near-equivalences that may be near enough for sitelinks may be jarring or insufficient when presented boldly up-front in an infobox.
>
> (For example, the French translation "ville" is rather unspecific, and perhaps inadequate in what it conveys, compared to "city" in English or "ciudad" in Spanish; "town" in English (which might have over 100,000 inhabitants) doesn't necessarily match "bourg" in French or "Kleinstadt" in German).
>
> * whether different-language wikis may seek different degrees of generalisation or specificity in such sub-title areas, depending on how "close" the subject is to that wiki.
>
> (For readers in some languages, some fine distinctions may be highly relevant and familiar, whereas for other language groups that level of detail may be undesirably obscure).
>
>
> There is also the question of the effect of promoting some values to "preferred rank" for the visibility of other values in SPARQL -- in particular when queries are written assuming they can get away with using just the simple "truthy" wdt:... form of properties.
>
> However, making e.g. the value "city" preferred for Glasgow means that it will no longer be returned in searches for its other values, if these have been written using "wdt:..." -- so it will now be missed in a simple-level query for "council areas", the current top-level administrative subdivisions of Scotland, or for historically-based "registration counties" -- and this problem will become more pronounced if the practice becomes more widespread of making some values "preferred" (and so other values invisible, at least for queries using wdt:...).
>
> From a SPARQL point of view, what would actually be very helpful would be to add a (new) fourth rank -- "misleading without qualifier", below "normal" but above "deprecated" -- for statements that *are* true (with the qualifiers), but could be misleading without them
> * for example, for a town that was the county town of a shire once, but hasn't been for two centuries
> * or for an administrative area that is partly located in one higher-level division, and partly in another -- this is very valuable information to be able to note, but it's important to be able to exclude it from being all included in a recursive search for the places in one (but not the other) of that higher-level division.
>
> The statements shouldn't be marked "deprecated", because they are true (unlike a widely-given but incorrect date of birth, for example).
> At the moment one can sort of work round the issue, if one can find another statement to make "preferred", so that the qualified statement becomes invisible to a simple search without qualifiers. However, if "preferred" status is going to be used just to select things to show in infoboxes, it becomes very desirable that "wdt:..." searches should retrieve things at normal rank as well -- creating a need for a new rank for statements which are true, but misleading if read without qualifiers.
>
>
> What *is* needed though, is a view on whether trying to tailor what is shown in infoboxes is an appropriate reason to alter statement rankings.
>
> It would be good to get a view on this.
>
> The Spanish guys who started doing this have temporarily put further rank-changes on hold, for the issue to be discussed; but so far what they have done has only just scratched the surface of what could be done -- there are still a lot more cases of multiple values they would like to tidy.
>
> So: is this the kind of thing that "preferred rank" is envisaged for ?
>
> Or, should some statements not be marked as less preferred than others, if this is the only reason ?
>
>
> -- James.
>
>
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata
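
The "truthy" behaviour discussed in the quoted message can be modelled in a few lines. This is a minimal Python sketch, assuming the best-rank rule that wdt: lookups follow: preferred statements if any exist, otherwise normal ones, and never deprecated ones. The `Claim` class and both helper functions are hypothetical. The Glasgow values are just the three classes named in the message, with "city" marked preferred as in James's example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Claim:
    """Hypothetical stand-in for one "instance of" statement."""
    value: str
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def truthy(claims: List[Claim]) -> List[str]:
    """Values a wdt:-style lookup would see: best rank only."""
    preferred = [c.value for c in claims if c.rank == "preferred"]
    if preferred:
        return preferred
    return [c.value for c in claims if c.rank == "normal"]


def non_deprecated(claims: List[Claim]) -> List[str]:
    """Values seen when every rank except deprecated is kept."""
    return [c.value for c in claims if c.rank != "deprecated"]


# Illustrative only: "city" promoted to preferred, the others left at normal.
glasgow = [
    Claim("city", "preferred"),
    Claim("council area"),
    Claim("registration county"),
]

print(truthy(glasgow))          # ['city']
print(non_deprecated(glasgow))  # ['city', 'council area', 'registration county']
```

In SPARQL terms, the second function corresponds roughly to going through the full statement nodes (p:/ps:) and filtering out wikibase:DeprecatedRank, rather than using wdt:. Once "city" is preferred, a wdt: query for council areas no longer finds Glasgow, which is the side effect described above.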
