Thanks, Brent! You were exactly the person I was hoping to get some numbers from :) I am extremely curious what kind of statements people will make on the Wikidata pages about "art", "privacy", "agriculture", "army", etc. I am looking forward to seeing what the community will add there. That'll be fun to watch :)
(Usually, such things tend to be obvious in retrospect, but extremely hard to predict :) )

Cheers,
Denny

2012/4/6 Brent Hecht <[email protected]>

> Hi All,
>
> Our data (using a 25-language dataset) agrees with Denny's. 99% of all
> connected components of the interlanguage link graph have only one article
> per language edition. This is something we looked into in some detail in
> our paper at ACM's CHI conference this year
> (http://www.brenthecht.com/papers/bhecht_CHI2012_omnipedia.pdf).
>
> However, it is important to point out that the 1% tends to contain
> articles that are of great general interest. Some English articles that
> occur in these situations include "author", "art", "indigenous people",
> "education", "privacy", "liberal arts", "computer science", "agriculture",
> "socialism", "army", etc. To a certain extent, this is to be expected:
> where there is more global interest in a topic, there is going to be more
> ambiguity.
>
> Just my two cents.
>
> - Brent
>
>
> Brent Hecht
> Ph.D. Candidate in Computer Science
> CollabLab: The Collaborative Technology Laboratory
> Northwestern University
> w: http://www.brenthecht.com
> e: [email protected]
>
>
> On Apr 5, 2012, at 4:50 PM, Denny Vrandečić wrote:
>
> > Regarding definitions:
> >
> > Note that I said "Label + Description is identifying", not merely the
> > label. I assume this to be true because even for your example of
> > "Germany", the disambiguation page works with rather short descriptions
> > of each disambiguated page [1]. So even the fuzzy concept that you gave
> > as an example seems to be sufficiently identifiable for the purposes and
> > mission of the Wikipedia community, which gives me reason to believe that
> > the community can sort this out. I mean, they basically already have!
> >
> > Regarding the Kangoo / Kubistar example:
> >
> > In Wikidata they would be represented as two pages, one for the Kubistar
> > (which would link to the Danish and German pages for the Kubistar), and
> > one for the Kangoo (which would link to the 20 language versions of the
> > Kangoo article, including a Danish and a German one). This is a rather
> > simple example, which would be easily expressed with the exact matches
> > that we suggest.
> >
> > In Wikidata, the Wikipedia links are planned to be inverse functional,
> > i.e., every Wikipedia article in a specific language can only be linked
> > to from one single Wikidata article. Two Wikidata pages cannot claim the
> > same Wikipedia article in a single language as their defining article.
> >
> > I.e., in the Kubistar/Kangoo example there would be two Wikidata pages:
> > one about the Kubistar, linking to de:Nissan_Kubistar and
> > da:Nissan_Kubistar, and one about the Kangoo, linking to the 20 different
> > Kangoo articles. The Wikidata page for the Kubistar could not link to any
> > of those Kangoo articles.
> >
> > Please do not misunderstand: I am not categorically against non-exact
> > matches or broader or narrower matches (or else I wouldn't be discussing
> > this). But I haven't yet seen examples that convince me that the
> > additional complexity of broader/narrower or non-exact matches is
> > required. As I said before, if we can model more than 99% of all language
> > links with the suggested simple solution, I am reluctant to make it more
> > complicated for the remaining <1%.
> >
> > Cheers,
> > Denny
> >
> > P.S.: Oh, yes, indeed! Thank you for this excellent and interesting
> > discussion. It really does shed light on some aspects of the current
> > draft of the data model, and will eventually improve it and sharpen the
> > understanding of the model.
> >
> > [1] https://en.wikipedia.org/wiki/Germany_(disambiguation)
> >
> > 2012/4/5 Gregor Hagedorn <[email protected]>
> > On 5 April 2012 18:30, Denny Vrandečić <[email protected]> wrote:
> > > The label and the description together are meant to be identifying.
> > >
> > > I.e. "Georgia - A country in central Asia", or "Frankfurt - A city in
> > > Hesse, Germany", etc.
> > >
> > > Additionally, the Wikipedia links provide quite some guidance to it.
> >
> > I believe it will be difficult to craft labels that work as
> > definitions. A label is a hint, and may often be sufficiently precise
> > for the majority of purposes. If we speak of "Germany", it is very hard
> > to express in a simple string the different historical, geographical,
> > and political delimitations that this term may carry.
> >
> > In my own field of work, even technical terms are often difficult to
> > resolve to a definition. In biology, the width of taxon delimitations
> > changes over time and with new research, and even technical terms in
> > morphology often have quite different meanings, depending on the
> > "school" that is being followed.
> >
> > Or to cite a car example again: the label "Renault Kangoo" is
> > unspecific as to the version/revision/release, so technical data that
> > vary between these versions cannot be added to it. However,
> > de.wikipedia.org/wiki/Nissan_Kubistar is in most Wikipedias also
> > subsumed under "Renault Kangoo". So it is a valid assumption that
> > something labeled "Renault Kangoo" refers to both of these identical
> > models sold under different names. But then, the "Nissan Kubistar" is
> > only equivalent to the first version/revision/release of the
> > "Renault Kangoo"...
> >
> > This is not unsolvable, but if you want to import or add data to an
> > element, it will be very hard to judge the correct concept from a short
> > label.
> > I was hoping that linking this to Wikipedia articles would help, but
> > this will be hard if a Wikidata page is linked to 40 Wikipedias, of
> > which any given Wikidata editor can read only a handful, and there is
> > no support for distinguishing between exactMatch and closeMatch.
> >
> > My suggestion is to allow a differentiation between exactMatch and
> > closeMatch, to instruct editors to use at least one exact match, and to
> > consider this or these the defining Wikipedia pages, whereas others are
> > added as close matches.
> >
> > Of course, the label will remain useful for stumbling upon changes in
> > the definition or width of a concept over time, and for correcting
> > those after consulting the revision against which the original link was
> > formed (not present now, but perhaps achievable by some timestamping and
> > comparison?)
> >
> > Gregor
> >
> > _______________________________________________
> > Wikidata-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wikidata-l
> >
> > --
> > Project director Wikidata
> > Wikimedia Deutschland e.V. | Eisenacher Straße 2 | 10777 Berlin
> > Tel. +49-30-219 158 26-0 | http://wikimedia.de
> >
> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> > Registered in the register of associations of the Amtsgericht
> > Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
> > organization by the Finanzamt für Körperschaften I Berlin, tax number
> > 27/681/51985.
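As a rough illustration of how Brent's 99% figure relates to the inverse-functional link constraint Denny describes, the sketch below groups articles into connected components of the interlanguage link graph and flags components holding more than one article in the same language edition. Assuming a Wikidata page holds at most one link per language edition (an assumption of this sketch), such a component cannot become a single page, and because links are inverse functional its articles cannot be shared between pages either, so it has to be split. This is a minimal sketch using a plain union-find, not the method from the CHI paper; the function names and the link data (including the Kubistar cross-link) are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure over hashable nodes."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b


def components(links):
    """Group articles, given as (language, title) pairs, into the connected
    components of the interlanguage link graph (links treated as undirected)."""
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    groups = defaultdict(set)
    for article in list(uf.parent):
        groups[uf.find(article)].add(article)
    return list(groups.values())


def fits_one_page(component):
    """True if the component has at most one article per language edition,
    so it could become one page without any article being claimed twice
    (assumes one link per language per page)."""
    languages = [lang for lang, _title in component]
    return len(languages) == len(set(languages))


# Hypothetical links; the Kubistar -> Kangoo link stands in for articles
# that are "subsumed under" another one in some language editions.
links = [
    (("en", "Renault Kangoo"), ("fr", "Renault Kangoo")),
    (("en", "Renault Kangoo"), ("de", "Renault Kangoo")),
    (("de", "Nissan Kubistar"), ("en", "Renault Kangoo")),
    (("en", "Georgia (country)"), ("de", "Georgien")),
]

comps = components(links)
clean = sum(fits_one_page(c) for c in comps)
print(f"{clean} of {len(comps)} components map to a single page")
```

Here the Kangoo component contains two German articles and would need splitting into a Kangoo page and a Kubistar page, as in Denny's example. Union-find keeps the grouping close to linear in the number of links; any graph library's connected-components routine would do the same job.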

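Gregor's alternative can be sketched in the same way: each Wikipedia link carries a match type, a page must have at least one exactMatch link, and the exact links are treated as the defining articles. The class, field, and method names below are hypothetical, and the sketch deliberately does not decide whether closeMatch links should also be inverse functional, which the proposal leaves open.

```python
from dataclasses import dataclass, field
from enum import Enum


class Match(Enum):
    EXACT = "exactMatch"
    CLOSE = "closeMatch"


@dataclass
class Page:
    """Hypothetical Wikidata page whose Wikipedia links carry a match type."""
    label: str
    description: str
    links: list = field(default_factory=list)  # (language, title, Match)

    def add_link(self, language, title, match):
        self.links.append((language, title, match))

    def defining_articles(self):
        """The exactMatch links, which the proposal treats as defining."""
        return [(lang, title) for lang, title, m in self.links if m is Match.EXACT]

    def validate(self):
        """Enforce the proposed rule: at least one exact match per page."""
        if not self.defining_articles():
            raise ValueError(f"{self.label!r} has no exactMatch link")


kangoo = Page("Renault Kangoo", "car model by Renault")
kangoo.add_link("en", "Renault Kangoo", Match.EXACT)
kangoo.add_link("de", "Renault Kangoo", Match.EXACT)
# The Kubistar corresponds only to the first Kangoo generation: a close match.
kangoo.add_link("de", "Nissan Kubistar", Match.CLOSE)
kangoo.validate()
print(kangoo.defining_articles())
```

Compared with the previous sketch, this moves part of the identity question from the label into the link types, which is exactly the additional complexity Denny is weighing against the <1% of cases it would help with.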