Thanks, Brent! I was hoping to get some numbers from you in particular :)

I am extremely curious what kind of statements people will make on the
Wikidata pages about "art", "privacy", "agriculture", "army", etc. I am
looking forward to seeing what the community will add there. That'll be fun
to watch :)

(Usually, such things tend to be retroactively obvious, but extremely hard
to predict :) )
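
For anyone who wants to try this kind of count on their own data, here is a
minimal sketch of the measurement as I understand it from Brent's description:
treat interlanguage links as undirected edges, take the connected components,
and call a component unambiguous if no language edition occurs in it twice.
The link data, the "lang:Title" identifiers and the function names are made up
for illustration; this is not the code behind either set of numbers.

```python
from collections import defaultdict


def connected_components(links):
    """Group articles into components, treating each link as undirected."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for article in list(parent):
        groups[find(article)].add(article)
    return list(groups.values())


def is_unambiguous(component):
    """True if every language edition appears at most once in the component."""
    langs = [article.split(":", 1)[0] for article in component]
    return len(langs) == len(set(langs))


# Hypothetical toy links, loosely modelled on the Kangoo / Kubistar example.
links = [
    ("en:Berlin", "de:Berlin"),
    ("de:Berlin", "fr:Berlin"),
    ("de:Renault_Kangoo", "en:Renault_Kangoo"),
    ("da:Renault_Kangoo", "en:Renault_Kangoo"),
    ("de:Nissan_Kubistar", "da:Nissan_Kubistar"),
    ("de:Nissan_Kubistar", "en:Renault_Kangoo"),
]

components = connected_components(links)
clean = [c for c in components if is_unambiguous(c)]
ambiguous = [c for c in components if not is_unambiguous(c)]
share_clean = len(clean) / len(components)
```

In this toy data the Kangoo and Kubistar articles end up in one component that
contains two German and two Danish pages, which is the kind of case that falls
into the remaining 1%.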

Cheers,
Denny

2012/4/6 Brent Hecht <[email protected]>

> Hi All,
>
> Our data (using a 25-language dataset) agrees with Denny's. 99% of all
> connected components of the interlanguage link graph have only one article
> per language edition. This is something we looked into in some detail in
> our paper at ACM's CHI conference this year (
> http://www.brenthecht.com/papers/bhecht_CHI2012_omnipedia.pdf).
>
> However, it is important to point out that the 1% tends to contain
> articles that are of great general interest. Some English articles that
> occur in these situations include "author", "art", "indigenous people",
> "education", "privacy", "liberal arts", "computer science", "agriculture",
> "socialism", "army", etc. To a certain extent, this is to be expected.
> Where there is more global interest in a topic, there is going to be more
> ambiguity.
>
> Just my two cents.
>
> - Brent
>
>
> Brent Hecht
> Ph.D. Candidate in Computer Science
> CollabLab: The Collaborative Technology Laboratory
> Northwestern University
> w: http://www.brenthecht.com
> e: [email protected]
>
>
> On Apr 5, 2012, at 4:50 PM, Denny Vrandečić wrote:
>
> > Regarding definitions:
> >
> > Note that I said "Label + Description is identifying", not merely the
> > label. I assume this to be true because even for your example of
> > "Germany", the disambiguation page works with rather short descriptions
> > of each disambiguated page [1]. So even the fuzzy concept that you gave
> > as an example seems to be sufficiently identifiable for the purposes and
> > mission of the Wikipedia community, which gives me reason to believe
> > that the community can sort this out. I mean, they basically already
> > have!
> >
> > Regarding the Kangoo / Kubistar example:
> >
> > In Wikidata they would be represented as two pages, one for the Kubistar
> > (which would link to the Danish and German pages for the Kubistar), and
> > one for the Kangoo (which would link to the 20 language versions of the
> > Kangoo article, including a Danish and a German one). This is a rather
> > simple example, which would be easily expressed with the exact matches
> > that we suggest.
> >
> > In Wikidata, the Wikipedia links are planned to be inverse functional -
> > i.e., every Wikipedia article in a specific language can only be linked
> > to from one single Wikidata page. Two Wikidata pages cannot claim the
> > same Wikipedia article in a single language as their defining article.
> >
> > I.e., in the Kubistar/Kangoo example there would be two Wikidata pages:
> > one about the Kubistar, linking to de:Nissan_Kubistar and
> > da:Nissan_Kubistar, and one about the Kangoo, linking to the 20 different
> > Kangoo articles. The Wikidata page for Kubistar could not link to any of
> > those Kangoo articles.
> >
> > Please do not misunderstand, I am not categorically against inexact
> > matches or broader or narrower ones (or else I wouldn't be discussing
> > this). But I haven't seen examples yet that convince me that the
> > additional complexity of broader/narrower or inexact matches is
> > required. As I said before, if we can model more than 99% of all
> > language links with the suggested simple solution, I am reluctant to
> > make it more complicated for the remaining <1%.
> >
> > Cheers,
> > Denny
> >
> > P.S.: Oh, yes, indeed! Thank you for this excellent and interesting
> > discussion; it really does shed light on some aspects of the current
> > draft of the data model, and will eventually improve it and sharpen the
> > understanding of the model.
> >
> > [1] https://en.wikipedia.org/wiki/Germany_(disambiguation)
> >
> >
> >
> > 2012/4/5 Gregor Hagedorn <[email protected]>
> > On 5 April 2012 18:30, Denny Vrandečić <[email protected]> wrote:
> > > The label and the description together are meant to be identifying.
> > >
> > > I.e. "Georgia - A country in central Asia", or "Frankfurt - A city in
> > > Hesse, Germany", etc.
> > >
> > > Additionally, the Wikipedia links provide quite some guidance to it.
> >
> > I believe it will be difficult to craft labels that work as
> > definitions. A label is a hint, and may often be sufficiently precise
> > for the majority of purposes. If we speak of "Germany" it is very hard
> > to express in a simple string the different historical, geographical,
> > political delimitations that this term may carry.
> >
> > In my own field of work even technical terms are often difficult to
> > resolve to a definition. In biology, the width of taxon delimitations
> > changes over time and with new research, and even technical terms in
> > morphology often have quite different meanings, depending on the
> > "school" that is being followed.
> >
> > Or to cite a car example again: The label "Renault Kangoo" is
> > unspecific as to the version/revision/release of it, so technical data
> > that vary between these versions cannot be added to it. However, the
> > de.wikipedia.org/wiki/Nissan_Kubistar is in most Wikipedias also
> > subsumed under "Renault Kangoo". So it is a valid assumption that when
> > labeling something "Renault Kangoo" it refers to both of these
> > identical models sold under different names. But then, the "Nissan
> > Kubistar" is only equivalent to the first version/revision/release of
> > the "Renault Kangoo"...
> >
> > This is not unsolvable, but if you want to import or add data to an
> > element, it will be very hard to judge the correct concept from a short
> > label. I was hoping that linking this to Wikipedia articles would
> > help, but this will be hard if a Wikidata page is linked to 40
> > Wikipedias, of which any given Wikidata editor can read only a handful,
> > and there is no support for distinguishing between exactMatch and
> > closeMatch.
> >
> > My suggestion is to allow a differentiation between exactMatch and
> > closeMatch, and to instruct editors to use at least one exact match and
> > to consider this or these the defining Wikipedia pages, whereas others
> > are added as close matches.
> >
> > Of course, the label will remain useful for stumbling upon changes in
> > the definition or width of a concept over time, and for correcting those
> > after consulting the revision number against which the original link was
> > formed (not present, but perhaps achievable by some timestamping and
> > comparison?)
> >
> > Gregor
> >
> > _______________________________________________
> > Wikidata-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wikidata-l
> >
> >
> >
> > --
> > Project director Wikidata
> > Wikimedia Deutschland e.V. | Eisenacher Straße 2 | 10777 Berlin
> > Tel. +49-30-219 158 26-0 | http://wikimedia.de
> >
> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> > Registered in the register of associations of the Amtsgericht
> > Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
> > organization by the Finanzamt für Körperschaften I Berlin, tax number
> > 27/681/51985.


