On 02/04/12 19:52, Gregor Hagedorn wrote:
I certainly appreciate your experience in system design, Markus!
Still, I feel strongly about this so I poke another time. :-)

We should take care not to overrate this topic. There are hundreds of thousands of articles that have a unique, exact match between different language versions. Cities, countries, people, works of art, species, astronomical objects, chemical elements, car models, airports, ... I could continue forever -- all of these entities have a clear-cut, agreed-upon identity that is not language-dependent and that suffices for most purposes. I would go further and say that, if a concept has no such clear identity, then it is much less useful to store data about it. I am not at all concerned that different language versions of Wikipedia will use largely disjoint sets of Wikidata items due to small (but somehow essential) differences in meaning. It will happen, but usually for good reasons or as a temporary problem that can be addressed.

Please note that it does not matter if different communities have different policies about what is written in an article about, say, a car model. Of course, two such articles will never match exactly, and one will always have a bit more or less information than the other. However, for Wikidata it is only important that they are about the same car model. This will be the case for a large number of articles.

Regards,

Markus



My analysis is that one of the two doors should be closed in the first
iteration. This iteration could start with a clean system that is
easily analyzed and has the potential of a better learning curve (it
does not delegate to the user the difficulty of deciding which door
to use). If it turns out that both kinds of relations are needed, I
believe it is possible to add them in the next iteration.

My interpretation of the options:

Solution 1 (my preference for the sake of simplicity):
========
A wikidata page reflects an independent entity that may be an exact,
close, narrower or broader match with several Wikipedia language
versions. That is, each Wikidata page has relations to 0-n Wikipedias,
Wiktionaries, or Commons (NEW PROPOSAL to extend beyond Wikipedia
alone), each of which is labeled by exact/close/narrower/broader match
(SKOS vocabulary).

In addition (NEW PROPOSAL), it allows expressing relations to other
definitions outside of Wikipedias, Wiktionaries, or Commons,
especially where the entity is complex to understand or to map to
Wikipedia entities. This would be a natural extension of the
Wikipedias, Wiktionaries, or Commons to the entire semantic web.

An option to be researched over time would be whether a "defining
link" qualifier must be present on one relation, pointing either to a
Wikipedia permalink to a given version or to an external permalink.

The great advantage of this system to me is that it can deal
efficiently with data where an import is desired, but where the exact
mapping to Wikipedia pages is difficult to ascertain.
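
To make Solution 1 concrete, here is a minimal sketch in Python. It is
an illustration under stated assumptions, not a proposed
implementation: the class names (Match, Link, Item), the method names,
and the target strings such as "enwiki:Example" are all hypothetical.
Only the four SKOS match types come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Match(Enum):
    # The four SKOS mapping relations named in Solution 1.
    EXACT = "skos:exactMatch"
    CLOSE = "skos:closeMatch"
    NARROWER = "skos:narrowMatch"
    BROADER = "skos:broadMatch"


@dataclass(frozen=True)
class Link:
    # Hypothetical target identifier: a wiki page such as
    # "enwiki:Example" or an external URI.
    target: str
    match: Match


@dataclass
class Item:
    # One independent Wikidata entity with 0-n labeled links.
    item_id: str
    links: List[Link] = field(default_factory=list)

    def add_link(self, target: str, match: Match) -> None:
        self.links.append(Link(target, match))

    def targets(self, match: Match) -> List[str]:
        # All link targets carrying the given match label.
        return [link.target for link in self.links if link.match is match]
```

In this sketch, a change in the scope of a Wikipedia article only
requires relabeling one link; no item is created or deleted.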


Solution 2 (clear-cut version of your present proposal, perhaps the
cleanest solution):
========

A wikidata page reflects 0-1 Wikipedias, Wiktionaries, or Commons
pages. Relations such as exact/close/narrower/broader between the
language versions (the interlanguage links) are stated only between
Wikidata pages.

Advantage to me: Clear-cut design. As Wikipedia pages develop and
become broader or narrower in scope, only the relations between
wikidata objects need to be changed. However, property data of the
Wikidata page may become false as the linked Wikipedia page in a given
language changes.
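
A minimal Python sketch of Solution 2, again only an illustration: the
names Item, Graph, relate and interlanguage_pages are hypothetical, and
so are the relation strings and page identifiers. Each item holds at
most one page, and all relations live between items.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Item:
    item_id: str
    page: Optional[str] = None  # at most one linked page (0-1)


class Graph:
    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}
        # Each relation is (source_id, relation, target_id).
        self.relations: Set[Tuple[str, str, str]] = set()

    def add_item(self, item_id: str, page: Optional[str] = None) -> None:
        self.items[item_id] = Item(item_id, page)

    def relate(self, source: str, relation: str, target: str) -> None:
        self.relations.add((source, relation, target))

    def interlanguage_pages(self, item_id: str) -> List[str]:
        # Pages of items related by an exact or close match. These two
        # relations are treated as symmetric here (an assumption), so
        # both directions are followed.
        pages = []
        for source, relation, target in self.relations:
            if relation not in ("exact", "close"):
                continue
            if source == item_id:
                other = target
            elif target == item_id:
                other = source
            else:
                continue
            page = self.items[other].page
            if page is not None:
                pages.append(page)
        return sorted(pages)
```

Here a change of scope means editing an entry in the relations set; the
items themselves and their page links stay untouched.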


Solution 3 (your present preference to express relations with two
structurally separate means):
========
A wikidata page reflects 0-n Wikipedias, Wiktionaries, or Commons
pages. If two pages are an exact or close match, they are stored as
multiple Wikipedia-Links to a single Wikidata page. A differentiation
between close and exact match is not possible.

If a given language version of Wikipedia is sufficiently different
from another one, it must be linked to a newly created, independent
Wikidata page. Relations between Wikidata pages can be qualified as
narrower/broader (but not as close or exact match; such pages are
required to share the same Wikidata page).

Advantages as I see them: None.

Potential problems with solution 3:

1. The user is expected to use two widely different actions depending
on whether two language versions are sufficiently closely matching or
not. The actions are structurally different, and it is unlikely that
this can be hidden by the user interface (because wikidata page object
creation and deletion are involved). The burden of the decision where
to draw the line is left to the user community. Revisions of this
decision within the community require creating or deleting Wikidata
objects, and are therefore likely to be difficult to make transparent.

2. Scenario: If two Wikipedia language versions describe more or less
the same abstract object, but one is later revised to be narrower and
the other to be broader, a careful study of the changes in the
revisions since the creation of the wikidata page object is required
to decide which Wikidata page remains linked to a Wikipedia page, and
for which revision a new one must be created. Or whether perhaps two
new ones must be created?
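
The two structurally different actions from problem 1 can be sketched
in Python as follows. This is a rough illustration, not the actual
design: the function names link_same and split_off, the plain dict and
set used as storage, and all identifiers are hypothetical.

```python
def link_same(items, item_id, page):
    # Action A: the pages match closely enough, so the page is simply
    # attached to an existing (or new) item.
    items.setdefault(item_id, set()).add(page)


def split_off(items, relations, item_id, page, new_id, relation):
    # Action B: the page has drifted in scope, so it is detached, a new
    # item is created for it, and the two items are related as
    # narrower/broader.
    items[item_id].remove(page)
    items[new_id] = {page}
    relations.add((new_id, relation, item_id))
```

Reverting such a decision means deleting the new item again, which is
the object creation and deletion referred to in problem 1.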

Apologies for pestering you with this...

Gregor

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l



--
Dr. Markus Kroetzsch
Department of Computer Science, University of Oxford
Room 306, Parks Road, OX1 3QD Oxford, United Kingdom
+44 (0)1865 283529               http://korrekt.org/
