Re: [Wikidata-l] Namespace-based model

2012-04-05 Thread Denny Vrandečić
John,

your suggestion has two requirements that I think are hard to achieve:

* first, we need an agreement on the set of (non-overlapping but complete)
types that exist in the world
* second, we would need to assume that the Wikidata editors would agree on
one and exactly one type for every item, and not change that anymore. And
this would seem to have to happen when the item is created.

I find both assumptions rather strong. What type would Tuesday be? Or the
Roman Catholic religion? What type is Love? Who decides on the types?
What are the conditions for typeness? You give Person and Place and Date as
types. So Obama is a Person. Is Gollum a Person? HAL-9000? Noah, the
builder of the ark? Enos, the chimpanzee that traveled into space?

I think the assumption that everything has exactly one type is
oversimplifying.

Cheers,
Denny


2012/4/5 John McClure jmccl...@hypergrove.com


 Wiki namespaces are currently so underused that people may not realize their
 importance: they provide crucial semantic information. For instance,
 consider the example given in Wikidata's data model article[1]:
  Obama was US Senator from Illinois from January 3, 2005 to November 16,
 2008

 which yielded these observations:

- mainSnak of type PropertyValueSnak with subject Obama, property
US Senator from, and value Illinois
- auxiliary Snak of type PropertyIntervalSnak with property in
office and interval January 3, 2005 to November 16, 2008 (the subject of
the auxiliary Snak is always the statement itself).
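A minimal Python sketch of the Snak structure just described, for readers who find the terminology opaque. PropertyValueSnak and PropertyIntervalSnak are the names used in the data-model draft; Statement, prop, start, end and describe() are hypothetical illustrative names, not Wikidata's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class PropertyValueSnak:
    """Main Snak: a subject, a property (hypothetically named 'prop'), and a value."""
    subject: str
    prop: str
    value: str

@dataclass
class PropertyIntervalSnak:
    """Auxiliary Snak; its implicit subject is the enclosing statement itself."""
    prop: str
    start: date
    end: date

@dataclass
class Statement:
    """Hypothetical container: one main Snak plus zero or more auxiliary Snaks."""
    main_snak: PropertyValueSnak
    auxiliary_snaks: List[PropertyIntervalSnak] = field(default_factory=list)

    def describe(self) -> str:
        s = self.main_snak
        parts = [f"{s.subject} {s.prop} {s.value}"]
        for aux in self.auxiliary_snaks:
            parts.append(f"({aux.prop}: {aux.start.isoformat()} to {aux.end.isoformat()})")
        return " ".join(parts)

obama_senator = Statement(
    main_snak=PropertyValueSnak("Obama", "US Senator from", "Illinois"),
    auxiliary_snaks=[PropertyIntervalSnak("in office", date(2005, 1, 3), date(2008, 11, 16))],
)
print(obama_senator.describe())
# prints: Obama US Senator from Illinois (in office: 2005-01-03 to 2008-11-16)
```

Note that the interval hangs off the statement, not off Obama; this is the point the namespace-based alternative below disputes.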

 An alternative lexical model might restate this as
  The US Senator for the place Illinois is/was the person Obama from date
 January 3, 2005 until date November 16, 2008.

- the primary resource being described is a US Senator page, not so much
the Obama page
- Person:Obama is the subject complement of this US Senator via the
linking verb-property 'was' or 'is'
- 'for' is a property of this US Senator whose value is
Place:Illinois
- 'from' is a property of this US Senator with the value Date:January
3, 2005
- 'until' is a property of this US Senator with the value
Date:November 16, 2008
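A minimal sketch of the namespace-based alternative, assuming a hypothetical Page record and a split_name() helper (neither is an existing MediaWiki API). It shows the claim being made: the type of each property value can be read directly off its "Namespace:Title" prefix.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

def split_name(page_name: str) -> Tuple[str, str]:
    """Split 'Namespace:Title' into (namespace, title); no colon means the main namespace ''."""
    namespace, sep, title = page_name.partition(":")
    return (namespace, title) if sep else ("", page_name)

@dataclass
class Page:
    name: str          # e.g. "Senator:Barack H Obama"
    page_type: str     # e.g. "US Senator"
    complement: str    # subject complement linked by 'was' / 'is'
    properties: Dict[str, str] = field(default_factory=dict)

    def value_types(self) -> Dict[str, str]:
        # The namespace prefix of each value doubles as its type.
        return {key: split_name(value)[0] for key, value in self.properties.items()}

senator = Page(
    name="Senator:Barack H Obama",
    page_type="US Senator",
    complement="Person:Obama",
    properties={
        "for": "Place:Illinois",
        "from": "Date:January 3, 2005",
        "until": "Date:November 16, 2008",
    },
)
print(senator.value_types())  # {'for': 'Place', 'from': 'Date', 'until': 'Date'}
```

Here no separate interval Snak is needed: 'from' and 'until' are ordinary properties of the Senator page.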

 A significant point is that this US Senator page is named Senator:Barack H
 Obama (or Legislator:Barack H Obama, Public Employee:Barack H Obama,
 etc.); it is of type US Senator, and it has three properties: for,
 from, and until. In other words, if the content from this page is to be
 shown on the Person:Barack H Obama page, then that content should be
 transcluded from the Senator page; its semantic markup need not be
 duplicated, because software can interpret transcluded material as being a
 subject of, or organic to, the Person page.

 Lastly, I really don't know how developers will cognitively absorb made-up
 words like Snak; the need for the term mystifies me somewhat. By contrast,
 everyone seems to get namespaces and to appreciate the clarity they provide.
 I hope concepts like namespaces can be as prominent at this stage
 as Snaks are in the Wikidata model. Regards,
 --Hypergrove (talk) 03:03, 5 April 2012 (UTC)

 [1] http://meta.wikimedia.org/w/index.php?title=Talk:Wikidata/Data_model

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l




-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Eisenacher Straße 2 | 10777 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Berlin-Charlottenburg local
court under number 23855 B. Recognized as a charitable organization by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.


Re: [Wikidata-l] Namespace-based model

2012-04-05 Thread John McClure
It's more accurate to say that your belief is an artifact of present tools.
RDF has just one way to associate a class with a resource: the rdf:type
property. Precisely because RDF makes no distinction between classes
that represent a type-of-thing (e.g. Character) and classes that represent a
facet-of-thing (e.g. Fictional), present tools must allow multiple classes to
be associated with any resource, since a given resource can obviously have
multiple facets. In my work I store facet-classes in the Dublin Core
Coverage and Format properties and a single existential-class in the
Dublin Core Type property for the page; the page's template restates both
kinds of classes as Categories for the page (hence my piqued email asking that
existential classes at least be defined in a namespace separate from Category).

So if no distinction is made, multiple types are indeed necessary. If
a distinction between nouns and adjectives is made, then what is needed is
one type plus multiple facets.
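A minimal sketch of the "one type plus multiple facets" scheme described above, assuming a hypothetical TypedResource record whose field names mirror the Dublin Core Type, Coverage and Format properties. It contrasts that view with the flat list of rdf:type triples plain RDF would see, where nouns and adjectives become indistinguishable. Which facet is stored in Coverage versus Format is an arbitrary choice here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class TypedResource:
    """One existential class (a noun) plus any number of facet classes (adjectives)."""
    name: str
    dc_type: str                                # single existential class, e.g. "Character"
    dc_coverage: FrozenSet[str] = frozenset()   # facet classes kept in Coverage
    dc_format: FrozenSet[str] = frozenset()     # facet classes kept in Format

    def facets(self) -> FrozenSet[str]:
        return self.dc_coverage | self.dc_format

    def all_classes(self) -> List[str]:
        # Existential class first, then facets in a stable (sorted) order.
        return [self.dc_type] + sorted(self.facets())

    def rdf_type_triples(self) -> List[Tuple[str, str, str]]:
        # Plain RDF view: every class becomes an rdf:type, so the noun/adjective split is lost.
        return [(self.name, "rdf:type", c) for c in self.all_classes()]

    def categories(self) -> List[str]:
        # What a page template might emit: both kinds of classes as Categories.
        return ["Category:" + c for c in self.all_classes()]

gollum = TypedResource("Gollum", "Character",
                       dc_coverage=frozenset({"Fictional"}),
                       dc_format=frozenset({"Human"}))
print(gollum.rdf_type_triples())
```

The existential class stays singular in dc_type even though the flattened RDF view carries three rdf:type statements.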
  -----Original Message-----
  From: John McClure [mailto:jmccl...@hypergrove.com]
  Sent: Thursday, April 05, 2012 7:08 PM
  To: Wikidata (E-mail)
  Subject: [Wikidata-l] Namespace-based model


  Denny said:
  I think the assumption everything has exactly one type is oversimplifying

  The assumption that everything is of multiple types is over-complicating.
  Usually you can tell from the first sentence of the Wikipedia article:

  Tuesday is a day of the week
  Love is an emotion
  (Roman) Catholicism is a faith
  Gollum is a fictional character
  HAL-9000 is a character
  Noah is a Patriarch
  Enos was the first chimpanzee to orbit the Earth

  So consensus certainly is being achieved among thousands of authors about
the fundamental type of thing each of these pages represents. Disambiguation
pages very commonly reference these types of things, as in Enos
(chimpanzee).

  Let's take Gollum. I can imagine a topic map having these subjects:
  1. Character
  1A. Fictional character
  1A1. Fictional person
  1A2. Fictional animal
  1A3. Fictional ghost
  1A4. Fictional god

  Another equally valid assertion is that Gollum is a Character that is
typed as Fictional and Human (both adjectives being instances of
owl:Class), so that a comprehensive system could some day infer that
Gollum is actually a Fictional person.
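A minimal sketch of how such a future system might refine "Character + {Fictional, Human}" into "Fictional person", assuming a hypothetical parent table built from the topic map above and a hypothetical hand-written rule table. Real systems would use an OWL reasoner; this only shows the idea of picking the most specific subject whose required facets are all present.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical topic map from the example: subject -> parent subject.
TOPIC_PARENT: Dict[str, Optional[str]] = {
    "Character": None,
    "Fictional character": "Character",
    "Fictional person": "Fictional character",
    "Fictional animal": "Fictional character",
    "Fictional ghost": "Fictional character",
    "Fictional god": "Fictional character",
}

# Hypothetical rules: (existential type, required facets) -> more specific subject.
REFINEMENTS: Dict[Tuple[str, FrozenSet[str]], str] = {
    ("Character", frozenset({"Fictional"})): "Fictional character",
    ("Character", frozenset({"Fictional", "Human"})): "Fictional person",
    ("Character", frozenset({"Fictional", "Animal"})): "Fictional animal",
}

def refine(type_: str, facets: FrozenSet[str]) -> str:
    """Return the most specific subject whose required facets are all present."""
    best, best_size = type_, -1
    for (t, required), subject in REFINEMENTS.items():
        if t == type_ and required <= facets and len(required) > best_size:
            best, best_size = subject, len(required)
    return best

def ancestors(subject: str) -> List[str]:
    """Walk up the topic map from a subject to its root."""
    chain: List[str] = []
    parent = TOPIC_PARENT.get(subject)
    while parent is not None:
        chain.append(parent)
        parent = TOPIC_PARENT.get(parent)
    return chain

print(refine("Character", frozenset({"Fictional", "Human"})))  # Fictional person
print(ancestors("Fictional person"))  # ['Fictional character', 'Character']
```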

  As you say yourself, it's not useful to create a perfect system to
handle every imaginable edge case **to the extent that they exist**.
Personally, I don't believe such edge cases can be found; I challenge anyone
to provide me with such an example.

  But more to the point of Wikidata: I don't believe for a second that WP
will be reorganized into thousands of namespaces. Rather, I believe, first,
that SUBOBJECT names must include the idea of 'namespace' for the efficiencies
gained, and second, that WP pages should be associated with the same set of
nouns (noun phrases) available for subobject names. IOW, whether a wiki's
pages are named using these namespaces, so that the wiki as a whole gains the
same inherent efficiencies I've sketched for subobjects, is an implementation
issue.

  Best - john