+cc: Paul Miller of Talis, who worked on the AHDS report mentioned below.

Henri Sivonen wrote:
On Aug 23, 2008, at 02:43, Ben Adida wrote:

Why would you reinvent URIs in a way that they can't be de-referenced?

To avoid having misleading affordances.
http://en.wikipedia.org/wiki/Affordance

We want one parser, with variability and innovation in the vocabulary definition only.

Having one parser seems appealing compared to using the native mechanisms of each of HTML (<meta>, <link>), PDF (document information dictionary), PNG (tEXt chunk), etc. at first, but the vision that tools handle this all when you remix culture already requires the tools to support reading and writing the file formats they remix. When you already have format-native key-value read/write capability, the ability to build and mine RDF *graphs* becomes an additional burden.

It may not be obvious to those who haven't followed the history, or who were at school at the time, but many of us did indeed invest a lot of time and effort using name/value metadata structures in HTML. For example, the Dublin Core project began with this technology base back in 1994/5, and the experience of metadata implementors using it was one of the drivers for the creation of RDF. At the time there was no WHATWG to talk to, but the metadata community *did* talk to W3C.

See http://dublincore.org/about/history/

Early on, the Dublin Core community found a lot of pressure for feature-creep: new elements/terms to address the needs of various groups who liked Dublin Core, but wanted some specifics added. This situation gave rise to the 'Warwick Framework', defined in 1996 - http://www.dlib.org/dlib/july96/lagoze/07lagoze.html
[[
While there was consensus among the attendees that the concept of a simple metadata set is useful, there were a number of fundamental questions concerning the real utility of the Dublin Core as it was defined at the end of the preceding workshop. Does the very loosely defined Dublin Core really qualify as a "standard" that can be read and processed programmatically? Should the number of the core elements be expanded, to increase semantic richness, or reduced, to improve ease-of-use by authors and/or web publishers? Will authors reliably attach core metadata elements to their content? Should a core metadata set be restricted to only descriptive cataloging information or should it include other types of metadata such as administrative information, linkage data, and the like? What is the relationship of the Dublin Core to other developing work in metadata schemes, particularly in those areas such as rights management information (terms and conditions)?

The workshop attendees concluded that the answer to these questions and the route to progress on the metadata issue lay in the formulation a higher-level context for the Dublin Core. This context should define how the Core can be combined with other sets of metadata in a manner that addresses the individual integrity, distinct audiences, and separate realms of responsibility of these distinct metadata sets.
]]

For an implementor report typical of the experience from this era, i.e. with name/value pairs, see the UK Arts and Humanities Data Service document http://ahds.ac.uk/public/metadata/discovery.html which was presented at the Oct '97 Helsinki workshop of the Dublin Core. At the time I was involved with the ROADS internet cataloguing project and can vouch that we hit a similar ceiling with attribute/value metadata.

From the appendix, http://ahds.ac.uk/public/metadata/disc_09.html ... here are some of the attribute/value structures they were forced to squash their metadata records into.


DC.creator.corporateName.1
        Canterbury Archaeological Trust

DC.creator.phone.1
        +44 227 462062


DC.creator.personalName.2
        Paul Miller

DC.creator.affiliation.2
        Archaeology Data Service

...this expresses name, affiliation and contact information for a number of contributors to a work. Another example describes several contributors along with their roles (actor, director, etc). Again the attribute/value representations contained numeric indexes ('DC.creator.role.9') to disambiguate which individual was being described.
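To make the cost of those numeric indexes concrete, here is a minimal Python sketch of what a consumer has to do to recover per-contributor records from the flat pairs quoted above. The values are the ones from the AHDS excerpt; the regroup() helper and the assumption that every key has exactly four dot-separated parts are mine, for illustration only, and not anything the AHDS report specifies.

```python
from collections import defaultdict

# Flat attribute/value pairs copied from the AHDS appendix excerpt above.
flat = [
    ("DC.creator.corporateName.1", "Canterbury Archaeological Trust"),
    ("DC.creator.phone.1", "+44 227 462062"),
    ("DC.creator.personalName.2", "Paul Miller"),
    ("DC.creator.affiliation.2", "Archaeology Data Service"),
]


def regroup(pairs):
    """Rebuild one record per (element, index) from index-suffixed keys.

    Hypothetical helper: assumes keys look like scheme.element.qualifier.index.
    """
    records = defaultdict(dict)
    for key, value in pairs:
        _scheme, element, qualifier, index = key.split(".")
        records[(element, int(index))][qualifier] = value
    return dict(records)


creators = regroup(flat)
for (element, index), fields in sorted(creators.items()):
    print(element, index, fields)
```

The structure only exists by naming convention: every consumer has to know how to split the key and that the trailing number ties the values together, which is the ceiling we kept hitting.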


What barrier is there to building reusable vocabularies?

The follow-your-nose principle is missing, which is fairly essential for
discovering the meaning of vocabularies (partially automatically, not by
doing a Google search.)

The partial automation with RDFa doesn't go very far. If a program automatically dereferences http://creativecommons.org/ns# and parses the result as RDFa, the program now has a human-readable string for each property--not exactly something that the program can act on further without human help.


Looking at this example,

          <div id="license" about="#license" typeof="rdf:Property">
              <h4>cc:license</h4>
A <a rel="rdfs:domain" href="#Work">Work</a> <span property="rdfs:label">has license</span> a <a rel="rdfs:range" href="#License">License</a>. <br />

(a <a rel="rdfs:subPropertyOf" href="http://purl.org/dc/terms/license">subproperty of dc:license</a>, <a rel="owl:sameAs" href="http://www.w3.org/1999/xhtml/vocab#license">the same as xhtml:license</a>)
          </div>


Actually we can do a fair bit more than simply have human readable strings. For example from the CC case, we've got a sub-property relationship between cc:license and dc:license. RDF often (more often, even) has relationships amongst classes too, and between classes and properties. So for example, the SIOC vocabulary defines a class sioc:User as a subclass of foaf:OnlineAccount; this is mechanically evident from http://rdfs.org/sioc/ns# .... similarly, http://trac.usefulinc.com/doap defines the DOAP vocabulary, schema here - http://usefulinc.com/ns/doap# (webserver misconfigured re mimetype right now). DOAP defines a class doap:Project that subclasses FOAF's 'Project' class, and which comes with a number of properties describing opensource software projects. Again this is mechanically evident.

As the ccREL paper explains, and I can confirm w.r.t. FOAF, it is very useful to allow related projects to define related classes and properties but manage their evolution separately. It's a strategy for making incremental progress without a single project/organization carrying the burden of total coordination. Edd and friends in the DOAP project, for example, can keep developing new properties for describing projects.

Elsewhere in the Web, we can be annotating the URI for 'foaf:Project' eg. with translations. http://svn.foaf-project.org/foaftown/foaf18n/foaf-kr.rdf tells us that a Korean rdfs:label for http://xmlns.com/foaf/0.1/Project is "프로젝트 (어 떤 형태의 협업).". The DOAP list is busy figuring out how they might want (within DOAP or elsewhere, depending on complexity) to model customer relationships w.r.t. DOAP's notion of project, see http://lists.usefulinc.com/pipermail/doap-interest/2008-August/000338.html ... but whatever they come up with will be linked back to other information about FOAF's broader notion of Project.

So while it is useful to have human readable strings (including translations) we also get simple relationships between independently defined vocabulary terms. RDFS basics here are sub-property, sub-class, range and domain. Without clear Web identifiers for vocabulary terms I believe this kind of distributed, collaborative approach becomes significantly harder. And I believe the experience of many in the Dublin Core metadata scene since the mid-90s backs this up...
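As a rough illustration of what a program can do with those relationships without human help, here is a small Python sketch that applies just the sub-property and sub-class rules to a handful of triples. The schema triples are the ones mentioned in this message; the ex: instance data and the infer() function are hypothetical, and a real application would more likely use an RDF toolkit than this hand-rolled loop.

```python
SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Schema statements as described in this message.
schema = {
    ("cc:license", SUBPROP, "dc:license"),
    ("sioc:User", SUBCLASS, "foaf:OnlineAccount"),
    ("doap:Project", SUBCLASS, "foaf:Project"),
}

# Hypothetical instance data, invented for this example.
data = {
    ("ex:photo", "cc:license", "ex:some-license"),
    ("ex:myproject", TYPE, "doap:Project"),
}


def infer(triples):
    """Apply the sub-property and sub-class rules until nothing new appears."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                # (s p o) and (p subPropertyOf q) gives (s q o)
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
                # (s type C) and (C subClassOf D) gives (s type D)
                if p2 == SUBCLASS and p == TYPE and s2 == o:
                    new.add((s, TYPE, o2))
        if new <= closed:
            return closed
        closed |= new


inferred = infer(schema | data)
for triple in sorted(inferred - schema - data):
    print(triple)
```

A consumer that only understands dc:license or foaf:Project can still make use of data published with the CC or DOAP terms, and it learns that from the schemas rather than from a person reading documentation.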

cheers,

Dan

--
http://danbri.org/

For an example of browsing this kind of data structure btw see http://mqlx.com/~david/parallax/
