[RDA-L] Triumvirate of giants committed to scheming

2011-06-21 Thread Bernhard Eversberg

Some of us have anticipated that one day Google would enter the
metadata arena with an approach entirely their own.
Now, this seems to have happened. But Google is not making the move
alone; they have forged an unprecedented triumvirate with
their two biggest competitors, Microsoft and Yahoo:

  http://schema.org

Didn't we also expect that their design would bear scant
resemblance with anything the library world has ever come up
with? And it is true. There's also no similarity with the
Dublin Core, for that matter. OTOH, their vision is far removed
from anything like catalog cards, just what we've been dreaming
of, is it not? Even better, it is a record-free concept.

The word "metadata", not to speak of "catalog", has obviously
been carefully avoided, for whatever reason. There is also no
pondering of functional requirements or user tasks, and a
closer look reveals, in particular, that the FRBR user tasks
cannot have been of any concern in their reasoning. There is,
however, something akin to an authority concept for persons.
The whole scheme primarily addresses the tasks of the SEO, the Search
Engine Optimizer, and these do not necessarily coincide with the
interests of the search engine user in every search situation, AGWS.

"Structured markup", rather than "metadata", is the much-used term
in the documentation. It is based on w3.org's Microdata
(http://dev.w3.org/html5/md-LC/), and the gist of it all appears
to be this:
   By adding additional tags to the HTML of your web pages -- tags
   that say, "Hey search engine, this information describes this
   specific movie, or place, or person, or video" -- you can help
   search engines and other applications better understand your
   content and display it in a useful, relevant way.
   Microdata is a set of tags, introduced with HTML5,
   that allows you to do this.
For now, microdata can be applied only to HTML documents.
Unlike DC, microdata tags can be spread out all over the file,
placed just where they apply. That means the metadata
for a Web page is tightly integrated with the content: it does not
form a record for the page as a whole, it can describe any number
of parts of the page, and it is useless if ripped out of context.
It could thus not become an easy successor to MARC, in which records
stand in as surrogates for resources.
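To make the point about scattered, in-context markup concrete, here is a rough Python sketch (standard library only) that walks an HTML page and groups each `itemprop` value under its nearest enclosing `itemscope`. It is a simplification, not the W3C Microdata processing algorithm: it assumes well-formed markup, takes an element's text content as the value (or `content`/`src`/`href` for void elements such as `meta` and `img`), and ignores `itemref` and `itemid`. The sample page and the names `MicrodataExtractor` and `extract_items` are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never get an end tag in HTML.
VOID_ELEMENTS = {"img", "meta", "link", "br", "hr", "input", "source"}


class MicrodataExtractor(HTMLParser):
    """Simplified microdata reader: groups itemprop values by itemscope.

    Not the W3C algorithm: assumes well-formed markup, uses text content
    as the value (content/src/href for void elements), ignores itemref/itemid.
    """

    def __init__(self):
        super().__init__()
        self.items = []   # one dict per itemscope, in document order
        self.scopes = []  # indices of the currently open items
        self.open = []    # per open element: [opens_scope, prop, owner, text_parts]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        # A property belongs to the innermost item enclosing this element.
        owner = self.scopes[-1] if self.scopes else None
        if owner is None:
            prop = None  # a property outside any item is ignored here
        opens_scope = "itemscope" in attrs
        if opens_scope:
            self.items.append({"type": attrs.get("itemtype"), "props": []})
            self.scopes.append(len(self.items) - 1)
        if tag in VOID_ELEMENTS:
            if prop:
                value = attrs.get("content") or attrs.get("src") or attrs.get("href") or ""
                self.items[owner]["props"].append((prop, value))
            if opens_scope:
                self.scopes.pop()
            return
        self.open.append([opens_scope, prop, owner, []])

    def handle_data(self, data):
        # Text counts toward every enclosing element that carries itemprop.
        for entry in self.open:
            if entry[1]:
                entry[3].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.open:
            return
        opens_scope, prop, owner, parts = self.open.pop()
        if prop:
            value = " ".join("".join(parts).split())  # collapse whitespace
            self.items[owner]["props"].append((prop, value))
        if opens_scope:
            self.scopes.pop()


def extract_items(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


page = """<html><body>
<h1>Library news</h1>
<div itemscope itemtype="http://schema.org/Book">
  <p>This week: <span itemprop="name">Moby-Dick</span>
  by <span itemprop="author">Herman Melville</span>.</p>
</div>
<p>Opening hours are unchanged.</p>
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">James Cameron</span>
  (born <span itemprop="birthDate">August 16, 1954</span>)
</div>
</body></html>"""

for item in extract_items(page):
    print(item["type"], item["props"])
```

The unmarked paragraph ("Opening hours ...") simply produces nothing, and each item's properties can only be collected by walking the page around them, which is the "useless if ripped out of context" point above.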

All of that sounds pretty remote from what we need and what we
are doing, and why not indeed. But if this thing picks up speed (not
totally unlikely, considering who's involved), we had better take a look.
If it doesn't, we may still learn a bit from the way it fails.

Reproduced here, for the record (no pun intended), is the list of
attributes for their Book schema. Note what they regard as important
and what not.
Book is on the third level of an object hierarchy:

Thing / CreativeWork / Book

http://schema.org/Book   (contains example, as of now, draft version 0.9)
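The three-level hierarchy means a Book accepts every property defined for CreativeWork and for Thing. A minimal Python sketch of that inheritance, assuming plain-string values throughout (the draft actually types `author` as Person or Organization, `about` as Thing, and so on) and using only a subset of the CreativeWork properties; the class and function names are illustrative, not part of schema.org:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Thing:
    # Properties the draft lists "from Thing".
    description: Optional[str] = None
    image: Optional[str] = None  # a URL
    name: Optional[str] = None
    url: Optional[str] = None


@dataclass
class CreativeWork(Thing):
    # A subset of the properties the draft lists "from CreativeWork",
    # flattened to plain strings for this sketch.
    about: Optional[str] = None
    author: Optional[str] = None
    datePublished: Optional[str] = None
    genre: Optional[str] = None
    inLanguage: Optional[str] = None  # an IETF BCP 47 tag, e.g. "en"


@dataclass
class Book(CreativeWork):
    # Book's own properties are not part of the list reproduced in this
    # message, so none are added; Book simply inherits everything.
    pass


def property_names(cls):
    """All properties a type accepts, inherited ones first."""
    return [f.name for f in fields(cls)]


moby = Book(name="Moby-Dick", author="Herman Melville", inLanguage="en")
print(isinstance(moby, CreativeWork), isinstance(moby, Thing))
print(property_names(Book))
```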

PROPERTY
   TYPE             DESCRIPTION

Properties from Thing

description
   Text             A short description of the item.
image
   URL              URL of an image of the item.
name
   Text             The name of the item.
url
   Text             URL of the item.

Properties from CreativeWork

about
   Thing            The subject matter of the content.
aggregateRating
   AggregateRating  The overall rating, based on a collection of
                    reviews or ratings, of the item.
audio
   AudioObject      An embedded audio object or URL assoc. w. the content.
author
   Person or Organization
                    The author of this content. Please note that author
                    is special in that HTML 5 provides a special
                    mechanism for indicating authorship via the rel tag.
                    That is equivalent to this and may be used
                    interchangeably.
awards
   Text             Awards won by this person or for this creative work.
contentLocation
   Place            The location of the content.
contentRating
   Text             Official rating of a piece of content,
                    for example, 'MPAA PG-13'.
datePublished
   Date             Date of first broadcast/publication.
editor
   Person           Editor for this content.
encodings
   MediaObject      The media objects that encode this creative work.
genre
   Text             Genre of the creative work.
headline
   Text             Headline of the article.
inLanguage
   Text             The language of the content. Please use one of the
                    language codes from the IETF BCP 47 standard.
interactionCount
   Text             A count of a specific user interaction with this
                    item, for example, 20 UserLikes, 5 UserComments,
                    or 300 UserDownloads. The user interaction type
                    should be one of the sub types of UserInteraction.
isFamilyFriendly
   Boolean          Indicates whether this content is family friendly (!)
keywords
   Text             The

Re: [RDA-L] Triumvirate of giants committed to scheming

2011-06-21 Thread James Weinheimer

On 21/06/2011 11:08, Bernhard Eversberg wrote:
<snip>
> Some of us have anticipated that one day Google would enter the
> metadata arena with an approach entirely their own. Now, this seems to
> have happened. But not just Google alone is making the move, they have
> forged an unprecedented triumvirate with their two biggest
> competitors, Microsoft and Yahoo:
>
> http://schema.org
</snip>

Stu Weibel had an interesting blog post on this at
http://weibel-lines.typepad.com/weibelines/2011/06/uncommon-cause.html.
He says: "Will they achieve semantic web goals? Perhaps incrementally,
but I suspect not a lot. The goal is to sell more stuff, and
optimization will be based on that. To expect semantic value to ooze
from the seams of commercial advertising (no matter how structured)
seems unrealistic." I think he's right, but I personally can't blame
Google and Co. for opting for something immeasurably simpler than RDF
(almost anybody can implement the schema.org schema, which is not at all
the case with RDF) and certainly nothing like FRBR structures.
Plus, it's available now, not in 10 or 15 years at the earliest, when
everything we are doing today will have changed. If I were a publisher, I
would really be interested in this initiative--probably more interested
than in working with libraries, although, as Bernhard points out, if I were
a journal editor, I might not be too happy!


I think it would be wise for libraries to join this initiative if
possible. One clever project appears to be trying to co-opt schema.org by
putting it into RDF: http://schema.rdfs.org/
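The basic idea behind "putting it into RDF" can be sketched in a few lines: every schema.org type and property already has a URL (http://schema.org/Book, http://schema.org/name), so extracted values can be written as RDF triples. The Python below emits N-Triples by hand; it is not schema.rdfs.org's actual mapping, the subject IRI is a hypothetical example, and every value becomes a plain string literal (in real linked data the author would rather be a resource of its own).

```python
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text):
    """Escape backslash, double quote and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(subject_iri, type_name, props):
    """Turn (property, value) pairs into N-Triples using schema.org IRIs.

    Every value becomes a plain string literal; no datatypes, language
    tags or links to other resources are produced.
    """
    lines = [f"<{subject_iri}> <{RDF_TYPE}> <{SCHEMA}{type_name}> ."]
    for prop, value in props:
        lines.append(f'<{subject_iri}> <{SCHEMA}{prop}> "{escape_literal(value)}" .')
    return "\n".join(lines)


print(to_ntriples("http://example.org/books/moby-dick", "Book",
                  [("name", "Moby-Dick"), ("author", "Herman Melville")]))
```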


--
James Weinheimer  weinheimer.ji...@gmail.com
First Thus: http://catalogingmatters.blogspot.com/
Cooperative Cataloging Rules: http://sites.google.com/site/opencatalogingrules/


Re: [RDA-L] Triumvirate of giants committed to scheming

2011-06-21 Thread Ed Jones
It should be borne in mind that the focus of schema.org is search engine 
optimization, whereas the Semantic Web and linked data have somewhat more 
ambitious--if so far elusive--goals.

Ed Jones
National University (San Diego)

-----Original Message-----
From: Resource Description and Access / Resource Description and Access 
[mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of James Weinheimer
Sent: Tuesday, June 21, 2011 3:42 AM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: Re: [RDA-L] Triumvirate of giants committed to scheming


Re: [RDA-L] Triumvirate of giants committed to scheming

2011-06-21 Thread Croft, Emily
> Didn't we also expect that their design would bear scant
> resemblance with anything the library world has ever come up
> with?

The design or markup language might be unique (and I'm not saying it is), but
things like author, title, genre, subject, and date published are all standard.

> OTOH, their vision is far removed
> from anything like catalog cards, just what we've been dreaming
> of, is it not? Even better, it is a record-free concept.

Not exactly-- it's just that the record is embedded in the item itself (the
webpage). That's a property that libraries don't have access to because our
books exist on shelves, DVDs in drawers, articles in databases, etc. Websites
exist in the ether, with no catalogue or database, and search engines crawl
around looking for what's up there-- any schema that integrates authors,
publishers, dates, and subjects right in the website is doing what library
catalogues have always done: provide access. We have an insurmountable barrier
between record and item; websites don't. That doesn't make it record-free.

> That means the metadata
> for a Web page is tightly integrated with the content, it does not
> form a record for the page as a whole but it can describe any and
> many parts of it, but it is useless if ripped out of context.

This looks a lot like 100 |a to me:
<span itemprop="name">James Cameron</span> (born <span itemprop="birthDate">August 16, 1954</span>)
It would take minutes for a programmer to write a conversion from this markup 
to something an ILS could read.  
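As a rough illustration of that claim, the Python sketch below reads the itemprop values from the snippet and prints a display form of a MARC 100 field, using "|" as the subfield delimiter as above and "_" for a blank indicator. Everything here is an assumption for illustration: the helper names are made up, the parser only handles flat markup where each itemprop element holds plain text, and the name inversion (last word = surname) is naive string handling, not authority work.

```python
import re
from html.parser import HTMLParser


class FlatItempropParser(HTMLParser):
    """Collect the text of each itemprop element into a dict.

    Assumes flat markup: each itemprop element holds plain text only.
    """

    def __init__(self):
        super().__init__()
        self.props = {}
        self.current = None  # itemprop of the element being read, if any

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name:
            self.current = name
            self.props[name] = ""

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current:
            self.props[self.current] += data


def person_to_marc_100(props):
    """Build a display form of a MARC 100 field ('_' = blank indicator)."""
    words = props["name"].split()
    if len(words) > 1:
        # Naive inversion: the last word is taken as the surname.
        ind1, heading = "1", f"{words[-1]}, {' '.join(words[:-1])}"
    else:
        ind1, heading = "0", words[0]  # forename only
    field = f"100 {ind1}_ |a {heading}"
    # Only the year of birth goes into |d, as an open date.
    year = re.search(r"\b(\d{4})\b", props.get("birthDate", ""))
    if year:
        field += f", |d {year.group(1)}-"
    return field


snippet = ('<span itemprop="name">James Cameron</span> (born '
           '<span itemprop="birthDate">August 16, 1954</span>)')
parser = FlatItempropParser()
parser.feed(snippet)
parser.close()
print(person_to_marc_100(parser.props))  # 100 1_ |a Cameron, James, |d 1954-
```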

> It could thus not become an easy successor to MARC in which records
> stand in as surrogates for resources.

It works with MARC-- they're both markup languages.  A Schema record wouldn't 
be a full AACR2 record, but that is easily noted in the coding level field for 
anyone who cares.  
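In MARC 21 terms the "coding level field" is presumably Leader position 17 (Encoding level). A tiny Python sketch of flagging a schema.org-derived record that way; the choice of "3" (abbreviated level) for such records is an assumption, a cataloguing judgment call rather than an established practice, and the leader string and function name are hypothetical.

```python
# A few of the MARC 21 Leader/17 (Encoding level) values.
ENCODING_LEVELS = {
    " ": "Full level",
    "3": "Abbreviated level",
    "5": "Partial (preliminary) level",
    "7": "Minimal level",
}


def set_encoding_level(leader, level):
    """Return a copy of a 24-character leader with position 17 set to level."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    if level not in ENCODING_LEVELS:
        raise ValueError(f"unsupported encoding level {level!r}")
    return leader[:17] + level + leader[18:]


leader = "00000nam a2200000 a 4500"  # hypothetical leader; other positions untouched
schema_leader = set_encoding_level(leader, "3")
print(repr(schema_leader), ENCODING_LEVELS[schema_leader[17]])
```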

You're right that not all resources need a surrogate in the form of a record.  
But many still do-- not because of some inherent inferiority in library 
resources, but because of the separation between where our resources come from, 
how they are accessed, and how library patrons know we have them.

> All of that sounds pretty remote from what we need and what we
> are doing.

Nah... Google et al. may be in it to sell products, but it still has to
function-- it still has to identify the resource and ease access to it.

I can easily see a future where the library's catalog crawls the web for 
authoritative resources, searches the physical holdings (presuming there is 
such a thing in the future), and searches subscriber databases all together.  
Perhaps there could be a Schema tag for some sort of authoritative score, 
where librarians can rate websites (rather than a rating for how many liked 
this on Facebook, I mean).  But perhaps I'm getting ahead of myself.

Emily Croft
University of Redlands Library