Hi Manu,
I would like to start by saying that we are always open to dialog!
What I would like to add, simply because it seems to be a common
misunderstanding, is that you can use any and all vocabularies in
conjunction with SearchMonkey, including your own media vocabularies.
See for example what Myspace has done in creating their own vocabulary [1].
By way of explanation, you should also know that when creating the
initial SearchMonkey documentation, we included vocabularies that we
felt were stable enough, covered major interests, and were widespread,
including Dublin Core, FOAF and others. However, we wanted the
documentation to be complete and to provide simple vocabularies in areas
where none existed or, on the contrary, too many existed. The engineer
who created the media vocabulary followed the best practice of taking an
existing format (MediaRSS) and translating it into RDF. That explains
where the terms come from and also shows that the vocabulary is backed
up by usage. We do publish OWL definitions for the
vocabularies at [2].
Obviously, this is not to say that the vocabularies are 'perfect' by
whatever measure of aesthetics (and yes, I consider ontology engineering
more of an art than science). All your suggestions to improve this
vocabulary or any other are highly welcome. There are only two
requirements: they have to be specific enough to implement and backwards
compatible. (As we are painfully aware, schema versioning is an
unsolved problem on the Semantic Web.) The only non-issue I see in
your list of comments is the issue of prefixes: we have URIs in RDF(a)
and there are already plenty of namespace clashes in the sense you describe:
RDF Calendar and Dublin Core both have at least two namespaces.
Best,
Peter
[1] http://www.myspace.com/parishilton
[2] http://developer.yahoo.com/searchmonkey/smguide/owl_defs.html
Manu Sporny wrote:
Ben Adida wrote:
Yahoo has launched even more RDFa coolness: embed RDFa on your site to
describe your flash games and videos, and they show up embedded in Yahoo
search results *for everyone*, *by default*.
Overall, this is great news. It is very nice to see Yahoo! adopting RDFa
this deeply into their search service. I do have some gripes about the
SearchMonkey vocabularies, however...
PS: the only thing that's a bit unfortunate is that they didn't reuse
Digital Bazaar's media vocabulary. I hope we can find a way to create
equivalences at some point... that's the goal of RDF, after all.
I've had a bit of time to look at Yahoo's published vocabularies and I'm
quite concerned by them and by Yahoo!'s general direction with vocabulary
design.
Here's a list of the issues that I was able to find... I found many more
than are outlined here. It would be good
to talk with whoever designed their vocabularies. You can find an
overview of Yahoo!'s vocabularies here:
http://developer.yahoo.com/searchmonkey/smguide/profile_vocab.html
Issues specific to Yahoo's Media vocabulary:
Vocabulary is not machine-readable, not validate-able
-----------------------------------------------------
Yahoo's SearchMonkey media vocabulary, defined here:
http://search.yahoo.com/searchmonkey/media/
is not machine-readable. There are no RDF ranges, subClassOf, comments,
or types specified. New RDF vocabularies, especially ones from large
companies like Yahoo, should be machine-readable; otherwise it's going to
be nearly impossible to validate against them.
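
As a rough illustration of the kind of check that a machine-readable
vocabulary makes possible, here is a minimal sketch. It assumes the
vocabulary has already been loaded as (subject, predicate, object)
triples; the function name missing_metadata and the sample triples are
hypothetical and are not taken from Yahoo's published definitions.

```python
# Triples are (subject, predicate, object) tuples using CURIE-style strings.
# The sample data below is illustrative, not Yahoo's published definitions.
REQUIRED_FOR_ALL = ("rdf:type", "rdfs:comment")
REQUIRED_FOR_PROPERTIES = ("rdfs:range",)


def missing_metadata(triples, terms):
    """Return {term: [missing predicates]} for terms lacking basic RDFS metadata."""
    described = {}
    for subject, predicate, obj in triples:
        described.setdefault(subject, {})[predicate] = obj

    report = {}
    for term in terms:
        found = described.get(term, {})
        required = list(REQUIRED_FOR_ALL)
        # Only terms typed as properties are expected to declare a range.
        if found.get("rdf:type") == "rdf:Property":
            required.extend(REQUIRED_FOR_PROPERTIES)
        missing = [p for p in required if p not in found]
        if missing:
            report[term] = missing
    return report


sample_triples = [
    ("media:Video", "rdf:type", "rdfs:Class"),
    ("media:Video", "rdfs:comment", "A moving-image recording."),
    ("media:duration", "rdf:type", "rdf:Property"),
    # media:duration has no rdfs:range or rdfs:comment in this sample.
]

report = missing_metadata(
    sample_triples, ["media:Video", "media:duration", "media:width"]
)
for term, missing in sorted(report.items()):
    print(term, "is missing", ", ".join(missing))
```

A term that is only described in prose on a web page would show up here
the way media:width does: with nothing a tool can check against.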
Monolithic Vocabulary Design
----------------------------
Rather than break Media out into multiple different vocabularies, Yahoo
has taken audio, video, text, photos, thumbnails, and re-invented sets,
and shoved them into one monolithic vocabulary, which will surely get
more and more bloated as the years go by.
Rather than create a nice vocabulary stack (like the one we've been
building for the past several years):
+--------------+
|Music Ontology|
+--------------+-------+
|    Audio     | Video |
+--------------+-------+
|        Media         |
+----------------------+
They've instead created a mega vocabulary that doesn't seem to be backed
up by any usage data... or rather, it certainly isn't backed up by the
data we collected on the subjects of audio and video. Perhaps I'm
missing some sort of grand architecture, but when you have media:Article
and media:Text (neither of which is a subclass of the other), it shows
that not a great deal of design work went into your vocabularies.
Confounding Media with Media Format
-----------------------------------
Yahoo defines the following properties in media:
* media:bitrate
* media:channels
* media:duration
* media:fileSize
* media:framerate
* media:height
* media:samplingrate
* media:type
* media:width
Most of these are quite specific to web-based media formats and have
nothing to do with media in the physical world (as opposed to the Web).
Many of these can't be used to describe media:Text or media:Article.
These attributes describe a media format rather than the media itself
and should be separated out into a different media format vocabulary.
* media:views
This one has more to do with social news sites than media.
Specification of medium using both class and property
-----------------------------------------------------
Yahoo defines both this:
media:Image, media:Audio, media:Video
and this:
media:medium - The type of object: image | audio | video | document |
executable.
What's the point of having both a 'medium' property and classes that
define the medium? media:medium shouldn't exist at all - use one or the
other, not both. Using both is confusing and will inevitably lead to
more pain for Yahoo down the line when you have to look at not only
@typeof information, but also medium information.
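
To make the extra work concrete, here is a minimal sketch of what a
consumer has to do when the medium can arrive both ways. Only the three
classes and the medium values quoted above come from the vocabulary; the
mapping between them and the function name resolve_medium are
assumptions made for illustration.

```python
# Medium value implied by each media class. This mapping is an assumption
# made for illustration; the vocabulary itself does not state it.
CLASS_TO_MEDIUM = {
    "media:Image": "image",
    "media:Audio": "audio",
    "media:Video": "video",
}


def resolve_medium(typeof, medium):
    """Return (medium, problem) for one described resource.

    typeof -- value of @typeof, or None if absent
    medium -- value of media:medium, or None if absent
    """
    implied = CLASS_TO_MEDIUM.get(typeof)
    if implied and medium and implied != medium:
        # The two sources of truth disagree; the consumer cannot tell which wins.
        problem = (
            f"conflict: {typeof} implies {implied!r}, "
            f"media:medium says {medium!r}"
        )
        return None, problem
    if implied or medium:
        return implied or medium, None
    return None, "medium unknown: neither @typeof nor media:medium is usable"


print(resolve_medium("media:Video", None))
print(resolve_medium(None, "audio"))
print(resolve_medium("media:Video", "audio"))
```

With only one of the two mechanisms, the conflict branch and the
reconciliation logic would not be needed at all.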
Naming conflict, right off the bat
----------------------------------
Yahoo has defined the following prefixes: commerce, media
These conflict directly with ones that we've already created, which
isn't that big of a deal - in fact, it shows that RDFa is resilient even
in these scenarios. However, it also means that almost all of the
solutions that have been proposed for addressing the "cut-paste
fragility" issue that the WHATWG has raised are now much more difficult
to implement correctly. Which commerce and which media vocabularies do
we resolve to?
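
A minimal sketch of the ambiguity, assuming a simplified prefix
expansion in which a prefix is only shorthand for a namespace URI
declared in the surrounding markup. The Yahoo namespace is the one
given above; http://example.org/media# is a hypothetical stand-in for
another media vocabulary, and expand_curie is an illustrative helper,
not part of any RDFa library.

```python
YAHOO_MEDIA = "http://search.yahoo.com/searchmonkey/media/"  # from the text above
OTHER_MEDIA = "http://example.org/media#"  # hypothetical second media vocabulary


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' to a full URI using the in-scope prefix map."""
    prefix, _, reference = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"no namespace declared for prefix {prefix!r}")
    return prefixes[prefix] + reference


# Two pages declare the same prefix for different vocabularies.
page_a = {"media": YAHOO_MEDIA}
page_b = {"media": OTHER_MEDIA}

print(expand_curie("media:duration", page_a))
print(expand_curie("media:duration", page_b))

# A fragment pasted without its prefix declaration has nothing to expand
# against, so any fallback has to guess which "media" was meant.
try:
    expand_curie("media:duration", {})
except KeyError as error:
    print("cannot expand pasted fragment:", error)
```

With explicit declarations the two pages stay unambiguous; the question
only arises for a fallback that maps a bare prefix to a default
vocabulary.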
I'm afraid that since Yahoo is the 300lb gorilla in the room, there will
be no place for good vocabulary designs.
These vocabularies will hurt RDFa adoption in the long run
----------------------------------------------------------
My real fear is that while Yahoo adopting RDFa will help in the short
term, these badly designed vocabularies will hurt RDFa adoption in the
long run.
The worst-case scenario is seeing wide adoption of Yahoo's media
vocabulary as it currently stands, which will eventually come under much
harsher and less constructive criticism than I've outlined above.
As I stated earlier, there are many more issues with what Yahoo has done
with their SearchMonkey vocabularies that should be fixed for the
benefit of this community. We are more than glad to help them work
through the issues, as long as Yahoo is willing to have an open dialog
with the RDF vocabulary creation community.
-- manu