Re: [whatwg] RDFa Problem Statement

2008-08-26 Thread Dan Brickley

Dan Brickley wrote:
> Ben Adida wrote:
>> Greg Houston wrote:
>>> I am not sure if Ben was alluding to this in the last paragraph, but to
>>> further complicate things SearchMonkey is not actually using RDF,
>>
>> I think you're confusing two different layers.
>>
>> SearchMonkey parses HTML with microformats, and soon HTML+RDFa, and
>> makes that data available in RDF form to PHP scripts that you or anyone
>> else can write.
>
> It does just this today, from actual RDFa. I've been working on an
> extension that integrates RDFa from the matched pages with additional
> information from external DataRSS (Atom+OpenSearch+RDFa) feeds.


A bit more information from Peter Mika at Yahoo (fwd'd with permission):
[[
the key point... is that indeed DataRSS is both Atom and RDFa 
compatible. RDFa is a set of attributes, we merely invented names for 
the XML elements that carry them... but you can completely ignore that 
and get the triples out by running an RDFa parser over it. OpenSearch is 
another extension you can add in the mix if you want.


We turn both microformats and RDFa-in-HTML into DataRSS when used as 
input for applications so that SearchMonkey applications can abstract 
away from the original format.


We are definitely not Microsoft doing JavaScript, since we are extending 
formats in the way they were foreseen (Atom extensibility) and complying 
with standards (RDFa) without adding to them or changing the meaning of 
constructs. So this is a genuine Semantic Web standards play.


Btw, we haven't announced RDFa support officially because we want to get 
it 100% right before we do... ok maybe 99% ;)

]]

cheers,

Dan

ps. http://labs.mozilla.com/2008/08/introducing-ubiquity/ is a nice case 
for in-page structured data, whether microformatty/posh or rdfa


Re: [whatwg] RDFa Problem Statement

2008-08-26 Thread Manu Sporny
Ian,

I am addressing these questions both personally and as a representative
of our company, Digital Bazaar. I am certainly not speaking in any way
for the W3C SWD, RDFa Task Force, or Microformats community.

Ian Hickson wrote:
> On Mon, 25 Aug 2008, Manu Sporny wrote:
>> Web browsers currently do not understand the meaning behind human 
>> statements or concepts on a web page. While this may seem academic, it 
>> has direct implications on website usability. If web browsers could 
>> understand that a particular page was describing a piece of music, a 
>> movie, an event, a person or a product, the browser could then help the 
>> user find more information about the particular item in question.
> 
> Is this something that users actually want? 

These are fairly broad questions, so I will attempt to address them in a
general sense. We can go into the details at a later date if that would
benefit the group in understanding how RDFa addresses this perceived need.

Both the Microformats community and the RDFa community believe that
users want a web browser that can help them navigate the web more
efficiently. One of the best ways that a browser can provide this
functionality is by understanding what the user is currently browsing
with more accuracy than what is available today.

The Microformats community is currently at 1,145 members on the
discussion mailing list and 350 members on the vocabulary specification
mailing list. The community has a common goal of making web semantics a
ubiquitous technology. It should be noted as well that the Microformats
community ARE the users that want this technology.

There are very few commercial interests in that community - we have
people from all walks of life contributing to the concept that the
semantic web is going to make the browsing experience much better by
helping computers to understand the human concepts that are being
discussed on each page.

I should also point out that XHTML1.1 and XHTML2 will have RDFa
integrated because it is the best technology that we have at this moment
to address the issue of web semantics. You don't have to agree with the
"best technology" aspect of the statement, just that there is some
technology X, that has been adopted to provide semantics in HTML.

The Semantic Web Deployment group at the W3C also believes this to be a
fundamental issue with the evolution of the Web. We are also working on
an HTML4 DTD to add RDFa markup to legacy websites. I say this not to
make the argument that "everybody is doing it", but to point out that
there seems to be a fairly wide representation, both from standards
bodies and from web communities that semantics is a requirement of
near-term web technologies.

> How would this actually work? 

I don't know if you mean from a societal perspective, a standards
perspective, a technological perspective or some other philosophical
perspective. I am going to assume that you mean from a "technological
perspective" and a "societal perspective" since I believe those to be
the most important.

The technological perspective is the easiest to answer - we have working
code, to the tune of 9 RDFa parser implementations and two browser
plug-ins. Here's the implementation report for RDFa:

http://www.w3.org/2006/07/SWD/RDFa/implementation-report/#TestResults

To see how it works in practice, the Fuzzbot plug-in shows what we have
right now. It's rough, but demonstrates the simplest use case (semantic
data on a web page that is extracted and acted upon by the browser):

http://www.youtube.com/watch?v=oPWNgZ4peuI

All of the code to do this stuff is available under an Open Source
license. librdfa, one of the many RDFa parsers, is available here:

http://rdfa.digitalbazaar.com/librdfa/

and Fuzzbot, the semantic web processor, is available here:

http://rdfa.digitalbazaar.com/fuzzbot/

From a societal perspective, it frees up the people working on this
problem to focus on creating vocabularies. We're wasting most of our
time in the Microformats community arguing over the syntax of the
vocabulary expression language - which isn't what we want to talk about
- we want to talk about web semantics.

More accurately, RDFa relies on technologies that are readily accepted
on the web (URIs, URLs, etc.) to express semantic information. So, RDFa
frees up users to focus on expressing semantics by creating vocabularies
either through a standards body, an ad-hoc group, or individually.
Anybody can create a vocabulary, then you let the web decide which
vocabularies are useful and which ones are not. The ones that aren't
useful get ignored and the ones that are useful find widespread usage.

From a societal perspective, this is how the web already operates and it
is the defining feature that makes the web such a great tool for humanity.

> Personally I find that if I'm looking at a site with music tracks, say 
> Amazon's MP3 store, I don't have any difficulty working out what the 
> tracks are or interacting with the page. Why woul

Re: [whatwg] RDFa Problem Statement

2008-08-26 Thread Dan Brickley

Ben Adida wrote:
> Greg Houston wrote:
>> I am not sure if Ben was alluding to this in the last paragraph, but to
>> further complicate things SearchMonkey is not actually using RDF,
>
> I think you're confusing two different layers.
>
> SearchMonkey parses HTML with microformats, and soon HTML+RDFa, and
> makes that data available in RDF form to PHP scripts that you or anyone
> else can write.


It does just this today, from actual RDFa. I've been working on an 
extension that integrates RDFa from the matched pages with additional 
information from external DataRSS (Atom+OpenSearch+RDFa) feeds.


cheers,

Dan

--
http://danbri.org/



Re: [whatwg] RDFa Problem Statement

2008-08-26 Thread Ben Adida
Greg Houston wrote:
> I am not sure if Ben was alluding to this in the last paragraph, but to
> further complicate things SearchMonkey is not actually using RDF,

I think you're confusing two different layers.

SearchMonkey parses HTML with microformats, and soon HTML+RDFa, and
makes that data available in RDF form to PHP scripts that you or anyone
else can write.

DataRSS is an intermediate representation that Yahoo provides so that,
if you want to specifically code your app to SearchMonkey, you can
directly produce DataRSS, which is only machine readable. But if you
want to produce the interoperable syntax that isn't Yahoo-specific and
that is both human and machine-readable, RDFa is your path.

(What may make things even more confusing is that DataRSS is itself
inspired by RDFa... but that's a different discussion altogether.)

Here's Yahoo discussing their support for RDFa parsing:
http://www.ysearchblog.com/archives/000527.html

-Ben




Re: [whatwg] RDFa Problem Statement

2008-08-26 Thread Greg Houston
On Tue, Aug 26, 2008 at 11:15 AM, Ben Adida <[EMAIL PROTECTED]> wrote:
> Here's one example. This is not the only way that RDFa can be helpful,
> but it should help make things more concrete:
>
>  http://developer.yahoo.com/searchmonkey/
>
> Using semantic markup in HTML (microformats and, soon, RDFa), you, as a
> publisher, can choose to surface more relevant information straight into
> Yahoo search results.
>
> And tool builders can build custom applications that surface other kinds
> of data for users who choose to install their SearchMonkey application.
> The extensibility of RDF, and in particular the ability to intermix
> vocabularies so that different applications can slice the data in their
> own chosen way, is key to this effort, as Yahoo appears to have recognized.

I am not sure if Ben was alluding to this in the last paragraph, but to
further complicate things, SearchMonkey is not actually using RDF but
their own specification based on RDF called DataRSS, or in their own
words, "a standard similar to RDF". Since it is Yahoo's own homebrew, I
think calling it a standard is something of a leap. It seems more akin
to Microsoft creating their own version of JavaScript.

http://developer.yahoo.net/blog/archives/2008/08/rdf_xslt_and_the_monkey_makes_3.html

http://developer.yahoo.com/searchmonkey/smguide/datarss.html

On a side note, I don't have an opinion on this discussion. I just
thought this could use some clarification. Ben's post seemed
misleading to me since it made no mention of DataRSS.

- Greg


Re: [whatwg] RDFa Problem Statement

2008-08-26 Thread Ben Adida

I'm really glad Manu's explanation was helpful. Thanks Manu!

I don't want to interrupt this useful thread, so I'll only contribute one
piece of information in the form of an example application:

> Is this something that users actually want? How would this actually work? 

[...]

> It would be helpful if you could walk me through some examples of what UI 
> you are envisaging in terms of "helping the user find more information".

[...]

> I don't think more metadata is going to improve search engines.

Here's one example. This is not the only way that RDFa can be helpful,
but it should help make things more concrete:

  http://developer.yahoo.com/searchmonkey/

Using semantic markup in HTML (microformats and, soon, RDFa), you, as a
publisher, can choose to surface more relevant information straight into
Yahoo search results.

And tool builders can build custom applications that surface other kinds
of data for users who choose to install their SearchMonkey application.
The extensibility of RDF, and in particular the ability to intermix
vocabularies so that different applications can slice the data in their
own chosen way, is key to this effort, as Yahoo appears to have recognized.

[...]

> If there's anything we can learn from the Web today, however, it is that 
> authors will reliably output garbage at the syntactic level.

[...]

> So to get this data into Web pages, we have to get past the laziness and 
> incompetence of authors.

Continuing with the Yahoo example: using SearchMonkey, users get
immediate feedback on whether they expressed the data correctly. It's
very similar to the incentive users have to make their page render
correctly: now they have incentive to express well-formed data.

Ian, I think you and I agree on the idea of user laziness. Hopefully,
you see the same thing I see here: a beginning of an incentive against
this laziness in the case of structured web data.

I should mention that two SearchMonkey applications have been installed
by default in Yahoo search results. So structured web data now has a
fairly sizable audience.

-Ben


Re: [whatwg] RDFa Problem Statement

2008-08-26 Thread Manu Sporny
Kristof Zelechovski wrote:
> Web browsers are (hopefully) designed so that they run in every culture.  If
> you define a custom vocabulary without considering its ability to describe
> phenomena of other cultures and try to impose it worldwide, you do more harm
> than good to the representatives of those cultures.  And considering it
> properly does require much time and effort; I do not think you can have that
> off the shelf without actually listening to them.

Kristof - I believe that you may also be conflating the "method of
expression" with the "vocabulary". RDFa is the method of expression; the
vocabulary uses that method of expression to convey semantics.

RDFa is a collection of attributes[1] for HTML family languages that are
used to express semantics through the use of a vocabulary. For an
example of what an RDF vocabulary page looks like, check out the following:

http://purl.org/media/

That page is marked up using RDFa to provide not only a human-readable
version of the vocabulary but also a machine-readable version of the
vocabulary.
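
To make the "method of expression" point concrete, here is a minimal
sketch of pulling statements out of markup that carries RDFa-style
attributes. It is not a conforming RDFa processor: it only looks at
xmlns: prefixes, about and property, and it ignores nesting rules. The
page fragment, the example.org URI and the class name are invented for
illustration; only the Dublin Core namespace is a real vocabulary URI.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; only the Dublin Core namespace URI is real.
SAMPLE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="http://example.org/album/1">
  <span property="dc:title">Example Album</span>
  <span property="dc:creator">Example Band</span>
</div>
"""


class TinyRDFaExtractor(HTMLParser):
    """Collects (subject, predicate, literal) triples from about/property only."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # prefix -> vocabulary URI, from xmlns:* attributes
        self.subject = None  # most recent about value (no scoping, a simplification)
        self.pending = None  # predicate waiting for its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Register prefixes first so a property on the same tag can use them.
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            prefix, _, local = attrs["property"].partition(":")
            # Unknown prefixes are left unexpanded rather than guessed.
            self.pending = self.prefixes.get(prefix, prefix + ":") + local

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            self.triples.append((self.subject, self.pending, text))
            self.pending = None


extractor = TinyRDFaExtractor()
extractor.feed(SAMPLE)
for triple in extractor.triples:
    print(triple)
```

The extractor knows nothing about Dublin Core; swapping in a different
vocabulary only changes the URIs that come out, which is the separation
between syntax and vocabulary described above.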

> In a way, complaining that the Microformats protocol impedes innovation is
> like saying 'we are big and rich and strong, so either you accommodate or
> you do not exist'.  Not that I do not understand; it is straightforward to
> say so and it happens all the time.

It's easy to miss the effect that the Microformats approach has on
innovation because it isn't stated directly in any Microformats
literature. I'd like to re-iterate that I have spent many, many hours
creating specifications in the Microformats community and have seen this
effect first-hand. I'd like to not focus on theory, but the state of the
world as it is right now.

Right now, the Microformats process requires everyone to go through our
community to create a vocabulary. It is the "we are big and rich and
strong, so either you accommodate or you do not exist" approach that you
seem to be arguing against.

If someone were to come along and request that bloodline be added to the
hCard format, it would be rejected as a corner-case. So, unless I
understood you incorrectly, RDFa provides a more open environment for
innovation because it doesn't require any sort of central authority to
approve a vocabulary.

One of the things that RDFa strives to do, and is successful at doing,
is to not give anyone power over what constitutes a "valid" vocabulary.

If that's not what you were attempting to express, you will have to
explain it again, please.

-- manu

[1] http://www.w3.org/TR/rdfa-syntax#rdfa-attributes

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.0 Website Launches
http://blog.digitalbazaar.com/2008/07/03/bitmunk-3-website-launches


Re: [whatwg] RDFa Problem Statement (was: Creative Commons Rights Expression Language)

2008-08-26 Thread Ian Hickson
On Mon, 25 Aug 2008, Manu Sporny wrote:
> 
> Web browsers currently do not understand the meaning behind human 
> statements or concepts on a web page. While this may seem academic, it 
> has direct implications on website usability. If web browsers could 
> understand that a particular page was describing a piece of music, a 
> movie, an event, a person or a product, the browser could then help the 
> user find more information about the particular item in question.

Is this something that users actually want? How would this actually work? 
Personally I find that if I'm looking at a site with music tracks, say 
Amazon's MP3 store, I don't have any difficulty working out what the 
tracks are or interacting with the page. Why would I want to ask the 
computer to do something with the tracks?

It would be helpful if you could walk me through some examples of what UI 
you are envisaging in terms of "helping the user find more information". 
Why is Safari's "select text and then right click to search on Google" not 
good enough? Have any usability studies been made to test these ideas? 
(For example, paper prototype usability studies?) What were the results?


> It would help automate the browsing experience.

Why does the browsing experience need automating?


> Not only would the browsing experience be improved, but search engine 
> indexing quality would be better due to a spider's ability to understand 
> the data on the page with more accuracy.

This I can speak to directly, since I work for a search engine and have 
learnt quite a bit about how it works.

I don't think more metadata is going to improve search engines. In 
practice, metadata is so highly gamed that it cannot be relied upon. In 
fact, search engines probably already "understand" pages with far more 
accuracy than most authors will ever be able to express.


You started by saying:

> Web browsers currently do not understand the meaning behind human 
> statements or concepts on a web page.

This is true, and I even agree that fixing this problem, letting browsers 
understand the meaning behind human statements and concepts, would open up 
a giant number of potentially killer applications. I don't think 
"automating the browser experience" is necessarily that killer app, but 
let's assume that it is for the sake of argument.

You continue:

> If we are to automate the browsing experience and deliver a more usable 
> web experience, we must provide a mechanism for describing, detecting 
> and processing semantics.

This statement seems obvious, but actually I disagree with it. It is not 
the case that providing a mechanism for describing, detecting, and 
processing semantics is the only way to let browsers understand the 
meaning behind human statements or concepts on a web page. In fact, I 
would argue it's not even the most plausible solution.

A mechanism for describing, detecting, and processing semantics; that is, 
new syntax, new vocabularies, new authoring requirements, fundamentally 
relies on authors actually writing the information using this new syntax.

If there's anything we can learn from the Web today, however, it is that 
authors will reliably output garbage at the syntactic level. They misuse 
HTML semantics and syntax uniformly (to the point where 90%+ of pages are 
invalid in some way). Use of metadata mechanisms is at a pitifully low 
level, and when used is inaccurate (Content-Type headers for non-HTML data 
and character encoding declarations for all text types are both widely 
wrong, to the point where browsers have increasingly complex heuristics to 
work around the errors). Even "successful" formats for metadata publishing 
like hCard have woefully low penetration.

Yet, for us to automate the browsing experience by having computers 
understand the Web, for us to have search engines be significantly more 
accurate by understanding pages, the metadata has to be widespread, 
detailed, and reliable.

So to get this data into Web pages, we have to get past the laziness and 
incompetence of authors.

Furthermore, even if we could get authors to reliably put out this data 
widely, we would have to then find a way to deal with spammers and black 
hat SEOs, who would simply put inaccurate data into their pages in an 
attempt to game search engines and browsers.

So to get this data into Web pages, we have to get past the inherent greed 
and evilness of hostile authors.


As I mentioned earlier, there is another solution, one that doesn't rely 
on either getting authors to be any more accurate or precise than they are 
now, one that doesn't require any effort on the part of authors, and one 
that can be used in conjunction with today's anti-spam tools to avoid 
being gamed by them and potentially to in fact dramatically improve them: 
have the computers learn the human languages themselves.

Instead of making all the humans of the world learn a computer language, 
or tools for writing that computer language, have the compute

Re: [whatwg] RDFa Problem Statement (was: Creative Commons Rights Expression Language)

2008-08-26 Thread Karl Dubost


On 26 August 2008 at 16:04, Kristof Zelechovski wrote:
> Web browsers are (hopefully) designed so that they run in every culture.  If
> you define a custom vocabulary without considering its ability to describe
> phenomena of other cultures and try to impose it worldwide, you do more harm
> than good to the representatives of those cultures.


The Web could have been designed as one huge central database of
hypertext links. When the Web was created, that was mostly what
hypertext solutions were proposing.

Being able to rely on the domain name system to create URLs was the
major shift in conceiving a distributed hypertext system. People could
create their own Web sites independently, without coordination, and put
them online. Then other people could link to these Web sites from their
own pages if they happened to know about them.

A lot of crap has been put out there, a lot of good Web sites, and a lot
of duplicates too. In the end, the network effects and the social
aspects of connecting have produced places of reference and have
stabilized some Web sites for a time. Some have disappeared. There are
broken links everywhere, but the net effect is… the Web.

Not that bad, no?

RDFa (and RDF effort in general) proposes exactly the same thing.

--
Karl Dubost - W3C
http://www.w3.org/QA/
Be Strict To Be Cool








Re: [whatwg] RDFa Problem Statement

2008-08-26 Thread Dan Brickley

Kristof Zelechovski wrote:
> Web browsers are (hopefully) designed so that they run in every culture.  If
> you define a custom vocabulary without considering its ability to describe
> phenomena of other cultures and try to impose it worldwide, you do more harm
> than good to the representatives of those cultures.  And considering it
> properly does require much time and effort; I do not think you can have that
> off the shelf without actually listening to them.
> In a way, complaining that the Microformats protocol impedes innovation is
> like saying 'we are big and rich and strong, so either you accommodate or
> you do not exist'.  Not that I do not understand; it is straightforward to
> say so and it happens all the time.
> Chris


Let me give a quick example of how this works in RDFland.

Each vocabulary defines nothing except classes (types of thing) and 
properties (aka relationship types). In FOAF for example, we defined 
Person, Agent, Document, OnlineAccount, Project, Group as classes. And 
we defined properties too. These tend to have a bit more 'character' 
than the classes, and carry the distinctive style of each vocabulary. 
FOAF has properties of Person and Agent such as 'openid', 'homepage', 
'weblog' that have as their range (ie. values) instances of the class 
Document. We also define properties like 'primaryTopic' that relate a 
page primarily about something to the thing itself. Each class and 
property is considered to be in the vocabulary whose URI is 
http://xmlns.com/foaf/0.1/ ... and this is the basis of RDF's "division 
of labour" mechanism. See also a squiggly diagram at 
http://danbri.org/2008/foafspec/foafspec.jpg (apologies that this is 
currently inaccessible).


The SIOC project declares a bunch more classes and properties. Some of 
these are defined with relationship to Person, Document, OnlineAccount 
from FOAF; classes that sub-class ours, or properties that cite our FOAF 
classes as the range or domain. DOAP does the same, expanding from the 
class Project to describe opensource projects. I've talked about this 
before so won't go on about those schemas.


The point about cultural diversity, independent extension etc is made 
better by the JaUranai FOAF extension that appeared a few years back:


http://kota.s12.xrea.com/vocab/uranai

They decided that FOAF was nice and all but was lacking some properties 
important in a Japanese context. So they declare new RDF properties: 
starsign, bloodtype, and various others that I don't fully understand 
because they have Japanese names and documentation. Here is blood type's 
description from the RDF Schema file at 
http://kota.s12.xrea.com/vocab/uranai/uranai.rdf


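<rdf:Property rdf:about="http://kota.s12.xrea.com/vocab/uranaibloodtype">
 <rdfs:label>血液型</rdfs:label>
 <rdfs:label>Blood type</rdfs:label>
 <rdfs:comment>血液型を書きます。</rdfs:comment>
 <rdfs:comment>A blood type.</rdfs:comment>
 <rdfs:domain rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
[...]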


This effectively wires in 'bloodtype' to the other classes in use in 
this wider community. Wherever SIOC or DOAP projects have created a 
property whose range is "Person", we know that Uranai's 'bloodtype' 
property is also applicable, without needing heavy-duty coordination 
between the SIOC and DOAP authors and the author of Uranai.
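
As a rough illustration of that "wiring", the sketch below records
domain and range declarations as plain tuples and asks which properties
can describe the values of another property. The function name and the
tuple layout are invented here, the prefixed names are shorthand rather
than real identifiers, and treating doap:maintainer as having range
foaf:Person is an assumption made for the example.

```python
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Schema facts as (property, "domain" | "range", class).
# The property names are illustrative shorthand, not full URIs.
SCHEMA = [
    ("uranai:bloodtype", "domain", FOAF_PERSON),
    ("foaf:homepage", "domain", FOAF_PERSON),
    ("doap:maintainer", "range", FOAF_PERSON),  # assumed for illustration
]


def properties_applicable_to_values_of(prop, schema):
    """List properties whose domain matches a declared range of `prop`."""
    ranges = {cls for p, kind, cls in schema if p == prop and kind == "range"}
    return sorted(
        p for p, kind, cls in schema if kind == "domain" and cls in ranges
    )


# Values of doap:maintainer are Persons, so Person properties apply to them.
print(properties_applicable_to_values_of("doap:maintainer", SCHEMA))
```

Nothing in the lookup mentions DOAP or Uranai by name; the only shared
point is the Person class URI, which is what lets the vocabularies be
written independently.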


Furthermore, the fact that all these projects share a common syntactic 
grammar means that I can simply add a Uranai 'bloodtype' property into 
my FOAF self-description, and expect each and every RDF parser and 
SPARQL database to immediately be able to parse and query it - see 
http://danbri.org/words/2008/02/25/286 for example. As Manu describes in 
http://blog.digitalbazaar.com/2008/08/23/html5-rdfa-and-microformats/ 
this is rather different to the Microformats.org approach, which is by 
intention a monolithic community designing a single, self-consistent 
product.
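
A small sketch of why the shared grammar matters: once everything is a
(subject, property, value) statement, a generic matcher can query a
property it has never heard of. This is a toy stand-in for a SPARQL
store, not a real one. The subject URI and all values are hypothetical;
the Uranai namespace string is taken as written in this message.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
URANAI = "http://kota.s12.xrea.com/vocab/uranai"  # as written in this message

ME = "http://example.org/people#me"  # hypothetical subject URI

# A FOAF self-description with one property from an independent vocabulary.
TRIPLES = [
    (ME, FOAF + "name", "Example Person"),
    (ME, FOAF + "homepage", "http://example.org/"),
    (ME, URANAI + "bloodtype", "A"),  # hypothetical value
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def properties_of(triples, subject):
    """List the distinct properties used to describe `subject`."""
    return sorted({p for _, p, _ in match(triples, s=subject)})


print(match(TRIPLES, p=URANAI + "bloodtype"))
print(properties_of(TRIPLES, ME))
```

The matcher needed no change to handle the extra property, which is the
practical sense in which extension does not require coordination.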


Back to my point that RDF vocabulary classes (i.e. named types of thing, 
Person etc.) tend to be boring, and the properties more interesting: this 
is to address the difficulty you mention, i.e. ... "If you define a 
custom vocabulary without considering its ability to describe phenomena 
of other cultures and try to impose it worldwide, you do more harm than 
good to the representatives of those cultures".


So for example in FOAF, we define fairly boring, bland classes (like 
Person, Document) in a way that allows different cultures to attach 
properties that they care about. It seems "bloodtype" is more important 
in Japanese culture than in Western Europe, but the toolset and 
design provided by RDFa allow independent extension of FOAF in Japan 
without expensive central bottlenecks. Creative Commons has 
huge headaches because copyright law varies from country to country; 
this has informed their redesign and their enthusiasm for RDFa.


Hope this helps explain something of where RDFa folk are coming from,

cheers,

Dan

--
http://danbri.org/


Re: [whatwg] RDFa Problem Statement (was: Creative Commons Rights Expression Language)

2008-08-26 Thread Kristof Zelechovski
Web browsers are (hopefully) designed so that they run in every culture.  If
you define a custom vocabulary without considering its ability to describe
phenomena of other cultures and try to impose it worldwide, you do more harm
than good to the representatives of those cultures.  And considering it
properly does require much time and effort; I do not think you can have that
off the shelf without actually listening to them.
In a way, complaining that the Microformats protocol impedes innovation is
like saying 'we are big and rich and strong, so either you accommodate or
you do not exist'.  Not that I do not understand; it is straightforward to
say so and it happens all the time.
Chris

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Manu Sporny
Sent: Tuesday, August 26, 2008 3:50 AM
To: Ian Hickson
Cc: WHAT-WG; [EMAIL PROTECTED]
Subject: Re: [whatwg] RDFa Problem Statement (was: Creative Commons Rights
Expression Language)

The Microformats community, and all communities like it, require a group
of people to come together, collaborate and create a standard vocabulary
to express ALL semantics. A somewhat strained analogy would be bringing
in representatives from all of the cultures of the world and having them
agree on a universal vocabulary. It is an untenable prospect: there is
too much diversity in the world to agree on one master vocabulary. This
is, however, the approach that Microformats has taken, for better or worse.

When you do not scope vocabularies, like the Microformats community has
chosen to do, you force new vocabulary development through a design
bottleneck. This isn't a theoretical bottleneck, it is one that we deal
with each day in the Microformats community.

The RDFa approach is to remove this vocabulary development bottleneck by
addressing the problem of creating a method of semantics expression. The
web has always relied on distributed innovation and RDFa allows that
sort of innovation to continue by solving the tenable problem of a
semantics expression mechanism. Microformats has no such general purpose
solution.

In short, RDFa addresses the problem of a lack of a standardized
semantics expression mechanism in HTML family languages. RDFa not only
enables the use cases described in the videos listed above, but all use
cases that struggle with enabling web browsers and web spiders to
understand the context of the current page.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.0 Website Launches
http://blog.digitalbazaar.com/2008/07/03/bitmunk-3-website-launches



Re: [whatwg] RDFa Problem Statement (was: Creative Commons Rights Expression Language)

2008-08-25 Thread Manu Sporny
Ian Hickson wrote:
> I have no idea what problem RDFa is trying to solve. I have no idea what 
> the requirements are.

Ian, this is not an official response from the RDF in XHTML Task Force
or the Semantic Web Deployment Workgroup. It is a personal attempt to
outline some of the problems that RDFa is addressing.

Web browsers currently do not understand the meaning behind human
statements or concepts on a web page. While this may seem academic, it
has direct implications on website usability. If web browsers could
understand that a particular page was describing a piece of music, a
movie, an event, a person or a product, the browser could then help the
user find more information about the particular item in question. It
would help automate the browsing experience. Not only would the browsing
experience be improved, but search engine indexing quality would be
better due to a spider's ability to understand the data on the page with
more accuracy.

Currently, browsing is a very manual process, requiring a user to find
information about a particular subject and then copy-paste that
information from one page into another search page to explore
information about the subject. If we are to automate the browsing
experience and deliver a more usable web experience, we must provide a
mechanism for describing, detecting and processing semantics. If we are
to improve the search and indexing accuracy of web spiders, we must
provide a mechanism to describe, detect and process semantics.

Here's a really short intro video on why semantics are important (I'm
sure you already know all this stuff, but it outlines why online
communities are concerned about semantics and is meant to educate those
that aren't familiar with the concept of web semantics):

http://www.youtube.com/watch?v=OGg8A2zfWKg

The Microformats community has done a remarkable job of working on the
web semantics problem, creating several different methods of expressing
common human concepts (contact information (hCard), events (hCalendar),
and audio recordings (hAudio)). The method employed by the Microformats
community to embed semantics in web pages, using pre-existing HTML4 tags
and re-purposing them, was taken because none of the standards bodies
were effectively tackling the problem of embedded web semantics at the
time. In short, the community did the best that they could with what was
available to them at the time.

The results of the first set of Microformats efforts were some pretty
cool applications, like the following one demonstrating how a web
browser could forward event information from your PC web browser to your
phone via Bluetooth:

http://www.youtube.com/watch?v=azoNnLoJi-4

Here is another demonstration of how one could use music metadata
embedded in a web page to find more information about your favorite band:

http://www.youtube.com/watch?v=oPWNgZ4peuI

or how one could use movie metadata on a web page to find more
information about a movie:

http://www.youtube.com/watch?v=PVGD9HQloDI

The Mozilla Labs Aurora demos also show that semantic web markup is
necessary in order to execute upon some of the ideas demonstrated in
their future browsers project:

Aurora - Part 1 - Collaboration, History, Data Objects, Basic Navigation
http://www.vimeo.com/1450211

Aurora - Part 2 - Geo-location-based browsing
http://www.vimeo.com/1476338

Aurora - Part 3 - Integrating Web w/ Physical Environment
http://www.vimeo.com/1481810 (non-WebHD)

Aurora - Part 4 - Personal Data Portability
http://www.vimeo.com/1488633

Both RDFa and Microformats enable these user interaction scenarios and
make browsing the web a richer experience.

If one understands web semantics to be an important part of the web's
future, the question then becomes, why RDFa? Why not Microformats?

While there are a number of technical merits that speak in favor of RDFa
over Microformats (fully qualified vocabulary terms, prefix short-hand
via CURIEs, accessibility-friendly, unified processing rules, etc.),
this issue really boils down to one of centralized innovation vs.
distributed innovation.
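
To illustrate what "fully qualified vocabulary terms" and the CURIE
short-hand buy you, here is a minimal sketch of prefix expansion. The
function name is invented for this example and the error handling is a
simplification, not the processing rules from the RDFa syntax document.
Both namespaces are real vocabulary URIs, and each defines a term whose
local name is "title".

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' into a full URI by simple concatenation."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


# Same local name, two different meanings, no collision after expansion.
title_of_work = expand_curie("dc:title", PREFIXES)
title_of_person = expand_curie("foaf:title", PREFIXES)
print(title_of_work)
print(title_of_person)
```

An unscoped term such as a bare "title" gives a processor no way to
tell these two apart, which is why scoping matters once vocabularies
are created independently.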

The Microformats community, and all communities like it, require a group
of people to come together, collaborate and create a standard vocabulary
to express ALL semantics. A somewhat strained analogy would be bringing
in representatives from all of the cultures of the world and having them
agree on a universal vocabulary. It is an untenable prospect: there is
too much diversity in the world to agree on one master vocabulary. This
is, however, the approach that Microformats has taken, for better or worse.

When you do not scope vocabularies, like the Microformats community has
chosen to do, you force new vocabulary development through a design
bottleneck. This isn't a theoretical bottleneck, it is one that we deal
with each day in the Microformats community.

The RDFa approach is to remove this vocabulary development bottleneck by
addressing the problem of creating a method of semantics expression.