As a follow-up to the old news linked from
<http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html>:
Google has now made a testing tool available at
<http://www.google.com/webmasters/tools/richsnippets>. As far as I'm
aware, it uses the same code as the real search engine results.
I tested it a bit, and what that tool implements seems to bear very
little relation to RDFa. It's not simply a buggy implementation - it
doesn't even attempt to process RDFa correctly.
<http://philip.html5.org/demos/rdfa/google-rich-snippets.html> shows a few
examples. It rejects some perfectly correct RDFa markup; it interprets
some perfectly correct RDFa markup incorrectly; and it accepts some
totally broken RDFa markup.
For example, the documentation at
<http://www.google.com/support/webmasters/bin/answer.py?answer=146646>
includes:
<a href="http://darryl-blog.example.com/" rel="v:friend">Darryl</a>
Google's tool says the output has "friend = Darryl", whereas RDFa says
to ignore the element content and output a triple "...
<http://rdf.data-vocabulary.org/#friend>
<http://darryl-blog.example.com/>" instead, so the markup is being
interpreted incorrectly.
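A rough sketch of the expected behaviour for that element, in Python. This is a heavily simplified illustration, not a real RDFa processor: the function name rel_triple and the "<>" placeholder for the subject (standing in for the elided "..." above) are made up for this example.

```python
def rel_triple(attrs, subject, prefixes):
    """Triple for an element carrying @rel and @href (simplified sketch)."""
    # Expand the CURIE: look up the prefix, then append the reference.
    prefix, _, reference = attrs["rel"].partition(":")
    predicate = prefixes[prefix] + reference
    # The object is the link target; the element's text content plays no part.
    return (subject, "<" + predicate + ">", "<" + attrs["href"] + ">")


prefixes = {"v": "http://rdf.data-vocabulary.org/#"}
attrs = {"href": "http://darryl-blog.example.com/", "rel": "v:friend"}
print(rel_triple(attrs, "<>", prefixes))
```

Note that the string "Darryl" never enters the function at all, which is the point: with 'rel' plus 'href', the text content is irrelevant to the triple.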
With input like
<span property="v:name" datatype="">John <span property="v:nickname">Smith</span></span>
Google's tool only extracts the name and ignores the nickname triple
that an RDFa processor would generate, so it's again failing to
interpret the markup correctly.
With input like
<span property="v:name" content="John Smith">error</span>
it returns "name = error", whereas RDFa takes the literal value from
the 'content' attribute ("John Smith").
So it seems to totally ignore attributes like 'datatype' and 'content',
and treats 'rel' identically to 'property', as far as I can tell.
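To make the 'content' case concrete, here is a minimal Python sketch contrasting the two behaviours. Both function names are hypothetical, and observed_literal is only a guess at what the tool does, inferred from its output.

```python
def rdfa_literal(attrs, text_content):
    # RDFa: @content, when present, overrides the element's text content.
    return attrs.get("content", text_content)


def observed_literal(attrs, text_content):
    # What the tool appears to do: always use the text content.
    return text_content


attrs = {"property": "v:name", "content": "John Smith"}
print(rdfa_literal(attrs, "error"))
print(observed_literal(attrs, "error"))
```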
Also, the tool accepts input like:
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
<span property="v:name">John Smith</span>
</div>
while it rejects equivalent input like:
<div xmlns:v="http://rdf.data-vocabulary.or" typeof="v:g/#Person">
<span property="v:g/#name">John Smith</span>
</div>
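The two inputs are equivalent because a CURIE is expanded by plain string concatenation of the prefix mapping and the reference. A small Python sketch (expand_curie is an illustrative helper, not any library's API):

```python
def expand_curie(curie, prefixes):
    """Expand prefix:reference by concatenating the mapped URI and the reference."""
    prefix, _, reference = curie.partition(":")
    return prefixes[prefix] + reference


first = expand_curie("v:Person", {"v": "http://rdf.data-vocabulary.org/#"})
second = expand_curie("v:g/#Person", {"v": "http://rdf.data-vocabulary.or"})
print(first == second)
```

Where the prefix mapping ends and the reference begins makes no difference to the resulting URI, so a processor that expands CURIEs has no reason to treat the two inputs differently.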
It also accepts input like:
<div xmlns:v="http://arbitrary.example.org/#" typeof="v:Person">
<span property="v:name">John Smith</span>
</div>
and apparently entirely ignores that it's in a different namespace, and
processes the data as if it were in "http://rdf.data-vocabulary.org/#"
(it still shows up in the search result preview regardless of namespace,
as long as you have the right string after the colon).
It also accepts input like:
<div typeof="zzz:Person">
<span property="#:name">John Smith</span>
</div>
and emits a warning about the undeclared namespaces but otherwise
processes it as if it were all using the correct namespace.
So it seems that Google doesn't attempt to do any kind of
namespace/CURIE processing at all (other than a little bit for the
harmless warning) - it simply looks at the part of the attribute value
after the colon (case-insensitively), and ignores everything else.
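The behaviour described above can be sketched in Python as follows. This is a guess reconstructed from the tool's output, not Google's actual code; the function name is made up, and splitting on the first colon (rather than the last) is an assumption.

```python
def observed_key(attr_value):
    """Guess at the tool's matching: drop the prefix, lowercase the rest."""
    # Everything before the first colon is discarded, declared or not.
    return attr_value.partition(":")[2].lower()


for value in ["v:Person", "zzz:Person", "#:name", "v:g/#Person"]:
    print(value, "->", observed_key(value))
```

Under this guess, "v:Person" and "zzz:Person" both reduce to "person" and are accepted, while "v:g/#Person" reduces to "g/#person" and matches nothing, which is consistent with the accepted and rejected inputs above.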
Am I doing something wrong here, or am I missing a good reason for this
apparent behaviour? It seems very disappointing that Google is claiming
to support RDFa while failing to implement it in a way that is remotely
correct or compatible with other RDFa processors.
--
Philip Taylor
pj...@cam.ac.uk