Tom Heath wrote:
> As always it's a case of the right tool for the right job. Regarding
> your other (admittedly unfounded) claim, there may be many more people
> who end up publishing RDF as RDFa, but collectively they may end up
> publishing far fewer triples in total than a small number of publishers
> with very large data sets who choose to use RDF/XML to expose data from
> backend DBs.
Hey, size isn't everything :)
Generating a massive RDF dataset is as easy as piping one's HTTP logs
through sed. There are many measures for data utility. Is the data
fresh? accurate? useful? maintained? *used*? Does it exploit well-known
vocab? Does it use identifiers that other people use? Or identification
strategies that allow cross-referencing with other data anyway? Are the
associated HTTP servers kept patched and secure? Is it available over
SSL? Are there at least 5 years paid up on each associated DNS hostname
used? Do we know who owns and takes care of those domain names? Does it
link out? Do people link in? Does the data have a clear license? And
does it respect users' privacy wishes where appropriate? Is it I18N-cool?
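The "HTTP logs through sed" quip can be sketched concretely (in Python rather than sed; every URI and the predicate here are invented for illustration, not a real vocabulary):

```python
import re

# Sketch: mint one triple per Common Log Format line. This shows how
# cheaply raw triple count can be inflated -- volume says nothing about
# the quality measures listed above.
def log_line_to_triple(line, base="http://example.org"):
    m = re.search(r'"GET (\S+) HTTP', line)
    if not m:
        return None
    status = line.rsplit(" ", 2)[-2]  # HTTP status code field
    return '<%s%s> <http://example.org/vocab#status> "%s" .' % (
        base, m.group(1), status)

log = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /page.html HTTP/1.0" 200 2326'
print(log_line_to_triple(log))
# -> <http://example.org/page.html> <http://example.org/vocab#status> "200" .
```

Pipe a year of access logs through that and you have millions of triples, none of them fresh, maintained, or linked to anything.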
On the size question: I'm wary of encouraging a 'bigger is better'
attitude to triple count. In data as in prose, brevity is valuable.
Extra triples add cost at the aggregation and querying level; e.g.
sometimes a workplaceHomepage triple is better than having a 'workplace'
triple and a 'homepage' one.
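The workplaceHomepage point can be shown with plain tuples (foaf:workplaceHomepage is a real FOAF term; the intermediate 'workplace' property and the example URIs below are hypothetical):

```python
FOAF = "http://xmlns.com/foaf/0.1/"
person = "http://example.org/people/alice#me"  # hypothetical person URI

# Two-triple modelling: route through an intermediate workplace node.
# ('workplace' is not a real FOAF term; it stands in for any such property.)
verbose = [
    (person, FOAF + "workplace", "_:org"),
    ("_:org", FOAF + "homepage", "http://work.example.org/"),
]

# One-triple modelling via FOAF's workplaceHomepage shortcut.
concise = [
    (person, FOAF + "workplaceHomepage", "http://work.example.org/"),
]

# Same fact, one fewer triple to store, aggregate, and query over.
assert len(concise) == len(verbose) - 1
```

The shortcut also avoids a blank node, which tends to simplify merging data across sources.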
cheers,
Dan
--
http://danbri.org/