Kristof Zelechovski wrote:
It is not metadata vs data, it is metadata vs content. Data in HTML
documents go into the SCRIPT element and they are usually expected to be
private to the page.
Chris
There's a significant body of work and thought around microformats (see
below) that argues against keeping a separate and hidden pot of
[meta]data. And in RDF land, we've found time and again that the
distinction between so-called "metadata" and "data" is one that serves
largely to confuse.
Re content vs [meta]data and microformats, see e.g.
http://tantek.com/log/2005/06.html#d03t2359 via
http://microformats.org/wiki/principles
[[
One of the principles of microformats is to be presentable and parsable.
This means we prefer visible data to invisible metadata. This is one of
the lessons we learned from the meta keywords debacle.
In the early days of HTML, authors used to place keywords for their
pages in an invisible <meta> tag and search engines used this
information, because the specifications said to do so. However, before
long, in the realm of the Wild Wild Web, these meta keywords fell out of
sync with the content on pages, were polluted, spammed, and otherwise
abused until there was so much noise, any semblance of signal was lost.
Along came a new search engine that ignored meta keywords, used visible
hyperlinks instead, and instantly provided better results than all other
existing search engines.
Lesson learned: hyperlinks, being visible by default, proved more
reliable and persistently accurate for many reasons. Authors readily saw
mistakes themselves and corrected them (because presentation matters).
Readers informed authors of errors the authors missed, which were again
corrected. This feedback led to an implied social pressure to be more
accurate with hyperlinks thus encouraging authors to more often get it
right the first time. When authors/sites abused visible hyperlinks, it
was obvious to readers, who then took their precious attention somewhere
else. Visible data like hyperlinks with the positive feedback loop of
user/market forces encouraged accuracy and accountability. This was a
stark contrast from the invisible metadata of meta keywords, which,
lacking such a positive feedback loop, through the combination of gaming
incentives and natural entropy, deteriorated into useless noise.
]]
In the RDF scene, many agree with the core claim here: data that is not
used rots. We RDFish people perhaps tend to take a broader view of
"use", and allow that the data might live primarily in, e.g., a database
or app, with its expression in HTML markup being a downstream copy. So, for
example, FOAF files that are generated automatically from a "social
network" site are vastly more likely to be up to date than FOAF files
that are hand edited or were created by one-shot tools like
foaf-a-matic. The core data might live in the social network site's db
rather than in HTML, but the principle here is that data that's unused
and unseen by humans is unlikely to be kept accurate. Data embedded in
real life activity is much healthier.
The Microformat view tends towards putting data in human-readable blocks
of markup as a way of keeping it visible and alive. The RDF community
tends more towards making sure it can be consumed by multiple tools, so
that it is "seen" and consumed widely. Both generally agree that the
head section of an HTML document isn't usually the healthiest place to
store and manage [meta]data.
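
To make the contrast concrete, here is a rough sketch (not anyone's
official tooling) that reads the same page two ways: the invisible
keywords in the head, and the visible, class-tagged text in the body.
The sample page, the VisibleVsHidden class and the extract() helper are
all made up for illustration; the class names "vcard", "fn" and "org"
follow hCard conventions, and the parser assumes the tagged spans are
not nested.

```python
from html.parser import HTMLParser

# Hypothetical sample page: the head keywords have drifted out of sync,
# while the visible hCard-style markup is what readers actually see.
SAMPLE_PAGE = """
<html>
<head>
<meta name="keywords" content="cheap flights, casino, free ringtones">
</head>
<body>
<div class="vcard">
<span class="fn">Alice Example</span> works at
<span class="org">Example Org</span>.
</div>
</body>
</html>
"""


class VisibleVsHidden(HTMLParser):
    """Collects hidden meta keywords and visible class-tagged text separately."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.hidden_keywords = []  # from <meta name="keywords">
        self.visible = {}          # class name -> visible text
        self._current = None       # wanted class of the span being read, if any

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.hidden_keywords += [k.strip() for k in content.split(",") if k.strip()]
        elif tag == "span":
            # Assumption: tagged spans are flat (no nested spans inside them).
            classes = (attr.get("class") or "").split()
            matches = [c for c in classes if c in self.wanted]
            self._current = matches[0] if matches else None

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None

    def handle_data(self, data):
        if self._current:
            # Text may arrive in several chunks, so append rather than assign.
            self.visible[self._current] = self.visible.get(self._current, "") + data


def extract(page, wanted_classes=("fn", "org")):
    """Return (hidden keywords, visible class-tagged fields) for a page."""
    parser = VisibleVsHidden(wanted_classes)
    parser.feed(page)
    parser.close()
    return parser.hidden_keywords, parser.visible


if __name__ == "__main__":
    hidden, visible = extract(SAMPLE_PAGE)
    print("hidden keywords:", hidden)
    print("visible fields: ", visible)
```

Nothing in the code can tell which of the two is accurate, of course;
the point is only that the second set of values is the text a reader
would see and complain about if it were wrong.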
cheers,
Dan
--
http://danbri.org/