Re: [whatwg] Creative Commons Rights Expression Language

Dan Brickley Mon, 25 Aug 2008 02:23:27 -0700


Kristof Zelechovski wrote:

It is not metadata vs data, it is metadata vs content.  Data in HTML
documents go into the SCRIPT element and they are usually expected to be
private to the page.
Chris

There's a significant body of work and thought around microformats (see below) that argues against keeping a separate and hidden pot of [meta]data. And in RDF land, we've found time and again that the distinction between so-called "metadata" and "data" is one that serves largely to confuse.

Re content vs [meta]data and microformats, e.g. see http://tantek.com/log/2005/06.html#d03t2359 via http://microformats.org/wiki/principles

[[

One of the principles of microformats is to be presentable and parsable. This means we prefer visible data to invisible metadata. This is one of the lessons we learned from the meta keywords debacle.

In the early days of HTML, authors used to place keywords for their pages in an invisible <meta> tag and search engines used this information, because the specifications said to do so. However, before long, in the realm of the Wild Wild Web, these meta keywords fell out of sync with the content on pages, were polluted, spammed, and otherwise abused until there was so much noise, any semblance of signal was lost. Along came a new search engine that ignored meta keywords, used visible hyperlinks instead, and instantly provided better results than all other existing search engines.

Lesson learned: hyperlinks, being visible by default, proved more reliable and persistently accurate for many reasons. Authors readily saw mistakes themselves and corrected them (because presentation matters). Readers informed authors of errors the authors missed, which were again corrected. This feedback led to an implied social pressure to be more accurate with hyperlinks thus encouraging authors to more often get it right the first time. When authors/sites abused visible hyperlinks, it was obvious to readers, who then took their precious attention somewhere else. Visible data like hyperlinks with the positive feedback loop of user/market forces encouraged accuracy and accountability. This was a stark contrast from the invisible metadata of meta keywords, which, lacking such a positive feedback loop, through the combination of gaming incentives and natural entropy, deteriorated into useless noise.

]]

In the RDF scene, many agree with the core claim here: data that is not used, rots. We RDFish people perhaps tend to take a broader notion of reuse, and allow that the data might live primarily e.g. in a database or app, with its expression in HTML markup being a downstream copy. So, for example, FOAF files that are generated automatically from a "social network" site are vastly more likely to be up to date than FOAF files that are hand edited or were created by one-shot tools like foaf-a-matic. The core data might live in the social network site's db rather than in HTML, but the principle here is that data that's un-used and un-seen by humans is unlikely to be kept accurate. Data embedded in real life activity is much healthier.

The Microformat view tends towards putting data in human-readable blocks of markup as a way of keeping it visible and alive. The RDF community tends more towards making sure it can be consumed by multiple tools, so that it is "seen" and consumed widely. Both generally agree that the head section of an HTML document isn't usually the healthiest place to store and manage [meta]data.


cheers,

Dan

--
http://danbri.org/
