Re: [whatwg] Trying to work out the problems solved by RDFa

Calogero Alex Baldacchino Fri, 09 Jan 2009 20:35:44 -0800

Ben Adida wrote:

Ian Hickson wrote:
We have to make sure that whatever we specify in HTML5 actually is going to be useful for the purpose it is intended for. If a feature intended for wide-scale automated data extraction is especially susceptible to spamming attacks, then it is unlikely to be useful for wide-scale automated data extraction.
It's no more susceptible to spam than existing HTML, as per my previous
response.

Perhaps this is why general-purpose search engines do not rely (entirely) on metadata and markup semantics to classify content, and neither does Yahoo with SearchMonkey. The SearchMonkey documentation points out that metadata never affects page rank, nor is its semantics interpreted for any purpose; metadata only affects the additional information presented to the user, at the user's request, and only if the user has chosen to get information of a certain kind (gathered by a certain data service). Spammy metadata can therefore be considered circumscribed in this case: it might corrupt SearchMonkey's additional data, but not the user's overall experience with the search engine. From this point of view, SearchMonkey is a wide-range but small-scale use case (with respect to each tool and each site the user might enable), because the user can easily choose which sources to trust (e.g. which data services to use, or which sites to consult for additional information), and in any case he can get enough information without metadata.

On the other hand, a client UA implementing a feature entirely based on metadata couldn't easily circumscribe abused metadata and bring valid information to the user's attention, nor could the average user easily tell trusted and spammy sites apart, because he wouldn't understand the problem (and a site with spammy metadata might still contain information the user was interested in previously, or in a different context). In SearchMonkey, by contrast, the average user would notice that something doesn't work in the enhanced results, but he'd still get the basic information he was looking for. Thus there are different requirements to be taken into account for different scenarios (SearchMonkey and a client UA are two such scenarios).

Moreover, SearchMonkey is a kind of centralised service based on distributed metadata. By default it doesn't need collaboration from any other UA (that is, it doesn't need support for metadata in other software), although it allows custom data services to extract metadata autonomously, but always for the purposes of SearchMonkey. It only requires that web sites adhering to the project (or just willing to provide additional information) embed some kind of metadata solely for the purpose of making it available to SearchMonkey services, or at least that authors create appropriate metadata and send it to Yahoo (in the form of dataRSS embedded in an Atom document). That is, SearchMonkey seems to me a clear example of a use case for metadata that does not require any changes to the HTML5 spec, since every kind of supported metadata is used by SearchMonkey as if it were custom, private metadata; whatever happens to such metadata client-side, even if it is just stripped by a browser, doesn't really matter.

Furthermore, SearchMonkey supports several kinds of metadata: not only RDFa, but also eRDF, microformats and dataRSS external to the document. So why should SearchMonkey be the reason to introduce explicit support for RDFa and not also for eRDF, which doesn't require new attributes, but just a parser? One might think one solution is better than the other, and this might be true in theory, but what really counts is what people find easier to use, and this might be determined by experience with SearchMonkey (that is, let's see what people use more often, then decide what's more needed).

Moreover, RDFa was designed for XHTML, so it can't be introduced into the HTML serialization just by defining a few new attributes: a processor would, or might, need some knowledge of /namespaces/, so the whole "family" of *xmlns* attributes (with and without prefixes) would have to be specified for use with the HTML serialization, unless an alternative mechanism, similar to the one chosen for eRDF, were defined, and that might result in a new, hybrid mechanism (stitching together pieces from eRDF and RDFa). But if we introduce xmlns and xmlns:<prefix> into the HTML serialization, why not also prefixed attributes? That is, can RDFa be introduced into the HTML serialization "as is", without resorting to the whole of XML extensibility? This should be taken into account as well, because just adding new attributes to the language might work fine for XML-serialized documents, but might not for HTML-serialized ones. This means RDFa support might be more difficult than it seems at first glance, whereas it might not be needed for custom and/or small-scale use cases (and I think SearchMonkey is one such case).
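
To make the namespace dependency concrete, here is a minimal sketch (Python, standard library only) of what a processor working on the HTML serialization would have to do before it could expand a prefixed value such as dc:title. The HTML parser hands over xmlns:dc as an ordinary attribute name with no namespace meaning, so the prefix map has to be rebuilt by hand. The class name PrefixCollector, the flat (unscoped) prefix map and the sample markup are assumptions made for illustration only; a real RDFa processor follows scoping rules that this sketch deliberately ignores.

```python
from html.parser import HTMLParser


class PrefixCollector(HTMLParser):
    """Hypothetical helper: rebuilds a prefix map from xmlns:* attributes
    and expands prefixed values found in `property` attributes."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # prefix -> namespace URI (flat, no element scoping)
        self.resolved = []   # expanded URIs, or None when the prefix is unknown

    def handle_starttag(self, tag, attrs):
        # First pass: to an HTML parser, xmlns:dc is just an attribute
        # whose name happens to contain a colon.
        for name, value in attrs:
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value

        # Second pass: expand prefix:local values using the map built so far.
        for name, value in attrs:
            if name == "property" and value and ":" in value:
                prefix, local = value.split(":", 1)
                namespace = self.prefixes.get(prefix)
                self.resolved.append(namespace + local if namespace else None)


sample = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<span property="dc:title">A title</span>'
    '<span property="foo:bar">No declaration for foo</span>'
    '</div>'
)

collector = PrefixCollector()
collector.feed(sample)
print(collector.resolved)
```

The second span shows the failure mode: without a declaration the processor has seen and understood, foo:bar cannot be expanded at all, which is why the xmlns attributes (or some substitute) would need to be specified for the HTML serialization too.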

Nobody is suggesting that user agents derive any behavior from <title>, so it doesn't matter if <title> is spammed or not.


And RDFa does not mandate any specific behavior, only the ability to
express structure. The power lies in products like SearchMonkey that
make use of this structure with innovative applications.

Can one imagine tools that make poor use of this structured data so that
they incentivize spam? Absolutely. Is this the bar for HTML5? If bad or
poorly conceived applications can be imagined, then it's not in the
standard?

I think the right question is whether there are effective countermeasures to circumscribe bad uses and make the possible damage less significant than the advantages of good uses. When a feature in the standard is thought to be a possible security (or privacy) issue, countermeasures are proposed. Since spam is a possible immediate issue for abused metadata, especially in wide-scale and automated data extraction, we should also think about possible countermeasures to be specced out along with the RDFa attributes.


WBR, Alex


