Bert wrote:
When it comes to search engines, can anyone prove that lean code
is better? Has anyone done research on this claim? Google is full
of tagsoup sites that are highly ranked.

I searched for "web design" in Google (pages from Australia
only).  The top 3 (non sponsored) sites used tables for layout,
none of them validated and only one had a doctype. They all used
some CSS but only in addition to the tagsoup.

So where are the benefits?


=====================================

As a digest reader I don't respond much, since most things are settled or overworked by the time I read them. This may fit the latter category, but even so, every now and then it's good to ask "why bother?" Bert's been at it long enough to know the answer, but sometimes it's good to hear it anyway.

For standards, the answer to "why bother?" isn't really found in page size, bandwidth savings or even ease of reconfiguration. The answer is in the purpose of the semantic web. Berners-Lee, Hendler and Lassila were clear about their goal in their seminal article "The Semantic Web". The target is a decentralized, data-driven web. That's the goal of the W3C: "a common framework that allows data to be shared and reused across application, enterprise, and community boundaries."[1]

To make that vision work, data has to be accessible and understandable. Web standards are focused on making that possible. Page design is a part of those standards, but it is not their full extent.

The reason we follow standards should be to create pages which offer data that can be searched by user agents in a reliable way. That is, the content is presented in accordance with a declared set of rules which clearly define what the elements are and what their usage means. Furthermore, standards mean presentation can be safely ignored, since it in no way affects the content. With known elements and no concern about presentational inference, data becomes the focus. Some of that data can even be extracted from the semantics of the page: proper use of headings, lists as dialogue, etc. A good start, but only a start.
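
As a rough illustration of that extraction step, here is a minimal sketch in Python using only the standard library's html.parser. The sample markup and the choice of elements to collect are my own inventions, not anyone's actual page; the point is that a user agent which trusts the declared semantics can pull out headings and list items without caring how the page looks:

from html.parser import HTMLParser

# Invented sample of standards-compliant markup: structure only, no presentation.
SAMPLE_PAGE = """
<h1>Web design</h1>
<h2>Services</h2>
<ul>
  <li>Accessible markup</li>
  <li>CSS layout</li>
</ul>
"""

class SemanticsExtractor(HTMLParser):
    """Collects heading and list-item text, ignoring presentation entirely."""
    def __init__(self):
        super().__init__()
        self.current = None   # semantic tag we are currently inside
        self.found = []       # (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "li"):
            self.current = tag

    def handle_data(self, data):
        if self.current and data.strip():
            self.found.append((self.current, data.strip()))

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

parser = SemanticsExtractor()
parser.feed(SAMPLE_PAGE)
for tag, text in parser.found:
    print(tag, "->", text)
# h1 -> Web design
# h2 -> Services
# li -> Accessible markup
# li -> CSS layout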

What we need to remember is that standards go well beyond this: a fully semantic web revolves around the RDF framework, employing URIs, taking advantage of SPARQL and OWL for data manipulation, and making use of DAML+OIL for establishing data equivalence, bringing the flexibility of the web to fruition:

"RDF is a flexible and extensible way to represent information about World Wide Web resources. It is used to represent, among other things, personal information, social networks, metadata about digital artifacts, as well as provide a means of integration over disparate sources of information. A standardized query language for RDF data with multiple implementations offers developers and end users a way to write and to consume the results of queries across this wide range of information. Used with a common protocol, applications can access and combine information from across the Web." [2]

This is a vision that moves well beyond the electronic. URIs are conceived of as representing not just links but also places and people. The RDF framework of graphs provides a way to establish the links in a chain by which a person acquires and evaluates knowledge and resources in a, hopefully, evolving process.

The problem is that the data upon which such a web depends is in incompatible, and likely mutually incomprehensible, formats. Making that data interchangeable is what standards do. Various XML-derived technologies such as OWL[3] and DAML+OIL[4] help to establish an equivalence of data, making it possible for machines and people to trust that the data they are receiving is understood as it was meant to be: "zip" is understood as intended by the original author, whether that meaning is "zip code", "nothing", or "a type of zipper". The translation is accurate.
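
Here is a sketch of what that equivalence might look like in practice, again with rdflib and invented vocabularies. I apply the OWL equivalence with a small hand-written expansion step; a real system would hand that job to an OWL reasoner:

from rdflib import Graph, Namespace
from rdflib.namespace import OWL

B = Namespace("http://shop-b.example/terms#")

g = Graph()
g.parse(format="turtle", data="""
@prefix a:   <http://shop-a.example/terms#> .
@prefix b:   <http://shop-b.example/terms#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Two sources say the same thing with different property names.
<http://shop-a.example/order/1> a:zip "2600" .
<http://shop-b.example/order/7> b:postalCode "3000" .

# The published mapping: here "zip" means "postal code", not "zipper type".
a:zip owl:equivalentProperty b:postalCode .
""")

# Stand-in for the reasoning step: copy every statement made with one
# property over to its declared equivalent, in both directions.
for p, q in list(g.subject_objects(OWL.equivalentProperty)):
    for s, o in list(g.subject_objects(p)):
        g.add((s, q, o))
    for s, o in list(g.subject_objects(q)):
        g.add((s, p, o))

# A query written against shop B's vocabulary now also finds shop A's data.
for s, o in g.subject_objects(B.postalCode):
    print(s, o)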

In the end, each user, all of us included, will shape their own experience of the web. Trust, the feeling that the data is accurate, will come with a combination of electronic signature verification and the slower process of building a set of URIs individuals find reliable. It's not an absolute standard of trust; it's a personal and relative standard, as varied as the web.

Will this matter to large commercial sites? Probably not, nor necessarily should it. Their version of the web is circumscribed by their concern with brand-name recognition and, frankly, sales. People maintaining such sites won't worry about triples and how DAML classes are written. A WYSIWYG authoring tool, with nested tables and tag soup, will work fine for them unless and until there is a compelling business reason to do pages another way.

You can bet that if standards-compliant pages lead to new and different search methodologies, commercial sites will follow along. Business didn't invent the web; it just changed it. Other groups can do the same thing.

You can have the web your way or have someone else define it for you. That's why you should bother, Bert.

drew






[1] http://www.w3.org/2001/sw/
[2] http://www.w3.org/TR/rdf-sparql-query/
[3] http://www.w3.org/TR/owl-features/
[4] http://www.w3.org/TR/daml+oil-reference
