Here's my understanding of the situation with regard to DOCTYPE and how pages may be assembled from parts prior to being stored (static) or delivered (dynamic) from the server.
If any tools mechanically generate web page <body> content, in part or in its entirety, you have to ensure that the final page, as served from the server, is compatible with a single DOCTYPE declaration. I assume the CMS is the likely determining factor, since it will generally be designed to generate a particular grade of HTML. That includes the result after server-side includes have been processed. There is no reason that the choice of national language should be the determining factor. I wonder whether the differences arose simply because the different authoring communities had their own preferences, perhaps related to agreements around authoring tools.

Likewise, the character encoding has to be the same throughout the served web page. I presume it is UTF-8, since there are NL concerns and it is simply a good choice. That means the httpd configuration must ensure that the proper MIME type, with its specific charset parameter, is also part of the response header. The only sane way to operate this is to have it be the same for all pages as delivered from the server.

There may be similar considerations for the Community Forums and the MediaWiki as well. Those choices can be resolved independently, but the DOCTYPE declarations should be accurate at all times, of course. That is not always the case on many sites.

- Dennis

PS: I'm ignoring the HTML 4.01 vs. XHTML 1.0 debate. Going to HTML5 still requires a decision whether it is done using the HTML or the XML flavor. No matter what the direction, the problem is going to be how page assembly is done and which page-generating products have to be accommodated. Finally, it is important to have valid pages under whatever the DOCTYPE is, and also to have a successful result with as many browsers and their users as possible.
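Both requirements above (one accurate DOCTYPE for the assembled page, and a UTF-8 charset declared in the HTTP Content-Type header) can be checked mechanically on a page as actually delivered. Here is a minimal Python sketch, not an existing tool: the function names and the choice of HTML 4.01 Transitional as the site-wide DOCTYPE are assumptions for illustration. On Apache httpd the header side is typically set with a directive such as AddDefaultCharset UTF-8; the sketch only verifies the result.

```python
import re
from typing import List, Optional

# Assumed site-wide declaration (HTML 4.01 Transitional, as proposed in the thread).
HTML401_TRANSITIONAL = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
                        '"http://www.w3.org/TR/html4/loose.dtd">')

DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def charset_of(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, lowercased."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return None


def normalize(decl: str) -> str:
    """Collapse runs of whitespace so equivalent declarations compare equal."""
    return " ".join(decl.split())


def check_served_page(content_type: str, html: str, expected_doctype: str) -> List[str]:
    """Return the consistency problems for one delivered page (empty list if none)."""
    problems: List[str] = []
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/html":
        problems.append(f"unexpected media type {media_type!r}")
    if charset_of(content_type) != "utf-8":
        problems.append("Content-Type header does not declare charset=UTF-8")
    found = [normalize(d) for d in DOCTYPE_RE.findall(html)]
    if len(found) != 1:
        # Two or more usually means an included fragment brought its own DOCTYPE.
        problems.append(f"expected one DOCTYPE, found {len(found)}")
    elif found[0] != normalize(expected_doctype):
        problems.append(f"DOCTYPE differs from the site-wide one: {found[0]}")
    elif not html.lstrip().upper().startswith("<!DOCTYPE"):
        problems.append("DOCTYPE is not at the start of the page")
    return problems
```

Checking the delivered page, rather than the source files, matters because a second DOCTYPE in the middle of a page is the typical symptom of an included fragment that was authored as a complete document.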
It might be more valuable to consider what it takes to make the pages adaptable to small-format device browsers (i.e., smartphones and tablets) and to pay close attention to accessibility requirements than to fuss about not-yet-approved HTML specifications.

- Dennis

-----Original Message-----
From: Dave Fisher [mailto:[email protected]]
Sent: Thursday, March 15, 2012 08:33
To: [email protected]
Subject: Re: Doctype of websites

On Mar 15, 2012, at 7:37 AM, Rob Weir wrote:

> On Thu, Mar 15, 2012 at 10:33 AM, Dave Fisher <[email protected]> wrote:
>>
>> On Mar 15, 2012, at 12:22 AM, Regina Henschel wrote:
>>
>>> Hi,
>>>
>>> Joe Schaefer wrote:
>>>>> ________________________________
>>>>> From: Regina Henschel<[email protected]>
>>>>> To: [email protected]
>>>>> Sent: Tuesday, March 13, 2012 5:31 PM
>>>>> Subject: Re: Doctype of websites
>>>>>
>>>>> Hi Joe,
>>>>>
>>>>> Joe Schaefer wrote:
>>>>>> Those de.openoffice.org pages should redirect
>>>>>> to www.openoffice.org/de pages, if not your
>>>>>> DNS resolver is busted.
>>>>>
>>>>> I had indeed set de.openoffice.org to 192.9.163.104. Removing it makes
>>>>> redirecting work.
>>>>>
>>>>> That means the pages at de.openoffice.org had been the original ones,
>>>>> but will be deleted in near future. They had been imported to
>>>>> ooo-site.apache.org/de and here they have got a different doctype. Right?
>>>>
>>>>
>>>>
>>>> Well sort of. If you look at the actual document on the site
>>>> you will probably find it contains an XHTML doctype even now.
>>>> The thing is that the CMS build system as Dave has designed it
>>>> will strip most of the header matter out of the file and replace
>>>> it with a generic one supplied by a template.
>>>>
>>>>
>>>>>
>>>>> If that's not the problem
>>>>>> then you need to refresh your pages as they
>>>>>> are identical on the server.
>>>>>>
>>>>>> As to why the doctype is different from the original
>>>>>> document, that's probably due to the way Dave worked
>>>>>> out the templates for the site. If we need to scrape
>>>>>> the doctype out of each individual page that will require
>>>>>> some perl coding work, some templating work,
>>>>>> and another sledgehammer style commit- ie not something
>>>>>> to be taken lightly.
>>>>>
>>>>> Our pages had been XHTML with all the differences to HTML. And we tried
>>>>> to produce valid pages (including W3C check button). It is not
>>>>> impossible to change the pages and it can be done bit by bit while
>>>>> reviewing the pages. But the aim should be clear.
>>>>
>>>>
>>>> Well I can't advise you how to proceed from here, only point out
>>>> that there is some impedance mismatch between how your site builds
>>>> work and what's actually in these documents. The choice seems
>>>> to be either standardize all the documents on a common doctype
>>>> or have the perl code pull the doctype out of the original document
>>>> if it exists and pass it along to the template as an argument.
>>>>
>>>>
>>>> You might even be better off just not supplying a doctype at all
>>>> and letting the browser figure it out. Up to you folks.
>>>>
>>>
>>> If we want valid pages, a common doctype is needed because the inserted
>>> part has to be written in a way, that it fits this doctype. For example you
>>> need for the feather-logo an <img .../> element in XHTML and in HTML only
>>> <img ...>. So I think we need to agree on one doctype.
>>>
>>> Is it possible to count, how many pages of all are actually having an XHTML
>>> doctype? (I'm not familiar with command line.)
>>>
>>> Kind regards
>>> Regina
>>>
>>> P.S. The feather img-Element is missing the alt-attribute.
>>
>> I have been looking into this. In general the skeleton is the non-compliant
>> part and is what should be changed. However there are many of the NLC sites
>> that are very much HTML.
>>
>> One more sledgehammer will happen ... but planning needs to be careful.
>>
>
> What if we went subdomain by subdomain and ran HTML Tidy on the
> content to coerce it to a single doctype. Would that butcher things?

We have a file called content/brand.mdtext that controls the branding language and logo for each page.

In templates we have templates/ssi.mdtext and templates/api/ssi.mdtext

David-Fishers-MacBook-Air:templates dave$ more ssi.mdtext
brand: /brand.html
footer: /footer.html
topnav: /topnav.html
home: home

I think that ssi.mdtext should add a line like:

doctype: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

And if "mn" needs a different treatment:

templates/mn/ssi.mdtext
brand: /mn/brand.html
footer: /footer.html
topnav: /mn//topnav.html
home: home
doctype:

This fits the NL plan. I want to avoid divergent skeleton.html files, and it may be the case that some sections will want an xhtml skeleton while others get a html. I still intend to avoid changing every file.

I've $job to pay attention to until late today ... sorry that I'm dribbling out these plans bit by bit.

Regards,
Dave

>
> -Rob
>
>> Regards,
>> Dave
>>
>>
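Dave's proposal can be sketched in Python to show how a per-section doctype key in ssi.mdtext would flow through to generated markup, including Regina's point that the inserted feather logo must be written as <img ... /> under XHTML and <img ...> under HTML, with an alt attribute either way. This is an illustration under assumptions, not the site's actual build code (which, per Joe, is Perl): looking files up by path in a dict, falling back from templates/<section>/ssi.mdtext to templates/ssi.mdtext, the "contains XHTML" test, and the logo path and alt text are all hypothetical choices.

```python
import html
from typing import Dict, Optional

# Default DOCTYPE, as in Dave's proposed line for templates/ssi.mdtext.
HTML401_TRANSITIONAL = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
                        '"http://www.w3.org/TR/html4/loose.dtd">')


def parse_ssi(text: str) -> Dict[str, str]:
    """Parse the 'key: value' lines of an ssi.mdtext file.

    Only the first ':' separates key from value, so a DOCTYPE containing
    'http://...' survives intact.
    """
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            settings[key.strip()] = value.strip()
    return settings


def section_settings(files: Dict[str, str], section: Optional[str]) -> Dict[str, str]:
    """Use templates/<section>/ssi.mdtext if present, else templates/ssi.mdtext.

    `files` maps template paths to contents (a hypothetical stand-in for disk reads).
    """
    if section is not None:
        path = f"templates/{section}/ssi.mdtext"
        if path in files:
            return parse_ssi(files[path])
    return parse_ssi(files["templates/ssi.mdtext"])


def is_xhtml(doctype: str) -> bool:
    """Crude, assumed test: treat any DOCTYPE that names XHTML as XML syntax."""
    return "XHTML" in doctype.upper()


def logo_img(src: str, alt: str, doctype: str) -> str:
    """Emit the logo <img> in the syntax the section's DOCTYPE calls for, always with alt."""
    attrs = f'src="{html.escape(src)}" alt="{html.escape(alt)}"'
    return f"<img {attrs} />" if is_xhtml(doctype) else f"<img {attrs}>"


if __name__ == "__main__":
    files = {
        "templates/ssi.mdtext": "brand: /brand.html\nfooter: /footer.html\n"
                                "topnav: /topnav.html\nhome: home\n"
                                "doctype: " + HTML401_TRANSITIONAL,
    }
    # No templates/de/ssi.mdtext in this example, so the site default applies.
    settings = section_settings(files, "de")
    # "/images/feather.png" and the alt text are hypothetical placeholders.
    print(logo_img("/images/feather.png", "Apache feather logo", settings["doctype"]))
    # prints <img src="/images/feather.png" alt="Apache feather logo">
```

Keeping the doctype in ssi.mdtext means a section switches syntax by editing one file, which fits the stated aims of avoiding divergent skeleton.html files and not changing every page.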
