[Bug 671] Whitelist more HTML tags: address dfn kbd q samp

bugzilla-daemon Wed, 21 Jul 2010 18:30:21 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=671

S. McCandlish <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Whitelist more HTML tags:   |Whitelist more HTML tags:
                   |dfn kbd q samp              |address dfn kbd q samp

--- Comment #41 from S. McCandlish <[email protected]> 2010-07-22 01:29:44 
UTC ---
I apologize for the length of this, but this has gone on for far, far too long,
and is getting increasingly out in left field on several points. This bug needs
to get fixed as soon as possible.  I'm not trying to be sarky, but it's getting
very tiresome having the fix for this bug held off or outright rejected on the
basis of "I don't see why..."  So, I'll explain why in detail. It'll be
tedious, but this bug's remained mostly unfixed for years, so I guess it's
necessary.

--- Comment #39 from The Evil IP address <[email protected]>
2010-07-21 19:20:27 UTC ---
> I think there are potential uses for these tags within wikis: The kbd tag
> is used for keyboard text and could for example be used for [[Template:Key
> press]] on the English language Wikipedia. The samp tag could be useful
> for formatting the software text given by other software, for example in
> manual pages.

Right. And perhaps even more to the point, these serve specific semantic and
usability/accessibility purposes, and Wikipedia and other MW wikis are
presently sorely abusing the code element and (now deprecated) tt element to
make up for the lack of these.  They should never have been left out to begin
with.  None of these should have, really, except arguably the q element because
of its widespread implementation issues.

--- Comment #37 from Aryeh Gregor <[email protected]> 2010-06-13
23:57:30 UTC 
> [The dfn element] sounds like it might be worth whitelisting.
> Does anyone have any actual evidence that it's used by search engines?
> Alternatively, S. McCandlish, what's this rich glossary code and why does
> it need dfn?

The lack of the dfn element is a particular thorn in my side on Wikipedia right
now, in reality and not in theory, as WP and other MW-based wikis
*overwhelmingly abuse* the dl/dt/dd elements all over the place (e.g. every
talk page!) for purely visual indentation (:) and/or boldfacing (;).  A new
Bugzilla ticket (if one doesn't exist for this yet) needs to be opened to fix
this - replace all dl/dt/dd output of ":" and ";" wikimarkup with CSS-styled
divs. The only way to distinguish a real definition (e.g. in a glossary,
in-article or as a stand-alone list article) at the Web semantics level (see
below for numerous reasons this is useful and important) from misuse of these
elements is with the dfn element.  See draft guideline at [[WP:GLOSSARY]], and
its geeky-as-heck subpage for some detail on MW/WP evilbadness when it comes to
definition and more general lists.  For more background on the dfn element and
its stable HTML 5 future, see the
http://www.w3.org/TR/html5/text-level-semantics.html#the-dfn-element page.
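To make the distinction concrete, here is a rough sketch of what the two cases could look like in rendered HTML if ":" indentation were output as a styled div and real glossaries kept dl/dt/dd plus dfn. The "indent" class name and its CSS rule are made up for illustration; they are not existing MW output.

```html
<!-- Talk-page reply: visual indentation only, so no definition-list markup.
     "indent" is a hypothetical class, styled by site CSS such as
     .indent { margin-left: 1.6em; } -->
<div class="indent">I agree with the proposal above.</div>

<!-- Glossary entry: a genuine term/definition pair, with dfn marking
     the defining instance of the term. -->
<dl>
  <dt><dfn>Trim level</dfn></dt>
  <dd>A named package of features and equipment offered for a given car model.</dd>
</dl>
```

Software (or a screen reader) can then tell the glossary entry apart from the indented reply without guessing from the dl element alone.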

That said (i.e. my personal reason for stumping for dfn), the dfn element is
also very useful all over Wikipedia and similar sites just by the element's
very nature.  It should be one of the most-used.  For example, I think that on
Wikipedia in particular, the bold-faced beginning of lead sections in mainspace
articles should mostly be done with a template that auto-adds dfn, instead of
manual boldfacing, e.g.: "An {{leadterm|electrokardiogram}} is..."  I mean,
really, that's precisely what this element exists for: To flag the defining
instance (in its context) of a term.  The average sighted reader viewing a WP
article in a typical browser might not notice anything different, but dfn has a
lot of potential for automated processing and for accessibility improvements,
especially in articles that cover several closely related things (e.g. submodels
or "trim levels" of a car, such as the GT Cruiser and Street Cruiser variants of
the PT Cruiser).
With dfn support, users of text-to-speech screen readers would be able to
customize their style sheets to do something specific for dfn-flagged "defining
instances" to help distinguish them from "just another section" and "just
another boldfaced something".
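As an illustration only, assuming the hypothetical {{leadterm}} template simply wraps its argument in a dfn element, the rendered lead sentence and a site rule that preserves today's bold look might be as follows. Browsers typically italicize dfn by default, hence the font-style override; a reader's own stylesheet could restyle dfn again however they like.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Electrokardiogram</title>
  <style>
    /* Site default: keep the familiar bold lead term instead of italics. */
    dfn { font-style: normal; font-weight: bold; }
    /* A reader's own stylesheet could override this, e.g. to highlight
       every defining instance: dfn { background: #ffc; } */
  </style>
</head>
<body>
  <!-- Hypothetical output of {{leadterm|electrokardiogram}} -->
  <p>An <dfn>electrokardiogram</dfn> is a recording of the heart's electrical activity.</p>
</body>
</html>
```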

I don't recall making any search-engine-related argument about dfn, though
there may well be one.  And an argument like "we shouldn't implement it because
search engines don't use it" (which I'm not sure was being made here, but it
kind of looked like it) is logically invalid anyway, since there may be many
other things/people that do/will use the feature under discussion for various
reasons and purposes (not to mention that it's tautological and circular - a
search engine can't use a wiki feature that isn't implemented, by definition,
ergo the lack of evidence of search engines using the feature on wikis cannot
be used as an argument against the feature's wiki implementation, obviously -
it puts the cart before the horse).

--- Comment #39 from The Evil IP address <[email protected]>
2010-07-21 19:20:27 UTC ---
> The usages stated at
> http://www.w3.org/TR/html5/text-level-semantics.html#the-samp-element and
> http://www.w3.org/TR/html5/text-level-semantics.html#the-kbd-element are more
> than valid usages of this in a wiki.

Agreed on all these points. 

--- Comment #37 from Aryeh Gregor <[email protected]> 2010-06-13
23:57:30 UTC 
> [The address element] makes no sense on wikis.

and 

--- Comment #39 from The Evil IP address <[email protected]>
2010-07-21 19:20:27 UTC ---
> The address element seems really rather pointless and since the acronym
> element is deprecated, they shouldn't be added, but the rest is
> acceptable IMHO.

and

--- Comment #40 from Aryeh Gregor <[email protected]> 2010-07-21
19:26:40 UTC  ---
> I removed acronym and address from the summary because no one seems to
> support adding them.

I'm putting the address element back in, since I, whoever first proposed adding
support for it, and probably others above obviously do support adding it.
The address element isn't deprecated in HTML 5
(http://www.w3.org/TR/html5/sections.html#the-address-element), so DO include
it. It serves a well-defined semantic purpose just like every other
non-presentational element. It IS actually particularly useful on wikis (not
necessarily WikiPEDIA, mind you, but keep in mind that MW software can be used
for an endless number of end purposes, including databases of contact
information, etc.) when integrating metadata inline in the content with class=
and id= attributes and a standardized metadata schema like hCard/vCard. It should be in the MW
software, and it should be up to individual installations' system operators
whether to turn that element off.
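A minimal sketch of that use, assuming a page about a made-up organization whose contact details are the page's own contact information. The class names (vcard, fn, org, adr, street-address, locality, email) are the standard hCard ones; note that HTML 5 intends address for contact information relating to its nearest article or body ancestor, not for arbitrary postal addresses.

```html
<!-- Contact details for the page's subject, marked up with hCard class
     names so microformat parsers can extract them. All data is made up. -->
<address class="vcard">
  <span class="fn org">Example Widget Co.</span><br>
  <span class="adr">
    <span class="street-address">123 Example Street</span>,
    <span class="locality">Springfield</span>
  </span><br>
  <a class="email" href="mailto:info@example.org">info@example.org</a>
</address>
```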

The acronym element IS being deprecated in HTML 5, and SHOULD be dropped from
this bug (it's because acronyms are just a form of abbreviation, so the element
was redundant with abbr).  

The abbr element SHOULD certainly be supported, and I'm glad that it has
finally been added. But it was arguably the second least important one to add
support for! D'oh.

--- Comment #37 from Aryeh Gregor <[email protected]> 2010-06-13
23:57:30 UTC 
> <kbd> and <samp> are basically useless.

and

--- Comment #40 from Aryeh Gregor <[email protected]> 2010-07-21
19:26:40 UTC  ---
> The question isn't whether you could conceive of a case where the samp or
> kbd tags could legitimately be used, but rather whether you can come up
> with a case where it would be *more* useful than just using the code or tt
> or whatever.  If not, why should we whitelist them?  They'll just confuse
> people, they don't add anything.

You seem to be ignoring the nature of [X]HTML and Web semantics. *These tags
actually mean something* and they all mean something *different*. Nothing could
be further from the truth than them being "basically useless" (or they wouldn't
have been carefully preserved in HTML 5 and even better explained there than in
HTML 4 and earlier). It actually blew me away for a while when I tried to use
these as intended, in template documentation, and they didn't work.  Disabling
them was pointless and a bad idea.  You seem to me to be approaching this from
a 1995 browserwars-era, HTML 3-ish "if it LOOKS right, it IS right" Web dev
paradigm that is long obsolete and which generates genuine problems for many
people on both sides of the user/provider Web equation. It's ultimately
irrelevant that your particular browser, and even Wikipedia's style sheets, may
choose to *style* these tags the same, visually (monospaced, non-proportional
font), and they thus *appear* redundant to you.  *They are not the same*.  User
keyboard input (kbd element) is not output (samp element) is not source code
(code element) is not a variable (var element) and so forth.  Any user of a
modern browser is free to override the default style sheets they receive from
WP or any other site and from their browser; there is no guarantee that every
visual browser does and forever will style these elements all the same by
default; there's no guarantee that even Wikipedia will always style them the
same (esp. given the weirdness that WP does in CSS to many things, including the
pre element); there's a near certainty that some screen readers do not treat
them as identical by default; and there's an absolute certainty that power users
of good screen readers customize their style sheets to not treat them as
identical.
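For example, a snippet of template documentation might mark up its pieces as below. The colors and borders are arbitrary choices for the sketch; the point is only that each element can be given its own rendering. (The style element is shown inline for brevity; it would normally live in the page head or in a user stylesheet.)

```html
<style>
  /* Arbitrary example styles: one distinct look per kind of text. */
  kbd  { border: 1px solid #999; padding: 0 0.2em; }
  samp { color: #060; }
  code { background: #f4f4f4; }
  var  { font-style: italic; }
</style>
<!-- Nested kbd: a key combination made of individual keys. -->
<p>To save, press <kbd><kbd>Ctrl</kbd>+<kbd>S</kbd></kbd>.</p>
<!-- User input containing a placeholder (var), then program output (samp). -->
<p>Then type <kbd>cd <var>directory</var></kbd>; if it does not exist,
the shell reports <samp>No such file or directory</samp>.</p>
<!-- Source code. -->
<p>The template tests its input with <code>{{#if:}}</code>.</p>
```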

There's no evidence I'm aware of that they confuse people. If they did, they
would no longer be part of the [X]HTML specs, given that there's been all of
the 1990s and 2000s to get rid of them if they were actually problematic. 
Mostly only HTML-experienced editors do anything with HTML elements manually on
Wikipedia anyway, and they are the least likely to be confused. If some dwid
does muck something up, someone with more know-how will fix it, just like
everything else on a wiki (which is chock full of much, much more confusing
things than a couple of HTML elements).  HTML elements are mostly used in
templates, which again are usually created by savvy editors who know the
difference between one element and another. The "confusing" argument is
therefore unconvincing.

The elements *do* add something: They add semantic precision, which is a boon
for editing clarity (is this code? is it output? what is it? what are we trying
to communicate here? if I need to update this template documentation, which
parts are code, user input, and example output?); they add user customizability
with style sheets; they add to accessibility; they add to MW's Web standards
compliance; they add to open content (i.e. Wikipedia, other Wikimedia projects,
and many non-WM projects) portability by separating different kinds of data;
they enhance the ability to template and metadata-ize content; and so on, and so
on.  There's a reason all these elements still exist.  I mean, really, HTML
*could* be stripped down to only a handful of tags like div and span and p and
br, if semantics didn't matter and only presentation was an issue (even tables
can be simulated with CSS and divs!).

So, to answer "the question" you posed, it is *automatically and by definition*
"more useful" to use the proper elements for the content that is appropriate
for them, than to continue to abuse the code and (worse yet) tt elements, just
like it is automatically and by definition more useful to use a screwdriver
than a hammer when dealing with screws instead of nails, regardless of the fact
that a sufficient application of force with a hammer can drive a screw into
wood as if it were a nail.  Use the right tool for the job, and you have
smoother work, a happier worker, and better work output.

--- Comment #40 from Aryeh Gregor <[email protected]> 2010-07-21
19:26:40 UTC  ---
> Moreover, <q> is much harder to type than regular old quotation
> marks, so is anti-wiki.

This may be a moot point, because the q element is problematic for other
reasons, but I again don't think you are properly applying Web semantics and
the purpose of this markup. The q element is not a replacement for quotation
marks. It's semantic markup that tells editors, parsing software (browsers,
screen readers, format-to-format translation software, specialized search code,
etc., etc.) "this is a quotation".  Quotation marks are used for many things
that are not quotations, e.g. song titles.  The presence of quotation marks is
not an indication that something is a quotation, but the q element is.
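A small made-up example of the difference: the song title gets ordinary typed quotation marks because it is not a quotation, while the spoken words are marked with q and typed without marks, since browsers following the spec are expected to supply them.

```html
<!-- Song title: quotation marks used for a non-quotation purpose. -->
<p>The band closed with "Yesterday".</p>
<!-- Actual quotation: marked with q; no quotation marks typed,
     because conforming browsers insert them. -->
<p>The singer told the crowd, <q>This one is for my mother.</q></p>
```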

And just for the record, the "anti-wiki" bit simply isn't applicable at all. As
with other stuff under discussion here, no one expects the noob or even average
editor to bother with this. We DO expect geeky, code-experienced editors to
bother with it, especially in the quotation-related templates.

HOWEVER, I'm in favor of deferring implementation of the q element until such
time as the Web at large agrees on how it should really be implemented and all
major browsers treat it the same. Normally I will never bend at all to suit
broken Microsoft apps, but this issue actually goes beyond that: some
browsers do not auto-insert quotation marks; in many browsers the auto-generated
quotation marks are treated as non-content, like li bullets and numbers (i.e.
they don't show up in a copy-paste); and few editors know that many browsers
auto-generate them (and all should, at least according to current and
near-future versions of the HTML specs).  I'm removing the q element from the
subject line in concurrence with Aryeh.  If someone wants to argue for enabling
q, it might be better to do that in a separate bug number and make a
well-defended case for it.  The last thing I want is for problems with q to
hold up implementation of dfn, kbd, samp and address, in that order.

--- Comment #40 from Aryeh Gregor <[email protected]> 2010-07-21
19:26:40 UTC  ---
> If a wiki actually wants to use <q> despite the IE problems, it can
> request that and it might be considered.  I haven't seen such a request.

"A wiki" can't make a request; editors on/of a wiki make requests. I concede on
the q element for now, for reasons given, but am definitely requesting that the
rest be implemented as soon as possible.  The situation is actually the
complete opposite, really: MW developers have willy-nilly disabled various
useful features of [X]HTML for no reason at all in most cases (q arguably being
an exception), despite no demand for the developers to do this, and now several
years of this Bugzilla thread seeking an end to it, to only minimal avail so
far (abbr).

IMPORTANT: After this bug is resolved (and finish reading to see why it *must*
be), there will ultimately need to be a new bug report here, to get rid of
support for the tt element entirely, which doesn't exist in HTML 5. Here's a
good discussion of this issue (more broadly than wiki), and Googling about it
turns up more:
http://lists.whatwg.org/htdig.cgi/help-whatwg.org/2009-April/000233.html

Salient quote:

Ian Hickson, Wednesday, 29 April 2009 6:44 AM:
> On Tue, 28 Apr 2009, Jim Garrison wrote:
>> I am trying to figure out the best way to replace the tt element as I
>> migrate to HTML5.
> 
> Are you using tt to mark up computer code, variables, sample computer 
> output, user input, for emphasis, to give a span of text in an alternate 
> voice or mood, a span of text to be stylistically offset from the normal 
> prose without conveying any extra importance, or something else?

This question must be asked every time a tt element is replaced (manually or
via AWB or whatever, and they WILL need to ultimately be replaced over the next
couple of years), yet *two of the correct and most common answers will not
actually be usable until bug 671 is actually fixed*.
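For illustration, here is a hypothetical tt-laden sentence and one way it might be rewritten by answering Hickson's question for each span. The sentence itself is invented; the element choices follow the categories he lists.

```html
<!-- Before: tt used for everything, so the purpose of each span is lost. -->
<p>Call <tt>strlen(s)</tt>; when you type <tt>q</tt>, the program prints <tt>Bye</tt>.</p>

<!-- After: code = computer code, var = variable, kbd = user input,
     samp = sample output (kbd and samp being the two this bug would enable).
     Emphasis would be em, an alternate voice or mood i, stylistically
     offset text b, and anything else a span with a class. -->
<p>Call <code>strlen(<var>s</var>)</code>; when you type <kbd>q</kbd>, the program prints <samp>Bye</samp>.</p>
```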

PS: If we put angle brackets around the element names in these discussions,
many mail systems that try to parse [X]HTML on the fly in e-mail regardless of
its MIME type will interpret them as markup and not render them as text.  I.e.
"since the <blockquote> tag is already..." renders as "since the tag is
already...", with everything after "the" indented (sorry, some of you won't be
able to read that properly; the example passage has a blockquote tag in it).
This makes the messages basically impossible to correctly parse without coming
to bugzilla.wikimedia.org to read them. I've refactored the quoted material in
this message to compensate (e.g. I used "the blockquote tag", etc.).

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
