Re: [whatwg] sic element, was: Re: Exposing spelling/grammar suggestions in contentEditable

Martin Janecke Fri, 31 Dec 2010 07:17:12 -0800

On 30.12.2010 at 22:49, Benjamin Hawkes-Lewis wrote:

> On Thu, Dec 30, 2010 at 8:55 PM, Martin Janecke <[email protected]> wrote:
>> I don't think <mark> is appropriate for what I meant.
>> 
>> I as the publisher usually don't mean[1] to point a reader's attention at 
>> spelling errors by someone I quote, I just want to be able to add semantic 
>> markup that identifies a part of text as deliberately published just the way 
>> it is published.
> 
> Indicating where mistakes have been reproduced in transcribed or
> quoted text seems like a different usage than Charles's application of
> marking mistakes in editable text for potential correction by the
> end-user.


Indeed. My reply, which Hixie referred to, addressed one aspect of a different, 
more general proposal by Charles:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-November/029228.html


> 1. What problem(s) does indicating where mistakes have been reproduced solve?

I understand the question in this context as a concrete form of more general 
questions such as "What problem(s) does metadata solve? What problem(s) does 
semantic markup solve?" Both carry additional information about a text, so they 
solve the problem of not having this information available. Is the additional 
information worthwhile in this special case? I think so. It's common in plain 
text ("[sic]") and even in spoken language. It's found in scientific papers as 
well as in respected newspapers.

Apart from informing human readers that a misspelled word has been reproduced 
faithfully, an HTML <sic> would indicate the same to web applications. Think of 
a search engine that considers a page's orthography and grammar as one quality 
factor in its ranking algorithm. The search engine could be made to ignore 
(reasonably few) <sic>-marked errors in such an algorithm, i.e. not let 
<sic>-marked errors rank the page lower.
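
A minimal sketch of how such an application might treat <sic>, assuming a 
hypothetical <sic> element and a toy word list (KNOWN_WORDS, 
SicAwareTextExtractor and count_spelling_errors are illustrative names, not 
part of any real search engine or proposal):

```python
import re
from html.parser import HTMLParser

# Toy dictionary for illustration only; a real checker would use a full one.
KNOWN_WORDS = {
    "the", "house", "of", "representatives", "shall", "choose",
    "their", "speaker", "and", "other", "officers",
}


class SicAwareTextExtractor(HTMLParser):
    """Collects words and records whether each sits inside a <sic> element."""

    def __init__(self):
        super().__init__()
        self.sic_depth = 0   # > 0 while inside one or more <sic> elements
        self.words = []      # list of (lowercased word, inside_sic) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "sic":
            self.sic_depth += 1

    def handle_endtag(self, tag):
        if tag == "sic" and self.sic_depth > 0:
            self.sic_depth -= 1

    def handle_data(self, data):
        for word in re.findall(r"[A-Za-z']+", data):
            self.words.append((word.lower(), self.sic_depth > 0))


def count_spelling_errors(html, known_words=KNOWN_WORDS):
    """Return (penalized, excused): unknown words outside and inside <sic>."""
    parser = SicAwareTextExtractor()
    parser.feed(html)
    parser.close()
    penalized = [w for w, in_sic in parser.words
                 if w not in known_words and not in_sic]
    excused = [w for w, in_sic in parser.words
               if w not in known_words and in_sic]
    return penalized, excused


page = ("<p>The House of Representatives shall <sic>chuse</sic> "
        "their Speaker and othr Officers</p>")
penalized, excused = count_spelling_errors(page)
```

Here only the unmarked misspelling would count against the page, while the 
<sic>-marked one is set aside. The "reasonably few" limit is not modeled; a 
real ranking algorithm would presumably cap how many excused errors it accepts.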


> 2. What other solutions to this problem might there be?

As you suggested: Use a plain text string "[sic]" after the reproduced error.
As you suggested: Use some kind of microformat or related technology.
Use a span with an unstandardized title or class.
Use an HTML comment.
However, I think these solutions are inferior. The explanation is below.


> 3. What's the advantage of using markup to do this rather than visible
> text like deadtree.

Sorry, I don't understand "deadtree". Is this an idiom?


> What's wrong with "The House of Representatives
> shall chuse [<span lang="la">sic</span>] their Speaker and other
> Officers"?

In many cases there's nothing wrong with a visible "[sic]". It has been used 
successfully for decades and will continue to be. There's also nothing wrong 
with plain text in general; it has been used successfully for centuries and 
will continue to be. There's nothing wrong with books that use 
presentation-oriented markup either, e.g. italics for emphasis. They have been 
printed successfully for centuries and will continue to be.

What is wrong with "Cats [emphasized] are cute animals" or "<span 
style='font-style:italic'>Cats</span> are cute animals" or "<span 
class='emphasized'>Cats</span> are cute animals" instead of "<em>Cats</em> are 
cute animals"? I don't think there's anything really wrong with any of 
these, but apparently people agreed that it's good to use a standardized markup 
language for markup, that semantic markup is a good thing and that simple 
markup is a good thing. <sic> in an HTML page would be simple, semantic and 
consistent HTML.

I think <sic> is a more HTMLish solution than a plain text "[sic]" -- just like 
<ul><li>...<li>... styled with list-style-type:decimal is more HTMLish than 
<div>1. ...<div>2. ...

The plain text string "[sic]" doesn't indicate where the start of the "[sic]"ed 
part of text is. That means it provides less information than <sic>...</sic>.

"[sic]" can't be handled with @media and CSS in general.

Note that you can very well style <sic> as "[sic]" with CSS, if that's the form 
of presentation you prefer:
sic:after {content:" [sic] "}

"[sic]" is rarely used in full quotes/transcriptions, although the advantages 
of using "[sic]" in short quotes apply to full quotes too. For example, here's 
a short quote that uses "[sic]" visibly:
http://en.wikipedia.org/wiki/Article_One_of_the_United_States_Constitution#Clause_5:_Speaker_and_other_officers.3B_Impeachment
And here's a transcription that doesn't use "[sic]" in the same place although 
its publisher considered it important to indicate the correct reproduction of 
the original source in some way as well, as you can tell by looking into the 
wiki markup source code, where he added a comment stating the fact:
http://en.wikisource.org/wiki/Constitution_of_the_United_States_of_America#Section_2
Having "[sic]" numerous times in a text tends to be annoying. It puts too much 
emphasis on errors. It is easily misread as ridiculing someone's orthography, 
even when that is not intended. Also, readers use full text quotes for 
various purposes, e.g. printing out a piece of poetic art and pinning it to a 
wall just like a painting. Printed "[sic]"s are not desirable there, as they 
are not part of the art. An unobtrusive <sic> would preserve the advantages of 
"[sic]" without its disadvantages in full quotes. It carries its information 
even if made invisible to the common reader. Unlike HTML comments, which are 
also invisible, <sic> is semantic, can be easily made visible, and isn't 
stripped by processing scripts without good reason.


> 4. It seems like "sic" would be a very rarely used feature. Why do we
> need to include it in the small, core HTML vocabulary rather than an
> RDF vocabulary imported into HTML via annotations like RDFa,
> microdata, or microformats?

<sic> would be a natural enhancement in the tradition of <blockquote>, <q> and 
<cite>.

HTML is a widely taught and learned language.
Indicating where mistakes have been reproduced deliberately is a widely known 
and widely (though not very often) applied practice, even in spoken language 
and plain text.

Extensions such as microformats are less widely known and probably always will 
be. Because they build upon languages such as HTML, people won't learn 
microformats without the underlying language, but many people will learn the 
underlying language without learning microformats. 
Microformats are great for solving very specific problems, and people seeking 
to solve specific problems will dig into them happily. But indicating where 
mistakes have been reproduced deliberately isn't a special 
interest/topic/technology application. It's a very basic thing to do; it occurs 
whenever quoting occurs. Almost every blogger does it. People on discussion 
boards quote each other all the time. Newspapers do it, scientific papers do it.

Thanks
Martin
