On 8 Jun 2006, at 10:03AM, White Lynx wrote:
Oistein E. Andersen wrote:
each mark-up element must be kept as short as possible.
Some people argue that short element names, being misleading and
unintuitive, do not actually improve readability; some people like short
element names as they are
On 9 Jun 2006, at 11:00AM, [EMAIL PROTECTED] wrote:
Øistein E. Andersen wrote:
2) Fight verbosity
<m>, [...] <frac>2<den>3</frac> and <root>3<of>125</root> [are] clearly
better suited than <formula>, <fraction>2<denominator>3</fraction> and
<radical>3<radicand>125</radical>.
However <frac>2<den>3</frac> is a shorthand for
On 10 Jun 2006, at 10:01AM, White Lynx wrote:
Oistein E. Andersen wrote:
traditional French typographical conventions for mathematics require lowercase
variables in italic, but uppercase ones in roman.
Do we need extra values like text-transform:french-italic; and
french-bold-italic;
that would
[EMAIL PROTECTED]
this may be difficult to achieve in practice, because TeX
converters reading TeX sources are unable to provide correct MathML markup
for prescripts.
Conversion to MathML is obviously more difficult because the base has to be
found and encoded explicitly. Still, I do _not_ say
On 16 Jun 2006, at 2:27PM, White Lynx wrote:
Oistein E. Andersen wrote:
The proposal states that op should be used to mark resizable operators,
but this presumably does not mean that the size of such operators is actually
intended to change.
It is intended to be larger.
Yes, but the size is
On 17 Jun 2006, at 2:15PM, White Lynx wrote:
Oistein E. Andersen wrote:
The current proposal does not seem to include the following elements of
ISO-12083:
- fence with arbitrary delimiters (possibly not a good idea)
It is probably better to list a number of delimiters explicitly, as in LaTeX.
Hello,
I just tried to check out how custom element and attribute names work in
current browsers and how they are supposed to work in HTML5, and some issues
seem unclear to me. Given the following fairly minimal document:
<!DOCTYPE html>
html xmlns:x=
<title>HTML 5</title>
<style type=text/css>
In section 5.2.2., `chickenkïwi.soup' (with diaeresis) appears twice (once
encoded as chickenk%C3%AFwi.soup), as does `chickenkiwi.soup' (without
diaeresis).
--
Øistein E. Andersen
I think conforming text/html documents should not be allowed to parse into
a DOM that contains characters that are not allowed in XML 1.0. [...] I am
inclined to prefer [...] U+FFFD
I perfectly agree. (Actually, I think that U+7F (delete) and the C1 control
characters should be excluded
On 5 Nov 2006, at 1:07PM, Lachlan Hunt wrote:
At the very least, ISO-8859-1 must be treated as Windows-1252. I'm not sure
about the other ISO-8859 encodings. Numeric and hex character references from
128 to 159 must also be treated as Windows-1252 code points.
I think this actually implies
Section 8.1.4:
Bytes that are not valid UTF-8 sequences must be interpreted as [...] U+FFFD
Section 9.2.2:
Bytes or sequences of bytes [...] that could not be converted to Unicode
characters must be converted to U+FFFD
If I read this correctly, section 8.1.4 requires that an illegal UTF-8
On 24 Nov 2006, at 10:33AM, Henri Sivonen wrote:
On Nov 24, 2006, at 04:11, Øistein E. Andersen wrote:
Section 8.1.4:
Bytes [...] U+FFFD
Section 9.2.2:
Bytes or sequences of bytes [...] U+FFFD
I'm inclined to think that interop[erability] in error situations doesn't
need to go as deep as
Trailing slashes in void elements are clearly unnecessary from a syntactic point
of view, but I think it can be argued that allowing them actually makes HTML
more internally consistent.
Current versions of HTML allow many unnecessary closing tags to be omitted
(e.g., </p>), and for authors
On 3 Nov 2006, at 9:51PM, Øistein E. Andersen wrote:
In section 5.2.2., `chickenkïwi.soup' (with diaeresis) appears twice [...],
as does `chickenkiwi.soup' (without diaeresis).
No one ever replied to this, and the draft remains unchanged.
(If this is /not/ a typo, this should probably be
In section 2.5.2. Dynamic markup insertion in HTML,
in the paragraph `Escaping a string',
the word `occurrences' is systematically misspelt with -ra- instead of -rre-.
--
Øistein E. Andersen
to be unavoidable at least in some
cases.
I hope this can lead to a fruitful discussion.
--
Øistein E. Andersen
on such details
somewhere?
--
Øistein E. Andersen
unnecessary burden on the author.
Perhaps an idea for Prince7?
Anyway, the preliminary conclusion seems to be that a hyph element in HTML
is unnecessary, so this discussion should probably continue somewhere else.
[1] http://www.fi.muni.cz/usr/sojka/papers/tug95.pdf
--
Øistein E. Andersen
, is not helpful
either.
--
Øistein E. Andersen
that importance and emphasis are intimately
related.
Therefore, defining strong as denoting importance and pretending
that the two are completely dissociated entities is unlikely to be productive.
--
Øistein E. Andersen
the typographical
emphasis, a technique that is arguably more effective than overemphasis.
Again, the obvious alternative <b>Typ</b>o<b>graphy</b> does not seem
quite right.
--
Øistein E. Andersen
*) Full title: Lexique des règles typographiques en usage à l’Imprimerie
nationale
-* encodings.
As suggested earlier [1], a simpler solution seems to be to treat C1 bytes and
NCRs from /all/ ISO-8859-* and Unicode encodings as Windows-1252.
[1]
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2006-November/007804.html
--
Øistein E. Andersen
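
The suggestion above (reading C1 bytes and numeric character references as Windows-1252) could be sketched roughly as follows. The helper name `c1_as_windows_1252` is hypothetical, and the fallback for the five byte values that Python's `cp1252` codec leaves undefined is an assumption made for illustration, not something taken from the thread.

```python
def c1_as_windows_1252(code: int) -> str:
    """Map a code in the C1 range (0x80-0x9F) to the character that
    Windows-1252 assigns to the same byte value."""
    if not 0x80 <= code <= 0x9F:
        # Outside the C1 range the code is taken at face value.
        return chr(code)
    try:
        return bytes([code]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
        # undefined; keeping the C1 control here is an assumption.
        return chr(code)


print(repr(c1_as_windows_1252(0x80)))  # euro sign rather than a C1 control
print(repr(c1_as_windows_1252(0x93)))  # left double quotation mark
```

The same function would serve for a byte read from an ISO-8859-* stream and for a reference such as `&#147;`, which is the point of treating both cases alike.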
does IE7 do?
IE7 does not seem to do this either, which indeed suggests
that specific C1 treatment is not needed outside ISO-8859-*.
--
Øistein E. Andersen
Ian Hickson wrote:
On Fri, 3 Nov 2006, Elliotte Harold wrote:
Section 9.2.2 of the current Web Apps 1.0 draft states:
Bytes or sequences of bytes in the original byte stream that could not
be converted to Unicode characters must be converted to U+FFFD
REPLACEMENT CHARACTER code points.
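
As a small illustration of the quoted requirement, Python's built-in decoder substitutes U+FFFD when asked to replace undecodable input. The function name `decode_with_replacement` is invented for this sketch; how many replacement characters a given decoder emits for a longer malformed sequence may differ between implementations.

```python
def decode_with_replacement(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, turning undecodable input into U+FFFD
    REPLACEMENT CHARACTER instead of raising an error."""
    return data.decode(encoding, errors="replace")


# 0xFF can never occur in well-formed UTF-8.
print(repr(decode_with_replacement(b"a\xffb")))
```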
can all be
umlauted (ä, ö, ü in German).
Moreover, the double-dot accent also has other uses (e.g., ä and ë both
designate a stressed schwa in Luxembourgeois), so it is probably not advisable
to attempt a complete classification in HTML.
--
Øistein E. Andersen
*) possibly only in the word
on e, i (e.g., Rhomboïd),
but I do not know how consistently the diæresis was used, and words
requiring it are typically foreign words that, unlike the rest, will not have
been printed in Fraktur...
--
Øistein E. Andersen
On 25 Jun 2007, at 11:57AM, Kristof Zelechovski wrote:
Inconsistently, as of IE7: I got &ge verbatim from your test.
&ge; is /not/ a latin-1 entity.
--
Øistein E. Andersen
On 25 Jun 2007, at 8:28AM, Ian Hickson wrote:
On Sun, 24 Jun 2007, Øistein E. Andersen wrote:
HTML5 currently follows IE7 much more closely than Safari,
Firefox and Opera do, which seems to suggest that some of the quirks
could be dispensed with.
It's possible, though people kept pointing
The verb `precede' does not follow the same pattern
as `succeed' and `proceed'.
s/precee/prece/g would correct the current misspellings.
--
Øistein E. Andersen
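
For readers who want to try the substitution outside sed, a rough Python equivalent of `s/precee/prece/g` is sketched below. The function name `fix_precede` is hypothetical, and the sample words are assumptions about the kind of misspelling meant, not quotations from the spec.

```python
import re


def fix_precede(text: str) -> str:
    """Apply the equivalent of s/precee/prece/g to the whole text."""
    return re.sub("precee", "prece", text)


# Words built on `proceed' and `succeed' contain no "precee" and are untouched.
print(fix_precede("the preceeding token; proceed as usual"))
```

The substitution assumes inflected forms such as the one shown; a bare `preceed' would come out as `preced' and would need fixing by hand.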
one is used in English.
They are both used in English, actually (and the spelling with
a ligature should not be considered obsolete in words borrowed
from French, unlike those of Latin origin).
--
Øistein E. Andersen
.
--
Øistein E. Andersen
in attribute values.
--
Øistein E. Andersen
, such content demonstrably exists,
and available data do not support the presupposition that
doing exactly what IE does is actually the best solution for
handling existing content.
--
Øistein E. Andersen
HTML5 currently maps &lang; and &rang; to
U+3008 LEFT ANGLE BRACKET,
U+3009 RIGHT ANGLE BRACKET,
both belonging to `CJK angle brackets' in
U+3000--U+303F CJK Symbols and Punctuation.
HTML 4.01 maps them to
U+2329 LEFT-POINTING ANGLE BRACKET,
U+232A RIGHT-POINTING ANGLE BRACKET
I wrote:
full-width East-Asian characters ( / ).
That should be / .
--
Øistein E. Andersen
L. David Baron wrote:
What's wrong with these mappings, and why shouldn't they
also be the mappings in HTML5?
The problem is that they are canonically equivalent to CJK characters.
http://www.unicode.org/reports/tr15/ describes Unicode
normalisation in general and mentions singleton
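
The canonical equivalence mentioned here can be checked with Python's standard `unicodedata` module, as in the sketch below. The helper name `normalise` is made up for the example; the behaviour shown depends on the Unicode data shipped with the interpreter.

```python
import unicodedata


def normalise(text: str, form: str = "NFC") -> str:
    """Return text in the given Unicode normalisation form."""
    return unicodedata.normalize(form, text)


# U+2329 and U+232A have singleton canonical decompositions, so they do not
# survive normalisation, not even to the composed form NFC.
for ch in "\u2329\u232a":
    print(f"U+{ord(ch):04X} -> U+{ord(normalise(ch)):04X}")
```

In other words, a document using the HTML 4.01 code points ends up with the CJK brackets as soon as it passes through any normalising process.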
: [...]
To make the notion of conformance more useful for authors (that is, to
make conformance checking catch unintentional stuff), I suggest making
starting an unquoted attribute value with a = a parse error.
Done.
On Mon, 17 Sep 2007, Øistein E. Andersen wrote:
An alternative solution would
as U+FDD0 to U+FDDF and the non-characters *FE and *FF
when these are expressed as character references. Would it be possible to
(dis)allow the same set of characters in both cases?
--
Øistein E. Andersen
On 5th June 2007, Øistein E. Andersen wrote:
(To do this properly, what we really ought to do is look for
C1 and undefined characters in all IANA charsets and semi-official
mappings to Unicode and check 1) whether the gaps can be filled
by borrowing from other encodings, and 2) whether
.
--
Øistein E. Andersen
As suggested earlier, ISO 8859-9 is a proper subset of CP1254,
and IE7 always uses the superset. [Actually, the name shown
in the menu varies -- Turkish (ISO) v. Turkish (Windows) --, but
the underlying encoding vector appears to be the same.]
Test pages (identical data, different Charset
On Thursday 10th April 2008, Ian Hickson wrote:
SVG radicals aren't typographically acceptable either.
You really want to use fonts for this.
Current browsers are clearly better at rendering TrueType
and PostScript fonts at small sizes than equivalent shapes
expressed as SVG paths. (This may
document)
is certainly feasible.
However, this solution would not seem to be practical for a colour scheme
using a larger number of colours. Would your mantra remain the same
given, e.g., 256^2 or 64^3 distinct shades of colour? If not, where should
the boundary be drawn?
--
Øistein E. Andersen
that documents containing the letter Ў/ў (only in KOI8-RU)
are frequently mislabelled as KOI8-U.
Do you have input on the EUC-JP issue?
Not yet, but you can expect some input on CJK encodings at some point in
the future.
--
Øistein E. Andersen
problem?
On Thu, 13 Mar 2008, Øistein E. Andersen wrote:
Note: Similarly, IE apparently handles CS-ISO-2022-JP as distinct
from ISO-2022-JP. This is something to keep in mind when looking at
multi-byte encodings.
What should we say about this?
The issue seems to be that IE's
On 2 Sep 2008, at 06:06, Ian Hickson wrote:
On Wed, 30 Jul 2008, Øistein E. Andersen wrote:
1. Opera, Firefox and Safari all handle US-ASCII as Windows-1252.
IE7, on the other hand, simply ignores the high bit (as it does
for a few other 7-bit encodings, by the way). Perhaps
not render my test page at all.
--
Øistein E. Andersen
solution would be to add specific mark-up to HTML directly.
(I am aware that fractions have been proposed earlier in the context
of mathematical formulae, but I have not been able to find any
previous discussion regarding vulgar fractions.)
--
Øistein E. Andersen
no
unexpected consequences.
--
Øistein E. Andersen
and also make it slightly more
difficult to add (certain classes of) CSS, since a doctype would have
to be added to give the expected rendering (and for the document to
remain conforming).
--
Øistein E. Andersen
On 23 May 2008, at 03:50, Ian Hickson wrote:
On Thu, 28 Jun 2007, Øistein E. Andersen wrote:
1) Is it useful to handle unterminated entities followed by an
alphanumerical character like IE does? [...]
2) HTML 4.01 allows the semicolon to be omitted in certain cases.
[...] Firefox and Safari
On Tue, 14 Apr 2009, Øistein E. Andersen wrote:
Shift_JIS Windows-31J
[...]
Shift-JIS Windows-932
On 5 June 09, Anne van Kesteren wrote:
Is the implication here that Shift_JIS and Shift-JIS are distinct
[...]?
No, Shift-JIS and Windows-932 are commonly used names/labels
for this feature to be considered for addition to Firefox
(apart from actually implementing it myself).
--
Øistein E. Andersen
On 3 June 09, at 23:19, Ian Hickson wrote:
On Tue, 14 Apr 2009, Øistein E. Andersen wrote:
HTML5 currently contains a table of encodings aliases,
[...]
GB2312 and GB_2312-80 technically refer to the *character set* GB 2312-80,
[...]. GBK, on the other hand, is an encoding
.
--
Øistein E. Andersen
legacy encodings’ or better ‘advise authors against using
legacy encodings’.
--
Øistein E. Andersen
On 5 Jun 2009, at 00:49, Ian Hickson wrote:
Could you give an example of what you mean? I'm having trouble
following your description
On Fri, 24 Apr 2009, Øistein E. Andersen wrote:
Let IE4 (resp. HTML4, HTML5) be a non-semicolon-terminated named
character reference from the IE4 (resp
') in the same
way? (Firefox handles both as non-space characters, IE and Safari
handle both as space characters, and handling these two slightly
exotic C0 white-space characters differently seems surprising.)
--
Øistein E. Andersen
has one.
--
Øistein E. Andersen
).
--
Øistein E. Andersen
The spec currently contains a few occurrences of colloquial
contractions like can't, won't and there's, which should be
changed to cannot, will not, there is etc. for consistency.
--
Øistein E. Andersen
The Character reference data tokeniser state should probably be
renamed to Character reference in data. Adding in would arguably
make the name more accurately descriptive and furthermore consistent
with the Character reference in attribute state.
--
Øistein E. Andersen
, seems inconsistent.
--
Øistein E. Andersen
surrogates in UTF-16 to U+FFFD, so the mixed form may be interpreted as U+1,.
--
Øistein E. Andersen
and potentially useful to deal with mislabelled
documents, but it might be worth adding an explanatory note.
--
Øistein E. Andersen
On 8 Sep 2009, at 23:39, I wrote:
UTF-16BE
Actually, endianness is immaterial. Please read this as UTF-16
instead.
Sorry for the extra message.
--
Øistein E. Andersen
[P]ossibly algorithms in the adoption agency algorithm note should
be possible algorithms.
--
Øistein E. Andersen
§ 9.1.2.5 Restrictions on content models mentions that an initial
line feed (\n) character inside pre and textarea will be removed.
Should it not cover carriage return (\r) and \r\n as well?
--
Øistein E. Andersen
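
To make the question concrete, the sketch below strips one initial newline in any of the three forms. It illustrates the behaviour being asked about, not what the spec currently requires, and the function name `strip_initial_newline` is hypothetical.

```python
def strip_initial_newline(text: str) -> str:
    """Drop a single leading newline, whether written \r\n, \n or \r."""
    # "\r\n" is tested first so that it is removed as one unit.
    for newline in ("\r\n", "\n", "\r"):
        if text.startswith(newline):
            return text[len(newline):]
    return text


print(repr(strip_initial_newline("\r\nfirst line\nsecond line")))
```

Only one newline is removed, so content that deliberately starts with a blank line keeps it.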
On 15 Sep 2009, at 02:37, Ian Hickson wrote:
On Tue, 8 Sep 2009, Øistein E. Andersen wrote:
The spec currently contains a few occurrences of colloquial
contractions like can't, won't and there's, which should be
changed to cannot, will not, there is etc. for consistency.
I haven't
(in the second sentence) is
awkward given that all characters are in fact code points.
--
Øistein E. Andersen
occurrences of each, excluding an
unrelated unproblematic instance inside a script) should be changed
since they appear confusingly as ' and in a sans-serif
typeface.
--
Øistein E. Andersen
E. Andersen
On 19 Oct 2009, at 05:52, Ian Hickson wrote:
I've noted your e-mail here [...] and moved the whole thing out of
the spec.
That does not seem to apply to the last part of the original e-mail,
quoted below.
Øistein E. Andersen
Other character encoding issues
.
--
Øistein E. Andersen
On 22 Oct 2009, at 22:45, Philip Taylor wrote:
On Thu, Oct 22, 2009 at 9:23 PM, Øistein E. Andersen li...@coq.no
wrote:
On 22 Oct 2009, at 17:15, NARUSE, Yui wrote:
Finally, why the ISO 2022 series is discouraged is not clear.
We agree on this point.
The string 숍訊昱穿 encoded as ISO-2022-KR
On 23 Oct 2009, at 04:20, Ian Hickson wrote:
On Wed, 21 Oct 2009, Øistein E. Andersen wrote:
ASCII-compatibility:
The note in 2.1.5 Character encodings seems to say that [...]
ISO-2022[-*] are ASCII-compatible, whereas HZ-GB-2312 is not, and
I cannot find anything in Section 2.1.5
these as potentially meaningful Han
characters.
--
Øistein E. Andersen
On 7 Apr 2012, at 15:04, Øistein E. Andersen wrote:
Suggested reverse mappings:
[...]
C6DE = U+3003
C6DF = U+4EDD
Sorry, these are different from the other C6xx (ETen-1) mappings.
Correction:
A1B2 = U+3003
C969 = U+4EDD
Rationale:
These codepoints are part of the original (unextended
On 8 Apr 2012, at 18:03, Philip Jägenstedt wrote:
On Sat, 07 Apr 2012 16:04:55 +0200, Øistein E. Andersen li...@coq.no wrote:
[...]
[1] http://coq.no/character-tables/eten1.pdf
http://coq.no/character-tables/eten1.js
What is the source for the mappings in eten1.pdf
On 8 Apr 2012, at 18:03, Philip Jägenstedt wrote:
On Sat, 07 Apr 2012 16:04:55 +0200, Øistein E. Andersen li...@coq.no wrote:
[1] http://coq.no/character-tables/eten1.pdf
http://coq.no/character-tables/eten1.js
What is the source for the mappings in eten1.pdf?
Unihan H
by this? If not, I'm quite willing to accept the historical accidents
and move on :)
Probably not many. Still, it seems safe to fix these four mappings if the
characters are ever added to Unicode.
Øistein E. Andersen
characters in the range U+0061 to U+007A ([... a] to [... z])’.)
Øistein E. Andersen