[whatwg] Parsing: </br> and </p>

2007-06-27 Thread Anne van Kesteren
These closing tags also need to be guided through the head element phase  
and such to ensure documents such as


  <!doctype html></br>

  <!doctype html><head></p>

behave similarly to the browsers we try to imitate in English.


--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] The issue of interoperability of the video element

2007-06-27 Thread Maik Merten
Nicholas Shanks wrote:
 Browsers don't (and shouldn't) include their own av decoders anyway.
 Codec support is an operating system issue, and any browser installed on
 my computer supports exactly the same set of codecs, which are the ones
 made available via the OS (QuickTime APIs in my case, Windows Media APIs
 on Bill's platform, and from the sounds of it, libavcodec on the Penguin)

Browsers should ship with their own decoders (at least one set) because,
depending on what platform you are on, the choice of codecs that are
installed varies greatly, and as a content producer you have no idea what
the clients can decode in that scenario. If IE supports WMV, Safari
supports MPEG4 and Opera and Mozilla support Ogg out of the box you can
at least be somewhat sure that if you provide content in those 3 formats
your visitors will almost certainly be able to access the content (and
that's a worst case scenario where interoperability is pretty poor).

Browsers don't rely on the OS to decode JPEG or PNG or GIF either - I
assume that's driven by similar reasons.

Hooking into the media frameworks of the various platforms may be a good
idea despite this, although that may mean that on one platform e.g.
Firefox can decode WMV while it can't on some other (and in this case
content providers may choose to not provide content in alternative
formats because Internet Explorer and Firefox on Windows cover 95% of
potential customers and they all can do WMV - that could grow to an
unfortunate situation where actually improving interoperability with
one media system slams the door for Linux and MacOS users).


Maik Merten


Re: [whatwg] The issue of interoperability of the video element

2007-06-27 Thread Nicholas Shanks

On 27 Jun 2007, at 09:28, Maik Merten wrote:


Browsers don't rely on the OS to decode JPEG or PNG or GIF either


In my experience that seems to be exactly what they do do—rely on the  
OS to provide image decoding (as with other AV media).
I say this because changes that had occurred in the OS (such as  
adding JPEG-2000 support) are immediately picked up by my browsers.



Firefox can decode WMV while it can't on some other (and in this case
content providers may choose to not provide content in alternative
formats because Internet Explorer and Firefox on Windows cover 95% of
potential customers and they all can do WMV - that could grow to an
unfortunate situation where actually improving interoperability with
one media system slams the door for Linux and MacOS users).


WMV 9 is supported on the Mac OS via a (legal) download, so only  
Linux would get screwed. Once the download is installed, every app  
that uses QuickTime (including apps that have their own codecs too,  
such as RealPlayer, VLC) immediately gains the ability to play WMV  
files. The same is true for the Theora codecs from xiph.org.


I assert that any codec written by a browser vendor and available  
only within that browser is user-hostile (due to lack of system  
ubiquity), likely to be slower and buggier than the free decoding  
component written by the codec vendor themselves, and detracts from  
the time available for implementing other browser changes.


- Nicholas.



Re: [whatwg] The issue of interoperability of the video element

2007-06-27 Thread Robert O'Callahan

On 6/27/07, Nicholas Shanks [EMAIL PROTECTED] wrote:


On 27 Jun 2007, at 09:28, Maik Merten wrote:
 Browsers don't rely on the OS to decode JPEG or PNG or GIF either

In my experience that seems to be exactly what they do do—rely on the
OS to provide image decoding (as with other AV media).
I say this because changes that had occurred in the OS (such as
adding JPEG-2000 support) are immediately picked up by my browsers.



You do not know what you are talking about. Firefox does not use OS image
decoders.

likely to be slower and buggier than the free decoding

component written by the codec vendor themselves



We use official Ogg Theora libraries.

and detracts from the time available for implementing other browser changes.


No-one's suggesting reimplementing codecs. We're talking about integrating
existing codecs into the browser, and shipping them with the browser.

Rob
--
Two men owed money to a certain moneylender. One owed him five hundred
denarii, and the other fifty. Neither of them had the money to pay him back,
so he canceled the debts of both. Now which of them will love him more?
Simon replied, I suppose the one who had the bigger debt canceled. You
have judged correctly, Jesus said. [Luke 7:41-43]


Re: [whatwg] The issue of interoperability of the video element

2007-06-27 Thread Nicholas Shanks

On 27 Jun 2007, at 11:55, Robert O'Callahan wrote:


In my experience...


You do not know what you are talking about. Firefox does not use OS  
image decoders.


And I don't use Firefox, so my point is still valid. Please don't  
inform me of what you think I know or do not know, it is impolite.


For your future reference, Robert, the browsers I am familiar with  
and was referring to in my statement about image decoders are WebKit- 
based browsers, OmniWeb 4.5 (historically), Camino and iCab 3. I  
avoid FireFox and Opera due to their non-native interfaces and form  
controls.

Given your statement I may be incorrect about Camino though.


We use official Ogg Theora libraries.
No-one's suggesting reimplementing codecs. We're talking about  
integrating existing codecs into the browser, and shipping them  
with the browser.


This is only possible if the codec is free. I thought we were talking  
about the problem of adding non-free codecs (namely WMV and MPEG4) to  
free software, (possibly also involving reverse-engineering the codec).


- Nicholas.



Re: [whatwg] The issue of interoperability of the video element

2007-06-27 Thread Maik Merten
Nicholas Shanks wrote:
 This is only possible if the codec is free. I thought we were talking
 about the problem of adding non-free codecs (namely WMV and MPEG4) to
 free software, (possibly also involving reverse-engineering the codec).

Reverse-engineering doesn't lead to usable implementations of non-free
formats. You end up having *sourcecode* with a free license attached to
it, but you're not allowed to *distribute* actual binaries of that code
because the codec is still covered by patents.

Take for example libavcodec: That actually has WMV support and its
sourcecode is open. However, thanks to the MPEG and Microsoft codecs
being patented (and because those patents are enforced) you cannot put
it into Mozilla.

Open source usually only covers copyright. Truly free codecs are open
sourced AND don't require patent licensing.


Maik Merten


[whatwg] Editorial: typo (spelling)

2007-06-27 Thread Øistein E . Andersen
The verb `precede' does not follow the same pattern
as `succeed' and `proceed'.

s/precee/prece/g would correct the current misspellings.
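
As a rough illustration, the substitution above can be tried in Python. The sample words are chosen here for the example, not taken from the spec text. Note that a bare "preceed" would become "preced" rather than "precede", so that form, if it occurs, would need separate handling.

```python
import re

def fix_precede(text: str) -> str:
    """Apply the equivalent of s/precee/prece/g to a string."""
    return re.sub("precee", "prece", text)

# Sample words are illustrative, not quoted from the spec.
for word in ("preceeding", "preceeded", "proceed", "succeed", "preceed"):
    print(word, "->", fix_precede(word))
```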

-- 
Øistein E. Andersen


Re: [whatwg] The issue of interoperability of the video element

2007-06-27 Thread Robert O'Callahan

On 6/28/07, Nicholas Shanks [EMAIL PROTECTED] wrote:


For your future reference, Robert, the browsers I am familiar with and was
referring to in my statement about image decoders are WebKit-based browsers,
OmniWeb 4.5 (historically), Camino and iCab 3. I avoid FireFox and Opera
due to their non-native interfaces and form controls. Given your statement
I may be incorrect about Camino though.



You are.

If we're going to make sweeping statements about how browsers work, let's
make sure we include IE, Firefox and Opera in our data.


We use official Ogg Theora libraries.
No-one's suggesting reimplementing codecs. We're talking about integrating
existing codecs into the browser, and shipping them with the browser.

This is only possible if the codec is free. I thought we were talking
about the problem of adding non-free codecs (namely WMV and MPEG4) to free
software, (possibly also involving reverse-engineering the codec).



No-one's suggesting that. As Maik points out, reverse engineering is a dead
end. Shipping a binary codec with, say, Firefox is a theoretical
possibility, but for many reasons it's very unlikely to happen.

Rob


Re: [whatwg] Entity parsing [trema/diaeresis vs umlaut]

2007-06-27 Thread Křištof Želechovski
How does it influence the case fianc&eacutee vs &oeliguvre?  The only
difference is that the first one is used in English.
Chris

-Original Message-
From: Oistein E. Andersen [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 26, 2007 10:55 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [whatwg] Entity parsing [trema/diaeresis vs umlaut]

On 26 Jun 2007, at 7:49AM, Křištof Želechovski wrote:

 Internet Explorer apparently chose to support English natively
 while SGML preferred remaining language-agnostic.

To be fair, this is not how things developed.

Microsoft first chose to make the semicolon optional not only
when allowed by SGML rules (notably before whitespace and tags),
but in any position, for all named entities /that existed at the time/,
i.e., latin-1.

Unfortunately, this meant that new entities could not be added without
changing the interpretation of already existing pages (e.g., if a page
contained less&less, adding the entity &le to the list would result in its
being interpreted as less≤ss), although most of the entities have names that
are rather unlikely to appear by chance, and the ampersand should be spelt
&amp;.

Microsoft did not dare to risk this, so entities beyond latin-1 require
a semicolon in IE, even in cases where it is optional according
to SGML (and therefore will pass HTML 4.01 validation, I might add).
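
A minimal sketch of the behaviour described above, under stated assumptions: the two tables are tiny illustrative subsets, the function name `expand` is made up for this example, and the prefix-matching rule is a simplified model rather than IE's actual algorithm. It shows why adding `le` to the semicolon-optional set would change how an existing page containing less&less is read.

```python
import re

# Illustrative subsets only; the real entity tables are much longer.
LEGACY = {"lt": "<", "gt": ">", "amp": "&", "eacute": "\u00e9"}  # semicolon optional
STRICT = {"le": "\u2264", "oelig": "\u0153"}                      # semicolon required

def expand(text, legacy=LEGACY, strict=STRICT):
    """Expand entity references in a simplified, IE-like way (hypothetical model)."""
    def repl(match):
        name, semi = match.group(1), match.group(2)
        if semi and name in strict:
            return strict[name]
        if semi and name in legacy:
            return legacy[name]
        # Otherwise, look for the longest legacy name that prefixes the run.
        for end in range(len(name), 0, -1):
            if name[:end] in legacy:
                return legacy[name[:end]] + name[end:] + semi
        return match.group(0)  # not recognised: leave the text alone
    return re.sub(r"&([A-Za-z0-9]+)(;?)", repl, text)

print(expand("fianc&eacutee"))   # legacy name, no semicolon needed
print(expand("&oeliguvre"))      # non-legacy name, left untouched
print(expand("less&less"))       # unchanged with the tables above
# Moving `le` into the semicolon-optional set reinterprets the same page:
print(expand("less&less", legacy={**LEGACY, "le": "\u2264"}))
```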

-- 
Oistein E. Andersen



Re: [whatwg] Parsing: </br> and </p>

2007-06-27 Thread Ian Hickson
On Wed, 27 Jun 2007, Anne van Kesteren wrote:

 These closing tags also need to be guided through the head element phase and
 such to ensure documents such as
 
  <!doctype html></br>
 
  <!doctype html><head></p>
 
 behave similarly to the browsers we try to imitate in English.

Done.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] void elements vs. content model = empty

2007-06-27 Thread Jonas Sicking

Ian Hickson wrote:

On Wed, 20 Jun 2007, Jonas Sicking wrote:

Simon Pieters wrote:

On Wed, 20 Jun 2007 00:28:37 +0200, Ian Hickson [EMAIL PROTECTED] wrote:


Also, if there's a difference between content=empty and 'void elements'
it deserves an explanation.

One is just about the content model, the other is just about the syntax.
They're not really related, though it happens to be the case that all
elements that have an empty content model are void elements in HTML.

FWIW, <script src> has empty content model but still requires the end tag.

That is not true. The contents of a <script src> are interpreted as script and
executed if loading the resource pointed to by the src attribute fails. In
other words

<script src="http://nonexistant.example.com/">
alert('hi');
</script>

should bring up an alert.


This doesn't seem to be the case as far as I can tell.


It indeed appears I am wrong. I consulted Brendan and it appears that 
this might have been the case back in NS3 and apparently I still 
remember it from back then. My brain works in mysterious ways...


/ Jonas


Re: [whatwg] Entity parsing [trema/diaeresis vs umlaut]

2007-06-27 Thread Øistein E . Andersen
On 27 Jun 2007, at 8:45PM, Křištof Želechovski wrote:

 How does it influence the case fianc&eacutee vs &oeliguvre?

You might want to have a look at
http://pl.wikipedia.org/wiki/ISO_8859-1 .

Afterwards, consider the following:
1) Latin-1 does not contain all the characters that are required
for typesetting of English.
2) It does include characters that are never used in English at all.
3) In IE, the entities that can be used without a terminating semicolon
are the ones that can be found in this character set.

How does this make Microsoft Anglocentric?

 The only difference is that the first one is used in English.

They are both used in English, actually (and the spelling with
a ligature should not be considered obsolete in words borrowed
from French, unlike those of Latin origin).

-- 
Øistein E. Andersen


Re: [whatwg] Entity parsing

2007-06-27 Thread Øistein E . Andersen
On 26 Jun 2007, at 4:35AM, Ian Hickson wrote:

 The informal research I did when updating the spec suggests that the 
 current state of the spec is what is better.

(It is difficult to say anything sensible without knowing either the nature
of the research undertaken or the options under consideration.)

 I don't really know how to do more research
 -- it's quite hard to programmatically tell when an entity 
 should be expanded and when it shouldn't.

True, but this is not completely insurmountable — or, rather: useful information
can be extracted without necessarily making these decisions explicitly.

I do not know what you have done already, but something like the following
for each entity &ref; would be useful for the discussion:
- total number of occurrences of &ref (with or without a semicolon);
- number of occurrences of &ref; (with the semicolon);
- number of occurrences of &ref followed by /[a-zA-Z0-9]/;
- the N most frequent matches of /[a-zA-Z0-9]*&ref[a-zA-Z0-9]+/.
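
The four counts in the list above could be gathered along these lines. This is only a sketch: the function name `survey`, the result keys and the sample text are assumptions made for illustration, and a real survey would run over a crawl of pages rather than one string. A short name that is a prefix of a longer entity name would also be over-counted by this simple approach.

```python
import re
from collections import Counter

def survey(text: str, ref: str, n: int = 3) -> dict:
    """Count the occurrence patterns listed above for one entity name."""
    amp = "&" + re.escape(ref)
    words = re.findall(r"[a-zA-Z0-9]*" + amp + r"[a-zA-Z0-9]+", text)
    return {
        "total": len(re.findall(amp, text)),
        "terminated": len(re.findall(amp + ";", text)),
        "before_alnum": len(re.findall(amp + r"(?=[a-zA-Z0-9])", text)),
        "top_words": Counter(words).most_common(n),
    }

# Made-up sample text, for illustration only.
sample = "R&eacutesum&eacute; and r&eacute;sum&eacute; but caf&eacute now"
print(survey(sample, "eacute"))
```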

Without any real data, arguing, e.g., that conforming HTML 4.01 documents that
are currently handled correctly by Firefox and Safari must be handled
differently in the future for the sake of backwards compatibility is not
really persuasive.


The only argument for following IE that I have been able to find in the archives
is the following in a post from Simon Pieters on 14th Aug 2006 in the thread
“Parsing Entities”:

 I guess that for compat with IE and the Web[1] we have to treat
 R&eacutesum&eacute as if it were R&eacute;sum&eacute;. [...]
 [1] http://www.google.com/search?q=R%26eacutesum%C3%A9

The implication seems to be that R&eacutesum&eacute can be found on the Web
and therefore should be supported. But Google also tells us something else:

(1) r&eacutesumé: 572
(2) +résumé: 114,000,000
(3) r&eacute;sum&eacute -r&eacute;sum&eacute;s: 16,300
(4) +résumé: 1,000

Actually, (1) does not only cover r&eacutesum&eacute, but also code like
r&amp;eacutesumé, so the number of occurrences that can be saved
by parser quirks is lower than 572.

As could be expected, (1) is quite rare compared to (2), all the correctly
encoded variants. Whether 0.0005% should be regarded as significant
(supposing that résumé is representative) may be a contentious issue, but
it is interesting to note that other errors (unwanted conversion of & to &amp;
in (3) and a typical encoding problem in (4)) are actually significantly
more common, and these cannot be corrected at all.
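
For reference, the proportions quoted above can be rechecked from the four hit counts. The variable names are chosen here for illustration; the numbers are the ones listed in (1) to (4).

```python
# Hit counts as listed in (1) to (4) above.
unterminated = 572           # (1)
correct = 114_000_000        # (2)
double_escaped = 16_300      # (3)
misencoded = 1_000           # (4)

share = unterminated / correct * 100
print(f"(1) as a share of (2): {share:.4f}%")   # prints 0.0005%
print(f"(3) relative to (1): {double_escaped / unterminated:.1f}x")
print(f"(4) relative to (1): {misencoded / unterminated:.1f}x")
```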

-- 
Øistein E. Andersen


Re: [whatwg] Entity parsing

2007-06-27 Thread Ian Hickson
On Thu, 28 Jun 2007, Øistein E. Andersen wrote:
  
  I don't really know how to do more research -- it's quite hard to 
  programmatically tell when an entity should be expanded and when it 
  shouldn't.
 
 True, but this is not completely insurmountable — or, rather: useful 
 information can be extracted without necessarily making these decisions 
 explicitly.
 
 I do not know what you have done already, but something like the following
 for each entity &ref; would be useful for the discussion:
 - total number of occurrences of &ref (with or without a semicolon);
 - number of occurrences of &ref; (with the semicolon);
 - number of occurrences of &ref followed by /[a-zA-Z0-9]/;
 - the N most frequent matches of /[a-zA-Z0-9]*&ref[a-zA-Z0-9]+/.
 
 Without any real data, arguing, e.g., that conforming HTML 4.01 
 documents that are currently handled correctly by Firefox and Safari 
 must be handled differently in the future for the sake of backwards 
 compatibility is not really persuasive.

Sadly none of the arguments in any direction right now are particularly 
persuasive.

I'm not really convinced that the data that the above proposed survey 
might collect would actually help, since it doesn't tell us what was 
intended by the author. You'd be surprised at how often people use 
ampersands in text in ways that have nothing to do with entities but in 
ways which could get interpreted as entities.


 The implication seems to be that R&eacutesum&eacute can be found on the Web
 and therefore should be supported. But Google also tells us something else:
 
 (1) r&eacutesumé: 572
 (2) +résumé: 114,000,000
 (3) r&eacute;sum&eacute -r&eacute;sum&eacute;s: 16,300
 (4) +résumé: 1,000
 
 Actually, (1) does not only cover r&eacutesum&eacute, but also code like 
 r&amp;eacutesumé, so the number of occurrences that can be saved by 
 parser quirks is lower than 572.

The number of occurrences of r&eacutesumé is at least two (the two hits
I looked at both worked in IE and did not in Firefox).


Am I correct in assuming that you would like the spec changed? What would 
you like the spec changed to, exactly?


[whatwg] Canvas - non-standard globalCompositeOperation

2007-06-27 Thread Philip Taylor

In addition to the standard values for globalCompositeOperation (and
ignoring 'darker'), Gecko supports:

   clear: The Porter-Duff 'clear' operator, which always sets the
output to rgba(0, 0, 0, 0).

   over: Synonym for 'source-over'. The code says "not part of spec,
kept here for compat". (It looks like FF1.5 had a broken
'source-over', and implemented 'over' like a correct 'source-over'.
'source-over' was fixed in FF2.0, and 'over' left unchanged.)

(See 
http://lxr.mozilla.org/mozilla/source/content/canvas/src/nsCanvasRenderingContext2D.cpp#1703.)

WebKit supports:

   clear: Same as above.

   highlight: Synonym for source-over. (See
http://developer.apple.com/documentation/Cocoa/Reference/ApplicationKit/Classes/NSImage_Class/Reference/Reference.html#//apple_ref/doc/c_ref/NSCompositeHighlight
- NSCompositeHighlight: Deprecated. Mapped to
NSCompositeSourceOver.)

(See 
http://trac.webkit.org/projects/webkit/browser/trunk/WebCore/platform/graphics/GraphicsTypes.cpp#L34.)

Opera is very nice and doesn't do anything wrong.

The spec clearly defines the behaviour here: any attempts to set such
values must be ignored.
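
The "must be ignored" rule can be pictured with a small model of the setter. This is a sketch, not browser code: the class and attribute names are invented, and the set of standard values and the 'source-over' default are my recollection of the draft at the time, so they should be treated as assumptions.

```python
# Assumed list of standard values in the 2007 draft (an assumption, not quoted).
STANDARD_OPERATIONS = frozenset({
    "source-atop", "source-in", "source-out", "source-over",
    "destination-atop", "destination-in", "destination-out",
    "destination-over", "darker", "lighter", "copy", "xor",
})

class Context2D:
    """Hypothetical stand-in for a 2D canvas context."""

    def __init__(self):
        self._operation = "source-over"  # assumed default

    @property
    def global_composite_operation(self):
        return self._operation

    @global_composite_operation.setter
    def global_composite_operation(self, value):
        # Unknown values are silently ignored; the old value is kept.
        if value in STANDARD_OPERATIONS:
            self._operation = value

ctx = Context2D()
ctx.global_composite_operation = "clear"      # non-standard: ignored
print(ctx.global_composite_operation)         # source-over
ctx.global_composite_operation = "copy"       # standard: accepted
print(ctx.global_composite_operation)         # copy
```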



'clear' is pretty useless, since it's exactly equivalent to doing
globalAlpha = 0; globalCompositeOperation = 'copy' or (depending on
the transform matrix) clearRect(0, 0, w, h). The spec already omits
the Porter-Duff 'B' operator (which sets the output to be equal to the
destination bitmap, i.e. is equivalent to not drawing anything at
all), so it does not seem reasonable to argue for adding 'clear' just
for completeness. I can't think of any other reasons for it to be
added to the spec, other than for interoperability.
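
The equivalence claimed above can be checked with the usual Porter-Duff formulation on premultiplied RGBA values, result = source * Fa + destination * Fb. The sketch below is a per-pixel model only; the function names are made up, and it ignores clipping, transforms and everything else a real canvas implementation deals with.

```python
def composite(src, dst, fa, fb):
    """Porter-Duff blend of two premultiplied RGBA tuples."""
    return tuple(s * fa + d * fb for s, d in zip(src, dst))

def clear(src, dst):
    return composite(src, dst, 0, 0)           # Fa = 0, Fb = 0

def copy(src, dst, global_alpha=1.0):
    faded = tuple(c * global_alpha for c in src)  # globalAlpha scales the source
    return composite(faded, dst, 1, 0)         # Fa = 1, Fb = 0

def destination(src, dst):
    return composite(src, dst, 0, 1)           # the omitted 'B' operator

src = (0.5, 0.25, 0.0, 0.5)   # arbitrary sample pixels
dst = (0.0, 0.2, 0.4, 0.8)
print(clear(src, dst) == copy(src, dst, global_alpha=0))  # True
print(destination(src, dst) == dst)                       # True
```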



As far as I can imagine, for each non-standard value, the possible
situations are:

* No content relies on that value.
 => Web browsers should remove support for it: it has no purpose, and
it may result in authors accidentally using that value and becoming
confused when their code doesn't work in other browsers. That will be
irritating for everyone, and it will evolve into the next situation:

* Web content relies on that value.
 => It should be added to the spec, because it's necessary for
handling web content.

* Non-web, browser-specific content (extensions, widgets, etc) relies
on that value, and web content doesn't.
 => It should be disabled except when run in the extension/widget/etc
context, to avoid the problems as in the first case. That may cause
minor confusion to the extension/widget/etc authors about why their
code [which is relying on undocumented features] works differently if
they run it on the web instead, but that seems insignificant compared
to having interoperability problems on the web.

* Nobody cares.
 => Nothing happens.


Am I missing any issues here? Would any browser developer think one of
the first three situations applies, and be willing to make the
necessary changes in that case?

--
Philip Taylor
[EMAIL PROTECTED]


Re: [whatwg] Entity parsing

2007-06-27 Thread Øistein E . Andersen
On 28 Jun 2007, at 12:43AM, Ian Hickson wrote:

 Sadly none of the arguments in any direction right now are particularly 
 persuasive.

Indeed.


 I'm not really convinced that the data that the above proposed survey 
 might collect would actually help, since it doesn't tell us what was 
 intended by the author.

To a certain extent, this depends on the results.

Some conclusions can be drawn without actually knowing the author's intent
at all: if, for instance, &foo[^;] is exceedingly rare, then what the author
meant does not really matter, since the construct does not need to be
supported anyway.

I also tend to think that entities that are part of existing words are highly
likely to be intended for expansion. Of course, 100% accuracy cannot be
achieved, but this is not really needed for the results to be useful.


 Am I correct in assuming that you would like the spec changed? What would 
 you like the spec changed to, exactly?

I would really like an informed decision, and I currently get the impression
that rules are changed to follow IE by default rather than to handle existing
content, which may lead to unnecessarily complicated rules that do not
actually handle existing documents optimally.

More specifically, some of the points that probably should be
addressed are the following:

1) Is it useful to handle unterminated entities followed by an alphanumerical
character like IE does? The number of documents for which this actually helps
might be small compared to the number of documents that contain other,
incorrigible errors. The process also introduces errors, albeit not in
conforming documents. Is the gain worth the added complexity?

If so, then should this apply to all entities? (Probably not.) Would it be
useful to add to/remove from the set supported by IE7? (This may seem insane,
but we should try to avoid premature decisions.)

2) HTML 4.01 allows the semicolon to be omitted in certain cases. Does this
cause problems? Firefox and Safari both support this, and it would seem
meaningless to change the way conforming documents are parsed unless
it can be shown that, e.g., "&ndash " actually is supposed to mean
"&amp;ndash " more often than "&ndash; ". (Conformance is a separate issue.)

3) Will new entities ever be needed? If yes, can new entities adopt existing
conformance criteria and parsing rules? 

4) Similar considerations for entities in attribute values.
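
To make point 2 concrete, here is a rough sketch of the HTML 4.01-style reading, in which a reference ends at the first character that cannot be part of its name and the semicolon is optional there. The entity table is a tiny illustrative subset, the function name is invented, and the name-character class is simplified to letters and digits, so this is a model of the rule rather than a conforming parser.

```python
import re

# Illustrative subset of the entity table.
ENTITIES = {"ndash": "\u2013", "eacute": "\u00e9", "amp": "&"}

def expand_sgml_style(text: str) -> str:
    """Expand references whose whole name is known; ';' is optional."""
    def repl(match):
        # The name is matched greedily, so whatever follows it is
        # either ';' or a non-name character (or the end of the text).
        return ENTITIES.get(match.group(1), match.group(0))
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);?", repl, text)

print(expand_sgml_style("1&ndash 2"))           # expanded before whitespace
print(expand_sgml_style("1&ndash;2"))           # expanded with semicolon
print(expand_sgml_style("R&eacutesum&eacute"))  # only the final one expands
print(expand_sgml_style("&amp;ndash "))         # yields the literal text &ndash
```

Under this reading, unterminated references inside a word are left alone, which is the difference from the IE behaviour raised in point 1.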

-- 
Øistein E. Andersen