Re: [whatwg] P element's content model restrictions

2007-05-30 Thread Leif Halvard Silli
On 2007-05-30 02:19:48 +0200 Jon Barnett [EMAIL PROTECTED] wrote:

 On 5/29/07, Anne van Kesteren [EMAIL PROTECTED] wrote:
 
 I don't care much for the semantic side of things but changing section 8.2
 (and Acid2) to make <p><table> not become <p></p><table> as per HTML4
 would be fine with me. We discussed this recently in #whatwg. Simon has
 some ideas about it.

Fingers x-ed. 

I updated my little test page 
http://www.malform.no/prov/content-model/index.html with a version in 
the «opposite mode» http://www.malform.no/prov/content-model/html.index.html 
so you can e.g. see how TABLE in IE inherits font color and font size from P 
when in Standards mode, but not when in Quirks mode.

 Is there a link to any of this discussion?  I imagine my searching for "p"
 or "p element" might be futile.

I suppose Anne referred to IRC, for which Whatwg.org points to this log 
http://whatbot.charlvn.za.net/.

 Given:
 <p>This is a lengthy paragraph that talks about the following table
 <table>...
 
 Breaking scripts that depend on p.nextSibling to be the table and styles
 that depend on p + table to work (and other various DOM issues) is an
 obvious point, and I'm sure it's been discussed.

That script would then already be broken, cross-browser-wise, I suppose?

The worst case is probably not when authors use p+table{}, because then clean 
IE-targeted styling stands ready to pick up. The worst is if important TABLE 
styles were targeted via table{} for Opera-Safari-Firefox, but via _table{} for 
IE. (But then the underscore hack is not considered good coding style either 
...) This will fix itself when IE fixes their browser (and removes the #table{} 
option and other hack-arounds).
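
To make the styling point concrete, here is a minimal, hypothetical illustration (not from the cited test page): in browsers that close the P before the TABLE, the sibling selector matches; in IE, where the TABLE stays inside the P, it does not.

```html
<style>
  /* Matches only when <table> ends up as a *sibling* of <p>
     (current Opera-Safari-Firefox standards-mode parsing): */
  p + table { border: 2px solid red; }
</style>
<p>This paragraph is interrupted by a table:
<table><tr><td>In IE this cell inherits the P's font styling;
elsewhere the P is closed before the table starts.</td></tr></table>
```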

Although there is also lots - most? - of content out there for which it is 
quite irrelevant whether TABLE is a sibling or a child of the nearest P. P and 
TABLE are often contained in a DIV which carries the styling that positions 
that container on the page. That MSIE and the others see <p><table> so 
differently, without most of us recognising any difference on the page, should be 
quite telling. 

Collapsing vertical margins plays an important role in hiding the effects, I 
suppose. For instance, if you have <p>text<p><table><p>text, then the empty P 
that Opera-Safari-Firefox currently see in standards mode may become 
completely collapsed, unless you add padding-top/-bottom or border-top/-bottom 
to it. The reason collapsing vertical margins is such a useful default CSS 
behaviour is probably simply that it is so typical to not add 
padding-top/padding-bottom or border-top/border-bottom. (When one does add 
padding/border, the vertical margin collapsing behaviour stops working, 
and the author gets a whole lot more to think about with regard to the 
vertical space between the elements.)
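
A hypothetical snippet showing why the empty P usually goes unnoticed: with only margins it collapses to nothing, but give it padding or a border and it suddenly takes up space.

```html
<style>
  p { margin: 1em 0; }  /* the empty <p> before the table collapses away */
  /* Uncomment to make the otherwise-invisible empty <p> take up space:
  p { padding: 1px 0; border: 1px solid silver; }
  */
</style>
<p>text before
<p><table><tr><td>table content</td></tr></table>
<p>text after
```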
-- 
leif



Re: [whatwg] noscript should be allowed in head

2007-05-30 Thread Ivo Emanuel Gonçalves

I'm not in favor of this.

As Anne pointed out, <noscript> is used to display alternative content
that a script would have shown.  The kind of content that goes only in
<body>, usually block elements, and never in <head>.

If the WebKit developers want to follow IE's broken model on parsing
even basic HTML like <noscript>, be my guest, but don't try to force
this into HTML 5 and make it a standard.

-Ivo


Re: [whatwg] noscript should be allowed in head

2007-05-30 Thread Anne van Kesteren
On Wed, 30 May 2007 10:23:51 +0200, Ivo Emanuel Gonçalves  
[EMAIL PROTECTED] wrote:

As Anne pointed out, <noscript> is used to display alternative content
that a script would have shown.  The kind of content that goes only in
<body>, usually block elements, and never in <head>.

If the WebKit developers want to follow IE's broken model on parsing
even basic HTML like <noscript>, be my guest, but don't try to force
this into HTML 5 and make it a standard.


Whether or not it should be conforming is a different question. How a  
document is to be parsed is best agreed upon between browser vendors I  
think. We already have enough differences as it is.



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] noscript should be allowed in head

2007-05-30 Thread Julian Reschke

Anne van Kesteren wrote:
On Wed, 30 May 2007 10:23:51 +0200, Ivo Emanuel Gonçalves 
[EMAIL PROTECTED] wrote:

As Anne pointed out, <noscript> is used to display alternative content
that a script would have shown.  The kind of content that goes only in
<body>, usually block elements, and never in <head>.

If the WebKit developers want to follow IE's broken model on parsing
even basic HTML like <noscript>, be my guest, but don't try to force
this into HTML 5 and make it a standard.


Whether or not it should be conforming is a different question. How a 
document is to be parsed is best agreed upon between browser vendors I 
think. We already have enough differences as it is.


Again, you're making the assumption that any consumer of HTML content is 
a browser.


Best regards, Julian


Re: [whatwg] noscript should be allowed in head

2007-05-30 Thread Anne van Kesteren
On Wed, 30 May 2007 10:39:24 +0200, Julian Reschke [EMAIL PROTECTED]  
wrote:
Whether or not it should be conforming is a different question. How a  
document is to be parsed is best agreed upon between browser vendors I  
think. We already have enough differences as it is.


Again, you're making the assumption that any consumer of HTML content is  
a browser.


I think the primary consumer is. Content is written mostly against  
browsers, not parsing libraries. Parsing libraries should just follow the  
specification (like html5lib tries to do).



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] noscript should be allowed in head

2007-05-30 Thread Henri Sivonen

On May 30, 2007, at 11:39, Julian Reschke wrote:


Anne van Kesteren wrote:
Whether or not it should be conforming is a different question.  
How a document is to be parsed is best agreed upon between browser  
vendors I think. We already have enough differences as it is.


Again, you're making the assumption that any consumer of HTML  
content is a browser.


No, the assumption isn't that any consumer is a browser. The 
assumption is that browsers need to do what they do based on 
browser-specific constraints, and the other consumers need to follow what 
browsers do in order to be compatible.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] noscript should be allowed in head

2007-05-30 Thread Julian Reschke

Henri Sivonen wrote:

On May 30, 2007, at 11:39, Julian Reschke wrote:


Anne van Kesteren wrote:
Whether or not it should be conforming is a different question. How a 
document is to be parsed is best agreed upon between browser vendors 
I think. We already have enough differences as it is.


Again, you're making the assumption that any consumer of HTML content 
is a browser.


No, the assumption isn't that any consumer is a browser. The assumption 
is that browsers need to do what they do based on browser-specific 
constraints and the other consumers need to follow what browsers do in 
order to be compatible.


...to be compatible with what? The browsers?

So let's rephrase this question: will there be a conformance class for 
HTML5 consumers that *only* accept conforming documents? (Keep in mind 
that these consumers may not even have a DOM or a Javascript engine).


Best regards, Julian



Re: [whatwg] noscript should be allowed in head

2007-05-30 Thread Maciej Stachowiak


On May 30, 2007, at 2:02 AM, Julian Reschke wrote:


Henri Sivonen wrote:

On May 30, 2007, at 11:39, Julian Reschke wrote:

Anne van Kesteren wrote:
Whether or not it should be conforming is a different question.  
How a document is to be parsed is best agreed upon between  
browser vendors I think. We already have enough differences as  
it is.


Again, you're making the assumption that any consumer of HTML  
content is a browser.
No, the assumption isn't that any consumer is a browser. The  
assumption is that browsers need to do what they do based on  
browser-specific constraints and the other consumers need to  
follow what browsers do in order to be compatible.


...to be compatible with what? The browsers?

So let's rephrase this question: will there be a conformance class  
for HTML5 consumers that *only* accept conforming documents? (Keep  
in mind that these consumers may not even have a DOM or a  
Javascript engine).


Do you mean: (A) only documents that meet all document conformance  
criteria (B) only documents that meet all *machine-checkable*  
conformance criteria or (C) documents that would not trigger any  
parse errors if the parsing algorithm were applied?


The HTML5 spec as currently written already allows implementations to  
accept only documents in category (C), but I don't think there is  
allowance for restricting to category (B), and checking for (A) by  
definition does not make sense.


Conformance errors in general can be quite hard to detect since they  
may depend on details of attribute value microsyntax and on  
relationships between elements in different parts of the document, so  
category (B) is likely not what you want in any case.


Regards,
Maciej



Re: [whatwg] noscript should be allowed in head

2007-05-30 Thread Lee Kowalkowski

On 30/05/07, Ivo Emanuel Gonçalves [EMAIL PROTECTED] wrote:

As Anne pointed out, <noscript> is used to display alternative content
that a script would have shown.  The kind of content that goes only in
<body>, usually block elements, and never in <head>.


You could include a style sheet for non-JS visitors.  Especially
useful if you are using javascript in CSS (using expression() in IE),
or, more commonly, when people hide elements by default and reveal
them using JS (bad practice I know, but it prevents potential flicker and
jiggle).

Perhaps some layouts don't make sense when JS isn't available, so a
different layout entirely is desired.  Authors may prefer this to
keeping all the JS and non-JS version styles in one style sheet and
class name switching to indicate JS is available.
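
A sketch of the pattern being described (file names hypothetical); this is exactly the construct that is currently non-conforming, since <noscript> is not allowed in <head>:

```html
<head>
  <link rel="stylesheet" href="scripted.css">
  <noscript>
    <!-- Overrides for non-JS visitors: un-hide elements hidden by default -->
    <link rel="stylesheet" href="no-js.css">
  </noscript>
</head>
```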

--
Lee


Re: [whatwg] noscript should be allowed in head

2007-05-30 Thread Philip Taylor

On 30/05/07, Maciej Stachowiak [EMAIL PROTECTED] wrote:


On May 30, 2007, at 2:02 AM, Julian Reschke wrote:
 So let's rephrase this question: will there be a conformance class
 for HTML5 consumers that *only* accept conforming documents? (Keep
 in mind that these consumers may not even have a DOM or a
 Javascript engine).

Do you mean: (A) only documents that meet all document conformance
criteria (B) only documents that meet all *machine-checkable*
conformance criteria or (C) documents that would not trigger any
parse errors if the parsing algorithm were applied?


Perhaps it would be better to rephrase as: Will there be a conformance
class for HTML5 consumers that process conforming documents according
to the spec, but process non-conforming documents in an undefined way?
(Some non-conforming documents might still be processed according to
the spec, instead of being rejected, so it doesn't *only* accept
conforming documents. That keeps it from being impossible, when using the
full definition of conformance.)

At least that's how I interpret the original intent - it means tools
in systems with guaranteed document conformance (i.e. not taking input
from the general web) could be simplified while still claiming to be
conformant and still being interoperable with other such tools. They
would only have to be compatible with the rules for processing
conforming documents, instead of being compatible with the rules
defined by browsers for non-conforming documents. (Is that
interpretation correct, or am I totally missing the point?)

(I'm not sure whether it's that useful to be able to claim conformance
for its own sake. Interoperability is useful, but maybe that can be
achieved by imagining a new spec which just says "If a document is
conforming according to the definition in HTML5, then it must be
processed as described in HTML5, otherwise the document should be
rejected but anything may happen", and all the tools can follow that,
so there's no need for HTML5 itself to explicitly allow that.)


 (Keep
 in mind that these consumers may not even have a DOM or a
 Javascript engine).


http://www.whatwg.org/specs/web-apps/current-work#non-scripted already
defines UA conformance when there's no scripting, which seems to cover
those cases.

--
Philip Taylor
[EMAIL PROTECTED]


Re: [whatwg] noscript should be allowed in head

2007-05-30 Thread Henri Sivonen

On May 30, 2007, at 15:02, Julian Reschke wrote:


Philip Taylor wrote:

...
Perhaps it would be better to rephrase as: Will there be a conformance
class for HTML5 consumers that process conforming documents according
to the spec, but process non-conforming documents in an undefined way?
...


Yep, that's what I had in mind.


I think it could be useful to allow markup editors to coerce 
non-conforming documents into conforming ones in an implementation-defined 
way, because then the editor could limit UI representations to conforming 
cases.


(I'm not sure whether it's that useful to be able to claim conformance
for its own sake. Interoperability is useful, but maybe that can be
achieved by imagining a new spec which just says "If a document is
conforming according to the definition in HTML5, then it must be
processed as described in HTML5, otherwise the document should be
rejected but anything may happen", and all the tools can follow that,
so there's no need for HTML5 itself to explicitly allow that.)

 (Keep
 in mind that these consumers may not even have a DOM or a
 Javascript engine).

http://www.whatwg.org/specs/web-apps/current-work#non-scripted already
defines UA conformance when there's no scripting, which seems to cover
those cases.


Thinking of which, they may not even want to build a tree of the  
document. So how does the HTML5 parsing model help consumers that  
just want to consume a stream of tokens, similarly to a SAX parser?


The parsing spec allows a Draconian response to parse errors. Hence,  
if you want SAX events, you have two conforming options:
 1) Build a tree in its entirety first and then emit the events  
based on the tree.
 2) Emit events as the parse progresses and halt on errors that  
require non-streamable recovery.


My plan is to implement both (in Java).

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




[whatwg] No-DOM HTML (was: noscript should be allowed in head)

2007-05-30 Thread Michel Fortin

On 2007-05-30, at 8:25, Henri Sivonen wrote:

The parsing spec allows a Draconian response to parse errors.  
Hence, if you want SAX events, you have two conforming options:
 1) Build a tree in its entirety first and then emit the events  
based on the tree.
 2) Emit events as the parse progresses and halt on errors that  
require non-streamable recovery.


Or, assuming the spec changes to no longer move head-elements (like  
<link>) to the head when they're found in the body, there is a third option:


3) Emit events until you reach a point where it may be possible that  
some events should be reordered, in which case you build a local  
DOM-like tree and wait until you can emit all pending events with a  
certainty they don't need to be reordered.


For instance, <table> requires maintaining a local DOM-like tree  
until the corresponding </table> has been reached, at which point you  
know you can send events for the whole table. That's not optimal, but  
still better than keeping the whole DOM in memory and waiting until  
the end of the document to start sending events. Although it sure is  
more complicated too.


Of course, if head-elements are sent back to the head you can't go  
past the head with this technique, unless you consider yourself in  
the innerHTML case and append them to the current node, as the spec  
requires. I guess this could be fine for parsing HTML snippets  
belonging to the body for instance.



Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/




[whatwg] img as thumbnails

2007-05-30 Thread Charles Iliya Krempeaux

Hello,

I'd like to suggest the addition of another attribute to <img>.  This
would be useful in cases where the <img> element is used as a
thumbnail.

The new attribute is like the cite attribute on the <q> element.

I suggest we add an optional cite attribute to the <img> element
too, for when an <img> element is used as a thumbnail.  So for
example...

 <img src="..." alt="..." cite="http://example.com/video">

The rationale is to be able to specify the source from which the
thumbnail was made. (To be able to specify the source image or video
from which the thumbnail was created.)

And to be able to conceptually link together loosely coupled
thumbnails.  (To be able to say... all these thumbnails are from the
same source.)

User agents could let users get to the URL in the cite attribute of
the <img> element by right clicking on the thumbnail, or something
like that.
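
Until user agents grow such UI, the proposed attribute could at least be read from script; a hypothetical sketch (attribute name as proposed, the double-click behavior is invented for illustration):

```html
<script>
  // Follow the thumbnail's source on double-click (illustrative only)
  document.addEventListener("dblclick", function (e) {
    var t = e.target;
    if (t.tagName == "IMG" && t.getAttribute("cite"))
      location.href = t.getAttribute("cite");
  }, false);
</script>
```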


See ya

--
   Charles Iliya Krempeaux, B.Sc.

   charles @ reptile.ca
   supercanadian @ gmail.com

   developer weblog: http://ChangeLog.ca/


Re: [whatwg] img as thumbnails

2007-05-30 Thread ddailey
This makes good sense to me. Under US case law stemming from Kelly v. Arriba, 
the thumbnail has a rather special legal status 
(http://srufaculty.sru.edu/david.dailey/copyright/legalthumb.htm ).* I 
believe similar discussions have taken place within WIPO (and certainly did 
under CONFU), such that that status may have burbled outward (of the US) a bit.


The use case in which the thumbnail appears at a different site than the 
thing from which it is derived is therefore highly likely, at least in the 
US (or in places that have access to TCP/IP). If my memory is correct, it 
was shortly after the initial decision in Kelly that Google began an image 
search capability quite reminiscent of what Ditto/Arriba had been doing. 
The case law would appear to require proper citation to be provided, so 
providing a standard typographic mechanism for doing that seems worthwhile.


David (IANAL)

*The situation was muddied a bit by a recent injunction against Google, 
http://news.com.com/2100-1030_3-6041724.html -- but upon appeal Google's use 
was upheld http://seattlepi.nwsource.com/business/316013_amazongoogle17.html 
.



- Original Message - 
From: Charles Iliya Krempeaux [EMAIL PROTECTED]

To: WHAT Working Group Mailing List [EMAIL PROTECTED]
Sent: Wednesday, May 30, 2007 11:21 AM
Subject: [whatwg] img as thumbnails










Re: [whatwg] Style sheet loading and parsing (over HTTP)

2007-05-30 Thread Henri Sivonen

On May 29, 2007, at 23:39, gary turner wrote:


Henri Sivonen wrote:

snip
It seems to me that the safer way to show plain text in a browser  
content area is to use text/html and <plaintext>. :-/


My apologies if you're being facetious or if <plaintext> is to be  
re-introduced with a /working/ end tag.  But the <plaintext> tag, as  
from HTML 1, is particularly unusable.


I was serious. If you want to display plain text in the browser  
content area, it seems that prepending <plaintext> and sending it  
as text/html is more likely not to invoke sniffing than sending it as  
text/plain. This use case doesn't require an end tag.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Style sheet loading and parsing (over HTTP)

2007-05-30 Thread gary turner

Henri Sivonen wrote:

snip

I was serious. If you want to display plain text in the browser content 
area, it seems that prepending <plaintext> and sending it as text/html 
is more likely not to invoke sniffing than sending it as text/plain. 
This use case doesn't require an end tag.


Ah.  That, prepending, gives me a better idea of your meaning.  E.g., a 
file, source.html, containing

<plaintext>
<html>
...
</html>

would not be sniffed by IE, and would be rendered as plain text.  Whereas 
normally, the file, as source.txt, with its text/plain content type, and 
without the <plaintext> tag, would be sniffed by IE and rendered as an 
html document.


If UAs were to simply honor the server response header, it would 
obviate the need for such an inelegant work-around.


Thanks for the clarification.

cheers,

gary
--
Anyone can make a usable web site. It takes a graphic
designer to make it slow, confusing and painful to use.


Re: [whatwg] Issues concerning the base element and xml:base

2007-05-30 Thread Jonas Sicking

Ian Hickson wrote:

On Tue, 1 May 2007, Jonas Sicking wrote:
The latter is the option I'm following for now. Note that browsers all 
do _different_ things for target= than for href=. The spec has 
made them act the same for now. I'm not sure this is workable, we'll 
have to see when the browser vendors try to get this interoperable. I 
can't imagine that it's a huge issue given that the browsers are so 
far from each other in terms of what they do here. I'm going to do a 
study of some subset of the Web to see how common this is (at least 
the static case; I can't really do much about the scripted case).
I don't think this is a good solution actually. In general, I think it's 
good to always make the DOM reflect the behavior of the document. I.e. 
it shouldn't matter how you arrived to a specific DOM, be it through 
parsing of an incoming HTML stream, or by using DOM-Core calls. Whenever 
we make an exception for that rule I think we need to have a good reason 
for it.


I think you misread what I wrote. Right now, there's no magic involved 
here.


When you said the latter is the option I'm following for now I thought 
you referred to and Firefox and IE7/Win don't change any links. Is 
that not the case?


Looking at the spec it doesn't mention anything special regarding DOM 
mutations at all, so that would indeed make me think that links are 
changed if a <base> element is inserted at the top of the <head> using 
the DOM.


What I suggest is that we make the first or last <base> element in the 
<head> be the one that sets both the base target and the base href for 
the document (modulo all special handling needed when <base>s appear in 
the body, described below). While this is not what IE or Firefox does 
today, I doubt that it'll break enough pages to stray from the 
act-like-the-DOM-looks principle.


Right now the href= is from the first and the target= is from the 
last, but other than that that's what the spec says.


Why is the fact that the last target is the one used only defined in a 
Note? Or am I missing it somewhere else?


Also, if we're going to be inconsistent in how current browsers and web 
pages handle multiple bases, why not simply use the first base for 
both href= and target=?


One thing we unfortunately will have to deal with is <base> elements 
appearing in the middle of the body of the document. What Mozilla had to 
do was: once we find a <base> element in the body of the document, we 
tell the parser to remember the resolved href and/or target of that 
<base> element. We then, for any element that uses base URIs (full list 
at [1]), set an internal member on the element that hardcodes the 
element's base URI and/or base target.


For elements that don't get this property set on them, base href and 
target resolution works as normal. For elements that have this set, base 
href and target resolution only uses the set properties.


Note that you only set the saved href and target in the parser if the 
attribute is set on the <base> element. So if a document contains <base 
target="foo"> in the middle of the body, that does not set a saved href 
in the parser.


This is deep magic, as far as the DOM goes. It also makes it hard to debug 
-- e.g. dynamically modifying <base> elements, moving them, etc, has no 
effect anymore.


Yup, I agree that this is deep magic as far as a DOM user goes.

HOWEVER, having said that, this is a tiny minority of pages. According to 
a study I did of over 100,000,000 pages, 0.036% of pages have more than 
one <base href=""> element (ignoring those that specify the same href 
value more than once).


With <base href="">, you can get 404s, but in practice IE7 is already 
doing that, and it doesn't seem to have affected adoption. Anecdotally, 
most of these pages use absolute URIs, which might explain it.


It's much easier for IE to get away with breaking pages, mostly because 
many people use IE as the yard-stick.


0.06% of pages have more than one <base target=""> element (again ignoring 
duplicates). With <base target="">, the worst that can happen from the 
user's point of view is that links will open in a new page instead of in 
the same page, and in practice even that's not likely, since (anecdotally) 
most pages with <base target=""> simply alternate between different names.


What do you think?


I would be hesitant to drop support for multiple <base>s in Firefox, 
actually. Implementation-wise it was very easy to implement, and it is 
known that many pages out there would break; though the percentage is 
small, there are a lot of pages on the internet.


It might be something we could restrict to quirks mode pages though, 
that's not a bad idea at all.


/ Jonas


Re: [whatwg] setting .src of a SCRIPT element

2007-05-30 Thread Jonas Sicking

Hallvord R M Steen wrote:

Hi,
if you set the src property of a SCRIPT element in the DOM, IE will
load the new script and run it. Firefox doesn't seem to do anything
(perhaps a more seasoned bugzilla searcher can tell me if it is
considered a known bug?).


It's by design (see below)


I think Opera 8 does what IE does, Opera 9 is buggy.

I think IE's behaviour is pretty useful and I'd like the spec to make
this standards-compliant. It is a common technique to create SCRIPT
elements dynamically to load data (particularly because this gets
around cross-domain limitations). Firefox's implementation means one
has to create a new SCRIPT element each time, keep track of them, and
remove them from the document again, whereas with IE's implementation
you can have one data loader SCRIPT element and set its .src
repeatedly.


The reason I designed it this way was that it felt like the least
illogical behavior. In general a document behaves according to its
current DOM. I.e. it doesn't matter what the DOM looked like before, or
how it got to be in the current state, it only matters what's in the DOM
now.

For style elements this works great. Whenever the contents of a style
element are changed, the UA can drop the current style rules associated with
the element, reparse or reload the new stylesheet, and apply the new style
rules to the document. (There was a bug in Firefox up to version 2,
where certain DOM mutations inside the style weren't detected, but
that has been fixed in Firefox 3.)

For script things are a lot worse. If the contents of a script
element is changed it is impossible to 'drop' the script that was there
before. Once the contents of a script has executed, it can never be
unexecuted. And since we can't undo what the script has already done, 
it feels weird to redo the new thing that you're asking it to do.


Another thing that would be weird would be inline scripts. How would the 
following behave:

s = document.createElement('script');
document.head.appendChild(s);
for (i = 0; i < 10; i++) {
  s.textContent += "a" + i + " += 5;";
}

Would you reexecute the entire script every time data was appended to 
the script? Would you try to just execute the new parts? Would you do 
nothing? IE gets around this problem by not supporting dynamically 
created inline scripts at all, which I think is a really bad solution.


So I opted for 'killing' script elements once they have executed, they 
become in effect dead elements. This felt simple and consistent.


I'm not sure what you mean when you say you need to "keep track of them, 
and remove them from the document again". All you need to do every time 
you want to execute a script is to insert a new DOM element in the head 
of your page. It's not going to be a problem having too many 
script elements in the document unless you start executing millions of 
scripts, at which point you'll have bigger performance issues.
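
So the pattern described above amounts to a tiny helper like this (the function name and URL parameter are hypothetical): one fresh <script> element per request, rather than one element whose src is reset.

```html
<script>
  // Create a fresh script element for every load instead of reusing one
  function loadData(url) {
    var s = document.createElement("script");
    s.src = url;
    document.getElementsByTagName("head")[0].appendChild(s);
  }
</script>
```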


/ Jonas


[whatwg] HTMLTableElement should have a createTBody method

2007-05-30 Thread Adam Roben
   The omission of a createTBody method from HTMLTableElement makes  
it rather inconvenient to create a table with both a <thead> and a <tbody>  
using the table DOM APIs. After creating a <thead>, you have to manually  
create and append the <tbody> to start putting rows into the body of the  
table, so you cannot exclusively use the HTMLTableElement methods to  
populate the table. If a createTBody method is added, I'd suggest that  
it always create a new <tbody> rather than ever returning an existing one.
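
For comparison, what authors have to write today versus what the proposal would allow (createTBody being the proposed, not-yet-existing method):

```html
<script>
  var table = document.createElement("table");
  var thead = table.createTHead();   // exists in the DOM today
  // Today: the tbody must be created and appended by hand...
  var tbody = table.appendChild(document.createElement("tbody"));
  tbody.insertRow(-1).insertCell(-1).appendChild(
      document.createTextNode("body cell"));
  // ...whereas the proposal would allow simply:
  // var tbody = table.createTBody();
</script>
```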


-Adam



[whatwg] Google Gears and HTML5

2007-05-30 Thread Robert O'Callahan

*Maciej Stachowiak wrote:*


Now that Google Gears http://gears.google.com/ has been announced,
I'd like to see the features in it added to the HTML5 spec, since
these are features that should ultimately be a part of basic web
technology, not an extension.

Agreed...



Ian has already added a SQL API which is functionally more or less
equivalent to the Database module
(http://code.google.com/apis/gears/api_database.html) to the HTML5 spec, here:
http://www.whatwg.org/specs/web-apps/current-work/multipage/section-sql.html

That is sudden :-). Specifying the SQL dialect precisely will be very
important here. Someone also has to investigate carefully the issues around
exposing SQL to untrusted content. Could be some fun denial of service
attacks there.


Conversely maybe something can be done to make them integrate better,
perhaps the Storage items appear as a table via the SQL API, in which
case most of the Storage calls are just a convenience interface, but
you can still do queries on the same data.

Sounds reasonable.



I know Mozilla has considered other approaches to offline web apps,
but I think the LocalServer type approach seems cleaner than
Mozilla's JAR file plan, since it is much more transparent and allows
local resource caching to be decoupled from the rest of the web app.

JAR files can be fairly transparent ... you can redirect from
http://foo.com/foo/index.html to http://foo.com/foo.jar!/index.html, if
appropriate, and use relative URIs in your app so the same versions work in
both cases. On the server side, maintaining a manifest isn't much different
from maintaining a JAR. True, having different URLs for different browsers
--- or for the same browser, in different modes --- could be a hassle. On
the plus side, JAR files make versioning and consistency incredibly
simple. It's not clear what the Gears ManagedStore does if it gets a 404 or
some other error during an update.

Other issues with the Gears API:
-- The ManagedStore approach seems to have a problem in the following
situation: Suppose an app is updated on the server overnight and I visit the
main page in the morning. It starts loading other resources.  ManagedStore
is going to check the manifest, find that the app needs to be updated, pull
down the new resources, and flip to the new version --- more than likely
while the app is in the middle of loading. Sure, this could happen normally
if I hit the site in the middle of the night at the switchover, but
ManagedStore makes this uncommon case common. (This is Dave Camp's example.)
-- I think making ResourceStore writable by clients is unnecessary
complexity. It's much simpler to maintain the model that the
LocalServer/offline cache is really just a cache of the Web. Then there are
no issues with enabling/disabling stores, there is no need to add domain
restrictions or requiredCookie (i.e. potential security holes) so that
different apps can't tread on each other's resources. (So apps that want to
refer to a canonical source for JS library files or whatever can still
work.) For file uploads, I think we should just have a DOM API on form
control elements that reads the file data into a binary blob of some sort
which can then be stored in Storage or SQL.
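A rough sketch of that last idea, with invented names ("readFileAsBlob" and the input-element shape are hypothetical, not an existing API): the form control hands back the file's bytes, which can then be stored like any other value.

```javascript
// Hypothetical sketch: a DOM-style call on a form control that reads the
// picked file's data into a blob-like object, which can then be stored in
// Storage or a SQL row. "readFileAsBlob" and the element shape are invented
// for illustration; no such API is specified here.
function readFileAsBlob(inputElement) {
  // In a browser this would read the user-picked file's bytes; here any
  // object exposing fileBytes stands in for the input element.
  const bytes = inputElement.fileBytes.slice();
  return { bytes, size: bytes.length };
}

// Store the blob under a key, Storage-style.
const uploadStore = {};
function storeUpload(key, inputElement) {
  uploadStore[key] = readFileAsBlob(inputElement);
}
```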

I think we're still willing to alter our API, but we want to stick with the
simple conceptual model we currently have: a single read-only offline cache
that requires minimal management. Perhaps we could figure out how to get
versioning and consistency without using JARs. E.g., we might be able to add
an API that reads a Gears-style manifest and does an atomic update of the
offline cache from it.
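For concreteness, a Gears-style manifest looked roughly like the following (field names recalled from the Gears documentation of the time; treat the exact shape as approximate):

```json
{
  "betaManifestVersion": 1,
  "version": "app_v2",
  "entries": [
    { "url": "index.html" },
    { "url": "app.js" },
    { "url": "style.css" }
  ]
}
```

An atomic-update API could accept the URL of such a file, fetch every listed resource, and only switch the cache over once all of them have downloaded successfully.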

Rob
--
Two men owed money to a certain moneylender. One owed him five hundred
denarii, and the other fifty. Neither of them had the money to pay him back,
so he canceled the debts of both. Now which of them will love him more?
Simon replied, I suppose the one who had the bigger debt canceled. You
have judged correctly, Jesus said. [Luke 7:41-43]


[whatwg] Potential Security Problem in Global Storage Specification

2007-05-30 Thread Jerason Banes

Hello All!

This is my first post here, so apologies in advance if I'm not quite up on
the list etiquette.

I was just comparing the Storage API with that of Google Gears
(http://gears.google.com), and something jumped out at me. According to the
spec, browsers should allow
a webapp to store data in the globalStorage object with no domain attached.
(i.e. globalStorage['']) This is intended to allow data to be shared across
all webpages.

My concern is that this poses a problem for the user's privacy. Let's say
that I'm an Evil Advertisement site. It is in my interest to penetrate the
user's veil of privacy and determine which pages they visit. I've
traditionally used cookies for this, but the browser makers foiled my
attempts by allowing cookies to only be accepted from the originating site.
But thanks to the new globalStorage API, I can store a Unique ID in the
user's browser, then use Javascript to retrieve it every time they download
one of my ads.

Here's some rough pseudo-JS to demonstrate how it might work:

<script>
if (!globalStorage[''].evilbit) globalStorage[''].evilbit = createUUID();

function createUUID()
{
   // Return a unique identifier using a random algorithm.
   return Math.random().toString(36).slice(2) + Date.now().toString(36);
}

function displayEvilAd(type)
{
   document.write('<img src="http://www.eviladagency.com' +
   '/getAdvertisement.asp' +
   '?type=' + type +
   '&tracking=' + globalStorage[''].evilbit + '">');
}
</script>

...

<script>displayEvilAd('banner');</script>

Is there something I'm missing that would prevent this?

Thanks,
Jerason Banes


Re: [whatwg] The problem of duplicate ID as a security issue

2007-05-30 Thread Ian Hickson
On Fri, 10 Mar 2006, Mihai Sucan wrote:
 On Fri, 10 Mar 2006, Alexey Feldgendler [EMAIL PROTECTED] wrote:
 
  Another solution may be to define functions like getElementById(), 
  getElementsByTagName() etc so that they don't cross sandbox boundaries 
  during their recursive search, at least by default. (If the sandbox 
  proposal makes it to the spec, of course.)

I don't see us using a sandboxing system that isn't based on browsing 
contexts, in which case this is moot.


 This is something I'd opt for. But ... this would be really bad, since 
 the spec would have to change the way getElementBy* functions work. It's 
 bad because you shouldn't make a spec that breaks other specs you rely 
 upon (this has probably already been done in this very spec).

True, we should avoid that where possible. (Sometimes, e.g. the MIME type 
sniffing stuff, we are constrained by the legacy implementations.)


On Mon, 13 Mar 2006, Mihai Sucan wrote:
 
 Yes... but there's a need for allowing the parent document to control 
 sandboxed content. Therefore, it needs a new parameter, for example: 
 getElementById(string id, bool search_in_sandbox). Isn't that changing 
 the getElementById function? Of course this is only one way; it could 
 probably be done differently, without changing the function(s).

This presumably wouldn't be needed with browsing context based sandboxes.


 As for scripting, if there's any user wanting to post his/her script in 
 a forum, then that's a problem. I wouldn't ever allow it (except 
 probably for research purposes, such as how users act when they are 
 given all power :) ).

Indeed.


On Tue, 14 Mar 2006, Mihai Sucan wrote:
 
 I've made a short investigation regarding how browsers behave with  
 document.getElementById('a-duplicate-ID').
 
 The page:
 http://www.robodesign.ro/_gunoaie/duplicate-ids.html
 
 Take a close look into the source (I've provided comments) to understand  
 what the Click me link tests and what it shows. You'll see the major browsers  
 I've tested behave the same: like with a queue, the last node that sets  
 the duplicate ID is also the node that's returned when you use  
 getElementById function.

This seems to be off the grid now. Is there a copy I can look at 
somewhere?


On Wed, 15 Mar 2006, Alexey Feldgendler wrote:

 Unfortunately we can't change it in a backwards-compatible way (though 
 we probably can define a stricter behavior for <!DOCTYPE html> only).

Generally we want to avoid adding any more processing modes.


On Tue, 14 Mar 2006, Alexey Feldgendler wrote:
 
 This is true, but there is a problem with the whitelisting approach: the 
 set of elements and attributes isn't in one-to-one correspondence with 
 the set of broowser features. For example, one can't define a set of 
 elements and attributes which must be removed to prohibit scripting: 
 it's not enough to just remove script elements and on* attributes, one 
 must also check attributes which contain URIs to filter out 
 javascript:.

You must also white-list attribute values, indeed. And this would mean 
checking URI syntax (for instance) and whitelisting URI schemes.
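A minimal sketch of that scheme-whitelisting step (the scheme list and the regex here are illustrative; a production sanitizer must also handle entities, embedded whitespace, and other obfuscations):

```javascript
// Allow only known-safe URI schemes in attribute values; anything with an
// unrecognized scheme (javascript:, data:, vbscript:, ...) is rejected.
// Scheme-less (relative) URIs pass, since they can't smuggle a scheme.
const SAFE_SCHEMES = new Set(['http', 'https', 'ftp', 'mailto']);

function isSafeUri(uri) {
  // Scheme per RFC 3986: ALPHA followed by ALPHA / DIGIT / "+" / "-" / "."
  const m = /^\s*([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(uri);
  if (!m) return true; // relative URI, no scheme present
  return SAFE_SCHEMES.has(m[1].toLowerCase());
}
```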


 While filtering the DOM tree by the HTML cleaner is easy, it approaches 
 the problem from the syntax point of view, not semantic. It's more 
 robust to write something like <sandbox scripting="disallow"> to 
 disallow all scripting within the sandbox, including any obscure or 
 future flavors of scripts as well as those enabled by proprietary 
 extensions (like MSIE's expression() in CSS). Browser developers know 
 better what makes all possible kinds of scripts than the web 
 application developers.

Indeed, sandboxing (probably using iframe) is something we'll look at.


 Returning to the duplicate IDs, I think we should define some standard 
 behavior for getElementById() when there is more than one element with 
 the given ID. To lower the possible extent of duplicate ID attacks, I 
 propose that getElementById() should throw an exception in that case. 
 It's better to crash the script than to make it do what the attacker 
 wants.

We can't make it raise an exception; pages depend on this already.

I did some research a few months back, and in a sample of several billion 
documents, 13% had duplicate IDs. 13%!


On Thu, 16 Mar 2006, Mihai Sucan wrote:
  
  I don't.  getElementById is already defined and implemented to deal 
  with duplicate IDs, there's no need to redefine it in a way that isn't 
  backwards compatible with existing sites.
 
 Yes, getElementById is already defined to deal with duplicate IDs by 
 returning null, in DOM Level 3 Core [1]. In DOM Level 2 Core [2], the 
 behaviour is explicitly undefined in this case (behavior is not defined 
 if more than one element has this ID).
 
 Yet, the implementations (major User Agents: Opera, Gecko, Konqueror and 
 IE) are the problem, actually. These do not return null, they return the 
 last node which set the ID. That's a problem with security implications, 
 as stated by Alexey in the 

[whatwg] Fwd: setting .src of a SCRIPT element

2007-05-30 Thread liorean

Mistakenly sent this to the public-html list instead of WhatWG. Sorry
for the double post, for those on both lists.

-- Forwarded message --
From: liorean [EMAIL PROTECTED]
Date: 31-May-2007 07:24
Subject: Re: [whatwg] setting .src of a SCRIPT element
To: HTML WG [EMAIL PROTECTED]


On 31/05/07, Jonas Sicking [EMAIL PROTECTED] wrote:

 I think IE's behaviour is pretty useful and I'd like the spec to make
 this standards-compliant. It is a common technique to create SCRIPT
 elements dynamically to load data (particularly because this gets
 around cross-domain limitations). Firefox's implementation means one
 has to create a new SCRIPT element each time, keep track of them, and
 remove them from the document again, whereas with IE's implementation
 you can have one data loader SCRIPT element and set its .src
 repeatedly.

The reason I designed it this way was that it felt like the least
illogical behavior. In general a document behaves according to its
current DOM. I.e. it doesn't matter what the DOM looked like before, or
how it got to be in the current state, it only matters what's in the DOM
now.

For style elements this works great. Whenever the contents of a style
element are changed the UA can drop the current style rules associated with
the element, reparse or reload the new stylesheet, and apply the new style
rules to the document. (There was a bug in Firefox up to version 2,
where certain DOM mutations inside the style element weren't detected, but
that has been fixed in Firefox 3.)

For script things are a lot worse. If the contents of a script
element are changed it is impossible to 'drop' the script that was there
before. Once the contents of a script have executed, they can never be
unexecuted. And since we can't undo what the script has already done,
it feels weird to redo the new thing that you're asking it to do.


The difference there is that the styles are continually effective,
while the script only has its effect once. The idea of undoing the
script doesn't make sense since it's no longer in effect. It may have
set up the environment or changed things in it, but after that, the
script itself is finished. Everything after that is a side effect of
the script having been there; it's not actually the script being in
place.

What I'm trying to say here is that undoing the script after it has
executed amounts to exactly no change because the script is no longer
in effect which means there is no effect to remove. In my opinion
changing the source should just replace the no-longer-in-effect script
with a new one, and send that one to the ECMAScript engine for
parsing.


Another thing that would be weird would be inline scripts. How would the
following behave:
s = document.createElement('script');
document.head.appendChild(s);
for (i = 0; i < 10; i++) {
   s.textContent += "a" + i + " += 5;";
}
Would you reexecute the entire script every time data was appended to
the script? Would you try to just execute the new parts? Would you do
nothing? IE gets around this problem by not supporting dynamically
created inline scripts at all, which I think is a really bad solution.


I agree this is a problem. I see several non-solutions that simply
would close the issue without dealing with valid concerns. The only
solution I see that actually handles most concerns is to not execute
inline scripts at all without some API call on the script element to
tell that it's been set up fully. What if you were building a script
body in many text nodes and CDATA nodes and entity reference nodes
where you only have a final, executable form once you have set it all
up? It makes sense to me to have an API function for triggering
evaluation of the script inline contents.
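One hypothetical shape for such an API (the execute() call and the element stand-in are invented for illustration): nothing runs while the body is being assembled, and the fully built source is evaluated exactly once, on request.

```javascript
// Hypothetical sketch of an explicit-execution API for inline scripts:
// the body can be assembled from many pieces, and only an explicit
// execute() call hands the finished source to the script engine.
// The element here is a plain object standing in for a script element.
function makeDeferredScript() {
  return {
    textContent: '',
    appendText(chunk) {
      this.textContent += chunk; // no evaluation happens here
    },
    execute() {
      // Evaluate the fully assembled source in one go.
      return new Function(this.textContent)();
    }
  };
}

// Build the script body incrementally; nothing runs until execute().
const deferred = makeDeferredScript();
deferred.appendText('var total = 0;');
for (let i = 0; i < 3; i++) {
  deferred.appendText('total += ' + i + ';');
}
deferred.appendText('return total;');
```

This sidesteps the "re-execute on every mutation" question entirely: mutations only edit text, and the author decides when the script becomes live.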

So, what are these issues I talk about? Well, mostly it's questions
about what is appropriate to do in cases like:
1. We have a script element, without inline content, in the document
hierarchy. A src attribute is added.
2. We have a script element, with either a src attribute or inline
content, in the document hierarchy. A type attribute is added, removed
or modified.
3. We have a script element, with inline contents, in the document
hierarchy. A src attribute is added.
4. We have a script element, with no inline content but with a src
attribute, in the document hierarchy. Inline content is added.
5. We have a script element, with inline content and a src attribute,
in the document hierarchy. The src attribute is removed.
6. We have a script element, in the document hierarchy. It is removed
from and reinserted into the document hierarchy.
7. We have a script element, with inline content, in the document
hierarchy. The inline content is changed.
8. We have a script element, without inline content, not in the
document hierarchy. A src attribute is added.
9. We have a script element, with a src attribute, in the document
hierarchy. The src attribute is changed.

(And similar example cases, on and on...)



I think it would be logical to handle DOM manipulation like so:
- 

Re: [whatwg] Potential Security Problem in Global Storage Specification

2007-05-30 Thread Ian Hickson
On Thu, 31 May 2007, Jerason Banes wrote:
 
 I was just comparing the Storage API with that of Google Gears 
 (http://gears.google.com), and something jumped out at me. 
 According to the spec, browsers should allow a webapp to store data in 
 the globalStorage object with no domain attached. (i.e. 
 globalStorage['']) This is intended to allow data to be shared across 
 all webpages.

 My concern is that this poses a problem for the user's privacy.

Yeah, this is mentioned in the security section:

   http://www.whatwg.org/specs/web-apps/current-work/#security5

...along with recommended solutions to mitigate it.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Google Gears and HTML5

2007-05-30 Thread Maciej Stachowiak


On May 30, 2007, at 8:32 PM, Robert O'Callahan wrote:



I know Mozilla has considered other approaches to offline web apps,
but I think the LocalServer type approach seems cleaner than
Mozilla's JAR file plan, since it is much more transparent and allows

local resource caching to be decoupled from the rest of the web app.
JAR files can be fairly transparent ... you can redirect from
http://foo.com/foo/index.html to http://foo.com/foo.jar!/index.html, if
appropriate, and use relative URIs in your app so the same versions work
in both cases. On the server side, maintaining a manifest isn't much
different from maintaining a JAR. True, having different URLs for
different browsers --- or for the same browser, in different modes ---
could be a hassle.


Yes, I think the multiple URIs are a significant hassle.

On the plus side, JAR files make versioning and consistency incredibly
simple. It's not clear what the Gears ManagedStore does if it gets a 404
or some other error during an update.


I believe the update is made atomic to the web app:

http://code.google.com/apis/gears/api_localserver.html#ManagedResourceStore


While an update is in progress, resources from the previous version
(if any) will continue to be served locally. After all resources have
been downloaded, the currentVersion property will be updated to
indicate that the new set of resources is now being served locally
and the previous version has been removed.




Other issues with the Gears API:
-- The ManagedStore approach seems to have a problem in the  
following situation: Suppose an app is updated on the server  
overnight and I visit the main page in the morning. It starts  
loading other resources.  ManagedStore is going to check the  
manifest, find that the app needs to be updated, pull down the new  
resources, and flip to the new version --- more than likely while  
the app is in the middle of loading. Sure, this could happen  
normally if I hit the site in the middle of the night at the  
switchover, but ManagedStore makes this uncommon case common. (This  
is Dave Camp's example.) 


We've brought up the same problem. I thought more about this though -  
the update can only happen while you're online, in which case you  
could do all loads directly from the net (or at least revalidating  
per normal cache policy) while at the same time checking for an  
update. Or else the manifest could be checked before serving from the  
local store and if the version changed in that case let the page load  
live and cache those copies. The transparency of the cache from the  
URI point of view actually helps with solving this, I think. I don't  
think this problem is fundamental.


-- I think making ResourceStore writable by clients is unnecessary  
complexity. It's much simpler to maintain the model that the  
LocalServer/offline cache is really just a cache of the Web. Then  
there are no issues with enabling/disabling stores, there is no
need to add domain restrictions or requiredCookie (i.e. potential
security holes) so that different apps can't tread on each other's  
resources. (So apps that want to refer to a canonical source for JS  
library files or whatever can still work.) For file uploads, I  
think we should just have a DOM API on form control elements that  
reads the file data into a binary blob of some sort which can then  
be stored in Storage or SQL. 


I don't think the requiredCookie feature is there solely for writeability
reasons, but rather to make the LocalServer cache work even when in  
normal use they might get different versions of a resource from the  
server at different times. For example, suppose you have two  
different gmail accounts with preferences set to different languages.


I am not sure what you mean by the resource store being writeable. It  
lets you tweak the set of items stored, but you can't construct an  
item with headers and data and all by hand. It does overload file  
insertion into the local store, which is perhaps needlessly complex,  
but you do want a way to access a file picked by an HTMLInputElement  
without having to round-trip it to the server. Perhaps that feature  
would be better served by API on HTMLInputElement instead.


I think we're still willing to alter our API, but we want to stick
with the simple conceptual model we currently have: a single read-only
offline cache that requires minimal management. Perhaps we could figure
out how to get versioning and consistency without using JARs. E.g., we
might be able to add an API that reads a Gears-style manifest and does
an atomic update of the offline cache from it.


Do you have docs or a spec for your proposed API? We're considering  
working on offline web app support soon in WebKit and we'd like to  
get in sync with other efforts before we start implementing.


Regards,
Maciej