Re: [whatwg] data URLs and XMLHttpRequest

2012-09-06 Thread Anne van Kesteren
On Thu, Sep 6, 2012 at 8:47 PM, Ian Hickson  wrote:
> I ended up reverting that text, it didn't really work. Is there anything
> else you need for XHR instead?

I guess I will just special case data URLs in the same way. Seems
somewhat ugly, but I do not really have a better alternative either.


-- 
http://annevankesteren.nl/


Re: [whatwg] Conformance checking of missing alternative content for images

2012-09-06 Thread Ian Hickson
On Wed, 22 Aug 2012, Jukka K. Korpela wrote:
> 2012-08-22 3:43, Ian Hickson wrote:
> > 
> > [...] the argument is that WYSIWYG editor implementors will be 
> > pressured into making their tools output conforming content by people 
> > who don't understand the subtleties of this thread, based purely on 
> > validator output.
> 
> To which extent do people pressure WYSIWYG editor implementors into 
> that, who are these people, and is there evidence of the pressure being 
> successful? How often have they made implementors generate alt="" for 
> unknown images, instead of something appropriate like alt="(an image)"?

alt="(an image)", or alt="DSK1298.JPG", or similar such strings, are 
terrible alternative text. They're not uncommon. (I don't have numbers at 
hand. I did look into it a few years ago. It would certainly be good to 
have someone do more recent research.)


> > A user converting 100,000 PDFs to HTML isn't going to be entering 
> > alternative texts for each image.
> 
> Such bulk conversions can be useful for many purposes, but the results 
> are not accessible and do not conform to good HTML authoring rules. 
> There is no reason to prevent validators from saying this, in their own 
> way.

There is a reason; it's been explained in some detail in this thread. 
Briefly again: If they're not silent, the authors of those tools will be 
pressured into making the images have bogus alternative texts that are not 
programmatically detectable, by people comparing such tools using 
validators. Better to have the output be clearly non-conforming, but for 
the validators not to complain about it by default.


> Take the example of converting one non-HTML document with images to HTML 
> format. Should the result of an automatic converter that generates <img> 
> tags without alt attributes be considered as valid as the result of 
> human conversion with alt attributes added or semi-automatic conversion 
> (where a human is prompted for entering alt texts)?

It's not valid to omit alt="" in these cases. We just don't want 
validators to say that it's not valid.


On Sun, 26 Aug 2012, Benjamin Hawkes-Lewis wrote:
> >> 
> >> It would help catch the not uncommon antipattern where the "content" 
> >> of a link or button is provided only by a background image.
> >>
> >>
> >>
> >>
> >>
> >
> > This is should-level non-conforming and has no reason to be 
> > conforming, as far as I can tell ("elements whose content model allows 
> > any flow content or phrasing content SHOULD have at least one child 
> > node that is palpable content and that does not have the hidden 
> > attribute specified").
> >
> > The only reason it's not entirely non-conforming ("must" rather than 
> > "should") is that there are some edge cases where it makes sense, e.g. 
> > when you have an empty paragraph that you're going to fill in later.
> >
> > But maybe we should tighten this up again, e.g. for interactive 
> > content?
> 
> I cannot imagine a good reason to include an unnamed control, so yes.
> 
> Note that this would need to take into account that fields might be 
> labelled by a <label> or a table header cell.

Hm, that's a good point. I don't think there's a good way to 
programmatically detect whether there's a label or not, so I'll just leave 
it as a SHOULD for now.


On Wed, 22 Aug 2012, Steve Faulkner wrote:
> >> 
> >> The spec currently allows img without alt if the title attribute is 
> >> present
> >
> > That's a wild over-statement of the case.
> 
> In terms of conformance checking it is not, as you have said yourself

This is only a limitation of the state of the art. There's no way to 
detect these valid cases vs the invalid ones, and reporting false 
negatives ("you're invalid" when it is valid) is worse than false 
positives ("you're valid" when it is not).

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Wasn't there going to be a strict spec?

2012-09-06 Thread Michael[tm] Smith
Ian Hickson , 2012-09-07 04:25 +:

> On Fri, 10 Aug 2012, Erik Reppen wrote:
> > Why can't we set stricter rules that cause rendering to cease or at least a
> > non-interpreter-halting error to be thrown by browsers when the HTML is
> > broken from a nesting/XML-strict-tag-closing perspective if we want?
>
> Browsers are certainly allowed to report syntax errors in their consoles. 
> Indeed I would encourage it.

Firefox's "View source" does it. It highlights syntax errors in red (that
is, things the HTML spec defines as syntax errors for text/html documents),
and if you hover over them, shows the text of the error message.

  --Mike

P.S. to Erik, if by "XML-strict-tag-closing" you mean having a browser
report XML well-formedness errors when it's parsing a document served as
text/html, I don't think you're going to get that.  Lack of
"XML-strict-tag-closing" in an HTML document doesn't make it broken. If you
want to catch XML well-formedness errors, I guess you'd need to check by
running stuff through an XML parser separately. Also, I don't know what you
mean by "broken from a nesting perspective"...

-- 
Michael[tm] Smith http://people.w3.org/mike


Re: [whatwg] Security restriction allows content thievery

2012-09-06 Thread Ian Hickson
On Fri, 7 Sep 2012, Fred Andrews wrote:
> 
> I think the aim is to have the URL of the page that includes these data: 
> URLs sent to the tracking server?

Ah, I see. So say you have a page A, which itself contains a data: URL, 
and you load that data: URL as page B, and in B there is a link to another 
resource C, the argument here is that in the network request for C, the 
referrer information should be of A, rather than B?

That's an interesting idea... Any browser vendors want to chip in on this?

Unless there is browser-vendor interest in implementing this, I don't 
intend to add it to the spec, since it seems a little esoteric and could 
leak referrers in cases where authors had previously assumed they'd be 
safe (e.g. if a Webmail app is opening e-mails in iframes using data: URLs 
to prevent the e-mail's images from including the user's webmail client's 
URL in the referrer information, or something).



Re: [whatwg] input type=barcode?

2012-09-06 Thread Ian Hickson
On Mon, 27 Aug 2012, Tab Atkins Jr. wrote:
> On Mon, Aug 27, 2012 at 10:56 AM, Ian Hickson  wrote:
> > On Wed, 3 Aug 2011, Tab Atkins Jr. wrote:
> >> On Wed, Aug 3, 2011 at 8:50 AM, Randy  wrote:
> >> > On top of that, the vast majority of these readers just translate 
> >> > it back to text. It's just another input "device", as barcodes are 
> >> > fixed (and sometimes standardized) fonts.
> >>
> >> True, so this is perhaps closer to an IME hint, as has been suggested 
> >> for a couple of other input types.
> >
> > Do you mean something like inputmode=barcode? Can you elaborate on how 
> > that would work? It's an intriguing idea, but I'm not sure I follow 
> > quite how to specify it.
> 
> Yes, something like that.  In terms of the table in the spec:
> 
> Keyword: barcode
> State: Barcode
> Fallback State: Default
> Description: Text input in the user's locale, with keys to activate
> the system's built-in barcode reader to retrieve a value instead.

I think that makes sense.


On Thu, 30 Aug 2012, Jonas Sicking wrote:
> 
> I think that, in theory, we could rely on UAs to enable barcode entry 
> anywhere, which would definitely provide the maximum capabilities for 
> the user. In practice, it seems hard to create UI which enables that 
> while at the same time not annoyingly shoving a barcode button in your 
> face which you generally are not interested in using.

Agreed. I think a UA that wanted to support this would want to know which 
attributes to provide the feature for.

In this respect it's similar to the WebKit-proprietary x-webkit-speech 
attribute on <input>. In fact, this suggests that if other browsers are 
interested in supporting speech input, maybe we should standardise it as 
an inputmode value, e.g. inputmode="speech". Possibly the inputmode="" 
attribute in that case could be switched to a list of tokens, so you could 
in fact do inputmode="latin-prose speech" or inputmode="numeric barcode" 
in order to provide the user agent with more flexibility in the UI.
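
As a rough illustration of the token-list idea (a sketch only: 
pickInputModes and KNOWN_MODES are hypothetical names, and "speech" and 
"barcode" are proposed values here, not defined inputmode keywords), a 
user agent could split the attribute on whitespace and keep the tokens it 
both recognises and supports:

```javascript
// Hypothetical sketch: treat inputmode="" as a space-separated token list.
// KNOWN_MODES and pickInputModes are illustrative names, not spec text.
const KNOWN_MODES = new Set(["latin-prose", "numeric", "barcode", "speech"]);

function pickInputModes(attributeValue, supportedByUA) {
  // Split on whitespace, lowercase, and drop empty tokens.
  const tokens = attributeValue
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean);
  // Keep only tokens that are known and that this UA can actually offer.
  return tokens.filter((t) => KNOWN_MODES.has(t) && supportedByUA.has(t));
}

// A UA with a numeric keypad but no barcode reader ignores "barcode".
console.log(pickInputModes("numeric barcode", new Set(["numeric"])));
```

Unknown or unsupported tokens are simply dropped, so an author could list 
several hints without breaking user agents that implement only some of them.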


> That said, I'm not sure that barcode entry is a commonly enough used 
> feature that we need to have support for it in HTML. Is anyone actually 
> interested in implementing barcode support at this time, whether we add 
> an inputmode or not?

I agree. Until there is interest from a browser vendor ready to implement, 
or Web pages that are doing it themselves via getUserMedia(), we should 
probably not bother to add these features.



Re: [whatwg] Wasn't there going to be a strict spec?

2012-09-06 Thread Ian Hickson
On Fri, 10 Aug 2012, Erik Reppen wrote:
>
> My understanding of the general philosophy of HTML5 on the matter of 
> malformed HTML is that it's better to define specific rules concerning 
> breakage rather than overly strict rules about how to do it right in the 
> first place

This is incorrect. The philosophy is to have strict rules about how to 
write content -- e.g. in the form of the strict content model 
descriptions, the "Writing HTML documents" syntax section, the obsoletion 
of many legacy parts of the language (like ), and other authoring 
conformance criteria -- and then to have equally strict rules for browsers 
and other user agents that define exactly what should happen when the 
first set of rules is ignored and broken (usually to ignore the broken 
content and not try to fix the problem, but sometimes, usually for legacy 
reasons, to make an attempt at "do what I mean").


> Modern browsers are so good at hiding breakage in rendering now that I 
> sometimes run into things that are just nuking the DOM-node structure on 
> the JS-side of things while everything looks hunky-dory in rendering 
> and no errors are being thrown.

Use a validator. :-) That should help catch syntax errors and content 
model errors, at least.


> It's like the HTML equivalent of wrapping every function in an empty 
> try/catch statement. For the last year or so I've started using IE8 as 
> my HTML canary when I run into weird problems and I'm not the only dev 
> I've heard of doing this. But what happens when we're no longer 
> supporting IE8 and using tags that it doesn't recognize?

I don't really understand how IE8 is relevant here. Can you elaborate? How 
does it help?


> Why can't we set stricter rules that cause rendering to cease or at least a
> non-interpreter-halting error to be thrown by browsers when the HTML is
> broken from a nesting/XML-strict-tag-closing perspective if we want?

Browsers are certainly allowed to report syntax errors in their consoles. 
Indeed I would encourage it.


> And if we were able to set such rules, wouldn't it be less work to parse?

You have to catch the error either way, whether it's to then abort, or to 
then ignore it. It's the same amount of work. For some errors, e.g. 
out-of-range errors or content model errors, it can actually be 
significantly _more_ work to detect the error than to ignore it.


> How difficult would it be to add some sort of opt-in strict mode for 
> HTML5 that didn't require juggling of doctypes (since that seems to be 
> what the vendors want)?

It's not at all difficult. The spec allows it today. The question is 
really how hard would it be to convince browsers to implement it. :-)


On Fri, 10 Aug 2012, Erik Reppen wrote:
> 
> I think there's a legit need for a version or some kind of mode for 
> HTML5 that assumes you're a pro and breaks visibly or throws an error 
> when you've done something wrong.

If you're a pro, use a validator.



Re: [whatwg] iframe sandbox and indexedDB

2012-09-06 Thread Ian Hickson
On Mon, 6 Aug 2012, Ian Melven wrote:
> Adom wrote:
> > Yes.  I think this is actually a consequence of having a unique origin 
> > and doesn't need to be stated explicitly in the spec.  (Although we 
> > might want to state it explicitly for the avoidance of doubt.)
> 
> yeah, i can see how this situation behaves being implementation 
> dependent - some implementations might allow storing data for the unique 
> origin, which seems undesirable. So, it might be worth stating the 
> restriction explicitly, as it is for LocalStorage.

The restriction on localStorage is just because of the following 
requirement in the Web Storage chapter:

 2. If the Document's origin is not a scheme/host/port tuple, then throw a 
SecurityError exception and abort these steps.

It has nothing specifically to do with sandboxing. The same would happen, 
e.g., if the page was a data: URL typed by a user (which is another case 
where the page's origin isn't a tuple). Whatever part of IndexedDB says 
what should happen for documents from handtyped data: URLs and any other 
situations with non-tuple origins should automatically cover the case of 
sandboxed content as well.
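
A minimal sketch of that Web Storage step, assuming an origin is modelled 
either as a {scheme, host, port} object or as some opaque marker (the 
object shapes and the names isTupleOrigin and storageKeyFor are 
illustrative, not taken from any spec):

```javascript
// Illustrative only: models "if the origin is not a scheme/host/port
// tuple, throw a SecurityError". Object shapes are assumptions.
class SecurityError extends Error {
  constructor(message) {
    super(message);
    this.name = "SecurityError";
  }
}

function isTupleOrigin(origin) {
  return (
    origin !== null &&
    typeof origin === "object" &&
    typeof origin.scheme === "string" &&
    typeof origin.host === "string" &&
    Number.isInteger(origin.port)
  );
}

function storageKeyFor(origin) {
  // Sandboxed documents and hand-typed data: URLs both end up here with a
  // non-tuple origin, so one check covers both cases.
  if (!isTupleOrigin(origin)) {
    throw new SecurityError("origin is not a scheme/host/port tuple");
  }
  return `${origin.scheme}://${origin.host}:${origin.port}`;
}

console.log(storageKeyFor({ scheme: "http", host: "example.com", port: 80 }));
```

The point of the sketch is that nothing in it mentions sandboxing: any 
storage API keyed on the origin gets the same behaviour from the same check.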



Re: [whatwg] iframe sandbox and indexedDB

2012-09-06 Thread Ian Hickson
On Mon, 6 Aug 2012, Ian Melven wrote:
> 
> the spec at 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/origin-0.html#sandboxed-origin-browsing-context-flag
>  
> says :
> 
> "This flag also prevents script from reading from or writing to the 
> document.cookie IDL attribute, and blocks access to localStorage."
> 
> it seems that indexedDB access should also be blocked when this flag is 
> set (ie when 'allow-same-origin' is NOT specified for the sandbox 
> attribute).

It is, assuming that IndexedDB is based on the origin of the document. The 
spec doesn't mention it because IndexedDB isn't part of the HTML spec. 
Note that the sentence you cited is non-normative (or rather, it contains 
no normative statements), so that whether it mentions IndexedDB or not 
doesn't change anything about what the spec says.



Re: [whatwg] srcset isn't future-friendly to screen size differences

2012-09-06 Thread Fred Andrews


> From: jackalm...@gmail.com
...
> I'm not sure how best to solve this, but John Mellor suggested
> allowing the specification of the image's native dimensions somehow.
> That way, the browser could know that the 1600.jpg image is
> appropriate to serve as an 800px wide high-dpi image, or a 1600px wide
> low-dpi image.

John has a proposal here: 
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-August/036958.html  
As I understand it, John's proposal only declares the image sizes and is not 
intended for making a selection based on density or screen size directly?

I like John's proposal.  It would solve the problem of choosing an appropriate 
resolution image and is simple. What are the issues here?

You would still be able to use media queries to further customize image choice 
for particular screen sizes.  Note that there are other design approaches such 
as fluid design in which you may well avoid media queries.

Media queries do seem like a stopgap measure until better fluid design support 
is developed.  For example, if you have only four orthogonal media properties 
that are significant to an image, each with only two significant ranges, then 
you need 2^4 or 16 images.  With demands to add more and more media queries it 
should be clear that this approach is a dead end.
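
To make the arithmetic concrete, here is a small sketch that enumerates the 
combinations. The property names and range labels are made up for 
illustration; only the count matters.

```javascript
// Hypothetical media properties, each with two significant ranges.
const mediaProperties = [
  { name: "width", ranges: ["narrow", "wide"] },
  { name: "density", ranges: ["1x", "2x"] },
  { name: "orientation", ranges: ["portrait", "landscape"] },
  { name: "color", ranges: ["mono", "color"] },
];

// Build every combination of one range per property (a Cartesian product).
function enumerateVariants(properties) {
  return properties.reduce(
    (combos, p) =>
      combos.flatMap((c) => p.ranges.map((r) => ({ ...c, [p.name]: r }))),
    [{}]
  );
}

console.log(enumerateVariants(mediaProperties).length); // 16
```

Each additional two-range property doubles the number of image variants an 
author would have to produce and keep in sync.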

> It is possible to address this by repeating the same image at a larger
> breakpoint, like:
> 
> 

No.
 
> However, this means you're duplicating data, and have a chance of
> failing to update all of the urls when you update one.  It also
> becomes more hostile as future screens arrive with higher resolutions.
>  For example, if 3x screens showed up, one would have to write the
> following to serve things in the most ideal manner:
> 
> 
> 
> At this point it's just silly, and very error-prone.

It's just not a good solution for this problem.  We still need John's proposal 
to address fluid designs.

There are two distinct proposals, and if there is no agreement then perhaps it 
would just be best to split them to allow development to proceed.

cheers
Fred

  

[whatwg] srcset isn't future-friendly to screen size differences

2012-09-06 Thread Tab Atkins Jr.
This email is an extension of the thread started at

by John Mellor, distilling the core problem he has into a more
easily-understood and digested form.

The srcset attribute, as currently written, is not friendly to large
screen-size differences that don't trigger different "art direction".

Consider the following example:



For a screen that's somewhere near 800px wide, this works just fine.
However, a 1x screen 1600px wide (not too uncommon - I think a 19"
monitor is roughly that width) will get served the 800.jpg image,
which then gets blown up to an unattractive level.  The 1600.jpg file
should be identical to the 800.jpg file, just higher resolution, so
delivering it instead would be ideal, but the current syntax doesn't
allow that, nor does it allow any reasonably reliable way for a
browser to detect that it would be okay to serve the 1600.jpg image
either.

I'm not sure how best to solve this, but John Mellor suggested
allowing the specification of the image's native dimensions somehow.
That way, the browser could know that the 1600.jpg image is
appropriate to serve as an 800px wide high-dpi image, or a 1600px wide
low-dpi image.
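
As a sketch of what knowing the native dimensions would allow (this is 
not the current srcset algorithm; chooseImage and the candidate shape are 
hypothetical), the browser could pick the smallest file whose native width 
covers the number of device pixels it needs to fill:

```javascript
// Hypothetical selection rule, assuming each candidate declares its
// native pixel width. Not the srcset algorithm as currently specified.
function chooseImage(candidates, cssWidth, devicePixelRatio) {
  const neededPixels = cssWidth * devicePixelRatio;
  const sorted = [...candidates].sort((a, b) => a.width - b.width);
  // Smallest image that is large enough; otherwise fall back to the largest.
  return sorted.find((c) => c.width >= neededPixels) || sorted[sorted.length - 1];
}

const candidates = [
  { url: "800.jpg", width: 800 },
  { url: "1600.jpg", width: 1600 },
];

console.log(chooseImage(candidates, 800, 1).url);  // 800.jpg
console.log(chooseImage(candidates, 800, 2).url);  // 1600.jpg
console.log(chooseImage(candidates, 1600, 1).url); // 1600.jpg
```

With a rule of this kind, the same 1600.jpg entry serves both the 800px 
high-dpi case and the 1600px low-dpi case without being listed twice.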

It is possible to address this by repeating the same image at a larger
breakpoint, like:



However, this means you're duplicating data, and have a chance of
failing to update all of the urls when you update one.  It also
becomes more hostile as future screens arrive with higher resolutions.
 For example, if 3x screens showed up, one would have to write the
following to serve things in the most ideal manner:



At this point it's just silly, and very error-prone.

~TJ


Re: [whatwg] Security restriction allows content thievery

2012-09-06 Thread Fred Andrews

> > I'm currently building an analysis system like Google Analytics, which 
> > gets embedded into a website via a small JavaScript snippet. When I 
> > analyzed the data, I came across a very interesting trick because I got 
> > a lot of requests (with the data from location.href) where the entire 
> > website was embedded into a data:text/html URI - except that all ads of 
> > the page were replaced. Fortunately, my tracking code has been left 
> > without modifications.
> 
> Weird.

Perhaps the concern is that content has been copied into a data: URL in 
violation of copyrights and used to obtain Ad revenue. However the content 
could very well be used with permission.  Ads are dynamic and do change on 
otherwise static content pages.  Thus this could well be an honest use of 
technology. It would be interesting to know if the search engines actually look 
at content in data: URLs - if not then the 'copied' content would seem to bring 
little advantage.

Or perhaps the concern is just that it thwarts efforts to track the referer.
 
> > But the scary thing is that this way you can monetize foreign content by 
> > simply embedding it somewhere you can direct traffic to. That's pretty 
> > clever, because the original site owner doesn't notice this abuse due to 
> > the fact that top.location.href isn't readable. Or even worse, he would 
> > never notice it at all when he doesn't sniff the URI with JavaScript, 
> > because image files would have no referrer.
> > 
> > My final approach to convict the abuser is based on the fact, that the 
> > JavaScript was dynamically loaded from my server and that I can write to 
> > location.href. So I added this piece of code:
> > 
> > if (top.location.protocol === 'data:') {
> >     top.location.href = 'http://example.com/trap/';
> > }
> > 
> > But even then the referrer will not be passed to the server. So my 
> > proposal is that the data: URI scheme gets an exception to this security 
> > behavior.
> 
> I don't understand. What referrer are you trying to set? To what?

I think the aim is to have the URL of the page that includes these data: URLs 
sent to the tracking server?

I can't see any technical issues raised here?

Some think trackers are 'scary' and consider user privacy and safety more 
important, and would prefer to not send a referer and to even have such  
Javascript sandboxed so that it can't leak private information.

cheers
Fred





  

Re: [whatwg] Encoding sniffing algorithm

2012-09-06 Thread Ian Hickson
On Fri, 27 Jul 2012, Leif Halvard Silli wrote:
>
> I have just written a document on how implementations prioritize 
> encoding info for HTML documents.[1] (As that document shows, I have not 
> tested Safari 6.) Based on my findings there, I would like to suggest 
> that the spec's encoding sniffing algorithm should be updated to look as 
> follows:
> 
> Revised encoding sniffing algorithm proposal:
> 
> NEW! 0. document is XML format - opt out of the algorithm.
> [This step is already implicit in the spec, but it would
> make sense to explicitly include it to make sure that
> one could e.g. write test cases to see that this step
> is implemented. Currently Safari, Chrome and Opera do 
> not 100% implement this step.]

I don't understand the relevance of the algorithm to XML. Why would anyone 
even look at this algorithm if they were parsing XML?


> NEW! #. Alternative: The BOM signature could go here instead of 
> in step 5. There is a bug to move the BOM here and make
> it override anything else. What speaks against this are:
>   a) that Firefox, IE10 and Opera do not currently have
>  this behavior.
>   b) this revision of the sniffing algorithm, especially
>  the revision in step 6 (required UTF-8 detection),
>  might make the BOM-trumps-everything-else override
>  less necessary
> What speaks for this override:
>   a) Safari, Chrome and legacy IE implement it.
>   b) some legacy content may depend on it

Not sure what this means.
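
For reference, the BOM check under discussion amounts to comparing the 
first two or three bytes of the stream against fixed signatures. The 
sketch below shows only that comparison; where it sits in the sniffing 
order, and whether it overrides other sources, is exactly what is being 
debated, and sniffBom is a hypothetical name.

```javascript
// Returns the encoding implied by a leading byte order mark, or null.
// Only the byte comparison is shown; precedence is out of scope here.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE";
  }
  return null;
}

console.log(sniffBom(Uint8Array.from([0xef, 0xbb, 0xbf, 0x3c]))); // UTF-8
console.log(sniffBom(Uint8Array.from([0x3c, 0x68])));             // null
```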


>  1. user override.
> (PS: The spec should clarify whether user override is
>  cacheable.)

This seems to be entirely a user interface issue.


> NEW! 2. iframe inherits user override from parent browsing context
> [Currently not mentioned in the spec, despite that "all"
>  UAs do have this step for HTML docs.]

That's a UI issue much like whether it's remembered or not. But I've added 
a non-normative note.


> NEW! 6. UTF-8 detection.
> I think we should separate UTF-8 detection from other
> detection in order to make this step obligatory.
> The newness here is only the limitation to UTF-8
> detection plus that it should be obligatory. 
> (Thus: If it is not detected as UTF-8, then
> the parser proceeds to next step in the algorithm.)
> This step would make browsers lean more strongly 
> towards UTF-8.

Without a specific algorithm to detect UTF-8, this is meaningless.
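
To illustrate what a specific algorithm might look like, here is one 
possible heuristic, offered purely as an assumption and not as what any 
browser or the spec does: treat the bytes as UTF-8 only if they contain at 
least one non-ASCII byte and the whole buffer decodes without error. The 
name looksLikeUtf8 is hypothetical.

```javascript
// One possible heuristic (an assumption, not spec text): require at
// least one non-ASCII byte and a strict, error-free UTF-8 decode.
function looksLikeUtf8(bytes) {
  // Pure ASCII is valid in most encodings, so it proves nothing.
  if (!bytes.some((b) => b >= 0x80)) {
    return false;
  }
  try {
    // fatal: true makes decode() throw on any malformed sequence.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    // Note: a buffer cut off mid-sequence also lands here.
    return false;
  }
}

// "café" encoded as UTF-8, then as windows-1252.
console.log(looksLikeUtf8(Uint8Array.from([0x63, 0x61, 0x66, 0xc3, 0xa9]))); // true
console.log(looksLikeUtf8(Uint8Array.from([0x63, 0x61, 0x66, 0xe9])));       // false
```

A real detector would also have to decide how many bytes to examine and 
what to do when the buffer ends in the middle of a multi-byte sequence.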


> NEW! 7. parent browsing context default.
> The current spec does not mention this step at all,
> despite that both Opera, IE, Safari, Chrome, Firefox
> do implement it.

Added. (Some comprehensive testing of this would be good, e.g. comparing 
it to each of the earlier and later steps, considering it with different 
ways of giving the encoding, different locales, etc.)


> Regarding 6. and 7., then the order is important. Chrome
> does for instance perform UTF-8 detection, but it does it
> only /after/ the parent browsing context. Whereas everyone
> else (Opera 12 by default, Firefox for some locales - don't
> know if there are others) let it happen before the 'parent
> browsing context default'.

Can you elaborate on this?


> NEW! 8. info on “the likely encoding”
> The main newness is that this step is placed _after_ 
> the (revised) UTF-8 detection and after the (new) parent
> browsing context default.
> The name 'the likely encoding' is from the current spec
> text. I am a bit uncertain about what it means in the 
> current spec, though. So I move here what I think makes
> sense. The steps under this point should perhaps be
> optional:
> 
> a. detection of other charsets than UTF-8
>(e.g. the optional Cyrillic detection in
>Firefox or legacy Asian encoding detection.
>The actual detection might happen in step 6,
>but it should only be made to count here.)

I don't understand your reasoning on the desired ordering here.


> b. markup label of the sister language
>
>(Opera/Webkit/Chrome currently have this directly
>after the native encoding label step - step 5.)

No idea what this means.


> c. Other things? What does "likely encoding" currently
>refer to, exactly?

The spec gives an example.


Re: [whatwg] whatwg Digest, Vol 102, Issue 19

2012-09-06 Thread Elton Hilsdorf Neves
help


Re: [whatwg] data URLs and XMLHttpRequest

2012-09-06 Thread Ian Hickson
On Wed, 18 Jul 2012, Anne van Kesteren wrote:
>
> I think we should expand http://html5.org/r/7180 with a mention of 
> XMLHttpRequest's open() method. XMLHttpRequest already has a section 
> detailing how data URLs behave in an HTTP context, but they are not yet 
> explicitly allowed. Allowing them in the same way as workers seems like 
> a good idea to me.

I ended up reverting that text, it didn't really work. Is there anything 
else you need for XHR instead?



Re: [whatwg] Security restriction allows content thievery

2012-09-06 Thread Ian Hickson
On Mon, 16 Jul 2012, Robert Eisele wrote:
>
> Browsers are very restrictive when one tries to access the contents of 
> different domains (including the scheme), embedded via framesets. This 
> is normally a good practice, but I'd suggest to weaken this restriction 
> for the data: URI scheme.

It already is. The origin of documents and images using data: URLs is 
essentially the origin of wherever you found the URL.


> I'm currently building an analysis system like Google Analytics, which 
> gets embedded into a website via a small JavaScript snippet. When I 
> analyzed the data, I came across a very interesting trick because I got 
> a lot of requests (with the data from location.href) where the entire 
> website was embedded into a data:text/html URI - except that all ads of 
> the page were replaced. Fortunately, my tracking code has been left 
> without modifications.

Weird.


> But the scary thing is that this way you can monetize foreign content by 
> simply embedding it somewhere you can direct traffic to. That's pretty 
> clever, because the original site owner doesn't notice this abuse due to 
> the fact that top.location.href isn't readable. Or even worse, he would 
> never notice it at all when he doesn't sniff the URI with JavaScript, 
> because image files would have no referrer.
> 
> My final approach to convict the abuser is based on the fact, that the 
> JavaScript was dynamically loaded from my server and that I can write to 
> location.href. So I added this piece of code:
> 
> if (top.location.protocol === 'data:') {
>     top.location.href = 'http://example.com/trap/';
> }
> 
> But even then the referrer will not be passed to the server. So my 
> proposal is that the data: URI scheme gets an exception to this security 
> behavior.

I don't understand. What referrer are you trying to set? To what?



Re: [whatwg] Should editable elements have placeholder attribute?

2012-09-06 Thread Ojan Vafai
On Thu, Sep 6, 2012 at 3:56 AM, Aryeh Gregor  wrote:

> On Sat, Sep 1, 2012 at 4:22 AM, David Young  wrote:
> > This demonstrates some unexpected contenteditable results on
> > Chrome 21.0.1180.82 under Mac OS X.  I cannot seem to return the
> > contenteditable to the empty state again---i.e., to the state where the
> > placeholder shows---using Chrome.  All that I have entered is a space.
> > Backspacing over the space leaves a <br>.  Inserting a space again
> > deletes the <br>.
> >
> > In Firefox 3.6.19 it is necessary to insert two spaces before a <br>
> > appears; the <br> cannot be deleted, not even by inserting a space. :-)
>
> It should never be possible to make a contenteditable element contain
> nothing, once it has something in it, because then it would collapse
> to zero height and you wouldn't be able to click on it.  (IIRC, some
> browsers have non-standard special cases for contenteditable elements
> and make them one line high even if they're empty, but this isn't per
> spec.)
>
> So if nothing else would be left, browsers are supposed to put a <br>
> in, which they remove as soon as anything else would stop it from
> collapsing.  WebKit does this pretty much per spec.  Gecko doesn't
> bother removing the <br>'s it's added, which is messier and not per
> spec.  IE uses &nbsp;'s instead of <br>'s to stop collapsing, last I
> checked, except IIRC, they're magical and can vanish depending on
> whether you look at them with DOM methods vs. innerHTML.
>

While WebKit does put the magic <br> in, that's not what avoids the
collapsing in this case. If you set the innerHTML to "", it still doesn't
collapse. We actually hard-code that editing hosts don't collapse.


> All this is relevant to any contenteditable element, incidentally, not
> just the editing host.  If you have x and the user backspaces
> over the "x", it's supposed to become .
>


[whatwg] Proposal about the table cell relationship combined with the column grouping and the row grouping

2012-09-06 Thread Pierre Dubois
Hi there,

I developed a JavaScript table parser based on my research. The parser is
able to understand complex relationships in a data table. The relationship
association is based on the current algorithm and takes into consideration
how the header cell (th) is structured, positioned and spanned. All of this
is combined with how the column grouping (colgroup) and the row grouping
(thead, tbody) are structured.
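
To make the idea of position- and span-based header association concrete, here is a minimal sketch. It is not the parser linked later in this message. It assumes a simplified table model in which every cell is a plain object with hypothetical fields `tag`, `text`, `colspan` and `rowspan`; the function names `buildGrid` and `headersFor` are made up for illustration. It expands spans into a grid of slots, then collects the th cells above and to the left of a data cell.

```javascript
// Expand rows of cells into a 2D grid of slots, honouring colspan/rowspan.
// Cell shape (assumed for this sketch): { tag, text, colspan?, rowspan? }.
function buildGrid(rows) {
  const grid = [];
  rows.forEach((row, r) => {
    grid[r] = grid[r] || [];
    let c = 0;
    for (const cell of row) {
      // Skip slots already claimed by a rowspan from an earlier row.
      while (grid[r][c] !== undefined) c++;
      const colspan = cell.colspan || 1;
      const rowspan = cell.rowspan || 1;
      for (let dr = 0; dr < rowspan; dr++) {
        grid[r + dr] = grid[r + dr] || [];
        for (let dc = 0; dc < colspan; dc++) grid[r + dr][c + dc] = cell;
      }
      c += colspan;
    }
  });
  return grid;
}

// Collect the text of th cells above (same column) and to the left (same row).
function headersFor(grid, r, c) {
  const found = new Set(); // a Set drops duplicates caused by spanning cells
  for (let i = 0; i < r; i++) {
    const cell = grid[i][c];
    if (cell && cell.tag === "th") found.add(cell);
  }
  for (let j = 0; j < c; j++) {
    const cell = grid[r][j];
    if (cell && cell.tag === "th") found.add(cell);
  }
  return [...found].map((cell) => cell.text);
}

const grid = buildGrid([
  [{ tag: "th", text: "Item", rowspan: 2 }, { tag: "th", text: "Sales", colspan: 2 }],
  [{ tag: "th", text: "Q1" }, { tag: "th", text: "Q2" }],
  [{ tag: "th", text: "North" }, { tag: "td", text: "10" }, { tag: "td", text: "12" }],
]);
console.log(headersFor(grid, 2, 2));
```

The sketch ignores colgroup, thead and tbody entirely, which is exactly the information the proposal wants to take into account; it only shows the baseline that such grouping would refine.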

My research was based on usability and the common uses of tables. My goal
was to find how HTML markup can be used to represent a complex table based
on how a person would understand that table when viewing it in a user
agent and on paper.

My research led me to extend the current definition of the table elements
(table, caption, colgroup, col, thead, tbody, tfoot, tr, th, td) and I
tried to understand the table without discriminating between row and
column. See the Extended definition of HTML 5 table elements.

As described in the extended definition, here is a quick list of the 7
different types a cell (th or td) can have.

Header cell (th) types

* Header
* Layout
* Header group

Data cell (td) types

* Data
* Summary
* Key
* Description
* Layout


This concept uses the row grouping (thead and multiple tbody) and the
column grouping (colgroup) to define data summaries. This can be used to
reduce the table size when it needs to be displayed on a smaller screen,
like a mobile device.

I would also like to propose adding to the specification a method by which
a user agent can reduce the size of a data table (its visual aspect)
instead of just letting it overflow off screen.

Based on the table usability concept, I developed the JavaScript table
parser and drafted 12 techniques to help web editors understand and use
this concept.

   1. Defining a Key Cell
   2. Defining a Data Row Group
   3. Summaries a Data Row Group
   4. Structuring the Header Row
   5. Describing a Row Header Cell
   6. Describing a Row Group Header Cell
   7. Defining Column Group Header
   8. Structuring the Header Column Cell
   9. Defining a Data Column Group
   10. Summaries a Data Column Group
   11. Describing a Column Header Cell
   12. Defining a Layout Cell

*(FYI, the intention behind those techniques is a future submission to the
WCAG 2.0 Techniques)*

As part of the Web Experience Toolkit (WET) project, I enhanced a zebra
widget to support those complex tables as well.

Here are some examples:

* Column highlight table
* Simple table
* Simple grouping table
* Invoice table
* Row level with summary table


I documented 3 case studies on how a current table can be updated to create
a more usable table.

* Case Studies #1
* Nutrition Facts table
* Ottawa Senators vs. Buffalo Sabres - Game ID # 270519002


Here is the latest source code of the table parser :
https://github.com/wet-boew/wet-boew/blob/master/src/js/workers/parser.table.js

The following table validator uses the table parser, shows structural
table errors, and provides a revised version of the table with the
id/headers/aria-describedby attributes set automatically:
http://wet-boew.github.com/wet-boew/docs/tableparser/validator-htmltable.html

I also submitted two bugs related to this topic to the W3C public bug
database; they can be seen below.

* Reducing data

Re: [whatwg] Why do HTML*Collection's nameItem need to return 5 different objects?

2012-09-06 Thread Ojan Vafai
On Wed, Sep 5, 2012 at 1:47 PM, Ian Hickson  wrote:

> For HTMLOptionsCollection, the situation is more murky.
>
>http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1739
>
> From what I can tell, IE doesn't do direct named access, you have to do it
> via item() or namedItem(). The spec didn't support item() access for
> names, though all the browsers did. I've filed a bug on DOM Core for that.
> Using namedItem(), you see that IE returns a live HTMLCollection, the spec
> returns a live NodeList, WebKit returns a static NodeList, and Opera and
> Firefox return just the first option. (There's a note in the spec asking
> if we should switch to HTMLCollection rather than NodeList.)


I haven't followed the details closely enough to know which APIs should be
returning which types of lists/collections. As a general point though,
anywhere we can avoid live NodeLists/Collections is a big improvement. They
impose a significant implementation cost both in terms of complexity and in
terms of performance impact.
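
As a rough illustration of why live lists cost more than static ones, here is a toy model. It does not use the DOM: the tree is made of plain `{ name, children }` objects, and `staticList` and `liveList` are hypothetical helpers invented for this sketch. The live version naively re-walks the tree on every access; real engines instead cache results and invalidate them on mutation, which is where much of the implementation complexity comes from.

```javascript
// Toy document tree; every node is { name, children }.
function collect(node, name, out = []) {
  for (const child of node.children) {
    if (child.name === name) out.push(child);
    collect(child, name, out);
  }
  return out;
}

// Static list: one walk, then a plain array snapshot that never changes.
function staticList(root, name) {
  return collect(root, name);
}

// Naive live list: every length/item access re-walks the whole subtree.
function liveList(root, name) {
  return {
    get length() {
      return collect(root, name).length;
    },
    item(index) {
      return collect(root, name)[index] || null;
    },
  };
}

const select = {
  name: "select",
  children: [
    { name: "option", children: [] },
    { name: "option", children: [] },
  ],
};
const snapshot = staticList(select, "option");
const live = liveList(select, "option");

// Mutate the tree after both lists were created.
select.children.push({ name: "option", children: [] });
console.log(snapshot.length, live.length); // snapshot stays at 2, live reports 3
```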


Re: [whatwg] Should editable elements have placeholder attribute?

2012-09-06 Thread Tab Atkins Jr.
On Thu, Sep 6, 2012 at 3:56 AM, Aryeh Gregor  wrote:
> It should never be possible to make a contenteditable element contain
> nothing, once it has something in it, because then it would collapse
> to zero height and you wouldn't be able to click on it.  (IIRC, some
> browsers have non-standard special cases for contenteditable elements
> and make them one line high even if they're empty, but this isn't per
> spec.)

Note that this shouldn't be hard to do without magic.  Just something
like this in the UA style sheet:

[contenteditable]:empty { min-height: 1em; }

~TJ


Re: [whatwg] Features for responsive Web design

2012-09-06 Thread Simon Pieters
On Wed, 05 Sep 2012 19:45:41 +0200, Mathew Marquis   
wrote:


I can say for my own part: manipulating strings is far more difficult  
than manipulating the value of individual attributes. It’s hard to  
imagine a situation where I’d prefer to muck through a space/comma  
separated string rather than a set of independent elements and  
attributes. Unless the plan is to include an API similar to classList,  
though it would then be occupied by a set of strings describing  
disparate information.


The implementation complexity for multiple elements is much greater  
compared to an attribute (or even several attributes, so long as it's just  
one element) plus an API. See  
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-May/035784.html  
and search for "it doesn't involve multiple elements." in  
http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Aug/0070.html  
for why.


Given `srcset="img2.jpg 2x 300w, img3.jpg 600w 2x"`, I can only envision  
a classList-style API returning something like one of the following:


1) [ "img2.jpg", "2x", "300w", "img3.jpg", "600w", "2x" ]
This obviously isn’t ideal where authors will have no idea what  
information is being manipulated without keeping constant tabs on the  
current index as compared to the string in the markup. Even if the order  
of these separate concerns were normalized, the inclusion or omission of  
any individual aspect of a rule would mean a flurry of `console.log`s in  
order to figure out which index represented which concern — or careful  
counting of spaces in one’s markup, which certainly seems error-prone to  
me. I know I would certainly make mistakes, there.


2) [ "img2.jpg 2x 300w", "img3.jpg 600w 2x" ]
We’re still left parsing space-separated strings for relevant  
information, albeit smaller ones.


3) [ { src:"img2.jpg", x:2, w:300 }, { src:"img3.jpg", x:2, w:600 } ]

Except as host objects so that setting the properties actually updates the  
attribute. (src="" can also be exposed in the same API.)


I don’t feel there’s much of a case to be made in favor of writing  
regular expressions to parse and manipulate strings, rather than  
manipulating elements and attributes — though, as always, I’m happy to  
reach out to the author community and ask. If I’m completely off-base  
here — and I may well be — I’d certainly be interested in reading more  
about the plans for an API.


(3) above doesn't need regexps.
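
To illustrate that shape (3) can be produced without regular expressions, here is a small sketch. `parseSrcsetSketch` is a hypothetical name, not a proposed API, and the function is not the srcset parsing algorithm from any spec: it assumes URLs contain no commas or spaces and only understands the `x` and `w` descriptors from the example above. It returns plain objects, not the host objects mentioned above, so changing a property would not update the attribute.

```javascript
// Split a srcset-like string into { src, x, w } objects using only
// split(), trim() and endsWith(); no regular expressions involved.
function parseSrcsetSketch(srcset) {
  return srcset
    .split(",")
    .map((candidate) => candidate.trim())
    .filter((candidate) => candidate.length > 0)
    .map((candidate) => {
      // First token is the URL, the remaining tokens are descriptors.
      const [src, ...descriptors] = candidate.split(" ").filter(Boolean);
      const entry = { src };
      for (const descriptor of descriptors) {
        const value = Number(descriptor.slice(0, -1));
        if (descriptor.endsWith("x")) entry.x = value;
        else if (descriptor.endsWith("w")) entry.w = value;
      }
      return entry;
    });
}

const candidates = parseSrcsetSketch("img2.jpg 2x 300w, img3.jpg 600w 2x");
console.log(candidates);
```

With this shape, a script can read `candidates[1].w` directly instead of tracking indices into a flat token list.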

--
Simon Pieters
Opera Software


Re: [whatwg] content editing (was Re: Request for new DOM property textarea.selectionText)

2012-09-06 Thread Aryeh Gregor
On Wed, Sep 5, 2012 at 10:37 PM, David Young  wrote:
> I have to say that I'm uneasy with the way that this API wavers between
> answering interaction-design questions and telling what ought to happen
> to the DOM under, say, an execCommand('insertText').  Just for example,
> lots of words are spent on just what to do when the user inserts two or
> more consecutive whitespace characters where the white-space property
> is 'normal' instead of 'pre-wrap'.  That seems like a question to leave
> to the interaction designer.  Different word processors through the
> years have treated consecutive spaces differently, especially in tricky
> contexts like the right margin.

See the todo here, at the top:

http://dvcs.w3.org/hg/editing/raw-file/tip/editing.html#additional-requirements

Conceivably, some things could be left unstandardized, with each
implementer choosing how to do it based on platform conventions, etc.
However, based on the unpleasant experience that we've had in the past
when editors leave browser behavior undefined, I chose to err on the
side of precisely specifying as much as possible.  If I get
implementers coming to me and saying that one specific feature should
be allowed to vary because they want to implement it differently, then
I'll add specific exceptions at that point, ideally as narrow as
possible.

> I say that it should be left to the interaction designer because when an
> intern and I explored the idea of embedding a word-processor directly
> into a web page using JavaScript/DOM, I remember discovering no fewer
> than three different right-margin behaviors in a survey of Apple's
> TextEdit application, MS Word, the Canon Cat (an "information appliance"
> from 1987).  Then I invented a fourth behavior.  There was not an
> obstacle to implementing each in the DOM.  I'm sure that each behavior
> must have its fans and its detractors, but when I demonstrated the
> differences in a staff meeting, the behavior of MS Word so defied the
> expectations of one MS Word-using engineer that he protested that it
> *could not be*.

This suggests that perhaps the behavior of some of those was just a
bug.  Anyway, what behavior would you suggest as a possible
alternative?  Remember that our hands are tied somewhat here -- we're
restricted to things that can be expressed as one-off DOM mutations.
execCommand() can't persist state other than in the DOM itself.

Also, keep in mind that for web stuff, interop is important in its own
right.  TextEdit and Word are different programs and are meant to have
different functionality.  But the *same* website shouldn't vary in
behavior just because the user uses a different browser, in general.
We want browsers to be as interchangeable as possible, so users can
easily switch between them.  The authors of Word do not highly
prioritize interchangeability with competitors, to put it mildly.  :)

> So, anyway, I question the wisdom of standardizing such fine points of
> the UA behavior as what two taps of the spacebar will do: I believe that
> reasonable people can disagree, and setting a standard seems premature.

In the happy event that we have no fewer than two implementers who
look at the spec and want to implement it to the last detail, I will
be delighted to reconsider this point.  For the time being, no one is
seriously implementing it at all, so I think it's premature to make
changes based on what implementers might possibly think when they do
get around to implementing it.  :)

> There do seem to be a couple of areas where web standards seem
> to be lacking if you aspire to write a JavaScript/DOM word
> processor.  One area is keyboard input: we had to use a table of
> keycode->letter/function correspondences, (at least) one per browser, to
> interpret keypresses consistently.  Another area is locating the precise
> character position where a mouse click occurred: we found it doable by
> binary search, but it was kind of a pain.  Locating and decorating the
> "soft breaks" on a page was another pain point.

The editing spec doesn't intend to give you tools to write your own
word processor using DOM APIs.  The intent is to write a spec for a
preexisting poorly-designed API that was made up by Microsoft in the
1990s, and subsequently copied inaccurately by other browsers, which
in turn all added their own unspecified extensions and quirks.

I agree that if you were actually trying to write a good editor from
scratch, contenteditable is not what you want at all.  And in fact,
most real-world editors use contenteditable as little as possible, and
execCommand() not at all.  But we still have to spec it.  Browsers
have to support the API for compatibility with existing content,
regardless of how terrible it is as an API.
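
For readers wondering what the binary search for a click position quoted above might look like, here is a minimal sketch for a single line of text. The function `offsetAtX` and its `measureX` callback are hypothetical: `measureX(offset)` is assumed to return the x coordinate of the caret placed before character `offset`, and to be non-decreasing. In a browser the callback might be backed by a collapsed Range and its client rects, but that wiring is not shown here.

```javascript
// Find the caret offset (0..length) closest to clickX on one line of text.
function offsetAtX(length, measureX, clickX) {
  let lo = 0;
  let hi = length;
  // Binary search for the largest offset whose caret is at or left of clickX.
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (measureX(mid) <= clickX) lo = mid;
    else hi = mid - 1;
  }
  // Snap to the next caret position when the click is nearer to it.
  if (lo < length) {
    const left = measureX(lo);
    const right = measureX(lo + 1);
    if (clickX - left > right - clickX) return lo + 1;
  }
  return lo;
}

// Usage with a fake monospace font where every character is 10px wide.
const monospace = (offset) => offset * 10;
console.log(offsetAtX(5, monospace, 23)); // nearer to the caret at 20px
console.log(offsetAtX(5, monospace, 27)); // nearer to the caret at 30px
```

Each lookup costs O(log n) measurements, which is why the approach is workable but, as noted, still a pain compared with a direct API.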


Re: [whatwg] Should editable elements have placeholder attribute?

2012-09-06 Thread Aryeh Gregor
On Sat, Sep 1, 2012 at 4:22 AM, David Young  wrote:
> This demonstrates some unexpected contenteditable results on
> Chrome 21.0.1180.82 under Mac OS X.  I cannot seem to return the
> contenteditable to the empty state again---i.e., to the state where the
> placeholder shows---using Chrome.  All that I have entered is a space.
> Backspacing over the space leaves a <br>.  Inserting a space again
> deletes the <br>.
>
> In Firefox 3.6.19 it is necessary to insert two spaces before a <br>
> appears; the <br> cannot be deleted, not even by inserting a space. :-)

It should never be possible to make a contenteditable element contain
nothing, once it has something in it, because then it would collapse
to zero height and you wouldn't be able to click on it.  (IIRC, some
browsers have non-standard special cases for contenteditable elements
and make them one line high even if they're empty, but this isn't per
spec.)

So if nothing else would be left, browsers are supposed to put a <br>
in, which they remove as soon as anything else would stop it from
collapsing.  WebKit does this pretty much per spec.  Gecko doesn't
bother removing the <br>'s it's added, which is messier and not per
spec.  IE uses &nbsp;'s instead of <br>'s to stop collapsing, last I
checked, except IIRC, they're magical and can vanish depending on
whether you look at them with DOM methods vs. innerHTML.

All this is relevant to any contenteditable element, incidentally, not
just the editing host.  If you have x and the user backspaces
over the "x", it's supposed to become .


Re: [whatwg] Why do HTML*Collection's nameItem need to return 5 different objects?

2012-09-06 Thread Simon Pieters

On Wed, 05 Sep 2012 22:47:07 +0200, Ian Hickson  wrote:


   http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1736

WebKit returns undefined, whereas IE, Gecko, and Opera all return an
HTMLCollection. (IE returns an HTMLCollection with a tags() method, Gecko
and Opera do not. The spec requires an HTMLAllCollection, which is the
kind of collection that has a tags() method in the spec; in IE, all
collections have a tags() method, and document.all is actually a regular
HTMLCollection. We could change the spec here, e.g. to put tags() on all
collections or to just forget about tags() on the subcollection here.)


I thought we had agreed to drop tags() everywhere except for the  
document.all collection. I guess the collection returned by  
document.all.foo wasn't discussed back then, though. Opera and Firefox  
don't support tags() there, which suggests it's not needed for compat, so  
maybe the spec should side with Opera/Firefox and return HTMLCollection  
instead of HTMLAllCollection.


--
Simon Pieters
Opera Software


Re: [whatwg] content editing (was Re: Request for new DOM property textarea.selectionText)

2012-09-06 Thread Simon Pieters

On Wed, 05 Sep 2012 21:37:58 +0200, David Young  wrote:


On Mon, Apr 30, 2012 at 08:55:04AM +0300, Aryeh Gregor wrote:

On Sun, Apr 29, 2012 at 11:39 PM, David Young  wrote:
> I'm curious what advantages document.execCommand() has over the
> customary DOM API for adding/deleting/moving nodes?

execCommand() does vastly more complicated things than the DOM APIs.
See the spec for details:

http://dvcs.w3.org/hg/editing/raw-file/tip/editing.html


First off, I have only skimmed this and, you could say, tested it
against my knowledge and experience of trying to create a word processor
using JavaScript.

I have to say that I'm uneasy with the way that this API wavers between
answering interaction-design questions and telling what ought to happen
to the DOM under, say, an execCommand('insertText').  Just for example,
lots of words are spent on just what to do when the user inserts two or
more consecutive whitespace characters where the white-space property
is 'normal' instead of 'pre-wrap'.  That seems like a question to leave
to the interaction designer.  Different word processors through the
years have treated consecutive spaces differently, especially in tricky
contexts like the right margin.

I say that it should be left to the interaction designer because when an
intern and I explored the idea of embedding a word-processor directly
into a web page using JavaScript/DOM, I remember discovering no fewer
than three different right-margin behaviors in a survey of Apple's
TextEdit application, MS Word, the Canon Cat (an "information appliance"
from 1987).  Then I invented a fourth behavior.  There was not an
obstacle to implementing each in the DOM.  I'm sure that each behavior
must have its fans and its detractors, but when I demonstrated the
differences in a staff meeting, the behavior of MS Word so defied the
expectations of one MS Word-using engineer that he protested that it
*could not be*.

So, anyway, I question the wisdom of standardizing such fine points of
the UA behavior as what two taps of the spacebar will do: I believe that
reasonable people can disagree, and setting a standard seems premature.


I disagree. The browsers have to do something when the user hits space  
multiple times. I would prefer if all browsers ended up with the same result.


If you think the spec's behavior when hitting space multiple times is  
sub-optimal, we can change it to something better.



There do seem to be a couple of areas where web standards seem
to be lacking if you aspire to write a JavaScript/DOM word
processor.  One area is keyboard input: we had to use a table of
keycode->letter/function correspondences, (at least) one per browser, to
interpret keypresses consistently.


I believe that's something that DOM3Events or its successor should address.
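
The per-browser table described above could look roughly like the following sketch. The names `COMMON_KEYS`, `ENGINE_OVERRIDES` and `interpretKey` are hypothetical, and the engine-specific codes are illustrative values from memory that should be verified against the browsers actually targeted.

```javascript
// Key codes shared by the engines this sketch cares about.
const COMMON_KEYS = { 8: "Backspace", 13: "Enter", 32: " ", 186: ";", 187: "=" };

// Per-engine overrides (assumed values; check them before relying on them).
const ENGINE_OVERRIDES = {
  gecko: { 59: ";", 61: "=" },
};

// Translate a raw keyCode into a letter or function name, or null if unknown.
function interpretKey(keyCode, engine) {
  const overrides = ENGINE_OVERRIDES[engine] || {};
  if (Object.prototype.hasOwnProperty.call(overrides, keyCode)) {
    return overrides[keyCode];
  }
  if (Object.prototype.hasOwnProperty.call(COMMON_KEYS, keyCode)) {
    return COMMON_KEYS[keyCode];
  }
  return null;
}

console.log(interpretKey(59, "gecko"), interpretKey(186, "webkit"));
```

Keeping the shared codes in one table and only the differences per engine keeps the tables small, but every new browser version still has to be re-checked by hand, which is the pain point being described.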


Another area is locating the precise
character position where a mouse click occurred: we found it doable by
binary search, but it was kind of a pain.


Mouse clicks are currently totally undefined in the Web platform. Awaiting  
a spec editor for it.


http://wiki.whatwg.org/wiki/Specs_todo#APIs


Locating and decorating the
"soft breaks" on a page was another pain point.


Interesting. I guess this is only needed for editable content? Is an  
arbitrary Unicode character enough for decoration, or do you want to be  
able to use an image? Do you want to be able to style it? If we add this I  
think we should do it in a way such that the decoration cannot affect the  
line breaking algorithm.


cheers
--
Simon Pieters
Opera Software


Re: [whatwg] Quirks mode handling of rowspan="0"

2012-09-06 Thread Simon Pieters

On Mon, 03 Sep 2012 15:29:13 +0200, Boris Zbarsky  wrote:


On 9/3/12 3:25 AM, Simon Pieters wrote:

Is there a compat problem with supporting it in quirks mode?


I did cover this in my post.  The last time we tried it, there was, but  
that was a while ago.


Oh, sorry.

Grepping for "rowspan=\"0\"" in  
http://www.paciellogroup.com/blog/2012/04/html5-accessibility-chops-data-for-the-masses/  
and http://dotnetdotcom.org/ I find the following pages being broken in  
Opera but working in Firefox/Chrome:


http://www.persianv.com/
http://www.quicherchetrouve.be/index.php?r1=2&an=3&anb=239&page=3
http://www.kvaak.fi/keskustelu/index.php?topic=1914.0
http://www.timekillerarcade.com/game.php?id=1941
http://www.elginisd.net/cgi-bin/calcium37.pl?CalendarName=David_Schmitt&Op=ShowIt&Amount=Week&NavType=Both&Type=List&Date=2008/8/1

I stopped after going through about half of the matches. All of these are  
quirks mode. I didn't find any that work in Opera/Firefox but are broken  
in Chrome (i.e. rely on it being supported in standards mode). I didn't  
find any that work in Opera but are broken in Firefox/Chrome (i.e. expect  
it to work in quirks mode).


This is enough to convince me that the feature should not be supported in  
quirks mode.
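
A survey like the one above could be approximated with a sketch along these lines. `classifyPage` is a hypothetical helper, and its quirks-mode test is a deliberately crude assumption: it treats any page that does not start with a doctype as quirks mode, whereas real mode selection also depends on the doctype's identifiers. Like the grep, it only matches the double-quoted form of the attribute.

```javascript
// Classify one page's HTML source for the rowspan="0" survey.
function classifyPage(html) {
  const lower = html.toLowerCase();
  return {
    // Same literal match as the grep: double-quoted attribute value only.
    usesRowspanZero: lower.includes('rowspan="0"'),
    // Crude assumption: no leading doctype means quirks mode.
    likelyQuirks: !lower.trimStart().startsWith("<!doctype"),
  };
}

const samples = [
  '<html><table><tr><td rowspan="0">a</td></tr></table></html>',
  '<!DOCTYPE html><table><tr><td rowspan="2">a</td></tr></table>',
];
for (const html of samples) console.log(classifyPage(html));
```

Pages reported as `usesRowspanZero` and `likelyQuirks` would then still need to be opened in each browser by hand to see whether they render as intended.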


--
Simon Pieters
Opera Software