Re: [whatwg] Interaction of explicit and implicit sections (was: Re: Question on (new) header and hgroup)

2009-05-08 Thread Simon Pieters

On Fri, 08 May 2009 00:58:21 +0200, Simon Pieters sim...@opera.com wrote:


Actually I believe it would be:

+--HTML 5
   +--A new era of loveliness
   +--Navigation


This surprised me when I used implicit sections and just wrapped  
articles around news items (which were h3s). I expected the outline  
to be like it was without the article:


+--Site heading
   +--Page heading
      +--News item

...but instead it became (according to your and gsnedders' outliners):

+--Site heading
   +--Page heading
   +--News item
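
For concreteness, the markup was roughly this (a minimal sketch, with
invented headings):

   <body>
    <h1>Site heading</h1>
    <h2>Page heading</h2>
    <article>
     <h3>News item</h3>
     ...
    </article>
   </body>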


Maybe the spec should change here to match people's expectations better?


OTOH, if we do this, then the default style sheet (the x x x h1 stuff) will be 
wrong. I guess the proper solution to that is to introduce :heading-level(n) 
that uses the outline algorithm. (But how to select and style subtitles in 
hgroup though?)

--
Simon Pieters
Opera Software




Re: [whatwg] Question on (new) header and hgroup

2009-05-08 Thread Smylers
jgra...@opera.com writes:

 Quoting Smylers smyl...@stripey.com :

  James Graham writes:
 
   hgroup affects the document structure, header does not.
 
  That explains _how_ they are different (as does the spec), but not
  _why_ it is like that.
 
  More specifically:
 
  * Are there significant cases where header needs _not_ to imply
hgroup?  Consider wrapping an hgroup inside every header;
how many places has that broken the semantics?  I could believe
that in most of the cases where a page header appropriately contains
multiple headings, they are subtitles rather than subsections.

 The semantic that authors seem to want from an element named header
 is "All the top matter of my page before the main content". That could
 include headers, subheaders, navigation, asides and almost anything
 else.

It could.  But most of the above have no effect on the outline
algorithm.  In practice, how often do current <div class="header">
sections contain headers of multiple sections, without those nested
sections being separately wrapped in their own <div>s (or similar,
which could become section or whatever's appropriate in HTML 5)?

 Since the header can contain multiple distinct logical sections of
 the document, each with their own headers, it makes no sense to
 implicitly wrap its contents in hgroup.

You're right.  What I was really thinking of is something closer to:
inside header if any hx elements are encountered before any nested
sectioning elements then treat all the hx elements as being a single
heading.

So header could still contain section-s, with their own headings.
And a header with no hx elements wouldn't create an empty entry in
the outline.

  * Given the newness and nuance of header and hgroup and the
distinction between them, it's likely that some authors will
confuse them.  Given that hgroup doesn't appear to do anything
on the page (it's similar to invisible meta-data), it's likely
that some authors will omit it[*1] when it's needed to convey the
semantics they intend.

 Yes, that is possible. The thinking behind the change (or, at least,
 part of my reason for proposing it) was that it is less harmful if
 authors omit something where it would be useful than that they use it
 incorrectly in such a way that tools which follow the spec would be
 broken from the point of view of end users.

That's a good point.

 In particular the old formulation of header would have caused the
 h2 element to be omitted from the outline in cases like <header>
 <h1>My Blog</h1> <nav> <h2>Navigation</h2> </nav> </header>, which
 would be confusing for users.

Indeed.  What I intended to raise for consideration (and hopefully now
have done) is that header would not merge the above, because nav
starts a new section inside header.  Consider a similar example:

  <header>
    <h1>My Blog</h1> <h2>Ramblings of an internet nobody</h2>
    <nav><h2>Navigation</h2> ... </nav>
  </header>

The spec currently has both the h2-s as subsections.  The alternative
I was thinking of would treat the h1 and first h2 as being a single
heading (of the entire document), but keep the second h2 (as the
heading of the navigation).
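
To make that concrete, the two outlines would be roughly (my sketch):

  Current spec:

  +--My Blog
     +--Ramblings of an internet nobody
     +--Navigation

  Alternative:

  +--My Blog / Ramblings of an internet nobody
     +--Navigation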

 On the other hand, in the current formulation of the spec, the most
 likely error (omitting hgroup) only has the effect that the
 outline hierarchy is slightly wrong, with the subheader appearing as an
 actual header; it does not lead to data loss. This seems like a much
 better failure mode.

That's true.  But if the number of failures can be minimized, it matters
less what the failure mode is.

My concern is that with hgroup being so esoteric, combined with its
effect being largely invisible, it will hardly be used and therefore
possibly not worth adding to HTML 5.

Authors don't have a good track record on accurately adding invisible
metadata.  If we can algorithmically get it right in most cases, while
leaving a way for careful authors to explicitly override it if
necessary, that may be better overall.

  * Are there significant cases where hgroup will be useful outside
of header?
 
   hgroup exists to allow for subtitles and the like.  It's fairly
   common for documents to have these -- where it's likely there's use
   for a header element anyway.
 
   It's much less common for a mere section of a document to warrant a
   multi-part title; is that a case which is worth solving?  If it is,
   would it be problematic to force authors to use header there?

 It seems highly odd to have header perform a dual role where
 sometimes it means section header and sometimes it means group of
 heading/subheading elements. Much more confusing than one element per
 role.

I think the two concepts are sufficiently overlapping that it isn't
really a dual role.  header could mean 'section (or document) header'
-- it would be used when a section's header consists of more than just a
single hx element.  Whether those elements are because of multi-part
titles or search boxes or whatever is a distinction that authors would

Re: [whatwg] MessagePorts in Web Workers: implementation feedback

2009-05-08 Thread Maciej Stachowiak


On May 7, 2009, at 5:40 PM, Drew Wilson wrote:


Agreed that removing this requirement:
User agents must act as if MessagePort objects have a strong  
reference to their entangled MessagePort object.


would make MessagePort implementation much easier, as it would  
remove the need to track reachability across multiple threads. This  
requirement can get tricky especially as both sides can be cloned,  
in-flight to a new owner, etc.


My only concern is that removing this requirement introduces non- 
deterministic behavior - if I have an entangled MessagePort and I  
register an onmessage() handler with it, then drop my reference to  
it, after which someone calls postMessage() on the entangled port,  
there's no way to tell if my onmessage() handler will be invoked ;  
it entirely depends on whether a GC happens first or not. That seems  
bad.


That's a fair concern. I would mention a few counterpoints:

1) Nondeterministic behavior is inevitable with an API designed for  
concurrency. There are surely already possible cases of nondeterminism  
in the MessagePort API. Consider sending a message to two different  
workers and waiting for the reply. The replies may arrive in either  
order; indeed, the workers may receive the messages in either order,  
so if they are in communication with each other you cannot rely on one  
getting the message and performing its action first.


2) The nondeterministic behavior in this case is easily avoided by  
what are in any case good coding practices: (a) don't drop all  
references to a MessagePort you are still using, and (b) call close()  
on the MessagePort when you are done with it and don't want more  
messages.
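
For example (just a sketch; the worker file name is made up):

   <script>
     var worker = new Worker('worker.js');
     var channel = new MessageChannel();
     var port = channel.port1;
     port.onmessage = function (event) { /* handle replies */ };
     worker.postMessage('here is a port', [channel.port2]);
     // (a) keep 'port' referenced for as long as replies are expected;
     // (b) once no more messages are wanted:
     port.close();
   </script>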


3) The alternatives on the table to removing this requirement are  
either removing the ability to use MessagePorts to communicate  with  
Workers, or leaving the spec as-is with its attendant high  
implementation cost.


Given all these factors, I think avoiding nondeterminism in the one  
particular case you describe, when authors can already avoid it in a  
reasonable way, is not worth the order of magnitude increase in  
implementation complexity imposed by the entanglement keepalive  
requirement. I also think accepting this small amount of potential  
nondeterminism is preferable to excluding Workers from using  
MessagePorts.


Thus, on the whole, I think the best option is to remove the keepalive  
requirement.


Regards,
Maciej



-atw

On Thu, May 7, 2009 at 3:28 PM, Maciej Stachowiak m...@apple.com  
wrote:


I agree with Drew's assessment that MessagePorts in combination with  
Workers are extremely complicated to implement correctly, as  
currently specified. In fact, the design seems to push towards  
having lockable shared state, even though one potential advantage of  
the message passing design is to avoid locking and shared state.


Besides removing MessagePorts as a way to communicate with workers,  
another possibility is simplifying the life cycle requirements. For  
example, getting rid of the keepalive rule, whereby both  
MessagePorts remain live so long as either is otherwise live, would  
remove the majority of the complexity. I don't think the slight  
convenience of that rule is worth the extra implementation cost.


On May 7, 2009, at 1:39 PM, Drew Wilson wrote:


Hi all,

I've been hashing through a bunch of the design issues around using  
MessagePorts within Workers with IanH and the Chrome/WebKit teams  
and I wanted to follow up with the list with my progress.


The problems we've encountered are all solveable, but I've been  
surprised at the amount of work involved in implementing worker  
MessagePorts (and the resulting implications that MessagePorts have  
on worker lifecycles/reachability). My concern is that the amount  
of work to implement MessagePorts within Worker context may be so  
high that it will prevent vendors from implementing the  
SharedWorker API. Have other implementers started working on this  
part of the spec yet?


Let me quickly run down some of the implementation issues I've run  
into - some of these may be WebKit/Chrome specific, but other  
browsers may run into some of them as well:


1) MessagePort reachability is challenging in the context of  
separate Worker heaps


In WebKit, each worker has its own heap (in Chrome, they will have  
their own process as well). The spec reads:
User agents must act as if MessagePort objects have a strong  
reference to their entangled MessagePort object.


Thus, a message port can be received, given an event listener, and  
then forgotten, and so long as that event listener could receive a  
message, the channel will be maintained.


Of course, if this was to occur on both sides of the channel, then  
both ports would be garbage collected, since they would not be  
reachable from live code, despite having a strong reference to each  
other.
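
As a sketch of the pattern the spec describes (page side; file name
made up):

   <script>
     var worker = new Worker('worker.js');
     worker.onmessage = function (event) {
       var port = event.ports[0];   // a port the worker sent across
       port.onmessage = function (e) { /* handle messages on the new port */ };
       // 'port' is forgotten here, yet the channel has to stay alive as
       // long as the other side might still post to it -- which is what
       // forces reachability tracking across the separate heaps.
     };
   </script>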


Furthermore, a MessagePort object must not be garbage collected  
while there exists a message in a 

Re: [whatwg] video/audio feedback

2009-05-08 Thread Silvia Pfeiffer
On Fri, May 8, 2009 at 9:43 AM, David Singer sin...@apple.com wrote:
 At 8:45  +1000 8/05/09, Silvia Pfeiffer wrote:

 On Fri, May 8, 2009 at 5:04 AM, David Singer sin...@apple.com wrote:

  At 8:39  +0200 5/05/09, Kristof Zelechovski wrote:

  If the author wants to show only a sample of a resource and not the
 full
  resource, I think she does it on purpose.  It is not clear why it is
 vital
  for the viewer to have an _obvious_ way to view the whole resource
  instead;
  if it were the case, the author would provide for this.
  IMHO,
  Chris

  It depends critically on what you think the semantics of the fragment
 are.
  In HTML (the best analogy I can think of), the web page is not trimmed
 or
  edited in any way -- you are merely directed to one section of it.

 There are critical differences between HTML and video, such that this
 analogy has never worked well.

 could you elaborate?

At the risk of repeating myself ...

HTML is text and therefore whether you download a snippet only or the
full page and then do an offset does not make much of a difference.
Even for a long page.

In contrast, downloading a snippet of video compared to the full video
will make a huge difference, in particular for long-form video.

So, the difference is that in HTML the user agent will always have the
context available within its download buffer, while for video this may
not be the case.

This admittedly technical difference also has an influence on the user
interface.

If you have all the context available in the user agent, it is easy to
just grab a scroll-bar and jump around in the full content manually to
look for things. This is not possible in the video case without many
further download actions, which will each incur a network delay. This
difference opens the door to enable user agents with a choice in
display to either provide the full context, or just the fragment
focus.

Thus, while comparing media fragments to HTML fragments is a simple
way to introduce the concept - and I use it, too, to explain to my
less technical peers - it doesn't really help for detailed
specifications.

Regards,
Silvia.


Re: [whatwg] Interaction of explicit and implicit sections (was: Re: Question on (new) header and hgroup)

2009-05-08 Thread Tab Atkins Jr.
On Fri, May 8, 2009 at 1:46 AM, Simon Pieters sim...@opera.com wrote:
 I guess the proper solution to that is to introduce
 :heading-level(n) that uses the outline algorithm. (But how to select and
 style subtitles in hgroup though?)

If you used a :heading-level() pseudoclass, you'd just do
:heading-level(x) h2 to style the h2 within an element of heading
level x.  (This would only target h2s within hgroups right now,
not h2s by themselves.)
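
For example (a sketch only -- :heading-level() is of course
hypothetical at this point):

   <style>
     :heading-level(1) h2 { font-size: 1em; color: gray; } /* the subtitle */
   </style>
   <hgroup>
     <h1>HTML 5</h1>
     <h2>A new era of loveliness</h2>
   </hgroup>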

~TJ


Re: [whatwg] Suitable video codec

2009-05-08 Thread Michael Dale
yea.. the take-home point is that Theora now has an encoder that puts it 
in the same ballpark as contemporary proprietary codecs. I would not say 
Theora is outdoing H.264. The results of a given PSNR test are 
impressive and important to publicize, but I think my wording when posting 
about that test may have encouraged overstating the quality gains.


The only quality that really mattered in terms of standardization has 
stayed constant: Ogg Theora is /royalty free/ and implementable 
in both proprietary and free-software browsers.


--michael

David Gerard wrote:

H.264 was advocated here for the video element as higher quality
than competing codecs such as Theora could ever manage.

The Thusnelda encoder is outdoing H.264 in current tests:

http://web.mit.edu/xiphmont/Public/theora/demo7.html

This is of course developmental work. I'm sure the advocates of H.264
can also tune its encoders to keep up, and not make Theora the only
reasonable candidate for the video element.


- d.





Re: [whatwg] video/audio feedback

2009-05-08 Thread David Singer

At 23:46  +1000 8/05/09, Silvia Pfeiffer wrote:

On Fri, May 8, 2009 at 9:43 AM, David Singer sin...@apple.com wrote:

 At 8:45  +1000 8/05/09, Silvia Pfeiffer wrote:


 On Fri, May 8, 2009 at 5:04 AM, David Singer sin...@apple.com wrote:


  At 8:39  +0200 5/05/09, Kristof Zelechovski wrote:


  If the author wants to show only a sample of a resource and not the
 full
  resource, I think she does it on purpose.  It is not clear why it is
 vital
  for the viewer to have an _obvious_ way to view the whole resource
  instead;
  if it were the case, the author would provide for this.
  IMHO,
  Chris


  It depends critically on what you think the semantics of the fragment
 are.
  In HTML (the best analogy I can think of), the web page is not trimmed
 or
  edited in any way -- you are merely directed to one section of it.


 There are critical differences between HTML and video, such that this
 analogy has never worked well.


 could you elaborate?


At the risk of repeating myself ...

HTML is text and therefore whether you download a snippet only or the
full page and then do an offset does not make much of a difference.
Even for a long page.


you might try loading, say, the one-page version 
of the HTML5 spec from the WhatWG site...it 
takes quite a while.  Happily Ian also provides a 
multi-page version, but this is not always the case.




In contrast, downloading a snippet of video compared to the full video
will make a huge difference, in particular for long-form video.


there are short and long pages and videos.

But we're talking about a point of principle 
here, which should be informed by the practical, for 
sure, but not dominated by it.


The reason I want clarity is that this has 
ramifications.  For example, if a UA is asked to 
play a video with a fragment indication 
#time=10s-20s, and then a script seeks to 5s, 
does the user see the video at the 5s point of 
the total resource, or 15s?  I think it has to be 
5s.
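
In code, something like this (a sketch; the file name is invented and
the #time syntax is just the one from this thread):

   <video id="v" src="film.ogv#time=10s-20s" controls></video>
   <script>
     var v = document.getElementById('v');
     v.currentTime = 5;  // 5s into the whole resource, or 15s?
   </script>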




So, the difference is that in HTML the user agent will always have the
context available within its download buffer, while for video this may
not be the case.


I'm sorry, I am lost.  We could quite easily 
extend HTTP to allow for anchor-based retrieval 
of HTML (i.e. convert a 'please start at anchor 
X' into a pair of byte-range responses, for the 
global material, and then the document from that 
anchor onwards).




This admittedly technical difference also has an influence on the user
interface.

If you have all the context available in the user agent, it is easy to
just grab a scroll-bar and jump around in the full content manually to
look for things. This is not possible in the video case without many
further download actions, which will each incur a network delay. This
difference opens the door to enable user agents with a choice in
display to either provide the full context, or just the fragment
focus.


But we can optimize for the fragment without disallowing the seeking.


--
David Singer
Multimedia Standards, Apple Inc.


Re: [whatwg] Micro-data/Microformats/RDFa Interoperability Requirement

2009-05-08 Thread Ian Hickson
On Thu, 7 May 2009, Manu Sporny wrote:
 
 That's certainly not what the WHATWG blog stated just 20 days ago for
 rel=license [...]

The WHATWG blog is an open platform on which anyone can post, and content 
is not vetted for correctness. Mark can sometimes make mistakes. Feel free 
to post a correction. :-)


 and the spec doesn't seem to clearly outline the difference in 
 definition either (at least, that's not my reading of the spec):
 
 http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#link-type-license
 http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#link-type-tag

Actually I just looked at the rel-tag faq and found that it disagrees with 
what Tantek had told me, so (assuming the faq is normative or that the 
rel-tag spec does mention this somewhere that I didn't find) the specs do 
match here.

For rel-license, the HTML5 spec defines the value to apply to the content 
and not the page as a whole. This is a recent change to match actual 
practice and I will be posting about this shortly.
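
As a made-up example of that scoping, markup along these lines labels 
the photo (the main content) rather than the page as a whole:

   <h1>My photo</h1>
   <img src="kitten.jpeg" alt="A kitten in a teacup">
   <p><a rel="license"
         href="http://creativecommons.org/licenses/by/3.0/">Creative
         Commons Attribution 3.0</a></p>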


  The RDFa specification is very confusing to me (e.g. I don't 
  understand how the normative processing model is separate from the 
  section RDFa Processing in detail), so I may be misinterpreting 
  things, but as far as I can tell:
  
   <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
     <base href="http://example.com/"/>
     <link about="http://example.net/"
           rel="dc.author"
           href="http://a.example.org/"/>
    ...
  
  ...will result in the following triple:
  
     <http://example.net/> <http://example.com/dc.author> 
     <http://a.example.org/> .
 
 Two corrections:
 
 The first is that an RDFa processor would not generate this triple.

My apologies, I misinterpreted 5.4.4. Use of CURIEs in Specific Attributes 
to mean that rel= was a relative-uri-or-curie attribute. (5.4.4. Use of 
CURIEs in Specific Attributes says it's link-type-or-curie, but 5.4.3. 
General Use of CURIEs in Attributes doesn't list that as a possibility and 
at the end says that rel= is an exception only insofar as it supports 
specific link types as well, which I interpreted differently.)


  For example, it would be somewhat presumptuous of RDFa to prevent any 
  future version of HTML from being able to use the word resource as 
  an attribute name. What if we want to extend the forms features to 
  have an XForms datatype compatibility layer; why should we not be 
  able to use the datatype and typeof attributes?
 
 As long as their legacy nature was preserved, and those uses didn't 
 create ambiguity in RDFa processors and semantic equivalence was 
 ensured, I don't see why they shouldn't be re-used.

Ah, ok. If such attributes are re-used, I suppose it should be 
possible to re-use them in a way that 
doesn't conflict with RDFa (e.g. by triggering the non-curie-non-uri 
behaviour for property= or by having authors who want RDFa compatibility 
use xmlns:http="http:" declarations or some such).

Noted.


  Surely this is what namespaces were intended for.
 
 Uhh, what sort of namespaces are we talking about here? xmlns-style, 
 namespaces?

The idea of XML Namespaces was to allow people to extend vocabularies
with new features without clashing with older features by putting the 
new names in new namespaces. It seems odd that RDFa, a W3C technology for 
an XML vocabulary, didn't use namespaces to do it.


  For example, the way that n:next and next can end up being 
  equivalent in RDFa processors despite being different per HTML rules 
  (assuming an n namespace is appropriately declared).
 
  If they end up being equivalent in RDFa, the RDFa author did so 
  explicitly when declaring the 'n' prefix to the default prefix 
  mapping and we should not second-guess the author's intentions.
  
  My only point is that it is not compatible with HTML4 and HTML5, 
  because they end up with different results in the same situation (one 
  can treat two different values as the same, while the other can treat 
  two different values as different).
 
 It is only not compatible with HTML5 if this community chooses for it to 
 not be compatible with HTML5. Do you agree or disagree that we shouldn't 
 second-guess the author's intentions if they go out of their way to 
 declare a mapping for 'n'?

I don't think that's a relevant question. My point is that it is possible 
in RDFa to put two strings that have different semantics in HTML4 and yet 
have them have the same semantics in RDFa. This means RDFa is not 
compatible with HTML4.


  Another example would be:
  
   <html xmlns="http://www.w3.org/1999/xhtml">
    <head about="">
     <link rel="stylesheet alternate next" href="...">
     ...
  
  ...which in RDFa would cause the following triples to be created:
  
   <> <http://www.w3.org/1999/xhtml/vocab#stylesheet> <...> .
   <> <http://www.w3.org/1999/xhtml/vocab#alternate> <...> .
   <> <http://www.w3.org/1999/xhtml/vocab#next> <...> .
  
  ...but according to 

[whatwg] Helping people seaching for content filtered by license

2009-05-08 Thread Ian Hickson

One of the use cases I collected from the e-mails sent in over the past 
few months was the following:

   USE CASE: Help people searching for content to find content covered by
   licenses that suit their needs.

   SCENARIOS:
 * If a user is looking for recipes of pies to reproduce on his blog, he
   might want to exclude from his results any recipes that are not
   available under a license allowing non-commercial reproduction.
 * Lucy wants to publish her papers online. She includes an abstract of
   each one in a page, but because they are under different copyright
   rules, she needs to clarify what the rules are. A harvester such as
   the Open Access project can actually collect and index some of them
   with no problem, but may not be allowed to index others. Meanwhile, a
   human finds it more useful to see the abstracts on a page than have to
   guess from a bunch of titles whether to look at each abstract.
 * There are mapping organisations and data producers and people who take
   photos, and each may apply different policies. Being able to keep that
   policy information helps people making further mashups avoid
   violating a policy. For example, if GreatMaps.com has a public domain
   policy on their maps, CoolFotos.org has a policy that you can use data
   other than images for non-commercial purposes, and Johan Ichikawa has
   a photo there of my brother's cafe, which he has licensed as must pay
   money, then it would be reasonable for me to copy the map and put it
   in a brochure for the cafe, but not to copy the data and photo from
   CoolFotos. On the other hand, if I am producing a non-commercial guide
   to cafes in Melbourne, I can add the map and the location of the cafe
   photo, but not the photo itself.
 * Tara runs a video sharing web site for people who want licensing
   information to be included with their videos. When Paul wants to blog
   about a video, he can paste a fragment of HTML provided by Tara
   directly into his blog. The video is then available inline in his
   blog, along with any licensing information about the video.
 * Fred's browser can tell him what license a particular video on a site
   he is reading has been released under, and advise him on what the
   associated permissions and restrictions are (can he redistribute this
   work for commercial purposes, can he distribute a modified version of
   this work, how should he assign credit to the original author, what
   jurisdiction the license assumes, whether the license allows the work
   to be embedded into a work that uses content under various other
   licenses, etc).
 * Flickr has images that are CC-licensed, but the pages themselves are
   not.
 * Blogs may wish to reuse CC-licensed images without licensing the whole
   blog as CC, but while still including attribution and license
   information (which may be required by the licenses in question).

   REQUIREMENTS:
 * Content on a page might be covered by a different license than other
   content on the same page.
 * When licensing a subpart of the page, existing implementations must
   not just assume that the license applies to the whole page rather than
   just part of it.
 * License proliferation should be discouraged.
 * License information should be able to survive from one site to another
   as the data is transferred.
 * Expressing copyright licensing terms should be easy for content
   creators, publishers, and redistributors to provide.
 * It should be more convenient for the users (and tools) to find and
   evaluate copyright statements and licenses than it is today.
 * Shouldn't require the consumer to write XSLT or server-side code to
   process the license information.
 * Machine-readable licensing information shouldn't be on a separate page
   than human-readable licensing information.
 * There should not be ambiguous legal implications.
 * Parsing rules should be unambiguous.
 * Should not require changes to HTML5 parsing rules.


The scenarios described above fall into three categories: searching for 
content, publishing content, and obtaining legal advice.

First, I will examine the search scenario:

 * If a user is looking for recipes of pies to reproduce on his blog, he
   might want to exclude from his results any recipes that are not
   available under a license allowing non-commercial reproduction.

This is technically possible today. The rel=license link type allows 
authors to specify the license that applies to the main content on a page 
(in this case recipes); search engines can be programmed with the most 
common licenses, and the user can tell the search engine what 
characteristics he wants ("compatible with GPLv2", "no advertising 
clause", "doesn't have patent implications", 

[whatwg] microdata use cases and Getting data out of poorly written Web pages

2009-05-08 Thread Shelley Powers
It's difficult to tell where one should comment on the so-called 
microdata use cases. I'm forced to send to multiple mailing lists.


Ian, I would like to see the original request that went into this 
particular use case. In particular, I'd like to know who originated it, 
so that we can ensure that the person has read your follow-up, as well 
as how you condensed the use case down (to check if your interpretation 
is proper or not).


In addition, from my reading of this posting of yours titled [whatwg] 
Getting data out of poorly written Web pages, is this open for any 
discussion? It seems to me that you received the original data, 
generated a use case document from the data, unilaterally, and now 
you're making unilateral decisions as to whether the use case requires a 
change in HTML5 or not.


Is this what we can expect from all of the use cases?

Shelley





Re: [whatwg] microdata use cases and Getting data out of poorly written Web pages

2009-05-08 Thread Ian Hickson
On Fri, 8 May 2009, Shelley Powers wrote:

 It's difficult to tell where one should comment on the so-called 
 microdata use cases. I'm forced to send to multiple mailing lists.

Please don't cross-post to the WHATWG list and other lists -- you may pick 
either one, I read all of them. (Cross-posting results in a lot of 
confusion because some of the lists only allow members to post, while 
others allow anyone to post, so we end up with fragmented threads.)


 Ian, I would like to see the original request that went into this 
 particular use case. In particular, I'd like to know who originated it, 
 so that we can ensure that the person has read your follow-up, as well 
 as how you condensed the use case down (to check if your interpretation 
 is proper or not).

I did not keep track of where the use cases came from (I generally ignore 
the source of requests so as to avoid any possible bias).

However, I can probably figure out some of the sources of a particular 
scenario if you have a specific one in mind. Could you clarify which 
scenario or requirement you are particularly interested in?


 In addition, from my reading of this posting of yours titled [whatwg] 
 Getting data out of poorly written Web pages, is this open for any 
 discussion?

Naturally, all input is always welcome.


 It seems to me that you received the original data, generated a use case 
 document from the data, unilaterally, and now you're making unilateral 
 decisions as to whether the use case requires a change in HTML5 or not.
 
 Is this what we can expect from all of the use cases?

Yes.

If my proposals don't actually address the use cases, then please do point 
out how that is the case. Similarly, if there are missing use cases, please 
bring them up. All input is always welcome (whether on the lists, or 
direct e-mail, on blogs, or wherever). None of the text in the HTML5 spec 
is frozen, it's merely a proposal. If there are use cases that should be 
addressed that are not addressed then we should address them.

(Regarding microdata note that I've so far only sent proposals for three 
of the 20 use cases that I collected. I've still got a lot to go through.)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] microdata use cases and Getting data out of poorly written Web pages

2009-05-08 Thread Shelley Powers

Ian Hickson wrote:

On Fri, 8 May 2009, Shelley Powers wrote:
  
It's difficult to tell where one should comment on the so-called 
microdata use cases. I'm forced to send to multiple mailing lists.



Please don't cross-post to the WHATWG list and other lists -- you may pick 
either one, I read all of them. (Cross-posting results in a lot of 
confusion because some of the lists only allow members to post, while 
others allow anyone to post, so we end up with fragmented threads.)



  
But different people respond to the mailings in different ways, 
depending on the list. This isn't just you, Ian. How can I ensure that 
the W3C people have access to the same concerns?
Ian, I would like to see the original request that went into this 
particular use case. In particular, I'd like to know who originated it, 
so that we can ensure that the person has read your follow-up, as well 
as how you condensed the use case down (to check if your interpretation 
is proper or not).



I did not keep track of where the use cases came from (I generally ignore 
the source of requests so as to avoid any possible bias).


  
Documenting the originator of a use case is introducing bias? In what 
universe?


If anything, documenting where the use cases come from, and providing 
access to the original, raw data helps to ensure that bias has not been 
introduced. More importantly, it gives your teammates a chance to verify 
your interpretation of the use cases, and provide correction, if needed.


However, I can probably figure out some of the sources of a particular 
scenario if you have a specific one in mind. Could you clarify which 
scenario or requirement you are particularly interested in?



  
Ian, I think its important that you provide a place documenting the 
original raw data. This provides a historical perspective on the 
decisions going into HTML5 if nothing else.


If you need help, I'm willing to help you. You'll need to forward me the 
emails you received, and send me links to the other locations. I'll then 
put all these into a document and we can work to map to your condensed 
document. That way there's accountability at all steps in the decision 
process, as well as transparency.


Once I put the document together, we can put it with other documents that 
also provide a history of the decision processes.
In addition, from my reading of this posting of yours titled [whatwg] 
Getting data out of poorly written Web pages, is this open for any 
discussion?



Naturally, all input is always welcome.


  
No, I didn't ask if input was welcome. I asked if this was still open 
for discussion, or if you have made up your mind and further 
discussion will just be wasting everyone's time.
It seems to me that you received the original data, generated a use case 
document from the data, unilaterally, and now you're making unilateral 
decisions as to whether the use case requires a change in HTML5 or not.


Is this what we can expect from all of the use cases?



Yes.
  

That's not appropriate for a team environment.
If my proposals don't actually address the use cases, then please do point 
out how that is the case. Similarly, if there are missing use cases, please 
bring them up. All input is always welcome (whether on the lists, or 
direct e-mail, on blogs, or wherever). None of the text in the HTML5 spec 
is frozen, it's merely a proposal. If there are use cases that should be 
addressed that are not addressed then we should address them.


  

Again, how can I? I don't have the original data.
(Regarding microdata note that I've so far only sent proposals for three 
of the 20 use cases that I collected. I've still got a lot to go through.)


  

After digging, I found another one, at

http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019620.html

Again, though, the writing style indicates the item is closed, and 
discussion is not welcome. I have to assume that this is how you 
mentally perceive the item, and therefore though we may respond, the 
response will make no difference.


And I can't find the third one. Perhaps you can provide a direct link.

I'm concerned, too, about the fact that the discussion for these is 
happening on the WhatWG group, but not in the HTML WG email list. I've 
never understood the need for two different email lists, and have felt having both is 
confusing, and potentially misleading. Regardless, shouldn't this 
discussion be taking place in the HTML WG, too?


Isn't the specification the W3C HTML5 specification, also?

I'm just concerned because, from what I can see of both groups, interests 
and concerns differ between the groups. That means only addressing 
issues in one group would leave out potentially important discussions 
in the other group.


Shelley




[whatwg] Allowing authors to annotate their documents to explain things for readers

2009-05-08 Thread Ian Hickson

One of the use cases I collected from the e-mails sent in over the past 
few months was the following:

   USE CASE: Allow authors to annotate their documents to highlight the key
   parts, e.g. as when a student highlights parts of a printed page, but in a
   hypertext-aware fashion.

   SCENARIOS:
 * Fred writes a page about Napoleon. He can highlight the word Napoleon
   in a way that indicates to the reader that that is a person. Fred can
   also annotate the page to indicate that Napoleon and France are
   related concepts.

This use case isn't altogether clear, but if the target audience of the 
annotations is human readers (as opposed to machines and readers using 
automated processing tools), then it seems like this is already possible 
in a number of ways in HTML5.

The easiest way of addressing this is just to include text bringing the 
user's attention to relationships:

   <p>This page is about Napoleon. He was my uncle and lived in 
   France.</p>

Individual keywords can be highlighted with <b>:

   <p>This page is about <b>Napoleon</b>. He was my uncle and lived in 
   <b>France</b>.</p>

Prose annotations can be added to individual words or phrases using the 
title= attribute:

   <p>This page is about <span title="A person">Napoleon</span>. He was my 
   uncle and lived in <span title="A hamlet near Drummond, in Idaho, 
   USA">France</span>.</p>

These typically show as tooltips.

To highlight material on the page that might be relevant to the user, e.g. 
if the user searched for the word "Uncle" and the site wanted to highlight 
the word "Uncle", the <mark> element can be used:

   <p>This page is about Napoleon. He was my <mark>uncle</mark> and lived 
   in France.</p>

The same element can be used by a reader editing an existing document to 
highlight the parts that warrant further study, possibly using the 
title= attribute to include notes:

   <p>This page is about Napoleon. He was my uncle and <mark 
   title="really?">lived in France</mark>.</p>

Links can be used to link parts of a document together to indicate 
relationships:

   <p id="napoleon">My uncle was called Napoleon. See also: <a
   href="#france">France</a>, <a href="#uncle">Uncle</a>.</p>
   ...
   <p id="france">France is a hamlet near Drummond, ID. My uncle lived 
   there. See also: <a href="#napoleon">Napoleon</a>.</p>

In conclusion, this use case doesn't seem to need any new changes to the 
language.

A number of further use cases remain to be examined, including some more 
specifically looking at machine-readable annotations rather than 
annotations aimed directly at human readers. I will send further e-mail 
next week as I address them.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] video/audio feedback

2009-05-08 Thread Silvia Pfeiffer
On Sat, May 9, 2009 at 2:25 AM, David Singer sin...@apple.com wrote:
 At 23:46  +1000 8/05/09, Silvia Pfeiffer wrote:

 On Fri, May 8, 2009 at 9:43 AM, David Singer sin...@apple.com wrote:

  At 8:45  +1000 8/05/09, Silvia Pfeiffer wrote:

  On Fri, May 8, 2009 at 5:04 AM, David Singer sin...@apple.com wrote:

  At 8:39  +0200 5/05/09, Kristof Zelechovski wrote:

  If the author wants to show only a sample of a resource and not the
  full
  resource, I think she does it on purpose.  It is not clear why it is
  vital
  for the viewer to have an _obvious_ way to view the whole resource
  instead;
  if it were the case, the author would provide for this.
  IMHO,
  Chris

  It depends critically on what you think the semantics of the fragment
  are.
  In HTML (the best analogy I can think of), the web page is not trimmed
  or
  edited in any way -- you are merely directed to one section of it.

  There are critical differences between HTML and video, such that this
  analogy has never worked well.

  could you elaborate?

 At the risk of repeating myself ...

 HTML is text and therefore whether you download a snippet only or the
 full page and then do an offset does not make much of a difference.
 Even for a long page.

 you might try loading, say, the one-page version of the HTML5 spec from the
 WhatWG site...it takes quite a while.  Happily Ian also provides a
 multi-page version, but this is not always the case.

That just confirms the problem and it's obviously worse with video. :-)


 The reason I want clarity is that this has ramifications.  For example, if a
 UA is asked to play a video with a fragment indication #time=10s-20s, and
 then a script seeks to 5s, does the user see the video at the 5s point of
 the total resource, or 15s?  I think it has to be 5s.

I agree, it has to be 5s. The discussion was about what timeline is
displayed and what the user can easily reach by seeking through
the displayed timeline. A script can access any time of course. But a
user is restricted by what the user interface offers.


 So, the difference is that in HTML the user agent will always have the
 context available within its download buffer, while for video this may
 not be the case.

 I'm sorry, I am lost.  We could quite easily extend HTTP to allow for
 anchor-based retrieval of HTML (i.e. convert a 'please start at anchor X'
 into a pair of byte-range responses, for the global material, and then the
 document from that anchor onwards).

Yes, but that's not the way it currently works and it is not a
proposal currently under discussion.


 This admittedly technical difference also has an influence on the user
 interface.

 If you have all the context available in the user agent, it is easy to
 just grab a scroll-bar and jump around in the full content manually to
 look for things. This is not possible in the video case without many
 further download actions, which will each incur a network delay. This
 difference opens the door to enable user agents with a choice in
 display to either provide the full context, or just the fragment
 focus.

 But we can optimize for the fragment without disallowing the seeking.

What do you mean by "optimize for the fragment"? Of course none of the
discussion will inherently disallow seeking - scripts will always be
able to do the seeking. But the user may not find it easy to
seek to a section that is not accessible through the displayed
timeline, which can be both a good and a bad thing.


Cheers,
Silvia.