[whatwg] id and xml:id

2006-04-02 Thread Henri Sivonen
Since UAs handle whitespace in the id attribute inconsistently (see  
below), old specs imply or require whitespace trimming and ids with  
whitespace are unreferencable from whitespace-separated lists of ids,  
I suggest adding the following language concerning document conformance:


The value of the id attribute must be a string that consists of one
or more characters matching the following production: [#x21-#xD7FF]|
[#xE000-#xFFFD]|[#x10000-#x10FFFF] (any XML 1.0 character excluding
whitespace).
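
As a rough illustration of the proposed conformance rule, the following Python sketch checks a candidate id value against the production (every XML 1.0 character except the four XML whitespace characters) and shows why an id containing whitespace cannot be named from a whitespace-separated list of ids. The function names are made up for this example; nothing here is taken from spec text or from any UA.

```python
import re

# Proposed production: one or more XML 1.0 characters, excluding whitespace.
ID_VALUE = re.compile(r"[\u0021-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]+")

# XML whitespace: space, tab, carriage return, line feed.
XML_WS = re.compile(r"[ \t\r\n]+")


def is_conforming_id(value):
    """Return True if the whole value matches the proposed production."""
    return ID_VALUE.fullmatch(value) is not None


def split_id_list(value):
    """Split a whitespace-separated list of ids into its tokens."""
    return [token for token in XML_WS.split(value) if token]


if __name__ == "__main__":
    for candidate in ["a", "2", ",", " c ", "\nd\n"]:
        print(repr(candidate), is_conforming_id(candidate))
    # An id of ' c ' can never be one of the tokens of a list of ids.
    print(split_id_list("a  c  b"))
```

Under this check 'a', '2' and ',' conform, while ' c ' and '\nd\n' do not, and no token produced by split_id_list can ever contain whitespace.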


Also, I suggest requiring that elements must not have both id and  
xml:id and requiring that xml:id must not occur in the HTML  
serialization. (Again, from the document conformance point of view-- 
not disputing requirements on browsers.)


Rationale:
HTML doesn't have namespace processing of colonified names and the  
xml:id spec is not designed for HTML. Allowing xml:id in HTML feels  
intuitively wrong (perhaps even a bit evil :-).


If an element had both an id attribute and an xml:id attribute with  
different values, the document would not be HTML-serializable, which  
would be bad. (Obviously, even with only one kind of ID attribute on  
an element, in round tripping from XHTML to HTML to XHTML, the  
information about whether the original attribute was id or xml:id is  
lost just like the information about whether a table had a tbody is  
lost.)


If an element was allowed to have an id attribute and an xml:id  
attribute with the same value, the following constraint from xml:id  
spec would be violated even for conforming docs:

An xml:id processor should assure that the following constraint holds:
* The values of all attributes of type “ID” (which includes all  
xml:id attributes) within a document are unique.

( http://www.w3.org/TR/xml-id/ )
Assuming, of course, that the XHTML5 id can still be considered an ID  
in the XML sense.
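
To make the conflict concrete, here is a small Python sketch that treats both id and xml:id as ID-typed attributes (an assumption, as noted above) and reports elements carrying both, plus any value used more than once. The function name and the sample document are invented for illustration; this is not an xml:id processor.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:id under the predeclared XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def id_problems(xml_text):
    """Return (tags of elements with both id and xml:id, duplicated ID values)."""
    root = ET.fromstring(xml_text)
    both = []
    seen = Counter()
    for element in root.iter():
        plain = element.get("id")
        xml_id = element.get(XML_ID)
        if plain is not None and xml_id is not None:
            both.append(element.tag)
        # Assumption: the plain id attribute counts as an ID in the XML sense.
        for value in (plain, xml_id):
            if value is not None:
                seen[value] += 1
    duplicates = sorted(value for value, count in seen.items() if count > 1)
    return both, duplicates


if __name__ == "__main__":
    doc = '<root><p id="a" xml:id="a"/><p id="b"/></root>'
    print(id_problems(doc))
```

With the sample document, the first p element is reported as having both attributes, and the value "a" is reported as non-unique even though it appears on one element only.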


Finally, as the ultimate ID nitpicking, the spec should state that it  
is naughty of authors to turn attributes other than id and xml:id  
into IDs via the DTD. (Well, using a DTD at all is naughty. :-)


- -

Test case: http://hsivonen.iki.fi/test/wa10/adhoc/id.html
The script tries every id with a whitespaceless value to see if  
whitespace is trimmed before ID assignment.


Firefox:

id='a' PASS
id='2' PASS
id='&lt;' PASS
id=',' PASS
id='&auml;' PASS
id=' c ' FAIL
id='\nd\n' PASS
id='\t\te\t\t' PASS
id='&#13;f&#13;' PASS

Opera (weekly build 3312; note that Opera recently changed its  
behavior to match the others with id=' c '):


id='a' PASS
id='2' PASS
id='&lt;' PASS
id=',' PASS
id='&auml;' PASS
id=' c ' FAIL
id='\nd\n' PASS
id='\t\te\t\t' PASS
id='&#13;f&#13;' FAIL

Safari and IE 6:

id='a' PASS
id='2' PASS
id='&lt;' PASS
id=',' PASS
id='&auml;' PASS
id=' c ' FAIL
id='\nd\n' FAIL
id='\t\te\t\t' FAIL
id='&#13;f&#13;' FAIL

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] id and xml:id

2006-04-02 Thread Anne van Kesteren

Quoting Henri Sivonen:

Also, I suggest requiring that elements must not have both id and
xml:id and requiring that xml:id must not occur in the HTML
serialization. (Again, from the document conformance point of view--
not disputing requirements on browsers.)


How could it occur in an HTML document? (Given that the browser in question is
namespace aware.) I'm assuming here that we're not talking about adding stuff
through the DOM given that you talk about serialization.



Rationale:
HTML doesn't have namespace processing of colonified names and the
xml:id spec is not designed for HTML. Allowing xml:id in HTML feels
intuitively wrong (perhaps even a bit evil :-).


I agree.



If an element had both an id attribute and an xml:id attribute with
different values, the document would not be HTML-serializable, which
would be bad.


Now I agree that's bad, but I think there is something to say for elements
having multiple IDs. (Even though that's not valid for some definition of it.)



(Obviously, even with only one kind of ID attribute on an element,
in round tripping from XHTML to HTML to XHTML, the information about
whether the original attribute was id or xml:id is lost just like
the information about whether a table had a tbody is lost.)


Interesting point. I think tbody should be required in XHTML personally to go
against that. It just doesn't make sense the way it is now.



If an element was allowed to have an id attribute and an xml:id
attribute with the same value, the following constraint from xml:id
spec would be violated even for conforming docs:
An xml:id processor should assure that the following constraint holds:
* The values of all attributes of type “ID” (which includes
all  xml:id attributes) within a document are unique.
( http://www.w3.org/TR/xml-id/ )
Assuming, of course, that the XHTML5 id can still be considered an ID
 in the XML sense.


It should be considered an ID in the XML sense for getElementById and friends.



Finally, as the ultimate ID nitpicking, the spec should state that it
 is naughty of authors to turn attributes other than id and xml:id
into IDs via the DTD. (Well, using a DTD at all is naughty. :-)


But through DOM methods is ok? (I agree that DTDs are obsolete...)



Test case: http://hsivonen.iki.fi/test/wa10/adhoc/id.html


Interesting testcase!



Opera (weekly build 3312; note that Opera recently changed its
behavior to match the others with id=' c '):


Bah. I hope we can revert that... Do you have a similar test for xml:id? Opera
passes (passed?) the following, for example:

http://annevankesteren.nl/test/xml/xml-id/008.xml

Cheers,

Anne


--
Anne van Kesteren
http://annevankesteren.nl/



Re: [whatwg] id and xml:id

2006-04-02 Thread fantasai

Henri Sivonen wrote:
Since UAs handle whitespace in the id attribute inconsistently (see
below), old specs imply or require whitespace trimming and ids with
whitespace are unreferencable from whitespace-separated lists of ids,
I suggest adding the following language concerning document conformance:


The value of the id attribute must be a string that consists of one
or more characters matching the following production: [#x21-#xD7FF]|
[#xE000-#xFFFD]|[#x10000-#x10FFFF] (any XML 1.0 character excluding
whitespace).


I'd rather see the id attribute restricted to an NCName token insofar
as possible. We can make an exception for Hixie's repetition templates,
but otherwise I think it should be compatible with the XML ID syntax.

So

  xsd:id {
    pattern: \S*;
  }

The concept of idness is a useful one for many tools, and even if
browsers don't care what characters there are, other tools do. We can't
express IDness in a schema if we insist on ignoring its syntactic
restrictions.

~fantasai


Re: [whatwg] id and xml:id

2006-04-02 Thread Anne van Kesteren

Quoting fantasai:

I'd rather see the id attribute restricted to an NCName token insofar
as possible. We can make an exception for Hixie's repetition templates,
but otherwise I think it should be compatible with the XML ID syntax.


I agree. Note that the repetition template also allows for characters that
are compatible with XML ID. Of course, this is only for valid documents... All
things should still be defined in a way that they take into account invalid,
yet well-formed, documents as well. (And HTML documents...)


--
Anne van Kesteren
http://annevankesteren.nl/



Re: [whatwg] id and xml:id

2006-04-02 Thread Henri Sivonen

On Apr 2, 2006, at 19:26, Anne van Kesteren wrote:

I agree. Note that the repetition template also allows for characters that
are compatible with XML ID. Of course, this is only for valid documents... All
things should still be defined in a way that they take into account invalid,
yet well-formed, documents as well. (And HTML documents...)


I am interested in conforming DTD-invalid well-formed XHTML documents  
and conforming HTML documents. I think that whatever is allowed as an  
id attribute value in conforming HTML documents should also be  
conforming as a value of an id attribute in XHTML (but not  
necessarily conforming as an xml:id value) in order to allow XHTML- 
serializability of conforming HTML docs.


(I am not interested in DTD-valid documents. I consider DTDs harmful.)

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] id and xml:id

2006-04-02 Thread fantasai

Henri Sivonen wrote:

On Apr 2, 2006, at 18:56, fantasai wrote:


I'd rather see the id attribute restricted to an NCName token insofar
as possible. We can make an exception for Hixie's repetition  templates,
but otherwise I think it should be compatible with the XML ID syntax.


Do you mean common attrs should have a co-occurrence constraint that  
changes the datatype of the id attribute if the repeat attribute is  
present?


Yes. Or, at the very least, if the repetition module is loaded.


I was planning on defining the datatype of the id attribute as
  xsd:string {
    pattern = "\S+"
  }

NCName with the exception that it allows [ and ] will be one huge  
regexp. (But doable, of course.) If that is what we want, the syntax  
should probably be
(Letter | '_') (NCNameCharWithout02D1and00B7)*
(('[' | #x02D1) (NCNameCharWithout02D1and00B7)+ (']' | #x00B7))?
(NCNameCharWithout02D1and00B7)*

with the XML 1.0 definitions of Letter and NCNameChar.
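
For readers who want to experiment with that shape, here is a hedged Python sketch of the production. It is deliberately simplified: Letter and NCNameChar are replaced by ASCII-only stand-ins (an assumption made to keep the example short; the real XML 1.0 character classes are far larger), and the names REPEAT_ID and is_repeat_id are invented for this example.

```python
import re

# ASCII-only stand-ins (assumption) for the XML 1.0 character classes.
LETTER = "A-Za-z"
NAME_CHAR = "A-Za-z0-9._-"  # already excludes U+02D1 and U+00B7

# (Letter | '_') NameChar* ( ('[' | U+02D1) NameChar+ (']' | U+00B7) )? NameChar*
REPEAT_ID = re.compile(
    "[" + LETTER + "_]"
    "[" + NAME_CHAR + "]*"
    "(?:[\\[\u02D1][" + NAME_CHAR + "]+[\\]\u00B7])?"
    "[" + NAME_CHAR + "]*"
)


def is_repeat_id(value):
    """Return True if value fits the simplified id-with-template-index syntax."""
    return REPEAT_ID.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["row", "row[items]", "row[items", "row[]", "1row", "a b"]:
        print(repr(candidate), is_repeat_id(candidate))
```

In this simplified form "row" and "row[items]" are accepted, while an unclosed "row[items", an empty "row[]", a leading digit, and any whitespace are rejected.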


Cool, that would even catch mismatched brackets. :)


The concept of idness is a useful one for many tools, and even if
browsers don't care what characters there are, other tools do. We  can't
express IDness in a schema if we insist on ignoring its syntactic
restrictions.


I didn't bother to make that argument, because I thought changing the  
language to fit schemas wouldn't go down well with Hixie. :-)


(In http://hsivonen.iki.fi/lists-in-attributes/ I tried to bring a
general "less code and more reuse of correct code" argument into it
instead of only playing the "it's incompatible with my schema language
of choice" argument.)


It's not my schema language of choice, it's the top three (by a long
shot) schema languages in use for XML.

I wasn't even expecting to be able to do IDREF integrity checks in
RELAX NG. I was planning on doing it in Schematron or Java. Besides,
general IDREF integrity checking does not check that, for example, the
form attribute references only form elements and not just any ids.


I would want that in the RelaxNG schema because there are editing tools
that hook into RelaxNG, but not many (or any besides validators) that can
hook into Schematron (Glazou, for example, is working on a RelaxNG-driven
editor.) RelaxNG /can/ do IDREF integrity checks. The part about form
attributes referencing only form elements can be checked by Schematron.
From an authoring standpoint, the *most* useful part of IDREF integrity
checking is to check against typos, not against misinterpretation of the
idref attribute's intent. :)
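
The two kinds of check discussed here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of IDREF-like attributes (for, form) is hypothetical, the function name is invented, and the sketch does not use RELAX NG or Schematron.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of attributes whose value is a single IDREF.
IDREF_ATTRS = ("for", "form")


def check_idrefs(xml_text):
    """Return (dangling references, form attributes pointing at non-form elements)."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    dangling = []
    wrong_target = []
    for element in root.iter():
        for attr in IDREF_ATTRS:
            ref = element.get(attr)
            if ref is None:
                continue
            target = by_id.get(ref)
            if target is None:
                # General IDREF integrity: catches typos in the reference.
                dangling.append((attr, ref))
            elif attr == "form" and target.tag != "form":
                # Stricter, intent-level check: form must name a form element.
                wrong_target.append((attr, ref))
    return dangling, wrong_target


if __name__ == "__main__":
    doc = (
        '<body><form id="f1"/><p id="note"/>'
        '<input form="f1"/><input form="note"/><label for="nmae"/></body>'
    )
    print(check_idrefs(doc))
```

In the sample, the misspelled for="nmae" is caught by the general integrity check, while form="note" passes that check and is only caught by the stricter one.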

~fantasai


[whatwg] Textareas

2006-04-02 Thread Csaba Gabor
I've run into some issues with textareas, and after checking
http://www.w3.org/Submission/web-forms2/ and seeing that the mailing
list is active...

1. I always thought that cols, which has been around forever, was
advisory regarding width, in the sense that if there was no other
overriding factor (CSS style settings), then cols would dictate the
width of the textarea, and that would be the end of it. I further
thought that .wrap=hard meant that the newlines sent to the server
reflected exactly what the user saw, that is, how the text was arranged
in the textarea: exactly one newline (%0D%0A) between each adjacent
pair of (possibly empty) lines.

Indeed, this is the behaviour that IE 6 exhibits on my Win XP Pro
system. So when I saw that the linefeeds FF was putting in did not
reflect what I was seeing on the screen, I filed a bug report against
it. Only afterwards did I come back to web-forms2 to review it, and I
was shocked by what I read:

To paraphrase, it seems that cols is no longer simply advisory for
(only) determining the width of the textarea. With .wrap=hard it says
that the breaks should be dictated by the value of cols, and that if
cols doesn't exist, the breaks should be dictated by the size of the
textarea (I presume that's what display width means). It goes on to say
that this is not a good thing anyway, because users have different size
displays, so everyone's wrapping position would be different, defeating
the purpose of client side wrapping.

Now, I am really wondering about that whole paragraph, my paraphrasing
notwithstanding. First of all, 'defeating the purpose of client side
wrapping' raises the question: what is the purpose of .wrap? It's
pretty clearly not something for the client side, right? There is no
visible difference to the user while working with a textarea (there
would be if .wrap=off, but that is not covered here). So if it's not
useful for the client, it should be useful for the server. How?

Ensuring that text is broken up every cols characters is a pretty
trivial function (by this I mean that it is trivial to do server side),
so there is not much purpose in having it done on the client side. On
the other hand, it is very useful to know what the user was seeing, so
that what is processed on the server side has some correlation to what
the user submitted. This is the dichotomy in how .wrap has been used so
far: it delivers either what the user saw (.wrap=hard) or what the user
intends to be seen (.wrap=soft), making it useful for both sides.
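
To show how little is involved in breaking text every cols characters on the server, here is a minimal Python sketch. It assumes the simplest reading (a break after every cols characters of each line, with no attention to word boundaries) and joins lines with CRLF as in the %0D%0A form mentioned earlier; hard_wrap is an invented name, not anything defined by web-forms2.

```python
def hard_wrap(value, cols):
    """Break every line of value after each run of cols characters."""
    if cols < 1:
        raise ValueError("cols must be a positive integer")
    wrapped = []
    # Normalize line endings first so CRLF input is not split twice.
    for line in value.replace("\r\n", "\n").split("\n"):
        # Fixed-width slices; an empty line is kept as one empty chunk.
        chunks = [line[i:i + cols] for i in range(0, len(line), cols)] or [""]
        wrapped.extend(chunks)
    return "\r\n".join(wrapped)


if __name__ == "__main__":
    print(repr(hard_wrap("abcdefghij\nxy", 4)))
```

What such a function cannot do is reproduce what the user actually saw, since that depends on the rendered width and font on the client.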

You can't deduce server side what the user saw client side, even
assuming you know how wide the textarea is by some other means (the
user might not even be using a fixed width font), so .wrap=hard serves
a very useful purpose here. And I don't see that the argument is any
different whether or not cols has been set. In addition, there are
millions of sites out there with cols set because it must be (cols has
been mandatory for so long). These people will be in for a rude
awakening indeed when they find out that cols now means something
completely different. But I think I'd be even more annoyed as a user,
it being as if I was chopped off in
mid

In short, I have outlined a compelling reason to have .wrap behave as
it does on IE6, versus a passing comment about the purpose of client
side wrapping. I am asking where this most peculiar mandate about
.wrap=hard came from, and expressing my strong disagreement with what I
understand so far.


2. As long as I am writing, I may as well ask about another textarea
issue that has always seemed strangely absent. Why is there no way to
find out what will actually be transmitted from a textarea? It seems to
me that the client (javascript) might be at least as interested in this
as the server cgi. It would be useful to know what row and column one
is on in the textarea. Of course, it is possible to know what row one
is on within the .value (because you can figure out where the caret is,
and then count preceding \n's), but with wrapping, these two are
different, and you don't really know for sure where things are being
wrapped. Therefore, it would be exceptionally useful to have something
like .observedValue to reflect what is being seen. In IE, I think I can
figure this out via some range monkeying about (since they allow for
rangeHeight (or something like that)), but I am stumped with
Mozilla/FF.
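
The counting of preceding newlines described above can be sketched as follows. The sketch is in Python rather than javascript, and it takes the value and the caret offset as plain arguments (how the caret offset is obtained in a given browser is outside its scope). It gives the row and column within the .value only, not within the text as visually wrapped.

```python
def caret_row_col(value, caret):
    """Return the zero-based (row, column) of a caret offset within value."""
    if not 0 <= caret <= len(value):
        raise ValueError("caret offset is outside the value")
    # Row: number of newlines before the caret.
    row = value.count("\n", 0, caret)
    # Column: distance from the start of the current line (rfind gives -1
    # when there is no earlier newline, which makes the line start 0).
    col = caret - (value.rfind("\n", 0, caret) + 1)
    return row, col


if __name__ == "__main__":
    print(caret_row_col("ab\ncde\nf", 5))
```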

Csaba Gabor from Vienna
