Re: Datatype for IRIs in RELAX NG

2006-04-28 Thread Henri Sivonen


On Mar 21, 2006, at 11:29, Julian Reschke wrote:

maybe it's time that *some* specification adds new datatypes that  
do *exactly* what RFC3986 and RFC3987 ask for :-)


I have a draft spec at
http://hsivonen.iki.fi/html5-datatypes/

A snapshot of the reference implementation (which currently lacks the  
'language' datatype) is available for testing as part of my  
validation service:

http://hsivonen.iki.fi/validator/

The reference implementation just wraps the IRI library from Jena  
(pulled from CVS; the new version does not ship in a Jena release, yet).


Disclaimer: The datatype library has undergone *very* little testing.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: Datatype for IRIs in RELAX NG

2006-03-21 Thread Martin Duerst


At 02:08 06/03/20, Elliotte Harold wrote:

I would recommend against using xsd:anyURI for IRIs. A URI is much more 
restrictive than an IRI, and one of the easiest things for a schema 
validator to check about an xsd:anyURI is that it only contains URI-legal 
ASCII characters.


This is indeed one of the easiest things, but it would be TOTALLY
wrong.

http://www.w3.org/TR/xmlschema-2/datatypes.html#anyURI says, among other things:

   The mapping from anyURI values to URIs is as defined by the URI reference
   escaping procedure defined in Section 5.4 Locator Attribute of [XML
   Linking Language] (see also Section 8 Character Encoding in URI References
   of [Character Model]). This means that a wide range of internationalized
   resource identifiers can be specified when an anyURI is called for, and
   still be understood as URIs per [RFC 2396], as amended by [RFC 2732],
   where appropriate to identify resources.

If there is confusion in other venues about this issue, please help
to make sure it gets fixed.
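
As an illustration of the procedure quoted above, here is a minimal Python
sketch of the escaping step as described in XLink Section 5.4: non-ASCII
characters and the RFC 2396 excluded characters (except "#", "%", "[" and
"]") are converted to UTF-8, and each byte is written as %HH. The function
name escape_uri_reference is made up for illustration, and this is a sketch
under those assumptions, not a conformance-tested implementation.

```python
# Excluded ASCII characters from RFC 2396 section 2.4.3, minus "#", "%",
# "[" and "]" (the latter two re-allowed by RFC 2732). Control characters
# and DEL are handled by the code point test below.
EXCLUDED_ASCII = set(' <>"{}|\\^`')


def escape_uri_reference(value: str) -> str:
    """Map an anyURI lexical value to a URI reference (hypothetical helper)."""
    out = []
    for ch in value:
        cp = ord(ch)
        if cp < 0x20 or cp > 0x7E or ch in EXCLUDED_ASCII:
            # Disallowed character: UTF-8 encode it and escape every byte.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)
```

Under this mapping, a value containing non-ASCII characters is still a
legal anyURI and maps to a plain ASCII URI, which is why a validator that
only accepts URI-legal ASCII characters would reject valid values.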


Regards, Martin.



Re: Datatype for IRIs in RELAX NG

2006-03-21 Thread Julian Reschke


Martin Duerst wrote:


At 02:08 06/03/20, Elliotte Harold wrote:
 
 I would recommend against using xsd:anyURI for IRIs. A URI is much 
more restrictive than an IRI, and one of the easiest things for a schema 
validator to check about an xsd:anyURI is that it only contains 
URI-legal ASCII characters.


This is indeed one of the easiest things, but it would be TOTALLY
wrong.

http://www.w3.org/TR/xmlschema-2/datatypes.html#anyURI says, among other things:

   The mapping from anyURI values to URIs is as defined by the URI reference
   escaping procedure defined in Section 5.4 Locator Attribute of [XML
   Linking Language] (see also Section 8 Character Encoding in URI References
   of [Character Model]). This means that a wide range of internationalized
   resource identifiers can be specified when an anyURI is called for, and
   still be understood as URIs per [RFC 2396], as amended by [RFC 2732],
   where appropriate to identify resources.

If there is confusion in other venues about this issue, please help
to make sure it gets fixed.


Well,

maybe it's time that *some* specification adds new datatypes that do 
*exactly* what RFC3986 and RFC3987 ask for :-)


Best regards, Julian



Re: Datatype for IRIs in RELAX NG

2006-03-21 Thread Bjoern Hoehrmann

* Martin Duerst wrote:
At 02:30 06/03/20, Bjoern Hoehrmann wrote:

 In Schema 1.1 it is not possible for an xsd:string not to be an xsd:anyURI.

Can you explain? It seems you are saying that all xsd:strings are
also xsd:anyURIs, but that seems to be going a bit too far.

Yes, that's exactly what the XML Schema 1.1 Last Call Working Draft
implies as far as I can tell.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Datatype for IRIs in RELAX NG (was: Re: Atom syndication schema)

2006-03-19 Thread Henri Sivonen


(Discussion started on atom-syntax, but this is a more general RELAX  
NG issue, so cross-posting to rng-users.)


On Mar 19, 2006, at 09:33, Martin Duerst wrote:


At 18:49 06/03/17, Bjoern Hoehrmann wrote:

* Martin Duerst wrote:
When looking with a microscope, you will find some little
differences, because xs:anyURI was described before the IRI
spec (RFC 3987) was approved. These differences are:

1) xs:anyURI also allows spaces and a few other ASCII characters
that are not allowed in URIs nor in IRIs (but the IRI spec has
an escape hatch for such cases).
2) The IRI spec contains many more details than the xs:anyURI
description, in particular also some requirements re.
normalization. However, some of the requirements in this
area of the IRI spec may be lowered or removed in the future
because we have received feedback from implementers that
these are difficult to implement.

I agree with Martin that it would be incorrect to use xsd:anyURI  
here.


Sorry, but I never said that it would be incorrect to use
xsd:anyURI. I personally think that it should be okay to
use xsd:anyURI. The differences are microscopic, and they should
become even smaller, or hopefully go away completely, over time.


I need datatypes for IRIs in general (relative, absolute or just  
fragment identifiers) and for absolute IRIs (possibly with a fragment  
id) in a RELAX NG schema.


Is it really the best practice to use xsd:anyURI and sweep the  
discrepancies under the rug in the hope that future definitions of  
xsd:anyURI change the meaning of the schema later? Can xsd:anyURI be  
augmented with a regexp pattern to restrict spaces and a few other  
ASCII characters in such a way that the resulting datatype  
restriction matches the definition of IRI? Has anyone implemented a  
strictly correct IRI datatype in a Java datatype library (for Jing  
and MSV)?
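
One way to picture the pattern idea asked about above is the following
Python sketch (using Python's re module, not an XSD pattern facet). It
assumes that the "spaces and a few other ASCII characters" are the ASCII
characters that appear in no RFC 3986 or RFC 3987 production: controls,
space, and " < > \ ^ ` { | }. It also rejects a "%" that does not start a
percent-encoded triplet. The name passes_ascii_check is hypothetical. The
check looks at characters only and says nothing about whether the overall
structure is a valid IRI.

```python
import re

# ASCII characters that no RFC 3986 / RFC 3987 production allows
# (assumption: these are the characters the thread refers to).
_FORBIDDEN_ASCII = re.compile(r'[\x00-\x20\x7f"<>\\^`{|}]')

# A "%" must be followed by two hex digits to form a percent-encoded triplet.
_BAD_PERCENT = re.compile(r'%(?![0-9A-Fa-f]{2})')


def passes_ascii_check(value: str) -> bool:
    """Return True if value has no ASCII character that an IRI forbids."""
    return not _FORBIDDEN_ASCII.search(value) and not _BAD_PERCENT.search(value)
```

A restriction along these lines narrows the gap between xsd:anyURI and
RFC 3987 at the character level, but the grammar of the individual
components would still go unchecked.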


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [rng-users] Datatype for IRIs in RELAX NG (was: Re: Atom syndication schema)

2006-03-19 Thread John Cowan

Henri Sivonen scripsit:

 Is it really the best practice to use xsd:anyURI and sweep the  
 discrepancies under the rug in the hope that future definitions of  
 xsd:anyURI change the meaning of the schema later? Can xsd:anyURI be  
 augmented with a regexp pattern to restrict spaces and a few other  
 ASCII characters in such a way that the resulting datatype  
 restriction matches the definition of IRI? Has anyone implemented a  
 strictly correct IRI datatype in a Java datatype library (for Jing  
 and MSV)?

It's certainly possible to construct a regular expression, a long and complex
one, that will match all IRIs and only IRIs (note that IRI by itself
means absolute IRI with or without fragment identifier).  The question
is whether it's really worth doing so.  If you feel you need it,
by all means go ahead.
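
Short of the full expression, the distinction drawn above between an IRI
(absolute, with or without fragment) and a relative reference can be
sketched with the component-splitting regular expression from RFC 3986,
Appendix B. The Python below only looks for a syntactically valid scheme;
it does not validate the remaining components against the RFC 3987
grammar. The name classify_reference and its return labels are made up
for illustration.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
# Group 2 is the scheme, if any.
_COMPONENTS = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r'[A-Za-z][A-Za-z0-9+.\-]*\Z')


def classify_reference(value: str) -> str:
    """Classify value as 'iri', 'relative-reference' or 'invalid'.

    Only the scheme is inspected; the other components are not validated.
    """
    # Every group in the pattern is optional, so the match always succeeds.
    scheme = _COMPONENTS.match(value).group(2)
    if scheme is None:
        return "relative-reference"
    return "iri" if _SCHEME.match(scheme) else "invalid"
```

A schema that needs both datatypes could treat the "iri" case as the
absolute-IRI type and accept either "iri" or "relative-reference" for the
general type, with real component validation layered on top.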

-- 
LEAR: Dost thou call me fool, boy?  John Cowan
FOOL: All thy other titles  http://www.ccil.org/~cowan
 thou hast given away:  [EMAIL PROTECTED]
  That thou wast born with. http://www.ap.org



Re: Datatype for IRIs in RELAX NG

2006-03-19 Thread Elliotte Harold


I would recommend against using xsd:anyURI for IRIs. A URI is much more 
restrictive than an IRI, and one of the easiest things for a schema 
validator to check about an xsd:anyURI is that it only contains 
URI-legal ASCII characters. I think a new type is necessary if you do 
want to allow IRIs instead of simple URIs. I suspect you could do it 
with a regular expression but the syntax would be really hairy.



--
Elliotte Rusty Harold  [EMAIL PROTECTED]
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim



Re: Datatype for IRIs in RELAX NG

2006-03-19 Thread Bjoern Hoehrmann

* Elliotte Harold wrote:
I would recommend against using xsd:anyURI for IRIs. A URI is much more 
restrictive than an IRI, and one of the easiest things for a schema 
validator to check about an xsd:anyURI is that it only contains 
URI-legal ASCII characters. I think a new type is necessary if you do 
want to allow IRIs instead of simple URIs. I suspect you could do it 
with a regular expression but the syntax would be really hairy.

In Schema 1.1 it is not possible for an xsd:string not to be an xsd:anyURI.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/