Re: Atom syndication schema
At 18:49 06/03/17, Bjoern Hoehrmann wrote: * Martin Duerst wrote: When looking with a microscope, you will find some little differences, because xs:anyURI was described before the IRI spec (RFC 3987) was approved. These differences are: 1) xs:anyURI also allows spaces and a few other ASCII characters that are not allowed in URIs nor in IRIs (but the IRI spec has an escape hatch for such cases). 2) The IRI spec contains many more details than the xs:anyURI description, in particular also some requirements re. normalization. However, some of the requirements in this area of the IRI spec may be lowered or removed in the future because we have received feedback from implementers that these are difficult to implement. I agree with Martin that it would be incorrect to use xsd:anyURI here. Sorry, but I never said that it would be incorrect to use xsd:anyURI. I personally think that it should be okay to use xsd:anyURI. The differences are microscopic, and they should become even smaller, or hopefully go away completely, over time. It does not make sense to perpetuate minor differences for something that was and is supposed to be one and the same thing. Regards, Martin.
Datatype for IRIs in RELAX NG (was: Re: Atom syndication schema)
(Discussion started on atom-syntax, but this is a more general RELAX NG issue, so cross-posting to rng-users.) On Mar 19, 2006, at 09:33, Martin Duerst wrote: At 18:49 06/03/17, Bjoern Hoehrmann wrote: * Martin Duerst wrote: When looking with a microscope, you will find some little differences, because xs:anyURI was described before the IRI spec (RFC 3987) was approved. These differences are: 1) xs:anyURI also allows spaces and a few other ASCII characters that are not allowed in URIs nor in IRIs (but the IRI spec has an escape hatch for such cases). 2) The IRI spec contains many more details than the xs:anyURI description, in particular also some requirements re. normalization. However, some of the requirements in this area of the IRI spec may be lowered or removed in the future because we have received feedback from implementers that these are difficult to implement. I agree with Martin that it would be incorrect to use xsd:anyURI here. Sorry, but I never said that it would be incorrect to use xsd:anyURI. I personally think that it should be okay to use xsd:anyURI. The differences are microscopic, and they should become even smaller, or hopefully go away completely, over time. I need datatypes for IRIs in general (relative, absolute or just fragment identifiers) and for absolute IRIs (possibly with a fragment id) in a RELAX NG schema. Is it really the best practice to use xsd:anyURI and sweep the discrepancies under the rug in the hope that future definitions of xsd:anyURI change the meaning of the schema later? Can xsd:anyURI be augmented with a regexp pattern to restrict spaces and a few other ASCII characters in such a way that the resulting datatype restriction matches the definition of IRI? Has anyone implemented a strictly correct IRI datatype in a Java datatype library (for Jing and MSV)? -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: [rng-users] Datatype for IRIs in RELAX NG (was: Re: Atom syndication schema)
Henri Sivonen scripsit: Is it really the best practice to use xsd:anyURI and sweep the discrepancies under the rug in the hope that future definitions of xsd:anyURI change the meaning of the schema later? Can xsd:anyURI be augmented with a regexp pattern to restrict spaces and a few other ASCII characters in such a way that the resulting datatype restriction matches the definition of IRI? Has anyone implemented a strictly correct IRI datatype in a Java datatype library (for Jing and MSV)? It's certainly possible to construct a regular expression, a long and complex one, that will match all IRIs and only IRIs (note that IRI by itself means absolute IRI with or without fragment identifier). The question is whether it's really worth doing so. If you feel you need it, by all means go ahead. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan http://www.ap.org LEAR: Dost thou call me fool, boy? FOOL: All thy other titles thou hast given away: That thou wast born with.
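A much smaller check than the full regular expression discussed above can at least catch the first difference Martin lists: the printable US-ASCII characters that RFC 3987 (section 3.1) names as not allowed in URIs or IRIs, namely space, "<", ">", '"', "{", "}", "|", "\", "^" and "`". The sketch below is only that pre-check, plus ASCII control characters; it is not an IRI validator, and the function names are made up for illustration.

```python
import re

# Printable US-ASCII characters that RFC 3987 section 3.1 lists as not
# allowed in URIs or IRIs, plus ASCII control characters (an assumption
# of this sketch: controls are rejected as well).
_EXCLUDED_ASCII = re.compile(r'[\x00-\x20\x7f<>"{}|\\^`]')


def find_excluded_ascii(value):
    """Return the excluded ASCII characters in value, in order of first appearance."""
    found = []
    for match in _EXCLUDED_ASCII.finditer(value):
        ch = match.group()
        if ch not in found:
            found.append(ch)
    return found


def passes_ascii_precheck(value):
    """True if value contains none of the excluded ASCII characters.

    This says nothing about the rest of the IRI grammar (scheme, authority,
    percent-escapes and so on); it only covers the characters above.
    """
    return _EXCLUDED_ASCII.search(value) is None
```

A value that passes this check may still fail to be an IRI; a value that fails it would need the escaping step from RFC 3987 section 3.1 before being treated as one.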
Re: Atom syndication schema
At 00:42 06/03/17, Norman Walsh wrote: / Thomas Broyer [EMAIL PROTECTED] was heard to say: | RFC 3987 says (section 1.2 Applicability): |For example, XML schema [XMLSchema] has an explicit type |anyURI that includes IRIs and IRI references. Therefore, IRIs |and IRI references can be in attributes and elements of type |anyURI. | So, actually, it seems that the Atom RNC could say atomUri = xs:anyURI. Yes, I think that's the case. Some details (from the primary editor of RFC 3987 :-): From a mile high viewpoint, and even from much lower, it is definitely the case. From the viewpoint of intent, it is also definitely the case. When looking with a microscope, you will find some little differences, because xs:anyURI was described before the IRI spec (RFC 3987) was approved. These differences are: 1) xs:anyURI also allows spaces and a few other ASCII characters that are not allowed in URIs nor in IRIs (but the IRI spec has an escape hatch for such cases). 2) The IRI spec contains many more details than the xs:anyURI description, in particular also some requirements re. normalization. However, some of the requirements in this area of the IRI spec may be lowered or removed in the future because we have received feedback from implementers that these are difficult to implement. Regards, Martin.
Re: Atom syndication schema
* Martin Duerst wrote: When looking with a microscope, you will find some little differences, because xs:anyURI was described before the IRI spec (RFC 3987) was approved. These differences are: 1) xs:anyURI also allows spaces and a few other ASCII characters that are not allowed in URIs nor in IRIs (but the IRI spec has an escape hatch for such cases). 2) The IRI spec contains many more details than the xs:anyURI description, in particular also some requirements re. normalization. However, some of the requirements in this area of the IRI spec may be lowered or removed in the future because we have received feedback from implementers that these are difficult to implement. I agree with Martin that it would be incorrect to use xsd:anyURI here. -- Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de 68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: Atom syndication schema
/ Thomas Broyer [EMAIL PROTECTED] was heard to say: | RFC 3987 says (section 1.2 Applicability): |For example, XML schema [XMLSchema] has an explicit type |anyURI that includes IRIs and IRI references. Therefore, IRIs |and IRI references can be in attributes and elements of type |anyURI. | So, actually, it seems that the Atom RNC could say atomUri = xs:anyURI. Yes, I think that's the case. Be seeing you, norm -- Norman Walsh [EMAIL PROTECTED] http://nwalsh.com/ | Science is organized knowledge. Wisdom is organized life. --Immanuel Kant
Re: Atom syndication schema
David Powell wrote: Not sure if this is a known bug, but I just noticed that the RelaxNG grammar doesn't accept atomCommonAttributes (eg xml:lang) on the atom:name and atom:uri and atom:email elements used within Person constructs. Did you cc me because of my coverage of the matter? http://copia.ogbuji.net/blog/2006-02-06/Small_fix_ If so, I think I said all I have to say about it there. My fixed RNG is still available. -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.net http://fourthought.com http://copia.ogbuji.net http://4Suite.org Articles: http://uche.ogbuji.net/tech/publications/
Re: Atom syndication schema
Thursday, March 16, 2006, 7:31:08 PM, you wrote: David Powell wrote: Not sure if this is a known bug, but I just noticed that the RelaxNG grammar doesn't accept atomCommonAttributes (eg xml:lang) on the atom:name and atom:uri and atom:email elements used within Person constructs. Did you cc me because of my coverage of the matter? http://copia.ogbuji.net/blog/2006-02-06/Small_fix_ If so, I think I said all I have to say about it there. My fixed RNG is still available. Er, I hadn't read the thread properly, and posted to it having independently discovered the same bug when I was doing some hard-core XSLT-ing of Atom. Doh. Could you post the errata to the rfc-editor, via: http://www.rfc-editor.org/errata.html -- Dave
Re: Atom syndication schema
On Tue, Mar 14, 2006 at 10:23:55PM -0800, Walter Underwood [EMAIL PROTECTED] wrote a message of 29 lines which said: xml:lang isn't enough information to sort out given name and family name. Yes! In many countries, the order changed. In France, one century ago Family-name Given-name was common in official papers but is now deprecated. In Algeria, a former French colony, the two usages still prevail (Matoub Lounes or Lounes Matoub?)
Re: Atom syndication schema
On Tue, Mar 14, 2006 at 10:36:36PM -0700, M. David Peterson [EMAIL PROTECTED] wrote a message of 43 lines which said: As long as your character set for any given feed is properly set, it seems to me then all the information necessary to properly decode the email and URI (in which the work continues to integrate support for non-latin based languages, such as Mandarin, etc... Just to be pedantic, URIs (RFC 3986) are in pure US-ASCII. IRIs (RFC 3987) are in Unicode and are accepted by Atom (so, Atom's URIs seem to be actually IRIs). The standard says:
# Unconstrained; it's not entirely clear how IRI fit into
# xsd:anyURI so let's not try to constrain it here
atomUri = text
if I understand things correctly, full support for Mandarin Chinese-based domains is not far off (speaking in terms of DNS support and such). It is quite old, RFC 3490 (issued three years ago and implemented even before). email addresses encoded as mentioned There is not yet any standard for Unicode email addresses (work is going on, see the very recent IETF Working Group EAI http://www.ietf.org/html.charters/eai-charter.html).
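The US-ASCII versus Unicode distinction drawn above can be made concrete with a short Python sketch. It assumes Python's built-in "idna" codec, which implements RFC 3490, for the host name, and UTF-8 percent-encoding for a path, which is the general idea behind mapping an IRI to a URI in RFC 3987. The helper names and the example.org-style values are hypothetical, and the path helper assumes its input carries no percent-escapes already.

```python
from urllib.parse import quote


def host_to_ascii(host):
    """Convert an internationalized host name to its ASCII (IDNA) form.

    Uses the standard library "idna" codec, which implements RFC 3490.
    """
    return host.encode("idna").decode("ascii")


def percent_encode_path(path):
    """Percent-encode non-ASCII path characters as UTF-8 octets.

    Assumption of this sketch: the path contains no existing
    percent-escapes, since "%" itself would be escaped again.
    """
    return quote(path, safe="/")


if __name__ == "__main__":
    print(host_to_ascii("b\u00fccher.example"))
    print(percent_encode_path("/r\u00e9sum\u00e9"))
```

The two halves are kept separate on purpose: host names and paths are converted by different rules, so a single blanket percent-encoding of the whole IRI would not give the same result.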
Re: Atom syndication schema
Excellent! Thanks for the info :) I will have to go back and see what exactly the article I read was referring to, but if I remember correctly they seemed to mention that support for the top level domains, specifically .com, .net, and .cn had not been implemented using the Mandarin character set. Either way, obviously there are a lot of considerations that need to be allowed for when developing from an international perspective, something that should be part of the default process instead of the exception... of course, I say this mainly because this is really the first time I have put any real thought into this area, and obviously I need to put a lot more. Thanks again for clearing things up and providing the links! On 3/15/06, Stephane Bortzmeyer [EMAIL PROTECTED] wrote: On Tue, Mar 14, 2006 at 10:36:36PM -0700, M. David Peterson [EMAIL PROTECTED] wrote a message of 43 lines which said: As long as your character set for any given feed is properly set, it seems to me then all the information necessary to properly decode the email and URI (in which the work continues to integrate support for non-latin based languages, such as Mandarin, etc... Just to be pedantic, URIs (RFC 3986) are in pure US-ASCII. IRIs (RFC 3987) are in Unicode and are accepted by Atom (so, Atom's URIs seem to be actually IRIs). The standard says:
# Unconstrained; it's not entirely clear how IRI fit into
# xsd:anyURI so let's not try to constrain it here
atomUri = text
if I understand things correctly, full support for Mandarin Chinese-based domains is not far off (speaking in terms of DNS support and such). It is quite old, RFC 3490 (issued three years ago and implemented even before). email addresses encoded as mentioned There is not yet any standard for Unicode email addresses (work is going on, see the very recent IETF Working Group EAI http://www.ietf.org/html.charters/eai-charter.html). -- M:D/ M. David Peterson http://www.xsltblog.com/
Re: Atom syndication schema
Wednesday, March 15, 2006, 3:21:08 AM, Martin Duerst wrote: For atom:uri and atom:email at least, not having xml:lang may be seen as a feature. The spec says that Any element defined by this specification MAY have an xml:lang attribute. We chose to limit the effects of xml:lang, rather than the occurrence of it. Eg: atom:published is allowed xml:lang, even though it is meaningless. The spec includes a sentence about element xxx being Language-Sensitive when we consider the language to be relevant. The idea is, if a feed reading framework such as Microsoft's Windows/IE7 feed platform doesn't preserve xml:lang on elements that aren't Language-Sensitive, then they are doing nothing wrong. Same for, eg: an Atom publishing server backed by a legacy CMS. While these often contain pieces from one language or another, they are not really in a language. I agree. Note that this is the case in Atom, because those two elements are not Language-Sensitive. Also note, that atom:uri is an IRI-reference, so it is affected by any xml:base attributes on that element. And that atomCommonAttributes also covers extension attributes, which are also allowed anywhere. They are undefined, which *I* think means that implementations need not feel bad about dropping them on the floor. The official meaning is, er, undefined. -- Dave
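The point about xml:base can be sketched in a few lines of Python. The example assumes the Atom namespace http://www.w3.org/2005/Atom, uses a made-up author fragment with example.org addresses, and resolves references with urllib.parse.urljoin, which follows URI resolution rules and passes non-ASCII characters through as plain strings; it only looks at xml:base on the author element and on atom:uri itself, not on more distant ancestors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree expands the reserved "xml:" prefix to this namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical Person construct; the names and addresses are illustrative only.
SAMPLE = """<author xmlns="http://www.w3.org/2005/Atom"
        xml:base="http://example.org/people/">
  <name>Example Author</name>
  <uri xml:base="staff/">alice</uri>
</author>"""


def resolve_author_uri(author_xml):
    """Resolve the content of atom:uri against the xml:base values in scope."""
    author = ET.fromstring(author_xml)
    uri = author.find(ATOM_NS + "uri")
    # Start from the base on the author element, then apply the base on
    # atom:uri itself, which may be relative to the outer one.
    base = author.get(XML_BASE, "")
    base = urljoin(base, uri.get(XML_BASE, ""))
    return urljoin(base, (uri.text or "").strip())


if __name__ == "__main__":
    print(resolve_author_uri(SAMPLE))
```

A consumer that drops xml:base from atom:uri, or from any ancestor, would therefore change what a relative reference in that element points to.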
Re: Atom syndication schema
Also note, that atom:uri is an IRI-reference, so it is affected by any xml:base attributes on that element. Until now, I had no idea this was the case... WOW!!! Amazing the things you can learn around people who know what they're talking about. :) On 3/15/06, David Powell [EMAIL PROTECTED] wrote: Wednesday, March 15, 2006, 3:21:08 AM, Martin Duerst wrote: For atom:uri and atom:email at least, not having xml:lang may be seen as a feature. The spec says that Any element defined by this specification MAY have an xml:lang attribute. We chose to limit the effects of xml:lang, rather than the occurrence of it. Eg: atom:published is allowed xml:lang, even though it is meaningless. The spec includes a sentence about element xxx being Language-Sensitive when we consider the language to be relevant. The idea is, if a feed reading framework such as Microsoft's Windows/IE7 feed platform doesn't preserve xml:lang on elements that aren't Language-Sensitive, then they are doing nothing wrong. Same for, eg: an Atom publishing server backed by a legacy CMS. While these often contain pieces from one language or another, they are not really in a language. I agree. Note that this is the case in Atom, because those two elements are not Language-Sensitive. Also note, that atom:uri is an IRI-reference, so it is affected by any xml:base attributes on that element. And that atomCommonAttributes also covers extension attributes, which are also allowed anywhere. They are undefined, which *I* think means that implementations need not feel bad about dropping them on the floor. The official meaning is, er, undefined. -- Dave -- M:D/ M. David Peterson http://www.xsltblog.com/
Re: Atom syndication schema
2006/3/15, Stephane Bortzmeyer [EMAIL PROTECTED]: Just to be pedantic, URIs (RFC 3986) are in pure US-ASCII. IRIs (RFC 3987) are in Unicode and are accepted by Atom (so, Atom's URIs seem to be actually IRIs). The standard says: Well, not really the standard actually, since the RNC is not normative...
# Unconstrained; it's not entirely clear how IRI fit into
# xsd:anyURI so let's not try to constrain it here
atomUri = text
RFC 3987 says (section 1.2 Applicability): For example, XML schema [XMLSchema] has an explicit type anyURI that includes IRIs and IRI references. Therefore, IRIs and IRI references can be in attributes and elements of type anyURI. So, actually, it seems that the Atom RNC could say atomUri = xs:anyURI. ...or RFC 3987 is wrong... (I didn't check XMLSchema to try to figure it out myself) -- Thomas Broyer
Re: Atom syndication schema
Not sure if this is a known bug, but I just noticed that the RelaxNG grammar doesn't accept atomCommonAttributes (eg xml:lang) on the atom:name and atom:uri and atom:email elements used within Person constructs. -- Dave
Re: Atom syndication schema
At 08:42 06/03/15, David Powell wrote: Not sure if this is a known bug, but I just noticed that the RelaxNG grammar doesn't accept atomCommonAttributes (eg xml:lang) on the atom:name and atom:uri and atom:email elements used within Person constructs. For atom:uri and atom:email at least, not having xml:lang may be seen as a feature. While these often contain pieces from one language or another, they are not really in a language. Regards, Martin.
Re: Atom syndication schema
On 15/3/06 2:21 PM, Martin Duerst [EMAIL PROTECTED] wrote: Not sure if this is a known bug, but I just noticed that the RelaxNG grammar doesn't accept atomCommonAttributes (eg xml:lang) on the atom:name and atom:uri and atom:email elements used within Person constructs. For atom:uri and atom:email at least, not having xml:lang may be seen as a feature. While these often contain pieces from one language or another, they are not really in a language. Since the original discussion I've stumbled across something extra that makes xml:lang relevant for atom:name. Seems that in writing Hungarian names, the pattern is always surname followed by forename - e.g. Bartók Béla, where Béla is the personal name and Bartók is the family name. While common western names (eg. Eric Scheid) would be indexed as Scheid, Eric; a comma is instead simply added between the Hungarian surname and forename, making Hungarian names indistinguishable from other Western-style names. For example: Bartók Béla is indexed as Bartók, Béla. Icelandic names are another game altogether. e.
Re: Atom syndication schema
For Latin-based languages, your point is well taken. For non-latin, it's all about character sets. As long as your character set for any given feed is properly set, it seems to me then all the information necessary to properly decode the email and URI (in which the work continues to integrate support for non-latin based languages, such as Mandarin, etc... if I understand things correctly, full support for Mandarin Chinese-based domains is not far off (speaking in terms of DNS support and such). Actually, the only reason for writing this response was to point out the fact that we are entering a world in which China will continue to play a dominant role in both our online and offline worlds, so beginning to learn as much as possible in terms of how to properly handle URIs and email addresses encoded as mentioned seems like it would be a pretty good idea. Couldn't hurt. :) On 3/14/06, Martin Duerst [EMAIL PROTECTED] wrote: At 08:42 06/03/15, David Powell wrote: Not sure if this is a known bug, but I just noticed that the RelaxNG grammar doesn't accept atomCommonAttributes (eg xml:lang) on the atom:name and atom:uri and atom:email elements used within Person constructs. For atom:uri and atom:email at least, not having xml:lang may be seen as a feature. While these often contain pieces from one language or another, they are not really in a language. Regards, Martin. -- M:D/ M. David Peterson http://www.xsltblog.com/
Re: Atom syndication schema
--On March 15, 2006 4:25:40 PM +1100 Eric Scheid [EMAIL PROTECTED] wrote: Since the original discussion I've stumbled across something extra that makes xml:lang relevant for atom:name. Seems that in writing Hungarian names, the pattern is always surname followed by forename - e.g. Bartók Béla, where Béla is the personal name and Bartók is the family name. Or Margittai Neumann János vs. John von Neumann. It can be more complicated than first/last or last/first. I'm pretty sure that I brought this up and the WG decided to punt. Representing personal names well means starting with X.500 and asking around to see what could be improved. That is well outside the Atom charter. Punting was the right thing to do, but it means that atom:name is minimal. xml:lang isn't enough information to sort out given name and family name. About all you can do with atom:name is print it out. xml:lang could be useful in deciding between Chinese and Japanese variants of a character for names. wunder -- Walter Underwood Principal Software Architect, Autonomy
Re: Atom syndication schema
Norman Walsh wrote: I recall a thread not too long ago about changes to the Atom schema and Uche has pointed out some deficiencies http://copia.ogbuji.net/blog/2006-02-06/Small_fix_ I'd be happy to tweak the schema and try to address these bugs, but does the group have any desire to see a WG endorsed set of fixes published? And if it does, where should they be published? I didn't put my thoughts through the RFC errata process because the main bug I saw was in the non-normative schema, and anyway I don't know that there is a full RELAX NG file corresponding to the full RFC (as opposed to the revision 11 I-D). I figured I might as well host such an RNG, and while at it I might as well bring the RNG a bit more in line with the spec wording. I didn't know whether that was something that could be considered a formal erratum. There are other things I brought up in my Weblog that I do believe are nits in Atom, but I don't think they rise to the level of actual errata. If there is some aspect of my comments that folks do think is worthy of a formal erratum, let me know and I'll do what I can to submit it. -- Uche Ogbuji Fourthought, Inc. http://uche.ogbuji.net http://fourthought.com http://copia.ogbuji.net http://4Suite.org Articles: http://uche.ogbuji.net/tech/publications/
Atom syndication schema
I recall a thread not too long ago about changes to the Atom schema and Uche has pointed out some deficiencies http://copia.ogbuji.net/blog/2006-02-06/Small_fix_ I'd be happy to tweak the schema and try to address these bugs, but does the group have any desire to see a WG endorsed set of fixes published? And if it does, where should they be published? Be seeing you, norm -- Norman Walsh [EMAIL PROTECTED] http://nwalsh.com/ | A man may by custom fortify himself against pain, shame, and suchlike accidents; but as to death, we can experience it but once, and are all apprentices when we come to it. --Montaigne
RE: Atom syndication schema
-Original Message- From: Norman Walsh [mailto:[EMAIL PROTECTED] Sent: Friday, February 10, 2006 1:08 PM To: Atom Syntax Subject: Atom syndication schema I recall a thread not too long ago about changes to the Atom schema and Uche has pointed out some deficiencies http://copia.ogbuji.net/blog/2006-02-06/Small_fix_ I'd be happy to tweak the schema and try to address these bugs, but does the group have any desire to see a WG endorsed set of fixes published? And if it does, where should they be published? Errors in RFCs can be captured as errata by sending a note to the RFC Editor: http://www.rfc-editor.org/errata.html The idea is that they should eventually get taken care of by an update to the RFC. -Scott-
Re: Atom syndication schema
* Norman Walsh [EMAIL PROTECTED] [2006-02-10 19:20]: does the group have any desire to see a WG endorsed set of fixes published? +1, FWIC. Regards, -- Aristotle Pagaltzis // http://plasmasturm.org/