Re: Atom syndication schema

2006-03-19 Thread Martin Duerst


At 18:49 06/03/17, Bjoern Hoehrmann wrote:

* Martin Duerst wrote:
When looking with a microscope, you will find some little
differences, because xs:anyURI was described before the IRI
spec (RFC 3987) was approved. These differences are:

1) xs:anyURI also allows spaces and a few other ASCII characters
that are allowed in neither URIs nor IRIs (but the IRI spec has
an escape hatch for such cases).
2) The IRI spec contains many more details than the xs:anyURI
description, in particular also some requirements re.
normalization. However, some of the requirements in this
area of the IRI spec may be lowered or removed in the future
because we have received feedback from implementers that
these are difficult to implement.

I agree with Martin that it would be incorrect to use xsd:anyURI here.

Sorry, but I never said that it would be incorrect to use
xsd:anyURI. I personally think that it should be okay to
use xsd:anyURI. The differences are microscopic, and they should
become even smaller, or hopefully go away completely, over time.
It does not make sense to perpetuate minor differences for
something that was and is supposed to be one and the same
thing.

Regards, Martin.



Datatype for IRIs in RELAX NG (was: Re: Atom syndication schema)

2006-03-19 Thread Henri Sivonen


(Discussion started on atom-syntax, but this is a more general RELAX  
NG issue, so cross-posting to rng-users.)


On Mar 19, 2006, at 09:33, Martin Duerst wrote:


At 18:49 06/03/17, Bjoern Hoehrmann wrote:

* Martin Duerst wrote:
When looking with a microscope, you will find some little
differences, because xs:anyURI was described before the IRI
spec (RFC 3987) was approved. These differences are:

1) xs:anyURI also allows spaces and a few other ASCII characters
that are allowed in neither URIs nor IRIs (but the IRI spec has
an escape hatch for such cases).
2) The IRI spec contains many more details than the xs:anyURI
description, in particular also some requirements re.
normalization. However, some of the requirements in this
area of the IRI spec may be lowered or removed in the future
because we have received feedback from implementers that
these are difficult to implement.

I agree with Martin that it would be incorrect to use xsd:anyURI  
here.


Sorry, but I never said that it would be incorrect to use
xsd:anyURI. I personally think that it should be okay to
use xsd:anyURI. The differences are microscopic, and they should
become even smaller, or hopefully go away completely, over time.


I need datatypes for IRIs in general (relative, absolute or just  
fragment identifiers) and for absolute IRIs (possibly with a fragment  
id) in a RELAX NG schema.


Is it really the best practice to use xsd:anyURI and sweep the  
discrepancies under the rug in the hope that future definitions of  
xsd:anyURI change the meaning of the schema later? Can xsd:anyURI be  
augmented with a regexp pattern to restrict spaces and a few other  
ASCII characters in such a way that the resulting datatype  
restriction matches the definition of IRI? Has anyone implemented a  
strictly correct IRI datatype in a Java datatype library (for Jing  
and MSV)?


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [rng-users] Datatype for IRIs in RELAX NG (was: Re: Atom syndication schema)

2006-03-19 Thread John Cowan

Henri Sivonen scripsit:

 Is it really the best practice to use xsd:anyURI and sweep the  
 discrepancies under the rug in the hope that future definitions of  
 xsd:anyURI change the meaning of the schema later? Can xsd:anyURI be  
 augmented with a regexp pattern to restrict spaces and a few other  
 ASCII characters in such a way that the resulting datatype  
 restriction matches the definition of IRI? Has anyone implemented a  
 strictly correct IRI datatype in a Java datatype library (for Jing  
 and MSV)?

It's certainly possible to construct a regular expression, a long and complex
one, that will match all IRIs and only IRIs (note that IRI by itself
means absolute IRI with or without fragment identifier).  The question
is whether it's really worth doing so.  If you feel you need it,
by all means go ahead.
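As a rough illustration of what such a pattern involves, the Python sketch below (an assumption for illustration, not a datatype anyone in this thread published) accepts strings that look like absolute IRIs: a scheme, a colon, at most one "#", well-formed percent escapes, and none of the characters RFC 3987 excludes. It deliberately skips most of the RFC 3987 grammar (authority syntax, IPv6 literals, private-use and bidi rules), which is where the real length and complexity come from. The names _EXCLUDED, ABSOLUTE_IRI and looks_like_absolute_iri are hypothetical.

```python
import re

# Characters never allowed in an IRI: C0 controls and space, the printable
# ASCII characters < > " { } | \ ^ `, DEL and the C1 controls.
_EXCLUDED = r'\x00-\x20<>"{}|\\^`\x7f-\x9f'

# Simplified absolute IRI: scheme ":" rest, with an optional single fragment.
# This approximates the RFC 3987 ABNF; it is not the full grammar.
ABSOLUTE_IRI = re.compile(
    r"[A-Za-z][A-Za-z0-9+.\-]*:"       # scheme and colon
    rf"[^{_EXCLUDED}#]*"               # hierarchical part and query, no '#'
    rf"(?:#[^{_EXCLUDED}#]*)?"         # optional fragment, no further '#'
)

def looks_like_absolute_iri(s: str) -> bool:
    """Hypothetical check: True if s roughly matches an absolute IRI."""
    if not ABSOLUTE_IRI.fullmatch(s):
        return False
    # Every '%' must introduce a two-hex-digit percent escape.
    return re.search(r"%(?![0-9A-Fa-f]{2})", s) is None
```

For example, "http://example.org/résumé" and "urn:isbn:0451450523" pass, while "/relative/path" (no scheme) and "http://example.org/a b" (space) fail. Accepting IRI references as well would mean making the scheme optional, which is exactly where a simple pattern starts to get long.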

-- 
LEAR: Dost thou call me fool, boy?  John Cowan
FOOL: All thy other titles  http://www.ccil.org/~cowan
 thou hast given away:  [EMAIL PROTECTED]
  That thou wast born with. http://www.ap.org



Re: Atom syndication schema

2006-03-17 Thread Martin Duerst


At 00:42 06/03/17, Norman Walsh wrote:
/ Thomas Broyer [EMAIL PROTECTED] was heard to say:
| RFC 3987 says (section 1.2 Applicability):
|For example, XML schema [XMLSchema] has an explicit type
|anyURI that includes IRIs and IRI references. Therefore, IRIs
|and IRI references can be in attributes and elements of type
|anyURI.

| So, actually, it seems that the Atom RNC could say atomUri = xs:anyURI.

Yes, I think that's the case.

Some details (from the primary editor of RFC 3987 :-):
From a mile high viewpoint, and even from much lower,
it is definitely the case. From the viewpoint of intent, it is
also definitely the case.

When looking with a microscope, you will find some little
differences, because xs:anyURI was described before the IRI
spec (RFC 3987) was approved. These differences are:

1) xs:anyURI also allows spaces and a few other ASCII characters
   that are allowed in neither URIs nor IRIs (but the IRI spec has
   an escape hatch for such cases).
2) The IRI spec contains many more details than the xs:anyURI
   description, in particular also some requirements re.
   normalization. However, some of the requirements in this
   area of the IRI spec may be lowered or removed in the future
   because we have received feedback from implementers that
   these are difficult to implement.
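To make point 1 concrete, here is a minimal Python sketch (not from the thread) that reports the characters an xs:anyURI value may contain but an IRI may not. It assumes the exclusion set is the ten printable US-ASCII characters named in RFC 3987 section 3.1 (the "escape hatch" passage), plus control characters; the function name non_iri_chars is hypothetical.

```python
# Printable US-ASCII characters that are not allowed in URIs or IRIs,
# as enumerated in RFC 3987 section 3.1: space < > " { } | \ ^ `
EXCLUDED_ASCII = set(' <>"{}|\\^`')

def non_iri_chars(value: str) -> list[str]:
    """Return, in order of appearance, the characters in value that an
    IRI may not contain: the printable ASCII characters above, C0
    controls, DEL and C1 controls."""
    bad = []
    for ch in value:
        code = ord(ch)
        if ch in EXCLUDED_ASCII or code < 0x20 or 0x7F <= code <= 0x9F:
            bad.append(ch)
    return bad
```

For instance, non_iri_chars("http://example.org/a b|c") yields the space and the "|". An empty result only means the character repertoire is acceptable; the value can still be syntactically malformed.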

Regards, Martin.



Re: Atom syndication schema

2006-03-17 Thread Bjoern Hoehrmann

* Martin Duerst wrote:
When looking with a microscope, you will find some little
differences, because xs:anyURI was described before the IRI
spec (RFC 3987) was approved. These differences are:

1) xs:anyURI also allows spaces and a few other ASCII characters
that are allowed in neither URIs nor IRIs (but the IRI spec has
an escape hatch for such cases).
2) The IRI spec contains many more details than the xs:anyURI
description, in particular also some requirements re.
normalization. However, some of the requirements in this
area of the IRI spec may be lowered or removed in the future
because we have received feedback from implementers that
these are difficult to implement.

I agree with Martin that it would be incorrect to use xsd:anyURI here.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: Atom syndication schema

2006-03-16 Thread Norman Walsh
/ Thomas Broyer [EMAIL PROTECTED] was heard to say:
| RFC 3987 says (section 1.2 Applicability):
|For example, XML schema [XMLSchema] has an explicit type
|anyURI that includes IRIs and IRI references. Therefore, IRIs
|and IRI references can be in attributes and elements of type
|anyURI.

| So, actually, it seems that the Atom RNC could say atomUri = xs:anyURI.

Yes, I think that's the case.

Be seeing you,
  norm

-- 
Norman Walsh [EMAIL PROTECTED] | Science is organized knowledge. Wisdom
http://nwalsh.com/             | is organized life. --Immanuel Kant




Re: Atom syndication schema

2006-03-16 Thread Uche Ogbuji


David Powell wrote:

Not sure if this is a known bug, but I just noticed that the RelaxNG
grammar doesn't accept atomCommonAttributes (eg xml:lang) on the
atom:name and atom:uri and atom:email elements used within
Person constructs.


Did you cc me because of my coverage of the matter?

http://copia.ogbuji.net/blog/2006-02-06/Small_fix_

If so, I think I said all I have to say about it there.  My fixed RNG is 
still available.



--
Uche Ogbuji   Fourthought, Inc.
http://uche.ogbuji.net    http://fourthought.com
http://copia.ogbuji.net   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/



Re: Atom syndication schema

2006-03-16 Thread David Powell


Thursday, March 16, 2006, 7:31:08 PM, you wrote:

 David Powell wrote:
 Not sure if this is a known bug, but I just noticed that the RelaxNG
 grammar doesn't accept atomCommonAttributes (eg xml:lang) on the
 atom:name and atom:uri and atom:email elements used within
 Person constructs.

 Did you cc me because of my coverage of the matter?

 http://copia.ogbuji.net/blog/2006-02-06/Small_fix_

 If so, I think I said all I have to say about it there.  My fixed RNG is
 still available.

Er, I hadn't read the thread properly, and posted to it having
independently discovered the same bug when I was doing some hard-core
XSLT-ing of Atom.  Doh.

Could you post the errata to the rfc-editor, via:
http://www.rfc-editor.org/errata.html


-- 
Dave



Re: Atom syndication schema

2006-03-15 Thread Stephane Bortzmeyer

On Tue, Mar 14, 2006 at 10:23:55PM -0800,
 Walter Underwood [EMAIL PROTECTED] wrote 
 a message of 29 lines which said:

 xml:lang isn't enough information to sort out given name and family
 name.

Yes! In many countries, the order has changed over time. In France, a
century ago, "Family-name Given-name" was common in official papers,
but it is now deprecated. In Algeria, a former French colony, both
usages are still common (Matoub Lounes or Lounes Matoub?)



Re: Atom syndication schema

2006-03-15 Thread Stephane Bortzmeyer

On Tue, Mar 14, 2006 at 10:36:36PM -0700,
 M. David Peterson [EMAIL PROTECTED] wrote 
 a message of 43 lines which said:

 As long as your character set for any given feed is properly set, it
 seems to me then all the information necessary to properly decode
 the email and URI (in which the work continues to integrate support
 for non-latin based languages, such as Mandarin, etc...

Just to be pedantic, URIs (RFC 3986) are in pure US-ASCII. IRIs (RFC
3987) are in Unicode and are accepted by Atom (so, Atom's URIs seem to
be actually IRIs). The standard says:

   # Unconstrained; it's not entirely clear how IRI fit into
   # xsd:anyURI so let's not try to constrain it here
   atomUri = text

 if I understand things correctly, full support for Mandarin
 Chinese-based domains is not far off (speaking in terms of DNS
 support and such).

It is quite old: RFC 3490 was issued three years ago and was
implemented even before that.

 email addresses encoded as mentioned

There is not yet any standard for Unicode email addresses (work is
going on, see the very recent IETF Working Group EAI
http://www.ietf.org/html.charters/eai-charter.html).
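The practical link between IRIs (RFC 3987) and pure-ASCII URIs (RFC 3986) is the IRI-to-URI mapping in RFC 3987 section 3.1: non-ASCII characters are converted to UTF-8 and percent-encoded, and the host may instead be converted with IDNA ToASCII (RFC 3490). The Python sketch below is one way to do that under stated assumptions: it relies on Python's built-in "idna" codec (which implements RFC 3490), assumes the IRI has no userinfo part, and the name iri_to_uri is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Every printable US-ASCII character except space is kept as-is, so
# existing %XX escapes survive and only non-ASCII text gets encoded.
_KEEP = "".join(chr(c) for c in range(0x21, 0x7F))

def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI: IDNA ToASCII for the host, UTF-8
    percent-encoding for path, query and fragment.
    Sketch only: assumes no userinfo and a host the idna codec accepts."""
    parts = urlsplit(iri)
    netloc = parts.netloc
    if parts.hostname:
        netloc = parts.hostname.encode("idna").decode("ascii")
        if parts.port is not None:
            netloc += f":{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_KEEP),
        quote(parts.query, safe=_KEEP),
        quote(parts.fragment, safe=_KEEP),
    ))
```

With this sketch, "http://中国.example/résumé" maps to "http://xn--fiqs8s.example/r%C3%A9sum%C3%A9". Percent-encoding the host instead of using IDNA is also permitted by RFC 3987, but IDNA is what DNS resolvers expect.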



Re: Atom syndication schema

2006-03-15 Thread M. David Peterson

Excellent!  Thanks for the info :)

I will have to go back and see exactly what the article I read was
referring to, but if I remember correctly it mentioned that support
for the top-level domains, specifically .com, .net, and .cn, had not
been implemented using the Mandarin character set.

Either way, there are obviously a lot of considerations that need to
be allowed for when developing from an international perspective,
something that should be part of the default process instead of the
exception... Of course, I say this mainly because this is really the
first time I have put any real thought into this area, and obviously I
need to put in a lot more.

Thanks again for clearing things up and providing the links!




--
M:D/

M. David Peterson
http://www.xsltblog.com/



Re: Atom syndication schema

2006-03-15 Thread David Powell


Wednesday, March 15, 2006, 3:21:08 AM, Martin Duerst wrote:

 For atom:uri and atom:email at least, not having xml:lang may
 be seen as a feature.

The spec says that "Any element defined by this specification MAY have
an xml:lang attribute". We chose to limit the effects of xml:lang
rather than its occurrence. E.g., atom:published is allowed
xml:lang, even though it is meaningless there. The spec includes a sentence
saying that element xxx is "Language-Sensitive" when we consider the
language to be relevant. The idea is that if a feed-reading framework such
as Microsoft's Windows/IE7 feed platform doesn't preserve xml:lang on
elements that aren't Language-Sensitive, it is doing nothing
wrong. The same goes for, e.g., an Atom publishing server backed by a legacy CMS.

 While these often contain pieces from one language or another, they
 are not really in a language.

I agree. Note that this is the case in Atom, because those two
elements are not Language-Sensitive.
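Because xml:lang is inherited, a consumer that honours it only on Language-Sensitive content still has to compute the in-scope language for those elements. Here is a minimal Python sketch using the standard library's ElementTree; the set of element names is supplied by the caller as an assumption (it is not a list taken from RFC 4287), and languages_for is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def languages_for(root: ET.Element, sensitive_tags: set[str]) -> list[tuple[str, str]]:
    """Return (tag, in-scope xml:lang) for each element whose tag is in
    sensitive_tags. xml:lang on any other element matters only through
    inheritance; "" means the language is unknown."""
    found = []

    def walk(elem: ET.Element, lang: str) -> None:
        # An element's own xml:lang overrides the inherited value.
        lang = elem.get(XML_LANG, lang)
        if elem.tag in sensitive_tags:
            found.append((elem.tag, lang))
        for child in elem:
            walk(child, lang)

    walk(root, "")
    return found
```

In a feed with xml:lang="en" on atom:feed and xml:lang="fr" on an atom:entry, passing only the atom:title tag reports "en" for the feed title and "fr" for the entry title, while an xml:lang on atom:published is simply never consulted.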


Also note that atom:uri is an IRI reference, so it is affected by any
xml:base attribute on that element (or on its ancestors).

And note that atomCommonAttributes also covers extension attributes, which
are also allowed anywhere. They are undefined, which *I* think means
that implementations need not feel bad about dropping them on the
floor. The official meaning is, er, undefined.
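Since a relative atom:uri value has to be resolved against the in-scope xml:base, including one placed on atom:uri itself, here is a minimal Python sketch using ElementTree and urllib.parse.urljoin. It assumes urljoin's RFC 3986 resolution is acceptable for the IRIs involved and that the caller passes the document's retrieval URI as the starting base; resolved_atom_uris is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM_URI = "{http://www.w3.org/2005/Atom}uri"

def resolved_atom_uris(elem: ET.Element, base: str = ""):
    """Yield every atom:uri value under elem, resolved against xml:base."""
    # xml:base is itself resolved against the inherited base, and it
    # applies to the content of the element that carries it.
    base = urljoin(base, elem.get(XML_BASE, ""))
    if elem.tag == ATOM_URI and elem.text:
        yield urljoin(base, elem.text.strip())
    for child in elem:
        yield from resolved_atom_uris(child, base)
```

With xml:base="http://example.org/blog/" on atom:feed, an author's <uri xml:base="people/">alice</uri> resolves to http://example.org/blog/people/alice, and a contributor's <uri>/bob</uri> to http://example.org/bob.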


-- 
Dave



Re: Atom syndication schema

2006-03-15 Thread M. David Peterson

 Also note that atom:uri is an IRI-reference, so it is affected by any
xml:base attributes on that element.

Until now, I had no idea this was the case... WOW!!! Amazing the
things you can learn around people who know what they're talking about.
:)






--
M:D/

M. David Peterson
http://www.xsltblog.com/



Re: Atom syndication schema

2006-03-15 Thread Thomas Broyer

2006/3/15, Stephane Bortzmeyer [EMAIL PROTECTED]:
 Just to be pedantic, URIs (RFC 3986) are in pure US-ASCII. IRIs (RFC
 3987) are in Unicode and are accepted by Atom (so, Atom's URIs seem to
 be actually IRIs). The standard says:

Well, not really the standard actually, since the RNC is not normative...

# Unconstrained; it's not entirely clear how IRI fit into
# xsd:anyURI so let's not try to constrain it here
atomUri = text

RFC 3987 says (section 1.2 Applicability):
   For example, XML schema [XMLSchema] has an explicit type
   anyURI that includes IRIs and IRI references. Therefore, IRIs
   and IRI references can be in attributes and elements of type
   anyURI.

So, actually, it seems that the Atom RNC could say atomUri = xs:anyURI.

...or RFC 3987 is wrong... (I didn't check XMLSchema to try to figure
it out myself)

--
Thomas Broyer



Re: Atom syndication schema

2006-03-14 Thread David Powell


Not sure if this is a known bug, but I just noticed that the RelaxNG
grammar doesn't accept atomCommonAttributes (eg xml:lang) on the
atom:name and atom:uri and atom:email elements used within
Person constructs.

-- 
Dave



Re: Atom syndication schema

2006-03-14 Thread Martin Duerst


At 08:42 06/03/15, David Powell wrote:


Not sure if this is a known bug, but I just noticed that the RelaxNG
grammar doesn't accept atomCommonAttributes (eg xml:lang) on the
atom:name and atom:uri and atom:email elements used within
Person constructs.

For atom:uri and atom:email at least, not having xml:lang may
be seen as a feature. While these often contain pieces from one
language or another, they are not really in a language.

Regards,   Martin. 



Re: Atom syndication schema

2006-03-14 Thread Eric Scheid

On 15/3/06 2:21 PM, Martin Duerst [EMAIL PROTECTED] wrote:

 Not sure if this is a known bug, but I just noticed that the RelaxNG
 grammar doesn't accept atomCommonAttributes (eg xml:lang) on the
 atom:name and atom:uri and atom:email elements used within
 Person constructs.
 
 For atom:uri and atom:email at least, not having xml:lang may
 be seen as a feature. While these often contain pieces from one
 language or another, they are not really in a language.

Since the original discussion I've stumbled across something extra that
makes xml:lang relevant for atom:name.

Seems that in writing Hungarian names, the pattern is always surname
followed by forename - e.g. Bartók Béla, where Béla is the personal name and
Bartók is the family name.

Whereas a common Western name (e.g. Eric Scheid) would be indexed as
"Scheid, Eric", a Hungarian name is indexed by simply adding a comma
between surname and forename, which makes indexed Hungarian names
indistinguishable from indexed Western-style names. For example: Bartók
Béla is indexed as "Bartók, Béla".

Icelandic names are another game altogether.

e.




Re: Atom syndication schema

2006-03-14 Thread M. David Peterson

For Latin-based languages, your point is well taken. For non-Latin
languages, it's all about character sets. As long as the character set
for any given feed is properly set, it seems to me that all the
information necessary to properly decode the email address and URI is
there. (Work continues to integrate support for non-Latin languages
such as Mandarin; if I understand things correctly, full support for
Mandarin Chinese-based domains is not far off, speaking in terms of
DNS support and such.)

Actually, the only reason for writing this response was to point out
that we are entering a world in which China will continue to play a
dominant role in both our online and offline worlds, so beginning to
learn as much as possible about how to properly handle URIs and email
addresses encoded as mentioned seems like it would be a pretty good
idea.

Couldn't hurt. :)





--
M:D/

M. David Peterson
http://www.xsltblog.com/



Re: Atom syndication schema

2006-03-14 Thread Walter Underwood

--On March 15, 2006 4:25:40 PM +1100 Eric Scheid [EMAIL PROTECTED] wrote:

 Since the original discussion I've stumbled across something extra that
 makes xml:lang relevant for atom:name.
 
 Seems that in writing Hungarian names, the pattern is always surname
 followed by forename - e.g. Bartók Béla, where Béla is the personal name and
 Bartók is the family name.

Or Margittai Neumann János vs. John von Neumann. It can be more complicated
than first/last or last/first.

I'm pretty sure that I brought this up and the WG decided to punt.

Representing personal names well means starting with X.500 and asking
around to see what could be improved. That is well outside the Atom charter.
Punting was the right thing to do, but it means that atom:name is minimal.

xml:lang isn't enough information to sort out given name and family name.
About all you can do with atom:name is print it out.

xml:lang could be useful in deciding between Chinese and Japanese variants
of a character for names. 

wunder
--
Walter Underwood
Principal Software Architect, Autonomy



Re: Atom syndication schema

2006-02-14 Thread Uche Ogbuji

Norman Walsh wrote:
 I recall a thread not too long ago about changes to the Atom schema
 and Uche has pointed out some deficiencies

   http://copia.ogbuji.net/blog/2006-02-06/Small_fix_

 I'd be happy to tweak the schema and try to address these bugs, but
 does the group have any desire to see a WG endorsed set of fixes
 published? And if it does, where should they be published?
   

I didn't put my thoughts through the RFC errata process because the main
bug I saw was in the non-normative schema, and anyway I don't know that
there is a full RELAX NG file corresponding to the full RFC (as opposed
to the revision 11 I-D).  I figured I might as well host such an RNG,
and while at it I might as well bring the RNG a bit more in line with
the spec wording.  I didn't know whether that was something that could
be considered a formal erratum.

The other things I brought up in my Weblog I do believe are nits in
Atom, but I don't think they rise to the level of actual errata.

If there is some aspect of my comments that folks do think is worthy of
a formal erratum, let me know and I'll do what I can to submit it.


-- 
Uche Ogbuji   Fourthought, Inc.
http://uche.ogbuji.net    http://fourthought.com
http://copia.ogbuji.net   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/



Atom syndication schema

2006-02-10 Thread Norman Walsh
I recall a thread not too long ago about changes to the Atom schema
and Uche has pointed out some deficiencies

  http://copia.ogbuji.net/blog/2006-02-06/Small_fix_

I'd be happy to tweak the schema and try to address these bugs, but
does the group have any desire to see a WG endorsed set of fixes
published? And if it does, where should they be published?

Be seeing you,
  norm

-- 
Norman Walsh [EMAIL PROTECTED] | A man may by custom fortify himself
http://nwalsh.com/             | against pain, shame, and suchlike
  | accidents; but as to death, we can
  | experience it but once, and are all
  | apprentices when we come to it.--
  | Montaigne




RE: Atom syndication schema

2006-02-10 Thread Scott Hollenbeck

 -Original Message-
 From: Norman Walsh [mailto:[EMAIL PROTECTED] 
 Sent: Friday, February 10, 2006 1:08 PM
 To: Atom Syntax
 Subject: Atom syndication schema
 
 I recall a thread not too long ago about changes to the Atom 
 schema and Uche has pointed out some deficiencies
 
   http://copia.ogbuji.net/blog/2006-02-06/Small_fix_
 
 I'd be happy to tweak the schema and try to address these 
 bugs, but does the group have any desire to see a WG 
 endorsed set of fixes published? And if it does, where 
 should they be published?

Errors in RFCs can be captured as errata by sending a note to the RFC
Editor:

http://www.rfc-editor.org/errata.html

The idea is that they should eventually get taken care of by an update to
the RFC.

-Scott-



Re: Atom syndication schema

2006-02-10 Thread A. Pagaltzis

* Norman Walsh [EMAIL PROTECTED] [2006-02-10 19:20]:
does the group have any desire to see a WG endorsed set of
fixes published?

+1, FWIC.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/