Re: Atom syndication schema
--On March 15, 2006 4:25:40 PM +1100 Eric Scheid <[EMAIL PROTECTED]> wrote:

> Since the original discussion I've stumbled across something extra that
> makes xml:lang relevant for atom:name.
>
> Seems that in writing Hungarian names, the pattern is always surname
> followed by forename - e.g. Bartók Béla, where Béla is the personal name and
> Bartók is the family name.

Or Margittai Neumann János vs. John von Neumann. It can be more complicated than first/last or last/first.

I'm pretty sure that I brought this up and the WG decided to punt. Representing personal names well means starting with X.500 and asking around to see what could be improved. That is well outside the Atom charter.

Punting was the right thing to do, but it means that atom:name is minimal. xml:lang isn't enough information to sort out given name and family name. About all you can do with atom:name is print it out.

xml:lang could be useful in deciding between Chinese and Japanese variants of a character for names.

wunder
--
Walter Underwood
Principal Software Architect, Autonomy
Re: Atom syndication schema
Hmmm... some interesting points you've brought out.

Q: Does anyone know how many of the languages that fall into a similar category can be determined by character set alone? I realize that there's no way every single case could be covered using the character set alone, so this is an area that needs considerable thought. Unless, of course, someone already knows the answer; or, better said, how does one properly handle the situations where the character set in and of itself doesn't provide enough information?

Can a parent element with an xml:lang attribute be enough? It seems that in the case of name, email and uri, the containing author element, which (as long as I'm not mistaken) does allow xml:lang, should be enough to justify the assumption that the child elements should also be treated as being in the language specified by that attribute. The only element (I think) that might be of concern is the name element, as the email and URI should be handled by the character encoding.

Does any of this even sound remotely on target?

On 3/14/06, Eric Scheid <[EMAIL PROTECTED]> wrote:
>
> On 15/3/06 2:21 PM, "Martin Duerst" <[EMAIL PROTECTED]> wrote:
>
> >> Not sure if this is a known bug, but I just noticed that the RelaxNG
> >> grammar doesn't accept "atomCommonAttributes" (eg xml:lang) on the
> >> "atom:name" and "atom:uri" and "atom:email" elements used within
> >> Person constructs.
> >
> > For atom:uri and atom:email at least, not having xml:lang may
> > be seen as a feature. While these often contain pieces from one
> > language or another, they are not really in a language.
>
> Since the original discussion I've stumbled across something extra that
> makes xml:lang relevant for atom:name.
>
> Seems that in writing Hungarian names, the pattern is always surname
> followed by forename - e.g. Bartók Béla, where Béla is the personal name and
> Bartók is the family name.
>
> While common western names (eg. Eric Scheid) would be indexed as Scheid,
> Eric; a comma is instead simply added between the Hungarian surname and
> forename, making Hungarian names indistinguishable from other Western-style
> names. For example: Bartók Béla is indexed as Bartók, Béla.
>
> Icelandic names are another game altogether.
>
> e.
>
>

--
M. David Peterson
http://www.xsltblog.com/
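On the parent-element question: XML itself defines xml:lang as applying to an element's content, including child elements, unless a descendant overrides it. The sketch below shows how a consumer might resolve the language in scope for each atom:name using only the Python standard library. The feed fragment is a made-up illustration (it omits required Atom elements), and effective_lang is a hypothetical helper, not part of any Atom library.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target):
    """Return the xml:lang in scope for target, or None if none is declared."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parent_of.get(node)
    return None


# Illustrative fragment only: not a complete, valid Atom feed.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <author xml:lang="hu">
    <name>Bartók Béla</name>
  </author>
  <author>
    <name>Eric Scheid</name>
  </author>
</feed>"""

root = ET.fromstring(FEED)
names = root.findall(f".//{{{ATOM_NS}}}name")
langs = [(n.text, effective_lang(root, n)) for n in names]
print(langs)
```

The first name picks up "hu" from its author element; the second falls back to the "en" declared on the feed. This only tells you which language tag is in scope; it says nothing about how the name itself is structured.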
Re: Atom syndication schema
For Latin-based languages, your point is well taken. For non-Latin, it's all about character sets. As long as the character set for any given feed is properly set, it seems to me that all the information necessary to properly decode the email and URI is available. (Work continues on integrating support for non-Latin-based languages such as Mandarin; if I understand things correctly, full support for Mandarin Chinese-based domains is not far off, speaking in terms of DNS support and such.)

Actually, the only reason for writing this response was to point out that we are entering a world in which China will continue to play a dominant role both online and offline, so beginning to learn as much as possible about how to properly handle URIs and email addresses encoded as mentioned seems like a pretty good idea. Couldn't hurt. :)

On 3/14/06, Martin Duerst <[EMAIL PROTECTED]> wrote:
>
> At 08:42 06/03/15, David Powell wrote:
> >
> >
> >Not sure if this is a known bug, but I just noticed that the RelaxNG
> >grammar doesn't accept "atomCommonAttributes" (eg xml:lang) on the
> >"atom:name" and "atom:uri" and "atom:email" elements used within
> >Person constructs.
>
> For atom:uri and atom:email at least, not having xml:lang may
> be seen as a feature. While these often contain pieces from one
> language or another, they are not really in a language.
>
> Regards, Martin.
>
>

--
M. David Peterson
http://www.xsltblog.com/
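As a rough illustration of the domain-name side of this, here is a minimal sketch using Python's built-in "idna" codec, which implements the older IDNA 2003 rules. The hostname is a made-up example under the reserved .example TLD, and to_ascii_host / to_unicode_host are hypothetical helper names. It shows only the host part; the local part of an email address and the path of a URI are handled by different mechanisms.

```python
def to_ascii_host(host: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode-based) form."""
    # The built-in "idna" codec encodes each label separately (IDNA 2003).
    return host.encode("idna").decode("ascii")


def to_unicode_host(ascii_host: str) -> str:
    """Convert an ASCII-encoded hostname back to Unicode."""
    return ascii_host.encode("ascii").decode("idna")


# Hypothetical hostname, used only for illustration.
unicode_host = "中文.example"
ascii_host = to_ascii_host(unicode_host)
round_trip = to_unicode_host(ascii_host)

print(ascii_host)   # first label becomes an "xn--" Punycode label
print(round_trip)   # back to the original Unicode hostname
```

Either form identifies the same host, which fits Martin's point that such values are not really "in a language": the encoding carries everything needed, with or without xml:lang.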
Re: Atom syndication schema
On 15/3/06 2:21 PM, "Martin Duerst" <[EMAIL PROTECTED]> wrote:

>> Not sure if this is a known bug, but I just noticed that the RelaxNG
>> grammar doesn't accept "atomCommonAttributes" (eg xml:lang) on the
>> "atom:name" and "atom:uri" and "atom:email" elements used within
>> Person constructs.
>
> For atom:uri and atom:email at least, not having xml:lang may
> be seen as a feature. While these often contain pieces from one
> language or another, they are not really in a language.

Since the original discussion I've stumbled across something extra that makes xml:lang relevant for atom:name.

Seems that in writing Hungarian names, the pattern is always surname followed by forename - e.g. Bartók Béla, where Béla is the personal name and Bartók is the family name.

While common western names (eg. Eric Scheid) would be indexed as Scheid, Eric; a comma is instead simply added between the Hungarian surname and forename, making Hungarian names indistinguishable from other Western-style names. For example: Bartók Béla is indexed as Bartók, Béla.

Icelandic names are another game altogether.

e.
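To make the indexing rule concrete, here is a small sketch of how a consumer might use xml:lang to choose where the comma goes. Everything in it is an assumption for illustration: the SURNAME_FIRST table, the index_form helper, and the idea that a two-way split on whitespace is adequate. It is not something the Atom format defines.

```python
# Hypothetical table of primary language subtags whose names are written
# family name first. Only Hungarian is listed, since that is the example
# discussed here; a real table would need far more care.
SURNAME_FIRST = {"hu"}


def index_form(name, lang=None):
    """Return a 'Family, Given' index entry, guessing name order from lang."""
    primary = (lang or "").split("-")[0].lower()
    parts = name.split()
    if len(parts) < 2:
        return name  # a single token cannot be split into family and given
    if primary in SURNAME_FIRST:
        # Family name is written first: split after the first token.
        family, given = name.split(" ", 1)
    else:
        # Family name is written last: split before the last token.
        given, family = name.rsplit(" ", 1)
    return f"{family}, {given}"


print(index_form("Bartók Béla", "hu"))
print(index_form("Eric Scheid", "en"))
```

Both calls yield the same "Family, Given" shape, which is the point about indexed Hungarian names looking like Western ones. The heuristic is fragile: multi-part names such as "John von Neumann" or "Margittai Neumann János" are split in questionable places, and patronymic systems such as Icelandic do not fit the family/given model at all.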
Re: Atom syndication schema
At 08:42 06/03/15, David Powell wrote:
>
>
>Not sure if this is a known bug, but I just noticed that the RelaxNG
>grammar doesn't accept "atomCommonAttributes" (eg xml:lang) on the
>"atom:name" and "atom:uri" and "atom:email" elements used within
>Person constructs.

For atom:uri and atom:email at least, not having xml:lang may be seen as a feature. While these often contain pieces from one language or another, they are not really in a language.

Regards, Martin.
Re: Atom syndication schema
Not sure if this is a known bug, but I just noticed that the RelaxNG grammar doesn't accept "atomCommonAttributes" (eg xml:lang) on the "atom:name" and "atom:uri" and "atom:email" elements used within Person constructs.

--
Dave