Re: Atom syndication schema

2006-03-14 Thread Walter Underwood

--On March 15, 2006 4:25:40 PM +1100 Eric Scheid <[EMAIL PROTECTED]> wrote:

> Since the original discussion I've stumbled across something extra that
> makes xml:lang relevant for atom:name.
> 
> Seems that in writing Hungarian names, the pattern is always surname
> followed by forename - e.g. Bartók Béla, where Béla is the personal name and
> Bartók is the family name.

Or Margittai Neumann János vs. John von Neumann. It can be more complicated
than first/last or last/first.

I'm pretty sure that I brought this up and the WG decided to punt.

Representing personal names well means starting with X.500 and asking
around to see what could be improved. That is well outside the Atom charter.
Punting was the right thing to do, but it means that atom:name is minimal.

xml:lang isn't enough information to sort out given name and family name.
About all you can do with atom:name is print it out.

xml:lang could be useful in deciding between Chinese and Japanese variants
of a character for names. 

wunder
--
Walter Underwood
Principal Software Architect, Autonomy



Re: Atom syndication schema

2006-03-14 Thread M. David Peterson

Hmmm... some interesting points you've brought out.

Q: Does anyone know how many of the languages that fall into a similar
category can be identified by character set alone? I realize that
there's no way every single case could be covered using the
character set alone, so this is an area that needs
considerable thought.

Unless, of course, someone already knows the answer. Or, better said:
how does one properly handle the various situations where the character
set in and of itself doesn't provide enough information? Can a parent
element with an xml:lang attribute be enough? It seems that in the
case of name, email and uri, the containing author element (which, as
long as I'm not mistaken, does allow xml:lang) should be enough to make
the assumption that the child elements should also be treated as
being in the language specified in the value of this attribute.
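
A minimal sketch of that inheritance assumption, in Python with the
standard-library ElementTree. As far as I understand the XML spec, an
xml:lang value applies to an element's children unless a child overrides
it, so the effective language of atom:name can be found by walking down
from the root. The feed fragment and the helper name effective_langs are
made up for illustration; they are not from the Atom spec or the schema.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ATOM = "{http://www.w3.org/2005/Atom}"

def effective_langs(elem, inherited=None):
    """Yield (element, language) pairs, passing xml:lang down to children.

    Hypothetical helper: an element's own xml:lang wins, otherwise it
    takes whatever language its parent had.
    """
    lang = elem.get(XML_LANG, inherited)
    yield elem, lang
    for child in elem:
        yield from effective_langs(child, lang)

# Illustrative fragment only (not a complete, valid Atom feed).
FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <author xml:lang="hu">
    <name>Bartók Béla</name>
  </author>
</feed>"""

root = ET.fromstring(FEED)
langs = {elem.tag: lang for elem, lang in effective_langs(root)}
print(langs[ATOM + "name"])
```

Here atom:name carries no xml:lang of its own, so it picks up the value
from the enclosing atom:author rather than from the feed.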

The only element (I think) that might be of concern is the name
element, as the email and URI should be handled with the character
encoding.

Does any of this even sound remotely on target?

On 3/14/06, Eric Scheid <[EMAIL PROTECTED]> wrote:
>
> On 15/3/06 2:21 PM, "Martin Duerst" <[EMAIL PROTECTED]> wrote:
>
> >> Not sure if this is a known bug, but I just noticed that the RelaxNG
> >> grammar doesn't accept "atomCommonAttributes" (eg xml:lang) on the
> >> "atom:name" and "atom:uri" and "atom:email" elements used within
> >> Person constructs.
> >
> > For atom:uri and atom:email at least, not having xml:lang may
> > be seen as a feature. While these often contain pieces from one
> > language or another, they are not really in a language.
>
> Since the original discussion I've stumbled across something extra that
> makes xml:lang relevant for atom:name.
>
> Seems that in writing Hungarian names, the pattern is always surname
> followed by forename - e.g. Bartók Béla, where Béla is the personal name and
> Bartók is the family name.
>
> While common western names (eg. Eric Scheid) would be indexed as Scheid,
> Eric; a comma is instead simply added between the Hungarian surname and
> forename, making Hungarian names indistinguishable from other Western-style
> names. For example: Bartók Béla is indexed as Bartók, Béla.
>
> Icelandic names are another game altogether.
>
> e.
>
>
>


--


M. David Peterson
http://www.xsltblog.com/



Re: Atom syndication schema

2006-03-14 Thread M. David Peterson

For Latin-based languages, your point is well taken. For non-Latin
languages, it's all about character sets. As long as the character set
for any given feed is properly set, it seems to me that all the
information necessary to properly decode the email and URI is there.
Work continues on integrating support for non-Latin-based languages
such as Mandarin; if I understand things correctly, full support for
Mandarin Chinese-based domains is not far off (speaking in terms of
DNS support and such).
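
To make the DNS side of that concrete, here is a rough sketch using
Python's built-in "idna" codec, which converts a non-ASCII host name to
the ASCII form that is actually looked up. The host name 中国.example is
a made-up illustration, and the codec implements the older IDNA 2003
rules, so treat this as a sketch rather than a statement of how any
particular registry handles Chinese domains.

```python
# Hypothetical host name, used only for illustration.
host = "中国.example"

# The built-in "idna" codec encodes each non-ASCII label into an
# ASCII-compatible form carrying the "xn--" prefix.
ascii_host = host.encode("idna")
print(ascii_host)

# Decoding the ASCII form gives the original host name back.
round_trip = ascii_host.decode("idna")
print(round_trip)
```

The feed's character encoding only governs how the characters are read
from the document; the conversion above is a separate step that applies
to the host part of a URI or email address.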

Actually, the only reason for writing this response was to point out
that we are entering a world in which China will continue to
play a dominant role both online and offline, so
starting to learn as much as possible about how to properly
handle URIs and email addresses encoded as mentioned seems like
a pretty good idea.

Couldn't hurt. :)

On 3/14/06, Martin Duerst <[EMAIL PROTECTED]> wrote:
>
> At 08:42 06/03/15, David Powell wrote:
>  >
>  >
>  >Not sure if this is a known bug, but I just noticed that the RelaxNG
>  >grammar doesn't accept "atomCommonAttributes" (eg xml:lang) on the
>  >"atom:name" and "atom:uri" and "atom:email" elements used within
>  >Person constructs.
>
> For atom:uri and atom:email at least, not having xml:lang may
> be seen as a feature. While these often contain pieces from one
> language or another, they are not really in a language.
>
> Regards,   Martin.
>
>


--


M. David Peterson
http://www.xsltblog.com/



Re: Atom syndication schema

2006-03-14 Thread Eric Scheid

On 15/3/06 2:21 PM, "Martin Duerst" <[EMAIL PROTECTED]> wrote:

>> Not sure if this is a known bug, but I just noticed that the RelaxNG
>> grammar doesn't accept "atomCommonAttributes" (eg xml:lang) on the
>> "atom:name" and "atom:uri" and "atom:email" elements used within
>> Person constructs.
> 
> For atom:uri and atom:email at least, not having xml:lang may
> be seen as a feature. While these often contain pieces from one
> language or another, they are not really in a language.

Since the original discussion I've stumbled across something extra that
makes xml:lang relevant for atom:name.

Seems that in writing Hungarian names, the pattern is always surname
followed by forename - e.g. Bartók Béla, where Béla is the personal name and
Bartók is the family name.

While common western names (eg. Eric Scheid) would be indexed as Scheid,
Eric; a comma is instead simply added between the Hungarian surname and
forename, making Hungarian names indistinguishable from other Western-style
names. For example: Bartók Béla is indexed as Bartók, Béla.
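
A rough sketch of the indexing rule described above, in Python. It
assumes a two-part name and assumes that an xml:lang of "hu" means the
family name is written first; both assumptions are mine, and the names
FAMILY_NAME_FIRST and index_form are made up for illustration.

```python
# Assumption: languages whose names are written family name first.
# Only Hungarian is listed because it is the only case discussed here.
FAMILY_NAME_FIRST = {"hu"}

def index_form(name, lang):
    """Return 'Family, Given' for a two-part name, else the name unchanged."""
    parts = name.split()
    if len(parts) != 2:
        # Anything beyond given + family is outside what this sketch handles.
        return name
    # Compare on the primary language subtag, e.g. "hu" from "hu-HU".
    primary = lang.split("-")[0].lower() if lang else ""
    if primary in FAMILY_NAME_FIRST:
        family, given = parts
    else:
        given, family = parts
    return f"{family}, {given}"

print(index_form("Eric Scheid", "en"))
print(index_form("Bartók Béla", "hu"))
```

Without the language hint, "Bartók Béla" would be treated as a Western
name and indexed under Béla, which is exactly the ambiguity at issue.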

Icelandic names are another game altogether.

e.




Re: Atom syndication schema

2006-03-14 Thread Martin Duerst


At 08:42 06/03/15, David Powell wrote:
>
>
>Not sure if this is a known bug, but I just noticed that the RelaxNG
>grammar doesn't accept "atomCommonAttributes" (eg xml:lang) on the
>"atom:name" and "atom:uri" and "atom:email" elements used within
>Person constructs.

For atom:uri and atom:email at least, not having xml:lang may
be seen as a feature. While these often contain pieces from one
language or another, they are not really in a language.

Regards,   Martin. 



Re: Atom syndication schema

2006-03-14 Thread David Powell


Not sure if this is a known bug, but I just noticed that the RelaxNG
grammar doesn't accept "atomCommonAttributes" (eg xml:lang) on the
"atom:name" and "atom:uri" and "atom:email" elements used within
Person constructs.

-- 
Dave