from:"Walter Underwood"

Re: Fyi, Apache project proposal

2006-05-23 Thread Walter Underwood



--On May 23, 2006 3:18:18 PM +0200 Ugo Cei <[EMAIL PROTECTED]> wrote:


> Demokritos might be quite well advanced but unfortunately Python code is not
> very well suited for us poor souls who still have to struggle with Java
> environments ;-)


The goal is a reference implementation. The goal is to be exactly correct.
Being in a particular language, or even being fast enough to be usable,
is beside the point. In particular, a reference implementation should
always choose code readability over speed.

If the goal is to have a standard, free implementation that everyone uses,
that is different from a reference implementation and the goals should
say that.

wunder
--
Walter Underwood
Principal Software Architect, Autonomy (Ultraseek)

Re: Atom syndication schema

2006-03-14 Thread Walter Underwood

--On March 15, 2006 4:25:40 PM +1100 Eric Scheid <[EMAIL PROTECTED]> wrote:

> Since the original discussion I've stumbled across something extra that
> makes xml:lang relevant for atom:name.
> 
> Seems that in writing Hungarian names, the pattern is always surname
> followed by forename - e.g. Bartók Béla, where Béla is the personal name and
> Bartók is the family name.

Or Margittai Neumann János vs. John von Neumann. It can be more complicated
than first/last or last/first.

I'm pretty sure that I brought this up and the WG decided to punt.

Representing personal names well means starting with X.500 and asking
around to see what could be improved. That is well outside the Atom charter.
Punting was the right thing to do, but it means that atom:name is minimal.

xml:lang isn't enough information to sort out given name and family name.
About all you can do with atom:name is print it out.

xml:lang could be useful in deciding between Chinese and Japanese variants
of a character for names. 

wunder
--
Walter Underwood
Principal Software Architect, Autonomy

Re: wiki mime type

2006-03-07 Thread Walter Underwood


It isn't "wiki". Those are used in blogs, and I use Markdown for simple
HTML memos.

Don't use "x-", either. Register a real type.

wunder

--On March 7, 2006 5:51:42 PM +0100 Henry Story <[EMAIL PROTECTED]> wrote:

> 
> On 6 Mar 2006, at 18:54, James Tauber wrote:
>> Agreed that this would be very useful and also that it needs to be  
>> done
>> on a per wiki format basis.
> 
> Is there a forum a large number of them tend to hang out on, so that  one 
> could ask them to think about this? What would be the best to do  in the 
> meantime? Something like
> 
> text/x-wiki+textile
> text/x-wiki+markdown
> 
> perhaps?
> 
> 
>> I think, however, that this is something the format creators should be
>> encouraged to register, or at least suggest a convention for.
>> 
>> James
>> 
>> On Mon, 06 Mar 2006 07:59:10 -0800, "Walter Underwood"
>> <[EMAIL PROTECTED]> said:
>>> 
>>> --On March 6, 2006 3:59:39 PM +0100 Henry Story  
>>> <[EMAIL PROTECTED]>
>>> wrote:
>>>> 
>>>> Silly question probably, but is there a wiki mime type?
>>>> I was thinking of "text/wiki" or "text/x-wiki" or something.
>>>> 
>>>> I want people to be able to edit their blogs in wiki format in  
>>>> BlogEd  and be able
>>>> to distinguish when they do that from when they enter  plain  
>>>> text, html or xhtml.
>>>> Perhaps this is also useful for the protocol.
>>> 
>>> It would be really useful, especially for feeds that archive the  
>>> content
>>> of a blog. It would be best to use the official names of the formats,
>>> like
>>> "text/markdown" or "text/textile". The wikis and blogs that I use  
>>> can be
>>> configured to accept different formats, so "text/wiki" doesn't work.
>>> 
>>> wunder
>>> --
>>> Walter Underwood
>>> Principal Software Architect, Autonomy
>>> 
>> -- 
>>   James Tauber   http://jtauber.com/
>>   journeyman of some    http://jtauber.com/blog/
> 
> 



--
Walter Underwood
Principal Software Architect, Autonomy

Re: Atom logo where?

2006-03-06 Thread Walter Underwood

--On March 6, 2006 7:02:23 PM +0100 "A. Pagaltzis" <[EMAIL PROTECTED]> wrote:
>
> For that matter, who has seen Mena Trott’s alternative Atom logo
> design and what do people think about it?

1. I don't see why Atom needs a logo.
2. The proposed logo is probably too close to the Autonomy logo.

I cannot speak for Autonomy lawyers, but companies are faced with 
"defend it or lose it" on their trademarks. Autonomy is in the 
unstructured info business, so there is probably a conflict.

It also looks like the logo for the Austin-Bergstrom International
Airport, but that doesn't conflict.

wunder
--
Walter Underwood
Principal Software Architect, Autonomy

Re: wiki mime type

2006-03-06 Thread Walter Underwood

--On March 6, 2006 3:59:39 PM +0100 Henry Story <[EMAIL PROTECTED]> wrote:
>
> Silly question probably, but is there a wiki mime type?
> I was thinking of "text/wiki" or "text/x-wiki" or something.
> 
> I want people to be able to edit their blogs in wiki format in BlogEd and be
> able to distinguish when they do that from when they enter plain text, html
> or xhtml.
> Perhaps this is also useful for the protocol.

It would be really useful, especially for feeds that archive the content
of a blog. It would be best to use the official names of the formats, like
"text/markdown" or "text/textile". The wikis and blogs that I use can be
configured to accept different formats, so "text/wiki" doesn't work.

wunder
--
Walter Underwood
Principal Software Architect, Autonomy

Re: atom:updated handling

2006-02-15 Thread Walter Underwood


It doesn't hurt to point it out. It could catch some developer errors.
But it doesn't make an invalid feed. --wunder

--On February 15, 2006 4:25:35 PM -0800 James M Snell <[EMAIL PROTECTED]> wrote:

> 
> I personally think that the feedvalidator is being too anal about
> updated handling.  Entries with the same atom:id value MUST have
> different updated values, but the spec says nothing about entries with
> different atom:id's.
> 
> - James
> 
> James Yenne wrote:
>> I'm using feedvalidator.org to validate a feed with entries
>> containing atom:updated that may have the same datetime, although
>> different atom:id. The validator complains that two entries cannot have
>> the same value for atom:updated. I generate these feeds and the
>> generator uses the current datetime, which may be exactly the same. I
>> don't understand why the validator should care about these
>> updated values from different entries per atom:id - these are totally
>> unrelated entries.   Is the validator wrong?  It seems that otherwise I
>> have to play tricks to make these entries have different updated within
>> the feed.
>>  
>> I'm not sure how this relates to the thread "More on atom:id handling"
>>  
>> Thanks,
>> James
> 
> 



--
Walter Underwood
Principal Software Architect, Autonomy

Re: atom:updated handling

2006-02-15 Thread Walter Underwood

--On February 15, 2006 4:07:35 PM -0800 James Yenne <[EMAIL PROTECTED]> wrote:
>
> I'm using feedvalidator.org to validate a feed with entries containing
> atom:updated that may have the same datetime, although different atom:id.
> The validator complains that two entries cannot have the same value for
> atom:updated.

I got the same spurious warning. My feed is search results, so it is
perfectly OK for them to have the same atom:updated.

It is OK for the validator to point this out, but it should be informational,
not a warning.
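
As a rough sketch of the narrower check discussed in this thread: only entries
that share both atom:id and atom:updated are flagged, while entries with
different ids may carry the same timestamp. The function name and the
(id, updated) tuple layout are assumptions made for illustration; this is not
how the feed validator itself is implemented.

```python
from collections import Counter


def conflicting_entries(entries):
    """Return (id, updated) pairs that occur more than once.

    entries is a list of (atom_id, atom_updated) string tuples. Two entries
    with different ids and the same updated value are not reported.
    """
    counts = Counter(entries)
    return sorted(pair for pair, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = [
        ("urn:example:a", "2006-02-15T16:00:00Z"),
        ("urn:example:b", "2006-02-15T16:00:00Z"),  # same time, other id: fine
        ("urn:example:a", "2006-02-15T16:00:00Z"),  # same id and time: flagged
    ]
    print(conflicting_entries(sample))
```

A search-results feed, where every entry is generated at the same instant,
passes this check as long as the ids differ.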

wunder
--
Walter Underwood
Principal Software Architect, Autonomy

Re: [Fwd: Re: todo: add language encoding information]

2005-12-23 Thread Walter Underwood

--On December 23, 2005 11:31:22 PM +0100 Henry Story <[EMAIL PROTECTED]> wrote:
>
> So  you can't have a link pointing from an entry to an id, without losing some
> very important information. We need something more  specific. We need a link
> pointing from A to C as shown by the blue line.

Some people will need that in the guts of their publishing system. Why do
we need it in Atom? Is there something essential that subscribers cannot do
because this isn't represented? This sounds like something needed for the
publishing/translation workflow, not for the general readership.

Extended provenance information is sometimes needed, but there is almost
no limit to that. It certainly does not stop at translation, source, and
translator. I'm reading a new translation of Andersen's tales where 
"Thumbelina" is "Inchelina" because the translator knew the right dialect
of Danish. That is significant, but does it need to be in Atom?

The semantics here should be exactly the same as for dates -- the date
means what the publisher thinks it means. Same for language info. Trying
to get more exact means that the model will be wrong for some publishers
that generate completely legal Atom.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: ACE - Atom Common Extensions Namespace

2005-10-02 Thread Walter Underwood

--On October 2, 2005 9:35:28 AM +0200 Anne van Kesteren <[EMAIL PROTECTED]> 
wrote:
> 
> Having a file and folder of the same name is not technically possible. 
> (Although
> you could emulate the effect of course with some mod_rewrite.)

Namespaces aren't files, only names. So the limitations of some particular
file name implementation are meaningless for namespaces.

Also, some filesystem implementations do allow a file and a folder
with the same name.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: Arr! Avast me hearties!

2005-09-19 Thread Walter Underwood

I think we just got a nomination for an April 1 RFC. Nice job.
More accurate than the x-hacker locale on Google, because that
is really still English, not some other "hacker" language.
Besides, they didn't make the spell suggest work in l33t.

wunder

--On September 20, 2005 3:09:56 AM +0100 James Holderness <[EMAIL PROTECTED]> 
wrote:

> A conforming client SHOULD perform an HTTP request for the feed with the 
> Accept-Language header set to "en-pirate" (or whatever the standard RFC 3066 
> language tag for the pirate dialect of English is). A conforming server SHOULD
> return the pirate version of the feed with the Content-Language header set to 
> "en-pirate" and/or the xml:lang attribute set to "en-pirate" in the root 
> element.

--
Walter Underwood
Principal Software Architect, Verity

Re: "Top 10" and other lists should be entries, not feeds.

2005-08-30 Thread Walter Underwood

--On August 30, 2005 3:50:45 PM -0600 Peter Saint-Andre <[EMAIL PROTECTED]> 
wrote:
>> One could read that to mean that feeds are fundamentally unordered or that
>> Atom doesn't say what the order means.
> 
> Is not logical order, if any, determined by the datetime of the published
> (or updated) element?

That is one kind of order. Other kinds are relevance to a search term
(A9 OpenSearch), editorial importance (BBC News feeds), or datetime of
original publication (nearly all blog feeds, not the same as last update).

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: "Top 10" and other lists should be entries, not feeds.

2005-08-30 Thread Walter Underwood

--On August 30, 2005 3:50:45 PM -0600 Peter Saint-Andre <[EMAIL PROTECTED]> 
wrote:
>> Otherwise, it is not possible to go from Atom to RSS 1.0.
> 
> I assume you mean from RSS 1.0 to Atom. :-)

No. You can go from a List to a Bag by ignoring the order. RSS 1.0 is a
List, so you would need to invent an order to put unordered items in it.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: "Top 10" and other lists should be entries, not feeds.

2005-08-30 Thread Walter Underwood

--On August 30, 2005 1:49:57 AM -0400 Bob Wyman <[EMAIL PROTECTED]> wrote:

> I’m sorry, but I can’t go on without complaining.  Microsoft has proposed
> extensions which turn RSS V2.0 feeds into lists and we’ve got folk who are
> proposing much the same for Atom (i.e. stateful, incremental or partitioned
> feeds)… I think they are wrong. Feeds aren’t lists and Lists aren’t feeds.

The Atom spec says:

   This specification assigns no significance to the order of atom:entry
   elements within the feed.

One could read that to mean that feeds are fundamentally unordered or that
Atom doesn't say what the order means.

Other RSS formats are ordered, either implicitly or explicitly (RSS 1.0).
For interoperability, lots of software is going to treat Atom as ordered.
Otherwise, it is not possible to go from Atom to RSS 1.0.

> What is a search engine or a matching engine supposed to return as a result
> if it finds a match for a user query in an entry that comes from a list-feed?

Maybe the list feed should have a noindex flag.

> Should it return the entire feed or should it return just the entry/item
> that contained the stuff in the users’ query?

I'd return the entry. It is all about the entries. If the list position is
semantically important to the entry, then include a link from the entry to
the list. "This is movie 312 in wunder's queue."

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: Don't Aggregrate Me

2005-08-29 Thread Walter Underwood


--On August 29, 2005 7:05:09 PM -0700 James M Snell <[EMAIL PROTECTED]> wrote:

> x:index="no|yes" doesn't seem to make a lot of sense in this case.

It makes just as much sense as it does for HTML files. Maybe it is a
whole group of Atom test cases. Maybe it is a feed of reboot times 
for the server.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: Don't Aggregrate Me

2005-08-29 Thread Walter Underwood


--On August 30, 2005 11:39:04 AM +1000 Eric Scheid <[EMAIL PROTECTED]> wrote:
>
> Someone wrote up "A Robots Processing Instruction for XML Documents"
> http://atrus.org/writings/technical/robots_pi/spec-199912__/
> That's a PI though, and I have no idea how well supported they are. I'd
> prefer a namespaced XML vocabulary.

That was me. I think it makes perfect sense as a PI. But I think reuse
via namespaces is oversold. For example, we didn't even try to use
Dublin Core tags in Atom.

PI support is required by the XML spec -- "must be passed to the
application."
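
A small sketch of what "passed to the application" looks like in practice,
using Python's standard xml.dom.minidom, which keeps processing instructions as
nodes. The PI shown here (target "robots" with index/follow pseudo-attributes)
is illustrative only; the proposal linked above is the reference for the actual
syntax.

```python
from xml.dom import Node, minidom


def processing_instructions(xml_text):
    """Return (target, data) for each PI that appears at document level."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]


if __name__ == "__main__":
    # Illustrative document; the PI content is an assumption, not a quote
    # from the Robots PI proposal.
    feed = '<?xml version="1.0"?><?robots index="no" follow="yes"?><feed/>'
    print(processing_instructions(feed))
```

The XML declaration is not reported as a PI, so only the robots instruction
comes back.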

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: Don't Aggregrate Me

2005-08-29 Thread Walter Underwood



--On Monday, August 29, 2005 10:39:33 AM -0600 Antone Roundy <[EMAIL PROTECTED]> wrote:


> As has been suggested, to "inline images", we need to add frame documents,
> stylesheets, Java applets, external JavaScript code, objects such as Flash
> files, etc., etc., etc.  The question is, with respect to feed readers, do
> external feed content (), enclosures, etc. fall into
> the same exceptions category or not?


Of course a feed reader can read the feed, and anything required
to make it readable. Duh.

And all this time, I thought robots.txt was simple.

robots.txt is a polite hint from the publisher that a robot (not
a human) probably should avoid those URLs. Humans can do any stupid
thing they want, and probably will.

The robots.txt spec is silent on what to do with URLs manually added
to a robot. The normal approach is to deny those, with a message that they
are disallowed by robots.txt, and offer some way to override that.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: Don't Aggregrate Me

2005-08-26 Thread Walter Underwood

--On August 26, 2005 9:51:10 AM -0700 James M Snell <[EMAIL PROTECTED]> wrote:

> Add a new link rel="readers" whose href points to a robots.txt-like file that
> either allows or disallows the aggregator for specific URI's and establishes
> polling rate preferences
> 
>   User-agent: {aggregator-ua}
>   Origin: {ip-address}
>   Allow: {uri}
>   Disallow: {uri}
>   Frequency: {rate} [{penalty}]
>   Max-Requests: {num-requests} {period} [{penalty}]

No, on several counts.

1. Big, scalable spiders don't work like that. They don't do aggregate
frequencies or rates. They may have independent crawlers visiting the
same host. Yes, they try to be good citizens, but you can't force
WWW search folk to redesign their spiders.

2. Frequencies and rates don't work well with either HTTP caching or
with publishing schedules. Things are much cleaner with a single 
model (max-age and/or expires).

3. This is trying to be a remote-control for spiders instead of describing
some characteristic of the content. We've rejected the remote control
approach in Atom.

4. What happens when there are conflicting specs in this file, in
robots.txt, and in a Google Sitemap?

5. Specifying all this detail is pointless if the spider ignores it.
You still need to have enforceable rate controls in your webserver
to handle busted or bad citizen robots.

6. Finally, this sort of thing has been proposed a few times and never
caught on. By itself, that is a weak argument, but I think the causes
are pretty strong (above).

There are some proprietary extensions to robots.txt:

Yahoo crawl-delay:
<http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html>

Google wildcard disallows:
<http://www.google.com/remove.html#images>

It looks like MSNbot does crawl-delay and an extension-only wildcard:
<http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_RestrictAccessToSite.htm>

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: Don't Aggregrate Me

2005-08-26 Thread Walter Underwood

I'm adding robots@mccmedia.com to this discussion. That is the classic
list for robots.txt discussion.

Robots list: this is a discussion about the interactions of /robots.txt
and clients or robots that fetch RSS feeds. "Atom" is a new format in
the RSS family.

--On August 26, 2005 8:39:59 PM +1000 Eric Scheid <[EMAIL PROTECTED]> wrote:

> While true that each of these scenarios involve crawling new links,
> the base principle at stake is to prevent harm caused by automatic or
> robotic behaviour. That can include extremely frequent periodic re-fetching,
> a scenario which didn't really exist when robots.txt was first put together.

It was a problem then:

   In 1993 and 1994 there have been occasions where robots have visited WWW
   servers where they weren't welcome for various reasons. Sometimes these
   reasons were robot specific, e.g. certain robots swamped servers with
   rapid-fire requests, or retrieved the same files repeatedly. In other
   situations robots traversed parts of WWW servers that weren't suitable,
   e.g. very deep virtual trees, duplicated information, temporary information,
   or cgi-scripts with side-effects (such as voting).
   <http://www.robotstxt.org/wc/norobots.html>

I see /robots.txt as a declaration by the publisher (webmaster) that
robots are not welcome at those URLs. 

Web robots do not solely depend on automatic link discovery, and haven't
for at least ten years. Infoseek had a public "Add URL" page. /robots.txt
was honored regardless of whether the link was manually added or automatically
discovered.

A crawling service (robot) should warn users that the URL, Atom or otherwise,
is disallowed by robots.txt. Report that on the status page for that feed.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: Don't Aggregrate Me

2005-08-26 Thread Walter Underwood

There are no wildcards in /robots.txt, only path prefixes and user-agent
names. There is one special user-agent, "*", which means "all".
I can't think of any good reason to always ignore the disallows for *.

I guess it is OK to implement the parts of a spec that you want.
Just don't answer "yes" when someone asks if you honor robots.txt.

A lot of spiders allow the admin to override /robots.txt for specific
sites, or better, for specific URLs.
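
A minimal sketch of those rules using Python's standard urllib.robotparser:
disallow lines are path prefixes, records are selected by user-agent name, and
"*" applies to everyone else. The robots.txt content, the example.org URLs, and
the OVERRIDES set (standing in for an admin's per-URL override list) are all
hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named record plus the "*" (all robots) record.
ROBOTS_TXT = """\
User-agent: PubSub
Disallow: /

User-agent: *
Disallow: /private/
"""

# Hypothetical admin override list: URLs fetched even though robots.txt
# disallows them.
OVERRIDES = {"http://example.org/private/status.atom"}


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT, overrides=OVERRIDES):
    """Return True if user_agent may fetch url under these rules."""
    if url in overrides:
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("PubSub", "http://example.org/feed.atom"),
        ("OtherBot", "http://example.org/feed.atom"),
        ("OtherBot", "http://example.org/private/notes.atom"),
        ("OtherBot", "http://example.org/private/status.atom"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

Keeping the override per URL rather than per site means the rest of the
publisher's disallows are still honored.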

wunder

--On August 25, 2005 11:47:18 PM -0500 "Roger B." <[EMAIL PROTECTED]> wrote:

> 
> Bob: It's one thing to ignore a wildcard rule in robots.txt. I don't
> think its a good idea, but I can at least see a valid argument for it.
> However, if I put something like:
> 
> User-agent: PubSub
> Disallow: /
> 
> ...in my robots.txt and you ignore it, then you very much belong on
> the Bad List.
> 
> --
> Roger Benningfield
> 
> 

--
Walter Underwood
Principal Software Architect, Verity

Re: Don't Aggregrate Me

2005-08-25 Thread Walter Underwood

I would call desktop clients "clients" not "robots". The distinction is
how they add feeds to the polling list. Clients add them because of
human decisions. Robots discover them mechanically and add them.

So, clients should act like browsers, and ignore robots.txt.

Robots.txt is not very widely deployed (around 5% of sites), but it 
does work OK for general web content.

wunder

--On August 25, 2005 10:25:08 PM +0200 Henry Story <[EMAIL PROTECTED]> wrote:

> 
> Mhh. I have not looked into this. But is not every desktop aggregator  a 
> robot?
> 
> Henry
> 
> On 25 Aug 2005, at 22:18, James M Snell wrote:
>> At the very least, aggregators should respect robots.txt.  Doing so  
>> would allow publishers to restrict who is allowed to pull their feed.
>> 
>> - James
>> 
> 
> 

--
Walter Underwood
Principal Software Architect, Verity

Re: Don't Aggregrate Me

2005-08-25 Thread Walter Underwood

--On August 25, 2005 3:43:03 PM -0400 Karl Dubost <[EMAIL PROTECTED]> wrote:
> Le 05-08-25 à 12:51, Walter Underwood a écrit :
>> /robots.txt is one approach. Wouldn't hurt to have a recommendation
>> for whether Atom clients honor that.
> 
> Not many honor it.

I'm not surprised. There seems to be a new generation of robots that
hasn't learned much from the first generation. The Robots mailing list
is silent these days. That is why we should make a recommendation about it.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: Don't Aggregrate Me

2005-08-25 Thread Walter Underwood

I can see reasonable uses for this, like marking a feed of local disk errors
as not of general interest. I would not be surprised to see RSS/Atom catch
on for system monitoring.

Search engines see this all the time -- just because it is HTML doesn't
mean it is the primary content on the site. Log analysis reports are
one good example.

/robots.txt is one approach. Wouldn't hurt to have a recommendation
for whether Atom clients honor that.

A long time ago, I proposed a Robots PI, similar to the Robots meta tag.
That would get around the "only webmaster can edit" problem with /robots.txt.
The Robots PI did not catch on, but I've still got the proposal somewhere.

wunder

--On August 24, 2005 11:25:12 PM -0700 James M Snell <[EMAIL PROTECTED]> wrote:

> 
> Up to this point, the vast majority of use cases for Atom feeds is the 
> traditional syndicated content case.  A bunch of content updates that are 
> designed to be distributed and aggregated within Feed readers or online 
> aggregators, etc.  But with Atom providing a much more flexible content model 
> that allows for data that may not be suitable for display within a feed 
> reader or online aggregator, I'm wondering what the best way would be for a 
> publisher to indicate that a feed should not be aggregated?
> 
> For example, suppose I build an application that depends on an Atom feed 
> containing binary content (e.g. a software update feed).  I don't really want 
> aggregators pulling and indexing that feed and attempting to display it 
> within a traditional feed reader.  What can I do?
> 
> Does the following work?
> 
> 
>   ...
>   no
> 
> 
> Should I use a processing instruction instead?
> 
> 
> 
>   ...
> 
> 
> I dunno. What do you all think?  Am I just being silly or does any of this 
> actually make a bit of sense?
> 
> - James
> 
> 

--
Walter Underwood
Principal Software Architect, Verity

Re: If you want "Fat Pings" just use Atom!

2005-08-23 Thread Walter Underwood

--On August 23, 2005 9:40:44 AM +0300 Henri Sivonen <[EMAIL PROTECTED]> wrote:

> There's nothing in the XML spec requiring the app to throw away the data
> structures it has already built when the parser reports the error.

There is also nothing requiring it. It is optional. The only 
required behavior is to report the error and stop creating parsed
information. Otherwise, "results are undefined" according to the spec.

The spec does require that normal processing stop at the error.
The parser can make data past the error available, but it "must not
continue to pass character data and information about the document's
logical structure to the application in the normal way".

This still feels like a hack to me. An unterminated document is 
not well-formed, and is not XML or Atom. Doing this should require
another RFC that says, "we didn't really mean that it had to be XML."
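
To make the well-formedness point concrete, here is a small sketch with
Python's standard xml.etree.ElementTree: a document whose root element is never
closed is rejected with a parse error, however valid the entries inside it look.
The sample documents and the helper name are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data):
    """Return True if data parses as a complete XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    entry = "<entry><id>urn:example:1</id></entry>"
    complete = '<feed xmlns="http://www.w3.org/2005/Atom">' + entry + "</feed>"
    endless = '<feed xmlns="http://www.w3.org/2005/Atom">' + entry  # never closed
    print(is_well_formed(complete), is_well_formed(endless))
```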

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: If you want "Fat Pings" just use Atom!

2005-08-22 Thread Walter Underwood

--On August 23, 2005 12:01:11 PM +0900 Martin Duerst <[EMAIL PROTECTED]> wrote:
> 
> Well, modulo character encoding issues, that is. An FF will
> look differently in UTF-16 than in ASCII-based encodings.

Fine. Use two NULs. That is either one illegal UTF-16(BE or LE) character
or two illegal characters in ASCII or UTF-8.
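
A quick sketch of the byte-level difference, using only Python's built-in
codecs. It shows that a form feed is a different byte sequence in UTF-16 than in
ASCII or UTF-8, while two NUL bytes decode to U+0000 (not a legal XML character)
in each of them, assuming the pair is aligned on a UTF-16 code unit boundary.
The helper names are illustrative.

```python
def encoded(char, encoding):
    """Return the bytes that represent char in the given encoding."""
    return char.encode(encoding)


def decode_separator(encoding):
    """Decode a two-NUL separator the way a reader of that encoding would."""
    return b"\x00\x00".decode(encoding)


if __name__ == "__main__":
    for enc in ("ascii", "utf-8", "utf-16-be", "utf-16-le"):
        # Form feed bytes vary by encoding; the NUL pair is always U+0000.
        print(enc, encoded("\f", enc), repr(decode_separator(enc)))
```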

Of course, a transport level multi-payload system would be preferred.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: If you want "Fat Pings" just use Atom!

2005-08-22 Thread Walter Underwood


--On August 22, 2005 2:01:45 PM -0400 Joe Gregorio <[EMAIL PROTECTED]> wrote:

> Interestingly enough the FF separated entries method would also work 
> when storing a large quantity of entries in a single flat file where
> appending an entry needs to be fast.

The original application was logfiles in XML.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: If you want "Fat Pings" just use Atom!

2005-08-22 Thread Walter Underwood

--On August 22, 2005 12:36:17 AM -0400 Sam Ruby <[EMAIL PROTECTED]> wrote:

> With a HTTP client library and SAX, the "absolute simplest solution" is
> what Bob is describing: a single document that never completes.

Except that an endless document can't be legal XML, because XML requires
the root element to balance. An endless document never closes it. So, the
endless document cannot be legal Atom. Worse, there is no chance for error
recovery. One error, and the rest of the stream might not be parsable.

So, it is simple, but busted.

The standard trick here is to use a sequence of small docs, separated
by ASCII form-feed characters. That character is not legal within an
XML document, so it allows the stream to resynchronize on that character.
Besides, form-feed actually has almost the right semantics -- start a
new page.
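
The trick can be sketched in a few lines of Python with the standard
xml.etree.ElementTree parser. Each chunk between form feeds is parsed on its
own, so one broken document costs one entry instead of the rest of the stream.
The sample stream and the function name are assumptions, and a real reader
would split incrementally instead of holding the whole stream in memory.

```python
import xml.etree.ElementTree as ET

FORM_FEED = b"\x0c"  # ASCII form feed; not a legal character inside XML 1.0


def parse_stream(data):
    """Split data on form feeds and parse each chunk as its own document.

    Returns (roots, error_count). A malformed chunk is counted and skipped,
    and parsing resumes at the next form feed.
    """
    roots, errors = [], 0
    for chunk in data.split(FORM_FEED):
        if not chunk.strip():
            continue  # ignore empty space between or after separators
        try:
            roots.append(ET.fromstring(chunk))
        except ET.ParseError:
            errors += 1
    return roots, errors


if __name__ == "__main__":
    stream = (
        b"<entry><id>a</id></entry>\x0c"
        b"<entry><id>b</entry>\x0c"  # mismatched tag: this document is broken
        b"<entry><id>c</id></entry>\x0c"
    )
    roots, errors = parse_stream(stream)
    print([r.findtext("id") for r in roots], errors)
```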

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: FYI: Expires Extension Draft

2005-08-18 Thread Walter Underwood

RSS 3? Eh?

The RSS ttl element is a mess. RSS 3 Lite (could we spell that word correctly?)
specifies it not as information about the feed, but as an attempt to remotely
control robots. RSS 2 specifies it as a caching hint, but in minutes, not
seconds.

Regardless, it is useless for a feed with a dedicated update schedule, because
it requires updating the feed every second (or minute) as the publish time
approaches.

For more detail, see: <http://www.intertwingly.net/wiki/pie/PaceCaching>
That was a proposal, and is *not* part of Atom, but it does have some
useful discussion of cache hints.

For caching, use the native HTTP cache features.
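
As a hedged illustration of the native HTTP features: a client can derive how
long a fetched feed stays fresh from Cache-Control max-age or, failing that,
from the absolute Expires time. This is a simplified sketch, not a full HTTP
cache; it ignores the Date and Age headers and other directives, and the
function name and the header dictionary shape are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_seconds(headers, now):
    """Return how many seconds a response may be reused, 0 if unknown.

    headers is a dict of response headers; now is a timezone-aware datetime.
    max-age (relative) takes precedence over Expires (absolute).
    """
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    expires = headers.get("Expires")
    if expires:
        try:
            delta = parsedate_to_datetime(expires) - now
        except (TypeError, ValueError):
            return 0  # unparseable or timezone-less date: treat as stale
        return max(0, int(delta.total_seconds()))
    return 0


if __name__ == "__main__":
    now = datetime(2005, 8, 16, 11, 0, 0, tzinfo=timezone.utc)
    print(freshness_seconds({"Expires": "Tue, 16 Aug 2005 12:00:00 GMT"}, now))
    print(freshness_seconds({"Cache-Control": "public, max-age=600"}, now))
```

An absolute Expires time suits a feed with a fixed publishing schedule, since
the publisher sets it once per update instead of counting down a ttl.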

wunder 

--On August 18, 2005 2:20:21 PM -0400 Elias Torres <[EMAIL PROTECTED]> wrote:

> 
> I tried commenting on your site, but I have to register to comment. :-(
> 
> You linked to RSS3 [1] and I spotted something related to this
> extension that could be used instead.
> 
> 7
> 
> It seems more elegant than having to convert to whatever you specified
> in your spec.
> 
> Just a thought.
> 
> Elias
> 
> 
> [1] http://www.rss3.org/rss3lite.html
> 
> On 8/17/05, James M Snell <[EMAIL PROTECTED]> wrote:
>> 
>> http://www.ietf.org/internet-drafts/draft-snell-atompub-feed-expires-00.txt
>> 
>> Example:
>> 
>> 
>>   ...
>>   2005-08-16T12:00:00Z
>>   ...
>> 
>> 
>> or
>> 
>> 
>>   ...
>>   2005-08-16T12:00:00Z
>>   2
>>   ...
>> 
>> 
>> This is not to be used for caching of Atom documents; nor is it to be
>> used as a mechanism for scheduling updates of local copies of Atom
>> documents.
>> 
>> - James
>> 
>> 
> 
> 

--
Walter Underwood
Principal Software Architect, Verity

Re: Spec explanations for Pebble?

2005-08-13 Thread Walter Underwood

--On August 13, 2005 8:34:49 AM + Simon Brown <[EMAIL PROTECTED]> wrote:

> If Tim *moves* his blog to www.timbray.com/ongoing, would you expect his Atom
> IDs to remain the same? Spec aside, this has some implications for storing
> Atom IDs next to content they identify, which I imagine doesn't happen in
> most CMS tools at the moment.

Of course they stay the same. At the risk of being rude, "duh".

It is an ID, not an href. ID, ID, ID.

If we need to clarify the spec further, though, let's do it now.
I don't mind specifically saying that the ID stays the same when
content is relocated.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: Spec explanations for Pebble?

2005-08-12 Thread Walter Underwood

--On August 12, 2005 6:52:28 AM -0700 Tim Bray <[EMAIL PROTECTED]> wrote:
>
> Except for, a bunch of blogs might agree to share a categorization
> scheme, so probably not "unique to each blog".

For example, as libraries start delivering literature monitoring with
feeds, we'll see LCSH or some other standard category system in those.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: Finishing up on whitespace in IRIs and dates

2005-08-11 Thread Walter Underwood


--On August 11, 2005 9:04:21 PM -0700 Paul Hoffman <[EMAIL PROTECTED]> wrote:

> Note that there MUST be no whitespace in a Date construct or in any IRI. Some 
> XML-emitting implementations erroneously insert whitespace around values by 
> default, and such implementations will emit invalid Atom.

Nice clear wording.

+1 with "MUST be no" changed to "MUST NOT be", as suggested by Aristotle.

wunder
--
Walter Underwood
Principal Software Architect, Verity

Re: Expires extension draft (was Re: Feed History -02)

2005-08-10 Thread Walter Underwood

--On August 10, 2005 1:56:05 PM +1000 Eric Scheid <[EMAIL PROTECTED]> wrote:

> Aside: a perfect example of what sense of 'expires' is in the I-D itself...
> 
> Network Working Group
> Internet-Draft
> Expires: January 2, 2006

Especially perfect because the HTTP header does not reflect the expiration.

Honestly, another reason to put expiration inside the feed is that 
HTTP caching is just not used. Well, except to force reloads and show
you new ads. But it is extremely rare to see per-document cache
information.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Feed History -02

2005-08-09 Thread Walter Underwood


--On August 9, 2005 9:28:52 AM -0700 James M Snell <[EMAIL PROTECTED]> wrote:

>> I made some proposals for cache control info (expires and max-age).
>> That might work for this.
>>
> I missed these proposals.  I've been giving some thought to an  
> and  extension myself and was getting ready to write up a draft. 
> Expires is a simple date construct specifying the exact moment (inclusive) 
> that the entry/feed expires.  Max-age is a non negative integer specifying 
> the number of milliseconds (inclusive) from the moment specified by
> atom:updated when the entry/feed expires.  The two cannot appear together
> within a single entry/feed and follow the same basic rules as atom:author
> elements.

Here it is: <http://www.intertwingly.net/wiki/pie/PaceCaching>

Adding max-age also means defining IntegerConstruct and disallowing
white space around it. Formerly, it was OK as a text construct, but
the white space issues change that.

Also, we should decide whether cache information is part of the signature.
I can see arguments either way.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Feed History -02

2005-08-09 Thread Walter Underwood


--On August 9, 2005 1:07:29 PM +0200 Henry Story <[EMAIL PROTECTED]> wrote:
>
> But I would really like some way to specify that the next feed  document is an
> archive (ie. won't change). This would make it easy  for clients to know when
> to stop following the links, ie, when they have caught up with the changes
> since they last looked at the feed.

I made some proposals for cache control info (expires and max-age).
That might work for this.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: spec bug: can we fix for draft-11?

2005-08-05 Thread Walter Underwood

--On August 4, 2005 9:31:55 AM -0700 Tim Bray <[EMAIL PROTECTED]> wrote:
>
> So for now, I'm -1 on any weakening or removing "The element's content  MUST be
> an IRI" or analogous text in any other section. I'll stop  shouting if I'm in
> a small minority here.  -Tim

Wow, this string has made my "away on vacation" mailbox fatter.

I strongly favor making white space around IRIs illegal in Atom, whether
they are an ID or somewhere else. Same for dates.

This follows the robustness principle, where we are conservative in what
we generate. Atom processors are free to be liberal in what they accept,
so they can strip whitespace. Or not, I don't care.

Note that a feed with whitespace around an IRI can never be aggregated
into another feed, because a) the ID IRI cannot be changed, and b) the new
feed cannot contain whitespace.

Making every single processor strip whitespace smells too much like
the HTML tag soup processors that we all have to maintain. Yuk.
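
The split between strict producers and liberal consumers can be sketched
in a few lines of Python. Both function names are hypothetical; this only
illustrates the robustness principle as applied to IDs, and is not code
from any Atom implementation.

```python
def emit_id(iri):
    """Producer side: refuse to write an ID with surrounding white space."""
    if not iri or iri != iri.strip():
        raise ValueError("ID must be non-empty with no surrounding white space")
    return iri


def accept_id(text):
    """Consumer side: optionally tolerate white space from sloppy feeds."""
    return text.strip()
```

The producer check is the one that matters for aggregation, because the
aggregator cannot repair an ID it is not allowed to change.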

wunder
--
Walter Underwood
Principal Architect, Verity

Re: FormatTests

2005-07-17 Thread Walter Underwood


--On July 17, 2005 3:45:26 PM +0100 Graham <[EMAIL PROTECTED]> wrote:

> Now do you see why canonical ids are stupid and irrelevant?

Not unless the robustness principle is stupid and irrelevant.
Canonical IDs are more robust. Feeds that use them will work better
in the quick-and-dirty, "Desperate Perl Hacker" environment of the
internet.

The updated warning is just right. Thank you for using Atom, here is
how you can do a better job.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Evangelism, etc.

2005-07-16 Thread Walter Underwood


--On July 16, 2005 11:16:44 AM -0400 Robert Sayre <[EMAIL PROTECTED]> wrote:

> I found the criticism pathetic. 

A little lame, at least. You can't add precision and interoperability
with innovation and extension.

But there is a point buried under all that. What are the changes required
to support Atom? It looks complicated, but how hard is it? Here is a shot
at that information.

For publishers, you need to be precise about the content. There are fallbacks,
where if it is any sort of HTML, send it as HTML, and if it isn't, send it
as text. The XHTML and XML options are there for extra control.

Also, add an ID. It is OK for this to be a URL to the article as long as
it doesn't change later. That is, the article can move to a different URL,
but keep the ID the same.

Add a modified date. The software probably already has this, and you can
fall back to the file last-modified if you have to. But if there is a 
better date available, use it.

The ID and date are required because they allow Atom clients and aggregators
to "get it right" when tracking entries, either in the same feed or when the
same entry shows up in multiple feeds. 
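
As a rough illustration of why both values are needed, here is a Python
sketch of an aggregator keeping one copy of each entry. The dict layout
and the name latest_entries are assumptions made for the example; a real
client would work from parsed Atom documents.

```python
from datetime import datetime


def latest_entries(entries):
    """Keep one entry per ID, preferring the most recent 'updated' date."""
    newest = {}
    for entry in entries:
        seen = newest.get(entry["id"])
        # Dates carry a UTC offset, so comparisons work across time zones.
        if seen is None or (datetime.fromisoformat(entry["updated"])
                            > datetime.fromisoformat(seen["updated"])):
            newest[entry["id"]] = entry
    return list(newest.values())
```

Without the ID there is nothing to match on, and without the date there
is no way to pick between two copies.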

Extending Atom is different from extending RSS, because there are more options.
The mechanical part of extensions is covered in the spec, to guarantee that
an Atom feed is still interoperable when it includes extensions. The political
part of extensions has two options: free innovation and standardization. Anyone
can write an extension to Atom and use it. Or, they can propose a standard to
the IETF (or another body). The standards process usually means more review,
more interoperability, and more delay in deploying it. Sometimes, the delay
is worth it, and we hope that is true for Atom.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: The Atomic age

2005-07-15 Thread Walter Underwood

--On July 14, 2005 11:37:05 PM -0700 Tim Bray <[EMAIL PROTECTED]> wrote:
>
> So, implementors... to  work.

Do we have a list of who is implementing it? That could be used in
the "Deployment" section of <http://www.tbray.org/atom/RSS-and-Atom>.

Ultraseek will implement Atom. We need to think more about exactly
what it means for a search engine to implement it, but we'll at
least spider it.

wunder

"Creature with the Atom Brain, why is he acting so strange?"
  Roky Erickson
--
Walter Underwood
Principal Architect, Verity

Re: Mystery abbrevations in draft 9

2005-07-06 Thread Walter Underwood

--On July 6, 2005 11:05:33 AM -0700 Paul Hoffman <[EMAIL PROTECTED]> wrote:
> 
> Spelling out the abbreviations as "Unicode Normalization Form C" and "Unicode
> Normalization Form KC" is fine; referencing them is *not*. A reference to the
> Unicode Standard inherently points to a particular version, and the tables
> used for NFC and NFKC change from version to version.

XML already has normative references to Unicode, so we can't exactly
avoid those without dropping XML.

Of course, the correct choice of normalization rules for atom:id is not
the ones from the XML spec, but the IRI rules from RFC 3987.

We really could have two sets of "standard" normalization rules in 
one document, one for XML and one for atom:id URIs, so I think it
is worth pointing to RFC 3987 for indirect references to NFC/NFKC.
Without clarification, this is a legitimate chance for confusion.
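
For anyone unsure how the two forms differ, a short Python sketch using
the standard unicodedata module. The helper name normalize_id is
hypothetical, and the sketch does not say which form a given spec
requires; it only shows that the choice changes the result.

```python
import unicodedata


def normalize_id(iri, form="NFC"):
    """Apply a Unicode normalization form (for example NFC or NFKC)."""
    return unicodedata.normalize(form, iri)


# "e" followed by a combining acute accent composes under NFC.
decomposed = "http://example.com/caf" + "e\u0301"
composed = normalize_id(decomposed)

# The "fi" ligature (U+FB01) survives NFC but is folded to "fi" by NFKC.
ligature_nfc = normalize_id("\ufb01le", "NFC")
ligature_nfkc = normalize_id("\ufb01le", "NFKC")
```

Two IDs that look identical on screen can compare unequal unless both
sides agree on the form.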

wunder
--
Walter Underwood
Principal Architect, Verity

Mystery abbrevations in draft 9

2005-07-06 Thread Walter Underwood


In 4.2.6 atom:id, the last sentence is:

o Ensure that all components of the IRI are appropriately character-
  normalized, e.g. by using NFC or NFKC.

"NFC" and "NFKC" need to be defined, with a reference to the Unicode spec.

wunder
--
Walter Underwood
Principal Architect, Verity

RE: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Walter Underwood



--On Tuesday, July 05, 2005 11:48:44 AM -0700 Paul Hoffman <[EMAIL PROTECTED]> wrote:

> At 2:24 PM -0400 7/5/05, Bob Wyman wrote:
>
>> I find it hard to imagine what harm could be done by providing this
>> recommendation.
>
> Timing. If we change text other than because of an IESG note, there is a strong
> chance we will have to delay being finalized by two weeks, possibly more.


I'm fine with the delay. Two or three weeks on top of 18 months is
not a big deal.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Walter Underwood



--On Tuesday, July 05, 2005 10:45:29 AM -0700 Tim Bray <[EMAIL PROTECTED]> wrote:

> Still -1, despite Bob's arguments, at least in part because we have no idea
> what kind of applications are going to be using signed entries and we shouldn't
> try to micromanage a future we don't understand. -Tim


I'm +1, because this is a "when features collide!" issue for Atom.
We don't have to make it a SHOULD or a MUST, just point out that
signed entries need to be standalone if they will ever be used
outside of their feed context.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Walter Underwood


--On July 5, 2005 9:53:42 AM -0700 Tim Bray <[EMAIL PROTECTED]> wrote:
>>> 
>> Bob can clarify exactly what he means but from my perspective it  
>> comes down to an aggregation problem.  If a signature is generated  
>> over an entry that does not contain an author element or a source  
>> element, that entry cannot be re-enveloped into an aggregate feed  
>> that does not contain a top level author element without breaking  
>> the signature
> 
> Well, yes.  Anyone who understands digsig, even someone such as  myself with 
> only a surface knowledge, can see this.  You can't change  a signed object 
> without breaking the sig, that's the point.  If I  want to sign an entry and 
> also want to make it available for  aggregation then yes, I'd better put in 
> an atom:source.  But this is  inherent in the basic definition of digsig; not 
> something we need to  call out.   -Tim

But it is an interoperability consequence of the Atom format and cascaded
values. It would be worth commenting that signed entries need to be standalone
in order to be aggregated in another feed and keep their signature.
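
A toy Python sketch of the problem. A SHA-256 digest stands in for the
signature here, which is an assumption for illustration only; real XML
signatures add canonicalization and key handling. The function names are
hypothetical.

```python
import hashlib


def digest(entry_xml):
    """Stand-in for a signature: a SHA-256 digest over the entry's bytes."""
    return hashlib.sha256(entry_xml.encode("utf-8")).hexdigest()


def reenvelope(entry_xml, feed_author_xml):
    """Copy an entry into another feed, adding the author it used to inherit."""
    if "<author>" in entry_xml:
        return entry_xml  # already standalone, bytes unchanged
    return entry_xml.replace("</entry>", feed_author_xml + "</entry>")
```

An entry that relied on the feed-level author has to be modified to
survive the move, so its digest changes. A standalone entry is copied
byte for byte and the digest still matches.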

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Clearing a "discuss" vote on the Atom format

2005-07-01 Thread Walter Underwood

--On July 1, 2005 4:44:23 PM +0900 Martin Duerst <[EMAIL PROTECTED]> wrote:
>
> The reason for this is to make sure we have interoperability
> with a mandatory-to-implement (and default-to-use) canonicalization,
> but that we don't disallow other canonicalizations that for one
> or the other as of now not yet clear reason may be preferable in
> some cases in the future (but in your wording would prohibit
> the result to be called Atom at all).

A potential future reason that we can't even characterize isn't
enough reason for me to support this.

If we discover weaknesses in the canonicalization, we'll need
to change Atom anyway. Explicitly making room for future incompatible
canonicalizations doesn't make any sense to me.

What is the point of calling something "Atom" when it uses a 
canonicalization which prevents interop with legal Atom implementations?

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Google Sitemaps: Yet another "RSS" or site-metadata format and Atom "competitor"

2005-06-07 Thread Walter Underwood

--On June 7, 2005 3:17:04 AM -0700 gstein <[EMAIL PROTECTED]> wrote:
>
> "proprietary" connotes closed. We published the spec and encourage
> other search engines to use it. There is no intent to close or control it.

"Proprietary" means "owned". Google clearly owns "Google Sitemaps".
The license requires derivative works to keep the same license. That is
control.

It was designed in isolation, for Google's use. That is a closed spec.

For example, the priority element is not specified well enough for another
engine to implement it compatibly. Does it apply to ranking, crawl order
or duplicate preference?

An open process would have at least looked at the proposed extensions 
for robots.txt and earlier formats like Infoseek sitelist.txt.

wunder
--
Walter Underwood
Principal Architect, Verity

RE: Google Sitemaps: Yet another "RSS" or site-metadata format and Atom "competitor"

2005-06-03 Thread Walter Underwood


--On June 3, 2005 6:48:31 PM -0400 Bob Wyman <[EMAIL PROTECTED]> wrote:
>   I do still think it unfortunate that Google felt compelled to invent
> yet-another-format for Sitemaps.

Yep. They could have used good ol' Infoseek sitelist.txt. Here is a copy
from eight years ago:

<http://web.archive.org/web/19970529104229/http://software.infoseek.com/products/ultraseek/docs/sitelist.html>

wunder
--
Walter Underwood
Principal Architect, Verity

Re: OpenSearch RSS

2005-05-31 Thread Walter Underwood



--On Tuesday, May 31, 2005 09:46:39 AM +0100 James Aylett <[EMAIL PROTECTED]> wrote:

> We were also a little concerned that the OpenSearch model was very
> simplistic ...
>
> ...  This is kind of orthogonal to the OpenSearch issue, but if
> people are interested in discussing a richer search extension we can
> try to clear some time to pull it into shape.


That was my feeling, too. OpenSearch is so limited that it is not
very interesting.

That's too bad, because most of the hard design work was done nearly
ten years ago at the STARTS project at Stanford. A couple of years ago,
we put together a fairly general search web service. It is time to
update that, so maybe I'll look at doing it on Atom.

STARTS is here: <http://www-db.stanford.edu/~gravano/starts_home.html>

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: Last and final consensus pronouncement

2005-05-26 Thread Walter Underwood


The atom:author element name is embarrassing. Make it atom:creator.
There were no objections to that.

wunder

--On May 26, 2005 10:26:54 AM -0700 Tim Bray <[EMAIL PROTECTED]> wrote:

> 
> 
> On behalf of Paul and myself:  This is it.  The initial phase of the  WG's 
> work in designing the Atompub data format specification is  finished over, 
> pining for the fjords, etc.  Please everyone reach  around and pat yourselves 
> on the back, I think the community will  generally view this as a fine piece 
> of work.
> 
> Stand by for announcements on buckling down on Atom-Protocol.
> 
> Note that this is a pronouncement, not a "call for further debate".   Here 
> are the next steps:
> 
> 1. Editors take the assembled changes and produce a format-09 I-D.   Sooner 
> is better.
> 2. They post the I-D.
> 3. Paul sends Scott a message, cc'ing the WG, that we're done.
> 4. At this point there may be objections from the WG.  We decide  whether to 
> accept the objections and pull the draft back, or tell the  objectors they'll 
> have to pursue the appeal process.
> 5. The IESG process takes over at this point and we'll eventually  hear back 
> from them.
> 
> Last two draft changes:
> 
> 1. PaceAtomIdDOS
> 
> We think that the WG has consensus that it is of benefit to add a  warning to 
> section 8 "Security Considerations".  The language from  PaceAtomIdDos is 
> mostly OK, except that the late suggestion of  talking about spoofing instead 
> of DOS seemed to get general support.   I reworded slightly.  We'll leave it 
> up to the editors to decide  whether a new subsection of section 8 is 
> required.
> 
> "Atom Processors should be aware of the potential for spoofing  attacks where 
> the attacker publishes an atom:entry with the atom:id  value of an entry from 
> another feed, perhaps with a falsified  atom:source element duplicating the 
> atom:id of the other feed. Atom  Processors which, for example, suppress 
> display of duplicate entries  by displaying only one entry with a particular 
> atom:id value, perhaps  by selecting the one with the latest atom:updated 
> value, might also take steps to determine whether the entries originated
> from the same publisher before considering them to be duplicates."
> 
> 2. PaceAtom10
> 
> http://www.intertwingly.net/wiki/pie/PaceAtom10
> 
> We just missed this one in the previous consensus call; seeing lots  of +1's 
> and no pushback, it's accepted.
> 
> 
> 
> 



--
Walter Underwood
Principal Architect, Verity

Re: Consensus snapshot, 2005/05/25

2005-05-25 Thread Walter Underwood



--On Wednesday, May 25, 2005 11:03:46 AM -0700 Tim Bray <[EMAIL PROTECTED]> wrote:

> Have I missed any?  Yes, there has been high-volume debate on several
> other issues; but have there been any other outcomes where we can
> reasonably claim consensus exists?


Changing atom:author to atom:creator? No objections, so far.
I'll paste together a PACE with the official Dublin Core definition.

Should we mention DC for atom:contributor?

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: inheritance issues

2005-05-24 Thread Walter Underwood

--On May 24, 2005 7:39:40 AM -0600 Antone Roundy <[EMAIL PROTECTED]> wrote:
> On Tuesday, May 24, 2005, at 01:52  AM, Henry Story wrote:
>> Simplify, simplify. I am for removing all inheritance mechanisms...

+1. Inheritance has very minor advantages and very serious disadvantages.

Inheriting values saves typing. It does not save bandwidth, because
HTTP compression will do nearly as well.
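
A quick way to check the bandwidth claim, sketched in Python with zlib
standing in for HTTP compression. The toy feed and the names build_feed
and savings are made up for this example; real feeds have more varied
content, so the exact numbers will differ.

```python
import zlib

AUTHOR = "<author><name>Walter Underwood</name></author>"


def build_feed(entries, repeat_author):
    """Build a toy feed; the author is stated once or repeated in every entry."""
    parts = ["<feed>"]
    if not repeat_author:
        parts.append(AUTHOR)
    for n in range(entries):
        parts.append("<entry><id>urn:example:%d</id>" % n)
        if repeat_author:
            parts.append(AUTHOR)
        parts.append("</entry>")
    parts.append("</feed>")
    return "".join(parts).encode("utf-8")


def savings(entries=50):
    """Return (raw bytes saved, compressed bytes saved) by inheriting the author."""
    inherited = build_feed(entries, False)
    repeated = build_feed(entries, True)
    raw = len(repeated) - len(inherited)
    packed = len(zlib.compress(repeated)) - len(zlib.compress(inherited))
    return raw, packed
```

Calling savings() compares the two feeds before and after compression.
The repeated element costs real bytes uncompressed, but most of that
should disappear once the compressor has seen it once.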

It is confusing and tricky to specify and implement. It makes the entry
different when it is standalone or in a feed. It multiplies the number
of test cases needed for validation.

Any one of those problems is serious.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: inheritance issues

2005-05-23 Thread Walter Underwood


--On May 24, 2005 1:02:54 AM +0100 Bill de hÓra <[EMAIL PROTECTED]> wrote:
>
> Inheritance suggests a programming model to allow the evaluator to be coded 
> for it.

Which is why it shouldn't be called "inheritance". I'd prefer something
like "cascading values".

wunder
--
Walter Underwood
Principal Architect, Verity

Re: posted PaceAuthorContributor

2005-05-23 Thread Walter Underwood

--On May 23, 2005 10:52:47 AM -0700 Tim Bray <[EMAIL PROTECTED]> wrote:
>
> If you're worried, one good way to  address the issue would be to say that
> "the semantics of this element are based on the Dublin Core's [dc:creator]",
> DC is pretty clear as I  recall.  I've been thinking that would be a good idea
> anyhow.

Let's call it atom:creator, then, and actually use the DC definition.

Not because DC is better, but because it makes the metadata crosswalks
(interoperability) work smoothly.

wunder
--
Walter Underwood
Principal Architect, Verity

PaceCaching

2005-05-20 Thread Walter Underwood

--On Tuesday, May 17, 2005 09:13:37 PM -0700 Tim Bray <[EMAIL PROTECTED]> wrote:

> PaceCaching
> Multiple -1's, it fails.

I'll address the objections anyway, because I (still) think this is
important.

1. This introduces multiple caching schemes.

Wrong. Right now we have multiple schemes, with HTTP caching, ad hoc client
caching, and ad hoc server-side load shedding. This recommends one consistent
scheme, which we know will work. The current multi-scheme approach is a mess,
and we can be sure that it will have problems.

2. This applies protocol caching to a client.

True, but not really an issue. HTTP caching does work when used to manage
a client cache. Compare a client working through an HTTP cache to one which
checks the cache information internally before issuing HTTP requests. The HTTP
server will see the same series of requests. Effectively, the client will
run a virtual HTTP cache internally.

3. Server-side parsing is too much overhead.

Maybe with 90 MHz Pentiums, but XML parsing is really fast these days.
Parse the file, cache the values, and toss them if the file has changed
when you stat it. Or, the blog server software can set the cache info
out-of-band to the server.

4. This requires synchronized clocks.

Those are a SHOULD for HTTP, too. And they ought to be a SHOULD for Atom
anyway, because you cannot date-sort entries from two servers with
unsynchronized clocks.

5. This is just like HTTP-EQUIV and that has failed.

Yes and no. Most HTTP servers ignore HTTP-EQUIV, but it is still useful
for passing through things like content-language when there is no HTTP
header present.

For Atom, the caching info would be valid when there is no HTTP cache
header. This is exactly where HTTP-EQUIV is effective today.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: multiple atom:author elements?

2005-05-20 Thread Walter Underwood

--On Friday, May 20, 2005 09:33:01 AM -0400 Robert Sayre <[EMAIL PROTECTED]> wrote:

> Those are three terrible use cases. Shall we go through every element
> in the format and evaluate their fitness for scientific journals,
> legal documents, and legislation?

Here is a list of 341 scientific journals with RSS feeds. Some of
these use a single author element with multiple authors crammed in,
some use multiple author elements. The author elements have other
problems, like "Binks, J. J." vs. "Jar Jar Binks", but that is something
that the WG has ruled out of scope.

  http://www.library.unr.edu/ejournals/alphaRSS.aspx

We really should use "creator" instead of "author". Author is
nonsense for photoblogs. We can do a lot of things with Atom, but
reinventing Dublin Core badly should not be one of those.

+1 for multiple author elements.
+1 for "creator" instead of "author", if anyone wants to go there.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: PaceAllowDuplicateIdsWithModified

2005-05-18 Thread Walter Underwood

--On Thursday, May 19, 2005 01:12:22 AM +1000 Eric Scheid <[EMAIL PROTECTED]> wrote:

> (See the wiki for a survey of tools and the dates they support.)
> hmmm ... Blogger, Moveable Type, JournURL, bloxsom, ExpressionEngine,
> ongoing, Roller, Macsanomat, WordPress, and BigBlogTool all provide dates
> which represent the last date/time the entry was modified, and there is no
> info for LiveJournal.

We abandoned full LiveJournal compatibility a long time ago by requiring
time zones. Older LJ posts do not have time zones. Don't know about the
current ones.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: Atom 1.0?

2005-05-10 Thread Walter Underwood

--On Tuesday, May 10, 2005 09:12:09 AM -0700 Paul Hoffman <[EMAIL PROTECTED]> wrote:

> At 9:09 PM -0700 5/9/05, Walter Underwood wrote:
>> Seriously, I don't mind "Atom 1.0" as long as the next version is
>> "Atom 2.0".
>
> +12

I'd also be happy with just "Atom" and saying "RFC  Atom" when
pressed for a version. Even with "Atom 1.0" we'll need to say which RFC.

If we choose a specific name, it *must* be in the RFC. Because the RFC
must be a hit for that search.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

RE: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Walter Underwood

--On May 10, 2005 8:57:47 AM -0400 Scott Hollenbeck <[EMAIL PROTECTED]> wrote:
>
> I have to agree with Paul.  I don't believe that the issue of white space in
> the syndicated content is really an Atompub issue.  It might be an issue for
> the content creator.  It might be an issue for the reader.  As long as the
> pipe between the two passes the content as submitted, though, the pipe has
> done its job.

If publishers and subscribers have obstacles to using Atom, that sounds
like a problem to me.

"Everyone has this problem" is not a good reason to ignore it. Someone
has to be the first to solve it, might as well be us. It is not acceptable
to build formats for the "English Wide Web". That doesn't exist any more.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Atom 1.0?

2005-05-09 Thread Walter Underwood

--On May 9, 2005 7:29:58 PM -0700 Tim Bray <[EMAIL PROTECTED]> wrote:
> 
> Anyone have a better idea? --Tim

Hey, let's vote on a *new* name. I'm +1 on "Naked News", because
it delivers the news without chrome and crap. Or maybe that is what
you get when Atom (Adam?) goes public. Or because sex sells.

Seriously, I don't mind "Atom 1.0" as long as the next version is
"Atom 2.0". Please don't increment the right-of-the-dot part forever,
because I just had to fix some software that made the (reasonable)
assumption that 5.10==5.1, even though "5.10" is really Solaris 10.
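
The trap is easy to reproduce. A small Python sketch, where version_tuple
is a hypothetical helper and not the code I actually fixed:

```python
def version_tuple(text):
    """Split a dotted version into integers so each part compares numerically."""
    return tuple(int(part) for part in text.split("."))


# Treating the version as a decimal number loses the distinction.
same_as_float = float("5.10") == float("5.1")

# Comparing part by part keeps 5.10 distinct from, and later than, 5.1.
later_as_tuple = version_tuple("5.10") > version_tuple("5.1")
```

Sticking to "1.0" then "2.0" sidesteps the question of whether the part
after the dot is a fraction or a counter.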

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-07 Thread Walter Underwood

--On May 7, 2005 11:29:07 AM +0300 Henri Sivonen <[EMAIL PROTECTED]> wrote:
>
> Why would you put line breaks in the CJK source, then? Isn't the "problem"
> solved with the least heuristics by the producer not putting breaks there?

It would be even better if they would just speak English. :-)

White space is not particularly meaningful in some of these languages,
so we cannot expect them to suddenly pay attention to that just so
they can use Atom. There will be plenty of content from other formats
with this linguistically meaningless white space.

If we get this wrong, Atom-delivered content will look broken in
some languages, and a bunch of extra-spec practice will build up about
how to fix it. Much better to get it right in 1.0.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: PaceCaching

2005-05-07 Thread Walter Underwood

--On May 6, 2005 4:28:44 PM -0700 Paul Hoffman <[EMAIL PROTECTED]> wrote:
>
> -1. Having two mechanisms in two different layers is a recipe for disaster. 
> If HTTP headers are good enough for everything else on the web, they're good 
> enough for Atom.

That would be a problem. But this is one mechanism with two ways to
specify it. One is out-of-band in a server-specific way, the other
is in the document in a standard way. Either way, it is HTTP rules for
caching at all intermediate caches and at the client.

Architecturally, this is exactly the same as HTTP-EQUIV meta tags for
HTTP headers, and very similar to the ROBOTS meta tag for /robots.txt.
In both cases, they provide a way for the document author to specify
something without having permissions on the server software config.

Further, these should be implemented exactly like HTTP-EQUIV, where
the server software reads them and sets the header.

The HTTP-EQUIV meta tag is proof "put it in the header" is not good
enough for everything else. If that wasn't needed, it would be deprecated
by now.

There is a problem here, though. We need to specify the priority of the
in-document specs vs. the HTTP header specs. I propose following the HTTP
standard, in saying that the HTTP headers trump anything in the body.
I'll even assume that following the HTTP spec is non-controversial, and
go update the PACE.
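
The priority rule could look something like this Python sketch. The
function names are hypothetical, and treating the in-document value as a
number of seconds is an assumption for the example, not something the
PACE is being quoted on here.

```python
import re

# Matches a max-age directive at the start of the value or after a comma.
_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*([0-9]+)", re.IGNORECASE)


def header_max_age(cache_control):
    """Extract max-age seconds from a Cache-Control header value, if present."""
    if cache_control is None:
        return None
    match = _MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else None


def effective_max_age(cache_control, feed_max_age):
    """The HTTP header wins; the in-document value is only a fallback."""
    from_header = header_max_age(cache_control)
    return from_header if from_header is not None else feed_max_age
```

The in-document value only matters when the header is absent or silent,
which is the same position HTTP-EQUIV is in.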

wunder
--
Walter Underwood
Principal Architect, Verity

RE: Selfish Feeds...

2005-05-06 Thread Walter Underwood


--On May 6, 2005 4:37:23 PM -0400 Bob Wyman <[EMAIL PROTECTED]> wrote:

>   Frankly, I really wish that we had done the "blog architecture" work
> many months ago so that we would all have a shared understanding of the
> system-wide issues and components rather than the widely divergent personal
> and partial views that are obvious in many our conversations today...

Agreed. "A conceptual model of a resource" is up there at the front of
our charter, and if we don't have that, it doesn't seem like the WG is done.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Atom feed refresh rates

2005-05-06 Thread Walter Underwood

--On May 5, 2005 10:53:48 AM -0700 John Panzer <[EMAIL PROTECTED]> wrote:
> 
> I assume an HTTP Expires header for Atom content will work and play well with
> caches such as the Google Accelerator (http://webaccelerator.google.com/). 
> I'd also guess that a syntax-level tag won't.  Is this important? 

The syntax-level tag is useful inside a client program with a cache.
It can reduce the number of requests at the source, rather than 
reducing them in the middle of the network at an HTTP cache.

There is extra benefit from putting that info into the HTTP headers,
because the HTTP cache is shared between multiple clients. The source
webserver sees one GET per HTTP cache instead of one GET per Atom client.

The syntax-level tag also provides a way for the feed author to specify the
info without depending on webserver-specific controls. It does depend on
some extra bit of software to take that info and put it in the HTTP
Expires or Cache-control headers.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: AtomPubIssuesList for 2005/05/05

2005-05-05 Thread Walter Underwood

--On May 5, 2005 7:17:00 AM -0400 Sam Ruby <[EMAIL PROTECTED]> wrote:
>
> Demonstrate that you have revisited the previous discussion, and that you
> either have something new to add, or can point out some evidence that the
> previous consensus call was made in error.

PaceCaching was not discussed, and it was rejected based on false information.
It was rejected because it was HTTP-specific (it is not), and because
it was non-core (similar features are common in other RSS specs).

It does not interact with other features, so it should be a fairly
clean, quick discussion.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Atom feed refresh rates

2005-05-05 Thread Walter Underwood

--On May 5, 2005 8:07:15 AM -0500 Mark Pilgrim <[EMAIL PROTECTED]> wrote:
>
> Not to be flippant, but we have one that's widely available.  It's
> called the Expires header. 

You need the information outside of HTTP. To quote from the RSS spec
for ttl:

  This makes it possible for RSS sources to be managed by a file-sharing 
  network such as Gnutella. 

Caching information is about knowing when your client cache is stale,
regardless of how you got the feed.

wunder
--
Walter Underwood
Principal Architect, Verity

RE: Atom feed refresh rates

2005-05-05 Thread Walter Underwood

--On May 5, 2005 8:15:10 AM +0100 Andy Henderson <[EMAIL PROTECTED]> wrote:
>
> There is no RSS2 feature I can see that allows feed providers to tell
> aggregators the minimum refresh period.  There's the ttl tag.  That was, I
> believe, introduced for a different purpose and determines the Maximum time
> a feed should be cached in a certain situation. 

We need both a ttl (max-age) and expires. One or the other is appropriate
for different publishing needs. We also need to specify what you do with
those values, or you end up with a mess, like the RSS2 ttl meaning reversing
over an undocumented value (Yikes!).

> What has yet to be tried is a specific tag in the core feed standard that
> promotes and determines good behaviour for aggregators refreshing their
> feeds.  Even if it were to prove only a limited benefit, it would still be a
> benefit.

It has been tried several ways, originally in robots.txt extensions and
also in RSS. It doesn't work. The model is not rich enough for publishers
or for spiders/aggregators.

Max-age/expires is already designed and proven. By page count, 20% of the
HTTP 1.1 spec is about caching. If we want to write a new caching/scheduling
approach, we can expect it to be a 20 page spec, plus an additional 10
pages on how to work with the HTTP model.

See the Notes section here for details on when to use max-age or expires,
and on the problems with calendar-based schemes.

  <http://www.intertwingly.net/wiki/pie/PaceCaching>

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Atom feed refresh rates

2005-05-04 Thread Walter Underwood


PaceCaching uses the HTTP model for Atom, whether Atom is used over HTTP
or some other protocol.

PaceCaching was rejected by the editors because it was too late (two months
ago) and non-core. I think that: a) it is never too late to get it right,
and b) scalability is core.

The PACE describes why refresh rates do not solve the problem adequately.

wunder

--On May 4, 2005 5:44:18 AM -0500 Brett Lindsley <[EMAIL PROTECTED]> wrote:

> 
> 
> Andy, I recall bringing up the same issue with respect to portable devices.
> My angle was that firing up the transmitter, making a network connection and
> connecting to the server is still an expensive operation in time and power
> (for a portable device) - even if the server returns nothing.  There is no
> reason to check feeds that are not being updated, but then, there currently
> is no way to know this.
> 
> I recall there was a proposal on cache control. That seemed like a good
> direction, but I don't recall it being discussed. As you indicated, if the
> feed had some element that indicated it won't be updated (for example) for
> another day (e.g. a "daily news summary"), then the end client would need to
> only check once a day.
> 
> Brett Lindsley, Motorola Labs
> 
> Andy Henderson wrote:
> 
>> If I'm asking this in the wrong place, sorry; please redirect me if you can.
>> 
>> I am the author of an Aggregator and I'm looking for advice on refresh
>> rates.  There was some discussion in this group back in June about a
>> possible 'Refresh rate' element.  That seems to have been dismissed in
>> favour of bandwidth throttling techniques, notably etag, last-modified and
>> compression.  I already support all these plus some additional ones.  I am
>> uncomfortable, though, with the implication that refresh rates don't matter
>> and should be left to the end-user to decide.
>> 
>> I am adding Atom support to my Agg.  For RSS feeds, I have used the ttl and
>> sy:updatePeriod / sy:updateFrequency elements to  allow feed providers to
>> limit refresh rates.  I have, in any case, imposed a minimum refresh rate of
>> one hour - because that seemed the decent thing to do.  However, I'm coming
>> under pressure to reduce that minimum limit for feeds that are clearly
>> designed for shorter refresh periods - such as the Gmail Atom feeds.  I'm
>> reluctant to implement a free-for-all so I'm looking for guidance on how I
>> should tackle this issue.
>> 
>> Andy Henderson
>> Constructive IT Advice
>> 
>>  
>> 
> 
> 
> 



--
Walter Underwood
Principal Architect, Verity

Re: FYI: More on duplicates in feeds: DoubleClick does ads the WRONG way!

2005-05-02 Thread Walter Underwood

How to make money from ads is off-topic, sorry for starting in that
direction.
I don't quite get this, though. What about e-mails and RSS is
being compared?
--On Monday, May 02, 2005 11:43:28 AM +0100 James Aylett <[EMAIL PROTECTED]> wrote:
 (2) but it's not that uncommon for people to need to track
 static instances - consider, for instance emails; HTTP level
 cache control is being used as well. There's no reason
 RSS feeds can't be considered in this way.
The business of ads in feeds is pretty confused. How does a feed
relate to an impression, when it might be viewed later or not at all?
I would hope that advertisers figure out that impression-based approaches
don't really mesh with feeds, but in the meantime, we are probably stuck
with hacks like Bob Wyman describes.
We punted explicit support for ads, so they will continue to show up in
content and cause more work for Bob.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: FYI: More on duplicates in feeds: DoubleClick does ads the WRONG way!

2005-05-02 Thread Walter Underwood

--On May 2, 2005 5:32:22 PM +1000 Eric Scheid <[EMAIL PROTECTED]> wrote:
>
> Counting impressions is essential to their trade, and you'll find that it is
> industry standard practice.

Make that "was essential", and "should be a dying practice." Ads have moved to 
results-based billing, paying for clickthrough and conversion.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: PaceOptionalFeedLink

2005-04-30 Thread Walter Underwood

--On April 30, 2005 3:03:50 PM -0400 Robert Sayre <[EMAIL PROTECTED]> wrote:
>
> "atom:feed elements MUST NOT contain more than one atom:link element
> with a rel attribute value of "alternate" that has the same
> combination of type and hreflang attribute values."

That actually specifies something different, the duplication, without
saying whether atom:link is recommended. I recommend adding this text:

"An atom:feed element SHOULD/MAY contain one such atom:link element."

I'll let other people contribute on whether it is SHOULD or MAY.
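
The uniqueness rule quoted above can be checked mechanically. This is a minimal sketch, not text from any draft: the namespace URI, the function name, and the choice to treat a missing rel attribute as "alternate" are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace URI; the drafts under discussion may use a different one.
ATOM_NS = "http://www.w3.org/2005/Atom"


def duplicate_alternates(feed_xml):
    """Return the (type, hreflang) pairs used by more than one alternate link."""
    feed = ET.fromstring(feed_xml)
    counts = Counter(
        (link.get("type"), link.get("hreflang"))
        for link in feed.findall("{%s}link" % ATOM_NS)
        # Assumption: a link with no rel attribute counts as "alternate".
        if link.get("rel", "alternate") == "alternate"
    )
    return [pair for pair, seen in counts.items() if seen > 1]
```

An empty result means the MUST NOT is satisfied. It says nothing about whether a feed with no such link at all is acceptable, which is the SHOULD/MAY question.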

wunder
--
Walter Underwood
Principal Architect, Verity

Re: PaceOptionalSummary

2005-04-27 Thread Walter Underwood

--On Wednesday, April 27, 2005 10:38:03 AM -0400 Robert Sayre <[EMAIL PROTECTED]> wrote:
I am willing to concede that there are valid reasons in particular
circumstances to ignore the requirement for a summary.  Are you willing
to concede that there are implications to such a decision that must be
understood and carefully weighed before choosing to omit a summary?
What are the interoperability considerations that must be carefully
weighed? I think the "full implications" you're worried about are
non-technical.
It certainly makes the feed much less useful for some kinds of applications.
I work on a search engine, and, for us, a titles-only feed is very different
from one with content or summaries. A titles-only feed is basically a pretty
version of "ls". The engine follows the links, but does not index the
document. With summaries or content, the feed is a document in its own
right, and it makes sense to index it under its own URL.
A search engine won't ignore a titles-only feed, but it is less likely
to treat it as a first-class document.
Now I'm going to go re-read the PACE to see where I am on +/-.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: PaceOptionalSummary

2005-04-27 Thread Walter Underwood

--On Wednesday, April 27, 2005 02:02:43 AM +0200 "A. Pagaltzis" <[EMAIL PROTECTED]> wrote:
So far I haven't seen a cogent explanation of the significant
semantics offered by an empty atom:summary inside an otherwise
valid minimum atom:entry.
It should be obvious. It means that when you summarize the content,
you get nothing at all. It is an effectively, but not literally,
content-free entry.
I'm not being entirely silly here. We could distinguish between
"I am not providing a summary" (no element) and "the summary is void"
(empty summary).
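
A consumer could tell the two cases apart with a few lines. This is only a sketch: it uses un-namespaced element names for brevity, and summary_state is a hypothetical helper, not part of any Atom library.

```python
import xml.etree.ElementTree as ET


def summary_state(entry_xml):
    """Classify an entry's summary as "absent", "empty", or "present"."""
    entry = ET.fromstring(entry_xml)
    summary = entry.find("summary")
    if summary is None:
        # No element at all: the publisher is not providing a summary.
        return "absent"
    if not (summary.text or "").strip() and len(summary) == 0:
        # Element exists but holds no text and no child elements.
        return "empty"
    return "present"
```
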
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: NoIndex, again

2005-04-19 Thread Walter Underwood


A long time ago, I proposed a robots processing instruction that could
be used in any XML format. I can find that again.

An in-document robots directive is useful because it is controlled
by the document author, rather than by the webmaster.

"nofollow" is not particularly useful, because there is almost always
another path to the document. Still, it can be a polite hint to the 
robot that all the links in the doc are junk.

I would use exactly the model in the HTML robots meta tag, because:
a) that is what robots already know how to deal with, and b) it has
proven good enough.
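
For reference, the HTML robots meta model is small enough to sketch. The function below is a hypothetical parser for the directive string, using the values from the original robots meta convention (index, noindex, follow, nofollow, all, none). How the string would be carried in an Atom document is left open here.

```python
def parse_robots(content):
    """Parse a robots directive string such as "noindex, follow"."""
    index, follow = True, True  # default when nothing is specified
    for token in content.lower().split(","):
        token = token.strip()
        if token == "noindex":
            index = False
        elif token == "nofollow":
            follow = False
        elif token == "none":
            index, follow = False, False
        elif token == "index":
            index = True
        elif token == "follow":
            follow = True
        elif token == "all":
            index, follow = True, True
        # Unknown tokens are ignored, as a robot would ignore them.
    return {"index": index, "follow": follow}
```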

wunder

--On April 19, 2005 11:15:16 PM -0400 Nikolas 'Atrus' Coukouma <[EMAIL PROTECTED]> wrote:

> 
> Hi,
> I've recently ended up in argument about what to do with feeds that
> don't want to be reproduced. I e-mailed Dave Winer in the hope of
> getting some information about RSS end of things. That resulted in a
> blog entry with interesting comments [1], and I now know that Creative
> Commons has an RDF schema for describing licensing [2].
> 
> The only common feature I want to include, and haven't found, is the
> "noindex" type of behavior (do not include in search engines). I
> searched the archives of this list and found an old thread discussing
> this very issue [3]. It seems to have fizzled out and I haven't found
> any more recent documents or discussions.
> 
> Was the issue simply forgotten or purposefully dropped?
> 
> In the RSS discussion, it was suggested by Roger Benningfield that
> search engines and syndication sites use atom:summary instead of
> atom:content to avoid the noarchive issue. The rationale is that
> summaries are meant to be reproduced, much like an abstract for a paper.
> 
> I'm not sure about nofollow, I think noindex is definitely needed. The
> latter could be used to opt-out of services such as Feedster,
> Technorati, and PubSub.
> 
> Thoughts and comments?
> 
> [1] http://www.reallysimplesyndication.com/2005/04/19#a445
> [2] http://web.resource.org/cc/
> [3] http://www.imc.org/atom-syntax/mail-archive/msg00183.html
> 
> Regards,
> -Nikolas 'Atrus' Coukouma
> 
> 



--
Walter Underwood
Principal Architect, Verity

Re: HTML/XHTML type issues, was: FW: XML Directorate Reviewer Comments

2005-04-13 Thread Walter Underwood

--On April 13, 2005 9:06:59 AM +0300 Henri Sivonen <[EMAIL PROTECTED]> wrote:
>
> Instead of saying "XHTML" it would be clearer to say "XHTML 1.x" or defining 
> it in terms of the XHTML 1.x namespace URI.

This could work. "XHTML 1.0" will not be confused with a media type.

When XHTML 2.0 is ready, we can add a supplemental RFC which defines
a new attribute value for that.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: PaceCoConstraintsAreBad

2005-04-09 Thread Walter Underwood


--On April 8, 2005 8:29:52 PM -0400 Robert Sayre <[EMAIL PROTECTED]> wrote:
>
> Please don't respond to me by saying that accessibility is important.

I would never say that. Required or essential, but not merely important.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: PaceCoConstraintsAreBad

2005-04-08 Thread Walter Underwood

--On April 8, 2005 6:59:47 PM -0400 Robert Sayre <[EMAIL PROTECTED]> wrote:
>
> Walter, you are missing my point. You've said it yourself:
> 
> "Maybe summaries are optional, but not because accessibility is optional."[0]

That was in reply to a proposal to make accessibility an optional profile, and
to make summaries required only in that profile. That approach is unacceptable.
I would read my comment as "regardless of your position on summaries,
accessibility is required."

Local textual summaries are rather common on the web. The <a> tag, for example.
Current accessibility practice is to make the anchor text understandable out
of context. In other words, to make it a summary of the linked resource.
Even if the remote resource is text!

For the <img> tag, the alt attribute is used to provide a local, textual equivalent.
Again, this is required practice for accessibility. Same thing for graphs,
charts, audio, and video.

These are top-level requirements. They fit on the WAI pocket card. There
are ten "quick tips" and five of them are about local textual equivalents:

  <http://www.w3.org/WAI/References/QuickTips/>

wunder
--
Walter Underwood
Principal Architect, Verity

Re: PaceCoConstraintsAreBad

2005-04-08 Thread Walter Underwood

--On Friday, April 08, 2005 01:33:20 AM -0400 Robert Sayre <[EMAIL PROTECTED]> wrote:
Accessibility is a non-starter absent expert opinion or substantially
similar formats. Frankly, the notion that remote content constitutes an
accessibility concern is absurd. Might as well write off the whole Web.
No, non-accessible designs are non-starters.
I am mystified when I see non-accessible web sites and technologies
deployed. If a building was built with round doorknobs and steps,
the architect would not get paid until it was fixed and made accessible.
Why is discrimination OK on the web?
Accessibility is required by law, and not just in the US. Plus, it is
an "essential aspect" of the web.
 "The power of the Web is in its universality. Access by everyone
  regardless of disability is an essential aspect."
 -- Tim Berners-Lee
 <http://www.w3.org/WAI/>
Is that expert enough for you?
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: Spaces supports slash:comments. Result = Duplicates Galore!

2005-04-07 Thread Walter Underwood

One way to look at this is to define what parts are local content
as opposed to caches of remote, and base the Etag or other hash on
that.
I still think we should address caching in Atom 1.0. This would
have been part of that. Scaling is an essential thing for syndication,
and caching is the best known way to scale.
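
A minimal sketch of that idea, assuming the only cached remote value is the slash:comments count. The set of volatile tags and the function name are hypothetical, SHA-256 stands in for whatever hash an aggregator already uses, and only direct child elements are hashed to keep the example short.

```python
import hashlib
import xml.etree.ElementTree as ET

SLASH_NS = "http://purl.org/rss/1.0/modules/slash/"
# Elements treated as caches of remote state rather than local content (assumed list).
VOLATILE_TAGS = {"{%s}comments" % SLASH_NS}


def stable_hash(item_xml):
    """Hash an item's direct children, skipping the volatile elements."""
    item = ET.fromstring(item_xml)
    digest = hashlib.sha256()
    for child in item:
        if child.tag in VOLATILE_TAGS:
            continue
        digest.update(child.tag.encode("utf-8") + b"\0")
        digest.update((child.text or "").strip().encode("utf-8") + b"\0")
    return digest.hexdigest()
```

With this, a new comment leaves the hash unchanged, while an edit to the title or description changes it.
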
wunder
--On Thursday, April 07, 2005 02:48:07 PM -0400 Bob Wyman <[EMAIL PROTECTED]> wrote:
Spaces.msn.com recently announced support for "slash:comments," an
element which shows how many comments an RSS item has associated with it.
As Dare Obasanjo explains[1]:
"Another cool RSS enhancement is that the number of comments on
each post is now provided using the slash:comments elements. Now
users of aggregators like RSS Bandit can track the comment counts on
various posts on a space. I've been wanting that since last year."
Of course, the side effect of this change is that any aggregator
that uses an MD5-like approach to detect changes will now think that an
entry has been updated every time a new comment is made. This may or may not
be what is desired by consumers of feeds... In any case, there are now
millions of blogs whose entries are changed every time anyone comments on
them. Should aggregators ignore changes that are limited to the
"slash:comments" element? If so, are there other elements that should be
ignored?
Now, Spaces only publishes RSS feeds... However, if similar atom
extensions were to be defined, the problem would appear with Atom feeds as
well.
bob wyman
[1]
http://spaces.msn.com/members/carnage4life/Blog/cns%211piiOwAp2SJRIfUfD95CnR
Lw%21430.entry



--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: Date accuracy

2005-03-25 Thread Walter Underwood

--On March 25, 2005 1:47:29 PM + Graham <[EMAIL PROTECTED]> wrote:
> 
> There are several RSS feeds out there that have dates where the day is
> accurate but the time is always the same (usually 10am for some reason),
> regardless of the time of publication, ...

> Proposal: Add to Date Construct section:
> "Date values must have a granularity of one second"

Precision and accuracy are very different things. Precise timestamps
have a lot of numbers. Accurate timestamps are a correct measurement
of a clock.

Does this mean that they must have an accuracy of one second? That is, that
the timestamp for the update or publish event must be correct within +/- 0.5s 
when compared to a trusted time standard?

Atom already requires the timestamp to be precise to one second, but it is not
practical to require (MUST) accuracy. We could do it, but we'd lock out the 99%
of machines with bad clocks.

Plus, some publications just don't have an accurate time -- archives digitized
from paper, like old New Yorker issues or MIT AI lab tech notes. One approach to
those is to choose a convention for the time portion, like noon UTC. That is
most likely to be the same day in other time zones. We could mention that as
a useful convention.
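
The noon UTC convention is easy to check. The sketch below uses whole-hour offsets only and a hypothetical helper name. It shows the local calendar date that a noon UTC timestamp lands on for each offset.

```python
from datetime import datetime, timedelta, timezone


def local_dates_for_noon_utc(year, month, day, offsets_hours):
    """Map each UTC offset (in hours) to the local date of noon UTC on the given day."""
    noon = datetime(year, month, day, 12, 0, tzinfo=timezone.utc)
    return {
        hours: noon.astimezone(timezone(timedelta(hours=hours))).date().isoformat()
        for hours in offsets_hours
    }


dates = local_dates_for_noon_utc(2005, 3, 25, range(-12, 15))
```

Offsets from -12 through +11 keep the same calendar date. Only +12 and beyond roll over to the next day, which is why the convention is "most likely" rather than always right.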

The Atom spec should recommend that clocks be accurate. There is no point in
having sortable timestamps without trustworthy clocks. This is already a
SHOULD in the HTTP 1.1 spec, and you can grab that language from the 
caching Pace I put together. Or I can, if there is support for it.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Alternative to the date regex

2005-03-25 Thread Walter Underwood


+1 on dropping the regex. It isn't from any of the other specs,
it isn't specifically called out as explanatory and non-normative,
and it is too long to be clear.

Some examples would be nice, along with some examples of things
which do not conform.
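
As a starting point, here are a few conforming and non-conforming values with a shape check. The pattern is written from the RFC 3339 date-time rule with an uppercase T and Z, not copied from the draft, and it checks shape only (it would accept month 13). The example values and the function name are made up for illustration.

```python
import re

# Shape of an RFC 3339 date-time with uppercase "T" and "Z" (no range checks).
DATE_TIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}"      # full-date
    r"T[0-9]{2}:[0-9]{2}:[0-9]{2}"     # partial-time
    r"(\.[0-9]+)?"                     # optional fractional seconds
    r"(Z|[+-][0-9]{2}:[0-9]{2})"       # time-offset is required
)


def looks_like_date_construct(value):
    """Return True if the whole string has the expected date-time shape."""
    return DATE_TIME.fullmatch(value) is not None


conforming = [
    "2005-03-25T17:11:09Z",
    "2005-03-25T09:11:09.25-08:00",
]
non_conforming = [
    "2005-03-25 17:11:09Z",            # space instead of T
    "2005-03-25T17:11:09",             # no time offset
    "2005-03-25T17:11Z",               # seconds missing
    "Fri, 25 Mar 2005 17:11:09 GMT",   # RFC 822 style date
]
```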

wunder

--On March 25, 2005 5:11:09 PM + Graham <[EMAIL PROTECTED]> wrote:

> 
> Currently we have this
> 
> "A Date construct is an element whose content MUST conform to the
> date-time BNF rule in [RFC3339].  I.e., the content of this element
> matches this regular expression:
> 
> [0-9]{8}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)
>  ?(Z|[\+\-][0-9]{2}:[0-9]{2})
> 
> As a result, the date values conform to the following specifications..."
> 
> The problem with the regex is that it's entirely redundant. If we look at 
> Norm's message where the regex was suggested [1], he intends it as a profile 
> of xsd:dateTime, which allows a variety of date formats. However we're using 
> it as a profile of RFC3339, which already requires that date-times match the 
> regex 100%. Having the regex there as well is just confusing - until 
> preparing this email I was under the impression it made some additional 
> restrictions on RFC3339.
> 
> The nearest thing I see to an additional restriction is that there must be a 
> capital T between the date and time, which the date-time BNF rule we mention 
> also requires, but the prose later mentions you might be allowed to use 
> something different.
> 
> Proposal:
> Replace the first para and regex with:
> 
> A Date construct is an element whose content MUST conform to the
> date-time BNF rule in [RFC3339]. Note this requires an uppercase letter T
> between the date and time sections.
> 
> Secondly, *all* RFC3339 date-times are compatible with the 4 specs mentioned, 
> so the wording of the second paragraph ("As a result...") is a bit strange, 
> since it's not as a result of anything we've done. Just say "Date values 
> expressed in this way are also compatible with...".
> 
> Graham
> 
> [1]http://www.imc.org/atom-syntax/mail-archive/msg13116.html
> 
> 



--
Walter Underwood
Principal Architect, Verity

Re: new issues in draft -06, was: Updated issues list

2005-03-20 Thread Walter Underwood

--On March 20, 2005 11:44:30 AM -0800 Tim Bray <[EMAIL PROTECTED]> wrote:
>
> Good point.  My impression is that we do currently have SHOULD-level mandate
> to serve valid HTML; recognizing that most real-world implementors do make a
> best-effort with tag soup.  Anyone who thinks that the language needs
> improving should suggest improvements.

I support a SHOULD on that. The Robustness Principle would suggest exactly
that. Consumers of Atom may make an attempt to parse arbitrary HTML-like
content, but producers should make the effort to serve clean HTML.

That free-range HTML is nasty stuff. In the past week, we had two customers
freely mixing slash and backslash in their URL paths. Sigh.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: PaceRepeatIdInDocument solution

2005-02-20 Thread Walter Underwood

About logical clocks in atom:modified:

--On February 21, 2005 3:30:13 AM +1100 Eric Scheid <[EMAIL PROTECTED]> wrote:
>
> Semantically, it would work ... for comparing two instances of one entry. It
> wouldn't work for establishing if an entry was modified before or after
> [some event moment] (eg. close of the stock exchange).

Establishing sequences of events is rather tricky. See Leslie Lamport's
"Time, Clocks, and the Ordering of Events in Distributed Systems" for how
to do it with logical clocks. The core part of the paper is short, maybe
five pages, and definitely worth reading if you care about this stuff.

 <http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf>
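
For readers who want the gist before reading the paper, a logical clock fits in a dozen lines. This is a sketch of the basic counter rule only; the class and method names are mine, and nothing here is proposed Atom syntax.

```python
class LamportClock:
    """A logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return the new time."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp an outgoing message (sending is itself an event)."""
        return self.tick()

    def receive(self, message_time):
        """Merge the sender's timestamp, then count the receive event."""
        self.time = max(self.time, message_time) + 1
        return self.time
```

A receive always gets a larger timestamp than the matching send, so causally related events are ordered. Unrelated events on different servers may still tie or compare arbitrarily, and such timestamps say nothing about wall-clock moments like the close of the stock exchange.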

Synchronized clocks make this simpler. If Atom depends on comparing timestamps
from different servers, then synchronized clocks are a SHOULD. See the text in
PaceCaching for an example.

Synchronized clocks are already a SHOULD for HTTP.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Consensus call on last round of Paces

2005-02-15 Thread Walter Underwood

--On February 15, 2005 8:56:24 PM +0100 Anne van Kesteren <[EMAIL PROTECTED]> wrote:
> Walter Underwood wrote:
>> This also means that Atom cannot be used for BBC News, where order is
>> significant and non-chronological.
> 
> Could you elaborate on that?

The BBC News feeds are ordered by "importance", not date. Since the Pace says
the order is not significant, intermediate nodes could re-order the feed and be
perfectly legal Atom processors.

A publishing date order can be recovered from the date information in Atom.
Other orders cannot.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Consensus call on last round of Paces

2005-02-15 Thread Walter Underwood

--On February 15, 2005 11:12:48 AM -0800 Tim Bray <[EMAIL PROTECTED]> wrote:
>
> PaceEntryOrder
> One -1, but overwhelming support otherwise.
> DISPOSITION: Accepted.

I was the -1, and there is an open issue here. Accepting this means
that Atom cannot represent RSS 1.0 feeds. Is that OK? If so, where
do we state that in the spec?

As far as I know, this is the only exception to interoperability with
other RSS formats.

This also means that Atom cannot be used for BBC News, where order
is significant and non-chronological.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: "atom:entry elements MUST contain an atom:summary element in any of the following cases"

2005-02-15 Thread Walter Underwood


I don't think that accessibility is optional. It isn't a profile, it is
a requirement. Maybe summaries are optional, but not because accessibility
is optional.

wunder

--On February 14, 2005 8:48:08 PM -0800 James M Snell <[EMAIL PROTECTED]> wrote:

> At the risk of beating the PaceProfile drum to death, I would think that an 
> Accessibility profile could be used to specify specific requirements for 
> accessible feeds.  The core could do exactly as you suggest below -- not 
> require summary.



--
Walter Underwood
Principal Architect, Verity

RE: PaceHeadless

2005-02-08 Thread Walter Underwood

--On Tuesday, February 08, 2005 08:39:42 AM -0500 Bob Wyman <[EMAIL PROTECTED]> wrote:
Linking to the feed is not an acceptable solution. It must be
possible to embed feed metadata in an entry in a feed and in an Entry
document.
+1
The feed document *must* be standalone. Everything required to
interpret the feed has to be in the feed.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: PaceProfile

2005-02-07 Thread Walter Underwood


--On February 7, 2005 7:13:21 PM -0500 Robert Sayre <[EMAIL PROTECTED]> wrote:
>
> So, you're looking for a way to include a "schema" association in the feed,
> and you want a standard way to do it. The only processors that will do
> anything useful with this information are those that know about the "profile".

Sounds like a job for .

Or a processing instruction, but I seem to be the only person that likes those.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: PaceEntryOrder

2005-02-07 Thread Walter Underwood

--On February 7, 2005 4:27:12 PM -0500 Sam Ruby <[EMAIL PROTECTED]> wrote:
>
> Ultimately, the sentiment that I want conveyed is that publishers are not
> safe to assume that clients will read anything into the order.

And I think that the order should mean "the publisher put them in this order."
The Pace forbids that interpretation.

Clients can reorder things, show only a few, whatever. I'm not restricting
client behavior.

Do other specs in the RSS family say anything about order? If order is
significant in those, then making it not significant in Atom will hurt
interoperability.

Hmm, I can't find any ordering restrictions in a quick read of RSS 0.91 
and 2.0, but RSS 1.0 does specify ordering.

From RSS 1.0, section 5.3.5:

   An RDF Seq (sequence) is used to contain all the items rather than an
   RDF Bag to denote item order for rendering and reconstruction.

   <http://web.resource.org/rss/1.0/spec#s5.3.5>

wunder
--
Walter Underwood
Principal Architect, Verity

Re: PaceEntryOrder

2005-02-07 Thread Walter Underwood

--On Monday, February 07, 2005 12:24:15 PM -0800 Paul Hoffman <[EMAIL PROTECTED]> wrote:
At 11:07 AM -0800 2/7/05, Walter Underwood wrote:
-1. I don't see the benefit. Clients MAY re-order them, but that
doesn't mean they MUST ignore the order. The publisher may prefer
an order which cannot be expressed in the attributes. The Macintouch
and BBC News feeds cited before are good examples.
I'm very confused. Clients that show the entries of those feeds in
the received order are perfectly acceptable according to the wording of this 
Pace.
Correct, clients may choose any order, including the original.
This is about the publisher's order preference. The Pace says that
the publisher cannot indicate a preferred order in the Atom format.
The order is not significant.
This is clearly counter to normal use, where the order does have
some meaning. The meaning varies by publisher, but it is usually
significant.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

Re: PaceEntryOrder

2005-02-07 Thread Walter Underwood


--On February 7, 2005 1:06:49 PM -0500 Robert Sayre <[EMAIL PROTECTED]> wrote:
> Paul Hoffman wrote:
>> 
>> +1. It is a simple clarification that shows the intention without 
>> restricting anyone.
> 
> +1. Agree in full.

-1. I don't see the benefit. Clients MAY re-order them, but that
doesn't mean they MUST ignore the order. The publisher may prefer
an order which cannot be expressed in the attributes. The Macintouch
and BBC News feeds cited before are good examples.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: PaceCaching posted

2005-02-07 Thread Walter Underwood

This is not restricted to HTTP. It uses HTTP's cache age algorithms,
because they are very carefully designed and have proven effective.
But it can be used for any local copy in an Atom client.
wunder
--On Monday, February 07, 2005 10:08:48 AM -0800 Paul Hoffman <[EMAIL PROTECTED]> wrote:
At 9:38 AM -0800 2/7/05, Walter Underwood wrote:
I was holding this back as out of scope and too close to the deadline,
but now that we are talking about sliding windows and delayed, cached
state, it is quite relevant.
Sorry, this is too late for consideration for the Atom core. Even if you 
had turned it in on time, I would give it a -1 for not being essential to the 
core for the Atom format. Atom will be distributed over many protocols, HTTP 
being one of them. Having said that, I think this would be an excellent 
extension, one that might keep the folks who don't understand HTTP scalability 
but feel free to talk about it anyway at bay.
--Paul Hoffman, Director
--Internet Mail Consortium

--
Walter Underwood
Principal Architect
Verity Ultraseek

PaceCaching posted

2005-02-07 Thread Walter Underwood


I was holding this back as out of scope and too close to the deadline,
but now that we are talking about sliding windows and delayed, cached
state, it is quite relevant.

This proposal uses HTTP caching algorithms, but does not require an
HTTP transport. Atom over other transports can use these algorithms.

  <http://www.intertwingly.net/wiki/pie/PaceCaching>
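
To show that the algorithm does not care about the transport, here is a sketch of the HTTP/1.1 age calculation (RFC 2616, section 13.2.3) as plain arithmetic on seconds. It illustrates the HTTP algorithm, not the text of the Pace. The parameter names follow the RFC, and the function names are mine.

```python
def current_age(age_value, date_value, request_time, response_time, now):
    """Estimate how old a stored copy is, in seconds, the way HTTP/1.1 caches do."""
    apparent_age = max(0, response_time - date_value)
    corrected_received_age = max(apparent_age, age_value)
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    resident_time = now - response_time
    return corrected_initial_age + resident_time


def is_fresh(freshness_lifetime, age):
    """A copy is fresh while its lifetime exceeds its current age."""
    return freshness_lifetime > age
```

Any Atom client holding a local copy can apply the same arithmetic, whether the copy arrived over HTTP or not, as long as it knows when the copy was generated and how long the publisher said it stays fresh.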

wunder
--
Walter Underwood
Principal Architect, Verity

RE: PaceArchiveDocument posted

2005-02-07 Thread Walter Underwood

I agree, but I would put it another way. The charter requires support
for archives, but we don't have a clear model for those. Without a
model, we can't spec syntax.

So, it is not possible for the current doc to fulfill the charter, and
this document is not ready for last call.

wunder

--On February 6, 2005 2:00:20 AM -0500 Bob Wyman <[EMAIL PROTECTED]> wrote:

> 
> -1.
>   The use cases for archiving have not been well defined or well
> discussed on this list. It is, I believe, inappropriate and unwise to try to
> rush through something this major at the last moment before a pending Last
> Call.
> 
>   bob wyman
> 
> 
> 

--
Walter Underwood
Principal Architect, Verity

Re: PaceClarifyDateUpdated

2005-02-07 Thread Walter Underwood

--On February 6, 2005 1:07:42 PM +0200 Henri Sivonen <[EMAIL PROTECTED]> wrote:
>
> Yes. Also as a spec expectation--that is, how often is the "SHOULD NOT" 
> expected
> to be violated. Will the SHOULD NOT be violated so often that it dilutes the
> meaning of all SHOULD NOTs?

Roughly, a SHOULD or SHOULD NOT can be violated when the implementer 
understands and accepts the interoperability limitations of that
decision.

So, the spec should (must?) explain what those are.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Entry order

2005-02-04 Thread Walter Underwood


--On February 4, 2005 4:28:53 PM -0600 "Roger B." <[EMAIL PROTECTED]> wrote:
>> If clients are told to ignore the order, and given only an updated timestamp,
>> there is no way to show "most recent headlines"...
> 
> At a single moment within a feedstream, sure... but the next time an
> entry is added to that feed, I'll have no problem letting the user
> know that this is new stuff.

But if three are added, you can't order those three. 

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Entry order

2005-02-04 Thread Walter Underwood

--On February 4, 2005 11:44:31 AM -0800 Tim Bray <[EMAIL PROTECTED]> wrote:
> On Feb 4, 2005, at 11:27 AM, Walter Underwood wrote:
> 
>> Is this a joke? This is like saying that the order of the entries in my
>> mailbox is not significant. Note that ordering a mailbox by date is not
>> the same thing as its native order.
> 
> Except for, Atom entries have a *compulsory*  date.  So I have no
> idea what semantics you'd attach to the "natural" order... -Tim

Order the publisher wants to present them in. Conventionally, most recently
published first. Entries may be updated without being reordered.

If clients are told to ignore the order, and given only an updated timestamp,
there is no way to show "most recent headlines", which is the primary 
purpose of the whole family of RSS formats.

Right now, you can shuffle the entries and Atom says it is the same feed.

Either we need a published date stamp or we need to honor the order.
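
A small illustration of the problem, using hypothetical entries. The feed lists entries most recently published first, and the oldest entry "a" was corrected after the others appeared, so its updated timestamp is the newest.

```python
# Hypothetical entries in the publisher's order (most recently published first).
# Each tuple is (entry id, updated timestamp).
feed_order = [
    ("c", "2005-02-04T12:00:00Z"),
    ("b", "2005-02-04T10:00:00Z"),
    ("a", "2005-02-04T13:00:00Z"),  # published first, corrected last
]


def newest_updated_first(entries):
    """Order entry ids by updated timestamp, newest first."""
    ranked = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [entry_id for entry_id, _ in ranked]


publisher_ids = [entry_id for entry_id, _ in feed_order]
recovered_ids = newest_updated_first(feed_order)
```

Sorting on updated puts the corrected old entry at the top, so the publication order cannot be recovered once the document order is thrown away.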

wunder
--
Walter Underwood
Principal Architect, Verity

RE: Entry order

2005-02-04 Thread Walter Underwood

--On February 3, 2005 11:21:50 PM -0500 Bob Wyman <[EMAIL PROTECTED]> wrote:
> David Powell wrote:
>> It looks like this might have got lost accidentally when the 
>> atom:head element was introduced. Previously Atom 0.3 said [1]:
>>> Ordering of the element children of atom:feed element MUST NOT be
>>> considered significant.
>   +1. 
>   The order of entries in an Atom feed should NOT be significant. This
> is, I think, a very, very important point to make. 

-1

Is this a joke? This is like saying that the order of the entries in my
mailbox is not significant. Note that ordering a mailbox by date is not
the same thing as its native order. 

Feed order is the only way we have to show the publication order of items 
in a feed. I just looked at all my subscriptions, and there is only one
where the order might not be relevant, a security test for RSS readers.
That is clearly not within Atom's charter, so it doesn't count.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: xsd:dateTime vs. RFC 3339

2005-02-04 Thread Walter Underwood

--On February 4, 2005 6:46:33 PM +0100 Julian Reschke <[EMAIL PROTECTED]> wrote:
>> Also, we have an unresolved issue with historic Livejournal entries,
>> which do not have timezones. XML Schema explains exactly how to 
> 
> So what does it recommend?
> 
>> handle those. We can have a SHOULD for timezone info, with an explanation
>> of what you lose without that.

Treating the datetime value as if it has an uncertainty equal to the
maximum possible timezone offset. The other advantage of using XML Schema
is that it defines how to order timestamps, which is the main thing
we want to do with them.

I think the section is pretty clear, and I'm picky about specs:

  <http://www.w3.org/TR/xmlschema-2/#dateTime>
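
A sketch of that ordering rule, assuming the 14-hour maximum offset used by XML Schema. The function name and the use of None for "indeterminate" are my choices, not anything from the spec.

```python
from datetime import datetime, timedelta, timezone

MAX_OFFSET = timedelta(hours=14)  # maximum timezone offset assumed by XML Schema


def compare(a, b):
    """Return -1, 0, or 1, or None when the order cannot be determined."""
    a_zoned = a.tzinfo is not None
    b_zoned = b.tzinfo is not None
    if a_zoned == b_zoned:
        # Both zoned or both unzoned: ordinary comparison.
        return (a > b) - (a < b)
    if a_zoned:
        # b has no timezone: it could be anywhere within 14 hours either way.
        b_as_utc = b.replace(tzinfo=timezone.utc)
        if a < b_as_utc - MAX_OFFSET:
            return -1
        if a > b_as_utc + MAX_OFFSET:
            return 1
        return None
    result = compare(b, a)
    return None if result is None else -result
```

So an old entry without a timezone still sorts correctly against anything more than 14 hours away from it, and only close calls come back as indeterminate.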

wunder
--
Walter Underwood
Principal Architect, Verity

Re: xsd:dateTime vs. RFC 3339

2005-02-04 Thread Walter Underwood

--On February 4, 2005 11:18:17 AM -0500 Norman Walsh <[EMAIL PROTECTED]> wrote:
>
> I know we're writing an IETF document, but I think there's going to be
> a lot of off-the-shelf XML software that understands xsd:dateTimes and
> I think it would be a lot better if we defined Date Constructs in
> terms of W3C XML Schema Part 2 than RFC 3339.

Strongly agree.

Also, we have an unresolved issue with historic Livejournal entries,
which do not have timezones. XML Schema explains exactly how to 
handle those. We can have a SHOULD for timezone info, with an explanation
of what you lose without that.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Atom for Archives (was:Re: Call for final Paces for consideration: deadline imminent)

2005-02-03 Thread Walter Underwood

--On February 3, 2005 1:31:45 PM +0200 Henri Sivonen <[EMAIL PROTECTED]> wrote:
> On Feb 3, 2005, at 08:09, James Snell wrote:
> 
>> What is the model for archiving with Atom?
> 
> What's the *point* in archiving with Atom compared to eg. a zip archive with
> some HTML or XHTML files in it (with relative links and a stipulation that
> index.html and index.xhtml are magic names)?

Cross-platform dump and load. Saving data that is in the database and not
in the HTML. Backups. Dump and reload for an upgrade with a DB schema change.
Consistent save from a live database (hold a read lock while you dump the
archive). Insurance against your blog service going away on short notice.
Sarbanes-Oxley compliance for corporate blogs (internal and external).

And of course, so Brewster Kahle can keep a copy. The Wayback Machine
has saved my butt a couple of times.

wunder
--
Walter Underwood
Principal Architect, Verity

Re: Call for final Paces for consideration: deadline imminent

2005-02-02 Thread Walter Underwood


The charter says that Atom will work for archiving. We don't know that
it will, and it hasn't been discussed for months.

Is the current Atom spec sufficient for archiving? If not, we aren't done.

wunder

--On February 2, 2005 5:46:51 PM -0800 Paul Hoffman <[EMAIL PROTECTED]> wrote:

> 
> Greetings again. And, thanks again for all the work people did on the last 
> work queue rotation. We now have the end of the format draft squarely in 
> sight.
> 
> The WG still has a bunch of finished Paces that have not been formally 
> considered, a (thankfully) much smaller number of unfinished Paces, and a 
> couple of promises that "I'll write that up as a Pace soon". We need to 
> finish soon in order to make our milestone, and I believe we can do so 
> gracefully.
> 
> On Monday, Feb. 7, the Working Group's final queue rotation will consist of 
> all Paces open at that time. Any Paces that have obvious holes in them ("to 
> be filled in later", "more needs to go here", etc.) will be ignored. We have 
> had over a year of time here, and many weeks since the previous attempt to 
> close things out. On Monday, Feb. 14, we will assess WG consensus and ask the 
> document authors to put together a final draft.
> 
> Note that this is not the last opportunity for work on the Atom format. For 
> one thing, there are plenty of non-core extensions that folks have been 
> mulling over; having the core draft finally finished will help those to 
> emerge. Further, we need to do the final work on the protocol document. Also, 
> during the formal IETF Last Call, discussion of the format draft will be 
> welcome from everyone (including people who have not read any of the earlier 
> drafts).
> 
> Please do *not* rush out to write a Pace unless it is for something that is 
> *truly* part of the Atom core, and you really believe that it is likely that 
> there will be consensus within a week. If your idea is appropriate as an 
> extension, or is for something that is quite similar to something else that 
> has explicitly gotten lack of consensus, please do not write a Pace. In the 
> former case, please hold your extensions for a few weeks; in the latter case, 
> please recognize that asking the WG to focus on something that they don't
> want will likely cause us to do a worse job at carefully reviewing things
> that we all want.
> 
> So, if you have an incomplete Pace now, you have a few more days to complete 
> it. Of course, everyone should feel free to continue talking about the 
> current Paces now, and to continue to suggest editorial changes to the 
> current Internet Draft.
> 
> --Paul Hoffman, Director
> --Internet Mail Consortium
> 
> 



--
Walter Underwood
Principal Architect, Verity

Re: Format spec vs Protocol spec

2005-02-02 Thread Walter Underwood


Correct. --wunder

--On February 2, 2005 12:35:31 PM -0700 Antone Roundy <[EMAIL PROTECTED]> wrote:

> Let me make sure I understand you correctly--are you saying that it's fine 
> for the format and protocol to have their own elements in their own 
> namespaces, but 1.0 of each should be finished at the same time, to ensure 
> that we don't run into any surprises while finishing protocol 1.0 which 
> require a format revision (e.g. 1.1) in order to make protocol 1.0 work?



--
Walter Underwood
Principal Architect, Verity
