Re: Fyi, Apache project proposal

2006-05-23 Thread Walter Underwood


--On May 23, 2006 3:18:18 PM +0200 Ugo Cei [EMAIL PROTECTED] wrote:


Demokritos might be quite well advanced but unfortunately Python code is not
very suited for us poor souls who still have to struggle with Java
environments ;-)


The goal is a reference implementation. The goal is to be exactly correct.
Being in a particular language, or even being fast enough to be usable,
is beside the point. In particular, a reference implementation should
always choose code readability over speed.

If the goal is to have a standard, free implementation that everyone uses,
that is different from a reference implementation and the goals should
say that.

wunder
--
Walter Underwood
Principal Software Architect, Autonomy (Ultraseek)



Re: Atom syndication schema

2006-03-14 Thread Walter Underwood

--On March 15, 2006 4:25:40 PM +1100 Eric Scheid [EMAIL PROTECTED] wrote:

 Since the original discussion I've stumbled across something extra that
 makes xml:lang relevant for atom:name.
 
 Seems that in writing Hungarian names, the pattern is always surname
 followed by forename - e.g. Bartók Béla, where Béla is the personal name and
 Bartók is the family name.

Or Margittai Neumann János vs. John von Neumann. It can be more complicated
than first/last or last/first.

I'm pretty sure that I brought this up and the WG decided to punt.

Representing personal names well means starting with X.500 and asking
around to see what could be improved. That is well outside the Atom charter.
Punting was the right thing to do, but it means that atom:name is minimal.

xml:lang isn't enough information to sort out given name and family name.
About all you can do with atom:name is print it out.

xml:lang could be useful in deciding between Chinese and Japanese variants
of a character for names. 
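For concreteness, a minimal sketch of what xml:lang on atom:name looks like and what a consumer can get out of it (the author fragment is made up; element names are from the Atom format spec, parsed here with Python's standard library):

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical author fragment: xml:lang says the name is Hungarian,
# which can guide display conventions and glyph variants, but says
# nothing about which part is the family name.
fragment = """<author xmlns="http://www.w3.org/2005/Atom">
  <name xml:lang="hu">Bartók Béla</name>
</author>"""

name = ET.fromstring(fragment).find(ATOM + "name")
print(name.text)                  # Bartók Béla
print(name.get(XML_NS + "lang"))  # hu
```

About all the consumer can do with this is print the name and note the language; there is no structure to sort on.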

wunder
--
Walter Underwood
Principal Software Architect, Autonomy



Re: wiki mime type

2006-03-07 Thread Walter Underwood

It isn't just wiki. Those formats are used in blogs, and I use Markdown
for simple HTML memos.

Don't use x-, either. Register a real type.

wunder

--On March 7, 2006 5:51:42 PM +0100 Henry Story [EMAIL PROTECTED] wrote:

 
 On 6 Mar 2006, at 18:54, James Tauber wrote:
 Agreed that this would be very useful and also that it needs to be  
 done
 on a per wiki format basis.
 
 Is there a forum a large number of them tend to hang out on, so that  one 
 could ask them to think about this? What would be the best to do  in the 
 meantime? Something like
 
 text/x-wiki+textile
 text/x-wiki+markdown
 
 perhaps?
 
 
 I think, however, that this is something the format creators should be
 encouraged to register, or at least suggest a convention for.
 
 James
 
 On Mon, 06 Mar 2006 07:59:10 -0800, Walter Underwood
 [EMAIL PROTECTED] said:
 
 --On March 6, 2006 3:59:39 PM +0100 Henry Story  
 [EMAIL PROTECTED]
 wrote:
 
 Silly question probably, but is there a wiki mime type?
 I was thinking of text/wiki or text/x-wiki or something.
 
 I want people to be able to edit their blogs in wiki format in  
 BlogEd  and be able
 to distinguish when they do that from when they enter  plain  
 text, html or xhtml.
 Perhaps this is also useful for the protocol.
 
 It would be really useful, especially for feeds that archive the  
 content
 of a blog. It would be best to use the official names of the formats,
 like
 text/markdown or text/textile. The wikis and blogs that I use  
 can be
 configured to accept different formats, so text/wiki doesn't work.
 
 wunder
 --
 Walter Underwood
 Principal Software Architect, Autonomy
 
 -- 
   James Tauber   http://jtauber.com/
   journeyman of somehttp://jtauber.com/blog/
 
 



--
Walter Underwood
Principal Software Architect, Autonomy



Re: wiki mime type

2006-03-06 Thread Walter Underwood

--On March 6, 2006 3:59:39 PM +0100 Henry Story [EMAIL PROTECTED] wrote:

 Silly question probably, but is there a wiki mime type?
 I was thinking of text/wiki or text/x-wiki or something.
 
 I want people to be able to edit their blogs in wiki format in BlogEd  and be 
 able
 to distinguish when they do that from when they enter  plain text, html or 
 xhtml.
 Perhaps this is also useful for the protocol.

It would be really useful, especially for feeds that archive the content
of a blog. It would be best to use the official names of the formats, like
text/markdown or text/textile. The wikis and blogs that I use can be
configured to accept different formats, so text/wiki doesn't work.

wunder
--
Walter Underwood
Principal Software Architect, Autonomy



Re: Atom logo where?

2006-03-06 Thread Walter Underwood

--On March 6, 2006 7:02:23 PM +0100 A. Pagaltzis [EMAIL PROTECTED] wrote:

 For that matter, who has seen Mena Trott’s alternative Atom logo
 design and what do people think about it?

1. I don't see why Atom needs a logo.
2. The proposed logo is probably too close to the Autonomy logo.

I cannot speak for Autonomy's lawyers, but companies are faced with
"defend it or lose it" on their trademarks. Autonomy is in the
unstructured info business, so there is probably a conflict.

It also looks like the logo for the Austin-Bergstrom International
Airport, but that doesn't conflict.

wunder
--
Walter Underwood
Principal Software Architect, Autonomy



Re: atom:updated handling

2006-02-15 Thread Walter Underwood

It doesn't hurt to point it out. It could catch some developer errors.
But it doesn't make an invalid feed. --wunder
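A sketch of the check as the spec actually states it (feed and dates made up): only entries sharing the same atom:id are constrained to have distinct atom:updated values, so unrelated entries with identical timestamps are valid Atom.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "{http://www.w3.org/2005/Atom}"
# Two unrelated entries that happen to share a timestamp: valid Atom.
feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:a</id><updated>2006-02-15T16:00:00Z</updated></entry>
  <entry><id>urn:b</id><updated>2006-02-15T16:00:00Z</updated></entry>
</feed>"""

seen = defaultdict(set)
duplicates = []
for entry in ET.fromstring(feed).findall(ATOM + "entry"):
    eid = entry.findtext(ATOM + "id")
    upd = entry.findtext(ATOM + "updated")
    # Only the same (id, updated) pair appearing twice is forbidden.
    if upd in seen[eid]:
        duplicates.append((eid, upd))
    seen[eid].add(upd)
print(duplicates)  # [] -- nothing to flag
```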

--On February 15, 2006 4:25:35 PM -0800 James M Snell [EMAIL PROTECTED] wrote:

 
 I personally think that the feedvalidator is being too anal about
 updated handling.  Entries with the same atom:id value MUST have
 different updated values, but the spec says nothing about entries with
 different atom:id's.
 
 - James
 
 James Yenne wrote:
 I'm using the feedvalidator.org to validate a feed with entries
 containing atom:updated that may have the same datetime, although
 different atom:id. The validator complains that two entries cannot have
 the same value for atom:updated. I generate these feeds and the
 generator uses the current datetime, which may be exactly the same. I
 don't understand why the validator should care about these
 updated values from different entries per atom:id - these are totally
 unrelated entries.   Is the validator wrong?  It seems that otherwise I
 have to play tricks to make these entries have different updated within
 the feed.
  
 I'm not sure how this relates to the thread More on atom:id handling
  
 Thanks,
 James
 
 



--
Walter Underwood
Principal Software Architect, Autonomy



Re: [Fwd: Re: todo: add language encoding information]

2005-12-23 Thread Walter Underwood

--On December 23, 2005 11:31:22 PM +0100 Henry Story [EMAIL PROTECTED] wrote:

 So  you can't have a link pointing from an entry to an id, without losing some
 very important information. We need something more  specific. We need a link
 pointing from A to C as shown by the blue line.

Some people will need that in the guts of their publishing system. Why do
we need it in Atom? Is there something essential that subscribers cannot do
because this isn't represented? This sounds like something needed for the
publishing/translation workflow, not for the general readership.

Extended provenance information is sometimes needed, but there is almost
no limit to that. It certainly does not stop at translation, source, and
translator. I'm reading a new translation of Andersen's tales where 
Thumbelina is Inchelina because the translator knew the right dialect
of Danish. That is significant, but does it need to be in Atom?

The semantics here should be exactly the same as for dates -- the date
means what the publisher thinks it means. Same for language info. Trying
to get more exact means that the model will be wrong for some publishers
that generate completely legal Atom.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: ACE - Atom Common Extensions Namespace

2005-10-02 Thread Walter Underwood

--On October 2, 2005 9:35:28 AM +0200 Anne van Kesteren [EMAIL PROTECTED] 
wrote:
 
 Having a file and folder of the same name is not technically possible. 
 (Although
 you could emulate the effect of course with some mod_rewrite.)

Namespaces aren't files, only names. So the limitations of some particular
file name implementation are meaningless for namespaces.

Also, some filesystem implementations do allow a file and a folder
with the same name.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: Arr! Avast me hearties!

2005-09-19 Thread Walter Underwood

I think we just got a nomination for an April 1 RFC. Nice job.
More accurate than the x-hacker locale on Google, because that
is really still English, not some other hacker language.
Besides, they didn't make the spell suggest work in l33t.

wunder

--On September 20, 2005 3:09:56 AM +0100 James Holderness [EMAIL PROTECTED] 
wrote:

 A conforming client SHOULD perform an HTTP request for the feed with the 
 Accept-Language header set to en-pirate (or whatever the standard RFC 3066 
 language tag for the pirate dialect of english). A conforming server SHOULD 
 return the pirate version of the feed with the Content-Language header set to 
 en-pirate and/or the xml:lang attribute set to en-pirate in the root 
 element.



--
Walter Underwood
Principal Software Architect, Verity



Re: Top 10 and other lists should be entries, not feeds.

2005-08-30 Thread Walter Underwood

--On August 30, 2005 1:49:57 AM -0400 Bob Wyman [EMAIL PROTECTED] wrote:

 I’m sorry, but I can’t go on without complaining.  Microsoft has proposed
 extensions which turn RSS V2.0 feeds into lists and we’ve got folk who are
 proposing much the same for Atom (i.e. stateful, incremental or partitioned
 feeds)… I think they are wrong. Feeds aren’t lists and Lists aren’t feeds.

The Atom spec says:

   This specification assigns no significance to the order of atom:entry
   elements within the feed.

One could read that to mean that feeds are fundamentally unordered or that
Atom doesn't say what the order means.

Other RSS formats are ordered, either implicitly or explicitly (RSS 1.0).
For interoperability, lots of software is going to treat Atom as ordered.
Otherwise, it is not possible to go from Atom to RSS 1.0.

 What is a search engine or a matching engine supposed to return as a result
  if it finds a match for a user query in an entry that comes from a list-feed?

Maybe the list feed should have a noindex flag.

 Should it return the entire feed or should it return just the entry/item
 that contained the stuff in the users’ query?

I'd return the entry. It is all about the entries. If the list position is
semantically important to the entry, then include a link from the entry to
the list. This is movie 312 in wunder's queue.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: Top 10 and other lists should be entries, not feeds.

2005-08-30 Thread Walter Underwood

--On August 30, 2005 3:50:45 PM -0600 Peter Saint-Andre [EMAIL PROTECTED] 
wrote:
 One could read that to mean that feeds are fundamentally unordered or that
 Atom doesn't say what the order means.
 
 Is not logical order, if any, determined by the datetime of the published
 (or updated) element?

That is one kind of order. Other kinds are relevance to a search term
(A9 OpenSearch), editorial importance (BBC News feeds), or datetime of
original publication (nearly all blog feeds, not the same as last update).

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: Top 10 and other lists should be entries, not feeds.

2005-08-30 Thread Walter Underwood

--On August 30, 2005 3:50:45 PM -0600 Peter Saint-Andre [EMAIL PROTECTED] 
wrote:
 Otherwise, it is not possible to go from Atom to RSS 1.0.
 
 I assume you mean from RSS 1.0 to Atom. :-)

No. You can go from a Bag to List by ignoring the order. RSS 1.0 is a
List, so you would need to invent an order to put unordered items in it.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: Don't Aggregrate Me

2005-08-29 Thread Walter Underwood


--On Monday, August 29, 2005 10:39:33 AM -0600 Antone Roundy [EMAIL 
PROTECTED] wrote:


As has been suggested, to inline images, we need to add frame documents,
stylesheets, Java applets, external JavaScript code, objects such as Flash
files, etc., etc., etc.  The question is, with respect to feed readers, do
external feed content (<content src="..." />), enclosures, etc. fall into
the same exceptions category or not?


Of course a feed reader can read the feed, and anything required
to make it readable. Duh.

And all this time, I thought robots.txt was simple.

robots.txt is a polite hint from the publisher that a robot (not
a human) probably should avoid those URLs. Humans can do any stupid
thing they want, and probably will.

The robots.txt spec is silent on what to do with URLs manually-added
to a robot. The normal approach is to deny those, with a message that they
are disallowed by robots.txt, and offer some way to override that.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek



Re: Don't Aggregrate Me

2005-08-29 Thread Walter Underwood

--On August 30, 2005 11:39:04 AM +1000 Eric Scheid [EMAIL PROTECTED] wrote:

 Someone wrote up A Robots Processing Instruction for XML Documents
 http://atrus.org/writings/technical/robots_pi/spec-199912__/
 That's a PI though, and I have no idea how well supported they are. I'd
 prefer a namespaced XML vocabulary.

That was me. I think it makes perfect sense as a PI. But I think reuse
via namespaces is oversold. For example, we didn't even try to use
Dublin Core tags in Atom.

PI support is required by the XML spec -- must be passed to the
application.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: Don't Aggregrate Me

2005-08-29 Thread Walter Underwood

--On August 29, 2005 7:05:09 PM -0700 James M Snell [EMAIL PROTECTED] wrote:

 x:index=no|yes doesn't seem to make a lot of sense in this case.

It makes just as much sense as it does for HTML files. Maybe it is a
whole group of Atom test cases. Maybe it is a feed of reboot times 
for the server.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: Don't Aggregrate Me

2005-08-26 Thread Walter Underwood

There are no wildcards in /robots.txt, only path prefixes and user-agent
names. There is one special user-agent, *, which means all.
I can't think of any good reason to always ignore the disallows for *.

I guess it is OK to implement the parts of a spec that you want.
Just don't answer yes when someone asks if you honor robots.txt.

A lot of spiders allow the admin to override /robots.txt for specific
sites, or better, for specific URLs.

wunder

--On August 25, 2005 11:47:18 PM -0500 Roger B. [EMAIL PROTECTED] wrote:

 
 Bob: It's one thing to ignore a wildcard rule in robots.txt. I don't
 think its a good idea, but I can at least see a valid argument for it.
 However, if I put something like:
 
 User-agent: PubSub
 Disallow: /
 
 ...in my robots.txt and you ignore it, then you very much belong on
 the Bad List.
 
 --
 Roger Benningfield
 
 



--
Walter Underwood
Principal Software Architect, Verity



Re: Don't Aggregrate Me

2005-08-26 Thread Walter Underwood

--On August 26, 2005 9:51:10 AM -0700 James M Snell [EMAIL PROTECTED] wrote:

 Add a new link rel=readers whose href points to a robots.txt-like file that
 either allows or disallows the aggregator for specific URI's and establishes
 polling rate preferences
 
   User-agent: {aggregator-ua}
   Origin: {ip-address}
   Allow: {uri}
   Disallow: {uri}
   Frequency: {rate} [{penalty}]
   Max-Requests: {num-requests} {period} [{penalty}]

No, on several counts.

1. Big, scalable spiders don't work like that. They don't do aggregate
frequencies or rates. They may have independent crawlers visiting the
same host. Yes, they try to be good citizens, but you can't force
WWW search folk to redesign their spiders.

2. Frequencies and rates don't work well with either HTTP caching or
with publishing schedules. Things are much cleaner with a single 
model (max-age and/or expires).

3. This is trying to be a remote-control for spiders instead of describing
some characteristic of the content. We've rejected the remote control
approach in Atom.

4. What happens when there are conflicting specs in this file, in
robots.txt, and in a Google Sitemap?

5. Specifying all this detail is pointless if the spider ignores it.
You still need to have enforceable rate controls in your webserver
to handle busted or bad citizen robots.

6. Finally, this sort of thing has been proposed a few times and never
caught on. By itself, that is a weak argument, but I think the causes
are pretty strong (above).

There are some proprietary extensions to robots.txt:

Yahoo crawl-delay:
http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html

Google wildcard disallows:
http://www.google.com/remove.html#images

It looks like MSNbot does crawl-delay and an extension-only wildcard:
http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_RestrictAccessToSite.htm

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: Don't Aggregrate Me

2005-08-26 Thread Walter Underwood

I'm adding robots@mccmedia.com to this discussion. That is the classic
list for robots.txt discussion.

Robots list: this is a discussion about the interactions of /robots.txt
and clients or robots that fetch RSS feeds. Atom is a new format in
the RSS family.

--On August 26, 2005 8:39:59 PM +1000 Eric Scheid [EMAIL PROTECTED] wrote:

 While true that each of these scenarios involve crawling new links,
 the base principle at stake is to prevent harm caused by automatic or
 robotic behaviour. That can include extremely frequent periodic re-fetching,
 a scenario which didn't really exist when robots.txt was first put together.

It was a problem then:

   In 1993 and 1994 there have been occasions where robots have visited WWW
   servers where they weren't welcome for various reasons. Sometimes these
   reasons were robot specific, e.g. certain robots swamped servers with
   rapid-fire requests, or retrieved the same files repeatedly. In other
   situations robots traversed parts of WWW servers that weren't suitable,
   e.g. very deep virtual trees, duplicated information, temporary information,
   or cgi-scripts with side-effects (such as voting).
   http://www.robotstxt.org/wc/norobots.html

I see /robots.txt as a declaration by the publisher (webmaster) that
robots are not welcome at those URLs. 

Web robots do not solely depend on automatic link discovery, and haven't
for at least ten years. Infoseek had a public Add URL page. /robots.txt
was honored regardless of whether the link was manually added or automatically
discovered.

A crawling service (robot) should warn users that the URL, Atom or otherwise,
is disallowed by robots.txt. Report that on the status page for that feed.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: Don't Aggregrate Me

2005-08-25 Thread Walter Underwood

--On August 25, 2005 3:43:03 PM -0400 Karl Dubost [EMAIL PROTECTED] wrote:
 Le 05-08-25 à 12:51, Walter Underwood a écrit :
 /robots.txt is one approach. Wouldn't hurt to have a recommendation
 for whether Atom clients honor that.
 
 Not many honor it.

I'm not surprised. There seems to be a new generation of robots that
hasn't learned much from the first generation. The Robots mailing list
is silent these days. That is why we should make a recommendation about it.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: Don't Aggregrate Me

2005-08-25 Thread Walter Underwood

I would call desktop clients clients not robots. The distinction is
how they add feeds to the polling list. Clients add them because of
human decisions. Robots discover them mechanically and add them.

So, clients should act like browsers, and ignore robots.txt.

Robots.txt is not very widely deployed (around 5% of sites), but it 
does work OK for general web content.

wunder

--On August 25, 2005 10:25:08 PM +0200 Henry Story [EMAIL PROTECTED] wrote:

 
 Mhh. I have not looked into this. But is not every desktop aggregator  a 
 robot?
 
 Henry
 
 On 25 Aug 2005, at 22:18, James M Snell wrote:
 At the very least, aggregators should respect robots.txt.  Doing so  
 would allow publishers to restrict who is allowed to pull their feed.
 
 - James
 
 
 



--
Walter Underwood
Principal Software Architect, Verity



Re: If you want Fat Pings just use Atom!

2005-08-23 Thread Walter Underwood

--On August 23, 2005 9:40:44 AM +0300 Henri Sivonen [EMAIL PROTECTED] wrote:

 There's nothing in the XML spec requiring the app to throw away the data
 structures it has already built when the parser reports the error.

There is also nothing requiring it. It is optional. The only
required behavior is to report the error and stop creating parsed
information. Otherwise, results are undefined according to the spec.

The spec does require that normal processing stop at the error.
The parser can make data past the error available, but it must not
continue to pass character data and information about the document's
logical structure to the application in the normal way.
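A sketch of that boundary with Python's expat-based parser: everything completed before the well-formedness error is available, but normal parsing stops there (the truncated feed below is made up).

```python
import io
import xml.etree.ElementTree as ET

# A truncated "endless" feed: the root element is never closed.
data = b"<feed><entry><id>urn:1</id></entry><entry><id>urn:2"

completed = []
try:
    for event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        completed.append(elem.tag)  # elements fully parsed so far
except ET.ParseError as err:
    print("parse stopped:", err)
print(completed)  # ['id', 'entry'] -- only the first entry survived
```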

This still feels like a hack to me. An unterminated document is 
not well-formed, and is not XML or Atom. Doing this should require
another RFC that says, we didn't really mean that it had to be XML.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Walter Underwood

--On August 22, 2005 12:36:17 AM -0400 Sam Ruby [EMAIL PROTECTED] wrote:

 With a HTTP client library and SAX, the absolute simplest solution is
 what Bob is describing: a single document that never completes.

Except that an endless document can't be legal XML, because XML requires
the root element to balance. An endless document never closes it. So, the
endless document cannot be legal Atom. Worse, there is no chance for error
recovery. One error, and the rest of the stream might not be parsable.

So, it is simple, but busted.

The standard trick here is to use a sequence of small docs, separated
by ASCII form-feed characters. That character is not legal within an
XML document, so it allows the stream to resynchronize on that character.
Besides, form-feed actually has almost the right semantics -- start a
new page.
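A sketch of the form-feed trick (entries made up): each record is a small well-formed document, and a damaged record costs only itself, because the reader resynchronizes at the next form feed.

```python
import xml.etree.ElementTree as ET

stream = (
    "<entry><id>urn:1</id></entry>\x0c"
    "<entry><id>not well formed\x0c"   # a damaged record
    "<entry><id>urn:3</id></entry>\x0c"
)

entries = []
for chunk in stream.split("\x0c"):
    if not chunk.strip():
        continue
    try:
        entries.append(ET.fromstring(chunk))
    except ET.ParseError:
        continue  # skip the bad record, resume at the next form feed
print([e.findtext("id") for e in entries])  # ['urn:1', 'urn:3']
```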

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Walter Underwood

--On August 22, 2005 2:01:45 PM -0400 Joe Gregorio [EMAIL PROTECTED] wrote:

 Interestingly enough the FF separated entries method would also work 
 when storing a large quantity of entries in a single flat file where
 appending an entry needs to be fast.

The original application was logfiles in XML.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: If you want Fat Pings just use Atom!

2005-08-22 Thread Walter Underwood

--On August 23, 2005 12:01:11 PM +0900 Martin Duerst [EMAIL PROTECTED] wrote:
 
 Well, modulo character encoding issues, that is. An FF will
 look differently in UTF-16 than in ASCII-based encodings.

Fine. Use two NULs. That is either one illegal UTF-16 (BE or LE) character
or two illegal characters in ASCII or UTF-8.

Of course, a transport level multi-payload system would be preferred.

wunder
--
Walter Underwood
Principal Software Architect, Verity



Re: FYI: Expires Extension Draft

2005-08-18 Thread Walter Underwood

RSS 3? Eh?

The RSS ttl element is a mess. RSS 3 Lite (could we spell that word correctly?)
specifies it not as information about the feed, but as an attempt to remotely
control robots. RSS 2 specifies it as a caching hint, but in minutes, not
seconds.

Regardless, it is useless for a feed with a dedicated update schedule, because
it requires updating the feed every second (or minute) as the publish time
approaches.

For more detail, see: http://www.intertwingly.net/wiki/pie/PaceCaching
That was a proposal, and is *not* part of Atom, but it does have some
useful discussion of cache hints.

For caching, use the native HTTP cache features.
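A sketch of the HTTP-native equivalent (the one-hour schedule is made up): a feed on a known publishing schedule can say exactly when the next fetch is worthwhile, with no ttl element needed.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Hypothetical feed published hourly: standard HTTP caching headers
# carry the schedule, in seconds and as an absolute time.
next_update = datetime.now(timezone.utc) + timedelta(hours=1)
headers = {
    "Cache-Control": "max-age=3600",
    "Expires": format_datetime(next_update, usegmt=True),
}
for key, value in headers.items():
    print(f"{key}: {value}")
```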

wunder 

--On August 18, 2005 2:20:21 PM -0400 Elias Torres [EMAIL PROTECTED] wrote:

 
 I tried commenting on your site, but I have to register to comment. :-(
 
 You linked to RSS3 [1] and I spotted something related to this
 extension that could be used instead.
 
 <ttl span="days">7</ttl>
 
 It seems more elegant than having to convert to whatever you specified
 in your spec.
 
 Just a thought.
 
 Elias
 
 
 [1] http://www.rss3.org/rss3lite.html
 
 On 8/17/05, James M Snell [EMAIL PROTECTED] wrote:
 
 http://www.ietf.org/internet-drafts/draft-snell-atompub-feed-expires-00.txt
 
 Example:
 
 <entry>
   ...
   <t:expires xmlns:t="...">2005-08-16T12:00:00Z</t:expires>
   ...
 </entry>
 
 or
 
 <entry>
   ...
   <updated>2005-08-16T12:00:00Z</updated>
   <t:max-age>2</t:max-age>
   ...
 </entry>
 
 This is not to be used for caching of Atom documents; nor is it to be
 used as a mechanism for scheduling updates of local copies of Atom
 documents.
 
 - James
 
 
 
 



--
Walter Underwood
Principal Software Architect, Verity



Re: Expires extension draft (was Re: Feed History -02)

2005-08-10 Thread Walter Underwood

--On August 10, 2005 1:56:05 PM +1000 Eric Scheid [EMAIL PROTECTED] wrote:

 Aside: a perfect example of what sense of 'expires' is in the I-D itself...
 
 Network Working Group
 Internet-Draft
 Expires: January 2, 2006

Especially perfect because the HTTP header does not reflect the expiration.

Honestly, another reason to put expiration inside the feed is that
HTTP caching is just not used. Well, except to force reloads and show
you new ads. But it is extremely rare to see per-document cache
information.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Feed History -02

2005-08-09 Thread Walter Underwood

--On August 9, 2005 1:07:29 PM +0200 Henry Story [EMAIL PROTECTED] wrote:

 But I would really like some way to specify that the next feed document is an
 archive (ie. won't change). This would make it easy for clients to know when
 to stop following the links, ie, when they have caught up with the changes
 since they last looked at the feed.

I made some proposals for cache control info (expires and max-age).
That might work for this.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: FormatTests

2005-07-17 Thread Walter Underwood

--On July 17, 2005 3:45:26 PM +0100 Graham [EMAIL PROTECTED] wrote:

 Now do you see why canonical ids are stupid and irrelevant?

Not unless the robustness principle is stupid and irrelevant.
Canonical IDs are more robust. Feeds that use them will work better
in the quick-and-dirty, Desperate Perl Hacker environment of the
internet.

The updated warning is just right. Thank you for using Atom, here is
how you can do a better job.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Evangelism, etc.

2005-07-16 Thread Walter Underwood

--On July 16, 2005 11:16:44 AM -0400 Robert Sayre [EMAIL PROTECTED] wrote:

 I found the criticism pathetic. 

A little lame, at least. You can't add precision and interoperability
with innovation and extension.

But there is a point buried under all that. What are the changes required
to support Atom? It looks complicated, but how hard is it? Here is a shot
at that information.

For publishers, you need to be precise about the content. There are fallbacks,
where if it is any sort of HTML, send it as HTML, and if it isn't, send it
as text. The XHTML and XML options are there for extra control.

Also, add an ID. It is OK for this to be a URL to the article as long as
it doesn't change later. That is, the article can move to a different URL,
but keep the ID the same.

Add a modified date. The software probably already has this, and you can
fall back to the file last-modified if you have to. But if there is a 
better date available, use it.

The ID and date are required because they allow Atom clients and aggregators
to get it right when tracking entries, either in the same feed or when the
same entry shows up in multiple feeds.
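A sketch of why the pair matters (the storage and entry values are made up): keying on atom:id deduplicates the same entry across feeds, and comparing atom:updated keeps the newest copy.

```python
# Hypothetical aggregator store: one slot per atom:id, newest wins.
entries = {}

def track(entry_id, updated, content):
    # RFC 3339 timestamps in the same zone compare correctly as strings.
    seen = entries.get(entry_id)
    if seen is None or updated > seen[0]:
        entries[entry_id] = (updated, content)

track("urn:x", "2005-07-01T00:00:00Z", "v1")
track("urn:x", "2005-07-02T00:00:00Z", "v2")   # same entry, another feed
track("urn:x", "2005-06-30T00:00:00Z", "old")  # stale copy, ignored
print(entries["urn:x"][1])  # v2
```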

Extending Atom is different from extending RSS, because there are more options.
The mechanical part of extensions are covered in the spec, to guarantee that
an Atom feed is still interoperable when it includes extensions. The political
part of extensions has two options: free innovation and standardization. Anyone
can write an extension to Atom and use it. Or, they can propose a standard to
the IETF (or another body). The standards process usually means more review,
more interoperability, and more delay in deploying it. Sometimes, the delay
is worth it, and we hope that is true for Atom.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: The Atomic age

2005-07-15 Thread Walter Underwood

--On July 14, 2005 11:37:05 PM -0700 Tim Bray [EMAIL PROTECTED] wrote:

 So, implementors... to  work.

Do we have a list of who is implementing it? That could be used in
the Deployment section of http://www.tbray.org/atom/RSS-and-Atom.

Ultraseek will implement Atom. We need to think more about exactly
what it means for a search engine to implement it, but we'll at
least spider it.

wunder

Creature with the Atom Brain, why is he acting so strange?
  Roky Erickson
--
Walter Underwood
Principal Architect, Verity



Mystery abbreviations in draft 9

2005-07-06 Thread Walter Underwood

In 4.2.6 atom:id, the last sentence is:

o Ensure that all components of the IRI are appropriately character-
  normalized, e.g. by using NFC or NFKC.

NFC and NFKC need to be defined, with a reference to the Unicode spec.
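For the record, these are the Unicode normalization forms; a quick illustration with Python's standard library (the café example is mine):

```python
import unicodedata

composed = "caf\u00e9"     # é as a single code point (already NFC)
decomposed = "cafe\u0301"  # e followed by a combining acute accent
assert composed != decomposed  # visually identical, byte-different IRIs
assert composed == unicodedata.normalize("NFC", decomposed)

# NFKC also folds compatibility characters, e.g. the fi ligature:
print(unicodedata.normalize("NFKC", "\ufb01"))  # fi
```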

wunder
--
Walter Underwood
Principal Architect, Verity



RE: Roll-up of proposed changes to atompub-format section 5

2005-07-05 Thread Walter Underwood


--On Tuesday, July 05, 2005 11:48:44 AM -0700 Paul Hoffman [EMAIL PROTECTED] 
wrote:

At 2:24 PM -0400 7/5/05, Bob Wyman wrote:

I find it hard to imagine what harm could be done by providing this
recommendation.


Timing. If we change text other than because of an IESG note, there is a strong
chance we will have to delay being finalized by two weeks, possibly more.


I'm fine with the delay. Two or three weeks on top of 18 months is
not a big deal.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek



Re: Clearing a discuss vote on the Atom format

2005-07-01 Thread Walter Underwood

--On July 1, 2005 4:44:23 PM +0900 Martin Duerst [EMAIL PROTECTED] wrote:

 The reason for this is to make sure we have interoperability
 with a mandatory-to-implement (and default-to-use) canonicalization,
 but that we don't disallow other canonicalizations that for one
 or the other as of now not yet clear reason may be preferable in
 some cases in the future (but in your wording would prohibit
 the result to be called Atom at all).

A potential future reason that we can't even characterize isn't
enough reason for me to support this.

If we discover weaknesses in the canonicalization, we'll need
to change Atom anyway. Explicitly making room for future incompatible
canonicalizations doesn't make any sense to me.

What is the point of calling something Atom when it uses a 
canonicalization which prevents interop with legal Atom implementations?

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Google Sitemaps: Yet another RSS or site-metadata format and Atom competitor

2005-06-07 Thread Walter Underwood

--On June 7, 2005 3:17:04 AM -0700 gstein [EMAIL PROTECTED] wrote:

 proprietary connotes closed. We published the spec and encourage
 other search engines to use it. There is no intent to close or control it.

Proprietary means owned. Google clearly owns Google Sitemaps.
The license requires derivative works to keep the same license. That is
control.

It was designed in isolation, for Google's use. That is a closed spec.

For example, the priority element is not specified well enough for another
engine to implement it compatibly. Does it apply to ranking, crawl order
or duplicate preference?

An open process would have at least looked at the proposed extensions 
for robots.txt and earlier formats like Infoseek sitelist.txt.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Last and final consensus pronouncement

2005-05-26 Thread Walter Underwood

The atom:author element name is embarrassing. Make it atom:creator.
There were no objections to that.

wunder

--On May 26, 2005 10:26:54 AM -0700 Tim Bray [EMAIL PROTECTED] wrote:

 
 co-chair-mode
 On behalf of Paul and myself:  This is it.  The initial phase of the  WG's 
 work in designing the Atompub data format specification is finished, over, 
 pining for the fjords, etc.  Please everyone reach  around and pat yourselves 
 on the back, I think the community will  generally view this as a fine piece 
 of work.
 
 Stand by for announcements on buckling down on Atom-Protocol.
 
 Note that this is a pronouncement, not a call for further debate.   Here 
 are the next steps:
 
 1. Editors take the assembled changes and produce a format-09 I-D.   Sooner 
 is better.
 2. They post the I-D.
 3. Paul sends Scott a message, cc'ing the WG, that we're done.
 4. At this point there may be objections from the WG.  We decide  whether to 
 accept the objections and pull the draft back, or tell the  objectors they'll 
 have to pursue the appeal process.
 5. The IESG process takes over at this point and we'll eventually  hear back 
 from them.
 
 Last two draft changes:
 
 1. PaceAtomIdDOS
 
 We think that the WG has consensus that it is of benefit to add a  warning to 
 section 8 Security Considerations.  The language from  PaceAtomIdDos is 
 mostly OK, except that the late suggestion of  talking about spoofing instead 
 of DOS seemed to get general support.   I reworded slightly.  We'll leave it 
 up to the editors to decide  whether a new subsection of section 8 is 
 required.
 
 Atom Processors should be aware of the potential for spoofing  attacks where 
 the attacker publishes an atom:entry with the atom:id  value of an entry from 
 another feed, perhaps with a falsified  atom:source element duplicating the 
 atom:id of the other feed. Atom  Processors which, for example, suppress 
 display of duplicate entries  by displaying only one entry with a particular 
 atom:id value, perhaps  by selecting the one with the latest atom:updated 
 value, might also take steps to determine whether the entries originated
 from the same publisher before considering them to be duplicates.
 
 2. PaceAtom10
 
 http://www.intertwingly.net/wiki/pie/PaceAtom10
 
 We just missed this one in the previous consensus call; seeing lots  of +1's 
 and no pushback, it's accepted.
 /co-chair-mode
 
 
 



--
Walter Underwood
Principal Architect, Verity



Re: Consensus snapshot, 2005/05/25

2005-05-25 Thread Walter Underwood


--On Wednesday, May 25, 2005 11:03:46 AM -0700 Tim Bray [EMAIL PROTECTED] 
wrote:


Have I missed any?  Yes, there has been high-volume debate on several
other issues; but have there been any other outcomes where we can
reasonably claim consensus exists?


Changing atom:author to atom:creator? No objections so far.
I'll paste together a PACE with the official Dublin Core definition.

Should we mention DC for atom:contributor?

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek



Re: posted PaceAuthorContributor

2005-05-23 Thread Walter Underwood

--On May 23, 2005 10:52:47 AM -0700 Tim Bray [EMAIL PROTECTED] wrote:

 If you're worried, one good way to  address the issue would be to say that
 the semantics of this element are based on the Dublin Core's [dc:creator],
 DC is pretty clear as I  recall.  I've been thinking that would be a good idea
 anyhow.

Let's call it atom:creator, then, and actually use the DC definition.

Not because DC is better, but because it makes the metadata crosswalks
(interoperability) work smoothly.

wunder
--
Walter Underwood
Principal Architect, Verity



PaceCaching

2005-05-20 Thread Walter Underwood
--On Tuesday, May 17, 2005 09:13:37 PM -0700 Tim Bray [EMAIL PROTECTED] wrote:
PaceCaching
Multiple -1's, it fails.
I'll address the objections anyway, because I (still) think this is 
important.
1. This introduces multiple caching schemes.
Wrong. Right now we have multiple schemes, with HTTP caching, ad hoc client
caching, and ad hoc server-side load shedding. This recommends one consistant
scheme, which we know will work. The current multi-scheme approach is a mess,
and we can be sure that it will have problems.
2. This applies protocol caching to a client.
True, but not really an issue. HTTP caching does work when used to manage
a client cache. Compare a client working through an HTTP cache to one which
checks the cache information internally before issuing HTTP requests. The HTTP
server will see the same series of requests. Effectively, the client will
run a virtual HTTP cache internally.
3. Server-side parsing is too much overhead.
Maybe with 90 MHz Pentiums, but XML parsing is really fast these days.
Parse the file, cache the values, and toss them if the file has changed
when you stat it. Or, the blog server software can set the cache info
out-of-band to the server.
4. This requires synchronized clocks.
Those are a SHOULD for HTTP, too. And they ought to be a SHOULD for Atom
anyway, because you cannot date-sort entries from two servers with
unsynchronized clocks.
5. This is just like HTTP-EQUIV and that has failed.
Yes and no. Most HTTP servers ignore HTTP-EQUIV, but it is still useful
for passing through things like content-language when there is no HTTP
header present.
For Atom, the caching info would be valid when there is no HTTP cache
header. This is exactly where HTTP-EQUIV is effective today.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek
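The HTTP expiration model the Pace leans on can be sketched in a few lines. This is an illustrative reading of the RFC 2616 rules (Cache-Control max-age winning over Expires), not text from the Pace; the function name is invented:

```python
import email.utils
import time

def is_fresh(headers, now=None):
    """True if a cached copy is still fresh under the HTTP/1.1
    expiration model (RFC 2616, section 13): a Cache-Control
    max-age directive takes priority over an Expires header."""
    now = time.time() if now is None else now
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name == "max-age" and value.isdigit() and "Date" in headers:
            fetched = email.utils.mktime_tz(
                email.utils.parsedate_tz(headers["Date"]))
            return (now - fetched) < int(value)
    if "Expires" in headers:
        expires = email.utils.mktime_tz(
            email.utils.parsedate_tz(headers["Expires"]))
        return now < expires
    return False  # no explicit lifetime: treat as stale and revalidate
```

A client running a virtual HTTP cache internally would run this check before issuing any request at the source.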


Re: PaceAllowDuplicateIdsWithModified

2005-05-18 Thread Walter Underwood
--On Thursday, May 19, 2005 01:12:22 AM +1000 Eric Scheid [EMAIL PROTECTED] wrote:
(See the wiki for a survey of tools and the dates they support.)
hmmm ... Blogger, Movable Type, JournURL, Blosxom, ExpressionEngine,
ongoing, Roller, Macsanomat, WordPress, and BigBlogTool all provide dates
which represent the last date/time the entry was modified, and there is no
info for LiveJournal.
We abandoned full LiveJournal compatibility a long time ago by requiring
time zones. Older LJ posts do not have time zones. Don't know about the
current ones.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek


RE: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Walter Underwood

--On May 10, 2005 8:57:47 AM -0400 Scott Hollenbeck [EMAIL PROTECTED] wrote:

 I have to agree with Paul.  I don't believe that the issue of white space in
 the syndicated content is really an Atompub issue.  It might be an issue for
 the content creator.  It might be an issue for the reader.  As long as the
 pipe between the two passes the content as submitted, though, the pipe has
 done its job.

If publishers and subscribers have obstacles to using Atom, that sounds
like a problem to me.

"Everyone has this problem" is not a good reason to ignore it. Someone
has to be the first to solve it, might as well be us. It is not acceptable
to build formats for the English Wide Web. That doesn't exist any more.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Atom 1.0?

2005-05-10 Thread Walter Underwood
--On Tuesday, May 10, 2005 09:12:09 AM -0700 Paul Hoffman [EMAIL PROTECTED] wrote:
At 9:09 PM -0700 5/9/05, Walter Underwood wrote:
Seriously, I don't mind Atom 1.0 as long as the next version is
Atom 2.0.
+12
I'd also be happy with just Atom and saying RFC  Atom when
pressed for a version. Even with Atom 1.0 we'll need to say which RFC.
If we choose a specific name, it *must* be in the RFC. Because the RFC
must be a hit for that search.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek


Re: Atom 1.0?

2005-05-09 Thread Walter Underwood

--On May 9, 2005 7:29:58 PM -0700 Tim Bray [EMAIL PROTECTED] wrote:
 
 Anyone have a better idea? --Tim

Hey, let's vote on a *new* name. I'm +1 on Naked News, because
it delivers the news without chrome and crap. Or maybe that is what
you get when Atom (Adam?) goes public. Or because sex sells.

Seriously, I don't mind Atom 1.0 as long as the next version is
Atom 2.0. Please don't increment the right-of-the-dot part forever,
because I just had to fix some software that made the (reasonable)
assumption that 5.10==5.1, even though 5.10 is really Solaris 10.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: PaceCaching

2005-05-07 Thread Walter Underwood

--On May 6, 2005 4:28:44 PM -0700 Paul Hoffman [EMAIL PROTECTED] wrote:

 -1. Having two mechanisms in two different layers is a recipe for disaster. 
 If HTTP headers are good enough for everything else on the web, they're good 
 enough for Atom.

That would be a problem. But this is one mechanism with two ways to
specify it. One is out-of-band in a server-specific way, the other
is in the document in a standard way. Either way, it is HTTP rules for
caching at all intermediate caches and at the client.

Architecturally, this is exactly the same as HTTP-EQUIV meta tags for
HTTP headers, and very similar to the ROBOTS meta tag for /robots.txt.
In both cases, they provide a way for the document author to specify
something without having permissions on the server software config.

Further, these should be implemented exactly like HTTP-EQUIV, where
the server software reads them and sets the header.

The HTTP-EQUIV meta tag is proof that "put it in the header" is not good
enough for everything else. If it wasn't needed, it would be deprecated
by now.

There is a problem here, though. We need to specify the priority of the
in-document specs vs. the HTTP header specs. I propose following the HTTP
standard, in saying that the HTTP headers trump anything in the body.
I'll even assume that following the HTTP spec is non-controversial, and
go update the PACE.

wunder
--
Walter Underwood
Principal Architect, Verity
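The priority rule proposed here - HTTP headers trump anything in the body - is simple to state in code. A hypothetical sketch (the function name and the shape of the in-document value are mine, not from the PACE):

```python
def effective_max_age(http_headers, in_document_max_age):
    """Resolve the cache lifetime when both an HTTP Cache-Control
    header and an in-document value are present: per the proposal,
    the HTTP header trumps anything in the body."""
    for part in http_headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name == "max-age" and value.isdigit():
            return int(value)
    return in_document_max_age
```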



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-07 Thread Walter Underwood

--On May 7, 2005 11:29:07 AM +0300 Henri Sivonen [EMAIL PROTECTED] wrote:

 Why would you put line breaks in the CJK source, then? Isn't the problem
 solved with the least heuristics by the producer not putting breaks there?

It would be even better if they would just speak English. :-)

White space is not particularly meaningful in some of these languages,
so we cannot expect them to suddenly pay attention to that just so
they can use Atom. There will be plenty of content from other formats
with this linguistically meaningless white space.

If we get this wrong, Atom-delivered content will look broken in
some languages, and a bunch of extra-spec practice will build up about
how to fix it. Much better to get it right in 1.0.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Atom feed refresh rates

2005-05-06 Thread Walter Underwood

--On May 5, 2005 10:53:48 AM -0700 John Panzer [EMAIL PROTECTED] wrote:
 
 I assume an HTTP Expires header for Atom content will work and play well with
 caches such as the Google Accelerator (http://webaccelerator.google.com/). 
 I'd also guess that a syntax-level tag won't.  Is this important? 

The syntax-level tag is useful inside a client program with a cache.
It can reduce the number of requests at the source, rather than 
reducing them in the middle of the network at an HTTP cache.

There is extra benefit from putting that info into the HTTP headers,
because the HTTP cache is shared between multiple clients. The source
webserver sees one GET per HTTP cache instead of one GET per Atom client.

The syntax-level tag also provides a way for the feed author to specify the
info without depending on webserver-specific controls. It does depend on
some extra bit of software to take that info and put it in the HTTP
Expires or Cache-control headers.

wunder
--
Walter Underwood
Principal Architect, Verity



RE: Selfish Feeds...

2005-05-06 Thread Walter Underwood

--On May 6, 2005 4:37:23 PM -0400 Bob Wyman [EMAIL PROTECTED] wrote:

   Frankly, I really wish that we had done the blog architecture work
 many months ago so that we would all have a shared understanding of the
 system-wide issues and components rather than the widely divergent personal
 and partial views that are obvious in many of our conversations today...

Agreed. A conceptual model of a resource is up there at the front of
our charter, and if we don't have that, it doesn't seem like the WG is done.

wunder
--
Walter Underwood
Principal Architect, Verity



RE: Atom feed refresh rates

2005-05-05 Thread Walter Underwood

--On May 5, 2005 8:15:10 AM +0100 Andy Henderson [EMAIL PROTECTED] wrote:

 There is no RSS2 feature I can see that allows feed providers to tell
 aggregators the minimum refresh period.  There's the ttl tag.  That was, I
 believe, introduced for a different purpose and determines the Maximum time
 a feed should be cached in a certain situation. 

We need both a ttl (max-age) and expires. One or the other is appropriate
for different publishing needs. We also need to specify what you do with
those values, or you end up with a mess, like the RSS2 ttl's meaning
reversing over an undocumented value (Yikes!).

 What has yet to be tried is a specific tag in the core feed standard that
 promotes and determines good behaviour for aggregators refreshing their
 feeds.  Even if it were to prove only a limited benefit, it would still be a
 benefit.

It has been tried several ways, originally in robots.txt extensions and
also in RSS. It doesn't work. The model is not rich enough for publishers
or for spiders/aggregators.

Max-age/expires is already designed and proven. By page count, 20% of the
HTTP 1.1 spec is about caching. If we want to write a new caching/scheduling
approach, we can expect it to be a 20 page spec, plus an additional 10
pages on how to work with the HTTP model.

See the Notes section here for details on when to use max-age or expires,
and on the problems with calendar-based schemes.

  http://www.intertwingly.net/wiki/pie/PaceCaching

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Atom feed refresh rates

2005-05-05 Thread Walter Underwood

--On May 5, 2005 8:07:15 AM -0500 Mark Pilgrim [EMAIL PROTECTED] wrote:

 Not to be flippant, but we have one that's widely available.  It's
 called the Expires header. 

You need the information outside of HTTP. To quote from the RSS spec
for ttl:

  This makes it possible for RSS sources to be managed by a file-sharing 
  network such as Gnutella. 

Caching information is about knowing when your client cache is stale,
regardless of how you got the feed.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: AtomPubIssuesList for 2005/05/05

2005-05-05 Thread Walter Underwood

--On May 5, 2005 7:17:00 AM -0400 Sam Ruby [EMAIL PROTECTED] wrote:

 Demonstrate that you have revisited the previous discussion, and that you 
 either
 have something new to add, or can point out some evidence that the previous
 consensus call was made in error.

PaceCaching was not discussed; it was rejected based on false information.
It was rejected because it was HTTP-specific (it is not), and because
it was non-core (similar features are common in other RSS specs).

It does not interact with other features, so it should be a fairly
clean, quick discussion.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Atom feed refresh rates

2005-05-04 Thread Walter Underwood

PaceCaching uses the HTTP model for Atom, whether Atom is used over HTTP
or some other protocol.

PaceCaching was rejected by the editors because it was too late (two months
ago) and non-core. I think that: a) it is never too late to get it right,
and b) scalability is core.

The PACE describes why refresh rates do not solve the problem adequately.

wunder

--On May 4, 2005 5:44:18 AM -0500 Brett Lindsley [EMAIL PROTECTED] wrote:

 
 
 Andy, I recall bringing up the same issue with respect to portable devices. 
 My angle
 was that firing up the transmitter, making a network connection and 
 connecting to
 the server is still an expensive operation in time and power (for a portable
 device) - even if the server returns nothing .  There is no reason to check 
 feeds
 that are not being updated, but then, there currently is no way to know this.
 
 I recall there was a proposal on cache control. That seemed like a good 
 direction,
 but I don't recall it being discussed. As you indicated, if the feed had some
 element that indicated it won't be updated (for example) for another day (e.g.
 a daily news summary), then the end client would need to only check once
 a day.
 
 Brett Lindsley, Motorola Labs
 
 Andy Henderson wrote:
 
 If I'm asking this in the wrong place, sorry; please redirect me if you can.
 
 I am the author of an Aggregator and I'm looking for advice on refresh
 rates.  There was some discussion in this group back in June about a
 possible 'Refresh rate' element.  That seems to have been dismissed in
 favour of bandwidth throttling techniques, notably etag, last-modified and
 compression.  I already support all these plus some additional ones.  I am
 uncomfortable, though, with the implication that refresh rates don't matter
 and should be left to the end-user to decide.
 
 I am adding Atom support to my Agg.  For RSS feeds, I have used the ttl and
 sy:updatePeriod / sy:updateFrequency elements to  allow feed providers to
 limit refresh rates.  I have, in any case, imposed a minimum refresh rate of
 one hour - because that seemed the decent thing to do.  However, I'm coming
 under pressure to reduce that minimum limit for feeds that are clearly
 designed for shorter refresh periods - such as the Gmail Atom feeds.  I'm
 reluctant to implement a free-for-all so I'm looking for guidance on how I
 should tackle this issue.
 
 Andy Henderson
 Constructive IT Advice
 
  
 
 
 
 



--
Walter Underwood
Principal Architect, Verity
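For the aggregator side of this thread, the throttling plus etag/last-modified behavior Andy describes might look like the following sketch (the class and method names are invented for illustration, not from any spec):

```python
class FeedPoller:
    """Client-side throttle for one feed: skip the fetch entirely
    until the minimum interval has elapsed; otherwise revalidate
    with the validators saved from the last fetch (ETag becomes
    If-None-Match, Last-Modified becomes If-Modified-Since)."""

    def __init__(self, min_interval):
        self.min_interval = min_interval  # seconds
        self.last_fetch = None
        self.validators = {}

    def due(self, now):
        """Is it time to contact the server at all?"""
        return (self.last_fetch is None
                or now - self.last_fetch >= self.min_interval)

    def request_headers(self):
        """Conditional-GET headers for the next fetch."""
        h = {}
        if "etag" in self.validators:
            h["If-None-Match"] = self.validators["etag"]
        if "last-modified" in self.validators:
            h["If-Modified-Since"] = self.validators["last-modified"]
        return h

    def record(self, now, etag=None, last_modified=None):
        """Note a completed fetch and remember its validators."""
        self.last_fetch = now
        if etag:
            self.validators["etag"] = etag
        if last_modified:
            self.validators["last-modified"] = last_modified
```

With validators in place, most polls cost the server only a 304 Not Modified.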



Re: FYI: More on duplicates in feeds: DoubleClick does ads the WRONG way!

2005-05-02 Thread Walter Underwood

--On May 2, 2005 5:32:22 PM +1000 Eric Scheid [EMAIL PROTECTED] wrote:

 Counting impressions is essential to their trade, and you'll find that it is
 industry standard practice.

Make that "was essential", and it should be a dying practice. Ads have moved to 
results-based billing, paying for clickthrough and conversion.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: PaceOptionalFeedLink

2005-04-30 Thread Walter Underwood

--On April 30, 2005 3:03:50 PM -0400 Robert Sayre [EMAIL PROTECTED] wrote:

 atom:feed elements MUST NOT contain more than one atom:link element
 with a rel attribute value of alternate that has the same
 combination of type and hreflang attribute values.

That actually specifies something different, the duplication, without
saying whether atom:link is recommended. I recommend adding this text:

An atom:feed element SHOULD/MAY contain one such atom:link element.

I'll let other people contribute on whether it is SHOULD or MAY.

wunder
--
Walter Underwood
Principal Architect, Verity
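The MUST NOT quoted above is mechanically checkable. A sketch, using the Atom 1.0 namespace that was eventually adopted (which postdates this message); the function name is mine:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def duplicate_alternates(feed_xml):
    """Return the (type, hreflang) combinations that appear on
    more than one rel="alternate" link, which the quoted MUST NOT
    forbids. A missing rel attribute defaults to "alternate"."""
    root = ET.fromstring(feed_xml)
    seen, dupes = set(), set()
    for link in root.findall(ATOM + "link"):
        if link.get("rel", "alternate") != "alternate":
            continue
        key = (link.get("type"), link.get("hreflang"))
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes
```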



Re: HTML/XHTML type issues, was: FW: XML Directorate Reviewer Comments

2005-04-13 Thread Walter Underwood

--On April 13, 2005 9:06:59 AM +0300 Henri Sivonen [EMAIL PROTECTED] wrote:

 Instead of saying "XHTML" it would be clearer to say "XHTML 1.x" or to
 define it in terms of the XHTML 1.x namespace URI.

This could work. XHTML 1.0 will not be confused with a media type.

When XHTML 2.0 is ready, we can add a supplemental RFC which defines
a new attribute value for that.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: PaceCoConstraintsAreBad

2005-04-09 Thread Walter Underwood

--On April 8, 2005 8:29:52 PM -0400 Robert Sayre [EMAIL PROTECTED] wrote:

 Please don't respond to me by saying that accessibility is important.

I would never say that. Required or essential, but not merely important.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: PaceCoConstraintsAreBad

2005-04-08 Thread Walter Underwood

--On April 8, 2005 6:59:47 PM -0400 Robert Sayre [EMAIL PROTECTED] wrote:

 Walter, you are missing my point. You've said it yourself:
 
 Maybe summaries are optional, but not because accessibility is optional.[0]

That was in reply to a proposal to make accessibility an optional profile, and
to make summaries required only in that profile. That approach is unacceptable.
I would read my comment as "regardless of your position on summaries,
accessibility is required."

Local textual summaries are rather common on the web. The a tag, for example.
Current accessibility practice is to make the anchor text understandable out
of context. In other words, to make it a summary of the linked resource.
Even if the remote resource is text!

For the img tag, the alt attribute is used to provide a local, textual equivalent.
Again, this is required practice for accessibility. Same thing for graphs,
charts, audio, and video.

These are top-level requirements. They fit on the WAI pocket card. There
are ten quick tips and five of them are about local textual equivalents:

  http://www.w3.org/WAI/References/QuickTips/

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Spaces supports slash:comments. Result = Duplicates Galore!

2005-04-07 Thread Walter Underwood
One way to look at this is to define what parts are local content
as opposed to caches of remote, and base the Etag or other hash on
that.
I still think we should address caching in Atom 1.0. This would
have been part of that. Scaling is an essential thing for syndication,
and caching is the best known way to scale.
wunder
--On Thursday, April 07, 2005 02:48:07 PM -0400 Bob Wyman [EMAIL PROTECTED] 
wrote:
Spaces.msn.com recently announced support for slash:comments, an
element which shows how many comments an RSS item has associated with it.
As Dare Obasanjo explains[1]:
Another cool RSS enhancement is that the number of comments on
each post is now provided using the slash:comments elements. Now
users of aggregators like RSS Bandit can track the comment counts on
various posts on a space. I've been wanting that since last year.
Of course, the side effect of this change is that any aggregator
that uses an MD5-like approach to detect changes will now think that an
entry has been updated every time a new comment is made. This may or may not
be what is desired by consumers of feeds... In any case, there are now
millions of blogs whose entries are changed every time anyone comments on
them. Should aggregators ignore changes that are limited to the
slash:comments element? If so, are there other elements that should be
ignored?
Now, Spaces only publishes RSS feeds... However, if similar atom
extensions were to be defined, the problem would appear with Atom feeds as
well.
bob wyman
[1]
http://spaces.msn.com/members/carnage4life/Blog/cns%211piiOwAp2SJRIfUfD95CnR
Lw%21430.entry



--
Walter Underwood
Principal Architect
Verity Ultraseek
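One way to sketch the "hash only the local content" idea: fingerprint an item's child elements while skipping volatile ones such as slash:comments. The function name and the list of ignored elements are illustrative assumptions, not from any spec:

```python
import hashlib
import xml.etree.ElementTree as ET

SLASH_COMMENTS = "{http://purl.org/rss/1.0/modules/slash/}comments"

def content_fingerprint(item_xml, ignore=(SLASH_COMMENTS,)):
    """Hash an item's serialized child elements, skipping elements
    (like slash:comments) that change without the entry itself
    changing, so comment counts don't trigger false updates."""
    item = ET.fromstring(item_xml)
    parts = [ET.tostring(child) for child in item
             if child.tag not in ignore]
    return hashlib.md5(b"".join(parts)).hexdigest()
```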



Re: Alternative to the date regex

2005-03-25 Thread Walter Underwood

+1 on dropping the regex. It isn't from any of the other specs,
it isn't specifically called out as explanatory and non-normative,
and it is too long to be clear.

Some examples would be nice, along with some examples of things
which do not conform.

wunder

--On March 25, 2005 5:11:09 PM + Graham [EMAIL PROTECTED] wrote:

 
 Currently we have this
 
 A Date construct is an element whose content MUST conform to the
 date-time BNF rule in [RFC3339].  I.e., the content of this element
 matches this regular expression:
 
  [0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}
  (\.[0-9]+)?(Z|[\+\-][0-9]{2}:[0-9]{2})
 
 As a result, the date values conform to the following specifications...
 
 The problem with the regex is that it's entirely redundant. If we look at 
 Norm's message where the regex was suggested [1], he intends it as a profile 
 of xsd:dateTime, which allows a variety of date formats. However we're using 
 it as a profile of RFC3339, which already requires that date-times match the 
 regex 100%. Having the regex there as well is just confusing - until 
 preparing this email I was under the impression it made some additional 
 restrictions on RFC3339.
 
 The nearest thing I see to an additional restriction is that there must be a 
 capital T between the date and time, which the date-time BNF rule we mention 
 also requires, but the prose later mentions you might be allowed to use 
 something different.
 
 Proposal:
 Replace the first para and regex with:
 
 A Date construct is an element whose content MUST conform to the
 date-time BNF rule in [RFC3339]. Note this requires an uppercase letter T
 between the date and time sections.
 
 Secondly, *all* RFC3339 date-times are compatible with the 4 specs mentioned, 
 so the wording of the second paragraph (As a result...) is a bit strange, 
 since it's not as a result of anything we've done. Just say Date values 
 expressed in this way are also compatible with
 
 Graham
 
 [1]http://www.imc.org/atom-syntax/mail-archive/msg13116.html
 
 



--
Walter Underwood
Principal Architect, Verity
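Some examples of conforming and non-conforming values, as requested above, checked against the RFC 3339 date-time shape (the pattern below is a reconstruction for illustration, not normative spec text):

```python
import re

# Reconstruction of the RFC 3339 date-time shape, for illustration.
RFC3339 = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}"      # full-date
    r"T[0-9]{2}:[0-9]{2}:[0-9]{2}"      # uppercase T, then partial-time
    r"(\.[0-9]+)?"                      # optional fractional seconds
    r"(Z|[+-][0-9]{2}:[0-9]{2})$")      # mandatory Z or numeric offset

conforming = ["2005-03-25T17:11:09Z",
              "2003-12-13T18:30:02.25+01:00"]
non_conforming = ["2005-03-25 17:11:09Z",  # space instead of the required T
                  "2005-03-25T17:11:09",   # missing time zone
                  "20050325T17:11:09Z"]    # missing date separators
```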



Re: new issues in draft -06, was: Updated issues list

2005-03-20 Thread Walter Underwood

--On March 20, 2005 11:44:30 AM -0800 Tim Bray [EMAIL PROTECTED] wrote:

 Good point.  My impression is that we do currently have SHOULD-level mandate 
 to 
 serve valid HTML; recognizing that most real-world implementors do make a 
 best-effort
 with tag soup.  Anyone who thinks that the language needs improving should 
 suggest
 improvements. 

I support a SHOULD on that. The Robustness Principle would suggest exactly
that. Consumers of Atom may make an attempt to parse arbitrary HTML-like
content, but producers should make the effort to serve clean HTML.

That free-range HTML is nasty stuff. In the past week, we had two customers
freely mixing slash and backslash in their URL paths. Sigh.

wunder
--
Walter Underwood
Principal Architect, Verity
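Producers who can't guarantee clean markup can at least keep the feed itself well-formed by escaping HTML content. A minimal sketch (the function name is mine; note this keeps the XML parseable but does nothing to clean the HTML itself):

```python
from xml.sax.saxutils import escape

def html_content_element(markup):
    """Wrap possibly-messy HTML as an escaped Atom content element
    with type="html", so the enclosing XML stays well-formed even
    when the markup inside is tag soup."""
    return '<content type="html">%s</content>' % escape(markup)
```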



Re: PaceRepeatIdInDocument solution

2005-02-20 Thread Walter Underwood

About logical clocks in atom:modified:

--On February 21, 2005 3:30:13 AM +1100 Eric Scheid [EMAIL PROTECTED] wrote:

 Semantically, it would work ... for comparing two instances of one entry. It
 wouldn't work for establishing if an entry was modified before or after
 [some event moment] (eg. close of the stock exchange).

Establishing sequences of events is rather tricky. See Leslie Lamport's
"Time, Clocks, and the Ordering of Events in a Distributed System" for how
to do it with logical clocks. The core part of the paper is short, maybe
five pages, and definitely worth reading if you care about this stuff.

 http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf

Synchronized clocks make this simpler. If Atom depends on comparing timestamps
from different servers, then synchronized clocks are a SHOULD. See the text in
PaceCaching for an example.

Synchronized clocks are already a SHOULD for HTTP.

wunder
--
Walter Underwood
Principal Architect, Verity
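The core mechanism of Lamport's paper fits in a dozen lines. A minimal sketch for readers who want the flavor before reading the paper (the class name is mine):

```python
class LamportClock:
    """Minimal Lamport logical clock: each local event increments
    the counter, and a received timestamp bumps the counter to
    max(local, remote) + 1, so causally ordered events always get
    increasing timestamps without synchronized wall clocks."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance for a local event; return its timestamp."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """Merge a timestamp from another process."""
        self.time = max(self.time, remote_time) + 1
        return self.time
```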



Re: atom:entry elements MUST contain an atom:summary element in any of the following cases

2005-02-15 Thread Walter Underwood

I don't think that accessibility is optional. It isn't a profile, it is
a requirement. Maybe summaries are optional, but not because accessibility
is optional.

wunder

--On February 14, 2005 8:48:08 PM -0800 James M Snell [EMAIL PROTECTED] wrote:

 At the risk of beating the PaceProfile drum to death, I would think that   an 
 Accessibility profile could be used to specify specific requirements for 
 accessible feeds.  The core could do exactly as you suggest below -- not 
 require summary.



--
Walter Underwood
Principal Architect, Verity



RE: PaceHeadless

2005-02-08 Thread Walter Underwood
--On Tuesday, February 08, 2005 08:39:42 AM -0500 Bob Wyman [EMAIL PROTECTED] wrote:
Linking to the feed is not an acceptable solution. It must be
possible to embed feed metadata in an entry in a feed and in an Entry
document.
+1
The feed document *must* be standalone. Everything required to
interpret the feed has to be in the feed.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek


Re: PaceClarifyDateUpdated

2005-02-07 Thread Walter Underwood

--On February 6, 2005 1:07:42 PM +0200 Henri Sivonen [EMAIL PROTECTED] wrote:

 Yes. Also as a spec expectation--that is, how often is the SHOULD NOT 
 expected
 to be violated. Will the SHOULD NOT be violated so often that it dilutes the
 meaning of all SHOULD NOTs?

Roughly, a SHOULD or SHOULD NOT can be violated when the implementer
understands and accepts the interoperability limitations of that
decision.

So, the spec should (must?) explain what those are.

wunder
--
Walter Underwood
Principal Architect, Verity



RE: PaceArchiveDocument posted

2005-02-07 Thread Walter Underwood

I agree, but I would put it another way. The charter requires support
for archives, but we don't have a clear model for those. Without a
model, we can't spec syntax.

So, it is not possible for the current doc to fulfill the charter, and
this document is not ready for last call.

wunder

--On February 6, 2005 2:00:20 AM -0500 Bob Wyman [EMAIL PROTECTED] wrote:

 
 -1.
   The use cases for archiving have not been well defined or well
 discussed on this list. It is, I believe, inappropriate and unwise to try to
 rush through something this major at the last moment before a pending Last
 Call.
 
   bob wyman
 
 
 



--
Walter Underwood
Principal Architect, Verity



Re: PaceCaching posted

2005-02-07 Thread Walter Underwood
This is not restricted to HTTP. It uses HTTP's cache age algorithms,
because they are very carefully designed and have proven effective.
But it can be used for any local copy in an Atom client.
wunder
--On Monday, February 07, 2005 10:08:48 AM -0800 Paul Hoffman [EMAIL 
PROTECTED] wrote:
At 9:38 AM -0800 2/7/05, Walter Underwood wrote:
I was holding this back as out of scope and too close to the deadline,
but now that we are talking about sliding windows and delayed, cached
state, it is quite relevant.
Sorry, this is too late for consideration for the Atom core. Even if you 
had turned it in on time, I would give it a -1 for not being essential to the 
core for the Atom format. Atom will be distributed over many protocols, HTTP 
being one of them. Having said that, I think this would be an excellent 
extension, one that might keep the folks who don't understand HTTP scalability 
but feel free to talk about it anyway at bay.
--Paul Hoffman, Director
--Internet Mail Consortium

--
Walter Underwood
Principal Architect
Verity Ultraseek


Re: PaceEntryOrder

2005-02-07 Thread Walter Underwood

--On February 7, 2005 1:06:49 PM -0500 Robert Sayre [EMAIL PROTECTED] wrote:
 Paul Hoffman wrote:
 
 +1. It is a simple clarification that shows the intention without 
 restricting anyone.
 
 +1. Agree in full.

-1. I don't see the benefit. Clients MAY re-order them, but that
doesn't mean they MUST ignore the order. The publisher may prefer
an order which cannot be expressed in the attributes. The Macintouch
and BBC News feeds cited before are good examples.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: PaceEntryOrder

2005-02-07 Thread Walter Underwood
--On Monday, February 07, 2005 12:24:15 PM -0800 Paul Hoffman [EMAIL PROTECTED] wrote:
At 11:07 AM -0800 2/7/05, Walter Underwood wrote:
-1. I don't see the benefit. Clients MAY re-order them, but that
doesn't mean they MUST ignore the order. The publisher may prefer
an order which cannot be expressed in the attributes. The Macintouch
and BBC News feeds cited before are good examples.
I'm very confused. Clients that show the entries of those feeds in
the received order are perfectly acceptable according to the wording of this 
Pace.
Correct, clients may choose any order, including the original.
This is about the publisher's order preference. The Pace says that
the publisher cannot indicate a preferred order in the Atom format.
The order is not significant.
This is clearly counter to normal use, where the order does have
some meaning. The meaning varies by publisher, but it is usually
significant.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek


RE: Entry order

2005-02-04 Thread Walter Underwood

--On February 3, 2005 11:21:50 PM -0500 Bob Wyman [EMAIL PROTECTED] wrote:
 David Powell wrote:
 It looks like this might have got lost accidently when the 
 atom:head element was introduced. Previously Atom 0.3 said [1]:
 Ordering of the element children of atom:feed element MUST NOT be
 considered significant.
   +1. 
   The order of entries in an Atom feed should NOT be significant. This
 is, I think, a very, very important point to make. 

-1

Is this a joke? This is like saying that the order of the entries in my
mailbox is not significant. Note that ordering a mailbox by date is not
the same thing as its native order. 

Feed order is the only way we have to show the publication order of items 
in a feed. I just looked at all my subscriptions, and there is only one
where the order might not be relevant, a security test for RSS readers.
That is clearly not within Atom's charter, so it doesn't count.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Entry order

2005-02-04 Thread Walter Underwood

--On February 4, 2005 11:44:31 AM -0800 Tim Bray [EMAIL PROTECTED] wrote:
 On Feb 4, 2005, at 11:27 AM, Walter Underwood wrote:
 
 Is this a joke? This is like saying that the order of the entries in my
 mailbox is not significant. Note that ordering a mailbox by date is not
 the same thing as its native order.
 
 Except for, Atom entries have a *compulsory* updated date.  So I have no
 idea what semantics you'd attach to the natural order... -Tim

Order the publisher wants to present them in. Conventionally, most recently
published first. Entries may be updated without being reordered.

If clients are told to ignore the order, and given only an updated timestamp,
there is no way to show most recent headlines, which is the primary 
purpose of the whole family of RSS formats.

Right now, you can shuffle the entries and Atom says it is the same feed.

Either we need a published date stamp or we need to honor the order.

wunder
--
Walter Underwood
Principal Architect, Verity
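
The problem described above can be sketched with a few hypothetical entries: if clients are told to ignore document order and sort only by atom:updated, a correction to an old entry reorders the whole feed, and the publication order is lost.

```python
# Hypothetical feed state: three entries in the publisher's preferred
# (most-recently-published-first) document order. A correction to the
# oldest entry has just bumped its atom:updated timestamp.
entries = [
    {"title": "Entry C", "updated": "2005-02-03T09:00:00Z"},  # newest post
    {"title": "Entry B", "updated": "2005-02-02T12:00:00Z"},
    {"title": "Entry A", "updated": "2005-02-04T08:00:00Z"},  # oldest post, just corrected
]

# A client that ignores document order and sorts by atom:updated...
by_updated = sorted(entries, key=lambda e: e["updated"], reverse=True)

# ...promotes the corrected old entry above the genuinely new ones.
print([e["title"] for e in by_updated])  # ['Entry A', 'Entry C', 'Entry B']
print([e["title"] for e in entries])     # publisher's order: C, B, A
```

With only atom:updated available, nothing distinguishes "newly published" from "recently corrected" -- which is the gap a published timestamp (or honoring document order) would fill.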



Re: Entry order

2005-02-04 Thread Walter Underwood

--On February 4, 2005 4:28:53 PM -0600 Roger B. [EMAIL PROTECTED] wrote:
 If clients are told to ignore the order, and given only an updated timestamp,
 there is no way to show most recent headlines...
 
 At a single moment within a feedstream, sure... but the next time an
 entry is added to that feed, I'll have no problem letting the user
 know that this is new stuff.

But if three are added, you can't order those three. 

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Format spec vs Protocol spec

2005-02-02 Thread Walter Underwood
On the other hand, the original plan was to publish both specs at
the same time, which I still think is a good idea.
Do we think there will be any dependencies in the other direction?
That is, when we work on the protocol, will we need to add things
to feed or entry? If there is a reasonable chance of that, then
Atom 1.0 is a temporary thing, and Atom 1.1 will be the real one.
If that is the case, we should leave in the dependency and
publish them together.
wunder
--On Tuesday, February 01, 2005 09:40:38 PM -0800 Paul Hoffman [EMAIL PROTECTED] wrote:
At 10:05 PM +0100 2/1/05, Julian Reschke wrote:
As far as I understand the IETF publication process, this means that
draft-ietf-atompub-format can't be published until the protocol spec
is ready as well.
Others have said we can and should remove the dependency, which is fine. 
Wearing my nitpicky-IETF-geek hat, I would point out that specs with dangling 
dependencies can be made standards without clearing the dependencies; they 
simply can't be published as RFCs with them. There are dozens (possibly over a 
hundred) IETF standards-track documents that have not yet been published as 
RFCs for a variety of reasons, many of them quite lame.
--Paul Hoffman, Director
--Internet Mail Consortium


--
Walter Underwood
Principal Architect
Verity Ultraseek


Re: Format spec vs Protocol spec

2005-02-02 Thread Walter Underwood
--On Wednesday, February 02, 2005 11:53:29 AM -0700 Antone Roundy [EMAIL PROTECTED] wrote:
On Wednesday, February 2, 2005, at 11:56  AM, Walter Underwood wrote:
We are assuming that Atom will need extensions for new applications,
but it should not need extensions for editing blog entries.
I'd have to disagree.  I don't think it inappropriate for elements that exist for use by the publishing protocol to live in a separate namespace from the feed itself.  Rather, I think that a clean separation between the two would be desirable.  Why require the feed format to be revised if we really just want to alter the publishing protocol?
I'm not talking about altering the publishing protocol, I'm talking about
things needed to make 1.0 work. It would seem a little odd if we needed
extensions or a 1.1 for the 1.0 publishing protocol.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek


Re: Issues with draft-ietf-atompub-format-04

2005-01-30 Thread Walter Underwood

--On January 30, 2005 10:06:23 PM +0200 Henri Sivonen [EMAIL PROTECTED] wrote:
 
 So how many European sites besides the EU have the resources to provide
 translations of the *same* content in multiple languages at the same time?

Pretty common in Quebec. We see English and Spanish in the US from Texas
to California. California has voter guides in seven languages. It isn't
limited to governments: UBS's site is in four languages and the San Jose
Mercury News has editions in Spanish and Vietnamese.

 How many of those can't provide multiple feed links and really want to stuff
 everything in a single feed?

Good question. The answer probably depends on how much client software
allows you to select a preferred locale. All browsers do, so they could
easily do that with Atom feeds.

Locales aren't just language. You could offer English in US, UK, and
Australian versions. I was completely mystified about what the Aussies
might mean by footy tipping.

wunder
--
Walter Underwood
Principal Architect, Verity
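
The locale-selection step the client software would need might look like this (a minimal sketch with hypothetical feed URLs, modeled on how browsers apply Accept-Language preferences):

```python
# Alternate feeds a publisher might offer, keyed by locale.
available = {
    "en-US": "http://example.com/feed.en-us.atom",
    "en-AU": "http://example.com/feed.en-au.atom",
    "es":    "http://example.com/feed.es.atom",
}

def pick_feed(preferred, feeds):
    """Return the first available feed matching the user's locale list,
    falling back from a full locale (es-MX) to the bare language (es)."""
    for loc in preferred:
        if loc in feeds:
            return feeds[loc]
        lang = loc.split("-")[0]
        for key in feeds:
            if key.split("-")[0] == lang:
                return feeds[key]
    return None

print(pick_feed(["es-MX", "en"], available))  # falls back to the 'es' feed
```

The same fallback logic covers Walter's point that locales are more than language: en-US, en-UK, and en-AU can be distinct feeds, with plain "en" preferences matching any of them.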



Re: PaceOrderSpecAlphabetically

2005-01-30 Thread Walter Underwood

--On January 30, 2005 12:34:42 PM -0500 Sam Ruby [EMAIL PROTECTED] wrote:

 == Abstract ==
 Order the Element Definitions in the specification alphabetically.

+1. Yes, please. It works fine for the HTTP spec.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: PaceMustBeWellFormed status

2005-01-25 Thread Walter Underwood
--On Monday, January 24, 2005 04:17:40 PM -0800 Tim Bray [EMAIL PROTECTED] wrote:
If there were no further discussion: The WG completely failed to converge to
consensus on these issues last time around. Consensus can still be found here.
-Tim
I'm +1 on this, and feel that it belongs in the spec. This is a
constraint on the format of the feed document, and is testable.
Forbidding re-parsing (6.2) is OK, and not a restatement of the XML spec.
If you use a parser which isn't an XML parser, it might process
the doc. This says you can't do that.
I think that the rationale misstates the Pace. It says that Atom feeds
must always be ASCII, but the proposal only requires that for text/xml
feeds. application/xml feeds may use UTF-8, either in an encoding
declaration or with a charset parameter.
I would add a note that 3023 is normative, and maybe move the
notes in 6.1 to an appendix.
Are we sure we want "RFC 3023 or its successor" instead of just RFC 3023?
A successor could make some Atom feeds illegal without a change to
the Atom spec.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek
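
The text/xml vs. application/xml distinction referenced above can be sketched as follows (simplified; the full RFC 3023 rules also cover BOM detection and other media types):

```python
def effective_charset(media_type, charset_param=None, xml_decl_encoding=None):
    """Simplified RFC 3023 charset resolution for XML media types."""
    if media_type == "text/xml":
        # For text/xml, the charset parameter is authoritative; absent one,
        # the default is us-ascii and the XML encoding declaration is ignored.
        return charset_param or "us-ascii"
    if media_type == "application/xml":
        # For application/xml, the charset parameter wins if present;
        # otherwise the document's own XML declaration (or XML's UTF-8
        # default) applies.
        return charset_param or xml_decl_encoding or "utf-8"
    return None

print(effective_charset("text/xml"))                       # 'us-ascii'
print(effective_charset("application/xml",
                        xml_decl_encoding="utf-8"))        # 'utf-8'
```

This is why the Pace effectively constrains text/xml feeds to ASCII unless a charset parameter is sent, while application/xml feeds can declare UTF-8 themselves.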


Re: AtomOWL AtomIsRDF

2005-01-17 Thread Walter Underwood
--On Monday, January 17, 2005 12:16:36 PM -0500 Dan Brickley [EMAIL PROTECTED] wrote:
I fear [2] is unfortunately named. Atom is RDF-like in some ways,
but until the Atom spec says Atom is RDF, Atom isn't RDF.
Call it AtomAsRDF.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek


Re: PaceMustUnderstandElement

2005-01-13 Thread Walter Underwood
Excellent examples. Each of these could be handled without mustUnderstand.
Define an extension for entries. Put the restricted content inside the
extension. The extension would include the display constraints between
the content portions and the disclaimer or authentication portions.
This could mean duplicating some content elements inside the extension.
On the other hand, it would add support for restricted content instead
of redefining regular content as potentially restricted.
wunder
--On Thursday, January 13, 2005 02:46:06 PM -0800 Tim Bray [EMAIL PROTECTED] wrote:
On Jan 13, 2005, at 2:29 PM, David Powell wrote:
Does anyone have any example use cases for mustUnderstand?
1. A stream of financial disclosures from a public company in a highly-regulated industry.  The legislation is very clear that they may not say anything in public unaccompanied by disclaimers and limitation-of-liability statements.  The financial industry gets together an introduces an extension that requests clients to display these disclaimers in a fashion that meets the regulatory requirements.  If Atom has MustUnderstand, compliant clients that can't do this will never fail to display the 
appropriate material, and this reduces the risk of litigation and makes it more likely that such feeds will be created.
2. A stream of information that uses a special-purpose digital-signature 
scheme to establish the authenticity of the information.  People should not act 
on this information without checking the signature.  A person using a 
conformant Atom client can be sure that they won't see anything that hasn't 
been checked.
  -Tim

--
Walter Underwood
Principal Architect
Verity Ultraseek
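
The extension-wrapping approach described above might look something like this sketch (the element and namespace names are hypothetical illustrations, not from any spec):

```xml
<entry>
  <title>Q3 Results</title>
  <updated>2005-01-13T12:00:00Z</updated>
  <!-- Hypothetical extension: restricted content carried inside the
       extension element, with its required disclaimer alongside it. -->
  <fin:restricted xmlns:fin="http://example.com/ns/financial-disclosure">
    <fin:content type="xhtml">...the disclosure itself...</fin:content>
    <fin:disclaimer>Past performance does not guarantee
      future results.</fin:disclaimer>
  </fin:restricted>
</entry>
```

A client that doesn't know the extension simply sees no displayable content, so the restricted material is never shown without its disclaimer -- which is the effect mustUnderstand was meant to guarantee.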


Re: PaceFeedRecursive

2005-01-13 Thread Walter Underwood
--On Thursday, January 13, 2005 06:55:53 PM -0500 Sam Ruby [EMAIL PROTECTED] wrote:
The proposal apparently is for feeds to contain other feeds by containment.
My question is whether it would make sense to also support feeds containing
other feeds by reference, perhaps via a link element or via a src= attribute
(analogous to content/@src defined in 5.12.2).
As someone maintaining a spider and search engine, I'm -1 on inclusion
by reference. The additional complexity is just not worth it.
If a referenced doc is a 404, is the feed state OK? Should I index it
or not? Do I always need to re-visit referred docs when I revisit the
main one? What do I do with loops? Do I index all the content as
belonging to the main doc? What if that is included in something else
by reference, can it still be a standalone item?
For the same reasons, I'm not very hot on any sort of content by reference.
But recursive feeds by reference opens up lots more issues.
Basically, if you want your content in a search engine, it had better
be accessible with a single GET.
wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek


Re: Hash for Links [Was: Re: Posted PaceEnclosuresAndPix]

2005-01-10 Thread Walter Underwood
This is really cache management. Use e-tags, from the HTTP 1.1 spec.
Or use an HTTP cache, which would require no changes to Atom.
Going through a client-side HTTP 1.1 cache would automatically take
advantage of the e-tags and other caching information in HTTP.
wunder
--On Saturday, January 08, 2005 06:14:50 PM -0800 James Snell [EMAIL PROTECTED] wrote:
I really don't want to be going down the road of requiring HTTP header
equivalents in the Atom feed, etc.  All I want is the ability to
specify a hash of whatever it is that is being linked to.  It could
work in both link and content elements and one could easily use the
Content-MD5 header to verify whether or not the resource referenced
has been modified since the time it was included in the Entry.
The URI and the length of the file do not guarantee that the content
has not changed and yes, I had considered this as a possible non-core
extension but wanted to float it as a core item first.
On Sat, 08 Jan 2005 15:02:27 -0500, Robert Sayre [EMAIL PROTECTED] wrote:
Bill de hÓra wrote:


  <link rel="enclosure" href="http://example.com/somefile.mp3"
        hash="{generated_hash_value}"
        hashalg="{uri_identifying_the_hash_algorithm_used}" />

 The hash and hashalg attributes would be optional but MUST appear
 together.

 Thoughts? (If we have more than two people respond favorably to this,
 I'll write up a Pace for it)



 Seems like a good idea - would it be possible to move them into elements?
Well, Content-Length lives in the attributes as length, but I don't
think we need to make a home for every HTTP header. Content-MD5 will
work just fine; it would probably be wise to send a HEAD request before
automatically downloading a giant mp3. Furthermore, you'll get a good
enough identifier by concatenating the URI and the length. Something
more accurate will require a HEAD request. Thirdly, there's absolutely
no reason to have this in core.
Robert Sayre


--
- James Snell
  http://www.snellspace.com
  [EMAIL PROTECTED]

--
Walter Underwood
Principal Architect
Verity Ultraseek
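
The two approaches in this thread can be sketched side by side (a minimal illustration; function names are hypothetical):

```python
import hashlib

def hash_changed(stored_hash, body, alg="md5"):
    """The proposed hash/hashalg attributes: compare a digest stored in
    the feed against a fresh download of the linked resource."""
    return hashlib.new(alg, body).hexdigest() != stored_hash

def conditional_get_headers(stored_etag):
    """The HTTP 1.1 alternative Walter suggests: let the cache layer do
    the work. Send If-None-Match and the server answers 304 Not Modified
    if the entity is unchanged -- no feed-format changes required."""
    return {"If-None-Match": stored_etag} if stored_etag else {}

body = b"...enclosure bytes..."
digest = hashlib.md5(body).hexdigest()
print(hash_changed(digest, body))           # False: content unchanged
print(conditional_get_headers('"abc123"'))  # {'If-None-Match': '"abc123"'}
```

The hash approach requires downloading the resource before you can tell it hasn't changed; the ETag approach lets the server say so in a single cheap round trip.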