Re: OpenSearch RSS

2005-05-31 Thread James Aylett

On Mon, May 30, 2005 at 07:35:22PM -0400, Robert Sayre wrote:

 I get the feeling that OpenSearch + Atom could be real useful.
 
 There is substantial overlap with Atom Protocol paging, especially
 existing gregorio-07 implementations. I quite like their approach to URI
 construction and their description document, which allows servers to 
 place the query parameters anywhere in their URI space.
 
 http://opensearch.a9.com/spec/opensearchdescription/1.0/
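The template approach amounts to substituting named placeholders into a URL the server chooses. Below is a minimal Python sketch. It assumes a description-document URL template with OpenSearch-style placeholders such as {searchTerms}. The example.com template and the `expand_template` helper are hypothetical and are not taken from the spec:

```python
from urllib.parse import quote

# Hypothetical template: the server decides where each parameter goes.
TEMPLATE = "http://example.com/search/{searchTerms}?page={startPage}&n={count}"


def expand_template(template: str, params: dict) -> str:
    """Replace each {name} placeholder with its percent-encoded value."""
    result = template
    for name, value in params.items():
        # safe="" also encodes "/", "&" and braces, so a value cannot
        # break out of its slot or introduce a new placeholder.
        result = result.replace("{" + name + "}", quote(str(value), safe=""))
    return result


if __name__ == "__main__":
    print(expand_template(TEMPLATE, {"searchTerms": "New York City history",
                                     "startPage": 1, "count": 10}))
    # prints http://example.com/search/New%20York%20City%20history?page=1&n=10
```

Because the client only fills slots, the server can put the query in the path, in the query string, or in both, and clients need no changes.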

I seem to remember that, when I looked at it, it rang bells with me
because it looked like part of WSDL, at which point I wondered why
they didn't use WSDL. That's more of an aside, though.

A couple of us from Xapian [1] looked at OpenSearch, and implemented
it for our example search engine. It took about half an hour (mostly
because we use a templating language for query results), but ten to
fifteen minutes of that was spent figuring out what OpenSearch
actually meant, and we had to make a number of assumptions in doing
so.

Were someone to produce a document describing Atom+OpenSearch, I'd
expect it to have the rigour of Atom. OpenSearch is written in the
RSS2 style, which isn't really appropriate for what is clearly meant
to be a highly interoperable format.

We were also a little concerned that the OpenSearch model was very
simplistic: there were many possible use cases we could think of that
weren't catered for, and indeed we could only think of one that was
really supported, being the one that A9 puts it to (which is a slim
superset of feeding OpenSearch into a desktop aggregator). We did
start to put together a proposal for an Atom+Search extension, and
completed the conceptual work but then got distracted by other
things. This is kind of orthogonal to the OpenSearch issue, but if
people are interested in discussing a richer search extension we can
try to clear some time to pull it into shape.

[1] http://xapian.org/

James

-- 
/--\
  James Aylett  xapian.org
  [EMAIL PROTECTED]   uncertaintydivision.org



Re: OpenSearch RSS

2005-05-31 Thread James Aylett

On Tue, May 31, 2005 at 12:13:42AM +0200, Thomas Broyer wrote:

 <atom:entry>
   <atom:title>New York City History</atom:title>
   <atom:link rel="alternate"
  href="http://www.columbia.edu/cu/lweb/eguides/amerihist/nyc.html" />
   <atom:id>http://www.columbia.edu/cu/lweb/eguides/amerihist/nyc.html</atom:id>
   <atom:updated>2005-05-30T00:00:00+02:00</atom:updated>
   <atom:summary>... Harlem.NYC - A virtual tour and information on 
 businesses ... with historic photos of Columbia's own New York 
 neighborhood ... Internet Resources for the City's History. 
 ...</atom:summary>
 </atom:entry>
 <atom:entry>
   <atom:title>Gotham Center for New York City History</atom:title>
   <atom:link rel="alternate" href="http://www.gothamcenter.org/" />
   <atom:id>http://www.gothamcenter.org/</atom:id>
   <atom:updated>2005-05-30T00:00:00+02:00</atom:updated>
   <atom:summary>... Submit Events Edit Your Submission. Main 
 Neighborhood Stories NYC History in the ... The Gotham Center for New 
 York City History is supported by The CUNY Graduate ...</atom:summary>

I'm going backwards and forwards on whether
atom:entry/atom:link[@rel="alternate"]/@href is the right thing to put
into atom:entry/atom:id. On the plus side it's simple and works in the
basic case of syndication (eg: my search results appear in an A9
search column). On the negative side I'm wondering what will happen if
I take two search feeds into a desktop aggregator - presumably the
result will only appear once. However that may well be what is
desired, which would kind of be a neutral side.

It does limit things (at least by draft-ietf-atompub-format-08) in
that if I aggregate two search feeds into one and they have search
results for the same site, either I have to drop one of them or I have
to rewrite all the atom:id values myself - and the latter is
explicitly forbidden by 4.2.7 of atompub.
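The collision is easy to reproduce. Below is a sketch in Python. It assumes an aggregator that keys entries purely on atom:id and keeps the first entry it sees; real aggregators may behave differently. `Entry` and `merge_by_id` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    feed: str   # which search feed the entry came from
    title: str
    href: str   # target of the atom:link rel="alternate"

    @property
    def atom_id(self) -> str:
        # The scheme under discussion: reuse the alternate link as atom:id.
        return self.href


def merge_by_id(*feeds: list) -> list:
    """Merge feeds as an id-keyed aggregator might: the first entry per atom:id wins."""
    seen = {}
    for feed in feeds:
        for entry in feed:
            seen.setdefault(entry.atom_id, entry)
    return list(seen.values())  # dicts keep insertion order


if __name__ == "__main__":
    a9 = [Entry("a9", "Gotham Center for New York City History", "http://www.gothamcenter.org/")]
    other = [Entry("other", "Gotham Center", "http://www.gothamcenter.org/"),
             Entry("other", "Museum of the City of New York", "http://www.mcny.org/")]
    for e in merge_by_id(a9, other):
        print(e.feed, e.atom_id)  # the second feed's Gotham Center result is gone
```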

I know there's been some related discussion on this that hasn't made
its way into the I-D yet - I /think/ consensus was reached around
PaceAtomIdDos, but I can't find a statement one way or another on the
list archives. With something like that text in atompub, at least
people are aware of the problem, although something explicit in an
Atom+OpenSearch document is probably required to make sure everyone's
aware of what would happen.

I'm also not convinced that the semantics are quite right (surely the
feed is a feed of search results, and a search result seems to me
conceptually different from the website it refers
to as the source of the match). This isn't nearly so important,
though, and I'm quite willing to accept the shmershing together of the
ideas. (Or even that I'm wrong.)
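If a search result is treated as a resource in its own right, one option is to mint an atom:id per (engine, query, result) and keep the result URL only in the alternate link. Below is a Python sketch. The tag: layout is an assumption for illustration, loosely modelled on the feed-level tag: id in Thomas's example. `result_id` is a hypothetical helper:

```python
from urllib.parse import quote


def result_id(authority: str, year: str, query: str, result_url: str) -> str:
    """Hypothetical tag: URI naming one search result rather than the page it points to."""
    # Percent-encode both parts so the query and the URL cannot be confused
    # with the ":" and "/" separators used in the id itself.
    return f"tag:{authority},{year}:{quote(query, safe='')}/{quote(result_url, safe='')}"


if __name__ == "__main__":
    url = "http://www.mcny.org/"
    print(result_id("a9.com", "2005", "New York City history", url))
    # The same page found by another (hypothetical) engine gets a different
    # id, so merging both feeds keeps both results.
    print(result_id("example.org", "2005", "New York City history", url))
```

The trade-off is the reverse of the one above. An aggregator fed by two engines will then show the same page twice.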

James




Re: OpenSearch RSS

2005-05-30 Thread Thomas Broyer


Tim Bray wrote:

 Check out A9's OpenSearch at http://opensearch.a9.com/ - I'm starting  
 to hear substantial buzz around this thing.

 I wonder, is embedding the OpenSearch RSS stuff in Atom going to  
 cause any heartburn?  I'm inclined to think not, but would appreciate  
 others having a look.

 I get the feeling that OpenSearch + Atom could be real useful. -Tim

Just to see what it would look like, this is what the search result 
example [1] would be in Atom:

<atom:feed xml:lang="en-us" xmlns:atom="...Atom NS..."
  xmlns:os="http://a9.com/-/spec/opensearchrss/1.0/">
<atom:title>A9.com Search: New York City history</atom:title>
<atom:id>tag:A9.com,2005:New%20York%20City%20history</atom:id>
<atom:link rel="alternate" type="text/html"
  href="http://a9.com/New%20York%20City%20history" />
<atom:subtitle>Search results for New York City history at 
A9.com</atom:subtitle>

<atom:rights>&#169; 2003-2005, A9.com, Inc. or its affiliates.</atom:rights>
<atom:author>
  <atom:name>A9.com, Inc.</atom:name>
  <atom:uri>http://a9.com/</atom:uri>
</atom:author>
<atom:updated>2005-05-30T23:50:00+02:00</atom:updated>
<os:totalResults>423</os:totalResults>
<os:startIndex>1</os:startIndex>
<os:itemsPerPage>10</os:itemsPerPage>

<atom:entry>
  <atom:title>New York City History</atom:title>
  <atom:link rel="alternate"
 href="http://www.columbia.edu/cu/lweb/eguides/amerihist/nyc.html" />
  <atom:id>http://www.columbia.edu/cu/lweb/eguides/amerihist/nyc.html</atom:id>
  <atom:updated>2005-05-30T00:00:00+02:00</atom:updated>
  <atom:summary>... Harlem.NYC - A virtual tour and information on 
businesses ... with historic photos of Columbia's own New York 
neighborhood ... Internet Resources for the City's History. 
...</atom:summary>
</atom:entry>
<atom:entry>
  <atom:title>Gotham Center for New York City History</atom:title>
  <atom:link rel="alternate" href="http://www.gothamcenter.org/" />
  <atom:id>http://www.gothamcenter.org/</atom:id>
  <atom:updated>2005-05-30T00:00:00+02:00</atom:updated>
  <atom:summary>... Submit Events Edit Your Submission. Main 
Neighborhood Stories NYC History in the ... The Gotham Center for New 
York City History is supported by The CUNY Graduate ...</atom:summary>
</atom:entry>
<!-- ... -->
<atom:entry>
  <atom:title>Welcome to the Museum of the City of New York</atom:title>
  <atom:link rel="alternate" href="http://www.mcny.org/" />
  <atom:id>http://www.mcny.org/</atom:id>
  <atom:updated>2005-05-30T00:00:00+02:00</atom:updated>
  <atom:summary>... a list with the event staff. Additional information 
will be included in the confirming email. &#169; Museum of the City of 
New York.</atom:summary>
</atom:entry>
</atom:feed>
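For a client, the interesting part is that the os: paging elements sit alongside ordinary Atom elements, so any namespace-aware parser can read them. Below is a minimal Python sketch using the standard library's ElementTree. The Atom namespace URI is an assumption because the example leaves it as a placeholder. `read_results` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0's namespace; substitute whatever the feed declares
# (Thomas's online copy uses the Atom 0.3 namespace instead).
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "os": "http://a9.com/-/spec/opensearchrss/1.0/",
}


def read_results(xml_text: str) -> dict:
    """Pull the OpenSearch paging numbers and (title, href) pairs out of an Atom feed."""
    feed = ET.fromstring(xml_text)
    info = {}
    for name in ("totalResults", "startIndex", "itemsPerPage"):
        text = feed.findtext(f"os:{name}", default="", namespaces=NS).strip()
        info[name] = int(text) if text else None  # None when the element is absent
    entries = []
    for entry in feed.findall("atom:entry", NS):
        link = entry.find("atom:link[@rel='alternate']", NS)
        entries.append((entry.findtext("atom:title", default="", namespaces=NS),
                        link.get("href") if link is not None else None))
    info["entries"] = entries
    return info
```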

Some comments:

   * I set type="text/html" on the feed's alternate link, because the
 OpenSearch RSS 1.0 Specification [1] says the RSS link is a URL
 that can recreate the search in HTML format; @type is not used in
 entries, as the result documents might not be text/html
   * I changed the escaped-HTML &amp;copy; to &#169;, which saves us an
 internal DTD subset while still allowing us to use type="text"
   * Atom mandates an atom:author; I added a dummy one
   * Atom mandates an atom:updated in the feed; I added a dummy one. It
 should be set to the latest atom:updated date found in the feed's
 entries, or at least to the date the request was made.
   * Atom mandates an atom:updated in each entry; I added a dummy one.
 It should be set to the date the search engine last accessed the
 result document or, failing that, the date the request was made.
 For example, Google is able to give you this date if it has cached
 the document (when you look at a cached page, Google puts "This
 is Google's cache of URI as retrieved on DATE" at the top of the page).
   * I used the address of the result document (permalink?) as the
 atom:id of each entry, because this is the easiest way to do it...
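The two date rules in these bullets can be written down directly. Below is a Python sketch. It assumes timestamps arrive as ISO 8601 strings with numeric offsets such as +02:00 (Python's `datetime.fromisoformat` only accepts a trailing `Z` from 3.11 on). It also assumes the engine may or may not know when it last fetched a page. `feed_updated` and `entry_updated` are hypothetical helpers:

```python
from datetime import datetime, timezone
from typing import Optional


def feed_updated(entry_updates: list, requested_at: datetime) -> str:
    """Feed-level atom:updated: the newest entry date, else the time of the request."""
    if requested_at.tzinfo is None:
        raise ValueError("requested_at must be timezone-aware")
    stamps = [datetime.fromisoformat(s) for s in entry_updates]
    return max(stamps, default=requested_at).isoformat()


def entry_updated(last_fetched: Optional[datetime], requested_at: datetime) -> datetime:
    """Entry-level atom:updated: when the engine last fetched the page, else the request time."""
    return last_fetched if last_fetched is not None else requested_at


if __name__ == "__main__":
    now = datetime(2005, 5, 30, 21, 50, tzinfo=timezone.utc)
    print(feed_updated(["2005-05-30T00:00:00+02:00"], now))
    print(feed_updated([], now))  # no entries: falls back to the request time
```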

I've put this document online [2], with the Atom 0.3 namespace URI 
(http://purl.org/Atom/ns#).


Note that the OpenSearch RSS 1.0 Specification [1] forbids the use of 
escaped HTML in many elements. If there were an OpenSearch Atom, it 
could also be limited to type="text" (and/or type="xhtml", because it's 
quite easy to transform XHTML to plain text), though the A9.com website 
(which acts as a reader/aggregator for OpenSearch RSS documents) would 
then not be a valid Atom Processor.
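The claim that XHTML is easy to turn into plain text holds for the simple case. Below is a Python sketch using the standard library's ElementTree. It assumes the content is a single well-formed XHTML element such as a div. `xhtml_to_text` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def xhtml_to_text(fragment: str) -> str:
    """Flatten a well-formed XHTML fragment to plain text by keeping only its character data."""
    root = ET.fromstring(fragment)
    # itertext() walks every text node in document order; splitting and
    # re-joining collapses runs of whitespace (including newlines).
    return " ".join("".join(root.itertext()).split())


if __name__ == "__main__":
    print(xhtml_to_text('<div xmlns="http://www.w3.org/1999/xhtml">'
                        "The <b>Gotham Center</b> for\n New York City History</div>"))
```

This naive version ignores block structure, so `<p>a</p><p>b</p>` comes out as "ab". A real converter would insert breaks or spaces at block-level elements.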


The OpenSearch Description Document [3] would /a priori/ be the same 
(except, of course, that it would use a different value in the Format 
element to indicate OpenSearch Atom instead of OpenSearch RSS).


The Atom result document could also link to the next and previous 
pages with additional atom:link elements in the atom:feed, with 
extended @rel values.
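Working out where those links point only needs the three OpenSearch numbers. Below is a Python sketch. It assumes a 1-based startIndex, as in the example (os:startIndex 1). The rel values "previous"/"next" and the `?start=` query parameter are hypothetical, since the extended @rel values were not pinned down:

```python
from typing import Optional
from xml.sax.saxutils import quoteattr


def page_starts(total: int, start_index: int, per_page: int) -> dict:
    """1-based startIndex of the previous and next pages, or None at either end."""
    previous: Optional[int] = max(1, start_index - per_page) if start_index > 1 else None
    following: Optional[int] = start_index + per_page if start_index + per_page <= total else None
    return {"previous": previous, "next": following}


def page_links(base_url: str, total: int, start_index: int, per_page: int) -> list:
    """Render atom:link elements for the neighbouring pages (rel names are assumptions)."""
    links = []
    for rel, start in page_starts(total, start_index, per_page).items():
        if start is not None:
            href = f"{base_url}?start={start}"
            # quoteattr escapes &, < and > and wraps the value in quotes.
            links.append(f'<atom:link rel="{rel}" href={quoteattr(href)} />')
    return links


if __name__ == "__main__":
    # Numbers from the example feed: 423 results, 10 per page, first page.
    print(page_starts(423, 1, 10))  # {'previous': None, 'next': 11}
```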


[1] http://opensearch.a9.com/spec/opensearchrss/1.0/
[2] http://www.ltgt.net/atom/opensearch.atom
[3] http://opensearch.a9.com/spec/opensearchdescription/1.0/

--
Thomas Broyer





Re: OpenSearch RSS

2005-05-30 Thread Bill de hÓra


Thomas Broyer wrote:

 [...]

I did the same experiment; bottom line Amazon will need to add

 -atom:updated
 -atom:id
 -atom:modified
 -a few attributes

to use Atom. They also need to fix their example*: it's invalid XML 
(&copy; in the last entry). By the looks of things, with the feed-level 
extensions, they're going the route Nature have taken with RSS 1.0.
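Both points are mechanical enough to check. Below is a Python sketch using the standard library's ElementTree. The first helper shows that an undeclared &copy; makes a document fail to parse, while the numeric &#169; is fine. The second lists entries lacking atom:id or atom:updated. The Atom namespace URI and the helper names are assumptions:

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0's namespace; use whatever the feed actually declares.
NS = {"atom": "http://www.w3.org/2005/Atom"}


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML; undeclared entities such as &copy; fail."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def missing_required(xml_text: str) -> list:
    """(entry position, element name) pairs for entries lacking atom:id or atom:updated."""
    feed = ET.fromstring(xml_text)
    problems = []
    for position, entry in enumerate(feed.findall("atom:entry", NS), start=1):
        for name in ("id", "updated"):
            if entry.find(f"atom:{name}", NS) is None:
                problems.append((position, name))
    return problems
```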


cheers
Bill

* http://opensearch.a9.com/spec/opensearchrss/1.0/





Re: OpenSearch RSS

2005-05-30 Thread Bill de hÓra


Bill de hÓra wrote:


I did the same experiment; bottom line Amazon will need to add
 [...]


Oops, please ignore:


 -atom:modified


[My eyes! The specifications! They do nothing!]

cheers
Bill