Hello,

Context: we are feeding our catalogue records to our institutional search 
engine (Autonomy) using the Dublin Core XSL transformation to supply the 
indexable record content. We first supply a list of record ids that constitute 
the scope of records we want to supply to the search engine, but the next step 
is to automate indexing so that we're not always re-crawling the entire 
database. The idea is that we'd re-crawl the entire catalogue when required 
for re-optimizing the Autonomy indexes (say monthly?), but supply a listing of 
new / modified / deleted records by RSS or a separate txt file posted 
somewhere crawlable (the latter of which I can do now).

Anybody have any thoughts on structuring an RSS feed for this purpose (one 
that provides some generalized capabilities for other possible consumers of 
this type of feed)?

For example, would you see separate feeds for updated vs. deleted vs. new 
records, or a single feed with some fields like "status" (updated / deleted / 
added, etc.) and the date the status changed? And how would you suggest we 
hook into the arbitrary 'benchmark date' (in this case, the date of the last 
full crawl) from which to determine the relative status changes like updated / 
deleted / added, etc.?
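To make the single-feed idea concrete, here's a minimal sketch of how the status logic might work: classify each record relative to the benchmark date (last full crawl) and emit one feed item per changed record, with a "status" field carried in a custom element. All of the field names, the `cat:` namespace prefix, and the status vocabulary are assumptions for illustration, not an established convention.

```python
# Hypothetical sketch: classify catalogue records against a benchmark
# date (the last full crawl) and build simplified RSS-style items.
# Record fields (created_on / modified_on / deleted_on) and the
# <cat:status> element are illustrative assumptions only.
from datetime import date

def record_status(rec, benchmark):
    """Return 'deleted', 'added', 'updated', or None (unchanged)."""
    if rec.get("deleted_on") and rec["deleted_on"] > benchmark:
        return "deleted"
    if rec["created_on"] > benchmark:
        return "added"
    if rec["modified_on"] > benchmark:
        return "updated"
    return None  # no change since the last full crawl

def rss_items(records, benchmark):
    """Build one simplified <item> string per record changed since benchmark."""
    items = []
    for rec in records:
        status = record_status(rec, benchmark)
        if status is None:
            continue  # unchanged records stay out of the incremental feed
        items.append(
            "<item>\n"
            f"  <title>{rec['title']}</title>\n"
            f"  <guid isPermaLink=\"false\">{rec['id']}</guid>\n"
            f"  <cat:status>{status}</cat:status>\n"
            f"  <cat:statusDate>{rec['modified_on'].isoformat()}</cat:statusDate>\n"
            "</item>"
        )
    return items

# Example data: benchmark is the date of the last full crawl.
benchmark = date(2007, 1, 1)
records = [
    {"id": "rec001", "title": "Record added after the crawl",
     "created_on": date(2007, 2, 1), "modified_on": date(2007, 2, 1),
     "deleted_on": None},
    {"id": "rec002", "title": "Record modified after the crawl",
     "created_on": date(2005, 5, 5), "modified_on": date(2007, 3, 3),
     "deleted_on": None},
    {"id": "rec003", "title": "Record untouched since the crawl",
     "created_on": date(2000, 1, 1), "modified_on": date(2000, 1, 1),
     "deleted_on": None},
]
```

One nice property of this arrangement is that the benchmark date is just a parameter: a union catalogue partner could request the delta against their own last-harvest date rather than ours (a real RSS feed would also want RFC 822 dates in `pubDate`, which this sketch glosses over).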

I can see another general use case for this type of feed being relevant to 
those who contribute records to external union catalogues: it would give our 
union catalogue partners an additional option for scooping up our records with 
their own automation.

Thanks,

George Duimovich
NRCan Library / Bibliothèque de RNCan
