Hey George:

On Wed, Oct 06, 2010 at 11:24:32AM -0400, Duimovich, George wrote:
> Hello,
>
> Context: we are feeding our catalogue records to our institutional search
> engine (Autonomy) using the Dublin Core XSL transformation to supply the
> indexable record content. We first supply a list of record IDs that
> constitute the scope of records we want to supply to the search engine,
> but the next step is to automate indexing so that we're not always
> re-crawling the entire database. The idea is that we'd re-crawl the entire
> catalogue when required for re-optimizing the Autonomy indexes (say
> monthly?), but supply an updated listing of new / modified / deleted
> records by RSS or a separate txt file posted somewhere crawlable (the
> latter of which I can do now).
>
> Anybody have any thoughts on structuring an RSS feed for this purpose (one
> that provides some generalized capabilities for other possible consumers
> of this type of feed)?
Yep, the wiki has documented this (if I understand what you want correctly)
for quite some time now:

http://evergreen-ils.org/dokuwiki/doku.php?id=backend-devel:supercat:examples#return_a_feed_of_recently_edited_or_created_records

> For example, would you see a separate feed for updated vs. modified vs.
> new records, or a single feed with some fields like "status" (updated /
> deleted / added, etc.) and the date the status changed? And how would you
> suggest we can hook into the arbitrary variable 'benchmark date' (in this
> case, the last full crawl) from which to determine the relative status
> changes like updated / deleted / added?

For the most recent 10 records in MARCXML format which have been edited
since 2010-10-01:

http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/edit/10/2010-10-01

Similarly, for the most recent 10 records in MARCXML format which have been
created since 2010-10-01:

http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/import/10/2010-10-01

Substitute "rss2" for "marcxml" and you'll have a generalized feed, albeit
with far less metadata. We're still missing a deleted feed, but that should
be pretty easy to create.

So, assuming your indexer can remember the last time it crawled the feed, it
should be able to supply that date to these feeds and gobble up the data
accordingly.

> I can see another general use case for this type of feed being relevant to
> those who contribute records to external union catalogues - giving our
> union catalogue partners an additional option for scooping up our records
> with their own automation.

One must be very optimistic to believe that the union catalogue partners
would actually adopt this approach, as welcome as it would be!
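For what it's worth, here's a rough sketch (in Python, stdlib only) of what
the indexer side could look like: build the freshmeat URL from a remembered
benchmark date, then pull the (title, link) pairs out of the rss2 variant.
The hostname and record links are placeholders, not your real catalogue, and
the feed body is an inlined sample rather than a live fetch:

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical catalogue host -- substitute your own. The path pattern is
# the SuperCat freshmeat one shown in the URLs above:
#   .../freshmeat/<format>/biblio/<edit|import>/<limit>/<since-date>
BASE = "http://catalogue.example.org/opac/extras/feed/freshmeat"

def freshmeat_url(fmt, axis, limit, since):
    """Build a freshmeat feed URL.

    fmt   -- output format, e.g. "marcxml" or "rss2"
    axis  -- "edit" for recently edited, "import" for recently created
    since -- the benchmark date (datetime.date), e.g. the last full crawl
    """
    return "%s/%s/biblio/%s/%d/%s" % (BASE, fmt, axis, limit, since.isoformat())

def parse_rss2(body):
    """Return (title, link) pairs from an RSS 2.0 feed body."""
    root = ET.fromstring(body)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

# Inlined sample standing in for a fetched rss2 feed body.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><title>Record A</title><link>http://catalogue.example.org/record/1</link></item>
  <item><title>Record B</title><link>http://catalogue.example.org/record/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    print(freshmeat_url("rss2", "edit", 10, date(2010, 10, 1)))
    for title, link in parse_rss2(SAMPLE):
        print(title, link)
```

In a real crawler you'd fetch `freshmeat_url(...)` over HTTP, then persist
today's date as the next benchmark once the records are indexed.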
