On Wed, Oct 6, 2010 at 3:21 PM, Dan Scott <[email protected]> wrote:
> Hey George:
>
> On Wed, Oct 06, 2010 at 11:24:32AM -0400, Duimovich, George wrote:
>> Hello,
>>
>> Context: we are feeding our catalogue records to our institutional search
>> engine (Autonomy) using the Dublin Core XSL transformation to supply the
>> indexable record content. We first supply a list of record IDs that
>> constitute the scope of records we want to supply to the search engine, but
>> the next step is to automate indexing so that we're not always re-crawling
>> the entire database. The idea is that we'd re-crawl the entire catalogue
>> when required for re-optimizing the Autonomy indexes (say, monthly?), but
>> supply an updated listing of new / modified / deleted records by RSS or a
>> separate txt file posted somewhere crawlable (the latter of which I can do
>> now).
>>
>> Anybody have any thoughts on structuring an RSS feed for this purpose (one
>> that provides some generalized capabilities for other possible consumers
>> of this type of feed)?
>
> Yep, the wiki has documented this (if I understand what you want
> correctly) for quite some time now:
> http://evergreen-ils.org/dokuwiki/doku.php?id=backend-devel:supercat:examples#return_a_feed_of_recently_edited_or_created_records
>
>> For example, would you see a separate feed for updated vs. modified vs.
>> new records, or a single feed with some fields like "status" (updated /
>> deleted / added, etc.) and the date the status changed? And how would you
>> suggest we hook into the arbitrary 'benchmark date' (in this case, the
>> last full crawl) from which to determine relative status changes like
>> updated / deleted / added?
>
> For the most recent 10 records in MARCXML format which have been edited
> since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/edit/10/2010-10-01
>
> Similarly, for the most recent 10 records in MARCXML format which have
> been created since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/import/10/2010-10-01
>
> Substitute "rss2" for "marcxml" and you'll have a generalized feed,
> albeit with far less metadata.
>
> We're still missing a deleted feed, though that should be pretty easy to
> create.
>
> So, assuming your indexer can remember the last time it crawled the feed,
> it should be able to supply that date to these feeds and gobble up data
> accordingly.
>
>> I can see another general use case for this type of feed being relevant
>> to those who contribute records to external union catalogues, giving our
>> union catalogue partners an additional option for scooping up our records
>> with their own automation.
>
> One must be very optimistic that the union catalogue partners would
> actually adopt this approach, as welcome as it would be!
>
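For what it's worth, a crawler can assemble those feed URLs from the remembered benchmark date with a few lines of code. This is only a sketch, not part of Evergreen itself; the helper name `freshmeat_url` is made up here, and the URL pattern simply follows Dan's examples above:

```python
from datetime import date

# Base of the supercat "freshmeat" feeds, per the examples above.
BASE = "http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat"

def freshmeat_url(fmt, axis, limit, since):
    """Build a freshmeat feed URL.

    fmt:   output format, e.g. 'marcxml' or 'rss2'
    axis:  'edit' for recently edited, 'import' for recently created
    limit: maximum number of records to return
    since: benchmark date (date of the last full crawl)
    """
    return f"{BASE}/{fmt}/biblio/{axis}/{limit}/{since.isoformat()}"

# The 10 most recently edited records since 2010-10-01, in MARCXML:
url = freshmeat_url("marcxml", "edit", 10, date(2010, 10, 1))
```

The indexer would store the date of each successful crawl, pass it as `since` on the next run, and fetch both the `edit` and `import` axes to pick up modified and new records.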
There is also an item-age axis for browse, which looks like:

http://catalogue.nrcan.gc.ca/opac/extras/browse/html/item-age/-/

That "-" near the end can be replaced by an Org Unit shortname to scope to a
specific location, and "html" can be replaced by any unAPI format that's
valid for bib records.

This is a little different from the bib feeds that Dan mentioned, but it
will allow you to crawl back through item additions as well as bib edits.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [email protected]
 | web: http://www.esilibrary.com
