Thanks Mike & Dan. This looks great.
The one possible glitch I might have: we are currently indexing all our
content, but in the future we may exclude some records from the crawl and
any subsequent indexing. So we may have to work out a way to ensure that
intended exclusions don't get introduced as new records from the feeds if
they've been edited. I'm not too worried about this, as the periodic full
re-crawls would 'reset' the scope to the intended set of records. If I
recall, there may be some shelf locations and/or call number ranges that
our cataloguers would prefer to keep out of the Autonomy crawl.

Cheers,
George

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mike Rylander
Sent: October 6, 2010 16:01
To: Evergreen Development Discussion List
Subject: Re: [OPEN-ILS-DEV] RSS Feed (SuperCat/unapi) for New / Modified /Deleted

On Wed, Oct 6, 2010 at 3:21 PM, Dan Scott <[email protected]> wrote:
> Hey George:
>
> On Wed, Oct 06, 2010 at 11:24:32AM -0400, Duimovich, George wrote:
>> Hello,
>>
>> Context: we are feeding our catalogue records to our institutional search
>> engine (Autonomy) using the dublin core XSL transformation to supply the
>> indexable record content. We first supply a list of record ids that
>> constitute the scope of records we want to supply to the search engine, but
>> next step is to automate indexing so that we're not always re-crawling the
>> entire database. The idea is that we'd re-crawl the entire catalogue when
>> required for re-optimizing the Autonomy indexes (say monthly?), but supply a
>> new / modified / deleted updated record listing by RSS or separate txt file
>> posted somewhere crawlable (the latter of which I can do now).
>>
>> Anybody have any thoughts on structuring an RSS feed for this purpose (that
>> provides some generalized capabilities for other possible consumers of
>> this type of feed)?
>
> Yep, the wiki has documented this (if I understand what you want
> correctly) for quite some time now:
> http://evergreen-ils.org/dokuwiki/doku.php?id=backend-devel:supercat:examples#return_a_feed_of_recently_edited_or_created_records
>
>> For example, would you see a separate feed for updated vs. modified vs. new
>> records or single feed with some fields like "status" (updated / deleted /
>> added, etc.) and date status changed, etc.? And how would you suggest we
>> can hook into the arbitrary variable 'benchmark date' (in this case, the
>> last full crawl) from which to determine the relative status changes like
>> updated / deleted / added etc.?
>
> For the most recent 10 records in MARCXML format which have been
> edited since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/edit/10/2010-10-01
>
> Similarly, for the most recent 10 records in MARCXML format which have
> been created since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/import/10/2010-10-01
>
> Substitute "rss2" for "marcxml" and you'll have a generalized feed,
> albeit with far less metadata.
>
> We're missing a deleted feed, though, but that should be pretty easy
> to create.
>
> So assuming your indexer can remember the last time it crawled the
> feed, it should be able to supply the date to these feeds and gobble
> up data accordingly.
>
>> I can see another general use case for this type of feed being relevant to
>> those who contribute records to external union catalogues - giving our union
>> catalogue partners an additional option for scooping up our records with
>> their own automation.
>
> One must be very optimistic that the union catalogue partners would
> actually adopt this approach, as welcome as it would be!
>

There is also an item-age axis for browse, which looks like:

http://catalogue.nrcan.gc.ca/opac/extras/browse/html/item-age/-/

That "-" near the end can be replaced by an Org Unit shortname to
scope to a specific location, and "html" can be replaced by any unAPI
format that's valid for bib records. This is a little different from
the bib feeds that Dan mentioned, but will allow you to crawl back
through item additions as well as bib edits.

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [email protected]
 | web: http://www.esilibrary.com
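
For anyone wiring an indexer up to these URLs, here is a rough sketch of how the feed and browse addresses quoted in this thread could be assembled, plus one possible way to handle the exclusion concern raised at the top. The URL layouts are copied from the examples above; the function names (freshmeat_url, item_age_url, drop_excluded) and the idea of keeping a local set of excluded record ids are assumptions made for illustration only, not part of Evergreen or SuperCat.

```python
from datetime import date

# Base address taken from the example URLs in this thread.
BASE = "http://catalogue.nrcan.gc.ca/opac/extras"


def freshmeat_url(kind, since, limit=10, fmt="marcxml", base=BASE):
    """Build a 'freshmeat' feed URL in the shape shown in the thread.

    kind  -- "edit" (recently edited) or "import" (recently created)
    since -- a datetime.date, e.g. the date of the last crawl
    fmt   -- "marcxml" or "rss2", the two formats mentioned above
    """
    if kind not in ("edit", "import"):
        raise ValueError("kind must be 'edit' or 'import'")
    return f"{base}/feed/freshmeat/{fmt}/biblio/{kind}/{limit}/{since.isoformat()}"


def item_age_url(org_unit="-", fmt="html", base=BASE):
    """Build the item-age browse URL; "-" means no Org Unit scoping."""
    return f"{base}/browse/{fmt}/item-age/{org_unit}/"


def drop_excluded(record_ids, excluded_ids):
    """Hypothetical helper: filter out record ids the cataloguers want
    kept out of the crawl, preserving the order of the remaining ids."""
    excluded = set(excluded_ids)
    return [rid for rid in record_ids if rid not in excluded]


if __name__ == "__main__":
    last_crawl = date(2010, 10, 1)
    print(freshmeat_url("edit", last_crawl))
    print(freshmeat_url("import", last_crawl, fmt="rss2"))
    print(item_age_url())
    print(drop_excluded([101, 102, 103], {102}))
```

The indexer would still need to remember its own last-crawl date and pass it in as `since`; how the excluded ids get derived from shelf locations or call number ranges is left open here.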
