On Wed, Oct 6, 2010 at 3:21 PM, Dan Scott <[email protected]> wrote:
> Hey George:
>
> On Wed, Oct 06, 2010 at 11:24:32AM -0400, Duimovich, George wrote:
>> Hello,
>>
>> Context: we are feeding our catalogue records to our institutional search 
>> engine (Autonomy) using the dublin core XSL transformation to supply the 
>> indexable record content. We first supply a list of record ids that 
>> constitute the scope of records we want to supply to the search engine, but 
>> the next step is to automate indexing so that we're not always re-crawling 
>> entire database. The idea is that we'd re-crawl the entire catalogue when 
>> required for re-optimizing the Autonomy indexes (say monthly?), but supply 
>> an updated listing of new / modified / deleted records by RSS or a separate 
>> txt file posted somewhere crawlable (the latter of which I can do now).
>>
>> Anybody have any thoughts on structuring an RSS feed for this purpose (that 
>> provides some generalized capabilities for other possible consumers of this 
>> type of feed)?
>
> Yep, the wiki has documented this (if I understand what you want
> correctly) for quite some time now:
> http://evergreen-ils.org/dokuwiki/doku.php?id=backend-devel:supercat:examples#return_a_feed_of_recently_edited_or_created_records
>
>> For example, would you see separate feeds for updated vs. modified vs. new 
>> records, or a single feed with fields like "status" (updated / deleted / 
>> added, etc.) and the date the status changed?  And how would you suggest we 
>> hook into an arbitrary 'benchmark date' (in this case, the date of the last 
>> full crawl) from which to determine the relative status changes like 
>> updated / deleted / added?
>
> For the most recent 10 records in MARCXML format which have been edited
> since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/edit/10/2010-10-01
>
> Similarly, for the most recent 10 records in MARCXML format which have
> been created since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/import/10/2010-10-01
>
> Substitute "rss2" for "marcxml" and you'll have a generalized feed,
> albeit with far less metadata.
>
> We're missing a deleted-records feed, but that should be pretty easy to
> create.
>
> So assuming your indexer can remember the last time it crawled the feed,
> it should be able to supply the date to these feeds and gobble up data
> accordingly.
>
>> I can see another general use case for this type of feed being relevant to 
>> those who contribute records to external union catalogues - giving our union 
>> catalogue partners an additional option for scooping up our records with 
>> their own automation.
>
> One would have to be quite optimistic to expect that the union catalogue
> partners would actually adopt this approach, as welcome as it would be!
>

There is also an item-age axis for browse, which looks like:

http://catalogue.nrcan.gc.ca/opac/extras/browse/html/item-age/-/

That "-" near the end can be replaced by an Org Unit shortname to
scope to a specific location, and "html" can be replaced by any unAPI
format that's valid for bib records.

This is a little different from the bib feeds that Dan mentioned, but
will allow you to crawl back through item additions as well as bib
edits.
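In the same spirit as the sketch above, the item-age URL can be built like this. The org-unit shortname "BR1" is purely illustrative; the "-" placeholder and the substitution rules are as described:

```python
# Sketch of the item-age browse axis URL:
#   .../browse/<format>/item-age/<org-unit-or-dash>/
BASE = "http://catalogue.nrcan.gc.ca/opac/extras/browse"

def item_age_url(fmt="html", org_unit=None):
    """fmt can be 'html' or any unAPI format valid for bib records;
    org_unit is an Org Unit shortname, or None to leave the "-"
    placeholder (i.e. no location scoping)."""
    return f"{BASE}/{fmt}/item-age/{org_unit or '-'}/"
```

So `item_age_url()` gives the system-wide HTML browse, and something like `item_age_url("marcxml", "BR1")` would scope MARCXML output to a hypothetical "BR1" org unit.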

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  [email protected]
 | web:  http://www.esilibrary.com
