Thanks Mike & Dan.

This looks great. 

The one possible glitch I can see: we are currently indexing all our 
content, but in the future we may exclude some records from the crawl and any 
subsequent indexing. So we may have to work out a way to ensure that records 
we intend to exclude don't get reintroduced as new records via the feeds when 
they've been edited. 

Not too worried about this, as the periodic full re-crawls would 'reset' the 
scope to the intended set of records. If I recall, there may be some shelf 
locations and/or call number ranges that our cataloguers would prefer to keep 
out of the Autonomy crawl.
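
As a rough sketch of how that filtering might look on our side (nothing here 
is an Evergreen or Autonomy API; the record shape, the field names and the 
example exclusion values are all hypothetical, and matching call numbers by 
prefix is only a stand-in for real range checks), each feed record could be 
checked against the exclusion rules before it is handed to the indexer:

```python
# Hypothetical exclusion rules; the real shelf locations and call number
# ranges would have to come from the cataloguers.
EXCLUDED_LOCATIONS = {"Staff Reference"}
EXCLUDED_CALL_NUMBER_PREFIXES = ("QE 185",)


def is_excluded(record):
    """Return True if a feed record falls inside the intended exclusions.

    `record` is assumed to be a dict already extracted from the feed, with
    optional "shelf_location" and "call_number" keys (hypothetical names).
    """
    if record.get("shelf_location") in EXCLUDED_LOCATIONS:
        return True
    call_number = record.get("call_number") or ""
    # Prefix matching is a simplification of a true call number range test.
    return call_number.startswith(EXCLUDED_CALL_NUMBER_PREFIXES)


def split_feed(records):
    """Split feed records into (to_index, to_skip) lists."""
    to_index, to_skip = [], []
    for record in records:
        (to_skip if is_excluded(record) else to_index).append(record)
    return to_index, to_skip
```

The skipped list is kept rather than discarded so that an edited record which 
has moved into an excluded range could also be removed from the index, if that 
turns out to be needed.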

Cheers,
George

 

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Mike 
Rylander
Sent: October 6, 2010 16:01
To: Evergreen Development Discussion List
Subject: Re: [OPEN-ILS-DEV] RSS Feed (SuperCat/unapi) for New / Modified 
/Deleted

On Wed, Oct 6, 2010 at 3:21 PM, Dan Scott <[email protected]> wrote:
> Hey George:
>
> On Wed, Oct 06, 2010 at 11:24:32AM -0400, Duimovich, George wrote:
>> Hello,
>>
>> Context: we are feeding our catalogue records to our institutional search 
>> engine (Autonomy) using the dublin core XSL transformation to supply the 
>> indexable record content. We first supply a list of record ids that 
>> constitute the scope of records we want to supply to the search engine, but 
>> next step is to automate indexing so that we're not always re-crawling the 
>> entire database. The idea is that we'd re-crawl the entire catalogue when 
>> required for re-optimizing the Autonomy indexes (say monthly?), but supply a 
>> new / modified / deleted updated record listing by RSS or separate txt file 
>> posted somewhere crawlable (the latter of which I can do now).
>>
>> Anybody have any thoughts on structuring an RSS feed for this purpose (that 
>> provides some generalized capabilities for other possible consumers of this 
>> type of feed)?
>
> Yep, the wiki has documented this (if I understand what you want
> correctly) for quite some time now:
> http://evergreen-ils.org/dokuwiki/doku.php?id=backend-devel:supercat:examples#return_a_feed_of_recently_edited_or_created_records
>
>> For example, would you see a separate feed for deleted vs. modified vs. new 
>> records or single feed with some fields like "status" (updated / deleted / 
>> added, etc.) and date status changed, etc.?  And how would you suggest we 
>> can hook into the arbitrary variable 'benchmark date' (in this case, the 
>> last full crawl) from which to determine the relative status changes like 
>> updated / deleted / added etc.?
>
> For the most recent 10 records in MARCXML format which have been 
> edited since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/edit/10/2010-10-01
>
> Similarly, for the most recent 10 records in MARCXML format which have 
> been created since 2010-10-01:
>
> http://catalogue.nrcan.gc.ca/opac/extras/feed/freshmeat/marcxml/biblio/import/10/2010-10-01
>
> Substitute "rss2" for "marcxml" and you'll have a generalized feed, 
> albeit with far less metadata.
>
> We're missing a deleted feed, though that should be pretty easy to 
> create.
>
> So assuming your indexer can remember the last time it crawled the 
> feed, it should be able to supply the date to these feeds and gobble 
> up data accordingly.
>
>> I can see another general use case for this type of feed being relevant to 
>> those who contribute records to external union catalogues - giving our union 
>> catalogue partners an additional option for scooping up our records with 
>> their own automation.
>
> One must be very optimistic that the union catalogue partners would 
> actually adopt this approach, as welcome as it would be!
>

There is also an item-age axis for browse, which looks like:

http://catalogue.nrcan.gc.ca/opac/extras/browse/html/item-age/-/

That "-" near the end can be replaced by an Org Unit shortname to scope to a 
specific location, and "html" can be replaced by any unAPI format that's valid 
for bib records.

This is a little different from the bib feeds that Dan mentioned, but will 
allow you to crawl back through item additions as well as bib edits.
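
For a crawler that remembers the date of its last run, the URL patterns in 
this thread could be assembled along these lines. This is only a sketch: the 
function names and defaults are made up for illustration, and the path layout 
is inferred from the example URLs above rather than from any formal spec.

```python
from datetime import date

# Base taken from the example URLs in this thread.
BASE = "http://catalogue.nrcan.gc.ca/opac/extras"


def feed_url(kind, since, fmt="marcxml", limit=10, base=BASE):
    """Build a freshmeat feed URL for edited or newly created bib records.

    kind  -- "edit" (edited records) or "import" (created records)
    since -- datetime.date of the last crawl
    fmt   -- feed format, e.g. "marcxml" or "rss2"
    """
    if kind not in ("edit", "import"):
        raise ValueError("kind must be 'edit' or 'import'")
    return f"{base}/feed/freshmeat/{fmt}/biblio/{kind}/{limit}/{since.isoformat()}"


def item_age_url(fmt="html", org_unit="-", base=BASE):
    """Build an item-age browse URL; "-" leaves the Org Unit unscoped."""
    return f"{base}/browse/{fmt}/item-age/{org_unit}/"


last_crawl = date(2010, 10, 1)
print(feed_url("edit", last_crawl))
print(feed_url("import", last_crawl, fmt="rss2"))
print(item_age_url())
```

The limit of 10 simply mirrors the examples; a real crawler would presumably 
want a larger number.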

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  [email protected]
 | web:  http://www.esilibrary.com
