Hello Henning,

>>> If it's only about serving the bibliographic data I'd go
>>> for OAI-PMH as it makes a smaller footprint and scales
>>> way better than dumping. Just set your master to expose
>>> the collection as OAI-PMH and harvest it from your slave
>>> periodically. Depending on the nature of your project
>>> you'd most likely want to have the OAI server anyway for
>>> visibility reasons.
>>
>> Thank you for the idea; since I'm not familiar with OAI,
>> I didn't give consideration to that possibility.
>
> Usually, it should offer the most flexibility, as it works through
> the HTTP protocol and all the stuff is quite high-level. Say you
> move on and drop MySQL on the way to the next-next version;
> OAI would still work. We even migrated one of our instances from
> a proprietary solution to MARC using OAI exports.
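Just to complement Alexander's suggestion before I give my own take: if you do end up going the OAI route, the harvesting side need not be complicated. A rough, untested sketch (it assumes the master exposes the standard ListRecords verb at /oai2d, Invenio's default OAI endpoint, with the marcxml metadata prefix; the base URL is made up):

import urllib2
import xml.etree.ElementTree as ET

OAI = 'http://master.example.org/oai2d'  # made-up base URL
NS = '{http://www.openarchives.org/OAI/2.0/}'

def harvest(from_date):
    """Yield OAI <record> elements modified since from_date (YYYY-MM-DD).

    The first request uses ListRecords with a 'from' date; follow-up
    pages are fetched with the resumptionToken the server hands back.
    """
    url = (OAI + '?verb=ListRecords&metadataPrefix=marcxml'
           '&from=' + from_date)
    while url:
        tree = ET.parse(urllib2.urlopen(url))
        for rec in tree.iter(NS + 'record'):
            yield rec
        token = tree.find('.//' + NS + 'resumptionToken')
        if token is not None and token.text:
            # Note: a resumptionToken request must not repeat
            # the metadataPrefix parameter.
            url = (OAI + '?verb=ListRecords&resumptionToken='
                   + token.text)
        else:
            url = None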
I do think, like Alexander, that it is better to go via dumping and reloading MARC records than via lower-level MySQL tools. However, OAI has been, at least for me, somewhat involved, especially when it comes to understanding the corner cases and their consequences. Moreover, in your case, where there is exactly one master and one slave (or maybe, in the future, more than one slave), you may find simpler solutions easier to understand and use.

The Invenio search API (http://invenio-demo.cern.ch/help/hacking/search-engine-api) is great in many respects: it offers many possibilities and it mirrors the URL search parameters. Among other things, it allows you to get the new and modified records from a database.

At UAB I do a daily dump of the bibliographic records to an SQLite database, for offline batch postprocessing. For this, I just get the new or modified records and update the database. I have made a minimal modification to the script, as it used some internal stuff, but it passes a quick test. You may find it useful as an example of how to extract the new, modified and deleted records from your master site and upload them (via bibupload -ri) to the slave site(s). Excuse that it may not be as polished as it should be but, again, I use it for internal purposes.

I understand that in newer Invenio releases it is possible to bibupload records in text MARC (tm) format. In the version of the script I attach I haven't been able to test that, as in 1.1.1 it cannot be done, so in the print_record call I use the 'xm' format; you can probably change it to 'tm'.

Hope it helps,

Ferran
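PS. Before the script itself, a quick illustration of what "mirrors the URL search parameters" means. A web query such as

  http://your-master/search?p=ellis&f=author&of=id

corresponds, roughly, to the following call (my own illustration, not taken from the script):

  from invenio.search_engine import perform_request_search
  recids = perform_request_search(p='ellis', f='author')

and the modified-since lookup the script relies on is just the dt='m' plus d1y/d1m/d1d variant of the same call.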
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Time-stamp: <2013.11.25 09:36:56 marcdump.py [email protected]>

from __future__ import print_function, division

import os
import sys
import time
import sqlite3

sys.path.append(os.path.expanduser('~/lib/python/'))

from invenio.search_engine import perform_request_search, \
     search_pattern, print_record

max_records = 0


def create_db(dbname):
    """Create the cache database with a single records table."""
    sql = '''
    create table records (
        recid integer primary key,
        record varchar
    );'''
    db = sqlite3.connect(dbname)
    db.execute(sql)
    db.close()


def update_db(db, since):
    """Refresh the cache with records modified since 'since' (epoch secs)."""
    sql = '''replace into records values (?, ?);'''
    gmtime = time.gmtime(since)
    (year, month, day) = gmtime[:3]
    # Records modified (dt='m') on or after the given date.
    recids = perform_request_search(dt='m', d1y=year, d1m=month, d1d=day)
    recids.reverse()
    n = 0
    for recid in recids:
        n += 1
        if max_records and n > max_records:
            break
        marc = print_record(recid, 'xm')
        time.sleep(0.001)
        values = (recid, unicode(marc, 'utf-8'))
        db.execute(sql, values)
    db.commit()

    # Records flagged as deleted are stored as minimal MARCXML stubs,
    # so that bibupload can mark them deleted on the slave as well.
    deleted_record_fmt = '''
    <record>
      <controlfield tag="001">%(recid)s</controlfield>
      <datafield tag="980" ind1=" " ind2=" ">
        <subfield code="c">DELETED</subfield>
      </datafield>
    </record>'''
    deleted_recids = search_pattern(p='deleted', f='980').tolist()
    for recid in deleted_recids:
        marc = deleted_record_fmt % {'recid': recid}
        values = (recid, unicode(marc, 'utf-8'))
        db.execute(sql, values)
    db.commit()


def dump_db(db):
    """Print every cached record as MARCXML on stdout."""
    sql = '''
    select recid, record
    from records
    order by recid desc;'''
    cursor = db.cursor()
    for row in cursor.execute(sql):
        print(row[1].encode('utf-8'))
        print()


def main():
    if len(sys.argv) < 2:
        print('usage: %s database.db' % (sys.argv[0]))
        sys.exit(1)
    dbname = sys.argv[1]
    if os.path.isfile(dbname):
        mtime = os.path.getmtime(dbname)
    else:
        create_db(dbname)
        mtime = 0
    db = sqlite3.connect(dbname)
    update_db(db, mtime)
    # TODO: split in two operations depending on params
    dump_db(db)
    db.close()


if __name__ == '__main__':
    main()
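PPS. In case it helps, this is roughly how one could wire the script to the slave (an untested sketch; /opt/invenio is Invenio's default prefix, so adjust paths and file names to your installation):

  # on the master, e.g. from a daily cron job:
  python marcdump.py bibdump.db > bibdump.xml

  # copy bibdump.xml to the slave, then load it there:
  /opt/invenio/bin/bibupload -ri bibdump.xml

Note that, as the TODO in main() says, the script currently dumps the whole cache every time, so each upload inserts or replaces every record; splitting the update and dump steps would let you ship only the records changed since the last run.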

