Hello Ferran,

Indeed, the so-called BibFilter could be the way to do it. You would need to create a Python script that takes one argument (the path to an XML file) and finally creates a new file called <path-given-as-argument>.insert.xml.
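
A minimal sketch of such a script, using only the Python standard library so that it also runs outside Invenio. The helper names (get_identifier, already_exists, filter_new_records) and the choice of MARC field 035 $a as the identifier are assumptions for illustration. Inside Invenio you would more likely parse with bibrecord.create_records() and replace the placeholder lookup with a real query against your own instance.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(element):
    """Return the tag name without any XML namespace prefix."""
    return element.tag.rsplit("}", 1)[-1]


def get_identifier(record):
    """Return the first 035 $a value of a MARCXML record, or None.

    Using 035 $a is an assumption; pick whichever field identifies
    the remote record in your setup.
    """
    for field in record:
        if local_name(field) == "datafield" and field.get("tag") == "035":
            for subfield in field:
                if local_name(subfield) == "subfield" and subfield.get("code") == "a":
                    return subfield.text
    return None


def already_exists(identifier):
    """Hypothetical placeholder: replace with a real lookup against
    your repository. As written, every record is treated as new."""
    return False


def filter_new_records(path, exists=already_exists):
    """Write the records not yet known to <path>.insert.xml.

    Returns the number of records written.
    """
    root = ET.parse(path).getroot()
    # Accept either a <collection> of records or a single <record> root.
    if local_name(root) == "record":
        records = [root]
    else:
        records = [el for el in root if local_name(el) == "record"]

    output = ET.Element("collection")
    for record in records:
        identifier = get_identifier(record)
        # Records without an identifier are kept (treated as new).
        if identifier is None or not exists(identifier):
            output.append(record)

    ET.ElementTree(output).write(
        path + ".insert.xml", encoding="utf-8", xml_declaration=True
    )
    return len(output)


# The harvester passes exactly one argument: the path to the harvested file.
if __name__ == "__main__" and len(sys.argv) == 2:
    filter_new_records(sys.argv[1])
```

Passing the lookup in as the exists parameter is only a convenience for trying the script out with a fake lookup before wiring it to the real database.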

Adding this script to the filter argument in the OAIHarvest admin will cause the harvester to call it like so:

    $ python your_filter.py /path/to/xml

For example, you can imagine the filtering script doing the following:

1. Read the given file.
2. Parse all records into a list using bibrecord.create_records() (if the file is MARCXML).
3. Iterate over this list and extract some identifier to see if the record already exists. Alternatively, you can try calling bibupload.retrieve_rec_id().
4. Add any new records to a list.
5. Finally, write these records into a new file called <path-of-given-file>.insert.xml.

An example:

Handle argument:
https://github.com/inspirehep/inspire/blob/master/bibharvest/bibfilter_oaiarXiv2inspire.py#L249-L300

Parse and iterate:
https://github.com/inspirehep/inspire/blob/master/bibharvest/bibfilter_oaiarXiv2inspire.py#L302-L327

Add new:
https://github.com/inspirehep/inspire/blob/master/bibharvest/bibfilter_oaiarXiv2inspire.py#L364-L369

Output result:
https://github.com/inspirehep/inspire/blob/master/bibharvest/bibfilter_oaiarXiv2inspire.py#L471-L473

And make sure it runs from the command line:
https://github.com/inspirehep/inspire/blob/master/bibharvest/bibfilter_oaiarXiv2inspire.py#L485-L486

Hope this helps.

Cheers,
Jan

---
Jan Age Lavik
System Developer
INSPIRE-HEP <http://inspirehep.net>
Github: @jalavik <https://github.com/jalavik>
Work phone: +41 22 76 78682

On Wed, Mar 25, 2015 at 12:26 PM, Ferran Jorba <[email protected]> wrote:
> Hello,
>
> is there a blessed way to harvest only new records? For some sources
> we don't want the remote records to overwrite our copy, and I'm not
> sure how to do it. According to the OAI harvest guide, it looks like
> BibFilter could be a way to implement it, but I don't find the
> documentation clear enough:
>
> http://ddd.uab.cat/help/admin/oaiharvest-admin-guide#2.1
>
> We are still on 1.1.1, if it matters.
>
> Thanks,
>
> Ferran

