Hello Ferran,

Indeed, the so-called BibFilter could be the way to do it. You would need to
create a Python script that takes one argument (the path to an XML file) and
writes a new file called <path-given-as-argument>.insert.xml.

Adding this script to the filter argument in the OAIHarvest admin will cause
the harvester to call it like so:

$ python your_filter.py /path/to/xml
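
As a rough sketch of that calling convention (the helper name get_paths is
made up for illustration, it is not part of Invenio), the script could work
out its input and output paths like this:

```python
def get_paths(argv):
    """Return (input_path, output_path) from the command-line arguments.

    argv is expected to look like sys.argv: the script name followed by
    exactly one argument, the path to the harvested XML file.
    """
    if len(argv) != 2:
        raise ValueError("usage: python your_filter.py /path/to/xml")
    input_path = argv[1]
    # The output file name is the input path with ".insert.xml" appended.
    return input_path, input_path + ".insert.xml"
```

So get_paths(sys.argv) for the call above would give back /path/to/xml and
/path/to/xml.insert.xml.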

For example, you can imagine the filtering script doing the following:

1. Read the given file
2. Parse all records into a list using bibrecord.create_records() (if the
file is MARCXML)
3. Iterate over this list and extract some identifier to see if the record
already exists. Alternatively, you can try calling bibupload.retrieve_rec_id()
4. Add any new records to a list
5. Finally write these records into a new file called
<path-of-given-file>.insert.xml
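
The steps above could be sketched roughly as follows. To keep it standalone,
this version uses Python's standard xml.etree.ElementTree instead of
bibrecord.create_records(), and it assumes the identifier to check is in
controlfield 001 (adapt this to whichever field holds your external
identifier). The already_exists callback is hypothetical: you would replace
it with a real lookup against your own records.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def record_identifier(record):
    """Return the text of controlfield 001 of a MARCXML record, or None."""
    for field in record.iter("{%s}controlfield" % MARC_NS):
        if field.get("tag") == "001":
            return (field.text or "").strip() or None
    return None


def filter_new_records(input_path, already_exists):
    """Write the records not known to already_exists to <input_path>.insert.xml.

    already_exists is a hypothetical callback: it takes an identifier
    (or None) and returns True if that record is already in the system.
    """
    # Serialise MARCXML elements with a default namespace, without prefixes.
    ET.register_namespace("", MARC_NS)
    tree = ET.parse(input_path)

    # Steps 2-4: go through all records and keep only the new ones.
    collection = ET.Element("{%s}collection" % MARC_NS)
    for record in tree.iter("{%s}record" % MARC_NS):
        if not already_exists(record_identifier(record)):
            collection.append(record)

    # Step 5: write the kept records next to the input file.
    output_path = input_path + ".insert.xml"
    ET.ElementTree(collection).write(
        output_path, encoding="utf-8", xml_declaration=True)
    return output_path
```

Records without a 001 field are passed to the callback as None, so it is up
to the callback whether such records count as new.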

An example:

Handle argument:
https://github.com/inspirehep/inspire/blob/master/bibharvest/bibfilter_oaiarXiv2inspire.py#L249-L300

Parse and iterate:
https://github.com/inspirehep/inspire/blob/master/bibharvest/bibfilter_oaiarXiv2inspire.py#L302-L327

Add new:
https://github.com/inspirehep/inspire/blob/master/bibharvest/bibfilter_oaiarXiv2inspire.py#L364-L369

Output result:
https://github.com/inspirehep/inspire/blob/master/bibharvest/bibfilter_oaiarXiv2inspire.py#L471-L473

Finally, make sure it runs from the command line:
https://github.com/inspirehep/inspire/blob/master/bibharvest/bibfilter_oaiarXiv2inspire.py#L485-L486

Hope this helps.

Cheers,
Jan

---
Jan Age Lavik
System Developer
INSPIRE-HEP <http://inspirehep.net>

Github: @jalavik <https://github.com/jalavik>
Work phone: +41 22 76 78682

On Wed, Mar 25, 2015 at 12:26 PM, Ferran Jorba <[email protected]> wrote:

> Hello,
>
> is there a blessed way to harvest only new records?  For some sources
> we don't want the remote records to overwrite our copy, and I'm not
> sure how to do it.  According to the OAI harvest guide, it looks like
> that BibFilter could be a way to implement it, but I don't find the
> documentation clear enough:
>
>  http://ddd.uab.cat/help/admin/oaiharvest-admin-guide#2.1
>
> We are still on 1.1.1, if it matters.
>
> Thanks,
>
> Ferran
>
