Hi,

FYI, our server RERO DOC has been successfully harvested by OAIster lately, roughly on a weekly basis (the last harvest took place on 09.12.2008). I just ran a few searches on their search interface to confirm this, and I can indeed find several of our records dating from 08.12.2008.

This might be useful: we currently check OAIster activity on our server simply by grepping the Apache log for '141.211.175.166'. This is obviously quite a fallible method, but it has been effective for more than a year now. I don't know whether they use several harvesting servers or just this one, but it might help you out for the time being.
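For anyone who wants a per-day count rather than raw grep output, here is a minimal sketch of the same check. It assumes the Apache common/combined log format; the function name, the sample lines and the /oai2d path in them are only illustrative. Unlike a plain grep, it compares the client address field exactly.

```python
import re
from collections import Counter

# Assumption: Apache common/combined log format, where a line starts with
#   <client-ip> <ident> <user> [dd/Mon/yyyy:hh:mm:ss +zzzz] "request" ...
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:\]]+):')

def hits_by_day(lines, ip="141.211.175.166"):
    """Count requests per day whose client address is exactly `ip`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("ip") == ip:
            counts[match.group("day")] += 1
    return dict(counts)

# Illustrative log lines (made up for this sketch, not taken from a real log).
SAMPLE_LOG = [
    '141.211.175.166 - - [08/Dec/2008:03:10:01 +0100] "GET /oai2d?verb=Identify HTTP/1.1" 200 512',
    '141.211.175.166 - - [09/Dec/2008:03:10:05 +0100] "GET /oai2d?verb=ListRecords HTTP/1.1" 200 9000',
    '141.211.175.166 - - [09/Dec/2008:03:10:20 +0100] "GET /oai2d?verb=ListRecords HTTP/1.1" 503 0',
    '10.0.0.7 - - [09/Dec/2008:03:11:00 +0100] "GET /index.html HTTP/1.1" 200 100',
]

if __name__ == "__main__":
    for day, count in hits_by_day(SAMPLE_LOG).items():
        print(day, count)
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG, e.g. hits_by_day(open(path)) with path pointing at your access log.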

Regards,

Miguel.


On Dec 9, 2008, at 14:11, Jerome Caffaro wrote:

Hi Ferran,

Ferran Jorba wrote:
> [..] may we ask you why the default value is
> set to 10 seconds?

I guess the default CFG_OAI_SLEEP value was set to 10 seconds more or less arbitrarily. To avoid being flooded with requests that result in potentially bandwidth- and CPU-hungry responses, a delay between requests can be necessary, depending on your hardware/network configuration (do not forget that several harvesters might try to access the OAI gateway at the same time).

> Is it safe to lower it to zero?

Yes, but you might want to monitor your server load to ensure that it can serve both regular users and requests on the OAI gateway. At CERN we kept the value of CFG_OAI_SLEEP at 10, and we never noticed any problem.

> we are having some issues when being collected by OAIster, and we
> suspect that their robot doesn't obey the Retry-After HTTP header.

I do not think that Retry-After is a problem. However, OAIster attempts to validate your repository with the "OAI Repository Explorer" <http://re.cs.uct.ac.za/>. You will notice that it fails with the version of CDS Invenio you have installed: that's because the validation tests have become much stricter than before, to better stick to OAI-PMH. These validation problems have been fixed in later versions of CDS Invenio.

So this means that you cannot yet register your repository in OAIster or other similar services that validate the repository with the "OAI Repository Explorer".
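On the Retry-After question above: the OAI-PMH specification describes flow control in terms of an HTTP 503 response carrying a Retry-After header, so a well-behaved harvester is expected to wait before repeating the request. Below is a rough sketch of what that looks like on the harvester side. It is not OAIster's code; the function names and the 10-second fallback (chosen only to mirror the default CFG_OAI_SLEEP) are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_delay(header_value, now=None, default=10.0):
    """Seconds to wait for a Retry-After value (delta-seconds or HTTP-date).

    `default` is a hypothetical fallback used when the header is missing
    or cannot be parsed.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    try:
        return max(0.0, float(int(value)))  # delta-seconds form, e.g. "10"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())

def fetch_politely(url, max_attempts=5):
    """Fetch `url`, sleeping as instructed whenever the server answers 503."""
    for _ in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise
            time.sleep(retry_delay(err.headers.get("Retry-After")))
    raise RuntimeError(f"still throttled after {max_attempts} attempts: {url}")
```

A harvester that skips the sleep and retries immediately would just keep receiving 503 responses, which is one way the symptom Ferran describes could arise.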

In addition, note that it is important to set the value of CFG_OAI_IDENTIFY_DESCRIPTION correctly: it contains data that is checked by the validator. Take care to specify the correct base URL, which must match the URL you submit (including any trailing "/"), and do not leave spaces or line breaks inside tags like "<scheme>", "<repositoryIdentifier>" etc., as is the case by default in the invenio.conf file. You seem to have configured this correctly, but let me spell it out here for others who might not be aware of these details.
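The two points above (exact base URL match, no stray whitespace inside the identifier tags) can be checked locally before submitting the repository. The following is only a sketch of such a pre-check, not what the OAI Repository Explorer actually runs: the function name, the list of checked tags and the example.org sample response are assumptions.

```python
import xml.etree.ElementTree as ET

# Tags of the oai-identifier description whose content should carry no padding.
CHECKED_TAGS = {"scheme", "repositoryIdentifier", "delimiter", "sampleIdentifier"}

def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]

def identify_problems(identify_xml, submitted_url):
    """Return a list of human-readable problems found in an Identify response."""
    problems = []
    root = ET.fromstring(identify_xml)
    for elem in root.iter():
        name = local_name(elem.tag)
        text = elem.text or ""
        # The base URL must match character for character, trailing "/" included.
        if name == "baseURL" and text != submitted_url:
            problems.append(
                f"baseURL {text!r} differs from submitted URL {submitted_url!r}")
        if name in CHECKED_TAGS and text != text.strip():
            problems.append(f"<{name}> contains surrounding whitespace: {text!r}")
    return problems

# Made-up Identify response with both mistakes, for illustration only.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <baseURL>http://example.org/oai2d</baseURL>
    <description>
      <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier">
        <scheme>
          oai
        </scheme>
        <repositoryIdentifier>example.org</repositoryIdentifier>
      </oai-identifier>
    </description>
  </Identify>
</OAI-PMH>"""

if __name__ == "__main__":
    for problem in identify_problems(SAMPLE_IDENTIFY, "http://example.org/oai2d/"):
        print(problem)
```

In practice you would feed it the body returned by your own gateway for verb=Identify, together with the exact URL you intend to submit.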

> Are you being successfully collected by OAIster?

Not sure: they seem to have only a subset of our data, but I do not know exactly how they count records, if they do selective harvesting, etc.

Best regards
--
Jerome Caffaro ** CERN Document Server ** <http://cds.cern.ch/>

