Hi Ferran,

Ferran Jorba wrote:
> [..] may we ask you why the default value is
> set to 10 seconds?

I guess the default CFG_OAI_SLEEP value of 10 seconds was chosen more or less arbitrarily. Some delay between requests can be necessary to avoid being flooded with requests whose responses are potentially bandwidth- and CPU-hungry; how much depends on your hardware and network configuration. Do not forget that several harvesters might try to access the OAI gateway at the same time.

> Is it safe to lower it to zero?

Yes, but you might want to monitor your server load to make sure that it can serve both regular users and requests to the OAI gateway. At CERN we have kept CFG_OAI_SLEEP at 10, and we have never noticed any problem.
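
For anyone writing or debugging a harvester, here is a minimal Python sketch of how a client could honour such a delay. It assumes that the gateway answers too-frequent requests with HTTP 503 and a Retry-After header, which is the flow-control mechanism OAI-PMH describes. The function name retry_delay, the DEFAULT_DELAY constant and the plain-dict headers are illustrative assumptions, not part of CDS Invenio.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical fallback (seconds) when a 503 carries no usable Retry-After.
DEFAULT_DELAY = 10.0


def retry_delay(status, headers, now=None):
    """Return how many seconds to wait before repeating a request.

    `headers` is assumed to be a plain dict keyed by "Retry-After";
    real HTTP libraries usually offer case-insensitive lookups.
    """
    if status != 503:
        return 0.0  # no throttling signalled, carry on immediately

    value = headers.get("Retry-After")
    if value is None:
        return DEFAULT_DELAY

    value = value.strip()
    if value.isdigit():
        # Retry-After given as a number of seconds
        return float(value)

    try:
        # Retry-After given as an HTTP-date
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return DEFAULT_DELAY
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)

    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A harvester would call retry_delay() after each response and sleep for the returned number of seconds before sending the same request again.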

> we are having some issues when being collected by OAIster, and we
> suspect that their robot doesn't obey the Retry-After HTTP header.

I do not think that Retry-After is the problem. However, OAIster attempts to validate your repository with the "OAI Repository Explorer" <http://re.cs.uct.ac.za/>. You will notice that validation fails with the version of CDS Invenio you have installed: that's because the validation tests have become much stricter than before, to adhere more closely to OAI-PMH. These validation problems have been fixed in later versions of CDS Invenio.

So this means that you cannot yet register your repository in OAIster or other similar services that validate the repository with the "OAI Repository Explorer".

In addition, note that it is important to set the value of CFG_OAI_IDENTIFY_DESCRIPTION correctly, since it contains data that is checked by the validator. Take care to specify the correct base URL, which must match the URL you submit (including any trailing "/"), and do not leave spaces or line breaks inside tags like "<scheme>", "<repositoryIdentifier>", etc., as is the case by default in the invenio.conf file. You seem to have configured this correctly, but let me spell it out here for others who might not be aware of these details.
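
For those who want to check their own setting before submitting, here is a rough Python sketch of the two checks described above. It is not what the validator itself runs; the function names and the example values are made up for illustration, and it assumes the description is a well-formed XML fragment with all namespace prefixes declared.

```python
import xml.etree.ElementTree as ET


def padded_tags(description_xml):
    """Return the names of leaf elements whose text starts or ends
    with whitespace (spaces, tabs or line breaks)."""
    # Wrap in a dummy root so a fragment with several top-level
    # elements still parses.
    root = ET.fromstring("<root>%s</root>" % description_xml)
    bad = []
    for elem in root.iter():
        is_leaf = len(elem) == 0
        if is_leaf and elem.text is not None and elem.text != elem.text.strip():
            # Drop a "{namespace}" prefix if the parser added one.
            bad.append(elem.tag.split("}")[-1])
    return bad


def base_url_matches(configured_url, submitted_url):
    """Exact string comparison, so a trailing "/" must match as well."""
    return configured_url == submitted_url


# Hypothetical example value, not a real repository.
sample = (
    "<description><oai-identifier>"
    "<scheme>oai</scheme>"
    "<repositoryIdentifier>\n  example.org\n</repositoryIdentifier>"
    "</oai-identifier></description>"
)
print(padded_tags(sample))
print(base_url_matches("http://example.org/oai2d", "http://example.org/oai2d/"))
```

With this sample, the first check reports "repositoryIdentifier" and the second reports a mismatch caused by the trailing "/".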

> Are you being successfully collected by OAIster?

Not sure: they seem to have only a subset of our data, but I do not know exactly how they count records, if they do selective harvesting, etc.

Best regards
--
Jerome Caffaro ** CERN Document Server ** <http://cds.cern.ch/>
