Hi Ferran,
Ferran Jorba wrote:
> [..] may we ask you why the default value is
> set to 10 seconds?
I guess the default CFG_OAI_SLEEP value of 10 seconds was chosen more or
less arbitrarily. To avoid being flooded with requests that can result in
bandwidth- and CPU-hungry responses, a delay between requests may be
necessary, depending on your hardware and network configuration (do not
forget that several harvesters might try to access the OAI gateway at
the same time).
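To make the idea concrete, here is a minimal Python sketch of a
CFG_OAI_SLEEP-style delay. It is not the CDS Invenio implementation: the
OAIThrottle class and its check() method are hypothetical, and the sketch
assumes a single global gate shared by all harvesters, which answers
requests arriving too early with HTTP 503 and a Retry-After value in
whole seconds (the flow-control mechanism OAI-PMH allows). With a delay
of 0, every request is served.

```python
import math
import time


class OAIThrottle:
    """Hypothetical CFG_OAI_SLEEP-style gate: serve at most one request every `sleep_seconds`."""

    def __init__(self, sleep_seconds: float = 10.0, clock=time.monotonic):
        self.sleep_seconds = sleep_seconds
        self.clock = clock          # injectable so the behaviour can be tested without sleeping
        self._last_served = None    # time of the last request that was actually served

    def check(self):
        """Return (200, None) to serve the request, or (503, retry_after_seconds) to defer it."""
        now = self.clock()
        if self._last_served is not None:
            elapsed = now - self._last_served
            if elapsed < self.sleep_seconds:
                # Retry-After takes whole seconds, so round the remaining wait up.
                return 503, math.ceil(self.sleep_seconds - elapsed)
        self._last_served = now
        return 200, None


# Usage with a fake clock instead of real time.
fake_now = [0.0]
throttle = OAIThrottle(10, clock=lambda: fake_now[0])
print(throttle.check())  # (200, None)
fake_now[0] = 3.0
print(throttle.check())  # (503, 7)
fake_now[0] = 10.0
print(throttle.check())  # (200, None)
```

Because the gate in this sketch is global, one harvester's request delays
all the others; keying the timestamp on the client address would be an
alternative design.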
> Is it safe to lower it to zero?
Yes, but you might want to monitor your server load to ensure that it
can serve both regular users and requests to the OAI gateway. At CERN we
kept CFG_OAI_SLEEP at 10, and we never noticed any problem.
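If you do lower CFG_OAI_SLEEP, one simple way to keep an eye on the load
from Python is the standard os.getloadavg() call (Unix only). This is an
illustrative assumption, not part of Invenio: the server_looks_busy()
helper and its threshold of 1.0 per CPU are arbitrary choices.

```python
import os


def per_cpu_load(load_1min: float, cpu_count: int) -> float:
    """Normalise the 1-minute load average by the number of CPUs."""
    return load_1min / max(cpu_count, 1)


def server_looks_busy(threshold: float = 1.0) -> bool:
    """Hypothetical check: True if the 1-minute load average per CPU exceeds `threshold`."""
    # os.getloadavg() exists only on Unix-like systems and raises OSError if unavailable.
    load_1min, _load_5min, _load_15min = os.getloadavg()
    return per_cpu_load(load_1min, os.cpu_count() or 1) > threshold


if hasattr(os, "getloadavg"):
    print("busy" if server_looks_busy() else "ok")
```

Running such a check periodically while harvesters are active would show
whether a zero delay is sustainable on your hardware.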
> we are having some issues when being collected by OAIster, and we
> suspect that their robot doesn't obey the Retry-After HTTP header.
I do not think that Retry-After is the problem. However, OAIster attempts
to validate your repository with the "OAI Repository Explorer"
<http://re.cs.uct.ac.za/>. You will notice that validation fails with the
version of CDS Invenio you have installed: the validation tests have
become much stricter than before, to adhere more closely to the OAI-PMH
specification. These validation problems have been fixed in later
versions of CDS Invenio.
This means that, with your current version, you cannot register your
repository with OAIster or other similar services that validate
repositories with the "OAI Repository Explorer".
In addition, note that it is important to set CFG_OAI_IDENTIFY_DESCRIPTION
correctly, since it contains data that the validator checks. Make sure you
specify the correct base URL, which must exactly match the URL you submit
(a trailing "/" must match too), and do not leave spaces or line breaks
inside tags such as "<scheme>" and "<repositoryIdentifier>", as is the
case by default in the invenio.conf file. You seem to have configured
this correctly, but I mention it here for others who might not be aware
of these details.
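Both pitfalls can be checked locally before submitting to the validator.
The following Python sketch is hypothetical: the helper names, the
example.org base URL and the sample XML are made up for illustration and
are not taken from invenio.conf. It reports leaf elements whose text has
surrounding whitespace, and compares the configured base URL with the
submitted one character for character, so a trailing "/" counts.

```python
import xml.etree.ElementTree as ET


def whitespace_problems(description_xml: str) -> list[str]:
    """Return the local names of leaf elements whose text has leading or trailing whitespace."""
    root = ET.fromstring(description_xml)
    problems = []
    for elem in root.iter():
        # Only leaf elements carry values such as <scheme>oai</scheme>.
        if len(elem) == 0 and elem.text is not None and elem.text != elem.text.strip():
            # Drop the "{namespace}" prefix that ElementTree adds to tag names.
            problems.append(elem.tag.rsplit("}", 1)[-1])
    return problems


def base_url_matches(configured: str, submitted: str) -> bool:
    """Exact comparison: ".../oai" and ".../oai/" are different URLs."""
    return configured == submitted


# Hypothetical sample descriptions (not copied from invenio.conf).
NS = "http://www.openarchives.org/OAI/2.0/oai-identifier"
GOOD = (
    f'<oai-identifier xmlns="{NS}">'
    "<scheme>oai</scheme>"
    "<repositoryIdentifier>example.org</repositoryIdentifier>"
    "</oai-identifier>"
)
BAD = (
    f'<oai-identifier xmlns="{NS}">'
    "<scheme>\n    oai\n  </scheme>"
    "<repositoryIdentifier>example.org</repositoryIdentifier>"
    "</oai-identifier>"
)

print(whitespace_problems(GOOD))  # []
print(whitespace_problems(BAD))   # ['scheme']
print(base_url_matches("http://example.org/oai", "http://example.org/oai/"))  # False
```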
> Are you being successfully collected by OAIster?
Not sure: they seem to have only a subset of our data, but I do not know
exactly how they count records, if they do selective harvesting, etc.
Best regards
--
Jerome Caffaro ** CERN Document Server ** <http://cds.cern.ch/>