Hi,
I'm playing with Invenio's fulltext indexing and working on building an
index for the 0.5M fulltexts that we have.
Two remarks came up:
* BibIndex log output is way too verbose. The limit of 10 log files of
1MB each was reached within a few minutes. This means that we basically
lose all the logs except the latest 10MB. Would it be possible to log
only relevant information so that the logs are actually useful to
read?
* If a URL cannot be opened, an email is sent to the admin. OK, I agree
that it is nice to alert the admin, but we do have a lot of documents
with dead links (working on it) and I hate having 10,000+ emails waiting
in my inbox. Each of these emails contains this: "HTTPError: HTTP Error
404: Not Found" without the URL. (The URL is buried in the verbose
output that follows this message.) So it would be nice to have the URL
included in the title message, and it would also be nice to group the
HTTP errors and send only one or just a few emails to the admin.
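To make the grouping idea concrete, here is a rough sketch in plain
Python. HTTPErrorDigest and its methods are made-up names for
illustration, not existing Invenio code, and the subject wording is only
an assumption. The idea is to collect each failed URL during the run and
to produce a single message at the end, with the URLs listed under the
error they triggered.

```python
from collections import defaultdict


class HTTPErrorDigest:
    """Hypothetical helper: collect fetch failures, report them once."""

    def __init__(self):
        # Maps an error message to the list of URLs that triggered it.
        self._urls_by_error = defaultdict(list)

    def add(self, url, error):
        """Record one failed URL together with its error message."""
        self._urls_by_error[str(error)].append(url)

    def __len__(self):
        # Total number of failed URLs across all error types.
        return sum(len(urls) for urls in self._urls_by_error.values())

    def subject(self):
        """One-line summary suitable for the subject of a single email."""
        return "BibIndex fulltext: %d URL(s) could not be opened" % len(self)

    def body(self):
        """Failed URLs grouped under each error, one URL per line."""
        lines = []
        for error in sorted(self._urls_by_error):
            urls = self._urls_by_error[error]
            lines.append("%s (%d)" % (error, len(urls)))
            lines.extend("  " + url for url in urls)
        return "\n".join(lines)
```

The indexing task would call add() wherever it currently sends an email,
and send subject() and body() once when the task finishes (or every N
errors), so that 10,000 dead links become one email instead of 10,000.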
My guess is that we are not the only ones to have such problems and it
would be nice to turn these issues into tasks.
Benoit.