Hi Karl,

Psst: my parents told me that my first name is Yann :)
Too late to change ;)


more below

----- Original message -----
> From: "Karl Dubost" <[email protected]>
> To: [email protected]
> Cc: "Giovanni Tummarello" <[email protected]>, [email protected],
> "Kingsley Idehen"
> <[email protected]>, "[email protected]" <[email protected]>
> Sent: Sunday, 10 July 2011 13:29:10
> Subject: Re: ANN: Sudoc bibliographic ans authority data
> Hello Nicolas,
> 
> First of all, very cool. Two comments.
> 
> 
> # INITIAL INDEXING
> 
> On 9 Jul 2011, at 19:36, Yann NICOLAS wrote:
> >> quite politely e.g. 1 every 5 secs-
> > May I suggest that you crawl twice as fast?
> 
> 1 every 2.5s
> 
> On 8 Jul 2011, at 03:31, Yann NICOLAS wrote:
> > Sorry, we don't provide any dump, as the 10 000 000 files are
> > generated on the fly from
> 
> That means the crawl will take… 289 days.
> There should be an easier way to handle the initial crawl (a one-time
> dump for specific search engines), followed by incremental updates
> based on the last update, especially when there is a sitemap.


You're right. Of course.
We are already thinking about a dump.
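
For reference, the 289-day figure quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes a single sequential crawler and a fixed interval between requests, and the function name `crawl_days` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def crawl_days(n_files: int, seconds_per_request: float) -> float:
    """Days needed to fetch n_files one after another at a fixed interval."""
    return n_files * seconds_per_request / SECONDS_PER_DAY


# Figures from this thread: 10 000 000 files, 1 request every 2.5 s or 5 s.
print(round(crawl_days(10_000_000, 2.5)))  # 289
print(round(crawl_days(10_000_000, 5)))    # 579
```

At the original rate of 1 request every 5 s, the same crawl would take roughly twice as long, which is why a one-time dump looks attractive.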

> 
> 
> # CACHING
> 
> The cache policy does not seem to make good use of your HTTP resources.
> 
> % curl -sI -H "Accept:application/rdf+xml" http://www.sudoc.fr/132133520
> HTTP/1.1 200 OK
> Date: Sun, 10 Jul 2011 11:22:05 GMT
> Cache-Control: no-store
> Expires: Thu, 01 Jan 1970 00:00:00 GMT
> Cache-Control: no-cache
> Cache-Control: max-age=0
> Content-Type: application/rdf+xml;charset=UTF-8
> Content-Length: 4105
> 
> Repeating the same HTTP request a few seconds later, the date is now
> Date: Sun, 10 Jul 2011 11:24:29 GMT
> 
> These responses are not cached at all. I do not think that is a good
> idea, and headers such as Expires carry wrong information :) Maybe it
> is just because the service is starting and there are still things to
> tweak.


Thanks!
We're going to consider these issues.
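
As a rough illustration of the caching point above, the following Python sketch merges the repeated Cache-Control lines from the quoted response and checks whether a cache could reuse the response without going back to the server. It is a deliberate simplification: it looks only at `no-store`, `no-cache` and `max-age`, and ignores Expires and heuristic freshness. The function names are made up for this example.

```python
def cache_directives(headers):
    """Collect Cache-Control directives from (name, value) header pairs.

    Repeated Cache-Control lines are merged into one dictionary.
    Directives without an argument map to None.
    """
    directives = {}
    for name, value in headers:
        if name.lower() != "cache-control":
            continue
        for part in value.split(","):
            part = part.strip().lower()
            if not part:
                continue
            key, _, arg = part.partition("=")
            directives[key.strip()] = arg.strip() or None
    return directives


def reusable_without_revalidation(headers):
    """True if the response may be served from cache without revalidation.

    Simplified check (an assumption of this sketch): requires a positive
    max-age and the absence of no-store and no-cache.
    """
    d = cache_directives(headers)
    if "no-store" in d or "no-cache" in d:
        return False
    max_age = d.get("max-age")
    return max_age is not None and max_age.isdigit() and int(max_age) > 0


# Headers as shown in the curl output quoted in this thread.
sudoc_headers = [
    ("Cache-Control", "no-store"),
    ("Expires", "Thu, 01 Jan 1970 00:00:00 GMT"),
    ("Cache-Control", "no-cache"),
    ("Cache-Control", "max-age=0"),
]
print(reusable_without_revalidation(sudoc_headers))
```

With the headers from the curl output the check reports that nothing can be reused, so every request goes back to the server and regenerates the file.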

Yann



> 
> 
> --
> Karl Dubost - http://dev.opera.com/
> Developer Relations & Tools, Opera Software
