> I don't know if I understand completely your email.
> What you mean with "cache"?
So if you go with the standard search results page, there is a link to a cached copy of each page. If the page was HTML, there are no problems; however, if the page was binary, the link returns an HTTP 500 Internal Server Error. You can see this by clicking the "cached" link of any of the PDF documents in the search results on my search engine:

http://ldssearch.com/search.jsp?lang=en&query=pdf

>
> steven shingler wrote:
> > Hi all,
> >
> > I'm trying to find out which filetypes nutch will cache.
> >
> > for example: it does html, but not pdf.
> >
> > Is there any documentation on how different filetypes are handled?
> >
> > Is it possible to configure nutch to cache pdfs etc?
> >
> > Any advice very gratefully received.
> > Thanks,
> > Steve

--
http://JacobBrunson.com

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
