I'm not sure I completely understand your email. What do you mean by "cache"?
On the standard search results page, each result has a link to a cached copy of the page. If the page was HTML, there is no problem. If the page was binary, however, the link returns an HTTP 500 Internal Server Error. You can see this by clicking the "cached" link of any of the PDF documents in the search results on my search engine: http://ldssearch.com/search.jsp?lang=en&query=pdf
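To make the HTML-versus-binary distinction concrete, here is a minimal sketch in Python of how a cached-page view could decide what to do based on the stored Content-Type. This is purely illustrative and is not Nutch code: the names is_textual and cached_view_action, and the list of text-like types, are assumptions of mine, and I have not checked how Nutch's cached page handles this internally.

```python
# Hypothetical helper for illustration only; not part of Nutch.
# Media types treated as text-like here are an assumption.
TEXTUAL_TYPES = {"application/xhtml+xml", "application/xml"}


def is_textual(content_type):
    """Return True if a Content-Type value names a text-like format."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/") or media_type in TEXTUAL_TYPES


def cached_view_action(content_type):
    """Choose how to serve a cached copy: inline text or raw download."""
    return "render" if is_textual(content_type) else "download"


if __name__ == "__main__":
    for ct in ("text/html; charset=UTF-8", "application/pdf"):
        print(ct, "->", cached_view_action(ct))
```

With a check like this, a cached PDF would be offered as a raw download rather than being pushed through a text-rendering path.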
steven shingler wrote:
> Hi all,
>
> I'm trying to find out which filetypes Nutch will cache.
>
> For example: it does HTML, but not PDF.
>
> Is there any documentation on how different filetypes are handled?
>
> Is it possible to configure Nutch to cache PDFs etc?
>
> Any advice very gratefully received.
> Thanks,
> Steve
-- http://JacobBrunson.com
