Hi,

Looking at your website, I can see two different kinds of results:

- The first hit, for example, http://www.lds.org/newsroom/files/jeff_lindsay_DNA_3.pdf, has no summary and triggers the problem with the cache.
- The third hit belongs to the second group: these hits have a summary, and their cache link works fine.

So it looks like nutch can't access the content of the hits in the first group. Maybe the parse-pdf plugin can't handle this PDF; that can happen. It would also explain why the title of the first-group hits is the URL rather than the title stored inside the PDF document.

If I were you, I would crawl only the first hit ( http://www.lds.org/newsroom/files/jeff_lindsay_DNA_3.pdf ) and look at the log file. If parse-pdf can't handle this document, you will see a big ERROR message.

Hope it helps.

Alvaro C.

2006/9/14, Jacob Brunson <[EMAIL PROTECTED]>:
> I don't know if I completely understand your email.
> What do you mean by "cache"?

If you use the standard search results page, there is a link to a cached copy of each page. If the page was HTML, there are no problems; however, if the page was binary, the link returns an HTTP 500 internal server error. You can see this by clicking the "cached" link of any of the PDF documents in the search results on my search engine:
http://ldssearch.com/search.jsp?lang=en&query=pdf

> steven shingler wrote:
> > Hi all,
> >
> > I'm trying to find out which filetypes nutch will cache.
> >
> > For example: it does html, but not pdf.
> >
> > Is there any documentation on how different filetypes are handled?
> >
> > Is it possible to configure nutch to cache pdfs etc?
> >
> > Any advice very gratefully received.
> > Thanks,
> > Steve

--
http://JacobBrunson.com
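
As a follow-up to the suggestion above (crawl only the one PDF, then look at the log for an ERROR message), here is a minimal sketch of how the log could be filtered. It is plain Python and does not use any nutch API. The function name `find_error_lines` and the log file name used below are hypothetical, and the sketch assumes the log marks failures with the word ERROR, as described above.

```python
import re
from typing import Iterable, List


def find_error_lines(log_lines: Iterable[str], keyword: str = "") -> List[str]:
    """Return the log lines that contain the word ERROR.

    If `keyword` is given, only lines that also mention it
    (case-insensitively) are kept. An empty keyword keeps every ERROR line.
    """
    hits = []
    for line in log_lines:
        # Match ERROR as a whole word; assumes the log uses that level name.
        is_error = re.search(r"\bERROR\b", line) is not None
        if is_error and keyword.lower() in line.lower():
            hits.append(line.rstrip("\n"))
    return hits
```

With a log file from the single-URL crawl (the name `crawl.log` is only a placeholder), something like `find_error_lines(open("crawl.log"), keyword="pdf")` would list the ERROR lines that mention the PDF, which should show whether parse-pdf failed on it.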
