Try increasing your http.content.limit (PDFs larger than the limit are truncated and then fail to parse), and also make sure that you haven't changed anything within the Tika mimeType mappings.
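
As a rough sketch of the http.content.limit change, assuming your overrides live in conf/nutch-site.xml and that your install still has the stock default of 65536 bytes: a negative value turns truncation off, so whole PDFs are fetched. The value -1 here is only an illustration; a large positive byte count would also work if you want to keep some cap.

```xml
<!-- conf/nutch-site.xml (assumed location of your local overrides) -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Negative value = no truncation; the stock default is 65536 bytes -->
    <value>-1</value>
  </property>
</configuration>
```

You will need to re-fetch the affected URLs after changing this, since pages already fetched were stored truncated.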

On Tue, May 22, 2012 at 9:06 AM, Tolga <[email protected]> wrote:
> Sorry, I forgot to also add my original problem. PDF files are not crawled.
> I even modified -topN to be 10.
>
>
> -------- Original Message --------
> Subject: PDF not crawled/indexed
> Date: Tue, 22 May 2012 10:48:15 +0300
> From: Tolga <[email protected]>
> To: [email protected]
>
>
>
> Hi,
>
> I am crawling my website with this command:
>
> bin/nutch crawl urls -dir crawl-$(date +%FT%H-%M-%S) -solr
> http://localhost:8983/solr/ -depth 20 -topN 5
>
> Is it a good idea to modify the directory name? Should I always delete
> indexes prior to crawling and stick to the same directory name?
>
> Regards,
>

-- Lewis

