Re: PDF not crawled/indexed

Lewis John Mcgibbney Tue, 22 May 2012 02:14:48 -0700

Try checking your http.content.limit, and also make sure that you
haven't changed anything in the Tika mimeType mappings.
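
For reference, a minimal sketch of overriding the limit in
conf/nutch-site.xml, assuming a stock Nutch 1.x setup where
http.content.limit defaults to 65536 bytes. Content longer than the
limit is truncated, and a truncated PDF will usually fail to parse.
The value -1 below is only an illustration (it turns truncation off);
a large positive byte count would also work.

```xml
<!-- conf/nutch-site.xml: overrides values from nutch-default.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Assumption: -1 disables truncation; the default is 65536 bytes -->
    <value>-1</value>
  </property>
</configuration>
```

If the PDFs are fetched over file:// rather than http://, the
corresponding property would be file.content.limit.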


On Tue, May 22, 2012 at 9:06 AM, Tolga <[email protected]> wrote:
> Sorry, I forgot to also add my original problem. PDF files are not crawled.
> I even modified -topN to be 10.
>
>
> -------- Original Message --------
> Subject:        PDF not crawled/indexed
> Date:   Tue, 22 May 2012 10:48:15 +0300
> From:   Tolga <[email protected]>
> To:     [email protected]
>
>
>
> Hi,
>
> I am crawling my website with this command:
>
> bin/nutch crawl urls -dir crawl-$(date +%FT%H-%M-%S) \
>     -solr http://localhost:8983/solr/ -depth 20 -topN 5
>
> Is it a good idea to modify the directory name? Should I always delete
> indexes prior to crawling and stick to the same directory name?
>
> Regards,
>



-- 
Lewis
