Hi Gong Zhao,

Make sure you have the parse-pdf plugin enabled in your nutch-site.xml file,
for example:
<property>
  <name>plugin.includes</name>
  <value>...|parse-(xml|text|html|js|pdf)|...</value>
  <description>
  </description>
</property>
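
If it helps, here is a rough way to check that setting from a script. This is only a sketch: the helper name plugin_enabled and the sample value are made up for illustration (your real plugin.includes value will differ), and it assumes the value is treated as a regular expression that must match the whole plugin id, which is my understanding of how Nutch applies it.

```python
import re
import xml.etree.ElementTree as ET


def plugin_enabled(site_xml: str, plugin_id: str = "parse-pdf") -> bool:
    """Return True if plugin_id matches the plugin.includes regex in site_xml.

    Hypothetical helper, not part of Nutch. Assumes the regex has to match
    the entire plugin id, and that Python and Java regex syntax agree for
    the simple alternation patterns normally used in this property.
    """
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "plugin.includes":
            pattern = prop.findtext("value", "").strip()
            return re.fullmatch(pattern, plugin_id) is not None
    # Property is not set in this file, so this file does not enable
    # the plugin (the defaults in nutch-default.xml would apply instead).
    return False


# Illustrative configuration only; the plugin list is an assumption.
sample = """<configuration>
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(xml|text|html|js|pdf)|index-basic</value>
</property>
</configuration>"""

print(plugin_enabled(sample))
```

To run it against your own file, read conf/nutch-site.xml into a string and pass that in place of sample.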

That's the only thing I can think of at first glance.

Patrick
-----Original Message-----
From: 宫照 [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 23, 2008 11:27 PM
To: [email protected]
Subject: nutch fetched but no indexed

Hi everybody,

I have run into a problem using Nutch to crawl our intranet. It worked
well before, but recently I added some URLs to crawl. These URLs are
different from the usual ones. The new URLs look like this:
http://compass.mydomain.com/go/247460034

There are many folders and documents under this URL, such as this folder:
http://compass.mot.com/go/247460034/2354342276
documents:
http://compass.mot.com/go/247460034/mydoc.pdf

After the crawl, the documents under this kind of URL cannot be found by search.
I checked the log and found that URLs of this kind are fetched, but
they are not indexed.

I don't know why. Can you tell me what to do?

regards,

Gong Zhao
