Hi,

You may want to take a look at https://issues.apache.org/jira/browse/NUTCH-944. There is a patch for 1.0 and 1.3. I still need to update it for 2.0 and maybe clean up the code.
It adds support for the embed tag and many more.
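
For illustration only, here is a rough standalone sketch of the general idea: find the src attribute of embed tags and resolve it against the page URL so it can be treated as a link. This is not the code from the NUTCH-944 patch and does not use any Nutch API. The class name EmbedLinkSketch, the method extractEmbedLinks and the example.com base URL are made up for the example. The regular expression is a simplification that only handles quoted attribute values.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch or of the NUTCH-944 patch.
class EmbedLinkSketch {

    // Captures the quoted value of a src attribute inside an <embed ...> tag.
    // Simplification: unquoted values are not handled.
    private static final Pattern EMBED_SRC = Pattern.compile(
            "<embed\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the absolute URLs referenced by embed tags in the given HTML.
    // Note: URI.resolve throws IllegalArgumentException on malformed values
    // (for example a src containing spaces); a real parser would handle that.
    static List<String> extractEmbedLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = EMBED_SRC.matcher(html);
        while (m.find()) {
            // group(2) is the attribute value; resolve it against the page URL.
            links.add(base.resolve(m.group(2).trim()).toString());
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<embed height=\"100%\" align=\"center\" "
                + "width=\"100%\" src=\"pdf/somepdf.pdf\">";
        // The base URL below is an assumed example page address.
        System.out.println(
                extractEmbedLinks(html, "http://example.com/docs/index.html"));
    }
}
```

In a real crawler this logic would normally sit in the HTML parser on top of the parsed DOM rather than a regular expression, so that the resulting URLs go through the same filtering and normalization as other links.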

-----Original Message----- From: Germán Biozzoli
Sent: Saturday, April 23, 2011 5:33 AM
To: [email protected]
Subject: embed tag

Hi everybody

Today I noticed that documents referenced inside an embed tag are
not crawled by my Nutch. The tag looks like this:

<embed height="100%" align="center" width="100%" src="pdf/somepdf.pdf">

I know this is not a common way to include a PDF, but is there a way to
instruct Nutch to follow embed tags and fetch this document?

Thanks & Regards
German
  • embed tag Germán Biozzoli
    • Re: embed tag Jean-Francois Gingras