Hi,
You may want to take a look at
https://issues.apache.org/jira/browse/NUTCH-944
There is a patch for 1.0 and 1.3. I need to update it for 2.0 and maybe do
some code cleanup.
It adds support for the embed tag and many more.
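To illustrate the general idea only (this is not the code from the NUTCH-944 patch and it does not use any Nutch API), here is a minimal standalone sketch of pulling the src of embed tags out of a page and resolving it against the page URL. The class name EmbedLinkSketch, the method extractEmbedSrc and the example.com base URL are hypothetical, and a regex is used only to keep the sketch short; a real parser plugin would work on the parsed DOM.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collects the targets of <embed src="..."> tags.
public class EmbedLinkSketch {
    // Matches an <embed ...> tag and captures the quoted value of its src attribute.
    private static final Pattern EMBED_SRC = Pattern.compile(
        "<embed\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the absolute URLs referenced by embed tags in the given HTML.
    public static List<String> extractEmbedSrc(String html, String baseUrl) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = EMBED_SRC.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative paths such as "pdf/somepdf.pdf" against the page URL.
                links.add(base.resolve(m.group(2).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip src values that are not valid URIs.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<embed height=\"100%\" align=\"center\" width=\"100%\" src=\"pdf/somepdf.pdf\">";
        // The base URL is an assumption made for this example.
        System.out.println(extractEmbedSrc(html, "http://example.com/docs/index.html"));
    }
}
```

Each URL returned this way would then have to be handed to the crawler as an outlink so that the PDF gets fetched and parsed like any other linked document.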
-----Original Message-----
From: Germán Biozzoli
Sent: Saturday, April 23, 2011 5:33 AM
To: [email protected]
Subject: embed tag
Hi everybody
Today I noticed that documents referenced inside an embed tag are
not crawled by my Nutch. The tag looks like this:
<embed height="100%" align="center" width="100%" src="pdf/somepdf.pdf">
I know this is not a common way to include a PDF, but is there a way to
instruct Nutch to follow embed tags and fetch this document?
Thanks & Regards
German