Hi,
How do I add a new parameter to the HashMap linkParams? To follow img src="image_url", I added:
linkParams.put("img", new LinkParams("img", "src", 1));
This should be:
linkParams.put("img", new LinkParams("img", "src", 0));
linkParams.put("a", new LinkParams("a", "href", 1));
linkParams.put("img", new LinkParams("img", "src", 1));
linkParams.put("area", new LinkParams("area", "href", 0));
linkParams.put("frame", new LinkParams("frame", "src", 0));
linkParams.put("iframe", new LinkParams("iframe", "src", 0));
What's the difference between 0 and 1?
0 means this element has no children, so there is no need to process sub-nodes. 1 means that at least one child node is expected.
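
To illustrate the 0/1 distinction, here is a minimal, self-contained sketch. The LinkParams class below is a hypothetical stand-in with assumed field names (elName, attrName, childLen), and expectsChildren is an invented helper; neither is copied from the Nutch source. It only shows how the third constructor argument could tell the link extractor whether to look at an element's child nodes.

```java
import java.util.HashMap;
import java.util.Map;

class LinkParamsSketch {
    // Hypothetical stand-in for the LinkParams class discussed in this thread.
    // Field names are assumptions made for illustration.
    static final class LinkParams {
        final String elName;    // element name, e.g. "img"
        final String attrName;  // attribute holding the URL, e.g. "src"
        final int childLen;     // 0 = no children expected, 1 = at least one child expected

        LinkParams(String elName, String attrName, int childLen) {
            this.elName = elName;
            this.attrName = attrName;
            this.childLen = childLen;
        }
    }

    // Builds the table from the thread, with "img" registered as a childless element.
    static Map<String, LinkParams> buildLinkParams() {
        Map<String, LinkParams> linkParams = new HashMap<>();
        linkParams.put("a", new LinkParams("a", "href", 1));
        linkParams.put("img", new LinkParams("img", "src", 0));
        linkParams.put("area", new LinkParams("area", "href", 0));
        linkParams.put("frame", new LinkParams("frame", "src", 0));
        linkParams.put("iframe", new LinkParams("iframe", "src", 0));
        return linkParams;
    }

    // Invented helper: true when sub-nodes of the element should be processed.
    static boolean expectsChildren(Map<String, LinkParams> linkParams, String element) {
        LinkParams params = linkParams.get(element);
        return params != null && params.childLen > 0;
    }
}
```

With this table, an "a" element is expected to carry child nodes (its anchor text), while "img" is treated as a leaf, which is why 0 is the right value for it.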
Also, to be able to get image src URLs, I've changed this property (in conf/nutch-default.xml):
<property>
  <name>db.ignore.internal.links</name>
  <value>false</value>
</property>
You also need to allow the image file extensions in the regex-urlfilter.txt.
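
As a rough illustration of why the URL filter matters, the sketch below assumes the filter file contains a suffix-exclusion rule that lists image extensions alongside other file types. Both patterns and the accepted helper are hypothetical and are not taken from any real regex-urlfilter.txt, so check the rules in your own copy. The point is only that an image URL is rejected until its extension is taken out of the exclusion rule.

```java
import java.util.regex.Pattern;

class SuffixFilterSketch {
    // Hypothetical exclusion rule that still lists image extensions.
    static final Pattern EXCLUDE_WITH_IMAGES =
        Pattern.compile("\\.(gif|jpg|jpeg|png|css|zip|exe)$", Pattern.CASE_INSENSITIVE);

    // The same hypothetical rule after removing the image extensions.
    static final Pattern EXCLUDE_WITHOUT_IMAGES =
        Pattern.compile("\\.(css|zip|exe)$", Pattern.CASE_INSENSITIVE);

    // A URL is accepted when the exclusion pattern does not match its suffix.
    static boolean accepted(Pattern exclude, String url) {
        return !exclude.matcher(url).find();
    }
}
```

Under these assumptions, http://example.com/logo.png is rejected by the first rule and accepted by the second, while a .zip URL is rejected by both.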
-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
