Marco PV wrote:
Hi,

How do I add a new parameter to the HashMap linkParams
so that img src="image_url" links are followed?

linkParams.put("img", new LinkParams("img", "src", 1));

This should be:

linkParams.put("img", new LinkParams("img", "src", 0));


linkParams.put("a", new LinkParams("a", "href", 1));
linkParams.put("img", new LinkParams("img", "src", 1));
linkParams.put("area", new LinkParams("area", "href", 0));
linkParams.put("frame", new LinkParams("frame", "src", 0));
linkParams.put("iframe", new LinkParams("iframe", "src", 0));
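For illustration only, here is a minimal, self-contained sketch of a registry like linkParams, written with Java generics and with the img entry using 0 as suggested earlier in the thread. The LinkParams class, the LinkParamsRegistrySketch class and the linkTarget() helper are hypothetical stand-ins, not Nutch's actual code; the sketch only shows how an element name maps to the attribute that carries the URL.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LinkParamsRegistrySketch {

    // Hypothetical stand-in for LinkParams: element name, attribute that
    // holds the URL, and the expected child count (0 or 1).
    static final class LinkParams {
        final String elName;
        final String attrName;
        final int childLen;

        LinkParams(String elName, String attrName, int childLen) {
            this.elName = elName;
            this.attrName = attrName;
            this.childLen = childLen;
        }
    }

    // Registry keyed by lower-case element name, mirroring the put(...) calls.
    static final Map<String, LinkParams> LINK_PARAMS = new HashMap<>();

    static {
        LINK_PARAMS.put("a", new LinkParams("a", "href", 1));
        LINK_PARAMS.put("img", new LinkParams("img", "src", 0)); // img has no children
        LINK_PARAMS.put("area", new LinkParams("area", "href", 0));
        LINK_PARAMS.put("frame", new LinkParams("frame", "src", 0));
        LINK_PARAMS.put("iframe", new LinkParams("iframe", "src", 0));
    }

    // Hypothetical helper: returns the URL-bearing attribute value for an
    // element, or null if the element is not registered or lacks the attribute.
    static String linkTarget(String elementName, Map<String, String> attributes) {
        LinkParams params = LINK_PARAMS.get(elementName.toLowerCase(Locale.ROOT));
        if (params == null) {
            return null;
        }
        return attributes.get(params.attrName);
    }

    public static void main(String[] args) {
        Map<String, String> imgAttrs = new HashMap<>();
        imgAttrs.put("src", "image_url");
        System.out.println(linkTarget("IMG", imgAttrs)); // prints image_url
    }
}
```

Adding support for another tag in this sketch is a single put(...) line naming the element, its URL attribute and its expected child count.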

What's the difference between 0 and 1?

0 means this element has no children, so there is no need to process sub-nodes. 1 means that at least one child node is expected.
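A rough sketch of the rule described above, assuming it reduces to a simple check against the element's actual number of child nodes. The class ChildLenRuleSketch and the method acceptsElement() are hypothetical; Nutch's real link-extraction logic may be more involved. The sketch shows why img needs 0: an img element never has children, so a rule that expects at least one child would never accept it.

```java
public class ChildLenRuleSketch {

    // Hypothetical stand-in for LinkParams (element, URL attribute, childLen).
    static final class LinkParams {
        final String elName;
        final String attrName;
        final int childLen;

        LinkParams(String elName, String attrName, int childLen) {
            this.elName = elName;
            this.attrName = attrName;
            this.childLen = childLen;
        }
    }

    // childLen == 0: the element is not expected to have children, so
    // sub-nodes are not inspected and the element is accepted as-is.
    // childLen == 1: at least one child node is expected, so an element
    // with no children is skipped.
    static boolean acceptsElement(LinkParams params, int actualChildCount) {
        if (params.childLen == 0) {
            return true;
        }
        return actualChildCount >= params.childLen;
    }

    public static void main(String[] args) {
        // <img src="..."> has no child nodes.
        LinkParams imgWrong = new LinkParams("img", "src", 1);
        LinkParams imgRight = new LinkParams("img", "src", 0);
        System.out.println(acceptsElement(imgWrong, 0)); // false: image link dropped
        System.out.println(acceptsElement(imgRight, 0)); // true: image link kept
    }
}
```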



Also, to be able to get image src URLs, I've changed this in conf/nutch-default.xml:
<property>
  <name>db.ignore.internal.links</name>
  <value>false</value>
</property>

You also need to allow image file extensions in regex-urlfilter.txt.
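A simplified sketch of how a +/- regex URL filter of this kind can be evaluated, to show why an extension rule must be relaxed before image URLs get through. Assumptions, not confirmed by the thread: each non-comment line starts with '+' (accept) or '-' (reject) followed by a regular expression, rules are tried in order with the first match winning, a pattern may match anywhere in the URL (Matcher.find()), and a URL matching no rule is rejected. The class and method names are hypothetical, and the rule lists are made-up excerpts, not the contents of Nutch's shipped file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterSketch {

    // One rule: '+' accepts, '-' rejects, followed by a regular expression.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, String regex) {
            this.accept = accept;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Parses rule lines; blank lines and '#' comments are skipped.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Invalid rule: " + line);
            }
            rules.add(new Rule(sign == '+', trimmed.substring(1)));
        }
        return rules;
    }

    // First matching rule wins; a URL that matches no rule is rejected.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical excerpt that skips image suffixes, then accepts everything else.
        List<Rule> blocksImages = parse(Arrays.asList("-\\.(gif|jpg|png)$", "+."));
        // Same idea with the image suffixes removed from the '-' rule.
        List<Rule> allowsImages = parse(Arrays.asList("-\\.(css|zip)$", "+."));

        String imageUrl = "http://example.com/logo.png";
        System.out.println(accepts(blocksImages, imageUrl)); // false
        System.out.println(accepts(allowsImages, imageUrl)); // true
    }
}
```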


--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
