Here is the output from "nutch readdb -dumplinks". This is clearly a truncated link topology for these pages. Is this the result of a bug in my script? Or is this something the tool should clean up?
It looks like db.ignore.internal.links is true, so all but the first internal link are ignored. This parameter determines what happens when you add a link from the same host as the page. If the parameter is true and the page already has one or more links to it, then we ignore the new internal link.
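To make the rule concrete, here is a rough sketch of the decision as described above. It is not the actual Nutch code: the class name, the method name, and the idea of passing in a count of existing links are all assumptions made for illustration, and "same host" is approximated by comparing the host parts of the two URLs.

```java
import java.net.URI;

public class InternalLinkSketch {

    /**
     * Hypothetical helper (not part of Nutch): decides whether a new link
     * should be dropped under the rule described for db.ignore.internal.links.
     *
     * @param ignoreInternalLinks assumed value of db.ignore.internal.links
     * @param fromUrl             URL of the page containing the link
     * @param toUrl               URL of the page being linked to
     * @param existingLinks       number of links already recorded for toUrl
     */
    public static boolean shouldIgnore(boolean ignoreInternalLinks,
                                       String fromUrl,
                                       String toUrl,
                                       int existingLinks) {
        if (!ignoreInternalLinks) {
            return false; // parameter is off: every link is kept
        }
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        // A link is "internal" when source and target share a host.
        boolean internal = fromHost != null && fromHost.equalsIgnoreCase(toHost);
        // Only drop an internal link if the target already has a link to it.
        return internal && existingLinks > 0;
    }

    public static void main(String[] args) {
        // First internal link to a page: kept.
        System.out.println(shouldIgnore(true, "http://example.com/a", "http://example.com/b", 0));
        // Second internal link to the same page: ignored.
        System.out.println(shouldIgnore(true, "http://example.com/c", "http://example.com/b", 1));
    }
}
```

Under this reading, a dump of the link database would show at most one internal link per page once the page has any link at all, which would look like a truncated topology. Setting db.ignore.internal.links to false should keep all internal links.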
Doug
