You could

1) exclude links to *.js documents with a URL filter, e.g. by adding this rule to regex-urlfilter.txt (before the final catch-all "+." rule, since the first matching rule wins):

# exclude JavaScript
-\.js$

2) exclude outlinks from "link" and "script" elements in general by adding these tag names to the value of the following property (empty by default), overriding it in nutch-site.xml:

<property>
  <name>parser.html.outlinks.ignore_tags</name>
  <value></value>
  <description>Comma separated list of HTML tags, from which outlinks shouldn't be extracted. Nutch takes links from: a, area, form, frame, iframe, script, link, img. If you add any of those tags here, it won't be taken. Default is empty list. Probably reasonable value for most people would be "img,script,link".</description>
</property>

On 04/10/2012 11:36 PM, SUJIT PAL wrote:
Hi all,

This is for the Nutch trunk version. During the parse phase, is it possible to suppress JavaScript outlinks by setting a configuration parameter? If so, what would the parameter be?

Thanks very much,
Sujit
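To see what the URL-filter rule from option 1 does, here is a small standalone Java sketch. It is a simplified model, not Nutch's own RegexURLFilter class, and the class name UrlFilterSketch is hypothetical. It assumes the regex-urlfilter.txt semantics: each rule is "+" (accept) or "-" (reject) followed by a Java regex, the first rule whose pattern is found anywhere in the URL decides, and a URL that matches no rule is rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of regex-urlfilter.txt evaluation
 * (not Nutch's actual implementation): first matching rule wins,
 * and a URL that matches no rule is rejected.
 */
public class UrlFilterSketch {

    /** One "+regex" or "-regex" line. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    public UrlFilterSketch(String config) {
        for (String raw : config.split("\n")) {
            String line = raw.trim();
            // Skip blank lines and "#" comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            // This sketch ignores any line that is not a +/- rule.
            if (sign != '+' && sign != '-') {
                continue;
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns true if the URL would be kept as an outlink. */
    public boolean accepts(String url) {
        for (Rule r : rules) {
            // find(): the pattern may match anywhere in the URL.
            if (r.pattern.matcher(url).find()) {
                return r.accept;
            }
        }
        return false; // no rule matched: rejected
    }

    public static void main(String[] args) {
        String config = String.join("\n",
                "# exclude JavaScript",
                "-\\.js$",
                "# accept anything else",
                "+.");
        UrlFilterSketch f = new UrlFilterSketch(config);
        System.out.println(f.accepts("http://example.com/app.js"));      // false
        System.out.println(f.accepts("http://example.com/app.js?v=2"));  // true
        System.out.println(f.accepts("http://example.com/index.html"));  // true
    }
}
```

In this model, http://example.com/app.js?v=2 is still accepted because "$" anchors the pattern at the end of the whole URL. A rule such as -\.js(\?|$) would also catch script URLs that carry a query string.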

