href links with javascript

Marco Crivellaro Wed, 12 Dec 2012 08:07:41 -0800

Hi all,

I am new to Nutch and am trying to find my way around crawling a single website.


The URL where the crawl starts contains some hrefs that use JavaScript;
these JavaScript calls contain the relative link to the page as one of
their parameters.

I don't really need to run the JavaScript server side; I just need to
replace the JavaScript with a canonical link.
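
To illustrate the kind of rewrite I mean, here is a minimal Python sketch
(not Nutch code). The openPage call, the example.com URLs, the function
name canonical_link, and the assumption that the relative link is the first
quoted parameter are all made up for illustration; the real markup may
differ.

```python
import re
from urllib.parse import urljoin

# Hypothetical link shape: javascript:openPage('/relative/path', ...)
# Captures the first quoted argument of the JavaScript call.
JS_HREF = re.compile(r"""^javascript:\s*\w+\(\s*['"]([^'"]+)['"]""", re.IGNORECASE)


def canonical_link(href, base_url):
    """Turn a javascript: href that embeds a relative path into an absolute URL.

    Hrefs that do not match the assumed pattern are simply resolved
    against base_url.
    """
    match = JS_HREF.match(href.strip())
    target = match.group(1) if match else href
    return urljoin(base_url, target)


print(canonical_link("javascript:openPage('/products/item.html', 1)",
                     "http://www.example.com/index.html"))
# prints http://www.example.com/products/item.html
```

In Nutch terms, the regular expression above is roughly what I would expect
a normalizer pattern to match, with the captured group used as the
substitution.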

I believe I should use the regex normalizer, but it doesn't seem to work.
Is this the correct approach?
Is there any way I can test what the crawled content looks like once
normalized?
