Hi all,

I am new to Nutch and trying to find my way around crawling a single website.

The URL where the crawl starts contains some hrefs with JavaScript;
these JavaScript calls contain the relative link to the page as one of their
parameters.

I don't really need to run JavaScript server side; I just need to replace
the JavaScript call with a canonical link.
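
To illustrate the kind of rewrite I mean, here is a minimal Python sketch
(plain Python, not Nutch configuration). The function name "openPage" and
the sample hrefs are made up for illustration; the real calls on the site
may look different, so the pattern is only an assumption.

```python
import re

# Hypothetical href shape: javascript:openPage('relative/path.html', ...)
# "openPage" is an invented name; any function name is accepted by \w+.
JS_HREF = re.compile(r"""^javascript:\s*\w+\(\s*['"]([^'"]+)['"]""")


def extract_link(href):
    """Return the first quoted argument of a javascript: href.

    Hrefs that do not match the assumed pattern are returned unchanged.
    """
    match = JS_HREF.match(href)
    return match.group(1) if match else href


if __name__ == "__main__":
    # A javascript: href carrying the relative link as its first parameter.
    print(extract_link("javascript:openPage('products/item1.html', 'main')"))
    # An ordinary href passes through untouched.
    print(extract_link("about.html"))
```

The extracted relative link would still need to be resolved against the
page URL to get an absolute link.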

I believe I should use the regex normalizer for this, but it doesn't seem to
work. Is this the right approach?
Is there any way I can test what the crawled content looks like once it has
been normalized?
