Hi,

I have two URLs in my seed.txt, url1 and url2. When Nutch runs my ParseFilter
plugin, I can see that it is processing url1 by checking webPage.getBaseUrl().
At that point, if I do
String html = new String(webPage.getContent().array());

it returns the HTML of both url1 and url2.

When my ParseFilter runs again for url2 (I can confirm this because
webPage.getBaseUrl() returns url2), it again returns the HTML of both pages
(url1 and url2).

Why is it doing this?
How can I get the HTML of only the URL for which the ParseFilter is currently
running?
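
One possibility I have not confirmed: ByteBuffer.array() returns the whole backing array and ignores the buffer's position and limit. If the buffer returned by getContent() is only a window onto a larger shared array (an assumption about Nutch on my part, not something I have verified), that would explain seeing both pages. A minimal sketch of the difference using plain java.nio, where the class name, the helper readOwnContent, and the sample HTML are all hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ContentBufferSketch {

    // Copies only the bytes between the buffer's position and limit.
    // Works on a duplicate so the caller's position is left unchanged.
    static String readOwnContent(ByteBuffer content) {
        ByteBuffer view = content.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        // UTF-8 is an assumption; a real page may use a different charset.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: two pages stored back to back in one array.
        String first = "<html>url1</html>";
        String second = "<html>url2</html>";
        byte[] shared = (first + second).getBytes(StandardCharsets.UTF_8);

        // A buffer that covers only the first page within the shared array.
        ByteBuffer page1 = ByteBuffer.wrap(shared, 0, first.length());

        // array() exposes the entire backing array, so both pages appear.
        System.out.println(new String(page1.array(), StandardCharsets.UTF_8));

        // Reading position..limit yields only the first page.
        System.out.println(readOwnContent(page1));
    }
}
```

If that is what is happening, replacing the array() call with something like readOwnContent(webPage.getContent()) might return only the current page.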

Any help would be appreciated.

Thanks,
Tony.
