Hi All,

I'm using Nutch 2.x with Cassandra, and I have 3 URLs in my seed list:

www.news1.com
www.news2.com
www.news3.com

Now I want to parse these links and extract their HTML. To fulfill this
requirement, I have written a parse filter plugin.

The problem is that the line "page.getContent().array()" returns the HTML
of all 3 sites above, instead of only the site being parsed, every time the
plugin's overridden (@Override) parse filter method is called for a site in
the seed list.
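
One possible cause, not confirmed in this thread: if page.getContent()
returns a java.nio.ByteBuffer (assumed here), then array() hands back the
entire backing array and ignores the buffer's position and limit. If several
pages happen to share one backing array, that call would expose all of them.
The sketch below shows a way to copy only the bytes between position and
limit. The class and method names (ContentBytes, remainingBytes,
remainingAsUtf8) are hypothetical helpers, not part of Nutch, and UTF-8 is
an assumed encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: reads only the bytes that a
// ByteBuffer exposes between its position and its limit.
final class ContentBytes {
    private ContentBytes() {}

    // Copies the remaining bytes without moving the caller's position.
    static byte[] remainingBytes(ByteBuffer buffer) {
        // duplicate() shares the content but has its own position and limit
        ByteBuffer view = buffer.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    // Assumption: the content is UTF-8; real pages may use another charset.
    static String remainingAsUtf8(ByteBuffer buffer) {
        return new String(remainingBytes(buffer), StandardCharsets.UTF_8);
    }
}
```

Inside the parse filter this would be called as
ContentBytes.remainingAsUtf8(page.getContent()) in place of
new String(page.getContent().array()), assuming the buffer's position and
limit delimit the current page.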

Thanks in advance!
Jamshaid
