Hi all,

I'm using Nutch 2.x with Cassandra, and I have three URLs in my seed list:

www.news1.com
www.news2.com
www.news3.com

I want to parse these pages and extract their HTML, so I have written a parse filter plugin. The problem is that the call "page.getContent().array()" returns the HTML of all three sites every time the plugin's overridden (@Override) filter method is called for a site in the seed list, rather than only the HTML of the page currently being parsed.

Thanks in advance!
Jamshaid
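
One possible lead, offered as an assumption and not a confirmed diagnosis for Nutch or Gora: java.nio.ByteBuffer.array() returns the buffer's entire backing array and ignores the buffer's position and limit. If the storage layer hands back buffers that are views into one shared array, array() would expose the bytes of every page. The sketch below is plain Java with no Nutch dependency; the class name ContentBytes, the helper visibleBytes, and the three-pages-in-one-array layout are hypothetical and only illustrate the ByteBuffer behaviour.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ContentBytes {

    // Copies only the bytes between position and limit.
    // Works on a duplicate so the caller's buffer position is not moved.
    static byte[] visibleBytes(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical layout: three tiny "pages" stored in one backing array.
        byte[] backing = "<a/><b/><c/>".getBytes(StandardCharsets.UTF_8);

        // A buffer that views only the second page (offset 4, length 4).
        ByteBuffer second = ByteBuffer.wrap(backing, 4, 4);

        // array() ignores position/limit and returns the whole backing array.
        System.out.println(new String(second.array(), StandardCharsets.UTF_8));

        // Reading between position and limit yields just the second page.
        System.out.println(new String(visibleBytes(second), StandardCharsets.UTF_8));
    }
}
```

If this turns out to be the cause, replacing array() in the plugin with a copy of the bytes between position and limit, as visibleBytes does, would be one thing to try.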

