Hi,

There is an open thread on the user list for this right now. Please look in the recent archive. I think it would be best to take this conversation over there.

Lewis
On Thursday, June 20, 2013, Jamshaid Ashraf <[email protected]> wrote:
> Hi All,
>
> I'm using Nutch 2.x/Cassandra and I have 3 URLs in my seed list:
>
> www.news1.com
> www.news2.com
> www.news3.com
>
> I want to parse and extract the HTML of those links. To fulfill this
> requirement I have written a parse filter plugin.
>
> The problem is that the line "page.getContent().array()" returns the
> HTML of all 3 sites each time the Nutch parse filter's @Override parse
> filter method is called for a site in the seed list.
>
> Thanks in advance!
> Jamshaid
>

--
*Lewis*
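
For anyone finding this message in the archive: one possible cause of the behaviour described above (an assumption, not something confirmed in this thread) is that java.nio.ByteBuffer.array() returns the buffer's entire backing array, not only the bytes between position() and limit(). If the store hands back several buffers that are views onto one shared array, array() would show the content of all of them. The sketch below uses only the standard ByteBuffer API; the class name ContentBytes, the helper readableBytes, and the sample HTML strings are hypothetical and are not part of Nutch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ContentBytes {

    // Hypothetical helper: copies only the bytes between position() and
    // limit(). It works on a duplicate so the caller's buffer position
    // is left unchanged.
    public static byte[] readableBytes(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Assumed scenario: three pages stored back to back in one array.
        byte[] shared = "<html>news1</html><html>news2</html><html>news3</html>"
                .getBytes(StandardCharsets.UTF_8);

        // A buffer that covers only the second page (offset 18, length 18).
        ByteBuffer second = ByteBuffer.wrap(shared, 18, 18);

        // array() exposes the whole backing array: all three pages.
        System.out.println(new String(second.array(), StandardCharsets.UTF_8));

        // Copying the readable region yields only this buffer's page.
        System.out.println(new String(readableBytes(second), StandardCharsets.UTF_8));
    }
}
```

In a parse filter, that would mean passing page.getContent() to a helper like this instead of calling array() on it directly. Whether this applies to your setup depends on how the backing store builds the buffers, so please check it against the user-list thread.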

