Hi All,
I am working with Nutch 1.1, which I have modified to suit my purpose.
I have implemented custom query filters to handle some specific
functionality, so I am not using any of the query filters that come
bundled with the Nutch plugins.
Now I am facing a major problem. NutchBean has a search(...) method that
removes duplicate hits from the same site and restricts them to 2 per
site by default. Whenever I fire a query from the command line, I get a
QueryException saying "unknown field name null". This is because a null
field is being added when the query is re-fired.
When I explicitly add the "site" field, it gets stuck in an infinite loop
and the query is fired continuously.
When I disable the duplicate-removal check by commenting out those lines,
everything works fine, but multiple hits from the same site are then
shown.
Can anyone shed some light on this particular method: what does it
actually do, and how can I solve this problem?
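
To make the question concrete, here is a minimal standalone sketch of the
behaviour as I understand it: keep at most N hits per site, in ranked
order. This is not the actual NutchBean code. The class name
SiteDedupSketch and the method limitPerSite are hypothetical, and I am
assuming that "site" simply means the host part of the URL.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the Nutch source.
public class SiteDedupSketch {

    // Keep at most maxPerSite URLs per host, preserving the ranked order.
    public static List<String> limitPerSite(List<String> rankedUrls, int maxPerSite) {
        Map<String, Integer> keptPerSite = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String url : rankedUrls) {
            // Assumption: the dedup key is the host of the hit's URL.
            String site = URI.create(url).getHost();
            int count = keptPerSite.getOrDefault(site, 0);
            if (count < maxPerSite) {
                kept.add(url);
                keptPerSite.put(site, count + 1);
            }
            // Otherwise the hit is dropped as a duplicate for that site.
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList(
            "http://a.example/1",
            "http://a.example/2",
            "http://a.example/3",
            "http://b.example/1");
        // With a limit of 2, the third hit from a.example is dropped.
        System.out.println(limitPerSite(hits, 2));
    }
}
```

What I cannot work out is what the real method does on top of this when
it re-fires the query, since that is where the null field appears.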
Thanks & Regards,
Parnab Chanda
Research Scholar
IIT Kharagpur