Hi.

I am using Nutch 2.0 with HSQL.

I've created some plugins for parsing special content on our company
website. The plugins parse the content and then send some data to a
SQL server database; this is working fine. The problem is with the
crawl command. I am starting Nutch with:
./nutch crawl -depth 300 -topN 30000

In nutch-site.xml I configured the refetch interval to 30 days (the
default value), but after each cycle Nutch fetches both the newly
found pages and the old pages again.
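
For reference, a 30-day refetch interval in nutch-site.xml would look
roughly like the fragment below. This is only a sketch: it assumes the
standard property name db.fetch.interval.default, whose value is given
in seconds (30 days = 2592000), and it is not a copy of the actual file.

```xml
<!-- Sketch of a nutch-site.xml override; property name assumed to be
     the standard db.fetch.interval.default (value in seconds). -->
<configuration>
  <property>
    <name>db.fetch.interval.default</name>
    <!-- 30 days * 24 h * 3600 s = 2592000 s -->
    <value>2592000</value>
  </property>
</configuration>
```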

What am I doing wrong?
