Hi,

I had established a way to crawl with Nutch 0.9, but it no longer
works in Nutch 1.0. I hope someone can shed some light on this
problem.

This is what I do. I have split my set of URLs into a few groups and
crawl them separately, so that each group can have its own depth,
filters, and schedule. Some groups consist of URLs that all come from
the same site. After all groups are done, I copy all the segments
together, run a crawldb update, which creates a new crawldb, and then
index.
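
For concreteness, the steps above look roughly like the sketch below. It
is only a dry run that prints the commands rather than running them. The
group names, seed directories, depths, and the "merged" directory are
made up for illustration, and the invertlinks step is an assumption
(the 1.0 index command takes a linkdb argument); the per-group filter
configuration is not shown.

```shell
#!/bin/sh
# Dry-run sketch: prints the Nutch commands instead of executing them.
# All paths, group names and depths are hypothetical placeholders.
NUTCH="${NUTCH:-bin/nutch}"
MERGED="${MERGED:-merged}"

# One line per group: name, seed directory, crawl depth.
GROUPS_SPEC="groupA urls/groupA 3
groupB urls/groupB 5"

plan() {
  # 1. Crawl each group separately, each with its own depth.
  echo "$GROUPS_SPEC" | while read -r name seeds depth; do
    echo "$NUTCH crawl $seeds -dir crawl-$name -depth $depth"
  done

  # 2. Copy the segments of every group into one directory.
  echo "mkdir -p $MERGED/segments"
  echo "$GROUPS_SPEC" | while read -r name seeds depth; do
    echo "cp -r crawl-$name/segments/* $MERGED/segments/"
  done

  # 3. Build a new crawldb from all of the copied segments.
  echo "$NUTCH updatedb $MERGED/crawldb -dir $MERGED/segments"

  # 4. Build a linkdb (assumed step), then index everything.
  echo "$NUTCH invertlinks $MERGED/linkdb -dir $MERGED/segments"
  echo "$NUTCH index $MERGED/indexes $MERGED/crawldb $MERGED/linkdb $MERGED/segments/*"
}

plan
```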

This scheme worked well with Nutch 0.9. But since I switched to Nutch
1.0, the search results miss the URLs of certain segments altogether.
I have made sure that I am not filtering them out in any of the steps
(crawldb update and index).

Am I doing this totally wrong, and was I just lucky that it worked in
0.9? Or did something change in 1.0?

Thanks
