Hi all,

I am setting up Nutch to crawl forum pages and index the posts in the content pages (threads). My problem is that Nutch does not discover all of the content pages, even though I have set a very high depth.

The reason is that a thread typically has many posts spread over several pages. Suppose I am on page 1 of 30. Its pager only links to page 2, page 3, and so on up to page 10, plus the last page: "[1,2,3,4....10] Next Last". I have to go to page 2 to discover page 11, and so on. So to discover all 30 pages, Nutch has to explore pages 1 through 20, which is not possible with a typical depth.

What should I do in this case?

Regards,
Jiang
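
P.S. In case it makes the link structure clearer, here is a rough sketch of the pager as I understand it. It is only a model, not Nutch code: the function name `pager_links` and the window size of 10 are my own assumptions, and the real forum's pager may behave differently (for example, it may also link back to earlier pages).

```python
def pager_links(current, total, window=10):
    """Return the page numbers linked from the pager on page `current`.

    Assumed model (hypothetical, not taken from any forum software):
    the pager shows `window` consecutive pages starting at `current`,
    plus a "Last" link to the final page.
    """
    last_in_window = min(current + window - 1, total)
    # Every page in the window except the one we are already on.
    links = [p for p in range(current, last_in_window + 1) if p != current]
    # The "Last" link, unless it is already covered or we are on it.
    if total != current and total not in links:
        links.append(total)
    return links


if __name__ == "__main__":
    # Compare the links visible from page 1 and from page 2 of a 30-page thread.
    print(pager_links(1, 30))
    print(pager_links(2, 30))
```

Under this model, page 11 only appears among the links once page 2 has been fetched, which is the behaviour I described above.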

