How high did you set the depth? And why do you think it can't go any higher?



On Oct 9, 2012, at 5:15 AM, Jiang Fung Wong wrote:

> Hi All,
> 
> I am setting up Nutch to crawl forum pages and index the posts in the
> content pages (threads). The problem is that Nutch does not discover
> all content pages, even though I set a very high depth.
> 
> This is because a thread typically has many posts that span several
> pages. Suppose I am at page 1 of 30. It only contains links to pages
> 2 through 10 and to the last page.
> 
> "[1,2,3,4....10] Next Last"
> 
> I have to go to page 2 to discover page 11, and so on. So to discover
> all 30 pages, Nutch has to explore pages 1 through 20, which is not
> possible with a typical depth.
> 
> What should I do in this case?
> 
> 
> Regards,
> Jiang
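
One way to sanity-check how much depth the pagination described above
really needs is to model it as a breadth-first search over the page links.
The sketch below is only an illustration under assumptions: it supposes
that every page k links to the next nine pages and to the last page (the
original message only states this for page 1), and the helper names
links_on and discovery_depth are hypothetical, not part of Nutch.

```python
from collections import deque


def links_on(page, last, window=9):
    """Assumed pagination model: page k links to pages k+1..k+window and to the last page."""
    targets = set(range(page + 1, min(page + window, last) + 1))
    targets.add(last)
    targets.discard(page)  # a page does not need to discover itself
    return targets


def discovery_depth(last, window=9):
    """Breadth-first search from page 1; maps each page to its link distance from page 1."""
    depth = {1: 0}
    queue = deque([1])
    while queue:
        page = queue.popleft()
        for target in links_on(page, last, window):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


depths = discovery_depth(last=30)
print(len(depths))           # number of pages discovered
print(max(depths.values()))  # link hops needed to reach the farthest page
```

Under this assumed model all 30 pages are found, and the farthest page is
4 link hops from page 1, so the number of crawl rounds needed grows with
the thread length divided by the size of the link window, not with the
number of pages. Real forums may paginate differently, and other crawl
limits can still hide pages, so treat the numbers as a rough estimate.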
