My subject line is a pretty good summary. I see the first "details.pa?id=123" page in my search results, but I can't find any of the "details.pa?id=456" pages that are linked from that first page.
Backgrounder: I have a site that includes a lot of dynamic pages. I edited crawl-urlfilter.txt, added the following regex, and ran a crawl (bin/nutch crawl urls -dir crawl -depth 30 -topN 30000):

+^http://([a-z0-9]*\.)*www.visitpa.com/visitpa/details.pa\?id=

Now the search returns hits on the dynamic details pages. For example, here is a search that returns hits on my dynamic pages:

http://prhodes.r-effects.com/nutch/search.jsp?query=sunnyledge&hitsPerPage=10&lang=en

If you look at the details.pa page that Nutch had a hit on, it contains several links of the same format (details.pa). My problem is that these other detail links are not being crawled or indexed. I set the depth to 30, so that should not be a limiting factor. I also set topN to 30000, because we have around 16K details.pa pages.

Any clues on how to proceed and figure out what I need to do to get Nutch to crawl these missing details.pa links?

Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
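
One check that may help narrow this down is to test sample link URLs against the filter rule quoted above. The following is only a minimal sketch under stated assumptions: Python's re module stands in for the Java regex engine that Nutch uses, the helper name accepted and the sample URLs are hypothetical, and the rule is assumed to be applied as a search within the URL rather than as a full match.

```python
import re

# The rule from crawl-urlfilter.txt, without the leading "+" that marks
# it as an include rule. The pattern itself is copied unchanged.
RULE = r"^http://([a-z0-9]*\.)*www.visitpa.com/visitpa/details.pa\?id="
PATTERN = re.compile(RULE)


def accepted(url):
    """Return True if the include rule finds a match in the URL.

    Assumption: the filter searches for the pattern in the URL instead
    of requiring the whole URL to match.
    """
    return PATTERN.search(url) is not None


# Hypothetical link variants; the real links on the details.pa page
# should be copied from the page source and substituted here.
SAMPLES = [
    "http://www.visitpa.com/visitpa/details.pa?id=123",
    "http://www.visitpa.com/visitpa/details.pa?id=456",
    "http://visitpa.com/visitpa/details.pa?id=456",
    "http://www.visitpa.com/visitpa/details.pa;jsessionid=ABC?id=456",
    "https://www.visitpa.com/visitpa/details.pa?id=456",
]

if __name__ == "__main__":
    for url in SAMPLES:
        verdict = "accepted" if accepted(url) else "rejected"
        print(verdict, url)
```

In this sketch only the first two samples pass the rule. A link without the "www." host prefix, with a session ID inserted before the query string, or with an https scheme does not match, so it may be worth comparing the exact form of the outlinks on the page against the rule.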