Hi Scott,

cycles/rounds/depth is roughly equivalent to the number of hops (links) needed to reach a document starting from one of the seeds. It has nothing to do with the depth in the server's file system hierarchy. For example, if there is a link from http://www.bizjournals.com/triangle/ to http://www.bizjournals.com/triangle/blog/techflash/story.html, the latter document is crawled in the second round.
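To make the distinction concrete, here is a minimal sketch (not Nutch code) that computes the round in which each page would be fetched by breadth-first traversal and compares it with the URL's path depth. The `LINKS` graph and the helper names `crawl_rounds` and `path_depth` are hypothetical, chosen only for illustration; the two URLs are the ones from the example above.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph: page URL -> URLs it links to.
LINKS = {
    "http://www.bizjournals.com/triangle/": [
        "http://www.bizjournals.com/triangle/blog/techflash/story.html",
    ],
    "http://www.bizjournals.com/triangle/blog/techflash/story.html": [],
}


def crawl_rounds(seeds, links):
    """Map each reachable URL to the round in which it is fetched.

    Seeds are fetched in round 1; a page first discovered via a link
    from a round-n page is fetched in round n + 1.
    """
    rounds = {url: 1 for url in seeds}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in rounds:  # keep the earliest round only
                rounds[target] = rounds[url] + 1
                queue.append(target)
    return rounds


def path_depth(url):
    """Count the non-empty path segments of a URL (directory depth)."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


if __name__ == "__main__":
    seeds = ["http://www.bizjournals.com/triangle/"]
    for url, rnd in crawl_rounds(seeds, LINKS).items():
        print(f"round {rnd}, path depth {path_depth(url)}: {url}")
```

In this toy graph the story page sits four path segments deep but is only one hop from the seed, so it is fetched in round 2.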
The easiest way to limit the crawl by directory depth is to use regex URL filters.

Sebastian

On 04/07/2015 04:04 PM, Scott Lundgren wrote:
> Is Nutch’s Rounds/Crawl Depth relative to the URLs in seed.txt?
>
> For example if my seed.txt is http://www.bizjournals.com/triangle/ and I want
> to make sure that I’m crawling
> http://www.bizjournals.com/triangle/prnewswire/press_releases/.* and
> http://www.bizjournals.com/triangle/blog/techflash/.* does my rounds need to
> be set to 2 (i.e.: everything under /prnewswire/press_releases/ is crawled)
> or 3 (/triangle/prnewswire/press_releases/)
>
> Scott Lundgren
> Software Engineer
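For the two directories in the question, the regex URL filter approach from the reply could be sketched as follows. This is a Python illustration of the idea, not Nutch's implementation: it assumes rules prefixed with `+` (accept) or `-` (reject), checked in order with the first matching rule deciding, and patterns matched anywhere in the URL. The `RULES` list and the `accepts` helper are hypothetical names. Note that the seed URL itself has to be accepted, otherwise nothing would be fetched at all.

```python
import re

# Hypothetical rule list: (sign, pattern). "+" accepts, "-" rejects.
RULES = [
    # Keep the seed page itself so the crawl can start.
    ("+", r"^http://www\.bizjournals\.com/triangle/$"),
    # Keep everything under the two directories of interest.
    ("+", r"^http://www\.bizjournals\.com/triangle/prnewswire/press_releases/"),
    ("+", r"^http://www\.bizjournals\.com/triangle/blog/techflash/"),
    # Reject everything else.
    ("-", r"."),
]


def accepts(url, rules=RULES):
    """Return True if the first rule matching the URL is an accept rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat as rejected


if __name__ == "__main__":
    for url in (
        "http://www.bizjournals.com/triangle/",
        "http://www.bizjournals.com/triangle/blog/techflash/story.html",
        "http://www.bizjournals.com/triangle/news/",
    ):
        print("keep" if accepts(url) else "skip", url)
```

A filter like this only restricts which discovered URLs are kept; the number of rounds still has to be large enough for the wanted pages to be reached by following links from the seed through accepted pages.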

