Hi,

I have a list of URLs to crawl. Some of these URLs point to specific parts of a particular site. I don't want to crawl that entire site, but I don't want to write regex filters for all of those URLs either; there are just too many of them.

I want to limit the number of link-following steps taken from each original (injected) URL, so that the depth count is not restarted from URLs discovered "below" the original URL along the way.

So:
Inject http://ex.com/p1 --> find http://ex.com/p1/page1.html and http://ex.com/p1/page2.html and link back to homepage http://ex.com

On next cycle:
http://ex.com --> find links to http://ex.com/p2 and http://ex.com/p3

Do not find any more links below http://ex.com/p2 and http://ex.com/p3 as they're already one step away from the original http://ex.com/p1.
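For illustration, the behavior described above amounts to a breadth-first crawl that tracks each URL's distance from the seed and stops expanding URLs at the limit. This is only a minimal sketch: the `LINKS` dictionary is a hypothetical in-memory stand-in for pages fetched and parsed by a real crawler, and `crawl`/`max_depth` are illustrative names, not any crawler's actual API.

```python
from collections import deque

# Hypothetical link graph standing in for fetched pages; in a real
# crawler each entry would come from parsing the fetched HTML.
LINKS = {
    "http://ex.com/p1": ["http://ex.com/p1/page1.html",
                         "http://ex.com/p1/page2.html",
                         "http://ex.com"],
    "http://ex.com": ["http://ex.com/p2", "http://ex.com/p3"],
    "http://ex.com/p2": ["http://ex.com/p2/deep.html"],
    "http://ex.com/p3": [],
}

def crawl(seed, max_depth):
    """Breadth-first crawl recording each URL's distance from the seed.
    URLs at max_depth are discovered but never expanded further."""
    seen = {seed: 0}          # URL -> steps from the seed
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue          # at the limit: do not follow its outlinks
        for out in LINKS.get(url, []):
            if out not in seen:
                seen[out] = depth + 1
                queue.append(out)
    return seen

print(crawl("http://ex.com/p1", max_depth=2))
```

With `max_depth=2`, http://ex.com/p2 and http://ex.com/p3 are found (two steps from http://ex.com/p1 via the homepage) but their outlinks, such as http://ex.com/p2/deep.html, are never fetched, which matches the cycles sketched above.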

Is this possible?

Regards,


Jeroen
