Hi,
I have a list of URLs to crawl. Some of these URLs point to specific
parts of a particular site. I don't want to crawl that entire site, but
I also don't want to write regex filters for every one of those URLs;
there are just too many of them.
Instead, I want to limit the number of link-following steps used to
discover new URLs from each original (injected) URL, and not restart
discovery from pages found "below" a newly discovered URL.
So:
Inject http://ex.com/p1 --> find http://ex.com/p1/page1.html and
http://ex.com/p1/page2.html, plus a link back to the homepage
http://ex.com.
On the next cycle:
http://ex.com --> find links to http://ex.com/p2 and http://ex.com/p3.
Do not follow any further links below http://ex.com/p2 and
http://ex.com/p3, as those pages are already two steps away from the
original http://ex.com/p1.
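To make the intent concrete, here is a minimal sketch in Python of the
depth-limited discovery I have in mind. It runs against a hypothetical
in-memory link graph (the LINKS dict stands in for real fetches; all
URLs and the discover() helper are just illustrations, not an existing
crawler API):

```python
from collections import deque

# Hypothetical link graph standing in for real page fetches.
LINKS = {
    "http://ex.com/p1": ["http://ex.com/p1/page1.html",
                         "http://ex.com/p1/page2.html",
                         "http://ex.com"],
    "http://ex.com": ["http://ex.com/p2", "http://ex.com/p3"],
    "http://ex.com/p2": ["http://ex.com/p2/deep.html"],
    "http://ex.com/p3": ["http://ex.com/p3/deep.html"],
}

def discover(seed, max_depth):
    """Breadth-first discovery that stops max_depth steps from the seed."""
    found = {seed: 0}          # url -> distance in steps from the seed
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        depth = found[url]
        if depth >= max_depth:
            continue           # do not expand URLs already max_depth steps away
        for out in LINKS.get(url, []):
            if out not in found:
                found[out] = depth + 1
                queue.append(out)
    return found

# p2 and p3 are discovered (two steps from p1), but their outlinks
# (the deep.html pages) are never followed.
print(discover("http://ex.com/p1", max_depth=2))
```

With max_depth=2, http://ex.com/p2 and http://ex.com/p3 end up in the
result, but nothing below them does, which is exactly the cut-off
described above.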
Is this possible?
Regards,
Jeroen