Amazon.com, as a common example, has pages with links that omit the www.amazon.com prefix (i.e., relative links). The prefix is prepended automatically when the link is followed, and the resulting absolute URL resolves correctly.
I believe I am seeing that Nutch can crawl these pages if the crawl-urlfilter.txt patterns are weakened so that amazon.com is no longer required in the URL, but then the crawl wanders off the amazon.com site. Does anyone have a suggestion for a crawl-urlfilter pattern that achieves my goal, or for another mechanism that does? Or perhaps I am misunderstanding something, in which case an explanation would be appreciated.

Thank you in advance,
Jim Van Sciver
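For reference, the kind of restriction I am after might look something like the sketch below. This assumes the usual crawl-urlfilter.txt format, where each line is a Java regex prefixed with + (accept) or - (reject) and the first matching rule wins; the pattern itself is only my guess at what should work, not something I have verified:

```
# accept any URL on amazon.com or one of its subdomains (sketch)
+^http://([a-z0-9-]+\.)*amazon\.com/
# reject everything else
-.
```

My understanding is that Nutch resolves relative links against the page URL before applying these filters, so if that is correct, links without the www.amazon.com prefix should still match the accept rule once resolved.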
