wildcard urls

karthik085 Fri, 10 Aug 2007 15:44:42 -0700

I am using Nutch 0.7.2. I would like to crawl only a certain section of a
website, namely the pages whose URLs look like this:
http://domain.com/ID1124
http://domain.com/ID22351
http://domain.com/ID546
and so on.


I tried feeding in just this line:
http://domain.com/ID*
(I added it to url.txt and fed that file to Nutch), but that didn't work.
It would be difficult to generate a list of IDs from the website and feed
that static list to Nutch.

Does Nutch accept wildcards in URLs? If so, how can I get this working? If
not, are there any workarounds?
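
To make the intent concrete, here is a minimal Python sketch of the matching
I have in mind. It is only an illustration and not anything Nutch provides:
the names ID_URL, wanted and select are made up for this example, and I am
assuming the IDs are always "ID" followed by digits.

```python
import re

# Assumed pattern: "ID" followed by one or more digits, directly under the
# site root. This is what I mean by the wildcard http://domain.com/ID*
ID_URL = re.compile(r"^http://domain\.com/ID\d+$")


def wanted(url: str) -> bool:
    """Return True if the URL looks like one of the ID pages."""
    return ID_URL.match(url) is not None


def select(urls):
    """Keep only the URLs that look like ID pages, preserving order."""
    return [u for u in urls if wanted(u)]


if __name__ == "__main__":
    candidates = [
        "http://domain.com/ID1124",
        "http://domain.com/about",
        "http://domain.com/ID546",
    ]
    for url in select(candidates):
        print(url)
```

If the URL filter file can express this, I would guess the equivalent line is
something like +^http://domain\.com/ID\d+ but that is an assumption on my
part, and it still leaves open how the crawler would discover the ID pages
in the first place.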

My crawl-filter works well. When I passed in just http://domain.com/ID546,
I was able to retrieve that page.
Thanks.
