Ywang,

I'm not sure what you mean by "arbitrary" -- you may need to be a bit more specific here.
However, if you are trying to do a broad web crawl, here's some advice:

- Consider using urlfilter-suffix instead of one of the regex filters.
- Make sure db.ignore.external.links is set to false, which allows Nutch to fetch pages outside the initially injected list (i.e. outside the seed domains).
- Since you still need a starting point, create an inject (seed) list containing the pages you want the crawl to start from.

There is also a crawl script available on the Nutch wiki which you can use instead of ./bin/nutch crawl. A few illustrative snippets follow.
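On the error itself: if I remember the stock conf/crawl-urlfilter.txt correctly, it ends with a catch-all "-." rule, so deleting the MY.DOMAIN.NAME line without adding anything leaves no accept rule and every URL is rejected -- hence "No urls to fetch". Rather than deleting that line, replace it with an accept-everything rule, e.g.:

    # conf/crawl-urlfilter.txt
    # was: +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
    # accept any URL that survived the skip rules above
    +.

The regex filter applies rules in order and takes the first match, so URLs already caught by the earlier skip rules (file:/ftp:/mailto:, image suffixes, etc.) stay skipped.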
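For the external-links setting, a minimal override in conf/nutch-site.xml would look like this (as far as I recall, false is already the default, but spelling it out makes the intent clear):

    <!-- conf/nutch-site.xml -->
    <property>
      <name>db.ignore.external.links</name>
      <value>false</value>
      <description>false lets the crawl follow links that leave the
      seed domains; true would pin the crawl to them.</description>
    </property>

If you do switch to urlfilter-suffix, copy the plugin.includes value from your nutch-default.xml into nutch-site.xml and swap urlfilter-regex for urlfilter-suffix; the suffix rules then live in conf/suffix-urlfilter.txt.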
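Finally, a minimal seed list plus crawl invocation, along the lines of the Nutch tutorial (the URL, -depth and -topN values here are only placeholders -- adjust to taste):

    $ mkdir urls
    $ echo "http://www.example.com/" > urls/seed.txt
    $ bin/nutch crawl urls -dir crawl -depth 3 -topN 1000

The crawl script on the wiki wraps the same inject/generate/fetch/updatedb cycle and gives you more control over each round.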
Regards,

Hilkiah G. Lavinier MEng (Hons), ACGI
Email: [EMAIL PROTECTED]

----- Original Message ----
From: ywang <[EMAIL PROTECTED]>
To: "[email protected]" <[email protected]>
Sent: Saturday, April 19, 2008 10:32:17 AM
Subject: use crawl command to fetch arbitrary pages?

Dear all,

How can I use the crawl command to fetch arbitrary pages, without being restricted to the domain defined in crawl-urlfilter.txt? I tried deleting and commenting out that domain property, but the shell gives me an error like "No urls to fetch - check your seed list and URL filters." The crawl command works well when the domain property is set.

Cheers,
Yong

2008-04-19
ywang