Hi Paul,

Someone had to point this out to me too: in conf/crawl-urlfilter.txt there is a line:

[EMAIL PROTECTED]

which specifies which characters are not allowed in URLs. Try removing this line, or removing only the '=' from it.

regards,
Jeroen

On 3/19/07, Paul Liddelow <[EMAIL PROTECTED]> wrote:
Hi,

I have set up Nutch and the crawler (following the intranet tutorial) and can fetch results OK for the few URLs I have tested, but for some reason I cannot get any results returned when I try to crawl this URL:

http://www.comlaw.gov.au/ComLaw/legislation/actcompilation1.nsf/sh/browse&VIEW=current&ORDER=bytitle&CATEGORY=actcompilation

I think it might have something to do with the file extension ".nsf", which is midway in the URL. I think the crawler cannot deal with it.

Has anybody else had this problem, or can anybody help? Much obliged if anybody knows the answer.

Cheers
Paul
-- regards, Jeroen
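
To illustrate why the '=' matters rather than the ".nsf", here is a rough Python sketch of how a first-match regex URL filter behaves. It is not Nutch code. It assumes the obscured line above is the stock rule `-[?*!@=]`, that rules are checked top to bottom with the first match deciding, and that a URL matching no rule is dropped. The `+^http://...comlaw.gov.au/` accept rule and the names `load_rules` and `accepts` are made up for the example.

```python
import re

def load_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """First rule whose regex is found anywhere in the URL decides.

    Assumption: a URL that matches no rule at all is rejected.
    """
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False

URL = ("http://www.comlaw.gov.au/ComLaw/legislation/actcompilation1.nsf"
       "/sh/browse&VIEW=current&ORDER=bytitle&CATEGORY=actcompilation")

# Assumed stock rule plus a hypothetical accept rule for the site.
STOCK = r"""
# skip URLs containing certain characters as probable queries
-[?*!@=]
+^http://([a-z0-9]*\.)*comlaw.gov.au/
"""

# Same rules with only '=' removed from the character class.
RELAXED = r"""
-[?*!@]
+^http://([a-z0-9]*\.)*comlaw.gov.au/
"""

stock_ok = accepts(load_rules(STOCK), URL)
relaxed_ok = accepts(load_rules(RELAXED), URL)
print("stock rules accept URL:  ", stock_ok)
print("relaxed rules accept URL:", relaxed_ok)
```

Under these assumptions the URL is rejected by the stock rules because of the '=' characters in `VIEW=current` and the other parameters, and accepted once '=' is taken out of the character class. The ".nsf" part plays no role in either outcome.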
