Thanks for the lead! Okay, I tried a test with just the nutch.org site (so I'm following exactly what is in the tutorial).
In my conf/crawl-urlfilter.txt I have tried:

  +^http://([a-z0-9]*\.)*nutch.org/
  +^http://*.nutch.org/
  +^http://www.nutch.org/

All of these produce the same results.

My urls file contains:

  http://www.nutch.org

Then I tried just:

  www.nutch.org

No luck! At this point it must be something really simple, only I can't seem to find it!

Thanks to all for any ideas,
Michael.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Michael Nebel
Sent: 18 November 2004 19:29
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-general] nutch crawl gets no pages and gives no errors

Hi,

I had a similar problem, and in my case I had made a mistake within the crawl-urlfilter.txt. Looking at your output:

...
> 041118 122750 Starting URL processing
> 041118 122750 Using URL filter: net.nutch.net.RegexURLFilter
> 041118 122751 found resource crawl-urlfilter.txt at
> file:/root/install/nutch-nightly/conf/crawl-urlfilter.txt
> .041118 122751 Added 0 pages
...

none of the sites you crawled made it through your filter...

Regards
Michael
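
One way to narrow this down is to try the three include patterns against the seed URLs outside Nutch. The following is only a rough sketch in Python, not Nutch's own code: it assumes each rule is applied as an anchored regex search on the raw URL string, and it uses Python's regex dialect, which may differ from the engine behind net.nutch.net.RegexURLFilter. The helper name accepted_by and the trailing-slash variant of the URL are added here for illustration. Nutch may also normalize a URL before filtering it, so treat the result as a hint about the raw regexes rather than a diagnosis.

```python
import re

# The three include rules from the message, without the leading "+".
PATTERNS = [
    r"^http://([a-z0-9]*\.)*nutch.org/",
    r"^http://*.nutch.org/",
    r"^http://www.nutch.org/",
]

# Seed URL forms to try. The trailing-slash variant is an added assumption,
# included only to show how the patterns treat it.
URLS = ["http://www.nutch.org", "http://www.nutch.org/", "www.nutch.org"]


def accepted_by(url):
    """Return the patterns that match the raw URL string (hypothetical helper)."""
    return [p for p in PATTERNS if re.search(p, url)]


if __name__ == "__main__":
    for url in URLS:
        hits = accepted_by(url)
        print(url, "->", hits if hits else "no rule matches")
```

Under these assumptions, all three rules end in a literal "/", so the raw string http://www.nutch.org (no trailing slash) matches none of them, and the second rule does not match the trailing-slash form either, because "/*" means "zero or more slashes" rather than a wildcard.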
