Re: regex-urlfilter.txt and paging variables

2010-02-25 Thread Andreas P. Koenzen
Replace it with this: -...@!*] That's it... Best regards, --- Andreas P. Koenzen On 25/02/2010, at 03:06 a.m., Ian M. Evans wrote: I suck at regex and in keeping with the Olympic spirit, I probably suck at giant slalom too. In the regex-urlfilter.txt there's the suggested probable
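As its own header comments describe, regex-urlfilter.txt is read top to bottom. Each non-comment line starts with `+` (accept) or `-` (reject) followed by a Java regex, the first pattern that matches the URL decides, and a URL that matches no pattern is dropped. The minimal sketch below reproduces that first-match behaviour in plain Java so a rule can be tried out before editing the file. It is a hypothetical stand-alone class (`UrlFilterSketch`), not Nutch's own filter plugin. The rule `-[?*!@=]` is an assumption: it is the "probable queries" rule as it appears in stock Nutch configurations, not the replacement suggested in the reply, which the archive has garbled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch of first-match URL filtering in the style of
// regex-urlfilter.txt; this is not Nutch's own URL filter implementation.
public class UrlFilterSketch {

    /** One parsed rule: '+' means accept, '-' means reject. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    public UrlFilterSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            // Blank lines and '#' comments carry no rule.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns true if the URL is accepted; the first rule whose regex matches decides. */
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            // find() looks for the pattern anywhere in the URL, not just at the start.
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false; // no rule matched, so the URL is dropped
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new UrlFilterSketch(List.of(
                "# skip URLs containing certain characters as probable queries",
                "-[?*!@=]",
                "# accept anything else",
                "+."));
        System.out.println(filter.accepts("http://example.com/page.html"));       // true
        System.out.println(filter.accepts("http://example.com/list.php?page=2")); // false
    }
}
```

Because the first match wins, URLs with paging variables can be let through by placing a more specific `+` rule above the exclusion. For example, a hypothetical line `+[?&]page=\d+$` (written `"+[?&]page=\\d+$"` as a Java string) accepts `list.php?page=2` in this sketch before `-[?*!@=]` is ever consulted.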

Re: incomplete segment ...

2010-02-15 Thread Andreas P. Koenzen
the time where I live :) Best regards, --- Andreas P. Koenzen On 15/02/2010, at 11:38 a.m., Patricio Galeas wrote: Hello, I'm using Nutch-1.0. One week ago I started an internet crawl (depth=6, slice=5, threads=10) but because of a power breakdown the crawl process could not be finished

Re: Crawling Error

2010-02-14 Thread Andreas P. Koenzen
Hello Ashumeet, Yes, that's a symptom of malformed XML. If you haven't changed the nutch-default.xml file, then it's probably the version of the SAX parser you are using. Which version of Java are you using? Best regards, --- Andreas P. Koenzen On 14/02/2010, at 01:31 a.m., Ashumeet
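One way to tell whether the file or the parser is at fault is to parse the configuration files directly. A minimal sketch, assuming the files sit under a local `conf/` directory (the default paths and the class name `ConfigXmlCheck` are illustrative), uses only the JDK's `javax.xml.parsers` API to report the first syntax error with its line and column, and prints the running Java version asked about in the reply.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks that Nutch configuration files are well-formed XML.
public class ConfigXmlCheck {

    /** Returns null if the file parses, otherwise a "file:line:column: message" string. */
    public static String check(File file) throws IOException, ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler ignores warnings and recoverable errors and rethrows fatal
        // errors, so the parser does not print its own "[Fatal Error]" line to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(file);
            return null;
        } catch (SAXParseException e) {
            return file + ":" + e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return file + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // The JVM version is worth including in any report about parser errors.
        System.out.println("Java version: " + System.getProperty("java.version"));
        String[] paths = args.length > 0
                ? args
                : new String[] {"conf/nutch-default.xml", "conf/nutch-site.xml"};
        for (String path : paths) {
            File file = new File(path);
            if (!file.isFile()) {
                System.out.println(path + ": not found");
                continue;
            }
            String error = check(file);
            System.out.println(error == null ? path + ": OK" : error);
        }
    }
}
```

If the files report OK when run with the same JDK and classpath as Nutch, that points back toward the parser or JVM setup rather than the files themselves.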

Re: SocketTimeoutException

2010-02-11 Thread Andreas P. Koenzen
Hello, just increase the HTTP timeout in nutch-site.xml:

<property>
  <name>http.timeout</name>
  <value>Your value in milliseconds.</value>
  <description></description>
</property>

Best regards, --- Andreas P. Koenzen On 11/02/2010, at 08:25 p.m., Ted Yu wrote: Hi, Our crawling
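In Java, a `SocketTimeoutException` is what a blocking socket read throws when it waits longer than the socket's read timeout, and `http.timeout` is such a limit in milliseconds. The self-contained sketch below reproduces the failure locally using only `java.net`: a hypothetical server accepts a connection but never replies, so a client with a short read timeout gives up. As I understand it, Nutch's `protocol-http` plugin applies `http.timeout` to its sockets in a similar way, but this illustrates the exception and is not Nutch's code; the class and method names are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo: a server that never answers triggers a read timeout on the client.
public class ReadTimeoutDemo {

    /** Returns true if reading from a silent local server times out after timeoutMs milliseconds. */
    public static boolean readTimesOut(int timeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port. The connection completes in the
        // backlog even though accept() is never called, and no data is ever sent.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, server.getLocalPort()), timeoutMs);
            // Read timeout: how long read() may block before SocketTimeoutException.
            client.setSoTimeout(timeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // data or end-of-stream arrived before the timeout
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Timed out after 500 ms: " + readTimesOut(500));
    }
}
```

Raising `http.timeout` only lengthens this wait: a host that never answers will still fail, just later, so a larger value mainly helps with slow but responsive servers.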