Replace it with this: -...@!*]
That's it...
Best regards,
---
Andreas P. Koenzen
On 25/02/2010, at 03:06 a.m., Ian M. Evans wrote:
I suck at regex and, in keeping with the Olympic spirit, I probably
suck at giant slalom too.
In the regex-urlfilter.txt there's the suggested probable
the time where I live :)
Best regards,
---
Andreas P. Koenzen
On 15/02/2010, at 11:38 a.m., Patricio Galeas wrote:
Hello,
I'm using Nutch-1.0. One week ago I started an internet crawl
(depth=6, slice=5, threads=10), but because of a power breakdown
the crawl process could not be finished
Hello Ashumeet,
Yes, that's a symptom of malformed XML. If you haven't changed the
nutch-default.xml file, then it's probably the version of the SAX parser
you are using. Which version of Java are you using?
Best regards,
---
Andreas P. Koenzen
On 14/02/2010, at 01:31 a.m., Ashumeet
Hello,
Just increase the HTTP timeout in nutch-site.xml:
<property>
  <name>http.timeout</name>
  <value>Your value in milliseconds.</value>
  <description></description>
</property>
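As a rough sketch of where this goes: properties in nutch-site.xml sit inside the top-level configuration element and override the ones in nutch-default.xml. The 30000 below (30 seconds) is only an assumed example value, not something taken from this thread, so pick whatever fits your crawl.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the http.timeout value from nutch-default.xml. -->
  <property>
    <name>http.timeout</name>
    <!-- Assumed example value: 30000 ms = 30 seconds. -->
    <value>30000</value>
    <description>Network timeout, in milliseconds.</description>
  </property>
</configuration>
```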
Best regards,
---
Andreas P. Koenzen
On 11/02/2010, at 08:25 p.m., Ted Yu wrote:
Hi,
Our crawling