Dear all:
First of all, I am impressed with Nutch's capabilities. In less than 24
hours of work I have a nice system up and running, doing what I thought
would have taken me months to build. Congrats to the community members.
I have read the manual, the tutorials, and the list archives. This may be
more of a regex question than a Nutch issue, but here is my newbie question:
a.) I need to crawl a particular website where the files of interest are
all named as follows: PPPxxxxxxxx ('PPP' followed by 8 digits)
b.) The files are stored under
./show/PPPxxxxxxxx
./show/record/PPPxxxxxxxx
./show/locn/PPPxxxxxxxx
./show/related/PPPxxxxxxxx
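To make the target concrete, here is a minimal sketch of the URL shapes
I am after, written as a regular expression. It is in Python only because
that is quick to test; the host example.com and the names TARGET and
is_target are hypothetical, and I am assuming a pattern this simple reads
the same way in the Java regex syntax that regex-urlfilter.txt uses.

```python
import re

# Hypothetical pattern for the four path shapes listed above:
# /show/ optionally followed by record/, locn/, or related/,
# then "PPP" and exactly eight digits at the end of the URL.
TARGET = re.compile(
    r"^http://[^/]+/show/(?:(?:record|locn|related)/)?PPP[0-9]{8}$"
)

def is_target(url: str) -> bool:
    """Return True if the URL has one of the four shapes of interest."""
    return TARGET.search(url) is not None

# Example (example.com is a placeholder host, not the real site)
print(is_target("http://example.com/show/locn/PPP12345678"))
```

Anything with fewer or more than eight digits, or with a different
subdirectory under /show/, should be rejected by this sketch.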
After reading the documentation, I tried the following with no success:
* regex-urlfilter.txt (+^http://*.*/show/)
* URLs file (http://*.*/show/)
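For what it is worth, this is how I understand my attempted filter to
behave under ordinary regex rules, where '*' repeats the preceding
character rather than acting as a shell-style wildcard. The sketch below
is Python, the host example.com is hypothetical, and I am assuming the
filter is applied as a search from the start of the URL; I have not
confirmed that this is how Nutch applies it.

```python
import re

# The expression from my regex-urlfilter.txt attempt, without the leading '+'.
# Read as a regex: "http:/", then zero or more "/", then any characters,
# then "/show/" somewhere later in the URL.
ATTEMPT = re.compile(r"^http://*.*/show/")

def attempt_matches(url: str) -> bool:
    """Return True if the attempted expression finds a match in the URL."""
    return ATTEMPT.search(url) is not None

# Example (example.com is a placeholder host, not the real site)
print(attempt_matches("http://example.com/show/PPP12345678"))
print(attempt_matches("http://example.com/show/about.html"))
```

If that reading is right, the expression is broader than I intended,
since it accepts anything under /show/ and not only the PPPxxxxxxxx files.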
Any pointers appreciated. Thanks.
--
José C. Lacal, Founder & Chief Vision Officer
Open Personalized Health Informatics _OpenPHI
15625 NW 15th Avenue; Suite 15
Miami, FL 33169-5601 USA www.OpenPHI.com
+1 (954) 553-1984 [EMAIL PROTECTED]