Dear all:

First of all, I am impressed with Nutch's capabilities. In less than 24
hours of work I have a nice system up and running, doing what I thought
would have taken me months to build. Congrats to the community members.

I have read the manual, the tutorials, and the list archives. This may be
more of a regex question than a Nutch issue, but here is my newbie question:

a.) I need to crawl a particular website where the files of interest are
all named as follows: PPPxxxxxxxx ('PPP' followed by 8 digits)

b.) The files are stored under the following paths:
./show/PPPxxxxxxxx
./show/record/PPPxxxxxxxx
./show/locn/PPPxxxxxxxx
./show/related/PPPxxxxxxxx


After reading the documentation, I have tried the following without success:

* regex-urlfilter.txt (+^http://*.*/show/)
* URLs file (http://*.*/show/)
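
To make the goal concrete, here is a minimal Python sketch of the kind of
pattern I think I am after. The host www.example.com and the names
URL_PATTERN and is_wanted are made up for illustration, and I am assuming
the filter searches within the URL, so the ^ and $ anchors matter. I have
not verified this against Nutch itself.

```python
import re

# Hypothetical pattern: any host, then /show/, then an optional record/,
# locn/ or related/ segment, then 'PPP' followed by exactly 8 digits.
URL_PATTERN = re.compile(
    r"^https?://[^/]+/show/(?:(?:record|locn|related)/)?PPP\d{8}$"
)


def is_wanted(url: str) -> bool:
    """Return True if the URL has one of the four path shapes listed above."""
    return URL_PATTERN.search(url) is not None


if __name__ == "__main__":
    # Made-up sample URLs; the real site and record numbers will differ.
    samples = [
        "http://www.example.com/show/PPP12345678",
        "http://www.example.com/show/record/PPP12345678",
        "http://www.example.com/show/locn/PPP12345678",
        "http://www.example.com/show/related/PPP12345678",
        "http://www.example.com/show/other/PPP12345678",
        "http://www.example.com/show/PPP1234567",
    ]
    for url in samples:
        print(url, is_wanted(url))
```

If this is on the right track, I would guess the regex-urlfilter.txt entry
is the same expression prefixed with '+', but that is an assumption on my
part.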

Any pointers appreciated. Thanks.


-- 

José C. Lacal, Founder & Chief Vision Officer
Open Personalized Health Informatics _OpenPHI
15625 NW  15th Avenue; Suite 15
Miami, FL 33169-5601  USA     www.OpenPHI.com
+1 (954) 553-1984      [EMAIL PROTECTED]          
