problem with skipped urls

david . wojciechowski Wed, 21 Jun 2006 00:23:53 -0700

hi,
i'm trying to run nutch at our hospital computing center and i have a small problem.
we have a few intranet servers and i want nutch to skip a few
directories.
for example:


http://sapdoku.ukl.uni-freiburg.de/abteilung/pvs/dokus/

i put these urls in crawl-urlfilter.txt. for example:

-^http://([a-z0-9]*\.)*sapdoku.ukl.uni-freiburg.de/abteilung/pvs/dokus

but nothing happens. nutch doesn't skip these urls, and i don't know why...
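
one way to narrow this down is to test the pattern outside of nutch. the following is a minimal python sketch, not nutch code: it assumes python's re module treats this pattern the same way nutch's regex engine does, and it assumes the filter tries the rules from top to bottom and stops at the first match (with no match meaning the url is dropped). the apply_rules helper and the ACCEPT rule are hypothetical and only there for illustration.

```python
import re

# the exclusion rule from above, without its leading "-" sign
PATTERN = r"^http://([a-z0-9]*\.)*sapdoku.ukl.uni-freiburg.de/abteilung/pvs/dokus"
URL = "http://sapdoku.ukl.uni-freiburg.de/abteilung/pvs/dokus/"


def apply_rules(rules, url):
    """hypothetical stand-in for a first-match-wins url filter.

    rules is a list of lines such as "-regex" or "+regex".
    returns True if the url is accepted, False if it is skipped.
    a url that matches no rule is treated as skipped (an assumption).
    """
    for rule in rules:
        sign, regex = rule[0], rule[1:]
        if re.search(regex, url):
            return sign == "+"
    return False


SKIP = "-" + PATTERN
# hypothetical accept rule for the whole domain, not taken from the post
ACCEPT = r"+^http://([a-z0-9]*\.)*ukl.uni-freiburg.de/"

print(bool(re.search(PATTERN, URL)))   # True: the pattern itself matches the url
print(apply_rules([SKIP, ACCEPT], URL))  # False: skip rule comes first, url is skipped
print(apply_rules([ACCEPT, SKIP], URL))  # True: accept rule comes first, url is kept
```

under these assumptions the pattern does match the example url, so the position of the "-" line relative to any "+" lines in the filter file would be one thing worth checking.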

:( can anyone help me?

i'm crawling with this command:

bin/nutch crawl urls -dir crawl060621 -depth 15 &> crawl060621.log &

i'm using release 0.7.1
greetings, david

==========================================================

David Wojciechowski
Universitätsklinikum Freiburg
Klinikrechenzentrum
Agnesenstrasse 6-8
D-79106 Freiburg

Phone: 0761 / 270 - 1842
Fax: 0761 / 270 - 2276
E-Mail: [EMAIL PROTECTED]

==========================================================
