Adriano,
The email you sent earlier seemed to have [EMAIL PROTECTED] on a line by
itself. As I understand it, each line in this file needs to start with a
+ for a regular expression matching things that should be included in the
crawl, a - for a regular expression matching things that should not be
included, or a # for a comment.
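To make the prefix convention concrete, here is a rough Python sketch of a filter that reads rules written in this style. It is not Nutch's actual code: the names parse_rules, accepts and RULES, the example patterns, the first-match-wins ordering and the reject-by-default fallback are all my own assumptions for illustration.

```python
import re

# Hypothetical filter file contents, written in the +/-/# style described above.
RULES = r"""
# skip URLs containing a query-like character (made-up pattern)
-[?=&]
# accept everything else under example.org
+^http://([a-z0-9]*\.)*example\.org/
"""

def parse_rules(text):
    """Turn filter-file text into a list of (include, compiled_regex) pairs."""
    rules = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            # A line holding only a bare character class ends up here.
            raise ValueError(f"line {number}: expected '+', '-' or '#', got {sign!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """Assumed semantics: the first rule whose regex is found in the URL decides."""
    for include, regex in rules:
        if regex.search(url):
            return include
    return False  # assumption: URLs matching no rule are rejected
```

In this sketch a line that starts with anything other than +, - or # is reported as an error instead of being silently ignored, which is one way a stray line could stop a crawl from behaving as expected.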
I'm not sure what having [EMAIL PROTECTED] on its own line would
do, but in a regular expression the []s define a character class
that matches any one of the characters between them.
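As a small illustration of a character class, the Python sketch below checks whether a URL contains any one of a set of characters. The class [?=&] and the names QUERY_CHARS and has_query_char are made up for this example; they are not the pattern from your file.

```python
import re

# Hypothetical character class: matches any single '?', '=' or '&'.
QUERY_CHARS = re.compile(r"[?=&]")

def has_query_char(url):
    """Return True if the URL contains at least one character from the class."""
    return QUERY_CHARS.search(url) is not None
```

Inside the brackets the characters are taken literally, so the ? here means a question mark and not "optional".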
I hope that helps.
Jake.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 15, 2005 3:54 AM
To: [email protected]
Subject: Re: RE: crawl-urlfilter.txt
I'm sorry, but I don't understand very well.
You said: "you try commenting that line out", but comment it out where?
And in what way?
thanks
Adriano
> I'm fairly new to Nutch myself, but this line doesn't look right to me:
>
># skip URLs containing certain characters as probable queries, etc.
>[EMAIL PROTECTED]
>
> I'd try commenting that line out and try the crawl again.
>
>Jake.
>