If I am not mistaken, you can also enter it like this in
crawl-urlfilter.txt:

+^http://lucene.apache.org/nutch/


Any outgoing link that matches the above pattern will be followed.
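
To check how a set of filter lines would treat a given URL before
running a crawl, here is a rough Python sketch. It is not Nutch code.
It assumes, as far as I understand the regex URL filter, that rules are
tried top to bottom, that the first pattern found in the URL decides
("+" accepts, "-" rejects), and that a URL matching no rule is dropped.
The names RULES, parse_rules and accepts are made up for this example,
and the trailing "-." catch-all line is my own addition.

```python
import re

# Hypothetical rule lines in crawl-urlfilter.txt style.
# '+' means accept, '-' means reject.
RULES = [
    r"+^http://lucene.apache.org/nutch/",
    r"+^http://([a-z0-9]*\.)*sourceforge.net/",
    r"-.",  # assumed catch-all: reject everything else
]


def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (allow, compiled_regex) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Assumed semantics: the first rule whose pattern is found wins."""
    for allow, regex in rules:
        if regex.search(url):
            return allow
    return False  # no rule matched, so the URL is dropped


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in [
        "http://lucene.apache.org/nutch/tutorial.html",
        "http://lucene.apache.org/java/",
        "http://www.sourceforge.net/",
    ]:
        print(url, accepts(url, rules))
```

Note that the first rule only lets through pages under /nutch/ on that
one host, so other parts of lucene.apache.org would be filtered out.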



On 1/31/06, Lakshman, Madhusudhan <[EMAIL PROTECTED]>
wrote:
>
> Hi,
>
>
>
> I am trying to configure for multiple site indexing using intranet
> crawling.  I need help on how to keep the entries in the "urls" flat
> file and the crawl-urlfilter.txt files.
>
>
>
> For example, I want to configure for the below mentioned 2 URLs,
>
>
>
> 1.http://lucene.apache.org/nutch/
>
> 2.http://sourceforge.net/
>
>
>
> can I have them one after the other on 2 lines in the "urls" flat file ?
>
>
>
> and in the crawl-urlfilter.txt,  can I have the entries like:
>
>
>
> +^http://([a-z0-9]*\.)*apache.org/
>
> +^http://([a-z0-9]*\.)*sourceforge.net/
>
>
>
>
>
> Can someone help me ?
>
>
>
> Thanks,
>
> Madhu
