If I am not mistaken, you can also enter it like this:
+^http://lucene.apache.org/nutch/

Any outgoing link that matches the above pattern will be accepted.

On 1/31/06, Lakshman, Madhusudhan <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am trying to configure multiple-site indexing using intranet
> crawling. I need help with how to write the entries in the "urls" flat
> file and in the crawl-urlfilter.txt file.
>
> For example, I want to configure the following 2 URLs:
>
> 1. http://lucene.apache.org/nutch/
> 2. http://sourceforge.net/
>
> Can I have them one after the other on 2 lines in the "urls" flat file?
>
> And in crawl-urlfilter.txt, can I have entries like:
>
> +^http://([a-z0-9]*\.)*apache.org/
> +^http://([a-z0-9]*\.)*sourceforge.net/
>
> Can someone help me?
>
> Thanks,
> Madhu
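
To check which links such filter entries would let through before running a crawl, here is a rough Python sketch. It is not Nutch code. It assumes the filter works the way the comments in the stock crawl-urlfilter.txt describe it: rules are tried top to bottom, the first pattern found in the URL decides (+ accepts, - rejects), and a URL that matches no rule is dropped. Python's re module stands in for Java regular expressions, which should behave the same for patterns this simple. The names RULES, SEEDS and accepts are made up for this example, and the final "-." catch-all rule is my assumption about how the file ends.

```python
import re

# Hypothetical rule set mirroring the entries discussed in this thread.
# Each rule is (sign, pattern): "+" accepts, "-" rejects.
RULES = [
    ("+", r"^http://lucene.apache.org/nutch/"),
    ("+", r"^http://([a-z0-9]*\.)*sourceforge.net/"),
    ("-", r"."),  # assumed catch-all: reject everything else
]

# Assumed contents of the "urls" flat file, one seed URL per line.
SEEDS = [
    "http://lucene.apache.org/nutch/",
    "http://sourceforge.net/",
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched: treated as rejected in this sketch.
    return False


if __name__ == "__main__":
    candidates = SEEDS + [
        "http://lucene.apache.org/java/",
        "http://www.sourceforge.net/",
        "http://example.com/",
    ]
    for url in candidates:
        print(url, "accepted" if accepts(url) else "rejected")
```

With these rules only pages under /nutch/ on lucene.apache.org pass, whereas the broader ([a-z0-9]*\.)*apache.org/ entry from the original question would let through any apache.org host.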
