If you are using release 0.7.x, urls is a file containing a list of URLs to
start crawling from.
If you are using a newer release (0.8.x? Or only recent nightly builds? I don't
know), it is a directory that contains files listing URLs.
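A minimal sketch of both layouts, seeded with the one site Aaron wants to
crawl. It assumes you run the commands from the Nutch install directory (the
one containing bin/ and conf/) and that Nutch reads the local filesystem, so
the relative path "urls" resolves against the current directory, not bin/.
The file name seed.txt is hypothetical; any file name in the directory should
be picked up.

```shell
# Newer (0.8.x-style) layout, as described above: "urls" is a DIRECTORY
# holding one or more seed files. "seed.txt" is a hypothetical name.
mkdir -p urls
# One URL per line; this seeds the crawl with the Nutch home page.
echo "http://lucene.apache.org/nutch/" > urls/seed.txt

# Older (0.7.x-style) layout: "urls" is itself a plain FILE, one URL per line.
# Use this INSTEAD of the two commands above (a file and a directory cannot
# share the name "urls"):
#   echo "http://lucene.apache.org/nutch/" > urls

# Then, still from the Nutch install directory, run the original command:
#   bin/nutch crawl urls crawl.test -depth 3 >& crawl.log
```

The crawl command is left commented out so the seed setup can be checked on
its own before starting a crawl.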
-kuro

> -----Original Message-----
> From: Steele, Aaron [mailto:[EMAIL PROTECTED] 
> Sent: 2006-6-08 11:05
> To: [email protected]
> Subject: New User
> 
> I am having a problem setting up nutch to just look at
> http://lucene.apache.org/nutch/  
> I have everything I need installed and can get nutch to run but it is
> not picking up any urls. I am running the command:
> bin/nutch crawl urls crawl.test -depth 3 >& crawl.log
> 
> But I don't think I have the urls part set up correctly. Is this a
> directory or a file? Should it be in root or in the bin dir? I do think
> I have the conf/crawl-urlfilter.txt set up right. Thanks for your help
> in advance.
> 
> 
> Thank You,
> 
> Aaron Steele
> YRI Enterprise Solutions
> https://ris.yumnet.com
> w: 972.338.6862
> c: 817.401.0831
> 
> 
> -----Original Message-----
> From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, June 08, 2006 12:41 PM
> To: [email protected]
> Subject: Re: Secure Sites
> 
> Steele, Aaron wrote:
> > Can Nutch support "Secure" sites, or sites that have an entry login but
> > no page-level security?
> >
> 
> You mean sites that are protected by so called "form-based
> authentication"? No, this is not supported out of the box. It is
> possible to implement it using protocol-httpclient, because it can
> submit forms and it can handle session cookies.
> 
> --
> Best regards,
> Andrzej Bialecki     <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com
> Contact: info at sigram dot com
> 
