If you are using release 0.7x, urls is a file containing a list of URLs to start crawling from. If you are using a newer release (0.8x, or perhaps only a recent nightly; I don't know which), it is a directory containing files that list URLs.

-kuro
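
As a rough sketch of the two layouts described above (untested against an actual Nutch install), the script below builds both in a scratch directory. The directory names nutch-0.7 and nutch-0.8 and the file name seed.txt are made up for illustration; only the name urls and the seed URL come from the question. Since the crawl command passes urls as a relative path, I am assuming it should sit in whatever directory you run bin/nutch from.

```shell
#!/bin/sh
set -eu

# Scratch directory so the sketch does not touch a real Nutch install.
WORK=$(mktemp -d)

# Layout A (0.7x): "urls" is a plain file with one start URL per line.
mkdir "$WORK/nutch-0.7"
printf '%s\n' 'http://lucene.apache.org/nutch/' > "$WORK/nutch-0.7/urls"

# Layout B (newer releases): "urls" is a directory holding one or more
# list files. The name seed.txt is an arbitrary choice for this example.
mkdir -p "$WORK/nutch-0.8/urls"
printf '%s\n' 'http://lucene.apache.org/nutch/' > "$WORK/nutch-0.8/urls/seed.txt"

# Either way, the crawl itself would be started with the command from
# the question, run from the directory that contains "urls":
#   bin/nutch crawl urls crawl.test -depth 3 >& crawl.log

# Show what was created.
ls -R "$WORK"
```

If the crawl still picks up nothing with the right layout, conf/crawl-urlfilter.txt would be the next thing to check.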
> -----Original Message-----
> From: Steele, Aaron [mailto:[EMAIL PROTECTED]
> Sent: 2006-6-08 11:05
> To: [email protected]
> Subject: New User
>
> I am having a problem setting up nutch to just look at
> http://lucene.apache.org/nutch/
> I have everything I need installed and can get nutch to run but it is
> not picking up any urls. I am running the command:
> bin/nutch crawl urls crawl.test -depth 3 >& crawl.log
>
> But I don't think I have the urls part set up correctly. Is this a
> directory or a file? Should it be in root or in the bin dir?
> I do think I have the conf/crawl-urlfilter.txt set up right.
> Thanks for your help in advance.
>
>
> Thank You,
>
> Aaron Steele
> YRI Enterprise Solutions
> https://ris.yumnet.com
> w: 972.338.6862
> c: 817.401.0831
>
>
> -----Original Message-----
> From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 08, 2006 12:41 PM
> To: [email protected]
> Subject: Re: Secure Sites
>
> Steele, Aaron wrote:
> > Can Nutch support "Secure" sites or sites that have an entry login but
> > no pages level security?
> >
>
> You mean sites that are protected by so called "form-based
> authentication"? No, this is not supported out of the box. It is
> possible to implement it using protocol-httpclient, because it can
> submit forms and it can handle session cookies.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__||
> \| || | Embedded Unix, System Integration http://www.sigram.com
> Contact: info at sigram dot com
>
>
>
>
> This communication is confidential and may be legally
> privileged. If you are not the intended recipient, (i)
> please do not read or disclose to others, (ii) please notify
> the sender by reply mail, and (iii) please delete this
> communication from your system. Failure to follow this
> process may be unlawful. Thank you for your cooperation.
>
