Hello, I configured seed.txt with the http://example.com.br site. This site has an authentication session at http://example.com.br/login. I created a rule in httpclient-auth.xml as follows:

    <auth-configuration>
      <credentials username="user" password="1111">
        <authscope host="186.xxx.161.xxx" port="80" realm="login"/>
      </credentials>
    </auth-configuration>

First, how can I make sure that Nutch actually used the authentication? Second, how can I fetch the whole site?

Thanks!

On Tue, Oct 15, 2013 at 1:23 AM, Talat UYARER <[email protected]> wrote:
> Hi Diego,
>
> First question:
> The db.ignore.external.links property is correct for staying in the domain.
>
> Second question:
> If you need authentication, you should use protocol-httpclient instead of
> protocol-http. You should change plugin.includes, and you should add the
>
> <property>
>   <name>http.auth.file</name>
>   <value>httpclient-auth.xml</value>
>   <description></description>
> </property>
>
> property to your nutch-site.xml. httpclient-auth.xml is your auth
> configuration file, and you can add your auth configuration there. You can
> see some examples in this file's comment lines.
>
> Talat
>
>
> On 14-10-2013 23:09, Diego Bonesso wrote:
>
>> Hello, I have two questions. I'm using Nutch 2.2. I put two URLs in
>> seed.txt. In the /conf dir, in nutch-site.xml, I created a property
>> db.ignore.external.links with the value true. First question: will my job
>> stay only within the two URLs' domains? Second, for the second URL I have
>> to authenticate; how can I configure this? The auth URL is something like
>> http://www.domain.com/login. Thanks a lot.
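Here is a possible revision of the httpclient-auth.xml rule above. It is a sketch under assumptions, not a confirmed fix. First, it assumes that protocol-httpclient compares the authscope host with the host name in the URLs it fetches. If so, an IP address would not match pages fetched as example.com.br. Second, the realm attribute has to equal the realm the server sends in its WWW-Authenticate challenge, which is not necessarily "login". Omitting realm is assumed to offer the credentials to any realm on that host and port. Also keep in mind that this file configures HTTP-level authentication (Basic, Digest, NTLM). If /login is an ordinary HTML form, these credentials will most likely never be used.

```xml
<auth-configuration>
  <credentials username="user" password="1111">
    <!-- Assumption: host is matched against the host name of the fetched
         URLs, so use the name from seed.txt rather than the IP address. -->
    <!-- Assumption: with no realm attribute, these credentials are offered
         to any realm on example.com.br:80. Add realm="..." once the
         server's actual realm is known. -->
    <authscope host="example.com.br" port="80"/>
  </credentials>
</auth-configuration>
```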
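For the nutch-site.xml side of Talat's advice, a minimal sketch could look like the one below. The plugin.includes value is an assumption modelled on a typical Nutch 2.x default with protocol-http swapped for protocol-httpclient. Copy the real value from your own conf/nutch-default.xml and change only that entry, because setting plugin.includes replaces the whole list.

```xml
<configuration>
  <!-- Keep the crawl on the seed hosts (Diego's first question) -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
  <!-- Assumed plugin list: start from your nutch-default.xml value and
       replace protocol-http with protocol-httpclient -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
  </property>
  <!-- Name of the credentials file, looked up on the classpath (normally conf/) -->
  <property>
    <name>http.auth.file</name>
    <value>httpclient-auth.xml</value>
  </property>
</configuration>
```

Whether the whole site gets fetched also depends on running enough crawl rounds. Each round only reaches pages linked from the ones fetched in the previous round.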

