Hello - answers inline.
M.

-----Original message-----
> From:Eyeris Rodriguez Rueda <[email protected]>
> Sent: Tuesday 25th October 2016 20:58
> To: [email protected]
> Subject: questions about hostdb
>
> Hi all.
> I have read about the new feature hostdb and it it very usefull, but i have a
> little questions about it.
> Question 1
> - Is posible to know the total of host in hostdb? maybe commands or using the
> API?
> it is a very important information that i want to know. So if not maybe
> somebody can tellme what changes i need to do to get this information.
Are you asking for the total number of URLs per host, or the total number of hosts? Both are available. The second column gives the sum of all URLs per host. The total number of hosts is equal to the number of reducer output records.

>
>
> Question 2
> - HostDB is generated automatic by nutch (like linkdb or crawldb) or i need
> always execute
> the command (bin/nutch updatehostdb -hostdb crawl/hostdb -crawldb
> crawl/crawldb) ?

I don't think hostdb is part of the crawl.sh script, but you can check by looking for that command in the script. If it is not there, just run it once a day or so. We run it every six hours via a simple crontab. I think we made the hostdb so it waits for crawldb locks to be released, so there should be no problem.

>
>
> Please any help will be appreciated.
>
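If you want both numbers from one pass over a plain-text dump of the hostdb, something like the sketch below might do. It assumes (this is my assumption, not a documented format) that the dump has one host per line, tab-separated, with the hostname in the first column and that host's URL count in the second. The function name `hostdb_summary` is made up for illustration; adjust the column numbers to whatever your dump actually contains.

```shell
#!/bin/sh
# Summarise a plain-text HostDB dump.
# ASSUMPTION: one host per line, tab-separated, hostname in column 1 and
# that host's URL count in column 2. This layout is hypothetical; check
# your own dump before relying on the numbers.
hostdb_summary() {
    awk -F '\t' '
        # Count every line that has at least two columns as one host
        # and add its second column to the running URL total.
        NF >= 2 { hosts++; urls += $2 }
        END { printf "hosts=%d urls=%d\n", hosts, urls }
    ' "$@"
}
```

Call it with the dump file (or files) as arguments, e.g. `hostdb_summary path/to/dump.txt`. The host count it prints should match the reducer output record count mentioned above if the dump really has one line per host.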

