On 13.12.2014 at 10:27, Shane Wood wrote:
> I am asking a few websites to allow me to index their site. What should
> they add to the robots.txt, and where do I get the exact name of my crawler?

If you are using nutch-1.9, there is a file conf/nutch-site.xml. In this config file there are properties defined, such as <name>http.agent.name</name> and the ones that follow it.
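
As a rough sketch, an entry in conf/nutch-site.xml could look like the following. The value "MyCrawler" is only a placeholder I made up for illustration, so replace it with whatever name you want your crawler to announce:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- "MyCrawler" is a hypothetical placeholder, not a Nutch default. -->
  <property>
    <name>http.agent.name</name>
    <value>MyCrawler</value>
  </property>
</configuration>
```

Assuming that name, a site owner would typically refer to it in a "User-agent:" line of their robots.txt.
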
This property is used for identifying your crawler. Have you already set it, and did your crawler not use it?

>
> Cheers.
> Shane
>

Regards,
Patrick

