Hi Shane,

They get it from the http.agent.* properties in your nutch-default.xml or your nutch-site.xml. You give your crawler an identifying name, description, URL, email and version.
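
As a rough sketch, the overrides in nutch-site.xml could look something like the following. Every value here is a made-up placeholder (the crawler name, URL and email address are hypothetical), so substitute your own:

```xml
<?xml version="1.0"?>
<!-- Sketch of conf/nutch-site.xml; all values below are hypothetical placeholders. -->
<configuration>
  <property>
    <!-- Short name that identifies the crawler; required before Nutch will fetch. -->
    <name>http.agent.name</name>
    <value>ExampleCrawler</value>
  </property>
  <property>
    <!-- Free-text description of what the crawler is for. -->
    <name>http.agent.description</name>
    <value>Example research crawler</value>
  </property>
  <property>
    <!-- Page where site owners can read about the crawler. -->
    <name>http.agent.url</name>
    <value>http://www.example.com/crawler.html</value>
  </property>
  <property>
    <!-- Contact address for site owners. -->
    <name>http.agent.email</name>
    <value>crawler at example dot com</value>
  </property>
  <property>
    <!-- Version string reported alongside the name. -->
    <name>http.agent.version</name>
    <value>1.0</value>
  </property>
</configuration>
```

With a setup like this, the name the sites would put on the User-agent line of their robots.txt should be the http.agent.name value (ExampleCrawler in this sketch), followed by whatever Allow or Disallow rules they want to apply to you.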

Cheers!
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Shane Wood <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Saturday, December 13, 2014 at 1:27 AM
To: "[email protected]" <[email protected]>
Subject: question about robots.txt

>I am asking a few websites to allow me to index there site, what you
>they add to the robots.txt and where do i get the exact name of my
>crawler.
>
>Cheers.
>Shane

