Thanks for the help, Julien. I'll just copy the files to the hadoop conf directory for now, while it is a single node.
If I use the job file, do I have to have the nutch package on each node in the cluster, or just on the master node? I'm also curious whether it would be possible or practical to declare the NUTCH_CONF_DIR in a nutch-env.sh file like hadoop uses, or somewhere in the nutch script.

Thanks again.

~Jason

On Mon, Jun 13, 2011 at 4:03 PM, Julien Nioche <[email protected]> wrote:

> Hi Jason,
>
> If you have hadoop running independently from Nutch you should use
> runtime/deploy/bin. The conf files can go directly in the hadoop/conf dir
> or in the Nutch job, which you will need to regenerate with 'ant job' so
> that it reflects the changes you made in NUTCH/conf
>
> Julien
>
> On 13 June 2011 11:59, Jason Stubblefield <[email protected]> wrote:
>
> > Update: The nutch configuration files need to go in the hadoop conf
> > file.
> >
> > Maybe someone could recommend some best practices regarding the file
> > structure? Should all the nutch config files simply be copied to the
> > hadoop conf directory? Currently I have:
> >
> > /webcrawler/hadoop
> > /webcrawler/nutch
> >
> > I guess I'm a bit confused because 1.3 didn't come bundled with hadoop.
> >
> > Thanks!
> >
> > ~Jason
> >
> > On Mon, Jun 13, 2011 at 12:07 PM, Jason Stubblefield <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I'm trying to fetch a segment using hadoop on a single node with nutch
> > > 1.3. I seem to be struggling with the new runtime configuration. I have
> > > hadoop up and running and have successfully run the readdb -stats
> > > command and generated a segment, but when I run:
> > >
> > > runtime/deploy/bin/nutch fetch crawl/segments/20110613103305 -threads 8
> > >
> > > I get an error message: No agents listed in 'http.agent.name' property
> > >
> > > I noticed there are now 2 conf files, one at trunk/conf and the other
> > > at trunk/runtime/local/conf, and have updated both of them with my
> > > nutch-site.xml file; both have a properly configured http.agent.name.
> > >
> > > Do I need to explicitly declare the conf directory somewhere? Do I need
> > > to move the conf file to trunk/runtime/deploy/conf, or put it somewhere
> > > else? What am I missing?
> > >
> > > Thanks in advance!
> > >
> > > ~Jason
> > >
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
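
For anyone following along, the copy step discussed in this thread might look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name copy_nutch_conf is hypothetical, copying just nutch-site.xml (the file that holds http.agent.name) is an assumption about which file matters, and the /webcrawler/nutch/conf path in the example call is guessed from the directory layout mentioned above.

```shell
#!/bin/sh
# Sketch only: copy_nutch_conf is a hypothetical helper, not part of Nutch or Hadoop.
# Assumption: nutch-site.xml (which holds http.agent.name) is the file to copy.

# Usage: copy_nutch_conf <nutch_conf_dir> <hadoop_conf_dir>
copy_nutch_conf() {
    src=$1
    dst=$2
    # Refuse to continue if the Nutch site config is missing.
    if [ ! -f "$src/nutch-site.xml" ]; then
        echo "nutch-site.xml not found in: $src" >&2
        return 1
    fi
    # Refuse to continue if the Hadoop conf directory does not exist.
    if [ ! -d "$dst" ]; then
        echo "hadoop conf directory not found: $dst" >&2
        return 1
    fi
    cp "$src/nutch-site.xml" "$dst/"
}

# Example call, with assumed paths based on the layout in this thread:
#   copy_nutch_conf /webcrawler/nutch/conf /webcrawler/hadoop/conf
#
# Alternative from Julien's reply: leave hadoop/conf alone and rebuild the
# job file from the Nutch source directory so it picks up NUTCH/conf:
#   ant job
```

The function only copies the one file and fails loudly if either directory is not what it expects, so it should be safe to rerun after each change to nutch-site.xml.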

