When the tasktracker starts a task, it reads the config files in the following order: nutch-default.xml, mapred-default.xml, job.xml, nutch-site.xml. Except for job.xml, all of these are the local files in the tasktracker's conf directory. The job.xml is generated for each job by the tool you use, e.g. Generator or Fetcher, and is distributed to all tasktrackers.

So in your example with http.content.limit, the new value won't be forwarded to the tasktrackers, because the Fetcher class doesn't put this property into the job.xml.
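To make the lookup order concrete, here is a minimal Python sketch (not Nutch code). It assumes that a file read later overrides a file read earlier, and the property values are made up for illustration.

```python
# Order in which the tasktracker reads its config files, as described above.
LOAD_ORDER = ["nutch-default.xml", "mapred-default.xml", "job.xml", "nutch-site.xml"]


def effective_config(files):
    """Merge per-file property dicts in LOAD_ORDER.

    Assumption: a file read later overrides the same property
    from a file read earlier.
    """
    merged = {}
    for name in LOAD_ORDER:
        merged.update(files.get(name, {}))
    return merged


# Hypothetical contents of the files as seen by one tasktracker.
tasktracker_files = {
    "nutch-default.xml": {"http.content.limit": "65536"},
    "mapred-default.xml": {},
    # The fetcher does not write http.content.limit into job.xml, so a
    # value changed only on the submitting machine never shows up here.
    "job.xml": {},
    "nutch-site.xml": {},
}

conf = effective_config(tasktracker_files)
print(conf["http.content.limit"])  # the tasktracker's local value
```

Under these assumptions, the only way to change the value a task sees is to change it in the tasktracker's own conf directory.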

By the way, if you have changed any mapred-related settings in the nutch-site.xml, you should move them to the mapred-default.xml. I made this mistake, and the generator no longer worked as expected.
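If you want to check which settings are affected, a small Python sketch like the following could list them. It assumes that "mapred related" means property names starting with "mapred.", and that the file uses the usual property/name/value layout; the sample content and its values are hypothetical.

```python
import xml.etree.ElementTree as ET


def mapred_properties(xml_text, prefix="mapred."):
    """Return {name: value} for every property whose name starts with prefix.

    The root element name is not checked; only <property> elements with
    <name> and <value> children are inspected.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(prefix):
            found[name] = (prop.findtext("value") or "").strip()
    return found


# Hypothetical nutch-site.xml content, for illustration only.
sample_site = """<nutch-conf>
  <property><name>http.content.limit</name><value>65536</value></property>
  <property><name>mapred.map.tasks</name><value>4</value></property>
</nutch-conf>"""

# Candidates to move into mapred-default.xml.
to_move = mapred_properties(sample_site)
print(to_move)
```

To run it against a real file, read the file's text and pass it to mapred_properties() instead of the sample string.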

best regards,
Dominik

Raghavendra Prabhu wrote:
Hi, I have one question.


When you run a job tracker and a task tracker, the task tracker is the
machine which does the indexing.

The task tracker points to the job tracker in its conf files (nutch-default
and nutch-site).

My question is: while indexing, does the task tracker look at its own conf
files, or does it get the details from the job tracker?

I would summarize it by asking whether things like http.content.limit are
properties of the job tracker (assuming http.content.limit differs between
the conf files of the job tracker and the task tracker).

I am asking this because, if I update the job tracker to new conf values, do
I need to update the task tracker as well?

(I was under the impression that, since the NutchConf is also passed along
with the job, it would be used when sending the task.)

I hope I have stated the problem clearly. If I am being ambiguous, I can
rephrase it.

Does the NutchConf.get() value differ for each machine (task tracker), and
which property will be taken?

Assuming that the job tracker configuration is taken, how does each task
tracker read its configuration file?


Rgds
Prabhu




