I'm using Nutch 0.8.1 (and therefore the bundled hadoop-0.4.0-patched.jar) on 2 servers, with the namenode and the jobtracker on the first one. My hadoop-site.xml contains:

mapred.map.tasks=2
mapred.reduce.tasks=2
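Written out in the XML property layout that hadoop-site.xml uses, the two settings above would look roughly like the sketch below. Only the two property names and their values come from this message; the rest of the file (filesystem and jobtracker addresses, and so on) is omitted here and not implied.

```xml
<?xml version="1.0"?>
<!-- Sketch of the relevant part of hadoop-site.xml; other properties omitted. -->
<configuration>
  <!-- Number of map tasks per job, as stated above -->
  <property>
    <name>mapred.map.tasks</name>
    <value>2</value>
  </property>
  <!-- Number of reduce tasks per job, as stated above -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
</configuration>
```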
So I would expect one node of each type to be running on each server:

server1
- data-node
- task-tracker
server2
- data-node
- task-tracker

But sometimes the tasks are not spread across the two servers and all run on only one of them, like:

server1
- data-node
server2
- data-node
- task-tracker
- task-tracker

When this happens for the "fetch" task, performance really suffers, because only one server is working!

. Can't we manually specify how the tasktracker nodes are allocated to each server?
. Why does Hadoop behave in this strange way?
. I assume that it allocates the task nodes dynamically according to the load on the servers, so should I set mapred.map.tasks = 2 * number of servers in my parameters?
