I'm using Nutch 0.8.1 (so with the included hadoop-0.4.0-patched.jar)
on 2 servers, with the namenode and the jobtracker on the first one.
My hadoop-site.xml contains:
mapred.map.tasks=2
mapred.reduce.tasks=2
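
In the file's actual XML form, those two settings would look roughly like this (a sketch: only the two property names and values above are given; anything else in the file is omitted):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- number of map tasks per job, as stated above -->
  <property>
    <name>mapred.map.tasks</name>
    <value>2</value>
  </property>
  <!-- number of reduce tasks per job, as stated above -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
</configuration>
```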

So I should have one node of each type running on each server?

server1
- data-node
- task-tracker

server2
- data-node
- task-tracker

But sometimes the tasks are not spread across the two servers and all run on
only one, like:

server1
- data-node

server2
- data-node
- task-tracker
- task-tracker

and when this happens for the "fetch" task, it's a real loss of performance
to have just one server working!


. Can't we manually specify the allocation of the tasktracker nodes for each
server?
. Why does Hadoop behave in this strange way?
. I assume that it decides to allocate the task nodes dynamically according to
the load of the servers, so should I put in my params:
mapred.map.tasks=2*number-of-servers ?
-- 
View this message in context: 
http://www.nabble.com/hadoop-and-nutch-%3A-task-load-allocation-problem-tf3751447.html#a10601144
Sent from the Nutch - User mailing list archive at Nabble.com.

