Hi,

Great. Thanks for the tips.

I've tried the following startup sequences:

 * Start the NameNode. Wait until its CPU usage drops to 0, then wait
   2 more minutes. Start all DataNodes.
 * Start the NameNode. Wait until its CPU usage drops to 0, then wait
   2 more minutes. Start the DataNodes one at a time, 10 minutes apart.
 * Start all DataNodes. Wait 10 minutes. Start the NameNode.

In every case, I ran into the same "Problem making IPC call".

I set the number of NameNode threads to 100, with no effect.

I suspect the biggest issue is block replication. The DataNode logs
are full of lines like this:

 050308 121928 Replicated block blk_-9167778052227947819 to vlex-cluster-6/192.168.166.121:7000

Other odd things in the DataNode logs are:

 java.io.IOException: Block blk_-9157092366090071006 is valid, and cannot be written to.

(thousands of them)

And in the NameNode, we see periodic bursts of:

 050308 131028 Lost heartbeat for vlex-cluster-3:7000
 050308 131028 Lost heartbeat for vlex-cluster-4:7000
 050308 131029 Lost heartbeat for vlex-cluster-7:7000
 050308 131029 Lost heartbeat for vlex-cluster-8:7000
 050308 131030 Lost heartbeat for vlex-cluster-9:7000
 050308 131030 Lost heartbeat for vlex-cluster-2:7000
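To confirm that the heartbeat losses really arrive in bursts, and to see whether particular hosts dominate, the NameNode log can be tallied per host and per minute. Below is a minimal Python sketch. The line format (six-digit date, six-digit time, then the message) is an assumption inferred from the excerpts above, and `lost_heartbeats` is a hypothetical helper, not part of Nutch.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the NameNode excerpts above:
#   "<date> <time> Lost heartbeat for <host>:<port>"
LOST_HEARTBEAT = re.compile(
    r"^\s*(\d{6}) (\d{6}) Lost heartbeat for ([^\s:]+):(\d+)\s*$")


def lost_heartbeats(lines):
    """Return (per_host, per_minute) Counters of 'Lost heartbeat' lines.

    Lines that do not match the assumed format are ignored.
    """
    per_host = Counter()
    per_minute = Counter()
    for line in lines:
        m = LOST_HEARTBEAT.match(line)
        if not m:
            continue
        date, time, host, _port = m.groups()
        per_host[host] += 1
        # Bucket by date plus the first four time digits, so a burst
        # shows up as one large bucket.
        per_minute[f"{date} {time[:4]}"] += 1
    return per_host, per_minute


sample = [
    " 050308 131028 Lost heartbeat for vlex-cluster-3:7000",
    " 050308 131028 Lost heartbeat for vlex-cluster-4:7000",
    " 050308 131029 Lost heartbeat for vlex-cluster-7:7000",
    " 050308 131029 Lost heartbeat for vlex-cluster-8:7000",
    " 050308 131030 Lost heartbeat for vlex-cluster-9:7000",
    " 050308 131030 Lost heartbeat for vlex-cluster-2:7000",
]
hosts, minutes = lost_heartbeats(sample)
print(hosts)
print(minutes)
```

Fed the six lines quoted above, it counts one loss per host, all in the single `050308 1310` bucket. Run over a full log, it would show how often such bursts recur.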

Afterwards, of course, the remaining DataNodes try desperately to
replicate their data:

 050308 131041 Pending transfer from vlex-cluster-5:7000 to 3 destinations
 ...

The "Lost heartbeat" error suggests connectivity problems, but both
"ping vlex-cluster-4" and "telnet vlex-cluster-4 7000" from the
NameNode consistently succeed.
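The telnet check can be repeated automatically to record how long each TCP connect takes and whether any attempt fails. Below is a minimal Python sketch; `probe` is a hypothetical helper, and the host, port, attempt count, and timeout values are illustrative assumptions. Note that a successful connect only shows that the port accepts TCP connections; it does not show that the server behind it answers requests in time.

```python
import socket
import time


def probe(host, port, attempts=5, timeout=2.0, pause=1.0):
    """Try a TCP connect `attempts` times.

    Returns a list of connect times in seconds, with None for attempts
    that were refused or timed out.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            # Open and immediately close the connection; only the
            # handshake time is measured.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:  # covers refused connections and timeouts
            results.append(None)
        if i < attempts - 1:
            time.sleep(pause)
    return results
```

For example, `probe("vlex-cluster-4", 7000, attempts=60)` run from the NameNode during one of the "Lost heartbeat" bursts would show whether connects slow down or fail exactly when heartbeats are lost.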

By the way, the NameNode startup time doesn't seem to be related to
replaying the log, since "edits" is currently an empty file.

Summing up: I think the best thing I can do is wait for the patch
that enables throttling of block replication, and then rerun the tests.

Thanks again for your responsiveness,


angel


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
