Hi,

Great. Thanks for the tips.
I've tried the following startup sequences:

* Start NameNode. Wait until CPU goes to 0. Wait 2 extra minutes. Start all DataNodes.
* Start NameNode. Wait until CPU goes to 0. Wait 2 extra minutes. Start each DataNode with a 10-minute pause between them.
* Start all DataNodes. Wait 10 min. Start NameNode.

In every case, I ran into the same "Problem making IPC call". I changed the number of threads to 100 in NameNode, without any effect.

I would say that the biggest issue is the replication of blocks. We are seeing tons of lines like this in the DataNode logs:

050308 121928 Replicated block blk_-9167778052227947819 to vlex-cluster-6/192.168.166.121:7000

Other odd things in the DataNode logs are:

java.io.IOException: Block blk_-9157092366090071006 is valid, and cannot be written to.

(thousands of them)

And in the NameNode, we see periodic bursts of:

050308 131028 Lost heartbeat for vlex-cluster-3:7000
050308 131028 Lost heartbeat for vlex-cluster-4:7000
050308 131029 Lost heartbeat for vlex-cluster-7:7000
050308 131029 Lost heartbeat for vlex-cluster-8:7000
050308 131030 Lost heartbeat for vlex-cluster-9:7000
050308 131030 Lost heartbeat for vlex-cluster-2:7000

Afterwards, of course, the remaining DataNodes try desperately to replicate their data:

050308 131041 Pending transfer from vlex-cluster-5:7000 to 3 destinations
...

The "Lost heartbeat" error would indicate connectivity problems, but both "ping vlex-cluster-4" and "telnet vlex-cluster-4 7000" from the NameNode work consistently well.

By the way, the NameNode startup time doesn't seem related to replaying the log, since right now "edits" is an empty file.

Summing up: I think that the best thing I can do is wait for the patch that enables throttling of block replication, and then rerun the tests.

Thanks again for your responsiveness,
angel
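P.S. In case it helps when comparing runs, here is a rough sketch of how the "Lost heartbeat" bursts could be tallied from a NameNode log. It is only an illustration: the line shape is assumed from the excerpts quoted in this mail, and the function name lost_heartbeats and the per-minute bucketing are my own choices, not anything provided by Nutch.

```python
import re
from collections import Counter

# Assumed line shape, copied from the log excerpts in this mail:
#   "<yymmdd> <hhmmss> Lost heartbeat for <host>:<port>"
LOST = re.compile(r"^(\d{6}) (\d{6}) Lost heartbeat for ([\w.-]+):(\d+)$")


def lost_heartbeats(lines):
    """Count 'Lost heartbeat' lines per minute and per DataNode host.

    Hypothetical helper: returns two Counters, (per_minute, per_node).
    """
    per_minute = Counter()
    per_node = Counter()
    for line in lines:
        match = LOST.match(line.strip())
        if not match:
            continue  # skip every other kind of log line
        date, time, host, _port = match.groups()
        per_minute[(date, time[:4])] += 1  # bucket by yymmdd + hhmm
        per_node[host] += 1
    return per_minute, per_node


# Sample input: the lines quoted in this mail, nothing more.
sample = [
    "050308 131028 Lost heartbeat for vlex-cluster-3:7000",
    "050308 131028 Lost heartbeat for vlex-cluster-4:7000",
    "050308 131029 Lost heartbeat for vlex-cluster-7:7000",
    "050308 131029 Lost heartbeat for vlex-cluster-8:7000",
    "050308 131030 Lost heartbeat for vlex-cluster-9:7000",
    "050308 131030 Lost heartbeat for vlex-cluster-2:7000",
    "050308 131041 Pending transfer from vlex-cluster-5:7000 to 3 destinations",
]

per_minute, per_node = lost_heartbeats(sample)
for (date, minute), count in sorted(per_minute.items()):
    print(date, minute, count)
```

Run against a full log, a large count in a single minute spread over many hosts would point at the NameNode side (for example, being too busy to process heartbeats) rather than at one flaky DataNode.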
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
