Andrzej Bialecki wrote:
I have an idea - you remember the old issue of a MapFile's "index" being corrupted if Fetcher was interrupted? Random accesses to MapFiles would take ages in that case. Does calculating splits involve random access to the segment's MapFiles?

No, calculating splits just lists directories and then gets the size of each file. So this could point to an NDFS name node performance problem, or an RPC performance problem. Each file size request is an RPC, and there could be hundreds or even thousands of input files, but even thousands of RPCs shouldn't take 14 minutes.
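
To make the arithmetic concrete, here is a rough sketch in Python (not Nutch code). It walks a directory tree and asks for the size of each file, with local filesystem calls standing in for the NDFS RPCs. It then works out how slow each request would have to be to add up to 14 minutes. The function names and the file counts are assumptions for illustration only.

```python
import os
import tempfile


def list_input_sizes(root):
    """Walk root and look up the size of every file found.

    Returns (sizes, lookups), where sizes is a list of (path, size) pairs
    and lookups is the number of size requests made. Each os.path.getsize
    call stands in for one file-size RPC to the name node (an assumption;
    this is a local filesystem, not NDFS).
    """
    sizes = []
    lookups = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            sizes.append((path, os.path.getsize(path)))
            lookups += 1
    return sizes, lookups


def implied_seconds_per_lookup(total_seconds, lookups):
    """Average time per request if lookups alone accounted for total_seconds."""
    return total_seconds / lookups


if __name__ == "__main__":
    # Build a small throwaway tree so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        for i in range(5):
            with open(os.path.join(root, "part-%05d" % i), "wb") as f:
                f.write(b"x" * (i + 1))
        sizes, lookups = list_input_sizes(root)
        print("files:", lookups, "total bytes:", sum(s for _, s in sizes))

    # Hypothetical file counts; 14 minutes is the figure from this thread.
    for n in (100, 1000, 5000):
        print(n, "files ->", implied_seconds_per_lookup(14 * 60, n), "s per request")
```

With a thousand input files, 14 minutes works out to most of a second per request, which is far slower than a simple size lookup ought to be. That is why the name node or the RPC layer looks suspect.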

Doug


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
