Chris Schneider wrote:
I was unable to even get the indexing phase started; I would get a timeout right at the 
beginning. I tried increasing the ipc.client.timeout from 5 minutes to 10 minutes, but 
that didn't help. In desperation, I increased it to 30 minutes and went to walk the dogs. 
As it turned out, it apparently took 14 minutes for it to "compute the splits". 
The job is still running (34% complete). Thus, it does seem like Doug was right about 
this being the problem.

I have no idea why this takes so long. We should profile this operation to figure out what's going on, because it shouldn't take anywhere near that long. It should be easy to write a simple program that constructs a JobConf and InputFormat like those used in this job and calls getSplits(). Then profile it as a standalone program to see where the time is going. You probably don't want to profile something that takes 14 minutes, so perhaps run it on a subset of the input.
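
A rough sketch of such a harness, using only the Java standard library so it runs on its own. Every name here (SplitTimer, subset, timeMillis, fakeSplits, the segment paths) is hypothetical. In particular, fakeSplits is only a stand-in: a real version would construct the JobConf and InputFormat used by the job and call getSplits() where the stand-in is called. The point is the shape of the experiment, which is to time the split computation on growing subsets of the input and see how the cost scales.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class SplitTimer {
    // Keep only the first `limit` inputs so a profiling run finishes quickly.
    public static <T> List<T> subset(List<T> inputs, int limit) {
        int n = Math.min(Math.max(limit, 0), inputs.size());
        return new ArrayList<>(inputs.subList(0, n));
    }

    // Run the split computation once and return elapsed wall-clock milliseconds.
    public static <T, S> long timeMillis(Function<List<T>, List<S>> computeSplits,
                                         List<T> inputs) {
        long start = System.nanoTime();
        computeSplits.apply(inputs);
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Hypothetical stand-in for the real getSplits() call: one "split" per path.
    // Replace the body with the actual JobConf/InputFormat setup when profiling.
    public static List<String> fakeSplits(List<String> paths) {
        List<String> splits = new ArrayList<>();
        for (String p : paths) {
            splits.add(p + ":0");
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical input paths; substitute the job's real input directories.
        List<String> all = Arrays.asList(
                "segments/a", "segments/b", "segments/c", "segments/d");

        // Time the computation on progressively larger subsets of the input.
        for (int size = 1; size <= all.size(); size *= 2) {
            List<String> part = subset(all, size);
            long ms = timeMillis(SplitTimer::fakeSplits, part);
            System.out.println(part.size() + " inputs: " + ms + " ms");
        }
    }
}
```

If the time grows much faster than the number of inputs, that points at the split computation itself rather than at the size of the data. Running the same program under a profiler on one of the smaller subsets should then show where the time goes.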

Doug
