Robin Haswell wrote:
On Fri, 2006-12-08 at 11:22 +0100, Andrzej Bialecki wrote:
This should be shown in the logs: the map xx% and reduce xx% progress is
printed there.
The reduce phase consists of copying the map outputs (reduce 0-33%), then
sorting them (33-66%), which is where most of the CPU time, disk IO and
wall-clock time is spent, and finally copying the sorted outputs to form
the final result (66-100%).
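To make the three-way split concrete, here is a minimal sketch that maps a "reduce xx%" value from the logs to the sub-phase described above. The class and method names are hypothetical, and the equal-thirds boundaries are an assumption taken from the rough 0-33% / 33-66% / 66-100% split, not read from Hadoop itself.

```java
// Hypothetical helper: maps a "reduce xx%" progress value to the
// sub-phase of the reduce task described above.
public class ReducePhase {

    // COPY: fetching map outputs; SORT: merging/sorting them;
    // REDUCE: final phase that produces the output.
    enum Phase { COPY, SORT, REDUCE }

    // Assumes each sub-phase occupies an equal third of the 0-100% range.
    static Phase phaseFor(double reducePercent) {
        if (reducePercent < 0 || reducePercent > 100) {
            throw new IllegalArgumentException("percent out of range: " + reducePercent);
        }
        if (reducePercent < 100.0 / 3) {
            return Phase.COPY;
        } else if (reducePercent < 200.0 / 3) {
            return Phase.SORT;
        }
        return Phase.REDUCE;
    }

    public static void main(String[] args) {
        for (double p : new double[] {10, 33, 50, 66, 90, 100}) {
            System.out.println("reduce " + p + "% -> " + phaseFor(p));
        }
    }
}
```

So a job that sits at "reduce 50%" for a long time is most likely busy sorting, which is expected to be the slow part.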
The last entries from hadoop.log are:
2006-12-07 16:34:50,547 INFO fetcher.Fetcher - fetching http://zut.languageskills.co.uk/press.html
2006-12-07 16:34:50,582 INFO fetcher.Fetcher - fetching http://zwartelijst-vliegtuigen.capita-pc.co.uk/
2006-12-07 16:34:50,614 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/
2006-12-07 16:34:51,005 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/comp/
2006-12-07 16:34:51,582 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/contact/
2006-12-07 16:34:51,584 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/courses/
2006-12-07 16:34:51,586 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/essinfo/
2006-12-07 16:34:51,740 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/funds/
2006-12-07 16:34:51,816 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/intro/
2006-12-07 16:34:51,876 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/javascript.js
2006-12-07 16:34:51,934 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/js/CreateTrail.js
2006-12-07 16:34:52,186 INFO fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/societies/
This pretty much corresponds to my stdout output. Here's a strace:
No lines like "INFO map 100%"? Strange.
Process 18245 attached - interrupt to quit
clock_gettime(CLOCK_REALTIME, {1165573641, 880921000}) = 0
[...]
That's the process that's consuming loads of CPU.
What do you think?
I think that instead of running strace you should get a thread dump ;)
strace cannot tell you what each JVM thread is doing.
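The usual way to get a thread dump of a running JVM from the outside is to send it SIGQUIT (kill -QUIT <pid>), which makes the JVM print every thread's name, state and stack trace to its stdout; the JDK's jstack tool gives the same information. The sketch below only illustrates what such a dump contains, using the standard java.lang.management API from inside a JVM (Java 6 or later); the class and method names are hypothetical, and it can only see threads of the JVM it runs in, so it is not a substitute for SIGQUIT on an already stuck Fetcher.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical illustration: an in-process thread dump listing each
// thread's name, state and top stack frames.
public class ThreadDump {

    static String dump(int maxFrames) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers): we skip lock info here.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            StackTraceElement[] frames = info.getStackTrace();
            // Print only the innermost frames: they show what the thread is doing now.
            for (int i = 0; i < Math.min(maxFrames, frames.length); i++) {
                sb.append("    at ").append(frames[i]).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump(5));
    }
}
```

In a real dump of a busy Nutch job, the threads shown as RUNNABLE with the same top frames across several dumps taken a few seconds apart are the ones burning the CPU.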
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com