Robin Haswell wrote:
On Fri, 2006-12-08 at 11:22 +0100, Andrzej Bialecki wrote:
This should be shown in the logs: the map xx% and reduce xx% progress is printed there.

The reduce phase consists of three steps: copying map outputs (reduce 0-33%); sorting them (33%-66%), which is where most of the CPU, disk I/O and time is spent; and finally copying the sorted outputs to form the final result (the remaining 66%-100%).
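
To make the percentages above concrete, here is a minimal sketch that maps a reported reduce percentage to the step it falls in. The class and method names are hypothetical (they are not part of Hadoop or Nutch), and the 33/66 thresholds are taken from the description above rather than from the Hadoop source.

```java
// Hypothetical helper for reading "reduce xx%" progress values.
// Not part of Hadoop or Nutch; thresholds follow the description above.
class ReducePhase {
    // Return the reduce step that a given progress percentage falls in.
    static String phaseFor(double percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]: " + percent);
        }
        if (percent < 33) {
            return "copy";   // fetching map outputs
        }
        if (percent < 66) {
            return "sort";   // the CPU- and disk-heavy step
        }
        return "reduce";     // writing the final result
    }

    public static void main(String[] args) {
        System.out.println(phaseFor(45));
    }
}
```

A job that sits between 33% and 66% for a long time is therefore most likely sorting, which would fit heavy CPU use.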

The last entries from hadoop.log are:

2006-12-07 16:34:50,547 INFO  fetcher.Fetcher - fetching http://zut.languageskills.co.uk/press.html
2006-12-07 16:34:50,582 INFO  fetcher.Fetcher - fetching http://zwartelijst-vliegtuigen.capita-pc.co.uk/
2006-12-07 16:34:50,614 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/
2006-12-07 16:34:51,005 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/comp/
2006-12-07 16:34:51,582 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/contact/
2006-12-07 16:34:51,584 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/courses/
2006-12-07 16:34:51,586 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/essinfo/
2006-12-07 16:34:51,740 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/funds/
2006-12-07 16:34:51,816 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/intro/
2006-12-07 16:34:51,876 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/javascript.js
2006-12-07 16:34:51,934 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/js/CreateTrail.js
2006-12-07 16:34:52,186 INFO  fetcher.Fetcher - fetching http://zzz.grad.ucl.ac.uk/societies/


This pretty much corresponds to my stdout output. Here's a strace:

No lines like "INFO map 100%" ? Strange.

Process 18245 attached - interrupt to quit
clock_gettime(CLOCK_REALTIME, {1165573641, 880921000}) = 0
[...]
That's from the process that is consuming loads of CPU.

What do you think?

I think that instead of running strace you should get a thread dump ;) strace cannot tell you what each JVM thread is doing.
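
For a JVM that is already running, the usual way to get a thread dump on Unix is to send it SIGQUIT (for example `kill -QUIT 18245`, using the PID from the strace above), which makes the JVM print every thread's stack to its standard output; the JDK's `jstack` tool is an alternative. As a rough illustration of what such a dump contains, the sketch below builds one from inside a JVM with `Thread.getAllStackTraces()`. The `ThreadDump` class is hypothetical and not part of Nutch or Hadoop, and it only sees threads in its own JVM.

```java
import java.util.Map;

// Hypothetical helper, not part of Nutch or Hadoop.
// It can only report on threads of the JVM it runs in.
class ThreadDump {
    // Build a plain-text dump of every live thread and its stack.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Header line: thread name and current state (RUNNABLE, WAITING, ...)
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            // One line per stack frame, innermost first
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

In a dump like this, the threads in state RUNNABLE with deep stacks are the ones to look at first when a process is burning CPU.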

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

