Tomislav Poljak wrote:
Hi,
I have noticed that Nutch, while parsing segment data, updating,
indexing, or doing other CPU-demanding operations, uses only one CPU (core).
Actually it uses both, but alternately: when one CPU is at 100%, the other
is at 1%, and then they switch (both are never at 100% at once). For
example, when parsing segment data, the Nutch java process uses 100% CPU
(according to top) for a long time, but when I look at the CPU history (with
System Monitor) I see that only one CPU is at 100% while the other is barely
used (and then they switch). Is it possible to configure Nutch to use
both CPUs (cores) simultaneously to get maximum performance (more threads
for parse, update, index, merge)? I am also curious why Nutch uses so
much CPU time when parsing fetched data while there is absolutely no IO
(no disk reads or writes).

It's likely that this problem is related to the OS scheduler, or to the way this JVM implementation uses kernel threads. Perhaps the OS offers a way to select how application threads are mapped to kernel threads? (FreeBSD does; I'm not that familiar with Linux.)
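
For illustration only, here is a minimal Java sketch of the general effect. It is not Nutch or Hadoop code: the class `CoreUsageSketch` and the methods `burn` and `runParallel` are hypothetical names invented for this example. The assumption is that one CPU-bound thread can keep at most one core busy at a time (and the scheduler may move it between cores, which would be consistent with the alternating pattern described), so using both cores at once needs at least two runnable threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoreUsageSketch {

    // Hypothetical CPU-bound stand-in for parsing one chunk of a segment.
    static long burn(long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += i % 7;
        }
        return acc;
    }

    // Runs `tasks` copies of burn() on a fixed pool of `threads` workers
    // and returns the sum of their results.
    static long runParallel(int threads, int tasks, long iterations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                Callable<Long> task = () -> burn(iterations);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that task has finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        long iterations = 200_000_000L;

        long start = System.nanoTime();
        runParallel(1, cores, iterations); // one worker: one core busy at a time
        long oneThreadMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        runParallel(cores, cores, iterations); // one worker per core
        long manyThreadsMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("cores=" + cores
                + " oneThreadMs=" + oneThreadMs
                + " manyThreadsMs=" + manyThreadsMs);
    }
}
```

Watching top or System Monitor while `main` runs should show the difference between the two phases; the actual timings depend on the machine and the JVM.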

Long periods of no IO during parsing are probably related to the fact that Hadoop uses internal buffers that are several MB in size.
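
As a rough sketch of why buffering hides IO, the following Java example counts how many writes actually reach an underlying sink. It uses the standard `java.io.BufferedOutputStream` as a stand-in, not Hadoop's own buffering code; `BufferedIoSketch`, `CountingSink`, and `writesReachingSink` are hypothetical names made up for this illustration.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedIoSketch {

    // Pretend "disk": only counts the write calls that reach it.
    static class CountingSink extends OutputStream {
        int writeCalls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            writeCalls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
            bytes += len;
        }
    }

    // Writes `records` records of `recordSize` bytes through a buffer of
    // `bufferSize` bytes and returns how many writes reached the sink.
    static int writesReachingSink(int bufferSize, int records, int recordSize)
            throws IOException {
        CountingSink sink = new CountingSink();
        byte[] record = new byte[recordSize];
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < records; i++) {
                out.write(record);
            }
        } // close() flushes whatever is still in the buffer
        return sink.writeCalls;
    }

    public static void main(String[] args) throws IOException {
        // Buffer larger than all the data: the sink is touched only on close.
        System.out.println("large buffer: " + writesReachingSink(1024, 10, 100));
        // Buffer smaller than one record: every record goes straight through.
        System.out.println("small buffer: " + writesReachingSink(64, 10, 100));
    }
}
```

With a buffer of several MB, a process can do a lot of CPU work before the buffer fills, so the disk stays idle for long stretches and then sees a burst of writes.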

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
