Hi,

I am trying to fully utilize a 12-core machine with 24 GB of memory when performing 
search queries. I observed that throughput does 
not scale linearly beyond 6 cores, so I am trying to use two Nutch processes 
instead of one. Although I pin each process to a different 
set of cores, I still cannot fully utilize the cores. 

I would like to ask whether it is straightforward to run two Nutch 
processes on the same node. 

When I run the following commands on two separate nodes (one command per node), 
each process utilizes its 4 cores, so the distributed version of
Nutch runs reasonably well.

$ taskset -c 0,2,4,6 bin/nutch server 8890
$ taskset -c 1,3,5,7 bin/nutch server 8891
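
To double-check that the pinning actually takes effect, something like the 
following could be used. It is only a sketch: the helper name allowed_cpus is 
made up for illustration, and it assumes a Linux /proc filesystem where 
/proc/<pid>/status contains a Cpus_allowed_list field.

```shell
# Hypothetical helper: print the CPU list a process is allowed to run on.
# $1 = path to a status file, e.g. /proc/<pid>/status (Linux only).
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "$1"
}

# Example: inspect the current shell, if /proc is available.
if [ -r "/proc/$$/status" ]; then
    allowed_cpus "/proc/$$/status"
fi
```

For a server started with the first taskset command above, the expected 
output would be the list 0,2,4,6 when pointed at that process's status file.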

When running the two commands on the same node, I observed an 8% IO-wait. IPtraf 
shows that the network is not saturated, so my
understanding is that I am IO-bound. Each process uses a 4 GB dataset. I would 
expect the datasets to be cached in memory (the 
disk cache), but it seems they are not.
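
For reference, the IO-wait share can be estimated from two samples of the 
aggregate "cpu" line in /proc/stat. The snippet below is a rough sketch, not 
necessarily how the 8% figure above was measured; the function name iowait_pct 
is made up, and it assumes the usual Linux field order (user, nice, system, 
idle, iowait, irq, softirq, steal).

```shell
# Hypothetical helper: IO-wait share (percent) between two "cpu ..." lines
# taken from /proc/stat. $1 = earlier sample, $2 = later sample.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        # Sum fields 2..9 (user..steal); field 6 is iowait.
        NR == 1 { for (i = 2; i <= NF && i <= 9; i++) t1 += $i; w1 = $6 }
        NR == 2 { for (i = 2; i <= NF && i <= 9; i++) t2 += $i; w2 = $6 }
        END {
            if (t2 > t1) printf "%.1f\n", 100 * (w2 - w1) / (t2 - t1)
            else print "0.0"
        }'
}

# Example: sample the live counters one second apart, if /proc is available.
if [ -r /proc/stat ]; then
    before=$(grep '^cpu ' /proc/stat)
    sleep 1
    after=$(grep '^cpu ' /proc/stat)
    iowait_pct "$before" "$after"
fi
```

Sampling while both processes serve queries, and again with only one running, 
would show whether the IO-wait appears only when the two share a node.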

Any thoughts on what may be causing the problem I am observing? 

Thanks in advance, 
Stavros.
