Hi, I am trying to fully utilize a 12-core machine with 24 GB of memory when running search queries. I observed that throughput does not scale linearly beyond 6 cores, so I am trying to use two Nutch processes instead of one. Although I pin each process to a different set of cores, I still cannot fully utilize my cores.
I would like to ask whether it is straightforward to run two Nutch processes on the same node. When I run the following commands on two separate nodes, each process utilizes 4 cores, so the distributed version of Nutch works well:

$ taskset -c 0,2,4,6 bin/nutch server 8890
$ taskset -c 1,3,5,7 bin/nutch server 8891

When I run the two commands on the same node, I observe 8% I/O wait. IPTraf shows that the network is not saturated, so my understanding is that I am I/O-bound. Each process uses a 4 GB dataset. I would expect the datasets to be held in memory by the OS disk cache, but it seems they are not. Any thoughts on what might cause the problem I am observing?

Thanks in advance,
Stavros.
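For anyone trying to reproduce this, here is a minimal bash sketch of the single-node setup, assuming it is run from the Nutch install directory. The helper names (core_list, warm_cache, launch_servers) and the dataset directory arguments are hypothetical. The script pre-reads each dataset once so the kernel has a chance to keep it in the page cache, then starts both servers pinned to interleaved cores with taskset, using the same core sets and ports as the commands in the post.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and dataset paths are hypothetical.

# Build a comma-separated core list from a first core, a stride and a count.
# Example: core_list 0 2 4 prints 0,2,4,6
core_list() {
  local first=$1 stride=$2 count=$3 i
  local -a cores=()
  for ((i = 0; i < count; i++)); do
    cores+=("$((first + i * stride))")
  done
  local IFS=,
  echo "${cores[*]}"   # joins the array with the first IFS char (a comma)
}

# Read every file under a dataset directory once so the kernel page cache
# can hold it; prints the total number of bytes read.
warm_cache() {
  find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Warm both datasets, then start one server per port, each pinned to its own
# interleaved core set. Assumes the current directory is the Nutch install.
launch_servers() {
  local data1=$1 data2=$2
  warm_cache "$data1" > /dev/null
  warm_cache "$data2" > /dev/null
  taskset -c "$(core_list 0 2 4)" bin/nutch server 8890 &
  taskset -c "$(core_list 1 2 4)" bin/nutch server 8891 &
  wait
}
```

Usage (paths hypothetical): launch_servers /path/to/dataset1 /path/to/dataset2, then watch the wa column of vmstat 1 while the query load runs to see whether the I/O wait drops once both datasets have been read.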

