Hi,

On 06/21/2012 10:55 AM, Andreas Calvo wrote:
Hello,
We are facing a performance issue in our OpenNebula infrastructure, and
I'd like to hear your opinion on the best approach to solve it.

We have 15 nodes plus 1 front-end. They all use the same shared storage
over iSCSI, and they mount the OpenNebula home folder (/var/lib/one),
which is a GFS2 partition.
All machines are based on CentOS 6.2, using QEMU-KVM.

We use the cloud to perform tests against a farm of 120 VMs.

As we are using QCOW2, it really decreases the need to write changes to
disk.

However, all machines need to copy over 1G of data every time they
start, and this really collapses our iSCSI network, until some machines
get a timeout accessing data, which stops the test.
The OpenNebula infrastructure suffers from a read/write penalty, leaving some
VMs in pending state and the system (almost) non-responsive.

We are not using the local disks of the nodes at all.

It seems that the only option is to use the local disk to write disk
changes, but I wanted to hear your opinion, based on your experience,
on our problem.

I have two suggestions and two comments:
Suggestion: Maybe you could try to move to multiple GFS2 partitions, potentially spread over multiple servers. This way the traffic will be more local.

Comment: find the person who first told you that iSCSI is fine for serious use. Consider hitting them.

Suggestion: Then deploy a second/third/fourth iSCSI network under your multipathing to raise bandwidth. It seems you just need to have enough B/W to cover these spikes, so it should be possible to quantify how much B/W you're missing.

Comment: I'd immediately migrate out of iSCSI over to FC instead of deploying more Ethernet, but that's just me.
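
To put a rough number on the missing B/W, something like the following back-of-the-envelope sketch might help. It only uses the figures from the original mail (120 VMs, a bit over 1G each at start). Everything else is an assumption for illustration: the 1 Gbit/s link speed, the 0.8 efficiency factor for protocol overhead, the 300 second target window, and the function names themselves. Plug in your real link speed and whatever start-up window your timeouts can tolerate.

```python
import math


def transfer_seconds(vm_count, gib_per_vm, link_gbit_per_s, links=1, efficiency=0.8):
    """Estimate how long the start-up copy takes over the storage network.

    efficiency is an assumed fraction of raw link speed that is actually
    usable after protocol overhead; it is not a measured value.
    """
    total_bits = vm_count * gib_per_vm * 1024**3 * 8
    usable_bits_per_s = links * link_gbit_per_s * 1e9 * efficiency
    return total_bits / usable_bits_per_s


def links_needed(vm_count, gib_per_vm, link_gbit_per_s, target_seconds, efficiency=0.8):
    """Estimate how many equal links are needed to finish within target_seconds."""
    one_link = transfer_seconds(vm_count, gib_per_vm, link_gbit_per_s, 1, efficiency)
    return math.ceil(one_link / target_seconds)


if __name__ == "__main__":
    # 120 VMs and ~1 GiB per VM come from the original mail;
    # the 1 Gbit/s link and the 300 s window are assumptions.
    secs = transfer_seconds(vm_count=120, gib_per_vm=1, link_gbit_per_s=1.0)
    print(f"one link: about {secs / 60:.1f} minutes")
    print("links for a 300 s window:", links_needed(120, 1, 1.0, 300))
```

This treats the copy as one big sequential transfer, so it is a lower bound: GFS2 locking and 120 clients reading at once will make the real numbers worse, and since each VM copies "over" 1G the true volume is higher too.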

_______________________________________________
Users mailing list
[email protected]
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org