Hi again,

CPU pinning does not result in a significant improvement in availability.
Executing the test with just VCPU=1 and CPU=0.5 results in 55%
availability; before CPU pinning, availability was 51%.
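For reference, this is roughly how the pinning can be verified on the
node (one-609 is the libvirt domain name taken from the kvm command line
quoted further down; both are standard virsh/ps invocations):

virsh vcpupin one-609            # should list vcpu 0 pinned to physical cpu 1
ps -eo pid,psr,comm | grep kvm   # PSR column shows the physical CPU in use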
The facts point to disk access, as said in previous messages. There is a
huge overload when Squid is reading its disks from inside the VMs. I'll
be grateful for any guidance.

Thank you in advance,

Erico.

2012/8/27 Erico Augusto Cavalcanti Guedes <e...@cin.ufpe.br>

> Hi,
>
> 2012/8/26 Ruben S. Montero <rsmont...@opennebula.org>
>
>> Hi
>>
>> If you want to try the cpu pinning suggested by Steven, simply add a
>> RAW attribute in your VM template. Something similar to:
>>
>> RAW=[
>>   TYPE="kvm",
>>   DATA="<cputune><vcpupin vcpu=\"0\" cpuset=\"1\"/></cputune>" ]
>
> Added to the VM template. Running an experiment right now to observe
> availability.
>
>> Cheers
>>
>> Ruben
>>
>> On Sun, Aug 26, 2012 at 3:27 AM, Steven C Timm <t...@fnal.gov> wrote:
>>
>> > I run high-availability squid servers on virtual machines, although
>> > not yet in OpenNebula. It can be done with very high availability.
>> >
>> > I am not familiar with Ubuntu Server 12.04, but if it has libvirt
>> > 0.9.7 or better, and you are using the KVM hypervisor, you should be
>> > able to use the cpu-pinning and numa-aware features of libvirt to
>> > pin each virtual machine to a given physical cpu. That will beat the
>> > migration issue you are seeing now.
>
> Done. The migration processes were beaten, as you said. Thank you.
>
>> > With the Xen hypervisor you can (and should) also pin.
>> >
>> > I think if you beat the cpu and memory pinning problem you will be
>> > OK.
>> >
>> > However, you did not say what network topology you are using for
>> > your virtual machines,
>
> This is my vnet template:
>
> TYPE            = RANGED
> BRIDGE          = br0
> NETWORK_SIZE    = C
> NETWORK_ADDRESS = 192.168.15.0
> NETMASK         = 255.255.255.0
> GATEWAY         = 192.168.15.10
> IP_START        = 192.168.15.110
> IP_END          = 192.168.15.190
>
> Furthermore, there is one Squid instance running on each VM, connected
> in a mesh architecture with the LRU replacement policy. Web Polygraph
> clients are configured to send requests round-robin to each of the VMs
> (only three currently: 192.168.15.110-112). The Web Polygraph server,
> which represents the Internet, is running on 192.168.15.10, closing
> the topology.
>
> Remember that when the same test is performed on physical machines,
> availability is invariably 100%.
>
>> > and what kind of virtual network drivers,
>
> The VMs are configured on VirtualBox, based on Ubuntu Server 12.04.
> Executing:
>
> ps ax | grep kvm
>
> on the 192.168.15.110 cloud node shows:
>
> /usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 1024 -smp
> 1,sockets=1,cores=1,threads=1 -name one-609 -uuid
> a7115950-3f0e-0fab-ea20-3f829d835889 -nodefconfig -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-609.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc
> -no-acpi -boot c -drive
> file=/srv3/cloud/one/var/datastores/0/609/disk.0,if=none,id=drive-ide0-0-0,format=qcow2
> -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0
> -drive
> file=/srv3/cloud/one/var/datastores/0/609/disk.1,if=none,id=drive-ide0-1-1,format=raw
> -device ide-drive,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1
> -netdev tap,fd=20,id=hostnet0 -device
> rtl8139,netdev=hostnet0,id=net0,mac=02:00:c0:a8:0f:70,bus=pci.0,addr=0x3
> -usb -vnc 0.0.0.0:609 -vga cirrus -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
>
> The network driver used on the VMs is rtl8139.
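> Note to self: the kvm command line above shows both disks attached as
> emulated IDE (ide-drive) and an emulated rtl8139 NIC. If these emulated
> devices turn out to matter, my understanding is that switching to the
> paravirtualized virtio drivers in the OpenNebula VM template would look
> roughly like the sketch below ("squid-os" and "Squid-net" are
> placeholder names; untested here):
>
> DISK = [
>   IMAGE  = "squid-os",    # placeholder image name
>   TARGET = "vda" ]        # a vd* target should select the virtio disk bus
>
> NIC = [
>   NETWORK = "Squid-net",  # placeholder network name
>   MODEL   = "virtio" ]    # paravirtualized NIC instead of emulated rtl8139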
>> > That is important too. Also: is your squid cache mostly
>> > disk-resident or mostly RAM-resident?

> Disk-resident (from squid.conf):
>
> cache_mem 128 MB
> cache_dir aufs /var/spool/squid3 1024 16 256
>
> Note that the disk cache is reduced to 1024 MB for testing purposes.

>> > If the former, then the virtual disk drivers matter too, a lot.

> Any reading suggestions? I would appreciate them a lot.
>
> One additional piece of information that may be relevant: I am
> monitoring the VMs' processes in D status, through the command:
>
> ps -eo pid,tid,class,rtprio,psr,pcpu,stat,wchan:25,comm | grep D
>
>  PID  TID CLS RTPRIO PSR %CPU STAT WCHAN            COMMAND
>  170  170 TS       -   0  0.1 D    sleep_on_page    jbd2/sda1-8
> 1645 1645 TS       -   0 11.4 Dsl  get_write_access squid3
> 1647 1647 TS       -   0  0.2 D    get_write_access autExperiment.sh
> 1679 1679 TS       -   0  0.8 D    get_write_access autExperiment.sh
> 1680 1680 TS       -   0  0.5 D    get_write_access autExperiment.sh
>
> It reveals that the ext4 journaling process (jbd2/sda1-8), squid, and
> my monitoring scripts are in D status, always waiting on the
> sleep_on_page and get_write_access kernel functions. I am researching
> the relevance/influence of these WCHANs. I performed the tests without
> my monitoring scripts, and availability is as bad as with them.
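> To quantify how much of the stall really is disk I/O, my next step is
> to sample iostat inside a guest (from the sysstat package), along the
> lines of:
>
> # 10 extended samples, 5 seconds apart; a high %iowait plus a large
> # await on sda would confirm that Squid is stalling on disk writes
> iostat -x 5 10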
> Thanks to Steve and Ruben for your time and answers.
>
> Erico.

>> > Steve Timm
>> >
>> > From: users-boun...@lists.opennebula.org
>> > [mailto:users-boun...@lists.opennebula.org] On Behalf Of Erico
>> > Augusto Cavalcanti Guedes
>> > Sent: Saturday, August 25, 2012 6:33 PM
>> > To: users@lists.opennebula.org
>> > Subject: [one-users] Very high unavailable service
>> >
>> > Dears,
>> >
>> > I'm running the Squid web cache proxy server on Ubuntu Server 12.04
>> > VMs (kernel 3.2.0-23-generic-pae), OpenNebula 3.4. My private cloud
>> > is composed of one frontend and three nodes. VMs are running on
>> > those three nodes, initially one per node. Outside the cloud there
>> > are two hosts, one working as the web clients and the other as the
>> > web server, using the Web Polygraph benchmarking tool.
>> >
>> > The goal of the tests is to stress the Squid cache running on the
>> > VMs. When the same test is executed outside the cloud, using the
>> > three nodes as physical machines, there is 100% cache service
>> > availability. Nevertheless, when the cache service is provided by
>> > VMs, nothing better than 45% service availability is reached: web
>> > clients receive no response from Squid running on the VMs 55% of
>> > the time.
>> >
>> > I have monitored the load average of the VMs and of the PMs where
>> > the VMs are being executed. The first load average field reaches 15
>> > after some hours of tests on the VMs, versus 3 on the physical
>> > machines. Furthermore, there is a set of processes, called
>> > migration/X, that are champions in CPU TIME when the VMs are
>> > executing. A sample:
>> >
>> > top - 20:01:38 up 1 day, 3:36, 1 user, load average: 5.50, 5.47, 4.20
>> >
>> >  PID USER PR NI VIRT RES SHR S %CPU %MEM     TIME+   TIME COMMAND
>> >   13 root RT  0    0   0   0 S    0  0.0 408:27.25 408:27 migration/2
>> >    8 root RT  0    0   0   0 S    0  0.0 404:13.63 404:13 migration/1
>> >    6 root RT  0    0   0   0 S    0  0.0 401:36.78 401:36 migration/0
>> >   17 root RT  0    0   0   0 S    0  0.0 400:59.10 400:59 migration/3
>> >
>> > It isn't possible to offer a web cache service via VMs the way the
>> > service is behaving, with such low availability. So, my questions:
>> >
>> > 1. Has anybody experienced a similar problem of an unresponsive
>> >    service? (Whatever the service.)
>> > 2. How can I pinpoint the bottleneck that is overloading the
>> >    system, so that it can be minimized?
>> >
>> > Thanks a lot,
>> >
>> > Erico.
>> >
>> > _______________________________________________
>> > Users mailing list
>> > Users@lists.opennebula.org
>> > http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>
>> --
>> Ruben S. Montero, PhD
>> Project co-Lead and Chief Architect
>> OpenNebula - The Open Source Solution for Data Center Virtualization
>> www.OpenNebula.org | rsmont...@opennebula.org | @OpenNebula