Hi Mark, yes, I was thinking your patch should fix this issue; in fact, it was one of the reasons I thought it was a large improvement over our current ones. I didn't have time to merge it into master yet, but I'll work on that pretty soon :)
cheers,
Jaime

On Thu, Dec 6, 2012 at 8:09 PM, Mark Gergely <[email protected]> wrote:
> Dear Ruben, Alain,
>
> our improved iSCSI driver set that we proposed before should solve this
> issue. As mentioned in the ticket, it is possible to simultaneously start
> hundreds of non-persistent virtual machines.
> The TM concurrency level is 15.
> You can check the details at: http://dev.opennebula.org/issues/1592
>
> All the best,
> Mark Gergely
> MTA-SZTAKI LPDS
>
> On 2012.12.06., at 20:01, "Ruben S. Montero" <[email protected]> wrote:
>
> > Hi Alain,
> >
> > You are totally right, this may be a problem when instantiating
> > multiple VMs at the same time. I've filed an issue to look for the
> > best way to generate the TID [1].
> >
> > We'd be interested in updating the tgtadm_next_tid function in
> > scripts_common.sh. Also, if the tgt server is getting overloaded by
> > these simultaneous deployments, there are several ways to limit the
> > concurrency of the TM (e.g. the -t option in oned.conf).
> >
> > THANKS for the feedback!
> >
> > Ruben
> >
> > [1] http://dev.opennebula.org/issues/1682
> >
> > On Thu, Dec 6, 2012 at 1:52 PM, Alain Pannetrat
> > <[email protected]> wrote:
> >> Hi all,
> >>
> >> I'm new to OpenNebula and this mailing list, so forgive me if I
> >> stumble over a topic that may have already been discussed.
> >>
> >> I'm currently discovering OpenNebula 3.8.1 with a simple 3-node
> >> system: a control node, a compute node and a datastore node
> >> (iSCSI+LVM).
> >>
> >> I have been testing the bulk instantiation of virtual machines in
> >> Sunstone, where I initiate the bulk creation of 8 virtual machines in
> >> parallel.
> >> I have noticed that between 2 and 4 machines just fail to
> >> instantiate correctly, with the following typical error message:
> >>
> >> Thu Dec 6 14:40:08 2012 [TM][I]: Command execution fail: /var/lib/one/remotes/tm/iscsi/clone iqn.2012-02.org.opennebula:san.vg-one.lv-one-26 compute.admin.lan:/var/lib/one//datastores/0/111/disk.0 111 101
> >> Thu Dec 6 14:40:08 2012 [TM][E]: clone: Command " set -e
> >> Thu Dec 6 14:40:08 2012 [TM][I]: set -x
> >> Thu Dec 6 14:40:08 2012 [TM][I]:
> >> Thu Dec 6 14:40:08 2012 [TM][I]: # get size
> >> Thu Dec 6 14:40:08 2012 [TM][I]: SIZE=$(sudo lvs --noheadings -o lv_size "/dev/vg-one/lv-one-26")
> >> Thu Dec 6 14:40:08 2012 [TM][I]:
> >> Thu Dec 6 14:40:08 2012 [TM][I]: # create lv
> >> Thu Dec 6 14:40:08 2012 [TM][I]: sudo lvcreate -L${SIZE} vg-one -n lv-one-26-111
> >> Thu Dec 6 14:40:08 2012 [TM][I]:
> >> Thu Dec 6 14:40:08 2012 [TM][I]: # clone lv with dd
> >> Thu Dec 6 14:40:08 2012 [TM][I]: sudo dd if=/dev/vg-one/lv-one-26 of=/dev/vg-one/lv-one-26-111 bs=64k
> >> Thu Dec 6 14:40:08 2012 [TM][I]:
> >> Thu Dec 6 14:40:08 2012 [TM][I]: # new iscsi target
> >> Thu Dec 6 14:40:08 2012 [TM][I]: TID=$(sudo tgtadm --lld iscsi --op show --mode target | grep "Target" | tail -n 1 | awk '{split($2,tmp,":"); print tmp[1]+1;}')
> >> Thu Dec 6 14:40:08 2012 [TM][I]:
> >> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op new --mode target --tid $TID --targetname iqn.2012-02.org.opennebula:san.vg-one.lv-one-26-111
> >> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op bind --mode target --tid $TID -I ALL
> >> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op new --mode logicalunit --tid $TID --lun 1 --backing-store /dev/vg-one/lv-one-26-111
> >> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgt-admin --dump |sudo tee /etc/tgt/targets.conf > /dev/null 2>&1" failed: + sudo lvs --noheadings -o lv_size /dev/vg-one/lv-one-26
> >> Thu Dec 6 14:40:08 2012 [TM][I]: 131072+0 records in
> >> Thu Dec 6 14:40:08 2012 [TM][I]: 131072+0 records out
> >> Thu Dec 6 14:40:08 2012 [TM][I]: 8589934592 bytes (8.6 GB) copied, 898.903 s, 9.6 MB/s
> >> Thu Dec 6 14:40:08 2012 [TM][I]: tgtadm: this target already exists
> >> Thu Dec 6 14:40:08 2012 [TM][E]: Error cloning compute.admin.lan:/dev/vg-one/lv-one-26-111
> >> Thu Dec 6 14:40:08 2012 [TM][I]: ExitCode: 22
> >> Thu Dec 6 14:40:08 2012 [TM][E]: Error executing image transfer script: Error cloning compute.admin.lan:/dev/vg-one/lv-one-26-111
> >> Thu Dec 6 14:40:09 2012 [DiM][I]: New VM state is FAILED
> >>
> >> After adding traces in the code, I found that there seems to be a race
> >> condition in /var/lib/one/remotes/tm/iscsi/clone, where the following
> >> commands get executed:
> >>
> >> TID=\$($SUDO $(tgtadm_next_tid))
> >> $SUDO $(tgtadm_target_new "\$TID" "$NEW_IQN")
> >>
> >> These commands are typically expanded to something like this:
> >>
> >> TID=$(sudo tgtadm --lld iscsi --op show --mode target | grep "Target" | tail -n 1 | awk '{split($2,tmp,":"); print tmp[1]+1;}')
> >> sudo tgtadm --lld iscsi --op new --mode target --tid $TID --targetname iqn.2012-02.org.opennebula:san.vg-one.lv-one-26-111
> >>
> >> What seems to happen is that two (or more) calls to the first command,
> >> tgtadm_next_tid, occur simultaneously before the second command gets a
> >> chance to execute, so TID ends up with the same value for two (or
> >> more) VMs.
> >>
> >> The workaround I found is to replace the line:
> >> TID=\$($SUDO $(tgtadm_next_tid))
> >> with
> >> TID=$VMID
> >> in /var/lib/one/remotes/tm/iscsi/clone
> >>
> >> Since $VMID is globally unique, no race condition can happen here.
> >> I've tested this and the failures no longer occur in my setup.
> >> Of course, I'm not sure this is the ideal fix, since perhaps VMID can
> >> take values that are out of range for tgtadm, so further testing would
> >> be needed.
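[Editor's note: another way to close this race, instead of reusing $VMID, would be to serialize the read-then-create pair under a lock. Below is a minimal sketch using flock(1); the lock and counter files merely simulate the tgtadm calls and are not part of the actual OpenNebula driver.]

```shell
#!/bin/sh
# Sketch: allocate the next TID inside an exclusive flock(1) critical
# section, so two concurrent clone scripts can never observe the same value.
# /tmp/tid.lock and /tmp/tid.counter are illustrative stand-ins for the
# real "tgtadm --op show" / "tgtadm --op new" pair.
LOCK=/tmp/tid.lock
COUNTER=/tmp/tid.counter

alloc_tid() {
    (
        flock -x 9                             # block until we hold the lock
        TID=$(( $(cat "$COUNTER" 2>/dev/null || echo 0) + 1 ))
        echo "$TID" > "$COUNTER"               # stands in for creating the target
        echo "$TID"                            # return the allocated TID
    ) 9> "$LOCK"                               # fd 9 holds the lock file
}

rm -f "$COUNTER"
A=$(alloc_tid)
B=$(alloc_tid)
echo "$A $B"                                   # prints "1 2": TIDs are distinct
```

In the real clone script the critical section would have to cover both tgtadm_next_tid and tgtadm_target_new, so that a second deployment cannot read the target list before the new target exists.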
> >>
> >> I'd be happy to get your thoughts/feedback on this issue.
> >>
> >> Best,
> >>
> >> Alain
> >> _______________________________________________
> >> Users mailing list
> >> [email protected]
> >> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
> >
> > --
> > Ruben S. Montero, PhD
> > Project co-Lead and Chief Architect
> > OpenNebula - The Open Source Solution for Data Center Virtualization
> > www.OpenNebula.org | [email protected] | @OpenNebula

--
Jaime Melis
Project Engineer
OpenNebula - The Open Source Toolkit for Cloud Computing
www.OpenNebula.org | [email protected]
