Dear Mark, Ruben,

From reading the code in those patches, they do indeed seem to greatly
improve the iSCSI driver, and they solve my problem.

Tonight I was also confronted with the issue that in some cases it
takes more than two seconds between iscsiadm_login "$NEW_IQN"
"$TARGET_HOST" and the appearance of /dev/disk/by-path/*$NEW_IQN-lun-1
in the DISCOVERY_CMD inside iscsi/clone, so the "sleep 2" inserted
there is not enough. This is also fixed by the proposed patches. Nice
work!
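For what it's worth, here is a minimal sketch of the kind of polling
that avoids the fixed sleep (MAX_WAIT and the error handling are my own
illustrative additions, not part of the driver; NEW_IQN is the variable
already used by DISCOVERY_CMD):

# Sketch: poll for the by-path device instead of a fixed "sleep 2".
# MAX_WAIT is an illustrative upper bound in seconds; NEW_IQN is
# assumed to be set earlier in iscsi/clone, as in DISCOVERY_CMD.
MAX_WAIT=30
DEV=""
for i in $(seq 1 $MAX_WAIT); do
    DEV=$(ls /dev/disk/by-path/*"$NEW_IQN"-lun-1 2>/dev/null | head -n 1)
    [ -n "$DEV" ] && break
    sleep 1
done
if [ -z "$DEV" ]; then
    echo "device for $NEW_IQN did not appear within ${MAX_WAIT}s" >&2
    exit 1
fi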
All the best,

Alain

On Thu, Dec 6, 2012 at 9:09 PM, Mark Gergely <[email protected]> wrote:
> Dear Ruben, Alain,
>
> Our improved iSCSI driver set that we proposed before should solve this
> issue. As mentioned in the ticket, it is possible to simultaneously start
> hundreds of non-persistent virtual machines.
> The TM concurrency level is 15.
> You can check the details at: http://dev.opennebula.org/issues/1592
>
> All the best,
> Mark Gergely
> MTA-SZTAKI LPDS
>
> On 2012.12.06., at 20:01, "Ruben S. Montero" <[email protected]> wrote:
>
>> Hi Alain,
>>
>> You are totally right, this may be a problem when instantiating
>> multiple VMs at the same time. I've filed an issue to look for the
>> best way to generate the TID [1].
>>
>> We'd be interested in updating the tgtadm_next_tid function in
>> scripts_common.sh. Also, if the tgt server is getting overloaded by
>> these simultaneous deployments, there are several ways to limit the
>> concurrency of the TM (e.g. the -t option in oned.conf).
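>> For illustration, the relevant TM_MAD section in oned.conf looks
>> along these lines (the thread count and driver list here are only an
>> example; check your own oned.conf for the exact values):
>>
>> TM_MAD = [
>>     executable = "one_tm",
>>     arguments  = "-t 15 -d dummy,lvm,shared,qcow2,ssh,vmfs,iscsi" ]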
>>
>> THANKS for the feedback!
>>
>> Ruben
>>
>> [1] http://dev.opennebula.org/issues/1682
>>
>> On Thu, Dec 6, 2012 at 1:52 PM, Alain Pannetrat
>> <[email protected]> wrote:
>>> Hi all,
>>>
>>> I'm new to OpenNebula and this mailing list, so forgive me if I
>>> stumble over a topic that may have already been discussed.
>>>
>>> I'm currently discovering OpenNebula 3.8.1 with a simple 3-node
>>> system: a control node, a compute node and a datastore node
>>> (iscsi+lvm).
>>>
>>> I have been testing the bulk instantiation of virtual machines in
>>> Sunstone, where I initiate the creation of 8 virtual machines in
>>> parallel. I have noticed that between 2 and 4 machines just fail to
>>> instantiate correctly, with the following typical error message:
>>>
>>> Thu Dec 6 14:40:08 2012 [TM][I]: Command execution fail:
>>> /var/lib/one/remotes/tm/iscsi/clone
>>> iqn.2012-02.org.opennebula:san.vg-one.lv-one-26
>>> compute.admin.lan:/var/lib/one//datastores/0/111/disk.0 111 101
>>> Thu Dec 6 14:40:08 2012 [TM][E]: clone: Command " set -e
>>> Thu Dec 6 14:40:08 2012 [TM][I]: set -x
>>> Thu Dec 6 14:40:08 2012 [TM][I]:
>>> Thu Dec 6 14:40:08 2012 [TM][I]: # get size
>>> Thu Dec 6 14:40:08 2012 [TM][I]: SIZE=$(sudo lvs --noheadings -o
>>> lv_size "/dev/vg-one/lv-one-26")
>>> Thu Dec 6 14:40:08 2012 [TM][I]:
>>> Thu Dec 6 14:40:08 2012 [TM][I]: # create lv
>>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo lvcreate -L${SIZE} vg-one -n
>>> lv-one-26-111
>>> Thu Dec 6 14:40:08 2012 [TM][I]:
>>> Thu Dec 6 14:40:08 2012 [TM][I]: # clone lv with dd
>>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo dd if=/dev/vg-one/lv-one-26
>>> of=/dev/vg-one/lv-one-26-111 bs=64k
>>> Thu Dec 6 14:40:08 2012 [TM][I]:
>>> Thu Dec 6 14:40:08 2012 [TM][I]: # new iscsi target
>>> Thu Dec 6 14:40:08 2012 [TM][I]: TID=$(sudo tgtadm --lld iscsi --op
>>> show --mode target | grep "Target" | tail -n 1 |
>>> awk '{split($2,tmp,":"); print tmp[1]+1;}')
>>> Thu Dec 6 14:40:08 2012 [TM][I]:
>>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op new
>>> --mode target --tid $TID --targetname
>>> iqn.2012-02.org.opennebula:san.vg-one.lv-one-26-111
>>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op bind
>>> --mode target --tid $TID -I ALL
>>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op new
>>> --mode logicalunit --tid $TID --lun 1 --backing-store
>>> /dev/vg-one/lv-one-26-111
>>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgt-admin --dump |sudo tee
>>> /etc/tgt/targets.conf > /dev/null 2>&1" failed: + sudo lvs
>>> --noheadings -o lv_size /dev/vg-one/lv-one-26
>>> Thu Dec 6 14:40:08 2012 [TM][I]: 131072+0 records in
>>> Thu Dec 6 14:40:08 2012 [TM][I]: 131072+0 records out
>>> Thu Dec 6 14:40:08 2012 [TM][I]: 8589934592 bytes (8.6 GB) copied,
>>> 898.903 s, 9.6 MB/s
>>> Thu Dec 6 14:40:08 2012 [TM][I]: tgtadm: this target already exists
>>> Thu Dec 6 14:40:08 2012 [TM][E]: Error cloning
>>> compute.admin.lan:/dev/vg-one/lv-one-26-111
>>> Thu Dec 6 14:40:08 2012 [TM][I]: ExitCode: 22
>>> Thu Dec 6 14:40:08 2012 [TM][E]: Error executing image transfer
>>> script: Error cloning compute.admin.lan:/dev/vg-one/lv-one-26-111
>>> Thu Dec 6 14:40:09 2012 [DiM][I]: New VM state is FAILED
>>>
>>> After adding traces to the code, I found what appears to be a race
>>> condition in /var/lib/one/remotes/tm/iscsi/clone, where the
>>> following commands get executed:
>>>
>>> TID=\$($SUDO $(tgtadm_next_tid))
>>> $SUDO $(tgtadm_target_new "\$TID" "$NEW_IQN")
>>>
>>> These commands typically expand to something like this:
>>>
>>> TID=$(sudo tgtadm --lld iscsi --op show --mode target | grep "Target"
>>> | tail -n 1 | awk '{split($2,tmp,":"); print tmp[1]+1;}')
>>> sudo tgtadm --lld iscsi --op new --mode target --tid $TID
>>> --targetname iqn.2012-02.org.opennebula:san.vg-one.lv-one-26-111
>>>
>>> What seems to happen is that two (or more) calls to the first
>>> command (tgtadm_next_tid) run simultaneously before the second
>>> command gets a chance to execute, so TID ends up with the same value
>>> for two (or more) VMs.
>>>
>>> The workaround I found is to replace the line:
>>>
>>> TID=\$($SUDO $(tgtadm_next_tid))
>>>
>>> with:
>>>
>>> TID=$VMID
>>>
>>> in /var/lib/one/remotes/tm/iscsi/clone.
>>>
>>> Since $VMID is globally unique, no race condition can happen here.
>>> I've tested this and the failures no longer occur in my setup.
>>> Of course, I'm not sure this is the ideal fix, since VMID can
>>> perhaps take values that are out of range for tgtadm, so further
>>> testing would be needed.
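>>>
>>> As an alternative to changing the TID source, the allocation could
>>> be serialized so that picking the next TID and creating the target
>>> happen atomically. A rough, untested sketch of the expanded
>>> commands, using flock (the lock file path is my own arbitrary
>>> choice):
>>>
>>> # Untested sketch: serialize TID allocation and target creation so
>>> # that concurrent clone scripts cannot observe the same "next" TID.
>>> # /var/lock/one-tgtadm.lock is an arbitrary, illustrative path.
>>> (
>>>     flock -x 200
>>>     TID=$(sudo tgtadm --lld iscsi --op show --mode target | \
>>>           grep "Target" | tail -n 1 | \
>>>           awk '{split($2,tmp,":"); print tmp[1]+1;}')
>>>     sudo tgtadm --lld iscsi --op new --mode target --tid "$TID" \
>>>          --targetname "$NEW_IQN"
>>> ) 200>/var/lock/one-tgtadm.lock
>>>
>>> And if tgtadm requires strictly positive TIDs (VM IDs start at 0),
>>> an offset such as TID=$((VMID + 1)) would sidestep that edge case
>>> in the $VMID approach.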
>>>
>>> I'd be happy to get your thoughts/feedback on this issue.
>>>
>>> Best,
>>>
>>> Alain
>>
>> --
>> Ruben S. Montero, PhD
>> Project co-Lead and Chief Architect
>> OpenNebula - The Open Source Solution for Data Center Virtualization
>> www.OpenNebula.org | [email protected] | @OpenNebula

_______________________________________________
Users mailing list
[email protected]
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
