Occasional Reservation Failures Performance Tuning

2012-01-17 Thread Evelio Quiros
Hello,

We have a small VCL system serving 200 images on 8 VMware servers. Our system
seems to operate pretty well, except that we occasionally get reservation
failures. The servers (web, management, database) do not appear overloaded;
they peak at about 10% utilization. Out of about 200 reservations (roughly
simultaneous, after a block allocation), about 4 fail. These failures are
usually logged as "failed to update private IP address". I have included some
of the failure entries from vcld.log below. What can we do to improve the
reliability and performance of our VCL system?

Thanks,
Al Quiros
Florida International University



|4443|2053:2053|new| 2012-01-17 10:24:25|4443|2053:2053|new|OS.pm:update_public_ip_address(608)|failed to retrieve dynamic public IP address from vclimg9
2012-01-17 10:24:25|4443|2053:2053|new|utils.pm:insertloadlog(3875)|inserted computer=20, dynamicDHCPaddress, failed to retrieve dynamic public IP address from vclimg9
|4443|2053:2053|new| 2012-01-17 10:24:25|4443|2053:2053|new|State.pm:reservation_failed(213)|reservation failed on vclimg9: failed to update private IP address
|4443|2053:2053|new| ( 0) State.pm, reservation_failed (line: 213)
2012-01-17 10:24:26|4443|2053:2053|new|utils.pm:insertloadlog(3875)|inserted computer=20, failed, failed to update private IP address
2012-01-17 10:24:26|4443|2053:2053|new|State.pm:reservation_failed(216)|inserted computerloadlog entry
2012-01-17 10:24:26|4443|2053:2053|new|State.pm:reservation_failed(224)|updated log ending value to 'failed', logid=474
2012-01-17 10:24:26|4443|2053:2053|new|utils.pm:update_computer_state(2033)|computer 20 state updated to: failed
2012-01-17 10:24:26|4443|2053:2053|new|State.pm:reservation_failed(235)|computer vclimg9 (20) state set to failed
2012-01-17 10:24:26|4443|2053:2053|new|utils.pm:update_request_state(1991)|request 2053 state updated to: failed, laststate to: new
2012-01-17 10:24:26|4443|2053:2053|new|State.pm:reservation_failed(248)|set request state to 'failed'/'new'
2012-01-17 10:24:26|4443|2053:2053|new|State.pm:reservation_failed(257)|vclimg9 in blockcomputers table
2012-01-17 10:24:26|4443|2053:2053|new|State.pm:reservation_failed(258)|removed vclimg9 from blockcomputers table
2012-01-17 10:24:26|4443|2053:2053|new|State.pm:reservation_failed(269)|exiting 1



|32292|1990:1990|reload| 2012-01-17 10:26:19|32292|1990:1990|reload|OS.pm:wait_for_response(465)|failed to connect to vclimg148 via SSH after 600 seconds
|32292|1990:1990|reload| 2012-01-17 10:26:19|32292|1990:1990|reload|VMware.pm:load(419)|failed to perform OS post-load tasks on VM vclimg148 on VM host: idp06.fiu.edu
|32292|1990:1990|reload| 2012-01-17 10:26:19|32292|1990:1990|reload|new.pm:reload_image(623)|vmwarelinux-LinuxBase19-v0 failed to load on vclimg148, returning
2012-01-17 10:26:19|32292|1990:1990|reload|utils.pm:insertloadlog(3875)|inserted computer=166, loadimagefailed, vmwarelinux-LinuxBase19-v0 failed to load on vclimg148
|32292|1990:1990|reload| 2012-01-17 10:26:19|32292|1990:1990|reload|new.pm:process(295)|failed to load vclimg148 with vmwarelinux-LinuxBase19-v0
|32292|1990:1990|reload| 2012-01-17 10:26:19|32292|1990:1990|reload|State.pm:reservation_failed(213)|reservation failed on vclimg148: process failed after trying to load or make available
|32292|1990:1990|reload| ( 0) State.pm, reservation_failed (line: 213)
2012-01-17 10:26:20|32292|1990:1990|reload|utils.pm:insertloadlog(3875)|inserted computer=166, failed, process failed after trying to load or make available
2012-01-17 10:26:20|32292|1990:1990|reload|State.pm:reservation_failed(216)|inserted computerloadlog entry
2012-01-17 10:26:20|32292|1990:1990|reload|utils.pm:update_computer_state(2033)|computer 166 state updated to: failed
2012-01-17 10:26:20|32292|1990:1990|reload|State.pm:reservation_failed(235)|computer vclimg148 (166) state set to failed
2012-01-17 10:26:20|32292|1990:1990|reload|utils.pm:update_request_state(1991)|request 1990 state updated to: failed, laststate to: reload
2012-01-17 10:26:20|32292|1990:1990|reload|State.pm:reservation_failed(248)|set request state to 'failed'/'reload'
2012-01-17 10:26:20|32292|1990:1990|reload|State.pm:reservation_failed(266)|vclimg148 is NOT in blockcomputers table
2012-01-17 10:26:20|32292|1990:1990|reload|State.pm:reservation_failed(269)|exiting 1
2012-01-17 10:26:20|2218|1990:1990|failed|vcld:main(252)|request deleted



|6463|2090:2090|new| 2012-01-17 10:28:47|6463|2090:2090|new|Linux.pm:get_network_configuration(2635)|failed to determine the public interface name
|6463|2090:2090|new| 2012-01-17 10:28:47|6463|2090:2090|new|Linux.pm:get_public_ip_address(2736)|failed to retrieve public network configuration
|6463|2090:2090|new| 2012-01-17 10:28:47|6463|2090:2090|new|OS.pm:update_public_ip_address(608)|failed to retrieve dynamic public IP address from vclimg128
2012-01-17 10:28:47|6463|2090:2090|new|utils.pm:insertloadlog(3875)|inserted computer=146,
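
To see which failure reasons dominate across a whole log, rather than reading entries one at a time, the "reservation failed on <host>: <reason>" lines can be tallied. Below is a minimal Python sketch. The regular expression is inferred only from the excerpts in this post, so other vcld versions may format these lines differently; the script name and log path are placeholders.

```python
import re
import sys
from collections import Counter

# Pattern inferred from the excerpts above, e.g.
# "...|State.pm:reservation_failed(213)|reservation failed on vclimg9: failed to update private IP address"
# The "( 0) State.pm, reservation_failed (line: 213)" trace lines do not match it.
FAILURE_RE = re.compile(
    r"\|State\.pm:reservation_failed\(\d+\)\|"
    r"reservation failed on (?P<host>[^:]+): (?P<reason>.+)$"
)


def tally_failures(lines):
    """Count reservation failures per reason and collect the affected hosts."""
    reasons = Counter()
    hosts = {}
    for line in lines:
        match = FAILURE_RE.search(line.rstrip("\n"))
        if match:
            reason = match.group("reason").strip()
            reasons[reason] += 1
            hosts.setdefault(reason, []).append(match.group("host"))
    return reasons, hosts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Placeholder usage: python tally_failures.py <path to vcld.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        reasons, hosts = tally_failures(log)
    for reason, count in reasons.most_common():
        print(f"{count:4d}  {reason}  ({', '.join(hosts[reason])})")
```

Grouping by reason makes it easier to tell whether the roughly 4 failures per 200 reservations share one cause (for example the IP address lookups) or are spread across unrelated problems such as the SSH timeout.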

Removing idle computers

2012-01-17 Thread Emir Imamagic

Hello,

During the very nice VCL Bootcamp you organized in Wroclaw, we discussed the
option of switching off computers that have no reservation.


First, a few words about our infrastructure. It consists of two VMware ESXi
hosts with 256 GB of RAM each, handling 60-100 VMs. We use iSCSI-based
storage for the datastore and two VM stores. Since VMFS3 is a clustered file
system, this works quite well, and booting a machine takes less than a minute.


In this setup, reloading a machine takes longer than booting it, so it
would be better to switch off or remove the VM once the reservation
is done.
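
The proposal can be sketched as a small decision rule. This is a hypothetical Python illustration, not VCL code: the VM class, the "available"/"inuse" state names and both function names are assumptions made for the example. It prefers powering off over reloading whenever a boot is faster than a reload, and it only powers off idle VMs whose next reservation (if any) starts more than one boot time from now, so they can be started again in time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class VM:
    name: str
    state: str  # hypothetical state name, e.g. "available" or "inuse"
    next_reservation: Optional[datetime]  # start of the next reservation, if any


def post_reservation_action(reload_time: timedelta, boot_time: timedelta) -> str:
    """Pick what to do with a VM whose reservation just ended."""
    # If a fresh boot is faster than a reload, power the VM off and boot it
    # again when it is next needed; otherwise keep reloading as usual.
    return "power_off" if boot_time < reload_time else "reload"


def vms_to_power_off(vms: List[VM], now: datetime, boot_time: timedelta) -> List[str]:
    """Return the names of idle VMs that can safely be powered off now."""
    selected = []
    for vm in vms:
        if vm.state != "available":
            continue  # never touch VMs that are in use or being reloaded
        # Keep a VM running if its next reservation starts within one boot time.
        if vm.next_reservation is None or vm.next_reservation - now > boot_time:
            selected.append(vm.name)
    return selected
```

With boot times under a minute, as described above, almost every idle VM without an imminent reservation would qualify.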


Are you planning to add such a feature in the next version?

Thanks
--
Emir Imamagic
Sektor za racunalne sustave
Sveuciliste u Zagrebu, Sveucilisni racunski centar (Srce), www.srce.unizg.hr
emir.imama...@srce.hr, tel: +385 1 616 5809, fax: +385 1 616 5559