Re: [one-users] VM remains in boot state

2010-07-30 Thread Jaime Melis
Hello Rubén,

Does this happen all the time, or only when deploying many VMs
simultaneously? Have you tried running 'virsh create deployment.0' (if you're
using KVM) or 'xm create deployment.0' (if you're using Xen) manually from
the worker nodes, to see what happens?
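For example, something along these lines on a worker node (just a sketch,
assuming a shared $ONE_LOCATION/var; the VM ID 41 and deployment path come
from your vm.log, and the libvirt connection URI is an assumption):

 $ cd /srv/cloud/one/var/41
 $ virsh -c qemu:///system create deployment.0   # KVM
 $ xm create deployment.0                        # or, on Xen

The error the hypervisor prints directly is usually more specific than
what ends up in vm.log.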

regards,
Jaime

On Thu, Jul 29, 2010 at 2:40 PM, Ruben Diez rd...@cesga.es wrote:

 Hi:

 When we attempt to launch a VM, it remains in boot status.

 The deployment.0 file is generated on the OpenNebula side, but it is not
 copied to the $ONE_LOCATION/var/xx/images directory...

 The last message in the vm.log file is:

  Generating deployment file: /srv/cloud/one/var/41/deployment.0

 Nothing strange appears in the oned.log file.

 Any ideas about what is happening?
 Anything we can try in order to debug this?

 Thanks



Re: [one-users] oned crashed - general protection fault

2010-07-30 Thread Tino Vazquez
Hi Neil,

Well, that shouldn't happen ;) Could you please send us a backtrace
from gdb? If the process can still be attached to, you can use

$ gdb $ONE_LOCATION/bin/oned oned-pid

then, in the (gdb) prompt

(gdb) bt

and send us the output.

If this is not possible, could you please set your system to produce
core files, and send one to us (it should appear under
$ONE_LOCATION/var after oned crashes again)? To achieve this:

$ ulimit -c unlimited
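Putting it together, a sketch of the whole sequence (the 'one stop'/'one
start' commands assume a standard self-contained install, and the exact
core file name varies by system):

 $ ulimit -c unlimited
 $ one stop && one start      # restart oned from the shell with the raised limit
 # ... after the next crash:
 $ gdb $ONE_LOCATION/bin/oned $ONE_LOCATION/var/core
 (gdb) bt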

Best regards,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org



On Fri, Jul 30, 2010 at 11:07 AM, Neil Mooney neil.moo...@sara.nl wrote:
 Oned just crashed, leaving just this message in the log:

 [2046071.765934] oned[17361] general protection ip:7feaaee82e0a
 sp:7feaa5fcc940 error:0 in libc-2.11.2.so[7feaaee11000+158000]

 Our distro is Debian / squeeze running a 2.6.32-5-amd64 kernel.

 We are seeing crashes quite often now, and are considering auto-restarting it
 from Nagios or similar ...

 Cheers

 Neil


Re: [one-users] oned crashed - general protection fault

2010-07-30 Thread Saurav Lahiri
Hi Neil,
We faced a similar problem (general protection fault). Moving libxmlrpc to
the latest super-stable version, 1.06.40, resolved it for us.
We had to build libxmlrpc 1.06.40 from source.
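Roughly, the build looked like this (a sketch; the download URL is an
assumption, so check the xmlrpc-c project page for the actual tarball):

 $ wget http://downloads.sourceforge.net/xmlrpc-c/xmlrpc-c-1.06.40.tgz
 $ tar xzf xmlrpc-c-1.06.40.tgz && cd xmlrpc-c-1.06.40
 $ ./configure && make && sudo make install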

Thanks and Regards
Saurav Lahiri

On Fri, Jul 30, 2010 at 3:36 PM, Tino Vazquez tin...@fdi.ucm.es wrote:

 Hi Neil,

 Well, that shouldn't happen ;) Could you please send us a backtrace
 from gdb? If the process can still be attached to, you can use

 $ gdb $ONE_LOCATION/bin/oned oned-pid

 then, in the (gdb) prompt

 (gdb) bt

 and send us the output.

 If this is not possible, could you please set your system to produce
 core files, and send one to us (it should appear under
 $ONE_LOCATION/var after oned crashes again)? To achieve this:

 $ ulimit -c unlimited

 Best regards,

 -Tino

 --
 Constantino Vázquez Blanco | dsa-research.org/tinova
 Virtualization Technology Engineer / Researcher
 OpenNebula Toolkit | opennebula.org



 On Fri, Jul 30, 2010 at 11:07 AM, Neil Mooney neil.moo...@sara.nl wrote:
  Oned just crashed, leaving just this message in the log:
 
  [2046071.765934] oned[17361] general protection ip:7feaaee82e0a
  sp:7feaa5fcc940 error:0 in libc-2.11.2.so[7feaaee11000+158000]
 
  Our distro is Debian / squeeze running a 2.6.32-5-amd64 kernel.
 
  We are seeing crashes quite often now, and are considering auto-restarting it
  from Nagios or similar ...
 
  Cheers
 
  Neil


Re: [one-users] VMs are going to pending state despite enough resources

2010-07-30 Thread bathina nageswararao
Hi Carlos,


Please find below the details of the VMs stuck in pending state.


onead...@onefrontend:~$ onehost list

  ID NAME  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
   0 192.168.138.241 0    400    400    400 8189640 7408148   on


onead...@onefrontend:~$ onevm list
  ID USER NAME STAT CPU MEM    HOSTNAME    TIME



Template file details

onead...@onefrontend:~/one-template$ cat ttylinux.one
NAME   = ttylinux1
CPU    = 1
MEMORY = 128

DISK   = [
  source   = /srv/cloud/one/one-template/ttylinux.img,
  target   = hda,
  readonly = no ]

NIC    = [ NETWORK = "Small network" ]

FEATURES=[ acpi=no ]

#CONTEXT = [
#    hostname    = $NAME,
#    ip_public   = PUBLIC_IP,
#    files  = /path/to/init.sh /path/to/id_dsa.pub,
#    target  = hdc,
#    root_pubkey = id_dsa.pub,
#    username    = opennebula,
#    user_pubkey = id_dsa.pub
#]
# --- VNC server ---

GRAPHICS = [
  type    = vnc,
  listen  = 0.0.0.0,
  passwd = passwd,
  port    = 501]


onead...@onefrontend:~/one-template$ onevm list
  ID USER     NAME     STAT CPU     MEM HOSTNAME            TIME
  54 oneadmin ttylinux runn   0  131072 192.168.138.241 00 00:07:51
  55 oneadmin ttylinux runn   0  131072 192.168.138.241 00 00:07:24
  56 oneadmin ttylinux runn   0  131072 192.168.138.241 00 00:06:08
  57 oneadmin ttylinux runn   0  131072 192.168.138.241 00 00:05:00
  58 oneadmin ttylinux pend   0       0                 00 00:03:31
  59 oneadmin ttylinux pend   0       0                 00 00:01:51


onead...@onefrontend:~/one-template$ onehost list
  ID NAME  RVM   TCPU   FCPU   ACPU    TMEM    FMEM STAT
   0 192.168.138.241 4    400    394    394 8189640 7343376   on


onead...@onefrontend:~/var$ onehost show 0
HOST 0 INFORMATION
ID    : 0
NAME  : 192.168.138.241
STATE : MONITORED
IM_MAD    : im_kvm
VM_MAD    : vmm_kvm
TM_MAD    : tm_nfs

HOST SHARES
MAX MEM   : 8189640
USED MEM (REAL)   : 1016876
USED MEM (ALLOCATED)  : 524288
MAX CPU   : 400
USED CPU (REAL)   : 10
USED CPU (ALLOCATED)  : 400
RUNNING VMS   : 4

MONITORING INFORMATION
ARCH=x86_64
CPUSPEED=2499
FREECPU=389.2
FREEMEMORY=7343080
HOSTNAME=onenode1
HYPERVISOR=kvm
MODELNAME=Intel(R) Xeon(R) CPU   X3323  @ 2.50GHz
NETRX=0
NETTX=0
TOTALCPU=400
TOTALMEMORY=8189640
USEDCPU=10.8
USEDMEMORY=1016876

onead...@onefrontend:~/var$ cat sched.log

Fri Jul 30 19:31:01 2010 [SCHED][I]: Dispatching virtual machine 56 to HID: 0
Fri Jul 30 19:31:31 2010 [HOST][D]: Discovered Hosts (enabled): 0
Fri Jul 30 19:31:31 2010 [VM][D]: Pending virtual machines :
Fri Jul 30 19:31:31 2010 [SCHED][I]: Select hosts
    PRI HID
    -----------

Fri Jul 30 19:32:01 2010 [HOST][D]: Discovered Hosts (enabled): 0
Fri Jul 30 19:32:01 2010 [VM][D]: Pending virtual machines : 57
Fri Jul 30 19:32:01 2010 [RANK][W]: No rank defined for VM
Fri Jul 30 19:32:01 2010 [SCHED][I]: Select hosts
    PRI HID
    -----------
Virtual Machine: 57
    0   0

Fri Jul 30 19:32:01 2010 [SCHED][I]: Dispatching virtual machine 57 to HID: 0
Fri Jul 30 19:32:31 2010 [HOST][D]: Discovered Hosts (enabled): 0
Fri Jul 30 19:32:31 2010 [VM][D]: Pending virtual machines :
Fri Jul 30 19:32:31 2010 [SCHED][I]: Select hosts
    PRI HID
    -----------

Fri Jul 30 19:33:01 2010 [HOST][D]: Discovered Hosts (enabled): 0
Fri Jul 30 19:33:01 2010 [VM][D]: Pending virtual machines :
Fri Jul 30 19:33:01 2010 [SCHED][I]: Select hosts
    PRI HID
    -----------

Fri Jul 30 19:33:31 2010 [HOST][D]: Discovered Hosts (enabled): 0
Fri Jul 30 19:33:31 2010 [VM][D]: Pending virtual machines : 58
Fri Jul 30 19:33:31 2010 [RANK][W]: No rank defined for VM
Fri Jul 30 19:33:31 2010 [SCHED][I]: Select hosts
    PRI HID
    -----------
Virtual Machine: 58

Fri Jul 30 19:34:01 2010 [HOST][D]: Discovered Hosts (enabled): 0
Fri Jul 30 19:34:01 2010 [VM][D]: Pending virtual machines : 58
Fri Jul 30 19:34:01 2010 [RANK][W]: No rank defined for VM
Fri Jul 30 19:34:01 2010 [SCHED][I]: Select hosts
    PRI HID
    -----------
Virtual Machine: 58

Fri Jul 30 19:34:31 2010 [HOST][D]: Discovered Hosts (enabled): 0
Fri Jul 30 19:34:31 2010 [VM][D]: Pending virtual machines : 58
Fri Jul 30 19:34:31 2010 [RANK][W]: No rank defined for VM
Fri Jul 30 19:34:31 2010 [SCHED][I]: Select hosts
    PRI HID

[one-users] In Onemc LCM state is unknown

2010-07-30 Thread Mirza Baig
Hi,

Due to some reasons my frontend and node machines got powered off. After
restarting the systems, in onemc all running images are showing as below and
I am unable to connect to those images. Please help me in bringing the
images up.

Id  User      Name   VM State  LCM State  Cpu  Memory  Host             VNC Port  Time
44  oneadmin  tty9a  active    unknown    0    131072  192.168.138.241  6         3d 2:26:35

Thanks & Regards,
Waseem




Re: [one-users] In Onemc LCM state is unknown

2010-07-30 Thread Csom Gyula
Hi,

Just a tip: you may try restarting your VMs, e.g.:

  onevm restart 44

This worked for us when powering off the VM from within the VM itself.
Your situation is not the same, but similar...
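A minimal sketch of the check-then-restart sequence (assuming the ONE
1.4/2.0 CLI; field and state names may differ in your version):

  onevm show 44 | grep STATE   # confirm STATE is ACTIVE with LCM_STATE UNKNOWN
  onevm restart 44             # re-issue the boot action for the VM
  onevm list                   # the VM should return to runn once booted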

Hope it helps,

Cheers
Gyula


From: users-boun...@lists.opennebula.org [users-boun...@lists.opennebula.org]
on behalf of Mirza Baig [waseem_...@yahoo.com]
Sent: 30 July 2010 16:33
To: users@lists.opennebula.org
Subject: [one-users] In Onemc LCM state is unknown


Hi,

Due to some reasons my frontend and node machines got powered off. After
restarting the systems, in onemc all running images are showing as below and
I am unable to connect to those images. Please help me in bringing the
images up.

Id  User      Name   VM State  LCM State  Cpu  Memory  Host             VNC Port  Time
44  oneadmin  tty9a  active    unknown    0    131072  192.168.138.241  6         3d 2:26:35

Thanks & Regards,
Waseem





Re: [one-users] Monitoring/Deployment issues: virsh fails often

2010-07-30 Thread Tino Vazquez
Hi Floris,

We are not really proposing that you decrease the VM polling time,
that's just a new default in the new version that can be changed at
will.

Furthermore, I think we are mixing two issues here:

Monitoring Issue
----------------

OpenNebula performs a dominfo request to extract VM information. The
way it does this now blocks the libvirt socket, which causes issues
with other operations being performed simultaneously. This can be
solved by making the request read-only, as you proposed. Another
possible solution, perfectly aligned with and contemplated in the
OpenNebula design, is to develop new probes for the Information
Manager, as other large deployments do, that use SNMP, Ganglia,
Nagios or similar tools inside the VMs and/or the physical hosts to
avoid saturating a particular hypervisor (as in this case).
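As an illustration of the read-only approach (a sketch; the libvirt URI
and the one-41 domain name are assumptions):

  $ virsh --readonly -c qemu:///system dominfo one-41

A read-only connection can answer dominfo without holding the socket the
way a full connection does, so polling no longer collides with
create/shutdown operations.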

Deployment Issue
----------------

OpenNebula performs a domain create operation, which also blocks the
socket. This causes essentially the same behavior as the monitoring
issue, but cannot be solved with the read-only flag, since this
operation is not possible in read-only mode. What OpenNebula provides
here is a means to work around the limitations shown by libvirt by
spacing out the domain create operations.
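The spacing is configured in the scheduler; a sketch of the relevant
sched.conf entries in 2.0 (parameter names quoted from memory, so check
the scheduler guide [2] for the authoritative list):

  MAX_HOST     = 1    # VMs dispatched to a single host per scheduling cycle
  MAX_DISPATCH = 30   # total VMs dispatched per scheduling cycle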


Summarizing: libvirt doesn't like more than one simultaneous operation
on VMs, since each operation blocks the socket. The monitoring issue
can be solved with the read-only flag or by creating new monitoring
probes, so the VM polling frequency can be raised at will if you feel
the default doesn't meet your infrastructure's needs. The deployment
issue cannot be solved by any SNMP-like mechanism and needs to be
handled with care; we know that the current OpenNebula approach works
for large deployments simply by spacing out the deployments.

Best regards,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org



On Fri, Jul 30, 2010 at 12:49 PM, Floris Sluiter floris.slui...@sara.nl wrote:
 Hi Tino and List,

 For us these methods are not very acceptable in a production environment. We
 feel that we need to monitor the status of the cloud, both the hosts and
 the VMs, at least once every minute for each component. Stopping or
 drastically reducing the monitoring of VMs is not the way to go for us. If
 the method of monitoring causes the cloud to fail, then the method needs
 changing, not the frequency of it...

 I'll see what we can come up with to improve on this; we already donated
 the SNMP driver for the hosts, and maybe something similar can be done for
 the VMs (we are testing the read-only method for virsh).

 Kind regards,

 Floris


 -Original Message-
 From: tinov...@gmail.com [mailto:tinov...@gmail.com] On Behalf Of Tino Vazquez
 Sent: donderdag 29 juli 2010 19:29
 To: Floris Sluiter
 Cc: DuDu; users@lists.opennebula.org
 Subject: Re: Monitoring/Deployement issues: Virsh failes often

 Dear Floris,

 We noticed this behavior during the scalability tests we put OpenNebula
 through; for 2.0 there was a ticket opened regarding this [1]. It
 happens with libvirt and also with the xen hypervisor. It is by no
 means an OpenNebula scalability issue, since libvirt's intended
 behavior is not the one it actually exposes.

 Notwithstanding, we have introduced in the 2.0 version means to avoid this
 unpredictable behavior. We have limited the number of simultaneous
 deployments to a same host to one, to avoid this blocked-socket issue.
 This can be configured through the scheduler configuration file; more
 information on this can be found in [2]. Also, we have introduced a
 limitation on the number of simultaneous VM polling requests, with the
 same purpose.

 With the changes detailed above, OpenNebula is able to deploy tens of
 thousands of virtual machines on a pool of hundreds of physical
 servers at the same time.

 The read-only method of performing the polling is a very neat and
 interesting proposal; we will evaluate its inclusion in the 2.0
 version. Thanks a lot for this valuable feedback.

 Best regards,

 -Tino

 [1] http://dev.opennebula.org/issues/261
 [2] http://www.opennebula.org/documentation:rel2.0:schg

 --
 Constantino Vázquez Blanco | dsa-research.org/tinova
 Virtualization Technology Engineer / Researcher
 OpenNebula Toolkit | opennebula.org



 On Thu, Jul 29, 2010 at 6:28 PM, Floris Sluiter floris.slui...@sara.nl wrote:
 Hi,

 We again had issues when having many VMs deployed on many hosts at the same
 time (log excerpts below) and deploying more.

 We saw over 25 runaway VMs left behind running from the last two weeks
 that ONE had marked as DONE; deploy, copy and stop also failed randomly
 quite often.

 It is starting to be a major problem when we can't run OpenNebula in a
 stable and predictable manner on larger clouds.

 We have the following intervals configured; we feel we need to monitor more
 often than every 10 minutes.

 

Re: [one-users] Monitoring/Deployment issues: virsh fails often

2010-07-30 Thread Floris Sluiter
Hi Tino,

I do not think that spacing alone will solve it. Maybe a solution would be to
detect whether virsh is locked, then wait and retry until the lock is freed
(for example with a mutex mechanism)? Deploying to a host where another user
is busy with a VM will probably still result in a failure, as will multiple
users deploying/deleting on the same host. It is not a scheduling issue, it
is a concurrency issue...
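To illustrate what I mean, a minimal sketch of the wait-and-retry idea as a
shell wrapper around the deploy call (purely illustrative, not part of
OpenNebula; the retry count and backoff are arbitrary):

  #!/bin/sh
  # Retry virsh create while the libvirt socket is held by another operation.
  for i in 1 2 3 4 5; do
      virsh create "$1" && exit 0
      sleep $((i * 5))   # back off before the next attempt
  done
  exit 1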

Kind regards,

Floris



-Original Message-
From: tinov...@gmail.com [mailto:tinov...@gmail.com] On Behalf Of Tino Vazquez
Sent: vrijdag 30 juli 2010 18:07
To: Floris Sluiter
Cc: users@lists.opennebula.org
Subject: Re: Monitoring/Deployement issues: Virsh failes often

Hi Floris,

We are not really proposing that you decrease the VM polling time,
that's just a new default in the new version that can be changed at
will.

Furthermore, I think we are mixing two issues here:

Monitoring Issue
----------------

OpenNebula performs a dominfo request to extract VM information. The
way it does this now blocks the libvirt socket, which causes issues
with other operations being performed simultaneously. This can be
solved by making the request read-only, as you proposed. Another
possible solution, perfectly aligned with and contemplated in the
OpenNebula design, is to develop new probes for the Information
Manager, as other large deployments do, that use SNMP, Ganglia,
Nagios or similar tools inside the VMs and/or the physical hosts to
avoid saturating a particular hypervisor (as in this case).

Deployment Issue
----------------

OpenNebula performs a domain create operation, which also blocks the
socket. This causes essentially the same behavior as the monitoring
issue, but cannot be solved with the read-only flag, since this
operation is not possible in read-only mode. What OpenNebula provides
here is a means to work around the limitations shown by libvirt by
spacing out the domain create operations.


Summarizing: libvirt doesn't like more than one simultaneous operation
on VMs, since each operation blocks the socket. The monitoring issue
can be solved with the read-only flag or by creating new monitoring
probes, so the VM polling frequency can be raised at will if you feel
the default doesn't meet your infrastructure's needs. The deployment
issue cannot be solved by any SNMP-like mechanism and needs to be
handled with care; we know that the current OpenNebula approach works
for large deployments simply by spacing out the deployments.

Best regards,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org



On Fri, Jul 30, 2010 at 12:49 PM, Floris Sluiter floris.slui...@sara.nl wrote:
 Hi Tino and List,

 For us these methods are not very acceptable in a production environment. We
 feel that we need to monitor the status of the cloud, both the hosts and
 the VMs, at least once every minute for each component. Stopping or
 drastically reducing the monitoring of VMs is not the way to go for us. If
 the method of monitoring causes the cloud to fail, then the method needs
 changing, not the frequency of it...

 I'll see what we can come up with to improve on this; we already donated
 the SNMP driver for the hosts, and maybe something similar can be done for
 the VMs (we are testing the read-only method for virsh).

 Kind regards,

 Floris


 -Original Message-
 From: tinov...@gmail.com [mailto:tinov...@gmail.com] On Behalf Of Tino Vazquez
 Sent: donderdag 29 juli 2010 19:29
 To: Floris Sluiter
 Cc: DuDu; users@lists.opennebula.org
 Subject: Re: Monitoring/Deployement issues: Virsh failes often

 Dear Floris,

 We noticed this behavior during the scalability tests we put OpenNebula
 through; for 2.0 there was a ticket opened regarding this [1]. It
 happens with libvirt and also with the xen hypervisor. It is by no
 means an OpenNebula scalability issue, since libvirt's intended
 behavior is not the one it actually exposes.

 Notwithstanding, we have introduced in the 2.0 version means to avoid this
 unpredictable behavior. We have limited the number of simultaneous
 deployments to a same host to one, to avoid this blocked-socket issue.
 This can be configured through the scheduler configuration file; more
 information on this can be found in [2]. Also, we have introduced a
 limitation on the number of simultaneous VM polling requests, with the
 same purpose.

 With the changes detailed above, OpenNebula is able to deploy tens of
 thousands of virtual machines on a pool of hundreds of physical
 servers at the same time.

 The read-only method of performing the polling is a very neat and
 interesting proposal; we will evaluate its inclusion in the 2.0
 version. Thanks a lot for this valuable feedback.

 Best regards,

 -Tino

 [1] http://dev.opennebula.org/issues/261
 [2] http://www.opennebula.org/documentation:rel2.0:schg

 --
 Constantino Vázquez Blanco | dsa-research.org/tinova