Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-22 Thread Ulrich Windl
Hi! I'm no expert, but check out the previous commit, apply your patch, then do git add --interactive and you can pick each chunk for the next commit. The rest is still there, but won't be committed. You may then repeat the git add. Regards, Ulrich Tom Parker tpar...@cbnco.com wrote on

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-21 Thread Ulrich Windl
Hi! Basically I think there should be no hard-coded constants whose value depends on some performance measurements, like 5s for rebooting a VM. So I support Tom's changes. However I noticed: +running; apparently, this period lasts only for a second or +two (missing full stop at end of

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-21 Thread Tom Parker
Thanks for the feedback. Dejan, I have some SLES nodes that are running around 30 pretty heavy VMs and I found that, while I never got to 5s, the time it would take to reboot was not constant. I have a feeling that this bug in xen-list may take a while to be fixed upstream and trickle down

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-21 Thread Dejan Muhamedagic
Hi Ulrich! On Mon, Oct 21, 2013 at 09:28:50AM +0200, Ulrich Windl wrote: Hi! Basically I think there should be no hard-coded constants whose value depends on some performance measurements, like 5s for rebooting a VM. It's actually not 5s, but the status is run 5 times. If the load is high,

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-21 Thread Dejan Muhamedagic
On Fri, Oct 18, 2013 at 02:03:23PM -0400, Tom Parker wrote: I may have actually created the pull request properly... Indeed! You should have the commits be self-contained and fix one thing only. If you need to create items in the commit description, that's a signal that there's too much stuff

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-21 Thread Dejan Muhamedagic
On Mon, Oct 21, 2013 at 07:49:58AM -0400, Tom Parker wrote: Thanks for the feedback. Dejan, I have some SLES nodes that are running around 30 pretty heavy VMs and I found that, while I never got to 5s, the time it would take to reboot was not constant. As mentioned elsewhere, it is

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-21 Thread Tom Parker
Hi Dejan. How can I revert my commits so that they do not include multiple things? I will submit one patch with the logging cleanup and then if needed another with my changes to the meta-data. Tom On 10/21/2013 09:39 AM, Dejan Muhamedagic wrote: Hi Ulrich! On Mon, Oct 21, 2013 at 09:28:50AM

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-18 Thread Tom Parker
Hi Dejan. Sorry to be slow to respond to this. I have done some testing and everything looks good. I spent some time tweaking the RA and I added a parameter called wait_for_reboot (default 5s) to allow us to override the reboot sleep times (in case it's more than 5 seconds on really loaded
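A rough sketch of how a configurable retry count could replace a hard-coded one. This is not the code from the pull request: xen_status is a stub, APPEARS_ON_POLL is a made-up knob for the demo, and only the parameter name wait_for_reboot and its default of 5 come from the thread. OCF passes resource parameters to an agent as OCF_RESKEY_<name> environment variables.

```shell
#!/bin/sh
# Sketch only: xen_status is a stub, not the real status function of the Xen RA.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Parameter name and default taken from the thread.
: "${OCF_RESKEY_wait_for_reboot:=5}"

polls=0
# Stub: the domain "reappears" on poll number APPEARS_ON_POLL (hypothetical).
xen_status() {
    polls=$((polls + 1))
    if [ "$polls" -ge "${APPEARS_ON_POLL:-1}" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Poll once, then retry once per second, at most wait_for_reboot more times,
# for as long as the domain is reported as not running.
status_with_retry() {
    cnt=$OCF_RESKEY_wait_for_reboot
    xen_status "$1"
    rc=$?
    while [ "$rc" -eq "$OCF_NOT_RUNNING" ] && [ "$cnt" -gt 0 ]; do
        sleep 1
        xen_status "$1"
        rc=$?
        cnt=$((cnt - 1))
    done
    return $rc
}

# Demo: the domain is missing on the first poll and back on the second.
APPEARS_ON_POLL=2
status_with_retry demo-domain
echo "rc=$? after $polls polls"
```

As noted elsewhere in the thread, the value counts status polls rather than wall-clock seconds, so under load the real wait can be longer than the number suggests.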

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-18 Thread Tom Parker
I may have actually created the pull request properly... Please let me know and again thanks for your help. Tom On 10/18/2013 01:30 PM, Tom Parker wrote: Hi Dejan. Sorry to be slow to respond to this. I have done some testing and everything looks good. I spent some time tweaking the RA

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-17 Thread Dejan Muhamedagic
Hi Tom, On Wed, Oct 16, 2013 at 05:28:28PM -0400, Tom Parker wrote: Some more reading of the source code makes me think the || [ $__OCF_ACTION != stop ]; is not needed. Yes, you're right. I'll drop that part of the if statement. Many thanks for testing. Cheers, Dejan

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-17 Thread Dejan Muhamedagic
On Thu, Oct 17, 2013 at 11:45:17AM +0200, Dejan Muhamedagic wrote: Hi Tom, On Wed, Oct 16, 2013 at 05:28:28PM -0400, Tom Parker wrote: Some more reading of the source code makes me think the || [ $__OCF_ACTION != stop ]; is not needed. Yes, you're right. I'll drop that part of the if

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-16 Thread Dejan Muhamedagic
Hi Tom, On Tue, Oct 15, 2013 at 07:55:11PM -0400, Tom Parker wrote: Hi Dejan. Just a quick question. I cannot see your new log messages being logged to syslog: ocf_log warn "domain $1 reported as not running, but it is expected to be running! Retrying for $cnt seconds ..." Do you know

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-16 Thread Dejan Muhamedagic
Hi, On Thu, Oct 10, 2013 at 08:29:04AM -0400, Tom Parker wrote: This scares me too. If the start operation finds a running vm and fails, my cluster config will automatically try to start the same VM on the next node it has available. This scenario almost guarantees duplicate VMs even if I

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-16 Thread Tom Parker
Hi. I think there is an issue with the updated Xen RA, specifically with the if statement here, but I am not sure. I may be confused about how bash || works but I don't see my servers ever entering the loop on a VM disappearing. if ocf_is_probe || [ $__OCF_ACTION != stop ]; then

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-16 Thread Tom Parker
Some more reading of the source code makes me think the || [ $__OCF_ACTION != stop ]; is not needed. Xen_Status_with_Retry() is only called from Stop and Monitor so we only need to check if it's a probe. Everything else should be handled in the case statement in the loop. Tom On 10/16/2013
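The effect of the extra test can be checked with a small stand-alone script. This is a sketch under stated assumptions: ocf_is_probe is stubbed here as "monitor action with interval 0" instead of being sourced from ocf-shellfuncs, and the true branch of the if is assumed to skip the retry loop, which the truncated snippet above does not show.

```shell
#!/bin/sh
# Stand-in for ocf_is_probe (assumption: a probe is a monitor with interval 0).
ocf_is_probe() {
    [ "$__OCF_ACTION" = "monitor" ] && [ "${OCF_RESKEY_CRM_meta_interval:-0}" = "0" ]
}

# Condition as quoted in the thread.
cond_original() {
    ocf_is_probe || [ "$__OCF_ACTION" != "stop" ]
}

# Condition with the second test dropped.
cond_simplified() {
    ocf_is_probe
}

# check <action> <interval>: report which branch each condition would take.
check() {
    __OCF_ACTION=$1
    OCF_RESKEY_CRM_meta_interval=$2
    if cond_original; then o=skip; else o=retry; fi
    if cond_simplified; then s=skip; else s=retry; fi
    echo "$1 interval=$2 original=$o simplified=$s"
}

check monitor 0      # probe
check monitor 10000  # recurring monitor
check stop 0         # stop
```

With the original condition a recurring monitor also takes the skip branch, because the right-hand side of || is true for every action other than stop. That would match the observation that the loop is never entered when a VM disappears.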

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-15 Thread Tom Parker
Hi Dejan. Just a quick question. I cannot see your new log messages being logged to syslog: ocf_log warn "domain $1 reported as not running, but it is expected to be running! Retrying for $cnt seconds ..." Do you know where I can set my logging to see warn level messages? I expected to see them

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-10 Thread Tom Parker
This scares me too. If the start operation finds a running vm and fails, my cluster config will automatically try to start the same VM on the next node it has available. This scenario almost guarantees duplicate VMs even if I have the on_reboot=destroy. Dejan, I am not sure but I don't think

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-10 Thread Tom Parker
I want to test the updated RA. Does anyone know how I can increase the loglevel to warn or debug without restarting my cluster? I am not seeing any of the new messages in my logs. Tom On 10/08/2013 07:52 AM, Ulrich Windl wrote: Hi! I thought, I'll never be bitten by this bug, but I actually

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-08 Thread Ulrich Windl
Hi! I thought I'd never be bitten by this bug, but I actually was! Now I'm wondering whether the Xen RA sees the guest if you use pygrub, and pygrub is still counting down for actual boot... But the reason why I'm writing is that I think I've discovered another bug in the RA: CRM decided to

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-08 Thread Dejan Muhamedagic
Hi, On Tue, Oct 08, 2013 at 01:52:56PM +0200, Ulrich Windl wrote: Hi! I thought I'd never be bitten by this bug, but I actually was! Now I'm wondering whether the Xen RA sees the guest if you use pygrub, and pygrub is still counting down for actual boot... But the reason why I'm

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-02 Thread Tom Parker
Thanks to everyone who helped on this one. I really appreciate the speed with which this has been looked at and resolved. I am kind of surprised that no one has reported it before. Lars, do you know the bug report number with the Xen guys? I would like to watch that as it progresses as well.

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-01 Thread Lars Marowsky-Bree
On 2013-10-01T00:53:15, Tom Parker tpar...@cbnco.com wrote: Thanks for paying attention to this issue (not really a bug) as I am sure I am not the only one with this issue. For now I have set all my VMs to destroy so that the cluster is the only thing managing them but this is not super

[Linux-HA] Antw: Re: Xen RA and rebooting

2013-09-30 Thread Ulrich Windl
Hi! With Xen paravirtualization, when a VM (guest) is rebooted (e.g. via guest's reboot), the actual VM (which doesn't really exist as a concept in paravirtualization) is destroyed for a moment and then is recreated (AFAIK). That's why xm console does not survive a guest reboot, and that's why

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-09-30 Thread Tom Parker
Hi Ulrich. You have summed it up exactly and the chances seem small but in the real world (Murphy's Law I guess) I have hit this many times. Twice to the point where I have mangled a Production VM to the point of garbage. The larger the available free memory on the cluster as a whole seems to