On 22/11/2016 18:48, Michael Rasmussen wrote:
>>> Have you tested your filter rules?
>> Yes, I set this filter at install:
>>
>> global_filter = [ "r|sd[b-z].*|", "r|disk|", "r|dm-.*|",
>> "r|vm.*disk.*|", "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|", "a|.*|" ]
>>
> Does vgscan and lvscan list
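For reference, a rough sketch of where such a filter lives and how one might
check that it actually takes effect; the grep at the end is only an assumption
about the debug wording, which differs between LVM versions:

  # /etc/lvm/lvm.conf -- the filter belongs in the devices section
  devices {
      global_filter = [ "r|sd[b-z].*|", "r|disk|", "r|dm-.*|",
                        "r|vm.*disk.*|", "r|/dev/zd.*|",
                        "r|/dev/mapper/pve-.*|", "a|.*|" ]
  }

  # re-run the scans and see which devices LVM still touches
  pvscan
  vgscan
  lvscan

  # very verbose output should show the per-device filter decisions
  pvscan -vvv 2>&1 | grep -i filter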
On Tue, 22 Nov 2016 16:35:08 +
Dhaussy Alexandre wrote:
>
> I don't know how, but I feel that every node I add to the cluster currently
> slows down LVM scan a little more... until it ends up interfering with cluster
> services at boot...
Maybe you need to tune
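A rough way to put numbers on that feeling and see whether filtering or tuning
actually helps; plain LVM commands, nothing PVE-specific assumed:

  time pvscan
  time vgscan
  time lvscan
  pvs | wc -l      # how many PVs are really being scanned
  lvs | wc -l      # how many LVs are listed/activated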
...sequel to those thrilling adventures...
I _still_ have problems with nodes not joining the cluster properly after
rebooting...
Here's what we did last night:
- Stopped ALL VMs (just to ensure no corruption happens in case of unexpected
reboots...)
- Patched qemu from 2.6.1 to 2.6.2 to
On 14/11/2016 12:33, Thomas Lamprecht wrote:
> Hope that helps a bit understanding. :)
Sure, thank you for clearing things up. :)
I wish I had done this before, but I learned a lot in the last few days...
On 14/11/2016 12:34, Dietmar Maurer wrote:
>> What I understand so far is that every state/service change from LRM
>> must be acknowledged (cluster-wise) by the CRM master.
>> So if a multicast disruption occurs, and I assume LRM wouldn't be able to
>> talk to the CRM MASTER, then it also couldn't
> What I understand so far is that every state/service change from LRM
> must be acknowledged (cluster-wise) by the CRM master.
> So if a multicast disruption occurs, and I assume LRM wouldn't be able to
> talk to the CRM MASTER, then it also couldn't reset the watchdog, am I
> right?
Nothing
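A quick way to see what quorum and the HA stack look like at any given moment;
this assumes the PVE 4.x service names, and the time window is just an example:

  pvecm status                     # corosync membership and quorum state
  ha-manager status                # CRM master, LRM states, HA services
  systemctl status watchdog-mux pve-ha-lrm pve-ha-crm
  journalctl -u pve-ha-lrm -u pve-ha-crm --since "1 hour ago"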
> On November 11, 2016 at 6:41 PM Dhaussy Alexandre wrote:
>
> > you lost quorum, and the watchdog expired - that is how the watchdog
> > based fencing works.
>
> I don't expect to lose quorum when _one_ node joins or leaves the cluster.
This was probably a
> you lost quorum, and the watchdog expired - that is how the watchdog
> based fencing works.
I don't expect to lose quorum when _one_ node joins or leaves the cluster.
Nov 8 10:38:58 proxmoxt20 pmxcfs[22537]: [status] notice: update cluster info
(cluster name pxmcluster, version = 14)
Nov 8
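If multicast disruption when a node joins is the suspicion, a common test is
omping run on all nodes at the same time; the node names below are placeholders:

  # short burst test
  omping -c 10000 -i 0.001 -F -q node1 node2 node3

  # roughly 10 minutes, long enough to catch IGMP querier/snooping timeouts
  omping -c 600 -i 1 -q node1 node2 node3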
> A long shot. Do you have a hardware watchdog enabled in BIOS?
I didn't modify any BIOS parameters, except power management.
So I believe it's enabled.
The hpwdt module (HP iLO watchdog) is not loaded.
HP ASR is enabled (10 min timeout).
ipmi_watchdog is blacklisted.
nmi_watchdog is enabled => I
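To see which watchdog pieces are actually active on a node, something along
these lines (module names as mentioned above; adjust to your hardware):

  lsmod | egrep 'hpwdt|ipmi_watchdog|iTCO|softdog'
  cat /proc/sys/kernel/nmi_watchdog      # 1 = NMI watchdog enabled
  grep -r watchdog /etc/modprobe.d/      # blacklist entries
  ls -l /dev/watchdog*                   # device watchdog-mux talks to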
> Responding to myself, I find this interesting:
>
> Nov 8 10:39:01 proxmoxt35 corosync[35250]: [TOTEM ] A new membership
> (10.xx.xx.11:684) was formed. Members joined: 13
> Nov 8 10:39:58 proxmoxt35 watchdog-mux[28239]: client watchdog expired -
> disable watchdog updates
you lost quorum,
A long shot. Do you have a hardware watchdog enabled in BIOS?
On November 11, 2016 4:28:09 PM GMT+01:00, Dhaussy Alexandre wrote:
>> Do you have a hint why there are no messages in the logs when the watchdog
>> actually seems to trigger fencing?
>> Because when a node
> Do you have a hint why there are no messages in the logs when the watchdog
> actually seems to trigger fencing?
> Because when a node suddenly reboots, I can't be sure if it's the watchdog,
> a hardware bug, a kernel bug or whatever..
Responding to myself, I find this interesting:
Nov 8 10:39:01
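On the "no messages in the logs" part: when the watchdog fires, the node is
hard-reset, so the last few seconds rarely make it to disk. A persistent
journal (or remote syslog) makes the previous boot readable afterwards; a
minimal sketch, assuming the default journald Storage=auto setting:

  # keep the journal across reboots
  mkdir -p /var/log/journal
  systemctl restart systemd-journald

  # after the next unexpected reboot, inspect the previous boot
  journalctl -b -1 -u corosync -u watchdog-mux -u pve-ha-lrm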
I really hope to find an explanation for all this mess,
because I'm not very confident right now..
So far, if I understand all this correctly, I'm not very fond of how the watchdog
behaves with crm/lrm.
To make a comparison with PVE 3 (RedHat cluster), fencing happened on the
corosync/cluster
I had yet another outage...
BUT now everything is back online! Yay!
So I think I had (at least) two problems:
1 - When installing/upgrading a node.
If the node sees all the SAN storage LUNs before install, the Debian
partitioner tries to scan all LUNs..
This causes almost all nodes to reboot
I have done a cleanup of resources with echo "" >
/etc/pve/ha/resources.cfg
It seems to have resolved all problems with the inconsistent status of
lrm/crm in the GUI.
A new master has been elected. The manager_status file has been
cleaned up.
All nodes are idle or active.
I am re-starting
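For what it's worth, instead of emptying resources.cfg by hand, services can
also be dropped from HA one at a time; a sketch assuming the PVE 4.x
ha-manager CLI, where vm:100 is just an example resource ID:

  ha-manager status            # list managed resources and their states
  ha-manager remove vm:100     # take one resource out of HA management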
Sorry, my old message was too big...
Thanks for the input!...
I have attached the manager_status files.
.old is the original file, and .new is the file I have modified and put
in /etc/pve/ha.
I know this is bad, but here's what I've done:
- delnode on known NON-working nodes.
- rm -Rf
Typo:
- delnode on known NON-working nodes.
On 09/11/2016 17:32, Alexandre DHAUSSY wrote:
> - delnode on known now-working nodes.
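Side note on the node removal itself: the supported way is pvecm, and it is
worth keeping a copy of the HA state file before hand-editing it (the backup
path and node name below are only examples):

  cp /etc/pve/ha/manager_status /root/manager_status.bak
  pvecm delnode <nodename>     # remove the dead node from the cluster
  pvecm status                 # confirm the remaining membership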
Hi,
On 09.11.2016 16:29, Dhaussy Alexandre wrote:
> I try to remove from HA in the GUI, but nothing happens.
> There are some services in "error" or "fence" state.
> Now I tried to remove the non-working nodes from the cluster... but I
> still see those nodes in /etc/pve/ha/manager_status.
Can you
I try to remove from HA in the GUI, but nothing happens.
There are some services in "error" or "fence" state.
Now I tried to remove the non-working nodes from the cluster... but I
still see those nodes in /etc/pve/ha/manager_status.
On 09/11/2016 16:13, Dietmar Maurer wrote:
>> I wanted
Hello,
I have a big problem on my cluster (1500 HA VMs); storage is LVM + SAN (around
70 PVs, 2000 LVs).
Problems began when adding a new node to the cluster…
All nodes crashed and rebooted (this happened yesterday).
After some work I managed to get everything back online, but some nodes were down
(hardware