Hi All, my issue came back. So it wasn't related to having Proxmox 4.2 on 4 nodes and Proxmox 4.3 on the other 8 nodes.
Now, for example, if I log into the web UI of my first node, all 11 other nodes are marked with a red cross. If I click on a node I can still see the summary (uptime, load, etc.) and can still get a shell on the other nodes. But I can't see the name/status of the virtual machines running on the red-crossed nodes (I can only see the VM ID/number), and of course I can't migrate any VM from one host to another. Any ideas? Thanks!

On Wed, Oct 26, 2016 at 12:57 PM, Szabolcs F. <subc...@gmail.com> wrote:
> Hello again,
>
> sorry for another follow-up. I just realised that 4 of the 12 cluster nodes
> still have PVE Manager version 4.2 and the other 8 nodes have version 4.3.
> Could this be the reason for all my troubles?
>
> I'm in the process of updating these 4 nodes. These 4 nodes were installed
> from the Proxmox install media, but the other 8 nodes were installed on
> Debian 8 first. So the 4 outdated nodes didn't have the 'deb
> http://download.proxmox.com/debian jessie pve-no-subscription' repo file.
> Adding this repo made the 4.3 updates available.
>
> On Wed, Oct 26, 2016 at 12:20 PM, Szabolcs F. <subc...@gmail.com> wrote:
>> Hi Michael,
>>
>> I can change to LACP, sure. Would it be better than simple active-backup?
>> I haven't got much experience with LACP, though.
>>
>> On Wed, Oct 26, 2016 at 11:55 AM, Michael Rasmussen <m...@miras.org> wrote:
>>> Is it possible to switch to 802.3ad bond mode?
>>>
>>> On October 26, 2016 11:12:06 AM GMT+02:00, "Szabolcs F." <subc...@gmail.com> wrote:
>>>> Hi Lutz,
>>>>
>>>> my bondXX files look like this: http://pastebin.com/GX8x3ZaN
>>>> and my corosync.conf: http://pastebin.com/2ss0AAEr
>>>>
>>>> Multicast is enabled on my switches.
>>>>
>>>> The problem is I don't have a way to replicate the issue; it seems to
>>>> happen randomly, so I'm unsure how to do more tests. At the moment my
>>>> cluster has been working fine for about 16 hours. Any ideas for forcing
>>>> the issue?
>>>>
>>>> Thanks,
>>>> Szabolcs
>>>>
>>>> On 24.10.2016 at 9:17 AM, Lutz Willek <l.wil...@science-computing.de> wrote:
>>>>> On 24.10.2016 at 15:16, Szabolcs F. wrote:
>>>>>> Corosync has a lot of these in /var/log/daemon.log:
>>>>>> http://pastebin.com/ajhE8Rb9
>>>>>
>>>>> Please carefully check your (node/switch/multicast) network
>>>>> configuration, and please paste your corosync configuration file and
>>>>> the output of /proc/net/bonding/bondXX.
>>>>>
>>>>> Just a guess:
>>>>>
>>>>> * Power down 1/3 - 1/2 of your nodes and adjust the quorum
>>>>>   (pvecm expected) --> do the problems still occur?
>>>>>
>>>>> * During "problem time" --> is omping still OK?
>>>>>
>>>>> https://pve.proxmox.com/wiki/Troubleshooting_multicast,_quorum_and_cluster_issues
>>>>>
>>>>> Freundliche Grüße / Best Regards
>>>>>
>>>>> Lutz Willek
>>>>>
>>>>> --
>>>>> creating IT solutions
>>>>> Lutz Willek                  science + computing ag
>>>>> Senior Systems Engineer      Geschäftsstelle Berlin
>>>>> IT Services Berlin           Friedrichstraße 187
>>>>> phone +49(0)30 2007697-21    10117 Berlin, Germany
>>>>> fax   +49(0)30 2007697-11    http://de.atos.net/sc
>>>>>
>>>>> S/MIME security:
>>>>> http://www.science-computing.de/cacert.crt
>>>>> http://www.science-computing.de/cacert-sha512.crt
>>>>>
>>>>> _______________________________________________
>>>>> pve-user mailing list
>>>>> pve-user@pve.proxmox.com
>>>>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>
>>> --
>>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
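
For anyone following along: switching the bond from active-backup to 802.3ad (LACP), as Michael suggests, is normally just a change in /etc/network/interfaces on a Debian-based Proxmox node, plus a matching LACP port-channel on the switch side. A minimal sketch follows; the NIC names (eth0/eth1), bridge name (vmbr0), and addresses are placeholders for illustration, not taken from this cluster:

```
# /etc/network/interfaces -- sketch of an 802.3ad (LACP) bond under a
# Proxmox bridge. eth0/eth1, vmbr0, and the addresses are placeholders.
auto bond0
iface bond0 inet manual
        bond-slaves eth0 eth1
        bond-mode 802.3ad              # was: active-backup
        bond-miimon 100
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10
        netmask 255.255.255.0
        gateway 192.0.2.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0
```

Note that the switch ports must be configured as an LACP aggregation group as well, or the bond will not come up; /proc/net/bonding/bond0 should then report "802.3ad" as the bonding mode and show the aggregator IDs.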