[PVE-User] HA migration behaviour vs. failures

2014-07-22 Thread Dhaussy Alexandre
Greetings, I've been "playing" with the latest version of Proxmox (3-node cluster + GlusterFS) for a couple of months. My goal is to replace 3 RedHat 5 KVM servers (no HA) hosting ~100 VMs on NAS storage. But I have some annoying issues with live migrations.. Sometimes it will work, but sometimes

Re: [PVE-User] HA migration behaviour vs. failures

2014-07-22 Thread Dhaussy Alexandre
On 22/07/2014 18:56, Michael Rasmussen wrote: > On Tue, 22 Jul 2014 16:30:05 + > Dhaussy Alexandre wrote: > >> >> But I have some annoying issues with live migrations.. >> Sometimes it will work, but sometimes (for no reason) it won't. >> When it

Re: [PVE-User] HA migration behaviour vs. failures

2014-07-22 Thread Dhaussy Alexandre
Give ceph a try. > > Sent from my iPhone > >> On 22 Jul 2014, at 7:25 PM, Dhaussy Alexandre >> wrote: >> >> >> >> On 22/07/2014 18:56, Michael Rasmussen wrote: >>> On Tue, 22 Jul 2014 16:30:05 + >>> Dhaussy Alexandre wrote:

Re: [PVE-User] HA migration behaviour vs. failures

2014-07-24 Thread Dhaussy Alexandre
22/07/2014 18:30, Dhaussy Alexandre wrote: > Greetings, > > I've been "playing" with the latest version of Proxmox (3-node cluster + > GlusterFS) for a couple of months. > My goal is to replace 3 RedHat 5 KVM servers (no HA) hosting ~100 VMs on NAS > storage.

Re: [PVE-User] HA migration behaviour vs. failures

2014-07-24 Thread Dhaussy Alexandre
In broad outline. With RedHat cluster: 1/ @node1: clusvcadm -M vm:foo -m node2 2/ @node2: kill the kvm/foo process 3/ @node1: clusvcadm fails with a non-critical error (OCF code 150, see /usr/share/vm.sh) => You are happy and your VM is still in good health. With Proxmox: 1/ @node1: clusvcadm

Re: [PVE-User] HA migration behaviour vs. failures

2014-07-24 Thread Dhaussy Alexandre
Typo: /usr/share/cluster/vm.sh, not /usr/share/vm.sh On 24/07/2014 14:15, Dhaussy Alexandre wrote: > In broad outline. > > With RedHat cluster: > 1/ @node1: clusvcadm -M vm:foo -m node2 > 2/ @node2: kill the kvm/foo process > 3/ @node1: clusvcadm fails with a non-critical e

Re: [PVE-User] HA migration behaviour vs. failures

2014-07-25 Thread Dhaussy Alexandre
Dietmar Maurer wrote: > If the kvm migration command fails, how can you assume that the VM is still OK? > > > On 24/07/2014 14:15, Dhaussy Alexandre wrote: >>> In broad outline. >>> >>> With RedHat cluster: >>> 1/ @node1: clusvcadm -M vm:foo -m node2

Re: [PVE-User] HA migration behaviour vs. failures

2014-07-28 Thread Dhaussy Alexandre
On 28/07/2014 06:13, Dietmar Maurer wrote: > Because there is definitely something wrong. check_running() only tests > if the kvm process is still alive, but cannot test if everything is OK with > the VM. True, but that's exactly what the cluster does: it monitors the kvm process, nothing else, and

Re: [PVE-User] HA migration behaviour vs. failures

2014-07-29 Thread Dhaussy Alexandre
On 29/07/2014 07:20, Dietmar Maurer wrote: > OK, I changed the behavior: > > https://git.proxmox.com/?p=qemu-server.git;a=commitdiff;h=debe88829e468928271c6d0baf6592b682a70c46 > https://git.proxmox.com/?p=pve-manager.git;a=commitdiff;h=c0a008a8b3e1a4938b10cbd09f7be403ce17f1cb > > Would be great

Re: [PVE-User] HA migration behaviour vs. failures

2014-07-29 Thread Dhaussy Alexandre
Sure, here it is. You will need libparallel-forkmanager-perl. Regards, Alexandre. On 29/07/2014 17:34, Joel S. | VOZELIA wrote: > Hi, > > Would you mind sharing that HA control script > /usr/local/bin/bascule_rhcluster.pl? > > > Best regards, > Joel. > > bascule_rhcluster.pl Description: b

[PVE-User] Storage migration fails at 100%

2014-07-30 Thread Dhaussy Alexandre
Hello, I am currently migrating VM storage online from NAS to GlusterFS. I have migrated 6 vdisks so far without problems, but I got this error on three vdisks: transferred: 35507208192 bytes remaining: 127598592 bytes total: 35634806784 bytes progression: 99.64 % transferred: 35549151232 byt

Re: [PVE-User] Storage migration fails at 100%

2014-07-30 Thread Dhaussy Alexandre
On 30/07/2014 19:52, Dhaussy Alexandre wrote: > Also, in the UI, I now see the same vdisk twice: > as virtio0, and as unused storage.. BUG! :) > Nah, no bug here.. I mistook "virtio0" for "unused disk 0". "unused disk 0" is indeed

[PVE-User] Can't get NoVnc

2014-09-10 Thread Dhaussy Alexandre
Hello, I'm getting a timeout when I try to use noVNC (the Java console works fine.) When launching a console, it says "noVNC ready: native WebSockets, canvas rendering" and it asks for Host/Port (empty)/Password. Task status says: no connection: Connection timed out TASK ERROR: command '/bin/nc -l -p 5901

Re: [PVE-User] Can't get NoVnc

2014-09-15 Thread Dhaussy Alexandre
No idea? Or should I blame my English? x) On 10/09/2014 16:05, Alexandre DHAUSSY wrote: > Hello, > > I'm getting a timeout when I try to use noVNC (the Java console works fine.) > When launching a console, it says "noVNC ready: native WebSockets, > canvas rendering" and it asks for Host/Port (empty)/Pass

Re: [PVE-User] Can't get NoVnc

2014-09-18 Thread Dhaussy Alexandre
Finally, I asked the network guys to open port 8006, removed the NAT, and it works! Sweet :) On 15/09/2014 16:35, Lex Rivera wrote: > I have the exact same issue. The Java console works, but noVNC gives me the same > error with exit code 1 > > On Mon, Sep 15, 2014, at 07:20 AM, Dhaussy Ale

Re: [PVE-User] about Intel 82576 nic multiple queue issue.

2014-09-22 Thread Dhaussy Alexandre
Hello, I also have some trouble with dropped/underrun packets... maybe this is related. My card is an Intel 82580 Gigabit. I didn't check for queues... root@proxmoxt2:~# grep eth6 /proc/interrupts 102: 1 0 0 0 0 0 0 0

Re: [PVE-User] about Intel 82576 nic multiple queue issue.

2014-09-22 Thread Dhaussy Alexandre
And beware of link downs/ups when changing the channel numbers... :( Alexandre. On 22/09/2014 19:31, Dhaussy Alexandre wrote: Hello, I also have some trouble with dropped/underrun packets... maybe this is related. My card is an Intel 82580 Gigabit. I didn't check for queues... root@prox
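The queue setup being discussed can be inspected and changed with ethtool; a rough sketch (interface name taken from the example, queue count illustrative, and only if the driver supports multiple channels), keeping in mind the warning above that changing channel numbers briefly drops the link:
# show how many RX/TX/combined queues the NIC currently exposes
ethtool -l eth6
# change the number of combined channels (expect a short link reset)
ethtool -L eth6 combined 8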

[PVE-User] creating/hotplug new drive from command line

2014-09-30 Thread Dhaussy Alexandre
Hello, I have been doing some research to automate this... I found something that works but it seems a bit complicated. Maybe there is a better way? $ qemu-img create /glusterstorage/images/150/vm-150-disk-2.raw 30G $ /usr/bin/expect -c 'spawn qm monitor 150; send "drive_add auto file=/gluster

Re: [PVE-User] creating/hotplug new drive from command line

2014-10-02 Thread Dhaussy Alexandre
Cool! Thank you. On 01/10/2014 18:01, Alexandre DERUMIER wrote: > Hi, > > simply add in your vmid.conf > > hotplug:1 > > (or through the GUI in Proxmox 3.3) > > > then > > $ qm set 150 -virtio2 glusterstorage:30 > > > > > - Mail origin
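Put together, the whole flow from this thread collapses to two steps; a minimal sketch using the VMID and storage name from the example (the trailing 30 is the new disk size in GB):
# 1) make sure the VM config (/etc/pve/qemu-server/150.conf) contains:  hotplug: 1
# 2) allocate a new 30 GB volume on the 'glusterstorage' storage and hot-attach it as virtio2
qm set 150 -virtio2 glusterstorage:30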

[PVE-User] inconsistency between rgmanager & pve status

2014-10-06 Thread Dhaussy Alexandre
Hello, I had some trouble this morning with my storage nodes and I needed to restart some VMs... All VMs restarted fine but one. On the Proxmox servers I noticed that rgmanager was keeping the VM status as started. root@proxmoxt2:~# clustat | grep 140 pvevm:140 proxmoxt2 started But it was not.

Re: [PVE-User] inconsistency between rgmanager & pve status

2014-10-06 Thread Dhaussy Alexandre
GlusterFS shared directory. On 06/10/2014 11:49, Dietmar Maurer wrote: >> This is not the first time it happens, and every time I have to kill "pvevm status" to >> unblock the service. >> Not sure, but maybe a timeout on "pvevm status" would help? > I assume the VM is on NFS? >

Re: [PVE-User] inconsistency between rgmanager & pve status

2014-10-06 Thread Dhaussy Alexandre
Mounted manually. In /etc/pve/storage.cfg: dir: glusterstorage path /glusterstorage shared content images,iso,vztmpl,rootdir,backup maxfiles 1 In /etc/fstab: ip1:/glusterstorage /glusterstorage glusterfs defaults,noauto,_netdev,backup-volfile-servers=ip2:ip3 L

Re: [PVE-User] inconsistency between rgmanager & pve status

2014-10-06 Thread Dhaussy Alexandre
On 06/10/2014 15:08, Dietmar Maurer wrote: >>> ip1:/glusterstorage /glusterstorage glusterfs >>> defaults,noauto,_netdev,backup-volfile-servers=ip2:ip3 >> OK, we use the system mount command for glusterfs, so we run into the same problem >> as with NFS - we get blocked by the operating system. I have no

Re: [PVE-User] inconsistency between rgmanager & pve status

2014-10-22 Thread Dhaussy Alexandre
Hello, This problem hasn't shown up for two weeks.. Last time it hung, I captured a quick strace, and it seems there was a timeout on a file descriptor. Unfortunately I killed the process before I thought to look in /proc/pid/fd... so I'm not sure if it helps. root@proxmoxt2:~# strace -s 819
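If it hangs again, a rough way to capture both the syscall trace and the file-descriptor table before killing anything (standard strace options; assumes a single matching 'pvevm status' process):
# attach to the hung status check and log timestamps plus long string arguments
strace -f -tt -s 8192 -p "$(pgrep -of 'pvevm status')" -o /tmp/pvevm-status.strace &
# snapshot the fd table so the blocking descriptor can be mapped back to a file or socket
ls -l /proc/"$(pgrep -of 'pvevm status')"/fd > /tmp/pvevm-status.fds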

[PVE-User] Bad crash pmxcfs

2014-10-22 Thread Dhaussy Alexandre
Hello, Is there a known issue with pmxcfs crashing? Oct 19 04:33:58 proxmoxt1 kernel: [2705338.898341] pmxcfs[4003]: segfault at 0 ip (null) sp 7f27c465c818 error 14 in pmxcfs[40+25000] Oct 19 04:34:18 proxmoxt1 rgmanager[2395]: status on pvevm "157" returned 2 (invalid argu

Re: [PVE-User] Bad crash pmxcfs

2014-10-22 Thread Dhaussy Alexandre
On 22/10/2014 12:10, Dietmar Maurer wrote: > Never saw that. Is there a way to reproduce that? > >> Is there a known issue with pmxcfs crashing? >> >> Oct 19 04:33:58 proxmoxt1 kernel: [2705338.898341] pmxcfs[4003]: >> segfault at 0 ip (null) sp 7f27c465c818 error 14 in >> pmxcfs[

Re: [PVE-User] Bad crash pmxcfs

2014-10-23 Thread Dhaussy Alexandre
] pmxcfs[4351]: segfault at 0 ip (null) sp 7feeae141818 error 14 in pmxcfs[40+25000] All the VMs are still up, but the cluster services have failed.. Regards, Alexandre. On 22/10/2014 16:44, Dhaussy Alexandre wrote: > On 22/10/2014 12:10, Dietmar Maurer wrote: >> Neve

Re: [PVE-User] Bad crash pmxcfs

2014-10-23 Thread Dhaussy Alexandre
On 23/10/2014 12:43, Benjamin Redling wrote: > Could you check the output of ulimit -n at least? > > Debian (at least up to Wheezy) has a default limit of 1024 open file > handles. That's often a problem on serious workloads. I regularly set it > to 10k. > > Regards, > Benjamin Yes, I raised it to
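For reference, a minimal way to check and raise the limit Benjamin mentions (the 65536 value and the limits.conf approach are illustrative; daemons started outside a PAM session may need the ulimit set in their init script instead):
# current soft limit for this shell
ulimit -n
# raise the default for PAM login sessions
echo '*  soft  nofile  65536' >> /etc/security/limits.conf
echo '*  hard  nofile  65536' >> /etc/security/limits.conf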

Re: [PVE-User] Disk array on Fiber channel & Thin provisionning & Storage LVM

2014-10-23 Thread Dhaussy Alexandre
> > Do you know if there is a better way to configure and use storage on a Fibre Channel disk array in order to benefit from thin provisioning? > Are there plans to support LVM-level thin provisioning in the future? We plan to migrate from

Re: [PVE-User] Bad crash pmxcfs

2014-10-24 Thread Dhaussy Alexandre
On 24/10/2014 11:23, Dietmar Maurer wrote: >> Maybe it should be tuned by default, especially when Proxmox is used as a >> storage >> server. > Not sure why you run into that limit? Are there many connections to the API > on that server? I'm not sure if it's the proper way to count open files,
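One straightforward way to count what a process really has open, with pmxcfs as the example since it is the one that crashed (assumes a single instance):
# number of file descriptors currently held by pmxcfs
ls /proc/"$(pidof pmxcfs)"/fd | wc -l
# and the limit that actually applies to that process (may differ from the shell's ulimit -n)
grep 'Max open files' /proc/"$(pidof pmxcfs)"/limits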

Re: [PVE-User] inconsistency between rgmanager & pve status

2014-10-27 Thread Dhaussy Alexandre
I think there is a good chance that this problem ("pvevm status" hanging) also came from an insufficient open-files limit. On 22/10/2014 11:37, Dhaussy Alexandre wrote: > Hello, > > This problem hasn't shown up for two weeks.. > > Last time it hung, I capture

[PVE-User] bnx2x issue with kernel 3.10

2014-12-23 Thread Dhaussy Alexandre
Hello, I am testing Proxmox on HP blade servers with kernel 3.10 from the pve test repository. There is an issue with the bnx2x driver: half of the Ethernet ports aren't working (link DOWN.) It works fine when I reboot with kernel 2.6.32 (from the pve no-subscription repository.) The bug is reported

Re: [PVE-User] bnx2x issue with kernel 3.10

2014-12-24 Thread Dhaussy Alexandre
On 24/12/2014 15:17, Dietmar Maurer wrote: >> As seen in the comments, kmod-bnx2x-1.710.51-3.el7_0.x86_64 seems to fix >> the issue. >> Could this driver be included in the next 3.10 pve kernel? > I compiled a new driver for that kernel. Please can you test? Yes, now it works! Thank you. root@pr

[PVE-User] disk devices not properly cleaned up after removing vm

2015-04-24 Thread Dhaussy Alexandre
Hello, Using Proxmox 3.4 with SAN shared storage (LVM.) It seems I have a problem after deleting several VMs (from the GUI.) The LVs are properly removed, but device nodes remain behind on random nodes. root@proxmoxt7:~# clustat Cluster Status for proxmox @ Fri Apr 24 16:42:51 2015 Member Status:

Re: [PVE-User] disk devices not properly cleaned up after removing vm

2015-05-04 Thread Dhaussy Alexandre
Hello, I'm back at work, sorry for the late response. I have cleaned up the devices, so here's a fresh example: After stopping VMs 103 & 104: root@proxmoxt9:~# lvs LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert vm-100-disk-1 T_proxmox_1 -
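A rough way to spot and clean up such leftovers by hand (the VG name comes from the lvs output above, the LV name is illustrative; only remove a mapping once nothing holds it open):
# device-mapper still knows about the LV even though lvremove succeeded elsewhere
dmsetup ls | grep 'vm--103--disk'
# check the open count, then drop the stale mapping
dmsetup info T_proxmox_1-vm--103--disk--1 | grep 'Open count'
dmsetup remove T_proxmox_1-vm--103--disk--1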

Re: [PVE-User] disk devices not properly cleaned up after removing vm

2015-05-05 Thread Dhaussy Alexandre
SAN / LVM shared storage. But I thought it was more related to device mapper or udev? Multipath isn't aware of what's inside PVs or VGs. root@proxmoxt9:/usr/share/perl5/PVE/Storage# multipath -ll VSCVNX4_T_proxmox_2 (3600014400010202fb54185fd9298) dm-4 EMC,Invista size=2.0T features='1 queue

Re: [PVE-User] disk devices not properly cleaned up after removing vm

2015-05-05 Thread Dhaussy Alexandre
I see, deactivate_volumes() is called in vm_stop_cleanup() or migrate/phase3_cleanup(). I think I get the problem: when the VM fails to start, the cleanup is missing. The remaining devices are there because I had errors at first (TASK ERROR: minimum memory must be 1024MB.) QemuServer.pm 4166

[PVE-User] persistent net rules not generated

2016-05-12 Thread Dhaussy Alexandre
Hello, today I installed a node with the Proxmox 4.2 ISO. After struggling with network problems, I noticed that the udev persistent network rules had not been generated at first boot, and my network interfaces were in the wrong order. Usually /etc/udev/rules.d/70-persistent-net.rules is generated with
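When the file is missing it can also be written by hand to pin the naming; a minimal sketch (MAC address and interface name are placeholders, the rule format is the one the generator normally produces):
# pin the NIC with this MAC address to the name eth0 on every boot
cat >> /etc/udev/rules.d/70-persistent-net.rules <<'EOF'
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="aa:bb:cc:dd:ee:01", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
EOF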

[PVE-User] proxmox 4 : HA services won't start after an error

2016-08-31 Thread Dhaussy Alexandre
Hello, today I created a VM with 512 MB and NUMA enabled. After starting the VM, it threw an error "minimum memory must be 1024MB". (Which is OK..) In the Summary tab, the VM said "HA Managed: Yes, State: error, Group: none" and in this state there was no way to restart the service from t

Re: [PVE-User] proxmox 4 : HA services won't start after an error

2016-09-01 Thread Dhaussy Alexandre
OK, I see, thanks for the quick answer. I found the checkbox to disable a service (in Datacenter/HA/Resources.) Just a thought, but it would be nice to be able to control that directly from the VM menu. On 01/09/2016 at 06:49, Dietmar Maurer wrote: >> I know that to start the VM again, the wor
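The same can be done from the command line; a minimal sketch (the service ID vm:100 is illustrative, and the exact state names depend on the ha-manager version in use):
# take the service out of HA control so the error state is cleared
ha-manager set vm:100 --state disabled
# hand it back to HA once the underlying problem (memory < 1024MB with NUMA) is fixed
ha-manager set vm:100 --state enabled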

Re: [PVE-User] proxmox 4 : HA services won't start after an error

2016-09-01 Thread Dhaussy Alexandre
Still no picture.. Well, http://imgur.com/a/40b8u On 01/09/2016 at 10:36, Dhaussy Alexandre wrote: > OK, I see, thanks for the quick answer. > I found the checkbox to disable a service (in Datacenter/HA/Resources.) > > Just a thought, but it would be nice to be able to control t

Re: [PVE-User] Hotplug Memory boots VM with 1GB

2016-09-06 Thread Dhaussy Alexandre
Hello, I had the same problem today while migrating from Proxmox 3.. Fortunately I found this is fixed in a recent RedHat kernel! (2015-07-22 kernel-2.6.32-517.el6) So I just upgraded the kernel in all my VMs and everything works fine. \o/ Here's the bugzilla: https://bugzilla.redhat.com/show_bu

[PVE-User] Migrating a vmdk from NAS to LVM

2016-09-09 Thread Dhaussy Alexandre
Hello, I'm trying to "storage move" disks from NFS/vmdk to LVM/raw, but it doesn't seem to work from the Proxmox GUI. Here's the error message: ## Task viewer: VM 410 - Move disk create full clone of drive virtio0 (isilon:410/vm-410-disk-1.vmdk) Rounding up size to fu
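For reference, the CLI equivalent of the GUI "Move disk" action, which makes the underlying error easier to capture in a terminal; a sketch using the VMID and disk from the task log (the target storage name 'san_lvm' is a placeholder):
# move virtio0 of VM 410 to an LVM storage; on LVM only the raw format is available
qm move_disk 410 virtio0 san_lvm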

Re: [PVE-User] Migrating a vmdk from NAS to LVM

2016-09-09 Thread Dhaussy Alexandre
mox2/testmox2-flat.vmdk file format: raw virtual size: 41G (44023414784 bytes) disk size: 2.8G On 09/09/2016 at 16:54, Alexandre DERUMIER wrote: > Hi, > > Seems to be a Proxmox bug. > What is the size of the LVM volume generated by Proxmox? > > - Original mail - > From

Re: [PVE-User] Migrating a vmdk from NAS to LVM

2016-09-12 Thread Dhaussy Alexandre
On 12/09/2016 at 08:55, Alexandre DERUMIER wrote: > can you provide the output of > > "qemu-img info youroriginal.vmdk"? From: "Dhaussy Alexandre" > To: "proxmoxve" > Sent: Friday 9 September 2016 17:06:23 > > # qemu-img info /nas/p

Re: [PVE-User] Migrating a vmdk from NAS to LVM

2016-09-12 Thread Dhaussy Alexandre
> Here's the content of the vmdk descriptor file: root@proxmoxt30:/nas/proxmox/images/410# cat /nas/proxmox/testmox2/testmox2.vmdk # Disk DescriptorFile version=3 encoding="UTF-8" CID=b4f54854 parentCID= isNativeSnapshot="no" createType="vmfs" # Extent description RW 85983232 VMFS "tes

Re: [PVE-User] Migrating a vmdk from NAS to LVM

2016-09-12 Thread Dhaussy Alexandre
Curiously, qemu doesn't seem to recognise the vmdk file until I use "-f vmdk": # qemu-img info -f vmdk /nas/proxmox/testmox2/testmox2.vmdk image: /nas/proxmox/testmox2/testmox2.vmdk file format: vmdk virtual size: 41G (44023414784 bytes) disk size: 2.8G Format specific information: cid: 30

Re: [PVE-User] Migrating a vmdk from NAS to LVM

2016-09-13 Thread Dhaussy Alexandre
On 13/09/2016 at 05:30, Alexandre DERUMIER wrote: > ok, this could be a fast workaround to implement > # qemu-img info -f vmdk /nas/proxmox/testmox2/testmox2.vmdk > Yes, I have made a quick workaround that seems to work. Not sure if it's the way to go.. --- /tmp/Plugin.pm 2016-09-13 10:24:11.

[PVE-User] Proxmox 4.2 : Online Migration Successful but VM stops

2016-09-30 Thread Dhaussy Alexandre
Hello, here's the log of a successful LIVE migration, but the VM stops afterwards, just before resuming. At this point the service fails and is restarted by the HA services. * sept. 30 14:13:12 starting migration of VM 463 to node 'proxmoxt32' (x.x.x.x) * * sept. 30 14:13:19 migration sp

Re: [PVE-User] Proxmox 4.2 : Online Migration Successful but VM stops

2016-09-30 Thread Dhaussy Alexandre
On 30/09/2016 at 16:28, Kevin Lemonnier wrote: > Firstly thanks, I had no idea that existed.. You're welcome. It's in the wiki: https://pve.proxmox.com/wiki/Hotplug_%28qemu_disk,nic,cpu,memory%29 > Maybe it's your kernel version? I have 3.16.36-1+deb8u1 inside the VM I use > to test this, an

[PVE-User] Storage migration issue with thin provisionning SAN storage

2016-10-03 Thread Dhaussy Alexandre
Hello, I'm currently migrating more than 1000 VMs from VMware to Proxmox, but I'm hitting a major issue with storage migrations.. I'm migrating from VMFS datastores to NFS on VMware, then from NFS to LVM on Proxmox. The LVs on Proxmox sit on top of thin-provisioned (FC SAN) LUNs. Thin provis

Re: [PVE-User] Storage migration issue with thin provisionning SAN storage

2016-10-04 Thread Dhaussy Alexandre
Alexandre, > > If the guests are Linux you could try using the scsi driver with discard enabled. > > fstrim -v / may then free the unused space on the underlying FS. > > I don't use LVM but this certainly works with other types of storage.. > > > > > >
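As a sketch of what that suggestion looks like in practice (VMID, storage and disk names are placeholders; discard needs a SCSI disk on a virtio-scsi controller, and whether space is really returned depends on the storage honouring the discard requests):
# on the host: attach the disk through virtio-scsi with discard enabled, e.g. in the VM config:
#   scsihw: virtio-scsi-pci
#   scsi0: san_lvm:vm-123-disk-1,discard=on
# in the guest, after the migration: release the blocks the filesystem no longer uses
fstrim -v /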

Re: [PVE-User] Storage migration issue with thin provisionning SAN storage

2016-10-04 Thread Dhaussy Alexandre
fs > protocol has limitations, I think it'll be fixed in NFS 4.2) > > I have the same problem when I migrate from NFS to Ceph. > > I'm using discard/trimming to reclaim space after migration. > > - Original mail - > From: "Dhaussy Alexandre" >

Re: [PVE-User] Proxmox 4.2 : Online Migration Successful but VM stops

2016-10-04 Thread Dhaussy Alexandre
> Now about migration, maybe it is a qemu bug, but I never hit it. > > Do you have the same problem without HA enabled? > Can you reproduce it 100%? > Yes, 100% when memory hotplug is enabled. Besides, I found an interesting update to qemu-kvm, because I'm using version 2.6-1 on all nodes: pve-qe

Re: [PVE-User] Migrating a vmdk from NAS to LVM

2016-10-06 Thread Dhaussy Alexandre
On 13/09/2016 at 05:30, Alexandre DERUMIER wrote: >> ok, this could be a fast workaround to implement >> # qemu-img info -f vmdk /nas/proxmox/testmox2/testmox2.vmdk >> May I send this patch to pve-devel? > --- /tmp/Plugin.pm 2016-09-13 10:24:11.665002256 +0200 > +++ /usr/share/perl5/PVE/Stor

Re: [PVE-User] Feedback wanted - Cluster Dashboard

2016-10-18 Thread Dhaussy Alexandre
It would be nice to have a unified view of all nodes' resources. I was thinking of doing this with an external metric server (Graphite/Grafana.) IMHO it is not relevant to aggregate the resources of the whole cluster, because there is no load-balancing mechanism. So you could have, for example: some

[PVE-User] Cluster disaster

2016-11-09 Thread Dhaussy Alexandre
Hello, I have a big problem on my cluster (1500 HA VMs); storage is LVM + SAN (around 70 PVs, 2000 LVs). Problems began when adding a new node to the cluster… All nodes crashed and rebooted (happened yesterday). After some work I managed to get everything back online, but some nodes were down (hardware prob

Re: [PVE-User] Cluster disaster

2016-11-09 Thread Dhaussy Alexandre
I tried to remove them from HA in the GUI, but nothing happens. There are some services in "error" or "fence" state. Now I have tried to remove the non-working nodes from the cluster... but I still see those nodes in /etc/pve/ha/manager_status. On 09/11/2016 at 16:13, Dietmar Maurer wrote: >> I wanted to

Re: [PVE-User] Cluster disaster

2016-11-09 Thread Dhaussy Alexandre
Typo - delnode on known NON-working nodes. On 09/11/2016 at 17:32, Alexandre DHAUSSY wrote: > - delnode on known now-working nodes.

Re: [PVE-User] Cluster disaster

2016-11-09 Thread Dhaussy Alexandre
al de transport n'est pas connecté (i.e. "Transport endpoint is not connected") We are also investigating a possible network problem.. On 09/11/2016 at 17:00, Thomas Lamprecht wrote: > Hi, > > On 09.11.2016 16:29, Dhaussy Alexandre wrote: >> I tried to remove them from HA in the GUI, but nothing happens. >> Ther

Re: [PVE-User] Cluster disaster

2016-11-09 Thread Dhaussy Alexandre
I'm re-starting all VMs in HA with "ha-manager add". Seems to work now... :-/ On 09/11/2016 at 17:40, Dhaussy Alexandre wrote: > Sorry, my old message was too big... > > Thanks for the input!... > > I have attached the manager_status files. > .old is the original file, and .new
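A minimal sketch of that re-registration step (the VMID is a placeholder):
# put a guest back under HA control after it was dropped from manager_status
ha-manager add vm:140
# verify the service shows up again and reaches the expected state
ha-manager status | grep vm:140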

Re: [PVE-User] Cluster disaster

2016-11-09 Thread Dhaussy Alexandre
limit is something like 20-20 kb AFAIK). > I cannot promise any deep examination, but I can skim through them and > look at what happened in the HA stack, maybe I'll see something obvious. > >> A new master has been elected. The manager_status file has been >> cleaned up.

Re: [PVE-User] Cluster disaster

2016-11-11 Thread Dhaussy Alexandre
I really hope to find an explanation for all this mess, because I'm not very confident right now.. So far, if I understand all this correctly.. I'm not very fond of how the watchdog behaves with the CRM/LRM. To make a comparison with PVE 3 (RedHat cluster), fencing happened on the corosync/cluster commu

Re: [PVE-User] Cluster disaster

2016-11-11 Thread Dhaussy Alexandre
> Do you have a hint why there are no messages in the logs when the watchdog > actually seems to trigger fencing? > Because when a node suddenly reboots, I can't be sure if it's the watchdog, > a hardware bug, a kernel bug or whatever.. Responding to myself, I find this interesting: Nov 8 10:39:01 p

Re: [PVE-User] Cluster disaster

2016-11-11 Thread Dhaussy Alexandre
> A long shot. Do you have a hardware watchdog enabled in the BIOS? I didn't modify any BIOS parameters except power management, so I believe it's enabled. The hpwdt module (HP iLO watchdog) is not loaded. HP ASR is enabled (10 min timeout.) ipmi_watchdog is blacklisted. nmi_watchdog is enabled => I hav

Re: [PVE-User] Cluster disaster

2016-11-11 Thread Dhaussy Alexandre
> you lost quorum, and the watchdog expired - that is how the watchdog > based fencing works. I don't expect to lose quorum when _one_ node joins or leaves the cluster. Nov 8 10:38:58 proxmoxt20 pmxcfs[22537]: [status] notice: update cluster info (cluster name pxmcluster, version = 14) Nov 8
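For what it's worth, the live quorum view on a node can be cross-checked with pvecm, which makes it easier to correlate syslog lines like the ones above with actual membership changes (output fields vary a little between corosync versions):
# show quorum state, expected votes and current membership as corosync sees them
pvecm status | grep -iE 'quor|votes|nodes'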

Re: [PVE-User] Cluster disaster

2016-11-14 Thread Dhaussy Alexandre
On 11/11/2016 at 19:43, Dietmar Maurer wrote: > On November 11, 2016 at 6:41 PM Dhaussy Alexandre > wrote: >>> you lost quorum, and the watchdog expired - that is how the watchdog >>> based fencing works. >> I don't expect to lose quorum when _one_ node joi

Re: [PVE-User] Cluster disaster

2016-11-14 Thread Dhaussy Alexandre
On 14/11/2016 at 12:34, Dietmar Maurer wrote: >> What I understand so far is that every state/service change from the LRM >> must be acknowledged (cluster-wide) by the CRM master. >> So if a multicast disruption occurs, and I assume the LRM wouldn't be able to >> talk to the CRM MASTER, then it also couldn't r

Re: [PVE-User] Cluster disaster

2016-11-14 Thread Dhaussy Alexandre
On 14/11/2016 at 12:33, Thomas Lamprecht wrote: > Hope that helps a bit with understanding. :) Sure, thank you for clearing things up. :) I wish I had done this before, but I learned a lot in the last few days...

[PVE-User] weird memory stats in GUI graphs

2016-11-15 Thread Dhaussy Alexandre
Hello, I just noticed two different values on the node Summary tab. Numbers: RAM usage 92.83% (467.65 GiB of 503.79 GiB). Graphs: Total RAM: 540.94GB and Usage: 504.53GB. The server has 512G of RAM + proxmox-ve: 4.3-66.

Re: [PVE-User] weird memory stats in GUI graphs

2016-11-15 Thread Dhaussy Alexandre
> On 15 Nov 2016 at 19:52, Dietmar Maurer wrote: >> No, the values are correct - it is just a different unit. > > Also see: https://en.wikipedia.org/wiki/Gibibyte > > And yes, I know it is not ideal to display values with different base units, > but this has technical reasons... > Indeed, I
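The two figures from the summary tab do line up once the unit is converted; a quick check with the numbers reported above:
# 503.79 GiB expressed in decimal GB: 503.79 * 1024^3 / 1000^3 ≈ 540.94, i.e. the graph's "Total RAM"
awk 'BEGIN { printf "%.2f GB\n", 503.79 * 1024^3 / 1000^3 }'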

Re: [PVE-User] Cluster disaster

2016-11-22 Thread Dhaussy Alexandre
another try and proper testing to see if it fixes my issue... Anyhow, I'm open to suggestions or thoughts that could enlighten me... (And sorry for the long story.) On 14/11/2016 at 12:33, Thomas Lamprecht wrote: On 14.11.2016 11:50, Dhaussy Alexandre wrote: On 11/11/2016 at 19:43, Dietmar

Re: [PVE-User] Cluster disaster

2016-11-22 Thread Dhaussy Alexandre
On 22/11/2016 at 17:56, Michael Rasmussen wrote: > On Tue, 22 Nov 2016 16:35:08 + > Dhaussy Alexandre wrote: > >> I don't know how, but I feel that every node I add to the cluster currently >> slows down the LVM scan a little more... until it ends up interfering wi

Re: [PVE-User] Cluster disaster

2016-11-22 Thread Dhaussy Alexandre
quot;, "a|.*|" ] > > On November 22, 2016 6:12:27 PM GMT+01:00, Dhaussy Alexandre > wrote: >> Le 22/11/2016 à 17:56, Michael Rasmussen a écrit : >>> On Tue, 22 Nov 2016 16:35:08 + >>> Dhaussy Alexandre wrote: >>> >>>> I do

Re: [PVE-User] Cluster disaster

2016-11-22 Thread Dhaussy Alexandre
On 22/11/2016 at 18:48, Michael Rasmussen wrote: >>> Have you tested your filter rules? >> Yes, I set this filter at install: >> >> global_filter = [ "r|sd[b-z].*|", "r|disk|", "r|dm-.*|", >> "r|vm.*disk.*|", "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|", "a|.*|" ] >> > Do vgscan and lvscan list the
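A quick way to answer that question, and to time the scans that were reported as getting slower (illustrative commands, run on one node with the filter in place):
# how long a full scan takes, and which PVs/VGs LVM still sees through the filter
time vgscan
pvs -o pv_name,vg_name,pv_size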

[PVE-User] qemu write cache mode

2017-01-03 Thread Dhaussy Alexandre
Hello, I just spotted something strange with the storage cache mode. I set cache=none on all my VMs (I believe this is the default), however qm monitor says "writeback, direct". Am I missing something? root@proxmoxt20:~# qm monitor 198 Entering Qemu Monitor for VM 198 - type 'help' for help qm> i
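For context, the monitor output is not necessarily a contradiction: QEMU tends to report cache=none as "writeback, direct", i.e. O_DIRECT access with the emulated disk's write cache left enabled. A way to check it interactively (VMID from the example above):
qm monitor 198
# then at the qm> prompt:
#   info block
# a "Cache mode: writeback, direct" line corresponds to cache=none
# (cache.direct=on, cache.writeback=on, cache.no-flush=off)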

Re: [PVE-User] qemu write cache mode

2017-01-03 Thread Dhaussy Alexandre
On 03/01/2017 at 13:59, Dominik Csapak wrote: > On 01/03/2017 11:33 AM, Dhaussy Alexandre wrote: >> Hello, >> >> I just spotted something strange with the storage cache mode. >> >> I set cache=none on all my VMs (I believe this is the default), however >