>> In one PCIe slot there is an Intel 10 Gb card, to talk with a Supermicro
>> 10 Gb switch, used exclusively for communication between the five nodes
>> and the storage.
What is the Intel card model? Do you use MTU 9000?

>> pvestatd[2804]: WARNING: storage 'iudice01' is not online

What storage protocol do you use: NFS, iSCSI or LVM? If NFS, what are your mount options?

>> After that, if I try to restart the PVE daemon, it refuses to.
>> If I try to reboot the server, it stops when the PVE daemon should stop, and
>> stays there forever.
>>
>> The only way to reboot any of the nodes is a hard reset!

It's possible that an access to the storage is hanging (stats, VM volume info, ...). Normally a check is done to avoid that (this is the "not online" message you see). The checks are:

for NFS: /usr/bin/rpcinfo -p nfsipserver, with a timeout of 2 sec
for iSCSI: a ping of iscsiserverip on TCP port 3260, with a timeout of 2 sec

So maybe the timeout in the Proxmox code is too low when your SAN is under load.

Also, do you have VMs hanging, or is it only pvedaemon/manager?

----- Original Message -----
From: "Fábio Rabelo" <[email protected]>
To: "Andreu Sànchez i Costa" <[email protected]>
Cc: [email protected]
Sent: Tuesday, 12 March 2013 12:32:21
Subject: Re: [PVE-User] Unreliable

2013/3/12 Andreu Sànchez i Costa <[email protected]>

> Hello Fábio,
>
> On 12/03/13 01:00, Fábio Rabelo wrote:
>
>> 2.3 does not have the reliability 1.9 has!!!!
>>
>> I have been struggling with it for 3 months, my deadlines are gone, and I cannot make it work for more than 3 days without an issue...
>
> I cannot give my opinion about 2.3, but with 2.2.x it works perfectly. I only had to change the elevator to deadline because CFQ had performance problems with our P2000 iSCSI disk array.
>
> As other list members asked, what are your main problems?

I already described the problems several times here.

This is a five-node cluster, with dual-Opteron motherboards from Supermicro.

The storage uses the same motherboard as the five nodes, but with 16 3.5" HD slots, 12 of them occupied by WD enterprise disks.

The storage runs Nas4Free.
(I already tried FreeNAS, same result.)

Like I said, since I installed PVE 1.9 everything has worked fine, for 9 days now, and counting.

Each of the five nodes has 2 embedded network ports, connected to a Linksys switch; I use them to serve the VMs.

In one PCIe slot there is an Intel 10 Gb card, to talk with a Supermicro 10 Gb switch, used exclusively for communication between the five nodes and the storage. This switch has no link to anything else.

On the storage, I use one of the embedded ports for management, and all images are served through the 10 Gb card.

After some time, between 1 and 3 days of the system working, the nodes stop talking to the storage. When it happens, the log shows lots of messages like this:

Mar 6 17:15:29 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:15:39 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:15:49 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:15:59 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:09 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:19 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:29 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:39 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:49 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:59 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online

After that, if I try to restart the PVE daemon, it refuses to. If I try to reboot the server, it stops when the PVE daemon should stop, and stays there forever.

The only way to reboot any of the nodes is a hard reset!

At first, my suspicion went to the storage, so I changed from FreeNAS to Nas4Free: same thing. Desperation!
Then, for testing, I installed PVE 1.9 on all five nodes (I have 2 systems that have been running it for 3 years with no issues; this new system is meant to replace both).

Like I said, 9 days and counting!!!

So there is no problem with the hardware, and there is no problem with Nas4Free! What's left?!?

Fábio Rabelo

_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
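The iSCSI check described in the reply amounts to opening a TCP connection to port 3260 and giving up after 2 seconds. A minimal Python sketch of such a probe is below, useful for testing by hand from a node whether the storage answers within that window while the SAN is under load. It is not the Proxmox code; the function names `tcp_port_reachable` and `probe_with_timeouts` and the extra timeout values are assumptions for illustration only.

```python
import socket


def tcp_port_reachable(host: str, port: int = 3260, timeout: float = 2.0) -> bool:
    """Try a TCP connection to host:port and report whether it succeeded.

    Defaults mirror the check described in the reply: iSCSI port 3260 and a
    2-second timeout. Both are parameters so a longer timeout can be tried.
    """
    try:
        # create_connection raises OSError on a refused connection, an
        # unreachable host or a timeout (socket.timeout subclasses OSError).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_with_timeouts(host: str, port: int = 3260,
                        timeouts: tuple = (2.0, 5.0, 10.0)) -> dict:
    """Hypothetical helper: run the probe once per timeout value.

    If the short timeout fails while a longer one succeeds, the storage is
    answering, just slowly, which would point at the timeout rather than
    at a dead link.
    """
    return {t: tcp_port_reachable(host, port, t) for t in timeouts}
```

Run from a node while the "not online" warnings are appearing, something like `probe_with_timeouts("192.0.2.10")` (a placeholder address; substitute the storage's IP on the 10 Gb network) would suggest whether a longer timeout alone would make the storage look online. For NFS, the equivalent manual test is the `rpcinfo -p` command quoted in the reply.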
