Re: [gpfsug-discuss] Forcing which node gets expelled?
We hit something like this due to a bug in gskit. We all thought it was networking at first, and it took me a fair bit of time to check all that. We have 7 NSD servers and around 400 clients running 4.2.0.4. We are just trying a workaround now that looks promising. The bug will be fixed at some point.

Cheers,
Greg

From: gpfsug-discuss-boun...@spectrumscale.org On Behalf Of Matt Thorpe
Sent: Tuesday, 25 October 2016 11:06 PM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] Forcing which node gets expelled?

Hi Bob,

That is exactly what I was after, thanks very much! Should buy us a little time so we can resolve our networking issue.

Thanks again
Matt.

From: gpfsug-discuss-boun...@spectrumscale.org On Behalf Of Oesterlin, Robert
Sent: 25 October 2016 13:23
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] Forcing which node gets expelled?

If you look at /usr/lpp/mmfs/samples/expelnode.sample, you can use it as a base and install it in /var/mmfs/etc on the cluster manager. With it you can control which of the two nodes gets expelled. We use it here to send an alert when a node is expelled. There is also "mmexpelnode", with which you can force a particular node to be expelled.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Matt Thorpe <matt.tho...@bodleian.ox.ac.uk>
Reply-To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: Tuesday, October 25, 2016 at 7:05 AM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: [EXTERNAL] [gpfsug-discuss] Forcing which node gets expelled?

Hi,

We are in the process of diagnosing a networking issue that is causing 2 nodes of our 6-node GPFS cluster to expel each other (it appears they experience a temporary network connection outage and lose contact with each other). At present it's not consistent which one gets expelled by the cluster manager, and I wondered if there was any way to always force a specific node to be expelled in this situation?

Thanks and best regards,
Matt

Matt Thorpe | BDLSS Systems Administrator
Bodleian Libraries
Osney One Building, Osney Mead, Oxford, OX2 0EW
matt.tho...@bodleian.ox.ac.uk | 01865 (2)80027

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] Forcing which node gets expelled?
Hi Bob,

That is exactly what I was after, thanks very much! Should buy us a little time so we can resolve our networking issue.

Thanks again
Matt.
Re: [gpfsug-discuss] Forcing which node gets expelled?
All,

As Bob Oesterlin indicated, it is possible to define an expel script (see /usr/lpp/mmfs/samples/expelnode.sample) to control which of the two nodes gets expelled. The script can also be used to issue alerts, etc.

The current policy used (before the script is invoked) when deciding which node to expel is the following:

1. quorum nodes over non-quorum nodes
2. local nodes over remote nodes
3. manager-capable nodes over non-manager-capable nodes
4. nodes managing more FSs over nodes managing fewer FSs
5. NSD server over non-NSD server

Otherwise, expel whoever joined the cluster more recently.

The statement below from Dr. Uwe Falke is also correct: addressing the network connectivity is the better long-term approach, but the callback script could be used to control which node to expel.

Felipe

Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

From: "Uwe Falke" <uwefa...@de.ibm.com>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 10/25/2016 08:32 AM
Subject: Re: [gpfsug-discuss] Forcing which node gets expelled?
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Usually, the cluster manager, on receiving a complaint from a node about another node being gone, checks its own connection to that other node. If that check succeeds, it expels the requester; if not, it follows the request and expels the other node. AFAIK, there are some more subtle algorithms in place if managers or quorum nodes are affected. Maybe that can be used to protect certain nodes from getting expelled by assigning some role in the cluster to them. I do not, however, know these exactly.

That means: it is not easily controllable which one gets expelled. It is better to concentrate on fixing your connectivity issues, as GPFS will not feel comfortable in such an unreliable environment anyway.

Kind regards

Dr. Uwe Falke
IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
---
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefa...@de.ibm.com
---
IBM Deutschland Business & Technology Services GmbH / Management: Frank Hammer, Thorsten Moehring
Registered office: Ehningen / Commercial register: Amtsgericht Stuttgart, HRB 17122

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
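
For readers who want to reason about the default policy Felipe lists above, here is a minimal Python sketch of one way to read it. It is not GPFS code and not the contents of expelnode.sample. The Node class, its field names, and the function names are all hypothetical, and the sketch assumes that each "X over Y" line means "prefer to keep X, so expel Y", with the most recently joined node expelled on a full tie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-ins for what the cluster manager knows about a node.
    name: str
    is_quorum: bool
    is_local: bool
    is_manager_capable: bool
    managed_fs_count: int
    is_nsd_server: bool
    join_order: int  # larger value = joined the cluster more recently


def keep_priority(node: Node) -> tuple:
    """Rank a node by the five criteria, in the order given in the thread."""
    return (
        node.is_quorum,
        node.is_local,
        node.is_manager_capable,
        node.managed_fs_count,
        node.is_nsd_server,
    )


def choose_node_to_expel(a: Node, b: Node) -> Node:
    """Return the node this sketch would expel out of the pair (a, b)."""
    pa, pb = keep_priority(a), keep_priority(b)
    if pa != pb:
        # Tuples compare element by element, so the first criterion on
        # which the nodes differ decides; the lower-ranked node is expelled.
        return a if pa < pb else b
    # Full tie on all five criteria: expel whoever joined more recently.
    return a if a.join_order > b.join_order else b


if __name__ == "__main__":
    nsd = Node("nsd01", True, True, True, 2, True, join_order=1)
    client = Node("client17", False, True, False, 0, False, join_order=9)
    print(choose_node_to_expel(nsd, client).name)  # client17
```

Comparing tuples keeps the criteria strictly ordered: a later criterion only matters when every earlier one is equal, which matches the numbered list. Under this reading, a node that must survive a mutual expel would need to outrank its peer on the earliest criterion where they differ; anything beyond that would be a matter for the expel script.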