Re: [gpfsug-discuss] Services on DSS/ESS nodes

2020-10-05 Thread Jonathan Buzzard

On 05/10/2020 09:40, Simon Thompson wrote:

>> I now need to check IBM are not going to throw a wobbler down the
>> line if I need to get support before deploying it to the DSS-G
>> nodes :-)
>
> I know there were a lot of other emails about this ...
>
> I think you maybe want to be careful doing this. Whilst it might work
> when you set up the DSS-G like this, remember that the memory usage
> you are seeing at this point in time may not be what you always need.
> For example, if you fail over the recovery groups you need to have
> enough free memory to handle this, e.g. a node failure or, more
> likely, when you are upgrading the building blocks.


I think there is a lack of understanding of exactly how lightweight 
keepalived is.


It's the same code as on my routers, which admittedly have different 
CPUs (MIPS to be precise), but memory usage (excluding shared memory - 
libc, for example, is loaded anyway) is under 200KB. A bash shell uses 
more memory...
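
If anyone wants to check on their own kit, something along these lines 
gives a rough number (smaps_rollup needs a reasonably recent kernel; add 
up /proc/<pid>/smaps yourself on older ones):

    # private (non-shared) memory for each keepalived process
    for pid in $(pidof keepalived); do
        awk -v p="$pid" '/^Private_/ {sum += $2} END {print p": "sum" kB private"}' \
            /proc/$pid/smaps_rollup
    done
    # and the shell you are typing in, for comparison
    awk '/^Private_/ {sum += $2} END {print sum" kB private"}' /proc/$$/smaps_rollup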




> Personally I wouldn't run other things like this on my DSS-G storage
> nodes. We do run e.g. nrpe monitoring to collect and report faults,
> but this is pretty lightweight compared to everything else. They even
> removed support for running the GUI packages on the IO nodes - the
> early DSS-G builds used the IO nodes for this, but now you need
> separate systems.



And keepalived is in the same range as nrpe, which you do run :-) I have 
seen nrpe get out of hand and consume significant amounts of resources 
on a machine; the machine had ground to a halt because of nrpe. One of 
the standard plugins was failing and sitting there busy-waiting, and 
every five minutes another copy ran. It of course decided to go wonky at 
~7pm on a Friday, and by mid-morning on Saturday the machine was 
virtually unresponsive - several minutes just to get a shell...
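
If anyone else gets bitten by that, wrapping the plugin in timeout(1) in 
the nrpe command definition stops a hung check spinning forever. A sketch 
only - the plugin name and thresholds are made up, and the include 
directory depends on how your nrpe.cfg is packaged:

    # cap a flaky plugin at 30s of wall time, SIGKILL 5s later if it ignores SIGTERM
    cat > /etc/nrpe.d/check_something.cfg <<'EOF'
    command[check_something]=/usr/bin/timeout -k 5 30 /usr/lib64/nagios/plugins/check_something -w 80 -c 90
    EOF
    systemctl restart nrpe    # service name differs on Debian (nagios-nrpe-server)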


I would note that you can run keepalived quite happily on an Ubiquiti 
EdgeRouter X, which has a dual-core 880 MHz MIPS CPU with 256MB of RAM. 
MikroTik have models with similar specs that run it too.


On a dual Xeon Gold 6142 machine the usage of RAM and CPU by keepalived 
is noise.
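
For anyone wondering what is actually being proposed, it is nothing more 
exotic than a single VRRP instance, something like this (a minimal 
sketch only - interface, VRID and address are made up, not our 
production config):

    cat > /etc/keepalived/keepalived.conf <<'EOF'
    vrrp_instance VI_1 {
        state BACKUP
        interface bond0              # whichever interface carries the service address
        virtual_router_id 51
        priority 100                 # higher on the preferred node
        advert_int 1
        virtual_ipaddress {
            192.0.2.10/24            # documentation address, substitute the real VIP
        }
    }
    EOF
    systemctl enable --now keepalived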



JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Services on DSS/ESS nodes

2020-10-05 Thread Carl Zetie - ca...@us.ibm.com
>> Mixing DSS and ESS in the same cluster is not a supported configuration.
>
> I know, it means you can never ever migrate your storage from DSS to ESS
> without a full backup and restore. Who with any significant amount of
> storage is going to want to do that? The logic behind this escapes me,
> or perhaps in that scenario IBM might relax the rules for the migration
> period.
>

We do indeed relax the rules temporarily for a migration.

The reasoning behind this rule is for support. Many Scale support issues - 
often the toughest ones - are not about a single node, but about the cluster or 
network as a whole. So if you have a mix of IBM systems with systems supported 
by an OEM (this applies to any OEM by the way, not just Lenovo) and a 
cluster-wide issue, who are you going to call? (Well, in practice you’re going 
to call IBM and we’ll do our best to help you despite limits on our knowledge 
of the OEM systems…).

--CZ



Carl Zetie
Program Director
Offering Management
Spectrum Scale

(919) 473 3318 ][ Research Triangle Park
ca...@us.ibm.com



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Services on DSS/ESS nodes

2020-10-05 Thread Carl Zetie - ca...@us.ibm.com

Jordi wrote:
“Both compute clusters join the storage cluster using a multicluster setup. There 
is no need for the compute clusters to see each other; they only need to see the 
storage cluster. One cluster uses the 10G interface, the other uses the IPoIB 
interface.
You need at least three quorum nodes in each compute cluster, but if licensing is 
per drive on the DSS, it is covered.”

As a side note: One of the reasons we designed capacity (per Disk or per TB) 
licensing the way we did was specifically so that you could make this kind of 
architectural decision on its own merits, without worrying about a licensing 
penalty.




Carl Zetie
Program Director
Offering Management
Spectrum Scale

(919) 473 3318 ][ Research Triangle Park
ca...@us.ibm.com



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Services on DSS/ESS nodes

2020-10-05 Thread Jonathan Buzzard


On 05/10/2020 07:27, Jordi Caubet Serrabou wrote:

> Coming to the routing point, is there any reason why you need it? I
> mean, is this because GPFS is trying to connect between compute nodes,
> or is it for a reason outside the GPFS scope?
> If the reason is GPFS, IMHO the best approach - without knowledge of
> the licensing you have - would be to use separate clusters: a storage
> cluster and two compute clusters.

The issue is that individual nodes want to talk to one another on the 
data interface, which caught me by surprise as the cluster is set to 
admin mode central.
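
For anyone wanting to see this on their own cluster, something like the 
following shows the mode and which addresses the daemons are actually 
talking over (output obviously varies):

    mmlsconfig adminMode      # reports "central" here
    mmlscluster               # admin node name vs daemon node name for each node
    mmdiag --network          # on a node: which peers the daemon holds connections to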


The admin interface runs over ethernet for all nodes on a specific VLAN 
which is given 802.1p priority 5 (that's Voice: < 10 ms latency and 
jitter). That saved a bunch of switching and cabling, as you don't need 
an extra interface for the admin traffic. The cabling already 
significantly restricts airflow in a compute rack as it is, without 
adding a whole bunch more for a barely used admin interface.
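
On the Linux side the tagging is a one-liner with iproute2; a sketch 
with made-up VLAN id, parent NIC and address (and the switch ports 
obviously have to trust the PCP marking):

    # tag admin traffic onto VLAN 42 with 802.1p priority 5 on egress
    ip link add link eth0 name eth0.42 type vlan id 42 \
        egress-qos-map 0:5 1:5 2:5 3:5 4:5 5:5 6:5 7:5
    ip addr add 10.42.0.10/24 dev eth0.42
    ip link set eth0.42 up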


To be frank, it's as if the people who wrote the best practice about a 
separate interface for the admin traffic know very little about 
networking. This is all last-century technology.


The nodes for undergraduate teaching only have a couple of 1Gb ethernet 
ports, which would suck for storage use. However, they also have QDR 
InfiniBand: even though undergraduates can't run multi-node jobs, on the 
old cluster the Lustre storage was delivered over InfiniBand, so they 
got InfiniBand cards.
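
For what it is worth, pointing the Scale daemon traffic at the IPoIB 
network (and/or using the HCAs directly via verbs) is only a couple of 
config options; a sketch with made-up subnet, cluster and node-class 
names:

    mmchconfig subnets="10.30.0.0/teaching.example"    # prefer the IPoIB subnet for daemon traffic
    mmchconfig verbsPorts="mlx4_0/1" -N teachingNodes  # or go straight to RDMA for data
    mmchconfig verbsRdma=enable -N teachingNodes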


> Both compute clusters join the storage cluster using a multicluster
> setup. There is no need for the compute clusters to see each other;
> they only need to see the storage cluster. One cluster uses the 10G
> interface, the other uses the IPoIB interface.
> You need at least three quorum nodes in each compute cluster, but if
> licensing is per drive on the DSS, it is covered.

Three clusters is starting to get complicated from an admin perspective. 
The biggest issue is coordinating maintenance and keeping sufficient 
quorum nodes up.


Maintenance on compute nodes is done via the job scheduler. I know some 
people think this is crazy, but it is in reality extremely elegant.


We can schedule a reboot of a node as soon as its current job has 
finished (usually used for firmware upgrades), or we can schedule a job 
to run as root as soon as the current job has finished (usually for 
applying updates). As such we have no way of knowing when that will be 
for a given node, and there is the potential for all three quorum nodes 
to be down at once.
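
(For the curious, and purely as an illustration assuming a reasonably 
recent Slurm - other schedulers have equivalents - the reboot half of 
this is roughly:

    # reboot each node as soon as its current job finishes, then return it to service
    scontrol reboot ASAP nextstate=RESUME reason="firmware" node[001-120]
    # nodes still waiting for their job to end show an @ suffix on the state
    sinfo -N -o '%N %T %E'

The run-a-job-as-root variant is rather more site-specific, so I'll 
spare the list that.)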


Using this scheme we can seamlessly upgrade the nodes, safe in the 
knowledge that a node is either still busy and running on the current 
configuration, or it has been upgraded and is running the new 
configuration. Consequently, multi-node jobs are guaranteed to have all 
nodes in the job running on the same configuration.


The alternative is to drain the node, but there is only a ~23% chance 
the node will become free during working hours (working hours being 
roughly 40 of the 168 hours in a week), which leads to a significant 
loss of compute time when doing maintenance compared to our existing 
scheme, where the loss of compute time is only as long as the upgrade 
takes to install. Pretty much the only time we have idle nodes is when 
the scheduler is reserving nodes ready to schedule a multi-node job.


Right now we have a single cluster with the quorum nodes being the two 
DSS-G nodes and the node used for backup. It is easy to ensure that 
quorum is maintained on these, and they all run real RHEL, whereas the 
compute nodes run CentOS.
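
Checking and juggling the designations is straightforward enough; node 
names made up:

    mmlscluster                          # the designation column shows the quorum nodes
    mmgetstate -a -L                     # how many quorum nodes are up vs the quorum needed
    mmchnode --quorum -N dssg1,dssg2,backup1
    mmchnode --nonquorum -N some-old-node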



JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Services on DSS/ESS nodes

2020-10-05 Thread Simon Thompson
>I now need to check IBM are not going to throw a wobbler down the line 
>if I need to get support before deploying it to the DSS-G nodes :-)

I know there were a lot of other emails about this ...

I think you maybe want to be careful doing this. Whilst it might work when you 
set up the DSS-G like this, remember that the memory usage you are seeing at 
this point in time may not be what you always need. For example, if you fail 
over the recovery groups you need to have enough free memory to handle this, 
e.g. a node failure or, more likely, when you are upgrading the building blocks.
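
A rough way to sanity-check the headroom before trusting today's numbers - 
nothing DSS-G specific assumed here, just the usual suspects:

    free -g                 # what is actually spare right now
    mmlsconfig pagepool     # the big fixed consumer
    mmlsconfig nsdRAIDTracks
    mmlsrecoverygroup       # which node serves which RG; after a failover one node
                            # holds both, and that is the case you need to size for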

Personally I wouldn't run other things like this on my DSS-G storage nodes. We 
do run e.g. nrpe monitoring to collect and report faults, but this is pretty 
lightweight compared to everything else. They even removed support for running 
the GUI packages on the IO nodes - the early DSS-G builds used the IO nodes for 
this, but now you need separate systems.

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Services on DSS/ESS nodes

2020-10-05 Thread Jordi Caubet Serrabou
Coming to the routing point, is there any reason why you need it? I mean, is 
this because GPFS is trying to connect between compute nodes, or is it for a 
reason outside the GPFS scope?
If the reason is GPFS, IMHO the best approach - without knowledge of the 
licensing you have - would be to use separate clusters: a storage cluster and 
two compute clusters.

Both compute clusters join the storage cluster using a multicluster setup. 
There is no need for the compute clusters to see each other; they only need to 
see the storage cluster. One cluster uses the 10G interface, the other uses the 
IPoIB interface.
You need at least three quorum nodes in each compute cluster, but if licensing 
is per drive on the DSS, it is covered.
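
A minimal sketch of the plumbing (cluster, node and filesystem names made up; 
check the mmauth/mmremotecluster/mmremotefs man pages for your Scale level):

    # on the storage cluster
    mmauth genkey new
    mmauth update . -l AUTHONLY
    mmauth add computeA.example -k /tmp/computeA_id_rsa.pub
    mmauth grant computeA.example -f gpfs01
    # on each compute cluster
    mmremotecluster add storage.example -n dssg1,dssg2 -k /tmp/storage_id_rsa.pub
    mmremotefs add gpfs01 -f gpfs01 -C storage.example -T /gpfs/gpfs01
    mmmount gpfs01 -a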
--
Jordi Caubet Serrabou
IBM Software Defined Infrastructure (SDI) and Flash Technical Sales Specialist
Technical Computing and HPC IT Specialist and Architect
Ext. Phone: (+34) 679.79.17.84 (internal 55834)
E-mail: jordi.cau...@es.ibm.com

> On 5 Oct 2020, at 08:19, Olaf Weiser  wrote:
> 
> 
> let me add a few comments from some very successful large installations in 
> Europe
>  
> # InterOP
> Even though (as Luis pointed out) there is no support statement for running 
> intermixed DSS/ESS in general, it was, is, and will be allowed for short-term 
> purposes such as e.g. migration.
> The reason for not supporting mixed DSS/ESS configurations in general is 
> simply that different release versions of DSS/ESS potentially (not in every 
> release, but sometimes) come with different driver levels (e.g. MOFED), OS, 
> RDMA settings, GPFS tuning, etc...
> Those changes can have an impact, or multiple impacts, and therefore we do 
> not support it in general. Of course - and this would be the advice for 
> everyone - if you are faced with the need to run a mixed configuration, e.g. 
> for a migration and/or because you need to temporarily provide space etc., 
> contact your IBM representative and plan it accordingly.
> There will (likely) be some additional requirements/dependencies defined, 
> like driver versions, OS, and/or Scale versions, but you'll get a chance to 
> run a mixed configuration - temporarily, limited to your specific scenario.
>  
> # Monitoring
> No doubt, monitoring is essential and absolutely needed - and/but IBM wants 
> customers to be very sensitive about what kind of additional software 
> (= workload) gets installed on the ESS IO servers. BTW, this rule applies as 
> well to any other important GPFS node with special roles (e.g. any other NSD 
> server etc.).
> But given the fact that customers usually manage and monitor their server 
> farms from a central point of control (some 3rd-party software), it is 
> common/best practice that additional monitoring software (clients/endpoints) 
> has to run on GPFS nodes, and so on ESS nodes too.
>  
> If that level of acceptance applies to DSS too, you may want to double-check 
> with Lenovo?!
>  
>  
> # Additional GW functions
> It would be a hot iron to allow routing on IO nodes in general. Similar to 
> the mixed-support approach, the variety in the field for such a statement 
> would be hard (== impossible) to manage. As we all agree, additional network 
> traffic can (and in fact will) impact GPFS.
> In your special case, the expected data rates seem to me more than OK and 
> acceptable to go with your suggested config (as long as workloads remain at 
> that level / you monitor it accordingly, as you are already obviously doing).
> Again, to be on the safe side... contact your IBM representative and I'm 
> sure you'll find a way.
>  
>  
>  
> kind regards
> olaf
>  
>  
> - Original message -
> From: Jonathan Buzzard 
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> To: gpfsug-discuss@spectrumscale.org
> Cc:
> Subject: [EXTERNAL] Re: [gpfsug-discuss] Services on DSS/ESS nodes
> Date: Sun, Oct 4, 2020 12:17 PM
>  
> On 04/10/2020 10:29, Luis Bolinches wrote:
> > Hi
> >
> > As stated on the same link you can do remote mounts from each other and
> > be a supported setup.
> >
> > “ You can use the remote mount feature of IBM Spectrum Scale to share
> > file system data across clusters.”
> >
> 
> You can, but imagine I have a DSS-G cluster with 2PB of storage on it,
> which is quite modest in 2020. It is now end of life and for whatever
> reason I decide I want to move to ESS instead.
>
> What any sane storage admin wants to do at this stage is set up the ESS,
> add the ESS nodes to the existing cluster on the DSS-G, then do a bit of
> mmadddisk/mmdeldisk and sit back while the data is seamlessly moved from
> the DSS-G to the ESS. Admittedly this might take a while :-)
>
> Then once all the data is moved, a bit of mmdelnode and bingo, the storage
> has been migrated from DSS-G to ESS with zero downtime.
>
> As that is not allowed for what I presume are commercial reasons (you
> could do it in reverse and presumably that is what IBM don't want) then
> once you are down the rabbit hole of one type of storage 

Re: [gpfsug-discuss] Services on DSS/ESS nodes

2020-10-05 Thread Olaf Weiser
let me add a few comments from some very successful large installations in Europe
 
# InterOP
Even though (as Luis pointed out) there is no support statement for running intermixed DSS/ESS in general, it was, is, and will be allowed for short-term purposes such as e.g. migration.
The reason for not supporting mixed DSS/ESS configurations in general is simply that different release versions of DSS/ESS potentially (not in every release, but sometimes) come with different driver levels (e.g. MOFED), OS, RDMA settings, GPFS tuning, etc...
Those changes can have an impact, or multiple impacts, and therefore we do not support it in general. Of course - and this would be the advice for everyone - if you are faced with the need to run a mixed configuration, e.g. for a migration and/or because you need to temporarily provide space etc., contact your IBM representative and plan it accordingly.
There will (likely) be some additional requirements/dependencies defined, like driver versions, OS, and/or Scale versions, but you'll get a chance to run a mixed configuration - temporarily, limited to your specific scenario.
 
# Monitoring
No doubt, monitoring is essential and absolutely needed - and/but IBM wants customers to be very sensitive about what kind of additional software (= workload) gets installed on the ESS IO servers. BTW, this rule applies as well to any other important GPFS node with special roles (e.g. any other NSD server etc.).
But given the fact that customers usually manage and monitor their server farms from a central point of control (some 3rd-party software), it is common/best practice that additional monitoring software (clients/endpoints) has to run on GPFS nodes, and so on ESS nodes too.
 
If that level of acceptance applies to DSS too, you may want to double-check with Lenovo?!
 
 
# Additional GW functions
It would be a hot iron to allow routing on IO nodes in general. Similar to the mixed-support approach, the variety in the field for such a statement would be hard (== impossible) to manage. As we all agree, additional network traffic can (and in fact will) impact GPFS.
In your special case, the expected data rates seem to me more than OK and acceptable to go with your suggested config (as long as workloads remain at that level / you monitor it accordingly, as you are already obviously doing).
Again, to be on the safe side... contact your IBM representative and I'm sure you'll find a way.
 
 
 
kind regards
olaf
 
 
- Original message -
From: Jonathan Buzzard 
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug-discuss@spectrumscale.org
Cc:
Subject: [EXTERNAL] Re: [gpfsug-discuss] Services on DSS/ESS nodes
Date: Sun, Oct 4, 2020 12:17 PM
 
On 04/10/2020 10:29, Luis Bolinches wrote:
> Hi
>
> As stated on the same link you can do remote mounts from each other and
> be a supported setup.
>
> “ You can use the remote mount feature of IBM Spectrum Scale to share
> file system data across clusters.”
>

You can, but imagine I have a DSS-G cluster with 2PB of storage on it,
which is quite modest in 2020. It is now end of life and for whatever
reason I decide I want to move to ESS instead.

What any sane storage admin wants to do at this stage is set up the ESS,
add the ESS nodes to the existing cluster on the DSS-G, then do a bit of
mmadddisk/mmdeldisk and sit back while the data is seamlessly moved from
the DSS-G to the ESS. Admittedly this might take a while :-)

Then once all the data is moved, a bit of mmdelnode and bingo, the storage
has been migrated from DSS-G to ESS with zero downtime.

As that is not allowed for what I presume are commercial reasons (you
could do it in reverse and presumably that is what IBM don't want) then
once you are down the rabbit hole of one type of storage then you are not
going to switch to a different one.

You need to look at it from the perspective of the users. They frankly
could not give a monkeys what storage solution you are using. All they
care about is having usable storage, and large amounts of downtime to
switch from one storage type to another is not really acceptable.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
 

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss