Re: [gpfsug-discuss] Replicated and non replicated data

2018-04-16 Thread Oesterlin, Robert
A DW post from Yuri a few years back talks about it:

https://www.ibm.com/developerworks/community/forums/html/topic?id=4cebdb97-3052-4cf2-abb1-462660a1489c


Bob Oesterlin
Sr Principal Storage Engineer, Nuance
507-269-0413


From: "Simon Thompson (IT Research Support)" 
Date: Monday, April 16, 2018 at 3:43 AM
To: "Oesterlin, Robert" , gpfsug main discussion 
list 
Subject: [EXTERNAL] Re: Replicated and non replicated data

Yeah, that did it; it was set to the default value of “no”.

What exactly does “no” mean as opposed to “yes”? The docs at
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adm_tuningguide.htm
aren’t very forthcoming on this …

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Replicated and non replicated data

2018-04-16 Thread Simon Thompson (IT Research Support)
Yeah, that did it; it was set to the default value of “no”.

What exactly does “no” mean as opposed to “yes”? The docs at
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adm_tuningguide.htm
aren’t very forthcoming on this …

(Note that in multi-cluster environments it looks like we also have to set this 
on the client clusters.)
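
On the client clusters, something like the following (run on each remote cluster) 
should do it (just a sketch; worth checking whether -i, immediate and permanent, 
or -I, immediate only, is the behaviour you want):

   mmchconfig unmountOnDiskFail=meta -i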

Simon

From: "robert.oester...@nuance.com" 
Date: Friday, 13 April 2018 at 21:17
To: "gpfsug-discuss@spectrumscale.org" 
Cc: "Simon Thompson (IT Research Support)" 
Subject: Re: Replicated and non replicated data

Add:

unmountOnDiskFail=meta

To your config. You can add it with “-I” to have it take effect w/o reboot.
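
Something along these lines should do it (a sketch only; check the -i vs. -I 
semantics on your release, since -I is immediate but doesn't persist across a 
GPFS restart):

   mmchconfig unmountOnDiskFail=meta -I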


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From:  on behalf of "Simon Thompson 
(IT Research Support)" 
Reply-To: gpfsug main discussion list 
Date: Friday, April 13, 2018 at 3:06 PM
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [EXTERNAL] [gpfsug-discuss] Replicated and non replicated data

I have a question about file-systems with replicated and non replicated data.

We have a file-system where metadata is set to copies=2 and data to copies=2; we 
then use a placement policy to selectively replicate some data only once, based 
on file-set. We also place the non-replicated data into a specific pool 
(6tnlsas) to ensure we know where it is placed.
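
For context, the placement rules look roughly like this (a sketch; 'scratch' is a 
stand-in for our real fileset names):

   RULE 'singleCopy' SET POOL '6tnlsas' REPLICATE (1) FOR FILESET ('scratch')
   RULE 'default' SET POOL '6tnlsas'  /* everything else keeps the FS default of 2 copies */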

My understanding was that, in doing this, if we took the disks with the 
non-replicated data offline, we’d still have the FS available for users as the 
metadata is replicated. Sure, accessing a non-replicated data file would give an 
IO error, but the rest of the FS should be up.

We had a situation today where we wanted to take stg01 offline, so we tried 
using mmchdisk stop -d …. Once we got to about disk stg01-01_12_12, GPFS would 
refuse to stop any more disks and complain about too many disks. Similarly, if 
we shut down the NSD servers hosting the disks, the filesystem would SG panic 
and force unmount.
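
For reference, the stop commands were roughly of this form (the file-system is 
called castles, and the disk names are as in the mmlsdisk output further down):

   mmchdisk castles stop -d "stg01-01_3_3;stg01-01_4_4"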

First, am I correct in thinking that a FS with non-replicated data but 
replicated metadata should still be accessible (apart from the non-replicated 
data itself) when the LUNs hosting it are down?

If so, any suggestions why my FS is panicking when we take down the one set of 
disks?

I thought at first we had some non-replicated metadata, so I tried mmrestripefs -R 
--metadata-only to force it to ensure 2 replicas, but this didn’t help.
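
That was roughly:

   mmrestripefs castles -R --metadata-only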

Running 5.0.0.2 on the NSD server nodes.

(The first time we went round this we didn’t have a FS descriptor-only disk, but 
you can see below that we have since added one.)

Thanks

Simon


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Replicated and non replicated data

2018-04-13 Thread Steve Xiao
What is your unmountOnDiskFail configuration setting on the cluster? You 
need to set unmountOnDiskFail to meta if you only have metadata 
replication.
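
You can check the current value with something like the following (a sketch; 
mmlsconfig accepts an attribute name on recent releases):

   mmlsconfig unmountOnDiskFail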

Steve Y. Xiao

> --
> 
> Message: 1
> Date: Fri, 13 Apr 2018 20:05:53 +
> From: "Simon Thompson (IT Research Support)" 
> To: "gpfsug-discuss@spectrumscale.org"
>
> Subject: [gpfsug-discuss] Replicated and non replicated data
> Message-ID: <98f781f7-7063-4293-a5bc-1e8f5a0c9...@bham.ac.uk>
> Content-Type: text/plain; charset="utf-8"
> 
> I have a question about file-systems with replicated and non replicated 
> data.
> 
> We have a file-system where metadata is set to copies=2 and data 
> copies=2, we then use a placement policy to selectively replicate 
> some data only once based on file-set. We also place the non-
> replicated data into a specific pool (6tnlsas) to ensure we know 
> where it is placed.
> 
> My understanding was that in doing this, if we took the disks with 
> the non replicated data offline, we'd still have the FS available 
> for users as the metadata is replicated. Sure accessing a non-
> replicated data file would give an IO error, but the rest of the FS 
> should be up.
> 
> We had a situation today where we wanted to take stg01 offline, so 
> tried using mmchdisk stop -d …. Once we got to about disk 
> stg01-01_12_12, GPFS would refuse to stop any more disks and 
> complain about too many disks. Similarly, if we shut down the NSD 
> servers hosting the disks, the filesystem would SG panic and 
> force unmount.
> 
> First, am I correct in thinking that a FS with non-replicated data, 
> but replicated metadata should still be accessible (not the non-
> replicated data) when the LUNs hosting it are down?
> 
> If so, any suggestions why my FS is panicking when we take down the 
> one set of disks?
> 
> I thought at first we had some non-replicated metadata, tried a 
> mmrestripefs -R --metadata-only to force it to ensure 2 replicas, but
> this didn't help.
> 
> Running 5.0.0.2 on the NSD server nodes.
> 
> (First time we went round this we didn't have a FS descriptor disk, 
> but you can see below that we added this)
> 
> Thanks
> 
> Simon
> 
> [root@nsd01 ~]# mmlsdisk castles -L
> disk                     driver  sector  failure  holds     holds
> name                     type    size    group    metadata  data   status  availability  disk id  storage pool  remarks
> -----------------------  ------  ------  -------  --------  -----  ------  ------------  -------  ------------  -------
> CASTLES_GPFS_DESCONLY01  nsd     512     310      no        no     ready   up            1        system        desc
> stg01-01_3_3             nsd     4096    210      no        yes    ready   down          4        6tnlsas
> stg01-01_4_4             nsd     4096    210      no        yes    ready   down          5        6tnlsas
> stg01-01_5_5             nsd     4096    210      no        yes    ready   down          6        6tnlsas
> stg01-01_6_6             nsd     4096    210      no        yes    ready   down          7        6tnlsas
> stg01-01_7_7             nsd     4096    210      no        yes    ready   down          8        6tnlsas
> stg01-01_8_8             nsd     4096    210      no        yes    ready   down          9        6tnlsas
> stg01-01_9_9             nsd     4096    210      no        yes    ready   down          10       6tnlsas
> stg01-01_10_10           nsd     4096    210      no        yes    ready   down          11       6tnlsas
> stg01-01_11_11           nsd     4096    210      no        yes    ready   down          12       6tnlsas
> stg01-01_12_12           nsd     4096    210      no        yes    ready   down          13       6tnlsas
> stg01-01_13_13           nsd     4096    210      no        yes    ready   down          14       6tnlsas
> stg01-01_14_14           nsd     4096    210      no        yes    ready   down          15       6tnlsas
> stg01-01_15_15           nsd     4096    210      no        yes    ready   down          16       6tnlsas
> stg01-01_16_16           nsd     4096    210      no        yes    ready   down          17       6tnlsas
> stg01-01_17_17           nsd     4096    210      no        yes    ready   down          18       6tnlsas
> stg01-01_18_18           nsd     4096    210      no        yes    ready   down          19       6tnlsas
> stg01-01_19_19           nsd     4096    210      no        yes    ready   down          20       6tnlsas
> stg01-01_20_20           nsd     4096    210      no        yes    ready   down          21       6tnlsas
> stg01-01_21_21           nsd     4096    210      no        yes    ready   down          22       6tnlsas
> stg01-01_ssd_54_54       nsd     4096    210      yes       no     ready   down          23       system
> stg01-01_ssd_56_56       nsd     4096    210      yes       no     ready   down          24       system
> stg02-01_0_0             nsd     4096    110      no        yes    ready   up            25       6tnlsas
> stg02-01_1_1             nsd     4096    110      no        yes    ready   up            26       6tnlsas
> stg02-01_2_2             nsd     4096    110      no        yes    ready   u

Re: [gpfsug-discuss] Replicated and non replicated data

2018-04-13 Thread Oesterlin, Robert
Add:

unmountOnDiskFail=meta

To your config. You can add it with “-I” to have it take effect w/o reboot.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From:  on behalf of "Simon Thompson 
(IT Research Support)" 
Reply-To: gpfsug main discussion list 
Date: Friday, April 13, 2018 at 3:06 PM
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [EXTERNAL] [gpfsug-discuss] Replicated and non replicated data

I have a question about file-systems with replicated and non replicated data.

We have a file-system where metadata is set to copies=2 and data to copies=2; we 
then use a placement policy to selectively replicate some data only once, based 
on file-set. We also place the non-replicated data into a specific pool 
(6tnlsas) to ensure we know where it is placed.

My understanding was that, in doing this, if we took the disks with the 
non-replicated data offline, we’d still have the FS available for users as the 
metadata is replicated. Sure, accessing a non-replicated data file would give an 
IO error, but the rest of the FS should be up.

We had a situation today where we wanted to take stg01 offline, so we tried 
using mmchdisk stop -d …. Once we got to about disk stg01-01_12_12, GPFS would 
refuse to stop any more disks and complain about too many disks. Similarly, if 
we shut down the NSD servers hosting the disks, the filesystem would SG panic 
and force unmount.

First, am I correct in thinking that a FS with non-replicated data but 
replicated metadata should still be accessible (apart from the non-replicated 
data itself) when the LUNs hosting it are down?

If so, any suggestions why my FS is panicking when we take down the one set of 
disks?

I thought at first we had some non-replicated metadata, so I tried mmrestripefs -R 
--metadata-only to force it to ensure 2 replicas, but this didn’t help.

Running 5.0.0.2 on the NSD server nodes.

(The first time we went round this we didn’t have a FS descriptor-only disk, but 
you can see below that we have since added one.)

Thanks

Simon


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss