Re: [gpfsug-discuss] Upgrading kernel on RHEL

2016-12-01 Thread mark . bergman
In the message dated: Tue, 29 Nov 2016 20:56:25 +,
The pithy ruminations from Luis Bolinches on 
 were:
=> It's been around in certain cases; some kernel <-> storage combinations get
=> hit, some do not.
=>  
=> Scott referenced it here: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/Storage+with+GPFS+on+Linux
=>  
=> https://access.redhat.com/solutions/2437991
=>  
=> It also happens on 7.2 and 7.3 ppc64 (not yet on the list of "supported");
=> it does not on 7.1. I can confirm this at least for XIV storage, which
=> can go up to 1024 only.
=>  
=> I know the FAQ will get updated about this; at least there is a CMVC record
=> that states so.
=>  
=> Long story short, you create a FS and you see all your paths die and recover and
=> die and recover and ..., one after another. And it never really gets done.
=> Also, if you boot from SAN ... well, you can figure it out ;)

Wow, that sounds extremely similar to a kernel bug/incompatibility with GPFS 
that I reported in May:

https://patchwork.kernel.org/patch/9140337/
https://bugs.centos.org/view.php?id=10997

My conclusion is not to apply kernel updates unless they are strictly necessary (Dirty
COW, anyone?) or have been tested & validated with GPFS.
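
For anyone who wants to check whether a node is affected before creating a file
system, here is a minimal sketch (the 1024 KB cap matches the XIV limit Luis
mentions; the udev rule file name, the device match, and the kernel version in
the lock are illustrative only, so adjust them for your own storage and baseline):

    # Show the current limit for every block device
    for f in /sys/block/*/queue/max_sectors_kb; do echo "$f: $(cat "$f")"; done

    # Illustrative udev rule (e.g. /etc/udev/rules.d/99-gpfs-max-sectors.rules)
    # to cap the value on device-mapper devices at boot:
    #   ACTION=="add|change", KERNEL=="dm-*", ATTR{queue/max_sectors_kb}="1024"

    # One way to hold a node at a known-good kernel until a fix is validated:
    yum install yum-plugin-versionlock
    yum versionlock 'kernel-3.10.0-327.*'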

Mark


=>  
=> 
=> --
=> Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations
=> 
=> Luis Bolinches
=> Lab Services
=> http://www-03.ibm.com/systems/services/labservices/
=> 
=> IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland
=> Phone: +358 503112585
=> 
=> "If you continually give you will continually have." Anonymous
=>  
=>  
=> 
=> - Original message -
=> From: Nathan Harper 
=> Sent by: gpfsug-discuss-boun...@spectrumscale.org
=> To: gpfsug main discussion list 
=> Cc:
=> Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL
=> Date: Tue, Nov 29, 2016 10:44 PM
=>  
=> This is the first I've heard of this max_sectors_kb issue. Has it
=> already been discussed on the list? Can you point me to any more info?
=> 
=>  
=> 
=> On 29 Nov 2016, at 19:08, Luis Bolinches 
=> wrote:
=>  
=> 
=> Seen that one on 6.8 too
=>  
=> the 4096 does NOT work if the storage is XIV; then it is 1024
=>  
=> 
=> --
=> Ystävällisin terveisin / Kind regards / Saludos cordiales /
=> Salutations
=> 
=> Luis Bolinches
=> Lab Services
=> http://www-03.ibm.com/systems/services/labservices/
=> 
=> IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland
=> Phone: +358 503112585
=> 
=> "If you continually give you will continually have." Anonymous
=>  
=>  
=> 
=> - Original message -
=> From: "Kevin D Johnson" 
=> Sent by: gpfsug-discuss-boun...@spectrumscale.org
=> To: gpfsug-discuss@spectrumscale.org
=> Cc: gpfsug-discuss@spectrumscale.org
=> Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL
=> Date: Tue, Nov 29, 2016 8:48 PM
=>  
=> I have run into the max_sectors_kb issue when creating a file
=> system after moving beyond 3.10.0-327 on RHEL 7.2 as well.  You
=> either have to reinstall the OS or walk the kernel back to 327
=> via:
=> 
=> https://access.redhat.com/solutions/186763
=>  
=> Kevin D. Johnson, MBA, MAFM
=> Spectrum Computing, Senior Managing Consultant
=> 
=> IBM Certified Deployment Professional - Spectrum Scale V4.1.1
=> IBM Certified Deployment Professional - Cloud Object Storage
=> V3.8
=> IBM Certified Solution Advisor - Spectrum Computing V1
=>  
=> 720.349.6199 - kevin...@us.ibm.com
=>  
=>  
=>  
=> 
=> - Original message -
=> From: "Luis Bolinches" 
=> Sent by: gpfsug-discuss-boun...@spectrumscale.org
=> To: gpfsug-discuss@spectrumscale.org
=> Cc: gpfsug-discuss@spectrumscale.org
=> Subject: Re: [gpfsug-discuss] Upgrading kernel on RHEL
=> Date: Tue, Nov 29, 2016 5:20 AM
=>  
=> My 2 cents
=>  
=> And I am sure different people have different opinions.
=>  
=> New kernels might be problematic.
=>  
=> Now I got my fun with the RHEL 7.3 kernel and max_sectors_kb for
=> new FS creation. It is something that will come to the FAQ soon; it is
=> already in draft, not yet public.
=>  
=> I guess whatever you do, get a TEST cluster 

Re: [gpfsug-discuss] rpldisk vs deldisk & adddisk

2016-12-01 Thread Matt Weil
I always suspend the disk then use mmrestripefs -m to remove the data.  Then 
delete the disk with mmdeldisk.

 -m
  Migrates all critical data off of any suspended
  disk in this file system. Critical data is all
  data that would be lost if currently suspended
  disks were removed.

You can do multiple disks that way and use the entire cluster to move data if you want.
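
A minimal sketch of that sequence, assuming a file system called gpfs0 and
made-up NSD names (the --qos flag only exists on releases with QoS support, so
drop it on older code):

    # Suspend the disks so no new blocks are allocated to them
    mmchdisk gpfs0 suspend -d "nsd_old_01;nsd_old_02"

    # Migrate all critical data off the suspended disks, optionally
    # throttled via the maintenance QoS class
    mmrestripefs gpfs0 -m --qos maintenance

    # Remove the now-empty disks from the file system
    mmdeldisk gpfs0 "nsd_old_01;nsd_old_02"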

On 12/1/16 1:10 PM, J. Eric Wonderley wrote:
I have a few misconfigured disk groups and I have a few same size correctly 
configured disk groups.

Is there any (dis)advantage to running mmrpldisk over mmdeldisk and mmadddisk?
Every time I have run mmdeldisk, it has been a somewhat painful process (even
with QoS).





Re: [gpfsug-discuss] Strategies - servers with local SAS disks

2016-12-01 Thread Dean Hildebrand
Hi Bob,

If you mean #4 with 2x data replication...then I would be very wary, as the
chance of data loss would be very high given local disk failure rates.  So
I think it's really #4 with 3x replication vs #3 with 2x replication (and
RAID5/6 in the node) (with maybe 3x for metadata).  The space overhead is
somewhat similar, but the rebuild times should be much faster for #3, given
that a failed disk will not place any load on the storage network (and
there will also be less data placed on the network).
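
To put rough numbers on that, take Bob's 12 servers x 70 x 4 TB and assume an
8+2 RAID6 geometry for option #3 (the geometry is my assumption; it matches the
"20% for RAID" figure quoted below):

    Raw capacity:                12 * 70 * 4 TB    = 3360 TB
    #3  RAID6 (8+2) + 2 copies:  3360 * 0.8 / 2    ~ 1344 TB usable
    #4  no RAID     + 3 copies:  3360 / 3          = 1120 TB usable

So the usable capacity ends up in the same ballpark either way, and the real
differentiators are rebuild time and where the rebuild traffic lands.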

Dean




From:   "Oesterlin, Robert" 
To: gpfsug main discussion list 
Date:   12/01/2016 04:48 AM
Subject:Re: [gpfsug-discuss] Strategies - servers with local SAS disks
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Some interesting discussion here. Perhaps I should have been a bit clearer
on what I’m looking at here:

I have 12 servers with 70*4TB drives each – so the hardware is free. What’s
the best strategy for using these as GPFS NSD servers, given that I don’t
want to rely on any “bleeding edge” technologies.

1) My first choice would be GNR on commodity hardware – if IBM would give
that to us. :-)
2) Use standard RAID groups with no replication – downside is data
availability if you lose an NSD, plus RAID group rebuild time with large
disks
3) RAID groups with replication – but I lose a LOT of space (20% for RAID +
50% of what’s left for replication)
4) No RAID groups, single NSD per disk, single failure group per server,
replication. Downside here is I need to restripe every time a disk fails to
get the filesystem back to a good state. Might be OK using QoS to get the
IO impact down
5) FPO doesn’t seem to buy me anything, as these are straight NSD servers
with no computation running on them, and I still must live with
the re-stripe.

Option (4) seems the best of the “no great options” I have in front of me.
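
As a concrete sketch of option 4 (all names below are invented), each physical
disk becomes its own NSD and every NSD on a given server is placed in that
server's failure group, so GPFS never puts two replicas of a block on the same
node:

    # nsd.stanza -- one line per physical disk, failureGroup = server number
    %nsd: nsd=nsd_srv01_d01 device=/dev/sdb servers=nsdsrv01 usage=dataAndMetadata failureGroup=1
    %nsd: nsd=nsd_srv01_d02 device=/dev/sdc servers=nsdsrv01 usage=dataAndMetadata failureGroup=1
    %nsd: nsd=nsd_srv02_d01 device=/dev/sdb servers=nsdsrv02 usage=dataAndMetadata failureGroup=2
    ...

    mmcrnsd -F nsd.stanza
    # 2 data / 2 metadata copies shown; Dean's reply above argues for 3 data
    # copies on raw disks, which would be -r 3 -R 3 (at a further capacity cost)
    mmcrfs gpfs1 -F nsd.stanza -m 2 -M 2 -r 2 -R 2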

Bob Oesterlin
Sr Principal Storage Engineer, Nuance




From:  on behalf of Zachary Giles

Reply-To: gpfsug main discussion list 
Date: Wednesday, November 30, 2016 at 10:27 PM
To: gpfsug main discussion list 
Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local
SAS disks

Aaron, Thanks for jumping onboard. It's nice to see others confirming this.
Sometimes I feel alone on this topic.

It should also be possible to use ZFS with ZVOLs presented as block
devices as a backing store for NSDs. I'm not claiming it's stable, nor a
good idea, nor performant... but it should be possible. :) There are various
reports about it. It might be at least worth looking into compared to Linux
"md raid" if one truly needs an all-software solution that already exists.
Something to think about and test.

On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister 
wrote:
 Thanks Zach, I was about to echo similar sentiments and you saved me a ton
 of typing :)

 Bob, I know this doesn't help you today since I'm pretty sure it's not yet
 available, but if one scours the interwebs they can find mention of
 something called Mestor.

 There's very very limited information here:

 - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf
 - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20)

 Sounds like if it were available it would fit this use case very well.

 I also had preliminary success with using sheepdog (
 https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a
 similar situation. It's perhaps, at a very high conceptual level, similar
 to Mestor. You erasure code your data across the nodes w/ the SAS disks
 and then present those block devices to your NSD servers. I proved it
 could work but never tried to do much with it because the requirements
 changed.

 My money would be on your first option-- creating local RAIDs and then
 replicating to give you availability in the event a node goes offline.

 -Aaron


 On 11/30/16 10:59 PM, Zachary Giles wrote:
   Just remember that replication protects data availability, not
   integrity. GPFS still requires the underlying block device to return
   good data.

   If you're using it on plain disks (SAS or SSD) and the drive returns
   corrupt data, GPFS won't know any better and will just deliver it to the
   client. Further, if you do a partial read followed by a write, both
   replicas could be destroyed. There's also no efficient way to force use
   of a second replica if you realize the first is bad, short of taking the
   first entirely offline. In that case, while migrating data, there's no
   good way to prevent read-rewrite of other corrupt data on a drive
   that has the "good copy" while restriping off a faulty drive.

  Ideally RAID would have a goal of only returning data that passed the
  RAID 

Re: [gpfsug-discuss] Strategies - servers with local SAS disks

2016-12-01 Thread Oesterlin, Robert
Yep, I should have added those requirements :-)

1) Yes, I care about the data. It’s not scratch, but a permanent repository of 
older, less frequently accessed data.
2) Yes, it will be backed up
3) I expect it to grow over time
4) Data integrity requirement: high

Bob Oesterlin
Sr Principal Storage Engineer, Nuance




From:  on behalf of Stephen Ulmer 

Reply-To: gpfsug main discussion list 
Date: Thursday, December 1, 2016 at 7:13 AM
To: gpfsug main discussion list 
Subject: [EXTERNAL] Re: [gpfsug-discuss] Strategies - servers with local SAS 
disks

Just because I don’t think I’ve seen you state it:  (How much) Do you care 
about the data?

Is it scratch? Is it test data that exists elsewhere? Does it ever flow from 
this storage to any other storage? Will it be dubbed business critical two 
years after they swear to you that it’s not important at all? Is it just your 
movie collection? Are you going to back it up? Is it going to grow? Is this 
temporary?

That would inform us about the level of integrity required, which is one of the 
main differentiators for the options you’re considering.

Liberty,

--
Stephen


Re: [gpfsug-discuss] Strategies - servers with local SAS disks

2016-12-01 Thread Stephen Ulmer
Just because I don’t think I’ve seen you state it:  (How much) Do you care 
about the data?

Is it scratch? Is it test data that exists elsewhere? Does it ever flow from 
this storage to any other storage? Will it be dubbed business critical two 
years after they swear to you that it’s not important at all? Is it just your 
movie collection? Are you going to back it up? Is it going to grow? Is this 
temporary?

That would inform us about the level of integrity required, which is one of the 
main differentiators for the options you’re considering.

Liberty,

-- 
Stephen



Re: [gpfsug-discuss] Strategies - servers with local SAS disks

2016-12-01 Thread Oesterlin, Robert
Some interesting discussion here. Perhaps I should have been a bit clearer on 
what I’m looking at here:

I have 12 servers with 70*4TB drives each – so the hardware is free. What’s the 
best strategy for using these as GPFS NSD servers, given that I don’t want to 
rely on any “bleeding edge” technologies.

1) My first choice would be GNR on commodity hardware – if IBM would give that 
to us. :-)
2) Use standard RAID groups with no replication – downside is data availability 
if you lose an NSD, plus RAID group rebuild time with large disks
3) RAID groups with replication – but I lose a LOT of space (20% for RAID + 50% 
of what’s left for replication)
4) No RAID groups, single NSD per disk, single failure group per server, 
replication. Downside here is I need to restripe every time a disk fails to get 
the filesystem back to a good state. Might be OK using QoS to get the IO impact 
down
5) FPO doesn’t seem to buy me anything, as these are straight NSD servers with no 
computation running on them, and I still must live with the re-stripe.

Option (4) seems the best of the “no great options” I have in front of me.
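
On the QoS point in option 4, a hedged sketch of throttling the post-failure
restripe so it does not swamp user I/O (the class names are the standard ones,
the IOPS cap is an arbitrary example, and mmchqos needs a sufficiently recent
file system format):

    # Cap maintenance traffic (restripe/rebalance) while leaving user I/O alone
    mmchqos gpfs1 --enable pool=*,maintenance=5000IOPS,other=unlimited

    # Re-establish replication after a disk failure, in the maintenance class
    mmrestripefs gpfs1 -r --qos maintenance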

Bob Oesterlin
Sr Principal Storage Engineer, Nuance



