Re: [gpfsug-discuss] Anybody running GPFS over iSCSI? -

2018-12-16 Thread Sanchez, Paul
Using iSCSI with Spectrum Scale is definitely doable.  As with running Scale 
in general, your networking needs to be very solid.  For iSCSI the best 
practice I’m aware of is the dedicated/simple approach described by JF below: 
one subnet per switch (failure domain), nothing fancy like VRRP/HSRP/STP, and 
let multipathd do its job of ensuring that all available paths are the ones 
being used.
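
For anyone building this from scratch, a minimal /etc/multipath.conf sketch for an 
iSCSI-backed setup might look roughly like the following. The vendor/product strings 
are placeholders, and your array vendor's recommended device settings should take 
precedence over anything here:

    # /etc/multipath.conf -- minimal sketch only; the vendor/product strings
    # are placeholders, and your array vendor's recommendations win.
    defaults {
        user_friendly_names yes
        find_multipaths     yes
    }
    devices {
        device {
            vendor               "EXAMPLE"         # placeholder vendor string
            product              "EXAMPLE-ISCSI"   # placeholder product string
            path_grouping_policy multibus          # spread I/O across all live paths
            path_checker         tur
            no_path_retry        12                # queue briefly while paths recover
            failback             immediate
        }
    }

"multipath -ll" then shows whether all paths are live, and the resulting /dev/mapper 
devices are what the NSD stanza files end up pointing at.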

We have also had some good experiences using routed iSCSI, which also fits the 
rackscale/hyperscale style deployment model, but this implies that you have a 
good QoS plan to ensure that markings are correct and that any link which can 
become congested cannot completely starve the dedicated queue you should be 
using for iSCSI.  It’s also good practice for the other TCP traffic in your 
non-iSCSI queue to use ECN, in order to keep switch buffer utilization low.  (As 
of today, I haven’t seen any iSCSI arrays which support ECN.)  If you’re 
sharing arrays with multiple clusters/filesystems (i.e. not a single workload), 
then I would also recommend using iSCSI arrays which support 
per-volume/volume-group QoS limits, to avoid noisy-neighbor problems in the 
iSCSI realm.  As of today, there are even 100GbE-capable all-flash solutions 
available which work well with Scale.
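
As a rough sketch of what the marking and ECN pieces can look like on a Linux 
initiator (the DSCP class here is purely an example and has to match whatever your 
QoS plan actually defines on the switches):

    # Mark outbound iSCSI traffic (TCP/3260) so the switches can steer it
    # into its dedicated queue -- CS4 is an arbitrary example class.
    iptables -t mangle -A OUTPUT -p tcp --dport 3260 -j DSCP --set-dscp-class CS4

    # Enable ECN negotiation for the remaining TCP traffic to help keep
    # switch buffer utilization low (the iSCSI array won't negotiate it anyway).
    sysctl -w net.ipv4.tcp_ecn=1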

Lastly, I’d say that iSCSI might not be the future… but NVMe-oF hasn’t exactly 
given us many products ready to be the present.  Most of the early offerings in 
this space are under-featured, over-priced, inflexible, proprietary, or 
fragile.  We are successfully using non-standards-based NVMe solutions today 
with Scale, but they have much more stringent and sensitive networking 
requirements (e.g. non-routed dedicated networking with PFC for RoCE) in order 
to provide reliable performance.  So far, we’ve found these early offerings 
best suited to single-workload use cases.  I do expect this space to keep 
developing and to improve on price, features, and reliability.
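
To give a sense of what "PFC for RoCE" means in practice on the host side, with 
Mellanox's mlnx_qos tooling it is roughly the following (this assumes a Mellanox 
NIC, the chosen priority is arbitrary, and the switch configuration has to match 
it exactly):

    # Enable PFC on a single priority (3 here) for the RoCE traffic class
    mlnx_qos -i eth2 --pfc 0,0,0,1,0,0,0,0

    # Re-run without options to verify the per-priority PFC state
    mlnx_qos -i eth2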

Thx
Paul

From: gpfsug-discuss-boun...@spectrumscale.org On Behalf Of Jan-Frode Myklebust
Sent: Sunday, December 16, 2018 8:46 AM
To: gpfsug main discussion list 
Subject: Re: [gpfsug-discuss] Anybody running GPFS over iSCSI? -


I have been running GPFS over iSCSI, and know of customers who are also. 
Probably not in the most demanding environments, but from my experience iSCSI 
works perfectly fine as long as you have a stable network. Having a dedicated 
(simple) storage network for iSCSI is probably a good idea (just like for FC), 
otherwise iSCSI or GPFS is going to look bad when your network admins cause 
problems on the shared network.


-jf
søn. 16. des. 2018 kl. 12:59 skrev Frank Kraemer <kraem...@de.ibm.com>:

[SNIP]
___

Re: [gpfsug-discuss] Anybody running GPFS over iSCSI? -

2018-12-16 Thread Frank Kraemer
Kevin,

Ethernet networking is changing very fast these days, driven by the "hyperscale"
datacenters. This wave of innovation is changing the world and it is happening
right now. To follow the conversation you need to understand the differences
between ASICs, FPGAs, and NPUs in modern Ethernet networking.

1) Mellanox has a very good answer here based on the Spectrum-2 chip
http://www.mellanox.com/page/press_release_item?id=1933

2) Broadcom's answer to this is the 12.8 Tb/s StrataXGS Tomahawk 3 Ethernet
Switch Series
https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm56980-series
https://globenewswire.com/news-release/2018/10/24/1626188/0/en/Broadcom-Achieves-Mass-Production-on-Industry-Leading-12-8-Tbps-Tomahawk-3-Ethernet-Switch-Family.html

3) Barefoot's Tofino 2 is another valid answer to this problem, as it is
programmable with the P4 language (important for hyperscale datacenters)
https://www.barefootnetworks.com/

The P4 language itself is open source. There are details at p4.org, and you
can download code from GitHub: https://github.com/p4lang/

4) The latest newcomer to this party is Teralynx from Innovium
https://innovium.com/products/teralynx/
https://innovium.com/2018/03/20/innovium-releases-industrys-most-advanced-switch-software-platform-for-high-performance-data-center-networking-2-2-2-2-2-2/

(Most of the new Cisco switches are powered by the Teralynx silicon, as
Cisco seems to be late to this game with its own development.)

So, back to your question: iSCSI is not the future! NVMe and its variants are
the way to go, and these new Ethernet switching products have exactly this in
focus.
Due to the performance demands of NVMe, high-performance, low-latency
networking is required, and Ethernet-based RDMA (RoCE, RoCEv2 or iWARP) is
the leading choice.
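
On the Spectrum Scale side, the NSD protocol can already use RDMA over such 
fabrics; enabling it is roughly the following sketch (the verbsPorts values are 
placeholders for your actual HCA device/port names):

    # Enable RDMA for the NSD protocol in Spectrum Scale
    mmchconfig verbsRdma=enable
    mmchconfig verbsPorts="mlx5_0/1 mlx5_1/1"   # placeholder device/port names

    # The daemons need a restart to pick these settings up
    mmshutdown -a && mmstartup -a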

-frank-

P.S. My Xmas wishlist to the IBM Spectrum Scale development team would be a
"2019 HighSpeed Ethernet Networking optimization for Spectrum Scale" to
make use of all these new things and options :-)

Frank Kraemer
IBM Consulting IT Specialist  / Client Technical Architect
Am Weiher 24, 65451 Kelsterbach, Germany
kraem...@de.ibm.com
Mobile +49171-3043699
IBM Germany
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Anybody running GPFS over iSCSI? -

2018-12-16 Thread Jan-Frode Myklebust
I have been running GPFS over iSCSI, and know of customers who are also.
Probably not in the most demanding environments, but from my experience
iSCSI works perfectly fine as long as you have a stable network. Having a
dedicated (simple) storage network for iSCSI is probably a good idea (just
like for FC), otherwise iSCSI or GPFS is going to look bad when your
network admins cause problems on the shared network.
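
If you do have that dedicated storage network, it is also worth binding the iSCSI 
sessions to it explicitly so they can never wander onto the shared network; with 
open-iscsi that is roughly the following (interface name and portal address are 
placeholders):

    # Create an iSCSI interface bound to the dedicated storage NIC
    iscsiadm -m iface -I storage0 --op=new
    iscsiadm -m iface -I storage0 --op=update -n iface.net_ifacename -v eth2

    # Discover and log in to the targets through that interface only
    iscsiadm -m discovery -t sendtargets -p 192.0.2.10 -I storage0
    iscsiadm -m node --login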


-jf
søn. 16. des. 2018 kl. 12:59 skrev Frank Kraemer:

> [SNIP]
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Anybody running GPFS over iSCSI?

2018-12-16 Thread Jan-Frode Myklebust
I’d be curious to hear if all these arguments against iSCSI shouldn’t also
apply to NSD protocol over TCP/IP?


-jf
man. 17. des. 2018 kl. 01:22 skrev Jonathan Buzzard <jonathan.buzz...@strath.ac.uk>:

> [SNIP]
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Anybody running GPFS over iSCSI?

2018-12-16 Thread Jonathan Buzzard

On 13/12/2018 20:54, Buterbaugh, Kevin L wrote:

[SNIP]



Two things that I am already aware of are:  1) use jumbo frames, and 2) 
run iSCSI over its own private network.  Other things I should be aware 
of?!?




Yes, don't do it. Really, do not do it unless you have datacenter 
Ethernet switches and adapters (the ones required for FCoE). 
Basically, unless you have per-channel pause on your Ethernet fabric, 
performance will at some point all go to shit.
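
You can at least check what flavour of flow control your NICs are actually doing 
with something like this (the interface name is a placeholder):

    # Show the current link-level (global) pause settings
    ethtool -a eth2

    # Enable global pause in both directions, if that is what your design calls for
    ethtool -A eth2 rx on tx on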


So what happens is that your NSD server makes a whole bunch of requests to read 
blocks off the storage array. The requests are small, but the responses are not. 
The responses can overwhelm the Ethernet channel, at which point performance 
falls through the floor. Now you might be lucky and not see this, 
especially if you have, say, 10Gbps links from the storage and 40Gbps 
links to the NSD servers, but you are taking a gamble. Also, the more 
storage arrays you have, the more likely you are to see the problem.
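
If you want to see whether this is already biting you, most drivers expose 
pause-frame counters in the NIC statistics (the counter names vary by driver):

    # Look for rising pause/flow-control counters while the NSD servers are busy
    ethtool -S eth2 | grep -i pause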


To fix this you have two options. The first is datacenter Ethernet with 
per-channel pause. This option is expensive, probably in the same ballpark 
as Fibre Channel. At least it was the last time I looked, though that 
was some time ago now.


The second option is dedicated links between the storage array and the 
NSD server. That is, the cable goes directly between the storage array 
and the NSD server with no switches involved. This option is a 
maintenance nightmare.


At the site where I did this, we had to go with option two because I needed to 
make it work. We ended up ripping it all out and replacing it with FC.


Personally I would see what price you can get DSS storage for, or use 
SAS arrays.


Note that iSCSI can in theory work; the issue is that GPFS scatters data 
to the winds over multiple storage arrays, so your Ethernet channel 
gets swamped and standard Ethernet pause frames stall all the upstream traffic. 
The vast majority of iSCSI use cases don't see this effect.


There is a reason that, to run FC over Ethernet, they had to make Ethernet 
lossless.



JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss