Re: [lustre-discuss] lustre manual formatting error
On 2015/06/18, 5:05 PM, Alexander I Kulyavtsev <a...@fnal.gov> wrote:

I believe the path /proc/fs/lustre/obdfilter/*/brw_stats got broken in this manual subsection: https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml

25.3.4.2. Visualizing Results
...
The text currently reads:

It is also useful to monitor and record average disk I/O sizes during each test using the 'disk io size' histogram in the file /proc/fs/lustre/obdfilter/ (see Section 32.3.5, "Monitoring the OST Block I/O Stream" for details). These numbers help identify problems in the system when full-sized I/Os are not submitted to the underlying disk. This may be caused by problems in the device driver or Linux block layer.

The path is missing the trailing "*/brw_stats"; it should read:

It is also useful to monitor and record average disk I/O sizes during each test using the 'disk io size' histogram in the file /proc/fs/lustre/obdfilter/*/brw_stats (see Section 32.3.5, "Monitoring the OST Block I/O Stream" for details).

Hi Alex, it would be great if you could submit a patch to fix this. Please see: https://wiki.hpdd.intel.com/display/PUB/Making+changes+to+the+Lustre+Manual

Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
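The 'disk io size' histogram the manual refers to can also be scraped programmatically. The sketch below is a minimal parser, assuming the common brw_stats column layout ("size: ios % cum% | ios % cum%"); the exact layout varies somewhat between Lustre versions, so treat it as illustrative rather than authoritative.

```python
import glob

def parse_disk_io_size(text):
    """Return {size_bucket: (read_ios, write_ios)} from brw_stats text."""
    hist = {}
    in_section = False
    for line in text.splitlines():
        if line.startswith("disk I/O size"):
            in_section = True        # column header of the section we want
            continue
        if in_section:
            fields = line.split()
            # data rows look like: "4K:  ios  %  cum%  |  ios  %  cum%"
            if len(fields) < 8 or not fields[0].endswith(":"):
                break                # ran into the next section
            bucket = fields[0].rstrip(":")
            hist[bucket] = (int(fields[1]), int(fields[5]))
    return hist

# One brw_stats file per OST target (empty on non-OSS nodes):
for path in glob.glob("/proc/fs/lustre/obdfilter/*/brw_stats"):
    with open(path) as f:
        print(path, parse_disk_io_size(f.read()))
```

Watching the largest buckets of this histogram during a test run is how you confirm that full-sized I/Os are actually reaching the disks.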
Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA
It is faster, but I don't know what the price/performance tradeoff is, as I only used it as an engineer. As an alternative, take a look at RoCE; it does much the same thing but runs over standard Ethernet hardware. It's still pretty new, though, so you may hit some speed bumps.

-Ben Evans

From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of INKozin
Sent: Friday, June 19, 2015 5:43 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA

My question is about the performance advantages of Lustre RDMA over 10 Gb Ethernet. When using 10 Gb Ethernet to build Lustre, is it worth paying the premium for iWARP? I understand that iWARP essentially reduces latency, but I am less sure of its specific implications for storage. Would it improve performance on small files? Any pointers to representative benchmarks would be much appreciated.

Chelsio has released a white paper comparing Lustre RDMA over 40 Gb Ethernet against FDR IB (http://www.chelsio.com/wp-content/uploads/resources/Lustre-Over-iWARP-vs-IB-FDR.pdf) in which they claim comparable performance for both. How much worse would the throughput at small block sizes be without iWARP?

Thank you
Igor
[lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA
My question is about the performance advantages of Lustre RDMA over 10 Gb Ethernet. When using 10 Gb Ethernet to build Lustre, is it worth paying the premium for iWARP? I understand that iWARP essentially reduces latency, but I am less sure of its specific implications for storage. Would it improve performance on small files? Any pointers to representative benchmarks would be much appreciated.

Chelsio has released a white paper comparing Lustre RDMA over 40 Gb Ethernet against FDR IB (http://www.chelsio.com/wp-content/uploads/resources/Lustre-Over-iWARP-vs-IB-FDR.pdf) in which they claim comparable performance for both. How much worse would the throughput at small block sizes be without iWARP?

Thank you
Igor
Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA
Ben, is it possible to quantify "faster"? Understandably, for a single client on an empty cluster it may feel faster, but on a busy cluster with many reads and writes in flight I'd have thought the limiting factor is the back end's throughput rather than the network, no? As long as the bandwidth to a client is somewhat higher than the average I/O bandwidth (the back end's throughput divided by the number of clients), the client should be content.

On 19 June 2015 at 14:46, Ben Evans <bev...@cray.com> wrote:
It is faster, but I don't know what the price/performance tradeoff is, as I only used it as an engineer. As an alternative, take a look at RoCE; it does much the same thing but runs over standard Ethernet hardware. It's still pretty new, though, so you may hit some speed bumps.
-Ben Evans
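Igor's per-client argument is easy to make concrete. The numbers below (aggregate back-end throughput, client count) are illustrative assumptions, not figures from the thread:

```python
def avg_client_bandwidth_mb_s(backend_gb_s, n_clients):
    """Average per-client I/O bandwidth in MB/s:
    aggregate back-end throughput divided by the number of clients."""
    return backend_gb_s * 1000.0 / n_clients

backend_gb_s = 20.0   # assumed aggregate OSS throughput, GB/s
n_clients = 500       # assumed client count
link_10gbe_mb_s = 1250.0  # ~10 Gb/s line rate in MB/s

avg = avg_client_bandwidth_mb_s(backend_gb_s, n_clients)  # 40.0 MB/s
# A 10 GbE link (~1250 MB/s) dwarfs the ~40 MB/s average share, so under
# steady aggregate load the back end, not the NIC, limits throughput;
# RDMA's win in that regime is latency and CPU overhead, not bandwidth.
```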
Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA
It's faster in that you eliminate all the TCP overhead and latency (something on the order of a 20% improvement in speed, IIRC; it's been several years). Balancing your network performance against what your disks can provide is a whole other level of system design and implementation. You can stack enough disks or SSDs behind a server that the network is your bottleneck; you can put enough network in front of few enough disks that the drives are your bottleneck; you can stack up enough of both that the PCIe bus is your bottleneck. Take the time to compare cost/performance against InfiniBand; since most systems have a dedicated client/server network, you might as well go as fast as you can.

-Ben Evans

From: igk...@gmail.com [mailto:igk...@gmail.com] On Behalf Of INKozin
Sent: Friday, June 19, 2015 11:10 AM
To: Ben Evans
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA

Ben, is it possible to quantify "faster"? Understandably, for a single client on an empty cluster it may feel faster, but on a busy cluster with many reads and writes in flight I'd have thought the limiting factor is the back end's throughput rather than the network, no? As long as the bandwidth to a client is somewhat higher than the average I/O bandwidth (the back end's throughput divided by the number of clients), the client should be content.
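Ben's balancing argument amounts to: a server's deliverable throughput is the minimum over its network, disk, and PCIe paths. A sketch, with all component figures (one 10 GbE NIC, 24 HDDs, a PCIe 3.0 x8 slot) being illustrative assumptions:

```python
def oss_bottleneck(net_mb_s, disk_mb_s, pcie_mb_s):
    """Return (limiting_component, throughput_mb_s) for one server:
    the slowest of the network, disk, and PCIe paths wins."""
    paths = {"network": net_mb_s, "disks": disk_mb_s, "pcie": pcie_mb_s}
    limit = min(paths, key=paths.get)
    return limit, paths[limit]

# One 10 GbE NIC (~1250 MB/s), 24 HDDs at ~150 MB/s each,
# and a PCIe 3.0 x8 slot (~7800 MB/s):
oss_bottleneck(1250.0, 24 * 150.0, 7800.0)
# With these assumed figures the network is the limiting path, which is
# Ben's first case: enough disks behind the server that the NIC saturates.
```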
Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA
I know that QDR IB currently gives the best bang for the buck, and that's what we have now. However, for various reasons we are looking at alternatives, hence the question. Thank you very much for your information, Ben.

On 19 June 2015 at 16:24, Ben Evans <bev...@cray.com> wrote:
It's faster in that you eliminate all the TCP overhead and latency (something on the order of a 20% improvement in speed, IIRC; it's been several years).
Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA
I'd put in a set of LNet gateways, and possibly mount the file system via NFS or CIFS in one or two places if there is some need to access it from "outside". If it's something like corporate IT or security demanding that everything be homogeneous, find some way of charging them for the slowdowns you'll see. Also note that you'll hit some really weird issues if someone starts running port scanners against Lustre.

-Ben Evans

From: Jeff Johnson [mailto:jeff.john...@aeoncomputing.com]
Sent: Friday, June 19, 2015 12:50 PM
To: INKozin
Cc: Ben Evans; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA

Why choose? Why not install an LNet router (QDR to 10 GbE) or dual-home your MDS and OSS nodes with both a QDR HCA and a 10 GbE NIC?

--Jeff
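For reference, the routed setup Jeff and Ben describe is typically expressed through LNet module options. The fragment below is a hedged sketch only: the NIDs (192.168.10.1, 10.0.0.1), interface names (ib0, eth0), and network numbers are illustrative assumptions, not values from this thread, and the syntax should be checked against the LNet configuration chapter of the Lustre manual for your version.

```
# /etc/modprobe.d/lustre.conf on the router node: one leg on the QDR IB
# fabric, one on the 10 GbE fabric, with LNet forwarding enabled.
options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding="enabled"

# On a 10 GbE-only client: reach the IB network (o2ib0) through the
# router's tcp0 NID (address is an assumption).
options lnet networks="tcp0(eth0)" routes="o2ib0 192.168.10.1@tcp0"

# On an IB-attached server: route back to the Ethernet clients through
# the router's o2ib0 NID (address is an assumption).
options lnet networks="o2ib0(ib0)" routes="tcp0 10.0.0.1@o2ib0"
```

Dual-homing the servers instead (putting both an IB HCA and a 10 GbE NIC in each MDS/OSS and listing both in networks=) avoids the extra router hop at the cost of more ports on every server.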
Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA
Why choose? Why not install an LNet router (QDR to 10 GbE) or dual-home your MDS and OSS nodes with both a QDR HCA and a 10 GbE NIC?

--Jeff

On Fri, Jun 19, 2015 at 9:10 AM, INKozin <i.n.ko...@googlemail.com> wrote:
I know that QDR IB currently gives the best bang for the buck, and that's what we have now. However, for various reasons we are looking at alternatives, hence the question. Thank you very much for your information, Ben.

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001  f: 858-412-3845  m: 619-204-9061
4170 Morena Boulevard, Suite D - San Diego, CA 92117
High-Performance Computing / Lustre Filesystems / Scale-out Storage