[gpfsug-discuss] nsdperf crash testing RDMA between Power BE and Intel nodes
Dear all,

through some gpfsperf tests against an ESS block (configuration unchanged), I am seeing lots of waiters like

NSDThread: on ThCond 0x3FFA800670A0 (FreePTrackCondvar), reason 'wait for free PTrack'

That is not on file creation but on writing to an already existing file. What resource is the system short of here? IMHO it cannot be physical data tracks on pdisks, since the test does not allocate any space; it just rewrites an existing file.

The only thread shortage I could see might be

Total server worker threads: running 3042, desired 3072, forNSD 2, forGNR 3070, nsdBigBufferSize 16777216
nsdMultiQueue: 512, nsdMultiQueueType: 1, nsdMinWorkerThreads: 3072, nsdMaxWorkerThreads: 3072

where there is a difference of 30 between the desired and the running number of worker threads (but that is only 1%, and 30 more would not necessarily make a big difference).

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke

IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
---
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefa...@de.ibm.com
---
IBM Deutschland Business & Technology Services GmbH / Managing Directors: Thomas Wolter, Sven Schooß
Registered office: Ehningen / Register court: Amtsgericht Stuttgart, HRB 17122
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
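The 1% figure above is plain arithmetic: 30 of 3072 desired worker threads are missing. When there are many waiters, it can help to tally them by their reason string before deciding which resource is short. The sketch below assumes the waiter lines and the thread-summary line have already been captured as plain text (the capture step is not shown, and the sample data is just the lines quoted in the post); the helper names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample input: the lines quoted in the post. In practice these
# would be captured from the cluster's diagnostic output (not shown here).
WAITER_LINES = [
    "NSDThread: on ThCond 0x3FFA800670A0 (FreePTrackCondvar), reason 'wait for free PTrack'",
]
THREAD_LINE = ("Total server worker threads: running 3042, desired 3072, "
               "forNSD 2, forGNR 3070, nsdBigBufferSize 16777216")

# Extracts the quoted text after "reason" in a waiter line.
REASON_RE = re.compile(r"reason '([^']*)'")


def tally_reasons(lines):
    """Count waiters per reason string; lines without a reason are skipped."""
    counts = Counter()
    for line in lines:
        match = REASON_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def worker_thread_shortfall(line):
    """Return (running, desired, missing, missing as a percentage of desired)."""
    running = int(re.search(r"running (\d+)", line).group(1))
    desired = int(re.search(r"desired (\d+)", line).group(1))
    missing = desired - running
    return running, desired, missing, 100.0 * missing / desired


if __name__ == "__main__":
    print(tally_reasons(WAITER_LINES))
    running, desired, missing, pct = worker_thread_shortfall(THREAD_LINE)
    print(f"running={running} desired={desired} missing={missing} ({pct:.2f}%)")
```

With the quoted numbers this reports 30 missing threads, just under 1% of the desired count, which supports the point that the thread shortfall alone is unlikely to explain a large number of PTrack waiters.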
Re: [gpfsug-discuss] nsdperf crash testing RDMA between Power BE and Intel nodes
Hi Scott,

thanks, good to hear that it worked for you. I can at least confirm that GPFS RDMA itself does work between the x86-64 clients and the ESS here; it appears that only nsdperf has an issue in my particular environment. I'll see what IBM support can do for me, as Olaf suggested.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke

From: Scott D <sden...@gmail.com>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 10/24/2017 10:35 PM
Subject: Re: [gpfsug-discuss] nsdperf crash testing RDMA between Power BE and Intel nodes
Sent by: gpfsug-discuss-boun...@spectrumscale.org

> I have run nsdperf with RDMA enabled against an ESS ppc-64 server without any problems.
> [...]
Re: [gpfsug-discuss] nsdperf crash testing RDMA between Power BE and Intel nodes
I have run nsdperf with RDMA enabled against an ESS ppc-64 server without any problems. I don't have access to that system at the moment, and it was running a fairly old (4.1.x) version of GPFS, not that that should matter for nsdperf unless its source code has changed since 4.1.

Scott Denham
Staff Engineer
Cray, Inc

On Tue, Oct 24, 2017 at 11:49 AM, Uwe Falke wrote:
> Hi,
> I am about to run nsdperf for testing the IB fabric in a new system
> comprising ESS (BE) and Intel-based nodes.
> [...]
Re: [gpfsug-discuss] nsdperf crash testing RDMA between Power BE and Intel nodes
Hi Falk,

can you open a PMR for it? It should be investigated in detail.

From: "Uwe Falke"
To: gpfsug main discussion list
Date: 10/24/2017 06:49 PM
Subject: [gpfsug-discuss] nsdperf crash testing RDMA between Power BE and Intel nodes
Sent by: gpfsug-discuss-boun...@spectrumscale.org

> Hi, I am about to run nsdperf for testing the IB fabric in a new system comprising ESS (BE) and Intel-based nodes.
> [...]
[gpfsug-discuss] nsdperf crash testing RDMA between Power BE and Intel nodes
Hi,

I am about to run nsdperf for testing the IB fabric in a new system comprising ESS (BE) and Intel-based nodes. nsdperf crashes reliably when a test using RDMA involves both ESS nodes and x86-64 nodes:

client   server   RDMA   result
x86-64   ppc-64   on     crash
ppc-64   x86-64   on     crash
x86-64   ppc-64   off    success
x86-64   x86-64   on     success
ppc-64   ppc-64   on     success

That implies that the nsdperf RDMA test might struggle with BE vs. LE. However, I learned from a talk given at a GPFS workshop in Germany in 2015 that RDMA works between Power BE and Intel boxes. Has anyone had similar or contrary experiences? Is it an nsdperf issue or a more general one (I have not yet attempted any GPFS mount)?

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
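The reasoning "crashes only when RDMA is on and the two ends differ in byte order" can be checked mechanically against the five runs listed above, and the kind of failure it suspects can be illustrated. The sketch below is a hypothetical illustration, not nsdperf's code. It assumes ppc-64 (ESS) is big-endian and x86-64 is little-endian, as stated in the post, and that the failing path exchanges raw binary fields without converting them to a common byte order, which nobody in this thread has confirmed. A hypothesis that fits five runs is consistent with the data; it does not prove the cause.

```python
import struct

# Test matrix from the post: (client, server, RDMA, result).
RUNS = [
    ("x86-64", "ppc-64", "on", "crash"),
    ("ppc-64", "x86-64", "on", "crash"),
    ("x86-64", "ppc-64", "off", "success"),
    ("x86-64", "x86-64", "on", "success"),
    ("ppc-64", "ppc-64", "on", "success"),
]

# Byte order per platform as described in the post (ESS = BE, Intel = LE).
ENDIAN = {"x86-64": "little", "ppc-64": "big"}


def hypothesis_predicts_crash(client, server, rdma):
    """Hypothesis: a crash occurs iff RDMA is on and byte orders differ."""
    return rdma == "on" and ENDIAN[client] != ENDIAN[server]


def hypothesis_fits(runs):
    """True if the hypothesis predicts every observed crash/success."""
    return all(hypothesis_predicts_crash(c, s, r) == (res == "crash")
               for c, s, r, res in runs)


# Why byte order could matter: RDMA moves raw memory. If one host stores a
# 32-bit field in its native order and the other reads it in its own native
# order, the value arrives silently byte-swapped.
def misread_length(value):
    wire = struct.pack(">I", value)      # big-endian host writes natively
    return struct.unpack("<I", wire)[0]  # little-endian host reads natively


def portable_roundtrip(value):
    wire = struct.pack("!I", value)      # explicit network (big-endian) order
    return struct.unpack("!I", wire)[0]  # both sides decode the same way


if __name__ == "__main__":
    print("hypothesis fits all runs:", hypothesis_fits(RUNS))
    print("4 MiB length misread as:", misread_length(4 * 1024 * 1024))
    print("portable roundtrip:", portable_roundtrip(4 * 1024 * 1024))
```

A 4 MiB length (0x00400000) written by a big-endian host and read natively on a little-endian host comes out as 16384; a buffer size or offset misread that way could plausibly make a test tool fail. Encoding every field in one agreed byte order, as `portable_roundtrip` does, removes that class of problem.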