Thank you, Rolly, for your comments. Previously I used both Intel MKL and Intel MPI, but Intel MPI would not run at all, so I switched to Open MPI. My current Intel MKL version is "l_mkl_2018.1.163".
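The runs now use Open MPI 2.0.4 on a single server with no InfiniBand HCA. In that setting, the openib warning in the original question below means Open MPI found no InfiniBand interface and is using another transport instead. The following minimal sketch builds a launch line that excludes that component with Open MPI's `--mca btl ^openib` option. The rank count and the input file name `scf.in` are hypothetical placeholders. Whether excluding openib changes run time at all is an assumption to measure, not a known fix.

```python
"""Sketch: build (and optionally run) an Open MPI launch line for pw-gpu.x
that excludes the OpenFabrics (openib) BTL on a node without InfiniBand.

Hypothetical placeholders: the rank count and the input file name scf.in.
"""
import shlex
import subprocess
from typing import List


def build_mpirun_cmd(nranks: int, exe: str = "pw-gpu.x",
                     input_file: str = "scf.in") -> List[str]:
    """Return an argv list for mpirun with the openib BTL excluded.

    In '--mca btl ^openib' the leading '^' tells Open MPI to use every
    available BTL component except the ones listed.
    """
    if nranks < 1:
        raise ValueError("nranks must be at least 1")
    return ["mpirun", "-np", str(nranks),
            "--mca", "btl", "^openib",
            exe, "-inp", input_file]


def launch(cmd: List[str], dry_run: bool = True) -> int:
    """Print the shell-quoted command; execute it only if dry_run is False."""
    print(" ".join(shlex.quote(arg) for arg in cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Hypothetical rank count; pick one that suits the node and its two GPUs.
    launch(build_mpirun_cmd(2))
```

Passing `dry_run=False` actually starts the job. You can also set the environment variable `OMPI_MCA_btl=^openib` before calling mpirun, which passes the same MCA parameter.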
My Linux OS is Ubuntu 16.04 Server. Could the OS also be causing problems? Could you also explain the difference between Intel Parallel Studio XE and the Intel MKL package mentioned above? (Sorry, it has been a long time since I last posted to pw_forum, so I forgot to include my affiliation. Here it is:)

Phanikumar
Research Scholar
Department of Chemical Engineering
Indian Institute of Technology Kharagpur
West Bengal, India

> Message: 4
> Date: Sun, 10 Dec 2017 09:01:59 +0530
> From: Phanikumar Pentyala <[email protected]>
> Subject: [Pw_forum] QE-GPU performance
> To: PWSCF Forum <[email protected]>
>
> Dear users and developers,
>
> Currently I am using two Tesla K40m cards for my computational work with Quantum ESPRESSO (QE). My GPU-enabled QE build runs much slower than the normal version. My question is whether a particular application will be fast only with some versions of the CUDA toolkit (as mentioned in a previous post: http://qe-forge.org/pipermail/pw_forum/2015-May/106889.html), or whether some other reason, such as GPU memory, is hindering performance. (When I run the top command on my server, the VIRT column shows different values; the top output is pasted in the attached file.)
>
> The following warning is printed when I submit a job: "A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces: Module: OpenFabrics (openib) Host: XXXX Another transport will be used instead, although this may result in lower performance". Is this MPI warning hindering GPU performance?
>
> (P.S.: We don't have any InfiniBand HCA adapter in the server.)
>
> Current details of the server (full details attached):
>
> Server: FUJITSU PRIMERGY RX2540 M2
> CUDA version: 9.0
> NVIDIA driver: 384.90
> Open MPI version: 2.0.4, with Intel MKL libraries
> QE-GPU version: 5.4.0
>
> Thanks in advance.
>
> Regards,
> Phanikumar
>
> ##############################################################################
> SERVER architecture information (from the "lscpu" command)
> ##############################################################################
>
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                40
> On-line CPU(s) list:   0-39
> Thread(s) per core:    2
> Core(s) per socket:    10
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 79
> Model name:            Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
> Stepping:              1
> CPU MHz:               1200.000
> BogoMIPS:              4788.53
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              25600K
> NUMA node0 CPU(s):     0-9,20-29
> NUMA node1 CPU(s):     10-19,30-39
>
> ##############################################################################
> Output of deviceQuery (from the CUDA samples) for my GPU accelerators
> ##############################################################################
>
> CUDA Device Query (Runtime API) version (CUDART static linking)
>
> Detected 2 CUDA Capable device(s)
>
> Device 0: "Tesla K40m"
>   CUDA Driver Version / Runtime Version          9.0 / 9.0
>   CUDA Capability Major/Minor version number:    3.5
>   Total amount of global memory:                 11440 MBytes (11995578368 bytes)
>   (15) Multiprocessors, (192) CUDA Cores/MP:     2880 CUDA Cores
>   GPU Max Clock rate:                            745 MHz (0.75 GHz)
>   Memory Clock rate:                             3004 Mhz
>   Memory Bus Width:                              384-bit
>   L2 Cache Size:                                 1572864 bytes
>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
>   Total amount of constant memory:               65536 bytes
>   Total amount of shared memory per block:       49152 bytes
>   Total number of registers available per block: 65536
>   Warp size:                                     32
>   Maximum number of threads per multiprocessor:  2048
>   Maximum number of threads per block:           1024
>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
>   Maximum memory pitch:                          2147483647 bytes
>   Texture alignment:                             512 bytes
>   Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
>   Run time limit on kernels:                     No
>   Integrated GPU sharing Host Memory:            No
>   Support host page-locked memory mapping:       Yes
>   Alignment requirement for Surfaces:            Yes
>   Device has ECC support:                        Enabled
>   Device supports Unified Addressing (UVA):      Yes
>   Supports Cooperative Kernel Launch:            No
>   Supports MultiDevice Co-op Kernel Launch:      No
>   Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
>   Compute Mode:
>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>
> Device 1: "Tesla K40m"
>   CUDA Driver Version / Runtime Version          9.0 / 9.0
>   CUDA Capability Major/Minor version number:    3.5
>   Total amount of global memory:                 11440 MBytes (11995578368 bytes)
>   (15) Multiprocessors, (192) CUDA Cores/MP:     2880 CUDA Cores
>   GPU Max Clock rate:                            745 MHz (0.75 GHz)
>   Memory Clock rate:                             3004 Mhz
>   Memory Bus Width:                              384-bit
>   L2 Cache Size:                                 1572864 bytes
>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
>   Total amount of constant memory:               65536 bytes
>   Total amount of shared memory per block:       49152 bytes
>   Total number of registers available per block: 65536
>   Warp size:                                     32
>   Maximum number of threads per multiprocessor:  2048
>   Maximum number of threads per block:           1024
>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
>   Maximum memory pitch:                          2147483647 bytes
>   Texture alignment:                             512 bytes
>   Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
>   Run time limit on kernels:                     No
>   Integrated GPU sharing Host Memory:            No
>   Support host page-locked memory mapping:       Yes
>   Alignment requirement for Surfaces:            Yes
>   Device has ECC support:                        Enabled
>   Device supports Unified Addressing (UVA):      Yes
>   Supports Cooperative Kernel Launch:            No
>   Supports MultiDevice Co-op Kernel Launch:      No
>   Device PCI Domain ID / Bus ID / location ID:   0 / 129 / 0
>   Compute Mode:
>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>
> > Peer access from Tesla K40m (GPU0) -> Tesla K40m (GPU1) : No
> > Peer access from Tesla K40m (GPU1) -> Tesla K40m (GPU0) : No
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 2
> Result = PASS
>
> ##############################################################################
> GPU status from the 'nvidia-smi' command in a terminal
> ##############################################################################
>
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla K40m          Off  | 00000000:02:00.0 Off |                    0 |
> | N/A   42C    P0    75W / 235W |  11381MiB / 11439MiB |     83%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla K40m          Off  | 00000000:81:00.0 Off |                    0 |
> | N/A   46C    P0    75W / 235W |  11380MiB / 11439MiB |     87%      Default |
> +-------------------------------+----------------------+----------------------+
>
> ##############################################################################
> Output of the top command on my server
> ##############################################################################
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 20019 xxxxx     20   0  0.158t 426080 152952 R 100.3  0.3  36:29.44 pw-gpu.x
> 20023 xxxxx     20   0  0.158t 422380 153328 R 100.0  0.3  36:29.42 pw-gpu.x
> 20025 xxxxx     20   0  0.158t 418256 153376 R 100.0  0.3  36:27.74 pw-gpu.x
> 20042 xxxxx     20   0  0.158t 416912 153104 R 100.0  0.3  36:24.63 pw-gpu.x
> 20050 xxxxx     20   0  0.158t 412564 153084 R 100.0  0.3  36:25.68 pw-gpu.x
> 20064 xxxxx     20   0  0.158t 408012 153100 R 100.0  0.3  36:25.54 pw-gpu.x
> 20098 xxxxx     20   0  0.158t 398404 153436 R 100.0  0.3  36:27.92 pw-gpu.x
>
> ------------------------------
>
> Message: 5
> Date: Sun, 10 Dec 2017 17:07:59 +0800
> From: Rolly Ng <[email protected]>
> Subject: Re: [Pw_forum] QE-GPU performance
> To: [email protected]
>
> Dear Phanikumar,
>
> Please include your affiliation when posting to the forum.
>
> In my experience with QE-GPU v5.3.0 and v5.4.0, the working combination of software is:
>
> 1) Intel Parallel Studio XE (PSXE) 2017
> 2) CUDA 6.5 or 7.0
> 3) CentOS 7.1
>
> Please try the above combination.
>
> Regards,
> Rolly
>
> PhD Research Fellow,
> Dept. of Physics & Materials Science,
> City University of Hong Kong
> Tel: +852 3442 4000
> Fax: +852 3442 0538
>
> ------------------------------
>
> End of Pw_forum Digest, Vol 125, Issue 8
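The quoted logs contain enough numbers for a quick sanity check of what the two K40m cards can deliver and how full they are. The following sketch computes four things from those logs:

- each card's theoretical memory bandwidth and peak double-precision rate, from the deviceQuery fields;
- device memory occupancy, from nvidia-smi;
- the decimal bus numbers, matched to nvidia-smi's hex Bus-Id;
- top's VIRT and RES columns, converted to common units.

The 64 FP64 units per multiprocessor is an assumed Kepler GK110 figure that deviceQuery does not print; everything else is copied from the logs. A large VIRT in CUDA processes commonly reflects virtual address space reserved for unified addressing rather than memory actually in use, which RES measures. Treat that as background, not a diagnosis.

```python
"""Sketch: cross-check figures taken from the deviceQuery, nvidia-smi and
top output quoted above. All inputs come from those logs except
FP64_UNITS_PER_SM, an assumed Kepler GK110 (Tesla K40) figure."""

FP64_UNITS_PER_SM = 64  # assumption: not printed by deviceQuery


def peak_bandwidth_gbs(mem_clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical DRAM bandwidth in GB/s: clock x bus width in bytes x 2
    (double data rate)."""
    return mem_clock_mhz * 1e6 * (bus_width_bits / 8) * 2 / 1e9


def peak_fp64_gflops(sms: int, clock_mhz: float,
                     units_per_sm: int = FP64_UNITS_PER_SM) -> float:
    """Peak FP64 GFLOP/s: FP64 units x 2 FLOPs per fused multiply-add x clock."""
    return sms * units_per_sm * 2 * clock_mhz / 1e3


def pci_bus_number(bus_id: str) -> int:
    """Bus field of an nvidia-smi Bus-Id ('domain:bus:device.function', hex)."""
    return int(bus_id.split(":")[1], 16)


def top_mem_kib(field: str) -> float:
    """Convert a top VIRT/RES/SHR field to KiB; unsuffixed values are KiB."""
    scale = {"k": 1, "m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}
    suffix = field[-1].lower()
    if suffix in scale:
        return float(field[:-1]) * scale[suffix]
    return float(field)


if __name__ == "__main__":
    # deviceQuery: 15 multiprocessors, 745 MHz GPU clock,
    # 3004 MHz memory clock, 384-bit memory bus (same for both K40m cards)
    print(f"Peak memory bandwidth per card: {peak_bandwidth_gbs(3004, 384):.1f} GB/s")
    print(f"Peak FP64 rate per card: {peak_fp64_gflops(15, 745):.1f} GFLOP/s")

    # nvidia-smi: Bus-Id -> (MiB used, MiB total); deviceQuery bus IDs are decimal
    gpus = {"00000000:02:00.0": (11381, 11439), "00000000:81:00.0": (11380, 11439)}
    for bus_id, (used, total) in gpus.items():
        print(f"bus {pci_bus_number(bus_id)}: {100 * used / total:.1f}% of memory in use")

    # top: VIRT and RES columns of PID 20019 (pw-gpu.x)
    virt_gib = top_mem_kib("0.158t") / 1024 ** 2
    res_mib = top_mem_kib("426080") / 1024
    print(f"VIRT {virt_gib:.1f} GiB vs RES {res_mib:.1f} MiB")
```

The printed bandwidth and FP64 figures are theoretical ceilings per card, not expected QE throughput.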
_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum
