[gpfsug-discuss] Critical Hang issues with GPFS 5.0. Downgrading from GPFS 5.0.0-2 to GPFS 4.2.3.2

2018-05-22 Thread valleru
Hello All, We upgraded from GPFS 4.2.3.2 to GPFS 5.0.0-2 about a month ago. We have not yet converted the 4.2.3.2 filesystem version to 5 (that is, we have not run the mmchconfig release=LATEST command). Right after the upgrade, we started seeing many “ps hangs” across the cluster.
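For reference, a minimal sketch of how the release and filesystem-format upgrade would eventually be completed once the cluster is stable; the filesystem name gpfs01 is a placeholder:

    # Check the committed cluster release level and the filesystem format version
    mmlsconfig minReleaseLevel
    mmlsfs gpfs01 -V

    # Only once every node runs 5.x and the cluster is healthy:
    mmchconfig release=LATEST   # commit the new cluster release level
    mmchfs gpfs01 -V full       # migrate the filesystem format (not reversible)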

Re: [gpfsug-discuss] Critical Hang issues with GPFS 5.0. Downgrading from GPFS 5.0.0-2 to GPFS 4.2.3.2

2018-05-22 Thread valleru
10G Ethernet. Thanks, Lohit On May 22, 2018, 11:55 AM -0400, dwayne.h...@med.mun.ca wrote: > Hi Lohit, > What type of network are you using on the back end to transfer the GPFS traffic? > Best, > Dwayne > From: gpfsug-discuss-boun...@spectrumscale.org

Re: [gpfsug-discuss] SMB server on GPFS clients and Followsymlinks

2018-05-15 Thread valleru
> Lohit, just be aware that exporting the data from GPFS via SMB requires a SERVER license for the node in question. You’ve mentioned client a few times now. :)

Re: [gpfsug-discuss] SMB server on GPFS clients and Followsymlinks

2018-05-15 Thread valleru
> Exporting the data from GPFS via SMB requires a SERVER license for the node in question. You’ve mentioned client a few times now. :) -- Stephen > On May 15, 2018, at 6:48 PM, Lohit Valleru <vall...@cbio.mskcc.org> wrote: > Thanks Christof. The

[gpfsug-discuss] SMB server on GPFS clients and Followsymlinks

2018-05-15 Thread valleru
Hello All, Has anyone tried serving an SMB export of GPFS mounts from an SMB server on a GPFS client? Is it supported, and does it lead to any issues? I understand that I will not need a redundant SMB server configuration. I could use CES, but CES does not support follow-symlinks outside the respective
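As a rough sketch of what such a non-CES export could look like on the GPFS client, assuming a stock Samba server (share name and path are placeholders, and wide links has security implications):

    # Append a share stanza on the GPFS client (not CES-managed)
    cat >> /etc/samba/smb.conf <<'EOF'
    [projects]
       path = /gpfs/fs1/projects
       read only = no
       follow symlinks = yes
       unix extensions = no
       wide links = yes
    EOF
    testparm -s      # validate the resulting configuration before restarting smb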

Re: [gpfsug-discuss] Critical Hang issues with GPFS 5.0. Downgrading from GPFS 5.0.0-2 to GPFS 4.2.3.2

2018-05-22 Thread valleru
Thanks Dwayne. I don’t think we are facing anything else from the network perspective as of now. We were seeing deadlocks initially when we upgraded to 5.0, but it might not be because of the network. We also see deadlocks now, but they are mostly caused by high waiters, I believe. I have temporarily
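A small sketch of the kind of waiter check being referred to; the output parsing is approximate:

    # On each suspect node (or via mmdsh/clush across the cluster): sustained
    # waiters of hundreds of seconds usually line up with the observed hangs.
    mmdiag --waiters | grep Waiting | sort -k2 -nr | head -20
    mmlsmgr      # confirm which nodes are the cluster and filesystem managers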

[gpfsug-discuss] Spectrum Scale CES and remote file system mounts

2018-04-30 Thread valleru
Hello All, I read from the link below that it is now possible to export remote mounts over NFS/SMB. https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_protocoloverremoteclu.htm I am thinking of using a single CES protocol cluster with remote
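For context, a minimal sketch of exporting a remotely mounted filesystem from a CES protocol cluster; the paths, export names, and the NFS client option string are assumptions:

    # The remote filesystem is already mounted on the CES nodes, e.g. under /gpfs/remote1
    mmnfs export add /gpfs/remote1/projects \
        --client "10.10.0.0/16(Access_Type=RW,Squash=no_root_squash)"
    mmsmb export add projects /gpfs/remote1/projects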

Re: [gpfsug-discuss] Singularity + GPFS

2018-04-26 Thread valleru
We do run Singularity + GPFS on our production HPC clusters. Most of the time things are fine without any issues. However, I do see a significant performance loss when running some applications in Singularity containers with GPFS. As of now, the applications that have severe performance issues

Re: [gpfsug-discuss] Spectrum Scale CES and remote file system mounts

2018-05-01 Thread valleru
Thanks Simon. I will make sure I am careful about the CES root and test NFS exporting more than 2 remote file systems. Regards, Lohit On Apr 30, 2018, 5:57 PM -0400, Simon Thompson (IT Research Support) wrote: > You have been able to do this for some time, though I

Re: [gpfsug-discuss] Spectrum Scale CES and remote file system mounts

2018-05-03 Thread valleru
Thanks Simon. Currently, we are thinking of using the same remote filesystem for both NFS/SMB exports. I do have a related question with respect to SMB and AD integration with user-defined authentication. I have seen a past discussion from you on the user group regarding a similar integration, but

Re: [gpfsug-discuss] Spectrum Scale CES and remote file system mounts

2018-05-03 Thread valleru
Thanks Mathiaz. Yes, I do understand the concern that if one of the remote file systems goes down abruptly, the others will go down too. However, I suppose we could bring down one of the filesystems before a planned downtime? For example, by unexporting the filesystems on NFS/SMB before the
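A hedged sketch of that "unexport first" ordering; the export names and the filesystem name fs2 are placeholders:

    # Before the planned outage of the remote filesystem fs2:
    mmsmb export remove fs2share
    mmnfs export remove /gpfs/fs2/export
    mmumount fs2 -a       # unmount the remote filesystem on all protocol nodes
    # ... maintenance on the owning cluster ...
    mmmount fs2 -a        # remount, then re-create the exports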

Re: [gpfsug-discuss] Spectrum Scale CES and remote file system mounts

2018-05-03 Thread valleru
Thanks Bryan. Yes, I do understand it now, with respect to multiple clusters reading the same file and metanode flapping. I will make sure the workload design prevents metanode flapping. Regards, Lohit On May 3, 2018, 11:15 AM -0400, Bryan Banister wrote: > Hi Lohit,

Re: [gpfsug-discuss] Spectrum Scale CES and remote file system mounts

2018-05-03 Thread valleru
Thanks Bryan. May I know if you could explain a bit more about the metadata updates issue? I am not sure I exactly understand why the metadata updates would fail between filesystems/between clusters, since every remote cluster will have its own metadata pool/servers. I suppose the metadata

[gpfsug-discuss] Spectrum Scale CES , SAMBA and AD keytab integration with userdefined authentication

2018-05-03 Thread valleru
Hello All, I am trying to export a single remote filesystem over NFS/SMB using GPFS CES (GPFS 5.0.0.2 and CentOS 7). We need the NFS exports to be accessible on client nodes that use public key authentication and LDAP authorization. I already have this working with a previous CES setup on
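A minimal sketch of putting CES into user-defined authentication so the existing ssh/LDAP identity setup stays in charge; everything about the surrounding environment is assumed:

    # Tell CES that file-protocol authentication is managed outside Spectrum Scale
    mmuserauth service create --data-access-method file --type userdefined
    mmuserauth service list      # verify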

[gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage

2018-02-22 Thread valleru
Hi All, I am trying to figure out a GPFS tiering architecture with flash storage as the front end and nearline storage as the backend, for Supercomputing. The backend will be GPFS on about 8-10 PB of nearline storage. The backend storage will/can be tuned to give out large streaming
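As a sketch of the flash-in-front, nearline-behind idea, placement plus threshold-driven migration rules might look roughly like the following; the pool names and thresholds are assumptions:

    # Place new files on flash, drain the coldest files to nearline as flash fills
    cat > /tmp/tier.pol <<'EOF'
    RULE 'place'   SET POOL 'flash'
    RULE 'migrate' MIGRATE FROM POOL 'flash'
                   THRESHOLD(80,60) WEIGHT(FILE_HEAT)
                   TO POOL 'nearline'
    EOF
    mmapplypolicy gpfs01 -P /tmp/tier.pol -I yes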

[gpfsug-discuss] GPFS, MMAP and Pagepool

2018-02-22 Thread valleru
Hi all, I wanted to know how mmap interacts with the GPFS pagepool with respect to filesystem block-size. Does the efficiency depend on the mmap read size and the block-size of the filesystem even if all the data is cached in the pagepool? GPFS 4.2.3.2 and CentOS 7. Here is what I observed: I

Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage

2018-02-22 Thread valleru
Thank you. I am sorry if I was not clear, but the metadata pool is all on SSDs in the GPFS clusters that we use. It's just the data pool that is on nearline rotating disks. I understand that AFM might not be able to solve the issue, and I will try and see if file heat works for migrating the

Re: [gpfsug-discuss] GPFS, MMAP and Pagepool

2018-02-22 Thread valleru
Thanks a lot, Sven. I was trying out all the scenarios that Ray mentioned with respect to LROC and an all-flash GPFS cluster, and nothing seemed to be effective. As of now, we are deploying a new test cluster on GPFS 5.0, and it would be good to know the respective features that could be enabled and

Re: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage

2018-02-22 Thread valleru
Thanks, I will try the file heat feature, but I am really not sure if it would work, since the code can access cold files too, and not necessarily recently accessed/hot files. With respect to LROC, let me explain below: The use case is that the code initially reads headers (small
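For reference, a rough sketch of enabling file heat tracking and LROC on such a client; the device name, node name, and values are assumptions:

    # Track access recency so FILE_HEAT can drive migration policies
    mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

    # LROC: register a local SSD as a localCache NSD and let it cache data/metadata
    cat > /tmp/lroc.stanza <<'EOF'
    %nsd: device=/dev/nvme0n1 nsd=lroc_node01 servers=node01 usage=localCache
    EOF
    mmcrnsd -F /tmp/lroc.stanza
    mmchconfig lrocData=yes,lrocDirectories=yes,lrocInodes=yes -N node01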

Re: [gpfsug-discuss] sub-blocks per block in GPFS 5.0

2018-03-30 Thread valleru
Thanks Mark, I did not know we could explicitly specify the sub-block size when creating a file system. It is nowhere mentioned in “man mmcrfs”. Is this a new GPFS 5.0 feature? Also, I see from “man mmcrfs” that the default sub-block size for 8M and 16M block sizes is 16K.

[gpfsug-discuss] sub-blocks per block in GPFS 5.0

2018-03-30 Thread valleru
Hello Everyone, I am a little bit confused about the number of sub-blocks per block for a block-size of 16M in GPFS 5.0. The documentation below mentions that the number of sub-blocks per block is 16K, but "only for Spectrum Scale RAID"
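A quick sketch of checking what an existing filesystem actually got; the filesystem name is a placeholder:

    mmlsfs gpfs01 -B     # block size
    mmlsfs gpfs01 -f     # minimum fragment (sub-block) size
    # sub-blocks per block = block size / fragment size, e.g. 16M / 16K = 1024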

Re: [gpfsug-discuss] GPFS, Pagepool and Block size -> Performance reduces with larger block size

2018-09-27 Thread valleru
> I would get the most performance for both random/sequential reads from 16M than from the smaller block-sizes. With GPFS 5.0, I made use of the 1024 sub-blocks instead of 32

Re: [gpfsug-discuss] Critical Hang issues with GPFS 5.0. Downgrading from GPFS 5.0.0-2 to GPFS 4.2.3.2

2018-11-02 Thread valleru
Also, you could just upgrade one of the clients to this version and test whether the hang still occurs. You do not have to upgrade the NSD servers to test. Regards, Lohit On Nov 2, 2018, 12:29 PM -0400, vall...@cbio.mskcc.org wrote: > Yes, we have upgraded to 5.0.1-0.5, which has the
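A hedged sketch of upgrading a single client for testing; the node name is a placeholder and the package list is abbreviated:

    mmshutdown -N client01                     # stop GPFS on the test client only
    yum update gpfs.base gpfs.gpl gpfs.gskit   # install the 5.0.1-x packages on client01
    mmbuildgpl                                 # rebuild the portability layer (on client01)
    mmstartup -N client01
    mmdiag --version                           # run on client01 to confirm the level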

Re: [gpfsug-discuss] Critical Hang issues with GPFS 5.0. Downgrading from GPFS 5.0.0-2 to GPFS 4.2.3.2

2018-11-02 Thread valleru
Yes, we have upgraded to 5.0.1-0.5, which has the patch for the issue. The related IBM case number was: TS001010674. Regards, Lohit On Nov 2, 2018, 12:27 PM -0400, Mazurkova, Svetlana/Information Systems wrote: > Hi Damir, > > It was related to specific user jobs and mmap (?). We opened a PMR

Re: [gpfsug-discuss] GPFS, Pagepool and Block size -> Performance reduces with larger block size

2018-09-19 Thread valleru
that is not good. > With the way I see things now, I believe it could be best if the application does random reads of 4K/1M from the pagepool but somehow does 16M reads from rotating disks. > I don’t see any way of doing t

Re: [gpfsug-discuss] GPFS, Pagepool and Block size -> Performance reduces with larger block size

2018-09-19 Thread valleru
than 1M. > It gives the best performance when reading from local disk with a 4K block size filesystem. > What I mean by performance when it comes to this workload

Re: [gpfsug-discuss] GPFS, Pagepool and Block size -> Performance reduces with larger block size

2018-09-18 Thread valleru
-0400, Lohit Valleru wrote: > Hey Sven, > This is regarding mmap issues and GPFS. We had discussed previously experimenting with GPFS 5. > I have now upgraded all of the compute nodes and NSD nodes to GPFS 5.0.0.2. > I am yet to experiment with mmap performance, but before

Re: [gpfsug-discuss] Follow-up: migrating billions of files

2019-03-08 Thread valleru
I had to do this twice too. Once I had to copy a 4 PB filesystem as fast as possible when NSD disk descriptors were corrupted and shutting down GPFS would have led to me losing those files forever; the other time was regular maintenance, but I had to copy similar data in less time. In both the
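A minimal sketch of the kind of parallel copy used in both cases; the paths and worker count are assumptions, and a policy-generated file list could replace the find:

    cd /gpfs/source
    find . -type f > /tmp/filelist
    split -n l/8 /tmp/filelist /tmp/chunk.      # 8 roughly equal chunks, whole lines
    for c in /tmp/chunk.*; do
        rsync -a --files-from="$c" /gpfs/source/ /gpfs/target/ &
    done
    wait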

Re: [gpfsug-discuss] Exporting remote GPFS mounts on a non-ces SMB share

2019-03-08 Thread valleru
Well, reading the user-defined authentication documentation again: it basically leaves authentication to the sysadmins, and it looks like it would not be so much of a hack to customize SMB on the CES nodes according to our needs. I will see if I can do this without much trouble.

Re: [gpfsug-discuss] Follow-up: migrating billions of files

2019-03-08 Thread valleru
Thank you Marc. I was just trying to suggest another approach to this email thread. However, I believe we cannot run mmfind/mmapplypolicy against remote filesystems, and they can only be run on the owning cluster? In our clusters, all the GPFS clients are generally in their own compute clusters and

Re: [gpfsug-discuss] Exporting remote GPFS mounts on a non-ces SMB share

2019-03-07 Thread valleru
We have many current usernames from LDAP that do not exactly match the usernames from AD. Unfortunately, I guess CES SMB will need us to use either AD or LDAP, or use the same usernames in both AD and LDAP. I have been looking for a solution where we could map the different usernames from LDAP
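For what it is worth, a hedged sketch of the stock Samba mechanism for this outside of CES; the map entries are made-up examples:

    # In smb.conf:  username map = /etc/samba/usermap.txt
    cat > /etc/samba/usermap.txt <<'EOF'
    # local (LDAP) username = incoming AD username
    jsmith = AD\john.smith
    mdoe   = AD\mary.doe
    EOF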

Re: [gpfsug-discuss] Exporting remote GPFS mounts on a non-ces SMB share

2019-03-07 Thread valleru
Thanks a lot, Andrew. It does look promising, but it does not strike me immediately how this could solve the SMB export case where the user authenticates with an AD username but the GPFS files present are owned by an LDAP username. Maybe you are saying that if I enable GPFS to use these scripts

Re: [gpfsug-discuss] Exporting remote GPFS mounts on a non-ces SMB share

2019-03-07 Thread valleru
Thank you Andrew. However, we are not using SMB from the CES cluster; instead we are running a Red Hat-based SMB server on a GPFS client of the CES cluster and exporting from that GPFS client. Is the above supported, and not known to cause any issues? Regards, Lohit On Mar 7, 2019, 2:45 PM -0600,

[gpfsug-discuss] Exporting remote GPFS mounts on a non-ces SMB share

2019-03-07 Thread valleru
Hello All, We are thinking of exporting “remote” GPFS mounts on a remote GPFS 5.0 cluster through an SMB share. I have heard in a previous thread that it is not a good idea to export an NFS/SMB share on a remote GPFS mount and make it writable. The issue that could be caused by making it

Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 81, Issue 43

2019-11-20 Thread valleru
Tue, Sep 18, 2018 at 10:31 AM wrote: > Hello All, > This is a continuation of the previous discussion that I had with Sven. > However, against what I had mentioned previously, I realize that this

Re: [gpfsug-discuss] SMB server on GPFS clients and Followsymlinks

2018-05-15 Thread Lohit Valleru
Thanks Christof. The use case is just that it is easier to have symlinks to files/dirs from various locations/filesystems rather than copying or duplicating that data. The design for many years was maintaining an NFS filesystem of about 8 PB with thousands of symlinks to various locations and

Re: [gpfsug-discuss] GPFS, MMAP and Pagepool

2018-04-11 Thread Lohit Valleru
Hey Sven, This is regarding mmap issues and GPFS. We had discussed previously experimenting with GPFS 5. I have now upgraded all of the compute nodes and NSD nodes to GPFS 5.0.0.2. I am yet to experiment with mmap performance, but before that: I am seeing weird hangs with GPFS 5 and I think it

Re: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files

2020-06-10 Thread Lohit Valleru
Thank you everyone for the inputs. The answers to some of the questions are as follows: > From Jez: I've done this a few times in the past in a previous life. In many respects it is easier (and faster!) to remap the AD side to the uids already on the filesystem. - Yes, we had

[gpfsug-discuss] Change uidNumber and gidNumber for billions of files

2020-06-08 Thread Lohit Valleru
Hello Everyone, We are planning to migrate from LDAP to AD, and one of the best solutions was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve. May I know if anyone has come across a tool/tools that can change the uidNumbers and gidNumbers of billions of files
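A small sketch of the brute-force approach, just to frame the scale discussion; the map file and parallelism are assumptions, and GNU chown --from restricts each change to files still owned by the old uid:

    # uidmap.txt: one "olduid newuid" pair per line
    while read -r old new; do
        find /gpfs/fs1 -xdev -user "$old" -print0 |
            xargs -0 -P 8 -n 500 chown -h --from="$old" "$new"
    done < /root/uidmap.txt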

Re: [gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers

2020-02-28 Thread Valleru, Lohit/Information Systems

[gpfsug-discuss] Maxblocksize tuning alternatives/max number of buffers

2020-02-28 Thread Valleru, Lohit/Information Systems
Hello Everyone, I am looking for alternative tuning parameters that could do the same job as tuning the maxblocksize parameter. One of our users runs a deep learning application on GPUs that does the following I/O pattern: it needs to read random small sections, about 4K in size, from about
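A hedged sketch of the per-node knobs usually reached for before touching maxblocksize; the node class name and values are assumptions, and some attributes only take effect after GPFS restarts on those nodes:

    mmchconfig pagepool=64G,maxFilesToCache=1000000,maxStatCache=1000000 -N gpunodes
    mmlsconfig pagepool
    mmlsconfig maxFilesToCache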

[gpfsug-discuss] Network switches/architecture for GPFS

2020-03-20 Thread Valleru, Lohit/Information Systems
Hello All, I would like to discuss or understand which Ethernet networking switches/architectures seem to work best with GPFS. We had thought about InfiniBand, but we are not yet ready to move to InfiniBand because of the complexity, upgrade, and debugging issues that come with it. Current