Hello All,
We upgraded from GPFS 4.2.3.2 to GPFS 5.0.0-2 about a month ago.
We have not yet converted the 4.2.2.2 filesystem version to 5 (that is, we
have not run the mmchconfig release=LATEST command).
Right after the upgrade, we started seeing many "ps hangs" across the cluster.
We are using 10G Ethernet.
Thanks,
Lohit
On May 22, 2018, 11:55 AM -0400, dwayne.h...@med.mun.ca wrote:
> Hi Lohit,
>
> What type of network are you using on the back end to transfer the GPFS
> traffic?
>
> Best,
> Dwayne
>
> From: gpfsug-discuss-boun...@spectrumscale.org
>
> Lohit,
>
> Just be aware that exporting the data from GPFS via SMB requires a SERVER
> license for the node in question. You’ve mentioned client a few times now. :)
>
> --
> Stephen
>
> > On May 15, 2018, at 6:48 PM, Lohit Valleru <vall...@cbio.mskcc.org> wrote:
> >
> > Thanks Christof.
> >
> > The
Hello All,
Has anyone tried serving an SMB export of GPFS mounts from an SMB server on a
GPFS client? Is it supported, and does it lead to any issues?
I understand that I will not need a redundant SMB server configuration.
I could use CES, but CES does not support follow-symlinks outside respective
Thanks, Dwayne.
I don’t think we are facing anything else from a network perspective as of now.
We were seeing deadlocks initially when we upgraded to 5.0, but it might not be
because of the network.
We also see deadlocks now, but I believe they are mostly caused by high
waiters. I have temporarily
Hello All,
I read from the link below that it is now possible to export remote mounts
over NFS/SMB.
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_protocoloverremoteclu.htm
I am thinking of using a single CES protocol cluster, with remote
We do run Singularity + GPFS, on our production HPC clusters.
Most of the time things are fine without any issues.
However, I do see a significant performance loss when running some applications
in Singularity containers with GPFS.
As of now, the applications that have severe performance issues
Thanks Simon.
I will make sure I am careful about the CES root and test NFS exporting more
than 2 remote file systems.
Regards,
Lohit
On Apr 30, 2018, 5:57 PM -0400, Simon Thompson (IT Research Support) wrote:
> You have been able to do this for some time, though I
Thanks Simon.
Currently, we are thinking of using the same remote filesystem for both NFS/SMB
exports.
I do have a related question with respect to SMB and AD integration using
user-defined authentication.
I have seen a past discussion from you on the usergroup regarding a similar
integration, but
Thanks Mathiaz,
Yes, I do understand the concern that if one of the remote file systems goes
down abruptly, the others will go down too.
However, I suppose we could bring down one of the filesystems before a planned
downtime?
For example, by unexporting the filesystems on NFS/SMB before the
Thanks Bryan.
Yes, I do understand it now, with respect to multiple clusters reading the same
file and metanode flapping.
Will make sure the workload design will prevent metanode flapping.
Regards,
Lohit
On May 3, 2018, 11:15 AM -0400, Bryan Banister wrote:
> Hi Lohit,
Thanks Bryan,
May I know if you could explain a bit more about the metadata updates issue?
I am not sure I exactly understand why the metadata updates would fail
between filesystems/between clusters, since every remote cluster will have its
own metadata pool/servers.
I suppose the metadata
Hello All,
I am trying to export a single remote filesystem over NFS/SMB using GPFS CES
(GPFS 5.0.0.2 and CentOS 7).
We need the NFS exports to be accessible on client nodes that use public key
authentication and LDAP authorization. I already have this working with a
previous CES setup on
Hi All,
I am trying to figure out a GPFS tiering architecture with flash storage on the
front end and near-line storage on the backend, for Supercomputing
The backend storage will be GPFS on near-line disks of about 8-10 PB. The
backend storage will/can be tuned to give out large streaming
Hi all,
I wanted to know how mmap interacts with the GPFS pagepool with respect to
filesystem block size.
Does the efficiency depend on the mmap read size and the block size of the
filesystem, even if all the data is cached in the pagepool?
GPFS 4.2.3.2 and CentOS7.
Here is what I observed:
I
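As background for the question above, the kind of mmap access being compared can be sketched in plain Python. The file size and chunk sizes here are illustrative assumptions, not figures from the actual workload:

```python
# Sketch: read a file through mmap in chunks, to contrast a small read
# size (e.g. 4 KiB) with a large one (e.g. 1 MiB) over the same mapping.
import mmap
import os
import tempfile

def mmap_read(path, chunk_size):
    """Read an entire file via mmap in chunk_size pieces; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for off in range(0, len(mm), chunk_size):
                total += len(mm[off:off + chunk_size])
    return total

if __name__ == "__main__":
    # Illustrative 1 MiB scratch file, not real workload data.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1 << 20))
        path = tmp.name
    small = mmap_read(path, 4096)      # many 4 KiB touches
    large = mmap_read(path, 1 << 20)   # one 1 MiB touch
    os.unlink(path)
```

The question in the thread is essentially whether the page-in cost of the small-chunk loop differs once the data is already resident in the GPFS pagepool.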
Thank you.
I am sorry if I was not clear, but the metadata pool is all on SSDs in the GPFS
clusters that we use. It’s just the data pool that is on near-line rotating
disks.
I understand that AFM might not be able to solve the issue, and I will try and
see if file heat works for migrating the
Thanks a lot Sven.
I was trying out all the scenarios that Ray mentioned, with respect to lroc and
all flash GPFS cluster and nothing seemed to be effective.
As of now, we are deploying a new test cluster on GPFS 5.0 and it would be good
to know the respective features that could be enabled and
Thanks. I will try the file heat feature, but I am really not sure if it would
work, since the code can access cold files too, and not necessarily files
recently accessed/hot files.
With respect to LROC, let me explain below:
The use case is that -
The code initially reads headers (small
Thanks Mark,
I did not know we could explicitly specify the sub-block size when creating a
file system. It is nowhere mentioned in “man mmcrfs”.
Is this a new GPFS 5.0 feature?
Also, I see from “man mmcrfs” that the default sub-block size for 8M and
16M is 16K.
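For what it's worth, the arithmetic implied by those defaults is just the block size divided by the sub-block size (figures taken from the man-page defaults quoted above):

```python
# Sub-blocks per block = block size / sub-block size.
# With the quoted 16K default sub-block size, an 8M block has 512
# sub-blocks and a 16M block has 1024 (vs. the fixed 32 before GPFS 5.0).

def subblocks_per_block(block_size, subblock_size):
    """Number of sub-blocks a full block is divided into."""
    return block_size // subblock_size

MiB = 1 << 20
KiB = 1 << 10

print(subblocks_per_block(8 * MiB, 16 * KiB))   # 512
print(subblocks_per_block(16 * MiB, 16 * KiB))  # 1024
```

This matches the "1024 sub-blocks instead of 32" figure mentioned later in the thread for a 16M block size.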
Hello Everyone,
I am a little bit confused about the number of sub-blocks per block for a 16M
block size in GPFS 5.0.
In the documentation below, it mentions that the number of sub-blocks per block
is 16K, but "only for Spectrum Scale RAID"
> > I would get the most performance for both random/sequential
> > reads from 16M than from the smaller block sizes.
> > With GPFS 5.0, I made use of the 1024 sub-blocks instead of 32
Also, you could just upgrade one of the clients to this version and test to
see if the hang still occurs.
You do not have to upgrade the NSD servers to test.
Regards,
Lohit
On Nov 2, 2018, 12:29 PM -0400, vall...@cbio.mskcc.org wrote:
> Yes,
>
> We have upgraded to 5.0.1-0.5, which has the
Yes,
We have upgraded to 5.0.1-0.5, which has the patch for the issue.
The related IBM case number was : TS001010674
Regards,
Lohit
On Nov 2, 2018, 12:27 PM -0400, Mazurkova, Svetlana/Information Systems wrote:
> Hi Damir,
>
> It was related to specific user jobs and mmap (?). We opened PMR
that is not good.
> > >
> > > With the way I see things now -
> > > I believe it could be best if the application does random reads of 4k/1M
> > > from pagepool but somehow does 16M from rotating disks.
> > >
> > > I don’t see any way of doing t
n 1M.
> > > > > > It gives the best performance when reading from local disk, with a
> > > > > > 4K block-size filesystem.
> > > > > >
> > > > > > What i mean by performance when it comes to this workload
400, Lohit Valleru wrote:
> Hey Sven,
>
> This is regarding mmap issues and GPFS.
> We had discussed previously of experimenting with GPFS 5.
>
> I have now upgraded all of the compute nodes and NSD nodes to GPFS 5.0.0.2
>
> I am yet to experiment with mmap performance, but before
I had to do this twice too. Once I had to copy a 4 PB filesystem as fast as
possible when NSD disk descriptors were corrupted and shutting down GPFS would
have led to me losing those files forever; the other was regular maintenance,
but I had to copy similar data in less time.
In both the
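One common way to speed up such a copy is to split it across workers by top-level directory. A minimal sketch of that idea follows; the worker count and layout are assumptions for illustration, not a description of the tooling actually used:

```python
# Sketch: copy each top-level entry of a source tree concurrently,
# the way a large filesystem copy can be split across parallel workers.
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def copy_tree_parallel(src, dst, workers=8):
    """Copy src into dst, one top-level entry per worker task."""
    os.makedirs(dst, exist_ok=True)
    entries = os.listdir(src)

    def copy_one(name):
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if os.path.isdir(s):
            shutil.copytree(s, d)   # recursive copy of one subtree
        else:
            shutil.copy2(s, d)      # file copy preserving metadata
        return name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, entries))
```

In practice a PB-scale copy would likely use rsync or similar per subtree rather than shutil, but the partitioning idea is the same.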
Well, reading the user-defined authentication documentation again: it is
basically left to sysadmins to deal with authentication, and it looks like it
would not be so much of a hack to customize smb on the CES nodes according to
our needs.
I will see if I can do this without much trouble.
Thank you, Marc. I was just trying to suggest another approach in this email
thread.
However, I believe we cannot run mmfind/mmapplypolicy on remote filesystems,
and it can only be run on the owning cluster? In our clusters, all the GPFS
clients are generally in their own compute clusters and
We have many current usernames from LDAP that do not exactly match the
usernames from AD.
Unfortunately, I guess CES SMB will need us to use either AD or LDAP, or use
the same usernames in both AD and LDAP.
I have been looking for a solution where we could map the different usernames
from LDAP
Thanks a lot, Andrew.
It does look promising, but it does not strike me immediately how this could
solve the SMB export case where the user authenticates with an AD username but
the GPFS files present are owned by an LDAP username.
Maybe you are saying that if I enable GPFS to use these scripts
Thank you Andrew.
However, we are not using SMB from the CES cluster; instead, we are running a
Red Hat based SMB server on a GPFS client of the CES cluster and exporting it
from that GPFS client.
Is the above supported, and not known to cause any issues?
Regards,
Lohit
On Mar 7, 2019, 2:45 PM -0600,
Hello All,
We are thinking of exporting "remote" GPFS mounts on a remote GPFS 5.0 cluster
through an SMB share.
I have heard in a previous thread that it is not a good idea to export an
NFS/SMB share on a remote GPFS mount and make it writable.
The issue that could be caused by making it
Tue, Sep 18, 2018 at 10:31 AM wrote:
> >>>
> >>>> Hello All,
> >>>>
> >>>> This is a continuation of the previous discussion that I had with Sven.
> >>>> However, against what I had mentioned previously, I realize that this
Thanks Christof.
The use case is just that: it is easier to have symlinks of files/dirs from
various locations/filesystems rather than copying or duplicating that data.
The design for many years was maintaining about 8 PB of NFS filesystem with
thousands of symlinks to various locations and
Hey Sven,
This is regarding mmap issues and GPFS.
We had discussed previously of experimenting with GPFS 5.
I have now upgraded all of the compute nodes and NSD nodes to GPFS 5.0.0.2.
I am yet to experiment with mmap performance, but before that, I am seeing
weird hangs with GPFS 5 and I think it
Thank you, everyone, for the inputs.
The answers to some of the questions are as follows:
> From Jez: I've done this a few times in the past in a previous life. In many
> respects it is easier (and faster!) to remap the AD side to the uids already
> on the filesystem.
- Yes we had
Hello Everyone,
We are planning to migrate from LDAP to AD, and one of the best solutions was
to change the uidNumber and gidNumber to what SSSD or Centrify would resolve.
May I know if anyone has come across a tool/tools that can change the
uidNumbers and gidNumbers of billions of files
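The core of such a tool is simple; the hard part at billions of files is the traversal. A dry-run sketch of the remapping logic follows, with an invented uid map purely for illustration:

```python
# Sketch: walk a tree and report which entries would be re-owned under an
# old-uid -> new-uid mapping (dry run; nothing is actually changed).
import os

def plan_uid_changes(root, uid_map):
    """Yield (path, old_uid, new_uid) for entries whose owner is in uid_map."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            uid = os.lstat(path).st_uid   # lstat: don't follow symlinks
            if uid in uid_map:
                yield path, uid, uid_map[uid]

# Applying a change would be os.lchown(path, new_uid, -1) per entry.
# At billions of files one would drive this from an inode scan (e.g.
# mmapplypolicy, as suggested elsewhere in the thread) rather than a
# directory walk, which is far too slow at that scale.
```

The same pattern extends to gidNumbers by checking `st_gid` and passing the new gid to `lchown`.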
Phone: 55-19-2132-4317
> E-mail: ano...@br.ibm.com <mailto:ano...@br.ibm.com>
>
>
> - Original message -
> From: "Valleru, Lohit/Information Systems"
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> To: gpfsug-discuss@spectrumscale.org
> Cc:
>
Hello Everyone,
I am looking for alternative tuning parameters that could do the same job as
tuning the maxblocksize parameter.
One of our users runs a deep learning application on GPUs that does the
following IO pattern:
It needs to read random small sections about 4K in size from about
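The described pattern (many random ~4K reads from a large file) can be sketched as follows; sizes and counts are illustrative assumptions, not measurements of the actual application:

```python
# Sketch: issue many random ~4 KiB reads against a large file, the IO
# pattern attributed to the deep learning workload above.
import os
import random

def random_small_reads(path, read_size=4096, count=1000, seed=0):
    """Perform `count` random pread()s of `read_size` bytes; return total bytes."""
    rng = random.Random(seed)
    size = os.path.getsize(path)
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(count):
            # Offsets are kept at least read_size from EOF so reads are full.
            off = rng.randrange(0, max(1, size - read_size))
            total += len(os.pread(fd, read_size, off))
    finally:
        os.close(fd)
    return total
```

With GPFS, each such 4K read may fault in a full filesystem block, which is why the block size and pagepool interaction matters for this workload.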
Hello All,
I would like to discuss or understand which Ethernet networking
switches/architectures seem to work best with GPFS.
We had thought about InfiniBand, but we are not yet ready to move to InfiniBand
because of the complexity/upgrade and debugging issues that come with it.
Current