from:"Simon Thompson \(IT Research Support\)"

Re: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4

2018-06-11 Thread Simon Thompson (IT Research Support)

We have on our DSS-G …

Have you looked at:
https://access.redhat.com/solutions/238533

?

Simon

From:  on behalf of "Sobey, Richard 
A" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Monday, 11 June 2018 at 11:46
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] RHEL updated to 7.5 instead of 7.4

Has anyone ever used subscription-manager to set a release to 7.4 only for the 
system to upgrade to 7.5 anyway?

Also is 7.5 now supported with the 4.2.3.9 PTF or should I concentrate on 
downgrading back to 7.4?

Richard
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

2018-06-04 Thread Simon Thompson (IT Research Support)

Thanks Felipe,

Is it safe to assume that there is intent* for RHEL 7.5 support for Power 9 
when the x86 7.5 release is also made?

Simon
* Insert standard IBM disclaimer about the meaning of intent etc etc


From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of k...@us.ibm.com 
[k...@us.ibm.com]
Sent: 04 June 2018 16:47
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

Simon,

The support statement for Power9 / RHEL 7.4 has not yet been included in the 
FAQ, but I understand that a FAQ update is under way:

4.2.3.8 for the 4.2.3 release

5.0.0.0 for the 5.0.0 release

Kernel level tested with: 4.11.0-44.6.1.el7a

Felipe


Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314



[Inactive hide details for "Simon Thompson (IT Research Support)" ---06/04/2018 
07:21:56 AM---So … I have another question on s]"Simon Thompson (IT Research 
Support)" ---06/04/2018 07:21:56 AM---So … I have another question on support. 
We’ve just ordered some Power 9 nodes, now my understanding

From: "Simon Thompson (IT Research Support)" 
To: gpfsug main discussion list 
Date: 06/04/2018 07:21 AM
Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862
Sent by: gpfsug-discuss-boun...@spectrumscale.org




So … I have another question on support.

We’ve just ordered some Power 9 nodes, now my understanding is that with 7.4, 
they require the -ALT kernel 
(https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm)
 which is 4.x based. I don’t see any reference in the Spectrum Scale FAQ to the 
ALT kernels.

So what Scale code is supported for us to run on the Power9s?

Thanks

Simon

From:  on behalf of "k...@us.ibm.com" 

Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Friday, 25 May 2018 at 14:24
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

All,

Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships 
with RHEL 7.5) as a result of applying kernel security patches may open a PMR 
to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be 
provided once the internal tests on RHEL 7.5 have been completed, likely a few 
days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid 
June).

Regards,

Felipe


Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

2018-06-04 Thread Simon Thompson (IT Research Support)

So … I have another question on support.

We’ve just ordered some Power 9 nodes, now my understanding is that with 7.4, 
they require the -ALT kernel 
(https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaam/liaamdistros.htm)
 which is 4.x based. I don’t see any reference in the Spectrum Scale FAQ to the 
ALT kernels.

So what Scale code is supported for us to run on the Power9s?

Thanks

Simon

From:  on behalf of "k...@us.ibm.com" 

Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Friday, 25 May 2018 at 14:24
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862


All,

Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships 
with RHEL 7.5) as a result of applying kernel security patches may open a PMR 
to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be 
provided once the internal tests on RHEL 7.5 have been completed, likely a few 
days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid 
June).

Regards,

Felipe


Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] AFM negative file caching

2018-05-30 Thread Simon Thompson (IT Research Support)

So we use easybuild to build software and dependency stacks (and modules to do 
all this), yeah I did wonder about putting it first, but my worry is that other 
"stuff" installed locally that dumps in there might then break the dependency 
stack.

I was thinking maybe we can create something local with select symlinks and add 
that to the path ... but I was hoping we could do some sort of negative caching.

Simon

On 30/05/2018, 13:26, "gpfsug-discuss-boun...@spectrumscale.org on behalf of 
pesero...@gmail.com"  wrote:

As a quick means, why not adding /usr/lib64 at the beginning of 
LD_LIBRARY_PATH?

(Not to get started on using LD_LIBRARY_PATH in the first place…)


— Peter

> On 2018 May 30 Wed, at 13:52, Simon Thompson (IT Research Support) 
 wrote:
> 
> Hi All,
>  
> We have a file-set which is an AFM fileset and contains installed 
software.
>  
> We’ve been experiencing some performance issues with workloads when this 
is running and think this is down to LD_LIBRARY_PATH being set to the software 
installed in the AFM cache, e.g.
>  
> /gpfs/apps/somesoftware/v1.2/lib
>  
> Subsequently when you run (e.g.) “who” on the system, LD_LIBRARY_PATH is 
being searched for e.g. libnss_ldap, which is in /usr/lib64. We’re assuming 
that AFM is checking with home each time the directory is processed (and other 
sub directories like lib/tls) and that each time AFM is checking for the file’s 
existence at home. Is there a way to change the negative cache at all on AFM 
for this one file-set? (e.g as you might with NFS). The file-set only has 
applications so changes are pretty rare and so a 10 min or so check would be 
fine with me.
>  
> Thanks
>  
> Simon 
>  
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] AFM negative file caching

2018-05-30 Thread Simon Thompson (IT Research Support)

p.s.

I wasn’t sure if afmDirLookupRefreshInterval and afmFileLookupRefreshInterval 
would be the right thing if it’s a file/directory that doesn’t exist?

Simon

From:  on behalf of "Simon Thompson 
(IT Research Support)" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Wednesday, 30 May 2018 at 12:52
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] AFM negative file caching

Hi All,

We have a file-set which is an AFM fileset and contains installed software.

We’ve been experiencing some performance issues with workloads when this is 
running and think this is down to LD_LIBRARY_PATH being set to the software 
installed in the AFM cache, e.g.

/gpfs/apps/somesoftware/v1.2/lib

Subsequently when you run (e.g.) “who” on the system, LD_LIBRARY_PATH is being 
searched for e.g. libnss_ldap, which is in /usr/lib64. We’re assuming that AFM 
is checking with home each time the directory is processed (and other sub 
directories like lib/tls) and that each time AFM is checking for the file’s 
existence at home. Is there a way to change the negative cache at all on AFM 
for this one file-set? (e.g as you might with NFS). The file-set only has 
applications so changes are pretty rare and so a 10 min or so check would be 
fine with me.

Thanks

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] AFM negative file caching

2018-05-30 Thread Simon Thompson (IT Research Support)

Hi All,

We have a file-set which is an AFM fileset and contains installed software.

We’ve been experiencing some performance issues with workloads when this is 
running and think this is down to LD_LIBRARY_PATH being set to the software 
installed in the AFM cache, e.g.

/gpfs/apps/somesoftware/v1.2/lib

Subsequently when you run (e.g.) “who” on the system, LD_LIBRARY_PATH is being 
searched for e.g. libnss_ldap, which is in /usr/lib64. We’re assuming that AFM 
is checking with home each time the directory is processed (and other sub 
directories like lib/tls) and that each time AFM is checking for the file’s 
existence at home. Is there a way to change the negative cache at all on AFM 
for this one file-set? (e.g as you might with NFS). The file-set only has 
applications so changes are pretty rare and so a 10 min or so check would be 
fine with me.

Thanks

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

2018-05-25 Thread Simon Thompson (IT Research Support)

I was talking about protocols.

But yes, DSS is also supported and runs fine on 7.4.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Jonathan Buzzard 
[jonathan.buzz...@strath.ac.uk]
Sent: 25 May 2018 21:37
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

On 25/05/18 21:06, Simon Thompson (IT Research Support) wrote:
> Hi Richard,
>
> Ours run on 7.4 without issue. We had one upgrade to 7.5 packages
> (didn't reboot into new kernel) and it broke, reverting it back to a
> 7.4 release fixed it, so when support comes along, do it with care!
>

I will at this point chime in that DSS is on 7.4 at the moment, so I am
not surprised ESS is just fine too.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

2018-05-25 Thread Simon Thompson (IT Research Support)

Hi Richard,

Ours run on 7.4 without issue. We had one upgrade to 7.5 packages (didn't 
reboot into new kernel) and it broke, reverting it back to a 7.4 release fixed 
it, so when support comes along, do it with care!

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Sobey, Richard A 
[r.so...@imperial.ac.uk]
Sent: 25 May 2018 15:29
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862

Hi Felipe

What about protocol servers, can they go above 7.3 yet with any version of 
Scale?

From: gpfsug-discuss-boun...@spectrumscale.org 
 On Behalf Of Felipe Knop
Sent: 25 May 2018 14:24
To: gpfsug-discuss@spectrumscale.org
Subject: [gpfsug-discuss] RHEL 7.5 and kernel 3.10.0-862


All,

Folks that have been updated to the 3.10.0-862 kernel (the kernel which ships 
with RHEL 7.5) as a result of applying kernel security patches may open a PMR 
to request an efix for Scale versions 4.2 or 5.0 . The efixes will then be 
provided once the internal tests on RHEL 7.5 have been completed, likely a few 
days before the 4.2.3.9 and 5.0.1.1 PTFs GA (currently targeted around mid 
June).

Regards,

Felipe


Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Question concerning integration of CES with AD authentication system

2018-05-24 Thread Simon Thompson (IT Research Support)

You can change them using the normal SMB commands, from the appropriate bin 
directory, whether this is supported is another matter.

We have one parameter set this way but I forgot which.

Simkn

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Skylar Thompson 
[skyl...@uw.edu]
Sent: 24 May 2018 15:51
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] Question concerning integration of CES with AD 
authentication system

On Thu, May 24, 2018 at 03:46:32PM +0100, Jonathan Buzzard wrote:
> On Thu, 2018-05-24 at 14:16 +, Skylar Thompson wrote:
> > I haven't needed to change the LDAP attributes that CES uses, but I
> > do see --user-id-attrib in the mmuserauth documentation.
> > Unfortunately, I don't see an equivalent one for gidNumber.
> >
>
> Is it not doing the "Samba thing" where your GID is the GID of your
> primary Active Directory group? This is usually "Domain Users" but not
> always.
>
> Basically Samba ignores the separate GID field in RFC2307bis, so one
> imagines the options for changing the LDAP attributes are none
> existent.
>
> I know back in the day this had me stumped for a while because unless
> you assign a GID number to the users primary group then Winbind does
> not return anything, aka a "getent passwd" on the user fails.

At least for us, it seems to be using the gidNumber attribute of our users.
On the back-end, of course, it is Samba, but I don't know that there are
mm* commands available for all of the tunables one can set in smb.conf.

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] 5.0.1.0 Update issue with python dependencies

2018-05-16 Thread Simon Thompson (IT Research Support)

I wondered if it came from the object RPMs maybe… I haven’t actually checked, 
but I recall that it was mentioned 5.0.1 was bumping to Pike swift stack (I 
think!) and that typically requires newer RPMs if using RDO packages so maybe 
it came that route?

Simon

From:  on behalf of 
"olaf.wei...@de.ibm.com" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Tuesday, 15 May 2018 at 08:10
To: "gpfsug-discuss@spectrumscale.org" 
Subject: Re: [gpfsug-discuss] 5.0.1.0 Update issue with python dependencies

Renar,
can you share , what gpfs packages you tried to install
I just did a fresh 5.0.1 install and it works fine for me... even though, I 
don't see this ibm python rpm

[root@tlinc04 ~]# rpm -qa | grep -i openssl
openssl-1.0.2k-12.el7.x86_64
openssl-libs-1.0.2k-12.el7.x86_64
pyOpenSSL-0.13.1-3.el7.x86_64
openssl-devel-1.0.2k-12.el7.x86_64
xmlsec1-openssl-1.2.20-7.el7_4.x86_64

So I assume, you installed GUI, or scale mgmt .. let us know -
thx




From:"Grunenberg, Renar" 
To:"'gpfsug-discuss@spectrumscale.org'" 

Date:05/15/2018 08:00 AM
Subject:Re: [gpfsug-discuss] 5.0.1.0 Update issue with python 
dependencies
Sent by:gpfsug-discuss-boun...@spectrumscale.org




Hallo All,
follow some experiences with the update to 5.0.1.0 (from 5.0.0.2) on rhel7.4. 
After the complete yum update to this version, we had a non-function yum cmd.
The reason for this is following packet pyOpenSSL-0.14-1.ibm.el7.noarch This 
package break the yum cmds.
The error are:
Loaded plugins: langpacks, product-id, rhnplugin, search-disabled-repos
Traceback (most recent call last):
  File "/bin/yum", line 29, in 
yummain.user_main(sys.argv[1:], exit_code=True)
  File "/usr/share/yum-cli/yummain.py", line 370, in user_main
errcode = main(args)
  File "/usr/share/yum-cli/yummain.py", line 165, in main
base.getOptionsConfig(args)
  File "/usr/share/yum-cli/cli.py", line 261, in getOptionsConfig
self.conf
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 1078, in 

conf = property(fget=lambda self: self._getConfig(),
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 420, in 
_getConfig
self.plugins.run('init')
  File "/usr/lib/python2.7/site-packages/yum/plugins.py", line 188, in run
func(conduitcls(self, self.base, conf, **kwargs))
  File "/usr/share/yum-plugins/rhnplugin.py", line 141, in init_hook
svrChannels = rhnChannel.getChannelDetails(timeout=timeout)
  File "/usr/share/rhn/up2date_client/rhnChannel.py", line 71, in 
getChannelDetails
sourceChannels = getChannels(timeout=timeout)
  File "/usr/share/rhn/up2date_client/rhnChannel.py", line 98, in getChannels
up2dateChannels = s.up2date.listChannels(up2dateAuth.getSystemId())
  File "/usr/share/rhn/up2date_client/rhnserver.py", line 63, in __call__
return rpcServer.doCall(method, *args, **kwargs)
  File "/usr/share/rhn/up2date_client/rpcServer.py", line 204, in doCall
ret = method(*args, **kwargs)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
  File "/usr/share/rhn/up2date_client/rpcServer.py", line 38, in _request1
ret = self._request(methodname, params)
  File "/usr/lib/python2.7/site-packages/rhn/rpclib.py", line 384, in _request
self._handler, request, verbose=self._verbose)
  File "/usr/lib/python2.7/site-packages/rhn/transports.py", line 171, in 
request
headers, fd = req.send_http(host, handler)
  File "/usr/lib/python2.7/site-packages/rhn/transports.py", line 721, in 
send_http
self._connection.connect()
  File "/usr/lib/python2.7/site-packages/rhn/connections.py", line 187, in 
connect
self.sock.init_ssl()
  File "/usr/lib/python2.7/site-packages/rhn/SSL.py", line 90, in init_ssl
self._ctx.load_verify_locations(f)
  File "/usr/lib/python2.7/site-packages/OpenSSL/SSL.py", line 303, in 
load_verify_locations
raise TypeError("cafile must be None or a byte string")
TypeError: cafile must be None or a byte string

My questions now: why does IBM patch here rhel python-libaries. This goes to a 
update nirvana.

The Dependencies does looks like this!!
rpm -e pyOpenSSL-0.14-1.ibm.el7.noarch
error: Failed dependencies:
pyOpenSSL is needed by (installed) 
redhat-access-insights-0:1.0.13-2.el7_3.noarch
pyOpenSSL is needed by (installed) rhnlib-2.5.65-4.el7.noarch
pyOpenSSL >= 0.14 is needed by (installed) 
python2-urllib3-1.21.1-1.ibm.el7.noarch

Its PMR time.

Regards Renar



Renar Grunenberg
Abteilung Informatik – Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:

09561 96-44110

Telefax:

09561 96-44104

E-Mail:

renar.grunenb...@huk-coburg.de

Internet:

www.huk.de


HUK-COBURG

Re: [gpfsug-discuss] Spectrum Scale CES and remote file system mounts

2018-05-03 Thread Simon Thompson (IT Research Support)

Yes we do this when we really really need to take a remote FS offline, which we 
try at all costs to avoid unless we have a maintenance window.

Note if you only export via SMB, then you don’t have the same effect (unless 
something has changed recently)

Simon

From: <gpfsug-discuss-boun...@spectrumscale.org> on behalf of 
"vall...@cbio.mskcc.org" <vall...@cbio.mskcc.org>
Reply-To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Date: Thursday, 3 May 2018 at 15:41
To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] Spectrum Scale CES and remote file system mounts

Thanks Mathiaz,
Yes i do understand the concern, that if one of the remote file systems go down 
abruptly - the others will go down too.

However, i suppose we could bring down one of the filesystems before a planned 
downtime?
For example, by unexporting the filesystems on NFS/SMB before the downtime?

I might not want to be in a situation, where i have to bring down all the 
remote filesystems because of planned downtime of one of the remote clusters.

Regards,
Lohit

On May 3, 2018, 7:41 AM -0400, Mathias Dietz <mdi...@de.ibm.com>, wrote:

Hi Lohit,

>I am thinking of using a single CES protocol cluster, with remote mounts from 
>3 storage clusters.
Technically this should work fine (assuming all 3 clusters use the same 
uids/guids). However this has not been tested in our Test lab.

>One thing to watch, be careful if your CES root is on a remote fs, as if that 
>goes away, so do all CES exports.
Not only the ces root file system is a concern, the whole CES cluster will go 
down if any remote file systems with NFS exports is not available.
e.g. if remote cluster 1 is not available, the CES cluster will unmount the 
corresponding file system which will lead to a NFS failure on all CES nodes.

Mit freundlichen Grüßen / Kind regards

Mathias Dietz

Spectrum Scale Development - Release Lead Architect (4.2.x)
Spectrum Scale RAS Architect
---
IBM Deutschland
Am Weiher 24
65451 Kelsterbach
Phone: +49 70342744105
Mobile: +49-15152801035
E-Mail: mdi...@de.ibm.com
-
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz, Geschäftsführung: Dirk 
WittkoppSitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht 
Stuttgart, HRB 243294

From:vall...@cbio.mskcc.org
To:gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date:01/05/2018 16:34
Subject:Re: [gpfsug-discuss] Spectrum Scale CES and remote file system 
mounts
Sent by:gpfsug-discuss-boun...@spectrumscale.org

Thanks Simon.
I will make sure i am careful about the CES root and test nfs exporting more 
than 2 remote file systems.

Regards,
Lohit

On Apr 30, 2018, 5:57 PM -0400, Simon Thompson (IT Research Support) 
<s.j.thomp...@bham.ac.uk>, wrote:
You have been able to do this for some time, though I think it's only just 
supported.

We've been exporting remote mounts since CES was added.

At some point we've had two storage clusters supplying data and at least 3 
remote file-systems exported over NFS and SMB.

One thing to watch, be careful if your CES root is on a remote fs, as if that 
goes away, so do all CES exports. We do have CES root on a remote fs and it 
works, just be aware...

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of vall...@cbio.mskcc.org 
[vall...@cbio.mskcc.org]
Sent: 30 April 2018 22:11
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Spectrum Scale CES and remote file system mounts

Hello All,

I read from the below link, that it is now possible to export remote mounts 
over NFS/SMB.

https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_protocoloverremoteclu.htm

I am thinking of using a single CES protocol cluster, with remote mounts from 3 
storage clusters.
May i know, if i will be able to export the 3 remote mounts(from 3 storage 
clusters) over NFS/SMB from a single CES protocol cluster?

Because according to the limitations as mentioned in the below link:

https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_limitationofprotocolonRMT.htm

It says “You can configure one storage cluster and up to five protocol clusters 
(current limit).”

Regards,
Lohit
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss___
gpfsug-discuss mailing li

Re: [gpfsug-discuss] Recharging where HSM is used

2018-05-03 Thread Simon Thompson (IT Research Support)

Our charging model for disk storage assumes that a percentage of it is really 
HSM’d, though in practise we aren’t heavily doing this.

My (personal) view on tape really is that anything on tape is FoC, that way 
people can play games to recall/keep it hot it if they want, but it eats their 
FoC or paid disk allocations, whereas if they leave it on tape, they benefit in 
having more total capacity.

We currently use the pre-migrate/SOBAR for our DR piece, so we’d already be 
pre-migrating to tape anyway, so it doesn’t really cost us anything extra to 
give FoC HSM’d storage. So my suggestion is pitch HSM (or even TCT maybe … if 
only we could do both) as your DR proposal, and then you can give it to users 
for free 

Simon

From:  on behalf of "Sobey, Richard 
A" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Thursday, 3 May 2018 at 16:03
To: "gpfsug-discuss@spectrumscale.org" 
Subject: Re: [gpfsug-discuss] Recharging where HSM is used

Stephen, Bryan,

Thanks for the input, it’s greatly appreciated.

For us we’re trying – as many people are – to drive down the usage of 
under-the-desk NAS appliances and USB HDDs. We offer space on disk, but you 
can’t charge for 3TB of storage the same as you would down PC World and many 
customers don’t understand the difference between what we do, and what a USB 
disk offers.

So, offering tape as a medium to store cold data, but not archive data, is one 
offering we’re just getting round to discussing. The solution is in place. To 
answer the specific question: for our customers that adopt HSM, how much less 
should/could/can we charge them per TB. We know how much a tape costs, but we 
don’t necessarily have the means (or knowledge?) to say that for a given 
fileset, 80% of the data is on tape. Then you get into 80% of 1TB is not the 
same as 80% of 10TB.

Richard

From: gpfsug-discuss-boun...@spectrumscale.org 
 On Behalf Of Stephen Ulmer
Sent: 03 May 2018 14:03
To: gpfsug main discussion list 
Subject: Re: [gpfsug-discuss] Recharging where HSM is used

I work for a partner, but I occasionally have to help customers work on cost 
justification that includes charge-back (or I encourage them to do show-back to 
alleviate some political concerns).

I’d also like to see what people are doing around this.

If I may ask a question, what is the goal for your site? Are you trying to 
figure out how to charge for the tape space, or to NOT charge the migrated 
files as resident? Would you (need to) charge for pre-migrated files twice? Are 
you trying to figure out how to have users pay for recalls? Basically, what 
costs are you trying to cover? I realize that was not “a” question… :)

Also, do you specifically mean TSM HSM, or do you mean GPFS policies and an 
external storage pool?

--
Stephen





On May 3, 2018, at 5:43 AM, Sobey, Richard A 
> wrote:

Hi all,

I’d be interested to talk to anyone that is using HSM to move data to tape, 
(and stubbing the file(s)) specifically any strategies you’ve employed to 
figure out how to charge your customers (where you do charge anyway) based on 
usage.

On-list or off is fine with me.

Thanks
Richard
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] AFM with clones

2018-05-02 Thread Simon Thompson (IT Research Support)

Hi,

We are looking at providing an AFM cache of a home which has a number of cloned 
files. From the docs:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1ins_afmandafmdrlimitations.htm

· We can see that “The mmclone command is not supported on AFM cache 
and AFM DR primary filesets. Clones created at home for AFM filesets are 
treated as separate files in the cache.”

So it’s no surprise that when we pre-cache the files, they space consumed is 
different.

What I’m not clear on is what happens if we update a clone file at home? I know 
AFM is supposed to only transfer the exact bytes updated, does this work with 
clones? i.e. at home do we just get the bytes updated in the copy-on-write 
clone, or do we accidentally end up shipping the whole file back?

(note we are using IW mode)

Thanks

Simon
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Spectrum Scale CES and remote file system mounts

2018-04-30 Thread Simon Thompson (IT Research Support)

You have been able to do this for some time, though I think it's only just 
supported.

We've been exporting remote mounts since CES was added.

At some point we've had two storage clusters supplying data and at least 3 
remote file-systems exported over NFS and SMB.

One thing to watch, be careful if your CES root is on a remote fs, as if that 
goes away, so do all CES exports. We do have CES root on a remote fs and it 
works, just be aware...

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of vall...@cbio.mskcc.org 
[vall...@cbio.mskcc.org]
Sent: 30 April 2018 22:11
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Spectrum Scale CES and remote file system mounts

Hello All,

I read from the below link, that it is now possible to export remote mounts 
over NFS/SMB.

https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_protocoloverremoteclu.htm

I am thinking of using a single CES protocol cluster, with remote mounts from 3 
storage clusters.
May i know, if i will be able to export the 3 remote mounts(from 3 storage 
clusters) over NFS/SMB from a single CES protocol cluster?

Because according to the limitations as mentioned in the below link:

https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_limitationofprotocolonRMT.htm

It says “You can configure one storage cluster and up to five protocol clusters 
(current limit).”


Regards,
Lohit
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] Pool migration and replicate

2018-04-26 Thread Simon Thompson (IT Research Support)

Hi all,

We'd like to move some data from a non replicated pool to another pool, but 
keep replication at 1 (the fs default is 2).

When using an ILM policy, is the default to keep the current replication or use 
the fs default?

I.e.just wondering if I need to include a "REPLICATE(1)" clause.

Also if the data is already migrated to the pool, is it still considered by the 
policy engine, or should I include FROM POOL...?

I.e. just wondering what is the most efficient way to target the files.

Thanks

Simon
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] afmPrepopEnd Callback

2018-04-23 Thread Simon Thompson (IT Research Support)

My very unconsidered and unsupported suggestion would be to edit mmfsfuncs on 
your test cluster and see if it’s actually implemented further in the code 

Simon

From:  on behalf of 
"luke.raimb...@googlemail.com" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Monday, 23 April 2018 at 15:11
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] afmPrepopEnd Callback

Good Afternoon AFM Experts,

I looked in the manual for afmPreopopEnd event variables I can extract to log 
something useful after a prefetch event completes. Here is the manual entry:

 %prepopAlreadyCachedFiles
  Specifies the number of files that are cached.
  These number of files are not read into cache
  because data is same between cache and home.

However, when I try to install a callback like this, I get the associated error:

# mmaddcallback afmCompletionReport --command /var/mmfs/etc/afmPrepopEnd.sh 
--event afmPrepopEnd -N afm --parms "%fsName %filesetName %prepopCompletedReads 
%prepopFailedReads %prepopAlreadyCachedFiles %prepopData"
mmaddcallback: Invalid callback variable "%prepopAlreadyCachedFiles" was 
specified.
mmaddcallback: Command failed. Examine previous error messages to determine 
cause.

I have a butcher's in /usr/lpp/mmfs/bin/mmfsfuncs and see only these three 
%prepop variables listed:

%prepopcompletedreads ) validCallbackVariable="%prepopCompletedReads";;
%prepopfailedreads ) validCallbackVariable="%prepopFailedReads";;
%prepopdata) validCallbackVariable="%prepopData";;

Is the %prepopAlreadyCachedFiles not implemented? Will it be implemented?

Unusual to see the manual ahead of the code ;)

Cheers,
Luke
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] UK Meeting - tooling Spectrum Scale

2018-04-20 Thread Simon Thompson (IT Research Support)

Sorry, it was a typo from my side.

The talks that are missing we are chasing for copies of the slides that we can 
release.

Simon

From:  on behalf of 
"renar.grunenb...@huk-coburg.de" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Friday, 20 April 2018 at 15:02
To: "gpfsug-discuss@spectrumscale.org" 
Subject: Re: [gpfsug-discuss] UK Meeting - tooling Spectrum Scale

Hallo Simon,
are there any reason why the link of the presentation from Yong ZY 
Zheng(Cognitive, ML, Hortonworks) is not linked.

Renar Grunenberg
Abteilung Informatik – Betrieb

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon:

09561 96-44110

Telefax:

09561 96-44104

E-Mail:

renar.grunenb...@huk-coburg.de

Internet:

www.huk.de


HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands 
a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav 
Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas.

Diese Nachricht enthält vertrauliche und/oder rechtlich geschützte 
Informationen.
Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrtümlich 
erhalten haben,
informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht.
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist 
nicht gestattet.

This information may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this information in 
error) please notify the
sender immediately and destroy this information.
Any unauthorized copying, disclosure or distribution of the material in this 
information is strictly forbidden.

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] [Replicated and non replicated data

2018-04-16 Thread Simon Thompson (IT Research Support)

Yeah that did it, it was set to the default value of “no”.

What exactly does “no” mean as opposed to “yes”? The docs
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adm_tuningguide.htm

Aren’t very forthcoming on this …

(note it looks like we also have to set this in multi-cluster environments in 
client clusters as well)

Simon

From: "robert.oester...@nuance.com" <robert.oester...@nuance.com>
Date: Friday, 13 April 2018 at 21:17
To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Cc: "Simon Thompson (IT Research Support)" <s.j.thomp...@bham.ac.uk>
Subject: Re: [Replicated and non replicated data

Add:

unmountOnDiskFail=meta

To your config. You can add it with “-I” to have it take effect w/o reboot.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From: <gpfsug-discuss-boun...@spectrumscale.org> on behalf of "Simon Thompson 
(IT Research Support)" <s.j.thomp...@bham.ac.uk>
Reply-To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: Friday, April 13, 2018 at 3:06 PM
To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Subject: [EXTERNAL] [gpfsug-discuss] Replicated and non replicated data

I have a question about file-systems with replicated an non replicated data.

We have a file-system where metadata is set to copies=2 and data copies=2, we 
then use a placement policy to selectively replicate some data only once based 
on file-set. We also place the non-replicated data into a specific pool 
(6tnlsas) to ensure we know where it is placed.

My understanding was that in doing this, if we took the disks with the non 
replicated data offline, we’d still have the FS available for users as the 
metadata is replicated. Sure accessing a non-replicated data file would give an 
IO error, but the rest of the FS should be up.

We had a situation today where we wanted to take stg01 offline today, so tried 
using mmchdisk stop -d …. Once we got to about disk stg01-01_12_12, GPFS would 
refuse to stop any more disks and complain about too many disks, similarly if 
we shutdown the NSD servers hosting the disks, the filesystem would have an 
SGPanic and force unmount.

First, am I correct in thinking that a FS with non-replicated data, but 
replicated metadata should still be accessible (not the non-replicated data) 
when the LUNS hosting it are down?

If so, any suggestions why my FS is panic-ing when we take down the one set of 
disks?

I thought at first we had some non-replicated metadata, tried a mmrestripefs -R 
–metadata-only to force it to ensure 2 replicas, but this didn’t help.

Running 5.0.0.2 on the NSD server nodes.

(First time we went round this we didn’t have a FS descriptor disk, but you can 
see below that we added this)

Thanks

Simon


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Dual server NSDs - change of hostname

2018-04-05 Thread Simon Thompson (IT Research Support)

Yeah that was my thoughts too given Bob said you can update the server list for 
an NSD device in 5.0. I also thought that bringing up a second nic and changing 
the name etc could bring a whole world or danger from having split routing and 
rp_filter (been there, had the weirdness, RDMA traffic continues but admin 
traffic randomly fails, but hey, if you like the world crashing down around 
you….)

Simon

From:  on behalf of 
"makap...@us.ibm.com" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Thursday, 5 April 2018 at 14:37
To: "gpfsug-discuss@spectrumscale.org" 
Subject: Re: [gpfsug-discuss] Dual server NSDs - change of hostname

To my mind this is simpler:  IF you can mmdelnode without too much suffering, 
do that. Then reconfigure the host name and whatever else you'd like to do. 
Then mmaddnode...


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] UK April meeting

2018-04-05 Thread Simon Thompson (IT Research Support)

It’s now just two weeks until the UK meeting and we are down to our last few 
places available. If you were planning on attending, please register now!

Simon

From:  on behalf of 
"ch...@spectrumscale.org" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Thursday, 1 March 2018 at 11:26
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] UK April meeting

Hi All,

We’ve just posted the draft agenda for the UK meeting in April at:
http://www.spectrumscaleug.org/event/uk-2018-user-group-event/

So far, we’ve issued over 50% of the available places, so if you are planning 
to attend, please do register now! Please register at:
https://www.eventbrite.com/e/spectrum-scale-gpfs-user-group-2018-registration-41489952565?aff=MailingList

We’ve also confirmed our evening networking/social event between days 1 and 2 
with thanks to our sponsors for supporting this.

Please remember that we are currently limiting to two registrations per 
organisation.

We’d like to thank our sponsors from DDN, E8, Ellexus, IBM, Lenovo, NEC and OCF 
for supporting the event.

Simon
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] GPFS Encryption

2018-03-26 Thread Simon Thompson (IT Research Support)

John,

I think we might need the decrypt key ...

Simon

On 26/03/2018, 13:29, "gpfsug-discuss-boun...@spectrumscale.org on behalf of 
john.hea...@asml.com"  wrote:

Fbeel Tnergu. Pnaabg nqq nalguvta hfrshy urer.

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Preferred NSD

2018-03-14 Thread Simon Thompson (IT Research Support)

Not always true.

1. Use them with socket licenses as HAWC or LROC is OK on a client.
2. Have data management edition and capacity license the amount of storage.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Jeffrey R. Lang 
[jrl...@uwyo.edu]
Sent: 14 March 2018 14:11
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Preferred NSD

Something I haven't heard in this discussion, it that of licensing of GPFS.

I believe that once you export disks from a node it then becomes a server node 
and the license may need to be changed, from client to server.  There goes the 
budget.



-Original Message-
From: gpfsug-discuss-boun...@spectrumscale.org 
 On Behalf Of Lukas Hejtmanek
Sent: Wednesday, March 14, 2018 4:28 AM
To: gpfsug main discussion list 
Subject: Re: [gpfsug-discuss] Preferred NSD

Hello,

thank you for insight. Well, the point is, that I will get ~60 with 120 NVMe 
disks in it, each about 2TB size. It means that I will have 240TB in NVMe SSD 
that could build nice shared scratch. Moreover, I have no different HW or place 
to put these SSDs into. They have to be in the compute nodes.

On Tue, Mar 13, 2018 at 10:48:21AM -0700, Alex Chekholko wrote:
> I would like to discourage you from building a large distributed
> clustered filesystem made of many unreliable components.  You will
> need to overprovision your interconnect and will also spend a lot of
> time in "healing" or "degraded" state.
>
> It is typically cheaper to centralize the storage into a subset of
> nodes and configure those to be more highly available.  E.g. of your
> 60 nodes, take 8 and put all the storage into those and make that a
> dedicated GPFS cluster with no compute jobs on those nodes.  Again,
> you'll still need really beefy and reliable interconnect to make this work.
>
> Stepping back; what is the actual problem you're trying to solve?  I
> have certainly been in that situation before, where the problem is
> more like: "I have a fixed hardware configuration that I can't change,
> and I want to try to shoehorn a parallel filesystem onto that."
>
> I would recommend looking closer at your actual workloads.  If this is
> a "scratch" filesystem and file access is mostly from one node at a
> time, it's not very useful to make two additional copies of that data
> on other nodes, and it will only slow you down.
>
> Regards,
> Alex
>
> On Tue, Mar 13, 2018 at 7:16 AM, Lukas Hejtmanek
> 
> wrote:
>
> > On Tue, Mar 13, 2018 at 10:37:43AM +, John Hearns wrote:
> > > Lukas,
> > > It looks like you are proposing a setup which uses your compute
> > > servers
> > as storage servers also?
> >
> > yes, exactly. I would like to utilise NVMe SSDs that are in every
> > compute servers.. Using them as a shared scratch area with GPFS is
> > one of the options.
> >
> > >
> > >   *   I'm thinking about the following setup:
> > > ~ 60 nodes, each with two enterprise NVMe SSDs, FDR IB
> > > interconnected
> > >
> > > There is nothing wrong with this concept, for instance see
> > > https://www.beegfs.io/wiki/BeeOND
> > >
> > > I have an NVMe filesystem which uses 60 drives, but there are 10 servers.
> > > You should look at "failure zones" also.
> >
> > you still need the storage servers and local SSDs to use only for
> > caching, do I understand correctly?
> >
> > >
> > > From: gpfsug-discuss-boun...@spectrumscale.org
> > > [mailto:gpfsug-discuss-
> > boun...@spectrumscale.org] On Behalf Of Knister, Aaron S.
> > (GSFC-606.2)[COMPUTER SCIENCE CORP]
> > > Sent: Monday, March 12, 2018 4:14 PM
> > > To: gpfsug main discussion list 
> > > Subject: Re: [gpfsug-discuss] Preferred NSD
> > >
> > > Hi Lukas,
> > >
> > > Check out FPO mode. That mimics Hadoop's data placement features.
> > > You
> > can have up to 3 replicas both data and metadata but still the
> > downside, though, as you say is the wrong node failures will take your 
> > cluster down.
> > >
> > > You might want to check out something like Excelero's NVMesh
> > > (note: not
> > an endorsement since I can't give such things) which can create
> > logical volumes across all your NVMe drives. The product has erasure
> > coding on their roadmap. I'm not sure if they've released that
> > feature yet but in theory it will give better fault tolerance *and*
> > you'll get more efficient usage of your SSDs.
> > >
> > > I'm sure there are other ways to skin this cat too.
> > >
> > > -Aaron
> > >
> > >
> > >
> > > On March 12, 2018 at 10:59:35 EDT, Lukas Hejtmanek
> > >  > > wrote:
> > > Hello,
> > >
> > > I'm thinking about the following setup:
> > > ~ 60 nodes, each with two enterprise NVMe SSDs, FDR IB
> > > interconnected
> > >
> > > I would like to setup shared scratch area using GPFS and

Re: [gpfsug-discuss] Preferred NSD

2018-03-14 Thread Simon Thompson (IT Research Support)

I would look at using LROC and possibly using HAWC ...

Note you need to be a bit careful with HAWC client side and failure group 
placement.

Simon

On 14/03/2018, 09:28, "gpfsug-discuss-boun...@spectrumscale.org on behalf of 
xhejt...@ics.muni.cz"  wrote:

Hello,

thank you for insight. Well, the point is, that I will get ~60 with 120 NVMe
disks in it, each about 2TB size. It means that I will have 240TB in NVMe 
SSD
that could build nice shared scratch. Moreover, I have no different HW or 
place 
to put these SSDs into. They have to be in the compute nodes.

On Tue, Mar 13, 2018 at 10:48:21AM -0700, Alex Chekholko wrote:
> I would like to discourage you from building a large distributed clustered
> filesystem made of many unreliable components.  You will need to
> overprovision your interconnect and will also spend a lot of time in
> "healing" or "degraded" state.
> 
> It is typically cheaper to centralize the storage into a subset of nodes
> and configure those to be more highly available.  E.g. of your 60 nodes,
> take 8 and put all the storage into those and make that a dedicated GPFS
> cluster with no compute jobs on those nodes.  Again, you'll still need
> really beefy and reliable interconnect to make this work.
> 
> Stepping back; what is the actual problem you're trying to solve?  I have
> certainly been in that situation before, where the problem is more like: 
"I
> have a fixed hardware configuration that I can't change, and I want to try
> to shoehorn a parallel filesystem onto that."
> 
> I would recommend looking closer at your actual workloads.  If this is a
> "scratch" filesystem and file access is mostly from one node at a time,
> it's not very useful to make two additional copies of that data on other
> nodes, and it will only slow you down.
> 
> Regards,
> Alex
> 
> On Tue, Mar 13, 2018 at 7:16 AM, Lukas Hejtmanek 
> wrote:
> 
> > On Tue, Mar 13, 2018 at 10:37:43AM +, John Hearns wrote:
> > > Lukas,
> > > It looks like you are proposing a setup which uses your compute 
servers
> > as storage servers also?
> >
> > yes, exactly. I would like to utilise NVMe SSDs that are in every 
compute
> > servers.. Using them as a shared scratch area with GPFS is one of the
> > options.
> >
> > >
> > >   *   I'm thinking about the following setup:
> > > ~ 60 nodes, each with two enterprise NVMe SSDs, FDR IB interconnected
> > >
> > > There is nothing wrong with this concept, for instance see
> > > https://www.beegfs.io/wiki/BeeOND
> > >
> > > I have an NVMe filesystem which uses 60 drives, but there are 10 
servers.
> > > You should look at "failure zones" also.
> >
> > you still need the storage servers and local SSDs to use only for 
caching,
> > do
> > I understand correctly?
> >
> > >
> > > From: gpfsug-discuss-boun...@spectrumscale.org [mailto:gpfsug-discuss-
> > boun...@spectrumscale.org] On Behalf Of Knister, Aaron S.
> > (GSFC-606.2)[COMPUTER SCIENCE CORP]
> > > Sent: Monday, March 12, 2018 4:14 PM
> > > To: gpfsug main discussion list 
> > > Subject: Re: [gpfsug-discuss] Preferred NSD
> > >
> > > Hi Lukas,
> > >
> > > Check out FPO mode. That mimics Hadoop's data placement features. You
> > can have up to 3 replicas both data and metadata but still the downside,
> > though, as you say is the wrong node failures will take your cluster 
down.
> > >
> > > You might want to check out something like Excelero's NVMesh (note: 
not
> > an endorsement since I can't give such things) which can create logical
> > volumes across all your NVMe drives. The product has erasure coding on
> > their roadmap. I'm not sure if they've released that feature yet but in
> > theory it will give better fault tolerance *and* you'll get more 
efficient
> > usage of your SSDs.
> > >
> > > I'm sure there are other ways to skin this cat too.
> > >
> > > -Aaron
> > >
> > >
> > >
> > > On March 12, 2018 at 10:59:35 EDT, Lukas Hejtmanek 
 > > wrote:
> > > Hello,
> > >
> > > I'm thinking about the following setup:
> > > ~ 60 nodes, each with two enterprise NVMe SSDs, FDR IB interconnected
> > >
> > > I would like to setup shared scratch area using GPFS and those NVMe
> > SSDs. Each
> > > SSDs as on NSD.
> > >
> > > I don't think like 5 or more data/metadata replicas are practical 
here.
> > On the
> > > other hand, multiple node failures is something really expected.
> > >
> > > Is there a way to instrument that local NSD is strongly

Re: [gpfsug-discuss] mmfind performance

2018-03-07 Thread Simon Thompson (IT Research Support)

I can’t comment on mmfind vs perl, but have you looked at trying “tsfindinode” ?

Simon

From:  on behalf of "Buterbaugh, 
Kevin L" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Tuesday, 6 March 2018 at 18:52
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] mmfind performance

Hi All,

In the README for the mmfind command it says:

mmfind
  A highly efficient file system traversal tool, designed to serve
   as a drop-in replacement for the 'find' command as used against GPFS FSes.

And:

mmfind is expected to be slower than find on file systems with relatively few 
inodes.
This is due to the overhead of using mmapplypolicy.
However, if you make use of the -exec flag to carry out a relatively expensive 
operation
on each file (e.g. compute a checksum), using mmfind should yield a significant 
performance
improvement, even on a file system with relatively few inodes.

I have a list of just shy of 50 inode numbers that I need to figure out what 
file they correspond to, so I decided to give mmfind a try:

+ cd /usr/lpp/mmfs/samples/ilm
+ ./mmfind /gpfs23 -inum 113769917 -o -inum 132539418 -o -inum 135584191 -o 
-inum 136471839 -o -inum 137009371 -o -inum 137314798 -o -inum 137939675 -o 
-inum 137997971 -o -inum 138013736 -o -inum 138029061 -o -inum 138029065 -o 
-inum 138029076 -o -inum 138029086 -o -inum 138029093 -o -inum 138029099 -o 
-inum 138029101 -o -inum 138029102 -o -inum 138029106 -o -inum 138029112 -o 
-inum 138029113 -o -inum 138029114 -o -inum 138029119 -o -inum 138029120 -o 
-inum 138029121 -o -inum 138029130 -o -inum 138029131 -o -inum 138029132 -o 
-inum 138029141 -o -inum 138029146 -o -inum 138029147 -o -inum 138029152 -o 
-inum 138029153 -o -inum 138029154 -o -inum 138029163 -o -inum 138029164 -o 
-inum 138029165 -o -inum 138029174 -o -inum 138029175 -o -inum 138029176 -o 
-inum 138083075 -o -inum 138083148 -o -inum 138083149 -o -inum 138083155 -o 
-inum 138216465 -o -inum 138216483 -o -inum 138216507 -o -inum 138216535 -o 
-inum 138235320 -ls

I kicked that off last Friday and it is _still_ running.  By comparison, I have 
a Perl script that I have run in the past that simple traverses the entire 
filesystem tree and stat’s each file and outputs that to a log file.  That 
script would “only” run ~24 hours.

Clearly mmfind as I invoked it is much slower than the corresponding Perl 
script, so what am I doing wrong?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] tscCmdPortRange question

2018-03-06 Thread Simon Thompson (IT Research Support)

We are looking at setting a value for tscCmdPortRange so that we can apply 
firewalls to a small number of GPFS nodes in one of our clusters.

The docs don’t give an indication on the number of ports that are required to 
be in the range. Could anyone make a suggestion on this?

It doesn’t appear as a parameter for “mmchconfig -i”, so I assume that it 
requires the nodes to be restarted, however I’m not clear if we could do a 
rolling restart on this?

Thanks

Simon
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Odd d????????? permissions

2018-02-14 Thread Simon Thompson (IT Research Support)

Is it an AFM cache? We see this sort of behaviour occasionally where the cache 
has an "old" view of the directory. Doing an ls, it evidently goes back to home 
but by then you already have weird stuff. The next ls is usually fine.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of john.hea...@asml.com 
[john.hea...@asml.com]
Sent: 14 February 2018 09:00
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Odd d? permissions

I am sure this is a known behavior and I am going to feel very foolish in a few 
minutes…

We often see this behavior on a GPFS filesystem.
I log into a client.

[jhearns@pn715 test]$ ls -la ../
ls: cannot access ../..: Permission denied
total 160
drwx-- 4 jhearns root   4096 Feb 14 09:46 .
d? ? ?   ? ?? ..
drwxr-xr-x 2 jhearns users  4096 Feb  9 11:13 gpfsperf
-rw-r--r-- 1 jhearns users 27336 Feb  9 22:24 iozone.out
-rw-r--r-- 1 jhearns users  6083 Feb  9 10:55 IozoneResults.py
-rw-r--r-- 1 jhearns users 22959 Feb  9 11:17 iozone.txt
-rw-r--r-- 1 jhearns users  2977 Feb  9 10:55 iozone.txtvi
-rwxr-xr-x 1 jhearns users   102 Feb  9 10:55 run-iozone.sh
drwxr-xr-x 2 jhearns users  4096 Feb 14 09:46 test
-r-x-- 1 jhearns users 51504 Feb  9 11:02 tsqosperf

This behavior changes after a certain number of minutes, and the .. directory 
looks normal.

For information this filesystem has nfsv4 file locking semantics and ACL 
semantics set to all

-- The information contained in this communication and any attachments is 
confidential and may be privileged, and is for the sole use of the intended 
recipient(s). Any unauthorized review, use, disclosure or distribution is 
prohibited. Unless explicitly stated otherwise in the body of this 
communication or the attachment thereto (if any), the information is provided 
on an AS-IS basis without any express or implied warranties or liabilities. To 
the extent you are relying on this information, you are doing so at your own 
risk. If you are not the intended recipient, please notify the sender 
immediately by replying to this message and destroy all copies of this message 
and any attachments. Neither the sender nor the company/group of companies he 
or she represents shall be liable for the proper and complete transmission of 
the information contained in this communication, or for any delay in its 
receipt.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] In place upgrade of ESS?

2018-02-02 Thread Simon Thompson (IT Research Support)

If you mean adding storage shelves to increase capacity to an ESS, then no I 
don't believe it is supported. I think it is supported on the Lenovo DSS-G 
models, though you have to have a separate DA for each shelf increment so the 
performance may different between an upgraded Vs complete solution.

Simon 

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of 
sander...@convergeone.com [sander...@convergeone.com]
Sent: 02 February 2018 19:59
To: gpfsug main discussion list
Subject: [gpfsug-discuss] In place upgrade of ESS?

I haven't found a firm answer yet.  Is it possible to in place upgrade say, a 
GL2 to a GL4 and subsequently a GL6?



Do we know if this feature is coming?

SHAUN ANDERSON
STORAGE ARCHITECT
O 208.577.2112
M 214.263.7014


NOTICE: This email message and any attachments hereto may contain confidential
information. Any unauthorized review, use, disclosure, or distribution of such
information is prohibited. If you are not the intended recipient, please contact
the sender by reply email and destroy the original message and all copies of it.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Expeliarmus

2017-12-20 Thread Simon Thompson (IT Research Support)

This is assuming you directly have “IBM” licenses (as opposed to OEM licenses 
where the route is different … or where the licenses are held by your VAR 
rather than you …)

You need to have an IBM account which is attached to a (Passport Advantage) PA 
site that has current support for the product…

If you go to fix central, its also listed there now as well so again assuming 
your account is appropriately attached to a support contract, you can download 
it.

If you are struggling with these, then go speak to your business partner or IBM 
account manager.

Simon

From:  on behalf of 
"john.hea...@asml.com" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Wednesday, 20 December 2017 at 08:41
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] Expeliarmus

I have downloaded several versions of Spectrum Scale 4.X from the MyIBM site.
For the life of me I cannot summon the spell needed to put Spectrum Scale 5 on 
my orders list.

Can some kindly witch give me the incantation please?
I would like to install on a test cluster, as the wisdom of the mages and 
mavens here has it.

-- The information contained in this communication and any attachments is 
confidential and may be privileged, and is for the sole use of the intended 
recipient(s). Any unauthorized review, use, disclosure or distribution is 
prohibited. Unless explicitly stated otherwise in the body of this 
communication or the attachment thereto (if any), the information is provided 
on an AS-IS basis without any express or implied warranties or liabilities. To 
the extent you are relying on this information, you are doing so at your own 
risk. If you are not the intended recipient, please notify the sender 
immediately by replying to this message and destroy all copies of this message 
and any attachments. Neither the sender nor the company/group of companies he 
or she represents shall be liable for the proper and complete transmission of 
the information contained in this communication, or for any delay in its 
receipt.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Spectrum Scale 5.0 now available on Fix Central

2017-12-19 Thread Simon Thompson (IT Research Support)

Maybe it would have been a good idea to make this clear in the “What’s new in 
5.0” slide decks used at SC. I don’t recall it being there. And the lack of 
forward public notification on this is not great, particularly for those not in 
NYC. Sure most of my clusters are on EL7 now, but I still have some nodes still 
running 6.x (notably some of our Spectrum Protect nodes which are not just 
systems we can reinstall).

Simon

From:  on behalf of 
"duer...@us.ibm.com" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Tuesday, 19 December 2017 at 13:19
To: "gpfsug-discuss@spectrumscale.org" 
Subject: Re: [gpfsug-discuss] Spectrum Scale 5.0 now available on Fix Central


As Mike Taylor pointed out in a previous post this was an incorrect statement.
You can be at 4.2.x (ie 4.2.0, 4.2.1, 4.2.2, or 4.2.3) and still do a rolling 
upgrade.
The minReleaseLevel is not pertinent to a rolling upgrade. The running daemon 
is the important part. So you can't have any 4.1.x nodes in your cluster and do 
a rolling upgrade to 5.0.

Also, Aaron, as to the OS support. This decision was not made without some 
angst. As I mentioned at the user group meeting in NYC...the key point is that 
we would like to get to a more current compiler. This will allow us to take 
advantage of newer features and functions and hopefully make the code better 
for customers. SLES 12 has been around for over 2 years.

I hope this helps give some thinking behind the decision.


Steve Duersch
Spectrum Scale
845-433-7902
IBM Poughkeepsie, New York


> Today's Topics:
>
>1. Re: Spectrum Scale 5.0 now available on Fix Central
>   (Sobey, Richard A)
>
>
> --
>
> Message: 1
> Date: Tue, 19 Dec 2017 09:06:08 +
> From: "Sobey, Richard A" 
> To: gpfsug main discussion list 
> Subject: Re: [gpfsug-discuss] Spectrum Scale 5.0 now available on Fix
>Central
> Message-ID:
>
> 
>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Robert
>
> Do you mean the minReleaseLevel from mmlsconfig or just making sure
> all the nodes are running 4.2.3?
>
> Cheers!
> Richard
>
> From: gpfsug-discuss-boun...@spectrumscale.org [mailto:gpfsug-
> discuss-boun...@spectrumscale.org] On Behalf Of Oesterlin, Robert
> Sent: 18 December 2017 19:44
> To: gpfsug main discussion list 
> Subject: [gpfsug-discuss] FW: Spectrum Scale 5.0 now available on Fix Central
>
> The Scale 5.0 fix level is now up on Fix Central.
>
> You need to be at Scale 4.2.3 (cluster level) to do a rolling
> upgrade to this level.
>
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
>
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Smallest block quota/limit and file quota/limit possible to set?

2017-12-04 Thread Simon Thompson (IT Research Support)

Stuart,

Have you looked at using filesets instead an using fileset quotas to achieve 
this?

This is what we do and the max number of filesets (currently) isn't an issue 
for us.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of stua...@4gh.net 
[stua...@4gh.net]
Sent: 04 December 2017 16:33
To: gpfsug main discussion list
Cc: gpfsug-discuss-boun...@spectrumscale.org
Subject: Re: [gpfsug-discuss] Smallest block quota/limit and file quota/limit 
possible to set?

We have a /projects filesystem where individual projects can "buy" a
specific amount of disk space.  We enforce this purchase limit by
creating a specific group for the allocation, adding designated users
to the group and setting a group quota.

This works fine as long as the users properly use setgid directories
and keep proper group ownership of the files and directories.

However, for various reasons our users keep creating files and
directories with incorrect group ownership.  In most cases this is
accidental and eventually causes problems when other group members
need to access the files.  In abusive cases (not yet seen) people
could use this to exceed project disk space allocations.

To address this problem we have default quotas set to about 2GB (the
smallest we seem to be able to set).  This prevents users for
consuming too much unpurchased disk space.  However, this continues to
allow users to create files and directories with incorrect group
ownership and it takes users a while to discover their error.  User
education and cleanup becomes a problem long after the user thinks
things are working.

We would like to have groups without quota definitions to not be able
to create any files.  This would prevent accidental file creation at
the first attempt.

Stuart Barkley

On Mon, 4 Dec 2017 at 08:46 -, Stephen Ulmer wrote:

> I don?t understand why not having permission(s) doesn?t prevent the
> user from writing into the fileset...
>
> As described, your case is about not wanting userA to be able to
> write to a fileset if userA isn?t in some groups. Don?t put them in
> those groups. That?s not even Spectrum Scale specific, it?s about
> generic *nix permissions.
>
> What am I missing? I don?t understand why you would want to use
> quota to enforce permissions. (There could be a legitimate reason
> here, but I don?t understand it.)
>
> Liberty,
>
> --
> Stephen Ulmer
>
> Sent from a mobile device; please excuse autocorrect silliness.
>
> > On Dec 3, 2017, at 10:49 PM, IBM Spectrum Scale  wrote:
> >
> > Hi Keith,
> >
> > You can use ACLs for fine grained permissions. A quota limit of 0
> > in GPFS implies no limits.
> >
> > Regards, The Spectrum Scale (GPFS) team
> >
> > From:Keith Ball 
> > To:gpfsug-discuss@spectrumscale.org
> > Date:12/04/2017 08:19 AM
> > Subject:[gpfsug-discuss] Smallest block quota/limit and file 
> > quota/limitpossible to set?
> > Sent by:gpfsug-discuss-boun...@spectrumscale.org
> >
> > HI All,
> >
> > We have a system where all users have their own private group as
> > well. However, for a given fileset (we are using
> > --perfileset-quota), we would like to ONLY allow users who also
> > belong to just a few central groups to be able to write to the
> > fileset.
> >
> > That is, user "userA" has its own "groupA", but we only want the
> > user to be able to write to the fileset if:
> >  - userA belongs to one of the groups (e.g. group1, group2,
> >  group3) that have explicitly set quotas
> >  - The group(s) in question are within quota/limits.
> >
> > In general, we do not want any users that do NOT belong to one of
> > the three groups with enabled quotas to be able to write anything
> > at all to the fileset.
> >
> > Is there a way to set a ZERO quota for block/file in GPFS, that
> > means what it actually should mean? i.e. "Your limit is 0 file =
> > you cannot create files in this fileset". Creating some kind of
> > "supergroup" owner of the fileset (with entitled users as members
> > of the group) could work, but that will only work for *one* group.
> >
> > If we cannot set the block and file limits to zero, what *are* the
> > smallest block and fie limits? In GPFS 3.5, they seem to be 1760MB
> > for block. Is there a smallest quota for files? (blocksize is
> > 16MB, which will be reduced to 4MB probably, in a subsequent
> > cluster).
> >
> > Many Thanks,
> >   Keith
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Can gpfs 4.2.3-4.2 work for kernel 3.12.x or above?

2017-12-04 Thread Simon Thompson (IT Research Support)

The FAQ at:

https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux

Lists support with (e.g. Ubutu 16.04.2) with kernel 4.4.0-62, so likely it 
would work with a build your own kernel, but that doesn’t mean it is 
**supported**

Simon

On 04/12/2017, 09:52, "gpfsug-discuss-boun...@spectrumscale.org on behalf of 
z@imperial.ac.uk"  wrote:

Hi All,

Any one is using a Linux kernel 3.12.x or above
to run gpfs 4.2.3-4.2? I mean you've compiled
your own kernel without paying for a professional
service.

We're stuck by CentOS/RHEL's distributed kernel
as the PCI passthrough is required for VMs. Your
comments or suggestions are much appreciated.

Kind regards,

Zong-Pei

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] mmauth/mmremotecluster wonkyness?

2017-11-30 Thread Simon Thompson (IT Research Support)

Um no, you are talking GPFS protocol between cluster nodes still in 
multicluster. Contact nodes are where the remote cluster goes to start with, 
but after that it's just normal node to node gpfs traffic (not just the contact 
nodes).

At least that is my understanding.

If you want traffic separation, you need something like AFM.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of valdis.kletni...@vt.edu 
[valdis.kletni...@vt.edu]
Sent: 30 November 2017 16:27
To: gpfsug-discuss@spectrumscale.org
Subject: [gpfsug-discuss] mmauth/mmremotecluster wonkyness?

We have a 10-node cluster running gpfs 4.2.2.3, where 8 nodes are GPFS contact
nodes for 2 filesystems, and 2 are protocol nodes doingNFS exports of the
filesystems.

But we see some nodes in remote clusters trying to GPFS connect to
the 2 protocol nodes anyhow.

My reading of the manpages is that the remote cluster is responsible
for setting '-n contactNodes' when they do the 'mmremotecluster add',
and there's no way to sanity check or enforce that at the local end, and
fail/flag connections to unintended non-contact nodes if the remote
admin forgets/botches the -n.

Is that actually correct?  If so, is it time for an RFE?
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] 5.0 features?

2017-11-29 Thread Simon Thompson (IT Research Support)

You can in place upgrade.

I think what people are referring to is likely things like the new sub block 
sizing for **new** filesystems.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of jfosb...@mdanderson.org 
[jfosb...@mdanderson.org]
Sent: 29 November 2017 17:40
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] 5.0 features?

I haven’t even heard it’s been released or has been announced.  I’ve requested 
a roadmap discussion.

From:  on behalf of Marc A Kaplan 

Reply-To: gpfsug main discussion list 
Date: Wednesday, November 29, 2017 at 11:38 AM
To: gpfsug main discussion list 
Subject: Re: [gpfsug-discuss] 5.0 features?

Which features of 5.0 require a not-in-place upgrade of a file system?  Where 
has this information been published?


The information contained in this e-mail message may be privileged, 
confidential, and/or protected from disclosure. This e-mail message may contain 
protected health information (PHI); dissemination of PHI should comply with 
applicable federal and state laws. If you are not the intended recipient, or an 
authorized representative of the intended recipient, any further review, 
disclosure, use, dissemination, distribution, or copying of this message or any 
attachment (or the information contained therein) is strictly prohibited. If 
you think that you have received this e-mail message in error, please notify 
the sender by return e-mail and delete all references to it and its contents 
from your systems.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Callbacks / softQuotaExceeded

2017-11-06 Thread Simon Thompson (IT Research Support)

Thanks Eric,

One other question, when it says it must run on a manager node, I'm assuming 
that means a manager node in a storage cluster (we multi-cluster clients 
clusters in).

Thanks

Simon

From: Eric Agar <a...@us.ibm.com<mailto:a...@us.ibm.com>> on behalf of 
"sc...@us.ibm.com<mailto:sc...@us.ibm.com>" 
<sc...@us.ibm.com<mailto:sc...@us.ibm.com>>
Date: Monday, 6 November 2017 at 19:51
To: "gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>, 
Simon Thompson <s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>>
Cc: IBM Spectrum Scale <sc...@us.ibm.com<mailto:sc...@us.ibm.com>>
Subject: Re: [gpfsug-discuss] Callbacks / softQuotaExceeded

Simon,

Based on my reading of the code, when a softQuotaExceeded event callback is 
invoked with %quotaType having the value "FILESET", the following arguments 
correspond with each other for filesetLimitExceeded and softQuotaExceeded:

- filesetLimitExceeded %inodeUsage  and softQuotaExceeded  %filesUsage
- filesetLimitExceeded %inodeQuota  and softQuotaExceeded  %filesQuota
- filesetLimitExceeded %inodeLimit  and softQuotaExceeded  %filesLimit
- filesetLimitExceeded %filesetSize and softQuotaExceeded  %blockUsage
- filesetLimitExceeded %softLimit   and softQuotaExceeded  %blockQuota
- filesetLimitExceeded %hardLimit   and softQuotaExceeded  %blockLimit

So, terms have changed to make them a little friendlier and to generalize them. 
 An inode is a file.  Limits related to inodes and to blocks are being reported.

Regards, The Spectrum Scale (GPFS) team
Eric Agar

--
If you feel that your question can benefit other users of  Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact  1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.



From:"Simon Thompson (IT Research Support)" 
<s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>>
To:
"gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:11/06/2017 09:17 AM
Subject:[gpfsug-discuss] Callbacks / softQuotaExceeded
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




We were looking at adding some callbacks to notify us when file-sets go
over their inode limit by implementing it as a soft inode quota.

In the docs:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectru
m.scale.v4r23.doc/bl1adm_mmaddcallback.htm#mmaddcallback__Table1


There is an event filesetLimitExceeded, which has parameters: %inodeUsage
%inodeQuota, however the docs say that we should instead use
softQuotaExceeded as filesetLimitExceeded "It exists only for
compatibility (and may be deleted in a future version); therefore, using
softQuotaExceeded is recommended instead"

However.

softQuotaExceeded seems to have no %inodeQuota of %inodeUsage parameters.
Is this a doc error or is there genuinely no way to get the
inodeQuota/Usage with softQuotaExceeded? The same applies to passing
%quotaEventType.


Any suggestions?

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwICAg=jf_iaSHvJObTbx-siA1ZOg=IbxtjdkPAM2Sbon4Lbbi4w=7fytZP7U6ExP93umOcOUIXEUXD2KWdWEsrEqMtxOB0I=BiROZ43JuhZRhqOOpqTvHvl7bTqjPFxIrCxqIWAWa7U=



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] Callbacks / softQuotaExceeded

2017-11-06 Thread Simon Thompson (IT Research Support)

We were looking at adding some callbacks to notify us when file-sets go
over their inode limit by implementing it as a soft inode quota.

In the docs:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectru
m.scale.v4r23.doc/bl1adm_mmaddcallback.htm#mmaddcallback__Table1


There is an event filesetLimitExceeded, which has parameters: %inodeUsage
%inodeQuota, however the docs say that we should instead use
softQuotaExceeded as filesetLimitExceeded "It exists only for
compatibility (and may be deleted in a future version); therefore, using
softQuotaExceeded is recommended instead"

However.

softQuotaExceeded seems to have no %inodeQuota of %inodeUsage parameters.
Is this a doc error or is there genuinely no way to get the
inodeQuota/Usage with softQuotaExceeded? The same applies to passing
%quotaEventType.


Any suggestions?

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] el7.4 compatibility

2017-10-23 Thread Simon Thompson (IT Research Support)


Just picking up this old thread, but...

October updates:
https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#
linux


7.4 is now listed as supported with min scale version of 4.1.1.17 or
4.2.3.4

(incidentally 4.2.3.5 looks to have been released today).

Simon


On 27/09/2017, 09:16, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of kenneth.waege...@ugent.be"  wrote:

>Hi,
>
>Is there already some information available of gpfs (and protocols) on
>el7.4 ?
>
>Thanks!
>
>Kenneth
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] User group Meeting at SC17 - Registration and program details

2017-10-13 Thread Simon Thompson (IT Research Support)

The slides from the Manchester meeting are at:

http://files.gpfsug.org/presentations/2017/Manchester/09_licensing-update.p
df


We moved all of our socket licenses to per TB DME earlier this year, and
then also have DME per drive for our Lenovo DSS-G system, which for
various reasons is in a different cluster

There are certainly people in IBM UK who understand this process if that
was something you wanted to look at.

Simon


On 13/10/2017, 13:45, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of Sobey, Richard A"  wrote:

>Actually, I was being 100% serious :) Although it's a boring topic, it's
>nonetheless fairly crucial and I'd like to see more about it. I won't be
>at SC17 unless you're livestreaming it anyway.
>
>Richard
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Carl Zetie
>Sent: 13 October 2017 13:13
>To: gpfsug-discuss@spectrumscale.org
>Subject: Re: [gpfsug-discuss] User group Meeting at SC17 - Registration
>and program details
>
>>I *need* to see the presentation from the licensing session ? everyone?s
>>favourite topic ?
> 
>
>I believe (hope?) that's just a placeholder, and we'll actually use the
>time for something more engaging...
> 
>  
>
> Carl Zetie
> Offering Manager for Spectrum Scale, IBM
> 
> (540) 882 9353 ][ Research Triangle Park
> ca...@us.ibm.com 
>
>Message: 3
>Date: Fri, 13 Oct 2017 09:47:39 +
>From: "Sobey, Richard A" 
>To: gpfsug main discussion list 
>Subject: Re: [gpfsug-discuss] User group Meeting at SC17 -
>   Registrationand program details
>Message-ID:
>   
> utlook.com>
>   
>Content-Type: text/plain; charset="utf-8"
>
>I *need* to see the presentation from the licensing session ? everyone?s
>favourite topic ?
>
>
>
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Checking a file-system for errors

2017-10-11 Thread Simon Thompson (IT Research Support)

So with the help of IBM support and Venkat (thanks guys!), we think its a
problem with DMAPI. As we initially saw this as an issue with AFM
replication, we had traces from there, and had entries like:

gpfsWrite exit: failed err 688


Now apparently err 688 relates to "DMAPI disposition", once we had this we
were able to get someone to take a look at the HSM dsmrecalld, it was
running, but had failed over to a node that wasn't able to service
requests properly. (multiple NSD servers with different file-systems each
running dsmrecalld, but I don't think you can scope nods XYZ to filesystem
ABC but not DEF).

Anyway once we got that fixed, a bunch of stuff in the AFM cache popped
out (and a little poke for some stuff that hadn't updated metadata cache
probably).

So hopefully its now also solved for our other users.

What is complicated here is that a DMAPI issue was giving intermittent IO
errors, people could write into new folders, but not existing files,
though I could (some sort of Schrödinger's cat IO issue??).

So hopefully we are fixed...

Simon

On 11/10/2017, 15:01, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of uwefa...@de.ibm.com" <gpfsug-discuss-boun...@spectrumscale.org on
behalf of uwefa...@de.ibm.com> wrote:

>Usually, IO errors point to some basic problem reading/writing data .
>if there are repoducible errors, it's IMHO always a nice thing to trace
>GPFS for such an access. Often that reveals already the area where the
>cause lies and maybe even the details of it.
> 
>
>
> 
>Mit freundlichen Grüßen / Kind regards
>
> 
>Dr. Uwe Falke
> 
>IT Specialist
>High Performance Computing Services / Integrated Technology Services /
>Data Center Services
>--
>-
>IBM Deutschland
>Rathausstr. 7
>09111 Chemnitz
>Phone: +49 371 6978 2165
>Mobile: +49 175 575 2877
>E-Mail: uwefa...@de.ibm.com
>--
>-
>IBM Deutschland Business & Technology Services GmbH / Geschäftsführung:
>Thomas Wolter, Sven Schooß
>Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
>HRB 17122 
>
>
>
>
>From:   "Simon Thompson (IT Research Support)" <s.j.thomp...@bham.ac.uk>
>To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>Date:   10/11/2017 01:22 PM
>Subject:Re: [gpfsug-discuss] Checking a file-system for errors
>Sent by:gpfsug-discuss-boun...@spectrumscale.org
>
>
>
>Yes I get we should only be doing this if we think we have a problem.
>
>And the answer is, right now, we're not entirely clear.
>
>We have a couple of issues our users are reporting to us, and its not
>clear to us if they are related, an FS problem or ACLs getting in the way.
>
>We do have users who are trying to work on files getting IO error, and we
>have an AFM sync issue. The disks are all online, I poked the FS with
>tsdbfs and the files look OK - (small files, but content of the block
>matches).
>
>Maybe we have a problem with DMAPI and TSM/HSM (could that cause IO error
>reported to user when they access a file even if its not an offline
>file??)
>
>We have a PMR open with IBM on this already.
>
>But there's a wanting to be sure in our own minds that we don't have an
>underlying FS problem. I.e. I have confidence that I can tell my users,
>yes I know you are seeing weird stuff, but we have run checks and are not
>introducing data corruption.
>
>Simon
>
>On 11/10/2017, 11:58, "gpfsug-discuss-boun...@spectrumscale.org on behalf
>of uwefa...@de.ibm.com" <gpfsug-discuss-boun...@spectrumscale.org on
>behalf of uwefa...@de.ibm.com> wrote:
>
>>Mostly, however,  filesystem checks are only done if fs issues are
>>indicated by errors in the logs. Do you have reason to assume your fs has
>>probs?
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Checking a file-system for errors

2017-10-11 Thread Simon Thompson (IT Research Support)

Yes I get we should only be doing this if we think we have a problem.

And the answer is, right now, we're not entirely clear.

We have a couple of issues our users are reporting to us, and its not
clear to us if they are related, an FS problem or ACLs getting in the way.

We do have users who are trying to work on files getting IO error, and we
have an AFM sync issue. The disks are all online, I poked the FS with
tsdbfs and the files look OK - (small files, but content of the block
matches).

Maybe we have a problem with DMAPI and TSM/HSM (could that cause IO error
reported to user when they access a file even if its not an offline file??)

We have a PMR open with IBM on this already.

But there's a wanting to be sure in our own minds that we don't have an
underlying FS problem. I.e. I have confidence that I can tell my users,
yes I know you are seeing weird stuff, but we have run checks and are not
introducing data corruption.

Simon

On 11/10/2017, 11:58, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of uwefa...@de.ibm.com"  wrote:

>Mostly, however,  filesystem checks are only done if fs issues are
>indicated by errors in the logs. Do you have reason to assume your fs has
>probs?

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Checking a file-system for errors

2017-10-11 Thread Simon Thompson (IT Research Support)

OK thanks,

So if I run mmfsck in online mode and it says:
"File system is clean.
Exit status 0:10:0."

Then I can assume there is no benefit to running in offline mode?

But it would also be prudent to run "mmrestripefs -c" to be sure my
filesystem is happy?

Thanks

Simon

On 11/10/2017, 11:19, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of uwefa...@de.ibm.com" <gpfsug-discuss-boun...@spectrumscale.org on
behalf of uwefa...@de.ibm.com> wrote:

>Hm , mmfsck will  return not very reliable results in online mode,
>especially it will report many issues which are just due to the transient
>states in a files system in operation.
>It should however not find less issues than in off-line mode.
>
>mmrestripefs -c does not do any logical checks, it just checks for
>differences of multiple replicas of the same data/metadata.
>File system errors can be caused by such discrepancies (if an odd/corrupt
>replica is used by the GPFS), but can also be caused (probably more
>likely) by logical errors / bugs when metadata were modified in the file
>system. In those cases, all the replicas are identical nevertheless
>corrupt (cannot be found by mmrestripefs)
> 
>So, mmrestripefs -c is like scrubbing for silent data corruption (on its
>own, it cannot decide which is the correct replica!), while mmfsck checks
>the filesystem structure for logical consistency.
>If the contents of the replicas of a data block differ, mmfsck won't see
>any problem (as long as the fs metadata are consistent), but mmrestripefs
>-c will. 
>
> 
>Mit freundlichen Grüßen / Kind regards
>
> 
>Dr. Uwe Falke
> 
>IT Specialist
>High Performance Computing Services / Integrated Technology Services /
>Data Center Services
>--
>-
>IBM Deutschland
>Rathausstr. 7
>09111 Chemnitz
>Phone: +49 371 6978 2165
>Mobile: +49 175 575 2877
>E-Mail: uwefa...@de.ibm.com
>--
>-
>IBM Deutschland Business & Technology Services GmbH / Geschäftsführung:
>Thomas Wolter, Sven Schooß
>Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
>HRB 17122 
>
>
>
>
>From:   "Simon Thompson (IT Research Support)" <s.j.thomp...@bham.ac.uk>
>To: "gpfsug-discuss@spectrumscale.org"
><gpfsug-discuss@spectrumscale.org>
>Date:   10/11/2017 10:47 AM
>Subject:[gpfsug-discuss] Checking a file-system for errors
>Sent by:gpfsug-discuss-boun...@spectrumscale.org
>
>
>
>I'm just wondering if anyone could share any views on checking a
>file-system for errors.
>
>For example, we could use mmfsck in online and offline mode. Does online
>mode detect errors (but not fix) things that would be found in offline
>mode?
>
>And then were does mmrestripefs -c fit into this?
>
>"-c
>  Scans the file system and compares replicas of
>  metadata and data for conflicts. When conflicts
>  are found, the -c option attempts to fix
>  the replicas.
>"
>
>Which sorta sounds like fix things in the file-system, so how does that
>intersect (if at all) with mmfsck?
>
>Thanks
>
>Simon
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] Checking a file-system for errors

2017-10-11 Thread Simon Thompson (IT Research Support)

I'm just wondering if anyone could share any views on checking a
file-system for errors.

For example, we could use mmfsck in online and offline mode. Does online
mode detect errors (but not fix) things that would be found in offline
mode?

And then were does mmrestripefs -c fit into this?

"-c
  Scans the file system and compares replicas of
  metadata and data for conflicts. When conflicts
  are found, the -c option attempts to fix
  the replicas.
"

Which sorta sounds like fix things in the file-system, so how does that
intersect (if at all) with mmfsck?

Thanks

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Changing ip on spectrum scale cluster with every node down and not connected to network.

2017-10-11 Thread Simon Thompson (IT Research Support)

I think you really want a PMR for this. There are some files you could 
potentially edit and copy around, but given its cluster configuration, I 
wouldn't be doing this on a cluster I cared about with explicit instruction 
from IBM support.

So I suggest log a ticket with IBM.

Simon

From: 
>
 on behalf of "a...@b4restore.com" 
>
Reply-To: 
"gpfsug-discuss@spectrumscale.org" 
>
Date: Wednesday, 11 October 2017 at 08:46
To: "gpfsug-discuss@spectrumscale.org" 
>
Subject: [gpfsug-discuss] Changing ip on spectrum scale cluster with every node 
down and not connected to network.

Hi,

Does anyone know how to change the ips on all the nodes within a cluster when 
gpfs and interfaces are down?
Right now the cluster has been shutdown and all ports disconnected(ports has 
been shut down on new switch)

The problem is that when I try to execute any mmchnode command(as the ibm 
documentation states) the command fails, and that makes sense as the ip on the 
interface has been changed without the deamon knowing.. But is there a way to 
do it manually within the configuration files so that the gpfs daemon updates 
the ips of all nodes within the cluster or does anyone know of a hack around to 
do it without having network access.

It is not possible to turn on the switch ports as the cluster has the same ips 
right now as another cluster on the new switch.

Hope you understand, relatively new to gpfs/spectrum scale

Venlig hilsen / Best Regards

Andi R. Christiansen
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] AFM fun (more!)

2017-10-10 Thread Simon Thompson (IT Research Support)

a/fastq/PD7446i.fastq
[root@server

I am not sure if that helps and you probably already know about it inflight 
checking…


Kind Regards,

Leo

Leo Earl | Head of Research & Specialist Computing
Room ITCS 01.16, University of East Anglia, Norwich Research Park, Norwich NR4 
7TJ
+44 (0) 1603 593856


From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Venkateswara R 
Puvvada
Sent: 10 October 2017 05:56
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] AFM fun (more!)

Simon,

>Question 1.
>Can we force the gateway node for the other file-sets to our "02" node.
>I.e. So that we can get the queue services for the other filesets.

AFM automatically maps the fileset to gateway node, and today there is no 
option available for users to assign fileset to a particular gateway node. This 
feature will be supported in future releases.

>Question 2.
>How can we make AFM actually work for the "facility" file-set. If we shut
>down GPFS on the node, on the secondary node, we'll see log entires like:
>2017-10-09_13:35:30.330+0100: [I] AFM: Found 1069575 local remove
>operations...
>So I'm assuming the massive queue is all file remove operations?

These are the files which were created in cache, and were deleted before they 
get replicated to home. AFM recovery will delete them locally. Yes, it is 
possible that most of these operations are local remove operations.Try finding 
those operations using dump command.

 mmfsadm saferdump afm all | grep 'Remove\|Rmdir' | grep local | wc -l


>Alarmingly, we are also seeing entires like:
>2017-10-09_13:54:26.591+0100: [E] AFM: WriteSplit file system rds-cache
>fileset rds-projects-2017 file IDs [5389550.5389550.-1.-1,R] name  remote
>error 5

Traces are needed to verify IO errors. Also try disabling the parallel IO and 
see if replication speed improves.

mmchfileset device fileset -p afmParallelWriteThreshold=disable

~Venkat (vpuvv...@in.ibm.com<mailto:vpuvv...@in.ibm.com>)



From:"Simon Thompson (IT Research Support)" 
<s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>>
To:
"gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:10/09/2017 06:27 PM
Subject:[gpfsug-discuss] AFM fun (more!)
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





Hi All,

We're having fun (ok not fun ...) with AFM.

We have a file-set where the queue length isn't shortening, watching it
over 5 sec periods, the queue length increases by ~600-1000 items, and the
numExec goes up by about 15k.

The queues are steadily rising and we've seen them over 100 ...

This is on one particular fileset e.g.:

mmafmctl rds-cache getstate
  Mon Oct  9 08:43:58 2017

Fileset NameFileset TargetCache State
   Gateway NodeQueue Length   Queue numExec
--
-   -
rds-projects-facility gpfs:///rds/projects/facility   Dirty
   bber-afmgw013068953520504
rds-projects-2015 gpfs:///rds/projects/2015   Active
   bber-afmgw010  3
rds-projects-2016 gpfs:///rds/projects/2016   Dirty
   bber-afmgw011482   70
rds-projects-2017 gpfs:///rds/projects/2017   Dirty
   bber-afmgw017139104
bear-apps gpfs:///rds/bear-apps Dirty
 bber-afmgw023  2472770871
user-homes gpfs:///rds/homes Active
  bber-afmgw020  19
bear-sysappsgpfs:///rds/bear-sysapps  Active
   bber-afmgw020  4



This is having the effect that other filesets on the same "Gateway" are
not getting their queues processed.

Question 1.
Can we force the gateway node for the other file-sets to our "02" node.
I.e. So that we can get the queue services for the other filesets.

Question 2.
How can we make AFM actually work for the "facility" file-set. If we shut
down GPFS on the node, on the secondary node, we'll see log entires like:
2017-10-09_13:35:30.330+0100: [I] AFM: Found 1069575 local remove
operations...

So I'm assuming the massive queue is all file remove operations?

Alarmingly, we are also seeing entires like:
2017-10-09_13:54:26.591+0100: [E] AFM: WriteSplit file system rds-cache
fileset rds-projects-2017 file IDs [5389550.5389550.-1.-1,R] name  rem

Re: [gpfsug-discuss] changing default configuration values

2017-10-10 Thread Simon Thompson (IT Research Support)

They do, but ...

I don't know what happens to a running node if its then added to a
nodeclass. Ie.. Would it apply the options it can immediately, or only
once the node is recycled?

Pass...

Making an mmchconfig change to the node class after its a member would
work as expected.

Simon

On 10/10/2017, 13:32, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of Aaron Knister" <gpfsug-discuss-boun...@spectrumscale.org on behalf of
aaron.s.knis...@nasa.gov> wrote:

>Simon,
>
>Does that mean node classes don't work the way individual node names do
>with the "-i/-I" options?
>
>-Aaron
>
>On 10/10/17 8:30 AM, Simon Thompson (IT Research Support) wrote:
>> Yes, but obviously only when you recycle mmfsd on the node after adding
>>it
>> to the node class, e.g. page pool cannot be changed online.
>> 
>> We do this all the time, e.g. We have nodes with different IB
>> fabrics/cards in clusters, so use mlx4_0/... And mlx5/... And have
>>classes
>> for (e.g.) "FDR" and "EDR" nodes. (different fabric numbers in different
>> DCs etc)
>> 
>> Simon
>> 
>> On 10/10/2017, 13:04, "gpfsug-discuss-boun...@spectrumscale.org on
>>behalf
>> of sco...@emailhosting.com" <gpfsug-discuss-boun...@spectrumscale.org on
>> behalf of sco...@emailhosting.com> wrote:
>> 
>>> So when a node is added to the node class, my defaults" will be
>>>applied?
>>> If so,excellent. Thanks
>>>
>>>
>>>  Original Message
>>> From: s.j.thomp...@bham.ac.uk
>>> Sent: October 10, 2017 8:02 AM
>>> To: gpfsug-discuss@spectrumscale.org
>>> Reply-to: gpfsug-discuss@spectrumscale.org
>>> Subject: Re: [gpfsug-discuss] changing default configuration values
>>>
>>> Use mmchconfig and change the defaults, and then have a node class for
>>> "not the defaults"?
>>>
>>> Apply settings to a node class and add all new clients to the node
>>>class?
>>>
>>> Note there was some version of Scale where node classes were enumerated
>>> when the config was set for the node class, but in (4.2.3 at least),
>>>this
>>> works as expected, I.e. The node class is not expanded when doing
>>> mmchconfig -N 
>>>
>>> Simon
>>>
>>> On 10/10/2017, 10:49, "gpfsug-discuss-boun...@spectrumscale.org on
>>>behalf
>>> of sco...@emailhosting.com" <gpfsug-discuss-boun...@spectrumscale.org
>>>on
>>> behalf of sco...@emailhosting.com> wrote:
>>>
>>>> So, I think brings up one of the slight frustrations I've always had
>>>>with
>>>> mmconfig..
>>>>
>>>> If I have a cluster to which new nodes will eventually be added, OR, I
>>>> have standard I always wish to apply, there is no way to say "all
>>>>FUTURE"
>>>> nodes need to have my defaults.. I just have to remember to extended
>>>>the
>>>> changes in as new nodes are brought into the cluster.
>>>>
>>>> Is there a way to accomplish this?
>>>> Thanks
>>>>
>>>>  Original Message
>>>> From: aaron.s.knis...@nasa.gov
>>>> Sent: October 9, 2017 2:56 PM
>>>> To: gpfsug-discuss@spectrumscale.org
>>>> Reply-to: gpfsug-discuss@spectrumscale.org
>>>> Subject: Re: [gpfsug-discuss] changing default configuration values
>>>>
>>>> Thanks! Good to know.
>>>>
>>>> On 10/6/17 11:06 PM, IBM Spectrum Scale wrote:
>>>>> Hi Aaron,
>>>>>
>>>>> The default value applies to all nodes in the cluster. Thus changing
>>>>>it
>>>>> will change all nodes in the cluster. You need to run mmchconfig to
>>>>> customize the node override again.
>>>>>
>>>>>
>>>>> Regards, The Spectrum Scale (GPFS) team
>>>>>
>>>>>
>>>>> 
>>>>>--
>>>>>--
>>>>> -
>>>>> -
>>>>> If you feel that your question can benefit other users of Spectrum
>>>>> Scale 
>>>>> (GPFS), then please post it to the public IBM developerWroks Forum at
>>>>>
>>>>> 
>>>>>https://www.ibm.c

Re: [gpfsug-discuss] changing default configuration values

2017-10-10 Thread Simon Thompson (IT Research Support)

Use mmchconfig and change the defaults, and then have a node class for
"not the defaults"?

Apply settings to a node class and add all new clients to the node class?

Note there was some version of Scale where node classes were enumerated
when the config was set for the node class, but in (4.2.3 at least), this
works as expected, I.e. The node class is not expanded when doing
mmchconfig -N 

Simon

On 10/10/2017, 10:49, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of sco...@emailhosting.com"  wrote:

>So, I think brings up one of the slight frustrations I've always had with
>mmconfig..
>
>If I have a cluster to which new nodes will eventually be added, OR, I
>have standard I always wish to apply, there is no way to say "all FUTURE"
>nodes need to have my defaults.. I just have to remember to extended the
>changes in as new nodes are brought into the cluster.
>
>Is there a way to accomplish this?
>Thanks
>
>  Original Message
>From: aaron.s.knis...@nasa.gov
>Sent: October 9, 2017 2:56 PM
>To: gpfsug-discuss@spectrumscale.org
>Reply-to: gpfsug-discuss@spectrumscale.org
>Subject: Re: [gpfsug-discuss] changing default configuration values
>
>Thanks! Good to know.
>
>On 10/6/17 11:06 PM, IBM Spectrum Scale wrote:
>> Hi Aaron,
>> 
>> The default value applies to all nodes in the cluster. Thus changing it
>> will change all nodes in the cluster. You need to run mmchconfig to
>> customize the node override again.
>> 
>> 
>> Regards, The Spectrum Scale (GPFS) team
>> 
>> 
>>-
>>-
>> If you feel that your question can benefit other users of Spectrum
>>Scale 
>> (GPFS), then please post it to the public IBM developerWroks Forum at
>> 
>>https://www.ibm.com/developerworks/community/forums/html/forum?id=111
>>1----0479.
>> 
>> 
>> If your query concerns a potential software error in Spectrum Scale
>> (GPFS) and you have an IBM software maintenance contract please contact
>> 1-800-237-5511 in the United States or your local IBM Service Center in
>> other countries.
>> 
>> The forum is informally monitored as time permits and should not be
>>used 
>> for priority messages to the Spectrum Scale (GPFS) team.
>> 
>> Inactive hide details for Aaron Knister ---10/06/2017 06:30:20 PM---Is
>> there a way to change the default value of a configuratiAaron Knister
>> ---10/06/2017 06:30:20 PM---Is there a way to change the default value
>> of a configuration option without overriding any overrid
>> 
>> From: Aaron Knister 
>> To: gpfsug main discussion list 
>> Date: 10/06/2017 06:30 PM
>> Subject: [gpfsug-discuss] changing default configuration values
>> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>> 
>> 
>> 
>> 
>> 
>> Is there a way to change the default value of a configuration option
>> without overriding any overrides in place?
>> 
>> Take the following situation:
>> 
>> - I set parameter foo=bar for all nodes (mmchconfig foo=bar)
>> - I set parameter foo to baz for a few nodes (mmchconfig foo=baz -N
>> n001,n002)
>> 
>> Is there a way to then set the default value of foo to qux without
>> changing the value of foo for nodes n001 and n002?
>> 
>> -Aaron
>> 
>> -- 
>> Aaron Knister
>> NASA Center for Climate Simulation (Code 606.2)
>> Goddard Space Flight Center
>> (301) 286-2776
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> 
>>https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_li
>>stinfo_gpfsug-2Ddiscuss=DwICAg=jf_iaSHvJObTbx-siA1ZOg=IbxtjdkPAM2Sb
>>on4Lbbi4w=PvuGmTBHKi7CNLU4X2GzUXkHzzezwTSrL4EdgwI0wrk=ma-IogZTBRL11_4
>>zp2l8JqWmpHajoLXubAPSIS3K7GY=
>> 
>> 
>> 
>> 
>> 
>> 
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>
>-- 
>Aaron Knister
>NASA Center for Climate Simulation (Code 606.2)
>Goddard Space Flight Center
>(301) 286-2776
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] AFM fun (more!)

2017-10-09 Thread Simon Thompson (IT Research Support)


Hi All,

We're having fun (ok not fun ...) with AFM.

We have a file-set where the queue length isn't shortening, watching it
over 5 sec periods, the queue length increases by ~600-1000 items, and the
numExec goes up by about 15k.

The queues are steadily rising and we've seen them over 100 ...

This is on one particular fileset e.g.:

mmafmctl rds-cache getstate
   Mon Oct  9 08:43:58 2017

Fileset NameFileset TargetCache State
Gateway NodeQueue Length   Queue numExec
--
-   -
rds-projects-facility gpfs:///rds/projects/facility   Dirty
bber-afmgw013068953520504
rds-projects-2015 gpfs:///rds/projects/2015   Active
bber-afmgw010  3
rds-projects-2016 gpfs:///rds/projects/2016   Dirty
bber-afmgw011482   70
rds-projects-2017 gpfs:///rds/projects/2017   Dirty
bber-afmgw017139104
bear-apps   gpfs:///rds/bear-apps Dirty
  bber-afmgw023  2472770871
user-homes  gpfs:///rds/homes Active
   bber-afmgw020  19
bear-sysappsgpfs:///rds/bear-sysapps  Active
bber-afmgw020  4



This is having the effect that other filesets on the same "Gateway" are
not getting their queues processed.

Question 1.
Can we force the gateway node for the other file-sets to our "02" node.
I.e. So that we can get the queue services for the other filesets.

Question 2.
How can we make AFM actually work for the "facility" file-set. If we shut
down GPFS on the node, on the secondary node, we'll see log entires like:
2017-10-09_13:35:30.330+0100: [I] AFM: Found 1069575 local remove
operations...

So I'm assuming the massive queue is all file remove operations?

Alarmingly, we are also seeing entires like:
2017-10-09_13:54:26.591+0100: [E] AFM: WriteSplit file system rds-cache
fileset rds-projects-2017 file IDs [5389550.5389550.-1.-1,R] name  remote
error 5

Anyone any suggestions?

Thanks

Simon


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] el7.4 compatibility

2017-09-29 Thread Simon Thompson (IT Research Support)

Correct they some from IBM support.

The AFM issue we have (and is fixed in the efix) is if you have client
code running on the AFM cache that uses truncate. The AFM write coalescing
processing does something funny with it, so the file isn't truncated and
then the data you write afterwards isn't copied back to home.

We found this with ABAQUS code running on our HPC nodes onto the AFM
cache, I.e. At home, the final packed output file from ABAQUS is corrupt
as its the "untruncated and then filled" version of the file (so just a
big blob of empty data). I would guess that anything using truncate would
see the same issue.

4.2.3.x: APAR IV99796

See IBM Flash Alert at:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010629=s033=O
CSTXKQY=OCSWJ00=E_sp=s033-_-OCSTXKQY-OCSWJ00-_-E


Its remedied in efix2, of course remember that an efix has not gone
through a full testing validation cycle (otherwise it would be a PTF), but
we have not seen any issues in our environments running 4.2.3.4efix2.

Simon

On 29/09/2017, 10:04, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of Sobey, Richard A" <gpfsug-discuss-boun...@spectrumscale.org on behalf
of r.so...@imperial.ac.uk> wrote:

>Efixes (in my one time only limited experience!) come direct from IBM as
>a result of a PMR.
>Richard
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of John Hearns
>Sent: 29 September 2017 10:02
>To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>Subject: Re: [gpfsug-discuss] el7.4 compatibility
>
>Simon,
>I would appreciate a heads up on that AFM issue.
>I upgraded to 4.2.3.4 this morning, to deal with an AFM issue, which is
>if a remote NFS mount goes down then an asynchronous operation such as a
>read can be stopped.
>
>I must admit to being not clued up on how the efixes are distributed. I
>downloaded the 4.2.3.4 installer for Linux yesterday.
>Should I be searching for additional fix packs on top of that (which I am
>in fact doing now).
>
>John H
>
>
>
>
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Simon
>Thompson (IT Research Support)
>Sent: Thursday, September 28, 2017 4:45 PM
>To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>Subject: Re: [gpfsug-discuss] el7.4 compatibility
>
>
>Aren't listed as tested
>
>Sorry ...
>4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM
>issue we have.
>
>Simon
>
>On 28/09/2017, 15:36, "kenneth.waege...@ugent.be"
><kenneth.waege...@ugent.be> wrote:
>
>>
>>
>>On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote:
>>> The 7.4 kernels are listed as having been tested by IBM.
>>Hi,
>>
>>Were did you find this?
>>>
>>> Having said that, we have clients running 7.4 kernel and its OK, but
>>> we are 4.2.3.4efix2, so bump versions...
>>Do you have some information about the efix2? Is this for 7.4 ? And
>>where should we find this :-)
>>
>>Thank you!
>>
>>Kenneth
>>
>>>
>>> Simon
>>>
>>> On 28/09/2017, 15:18, "gpfsug-discuss-boun...@spectrumscale.org on
>>>behalf  of Jeffrey R. Lang" <gpfsug-discuss-boun...@spectrumscale.org
>>>on behalf of  jrl...@uwyo.edu> wrote:
>>>
>>>> I just tired to build the GPFS GPL module against the latest version
>>>>of  RHEL 7.4 kernel and the build fails.  The link below show that it
>>>>should  work.
>>>>
>>>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread
>>>> kdump-kern.o: In function `GetOffset':
>>>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base'
>>>> kdump-kern.o: In function `KernInit':
>>>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base'
>>>> collect2: error: ld returned 1 exit status
>>>> make[1]: *** [modules] Error 1
>>>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux'
>>>> make: *** [Modules] Error 1
>>>> 
>>>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT
>>>>2017.
>>>> 
>>>> mmbuildgpl: Command failed. Examine previous error messages to
>>>>determine  cause.
>>>> [root@bkupsvr3 ~]#
>>>> [root@bkupsvr3 ~]#
>>>> [root@bkupsvr3 ~]#
>>>> [root@bkupsvr3 ~]

Re: [gpfsug-discuss] el7.4 compatibility

2017-09-28 Thread Simon Thompson (IT Research Support)


Aren't listed as tested

Sorry ...
4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM
issue we have.

Simon

On 28/09/2017, 15:36, "kenneth.waege...@ugent.be"
<kenneth.waege...@ugent.be> wrote:

>
>
>On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote:
>> The 7.4 kernels are listed as having been tested by IBM.
>Hi,
>
>Were did you find this?
>>
>> Having said that, we have clients running 7.4 kernel and its OK, but we
>> are 4.2.3.4efix2, so bump versions...
>Do you have some information about the efix2? Is this for 7.4 ? And
>where should we find this :-)
>
>Thank you!
>
>Kenneth
>
>>
>> Simon
>>
>> On 28/09/2017, 15:18, "gpfsug-discuss-boun...@spectrumscale.org on
>>behalf
>> of Jeffrey R. Lang" <gpfsug-discuss-boun...@spectrumscale.org on behalf
>>of
>> jrl...@uwyo.edu> wrote:
>>
>>> I just tired to build the GPFS GPL module against the latest version of
>>> RHEL 7.4 kernel and the build fails.  The link below show that it
>>>should
>>> work.
>>>
>>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread
>>> kdump-kern.o: In function `GetOffset':
>>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base'
>>> kdump-kern.o: In function `KernInit':
>>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base'
>>> collect2: error: ld returned 1 exit status
>>> make[1]: *** [modules] Error 1
>>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux'
>>> make: *** [Modules] Error 1
>>> 
>>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017.
>>> 
>>> mmbuildgpl: Command failed. Examine previous error messages to
>>>determine
>>> cause.
>>> [root@bkupsvr3 ~]#
>>> [root@bkupsvr3 ~]#
>>> [root@bkupsvr3 ~]#
>>> [root@bkupsvr3 ~]#
>>> [root@bkupsvr3 ~]# uname -a
>>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9
>>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
>>> [root@bkupsvr3 ~]# mmdiag --version
>>>
>>> === mmdiag: version ===
>>> Current GPFS build: "4.2.2.3 ".
>>> Built on Mar 16 2017 at 11:19:59
>>>
>>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel.  In my
>>> case 514.26.2
>>>
>>> If I'm missing something can some one point me in the right direction?
>>>
>>>
>>> -Original Message-
>>> From: gpfsug-discuss-boun...@spectrumscale.org
>>> [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Bryan
>>> Banister
>>> Sent: Thursday, September 28, 2017 8:22 AM
>>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>>> Subject: Re: [gpfsug-discuss] el7.4 compatibility
>>>
>>> Please review this site:
>>>
>>> 
>>>https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.ht
>>>ml
>>>
>>> Hope that helps,
>>> -Bryan
>>>
>>> -Original Message-
>>> From: gpfsug-discuss-boun...@spectrumscale.org
>>> [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of
>>> greg.lehm...@csiro.au
>>> Sent: Wednesday, September 27, 2017 6:45 PM
>>> To: gpfsug-discuss@spectrumscale.org
>>> Subject: Re: [gpfsug-discuss] el7.4 compatibility
>>>
>>> Note: External Email
>>> -
>>>
>>> I guess I may as well ask about SLES 12 SP3 as well! TIA.
>>>
>>> -Original Message-
>>> From: gpfsug-discuss-boun...@spectrumscale.org
>>> [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Kenneth
>>> Waegeman
>>> Sent: Wednesday, 27 September 2017 6:17 PM
>>> To: gpfsug-discuss@spectrumscale.org
>>> Subject: [gpfsug-discuss] el7.4 compatibility
>>>
>>> Hi,
>>>
>>> Is there already some information available of gpfs (and protocols) on
>>> el7.4 ?
>>>
>>> Thanks!
>>>
>>> Kenneth
>>>
>>> ___
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at spectrumscale.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>> ___
>>&g

Re: [gpfsug-discuss] el7.4 compatibility

2017-09-28 Thread Simon Thompson (IT Research Support)

The 7.4 kernels are listed as having been tested by IBM.

Having said that, we have clients running 7.4 kernel and its OK, but we
are 4.2.3.4efix2, so bump versions...

Simon

On 28/09/2017, 15:18, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of Jeffrey R. Lang"  wrote:

>I just tired to build the GPFS GPL module against the latest version of
>RHEL 7.4 kernel and the build fails.  The link below show that it should
>work.
>
>cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread
>kdump-kern.o: In function `GetOffset':
>kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base'
>kdump-kern.o: In function `KernInit':
>kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base'
>collect2: error: ld returned 1 exit status
>make[1]: *** [modules] Error 1
>make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux'
>make: *** [Modules] Error 1
>
>mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017.
>
>mmbuildgpl: Command failed. Examine previous error messages to determine
>cause.
>[root@bkupsvr3 ~]#
>[root@bkupsvr3 ~]#
>[root@bkupsvr3 ~]#
>[root@bkupsvr3 ~]#
>[root@bkupsvr3 ~]# uname -a
>Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9
>03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
>[root@bkupsvr3 ~]# mmdiag --version
>
>=== mmdiag: version ===
>Current GPFS build: "4.2.2.3 ".
>Built on Mar 16 2017 at 11:19:59
>
>In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel.  In my
>case 514.26.2
>
>If I'm missing something can some one point me in the right direction?
>
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Bryan
>Banister
>Sent: Thursday, September 28, 2017 8:22 AM
>To: gpfsug main discussion list 
>Subject: Re: [gpfsug-discuss] el7.4 compatibility
>
>Please review this site:
>
>https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html
>
>Hope that helps,
>-Bryan
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of
>greg.lehm...@csiro.au
>Sent: Wednesday, September 27, 2017 6:45 PM
>To: gpfsug-discuss@spectrumscale.org
>Subject: Re: [gpfsug-discuss] el7.4 compatibility
>
>Note: External Email
>-
>
>I guess I may as well ask about SLES 12 SP3 as well! TIA.
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Kenneth
>Waegeman
>Sent: Wednesday, 27 September 2017 6:17 PM
>To: gpfsug-discuss@spectrumscale.org
>Subject: [gpfsug-discuss] el7.4 compatibility
>
>Hi,
>
>Is there already some information available of gpfs (and protocols) on
>el7.4 ?
>
>Thanks!
>
>Kenneth
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
>Note: This email is for the confidential use of the named addressee(s)
>only and may contain proprietary, confidential or privileged information.
>If you are not the intended recipient, you are hereby notified that any
>review, dissemination or copying of this email is strictly prohibited,
>and to please notify the sender immediately and destroy this email and
>any attachments. Email transmission cannot be guaranteed to be secure or
>error-free. The Company, therefore, does not make any guarantees as to
>the completeness or accuracy of this email or any attachments. This email
>is for informational purposes only and does not constitute a
>recommendation, offer, request or solicitation of any kind to buy, sell,
>subscribe, redeem or perform any type of transaction of a financial
>product.
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] filesets inside of filesets

2017-09-06 Thread Simon Thompson (IT Research Support)

Filesets in filesets are fine. BUT if you use scoped backups with TSM... Er 
Spectrum Protect, then there are restrictions on creating an IFS inside an IFS 
...

Simon

From: 
>
 on behalf of "damir.krs...@gmail.com" 
>
Reply-To: 
"gpfsug-discuss@spectrumscale.org" 
>
Date: Wednesday, 6 September 2017 at 13:35
To: "gpfsug-discuss@spectrumscale.org" 
>
Subject: [gpfsug-discuss] filesets inside of filesets

Today we have following fileset structure on our filesystem:

/projects <-- gpfs filesystem

/projects/b1000 <-- b1000 is a fileset with a fileset quota applied to it

I need to create a fileset or a directory inside of this project and have 
separate quota applied to it e.g.:

/projects/b1000 (b1000 has 10TB quota applied)
/projects/b1000/backup (backup has 1TB quota applied)

Is this possible? I am thinking nested filesets would work if GPFS supports 
that. Otherwise, I was going to create a separate filesystem, create 
corresponding backup filesets on it and symlink them to the 
/projects/ directory.

Thanks in advance.

Damir
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] AFM weirdness

2017-08-25 Thread Simon Thompson (IT Research Support)

So as Venkat says, AFM doesn't support using fallocate() to preallocate space.

So why aren't other people seeing this ... Well ...

We use EasyBuild to build our HPC cluster software including the compiler tool 
chains.
This enables the new linker ld.gold by default rather than the "old" ld.
Interestingly we don't seem to have seen this with C code being compiled, only 
fortran.
We can work around it by using the options to gfortran I mention below.

There is a mention to this limitation at:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_afmlimitations.htm

We aren;t directly calling gpfs_prealloc, but I guess the linker is indirectly 
calling it by making a call to posix_fallocate.

I do have a new problem with AFM where the data written to the cache differs 
from that replicated back to home... I'm beginning to think I don't like the 
decision to use AFM! Given the data written back to HOME is corrupt, I think 
this is definitely PMR time. But ... If you have Abaqus on you system and are 
using AFM, I'd be interested to see if someone else sees the same issue as us!

Simon

From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of Simon Thompson 
<s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>>
Reply-To: 
"gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Wednesday, 23 August 2017 at 14:01
To: "gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] AFM weirdness

I've got a PMR open about this ... Will email you the number directly.

Looking at the man page for ld.gold, it looks to set '--posix-fallocate' by 
default. In fact, testing with '-Xlinker -no-posix-fallocate' does indeed make 
the code compile.

Simon

From: "vpuvv...@in.ibm.com<mailto:vpuvv...@in.ibm.com>" 
<vpuvv...@in.ibm.com<mailto:vpuvv...@in.ibm.com>>
Date: Wednesday, 23 August 2017 at 13:36
To: "gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>, 
Simon Thompson <s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>>
Subject: Re: [gpfsug-discuss] AFM weirdness

I believe this error is result of preallocation failure, but traces are needed 
to confirm this.  AFM caching modes does not support preallocation of blocks 
(ex. using fallocate()). This feature is supported only in AFM DR.

~Venkat (vpuvv...@in.ibm.com<mailto:vpuvv...@in.ibm.com>)

From:"Simon Thompson (IT Research Support)" 
<s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:08/23/2017 03:48 PM
Subject:Re: [gpfsug-discuss] AFM weirdness
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>

OK so I checked and if I run directly on the "AFM" FS in a different "non
AFM" directory, it works fine, so its something AFM related ...

Simon

On 23/08/2017, 11:11, 
"gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 on behalf
of Simon Thompson (IT Research Support)"
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 on behalf of
s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>> wrote:

>We're using an AFM cache from our HPC nodes to access data in another GPFS
>cluster, mostly this seems to be working fine, but we've just come across
>an interesting problem with a user using gfortran from the GCC 5.2.0
>toolset.
>
>When linking their code, they get a "no space left on device" error back
>from the linker. If we do this on a node that mounts the file-system
>directly (I.e. Not via AFM cache), then it works fine.
>
>We tried with GCC 4.5 based tools and it works OK, but the difference
>there is that 4.x uses ld and 5x uses ld.gold.
>
>If we strike the ld.gold when using AFM, we see:
>
>stat("program", {st_mode=S_IFREG|0775, st_size=248480, ...}) = 0
>unlink("program")   = 0
>open("program", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0777) = 30
>fstat(30, {st_mode=S_IFREG|0775, st_size=0, ...}) = 0
>fallocate(30, 0, 0, 248480) = -1 ENOSPC (No space left on
>device)
>
>
>
>Vs when running directly on the file-system:
>stat(&

Re: [gpfsug-discuss] AFM weirdness

2017-08-23 Thread Simon Thompson (IT Research Support)

I've got a PMR open about this ... Will email you the number directly.

Looking at the man page for ld.gold, it looks to set '--posix-fallocate' by 
default. In fact, testing with '-Xlinker -no-posix-fallocate' does indeed make 
the code compile.

Simon

From: "vpuvv...@in.ibm.com<mailto:vpuvv...@in.ibm.com>" 
<vpuvv...@in.ibm.com<mailto:vpuvv...@in.ibm.com>>
Date: Wednesday, 23 August 2017 at 13:36
To: "gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>, 
Simon Thompson <s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>>
Subject: Re: [gpfsug-discuss] AFM weirdness

I believe this error is result of preallocation failure, but traces are needed 
to confirm this.  AFM caching modes does not support preallocation of blocks 
(ex. using fallocate()). This feature is supported only in AFM DR.

~Venkat (vpuvv...@in.ibm.com<mailto:vpuvv...@in.ibm.com>)



From:"Simon Thompson (IT Research Support)" 
<s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:08/23/2017 03:48 PM
Subject:Re: [gpfsug-discuss] AFM weirdness
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




OK so I checked and if I run directly on the "AFM" FS in a different "non
AFM" directory, it works fine, so its something AFM related ...

Simon

On 23/08/2017, 11:11, 
"gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 on behalf
of Simon Thompson (IT Research Support)"
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 on behalf of
s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>> wrote:

>We're using an AFM cache from our HPC nodes to access data in another GPFS
>cluster, mostly this seems to be working fine, but we've just come across
>an interesting problem with a user using gfortran from the GCC 5.2.0
>toolset.
>
>When linking their code, they get a "no space left on device" error back
>from the linker. If we do this on a node that mounts the file-system
>directly (I.e. Not via AFM cache), then it works fine.
>
>We tried with GCC 4.5 based tools and it works OK, but the difference
>there is that 4.x uses ld and 5x uses ld.gold.
>
>If we strike the ld.gold when using AFM, we see:
>
>stat("program", {st_mode=S_IFREG|0775, st_size=248480, ...}) = 0
>unlink("program")   = 0
>open("program", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0777) = 30
>fstat(30, {st_mode=S_IFREG|0775, st_size=0, ...}) = 0
>fallocate(30, 0, 0, 248480) = -1 ENOSPC (No space left on
>device)
>
>
>
>Vs when running directly on the file-system:
>stat("program", {st_mode=S_IFREG|0775, st_size=248480, ...}) = 0
>unlink("program")   = 0
>open("program", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0777) = 30
>fstat(30, {st_mode=S_IFREG|0775, st_size=0, ...}) = 0
>fallocate(30, 0, 0, 248480) = 0
>
>
>
>Anyone seen anything like this before?
>
>... Actually I'm about to go off and see if its a function of AFM, or
>maybe something to do with the FS in use (I.e. Make a local directory on
>the filesystem on the "AFM" FS and see if that works ...)
>
>Thanks
>
>Simon
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwICAg=jf_iaSHvJObTbx-siA1ZOg=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A=UqTzoU-bx454OgyeB4f0Nrruvs7yYAxFutzIe2eKmnc=8E5opHyyAwomLS8kdxpvKCvf6sdKBLlfZvx6wDdaZy4=

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwICAg=jf_iaSHvJObTbx-siA1ZOg=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A=UqTzoU-bx454OgyeB4f0Nrruvs7yYAxFutzIe2eKmnc=8E5opHyyAwomLS8kdxpvKCvf6sdKBLlfZvx6wDdaZy4=




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] AFM weirdness

2017-08-23 Thread Simon Thompson (IT Research Support)

OK so I checked and if I run directly on the "AFM" FS in a different "non
AFM" directory, it works fine, so its something AFM related ...

Simon

On 23/08/2017, 11:11, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of Simon Thompson (IT Research Support)"
<gpfsug-discuss-boun...@spectrumscale.org on behalf of
s.j.thomp...@bham.ac.uk> wrote:

>We're using an AFM cache from our HPC nodes to access data in another GPFS
>cluster, mostly this seems to be working fine, but we've just come across
>an interesting problem with a user using gfortran from the GCC 5.2.0
>toolset.
>
>When linking their code, they get a "no space left on device" error back
>from the linker. If we do this on a node that mounts the file-system
>directly (I.e. Not via AFM cache), then it works fine.
>
>We tried with GCC 4.5 based tools and it works OK, but the difference
>there is that 4.x uses ld and 5x uses ld.gold.
>
>If we strike the ld.gold when using AFM, we see:
>
>stat("program", {st_mode=S_IFREG|0775, st_size=248480, ...}) = 0
>unlink("program")   = 0
>open("program", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0777) = 30
>fstat(30, {st_mode=S_IFREG|0775, st_size=0, ...}) = 0
>fallocate(30, 0, 0, 248480) = -1 ENOSPC (No space left on
>device)
>
>
>
>Vs when running directly on the file-system:
>stat("program", {st_mode=S_IFREG|0775, st_size=248480, ...}) = 0
>unlink("program")   = 0
>open("program", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0777) = 30
>fstat(30, {st_mode=S_IFREG|0775, st_size=0, ...}) = 0
>fallocate(30, 0, 0, 248480) = 0
>
>
>
>Anyone seen anything like this before?
>
>... Actually I'm about to go off and see if its a function of AFM, or
>maybe something to do with the FS in use (I.e. Make a local directory on
>the filesystem on the "AFM" FS and see if that works ...)
>
>Thanks
>
>Simon
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] AFM weirdness

2017-08-23 Thread Simon Thompson (IT Research Support)

We're using an AFM cache from our HPC nodes to access data in another GPFS
cluster, mostly this seems to be working fine, but we've just come across
an interesting problem with a user using gfortran from the GCC 5.2.0
toolset.

When linking their code, they get a "no space left on device" error back
from the linker. If we do this on a node that mounts the file-system
directly (I.e. Not via AFM cache), then it works fine.

We tried with GCC 4.5 based tools and it works OK, but the difference
there is that 4.x uses ld and 5x uses ld.gold.

If we strike the ld.gold when using AFM, we see:

stat("program", {st_mode=S_IFREG|0775, st_size=248480, ...}) = 0
unlink("program")   = 0
open("program", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0777) = 30
fstat(30, {st_mode=S_IFREG|0775, st_size=0, ...}) = 0
fallocate(30, 0, 0, 248480) = -1 ENOSPC (No space left on
device)



Vs when running directly on the file-system:
stat("program", {st_mode=S_IFREG|0775, st_size=248480, ...}) = 0
unlink("program")   = 0
open("program", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0777) = 30
fstat(30, {st_mode=S_IFREG|0775, st_size=0, ...}) = 0
fallocate(30, 0, 0, 248480) = 0



Anyone seen anything like this before?

... Actually I'm about to go off and see if its a function of AFM, or
maybe something to do with the FS in use (I.e. Make a local directory on
the filesystem on the "AFM" FS and see if that works ...)

Thanks

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] what is mmnfs under the hood

2017-08-06 Thread Simon Thompson (IT Research Support)

What do you mean by cannot use mmsmb and cannot use Ganesha? Do you 
functionally you are not allowed to or they are not working for you?

If it's the latter, then this should be resolvable. If you are under active 
maintenance you could try raising a ticket with IBM, though basic 
implementation is not really a support issue and so you may be better engaging 
a business partner or integrator to help you out.

Simon 

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of ila...@gmail.com 
[ila...@gmail.com]
Sent: 06 August 2017 10:49
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] what is mmnfs under the hood

I have read this atricle:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_ces_migrationcnfstoces.htm

So, in a shortcut, CNFS cannot be used when sharing via CES.
I cannot use ganesha NFS.

Is it possible to share a cluster via SMB and NFS without using CES ?
the nfs will be expored via CNFS but what about SMB ? i cannot use
mmsmb..


On Sun, Aug 6, 2017 at 12:42 PM, Ilan Schwarts <ila...@gmail.com> wrote:
> I have gpfs (spectrum scale 4.2.2.0) and I wish to create NFS exports,
> I cannot use ganesha NFS.
> How do I make NFS exports ? just editing all nodes /etc/exports is enough ?
> I should i use the CNFS as described here:
> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adv_cnfssetup.htm
>
>
>
>
> On Sun, Aug 6, 2017 at 12:10 PM, Simon Thompson (IT Research Support)
> <s.j.thomp...@bham.ac.uk> wrote:
>> Under the hood, the NFS services are provided by IBM supplied Ganesha rpms. 
>> It's then fully supported by IBM, e.g. the GPFS VFS later to handle locking, 
>> ACLs, quota etc...
>>
>> Note it's different from using the cnfs support in Spectrum Scale which uses 
>> Kernel NFS AFAIK. Using user space Ganesha means they have control of the 
>> NFS stack, so if something needs patching/fixing, then can roll out new 
>> Ganesha rpms rather than having to get (e.g.) RedHat to incorporate 
>> something into kernel NFS.
>>
>> Mmnfs is a wrapper round the config of Ganesha using CCR to distribute the 
>> config to the nodes.
>>
>> Simon
>> 
>> From: gpfsug-discuss-boun...@spectrumscale.org 
>> [gpfsug-discuss-boun...@spectrumscale.org] on behalf of ila...@gmail.com 
>> [ila...@gmail.com]
>> Sent: 06 August 2017 09:26
>> To: gpfsug main discussion list
>> Subject: [gpfsug-discuss] what is mmnfs under the hood
>>
>> Hi guys,
>>
>> I see IBM spectrumscale configure the NFS via command: mmnfs
>>
>> Is the command mmnfs is a  wrapper on top of the normal kernel NFS
>> (Kernel VFS) ?
>> Is it a wrapper on top of ganesha NFS ?
>> Or it is NFS implemented by SpectrumScale team ?
>>
>>
>> Thanks
>>
>> --
>>
>>
>> -
>> Ilan Schwarts
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
> --
>
>
> -
> Ilan Schwarts



--


-
Ilan Schwarts
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] what is mmnfs under the hood

2017-08-06 Thread Simon Thompson (IT Research Support)

Under the hood, the NFS services are provided by IBM supplied Ganesha rpms. 
It's then fully supported by IBM, e.g. the GPFS VFS later to handle locking, 
ACLs, quota etc...

Note it's different from using the cnfs support in Spectrum Scale which uses 
Kernel NFS AFAIK. Using user space Ganesha means they have control of the NFS 
stack, so if something needs patching/fixing, then can roll out new Ganesha 
rpms rather than having to get (e.g.) RedHat to incorporate something into 
kernel NFS.

Mmnfs is a wrapper round the config of Ganesha using CCR to distribute the 
config to the nodes.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of ila...@gmail.com 
[ila...@gmail.com]
Sent: 06 August 2017 09:26
To: gpfsug main discussion list
Subject: [gpfsug-discuss] what is mmnfs under the hood

Hi guys,

I see IBM spectrumscale configure the NFS via command: mmnfs

Is the command mmnfs is a  wrapper on top of the normal kernel NFS
(Kernel VFS) ?
Is it a wrapper on top of ganesha NFS ?
Or it is NFS implemented by SpectrumScale team ?


Thanks

--


-
Ilan Schwarts
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] SOBAR questions

2017-07-18 Thread Simon Thompson (IT Research Support)

So just following up on my questions from January.

We tried to do 2. I.e. Restore to a new file-system with different block sizes. 
It got part way through creating the file-sets on the new SOBAR file-system and 
then GPFS asserts and crashes... We weren't actually intentionally trying to 
move block sizes, but because we were restoring from a traditional SAN based 
system to a shiny new GNR based system, we'd manually done the FS create steps.

I have a PMR open now. I don't know if someone internally in IBM actually tried 
this after my emails, as apparently there is a similar internal defect which is 
~6 months old...

Simon

From: 
>
 on behalf of Marc A Kaplan >
Reply-To: 
"gpfsug-discuss@spectrumscale.org" 
>
Date: Friday, 20 January 2017 at 17:57
To: "gpfsug-discuss@spectrumscale.org" 
>
Subject: Re: [gpfsug-discuss] SOBAR questions

I worked on some aspects of SOBAR, but without studying and testing the 
commands - I'm not in a position right now to give simple definitive answers -
having said that

Generally your questions are reasonable and the answer is: "Yes it should be 
possible to do that, but you might be going a bit beyond the design point..,
so you'll need to try it out on a (smaller) test system with some smaller tedst 
files.

Point by point.

1. If SOBAR is unable to restore a particular file, perhaps because the 
premigration did not complete -- you should only lose that particular file,
and otherwise "keep going".

2. I think SOBAR helps you build a similar file system to the original, 
including block sizes.  So you'd have to go in and tweak the file system 
creation step(s).
I think this is reasonable... If you hit a problem... IMO that would be a fair 
APAR.

3. Similar to 2.





From:"Simon Thompson (Research Computing - IT Services)" 
>
To:
"gpfsug-discuss@spectrumscale.org" 
>
Date:01/20/2017 10:44 AM
Subject:[gpfsug-discuss] SOBAR questions
Sent by:
gpfsug-discuss-boun...@spectrumscale.org




We've recently been looking at deploying SOBAR to support DR of some of
our file-systems, I have some questions (as ever!) that I can't see are
clearly documented, so was wondering if anyone has any insight on this.

1. If we elect not to premigrate certain files, are we still able to use
SOBAR? We are happy to take a hit that those files will never be available
again, but some are multi TB files which change daily and we can't stream
to tape effectively.

2. When doing a restore, does the block size of the new SOBAR'd to
file-system have to match? For example the old FS was 1MB blocks, the new
FS we create with 2MB blocks. Will this work (this strikes me as one way
we might be able to migrate an FS to a new block size?)?

3. If the file-system was originally created with an older GPFS code but
has since been upgraded, does restore work, and does it matter what client
code? E.g. We have a file-system that was originally 3.5.x, its been
upgraded over time to 4.2.2.0. Will this work if the client code was say
4.2.2.5 (with an appropriate FS version). E.g. Mmlsfs lists, "13.01
(3.5.0.0) Original file system version" and "16.00 (4.2.2.0) Current file
system version". Say there was 4.2.2.5 which created version 16.01
file-system as the new FS, what would happen?

This sort of detail is missing from:
https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.s
cale.v4r22.doc/bl1adv_sobarrestore.htm

But is probably quite important for us to know!

Thanks

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] Job Vacancy: Research Storage Systems Senior Specialist/Specialist

2017-07-17 Thread Simon Thompson (IT Research Support)

Hi all,

Members of this group may be particularly interested in the role "Research
Storage Systems Senior Specialist/Specialist"...


As part of the University of Birmingham's investment in our ability to
support outstanding research by providing technical computing facilities,
we are expanding the team and currently have 6 vacancies. I've provided a
short description of each post, but please do follow the links where you
will find the full job description attached at the bottom of the page.

For some of the posts, they are graded either at 7 or 8 and will be
appointed based upon skills and experience, the expectation is that if the
appointment is made at grade 7 that as the successful candidate grows into
the role, we should be able to regrade up.

Research Storage Systems Senior Specialist/Specialist:
https://goo.gl/NsL1EG
Responsible for the delivery and maintenance of research storage systems,
focussed on the delivery of Spectrum Scale storage systems and data
protection.
(this is available either as a grade 8 or grade 7 post depending on skills
and experience so may suit someone wishing to grow into the senior role)

HPC Specialist post (Research Systems Administrator / Senior Research
Systems Administrator):
https://goo.gl/1SxM4j
Helping to deliver and operationally support the technical computing
environments, with a focus on supporting and delivery of HPC and HTC
services.
(this is available either as a grade 7 or grade 8 post depending on skills
and experience so may suit someone wishing to grow into the senior role)

Research Computing (Analytics):
https://goo.gl/uCNdMH
Helping our researchers to understand data analytics and supporting their
research

Senior Research Software Engineer:
https://goo.gl/dcGgAz
Working with research groups to develop and deliver bespoke software
solutions to support their research

Research Training and Engagement Officer:
https://goo.gl/U48m7z
Helping with the delivery and coordination of training and engagement
works to support users helping ensure they are able to use the facilities
to
support their research.

Research IT Partner in the College of Arts and Law:
https://goo.gl/A7czEA
Providing technical knowledge and skills to support project delivery
through research bid preparation to successful solution delivery.

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Fwd: update smb package ?

2017-07-05 Thread Simon Thompson (IT Research Support)


IBM code comes from either IBM Passport Advantage (where you sign in with
a corporate account that lists your product associations), or from IBM Fix
Central (google it). Fix Central is supposed to be for service updates.

Give the lack of experience, you may want to look at the install toolkit
which ships with Spectrum Scale.

Simon

On 05/07/2017, 14:08, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of ila...@gmail.com"  wrote:

>Sorry for newbish question,
>What do you mean by "from Fix Central",
>Do i need to define another repository for the yum ? or download manually
>?
>its spectrum scale 4.2.2
>
>On Wed, Jul 5, 2017 at 3:41 PM, Sobey, Richard A 
>wrote:
>> Ah... yes you need to download the protocols version of gpfs from Fix
>>Central. Same GPFS but with the SMB/Object etc packages.
>>
>> -Original Message-
>> From: Ilan Schwarts [mailto:ila...@gmail.com]
>> Sent: 05 July 2017 13:29
>> To: gpfsug main discussion list ;
>>Sobey, Richard A 
>> Subject: Re: [gpfsug-discuss] Fwd: update smb package ?
>>
>> [root@LH20-GPFS1 ~]# yum install gpfs.smb Loaded plugins:
>>fastestmirror, langpacks base
>>
>>| 3.6 kB  00:00:00
>> epel/x86_64/metalink
>>
>>|  24 kB  00:00:00
>> epel
>>
>>| 4.3 kB  00:00:00
>> extras
>>
>>| 3.4 kB  00:00:00
>> updates
>>
>>| 3.4 kB  00:00:00
>> (1/4): epel/x86_64/updateinfo
>>
>>| 789 kB  00:00:00
>> (2/4): extras/7/x86_64/primary_db
>>
>>| 188 kB  00:00:00
>> (3/4): epel/x86_64/primary_db
>>
>>| 4.8 MB  00:00:00
>> (4/4): updates/7/x86_64/primary_db
>>
>>| 7.7 MB  00:00:01
>> Loading mirror speeds from cached hostfile
>>  * base: centos.spd.co.il
>>  * epel: mirror.nonstop.co.il
>>  * extras: centos.spd.co.il
>>  * updates: centos.spd.co.il
>> No package gpfs.smb available.
>> Error: Nothing to do
>>
>>
>> [root@LH20-GPFS1 ~]# ls /usr/lpp/mmfs/4.2.2.0/
>> gpfs_rpms/  license/manifestzimon_debs/ zimon_rpms/
>>
>>
>> something is missing in my machine :)
>>
>>
>> On Wed, Jul 5, 2017 at 3:23 PM, Sobey, Richard A
>> wrote:
>>> You don't have the gpfs.smb package installed.
>>>
>>>
>>>
>>> Yum install gpfs.smb
>>>
>>>
>>>
>>> Or install the package manually from /usr/lpp/mmfs//smb_rpms
>>>
>>>
>>>
>>> [root@ces ~]# rpm -qa | grep gpfs
>>>
>>> gpfs.smb-4.3.11_gpfs_21-8.el7.x86_64
>>>
>>>
>>>
>>>
>>>
>>> -Original Message-
>>> From: gpfsug-discuss-boun...@spectrumscale.org
>>> [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Ilan
>>> Schwarts
>>> Sent: 05 July 2017 13:19
>>> To: gpfsug main discussion list 
>>> Subject: [gpfsug-discuss] Fwd: update smb package ?
>>>
>>>
>>>
>>> [root@LH20-GPFS1 ~]# rpm -qa | grep gpfs
>>>
>>> gpfs.ext-4.2.2-0.x86_64
>>>
>>> gpfs.msg.en_US-4.2.2-0.noarch
>>>
>>> gpfs.gui-4.2.2-0.noarch
>>>
>>> gpfs.gpl-4.2.2-0.noarch
>>>
>>> gpfs.gskit-8.0.50-57.x86_64
>>>
>>> gpfs.gss.pmsensors-4.2.2-0.el7.x86_64
>>>
>>> gpfs.adv-4.2.2-0.x86_64
>>>
>>> gpfs.java-4.2.2-0.x86_64
>>>
>>> gpfs.gss.pmcollector-4.2.2-0.el7.x86_64
>>>
>>> gpfs.base-4.2.2-0.x86_64
>>>
>>> gpfs.crypto-4.2.2-0.x86_64
>>>
>>> [root@LH20-GPFS1 ~]# uname -a
>>>
>>> Linux LH20-GPFS1.LH20.com 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20
>>>
>>> 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> [root@LH20-GPFS1 ~]#
>>>
>>> ___
>>>
>>> gpfsug-discuss mailing list
>>>
>>> gpfsug-discuss at spectrumscale.org
>>>
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>
>>>
>>> ___
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at spectrumscale.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>
>>
>>
>>
>> --
>>
>>
>> -
>> Ilan Schwarts
>
>
>
>-- 
>
>
>-
>Ilan Schwarts
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes

2017-07-04 Thread Simon Thompson (IT Research Support)

AFAIK. Always.

We have had the service eat itself BTW by having different code releases and 
trying this.

Yes its a PITA that we have to get a change approval for it (so we don't do it 
as often as we should)...

The upgrade process upgrades the SMB registry, we have also seen the CTDB lock 
stuff break when they are not running the same code release, so now we just 
don't do this.

Simon

From: 
>
 on behalf of "Sobey, Richard A" 
>
Reply-To: 
"gpfsug-discuss@spectrumscale.org" 
>
Date: Tuesday, 4 July 2017 at 11:54
To: "gpfsug-discuss@spectrumscale.org" 
>
Subject: [gpfsug-discuss] Requirement to keep gpfs.smb the same on all nodes

Hi all,

For how long has this requirement been in force, and why?

https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_updatingsmb.htm

All protocol nodes running the SMB service must have the same version of 
gpfs.smb installed at any time. This requires a brief outage of the SMB service 
to upgrade gpfs.smb to the newer version across all protocol nodes. The 
procedure outlined here is intended to reduce the outage to a minimum.

Previously I’ve upgraded nodes one at a time over the course of a few days.

Is the impact just that we won’t be supported, or will a hole open up beneath 
my feet and swallow me whole?

I really don’t fancy the headache of getting approvals to get an outage of even 
5 minutes at 6am….

Cheers
Richard

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found

2017-06-18 Thread Simon Thompson (IT Research Support)

There used to be issues with the CX-3 cards and specific ports for if you 
wanted to use IB and Eth, but that went away in later firmwares, as did a whole 
load of bits with it being slow to detect media type, so see if you are running 
an up to date Mellanox firmware (assuming it's a VPI card).

On CX-4 there is no auto detect media, but default is IB unless you changed it.

Simon 

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of jcat...@gmail.com 
[jcat...@gmail.com]
Sent: 18 June 2017 16:30
To: gpfsug main discussion list
Subject: ?spam? Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found

Are any cards VPI that can do both eth and ib? I remember reading in 
documentation that that there is a bus order to having mixed media with 
mellanox cards. There is a module setting during init where you can set eth ib 
or auto detect. If the card is on auto it might be coming up eth and making the 
driver flake out because it's in the wrong order.
Responding from my phone so I can't really look it up myself right now about 
what the proper order is, but maybe this might be some help troubleshooting.

On Jun 18, 2017 12:58 AM, "Frank Tower" 
> wrote:

Hi,

You were right, ibv_devinfo -v doesn't return something if both card are 
connected. I didn't checked ibv_* tools, I supposed once IP stack and ibstat 
OK, the rest should work. I'm stupid 

Anyway, once I disconnect one card, ibv_devinfo show me input but with both 
cards, I don't have any input except "device not found".

And what is weird here, it's that it work only when one card are connected, no 
matter the card (both are similar: model, firmware, revision, company)... 
Really strange, I will dig more about the issue.

Stupid and bad workaround: connected a dual port Infiniband. But production 
system doesn't wait..

Thank for your help,
Frank

From: Aaron Knister >
Sent: Saturday, June 10, 2017 2:05 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found

Out of curiosity could you send us the output of "ibv_devinfo -v"?

-Aaron

Sent from my iPhone

On Jun 10, 2017, at 06:55, Frank Tower 
> wrote:

Hi everybody,

I don't get why one of our compute node cannot start GPFS over IB.

I have the following error:

[I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no 
verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes

[I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized.

[I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match 
(nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)).

[I] VERBS RDMA parse verbsPorts mlx4_0/1

[W] VERBS RDMA parse error   verbsPort mlx4_0/1   ignored due to device mlx4_0 
not found

[I] VERBS RDMA library libibverbs.so unloaded.

[E] VERBS RDMA failed to start, no valid verbsPorts defined.

I'm using Centos 7.3, Kernel 3.10.0-514.21.1.el7.x86_64.

I have 2 infinibands card, both have an IP and working well.

[root@rdx110 ~]# ibstat -l

mlx4_0

mlx4_1

[root@rdx110 ~]#

I tried configuration with both card, and no one work with GPFS.

I also tried with mlx4_0/1, but same problem.

Someone already have the issue ?

Kind Regards,

Frank

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] 'mmces address move' weirdness?

2017-06-13 Thread Simon Thompson (IT Research Support)

Suspending the node doesn't stop the services though, we've done a bunch of 
testing by connecting to the "real" IP on the box we wanted to test and that 
works fine.

OK, so you end up connecting to shares like \\192.168.1.20\sharename, but its 
perfectly fine for testing purposes.

In our experience, suspending the node has been fine for this as it moves the 
IP to a "working" node and keeps user service running whilst we test.

Simon

From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Sobey, Richard A" 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>>
Reply-To: 
"gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Tuesday, 13 June 2017 at 09:08
To: "gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness?

Yes, suspending the node would do it, but in the case where you want to remove 
a node from service but keep it running for testing it’s not ideal.

I think you can set the IP address balancing policy to none which might do what 
we want.
From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Simon Thompson 
(IT Research Support)
Sent: 12 June 2017 21:06
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness?

mmces node suspend -N

Is what you want. This will move the address and stop it being assigned one, 
otherwise the rebalance will occur. I think you can change the way it balances, 
but the default is to distribute.

Simon

From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Sobey, Richard A" 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>>
Reply-To: 
"gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Monday, 12 June 2017 at 21:01
To: "gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>" 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness?


I think it's intended but I don't know why. The AUTH service became unhealthy 
on one of our CES nodes (SMB only) and we moved its float address elsewhere. 
CES decided to move it back again moments later despite the node not being fit.



Sorry that doesn't really help but at least you're not alone!


From:gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of valdis.kletni...@vt.edu<mailto:valdis.kletni...@vt.edu> 
<valdis.kletni...@vt.edu<mailto:valdis.kletni...@vt.edu>>
Sent: 12 June 2017 20:41
To: gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>
Subject: [gpfsug-discuss] 'mmces address move' weirdness?

So here's our address setup:

mmces address list

Address NodeGroup  Attribute
-
172.28.45.72arproto1.ar.nis.isb.internalisbnone
172.28.45.73arproto2.ar.nis.isb.internalisbnone
172.28.46.72arproto2.ar.nis.vtc.internalvtcnone
172.28.46.73arproto1.ar.nis.vtc.internalvtcnone

Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to
move the address over to its pair so I can look around without impacting users.
However, seems like something insists on moving it right back 60 seconds
later...

Question 1: Is this expected behavior?
Question 2: If it is, what use is 'mmces address move' if it just gets
undone a few seconds later...

(running on arproto2.ar.nis.vtc.internal):

## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 
--ces-node arproto1.ar.nis.vtc.internal;  while (/bin/true); do date; ip addr 
show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down
Mon Jun 12 15:34:33 EDT 2017
inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0
Mon Jun 12 15:34:40 EDT 2017
Mon Jun 12 15:34:41 EDT 2017
Mon Jun 12 15:34:42 EDT 2017
Mon Jun 12 15:34:43 EDT 2017
(skipped)
Mon Jun 12 15:35:44 EDT 2017
Mon Jun 12 15:35:

Re: [gpfsug-discuss] 'mmces address move' weirdness?

2017-06-12 Thread Simon Thompson (IT Research Support)

mmces node suspend -N

Is what you want. This will move the address and stop it being assigned one, 
otherwise the rebalance will occur. I think you can change the way it balances, 
but the default is to distribute.

Simon

From: 
>
 on behalf of "Sobey, Richard A" 
>
Reply-To: 
"gpfsug-discuss@spectrumscale.org" 
>
Date: Monday, 12 June 2017 at 21:01
To: "gpfsug-discuss@spectrumscale.org" 
>
Subject: Re: [gpfsug-discuss] 'mmces address move' weirdness?


I think it's intended but I don't know why. The AUTH service became unhealthy 
on one of our CES nodes (SMB only) and we moved its float address elsewhere. 
CES decided to move it back again moments later despite the node not being fit.


Sorry that doesn't really help but at least you're not alone!



From: 
gpfsug-discuss-boun...@spectrumscale.org
 
>
 on behalf of valdis.kletni...@vt.edu 
>
Sent: 12 June 2017 20:41
To: gpfsug-discuss@spectrumscale.org
Subject: [gpfsug-discuss] 'mmces address move' weirdness?

So here's our address setup:

mmces address list

Address NodeGroup  Attribute
-
172.28.45.72arproto1.ar.nis.isb.internalisbnone
172.28.45.73arproto2.ar.nis.isb.internalisbnone
172.28.46.72arproto2.ar.nis.vtc.internalvtcnone
172.28.46.73arproto1.ar.nis.vtc.internalvtcnone

Having some nfs-ganesha weirdness on arproto2.ar.nis.vtc.internal, so I try to
move the address over to its pair so I can look around without impacting users.
However, seems like something insists on moving it right back 60 seconds
later...

Question 1: Is this expected behavior?
Question 2: If it is, what use is 'mmces address move' if it just gets
undone a few seconds later...

(running on arproto2.ar.nis.vtc.internal):

## (date; ip addr show | grep '\.72';mmces address move --ces-ip 172.28.46.72 
--ces-node arproto1.ar.nis.vtc.internal;  while (/bin/true); do date; ip addr 
show | grep '\.72'; sleep 1; done;) | tee migrate.not.nailed.down
Mon Jun 12 15:34:33 EDT 2017
inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0
Mon Jun 12 15:34:40 EDT 2017
Mon Jun 12 15:34:41 EDT 2017
Mon Jun 12 15:34:42 EDT 2017
Mon Jun 12 15:34:43 EDT 2017
(skipped)
Mon Jun 12 15:35:44 EDT 2017
Mon Jun 12 15:35:45 EDT 2017
inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0
Mon Jun 12 15:35:46 EDT 2017
inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0
Mon Jun 12 15:35:47 EDT 2017
inet 172.28.46.72/26 brd 172.28.46.127 scope global secondary bond1:0
^C
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] NSD access routes

2017-06-05 Thread Simon Thompson (IT Research Support)

We've seen exactly this behaviour.

Removing and readding the lroc nsd device worked for us.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of d...@milk-vfx.com 
[d...@milk-vfx.com]
Sent: 05 June 2017 14:55
To: Oesterlin, Robert
Cc: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] NSD access routes

OK slightly ignore that last email. It's still not updating the output but I 
realise the Stats from line is when they started so probably won't update! :(

Still nothing seems to being cached though.

Dave Goodbourn
Head of Systems
MILK VISUAL EFFECTS
[http://www.milk-vfx.com/src/milk_email_logo.jpg]
5th floor, Threeways House,
40-44 Clipstone Street London, W1W 5DW
Tel: +44 (0)20 3697 8448
Mob: +44 (0)7917 411 069

On 5 June 2017 at 14:49, Dave Goodbourn 
> wrote:
Thanks Bob,

That pagepool comment has just answered my next question!

But it doesn't seem to be working. Here's my mmdiag output:

=== mmdiag: lroc ===
LROC Device(s): 
'0AF259355BA8#/dev/sdb;0AF259355BA9#/dev/sdc;0AF259355BAA#/dev/sdd;'
 status Running
Cache inodes 1 dirs 1 data 1  Config: maxFile 0 stubFile 0
Max capacity: 1151997 MB, currently in use: 0 MB
Statistics from: Mon Jun  5 13:40:50 2017

Total objects stored 0 (0 MB) recalled 0 (0 MB)
  objects failed to store 0 failed to recall 0 failed to inval 0
  objects queried 0 (0 MB) not found 0 = 0.00 %
  objects invalidated 0 (0 MB)

  Inode objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 %
  Inode objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB)
  Inode objects failed to store 0 failed to recall 0 failed to query 0 
failed to inval 0

  Directory objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 %
  Directory objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB)
  Directory objects failed to store 0 failed to recall 0 failed to query 0 
failed to inval 0

  Data objects stored 0 (0 MB) recalled 0 (0 MB) = 0.00 %
  Data objects queried 0 (0 MB) = 0.00 % invalidated 0 (0 MB)
  Data objects failed to store 0 failed to recall 0 failed to query 0 
failed to inval 0

  agent inserts=0, reads=0
response times (usec):
insert min/max/avg=0/0/0
read   min/max/avg=0/0/0

  ssd   writeIOs=0, writePages=0
readIOs=0, readPages=0
response times (usec):
write  min/max/avg=0/0/0
read   min/max/avg=0/0/0

I've restarted GPFS on that node just in case but that didn't seem to help. I 
have LROC on a node that DOESN'T have direct access to an NSD so will hopefully 
cache files that get requested over NFS.

How often are these stats updated? The Statistics line doesn't seem to update 
when running the command again.

Dave,

Dave Goodbourn
Head of Systems
MILK VISUAL EFFECTS
[http://www.milk-vfx.com/src/milk_email_logo.jpg]
5th floor, Threeways House,
40-44 Clipstone Street London, W1W 5DW
Tel: +44 (0)20 3697 8448
Mob: +44 (0)7917 411 069

On 5 June 2017 at 13:48, Oesterlin, Robert 
> wrote:
Hi Dave

I’ve done a large-scale (600 node) LROC deployment here - feel free to reach 
out if you have questions.

mmdiag --lroc is about all there is but it does give you a pretty good idea how 
the cache is performing but you can’t tell which files are cached. Also, watch 
out that the LROC cached will steal pagepool memory (1% of the LROC cache size)

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: 
>
 on behalf of Dave Goodbourn >
Reply-To: gpfsug main discussion list 
>
Date: Monday, June 5, 2017 at 7:19 AM
To: gpfsug main discussion list 
>
Subject: [EXTERNAL] Re: [gpfsug-discuss] NSD access routes

I'm testing out the LROC idea. All seems to be working well, but, is there 
anyway to monitor what's cached? How full it might be? The performance etc??

I can see some stats in mmfsadm dump lroc but that's about it.

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] Report on Scale and Cloud

2017-05-24 Thread Simon Thompson (IT Research Support)

Hi All,

I forgot that I never circulated, as part of the RCUK Working Group on
Cloud, we produced a report on using Scale with Cloud/Undercloud ...

You can download the report from:

https://cloud.ac.uk/reports/spectrumscale/


We had some input from various IBM people whilst writing, and bear in mind
that its a snapshot of support at the point in time when it was written.

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] RPM Packages

2017-05-19 Thread Simon Thompson (IT Research Support)

Well, I installed it one node and it still claims that it's advanced licensed 
on the node (only after installing gpfs.adv of course).

I know the license model for DME, but we've never installed the 
gpfs.license.standard packages before.

I agree the XML string pro ably is used somewhere, just not clear if it's 
needed or not...

My guess would be maybe the GUI uses it.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Jonathon A Anderson 
[jonathon.ander...@colorado.edu]
Sent: 19 May 2017 17:16
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] RPM Packages

Data Management Edition optionally replaces the traditional GPFS licensing 
model with a per-terabyte licensing fee, rather than a per-socket licensing fee.

https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca=an=iSource=897=ENUS216-158

Presumably installing this RPM is how you tell GPFS which licensing model 
you’re using.

~jonathon


On 5/19/17, 10:12 AM, "gpfsug-discuss-boun...@spectrumscale.org on behalf of 
Mark Bush" <gpfsug-discuss-boun...@spectrumscale.org on behalf of 
mark.b...@siriuscom.com> wrote:

For what it’s worth, I have been running 4.2.3 DM for a few weeks in a test 
lab and didn’t have the gpfs.license.dm package installed and everything worked 
fine (GUI, CES, etc, etc).  Here’s what rpm says about itself

[root@node1 ~]# rpm -qpl gpfs.license.dm-4.2.3-0.x86_64.rpm
/usr/lpp/mmfs
/usr/lpp/mmfs/bin

/usr/lpp/mmfs/properties/version/ibm.com_IBM_Spectrum_Scale_Data_Mgmt_Edition-4.2.3.swidtag

This file seems to be some XML code with strings of numbers in it.  Not 
sure what it does for you.


Mark

On 5/18/17, 4:55 PM, "gpfsug-discuss-boun...@spectrumscale.org on behalf of 
Simon Thompson (IT Research Support)" <gpfsug-discuss-boun...@spectrumscale.org 
on behalf of s.j.thomp...@bham.ac.uk> wrote:

Hi All,

Normally we never use the install toolkit, but deploy GPFS from a config
management tool.

I see there are now RPMs such as gpfs.license.dm, are these actually
required to be installed? Everything seems to work well without them, so
just interested. Maybe the GUI uses them?

Thanks

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


This message (including any attachments) is intended only for the use of 
the individual or entity to which it is addressed and may contain information 
that is non-public, proprietary, privileged, confidential, and exempt from 
disclosure under applicable law. If you are not the intended recipient, you are 
hereby notified that any use, dissemination, distribution, or copying of this 
communication is strictly prohibited. This message may be viewed by parties at 
Sirius Computer Solutions other than those named in the message header. This 
message does not contain an official representation of Sirius Computer 
Solutions. If you have received this communication in error, notify Sirius 
Computer Solutions immediately and (i) destroy this message if a facsimile or 
(ii) delete this message immediately if this is an electronic communication. 
Thank you.

Sirius Computer Solutions<http://www.siriuscom.com>
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] RPM Packages

2017-05-18 Thread Simon Thompson (IT Research Support)

Hi All,

Normally we never use the install toolkit, but deploy GPFS from a config
management tool.

I see there are now RPMs such as gpfs.license.dm, are these actually
required to be installed? Everything seems to work well without them, so
just interested. Maybe the GUI uses them?

Thanks

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

[gpfsug-discuss] Edge case failure mode

2017-05-11 Thread Simon Thompson (IT Research Support)

Just following up on some discussions we had at the UG this week. I
mentioned a few weeks back that we were having issues with failover of
NFS, and we figured a work around to our clients for this so that failover
works great now (plus there is some code fixes coming down the line as
well to help).

Here's my story of fun with protocol nodes ...

Since then we've occasionally been seeing the load average of 1 CES node
rise to over 400 and then its SOOO SLOW to respond to NFS and SMB clients.
A lot of digging and we found that CTDB was reporting > 80% memory used,
so we tweaked the page pool down to solve this.

Great we thought ... But alas that wasn't the cause.

Just to be clear 95% of the time, the CES node is fine, I can do and ls in
the mounted file-systems and all is good. When the load rises to 400, an
ls takes 20-30 seconds, so they are related, but what is the initial
cause? Other CES nodes are 100% fine and if we do mmces node suspend, and
then resume all is well on the node (and no other CES node assumes the
problem as the IP moves). Its not always the same CES IP, node or even
data centre, and most of the time is looks fine.

I logged a ticket with OCF today, and one thing they suggested was to
disable NFSv3 as they've seen similar behaviour at another site. As far as
I know, all my NFS clients are v4, but sure we disable v3 anyway as its
not actually needed. (Both at the ganesha layer, change the default for
exports and reconfigure all existing exports to v4 only for good measure).
That didn't help, but certainly worth a try!

Note that my CES cluster is multi-cluster mounting the file-systems and
from the POSIX side, its fine most of the time.

We've used the mmnetverify command to check that all is well as well. Of
course this only checks the local cluster, not remote nodes, but as we
aren't seeing expels and can access the FS, we assume that the GPFS layer
is working fine.

So we finally log a PMR with IBM, I catch a node in a broken state and
pull a trace from it and upload that, and ask what other traces they might
want (apparently there is no protocol trace for NFS in 4.2.2-3).

Now, when we run this, I note that its doing things like mmlsfileset to
the remote storage, coming from two clusters and some of this is timing
out. We've already had issues with rp_filter on remote nodes causing
expels, but the storage backend here has only 1 nic, and we can mount and
access it all fine.

So why doesn't mmlsfileset work to this node (I can ping it - ICMP, not
GPFS ping of course), but not make "admin" calls to it. Ssh appears to
work fine as well BTW to it.

So I check on my CES and this is multi-homed and rp_filter is enabled.
Setting it to a value of 2, seems to make mmlsfileset work, so yes, I'm
sure I'm an edge case, but it would be REALLY REALLY helpful to get
mmnetverify to work across a cluster (e.g. I say this is a remote node and
here's its FQDN, can you talk to it) which would have helped with
diagnosis here. I'm not entirely sure why ssh etc would work and pass
rp_filter, but not GPFS traffic (in some cases apparently), but I guess
its something to do with how GPFS is binding and then the kernel routing
layer.

I'm still not sure if this is my root cause as the occurrences of the high
load are a bit random (anything from every hour to being stable for 2-3
days), but since making the rp_filter change this afternoon, so far ...?

I've created an RFE for mmnetverify to be able to test across a cluster...
https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe_ID=10503
0


Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] HAWC question

2017-05-04 Thread Simon Thompson (IT Research Support)

Which cluster though? The client and storage are separate clusters, so all the 
nodes on the remote cluster or storage cluster?

Thanks

Simon 

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of oeh...@gmail.com 
[oeh...@gmail.com]
Sent: 04 May 2017 14:28
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] HAWC question

well, it's a bit complicated which is why the message is there in the first 
place.

reason is, there is no easy way to tell except by dumping the stripgroup on the 
filesystem manager and check what log group your particular node is assigned to 
and then check the size of the log group.

as soon as the client node gets restarted it should in most cases pick up a new 
log group and that should be at the new size, but to be 100% sure we say all 
nodes need to be restarted.

you need to also turn HAWC on as well, i assume you just left this out of the 
email , just changing log size doesn't turn it on :-)

On Thu, May 4, 2017 at 6:15 AM Simon Thompson (IT Research Support) 
<s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>> wrote:
Hi,

I have a question about HAWC, we are trying to enable this for our
OpenStack environment, system pool is on SSD already, so we try to change
the log file size with:

mmchfs FSNAME -L 128M

This says:

mmchfs: Attention: You must restart the GPFS daemons before the new log
file
size takes effect. The GPFS daemons can be restarted one node at a time.
When the GPFS daemon is restarted on the last node in the cluster, the new
log size becomes effective.


We multi-cluster the file-system, so do we have to restart every node in
all clusters, or just in the storage cluster?

And how do we tell once it has become active?

Thanks

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Meet other spectrum scale users in May

2017-05-02 Thread Simon Thompson (IT Research Support)

Hi All,

Just to note that we need to send final numbers of the venue today for lunches 
etc, so if you are planning to attend, please register NOW! (otherwise you 
might not get lunch/entry to the evening event)

Thanks

Simon

From: 
>
 on behalf of Secretary GPFS UG 
>
Reply-To: 
"gpfsug-discuss@spectrumscale.org" 
>
Date: Thursday, 27 April 2017 at 09:29
To: "gpfsug-discuss@spectrumscale.org" 
>
Subject: [gpfsug-discuss] Meet other spectrum scale users in May


Dear Members,

Please join us and other spectrum scale users for 2 days of great talks and 
networking!

When: 9-10th May 2017

Where: Macdonald Manchester Hotel & Spa, Manchester, UK (right by Manchester 
Piccadilly train station)

Who? The event is free to attend, is open to members from all industries and 
welcomes users with a little and a lot of experience using Spectrum Scale.

The SSUG brings together the Spectrum Scale User Community including Spectrum 
Scale developers and architects to share knowledge, experiences and future 
plans.

Topics include transparent cloud tiering, AFM, automation and security best 
practices, Docker and HDFS support, problem determination, and an update on 
Elastic Storage Server (ESS). Our popular forum includes interactive problem 
solving, a best practices discussion and networking. We're very excited to 
welcome back Doris Conti the Director for Spectrum Scale (GPFS) and HPC SW 
Product Development at IBM.

The May meeting is sponsored by IBM, DDN, Lenovo, Mellanox, Seagate, 
Arcastream, Ellexus, and OCF.

It is an excellent opportunity to learn more and get your questions answered. 
Register your place today at the Eventbrite page https://goo.gl/tRptru

We hope to see you there!

--

Claire O'Toole
Spectrum Scale/GPFS User Group Secretary
+44 (0)7508 033896
www.spectrumscaleug.org
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] NFS issues

2017-04-26 Thread Simon Thompson (IT Research Support)

We have no issues with L3 SMB accessing clients, so I'm pretty sure it's not 
arp. And some of the boxes on the other side of the L3 gateway don't see the 
issues.

We don't use Cisco kit.

I posted in a different update that we think it's related to connections to 
other ports on the same IP which get left open when the IP quickly gets moved 
away and back again.

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Peter Serocka 
[pesero...@gmail.com]
Sent: 26 April 2017 18:53
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] NFS issues

> On 2017 Apr 26 Wed, at 16:20, Simon Thompson (IT Research Support) 
> <s.j.thomp...@bham.ac.uk> wrote:
>
> Nope, the clients are all L3 connected, so not an arp issue.


...not on the client, but the server-facing L3 switch
still need to manage its ARP table, and might miss
the IP moving to a new MAC.

Cisco switches have  a default ARP cache timeout of 4 hours, fwiw.

Can your network team provide you the ARP status
from the switch when you see a fail-over being stuck?

— Peter


>
> Two things we have observed:
>
> 1. It triggers when one of the CES IPs moves and quickly moves back again.
> The move occurs because the NFS server goes into grace:
>
> 2017-04-25 20:36:49 : epoch 00040183 :  :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
> GRACE, duration 60
> 2017-04-25 20:36:49 : epoch 00040183 :  :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server
> recovery event 2 nodeid -1 ip 
> 2017-04-25 20:36:49 : epoch 00040183 :  :
> ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4
> recovery release ip 
> 2017-04-25 20:36:49 : epoch 00040183 :  :
> ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
> 2017-04-25 20:37:42 : epoch 00040183 :  :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
> GRACE, duration 60
> 2017-04-25 20:37:44 : epoch 00040183 :  :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
> GRACE, duration 60
> 2017-04-25 20:37:44 : epoch 00040183 :  :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server
> recovery event 4 nodeid 2 ip
>
>
>
> We can't see in any of the logs WHY ganesha is going into grace. Any
> suggestions on how to debug this further? (I.e. If we can stop the grace
> issues, we can solve the problem mostly).
>
>
> 2. Our clients are using LDAP which is bound to the CES IPs. If we
> shutdown nslcd on the client we can get the client to recover once all the
> TIME_WAIT connections have gone. Maybe this was a bad choice on our side
> to bind to the CES IPs - we figured it would handily move the IPs for us,
> but I guess the mmcesfuncs isn't aware of this and so doesn't kill the
> connections to the IP as it goes away.
>
>
> So two approaches we are going to try. Reconfigure the nslcd on a couple
> of clients and see if they still show up the issues when fail-over occurs.
> Second is to work out why the NFS servers are going into grace in the
> first place.
>
> Simon
>
> On 26/04/2017, 00:46, "gpfsug-discuss-boun...@spectrumscale.org on behalf
> of greg.lehm...@csiro.au" <gpfsug-discuss-boun...@spectrumscale.org on
> behalf of greg.lehm...@csiro.au> wrote:
>
>> Are you using infiniband or Ethernet? I'm wondering if IBM have solved
>> the gratuitous arp issue which we see with our non-protocols NFS
>> implementation.
>>
>> -Original Message-
>> From: gpfsug-discuss-boun...@spectrumscale.org
>> [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Simon
>> Thompson (IT Research Support)
>> Sent: Wednesday, 26 April 2017 3:31 AM
>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Subject: Re: [gpfsug-discuss] NFS issues
>>
>> I did some digging in the mmcesfuncs to see what happens server side on
>> fail over.
>>
>> Basically the server losing the IP is supposed to terminate all sessions
>> and the receiver server sends ACK tickles.
>>
>> My current supposition is that for whatever reason, the losing server
>> isn't releasing something and the client still has hold of a connection
>> which is mostly dead. The tickle then fails to the client from the new
>> server.
>>
>> This would explain why failing the IP back to the original server usually
>> brings the client back to life.
>>
>> This is only my working theory at the moment as we can't reliably
>> reproduce this. Next time it happens we plan to grab some netstat from
>> each side.
>>
>> The

Re: [gpfsug-discuss] NFS issues

2017-04-26 Thread Simon Thompson (IT Research Support)

Nope, the clients are all L3 connected, so not an arp issue.

Two things we have observed:

1. It triggers when one of the CES IPs moves and quickly moves back again.
The move occurs because the NFS server goes into grace:

2017-04-25 20:36:49 : epoch 00040183 :  :
ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
GRACE, duration 60
2017-04-25 20:36:49 : epoch 00040183 :  :
ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server
recovery event 2 nodeid -1 ip 
2017-04-25 20:36:49 : epoch 00040183 :  :
ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4
recovery release ip 
2017-04-25 20:36:49 : epoch 00040183 :  :
ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
2017-04-25 20:37:42 : epoch 00040183 :  :
ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
GRACE, duration 60
2017-04-25 20:37:44 : epoch 00040183 :  :
ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
GRACE, duration 60
2017-04-25 20:37:44 : epoch 00040183 :  :
ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server
recovery event 4 nodeid 2 ip



We can't see in any of the logs WHY ganesha is going into grace. Any
suggestions on how to debug this further? (I.e. If we can stop the grace
issues, we can solve the problem mostly).


2. Our clients are using LDAP which is bound to the CES IPs. If we
shutdown nslcd on the client we can get the client to recover once all the
TIME_WAIT connections have gone. Maybe this was a bad choice on our side
to bind to the CES IPs - we figured it would handily move the IPs for us,
but I guess the mmcesfuncs isn't aware of this and so doesn't kill the
connections to the IP as it goes away.


So two approaches we are going to try. Reconfigure the nslcd on a couple
of clients and see if they still show up the issues when fail-over occurs.
Second is to work out why the NFS servers are going into grace in the
first place.

Simon

On 26/04/2017, 00:46, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of greg.lehm...@csiro.au" <gpfsug-discuss-boun...@spectrumscale.org on
behalf of greg.lehm...@csiro.au> wrote:

>Are you using infiniband or Ethernet? I'm wondering if IBM have solved
>the gratuitous arp issue which we see with our non-protocols NFS
>implementation.
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Simon
>Thompson (IT Research Support)
>Sent: Wednesday, 26 April 2017 3:31 AM
>To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>Subject: Re: [gpfsug-discuss] NFS issues
>
>I did some digging in the mmcesfuncs to see what happens server side on
>fail over.
>
>Basically the server losing the IP is supposed to terminate all sessions
>and the receiver server sends ACK tickles.
>
>My current supposition is that for whatever reason, the losing server
>isn't releasing something and the client still has hold of a connection
>which is mostly dead. The tickle then fails to the client from the new
>server.
>
>This would explain why failing the IP back to the original server usually
>brings the client back to life.
>
>This is only my working theory at the moment as we can't reliably
>reproduce this. Next time it happens we plan to grab some netstat from
>each side. 
>
>Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the
>server that received the IP and see if that fixes it (i.e. the receiver
>server didn't tickle properly). (Usage extracted from mmcesfuncs which is
>ksh of course). ... CesIPPort is colon separated IP:portnumber (of NFSd)
>for anyone interested.
>
>Then try and kill he sessions on the losing server to check if there is
>stuff still open and re-tickle the client.
>
>If we can get steps to workaround, I'll log a PMR. I suppose I could do
>that now, but given its non deterministic and we want to be 100% sure
>it's not us doing something wrong, I'm inclined to wait until we do some
>more testing.
>
>I agree with the suggestion that it's probably IO pending nodes that are
>affected, but don't have any data to back that up yet. We did try with a
>read workload on a client, but may we need either long IO blocked reads
>or writes (from the GPFS end).
>
>We also originally had soft as the default option, but saw issues then
>and the docs suggested hard, so we switched and also enabled sync (we
>figured maybe it was NFS client with uncommited writes), but neither have
>resolved the issues entirely. Difficult for me to say if they improved
>the issue though given its sporadic.
>
>Appreciate people's suggestions!
>
>Thanks
>
>Simon
>
>From: gpfsug-discuss-boun...@spectrumscale.org
>[

Re: [gpfsug-discuss] NFS issues

2017-04-25 Thread Simon Thompson (IT Research Support)

I did some digging in the mmcesfuncs to see what happens server side on fail 
over.

Basically the server losing the IP is supposed to terminate all sessions and 
the receiver server sends ACK tickles.

My current supposition is that for whatever reason, the losing server isn't 
releasing something and the client still has hold of a connection which is 
mostly dead. The tickle then fails to the client from the new server.

This would explain why failing the IP back to the original server usually 
brings the client back to life.

This is only my working theory at the moment as we can't reliably reproduce 
this. Next time it happens we plan to grab some netstat from each side. 

Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the server 
that received the IP and see if that fixes it (i.e. the receiver server didn't 
tickle properly). (Usage extracted from mmcesfuncs which is ksh of course). ... 
CesIPPort is colon separated IP:portnumber (of NFSd) for anyone interested.

Then try and kill he sessions on the losing server to check if there is stuff 
still open and re-tickle the client.

If we can get steps to workaround, I'll log a PMR. I suppose I could do that 
now, but given its non deterministic and we want to be 100% sure it's not us 
doing something wrong, I'm inclined to wait until we do some more testing.

I agree with the suggestion that it's probably IO pending nodes that are 
affected, but don't have any data to back that up yet. We did try with a read 
workload on a client, but may we need either long IO blocked reads or writes 
(from the GPFS end).

We also originally had soft as the default option, but saw issues then and the 
docs suggested hard, so we switched and also enabled sync (we figured maybe it 
was NFS client with uncommited writes), but neither have resolved the issues 
entirely. Difficult for me to say if they improved the issue though given its 
sporadic.

Appreciate people's suggestions!

Thanks

Simon

From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Jan-Frode Myklebust 
[janfr...@tanso.net]
Sent: 25 April 2017 18:04
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] NFS issues

I *think* I've seen this, and that we then had open TCP connection from client 
to NFS server according to netstat, but these connections were not visible from 
netstat on NFS-server side.

Unfortunately I don't remember what the fix was..

  -jf

tir. 25. apr. 2017 kl. 16.06 skrev Simon Thompson (IT Research Support) 
<s.j.thomp...@bham.ac.uk<mailto:s.j.thomp...@bham.ac.uk>>:
Hi,

>From what I can see, Ganesha uses the Export_Id option in the config file
(which is managed by CES) for this. I did find some reference in the
Ganesha devs list that if its not set, then it would read the FSID from
the GPFS file-system, either way they should surely be consistent across
all the nodes. The posts I found were from someone with an IBM email
address, so I guess someone in the IBM teams.

I checked a couple of my protocol nodes and they use the same Export_Id
consistently, though I guess that might not be the same as the FSID value.

Perhaps someone from IBM could comment on if FSID is likely to the cause
of my problems?

Thanks

Simon

On 25/04/2017, 14:51, 
"gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 on behalf
of Ouwehand, JJ" 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 on behalf of
j.ouweh...@vumc.nl<mailto:j.ouweh...@vumc.nl>> wrote:

>Hello,
>
>At first a short introduction. My name is Jaap Jan Ouwehand, I work at a
>Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM
>Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical
>(office, research and clinical data) business process. We have three
>large GPFS filesystems for different purposes.
>
>We also had such a situation with cNFS. A failover (IPtakeover) was
>technically good, only clients experienced "stale filehandles". We opened
>a PMR at IBM and after testing, deliver logs, tcpdumps and a few months
>later, the solution appeared to be in the fsid option.
>
>An NFS filehandle is built by a combination of fsid and a hash function
>on the inode. After a failover, the fsid value can be different and the
>client has a "stale filehandle". To avoid this, the fsid value can be
>statically specified. See:
>
>https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.
>scale.v4r22.doc/bl1adm_nfslin.htm
>
>Maybe there is also a value in Ganesha that changes after a failover.
>Certainly since most sessions will be re-established after a failback.
>Maybe you see more debug information with tcpdump.
>
>
>Kind regards,
>
>Jaap

Re: [gpfsug-discuss] NFS issues

2017-04-25 Thread Simon Thompson (IT Research Support)

Hi,

>From what I can see, Ganesha uses the Export_Id option in the config file
(which is managed by CES) for this. I did find some reference in the
Ganesha devs list that if its not set, then it would read the FSID from
the GPFS file-system, either way they should surely be consistent across
all the nodes. The posts I found were from someone with an IBM email
address, so I guess someone in the IBM teams.

I checked a couple of my protocol nodes and they use the same Export_Id
consistently, though I guess that might not be the same as the FSID value.

Perhaps someone from IBM could comment on if FSID is likely to the cause
of my problems?

Thanks

Simon

On 25/04/2017, 14:51, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of Ouwehand, JJ" <gpfsug-discuss-boun...@spectrumscale.org on behalf of
j.ouweh...@vumc.nl> wrote:

>Hello,
>
>At first a short introduction. My name is Jaap Jan Ouwehand, I work at a
>Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of IBM
>Spectrum Scale, Spectrum Archive and Spectrum Protect in our critical
>(office, research and clinical data) business process. We have three
>large GPFS filesystems for different purposes.
>
>We also had such a situation with cNFS. A failover (IPtakeover) was
>technically good, only clients experienced "stale filehandles". We opened
>a PMR at IBM and after testing, deliver logs, tcpdumps and a few months
>later, the solution appeared to be in the fsid option.
>
>An NFS filehandle is built by a combination of fsid and a hash function
>on the inode. After a failover, the fsid value can be different and the
>client has a "stale filehandle". To avoid this, the fsid value can be
>statically specified. See:
>
>https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.
>scale.v4r22.doc/bl1adm_nfslin.htm
>
>Maybe there is also a value in Ganesha that changes after a failover.
>Certainly since most sessions will be re-established after a failback.
>Maybe you see more debug information with tcpdump.
>
>
>Kind regards,
> 
>Jaap Jan Ouwehand
>ICT Specialist (Storage & Linux)
>VUmc - ICT
>E: jj.ouweh...@vumc.nl
>W: www.vumc.com
>
>
>
>-----Oorspronkelijk bericht-
>Van: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] Namens Simon Thompson
>(IT Research Support)
>Verzonden: dinsdag 25 april 2017 13:21
>Aan: gpfsug-discuss@spectrumscale.org
>Onderwerp: [gpfsug-discuss] NFS issues
>
>Hi,
>
>We have recently started deploying NFS in addition our existing SMB
>exports on our protocol nodes.
>
>We use a RR DNS name that points to 4 VIPs for SMB services and failover
>seems to work fine with SMB clients. We figured we could use the same
>name and IPs and run Ganesha on the protocol servers, however we are
>seeing issues with NFS clients when IP failover occurs.
>
>In normal operation on a client, we might see several mounts from
>different IPs obviously due to the way the DNS RR is working, but it all
>works fine.
>
>In a failover situation, the IP will move to another node and some
>clients will carry on, others will hang IO to the mount points referred
>to by the IP which has moved. We can *sometimes* trigger this by manually
>suspending a CES node, but not always and some clients mounting from the
>IP moving will be fine, others won't.
>
>If we resume a node an it fails back, the clients that are hanging will
>usually recover fine. We can reboot a client prior to failback and it
>will be fine, stopping and starting the ganesha service on a protocol
>node will also sometimes resolve the issues.
>
>So, has anyone seen this sort of issue and any suggestions for how we
>could either debug more or workaround?
>
>We are currently running the packages
>nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones).
>
>At one point we were seeing it a lot, and could track it back to an
>underlying GPFS network issue that was causing protocol nodes to be
>expelled occasionally, we resolved that and the issues became less
>apparent, but maybe we just fixed one failure mode so see it less often.
>
>On the clients, we use -o sync,hard BTW as in the IBM docs.
>
>On a client showing the issues, we'll see in dmesg, NFS related messages
>like:
>[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not
>responding, timed out
>
>Which explains the client hang on certain mount points.
>
>The symptoms feel very much like those logged in this Gluster/ganesha bug:
>https://bugzilla.redhat.com/show_bug.cgi?id=1354439
>
>
>Thanks
>
>Simon
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscal

[gpfsug-discuss] NFS issues

2017-04-25 Thread Simon Thompson (IT Research Support)

Hi,

We have recently started deploying NFS in addition our existing SMB
exports on our protocol nodes.

We use a RR DNS name that points to 4 VIPs for SMB services and failover
seems to work fine with SMB clients. We figured we could use the same name
and IPs and run Ganesha on the protocol servers, however we are seeing
issues with NFS clients when IP failover occurs.

In normal operation on a client, we might see several mounts from
different IPs obviously due to the way the DNS RR is working, but it all
works fine.

In a failover situation, the IP will move to another node and some clients
will carry on, others will hang IO to the mount points referred to by the
IP which has moved. We can *sometimes* trigger this by manually suspending
a CES node, but not always and some clients mounting from the IP moving
will be fine, others won't.

If we resume a node an it fails back, the clients that are hanging will
usually recover fine. We can reboot a client prior to failback and it will
be fine, stopping and starting the ganesha service on a protocol node will
also sometimes resolve the issues.

So, has anyone seen this sort of issue and any suggestions for how we
could either debug more or workaround?

We are currently running the packages
nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones).

At one point we were seeing it a lot, and could track it back to an
underlying GPFS network issue that was causing protocol nodes to be
expelled occasionally, we resolved that and the issues became less
apparent, but maybe we just fixed one failure mode so see it less often.

On the clients, we use -o sync,hard BTW as in the IBM docs.

On a client showing the issues, we'll see in dmesg, NFS related messages
like:
[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not
responding, timed out

Which explains the client hang on certain mount points.

The symptoms feel very much like those logged in this Gluster/ganesha bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1354439


Thanks

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

75 matches

Mail list logo