Re: [gpfsug-discuss] AFM gateway node scaling

2020-03-25 Thread Matt Weil
thank you thank you... I would like to see that in IBM documentation
somewhere.

On 3/25/20 11:50 AM, Venkateswara R Puvvada wrote:
> Matt,
>
> It is recommended to have dedicated AFM gateway nodes. The memory and CPU
> requirements of an AFM gateway node depend on the number of filesets
> handled by the node and the inode usage of those filesets. Since AFM
> keeps track of changes in memory, any network disturbance can drive
> memory utilization up, which eventually leads to the in-memory queue
> being dropped. After the queue is dropped, AFM runs recovery to
> recover the lost operations, which is expensive as it involves
> creating a snapshot, running a policy scan, doing readdir from
> home/secondary and building the list of lost operations. When a
> gateway node goes down, all the filesets handled by that node are
> distributed to the remaining active gateway nodes. After the gateway
> node comes back, the filesets are transferred back to the original
> gateway node. When designing a gateway node, make sure that it has
> enough memory and CPU resources to handle the incoming and outgoing
> data based on the bandwidth. Limit the number of filesets per gateway
> (e.g. fewer than 20 filesets per gateway) so that the number of AFM
> recoveries triggered is minimal when the queues are lost. Also limit
> the total number of inodes handled by the gateway node across all the
> filesets (e.g. fewer than 400 million inodes per gateway). AFM gateway
> nodes are licensed as server nodes.
>
>
> ~Venkat (vpuvv...@in.ibm.com)
>
>
>
> From:        Matt Weil 
> To:        gpfsug-discuss@spectrumscale.org
> Date:        03/23/2020 11:39 PM
> Subject:        [EXTERNAL] [gpfsug-discuss] AFM gateway node scaling
> Sent by:        gpfsug-discuss-boun...@spectrumscale.org
> 
>
>
>
> Hello all,
>
> Is there any guide or recommendation on how to scale this?
>
> How many filesets per gateway node? Is it necessary to separate the NSD
> server and gateway roles? Are dedicated gateway nodes licensed as clients?
>
> Thanks for any guidance.
>
> Matt
>
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] AFM gateway node scaling

2020-03-25 Thread Venkateswara R Puvvada
Matt,

It is recommended to have dedicated AFM gateway nodes. The memory and CPU
requirements of an AFM gateway node depend on the number of filesets
handled by the node and the inode usage of those filesets. Since AFM keeps
track of changes in memory, any network disturbance can drive memory
utilization up, which eventually leads to the in-memory queue being
dropped. After the queue is dropped, AFM runs recovery to recover the lost
operations, which is expensive as it involves creating a snapshot, running
a policy scan, doing readdir from home/secondary and building the list of
lost operations. When a gateway node goes down, all the filesets handled by
that node are distributed to the remaining active gateway nodes. After the
gateway node comes back, the filesets are transferred back to the original
gateway node. When designing a gateway node, make sure that it has enough
memory and CPU resources to handle the incoming and outgoing data based on
the bandwidth. Limit the number of filesets per gateway (e.g. fewer than 20
filesets per gateway) so that the number of AFM recoveries triggered is
minimal when the queues are lost. Also limit the total number of inodes
handled by the gateway node across all the filesets (e.g. fewer than 400
million inodes per gateway). AFM gateway nodes are licensed as server
nodes.
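
As a rough illustration of the two example limits above (fewer than 20
filesets and fewer than 400 million inodes per gateway), a minimal Python
sketch of a planning check might look like the following. The tuple layout
and the function name are hypothetical, and the thresholds are only the
examples from this mail, not hard product limits. How the per-fileset
values are collected is site specific and not shown.

```python
from collections import defaultdict

# Example limits taken from the mail above; treat them as guidelines only.
MAX_FILESETS_PER_GATEWAY = 20
MAX_INODES_PER_GATEWAY = 400_000_000


def check_gateways(filesets):
    """Return warnings for gateways that reach the example limits.

    `filesets` is an iterable of (fileset_name, gateway_name, used_inodes)
    tuples (a hypothetical layout chosen for this sketch).
    """
    count = defaultdict(int)
    inodes = defaultdict(int)
    for _name, gateway, used_inodes in filesets:
        count[gateway] += 1
        inodes[gateway] += used_inodes

    warnings = []
    for gateway in sorted(count):
        # "Fewer than N" means a gateway holding N or more gets flagged.
        if count[gateway] >= MAX_FILESETS_PER_GATEWAY:
            warnings.append(f"{gateway}: {count[gateway]} filesets")
        if inodes[gateway] >= MAX_INODES_PER_GATEWAY:
            warnings.append(f"{gateway}: {inodes[gateway]} inodes")
    return warnings
```

Feeding it the planned fileset-to-gateway mapping before a rollout gives a
quick view of which gateways would trigger many recoveries after a queue
drop.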


~Venkat (vpuvv...@in.ibm.com)



From:   Matt Weil 
To: gpfsug-discuss@spectrumscale.org
Date:   03/23/2020 11:39 PM
Subject:[EXTERNAL] [gpfsug-discuss] AFM gateway node scaling
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hello all,

Is there any guide or recommendation on how to scale this?

How many filesets per gateway node? Is it necessary to separate the NSD
server and gateway roles? Are dedicated gateway nodes licensed as clients?

Thanks for any guidance.

Matt



Re: [gpfsug-discuss] mmbackup monitoring

2020-03-25 Thread Jonathan Buzzard

On 25/03/2020 16:32, Skylar Thompson wrote:
> On Wed, Mar 25, 2020 at 04:27:27PM +, Jonathan Buzzard wrote:
> > On 25/03/2020 14:15, Skylar Thompson wrote:
> > > We execute mmbackup via a regular TSM client schedule with an incremental
> > > action, with a virtualmountpoint set to an empty, local "canary" directory.
> > > mmbackup runs as a preschedule command, and the client -domain parameter is
> > > set to back up only the canary directory. dsmc will back up the canary
> > > directory as a filespace only if mmbackup succeeds (exits with 0). We can
> > > then monitor the canary and infer the status of the associated GPFS
> > > filespace or fileset.
> >
> > I think I prefer this approach to grovelling around in log files, which
> > could easily break on an update. There is a better approach, though, which
> > in my view IBM should already be using in mmbackup.
> >
> > It came to me this afternoon that one could use the TSM API for this. After
> > a bit of Googling I find there is an API call dsmUpdateFS, which allows you
> > to update the filespace information on the TSM server.
> >
> > Fields that you can update include
> >
> > DSM_FSUPD_OCCUPANCY
> > DSM_FSUPD_CAPACITY
> > DSM_FSUPD_BACKSTARTDATE
> > DSM_FSUPD_BACKCOMPLETEDATE
> >
> > Information on the API call here
> >
> > https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.9/api/r_cmd_dsmupdatefs.html
> >
> > How do we submit this as a feature request again? That said, in my view
> > it's a bug in mmbackup rather than a feature request: the latest in a very
> > long line of bugs, stretching back well over a decade, that make mmbackup
> > less than production ready :-)
> >
> > I feel a breakout of a text editor and some C code coming on in the
> > meantime.
>
> I actually tried using the API years ago to try to do some custom queries,
> and ran into the problem that custom API clients can only see data from
> custom API clients; they can't see data from the standard BA client. I
> contacted IBM about this, and they said it was a safety feature to prevent
> a rogue/poorly-written client from trashing regular backup/archive data,
> which makes some sense. Unfortunately, it does mean that IBM would have to
> be the source of the fix.




Grrr, I had forgotten that. Well then IBM need to fix this.

Bug: mmbackup does not update the occupancy, capacity, backup start date
and backup end date when doing a backup.



JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


Re: [gpfsug-discuss] mmbackup monitoring

2020-03-25 Thread Jonathan Buzzard

On 25/03/2020 14:15, Skylar Thompson wrote:
> We execute mmbackup via a regular TSM client schedule with an incremental
> action, with a virtualmountpoint set to an empty, local "canary" directory.
> mmbackup runs as a preschedule command, and the client -domain parameter is
> set to back up only the canary directory. dsmc will back up the canary
> directory as a filespace only if mmbackup succeeds (exits with 0). We can
> then monitor the canary and infer the status of the associated GPFS
> filespace or fileset.



I think I prefer this approach to grovelling around in log files, which
could easily break on an update. There is a better approach, though, which
in my view IBM should already be using in mmbackup.


It came to me this afternoon that one could use the TSM API for this. 
After a bit of Googling I find there is an API call dsmUpdateFS, which 
allows you to update the filespace information on the TSM server.


Fields that you can update include

DSM_FSUPD_OCCUPANCY
DSM_FSUPD_CAPACITY
DSM_FSUPD_BACKSTARTDATE
DSM_FSUPD_BACKCOMPLETEDATE

Information on the API call here

https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.9/api/r_cmd_dsmupdatefs.html

How do we submit this as a feature request again? That said, in my view
it's a bug in mmbackup rather than a feature request: the latest in a very
long line of bugs, stretching back well over a decade, that make mmbackup
less than production ready :-)


I feel a breakout of a text editor and some C code coming on in the 
meantime.



JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


Re: [gpfsug-discuss] GUI timeout when running HW_INVENTORY on little endian ESS server

2020-03-25 Thread Billich Heinrich Rainer (ID SD)
Hello,

Sorry, I was wrong. It looks like the timeout already happens in xCAT/rinv
and the GUI just reports it. In some respects that is good: this is now
purely an xCAT/hardware issue, and the GUI isn't involved any more.

Kind regards

Heiner


/var/log/xcat/command.log:


[Date]   2020-03-25 15:03:46
[ClientType] cli
[Request]rinv * all
[Response]
***: Error: timeout
[NumberNodes] 1
[ElapsedTime] 97.085 s
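
To compare elapsed times across many rinv runs, an entry like the one above
can be parsed with a few lines of Python. This is only a sketch: the
"[Field] value" layout is assumed from the single entry shown here, and the
function names are made up for illustration.

```python
import re

# One "[Field] value" line; the value may be empty or follow without a space.
FIELD = re.compile(r"^\[(\w+)\]\s*(.*)$")


def parse_entry(text):
    """Parse one command.log entry into a dict of field name -> value."""
    entry = {}
    last_key = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            last_key = match.group(1)
            entry[last_key] = match.group(2)
        elif last_key is not None:
            # Continuation line, e.g. the error text after "[Response]".
            entry[last_key] = (entry[last_key] + "\n" + line).strip()
    return entry


def elapsed_seconds(entry):
    """Return the [ElapsedTime] value as a float number of seconds."""
    return float(entry["ElapsedTime"].split()[0])
```

Collecting elapsed_seconds() for every rinv request would show how close
the hardware typically gets to whatever limit applies.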


GUI:

HW_INVENTORY *  2020-03-25 15:03:26 681436ms failed CmdRunTask.doExecute
   nas12io04b-i: Error executing rinv command. Exit code = 1; Command output = 
; Command error =***: [**]: Error: timeout

On 25.03.20, 16:35, "Billich Heinrich Rainer (ID SD)" wrote:

> Hello,
>
> I asked before about these timeouts when the GUI runs HW_INVENTORY. Now I
> would like to know what the exact timeout value in the GUI code is and
> whether we can change it. My argument: if an xCAT command takes X seconds
> but the GUI code times out after Y seconds, we know the command will fail
> if X > Y, hence we need to increase Y unless we can reduce X ...
>
> It's this function which raises the timeout:
> at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.runRemoteInventory
>
> If we can't fix the long execution time for the time being, can we raise
> the timeout value? I know this is most likely a firmware issue with
> little-endian Power systems, but we won't be updating for some time yet.
>
> Thank you,
>
> Heiner
>
>
> debug: Running 'xcat.sh rinv '10.250.***' '*' 'all' ' on node localhost
> err: com.ibm.fscc.common.exceptions.FsccException: Error executing rinv command. Exit code = 1; Command output = ; Command error = *: []: Error: timeout
>
>     at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.runRemoteInventory(InventoryAndStateHelper.java:92)
>     at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.buildHardwareInventory(InventoryAndStateHelper.java:175)
>     at com.ibm.fscc.ras.xcat.InventoryRefreshTask.inner_run(InventoryRefreshTask.java:94)
>     at com.ibm.fscc.ras.xcat.InventoryRefreshTask.run(InventoryRefreshTask.java:72)
>     at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:227)
>     at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:199)
>     at com.ibm.fscc.common.newscheduler.RefreshTaskIds.execute(RefreshTaskIds.java:482)
>     at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:80)
>     at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156)
>     at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:470)
>     at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:456)
>     at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:97)
>     ...
> debug: Running 'mmsysmonc event 'gui' 'xcat_nodelist_ok' -i' ***-i' ' on node localhost
> err: ***-i: Error executing rinv command. Exit code = 1; Command output = ; Command error = nas12io04b: [***]: Error: timeout
> ,*** -i: Error executing rinv command. Exit code = 1; Command output = ; Command error =***: [***]: Error: timeout
> err: com.ibm.fscc.cli.CommandException: EFSSG1150C Running specified task was unsuccessful.
>     at com.ibm.fscc.cli.CommandException.createCommandException(CommandException.java:117)
>     at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:84)
>     at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156)
>     at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:470)
>     at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:456)
>     at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:97)
> EFSSG1150C Running specified task was unsuccessful.

-- 
===
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.bill...@id.ethz.ch







[gpfsug-discuss] GUI timeout when running HW_INVENTORY on little endian ESS server

2020-03-25 Thread Billich Heinrich Rainer (ID SD)
 

 
Hello,

I asked before about these timeouts when the GUI runs HW_INVENTORY. Now I
would like to know what the exact timeout value in the GUI code is and
whether we can change it. My argument: if an xCAT command takes X seconds
but the GUI code times out after Y seconds, we know the command will fail
if X > Y, hence we need to increase Y unless we can reduce X ...
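
The X versus Y argument can be reproduced outside the GUI with a small
Python sketch. This is not the GUI code, and how the GUI implements its
timeout is not known from this thread; the function name is made up, and
the command to run is whatever takes X seconds on your system.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, elapsed_seconds).

    finished is False when the command was still running after timeout_s
    seconds (the "X > Y" case); subprocess.run kills it in that case.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            timeout=timeout_s,
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return finished, time.monotonic() - start
```

Running the slow command once with a generous timeout gives X; any Y below
that value will then fail every time, regardless of what the caller does
with the result.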

It's this function which raises the timeout: 
at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.runRemoteInventory

If we can't fix the long execution time for the time being, can we raise the
timeout value? I know this is most likely a firmware issue with little-endian
Power systems, but we won't be updating for some time yet.

Thank you,

Heiner


debug: Running 'xcat.sh rinv '10.250.***' '*' 'all' ' on node localhost
err: com.ibm.fscc.common.exceptions.FsccException: Error executing rinv command. Exit code = 1; Command output = ; Command error = *: []: Error: timeout

    at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.runRemoteInventory(InventoryAndStateHelper.java:92)
    at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.buildHardwareInventory(InventoryAndStateHelper.java:175)
    at com.ibm.fscc.ras.xcat.InventoryRefreshTask.inner_run(InventoryRefreshTask.java:94)
    at com.ibm.fscc.ras.xcat.InventoryRefreshTask.run(InventoryRefreshTask.java:72)
    at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:227)
    at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:199)
    at com.ibm.fscc.common.newscheduler.RefreshTaskIds.execute(RefreshTaskIds.java:482)
    at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:80)
    at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156)
    at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:470)
    at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:456)
    at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:97)
    ...
debug: Running 'mmsysmonc event 'gui' 'xcat_nodelist_ok' -i' ***-i' ' on node localhost
err: ***-i: Error executing rinv command. Exit code = 1; Command output = ; Command error = nas12io04b: [***]: Error: timeout
,*** -i: Error executing rinv command. Exit code = 1; Command output = ; Command error =***: [***]: Error: timeout
err: com.ibm.fscc.cli.CommandException: EFSSG1150C Running specified task was unsuccessful.
    at com.ibm.fscc.cli.CommandException.createCommandException(CommandException.java:117)
    at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:84)
    at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156)
    at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:470)
    at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:456)
    at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:97)
EFSSG1150C Running specified task was unsuccessful.

-- 
===
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.bill...@id.ethz.ch





Re: [gpfsug-discuss] mmbackup monitoring

2020-03-25 Thread Skylar Thompson
IIRC, I think you need to set 2 in the bit field of the DEBUGmmbackup
environment variable. I had a long-term task to see what I could get out of
that, but this just reminded me of it and current events might actually let
me have time to look into it now...

On Wed, Mar 25, 2020 at 10:38:55AM -0400, Jaime Pinto wrote:
> Additionally, mmbackup creates by default a .mmbackupCfg directory on the 
> root of the fileset where it dumps several files and directories with the 
> progress of the backup. For instance: expiredFiles/, prepFiles/, 
> updatedFiles/, dsminstr.log, ...
> 
> You may then create a script to search these directories for logs/lists of 
> what has happened, and generate a more detailed report of what happened 
> during the backup. In our case I generate a daily report of how many files 
> and how much data have been sent to the TSM server and deleted for each user, 
> including their paths. You can do more tricks if you want.
> 
> Jaime
> 
> 
> On 3/25/2020 10:15:59, Skylar Thompson wrote:
> > We execute mmbackup via a regular TSM client schedule with an incremental
> > action, with a virtualmountpoint set to an empty, local "canary" directory.
> > mmbackup runs as a preschedule command, and the client -domain parameter is
> > set to back up only the canary directory. dsmc will back up the canary
> > directory as a filespace only if mmbackup succeeds (exits with 0). We can
> > then monitor the canary and infer the status of the associated GPFS
> > filespace or fileset.
> > 
> > On Wed, Mar 25, 2020 at 10:01:04AM +, Jonathan Buzzard wrote:
> > > 
> > > What is the best way of monitoring whether or not mmbackup has managed to
> > > complete a backup successfully?
> > > 
> > > Traditionally one uses a TSM monitoring solution of one's choice to make
> > > sure nodes were backing up (I am assuming mmbackup is being used in
> > > conjunction with TSM here).
> > > 
> > > However mmbackup does not update the backup_end column in the
> > > filespaceview table (at least in 4.2) which makes things rather more
> > > complicated.
> > > 
> > > The best I can come up with is querying the events table to see if the
> > > client schedule completed, but that gives a false sense of security as the
> > > schedule completing does not mean the backup completed as far as I know.
> > > 
> > > What solutions are you all using, or does mmbackup in 5.x update the
> > > filespaceview table?
> > 
> 
> .
> .
> .
>   TELL US ABOUT YOUR SUCCESS STORIES
>  http://www.scinethpc.ca/testimonials
>  
> ---
> Jaime Pinto - Storage Analyst
> SciNet HPC Consortium - Compute/Calcul Canada
> www.scinet.utoronto.ca - www.computecanada.ca
> University of Toronto
> 661 University Ave. (MaRS), Suite 1140
> Toronto, ON, M5G1M1
> P: 416-978-2755
> C: 416-505-1477

-- 
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine


Re: [gpfsug-discuss] mmbackup monitoring

2020-03-25 Thread Jaime Pinto

Additionally, mmbackup creates by default a .mmbackupCfg directory on the root 
of the fileset where it dumps several files and directories with the progress 
of the backup. For instance: expiredFiles/, prepFiles/, updatedFiles/, 
dsminstr.log, ...

You may then create a script to search these directories for logs/lists of what 
has happened, and generate a more detailed report of what happened during the 
backup. In our case I generate a daily report of how many files and how much 
data have been sent to the TSM server and deleted for each user, including 
their paths. You can do more tricks if you want.
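
A minimal starting point for such a script, as a Python sketch: it only
counts files and non-empty lines under the subdirectories named above. The
contents and layout of these files are not described here and may differ
between releases, so nothing is parsed; the function name is made up.

```python
from pathlib import Path

# Subdirectory names as mentioned above. Their contents are not documented
# here, so this sketch only counts files and non-empty lines.
SUBDIRS = ("expiredFiles", "prepFiles", "updatedFiles")


def summarize(cfg_dir):
    """Return {subdir: (file_count, line_count)} for a .mmbackupCfg directory."""
    summary = {}
    for name in SUBDIRS:
        files = 0
        lines = 0
        subdir = Path(cfg_dir) / name
        if subdir.is_dir():
            for path in sorted(subdir.rglob("*")):
                if path.is_file():
                    files += 1
                    with path.open(errors="replace") as handle:
                        lines += sum(1 for line in handle if line.strip())
        summary[name] = (files, lines)
    return summary
```

A per-user report would build on this by splitting each list entry into its
path and size fields, once the actual file format has been checked.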

Jaime


On 3/25/2020 10:15:59, Skylar Thompson wrote:
> We execute mmbackup via a regular TSM client schedule with an incremental
> action, with a virtualmountpoint set to an empty, local "canary" directory.
> mmbackup runs as a preschedule command, and the client -domain parameter is
> set to back up only the canary directory. dsmc will back up the canary
> directory as a filespace only if mmbackup succeeds (exits with 0). We can
> then monitor the canary and infer the status of the associated GPFS
> filespace or fileset.
> 
> On Wed, Mar 25, 2020 at 10:01:04AM +, Jonathan Buzzard wrote:
> > 
> > What is the best way of monitoring whether or not mmbackup has managed to
> > complete a backup successfully?
> > 
> > Traditionally one uses a TSM monitoring solution of one's choice to make sure
> > nodes were backing up (I am assuming mmbackup is being used in conjunction
> > with TSM here).
> > 
> > However mmbackup does not update the backup_end column in the filespaceview
> > table (at least in 4.2) which makes things rather more complicated.
> > 
> > The best I can come up with is querying the events table to see if the
> > client schedule completed, but that gives a false sense of security as the
> > schedule completing does not mean the backup completed as far as I know.
> > 
> > What solutions are you all using, or does mmbackup in 5.x update the
> > filespaceview table?




.
.
.
  TELL US ABOUT YOUR SUCCESS STORIES
 http://www.scinethpc.ca/testimonials
 
---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477


Re: [gpfsug-discuss] mmbackup monitoring

2020-03-25 Thread Skylar Thompson
We execute mmbackup via a regular TSM client schedule with an incremental
action, with a virtualmountpoint set to an empty, local "canary" directory.
mmbackup runs as a preschedule command, and the client -domain parameter is
set to back up only the canary directory. dsmc will back up the canary
directory as a filespace only if mmbackup succeeds (exits with 0). We can
then monitor the canary and infer the status of the associated GPFS
filespace or fileset.
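
The monitoring side can then be a simple freshness check on the canary
filespace. A minimal Python sketch, assuming the last backup completion
time of the canary has already been fetched from the TSM server by whatever
means your monitoring uses; the function name and the 36 hour threshold are
made up for illustration.

```python
from datetime import datetime, timedelta


def canary_is_stale(last_backup_end, now, max_age=timedelta(hours=36)):
    """Return True if the canary has no recent successful backup.

    last_backup_end is the canary filespace's last backup completion time,
    or None if it has never been backed up. Because the canary is only
    backed up after mmbackup exits with 0, a stale canary implies that
    mmbackup has not succeeded within max_age.
    """
    if last_backup_end is None:
        return True
    return now - last_backup_end > max_age
```

With a daily schedule, a threshold somewhat above 24 hours avoids false
alarms from backups that merely run long.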

On Wed, Mar 25, 2020 at 10:01:04AM +, Jonathan Buzzard wrote:
> 
> What is the best way of monitoring whether or not mmbackup has managed to
> complete a backup successfully?
> 
> Traditionally one uses a TSM monitoring solution of one's choice to make sure
> nodes were backing up (I am assuming mmbackup is being used in conjunction
> with TSM here).
> 
> However mmbackup does not update the backup_end column in the filespaceview
> table (at least in 4.2) which makes things rather more complicated.
> 
> The best I can come up with is querying the events table to see if the
> client schedule completed, but that gives a false sense of security as the
> schedule completing does not mean the backup completed as far as I know.
> 
> What solutions are you all using, or does mmbackup in 5.x update the
> filespaceview table?

-- 
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine


Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-03-25 Thread Carl Zetie - ca...@us.ibm.com
So far we have not revisited the EOS date for 4.2.3, but I would not rule it 
out entirely if the lockdown continues well into the summer. If we did, the 
next likely EOS date would be April 30th. 

Even if we do postpone the date for 4.2.3, keep two other dates in mind for 
planning:

- RHEL 6 support is coming to an end in November. We won't support Scale with 
RHEL 6 once Red Hat stops supporting RHEL 6
- RHEL 7 will be supported with 5.0.5, but not "5.next", the release scheduled 
for the second half of 2020. So you'll need to plan to adopt RHEL 8 before 
upgrading to Scale "5.next"


As much as possible we are going to try to stick to our release cadence of 
twice a year even through these difficulties, including designating 5.0.5 for 
Extended Updates.

"Keep Calm and Scale Out".
  

 
 
Carl Zetie
Program Director
Offering Management 
Spectrum Scale

(919) 473 3318 ][ Research Triangle Park
ca...@us.ibm.com
 

   
Message: 2
Date: Wed, 25 Mar 2020 10:09:12 +
From: Jonathan Buzzard 
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] GPFS 5 and supported rhel OS
Message-ID: <91d02fd3-2af7-5880-e1f2-aaf9b1f80...@strath.ac.uk>
Content-Type: text/plain; charset=utf-8; format=flowed

On 19/02/2020 23:34, Renata Maria Dart wrote:
> Hi, I understand gpfs 4.2.3 is end of support this coming September.  

A planning question at this stage. Do IBM intend to hold to this date or 
is/could there be a relaxation due to COVID-19?

Basically I was planning to do the upgrade this summer, but what with
working from home I am less keen to do a 4.2.3 to 5.x upgrade while
not on hand at the actual hardware.

Obviously if we have to we have to, just want to know where we stand.

JAB.

-- 
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


--





Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-03-25 Thread Jonathan Buzzard

On 19/02/2020 23:34, Renata Maria Dart wrote:
> Hi, I understand gpfs 4.2.3 is end of support this coming September.


A planning question at this stage. Do IBM intend to hold to this date or 
is/could there be a relaxation due to COVID-19?


Basically I was planning to do the upgrade this summer, but what with
working from home I am less keen to do a 4.2.3 to 5.x upgrade while
not on hand at the actual hardware.


Obviously if we have to we have to, just want to know where we stand.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


[gpfsug-discuss] mmbackup monitoring

2020-03-25 Thread Jonathan Buzzard



What is the best way of monitoring whether or not mmbackup has managed 
to complete a backup successfully?


Traditionally one uses a TSM monitoring solution of one's choice to make
sure nodes were backing up (I am assuming mmbackup is being used in
conjunction with TSM here).


However mmbackup does not update the backup_end column in the 
filespaceview table (at least in 4.2) which makes things rather more 
complicated.


The best I can come up with is querying the events table to see if the 
client schedule completed, but that gives a false sense of security as 
the schedule completing does not mean the backup completed as far as I know.


What solutions are you all using, or does mmbackup in 5.x update the 
filespaceview table?



JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG