Re: [gpfsug-discuss] AFM gateway node scaling
thank you thank you... I would like to see that in IBM documentation somewhere.

On 3/25/20 11:50 AM, Venkateswara R Puvvada wrote:
> Matt,
>
> It is recommended to have dedicated AFM gateway nodes. Memory and CPU
> requirements for an AFM gateway node depend on the number of filesets
> handled by the node and the inode usage of those filesets. Since AFM
> keeps track of changes in memory, any network disturbance can cause
> memory utilization to go high, which eventually leads to the in-memory
> queue being dropped. After the queue is dropped, AFM runs recovery to
> recover the lost operations, which is expensive as it involves creating
> a snapshot, running a policy scan, doing readdir from home/secondary
> and building the list of lost operations. When a gateway node goes
> down, all the filesets handled by that node are distributed to the
> remaining active gateway nodes. After the gateway node comes back, the
> filesets are transferred back to the original gateway node. When
> designing the gateway node, make sure that it has enough memory and CPU
> resources for handling the incoming and outgoing data based on the
> bandwidth. Limit the filesets per gateway (e.g. fewer than 20 filesets
> per gateway) so that the number of AFM recoveries triggered will be
> minimal when the queues are lost. Also limit the total number of inodes
> handled by the gateway node across all the filesets (e.g. fewer than
> 400 million inodes per gateway). AFM gateway nodes are licensed as
> server nodes.
>
> ~Venkat (vpuvv...@in.ibm.com)
>
> From: Matt Weil
> To: gpfsug-discuss@spectrumscale.org
> Date: 03/23/2020 11:39 PM
> Subject: [EXTERNAL] [gpfsug-discuss] AFM gateway node scaling
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>
> Hello all,
>
> Is there any guide and/or recommendation as to how to scale this?
>
> Filesets per gateway node? Is it necessary to separate NSD server and
> gateway roles? Are dedicated gateway nodes licensed as clients?
>
> Thanks for any guidance.
>
> Matt

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] AFM gateway node scaling
Matt,

It is recommended to have dedicated AFM gateway nodes. Memory and CPU requirements for an AFM gateway node depend on the number of filesets handled by the node and the inode usage of those filesets. Since AFM keeps track of changes in memory, any network disturbance can cause memory utilization to go high, which eventually leads to the in-memory queue being dropped. After the queue is dropped, AFM runs recovery to recover the lost operations, which is expensive as it involves creating a snapshot, running a policy scan, doing readdir from home/secondary and building the list of lost operations.

When a gateway node goes down, all the filesets handled by that node are distributed to the remaining active gateway nodes. After the gateway node comes back, the filesets are transferred back to the original gateway node.

When designing the gateway node, make sure that it has enough memory and CPU resources for handling the incoming and outgoing data based on the bandwidth. Limit the filesets per gateway (e.g. fewer than 20 filesets per gateway) so that the number of AFM recoveries triggered will be minimal when the queues are lost. Also limit the total number of inodes handled by the gateway node across all the filesets (e.g. fewer than 400 million inodes per gateway).

AFM gateway nodes are licensed as server nodes.

~Venkat (vpuvv...@in.ibm.com)

From: Matt Weil
To: gpfsug-discuss@spectrumscale.org
Date: 03/23/2020 11:39 PM
Subject: [EXTERNAL] [gpfsug-discuss] AFM gateway node scaling
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Hello all,

Is there any guide and/or recommendation as to how to scale this?

Filesets per gateway node? Is it necessary to separate NSD server and gateway roles? Are dedicated gateway nodes licensed as clients?

Thanks for any guidance.

Matt
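The example limits in Venkat's message (fewer than 20 filesets and fewer than 400 million inodes per gateway) can be turned into a quick sanity check. The following is only a sketch in Python: the `Fileset` record, its field names and the way the fileset-to-gateway mapping is obtained are all assumptions made for illustration, and the two limits are the examples from the message, not hard product limits.

```python
from dataclasses import dataclass

# Example limits quoted in the message above; guidance, not hard limits.
MAX_FILESETS_PER_GATEWAY = 20
MAX_INODES_PER_GATEWAY = 400_000_000


@dataclass
class Fileset:
    name: str
    gateway: str  # hypothetical: name of the gateway node serving the fileset
    inodes: int   # hypothetical: inodes currently in use in the fileset


def check_gateways(filesets):
    """Return {gateway: [warnings]} for gateways at or above the example limits."""
    counts, inodes = {}, {}
    for fs in filesets:
        counts[fs.gateway] = counts.get(fs.gateway, 0) + 1
        inodes[fs.gateway] = inodes.get(fs.gateway, 0) + fs.inodes

    report = {}
    for gw in counts:
        warnings = []
        if counts[gw] >= MAX_FILESETS_PER_GATEWAY:
            warnings.append(
                f"{counts[gw]} filesets (want fewer than {MAX_FILESETS_PER_GATEWAY})"
            )
        if inodes[gw] >= MAX_INODES_PER_GATEWAY:
            warnings.append(
                f"{inodes[gw]} inodes (want fewer than {MAX_INODES_PER_GATEWAY})"
            )
        if warnings:
            report[gw] = warnings
    return report
```

Calling `check_gateways()` with one record per AFM fileset returns only the gateways that need attention, so an empty dictionary means every gateway is within the example limits.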
Re: [gpfsug-discuss] mmbackup monitoring
On 25/03/2020 16:32, Skylar Thompson wrote:
> On Wed, Mar 25, 2020 at 04:27:27PM +, Jonathan Buzzard wrote:
> > On 25/03/2020 14:15, Skylar Thompson wrote:
> > > We execute mmbackup via a regular TSM client schedule with an
> > > incremental action, with a virtualmountpoint set to an empty, local
> > > "canary" directory. mmbackup runs as a preschedule command, and the
> > > client -domain parameter is set only to back up the canary
> > > directory. dsmc will back up the canary directory as a filespace
> > > only if mmbackup succeeds (exits with 0). We can then monitor the
> > > canary and infer the status of the associated GPFS filespace or
> > > fileset.
> >
> > I think I prefer this approach to grovelling around in log files that
> > could easily break on an update. Though there is a better approach,
> > which in my view IBM should already be using in mmbackup.
> >
> > It came to me this afternoon that one could use the TSM API for this.
> > After a bit of Googling I find there is an API call dsmUpdateFS, which
> > allows you to update the filespace information on the TSM server.
> >
> > Fields that you can update include
> >
> > DSM_FSUPD_OCCUPANCY
> > DSM_FSUPD_CAPACITY
> > DSM_FSUPD_BACKSTARTDATE
> > DSM_FSUPD_BACKCOMPLETEDATE
> >
> > Information on the API call here
> >
> > https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.9/api/r_cmd_dsmupdatefs.html
> >
> > How do we submit this as a feature request again? That said, in my
> > view it's a bug in mmbackup rather than a feature request: the latest
> > in a very long line, stretching back well over a decade, that make
> > mmbackup less than production ready :-)
> >
> > I feel a breakout of a text editor and some C code coming on in the
> > meantime.
>
> I actually tried using the API years ago to try to do some custom
> queries, and ran into the problem that custom API clients can only see
> data from custom API clients; they can't see data from the standard BA
> client. I contacted IBM about this, and they said it was a safety
> feature to prevent a rogue/poorly-written client from trashing regular
> backup/archive data, which makes some sense. Unfortunately, it does mean
> that IBM would have to be the source of the fix.

Grrr, I had forgotten that. Well then IBM need to fix this.

Bug: mmbackup does not update the occupancy, capacity, backup start date and backup end date when doing a backup.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
Re: [gpfsug-discuss] mmbackup monitoring
On 25/03/2020 14:15, Skylar Thompson wrote:
> We execute mmbackup via a regular TSM client schedule with an
> incremental action, with a virtualmountpoint set to an empty, local
> "canary" directory. mmbackup runs as a preschedule command, and the
> client -domain parameter is set only to back up the canary directory.
> dsmc will back up the canary directory as a filespace only if mmbackup
> succeeds (exits with 0). We can then monitor the canary and infer the
> status of the associated GPFS filespace or fileset.

I think I prefer this approach to grovelling around in log files that could easily break on an update. Though there is a better approach, which in my view IBM should already be using in mmbackup.

It came to me this afternoon that one could use the TSM API for this. After a bit of Googling I find there is an API call dsmUpdateFS, which allows you to update the filespace information on the TSM server.

Fields that you can update include

DSM_FSUPD_OCCUPANCY
DSM_FSUPD_CAPACITY
DSM_FSUPD_BACKSTARTDATE
DSM_FSUPD_BACKCOMPLETEDATE

Information on the API call here

https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.9/api/r_cmd_dsmupdatefs.html

How do we submit this as a feature request again? That said, in my view it's a bug in mmbackup rather than a feature request: the latest in a very long line, stretching back well over a decade, that make mmbackup less than production ready :-)

I feel a breakout of a text editor and some C code coming on in the meantime.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
Re: [gpfsug-discuss] GUI timeout when running HW_INVENTORY on little endian ESS server
Hello,

Sorry, I was wrong. It looks like the timeout already happens in xCAT/rinv and the GUI just reports it. Which in some respects is good: now this is a purely xCAT/hardware issue. The GUI isn't involved any more.

Kind regards

Heiner

/var/log/xcat/command.log:
[Date] 2020-03-25 15:03:46 [ClientType] cli [Request]rinv * all [Response] ***: Error: timeout [NumberNodes] 1 [ElapsedTime] 97.085 s

GUI:
HW_INVENTORY * 2020-03-25 15:03:26 681436ms failed CmdRunTask.doExecute
nas12io04b-i: Error executing rinv command. Exit code = 1; Command output = ; Command error =***: [**]: Error: timeout

On 25.03.20, 16:35, "Billich Heinrich Rainer (ID SD)" wrote:
> Hello,
>
> I asked before about these timeouts when the GUI runs HW_INVENTORY. Now
> I would like to know what the exact timeout value in the GUI code is and
> if we can change it. I want to argue: if an xCAT command takes X seconds
> but the GUI code times out after Y seconds, we know the command will
> fail if X > Y, hence we need to increase Y unless we can reduce X ...
>
> It's this function which raises the timeout:
>
>     at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.runRemoteInventory
>
> If we can't fix the long execution time for the time being, can we raise
> the timeout value?
>
> I know this most likely is a firmware issue with little endian Power
> systems, but we won't update for some more time.
>
> Thank you,
>
> Heiner
>
> debug: Running 'xcat.sh rinv '10.250.***' '*' 'all' ' on node localhost
> err: com.ibm.fscc.common.exceptions.FsccException: Error executing rinv command. Exit code = 1; Command output = ; Command error = *: []: Error: timeout
>     at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.runRemoteInventory(InventoryAndStateHelper.java:92)
>     at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.buildHardwareInventory(InventoryAndStateHelper.java:175)
>     at com.ibm.fscc.ras.xcat.InventoryRefreshTask.inner_run(InventoryRefreshTask.java:94)
>     at com.ibm.fscc.ras.xcat.InventoryRefreshTask.run(InventoryRefreshTask.java:72)
>     at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:227)
>     at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:199)
>     at com.ibm.fscc.common.newscheduler.RefreshTaskIds.execute(RefreshTaskIds.java:482)
>     at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:80)
>     at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156)
>     at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:470)
>     at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:456)
>     at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:97)
> ...
> debug: Running 'mmsysmonc event 'gui' 'xcat_nodelist_ok' -i' ***-i' ' on node localhost
> err: ***-i: Error executing rinv command. Exit code = 1; Command output = ; Command error = nas12io04b: [***]: Error: timeout
> ,*** -i: Error executing rinv command. Exit code = 1; Command output = ; Command error =***: [***]: Error: timeout
> err: com.ibm.fscc.cli.CommandException: EFSSG1150C Running specified task was unsuccessful.
>     at com.ibm.fscc.cli.CommandException.createCommandException(CommandException.java:117)
>     at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:84)
>     at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156)
>     at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:470)
>     at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:456)
>     at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:97)
> EFSSG1150C Running specified task was unsuccessful.

--
===
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.bill...@id.ethz.ch
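The command.log entry quoted in Heiner's follow-up already carries the elapsed time, so slow or failed xCAT requests can be picked out without the GUI. The Python sketch below assumes each entry sits on a single line in the shape shown in that message; the 60-second threshold and the function name are made up for illustration and are not xCAT or GUI settings.

```python
import re

# Matches one command.log entry in the shape quoted in the message above.
ENTRY_RE = re.compile(
    r"\[Date\]\s*(?P<date>\S+ \S+)\s*"
    r"\[ClientType\]\s*(?P<client>\S+)\s*"
    r"\[Request\]\s*(?P<request>.*?)\s*"
    r"\[Response\]\s*(?P<response>.*?)\s*"
    r"\[NumberNodes\]\s*(?P<nodes>\d+)\s*"
    r"\[ElapsedTime\]\s*(?P<elapsed>[\d.]+)\s*s"
)


def slow_or_failed(lines, threshold_s=60.0):
    """Return (date, request, elapsed_seconds) for slow or failed entries.

    threshold_s is a hypothetical cut-off chosen for this example.
    """
    hits = []
    for line in lines:
        match = ENTRY_RE.search(line)
        if match is None:
            continue  # not a command.log entry in the assumed format
        elapsed = float(match.group("elapsed"))
        if elapsed >= threshold_s or "Error" in match.group("response"):
            hits.append((match.group("date"), match.group("request"), elapsed))
    return hits
```

Feeding it the lines of /var/log/xcat/command.log would list the `rinv` request above with its 97.085 s runtime, which is the X in the "X > Y" argument.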
[gpfsug-discuss] GUI timeout when running HW_INVENTORY on little endian ESS server
Hello,

I asked before about these timeouts when the GUI runs HW_INVENTORY. Now I would like to know what the exact timeout value in the GUI code is and if we can change it. I want to argue: if an xCAT command takes X seconds but the GUI code times out after Y seconds, we know the command will fail if X > Y, hence we need to increase Y unless we can reduce X ...

It's this function which raises the timeout:

    at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.runRemoteInventory

If we can't fix the long execution time for the time being, can we raise the timeout value?

I know this most likely is a firmware issue with little endian Power systems, but we won't update for some more time.

Thank you,

Heiner

debug: Running 'xcat.sh rinv '10.250.***' '*' 'all' ' on node localhost
err: com.ibm.fscc.common.exceptions.FsccException: Error executing rinv command. Exit code = 1; Command output = ; Command error = *: []: Error: timeout
    at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.runRemoteInventory(InventoryAndStateHelper.java:92)
    at com.ibm.fscc.ras.xcat.InventoryAndStateHelper.buildHardwareInventory(InventoryAndStateHelper.java:175)
    at com.ibm.fscc.ras.xcat.InventoryRefreshTask.inner_run(InventoryRefreshTask.java:94)
    at com.ibm.fscc.ras.xcat.InventoryRefreshTask.run(InventoryRefreshTask.java:72)
    at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:227)
    at com.ibm.fscc.common.newscheduler.RefreshTaskExecutor.executeRefreshTask(RefreshTaskExecutor.java:199)
    at com.ibm.fscc.common.newscheduler.RefreshTaskIds.execute(RefreshTaskIds.java:482)
    at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:80)
    at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156)
    at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:470)
    at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:456)
    at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:97)
...
debug: Running 'mmsysmonc event 'gui' 'xcat_nodelist_ok' -i' ***-i' ' on node localhost
err: ***-i: Error executing rinv command. Exit code = 1; Command output = ; Command error = nas12io04b: [***]: Error: timeout
,*** -i: Error executing rinv command. Exit code = 1; Command output = ; Command error =***: [***]: Error: timeout
err: com.ibm.fscc.cli.CommandException: EFSSG1150C Running specified task was unsuccessful.
    at com.ibm.fscc.cli.CommandException.createCommandException(CommandException.java:117)
    at com.ibm.fscc.newcli.commands.task.CmdRunTask.doExecute(CmdRunTask.java:84)
    at com.ibm.fscc.newcli.internal.AbstractCliCommand.execute(AbstractCliCommand.java:156)
    at com.ibm.fscc.cli.CliProtocol.processNewStyleCommand(CliProtocol.java:470)
    at com.ibm.fscc.cli.CliProtocol.processRequest(CliProtocol.java:456)
    at com.ibm.fscc.cli.CliServer$CliClientServer.run(CliServer.java:97)
EFSSG1150C Running specified task was unsuccessful.

--
===
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.bill...@id.ethz.ch
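The "X > Y" argument in the message above can be reproduced outside the GUI by running a command under a caller-side timeout and recording how long it actually took. This is a generic Python sketch, not the GUI's implementation: `run_with_timeout` and `STAND_IN_SLOW_COMMAND` are names invented here, and the stand-in command merely sleeps in place of a slow inventory call.

```python
import subprocess
import sys
import time

# Stand-in for a slow command (hypothetical): sleeps for two seconds.
STAND_IN_SLOW_COMMAND = [sys.executable, "-c", "import time; time.sleep(2)"]


def run_with_timeout(cmd, timeout_s):
    """Run cmd with a caller-side timeout of timeout_s seconds (the "Y").

    Returns (status, elapsed_seconds), where status is "ok", "failed"
    (non-zero exit code) or "timeout" (the command needed longer than Y).
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        status = "ok" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising.
        status = "timeout"
    return status, time.monotonic() - start
```

Running the real command once with a generous timeout gives a measured X; any caller whose Y is below that value will report a timeout regardless of whether the command would eventually have succeeded.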
Re: [gpfsug-discuss] mmbackup monitoring
IIRC, I think you need to set 2 in the bit field of the DEBUGmmbackup environment variable. I had a long-term task to see what I could get out of that, but this just reminded me of it and current events might actually let me have time to look into it now...

On Wed, Mar 25, 2020 at 10:38:55AM -0400, Jaime Pinto wrote:
> Additionally, mmbackup creates by default a .mmbackupCfg directory on the
> root of the fileset where it dumps several files and directories with the
> progress of the backup. For instance: expiredFiles/, prepFiles/,
> updatedFiles/, dsminstr.log, ...
>
> You may then create a script to search these directories for logs/lists of
> what has happened, and generate a more detailed report of what happened
> during the backup. In our case I generate a daily report of how many files
> and how much data have been sent to the TSM server and deleted for each user,
> including their paths. You can do more tricks if you want.
>
> Jaime
>
> On 3/25/2020 10:15:59, Skylar Thompson wrote:
> > We execute mmbackup via a regular TSM client schedule with an incremental
> > action, with a virtualmountpoint set to an empty, local "canary" directory.
> > mmbackup runs as a preschedule command, and the client -domain parameter is
> > set only to backup the canary directory. dsmc will backup the canary
> > directory as a filespace only if mmbackup succeeds (exits with 0). We can
> > then monitor the canary and infer the status of the associated GPFS
> > filespace or fileset.
> >
> > On Wed, Mar 25, 2020 at 10:01:04AM +, Jonathan Buzzard wrote:
> > >
> > > What is the best way of monitoring whether or not mmbackup has managed to
> > > complete a backup successfully?
> > >
> > > Traditionally one use a TSM monitoring solution of your choice to make
> > > sure nodes where backing up (I am assuming mmbackup is being used in
> > > conjunction with TSM here).
> > >
> > > However mmbackup does not update the backup_end column in the
> > > filespaceview table (at least in 4.2) which makes things rather more
> > > complicated.
> > >
> > > The best I can come up with is querying the events table to see if the
> > > client schedule completed, but that gives a false sense of security as the
> > > schedule completing does not mean the backup completed as far as I know.
> > >
> > > What solutions are you all using, or does mmbackup in 5.x update the
> > > filespaceview table?
>
> ---
> Jaime Pinto - Storage Analyst
> SciNet HPC Consortium - Compute/Calcul Canada
> www.scinet.utoronto.ca - www.computecanada.ca
> University of Toronto
> 661 University Ave. (MaRS), Suite 1140
> Toronto, ON, M5G1M1
> P: 416-978-2755
> C: 416-505-1477

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
Re: [gpfsug-discuss] mmbackup monitoring
Additionally, mmbackup creates by default a .mmbackupCfg directory at the root of the fileset, where it dumps several files and directories with the progress of the backup. For instance: expiredFiles/, prepFiles/, updatedFiles/, dsminstr.log, ...

You may then create a script to search these directories for logs/lists of what has happened, and generate a more detailed report of what happened during the backup. In our case I generate a daily report of how many files and how much data have been sent to the TSM server and deleted for each user, including their paths. You can do more tricks if you want.

Jaime

On 3/25/2020 10:15:59, Skylar Thompson wrote:
> We execute mmbackup via a regular TSM client schedule with an
> incremental action, with a virtualmountpoint set to an empty, local
> "canary" directory. mmbackup runs as a preschedule command, and the
> client -domain parameter is set only to back up the canary directory.
> dsmc will back up the canary directory as a filespace only if mmbackup
> succeeds (exits with 0). We can then monitor the canary and infer the
> status of the associated GPFS filespace or fileset.
>
> On Wed, Mar 25, 2020 at 10:01:04AM +, Jonathan Buzzard wrote:
> > What is the best way of monitoring whether or not mmbackup has managed
> > to complete a backup successfully?
> >
> > Traditionally one uses a TSM monitoring solution of your choice to
> > make sure nodes were backing up (I am assuming mmbackup is being used
> > in conjunction with TSM here).
> >
> > However, mmbackup does not update the backup_end column in the
> > filespaceview table (at least in 4.2), which makes things rather more
> > complicated.
> >
> > The best I can come up with is querying the events table to see if the
> > client schedule completed, but that gives a false sense of security,
> > as the schedule completing does not mean the backup completed as far
> > as I know.
> >
> > What solutions are you all using, or does mmbackup in 5.x update the
> > filespaceview table?

.
.
.
TELL US ABOUT YOUR SUCCESS STORIES
http://www.scinethpc.ca/testimonials

---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477
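A script of the kind Jaime describes could start from something like the Python sketch below. It only counts files and lines under the subdirectories he names; the layout inside those directories and the idea that each line is one record are assumptions, so treat the numbers as a rough indicator until checked against a real .mmbackupCfg directory.

```python
from pathlib import Path

# Subdirectories named in the message above.
SUBDIRS = ("expiredFiles", "prepFiles", "updatedFiles")


def summarize(cfg_dir):
    """Return {subdir: (file_count, total_line_count)} for subdirs that exist.

    Assumes (unverified) that the list files hold one record per line.
    """
    summary = {}
    for name in SUBDIRS:
        sub = Path(cfg_dir) / name
        if not sub.is_dir():
            continue
        files = [p for p in sub.rglob("*") if p.is_file()]
        lines = 0
        for path in files:
            # Read as bytes so unusual file-name encodings cannot raise.
            with path.open("rb") as handle:
                lines += sum(1 for _ in handle)
        summary[name] = (len(files), lines)
    return summary
```

Calling `summarize()` on the .mmbackupCfg directory of a fileset after each run and storing the result per day would give the raw counts for a daily report; per-user breakdowns would need the actual record format.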
Re: [gpfsug-discuss] mmbackup monitoring
We execute mmbackup via a regular TSM client schedule with an incremental action, with a virtualmountpoint set to an empty, local "canary" directory. mmbackup runs as a preschedule command, and the client -domain parameter is set only to back up the canary directory. dsmc will back up the canary directory as a filespace only if mmbackup succeeds (exits with 0). We can then monitor the canary and infer the status of the associated GPFS filespace or fileset.

On Wed, Mar 25, 2020 at 10:01:04AM +, Jonathan Buzzard wrote:
>
> What is the best way of monitoring whether or not mmbackup has managed to
> complete a backup successfully?
>
> Traditionally one uses a TSM monitoring solution of your choice to make sure
> nodes were backing up (I am assuming mmbackup is being used in conjunction
> with TSM here).
>
> However, mmbackup does not update the backup_end column in the filespaceview
> table (at least in 4.2), which makes things rather more complicated.
>
> The best I can come up with is querying the events table to see if the
> client schedule completed, but that gives a false sense of security, as the
> schedule completing does not mean the backup completed as far as I know.
>
> What solutions are you all using, or does mmbackup in 5.x update the
> filespaceview table?

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
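The canary scheme hinges on the preschedule command handing mmbackup's exit code back to the scheduler unchanged. A wrapper along the following lines would do that while also leaving a log line behind. It is a Python sketch under assumptions: the function name is invented, and the actual mmbackup command line is deliberately left to the caller rather than guessed at here.

```python
import subprocess
import sys


def run_preschedule(cmd, log=sys.stderr):
    """Run the backup command and return its exit code unchanged.

    cmd is the full command line as a list; for the scheme above it would
    be the site's mmbackup invocation (not spelled out here).
    """
    result = subprocess.run(cmd)
    outcome = "succeeded" if result.returncode == 0 else "FAILED"
    print(f"preschedule command {outcome} (exit code {result.returncode})", file=log)
    # A non-zero code is what keeps the canary from being backed up.
    return result.returncode
```

A script installed as the preschedule command would end with `sys.exit(run_preschedule(sys.argv[1:]))`, so that a failed mmbackup run leaves the canary filespace untouched and visibly stale.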
Re: [gpfsug-discuss] GPFS 5 and supported rhel OS
So far we have not revisited the EOS date for 4.2.3, but I would not rule it out entirely if the lockdown continues well into the summer. If we did, the next likely EOS date would be April 30th.

Even if we do postpone the date for 4.2.3, keep two other dates in mind for planning:

- RHEL 6 support is coming to an end in November. We won't support Scale with RHEL 6 once Red Hat stops supporting RHEL 6
- RHEL 7 will be supported with 5.0.5, but not "5.next", the release scheduled for the second half of 2020. So you'll need to plan to adopt RHEL 8 before upgrading to Scale "5.next"

As much as possible we are going to try to stick to our release cadence of twice a year even through these difficulties, including designating 5.0.5 for Extended Updates.

"Keep Calm and Scale Out".

Carl Zetie
Program Director
Offering Management
Spectrum Scale
(919) 473 3318 ][ Research Triangle Park
ca...@us.ibm.com

Message: 2
Date: Wed, 25 Mar 2020 10:09:12 +
From: Jonathan Buzzard
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] GPFS 5 and supported rhel OS
Message-ID: <91d02fd3-2af7-5880-e1f2-aaf9b1f80...@strath.ac.uk>
Content-Type: text/plain; charset=utf-8; format=flowed

On 19/02/2020 23:34, Renata Maria Dart wrote:
> Hi, I understand gpfs 4.2.3 is end of support this coming September.

A planning question at this stage. Do IBM intend to hold to this date or is/could there be a relaxation due to COVID-19?

Basically I was planning to do the upgrade this summer, but what with working from home I am less keen to do a a 4.2.3 to 5.x upgrade while not on hand to the actual hardware.

Obviously if we have to we have to, just want to know where we stand.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
Re: [gpfsug-discuss] GPFS 5 and supported rhel OS
On 19/02/2020 23:34, Renata Maria Dart wrote:
> Hi, I understand gpfs 4.2.3 is end of support this coming September.

A planning question at this stage. Do IBM intend to hold to this date, or is/could there be a relaxation due to COVID-19?

Basically I was planning to do the upgrade this summer, but what with working from home I am less keen to do a 4.2.3 to 5.x upgrade while not on hand to the actual hardware.

Obviously if we have to we have to; I just want to know where we stand.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
[gpfsug-discuss] mmbackup monitoring
What is the best way of monitoring whether or not mmbackup has managed to complete a backup successfully?

Traditionally one uses a TSM monitoring solution of your choice to make sure nodes were backing up (I am assuming mmbackup is being used in conjunction with TSM here).

However, mmbackup does not update the backup_end column in the filespaceview table (at least in 4.2), which makes things rather more complicated.

The best I can come up with is querying the events table to see if the client schedule completed, but that gives a false sense of security, as the schedule completing does not mean the backup completed as far as I know.

What solutions are you all using, or does mmbackup in 5.x update the filespaceview table?

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
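For reference, the traditional check that the message describes boils down to flagging filespaces whose backup_end is missing or too old. The Python sketch below shows only that comparison. How the rows are fetched from the TSM server is left out, the 36-hour window is an arbitrary example, and per the message the check says nothing about filespaces written by mmbackup in 4.2, because backup_end is not updated for them.

```python
from datetime import datetime, timedelta


def stale_filespaces(rows, now, max_age=timedelta(hours=36)):
    """Return (node, filespace) pairs whose last backup is missing or too old.

    rows: iterable of (node, filespace, backup_end), where backup_end is a
    datetime or None. max_age is a hypothetical threshold for this example.
    """
    stale = []
    for node, filespace, backup_end in rows:
        if backup_end is None or now - backup_end > max_age:
            stale.append((node, filespace))
    return stale
```

Passing `now` explicitly keeps the function easy to test; a monitoring job would call it with `datetime.now()` and alert on any non-empty result.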