Hi,
I also used a custom script (database-driven) via cron which creates many
fileset snapshots during the day via the "default helper nodes". Because of
the IOPS load, the oldest snapshots are deleted at night.
Perhaps it's a good idea to take one global filesystem snapshot and make it
available
Big vote for cron jobs.
Our snapshots are created by a script, installed on each GPFS node. The script
handles naming, removing old snapshots, checking that sufficient disk space
exists before creating a snapshot,
etc. We do snapshots every 15 minutes, keeping them with lower frequency over
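That kind of "lower frequency over time" retention can be sketched as a pure selection function. Everything below (the intervals, the function name) is an illustrative assumption, not the poster's actual script:

```python
#!/usr/bin/env python3
"""Sketch of a thinning retention policy: keep all recent snapshots,
then progressively fewer as they age. Intervals are illustrative."""
from datetime import datetime, timedelta


def thin(snapshots, now):
    """Return the snapshot names to keep: everything from the last day,
    one per day up to 30 days, one per week beyond that.
    `snapshots` maps snapshot name -> creation datetime."""
    keep = set()
    seen_days, seen_weeks = set(), set()
    # Walk newest-first so the newest snapshot of each day/week wins.
    for name, ts in sorted(snapshots.items(), key=lambda kv: kv[1], reverse=True):
        age = now - ts
        if age <= timedelta(days=1):
            keep.add(name)
        elif age <= timedelta(days=30):
            if ts.date() not in seen_days:
                seen_days.add(ts.date())
                keep.add(name)
        else:
            week = ts.isocalendar()[:2]  # (ISO year, ISO week)
            if week not in seen_weeks:
                seen_weeks.add(week)
                keep.add(name)
    return keep
```

A cron job would then delete every snapshot not in the returned set.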
Hi,
I use a python script via cron job, it checks how many snapshots exist and
removes those that
exceed a configurable limit, then creates a new one.
Deployed via Puppet, it's much less hassle than clicking around in a GUI.
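A minimal sketch of such a rotation script (the fileset name, the retention limit, and the `-j`/colon CLI forms are assumptions; check the mmcrsnapshot/mmdelsnapshot syntax on your Scale release):

```python
#!/usr/bin/env python3
"""Cron-driven snapshot rotation sketch: delete the oldest snapshots so
that, after creating a fresh one, at most LIMIT remain. GPFS command
names are real; fileset names and exact flags are illustrative."""
import subprocess
from datetime import datetime, timezone

LIMIT = 96  # e.g. one day of 15-minute snapshots (illustrative)


def snapshots_to_delete(existing, limit):
    """Given snapshot names that sort chronologically (e.g. the
    @GMT-YYYY.MM.DD-HH.MM.SS style), return the oldest ones that must
    go so the new snapshot still fits within `limit`."""
    keep = max(limit - 1, 0)  # leave room for the snapshot about to be made
    ordered = sorted(existing)  # timestamped names sort oldest-first
    return ordered[: max(len(ordered) - keep, 0)]


def rotate(fsname, fileset, existing):
    """Delete surplus snapshots, then create a fresh timestamped one."""
    for snap in snapshots_to_delete(existing, LIMIT):
        subprocess.run(["mmdelsnapshot", fsname, snap, "-j", fileset],
                       check=True)
    newname = datetime.now(timezone.utc).strftime("@GMT-%Y.%m.%d-%H.%M.%S")
    subprocess.run(["mmcrsnapshot", fsname, f"{fileset}:{newname}"],
                   check=True)
```

The @GMT naming also keeps the snapshots visible in the SMB "previous versions" tab mentioned elsewhere in this thread.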
> From: "Kidger, Daniel"
> To: "gpfsug main discussion list"
> Sent:
Maybe some colleagues at IBM devel can correct me, but pagepool size should not
make much difference. AFAIK, it is mostly read-cache data. Another thing could
be if you're using the HAWC function; I am not sure in that case.
Anyhow, looking at your node name, your system seems a DSS from Lenovo so you
That's true, although I would not expect the memory to be flushed for just
snapshot deletion. But it could well be a problem at snapshot creation time.
Anyway, for changing the pagepool we should contact the vendor, since this is
configured by their installation scripts, so we better have them
OK, that sounds like a good candidate for an improvement. Thanks.
We didn't want to do a full filesystem snapshot for the space consumption
indeed. But we may consider it, keeping an eye on the space.
Cheers,
Ivano
--
Paul Scherrer Institut
Ivano Talamo
Sure, that makes a lot of sense and we were already doing it that way.
Cheers,
Ivano
--
Paul Scherrer Institut
Ivano Talamo
WHGA/038
Forschungsstrasse 111
5232 Villigen PSI
Schweiz
Telefon: +41 56 310 47 11
E-Mail: ivano.tal...@psi.ch
Hi Jordi,
thanks for the explanation, I can now see better why something like that would
happen. Indeed the cluster has a lot of clients, coming via different clusters
and even some NFS/SMB via protocol nodes.
So I think opening a case makes a lot of sense to track it down. Not sure how we
Might it be a case of being overbuilt? In the old days you could really
mess up an Oracle DW by giving it too much RAM... It would spend all day
reading data in and out of RAM that it didn't really need, because it
had the SGA available to load the whole table.
Perhaps the pagepool is so
Simon,
Thanks - that is a good insight.
The HA 'feature' of the snapshot automation is perhaps a key feature as Linux
still lacks a decent 'cluster cron'
Also, if "HA", do we know where the state is centrally kept?
On the point of snapshots being left undeleted, do you ever use
Keep in mind... creating many snapshots means ;-) you'll have to delete many snapshots.
At a certain level, which depends on #files, #directories, workload, #nodes, #networks, etc., we've seen cases where taking just full snapshots (whole file system) is the better approach instead.
Also, if snapshotting multiple filesets, it's important to group these into
a single mmcrsnapshot command. Then you get a single quiesce, instead of
one per fileset.
i.e. do:
snapname=$(date --utc +@GMT-%Y.%m.%d-%H.%M.%S)
mmcrsnapshot gpfs0 fileset1:$snapname,fileset2:$snapname
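That grouped invocation is easy to build programmatically. A sketch, where the fileset names are placeholders and the `Fileset:Snapshot` comma-separated list follows the mmcrsnapshot form for snapshotting several filesets with a single quiesce:

```python
#!/usr/bin/env python3
"""Build one mmcrsnapshot command covering all filesets, so the file
system is quiesced once instead of once per fileset."""
import subprocess
from datetime import datetime, timezone


def grouped_mmcrsnapshot(fsname, filesets):
    """Return the argv for a single grouped mmcrsnapshot call."""
    snapname = datetime.now(timezone.utc).strftime("@GMT-%Y.%m.%d-%H.%M.%S")
    arg = ",".join(f"{fs}:{snapname}" for fs in filesets)
    return ["mmcrsnapshot", fsname, arg]


# Usage (fileset names are illustrative):
#   subprocess.run(grouped_mmcrsnapshot("gpfs0", ["home", "projects"]),
#                  check=True)
```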
Ivano,
if it happens frequently, I would recommend opening a support case.
The creation or deletion of a snapshot requires a quiesce of the nodes to obtain a consistent point-in-time image of the file system and/or update some internal structures, AFAIK. Quiesce is required for nodes at the
I always used the GUI for automating snapshots that were tagged with the YYMMDD
format so that they were accessible via the previous versions tab from CES
access.
This requires no locking if you have multiple GUI servers running, so in theory
snapshot creation is "HA". BUT if you shut down
Hello Andrew,
Thanks for your questions.
We're not experiencing any other issue/slowness during normal activity.
The storage is a Lenovo DSS appliance with a dedicated SSD enclosure/pool for
metadata only.
The two NSD servers have 750GB of RAM, and 618GB are configured as pagepool.
The
Hi all,
Since the subject of snapshots has come up, I also have a question ...
Snapshots can be created from the command line with mmcrsnapshot, and hence can
be automated via cron jobs, etc.
Snapshots can also be created from the Scale GUI. The GUI also provides its own
automation for the
Ivano,
How big is the filesystem in terms of number of files?
How big is the filesystem in terms of capacity?
Is the Metadata on Flash or Spinning disk?
Do you see issues when users do an ls of the filesystem, or only when you are
doing snapshots?
How much memory do the NSD servers have?
How
Dear all,
For a while now we have been experiencing an issue when dealing with snapshots.
Basically what happens is that when deleting a fileset snapshot (and maybe also
when creating new ones) the filesystem becomes inaccessible on the clients for
the duration of the operation (can take a few