if you are looking for a commercially supported solution, our Dataflow
product is purpose-built for this kind of task. a presentation that
covers some high-level aspects of it was given by me last year at one
of the Spectrum Scale meetings in the UK -->
'Filesystem quiesce failed' has nothing to do with open files.
What it means is that the filesystem couldn't flush dirty data and metadata
within a defined time to take a snapshot. This can be caused by too high
maxFilesToCache or pagepool settings.
To give you a simplified example (it's more
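as a quick hedged check of the two settings named above (same style of mm command used elsewhere in this thread):
mmlsconfig pagepool
mmlsconfig maxFilesToCache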
while it is true that you can back up the data with anything that can read a
POSIX filesystem, you will miss all the associated metadata like extended
attributes and ACLs.
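a hedged illustration of what a plain POSIX copy misses (the file path is made up):
mmgetacl /gpfs/fs0/somefile    # shows the ACL that cp or rsync won't carry over
mmlsattr -L /gpfs/fs0/somefile # shows GPFS attributes such as storage pool and immutability flags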
besides mmbackup (which uses Spectrum Protect), DDN also offers a product for
data management including backup/restore
Unfortunately more complicated :)
The consumption here is an estimate based on 512b inodes, which no newly
created filesystem has, as all new ones default to 4k. So if you have 4k inodes
you could easily need 2x the estimated value.
Then there are extended attributes, also not included here, etc.
So
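a hedged way to confirm which inode size a given filesystem actually uses (the filesystem name is illustrative):
mmlsfs fs0 -i    # reports the inode size in bytes: 512 on old filesystems, 4096 on newly created ones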
I know this will be very confusing, but the code works differently than one would
think (not sure this is documented anywhere). The number of subblocks across
pools of a filesystem is calculated based on the smallest pool's blocksize.
So given you have a 1MB blocksize in the system pool you will
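to make the arithmetic concrete (a hedged sketch, assuming the usual 8k subblock for a 1MB block): 1MB / 8k = 128 subblocks per block, and that count of 128 carries over to every other pool, so a 4MB data pool gets 4MB / 128 = 32k subblocks instead of the 8k you might expect.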
ed in your presentation). Do they do I/O tests in different ways?
thanks,
Alvise
From: gpfsug-discuss-boun...@spectrumscale.org
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Sven Oehme
[oeh...@gmail.com]
Sent: Thursday, March 21, 2019 6:35 PM
To: gpfsug main discussion list
Subject: R
Lots of details in a presentation I did last year before I left IBM -->
http://files.gpfsug.org/presentations/2018/Singapore/Sven_Oehme_ESS_in_CORAL_project_update.pdf
Sven
From: on behalf of Daniel Kidger
Reply-To: gpfsug main discussion list
Date: Thursday, March 21, 2019 at 10:15 AM
While Fred is right, in most cases you shouldn't see this; under heavy burst
create workloads before 5.0.2 you could even trigger out-of-space errors even though you
have plenty of space in the filesystem (very hard to reproduce, so unlikely to
hit for a normal end user). to address the issues there have
Hi,
using nsdSmallThreadRatio 1 is not necessarily correct, as it 'significantly
depends' (the most used word combination of performance engineers) on your
workload. to give some more background - on reads you need many more
threads for small i/os than for large i/os to get maximum performance, the
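for completeness, a hedged example of how the knob is set (the node class name is made up, and the right value really does depend on the workload as said above):
mmchconfig nsdSmallThreadRatio=1 -N nsdNodes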
Just to add a bit more detail to that: if you want to track down an individual
i/o or all i/o to a particular file, you can do this with mmfsadm dump iohist
(mmdiag doesn't give you all you need) :
so run /usr/lpp/mmfs/bin/mmfsadm dump iohist >iohist on the server as well as the
client :
I/O
If you run it on the client, it includes local queuing, network as well as NSD
server processing and the actual device I/O time.
if issued on the NSD server it contains processing and I/O time; the processing
shouldn't really add any overhead, but in some cases I have seen it
contributing.
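a minimal hedged sketch of capturing both sides for comparison (host names are made up):
ssh client01 /usr/lpp/mmfs/bin/mmfsadm dump iohist > iohist.client
ssh nsd01 /usr/lpp/mmfs/bin/mmfsadm dump iohist > iohist.server
for the same I/O, the difference between the client time and the server time is roughly the queuing plus network share.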
and i already talked about NUMA stuff at the CIUK usergroup meeting, i won't
volunteer for a 2nd advanced topic :-D
was the node you rebooted a client or a server that was running kswapd at
100% ?
sven
On Tue, Nov 27, 2018 at 12:09 PM Simon Thompson
wrote:
> The nsd nodes were running 5.0.1-2 (though we are just now rolling to 5.0.2-1
> I think).
>
>
>
> So is this memory pressure on the NSD nodes then? I
Hi,
now i need to swap back in a lot of information about GPFS i tried to swap
out :-)
i bet kswapd is not doing what you think the name suggests here, which
is handling swap space. i claim the kswapd thread is trying to throw
dentries out of the cache and what it tries to actually get rid
if this happens you should check a couple of things :
1. are you under memory pressure or, even worse, have started swapping.
2. is there any core running at ~0% idle - run top, press 1 and check the
idle column.
3. is there any single thread running at ~100% - run top, press shift-h
and check
Hi,
the best way to debug something like that is to start with top. start top
then press 1 and check if any of the cores has almost 0% idle while others
have plenty of CPU left. if that is the case you have one very hot thread.
to further isolate it you can press 1 again to collapse the cores,
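a hedged non-interactive equivalent of those steps (assuming the GPFS daemon runs as mmfsd and the sysstat package is installed):
mpstat -P ALL 1 3                            # batch form of 'top, then 1': watch the idle column per core
top -b -H -n 1 -p $(pgrep -x mmfsd) | head   # batch form of 'top, then shift-h': per-thread CPU for mmfsd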
Marc,
The issue with that is that you need multiple passes and things change in
between; it also significantly increases migration times. You will always miss
something or you need to manually correct. The right thing is to have 1 tool
that takes care of both, the bulk transfer and the
i am not sure if that was mentioned already, but in some version of V5.0.X,
based on my suggestion, a tool was added by Mark on an AS-IS basis (thanks
Mark) to do what you want with one exception :
/usr/lpp/mmfs/samples/ilm/mmxcp -h
Usage: /usr/lpp/mmfs/samples/ilm/mmxcp -t target -p strip_count
1.483945423 72170 TRACE_LOCK: unlock_vfs_m: cP
> 0xC90069346B68 holdCount 25
>
> 1.483945624 72170 TRACE_VNODE: gpfsRead exit: fast err
> 0
>
>1.483946831 72170 TRACE_KSVFS: ReleSG: sli 38 sgP
> 0xC90035E52F78 NotQuiesced vfsOp 2
>
>1.4
___
From: gpfsug-discuss-boun...@spectrumscale.org
on behalf of Sven Oehme
Sent: Thursday, October 18, 2018 7:09:56 PM
To: gpfsug main discussion list; gpfsug-disc...@gpfsug.org
Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping
Pete
Peter,
If the 2 operations weren't compatible you would have gotten a different
message.
To understand what the message means one needs to understand how the snapshot
code works.
When GPFS wants to take a snapshot it goes through multiple phases. It tries to
first flush all dirty data a
while most said here is correct, it can't explain the performance of 200 files
/sec and I couldn't resist jumping in here :-D
let's assume for a second each operation is synchronous and it's done by just 1
thread. 200 files / sec means 5 ms on average per file write. Let's be generous
and say
somebody did listen and remembered what i did say :-D
... and you are absolutely correct, there is no need for SSDs to get great
zero-length mdtest results; most people don't know that a create workload,
unless carefully executed, in general almost exclusively stresses a
filesystem client and has
ated to many threads trying to compete for the same buffer space.
>
> I will try to take the trace with trace=io option and see if i can find
> something.
>
> How do i turn off prefetching? Can i turn it off for a single node/client?
>
> Regards,
> Lohit
>
> On Sep 18, 20
condition that
> GPFS was not handling well.
>
> I am not sure if this issue is a repeat and I am yet to isolate the
> incident and test with increasing number of mmap threads.
>
> I am not 100 percent sure if this is related to mmap yet but just wanted
> to ask you if you have s
Hi Ken,
what the document is saying (or trying to) is that the behavior of data in
inode or metadata operations is not changed if HAWC is enabled, meaning if
the data fits into the inode it will be placed there directly instead of
writing the data i/o into a data recovery log record (which is what
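for context, a hedged sketch of how HAWC gets enabled at all, since it stays off until a threshold is set (the device name is made up; verify the option against your release):
mmchfs fs0 --write-cache-threshold 64K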
whose apps want to create a bazillion tiny files!
>
> So how do I do that?
>
> Thanks!
>
> —
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and
> Education
>
> kevin.buterba...@vanderbilt.edu - (615)875-
the number of subblocks is derived from the smallest blocksize in any pool of
a given filesystem. so if you pick a metadata blocksize of 1M the subblock will be 8k
in the metadata pool, but 4x that in the data pool if your data pool is
4M.
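a hedged way to verify this on an existing filesystem (the name is made up):
mmlsfs fs0 -f    # minimum fragment (subblock) size in bytes
mmlsfs fs0 -B    # block size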
sven
On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop wrote:
>
> [ 871.316987] [] cxiDeallocPageList+0x45/0x110
> [mmfslinux]
>
> [ 871.356886] [] ? _raw_spin_lock+0x10/0x30
>
> [ 871.389455] [] cxiFreeSharedMemory+0x12a/0x130
> [mmfslinux]
>
> [ 871.429784] [] kxFreeAllSharedMemory+0xe2/0x160
> [mmfs26]
>
> [ 871.
Hi,
what does numactl -H report ?
also check if this is set to yes :
root@fab3a:~# mmlsconfig numaMemoryInterleave
numaMemoryInterleave yes
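if it doesn't come back as yes, a hedged way to set it (my assumption is it only takes effect at the next daemon restart):
mmchconfig numaMemoryInterleave=yes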
Sven
On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) <
heiner.bill...@psi.ch> wrote:
> Hello,
>
>
>
> I have two nodes which hang on
by:gpfsug-discuss-boun...@spectrumscale.org
> --
>
>
>
> Hi Sven,
>
> What is the resulting indirect-block size with a 4mb metadata block size?
>
> Does the new sub-block magic mean that it will take up 32k, or will it
> occupy 128k?
>
> Che
negative impact
at least on controllers i have worked with.
hope this helps.
On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote:
> Hi, it's for a traditional NSD setup.
>
> --Joey
>
Joseph,
the subblocksize will be derived from the smallest blocksize in the
filesystem; given you specified a metadata block size of 512k, that's what
will be used to calculate the number of subblocks, even if your data pool is
4mb.
is this setup for a traditional NSD configuration or for GNR, as the
there is nothing wrong with running CES on NSD servers; in fact, if all CES
nodes have access to all LUNs of the filesystem, that's the fastest possible
configuration as you eliminate 1 network hop.
the challenge is always to do the proper sizing, so you don't run out of
CPU and memory on the nodes
a few more weeks and we have a better answer than dump pgalloc ;-)
On Wed, May 2, 2018 at 6:07 AM Peter Smith
wrote:
> "how do I see how much of the pagepool is in use and by what? I've looked
> at mmfsadm dump and mmdiag --memory and neither has provided me the
>
GPFS doesn't flush on close by default unless explicitly asked by the
application itself, but you can configure that :
mmchconfig flushOnClose=yes
if you use O_SYNC or O_DIRECT then each write ends up on the media before
we return.
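a hedged way to see the difference from the shell (the target path is made up):
dd if=/dev/zero of=/gpfs/fs0/t bs=1M count=100                # buffered: returns before data is on media
dd if=/dev/zero of=/gpfs/fs0/t bs=1M count=100 oflag=direct   # O_DIRECT: each write hits the media first
dd if=/dev/zero of=/gpfs/fs0/t bs=1M count=100 oflag=sync     # O_SYNC: same guarantee via a different flag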
sven
On Wed, Apr 11, 2018 at 7:06 AM Peter Serocka
Hi,
i guess you saw that in some of my presentations about the communication code
overhaul. we started in 4.2.X and since then added more and more NUMA
awareness to GPFS. Version 5.0 also has enhancements in this space.
sven
On Sun, Feb 25, 2018 at 8:54 AM Aaron Knister
Hi Lohit,
i am working with Ray on a mmap performance improvement right now, which
most likely has the same root cause as yours, see -->
http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html
the thread above is silent after a couple of back and forth, but Ray and i
have active
>
> *Von:* gpfsug-discuss-boun...@spectrumscale.org [mailto:
> gpfsug-discuss
Renar,
if you specify the filesystem blocksize of 1M during mmcrfs you don't have to
restart anything. scale 5 didn't change anything in the behaviour of
maxblocksize changes while the cluster is online; it only changed the
default passed to the blocksize parameter for creating a new filesystem. one
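a hedged illustration (the device and stanza file names are made up):
mmcrfs gpfs1 -F nsd.stanza -B 1M    # scale 5 only changed the default for -B, passing 1M explicitly still works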
ds
>
> Ray Coetzee
> Mob: +44 759 704 7060 <+44%207597%20047060>
>
> Skype: ray.coetzee
>
> Email: coetzee@gmail.com
>
>
> On Fri, Jan 12, 2018 at 8:57 PM, Sven Oehme <oeh...@gmail.com> wrote:
>
>> is this primary read or write ?
>>
>
what version of Scale are you using right now ?
On Fri, Jan 12, 2018 at 2:29 AM Ray Coetzee wrote:
> I'd like to ask the group of their experiences in improving the
> performance of applications that use mmap calls against files on Spectrum
> Scale.
>
> Besides using an
, Aaron Knister <aaron.s.knis...@nasa.gov> wrote:
>
> Thanks, Sven. Understood!
>
> On 12/19/17 3:20 PM, Sven Oehme wrote:
>
> Hi,
>
>
> the zero padding was never promoted into a GA stream, it was an
>
> experiment to prove we are on the right track when
i don't know if that works with Cisco, but i use 50 and 100m cables for 40
as well as 100Gbit in my lab between 2 Mellanox switches :
http://www.mellanox.com/products/interconnect/ethernet-active-optical-cables.php
as Paul pointed out, one of the very first things one needs to do after
adding an
ng if you could explain the performance
> difference between the no zero padding code and the > 32 subblock code
> since given your example of 32K files and 16MB block size I figure both
> cases ought to write the same amount to disk.
>
> Thanks!
>
> -Aaron
>
>
>
i thought i answered that already, but maybe i just thought about answering
it and then forgot about it :-D
so yes, more than 32 subblocks per block significantly increases the
performance of filesystems with small files; for the sake of the argument
let's say 32k in a large-block filesystem again
we cannot commit to timelines on mailing lists, but this is a known issue
and will be addressed in a future release.
sven
On Mon, Oct 23, 2017, 11:23 AM Bryan Banister
wrote:
> This becomes very disruptive when you have to add or remove many NFS
> exports. Is it
if this is a new cluster and you use reasonably new HW, i probably would
start with just the following settings on the clients :
pagepool=4g,workerThreads=256,maxStatCache=0,maxFilesToCache=256k
depending on what storage you use and what workload you have, you may have
to set a couple of other
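the same settings as a hedged mmchconfig call (the node class is made up; pagepool and maxFilesToCache changes typically need a daemon restart to fully apply):
mmchconfig pagepool=4g,workerThreads=256,maxStatCache=0,maxFilesToCache=256k -N clientNodes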
on
>
> On 9/8/17 5:21 PM, Sven Oehme wrote:
> > Hi,
> >
> > the code assumption is that the underlying device has no volatile write
> > cache, i was absolutely sure we have that somewhere in the FAQ, but i
> > couldn't find it, so i will talk to somebody to correc
can you please share the entire command line you are using ?
also gpfs version and mmlsconfig output would help, as well as whether this is a
shared storage filesystem or a system using local disks.
thx. Sven
On Wed, Sep 13, 2017 at 5:19 PM wrote:
> So we have a number of very
I am not sure what exactly you are looking for, but all blockdevices are
opened with O_DIRECT; we never cache anything at this layer.
On Thu, Sep 7, 2017, 7:11 PM Aaron Knister wrote:
> Hi Everyone,
>
> This is something that's come up in the past and has recently
t;
> Are these settings still needed, or is this also tackled in the code?
>
> Thank you!!
>
> Cheers,
> Kenneth
>
>
>
> On 02/09/17 00:42, Sven Oehme wrote:
>
> Hi Ed,
>
> yes the defaults for that have changed for customers who had not
> overridden
Hi Ed,
yes the defaults for that have changed for customers who had not overridden
the default settings. the reason we did this was that many systems in the
field, including all ESS systems that come pre-tuned, were manually changed
to 8k from the 16k default due to better performance that was
a trace during an mmfsck with the checksum parameters turned on would reveal
it.
the support team should be able to give you specific triggers to cut a
trace during checksum errors , this way the trace is cut when the issue
happens and then from the trace on server and client side one can extract
> >> anyway, our current question is: if these are hardware issues, is there
> >> anything in gpfs client->nsd (on the network side) that would detect
> >> such errors. ie can we trust the data (and metadata).
> >> i was under the impression that client to disk is not
> covered, but i
> assumed that at least client to nsd (the network part) was checksummed.
>
> stijn
>
>
> On 08/02/2017 09:10 PM, Sven Oehme wrote:
> > ok, i think i understand now, the data was already corrupted. the config
> > change i proposed only prevents a potentially k
wei...@ugent.be>
wrote:
> yes ;)
>
> the system is in preproduction, so nothing that can't stopped/started in
> a few minutes (current setup has only 4 nsds, and no clients).
> mmfsck triggers the errors very early during inode replica compare.
>
>
> stijn
>
> On 08/02/2017
the very first thing you should check is if you have this setting set :
mmlsconfig envVar
envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
MLX5_USE_MUTEX 1
if that doesn't come back the way above you need to set it :
mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1"
it can happen for multiple reasons; one is a linux install, but unfortunately
there are far simpler explanations. Linux as well as the BIOS in
servers from time to time looks for empty disks and puts a GPT label on a disk
if it doesn't have one, etc. this thread explains a lot of this
:
Hi,
i talked with a few others to confirm this, but unfortunately this is a
limitation of the code today (maybe not well documented, which we will look
into). Encryption only encrypts data blocks; it doesn't encrypt metadata.
Hence, if encryption is enabled, we don't store data in the inode,
while i really like competition on SpecSFS, the claims from the WekaIO
people are, let's say, 'alternative facts' at best.
The Spectrum Scale results were done on 4 nodes with 2 flash storage
devices attached; they compare this to a WekaIO system with 14 times more
memory (14 TB vs 1TB), 120 SSDs
end-to-end data integrity is very important, and the reason it hasn't been
done in Scale is not because it's not important; it's because it's very hard
to do without impacting performance in a very dramatic way.
imagine your raid controller blocksize is 1MB and your filesystem blocksize
is 1MB. if
yes, as long as you haven't pushed anything to it (meaning the pagepool got under
enough pressure to free up space) you won't see anything in the stats :-)
sven
On Mon, Jun 5, 2017 at 7:00 AM Dave Goodbourn wrote:
> OK I'm going to hang my head in the corner...RTFM...I've not
if you are using O_DIRECT calls they will be ignored by default for LROC,
same for encrypted data.
how exactly are you testing this?
On Mon, Jun 5, 2017 at 6:50 AM Dave Goodbourn wrote:
> Thanks Bob,
>
> That pagepool comment has just answered my next question!
>
> But it
Hi,
the very first thing to do would be to run mmfsadm dump iohist instead of
mmdiag --iohist this one time (we actually add this info in the next release to
mmdiag --iohist) to see if the thread type reveals something :
07:25:53.578522 W data 1:20260249600 8192 35.930488076
The reason is the default setting of :
verbsRdmasPerConnection: 16
you can increase this; on smaller clusters i run some with 1024, but
it's not advised to run this on 100's of nodes unless you know exactly
what you are doing.
i would start by doubling it to 32 and see how much of
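a hedged sketch of that first step (whether -i applies it immediately for this particular parameter is an assumption worth verifying):
mmchconfig verbsRdmasPerConnection=32 -i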
and they all will have high settings.
sven
--
Sven Oehme
Scalable Storage Research
email: oeh...@us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
--
From: Mark Bush <mark.b...@siriuscom.com>
To: gpfsu
let me clarify and get back; i am not 100% sure on a cross cluster. i
think the main point was that the FS manager for that fs should be
reassigned (which could also happen via mmchmgr) and then the individual
clients that mount that fs restarted, but i will double check and reply
later.
On
well, it's a bit complicated, which is why the message is there in the first
place.
the reason is, there is no easy way to tell except by dumping the stripe group on
the filesystem manager, checking what log group your particular node is
assigned to, and then checking the size of the log group.
as soon as
changes in the ganesha management code were made in April 2016 to reduce the
need for a high maxFilesToCache value; the ganesha daemon adjusts its allowed
file cache by reading the maxFilesToCache value and then reducing its
allowed NOFILE value. the code shipped with the 4.2.2 release.
you want a high
>
> The fs has 40M inodes allocated and 12M free.
>
> -Aaron
>
> On 3/24/17 1:41 PM, Sven Oehme wrote:
> > ok, that seems a different problem then i was thinking.
> > can you send output of mmlscluster, mmlsconfig, mmlsfs all ?
> > also are you getting
d server or manager node that's running full
> > throttle across all cpus. There is one that's got relatively high CPU
> > utilization though (300-400%). I'll send a screenshot of it in a sec.
> >
> > no zimon yet but we do have other tools to see cpu utilization.
> >
8 CPUs)
-
Segmentation fault
-Aaron
while this is happening run top and see if there is very high cpu
utilization at this time on the NSD server.
if there is, run perf top (you might need to install the perf command) and see
if the top cpu contender is a spinlock. if so, send a screenshot of perf
top as i may know what that is and
rlin
> Sr Principal Storage Engineer, Nuance
>
>
>
>
>
>
>
> *From: *<gpfsug-discuss-boun...@spectrumscale.org> on behalf of Sven
> Oehme <oeh...@gmail.com>
> *Reply-To: *gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
> *Date: *W
a couple of years ago Tridge demonstrated things you can do with the DMAPI
interface and even delivered some unsupported example code to demonstrate
it :
https://www.samba.org/~tridge/hacksm/
keep in mind that the DMAPI interface has some severe limitations in terms
of scaling; it can only run on
SD server queues?
all this waiter shows is that you have more in flight than the node or
connection can currently serve. the reasons for that can be
misconfiguration, or you simply ran out of resources on the node, not the
connection. with the latest code you shouldn't see this anymore for node limits
as the system
.
On Wed, Jan 25, 2017 at 10:20 PM Matt Weil <mw...@wustl.edu> wrote:
>
>
> On 1/25/17 3:00 PM, Sven Oehme wrote:
>
> Matt,
>
> the assumption was that the remote devices are slower than LROC. there is
> some attempts in the code to not schedule more than a maximum
en thanks, looks like I'll be checking out grafana.
>
>
> Richard
>
>
> --
> *From:* gpfsug-discuss-boun...@spectrumscale.org <
> gpfsug-discuss-boun...@spectrumscale.org> on behalf of Sven Oehme <
> oeh...@gmail.com>
> *
Matt,
the assumption was that the remote devices are slower than LROC. there are
some attempts in the code to not schedule more than a maximum number of
outstanding i/os to the LROC device, but this doesn't help in all cases and
depends on what kernel-level parameters are set for the device.
Hi,
you either need to request access to GPFS 4.2.1.0 efix16 via your PMR or
upgrade to 4.2.2.1; both contain the fixes required.
Sven
On Mon, Jan 23, 2017 at 6:27 AM Sven Oehme <oeh...@gmail.com> wrote:
> Aaron,
>
> hold a bit with the upgrade , i just got word that wh
On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme <oeh...@gmail.com> wrote:
> Then i would suggest moving up to at least 4.2.1.LATEST , there is a high
> chance your problem might already be fixed.
>
> i see 2 potential areas that got significant improvements , Token Manager
> reco
and multiple log files in parallel .
Sven
On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister <aaron.s.knis...@nasa.gov>
wrote:
> It's at 4.1.1.10.
>
> On 1/22/17 11:12 PM, Sven Oehme wrote:
> > What version of Scale/ GPFS code is this cluster on ?
> >
> > --
It is not caching however. I will restart gpfs to see if that makes it
> start working.
>
> On 12/29/16 10:18 AM, Matt Weil wrote:
>
>
>
i agree that is a very long name; given this is a nvme device it should
show up as /dev/nvmeXYZ.
i suggest to report exactly that in nsddevices and retry.
i vaguely remember we have some fixed-length device name limitation, but i
don't remember what the length is, so this would be my first guess
> as many as possible and both
>
> have maxFilesToCache 128000
>
> and maxStatCache 4
>
> do these affect what sits on the LROC as well? Are those too small?
> 1 million seemed excessive.
>
> On 12/20/16 11:03 AM, Sven Oehme wrote:
>
> how many files do you want
i am not sure i understand your comment with 'persistent'. do you mean when
you create a nsddevice on a nvme device it won't get recognized after a
restart ?
if that's what you mean there are 2 answers; short term, you need to add
a /var/mmfs/etc/nsddevices script to your node that simply adds an
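a minimal hedged sketch of such a script (device names are made up; check /usr/lpp/mmfs/samples/nsddevices.sample for the exact output format and return-code convention):
#!/bin/ksh
# /var/mmfs/etc/nsddevices - tell GPFS about devices its default discovery misses
for dev in nvme0n1 nvme1n1; do
  [ -e /dev/$dev ] && echo "$dev generic"
done
# the exit code controls whether GPFS also runs its builtin discovery afterwards
exit 1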
Yes and no :)
While Olaf is right that it needs two independent blockdevices, partitions are
just fine.
So one could in fact have a 200g ssd as a boot device and partition
it, let's say
30g os
20g hawc
150g lroc
you have to keep in mind that lroc and hawc have 2 very different
requirements on
it all depends on workload. we will publish a paper comparing mixed workloads
with and without LROC pretty soon.
most numbers i have seen show anywhere between 30% - 1000% (very rare case)
improvement, so it's for sure worth a test.
sven
On Fri, Oct 14, 2016 at 6:31 AM Sobey, Richard A
many customers
in the field are using 1MB or even smaller blocksize on RAID stripes of 2 MB or
above, and your performance will be significantly impacted by that.
Sven
--
Sven Oehme
Scalable Storage Research
email: oeh...@us.ibm.com
Phone: +1 (408) 824-8904
IBM
Because they both require a different distribution of OFED, which are mutually
exclusive to install.
in theory if you deploy plain OFED it might work, but it will be hard to find
somebody to support that.
Sent from IBM Verse
Aaron Knister --- Re: [gpfsug-discuss] EDR and omnipath ---
so let's start with some simple questions.
when you say mmbackup takes ages, what version of gpfs code are you running ?
how do you execute the mmbackup command ? exact parameters would be useful.
what HW are you using for the metadata disks ?
how much capacity (df -h) and how many inodes (df -i) ?
they should get started as soon as you shut down via mmshutdown
could you check a node where the processes are NOT started and simply run
mmshutdown on this node to see if they get started ?
On Thu, Jul 28, 2016 at 10:57 AM, Bryan Banister
wrote:
> I now see that
Hi,
this is an undocumented mmpmon call, so you are on your own, but here is the
correct description :
_n_
IP address of the node responding. This is the address by which GPFS knows
the node.
_nn_
The name by which GPFS knows the node.
_rc_
The reason/error code. In this case, the reply
If you can wait a few more months we will have stats for this in Zimon.
Sven
On Apr 15, 2016 12:02 PM, "Oesterlin, Robert"
wrote:
> This command is just using ssh to all the nodes and dumping the waiter
> information and collecting it. That means if the node is down,
.
just a few thoughts :-D
sven
--
Sven Oehme
Scalable Storage Research
email: oeh...@us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
--
From: Zachary Giles <zgi...@gmail.com>
To: gpfsu
it's direct SAS attached.
--
Sven Oehme
Scalable Storage Research
email: oeh...@us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
--
From: "Simon Thompson (Research Computing - IT Ser
Matt,
this was true for a while, but got fixed; NetBackup has added support for
GPFS metadata and ACLs in newer versions.
more details can be read here :
https://www.veritas.com/support/en_US/article.79433
sven
On Thu, Dec 3, 2015 at 11:34 AM, Matt Weil wrote:
>