Hi Jim,
The common non-GPFS-specific way is to use a tool that dumps all of your
filesystem metadata into an SQL database and then you can have a webapp
that makes nice graphs/reports from the SQL database, or do your own
queries.
The Free Software example is "Robinhood" (use the POSIX scanner).
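From memory, the workflow is roughly the following (the config path is hypothetical and the flags are from my recollection of Robinhood v3, so check the docs for your version):

  robinhood --scan --once -f /etc/robinhood.d/myfs.conf   # walk the tree, load the DB
  rbh-report --fs-info -f /etc/robinhood.d/myfs.conf      # summary report from the DB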
If the waiters are on a compute node and there is not much user work
running there, then the open files listed by lsof will probably be the
culprits.
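For example (a sketch; /gpfs/fs0 is a hypothetical mount point):

  mmdiag --waiters    # run on the node that shows the waiters
  lsof /gpfs/fs0      # processes with files open on that filesystem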
On Thu, Oct 10, 2019 at 1:44 PM Damir Krstic wrote:
> is it possible via some set of mmdiag --waiters or mmfsadm dump ? to
> figure out which
son" behalf of skyl...@uw.edu> wrote:
>
> IIRC, maxMBpS isn't really a limit, but more of a hint for how GPFS
> should
> use its in-memory buffers for read prefetches and dirty writes.
>
> On Mon, Jun 17, 2019 at 09:31:38AM -0700, Alex Chekholko wrote:
> > H
Hi Chris,
I think the next thing to double-check is when the maxMBpS change takes
effect. You may need to restart the NSD servers. Otherwise I think your plan is
sound.
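A quick way to check, rather than guessing (a sketch; 16384 is just an example value):

  mmchconfig maxMBpS=16384 -i     # -i = apply immediately and persist
  mmdiag --config | grep maxMBpS  # value the local mmfsd is actually using
  mmlsconfig maxMBpS              # value stored in the configuration

If mmdiag on the NSD servers still shows the old value after mmchconfig -i, then a restart is needed.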
Regards,
Alex
On Mon, Jun 17, 2019 at 9:24 AM Christopher Black
wrote:
> Our network team sometimes needs to take down sections
Hi,
I have tried this before and I would like to temper your expectations.
If you use a placement policy to allow users to write any files into your
"small" pool (e.g. by directory), they will get E_NOSPC when your small
pool fills up. And they will be confused because they can't see the pool
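If you do go the placement-policy route anyway, note that placement rules can only match attributes known at create time (fileset, file name, uid/gid), so "by directory" in practice means "by fileset". A minimal sketch with hypothetical pool and fileset names, installed with mmchpolicy:

  /* route one fileset to the small SSD pool; everything else to the big pool */
  RULE 'fast'    SET POOL 'ssd_small' FOR FILESET ('scratch')
  RULE 'default' SET POOL 'data'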
The re-striping uses a lot of I/O, so if your goal is user-facing
performance, the re-striping is definitely hurting in the short term and is
of questionable value in the long term, depending on how much churn there
is on your filesystem.
One way to split the difference would be to run your
Hi Kevin,
Why not do single SSD devices and then just use -m DefaultMetadataReplicas
= 3 and -M MaxMetadataReplicas = 3 for your mmcrfs? And maybe you can
even get away with -m 2 -M 3.
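For example (hypothetical device and stanza file names):

  mmcrfs gpfs1 -F nsd_ssd.stanza -m 3 -M 3
  mmcrfs gpfs1 -F nsd_ssd.stanza -m 2 -M 3   # the cheaper variant

Note that -M cannot be raised after creation, while -m can be changed later with mmchfs, so erring high on -M is cheap.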
You will get higher performance overall by having more devices. You will
get good redundancy with GPFS metadata replication.
Hey Dave,
Can you say more about what you are trying to accomplish by doing the
rebalance? IME, the performance hit from running the rebalance was higher
than the performance hit from writes being directed to a subset of the
disks.
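Before committing to a rebalance, it is worth quantifying the imbalance first (a sketch; the device name and node class are hypothetical):

  mmdf gpfs0                        # per-NSD free space: how uneven is it really?
  mmrestripefs gpfs0 -b -N helpers  # if you must: -b rebalances, -N limits which nodes do the work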
If you have any churn of the data, eventually the data will spread back out across all the disks on its own.
Hi,
My experience has been that you could spend the same money to just make
your main pool more performant. Instead of doing two data transfers (one
from cold pool to AFM or hot pools, one from AFM/hot to client), you can
just make the direct access of the data faster by adding more resources to your main pool.
2.8TB seems quite high for only 350M inodes. Are you sure you only have
metadata in there?
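Back-of-envelope, assuming 4 KiB inodes:

  350e6 inodes * 4 KiB ~= 1.4 TB

so 2.8 TB is roughly double what the inodes alone account for; the remainder would have to be directories, indirect blocks, or replication.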
On Tue, Jan 23, 2018 at 9:25 AM, Frederick Stock wrote:
> One possibility is the creation/expansion of directories or allocation of
> indirect blocks for large files.
>
> Not sure if
Hi Damir,
I'm not sure whether this applies to you, but this was my experience.
GPFS absolutely depends on a reliable network interconnect. If anything
goes wrong on the network layer, GPFS may not be able to recover.
Do you have visibility and monitoring on all the low-level network counters?
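A few hypothetical starting points (interface names will differ on your systems):

  ethtool -S eth0 | egrep -i 'err|drop|fifo'   # per-NIC error/drop counters
  ibqueryerrors                                # per-port error counters on IB fabrics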
Hey Aaron,
Can you define your sizes for "large blocks" and "small files"? If you
dial one up and the other down, your performance will be worse. And in any
case it's a pathological corner case so it shouldn't matter much for your
workflow, unless you've designed your system with the wrong block size for your workload.
One of the parameters that you need to choose at filesystem creation time
is the block allocation map type, set with the -j {cluster|scatter} parameter to mmcrfs:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_blkalmap.htm#ballmap
If you use "cluster", you
John,
I think a "philosophical" difference between GPFS code and newer
filesystems which were written later, in the age of "commodity hardware",
is that GPFS expects the underlying hardware to be very reliable. So
"disks" are typically RAID arrays available via multiple paths. And
network links
Hi Kevin,
IMHO, safe to just run it again.
You can also run it with '-I test -L 6' again and look through the
output. But I don't think you can "break" anything by having it scan
and/or move data.
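For reference, the dry-run/real-run pair I have in mind (assuming this is mmapplypolicy; the device and policy file names are placeholders):

  mmapplypolicy gpfs0 -P migrate.pol -I test -L 6   # evaluate only, maximum verbosity
  mmapplypolicy gpfs0 -P migrate.pol -I yes         # actually move the data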
Can you post the full command line that you use to run it?
The behavior you describe is odd;
...and quota adjustments are manual and infrequent,
and I'm guessing the adjustments are pro-rated.
--
Alex Chekholko ch...@stanford.edu
are using the 4k inode size).
I have a system where the SSDs are regularly doing 6-7k IOPS for
metadata stuff. If those same 7k IOPS were spread out over the slow
data LUNs... which only have like 100 IOPS per 8+2P LUN... I'd be
consuming 700 disks just for metadata IOPS.
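That is just:

  7000 IOPS / 100 IOPS per LUN = 70 LUNs; 70 LUNs * 10 drives (8+2P) = 700 drives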
--
Alex Chekholko ch...@stanford.edu
51200 51200 0 none | 1663212 004 none
[root@scg-gs0 ~]# mmlsfileset gsfs0 | grep gbsc
projects.gbsc            Linked    /srv/gsfs0/projects/gbsc
Regards,
--
Alex Chekholko ch...@stanford.edu
...asking for a commitment *to* implement it, just
that I'm not asking for something seemingly simple that's actually
fairly hard to implement)?
-Aaron
--
Alex Chekholko ch...@stanford.edu
On 04/12/2016 04:54 AM, Oesterlin, Robert wrote:
For my larger clusters, I dump the cluster waiters on a regular basis
(once a minute: mmlsnode -N waiters -L), count the types and dump them
into a database for graphing via Grafana.
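A minimal sketch of that loop (assumes an InfluxDB-style endpoint at a hypothetical localhost:8086, and that the waiter reason is the last whitespace-separated field; adjust the awk to your actual output):

  mmlsnode -N waiters -L | awk '{print $NF}' | sort | uniq -c | \
  while read count type; do
    curl -s -XPOST 'http://localhost:8086/write?db=gpfs' \
      --data-binary "waiters,type=${type} count=${count}"
  done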
--
Alex Chekholko ch...@stanford.edu 347-401-4860