Re: [gpfsug-discuss] AFM Alternative?

2020-02-26 Thread Sven Oehme
if you are looking for a commercially supported solution, our Dataflow product is purpose-built for this kind of task. A presentation covering some high-level aspects of it was given by me last year at one of the Spectrum Scale meetings in the UK -->

Re: [gpfsug-discuss] Unkillable snapshots

2020-02-20 Thread Sven Oehme
Filesystem quiesce failed has nothing to do with open files. What it means is that the filesystem couldn’t flush dirty data and metadata within a defined time to take a snapshot. This can be caused by too-high maxFilesToCache or pagepool settings. To give you a simplified example (it’s more
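As a rough illustration (not part of the original message), the settings mentioned above could be checked with mmlsconfig before deciding whether to lower them:
  mmlsconfig pagepool
  mmlsconfig maxFilesToCache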

Re: [gpfsug-discuss] Backup question

2019-08-29 Thread Sven Oehme
while it is true that you can back up the data with anything that can read a POSIX filesystem, you will miss all the associated metadata such as extended attributes and ACLs. Besides mmbackup (which uses Spectrum Protect), DDN also offers a product for data management including backup/restore

Re: [gpfsug-discuss] advanced filecache math

2019-05-09 Thread Sven Oehme
Unfortunately it's more complicated :) The consumption here is an estimate based on 512-byte inodes, which no newly created filesystem has, as all new filesystems default to 4k. So if you have 4k inodes you could easily need 2x the estimated value. Then there are extended attributes, also not added here, etc. So

Re: [gpfsug-discuss] GPFS v5: Blocksizes and subblocks

2019-03-26 Thread Sven Oehme
I know this will be very confusing, but the code works differently than one would think (not sure this is documented anywhere). The number of subblocks across pools of a filesystem is calculated based on the smallest pool's blocksize. So given you have a 1MB blocksize in the system pool you will

Re: [gpfsug-discuss] Clarification about blocksize in standard gpfs and GNR

2019-03-22 Thread Sven Oehme
ed in your presentation). Do they do I/O test in different ways? thanks, Alvise

Re: [gpfsug-discuss] Clarification about blocksize in standard gpfs and GNR

2019-03-21 Thread Sven Oehme
Lots of details in a presentation I did last year before I left IBM --> http://files.gpfsug.org/presentations/2018/Singapore/Sven_Oehme_ESS_in_CORAL_project_update.pdf Sven

Re: [gpfsug-discuss] Question about inodes increase

2019-03-06 Thread Sven Oehme
While Fred is right that in most cases you shouldn't see this, under heavy burst create workloads before 5.0.2 you can even trigger out-of-space errors even if you have plenty of space in the filesystem (very hard to reproduce, so unlikely to hit for a normal end user). To address the issues there have

Re: [gpfsug-discuss] Clarification of mmdiag --iohist output

2019-02-28 Thread Sven Oehme
Hi, using nsdSmallThreadRatio 1 is not necessarily correct, as it 'significantly depends' (the most-used phrase of performance engineers) on your workload. To give some more background: on reads you need many more threads for small I/Os than for large I/Os to get maximum performance, the
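For illustration only (the right value depends on your workload, as the message says; the node class name below is hypothetical), the setting could be inspected and adjusted roughly like this:
  mmlsconfig nsdSmallThreadRatio
  mmchconfig nsdSmallThreadRatio=1 -N nsdNodes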

Re: [gpfsug-discuss] Clarification of mmdiag --iohist output

2019-02-19 Thread Sven Oehme
Just to add a bit more detail to that: if you want to track down an individual I/O, or all I/O to a particular file, you can do this with mmfsadm dump iohist (mmdiag doesn’t give you all you need). So run /usr/lpp/mmfs/bin/mmfsadm dump iohist >iohist on the server as well as the client : I/O
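A minimal sketch of that capture (output file names are arbitrary):
  # on the NSD server
  /usr/lpp/mmfs/bin/mmfsadm dump iohist > iohist.server
  # on the client
  /usr/lpp/mmfs/bin/mmfsadm dump iohist > iohist.client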

Re: [gpfsug-discuss] Clarification of mmdiag --iohist output

2019-02-18 Thread Sven Oehme
If you run it on the client, it includes local queuing, the network, as well as NSD server processing and the actual device I/O time. If issued on the NSD server it contains processing and I/O time; the processing shouldn't really add any overhead, but in some cases I have seen it contributing.

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Sven Oehme
and I already talked about NUMA stuff at the CIUK usergroup meeting, I won't volunteer for a 2nd advanced topic :-D On Tue, Nov 27, 2018 at 12:43 PM Sven Oehme wrote: > was the node you rebooted a client or a server that was running kswapd at > 100% ? > > sven > > > On Tu

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Sven Oehme
Was the node you rebooted a client, or a server that was running kswapd at 100%? Sven On Tue, Nov 27, 2018 at 12:09 PM Simon Thompson wrote: > The nsd nodes were running 5.0.1-2 (though we just now rolling to 5.0.2-1 > I think). > > > > So is this memory pressure on the NSD nodes then? I

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Sven Oehme
Hi, now I need to swap back in a lot of information about GPFS I tried to swap out :-) I bet kswapd is not doing what you think the name suggests here, which is handling swap space. I claim the kswapd thread is trying to throw dentries out of the cache, and what it tries to actually get rid

Re: [gpfsug-discuss] Hanging file-systems

2018-11-27 Thread Sven Oehme
If this happens you should check a couple of things: 1. Are you under memory pressure, or even worse, have you started swapping? 2. Is there any core running at ~0% idle? Run top, press 1 and check the idle column. 3. Is there any single thread running at ~100%? Run top, press shift-h and check
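A quick way to run through those checks (a sketch; the interactive top keys are the ones described above):
  free -g; vmstat 1 5   # 1. memory pressure / swapping
  top                   # 2. press 1, look for a core at ~0% idle
  top -H                # 3. per-thread view (equivalent to shift-h), look for a thread pinned at ~100%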

Re: [gpfsug-discuss] mmfsd recording High CPU usage

2018-11-21 Thread Sven Oehme
Hi, the best way to debug something like that is to start with top. Start top, then press 1 and check if any of the cores is at almost 0% idle while others have plenty of CPU left. If that is the case you have one very hot thread. To further isolate it you can press 1 again to collapse the cores,

Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp

2018-10-22 Thread Sven Oehme
Marc, the issue with that is that you need multiple passes and things change in between; it also significantly increases migration times. You will always miss something, or you need to correct it manually. The right thing is to have one tool that takes care of both the bulk transfer and the

Re: [gpfsug-discuss] Best way to migrate data

2018-10-22 Thread Sven Oehme
I am not sure if that was mentioned already, but in some version of V5.0.X, based on my suggestion, a tool was added by Marc on an as-is basis (thanks Marc) to do what you want, with one exception: /usr/lpp/mmfs/samples/ilm/mmxcp -h Usage: /usr/lpp/mmfs/samples/ilm/mmxcp -t target -p strip_count

Re: [gpfsug-discuss] GPFS, Pagepool and Block size -> Performance reduces with larger block size

2018-10-22 Thread Sven Oehme
1.483945423 72170 TRACE_LOCK: unlock_vfs_m: cP 0xC90069346B68 holdCount 25 1.483945624 72170 TRACE_VNODE: gpfsRead exit: fast err 0 1.483946831 72170 TRACE_KSVFS: ReleSG: sli 38 sgP 0xC90035E52F78 NotQuiesced vfsOp 2 1.4

Re: [gpfsug-discuss] Can't take snapshots while re-striping

2018-10-18 Thread Sven Oehme

Re: [gpfsug-discuss] Can't take snapshots while re-striping

2018-10-18 Thread Sven Oehme
Peter, if the two operations weren't compatible you would have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to take a snapshot it goes through multiple phases. It tries to first flush all dirty data a

Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS

2018-10-17 Thread Sven Oehme
While most of what was said here is correct, it can't explain the performance of 200 files/sec, and I couldn't resist jumping in here :-D Let's assume for a second each operation is synchronous and it's done by just 1 thread. 200 files/sec means 5 ms on average per file write. Let's be generous and say
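To spell out that arithmetic: with a single synchronous thread, 1000 ms / 200 files = 5 ms per create-and-write. Purely as an illustration (not a claim from the original message), N independent threads doing the same work would scale to roughly N x 200 files/sec until some shared resource becomes the bottleneck.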

Re: [gpfsug-discuss] Metadata with GNR code

2018-09-21 Thread Sven Oehme
Somebody did listen and remembered what I said :-D ... and you are absolutely correct, there is no need for SSDs to get great zero-length mdtest results. Most people don't know that create workloads, unless carefully executed, in general almost exclusively stress a filesystem client and have

Re: [gpfsug-discuss] GPFS, Pagepool and Block size -> Performance reduces with larger block size

2018-09-19 Thread Sven Oehme
ated to many threads trying to compete for the same buffer space. > > I will try to take the trace with trace=io option and see if can find > something. > > How do i turn of prefetching? Can i turn it off for a single node/client? > > Regards, > Lohit > > On Sep 18, 20

Re: [gpfsug-discuss] GPFS, Pagepool and Block size -> Performance reduces with larger block size

2018-09-18 Thread Sven Oehme
condition that > GPFS was not handling well. > > I am not sure if this issue is a repeat and I am yet to isolate the > incident and test with increasing number of mmap threads. > > I am not 100 percent sure if this is related to mmap yet but just wanted > to ask you if you have s

Re: [gpfsug-discuss] system.log pool on client nodes for HAWC

2018-09-03 Thread Sven Oehme
Hi Ken, what the document is saying (or trying to) is that the behavior of data-in-inode or metadata operations is not changed if HAWC is enabled, meaning if the data fits into the inode it will be placed there directly instead of writing the data I/O into a data recovery log record (which is what

Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Sven Oehme
whose apps want to create a bazillion tiny files! > So how do I do that? > Thanks! > Kevin Buterbaugh

Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Sven Oehme
The number of subblocks is derived from the smallest blocksize in any pool of a given filesystem. So if you pick a metadata blocksize of 1M, the subblock will be 8k in the metadata pool, but 4x that in the data pool if your data pool is 4M. Sven On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop wrote: >
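Worked out with the numbers above: 1 MiB (the smallest pool) / 8 KiB subblock = 128 subblocks per block; the 4 MiB data pool keeps the same 128 subblocks per block, so its subblock is 4 MiB / 128 = 32 KiB, i.e. 4x the 8 KiB of the metadata pool.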

Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown

2018-07-12 Thread Sven Oehme
[ 871.316987] [] cxiDeallocPageList+0x45/0x110 [mmfslinux] [ 871.356886] [] ? _raw_spin_lock+0x10/0x30 [ 871.389455] [] cxiFreeSharedMemory+0x12a/0x130 [mmfslinux] [ 871.429784] [] kxFreeAllSharedMemory+0xe2/0x160 [mmfs26] [ 871.

Re: [gpfsug-discuss] /sbin/rmmod mmfs26 hangs on mmshutdown

2018-07-11 Thread Sven Oehme
Hi, what does numactl -H report ? also check if this is set to yes : root@fab3a:~# mmlsconfig numaMemoryInterleave numaMemoryInterleave yes Sven On Wed, Jul 11, 2018 at 6:40 AM Billich Heinrich Rainer (PSI) < heiner.bill...@psi.ch> wrote: > Hello, > > > > I have two nodes which hang on
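If it is not set, a sketch of how it would typically be enabled (the exact scope flag and the need for a daemon restart are assumptions to verify):
  numactl -H
  mmlsconfig numaMemoryInterleave
  mmchconfig numaMemoryInterleave=yes -N <node>   # <node> is a placeholder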

Re: [gpfsug-discuss] subblock sanity check in 5.0

2018-07-02 Thread Sven Oehme
> Hi Sven, > What is the resulting indirect-block size with a 4mb metadata block size? > Does the new sub-block magic mean that it will take up 32k, or will it > occupy 128k? > Che

Re: [gpfsug-discuss] subblock sanity check in 5.0

2018-07-01 Thread Sven Oehme
negative impact, at least on controllers I have worked with. Hope this helps. On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza wrote: > Hi, it's for a traditional NSD setup. > > --Joey > > On 6/26/18 12:21 AM, Sven Oehme wrote: > > Joseph, > > the subblocksize will be derived

Re: [gpfsug-discuss] subblock sanity check in 5.0

2018-06-26 Thread Sven Oehme
Joseph, the subblocksize will be derived from the smallest blocksize in the filesystem. Given you specified a metadata block size of 512k, that's what will be used to calculate the number of subblocks, even though your data pool is 4mb. Is this setup for a traditional NSD setup or for GNR as the

Re: [gpfsug-discuss] Not recommended, but why not?

2018-05-04 Thread Sven Oehme
There is nothing wrong with running CES on NSD servers; in fact, if all CES nodes have access to all LUNs of the filesystem, that's the fastest possible configuration as you eliminate one network hop. The challenge is always to do the proper sizing, so you don't run out of CPU and memory on the nodes

Re: [gpfsug-discuss] Confusing I/O Behavior

2018-05-02 Thread Sven Oehme
A few more weeks and we'll have a better answer than dump pgalloc ;-) On Wed, May 2, 2018 at 6:07 AM Peter Smith wrote: > "how do I see how much of the pagepool is in use and by what? I've looked > at mmfsadm dump and mmdiag --memory and neither has provided me the >

Re: [gpfsug-discuss] Confusing I/O Behavior

2018-05-02 Thread Sven Oehme
GPFS doesn't do flush-on-close by default unless explicitly asked by the application itself, but you can configure that: mmchconfig flushOnClose=yes. If you use O_SYNC or O_DIRECT then each write ends up on the media before we return. Sven On Wed, Apr 11, 2018 at 7:06 AM Peter Serocka

Re: [gpfsug-discuss] pagepool shrink doesn't release all memory

2018-02-25 Thread Sven Oehme
Hi, I guess you saw that in some of my presentations about the communication code overhaul. We started in 4.2.X and since then have added more and more NUMA awareness to GPFS. Version 5.0 also has enhancements in this space. Sven On Sun, Feb 25, 2018 at 8:54 AM Aaron Knister

Re: [gpfsug-discuss] GPFS, MMAP and Pagepool

2018-02-22 Thread Sven Oehme
Hi Lohit, I am working with Ray on an mmap performance improvement right now, which most likely has the same root cause as yours, see --> http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html The thread above is silent after a couple of back and forth, but Ray and I have active

Re: [gpfsug-discuss] V5 Experience -- maxblocksize

2018-02-09 Thread Sven Oehme

Re: [gpfsug-discuss] V5 Experience -- maxblocksize

2018-02-09 Thread Sven Oehme
Renar, if you specify the filesystem blocksize of 1M during mmcr you don't have to restart anything. Scale 5 didn't change anything in the behaviour of maxblocksize changes while the cluster is online; it only changed the default passed to the blocksize parameter when creating a new filesystem. One
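For example (a sketch only; the device name and stanza file below are hypothetical), specifying the blocksize at filesystem creation time would look roughly like:
  mmcrfs fs1 -F nsd.stanza -B 1M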

Re: [gpfsug-discuss] mmap performance against Spectrum Scale

2018-01-15 Thread Sven Oehme
ds > Ray Coetzee > On Fri, Jan 12, 2018 at 8:57 PM, Sven Oehme <oeh...@gmail.com> wrote: >> is this primary read or write ? >>

Re: [gpfsug-discuss] mmap performance against Spectrum Scale

2018-01-12 Thread Sven Oehme
what version of Scale are you using right now ? On Fri, Jan 12, 2018 at 2:29 AM Ray Coetzee wrote: > I'd like to ask the group of their experiences in improving the > performance of applications that use mmap calls against files on Spectrum > Scale. > > Besides using an

Re: [gpfsug-discuss] Online data migration tool

2017-12-21 Thread Sven Oehme
, Aaron Knister <aaron.s.knis...@nasa.gov> wrote: > > Thanks, Sven. Understood! > > On 12/19/17 3:20 PM, Sven Oehme wrote: > > Hi, > > > the zero padding was never promoted into a GA stream, it was an > > experiment to proof we are on the right track when

Re: [gpfsug-discuss] more than one mlx connectx-4 adapter in same host

2017-12-20 Thread Sven Oehme
I don't know if that works with Cisco, but I use 50 and 100m cables for 40 as well as 100Gbit in my lab between 2 Mellanox switches: http://www.mellanox.com/products/interconnect/ethernet-active-optical-cables.php As Paul pointed out, one of the very first things one needs to do after adding an

Re: [gpfsug-discuss] Online data migration tool

2017-12-19 Thread Sven Oehme
ng if you could explain the performance > difference between the no zero padding code and the > 32 subblock code > since given your the example of 32K files and 16MB block size I figure both > cases ought to write the same amount to disk. > > Thanks! > > -Aaron > > >

Re: [gpfsug-discuss] Online data migration tool

2017-12-15 Thread Sven Oehme
I thought I answered that already, but maybe I just thought about answering it and then forgot about it :-D So yes, more than 32 subblocks per block significantly increases the performance of filesystems with small files; for the sake of the argument let's say 32k in a large-block filesystem again

Re: [gpfsug-discuss] Experience with CES NFS export management

2017-10-23 Thread Sven Oehme
We cannot commit to timelines on mailing lists, but this is a known issue and will be addressed in a future release. Sven On Mon, Oct 23, 2017, 11:23 AM Bryan Banister wrote: > This becomes very disruptive when you have to add or remove many NFS > exports. Is it

Re: [gpfsug-discuss] Recommended pagepool size on clients?

2017-10-10 Thread Sven Oehme
If this is a new cluster and you use reasonably new HW, I probably would start with just the following settings on the clients: pagepool=4g,workerThreads=256,maxStatCache=0,maxFilesToCache=256k. Depending on what storage you use and what workload you have, you may have to set a couple of other
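Applied with mmchconfig this would look roughly like the following (the node class name is hypothetical, the value notation follows the message, and some of these settings only take effect after GPFS is restarted on those nodes):
  mmchconfig pagepool=4g,workerThreads=256,maxStatCache=0,maxFilesToCache=256k -N clientNodes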

Re: [gpfsug-discuss] mmfsd write behavior

2017-10-09 Thread Sven Oehme
on > > On 9/8/17 5:21 PM, Sven Oehme wrote: > > Hi, > > > > the code assumption is that the underlying device has no volatile write > > cache, i was absolute sure we have that somewhere in the FAQ, but i > > couldn't find it, so i will talk to somebody to correc

Re: [gpfsug-discuss] mmapplypolicy run time weirdness..

2017-09-13 Thread Sven Oehme
Can you please share the entire command line you are using? Also the GPFS version and mmlsconfig output would help, as well as whether this is a shared-storage filesystem or a system using local disks. Thx. Sven On Wed, Sep 13, 2017 at 5:19 PM wrote: > So we have a number of very

Re: [gpfsug-discuss] mmfsd write behavior

2017-09-07 Thread Sven Oehme
I am not sure what exactly you are looking for, but all block devices are opened with O_DIRECT; we never cache anything at this layer. On Thu, Sep 7, 2017, 7:11 PM Aaron Knister wrote: > Hi Everyone, > > This is something that's come up in the past and has recently

Re: [gpfsug-discuss] Change to default for verbsRdmaMinBytes?

2017-09-06 Thread Sven Oehme
t; > Are these settings still needed, or is this also tackled in the code? > > Thank you!! > > Cheers, > Kenneth > > > > On 02/09/17 00:42, Sven Oehme wrote: > > Hi Ed, > > yes the defaults for that have changed for customers who had not > overridden

Re: [gpfsug-discuss] Change to default for verbsRdmaMinBytes?

2017-09-01 Thread Sven Oehme
Hi Ed, yes, the defaults for that have changed for customers who had not overridden the default settings. The reason we did this was that many systems in the field, including all ESS systems that come pre-tuned, were manually changed to 8k from the 16k default due to the better performance that was

Re: [gpfsug-discuss] data integrity documentation

2017-08-03 Thread Sven Oehme
A trace during an mmfsck with the checksum parameters turned on would reveal it. The support team should be able to give you specific triggers to cut a trace during checksum errors; this way the trace is cut when the issue happens, and then from the traces on the server and client side one can extract

Re: [gpfsug-discuss] data integrity documentation

2017-08-02 Thread Sven Oehme
; >> anyway, our current question is: if these are hardware issues, is there > >> anything in gpfs client->nsd (on the network side) that would detect > >> such errors. ie can we trust the data (and metadata). > >> i was under the impression that client to disk is not

Re: [gpfsug-discuss] data integrity documentation

2017-08-02 Thread Sven Oehme
covered, but i > assumed that at least client to nsd (the network part) was checksummed. > > stijn > > > On 08/02/2017 09:10 PM, Sven Oehme wrote: > > ok, i think i understand now, the data was already corrupted. the config > > change i proposed only prevents a potentially k

Re: [gpfsug-discuss] data integrity documentation

2017-08-02 Thread Sven Oehme
wei...@ugent.be> wrote: > yes ;) > > the system is in preproduction, so nothing that can't stopped/started in > a few minutes (current setup has only 4 nsds, and no clients). > mmfsck triggers the errors very early during inode replica compare. > > > stijn > > On 08/02/2017

Re: [gpfsug-discuss] data integrity documentation

2017-08-02 Thread Sven Oehme
The very first thing you should check is whether you have this setting set: mmlsconfig envVar envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1 MLX5_USE_MUTEX 1 If that doesn't come back the way above you need to set it: mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0
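The command above is cut off in this snippet; reconstructed from the mmlsconfig output shown, the full setting would look roughly like the following (treat it as a sketch and verify before applying):
  mmlsconfig envVar
  mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1"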

Re: [gpfsug-discuss] Lost disks

2017-07-26 Thread Sven Oehme
It can happen for multiple reasons; one is a Linux install, but unfortunately there are significantly simpler explanations. Linux, as well as the BIOS in servers, from time to time looks for empty disks and puts a GPT label on a disk if it doesn't have one, etc. This thread explains a lot of this:

Re: [gpfsug-discuss] GPFS, LTFS/EE and data-in-inode?

2017-07-21 Thread Sven Oehme
Hi, I talked with a few others to confirm this, but unfortunately this is a limitation of the code today (maybe not well documented, which we will look into). Encryption only encrypts data blocks, it doesn't encrypt metadata. Hence, if encryption is enabled, we don't store data in the inode,

Re: [gpfsug-discuss] WekaIO Unveils Cloud-Native Scalable File System

2017-07-12 Thread Sven Oehme
While I really like competition on SpecSFS, the claims from the WekaIO people are, let's say, 'alternative facts' at best. The Spectrum Scale results were done on 4 nodes with 2 flash storage devices attached; they compare this to a WekaIO system with 14 times more memory (14 TB vs 1 TB), 120 SSDs

Re: [gpfsug-discuss] Fwd: FLASH: IBM Spectrum Scale (GPFS): RDMA-enabled network adapter failure on the NSD server may result in file IO error (2017.06.30)

2017-06-30 Thread Sven Oehme
End-to-end data integrity is very important, and the reason it hasn't been done in Scale is not because it's not important, it's because it's very hard to do without impacting performance in a very dramatic way. Imagine your RAID controller blocksize is 1MB and your filesystem blocksize is 1MB. If

Re: [gpfsug-discuss] NSD access routes

2017-06-05 Thread Sven Oehme
Yes, as long as you haven't pushed anything to it (meaning the pagepool got under enough pressure to free up space) you won't see anything in the stats :-) Sven On Mon, Jun 5, 2017 at 7:00 AM Dave Goodbourn wrote: > OK I'm going to hang my head in the corner...RTFM...I've not

Re: [gpfsug-discuss] NSD access routes

2017-06-05 Thread Sven Oehme
If you are using O_DIRECT calls they will be ignored by default for LROC; the same goes for encrypted data. How exactly are you testing this? On Mon, Jun 5, 2017 at 6:50 AM Dave Goodbourn wrote: > Thanks Bob, > > That pagepool comment has just answered my next question! > > But it

Re: [gpfsug-discuss] Associating I/O operations with files/processes

2017-05-30 Thread Sven Oehme
Hi, the very first thing to do would be an mmfsadm dump iohist instead of mmdiag --iohist one time (we actually add this info to mmdiag --iohist in the next release) to see if the thread type reveals something: 07:25:53.578522 W data 1:20260249600 8192 35.930488076

Re: [gpfsug-discuss] VERBS RDMA issue

2017-05-21 Thread Sven Oehme
The reason is the default setting of verbsRdmasPerConnection: 16. You can increase this; on smaller clusters I run some with 1024, but it's not advised to run that on 100's of nodes, and not unless you know exactly what you are doing. I would start by doubling it to 32 and see how much of
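As a sketch (whether the change requires a daemon restart to take effect is an assumption to verify):
  mmlsconfig verbsRdmasPerConnection
  mmchconfig verbsRdmasPerConnection=32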

Re: [gpfsug-discuss] CES and Directory list populating very slowly

2017-05-09 Thread Sven Oehme
and they all will have high settings. Sven

Re: [gpfsug-discuss] HAWC question

2017-05-04 Thread Sven Oehme
Let me clarify and get back; I am not 100% sure on a cross cluster. I think the main point was that the FS manager for that fs should be reassigned (which could also happen via mmchmgr) and then the individual clients that mount that fs restarted, but I will double check and reply later. On

Re: [gpfsug-discuss] HAWC question

2017-05-04 Thread Sven Oehme
Well, it's a bit complicated, which is why the message is there in the first place. The reason is that there is no easy way to tell, except by dumping the stripegroup on the filesystem manager, checking what log group your particular node is assigned to, and then checking the size of the log group. As soon as

Re: [gpfsug-discuss] CES node slow to respond

2017-03-24 Thread Sven Oehme
Changes in the Ganesha management code were made in April 2016 to reduce the need for a high maxFilesToCache value; the Ganesha daemon adjusts its allowed file cache by reading the maxFilesToCache value and then reducing its allowed NOFILE value. The code shipped with the 4.2.2 release. You want a high

Re: [gpfsug-discuss] strange waiters + filesystem deadlock

2017-03-24 Thread Sven Oehme
> > The fs has 40M inodes allocated and 12M free. > > -Aaron > > On 3/24/17 1:41 PM, Sven Oehme wrote: > > ok, that seems a different problem then i was thinking. > > can you send output of mmlscluster, mmlsconfig, mmlsfs all ? > > also are you getting

Re: [gpfsug-discuss] strange waiters + filesystem deadlock

2017-03-24 Thread Sven Oehme
d server or manager node that's running full > > throttle across all cpus. There is one that's got relatively high CPU > > utilization though (300-400%). I'll send a screenshot of it in a sec. > > > > no zimon yet but we do have other tools to see cpu utilization. > > &

Re: [gpfsug-discuss] strange waiters + filesystem deadlock

2017-03-24 Thread Sven Oehme
8 CPUs) - Segmentation fault -Aaron On 3/24/17 1:04 PM, Sven Oehme wrote: > while this is happening run to

Re: [gpfsug-discuss] strange waiters + filesystem deadlock

2017-03-24 Thread Sven Oehme
While this is happening, run top and see if there is very high CPU utilization at this time on the NSD server. If there is, run perf top (you might need to install the perf command) and see if the top CPU contender is a spinlock. If so, send a screenshot of perf top, as I may know what that is and

Re: [gpfsug-discuss] Potential problems - leaving trace enabled in over-write mode?

2017-03-08 Thread Sven Oehme

Re: [gpfsug-discuss] Tracking deleted files

2017-02-27 Thread Sven Oehme
A couple of years ago Tridge demonstrated things you can do with the DMAPI interface and even delivered some unsupported example code to demonstrate it: https://www.samba.org/~tridge/hacksm/ Keep in mind that the DMAPI interface has some severe limitations in terms of scaling; it can only run on

Re: [gpfsug-discuss] waiting for conn rdmas < conn maxrdmas

2017-02-24 Thread Sven Oehme
SD server queues? On 2/23/17 12:12 PM, Sven Oehme wrote: > all this waiter shows is that you have more in flight than the node or > connection can currently serve. the reasons for that can be > misconfiguration or you simply run out of resources on the node, not the > connection. with l

Re: [gpfsug-discuss] waiting for conn rdmas < conn maxrdmas

2017-02-23 Thread Sven Oehme
All this waiter shows is that you have more in flight than the node or connection can currently serve. The reasons for that can be misconfiguration, or you simply run out of resources on the node, not the connection. With the latest code you shouldn't see this anymore for node limits, as the system

Re: [gpfsug-discuss] LROC 100% utilized in terms of IOs

2017-01-25 Thread Sven Oehme
. On Wed, Jan 25, 2017 at 10:20 PM Matt Weil <mw...@wustl.edu> wrote: > > > On 1/25/17 3:00 PM, Sven Oehme wrote: > > Matt, > > the assumption was that the remote devices are slower than LROC. there is > some attempts in the code to not schedule more than a maximum

Re: [gpfsug-discuss] LROC Zimon sensors

2017-01-25 Thread Sven Oehme
en thanks, looks like I'll be checking out grafana. > Richard

Re: [gpfsug-discuss] LROC 100% utilized in terms of IOs

2017-01-25 Thread Sven Oehme
Matt, the assumption was that the remote devices are slower than LROC. There are some attempts in the code to not schedule more than a maximum number of outstanding I/Os to the LROC device, but this doesn't help in all cases and depends on what kernel-level parameters are set for the device.

Re: [gpfsug-discuss] forcibly panic stripegroup everywhere?

2017-01-23 Thread Sven Oehme
Hi, you either need to request access to GPFS 4.2.1.0 efix16 via your PMR or upgrade to 4.2.2.1; both contain the fixes required. Sven On Mon, Jan 23, 2017 at 6:27 AM Sven Oehme <oeh...@gmail.com> wrote: > Aaron, > > hold a bit with the upgrade , i just got word that wh

Re: [gpfsug-discuss] forcibly panic stripegroup everywhere?

2017-01-22 Thread Sven Oehme
On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme <oeh...@gmail.com> wrote: > Then i would suggest to move up to at least 4.2.1.LATEST , there is a high > chance your problem might already be fixed. > > i see 2 potential area that got significant improvements , Token Manager > reco

Re: [gpfsug-discuss] forcibly panic stripegroup everywhere?

2017-01-22 Thread Sven Oehme
and multiple log files in parallel . Sven On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister <aaron.s.knis...@nasa.gov> wrote: > It's at 4.1.1.10. > > On 1/22/17 11:12 PM, Sven Oehme wrote: > > What version of Scale/ GPFS code is this cluster on ? > > > > --

Re: [gpfsug-discuss] LROC

2016-12-29 Thread Sven Oehme
It is not caching however. I will restart gpfs to see if that makes it > start working. > > On 12/29/16 10:18 AM, Matt Weil wrote: > > > > On 12/29/16 10:09 AM, Sven Oehme wrote: > > i agree that is a very long name , given this is a nvme device it should > show up a

Re: [gpfsug-discuss] LROC

2016-12-29 Thread Sven Oehme
I agree that is a very long name. Given this is an NVMe device it should show up as /dev/nvmeXYZ; I suggest to report exactly that in nsddevices and retry. I vaguely remember we have some fixed-length device name limitation, but I don't remember what the length is, so this would be my first guess

Re: [gpfsug-discuss] LROC

2016-12-21 Thread Sven Oehme
; as many as possible and both > > have maxFilesToCache 128000 > > and maxStatCache 4 > > do these effect what sits on the LROC as well? Are those to small? > 1million seemed excessive. > > On 12/20/16 11:03 AM, Sven Oehme wrote: > > how much files do you want

Re: [gpfsug-discuss] Intel Whitepaper - Spectrum Scale & LROC with NVMe

2016-12-06 Thread Sven Oehme
I am not sure I understand your comment about 'persistent'. Do you mean that when you create an NSD on an NVMe device it won't get recognized after a restart? If that's what you mean there are 2 answers: short term, you need to add a /var/mmfs/etc/nsddevices script to your node that simply adds an
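A rough sketch of such a script, modeled on /usr/lpp/mmfs/samples/nsddevices.sample (the 'generic' device type and the return-code convention are assumptions to check against the sample shipped with your release):
  # /var/mmfs/etc/nsddevices -- sketch only
  # print "deviceName deviceType" pairs, device names relative to /dev
  for dev in /dev/nvme*n1
  do
    [ -b "$dev" ] && echo "${dev#/dev/} generic"
  done
  # return 1 so GPFS also continues with its normal device discovery
  return 1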

Re: [gpfsug-discuss] HAWC and LROC

2016-11-05 Thread Sven Oehme
Yes and no :) While Olaf is right that it needs two independent block devices, partitions are just fine. So one could in fact have a 200g SSD as a boot device and partition it, let's say 30g OS, 20g HAWC, 150g LROC. You have to keep in mind that LROC and HAWC have 2 very different requirements on

Re: [gpfsug-discuss] LROC benefits

2016-10-15 Thread Sven Oehme
It all depends on the workload. We will publish a paper comparing a mixed workload with and without LROC pretty soon. Most numbers I have seen show anywhere between 30% and 1000% (very rare case) improvement, so it's for sure worth a test. Sven On Fri, Oct 14, 2016 at 6:31 AM Sobey, Richard A

Re: [gpfsug-discuss] Blocksize

2016-09-23 Thread Sven Oehme
many customers in the field are using 1MB or even smaller blocksizes on RAID stripes of 2 MB or above, and your performance will be significantly impacted by that. Sven

Re: [gpfsug-discuss] EDR and omnipath

2016-09-19 Thread Sven Oehme
Because they both require a different distribution of OFED, which are mutually exclusive to install. In theory, if you deploy plain OFED it might work, but it will be hard to find somebody to support that.

Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

2016-08-30 Thread Sven Oehme
So let's start with some simple questions. When you say mmbackup takes ages, what version of GPFS code are you running? How do you execute the mmbackup command? Exact parameters would be useful. What HW are you using for the metadata disks? How much capacity (df -h) and how many inodes (df -i)

Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown

2016-07-28 Thread Sven Oehme
They should get started as soon as you shut down via mmshutdown. Could you check a node where the processes are NOT started and simply run mmshutdown on this node to see if they get started? On Thu, Jul 28, 2016 at 10:57 AM, Bryan Banister wrote: > I now see that

Re: [gpfsug-discuss] mmpmon gfis fields question

2016-07-07 Thread Sven Oehme
Hi, this is an undocumented mmpmon call, so you are on your own, but here is the correct description: _n_ IP address of the node responding. This is the address by which GPFS knows the node. _nn_ The name by which GPFS knows the node. _rc_ The reason/error code. In this case, the reply

Re: [gpfsug-discuss] Executing Callbacks on other Nodes

2016-04-15 Thread Sven Oehme
If you can wait a few more months we will have stats for this in Zimon. Sven On Apr 15, 2016 12:02 PM, "Oesterlin, Robert" wrote: > This command is just using ssh to all the nodes and dumping the waiter > information and collecting it. That means if the node is down,

Re: [gpfsug-discuss] Small cluster

2016-03-04 Thread Sven Oehme
. Just a few thoughts :-D Sven

Re: [gpfsug-discuss] IBM-Sandisk Announcement

2016-03-02 Thread Sven Oehme
It's direct SAS attached.

Re: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS?

2015-12-03 Thread Sven Oehme
Matt, this was true for a while but got fixed; NetBackup has added support for GPFS metadata and ACLs in newer versions. More details can be read here: https://www.veritas.com/support/en_US/article.79433 Sven On Thu, Dec 3, 2015 at 11:34 AM, Matt Weil wrote: >