Re: [gpfsug-discuss] advanced filecache math

2019-05-09 Thread Stijn De Weirdt
seems like we are suffering from
http://www-01.ibm.com/support/docview.wss?uid=isg1IJ12737

as these are ces nodes, we suspected something wrong with the caches, but it
looks like a memory leak instead.

sorry for the noise (as usual you find the solution right after sending
the mail ;)
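
(for the record, the fileCacheMem line below is just entry-count x per-entry
size; a quick shell check, reusing the 3352-byte figure from the dump itself:

# 11718554 cached file objects * 3352 bytes each, expressed in KiB
echo $(( 11718554 * 3352 / 1024 ))   # -> 38359954, i.e. the ~38 GB fileCacheMem reported
# versus the configured maxFilesToCache of 1M entries: ~11.7x over the limit,
# which is what points at the IJ12737 leak rather than a tuning problem
)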

stijn

On 5/9/19 4:38 PM, Stijn De Weirdt wrote:
> hi achim,
> 
>> you just misinterpreted the term fileCacheLimit.
>> This is not in byte, but specifies the maxFilesToCache setting :
> i understand that, but how does the fileCacheLimit relate to the
> fileCacheMem number?
> 
> 
> 
> (we have a 32GB pagepool, and mmfsd is using 80GB RES (101 VIRT), so we
> are looking for large numbers that might explain wtf is going on
> (pardon my french ;)
> 
> stijn
> 
>>
>> UMALLOC limits:
>>  bufferDescLimit  4 desired4
>>  fileCacheLimit  4000 desired 4000   <=== mFtC
>>  statCacheLimit  1000 desired 1000   <=== mSC
>>  diskAddrBuffLimit  200 desired  200
>>
>> # mmfsadm dump config | grep -E "maxFilesToCache|maxStatCache"
>> maxFilesToCache 4000
>> maxStatCache 1000
>>
>> Mit freundlichen Grüßen / Kind regards
>>
>> *Achim Rehor*
>>
>> 
>> Software Technical Support Specialist AIX/ Emea HPC Support  
>> IBM Certified Advanced Technical Expert - Power Systems with AIX
>> TSCC Software Service, Dept. 7922
>> Global Technology Services
>> 
>> Phone:   +49-7034-274-7862IBM Deutschland
>> E-Mail:  achim.re...@de.ibm.com   Am Weiher 24
>>   65451 Kelsterbach
>>   Germany
>>  
>> 
>> IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
>> Geschäftsführung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan 
>> Lutz, 
>> Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt
>> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, 
>> HRB 
>> 14562 WEEE-Reg.-Nr. DE 99369940
>>
>>
>>
>>
>>
>>
>> From: Stijn De Weirdt 
>> To: gpfsug main discussion list 
>> Date: 09/05/2019 16:21
>> Subject: [gpfsug-discuss] advanced filecache math
>> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>>
>> 
>>
>>
>>
>> hi all,
>>
>> we are looking into some memory issues with gpfs 5.0.2.2, and found
>> following in mmfsadm dump fs:
>>
>>  > fileCacheLimit 100 desired  100
>> ...
>>  > fileCacheMem 38359956 k  = 11718554 * 3352 bytes (inode size 512 
>> + 2840)
>>
>> the limit is 1M (we configured that), however, the fileCacheMem mentions
>> 11.7M?
>>
>> this is also reported right after a mmshutdown/startup.
>>
>> how do these 2 relate (again?)?
>>
>> many thanks,
>>
>> stijn
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>
>>
>>
>>
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] advanced filecache math

2019-05-09 Thread Stijn De Weirdt
hi achim,

> you just misinterpreted the term fileCacheLimit.
> This is not in byte, but specifies the maxFilesToCache setting :
i understand that, but how does the fileCacheLimit relate to the
fileCacheMem number?



(we have a 32GB pagepool, and mmfsd is using 80GB RES (101 VIRT), so we
are looking for large numbers that might explain wtf is going on
(pardon my french ;)

stijn

> 
> UMALLOC limits:
>  bufferDescLimit  4 desired4
>  fileCacheLimit  4000 desired 4000   <=== mFtC
>  statCacheLimit  1000 desired 1000   <=== mSC
>  diskAddrBuffLimit  200 desired  200
> 
> # mmfsadm dump config | grep -E "maxFilesToCache|maxStatCache"
> maxFilesToCache 4000
> maxStatCache 1000
> 
> Mit freundlichen Grüßen / Kind regards
> 
> *Achim Rehor*
> 
> 
> Software Technical Support Specialist AIX/ Emea HPC Support   
> IBM Certified Advanced Technical Expert - Power Systems with AIX
> TSCC Software Service, Dept. 7922
> Global Technology Services
> 
> Phone:+49-7034-274-7862IBM Deutschland
> E-Mail:   achim.re...@de.ibm.com   Am Weiher 24
>65451 Kelsterbach
>Germany
>   
> 
> IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
> Geschäftsführung: Martin Hartmann (Vorsitzender), Norbert Janzen, Stefan 
> Lutz, 
> Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 
> 14562 WEEE-Reg.-Nr. DE 99369940
> 
> 
> 
> 
> 
> 
> From: Stijn De Weirdt 
> To: gpfsug main discussion list 
> Date: 09/05/2019 16:21
> Subject: [gpfsug-discuss] advanced filecache math
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> 
> 
> 
> 
> 
> hi all,
> 
> we are looking into some memory issues with gpfs 5.0.2.2, and found
> following in mmfsadm dump fs:
> 
>  > fileCacheLimit 100 desired  100
> ...
>  > fileCacheMem 38359956 k  = 11718554 * 3352 bytes (inode size 512 + 
> 2840)
> 
> the limit is 1M (we configured that), however, the fileCacheMem mentions
> 11.7M?
> 
> this is also reported right after a mmshutdown/startup.
> 
> how do these 2 relate (again?)?
> 
> many thanks,
> 
> stijn
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> 
> 
> 
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] advanced filecache math

2019-05-09 Thread Stijn De Weirdt
hi all,

we are looking into some memory issues with gpfs 5.0.2.2, and found
following in mmfsadm dump fs:

> fileCacheLimit 100 desired  100
...
> fileCacheMem 38359956 k  = 11718554 * 3352 bytes (inode size 512 + 
> 2840)

the limit is 1M (we configured that), however, the fileCacheMem mentions
11.7M?

this is also reported right after a mmshutdown/startup.

how do these 2 relate (again?)?

many thanks,

stijn
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS

2018-10-17 Thread Stijn De Weirdt
hi all,

has anyone tried to use tools like eatmydata that allow the user to
"ignore" the syncs (there's another tool that has less explicit name if
it would make you feel better ;).
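
(to be clear on what i mean: eatmydata is an LD_PRELOAD shim that turns
fsync()/fdatasync()/sync() and friends into no-ops for the wrapped process,
so something like the line below - untested on our side against ganesha, and
the tarball/path are made up - would take the per-file syncs out of the tar
run, at the cost of losing data on a crash; whether that actually helps
through an NFS mount is exactly the question)

eatmydata tar xf sources.tar -C /mnt/nfs/scratch/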

stijn

On 10/17/2018 03:26 PM, Tomer Perry wrote:
> Just to clarify ( from man exports):
> "  async  This option allows the NFS server to violate the NFS protocol 
> and reply to requests before any changes made by that request have been 
> committed  to  stable  storage  (e.g.
>   disc drive).
> 
>   Using this option usually improves performance, but at the 
> cost that an unclean server restart (i.e. a crash) can cause data to be 
> lost or corrupted."
> 
> With the Ganesha implementation in Spectrum Scale, it was decided not to 
> allow this violation - so this async export option wasn't exposed.
> I believe that for those customers  that agree to take the risk, using 
> async mount option ( from the client) will achieve similar behavior.
> 
> Regards,
> 
> Tomer Perry
> Scalable I/O Development (Spectrum Scale)
> email: t...@il.ibm.com
> 1 Azrieli Center, Tel Aviv 67021, Israel
> Global Tel:+1 720 3422758
> Israel Tel:  +972 3 9188625
> Mobile: +972 52 2554625
> 
> 
> 
> 
> From:   "Olaf Weiser" 
> To: gpfsug main discussion list 
> Date:   17/10/2018 16:16
> Subject:Re: [gpfsug-discuss] Preliminary conclusion: single 
> client, single thread, small files - native Scale vs NFS
> Sent by:gpfsug-discuss-boun...@spectrumscale.org
> 
> 
> 
> Hallo Jan, 
> you can expect to get slightly improved numbers from the lower response 
> times of the HAWC ... but the loss of performance comes from the fact 
> that GPFS (or async kNFS) writes with multiple parallel threads - as opposed 
> to e.g. tar via Ganesha NFS, which comes as a single thread with an fsync on each file.. 
> 
> 
> you'll never outperform e.g. 128 (maybe individually slower, but parallel) 
> threads running write-behind   <--->   with one single but fast thread, 
> 
> so as Alex suggests.. if possible.. use the gpfs client or kNFS for those 
> types of workloads..
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> From:Jan-Frode Myklebust 
> To:gpfsug main discussion list 
> Date:10/17/2018 02:24 PM
> Subject:Re: [gpfsug-discuss] Preliminary conclusion: single 
> client, single thread, small files - native Scale vs NFS
> Sent by:gpfsug-discuss-boun...@spectrumscale.org
> 
> 
> 
> Do you know if the slow throughput is caused by the network/nfs-protocol 
> layer, or does it help to use faster storage (ssd)? If on storage, have 
> you considered if HAWC can help?
> 
> I'm thinking about adding an SSD pool as a first tier to hold the active 
> dataset for a similar setup, but that's mainly to solve the small file 
> read workload (i.e. random I/O ).
> 
> 
> -jf
> ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp <
> alexander.sa...@de.ibm.com>:
> Dear Mailing List readers,
> 
> I've come to a preliminary conclusion that explains the behavior in an 
> appropriate manner, so I'm trying to summarize my current thinking with 
> this audience.
> 
> Problem statement: 
> Big performance deviation between native GPFS (fast) and loopback NFS 
> mount on the same node (way slower) for single client, single thread, 
> small files workload.
> 
> 
> Current explanation:
> tar seems to use close() on files, not fclose(). That is an application 
> choice and common behavior. The idea is to allow OS write caching to 
> speed up process run time.
> 
> When running locally on ext3 / xfs / GPFS / .. that allows async destaging 
> of data down to disk, somewhat compromising data for better performance. 
> As we're talking about write caching on the same node that the application 
> runs on - a crash is a misfortune but in the same failure domain.
> E.g. if you run a compile job that includes extraction of a tar and the 
> node crashes you'll have to restart the entire job, anyhow.
> 
> The NFSv2 spec defined that NFS io's are to be 'sync', probably because 
> the compile job on the nfs client would survive if the NFS Server crashes, 
> so the failure domain would be different
> 
> NFSv3 in rfc1813 below acknowledged the performance impact and introduced 
> the 'async' flag for NFS, which would handle IO's similar to local IOs, 
> allowing to destage in the background.
> 
> Keep in mind - applications, independent of whether they run locally or via NFS, can 
> always decide to use the fclose() option, which will ensure that data is 
> destaged to persistent storage right away.
> But it's an application's choice if that's really mandatory or whether 
> performance has higher priority.
> 
> The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down 
> to disk - very filesystem independent.
> 
> -> single client, single thread, small files workload on GPFS can be 
> destaged async, allowing to hide latency and parallelizing disk IOs.
> -> NFS client IO's are sync, so the second IO can only be started after 
> the first 

Re: [gpfsug-discuss] system.log pool on client nodes for HAWC

2018-09-04 Thread Stijn De Weirdt
hi vasily, sven,

and is there any advantage in moving the system.log pool to faster
storage (like nvdimm) or increasing its default size when HAWC is not
used (ie write-cache-threshold kept to 0)? (i remember the (very
creative) logtip placement on the gss boxes ;)

thanks a lot for the detailed answer

stijn

On 09/04/2018 05:57 PM, Vasily Tarasov wrote:
> Let me add just one more item to Sven's detailed reply: HAWC is especially 
> helpful to decrease the latencies of small synchronous I/Os that come in 
> *bursts*. If your workload contains a sustained high rate of writes, the 
> recovery log will get full very quickly, and HAWC won't help much (or can 
> even 
> decrease performance). Making the recovery log larger allows it to absorb longer 
> I/O bursts. The specific amount of improvement depends on the workload 
> (e.g. how long/high the bursts are) and the hardware.
> Best,
> Vasily
> --
> Vasily Tarasov,
> Research Staff Member,
> Storage Systems Research,
> IBM Research - Almaden
> 
> - Original message -
> From: Sven Oehme 
> To: gpfsug main discussion list 
> Cc: Vasily Tarasov 
> Subject: Re: [gpfsug-discuss] system.log pool on client nodes for HAWC
> Date: Mon, Sep 3, 2018 8:32 AM
> Hi Ken,
> what the documentation is saying (or tries to) is that the behavior of
> data-in-inode or metadata operations is not changed if HAWC is enabled,
> meaning if the data fits into the inode it will be placed there directly
> instead of writing the data i/o into a data recovery log record (which is
> what HAWC uses) and then later destaging it to wherever the data blocks of
> a given file will eventually be written. that also means that if all your
> application does is creating small files that fit into the inode, HAWC will
> not be able to improve performance.
> it's unfortunately not so simple to say whether HAWC will help or not, but
> here are a couple of thoughts on where HAWC will not help and where it will:
> where it won't help:
> 1. if you have a storage device which has a very large, or even better, a
> log-structured write cache.
> 2. if majority of your files are very small
> 3. if your files will almost always be accessed sequentially
> 4. your storage is primarily flash based
> where it most likely will help :
> 1. your majority of storage is direct attached HDD (e.g. FPO) with a small
> SSD pool for metadata and HAWC
> 2. your ratio of clients to storage devices is very high (think hundreds 
> of
> clients and only 1 storage array)
> 3. your workload is primarily virtual machines or databases
> as always there are lots of exceptions and corner cases, but this is the best
> list i could come up with.
> on how to find out if HAWC could help, there are 2 ways of doing this.
> first, look at mmfsadm dump iocounters; you see the average size of i/os
> and you can check whether a lot of small write operations are done.
> a more involved but more accurate way would be to take a trace with trace
> level trace=io; that will generate a very lightweight trace of only the
> most relevant io layers of GPFS. you could then post-process the operations'
> performance. the data is not the simplest to understand for somebody
> with little filesystem knowledge, but if you stare at it for a while it
> might make some sense to you.
> Sven
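
(for reference, the two checks sven describes above come down to something
like this - flags written from memory, so check the man pages on your release
before scripting it, and <node> is a placeholder:

# average i/o sizes and counts as seen by the daemon
mmfsadm dump iocounters
# lightweight io-layer trace: start it, run the candidate workload, stop it
mmtracectl --start --trace=io -N <node>
mmtracectl --stop -N <node>
)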
> On Mon, Sep 3, 2018 at 4:06 PM Kenneth Waegeman  > wrote:
> 
> Thank you Vasily and Simon for the clarification!
> 
> I was looking further into it, and I got stuck with more questions :)
> 
> 
> - In
> 
> https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_hawc_tuning.htm
> I read:
>  HAWC does not change the following behaviors:
>  write behavior of small files when the data is placed in the
> inode itself
>  write behavior of directory blocks or other metadata
> 
> I wondered why? Is the metadata not logged in the (same) recovery 
> logs?
> (It seemed by reading
> 
> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_logfile.htm
> it does )
> 
> 
> - Would there be a way to estimate how much of the write requests on a
> running cluster would benefit from enabling HAWC ?
> 
> 
> Thanks again!
> 
> 
> Kenneth
> On 31/08/18 19:49, Vasily Tarasov wrote:
>> That is correct. The blocks of each recovery log are striped across
>> the devices in the system.log pool (if it is defined). As a result,
>> even when all clients have a local device in the system.log pool, 
>> many
>> writes to the recovery log will go to remote devices. For a client
>> that lacks a local device in the system.log pool, log 

Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working with kernel 3.10.0-862.2.3.el7

2018-05-15 Thread Stijn De Weirdt
hi stephen,

> There isn’t a flaw in that argument, but where the security experts
> are concerned there is no argument.
we have gpfs client hosts where users can log in; we can't update those.
that is a definite worry.

> 
> Apparently this time Red Hat just told all of their RHEL 7.4
> customers to upgrade to RHEL 7.5, rather than back-porting the
> security patches. So this time the requirement to upgrade
> distributions is much worse than normal.
there's no 'this time', this is the default rhel support model. only
with EUS do you get patches for non-latest minor releases.

stijn

> 
> 
> 
> ___ gpfsug-discuss
> mailing list gpfsug-discuss at spectrumscale.org 
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working with kernel 3.10.0-862.2.3.el7

2018-05-15 Thread Stijn De Weirdt
so this means running out-of-date kernels for at least another month? oh
boy...

i hope this is not some new trend in gpfs support. otherwise all
RHEL-based sites will have to start adding EUS as a default cost to run gpfs
with basic security compliance.

stijn


On 05/15/2018 09:02 PM, Felipe Knop wrote:
> All,
> 
> Validation of RHEL 7.5 on Scale is currently under way, and we are
> currently targeting mid June to release the PTFs on 4.2.3 and 5.0 which
> will include the corresponding fix.
> 
> Regards,
> 
>   Felipe
> 
> 
> Felipe Knop k...@us.ibm.com
> GPFS Development and Security
> IBM Systems
> IBM Building 008
> 2455 South Rd, Poughkeepsie, NY 12601
> (845) 433-9314  T/L 293-9314
> 
> 
> 
> 
> 
> From: Ryan Novosielski 
> To:   gpfsug main discussion list 
> Date: 05/15/2018 12:56 PM
> Subject:  Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working with kernel
> 3.10.0-862.2.3.el7
> Sent by:  gpfsug-discuss-boun...@spectrumscale.org
> 
> 
> 
> I know these dates can move, but any vague idea of a timeframe target for
> release (this quarter, next quarter, etc.)?
> 
> Thanks!
> 
> --
> 
> || \\UTGERS,
> |---*O*---
> ||_// the State | Ryan Novosielski - 
> novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ | Office of Advanced Research Computing - MSB
> C630, Newark
>  `'
> 
>> On May 14, 2018, at 9:30 AM, Felipe Knop  wrote:
>>
>> All,
>>
>> Support for RHEL 7.5 and kernel level 3.10.0-862 in Spectrum Scale is
> planned for upcoming PTFs on 4.2.3 and 5.0. Since code changes are needed
> in Scale to support this kernel level, upgrading to one of those upcoming
> PTFs will be required in order to run with that kernel.
>>
>> Regards,
>>
>> Felipe
>>
>> 
>> Felipe Knop  k...@us.ibm.com
>> GPFS Development and Security
>> IBM Systems
>> IBM Building 008
>> 2455 South Rd, Poughkeepsie, NY 12601
>> (845) 433-9314 T/L 293-9314
>>
>>
>>
>> Andi Rhod Christiansen ---05/14/2018 08:15:25 AM---You are
> welcome. I see your concern but as long as IBM has not released spectrum
> scale for 7.5 that
>>
>> From:  Andi Rhod Christiansen 
>> To:  gpfsug main discussion list 
>> Date:  05/14/2018 08:15 AM
>> Subject:  Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working with kernel
> 3.10.0-862.2.3.el7
>> Sent by:  gpfsug-discuss-boun...@spectrumscale.org
>>
>>
>>
>>
>> You are welcome.
>>
>> I see your concern, but as long as IBM has not released spectrum scale for
> 7.5, that is their only solution. In regards to them caring about security, I
> would say yes, they do care, but from their point of view they either tell
> the customer to upgrade as soon as red hat releases new versions, forcing
> the customer to be down until they have a new release, or they tell
> them to stay on a supported level until a new release is ready.
>>
>> they should release a version supporting the new kernel soon, IBM told me
> when I asked that they are "currently testing and have a support date soon"
>>
>> Best regards.
>>
>>
>> -Original Message-
>> From: gpfsug-discuss-boun...@spectrumscale.org
>  On behalf of z@imperial.ac.uk
>> Sent: 14 May 2018 13:59
>> To: gpfsug main discussion list 
>> Subject: Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working with kernel
> 3.10.0-862.2.3.el7
>>
>> Thanks. Does IBM care about security, one would ask? In this case I'd
> choose to use the new kernel for my virtualization over gpfs ... sigh
>>
>>
>> https://access.redhat.com/errata/RHSA-2018:1318
>>
>> Kernel: KVM: error in exception handling leads to wrong debug stack value
> (CVE-2018-1087)
>>
>> Kernel: error in exception handling leads to DoS (CVE-2018-8897)
>> Kernel: ipsec: xfrm: use-after-free leading to potential privilege
> escalation (CVE-2017-16939)
>>
>> kernel: Out-of-bounds write via userland offsets in ebt_entry struct in
> netfilter/ebtables.c (CVE-2018-1068)
>>
>> ...
>>
>>
>> On Mon, 14 May 2018, Andi Rhod Christiansen wrote:
>>> Date: Mon, 14 May 2018 11:10:18 +
>>> From: Andi Rhod Christiansen 
>>> Reply-To: gpfsug main discussion list
>>> 
>>> To: gpfsug main discussion list 
>>> Subject: Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working with kernel
>>> 3.10.0-862.2.3.el7
>>>
>>> Hi,
>>>
>>> Yes, kernel 3.10.0-862.2.3.el7 is not supported yet as it is RHEL 7.5
>>> and latest support is 7.4. You have to revert back to 3.10.0-693 
>>>
>>> I just had the same issue
>>>
>>> Revert to the previous working kernel at the redhat 7.4 release, which is
> 3.10.0-693. Make sure kernel-headers and kernel-devel are also at this
> level.
>>>

Re: [gpfsug-discuss] asking for your vote for an RFE to support NFS V4.1

2018-03-12 Thread Stijn De Weirdt
hi malahal,

we already figured that out but were hesitant to share it in case ibm
wanted to remove this loophole.

but can we assume that manually editing the ganesha.conf and pushing it
to ccr is supported?
the config file is heavily edited / rewritten when certain mm commands run,
so we want to make sure we can always do this.

it would be even better if the main.conf that is generated/edited by the
ccr commands just had an include statement so we can edit another file
locally instead of doing mmccr magic.
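
(for the archives, the change malahal describes below boils down to something
like this - an untested sketch on our side, and mmccr is an internal command,
so double-check its usage output before trusting the argument order:

# in /var/mmfs/ces/nfs-config/gpfs.ganesha.main.conf, extend the NFSv4 block:
NFSv4 {
    minor_versions = 0,1;
}
# then push the edited file back into ccr so the mm commands pick it up:
mmccr fput gpfs.ganesha.main.conf /var/mmfs/ces/nfs-config/gpfs.ganesha.main.conf
)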

stijn

On 03/12/2018 10:54 AM, Malahal R Naineni wrote:
> Upstream Ganesha code allows all NFS versions including NFSv4.2. Most Linux 
> clients were defaulting to NFSv4.0, but now they started using NFS4.1 which 
> IBM 
> doesn't support. To avoid people accidentally using NFSv4.1, we decided to 
> remove it by default.
> We don't support NFSv4.1, so there is no spectrum command to enable NFSv4.1 
> support with PTF6. Of course, if you are familiar with mmccr, you can change 
> the 
> config and let it use NFSv4.1 but any issues with NFS4.1 will go to 
> /dev/null. :-)
> You need to add "minor_versions = 0,1;" to NFSv4{} block 
> in /var/mmfs/ces/nfs-config/gpfs.ganesha.main.conf to allow NFSv4.0 and 
> NFSv4.1, 
> and make sure you use mmccr command to make this change permanent.
> Regards, Malahal.
> 
> - Original message -
> From: Stijn De Weirdt <stijn.dewei...@ugent.be>
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> To: gpfsug-discuss@spectrumscale.org
> Cc:
> Subject: Re: [gpfsug-discuss] asking for your vote for an RFE to support 
> NFS
> V4.1
> Date: Fri, Mar 9, 2018 6:13 PM
> hi all,
> 
> i would second this request to upvote this. the fact that 4.1 support
> was dropped in a subsubminor update (4.2.3.5 to 4.2.3.6 afaik) was
> already pretty bad to discover, but at the very least there should be an
> option to reenable it.
> 
> i'm also interested why this was removed (or actively prevented from being
> enabled). i can understand that e.g. pnfs is not supported, but basic protocol
> features wrt HA are a must-have.
> only with 4.1 are we able to do ces+ganesha failover without IO error,
> something that should be basic feature nowadays.
> 
> stijn
> 
> On 03/09/2018 01:21 PM, Engeli  Willi (ID SD) wrote:
>  > Hello Group,
>  >
>  > I’ve just created a request for enhancement (RFE) to have ganesha 
> supporting
>  > NFS V4.1.
>  >
>  > It is important, to have this new Protocol version supported, since our
>  > Linux clients default support is more than 80% based on this version by
>  > default and Linux distributions are actively pushing this Protocol.
>  >
>  > The protocol also brings important corrections and enhancements with 
> it.
>  >
>  >
>  >
>  > I would like to ask you all very kindly to vote for this RFE please.
>  >
>  > You find it here: https://www.ibm.com/developerworks/rfe/execute
>  >
>  > Headline:NFS V4.1 Support
>  >
>  > ID:117398
>  >
>  >
>  >
>  >
>  >
>  > Freundliche Grüsse
>  >
>  >
>  >
>  > Willi Engeli
>  >
>  > ETH Zuerich
>  >
>  > ID Speicherdienste
>  >
>  > Weinbergstrasse 11
>  >
>  > WEC C 18
>  >
>  > 8092 Zuerich
>  >
>  >
>  >
>  > Tel: +41 44 632 02 69
>  >
>  >
>  >
>  >
>  >
>  >
>  > ___
>  > gpfsug-discuss mailing list
>  > gpfsug-discuss at spectrumscale.org
>  >
> 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwIF-g=jf_iaSHvJObTbx-siA1ZOg=oaQVLOYto6Ftb8wAbynvIiIdh2UEjHxQByDz70-6a_0=yq4xoVKCPWQTqZVp0BgG8fBpXrS2FehGlAua1Eixci4=9DJi6qkF4eRc81vv6SlC3gxKL9oJJ4efkktzNaZAnkA=
>  >
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwIF-g=jf_iaSHvJObTbx-siA1ZOg=oaQVLOYto6Ftb8wAbynvIiIdh2UEjHxQByDz70-6a_0=yq4xoVKCPWQTqZVp0BgG8fBpXrS2FehGlAua1Eixci4=9DJi6qkF4eRc81vv6SlC3gxKL9oJJ4efkktzNaZAnkA=
> 
> 
> 
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] asking for your vote for an RFE to support NFS V4.1

2018-03-09 Thread Stijn De Weirdt
hi marcelo,

can you try

https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe_ID=117398


stijn

On 03/09/2018 01:51 PM, Marcelo Garcia wrote:
> Hi
> 
> I got the following error when trying the URL below:
> {e: 'Exception usecase string is null'}
> 
> Regards
> 
> mg.
> 
> 
> From: gpfsug-discuss-boun...@spectrumscale.org 
> [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Engeli Willi 
> (ID SD)
> Sent: Freitag, 9. März 2018 13:21
> To: gpfsug-discuss@spectrumscale.org
> Subject: [gpfsug-discuss] asking for your vote for an RFE to support NFS V4.1
> 
> Hello Group,
> I've just created a request for enhancement (RFE) to have ganesha supporting 
> NFS V4.1.
> It is important, to have this new Protocol version supported, since our Linux 
> clients default support is more than 80% based on this version by default and 
> Linux distributions are actively pushing this Protocol.
> The protocol also brings important corrections and enhancements with it.
> 
> I would like to ask you all very kindly to vote for this RFE please.
> You find it here: https://www.ibm.com/developerworks/rfe/execute
> 
> Headline:NFS V4.1 Support
> 
> ID:117398
> 
> 
> Freundliche Grüsse
> 
> Willi Engeli
> ETH Zuerich
> ID Speicherdienste
> Weinbergstrasse 11
> WEC C 18
> 8092 Zuerich
> 
> Tel: +41 44 632 02 69
> 
> 
> 
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] pagepool shrink doesn't release all memory

2018-02-23 Thread Stijn De Weirdt
hi all,

we had the same idea long ago, afaik the issue we had was due to the
pinned memory the pagepool uses when RDMA is enabled.

at some point we restarted gpfs on the compute nodes for each job,
similar to the way we do swapoff/swapon; but in certain scenarios gpfs
really did not like it; so we gave up on it.

the other issue that needs to be resolved is that the pagepool needs to
be numa aware, so the pagepool is nicely allocated across all numa
domains, instead of using the first ones available. otherwise compute
jobs might start that only do non-local domain memory access.
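
(a quick way to see how lopsided the pagepool allocation ends up, assuming
the numactl tools are installed:

numastat -p $(pgrep -x mmfsd)   # per-NUMA-node breakdown of mmfsd memory, i.e. mostly the pagepool
)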

stijn

On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote:
> AFAIK you can increase the pagepool size dynamically but you cannot shrink 
> it dynamically.  To shrink it you must restart the GPFS daemon.   Also, 
> could you please provide the actual pmap commands you executed?
> 
> Regards, The Spectrum Scale (GPFS) team
> 
> --
> If you feel that your question can benefit other users of  Spectrum Scale 
> (GPFS), then please post it to the public IBM developerWroks Forum at 
> https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479
> . 
> 
> If your query concerns a potential software error in Spectrum Scale (GPFS) 
> and you have an IBM software maintenance contract please contact 
> 1-800-237-5511 in the United States or your local IBM Service Center in 
> other countries. 
> 
> The forum is informally monitored as time permits and should not be used 
> for priority messages to the Spectrum Scale (GPFS) team.
> 
> 
> 
> From:   Aaron Knister 
> To: 
> Date:   02/22/2018 10:30 PM
> Subject:Re: [gpfsug-discuss] pagepool shrink doesn't release all 
> memory
> Sent by:gpfsug-discuss-boun...@spectrumscale.org
> 
> 
> 
> This is also interesting (although I don't know what it really means). 
> Looking at pmap run against mmfsd I can see what happens after each step:
> 
> # baseline
> 7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
> 7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
> 0200 1048576K 1048576K 1048576K 1048576K  0K rwxp [anon]
> Total:   1613580K 1191020K 1189650K 1171836K  0K
> 
> # tschpool 64G
> 7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
> 7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
> 0200 67108864K 67108864K 67108864K 67108864K  0K rwxp 
> [anon]
> Total:   67706636K 67284108K 67282625K 67264920K  0K
> 
> # tschpool 1G
> 7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
> 7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
> 02000140 139264K 139264K 139264K 139264K  0K rwxp [anon]
> 020fc940 897024K 897024K 897024K 897024K  0K rwxp [anon]
> 020009c0 66052096K  0K  0K  0K  0K rwxp [anon]
> Total:   67706636K 1223820K 1222451K 1204632K  0K
> 
> Even though mmfsd has that 64G chunk allocated there's none of it 
> *used*. I wonder why Linux seems to be accounting it as allocated.
> 
> -Aaron
> 
> On 2/22/18 10:17 PM, Aaron Knister wrote:
>> I've been exploring the idea for a while of writing a SLURM SPANK plugin 
> 
>> to allow users to dynamically change the pagepool size on a node. Every 
>> now and then we have some users who would benefit significantly from a 
>> much larger pagepool on compute nodes but by default keep it on the 
>> smaller side to make as much physmem available as possible to batch 
> work.
>>
>> In testing, though, it seems as though reducing the pagepool doesn't 
>> quite release all of the memory. I don't really understand it because 
>> I've never before seen memory that was previously resident become 
>> un-resident but still maintain the virtual memory allocation.
>>
>> Here's what I mean. Let's take a node with 128G and a 1G pagepool.
>>
>> If I do the following to simulate what might happen as various jobs 
>> tweak the pagepool:
>>
>> - tschpool 64G
>> - tschpool 1G
>> - tschpool 32G
>> - tschpool 1G
>> - tschpool 32G
>>
>> I end up with this:
>>
>> mmfsd thinks there's 32G resident but 64G virt
>> # ps -o vsz,rss,comm -p 24397
>> VSZ   RSS COMMAND
>> 67589400 33723236 mmfsd
>>
>> however, linux thinks there's ~100G used
>>
>> # free -g
>>   total   used   free sharedbuffers 
> cached
>> Mem:   125100 25  0  0  
> 0
>> -/+ buffers/cache: 98 26
>> Swap:7  0  7
>>
>> I can jump back and forth between 1G and 32G *after* allocating 64G 
>> pagepool and the overall amount of memory in use doesn't balloon but I 
>> can't seem to shed that original 64G.
>>
>> I don't understand what's going on... 

Re: [gpfsug-discuss] CCR cluster down for the count?

2017-09-20 Thread Stijn De Weirdt
hi kevin,

we were hit by a similar issue when we did something not so smart: we had
a 5 node quorum, and we wanted to replace 1 test node with 3 more
production quorum nodes. we however first removed the test node, and then
with 4 quorum nodes we did mmshutdown for some other config
modifications. when we tried to start it, we hit the same "Not enough
CCR quorum nodes available" errors.

also, none of the ccr commands were helpful; they also hung, even
simple ones like show etc etc.

what we did in the end was the following (and some try-and-error):

from the /var/adm/ras/mmsdrserv.log logfiles we guessed that we had some
sort of split-brain paxos cluster (some reported "ccrd: recovery
complete (rc 809)", some the same message with 'rc 0', and some didn't have
the recovery complete on the last line(s))

* stop ccr everywhere
mmshutdown -a
mmdsh -N all pkill -9 -f mmccr

* one by one, start the paxos cluster using mmshutdown on the quorum
nodes (mmshutdown will start ccr and there is no unit or something to
help with that).
 * the nodes will join after 3-4 minutes and report "recovery complete";
wait for it before you start another one

* the trial-and-error part was that sometimes there was recovery
complete with rc=809, sometimes with rc=0. in the end, once they all had
same rc=0, paxos was happy again and eg mmlsconfig worked again.
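
(the "wait for it" step above is basically watching the log on each quorum
node, e.g.:

tail -f /var/adm/ras/mmsdrserv.log   # wait for "ccrd: recovery complete (rc 0)" before starting the next one
)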


this left us with a very bad experience with CCR, but we want to use
ces, so there is no real alternative (and to be honest, with an odd number of
quorum nodes, we saw no more issues, everything was smooth).

in particular we were missing
* unit files for all extra services that gpfs launched (mmccrmonitor,
mmsysmon); so we can monitor and start/stop them cleanly
* ccr commands that work with broken paxos setup; eg to report that the
paxos cluster is broken or operating in some split-brain mode.

anyway, YMMV and good luck.

stijn


On 09/20/2017 06:27 PM, Buterbaugh, Kevin L wrote:
> Hi Ed,
> 
> Thanks for the suggestion … that’s basically what I had done yesterday after 
> Googling and getting a hit or two on the IBM DeveloperWorks site.  I’m 
> including some output below which seems to show that I’ve got everything set 
> up but it’s still not working.
> 
> Am I missing something?  We don’t use CCR on our production cluster (and this 
> experience doesn’t make me eager to do so!), so I’m not that familiar with 
> it...
> 
> Kevin
> 
> /var/mmfs/gen
> root@testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v 
> grep" | sort
> testdellnode1:  root  2583 1  0 May30 ?00:10:33 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testdellnode1:  root  6694  2583  0 11:19 ?00:00:00 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testgateway:  root  2023  5828  0 11:19 ?00:00:00 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testgateway:  root  5828 1  0 Sep18 ?00:00:19 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testnsd1:  root 19356  4628  0 11:19 tty1 00:00:00 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testnsd1:  root  4628 1  0 Sep19 tty1 00:00:04 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testnsd2:  root 22149  2983  0 11:16 ?00:00:00 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testnsd2:  root  2983 1  0 Sep18 ?00:00:27 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testnsd3:  root 15685  6557  0 11:19 ?00:00:00 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testnsd3:  root  6557 1  0 Sep19 ?00:00:04 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testsched:  root 29424  6512  0 11:19 ?00:00:00 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> testsched:  root  6512 1  0 Sep18 ?00:00:20 
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> /var/mmfs/gen
> root@testnsd2# mmstartup -a
> get file failed: Not enough CCR quorum nodes available (err 809)
> gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
> mmstartup: Command failed. Examine previous error messages to determine cause.
> /var/mmfs/gen
> root@testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort
> testdellnode1:  drwxr-xr-x 2 root root 4096 Mar  3  2017 cached
> testdellnode1:  drwxr-xr-x 2 root root 4096 Nov 10  2016 committed
> testdellnode1:  -rw-r--r-- 1 root root   99 Nov 10  2016 ccr.nodes
> testdellnode1:  total 12
> testgateway:  drwxr-xr-x. 2 root root 4096 Jun 29  2016 committed
> testgateway:  drwxr-xr-x. 2 root root 4096 Mar  3  2017 cached
> testgateway:  -rw-r--r--. 1 root root   99 Jun 29  2016 ccr.nodes
> testgateway:  total 12
> testnsd1:  drwxr-xr-x 2 root root  6 Sep 19 15:38 cached
> testnsd1:  drwxr-xr-x 2 root root  6 Sep 19 15:38 committed
> testnsd1:  -rw-r--r-- 1 root root  0 Sep 19 15:39 ccr.disks
> 

[gpfsug-discuss] mixed verbsRdmaSend

2017-09-06 Thread Stijn De Weirdt
hi all,

what is the expected behaviour of a mixed verbsRdmaSend setup: some
nodes enabled, most disabled.

we have some nodes that have a very high iops workload, but most of the
cluster of 500+ nodes does not have such a use case.
we enabled verbsRdmaSend on the managers/quorum nodes (<10) and on the
few (<10) clients with this workload, but not on the others (500+). it
seems to work out fine, but is this acceptable as a config? (the docs
mention that enabling verbsRdmaSend on >100 nodes might lead to errors).
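
(for reference, the per-node override is done with something along these
lines - node names are placeholders, and iirc the setting only takes effect
after a daemon restart on those nodes:

mmchconfig verbsRdmaSend=yes -N managerNodes,highiops01,highiops02
mmlsconfig verbsRdmaSend   # shows the per-node overrides next to the default
)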


the nodes use ipoib as the ip network, and running with verbsRdmaSend
disabled on all nodes leads to an unstable cluster (TX errors (<1 error in
1M packets) on some clients leading to gpfs expelling nodes etc).
(we still need to open a case with mellanox to investigate further)

many thanks,

stijn
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] data integrity documentation

2017-08-02 Thread Stijn De Weirdt
hi steve,

> The nsdChksum settings for non-GNR/ESS based systems are not officially 
> supported. They will perform checksums on data transfer over the network 
> only and can be used to help debug data corruption when the network is a 
> suspect.
i'll take not officially supported over silent bitrot any day.

> 
> Did any of those "Encountered XYZ checksum errors on network I/O to NSD 
> Client disk" warning messages resulted in disk been changed to "down" 
> state due to IO error? 
no.

> If no disk IO error was reported in GPFS log,
> that means data was retransmitted successfully on retry. 
we suspected as much. as sven already asked, mmfsck now reports a clean
filesystem.
i have an ibdump of 2 involved nsds during the reported checksums; i'll
have a closer look to see if i can spot these retries.

> 
> As sven said, only GNR/ESS provids the full end to end data integrity.
so with the silent network errors, there is a high probability that the data
is corrupted.

we are now looking for a test to find out what adapters are affected. we
hoped that nsdperf with verify=on would tell us, but it doesn't.

> 
> Steve Y. Xiao
> 
> 
> 
> 
> 
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] data integrity documentation

2017-08-02 Thread Stijn De Weirdt
o1;f0v00e0p0_S08o1;f1v01e0g0_Sm09o1;f1v01e0p0_S09o1;
>  -d 
> f0v02e0g0_Sm10o1;f0v02e0p0_S10o1;f1v03e0g0_Sm11o1;f1v03e0p0_S11o1;f0v04e0g0_Sm12o1;f0v04e0p0_S12o1;f1v05e0g0_Sm13o1;f1v05e0p0_S13o1;f0v06e0g0_Sm14o1;f0v06e0p0_S14o1;
>  -d 
> f1v07e0g0_Sm15o1;f1v07e0p0_S15o1;f0v00e0p0_S16o0;f1v01e0p0_S17o0;f0v02e0p0_S18o0;f1v03e0p0_S19o0;f0v04e0p0_S20o0;f1v05e0p0_S21o0;f0v06e0p0_S22o0;f1v07e0p0_S23o0;
>  -d 
> f0v00e0p0_S24o1;f1v01e0p0_S25o1;f0v02e0p0_S26o1;f1v03e0p0_S27o1;f0v04e0p0_S28o1;f1v05e0p0_S29o1;f0v06e0p0_S30o1;f1v07e0p0_S31o1
>   Disks in file system
>  -A no   Automatic mount option
>  -o none Additional mount options
>  -T /scratch  Default mount point
>  --mount-priority   0   



> 
> on the tsdbfs i am not sure if it gave wrong results, but it would be worth
> a test to see whats actually on the disk .
ok. i'll try this tomorrow.

> 
> you are correct that GNR extends this to the disk, but the network part is
> covered by the nsdchecksums you turned on
> when you enable the not to be named checksum parameter do you actually
> still get an error from fsck ?
hah, no, we don't. mmfsck says the filesystem is clean. we found this
odd, so we already asked ibm support about this but no answer yet.

stijn

> 
> sven
> 
> 
> On Wed, Aug 2, 2017 at 2:14 PM Stijn De Weirdt <stijn.dewei...@ugent.be>
> wrote:
> 
>> hi sven,
>>
>>> before i answer the rest of your questions, can you share what version of
>>> GPFS exactly you are on mmfsadm dump version would be best source for
>> that.
>> it returns
>> Build branch "4.2.3.3 ".
>>
>>> if you have 2 inodes and you know the exact address of where they are
>>> stored on disk one could 'dd' them of the disk and compare if they are
>>> really equal.
>> ok, i can try that later. are you suggesting that the "tsdbfs comp"
>> might gave wrong results? because we ran that and got eg
>>
>>> # tsdbfs somefs comp 7:5137408 25:221785088 1024
>>> Comparing 1024 sectors at 7:5137408 = 0x7:4E6400 and 25:221785088 =
>> 0x19:D382C00:
>>>   All sectors identical
>>
>>
>>> we only support checksums when you use GNR based systems, they cover
>>> network as well as Disk side for that.
>>> the nsdchecksum code you refer to is the one i mentioned above thats only
>>> supported with GNR at least i am not aware that we ever claimed it to be
>>> supported outside of it, but i can check that.
>> ok, maybe i'm a bit confused. we have a GNR too, but it's not this one,
>> and they are not in the same gpfs cluster.
>>
>> i thought the GNR extended the checksumming to disk, and that it was
>> already there for the network part. thanks for clearing this up. but
>> that is worse than i thought...
>>
>> stijn
>>
>>>
>>> sven
>>>
>>> On Wed, Aug 2, 2017 at 12:20 PM Stijn De Weirdt <stijn.dewei...@ugent.be
>>>
>>> wrote:
>>>
>>>> hi sven,
>>>>
>>>> the data is not corrupted. mmfsck compares 2 inodes, says they don't
>>>> match, but checking the data with tbdbfs reveals they are equal.
>>>> (one replica has to be fetched over the network; the nsds cannot access
>>>> all disks)
>>>>
>>>> with some nsdChksum... settings we get during this mmfsck a lot of
>>>> "Encountered XYZ checksum errors on network I/O to NSD Client disk"
>>>>
>>>> ibm support says these are hardware issues, but wrt to mmfsck false
>>>> positives.
>>>>
>>>> anyway, our current question is: if these are hardware issues, is there
>>>> anything in gpfs client->nsd (on the network side) that would detect
>>>> such errors. ie can we trust the data (and metadata).
>>>> i was under the impression that client to disk is not covered, but i
>>>> assumed that at least client to nsd (the network part) was checksummed.
>>>>
>>>> stijn
>>>>
>>>>
>>>> On 08/02/2017 09:10 PM, Sven Oehme wrote:
>>>>> ok, i think i understand now, the data was already corrupted. the
>> config
>>>>> change i proposed only prevents a potentially known future on the wire
>>>>> corruption, this will not fix something that made it to the disk
>> already.
>>>>>
>>>>> Sven
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 2, 2017 at 

Re: [gpfsug-discuss] data integrity documentation

2017-08-02 Thread Stijn De Weirdt
hi ed,

On 08/02/2017 10:11 PM, Edward Wahl wrote:
> What version of GPFS?  Are you generating a patch file?
4.2.3 series, now we run 4.2.3.3

to be clear, right now we use mmfsck to trigger the chksum issue hoping
we can find the actual "hardware" issue.

we know by elimination which HCAs to avoid, so we do not get the
checksum errors. but to consider that a fix, we need to know whether the data
written by the client can be trusted despite these silent hw errors.

> 
> Try using this before your mmfsck:
> 
> mmdsh -N <nsdnodes|all> mmfsadm test fsck usePatchQueue 0
mmchmgr somefs nsdXYZ
mmfsck somefs -Vn -m -N nsdXYZ -t /var/tmp/

the idea is to force everything as much as possible onto one node, so that
access to the other failure group is forced over the network

> 
> my notes say all, but I would have only had NSD nodes up at the time.
> Supposedly the mmfsck mess in 4.1 and 4.2.x was fixed in 4.2.2.3. 
we had the "pleasure" last to have mmfsck segfaulting while we were
trying to recover a filesystem, at least that was certainly fixed ;)


stijn

> I won't know for sure until late August.
> 
> Ed
> 
> 
> On Wed, 2 Aug 2017 21:20:14 +0200
> Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
> 
>> hi sven,
>>
>> the data is not corrupted. mmfsck compares 2 inodes, says they don't
>> match, but checking the data with tbdbfs reveals they are equal.
>> (one replica has to be fetched over the network; the nsds cannot access
>> all disks)
>>
>> with some nsdChksum... settings we get during this mmfsck a lot of
>> "Encountered XYZ checksum errors on network I/O to NSD Client disk"
>>
>> ibm support says these are hardware issues, but wrt to mmfsck false
>> positives.
>>
>> anyway, our current question is: if these are hardware issues, is there
>> anything in gpfs client->nsd (on the network side) that would detect
>> such errors. ie can we trust the data (and metadata).
>> i was under the impression that client to disk is not covered, but i
>> assumed that at least client to nsd (the network part) was checksummed.
>>
>> stijn
>>
>>
>> On 08/02/2017 09:10 PM, Sven Oehme wrote:
>>> ok, i think i understand now, the data was already corrupted. the config
>>> change i proposed only prevents a potentially known future on the wire
>>> corruption, this will not fix something that made it to the disk already.
>>>
>>> Sven
>>>
>>>
>>>
>>> On Wed, Aug 2, 2017 at 11:53 AM Stijn De Weirdt <stijn.dewei...@ugent.be>
>>> wrote:
>>>   
>>>> yes ;)
>>>>
>>>> the system is in preproduction, so nothing that can't stopped/started in
>>>> a few minutes (current setup has only 4 nsds, and no clients).
>>>> mmfsck triggers the errors very early during inode replica compare.
>>>>
>>>>
>>>> stijn
>>>>
>>>> On 08/02/2017 08:47 PM, Sven Oehme wrote:  
>>>>> How can you reproduce this so quick ?
>>>>> Did you restart all daemons after that ?
>>>>>
>>>>> On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <stijn.dewei...@ugent.be>
>>>>> wrote:
>>>>>  
>>>>>> hi sven,
>>>>>>
>>>>>>  
>>>>>>> the very first thing you should check is if you have this setting
>>>>>>> set :  
>>>>>> maybe the very first thing to check should be the faq/wiki that has this
>>>>>> documented?
>>>>>>  
>>>>>>>
>>>>>>> mmlsconfig envVar
>>>>>>>
>>>>>>> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
>>>>>>> MLX5_USE_MUTEX 1
>>>>>>>
>>>>>>> if that doesn't come back the way above you need to set it :
>>>>>>>
>>>>>>> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
>>>>>>> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"  
>>>>>> i just set this (wasn't set before), but problem is still present.
>>>>>>  
>>>>>>>
>>>>>>> there was a problem in the Mellanox FW in various versions that was  
>>>> never  
>>>>>>> completely addressed (bugs where found and fixed, but it was never  
>>>> fully  
>>>>>>> proven to be addressed) the above environment variables turn code on
>>>>>>> in  
>>>>>

Re: [gpfsug-discuss] data integrity documentation

2017-08-02 Thread Stijn De Weirdt
hi sven,


> the very first thing you should check is if you have this setting set :
maybe the very first thing to check should be the faq/wiki that has this
documented?

> 
> mmlsconfig envVar
> 
> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
> MLX5_USE_MUTEX 1
> 
> if that doesn't come back the way above you need to set it :
> 
> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
i just set this (it wasn't set before), but the problem is still present.

> 
> there was a problem in the Mellanox FW in various versions that was never
> completely addressed (bugs where found and fixed, but it was never fully
> proven to be addressed) the above environment variables turn code on in the
> mellanox driver that prevents this potential code path from being used to
> begin with.
> 
> in Spectrum Scale 4.2.4 (not yet released) we added a workaround in Scale
> that even you don't set this variables the problem can't happen anymore
> until then the only choice you have is the envVar above (which btw ships as
> default on all ESS systems).
> 
> you also should be on the latest available Mellanox FW & Drivers as not all
> versions even have the code that is activated by the environment variables
> above, i think at a minimum you need to be at 3.4 but i don't remember the
> exact version. There had been multiple defects opened around this area, the
> last one i remember was  :
we run mlnx ofed 4.1; the fw is not the latest, but we have edr cards from
dell, and the fw is a bit behind. i'm trying to convince dell to make a
new one. mellanox used to allow making your own, but they don't anymore.

> 
> 00154843 : ESS ConnectX-3 performance issue - spinning on pthread_spin_lock
> 
> you may ask your mellanox representative if they can get you access to this
> defect. while it was found on ESS , means on PPC64 and with ConnectX-3
> cards its a general issue that affects all cards and on intel as well as
> Power.
ok, thanks for this. maybe such a reference is enough for dell to update
their firmware.

stijn

> 
> On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <stijn.dewei...@ugent.be>
> wrote:
> 
>> hi all,
>>
>> is there any documentation wrt data integrity in spectrum scale:
>> assuming a crappy network, does gpfs somehow guarantee that data written
>> by a client ends up safe in the nsd gpfs daemon, and similarly from the
>> nsd gpfs daemon to disk?
>>
>> and wrt crappy network, what about rdma on crappy network? is it the same?
>>
>> (we are hunting down a crappy infiniband issue; ibm support says it's
>> network issue; and we see no errors anywhere...)
>>
>> thanks a lot,
>>
>> stijn
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
> 
> 
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] data integrity documentation

2017-08-02 Thread Stijn De Weirdt
> No guarantee...unless you are using ess/gss solution.
ok, so crappy network == corrupt data? hmmm, that is really a pity in
2017...

> 
> Crappy network will get you loads of expels and occasional fscks.  Which I
> guess beats data loss and recovery from backup.
if only we had errors like that. with the current issue mmfsck is the
only tool that seems to trigger them (and setting some of the nsdChksum
config flags reports checksum errors in the log files). but nsdperf with
verify=on reports nothing.

> 
> You probably have a network issue...they can be subtle.  Gpfs is an
> extremely thorough network tester.
we know ;)

stijn


> 
> 
> Eric
> 
> On Wed, Aug 2, 2017 at 11:57 AM, Stijn De Weirdt <stijn.dewei...@ugent.be>
> wrote:
> 
>> hi all,
>>
>> is there any documentation wrt data integrity in spectrum scale:
>> assuming a crappy network, does gpfs somehow guarantee that data written
>> by a client ends up safe in the nsd gpfs daemon, and similarly from the
>> nsd gpfs daemon to disk?
>>
>> and wrt crappy network, what about rdma on crappy network? is it the same?
>>
>> (we are hunting down a crappy infiniband issue; ibm support says it's
>> network issue; and we see no errors anywhere...)
>>
>> thanks a lot,
>>
>> stijn
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
> 
> 
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] data integrity documentation

2017-08-02 Thread Stijn De Weirdt
hi all,

is there any documentation wrt data integrity in spectrum scale:
assuming a crappy network, does gpfs somehow guarantee that data written
by a client ends up safe in the nsd gpfs daemon, and similarly from the
nsd gpfs daemon to disk?

and wrt crappy network, what about rdma on crappy network? is it the same?

(we are hunting down a crappy infiniband issue; ibm support says it's
network issue; and we see no errors anywhere...)
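
(for context, the standard places to look for such errors are the
infiniband-diags counters, e.g.:

ibqueryerrors   # fabric-wide list of ports with non-zero error counters
perfquery       # local port counters (SymbolErrorCounter, PortRcvErrors, ...)
)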

thanks a lot,

stijn
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] gpfs waiters debugging

2017-06-06 Thread Stijn De Weirdt
oh sure, i meant waiters that last > 300 seconds or so (something that
could trigger deadlock). obviously we're not interested in debugging the
short ones, it's not that gpfs doesn't work or anything ;)

stijn

On 06/06/2017 02:57 PM, Frederick Stock wrote:
> Realize that generally any waiter under 1 second should be ignored.  In an 
> active GPFS system there are always waiters and the greater the use of the 
> system likely the more waiters you will see.  The point is waiters 
> themselves are not an indication your system is having problems.
> 
> As for creating them any steady level of activity against the file system 
> should cause waiters to appear, though most should be of a short duration.
> 
> 
> Fred
> __
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> sto...@us.ibm.com
> 
> 
> 
> From:   Stijn De Weirdt <stijn.dewei...@ugent.be>
> To: gpfsug-discuss@spectrumscale.org
> Date:   06/06/2017 08:31 AM
> Subject:Re: [gpfsug-discuss] gpfs waiters debugging
> Sent by:gpfsug-discuss-boun...@spectrumscale.org
> 
> 
> 
> hi bob,
> 
> waiters from RPC replies and/or threads waiting on mutex are most 
> "popular".
> 
> but my question is not how to resolve them, the question is how to
create such a waiter so we can train ourselves in grep and mmfsadm etc etc
> 
> we want to recreate the waiters a few times, try out some things and
> either script or at least put instructions on our internal wiki what to 
> do.
> 
> the instructions in the slides are clear enough, but there are a lot of
> slides, and typically when this occurs offshift, you don't want to start
> with rereading the slides and wondering what to do next; let alone debug
> scripts ;)
> 
> thanks,
> 
> stijn
> 
> On 06/06/2017 01:44 PM, Oesterlin, Robert wrote:
>> Hi Stijn
>>
>> You need to provide some more details on the type and duration of the 
> waiters before the group can offer some advice.
>>
>> Bob Oesterlin
>> Sr Principal Storage Engineer, Nuance
>>
>>
>>
>> On 6/6/17, 2:05 AM, "gpfsug-discuss-boun...@spectrumscale.org on behalf 
> of Stijn De Weirdt" <gpfsug-discuss-boun...@spectrumscale.org on behalf of 
> stijn.dewei...@ugent.be> wrote:
>>
>>
>> but we are wondering if and how we can cause those waiters ourself, 
> so
>> we can train ourself in debugging and resolving them (either on test
>> system or in controlled environment on the production clusters).
>>
>> all hints welcome.
>>
>> stijn
>> ___
>>
>>
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> 
> 
> 
> 
> 
> 
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] gpfs waiters debugging

2017-06-06 Thread Stijn De Weirdt
hi bob,

waiters from RPC replies and/or threads waiting on mutex are most "popular".

but my question is not how to resolve them, the question is how to
create such a waiter so we can train ourselves in grep and mmfsadm etc etc

we want to recreate the waiters a few times, try out some things and
either script or at least put instructions on our internal wiki what to do.

the instructions in the slides are clear enough, but there are a lot of
slides, and typically when this occurs offshift, you don't want to start
with rereading the slides and wondering what to do next; let alone debug
scripts ;)
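
(to give an idea of the kind of snippet we'd want on the wiki, the observation
side is easy to script - a rough sketch, paths/flags to be double-checked:

mmdsh -N all /usr/lpp/mmfs/bin/mmdiag --waiters   # cluster-wide list of current waiters
mmfsadm dump waiters                              # more detail on the local node

as for provoking one, the crudest approach we can think of - untested, and for
a throwaway test cluster only - would be to freeze mmfsd on one nsd server
while a client does i/o:

kill -STOP $(pgrep -x mmfsd)   # on the nsd server; kill -CONT to resume (expect expels if held too long)
)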

thanks,

stijn

On 06/06/2017 01:44 PM, Oesterlin, Robert wrote:
> Hi Stijn
> 
> You need to provide some more details on the type and duration of the waiters 
> before the group can offer some advice.
> 
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
> 
>  
> 
> On 6/6/17, 2:05 AM, "gpfsug-discuss-boun...@spectrumscale.org on behalf of 
> Stijn De Weirdt" <gpfsug-discuss-boun...@spectrumscale.org on behalf of 
> stijn.dewei...@ugent.be> wrote:
> 
> 
> but we are wondering if and how we can cause those waiters ourselves, so
> we can train ourselves in debugging and resolving them (either on a test
> system or in a controlled environment on the production clusters).
> 
> all hints welcome.
> 
> stijn
> ___
>  
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] gpfs waiters debugging

2017-06-06 Thread Stijn De Weirdt
hi all,

we have recently been hit by quite a few cases that triggered long waiters.

we are aware of the excellent slides
http://files.gpfsug.org/presentations/2017/NERSC/GPFS-Troubleshooting-Apr-2017.pdf

but we are wondering if and how we can cause those waiters ourselves, so
we can train ourselves in debugging and resolving them (either on a test
system or in a controlled environment on the production clusters).

all hints welcome.

stijn
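
one hypothetical way to provoke RPC-reply waiters on a *test* cluster --
not taken from this thread -- is to inject artificial network latency on a
disposable client with tc/netem and then generate some I/O; the interface
name, path and delay below are assumptions, and the delay should be kept
modest and short-lived or the node may simply get expelled:

  tc qdisc add dev eth0 root netem delay 300ms    # add latency on the daemon interface
  dd if=/dev/zero of=/gpfs/fs0/scratch/waitertest bs=1M count=2048    # generate some traffic
  mmdiag --waiters                                # watch waiters here and on the NSD servers
  tc qdisc del dev eth0 root netem                # clean up
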
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] connected v. datagram mode

2017-05-14 Thread Stijn De Weirdt
hi all,

does anyone know about the impact on memory usage? afaik, connected mode
keeps buffers for each QP (old-ish mellanox (connectx-2, MLNX ofed2)
instructions suggested not to use CM for large-ish clusters (>128 nodes
at that time)).

we never turned it back on, and we now have 700 nodes.

wrt IPoIB MTU, UD can go up to 4092 (4K IB MTU minus the 4-byte IPoIB
header) with the correct opensm configuration.


stijn
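
a quick way to see what a node is actually running (ib0 and the mmdsh node
class are assumptions):

  cat /sys/class/net/ib0/mode              # prints 'datagram' or 'connected'
  ip link show ib0 | grep -o 'mtu [0-9]*'
  # or cluster-wide:
  /usr/lpp/mmfs/bin/mmdsh -N all 'cat /sys/class/net/ib0/mode' | awk '{print $NF}' | sort | uniq -c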

On 05/13/2017 01:27 AM, Laurence Horrocks-Barlow wrote:
> It also depends on the adapter.
> 
> We have seen better performance using datagram with MLNX adapters;
> however, we see better performance in connected mode when using Intel
> TrueScale. Again, as Jonathon has mentioned, we have also seen better
> performance when using connected mode on an active/slave bonded interface
> (even across a mixed MLNX/TrueScale fabric).
> 
> There is also a significant difference in the MTU size you can use in
> datagram vs connected mode, with datagram being limited to 2044 (if
> memory serves) whereas connected mode can use 65520 (again if memory
> serves).
> 
> I typically now run qperf and nsdperf benchmarks to find the best
> configuration.
> 
> -- Lauz
> 
> On 12/05/2017 16:05, Jonathon A Anderson wrote:
>> It may be true that you should always favor connected mode; but those
>> instructions look like they’re specifically only talking about when
>> you have bonded interfaces.
>>
>> ~jonathon
>>
>>
>> On 5/12/17, 9:03 AM, "gpfsug-discuss-boun...@spectrumscale.org on
>> behalf of Jan-Frode Myklebust" <janfr...@tanso.net> wrote:
>>
>> I also don't know much about this, but the ESS
>> quick deployment guide is quite clear that we should use connected
>> mode for IPoIB:
>>   --
>>  Note: If using bonded IP over IB, do the following: Ensure that
>> the CONNECTED_MODE=yes statement exists in the corresponding
>> slave-bond interface scripts located in /etc/sysconfig/network-scripts
>> directory of the management server and I/O server nodes. These
>>   scripts are created as part of the IP over IB bond creation. An
>> example of the slave-bond interface with the modification is shown below.
>>  ---
>>-jf
>>  On Fri, 12 May 2017 at 16:48, Aaron Knister wrote:
>> For what it's worth we've seen *significantly* better performance of
>> streaming benchmarks of IPoIB with connected mode vs datagram mode on IB.
>> -Aaron
>>   On 5/12/17 10:43 AM, Jonathon A Anderson wrote:
>>  > This won’t tell you which to use; but datagram mode and
>>  > connected mode in IB are roughly analogous to UDP vs TCP in IP. One is
>>  > “unreliable” in that there’s no checking/retry built into the
>>  > protocol; the other is “reliable” and detects whether data is received
>>  > completely and in the correct order.
>>  >
>>  > The last advice I heard for traditional IB was that the
>>  > overhead of connected mode isn’t worth it, particularly if you’re
>>  > using IPoIB (where you’re likely to be using TCP anyway). That said,
>>  > on our OPA network we’re seeing the opposite advice; so I, too, am
>>  > often unsure what the most correct configuration would be for
>>  > any given fabric.
>>  >
>>  > ~jonathon
>>  >
>>  >
>>  > On 5/12/17, 4:42 AM, "gpfsug-discuss-boun...@spectrumscale.org
>>  > on behalf of Damir Krstic" <damir.krs...@gmail.com> wrote:
>>  >
>>  > I never fully understood the difference between connected
>>  > v. datagram mode besides the obvious packet size difference. Our NSD
>>  > servers (ESS GL6 nodes) are installed with RedHat 7 and are in
>>  > connected mode. Our 700+ clients are running RH6 and
>>  > are in datagram mode.
>>  >
>>  >
>>  > In a month we are upgrading our cluster to RedHat 7 and are
>>  > debating whether to leave the compute nodes in datagram mode or
>>  > whether to switch them to connected mode.
>>  > What is the right thing to do?
>>  >
>>  >
>>  > Thanks in advance.
>>  > Damir
>>  >
>>  >
>>  >
>>  > ___
>>  > gpfsug-discuss mailing list
>>  > gpfsug-discuss at spectrumscale.org
>>  > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>  >
>>   --
>>  Aaron Knister
>>  NASA Center for Climate Simulation (Code 606.2)
>>  Goddard Space Flight Center
>>  (301) 286-2776
>>  ___
>>  gpfsug-discuss mailing list
>>  gpfsug-discuss at spectrumscale.org
>>  http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>> ___
>> gpfsug-discuss mailing list
>> 
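
the slave-bond example that the quoted ESS note refers to did not make it
into this archive; for reference, such an ifcfg script typically looks
roughly like the sketch below (device and bond names are assumptions, and
the large MTU is usually set on the bond master rather than on the slave):

  # /etc/sysconfig/network-scripts/ifcfg-ib0   (sketch only)
  DEVICE=ib0
  TYPE=InfiniBand
  ONBOOT=yes
  BOOTPROTO=none
  MASTER=bond0
  SLAVE=yes
  NM_CONTROLLED=no
  CONNECTED_MODE=yes
  # MTU (e.g. 65520) typically goes in ifcfg-bond0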

[gpfsug-discuss] workaround gpfs 4.2.1-0 rpm issue

2016-10-27 Thread Stijn De Weirdt
hi all,

the gpfs.base 4.2.1-0 rpm has the postuninstall snippet quoted at the
bottom of this mail.

it always disables the gpfs unit (when it was previously enabled), whether
the scriptlet runs for a removal or for an upgrade.

this, however, breaks the update to 4.2.1-1: during an rpm upgrade the old
package's postuninstall runs after the new package's post, so it removes
the unit that was just (re)added during install (the post scriptlet runs
'systemctl reenable /usr/lpp/mmfs/lib/systemd/gpfs.service').

so after the upgrade, we are left with nodes that have no gpfs service
unit (and thus no gpfs).

it would have been better if the rpm had symlinked the service into the
/usr/lib/systemd/... units and enabled/disabled it there.

i'll probably use rpmrebuild on the 4.2.1-1 rpms to put a more permanent
unit in a proper system location. other tips are welcome.


stijn

> postuninstall scriptlet (using /bin/sh):
>   if test -n "$DEBUG" || test -n "$DEBUGpostun"; then
> set -x
>   fi
>   packageCnt=$1
>   debian_release="/etc/debian_version"
> 
>   # set the system utilities if they are in different path on different 
> systems
>   if [ -f "$debian_release" ]
>   then
> AWK=/usr/bin/awk
>   else
> AWK=/bin/awk
>   fi
> 
>   if /usr/bin/systemctl -q is-enabled gpfs.service 2>/dev/null
>   then
> /usr/bin/systemctl -q disable gpfs.service
>   fi
> 
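
for comparison, a sketch of the kind of guard that would avoid this,
relying on the standard rpm convention that the first scriptlet argument
is 0 on a real erase and >= 1 on an upgrade (illustration only):

  packageCnt=$1
  # only disable the unit when the package is really being removed,
  # not when an upgrade is replacing it
  if [ "$packageCnt" -eq 0 ]; then
    if /usr/bin/systemctl -q is-enabled gpfs.service 2>/dev/null; then
      /usr/bin/systemctl -q disable gpfs.service
    fi
  fi
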
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] GPFS on ZFS?

2016-06-13 Thread Stijn De Weirdt
hi chris,

do you have any form of HA for the zfs blockdevices/jbod (e.g. when an NSD
server reboots/breaks/...)? or do you rely on replication within GPFS?


stijn
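
for context, the general pattern behind "GPFS with underlying ZFS block
devices" is presumably something along the lines of the sketch below
(pool/zvol/server names are made up, and this is not necessarily the
recipe from the LANL presentation); since a zpool is imported on a single
server at a time, the HA question above indeed comes down to GPFS-level
replication or to failing the pool/JBOD over between servers:

  # build a raidz pool and carve a zvol out of it (names/sizes are examples)
  zpool create tank raidz2 sdb sdc sdd sde sdf sdg
  zfs create -V 10T tank/nsd01       # appears as /dev/zvol/tank/nsd01

  # GPFS device discovery will not find zvols on its own; the
  # /var/mmfs/etc/nsddevices user exit has to be extended to list them
  # (GPFS ships a sample nsddevices script to start from)

  # then the usual NSD stanza and creation; nsd.stanza contains e.g.:
  #   %nsd: device=/dev/zvol/tank/nsd01 nsd=nsd01 servers=nsdserver01 usage=dataAndMetadata
  mmcrnsd -F nsd.stanza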

On 06/13/2016 06:19 PM, Hoffman, Christopher P wrote:
> Hi Jaime,
> 
> What in particular would you like explained more? I'd be more than happy to 
> discuss things further.
> 
> Chris
> 
> From: gpfsug-discuss-boun...@spectrumscale.org 
> [gpfsug-discuss-boun...@spectrumscale.org] on behalf of Jaime Pinto 
> [pi...@scinet.utoronto.ca]
> Sent: Monday, June 13, 2016 10:11
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] GPFS on ZFS?
> 
> I just came across this presentation on "GPFS with underlying ZFS
> block devices", by Christopher Hoffman, Los Alamos National Lab,
> although some of the
> implementation remains obscure.
> 
> http://files.gpfsug.org/presentations/2016/anl-june/LANL_GPFS_ZFS.pdf
> 
> It would be great to have more details, in particular about the
> possibility of straightforward use of GPFS on ZFS, instead of the
> 'archive' use case described in the presentation.
> 
> Thanks
> Jaime
> 
> 
> 
> 
> Quoting "Jaime Pinto" :
> 
>> Since we can not get GNR outside ESS/GSS appliances, is anybody using
>> ZFS for software raid on commodity storage?
>>
>> Thanks
>> Jaime
>>
>>
> 
> 
> 
> 
>   
>TELL US ABOUT YOUR SUCCESS STORIES
>   http://www.scinethpc.ca/testimonials
>   
> ---
> Jaime Pinto
> SciNet HPC Consortium  - Compute/Calcul Canada
> www.scinet.utoronto.ca - www.computecanada.org
> University of Toronto
> 256 McCaul Street, Room 235
> Toronto, ON, M5T1W5
> P: 416-978-2755
> C: 416-505-1477
> 
> 
> This message was sent using IMP at SciNet Consortium, University of Toronto.
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] GPFS VM passthrough

2016-05-05 Thread Stijn De Weirdt
hi all,


we are examining the possibility of giving VMs access to GPFS mounted on
the hypervisors. (use cases are readonly or read-mostly access to the data
on gpfs; the hypervisors have IB, the vms not yet; and we have no idea how
to handle a possible explosion of gpfs client licenses if we mount gpfs
inside the VMs ;)

does anyone have experience with using 9p filesystem passthrough for gpfs
with qemu/kvm?

many thanks,

stijn
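
for reference, a minimal sketch of what such a 9p passthrough could look
like (paths, mount tag and the read-only option are assumptions, and the
exact option spelling differs a bit between qemu versions):

  # hypervisor: export a GPFS path over virtio-9p
  qemu-system-x86_64 ... \
    -virtfs local,path=/gpfs/fs0/data,mount_tag=gpfsdata,security_model=passthrough,readonly=on

  # guest: mount the tag via the virtio transport
  mount -t 9p -o trans=virtio,version=9p2000.L,ro gpfsdata /mnt/gpfsdata
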
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] migrating data from GPFS3.5 to ESS appliance(GPFS4.1)

2016-01-29 Thread Stijn De Weirdt
we wrapped something based on zookeeper around rsync to be able to run
rsync in parallel, by splitting the path into subdirectories and
distributing those:
https://github.com/hpcugent/vsc-zk

it works really well if the number of files in the directories is somewhat
balanced. we use it to rsync some gpfs filesystems (200TB, 100M inodes ;)

stijn
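
vsc-zk does the coordination via zookeeper across multiple worker nodes;
purely as an illustration of the splitting idea (and not how vsc-zk works
internally), a single-node flavour with made-up paths could be as simple
as:

  # one rsync per top-level directory, 8 in parallel; top-level files and
  # deletions still need a final plain rsync pass afterwards
  mkdir -p /gpfs/fs1/dst
  cd /gpfs/fs0/src || exit 1
  find . -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -0 -P8 -I{} rsync -aHS {}/ /gpfs/fs1/dst/{}/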

On 01/29/2016 09:38 PM, Marc A Kaplan wrote:
> mmbackupconfig may be of some help.  The output is eyeball-able, so one 
> could tweak and then feed into mmrestoreconfig on the new system.
> Even if you don't use mmrestoreconfig, you might like to have the info 
> collected by mmbackupconfig.
> 
> 
> 
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss