Re: [lustre-discuss] A question about lctl lfsck

2019-07-04 Thread Andreas Dilger
You can use "lctl dk" to dump the kernel debug log on the MDS/OSS nodes, and 
grep for the LFSCK messages, but if there are lots of messages the kernel logs 
would not be enough to hold them all.

Another option is to run "lctl set_param printk=+lfsck" on the MDS and OSS
nodes so that repair messages are printed to the console, and then use syslog
filtering rules to put those messages into their own log file.
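
For example, something along these lines (a rough sketch; the filesystem/MDT
names are placeholders):

lctl set_param printk=+lfsck       # repair messages now also go to the console
lctl lfsck_start -M lustre-MDT0000 -t all
lctl get_param mdd.lustre-MDT0000.lfsck_namespace   # check phase/statistics
lctl dk /tmp/lfsck_debug.log       # dump the kernel debug buffer to a file
grep -i lfsck /tmp/lfsck_debug.log

On the syslog side, a minimal rsyslog filter could look like this (the match
string is an assumption; adjust it to the actual console messages):

# /etc/rsyslog.d/lfsck.conf
:msg, contains, "lfsck" /var/log/lfsck.log
& stop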

> On Jul 3, 2019, at 14:15, Kurt Strosahl  wrote:
> 
> Good Afternoon,
> 
> Hopefully a simple question... If I run lctl lfsck_start, is there a place 
> where I can get a list of what it did?

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud






___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors

2019-07-04 Thread Jeff Johnson
If you only have those two processor models to choose from, I’d use the 5217
for the MDS and the 5218 for the OSS. If you were using ZFS for the backend,
definitely the 5218 for the OSS: with ZFS your processors are also your RAID
controller, so you have the disk I/O, parity calculation, checksums, and ZFS
threads on top of the Lustre I/O and OS processes.
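
(Concretely: on a ZFS OSS, every write to a raidz2 OST pool, e.g. one built
with something like

zpool create ost0pool raidz2 /dev/sd[b-i]   # pool name and disks are placeholders

has its parity and checksums computed by the host CPUs, work that a hardware
RAID controller would otherwise absorb.)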

—Jeff

On Thu, Jul 4, 2019 at 13:30 Simon Legrand  wrote:

> Hello Jeff,
>
> Thanks for your quick answer. We plan to use ldiskfs, but I would also be
> interested to know what would be a good fit for zfs.
>
> Simon
>
> --
>
> *De: *"Jeff Johnson" 
> *À: *"Simon Legrand" 
> *Cc: *"lustre-discuss" 
> *Envoyé: *Jeudi 4 Juillet 2019 20:40:40
> *Objet: *Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors
>
> Simon,
>
> Which backend do you plan on using? ldiskfs or zfs?
>
> —Jeff
>
> On Thu, Jul 4, 2019 at 10:41 Simon Legrand  wrote:
>
>> Dear all,
>>
>> We are currently configuring a Lustre filesystem and facing a dilemma. We
>> have the choice between two types of processors for the OSS and MDS nodes:
>> - Intel Xeon Gold 5217: 3 GHz, 11 MB cache, 10.40 GT/s, 2 UPI, Turbo, HT,
>> 8C/16T (115 W), DDR4-2666
>> - Intel Xeon Gold 5218: 2.3 GHz, 22 MB cache, 10.40 GT/s, 2 UPI, Turbo, HT,
>> 16C/32T (105 W), DDR4-2666
>>
>> Basically, we have to choose between frequency and number of cores.
>> Our current architecture is the following:
>> - 1 MDS with 11 TB of SSD
>> - 3 OSS/OSTs (~3 × 80 TB)
>> Our final target is 6 OSS/OSTs with a single MDS.
>> Could any of you help us choose, and explain the reasons?
>>
>> Best regards,
>>
>> Simon
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors

2019-07-04 Thread Simon Legrand
Hello Jeff, 

Thanks for your quick answer. We plan to use ldiskfs, but I would also be
interested to know what would be a good fit for zfs.

Simon 

> De: "Jeff Johnson" 
> À: "Simon Legrand" 
> Cc: "lustre-discuss" 
> Envoyé: Jeudi 4 Juillet 2019 20:40:40
> Objet: Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors

> Simon,

> Which backend do you plan on using? ldiskfs or zfs?

> —Jeff

> On Thu, Jul 4, 2019 at 10:41 Simon Legrand <simon.legr...@inria.fr> wrote:

>> Dear all,

>> We are currently configuring a Lustre filesystem and facing a dilemma. We
>> have the choice between two types of processors for the OSS and MDS nodes:
>> - Intel Xeon Gold 5217: 3 GHz, 11 MB cache, 10.40 GT/s, 2 UPI, Turbo, HT,
>> 8C/16T (115 W), DDR4-2666
>> - Intel Xeon Gold 5218: 2.3 GHz, 22 MB cache, 10.40 GT/s, 2 UPI, Turbo, HT,
>> 16C/32T (105 W), DDR4-2666

>> Basically, we have to choose between frequency and number of cores.
>> Our current architecture is the following:
>> - 1 MDS with 11 TB of SSD
>> - 3 OSS/OSTs (~3 × 80 TB)
>> Our final target is 6 OSS/OSTs with a single MDS.
>> Could any of you help us choose, and explain the reasons?

>> Best regards,

>> Simon

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors

2019-07-04 Thread Jeff Johnson
Simon,

Which backend do you plan on using? ldiskfs or zfs?

—Jeff

On Thu, Jul 4, 2019 at 10:41 Simon Legrand  wrote:

> Dear all,
>
> We are currently configuring a Lustre filesystem and facing a dilemma. We
> have the choice between two types of processors for the OSS and MDS nodes:
> - Intel Xeon Gold 5217: 3 GHz, 11 MB cache, 10.40 GT/s, 2 UPI, Turbo, HT,
> 8C/16T (115 W), DDR4-2666
> - Intel Xeon Gold 5218: 2.3 GHz, 22 MB cache, 10.40 GT/s, 2 UPI, Turbo, HT,
> 16C/32T (105 W), DDR4-2666
>
> Basically, we have to choose between frequency and number of cores.
> Our current architecture is the following:
> - 1 MDS with 11 TB of SSD
> - 3 OSS/OSTs (~3 × 80 TB)
> Our final target is 6 OSS/OSTs with a single MDS.
> Could any of you help us choose, and explain the reasons?
>
> Best regards,
>
> Simon
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Frequency vs Cores for OSS/MDS processors

2019-07-04 Thread Simon Legrand
Dear all, 

We are currently configuring a Lustre filesystem and facing a dilemma. We have
the choice between two types of processors for the OSS and MDS nodes:
- Intel Xeon Gold 5217: 3 GHz, 11 MB cache, 10.40 GT/s, 2 UPI, Turbo, HT,
8C/16T (115 W), DDR4-2666
- Intel Xeon Gold 5218: 2.3 GHz, 22 MB cache, 10.40 GT/s, 2 UPI, Turbo, HT,
16C/32T (105 W), DDR4-2666

Basically, we have to choose between frequency and number of cores.
Our current architecture is the following:
- 1 MDS with 11 TB of SSD
- 3 OSS/OSTs (~3 × 80 TB)
Our final target is 6 OSS/OSTs with a single MDS.
Could any of you help us choose, and explain the reasons?

Best regards, 

Simon 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Unable to mount client with 56 MDSes and beyond

2019-07-04 Thread Colin Faber
We encountered this in testing some time ago and already have a bug filed
(I don't recall the number right now); a patch should be available soon if it
isn't already. The gist of the problem is the changelog registration limit
(an integer type) plus some padding, which results in an artificially low limit.
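
(For anyone poking at this in the meantime, the registered changelog users on
each MDT are visible on the MDS nodes with:

lctl get_param mdd.*.changelog_users
)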

On Thu, Jul 4, 2019, 6:42 AM Matt Rásó-Barnett  wrote:

> I just tried out this configuration and was able to reproduce what Scott
> saw on 2.12.2.
>
> I couldn't see a Jira ticket for this though, so I've opened a new
> one: https://jira.whamcloud.com/browse/LU-12506
>
> Cheers,
> --
> Matt Rásó-Barnett
> University of Cambridge
>
> On Wed, May 22, 2019 at 08:02:59AM +, Andreas Dilger wrote:
> >Scott, if you haven't already done so, it is probably best to file a
> >ticket in Jira with the details.  Please include the client
> >syslog/dmesg as well as a Lustre debug log ("lctl dk /tmp/debug") so
> >that the problem can be isolated.
> >
> >During DNE development we tested with up to 128 MDTs in AWS, but
> >haven't tested that many MDTs in some time.
> >
> >Cheers, Andreas
> >
> >On May 8, 2019, at 12:28, White, Scott F  wrote:
> >>
> >> We’ve been testing DNE Phase II and tried scaling the number of
> >> MDSes (one MDT each for all of our tests) very high, but when we did
> >> that, we couldn’t mount the filesystem on a client.  After trial and
> >> error, we discovered that we were unable to mount the filesystem when
> >> there were 56 MDSes. 55 MDSes mounted without issue, and it appears
> >> any number below that will mount. This failure at 56 MDSes was
> >> replicable across different nodes being used for the MDSes, all of
> >> which were tested with working configurations, so it doesn’t seem to
> >> be a bad server.
> >>
> >> Here’s the error info we saw in dmesg on the client:
> >>
> >> LustreError: 28880:0:(obd_config.c:559:class_setup()) setup
> >> lustre-MDT0037-mdc-95923d31b000 failed (-16)
> >> LustreError: 28880:0:(obd_config.c:1836:class_config_llog_handler())
> >> MGCx.x.x.x@o2ib: cfg command failed: rc = -16
> >> Lustre:cmd=cf003 0:lustre-MDT0037-mdc  1:lustre-MDT0037_UUID
> >> 2:x.x.x.x@o2ib
> >> LustreError: 15c-8: MGCx.x.x.x@o2ib: The configuration from log
> >> 'lustre-client' failed (-16). This may be the result of communication
> >> errors between this node and the MGS, a bad configuration, or other
> >> errors. See the syslog for more information.
> >> LustreError: 28858:0:(obd_config.c:610:class_cleanup()) Device 58 not
> >> setup
> >> Lustre: Unmounted lustre-client
> >> LustreError: 28858:0:(obd_mount.c:1608:lustre_fill_super()) Unable to
> >> mount  (-16)
> >>
> >> OS: CentOS 7.6.1810
> >> Kernel: 3.10.0-957.5.1.el7.x86_64
> >> Lustre: 2.12.1
> >> Network card: Qlogic InfiniPath_QLE7340
> >>
> >> Other things to note for completeness’ sake: this happened with both
> >> ldiskfs and zfs backfstypes, and these tests were using files in
> >> memory as the backing devices.
> >>
> >> Is there something I’m missing as to why more than 56 MDSes won’t
> >> mount?
> >>
> >> Thanks,
> >> Scott White
> >> Scientist, HPC
> >> Los Alamos National Laboratory
> >>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] A question about lctl lfsck

2019-07-04 Thread Nathan Dauchy - NOAA Affiliate
On Wed, Jul 3, 2019 at 2:15 PM Kurt Strosahl  wrote:

>
> Hopefully a simple question... If I run lctl lfsck_start, is there a place
> where I can get a list of what it did?
>
>
Kurt,

As far as I know, this is still an open feature request...
https://jira.whamcloud.com/browse/LU-5202 (LFSCK 5: LFSCK needs to log all
changes and errors found)

The debug_daemon method is tolerable as a workaround, with the caveat that
you have to periodically run debug_file to dump the buffer to a log, since the
debug buffer will definitely fill up before the lfsck finishes, and then
post-process those files to remove duplicates.
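
Roughly like this (a sketch from memory; the file names and size are
arbitrary, and the debug=+lfsck flag is an assumption for getting the LFSCK
messages into the buffer):

lctl set_param debug=+lfsck                     # collect LFSCK messages in the debug buffer
lctl debug_daemon start /tmp/lfsck_debug 1024   # daemon writes binary logs, wrapping at ~1024 MB
# ...while the lfsck runs, periodically convert the binary log to text:
lctl debug_file /tmp/lfsck_debug /tmp/lfsck_debug.$(date +%s).txt
lctl debug_daemon stop
sort -u /tmp/lfsck_debug.*.txt > lfsck_report.txt   # crude de-duplication pass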

Regards,
Nathan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Error when mounting additional MDT

2019-07-04 Thread Thomas Roth
Hi all,

when adding an MDT2 to a system with MGS+MDT0 and MDT1, there was an
interruption. At first the MGS reported

LustreError: 140-5: Server hebe-MDT0002 requested index 2, but that index is 
already
in use. Use --writeconf to force
LustreError: 30446:0:(mgs_handler.c:535:mgs_target_reg()) Failed to write 
hebe-MDT0002
log (-98)


but seems to have been convinced to accept the new MDT anyhow,

Lustre: ctl-hebe-MDT: super-sequence allocation rc = 0
[0x004d0400-0x004d4400]:2:mdt


because afterwards, I could create a directory with

lfs mkdir -i 2 /lustre/test2

A file put into that directory shows

# lfs getstripe -M /lustre/test2/testfile
2



Does this mean that the log for hebe-MDT0002 has been written?
Or should we do the big writeconf?
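
(I suppose one way to check would be to look at the config logs on the MGS;
a sketch, assuming the standard log names:

lctl --device MGS llog_catlist
lctl --device MGS llog_print hebe-client

and see whether records for hebe-MDT0002 show up.)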


Regards,
Thomas


-- 

Thomas Roth

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Ursula Weyrich, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Georg Schütte
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Unable to mount client with 56 MDSes and beyond

2019-07-04 Thread Matt Rásó-Barnett
I just tried out this configuration and was able to reproduce what Scott 
saw on 2.12.2.

I couldn't see a Jira ticket for this though, so I've opened a new 
one: https://jira.whamcloud.com/browse/LU-12506

Cheers,
--
Matt Rásó-Barnett
University of Cambridge

On Wed, May 22, 2019 at 08:02:59AM +, Andreas Dilger wrote:
> Scott, if you haven't already done so, it is probably best to file a
> ticket in Jira with the details.  Please include the client
> syslog/dmesg as well as a Lustre debug log ("lctl dk /tmp/debug") so
> that the problem can be isolated.
>
> During DNE development we tested with up to 128 MDTs in AWS, but
> haven't tested that many MDTs in some time.
>
> Cheers, Andreas
>
> On May 8, 2019, at 12:28, White, Scott F  wrote:
>
>> We’ve been testing DNE Phase II and tried scaling the number of
>> MDSes (one MDT each for all of our tests) very high, but when we did
>> that, we couldn’t mount the filesystem on a client.  After trial and
>> error, we discovered that we were unable to mount the filesystem when
>> there were 56 MDSes. 55 MDSes mounted without issue, and it appears
>> any number below that will mount. This failure at 56 MDSes was
>> replicable across different nodes being used for the MDSes, all of
>> which were tested with working configurations, so it doesn’t seem to
>> be a bad server.
>>
>> Here’s the error info we saw in dmesg on the client:
>>
>> LustreError: 28880:0:(obd_config.c:559:class_setup()) setup
>> lustre-MDT0037-mdc-95923d31b000 failed (-16)
>> LustreError: 28880:0:(obd_config.c:1836:class_config_llog_handler())
>> MGCx.x.x.x@o2ib: cfg command failed: rc = -16
>> Lustre:cmd=cf003 0:lustre-MDT0037-mdc  1:lustre-MDT0037_UUID
>> 2:x.x.x.x@o2ib
>> LustreError: 15c-8: MGCx.x.x.x@o2ib: The configuration from log
>> 'lustre-client' failed (-16). This may be the result of communication
>> errors between this node and the MGS, a bad configuration, or other
>> errors. See the syslog for more information.
>> LustreError: 28858:0:(obd_config.c:610:class_cleanup()) Device 58 not
>> setup
>> Lustre: Unmounted lustre-client
>> LustreError: 28858:0:(obd_mount.c:1608:lustre_fill_super()) Unable to
>> mount  (-16)
>>
>> OS: CentOS 7.6.1810
>> Kernel: 3.10.0-957.5.1.el7.x86_64
>> Lustre: 2.12.1
>> Network card: Qlogic InfiniPath_QLE7340
>>
>> Other things to note for completeness’ sake: this happened with both
>> ldiskfs and zfs backfstypes, and these tests were using files in
>> memory as the backing devices.
>>
>> Is there something I’m missing as to why more than 56 MDSes won’t
>> mount?
>>
>> Thanks,
>> Scott White
>> Scientist, HPC
>> Los Alamos National Laboratory

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org