Re: [lustre-discuss] 2.15.4 o2iblnd on RoCEv2?
An LU ticket and patch for lnetctl, or for me being an under-caffeinated idiot? ;-)

On Wed, Jan 10, 2024 at 12:06 PM Andreas Dilger wrote:
>
> It would seem that the error message could be improved in this case? Could
> you file an LU ticket for that with the reproducer below, and ideally along
> with a patch?
>
> Cheers, Andreas

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] 2.15.4 o2iblnd on RoCEv2?
Man, am I an idiot. Been up all night too many nights in a row and not enough coffee. It helps if you use the correct --net designation: I was typing ib0 instead of o2ib0. Declaring it as o2ib0 works fine.

(cleanup from previous)
lctl net down && lustre_rmmod

(new attempt)
modprobe lnet -v
lnetctl lnet configure
lnetctl net add --if enp1s0np0 --net o2ib0
lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.0.50.27@o2ib
          status: up
          interfaces:
              0: enp1s0np0

Lots more to test and verify, but the original mailing-list submission was total pilot error on my part. Apologies to all who spent cycles pondering this nothingburger.

On Tue, Jan 9, 2024 at 7:45 PM Jeff Johnson wrote:
>
> Howdy intrepid Lustrefarians,
>
> While starting down the debug rabbit hole I thought I'd raise my hand
> and see if anyone has a few magic beans to spare.
>
> I cannot get lnet (via lnetctl) to init an o2iblnd interface on a
> RoCEv2 interface.
>
> Running `lnetctl net add --net ib0 --if enp1s0np0` results in
>
> net:
>     errno: -1
>     descr: cannot parse net '<255:65535>'
>
> Nothing in dmesg to indicate why. Search engines aren't coughing up
> much here either.
>
> Env: Rocky 8.9 x86_64, MOFED 5.8-4.1.5.0, Lustre 2.15.4
>
> I'm able to run MPI over the RoCEv2 interface. Utils like ibstatus and
> ibdev2netdev report it correctly. ibv_rc_pingpong works fine between
> nodes.
>
> Configuring as socklnd works fine. `lnetctl net add --net tcp0 --if
> enp1s0np0 && lnetctl net show`
>
> [root@r2u11n3 ~]# lnetctl net show
> net:
>     - net type: lo
>       local NI(s):
>         - nid: 0@lo
>           status: up
>     - net type: tcp
>       local NI(s):
>         - nid: 10.0.50.27@tcp
>           status: up
>           interfaces:
>               0: enp1s0np0
>
> I verified the RoCEv2 interface using NVIDIA's `cma_roce_mode` as well
> as sysfs references
>
> [root@r2u11n3 ~]# cma_roce_mode -d mlx5_0 -p 1
> RoCE v2
>
> Ideas? Suggestions? Incense?
> Thanks,
> --Jeff

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001  f: 858-412-3845  m: 619-204-9061
4170 Morena Boulevard, Suite C - San Diego, CA 92117
High-Performance Computing / Lustre Filesystems / Scale-out Storage
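A tiny follow-up sketch of the pitfall above: the `--net` argument names an LNet network (a driver type such as `tcp` or `o2ib`, plus an optional instance number), not the Linux interface. The `valid_lnet_net` helper below is hypothetical, not part of the Lustre tools; it just illustrates the naming rule that tripped things up:

```shell
#!/bin/sh
# Hypothetical pre-flight check for the net name handed to lnetctl.
# LNet net names are a driver type (tcp for socklnd, o2ib for o2iblnd)
# plus an optional instance number, e.g. tcp0 or o2ib1.
valid_lnet_net() {
    case "$1" in
        tcp|tcp[0-9]|tcp[0-9][0-9]) return 0 ;;
        o2ib|o2ib[0-9]|o2ib[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

valid_lnet_net o2ib0 && echo "o2ib0 is a valid net"   # the fix
valid_lnet_net ib0 || echo "ib0 is not a valid net"   # the original typo
```

The interface itself goes in `--if`; only the net name carries the LND type.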
[lustre-discuss] 2.15.4 o2iblnd on RoCEv2?
Howdy intrepid Lustrefarians,

While starting down the debug rabbit hole I thought I'd raise my hand and see if anyone has a few magic beans to spare.

I cannot get lnet (via lnetctl) to init an o2iblnd interface on a RoCEv2 interface.

Running `lnetctl net add --net ib0 --if enp1s0np0` results in

net:
    errno: -1
    descr: cannot parse net '<255:65535>'

Nothing in dmesg to indicate why. Search engines aren't coughing up much here either.

Env: Rocky 8.9 x86_64, MOFED 5.8-4.1.5.0, Lustre 2.15.4

I'm able to run MPI over the RoCEv2 interface. Utils like ibstatus and ibdev2netdev report it correctly. ibv_rc_pingpong works fine between nodes.

Configuring as socklnd works fine. `lnetctl net add --net tcp0 --if enp1s0np0 && lnetctl net show`

[root@r2u11n3 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.0.50.27@tcp
          status: up
          interfaces:
              0: enp1s0np0

I verified the RoCEv2 interface using NVIDIA's `cma_roce_mode` as well as sysfs references

[root@r2u11n3 ~]# cma_roce_mode -d mlx5_0 -p 1
RoCE v2

Ideas? Suggestions? Incense?

Thanks,
--Jeff
Re: [lustre-discuss] [EXTERNAL] [BULK] MDS hardware - NVME?
> Considering NVME storage for the next MDS.
>
> As I understand, NVME disks are bundled in software, not by a hardware
> raid controller. This would be done using Linux software raid, mdadm, correct?
>
> We have some experience with ZFS, which we use on our OSTs.
> But I would like to stick to ldiskfs for the MDTs, and a zpool with a zvol
> on top which is then formatted with ldiskfs - too much voodoo...
>
> How is this handled elsewhere? Any experiences?
>
> The available devices are quite large. If I create a raid-10 out of 4
> disks, e.g. 7 TB each, my MDT will be 14 TB - already close to the 16 TB
> limit. So no need for a box with lots of U.3 slots.
>
> But for MDS operations, we will still need a powerful dual-CPU system with
> lots of RAM. Then the NVME devices should be distributed between the CPUs?
> Is there a way to pinpoint this in a call for tender?
>
> Best regards,
> Thomas
>
> Thomas Roth
> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> Planckstraße 1, 64291 Darmstadt, Germany
> http://www.gsi.de/
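For sizing exercises like the one quoted above, the usable capacity of an mdadm raid10 with the default near-2 layout is half the raw total. A trivial sketch (the helper name is mine, not an mdadm tool) reproducing the thread's arithmetic:

```shell
#!/bin/sh
# Sketch: usable capacity of a Linux md raid10 (near-2 layout, the
# default) is (number of disks / 2) * disk size.
raid10_usable_tb() {
    disks=$1
    size_tb=$2
    echo $(( disks / 2 * size_tb ))
}

# Numbers from the thread: 4 disks of 7 TB each.
echo "$(raid10_usable_tb 4 7) TB usable"
```

With 4 x 7 TB that is 14 TB, which is what puts the proposed MDT near the 16 TB limit mentioned in the question.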
Re: [lustre-discuss] OSS on compute node
I'm certainly not Andreas. That said...

You're running an MPI simulation, presumably across most or all of your 34 compute nodes. Lustre server operations, their lnet activity and the backend storage I/O will create a profound imbalance on the few compute nodes you designate to do both server and client operation. That, and you expose yourself to deadlocks and the other potential problems mentioned earlier.

I do not know how performant your login server is, but depending on the file operations of your simulations you could cavitate your login server. Also, you generally don't want users logging in on a node as critical as an MDS.

You would be better served by allocating two of your compute nodes to just be Lustre servers - one mds/oss, the other an oss - and running 32 clean client nodes. More stable, cleaner, and in the end probably more workflow productivity over time and fewer technical incidents. Just my opinion...others may differ.

--Jeff

On Fri, Oct 13, 2023 at 12:43 PM Fedele Stabile <fedele.stab...@fis.unical.it> wrote:
> I believe in Linux it is possible to limit the memory used by a user, and
> also to limit the amount of cpu used, so I can limit resources for a user
> group; and if I put the oss server in a vm I suppose I can limit cpu and
> memory usage.
>
> My scenario is: I have 34 compute nodes with 512 GB RAM and 34 HDs of
> 16 TB each that I can arrange in 9 nodes; I also have a management node
> that can be used for the Lustre metadata server; infiniband is 200 Gb/s.
> We run MHD simulations.
>
> What Lustre configuration do you suggest?
> --
> *From:* Andreas Dilger
> *Sent:* Friday, October 13, 2023 7:19:11 PM
> *To:* Fedele Stabile
> *Cc:* lustre-discuss@lists.lustre.org
> *Subject:* Re: [lustre-discuss] OSS on compute node
>
> On Oct 13, 2023, at 20:58, Fedele Stabile wrote:
>
>> Hello everyone,
>> We are in progress to integrate Lustre on our little HPC Cluster and we
>> would like to know if it is possible to use the same node in a cluster to
>> act as an OSS with disks and to also use it as a Compute Node and then
>> install a Lustre Client.
>> I know that the OSS server requires a modified kernel, so I suppose it can
>> be installed in a virtual machine using kvm on a compute node.
>
> There isn't really a problem with running a client + OSS on the same node
> anymore, nor is there a problem with an OSS running inside a VM (if you
> have SR-IOV and enough CPU+RAM to run the server).
>
> *HOWEVER*, I don't think it would be good to have the client mounted on
> the *VM host*, and then run the OSS on a *VM guest*. That could lead to
> deadlocks and priority inversion if the client becomes busy but depends on
> the local OSS to flush dirty data from RAM, and the OSS cannot run in the
> VM because it doesn't have any RAM...
>
> If the client and OSS are BOTH run in VMs, or neither runs in a VM, or only
> the client runs in a VM, then that should be OK, but may have reduced
> performance due to the server contending with the client application.
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud

--Jeff
Re: [lustre-discuss] OSS on compute node
Skydiving with an anvil is *possible* ...but not advisable.

--Jeff

On Fri, Oct 13, 2023 at 10:21 AM Andreas Dilger via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
> On Oct 13, 2023, at 20:58, Fedele Stabile wrote:
>
>> Hello everyone,
>> We are in progress to integrate Lustre on our little HPC Cluster and we
>> would like to know if it is possible to use the same node in a cluster to
>> act as an OSS with disks and to also use it as a Compute Node and then
>> install a Lustre Client.
>> I know that the OSS server requires a modified kernel, so I suppose it can
>> be installed in a virtual machine using kvm on a compute node.
>
> There isn't really a problem with running a client + OSS on the same node
> anymore, nor is there a problem with an OSS running inside a VM (if you
> have SR-IOV and enough CPU+RAM to run the server).
>
> *HOWEVER*, I don't think it would be good to have the client mounted on
> the *VM host*, and then run the OSS on a *VM guest*. That could lead to
> deadlocks and priority inversion if the client becomes busy but depends on
> the local OSS to flush dirty data from RAM, and the OSS cannot run in the
> VM because it doesn't have any RAM...
>
> If the client and OSS are BOTH run in VMs, or neither runs in a VM, or only
> the client runs in a VM, then that should be OK, but may have reduced
> performance due to the server contending with the client application.
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
Re: [lustre-discuss] Lnet errors
I couldn't say exactly, but...

- Your net is o2ib1. Is there an o2ib0?
- Are you routing? If so, lnet routing or IB routing? Any issues with the routers or routing?
- Verify the stability of lnet and the fabric path between client and server in the messages above using a tool like lnet_selftest.
- Verify the fabric: check error counters on the switch and HCA ports involved. Use non-Lustre IB tools (ib_send_bw, etc.) to test the fabric.

Lustre can, and will, tell you when lnet issues arise, but it cannot tell you anything about the actual network layer it is riding on, so it is usually a good idea to certify function of the network layer first before delving into "what LBUG is ruining my weekend plans?"

I hope that helps,

--Jeff

(resent to list in hopes of being beneficial to others)

On Thu, Oct 5, 2023 at 9:34 AM Alastair Basden via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
> Hi,
>
> Lustre 2.12.2.
>
> We are seeing lots of errors on the servers such as:
> Oct 5 11:16:48 oss04 kernel: LNetError:
> 6414:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending
> PUT to 12345-172.19.171.15@o2ib1: -125
> Oct 5 11:16:48 oss04 kernel: LustreError:
> 6414:0:(events.c:450:server_bulk_callback()) event type 5, status -125,
> desc 8fe066bb9400
>
> and
> Oct 4 14:59:48 oss04 kernel: LustreError:
> 6383:0:(events.c:305:request_in_callback()) event type 2, status -103,
> service ost_io
>
> and
> Oct 5 11:18:06 oss04 kernel: LustreError:
> 6388:0:(events.c:305:request_in_callback()) event type 2, status -5,
> service ost_io
> Oct 5 11:18:06 oss04 kernel: LNet:
> 6412:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from
> 172.19.171.15@o2ib1
>
> and on the clients:
> m7: Oct 5 14:46:59 m7132 kernel: LustreError:
> 2466:0:(events.c:200:client_bulk_callback()) event type 2, status -103,
> desc 9a251fc14400
>
> and
> m7: Oct 5 11:18:34 m7086 kernel: LustreError:
> 2495:0:(events.c:200:client_bulk_callback()) event type 2, status -5,
> desc 9a39ad668000
>
> Does
> anyone have any ideas about what could be causing this?
>
> Thanks,
> Alastair.
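For readers decoding threads like this one: the negative status values in those LNet/Lustre log lines are standard Linux errno codes with the sign flipped. A small lookup helper (my own sketch, not a Lustre utility) covering just the codes seen above:

```shell
#!/bin/sh
# Map the negative "status" values from the quoted log lines to errno
# names. Values are the standard Linux (asm-generic) errno numbers.
lnet_status_name() {
    case "$1" in
        -5)   echo "EIO: I/O error" ;;
        -103) echo "ECONNABORTED: software caused connection abort" ;;
        -125) echo "ECANCELED: operation canceled" ;;
        *)    echo "errno $1: not in this table" ;;
    esac
}

lnet_status_name -125   # the status on the failed PUT resend above
```

ECANCELED on a resend plus ECONNABORTED on bulk callbacks is consistent with connections being torn down mid-transfer, which is why checking the fabric itself first is good advice.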
Re: [lustre-discuss] [EXTERNAL EMAIL] Re: [EXTERNAL EMAIL] Re: [EXTERNAL] No port 988?
Nothing better than sliding in at the last moment to steal all the glory ;-)

--Jeff

On Wed, Sep 27, 2023 at 07:10 Jan Andersen wrote:
> Hi Jeff,
>
> Yes, that was it! Things are working beautifully now - big thanks.
>
> /jan
>
> On 27/09/2023 15:07, Jeff Johnson wrote:
>> Any chance the firewall is running?
>>
>> You can use `lctl ping ipaddress@lnet` to check if you have functional
>> lnet between machines. Example: `lctl ping 10.0.0.10@tcp`
>>
>> --Jeff
>>
>> On Wed, Sep 27, 2023 at 05:35 Jan Andersen <j...@comind.io> wrote:
>>> However, it is still timing out when I try to mount on the oss. This is
>>> the kernel module:
>>>
>>> [root@mds ~]# lsmod | grep lnet
>>> lnet     704512  7  mgs,obdclass,osp,ptlrpc,mgc,ksocklnd,mdt
>>> libcfs   266240  15 fld,lnet,fid,lod,mdd,mgs,obdclass,osp,ptlrpc,mgc,ksocklnd,mdt,osd_ldiskfs,lquota,lfsck
>>> sunrpc   577536  2  lnet
>>>
>>> But it only listens on tcp6, which I don't use - is there a way to get
>>> it to use tcp4?
>>>
>>> [root@mds ~]# netstat -nap | grep 988
>>> tcp6  0  0 :::988  :::*  LISTEN  -
>>>
>>> /jan
>>>
>>> On 27/09/2023 10:15, Jan Andersen wrote:
>>>> Hi Rick,
>>>>
>>>> Very strange - when I started the vm this morning, 'modprobe lnet'
>>>> didn't return an error - and it seems to have loaded the module:
>>>>
>>>> [root@rocky8 ~]# lsmod | grep lnet
>>>> lnet     704512  0
>>>> libcfs   266240  1  lnet
>>>> sunrpc   577536  2  lnet
>>>>
>>>> Looking at the running kernel and the kernel source, they now seem to
>>>> be the same version:
>>>>
>>>> [root@rocky8 ~]# ll /usr/src/kernels
>>>> total 4
>>>> drwxr-xr-x. 23 root root 4096 Sep 26 12:34 4.18.0-477.27.1.el8_8.x86_64/
>>>> [root@rocky8 ~]# uname -r
>>>> 4.18.0-477.27.1.el8_8.x86_64
>>>>
>>>> - which would explain that it now works. Things were a bit hectic with
>>>> other things yesterday afternoon, and I don't quite remember installing
>>>> a new kernel, but it looks like I did.
>>>> Hopefully this is my problem solved, then - sorry for jumping up and
>>>> down and making noise!
>>>>
>>>> /jan
>>>>
>>>> On 26/09/2023 18:13, Mohr, Rick wrote:
>>>>> What error do you get when you run "modprobe lnet"?
>>>>>
>>>>> --Rick
Re: [lustre-discuss] [EXTERNAL EMAIL] Re: [EXTERNAL] No port 988?
Any chance the firewall is running?

You can use `lctl ping ipaddress@lnet` to check if you have functional lnet between machines. Example: `lctl ping 10.0.0.10@tcp`

--Jeff

On Wed, Sep 27, 2023 at 05:35 Jan Andersen wrote:
> However, it is still timing out when I try to mount on the oss. This is
> the kernel module:
>
> [root@mds ~]# lsmod | grep lnet
> lnet     704512  7  mgs,obdclass,osp,ptlrpc,mgc,ksocklnd,mdt
> libcfs   266240  15 fld,lnet,fid,lod,mdd,mgs,obdclass,osp,ptlrpc,mgc,ksocklnd,mdt,osd_ldiskfs,lquota,lfsck
> sunrpc   577536  2  lnet
>
> But it only listens on tcp6, which I don't use - is there a way to get
> it to use tcp4?
>
> [root@mds ~]# netstat -nap | grep 988
> tcp6  0  0 :::988  :::*  LISTEN  -
>
> /jan
>
> On 27/09/2023 10:15, Jan Andersen wrote:
>> Hi Rick,
>>
>> Very strange - when I started the vm this morning, 'modprobe lnet'
>> didn't return an error - and it seems to have loaded the module:
>>
>> [root@rocky8 ~]# lsmod | grep lnet
>> lnet     704512  0
>> libcfs   266240  1  lnet
>> sunrpc   577536  2  lnet
>>
>> Looking at the running kernel and the kernel source, they now seem to be
>> the same version:
>>
>> [root@rocky8 ~]# ll /usr/src/kernels
>> total 4
>> drwxr-xr-x. 23 root root 4096 Sep 26 12:34 4.18.0-477.27.1.el8_8.x86_64/
>> [root@rocky8 ~]# uname -r
>> 4.18.0-477.27.1.el8_8.x86_64
>>
>> - which would explain that it now works. Things were a bit hectic with
>> other things yesterday afternoon, and I don't quite remember installing
>> a new kernel, but it looks like I did. Hopefully this is my problem
>> solved, then - sorry for jumping up and down and making noise!
>>
>> /jan
>>
>> On 26/09/2023 18:13, Mohr, Rick wrote:
>>> What error do you get when you run "modprobe lnet"?
>>>
>>> --Rick
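A note on the tcp6-only worry in the thread above: on Linux, a socket bound to `:::988` normally accepts IPv4 connections too (as IPv4-mapped addresses) unless `net.ipv6.bindv6only=1`. A hypothetical helper (not a Lustre tool) that scans `netstat -nap`-style output for a listener on the Lustre acceptor port:

```shell
#!/bin/sh
# Hypothetical check: read `netstat -nap`-style output on stdin and say
# whether something is listening on TCP port 988 (the LNet acceptor).
# A tcp6 ":::988" listener usually serves IPv4 clients as well, since
# Linux defaults to net.ipv6.bindv6only=0.
listens_on_988() {
    if grep -E ':988[^0-9].*LISTEN' >/dev/null 2>&1; then
        echo "988: listening"
    else
        echo "988: not listening"
    fi
}

# Sample line from the thread:
printf 'tcp6  0  0 :::988  :::*  LISTEN  -\n' | listens_on_988
```

So in the quoted case the tcp6-only listener was a red herring; the actual blocker was the firewall.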
Re: [lustre-discuss] How to eliminate zombie OSTs
Alejandro,

Is your MGS located on the same node as your primary MDT? (combined MGS/MDT node)

--Jeff

On Wed, Aug 9, 2023 at 9:46 AM Alejandro Sierra via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
> Hello,
>
> In 2018 we implemented a lustre system 2.10.5 with 20 OSTs in two OSS
> with 4 jboxes, each box with 24 disks of 12 TB each, for a total of
> nearly 1 PB. In all that time we had power failures and failed raid
> controller cards, all of which made us adjust the configuration. After
> the last failure, the system keeps sending error messages about OSTs
> that are no longer in the system. In the MDS I do
>
> # lctl dl
>
> and I get the 20 currently active OSTs
>
> oss01.lanot.unam.mx - OST00 /dev/disk/by-label/lustre-OST0000
> oss01.lanot.unam.mx - OST01 /dev/disk/by-label/lustre-OST0001
> oss01.lanot.unam.mx - OST02 /dev/disk/by-label/lustre-OST0002
> oss01.lanot.unam.mx - OST03 /dev/disk/by-label/lustre-OST0003
> oss01.lanot.unam.mx - OST04 /dev/disk/by-label/lustre-OST0004
> oss01.lanot.unam.mx - OST05 /dev/disk/by-label/lustre-OST0005
> oss01.lanot.unam.mx - OST06 /dev/disk/by-label/lustre-OST0006
> oss01.lanot.unam.mx - OST07 /dev/disk/by-label/lustre-OST0007
> oss01.lanot.unam.mx - OST08 /dev/disk/by-label/lustre-OST0008
> oss01.lanot.unam.mx - OST09 /dev/disk/by-label/lustre-OST0009
> oss02.lanot.unam.mx - OST15 /dev/disk/by-label/lustre-OST000f
> oss02.lanot.unam.mx - OST16 /dev/disk/by-label/lustre-OST0010
> oss02.lanot.unam.mx - OST17 /dev/disk/by-label/lustre-OST0011
> oss02.lanot.unam.mx - OST18 /dev/disk/by-label/lustre-OST0012
> oss02.lanot.unam.mx - OST19 /dev/disk/by-label/lustre-OST0013
> oss02.lanot.unam.mx - OST25 /dev/disk/by-label/lustre-OST0019
> oss02.lanot.unam.mx - OST26 /dev/disk/by-label/lustre-OST001a
> oss02.lanot.unam.mx - OST27 /dev/disk/by-label/lustre-OST001b
> oss02.lanot.unam.mx - OST28 /dev/disk/by-label/lustre-OST001c
> oss02.lanot.unam.mx - OST29 /dev/disk/by-label/lustre-OST001d
>
> but I also get 5 that are not
> currently active - in fact, they don't exist:
>
> 28 IN osp lustre-OST0014-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
> 29 UP osp lustre-OST0015-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
> 30 UP osp lustre-OST0016-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
> 31 UP osp lustre-OST0017-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
> 32 UP osp lustre-OST0018-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>
> When I try to eliminate them with
>
> lctl conf_param -P osp.lustre-OST0015-osc-MDT0000.active=0
>
> I get the error
>
> conf_param: invalid option -- 'P'
> set a permanent config parameter.
> This command must be run on the MGS node
> usage: conf_param [-d] <target.keyword=val>
>        -d  Remove the permanent setting.
>
> If I do
>
> lctl --device 28 deactivate
>
> I don't get an error, but nothing changes.
>
> What can I do?
>
> Thank you in advance for any help.
>
> --
> Alejandro Aguilar Sierra
> LANOT, ICAyCC, UNAM
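When scripting cleanup of stale entries like the five above, it helps to translate between the decimal OST index and the hex index embedded in the osp device name. The helper below is my own sketch, assuming the fsname `lustre` and a single MDT0000 as in this thread:

```shell
#!/bin/sh
# Sketch: build the osp device name `lctl dl` shows on the MDS for a
# given decimal OST index (fsname "lustre", target MDT0000 assumed).
osp_device_name() {
    printf 'lustre-OST%04x-osc-MDT0000\n' "$1"
}

osp_device_name 20   # index 20 = hex 0014, device 28 in the list above
# Permanent deactivation would then be done on the MGS, in the form the
# Lustre manual documents (not executed here; verify for your version):
#   lctl conf_param lustre-OST0014.osc.active=0
```

Note the usage text in the quoted error: this version of `lctl conf_param` takes no `-P` flag; `-P` belongs to `lctl set_param`, which is a separate command.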
Re: [lustre-discuss] Lustre Installation Question: 158-c: Can't load module 'osd-zfs'
Yao,

Glad you have the correct modules working now. Can you explain why you are employing the virtual disk driver with osd-zfs?

Capacities with Lustre-ZFS are estimates, as zfs has functions like compression that can change "capacity". As ZFS volumes are used and filled as Lustre volumes, their capacities will be reported more accurately as objects are allocated to storage. Running a Lustre command from a client (`lfs df -h`) will give a better view of Lustre capacities from the client perspective.

Your use of /dev/vda may be adding obscurity and I'm not sure why you would be adding that.

--Jeff

On Thu, Jun 29, 2023 at 10:33 PM Yao Weng wrote:
> Hi Jeff:
> Thank you very much! I installed lustre-zfs-dkms and the lustre client
> can mount the lustre filesystem. However, the sizes do not add up.
>
> I have
>
> - mgs
>
> sudo mkfs.lustre --mgs --reformat --backfstype=zfs --fsname=lustre lustre-mgs/mgs /dev/vda2
>
>    Permanent disk data:
> Target:     MGS
> Index:      unassigned
> Lustre FS:  lustre
> Mount type: zfs
> Flags:      0x64
>             (MGS first_time update )
> Persistent mount opts:
> Parameters:
>
> mkfs_cmd = zpool create -f -O canmount=off lustre-mgs /dev/vda2
> mkfs_cmd = zfs create -o canmount=off lustre-mgs/mgs
>   xattr=sa
>   dnodesize=auto
> Writing lustre-mgs/mgs properties
>   lustre:version=1
>   lustre:flags=100
>   lustre:index=65535
>   lustre:fsname=lustre
>   lustre:svname=MGS
>
> - mdt
>
> sudo mkfs.lustre --mdt --reformat --backfstype=zfs --fsname=lustre --index=0 --mgsnode=10.34.0.103@tcp0 lustre-mdt0/mdt0 /dev/vda2
>
>    Permanent disk data:
> Target:     lustre:MDT0000
> Index:      0
> Lustre FS:  lustre
> Mount type: zfs
> Flags:      0x61
>             (MDT first_time update )
> Persistent mount opts:
> Parameters: mgsnode=10.34.0.103@tcp
>
> mkfs_cmd = zpool create -f -O canmount=off lustre-mdt0 /dev/vda2
> mkfs_cmd = zfs create -o canmount=off lustre-mdt0/mdt0
>   xattr=sa
>   dnodesize=auto
> Writing lustre-mdt0/mdt0 properties
>   lustre:mgsnode=10.34.0.103@tcp
>   lustre:version=1
>   lustre:flags=97
>   lustre:index=0
>   lustre:fsname=lustre
>   lustre:svname=lustre:MDT0000
>
> - ost
>
> sudo mkfs.lustre --ost --reformat --backfstype=zfs --fsname=lustre --index=0 --mgsnode=10.34.0.103@tcp0 lustre-ost0/ost0 /dev/vda2
>
>    Permanent disk data:
> Target:     lustre:OST0000
> Index:      0
> Lustre FS:  lustre
> Mount type: zfs
> Flags:      0x62
>             (OST first_time update )
> Persistent mount opts:
> Parameters: mgsnode=10.34.0.103@tcp
>
> mkfs_cmd = zpool create -f -O canmount=off lustre-ost0 /dev/vda2
> mkfs_cmd = zfs create -o canmount=off lustre-ost0/ost0
>   xattr=sa
>   dnodesize=auto
>   recordsize=1M
> Writing lustre-ost0/ost0 properties
>   lustre:mgsnode=10.34.0.103@tcp
>   lustre:version=1
>   lustre:flags=98
>   lustre:index=0
>   lustre:fsname=lustre
>   lustre:svname=lustre:OST0000
>
> I have 51G for /dev/vda
>
> df -H /dev/vda2
> Filesystem  Size  Used  Avail  Use%  Mounted on
> devtmpfs     51G     0    51G    0%  /dev
>
> On my client node,
>
> sudo mount -t lustre 10.34.0.103@tcp0:/lustre /mnt
>
> However, the size is only 25M; shouldn't it be 51G?
>
> df -H /mnt
> Filesystem               Size  Used  Avail  Use%  Mounted on
> 10.34.0.103@tcp:/lustre   25M  3.2M    19M   15%  /mnt
>
> Thanks
> Yao
>
> On Wed, Jun 28, 2023 at 12:22 PM Jeff Johnson <jeff.john...@aeoncomputing.com> wrote:
>> Yao,
>>
>> Either add the required kernel-{devel,debuginfo} so the osd-ldiskfs
>> kernel modules can be built against your kernel, or remove the
>> lustre-all-dkms package and replace it with lustre-zfs-dkms and build
>> ZFS-only Lustre modules.
>>
>> --Jeff
>>
>> On Wed, Jun 28, 2023 at 8:45 AM Yao Weng wrote:
>>> I have an error when installing lustre-all-dkms-2.15.3-1.el8.noarch
>>>
>>> Loading new lustre-all-2.15.3 DKMS files...
>>> Deprecated feature: REMAKE_INITRD (/usr/src/lustre-all-2.15.3/dkms.conf)
>>> Building for 4.18.0-477.15.1.el8_8.x86_64 4.18.0-477.10.1.el8_lustre.x86_64
>>> Building initial module for 4.18.0-477.15.1.el8_8.x86_64
>>> Deprecated feature: REMAK
Re: [lustre-discuss] Lustre Installation Question: 158-c: Can't load module 'osd-zfs'
Yao,

Either add the required kernel-{devel,debuginfo} so the osd-ldiskfs kernel modules can be built against your kernel, or remove the lustre-all-dkms package and replace it with lustre-zfs-dkms and build ZFS-only Lustre modules.

--Jeff

On Wed, Jun 28, 2023 at 8:45 AM Yao Weng wrote:
> I have an error when installing lustre-all-dkms-2.15.3-1.el8.noarch
>
> Loading new lustre-all-2.15.3 DKMS files...
> Deprecated feature: REMAKE_INITRD (/usr/src/lustre-all-2.15.3/dkms.conf)
> Building for 4.18.0-477.15.1.el8_8.x86_64 4.18.0-477.10.1.el8_lustre.x86_64
> Building initial module for 4.18.0-477.15.1.el8_8.x86_64
> Deprecated feature: REMAKE_INITRD (/var/lib/dkms/lustre-all/2.15.3/source/dkms.conf)
> realpath: /var/lib/dkms/spl/2.1.12/source: No such file or directory
> realpath: /var/lib/dkms/spl/kernel-4.18.0-477.15.1.el8_8.x86_64-x86_64: No such file or directory
> configure: WARNING:
>
> Disabling ldiskfs support because complete ext4 source does not exist.
>
> If you are building using kernel-devel packages and require ldiskfs
> server support then ensure that the matching kernel-debuginfo-common
> and kernel-debuginfo-common-<arch> packages are installed.
>
> awk: fatal: cannot open file `/var/lib/dkms/lustre-all/2.15.3/build/_lpb/Makefile.compile.lustre' for reading (No such file or directory)
> ./configure: line 53751: test: too many arguments
> ./configure: line 53755: test: too many arguments
> Error! Build of osd_ldiskfs.ko failed for: 4.18.0-477.15.1.el8_8.x86_64 (x86_64)
> Make sure the name of the generated module is correct and at the root of the
> build directory, or consult make.log in the build directory
> /var/lib/dkms/lustre-all/2.15.3/build for more information.
> warning: %post(lustre-all-dkms-2.15.3-1.el8.noarch) scriptlet failed, exit status 7
>
> Error in POSTIN scriptlet in rpm package lustre-all-dkms
>
> On Wed, Jun 28, 2023 at 9:57 AM Yao Weng wrote:
>> Thanks Jeff:
>> My installation steps are
>>
>> Step 1: set up a local software provision (https://wiki.lustre.org/Installing_the_Lustre_Software).
>> I downloaded all rpms from
>> https://downloads.whamcloud.com/public/lustre/lustre-2.15.3/el8.8/server
>> https://downloads.whamcloud.com/public/lustre/lustre-2.15.3/el8.8/client
>> https://downloads.whamcloud.com/public/e2fsprogs/1.47.0.wc2/el8
>>
>> Step 2: Install the Lustre e2fsprogs distribution:
>> sudo yum --nogpgcheck --disablerepo=* --enablerepo=e2fsprogs-wc install e2fsprogs
>>
>> Step 3: Install EPEL repository support:
>> sudo yum -y install epel-release
>>
>> Step 4: Follow the instructions from the ZFS on Linux project
>> (https://openzfs.github.io/openzfs-docs/Getting%20Started/RHEL-based%20distro/index.html)
>> to install the ZFS YUM repository definition. Use the DKMS package
>> repository (the default):
>> sudo dnf install https://zfsonlinux.org/epel/zfs-release-2-3$(rpm --eval "%{dist}").noarch.rpm
>>
>> Step 5: Install the Lustre-patched kernel packages. Ensure that the
>> Lustre repository is picked for the kernel packages, by disabling the
>> OS repos:
>> sudo yum --nogpgcheck --disablerepo=base,extras,updates \
>>     --enablerepo=lustre-server install \
>>     kernel \
>>     kernel-devel \
>>     kernel-headers \
>>     kernel-tools \
>>     kernel-tools-libs \
>>     kernel-tools-libs-devel
>>
>> Step 6: Generate a persistent hostid on the machine, if one does not
>> already exist. This is needed to help protect ZFS zpools against
>> simultaneous imports on multiple servers.
For example: >> >> hid=`[ -f /etc/hostid ] && od -An -tx /etc/hostid|sed 's/ //g'` >> >> [ "$hid" = `hostid` ] || genhostid >> >> >> step 7 reboot >> >> step 8 install lustre and zfs >> sudo yum --skip-broken --nogpgcheck --enablerepo=lustre-server install \ >> lustre-dkms \ >> lustre-osd-zfs-mount \ >> lustre \ >> lustre-resource-agents \ >> lustre-dkms \ >> zfs >> >> step 9 Load the Lustre and ZFS kernel modules to verify that the >> software has installed correctly >> >> sudo modprobe -v zfs >> >> sudo modprobe -v lustre >> >> On Wed, Jun 28, 2023 at 1:04 AM Jeff Johnson < >> jeff.john...@aeoncomputing.com> wrote: >> >>> Did you install t
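[Editor's note] The DKMS failure earlier in this thread means the ldiskfs build could not find complete kernel source for the running kernel. The kernel-devel and kernel-debuginfo-common packages must match `uname -r` exactly; a small sketch (package names here are assumptions based on the EL8 naming convention) builds the expected names so a mismatch is easy to spot with `rpm -q`:

```shell
# Sketch: derive the kernel package names the ldiskfs DKMS build expects.
# The exact package set is an assumption; adjust for your distribution.
krel=$(uname -r)
want_devel="kernel-devel-${krel}"
want_dbg="kernel-debuginfo-common-$(uname -m)-${krel}"
echo "$want_devel"
echo "$want_dbg"
# then on the build host: rpm -q "$want_devel" "$want_dbg"
# and install whichever is missing before rebuilding lustre-all-dkms
```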
Re: [lustre-discuss] Lustre Installation Question: 158-c: Can't load module 'osd-zfs'
Did you install the Lustre server RPMs? Your email lists both server and client repositories. Are you using DKMS? Did you install and build the lustre-zfs-dkms or lustre-all-dkms packages? It doesn’t appear that you have any Lustre server kernel modules loaded, which makes me suspect you didn’t install or build the server-side RPMs or DKMS trees. On Tue, Jun 27, 2023 at 21:41 Yao Weng via lustre-discuss < lustre-discuss@lists.lustre.org> wrote: > Hi: > I follow https://wiki.lustre.org/Installing_the_Lustre_Software to > install lustre. > > My kernel is > > $ uname -r > > 4.18.0-477.13.1.el8_8.x86_64 > > I install > > https://downloads.whamcloud.com/public/lustre/lustre-2.15.3/el8.8/server > > https://downloads.whamcloud.com/public/lustre/lustre-2.15.3/el8.8/client > > https://downloads.whamcloud.com/public/e2fsprogs/1.47.0.wc2/el8 > > > lsmod | grep lustre > > *lustre* 1048576 0 > > lmv 204800 1 *lustre* > > mdc 282624 1 *lustre* > > lov 344064 2 mdc,*lustre* > > ptlrpc 2490368 7 fld,osc,fid,lov,mdc,lmv,*lustre* > > obdclass 3633152 8 fld,osc,fid,ptlrpc,lov,mdc,lmv,*lustre* > > lnet 704512 7 osc,obdclass,ptlrpc,ksocklnd,lmv,*lustre* > > libcfs 266240 11 > fld,lnet,osc,fid,obdclass,ptlrpc,ksocklnd,lov,mdc,lmv, > > *lustre* > > lsmod | grep zfs > > lsmod | grep zfs > > *zfs* 3887104 0 > > zunicode 335872 1 *zfs* > > zzstd 512000 1 *zfs* > > zlua 176128 1 *zfs* > > zavl 16384 1 *zfs* > > icp 319488 1 *zfs* > > zcommon 102400 2 *zfs*,icp > > znvpair 90112 2 *zfs*,zcommon > > spl 114688 6 *zfs*,icp,zzstd,znvpair,zcommon,zavl > > > I am able to create mgs/mdt/ost > > But when I try to mount > > sudo mount.lustre lustre-mgs/mgs /lustre/mnt > > mount.lustre: mount lustre-mgs/mgs at /lustre/mnt failed: No such device > > Are the lustre modules loaded? 
> > Check /etc/modprobe.conf and /proc/filesystems > > dmesg gives these error > > [76783.604090] LustreError: 158-c: Can't load module 'osd-zfs' > > [76783.606174] LustreError: 223535:0:(genops.c:361:class_newdev()) OBD: > unknown type: osd-zfs > > [76783.607856] LustreError: 223535:0:(obd_config.c:620:class_attach()) > Cannot create device MGS-osd of type osd-zfs : -19 > > [76783.609805] LustreError: > 223535:0:(obd_mount.c:195:lustre_start_simple()) MGS-osd attach error -19 > > [76783.611426] LustreError: > 223535:0:(obd_mount_server.c:1993:server_fill_super()) Unable to start osd > on lustre-mgs/mgs: -19 > > [76783.613457] LustreError: 223535:0:(super25.c:183:lustre_fill_super()) > llite: Unable to mount : rc = -19 > > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
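[Editor's note] The lsmod output quoted in this thread shows only client-side modules (lustre, lmv, mdc, lov, ptlrpc, ...) and bare ZFS, but no osd_zfs — which is exactly why the mount fails with "Can't load module 'osd-zfs'". A quick check, shown here against a hypothetical lsmod snapshot (on a real server you would pipe `lsmod` directly):

```shell
# Sketch: look for the server-side osd backend in lsmod output.
# lsmod_sample is made-up data mirroring the thread's output.
lsmod_sample='lustre 1048576 0
lnet 704512 7
zfs 3887104 0
spl 114688 6'
if printf '%s\n' "$lsmod_sample" | awk '{print $1}' | grep -qx osd_zfs; then
  echo "osd_zfs loaded"
else
  echo "osd_zfs missing: install the server RPMs/DKMS tree, then modprobe -v osd_zfs"
fi
```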
Re: [lustre-discuss] CentOS Stream 8/9 support?
This has the makings of a significant enough impact that I don't think it is a done deal. I'm sure someone in DC is calling someone at IBM. Even if USG does nothing, this is the kind of thing that EU regulators have stomped on in the past. I suspect this isn't open and shut...yet. On Thu, Jun 22, 2023 at 11:35 AM Laura Hild via lustre-discuss < lustre-discuss@lists.lustre.org> wrote: > We have one, small Stream 8 cluster, which is currently running a Lustre > client to which I cherry-picked a kernel compatibility patch. I could > imagine the effort being considerably more for the server component. I > also wonder, even if Whamcloud were to provide releases for Stream kernels, > how many sites would be happy with Stream's five-year lifetimes. > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- ------ Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only
Maybe someone else on the list can add clarity, but I don't believe a recovery process on mount would keep the MDS read-only or trigger that trace. Something else may be going on. I would start from the ground up. Bring your servers up, unmounted. Ensure LNet is loaded and configured properly. Test LNet using `lctl ping` or lnet_selftest from your MDS to all of your OSS nodes. Then mount your combined MGS/MDT volume on the MDS and see what happens. Is your MDS in a high-availability pair? What version of Lustre are you running? ...just a few things readers on the list might want to know. --Jeff On Wed, Jun 21, 2023 at 11:21 AM Mike Mosley wrote: > Jeff, > > At this point we have the OSS shut down. We were coming back from a full > outage and so we are trying to get the MDS up before starting to bring up > the OSS. > > Mike > > On Wed, Jun 21, 2023 at 2:15 PM Jeff Johnson < > jeff.john...@aeoncomputing.com> wrote: > >> Mike, >> >> Have you made sure that the o2ib interfaces on all of your Lustre servers >> (MDS & OSS) are functioning properly? Are you able to `lctl ping >> x.x.x.x@o2ib` successfully between MDS and OSS nodes? >> >> --Jeff >> >> >> On Wed, Jun 21, 2023 at 10:08 AM Mike Mosley via lustre-discuss < >> lustre-discuss@lists.lustre.org> wrote: >> >>> Rick, >>> 172.16.100.4 is the IB address of one of the OSS servers. I >>> believe the mgt and mdt0 are the same target. My understanding is >>> that we have a single instance of the MGT which is on the first MDT server >>> i.e. it was created via a command similar to: >>> >>> # mkfs.lustre --fsname=scratch --index=0 --mdt --mgs --replace /dev/sdb >>> >>> Does that make sense? >>> >>> On Wed, Jun 21, 2023 at 12:55 PM Mohr, Rick wrote: >>>> Which host is 172.16.100.4? Also, are the mgt and mdt0 on the same >>>> target or are they two separate targets just on the same host? 
>>>> >>>> --Rick >>>> >>>> >>>> On 6/21/23, 12:52 PM, "Mike Mosley" >>> <mailto:mike.mos...@charlotte.edu>> wrote: >>>> >>>> >>>> Hi Rick, >>>> >>>> >>>> The MGS/MDS are combined. The output I posted is from the primary. >>>> >>>> >>>> >>>> >>>> THanks, >>>> >>>> >>>> >>>> >>>> Mike >>>> >>>> >>>> >>>> >>>> On Wed, Jun 21, 2023 at 12:27 PM Mohr, Rick >>> moh...@ornl.gov> <mailto:moh...@ornl.gov <mailto:moh...@ornl.gov>>> >>>> wrote: >>>> >>>> >>>> Mike, >>>> >>>> >>>> It looks like the mds server is having a problem contacting the mgs >>>> server. I'm guessing the mgs is a separate host? I would start by looking >>>> for possible network problems that might explain the LNet timeouts. You can >>>> try using "lctl ping" to test the LNet connection between nodes, and you >>>> can also try regular "ping" between the IP addresses on the IB interfaces. >>>> >>>> >>>> --Rick >>>> >>>> >>>> >>>> >>>> On 6/21/23, 11:35 AM, "lustre-discuss on behalf of Mike Mosley via >>>> lustre-discuss" >>> lustre-discuss-boun...@lists.lustre.org> <_blank> >>> lustre-discuss-boun...@lists.lustre.org >>> lustre-discuss-boun...@lists.lustre.org> <_blank>> on behalf of >>>> lustre-discuss@lists.lustre.org <mailto:lustre-discuss@lists.lustre.org> >>>> <_blank> <mailto:lustre-discuss@lists.lustre.org >>> lustre-discuss@lists.lustre.org> <_blank>>> wrote: >>>> >>>> >>>> >>>> >>>> Greetings, >>>> >>>> >>>> >>>> >>>> We have experienced some type of issue that is causing both of our MDS >>>> servers to only be able to mount the mdt device in read only mode. Here are >>>> some of the error messages we are seeing in the log files below. We lost >>>> our Lustre expert a while back and we are not sure how to proceed to >>>> troubleshoot this issue. Can anybody provide us guidance on how to proceed? >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> &
Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only
kernel: [] ? >> class_config_dump_handler+0x7e0/0x7e0 [obdclass] >> Jun 20 15:12:14 hyd-mds1 kernel: [] >> mgc_process_config+0x88b/0x13f0 [mgc] >> Jun 20 15:12:14 hyd-mds1 kernel: [] >> lustre_process_log+0x2d8/0xad0 [obdclass] >> Jun 20 15:12:14 hyd-mds1 kernel: [] ? >> libcfs_debug_msg+0x57/0x80 [libcfs] >> Jun 20 15:12:14 hyd-mds1 kernel: [] ? >> lprocfs_counter_add+0xf9/0x160 [obdclass] >> Jun 20 15:12:14 hyd-mds1 kernel: [] >> server_start_targets+0x13a4/0x2a20 [obdclass] >> Jun 20 15:12:14 hyd-mds1 kernel: [] ? >> lustre_start_mgc+0x260/0x2510 [obdclass] >> Jun 20 15:12:14 hyd-mds1 kernel: [] ? >> class_config_dump_handler+0x7e0/0x7e0 [obdclass] >> Jun 20 15:12:14 hyd-mds1 kernel: [] >> server_fill_super+0x10cc/0x1890 [obdclass] >> Jun 20 15:12:14 hyd-mds1 kernel: [] >> lustre_fill_super+0x468/0x960 [obdclass] >> Jun 20 15:12:14 hyd-mds1 kernel: [] ? >> lustre_common_put_super+0x270/0x270 [obdclass] >> Jun 20 15:12:14 hyd-mds1 kernel: [] >> mount_nodev+0x4f/0xb0 >> Jun 20 15:12:14 hyd-mds1 kernel: [] >> lustre_mount+0x38/0x60 [obdclass] >> Jun 20 15:12:14 hyd-mds1 kernel: [] mount_fs+0x3e/0x1b0 >> Jun 20 15:12:14 hyd-mds1 kernel: [] >> vfs_kern_mount+0x67/0x110 >> Jun 20 15:12:14 hyd-mds1 kernel: [] do_mount+0x1ef/0xd00 >> Jun 20 15:12:14 hyd-mds1 kernel: [] ? >> __check_object_size+0x1ca/0x250 >> Jun 20 15:12:14 hyd-mds1 kernel: [] ? >> kmem_cache_alloc_trace+0x3c/0x200 >> Jun 20 15:12:14 hyd-mds1 kernel: [] SyS_mount+0x83/0xd0 >> Jun 20 15:12:14 hyd-mds1 kernel: [] >> system_call_fastpath+0x25/0x2a >> Jun 20 15:13:14 hyd-mds1 kernel: LNet: >> 4458:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for >> 172.16.100.4@o2ib: 9 seconds >> Jun 20 15:13:14 hyd-mds1 kernel: LNet: >> 4458:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 239 previous >> similar messages >> Jun 20 15:14:14 hyd-mds1 kernel: INFO: task mount.lustre:4123 blocked for >> more than 120 seconds. 
>> Jun 20 15:14:14 hyd-mds1 kernel: "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> Jun 20 15:14:14 hyd-mds1 kernel: mount.lustre D 9f27a3bc5230 0 4123 1 >> 0x0086 >> >> >> >> >> >> >> >> >> >> >> >> >> dumpe2fs seems to show that the file systems are clean i.e. >> >> >> >> >> >> >> >> >> dumpe2fs 1.45.6.wc1 (20-Mar-2020) >> Filesystem volume name: hydra-MDT >> Last mounted on: / >> Filesystem UUID: 3ae09231-7f2a-43b3-a4ee-7f36080b5a66 >> Filesystem magic number: 0xEF53 >> Filesystem revision #: 1 (dynamic) >> Filesystem features: has_journal ext_attr resize_inode dir_index filetype >> mmp flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink >> quota >> Filesystem flags: signed_directory_hash >> Default mount options: user_xattr acl >> Filesystem state: clean >> Errors behavior: Continue >> Filesystem OS type: Linux >> Inode count: 2247671504 >> Block count: 1404931944 >> Reserved block count: 70246597 >> Free blocks: 807627552 >> Free inodes: 2100036536 >> First block: 0 >> Block size: 4096 >> Fragment size: 4096 >> Reserved GDT blocks: 1024 >> Blocks per group: 20472 >> Fragments per group: 20472 >> Inodes per group: 32752 >> Inode blocks per group: 8188 >> Flex block group size: 16 >> Filesystem created: Thu Aug 8 14:21:01 2019 >> Last mount time: Tue Jun 20 15:19:03 2023 >> Last write time: Wed Jun 21 10:43:51 2023 >> Mount count: 38 >> Maximum mount count: -1 >> Last checked: Thu Aug 8 14:21:01 2019 >> Check interval: 0 () >> Lifetime writes: 219 TB >> Reserved blocks uid: 0 (user root) >> Reserved blocks gid: 0 (group root) >> First inode: 11 >> Inode size: 1024 >> Required extra isize: 32 >> Desired extra isize: 32 >> Journal inode: 8 >> Default directory hash: half_md4 >> Directory Hash Seed: 2e518531-82d9-4652-9acd-9cf9ca09c399 >> Journal backup: inode blocks >> MMP block number: 1851467 >> MMP update interval: 5 >> User quota inode: 3 >> Group quota inode: 4 >> Journal features: journal_incompat_revoke >> 
Journal size: 4096M >> Journal length: 1048576 >> Journal sequence: 0x0a280713 >> Journal start: 0 >> MMP_block: >> mmp_magic: 0x4d4d50 >> mmp_check_interval: 6 >> mmp_sequence: 0xff4d4d50 >> mmp_update_date: Wed Jun 21 10:43:51 2023 >> mmp_update_time: 1687358631 >> mmp_node_name: hyd-mds1.uncc.edu <_blank> <_blank> >> mmp_device_name: dm-0 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
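[Editor's note] Two dumpe2fs fields are worth pulling out first when an MDT will only mount read-only: the filesystem state and the MMP fields. Note that "clean" here does not rule out a recorded journal error (the syslog above shows ldiskfs marking the fs in need of fsck). A small parsing sketch over sample lines taken from the thread:

```shell
# Sketch: extract the filesystem state from dumpe2fs-style output.
# dumpe2fs_sample is a condensed excerpt of the output quoted above.
dumpe2fs_sample='Filesystem state: clean
MMP update interval: 5
mmp_node_name: hyd-mds1.uncc.edu'
state=$(printf '%s\n' "$dumpe2fs_sample" | sed -n 's/^Filesystem state:[[:space:]]*//p')
echo "state=$state"
# a "clean" state with journal errors in syslog still warrants e2fsck -fn first
```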
Re: [lustre-discuss] storing Lustre jobid in file xattrs: seeking feedback
Just a thought, instead of embedding the jobname itself, perhaps just a least significant 7 character sha-1 hash of the jobname. Small chance of collision, easy to decode/cross reference to jobid when needed. Just a thought. --Jeff On Fri, May 12, 2023 at 3:08 PM Andreas Dilger via lustre-discuss < lustre-discuss@lists.lustre.org> wrote: > Hi Thomas, > thanks for working on this functionality and raising this question. > > As you know, I'm inclined toward the user.job xattr, but I think it is > never a good idea to unilaterally make policy decisions in the kernel that > cannot be changed. > > As such, it probably makes sense to have a tunable parameter like " > mdt.*.job_xattr=user.job" and then this could be changed in the future if > there is some conflict (e.g. some site already uses the "user.job" xattr > for some other purpose). > > I don't think the job_xattr should allow totally arbitrary values (e.g. > overwriting trusted.lov or trusted.lma or security.* would be bad). One > option is to only allow a limited selection of valid xattr namespaces, and > possibly names: > >- NONE to turn this feature off >- user, or trusted or system (if admin wants to restrict the ability >of regular users to change this value?), with ".job" added >automatically >- user.* (or trusted.* or system.*) to also allow specifying the xattr >name > > If we allow the xattr name portion to be specified (which I'm not sure > about, but putting it out for completeness), it should have some reasonable > limits: > >- <= 7 characters long to avoid wasting valuable xattr space in the >inode >- should not conflict with other known xattrs, which is tricky if we >allow the name to be arbitrary. Possibly if in trusted (and system?) >it should only allow trusted.job to avoid future conflicts? >- maybe restrict it to contain "job" (or maybe "pbs", "slurm", ...) to >reduce the chance of namespace clashes in user or system? 
However, I'm >reluctant to restrict names in user since this *shouldn't* have any >fatal side effects (e.g. data corruption like in trusted or system), >and the admin is supposed to know what they are doing... > > > On May 4, 2023, at 15:53, Bertschinger, Thomas Andrew Hjorth via > lustre-discuss wrote: > > Hello Lustre Users, > > There has been interest in a proposed feature > https://jira.whamcloud.com/browse/LU-13031 to store the jobid with each > Lustre file at create time, in an extended attribute. An open question is > which xattr namespace is to use between "user", the Lustre-specific > namespace "lustre", "trusted", or even perhaps "system". > > The correct namespace likely depends on how this xattr will be used. For > example, will interoperability with other filesystems be important? > Different namespaces have their own limitations so the correct choice > depends on the use cases. > > I'm looking for feedback on applications for this feature. If you have > thoughts on how you could use this, please feel free to share them so that > we design it in a way that meets your needs. > > Thanks! > > Tom Bertschinger > LANL > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > > > Cheers, Andreas > -- > Andreas Dilger > Lustre Principal Architect > Whamcloud > > > > > > > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
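[Editor's note] Jeff's hash suggestion above can be sketched in a couple of lines: keep only the 7 least-significant hex characters of a SHA-1 of the jobname. The jobname below is a made-up example:

```shell
# Sketch: short jobname token = last 7 hex chars of sha1(jobname).
jobname='climate_run.98765@slurm'
digest=$(printf '%s' "$jobname" | sha1sum | awk '{print $1}')
short=$(printf '%s' "$digest" | tail -c 7)   # 7 least-significant hex chars
echo "$short"
```

Cross-referencing back to the jobid then just means hashing the scheduler's job table the same way and matching on the 7-character token.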
Re: [lustre-discuss] Disk failures triggered during OST creation and mounting on OSS Servers
unt: Succeeded. > May 9 13:36:09 sphnxoss47 kernel: LDISKFS-fs (md2): mounted filesystem > with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc > Show less > 11:03 AM > > - > > it just repeats for all of the md raids, then the errors start and the > drive fails and is disabled: > > May 9 13:44:31 sphnxoss47 kernel: LustreError: > 48069:0:(super25.c:176:lustre_fill_super()) llite: Unable to mount > : rc = -110 > May 9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a): > originator(PL), code(0x12), sub_code(0x011a) > May 9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a): > originator(PL), code(0x12), sub_code(0x011a) > May 9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a): > originator(PL), code(0x12), sub_code(0x011a) > May 9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a): > originator(PL), code(0x12), sub_code(0x011a) > May 9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a): > originator(PL), code(0x12), sub_code(0x011a) > May 9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a): > originator(PL), code(0x12), sub_code(0x011a) > May 9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a): > originator(PL), code(0x12), sub_code(0x011a) > > > May 9 13:44:33 sphnxoss47 kernel: sd 16:0:31:0: [sdef] tag#1102 FAILED > Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=1s > May 9 13:44:33 sphnxoss47 kernel: sd 16:0:31:0: [sdef] tag#1102 CDB: > Read(10) 28 00 00 00 87 79 00 00 01 00 > May 9 13:44:33 sphnxoss47 kernel: blk_update_request: I/O error, dev > sdef, sector 277448 op 0x0:(READ) flags 0x84700 phys_seg 1 prio class 0 > May 9 13:44:33 sphnxoss47 kernel: sd 16:0:31:0: [sdef] tag#6800 FAILED > Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=1s > May 9 13:44:33 sphnxoss47 kernel: sd 16:0:31:0: [sdef] tag#6800 CDB: > Read(10) 28 00 00 00 87 dd 00 00 01 00 > May 9 13:44:33 sphnxoss47 kernel: blk_update_request: I/O error, dev > sdef, sector 278248 
op 0x0:(READ) flags 0x84700 phys_seg 1 prio class 0 > May 9 13:44:33 sphnxoss47 kernel: device-mapper: multipath: 253:52: > Failing path 128:112. > May 9 13:44:33 sphnxoss47 multipathd[6051]: sdef: mark as failed > May 9 13:44:33 sphnxoss47 multipathd[6051]: mpathae: remaining active > paths: 1 > ... > ... > May 9 13:44:34 sphnxoss47 kernel: mpt3sas_cm0: log_info(0x3112011a): > originator(PL), code(0x12), sub_code(0x011a) > May 9 13:44:34 sphnxoss47 kernel: mpt3sas_cm0: log_info(0x3112011a): > originator(PL), code(0x12), sub_code(0x011a) > May 9 13:44:34 sphnxoss47 kernel: mpt3sas_cm0: log_info(0x3112011a): > originator(PL), code(0x12), sub_code(0x011a) > May 9 13:44:34 sphnxoss47 kernel: md: super_written gets error=-5 > May 9 13:44:34 sphnxoss47 kernel: md/raid:md8: Disk failure on dm-55, > disabling device. > May 9 13:44:34 sphnxoss47 kernel: md: super_written gets error=-5 > May 9 13:44:34 sphnxoss47 kernel: md/raid:md8: Operation continuing on > 9 devices. > May 9 13:44:34 sphnxoss47 multipathd[6051]: sdah: mark as failed > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
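[Editor's note] With logs like the above, a per-device tally of blk_update_request I/O errors makes the failing drive stand out quickly. A sketch over sample lines condensed from this thread:

```shell
# Sketch: count I/O errors per device in a syslog excerpt.
# log is sample data condensed from the messages quoted above.
log='kernel: blk_update_request: I/O error, dev sdef, sector 277448
kernel: blk_update_request: I/O error, dev sdef, sector 278248
multipathd[6051]: sdah: mark as failed'
counts=$(printf '%s\n' "$log" | awk '/I\/O error/ {
  for (i = 1; i <= NF; i++)
    if ($i == "dev") { d = $(i+1); sub(/,/, "", d); n[d]++ }
} END { for (d in n) print d, n[d] }')
echo "$counts"
```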
Re: [lustre-discuss] [EXTERNAL] Mounting lustre on block device
If you *really* want a block device on a client that resides in Lustre you *could* create a file in Lustre and then make that file a loopback device with losetup. Of course, your mileage will vary *a lot* based on use case, access, underlying Lustre filesystem configuration.

dd if=/dev/zero of=/my_lustre_mountpoint/some_subdir/big_raw_file bs=1048576 count=10
losetup -f /my_lustre_mountpoint/some_subdir/big_raw_file
*assuming loop0 is created*
some_fun_command /dev/loop0

Disclaimer: Just because you *can* do this doesn't necessarily mean it is a good idea. On Thu, Mar 16, 2023 at 3:29 PM Mohr, Rick via lustre-discuss < lustre-discuss@lists.lustre.org> wrote: > Are you asking if you can mount Lustre on a client so that it shows up as > a block device? If so, the answer to that is you can't. Lustre does not > appear as a block device to the clients. > > -Rick > > > > On 3/16/23, 3:44 PM, "lustre-discuss on behalf of Shambhu Raje via > lustre-discuss" lustre-discuss-boun...@lists.lustre.org> on behalf of > lustre-discuss@lists.lustre.org <mailto:lustre-discuss@lists.lustre.org>> > wrote: > > > When we mount a Lustre file system on a client, the Lustre file system does > not use a block device on the client side. Instead it uses a virtual file system > namespace. The mount point is not shown by 'lsblk'; it only shows in 'df -hT'. > > > How can we mount a Lustre file system on a block device such that when we write > something with Lustre it can be seen on the block device? > Can you share a command? 
> > > > > > > > > > > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
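[Editor's note] Only the backing-file half of the loopback recipe above is shown runnable here, since losetup needs root and an actual Lustre mount point. A sparse file (truncate) also avoids writing zeros the way the dd example does:

```shell
# Sketch: create a sparse backing file for a loop device.
# The path is a temp file here; on a real client it would live under the
# Lustre mount point, as in the dd example above.
backing=$(mktemp)
truncate -s 100M "$backing"
size=$(stat -c %s "$backing")
echo "backing file is $size bytes"
# with root on a client:  losetup -f --show "$backing"   ->  /dev/loopN
rm -f "$backing"
```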
Re: [lustre-discuss] Memory Management in Lustre
Ellis, I haven't messed with it much personally but if you look at some of the Lustre module parameters, like in the case of module obdclass, you will see some options that could be of interest like lu_cache_percent. I'm sure a Whamcloud person might chime in with more detail. # modinfo obdclass filename: /lib/modules/3.10.0-957.27.2.el7.DPC.x86_64/extra/obdclass.ko.xz license:GPL version:2.12.2 description:Lustre Class Driver author: OpenSFS, Inc. <http://www.lustre.org/> alias: fs-lustre retpoline: Y rhelversion:7.6 srcversion: 3D7126D7BB611F089C67867 depends:libcfs,lnet,crc-t10dif vermagic: 3.10.0-957.27.2.el7.DPC.x86_64 SMP mod_unload modversions parm: lu_cache_percent:Percentage of memory to be used as lu_object cache (int) parm: lu_cache_nr:Maximum number of objects in lu_object cache (long) parm: lprocfs_no_percpu_stats:Do not alloc percpu data for lprocfs stats (int) --Jeff On Wed, Jan 19, 2022 at 6:35 PM Ellis Wilson via lustre-discuss wrote: > > Hi folks, > > Broader (but related) question than my current malaise with OOM issues on > 2.14/2.15: Is there any documentation or can somebody point me at some code > that explains memory management within Lustre? I've hunted through Lustre > manuals, the Lustre internals doc, and a bunch of code, but can find nothing > that documents the memory architecture in place. I'm specifically looking at > PTLRPC and OBD code right now, and I can't seem to find anywhere that > explicitly limits the amount of allocations Lustre will perform. On other > filesystems I've worked on there are memory pools that you can explicitly > size with maxes, and while these may be discrete between areas or reference > counters used to leverage a system-shared pool, I expected to see /something/ > that might bake in limits of some kind. I'm sure I'm just not finding it. > Any help is greatly appreciated. 
> > Best, > > ellis > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
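[Editor's note] The tunables Jeff points at can be listed straight from the `parm:` lines of modinfo output; the sketch parses a two-line sample taken from the message above (pipe real `modinfo obdclass` output on a live node):

```shell
# Sketch: pull parameter names out of modinfo "parm:" lines.
# modinfo_sample is excerpted from the thread, not live output.
modinfo_sample='parm: lu_cache_percent:Percentage of memory to be used as lu_object cache (int)
parm: lu_cache_nr:Maximum number of objects in lu_object cache (long)'
params=$(printf '%s\n' "$modinfo_sample" | sed -n 's/^parm:[[:space:]]*\([^:]*\):.*/\1/p')
printf '%s\n' "$params"
# live values appear under /sys/module/obdclass/parameters/<name>
```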
Re: [lustre-discuss] Unable to mount new OST
pathd: sddy [128:0]: path added to devmap > OST0051 > Jul 4 06:02:07 oss03 kernel: sd 15:0:0:92: [sddy] 34863054848 4096-byte > logical blocks: (142 TB/129 TiB) > Jul 4 06:02:07 oss03 kernel: sd 15:0:0:92: [sddy] Write Protect is off > Jul 4 06:02:07 oss03 kernel: sd 15:0:0:92: [sddy] Write cache: enabled, > read cache: enabled, supports DPO and FUA > Jul 4 06:02:07 oss03 kernel: sd 15:0:0:92: [sddy] Attached SCSI disk > Jul 4 06:02:07 oss03 multipathd: sddy: add path (uevent) > Jul 4 06:02:07 oss03 multipathd: sddy [128:0]: path added to devmap > OST0051 > Jul 4 06:25:54 oss03 kernel: sd 15:0:0:92: [sddy] 34863054848 4096-byte > logical blocks: (142 TB/129 TiB) > Jul 4 06:25:54 oss03 kernel: sd 15:0:0:92: [sddy] Write Protect is off > Jul 4 06:25:54 oss03 kernel: sd 15:0:0:92: [sddy] Write cache: enabled, > read cache: enabled, supports DPO and FUA > Jul 4 06:25:54 oss03 kernel: sd 15:0:0:92: [sddy] Attached SCSI disk > Jul 4 06:25:54 oss03 multipathd: sddy: add path (uevent) > Jul 4 06:25:54 oss03 multipathd: sddy [128:0]: path added to devmap > OST0051 > Jul 4 07:22:23 oss03 kernel: sd 15:0:0:92: [sddy] 34863054848 4096-byte > logical blocks: (142 TB/129 TiB) > Jul 4 07:22:23 oss03 kernel: sd 15:0:0:92: [sddy] Write Protect is off > Jul 4 07:22:23 oss03 kernel: sd 15:0:0:92: [sddy] Write cache: enabled, > read cache: enabled, supports DPO and FUA > Jul 4 07:22:23 oss03 kernel: sd 15:0:0:92: [sddy] Attached SCSI disk > Jul 4 07:22:23 oss03 multipathd: sddy: add path (uevent) > Jul 4 07:22:23 oss03 multipathd: sddy [128:0]: path added to devmap > OST0051 > Jul 6 07:59:41 oss03 kernel: sd 15:0:0:92: [sddy] 34863054848 4096-byte > logical blocks: (142 TB/129 TiB) > Jul 6 07:59:41 oss03 kernel: sd 15:0:0:92: [sddy] Write Protect is off > Jul 6 07:59:41 oss03 kernel: sd 15:0:0:92: [sddy] Write cache: enabled, > read cache: enabled, supports DPO and FUA > Jul 6 07:59:41 oss03 kernel: sd 15:0:0:92: [sddy] Attached SCSI disk > Jul 6 07:59:42 oss03 multipathd: sddy: add 
path (uevent) > Jul 6 07:59:42 oss03 multipathd: sddy [128:0]: path added to devmap > OST0051 > > On Wed, Jul 7, 2021 at 7:24 AM Jeff Johnson < > jeff.john...@aeoncomputing.com> wrote: > >> What devices are underneath dm-21 and are there any errors in >> /var/log/messages for those devices? (assuming /dev/sdX devices underneath) >> >> Run `ls /sys/block/dm-21/slaves` to see what devices are beneath dm-21 >> >> >> >> >> >> On Tue, Jul 6, 2021 at 20:09 David Cohen >> wrote: >> >>> Hi, >>> The index of the OST is unique in the system and free for the new one, >>> as it is increased by "1" for every new OST created, so whatever it >>> converts to should not be relevant to it's refusal to mount, or am I >>> mistaken? >>> >>> I'm pasting the log messages again, in case they were lost up the >>> thread, adding the output of "fdisk -l", should the OST size be the issue: >>> >>> lctl dk show tens of thousands of lines repeating the same error after >>> attempting to mount the OST: >>> >>> 0010:1000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) >>> local-OST0033: fail to set LMA for init OI scrub: rc = -30 >>> 0010:1000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) >>> local-OST0033: fail to set LMA for init OI scrub: rc = -30 >>> 0010:1000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) >>> local-OST0033: fail to set LMA for init OI scrub: rc = -30 >>> >>> in /var/log/messages I see the following corresponding to dm21 which is >>> the new OST: >>> >>> Jul 6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21): >>> ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected, >>> please wait. 
>>> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled, >>> maximum tree depth=5 >>> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21): >>> ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous >>> mount: IO failure >>> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21): >>> ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check. >>> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs >>> with errors, running e2fsck is recommended >>> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete >>> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem >>> w
Re: [lustre-discuss] Unable to mount new OST
behalf of David Cohen" < >> lustre-discuss-boun...@lists.lustre.org on behalf of >> cda...@physics.technion.ac.il> wrote: >> >> >> >> Thanks Andreas, >> >> I'm aware that index 51 actually translates to hex 33 >> (local-OST0033_UUID). >> I don't believe that's the reason for the failed mount as it is only an >> index that I increase for every new OST and there are no duplicates. >> >> >> >> lctl dk show tens of thousands of lines repeating the same error after >> attempting to mount the OST: >> >> >> >> 0010:1000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) >> local-OST0033: fail to set LMA for init OI scrub: rc = -30 >> >> 0010:1000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) >> local-OST0033: fail to set LMA for init OI scrub: rc = -30 >> >> 0010:1000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one()) >> local-OST0033: fail to set LMA for init OI scrub: rc = -30 >> >> >> >> in /var/log/messages I see the following corresponding to dm21 which is >> the new OST: >> >> >> >> Jul 6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21): >> ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected, >> please wait. >> >> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled, >> maximum tree depth=5 >> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21): >> ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous >> mount: IO failure >> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21): >> ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check. >> Jul 6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs >> with errors, running e2fsck is recommended >> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete >> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem with >> ordered data mode. 
Opts: >> user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc >> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs error (device dm-21): >> htree_dirblock_to_tree:1278: inode #2: block 21233: comm mount.lustre: bad >> entry in directory: rec_len is too small for name_len - offset=4084(4084), >> inode=0, rec_len=12 >> , name_len=0 >> Jul 6 07:39:22 oss03 kernel: Aborting journal on device dm-21-8. >> Jul 6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): Remounting filesystem >> read-only >> Jul 6 07:39:24 oss03 kernel: LDISKFS-fs warning (device dm-21): >> kmmpd:187: kmmpd being stopped since filesystem has been remounted as >> readonly. >> Jul 6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): error count since last >> fsck: 6 >> Jul 6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): initial error at time >> 1625367384: htree_dirblock_to_tree:1278: inode 2: block 21233 >> Jul 6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): last error at time >> 1625546362: htree_dirblock_to_tree:1278: inode 2: block 21233 >> >> As I mentioned before mount never completes so the only way out of that >> is force reboot. >> >> Thanks, >> David >> >> >> >> On Tue, Jul 6, 2021 at 8:07 AM Andreas Dilger >> wrote: >> >> >> >> >> >> On Jul 5, 2021, at 09:05, David Cohen >> wrote: >> >> >> >> Hi, >> >> I'm using Lustre 2.10.5 and lately tried to add a new OST. >> >> The OST was formatted with the command below, which other than the index >> is the exact same one used for all the other OSTs in the system. 
>> >> >> >> mkfs.lustre --reformat --mkfsoptions="-t ext4 -T huge" --ost >> --fsname=local --index=0051 --param ost.quota_type=ug >> --mountfsoptions='errors=remount-ro,extents,mballoc' --mgsnode=10.0.0.3@tcp >> --mgsnode=10.0.0.1@tcp >> --mgsnode=10.0.0.2@tcp --servicenode=10.0.0.3@tcp >> --servicenode=10.0.0.1@tcp --servicenode=10.0.0.2@tcp /dev/mapper/OST0051 >> >> >> >> Note that your "--index=0051" value is probably interpreted as an octal >> number "41", it should be "--index=0x0051" or "--index=0x51" (hex, to match >> the OST device name) or "--index=81" (decimal). >> >> >> >> When trying to mount the OST with: >> mount.lustre /dev/mapper/OST0051 /Lustre/OST0051 >> >> >> >> The system stays on 100% CPU (one core) forever and the mount never >> completes, not even after a week. >> >> >> I tried tunefs.lustre --writeconf --erase-params on the MDS and all the >> other targets, but the behaviour remains the same. >> >> >> >> Cheers, Andreas >> >> -- >> >> Andreas Dilger >> >> Lustre Principal Architect >> >> Whamcloud >> ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
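Andreas's octal warning above is easy to demonstrate: utilities that parse integers with strtol(..., 0) semantics treat a leading zero as octal and a leading 0x as hex. A quick illustration in bash (whether a particular mkfs.lustre build parses `--index` this way depends on the version, so treat this as a demonstration of the ambiguity, not of Lustre internals):

```shell
# A leading zero selects octal, an 0x prefix selects hex:
printf '%d\n' 0051   # octal  51 -> 41 decimal
printf '%d\n' 0x51   # hex    51 -> 81 decimal
printf '%d\n' 51     # plain decimal -> 51
```

Passing the index in hex (e.g. `--index=0x51`) sidesteps the ambiguity and matches the hex OST device naming.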
Re: [lustre-discuss] zfs
I was just popping a big bowl of popcorn for this... ;-D On Mon, Dec 21, 2020 at 6:59 AM Peter Jones wrote: > Just in case anyone was wondering – the poster never did reach out to me > so this does seem to be more of a case of phishing/trolling rather than > someone being genuinely confused. > > > > *From: *Peter Jones > *Date: *Monday, December 14, 2020 at 5:03 AM > *To: *Samantha Smith , " > lustre-discuss@lists.lustre.org" > *Subject: *Re: [lustre-discuss] zfs > > > > Sam > > > > Welcome to the list. This is surprising for a number of reasons. Could you > please reach out to me directly from your corporate account (rather than > gmail) and I’ll be happy to work this through with you. > > > > Thanks > > > > Peter > > > > *From: *lustre-discuss on > behalf of Samantha Smith > *Date: *Sunday, December 13, 2020 at 5:35 PM > *To: *"lustre-discuss@lists.lustre.org" > *Subject: *[lustre-discuss] zfs > > > > Our team received a demand letter from an Oracle attorney claiming patent > violations on zfs used in our DDN lustre cluster. > > > > We called our DDN sales person who gave us a non-answer and has refused to > call us back. > > > > How are other people dealing with this? > > > > sam > > > _______ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] error mounting client
Your output shows Infiniband NIDs (@o2ib). If you are mounting @tcp what is your tcp access method to the Infiniband file system? Multihomed? lnet router? --Jeff On Tue, Aug 25, 2020 at 8:32 AM Peeples, Heath wrote: > We have just build a 2.12.5 cluster. When trying to mount the fs (via > tcp). I get the following errors. Would anyone have an idea what the > problem might be? Thanks in advance > > > > > > [10680.535157] LustreError: 15c-8: MGC192.168.8.8@tcp: The configuration > from log 'ldata-client' failed (-2). This may be the result of > communication errors between this node and the MGS, a bad configuration, or > other errors. See the syslog for more information. > > [10680.883649] LustreError: 12634:0:(lov_obd.c:839:lov_cleanup()) > ldata-clilov-91b118df1000: lov tgt 0 not cleaned! deathrow=0, lovrc=1 > > [10680.886610] LustreError: 12634:0:(lov_obd.c:839:lov_cleanup()) Skipped > 4 previous similar messages > > [10680.890298] LustreError: 12634:0:(obd_config.c:610:class_cleanup()) > Device 9 not setup > > [10680.891816] Lustre: Unmounted ldata-client > > [10680.895178] LustreError: 12634:0:(obd_mount.c:1608:lustre_fill_super()) > Unable to mount (-2) > > [10763.516841] LustreError: 12732:0:(ldlm_lib.c:494:client_obd_setup()) > can't add initial connection > > [10763.518368] LustreError: 12732:0:(obd_config.c:559:class_setup()) setup > ldata-OST0006-osc-91b125029800 failed (-2) > > [10763.519806] LustreError: > 12732:0:(obd_config.c:1835:class_config_llog_handler()) MGC192.168.8.8@tcp: > cfg command failed: rc = -2 > > [10763.522603] Lustre:cmd=cf003 0:ldata-OST0006-osc > 1:ldata-OST0006_UUID 2:172.23.0.116@o2ib > > > > Heath > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 
92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage
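For the archive: the two usual answers to the question above are to multihome the servers (give them NIDs on both @tcp and @o2ib) or to stand up an LNet router between the networks. A minimal router sketch, with hypothetical interface names and NIDs:

```shell
# On a router node that has a leg on both networks:
lnetctl lnet configure
lnetctl net add --net o2ib0 --if ib0
lnetctl net add --net tcp0 --if eth0
lnetctl set routing 1

# On the tcp-only client, send o2ib traffic via that router
# (192.168.8.250@tcp is a placeholder for the router's tcp NID):
lnetctl route add --net o2ib0 --gateway 192.168.8.250@tcp
```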
Re: [lustre-discuss] Centos 7.7 upgrade
Alastair, Are you sure you have functioning ZFS modules for that kernel, and that they are loaded? Are you able to see your zpools? Did you use DKMS for either ZFS, Lustre or both? If so, what does `dkms status` report? --Jeff On Mon, Jun 1, 2020 at 4:21 PM Alastair Basden wrote: > Hi, > > We have just upgraded Lustre servers from 2.12.2 on centos 7.6 to 2.12.3 > on centos 7.7. > > The OSSs are on top of zfs (0.7.13 as recommended), and we are using > 3.10.0-1062.1.1.el7_lustre.x86_64 > > After the update, Lustre will no longer mount - and messages such as: > Jun 2 00:02:44 hostname kernel: LustreError: 158-c: Can't load module > 'osd-zfs' > Jun 2 00:02:44 hostname kernel: LustreError: Skipped 875 previous similar > messages > Jun 2 00:02:44 hostname kernel: LustreError: > 226253:0:(genops.c:397:class_newdev()) OBD: unknown type: osd-zfs > Jun 2 00:02:44 hostname kernel: LustreError: > 226265:0:(obd_config.c:403:class_attach()) Cannot create device > lustfs-OST0006-osd of type osd-zfs : -19 > Jun 2 00:02:44 hostname kernel: LustreError: > 226265:0:(obd_config.c:403:class_attach()) Skipped 881 previous similar > messages > Jun 2 00:02:44 hostname kernel: LustreError: > 226265:0:(obd_mount.c:197:lustre_start_simple()) lustfs-OST0006-osd attach > error -19 > Jun 2 00:02:44 hostname kernel: LustreError: > 226265:0:(obd_mount.c:197:lustre_start_simple()) Skipped 881 previous > similar messages > Jun 2 00:02:44 hostname kernel: LustreError: > 226265:0:(obd_mount_server.c:1947:server_fill_super()) Unable to start osd > on lustfs-ost6/ost6: -19 > Jun 2 00:02:44 hostname kernel: LustreError: > 226265:0:(obd_mount_server.c:1947:server_fill_super()) Skipped 881 previous > similar messages > Jun 2 00:02:44 hostname kernel: LustreError: > 226265:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-19) > Jun 2 00:02:44 hostname kernel: LustreError: > 226265:0:(obd_mount.c:1608:lustre_fill_super()) Skipped 881 previous > similar messages > Jun 2 00:02:44 hostname kernel: 
LustreError: > 226253:0:(genops.c:397:class_newdev()) Skipped 887 previous similar messages > > Does anyone have any ideas? > > Thanks, > Alastair. > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
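A quick way to check the failure mode suggested above (osd-zfs cannot load because the zfs/spl stack is not built for the new kernel). The commands are standard; the expected output will vary by site:

```shell
dkms status               # zfs/spl (and lustre, if built via DKMS) should show "installed" for the running kernel
modinfo -F vermagic zfs   # must match `uname -r`
modprobe -v zfs && lsmod | grep -E 'zfs|spl'
zpool import              # lists pools visible for import once the modules load
```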
Re: [lustre-discuss] Any way to dump Lustre quota data?
Kevin, There are files in /proc/fs/lustre/qmt/yourfsname-QMT/ that you can pull it all from based on UID and GID. Look for files like md-0x0/glb-usr dt-0x0/glb-usr and files in /proc/fs/lustre/osd-zfs/yourfsname-MDT/quota_slave. I’m not in front of a keyboard, I’m cooking breakfast but I’ll follow up with the exact files. You can cat them and maybe find what you’re looking for. —Jeff On Thu, Sep 5, 2019 at 05:07 Kevin M. Hildebrand wrote: > Is there any way to dump the Lustre quota data in its entirety, rather > than having to call 'lfs quota' individually for each user, group, and > project? > > I'm currently doing this on a regular basis so we can keep graphs of how > users and groups behave over time, but it's problematic for two reasons: > 1. Getting a comprehensive list of users and groups to iterate over is > difficult- sure I can use the passwd/group files, but if a user has been > deleted there may still be files owned by a now orphaned userid or groupid > which I won't see. We may also have thousands of users in the passwd file > that don't have files on a particular Lustre filesystem, and doing lfs > quota calls for those users wastes time. > 2. Calling lfs quota hundreds of times for each of the users, groups, and > projects takes a while. This reduces my ability to collect the data at the > frequency I want. Ideally I'd like to be able to collect every minute or > so. > > I have two different Lustre installations, one running 2.8.0 with ldiskfs, > the other running 2.10.8 with ZFS. 
> > Thanks, > Kevin > > -- > Kevin Hildebrand > University of Maryland > Division of IT > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org >
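Following up on the /proc suggestion above: the same files are reachable through `lctl get_param`, which is the stable interface. A sketch, run on the node hosting the quota master (the qmt tree only exists there, and exact parameter names vary by version and backend):

```shell
lctl get_param qmt.*.dt-0x0.glb-usr            # global per-UID block-quota table
lctl get_param qmt.*.md-0x0.glb-usr            # global per-UID inode-quota table
lctl get_param qmt.*.dt-0x0.glb-grp            # same, per GID
lctl get_param osd-*.*.quota_slave.acct_user   # per-UID space accounting on each target
```

Each file dumps every known ID in one pass, avoiding the per-user `lfs quota` loop.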
Re: [lustre-discuss] Wanted: multipath.conf for dell ME4 series arrays
Andrew, ME4084 is a dual-controller active/active hardware RAID array. Disclosing some config data could be helpful. 1. What underlying Lustre target filesystem? (assuming ldiskfs with a hardware RAID array) 2. What does your current multipath.conf look like? --Jeff On Tue, Aug 20, 2019 at 11:47 PM Andrew Elwell wrote: > Hi folks, > > we're seeing MMP reluctance to hand over the (umounted) OSTs to the > partner pair on our shiny new ME4084 arrays, > > Does anyone have the device {} settings they'd be willing to share? > My gut feel is we've not defined path failover properly and some > timeouts need tweaking > > > (4* ME4084's per 2 740 servers with SAS cabling, Lustre 2.10.8 and CentOS > 7.x) > > Many thanks > > > Andrew > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.johnson at aeoncomputing dot com www.aeoncomputing.com 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
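Not taken from a production ME4, so treat this purely as a starting-point sketch: an ALUA-style `device` stanza of the general shape a dual-controller active/active array like the ME4084 needs. The vendor/product strings and timeout values here are assumptions to verify against Dell's own multipath guidance and your `multipath -ll` output:

```
devices {
    device {
        vendor                 "DellEMC"
        product                "ME4"
        path_grouping_policy   "group_by_prio"
        prio                   "alua"
        hardware_handler       "1 alua"
        path_checker           "tur"
        failback               "immediate"
        no_path_retry          18
        fast_io_fail_tmo       5
        dev_loss_tmo           60
    }
}
```

For the MMP symptom specifically, overly long `no_path_retry` queueing on the departing node can hold the device and delay handover, so that value is the first one to tune.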
Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors
If you only have those two processor models to choose from I’d do the 5217 for MDS and 5218 for OSS. If you were using ZFS for a backend definitely the 5218 for the OSS. With ZFS your processors are also your RAID controller so you have the disk i/o, parity calculation, checksums and ZFS threads on top of the Lustre i/o and OS processes. —Jeff On Thu, Jul 4, 2019 at 13:30 Simon Legrand wrote: > Hello Jeff, > > Thanks for your quick answer. We plan to use ldiskfs, but I would be > interested to know what could fit for zfs. > > Simon > > ------ > > *From: *"Jeff Johnson" > *To: *"Simon Legrand" > *Cc: *"lustre-discuss" > *Sent: *Thursday, July 4, 2019 20:40:40 > *Subject: *Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors > > Simon, > > Which backend do you plan on using? ldiskfs or zfs? > > —Jeff > > On Thu, Jul 4, 2019 at 10:41 Simon Legrand wrote: > >> Dear all, >> >> We are currently configuring a Lustre filesystem and facing a dilemma. We >> have the choice between two types of processors for an OSS and a MDS. >> - Intel Xeon Gold 5217 3GHz, 11M Cache,10.40GT/s, 2UPI, Turbo, HT,8C/16T >> (115W) - DDR4-2666 >> - Intel Xeon Gold 5218 2.3GHz, 22M Cache,10.40GT/s, 2UPI, Turbo, >> HT,16C/32T (105W) - DDR4-2666 >> >> Basically, we have to choose between frequency and number of cores. >> Our current architecture is the following: >> - 1 MDS with 11To SSD >> - 3 OSS/OST (~ 3*80To) >> Our final target is 6 OSS/OST with a single MDS. >> Could any of you help us choose, and explain the reasons?
>> >> Best regards, >> >> Simon >> _______ >> lustre-discuss mailing list >> lustre-discuss@lists.lustre.org >> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org >> > -- > -- > Jeff Johnson > Co-Founder > Aeon Computing > > jeff.john...@aeoncomputing.com > www.aeoncomputing.com > t: 858-412-3810 x1001 f: 858-412-3845 > m: 619-204-9061 > > 4170 Morena Boulevard, Suite C - San Diego, CA 92117 > High-Performance Computing / Lustre Filesystems / Scale-out Storage >
Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors
Simon, Which backend do you plan on using? ldiskfs or zfs? —Jeff On Thu, Jul 4, 2019 at 10:41 Simon Legrand wrote: > Dear all, > > We are currently configuring a Lustre filesystem and facing a dilemma. We > have the choice between two types of processors for an OSS and a MDS. > - Intel Xeon Gold 5217 3GHz, 11M Cache,10.40GT/s, 2UPI, Turbo, HT,8C/16T > (115W) - DDR4-2666 > - Intel Xeon Gold 5218 2.3GHz, 22M Cache,10.40GT/s, 2UPI, Turbo, > HT,16C/32T (105W) - DDR4-2666 > > Basically, we have to choose between frequency and number of cores. > Our current architecture is the following: > - 1 MDS with 11To SSD > - 3 OSS/OST (~ 3*80To) > Our final target is 6 OSS/OST with a single MDS. > Could any of you help us choose, and explain the reasons? > > Best regards, > > Simon > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org >
Re: [lustre-discuss] Issue updating lustre from 2.10.6 to 2.10.7
Kurt, I see you're using dkms. Take a look at `dkms status` and ensure that there are no lingering installs from the previous version. Sometimes the older version doesn't get uninstalled from /lib/modules/`uname -r`/ and the dkms install process for the new version doesn't overwrite them. --Jeff On Fri, Apr 12, 2019 at 8:34 AM Kurt Strosahl wrote: > Good Morning, > > >I've encountered an issue updating lustre from 2.10.6 to 2.10.7 on my > metadata system. > > > I installed the updated RPMs using whamcloud's yum repository, but when I > run modprobe -v lustre I get the following errors: > > insmod /lib/modules/3.10.0-957.1.3.el7_lustre.x86_64/extra/libcfs.ko.xz > insmod /lib/modules/3.10.0-957.1.3.el7_lustre.x86_64/extra/lnet.ko.xz > networks=o2ib0(bond0) > insmod /lib/modules/3.10.0-957.1.3.el7_lustre.x86_64/extra/obdclass.ko.xz > insmod /lib/modules/3.10.0-957.1.3.el7_lustre.x86_64/extra/ptlrpc.ko.xz > insmod /lib/modules/3.10.0-957.1.3.el7_lustre.x86_64/extra/fld.ko.xz > modprobe: ERROR: could not insert 'lustre': Invalid argument > > dmesg shows the following: > > [ 377.981515] fld: disagrees about version of symbol class_export_put > [ 377.981518] fld: Unknown symbol class_export_put (err -22) > [ 377.981529] fld: disagrees about version of symbol > req_capsule_server_pack > [ 377.981531] fld: Unknown symbol req_capsule_server_pack (err -22) > [ 377.981537] fld: disagrees about version of symbol req_capsule_set_size > [ 377.981538] fld: Unknown symbol req_capsule_set_size (err -22) > [ 377.981542] fld: disagrees about version of symbol req_capsule_client_get > [ 377.981543] fld: Unknown symbol req_capsule_client_get (err -22) > [ 377.981555] fld: disagrees about version of symbol lu_env_init > [ 377.981556] fld: Unknown symbol lu_env_init (err -22) > [ 377.981562] fld: disagrees about version of symbol ptlrpc_queue_wait > [ 377.981563] fld: Unknown symbol ptlrpc_queue_wait (err -22) > [ 377.981571] fld: disagrees about version of symbol lu_context_key_get > [ 
377.981573] fld: Unknown symbol lu_context_key_get (err -22) > [ 377.981604] fld: disagrees about version of symbol lu_env_fini > [ 377.981606] fld: Unknown symbol lu_env_fini (err -22) > [ 377.981611] fld: disagrees about version of symbol > lu_context_key_degister > [ 377.981612] fld: Unknown symbol lu_context_key_degister (err -22) > [ 377.981623] fld: disagrees about version of symbol class_exp2cliimp > [ 377.981625] fld: Unknown symbol class_exp2cliimp (err -22) > [ 377.981631] fld: disagrees about version of symbol req_capsule_set > [ 377.981632] fld: Unknown symbol req_capsule_set (err -22) > > I've tried uninstalling and reinstalling the RPMs but this is always the > state I end up in. > > > I'm using zfs for the back end, and those modules work after the upgrade > > > Here are the RPMs currently installed. > > perf-3.10.0-957.1.3.el7_lustre.x86_64 > lustre-resource-agents-2.10.7-1.el7.x86_64 > lustre-osd-zfs-mount-2.10.7-1.el7.x86_64 > kernel-headers-3.10.0-957.1.3.el7_lustre.x86_64 > lustre-2.10.7-1.el7.x86_64 > kernel-3.10.0-957.1.3.el7_lustre.x86_64 > perf-debuginfo-3.10.0-957.el7_lustre.x86_64 > lustre-zfs-dkms-2.10.7-1.el7.noarch > kernel-debuginfo-common-x86_64-3.10.0-957.el7_lustre.x86_64 > kernel-devel-3.10.0-957.1.3.el7_lustre.x86_64 > bpftool-3.10.0-957.el7_lustre.x86_64 > > > Thank you, > > Kurt J. 
Strosahl > System Administrator: Lustre, HPC > Scientific Computing Group, Thomas Jefferson National Accelerator Facility > _______ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
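The "disagrees about version of symbol" errors mean fld.ko was built against a different obdclass/ptlrpc than the copies actually being loaded, which fits the stale-DKMS-artifacts theory above. A cleanup sketch (the module names and versions are illustrative; check `dkms status` for the real ones on your node):

```shell
dkms status                                   # look for leftover 2.10.6 entries
dkms remove lustre-zfs/2.10.6 --all           # drop the old build if it is still registered
ls /lib/modules/$(uname -r)/extra/            # verify no stale lustre *.ko.xz remain; remove any by hand
depmod -a
dkms install lustre-zfs/2.10.7 -k $(uname -r)
modprobe -v lustre
```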
Re: [lustre-discuss] Full OST, any way of avoiding it without hanging?
Jason, If there are files located on the full OST that you can delete then delete them without trying to deactivate the OST. There is a process where you can find files solely located on the full OST using lfs then deactivate the full OST, copy the files to new files with some suffix (.new perhaps) so they end up on other OSTs, delete the original files, rename the new files to the old file names (thereby removing the suffix used previously) and then reactivating the deactivated OST. You might be lucky enough to find a few very large single stripe files located on that OST where moving the file gets you easy gains. You *should* be able to delete a file that resides on an inactive OST but I’m not over a keyboard where I can do it before making the assertion it is possible. —Jeff On Sat, Jan 5, 2019 at 18:49 Jason Williams wrote: > Hello, > > > We have a lustre system (version 2.10.4) that has unfortunately fallen > victim to a 100% full OST... Every time we clear some space on it, the > system fills it right back up again. > > > I have looked around the internet and found you can disable an OST, but > when I have tried that, any writes (including deletes) to the OST hang the > clients indefinitely. Does anyone know a way to make an OST basically > "read-only" with the exception of deletes so we can work to clear out the > OST? Or better yet, a way to "drain" or move files off an OST with a > script (keeping in mind it might not be known if the files are in use at > the time). Or even a way to tell lustre "Hey don't write any new data > here, but reading and removing data is OK." > > > > > -- > Jason Williams > Assistant Director > Systems and Data Center Operations. 
> Maryland Advanced Research Computing Center (MARCC) > Johns Hopkins University > jas...@jhu.edu > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite C - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
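The drain procedure described above, as a sketch. The fsname "lustre" and OST index 0x0c are placeholders; deactivating the OSC on the MDS stops new object allocation, but be aware that unlinks of objects on a deactivated OST are deferred until it is reactivated, so the space frees up afterwards:

```shell
# On the MDS: stop new allocations to the full OST
lctl --device lustre-OST000c-osc-MDT0000 deactivate

# On a client: find large files with objects on that OST
lfs find /lustre --obd lustre-OST000c_UUID --size +1G

# Rewrite each file so its objects land on other OSTs
cp /lustre/path/file /lustre/path/file.new && mv /lustre/path/file.new /lustre/path/file

# When enough space is clear, re-enable allocations
lctl --device lustre-OST000c-osc-MDT0000 activate
```

Single-striped large files give the easiest wins, since moving one file releases all of its space from the full OST.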
Re: [lustre-discuss] Lustre Sizing
Those are very new versions...especially on ZFS. You build OST volumes in a pool. If no other volumes are defined in a pool then 100% of that pool will be available for the OST volume but the way ZFS works the capacity doesn’t really belong to the OST volume until blocks are allocated for writes. So you have a pool of a known size and you’re the admin. As long as nobody else can create a ZFS volume in that pool then all of the capacity in that pool will go to the OST eventually when new writes occur. Keep in mind that the same pool can contain multiple snapshots (if created) so the pool is a “potential capacity” but that capacity could be concurrently allocated to OST volume writes, snapshots and other ZFS volumes (if created). —Jeff On Mon, Dec 31, 2018 at 22:20 ANS wrote: > Thanks Jeff. Currently i am using > > modinfo zfs | grep version > version: 0.8.0-rc2 > rhelversion: 7.4 > > lfs --version > lfs 2.12.0 > > And this is a fresh install. So is there any other possibility to show the > complete zpool lun has been allocated for lustre alone. > > Thanks, > ANS > > > > On Tue, Jan 1, 2019 at 11:44 AM Jeff Johnson < > jeff.john...@aeoncomputing.com> wrote: >> >> ANS, >> >> Lustre on top of ZFS has to estimate capacities and it’s fairly off when >> the OSTs are new and empty. As objects are written to OSTs and capacity is >> consumed it gets the sizing of capacity more accurate. At the beginning >> it’s so off that it appears to be an error. >> >> What version are you running? Some patches have been added to make this >> calculation more accurate. >> >> —Jeff >> >> On Mon, Dec 31, 2018 at 22:08 ANS wrote: >> >>> Dear Team, >>> >>> I am trying to configure lustre with backend ZFS as file system with 2 >>> servers in HA.
But after compiling and creating zfs pools >>> >>> zpool list >>> NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP >>> DEDUP HEALTH ALTROOT >>> lustre-data 54.5T 25.8M 54.5T - 16.0E 0% 0% >>> 1.00x ONLINE - >>> lustre-data1 54.5T 25.1M 54.5T - 16.0E 0% 0% >>> 1.00x ONLINE - >>> lustre-data2 54.5T 25.8M 54.5T - 16.0E 0% 0% >>> 1.00x ONLINE - >>> lustre-data3 54.5T 25.8M 54.5T - 16.0E 0% 0% >>> 1.00x ONLINE - >>> lustre-meta 832G 3.50M 832G - 16.0E 0% 0% >>> 1.00x ONLINE - >>> >>> and when mounted to client >>> >>> lfs df -h >>> UUID bytes Used Available Use% Mounted on >>> home-MDT_UUID 799.7G 3.2M 799.7G 0% >>> /home[MDT:0] >>> home-OST_UUID 39.9T 18.0M 39.9T 0% >>> /home[OST:0] >>> home-OST0001_UUID 39.9T 18.0M 39.9T 0% >>> /home[OST:1] >>> home-OST0002_UUID 39.9T 18.0M 39.9T 0% >>> /home[OST:2] >>> home-OST0003_UUID 39.9T 18.0M 39.9T 0% >>> /home[OST:3] >>> >>> filesystem_summary: 159.6T 72.0M 159.6T 0% /home >>> >>> So out of total 54.5T x 4 = 218TB i am getting only 159 TB usable. So can >>> anyone give the information regarding this. >>> >>> Also from a performance perspective what are the zfs and lustre parameters >>> to be tuned. >>> >>> -- >>> Thanks, >>> ANS. >>> ___ >>> lustre-discuss mailing list >>> lustre-discuss@lists.lustre.org >>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org >>> >> -- >> -- >> Jeff Johnson >> Co-Founder >> Aeon Computing >> >> jeff.john...@aeoncomputing.com >> www.aeoncomputing.com >> t: 858-412-3810 x1001 f: 858-412-3845 >> m: 619-204-9061 >> >> 4170 Morena Boulevard, Suite C - San Diego, CA 92117 >> >> High-Performance Computing / Lustre Filesystems / Scale-out Storage >> > > > -- > Thanks, > ANS.
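Besides the empty-pool estimation issue, part of the 218 TB vs 159.6 TB gap is expected accounting: `zpool list` reports raw capacity including raidz parity, while `zfs list` and `lfs df` report usable capacity with parity excluded. Comparing the two views on one of the pools from the post above makes the difference visible:

```shell
zpool list -p lustre-data                     # raw pool size, parity included
zfs list -p -o name,used,avail lustre-data    # usable view, parity excluded
lfs df -h /home                               # Lustre's (initially rough) estimate
```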
Re: [lustre-discuss] Lustre Sizing
ANS, Lustre on top of ZFS has to estimate capacities and it’s fairly off when the OSTs are new and empty. As objects are written to OSTs and capacity is consumed it gets the sizing of capacity more accurate. At the beginning it’s so off that it appears to be an error. What version are you running? Some patches have been added to make this calculation more accurate. —Jeff On Mon, Dec 31, 2018 at 22:08 ANS wrote: > Dear Team, > > I am trying to configure lustre with backend ZFS as file system with 2 > servers in HA. But after compiling and creating zfs pools > > zpool list > NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP > HEALTH ALTROOT > lustre-data 54.5T 25.8M 54.5T - 16.0E 0% 0% 1.00x > ONLINE - > lustre-data1 54.5T 25.1M 54.5T - 16.0E 0% 0% 1.00x > ONLINE - > lustre-data2 54.5T 25.8M 54.5T - 16.0E 0% 0% 1.00x > ONLINE - > lustre-data3 54.5T 25.8M 54.5T - 16.0E 0% 0% 1.00x > ONLINE - > lustre-meta 832G 3.50M 832G - 16.0E 0% 0% 1.00x > ONLINE - > > and when mounted to client > > lfs df -h > UUID bytes Used Available Use% Mounted on > home-MDT_UUID 799.7G 3.2M 799.7G 0% /home[MDT:0] > home-OST_UUID 39.9T 18.0M 39.9T 0% /home[OST:0] > home-OST0001_UUID 39.9T 18.0M 39.9T 0% /home[OST:1] > home-OST0002_UUID 39.9T 18.0M 39.9T 0% /home[OST:2] > home-OST0003_UUID 39.9T 18.0M 39.9T 0% /home[OST:3] > > filesystem_summary: 159.6T 72.0M 159.6T 0% /home > > So out of total 54.5T x 4 = 218TB i am getting only 159 TB usable. So can > anyone give the information regarding this. > > Also from a performance perspective what are the zfs and lustre parameters > to be tuned. > > -- > Thanks, > ANS.
> ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org >
Re: [lustre-discuss] Can we install lustre on centos 7.5
Shirshak, Lustre 2.10.5 can be installed on RHEL/CentOS 7.5. Documentation can be found in html, pdf and wiki formats at www.lustre.org --Jeff On Sun, Sep 23, 2018 at 8:09 AM shirshak bajgain wrote: > I heard there is a problem in the CentOS 7.5 kernel. It took around 10-20 fresh > installations, and the fact is there is no proper documentation of Lustre > installation, as the Intel version is not working any more. Is there any way to > install Lustre on CentOS 7.5? And where is the proper guide for Lustre > installation? > > Thanks > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org >
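For reference, the community RPMs are published on downloads.whamcloud.com, and a yum repo file is the simplest install path. A sketch; verify the exact baseurl paths for your version and architecture before use:

```
# /etc/yum.repos.d/lustre.repo  (sketch -- check paths on downloads.whamcloud.com)
[lustre-server]
name=Lustre server packages
baseurl=https://downloads.whamcloud.com/public/lustre/lustre-2.10.5/el7/server/
gpgcheck=0

[lustre-e2fsprogs]
name=Lustre-patched e2fsprogs
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/
gpgcheck=0
```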
Re: [lustre-discuss] rhel 7.5
The ChangeLog in 2.11.59 doesn’t mention a 7.5 server kernel. You can reference lustre/ChangeLog in the various tags in the Lustre git repo. The official support matrix is here: https://wiki.hpdd.intel.com/plugins/servlet/mobile?contentId=8126580#content/view/8126580 —Jeff On Mon, Apr 30, 2018 at 11:01 Hebenstreit, Michael < michael.hebenstr...@intel.com> wrote: > I have 2.11 already running on a 7.5 clone (client only) > > -Original Message- > From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On > Behalf Of Michael Di Domenico > Sent: Monday, April 30, 2018 11:49 AM > Cc: lustre-discuss <lustre-discuss@lists.lustre.org> > Subject: Re: [lustre-discuss] rhel 7.5 > > On Mon, Apr 30, 2018 at 10:09 AM, Jeff Johnson < > jeff.john...@aeoncomputing.com> wrote: > > RHEL 7.5 support comes in Lustre 2.10.4. Only path I can think of off > > the top of my head is to git clone and build a 2.10.4 prerelease and > > live on the bleeding edge. I’m not sure if all of the 7.5 work is > > finished in the current prerelease or not. > > argh... not sure i want to be that bleeding edge... sadly, i can't find > a release schedule for 2.10.4. i wonder if 2.11 will work > > > > > —Jeff > > > > On Mon, Apr 30, 2018 at 06:21 Michael Di Domenico > > <mdidomeni...@gmail.com> > > wrote: > >> > >> On Mon, Apr 30, 2018 at 9:19 AM, Michael Di Domenico > >> <mdidomeni...@gmail.com> wrote: > >> > when i tried to compile 2.10.2 patchless client into rpms under > >> > rhel > >> > 7.5 using kernel 3.10.0-862.el7.x86_64 > >> > > >> > the compilation went fine as far as i can tell and the rpm creation > >> > seemed to work > >> > > >> > but when i went install the rpms i got > >> > > >> > Error: Package: kmod-lustre-client-2.10.2-1.el7.x86_64 > >> > (/kmod-lustre-client-2.10.2-1.el7.x86_64 > >> > requires: kernel < 3.10.0-694 > >> > >> premature send... 
> >> > >> requires: kernel < 3.10.0-694 > >> Installed: kernel-3.10.0-862.el7.x86_64 (@updates/7.5) > >> > >> did i do something wrong in the recompile of the rpms for the target > >> kernel or is there a workaround for this? > >> ___ > >> lustre-discuss mailing list > >> lustre-discuss@lists.lustre.org > >> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > > > > -- > > -- > > Jeff Johnson > > Co-Founder > > Aeon Computing > > > > jeff.john...@aeoncomputing.com > > www.aeoncomputing.com > > t: 858-412-3810 x1001 f: 858-412-3845 > > m: 619-204-9061 > > > > 4170 Morena Boulevard, Suite D - San Diego, CA 92117 > > > > High-Performance Computing / Lustre Filesystems / Scale-out Storage > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] rhel 7.5
RHEL 7.5 support comes in Lustre 2.10.4. Only path I can think of off the top of my head is to git clone and build a 2.10.4 prerelease and live on the bleeding edge. I’m not sure if all of the 7.5 work is finished in the current prerelease or not. —Jeff On Mon, Apr 30, 2018 at 06:21 Michael Di Domenico <mdidomeni...@gmail.com> wrote: > On Mon, Apr 30, 2018 at 9:19 AM, Michael Di Domenico > <mdidomeni...@gmail.com> wrote: > > when i tried to compile 2.10.2 patchless client into rpms under rhel > > 7.5 using kernel 3.10.0-862.el7.x86_64 > > > > the compilation went fine as far as i can tell and the rpm creation > > seemed to work > > > > but when i went install the rpms i got > > > > Error: Package: kmod-lustre-client-2.10.2-1.el7.x86_64 > > (/kmod-lustre-client-2.10.2-1.el7.x86_64 > > requires: kernel < 3.10.0-694 > > premature send... > > requires: kernel < 3.10.0-694 > Installed: kernel-3.10.0-862.el7.x86_64 (@updates/7.5) > > did i do something wrong in the recompile of the rpms for the target > kernel or is there a workaround for this? > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
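For anyone hitting the same kernel-dependency error, a sketch of rebuilding the patchless client against the installed 7.5 kernel so the kmod requirement matches — the branch name and kernel-devel path are assumptions:

```shell
# Build prerequisites plus headers for the running kernel.
yum install -y kernel-devel-3.10.0-862.el7.x86_64 rpm-build libtool automake

git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
git checkout b2_10   # 2.10 prerelease branch at the time

sh autogen.sh
./configure --disable-server \
            --with-linux=/usr/src/kernels/3.10.0-862.el7.x86_64
make rpms

# The resulting kmod-lustre-client should now require the 862 kernel
# rather than kernel < 3.10.0-694.
yum localinstall -y kmod-lustre-client-*.rpm lustre-client-*.rpm
```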
Re: [lustre-discuss] lustre mount in heterogeneous net environment-update
Greetings Megan, One scenario that could cause this is if your appliance-style Lustre MDS is a high-availability server pair and your mount command is not declaring both NIDs in the mount command *and* the MGS and MDT resources happen to be presently residing on the MDS server you are not declaring in your mount command. If it is high-availability and the IPs of those servers are A.B.C.D and A.B.C.E, then make sure your mount command looks something like: mount -t lustre A.B.C.D@tcp:A.B.C.E@tcp:/somefsname /localmountpoint That way the client will be looking for the MGS in all of the places it *could* be located. Just one possibility of what may be the cause. Certainly easier and less painful than a lower-level version compatibility issue. —Jeff On Wed, Feb 28, 2018 at 13:36 Ms. Megan Larko <dobsonu...@gmail.com> wrote: > Greetings List! > > We have been continuing to dissect our LNet environment between our > lustre-2.7.0 clients and the lustre-2.7.18 servers. We have moved from the > client node to the LNet server which bridges the InfiniBand (IB) and > ethernet networks. As a test, we attempted to mount the ethernet Lustre > storage from the LNet hopefully taking the IB out of the equation to limit > the scope of our debugging. > > On the LNet router the attempted mount of Lustre storage fails. The LNet > command line error on the test LNet client is exactly the same as the > original client result: > mount A.B.C.D@tcp0:/lustre at /mnt/lustre failed: Input/output error Is > the MGS running? > > On the lustre servers, both the MGS/MDS and OSS we can see the error via > dmesg: > LNet: There was an unexpected network error while writing to C.D.E.F: -110 > > and we see the periodic (~ every 10 to 20 minutes) in dmesg on MGS/MDS: > Lustre: MGS: Client (at C.D.E.F@tcp) reconnecting > > The "lctl pings" in various directions are still successful. > > So, forget the end lustre client, we are not yet getting from MGS/MDS > successfully to the LNet router. 
> We have been looking at the contents of /sys/module/lustre.conf and we are > not seeing any differences in set values between the LNet router we are > using as a test Lustre client and the Lustre MGS/MDS server. > > As much as I'd _love_ to go to Lustre-2.10.x, we are dealing with both > "appliance" style Lustre storage systems and clients tied to specific > versions of the linux kernel (for reasons other than Lustre). > > Is there a key parameter which I could still be overlooking? > > Cheers, > megan > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
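A short sketch of the HA-mount suggestion above, with hypothetical NIDs:

```shell
# Confirm both members of the MDS pair answer at the LNet level.
lctl ping 10.0.0.10@tcp   # primary MDS
lctl ping 10.0.0.11@tcp   # failover partner

# List both MGS NIDs, colon-separated, so the client can find the MGS
# on whichever server currently hosts it.
mount -t lustre 10.0.0.10@tcp:10.0.0.11@tcp:/somefsname /mnt/lustre
```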
Re: [lustre-discuss] Level 3 support
Media malpractice. Intel still has its level 3+ Lustre support function. The media reporting of Intel's org changes was poor at best. Some of the more inexperienced vendors may have lost touch with HPDD, in my opinion. --Jeff On Fri, Feb 23, 2018 at 8:30 AM, Brian Andrus <toomuc...@gmail.com> wrote: > With the relatively recent changes in Lustre support out there, I am > curious as to what folks have started doing/planning for level 3 support. > > I know a few vendors that sell lustre based products but only provide > first or second levels of support. They used to use Intel for 3rd level, > which we had used in the past as well. But now they no longer offer it, so > they are in a possible pickle if anything goes terribly south. > > > Brian Andrus > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] weird issue w. lnet routers
John, I can't speak to Fragella's tuning making things worse but... Have you run iperf3 and lnet_selftest from your Ethernet clients to each of the lnet routers to establish what your top end is? It'd be good to determine if you have an Ethernet problem vs. an lnet problem. Also, are you running Ethernet rdma? If not, interrupts on the receive end can be vexing. --Jeff On Tue, Nov 28, 2017 at 17:21 John Casu <j...@chiraldynamics.com> wrote: > just built a system w. lnet routers that bridge Infiniband & 100GbE, using > Centos built in Infiniband support > servers are Infiniband, clients are 100GbE (connectx-4 cards) > > my direct write performance from clients over Infiniband is around 15GB/s > > When I introduced the lnet routers, performance dropped to 10GB/s > > Thought the problem was an MTU of 1500, but when I changed the MTUs to 9000 > performance dropped to 3GB/s. > > When I tuned according to John Fragella's LUG slides, things went even > slower (1.5GB/s write) > > does anyone have any ideas on what I'm doing wrong?? > > thanks, > -john c. > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] lustre 2.10.1 MOFED 4.1 QDR/FDR mixing
Harald, Yes. You want to ensure your MOFED IB settings and options, as well as the lnet options on the new systems, are compatible with the settings on the existing functional LFS. You want common lnet and OFED config files across your servers and clients. This can change a bit if you get into lnet routing or more exotic configurations. For a single, non-routed flat IB fabric, maintain common config files on all systems. Some do this by hand and others use provisioning environments like Puppet to achieve this. --Jeff On Tue, Nov 14, 2017 at 9:57 AM, Harald van Pee <p...@hiskp.uni-bonn.de> wrote: > Hi Jeff, > > thanks for your answer. > Can I be sure that there is no autoprobing which sets any configuration > differently? > The options given for mkfs.lustre and in /etc/modprobe.d/lustre.conf > will be the same, is this enough? > > Best > Harald > > On Tuesday 14 November 2017 18:39:49 Jeff Johnson wrote: > > Harald, > > > > As long as your new servers and clients all have the same settings in > their > > config files as your currently running configuration you should be fine. > > > > --Jeff > > > > > > On Tue, Nov 14, 2017 at 9:24 AM, Harald van Pee <p...@hiskp.uni-bonn.de> > > > > wrote: > > > Dear all, > > > > > > I have installed lustre 2.10.1 from source with MOFED 4.1. > > > mdt/mgs and oss run on centos 7.4 > > > clients on debian 9 (kernel 4.9) > > > > > > our test (cluster 1x mgs/mdt + 1x oss + 1x client) all with mellanox ib > > > qdr nics > > > runs without problems on a mellanox fdr switch. > > > Now we have additional clients and servers with fdr and qdr nics. > > > > > > Do I need any special configuration (beside options lnet > networks=o2ib0) > > > if I add additional fdr clients and/or servers? > > > > > > Was the configuration probed? And does it make a difference if I would > > > start > > > with fdr servers and clients and add qdr servers and clients or the > other > > > way > > > around? 
> > > > > > Thanks in advance > > > Harald > > > > > > > > > ___ > > > lustre-discuss mailing list > > > lustre-discuss@lists.lustre.org > > > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
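As a concrete sketch of keeping the lnet configuration identical everywhere — the interface and node names are made up:

```shell
# The same /etc/modprobe.d/lustre.conf on every server and client:
#   options lnet networks=o2ib0(ib0)

# One quick way to verify the file really is identical across nodes:
pdsh -w 'node[01-16]' md5sum /etc/modprobe.d/lustre.conf | dshbak -c
```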
Re: [lustre-discuss] lustre 2.10.1 MOFED 4.1 QDR/FDR mixing
Harald, As long as your new servers and clients all have the same settings in their config files as your currently running configuration you should be fine. --Jeff On Tue, Nov 14, 2017 at 9:24 AM, Harald van Pee <p...@hiskp.uni-bonn.de> wrote: > Dear all, > > I have installed lustre 2.10.1 from source with MOFED 4.1. > mdt/mgs and oss run on centos 7.4 > clients on debian 9 (kernel 4.9) > > our test (cluster 1x mgs/mdt + 1x oss + 1x client) all with mellanox ib qdr > nics > runs without problems on a mellanox fdr switch. > Now we have additional clients and servers with fdr and qdr nics. > > Do I need any special configuration (beside options lnet networks=o2ib0) > if I add additional fdr clients and/or servers? > > Was the configuration probed? And does it make a difference if I would > start > with fdr servers and clients and add qdr servers and clients or the other > way > around? > > Thanks in advance > Harald > > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] 1 MDS and 1 OSS
Amjad, You might ask your vendor to propose a single MDT comprised of (8 * 500GB) 2.5" disk drives or, better, SSDs. With some bio applications you would benefit from spreading the MDT I/O across more drives. How many clients do you expect to mount the file system? A standard filer (or ZFS/NFS server) will perform well compared to Lustre until you bottleneck somewhere in the server hardware (net, disk, cpu, etc.). With Lustre you can simply add one or more OSS/OSTs to the file system, and the performance potential increases with the number of additional OSS/OST servers. High-availability is nice to have but it isn't necessary unless your environment cannot tolerate any interruption or downtime. If your vendor proposes quality hardware these cases are infrequent. --Jeff On Mon, Oct 30, 2017 at 12:04 PM, Amjad Syed <amjad...@gmail.com> wrote: > The vendor has proposed a single MDT ( 4 * 1.2 TB) in RAID 10 > configuration. > The OST will be RAID 6 and proposed are 2 OST. > > > On Mon, Oct 30, 2017 at 7:55 PM, Ben Evans <bev...@cray.com> wrote: > >> How many OST's are behind that OSS? How many MDT's behind the MDS? >> >> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf >> of Brian Andrus <toomuc...@gmail.com> >> Date: Monday, October 30, 2017 at 12:24 PM >> To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org> >> Subject: Re: [lustre-discuss] 1 MDS and 1 OSS >> >> Hmm. That is an odd one from a quick thought... >> >> However, IF you are planning on growing and adding OSSes/OSTs, this is >> not a bad way to get started and used to how everything works. It is >> basically a single stripe storage. >> >> If you are not planning on growing, I would lean towards gluster on 2 >> boxes. I do that often, actually. A single MDS/OSS has zero redundancy, >> unless something is being done at hardware level and that would help in >> availability. >> NFS is quite viable too, but you would be splitting the available storage >> on 2 boxes. 
>> >> Brian Andrus >> >> >> >> On 10/30/2017 12:47 AM, Amjad Syed wrote: >> >> Hello >> We are in process in procuring one small Lustre filesystem giving us 120 >> TB of storage using Lustre 2.X. >> The vendor has proposed only 1 MDS and 1 OSS as a solution. >> The query we have is that is this configuration enough , or we need more >> OSS? >> The MDS and OSS server are identical with regards to RAM (64 GB) and >> HDD (300GB) >> >> Thanks >> Majid >> >> >> ___ >> lustre-discuss mailing >> listlustre-discuss@lists.lustre.orghttp://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org >> >> >> >> ___ >> lustre-discuss mailing list >> lustre-discuss@lists.lustre.org >> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org >> >> > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] Can Lustre Be Mount to Kubernetes Pod
Forrest, You should be able to define the Lustre mount point on the host system in the Kubernetes pod configuration the same way you would for an NFS mount. I believe you define a directory on the host for the pod to use and that directory could be local, NFS, Lustre, etc. As for persistence, it would be as persistent as your Lustre configuration, network, etc. are stable. --Jeff On Mon, Sep 18, 2017 at 20:03 <forrest.wc.l...@dell.com> wrote: > Hi Lustre experts: > > > > Can a Lustre file system be mounted to Kubernetes Pods as a persistent > volume? > > Thanks, > > Forrest Ling (凌巍才) > > +86 18600622522 <%2B86%2018600622522> > > Dell HPC Product Technologist Greater China > > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
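One way this might look in practice — a hostPath volume pointing at a Lustre mount that already exists on the node; the pod name, image, and paths are all hypothetical:

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: lustre-consumer
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: lustre
      mountPath: /data        # Lustre appears here inside the pod
  volumes:
  - name: lustre
    hostPath:
      path: /mnt/lustre       # host-side Lustre mount point
EOF
```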
Re: [lustre-discuss] Best way to run serverside 2.8 w. MOFED 4.1 on Centos 7.2
John, You can rebuild 2.8 against MOFED. 1) Install MOFED version of choice. 2) Pull down the 2.8 Lustre source and configure with '--with-o2ib=/usr/src/ofa_kernel/default'. 3) `make rpms` 4) Install. 5) Profit. --Jeff On Fri, Aug 18, 2017 at 9:41 AM, john casu <j...@chiraldynamics.com> wrote: > I have an existing 2.8 install that broke when we added MOFED into the mix. > > Nothing I do wrt installing 2.8 rpms works to fix this, and I get a couple > of missing symbole, when I install lustre-modules: > depmod: WARNING: /lib/modules/3.10.0-327.3.1.el > 7_lustre.x86_64/extra/kernel/net/lustre/ko2iblnd.ko needs unknown symbol > ib_query_device > depmod: WARNING: /lib/modules/3.10.0-327.3.1.el > 7_lustre.x86_64/extra/kernel/net/lustre/ko2iblnd.ko needs unknown symbol > ib_alloc_pd > > I'm assuming the issue is that lustre 2.8 is built using the standard > Centos 7.2 infiniband drivers. > > I can't move to Centos 7.3, at this time. Is there any way to get 2.8 up > & running w. mofed without rebuilding lustre rpms? > > If I have to rebuild, it'd probably be easier to go to 2.10 (and zfs > 0.7.1). Is that a correct assumption? > Or will the 2.10 rpms work on Centps 7.2? > > thanks, > -john c > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
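Spelled out as commands — the MOFED installer invocation and tarball name are illustrative; the configure flag is as given above:

```shell
./mlnxofedinstall --add-kernel-support          # 1) MOFED of choice
tar xzf lustre-2.8.0.tar.gz && cd lustre-2.8.0  # 2) Lustre 2.8 source
./configure --with-o2ib=/usr/src/ofa_kernel/default
make rpms                                       # 3) build packages
yum localinstall -y *.rpm                       # 4) install
# 5) profit
```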
Re: [lustre-discuss] Lustre 2.10 on RHEL6.x?
I'm going to be testing an upgrade of a filled 2.9/0.6.5.7/CentOS6.x LFS to 2.10/0.7/CentOS6.9. I will report back results to the mailing list when it is completed. --Jeff On Mon, Aug 7, 2017 at 06:50 E.S. Rosenberg <esr+lus...@mail.hebrew.edu> wrote: > We created a test system that was installed with CentOS 6.x and Lustre 2.8 > filled with some data and subsequently reinstalled with CentOS 7.x and > Lustre 2.9 > > Everything seems to have gone fine but I am actually curious if anyone > else did this pretty invasive upgrade? (Hoping to upgrade in the > not-to-distant future, maybe even directly to 2.10) > > Thanks, > Eli > > On Mon, Aug 7, 2017 at 4:46 PM, Jones, Peter A <peter.a.jo...@intel.com> > wrote: > >> Correct – RHEL 6.x support appeared for the last time in the community >> 2.8 release. However, there has been some interest in seeing some kind of >> support for RHEL 6.x in the 2.10 LTS releases so I think it likely that at >> least support for clients will be reintroduced in a future 2.10.x >> maintenance release. >> >> On 8/7/17, 6:34 AM, "lustre-discuss on behalf of E.S. Rosenberg" < >> lustre-discuss-boun...@lists.lustre.org on behalf of >> esr+lus...@mail.hebrew.edu> wrote: >> >> If I'm not mistaken they haven't provided RPMs for RHEL6.x since 2.9... >> HTH, >> Eli >> >> On Mon, Aug 7, 2017 at 4:33 PM, Steve Barnet <bar...@icecube.wisc.edu> >> wrote: >> >>> Hey all, >>> >>> I am looking to upgrade from lustre 2.8 to 2.10. I see that >>> there are no pre-built RPMs for 2.10 on RHEL6.x families. >>> >>> Did I miss them, or will I need to build from source (or >>> upgrade to Centos 7)? >>> >>> Thanks much! 
>>> >>> Best, >>> >>> ---Steve >>> >>> ___ >>> lustre-discuss mailing list >>> lustre-discuss@lists.lustre.org >>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org >>> >> >> > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] Lustre and OFED
Eli, The biggest driver is usually the drivers. Newer Mellanox hardware is often not yet supported, or not well supported, by in-kernel IB drivers. Way back in the days of old there were some interoperability issues where everything (clients and servers) needed to run the same drivers and libraries, but much of that was cleaned up. There could be situations where OFED is needed on the server side to support something under the Lustre layer like OST or MDT block devices via iSER, SRP, NVMeF, etc. There may be other reasons but those are off the top of my head. --Jeff On Thu, Jul 27, 2017 at 4:55 PM, E.S. Rosenberg <esr+lus...@mail.hebrew.edu> wrote: > Hi all, > > How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every > once in a while and that got me thinking a bit. > > What things are gained by installing OFED? Performance? Accurate traffic > reports? > > Currently I am using a lustre system without OFED but our IB hardware is > from the FDR generation so not bleeding edge and probably doesn't need OFED > because of that > > Thanks, > Eli > > Tech specs: > Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs) > Clients: Debian + kernel 4.2 + Lustre 2.8 > IB: ConnectX-3 FDR > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems
Jessica, You are getting a NID registering twice. Doug noticed and pointed it out. I'd look to see if that is one machine doing something twice or two machines with the same NID. --Jeff On Fri, May 19, 2017 at 05:58 Ms. Megan Larko <dobsonu...@gmail.com> wrote: > Greetings Jessica, > > I'm not sure I am correctly understanding the behavior "robinhood activity > floods the MDT". The robinhood program as you (and I) are using it is > consuming the MDT CHANGELOG via a reader_id which was assigned when the > CHANGELOG was enabled on the MDT. You can check the MDS for these readers > via "lctl get_param mdd.*.changelog_users". Each CHANGELOG reader must > either be consumed by a process or destroyed otherwise the CHANGELOG will > grow until it consumes sufficient space to stop the MDT from functioning > correctly. So robinhood should consume and then clear the CHANGELOG via > this reader_id. This implementation of robinhood is actually a rather > light-weight process as far as the MDS is concerned. The load issues I > encountered were on the robinhood server itself which is a separate server > from the Lustre MGS/MDS server. > > Just curious, have you checked for multiple reader_id's on your MDS for > this Lustre file system? > > P.S. My robinhood configuration file is using nb_threads = 8, just for a > data point. > > Cheers, > megan > > On Thu, May 18, 2017 at 2:36 PM, Jessica Otey <jo...@nrao.edu> wrote: > >> Hi Megan, >> >> Thanks for your input. We use percona, a drop-in replacement for mysql... >> The robinhood activity floods the MDT, but it does not seem to produce any >> excessive load on the robinhood box... >> >> Anyway, FWIW... 
>> >> ~]# mysql --version >> mysql Ver 14.14 Distrib 5.5.54-38.6, for Linux (x86_64) using readline >> 5.1 >> >> Product: robinhood >> Version: 3.0-1 >> Build: 2017-03-13 10:29:26 >> >> Compilation switches: >> Lustre filesystems >> Lustre Version: 2.5 >> Address entries by FID >> MDT Changelogs supported >> >> Database binding: MySQL >> >> RPM: robinhood-lustre-3.0-1.lustre2.5.el6.x86_64 >> Lustre rpms: >> >> lustre-client-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64 >> lustre-client-modules-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64 >> >> On 5/18/17 11:55 AM, Ms. Megan Larko wrote: >> >> With regards to (WRT) Subject "Robinhood exhausting RPC resources against >> 2.5.5 lustre file systems", what version of robinhood and what version of >> MySQL database? I mention this because I have been working with >> robinhood-3.0-0.rc1 and initially MySQL-5.5.32 and Lustre 2.5.42.1 on >> kernel-2.6.32-573 and had issues in which the robinhood server consumed >> more than the total amount of 32 CPU cores on the robinhood server (with >> 128 G RAM) and would functionally hang the robinhood server. The issue >> was solved for me by changing to MySQL-5.6.35. It was the "sort" command >> in robinhood that was not working well with the MySQL-5.5.32. >> >> Cheers, >> megan >> >> >> > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
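A sketch of the reader_id check being discussed — the MDT device and reader names are hypothetical:

```shell
# On the MDS: list registered changelog consumers (cl1, cl2, ...).
lctl get_param mdd.*.changelog_users

# If a stale or duplicate reader is left behind (e.g. from a second
# robinhood instance), deregister it so records can be purged:
lctl --device lustre-MDT0000 changelog_deregister cl2
```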
Re: [lustre-discuss] Lustre 2.9 performance issues
While tuning can alleviate some pain, it shouldn't go without mentioning that there are some operations that are just less than optimal on a parallel file system. I'd bet a cold one that a copy to local /tmp, vim/paste, copy back to the LFS would've been quicker. Some single-threaded small i/o operations can be approached more efficiently in a similar manner. Lustre is a fantastic tool and, like most tools, it doesn't do everything well... *yet* --Jeff On Thu, Apr 27, 2017 at 4:21 PM, Dilger, Andreas <andreas.dil...@intel.com> wrote: > On Apr 25, 2017, at 13:11, Bass, Ned <ba...@llnl.gov> wrote: > > > > Hi Darby, > > > >> -Original Message- > >> > >> for i in $(seq 0 99) ; do > >> dd if=/dev/zero of=dd.dat.$i bs=1k count=1 conv=fsync > /dev/null 2>&1 > >> done > >> > >> The timing of this ranges from 0.1 to 1 sec on our old LFS but ranges > from 20 > >> to 60 sec on our newer 2.9 LFS. > > > > Because Lustre does not yet use the ZFS Intent Log (ZIL), it implements > fsync() by > > waiting for an entire transaction group to get written out. This can > incur long > > delays on a busy filesystem as the transaction groups become quite > large. Work > > on implementing ZIL support is being tracked in LU-4009 but this feature > is not > > expected to make it into the upcoming 2.10 release. > > There is also the patch that was developed in the past to test this: > https://review.whamcloud.com/7761 "LU-4009 osd-zfs: Add tunables to > disable sync" > which allows disabling ZFS to wait for TXG commit for each sync on the > servers. > > That may be an acceptable workaround in the meantime. Essentially, > clients would > _start_ a sync on the server, but would not wait for completion before > returning > to the application. Both the client and the OSS would need to crash > within a few > seconds of the sync for it to be lost. 
> > Cheers, Andreas > -- > Andreas Dilger > Lustre Principal Architect > Intel Corporation > > > > > > > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] design to enable kernel updates
> > this month and last). We are in the process of syncing our existing LFS > > to this new one and I've failed over/rebooted/upgraded the new LFS > servers > > many times now to make sure we can do this in practice when the new LFS > goes > > into production. Its working beautifully. > > > > Many thanks to the lustre developers for their continued efforts. We > have > > been using and have been fans of lustre for quite some time now and it > > just keeps getting better. > > > > -Original Message- > > From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on > behalf of Ben Evans <bev...@cray.com> > > Date: Monday, February 6, 2017 at 2:22 PM > > To: Brian Andrus <toomuc...@gmail.com>, "lustre-discuss@lists.lustre.org" > <lustre-discuss@lists.lustre.org> > > Subject: Re: [lustre-discuss] design to enable kernel updates > > > > It's certainly possible. When I've done that sort of thing, you upgrade > > the OS on all the servers first, boot half of them (the A side) to the > new > > image, all the targets will fail over to the B servers. Once the A side > > is up, reboot the B half to the new OS. Finally, do a failback to the > > "normal" running state. > > > > At least when I've done it, you'll want to do the failovers manually so > > the HA infrastructure doesn't surprise you for any reason. > > > > -Ben > > > > On 2/6/17, 2:54 PM, "lustre-discuss on behalf of Brian Andrus" > > <lustre-discuss-boun...@lists.lustre.org on behalf of > toomuc...@gmail.com> > > wrote: > > > >> All, > >> > >> I have been contemplating how lustre could be configured such that I > >> could update the kernel on each server without downtime. > >> > >> It seems this is _almost_ possible when you have a san system so you > >> have failover for OSTs and MDTs. BUT the MGS/MGT seems to be the > >> problematic one, since rebooting that seems cause downtime that cannot > >> be avoided. 
> >> > >> If you have a system where the disks are physically part of the OSS > >> hardware, you are out of luck. The hypothetical scenario I am using is > >> if someone had a VM that was a qcow image on a lustre mount (basically > >> an active, open file being read/written to continuously). How could > >> lustre be built to ensure anyone on the VM would not notice a kernel > >> upgrade to the underlying lustre servers? > >> > >> > >> Could such a setup be done? It seems that would be a better use case for > >> something like GPFS or Gluster, but being a die-hard lustre enthusiast, > >> I want to at least show it could be done. > >> > >> > >> Thanks in advance, > >> > >> Brian Andrus
Re: [lustre-discuss] LNET Self-test
Without seeing your entire command it is hard to say for sure, but I would make sure your concurrency option is set to 8 for starters. --Jeff Sent from my iPhone > On Feb 5, 2017, at 11:30, Jon Tegner wrote: > > Hi, > > I'm trying to use lnet selftest to evaluate network performance on a test > setup (only two machines). Using e.g., iperf or Netpipe I've managed to > demonstrate the bandwidth of the underlying 10 Gbits/s network (and typically > you reach the expected bandwidth as the packet size increases). > > How can I do the same using lnet selftest (i.e., verifying the bandwidth of > the underlying hardware)? My initial thought was to increase the I/O size, > but it seems the maximum size one can use is "--size=1M". > > Thanks, > > /jon
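For reference, a minimal lst session along the lines Jeff suggests might look like the sketch below. This is untested here; the NIDs, group names, and runtime are placeholders, and the lnet_selftest module must be loaded on every node involved.

```shell
# LNet selftest sketch: 1MB bulk writes at concurrency 8 between two nodes.
# 10.0.0.10@tcp and 10.0.0.20@tcp are placeholder NIDs.
export LST_SESSION=$$
lst new_session bwtest
lst add_group clients 10.0.0.10@tcp
lst add_group servers 10.0.0.20@tcp
lst add_batch bulk
lst add_test --batch bulk --concurrency 8 --from clients --to servers \
    brw write check=simple size=1M
lst run bulk
lst stat clients servers &   # watch throughput while the batch runs
sleep 30
lst stop bulk
lst end_session
```

With size capped at 1M, raising --concurrency is the main knob for keeping enough RPCs in flight to saturate a 10 Gbit/s link.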
Re: [lustre-discuss] lustre won't build anymore on RHEL 7.3
/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20: > > note: expected 'void *' but argument is of type 'int' > > struct rdma_cm_id *rdma_create_id(struct net *net, > > ^ > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2251:9: > > error: too few arguments to function 'rdma_create_id' > > cmid = kiblnd_rdma_create_id(kiblnd_dummy_callback, dev, > > RDMA_PS_TCP, > > ^ > > In file included from > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0, > > from > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42: > > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20: > > note: declared here > > struct rdma_cm_id *rdma_create_id(struct net *net, > > ^ > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c: In > > function 'kiblnd_dev_failover': > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2321:9: > > error: passing argument 1 of 'rdma_create_id' from incompatible pointer > > type [-Werror] > > cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, dev, > RDMA_PS_TCP, > > ^ > > In file included from > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0, > > from > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42: > > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20: > > note: expected 'struct net *' but argument is of type 'int (*)(struct > > rdma_cm_id *, struct rdma_cm_event *)' > > struct rdma_cm_id *rdma_create_id(struct net *net, > > ^ > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2321:9: > > error: passing argument 2 of 'rdma_create_id' from incompatible pointer > > type [-Werror] > > cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, dev, > RDMA_PS_TCP, > > ^ > > In file included from > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0, > > from > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42: > > 
/usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20: > > note: expected 'rdma_cm_event_handler' but argument is of type 'struct > > kib_dev_t *' > > struct rdma_cm_id *rdma_create_id(struct net *net, > > ^ > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2321:9: > > error: passing argument 3 of 'rdma_create_id' makes pointer from integer > > without a cast [-Werror] > > cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, dev, > RDMA_PS_TCP, > > ^ > > In file included from > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0, > > from > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42: > > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20: > > note: expected 'void *' but argument is of type 'int' > > struct rdma_cm_id *rdma_create_id(struct net *net, > > ^ > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2321:9: > > error: too few arguments to function 'rdma_create_id' > > cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, dev, > RDMA_PS_TCP, > > ^ > > In file included from > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0, > > from > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42: > > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20: > > note: declared here > > struct rdma_cm_id *rdma_create_id(struct net *net, > > ^ > > cc1: all warnings being treated as errors > > make[7]: *** > > [/root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.o] Error 1 > > make[6]: *** [/root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd] > Error 2 > > make[5]: *** [/root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds] Error 2 > > make[4]: *** [/root/rpmbuild/BUILD/lustre-2.8.0/lnet] Error 2 > > make[4]: *** Waiting for unfinished jobs > > make[3]: *** [_module_/root/rpmbuild/BUILD/lustre-2.8.0] Error 2 > > make[2]: *** [modules] Error 2 > > make[1]: *** [all-recursive] Error 1 > > make: *** [all] Error 2 > > error: Bad 
exit status from /var/tmp/rpm-tmp.mYkfwi (%build) > > > > > > RPM build errors: > > Bad exit status from /var/tmp/rpm-tmp.mYkfwi (%build) > > > > ___ > > lustre-discuss mailing list > > lustre-discuss@lists.lustre.org > > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > > > > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > -- -- Jeff Johnson Co-Founder Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x1001 f: 858-412-3845 m: 619-204-9061 4170 Morena Boulevard, Suite D - San Diego, CA 92117 High-Performance Computing / Lustre Filesystems / Scale-out Storage ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] poor performance on reading small files
On 8/3/16 10:57 AM, Dilger, Andreas wrote: On Jul 29, 2016, at 03:33, Oliver Mangold <oliver.mang...@emea.nec.com> wrote: On 29.07.2016 04:19, Riccardo Veraldi wrote: I am using lustre on ZFS. While write performance is excellent also on smaller files, I find there is a drop in performance on reading 20KB files. Performance can go as low as 200MB/sec or even less. Getting 200 MB/s with 20kB files means you have to do roughly 10,000 metadata ops/s. Don't want to say it is impossible to get more than that, but at least with MDT on ZFS this doesn't sound bad either. Did you run an mdtest on your system? Maybe some serious tuning of MD performance is in order. I'd agree with Oliver that getting 200MB/s with 20KB files is not too bad. Are you using HDDs or SSDs for the MDT and OST devices? If using HDDs, are you using SSD L2ARC to allow the metadata and file data to be cached in L2ARC, and allowing enough time for L2ARC to be warmed up? Are you using TCP or IB networking? If using TCP then there is a lower limit on the number of RPCs that can be handled compared to IB. Cheers, Andreas Also consider the LNet side: at 20KB of data per RPC (assuming a 1MB RPC), moving 20KB files at 200MB/sec into a non-striped LFS directory means a very high RPC rate. Are you using EDR for lnet? 100Gb Ethernet? --Jeff
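Oliver's arithmetic can be checked directly; the sketch below is pure arithmetic (no Lustre needed) using the numbers quoted in the thread.

```shell
# 200 MB/s of 20KB files implies on the order of 10,000 file opens/reads
# per second, each of which costs at least one metadata operation.
throughput_mb=200   # aggregate read throughput, MB/s (from the thread)
file_kb=20          # average file size, KB (from the thread)
ops=$(( throughput_mb * 1024 / file_kb ))
echo "files (and hence metadata ops) per second: $ops"
```

That rate, not raw bandwidth, is usually the bottleneck for small-file workloads, which is why mdtest is the right benchmark here.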
Re: [lustre-discuss] [HPDD-discuss] Lustre Server Sizing
Indivar, Since your CIFS or NFS gateways operate as Lustre clients there can be issues with running multiple NFS or CIFS gateway machines frontending the same Lustre filesystem. As Lustre clients there are no issues in terms of file locking but the NFS and CIFS caching and multi-client file access mechanics don't interface with Lustre's file locking mechanics. Perhaps that may have changed recently and a developer on the list may comment on developments there. So while you could provide client access through multiple NFS or CIFS gateway machines there would not be much in the way of file locking protection. There is a way to configure pCIFS with CTDB and get close to what you envision with Samba. I did that configuration once as a proof of concept (no valuable data). It is a *very* complex configuration and based on the state of software when I did it I wouldn't say it was a production grade environment. As I said before, my understanding may be a year out of date and someone else could speak to the current state of things. Hopefully that would be a better story. --Jeff On Tue, Jul 21, 2015 at 10:26 AM, Indivar Nair indivar.n...@techterra.in wrote: Hi Scott, The 3 - SAN Storages with 240 disks each has its own 3 NAS Headers (NAS Appliances). However, even with 240 10K RPM disk and RAID50, it is only providing around 1.2 - 1.4GB/s per NAS Header. There is no clustered file system, and each NAS Header has its own file-system. It uses some custom mechanism to present the 3 file systems as single name space. But the directories have to be manually spread across for load-balancing. As you can guess, this doesn't work most of the time. Many a times, most of the compute nodes access a single NAS Header, overloading it. The customer wants *at least* 9GB/s throughput from a single file-system. But I think, if we architect the Lustre Storage correctly, with these many disks, we should get at least 18GB/s throughput, if not more. 
Regards, Indivar Nair On Tue, Jul 21, 2015 at 10:15 PM, Scott Nolin scott.no...@ssec.wisc.edu wrote: An important question is what performance do they have now, and what do they expect if converting it to Lustre. Our more basically, what are they looking for in general in changing? The performance requirements may help drive your OSS numbers for example, or interconnect, and all kinds of stuff. Also I don't have a lot of experience with NFS/CIFS gateways, but that is perhaps it's own topic and may need some close attention. Scott On 7/21/2015 10:57 AM, Indivar Nair wrote: Hi ..., One of our customers has a 3 x 240 Disk SAN Storage Array and would like to convert it to Lustre. They have around 150 Workstations and around 200 Compute (Render) nodes. The File Sizes they generally work with are - 1 to 1.5 million files (images) of 10-20MB in size. And a few thousand files of 500-1000MB in size. Almost 50% of the infra is on MS Windows or Apple MACs I was thinking of the following configuration - 1 MDS 1 Failover MDS 3 OSS (failover to each other) 3 NFS+CIFS Gateway Servers FDR Infiniband backend network (to connect the Gateways to Lustre) Each Gateway Server will have 8 x 10GbE Frontend Network (connecting the clients) *Option A* 10+10 Disk RAID60 Array with 64KB Chunk Size i.e. 1MB Stripe Width 720 Disks / (10+10) = 36 Arrays. 12 OSTs per OSS 18 OSTs per OSS in case of Failover *Option B* 10+10+10+10 Disk RAID60 Array with 128KB Chunk Size i.e. 4MB Stripe Width 720 Disks / (10+10+10+10) = 18 Arrays 6 OSTs per OSS 9 OSTs per OSS in case of Failover 4MB RPC and I/O *Questions* 1. Would it be better to let Lustre do most of the striping / file distribution (as in Option A) OR would it be better to let the RAID Controllers do it (as in Option B) 2. Will Option B allow us to have lesser CPU/RAM than Option A? 
Regards, Indivar Nair
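The disk math behind the two options can be sketched as follows. Numbers are taken from the thread; "RAID60" is read as two or four 10-disk RAID6 legs, i.e. 8 data disks per leg, which is an assumption consistent with the quoted chunk sizes and stripe widths.

```shell
disks=720; oss=3
# Option A: 10+10 RAID60 per OST, 64KB chunk
a_arrays=$(( disks / 20 ))        # arrays, one OST each
a_per_oss=$(( a_arrays / oss ))   # OSTs per OSS (doubles on failover)
a_stripe_kb=$(( 2 * 8 * 64 ))     # full-stripe width: 16 data disks x 64KB
# Option B: 10+10+10+10 RAID60 per OST, 128KB chunk
b_arrays=$(( disks / 40 ))
b_per_oss=$(( b_arrays / oss ))
b_stripe_kb=$(( 4 * 8 * 128 ))    # 32 data disks x 128KB
echo "A: $a_arrays OSTs, $a_per_oss per OSS, ${a_stripe_kb}KB full stripe"
echo "B: $b_arrays OSTs, $b_per_oss per OSS, ${b_stripe_kb}KB full stripe"
```

Both land on RAID-aligned OST writes (1MB for A, 4MB with 4MB RPCs for B); the trade-off is more, smaller OSTs versus fewer, wider ones.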
Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA
Why choose? Why not install a lnet router QDR-10GbE or dual home your MDS OSS nodes with QDR and a 10GbE nic? --Jeff On Fri, Jun 19, 2015 at 9:10 AM, INKozin i.n.ko...@googlemail.com wrote: I know that QDR IB gives the best bang for buck currently and that's what we have now. However due to various reasons we are looking at alternatives hence the question. Thank you very much for your information, Ben. On 19 June 2015 at 16:24, Ben Evans bev...@cray.com wrote: It’s faster in that you eliminate all the TCP overhead and latency. (something on the order of 20% improvement in speed, IIRC, it’s been several years) Balancing your network performance with what your disks can provide is a whole other level of system design and implementation. You can stack enough disks or SSDs behind a server so that the network is your bottleneck, you can stack up enough network to few enough disks so that the drives are your bottleneck. You can stack up enough of both so that the PCIE bus is your bottleneck. Take the time and compare costs/performance to Infiniband, since most systems have a dedicated client/server network, you might as well go as fast as you can. -Ben Evans *From:* igk...@gmail.com [mailto:igk...@gmail.com] *On Behalf Of *INKozin *Sent:* Friday, June 19, 2015 11:10 AM *To:* Ben Evans *Cc:* lustre-discuss@lists.lustre.org *Subject:* Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA Ben, is it possible to quantify faster? Understandably, for a single client on an empty cluster it may feel faster but on a busy cluster with many reads and writes in flight I'd have thought the limiting factor is the back end's throughput rather than the network, no? As long as the bandwidth to a client is somewhat higher than the average i/o bandwidth (back end's throughput divided by the number of clients) the client should be content. 
On 19 June 2015 at 14:46, Ben Evans bev...@cray.com wrote: It is faster, but I don't know what the price/performance tradeoff is, as I only used it as an engineer. As an alternative, take a look at RoCE, it does much the same thing but uses normal (?) hardware. It's still pretty new, though, so you might have some speedbumps. -Ben Evans *From:* lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] *On Behalf Of *INKozin *Sent:* Friday, June 19, 2015 5:43 AM *To:* lustre-discuss@lists.lustre.org *Subject:* [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA My question is about performance advantages of Lustre RDMA over 10 Gb Ethernet. When using 10 Gb Ethernet to build Lustre, is it worth paying the premium for iWARP? I understand that iWARP essentially reduces latency but am less sure of its specific implications for storage. Would it improve performance on small files? Any pointers to representative benchmarks would be much appreciated. Chelsio has released a white paper in which they compare Lustre RDMA over 40 Gb Ethernet and FDR IB http://www.chelsio.com/wp-content/uploads/resources/Lustre-Over-iWARP-vs-IB-FDR.pdf where they claim comparable performance of both. How much worse would the throughput on small block sizes be without iWARP? Thank you Igor
Re: [lustre-discuss] problem getting high performance output to single file
David, What interconnect are you using for Lustre? ( IB/o2ib [fdr,qdr,ddr], Ethernet/tcp [40GbE,10GbE,1GbE] ). You can run 'lctl list_nids' and see what protocol lnet is binding to, then look at that interface for the specific type. Also, do you know anything about the server side of your Lustre FS? What make/model of block devices are used in OSTs? --Jeff On 5/19/15 9:05 AM, Schneider, David A. wrote: Thanks, for the client, where I am running from, I have $ cat /proc/fs/lustre/version lustre: 2.1.6 kernel: patchless_client build: jenkins--PRISTINE-2.6.18-348.4.1.el5 best, David Schneider From: Patrick Farrell [p...@cray.com] Sent: Tuesday, May 19, 2015 9:03 AM To: Schneider, David A.; John Bauer; lustre-discuss@lists.lustre.org Subject: Re: [lustre-discuss] problem getting high performance output to single file For the clients, cat /proc/fs/lustre/version For the servers, it's the same, but presumably you don't have access. On 5/19/15, 11:01 AM, Schneider, David A. david...@slac.stanford.edu wrote: Hi, My first test was just to do the for loop where I allocate a 4MB buffer, initialize it, and delete it. That program ran at about 6GB/sec. Once I write to a file, I drop down to 370MB/sec. Our top performance for I/O to one file has been about 400MB/sec. For this question: Which versions are you using in servers and clients? I don't know what command to determine this, I suspect it is older since we are on red hat 5. I will ask. best, David Schneider From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of John Bauer [bau...@iodoctors.com] Sent: Tuesday, May 19, 2015 8:52 AM To: lustre-discuss@lists.lustre.org Subject: Re: [lustre-discuss] problem getting high performance output to single file David You note that you write a 6GB file. I suspect that your Linux systems have significantly more memory than 6GB meaning your file will end up being cached in the system buffers.
It won't matter how many OSTs you use as you probably are not measuring the speed to the OSTs, but rather, you are measuring the memory copy speed. What transfer rate are you seeing? John On 5/19/2015 10:40 AM, Schneider, David A. wrote: I am trying to get good performance with parallel writing to one file through MPI. Our cluster has high performance when I write to separate files, but when I use one file - I see very little performance increase. As I understand, our cluster defaults to use one OST per file. There are many OSTs though, which is how we get good performance when writing to multiple files. I have been using the command lfs setstripe to change the stripe count and block size. I can see that this works, when I do lfs getstripe, I see the output file is striped, but I'm getting very little I/O performance when I create the striped file. When working from hdf5 and mpi, I have seen a number of references about tuning parameters, I haven't dug into this yet. I first want to make sure lustre has the high output performance at a basic level. I tried to write a C program that uses simple POSIX calls (open and looping over writes) but I don't see much increase in performance (I've tried 8 and 19 OSTs, 1MB and 4MB chunks, I write a 6GB file). Does anyone know if this should work? What is the simplest C program I could write to see an increase in output performance after I stripe? Do I need separate processes/threads with separate file handles? I am on linux red hat 5. I'm not sure what version of lustre this is. I have skimmed through a 450 page pdf of lustre documentation, I saw references to destructive testing one does in the beginning, but I'm not sure what I can do now. I think this is the first work we've done to get high performance when writing a single file, so I'm worried there is something buried in the lustre configuration that needs to be changed. I can run /usr/sbin/lctl, maybe there are certain parameters I should check?
best, David Schneider -- I/O Doctors, LLC 507-766-0378 bau...@iodoctors.com
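Following John's point about buffer caching, one way to take MPI/HDF5 and the page cache out of the picture is to time a direct write to a pre-striped file. The sketch below is hypothetical (paths and counts are placeholders); note that on older clients such as 2.1.x the stripe-size flag is lowercase -s rather than -S.

```shell
# Create an empty file striped across 8 OSTs with a 4MB stripe size,
# then write 6GB with O_DIRECT so the client page cache is bypassed
# and dd's reported rate reflects the OSTs, not memory copies.
lfs setstripe -c 8 -S 4M /mnt/lustre/stripetest
dd if=/dev/zero of=/mnt/lustre/stripetest bs=4M count=1536 \
   oflag=direct conv=notrunc
lfs getstripe /mnt/lustre/stripetest   # confirm the layout stuck
```

If this single-writer test scales with stripe count but the MPI job does not, the bottleneck is in the application I/O pattern or tuning rather than the filesystem layout.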
Re: [Lustre-discuss] only root can read/write on new lustre filesystem (Lustre 2.6)
Does your MDS node have access to your non-privileged users/groups identity data by way of methods like ldap or local files (/etc/passwd, /etc/group, etc)? Your clients and MDS need to be on the same sheet of music. --Jeff On Wed, Feb 4, 2015 at 5:19 PM, No One jc.listm...@gmail.com wrote: I'm sure I've overlooked something in the documentation, but for the life of me, I can't figure out what. I've set up a new cluster running 2.6 and everything seemed to go fine. I've got the filesystem mounted on a few clients and as long as I am root, I can read and write to it just fine. If I switch to another user, I get something like this: -bash-4.1$ ls -al ls: cannot access test: Permission denied total 8 drwxr-xr-x 4 root root 4096 Feb 5 00:58 . drwxr-xr-x 3 root root 4096 Feb 4 23:05 .. d? ? ?? ?? test if I am root though, it looks fine: [cvt]# ls -al total 12 drwxr-xr-x 4 root root 4096 Feb 5 00:58 . drwxr-xr-x 3 root root 4096 Feb 4 23:05 .. drwxr-xr-x 2 test test 4096 Feb 5 01:13 test No amount of chmod'ing or chown'ing has worked to resolve this. I know I've seen this before and I feel like it was in the context of NFS, but I'm not finding it. I could use any help/advice to figure this out. Thanks!
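A quick way to verify that the MDS and a client agree on an identity is to resolve it on both nodes and compare. "root" below is only a universally resolvable example; substitute the affected non-privileged user.

```shell
# Run this on both the MDS and a client; the uid:gid output must match.
# If getent prints nothing for the user on the MDS, that is the problem.
user=root    # placeholder: substitute the user that gets "Permission denied"
uid_gid=$(getent passwd "$user" | cut -d: -f3,4)
echo "$user resolves to uid:gid $uid_gid on $(hostname)"
```

getent consults whatever NSS sources (files, ldap, sss) the node is configured with, so a mismatch or miss here points directly at the identity configuration Jeff describes.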
Re: [Lustre-discuss] Lustre 2.5.2 Client Errors
Murshid, Does the error message actually contain the text "Tell Peter, lookup on mtpt, it open"? If so, it is one of the funnier Lustre error messages to be sure. --Jeff On Fri, Aug 22, 2014 at 12:28 AM, Murshid Azman murshid.az...@gmail.com wrote: Hello Everyone, We're trying to run a cluster image on Lustre filesystem version 2.5.2 and repeatedly seeing the following message. Haven't seen anything bizarre on this machine other than this: 2014-08-22T13:52:01+07:00 node01 kernel: LustreError: 4271:0:(namei.c:530:ll_lookup_it()) Tell Peter, lookup on mtpt, it open 2014-08-22T13:52:01+07:00 node01 kernel: LustreError: 4271:0:(namei.c:530:ll_lookup_it()) Skipped 128 previous similar messages This doesn't happen to our desktop Lustre clients. I'm wondering if anyone has any idea what this means. Thanks, Murshid Azman.
Re: [Lustre-discuss] OST Failover Configuration (Active/Active) verification
Peter, You have it about half right. Since you are dealing with active filesystems on the OST shared storage devices and you can never truly predict the type of failure a system will have, you need to add a software command/control layer that will manage failover, remounting storage and cutting power to the failing node. Failure to do that could result in a split-brain situation where your OST backend filesystem gets corrupted. You have the mkfs.lustre part right. You need to add heartbeat/corosync to the configuration and configure it so the two systems monitor each other with a watchdog heartbeat. A failing machine gets sensed by the healthy machine and the healthy machine shoots it (STONITH: shoot the other node in the head) via ipmi power control or a smart rack PDU like an APC ethernet-managed PDU. The heartbeat/corosync config takes your existing config and adds automated directives like: node1 mounts sdb, node2 mounts sdc; if node1 dies, node2 mounts sdb; if node2 dies, node1 mounts sdc; if the surviving node senses restoration of heartbeat, locally defined storage gets remounted to its owning node. Intel's IEEL Lustre distribution does all of this sort of thing automagically. Or you can manually install Lustre and the corosync app packages and configure it manually. --Jeff On Thu, Jan 30, 2014 at 9:21 PM, Peter Mistich peter.mist...@rackspace.com wrote: hello, can anyone here answer a question about OST Failover Configuration (Active/Active)? I think I understand but want to make sure. I configure 2 oss servernames = node1 and node2 with 2 shared drives /dev/sdb and /dev/sdc and on node1 I run the command on node1 mkfs.lustre --fsname=testfs --ost --failnode=node2 --mgsnode=msg /dev/sdb I run the command on node2 mkfs.lustre --fsname=testfs --ost --failnode=node1 --mgsnode=msg /dev/sdc I mount /dev/sdb on node 1 and mount /dev/sdc on node2 if node1 fails then I just mount /dev/sdb on node2 and that is how active/active works is this correct?
Thanks, Pete
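The automated directives Jeff describes can be sketched on the Pacemaker side roughly as below. This is untested; resource names and mountpoints are placeholders, and the STONITH device configuration (ipmi or PDU fencing agent) is omitted.

```shell
# Each OST becomes a Filesystem resource that prefers its "home" node
# but is allowed to run on the peer; fencing must stay enabled so a
# failed node is powered off before its OST is remounted elsewhere.
pcs resource create ost-sdb ocf:heartbeat:Filesystem \
    device=/dev/sdb directory=/mnt/ost0 fstype=lustre
pcs resource create ost-sdc ocf:heartbeat:Filesystem \
    device=/dev/sdc directory=/mnt/ost1 fstype=lustre
pcs constraint location ost-sdb prefers node1=100
pcs constraint location ost-sdc prefers node2=100
pcs property set stonith-enabled=true
```

The location scores give active/active behavior in normal operation, while fencing prevents the split-brain corruption scenario described above.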
Re: [Lustre-discuss] lustre 1.8.5 client failed to mount lustre
Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4
Hola Eduardo, How are the OSTs connected to the OSS (SAS, FC, Infiniband SRP)? Are there any non-Lustre errors in the dmesg output of the OSS? Block device errors on the OSS (/dev/sd?)? If you are losing [scsi,sas,fc,srp] connectivity you may see this sort of thing. If the OSTs are connected to the OSS node via IB SRP and your IB fabric gets busy or you have subnet manager issues you might see a condition like this. Is this the AliceFS at DGTIC? --Jeff On 10/17/13 3:52 PM, Eduardo Murrieta wrote: Hello, this is my first post on this list, I hope someone can give me some advice on how to resolve the following issue. I'm using the lustre release 2.4.0 RC2 compiled from whamcloud sources, this is an upgrade from lustre 2.2.22 from same sources. The situation is: There are several clients reading files that belong mostly to the same OST, after a period of time the clients start losing contact with this OST and processes stop due to this fault, here is the state for such OST on one client: client# lfs check servers ... ... lustre-OST000a-osc-8801bc548000: check error: Resource temporarily unavailable ... ... checking dmesg on client and OSS server we have: client# dmesg LustreError: 11-0: lustre-OST000a-osc-8801bc548000: Communicating with 10.2.2.3@o2ib, operation ost_connect failed with -16. LustreError: Skipped 24 previous similar messages OSS-server# dmesg Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 (at 10.2.64.4@o2ib) reconnecting Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 (at 10.2.64.4@o2ib) refused reconnection, still busy with 9 active RPCs At this moment I can ping from client to server and vice versa, but sometimes this call also hangs on server and client.
client# lctl ping OSS-server@o2ib 12345-0@lo 12345-OSS-server@o2ib OSS-server# lctl ping 10.2.64.4@o2ib 12345-0@lo 12345-10.2.64.4@o2ib This situation happens very frequently, especially with jobs that process a lot of files of an average size of 100MB. The only solution that I have found to reestablish the communication between the server and the client is restarting both machines. I hope someone has an idea what the reason for the problem is and how I can reset the communication with the clients without restarting the machines. thank you, Eduardo UNAM@Mexico -- Eduardo Murrieta Unidad de Cómputo Instituto de Ciencias Nucleares, UNAM Ph. +52-55-5622-4739 ext. 5103
Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4
Ah, I understand. I performed the onsite Lustre installation of Alice and worked with JLG and his staff. Nice group of people!

This seems like a backend issue: ldiskfs or the LSI RAID devices. Do you see any read/write failures reported on the OSS for the sd block devices where the OSTs reside? Something is timing out; either disk I/O, or the OSS is running too high an iowait under load. How many OSS nodes are in the filesystem? Are these operations striped across all OSTs? Across multiple OSSs?

I still have an account on DGTIC's gateway, I could login and look. :-)

--Jeff

On Thursday, October 17, 2013, Eduardo Murrieta wrote:

Hello Jeff,

No, this is a Lustre filesystem for the Instituto de Ciencias Nucleares at UNAM. We are working on the installation for Alice at DGTIC too, but this problem is with our local filesystem.

The OSTs are connected using an LSI SAS controller; we have 8 OSTs on the same server. There are nodes that lose connection with all the OSTs that belong to this server, but the problem is not related to the OST-OSS communication, since I can access these OSTs and read files stored there from other Lustre clients. The problem is a deadlock condition in which the OSS and some clients refuse connections from each other, as I can see from dmesg:

in the client

    LustreError: 11-0: lustre-OST000a-osc-8801bc548000: Communicating with 10.2.2.3@o2ib, operation ost_connect failed with -16.

in the server

    Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 (at 10.2.64.4@o2ib) reconnecting
    Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 (at 10.2.64.4@o2ib) refused reconnection, still busy with 9 active RPCs

This only happens with clients that are reading a lot of small files (~100MB each) on the same OST.

thank you, Eduardo
--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001  f: 858-412-3845  m: 619-204-9061
4170 Morena Boulevard, Suite D - San Diego, CA 92117
High-performance Computing / Lustre Filesystems / Scale-out Storage
Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4
Eduardo,

One or two E5506 CPUs in the OSS? What is the specific LSI controller, and how many of them are in the OSS? I think the OSS is underprovisioned for 8 OSTs. I'm betting you run a high iowait on those sd devices during your problematic run, and the iowait probably grows until deadlock. Can you run the job while watching top in a shell on the OSS? You're likely hitting 99% iowait.

--Jeff

On Thursday, October 17, 2013, Eduardo Murrieta wrote:

I have this in the debug file from my OSS:

    0010:02000400:0.0:1382055634.785734:0:3099:0:(ost_handler.c:940:ost_brw_read()) lustre-OST: Bulk IO read error with 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 (at 10.2.64.4@o2ib), client will retry: rc -107
    0400:02000400:0.0:1382055634.786061:0:3099:0:(watchdog.c:411:lcw_update_time()) Service thread pid 3099 completed after 227.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).

But I can read files stored on this OST from other clients without problems. For example:

    $ lfs find --obd lustre-OST .
    ./src/BLAS/srot.f
    ...
    $ more ./src/BLAS/srot.f
          SUBROUTINE SROT(N,SX,INCX,SY,INCY,C,S)
    *     .. Scalar Arguments ..
          REAL C,S
          INTEGER INCX,INCY,N
    *     ..
    *     .. Array Arguments ..
          REAL SX(*),SY(*)
    ...

This OSS has 8 OSTs of 14 TB each, with 12 GB of RAM and a quad-core Xeon E5506. Tomorrow I'll increase the memory, in case that is the missing resource.

2013/10/17 Joseph Landman land...@scalableinformatics.com

Are there device or filesystem level error messages on the server? This almost looks like a corrupted file system.

Please pardon brevity and typos ... Sent from my iPhone

On Oct 17, 2013, at 6:11 PM, Eduardo Murrieta emurri...@nucleares.unam.mx wrote:

Hello Jeff,

No, this is a Lustre filesystem for the Instituto de Ciencias Nucleares at UNAM. We are working on the installation for Alice at DGTIC too, but this problem is with our local filesystem.
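Jeff's suggestion to watch top can be made less manual. A hedged sketch for capturing iowait on the OSS during the problematic run; it assumes the sysstat package is installed, and the interval/count values are arbitrary:

```shell
# System-wide %iowait sampled every 5 seconds for a minute; a column
# climbing toward 99% while OST service threads report 200s+ completion
# times supports the underprovisioning theory.
sar -u 5 12

# Per-disk view: await (ms per request) and %util for every block
# device, so the overloaded OST volumes can be identified directly.
iostat -x 5 12
```

Correlating a sar/iostat capture with the timestamps in the Lustre debug log (the 1382055634.* fields above are seconds since the epoch) shows whether the watchdog warnings line up with iowait spikes.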
Re: [Lustre-discuss] ldiskfs for MDT and zfs for OSTs?
On 10/7/13 11:23 AM, Anjana Kar wrote:

Here is the exact command used to create a raidz2 pool with 8+2 drives, followed by the error messages:

    mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs --index=0 --mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2 /dev/sda /dev/sdc /dev/sde /dev/sdg /dev/sdi /dev/sdk /dev/sdm /dev/sdo /dev/sdq /dev/sds

An additional suggestion: you should make zfs/zpools with persistent device names like /dev/disk/by-path or /dev/disk/by-id. Standard 'sd' device names are not persistent and can change after a reboot or hardware change, which would be bad for a zpool with data.

Also, I don't know if it's just email formatting, but be sure that command is all on one line:

    mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs --index=0 \
      --mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2 /dev/sda /dev/sdc \
      /dev/sde /dev/sdg /dev/sdi /dev/sdk /dev/sdm /dev/sdo /dev/sdq /dev/sds

--Jeff

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001  f: 858-412-3845  m: 619-204-9061
4170 Morena Boulevard, Suite D - San Diego, CA 92117
High-performance Computing / Lustre Filesystems / Scale-out Storage
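Jeff's persistent-name advice might look like the following in practice. A sketch only: the wwn-style identifiers are invented placeholders, and the real links must be read from ls -l /dev/disk/by-id on the OSS in question:

```shell
# Map the current sd names to their stable by-id links (filter out
# partition entries so only whole-disk links remain).
ls -l /dev/disk/by-id/ | grep -v -- -part

# The same mkfs.lustre invocation, but with device names that survive
# reboots and controller reordering. IDs below are placeholders; a real
# 8+2 raidz2 would list ten of them.
mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs --index=0 \
  --mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2 \
  /dev/disk/by-id/wwn-0x5000c5000000000a \
  /dev/disk/by-id/wwn-0x5000c5000000000b
```

An existing pool built on sd names can also be re-imported by id (`zpool import -d /dev/disk/by-id`), which avoids reformatting.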
[Lustre-discuss] Known working 1.8.x client mounting 2.x server combinations?
Greetings,

Is there a table of known-stable client/server combinations for a 1.8.x client mounting a 2.x LFS? I'm assisting a group trying to mount two LFSs; one is a 1.8.7 LFS and the other is a 2.1.6 LFS. The client access is over TCP (10GbE).

I installed the latest 1.8.9 lustre-client for CentOS 6 on a test client. I am able to mount the 1.8.7 LFS with no problem. If I umount the 1.8.7 LFS and mount the 2.1.6 LFS, the client machine deadlocks instantly. No error messages; after 2-3 minutes the workstation does a hardware reset. No log entries.

The ultimate goal is being able to mount both the 1.8.7 and 2.1.6 LFSs simultaneously from the clients over TCP. I understand that newer server code is friendlier to older client code, but how new? The 2.1.x tree, or do I have to be in the 2.3 tree to get this ability?

Thanks..

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845  m: 619-204-9061
4170 Morena Boulevard, Suite D - San Diego, CA 92117
Re: [Lustre-discuss] Completely lost MGT/MDT
I am not aware of any tool or method to recover from a lost MGT/MDT. Do you have any recent backups of your MDT device? I would hold on to your MDT device with care and see if someone can help you resurrect it.

--Jeff

On 6/26/13 3:01 PM, Andrus, Brian Contractor wrote:

All,

We have a sizeable filesystem, and during a hardware upgrade our MDT disk was completely lost. I am trying to find out if and how to recover from such an event, but am not finding anything. We were running Lustre 2.3 and have upgraded to 2.4 (or are in the process of it).

Can anyone point me in the right direction here?

Thanks in advance,
Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845  m: 619-204-9061
/* New Address */ 4170 Morena Boulevard, Suite D - San Diego, CA 92117
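For anyone reading this thread later: the usual insurance against exactly this event is a periodic file-level MDT backup. A hedged sketch of that procedure; the device path, mountpoint, and output filenames are illustrative, and the Lustre manual's MDT backup section is the authoritative reference:

```shell
# With the target not mounted as Lustre, mount the MDT as plain ldiskfs.
mkdir -p /mnt/mdt_snap
mount -t ldiskfs /dev/mdtdev /mnt/mdt_snap
cd /mnt/mdt_snap

# Lustre metadata lives in extended attributes, so save the EAs
# explicitly as well as archiving with --xattrs; a tar without xattr
# support does not produce a restorable MDT backup.
getfattr -R -d -m '.*' -e hex -P . > /root/mdt_ea.bak
tar czf /root/mdt_backup.tgz --xattrs --sparse .

cd / && umount /mnt/mdt_snap
```

Run from cron against an LVM snapshot of the MDT device, this costs little and turns "completely lost MDT" into a restore instead of a disaster.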
Re: [Lustre-discuss] Lustre 2.2 with centos 6.3 gives problem while loading o2ib module for infiniband
Faheem,

Could you reply with some error messages or details on the failure? I do not have a crystal ball. Details of the error, your lnet kernel module options, and the state of the IB interface when the error occurs would all be helpful.

--Jeff

On 3/26/13 3:00 AM, faheem patel wrote:

Dear All,

We are facing a problem while connecting the o2ib module. Lustre 2.2 with CentOS 6.3 gives a problem while loading the o2ib module for InfiniBand.

Thanks in advance.
Regards, Faheem Patel

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845
4170 Morena Boulevard, Suite D - San Diego, CA 92117
[Lustre-discuss] Job: Lustre Development Engineer, Aeon Computing, Inc - San Diego, CA USA
Apologies to the list for posting this Lustre job opportunity. No disrespect meant.

Lustre Development Engineer, Aeon Computing, Inc - San Diego, CA USA

Position Summary
The primary role will be to support and enhance a single large Lustre filesystem installation.

Primary Duties / Responsibilities
- Analyze, design, program, debug, and modify Lustre code
- Identify and resolve Lustre filesystem and LNET bugs
- Respond to support requests by analyzing issues and creating code fixes or providing workarounds
- Develop site-customized features
- Assist the Intel HPDD Lustre development team with developing Chroma functionality
- Advise and assist in planning adoption of future releases of Lustre, including Lustre on ZFS
- Engage with the Intel HPDD Lustre development team and the Lustre community

Qualifications (Knowledge, Skills, Abilities)
- Linux kernel development
- Deep acumen with high-performance storage systems
- Proactive and solution-oriented problem solver
- Prior project and/or team leadership experience is not required but would be considered an asset
- Strong verbal and written English communication skills
- Ability to work well in a distributed team environment
- High level of attention to detail and comfortable multi-tasking

Requirements (Education, Certification, Training, and Experience)
- Master's degree in computer science and 3 years of relevant experience; or Bachelor's degree in computer science or a closely related field and 5 years of relevant experience; or equivalent work experience
- Development experience, preferably with Lustre, LNET, Ethernet, and TCP/IP
- Relocation to San Diego, CA

Physical Demands / Work Environment
- Ability and desire to work as part of a geographically-distributed team

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845  m: 619-204-9061
/* New Address */ 4170 Morena Boulevard, Suite D - San Diego, CA 92117
Re: [Lustre-discuss] Adding iSCSI SAN to Lustre
Jon,

Any block device, be it disk, RAID, IB SRP, or iSCSI, requires an OSS node to front-end the block storage. If the iSCSI storage device is a storage server with a software iSCSI layer, you could potentially strip out the iSCSI software and lay down Linux and Lustre on it directly. Without knowing exactly what iSCSI system it is, my statement about stripping out the software is a generalization.

--Jeff

On 1/28/13 4:13 PM, Jon Yeargers wrote:

I may have an opportunity to repurpose an existing iSCSI SAN device. If I wanted to add it to an existing Lustre setup (or create a new one), my understanding is that I would need a machine to act as the 'communications link' to Lustre from the SAN device. Something has to represent the OSS device, right? An iSCSI storage device would need a representative for this? Does that make any sense?

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845  m: 619-204-9061
/* New Address */ 4170 Morena Boulevard, Suite D - San Diego, CA 92117
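To make the "front-end OSS" idea concrete: a hedged sketch of attaching an iSCSI LUN to a Linux node and formatting it as an OST. The target IQN, portal address, filesystem name, device path, and mountpoint are all placeholders, not details from Jon's SAN:

```shell
# Discover and log in to the iSCSI target from the would-be OSS
# (open-iscsi's iscsiadm).
iscsiadm -m discovery -t sendtargets -p 192.168.1.50
iscsiadm -m node -T iqn.2013-01.com.example:storage.lun0 \
  -p 192.168.1.50 --login

# The LUN now appears as a local block device (e.g. /dev/sdb), so it
# can be formatted and mounted as an OST like any other disk.
mkfs.lustre --ost --fsname=testfs --index=1 --mgsnode=mgs@tcp0 /dev/sdb
mkdir -p /mnt/ost1
mount -t lustre /dev/sdb /mnt/ost1
```

The OSS is doing real work here (RPC service threads, cache, journaling), so a repurposed SAN still needs a reasonably provisioned server in front of it.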
Re: [Lustre-discuss] is Lustre ready for prime time?
Greg,

I'm echoing Charles' comments a bit. Specific filesystems are not good at everything. While it is my opinion that Lustre can be very stable (and, like Colin stated, the underlying hardware and configuration are crucial to that end), the filesystem may not be the best performer for every data access model. Like every other filesystem, Lustre has use cases where it excels and others where overhead may be less than optimal. Other filesystems and storage devices also suffer from one-size-fits-most. Many here would likely be biased toward Lustre, but many of those people have also used many other options on the market and ended up here.

--Jeff

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845  m: 619-204-9061
4170 Morena Boulevard, Suite D - San Diego, CA 92117

On 1/17/13 9:17 AM, greg whynott wrote:

Hello, I just signed up today, so please forgive me if this question has been covered recently. I'm in a bit of a rush to get an answer on this as we need to make a decision soon; the idea of using Lustre was thrown into the mix very late in the decision-making process.

We are looking to procure a new storage solution which will predominantly be used for HPC output, but will also be used as our main business-centric storage for day-to-day use. Meaning the file system needs to be available 24/7/365. The last time I was involved in considering Lustre was about 6 years ago, and at that time it was being considered for scratch space for HPC usage only. Our VMs and databases would remain on non-Lustre storage, as we already have that in place and it works well. The Lustre file system would potentially have everything else. Projects we work on typically take up to 2 years to complete, and during that time we would want all assets to remain on the file system.

Some of the vendors on our short list include HDS (BlueArc), Isilon, and NetApp. Last week we started bouncing the idea of using Lustre around.
I'd love to use it if it is considered stable enough to do so. Your thoughts and/or comments would be greatly appreciated.

Thanks for your time,
greg
Re: [Lustre-discuss] problem with installing lustre and ofed
Jason,

The prebuilt server-side Lustre packages from Whamcloud are built against RHEL/CentOS kernel sources with kernel-ib active in them. This means that any of the prebuilt Lustre server packages are already tied to RHEL's kernel-ib. To accomplish your stated goal you'll have to start with a non-Whamcloud, stock kernel (plus headers, devel, etc.). Then compile/install the OFED version of your choice. Once you have that, you can build Lustre from source, where it will compile against OFED and the installed kernel.

--Jeff

---
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845
4170 Morena Boulevard, Suite D - San Diego, CA 92117
/* Follow us on Twitter - @AeonComputing */

On 12/28/12 3:54 PM, Jason Brooks wrote:

Hello,

I am having trouble installing the server modules for Lustre 2.1.4 and using Mellanox's OFED distribution so we may use InfiniBand. Would you folks look at my procedure and results below and let me know what you think? Thanks very much!

The Mellanox OFED installation builds and installs some kernel modules too, so I used this method to ensure OFED compiled against the correct kernel. This is on CentOS 6.3.

1. Download all Lustre rpms from Whamcloud.
2. Install kernel, kernel-firmware, kernel-headers, and kernel-devel (in this case, the rpm files with 2.6.32-279.14.1.el6_lustre.x86_64 in their name).
3. Reboot into this Lustre kernel.
4. Install the remaining rpms.
5. Download OFED from Mellanox (MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64.iso); build the Mellanox OFED bits using the Lustre kernel and kernel-devel info, then install Mellanox OFED.
6. Reboot.
7. Upon reboot, if I do NOT have o2ib3 in my lnet networks parameters, I can modprobe lnet and lustre.
8.
If I DO have o2ib3 present in the lnet parameters, running modprobe lustre gets me:

    WARNING: Error inserting fld (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fld.ko): Input/output error
    WARNING: Error inserting fid (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fid.ko): Input/output error
    WARNING: Error inserting mdc (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/mdc.ko): Input/output error
    WARNING: Error inserting osc (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/osc.ko): Input/output error
    WARNING: Error inserting lov (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lov.ko): Input/output error
    FATAL: Error inserting lustre (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lustre.ko): Input/output error

dmesg shows:

    ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
    ko2iblnd: Unknown symbol ib_fmr_pool_unmap
    ko2iblnd: disagrees about version of symbol ib_create_cq
    ko2iblnd: Unknown symbol ib_create_cq
    …
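Jeff's recipe (stock kernel, then Mellanox OFED, then Lustre built from source) looks roughly like this. A sketch with assumed paths: the --with-o2ib location depends on where the MLNX OFED installer puts its kernel sources (commonly /usr/src/ofa_kernel, or a default/ subdirectory in later releases), and the tarball version is just the one from this thread:

```shell
# After booting the stock kernel and installing the Mellanox OFED
# stack, build Lustre so ko2iblnd compiles against the MLNX verbs
# symbols instead of the in-kernel IB stack.
tar xzf lustre-2.1.4.tar.gz
cd lustre-2.1.4
./configure --with-linux=/usr/src/kernels/$(uname -r) \
            --with-o2ib=/usr/src/ofa_kernel
make rpms
# ...then install the resulting lustre-modules/lustre rpms and reboot.
```

The "disagrees about version of symbol" errors above are exactly what this avoids: ko2iblnd built against one IB stack being loaded on top of another.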
Re: [Lustre-discuss] Applications of Lustre - streaming?
On 12/7/12 9:34 AM, Dilger, Andreas wrote:
> I've been using Lustre for years with my home MythTV (Linux PVR) setup.

Nerd. :)

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845  m: 619-204-9061
/* New Address */ 4170 Morena Boulevard, Suite D - San Diego, CA 92117
Re: [Lustre-discuss] lo2iblnd and Mellanox IB question
Megan,

You will have to rebuild Lustre from source. Furthermore, you will have to have the Mellanox IB driver source installed so the Lustre build process can grab the necessary bits from the Mellanox source. The issue you are seeing is exactly what you think it is. The WC builds use the RHEL in-kernel IB driver. I have even had issues with MDS/OSS boxes running RHEL in-kernel IB and clients running Mellanox OFED IB drivers. Even though IB is a standard, you really need to have everything, from core to edge, talking the same driver. I recently did nearly the same config you have: RHEL 6.2 x86_64, MLNX OFED, Lustre 2.1.3. You could opt to run your Mellanox IB HCA using the RHEL in-kernel IB drivers and not have to recompile anything.

--Jeff

On 11/20/12 1:20 PM, Ms. Megan Larko wrote:

Hello to Everyone!

I have a question to which I think I know the answer, but I am seeking confirmation (re-assurance?). I have built a RHEL 6.2 system with lustre-2.1.2. I am using the rpms from the Whamcloud site for Linux kernel 2.6.32_220.17.1.el6_lustre.x86_64 along with the version-matching lustre, lustre-modules, lustre-ldiskfs, and kernel-devel. I also have from the Whamcloud site kernel-ib-1.8.5-2.6.32-220.17.1.el6_lustre.x86_64 and the related kernel-ib-devel for same.

The Lustre file system works properly for TCP. I would like to use InfiniBand. The system has a new Mellanox card for which Mellanox firmware and drivers were installed. After this was done (I cannot speak to before), the IB network will come up on boot, and copy and ping in a traditional network fashion.

Hard part: I would like to run the Lustre file system on the IB (ib0). I re-created the Lustre network to use /etc/modprobe.d/lustre.conf pointing to o2ib in place of tcp0. I rebuilt the mgs/mdt and all osts to use the IB network (the mgs/mds --failnode=[new_IB_addr], and the osts point to the mgs on the IB net).
When I modprobe lustre to start the system, I receive error messages stating that there are Input/output errors on the Lustre modules fld.ko, fid.ko, mdc.ko, osc.ko, and lov.ko. The lustre.ko cannot be started. A look in /var/log/messages reveals many "Unknown symbol" and "Disagrees about version of symbol" messages from the ko2iblnd module. A modprobe --dump-modversions /path/to/kernel/ko2iblnd.ko shows it pointing to the Module.symvers of the Lustre kernel.

Am I correct in thinking that, because of the specific Mellanox IB hardware I have (with its own /usr/src/ofa_kernel/Module.symvers file), I have to build Lustre-2.1.2 from the tarball using configure --with-o2ib=/usr/src/ofa_kernel, mandating that this system use the ofa_kernel-1.8.5 modules and not the OFED 1.8.5 from the kernel-ib rpms to which Lustre defaults in the Linux kernel? Is a rebuild of Lustre from source mandatory, or is there a way in which I may point to the appropriate symbols needed by ko2iblnd.ko?

Enjoy the Thanksgiving holiday for those U.S. readers. To everyone else in the world, have a great weekend!

Megan Larko
Hewlett-Packard

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845  m: 619-204-9061
/* New Address */ 4170 Morena Boulevard, Suite D - San Diego, CA 92117
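The symbol mismatch Megan describes can be confirmed before rebuilding anything, by comparing symbol CRCs. A hedged sketch; the ko2iblnd.ko path is a typical RHEL 6 Lustre location and may differ on a given install:

```shell
# CRC that the Lustre-built ko2iblnd expects for a verbs symbol...
modprobe --dump-modversions \
  /lib/modules/$(uname -r)/updates/kernel/net/lustre/ko2iblnd.ko \
  | grep ib_create_cq

# ...versus the CRC the installed Mellanox IB stack actually exports.
grep ib_create_cq /usr/src/ofa_kernel/Module.symvers

# Differing CRCs mean ko2iblnd was built against a different IB stack,
# and Lustre must be rebuilt with --with-o2ib pointing at this one.
```

This is a quicker diagnosis than waiting for modprobe to fail and then reading the "disagrees about version of symbol" lines out of /var/log/messages.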
Re: [Lustre-discuss] lctl ping of Pacemaker IP
Megan,

LNet pings aren't the same as TCP/IP or UDP pings. An LNet ping ('lctl ping') needs to reach an active LNet instance at the target address. I don't think you can bind LNet to a Pacemaker virtual IP, but I'll let someone smarter than me on this list confirm or correct me. In any event, an LNet ping and a UDP ping are completely separate animals.

--Jeff

Sent from my iPhone

On Nov 1, 2012, at 21:04, Ms. Megan Larko dobsonu...@gmail.com wrote:

Greetings!

I am working with Lustre-2.1.2 on RHEL 6.2. First I configured it using the standard defaults over TCP/IP. Everything worked very nicely using a real, static --mgsnode=a.b.c.x value which was the actual IP of the MGS/MDS system1 node.

I am now trying to integrate it with Pacemaker-1.1.7. I believe I have most of the set-up completed, with a particular exception: the lctl ping command cannot ping the Pacemaker IP alias (say a.b.c.d). The generic ping command in RHEL 6.2 can successfully access the interface. The Pacemaker alias IP (for failover of the combined MGS/MDS node, with Fibre Channel multipath storage shared between both MGS/MDS-configured machines) works in and of itself; I tested with an apache service. Pacemaker will correctly fail over the MGS/MDS from system1 to system2 properly.

If I go to system2, then my Lustre file system stops because it cannot get to the alias IP number. I did configure the Lustre OSTs to use --mgsnode=a.b.c.d (a.b.c.d representing my Pacemaker IP alias). A tunefs.lustre confirms the alias IP number. The alias IP number does not appear in LNET (lctl list_nids), and lctl ping a.b.c.d fails.

Should this IP alias go into the LNET database? If yes, how? What steps should I take to generate a successful lctl ping a.b.c.d?

Thanks for reading!
Cheers, megan
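The conventional Lustre answer to Megan's setup is to skip the floating IP for LNet entirely and register both real NIDs at format time, so clients try each MGS NID in turn during failover. A hedged sketch; the NIDs and device path are placeholders:

```shell
# Format targets with both MGS/MDS nodes' real NIDs instead of a
# Pacemaker virtual IP; clients fail over between the listed NIDs.
mkfs.lustre --ost --fsname=testfs --index=0 \
  --mgsnode=10.0.0.1@tcp0 --mgsnode=10.0.0.2@tcp0 /dev/sdb

# Or retrofit an existing target (with the target unmounted);
# tunefs.lustre appends the second mgsnode to the stored parameters.
tunefs.lustre --mgsnode=10.0.0.2@tcp0 /dev/sdb
```

Pacemaker then only manages which node mounts the MGS/MDS storage; LNet on each node keeps its own static NID, and 'lctl ping' of each real NID works as expected.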
Re: [Lustre-discuss] Large Corosync/Pacemaker clusters
Shawn,

In my opinion you shouldn't be running corosync across any more than two machines. They should be configured in self-contained pairs (MDS pair, OSS pairs). Anything beyond that would be chaos to manage, even if it worked. Don't forget the stonith portion; not every block storage implementation respects MMP protection.

--Jeff

On 10/19/12 9:52 AM, Hall, Shawn wrote:

Hi,

We're setting up fairly large Lustre 2.1.2 filesystems, each with 18 nodes and 159 resources, all in one Corosync/Pacemaker cluster as suggested by our vendor. We're getting mixed messages on how large a Corosync/Pacemaker cluster will work well, between our vendor and others.

1. Are there Lustre Corosync/Pacemaker clusters out there of this size or larger?
2. If so, what tuning needed to be done to get it to work well?
3. Should we be looking more seriously into splitting this Corosync/Pacemaker cluster into pairs or sets of 4 nodes?

Right now, our current configuration takes a long time to start/stop all resources (~30-45 mins), and failing back OSTs puts a heavy load on the cib process on every node in the cluster. Under heavy IO load, many of the nodes will show as "unclean/offline" and many OST resources will show as inactive in crm status, despite the fact that every single MDT and OST is still mounted in the appropriate place. We are running 2 corosync rings, each on a private 1 GbE network. We have a bonded 10 GbE network for the LNET.

Thanks, Shawn

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845  m: 619-204-9061
/* New Address */ 4170 Morena Boulevard, Suite D - San Diego, CA 92117
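A minimal two-node pair along the lines Jeff suggests might start from a corosync.conf like the following. A sketch only, using the corosync 1.x totem syntax of that era; the ring networks, multicast addresses, and ports are placeholders to be replaced with the pair's actual private networks:

```shell
# Write an illustrative /etc/corosync/corosync.conf fragment for one
# OSS pair, keeping the two-ring layout from Shawn's description.
cat > /etc/corosync/corosync.conf <<'EOF'
totem {
    version: 2
    rrp_mode: passive        # two redundant rings
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.10.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.11.0
        mcastaddr: 226.94.1.2
        mcastport: 5407
    }
}
EOF
```

With one such pair per OSS couplet, each cluster carries only its own handful of resources, so cib traffic and failover decisions stay local instead of fanning out across 18 nodes.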
Re: [Lustre-discuss] mounting Failover OSTs
Brian,

Do you have corosync or other Linux HA software infrastructure running on these systems? You need an HA software layer to manage heartbeat monitoring, split-brain protection, and mounting/migrating of resources.

--Jeff

On 10/11/12 2:02 PM, Andrus, Brian Contractor wrote:

All,

I am starting to try to configure failover for our Lustre filesystem.

Node00 is the mgs/mdt. Node00 is the oss for ost0 and failnode for ost1. Node01 is the oss for ost1 and failnode for ost0. Both osts are on an SRP network and are visible from both nodes. Ost0 is mounted on node00; ost1 is mounted on node01.

If I try to mount ost0 on node01, I see in the logs for node00:

    kernel: Lustre: Denying initial registration attempt from nid 10.100.255.250@o2ib, specified as failover

So do I have to manually mount the ost for failover purposes when there is a fail? I would have thought I mount the osts on both nodes and Lustre will manage which node is the active node.

Brian Andrus

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845  m: 619-204-9061
/* New Address */ 4170 Morena Boulevard, Suite D - San Diego, CA 92117
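With corosync/pacemaker in place, the "who mounts the OST" decision is delegated to a Filesystem resource rather than done by hand; a Lustre target must only ever be mounted on one node at a time. A hedged sketch using the crm shell of that era; device path, mountpoint, node name, and resource IDs are placeholders:

```shell
# One primitive per OST: pacemaker mounts it on exactly one node and
# migrates it to the failover peer when the active node dies.
crm configure primitive ost0-fs ocf:heartbeat:Filesystem \
  params device=/dev/mapper/ost0 directory=/mnt/ost0 fstype=lustre \
  op monitor interval=120s timeout=60s

# Prefer node00 for ost0 while node00 is healthy.
crm configure location ost0-pref ost0-fs 100: node00
```

Paired with stonith fencing, this is what replaces the manual "mount the ost when there is a fail" step Brian describes.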
Re: [Lustre-discuss] Tar backup of MDT runs extremely slow, tar pauses on pointers to very large files
Following up on my original post. I switched from the /bin/tar that comes with RHEL/CentOS 5.x to the Whamcloud patched tar utility. The entire backup was successful and took only 12 hours to complete. The CPU utilization was in the high 90% range, but only on one core. The process was much faster than the standard tar shipped in RHEL/CentOS and the only slowdowns were on file pointers to very large files (100TB+) with large stripe counts. The files that were going very slow when I reported the initial problem were backed up instantly with the Whamcloud version of tar. Best part, the MDT was saved and the 4PB filesystem is in production again. --Jeff On 5/30/12 3:02 PM, Andreas Dilger wrote: On 2012-05-29, at 1:28 PM, Peter Grandi wrote: The tar backup of the MDT is taking a very long time. So far it has backed up 1.6GB of the 5.0GB used in nine hours. In watching the tar process pointers to small or average size files are backed up quickly and at a consistent pace. When tar encounters a pointer/inode belonging to a very large file (100GB+) the tar process stalls on that file for a very long time, as if it were trying to archive the real filesize amount of data rather than the pointer/inode. If you have stripes on, a 100GiB file will have 100,000 1MiB stripes, and each requires a chunk of metadata. The descriptor for that file will thus have a potentially very large number of extents, scattered around the MDT block device, depending on how slowly the file grew etc. While that may be true for other distributed filesystems, that is not true for Lustre at all. The size of a Lustre object is not fixed to a chunk size like 32MB or similar, but rather is variable depending on the size of the file itself. The number of stripes (== objects) on a file is currently fixed at file creation time, and the MDS only needs to store the location of each stripe (at most one per OST). 
The actual blocks/extents of the objects are managed inside the OST itself and are never seen by the client or the MDS. Cheers, Andreas -- Andreas Dilger Whamcloud, Inc. Principal Lustre Engineer http://www.whamcloud.com/ ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- -- Jeff Johnson Manager Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x101 f: 858-412-3845 m: 619-204-9061 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
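Andreas's point — the MDS records one object per stripe, at most one per OST, regardless of file size — can be seen from any client with `lfs getstripe` (the path below is illustrative):

```shell
# Show the layout the MDS stores for a file; path is made up.
lfs getstripe -v /mnt/lustre/bigfile
# lmm_stripe_count is fixed at creation time; the MDS holds only the
# object IDs, while the actual blocks/extents live inside each OST.
```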
[Lustre-discuss] Tar backup of MDT runs extremely slow, tar pauses on pointers to very large files
Greetings, I am aiding in the recovery of a multi-Petabyte Lustre filesystem (1.8.7) that went down hard due to site-wide power loss. Power loss caused the MDT RAID volume to be put in a critical state and I was able to get the md raid based MDT device mounted read-only and the MDT mounted read-only as type ldiskfs. I was able to successfully backup the extended attributes of the MDT. This process took about 10 minutes. The tar backup of the MDT is taking a very long time. So far it has backed up 1.6GB of the 5.0GB used in nine hours. In watching the tar process, pointers to small or average size files are backed up quickly and at a consistent pace. When tar encounters a pointer/inode belonging to a very large file (100GB+) the tar process stalls on that file for a very long time, as if it were trying to archive the real filesize amount of data rather than the pointer/inode. During this process there are no errors reported by kernel, ldiskfs, md or tar. Nothing that would indicate why things are so slow on pointers to large files. In watching the tar process the CPU utilization is at or near 100% so it is doing something. Running iostat at the same time shows that while tar is at or near 100% CPU there are no reads taking place on the MDT device and no writes to the device where the tarball is being written. It appears that the tar process goes to outer space when it encounters pointers to very large files. Is this expected behavior? The backup command used is the one from the MDT backup process in the 1.8 manual: 'tar zcvf tarfile --sparse .' df reports the ldiskfs MDT as 5GB used: /dev/md0 2636788616 5192372 2455778504 1% /mnt/mdt df -i reports the ldiskfs MDT as having 10,300,000 inodes used: /dev/md0 1758199808 10353389 1747846419 1% /mnt/mdt Any feedback is appreciated! 
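The stall described above is tar reading the full apparent size of each sparse MDT inode while hunting for holes. A generic, non-Lustre demonstration of why a hole-aware tar finishes almost instantly — the archive of a fully sparse file stores only headers and the hole map:

```shell
# Generic sparse-file demo (no Lustre required).
workdir=$(mktemp -d)
cd "$workdir"
truncate -s 1G bigfile           # apparent size 1 GiB, zero blocks allocated
du -k bigfile                    # actual disk usage: 0
tar --sparse -czf with-sparse.tgz bigfile
ls -l with-sparse.tgz            # tiny archive: only headers and the hole map
```

The stock RHEL 5 tar found the holes by reading every (zero) byte of the apparent file size; the Whamcloud-patched tar reportedly locates them via the filesystem's extent map without reading the data, which is the entire speedup Jeff observed.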
--Jeff -- -- Jeff Johnson Partner Aeon Computing jeff dot johnson at aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x101 f: 858-412-3845 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Latest RHEL/CentOS kernel that compiles against Lustre 1.8 git tree?
FYI, I tested the Lustre 1.8 patch that provides support for 2.6.18-308.1.1 and while it does successfully compile, the resulting changes in the kernel severely impact the isci (Intel SAS) driver. Romley (new Xeon-E5) systems booting with this driver panic during boot, just after switchroot. I will capture the trace and post it; it appears to be the result of patch changes to jbd. --Jeff On Fri, Mar 30, 2012 at 11:41 AM, Peter Jones pjo...@whamcloud.com wrote: Heh. Nice imagery Jeff. Yes, this patch is still under active development and also note that it will not be able to complete autotest until we switch it to use RHEL5.8 by default (hopefully soon, but this had been on hold until we saw whether we needed RC3 for 2.2). So, if you are willing to test the latest patch that would be appreciated and could accelerate things (please post your findings on the JIRA ticket), but regardless we should have this work completed in the near future. -- -- Jeff Johnson Aeon Computing jeff dot johnson at aeoncomputing.com www.aeoncomputing.com 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Latest RHEL/CentOS kernel that compiles against Lustre 1.8 git tree?
Greetings, Does anyone know the most recent kernel (RHEL/CentOS) that can be successfully patched and compiled against the current Lustre 1.8 git source tree? I attempted 2.6.18-308.1.1 but there are several patches that fail. Quilt would not make it past the third patch in the series file. Applying the patches manually reveals several hunk failures. I require a kernel more recent than 2.6.18-274.3.1 for non-Lustre related issues. Thanks in advance. -- -- Jeff Johnson Aeon Computing jeff dot johnson at aeoncomputing.com www.aeoncomputing.com 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
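For reference, the manual patching flow looks roughly like this. This is a sketch: the series filename and paths are illustrative of the 1.8 tree layout, so verify them against your checkout.

```shell
# Hypothetical paths; adjust to your lustre checkout and kernel tree.
cd /usr/src/linux-2.6.18-308.1.1
ln -s ~/lustre/lustre/kernel_patches/series/2.6-rhel5.series series
ln -s ~/lustre/lustre/kernel_patches/patches patches
quilt push -av     # applies the series in order; stops at the first failing hunk
```

`quilt push -av` stopping at the third patch, as described above, means every later patch in the series is also untested against that kernel — hand-fixing one hunk usually just exposes the next failure.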
Re: [Lustre-discuss] Latest RHEL/CentOS kernel that compiles against Lustre 1.8 git tree?
I applied that patch at 9AM this morning. Several hunks failed and then I noticed that the LU-1052 patch is literally morphing as the morning progresses. There is a difference between living on the bleeding edge and standing in a field watching the razor's edge coming at you... :) On Thu, Mar 29, 2012 at 11:40 PM, Johann Lombardi joh...@whamcloud.com wrote: On Thu, Mar 29, 2012 at 11:30:03PM -0700, Jeff Johnson wrote: Does anyone know the most recent kernel (RHEL/CentOS) that can be successfully patched and compiled against the current Lustre 1.8 git source tree? I attempted 2.6.18-308.1.1 but there are several patches that fail. LU-1052 is the jira ticket for 2.6.18-308.1.1 support. There is a patch here which should help you to build against this kernel. Johann -- Johann Lombardi Whamcloud, Inc. www.whamcloud.com -- -- Jeff Johnson Aeon Computing jeff dot johnson at aeoncomputing.com www.aeoncomputing.com 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Latest RHEL/CentOS kernel that compiles against Lustre 1.8 git tree?
My Lustre system (IB, Xeon-E5, 4xOSS, 4xOST, 84TB, 16 clients) is a development only system. No meaningful data. It is currently loaded with CentOS5.8 and I have Lustre 1.8 source checked out via git last night. I'm happy to test and provide feedback. FTR, I think Intel's new isci driver is a real PITA. On Fri, Mar 30, 2012 at 11:41 AM, Peter Jones pjo...@whamcloud.com wrote: Heh. Nice imagery Jeff. Yes, this patch is still under active development and also note that it will not be able to complete autotest until we switch it to use RHEL5.8 by default (hopefully soon, but this had been on hold until we saw whether we needed RC3 for 2.2). So, if you are willing to test the latest patch that would be appreciated and could accelerate things (please post your findings on the JIRA ticket), but regardless we should have this work completed in the near future. -- -- Jeff Johnson Aeon Computing jeff dot johnson at aeoncomputing.com www.aeoncomputing.com 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Lustre-1.8.4 : BUG soft lock up
Greetings, The below console output is from a 1.8.4 OST (RHEL5.5, 2.6.18-194.3.1.el5_lustre.1.8.4, x86_64). Not saying it is a Lustre bug for sure. Just wondering if anyone has seen this or something very similar. Updating to 1.8.6 WC variant isn't an option at this time. If anyone has some insight into this I'd appreciate the feedback. Thanks, --Jeff
BUG: soft lockup - CPU#6 stuck for 10s! [kswapd0:409]
CPU 6:
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) jbd2(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) lockd(U) sunrpc(U) ip6t_REJECT(U) xt_tcpudp(U) ip6table_filter(U) ip6_tables(U) x_tables(U) ib_iser(U) libiscsi2(U) scsi_transport_iscsi2(U) scsi_transport_iscsi(U) ib_srp(U) rds(U) ib_sdp(U) ib_ipoib(U) ipoib_helper(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) iw_cm(U) ib_addr(U) ib_sa(U) mptsas(U) mptctl(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_en(U) joydev(U) shpchp(U) sg(U) mlx4_core(U) e1000e(U) serio_raw(U) pcspkr(U) i2c_i801(U) i2c_core(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) mptspi(U) scsi_transport_spi(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) ata_piix(U) libata(U) sd_mod(U) scsi_mod(U) raid1(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 409, comm: kswapd0 Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
RIP: 0010:[801011bf] [801011bf] dqput+0x105/0x19f
RSP: 0018:8101be805cd0 EFLAGS: 0202
RAX: 81012e03f000 RBX: RCX: 81012e03f000 RDX: ffe2
RSI: 0002 RDI: 81012f4f01c0 RBP: 81007fb4c918 R08: 81018b00
R09: 81007fb4c918 R10: 8101be805c60 R11: 8b6448f0 R12: 8101be805c60
R13: 8b6448f0 R14: ffe2 R15: 8b6448f0
FS: () GS:8101bfc2adc0() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 00402000 CR3: 00201000 CR4: 06e0
Call Trace:
[8010182b] dquot_drop+0x30/0x5e
[8b647e83] :ldiskfs:ldiskfs_dquot_drop+0x43/0x70
[80022d99] clear_inode+0xb4/0x123
[80034e52] dispose_list+0x41/0xe0
[8002d6a7] shrink_icache_memory+0x1b7/0x1e6
[8003f466] shrink_slab+0xdc/0x153
[80057e59] kswapd+0x343/0x46c
[800a0ab2] autoremove_wake_function+0x0/0x2e
[80057b16] kswapd+0x0/0x46c
[800a089a] keventd_create_kthread+0x0/0xc4
[80032890] kthread+0xfe/0x132
[8009d728] request_module+0x0/0x14d
[8005dfb1] child_rip+0xa/0x11
[800a089a] keventd_create_kthread+0x0/0xc4
[80032792] kthread+0x0/0x132
[8005dfa7] child_rip+0x0/0x11
-- -- Jeff Johnson Manager Aeon Computing jeff.johnson at aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x101 f: 858-412-3845 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Enabling mds failover after filesystem creation
Greetings, I am attempting to add mds failover operation to an existing v1.8.4 filesystem. I have heartbeat/stonith configured on the mds nodes. What is unclear is what to change in the lustre parameters. I have read over the 1.8.x and 2.0 manuals and they are unclear as exactly how to enable failover mds operation on an existing filesystem. Do I simply run the following on the primary mds node and specify the NID of the secondary mds node? tunefs.lustre --param=failover.node=10.0.1.3@o2ib /dev/mdt device where: 10.0.1.2=primary mds, 10.0.1.3=secondary mds All of the examples for enabling failover via tunefs.lustre are for OSTs and I want to be sure that there isn't a different procedure for the MDS since it can only be active/passive. Thanks, --Jeff -- Jeff Johnson Aeon Computing www.aeoncomputing.com 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Enabling mds failover after filesystem creation
Apologies, I should have been more descriptive. I am running a dedicated MGS node and MGT device. The MDT is a standalone RAID-10 shared via SAS between two nodes, one being the current MDS and the second being the planned secondary MDS. Heartbeat and stonith w/ ipmi control is currently configured but not started between the two nodes. On 6/14/11 12:12 PM, Cliff White wrote: It depends - are you using a combined MGS/MDS? If so, you will have to update the mgsnid on all servers to reflect the failover node, plus change the client mount string to show the failover node. otherwise, it's the same procedure as with an OST. cliffw On Tue, Jun 14, 2011 at 12:06 PM, Jeff Johnson jeff.john...@aeoncomputing.com mailto:jeff.john...@aeoncomputing.com wrote: Greetings, I am attempting to add mds failover operation to an existing v1.8.4 filesystem. I have heartbeat/stonith configured on the mds nodes. What is unclear is what to change in the lustre parameters. I have read over the 1.8.x and 2.0 manuals and they are unclear as exactly how to enable failover mds operation on an existing filesystem. Do I simply run the following on the primary mds node and specify the NID of the secondary mds node? tunefs.lustre --param=failover.node=10.0.1.3@o2ib /dev/mdt device where: 10.0.1.2=primary mds, 10.0.1.3=secondary mds All of the examples for enabling failover via tunefs.lustre are for OSTs and I want to be sure that there isn't a different procedure for the MDS since it can only be active/passive. Thanks, --Jeff -- Jeff Johnson Aeon Computing www.aeoncomputing.com http://www.aeoncomputing.com 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org mailto:Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- cliffw Support Guy WhamCloud, Inc. 
www.whamcloud.com http://www.whamcloud.com -- Jeff Johnson Aeon Computing www.aeoncomputing.com 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
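Putting Cliff's answer together for Jeff's layout (separate MGS, shared MDT), a sketch — the NIDs are the ones from the thread, the device path and fsname are placeholders:

```shell
# On either MDS node, with the MDT unmounted:
tunefs.lustre --param="failover.node=10.0.1.3@o2ib" /dev/<mdt-device>
# With a separate MGS the client mount string is unchanged. Only a
# combined MGS/MDS would also require listing both NIDs on every client:
#   mount -t lustre 10.0.1.2@o2ib:10.0.1.3@o2ib:/<fsname> /mnt/lustre
```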
[Lustre-discuss] Lustre 1.8.4 - Local mount of ost for backup purposes, fs type ldiskfs or ext4?
Greetings, I am doing a local mount of an 8TB OST device in a Lustre 1.8.4 installation. The OST was built with a backfstype of ldiskfs. When attempting the local mount: mount -t ldiskfs /dev/sdc /mnt/save/ost I get: mount: wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or other error I am able to mount the same block device as ext4, just not as ldiskfs. I need to be able to mount as ldiskfs to get access to the extended attributes and back them up. Is this still the case with the ext4 extensions for Lustre 1.8.4? I am able to mount read-only as ext4 but any attempt at reading the extended attributes with getfattr fails. Thanks, --Jeff -- -- Jeff Johnson Manager Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x101 f: 858-412-3845 m: 619-204-9061 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
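For comparison, a sketch of the working sequence on a 1.8 server. The two non-obvious points are that the ldiskfs module from the Lustre server packages must be loaded before mount will recognize the fs type, and that Lustre's EAs sit in the trusted.* namespace, so getfattr must be told to match them (the output path is illustrative):

```shell
modprobe ldiskfs                      # shipped with the Lustre server RPMs
mount -t ldiskfs -o ro /dev/sdc /mnt/save/ost
cd /mnt/save/ost
# -m '.*' matches trusted.* EAs (plain getfattr would show nothing);
# -e hex keeps the values restorable later via setfattr --restore
getfattr -R -d -m '.*' -e hex -P . > /root/ost-ea.bak
```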
Re: [Lustre-discuss] aacraid kernel panic caused failover
I have seen similar behavior on these controllers, on dissimilar configs and different-aged systems. These happened to be non-Lustre standalone nfs and iscsi target boxes. Went through controller and drive firmware upgrades, low-level fw dumps and analysis from dev engineers. In the end it was never really explained or resolved. It appears that these controllers, like small children, have tantrums and fall apart. A power cycle clears the condition. Not the best controller for an OSS. --Jeff ---mobile signature--- Jeff Johnson - Aeon Computing jeff.john...@aeoncomputing.com On Apr 6, 2011, at 1:05, Thomas Roth t.r...@gsi.de wrote: We have ~ 60 servers with these Adaptec controllers, and found this problem just to happen from time to time. Upgrade of the aacraid module wouldn't help. We had contacts to Adaptec, but they had no clue either. Only good thing is it seems that this adapter panic happens in an instant, halting the machine, but has no prior phase of degradation: the controller doesn't start leaving out every second bit or just writing the '1's and not the '0's or ... - so whatever data has made it to the disks before the crash seems to be quite sensible. Reboot and never buy Adaptec again. Cheers, Thomas On 04/06/2011 07:03 AM, David Noriega wrote: Ok I updated the aacraid driver and the raid firmware, yet I still had the problem happen, so I did more research and applied the following tweaks: 1) Rebuilt mkinitrd with the following options: a) edit /etc/sysconfig/mkinitrd/multipath to contain MULTIPATH=yes b) mkinitrd initrd-2.6.18-194.3.1.el5_lustre.1.8.4.img 2.6.18-194.3.1.el5_lustre.1.8.4 --preload=scsi_dh_rdac 2) Added the local hard disk to the multipath blacklist 3) Edited modprobe.conf to have the following aacraid options: options aacraid firmware_debug=2 startup_timeout=60 #the debug doesn't seem to print anything to dmesg 4) Added pcie_aspm=off to the kernel boot options So things looked good for a while. 
I did have a problem mounting the lustre partitions but this was my fault in misconfiguring some lnet options I was experimenting with. I fixed that and just as a test, I ran 'modprobe lustre' since I wasn't ready to fail back the partitions just yet (wanted to wait till when activity was the lowest). That was earlier today. I was about to fail back tonight, yet when I checked the server again I saw in dmesg the same aacraid problems from before. Is it possible lustre is interfering with aacraid? It's weird since I do have a duplicate machine and it's not having any of these problems. On Fri, Mar 25, 2011 at 9:55 AM, Temple Jason jtem...@cscs.ch wrote: Adaptec should have the firmware and drivers on their site for your card. If not Adaptec, then Oracle will have it available somewhere. The firmware and system drivers usually have a utility that will check the current version and upgrade it for you. Hope this helps (I use different cards, so I can't tell you exactly). -Jason -Original Message- From: David Noriega [mailto:tsk...@my.utsa.edu] Sent: venerdì, 25. marzo 2011 15:47 To: Temple Jason Subject: Re: [Lustre-discuss] aacraid kernel panic caused failover Hmm not sure, what's the best way to find out? On Fri, Mar 25, 2011 at 9:46 AM, Temple Jason jtem...@cscs.ch wrote: Hi, Are you using the latest firmware? This sort of thing used to happen to me, but with different raid cards. -Jason -Original Message- From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of David Noriega Sent: venerdì, 25. marzo 2011 15:38 To: lustre-discuss@lists.lustre.org Subject: [Lustre-discuss] aacraid kernel panic caused failover Had some craziness happen to our lustre system. We have two OSSs, both identical Sun x4140 servers, and on only one of them have I seen this pop up in the kernel messages and then a kernel panic. The panic seemed to then spread and caused the network to go down and the second OSS to try to failover (or failback?). 
Anyways 'splitbrain' occurred and I was able to get in and set them straight. I researched this aacraid module messages and so far all I can find says to increase the timeout, but these are old messages and currently they are set to 60. Anyone else have any ideas? aacraid: Host adapter abort request (0,0,0,0) aacraid: Host adapter reset request. SCSI hang ? AAC: Host adapter BLINK LED 0xef AAC0: adapter kernel panic'd ef. -- Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Personally, I liked
Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
Daniel, In the future you might want to consider posting some entries or pieces of a log rather than the entire log file. =) Was this from the OSS that you say was rebooting or from your MDS node? I would look at the log file of the OSS node(s) that contain OST0006 and OST0007 and see if there are any RAID errors. It might be a network problem as well. Morning is coming and one of the developers will likely respond to this with more suggestions. --Jeff ---mobile signature--- Jeff Johnson - Aeon Computing jeff.john...@aeoncomputing.com m: 619-204-9061 On Dec 20, 2010, at 23:13, Daniel Raj danielraj2...@gmail.com wrote: Dec 19 04:19:49 cluster kernel: Lustre: 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) dan3-OST0006: d957783f-e60b-07b0-2c86-ecfbc7eb57b6 reconnecting Dec 19 04:19:49 cluster kernel: Lustre: 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) Skipped 4 previous similar messages Dec 19 04:30:05 cluster kernel: Lustre: 23308:0:(ldlm_lib.c:575:target_handle_reconnect()) dan3-OST0006: d957783f-e60b-07b0-2c86-ecfbc7eb57b6 reconnecting Dec 19 04:30:05 cluster kernel: LustreError: 137-5: UUID 'cluster-ost7_UUID' is not available for connect (no target) Dec 19 04:30:05 cluster kernel: LustreError: 23290:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@8103fd722c00 x1355442914715019/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292713305 ref 1 fl Interpret:/0/0 rc -19/0 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
Daniel, It looks like your OST backend storage device may be having an issue. I would check the health and stability of the backend storage device or raid you are using for an OST device. It wouldn't likely cause a system reboot of your OSS system. There may be more problems, hardware and/or OS related that are causing the system to reboot in addition to Lustre complaining that it can't find the OST storage device. Others here on the list will likely give you a more detailed answer. The storage device is the place i would look first. --Jeff -- -- Jeff Johnson Manager Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x101 f: 858-412-3845 m: 619-204-9061 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 On Mon, Dec 20, 2010 at 9:43 PM, Daniel Raj danielraj2...@gmail.com wrote: Hi Genius, Good Day !! I am Daniel. My OSS getting automatically rebooted again and again . kindly help to me Its showing the below error messages *kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@810400e24400 x1353488904620274/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292738958 ref 1 fl Interpret:/0/0 rc -19/0 kernel: LustreError: 137-5: UUID 'south-ost7_UUID' is not available for connect (no target) kernel: LustreError: 23284:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@8101124c7c00 x1353488904620359/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292739025 ref 1 fl Interpret:/0/0 rc -19/0 * Regards, Daniel A ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
Daniel, Check the health and stability of your raid-6 volume. Make sure the raid is healthy and online. Use whatever monitor utility came with your raid card or check /proc/mdstat if it's a Linux mdraid. Check /var/log/messages for error messages from your raid or other hardware. --Jeff ---mobile signature--- Jeff Johnson - Aeon Computing jeff.john...@aeoncomputing.com On Dec 20, 2010, at 22:27, Daniel Raj danielraj2...@gmail.com wrote: Hi Jeff, Thanks for your reply Storage information : DL380G5 == OSS + 16GB Ram OS== SFS G3.2-2 + centos 5.3 + lustre 1.8.3 MSA60 box == OST RAID 6 Regards, Daniel A On Tue, Dec 21, 2010 at 11:45 AM, Jeff Johnson jeff.john...@aeoncomputing.com wrote: Daniel, It looks like your OST backend storage device may be having an issue. I would check the health and stability of the backend storage device or raid you are using for an OST device. It wouldn't likely cause a system reboot of your OSS system. There may be more problems, hardware and/or OS related that are causing the system to reboot in addition to Lustre complaining that it can't find the OST storage device. Others here on the list will likely give you a more detailed answer. The storage device is the place i would look first. --Jeff -- -- Jeff Johnson Manager Aeon Computing jeff.john...@aeoncomputing.com www.aeoncomputing.com t: 858-412-3810 x101 f: 858-412-3845 m: 619-204-9061 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117 On Mon, Dec 20, 2010 at 9:43 PM, Daniel Raj danielraj2...@gmail.com wrote: Hi Genius, Good Day !! I am Daniel. My OSS getting automatically rebooted again and again . 
kindly help to me Its showing the below error messages kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@810400e24400 x1353488904620274/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292738958 ref 1 fl Interpret:/0/0 rc -19/0 kernel: LustreError: 137-5: UUID 'south-ost7_UUID' is not available for connect (no target) kernel: LustreError: 23284:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@8101124c7c00 x1353488904620359/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292739025 ref 1 fl Interpret:/0/0 rc -19/0 Regards, Daniel A ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
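A minimal sketch of the checks suggested in this thread. The first two commands are generic RHEL 5; the last assumes HP's Smart Array CLI (hpacucli) is installed for the MSA60, which may not match Daniel's actual setup:

```shell
cat /proc/mdstat                                 # only relevant for Linux md raid
grep -iE 'raid|scsi|ata|i/o error' /var/log/messages | tail -50
hpacucli ctrl all show status                    # HP controller health, if installed
```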
[Lustre-discuss] OST errors caused by residual client info?
Greetings.. Is it possible that the below error can be derived from a client that has not been rebooted or had lustre kernel mods reloaded during a time when a few test file systems were built and mounted? LustreError: 12967:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@81032dd2d000 x1348952525350751/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1291669076 ref 1 fl Interpret:/0/0 rc -19/0 LustreError: 12967:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 55 previous similar messages LustreError: 137-5: UUID 'fs-OST0058_UUID' is not available for connect (no target) Normally this would be a back end storage issue. In this case, the oss where this error is logged doesn't have an ost OST0058. It has an ost OST006d. Regardless of the ost name, the backend raid is healthy with no hardware errors. No other h/w errors present on the oss node (e.g.: mce, panic, ib/enet failures, etc). Previous test incarnations of this filesystem were built where ost name was not assigned (e.g.: OST) and was assigned upon first mount and connection to the mds. Is it possible that some clients have residual pointers or config data about the previously built file systems? Thanks! --Jeff ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] OST errors caused by residual client info?
On 12/6/10 3:55 PM, Oleg Drokin wrote: Hello! On Dec 6, 2010, at 6:50 PM, Jeff Johnson wrote: Previous test incarnations of this filesystem were built where ost name was not assigned (e.g.: OST) and was assigned upon first mount and connection to the mds. Is it possible that some clients have residual pointers or config data about the previously built file systems? If you did not unmount clients from the previous incarnation of the filesystem, those clients would still continue to try to contact the servers they know about even after the servers themselves go away and are repurposed (since there is no way for the client to know about this). All clients were unmounted but the lustre kernel mods were never removed/reloaded nor were the clients rebooted. Is it odd that this error would occur naming an ost that is not present on that oss? Should an oss only report this error about its own ost devices? As I said, this particular oss where the error came from only has an OST006c and OST006d. It does not have an OST0058 although it may have back when the filesystem was made with a simple test csv that did not specifically give index numbers as part of the mkfs.lustre process. They were named later, randomly, when the osts were first mounted and connected to the mds. Do you think it is possible for a client to retain this information even though a umount/mount of the filesystem took place? --Jeff ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
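A hedged way to test the "residual client state" theory with standard 1.8 tools (the mountpoint below is illustrative):

```shell
# On a suspect client: list every obd device Lustre still has configured;
# entries referencing old target names would confirm stale state.
lctl dl
# Unmount, then fully unload the Lustre modules to drop any cached
# import/config data before remounting.
umount /mnt/lustre
lustre_rmmod
```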