Re: [lustre-discuss] 2.15.4 o2iblnd on RoCEv2?

2024-01-10 Thread Jeff Johnson
An LU ticket and patch for lnetctl, or for me being an under-caffeinated
idiot? ;-)

On Wed, Jan 10, 2024 at 12:06 PM Andreas Dilger  wrote:
>
> It would seem that the error message could be improved in this case?  Could 
> you file an LU ticket for that with the reproducer below, and ideally along 
> with a patch?
>
> Cheers, Andreas
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 2.15.4 o2iblnd on RoCEv2?

2024-01-10 Thread Jeff Johnson
Man, am I an idiot. I've been up all night too many nights in a row without
enough coffee. It helps if you use the correct --net designation: I was
typing ib0 instead of o2ib0. Declaring the net as o2ib0 works fine.

(cleanup from previous)
lctl net down && lustre_rmmod

(new attempt)
modprobe lnet -v
lnetctl lnet configure
lnetctl net add --if enp1s0np0 --net o2ib0
lnetctl net show
net:
- net type: lo
  local NI(s):
- nid: 0@lo
  status: up
- net type: o2ib
  local NI(s):
- nid: 10.0.50.27@o2ib
  status: up
  interfaces:
  0: enp1s0np0

Lots more to test and verify, but the original mailing list submission
was total pilot error on my part. Apologies to all who spent cycles
pondering this nothingburger.
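
For anyone hitting the same snag, the obvious next sanity check once the NI
shows up is an LNet ping to a peer. The peer NID below is purely
illustrative, not one of my real nodes:

lctl list_nids
lctl ping 10.0.50.28@o2ib0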




On Tue, Jan 9, 2024 at 7:45 PM Jeff Johnson
 wrote:
>
> Howdy intrepid Lustrefarians,
>
> While starting down the debug rabbit hole I thought I'd raise my hand
> and see if anyone has a few magic beans to spare.
>
> I cannot get lnet (via lnetctl) to init a o2iblnd interface on a
> RoCEv2 interface.
>
> Running `lnetctl net add --net ib0 --if enp1s0np0` results in
>  net:
>   errno: -1
>   descr: cannot parse net '<255:65535>'
>
> Nothing in dmesg to indicate why. Search engines aren't coughing up
> much here either.
>
> Env: Rocky 8.9 x86_64, MOFED 5.8-4.1.5.0, Lustre 2.15.4
>
> I'm able to run mpi over the RoCEv2 interface. Utils like ibstatus and
> ibdev2netdev report it correctly. ibv_rc_pingpong works fine between
> nodes.
>
> Configuring as socklnd works fine. `lnetctl net add --net tcp0 --if
> enp1s0np0 && lnetctl net show`
> [root@r2u11n3 ~]# lnetctl net show
> net:
> - net type: lo
>   local NI(s):
> - nid: 0@lo
>   status: up
> - net type: tcp
>   local NI(s):
> - nid: 10.0.50.27@tcp
>   status: up
>   interfaces:
>   0: enp1s0np0
>
> I verified the RoCEv2 interface using nVidia's `cma_roce_mode` as well
> as sysfs references
>
> [root@r2u11n3 ~]# cma_roce_mode -d mlx5_0 -p 1
> RoCE v2
>
> Ideas? Suggestions? Incense?
>
> Thanks,
>
> --Jeff



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] 2.15.4 o2iblnd on RoCEv2?

2024-01-09 Thread Jeff Johnson
Howdy intrepid Lustrefarians,

While starting down the debug rabbit hole I thought I'd raise my hand
and see if anyone has a few magic beans to spare.

I cannot get lnet (via lnetctl) to init a o2iblnd interface on a
RoCEv2 interface.

Running `lnetctl net add --net ib0 --if enp1s0np0` results in
 net:
  errno: -1
  descr: cannot parse net '<255:65535>'

Nothing in dmesg to indicate why. Search engines aren't coughing up
much here either.

Env: Rocky 8.9 x86_64, MOFED 5.8-4.1.5.0, Lustre 2.15.4

I'm able to run mpi over the RoCEv2 interface. Utils like ibstatus and
ibdev2netdev report it correctly. ibv_rc_pingpong works fine between
nodes.

Configuring as socklnd works fine. `lnetctl net add --net tcp0 --if
enp1s0np0 && lnetctl net show`
[root@r2u11n3 ~]# lnetctl net show
net:
- net type: lo
  local NI(s):
- nid: 0@lo
  status: up
- net type: tcp
  local NI(s):
- nid: 10.0.50.27@tcp
  status: up
  interfaces:
  0: enp1s0np0

I verified the RoCEv2 interface using nVidia's `cma_roce_mode` as well
as sysfs references

[root@r2u11n3 ~]# cma_roce_mode -d mlx5_0 -p 1
RoCE v2

Ideas? Suggestions? Incense?

Thanks,

--Jeff
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL] [BULK] MDS hardware - NVME?

2024-01-08 Thread Jeff Johnson
onsidering NVME storage for the next MDS.
> >>
> >>
> >> As I understand, NVME disks are bundled in software, not by a hardware 
> >> raid controller.
> >> This would be done using Linux software raid, mdadm, correct?
> >>
> >>
> >> We have some experience with ZFS, which we use on our OSTs.
> >> But I would like to stick to ldiskfs for the MDTs, and a zpool with a zvol 
> >> on top which is then formatted with ldiskfs - to much voodoo...
> >>
> >>
> >> How is this handled elsewhere? Any experiences?
> >>
> >>
> >>
> >>
> >> The available devices are quite large. If I create a raid-10 out of 4 
> >> disks, e.g. 7 TB each, my MDT will be 14 TB - already close to the 16 TB 
> >> limit.
> >> So no need for a box with lots of U.3 slots.
> >>
> >>
> >> But for MDS operations, we will still need a powerful dual-CPU system with 
> >> lots of RAM.
> >> Then the NVME devices should be distributed between the CPUs?
> >> Is there a way to pinpoint this in a call for tender?
> >>
> >>
> >>
> >>
> >> Best regards,
> >> Thomas
> >>
> >>
> >> 
> >> Thomas Roth
> >>
> >>
> >> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> >> Planckstraße 1, 64291 Darmstadt, Germany, 
> >> https://urldefense.us/v3/__http://www.gsi.de/__;!!G2kpM7uM-TzIFchu!1QmOnUbmSPpZPcc39XFZ3S-Vk4Dmh-Q78Gpm8ylYUf6Zhv_zpb2VXkM4C5Uhh05x01MhjqJTYZ5boqzEhkx6JF_rGY74EQ$
> >>   
> >> <https://urldefense.us/v3/__http://www.gsi.de/__;!!G2kpM7uM-TzIFchu!1QmOnUbmSPpZPcc39XFZ3S-Vk4Dmh-Q78Gpm8ylYUf6Zhv_zpb2VXkM4C5Uhh05x01MhjqJTYZ5boqzEhkx6JF_rGY74EQ$
> >>  >
> >>
> >>
> >> Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
> >> Managing Directors / Geschäftsführung:
> >> Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
> >> Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
> >> State Secretary / Staatssekretär Dr. Volkmar Dietz
> >>
> >>
> >>
> >>
> >> ___
> >> lustre-discuss mailing list
> >> lustre-discuss@lists.lustre.org <mailto:lustre-discuss@lists.lustre.org>
> >> https://urldefense.us/v3/__http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org__;!!G2kpM7uM-TzIFchu!1QmOnUbmSPpZPcc39XFZ3S-Vk4Dmh-Q78Gpm8ylYUf6Zhv_zpb2VXkM4C5Uhh05x01MhjqJTYZ5boqzEhkx6JF9_AFR58A$
> >>   
> >> <https://urldefense.us/v3/__http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org__;!!G2kpM7uM-TzIFchu!1QmOnUbmSPpZPcc39XFZ3S-Vk4Dmh-Q78Gpm8ylYUf6Zhv_zpb2VXkM4C5Uhh05x01MhjqJTYZ5boqzEhkx6JF9_AFR58A$
> >>  >
> >>
> >>
> >>
> >> ___
> >> lustre-discuss mailing list
> >> lustre-discuss@lists.lustre.org
> >> https://urldefense.us/v3/__http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org__;!!G2kpM7uM-TzIFchu!1QmOnUbmSPpZPcc39XFZ3S-Vk4Dmh-Q78Gpm8ylYUf6Zhv_zpb2VXkM4C5Uhh05x01MhjqJTYZ5boqzEhkx6JF9_AFR58A$
> > ___
> > lustre-discuss mailing list
> > lustre-discuss@lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OSS on compute node

2023-10-13 Thread Jeff Johnson
I'm certainly not Andreas. That said...

You're running an MPI simulation, presumably across most or all of your 34
compute nodes. Lustre server operations, their lnet activity, and the
backend storage I/O will create a profound imbalance on the few compute
nodes you designate to do both server and client duty. On top of that, you
expose yourself to deadlocks and the other potential issues mentioned
earlier. I don't know how performant your login server is, but depending on
the file operations of your simulations you could cavitate your login
server. Also, you generally don't want users logging in on a node as
critical as an MDS.

You would be better served by allocating two of your compute nodes to be
dedicated Lustre servers (one MDS/OSS, the other an OSS) and running 32
clean client nodes. More stable, cleaner, and in the end probably more
workflow productivity over time, with fewer technical incidents.

Just my opinion...others may differ.

--Jeff



On Fri, Oct 13, 2023 at 12:43 PM Fedele Stabile <
fedele.stab...@fis.unical.it> wrote:

> I believe in Linux is possible to limit the memory used by a user and also
> it is possible to limit the amount of cpu used so I can limit resources for
> group user and also if i put oss server in a vm i suppose i can limit cpu
> and memory usage.
> My scenario is: i have 34 compute nodes 512 GB RAM and 34 HD 16 TB each
> that I can arrange in 9 nodes, i have also a management node that can be
> used for LUSTRE metadata server, infiniband is 200 Gb/s
> We make mhd simulations.
> What Lustre configuration do you suggest?
>
> --
> *Da:* Andreas Dilger 
> *Inviato:* Venerdì, Ottobre 13, 2023 7:19:11 PM
> *A:* Fedele Stabile 
> *Cc:* lustre-discuss@lists.lustre.org 
> *Oggetto:* Re: [lustre-discuss] OSS on compute node
>
> On Oct 13, 2023, at 20:58, Fedele Stabile 
> wrote:
>
>
> Hello everyone,
> We are in progress to integrate Lustre on our little HPC Cluster and we
> would like to know if it is possible to use the same node in a cluster to
> act as an OSS with disks and to also use it as a Compute Node and then
> install a Lustre Client.
> I know that the OSS server require a modified kernel so I suppose it can
> be installed in a virtual machine using kvm on a compute node.
>
>
> There isn't really a problem with running a client + OSS on the same node
> anymore, nor is there a problem with an OSS running inside a VM (if you
> have SR-IOV and enough CPU+RAM to run the server).
>
> *HOWEVER*, I don't think it would be good to have the client mounted on
> the *VM host*, and then run the OSS on a *VM guest*.  That could lead to
> deadlocks and priority inversion if the client becomes busy, but depends on
> the local OSS to flush dirty data from RAM and the OSS cannot run in the VM
> because it doesn't have any RAM...
>
> If the client and OSS are BOTH run in VMs, or neither run in VMs, or only
> the client run in a VM, then that should be OK, but may have reduced
> performance due to the server contending with the client application.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
>
>
>
>
>
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OSS on compute node

2023-10-13 Thread Jeff Johnson
Skydiving with an anvil is *possible* ...but not advisable.

--Jeff


On Fri, Oct 13, 2023 at 10:21 AM Andreas Dilger via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> On Oct 13, 2023, at 20:58, Fedele Stabile 
> wrote:
>
>
> Hello everyone,
> We are in progress to integrate Lustre on our little HPC Cluster and we
> would like to know if it is possible to use the same node in a cluster to
> act as an OSS with disks and to also use it as a Compute Node and then
> install a Lustre Client.
> I know that the OSS server require a modified kernel so I suppose it can
> be installed in a virtual machine using kvm on a compute node.
>
>
> There isn't really a problem with running a client + OSS on the same node
> anymore, nor is there a problem with an OSS running inside a VM (if you
> have SR-IOV and enough CPU+RAM to run the server).
>
> *HOWEVER*, I don't think it would be good to have the client mounted on
> the *VM host*, and then run the OSS on a *VM guest*.  That could lead to
> deadlocks and priority inversion if the client becomes busy, but depends on
> the local OSS to flush dirty data from RAM and the OSS cannot run in the VM
> because it doesn't have any RAM...
>
> If the client and OSS are BOTH run in VMs, or neither run in VMs, or only
> the client run in a VM, then that should be OK, but may have reduced
> performance due to the server contending with the client application.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
>
>
>
>
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lnet errors

2023-10-05 Thread Jeff Johnson
I couldn't say exactly, but...

   - Your net is o2ib1. Is there an o2ib0?
   - Are you routing? If so, lnet routing or IB routing? Any issues with
   the routers or routing?
   - Verify the stability of lnet and the fabric path between client and
   server in the messages above using a tool like lnet_selftest.
   - Verify the fabric: check error counters on the switch and HCA ports
   involved. Use non-Lustre IB tools (ib_send_bw, etc.) to test the fabric.

Lustre can, and will, tell you when lnet issues arise, but it cannot tell
you anything about the actual network layer it is riding on, so it is
usually a good idea to verify the health of the network layer first before
delving into "what LBUG is ruining my weekend plans?"

I hope that helps,

--Jeff

(resent to list in hopes of being beneficial to others)

On Thu, Oct 5, 2023 at 9:34 AM Alastair Basden via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> Hi,
>
> Lustre 2.12.2.
>
> We are seeing lots of errors on the servers such as:
> Oct  5 11:16:48 oss04 kernel: LNetError:
> 6414:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending
> PUT to 12345-172.19.171.15@o2ib1: -125
> Oct  5 11:16:48 oss04 kernel: LustreError:
> 6414:0:(events.c:450:server_bulk_callback()) event type 5, status -125,
> desc 8fe066bb9400
>
> and
> Oct  4 14:59:48 oss04 kernel: LustreError:
> 6383:0:(events.c:305:request_in_callback()) event type 2, status -103,
> service ost_io
>
> and
> Oct  5 11:18:06 oss04 kernel: LustreError:
> 6388:0:(events.c:305:request_in_callback()) event type 2, status -5,
> service ost_io
> Oct  5 11:18:06 oss04 kernel: LNet:
> 6412:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from
> 172.19.171.15@o2ib1
>
> and on the clients:
> m7: Oct  5 14:46:59 m7132 kernel: LustreError:
> 2466:0:(events.c:200:client_bulk_callback()) event type 2, status -103,
> desc 9a251fc14400
>
> and
> m7: Oct  5 11:18:34 m7086 kernel: LustreError:
> 2495:0:(events.c:200:client_bulk_callback()) event type 2, status -5, desc
> 9a39ad668000
>
> Does anyone have any ideas about what could be causing this?
>
> Thanks,
> Alastair.
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL EMAIL] Re: [EXTERNAL EMAIL] Re: [EXTERNAL] No port 988?

2023-09-27 Thread Jeff Johnson
Nothing better than sliding in at the last moment to steal all the glory ;-)

—Jeff

On Wed, Sep 27, 2023 at 07:10 Jan Andersen  wrote:

> Hi Jeff,
>
> Yes, that was it! Things are working beautifully now - big thanks.
>
> /jan
>
> On 27/09/2023 15:07, Jeff Johnson wrote:
> > Any chance the firewall is running?
> >
> > You can use `lctl ping ipaddress@lnet` to check if you have functional
> > lnet between machines. Example `lctl ping 10.0.0.10@tcp`
> >
> > —Jeff
> >
> > On Wed, Sep 27, 2023 at 05:35 Jan Andersen  > <mailto:j...@comind.io>> wrote:
> >
> > However, it is still timing out when I try to mount on the oss. This
> is
> > the kernel module:
> >
> > [root@mds ~]# lsmod | grep lnet
> > lnet  704512  7
> mgs,obdclass,osp,ptlrpc,mgc,ksocklnd,mdt
> > libcfs266240  15
> >
>  
> fld,lnet,fid,lod,mdd,mgs,obdclass,osp,ptlrpc,mgc,ksocklnd,mdt,osd_ldiskfs,lquota,lfsck
> > sunrpc577536  2 lnet
> >
> > But it only listens on tcp6, which I don't use - is there a way to
> for
> > it to use tcp4?
> >
> > [root@mds ~]# netstat -nap | grep 988
> > tcp6   0  0 :::988  :::*
> > LISTEN  -
> >
> > /jan
> >
> > On 27/09/2023 10:15, Jan Andersen wrote:
> >  > Hi Rick,
> >  >
> >  > Very strange - when I started the vm this morning, 'modprobe lnet'
> >  > didn't return an error - and it seems to have loaded the module:
> >  >
> >  > [root@rocky8 ~]# lsmod | grep lnet
> >  > lnet  704512  0
> >  > libcfs266240  1 lnet
> >  > sunrpc577536  2 lnet
> >  >
> >  > Looking at the running kernel and the kernel source, they now
> > seem to be
> >  > the same version:
> >  >
> >  > [root@rocky8 ~]# ll /usr/src/kernels
> >  > total 4
> >  > drwxr-xr-x. 23 root root 4096 Sep 26 12:34
> > 4.18.0-477.27.1.el8_8.x86_64/
> >  > [root@rocky8 ~]# uname -r
> >  > 4.18.0-477.27.1.el8_8.x86_64
> >  >
> >  > - which would explain that it now works. Things were a bit hectic
> > with
> >  > other things yesterday afternoon, and I don't quite remember
> > installing
> >  > a new kernel, but it looks like I did. Hopefully this is my
> problem
> >  > solved, then - sorry for jumping up and down and making noise!
> >  >
> >  > /jan
> >  >
> >  > On 26/09/2023 18:13, Mohr, Rick wrote:
> >  >> What error do you get when you run "modprobe lnet"?
> >  >>
> >  >> --Rick
> >  >>
> > ___
> > lustre-discuss mailing list
> > lustre-discuss@lists.lustre.org  lustre-discuss@lists.lustre.org>
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> > <http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org>
> >
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL EMAIL] Re: [EXTERNAL] No port 988?

2023-09-27 Thread Jeff Johnson
Any chance the firewall is running?

You can use `lctl ping ipaddress@lnet` to check if you have functional lnet
between machines. Example `lctl ping 10.0.0.10@tcp`
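
If a firewall is in play, a quick check and (if needed) fix on an EL-family
box would look roughly like this, since Lustre listens on port 988 (adjust
zones/permanence to taste):

systemctl status firewalld
firewall-cmd --permanent --add-port=988/tcp
firewall-cmd --reload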

—Jeff

On Wed, Sep 27, 2023 at 05:35 Jan Andersen  wrote:

> However, it is still timing out when I try to mount on the oss. This is
> the kernel module:
>
> [root@mds ~]# lsmod | grep lnet
> lnet  704512  7 mgs,obdclass,osp,ptlrpc,mgc,ksocklnd,mdt
> libcfs266240  15
>
> fld,lnet,fid,lod,mdd,mgs,obdclass,osp,ptlrpc,mgc,ksocklnd,mdt,osd_ldiskfs,lquota,lfsck
> sunrpc577536  2 lnet
>
> But it only listens on tcp6, which I don't use - is there a way to for
> it to use tcp4?
>
> [root@mds ~]# netstat -nap | grep 988
> tcp6   0  0 :::988  :::*
> LISTEN  -
>
> /jan
>
> On 27/09/2023 10:15, Jan Andersen wrote:
> > Hi Rick,
> >
> > Very strange - when I started the vm this morning, 'modprobe lnet'
> > didn't return an error - and it seems to have loaded the module:
> >
> > [root@rocky8 ~]# lsmod | grep lnet
> > lnet  704512  0
> > libcfs266240  1 lnet
> > sunrpc577536  2 lnet
> >
> > Looking at the running kernel and the kernel source, they now seem to be
> > the same version:
> >
> > [root@rocky8 ~]# ll /usr/src/kernels
> > total 4
> > drwxr-xr-x. 23 root root 4096 Sep 26 12:34 4.18.0-477.27.1.el8_8.x86_64/
> > [root@rocky8 ~]# uname -r
> > 4.18.0-477.27.1.el8_8.x86_64
> >
> > - which would explain that it now works. Things were a bit hectic with
> > other things yesterday afternoon, and I don't quite remember installing
> > a new kernel, but it looks like I did. Hopefully this is my problem
> > solved, then - sorry for jumping up and down and making noise!
> >
> > /jan
> >
> > On 26/09/2023 18:13, Mohr, Rick wrote:
> >> What error do you get when you run "modprobe lnet"?
> >>
> >> --Rick
> >>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] How to eliminate zombie OSTs

2023-08-09 Thread Jeff Johnson
Alejandro,

Is your MGS located on the same node as your primary MDT? (combined MGS/MDT
node)

--Jeff

On Wed, Aug 9, 2023 at 9:46 AM Alejandro Sierra via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> Hello,
>
> In 2018 we implemented a lustre system 2.10.5 with 20 OSTs in two OSS
> with 4 jboxes, each box with 24 disks of 12 TB each, for a total of
> nearly 1 PB. In all that time we had power failures and failed raid
> controller cards, all of which made us adjust the configuration. After
> the last failure, the system keeps sending error messages about OSTs
> that are no more in the system. In the MDS I do
>
> # lctl dl
>
> and I get the 20 currently active OSTs
>
> oss01.lanot.unam.mx -   OST00   /dev/disk/by-label/lustre-OST
> oss01.lanot.unam.mx -   OST01   /dev/disk/by-label/lustre-OST0001
> oss01.lanot.unam.mx -   OST02   /dev/disk/by-label/lustre-OST0002
> oss01.lanot.unam.mx -   OST03   /dev/disk/by-label/lustre-OST0003
> oss01.lanot.unam.mx -   OST04   /dev/disk/by-label/lustre-OST0004
> oss01.lanot.unam.mx -   OST05   /dev/disk/by-label/lustre-OST0005
> oss01.lanot.unam.mx -   OST06   /dev/disk/by-label/lustre-OST0006
> oss01.lanot.unam.mx -   OST07   /dev/disk/by-label/lustre-OST0007
> oss01.lanot.unam.mx -   OST08   /dev/disk/by-label/lustre-OST0008
> oss01.lanot.unam.mx -   OST09   /dev/disk/by-label/lustre-OST0009
> oss02.lanot.unam.mx -   OST15   /dev/disk/by-label/lustre-OST000f
> oss02.lanot.unam.mx -   OST16   /dev/disk/by-label/lustre-OST0010
> oss02.lanot.unam.mx -   OST17   /dev/disk/by-label/lustre-OST0011
> oss02.lanot.unam.mx -   OST18   /dev/disk/by-label/lustre-OST0012
> oss02.lanot.unam.mx -   OST19   /dev/disk/by-label/lustre-OST0013
> oss02.lanot.unam.mx -   OST25   /dev/disk/by-label/lustre-OST0019
> oss02.lanot.unam.mx -   OST26   /dev/disk/by-label/lustre-OST001a
> oss02.lanot.unam.mx -   OST27   /dev/disk/by-label/lustre-OST001b
> oss02.lanot.unam.mx -   OST28   /dev/disk/by-label/lustre-OST001c
> oss02.lanot.unam.mx -   OST29   /dev/disk/by-label/lustre-OST001d
>
> but I also get 5 that are not currently active, in fact doesn't exist
>
>  28 IN osp lustre-OST0014-osc-MDT lustre-MDT-mdtlov_UUID 4
>  29 UP osp lustre-OST0015-osc-MDT lustre-MDT-mdtlov_UUID 4
>  30 UP osp lustre-OST0016-osc-MDT lustre-MDT-mdtlov_UUID 4
>  31 UP osp lustre-OST0017-osc-MDT lustre-MDT-mdtlov_UUID 4
>  32 UP osp lustre-OST0018-osc-MDT lustre-MDT-mdtlov_UUID 4
>
> When I try to eliminate them with
>
> lctl conf_param -P osp.lustre-OST0015-osc-MDT.active=0
>
> I get the error
>
> conf_param: invalid option -- 'P'
> set a permanent config parameter.
> This command must be run on the MGS node
> usage: conf_param [-d] 
>   -d  Remove the permanent setting.
>
> If I do
>
> lctl --device 28 deactivate
>
> I don't get an error, but nothing changes
>
> What can I do?
>
> Thank you in advance for any help.
>
> --
> Alejandro Aguilar Sierra
> LANOT, ICAyCC, UNAM
> _______
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre Installation Question: 158-c: Can't load module 'osd-zfs'

2023-06-30 Thread Jeff Johnson
Yao,

Glad you have the correct modules working now. Can you explain why you are
employing the virtual disk driver with osd-zfs?

Capacities with Lustre on ZFS are estimates, since ZFS has features like
compression that can change effective "capacity". As ZFS datasets are used
and filled as Lustre targets, their capacities will be reported more
accurately as objects are allocated to storage.

Running `lfs df -h` from a Lustre client will give a better view of Lustre
capacities from the client perspective.
Your use of /dev/vda may be adding obscurity, and I'm not sure why you
would be adding that.

--Jeff


On Thu, Jun 29, 2023 at 10:33 PM Yao Weng  wrote:

> Hi Jeff:
> Thank you very much ! I install lustre-zfs-dkms and lustre client can
> mount lustre filesystem. However, the size does not adds-up.
> I have
> - msg
>
>  sudo mkfs.lustre --mgs  --reformat --backfstype=zfs --fsname=lustre  
> lustre-mgs/mgs
> /dev/vda2
>
>
>Permanent disk data:
>
> Target: MGS
>
> Index:  unassigned
>
> Lustre FS:  lustre
>
> Mount type: zfs
>
> Flags:  0x64
>
>   (MGS first_time update )
>
> Persistent mount opts:
>
> Parameters:
>
> mkfs_cmd = zpool create -f -O canmount=off lustre-mgs /dev/vda2
>
> mkfs_cmd = zfs create -o canmount=off  lustre-mgs/mgs
>
>   xattr=sa
>
>   dnodesize=auto
>
> Writing lustre-mgs/mgs properties
>
>   lustre:version=1
>
>   lustre:flags=100
>
>   lustre:index=65535
>
>   lustre:fsname=lustre
>
>   lustre:svname=MGS
>
>
> - mdt
>
>  sudo mkfs.lustre --mdt --reformat --backfstype=zfs --fsname=lustre
> --index=0  --mgsnode=10.34.0.103@tcp0 lustre-mdt0/mdt0 /dev/vda2
>
>
>Permanent disk data:
>
> Target: lustre:MDT
>
> Index:  0
>
> Lustre FS:  lustre
>
> Mount type: zfs
>
> Flags:  0x61
>
>   (MDT first_time update )
>
> Persistent mount opts:
>
> Parameters: mgsnode=10.34.0.103@tcp
>
> mkfs_cmd = zpool create -f -O canmount=off lustre-mdt0 /dev/vda2
>
> mkfs_cmd = zfs create -o canmount=off  lustre-mdt0/mdt0
>
>   xattr=sa
>
>   dnodesize=auto
>
> Writing lustre-mdt0/mdt0 properties
>
>   lustre:mgsnode=10.34.0.103@tcp
>
>   lustre:version=1
>
>   lustre:flags=97
>
>   lustre:index=0
>
>   lustre:fsname=lustre
>
>   lustre:svname=lustre:MDT
>
>
> - ost
>
> sudo mkfs.lustre --ost --reformat --backfstype=zfs --fsname=lustre
> --index=0  --mgsnode=10.34.0.103@tcp0 lustre-ost0/ost0 /dev/vda2
>
>
>Permanent disk data:
>
> Target: lustre:OST
>
> Index:  0
>
> Lustre FS:  lustre
>
> Mount type: zfs
>
> Flags:  0x62
>
>   (OST first_time update )
>
> Persistent mount opts:
>
> Parameters: mgsnode=10.34.0.103@tcp
>
> mkfs_cmd = zpool create -f -O canmount=off lustre-ost0 /dev/vda2
>
> mkfs_cmd = zfs create -o canmount=off  lustre-ost0/ost0
>
>   xattr=sa
>
>   dnodesize=auto
>
>   recordsize=1M
>
> Writing lustre-ost0/ost0 properties
>
>   lustre:mgsnode=10.34.0.103@tcp
>
>   lustre:version=1
>
>   lustre:flags=98
>
>   lustre:index=0
>
>   lustre:fsname=lustre
>
>   lustre:svname=lustre:OST
>
>
>
> I have 51G for /dev/vda
>
> df -H /dev/vda2
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> devtmpfs 51G 0   51G   0% /dev
>
> On my client node,
>
> sudo mount -t lustre 10.34.0.103@tcp0:/lustre /mnt
>
> However, the size is only 25M, shouldn't it be 51G ?
>
> df -H /mnt
>
> Filesystem   Size  Used Avail Use% Mounted on
>
> 10.34.0.103@tcp:/lustre   25M  3.2M   19M  15% /mnt
>
> Thanks
> Yao
>
> On Wed, Jun 28, 2023 at 12:22 PM Jeff Johnson <
> jeff.john...@aeoncomputing.com> wrote:
>
>> Yao,
>>
>> Either add the required kernel-{devel,debuginfo} so the osd-ldiskfs
>> kernel modules can be built against your kernel or remove lustre-all-dkms
>> package and replace with lustre-zfs-dkms and build ZFS only Lustre modules.
>>
>> --Jeff
>>
>>
>> On Wed, Jun 28, 2023 at 8:45 AM Yao Weng  wrote:
>>
>>> I have error when installing lustre-all-dkms-2.15.3-1.el8.noarch
>>>
>>> Loading new lustre-all-2.15.3 DKMS files...
>>>
>>> Deprecated feature: REMAKE_INITRD (/usr/src/lustre-all-2.15.3/dkms.conf)
>>>
>>> Building for 4.18.0-477.15.1.el8_8.x86_64
>>> 4.18.0-477.10.1.el8_lustre.x86_64
>>>
>>> Building initial module for 4.18.0-477.15.1.el8_8.x86_64
>>>
>>> Deprecated feature: REMAK

Re: [lustre-discuss] Lustre Installation Question: 158-c: Can't load module 'osd-zfs'

2023-06-28 Thread Jeff Johnson
Yao,

Either add the required kernel-{devel,debuginfo} packages so the
osd-ldiskfs kernel modules can be built against your kernel, or remove the
lustre-all-dkms package, replace it with lustre-zfs-dkms, and build
ZFS-only Lustre modules.
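
Roughly speaking, and treating exact package names as distro/repo
dependent, that means one of:

# option 1: make osd-ldiskfs buildable
dnf install kernel-devel-$(uname -r) kernel-debuginfo-$(uname -r) \
    kernel-debuginfo-common-x86_64-$(uname -r)

# option 2: build ZFS-only modules
dnf remove lustre-all-dkms
dnf install lustre-zfs-dkms
dkms autoinstall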

--Jeff


On Wed, Jun 28, 2023 at 8:45 AM Yao Weng  wrote:

> I have error when installing lustre-all-dkms-2.15.3-1.el8.noarch
>
> Loading new lustre-all-2.15.3 DKMS files...
>
> Deprecated feature: REMAKE_INITRD (/usr/src/lustre-all-2.15.3/dkms.conf)
>
> Building for 4.18.0-477.15.1.el8_8.x86_64 4.18.0-477.10.1.el8_lustre.x86_64
>
> Building initial module for 4.18.0-477.15.1.el8_8.x86_64
>
> Deprecated feature: REMAKE_INITRD
> (/var/lib/dkms/lustre-all/2.15.3/source/dkms.conf)
>
> realpath: /var/lib/dkms/spl/2.1.12/source: No such file or directory
>
> realpath: /var/lib/dkms/spl/kernel-4.18.0-477.15.1.el8_8.x86_64-x86_64: No
> such file or directory
>
> configure: WARNING:
>
>
> Disabling ldiskfs support because complete ext4 source does not exist.
>
>
> If you are building using kernel-devel packages and require ldiskfs
>
> server support then ensure that the matching kernel-debuginfo-common
>
> and kernel-debuginfo-common- packages are installed.
>
>
> awk: fatal: cannot open file
> `/var/lib/dkms/lustre-all/2.15.3/build/_lpb/Makefile.compile.lustre' for
> reading (No such file or directory)
>
> ./configure: line 53751: test: too many arguments
>
> ./configure: line 53755: test: too many arguments
>
> Error!  Build of osd_ldiskfs.ko failed for: 4.18.0-477.15.1.el8_8.x86_64
> (x86_64)
>
> Make sure the name of the generated module is correct and at the root of
> the
>
> build directory, or consult make.log in the build directory
>
> /var/lib/dkms/lustre-all/2.15.3/build for more information.
>
> warning: %post(lustre-all-dkms-2.15.3-1.el8.noarch) scriptlet failed, exit
> status 7
>
>
> Error in POSTIN scriptlet in rpm package lustre-all-dkms
>
>
>
> On Wed, Jun 28, 2023 at 9:57 AM Yao Weng  wrote:
>
>> Thank Jeff:
>> My installation steps are
>>
>> step1: set local software provision (
>> https://wiki.lustre.org/Installing_the_Lustre_Software)
>> I download all rpm from
>>
>> https://downloads.whamcloud.com/public/lustre/lustre-2.15.3/el8.8/server
>>
>> https://downloads.whamcloud.com/public/lustre/lustre-2.15.3/el8.8/client
>>
>> https://downloads.whamcloud.com/public/e2fsprogs/1.47.0.wc2/el8
>>
>> step 2: Install the Lustre e2fsprogs distribution:
>>
>> sudo yum --nogpgcheck --disablerepo=* --enablerepo=e2fsprogs-wc install
>> e2fsprogs
>>
>>
>> step 3Install EPEL repository support:
>>
>> sudo yum -y install epel-release
>>
>>
>> step 4 Follow the instructions from the ZFS on Linux project
>> <https://openzfs.github.io/openzfs-docs/Getting%20Started/RHEL-based%20distro/index.html>
>>  to
>> install the ZFS YUM repository definition. Use the DKMS package repository
>> (the default)
>>
>> sudo dnf install https://zfsonlinux.org/epel/zfs-release-2-3$(rpm --eval
>> "%{dist}").noarch.rpm
>>
>>
>> step 5 Install the Lustre-patched kernel packages. Ensure that the Lustre
>> repository is picked for the kernel packages, by disabling the OS repos:
>>
>>
>> sudo yum --nogpgcheck --disablerepo=base,extras,updates \
>>
>> --enablerepo=lustre-server install \
>>
>> kernel \
>>
>> kernel-devel \
>>
>> kernel-headers \
>>
>> kernel-tools \
>>
>> kernel-tools-libs \
>>
>> kernel-tools-libs-devel
>>
>>
>> step 6 Generate a persistent hostid on the machine, if one does not
>> already exist. This is needed to help protect ZFS zpools against
>> simultaneous imports on multiple servers. For example:
>>
>> hid=`[ -f /etc/hostid ] && od -An -tx /etc/hostid|sed 's/ //g'`
>>
>> [ "$hid" = `hostid` ] || genhostid
>>
>>
>> step 7 reboot
>>
>> step 8 install lustre and zfs
>> sudo yum --skip-broken --nogpgcheck --enablerepo=lustre-server install \
>>  lustre-dkms \
>>  lustre-osd-zfs-mount \
>>  lustre \
>>  lustre-resource-agents \
>> lustre-dkms \
>>  zfs
>>
>> step 9 Load the Lustre and ZFS kernel modules to verify that the
>> software has installed correctly
>>
>> sudo modprobe -v zfs
>>
>> sudo modprobe -v lustre
>>
>> On Wed, Jun 28, 2023 at 1:04 AM Jeff Johnson <
>> jeff.john...@aeoncomputing.com> wrote:
>>
>>> Did you install t

Re: [lustre-discuss] Lustre Installation Question: 158-c: Can't load module 'osd-zfs'

2023-06-27 Thread Jeff Johnson
Did you install the Lustre server RPMs?
Your email lists both server and client repositories.

Are you using DKMS? Did you install and build the lustre-zfs-dkms or
lustre-all-dkms packages?

It doesn’t appear that you have any Lustre server kernel modules loaded,
which makes me suspect you didn’t install or build the server-side RPMs or
DKMS trees.



On Tue, Jun 27, 2023 at 21:41 Yao Weng via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> Hi:
> I follow https://wiki.lustre.org/Installing_the_Lustre_Software to
> install lustre.
>
> My kernel is
>
> $ uname -r
>
> 4.18.0-477.13.1.el8_8.x86_64
>
> I install
>
> https://downloads.whamcloud.com/public/lustre/lustre-2.15.3/el8.8/server
>
> https://downloads.whamcloud.com/public/lustre/lustre-2.15.3/el8.8/client
>
> https://downloads.whamcloud.com/public/e2fsprogs/1.47.0.wc2/el8
>
>
> lsmod | grep lustre
>
> *lustre*   1048576  0
>
> lmv   204800  1 *lustre*
>
> mdc   282624  1 *lustre*
>
> lov   344064  2 mdc,*lustre*
>
> ptlrpc   2490368  7 fld,osc,fid,lov,mdc,lmv,*lustre*
>
> obdclass 3633152  8 fld,osc,fid,ptlrpc,lov,mdc,lmv,*lustre*
>
> lnet  704512  7 osc,obdclass,ptlrpc,ksocklnd,lmv,*lustre*
>
> libcfs266240  11
> fld,lnet,osc,fid,obdclass,ptlrpc,ksocklnd,lov,mdc,lmv,
>
> *lustre*lsmod | grep zfs
>
> lsmod | grep zfs
>
> *zfs*  3887104  0
>
> zunicode  335872  1 *zfs*
>
> zzstd 512000  1 *zfs*
>
> zlua  176128  1 *zfs*
>
> zavl   16384  1 *zfs*
>
> icp   319488  1 *zfs*
>
> zcommon   102400  2 *zfs*,icp
>
> znvpair90112  2 *zfs*,zcommon
>
> spl   114688  6 *zfs*,icp,zzstd,znvpair,zcommon,zavl
>
>
> I am able to create mgs/mdt/ost
>
> But when I try to mount
>
> sudo mount.lustre lustre-mgs/mgs /lustre/mnt
>
> mount.lustre: mount lustre-mgs/mgs at /lustre/mnt failed: No such device
>
> Are the lustre modules loaded?
>
>  Check /etc/modprobe.conf and /proc/filesystems
>
> dmesg gives these error
>
> [76783.604090] LustreError: 158-c: Can't load module 'osd-zfs'
>
> [76783.606174] LustreError: 223535:0:(genops.c:361:class_newdev()) OBD:
> unknown type: osd-zfs
>
> [76783.607856] LustreError: 223535:0:(obd_config.c:620:class_attach())
> Cannot create device MGS-osd of type osd-zfs : -19
>
> [76783.609805] LustreError:
> 223535:0:(obd_mount.c:195:lustre_start_simple()) MGS-osd attach error -19
>
> [76783.611426] LustreError:
> 223535:0:(obd_mount_server.c:1993:server_fill_super()) Unable to start osd
> on lustre-mgs/mgs: -19
>
> [76783.613457] LustreError: 223535:0:(super25.c:183:lustre_fill_super())
> llite: Unable to mount : rc = -19
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] CentOS Stream 8/9 support?

2023-06-22 Thread Jeff Johnson
This has the makings of a significant enough impact that I don't think it
is a done deal. I'm sure someone in DC is calling someone at IBM. Even if
the USG does nothing, this is the kind of thing that EU regulators have
stomped on in the past.

I suspect this isn't open and shut...yet.

On Thu, Jun 22, 2023 at 11:35 AM Laura Hild via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> We have one, small Stream 8 cluster, which is currently running a Lustre
> client to which I cherry-picked a kernel compatibility patch.  I could
> imagine the effort being considerably more for the server component.  I
> also wonder, even if Whamcloud were to provide releases for Stream kernels,
> how many sites would be happy with Stream's five-year lifetimes.
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only

2023-06-21 Thread Jeff Johnson
Maybe someone else on the list can add clarity, but I don't believe a
recovery process on mount would keep the MDS read-only or trigger that
trace. Something else may be going on.

I would start from the ground up. Bring your servers up, unmounted. Ensure
lnet is loaded and configured properly. Test lnet using ping or
lnet_selftest from your MDS to all of your OSS nodes. Then mount your
combined MGS/MDT volume on the MDS and see what happens.
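
Something along these lines, where the device path and mountpoint are
placeholders and the NID is the OSS that shows up in your timeout messages:

modprobe lustre
lnetctl net show                      # confirm the o2ib NI is up
lctl ping 172.16.100.4@o2ib           # repeat for every OSS NID
mount -t lustre /dev/mapper/<mdt-device> /mnt/mdt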

Is your MDS in a high-availability pair?
What version of Lustre are you running?

...just a few things readers on the list might want to know.

--Jeff


On Wed, Jun 21, 2023 at 11:21 AM Mike Mosley 
wrote:

> Jeff,
>
> At this point we have the OSS shutdown.  We were coming back from. full
> outage and so we are trying to get the MDS up before starting to bring up
> the OSS.
>
> Mike
>
> On Wed, Jun 21, 2023 at 2:15 PM Jeff Johnson <
> jeff.john...@aeoncomputing.com> wrote:
>
>> Mike,
>>
>> Have you made sure the the o2ib interface on all of your Lustre servers
>> (MDS & OSS) are functioning properly? Are you able to `lctl ping
>> x.x.x.x@o2ib` successfully between MDS and OSS nodes?
>>
>> --Jeff
>>
>>
>> On Wed, Jun 21, 2023 at 10:08 AM Mike Mosley via lustre-discuss <
>> lustre-discuss@lists.lustre.org> wrote:
>>
>>> Rick,
>>> 172.16.100.4 is the IB address of one of the OSS servers.I
>>>  believe the mgt and mdt0 are the same target.   My understanding is
>>> that we have a single instanceof the MGT which is on the first MDT server
>>> i.e. it was created via a comand similar to:
>>>
>>> # mkfs.lustre --fsname=scratch --index=0 --mdt --mgs --replace /dev/sdb
>>>
>>> Does that make sense.
>>>
>>> On Wed, Jun 21, 2023 at 12:55 PM Mohr, Rick  wrote:
>>>
>>>> Which host is 172.16.100.4?  Also, are the mgt and mdt0 on the same
>>>> target or are they two separate targets just on the same host?
>>>>
>>>> --Rick
>>>>
>>>>
>>>> On 6/21/23, 12:52 PM, "Mike Mosley" >>> <mailto:mike.mos...@charlotte.edu>> wrote:
>>>>
>>>>
>>>> Hi Rick,
>>>>
>>>>
>>>> The MGS/MDS are combined. The output I posted is from the primary.
>>>>
>>>>
>>>>
>>>>
>>>> THanks,
>>>>
>>>>
>>>>
>>>>
>>>> Mike
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jun 21, 2023 at 12:27 PM Mohr, Rick >>> moh...@ornl.gov> <mailto:moh...@ornl.gov <mailto:moh...@ornl.gov>>>
>>>> wrote:
>>>>
>>>>
>>>> Mike,
>>>>
>>>>
>>>> It looks like the mds server is having a problem contacting the mgs
>>>> server. I'm guessing the mgs is a separate host? I would start by looking
>>>> for possible network problems that might explain the LNet timeouts. You can
>>>> try using "lctl ping" to test the LNet connection between nodes, and you
>>>> can also try regular "ping" between the IP addresses on the IB interfaces.
>>>>
>>>>
>>>> --Rick
>>>>
>>>>
>>>>
>>>>
>>>> On 6/21/23, 11:35 AM, "lustre-discuss on behalf of Mike Mosley via
>>>> lustre-discuss" >>> lustre-discuss-boun...@lists.lustre.org> <_blank> >>> lustre-discuss-boun...@lists.lustre.org >>> lustre-discuss-boun...@lists.lustre.org> <_blank>> on behalf of
>>>> lustre-discuss@lists.lustre.org <mailto:lustre-discuss@lists.lustre.org>
>>>> <_blank> <mailto:lustre-discuss@lists.lustre.org >>> lustre-discuss@lists.lustre.org> <_blank>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>> Greetings,
>>>>
>>>>
>>>>
>>>>
>>>> We have experienced some type of issue that is causing both of our MDS
>>>> servers to only be able to mount the mdt device in read only mode. Here are
>>>> some of the error messages we are seeing in the log files below. We lost
>>>> our Lustre expert a while back and we are not sure how to proceed to
>>>> troubleshoot this issue. Can anybody provide us guidance on how to proceed?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>

Re: [lustre-discuss] [EXTERNAL] MDTs will only mount read only

2023-06-21 Thread Jeff Johnson
kernel: [] ?
>> class_config_dump_handler+0x7e0/0x7e0 [obdclass]
>> Jun 20 15:12:14 hyd-mds1 kernel: []
>> mgc_process_config+0x88b/0x13f0 [mgc]
>> Jun 20 15:12:14 hyd-mds1 kernel: []
>> lustre_process_log+0x2d8/0xad0 [obdclass]
>> Jun 20 15:12:14 hyd-mds1 kernel: [] ?
>> libcfs_debug_msg+0x57/0x80 [libcfs]
>> Jun 20 15:12:14 hyd-mds1 kernel: [] ?
>> lprocfs_counter_add+0xf9/0x160 [obdclass]
>> Jun 20 15:12:14 hyd-mds1 kernel: []
>> server_start_targets+0x13a4/0x2a20 [obdclass]
>> Jun 20 15:12:14 hyd-mds1 kernel: [] ?
>> lustre_start_mgc+0x260/0x2510 [obdclass]
>> Jun 20 15:12:14 hyd-mds1 kernel: [] ?
>> class_config_dump_handler+0x7e0/0x7e0 [obdclass]
>> Jun 20 15:12:14 hyd-mds1 kernel: []
>> server_fill_super+0x10cc/0x1890 [obdclass]
>> Jun 20 15:12:14 hyd-mds1 kernel: []
>> lustre_fill_super+0x468/0x960 [obdclass]
>> Jun 20 15:12:14 hyd-mds1 kernel: [] ?
>> lustre_common_put_super+0x270/0x270 [obdclass]
>> Jun 20 15:12:14 hyd-mds1 kernel: []
>> mount_nodev+0x4f/0xb0
>> Jun 20 15:12:14 hyd-mds1 kernel: []
>> lustre_mount+0x38/0x60 [obdclass]
>> Jun 20 15:12:14 hyd-mds1 kernel: [] mount_fs+0x3e/0x1b0
>> Jun 20 15:12:14 hyd-mds1 kernel: []
>> vfs_kern_mount+0x67/0x110
>> Jun 20 15:12:14 hyd-mds1 kernel: [] do_mount+0x1ef/0xd00
>> Jun 20 15:12:14 hyd-mds1 kernel: [] ?
>> __check_object_size+0x1ca/0x250
>> Jun 20 15:12:14 hyd-mds1 kernel: [] ?
>> kmem_cache_alloc_trace+0x3c/0x200
>> Jun 20 15:12:14 hyd-mds1 kernel: [] SyS_mount+0x83/0xd0
>> Jun 20 15:12:14 hyd-mds1 kernel: []
>> system_call_fastpath+0x25/0x2a
>> Jun 20 15:13:14 hyd-mds1 kernel: LNet:
>> 4458:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for
>> 172.16.100.4@o2ib: 9 seconds
>> Jun 20 15:13:14 hyd-mds1 kernel: LNet:
>> 4458:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 239 previous
>> similar messages
>> Jun 20 15:14:14 hyd-mds1 kernel: INFO: task mount.lustre:4123 blocked for
>> more than 120 seconds.
>> Jun 20 15:14:14 hyd-mds1 kernel: "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Jun 20 15:14:14 hyd-mds1 kernel: mount.lustre D 9f27a3bc5230 0 4123 1
>> 0x0086
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> dumpe2fs seems to show that the file systems are clean i.e.
>>
>>
>>
>>
>>
>>
>>
>>
>> dumpe2fs 1.45.6.wc1 (20-Mar-2020)
>> Filesystem volume name: hydra-MDT
>> Last mounted on: /
>> Filesystem UUID: 3ae09231-7f2a-43b3-a4ee-7f36080b5a66
>> Filesystem magic number: 0xEF53
>> Filesystem revision #: 1 (dynamic)
>> Filesystem features: has_journal ext_attr resize_inode dir_index filetype
>> mmp flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink
>> quota
>> Filesystem flags: signed_directory_hash
>> Default mount options: user_xattr acl
>> Filesystem state: clean
>> Errors behavior: Continue
>> Filesystem OS type: Linux
>> Inode count: 2247671504
>> Block count: 1404931944
>> Reserved block count: 70246597
>> Free blocks: 807627552
>> Free inodes: 2100036536
>> First block: 0
>> Block size: 4096
>> Fragment size: 4096
>> Reserved GDT blocks: 1024
>> Blocks per group: 20472
>> Fragments per group: 20472
>> Inodes per group: 32752
>> Inode blocks per group: 8188
>> Flex block group size: 16
>> Filesystem created: Thu Aug 8 14:21:01 2019
>> Last mount time: Tue Jun 20 15:19:03 2023
>> Last write time: Wed Jun 21 10:43:51 2023
>> Mount count: 38
>> Maximum mount count: -1
>> Last checked: Thu Aug 8 14:21:01 2019
>> Check interval: 0 ()
>> Lifetime writes: 219 TB
>> Reserved blocks uid: 0 (user root)
>> Reserved blocks gid: 0 (group root)
>> First inode: 11
>> Inode size: 1024
>> Required extra isize: 32
>> Desired extra isize: 32
>> Journal inode: 8
>> Default directory hash: half_md4
>> Directory Hash Seed: 2e518531-82d9-4652-9acd-9cf9ca09c399
>> Journal backup: inode blocks
>> MMP block number: 1851467
>> MMP update interval: 5
>> User quota inode: 3
>> Group quota inode: 4
>> Journal features: journal_incompat_revoke
>> Journal size: 4096M
>> Journal length: 1048576
>> Journal sequence: 0x0a280713
>> Journal start: 0
>> MMP_block:
>> mmp_magic: 0x4d4d50
>> mmp_check_interval: 6
>> mmp_sequence: 0xff4d4d50
>> mmp_update_date: Wed Jun 21 10:43:51 2023
>> mmp_update_time: 1687358631
>> mmp_node_name: hyd-mds1.uncc.edu <_blank> <_blank>
>> mmp_device_name: dm-0
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] storing Lustre jobid in file xattrs: seeking feedback

2023-05-12 Thread Jeff Johnson
Just a thought: instead of embedding the jobname itself, perhaps store just
the least-significant seven characters of a SHA-1 hash of the jobname.
Small chance of collision, and easy to decode/cross-reference to the jobid
when needed.
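
Purely as an illustration, assuming something like Slurm's job name is
available in the environment and taking "least significant" to mean the
trailing hex digits:

echo -n "$SLURM_JOB_NAME" | sha1sum | awk '{print $1}' | tail -c 8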

--Jeff


On Fri, May 12, 2023 at 3:08 PM Andreas Dilger via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> Hi Thomas,
> thanks for working on this functionality and raising this question.
>
> As you know, I'm inclined toward the user.job xattr, but I think it is
> never a good idea to unilaterally make policy decisions in the kernel that
> cannot be changed.
>
> As such, it probably makes sense to have a tunable parameter like "
> mdt.*.job_xattr=user.job" and then this could be changed in the future if
> there is some conflict (e.g. some site already uses the "user.job" xattr
> for some other purpose).
>
> I don't think the job_xattr should allow totally arbitrary values (e.g.
> overwriting trusted.lov or trusted.lma or security.* would be bad). One
> option is to only allow a limited selection of valid xattr namespaces, and
> possibly names:
>
>- NONE to turn this feature off
>- user, or trusted or system (if admin wants to restrict the ability
>of regular users to change this value?), with ".job" added
>automatically
>- user.* (or trusted.* or system.*) to also allow specifying the xattr
>name
>
> If we allow the xattr name portion to be specified (which I'm not sure
> about, but putting it out for completeness), it should have some reasonable
> limits:
>
>- <= 7 characters long to avoid wasting valuable xattr space in the
>inode
>- should not conflict with other known xattrs, which is tricky if we
>allow the name to be arbitrary. Possibly if in trusted (and system?)
>it should only allow trusted.job to avoid future conflicts?
>- maybe restrict it to contain "job" (or maybe "pbs", "slurm", ...) to
>reduce the chance of namespace clashes in user or system? However, I'm
>reluctant to restrict names in user since this *shouldn't* have any
>fatal side effects (e.g. data corruption like in trusted or system),
>and the admin is supposed to know what they are doing...
>
>
> On May 4, 2023, at 15:53, Bertschinger, Thomas Andrew Hjorth via
> lustre-discuss  wrote:
>
> Hello Lustre Users,
>
> There has been interest in a proposed feature
> https://jira.whamcloud.com/browse/LU-13031 to store the jobid with each
> Lustre file at create time, in an extended attribute. An open question is
> which xattr namespace is to use between "user", the Lustre-specific
> namespace "lustre", "trusted", or even perhaps "system".
>
> The correct namespace likely depends on how this xattr will be used. For
> example, will interoperability with other filesystems be important?
> Different namespaces have their own limitations so the correct choice
> depends on the use cases.
>
> I'm looking for feedback on applications for this feature. If you have
> thoughts on how you could use this, please feel free to share them so that
> we design it in a way that meets your needs.
>
> Thanks!
>
> Tom Bertschinger
> LANL
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
>
>
>
>
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Disk failures triggered during OST creation and mounting on OSS Servers

2023-05-10 Thread Jeff Johnson
unt: Succeeded.
> May  9 13:36:09 sphnxoss47 kernel: LDISKFS-fs (md2): mounted filesystem
> with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
> Show less
> 11:03 AM
>
> -
>
> it just repeats for all of the md raids, then the errors start and the
> drive fails and is disabled:
>
> May  9 13:44:31 sphnxoss47 kernel: LustreError:
> 48069:0:(super25.c:176:lustre_fill_super()) llite: Unable to mount
> : rc = -110
> May  9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a):
> originator(PL), code(0x12), sub_code(0x011a)
> May  9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a):
> originator(PL), code(0x12), sub_code(0x011a)
> May  9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a):
> originator(PL), code(0x12), sub_code(0x011a)
> May  9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a):
> originator(PL), code(0x12), sub_code(0x011a)
> May  9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a):
> originator(PL), code(0x12), sub_code(0x011a)
> May  9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a):
> originator(PL), code(0x12), sub_code(0x011a)
> May  9 13:44:33 sphnxoss47 kernel: mpt3sas_cm1: log_info(0x3112011a):
> originator(PL), code(0x12), sub_code(0x011a)
> 
> 
> May  9 13:44:33 sphnxoss47 kernel: sd 16:0:31:0: [sdef] tag#1102 FAILED
> Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=1s
> May  9 13:44:33 sphnxoss47 kernel: sd 16:0:31:0: [sdef] tag#1102 CDB:
> Read(10) 28 00 00 00 87 79 00 00 01 00
> May  9 13:44:33 sphnxoss47 kernel: blk_update_request: I/O error, dev
> sdef, sector 277448 op 0x0:(READ) flags 0x84700 phys_seg 1 prio class 0
> May  9 13:44:33 sphnxoss47 kernel: sd 16:0:31:0: [sdef] tag#6800 FAILED
> Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=1s
> May  9 13:44:33 sphnxoss47 kernel: sd 16:0:31:0: [sdef] tag#6800 CDB:
> Read(10) 28 00 00 00 87 dd 00 00 01 00
> May  9 13:44:33 sphnxoss47 kernel: blk_update_request: I/O error, dev
> sdef, sector 278248 op 0x0:(READ) flags 0x84700 phys_seg 1 prio class 0
> May  9 13:44:33 sphnxoss47 kernel: device-mapper: multipath: 253:52:
> Failing path 128:112.
> May  9 13:44:33 sphnxoss47 multipathd[6051]: sdef: mark as failed
> May  9 13:44:33 sphnxoss47 multipathd[6051]: mpathae: remaining active
> paths: 1
> ...
> ...
> May  9 13:44:34 sphnxoss47 kernel: mpt3sas_cm0: log_info(0x3112011a):
> originator(PL), code(0x12), sub_code(0x011a)
> May  9 13:44:34 sphnxoss47 kernel: mpt3sas_cm0: log_info(0x3112011a):
> originator(PL), code(0x12), sub_code(0x011a)
> May  9 13:44:34 sphnxoss47 kernel: mpt3sas_cm0: log_info(0x3112011a):
> originator(PL), code(0x12), sub_code(0x011a)
> May  9 13:44:34 sphnxoss47 kernel: md: super_written gets error=-5
> May  9 13:44:34 sphnxoss47 kernel: md/raid:md8: Disk failure on dm-55,
> disabling device.
> May  9 13:44:34 sphnxoss47 kernel: md: super_written gets error=-5
> May  9 13:44:34 sphnxoss47 kernel: md/raid:md8: Operation continuing on
> 9 devices.
> May  9 13:44:34 sphnxoss47 multipathd[6051]: sdah: mark as failed
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL] Mounting lustre on block device

2023-03-16 Thread Jeff Johnson
If you *really* want a block device on a client that resides in Lustre, you
*could* create a file in Lustre and then make that file a loopback device
with losetup. Of course, your mileage will vary *a lot* based on use case,
access patterns, and underlying LFS configuration.

dd if=/dev/zero of=/my_lustre_mountpoint/some_subdir/big_raw_file bs=1048576 count=10
losetup -f /my_lustre_mountpoint/some_subdir/big_raw_file
*assuming loop0 is created*
some_fun_command /dev/loop0
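
And if the goal is an actual mountable filesystem on that device, you could
(again, not necessarily advisable) go one step further; /mnt/loop_demo here
is just a hypothetical mountpoint:

mkfs.xfs /dev/loop0
mount /dev/loop0 /mnt/loop_demo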

Disclaimer: Just because you *can* do this, doesn't necessarily mean it is
a good idea



On Thu, Mar 16, 2023 at 3:29 PM Mohr, Rick via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> Are you asking if you can mount Lustre on a client so that it shows up as
> a block device?  If so, the answer to that is you can't.  Lustre does not
> appear as a block device to the clients.
>
> -Rick
>
>
>
> On 3/16/23, 3:44 PM, "lustre-discuss on behalf of Shambhu Raje via
> lustre-discuss"  lustre-discuss-boun...@lists.lustre.org> on behalf of
> lustre-discuss@lists.lustre.org <mailto:lustre-discuss@lists.lustre.org>>
> wrote:
>
>
> When we mount a lustre file system on client, the lustre file system does
> not use block device on client side. Instead it uses virtual file system
> namespace. Mounting point will not be shown when we do 'lsblk'. As it only
> show on 'df-hT'.
>
>
> How can we mount lustre file system on block such that when we write
> something with lusterfs then it can be shown in block device??
> Can share command??
>
>
>
>
>
>
>
>
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Memory Management in Lustre

2022-01-19 Thread Jeff Johnson
Ellis,

I haven't messed with it much personally, but if you look at some of the
Lustre module parameters, for example in the obdclass module, you will see
some options that could be of interest, like lu_cache_percent.
I'm sure a Whamcloud person might chime in with more detail.

# modinfo obdclass
filename:   /lib/modules/3.10.0-957.27.2.el7.DPC.x86_64/extra/obdclass.ko.xz
license:GPL
version:2.12.2
description:Lustre Class Driver
author: OpenSFS, Inc. <http://www.lustre.org/>
alias:  fs-lustre
retpoline:  Y
rhelversion:7.6
srcversion: 3D7126D7BB611F089C67867
depends:libcfs,lnet,crc-t10dif
vermagic:   3.10.0-957.27.2.el7.DPC.x86_64 SMP mod_unload modversions
parm:   lu_cache_percent:Percentage of memory to be used as
lu_object cache (int)
parm:   lu_cache_nr:Maximum number of objects in lu_object cache (long)
parm:   lprocfs_no_percpu_stats:Do not alloc percpu data for
lprocfs stats (int)
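
If you want to experiment with it, the usual route is a module option; the
value below is just an example, not a recommendation:

# /etc/modprobe.d/lustre.conf on the servers, cap the lu_object cache at ~10% of RAM
options obdclass lu_cache_percent=10

The running value should also be readable under
/sys/module/obdclass/parameters/lu_cache_percent.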

--Jeff


On Wed, Jan 19, 2022 at 6:35 PM Ellis Wilson via lustre-discuss
 wrote:
>
> Hi folks,
>
> Broader (but related) question than my current malaise with OOM issues on 
> 2.14/2.15:  Is there any documentation or can somebody point me at some code 
> that explains memory management within Lustre?  I've hunted through Lustre 
> manuals, the Lustre internals doc, and a bunch of code, but can find nothing 
> that documents the memory architecture in place.  I'm specifically looking at 
> PTLRPC and OBD code right now, and I can't seem to find anywhere that 
> explicitly limits the amount of allocations Lustre will perform.  On other 
> filesystems I've worked on there are memory pools that you can explicitly 
> size with maxes, and while these may be discrete between areas or reference 
> counters used to leverage a system-shared pool, I expected to see /something/ 
> that might bake in limits of some kind.  I'm sure I'm just not finding it.  
> Any help is greatly appreciated.
>
> Best,
>
> ellis
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Unable to mount new OST

2021-07-06 Thread Jeff Johnson
pathd: sddy [128:0]: path added to devmap
> OST0051
> Jul  4 06:02:07 oss03 kernel: sd 15:0:0:92: [sddy] 34863054848 4096-byte
> logical blocks: (142 TB/129 TiB)
> Jul  4 06:02:07 oss03 kernel: sd 15:0:0:92: [sddy] Write Protect is off
> Jul  4 06:02:07 oss03 kernel: sd 15:0:0:92: [sddy] Write cache: enabled,
> read cache: enabled, supports DPO and FUA
> Jul  4 06:02:07 oss03 kernel: sd 15:0:0:92: [sddy] Attached SCSI disk
> Jul  4 06:02:07 oss03 multipathd: sddy: add path (uevent)
> Jul  4 06:02:07 oss03 multipathd: sddy [128:0]: path added to devmap
> OST0051
> Jul  4 06:25:54 oss03 kernel: sd 15:0:0:92: [sddy] 34863054848 4096-byte
> logical blocks: (142 TB/129 TiB)
> Jul  4 06:25:54 oss03 kernel: sd 15:0:0:92: [sddy] Write Protect is off
> Jul  4 06:25:54 oss03 kernel: sd 15:0:0:92: [sddy] Write cache: enabled,
> read cache: enabled, supports DPO and FUA
> Jul  4 06:25:54 oss03 kernel: sd 15:0:0:92: [sddy] Attached SCSI disk
> Jul  4 06:25:54 oss03 multipathd: sddy: add path (uevent)
> Jul  4 06:25:54 oss03 multipathd: sddy [128:0]: path added to devmap
> OST0051
> Jul  4 07:22:23 oss03 kernel: sd 15:0:0:92: [sddy] 34863054848 4096-byte
> logical blocks: (142 TB/129 TiB)
> Jul  4 07:22:23 oss03 kernel: sd 15:0:0:92: [sddy] Write Protect is off
> Jul  4 07:22:23 oss03 kernel: sd 15:0:0:92: [sddy] Write cache: enabled,
> read cache: enabled, supports DPO and FUA
> Jul  4 07:22:23 oss03 kernel: sd 15:0:0:92: [sddy] Attached SCSI disk
> Jul  4 07:22:23 oss03 multipathd: sddy: add path (uevent)
> Jul  4 07:22:23 oss03 multipathd: sddy [128:0]: path added to devmap
> OST0051
> Jul  6 07:59:41 oss03 kernel: sd 15:0:0:92: [sddy] 34863054848 4096-byte
> logical blocks: (142 TB/129 TiB)
> Jul  6 07:59:41 oss03 kernel: sd 15:0:0:92: [sddy] Write Protect is off
> Jul  6 07:59:41 oss03 kernel: sd 15:0:0:92: [sddy] Write cache: enabled,
> read cache: enabled, supports DPO and FUA
> Jul  6 07:59:41 oss03 kernel: sd 15:0:0:92: [sddy] Attached SCSI disk
> Jul  6 07:59:42 oss03 multipathd: sddy: add path (uevent)
> Jul  6 07:59:42 oss03 multipathd: sddy [128:0]: path added to devmap
> OST0051
>
> On Wed, Jul 7, 2021 at 7:24 AM Jeff Johnson <
> jeff.john...@aeoncomputing.com> wrote:
>
>> What devices are underneath dm-21 and are there any errors in
>> /var/log/messages for those devices? (assuming /dev/sdX devices underneath)
>>
>> Run `ls /sys/block/dm-21/slaves` to see what devices are beneath dm-21
>>
>>
>>
>>
>>
>> On Tue, Jul 6, 2021 at 20:09 David Cohen 
>> wrote:
>>
>>> Hi,
>>> The index of the OST is unique in the system and free for the new one,
>>> as it is increased by "1" for every new OST created, so whatever it
>>> converts to should not be relevant to it's refusal to mount, or am I
>>> mistaken?
>>>
>>> I'm pasting the log messages again, in case they were lost up the
>>> thread, adding the output of "fdisk -l", should the OST size be the issue:
>>>
>>> lctl dk show tens of thousands of lines repeating the same error after
>>> attempting to mount the OST:
>>>
>>> 0010:1000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
>>> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>>> 0010:1000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
>>> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>>> 0010:1000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
>>> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>>>
>>> in /var/log/messages I see the following corresponding to dm21 which is
>>> the new OST:
>>>
>>> Jul  6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21):
>>> ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected,
>>> please wait.
>>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled,
>>> maximum tree depth=5
>>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
>>> ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous
>>> mount: IO failure
>>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
>>> ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check.
>>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs
>>> with errors, running e2fsck is recommended
>>> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete
>>> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem
>>> w

Re: [lustre-discuss] Unable to mount new OST

2021-07-06 Thread Jeff Johnson
behalf of David Cohen" <
>> lustre-discuss-boun...@lists.lustre.org on behalf of
>> cda...@physics.technion.ac.il> wrote:
>>
>>
>>
>> Thanks Andreas,
>>
>> I'm aware that index 51 actually translates to hex 33
>> (local-OST0033_UUID).
>> I don't believe that's the reason for the failed mount as it is only an
>> index that I increase for every new OST and there are no duplicates.
>>
>>
>>
>> lctl dk show tens of thousands of lines repeating the same error after
>> attempting to mount the OST:
>>
>>
>>
>> 0010:1000:26.0:1625546374.322973:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
>> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>>
>> 0010:1000:26.0:1625546374.322974:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
>> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>>
>> 0010:1000:26.0:1625546374.322975:0:248211:0:(osd_scrub.c:2039:osd_ios_scan_one())
>> local-OST0033: fail to set LMA for init OI scrub: rc = -30
>>
>>
>>
>> in /var/log/messages I see the following corresponding to dm21 which is
>> the new OST:
>>
>>
>>
>> Jul  6 07:38:37 oss03 kernel: LDISKFS-fs warning (device dm-21):
>> ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected,
>> please wait.
>>
>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): file extents enabled,
>> maximum tree depth=5
>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
>> ldiskfs_clear_journal_err:4862: Filesystem error recorded from previous
>> mount: IO failure
>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs warning (device dm-21):
>> ldiskfs_clear_journal_err:4863: Marking fs in need of filesystem check.
>> Jul  6 07:39:19 oss03 kernel: LDISKFS-fs (dm-21): warning: mounting fs
>> with errors, running e2fsck is recommended
>> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): recovery complete
>> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): mounted filesystem with
>> ordered data mode. Opts:
>> user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
>> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs error (device dm-21):
>> htree_dirblock_to_tree:1278: inode #2: block 21233: comm mount.lustre: bad
>> entry in directory: rec_len is too small for name_len - offset=4084(4084),
>> inode=0, rec_len=12
>> , name_len=0
>> Jul  6 07:39:22 oss03 kernel: Aborting journal on device dm-21-8.
>> Jul  6 07:39:22 oss03 kernel: LDISKFS-fs (dm-21): Remounting filesystem
>> read-only
>> Jul  6 07:39:24 oss03 kernel: LDISKFS-fs warning (device dm-21):
>> kmmpd:187: kmmpd being stopped since filesystem has been remounted as
>> readonly.
>> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): error count since last
>> fsck: 6
>> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): initial error at time
>> 1625367384: htree_dirblock_to_tree:1278: inode 2: block 21233
>> Jul  6 07:44:22 oss03 kernel: LDISKFS-fs (dm-21): last error at time
>> 1625546362: htree_dirblock_to_tree:1278: inode 2: block 21233
>>
>> As I mentioned before mount never completes so the only way out of that
>> is force reboot.
>>
>> Thanks,
>> David
>>
>>
>>
>> On Tue, Jul 6, 2021 at 8:07 AM Andreas Dilger 
>> wrote:
>>
>>
>>
>>
>>
>> On Jul 5, 2021, at 09:05, David Cohen 
>> wrote:
>>
>>
>>
>> Hi,
>>
>> I'm using Lustre 2.10.5 and lately tried to add a new OST.
>>
>> The OST was formatted with the command below, which other than the index
>> is the exact same one used for all the other OSTs in the system.
>>
>>
>>
>> mkfs.lustre --reformat --mkfsoptions="-t ext4 -T huge" --ost
>> --fsname=local  --index=0051 --param ost.quota_type=ug
>> --mountfsoptions='errors=remount-ro,extents,mballoc' --mgsnode=10.0.0.3@tcp
>> --mgsnode=10.0.0.1@tcp --mgsnode=10.0.0.2@tcp --servicenode=10.0.0.3@tcp
>> --servicenode=10.0.0.1@tcp --servicenode=10.0.0.2@tcp /dev/mapper/OST0051
>>
>>
>>
>> Note that your "--index=0051" value is probably interpreted as an octal
>> number "41", it should be "--index=0x0051" or "--index=0x51" (hex, to match
>> the OST device name) or "--index=81" (decimal).
>>
>>
>>
>>
>>
>> When trying to mount the with:
>> mount.lustre /dev/mapper/OST0051 /Lustre/OST0051
>>
>>
>>
>> The system stays on 100% CPU (one core) forever and the mount never
>> completes, not even after a week.
>>
>>
>> I tried tunefs.lustre --writeconf --erase-params on the MDS and all the
>> other targets, but the behaviour remains the same.
>>
>>
>>
>> Cheers, Andreas
>>
>> --
>>
>> Andreas Dilger
>>
>> Lustre Principal Architect
>>
>> Whamcloud
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] zfs

2020-12-21 Thread Jeff Johnson
I was just popping a big bowl of popcorn for this... ;-D

On Mon, Dec 21, 2020 at 6:59 AM Peter Jones  wrote:

> Just in case anyone was wondering – the poster never did reach out to me
> so this does seem to be more of a case of phishing/trolling rather than
> someone being genuinely confused.
>
>
>
> *From: *Peter Jones 
> *Date: *Monday, December 14, 2020 at 5:03 AM
> *To: *Samantha Smith , "
> lustre-discuss@lists.lustre.org" 
> *Subject: *Re: [lustre-discuss] zfs
>
>
>
> Sam
>
>
>
> Welcome to the list. This is surprising for a number of reasons. Could you
> please reach out to me directly from your corporate account (rather than
> gmail) and I’ll be happy to work this through with you.
>
>
>
> Thanks
>
>
>
> Peter
>
>
>
> *From: *lustre-discuss  on
> behalf of Samantha Smith 
> *Date: *Sunday, December 13, 2020 at 5:35 PM
> *To: *"lustre-discuss@lists.lustre.org" 
> *Subject: *[lustre-discuss] zfs
>
>
>
> Our team received a demand letter from an Oracle attorney claiming patent
> violations on zfs used in our DDN lustre cluster.
>
>
>
> We called our DDN sales person who gave us a non-answer and has refused to
> call us back.
>
>
>
> How are other people dealing with this?
>
>
>
> sam
>
>
> _______
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] error mounting client

2020-08-25 Thread Jeff Johnson
Your output shows Infiniband NIDs (@o2ib). If you are mounting @tcp what is
your tcp access method to the Infiniband file system? Multihomed? lnet
router?
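
If it is the lnet router case, the usual shape of the config is something
like this (NIDs and interface names here are placeholders, not your values):

# on the router node
options lnet networks="tcp0(eth0),o2ib0(ib0)" forwarding=enabled

# on the tcp-only clients
options lnet networks="tcp0(eth0)" routes="o2ib0 192.168.8.10@tcp"

# on the IB-side servers
options lnet networks="o2ib0(ib0)" routes="tcp0 10.20.0.10@o2ib"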

--Jeff

On Tue, Aug 25, 2020 at 8:32 AM Peeples, Heath 
wrote:

> We have just build a 2.12.5 cluster.  When trying to mount the fs (via
> tcp).  I get the following errors.  Would anyone have an idea what  the
> problem might be?  Thanks in advance
>
>
>
>
>
> [10680.535157] LustreError: 15c-8: MGC192.168.8.8@tcp: The configuration
> from log 'ldata-client' failed (-2). This may be the result of
> communication errors between this node and the MGS, a bad configuration, or
> other errors. See the syslog for more information.
>
> [10680.883649] LustreError: 12634:0:(lov_obd.c:839:lov_cleanup())
> ldata-clilov-91b118df1000: lov tgt 0 not cleaned! deathrow=0, lovrc=1
>
> [10680.886610] LustreError: 12634:0:(lov_obd.c:839:lov_cleanup()) Skipped
> 4 previous similar messages
>
> [10680.890298] LustreError: 12634:0:(obd_config.c:610:class_cleanup())
> Device 9 not setup
>
> [10680.891816] Lustre: Unmounted ldata-client
>
> [10680.895178] LustreError: 12634:0:(obd_mount.c:1608:lustre_fill_super())
> Unable to mount  (-2)
>
> [10763.516841] LustreError: 12732:0:(ldlm_lib.c:494:client_obd_setup())
> can't add initial connection
>
> [10763.518368] LustreError: 12732:0:(obd_config.c:559:class_setup()) setup
> ldata-OST0006-osc-91b125029800 failed (-2)
>
> [10763.519806] LustreError:
> 12732:0:(obd_config.c:1835:class_config_llog_handler()) MGC192.168.8.8@tcp:
> cfg command failed: rc = -2
>
> [10763.522603] Lustre:cmd=cf003 0:ldata-OST0006-osc
> 1:ldata-OST0006_UUID  2:172.23.0.116@o2ib
>
>
>
> Heath
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Centos 7.7 upgrade

2020-06-01 Thread Jeff Johnson
Alastair,

Are you sure you have functioning ZFS modules for that kernel, and that
they are loaded? Are you able to see your zpools? Did you use DKMS for
either ZFS, Lustre or both? If so, what does `dkms status` report?
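
A quick way to narrow it down (the osd-zfs OBD type in that error maps to
the osd_zfs kernel module):

dkms status                       # should list zfs/spl and lustre built against the new kernel
modprobe -v zfs && zpool list     # are the pools visible/imported?
modprobe -v osd_zfs               # the module behind the "Can't load module 'osd-zfs'" errors
lsmod | egrep 'zfs|osd_zfs|lustre'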

--Jeff


On Mon, Jun 1, 2020 at 4:21 PM Alastair Basden 
wrote:

> Hi,
>
> We have just upgraded Lustre servers from 2.12.2 on centos 7.6 to 2.12.3
> on centos 7.7.
>
> The OSSs are on top of zfs (0.7.13 as recommended), and we are using
> 3.10.0-1062.1.1.el7_lustre.x86_64
>
> After the update, Lustre will no longer mount - and messages such as:
> Jun  2 00:02:44 hostname kernel: LustreError: 158-c: Can't load module
> 'osd-zfs'
> Jun  2 00:02:44 hostname kernel: LustreError: Skipped 875 previous similar
> messages
> Jun  2 00:02:44 hostname kernel: LustreError:
> 226253:0:(genops.c:397:class_newdev()) OBD: unknown type: osd-zfs
> Jun  2 00:02:44 hostname kernel: LustreError:
> 226265:0:(obd_config.c:403:class_attach()) Cannot create device
> lustfs-OST0006-osd of type osd-zfs : -19
> Jun  2 00:02:44 hostname kernel: LustreError:
> 226265:0:(obd_config.c:403:class_attach()) Skipped 881 previous similar
> messages
> Jun  2 00:02:44 hostname kernel: LustreError:
> 226265:0:(obd_mount.c:197:lustre_start_simple()) lustfs-OST0006-osd attach
> error -19
> Jun  2 00:02:44 hostname kernel: LustreError:
> 226265:0:(obd_mount.c:197:lustre_start_simple()) Skipped 881 previous
> similar messages
> Jun  2 00:02:44 hostname kernel: LustreError:
> 226265:0:(obd_mount_server.c:1947:server_fill_super()) Unable to start osd
> on lustfs-ost6/ost6: -19
> Jun  2 00:02:44 hostname kernel: LustreError:
> 226265:0:(obd_mount_server.c:1947:server_fill_super()) Skipped 881 previous
> similar messages
> Jun  2 00:02:44 hostname kernel: LustreError:
> 226265:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-19)
> Jun  2 00:02:44 hostname kernel: LustreError:
> 226265:0:(obd_mount.c:1608:lustre_fill_super()) Skipped 881 previous
> similar messages
> Jun  2 00:02:44 hostname kernel: LustreError:
> 226253:0:(genops.c:397:class_newdev()) Skipped 887 previous similar messages
>
> Does anyone have any ideas?
>
> Thanks,
> Alastair.
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Any way to dump Lustre quota data?

2019-09-05 Thread Jeff Johnson
Kevin,

There are files in /proc/fs/lustre/qmt/yourfsname-QMT/ that you can
pull it all from, keyed by UID and GID. Look for files like md-0x0/glb-usr
and dt-0x0/glb-usr, plus the files in
/proc/fs/lustre/osd-zfs/yourfsname-MDT/quota_slave.

I’m not in front of a keyboard, I’m cooking breakfast but I’ll follow up
with the exact files. You can cat them and maybe find what you’re looking
for.
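
From memory, something along these lines pulls most of it in one shot (exact
parameter paths vary a bit by version and target naming, so treat it as a
sketch):

# on the MDS (quota master): global limits/grants for users and groups,
# for the data (dt) and metadata (md) pools
lctl get_param qmt.*.dt-0x0.glb-usr qmt.*.dt-0x0.glb-grp
lctl get_param qmt.*.md-0x0.glb-usr qmt.*.md-0x0.glb-grp

# on each server: per-target usage accounting from the quota slaves
lctl get_param osd-*.*.quota_slave.acct_user
lctl get_param osd-*.*.quota_slave.acct_group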

—Jeff

On Thu, Sep 5, 2019 at 05:07 Kevin M. Hildebrand  wrote:

> Is there any way to dump the Lustre quota data in its entirety, rather
> than having to call 'lfs quota' individually for each user, group, and
> project?
>
> I'm currently doing this on a regular basis so we can keep graphs of how
> users and groups behave over time, but it's problematic for two reasons:
> 1.  Getting a comprehensive list of users and groups to iterate over is
> difficult- sure I can use the passwd/group files, but if a user has been
> deleted there may still be files owned by a now orphaned userid or groupid
> which I won't see.  We may also have thousands of users in the passwd file
> that don't have files on a particular Lustre filesystem, and doing lfs
> quota calls for those users wastes time.
> 2.  Calling lfs quota hundreds of times for each of the users, groups, and
> projects takes a while.  This reduces my ability to collect the data at the
> frequency I want.  Ideally I'd like to be able to collect every minute or
> so.
>
> I have two different Lustre installations, one running 2.8.0 with ldiskfs,
> the other running 2.10.8 with ZFS.
>
> Thanks,
> Kevin
>
> --
> Kevin Hildebrand
> University of Maryland
> Division of IT
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Wanted: multipath.conf for dell ME4 series arrays

2019-08-21 Thread Jeff Johnson
Andrew,

ME4084 is a dual-controller active/active hardware RAID array. Disclosing
some config data could be helpful.

   1. What underlying Lustre target filesystem? (assuming ldiskfs with a
   hardware RAID array)
   2. What does your current multipath.conf look like?

--Jeff

On Tue, Aug 20, 2019 at 11:47 PM Andrew Elwell 
wrote:

> Hi folks,
>
> we're seeing MMP reluctance to hand over the (umounted) OSTs to the
> partner pair on our shiny new ME4084 arrays,
>
> Does anyone have the device {} settings they'd be willing to share?
> My gut feel is we've not defined path failover properly and some
> timeouts need tweaking
>
>
> (4* ME4084's per 2 740 servers with SAS cabling, Lustre 2.10.8 and CentOS
> 7.x)
>
> Many thanks
>
>
> Andrew
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson at aeoncomputing dot com
www.aeoncomputing.com
4170 Morena Boulevard, Suite C - San Diego, CA 92117
High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors

2019-07-04 Thread Jeff Johnson
If you only have those two processor models to choose from I’d do the 5217
for MDS and 5218 for OSS. If you were using ZFS for a backend definitely
the 5218 for the OSS. With ZFS your processors are also your RAID
controller so you have the disk i/o, parity calculation, checksums and ZFS
threads on top of the Lustre i/o and OS processes.

—Jeff

On Thu, Jul 4, 2019 at 13:30 Simon Legrand  wrote:

> Hello Jeff,
>
> Thanks for your quick answer. We plan to use ldiskfs, but I would be
> interested to know what could fit for zfs.
>
> Simon
>
> ------
>
> *De: *"Jeff Johnson" 
> *À: *"Simon Legrand" 
> *Cc: *"lustre-discuss" 
> *Envoyé: *Jeudi 4 Juillet 2019 20:40:40
> *Objet: *Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors
>
> Simon,
>
> Which backend do you plan on using? ldiskfs or zfs?
>
> —Jeff
>
> On Thu, Jul 4, 2019 at 10:41 Simon Legrand  wrote:
>
>> Dear all,
>>
>> We are currently configuring a Lustre filesystem and facing a dilemma. We
>> have the choice between two types of processors for an OSS and a MDS.
>> - Intel Xeon Gold 5217 3GHz, 11M Cache,10.40GT/s, 2UPI, Turbo, HT,8C/16T
>> (115W) - DDR4-2666
>> - Intel Xeon Gold 5218 2.3GHz, 22M Cache,10.40GT/s, 2UPI, Turbo,
>> HT,16C/32T (105W) - DDR4-2666
>>
>> Basically, we have to choose between freequency and number of cores.
>> Our current architecture is the following:
>> - 1MDS with 11To SDD
>> - 3 OSS/OST (~ 3*80To)
>> Our final target is 6 OSS/OST with a single MDS.
>> Do anyone of you could help us to choose and explain us the reasons?
>>
>> Best regards,
>>
>> Simon
>> _______
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
> --
> --
> Jeff Johnson
> Co-Founder
> Aeon Computing
>
> jeff.john...@aeoncomputing.com
> www.aeoncomputing.com
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
>
> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>
> --
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Frequency vs Cores for OSS/MDS processors

2019-07-04 Thread Jeff Johnson
Simon,

Which backend do you plan on using? ldiskfs or zfs?

—Jeff

On Thu, Jul 4, 2019 at 10:41 Simon Legrand  wrote:

> Dear all,
>
> We are currently configuring a Lustre filesystem and facing a dilemma. We
> have the choice between two types of processors for an OSS and a MDS.
> - Intel Xeon Gold 5217 3GHz, 11M Cache,10.40GT/s, 2UPI, Turbo, HT,8C/16T
> (115W) - DDR4-2666
> - Intel Xeon Gold 5218 2.3GHz, 22M Cache,10.40GT/s, 2UPI, Turbo,
> HT,16C/32T (105W) - DDR4-2666
>
> Basically, we have to choose between freequency and number of cores.
> Our current architecture is the following:
> - 1MDS with 11To SDD
> - 3 OSS/OST (~ 3*80To)
> Our final target is 6 OSS/OST with a single MDS.
> Do anyone of you could help us to choose and explain us the reasons?
>
> Best regards,
>
> Simon
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Issue updating lustre from 2.10.6 to 2.10.7

2019-04-12 Thread Jeff Johnson
Kurt,

I see you're using dkms. Take a look at `dkms status` and ensure that there
are no lingering installs from the previous version. Sometimes the older
version doesn't get uninstalled from /lib/modules/`uname -r`/ and the dkms
install process for the new version doesn't overwrite them.
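
Roughly what I would check and clean, if it comes to that (the dkms module
name and version strings are whatever `dkms status` actually reports on your
node, the ones below are only examples):

dkms status
dkms remove lustre-zfs/2.10.6 --all                  # only if an old entry is still listed
find /lib/modules/$(uname -r)/extra -name '*.ko*'    # look for stale lustre/fld/ptlrpc modules
depmod -a
modprobe -v lustre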

--Jeff


On Fri, Apr 12, 2019 at 8:34 AM Kurt Strosahl  wrote:

> Good Morning,
>
>
>I've encountered an issue updating lustre from 2.10.6 to 2.10.7 on my
> metadata system.
>
>
> I installed the updated RPMs using whamcloud's yum repository, but when I
> run modprobe -v lustre I get the following errors:
>
> insmod /lib/modules/3.10.0-957.1.3.el7_lustre.x86_64/extra/libcfs.ko.xz
> insmod /lib/modules/3.10.0-957.1.3.el7_lustre.x86_64/extra/lnet.ko.xz
> networks=o2ib0(bond0)
> insmod /lib/modules/3.10.0-957.1.3.el7_lustre.x86_64/extra/obdclass.ko.xz
> insmod /lib/modules/3.10.0-957.1.3.el7_lustre.x86_64/extra/ptlrpc.ko.xz
> insmod /lib/modules/3.10.0-957.1.3.el7_lustre.x86_64/extra/fld.ko.xz
> modprobe: ERROR: could not insert 'lustre': Invalid argument
>
> dmesg shows the following:
>
> [ 377.981515] fld: disagrees about version of symbol class_export_put
> [ 377.981518] fld: Unknown symbol class_export_put (err -22)
> [ 377.981529] fld: disagrees about version of symbol
> req_capsule_server_pack
> [ 377.981531] fld: Unknown symbol req_capsule_server_pack (err -22)
> [ 377.981537] fld: disagrees about version of symbol req_capsule_set_size
> [ 377.981538] fld: Unknown symbol req_capsule_set_size (err -22)
> [ 377.981542] fld: disagrees about version of symbol req_capsule_client_get
> [ 377.981543] fld: Unknown symbol req_capsule_client_get (err -22)
> [ 377.981555] fld: disagrees about version of symbol lu_env_init
> [ 377.981556] fld: Unknown symbol lu_env_init (err -22)
> [ 377.981562] fld: disagrees about version of symbol ptlrpc_queue_wait
> [ 377.981563] fld: Unknown symbol ptlrpc_queue_wait (err -22)
> [ 377.981571] fld: disagrees about version of symbol lu_context_key_get
> [ 377.981573] fld: Unknown symbol lu_context_key_get (err -22)
> [ 377.981604] fld: disagrees about version of symbol lu_env_fini
> [ 377.981606] fld: Unknown symbol lu_env_fini (err -22)
> [ 377.981611] fld: disagrees about version of symbol
> lu_context_key_degister
> [ 377.981612] fld: Unknown symbol lu_context_key_degister (err -22)
> [ 377.981623] fld: disagrees about version of symbol class_exp2cliimp
> [ 377.981625] fld: Unknown symbol class_exp2cliimp (err -22)
> [ 377.981631] fld: disagrees about version of symbol req_capsule_set
> [ 377.981632] fld: Unknown symbol req_capsule_set (err -22)
>
> I've tried uninstalling and reinstalling the RPMs but this is always the
> state I end up in.
>
>
> I'm using zfs for the back end, and those modules work after the upgrade
>
>
> Here are the RPMs currently installed.
>
> perf-3.10.0-957.1.3.el7_lustre.x86_64
> lustre-resource-agents-2.10.7-1.el7.x86_64
> lustre-osd-zfs-mount-2.10.7-1.el7.x86_64
> kernel-headers-3.10.0-957.1.3.el7_lustre.x86_64
> lustre-2.10.7-1.el7.x86_64
> kernel-3.10.0-957.1.3.el7_lustre.x86_64
> perf-debuginfo-3.10.0-957.el7_lustre.x86_64
> lustre-zfs-dkms-2.10.7-1.el7.noarch
> kernel-debuginfo-common-x86_64-3.10.0-957.el7_lustre.x86_64
> kernel-devel-3.10.0-957.1.3.el7_lustre.x86_64
> bpftool-3.10.0-957.el7_lustre.x86_64
>
>
> Thank you,
>
> Kurt J. Strosahl
> System Administrator: Lustre, HPC
> Scientific Computing Group, Thomas Jefferson National Accelerator Facility
> _______
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Full OST, any way of avoiding it without hanging?

2019-01-05 Thread Jeff Johnson
Jason,

If there are files located on the full OST that you can delete, then delete
them without trying to deactivate the OST. Otherwise, there is a process for
draining it: use lfs to find the files whose objects live on the full OST,
deactivate that OST so no new objects are allocated to it, copy each file to
a new file with some suffix (.new perhaps) so the copy lands on other OSTs,
delete the original, rename the copy back to the original name (removing the
suffix), and then reactivate the OST. You might be lucky enough to find a
few very large single-stripe files located on that OST where moving just
those files gets you easy gains.
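
A rough sketch of that flow, assuming a filesystem named "lustre", the full
OST at index 0x2, and a single MDT (all placeholders to adapt). Setting
max_create_count=0 on the MDS stops new objects from landing on that OST
while unlinks still get processed:

# on a client: list regular files with at least one object on the full OST
lfs find /mnt/lustre --ost 2 --type f > /tmp/on_ost0002.txt

# on the MDS: stop allocating new objects to that OST
lctl set_param osp.lustre-OST0002-osc-MDT0000.max_create_count=0

# on a client: re-copy each file so its objects land on other OSTs, then swap names
while read -r f; do
    cp -a "$f" "$f.new" && mv "$f.new" "$f"
done < /tmp/on_ost0002.txt

# on the MDS: restore allocation once the OST has breathing room (20000 is the usual default)
lctl set_param osp.lustre-OST0002-osc-MDT0000.max_create_count=20000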

You *should* be able to delete a file that resides on an inactive OST but
I’m not over a keyboard where I can do it before making the assertion it is
possible.

—Jeff


On Sat, Jan 5, 2019 at 18:49 Jason Williams  wrote:

> Hello,
>
>
> We have a lustre system (version 2.10.4) that has unfortunately fallen
> victim to a 100% full OST... Every time we clear some space on it, the
> system fills it right back up again.
>
>
> I have looked around the internet and found you can disable an OST, but
> when I have tried that, any writes (including deletes) to the OST hang the
> clients indefinitely.  Does anyone know a way to make an OST basically
> "read-only" with the exception of deletes so we can work to clear out the
> OST?  Or better yet, a way to "drain" or move files off an OST with a
> script (keeping in mind it might not be known if the files are in use at
> the time).  Or even a way to tell lustre "Hey don't write any new data
> here, but reading and removing data is OK."
>
>
>
>
> --
> Jason Williams
> Assistant Director
> Systems and Data Center Operations.
> Maryland Advanced Research Computing Center (MARCC)
> Johns Hopkins University
> jas...@jhu.edu
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre Sizing

2018-12-31 Thread Jeff Johnson
Very forward versions...especially on ZFS.

You build OST volumes in a pool. If no other volumes are defined in a pool
then 100% of that pool will be available for the OST volume, but the way ZFS
works, the capacity doesn’t really belong to the OST volume until blocks are
allocated for writes. So you have a pool of a known size and you’re the
admin. As long as nobody else can create a ZFS volume in that pool, all of
the capacity in that pool will eventually go to the OST as new writes occur.
Keep in mind that the same pool can also contain snapshots (if created), so
the pool is a "potential capacity" that could be concurrently allocated to
OST volume writes, snapshots, and other ZFS volumes (if created).
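
If you want to confirm nothing else is quietly drawing on a pool, something
like this (pool name taken from your `zpool list` output) shows every
dataset and snapshot in it:

zfs list -r -o name,used,available,referenced lustre-data
zfs list -t snapshot -r lustre-data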

—Jeff



On Mon, Dec 31, 2018 at 22:20 ANS  wrote:

> Thanks Jeff. Currently i am using
>
> modinfo zfs | grep version
> version:0.8.0-rc2
> rhelversion:7.4
>
> lfs --version
> lfs 2.12.0
>
> And this is a fresh install. So is there any other possibility to show the
> complete zpool lun has been allocated for lustre alone.
>
> Thanks,
> ANS
>
>
>
> On Tue, Jan 1, 2019 at 11:44 AM Jeff Johnson <
> jeff.john...@aeoncomputing.com> wrote:
>
>> ANS,
>>
>> Lustre on top of ZFS has to estimate capacities and it’s fairly off when
>> the OSTs are new and empty. As objects are written to OSTs and capacity is
>> consumed it gets the sizing of capacity more accurate. At the beginning
>> it’s so off that it appears to be an error.
>>
>> What version are you running? Some patches have been added to make this
>> calculation more accurate.
>>
>> —Jeff
>>
>> On Mon, Dec 31, 2018 at 22:08 ANS  wrote:
>>
>>> Dear Team,
>>>
>>> I am trying to configure lustre with backend ZFS as file system with 2
>>> servers in HA. But after compiling and creating zfs pools
>>>
>>> zpool list
>>> NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAGCAP
>>> DEDUPHEALTH  ALTROOT
>>> lustre-data   54.5T  25.8M  54.5T- 16.0E 0% 0%
>>> 1.00xONLINE  -
>>> lustre-data1  54.5T  25.1M  54.5T- 16.0E 0% 0%
>>> 1.00xONLINE  -
>>> lustre-data2  54.5T  25.8M  54.5T- 16.0E 0% 0%
>>> 1.00xONLINE  -
>>> lustre-data3  54.5T  25.8M  54.5T- 16.0E 0% 0%
>>> 1.00xONLINE  -
>>> lustre-meta832G  3.50M   832G- 16.0E 0% 0%
>>> 1.00xONLINE  -
>>>
>>> and when mounted to client
>>>
>>> lfs df -h
>>> UUID   bytesUsed   Available Use% Mounted on
>>> home-MDT_UUID 799.7G3.2M  799.7G   0%
>>> /home[MDT:0]
>>> home-OST_UUID  39.9T   18.0M   39.9T   0%
>>> /home[OST:0]
>>> home-OST0001_UUID  39.9T   18.0M   39.9T   0%
>>> /home[OST:1]
>>> home-OST0002_UUID  39.9T   18.0M   39.9T   0%
>>> /home[OST:2]
>>> home-OST0003_UUID  39.9T   18.0M   39.9T   0%
>>> /home[OST:3]
>>>
>>> filesystem_summary:   159.6T   72.0M  159.6T   0% /home
>>>
>>> So out of total 54.5TX4=218TB i am getting only 159 TB usable. So can
>>> any one give the information regarding this.
>>>
>>> Also from performance prospective what are the zfs and lustre parameters
>>> to be tuned.
>>>
>>> --
>>> Thanks,
>>> ANS.
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>> --
>> --
>> Jeff Johnson
>> Co-Founder
>> Aeon Computing
>>
>> jeff.john...@aeoncomputing.com
>> www.aeoncomputing.com
>> t: 858-412-3810 x1001   f: 858-412-3845
>> m: 619-204-9061
>>
>> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
>>
>> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>>
>
>
> --
> Thanks,
> ANS.
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre Sizing

2018-12-31 Thread Jeff Johnson
ANS,

Lustre on top of ZFS has to estimate capacities and it’s fairly off when
the OSTs are new and empty. As objects are written to OSTs and capacity is
consumed it gets the sizing of capacity more accurate. At the beginning
it’s so off that it appears to be an error.

What version are you running? Some patches have been added to make this
calculation more accurate.

—Jeff

On Mon, Dec 31, 2018 at 22:08 ANS  wrote:

> Dear Team,
>
> I am trying to configure lustre with backend ZFS as file system with 2
> servers in HA. But after compiling and creating zfs pools
>
> zpool list
> NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAGCAP  DEDUP
>   HEALTH  ALTROOT
> lustre-data   54.5T  25.8M  54.5T- 16.0E 0% 0%  1.00x
>   ONLINE  -
> lustre-data1  54.5T  25.1M  54.5T- 16.0E 0% 0%  1.00x
>   ONLINE  -
> lustre-data2  54.5T  25.8M  54.5T- 16.0E 0% 0%  1.00x
>   ONLINE  -
> lustre-data3  54.5T  25.8M  54.5T- 16.0E 0% 0%  1.00x
>   ONLINE  -
> lustre-meta832G  3.50M   832G- 16.0E 0% 0%  1.00x
>   ONLINE  -
>
> and when mounted to client
>
> lfs df -h
> UUID   bytesUsed   Available Use% Mounted on
> home-MDT_UUID 799.7G3.2M  799.7G   0% /home[MDT:0]
> home-OST_UUID  39.9T   18.0M   39.9T   0% /home[OST:0]
> home-OST0001_UUID  39.9T   18.0M   39.9T   0% /home[OST:1]
> home-OST0002_UUID  39.9T   18.0M   39.9T   0% /home[OST:2]
> home-OST0003_UUID  39.9T   18.0M   39.9T   0% /home[OST:3]
>
> filesystem_summary:   159.6T   72.0M  159.6T   0% /home
>
> So out of total 54.5TX4=218TB i am getting only 159 TB usable. So can any
> one give the information regarding this.
>
> Also from performance prospective what are the zfs and lustre parameters
> to be tuned.
>
> --
> Thanks,
> ANS.
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Can we install lustre on centos 7.5

2018-09-23 Thread Jeff Johnson
Shirshak,

Lustre 2.10.5 can be installed on RHEL/CentOS 7.5. Documentation can be
found in html, pdf and wiki formats at www.lustre.org

--Jeff

On Sun, Sep 23, 2018 at 8:09 AM shirshak bajgain 
wrote:

> I head there is problem in centos 7.5 kernal. It took around 10-20 fresh
> installation and the fact is there is no proper documentation of lusture
> installation as intel version is not working any more. Is there anyway to
> install lustre in centos 7.5? And where is proper guide for lusture
> installation.
>
> Thanks
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] rhel 7.5

2018-04-30 Thread Jeff Johnson
The ChangeLog in 2.11.59 doesn’t mention a 7.5 server kernel. You can
reference lustre/ChangeLog in the various tags in the Lustre git repo.

The official support matrix is here:
https://wiki.hpdd.intel.com/plugins/servlet/mobile?contentId=8126580#content/view/8126580

—Jeff

On Mon, Apr 30, 2018 at 11:01 Hebenstreit, Michael <
michael.hebenstr...@intel.com> wrote:

> I have 2.11 already running on a 7.5 clone (client only)
>
> -Original Message-
> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On
> Behalf Of Michael Di Domenico
> Sent: Monday, April 30, 2018 11:49 AM
> Cc: lustre-discuss <lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] rhel 7.5
>
> On Mon, Apr 30, 2018 at 10:09 AM, Jeff Johnson <
> jeff.john...@aeoncomputing.com> wrote:
> > RHEL 7.5 support comes in Lustre 2.10.4. Only path I can think of off
> > the top of my head is to git clone and build a 2.10.4 prerelease and
> > live on the bleeding edge. I’m not sure if all of the 7.5 work is
> > finished in the current prerelease or not.
>
> argh...  not sure i want to be that bleeding edge...  sadly, i can't find
> a release schedule for 2.10.4.  i wonder if 2.11 will work
>
>
>
> > —Jeff
> >
> > On Mon, Apr 30, 2018 at 06:21 Michael Di Domenico
> > <mdidomeni...@gmail.com>
> > wrote:
> >>
> >> On Mon, Apr 30, 2018 at 9:19 AM, Michael Di Domenico
> >> <mdidomeni...@gmail.com> wrote:
> >> > when i tried to compile 2.10.2 patchless client into rpms under
> >> > rhel
> >> > 7.5 using kernel 3.10.0-862.el7.x86_64
> >> >
> >> > the compilation went fine as far as i can tell and the rpm creation
> >> > seemed to work
> >> >
> >> > but when i went install the rpms i got
> >> >
> >> > Error: Package: kmod-lustre-client-2.10.2-1.el7.x86_64
> >> > (/kmod-lustre-client-2.10.2-1.el7.x86_64
> >> > requires: kernel < 3.10.0-694
> >>
> >> premature send...
> >>
> >> requires: kernel < 3.10.0-694
> >> Installed: kernel-3.10.0-862.el7.x86_64 (@updates/7.5)
> >>
> >> did i do something wrong in the recompile of the rpms for the target
> >> kernel or is there a workaround for this?
> >> ___
> >> lustre-discuss mailing list
> >> lustre-discuss@lists.lustre.org
> >> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> >
> > --
> > --
> > Jeff Johnson
> > Co-Founder
> > Aeon Computing
> >
> > jeff.john...@aeoncomputing.com
> > www.aeoncomputing.com
> > t: 858-412-3810 x1001   f: 858-412-3845
> > m: 619-204-9061
> >
> > 4170 Morena Boulevard, Suite D - San Diego, CA 92117
> >
> > High-Performance Computing / Lustre Filesystems / Scale-out Storage
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] rhel 7.5

2018-04-30 Thread Jeff Johnson
RHEL 7.5 support comes in Lustre 2.10.4. Only path I can think of off the
top of my head is to git clone and build a 2.10.4 prerelease and live on
the bleeding edge. I’m not sure if all of the 7.5 work is finished in the
current prerelease or not.
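
If you do go that route, the rough recipe for a patchless client build
against the running kernel looks like this (branch and flags are the usual
ones, double-check against the current build docs, and kernel-devel for the
running kernel needs to be installed):

git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
git checkout b2_10        # the 2.10 maintenance branch
sh autogen.sh
./configure --disable-server --with-linux=/usr/src/kernels/$(uname -r)
make rpms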

—Jeff

On Mon, Apr 30, 2018 at 06:21 Michael Di Domenico <mdidomeni...@gmail.com>
wrote:

> On Mon, Apr 30, 2018 at 9:19 AM, Michael Di Domenico
> <mdidomeni...@gmail.com> wrote:
> > when i tried to compile 2.10.2 patchless client into rpms under rhel
> > 7.5 using kernel 3.10.0-862.el7.x86_64
> >
> > the compilation went fine as far as i can tell and the rpm creation
> > seemed to work
> >
> > but when i went install the rpms i got
> >
> > Error: Package: kmod-lustre-client-2.10.2-1.el7.x86_64
> > (/kmod-lustre-client-2.10.2-1.el7.x86_64
> > requires: kernel < 3.10.0-694
>
> premature send...
>
> requires: kernel < 3.10.0-694
> Installed: kernel-3.10.0-862.el7.x86_64 (@updates/7.5)
>
> did i do something wrong in the recompile of the rpms for the target
> kernel or is there a workaround for this?
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre mount in heterogeneous net environment-update

2018-02-28 Thread Jeff Johnson
Greetings Megan,

One scenario that could cause this is if your appliance-style Lustre MDS is
a high-availability server pair and your mount command is not declaring
both NIDs in the mount command *and* the MGS and MDT resources happen to be
presently residing on the MDS server you are not declaring in your mount
command.

If it is high-availability and the IPs of those servers is A.B.C.D and
A.B.C.E then make sure your command command looks something like:

mount -t lustre A.B.C.D@tcp:A.B.C.E@tcp:/somefsname /localmountpoint

That way the client will be looking for the MGS in all of the places it
*could* be located.

Just one possibility of what may be the cause. Certainly easier and less
painful than a lower level version compatibility issue.

—Jeff

On Wed, Feb 28, 2018 at 13:36 Ms. Megan Larko <dobsonu...@gmail.com> wrote:

> Greetings List!
>
> We have been continuing to dissect our LNet environment between our
> lustre-2.7.0 clients and the lustre-2.7.18 servers.  We have moved from the
> client node to the LNet server which bridges the InfiniBand (IB) and
> ethernet networks.   As a test, we attempted to mount the ethernet Lustre
> storage from the LNet hopefully taking the IB out of the equation to limit
> the scope of our debugging.
>
> On the LNet router the attempted mount of Lustre storage fails.   The LNet
> command line error on the test LNet client is exactly the same as the
> original client result:
> mount A.B.C.D@tcp0:/lustre at /mnt/lustre failed: Input/output error  Is
> the MGS running?
>
> On the lustre servers, both the MGS/MDS and OSS we can see the error via
> dmesg:
> LNet: There was an unexpected network error while writing to C.D.E.F:  -110
>
> and we see the periodic (~ every 10 to 20 minutes) in dmesg on MGS/MDS:
> Lustre: MGS: Client  (at C.D.E.F@tcp) reconnecting
>
> The "lctl pings" in various directions are still successful.
>
> So, forget the end lustre client, we are not yet getting from MGS/MDS
> sucessfully to the LNet router.
> We have been looking at the contents of /sys/module/lustre.conf and we are
> not seeing any differences in set values between the LNet router we are
> using as a test Lustre client and the Lustre MGS/MDS server.
>
> As much as I'd _love_ to go to Lustre-2.10.x, we are dealing with both
> "appliance" style Lustre storage systems and clients tied to specific
> versions of the linux kernel (for reasons other than Lustre).
>
> Is there a key parameter which I could still be overlooking?
>
> Cheers,
> megan
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Level 3 support

2018-02-23 Thread Jeff Johnson
Media malpractice. Intel still has its level 3+ Lustre support function.
The media reporting of Intel's org changes was poor at best. Some of the
more inexperienced vendors may have lost touch with HPDD, in my opinion.

--Jeff

On Fri, Feb 23, 2018 at 8:30 AM, Brian Andrus <toomuc...@gmail.com> wrote:

> With the relatively recent changes in Lustre support out there, I am
> curious as to what folks have started doing/planning for level 3 support.
>
> I know a few vendors that sell lustre based products but only provide
> first or second levels of support. They used to use Intel for 3rd level,
> which we had used in the past as well. But now they no longer offer it, so
> they are in a possible pickle if anything goes terribly south.
>
>
> Brian Andrus
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] weird issue w. lnet routers

2017-11-28 Thread Jeff Johnson
John,

I can't speak to Fragella's tuning making things worse but...

Have you run iperf3 and lnet_selftest from your Ethernet clients to each of
the lnet routers to establish what your top end is? It'd be good to
determine if you have an Ethernet problem vs a lnet problem.
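
A bare-bones lnet_selftest run between the Ethernet clients and the routers
looks roughly like this (NIDs and group names are placeholders, and every
node listed needs the lnet_selftest module loaded):

modprobe lnet_selftest
export LST_SESSION=$$
lst new_session rw_test
lst add_group clients 10.10.1.[10-13]@tcp
lst add_group routers 10.10.1.[1-2]@tcp
lst add_batch bulk
lst add_test --batch bulk --from clients --to routers brw write size=1M
lst run bulk
lst stat clients routers     # let it print for ~30s, then Ctrl-C
lst stop bulk
lst end_session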

Also, are you running Ethernet RDMA? If not, interrupts on the receive end
can be vexing.

--Jeff

On Tue, Nov 28, 2017 at 17:21 John Casu <j...@chiraldynamics.com> wrote:

> just built a system w. lnet routers that bridge Infiniband & 100GbE, using
> Centos built in Infiniband support
> servers are Infiniband, clients are 100GbE (connectx-4 cards)
>
> my direct write performance from clients over Infiniband is around 15GB/s
>
> When I introduced the lnet routers, performance dropped to 10GB/s
>
> Thought the problem was an MTU of 1500, but when I changed the MTUs to 9000
> performance dropped to 3GB/s.
>
> When I tuned according to John Fragella's LUG slides, things went even
> slower (1.5GB/s write)
>
> does anyone have any ideas on what I'm doing wrong??
>
> thanks,
> -john c.
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre 2.10.1 MOFED 4.1 QDR/FDR mixing

2017-11-14 Thread Jeff Johnson
Harald,

Yes. You want to ensure your MOFED IB settings and options, as well as lnet
options on the new systems are compatible with the settings on the existing
functional LFS. You want common lnet and OFED config files across your
servers and clients. This can change a bit if you get into lnet routing or
more exotic configurations. For a single, non-routed flat IB fabric
maintain common config files on all systems. Some do this by hand and
others use provisioning environments like Puppet to achieve this.
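
In practice that usually means the same couple of lines everywhere plus a
quick sanity check, e.g. (interface name is an example):

# /etc/modprobe.d/lustre.conf, identical on every server and client
options lnet networks=o2ib0(ib0)

# after the modules load, confirm each node advertises the NID you expect
lctl network up
lctl list_nids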

--Jeff


On Tue, Nov 14, 2017 at 9:57 AM, Harald van Pee <p...@hiskp.uni-bonn.de>
wrote:

> Hi Jeff,
>
> thanks for your answer.
> Can I be sure that there is no autoprobing which sets any configuration
> differently?
> The options given for mkfs.lustre and in /etc/modprobe.d/lustre.conf
> will be the same, is this enough?
>
> Best
> Harald
>
> On Tuesday 14 November 2017 18:39:49 Jeff Johnson wrote:
> > Harald,
> >
> > As long as your new servers and clients all have the same settings in
> their
> > config files as your currently running configuration you should be fine.
> >
> > --Jeff
> >
> >
> > On Tue, Nov 14, 2017 at 9:24 AM, Harald van Pee <p...@hiskp.uni-bonn.de>
> >
> > wrote:
> > > Dear all,
> > >
> > > I have installed lustre 2.10.1 from source with MOFED 4.1.
> > > mdt/mgs and oss run on centos 7.4
> > > clients on debian 9 (kernel 4.9)
> > >
> > > our test (cluster 1x mgs/mdt + 1x oss + 1x client) all with mellanox ib
> > > qdr nics
> > > runs without problems on a mellanox fdr switch.
> > > Now we have additional clients and servers with fdr and qdr nics.
> > >
> > > Do I need any special configuration (beside options lnet
> networks=o2ib0)
> > > if I add additional fdr clients and/or servers?
> > >
> > > Was the configuration probed? And does it make a difference if I would
> > > start
> > > with fdr servers and clients and add qdr servers and clients or the
> other
> > > way
> > > around?
> > >
> > > Thanks in advance
> > > Harald
> > >
> > >
> > > ___
> > > lustre-discuss mailing list
> > > lustre-discuss@lists.lustre.org
> > > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre 2.10.1 MOFED 4.1 QDR/FDR mixing

2017-11-14 Thread Jeff Johnson
Harald,

As long as your new servers and clients all have the same settings in their
config files as your currently running configuration you should be fine.

--Jeff


On Tue, Nov 14, 2017 at 9:24 AM, Harald van Pee <p...@hiskp.uni-bonn.de>
wrote:

> Dear all,
>
> I have installed lustre 2.10.1 from source with MOFED 4.1.
> mdt/mgs and oss run on centos 7.4
> clients on debian 9 (kernel 4.9)
>
> our test (cluster 1x mgs/mdt + 1x oss + 1x client) all with mellanox ib qdr
> nics
> runs without problems on a mellanox fdr switch.
> Now we have additional clients and servers with fdr and qdr nics.
>
> Do I need any special configuration (beside options lnet networks=o2ib0)
> if I add additional fdr clients and/or servers?
>
> Was the configuration probed? And does it make a difference if I would
> start
> with fdr servers and clients and add qdr servers and clients or the other
> way
> around?
>
> Thanks in advance
> Harald
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Jeff Johnson
Amjad,

You might ask your vendor to propose a single MDT comprised of (8 * 500GB)
2.5" disk drives or better, SSDs. With some bio applications you would
benefit from spreading the MDT I/O across more drives.

How many clients do you expect to mount the file system? A standard filer
(or ZFS/NFS server) will perform well compared to Lustre until you
bottleneck somewhere in the server hardware (net, disk, cpu, etc). With
Lustre you can simply add one or more OSS/OSTs to the file system, and
performance potential scales with the number of additional OSS/OST servers.

High-availability is nice to have but it isn't necessary unless your
environment cannot tolerate any interruption or downtime. If your vendor
proposes quality hardware these cases are infrequent.

--Jeff

On Mon, Oct 30, 2017 at 12:04 PM, Amjad Syed <amjad...@gmail.com> wrote:

> The vendor has proposed a single MDT  ( 4 * 1.2 TB) in RAID 10
> configuration.
> The OST will be RAID 6  and proposed are 2 OST.
>
>
> On Mon, Oct 30, 2017 at 7:55 PM, Ben Evans <bev...@cray.com> wrote:
>
>> How many OST's are behind that OSS?  How many MDT's behind the MDS?
>>
>> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf
>> of Brian Andrus <toomuc...@gmail.com>
>> Date: Monday, October 30, 2017 at 12:24 PM
>> To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
>> Subject: Re: [lustre-discuss] 1 MDS and 1 OSS
>>
>> Hmm. That is an odd one from a quick thought...
>>
>> However, IF you are planning on growing and adding OSSes/OSTs, this is
>> not a bad way to get started and used to how everything works. It is
>> basically a single stripe storage.
>>
>> If you are not planning on growing, I would lean towards gluster on 2
>> boxes. I do that often, actually. A single MDS/OSS has zero redundancy,
>> unless something is being done at harware level and that would help in
>> availability.
>> NFS is quite viable too, but you would be splitting the available storage
>> on 2 boxes.
>>
>> Brian Andrus
>>
>>
>>
>> On 10/30/2017 12:47 AM, Amjad Syed wrote:
>>
>> Hello
>> We are in process in procuring one small Lustre filesystem giving us 120
>> TB  of storage using Lustre 2.X.
>> The vendor has proposed only 1 MDS and 1 OSS as a solution.
>> The query we have is that is this configuration enough , or we need more
>> OSS?
>> The MDS and OSS server are identical  with regards to RAM (64 GB) and
>> HDD (300GB)
>>
>> Thanks
>> Majid
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Can Lustre Be Mount to Kuberentes Pod

2017-09-18 Thread Jeff Johnson
Forrest,

You should be able to define the Lustre mount point on the host system in
the Kubernetes pod configuration the same way you would for an NFS mount. I
believe you define a directory on the host for the pod to use, and that
directory could be local, NFS, Lustre, etc.

As for persistence, it would be as persistent as your Lustre configuration,
network, etc is stable.
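
A rough sketch of that idea (fsname, NID and paths below are made up):
mount Lustre on each Kubernetes node, then point a hostPath volume in the
pod spec at that directory.

# on every node that can schedule the pod
mount -t lustre 10.0.0.1@o2ib:/lustre0 /mnt/lustre
# then reference /mnt/lustre as a hostPath volume in the pod spec and
# mount it into the container at whatever path the application expects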

--Jeff


On Mon, Sep 18, 2017 at 20:03 <forrest.wc.l...@dell.com> wrote:

> Hi Lustre experts:
>
>
>
> Can Lustre file system be mounted to Kuberentes Pods as a persistent
> volumes ?
>
> Thanks,
>
> Forrest Ling (凌巍才)
>
> +86 18600622522
>
> Dell HPC Product Technologist Greater China
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Best way to run serverside 2.8 w. MOFED 4.1 on Centos 7.2

2017-08-18 Thread Jeff Johnson
John,

You can rebuild 2.8 against MOFED. 1) Install MOFED version of choice. 2)
Pull down the 2.8 Lustre source and configure with
'--with-o2ib=/usr/src/ofa_kernel/default'. 3) `make rpms` 4) Install. 5)
Profit.
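
Roughly like this, assuming a release tarball and the usual MOFED source
location (version numbers and the install command are just examples):

# MOFED of choice already installed and loaded
tar xzf lustre-2.8.0.tar.gz && cd lustre-2.8.0
./configure --with-o2ib=/usr/src/ofa_kernel/default
make rpms
# the RPMs land in the top of the build tree; install the lustre,
# lustre-modules and lustre-osd-* packages, e.g.:
yum localinstall ./lustre-*.rpm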

--Jeff

On Fri, Aug 18, 2017 at 9:41 AM, john casu <j...@chiraldynamics.com> wrote:

> I have an existing 2.8 install that broke when we added MOFED into the mix.
>
> Nothing I do wrt installing 2.8 rpms works to fix this, and I get a couple
> of missing symbole, when I install lustre-modules:
> depmod: WARNING: /lib/modules/3.10.0-327.3.1.el
> 7_lustre.x86_64/extra/kernel/net/lustre/ko2iblnd.ko needs unknown symbol
> ib_query_device
> depmod: WARNING: /lib/modules/3.10.0-327.3.1.el
> 7_lustre.x86_64/extra/kernel/net/lustre/ko2iblnd.ko needs unknown symbol
> ib_alloc_pd
>
> I'm assuming the issue is that lustre 2.8 is built using the standard
> Centos 7.2 infiniband drivers.
>
> I can't move to Centos 7.3, at this time.  Is there any way to get 2.8 up
> & running w. mofed without rebuilding lustre rpms?
>
> If I have to rebuild, it'd probably be easier to go to 2.10 (and zfs
> 0.7.1). Is that a correct assumption?
> Or will the 2.10 rpms work on Centps 7.2?
>
> thanks,
> -john c
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre 2.10 on RHEL6.x?

2017-08-07 Thread Jeff Johnson
I'm going to be testing an upgrade of a filled 2.9/0.6.5.7/CentOS6.x LFS to
2.10/0.7/CentOS6.9. I will report back results to the mailing list when it
is completed.

--Jeff

On Mon, Aug 7, 2017 at 06:50 E.S. Rosenberg <esr+lus...@mail.hebrew.edu>
wrote:

> We created a test system that was installed with CentOS 6.x and Lustre 2.8
> filled with some data and subsequently reinstalled with CentOS 7.x and
> Lustre 2.9
>
> Everything seems to have gone fine but I am actually curious if anyone
> else did this pretty invasive upgrade? (Hoping to upgrade in the
> not-to-distant future, maybe even directly to 2.10)
>
> Thanks,
> Eli
>
> On Mon, Aug 7, 2017 at 4:46 PM, Jones, Peter A <peter.a.jo...@intel.com>
> wrote:
>
>> Correct – RHEL 6.x support appeared for the last time in the community
>> 2.8 release. However, there has been some interest in seeing some kind of
>> support for RHEL 6.x in the 2.10 LTS releases so I think it likely that at
>> least support for clients will be reintroduced in a future 2.10.x
>> maintenance release.
>>
>> On 8/7/17, 6:34 AM, "lustre-discuss on behalf of E.S. Rosenberg" <
>> lustre-discuss-boun...@lists.lustre.org on behalf of
>> esr+lus...@mail.hebrew.edu> wrote:
>>
>> If I'm not mistaken they haven't provided RPMs for RHEL6.x since 2.9...
>> HTH,
>> Eli
>>
>> On Mon, Aug 7, 2017 at 4:33 PM, Steve Barnet <bar...@icecube.wisc.edu>
>> wrote:
>>
>>> Hey all,
>>>
>>>   I am looking to upgrade from lustre 2.8 to 2.10. I see that
>>> there are no pre-built RPMs for 2.10 on RHEL6.x families.
>>>
>>> Did I miss them, or will I need to build from source (or
>>> upgrade to Centos 7)?
>>>
>>> Thanks much!
>>>
>>> Best,
>>>
>>> ---Steve
>>>
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>
>>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre and OFED

2017-07-27 Thread Jeff Johnson
Eli,

The biggest driver is usually the drivers. Newer Mellanox hardware not yet
supported, or supported well, by kernel IB. Way back in the days of old
there were some interoperability issues where everything (clients and
servers) needed to be the same drivers and libraries but much of that was
cleaned up. There could be situations where OFED is needed on the server
side to support something under the Lustre layer like OST or MDT block
devices via iSER, SRP, NVMeF, etc.

There may be other reasons but those are off the top of my head.

--Jeff

On Thu, Jul 27, 2017 at 4:55 PM, E.S. Rosenberg <esr+lus...@mail.hebrew.edu>
wrote:

> Hi all,
>
> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
> once in a while and that got me thinking a bit.
>
> What things are gained by installing OFED? Performance? Accurate traffic
> reports?
>
> Currently I am using a lustre system without OFED but our IB hardware is
> from the FDR generation so not bleeding edge and probably doesn't need OFED
> because of that
>
> Thanks,
> Eli
>
> Tech specs:
> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
> Clients: Debian + kernel 4.2 + Lustre 2.8
> IB: ConnectX-3 FDR
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems

2017-05-19 Thread Jeff Johnson
Jessica,

You are getting a NID registering twice. Doug noticed and pointed it out.
I'd look to see if that is one machine doing something twice or two
machines with the same NID.
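
One quick way to check (hostnames are placeholders): compare the NIDs the
suspect machines actually report and make sure no two hosts share one.

for h in rbh01 rbh02; do echo "== $h"; ssh $h lctl list_nids; done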

--Jeff

On Fri, May 19, 2017 at 05:58 Ms. Megan Larko <dobsonu...@gmail.com> wrote:

> Greetings Jessica,
>
> I'm not sure I am correctly understanding the behavior "robinhood activity
> floods the MDT".   The robinhood program as you (and I) are using it is
> consuming the MDT CHANGELOG via a reader_id which was assigned when the
> CHANGELOG was enabled on the MDT.   You can check the MDS for these readers
> via "lctl get_param mdd.*.changelog_users".  Each CHANGELOG reader must
> either be consumed by a process or destroyed otherwise the CHANGELOG will
> grow until it consumes sufficient space to stop the MDT from functioning
> correctly.  So robinhood should consume and then clear the CHANGELOG via
> this reader_id.  This implementation of robinhood is actually a rather
> light-weight process as far as the MDS is concerned.   The load issues I
> encountered were on the robinhood server itself which is a separate server
> from the Lustre MGS/MDS server.
>
> Just curious, have you checked for multiple reader_id's on your MDS for
> this Lustre file system?
>
> P.S. My robinhood configuration file is using nb_threads = 8, just for a
> data point.
>
> Cheers,
> megan
>
> On Thu, May 18, 2017 at 2:36 PM, Jessica Otey <jo...@nrao.edu> wrote:
>
>> Hi Megan,
>>
>> Thanks for your input. We use percona, a drop-in replacement for mysql...
>> The robinhood activity floods the MDT, but it does not seem to produce any
>> excessive load on the robinhood box...
>>
>> Anyway, FWIW...
>>
>> ~]# mysql --version
>> mysql  Ver 14.14 Distrib 5.5.54-38.6, for Linux (x86_64) using readline
>> 5.1
>>
>> Product: robinhood
>> Version: 3.0-1
>> Build:   2017-03-13 10:29:26
>>
>> Compilation switches:
>> Lustre filesystems
>> Lustre Version: 2.5
>> Address entries by FID
>> MDT Changelogs supported
>>
>> Database binding: MySQL
>>
>> RPM: robinhood-lustre-3.0-1.lustre2.5.el6.x86_64
>> Lustre rpms:
>>
>> lustre-client-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
>> lustre-client-modules-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
>>
>> On 5/18/17 11:55 AM, Ms. Megan Larko wrote:
>>
>> With regards to (WRT) Subject "Robinhood exhausting RPC resources against
>> 2.5.5   lustre file systems", what version of robinhood and what version of
>> MySQL database?   I mention this because I have been working with
>> robinhood-3.0-0.rc1 and initially MySQL-5.5.32 and Lustre 2.5.42.1 on
>> kernel-2.6.32-573 and had issues in which the robinhood server consumed
>> more than the total amount of 32 CPU cores on the robinhood server (with
>> 128 G RAM) and would functionally hang the robinhood server.   The issue
>> was solved for me by changing to MySQL-5.6.35.   It was the "sort" command
>> in robinhood that was not working well with the MySQL-5.5.32.
>>
>> Cheers,
>> megan
>>
>>
>>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre 2.9 performance issues

2017-04-27 Thread Jeff Johnson
While tuning can alleviate some pain it shouldn't go without mentioning
that there are some operations that are just less than optimal on a
parallel file system. I'd bet a cold one that a copy to local /tmp,
vim/paste, copy back to the LFS would've been quicker. Some single-threaded
small i/o operations can be approached more efficiently in a similar
manner.
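
A trivial illustration of that workaround (paths are made up), keeping the
small edits on local disk and touching Lustre only for the bulk copies:

cp /lustre/project/job.cfg /tmp/job.cfg
vim /tmp/job.cfg
cp /tmp/job.cfg /lustre/project/job.cfg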

Lustre is a fantastic tool and like most tools it doesn't do everything
well..*yet*

--Jeff

On Thu, Apr 27, 2017 at 4:21 PM, Dilger, Andreas <andreas.dil...@intel.com>
wrote:

> On Apr 25, 2017, at 13:11, Bass, Ned <ba...@llnl.gov> wrote:
> >
> > Hi Darby,
> >
> >> -Original Message-
> >>
> >> for i in $(seq 0 99) ; do
> >>   dd if=/dev/zero of=dd.dat.$i bs=1k count=1 conv=fsync > /dev/null 2>&1
> >> done
> >>
> >> The timing of this ranges from 0.1 to 1 sec on our old LFS but ranges
> from 20
> >> to 60 sec on our newer 2.9 LFS.
> >
> > Because Lustre does not yet use the ZFS Intent Log (ZIL), it implements
> fsync() by
> > waiting for an entire transaction group to get written out. This can
> incur long
> > delays on a busy filesystem as the transaction groups become quite
> large. Work
> > on implementing ZIL support is being tracked in LU-4009 but this feature
> is not
> > expected to make it into the upcoming 2.10 release.
>
> There is also the patch that was developed in the past to test this:
> https://review.whamcloud.com/7761 "LU-4009 osd-zfs: Add tunables to
> disable sync"
> which allows disabling ZFS to wait for TXG commit for each sync on the
> servers.
>
> That may be an acceptable workaround in the meantime.  Essentially,
> clients would
> _start_ a sync on the server, but would not wait for completion before
> returning
> to the application.  Both the client and the OSS would need to crash
> within a few
> seconds of the sync for it to be lost.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
>
>
>
>
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] design to enable kernel updates

2017-02-10 Thread Jeff Johnson
> > this month and last).  We are in the process of syncing our existing LFS
> > to this new one and I've failed over/rebooted/upgraded the new LFS
> servers
> > many times now to make sure we can do this in practice when the new LFS
> goes
> > into production.  Its working beautifully.
> >
> > Many thanks to the lustre developers for their continued efforts.  We
> have
> > been using and have been fans of lustre for quite some time now and it
> > just keeps getting better.
> >
> > -Original Message-
> > From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on
> behalf of Ben Evans <bev...@cray.com>
> > Date: Monday, February 6, 2017 at 2:22 PM
> > To: Brian Andrus <toomuc...@gmail.com>, "lustre-discuss@lists.lustre.org"
> <lustre-discuss@lists.lustre.org>
> > Subject: Re: [lustre-discuss] design to enable kernel updates
> >
> > It's certainly possible.  When I've done that sort of thing, you upgrade
> > the OS on all the servers first, boot half of them (the A side) to the
> new
> > image, all the targets will fail over to the B servers.  Once the A side
> > is up, reboot the B half to the new OS.  Finally, do a failback to the
> > "normal" running state.
> >
> > At least when I've done it, you'll want to do the failovers manually so
> > the HA infrastructure doesn't surprise you for any reason.
> >
> > -Ben
> >
> > On 2/6/17, 2:54 PM, "lustre-discuss on behalf of Brian Andrus"
> > <lustre-discuss-boun...@lists.lustre.org on behalf of
> toomuc...@gmail.com>
> > wrote:
> >
> >> All,
> >>
> >> I have been contemplating how lustre could be configured such that I
> >> could update the kernel on each server without downtime.
> >>
> >> It seems this is _almost_ possible when you have a san system so you
> >> have failover for OSTs and MDTs. BUT the MGS/MGT seems to be the
> >> problematic one, since rebooting that seems cause downtime that cannot
> >> be avoided.
> >>
> >> If you have a system where the disks are physically part of the OSS
> >> hardware, you are out of luck. The hypothetical scenario I am using is
> >> if someone had a VM that was a qcow image on a lustre mount (basically
> >> an active, open file being read/written to continuously). How could
> >> lustre be built to ensure anyone on the VM would not notice a kernel
> >> upgrade to the underlying lustre servers.
> >>
> >>
> >> Could such a setup be done? It seems that would be a better use case for
> >> something like GPFS or Gluster, but being a die-hard lustre enthusiast,
> >> I want to at least show it could be done.
> >>
> >>
> >> Thanks in advance,
> >>
> >> Brian Andrus
> >>
> >> ___
> >> lustre-discuss mailing list
> >> lustre-discuss@lists.lustre.org
> >> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> > ___
> > lustre-discuss mailing list
> > lustre-discuss@lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> >
> >
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Self-test

2017-02-05 Thread Jeff Johnson
Without seeing your entire command it is hard to say for sure but I would make 
sure your concurrency option is set to 8 for starters. 
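
For reference, a minimal lnet_selftest run along those lines (NIDs are
placeholders for your two test nodes); bump --concurrency until the link
saturates:

modprobe lnet_selftest     # on both nodes
export LST_SESSION=$$
lst new_session bw_test
lst add_group servers 192.168.1.10@tcp
lst add_group clients 192.168.1.11@tcp
lst add_batch bulk
lst add_test --batch bulk --concurrency 8 --from clients --to servers \
    brw write size=1M
lst run bulk
lst stat servers & sleep 30; kill $!
lst end_session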

--Jeff

Sent from my iPhone

> On Feb 5, 2017, at 11:30, Jon Tegner  wrote:
> 
> Hi,
> 
> I'm trying to use lnet selftest to evaluate network performance on a test 
> setup (only two machines). Using e.g., iperf or Netpipe I've managed to 
> demonstrate the bandwidth of the underlying 10 Gbits/s network (and typically 
> you reach the expected bandwidth as the packet size increases).
> 
> How can I do the same using lnet selftest (i.e., verifying the bandwidth of 
> the underlying hardware)? My initial thought was to increase the I/O size, 
> but it seems the maximum size one can use is "--size=1M".
> 
> Thanks,
> 
> /jon
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre won't build anymore on RHEL 7.3

2016-11-29 Thread Jeff Johnson
/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20:
> > note: expected 'void *' but argument is of type 'int'
> >  struct rdma_cm_id *rdma_create_id(struct net *net,
> > ^
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2251:9:
> > error: too few arguments to function 'rdma_create_id'
> >  cmid = kiblnd_rdma_create_id(kiblnd_dummy_callback, dev,
> > RDMA_PS_TCP,
> >  ^
> > In file included from
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0,
> >  from
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42:
> > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20:
> > note: declared here
> >  struct rdma_cm_id *rdma_create_id(struct net *net,
> > ^
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c: In
> > function 'kiblnd_dev_failover':
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2321:9:
> > error: passing argument 1 of 'rdma_create_id' from incompatible pointer
> > type [-Werror]
> >  cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, dev,
> RDMA_PS_TCP,
> >  ^
> > In file included from
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0,
> >  from
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42:
> > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20:
> > note: expected 'struct net *' but argument is of type 'int (*)(struct
> > rdma_cm_id *, struct rdma_cm_event *)'
> >  struct rdma_cm_id *rdma_create_id(struct net *net,
> > ^
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2321:9:
> > error: passing argument 2 of 'rdma_create_id' from incompatible pointer
> > type [-Werror]
> >  cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, dev,
> RDMA_PS_TCP,
> >  ^
> > In file included from
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0,
> >  from
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42:
> > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20:
> > note: expected 'rdma_cm_event_handler' but argument is of type 'struct
> > kib_dev_t *'
> >  struct rdma_cm_id *rdma_create_id(struct net *net,
> > ^
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2321:9:
> > error: passing argument 3 of 'rdma_create_id' makes pointer from integer
> > without a cast [-Werror]
> >  cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, dev,
> RDMA_PS_TCP,
> >  ^
> > In file included from
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0,
> >  from
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42:
> > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20:
> > note: expected 'void *' but argument is of type 'int'
> >  struct rdma_cm_id *rdma_create_id(struct net *net,
> > ^
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2321:9:
> > error: too few arguments to function 'rdma_create_id'
> >  cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, dev,
> RDMA_PS_TCP,
> >  ^
> > In file included from
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0,
> >  from
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42:
> > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20:
> > note: declared here
> >  struct rdma_cm_id *rdma_create_id(struct net *net,
> > ^
> > cc1: all warnings being treated as errors
> > make[7]: ***
> > [/root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.o] Error 1
> > make[6]: *** [/root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd]
> Error 2
> > make[5]: *** [/root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds] Error 2
> > make[4]: *** [/root/rpmbuild/BUILD/lustre-2.8.0/lnet] Error 2
> > make[4]: *** Waiting for unfinished jobs
> > make[3]: *** [_module_/root/rpmbuild/BUILD/lustre-2.8.0] Error 2
> > make[2]: *** [modules] Error 2
> > make[1]: *** [all-recursive] Error 1
> > make: *** [all] Error 2
> > error: Bad exit status from /var/tmp/rpm-tmp.mYkfwi (%build)
> >
> >
> > RPM build errors:
> > Bad exit status from /var/tmp/rpm-tmp.mYkfwi (%build)
> >
> > ___
> > lustre-discuss mailing list
> > lustre-discuss@lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>



-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] poor performance on reading small files

2016-08-03 Thread Jeff Johnson

On 8/3/16 10:57 AM, Dilger, Andreas wrote:

On Jul 29, 2016, at 03:33, Oliver Mangold <oliver.mang...@emea.nec.com> wrote:

On 29.07.2016 04:19, Riccardo Veraldi wrote:

I am using lustre on ZFS.

While write performances are excellent also on smaller files, I find
there is a drop down in performance
on reading 20KB files. Performance can go as low as 200MB/sec or even
less.

Getting 200 MB/s with 20kB files means you have to do ~10,000 metadata
ops/s. Don't want to say it is impossible to get more than that, but at
least with MDT on ZFS this doesn't sound bad either. Did you run an
mdtest on your system? Maybe some serious tuning of MD performance is in
order.

I'd agree with Oliver that getting 200MB/s with 20KB files is not too bad.
Are you using HDDs or SSDs for the MDT and OST devices?  If using HDDs,
are you using SSD L2ARC to allow the metadata and file data be cached in
L2ARC, and allowing enough time for L2ARC to be warmed up?

Are you using TCP or IB networking?  If using TCP then there is a lower
limit on the number of RPCs that can be handled compared to IB.

Cheers, Andreas
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Also consider the RPC overhead: only 20KB of data per lnet RPC instead of a
full 1MB RPC, so moving 20KB files at 200MB/sec into a non-striped LFS
directory means roughly 10,000 RPCs per second. Are you using EDR for lnet?
100GbE?


--Jeff


--
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-performance Computing / Lustre Filesystems / Scale-out Storage

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [HPDD-discuss] Lustre Server Sizing

2015-07-21 Thread Jeff Johnson
Indivar,

Since your CIFS or NFS gateways operate as Lustre clients there can be
issues with running multiple NFS or CIFS gateway machines frontending the
same Lustre filesystem. As Lustre clients there are no issues in terms of
file locking but the NFS and CIFS caching and multi-client file access
mechanics don't interface with Lustre's file locking mechanics. Perhaps
that may have changed recently and a developer on the list may comment on
developments there. So while you could provide client access through
multiple NFS or CIFS gateway machines there would not be much in the way of
file locking protection. There is a way to configure pCIFS with CTDB and
get close to what you envision with Samba. I did that configuration once as
a proof of concept (no valuable data). It is a *very* complex configuration
and based on the state of software when I did it I wouldn't say it was a
production grade environment.

As I said before, my understanding may be a year out of date and someone
else could speak to the current state of things. Hopefully that would be a
better story.

--Jeff



On Tue, Jul 21, 2015 at 10:26 AM, Indivar Nair indivar.n...@techterra.in
wrote:

 Hi Scott,

 The 3 - SAN Storages with 240 disks each has its own 3 NAS Headers (NAS
 Appliances).
 However, even with 240 10K RPM disk and RAID50, it is only providing
 around 1.2 - 1.4GB/s per NAS Header.

 There is no clustered file system, and each NAS Header has its own
 file-system.
 It uses some custom mechanism to present the 3 file systems as single name
 space.
 But the directories have to be manually spread across for load-balancing.
 As you can guess, this doesn't work most of the time.
 Many times, most of the compute nodes access a single NAS Header,
 overloading it.

 The customer wants *at least* 9GB/s throughput from a single file-system.

 But I think, if we architect the Lustre Storage correctly, with this many
 disks, we should get at least 18GB/s throughput, if not more.

 Regards,


 Indivar Nair


 On Tue, Jul 21, 2015 at 10:15 PM, Scott Nolin scott.no...@ssec.wisc.edu
 wrote:

 An important question is what performance do they have now, and what do
 they expect if converting it to Lustre. Our more basically, what are they
 looking for in general in changing?

 The performance requirements may help drive your OSS numbers for example,
 or interconnect, and all kinds of stuff.

 Also I don't have a lot of experience with NFS/CIFS gateways, but that is
 perhaps it's own topic and may need some close attention.

 Scott

 On 7/21/2015 10:57 AM, Indivar Nair wrote:

 Hi ...,

 One of our customers has a 3 x 240 Disk SAN Storage Array and would like
 to convert it to Lustre.

 They have around 150 Workstations and around 200 Compute (Render) nodes.
 The File Sizes they generally work with are -
 1 to 1.5 million files (images) of 10-20MB in size.
 And a few thousand files of 500-1000MB in size.

 Almost 50% of the infra is on MS Windows or Apple MACs

 I was thinking of the following configuration -
 1 MDS
 1 Failover MDS
 3 OSS (failover to each other)
 3 NFS+CIFS Gateway Servers
 FDR Infiniband backend network (to connect the Gateways to Lustre)
 Each Gateway Server will have 8 x 10GbE Frontend Network (connecting the
 clients)

 *Option A*
  10+10 Disk RAID60 Array with 64KB Chunk Size i.e. 1MB Stripe Width
  720 Disks / (10+10) = 36 Arrays.
  12 OSTs per OSS
  18 OSTs per OSS in case of Failover

 *Option B*
  10+10+10+10 Disk RAID60 Array with 128KB Chunk Size i.e. 4MB Stripe
 Width
  720 Disks / (10+10+10+10) = 18 Arrays
  6 OSTs per OSS
  9 OSTs per OSS in case of Failover
  4MB RPC and I/O

 *Questions*
 1. Would it be better to let Lustre do most of the striping / file
 distribution (as in Option A) OR would it be better to let the RAID
 Controllers do it (as in Option B)

 2. Will Option B allow us to have lesser CPU/RAM than Option A?

 Regards,


 Indivar Nair



 ___
 HPDD-discuss mailing list
 hpdd-disc...@lists.01.org
 https://lists.01.org/mailman/listinfo/hpdd-discuss




 ___
 lustre-discuss mailing list
 lustre-discuss@lists.lustre.org
 http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



 ___
 lustre-discuss mailing list
 lustre-discuss@lists.lustre.org
 http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org




-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA

2015-06-19 Thread Jeff Johnson
Why choose? Why not install a lnet router QDR-10GbE or dual home your MDS
 OSS nodes with QDR and a 10GbE nic?
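
Either way it is just lnet configuration. A sketch of the module options
(interface names and the router address are placeholders):

# option 1: dual-home the servers on both networks
options lnet networks="o2ib0(ib0),tcp0(eth0)"

# option 2: a dedicated lnet router node between the fabrics
options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding=enabled
# ...with 10GbE clients routing to the IB side through it:
options lnet networks="tcp0(eth0)" routes="o2ib0 192.168.1.1@tcp0"
# ...and the IB-only servers routing back to tcp0 via the router's o2ib NID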

--Jeff

On Fri, Jun 19, 2015 at 9:10 AM, INKozin i.n.ko...@googlemail.com wrote:

 I know that QDR IB gives the best bang for buck currently and that's what
 we have now. However due to various reasons we are looking at alternatives
 hence the question. Thank you very much for your information, Ben.

 On 19 June 2015 at 16:24, Ben Evans bev...@cray.com wrote:

  It’s faster in that you eliminate all the TCP overhead and latency.
 (something on the order of 20% improvement in speed, IIRC, it’s been
 several years)



 Balancing your network performance with what your disks can provide is a
 whole other level of system design and implementation.  You can stack
 enough disks or SSDs behind a server so that the network is your
 bottleneck, you can stack up enough network to few enough disks so that the
 drives are your bottleneck.  You can stack up enough of both so that the
 PCIE bus is your bottleneck.



 Take the time and compare costs/performance to Infiniband, since most
 systems have a dedicated client/server network, you might as well go as
 fast as you can.



 -Ben Evans



 *From:* igk...@gmail.com [mailto:igk...@gmail.com] *On Behalf Of *INKozin
 *Sent:* Friday, June 19, 2015 11:10 AM
 *To:* Ben Evans
 *Cc:* lustre-discuss@lists.lustre.org
 *Subject:* Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and
 without RDMA



 Ben, is it possible to quantify faster?

 Understandably, for a single client on an empty cluster it may feel
 faster but on a busy cluster with many reads and writes in flight I'd
 have thought the limiting factor is the back end's throughput rather than
 the network, no? As long as the bandwidth to a client is somewhat higher
 than the average i/o bandwidth (back end's throughput divided by the number
 of clients) the client should be content.



 On 19 June 2015 at 14:46, Ben Evans bev...@cray.com wrote:

 It is faster, but I don’t know what price/performance tradeoff is, as I
 only used it as an engineer.



 As an alternative, take a look at RoCE, it does much the same thing but
 uses normal (?) hardware.  It’s still pretty new, though, so you might have
 some speedbumps.



 -Ben Evans



 *From:* lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] *On
 Behalf Of *INKozin
 *Sent:* Friday, June 19, 2015 5:43 AM
 *To:* lustre-discuss@lists.lustre.org
 *Subject:* [lustre-discuss] Lustre over 10 Gb Ethernet with and without
 RDMA



 My question is about performance advantages of Lustre RDMA over 10 Gb
 Ethernet. When using 10 Gb Ethernet to build Lustre, is it worth paying the
 premium for iWARP? I understand that iWARP essentially reduces latency but
 less sure of its specific implications for storage. Would it improve
 performance on small files? Any pointers to representative benchmarks will
 be very appreciated.



 Celsio has released a white paper in which they compare Lustre RDMA over
 40 Gb Ethernet and FDR IB


 http://www.chelsio.com/wp-content/uploads/resources/Lustre-Over-iWARP-vs-IB-FDR.pdf

 where they claim comparable performance of both.

 How much worse would the throughput on small block sizes be without iWARP?



 Thank you

 Igor





 ___
 lustre-discuss mailing list
 lustre-discuss@lists.lustre.org
 http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org




-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] problem getting high performance output to single file

2015-05-19 Thread Jeff Johnson

David,

What interconnect are you using for Lustre? ( IB/o2ib [fdr,qdr,ddr], 
Ethernet/tcp [40GbE,10Gbe,1GbE] ). You can run 'lctl list_nids' and see 
what protocol lnet is binding to, then look at that interface for the 
specific type.
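
For example, hypothetical outputs and what they would tell you:

lctl list_nids
10.10.0.5@o2ib     <- o2iblnd, an IB/RDMA interface; `ibstat` shows the rate
192.168.1.5@tcp    <- socklnd, Ethernet; `ethtool <iface>` shows the link speed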


Also, do you know anything about the server side of your Lustre FS? What 
make/model of block devices are used in OSTs?


--Jeff


On 5/19/15 9:05 AM, Schneider, David A. wrote:

Thanks, for the client, where I am running from, I have

$ cat /proc/fs/lustre/version
lustre: 2.1.6
kernel: patchless_client
build:  jenkins--PRISTINE-2.6.18-348.4.1.el5


best,

David Schneider

From: Patrick Farrell [p...@cray.com]
Sent: Tuesday, May 19, 2015 9:03 AM
To: Schneider, David A.; John Bauer; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem getting high performance output to single 
file

For the clients, cat /proc/fs/lustre/version

For the servers, it's the same, but presumably you don't have access.

On 5/19/15, 11:01 AM, Schneider, David A. david...@slac.stanford.edu
wrote:


Hi,

My first test was just to do the for loop where I allocate a 4MB buffer,
initialize it, and delete it. That program ran at about 6GB/sec. Once I
write to a file, I drop down to 370MB/sec. Our top performance for I/O to
one file has been about 400MB/sec.

For this question: Which versions are you using in servers and clients?
I don't know what command to determine this, I suspect it is older since
we are on red hat 5. I will ask.

best,

David Schneider

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf
of John Bauer [bau...@iodoctors.com]
Sent: Tuesday, May 19, 2015 8:52 AM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem getting high performance output to
single file

David

You note that you write a 6GB file.  I suspect that your Linux systems
have significantly more memory than 6GB, meaning your file will end up being
cached in the system buffers.  It won't matter how many OSTs you use as
you probably are not measuring the speed to the OSTs, but rather, you
are measuring the memory copy speed.
What transfer rate are you seeing?

John

On 5/19/2015 10:40 AM, Schneider, David A. wrote:

I am trying to get good performance with parallel writing to one file
through MPI. Our cluster has high performance when I write to separate
files, but when I use one file - I see very little performance increase.

As I understand, our cluster defaults to use one OST per file. There
are many OST's though, which is how we get good performance when writing
to multiple files. I have been using the command

   lfs setstripe

to change the stripe count and block size. I can see that this works,
when I do lfs getstripe, I see the output file is striped, but I'm
getting very little I/O performance when I create the striped file.

When working from hdf5 and mpi, I have seen a number of references
about tuning parameters, I haven't dug into this yet. I first want to
make sure lustre has the high output performance at a basic level. I
tried to write a C program uses simple POSIX calls (open and looping
over writes) but I don't see much increase in performance (I've tried 8
and 19 OST's, 1MB and 4MB chunks, I write a 6GB file).

Does anyone know if this should work? What is the simplest C program I
could write to see an increase in output performance after I stripe? Do
I need separate processes/threads with separate file handles? I am on
linux red hat 5. I'm not sure what version of lustre this is. I have
skimmed through a 450 page pdf of lustre documentation, I saw references
to destructive testing one does in the beginning, but I'm not sure what
I can do now. I think this is the first work we've done to get high
performance when writing a single file, so I'm worried there is
something buried in the lustre configuration that needs to be changed. I
can run /usr/sbin/lcntl, maybe there are certain parameters I should
check?

best,

David Schneider
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
I/O Doctors, LLC
507-766-0378
bau...@iodoctors.com

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



--
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite

Re: [Lustre-discuss] only root can read/write on new lustre filesystem (Lustre 2.6)

2015-02-04 Thread Jeff Johnson
Does your mds node have access to your non-privileged users/groups identity
data by way of methods like ldap or local files (/etc/passwd, /etc/group,
etc)?

Your clients and mds need to be on the same sheet of music.
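
A quick sanity check (the names are placeholders): the same lookups should
return the same uid/gid on a client and on the MDS.

getent passwd jdoe
getent group researchers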

--Jeff


On Wed, Feb 4, 2015 at 5:19 PM, No One jc.listm...@gmail.com wrote:

 I'm sure I've overlooked something in the documentation, but for the life
 of me, I can't figure out what.

 I've setup a new cluster running 2.6 and everything seemed to go fine.
 I've got the filesystem mounted on a few clients and as long as I am root,
 I can read and write to it just fine.

 If I switch to another user, I get something like this:

 -bash-4.1$ ls -al
 ls: cannot access test: Permission denied
 total 8
 drwxr-xr-x 4 root root 4096 Feb  5 00:58 .
 drwxr-xr-x 3 root root 4096 Feb  4 23:05 ..
 d? ? ??   ?? test


 if I am root though, it looks fine:

 [cvt]# ls -al
 total 12
 drwxr-xr-x 4 root root 4096 Feb  5 00:58 .
 drwxr-xr-x 3 root root 4096 Feb  4 23:05 ..
 drwxr-xr-x 2 test test 4096 Feb  5 01:13 test


 No amount of chmod'ing or chown'ing has worked to resolve this.  I know
 I've seen this before and I feel like it was in the context of NFS, but I'm
 not finding it.

 I could use any help/advice to figure this out.

 Thanks!


 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss




-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre 2.5.2 Client Errors

2014-08-22 Thread Jeff Johnson
Murshid,

Does the error message actually have "Tell Peter, lookup on mtpt, it open"
text in it?

If so, one of the funnier Lustre error messages to be sure.

--Jeff


On Fri, Aug 22, 2014 at 12:28 AM, Murshid Azman murshid.az...@gmail.com
wrote:

 Hello Everyone,

 We're trying to run a cluster image on Lustre filesystem version 2.5.2 and
 repeatedly seeing the following message. Haven't seen anything bizarre on
 this machine other than this:

 2014-08-22T13:52:01+07:00 node01 kernel: LustreError:
 4271:0:(namei.c:530:ll_lookup_it()) Tell Peter, lookup on mtpt, it open
 2014-08-22T13:52:01+07:00 node01 kernel: LustreError:
 4271:0:(namei.c:530:ll_lookup_it()) Skipped 128 previous similar messages

 This doesn't happen to our desktop Lustre clients.

 I'm wondering if anyone has any idea what this means.

 Thanks,
 Murshid Azman.

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss




-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] OST Failover Configuration (Active/Active) verification

2014-01-30 Thread Jeff Johnson
Peter,

You have it about half right. Since you are dealing with active filesystems
on the OST shared storage devices and you can never truly predict the type
of failure a system will have you need to add a software command/control
layer that will manage failover, remounting storage and cutting power to
the failing node. Failure to do that could result in a split brain
situation where your OST backend filesystem gets corrupted.

You have the mkfs.lustre part right. You need to add heartbeat/corosync to
the configuration and configure it so the two systems monitor each other
with a watchdog heartbeat. Failing machine gets sensed by healthy machine
and the healthy machine shoots it (STONITH: shoot the other node in the
head) via ipmi power control or a smart rack PDU like an APC ethernet
managed PDU.

The heartbeat/corosync config takes your existing config and adds automated
directives like:
node1 mount sdb : node2 mounts sdc
if node1 dies node2 mounts sdb
if node2 dies node1 mounts sdc
if the surviving node senses restoration of the heartbeat, locally defined
storage gets remounted to its owning node

Intel's IEEL Lustre distribution does all of this sort of thing
automagically. Or you can manually install Lustre and the corosync app
packages and configure it manually.
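
If you go the manual route, a rough pcs/pacemaker sketch of one OST resource
plus fencing (resource names, devices, mount points and IPMI details below
are all placeholders):

pcs resource create ost-sdb ocf:heartbeat:Filesystem \
    device=/dev/sdb directory=/mnt/ost0 fstype=lustre \
    op monitor interval=120s
pcs constraint location ost-sdb prefers node1=100
pcs stonith create node1-ipmi fence_ipmilan \
    pcmk_host_list=node1 ipaddr=10.0.0.101 login=admin passwd=secret lanplus=1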

--Jeff


On Thu, Jan 30, 2014 at 9:21 PM, Peter Mistich
peter.mist...@rackspace.comwrote:

 hello,

 anyone here can answer a questions about OST Failover Configuration
 (Active/Active) I think I understand but want to make sure.

 I configure 2 oss  servernames = node1 and node2  with 2 shared drives
 /dev/sdb and /dev/sdc and  on node1

 I run the command on node1 mkfs.lustre --fsname=testfs --ost
 --failnode=node2 --mgsnode=msg /dev/sdb

 I run the command on node2 mkfs.lustre --fsname=testfs --ost
 --failnode=node1 --mgsnode=msg /dev/sdc

 I mount  /dev/sdb on node 1 and  mount /dev/sdc on node2

 if node1 fails then I just mount  /dev/sdb on node2 and that is how
 active/active works

 is this correct ?

 Thanks,
 Pete
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss




-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre 1.8.5 client failed to mount lustre

2013-10-17 Thread Jeff Johnson
://www.clustertech.com
 Address: Units 211 - 213, Lakeside 1, No. 8 Science Park West Avenue, Hong 
 Kong Science Park, N.T. Hong Kong
 Hong Kong Beijing Shanghai Guangzhou Shenzhen Wuhan 
 Sydney

 **
 The information contained in this e-mail and its attachments is confidential 
 and intended solely for the specified addressees. If you have received this 
 email in error, please do not read, copy, distribute, disclose or use any 
 information of this email in any way and please immediately notify the sender 
 and delete this email. Thank you for your cooperation.
 **


 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-performance Computing / Lustre Filesystems / Scale-out Storage

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4

2013-10-17 Thread Jeff Johnson
Hola Eduardo,

How are the OSTs connected to the OSS (SAS, FC, Infiniband SRP)?
Are there any non-Lustre errors in the dmesg output of the OSS?
Block devices error on the OSS (/dev/sd?)?

If you are losing [scsi,sas,fc,srp] connectivity you may see this sort 
of thing. If the OSTs are connected to the OSS node via IB SRP and your 
IB fabric gets busy or you have subnet manager issues you might see a 
condition like this.

Is this the AliceFS at DGTIC?

--Jeff



On 10/17/13 3:52 PM, Eduardo Murrieta wrote:
 Hello,

 this is my first post on this list, I hope someone can give me some 
 advise on how to resolve the following issue.

 I'm using the lustre release 2.4.0 RC2 compiled from whamcloud 
 sources, this is an upgrade from lustre 2.2.22 from same sources.

 The situation is:

 There are several clients reading files that belong mostly to the
 same OST. After a period of time the clients start losing contact
 with this OST and processes stop due to this fault; here is the state
 for such an OST on one client:

 client# lfs check servers
 ...
 ...
 lustre-OST000a-osc-8801bc548000: check error: Resource temporarily 
 unavailable
 ...
 ...

 checking dmesg on client and OSS server we have:

 client# dmesg
 LustreError: 11-0: lustre-OST000a-osc-8801bc548000: Communicating 
 with 10.2.2.3@o2ib, operation ost_connect failed with -16.
 LustreError: Skipped 24 previous similar messages

 OSS-server# dmesg
 
 Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 
 (at 10.2.64.4@o2ib) reconnecting
 Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 
 (at 10.2.64.4@o2ib) refused reconnection, still busy with 9 active RPCs
 

 At this moment I can ping from client to server and vice versa, but 
 sometimes this call also hangs on server and client.

 client# # lctl ping OSS-server@o2ib
 12345-0@lo
 12345-OSS-server@o2ib

 OSS-server# lctl ping 10.2.64.4@o2ib
 12345-0@lo
 12345-10.2.64.4@o2ib

 This situation happens very frequently and specially with jobs that 
 process a lot of files in an average size of 100MB.

 The only solution that  I find to reestablish the communication 
 between the server and the client is restarting both machines.

 I hope some have an idea what is the reason for the problem and how 
 can I reset the communication with the clients without restarting the 
 machines.

 thank you,

 Eduardo
 UNAM@Mexico

 -- 
 Eduardo Murrieta
 Unidad de Cómputo
 Instituto de Ciencias Nucleares, UNAM
 Ph. +52-55-5622-4739 ext. 5103



 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-performance Computing / Lustre Filesystems / Scale-out Storage

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4

2013-10-17 Thread Jeff Johnson
Ah, I understand. I performed the onsite Lustre installation of Alice and
worked with JLG and his staff. Nice group of people!

This seems like a backend issue. Ldiskfs or the LSI RAID devices. Do you
see any read/write failures reported on the OSS of the sd block devices
where the OSTs reside? Something is timing out; disk I/O or the OSS is
running too high of an iowait under load.
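
A quick thing to grep for on the OSS (the pattern is just a starting point):

dmesg | egrep -i 'i/o error|task abort|reset|sd[a-z].*error'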

How many OSS nodes in the filesystem? Are these operations striped across
all OSTs? Across multiple OSSs?

I still have an account on DGTIC's gateway, I could login and look. :-)

--Jeff

On Thursday, October 17, 2013, Eduardo Murrieta wrote:

 Hello Jeff,

 Non, this is a lustre filesystem for Instituto de Ciencias Nucleares at
 UNAM, we are working on the installation for Alice at DGTIC too, but this
 problem is with our local filesystem.

 The OST is connected using a LSI-SAS controller, we have 8 OSTs on the
 same server, there are nodes that loose connection with all the OSTs that
 belong to this server but the problem is not related with the OST-OSS
 communication, since I can access this  OST and read files stored there
 from other lustre clients.

 The problem is a deadlock condition in which the OSS and some clients
 refuse connections from each other as I can see from dmesg:

 in the client
 LustreError: 11-0: lustre-OST000a-osc-8801bc548000: Communicating with
 10.2.2.3@o2ib, operation ost_connect failed with -16.

 in the server
 Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 (at
 10.2.64.4@o2ib) reconnecting
 Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 (at
 10.2.64.4@o2ib) refused reconnection, still busy with 9 active RPCs

 this only happen with clients that are reading a lot of small files
 (~100MB each) in the same OST.

 thank you,

 Eduardo



 2013/10/17 Jeff Johnson jeff.john...@aeoncomputing.com

 Hola Eduardo,

 How are the OSTs connected to the OSS (SAS, FC, Infiniband SRP)?
 Are there any non-Lustre errors in the dmesg output of the OSS?
 Block devices error on the OSS (/dev/sd?)?

 If you are losing [scsi,sas,fc,srp] connectivity you may see this sort
 of thing. If the OSTs are connected to the OSS node via IB SRP and your
 IB fabric gets busy or you have subnet manager issues you might see a
 condition like this.

 Is this the AliceFS at DGTIC?

 --Jeff



 On 10/17/13 3:52 PM, Eduardo Murrieta wrote:
  Hello,
 
  this is my first post on this list, I hope someone can give me some
  advise on how to resolve the following issue.
 
  I'm using the lustre release 2.4.0 RC2 compiled from whamcloud
  sources, this is an upgrade from lustre 2.2.22 from same sources.
 
  The situation is:
 
  There are several clients reading files that belongs mostly to the
  same OST, afther a period of time the clients starts loosing contact
  with this OST and processes stops due to this fault, here is the state
  for such OST on one client:
 
  client# lfs check servers
  ...
  ...
  lustre-OST000a-osc-8801bc548000: check error: Resource temporarily
  unavailable
  ...
  ...
 
  checking dmesg on client and OSS server we have:
 
  client# dmesg
  LustreError: 11-0: lustre-OST000a-osc-8801bc548000: Communicating
  with 10.2.2.3@o2ib, operation ost_connect failed with -16.
  LustreError: Skipped 24 previous similar messages
 
  OSS-server# dmesg
  
  Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9
  (at 10.2.64.4@o2ib) reconnecting
  Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9
  (at 10.2.64.4@o2ib) refused reconnection, still busy with 9 active RPCs
  
 
  At this moment I can ping from client to server and vice versa, but
  some time this call also hangs on server and client.
 
  client# # lctl ping OSS-server@o2ib
  12345-0@lo
  12345-OSS-server@o2ib
 
  OSS-server# lctl ping 10.2.64.4@o2ib
  12345-0@lo
  12345-10.2.64.4@o2ib
 
  This situation happens very frequently and specially with jobs that
  process a lot of files in an average size of 100MB.
 
  The only solution that  I find to reestablish the communication
  between the server and the client is restarting both machines.
 
  I hope some have an idea what is the reason for the problem and how
  can I reset the communication with the clients without restarting the
  machines.
 
  thank you,
 
  Eduardo
  UNAM@Mexico
 
  --
  Eduardo Murrieta
  Unidad de Cómputo
  Instituto de Ciencias Nucleares, UNAM
  Ph. +52-55-5622-4739 ext. 5103
 
 
 
  ___
  Lustre-discuss mailing list
  Lustre-discuss@lists.lustre.org
  http://lists.lustre.org/mailman/listinfo/lustre-discuss


 --
 --
 Jeff Johnson
 Co-Founder
 Aeon Computing

 jeff.john...@aeoncomputing.com
 www.aeoncomputing.com
 t: 858-412-3810 x1001   f: 858-412-3845
 m: 619-204-9061

 4170 Morena Boulevard, Suite D - San Diego, CA 92117

 High-performance Computing / Lustre Filesystems / Scale-out Storage

Re: [Lustre-discuss] Broken communication between OSS and Client on Lustre 2.4

2013-10-17 Thread Jeff Johnson
Eduardo,

One or two E5506 CPUs in the OSS? What is the specific LSI controller and
how many of them in the OSS?

I think the OSS is under-provisioned for 8 OSTs. I'm betting you run a high
iowait on those sd devices during your problematic run. The iowait probably
grows until deadlock. Can you run the job while watching top in a shell on
the OSS? You're likely hitting 99% iowait.
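
Something like this while the job runs would show it (iostat is from the
sysstat package; device names will obviously differ on your box):

iostat -xm 5     # watch %util and await climb on the OST sd devices
top -d 5         # watch the wa (iowait) figure in the CPU summary line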

--Jeff

On Thursday, October 17, 2013, Eduardo Murrieta wrote:

 I have this on the debug_file from my OSS:

 0010:02000400:0.0:1382055634.785734:0:3099:0:(ost_handler.c:940:ost_brw_read())
 lustre-OST: Bulk IO read error with 0afb2e4c-d
 870-47ef-c16f-4d2bce6dabf9 (at 10.2.64.4@o2ib), client will retry: rc -107

 0400:02000400:0.0:1382055634.786061:0:3099:0:(watchdog.c:411:lcw_update_time())
 Service thread pid 3099 completed after 227.00s. This indicates the system
 was overloaded (too many service threads, or there were not enough hardware
 resources).

 But I can read files stored on this OST from other clients without
 problems. For example:

 $ lfs find --obd lustre-OST .
 ./src/BLAS/srot.f
 ...

 $ more ./src/BLAS/srot.f
   SUBROUTINE SROT(N,SX,INCX,SY,INCY,C,S)
 * .. Scalar Arguments ..
   REAL C,S
   INTEGER INCX,INCY,N
 * ..
 * .. Array Arguments ..
   REAL SX(*),SY(*)
 ...
 ...

 This OSS has 8 OSTs of 14 TB each, with 12 GB of RAM and a quad-core Xeon
 E5506. Tomorrow I'll increase the memory, if this is the missing resource.









 2013/10/17 Joseph Landman land...@scalableinformatics.com

 Are there device- or filesystem-level error messages on the server? This
 almost looks like a corrupted file system.

 Please pardon brevity and typos ... Sent from my iPhone

 On Oct 17, 2013, at 6:11 PM, Eduardo Murrieta emurri...@nucleares.unam.mx
 wrote:

 Hello Jeff,

 No, this is a Lustre filesystem for Instituto de Ciencias Nucleares at
 UNAM; we are working on the installation for Alice at DGTIC too, but this
 problem is with our local filesystem.

 The OSTs are connected using an LSI SAS controller and we have 8 OSTs on the
 same server. There are nodes that lose connection with all the OSTs that
 belong to this server, but the problem is not related to the OST-OSS
 communication, since I can access this OST and read files stored there
 from other Lustre clients.

 The problem is a deadlock condition in which the OSS and some clients
 refuse connections from each other as I can see from dmesg:

 in the client
 LustreError: 11-0: lustre-OST000a-osc-8801bc548000: Communicating with
 10.2.2.3@o2ib, operation ost_connect failed with -16.

 in the server
 Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 (at
 10.2.64.4@o2ib) reconnecting
 Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9 (at
 10.2.64.4@o2ib) refused reconnection, still busy with 9 active RPCs

 this only happens with clients that are reading a lot of small files
 (~100MB each) on the same OST.

 thank you,

 Eduardo



 2013/10/17 Jeff Johnson jeff.john...@aeoncomputing.com

 Hola Eduardo,

 How are the OSTs connected to the OSS (SAS, FC, Infiniband SRP)?
 Are there any non-Lustre errors in the dmesg output of the OSS?
 Block device errors on the OSS (/dev/sd?)?

 If you are losing [scsi,sas,fc,srp] connectivity you may see this sort
 of thing. If the OSTs are connected to the OSS node via IB SRP and your
 IB fabric gets busy or you have subnet manager issues you might see a
 condition like this.

 Is this the AliceFS at DGTIC?

 --Jeff



 On 10/17/13 3:52 PM, Eduardo Murrieta wrote:
  Hello,
 
  this is my first post on this list, I hope someone can give me some
  advice on how to resolve the following issue.
 
  I'm using the lustre release 2.4.0 RC2 compiled from whamcloud
  sources, this is an upgrade from lustre 2.2.22 from same sources.
 
  The situation is:
 
  There are several clients reading files that belong mostly to the
  same OST; after a period of time the clients start losing contact
  with this OST and processes stop due to this fault. Here is the state
  for such an OST on one client:
 
  client# lfs check servers
  ...
  ...
  lustre-OST000a-osc-8801bc548000: check error: Resource temporarily
  unavailable
  ...
  ...
 
  checking dmesg on client and OSS server we have:
 
  client# dmesg
  LustreError: 11-0: lustre-OST000a-osc-8801bc548000: Communicating
  with 10.2.2.3@o2ib, operation ost_connect failed with -16.
  LustreError: Skipped 24 previous similar messages
 
  OSS-server# dmesg
  
  Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9
  (at 10.2.64.4@o2ib) reconnecting
  Lustre: lustre-OST000a: Client 0afb2e4c-d870-47ef-c16f-4d2bce6dabf9
  (at 10.2.64.4@o2ib) refused reconnection, still busy with 9 active RPCs
  
 
  At this moment I can ping from client to server and vice versa, but
  some time this call also hangs on server and client.
 
  client# # lctl ping OSS-server@o2ib
  12345-0@lo
  12345-OSS-server@o2ib

Re: [Lustre-discuss] ldiskfs for MDT and zfs for OSTs?

2013-10-07 Thread Jeff Johnson
On 10/7/13 11:23 AM, Anjana Kar wrote:
 Here is the exact command used to create a raidz2 pool with 8+2 drives,
 followed by the error messages:

 mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs --index=0
 --mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2 /dev/sda /dev/sdc
 /dev/sde /dev/sdg /dev/sdi /dev/sdk /dev/sdm /dev/sdo /dev/sdq /dev/sds

An additional suggestion: you should build zpools with persistent device 
names like /dev/disk/by-path or /dev/disk/by-id. Standard 'sd' device 
names are not persistent and can change after a reboot or hardware 
change, which would be bad for a zpool with data on it.
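
For example, something along these lines (the by-id strings below are only
placeholders; use whatever 'ls -l /dev/disk/by-id/' or by-path shows for your
ten drives):

mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs --index=0 \
--mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2 \
/dev/disk/by-id/scsi-35000c50012340001 /dev/disk/by-id/scsi-35000c50012340002 \
... (one by-id entry per drive, ten in total for the 8+2 raidz2)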

Also, I don't know if it's just email formatting, but be sure that command 
is entered on one line (or with continuation backslashes, as below):

mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs --index=0 \
--mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2 /dev/sda /dev/sdc \
/dev/sde /dev/sdg /dev/sdi /dev/sdk /dev/sdm /dev/sdo /dev/sdq /dev/sds



--Jeff

-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-performance Computing / Lustre Filesystems / Scale-out Storage

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Known working 1.8.x client mouting 2.x server combinations?

2013-08-07 Thread Jeff Johnson
Greetings,

Is there a table of known stable client to server combinations for a 1.8.x
client to mount a 2.x LFS?

I'm assisting a group trying to mount two LFSs, one is a 1.8.7 LFS and the
other is a 2.1.6 LFS. The client access is over tcp (10GbE).

I installed the latest 1.8.9 lustre-client for CentOS6 on a test client. I
am able to mount the 1.8.7 LFS with no problem. If I umount the 1.8.7 LFS
and mount the 2.1.6 LFS the client machine deadlocks instantly. No error
messages, after 2-3mins the workstation does a hardware reset. No log
entries.

The ultimate goal is being able to mount both the 1.8.7 and 2.1.6 LFSs
simultaneously from the clients over tcp.

I understand that newer server code is friendlier to older client code,
but how new? The 2.1.x tree, or do I have to be in the 2.3 tree to get this ability?

Thanks..

-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Completely lost MGT/MDT

2013-06-26 Thread Jeff Johnson
I am not aware of any tool or method to recover from a lost MGT/MDT. Do 
you have any recent backups of your MDT device?

I would hold on to your MDT device with care and see if someone can help 
you resurrect it.

--Jeff


On 6/26/13 3:01 PM, Andrus, Brian Contractor wrote:
 All,

 We have a sizeable filesystem and during a hardware upgrade, our MDT disk was 
 completely lost.
 I am trying to find if and how to recover from such an event, but am not 
 finding anything.

 We were running lustre 2.3 and have upgraded to 2.4 (or are in the process of 
 it).

 Can anyone point me in the right direction here?

 Thanks in advance,


 Brian Andrus
 ITACS/Research Computing
 Naval Postgraduate School
 Monterey, California
 voice: 831-656-6238


 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

/* New Address */
4170 Morena Boulevard, Suite D - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre 2.2 with centos 6.3 gives problem while loading o2ib module for infiniband

2013-03-26 Thread Jeff Johnson
Faheem,

Could you reply with some error messages or details on the failure? I do 
not have a crystal ball. Details of the error, your lnet kernel module 
options, and the state of the IB interface when the error occurs, would be helpful.

--Jeff

On 3/26/13 3:00 AM, faheem patel wrote:
 Dear All,

 we are facing a problem while loading the o2ib module.

 Lustre 2.2 with CentOS 6.3 gives a problem while loading the o2ib module for 
 InfiniBand.


 Thanks in advance

 Regards,

 Faheem Patel


 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4170 Morena Boulevard, Suite D - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Job: Lustre Development Engineer, Aeon Computing, Inc - San Diego, CA USA

2013-02-03 Thread Jeff Johnson
Apologies to the list for posting this Lustre job opportunity. No 
disrespect meant.

Lustre Development Engineer, Aeon Computing, Inc - San Diego, CA USA

Position Summary
Primary role will be to support and enhance a single large Lustre 
filesystem installation.

Primary Duties / Responsibilities
Analyze, design, program, debug, and modify Lustre code
Identification and resolution of Lustre filesystem and LNET bugs.
Respond to support requests by analyzing issues and creating code fixes 
or providing workarounds
Development of site customized features
Assist Intel HPDD Lustre development team with developing Chroma 
functionality
Advise and assist in planning adoption of future releases of Lustre 
including Lustre on ZFS
Engaging with Intel HPDD Lustre development team and the Lustre community.

Qualifications (Knowledge, Skills, Abilities)
Linux kernel development
Deep acumen with high performance storage systems
Proactive and solution-oriented problem solver
Prior project and/or team leadership experience is not required but 
would be considered an asset
Strong verbal and written English communication skills.
Ability to work well in a distributed team environment.
High level of attention to detail and comfortable multi-tasking

Requirements (Education, Certification, Training, and Experience)
Master’s Degree in computer science and 3 years of relevant experience; 
or Bachelor’s Degree in computer science or closely related field and 5 
years of relevant experience; or equivalent work experience
Development experience preferably with Lustre, LNET, Ethernet and TCP/IP
Relocation to San Diego, CA

Physical Demands / Work Environment
Ability and desire to work as part of a geographically-distributed team


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

/* New Address */
4170 Morena Boulevard, Suite D - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Adding iSCSI SAN to Lustre

2013-01-28 Thread Jeff Johnson
Jon,

Any block device, be it disk, RAID, IB SRP or iSCSI, requires an OSS node 
to front-end the block storage. If the iSCSI storage device is a 
storage server with a software iSCSI layer, you could potentially strip 
out the iSCSI software and lay down Linux and Lustre on it. Without 
knowing exactly what iSCSI system it is, my statement about stripping out 
the software is a generalization.
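
As a rough sketch, if the box can only present iSCSI LUNs, the flow on a
Linux OSS node would look something like this (target IQN, IPs, index and
device names below are made up):

iscsiadm -m discovery -t sendtargets -p 192.168.10.50
iscsiadm -m node -T iqn.2013-01.com.example:lun0 -p 192.168.10.50 --login
mkfs.lustre --fsname=mylfs --ost --index=4 --mgsnode=10.0.0.1@tcp /dev/sdX
mount -t lustre /dev/sdX /mnt/lustre/ost4

Whether that path performs well enough to be worth doing is another question.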

--Jeff


On 1/28/13 4:13 PM, Jon Yeargers wrote:

 I may have an opportunity to repurpose an existing iSCSI SAN device. 
 If I wanted to add it to an existing Lustre setup (or create a new 
 one) – my understanding is that I would need a machine to act as the 
 ‘communications link’ to Lustre from the SAN device. Something has to 
 represent the OSS device – right? An iSCSI storage device would need a 
 representative for this.. ?

 Does that make any sense?



 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

/* New Address */
4170 Morena Boulevard, Suite D - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] is Luster ready for prime time?

2013-01-17 Thread Jeff Johnson
Greg,

I'm echoing Charles' comments a bit. No single filesystem is good 
at everything. While it is my opinion that Lustre can be very stable, 
and as Colin stated the underlying hardware and configuration are 
crucial to that end, the filesystem may not be the best performer for 
every data access model.

Like every other filesystem, Lustre has use cases where it excels and 
others where its overhead may be less than optimal. Other filesystems and 
storage devices also suffer from "one size fits most."

Many here would likely be biased toward Lustre but many of those people 
have also used many other options on the market and ended up here.

--Jeff

-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117




On 1/17/13 9:17 AM, greg whynott wrote:
 Hello,

 just signed up today, so please forgive me if this question has been 
 covered recently. I'm in a bit of a rush to get an answer on this as we 
 need to make a decision soon; the idea of using Lustre was thrown 
 into the mix very late in the decision-making process.


 We are looking to procure a new storage solution which will 
 predominantly be used for HPC output but will also be used as our main 
 business-centric storage for day-to-day use, meaning the file system 
 needs to be available 24/7/365. The last time I was involved in 
 considering Lustre was about 6 years ago, and at that time it was being 
 considered only as scratch space for HPC usage.

 Our VMs and databases would remain on non-Lustre storage as we already 
 have that in place and it works well. The Lustre file system 
 would potentially hold everything else. Projects we work on typically 
 take up to 2 years to complete, and during that time we would want all 
 assets to remain on the file system.

 Some of the vendors on our short list include HDS (BlueArc), Isilon 
 and NetApp. Last week we started bouncing the idea of using Lustre 
 around. I'd love to use it if it is considered stable enough to do so.

 your thoughts and/or comments would be greatly appreciated. thanks for 
 your time.

 greg





 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-28 Thread Jeff Johnson
Jason,

The prebuilt server-side Lustre packages from Whamcloud are built 
against RHEL/CentOS kernel sources with kernel-ib active in them. This 
means that any of the Lustre prebuilt server packages are already tied 
to RHEL's kernel-ib.

To accomplish your stated goal you'll have to start with a non-Whamcloud, 
stock kernel (plus headers, devel, etc.). Then compile/install 
the OFED version of your choice. Once you have that you can build Lustre 
from source, where it will compile against OFED and the installed kernel.
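
Roughly, that last step looks like this (assuming the OFED development bits
land in /usr/src/ofa_kernel, which is where Mellanox OFED puts them; adjust
paths and versions for your install):

cd lustre-2.1.4
./configure --with-linux=/usr/src/kernels/$(uname -r) \
            --with-o2ib=/usr/src/ofa_kernel
make rpms

Then install the resulting kernel-module and userspace RPMs on the servers.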

--Jeff

---
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4170 Morena Boulevard, Suite D - San Diego, CA 92117

/* Follow us on Twitter - @AeonComputing */




On 12/28/12 3:54 PM, Jason Brooks wrote:
 Hello,

 I am having trouble installing the server modules for Lustre 2.1.4; we 
 use Mellanox's OFED distribution so we may use InfiniBand. Would you 
 folks look at my procedure and results below and let me know what you 
 think? Thanks very much!

 The mellanox ofed installation builds and installs some kernel modules 
 too, so I used this method to ensure OFED compiled against the correct 
 kernel. This is on centos 6.3.

  1. download all lustre rpms from whamcloud
  2. install kernel, kernel-firmware, kernel-headers, and kernel-devel
  1. in this case, it's the rpm files with
 2.6.32-279.14.1.el6_lustre.x86_64 in their name
  3. reboot into this lustre kernel
  4. install the remaining rpms
  5. download ofed from mellanox
 MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64.iso
  1. build mellanox ofed bits using the lustre kernel and
 kernel-devel info
  2. install mellanox ofed
  6. reboot
  7. upon reboot, if I do NOT have o2ib3 in my lnet networks
 parameters, I can modprobe lnet and lustre.
  8. if I DO have o2ib3 present in the lnet parameters, running
 modprobe lustre gets me:

 WARNING: Error inserting fld 
 (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fld.ko): 
 Input/output error
 Input/output error
 WARNING: Error inserting fid 
 (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fid.ko):
  
 Input/output error
 WARNING: Error inserting mdc 
 (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/mdc.ko):
  
 Input/output error
 WARNING: Error inserting osc 
 (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/osc.ko):
  
 Input/output error
 WARNING: Error inserting lov 
 (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lov.ko):
  
 Input/output error
 FATAL: Error inserting lustre 
 (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lustre.ko):
  
 Input/output error


 dmesg shows:
 ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
 ko2iblnd: Unknown symbol ib_fmr_pool_unmap
 ko2iblnd: disagrees about version of symbol ib_create_cq
 ko2iblnd: Unknown symbol ib_create_cq
 …







 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Applications of Lustre - streaming?

2012-12-07 Thread Jeff Johnson
On 12/7/12 9:34 AM, Dilger, Andreas wrote:
 I've been using Lustre for years with my home MythTV (Linux PVR) setup.
Nerd. :)

-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

/* New Address */
4170 Morena Boulevard, Suite D - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-21 Thread Jeff Johnson
Megan,

You will have to rebuild Lustre from source. Furthermore you will have 
to have the Mellanox ib driver source installed so the Lustre build 
process can grab the necessary bits from the Mellanox source.

The issue you are seeing is exactly what you think it is. The WC builds 
use the RHEL in-kernel IB driver. I have even had issues with MDS/OSS 
boxes running RHEL in-kernel IB and clients running Mellanox OFED IB 
drivers. Even though IB is a standard, you really need to have 
everything, from core to edge, talking the same driver.
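
A quick sanity check on any node is to see where its IB modules actually come
from (ofed_info only exists if an OFED/MOFED package set is installed):

ofed_info | head -1
modinfo mlx4_ib | grep filename
modinfo ko2iblnd | grep -E 'filename|depends'

If the mlx4_ib path is under kernel/drivers/infiniband/ you are on the distro
in-kernel stack; an out-of-tree OFED install puts its modules elsewhere.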

I recently did nearly the same config you have; RHEL6.2 x86_64, MLX 
OFED, Lustre 2.1.3.

You could opt to run your Mellanox IB HCA using the RHEL in-kernel IB 
drivers and not have to recompile anything.

--Jeff


On 11/20/12 1:20 PM, Ms. Megan Larko wrote:
 Hello to Everyone!

 I have a question to which I think I know the answer, but I am seeking
 confirmation (re-assurance?).

 I have built a RHEL 6.2 system with lustre-2.1.2. I am using the
 rpms from the Whamcloud site for Linux kernel
 2.6.32_220.17.1.el6_lustre.x86_64 along with the version-matching
 lustre, lustre-modules, lustre-ldiskfs, and kernel-devel. I also
 have from the Whamcloud site
 kernel-ib-1.8.5-2.6.32-220.17.1.el6_lustre.x86_64 and the related
 kernel-ib-devel for same.

 The lustre file system works properly for TCP.

 I would like to use InfiniBand.   The system has a new Mellanox card
 for which mlxn1 firmware and drivers were installed.   After this was
 done (I cannot speak to before) the IB network will come up on boot
 and copy and ping in a traditional network fashion.

 Hard Part:  I would like to run the lustre file system on the IB (ib0).
 I re-created the lustre network to use /etc/modprobe.d/lustre.conf
 pointing to o2ib in place of tcp0.   I rebuilt the mgs/mdt and all
 osts to use the IB network (the mgs/mds --failnode=[new_IB_addr] and
 the osts point to mgs on IB net).   When I modprobe lustre to start
 the system I receive error messages stating that there are
 Input/Output errors on Lustre modules fld.ko, fid.ko, mdc.ko, osc.ko and
 lov.ko. The lustre.ko cannot be started. A look in
 /var/log/messages reveals many 'Unknown symbol' and 'Disagrees about
 version of symbol' messages from the ko2iblnd module.

 A modprobe --dump-modversions /path/to/kernel/lo2iblnd.ko  shows it
 pointing to the Modules.symvers of the lustre kernel.

 Am I correct in thinking that because of the specific Mellanox IB
 hardware I have (with its own /usr/src/ofa_kernel/Module.symvers
 file), that I have to build Lustre-2.1.2 from tarball to use the
 configure --with-o2ib=/usr/src/ofa_kernel  mandating that this
 system use the ofa_kernel-1.8.5  modules and not the OFED 1.8.5 from
 the kernel-ib rpms  to which Lustre defaults in the Linux kernel?

 Is a rebuild of lustre from source mandatory, or is there a way in
 which I may point to the appropriate symbols needed by the
 ko2iblnd.ko?

 Enjoy the Thanksgiving holiday, for those U.S. readers. To everyone
 else in the world, have a great weekend!

 Megan Larko
 Hewlett-Packard
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

/* New Address */
4170 Morena Boulevard, Suite D - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lctl ping of Pacemaker IP

2012-11-01 Thread Jeff Johnson
Megan,

LNET pings aren't the same as TCP/IP (ICMP) pings. An LNET ping ('lctl ping') 
would need to reach an active LNET instance at the target address. I don't think 
you can bind LNET to a Pacemaker virtual IP, but I'll let someone smarter than me 
on this list confirm or correct me.

In any event, an LNET ping and an ICMP ping are completely separate animals.

--Jeff

Sent from my iPhone

On Nov 1, 2012, at 21:04, Ms. Megan Larko dobsonu...@gmail.com wrote:

 Greetings!
 
 I am working with Lustre-2.1.2 on RHEL 6.2.  First I configured it
 using the standard defaults over TCP/IP.   Everything worked very
 nicely using a real, static --mgsnode=a.b.c.x value which was the
 actual IP of the MGS/MDS system1 node.
 
 I am now trying to integrate it with Pacemaker-1.1.7. I believe I
 have most of the set-up completed, with a particular exception. The
 lctl ping command cannot ping the Pacemaker IP alias (say a.b.c.d).
 The generic ping command in RHEL 6.2 can successfully access the
 interface. The Pacemaker alias IP (for failover of the combined
 MGS/MDS node with Fibre Channel multipath storage shared between both
 MGS/MDS-configured machines) works in and of itself. I tested with
 an apache service.   The Pacemaker will correctly fail over the
 MGS/MDS from system1 to system2 properly.  If I go to system2 then my
 Lustre file system stops because it cannot get to the alias IP number.
 
 I did configure the lustre OSTs to use --mgsnode=a.b.c.d (a.b.c.d
 representing my Pacemaker IP alias).  A tunefs.lustre confirms the
 alias IP number.  The alias IP number does not appear in LNET (lctl
 list_nids), and lctl ping a.b.c.d fails.
 
 Should this IP alias go into the LNET data base?  If yes, how?   What
 steps should I take to generate a successful lctl ping a.b.c.d?
 
 Thanks for reading!
 Cheers,
 megan
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Large Corosync/Pacemaker clusters

2012-10-24 Thread Jeff Johnson
Shawn,

In my opinion you shouldn't be running corosync across any more than two 
machines. They should be configured in self-contained pairs (MDS pair, 
OSS pairs). Anything beyond that would be chaos to manage, even if it 
worked. Don't forget the STONITH portion; not every block storage 
implementation respects MMP protection.

--Jeff


On 10/19/12 9:52 AM, Hall, Shawn wrote:

 Hi,

 We’re setting up fairly large Lustre 2.1.2 filesystems, each with 18 
 nodes and 159 resources all in one Corosync/Pacemaker cluster as 
 suggested by our vendor. We’re getting mixed messages on how large of 
 a Corosync/Pacemaker cluster will work well between our vendor an others.

 1.Are there Lustre Corosync/Pacemaker clusters out there of this size 
 or larger?

 2.If so, what tuning needed to be done to get it to work well?

 3.Should we be looking more seriously into splitting this 
 Corosync/Pacemaker cluster into pairs or sets of 4 nodes?

 Right now, our current configuration takes a long time to start/stop 
 all resources (~30-45 mins), and failing back OSTs puts a heavy load 
 on the cib process on every node in the cluster. Under heavy IO load, 
 the many of the nodes will show as “unclean/offline” and many OST 
 resources will show as inactive in crm status, despite the fact that 
 every single MDT and OST is still mounted in the appropriate place. We 
 are running 2 corosync rings, each on a private 1 GbE network. We have 
 a bonded 10 GbE network for the LNET.

 Thanks,

 Shawn



 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

/* New Address */
4170 Morena Boulevard, Suite D - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] mounting Failover OSTs

2012-10-11 Thread Jeff Johnson
Brian,

Do you have corosync or other Linux HA software infrastructure running 
on these systems? You need an HA software layer to manage heartbeat 
monitoring, split-brain protection and mounting/migrating of resources.
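
As a very rough sketch (crm shell syntax; device path, mount point and node
name are made up), a managed OST resource looks something like this:

primitive res_ost0 ocf:heartbeat:Filesystem \
    params device=/dev/mapper/ost0 directory=/mnt/lustre/ost0 fstype=lustre \
    op monitor interval=120s timeout=60s \
    op start timeout=300s op stop timeout=300s
location loc_ost0_pref res_ost0 100: node00

With something like that in place you never mount the OSTs by hand; the HA
layer mounts each one on its preferred node and moves it to the partner on
failure.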

--Jeff

On 10/11/12 2:02 PM, Andrus, Brian Contractor wrote:
 All,

 I am starting to try and configure failover for our lustre filesystem.
 Node00 is the mgs/mdt
 Node00 is the oss for ost0 and failnode for ost1
 Node01 is the oss for ost1 and failnode for ost0

 Both osts are on an SRP network and are visible by both nodes.
 Ost0 is mounted on node00
 Ost1 is mounted on node01

 If I try to mount ost0 on node01 I see in the logs for node00:
   kernel: Lustre: Denying initial registration attempt from nid 
 10.100.255.250@o2ib, specified as failover

 So do I have to manually mount the OST for failover purposes when there is a 
 failure?
 I would have thought I could mount the OSTs on both nodes and Lustre would manage 
 which node is the active node.


 Brian Andrus
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

/* New Address */
4170 Morena Boulevard, Suite D - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Tar backup of MDT runs extremely slow, tar pauses on pointers to very large files

2012-05-30 Thread Jeff Johnson
Following up on my original post: I switched from the /bin/tar that comes 
with RHEL/CentOS 5.x to the Whamcloud-patched tar utility. The entire 
backup was successful and took only 12 hours to complete. CPU 
utilization was in the high 90s percent, but only on one core. The process 
was much faster than the standard tar shipped in RHEL/CentOS, and the only 
slowdowns were on file pointers to very large files (100TB+) with large 
stripe counts. The files that were going very slowly when I reported the 
initial problem were backed up instantly with the Whamcloud version of tar.

Best part, the MDT was saved and the 4PB filesystem is in production again.

--Jeff



On 5/30/12 3:02 PM, Andreas Dilger wrote:
 On 2012-05-29, at 1:28 PM, Peter Grandi wrote:
 The tar backup of the MDT is taking a very long time. So far it has
 backed up 1.6GB of the 5.0GB used in nine hours. In watching the tar
 process pointers to small or average size files are backed up quickly
 and at a consistent pace. When tar encounters a pointer/inode
 belonging to a very large file (100GB+) the tar process stalls on that
 file for a very long time, as if it were trying to archive the real
 filesize amount of data rather than the pointer/inode.
 If you have stripes on, a 100GiB file will have 100,000 1MiB
 stripes, and each requires a chunk of metadata. The descriptor
 for that file will thus have a potentially very large number of
 extents, scattered around the MDT block device, depending on how
 slowly the file grew, etc.
 While that may be true for other distributed filesystems, that is
 not true for Lustre at all.  The size of a Lustre object is not
 fixed to a chunk size like 32MB or similar, but rather is
 variable depending on the size of the file itself.  The number of
 stripes (== objects) on a file is currently fixed at file
 creation time, and the MDS only needs to store the location of
 each stripe (at most one per OST).  The actual blocks/extents of
 the objects are managed inside the OST itself and are never seen
 by the client or the MDS.

 Cheers, Andreas
 --
 Andreas Dilger   Whamcloud, Inc.
 Principal Lustre Engineerhttp://www.whamcloud.com/




 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


-- 
--
Jeff Johnson
Manager
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Tar backup of MDT runs extremely slow, tar pauses on pointers to very large files

2012-05-28 Thread Jeff Johnson
Greetings,

I am aiding in the recovery of a multi-Petabyte Lustre filesystem
(1.8.7) that went down hard due to a site-wide power loss. The power loss
put the MDT RAID volume into a critical state, and I was
able to get the md-raid-based MDT device mounted read-only and the MDT
mounted read-only as type ldiskfs.

I was able to successfully backup the extended attributes of the MDT.
This process took about 10 minutes.

The tar backup of the MDT is taking a very long time. So far it has
backed up 1.6GB of the 5.0GB used in nine hours. In watching the tar
process pointers to small or average size files are backed up quickly
and at a consistent pace. When tar encounters a pointer/inode
belonging to a very large file (100GB+) the tar process stalls on that
file for a very long time, as if it were trying to archive the real
filesize amount of data rather than the pointer/inode.

During this process there are no errors reported by kernel, ldiskfs,
md or tar. Nothing that would indiciate why things are so slow on
pointers to large files. In watching the tar process the CPU
utilization is at or near 100% so it is doing something. Running
iostat at the same time shows that while tar is at or near 100% CPU
there are no reads taking place on the MDT device and no writes to the
device where the tarball is being written.

It appears that the tar process goes to outer space when it encounters
pointers to very large files. Is this expected behavior?

The backup command used is the one from the MDT backup process in the
1.8 manual: 'tar zcvf tarfile --sparse .'

df reports the ldiskfs MDT as 5GB used:
/dev/md0   2636788616   5192372 2455778504   1% /mnt/mdt

df -i reports the ldiskfs MDT as having 10,300,000 inodes used:
/dev/md0   1758199808 10353389 1747846419    1% /mnt/mdt

Any feedback is appreciated!

--Jeff


--
--
Jeff Johnson
Partner
Aeon Computing

jeff dot johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Latest RHEL/CentOS kernel that compiles against Lustre 1.8 git tree?

2012-04-02 Thread Jeff Johnson
FYI, I tested the Lustre 1.8 patch that provides support for 
2.6.18-308.1.1, and while it does successfully compile, the resulting 
changes in the kernel severely impact the isci (Intel SAS) driver. 
Romley (new Xeon-E5) systems booting this driver panic during boot, just 
after switchroot. I will capture the trace and post it; it appears to be 
the result of patch changes to jbd.

--Jeff


On Fri, Mar 30, 2012 at 11:41 AM, Peter Jones pjo...@whamcloud.com wrote:
 Heh. Nice imagery Jeff. Yes, this patch is still under active
 development and also note that it will not be able to complete autotest
 until we switch it to use RHEL5.8 by default (hopefully soon, but this
 had been on hold until we saw whether we needed RC3 for 2.2). So, if you
 are willing to test the latest patch that would be appreciated and could
 accelerate things (please post your findings on the JIRA ticket), but
 regardless we should have this work completed in the near future.

--
--
Jeff Johnson
Aeon Computing

jeff dot johnson at aeoncomputing.com
www.aeoncomputing.com

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Latest RHEL/CentOS kernel that compiles against Lustre 1.8 git tree?

2012-03-30 Thread Jeff Johnson
Greetings,

Does anyone know the most recent kernel (RHEL/CentOS) that can be
successfully patched and compiled against the current Lustre 1.8 git
source tree? I attempted 2.6.18-308.1.1 but there are several patches
that fail. Quilt would not make it past the third patch in the series
file. Applying the patches manually reveals several hunk failures. I
require a kernel more recent than 2.6.18-274.3.1 for non-Lustre
related issues.

Thanks in advance.

--
--
Jeff Johnson
Aeon Computing

jeff dot johnson at aeoncomputing.com
www.aeoncomputing.com

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Latest RHEL/CentOS kernel that compiles against Lustre 1.8 git tree?

2012-03-30 Thread Jeff Johnson
I applied that patch at 9AM this morning. Several hunks failed and
then I noticed that the LU-1052 patch is literally morphing as the
morning progresses.

There is a difference between living on the bleeding edge and standing
in a field watching the razor's edge coming at you... :)


On Thu, Mar 29, 2012 at 11:40 PM, Johann Lombardi joh...@whamcloud.com wrote:
 On Thu, Mar 29, 2012 at 11:30:03PM -0700, Jeff Johnson wrote:
 Does anyone know the most recent kernel (RHEL/CentOS) that can be
 successfully patched and compiled against the current Lustre 1.8 git
 source tree? I attempted 2.6.18-308.1.1 but there are several patches
 that fail.

 LU-1052 is the jira ticket for 2.6.18-308.1.1 support. There is a patch here 
 which should help you to build against this kernel.

 Johann
 --
 Johann Lombardi
 Whamcloud, Inc.
 www.whamcloud.com


--
--
Jeff Johnson
Aeon Computing

jeff dot johnson at aeoncomputing.com
www.aeoncomputing.com

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Latest RHEL/CentOS kernel that compiles against Lustre 1.8 git tree?

2012-03-30 Thread Jeff Johnson
My Lustre system (IB, Xeon-E5, 4xOSS, 4xOST, 84TB, 16 clients) is a
development only system. No meaningful data. It is currently loaded
with CentOS5.8 and I have Lustre 1.8 source checked out via git last
night. I'm happy to test and provide feedback.

FTR, I think Intel's new isci driver is a real PITA.


On Fri, Mar 30, 2012 at 11:41 AM, Peter Jones pjo...@whamcloud.com wrote:
 Heh. Nice imagery Jeff. Yes, this patch is still under active
 development and also note that it will not be able to complete autotest
 until we switch it to use RHEL5.8 by default (hopefully soon, but this
 had been on hold until we saw whether we needed RC3 for 2.2). So, if you
 are willing to test the latest patch that would be appreciated and could
 accelerate things (please post your findings on the JIRA ticket), but
 regardless we should have this work completed in the near future.

--
--
Jeff Johnson
Aeon Computing

jeff dot johnson at aeoncomputing.com
www.aeoncomputing.com

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre-1.8.4 : BUG soft lock up

2011-08-09 Thread Jeff Johnson
Greetings,

The console output below is from a 1.8.4 OSS (RHEL5.5, 
2.6.18-194.3.1.el5_lustre.1.8.4, x86_64). I'm not saying it is a Lustre bug 
for sure; just wondering if anyone has seen this or something very 
similar. Updating to the 1.8.6 WC variant isn't an option at this time.

If anyone has some insight into this I'd appreciate the feedback.

Thanks,

--Jeff

BUG: soft lockup - CPU#6 stuck for 10s! [kswapd0:409]
CPU 6:
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) 
jbd2(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U)
osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) 
autofs4(U) hidp(U) l2cap(U) bluetooth(U)
lockd(U) sunrpc(U) ip6t_REJECT(U) xt_tcpudp(U) ip6table_filter(U) ip6_tables(U) 
x_tables(U) ib_iser(U) libiscsi2(U)
scsi_transport_iscsi2(U) scsi_transport_iscsi(U) ib_srp(U) rds(U) ib_sdp(U) 
ib_ipoib(U) ipoib_helper(U) ipv6(U) xfrm_nalgo(U)
crypto_api(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) 
iw_cm(U) ib_addr(U) ib_sa(U) mptsas(U) mptctl(U)
dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) backlight(U) sbs(U) 
power_meter(U) hwmon(U) i2c_ec(U) dell_wmi(U) wmi(U)
button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) 
parport(U) mlx4_ib(U) ib_mad(U) ib_core(U)
mlx4_en(U) joydev(U) shpchp(U) sg(U) mlx4_core(U) e1000e(U) serio_raw(U) 
pcspkr(U) i2c_i801(U) i2c_core(U) dm_raid45(U)
dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) mptspi(U) 
scsi_transport_spi(U) mptscsih(U) mptbase(U)
scsi_transport_sas(U) ata_piix(U) libata(U) sd_mod(U) scsi_mod(U) raid1(U) 
ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 409, comm: kswapd0 Tainted: G  2.6.18-194.3.1.el5_lustre.1.8.4 #1
RIP: 0010:[801011bf]  [801011bf] dqput+0x105/0x19f
RSP: 0018:8101be805cd0  EFLAGS: 0202
RAX: 81012e03f000 RBX:  RCX: 81012e03f000
RDX: ffe2 RSI: 0002 RDI: 81012f4f01c0
RBP: 81007fb4c918 R08: 81018b00 R09: 81007fb4c918
R10: 8101be805c60 R11: 8b6448f0 R12: 8101be805c60
R13: 8b6448f0 R14: ffe2 R15: 8b6448f0
FS:  () GS:8101bfc2adc0() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 00402000 CR3: 00201000 CR4: 06e0

Call Trace:
  [8010182b] dquot_drop+0x30/0x5e
  [8b647e83] :ldiskfs:ldiskfs_dquot_drop+0x43/0x70
  [80022d99] clear_inode+0xb4/0x123
  [80034e52] dispose_list+0x41/0xe0
  [8002d6a7] shrink_icache_memory+0x1b7/0x1e6
  [8003f466] shrink_slab+0xdc/0x153
  [80057e59] kswapd+0x343/0x46c
  [800a0ab2] autoremove_wake_function+0x0/0x2e
  [80057b16] kswapd+0x0/0x46c
  [800a089a] keventd_create_kthread+0x0/0xc4
  [80032890] kthread+0xfe/0x132
  [8009d728] request_module+0x0/0x14d
  [8005dfb1] child_rip+0xa/0x11
  [800a089a] keventd_create_kthread+0x0/0xc4
  [80032792] kthread+0x0/0x132
  [8005dfa7] child_rip+0x0/0x11


-- 
--
Jeff Johnson
Manager
Aeon Computing

jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Enabling mds failover after filesystem creation

2011-06-14 Thread Jeff Johnson
Greetings,

I am attempting to add mds failover operation to an existing v1.8.4 
filesystem. I have heartbeat/stonith configured on the mds nodes. What 
is unclear is what to change in the lustre parameters. I have read over 
the 1.8.x and 2.0 manuals and they are unclear as exactly how to enable 
failover mds operation on an existing filesystem.

Do I simply run the following on the primary mds node and specify the 
NID of the secondary mds node?

tunefs.lustre --param=failover.node=10.0.1.3@o2ib /dev/mdt device

where: 10.0.1.2=primary mds, 10.0.1.3=secondary mds

All of the examples for enabling failover via tunefs.lustre are for OSTs 
and I want to be sure that there isn't a different procedure for the MDS 
since it can only be active/passive.

Thanks,

--Jeff

--
Jeff Johnson
Aeon Computing

www.aeoncomputing.com
4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Enabling mds failover after filesystem creation

2011-06-14 Thread Jeff Johnson

Apologies, I should have been more descriptive.

I am running a dedicated MGS node and MGT device. The MDT is a 
standalone RAID-10 shared via SAS between two nodes, one being the 
current MDS and the second being the planned secondary MDS. Heartbeat 
and stonith w/ ipmi control is currently configured but not started 
between the two nodes.



On 6/14/11 12:12 PM, Cliff White wrote:

It depends - are you using a combined MGS/MDS?
If so, you will have to update the mgsnid on all servers to reflect 
the failover node, plus change the client mount string to show the 
failover node. Otherwise, it's the same procedure as with an OST.
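
For example, with a combined MGS/MDS pair the clients would mount with both
MGS NIDs listed (NIDs and fsname below are only placeholders):

mount -t lustre 10.0.1.2@o2ib:10.0.1.3@o2ib:/lustrefs /mnt/lustrefs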
cliffw


On Tue, Jun 14, 2011 at 12:06 PM, Jeff Johnson 
jeff.john...@aeoncomputing.com 
mailto:jeff.john...@aeoncomputing.com wrote:


Greetings,

I am attempting to add mds failover operation to an existing v1.8.4
filesystem. I have heartbeat/stonith configured on the mds nodes. What
is unclear is what to change in the lustre parameters. I have read
over
the 1.8.x and 2.0 manuals and they are unclear as exactly how to
enable
failover mds operation on an existing filesystem.

Do I simply run the following on the primary mds node and specify the
NID of the secondary mds node?

tunefs.lustre --param=failover.node=10.0.1.3@o2ib /dev/mdt device

where: 10.0.1.2=primary mds, 10.0.1.3=secondary mds

All of the examples for enabling failover via tunefs.lustre are
for OSTs
and I want to be sure that there isn't a different procedure for
the MDS
since it can only be active/passive.

Thanks,

--Jeff

--
Jeff Johnson
Aeon Computing

www.aeoncomputing.com http://www.aeoncomputing.com
4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
mailto:Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss




--
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com http://www.whamcloud.com




--

Jeff Johnson
Aeon Computing

www.aeoncomputing.com
4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre 1.8.4 - Local mount of ost for backup purposes, fs type ldiskfs or ext4?

2011-05-11 Thread Jeff Johnson
Greetings,

I am doing a local mount of a 8TB ost device in a Lustre 1.8.4 
installation. The ost was built with a backfstype of ldiskfs.

When attempting the local mount:

 mount -t ldiskfs /dev/sdc /mnt/save/ost

I get:

 mount: wrong fs type, bad option, bad superblock on /dev/sdt,
 missing codepage or other error

I am able to mount the same block device as ext4, just not as ldiskfs. I 
need to be able to mount as ldiskfs to get access to the extended 
attributes and back them up. Is this still the case with the ext4 
extensions for Lustre 1.8.4? I am able to mount read-only as ext4 but 
any attempt at reading the extended attributes with getfattr fails.
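
For reference, the sort of sequence I am attempting (roughly the 1.8 manual's
EA backup procedure; the output path is just an example) is:

mount -t ldiskfs -o ro /dev/sdc /mnt/save/ost
cd /mnt/save/ost
getfattr -R -d -m '.*' -P . > /tmp/ost-ea-backup.bak

It is the ldiskfs mount itself that fails with the error above.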

Thanks,

--Jeff

-- 
--
Jeff Johnson
Manager
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] aacraid kernel panic caused failover

2011-04-06 Thread Jeff Johnson
I have seen similar behavior on these controllers, on dissimilar configs and 
systems of different ages. Those happened to be non-Lustre standalone NFS and 
iSCSI target boxes. 

Went through controller and drive firmware upgrades, low-level firmware dumps and 
analysis from dev engineers.

In the end it was never really explained or resolved. It appears that these 
controllers, like small children, have tantrums and fall apart. A power cycle 
clears the condition.

Not the best controller for an OSS.

--Jeff

---mobile signature---
Jeff Johnson - Aeon Computing
jeff.john...@aeoncomputing.com


On Apr 6, 2011, at 1:05, Thomas Roth t.r...@gsi.de wrote:

 We have ~60 servers with these Adaptec controllers, and found this problem 
 just happens from time to time.
 Upgrading the aacraid module didn't help. We had contact with Adaptec, but 
 they had no clue either.
 The only good thing is that this adapter panic seems to happen in an instant, 
 halting the machine, with no prior phase of degradation: the controller
 doesn't start dropping every second bit or writing just the '1's and not 
 the '0's, so whatever data has made it to the disks before the
 crash seems to be sane. Reboot, and never buy Adaptec again.
 
 Cheers,
 Thomas
 
 On 04/06/2011 07:03 AM, David Noriega wrote:
 Ok I updated the aacraid driver and the raid firmware, yet I still had
 the problem happen, so I did more research and applied the following
 tweaks:
 
 1) Rebuilt mkinitrd with the following options:
 a) edit /etc/sysconfig/mkinitrd/multipath to contain MULTIPATH=yes
 b) mkinitrd initrd-2.6.18-194.3.1.el5_lustre.1.8.4.img
 2.6.18-194.3.1.el5_lustre.1.8.4 --preload=scsi_dh_rdac
 2) Added the local hard disk to the multipath black list
 3) Edited modprobe.conf to have the following aacraid options:
 options aacraid firmware_debug=2 startup_timeout=60 #the debug doesn't
 seem to print anything to dmesg
 4) Added pcie_aspm=off to the kernel boot options
 
 So things looked good for a while. I did have a problem mounting the
 lustre partitions but this was my fault in misconfiguring some lnet
 options I was experimenting with. I fixed that and just as a test, I
 ran 'modprobe lustre' since I wasn't ready to fail back the partitions
 just yet (I wanted to wait until activity was at its lowest). That was
 earlier today. I was about to fail back tonight, yet when I checked
 the server again I saw in dmesg the same aacraid problems from before.
 Is it possible Lustre is interfering with aacraid? It's weird, since I
 do have a duplicate machine and it's not having any of these problems.
 
 On Fri, Mar 25, 2011 at 9:55 AM, Temple  Jason jtem...@cscs.ch wrote:
 Adaptec should have the firmware and drivers on their site for your card.  
 If not adaptec, then SOracle will have it available somewhere.
 
 The firmware and system drivers usually have a utility that will check the 
 current version and upgrade it for you.
 
 Hope this helps (I use different cards, so I can't tell you exactly).
 
 -Jason
 
 -Original Message-
 From: David Noriega [mailto:tsk...@my.utsa.edu]
 Sent: venerdì, 25. marzo 2011 15:47
 To: Temple Jason
 Subject: Re: [Lustre-discuss] aacraid kernel panic caused failover
 
 Hmm not sure, whats the best way to find out?
 
 On Fri, Mar 25, 2011 at 9:46 AM, Temple  Jason jtem...@cscs.ch wrote:
 Hi,
 
 Are you using the latest firmware?  This sort of thing used to happen to 
 me, but with different raid cards.
 
 -Jason
 
 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org 
 [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of David Noriega
 Sent: venerdì, 25. marzo 2011 15:38
 To: lustre-discuss@lists.lustre.org
 Subject: [Lustre-discuss] aacraid kernel panic caused failover
 
 Had some craziness happen to our Lustre system. We have two OSSs, both
 identical Sun x4140 servers, and on only one of them have I seen
 this pop up in the kernel messages and then a kernel panic. The panic
 seemed to then spread and caused the network to go down and the second
 OSS to try to fail over (or fail back?). Anyway, 'splitbrain' occurred
 and I was able to get in and set them straight. I researched these
 aacraid module messages and so far all I can find says to increase the
 timeout, but these are old messages and currently they are set to 60.
 Anyone else have any ideas?
 
 aacraid: Host adapter abort request (0,0,0,0)
 aacraid: Host adapter reset request. SCSI hang ?
 AAC: Host adapter BLINK LED 0xef
 AAC0: adapter kernel panic'd ef.
 
 --
 Personally, I liked the university. They gave us money and facilities,
 we didn't have to produce anything! You've never been out of college!
 You don't know what it's like out there! I've worked in the private
 sector. They expect results. -Ray Ghostbusters
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 
 
 
 --
 Personally, I liked

Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically

2010-12-21 Thread Jeff Johnson
Daniel,

In the future you might want to consider posting some entries or pieces of a 
log rather than the entire log file. =)

Was this from the OSS that you say was rebooting or from your MDS node? I would 
look at the log file of the OSS node(s) that contain OST0006 and OST0007 and 
see if there are any RAID errors. It might be a network problem as well.

Morning is coming and one of the developers will likely respond to this with 
more suggestions.

--Jeff

---mobile signature---
Jeff Johnson - Aeon Computing
jeff.john...@aeoncomputing.com
m: 619-204-9061

On Dec 20, 2010, at 23:13, Daniel Raj danielraj2...@gmail.com wrote:

 Dec 19 04:19:49 cluster kernel: Lustre: 
 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) dan3-OST0006: 
 d957783f-e60b-07b0-2c86-ecfbc7eb57b6 reconnecting
 Dec 19 04:19:49 cluster kernel: Lustre: 
 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) Skipped 4 previous similar 
 messages
 Dec 19 04:30:05 cluster kernel: Lustre: 
 23308:0:(ldlm_lib.c:575:target_handle_reconnect()) dan3-OST0006: 
 d957783f-e60b-07b0-2c86-ecfbc7eb57b6 reconnecting
 Dec 19 04:30:05 cluster kernel: LustreError: 137-5: UUID 'cluster-ost7_UUID' 
 is not available  for connect (no target)
 Dec 19 04:30:05 cluster kernel: LustreError: 
 23290:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19)  
 r...@8103fd722c00 x1355442914715019/t0 o8-?@?:0/0 lens 368/0 e 0 to 
 0 dl 1292713305 ref 1 fl Interpret:/0/0 rc -19/0
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically

2010-12-20 Thread Jeff Johnson
Daniel,

It looks like your OST backend storage device may be having an issue. I
would check the health and stability of the backend storage device or raid
you are using for an OST device. It wouldn't likely cause a system reboot of
your OSS system. There may be more problems, hardware and/or OS related that
are causing the system to reboot in addition to Lustre complaining that it
can't find the OST storage device.

Others here on the list will likely give you a more detailed answer. The
storage device is the place I would look first.

--Jeff

-- 
--
Jeff Johnson
Manager
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

On Mon, Dec 20, 2010 at 9:43 PM, Daniel Raj danielraj2...@gmail.com wrote:




 Hi Genius,


 Good Day  !!


 I am Daniel. My OSS is getting automatically rebooted again and again.
 Kindly help me.

 It's showing the error messages below:


  *kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg())
 @@@ processing error (-19)  r...@810400e24400 x1353488904620274/t0
 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292738958 ref 1 fl Interpret:/0/0 rc
 -19/0
 kernel: LustreError: 137-5: UUID 'south-ost7_UUID' is not available  for
 connect (no target)
 kernel: LustreError: 23284:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@
 processing error (-19)  r...@8101124c7c00 x1353488904620359/t0
 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292739025 ref 1 fl Interpret:/0/0 rc
 -19/0
 *

 Regards,

 Daniel A


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically

2010-12-20 Thread Jeff Johnson
Daniel,

Check the health and stability of your raid-6 volume. Make sure the raid is 
healthy and online. Use whatever monitor utility came with your raid card or 
check /proc/mdstat if it's a Linux mdraid. Check /var/log/messages for error 
messages from your raid or other hardware.

--Jeff

---mobile signature---
Jeff Johnson - Aeon Computing
jeff.john...@aeoncomputing.com

On Dec 20, 2010, at 22:27, Daniel Raj danielraj2...@gmail.com wrote:

 Hi Jeff,
 
 
 Thanks for your reply 
 
 Storage information : 
 
 
 DL380G5   == OSS + 16GB Ram 
 OS== SFS G3.2-2 + centos 5.3 + lustre 1.8.3
 MSA60 box   == OST
 RAID 6
 
 
 Regards,
 
 Daniel A 
 
 On Tue, Dec 21, 2010 at 11:45 AM, Jeff Johnson 
 jeff.john...@aeoncomputing.com wrote:
 Daniel,
 
 It looks like your OST backend storage device may be having an issue. I would 
 check the health and stability of the backend storage device or raid you are 
 using for an OST device. It wouldn't likely cause a system reboot of your OSS 
 system. There may be more problems, hardware and/or OS related that are 
 causing the system to reboot in addition to Lustre complaining that it can't 
 find the OST storage device.
 
 Others here on the list will likely give you a more detailed answer. The 
 storage device is the place I would look first.
 
 --Jeff
 
 -- 
 --
 Jeff Johnson
 Manager
 Aeon Computing
 
 jeff.john...@aeoncomputing.com
 www.aeoncomputing.com
 t: 858-412-3810 x101   f: 858-412-3845
 m: 619-204-9061
 
 4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
 
 
 On Mon, Dec 20, 2010 at 9:43 PM, Daniel Raj danielraj2...@gmail.com wrote:
 
 
 
 Hi Genius,
 
 
 Good Day  !!
 
 
 I am Daniel. My OSS is getting automatically rebooted again and again.
 Kindly help me.

 It's showing the error messages below:
 
 
  kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ 
 processing error (-19)  r...@810400e24400 x1353488904620274/t0 
 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292738958 ref 1 fl Interpret:/0/0 rc 
 -19/0
 kernel: LustreError: 137-5: UUID 'south-ost7_UUID' is not available  for 
 connect (no target)
 kernel: LustreError: 23284:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ 
 processing error (-19)  r...@8101124c7c00 x1353488904620359/t0 
 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292739025 ref 1 fl Interpret:/0/0 rc 
 -19/0
 
 
 Regards,
 
 Daniel A 
 
 
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] OST errors caused by residual client info?

2010-12-06 Thread Jeff Johnson
Greetings..

Is it possible that the error below could come from a client that was not
rebooted, and did not have its lustre kernel mods reloaded, across the time
when a few test file systems were built and mounted?

LustreError: 12967:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@81032dd2d000 x1348952525350751/t0 o8-?@?:0/0 lens 
368/0 e 0 to 0 dl 1291669076 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 12967:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 55 
previous similar messages
LustreError: 137-5: UUID 'fs-OST0058_UUID' is not available  for connect (no 
target)


Normally this would be a backend storage issue. In this case, the oss
where this error is logged doesn't have an OST0058 at all; it has an
OST006d. Regardless of the ost name, the backend raid is healthy with
no hardware errors, and no other h/w errors are present on the oss node
(e.g.: mce, panic, ib/enet failures, etc).

Previous test incarnations of this filesystem were built with the ost name
not assigned at format time (e.g.: OST); the name was assigned upon first
mount and connection to the mds. Is it possible that some clients have
residual pointers or config data from the previously built file systems?
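
(For context, pinning the index at format time would presumably avoid the
random naming; something along these lines, where the fsname, mgs nid and
device are made up and not what was actually used:

mkfs.lustre --ost --fsname=testfs --index=88 --mgsnode=10.0.0.1@tcp /dev/sdb

with 88 being 0x58, i.e. OST0058.)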

Thanks!

--Jeff

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] OST errors caused by residual client info?

2010-12-06 Thread Jeff Johnson
On 12/6/10 3:55 PM, Oleg Drokin wrote:
 Hello!

 On Dec 6, 2010, at 6:50 PM, Jeff Johnson wrote:
 Previous test incarnations of this filesystem were built where ost name
 was not assigned (e.g.: OST) and was assigned upon first mount and
 connection to the mds. Is it possible that some clients have residual
 pointers or config data about the previously built file systems?
 If you did not unmount clients from the previous incarnation of the 
 filesystem,
 those clients would still continue to try to contact the servers they know 
 about even
 after the servers themselves go away and are repurposed (since there is no 
 way for the
 client to know about this).
All clients were unmounted but the lustre kernel mods were never 
removed/reloaded nor were the clients rebooted.

Is it odd that this error would occur naming an ost that is not present
on that oss? Should an oss only report this error about its own ost
devices? As I said, this particular oss where the error came from only
has OST006c and OST006d. It does not have an OST0058, although it may
have had one back when the filesystem was made with a simple test csv
that did not specifically give index numbers as part of the mkfs.lustre
process. The osts were named later, randomly, when they were first
mounted and connected to the mds.

Do you think it is possible for a client to retain this information even 
though a umount/mount of the filesystem took place?
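
If so, would a full teardown on each client along these lines be enough to
clear it? (rough sketch; the mount point is a placeholder)

umount /mnt/lustre        # placeholder client mount point
lctl dl                   # should list no devices once everything is down
lustre_rmmod              # unload the lustre/lnet modules entirely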

--Jeff
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss