[ceph-users] Re: Misleading error (osd has already bound to class) when starting osd on nautilus?

2020-11-25 Thread Anthony D'Atri

This was my first thought too.  Is it just this one drive, all drives on this 
host, or all drives in the cluster?

I’m curious if stupid HBA tricks are afoot, if this is a SAS / SATA drive.  
Especially if it’s a RAID-capable HBA vs passthrough.


>>> It might be an issue with the driver then reporting the wrong data. I'll
>>> look into it.
___
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]


[ceph-users] Re: Misleading error (osd has already bound to class) when starting osd on nautilus?

2020-11-25 Thread David Caro

Forwarding here in case anyone is seeing the same/similar issue, Amit gave
really good pointers and a workaround :)


Thanks Amit!
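
A minimal sketch of the check discussed in the quoted thread below, for anyone hitting the same error. It lists the kernel's rotational flag for every block device and prints the device class that flag implies, using the mapping Amit gives (1 for HDD, 0 for SSD). The function name `list_rotational` and its optional directory argument are hypothetical, used here for illustration only; they are not part of Ceph.

```shell
#!/bin/sh
# list_rotational [SYSFS_BLOCK_DIR]
# Hypothetical helper (not part of Ceph): prints "<device> <flag> <class>"
# for every block device, where class is hdd for flag 1 and ssd for flag 0.
# SYSFS_BLOCK_DIR defaults to /sys/block; it is a parameter only so the
# function can be pointed at a test directory.
list_rotational() {
    root="${1:-/sys/block}"
    for f in "$root"/*/queue/rotational; do
        [ -r "$f" ] || continue          # glob matched nothing, or unreadable
        dev=$(basename "$(dirname "$(dirname "$f")")")
        flag=$(cat "$f")
        case "$flag" in
            1) class=hdd ;;
            0) class=ssd ;;
            *) class=unknown ;;
        esac
        printf '%s %s %s\n' "$dev" "$flag" "$class"
    done
}
```

Call it as `list_rotational` and compare the last column with the output of `ceph osd crush get-device-class osd.N` for the OSD on each device. A mismatch (flag says hdd, CRUSH class says ssd) is the situation described in this thread. `lsblk -d -o NAME,ROTA,TRAN,MODEL` shows the same flag alongside the transport and model, which may help when checking whether an HBA is involved.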


On 11/25 16:08, Amit Ghadge wrote:
> Yes, and if you want to avoid this in the future, set this flag to 0 with:
> $ echo 0 > /sys/block/sdx/queue/rotational
> 
> Thanks
> 
> On Wed, Nov 25, 2020 at 4:03 PM David Caro wrote:
> 
> >
> > Yep, you are right:
> >
> > ```
> > # cat /sys/block/sdd/queue/rotational
> > 1
> > ```
> >
> > I was looking at the code too but you got there before me :)
> >
> > https://github.com/ceph/ceph/blob/25ac1528419371686740412616145703810a561f/src/common/blkdev.cc#L222
> >
> >
> > It might be an issue with the driver then reporting the wrong data. I'll
> > look into it.
> >
> > Do you mind if I reply on the list with this info? (Or you can reply
> > yourself, if you prefer.) I think this might help others too (and myself
> > in the future xd).
> >
> > Thanks Amit!
> >
> > On 11/25 15:50, Amit Ghadge wrote:
> > > This can happen when the disk defaults to 1 in
> > > /sys/block/sdx/queue/rotational (1 for HDD, 0 for SSD), but we have
> > > not seen any problems from it so far.
> > >
> > > -AmitG
> > >
> > > On Wed, Nov 25, 2020 at 3:08 PM David Caro wrote:
> > >
> > > >
> > > > Hi!
> > > >
> > > > I have a nautilus ceph cluster, and today I restarted one of the osd
> > > > daemons and spent some time trying to debug an error I was seeing in
> > > > the log, though it seems the osd is actually working.
> > > >
> > > >
> > > > The error I was seeing is:
> > > > ```
> > > > Nov 25 09:07:43 osd15 systemd[1]: Starting Ceph object storage daemon osd.44...
> > > > Nov 25 09:07:43 osd15 systemd[1]: Started Ceph object storage daemon osd.44.
> > > > Nov 25 09:07:47 osd15 ceph-osd[12230]: 2020-11-25 09:07:47.846 7f55395fbc80 -1 osd.44 106947 log_to_monitors {default=true}
> > > > Nov 25 09:07:47 osd15 ceph-osd[12230]: 2020-11-25 09:07:47.850 7f55395fbc80 -1 osd.44 106947 mon_cmd_maybe_osd_create fail: 'osd.44 has already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class ' to remove old class first': (16) Device or resource busy
> > > > ```
> > > >
> > > > There are no other messages in the journal, so at first I thought
> > > > that the osd had failed to start.
> > > > But it seems to be up and working correctly anyhow.
> > > >
> > > > There's no "hdd" class in my crush map:
> > > > ```
> > > > # ceph osd crush class ls
> > > > [
> > > > "ssd"
> > > > ]
> > > > ```
> > > >
> > > > And that osd is actually of the correct class:
> > > > ```
> > > > # ceph osd crush get-device-class osd.44
> > > > ssd
> > > > ```
> > > >
> > > > ```
> > > > # uname -a
> > > > Linux osd15 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux
> > > >
> > > > # ceph --version
> > > > ceph version 14.2.5-1-g23e76c7aa6 (23e76c7aa6e15817ffb6741aafbc95ca99f24cbb) nautilus (stable)
> > > > ```
> > > >
> > > > The osd shows up in the cluster and it's receiving load, so there
> > > > seems to be no problem, but does anyone know what that error is
> > > > about?
> > > >
> > > >
> > > > Thanks!
> > > >
> > > >
> >
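
The workaround Amit describes above can be sketched as a small function. It writes 0 to the device's rotational flag and reads it back to confirm the change. Settings under /sys do not survive a reboot, so this would have to be reapplied at boot (for example from a udev rule) to stick. `set_nonrotational` and its optional second argument are hypothetical names used for illustration; it needs root, and should only be pointed at devices that really are SSDs.

```shell
#!/bin/sh
# set_nonrotational DEVICE [SYSFS_BLOCK_DIR]
# Hypothetical helper (not part of Ceph): marks DEVICE (e.g. sdd) as
# non-rotational by writing 0 to its queue/rotational flag, then reads the
# flag back. Returns non-zero if the flag file is missing or the write did
# not take effect. SYSFS_BLOCK_DIR defaults to /sys/block.
set_nonrotational() {
    dev="$1"
    root="${2:-/sys/block}"
    flag_file="$root/$dev/queue/rotational"
    if [ -z "$dev" ] || [ ! -e "$flag_file" ]; then
        echo "no rotational flag for device '$dev'" >&2
        return 1
    fi
    echo 0 > "$flag_file" || return 1
    [ "$(cat "$flag_file")" = "0" ]
}
```

After `set_nonrotational sdd`, the osd should report 'ssd' on its next start, which matches its existing CRUSH class, so the "already bound to class" message should no longer be logged.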

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation 
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."

