Re: [ceph-users] Missing clones
So - here is the feedback. After a long night... The plain copying did not help - it then complained about the snapshots of another VM (also with old snapshots).

I remembered a thread I had read saying that the problem could be solved by converting back to Filestore, because you then have access to the data in the filesystem. So I did that for the 3 affected OSDs. After that, of course (argh), the PG got located on other OSDs - but at least one copy was still on a Filestore-converted OSD.

So I first set the primary affinity in a way that made the PG primary on the Filestore OSD. Then I quickly turned off all three OSDs. The PG then became stale (all replicas were down). I flushed the journals to be on the safe side. Then I took a detailed look at the filesystem (with find) and found the rbd_data.2313975238e1f29.000XXX object, which had size 0 - so no data in it. I then used

> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-X rbd_data.2313975238e1f29.000XXX remove

on all three OSDs and fired them up again. Then - after waiting for the cluster to rebalance (the PG was still reported as inconsistent) - I fired up a repair on the PG (primary still on the Filestore OSD). -> Fixed. :-) HEALTHY

Tonight I will set the OSDs up as BlueStore again. Hopefully it will not happen again. I found a tip in a bug report to set "bluefs_allocator = stupid" in ceph.conf. I did that too and restarted all OSDs afterwards. So maybe this prevents the problem from happening again.

Best
Karsten

On 20.02.2018 16:03, Eugen Block wrote:
> Alright, good luck!
> The results would be interesting. :-)
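For anyone finding this thread later: pieced together from the description above, the whole sequence could look roughly like the following. This is only a sketch, not a tested procedure - the OSD ids 21/22/23 are placeholders, the PG id 10.7b9 and the object-name prefix are taken from this thread, the journal flush only applies to Filestore OSDs, and a --journal-path may additionally be needed depending on the setup:

# make the Filestore-converted OSD (here: 21) primary for the PG
ceph osd primary-affinity 22 0
ceph osd primary-affinity 23 0

# stop all three OSDs holding the PG; the PG goes stale now
systemctl stop ceph-osd@21 ceph-osd@22 ceph-osd@23

# flush the Filestore journal on each of them, to be on the safe side
ceph-osd -i 21 --flush-journal

# locate the zero-size clone object in the Filestore directory tree
find /var/lib/ceph/osd/ceph-21/current/10.7b9_head \
    -name '*2313975238e1f29*' -size 0

# remove the broken object (repeat on all three OSDs)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
    rbd_data.2313975238e1f29.000XXX remove

# restart the OSDs, wait for rebalancing, then repair the PG
systemctl start ceph-osd@21 ceph-osd@22 ceph-osd@23
ceph pg repair 10.7b9

And the ceph.conf workaround mentioned above would be (only what the bug report suggests, no guarantee it is related to this problem):

[osd]
bluefs_allocator = stupid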
Re: [ceph-users] Missing clones
Alright, good luck!
The results would be interesting. :-)

Zitat von Karsten Becker:

> Hi Eugen,
>
> yes, I also see the rbd_data number changing. This may have been caused
> by me deleting snapshots and trying to move VMs over to another pool
> which is not affected.
>
> Currently I'm trying to move the Finance VM, which is a very old VM that
> was created as one of the first VMs and is still alive (as the only one
> of this age). Maybe it's really a problem of "old" VM formats, as
> mentioned in the links somebody sent, where snapshots had wrong/old bits
> that a new Ceph could not understand anymore. We'll see... the VM is
> large and currently copying... if the error gets copied as well, the VM
> format/age is the cause. If not, ... hm... :-D
>
> Nevertheless, thank you for your help!
> Karsten
>
> On 20.02.2018 15:47, Eugen Block wrote:
>> I'm not quite sure how to interpret this, but there are different
>> objects referenced. [...]
Re: [ceph-users] Missing clones
Hi Eugen,

yes, I also see the rbd_data number changing. This may have been caused by me deleting snapshots and trying to move VMs over to another pool which is not affected.

Currently I'm trying to move the Finance VM, which is a very old VM that was created as one of the first VMs and is still alive (as the only one of this age). Maybe it's really a problem of "old" VM formats, as mentioned in the links somebody sent, where snapshots had wrong/old bits that a new Ceph could not understand anymore. We'll see... the VM is large and currently copying... if the error gets copied as well, the VM format/age is the cause. If not, ... hm... :-D

Nevertheless, thank you for your help!
Karsten

On 20.02.2018 15:47, Eugen Block wrote:
> I'm not quite sure how to interpret this, but there are different
> objects referenced. From the first log output you pasted:
>
>> 2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9
>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected
>> clone 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1
>> missing
>
> From the failed PG import the logs mention two different objects:
>
>> Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.1d82:head#
>> snapset 0=[]:{}
>> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:18#
>
> And your last log output has another two different objects:
>
>> Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#
>> snapset 0=[]:{}
>> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:19#
>
> So in total we're seeing five different rbd_data objects here:
>
> - rbd_data.2313975238e1f29
> - rbd_data.f5b8603d1b58ba
> - rbd_data.966489238e1f29
> - rbd_data.e57feb238e1f29
> - rbd_data.4401c7238e1f29
>
> This doesn't make too much sense to me, yet. Which ones belong to
> your corrupted VM? Do you have a backup of the VM in case the repair
> fails?
>
> Zitat von Karsten Becker:
>
>> Nope:
>>
>>> Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#
>>> snapset 0=[]:{}
>>> [...]
Re: [ceph-users] Missing clones
I'm not quite sure how to interpret this, but there are different objects referenced. From the first log output you pasted:

> 2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected
> clone 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1
> missing

From the failed PG import the logs mention two different objects:

> Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.1d82:head#
> snapset 0=[]:{}
> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:18#

And your last log output has another two different objects:

> Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#
> snapset 0=[]:{}
> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:19#

So in total we're seeing five different rbd_data objects here:

- rbd_data.2313975238e1f29
- rbd_data.f5b8603d1b58ba
- rbd_data.966489238e1f29
- rbd_data.e57feb238e1f29
- rbd_data.4401c7238e1f29

This doesn't make too much sense to me, yet. Which ones belong to your corrupted VM? Do you have a backup of the VM in case the repair fails?

Zitat von Karsten Becker:

> Nope:
>
>> Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#
>> snapset 0=[]:{}
>> [...]
>
> What I also do not understand: if I take your approach of finding out
> what is stored in the PG, I get no match with my PG ID anymore. If I
> take the "rbd info" approach posted by Mykola Golub, I get a match -
> unfortunately the most important VM on our system, which holds the
> software for our Finance.
>
> Best
> Karsten
>
> On 20.02.2018 09:16, Eugen Block wrote:
>> And does the re-import of the PG work? From the logs I assumed that the
>> snapshot(s) prevented a successful import, but now that they are
>> deleted it could work.
>>
>> Zitat von Karsten Becker:
>>
>>> Hi Eugen,
>>>
>>> hmmm, that should be:
>>> [...]
Re: [ceph-users] Missing clones
Nope:

> Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#
> snapset 0=[]:{}
> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:19#
> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:23#
> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:head#
> snapset 612=[23,22,15]:{19=[15],23=[23,22]}
> /home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function 'void
> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> MapCacher::Transaction<std::string, ceph::buffer::list>*)' thread
> 7fd45147a400 time 2018-02-20 13:56:20.672430
> /home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED
> assert(r == -2)
> ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous
> (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x7fd4478c68f2]
> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> MapCacher::Transaction<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >,
> ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
> 3: (get_attrs(ObjectStore*, coll_t, ghobject_t,
> ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
> SnapMapper&)+0xafb) [0x5569304ca01b]
> 4: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
> ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
> [0x5569304caae8]
> 5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >, ObjectStore::Sequencer&)+0x1135) [0x5569304d12f5]
> 6: (main()+0x3909) [0x556930432349]
> 7: (__libc_start_main()+0xf1) [0x7fd444d252b1]
> 8: (_start()+0x2a) [0x5569304ba01a]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
> *** Caught signal (Aborted) **
> in thread 7fd45147a400 thread_name:ceph-objectstor
> ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous
> (stable)
> 1: (()+0x913f14) [0x556930ae1f14]
> 2: (()+0x110c0) [0x7fd44619e0c0]
> 3: (gsignal()+0xcf) [0x7fd444d37fcf]
> 4: (abort()+0x16a) [0x7fd444d393fa]
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x28e) [0x7fd4478c6a7e]
> 6: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> MapCacher::Transaction<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >,
> ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
> 7: (get_attrs(ObjectStore*, coll_t, ghobject_t,
> ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
> SnapMapper&)+0xafb) [0x5569304ca01b]
> 8: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
> ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
> [0x5569304caae8]
> 9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >, ObjectStore::Sequencer&)+0x1135) [0x5569304d12f5]
> 10: (main()+0x3909) [0x556930432349]
> 11: (__libc_start_main()+0xf1) [0x7fd444d252b1]
> 12: (_start()+0x2a) [0x5569304ba01a]
> Aborted

What I also do not understand: if I take your approach of finding out what is stored in the PG, I get no match with my PG ID anymore. If I take the "rbd info" approach posted by Mykola Golub, I get a match - unfortunately the most important VM on our system, which holds the software for our Finance.

Best
Karsten

On 20.02.2018 09:16, Eugen Block wrote:
> And does the re-import of the PG work? From the logs I assumed that the
> snapshot(s) prevented a successful import, but now that they are deleted
> it could work.
>
> Zitat von Karsten Becker:
>
>> Hi Eugen,
>>
>> hmmm, that should be:
>>
>>> rbd -p cpVirtualMachines list | while read LINE; do osdmaptool
>>> --test-map-object $LINE --pool 10 osdmap 2>&1; rbd snap ls
>>> cpVirtualMachines/$LINE | grep -v SNAPID | awk '{ print $2 }' | while
>>> read LINE2; do echo "$LINE"; osdmaptool --test-map-object $LINE2
>>> --pool 10 osdmap 2>&1; done; done | less
>>
>> [...]
Re: [ceph-users] Missing clones
And does the re-import of the PG work? From the logs I assumed that the snapshot(s) prevented a successful import, but now that they are deleted it could work.

Zitat von Karsten Becker:

> Hi Eugen,
>
> hmmm, that should be:
>
>> rbd -p cpVirtualMachines list | while read LINE; do osdmaptool
>> --test-map-object $LINE --pool 10 osdmap 2>&1; rbd snap ls
>> cpVirtualMachines/$LINE | grep -v SNAPID | awk '{ print $2 }' | while
>> read LINE2; do echo "$LINE"; osdmaptool --test-map-object $LINE2
>> --pool 10 osdmap 2>&1; done; done | less
>
> It's a Proxmox system. There were only two snapshots on the PG, which I
> deleted now. Now nothing gets displayed on the PG... is that possible? A
> repair still fails unfortunately...
>
> Best & thank you for the hint!
> Karsten
>
> On 19.02.2018 22:42, Eugen Block wrote:
>> [...]
Re: [ceph-users] Missing clones
Hi Eugen,

hmmm, that should be:

> rbd -p cpVirtualMachines list | while read LINE; do osdmaptool
> --test-map-object $LINE --pool 10 osdmap 2>&1; rbd snap ls
> cpVirtualMachines/$LINE | grep -v SNAPID | awk '{ print $2 }' | while
> read LINE2; do echo "$LINE"; osdmaptool --test-map-object $LINE2
> --pool 10 osdmap 2>&1; done; done | less

It's a Proxmox system. There were only two snapshots on the PG, which I deleted now. Now nothing gets displayed on the PG... is that possible? A repair still fails unfortunately...

Best & thank you for the hint!
Karsten

On 19.02.2018 22:42, Eugen Block wrote:
>> BTW - how can I find out which RBDs are affected by this problem? Maybe
>> a copy/remove of the affected RBDs could help? But how to find out
>> which RBDs this PG belongs to?
>
> Depending on how many PGs your cluster/pool has, you could dump your
> osdmap and then run the osdmaptool [1] for every rbd object in your pool
> and grep for the affected PG. That would be quick for a few objects, I
> guess:
>
> ceph1:~ # ceph osd getmap -o /tmp/osdmap
>
> ceph1:~ # osdmaptool --test-map-object image1 --pool 5 /tmp/osdmap
> osdmaptool: osdmap file '/tmp/osdmap'
>  object 'image1' -> 5.2 -> [0]
>
> ceph1:~ # osdmaptool --test-map-object image2 --pool 5 /tmp/osdmap
> osdmaptool: osdmap file '/tmp/osdmap'
>  object 'image2' -> 5.f -> [0]
>
> [1]
> https://www.hastexo.com/resources/hints-and-kinks/which-osd-stores-specific-rados-object/
>
> Zitat von Karsten Becker:
>
>> [...]
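Since the one-liner above is hard to read in the archive, here is the same loop reformatted (a sketch; same logic and assumptions as Karsten's command, i.e. pool id 10 and an osdmap dump named "osdmap" in the current directory):

rbd -p cpVirtualMachines list | while read IMG; do
    osdmaptool --test-map-object "$IMG" --pool 10 osdmap 2>&1
    # also map each snapshot name of the image
    rbd snap ls "cpVirtualMachines/$IMG" | grep -v SNAPID | awk '{ print $2 }' |
    while read SNAP; do
        echo "$IMG"
        osdmaptool --test-map-object "$SNAP" --pool 10 osdmap 2>&1
    done
done | less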
Re: [ceph-users] Missing clones
On Mon, Feb 19, 2018 at 10:17:55PM +0100, Karsten Becker wrote:
> BTW - how can I find out which RBDs are affected by this problem? Maybe
> a copy/remove of the affected RBDs could help? But how to find out
> which RBDs this PG belongs to?

In this case rbd_data.966489238e1f29.250b looks like the problem object. To find out which RBD image it belongs to you can run the `rbd info <pool>/<image>` command for every image in the pool, looking at the block_name_prefix field, until you find 'rbd_data.966489238e1f29'.

> Best
> Karsten
>
> On 19.02.2018 19:26, Karsten Becker wrote:
> > Hi.
> >
> > Thank you for the tip. I just tried... but unfortunately the import
> > aborts:
> >
> >> Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.1d82:head#
> >> snapset 0=[]:{}
> >> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:18#
> >> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:24#
> >> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:head#
> >> snapset 628=[24,21,17]:{18=[17],24=[24,21]}
> >> [...]

--
Mykola Golub
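The scan Mykola describes could be scripted roughly like this (a sketch only - the pool name cpVirtualMachines and the object prefix are taken from this thread, adjust to your setup):

for img in $(rbd -p cpVirtualMachines ls); do
    # block_name_prefix in 'rbd info' names the image's rbd_data objects
    if rbd info "cpVirtualMachines/$img" | grep -q 'block_name_prefix: rbd_data.966489238e1f29'; then
        echo "affected image: $img"
    fi
done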
Re: [ceph-users] Missing clones
> BTW - how can I find out which RBDs are affected by this problem? Maybe
> a copy/remove of the affected RBDs could help? But how to find out
> which RBDs this PG belongs to?

Depending on how many PGs your cluster/pool has, you could dump your osdmap and then run the osdmaptool [1] for every rbd object in your pool and grep for the affected PG. That would be quick for a few objects, I guess:

ceph1:~ # ceph osd getmap -o /tmp/osdmap

ceph1:~ # osdmaptool --test-map-object image1 --pool 5 /tmp/osdmap
osdmaptool: osdmap file '/tmp/osdmap'
 object 'image1' -> 5.2 -> [0]

ceph1:~ # osdmaptool --test-map-object image2 --pool 5 /tmp/osdmap
osdmaptool: osdmap file '/tmp/osdmap'
 object 'image2' -> 5.f -> [0]

[1] https://www.hastexo.com/resources/hints-and-kinks/which-osd-stores-specific-rados-object/

Zitat von Karsten Becker:

> BTW - how can I find out which RBDs are affected by this problem? Maybe
> a copy/remove of the affected RBDs could help? But how to find out
> which RBDs this PG belongs to?
>
> Best
> Karsten
>
> On 19.02.2018 19:26, Karsten Becker wrote:
>> Hi.
>>
>> Thank you for the tip. I just tried... but unfortunately the import
>> aborts:
>> [...]
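To avoid running osdmaptool by hand for every image, the scan can be wrapped in a loop along these lines (a sketch, assuming pool id 10 and the damaged PG 10.7b9 from this thread; note it only maps the names you feed it, so snapshot names may need the same treatment as shown earlier in the thread):

ceph osd getmap -o /tmp/osdmap
for obj in $(rbd -p cpVirtualMachines ls); do
    # osdmaptool prints: object '<name>' -> <pgid> -> [<acting osds>]
    osdmaptool --test-map-object "$obj" --pool 10 /tmp/osdmap 2>/dev/null
done | grep -F '10.7b9'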
Re: [ceph-users] Missing clones
BTW - how can I find out which RBDs are affected by this problem? Maybe a copy/remove of the affected RBDs could help? But how to find out which RBDs this PG belongs to?

Best
Karsten

On 19.02.2018 19:26, Karsten Becker wrote:
> Hi.
>
> Thank you for the tip. I just tried... but unfortunately the import
> aborts:
>
>> Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.1d82:head#
>> snapset 0=[]:{}
>> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:18#
>> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:24#
>> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:head#
>> snapset 628=[24,21,17]:{18=[17],24=[24,21]}
>> [...]
>
> Best
> Karsten
>
> On 19.02.2018 17:09, Eugen Block wrote:
>> Could [1] be of interest?
>> Exporting the intact PG and importing it back to the respective OSD
>> sounds promising.
>>
>> [1]
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html
>>
>> [...]
Re: [ceph-users] Missing clones
Hi.

Thank you for the tip. I just tried... but unfortunately the import aborts:

> Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.1d82:head#
> snapset 0=[]:{}
> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:18#
> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:24#
> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:head#
> snapset 628=[24,21,17]:{18=[17],24=[24,21]}
> /home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function 'void
> SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
> MapCacher::Transaction<std::string, ceph::buffer::list>*)' thread
> 7facba7de400 time 2018-02-19 19:24:18.917515
> /home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED
> assert(r == -2)
> ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous
> (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x7facb0c2a8f2]
> 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> MapCacher::Transaction<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >,
> ceph::buffer::list>*)+0x8e9) [0x55eef3894fe9]
> 3: (get_attrs(ObjectStore*, coll_t, ghobject_t,
> ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
> SnapMapper&)+0xafb) [0x55eef35f901b]
> 4: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
> ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
> [0x55eef35f9ae8]
> 5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >, ObjectStore::Sequencer&)+0x1135) [0x55eef36002f5]
> 6: (main()+0x3909) [0x55eef3561349]
> 7: (__libc_start_main()+0xf1) [0x7facae0892b1]
> 8: (_start()+0x2a) [0x55eef35e901a]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
> *** Caught signal (Aborted) **
> in thread 7facba7de400 thread_name:ceph-objectstor
> ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous
> (stable)
> 1: (()+0x913f14) [0x55eef3c10f14]
> 2: (()+0x110c0) [0x7facaf5020c0]
> 3: (gsignal()+0xcf) [0x7facae09bfcf]
> 4: (abort()+0x16a) [0x7facae09d3fa]
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x28e) [0x7facb0c2aa7e]
> 6: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
> std::less<snapid_t>, std::allocator<snapid_t> > const&,
> MapCacher::Transaction<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >,
> ceph::buffer::list>*)+0x8e9) [0x55eef3894fe9]
> 7: (get_attrs(ObjectStore*, coll_t, ghobject_t,
> ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
> SnapMapper&)+0xafb) [0x55eef35f901b]
> 8: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
> ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
> [0x55eef35f9ae8]
> 9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >, ObjectStore::Sequencer&)+0x1135) [0x55eef36002f5]
> 10: (main()+0x3909) [0x55eef3561349]
> 11: (__libc_start_main()+0xf1) [0x7facae0892b1]
> 12: (_start()+0x2a) [0x55eef35e901a]
> Aborted

Best
Karsten

On 19.02.2018 17:09, Eugen Block wrote:
> Could [1] be of interest?
> Exporting the intact PG and importing it back to the respective OSD
> sounds promising.
>
> [1]
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html
>
> Zitat von Karsten Becker:
>
>> Hi.
>>
>> We have size=3 min_size=2.
>>
>> But this "upgrade" was only done during the weekend. We had size=2
>> min_size=1 before.
>>
>> [...]
Re: [ceph-users] Missing clones
Could [1] be of interest? Exporting the intact PG and importing it back to the respective OSD sounds promising.

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html

Zitat von Karsten Becker:

> Hi.
>
> We have size=3 min_size=2.
>
> But this "upgrade" was only done during the weekend. We had size=2
> min_size=1 before.
>
> Best
> Karsten
>
> On 19.02.2018 13:02, Eugen Block wrote:
>> Hi,
>>
>> just to rule out the obvious, which size does the pool have? You aren't
>> running it with size = 2, do you?
>>
>> [...]
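The export/import approach from [1] would look roughly like this with ceph-objectstore-tool (a sketch under assumptions: the OSD ids 29/30 are placeholders, the PG id 10.7b9 is from this thread, both OSDs have to be stopped first, and Filestore OSDs may additionally need --journal-path):

# on an OSD that still holds an intact copy of the PG
systemctl stop ceph-osd@29
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 \
    --pgid 10.7b9 --op export --file /tmp/pg-10.7b9.export

# on the OSD with the broken copy: remove it, then import the good copy
systemctl stop ceph-osd@30
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-30 \
    --pgid 10.7b9 --op remove          # some versions require --force here
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-30 \
    --op import --file /tmp/pg-10.7b9.export

systemctl start ceph-osd@29 ceph-osd@30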
Re: [ceph-users] Missing clones
When we ran our test cluster with size 2 I experienced a similar issue, but that was on Hammer. There I could find the corresponding PG data in the filesystem and copy it over to the damaged PG. But now we also run BlueStore on Luminous, and I don't know yet how to fix this kind of issue - maybe someone else can share some thoughts on this.

Zitat von Karsten Becker:

> Hi.
>
> We have size=3 min_size=2.
>
> But this "upgrade" was only done during the weekend. We had size=2
> min_size=1 before.
>
> Best
> Karsten
>
> On 19.02.2018 13:02, Eugen Block wrote:
>> Hi,
>>
>> just to rule out the obvious, which size does the pool have? You aren't
>> running it with size = 2, do you?
>>
>> [...]
Re: [ceph-users] Missing clones
Hi.

We have size=3 min_size=2.

But this "upgrade" was only done during the weekend. We had size=2 min_size=1 before.

Best
Karsten

On 19.02.2018 13:02, Eugen Block wrote:
> Hi,
>
> just to rule out the obvious, which size does the pool have? You aren't
> running it with size = 2, do you?
>
> Zitat von Karsten Becker:
>
>> Hi,
>>
>> I have one damaged PG in my cluster. All OSDs are BlueStore. How do I
>> fix this?
>>
>>> 2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9
>>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected
>>> clone 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1
>>> missing
>>> 2018-02-19 11:00:23.183707 osd.29 [INF] repair 10.7b9
>>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1
>>> missing clone(s)
>>> 2018-02-19 11:01:18.074666 mon.0 [ERR] Health check update: Possible
>>> data damage: 1 pg inconsistent (PG_DAMAGED)
>>> 2018-02-19 11:01:11.856529 osd.29 [ERR] 10.7b9 repair 1 errors, 0 fixed
>>> 2018-02-19 11:01:24.333533 mon.0 [ERR] overall HEALTH_ERR 1 scrub
>>> errors; Possible data damage: 1 pg inconsistent
>>
>> "ceph pg repair 10.7b9" fails and is not able to fix it. A manually
>> started scrub "ceph pg scrub 10.7b9" fails as well.
>>
>> Best from Berlin/Germany
>> Karsten
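For reference, checking and raising the replication settings as discussed above is a standard ceph CLI task (the pool name cpVirtualMachines is taken from elsewhere in this thread; raising size triggers backfill while the third replica is created):

# check the current values
ceph osd pool get cpVirtualMachines size
ceph osd pool get cpVirtualMachines min_size

# raise them to size=3 / min_size=2
ceph osd pool set cpVirtualMachines size 3
ceph osd pool set cpVirtualMachines min_size 2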
Re: [ceph-users] Missing clones
Hi,

just to rule out the obvious, which size does the pool have? You aren't running it with size = 2, do you?

Zitat von Karsten Becker:

> Hi,
>
> I have one damaged PG in my cluster. All OSDs are BlueStore. How do I
> fix this?
>
>> 2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9
>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected
>> clone 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1
>> missing
>> 2018-02-19 11:00:23.183707 osd.29 [INF] repair 10.7b9
>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1
>> missing clone(s)
>> 2018-02-19 11:01:18.074666 mon.0 [ERR] Health check update: Possible
>> data damage: 1 pg inconsistent (PG_DAMAGED)
>> 2018-02-19 11:01:11.856529 osd.29 [ERR] 10.7b9 repair 1 errors, 0 fixed
>> 2018-02-19 11:01:24.333533 mon.0 [ERR] overall HEALTH_ERR 1 scrub
>> errors; Possible data damage: 1 pg inconsistent
>
> "ceph pg repair 10.7b9" fails and is not able to fix it. A manually
> started scrub "ceph pg scrub 10.7b9" fails as well.
>
> Best from Berlin/Germany
> Karsten

--
Eugen Block                             voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG      fax     : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg                         e-mail  : ebl...@nde.ag

        Vorsitzende des Aufsichtsrates: Angelika Mozdzen
          Sitz und Registergericht: Hamburg, HRB 90934
                  Vorstand: Jens-U. Mozdzen
                   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com