Re: [ceph-users] ceph status: pg backfill_toofull, but all OSDs have enough space
https://tracker.ceph.com/issues/41255 is probably reporting the same issue. On Thu, Aug 22, 2019 at 6:31 PM Lars Täuber wrote: > > Hi there! > > We also experience this behaviour of our cluster while it is moving pgs. > > # ceph health detail > HEALTH_ERR 1 MDSs report slow metadata IOs; Reduced data availability: 2 pgs > inactive; Degraded data redundancy (low space): 1 pg backfill_toofull > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs > mdsmds1(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked > for 359 secs > PG_AVAILABILITY Reduced data availability: 2 pgs inactive > pg 21.231 is stuck inactive for 878.224182, current state remapped, last > acting [20,2147483647,13,2147483647,15,10] > pg 21.240 is stuck inactive for 878.123932, current state remapped, last > acting [26,17,21,20,2147483647,2147483647] > PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull > pg 21.376 is active+remapped+backfill_wait+backfill_toofull, acting > [6,11,29,2,10,15] > # ceph pg map 21.376 > osdmap e68016 pg 21.376 (21.376) -> up [6,5,23,21,10,11] acting > [6,11,29,2,10,15] > > # ceph osd dump | fgrep ratio > full_ratio 0.95 > backfillfull_ratio 0.9 > nearfull_ratio 0.85 > > This happens while the cluster is rebalancing the pgs after I manually mark a > single osd out. > see here: > Subject: [ceph-users] pg 21.1f9 is stuck inactive for 53316.902820, current > state remapped > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036634.html > > > Mostly the cluster heals itself at least into state HEALTH_WARN: > > > # ceph health detail > HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 2 pgs > inactive > MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs > mdsmds1(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked > for 1155 secs > PG_AVAILABILITY Reduced data availability: 2 pgs inactive > pg 21.231 is stuck inactive for 1677.312219, current state remapped, last > acting [20,2147483647,13,2147483647,15,10] > pg 21.240 is stuck inactive for 1677.211969, current state remapped, last > acting [26,17,21,20,2147483647,2147483647] > > > > Cheers, > Lars > > > Wed, 21 Aug 2019 17:28:05 -0500 > Reed Dier ==> Vladimir Brik > : > > Just chiming in to say that I too had some issues with backfill_toofull > > PGs, despite no OSD's being in a backfill_full state, albeit, there were > > some nearfull OSDs. > > > > I was able to get through it by reweighting down the OSD that was the > > target reported by ceph pg dump | grep 'backfill_toofull'. > > > > This was on 14.2.2. > > > > Reed > > > > > On Aug 21, 2019, at 2:50 PM, Vladimir Brik > > > wrote: > > > > > > Hello > > > > > > After increasing number of PGs in a pool, ceph status is reporting > > > "Degraded data redundancy (low space): 1 pg backfill_toofull", but I > > > don't understand why, because all OSDs seem to have enough space. > > > > > > ceph health detail says: > > > pg 40.155 is active+remapped+backfill_toofull, acting [20,57,79,85] > > > > > > $ ceph pg map 40.155 > > > osdmap e3952 pg 40.155 (40.155) -> up [20,57,66,85] acting [20,57,79,85] > > > > > > So I guess Ceph wants to move 40.155 from 66 to 79 (or other way > > > around?). According to "osd df", OSD 66's utilization is 71.90%, OSD 79's > > > utilization is 58.45%. The OSD with least free space in the cluster is > > > 81.23% full, and it's not any of the ones above. 
> > > OSD backfillfull_ratio is 90% (is there a better way to determine this?):
> > > $ ceph osd dump | grep ratio
> > > full_ratio 0.95
> > > backfillfull_ratio 0.9
> > > nearfull_ratio 0.7
> > >
> > > Does anybody know why a PG could be in the backfill_toofull state if no
> > > OSD is in the backfillfull state?
> > >
> > > Vlad
>
> --
> Informationstechnologie
> Berlin-Brandenburgische Akademie der Wissenschaften
> Jägerstraße 22-23 10117 Berlin
> Tel.: +49 30 20370-352 http://www.bbaw.de

--
Cheers,
Brad
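A minimal way to cross-check this from the CLI, using only commands already quoted in this thread (the PG id below is the one from Vladimir's report; substitute whatever "ceph health detail" names on your cluster):

$ ceph osd dump | grep ratio                        # full / backfillfull / nearfull thresholds
$ ceph osd df                                       # per-OSD utilisation, variance and PG counts
$ ceph health detail | grep backfill_toofull        # which PGs are affected
$ ceph pg 40.155 query | grep -A 10 recovery_state  # why this particular PG is waiting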
Re: [ceph-users] ceph status: pg backfill_toofull, but all OSDs have enough space
Hi there!

We also experience this behaviour of our cluster while it is moving pgs.

# ceph health detail
HEALTH_ERR 1 MDSs report slow metadata IOs; Reduced data availability: 2 pgs inactive; Degraded data redundancy (low space): 1 pg backfill_toofull
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsmds1(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 359 secs
PG_AVAILABILITY Reduced data availability: 2 pgs inactive
    pg 21.231 is stuck inactive for 878.224182, current state remapped, last acting [20,2147483647,13,2147483647,15,10]
    pg 21.240 is stuck inactive for 878.123932, current state remapped, last acting [26,17,21,20,2147483647,2147483647]
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
    pg 21.376 is active+remapped+backfill_wait+backfill_toofull, acting [6,11,29,2,10,15]

# ceph pg map 21.376
osdmap e68016 pg 21.376 (21.376) -> up [6,5,23,21,10,11] acting [6,11,29,2,10,15]

# ceph osd dump | fgrep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

This happens while the cluster is rebalancing the pgs after I manually mark a single osd out.
see here:
Subject: [ceph-users] pg 21.1f9 is stuck inactive for 53316.902820, current state remapped
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036634.html

Mostly the cluster heals itself at least into state HEALTH_WARN:

# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 2 pgs inactive
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsmds1(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 1155 secs
PG_AVAILABILITY Reduced data availability: 2 pgs inactive
    pg 21.231 is stuck inactive for 1677.312219, current state remapped, last acting [20,2147483647,13,2147483647,15,10]
    pg 21.240 is stuck inactive for 1677.211969, current state remapped, last acting [26,17,21,20,2147483647,2147483647]

Cheers,
Lars

Wed, 21 Aug 2019 17:28:05 -0500
Reed Dier ==> Vladimir Brik :
> Just chiming in to say that I too had some issues with backfill_toofull PGs,
> despite no OSD's being in a backfill_full state, albeit, there were some
> nearfull OSDs.
>
> I was able to get through it by reweighting down the OSD that was the target
> reported by ceph pg dump | grep 'backfill_toofull'.
>
> This was on 14.2.2.
>
> Reed
>
> > On Aug 21, 2019, at 2:50 PM, Vladimir Brik wrote:
> >
> > Hello
> >
> > After increasing number of PGs in a pool, ceph status is reporting
> > "Degraded data redundancy (low space): 1 pg backfill_toofull", but I don't
> > understand why, because all OSDs seem to have enough space.
> >
> > ceph health detail says:
> > pg 40.155 is active+remapped+backfill_toofull, acting [20,57,79,85]
> >
> > $ ceph pg map 40.155
> > osdmap e3952 pg 40.155 (40.155) -> up [20,57,66,85] acting [20,57,79,85]
> >
> > So I guess Ceph wants to move 40.155 from 66 to 79 (or other way around?).
> > According to "osd df", OSD 66's utilization is 71.90%, OSD 79's utilization
> > is 58.45%. The OSD with least free space in the cluster is 81.23% full, and
> > it's not any of the ones above.
> >
> > OSD backfillfull_ratio is 90% (is there a better way to determine this?):
> > $ ceph osd dump | grep ratio
> > full_ratio 0.95
> > backfillfull_ratio 0.9
> > nearfull_ratio 0.7
> >
> > Does anybody know why a PG could be in the backfill_toofull state if no OSD
> > is in the backfillfull state?
> >
> > Vlad

--
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23 10117 Berlin
Tel.: +49 30 20370-352 http://www.bbaw.de
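For the two PGs shown as stuck inactive (2147483647 is the placeholder Ceph prints when no OSD could be mapped to a slot), a minimal sketch of how to dig further; the PG id is one of those from the health output above:

# ceph pg 21.231 query        # inspect recovery_state and the up/acting sets
# ceph pg dump_stuck inactive
# ceph osd df tree            # check whether CRUSH can still find enough OSDs in each failure domain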
Re: [ceph-users] ceph status: pg backfill_toofull, but all OSDs have enough space
Just chiming in to say that I too had some issues with backfill_toofull PGs, despite no OSD's being in a backfill_full state, albeit, there were some nearfull OSDs.

I was able to get through it by reweighting down the OSD that was the target reported by ceph pg dump | grep 'backfill_toofull'.

This was on 14.2.2.

Reed

> On Aug 21, 2019, at 2:50 PM, Vladimir Brik wrote:
>
> Hello
>
> After increasing number of PGs in a pool, ceph status is reporting "Degraded
> data redundancy (low space): 1 pg backfill_toofull", but I don't understand
> why, because all OSDs seem to have enough space.
>
> ceph health detail says:
> pg 40.155 is active+remapped+backfill_toofull, acting [20,57,79,85]
>
> $ ceph pg map 40.155
> osdmap e3952 pg 40.155 (40.155) -> up [20,57,66,85] acting [20,57,79,85]
>
> So I guess Ceph wants to move 40.155 from 66 to 79 (or other way around?).
> According to "osd df", OSD 66's utilization is 71.90%, OSD 79's utilization
> is 58.45%. The OSD with least free space in the cluster is 81.23% full, and
> it's not any of the ones above.
>
> OSD backfillfull_ratio is 90% (is there a better way to determine this?):
> $ ceph osd dump | grep ratio
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.7
>
> Does anybody know why a PG could be in the backfill_toofull state if no OSD
> is in the backfillfull state?
>
> Vlad
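A sketch of Reed's workaround using the OSD ids from Vladimir's example; the 0.95 value is only illustrative, and the reweight can later be reverted with "ceph osd reweight 79 1.0":

$ ceph pg dump 2>/dev/null | grep backfill_toofull   # find the PG and the OSDs in its up/acting sets
$ ceph osd df                                        # confirm which of those OSDs is the most full
$ ceph osd reweight 79 0.95                          # nudge data away from the backfill target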
Re: [ceph-users] ceph status showing wrong osd
Hi Paul,

Thanks for your reply. It looks like the OSD is contacting the monitor properly, as it shows the output below in ceph status. Correct me if I am wrong:

    monmap e1: 1 mons at {0=10.38.32.245:16789/0}
           election epoch 1, quorum 0 0

The reason could be that the OSDs were created incorrectly. Let me check this.

Regards,
Muneendra.

*From:* Paul Emmerich [mailto:paul.emmer...@croit.io]
*Sent:* Tuesday, June 05, 2018 5:51 PM
*To:* Muneendra Kumar M
*Cc:* ceph-users
*Subject:* Re: [ceph-users] ceph status showing wrong osd

It was either created incorrectly (no auth key?) or it can't contact the monitor for some reason. The log file should tell you more.

Paul

2018-06-05 13:20 GMT+02:00 Muneendra Kumar M:

Hi,

I have created a cluster and when I run ceph status it is showing me the wrong number of osds.

    cluster 6571de66-75e1-4da7-b1ed-15a8bfed0944
     health HEALTH_WARN
            2112 pgs stuck inactive
            2112 pgs stuck unclean
     monmap e1: 1 mons at {0=10.38.32.245:16789/0}
            election epoch 1, quorum 0 0
     osdmap e6: 2 osds: 0 up, 0 in
            flags sortbitwise
      pgmap v7: 2112 pgs, 3 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                2112 creating

I have created only one osd and ceph osd tree also shows two osd's and both are down.

    ID WEIGHT TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -1      0 root default
    -3      0     rack unknownrack
    -2      0         host Test
     0      0             osd.0      down        0              1.0
     1      0             osd.1      down        0              1.0

On the osd node I am seeing the osd daemon is running.

    root  3153  1  0 04:27 pts/0  00:00:00 /opt/ceph/bin/ceph-mon -i 0 --pid-file /ceph-test/var/run/ceph/mon.0.pid
    root  4696  1  0 04:42 ?      00:00:00 /opt/ceph/bin/ceph-osd -i 0 --pid-file /ceph-test/var/run/ceph/osd.0.pid

Could anyone please give me the inputs on where the issue could be.

Regards,
Muneendra.

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
Re: [ceph-users] ceph status showing wrong osd
It was either created incorrectly (no auth key?) or it can't contact the monitor for some reason. The log file should tell you more. Paul 2018-06-05 13:20 GMT+02:00 Muneendra Kumar M : > Hi, > > I have created a cluster and when I run ceph status it is showing me the > wrong number of osds. > > > > cluster 6571de66-75e1-4da7-b1ed-15a8bfed0944 > > health HEALTH_WARN > > 2112 pgs stuck inactive > > 2112 pgs stuck unclean > > monmap e1: 1 mons at {0=10.38.32.245:16789/0} > > election epoch 1, quorum 0 0 > > * osdmap e6: 2 osds: 0 up, 0 in* > > flags sortbitwise > > pgmap v7: 2112 pgs, 3 pools, 0 bytes data, 0 objects > > 0 kB used, 0 kB / 0 kB avail > > 2112 creating > > > > I have created only one osd and ceph osd tree also shows two osd’s and > both are down. > > ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY > > -1 0 root default > > -3 0 rack unknownrack > > -2 0 host Test > > 0 0 osd.0 down0 1.0 > > 1 0 osd.1 down0 1.0 > > > > > > On the osd node iam seeing the osd daemon is running. > > > > root3153 1 0 04:27 pts/000:00:00 /opt/ceph/bin/ceph-mon > -i 0 --pid-file /ceph-test/var/run/ceph/mon.0.pid > > root4696 1 0 04:42 ?00:00:00 /opt/ceph/bin/ceph-osd > -i 0 --pid-file /ceph-test/var/run/ceph/osd.0.pid > > > > Could anyone please give me the inputs where could be the issue. > > > > > > Regards, > > Muneendra. > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
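A minimal checklist for Paul's two suspicions, assuming default log locations (this cluster uses a non-standard /ceph-test prefix, so adjust paths accordingly); the last line is the standard sequence for dropping an OSD entry that was registered by mistake and never started:

$ ceph auth get osd.0                        # the OSD needs a key with 'allow profile osd' caps to talk to the mon
$ tail -n 100 /var/log/ceph/ceph-osd.0.log   # look for authentication or monitor-connection errors
$ ceph osd tree                              # confirm which id is the stray entry
$ ceph osd out osd.1 ; ceph osd crush remove osd.1 ; ceph auth del osd.1 ; ceph osd rm 1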
Re: [ceph-users] ceph status doesnt show available and used disk space after upgrade
It was a firewall issue on the controller nodes.After allowing ceph-mgr port in iptables everything is displaying correctly.Thanks to people on IRC. Thanks alot, Kevin On Thu, Dec 21, 2017 at 5:24 PM, kevin parrikarwrote: > accidently removed mailing list email > > ++ceph-users > > Thanks a lot JC for looking into this issue. I am really out of ideas. > > > ceph.conf on mgr node which is also monitor node. > > [global] > fsid = 06c5c906-fc43-499f-8a6f-6c8e21807acf > mon_initial_members = node-16 node-30 node-31 > mon_host = 172.16.1.9 172.16.1.3 172.16.1.11 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > filestore_xattr_use_omap = true > log_to_syslog_level = info > log_to_syslog = True > osd_pool_default_size = 2 > osd_pool_default_min_size = 1 > osd_pool_default_pg_num = 64 > public_network = 172.16.1.0/24 > log_to_syslog_facility = LOG_LOCAL0 > osd_journal_size = 2048 > auth_supported = cephx > osd_pool_default_pgp_num = 64 > osd_mkfs_type = xfs > cluster_network = 172.16.1.0/24 > osd_recovery_max_active = 1 > osd_max_backfills = 1 > mon allow pool delete = true > > [client] > rbd_cache_writethrough_until_flush = True > rbd_cache = True > > [client.radosgw.gateway] > rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator > keyring = /etc/ceph/keyring.radosgw.gateway > rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1 > rgw_socket_path = /tmp/radosgw.sock > rgw_keystone_revocation_interval = 100 > rgw_keystone_url = http://192.168.1.3:35357 > rgw_keystone_admin_token = jaJSmlTNxgsFp1ttq5SuAT1R > rgw_init_timeout = 36 > host = controller3 > rgw_dns_name = *.sapiennetworks.com > rgw_print_continue = True > rgw_keystone_token_cache_size = 10 > rgw_data = /var/lib/ceph/radosgw > user = www-data > > > > > ceph auth list > > > osd.100 > key: AQAtZjpaVZOFBxAAwl0yFLdUOidLzPFjv+HnjA== > caps: [mgr] allow profile osd > caps: [mon] allow profile osd > caps: [osd] allow * > osd.101 > key: AQA4ZjpaS4wwGBAABwgoXQRc1J8sav4MUkWceQ== > caps: [mgr] allow profile osd > caps: [mon] allow profile osd > caps: [osd] allow * > osd.102 > key: AQBDZjpaBS2tEBAAtFiPKBzh8JGi8Nh3PtAGCg== > caps: [mgr] allow profile osd > caps: [mon] allow profile osd > caps: [osd] allow * > > client.admin > key: AQD0yXFYflnYFxAAEz/2XLHO/6RiRXQ5HXRAnw== > caps: [mds] allow * > caps: [mgr] allow * > caps: [mon] allow * > caps: [osd] allow * > client.backups > key: AQC0y3FY4YQNNhAAs5fludq0yvtp/JJt7RT4HA== > caps: [mgr] allow r > caps: [mon] allow r > caps: [osd] allow class-read object_prefix rbd_children, allow rwx > pool=backups, allow rwx pool=volumes > client.bootstrap-mds > key: AQD5yXFYyIxiFxAAyoqLPnxxqWmUr+zz7S+qVQ== > caps: [mgr] allow r > caps: [mon] allow profile bootstrap-mds > client.bootstrap-mgr > key: AQBmOTpaXqHQDhAAyDXoxlPmG9QovfmmUd8gIg== > caps: [mon] allow profile bootstrap-mgr > client.bootstrap-osd > key: AQD0yXFYuGkSIhAAelSb3TCPuXRFoFJTBh7Vdg== > caps: [mgr] allow r > caps: [mon] allow profile bootstrap-osd > client.bootstrap-rbd > key: AQBnOTpafDS/IRAAnKzuI9AYEF81/6mDVv0QgQ== > caps: [mon] allow profile bootstrap-rbd > > client.bootstrap-rgw > key: AQD3yXFYxt1mLRAArxOgRvWmmzT9pmsqTLpXKw== > caps: [mgr] allow r > caps: [mon] allow profile bootstrap-rgw > client.compute > key: AQCbynFYRcNWOBAAPzdAKfP21GvGz1VoHBimGQ== > caps: [mgr] allow r > caps: [mon] allow r > caps: [osd] allow class-read object_prefix rbd_children, allow rwx > pool=volumes, allow rx pool=images, allow rwx pool=compute > client.images > key: 
AQCyy3FYSMtlJRAAbJ8/U/R82NXvWBC5LmkPGw== > caps: [mgr] allow r > caps: [mon] allow r > caps: [osd] allow class-read object_prefix rbd_children, allow rwx > pool=images > client.radosgw.gateway > key: AQA3ynFYAYMSAxAApvfe/booa9KhigpKpLpUOA== > caps: [mgr] allow r > caps: [mon] allow rw > caps: [osd] allow rwx > client.volumes > key: AQCzy3FYa3paKBAA9BlYpQ1PTeR770ghVv1jKQ== > caps: [mgr] allow r > caps: [mon] allow r > caps: [osd] allow class-read object_prefix rbd_children, allow rwx > pool=volumes, allow rx pool=images > mgr.controller2 > key: AQAmVTpaA+9vBhAApD3rMs//Qri+SawjUF4U4Q== > caps: [mds] allow * > caps: [mgr] allow * > caps: [mon] allow * > caps: [osd] allow * > mgr.controller3 > key: AQByfDparprIEBAAj7Pxdr/87/v0kmJV49aKpQ== > caps: [mds] allow * > caps: [mgr] allow * > caps: [mon] allow * > caps: [osd] allow * > > Regards, > Kevin > > On Thu, Dec 21, 2017 at 8:10 AM, kevin parrikar > wrote: > >>
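For reference, a sketch of the kind of firewall rules that resolve this on the monitor/mgr nodes; the exact chains and interfaces are deployment-specific, but by default ceph-mon listens on 6789/tcp and the ceph-mgr/ceph-osd/ceph-mds daemons bind ports in the 6800-7300/tcp range:

iptables -A INPUT -p tcp --dport 6789 -j ACCEPT        # ceph-mon
iptables -A INPUT -p tcp --dport 6800:7300 -j ACCEPT   # ceph-mgr / ceph-osd / ceph-mds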
Re: [ceph-users] ceph status doesnt show available and used disk space after upgrade
accidently removed mailing list email ++ceph-users Thanks a lot JC for looking into this issue. I am really out of ideas. ceph.conf on mgr node which is also monitor node. [global] fsid = 06c5c906-fc43-499f-8a6f-6c8e21807acf mon_initial_members = node-16 node-30 node-31 mon_host = 172.16.1.9 172.16.1.3 172.16.1.11 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true log_to_syslog_level = info log_to_syslog = True osd_pool_default_size = 2 osd_pool_default_min_size = 1 osd_pool_default_pg_num = 64 public_network = 172.16.1.0/24 log_to_syslog_facility = LOG_LOCAL0 osd_journal_size = 2048 auth_supported = cephx osd_pool_default_pgp_num = 64 osd_mkfs_type = xfs cluster_network = 172.16.1.0/24 osd_recovery_max_active = 1 osd_max_backfills = 1 mon allow pool delete = true [client] rbd_cache_writethrough_until_flush = True rbd_cache = True [client.radosgw.gateway] rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator keyring = /etc/ceph/keyring.radosgw.gateway rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1 rgw_socket_path = /tmp/radosgw.sock rgw_keystone_revocation_interval = 100 rgw_keystone_url = http://192.168.1.3:35357 rgw_keystone_admin_token = jaJSmlTNxgsFp1ttq5SuAT1R rgw_init_timeout = 36 host = controller3 rgw_dns_name = *.sapiennetworks.com rgw_print_continue = True rgw_keystone_token_cache_size = 10 rgw_data = /var/lib/ceph/radosgw user = www-data ceph auth list osd.100 key: AQAtZjpaVZOFBxAAwl0yFLdUOidLzPFjv+HnjA== caps: [mgr] allow profile osd caps: [mon] allow profile osd caps: [osd] allow * osd.101 key: AQA4ZjpaS4wwGBAABwgoXQRc1J8sav4MUkWceQ== caps: [mgr] allow profile osd caps: [mon] allow profile osd caps: [osd] allow * osd.102 key: AQBDZjpaBS2tEBAAtFiPKBzh8JGi8Nh3PtAGCg== caps: [mgr] allow profile osd caps: [mon] allow profile osd caps: [osd] allow * client.admin key: AQD0yXFYflnYFxAAEz/2XLHO/6RiRXQ5HXRAnw== caps: [mds] allow * caps: [mgr] allow * caps: [mon] allow * caps: [osd] allow * client.backups key: AQC0y3FY4YQNNhAAs5fludq0yvtp/JJt7RT4HA== caps: [mgr] allow r caps: [mon] allow r caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=backups, allow rwx pool=volumes client.bootstrap-mds key: AQD5yXFYyIxiFxAAyoqLPnxxqWmUr+zz7S+qVQ== caps: [mgr] allow r caps: [mon] allow profile bootstrap-mds client.bootstrap-mgr key: AQBmOTpaXqHQDhAAyDXoxlPmG9QovfmmUd8gIg== caps: [mon] allow profile bootstrap-mgr client.bootstrap-osd key: AQD0yXFYuGkSIhAAelSb3TCPuXRFoFJTBh7Vdg== caps: [mgr] allow r caps: [mon] allow profile bootstrap-osd client.bootstrap-rbd key: AQBnOTpafDS/IRAAnKzuI9AYEF81/6mDVv0QgQ== caps: [mon] allow profile bootstrap-rbd client.bootstrap-rgw key: AQD3yXFYxt1mLRAArxOgRvWmmzT9pmsqTLpXKw== caps: [mgr] allow r caps: [mon] allow profile bootstrap-rgw client.compute key: AQCbynFYRcNWOBAAPzdAKfP21GvGz1VoHBimGQ== caps: [mgr] allow r caps: [mon] allow r caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images, allow rwx pool=compute client.images key: AQCyy3FYSMtlJRAAbJ8/U/R82NXvWBC5LmkPGw== caps: [mgr] allow r caps: [mon] allow r caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=images client.radosgw.gateway key: AQA3ynFYAYMSAxAApvfe/booa9KhigpKpLpUOA== caps: [mgr] allow r caps: [mon] allow rw caps: [osd] allow rwx client.volumes key: AQCzy3FYa3paKBAA9BlYpQ1PTeR770ghVv1jKQ== caps: [mgr] allow r caps: [mon] allow r caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow 
rx pool=images mgr.controller2 key: AQAmVTpaA+9vBhAApD3rMs//Qri+SawjUF4U4Q== caps: [mds] allow * caps: [mgr] allow * caps: [mon] allow * caps: [osd] allow * mgr.controller3 key: AQByfDparprIEBAAj7Pxdr/87/v0kmJV49aKpQ== caps: [mds] allow * caps: [mgr] allow * caps: [mon] allow * caps: [osd] allow * Regards, Kevin On Thu, Dec 21, 2017 at 8:10 AM, kevin parrikarwrote: > Thanks JC, > I tried > ceph auth caps client.admin osd 'allow *' mds 'allow *' mon 'allow *' mgr > 'allow *' > > but still status is same,also mgr.log is being flooded with below errors. > > 2017-12-21 02:39:10.622834 7fb40a22b700 0 Cannot get stat of OSD 140 > 2017-12-21 02:39:10.622835 7fb40a22b700 0 Cannot get stat of OSD 141 > Not sure whats wrong in my setup > > Regards, > Kevin > > > On Thu, Dec 21, 2017 at 2:37 AM, Jean-Charles Lopez > wrote: > >> Hi, >> >> make sure client.admin user has an MGR cap
Re: [ceph-users] ceph status doesnt show available and used disk space after upgrade
Hi Kevin looks like the pb comes from the mgr user itself then. Can you get me the output of - ceph auth list - cat /etc/ceph/ceph.conf on your mgr node Regards JC While moving. Excuse unintended typos. > On Dec 20, 2017, at 18:40, kevin parrikarwrote: > > Thanks JC, > I tried > ceph auth caps client.admin osd 'allow *' mds 'allow *' mon 'allow *' mgr > 'allow *' > > but still status is same,also mgr.log is being flooded with below errors. > > 2017-12-21 02:39:10.622834 7fb40a22b700 0 Cannot get stat of OSD 140 > 2017-12-21 02:39:10.622835 7fb40a22b700 0 Cannot get stat of OSD 141 > Not sure whats wrong in my setup > > Regards, > Kevin > > >> On Thu, Dec 21, 2017 at 2:37 AM, Jean-Charles Lopez >> wrote: >> Hi, >> >> make sure client.admin user has an MGR cap using ceph auth list. At some >> point there was a glitch with the update process that was not adding the MGR >> cap to the client.admin user. >> >> JC >> >> >>> On Dec 20, 2017, at 10:02, kevin parrikar wrote: >>> >>> hi All, >>> I have upgraded the cluster from Hammer to Jewel and to Luminous . >>> >>> i am able to upload/download glance images but ceph -s shows 0kb used and >>> Available and probably because of that cinder create is failing. >>> >>> >>> ceph -s >>> cluster: >>> id: 06c5c906-fc43-499f-8a6f-6c8e21807acf >>> health: HEALTH_WARN >>> Reduced data availability: 6176 pgs inactive >>> Degraded data redundancy: 6176 pgs unclean >>> >>> services: >>> mon: 3 daemons, quorum controller3,controller2,controller1 >>> mgr: controller3(active) >>> osd: 71 osds: 71 up, 71 in >>> rgw: 1 daemon active >>> >>> data: >>> pools: 4 pools, 6176 pgs >>> objects: 0 objects, 0 bytes >>> usage: 0 kB used, 0 kB / 0 kB avail >>> pgs: 100.000% pgs unknown >>> 6176 unknown >>> >>> >>> i deployed ceph-mgr using ceph-deploy gather-keys && ceph-deploy mgr create >>> ,it was successfull but for some reason ceph -s is not showing correct >>> values. >>> Can some one help me here please >>> >>> Regards, >>> Kevin >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph status doesnt show available and used disk space after upgrade
Thanks JC, I tried ceph auth caps client.admin osd 'allow *' mds 'allow *' mon 'allow *' mgr 'allow *' but still status is same,also mgr.log is being flooded with below errors. 2017-12-21 02:39:10.622834 7fb40a22b700 0 Cannot get stat of OSD 140 2017-12-21 02:39:10.622835 7fb40a22b700 0 Cannot get stat of OSD 141 Not sure whats wrong in my setup Regards, Kevin On Thu, Dec 21, 2017 at 2:37 AM, Jean-Charles Lopezwrote: > Hi, > > make sure client.admin user has an MGR cap using ceph auth list. At some > point there was a glitch with the update process that was not adding the > MGR cap to the client.admin user. > > JC > > > On Dec 20, 2017, at 10:02, kevin parrikar > wrote: > > hi All, > I have upgraded the cluster from Hammer to Jewel and to Luminous . > > i am able to upload/download glance images but ceph -s shows 0kb used and > Available and probably because of that cinder create is failing. > > > ceph -s > cluster: > id: 06c5c906-fc43-499f-8a6f-6c8e21807acf > health: HEALTH_WARN > Reduced data availability: 6176 pgs inactive > Degraded data redundancy: 6176 pgs unclean > > services: > mon: 3 daemons, quorum controller3,controller2,controller1 > mgr: controller3(active) > osd: 71 osds: 71 up, 71 in > rgw: 1 daemon active > > data: > pools: 4 pools, 6176 pgs > objects: 0 objects, 0 bytes > usage: 0 kB used, 0 kB / 0 kB avail > pgs: 100.000% pgs unknown > 6176 unknown > > > i deployed ceph-mgr using ceph-deploy gather-keys && ceph-deploy mgr > create ,it was successfull but for some reason ceph -s is not showing > correct values. > Can some one help me here please > > Regards, > Kevin > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph status doesnt show available and used disk space after upgrade
Hi,

Make sure the client.admin user has an MGR cap, using ceph auth list. At some point there was a glitch with the update process that was not adding the MGR cap to the client.admin user.

JC

> On Dec 20, 2017, at 10:02, kevin parrikar wrote:
>
> hi All,
> I have upgraded the cluster from Hammer to Jewel and to Luminous.
>
> i am able to upload/download glance images but ceph -s shows 0 kB used and
> available, and probably because of that cinder create is failing.
>
> ceph -s
>   cluster:
>     id:     06c5c906-fc43-499f-8a6f-6c8e21807acf
>     health: HEALTH_WARN
>             Reduced data availability: 6176 pgs inactive
>             Degraded data redundancy: 6176 pgs unclean
>
>   services:
>     mon: 3 daemons, quorum controller3,controller2,controller1
>     mgr: controller3(active)
>     osd: 71 osds: 71 up, 71 in
>     rgw: 1 daemon active
>
>   data:
>     pools:   4 pools, 6176 pgs
>     objects: 0 objects, 0 bytes
>     usage:   0 kB used, 0 kB / 0 kB avail
>     pgs:     100.000% pgs unknown
>              6176 unknown
>
> I deployed ceph-mgr using ceph-deploy gather-keys && ceph-deploy mgr create.
> It was successful, but for some reason ceph -s is not showing correct values.
> Can someone help me here please?
>
> Regards,
> Kevin
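A quick way to verify and, if needed, add the cap JC describes (the second command mirrors the one already used later in this thread; note that ceph auth caps replaces the whole cap set, so every cap you want to keep has to be listed):

$ ceph auth get client.admin      # should include a line: caps mgr = "allow *"
$ ceph auth caps client.admin mds 'allow *' mgr 'allow *' mon 'allow *' osd 'allow *'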
Re: [ceph-users] ceph status doesnt show available and used disk space after upgrade
On 20.12.2017 19:02, kevin parrikar wrote:
> hi All,
> I have upgraded the cluster from Hammer to Jewel and to Luminous.
> i am able to upload/download glance images but ceph -s shows 0 kB used and
> available, and probably because of that cinder create is failing.
>
> ceph -s
>   cluster:
>     id:     06c5c906-fc43-499f-8a6f-6c8e21807acf
>     health: HEALTH_WARN
>             Reduced data availability: 6176 pgs inactive
>             Degraded data redundancy: 6176 pgs unclean
>
>   services:
>     mon: 3 daemons, quorum controller3,controller2,controller1
>     mgr: controller3(active)
>     osd: 71 osds: 71 up, 71 in
>     rgw: 1 daemon active
>
>   data:
>     pools:   4 pools, 6176 pgs
>     objects: 0 objects, 0 bytes
>     usage:   0 kB used, 0 kB / 0 kB avail
>     pgs:     100.000% pgs unknown
>              6176 unknown
>
> I deployed ceph-mgr using ceph-deploy gather-keys && ceph-deploy mgr create.
> It was successful, but for some reason ceph -s is not showing correct values.
> Can someone help me here please?
>
> Regards,
> Kevin

Is ceph-mgr actually running? All statistics now require a ceph-mgr to be running. Also check the mgr's logfile to see if it is able to authenticate/start properly.

kind regards
Ronny Aasen
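A minimal sketch of Ronny's checks; the unit and log names assume the active mgr shown in the status output (controller3) and default log locations:

$ systemctl status ceph-mgr@controller3
$ ceph mgr dump | head -n 20                          # shows the active mgr and its state
$ tail -n 100 /var/log/ceph/ceph-mgr.controller3.log  # look for authentication or startup errors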
Re: [ceph-users] Ceph Status - Segmentation Fault
On Tue, Jun 14, 2016 at 2:26 AM, Mathias Buresch <mathias.bure...@de.clara.net> wrote: > Hey, > > I opened an issue at tracker.ceph.com -> http://tracker.ceph.com/issues > /16266 Hi Mathias, Thanks! I've added some information in that bug as I came across this same issue working on something else and saw your bug this morning. Cheers, Brad -Original Message- > From: Brad Hubbard <bhubb...@redhat.com> > To: Mathias Buresch <mathias.bure...@de.clara.net> > Cc: jsp...@redhat.com <jsp...@redhat.com>, ceph-us...@ceph.com e...@ceph.com> > Subject: Re: [ceph-users] Ceph Status - Segmentation Fault > Date: Thu, 2 Jun 2016 09:50:20 +1000 > > Could this be the call in RotatingKeyRing::get_secret() failing? > > Mathias, I'd suggest opening a tracker for this with the information in > your last post and let us know the number here. > Cheers, > Brad > > On Wed, Jun 1, 2016 at 3:15 PM, Mathias Buresch <mathias.bure...@de.cla > ra.net> wrote: >> Hi, >> >> here is the output including --debug-auth=20. Does this help? >> >> (gdb) run /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug- >> rados=20 --debug-auth=20 >> Starting program: /usr/bin/python /usr/bin/ceph status --debug- >> monc=20 >> --debug-ms=20 --debug-rados=20 --debug-auth=20 >> [Thread debugging using libthread_db enabled] >> Using host libthread_db library "/lib/x86_64-linux- >> gnu/libthread_db.so.1". >> [New Thread 0x710f5700 (LWP 2210)] >> [New Thread 0x708f4700 (LWP 2211)] >> [Thread 0x710f5700 (LWP 2210) exited] >> [New Thread 0x710f5700 (LWP 2212)] >> [Thread 0x710f5700 (LWP 2212) exited] >> [New Thread 0x710f5700 (LWP 2213)] >> [Thread 0x710f5700 (LWP 2213) exited] >> [New Thread 0x710f5700 (LWP 2233)] >> [Thread 0x710f5700 (LWP 2233) exited] >> [New Thread 0x710f5700 (LWP 2236)] >> [Thread 0x710f5700 (LWP 2236) exited] >> [New Thread 0x710f5700 (LWP 2237)] >> [Thread 0x710f5700 (LWP 2237) exited] >> [New Thread 0x710f5700 (LWP 2238)] >> [New Thread 0x7fffeb885700 (LWP 2240)] >> 2016-06-01 07:12:55.656336 710f5700 10 monclient(hunting): >> build_initial_monmap >> 2016-06-01 07:12:55.656440 710f5700 1 librados: starting msgr at >> :/0 >> 2016-06-01 07:12:55.656446 710f5700 1 librados: starting >> objecter >> [New Thread 0x7fffeb084700 (LWP 2241)] >> 2016-06-01 07:12:55.657552 710f5700 10 -- :/0 ready :/0 >> [New Thread 0x7fffea883700 (LWP 2242)] >> [New Thread 0x7fffea082700 (LWP 2245)] >> 2016-06-01 07:12:55.659548 710f5700 1 -- :/0 messenger.start >> [New Thread 0x7fffe9881700 (LWP 2248)] >> 2016-06-01 07:12:55.660530 710f5700 1 librados: setting wanted >> keys >> 2016-06-01 07:12:55.660539 710f5700 1 librados: calling >> monclient >> init >> 2016-06-01 07:12:55.660540 710f5700 10 monclient(hunting): init >> 2016-06-01 07:12:55.660550 710f5700 5 adding auth protocol: >> cephx >> 2016-06-01 07:12:55.660552 710f5700 10 monclient(hunting): >> auth_supported 2 method cephx >> 2016-06-01 07:12:55.660532 7fffe9881700 10 -- :/1337675866 >> reaper_entry >> start >> 2016-06-01 07:12:55.660570 7fffe9881700 10 -- :/1337675866 reaper >> 2016-06-01 07:12:55.660572 7fffe9881700 10 -- :/1337675866 reaper >> done >> 2016-06-01 07:12:55.660733 710f5700 2 auth: KeyRing::load: >> loaded >> key file /etc/ceph/ceph.client.admin.keyring >> [New Thread 0x7fffe9080700 (LWP 2251)] >> [New Thread 0x7fffe887f700 (LWP 2252)] >> 2016-06-01 07:12:55.662754 710f5700 10 monclient(hunting): >> _reopen_session rank -1 name >> 2016-06-01 07:12:55.662764 710f5700 10 -- :/1337675866 >> connect_rank >> to 62.176.141.181:6789/0, creating pipe and registering >> 
[New Thread 0x7fffe3fff700 (LWP 2255)] >> 2016-06-01 07:12:55.663789 710f5700 10 -- :/1337675866 >> >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 >> c=0x7fffec05aa30).register_pipe >> 2016-06-01 07:12:55.663819 710f5700 10 -- :/1337675866 >> get_connection mon.0 62.176.141.181:6789/0 new 0x7fffec064010 >> 2016-06-01 07:12:55.663790 7fffe3fff700 10 -- :/1337675866 >> >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 >> c=0x7fffec05aa30).writer: state = connecting policy.server=0 >> 2016-06-01 07:12:55.663830 7fffe3fff700 10 -- :/1337675866 >> >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 >> c=0x7fffec05aa30).connect 0 >> 2016-06-01 07:1
Re: [ceph-users] Ceph Status - Segmentation Fault
Hey, I opened an issue at tracker.ceph.com -> http://tracker.ceph.com/issues /16266-Original Message- From: Brad Hubbard <bhubb...@redhat.com> To: Mathias Buresch <mathias.bure...@de.clara.net> Cc: jsp...@redhat.com <jsp...@redhat.com>, ceph-us...@ceph.com Subject: Re: [ceph-users] Ceph Status - Segmentation Fault Date: Thu, 2 Jun 2016 09:50:20 +1000 Could this be the call in RotatingKeyRing::get_secret() failing? Mathias, I'd suggest opening a tracker for this with the information in your last post and let us know the number here. Cheers, Brad On Wed, Jun 1, 2016 at 3:15 PM, Mathias Buresch <mathias.bure...@de.cla ra.net> wrote: > Hi, > > here is the output including --debug-auth=20. Does this help? > > (gdb) run /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug- > rados=20 --debug-auth=20 > Starting program: /usr/bin/python /usr/bin/ceph status --debug- > monc=20 > --debug-ms=20 --debug-rados=20 --debug-auth=20 > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux- > gnu/libthread_db.so.1". > [New Thread 0x710f5700 (LWP 2210)] > [New Thread 0x708f4700 (LWP 2211)] > [Thread 0x710f5700 (LWP 2210) exited] > [New Thread 0x710f5700 (LWP 2212)] > [Thread 0x710f5700 (LWP 2212) exited] > [New Thread 0x710f5700 (LWP 2213)] > [Thread 0x710f5700 (LWP 2213) exited] > [New Thread 0x710f5700 (LWP 2233)] > [Thread 0x710f5700 (LWP 2233) exited] > [New Thread 0x710f5700 (LWP 2236)] > [Thread 0x710f5700 (LWP 2236) exited] > [New Thread 0x710f5700 (LWP 2237)] > [Thread 0x710f5700 (LWP 2237) exited] > [New Thread 0x710f5700 (LWP 2238)] > [New Thread 0x7fffeb885700 (LWP 2240)] > 2016-06-01 07:12:55.656336 710f5700 10 monclient(hunting): > build_initial_monmap > 2016-06-01 07:12:55.656440 710f5700 1 librados: starting msgr at > :/0 > 2016-06-01 07:12:55.656446 710f5700 1 librados: starting > objecter > [New Thread 0x7fffeb084700 (LWP 2241)] > 2016-06-01 07:12:55.657552 710f5700 10 -- :/0 ready :/0 > [New Thread 0x7fffea883700 (LWP 2242)] > [New Thread 0x7fffea082700 (LWP 2245)] > 2016-06-01 07:12:55.659548 710f5700 1 -- :/0 messenger.start > [New Thread 0x7fffe9881700 (LWP 2248)] > 2016-06-01 07:12:55.660530 710f5700 1 librados: setting wanted > keys > 2016-06-01 07:12:55.660539 710f5700 1 librados: calling > monclient > init > 2016-06-01 07:12:55.660540 710f5700 10 monclient(hunting): init > 2016-06-01 07:12:55.660550 710f5700 5 adding auth protocol: > cephx > 2016-06-01 07:12:55.660552 710f5700 10 monclient(hunting): > auth_supported 2 method cephx > 2016-06-01 07:12:55.660532 7fffe9881700 10 -- :/1337675866 > reaper_entry > start > 2016-06-01 07:12:55.660570 7fffe9881700 10 -- :/1337675866 reaper > 2016-06-01 07:12:55.660572 7fffe9881700 10 -- :/1337675866 reaper > done > 2016-06-01 07:12:55.660733 710f5700 2 auth: KeyRing::load: > loaded > key file /etc/ceph/ceph.client.admin.keyring > [New Thread 0x7fffe9080700 (LWP 2251)] > [New Thread 0x7fffe887f700 (LWP 2252)] > 2016-06-01 07:12:55.662754 710f5700 10 monclient(hunting): > _reopen_session rank -1 name > 2016-06-01 07:12:55.662764 710f5700 10 -- :/1337675866 > connect_rank > to 62.176.141.181:6789/0, creating pipe and registering > [New Thread 0x7fffe3fff700 (LWP 2255)] > 2016-06-01 07:12:55.663789 710f5700 10 -- :/1337675866 >> > 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fffec05aa30).register_pipe > 2016-06-01 07:12:55.663819 710f5700 10 -- :/1337675866 > get_connection mon.0 62.176.141.181:6789/0 new 0x7fffec064010 > 2016-06-01 07:12:55.663790 7fffe3fff700 10 -- 
:/1337675866 >> > 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fffec05aa30).writer: state = connecting policy.server=0 > 2016-06-01 07:12:55.663830 7fffe3fff700 10 -- :/1337675866 >> > 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fffec05aa30).connect 0 > 2016-06-01 07:12:55.663841 710f5700 10 monclient(hunting): picked > mon.pix01 con 0x7fffec05aa30 addr 62.176.141.181:6789/0 > 2016-06-01 07:12:55.663847 710f5700 20 -- :/1337675866 > send_keepalive con 0x7fffec05aa30, have pipe. > 2016-06-01 07:12:55.663850 7fffe3fff700 10 -- :/1337675866 >> > 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fffec05aa30).connecting to 62.176.141.181:6789/0 > 2016-06-01 07:12:55.663863 710f5700 10 monclient(hunting): > _send_mon_message to mon.pix01 at 62.176.141.181:6789/0 > 2016-06-01 07:12:55.663866 710f5700 1 -- :/1337675866 --> > 62.176.141.181:6789/0 -- auth(proto
Re: [ceph-users] Ceph Status - Segmentation Fault
t; 62.176.141.181:6789/0 > pipe(0x7fffec064010 sd=3 :41128 s=2 pgs=339278 cs=1 l=1 > c=0x7fffec05aa30).reader got message 2 0x7fffd0002f20 auth_reply(proto > 2 0 (0) Success) v1 > 2016-06-01 07:12:55.665944 7fffe3efe700 20 -- > 62.176.141.181:0/1337675866 queue 0x7fffd0002f20 prio 196 > 2016-06-01 07:12:55.665950 7fffe3efe700 20 -- > 62.176.141.181:0/1337675866 >> 62.176.141.181:6789/0 > pipe(0x7fffec064010 sd=3 :41128 s=2 pgs=339278 cs=1 l=1 > c=0x7fffec05aa30).reader reading tag... > 2016-06-01 07:12:55.665891 7fffea883700 1 -- > 62.176.141.181:0/1337675866 <== mon.0 62.176.141.181:6789/0 1 > mon_map magic: 0 v1 340+0+0 (3213884171 0 0) 0x7fffd0001cb0 con > 0x7fffec05aa30 > 2016-06-01 07:12:55.665953 7fffe3fff700 10 -- > 62.176.141.181:0/1337675866 >> 62.176.141.181:6789/0 > pipe(0x7fffec064010 sd=3 :41128 s=2 pgs=339278 cs=1 l=1 > c=0x7fffec05aa30).writer: state = open policy.server=0 > 2016-06-01 07:12:55.665960 7fffea883700 10 monclient(hunting): > handle_monmap mon_map magic: 0 v1 > 2016-06-01 07:12:55.665960 7fffe3fff700 10 -- > 62.176.141.181:0/1337675866 >> 62.176.141.181:6789/0 > pipe(0x7fffec064010 sd=3 :41128 s=2 pgs=339278 cs=1 l=1 > c=0x7fffec05aa30).write_ack 2 > 2016-06-01 07:12:55.665966 7fffe3fff700 10 -- > 62.176.141.181:0/1337675866 >> 62.176.141.181:6789/0 > pipe(0x7fffec064010 sd=3 :41128 s=2 pgs=339278 cs=1 l=1 > c=0x7fffec05aa30).writer: state = open policy.server=0 > 2016-06-01 07:12:55.665971 7fffea883700 10 monclient(hunting): got > monmap 1, mon.pix01 is now rank 0 > 2016-06-01 07:12:55.665970 7fffe3fff700 20 -- > 62.176.141.181:0/1337675866 >> 62.176.141.181:6789/0 > pipe(0x7fffec064010 sd=3 :41128 s=2 pgs=339278 cs=1 l=1 > c=0x7fffec05aa30).writer sleeping > 2016-06-01 07:12:55.665972 7fffea883700 10 monclient(hunting): dump: > epoch 1 > fsid 28af67eb-4060-4770-ac1d-d2be493877af > last_changed 2014-11-12 15:44:27.182395 > created 2014-11-12 15:44:27.182395 > 0: 62.176.141.181:6789/0 mon.pix01 > 1: 62.176.141.182:6789/0 mon.pix02 > > 2016-06-01 07:12:55.665988 7fffea883700 10 -- > 62.176.141.181:0/1337675866 dispatch_throttle_release 340 to dispatch > throttler 373/104857600 > 2016-06-01 07:12:55.665992 7fffea883700 20 -- > 62.176.141.181:0/1337675866 done calling dispatch on 0x7fffd0001cb0 > 2016-06-01 07:12:55.665997 7fffea883700 1 -- > 62.176.141.181:0/1337675866 <== mon.0 62.176.141.181:6789/0 2 > auth_reply(proto 2 0 (0) Success) v1 33+0+0 (3918039325 0 0) > 0x7fffd0002f20 con 0x7fffec05aa30 > 2016-06-01 07:12:55.666015 7fffea883700 10 cephx: set_have_need_key no > handler for service mon > 2016-06-01 07:12:55.666016 7fffea883700 10 cephx: set_have_need_key no > handler for service osd > 2016-06-01 07:12:55.666017 7fffea883700 10 cephx: set_have_need_key no > handler for service auth > 2016-06-01 07:12:55.666018 7fffea883700 10 cephx: validate_tickets want > 37 have 0 need 37 > 2016-06-01 07:12:55.666020 7fffea883700 10 monclient(hunting): my > global_id is 3511432 > 2016-06-01 07:12:55.666022 7fffea883700 10 cephx client: > handle_response ret = 0 > 2016-06-01 07:12:55.666023 7fffea883700 10 cephx client: got initial > server challenge 3112857369079243605 > 2016-06-01 07:12:55.666025 7fffea883700 10 cephx client: > validate_tickets: want=37 need=37 have=0 > 2016-06-01 07:12:55.666026 7fffea883700 10 cephx: set_have_need_key no > handler for service mon > 2016-06-01 07:12:55.666027 7fffea883700 10 cephx: set_have_need_key no > handler for service osd > 2016-06-01 07:12:55.666030 7fffea883700 10 cephx: set_have_need_key no > handler for service auth > 
2016-06-01 07:12:55.666030 7fffea883700 10 cephx: validate_tickets want > 37 have 0 need 37 > 2016-06-01 07:12:55.666031 7fffea883700 10 cephx client: want=37 > need=37 have=0 > 2016-06-01 07:12:55.666034 7fffea883700 10 cephx client: build_request > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffea883700 (LWP 2242)] > 0x73141a57 in encrypt (cct=, > error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) > at auth/cephx/../Crypto.h:110 > 110 auth/cephx/../Crypto.h: No such file or directory. > (gdb) bt > #0 0x73141a57 in encrypt (cct=, > error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) > at auth/cephx/../Crypto.h:110 > #1 encode_encrypt_enc_bl (cct=, > error="", out=..., key=..., t=) > at auth/cephx/CephxProtocol.h:464 > #2 encode_encrypt (cct=, error="", > out=..., key=..., t=) > at auth/cephx/CephxProtocol.h:489 > #3 cephx_calc_client_server_challenge (cct=, > secret=..., server_challenge=3112857369079243605, > client_challenge=12899511428024786235, key=key@entry=0
Re: [ceph-users] Ceph Status - Segmentation Fault
d=3 :41128 s=2 pgs=339278 cs=1 l=1 c=0x7fffec05aa30).writer sleeping 2016-06-01 07:12:55.665972 7fffea883700 10 monclient(hunting): dump: epoch 1 fsid 28af67eb-4060-4770-ac1d-d2be493877af last_changed 2014-11-12 15:44:27.182395 created 2014-11-12 15:44:27.182395 0: 62.176.141.181:6789/0 mon.pix01 1: 62.176.141.182:6789/0 mon.pix02 2016-06-01 07:12:55.665988 7fffea883700 10 -- 62.176.141.181:0/1337675866 dispatch_throttle_release 340 to dispatch throttler 373/104857600 2016-06-01 07:12:55.665992 7fffea883700 20 -- 62.176.141.181:0/1337675866 done calling dispatch on 0x7fffd0001cb0 2016-06-01 07:12:55.665997 7fffea883700 1 -- 62.176.141.181:0/1337675866 <== mon.0 62.176.141.181:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (3918039325 0 0) 0x7fffd0002f20 con 0x7fffec05aa30 2016-06-01 07:12:55.666015 7fffea883700 10 cephx: set_have_need_key no handler for service mon 2016-06-01 07:12:55.666016 7fffea883700 10 cephx: set_have_need_key no handler for service osd 2016-06-01 07:12:55.666017 7fffea883700 10 cephx: set_have_need_key no handler for service auth 2016-06-01 07:12:55.666018 7fffea883700 10 cephx: validate_tickets want 37 have 0 need 37 2016-06-01 07:12:55.666020 7fffea883700 10 monclient(hunting): my global_id is 3511432 2016-06-01 07:12:55.666022 7fffea883700 10 cephx client: handle_response ret = 0 2016-06-01 07:12:55.666023 7fffea883700 10 cephx client: got initial server challenge 3112857369079243605 2016-06-01 07:12:55.666025 7fffea883700 10 cephx client: validate_tickets: want=37 need=37 have=0 2016-06-01 07:12:55.666026 7fffea883700 10 cephx: set_have_need_key no handler for service mon 2016-06-01 07:12:55.666027 7fffea883700 10 cephx: set_have_need_key no handler for service osd 2016-06-01 07:12:55.666030 7fffea883700 10 cephx: set_have_need_key no handler for service auth 2016-06-01 07:12:55.666030 7fffea883700 10 cephx: validate_tickets want 37 have 0 need 37 2016-06-01 07:12:55.666031 7fffea883700 10 cephx client: want=37 need=37 have=0 2016-06-01 07:12:55.666034 7fffea883700 10 cephx client: build_request Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffea883700 (LWP 2242)] 0x73141a57 in encrypt (cct=, error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) at auth/cephx/../Crypto.h:110 110 auth/cephx/../Crypto.h: No such file or directory. (gdb) bt #0 0x73141a57 in encrypt (cct=, error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) at auth/cephx/../Crypto.h:110 #1 encode_encrypt_enc_bl (cct=, error="", out=..., key=..., t=) at auth/cephx/CephxProtocol.h:464 #2 encode_encrypt (cct=, error="", out=..., key=..., t=) at auth/cephx/CephxProtocol.h:489 #3 cephx_calc_client_server_challenge (cct=, secret=..., server_challenge=3112857369079243605, client_challenge=12899511428024786235, key=key@entry=0x7fffea8824a8 , ret="") at auth/cephx/CephxProtocol.cc:36 #4 0x7313aff4 in CephxClientHandler::build_request (this=0x7fffd4001520, bl=...) 
at auth/cephx/CephxClientHandler.cc:53 #5 0x72fe4a79 in MonClient::handle_auth (this=this@entry=0x7ff fec006b70, m=m@entry=0x7fffd0002f20) at mon/MonClient.cc:510 #6 0x72fe6507 in MonClient::ms_dispatch (this=0x7fffec006b70, m=0x7fffd0002f20) at mon/MonClient.cc:277 #7 0x730d5dc9 in ms_deliver_dispatch (m=0x7fffd0002f20, this=0x7fffec055410) at ./msg/Messenger.h:582 #8 DispatchQueue::entry (this=0x7fffec0555d8) at msg/simple/DispatchQueue.cc:185 #9 0x731023bd in DispatchQueue::DispatchThread::entry (this=) at msg/simple/DispatchQueue.h:103 #10 0x7ffff7bc4182 in start_thread () from /lib/x86_64-linux- gnu/libpthread.so.0 #11 0x778f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6 Best regards Mathias-Original Message- From: Brad Hubbard <bhubb...@redhat.com> To: jsp...@redhat.com Cc: ceph-us...@ceph.com, Mathias Buresch <mathias.bure...@de.clara.net> Subject: Re: [ceph-users] Ceph Status - Segmentation Fault Date: Wed, 25 May 2016 19:22:03 -0400 Hi John, This looks a lot like http://tracker.ceph.com/issues/12417 which is, of course, fixed. Worth gathering debug-auth=20 ? Maybe on the MON end as well? Cheers, Brad - Original Message - > > From: "Mathias Buresch" <mathias.bure...@de.clara.net> > To: jsp...@redhat.com > Cc: ceph-us...@ceph.com > Sent: Thursday, 26 May, 2016 12:57:47 AM > Subject: Re: [ceph-users] Ceph Status - Segmentation Fault > > There wasnt a package ceph-debuginfo available (Maybe bc I am running > Ubuntu). Have installed those: > > * ceph-dbg > * librados2-dbg > > There would be also ceph-mds-dbg and ceph-fs-common-dbg and so.. > > But now there are more information provided by the gdb output :) > > (gdb) run /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug- > rados=20 > Starting program: /usr/bin/python /usr/bin/ceph
Re: [ceph-users] Ceph Status - Segmentation Fault
Hi John, This looks a lot like http://tracker.ceph.com/issues/12417 which is, of course, fixed. Worth gathering debug-auth=20 ? Maybe on the MON end as well? Cheers, Brad - Original Message - > From: "Mathias Buresch" <mathias.bure...@de.clara.net> > To: jsp...@redhat.com > Cc: ceph-us...@ceph.com > Sent: Thursday, 26 May, 2016 12:57:47 AM > Subject: Re: [ceph-users] Ceph Status - Segmentation Fault > > There wasnt a package ceph-debuginfo available (Maybe bc I am running > Ubuntu). Have installed those: > > * ceph-dbg > * librados2-dbg > > There would be also ceph-mds-dbg and ceph-fs-common-dbg and so.. > > But now there are more information provided by the gdb output :) > > (gdb) run /usr/bin/ceph status --debug-monc=20 --debug-ms=20 --debug- > rados=20 > Starting program: /usr/bin/python /usr/bin/ceph status --debug-monc=20 > --debug-ms=20 --debug-rados=20 > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux- > gnu/libthread_db.so.1". > [New Thread 0x710f5700 (LWP 26739)] > [New Thread 0x708f4700 (LWP 26740)] > [Thread 0x710f5700 (LWP 26739) exited] > [New Thread 0x710f5700 (LWP 26741)] > [Thread 0x710f5700 (LWP 26741) exited] > [New Thread 0x710f5700 (LWP 26742)] > [Thread 0x710f5700 (LWP 26742) exited] > [New Thread 0x710f5700 (LWP 26743)] > [Thread 0x710f5700 (LWP 26743) exited] > [New Thread 0x710f5700 (LWP 26744)] > [Thread 0x710f5700 (LWP 26744) exited] > [New Thread 0x710f5700 (LWP 26745)] > [Thread 0x710f5700 (LWP 26745) exited] > [New Thread 0x710f5700 (LWP 26746)] > [New Thread 0x7fffeb885700 (LWP 26747)] > 2016-05-25 16:55:30.929131 710f5700 10 monclient(hunting): > build_initial_monmap > 2016-05-25 16:55:30.929221 710f5700 1 librados: starting msgr at > :/0 > 2016-05-25 16:55:30.929226 710f5700 1 librados: starting objecter > [New Thread 0x7fffeb084700 (LWP 26748)] > 2016-05-25 16:55:30.930288 710f5700 10 -- :/0 ready :/0 > [New Thread 0x7fffea883700 (LWP 26749)] > [New Thread 0x7fffea082700 (LWP 26750)] > 2016-05-25 16:55:30.932251 710f5700 1 -- :/0 messenger.start > [New Thread 0x7fffe9881700 (LWP 26751)] > 2016-05-25 16:55:30.933277 710f5700 1 librados: setting wanted > keys > 2016-05-25 16:55:30.933287 710f5700 1 librados: calling monclient > init > 2016-05-25 16:55:30.933289 710f5700 10 monclient(hunting): init > 2016-05-25 16:55:30.933279 7fffe9881700 10 -- :/3663984981 reaper_entry > start > 2016-05-25 16:55:30.933300 710f5700 10 monclient(hunting): > auth_supported 2 method cephx > 2016-05-25 16:55:30.933303 7fffe9881700 10 -- :/3663984981 reaper > 2016-05-25 16:55:30.933305 7fffe9881700 10 -- :/3663984981 reaper done > [New Thread 0x7fffe9080700 (LWP 26752)] > [New Thread 0x7fffe887f700 (LWP 26753)] > 2016-05-25 16:55:30.935485 710f5700 10 monclient(hunting): > _reopen_session rank -1 name > 2016-05-25 16:55:30.935495 710f5700 10 -- :/3663984981 connect_rank > to 62.176.141.181:6789/0, creating pipe and registering > [New Thread 0x7fffe3fff700 (LWP 26754)] > 2016-05-25 16:55:30.936556 710f5700 10 -- :/3663984981 >> > 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fffec05aa30).register_pipe > 2016-05-25 16:55:30.936573 710f5700 10 -- :/3663984981 > get_connection mon.0 62.176.141.181:6789/0 new 0x7fffec064010 > 2016-05-25 16:55:30.936557 7fffe3fff700 10 -- :/3663984981 >> > 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fffec05aa30).writer: state = connecting policy.server=0 > 2016-05-25 16:55:30.936583 7fffe3fff700 10 -- :/3663984981 >> > 62.176.141.181:6789/0 
pipe(0x7fffec064010 sd=-1 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fffec05aa30).connect 0 > 2016-05-25 16:55:30.936594 710f5700 10 monclient(hunting): picked > mon.pix01 con 0x7fffec05aa30 addr 62.176.141.181:6789/0 > 2016-05-25 16:55:30.936600 710f5700 20 -- :/3663984981 > send_keepalive con 0x7fffec05aa30, have pipe. > 2016-05-25 16:55:30.936603 7fffe3fff700 10 -- :/3663984981 >> > 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :0 s=1 pgs=0 cs=0 l=1 > c=0x7fffec05aa30).connecting to 62.176.141.181:6789/0 > 2016-05-25 16:55:30.936615 710f5700 10 monclient(hunting): > _send_mon_message to mon.pix01 at 62.176.141.181:6789/0 > 2016-05-25 16:55:30.936618 710f5700 1 -- :/3663984981 --> > 62.176.141.181:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 > 0x7fffec060450 con 0x7fffec05aa30 > 2016-05-25 16:55:30.936623 710f5700 20 -- :/3663984981 > submit_message auth(proto 0 30 bytes epoch 0) v1 remote, > 62.176.141.181:6789/0, have pipe. > 2016-05
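Brad also suggested gathering debug-auth on the MON end. A sketch of how to raise and later reset that verbosity at runtime on the monitor named in this thread (mon.pix01), without restarting it:

$ ceph tell mon.pix01 injectargs '--debug-auth 20 --debug-ms 20'
$ ceph daemon mon.pix01 config set debug_auth 0    # run on the mon host to turn it back down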
Re: [ceph-users] Ceph Status - Segmentation Fault
cs=1 l=1 c=0x7fffec05aa30).aborted = 0 2016-05-25 16:55:30.938413 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader got 340 + 0 + 0 byte message 2016-05-25 16:55:30.938427 7fffe3efe700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).No session security set 2016-05-25 16:55:30.938434 7fffe3efe700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader got message 1 0x7fffd0001cb0 mon_map magic: 0 v1 2016-05-25 16:55:30.938442 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 queue 0x7fffd0001cb0 prio 196 2016-05-25 16:55:30.938450 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader reading tag... 2016-05-25 16:55:30.938453 7fffe3fff700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).writer: state = open policy.server=0 2016-05-25 16:55:30.938464 7fffe3fff700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).write_ack 1 2016-05-25 16:55:30.938467 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader got MSG 2016-05-25 16:55:30.938471 7fffe3fff700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).writer: state = open policy.server=0 2016-05-25 16:55:30.938472 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader got envelope type=18 src mon.0 front=33 data=0 off 0 2016-05-25 16:55:30.938475 7fffe3fff700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).writer sleeping 2016-05-25 16:55:30.938476 7fffe3efe700 10 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader wants 33 from dispatch throttler 340/104857600 2016-05-25 16:55:30.938456 7fffea883700 1 -- 62.176.141.181:0/3663984981 <== mon.0 62.176.141.181:6789/0 1 mon_map magic: 0 v1 340+0+0 (3213884171 0 0) 0x7fffd0001cb0 con 0x7fffec05aa30 2016-05-25 16:55:30.938481 7fffe3efe700 20 -- 62.176.141.181:0/3663984981 >> 62.176.141.181:6789/0 pipe(0x7fffec064010 sd=3 :38763 s=2 pgs=327867 cs=1 l=1 c=0x7fffec05aa30).reader got front 33 2016-05-25 16:55:30.938484 7fffea883700 10 monclient(hunting): handle_monmap mon_map magic: 0 v1 2016-05-25 16:55:30.938485 7fffe3efe700 10 Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffea883700 (LWP 26749)] 0x73141a57 in encrypt (cct=, error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) at auth/cephx/../Crypto.h:110 110 auth/cephx/../Crypto.h: No such file or directory. 
(gdb) bt #0 0x73141a57 in encrypt (cct=, error=0x7fffea882280, out=..., in=..., this=0x7fffea882470) at auth/cephx/../Crypto.h:110 #1 encode_encrypt_enc_bl (cct=, error="", out=..., key=..., t=) at auth/cephx/CephxProtocol.h:464 #2 encode_encrypt (cct=, error="", out=..., key=..., t=) at auth/cephx/CephxProtocol.h:489 #3 cephx_calc_client_server_challenge (cct=, secret=..., server_challenge=9622349603176979543, client_challenge=7732813711656640623, key=key@entry=0x7fffea8824a8, ret="") at auth/cephx/CephxProtocol.cc:36 #4 0x7313aff4 in CephxClientHandler::build_request (this=0x7fffd4001520, bl=...) at auth/cephx/CephxClientHandler.cc:53 #5 0x72fe4a79 in MonClient::handle_auth (this=this@entry=0x7ff fec006b70, m=m@entry=0x7fffd0002ee0) at mon/MonClient.cc:510 #6 0x72fe6507 in MonClient::ms_dispatch (this=0x7fffec006b70, m=0x7fffd0002ee0) at mon/MonClient.cc:277 #7 0x730d5dc9 in ms_deliver_dispatch (m=0x7fffd0002ee0, this=0x7fffec055410) at ./msg/Messenger.h:582 #8 DispatchQueue::entry (this=0x7fffec0555d8) at msg/simple/DispatchQueue.cc:185 #9 0x731023bd in DispatchQueue::DispatchThread::entry (this=) at msg/simple/DispatchQueue.h:103 #10 0x77bc4182 in start_thread () from /lib/x86_64-linux- gnu/libpthread.so.0 #11 0x778f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6 -Original Message- From: John Spray <jsp...@redhat.com> To: Mathias Buresch <mathias.bure...@de.clara.net> Cc: ceph-us...@ceph.com <ceph-us...@ceph.com> Subject: Re: [ceph-users] Ceph Status - Segmentation Fault Date: Wed, 25 May 2016 15:41:51 +0100 On Wed, M
Re: [ceph-users] Ceph Status - Segmentation Fault
mon_map magic: 0 > v1 > 2016-05-25 14:51:02.408827 7f1879efa700 20 -- > 62.176.141.181:0/2987460054 queue 0x7f186c001cb0 prio 196 > 2016-05-25 14:51:02.408837 7f1879efa700 20 -- > 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 > pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 > c=0x7f187c05aa50).reader reading tag... > 2016-05-25 14:51:02.408851 7f1879ffb700 10 -- > 62.176.141.181:0/2987460054 >> 62.176.141.181:6789/0 > pipe(0x7f187c064010 sd=3 :37964 s=2 pgs=327710 cs=1 l=1 > c=0x7f187c05aa50).writer: state = open policy.server=0 > Segmentation fault > > > (gdb) run /usr/bin/ceph status > Starting program: /usr/bin/python /usr/bin/ceph status > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux- > gnu/libthread_db.so.1". > [New Thread 0x710f5700 (LWP 23401)] > [New Thread 0x708f4700 (LWP 23402)] > [Thread 0x710f5700 (LWP 23401) exited] > [New Thread 0x710f5700 (LWP 23403)] > [Thread 0x710f5700 (LWP 23403) exited] > [New Thread 0x710f5700 (LWP 23404)] > [Thread 0x710f5700 (LWP 23404) exited] > [New Thread 0x710f5700 (LWP 23405)] > [Thread 0x710f5700 (LWP 23405) exited] > [New Thread 0x710f5700 (LWP 23406)] > [Thread 0x710f5700 (LWP 23406) exited] > [New Thread 0x710f5700 (LWP 23407)] > [Thread 0x710f5700 (LWP 23407) exited] > [New Thread 0x710f5700 (LWP 23408)] > [New Thread 0x7fffeb885700 (LWP 23409)] > [New Thread 0x7fffeb084700 (LWP 23410)] > [New Thread 0x7fffea883700 (LWP 23411)] > [New Thread 0x7fffea082700 (LWP 23412)] > [New Thread 0x7fffe9881700 (LWP 23413)] > [New Thread 0x7fffe9080700 (LWP 23414)] > [New Thread 0x7fffe887f700 (LWP 23415)] > [New Thread 0x7fffe807e700 (LWP 23416)] > [New Thread 0x7fffe7f7d700 (LWP 23419)] > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffea883700 (LWP 23411)] > 0x73141a57 in ?? () from /usr/lib/librados.so.2 > (gdb) bt > #0 0x73141a57 in ?? () from /usr/lib/librados.so.2 > #1 0x7313aff4 in ?? () from /usr/lib/librados.so.2 > #2 0x72fe4a79 in ?? () from /usr/lib/librados.so.2 > #3 0x72fe6507 in ?? () from /usr/lib/librados.so.2 > #4 0x730d5dc9 in ?? () from /usr/lib/librados.so.2 > #5 0x731023bd in ?? () from /usr/lib/librados.so.2 > #6 0x77bc4182 in start_thread () from /lib/x86_64-linux- > gnu/libpthread.so.0 > #7 0x778f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6 > > > Does that help? I cant really see where the error is. :) Hmm, can you try getting that backtrace again after installing the ceph-debuginfo package? Also add --debug-rados=20 to your command line (you can use all the --debug... options when you're running inside gdb to get the logs and the backtrace in one). John > > -Original Message- > From: John Spray <jsp...@redhat.com> > To: Mathias Buresch <mathias.bure...@de.clara.net> > Cc: ceph-us...@ceph.com <ceph-us...@ceph.com> > Subject: Re: [ceph-users] Ceph Status - Segmentation Fault > Date: Wed, 25 May 2016 10:16:55 +0100 > > On Mon, May 23, 2016 at 12:41 PM, Mathias Buresch > <mathias.bure...@de.clara.net> wrote: >> >> Please found the logs with higher debug level attached to this email. > You've attached the log from your mon, but it's not your mon that's > segfaulting, right? > > You can use normal ceph command line flags to crank up the verbosity > on the CLI too (--debug-monc=20 --debug-ms=20 spring to mind). > > You can also run the ceph CLI in gdb like this: > gdb python > (gdb) run /usr/bin/ceph status > ... hopefully it crashes and then ... 
> (gdb) bt > > Cheers, > John > >> >> >> >> Kind regards >> Mathias >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
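For anyone following along, here is a minimal sketch of the workflow John describes above (symbolized backtrace plus verbose client logging), assuming a Debian/Ubuntu host where the debug symbols are packaged as ceph-dbg; on RPM-based distributions the package is typically ceph-debuginfo, so adjust the package name as needed.

$ sudo apt-get install ceph-dbg        # assumed package name; ceph-debuginfo on RPM systems
$ gdb python
(gdb) run /usr/bin/ceph --debug-monc=20 --debug-ms=20 --debug-rados=20 status
... wait for the SIGSEGV ...
(gdb) bt                               # backtrace of the crashing thread
(gdb) thread apply all bt              # backtraces of all threads

With the debug symbols installed, frames that previously showed only "?? () from /usr/lib/librados.so.2" resolve to function names and source locations, as in the symbolized backtrace quoted elsewhere in this thread.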
Re: [ceph-users] Ceph Status - Segmentation Fault
fff10f5700 (LWP 23403) exited] [New Thread 0x710f5700 (LWP 23404)] [Thread 0x710f5700 (LWP 23404) exited] [New Thread 0x710f5700 (LWP 23405)] [Thread 0x710f5700 (LWP 23405) exited] [New Thread 0x710f5700 (LWP 23406)] [Thread 0x710f5700 (LWP 23406) exited] [New Thread 0x710f5700 (LWP 23407)] [Thread 0x710f5700 (LWP 23407) exited] [New Thread 0x710f5700 (LWP 23408)] [New Thread 0x7fffeb885700 (LWP 23409)] [New Thread 0x7fffeb084700 (LWP 23410)] [New Thread 0x7fffea883700 (LWP 23411)] [New Thread 0x7fffea082700 (LWP 23412)] [New Thread 0x7fffe9881700 (LWP 23413)] [New Thread 0x7fffe9080700 (LWP 23414)] [New Thread 0x7fffe887f700 (LWP 23415)] [New Thread 0x7fffe807e700 (LWP 23416)] [New Thread 0x7fffe7f7d700 (LWP 23419)] Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffea883700 (LWP 23411)] 0x73141a57 in ?? () from /usr/lib/librados.so.2 (gdb) bt #0 0x73141a57 in ?? () from /usr/lib/librados.so.2 #1 0x7313aff4 in ?? () from /usr/lib/librados.so.2 #2 0x72fe4a79 in ?? () from /usr/lib/librados.so.2 #3 0x72fe6507 in ?? () from /usr/lib/librados.so.2 #4 0x730d5dc9 in ?? () from /usr/lib/librados.so.2 #5 0x731023bd in ?? () from /usr/lib/librados.so.2 #6 0x77bc4182 in start_thread () from /lib/x86_64-linux- gnu/libpthread.so.0 #7 0x778f147d in clone () from /lib/x86_64-linux-gnu/libc.so.6 Does that help? I cant really see where the error is. :) -Original Message- From: John Spray <jsp...@redhat.com> To: Mathias Buresch <mathias.bure...@de.clara.net> Cc: ceph-us...@ceph.com <ceph-us...@ceph.com> Subject: Re: [ceph-users] Ceph Status - Segmentation Fault Date: Wed, 25 May 2016 10:16:55 +0100 On Mon, May 23, 2016 at 12:41 PM, Mathias Buresch <mathias.bure...@de.clara.net> wrote: > > Please found the logs with higher debug level attached to this email. You've attached the log from your mon, but it's not your mon that's segfaulting, right? You can use normal ceph command line flags to crank up the verbosity on the CLI too (--debug-monc=20 --debug-ms=20 spring to mind). You can also run the ceph CLI in gdb like this: gdb python (gdb) run /usr/bin/ceph status ... hopefully it crashes and then ... (gdb) bt Cheers, John > > > > Kind regards > Mathias > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > smime.p7s Description: S/MIME cryptographic signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Status - Segmentation Fault
On Mon, May 23, 2016 at 12:41 PM, Mathias Buresch wrote: > Please found the logs with higher debug level attached to this email. You've attached the log from your mon, but it's not your mon that's segfaulting, right? You can use normal ceph command line flags to crank up the verbosity on the CLI too (--debug-monc=20 --debug-ms=20 spring to mind). You can also run the ceph CLI in gdb like this: gdb python (gdb) run /usr/bin/ceph status ... hopefully it crashes and then ... (gdb) bt Cheers, John > > > Kind regards > Mathias > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Status - Segmentation Fault
/usr/bin/ceph is a python script so it's not segfaulting but some binary it's launching is and there doesn't appear to be much information about it in the log you uploaded. Are you able to capture a core file and generate a stack trace from gdb? The following may help to get some data. $ ulimit -c unlimited $ ceph -s $ ls core.* // This should list a recently made core file $ file core.XXX // Now run gdb with the output of the previous "file" command $ gdb -c core.XXX $(which binary_name) -batch -ex "thr apply all bt" $ ulimit -c 0 You may need debuginfo for the relevant binary and libraries installed to get good stack traces but it's something you can try. For example. $ ulimit -c unlimited $ sleep 100 & [1] 32056 $ kill -SIGSEGV 32056 $ ls core.* core.32056 [1]+ Segmentation fault (core dumped) sleep 100 $ file core.32056 core.32056: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'sleep 100' $ gdb -c core.32056 $(which sleep) -batch -ex "thr apply all bt" [New LWP 32056] warning: the debug information found in "/usr/lib/debug//lib64/libc-2.22.so.debug" does not match "/lib64/libc.so.6" (CRC mismatch). warning: the debug information found in "/usr/lib/debug//usr/lib64/libc-2.22.so.debug" does not match "/lib64/libc.so.6" (CRC mismatch). warning: the debug information found in "/usr/lib/debug//lib64/ld-2.22.so.debug" does not match "/lib64/ld-linux-x86-64.so.2" (CRC mismatch). warning: the debug information found in "/usr/lib/debug//usr/lib64/ld-2.22.so.debug" does not match "/lib64/ld-linux-x86-64.so.2" (CRC mismatch). Core was generated by `sleep 100'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x7f1fd99e84b0 in __nanosleep_nocancel () from /lib64/libc.so.6 Thread 1 (LWP 32056): #0 0x7f1fd99e84b0 in __nanosleep_nocancel () from /lib64/libc.so.6 #1 0x5641e10ba29f in rpl_nanosleep () #2 0x5641e10ba100 in xnanosleep () #3 0x5641e10b7a1d in main () $ ulimit -c 0 HTH, Brad - Original Message - > From: "Mathias Buresch"> To: ceph-us...@ceph.com > Sent: Monday, 23 May, 2016 9:41:51 PM > Subject: [ceph-users] Ceph Status - Segmentation Fault > > Hi there, > I was updating Ceph to 0.94.7 and now I am getting segmantation faults. > > When getting status via "ceph -s" or "ceph health detail" I am getting > an error "Segmentation fault". > > I have only two Monitor Deamon.. but didn't had any problems yet with > that.. maybe they maintenance time was too long this time..?! 
> > When getting the status via admin socket I get following for both: > > ceph daemon mon.pix01 mon_status > { > "name": "pix01", > "rank": 0, > "state": "leader", > "election_epoch": 226, > "quorum": [ > 0, > 1 > ], > "outside_quorum": [], > "extra_probe_peers": [], > "sync_provider": [], > "monmap": { > "epoch": 1, > "fsid": "28af67eb-4060-4770-ac1d-d2be493877af", > "modified": "2014-11-12 15:44:27.182395", > "created": "2014-11-12 15:44:27.182395", > "mons": [ > { > "rank": 0, > "name": "pix01", > "addr": "x.x.x.x:6789\/0" > }, > { > "rank": 1, > "name": "pix02", > "addr": "x.x.x.x:6789\/0" > } > ] > } > } > > ceph daemon mon.pix02 mon_status > { > "name": "pix02", > "rank": 1, > "state": "peon", > "election_epoch": 226, > "quorum": [ > 0, > 1 > ], > "outside_quorum": [], > "extra_probe_peers": [], > "sync_provider": [], > "monmap": { > "epoch": 1, > "fsid": "28af67eb-4060-4770-ac1d-d2be493877af", > "modified": "2014-11-12 15:44:27.182395", > "created": "2014-11-12 15:44:27.182395", > "mons": [ > { > "rank": 0, > "name": "pix01", > "addr": "x.x.x.x:6789\/0" > }, > { > "rank": 1, > "name": "pix02", > "addr": "x.x.x.x:6789\/0" > } > ] > } > } > > Please found the logs with higher debug level attached to this email. > > > Kind regards > Mathias > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
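To tie this thread together, a short sketch applying Brad's core-dump recipe to the ceph CLI itself. Since /usr/bin/ceph is a Python script, the process that actually crashes is the Python interpreter with librados loaded, so the core file should be opened against the python binary. The core-file name and location are assumptions and depend on the kernel's core_pattern setting.

$ ulimit -c unlimited
$ ceph -s                         # reproduces the segfault and should leave a core file
$ ls core.*                       # e.g. core.26749
$ file core.26749                 # confirms which executable dumped the core
$ gdb -c core.26749 $(which python) -batch -ex "thr apply all bt"
$ ulimit -c 0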
Re: [ceph-users] Ceph status
Hi Ajitha, For one, it looks like you don't have enough OSDs for the number of replicas you have specified in the config file. What is the value of your 'osd pool default size' in ceph.conf? If it's 3, for example, then you need to have at least 3 hosts with 1 OSD each (with the default CRUSH rules, IIRC). Alternatively, you could reduce the replication level. You can see how to do that here: http://ceph.com/docs/master/rados/operations/pools/#set-the-number-of-object-replicas The other warning indicates that your monitor VM has a nearly full disk. Hope that helps! Cheers, Lincoln On 1/6/2015 5:07 AM, Ajitha Robert wrote: Hi all, I have installed ceph using ceph-deploy utility.. I have created three VM's, one for monitor+mds and other two VM's for OSD's. ceph admin is another seperate machine... .Status and health of ceph are shown below.. Can you please suggest What i can infer from the status.. I m a beginner to this.. *ceph status* cluster 3a946c74-b16d-41bd-a5fe-41efa96f0ee9 health HEALTH_WARN 46 pgs degraded; 18 pgs incomplete; 64 pgs stale; 46 pgs stuck degraded; 18 pgs stuck inactive; 64 pgs stuck stale; 64 pgs stuck unclean; 46 pgs stuck undersized; 46 pgs undersized; mon.MON low disk space monmap e1: 1 mons at {MON=10.184.39.66:6789/0}, election epoch 1, quorum 0 MON osdmap e19: 5 osds: 2 up, 2 in pgmap v33: 64 pgs, 1 pools, 0 bytes data, 0 objects 10304 MB used, 65947 MB / 76252 MB avail 18 stale+incomplete 46 stale+active+undersized+degraded *ceph health* HEALTH_WARN 46 pgs degraded; 18 pgs incomplete; 64 pgs stale; 46 pgs stuck degraded; 18 pgs stuck inactive; 64 pgs stuck stale; 64 pgs stuck unclean; 46 pgs stuck undersized; 46 pgs undersized; mon.MON low disk space *ceph -w* cluster 3a946c74-b16d-41bd-a5fe-41efa96f0ee9 health HEALTH_WARN 46 pgs degraded; 18 pgs incomplete; 64 pgs stale; 46 pgs stuck degraded; 18 pgs stuck inactive; 64 pgs stuck stale; 64 pgs stuck unclean; 46 pgs stuck undersized; 46 pgs undersized; mon.MON low disk space monmap e1: 1 mons at {MON=10.184.39.66:6789/0}, election epoch 1, quorum 0 MON osdmap e19: 5 osds: 2 up, 2 in pgmap v31: 64 pgs, 1 pools, 0 bytes data, 0 objects 10305 MB used, 65947 MB / 76252 MB avail 18 stale+incomplete 46 stale+active+undersized+degraded 2015-01-05 20:38:53.159998 mon.0 [INF] from='client.? 10.184.39.66:0/1011909' entity='client.bootstrap-mds' cmd='[{prefix: auth get-or-create, entity: mds.MON, caps: [osd, allow rwx, mds, allow, mon, allow profile mds]}]': finished 2015-01-05 20:41:42.003690 mon.0 [INF] pgmap v32: 64 pgs: 18 stale+incomplete, 46 stale+active+undersized+degraded; 0 bytes data, 10304 MB used, 65947 MB / 76252 MB avail 2015-01-05 20:41:50.100784 mon.0 [INF] pgmap v33: 64 pgs: 18 stale+incomplete, 46 stale+active+undersized+degraded; 0 bytes data, 10304 MB used, 65947 MB / 76252 MB avail *Regards,Ajitha R* ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
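A minimal sketch of the two options Lincoln mentions, assuming the pool in question is the default pool named "rbd"; substitute your own pool name, and note that lowering the replica count below 3 is only advisable for test clusters.

$ ceph osd pool get rbd size          # show the current replica count
$ ceph osd pool set rbd size 2        # match the number of OSD hosts actually available
$ ceph osd pool set rbd min_size 1    # allow I/O with a single healthy replica

For pools created later, the defaults can be set in ceph.conf:

[global]
osd pool default size = 2
osd pool default min_size = 1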
Re: [ceph-users] ceph status 104 active+degraded+remapped 88 creating+incomplete
On 29.10.2014 18:29, Thomas Alrin wrote: Hi all, I'm new to ceph. What is wrong in this ceph? How can i make status to change HEALTH_OK? Please help With the current default pool size of 3 and the default crush rule you need at least 3 OSDs on separate nodes for a new ceph cluster to start. With 2 OSDs on one node you need to change the pool replica size and the crush rule. Regards -- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin http://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Mandatory disclosures per §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Managing Director: Peer Heinlein -- Registered office: Berlin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
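For a single-node test cluster, the change Robert describes can be sketched roughly as follows; the pool name "rbd" and the file names are assumptions, and the edit targets the default replicated rule in the CRUSH map.

$ ceph osd pool set rbd size 2              # a replica count the two OSDs can satisfy
$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt
# in crushmap.txt change the rule's failure domain from host to osd:
#   step chooseleaf firstn 0 type host  ->  step chooseleaf firstn 0 type osd
$ crushtool -c crushmap.txt -o crushmap.new
$ ceph osd setcrushmap -i crushmap.new

Alternatively, for clusters that have not been deployed yet, setting "osd crush chooseleaf type = 0" in ceph.conf before creation has the same effect.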
Re: [ceph-users] ceph --status Missing keyring
Dan, Do you have /etc/ceph/ceph.client.admin.keyring, or is that in a local directory? Ceph will be looking for it in the /etc/ceph directory by default. See if adding read permissions works, e.g., sudo chmod +r. You can also try sudo when executing ceph. On Wed, Aug 6, 2014 at 6:55 AM, O'Reilly, Dan daniel.orei...@dish.com wrote: Any idea what may be the issue here? [ceph@tm1cldcphal01 ~]$ ceph --status 2014-08-06 07:53:21.767255 7fe31fd1e700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication 2014-08-06 07:53:21.767263 7fe31fd1e700 0 librados: client.admin initialization error (2) No such file or directory Error connecting to cluster: ObjectNotFound [ceph@tm1cldcphal01 ~]$ ll total 372 -rw--- 1 ceph ceph 71 Aug 5 21:07 ceph.bootstrap-mds.keyring -rw--- 1 ceph ceph 71 Aug 5 21:07 ceph.bootstrap-osd.keyring -rw--- 1 ceph ceph 63 Aug 5 21:07 ceph.client.admin.keyring -rw--- 1 ceph ceph 289 Aug 5 21:01 ceph.conf -rw--- 1 ceph ceph 355468 Aug 6 07:53 ceph.log -rw--- 1 ceph ceph 73 Aug 5 21:01 ceph.mon.keyring [ceph@tm1cldcphal01 ~]$ cat ceph.conf [global] auth_service_required = cephx filestore_xattr_use_omap = true auth_client_required = cephx auth_cluster_required = cephx mon_host = 10.18.201.110,10.18.201.76,10.18.201.77 mon_initial_members = tm1cldmonl01, tm1cldmonl02, tm1cldmonl03 fsid = 474a8905-7537-42a6-8edc-1ab9fd2ca5e4 [ceph@tm1cldcphal01 ~]$ Dan O'Reilly UNIX Systems Administration 9601 S. Meridian Blvd. Englewood, CO 80112 720-514-6293 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- John Wilkins Senior Technical Writer Inktank john.wilk...@inktank.com (415) 425-9599 http://inktank.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
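A sketch of the two fixes John suggests, assuming the keyring generated by ceph-deploy is still sitting in the deployment user's working directory: either point the CLI at it explicitly, or install it where librados looks by default.

$ ceph --conf ./ceph.conf --keyring ./ceph.client.admin.keyring status
# or install it system-wide and make it readable:
$ sudo cp ceph.client.admin.keyring /etc/ceph/
$ sudo chmod +r /etc/ceph/ceph.client.admin.keyring
$ ceph status
# with ceph-deploy, the same can be done from the admin node:
$ ceph-deploy admin tm1cldcphal01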