Re: [ceph-users] pg is stuck stale (osd.21 still removed)
Hi ceph-users,

any idea how to fix my cluster? OSD.21 is removed, but some stale PGs are still pointing to OSD.21. I don't know how to proceed.

Help is very welcome!

Best regards
Daniel

> -----Original Message-----
> From: Daniel Schwager
> Sent: Friday, January 08, 2016 3:10 PM
> To: 'ceph-us...@ceph.com'
> Subject: pg is stuck stale (osd.21 still removed)
>
> Hi,
>
> we had a HW-problem with OSD.21 today. The OSD daemon was down and "smartctl"
> told me about some hardware errors.
>
> I decided to remove the HDD:
>
> ceph osd out 21
> ceph osd crush remove osd.21
> ceph auth del osd.21
> ceph osd rm osd.21
>
> But afterwards I saw that I have some stuck PGs for osd.21:
>
> root@ceph-admin:~# ceph -w
>     cluster c7b12656-15a6-41b0-963f-4f47c62497dc
>      health HEALTH_WARN
>             50 pgs stale
>             50 pgs stuck stale
>      monmap e4: 3 mons at {ceph-mon1=192.168.135.31:6789/0,ceph-mon2=192.168.135.32:6789/0,ceph-mon3=192.168.135.33:6789/0}
>             election epoch 404, quorum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
>      mdsmap e136: 1/1/1 up {0=ceph-mon1=up:active}
>      osdmap e18259: 23 osds: 23 up, 23 in
>       pgmap v47879105: 6656 pgs, 10 pools, 23481 GB data, 6072 kobjects
>             54974 GB used, 30596 GB / 85571 GB avail
>                 6605 active+clean
>                   50 stale+active+clean
>                    1 active+clean+scrubbing+deep
>
> root@ceph-admin:~# ceph health
> HEALTH_WARN 50 pgs stale; 50 pgs stuck stale
>
> root@ceph-admin:~# ceph health detail
> HEALTH_WARN 50 pgs stale; 50 pgs stuck stale; noout flag(s) set
> pg 34.225 is stuck stale for 98780.399254, current state stale+active+clean, last acting [21]
> pg 34.186 is stuck stale for 98780.399195, current state stale+active+clean, last acting [21]
> ...
>
> root@ceph-admin:~# ceph pg 34.225 query
> Error ENOENT: i don't have pgid 34.225
>
> root@ceph-admin:~# ceph pg 34.225 list_missing
> Error ENOENT: i don't have pgid 34.225
>
> root@ceph-admin:~# ceph osd lost 21 --yes-i-really-mean-it
> osd.21 is not down or doesn't exist
>
> # checking the crushmap
> ceph osd getcrushmap -o crush.map
> crushtool -d crush.map -o crush.txt
> root@ceph-admin:~# grep 21 crush.txt
> -> nothing here
>
> Of course, I cannot start OSD.21, because it's not available anymore - I
> removed it.
>
> Is there a way to remap the stuck PGs to OSDs other than osd.21?
>
> One more - I tried to recreate the pg, but now this pg is "stuck inactive":
>
> root@ceph-admin:~# ceph pg force_create_pg 34.225
> pg 34.225 now creating, ok
>
> root@ceph-admin:~# ceph health detail
> HEALTH_WARN 49 pgs stale; 1 pgs stuck inactive; 49 pgs stuck stale; 1 pgs stuck unclean
> pg 34.225 is stuck inactive since forever, current state creating, last acting []
> pg 34.225 is stuck unclean since forever, current state creating, last acting []
> pg 34.186 is stuck stale for 118481.013632, current state stale+active+clean, last acting [21]
> ...
>
> Maybe somebody has an idea how to fix this situation?

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pg is stuck stale (osd.21 still removed)
Hi Daniel,

On Friday, January 8, 2016, Daniel Schwager wrote:

> One more - I tried to recreate the pg, but now this pg is "stuck inactive":
>
> root@ceph-admin:~# ceph pg force_create_pg 34.225
> pg 34.225 now creating, ok
>
> root@ceph-admin:~# ceph health detail
> HEALTH_WARN 49 pgs stale; 1 pgs stuck inactive; 49 pgs stuck stale; 1 pgs stuck unclean
> pg 34.225 is stuck inactive since forever, current state creating, last acting []
> pg 34.225 is stuck unclean since forever, current state creating, last acting []
> pg 34.186 is stuck stale for 118481.013632, current state stale+active+clean, last acting [21]
> ...
>
> Maybe somebody has an idea how to fix this situation?

Unfortunately I don't have the answers, but maybe the following links will help you make some progress:

http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg17820.html
https://ceph.com/community/incomplete-pgs-oh-my/

Good luck,
Alex

> regards
> Danny

--
Alex Gorbachev
Storcium
Re: [ceph-users] pg is stuck stale (osd.21 still removed) - SOLVED.
Well, ok - I found the solution:

ceph health detail
HEALTH_WARN 50 pgs stale; 50 pgs stuck stale
pg 34.225 is stuck inactive since forever, current state creating, last acting []
pg 34.225 is stuck unclean since forever, current state creating, last acting []
pg 34.226 is stuck stale for 77328.923060, current state stale+active+clean, last acting [21]
pg 34.3cb is stuck stale for 77328.923213, current state stale+active+clean, last acting [21]

root@ceph-admin:~# ceph pg map 34.225
osdmap e18263 pg 34.225 (34.225) -> up [16] acting [16]

After restarting osd.16, pg 34.225 is fine.

So I recreated all the broken PGs:

for pg in `ceph health detail | grep stale | cut -d' ' -f2`; do ceph pg force_create_pg $pg; done

and restarted all (or the necessary) OSDs.

Now the cluster is HEALTH_OK again.

root@ceph-admin:~# ceph health
HEALTH_OK

Best regards
Danny
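A note on the one-liner above: as written, `grep stale | cut -d' ' -f2` also matches the `HEALTH_WARN 50 pgs stale; ...` summary line, so the loop would additionally try `ceph pg force_create_pg 50`. The following is a minimal sketch, not the author's method, that selects only the `pg <id> is stuck stale` lines and prints the commands instead of running them. The function names `stale_pgs` and `plan_recreate` are made up for illustration, and the line format is assumed to match the `ceph health detail` output quoted in this thread (ceph 0.94.2).

```shell
#!/bin/sh
# Hypothetical helpers; input format assumed from the output quoted in this thread.

# Print the IDs of PGs reported as "stuck stale", one per line.
# Reads `ceph health detail` output on stdin.
stale_pgs() {
    # Only lines that start with the word "pg" qualify, which skips the
    # HEALTH_WARN summary line and the "stuck inactive/unclean" entries.
    awk '$1 == "pg" && /is stuck stale/ { print $2 }'
}

# Print (do not run) one force_create_pg command per stale PG.
# Reads `ceph health detail` output on stdin.
plan_recreate() {
    stale_pgs | while read -r pg; do
        echo "ceph pg force_create_pg $pg"
    done
}
```

To review the plan before doing anything, something like `ceph health detail | plan_recreate` should be enough. Keep in mind that force_create_pg, as far as I understand it, creates the PG empty, so objects whose only copy was on osd.21 are not brought back by this.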
Re: [ceph-users] pg is stuck stale (osd.21 still removed)
One more - I tried to recreate the pg, but now this pg is "stuck inactive":

root@ceph-admin:~# ceph pg force_create_pg 34.225
pg 34.225 now creating, ok

root@ceph-admin:~# ceph health detail
HEALTH_WARN 49 pgs stale; 1 pgs stuck inactive; 49 pgs stuck stale; 1 pgs stuck unclean
pg 34.225 is stuck inactive since forever, current state creating, last acting []
pg 34.225 is stuck unclean since forever, current state creating, last acting []
pg 34.186 is stuck stale for 118481.013632, current state stale+active+clean, last acting [21]
...

Maybe somebody has an idea how to fix this situation?

regards
Danny
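For a PG that sits in "creating" with an empty acting set, one check that may help is to ask where the PG maps now with `ceph pg map <pgid>`, which is what the SOLVED message in this thread did before restarting the OSD it named. Below is a small sketch that pulls the acting OSDs out of one line of that output. The helper name `acting_osd_names` is hypothetical, and the line format is assumed to be the one shown in this thread (`... -> up [16] acting [16]`).

```shell
#!/bin/sh
# Hypothetical helper; input format assumed from the `ceph pg map` output
# quoted in this thread.

# Print the acting OSDs of a PG as "osd.N", one per line.
# Reads one line of `ceph pg map <pgid>` output on stdin.
# Prints nothing when the acting set is empty.
acting_osd_names() {
    sed -n 's/.*acting \[\([0-9,]*\)\].*/\1/p' |  # keep only the id list
        tr ',' '\n' |                              # one id per line
        awk 'NF { print "osd." $0 }'               # drop blanks, add prefix
}
```

Usage would be along the lines of `ceph pg map 34.225 | acting_osd_names`. How to restart the listed daemon depends on the init system of the OSD host, which the thread does not mention.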
[ceph-users] pg is stuck stale (osd.21 still removed)
Hi,

we had a HW-problem with OSD.21 today. The OSD daemon was down and "smartctl" told me about some hardware errors.

I decided to remove the HDD:

ceph osd out 21
ceph osd crush remove osd.21
ceph auth del osd.21
ceph osd rm osd.21

But afterwards I saw that I have some stuck PGs for osd.21:

root@ceph-admin:~# ceph -w
    cluster c7b12656-15a6-41b0-963f-4f47c62497dc
     health HEALTH_WARN
            50 pgs stale
            50 pgs stuck stale
     monmap e4: 3 mons at {ceph-mon1=192.168.135.31:6789/0,ceph-mon2=192.168.135.32:6789/0,ceph-mon3=192.168.135.33:6789/0}
            election epoch 404, quorum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
     mdsmap e136: 1/1/1 up {0=ceph-mon1=up:active}
     osdmap e18259: 23 osds: 23 up, 23 in
      pgmap v47879105: 6656 pgs, 10 pools, 23481 GB data, 6072 kobjects
            54974 GB used, 30596 GB / 85571 GB avail
                6605 active+clean
                  50 stale+active+clean
                   1 active+clean+scrubbing+deep

root@ceph-admin:~# ceph health
HEALTH_WARN 50 pgs stale; 50 pgs stuck stale

root@ceph-admin:~# ceph health detail
HEALTH_WARN 50 pgs stale; 50 pgs stuck stale; noout flag(s) set
pg 34.225 is stuck stale for 98780.399254, current state stale+active+clean, last acting [21]
pg 34.186 is stuck stale for 98780.399195, current state stale+active+clean, last acting [21]
...

root@ceph-admin:~# ceph pg 34.225 query
Error ENOENT: i don't have pgid 34.225

root@ceph-admin:~# ceph pg 34.225 list_missing
Error ENOENT: i don't have pgid 34.225

root@ceph-admin:~# ceph osd lost 21 --yes-i-really-mean-it
osd.21 is not down or doesn't exist

# checking the crushmap
ceph osd getcrushmap -o crush.map
crushtool -d crush.map -o crush.txt
root@ceph-admin:~# grep 21 crush.txt
-> nothing here

Of course, I cannot start OSD.21, because it's not available anymore - I removed it.

Is there a way to remap the stuck PGs to OSDs other than osd.21? How can I help my cluster (ceph 0.94.2)?

Best regards
Danny
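One thing the output above hints at: every stale PG lists osd.21 as its only "last acting" OSD. That suggests (an inference, the post does not say so) that the pool behind PG prefix 34 kept a single replica, in which case there was no second copy to remap to. A rough sketch for checking this across a long `ceph health detail` listing follows. The function name `stuck_by_acting` is invented for illustration, and the line format is assumed from the output quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper; input format assumed from the output quoted in this thread.

# Count stuck PG entries per "last acting" set.
# Reads `ceph health detail` output on stdin.
# Prints "<count> <acting set>" per line, ordered by acting set.
stuck_by_acting() {
    # The acting set, e.g. "[21]", is the last field of each "pg ..." line.
    awk '$1 == "pg" && /last acting/ { n[$NF]++ }
         END { for (a in n) print n[a], a }' | LC_ALL=C sort -k2
}
```

Run as `ceph health detail | stuck_by_acting`. If everything collapses into a single one-OSD set like `[21]`, the replica count of the affected pool is worth a look, for example with `ceph osd pool get <poolname> size`, where `<poolname>` is whatever pool has ID 34 in that cluster.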