Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded
Hi Don,
after a lot of trouble due to an unfinished setcrushmap, I was able to remove the new EC pool, load the old crushmap and edit it again. After including a "step set_choose_tries 100" in the crushmap, the EC pool creation with "ceph osd pool create ec7archiv 1024 1024 erasure 7hostprofile" worked without trouble.

Because of defective PGs from this test I removed the cache tier from the old EC pool, which gave the next trouble - but that is another story!

Thanks again

Udo

Am 25.03.2015 20:37, schrieb Don Doerner:
> More info please: how did you create your EC pool? It's hard to imagine that you could have specified enough PGs to make it impossible to form PGs out of 84 OSDs (I'm assuming your SSDs are in a separate root), but I have to ask...
> -don-
>
> -----Original Message-----
> From: Udo Lembke [mailto:ulem...@polarzone.de]
> Sent: 25 March, 2015 08:54
> To: Don Doerner; ceph-us...@ceph.com
> Subject: Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded
>
> Hi Don,
> thanks for the info! Looks like choose_tries set to 200 does the trick. But the setcrushmap takes a long, long time (alarming, but the clients still have IO)... hope it's finished soon ;-)
>
> Udo
>
> Am 25.03.2015 16:00, schrieb Don Doerner:
>> Assuming you've calculated the number of PGs reasonably, see here (http://tracker.ceph.com/issues/10350) and here (http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon).
>> I'm guessing these will address your issue. That weird number means that no OSD was found/assigned to the PG.
>> -don-

----------------------------------------------------------------------
The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt.

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
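For reference, the fix Udo describes (raising the CRUSH retry budget) ends up in the decompiled crushmap as an extra step inside the EC rule. A rough sketch of what the edited rule might look like follows - the rule name, ruleset id, and min/max sizes here are assumptions for illustration, not taken from Udo's actual map; only the "step set_choose_tries 100" line is the change he reports:

```
rule ec7archiv {
	ruleset 2
	type erasure
	min_size 3
	max_size 20
	step set_chooseleaf_tries 5
	step set_choose_tries 100
	step take default
	step chooseleaf indep 0 type host
	step emit
}
```

The edited map is then recompiled with "crushtool -c" and injected with "ceph osd setcrushmap", as described in the troubleshooting-pg page Don linked.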
[ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded
Hi,
due to two more hosts (now 7 storage nodes) I want to create a new EC pool, and I get a strange effect:

ceph@admin:~$ ceph health detail
HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized
pg 22.3e5 is stuck unclean since forever, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck unclean since forever, current state active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is stuck undersized for 406.614447, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck undersized for 406.616563, current state active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is stuck degraded for 406.614566, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck degraded for 406.616679, current state active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is active+undersized+degraded, acting [76,15,82,11,57,29,2147483647]
pg 22.240 is active+undersized+degraded, acting [38,85,17,74,2147483647,10,58]

But I have only 91 OSDs (84 SATA + 7 SSDs), not 2147483647! Where the heck did the 2147483647 come from?
I ran the following commands:

ceph osd erasure-code-profile set 7hostprofile k=5 m=2 ruleset-failure-domain=host
ceph osd pool create ec7archiv 1024 1024 erasure 7hostprofile

My version:
ceph -v
ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)

I found an issue in my crushmap - one SSD was in the map twice:

host ceph-061-ssd {
	id -16		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
}
root ssd {
	id -13		# do not change unnecessarily
	# weight 0.780
	alg straw
	hash 0	# rjenkins1
	item ceph-01-ssd weight 0.170
	item ceph-02-ssd weight 0.170
	item ceph-03-ssd weight 0.000
	item ceph-04-ssd weight 0.170
	item ceph-05-ssd weight 0.170
	item ceph-06-ssd weight 0.050
	item ceph-07-ssd weight 0.050
	item ceph-061-ssd weight 0.000
}

Host ceph-061-ssd doesn't exist, and osd-61 is the SSD from ceph-03-ssd, but after fixing the crushmap the issue with osd 2147483647 still exists.

Any idea how to fix that?

regards

Udo
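With k=5 and m=2 the pool needs k+m = 7 shards on 7 distinct hosts - exactly as many hosts as exist. A toy simulation (NOT the real CRUSH algorithm, just an illustration of the retry-budget effect discussed later in this thread) shows why picking N distinct items out of N with a bounded number of pseudo-random tries often fails, and why a larger budget almost never does:

```python
import random

# Toy illustration: with k+m = 7 shards and exactly 7 hosts, every host
# must be selected, so each hash collision costs a retry. A small retry
# budget therefore fails often; a generous one rarely does. The budgets
# (19 vs 100) are chosen here for illustration only.
def maps_fully(n_hosts, n_needed, max_tries, rng):
    chosen = set()
    for _ in range(max_tries):
        chosen.add(rng.randrange(n_hosts))  # pseudo-random host pick
        if len(chosen) == n_needed:
            return True
    return False  # retry budget exhausted -> unmapped shard

rng = random.Random(1)
trials = 2000
fail_19 = sum(not maps_fully(7, 7, 19, rng) for _ in range(trials))
fail_100 = sum(not maps_fully(7, 7, 100, rng) for _ in range(trials))
print(fail_19, fail_100)  # far more failures with the 19-try budget
```

On average, collecting all 7 hosts takes about 18 random draws (coupon-collector), so a budget of ~19 fails frequently while 100 practically never does.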
Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded
On Wed, Mar 25, 2015 at 1:20 AM, Udo Lembke <ulem...@polarzone.de> wrote:
> Hi,
> due to two more hosts (now 7 storage nodes) I want to create a new EC pool and get a strange effect:
>
> ceph@admin:~$ ceph health detail
> HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized

This is the big clue: you have two undersized PGs!

> pg 22.3e5 is stuck unclean since forever, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]

2147483647 is the largest number you can represent in a signed 32-bit integer. There's an output error of some kind which is fixed elsewhere; this should be -1. So for whatever reason (in general it's hard on CRUSH, trying to select N entries out of N choices), CRUSH hasn't been able to map an OSD to this slot for you. You'll want to figure out why that is and fix it.
-Greg

> [...]
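Greg's point can be made concrete with a few lines of Python. The sentinel is 0x7fffffff (named CRUSH_ITEM_NONE in the Ceph source, to the best of my knowledge), i.e. the largest signed 32-bit integer; the helper below is only an illustration, not a Ceph API:

```python
# CRUSH marks an unmapped PG shard with a sentinel value rather than a
# real OSD id. That sentinel is the largest signed 32-bit integer,
# which is exactly the "impossible OSD" in the health output above.
ITEM_NONE = 0x7FFFFFFF
assert ITEM_NONE == 2**31 - 1 == 2147483647

def missing_shards(acting):
    """Return the indices of erasure-code shards CRUSH failed to map."""
    return [i for i, osd in enumerate(acting) if osd == ITEM_NONE]

# The acting set reported for pg 22.3e5 in this thread:
print(missing_shards([76, 15, 82, 11, 57, 29, 2147483647]))  # -> [6]
```

So each "2147483647" is an empty slot, and the PG is undersized by exactly that many shards.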
Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded
Sorry all: my company's e-mail security got in the way there. Try these references...

* http://tracker.ceph.com/issues/10350
* http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don Doerner
Sent: 25 March, 2015 08:01
To: Udo Lembke; ceph-us...@ceph.com
Subject: Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

Assuming you've calculated the number of PGs reasonably, see here (http://tracker.ceph.com/issues/10350) and here (http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon). I'm guessing these will address your issue. That weird number means that no OSD was found/assigned to the PG.
-don-

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Udo Lembke
Sent: 25 March, 2015 01:21
To: ceph-us...@ceph.com
Subject: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

[quoted original message trimmed]
Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded
Hi Gregory,
thanks for the answer!

I have looked at which storage nodes are missing, and it's two different ones:

pg 22.240 is stuck undersized for 24437.862139, current state active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.240 is stuck undersized for 24437.862139, current state active+undersized+degraded, last acting [ceph-04,ceph-07,ceph-02,ceph-06,2147483647,ceph-01,ceph-05]
ceph-03 is missing

pg 22.3e5 is stuck undersized for 24437.860025, current state active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.3e5 is stuck undersized for 24437.860025, current state active+undersized+degraded, last acting [ceph-06,ceph-02,ceph-07,ceph-01,ceph-05,ceph-03,2147483647]
ceph-04 is missing

Perhaps I hit a PGs/OSD max?! I looked with the script from http://cephnotes.ksperis.com/blog/2015/02/23/get-the-number-of-placement-groups-per-osd

pool :    17   18   19   9    10   20   21   13   22   23   16   | SUM
...
host ceph-03:
osd.24    0    12   2    2    4    76   16   5    74   0    66   | 257
osd.25    0    17   3    4    4    89   16   4    82   0    60   | 279
osd.26    0    20   2    5    3    71   12   5    81   0    61   | 260
osd.27    0    18   2    4    3    73   21   3    76   0    61   | 261
osd.28    0    14   2    9    4    73   23   9    94   0    64   | 292
osd.29    0    19   3    3    4    54   25   4    89   0    62   | 263
osd.30    0    22   2    6    3    80   15   6    92   0    47   | 273
osd.31    0    25   4    2    3    87   20   3    76   0    62   | 282
osd.32    0    13   4    2    2    64   14   1    82   0    69   | 251
osd.33    0    12   2    5    5    89   25   7    83   0    68   | 296
osd.34    0    28   0    8    5    81   18   3    99   0    65   | 307
osd.35    0    17   3    2    4    74   21   3    95   0    58   | 277
host ceph-04:
osd.36    0    13   1    9    6    72   17   5    93   0    56   | 272
osd.37    0    21   2    5    6    83   20   4    78   0    71   | 290
osd.38    0    17   3    2    5    64   22   7    76   0    57   | 253
osd.39    0    21   3    7    6    79   27   4    80   0    68   | 295
osd.40    0    15   1    5    7    71   17   6    93   0    74   | 289
osd.41    0    16   5    5    6    76   18   6    95   0    70   | 297
osd.42    0    13   0    6    1    71   25   4    83   0    56   | 259
osd.43    0    20   2    2    6    81   23   4    89   0    59   | 286
osd.44    0    21   2    5    6    77   9    5    76   0    52   | 253
osd.45    0    11   4    8    3    76   24   6    82   0    49   | 263
osd.46    0    17   2    5    6    57   15   4    84   0    62   | 252
osd.47    0    19   3    2    3    84   19   5    94   0    48   | 277
...
SUM :     768  1536 192  384  384  6144 1536 384  7168 24   5120

Pool 22 is the new ec7archiv, but on ceph-04 there is no OSD with more than 300 PGs...

Udo

Am 25.03.2015 14:52, schrieb Gregory Farnum:
> [quoted message trimmed]
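A quick back-of-the-envelope check of the per-OSD numbers above (values taken from this thread; this computes the expected average only, not any configured limit):

```python
# Pool 22 (ec7archiv) was created with 1024 PGs and an EC profile of
# k=5, m=2, so it places pg_num * (k + m) shards. Spread across the
# 84 SATA OSDs, the expected shard count per OSD is:
pg_num, k, m, sata_osds = 1024, 5, 2, 84
per_osd = pg_num * (k + m) / sata_osds
print(round(per_osd, 1))  # about 85 shards per OSD on average
```

That average of ~85 matches the pool-22 column in the table (values roughly 74-99), so the counts themselves look plausible rather than evidence of a PGs/OSD cap being hit, consistent with the eventual diagnosis that CRUSH was simply running out of retries.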
Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded
Hi Don,
thanks for the info! Looks like choose_tries set to 200 does the trick. But the setcrushmap takes a long, long time (alarming, but the clients still have IO)... hope it's finished soon ;-)

Udo

Am 25.03.2015 16:00, schrieb Don Doerner:
> Assuming you've calculated the number of PGs reasonably, see here (http://tracker.ceph.com/issues/10350) and here (http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon). I'm guessing these will address your issue. That weird number means that no OSD was found/assigned to the PG.
> -don-
Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded
Assuming you've calculated the number of PGs reasonably, see here (http://tracker.ceph.com/issues/10350) and here (http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon). I'm guessing these will address your issue. That weird number means that no OSD was found/assigned to the PG.

-don-

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Udo Lembke
Sent: 25 March, 2015 01:21
To: ceph-us...@ceph.com
Subject: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

[quoted original message trimmed]