Re: [ceph-users] Need help for PG problem
Hi Reddy,
It's over a thousand lines, I pasted it on gist:
https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4

On Tue, 22 Mar 2016 at 18:15 M Ranga Swami Reddy <swamire...@gmail.com> wrote:
> Hi,
> Can you please share the "ceph health detail" output?
>
> Thanks
> Swami
>
> On Tue, Mar 22, 2016 at 3:32 PM, Zhang Qiang <dotslash...@gmail.com> wrote:
> > Hi all,
> >
> > I have 20 OSDs and 1 pool, and, as recommended by the
> > doc (http://docs.ceph.com/docs/master/rados/operations/placement-groups/), I
> > configured pg_num and pgp_num to 4096, size 2, min size 1.
> >
> > But ceph -s shows:
> >
> > HEALTH_WARN
> > 534 pgs degraded
> > 551 pgs stuck unclean
> > 534 pgs undersized
> > too many PGs per OSD (382 > max 300)
> >
> > Why the recommended value, 4096, for 10 ~ 50 OSDs doesn't work? And what
> > does it mean by "too many PGs per OSD (382 > max 300)"? If per OSD has 382
> > PGs I would have had 7640 PGs.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Need help for PG problem
Hi all,

I have 20 OSDs and 1 pool, and, as recommended by the doc
(http://docs.ceph.com/docs/master/rados/operations/placement-groups/), I
configured pg_num and pgp_num to 4096, size 2, min size 1.

But ceph -s shows:

HEALTH_WARN
534 pgs degraded
551 pgs stuck unclean
534 pgs undersized
too many PGs per OSD (382 > max 300)

Why doesn't the recommended value, 4096, for 10 ~ 50 OSDs work? And what does
it mean by "too many PGs per OSD (382 > max 300)"? If each OSD had 382 PGs I
would have 7640 PGs in total.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Need help for PG problem
I got it, the suggested pg_num is the total, so I need to divide it by the
number of replicas. Thanks Oliver, your answer is very thorough and helpful!

On 23 March 2016 at 02:19, Oliver Dzombic <i...@ip-interactive.de> wrote:
> Hi Zhang,
>
> yeah i saw your answer already.
>
> At very first, you should make sure that there is no clock skew.
> This can cause some side effects.
>
> According to
> http://docs.ceph.com/docs/master/rados/operations/placement-groups/
> you have to:
>
> Total PGs = (OSDs * 100) / pool size
>
> Means:
>
> your 20 OSDs * 100 = 2000
>
> Pool size is:
>
> "Where pool size is either the number of replicas for replicated pools or
> the K+M sum for erasure coded pools (as returned by ceph osd
> erasure-code-profile get)."
>
> --
>
> So let's say you have 2 replicas, you should have 1000 PGs.
>
> If you have 3 replicas, you should have 2000 / 3 = 666 PGs.
>
> But you configured 4096 PGs. That's simply far too much.
>
> Reduce it. Or, if you can not, get more OSDs into this.
>
> I don't know any other way.
>
> Good luck!
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
> Am 22.03.2016 um 19:02 schrieb Zhang Qiang:
> > Hi Oliver,
> >
> > Thanks for your reply to my question on the Ceph mailing list. I somehow
> > wasn't able to receive your reply in my mailbox, but I saw your reply in
> > the archive, so I have to mail you personally.
> >
> > I have pasted the whole ceph health output on gist:
> > https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4
> >
> > Hope this will help. Thank you!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
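To make Oliver's arithmetic easy to re-run for other cluster sizes, it can be wrapped in a small helper. This is only an illustrative sketch: the function name is mine, and the optional power-of-two rounding follows the common pgcalc convention rather than anything stated in this thread.

```python
def recommended_pg_num(num_osds, pool_size, target_pgs_per_osd=100,
                       round_pow2=True):
    """Total PGs for a pool: (OSDs * target per OSD) / replica count."""
    raw = num_osds * target_pgs_per_osd / pool_size
    if not round_pow2:
        return int(raw)
    # pgcalc suggests rounding up to the next power of two.
    p = 1
    while p < raw:
        p *= 2
    return p

# 20 OSDs, size 2: 2000 / 2 = 1000 raw, 1024 after rounding.
print(recommended_pg_num(20, 2, round_pow2=False))  # 1000
print(recommended_pg_num(20, 3, round_pow2=False))  # 666
print(recommended_pg_num(20, 2))                    # 1024
```

This reproduces both numbers from Oliver's reply (1000 for size 2, 666 for size 3) and shows why 4096 was roughly four times too high for this cluster.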
Re: [ceph-users] Need help for PG problem
And here's the osd tree if it matters.

ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 22.39984 root default
-2 21.39984     host 10
 0  1.06999         osd.0        up     1.0              1.0
 1  1.06999         osd.1        up     1.0              1.0
 2  1.06999         osd.2        up     1.0              1.0
 3  1.06999         osd.3        up     1.0              1.0
 4  1.06999         osd.4        up     1.0              1.0
 5  1.06999         osd.5        up     1.0              1.0
 6  1.06999         osd.6        up     1.0              1.0
 7  1.06999         osd.7        up     1.0              1.0
 8  1.06999         osd.8        up     1.0              1.0
 9  1.06999         osd.9        up     1.0              1.0
10  1.06999         osd.10       up     1.0              1.0
11  1.06999         osd.11       up     1.0              1.0
12  1.06999         osd.12       up     1.0              1.0
13  1.06999         osd.13       up     1.0              1.0
14  1.06999         osd.14       up     1.0              1.0
15  1.06999         osd.15       up     1.0              1.0
16  1.06999         osd.16       up     1.0              1.0
17  1.06999         osd.17       up     1.0              1.0
18  1.06999         osd.18       up     1.0              1.0
19  1.06999         osd.19       up     1.0              1.0
-3  1.0             host 148_96
 0  1.0             osd.0        up     1.0              1.0

On Wed, 23 Mar 2016 at 19:10 Zhang Qiang <dotslash...@gmail.com> wrote:
> Oliver, Goncalo,
>
> Sorry to disturb again, but recreating the pool with a smaller pg_num
> didn't seem to work, now all 666 pgs are degraded + undersized.
>
> New status:
>     cluster d2a69513-ad8e-4b25-8f10-69c4041d624d
>      health HEALTH_WARN
>             666 pgs degraded
>             82 pgs stuck unclean
>             666 pgs undersized
>      monmap e5: 5 mons at {1=10.3.138.37:6789/0,2=10.3.138.39:6789/0,3=10.3.138.40:6789/0,4=10.3.138.59:6789/0,GGZ-YG-S0311-PLATFORM-138=10.3.138.36:6789/0}
>             election epoch 28, quorum 0,1,2,3,4 GGZ-YG-S0311-PLATFORM-138,1,2,3,4
>      osdmap e705: 20 osds: 20 up, 20 in
>       pgmap v1961: 666 pgs, 1 pools, 0 bytes data, 0 objects
>             13223 MB used, 20861 GB / 21991 GB avail
>                  666 active+undersized+degraded
>
> Only one pool and its size is 3. So I think according to the algorithm,
> (20 * 100) / 3 = 666 pgs is reasonable.
>
> I updated health detail and also attached a pg query result on gist
> (https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4).
>
> On Wed, 23 Mar 2016 at 09:01 Dotslash Lu <dotslash...@gmail.com> wrote:
>
>> Hello Gonçalo,
>>
>> Thanks for your reminder. I was just setting up the cluster for testing,
>> so don't worry, I can just remove the pool. And I learnt that since the
>> replication number and pool number are related to pg_num, I'll consider
>> them carefully before deploying any data.
>>
>> On Mar 23, 2016, at 6:58 AM, Goncalo Borges <goncalo.bor...@sydney.edu.au>
>> wrote:
>>
>> Hi Zhang...
>>
>> If I can add some more info, the change of PGs is a heavy operation, and
>> as far as I know, you should NEVER decrease PGs. From the notes in pgcalc
>> (http://ceph.com/pgcalc/):
>>
>> "It's also important to know that the PG count can be increased, but
>> NEVER decreased without destroying / recreating the pool. However,
>> increasing the PG Count of a pool is one of the most impactful events in a
>> Ceph Cluster, and should be avoided for production clusters if possible."
>>
>> So, in your case, I would consider adding more OSDs.
>>
>> Cheers
>> Goncalo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Need help for PG problem
Oliver, Goncalo,

Sorry to disturb again, but recreating the pool with a smaller pg_num didn't
seem to work, now all 666 pgs are degraded + undersized.

New status:
    cluster d2a69513-ad8e-4b25-8f10-69c4041d624d
     health HEALTH_WARN
            666 pgs degraded
            82 pgs stuck unclean
            666 pgs undersized
     monmap e5: 5 mons at {1=10.3.138.37:6789/0,2=10.3.138.39:6789/0,3=10.3.138.40:6789/0,4=10.3.138.59:6789/0,GGZ-YG-S0311-PLATFORM-138=10.3.138.36:6789/0}
            election epoch 28, quorum 0,1,2,3,4 GGZ-YG-S0311-PLATFORM-138,1,2,3,4
     osdmap e705: 20 osds: 20 up, 20 in
      pgmap v1961: 666 pgs, 1 pools, 0 bytes data, 0 objects
            13223 MB used, 20861 GB / 21991 GB avail
                 666 active+undersized+degraded

Only one pool and its size is 3. So I think according to the algorithm,
(20 * 100) / 3 = 666 pgs is reasonable.

I updated health detail and also attached a pg query result on gist
(https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4).

On Wed, 23 Mar 2016 at 09:01 Dotslash Lu wrote:
> Hello Gonçalo,
>
> Thanks for your reminder. I was just setting up the cluster for testing, so
> don't worry, I can just remove the pool. And I learnt that since the
> replication number and pool number are related to pg_num, I'll consider
> them carefully before deploying any data.
>
> On Mar 23, 2016, at 6:58 AM, Goncalo Borges wrote:
>
> Hi Zhang...
>
> If I can add some more info, the change of PGs is a heavy operation, and
> as far as I know, you should NEVER decrease PGs. From the notes in pgcalc
> (http://ceph.com/pgcalc/):
>
> "It's also important to know that the PG count can be increased, but NEVER
> decreased without destroying / recreating the pool. However, increasing the
> PG Count of a pool is one of the most impactful events in a Ceph Cluster,
> and should be avoided for production clusters if possible."
>
> So, in your case, I would consider adding more OSDs.
>
> Cheers
> Goncalo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Need help for PG problem
Yes it was the crush map. I updated it, distributed the 20 OSDs across 2
hosts correctly, and finally all pgs are healthy. Thanks guys, I really
appreciate your help!

On Thu, 24 Mar 2016 at 07:25 Goncalo Borges <goncalo.bor...@sydney.edu.au> wrote:
> Hi Zhang...
>
> I think you are dealing with two different problems.
>
> The first problem refers to the number of PGs per OSD. That was already
> discussed, and now there are no more messages concerning it.
>
> The second problem you are experiencing seems to be that all your OSDs are
> under the same host. Besides that, osd.0 appears twice in two different
> hosts (I do not really know why that is happening). If you are using the
> default crush rules, ceph is not able to replicate objects (even with size
> 2) across two different hosts because all your OSDs are just in one host.
>
> Cheers
> Goncalo
>
> ----------------------------------------
> *From:* Zhang Qiang [dotslash...@gmail.com]
> *Sent:* 23 March 2016 23:17
> *To:* Goncalo Borges
> *Cc:* Oliver Dzombic; ceph-users
> *Subject:* Re: [ceph-users] Need help for PG problem
>
> And here's the osd tree if it matters.
>
> [osd tree and earlier status, quoted verbatim from the previous messages,
> trimmed]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph-fuse huge performance gap between different block sizes
Hi Christian,

Thanks for your reply, here're the test specs:

>>>>
[global]
ioengine=libaio
runtime=90
direct=1
group_reporting
iodepth=16
ramp_time=5
size=1G

[seq_w_4k_20]
bs=4k
filename=seq_w_4k_20
rw=write
numjobs=20

[seq_w_1m_20]
bs=1m
filename=seq_w_1m_20
rw=write
numjobs=20
<<<<

Test results: 4k - aggrb=13245KB/s, 1m - aggrb=1102.6MB/s

Mount options: ceph-fuse /ceph -m 10.3.138.36:6789

Ceph configurations:

>>>>
filestore_xattr_use_omap = true
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 128
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 512
osd pool default pgp num = 512
osd crush chooseleaf type = 1
<<<<

Other configurations are all default.

Status:
     health HEALTH_OK
     monmap e5: 5 mons at {1=10.3.138.37:6789/0,2=10.3.138.39:6789/0,3=10.3.138.40:6789/0,4=10.3.138.59:6789/0,GGZ-YG-S0311-PLATFORM-138=10.3.138.36:6789/0}
            election epoch 28, quorum 0,1,2,3,4 GGZ-YG-S0311-PLATFORM-138,1,2,3,4
     mdsmap e55: 1/1/1 up {0=1=up:active}
     osdmap e1290: 20 osds: 20 up, 20 in
      pgmap v7180: 1000 pgs, 2 pools, 14925 MB data, 3851 objects
            37827 MB used, 20837 GB / 21991 GB avail
                1000 active+clean

On Fri, 25 Mar 2016 at 16:44 Christian Balzer <ch...@gol.com> wrote:
>
> Hello,
>
> On Fri, 25 Mar 2016 08:11:27 + Zhang Qiang wrote:
>
> > Hi all,
> >
> > According to fio,
> Exact fio command please.
>
> > with 4k block size, the sequential write performance of
> > my ceph-fuse mount
> Exact mount options, ceph config (RBD cache) please.
>
> > is just about 20+ M/s, only 200 Mb of 1 Gb full
> > duplex NIC outgoing bandwidth was used for maximum. But for 1M block
> > size the performance could achieve as high as 1000 M/s, approaching the
> > limit of the NIC bandwidth. Why do the performance stats differ so much
> > for different block sizes?
> That's exactly why.
> You can see that with locally attached storage as well, many small requests
> are slower than large (essentially sequential) writes.
> Network attached storage in general (latency) and thus Ceph as well (plus
> code overhead) amplify that.
>
> > Can I configure ceph-fuse mount's block size
> > for maximum performance?
>
> Very little to do with that if you're using sync writes (thus the fio
> command line please), if not RBD cache could/should help.
>
> Christian
>
> > Basic information about the cluster: 20 OSDs on separate PCIe hard disks
> > distributed across 2 servers, each with write performance about 300 M/s;
> > 5 MONs; 1 MDS. Ceph version 0.94.6
> > (e832001feaf8c176593e0325c8298e3f16dfb403).
> >
> > Thanks :)
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
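Christian's point, that per-request latency rather than bandwidth bounds small-block throughput, can be sketched numerically: with a fixed effective latency per write and a bounded queue depth, throughput scales linearly with block size until the NIC saturates. The helper below is purely illustrative; the 1 ms latency is an assumed round-trip figure, not a measurement from this thread.

```python
def max_throughput_mb_s(block_size_kb, per_op_latency_ms, queue_depth):
    """Upper bound on throughput: completed ops/s times bytes per op."""
    ops_per_sec = queue_depth * 1000.0 / per_op_latency_ms
    return ops_per_sec * block_size_kb / 1024.0

# Assumed ~1 ms effective latency per write, iodepth=16 (from the fio job).
for bs_kb in (4, 1024):
    print(bs_kb, max_throughput_mb_s(bs_kb, 1.0, 16))  # 4 -> 62.5, 1024 -> 16000.0
```

With these assumed numbers, 4k blocks cap out at 62.5 MB/s while 1M blocks could in principle push 256 times more, so in practice the large-block run is limited by the NIC (or page cache) long before latency matters.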
[ceph-users] Ceph-fuse huge performance gap between different block sizes
Hi all,

According to fio, with 4k block size, the sequential write performance of my
ceph-fuse mount is just about 20+ M/s; only 200 Mb of the 1 Gb full duplex
NIC's outgoing bandwidth was used at maximum. But with 1M block size the
performance could reach as high as 1000 M/s, approaching the limit of the
NIC bandwidth. Why do the performance stats differ so much for different
block sizes? Can I configure the ceph-fuse mount's block size for maximum
performance?

Basic information about the cluster: 20 OSDs on separate PCIe hard disks
distributed across 2 servers, each with write performance of about 300 M/s;
5 MONs; 1 MDS. Ceph version 0.94.6
(e832001feaf8c176593e0325c8298e3f16dfb403).

Thanks :)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Need help for PG problem
I adjusted the crush map, everything's OK now. Thanks for your help!

On Wed, 23 Mar 2016 at 23:13 Matt Conner <matt.con...@keepertech.com> wrote:
> Hi Zhang,
>
> In a 2 copy pool, each placement group is spread across 2 OSDs - that is
> why you see such a high number of placement groups per OSD. There is a PG
> calculator at http://ceph.com/pgcalc/. Based on your setup, it may be
> worth using 2048 instead of 4096.
>
> As for stuck/degraded PGs, most are reporting as being on osd.0. Looking
> at your OSD tree, you somehow have 21 OSDs being reported, with 2 being
> labeled as osd.0, both up and in. I'd recommend trying to get rid of the
> one listed on host 148_96 and see if it clears the issues.
>
> On Tue, Mar 22, 2016 at 6:28 AM, Zhang Qiang <dotslash...@gmail.com> wrote:
>
>> Hi Reddy,
>> It's over a thousand lines, I pasted it on gist:
>> https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4
>>
>> [earlier messages in the thread, quoted verbatim above, trimmed]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
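Matt's explanation of why the per-OSD count was so high can be made concrete: every one of the pool's pg_num placement groups occupies `size` OSDs, so a rough per-OSD figure is pg_num × size / number of OSDs. A small sketch (the function name is mine; the real warning reported 382 rather than the idealized 409.6 because actual placement is never perfectly even):

```python
def pgs_per_osd(pg_num, pool_size, num_osds):
    """Idealized PGs per OSD: each PG lands on `pool_size` distinct OSDs."""
    return pg_num * pool_size / num_osds

print(pgs_per_osd(4096, 2, 20))  # ~410, above the 300 warning threshold
print(pgs_per_osd(2048, 2, 20))  # ~205, comfortably below it
```

This shows why Matt's suggestion of 2048 instead of 4096 clears the "too many PGs per OSD" warning for a 20-OSD, size-2 pool.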
[ceph-users] yum installed jewel doesn't provide systemd scripts
I installed jewel el7 via yum on CentOS 7.1, but it seems no systemd scripts
are available. I do find there's a folder named 'systemd' in the source, so
maybe it was forgotten when building the package?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] All OSDs are down with addr :/0
Hi,

I need help deploying jewel OSDs on CentOS 7. Following the guide, I have
successfully run OSD daemons but all of them are down according to `ceph -s`:

    15/15 in osds are down

No errors in /var/log/ceph/ceph-osd.1.log, it just stopped at these lines and
never made progress:

2016-05-09 01:32:03.187802 7f35acb4a700 0 osd.0 100 crush map has features 2200130813952, adjusting msgr requires for clients
2016-05-09 01:32:03.187841 7f35acb4a700 0 osd.0 100 crush map has features 2200130813952 was 2199057080833, adjusting msgr requires for mons
2016-05-09 01:32:03.187859 7f35acb4a700 0 osd.0 100 crush map has features 2200130813952, adjusting msgr requires for osds

ceph health detail shows:

    osd.0 is down since epoch 0, last address :/0

Why is the address :/0? Am I configuring it wrong? I've followed the OSD
troubleshooting guide but with no luck. And the network seems good, since the
ports are telnet-able, and I can do ceph -s on the OSD machine.

ceph.conf:

[global]
fsid = fad5f8d4-f5f6-425d-b035-a018614c0664
mon osd full ratio = .75
mon osd nearfull ratio = .65
auth cluster required = cephx
auth service requried = cephx
auth client required = cephx
mon initial members = mon_vm_1,mon_vm_2,mon_vm_3
mon host = 10.3.1.94,10.3.1.95,10.3.1.96

[mon.a]
host = mon_vm_1
mon addr = 10.3.1.94

[mon.b]
host = mon_vm_2
mon addr = 10.3.1.95

[mon.c]
host = mon_vm_3
mon addr = 10.3.1.96

[osd]
osd journal size = 10240
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 512
osd pool default pgp num = 512
osd crush chooseleaf type = 1
osd journal = /ceph_journal/$id

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.94 OSD crashes
Thanks Wang, looks like so, not Ceph to blame :)

On 25 October 2016 at 09:59, Haomai Wang <hao...@xsky.com> wrote:
> could you check dmesg? I think there exists a disk EIO error
>
> On Tue, Oct 25, 2016 at 9:58 AM, Zhang Qiang <dotslash...@gmail.com> wrote:
>
>> Hi,
>>
>> One of several OSDs on the same machine crashed several times within
>> days. It's always that one, other OSDs are all fine. Below is the dumped
>> message, since it's too long here, I only pasted the head and tail of the
>> recent events. If it's necessary to inspect the full log, please see
>> https://gist.github.com/dotSlashLu/3e8ca9491fbf07636a4583244ac23f80.
>>
>> [quoted stack trace and recent-events dump trimmed]
[ceph-users] v0.94 OSD crashes
Hi,

One of several OSDs on the same machine crashed several times within days.
It's always that one, other OSDs are all fine. Below is the dumped message;
since it's too long, I only pasted the head and tail of the recent events. If
it's necessary to inspect the full log, please see
https://gist.github.com/dotSlashLu/3e8ca9491fbf07636a4583244ac23f80.

2016-10-24 18:52:06.216341 7f307c22f700 -1 os/FileStore.cc: In function 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t, size_t, ceph::bufferlist&, uint32_t, bool)' thread 7f307c22f700 time 2016-10-24 18:52:06.213123
os/FileStore.cc: 2854: FAILED assert(allow_eio || !m_filestore_fail_eio || got != -5)

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc9195]
2: (FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0xc94) [0x909f34]
3: (ReplicatedBackend::be_deep_scrub(hobject_t const&, unsigned int, ScrubMap::object&, ThreadPool::TPHandle&)+0x311) [0x9fe0e1]
4: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x2e8) [0x8ce8c8]
5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x213) [0x7def53]
6: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x4c2) [0x7df722]
7: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbe) [0x6dcade]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa76) [0xbb9966]
9: (ThreadPool::WorkThread::entry()+0x10) [0xbba9f0]
10: (()+0x7dc5) [0x7f309cd26dc5]
11: (clone()+0x6d) [0x7f309b80821d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2016-10-24 18:51:34.341035 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.56:6821/4808 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x175a2c00 con 0x1526a940
-9999> 2016-10-24 18:51:34.341046 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.61:6817/4808 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x175a3600 con 0x15269fa0
-9998> 2016-10-24 18:51:34.341058 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.56:6823/5402 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x12aaa400 con 0x27bc9080
-9997> 2016-10-24 18:51:34.341069 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.61:6821/5402 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x1f89ec00 con 0x27bc91e0
-9996> 2016-10-24 18:51:34.341080 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.56:6824/6216 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0xaa16000 con 0x175b0c00
-9995> 2016-10-24 18:51:34.341090 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.61:6818/6216 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x23b87800 con 0x175ae160
-9994> 2016-10-24 18:51:34.341101 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.57:6802/23367 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x258ed400 con 0x17500d60
-9993> 2016-10-24 18:51:34.341113 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.62:6806/23367 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x242bb000 con 0x175019c0
-9992> 2016-10-24 18:51:34.341128 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.57:6805/25009 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x28e41c00 con 0x1744aec0
-9991> 2016-10-24 18:51:34.341139 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.62:6805/25009 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x10be5200 con 0x175bf8c0
-9990> 2016-10-24 18:51:34.341130 7f3088a48700 1 -- 10.3.149.62:0/25857 <== osd.1 10.3.149.55:6835/2010188 187557 osd_ping(ping_reply e3014 stamp 2016-10-24 18:51:34.340550) v2 47+0+0 (1550182756 0 0) 0x1a83bc00 con 0x7874580
-9989> 2016-10-24 18:51:34.341151 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.57:6814/26469 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x1f48aa00 con 0x175bfa20
-9988> 2016-10-24 18:51:34.341162 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.62:6811/26469 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x24456e00 con 0x175bfb80
-9987> 2016-10-24 18:51:34.341174 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.58:6805/2023199 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x25c59e00 con 0x7874f20
-9986> 2016-10-24 18:51:34.341186 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.63:6805/2023199 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x19703c00 con 0x7875760
-9985> 2016-10-24 18:51:34.341208 7f307b22d700 1 -- 10.3.149.62:0/25857 --> 10.3.149.58:6803/2023356 -- osd_ping(ping e3014 stamp 2016-10-24 18:51:34.340550) v2 -- ?+0 0x19702600 con 0x26444940
[ceph-users] Behavior of ceph-fuse when network is down
Hi all,

To observe what happens to a ceph-fuse mount when the network is down, we
blocked network connections to all three monitors with iptables. If we
restore the network immediately (within minutes), the blocked I/O requests
are restored and everything goes back to normal.

But if we keep blocking long enough, say twenty minutes, ceph-fuse will not
be able to recover. The ceph-fuse process is still there, but it will not be
able to handle I/O operations; df or ls will hang indefinitely.

What is the retry policy of ceph-fuse? Is it normal for ceph-fuse to hang
after the network blocking? If so, how can I make it return to normal after
the network is recovered? If it is not normal, what might be the cause? How
can I help to debug this?

Thanks.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Behavior of ceph-fuse when network is down
Thanks! I'll check it out.

On 24 November 2017 at 17:58, "Yan, Zheng" <uker...@gmail.com> wrote:
> On Fri, Nov 24, 2017 at 4:59 PM, Zhang Qiang <dotslash...@gmail.com> wrote:
> > Hi all,
> >
> > To observe what will happen to a ceph-fuse mount if the network is down,
> > we blocked network connections to all three monitors by iptables. If we
> > restore the network immediately (within minutes), the blocked I/O
> > requests will be restored, everything will be back to normal.
> >
> > But if we continue to block it long enough, say twenty minutes,
> > ceph-fuse will not be able to recover. The ceph-fuse process is still
> > there, but will not be able to handle I/O operations, df or ls will
> > hang indefinitely.
> >
> > What is the retry policy of ceph-fuse? Is it normal for ceph-fuse to
> > hang after the network blocking? If so, how can I make it restore to
> > normal after the network is recovered? If it is not normal, what might
> > be the cause? How can I help to debug this?
>
> you can use the 'kick_stale_sessions' ASOK command to make ceph-fuse
> reconnect, or set the 'client_reconnect_stale' config option to true.
> Besides, you need to set the mds config option
> 'mds_session_blacklist_on_timeout' to false.
>
> > Thanks.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
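Based on Yan's reply, the settings could look like this in ceph.conf. This is only a sketch: the option names come from his message, but the values and section placement are my assumption.

```ini
# Client side: let ceph-fuse reconnect a session the MDS marked stale
[client]
client_reconnect_stale = true

# MDS side: don't blacklist clients whose session timed out,
# otherwise the reconnect attempt will be rejected
[mds]
mds_session_blacklist_on_timeout = false
```

The one-off alternative he mentions is issuing the kick_stale_sessions command through the client's admin socket instead of changing the config.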
[ceph-users] ceph-volume created filestore journal bad header magic
Hi all,

I'm new to Luminous. When I use ceph-volume create to add a new filestore OSD, it tells me that the journal's header magic is not good. But the journal device is a new LV. How do I make it write the new OSD's header to the journal?

This error does not seem to affect the creation or start of the OSD, but it complains about the bad header magic in the log every time it boots:

journal _open /var/lib/ceph/osd/ceph-1/journal fd 30: 21474836480 bytes, block size 4096 bytes, directio = 1, aio = 1
journal do_read_entry(3922624512): bad header magic
journal do_read_entry(3922624512): bad header magic
journal _open /var/lib/ceph/osd/ceph-1/journal fd 30: 21474836480 bytes, block size 4096 bytes, directio = 1, aio = 1

Should I care about this? Is the OSD using the journal with the bad header magic normally?
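(No authoritative answer appears in the thread, but for what it's worth: on a freshly created FileStore journal, `do_read_entry ... bad header magic` at startup is commonly just the OSD finding no valid entries to replay, which is harmless. If you want to rewrite the journal header explicitly anyway, a sketch follows; OSD id 1 is assumed from the log path, the OSD must be stopped, and the flush step matters so journaled writes are not lost.)

```shell
# Assumed OSD id 1 (from /var/lib/ceph/osd/ceph-1) -- adjust to yours.
systemctl stop ceph-osd@1

# Flush any pending journal entries into the object store first...
ceph-osd -i 1 --flush-journal

# ...then write a fresh journal, header included, on the same device.
ceph-osd -i 1 --mkjournal

systemctl start ceph-osd@1
```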
[ceph-users] FS Reclaims storage too slow
Hi,

Is it normal that I deleted files from CephFS and Ceph still hadn't deleted the backing objects a day later? Only when I restarted the MDS daemon did it start to release the storage space.

I noticed the doc (http://docs.ceph.com/docs/mimic/dev/delayed-delete/) says the file is marked as deleted on the MDS and deleted lazily. What is the condition that triggers deletion of the backing objects? If a delay this long is normal, is there any way to make it faster? The cluster is near full.

I'm using Jewel 10.2.3 for both ceph-fuse and the MDS.

Thanks.
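(Not from a reply, but possibly relevant: in Jewel the MDS purges deleted files from its stray directory at a throttled rate. A sketch of inspecting and loosening those throttles via the admin socket; the daemon name `mds.node01`, the chosen values, and the availability of these option names in 10.2.3 are assumptions — check with `config show` first.)

```shell
# See the current purge throttles (option names assumed; verify they exist).
ceph daemon mds.node01 config show | grep purge

# Allow more concurrent file purges and purge ops so space from deleted
# files is reclaimed faster. Changes via the socket do not survive restart.
ceph daemon mds.node01 config set mds_max_purge_files 256
ceph daemon mds.node01 config set mds_max_purge_ops 32768
```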
[ceph-users] Can't get MDS running after a power outage
Hi,

Ceph version 10.2.3. After a power outage I tried to start the MDS daemons, but they got stuck replaying journals forever. I had no idea why they were taking that long, because this is just a small cluster for testing purposes with only a few hundred MB of data. I restarted them and hit the error below. Any chance I can restore them?

Mar 28 14:20:30 node01 systemd: Started Ceph metadata server daemon.
Mar 28 14:20:30 node01 systemd: Starting Ceph metadata server daemon...
Mar 28 14:20:30 node01 ceph-mds: 2018-03-28 14:20:30.796255 7f0150c8c180 -1 deprecation warning: MDS id 'mds.0' is invalid and will be forbidden in a future version. MDS names may not start with a numeric digit.
Mar 28 14:20:30 node01 ceph-mds: starting mds.0 at :/0
Mar 28 14:20:30 node01 ceph-mds: ./mds/MDSMap.h: In function 'const entity_inst_t MDSMap::get_inst(mds_rank_t)' thread 7f014ac6c700 time 2018-03-28 14:20:30.942480
Mar 28 14:20:30 node01 ceph-mds: ./mds/MDSMap.h: 582: FAILED assert(up.count(m))
Mar 28 14:20:30 node01 ceph-mds: ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
Mar 28 14:20:30 node01 ceph-mds: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f01512aba45]
Mar 28 14:20:30 node01 ceph-mds: 2: (MDSMap::get_inst(int)+0x20f) [0x7f0150ee5e3f]
Mar 28 14:20:30 node01 ceph-mds: 3: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x7b9) [0x7f0150ed6e49]
Mar 28 14:20:30 node01 ceph-mds: 4: (MDSDaemon::handle_mds_map(MMDSMap*)+0xe3d) [0x7f0150eb396d]
Mar 28 14:20:30 node01 ceph-mds: 5: (MDSDaemon::handle_core_message(Message*)+0x7b3) [0x7f0150eb4eb3]
Mar 28 14:20:30 node01 ceph-mds: 6: (MDSDaemon::ms_dispatch(Message*)+0xdb) [0x7f0150eb514b]
Mar 28 14:20:30 node01 ceph-mds: 7: (DispatchQueue::entry()+0x78a) [0x7f01513ad4aa]
Mar 28 14:20:30 node01 ceph-mds: 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f015129098d]
Mar 28 14:20:30 node01 ceph-mds: 9: (()+0x7dc5) [0x7f0150095dc5]
Mar 28 14:20:30 node01 ceph-mds: 10: (clone()+0x6d) [0x7f014eb61ced]
Mar 28 14:20:30 node01 ceph-mds: NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
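(A couple of things one might try here — not from a reply, and the journal commands below can lose recent metadata, so take backups first. The log itself warns that the MDS id 'mds.0' is invalid, so renaming the daemon to something non-numeric is worth trying before anything invasive. If the journal was damaged by the outage, Jewel ships cephfs-journal-tool for inspection and, as a last resort, recovery; a sketch, run with the MDS daemons stopped:)

```shell
# Check whether the MDS journal is readable and consistent.
cephfs-journal-tool journal inspect

# Always keep a backup before any destructive step.
cephfs-journal-tool journal export backup.bin

# Last resort: salvage recoverable dentries into the metadata store,
# then reset the journal. This can lose recent metadata updates.
cephfs-journal-tool event recover_dentries summary
cephfs-journal-tool journal reset
```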
[ceph-users] ceph-fuse segfaults
Hi,

I'm using ceph-fuse 10.2.3 on CentOS 7.3.1611. ceph-fuse always segfaults after running for some time.

*** Caught signal (Segmentation fault) **
in thread 7f455d832700 thread_name:ceph-fuse
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
1: (()+0x2a442a) [0x7f457208e42a]
2: (()+0xf5e0) [0x7f4570b895e0]
3: (Client::get_root_ino()+0x10) [0x7f4571f86a20]
4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x18d) [0x7f4571f844bd]
5: (()+0x19ae21) [0x7f4571f84e21]
6: (()+0x164b5) [0x7f457199e4b5]
7: (()+0x16bdb) [0x7f457199ebdb]
8: (()+0x13471) [0x7f457199b471]
9: (()+0x7e25) [0x7f4570b81e25]
10: (clone()+0x6d) [0x7f456fa6934d]

Detailed events dump:
https://drive.google.com/file/d/0B_4ESJRu7BZFcHZmdkYtVG5CTGQ3UVFod0NxQloxS0ZCZmQ0/view?usp=sharing
Let me know if more info is needed.

Thanks.
Re: [ceph-users] ceph-fuse segfaults
Thanks Patrick, I should have checked the tracker first. I'll try the kernel client and an upgrade to see whether that resolves it.

On 2 April 2018 at 22:29, Patrick Donnelly <pdonn...@redhat.com> wrote:
> Probably fixed by this: http://tracker.ceph.com/issues/17206
>
> You need to upgrade your version of ceph-fuse.
>
> On Mon, Apr 2, 2018 at 12:56 AM, Zhang Qiang <dotslash...@gmail.com> wrote:
>> Hi,
>>
>> I'm using ceph-fuse 10.2.3 on CentOS 7.3.1611. ceph-fuse always
>> segfaults after running for some time.
>>
>> *** Caught signal (Segmentation fault) **
>> in thread 7f455d832700 thread_name:ceph-fuse
>> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>> 1: (()+0x2a442a) [0x7f457208e42a]
>> 2: (()+0xf5e0) [0x7f4570b895e0]
>> 3: (Client::get_root_ino()+0x10) [0x7f4571f86a20]
>> 4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x18d)
>> [0x7f4571f844bd]
>> 5: (()+0x19ae21) [0x7f4571f84e21]
>> 6: (()+0x164b5) [0x7f457199e4b5]
>> 7: (()+0x16bdb) [0x7f457199ebdb]
>> 8: (()+0x13471) [0x7f457199b471]
>> 9: (()+0x7e25) [0x7f4570b81e25]
>> 10: (clone()+0x6d) [0x7f456fa6934d]
>>
>> Detailed events dump:
>> https://drive.google.com/file/d/0B_4ESJRu7BZFcHZmdkYtVG5CTGQ3UVFod0NxQloxS0ZCZmQ0/view?usp=sharing
>> Let me know if more info is needed.
>>
>> Thanks.
>
> --
> Patrick Donnelly