[ceph-users] Safe to delete data, metadata pools?
Hi Everyone,

I've got a couple of pools that I don't believe are being used but have a reasonably large number of PGs (approx 50% of our total PGs). I'd like to delete them, but as they were pre-existing when I inherited the cluster, I wanted to make sure they aren't needed for anything first. Here are the details:

POOLS:
    NAME      ID  USED  %USED  MAX AVAIL  OBJECTS
    data      0   0     0      88037G     0
    metadata  1   0     0      88037G     0

We don't run CephFS and I believe these are meant for that, but they may have been created by default when the cluster was set up (back on dumpling or bobtail, I think). As far as I can tell there is no data in them. Do they need to exist for some Ceph function? The pool names worry me a little, as they sound important. They have 3136 PGs each, so I'd like to be rid of them so I can increase the number of PGs in my actual data pools without going over the 300-PGs-per-OSD limit.

Here's the osd dump:

pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 3136 pgp_num 3136 last_change 1 crash_replay_interval 45 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 3136 pgp_num 3136 last_change 1 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0

Also, what performance impact am I likely to see when Ceph removes the empty PGs, considering it's approx 50% of my total PGs on my 180 OSDs?

Thanks,
Rich
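For anyone finding this in the archives later, a minimal sketch of how pools like these could be checked and removed. The pool names come from the ceph df output above; the mon_allow_pool_delete guard only applies on Luminous and later, older releases just need the --yes-i-really-really-mean-it flag:

```
# Confirm the pools really are empty first
rados -p data ls | head
rados -p metadata ls | head
rados df

# On Luminous and later, pool deletion is blocked unless the monitors allow it
ceph tell mon.* injectargs '--mon-allow-pool-delete=true'

# Delete the pools (the name is repeated on purpose as a safety check)
ceph osd pool delete data data --yes-i-really-really-mean-it
ceph osd pool delete metadata metadata --yes-i-really-really-mean-it
```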
Re: [ceph-users] Ceph on Public IP
I think what you're looking for is the public bind addr option.
[ceph-users] Ceph on Public IP
Hi all,

I am installing Ceph using ceph-deploy. I am installing it on 4 VMs which have private IP addresses with public IPs NAT-ted to them. But upon installation, even after adding public network = 0.0.0.0/0, it still listens on the private IP. I tried doing the steps mentioned in http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/ but I still face the issue. Any directions in this regard will be helpful.

Thanks & Regards,
Nitish B.
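A rough sketch of the kind of settings the network-config-ref page is talking about, purely illustrative (the subnet and address below are placeholders, not from the post). Note that a daemon can only bind to an address that actually exists on the VM's interfaces, which is why 0.0.0.0/0 with NAT doesn't have the expected effect; advertising a different externally visible address is what the public bind addr option mentioned in the reply above is aimed at:

```
# Illustrative only - replace the subnet/address with ones that exist on the VM
cat >> /etc/ceph/ceph.conf <<'EOF'
[global]
public network = 10.0.0.0/24

[mon.node1]
public addr = 10.0.0.11
EOF
# restart the monitor for the change to take effect
systemctl restart ceph-mon.target
```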
Re: [ceph-users] Reduced data availability: 4 pgs inactive, 4 pgs incomplete
Unfortunately, I don't see that setting documented anywhere other than the release notes. It's hard to find guidance for questions in that case, but luckily you noted it in your blog post. I wish I knew what value to set it to. I did use the deprecated one after moving to Hammer a while back due to the mis-calculated PGs. I have now set that setting, but used 0 as the value, which cleared the error in the status, but the stuck incomplete PGs persist. I restarted all daemons, so it should be in full effect. Interestingly enough, it added the hdd in the ceph osd tree output... Anyhow, I know this is a dirty cluster due to this mis-calculation; I would like to fix the cluster if possible (both the stuck/incomplete PGs and the underlying too-many-PGs issue).

Thanks for the information!
-Brent

-----Original Message-----
From: Jens-U. Mozdzen [mailto:jmozd...@nde.ag]
Sent: Sunday, January 7, 2018 1:23 PM
To: bkenn...@cfl.rr.com
Cc: ste...@bit.nl
Subject: Fwd: [ceph-users] Reduced data availability: 4 pgs inactive, 4 pgs incomplete

Hi Brent,

sorry, the quoting style had me confused, this actually was targeted at your question, I believe. @Stefan: Sorry for the noise

Regards,
Jens

----- Forwarded message from "Jens-U. Mozdzen" -----
Date: Sun, 07 Jan 2018 18:18:00 +
From: "Jens-U. Mozdzen"
Subject: Re: [ceph-users] Reduced data availability: 4 pgs inactive, 4 pgs incomplete
To: ste...@bit.nl

Hi Stefan,

I'm in a bit of a hurry, so just a short offline note:

>>> Quoting Brent Kennedy (bkenn...@cfl.rr.com):
>>> Unfortunately, this cluster was setup before the calculator was in
>>> place and when the equation was not well understood. We have the
>>> storage space to move the pools and recreate them, which was
>>> apparently the only way to handle the issue (you are suggesting what
>>> appears to be a different approach). I was hoping to avoid doing
>>> all of this because the migration would be very time consuming.
>>> There is no way to fix the stuck PGs though? If I were to expand
>>> the replication to 3 instances, would that help with the PGs per OSD
>>> issue any?
>> No! It will make the problem worse because you need PGs to host these
>> copies. The more replicas, the more PGs you need.
> Guess I am confused here, wouldn't it spread out the existing data to
> more PGs? Or are you saying that it couldn't spread out because the
> PGs are already in use? Previously it was set to 3 and we reduced it
> to 2 because of failures.

If you increase the replication size, you'll ask RADOS to create additional copies in additional PGs. So the number of PGs will increase...

>>> When you say enforce, do you mean it will block all access to the
>>> cluster/OSDs?
>> No, [...]

My experience differs: if your cluster already has too many PGs per OSD (before upgrading to 12.2.2) and anything PG-per-OSD-related changes (i.e. re-distributing PGs when OSDs go down), any access to the over-sized OSDs *will block*. Cost me a number of days to figure out and was recently discussed by someone else on the ML.

Increase the according parameter ("mon_max_pg_per_osd") in the global section and restart your MONs, MGRs and OSDs (OSDs one by one, if you don't know your layout, to avoid data loss). Made me even create a blog entry, for future reference:

http://technik.blogs.nde.ag/2017/12/26/ceph-12-2-2-minor-update-major-trouble/

If this needs to go into more details, let's take it back to the mailing list, I'll be available again during the upcoming week.
Regards,
Jens

----- End of forwarded message -----
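For the archives, a hedged sketch of the workaround Jens describes. The value 400 is only an example; pick something above your cluster's actual PGs-per-OSD ratio, push the conf to every node, and restart OSDs one at a time:

```
# Example value only - must exceed your real PGs-per-OSD ratio
cat >> /etc/ceph/ceph.conf <<'EOF'
[global]
mon max pg per osd = 400
EOF

# Restart MONs and MGRs, then the OSDs one by one, waiting for HEALTH_OK in between
systemctl restart ceph-mon.target ceph-mgr.target
systemctl restart ceph-osd@<id>    # repeat per OSD
```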
Re: [ceph-users] ceph luminous - performance issue
Sorry for the delay. Here are the results when using bs=16k and rw=write (note: I am running the command directly on an OSD host as root):

fio /home/cephuser/write.fio
write-4M: (g=0): rw=write, bs=16K-16K/16K-16K/16K-16K, ioengine=rbd, iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/172.6MB/0KB /s] [0/11.5K/0 iops] [eta 00m:00s]

Here are the results when running with bs=4k and rw=randwrite:

[root@osd03 ~]# fio /home/cephuser/write.fio
write-4M: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=0): [w(1)] [100.0% done] [0KB/54056KB/0KB /s] [0/13.6K/0 iops] [eta 00m:00s]

On 3 January 2018 at 15:28, wrote:
> Hi Steven,
>
> interesting... I'm quite curious after your post now.
>
> I've migrated our prod. CEPH cluster to 12.2.2 and Bluestore just today
> and haven't heard back anything "bad" from the applications/users so far.
> Performance tests on our test cluster were good before, but we use S3/RGW
> only anyhow ;)
>
> There are two things I would like to know/learn... could you try/test and
> feed back?!
>
> - change all your tests to use >=16k block size, see also the BlueStore comments
>   here (https://www.mail-archive.com/ceph-users@lists.ceph.com/msg43023.html)
> - change your "write.fio" file profile from "rw=randwrite" to "rw=write"
>   (or something similar :O ) to compare apples with apples ;)
>
> Thanks for your efforts and looking forward to those results ;)
>
> best regards
> Notna
>
> --
>
> Sent: Wednesday, 03 January 2018 at 16:20
> From: "Steven Vacaroaia"
> To: "Brady Deetz"
> Cc: ceph-users
> Subject: Re: [ceph-users] ceph luminous - performance issue
>
> Thanks for your willingness to help
>
> DELL R620, 1 CPU, 8 cores, 64 GB RAM
> cluster network is using 2 bonded 10 GB NICs (mode=4), MTU=9000
>
> SSD drives are Enterprise grade - 400 GB SSD Toshiba PX04SHB040
> HDD drives are - 10k RPM, 600 GB Toshiba AL13SEB600
>
> Steven
>
>
> On 3 January 2018 at 09:41, Brady Deetz <z...@gmail.com> wrote:
> Can you provide more detail regarding the infrastructure backing this
> environment? What hard drive, ssd, and processor are you using? Also, what
> is providing networking?
>
> I'm seeing 4k blocksize tests here. Latency is going to destroy you.
> On Jan 3, 2018 8:11 AM, "Steven Vacaroaia" <7...@gmail.com> wrote:
>
> Hi,
>
> I am doing a PoC with 3 DELL R620 and 12 OSD, 3 SSD drives (one on each
> server), bluestore
>
> I configured the OSD using the following (/dev/sda is my SSD drive):
> ceph-disk prepare --zap-disk --cluster ceph --bluestore /dev/sde
>   --block.wal /dev/sda --block.db /dev/sda
>
> Unfortunately both fio and bench tests show much worse performance for the
> pools than for the individual disks
>
> Example:
>
> DISKS
> fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k
>   --numjobs=14 --iodepth=1 --runtime=60 --time_based --group_reporting
>   --name=journal-test
>
> SSD drive
> Jobs: 14 (f=14): [W(14)] [100.0% done] [0KB/465.2MB/0KB /s] [0/119K/0 iops] [eta 00m:00s]
>
> HD drive
> Jobs: 14 (f=14): [W(14)] [100.0% done] [0KB/179.2MB/0KB /s] [0/45.9K/0 iops] [eta 00m:00s]
>
> POOL
>
> fio write.fio
> Jobs: 1 (f=0): [w(1)] [100.0% done] [0KB/51428KB/0KB /s] [0/12.9K/0 iops]
>
> cat write.fio
> [write-4M]
> description="write test with 4k block"
> ioengine=rbd
> clientname=admin
> pool=scbench
> rbdname=image01
> iodepth=32
> runtime=120
> rw=randwrite
> bs=4k
>
> rados bench -p scbench 12 write
>
> Max bandwidth (MB/sec): 224
> Min bandwidth (MB/sec): 0
> Average IOPS: 26
> Stddev IOPS: 24
> Max IOPS: 56
> Min IOPS: 0
> Average Latency(s): 0.59819
> Stddev Latency(s): 1.64017
> Max latency(s): 10.8335
> Min latency(s): 0.00475139
>
> I must be missing something - any help/suggestions will be greatly
> appreciated
>
> Here is some specific info:
>
> ceph -s
>   cluster:
>     id: 91118dde-f231-4e54-a5f0-a1037f3d5142
>     health: HEALTH_OK
>
>   services:
>     mon: 1 daemons, quorum mon01
>     mgr: mon01(active)
>     osd: 12 osds: 12 up, 12 in
>
>   data:
>     pools: 4 pools, 484 pgs
>     objects: 70082 objects, 273 GB
>     usage: 570 GB used, 6138 GB / 6708 GB avail
>     pgs: 484 active+clean
>
>   io:
>     client: 2558 B/s rd, 2 op/s rd, 0 op/s wr
>
> ceph osd pool ls detail
> pool 1 'test-replicated' replicated size 2 min_size 1 crush_rule 0
>   object_hash rjenkins pg_num 128 pgp_num 128 last_change 157 flags
>   hashpspool stripe_width 0 application rbd
>   removed_snaps [1~3]
> pool 2 'test-erasure' erasure size 3 min_size 3 crush_rule 1
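Since the thread is partly about comparing like with like (raw-device fio vs. the rbd engine, randwrite vs. write), here is a rough sketch of how the same rbd-engine job could be swept over several block sizes. The pool and image names are taken from the write.fio shown above; everything else is illustrative:

```
# Run the same rbd-engine workload at several block sizes for a like-for-like comparison
for bs in 4k 16k 64k 4m; do
  fio --name=rbd-write-$bs --ioengine=rbd --clientname=admin \
      --pool=scbench --rbdname=image01 --iodepth=32 \
      --rw=write --bs=$bs --runtime=60 --time_based --group_reporting
done
```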
[ceph-users] Bluestore migration disaster - incomplete pgs recovery process and progress (in progress)
Below is the status of my disastrous self-inflicted journey. I will preface this by admitting this could not have been prevented by software attempting to keep me from being stupid.

I have a production cluster with over 350 XFS-backed OSDs running Luminous. We want to transition the cluster to Bluestore for the purpose of enabling EC for CephFS. We are currently at 75+% utilization and EC coding could really help us reclaim some much needed capacity. Formatting 1 OSD at a time and waiting on the cluster to backfill for every disk was going to take a very long time (based on our observations, an estimated 240+ days). Formatting an entire host at once caused a little too much turbulence in the cluster. Furthermore, we could start the transition to EC if we had enough hosts with enough disks running Bluestore, before the entire cluster was migrated. As such, I decided to parallelize.

The general idea is that we could format any OSD that didn't have anything other than active+clean PGs associated. I maintain that this method should work. But, something went terribly wrong with the script and somehow we formatted disks in a manner that brought PGs into an incomplete state. It's now pretty obvious that the affected PGs were backfilling to other OSDs when the script clobbered the last remaining good set of objects. This cluster serves CephFS and a few RBD volumes.

Mailing list submissions related to this outage:
- cephfs-data-scan pg_files errors
- finding and manually recovering objects in bluestore
- Determine cephfs paths and rados objects affected by incomplete pg

Our recovery:

1) We allowed the cluster to repair itself as much as possible.

2) Following self-healing we were left with 3 PGs incomplete. 2 were in the cephfs data pool and 1 in an RBD pool.

3) Using ceph pg ${pgid} query, we found all disks known to have recently contained some of that PG's data.

4) For each OSD listed in the pg query, we exported the remaining PG data using
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osdid}/ --pgid ${pgid} --op export --file /media/ceph_recovery/ceph-${osdid}/recover.${pgid}

5) After having all of the possible exports we compared the recovery files and chose the largest. I would have appreciated the ability to do a merge of some sort on these exports, but we'll take what we can get. We're just going to assume the largest export was the most complete backfill at the time disaster struck.

6) We removed the nearly empty pg from the acting OSDs using
   ceph-objectstore-tool --op remove --data-path /var/lib/ceph/osd/ceph-${osdid} --pgid ${pgid}

7) We imported the largest export we had into the acting OSDs for the pg.

8) We marked the pg as complete using the following on the primary acting OSD:
   ceph-objectstore-tool --op mark-complete --data-path /var/lib/ceph/osd/ceph-${osdid}/ --pgid ${pgid}

9) We were convinced that it would be possible for multiple exports of the same partially backfilled PG to contain different objects. As such, we started reversing the format of the export file to extract the objects from the exports and compare them.

10) While our resident reverse engineer was hard at work, focus was shifted toward tooling for the purpose of identifying corrupt files, rbds, and appropriate actions for each.

10a) A list of all rados objects was dumped for our most valuable data (CephFS). Our first mechanism of detection is a skip in object sequence numbers.

10b) Because our metadata pool was unaffected by this mess, we are trusting that ls delivers correct file sizes even for corrupt files.
As such, we should be able to identify how many objects make up the file. If the count of objects for that file's inode is less than that, there's a problem. More than the calculated amount??? The world definitely explodes.

10c) Finally, the saddest check is if there are no objects in rados for that inode.

That's where we are right now. I'll update this thread as we get closer to recovery from backups and accepting data loss if necessary.

I will note that we wish there were some documentation on using ceph-objectstore-tool. We understand that it's for emergencies, but that's when concise documentation is most important. From what we've found, the only documentation seems to be --help and the source code.
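For anyone following the same path, a condensed sketch of the export/remove/import/mark-complete sequence described in steps 4-8. The pg id, osd id and recovery paths are placeholders; the affected OSDs must be stopped while ceph-objectstore-tool operates on their data paths, and the remove/import steps are destructive, so keep every export file around:

```
pgid=1.2f3      # placeholder
osdid=42        # placeholder - repeat per OSD that still holds a copy of the PG

systemctl stop ceph-osd@${osdid}

# 4) export whatever is left of the PG from each OSD that held it
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osdid} \
    --pgid ${pgid} --op export --file /media/ceph_recovery/ceph-${osdid}/recover.${pgid}

# 6) on the acting OSDs, remove the nearly-empty copy (newer builds may also want --force)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osdid} \
    --pgid ${pgid} --op remove

# 7) import the largest/most complete export back into the acting OSDs
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osdid} \
    --op import --file /media/ceph_recovery/ceph-${osdid}/recover.${pgid}

# 8) on the primary acting OSD only, mark the PG complete, then restart the OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osdid} \
    --pgid ${pgid} --op mark-complete
systemctl start ceph-osd@${osdid}
```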
Re: [ceph-users] permission denied, unable to bind socket
Ok, I fixed the address error. The service is able to start now. ceph -s hangs though.

gentooserver ~ # ceph -s
^CError EINTR: problem getting command descriptions from mon.

I'm not sure how to fix the permissions issue. /var/run/ceph is a temporary directory so I can't just chown it.

On Sat, Jan 6, 2018 at 7:40 PM, Nathan Dehnel wrote:
> So I'm following the guide at http://docs.ceph.com/docs/master/install/manual-deployment/
>
> ceph-mon@gentooserver.service fails.
>
> Jan 06 19:12:40 gentooserver systemd[1]: Started Ceph cluster monitor daemon.
> Jan 06 19:12:41 gentooserver ceph-mon[2674]: warning: unable to create /var/run/ceph: (13) Permission denied
> Jan 06 19:12:41 gentooserver ceph-mon[2674]: 2018-01-06 19:12:41.000507 7f3163008f80 -1 asok(0x563d0f97d2c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-mon.gentooserver.asok': (2) No such file or directory
> Jan 06 19:12:41 gentooserver ceph-mon[2674]: 2018-01-06 19:12:41.152781 7f3163008f80 -1 Processor -- bind unable to bind to [2001:1c:d64b:91c5:3a84:dfce:8546:998]:6789/0: (99) Cannot assign requested address
> Jan 06 19:12:41 gentooserver ceph-mon[2674]: 2018-01-06 19:12:41.152789 7f3163008f80 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
> Jan 06 19:12:46 gentooserver ceph-mon[2674]: 2018-01-06 19:12:46.152982 7f3163008f80 -1 Processor -- bind unable to bind to [2001:1c:d64b:91c5:3a84:dfce:8546:998]:6789/0: (99) Cannot assign requested address
> Jan 06 19:12:46 gentooserver ceph-mon[2674]: 2018-01-06 19:12:46.152996 7f3163008f80 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
> Jan 06 19:12:51 gentooserver ceph-mon[2674]: 2018-01-06 19:12:51.153218 7f3163008f80 -1 Processor -- bind unable to bind to [2001:1c:d64b:91c5:3a84:dfce:8546:998]:6789/0: (99) Cannot assign requested address
> Jan 06 19:12:51 gentooserver ceph-mon[2674]: 2018-01-06 19:12:51.153244 7f3163008f80 -1 Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
> Jan 06 19:12:51 gentooserver ceph-mon[2674]: 2018-01-06 19:12:51.153250 7f3163008f80 -1 unable to bind monitor to [2001:1c:d64b:91c5:3a84:dfce:8546:998]:6789/0
> Jan 06 19:12:51 gentooserver systemd[1]: ceph-mon@gentooserver.service: Main process exited, code=exited, status=1/FAILURE
> Jan 06 19:12:51 gentooserver systemd[1]: ceph-mon@gentooserver.service: Unit entered failed state.
> Jan 06 19:12:51 gentooserver systemd[1]: ceph-mon@gentooserver.service: Failed with result 'exit-code'.
> Jan 06 19:13:01 gentooserver systemd[1]: ceph-mon@gentooserver.service: Service hold-off time over, scheduling restart.
> Jan 06 19:13:01 gentooserver systemd[1]: Stopped Ceph cluster monitor daemon.
> Jan 06 19:13:01 gentooserver systemd[1]: ceph-mon@gentooserver.service: Start request repeated too quickly.
> Jan 06 19:13:01 gentooserver systemd[1]: Failed to start Ceph cluster monitor daemon.
> Jan 06 19:13:01 gentooserver systemd[1]: ceph-mon@gentooserver.service: Unit entered failed state.
> Jan 06 19:13:01 gentooserver systemd[1]: ceph-mon@gentooserver.service: Failed with result 'exit-code'.
>
> cat /etc/ceph/ceph.conf
> [global]
> fsid = a736559a-92d1-483e-9289-d2c7feed510f
> ms bind ipv6 = true
> mon initial members = gentooserver
>
> [mon.mona]
> host = gentooserver
> mon addr = [2001:1c:d64b:91c5:3a84:dfce:8546:9982]
>
> [osd]
> osd journal size = 1
>
> I'm not sure if the problem is the permissions error, or the IP address appearing to get truncated in the output, or both.
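A hedged sketch of one common way the /var/run/ceph permission problem is handled: /var/run is a tmpfs, so the directory has to be recreated at every boot with the right ownership rather than chown'd once. This assumes the ceph user and group exist and the daemons run as them; the bind-address mismatch in the log is a separate issue:

```
# Have systemd-tmpfiles recreate the runtime directory at boot, owned by the ceph user
echo 'd /run/ceph 0770 ceph ceph -' > /etc/tmpfiles.d/ceph.conf
systemd-tmpfiles --create /etc/tmpfiles.d/ceph.conf

# Verify, then retry the monitor
ls -ld /run/ceph
systemctl restart ceph-mon@gentooserver
```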
[ceph-users] "VolumeDriver.Create: Unable to create Ceph RBD Image"
Hi List,

I'm getting the following error when trying to run docker with a rbd volume (either pre-existing, or not):

"VolumeDriver.Create: Unable to create Ceph RBD Image"

Please could someone give me a clue as to how to debug this further and resolve it?

Details of my platform:
1. ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
2. Docker version 17.05.0-ce, build 89658be
3. rbd-docker-plugin --version 2.0.1
4. Kernel: Linux lol-server-049 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Here are the details from the rbd-docker logs and syslogs:

- Running docker with an as-yet-uncreated rbd volume, and rbd-docker-plugin with --create=true:
```
root@lol-server-045:~# docker run --volume-driver=rbd --volume dummy02:/mnt centos:latest bash
docker: Error response from daemon: create dummy02: VolumeDriver.Create: Unable to create Ceph RBD Image(dummy02): exit status 2.
See 'docker run --help'.
```

- With an already created rbd volume, and rbd-docker-plugin with --create=false:
```
root@lol-server-045:~# docker run --volume-driver=rbd --volume dummy01:/mnt centos:latest bash
docker: Error response from daemon: create dummy01: VolumeDriver.Create: Ceph RBD Image not found: dummy01.
```

- State of a pre-created rbd device:
```
root@lol-server-045:/var/log# rbd ls | egrep dummy
dummy01
root@lol-server-045:/var/log# rbd info dummy01
rbd image 'dummy01':
        size 1096 MB in 274 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.85d6238e1f29
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
```

BUG: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1578484
```
root@lol-server-045:/var/log# rbd feature disable foo exclusive-lock object-map fast-diff deep-flatten
rbd: error opening image foo: (2) No such file or directory
root@lol-server-045:/var/log# rbd feature disable dummy01 exclusive-lock object-map fast-diff deep-flatten
root@lol-server-045:/var/log# rbd map dummy01 --pool rbd
/dev/rbd3
```

- rbd-docker-plugin.log entry following restart of the rbd-docker driver service:
```
2018/01/07 23:45:20 main.go:121: INFO: Creating Docker VolumeDriver Handler
2018/01/07 23:45:20 main.go:125: INFO: Opening Socket for Docker to connect: /run/docker/plugins/rbd.sock
2018/01/07 23:45:29 main.go:141: INFO: received TERM or KILL signal: terminated
2018/01/07 23:45:29 main.go:190: INFO: closing log file
2018/01/07 23:45:29 main.go:91: INFO: starting rbd-docker-plugin version 2.0.1
2018/01/07 23:45:29 main.go:92: INFO: canCreateVolumes=true, removeAction="ignore"
2018/01/07 23:45:29 main.go:101: INFO: Setting up Ceph Driver for PluginID=rbd, cluster=, ceph-user=docker, pool=rbd, mount=/var/lib/docker-volumes, config=/etc/ceph/ceph.conf
2018/01/07 23:45:29 driver.go:85: INFO: newCephRBDVolumeDriver: setting base mount dir=/var/lib/docker-volumes/rbd
2018/01/07 23:45:29 main.go:121: INFO: Creating Docker VolumeDriver Handler
2018/01/07 23:45:29 main.go:125: INFO: Opening Socket for Docker to connect: /run/docker/plugins/rbd.sock
```

- When attempting to run a docker image, specifying a volume that does not yet exist:
```
root@lol-server-045:/var/log# docker run -u 0 --privileged -it --volume-driver rbd -v dummy02:/mnt:rw centos:latest bash
docker: Error response from daemon: create dummy02: VolumeDriver.Create: Unable to create Ceph RBD Image(dummy02): exit status 2.
```
- Log entry:
```
2018/01/07 23:45:29 driver.go:85: INFO: newCephRBDVolumeDriver: setting base mount dir=/var/lib/docker-volumes/rbd
2018/01/07 23:45:29 main.go:121: INFO: Creating Docker VolumeDriver Handler
2018/01/07 23:45:29 main.go:125: INFO: Opening Socket for Docker to connect: /run/docker/plugins/rbd.sock
2018/01/07 23:46:56 api.go:188: Entering go-plugins-helpers getPath
2018/01/07 23:46:56 driver.go:467: WARN: Image dummy02 does not exist
2018/01/07 23:46:56 api.go:132: Entering go-plugins-helpers createPath
2018/01/07 23:46:56 driver.go:145: INFO: API Create(&{"dummy02" map[]})
2018/01/07 23:46:56 driver.go:153: INFO: createImage(&{"dummy02" map[]})
2018/01/07 23:46:56 driver.go:687: INFO: Attempting to create new RBD Image: (rbd/dummy02, %!s(int=20480), xfs)
2018/01/07 23:46:56 driver.go:203: ERROR: Unable to create Ceph RBD Image(dummy02): exit status 2
```

- Docker log entries:
```
Jan 7 23:42:03 lol-server-045 kernel: [4063726.059726] aufs au_opts_verify:1597:dockerd[107149]: dirperm1 breaks the protection by the permission bits on the lower branch
Jan 7 23:42:30 lol-server-045 kernel: [4063752.624828] aufs au_opts_verify:1597:dockerd[107147]: dirperm1 breaks the protection by the permission bits on the lower branch
Jan 7 23:45:20 lol-server-045 rbd-docker-plugin[77813]: 2018/01/07 23:45:20 main.go:179: INFO: setting log file: /var/log/rbd-docker-plugin.log
Jan 7 23:45:29 lol-server-045 rbd-docker-plugin[77856]: 2018/01/07 23:45:29
```
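Not an answer from the thread, just a hedged note for anyone hitting the same symptoms: since the hand-created image above only became mappable after disabling the newer features, one plausible workaround is to make sure images used by the plugin carry only the layering feature, which the 4.4 kernel rbd client understands. A sketch, with an illustrative image name and size:

```
# Pre-create a volume image with only the layering feature so the old kernel client can map it
rbd create dummy03 --size 1024 --image-feature layering
rbd info dummy03

# Or make layering the default for newly created images via ceph.conf ([client] or [global]):
#   rbd default features = 1
```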
[ceph-users] Stuck pgs (activating+remapped) and slow requests after adding OSD node via ceph-ansible
Hi all,

We have a 5 node ceph cluster (Luminous 12.2.1) installed via ceph-ansible. All servers have 16 x 1.5TB SSD disks. 3 of these servers are also acting as MON+MGRs. We don't have a separated network for cluster and public; each node has 4 NICs bonded together (40G) and serves cluster+public communication (we know it's not ideal and are planning to change it).

Last week we added another node to the cluster (another 16 x 1.5TB SSD). We used the ceph-ansible latest stable release. After OSD activation the cluster started rebalancing and problems began:

1. Cluster entered HEALTH_ERR state
2. 67 pgs stuck at activating+remapped
3. A lot of blocked slow requests.

This cluster serves OpenStack volumes and almost all OpenStack instances got 100% disk utilization and hanged; eventually, cinder-volume crashed. Eventually, after restarting several OSDs, the problem was solved and the cluster got back to HEALTH_OK.

Our configuration already has:

osd max backfills = 1
osd max scrubs = 1
osd recovery max active = 1
osd recovery op priority = 1

In addition, we see a lot of bad mappings, for example:

bad mapping rule 0 x 52 num_rep 8 result [32,5,78,25,96,59,80]

What can be the cause and what can I do in order to avoid this situation? We need to add another 9 OSD servers and can't afford downtime. Any help would be appreciated. Thank you very much.

Our ceph configuration:

[mgr]
mgr_modules = dashboard zabbix

[global]
cluster network = *removed for security reasons*
fsid = *removed for security reasons*
mon host = *removed for security reasons*
mon initial members = *removed for security reasons*
mon osd down out interval = 900
osd pool default size = 3
public network = *removed for security reasons*

[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

[osd]
osd backfill scan max = 16
osd backfill scan min = 4
osd bluestore cache size = 104857600   ** due to 12.2.1 bluestore memory leak bug **
osd max backfills = 1
osd max scrubs = 1
osd recovery max active = 1
osd recovery max single start = 1
osd recovery op priority = 1
osd recovery threads = 1

--
Tzachi Strul
Storage DevOps // Kenshoo
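A hedged sketch of how the "bad mapping" lines can be reproduced and investigated offline with crushtool. The rule number and replica count are taken from the message above; the tunable change at the end is only a commonly suggested mitigation, not a verified fix for this cluster:

```
# Dump the current CRUSH map and test the rule that produced the bad mappings
ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 0 --num-rep 8 --show-bad-mappings

# A frequently suggested mitigation is raising choose_total_tries, then re-testing
crushtool -i crushmap.bin --set-choose-total-tries 100 -o crushmap.new
crushtool -i crushmap.new --test --rule 0 --num-rep 8 --show-bad-mappings

# Only inject the new map after reviewing the test output:
#   ceph osd setcrushmap -i crushmap.new
```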