[ceph-users] Safe to delete data, metadata pools?

2018-01-07 Thread Richard Bade
Hi Everyone,
I've got a couple of pools that I don't believe are being used but
have a reasonably large number of pg's (approx 50% of our total pg's).
I'd like to delete them but as they were pre-existing when I inherited
the cluster, I wanted to make sure they aren't needed for anything
first.
Here's the details:
POOLS:
NAME       ID USED  %USED  MAX AVAIL  OBJECTS
data       0     0      0     88037G        0
metadata   1     0      0     88037G        0

We don't run cephfs and I believe these are meant for that, but may
have been created by default when the cluster was set up (back on
dumpling or bobtail I think).
As far as I can tell there is no data in them. Do they need to exist
for some ceph function?
The pool names worry me a little, as they sound important.

They have 3136 pg's each so I'd like to be rid of those so I can
increase the number of pg's in my actual data pools without getting
over the 300 pg's per osd.
Here's the osd dump:
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 3136 pgp_num 3136 last_change 1 crash_replay_interval
45 min_read_recency_for_promote 1 min_write_recency_for_promote 1
stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1
object_hash rjenkins pg_num 3136 pgp_num 3136 last_change 1
min_read_recency_for_promote 1 min_write_recency_for_promote 1
stripe_width 0
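
If they do turn out to be unneeded, I assume the checks and the removal
would look roughly like this (untested sketch, using the pool names from
the dump above; newer releases may also need mon allow pool delete enabled
on the mons first):

# confirm nothing references them and they really are empty
ceph fs ls                 # or "ceph mds stat" on older releases
rados -p data ls | head
rados -p metadata ls | head
# then remove them
ceph osd pool delete data data --yes-i-really-really-mean-it
ceph osd pool delete metadata metadata --yes-i-really-really-mean-it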

Also, what performance impact am I likely to see when ceph removes the
empty pg's considering it's approx 50% of my total pg's on my 180
osd's.

Thanks,
Rich


Re: [ceph-users] Ceph on Public IP

2018-01-07 Thread John Petrini
I think what you're looking for is the public bind addr option.
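
Something along these lines in ceph.conf, I believe (sketch only; the
addresses are placeholders for your public/NAT and private ranges, and I'm
not sure which releases have public bind addr, so check the network config
reference for your version):

[global]
public network = 203.0.113.0/24     # the public range clients use, not 0.0.0.0/0

[mon.a]
public addr = 203.0.113.10          # NAT-ted public address to advertise
public bind addr = 10.0.0.10        # private address the daemon can actually bind to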


[ceph-users] Ceph on Public IP

2018-01-07 Thread nithish B
Hi all,
I am installing ceph using ceph-deploy. I am installing it on 4 VMs which
have private IP addresses with public IPs NAT-ted to them. But upon
installation, even after adding

public network = 0.0.0.0/0

it still listens on the private IP.
I tried doing the steps mentioned in
http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/
but I still face the issue.

Any directions in this regard will be helpful.

Thanks & Regards,
Nitish B.


Re: [ceph-users] Reduced data availability: 4 pgs inactive, 4 pgs incomplete

2018-01-07 Thread Brent Kennedy
Unfortunately, I don't see that setting documented anywhere other than the 
release notes.  It's hard to find guidance for questions in that case, but 
luckily you noted it in your blog post.  I wish I knew what value to set it 
to.  I did use the deprecated one after moving to hammer a while back due to 
the mis-calculated PGs.  I have now set that setting, but used 0 as the value, 
which cleared the error in the status, but the stuck incomplete pgs persist.  I 
restarted all daemons, so it should be in full effect.  Interestingly enough, 
it added the hdd in the ceph osd tree output...   Anyhow, I know this is a 
dirty cluster due to this mis-calculation; I would like to fix the cluster if 
possible (both the stuck/incomplete pgs and the underlying too-many-pgs issue).

Thanks for the information!

-Brent

-Original Message-
From: Jens-U. Mozdzen [mailto:jmozd...@nde.ag] 
Sent: Sunday, January 7, 2018 1:23 PM
To: bkenn...@cfl.rr.com
Cc: ste...@bit.nl
Subject: Fwd: [ceph-users] Reduced data availability: 4 pgs inactive, 4 pgs 
incomplete

Hi Brent,

sorry, the quoting style had me confused, this actually was targeted at your 
question, I believe.

@Stefan: Sorry for the noise

Regards,
Jens
- Forwarded message from "Jens-U. Mozdzen"  -
   Date: Sun, 07 Jan 2018 18:18:00 +
   From: "Jens-U. Mozdzen" 
Subject: Re: [ceph-users] Reduced data availability: 4 pgs inactive, 4 pgs 
incomplete
     To: ste...@bit.nl

Hi Stefan,

I'm in a bit of a hurry, so just a short offline note:

>>> Quoting Brent Kennedy (bkenn...@cfl.rr.com):
>>> Unfortunately, this cluster was setup before the calculator was in 
>>> place and when the equation was not well understood.  We have the 
>>> storage space to move the pools and recreate them, which was 
>>> apparently the only way to handle the issue( you are suggesting what 
>>> appears to be a different approach ).  I was hoping to avoid doing 
>>> all of this because the migration would be very time consuming.  
>>> There is no way to fix the stuck pg's though?  If I were to expand 
>>> the replication to 3 instances, would that help with the PGs per OSD 
>>> issue any?
>> No! It will make the problem worse because you need PGs to host these 
>> copies. The more replicas, the more PGs you need.
> Guess I am confused here, wouldn't it spread out the existing data to 
> more PGs?  Or are you saying that it couldn't spread out because the 
> PGs are already in use?  Previously it was set to 3 and we reduced it 
> to 2 because of failures.

If you increase the replication size, you ask RADOS to create an additional 
copy of every PG. So the number of PG copies each OSD has to carry will 
increase...
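
To put made-up numbers on it: a pool with 4096 PGs at size 2 means roughly
8192 PG copies spread across your OSDs; at size 3 it is roughly 12288
copies, so every OSD ends up carrying about 50% more PGs even though no new
pool was created.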

>>> When you say enforce, do you mean it will block all access to the 
>>> cluster/OSDs?
>> No, [...]

My experience differs: If your cluster already has too many PGs per OSD  
(before upgrading to 12.2.2) and anything PG-per-OSD-related changes  
(i.e. re-distributing PGs when OSDs go down), any access to the  
over-sized OSDs *will block*. Cost me a number of days to figure out  
and was recently discussed by someone else on the ML. Increase the  
according parameter ("mon_max_pg_per_osd") in the global section and  
restart your MONs, MGRs and OSDs (OSDs one by one, if you don't know  
your layout, to avoid data loss). Made me even create a blog entry,  
for future reference:  
http://technik.blogs.nde.ag/2017/12/26/ceph-12-2-2-minor-update-major-trouble/
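
For reference, the change itself is nothing more than something like this
in the [global] section on every node (the value is only a placeholder; I
believe the 12.2.x default is 200, so pick something above your real
PGs-per-OSD count), followed by the restarts described above:

mon_max_pg_per_osd = 400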

If this needs to go into more details, let's take it back to the  
mailing list, I'll be available again during the upcoming week.

Regards,
Jens

- End of forwarded message -




Re: [ceph-users] ceph luminous - performance issue

2018-01-07 Thread Steven Vacaroaia
Sorry for the delay

Here are the results when using bs=16k and rw=write
(Note: I am running the command directly on an OSD host as root)

fio /home/cephuser/write.fio

write-4M: (g=0): rw=write, bs=16K-16K/16K-16K/16K-16K, ioengine=rbd,
iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/172.6MB/0KB /s] [0/11.5K/0 iops]
[eta 00m:00s]


Here are the results when running with bs=4k and rw=randwrite

[root@osd03 ~]# fio /home/cephuser/write.fio
write-4M: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=0): [w(1)] [100.0% done] [0KB/54056KB/0KB /s] [0/13.6K/0 iops]
[eta 00m:00s]
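
For clarity, the 16k run used the same job file that is quoted further down
in this thread, with only the block size and access pattern changed, i.e.
roughly:

[write-4M]
description="write test with 16k block"
ioengine=rbd
clientname=admin
pool=scbench
rbdname=image01
iodepth=32
runtime=120
rw=write
bs=16k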


On 3 January 2018 at 15:28,  wrote:

> Hi Steven.
>
> interesting... I'm quite curious after your post now.
>
> I've migrated our prod. CEPH cluster to 12.2.2 and Bluestore just today
> and haven't heard back anything "bad" from the applications/users so far.
> performance tests on our test cluster were good before, but we use S3/RGW
> only anyhow ;)
>
> there are two things I would like to know/learn... could you try/test and
> feed back?!
>
> - change all your tests to use >=16k block size, see also BStore comments
> here (https://www.mail-archive.com/ceph-users@lists.ceph.com/msg43023.html
> )
> - change your "write.fio" file profile from "rw=randwrite" to "rw=write"
> (or something similar :O ) to compare apples with apples ;)
>
> thanks for your efforts and looking forward for those results ;)
>
> best regards
>  Notna
>
> 
> --
>
> Sent: Wednesday, 3 January 2018 at 16:20
> From: "Steven Vacaroaia" 
> To: "Brady Deetz" 
> Cc: ceph-users 
> Subject: Re: [ceph-users] ceph luminous - performance issue
>
> Thanks for your willingness to help
>
> DELL R620, 1 CPU, 8 cores, 64 GB RAM
> cluster network is using 2 bonded 10 GB NICs ( mode=4), MTU=9000
>
> SSD drives are Enterprise grade  - 400 GB SSD  Toshiba PX04SHB040
> HDD drives are  - 10k RPM, 600 GB  Toshiba AL13SEB600
>
> Steven
>
>
> On 3 January 2018 at 09:41, Brady Deetz  z...@gmail.com]> wrote:
> Can you provide more detail regarding the infrastructure backing this
> environment? What hard drive, ssd, and processor are you using? Also, what
> is providing networking?
>
> I'm seeing 4k blocksize tests here. Latency is going to destroy you.
>
>
> On Jan 3, 2018 8:11 AM, "Steven Vacaroaia"  7...@gmail.com]> wrote:
>
> Hi,
>
> I am doing a PoC with 3 DELL R620 and 12 OSD , 3 SSD drives ( one on each
> server), bluestore
>
> I configured the OSD using the following ( /dev/sda is my SSD drive)
> ceph-disk prepare --zap-disk --cluster ceph  --bluestore /dev/sde
> --block.wal /dev/sda --block.db /dev/sda
>
> Unfortunately both fio and bench tests show much worse performance for the
> pools than for the individual disks
>
> Example:
> DISKS
> fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k
> --numjobs=14 --iodepth=1 --runtime=60 --time_based --group_reporting
> --name=journal-test
>
> SSD drive
> Jobs: 14 (f=14): [W(14)] [100.0% done] [0KB/465.2MB/0KB /s] [0/119K/0
> iops] [eta 00m:00s]
>
> HD drive
> Jobs: 14 (f=14): [W(14)] [100.0% done] [0KB/179.2MB/0KB /s] [0/45.9K/0
> iops] [eta 00m:00s]
>
> POOL
>
> fio write.fio
> Jobs: 1 (f=0): [w(1)] [100.0% done] [0KB/51428KB/0KB /s] [0/12.9K/0 iops]
>
>
>  cat write.fio
> [write-4M]
> description="write test with 4k block"
> ioengine=rbd
> clientname=admin
> pool=scbench
> rbdname=image01
> iodepth=32
> runtime=120
> rw=randwrite
> bs=4k
>
>
> rados bench -p scbench 12 write
>
>
> Max bandwidth (MB/sec): 224
> Min bandwidth (MB/sec): 0
> Average IOPS:   26
> Stddev IOPS:24
> Max IOPS:   56
> Min IOPS:   0
> Average Latency(s): 0.59819
> Stddev Latency(s):  1.64017
> Max latency(s): 10.8335
> Min latency(s): 0.00475139
>
>
>
>
> I must be missing something - any help/suggestions will be greatly
> appreciated
>
> Here are some specific info
>
>
> ceph -s
>   cluster:
> id: 91118dde-f231-4e54-a5f0-a1037f3d5142
> health: HEALTH_OK
>
>   services:
> mon: 1 daemons, quorum mon01
> mgr: mon01(active)
> osd: 12 osds: 12 up, 12 in
>
>   data:
> pools:   4 pools, 484 pgs
> objects: 70082 objects, 273 GB
> usage:   570 GB used, 6138 GB / 6708 GB avail
> pgs: 484 active+clean
>
>   io:
> client:   2558 B/s rd, 2 op/s rd, 0 op/s wr
>
>
> ceph osd pool ls detail
> pool 1 'test-replicated' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 157 flags
> hashpspool stripe_width 0 application rbd
> removed_snaps [1~3]
> pool 2 'test-erasure' erasure size 3 min_size 3 crush_rule 1 

[ceph-users] Bluestore migration disaster - incomplete pgs recovery process and progress (in progress)

2018-01-07 Thread Brady Deetz
Below is the status of my disastrous self-inflicted journey. I will preface
this by admitting this could not have been prevented by software attempting
to keep me from being stupid.

I have a production cluster with over 350 XFS backed osds running Luminous.
We want to transition the cluster to Bluestore for the purpose of enabling
EC for CephFS. We are currently at 75+% utilization and EC coding could
really help us reclaim some much needed capacity. Formatting 1 osd at a
time and waiting on the cluster to backfill for every disk was going to
take a very long time (based on our observations an estimated 240+ days).
Formatting an entire host at once caused a little too much turbulence in
the cluster. Furthermore, we could start the transition to EC if we had
enough hosts with enough disks running Bluestore, before the entire cluster
was migrated. As such, I decided to parallelize. The general idea is that
we could format any osd that didn't have anything other than active+clean
pgs associated. I maintain that this method should work. But, something
went terribly wrong with the script and somehow we formatted disks in a
manner that brought PGs into an incomplete state. It's now pretty obvious
that the affected PGs were backfilling to other osds when the script
clobbered the last remaining good set of objects.
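
For anyone trying the same approach, the guard we intended (and evidently
failed to enforce) amounts to something like this per osd before formatting
it (rough sketch, assuming your release has ceph pg ls-by-osd):

ID=42    # placeholder osd id
# count PGs on this osd whose state is anything other than active+clean
# (crude: treats active+clean+scrubbing etc. as clean; tighten as needed)
BAD=$(ceph pg ls-by-osd osd.${ID} | tail -n +2 | grep -cv 'active+clean')
if [ "${BAD}" -ne 0 ]; then
    echo "osd.${ID} still holds non active+clean PGs, skipping" >&2
    exit 1
fi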

This cluster serves CephFS and a few RBD volumes.

mailing list submissions related to this outage:
cephfs-data-scan pg_files errors
finding and manually recovering objects in bluestore
Determine cephfs paths and rados objects affected by incomplete pg

Our recovery
1) We allowed the cluster to repair itself as much as possible.

2) Following self-healing we were left with 3 PGs incomplete. 2 were in the
cephfs data pool and 1 in an RBD pool.

3) Using ceph pg ${pgid} query, we found all disks known to have recently
contained some of that PG's data

4) For each osd listed in the pg query, we exported the remaining PG data
using ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osdid}/
--pgid ${pgid} --op export --file /media/ceph_recovery/ceph-${osdid}/recover.${pgid}

5) After having all of the possible exports we compared the recovery files
and chose the largest. I would have appreciated the ability to do a merge
of some sort on these exports, but we'll take what we can get. We're just
going to assume the largest export was the most complete backfill at the
time disaster struck.
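
In practice "chose the largest" was nothing fancier than comparing file
sizes, roughly:

# all exports of this PG, biggest first; we took the top one
ls -lS /media/ceph_recovery/ceph-*/recover.${pgid} | head -n 1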

6) We removed the nearly empty pg from the acting osds
using ceph-objectstore-tool --op remove --data-path
/var/lib/ceph/osd/ceph-${osdid} --pgid ${pgid}

7) We imported the largest export we had into the acting osds for the pg

8) We marked the pg as complete using the following on the primary acting
osd: ceph-objectstore-tool --op mark-complete --data-path
/var/lib/ceph/osd/ceph-${osdid}/ --pgid ${pgid}

9) We were convinced that it would be possible for multiple exports of the
same partially backfilled PG to contain different objects. As such, we
started reverse-engineering the format of the export file so we could
extract the objects from each export and compare them.

10) While our resident reverse engineer was hard at work, focus was shifted
toward tooling for the purpose of identifying corrupt files, rbds, and
appropriate actions for each
10a) A list of all rados objects was dumped for our most valuable data
(CephFS). Our first mechanism of detection is a skip in the object sequence
numbers.
10b) Because our metadata pool was unaffected by this mess, we are trusting
that ls delivers correct file sizes even for corrupt files. As such, we
should be able to calculate how many objects should make up each file. If the
count of objects found for that file's inode is less than that, there's a
problem (roughly as sketched below). More than the calculated amount??? The
world definitely explodes.
10c) Finally, the saddest check is if there are no objects in rados for
that inode.
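
The per-file check from 10b boils down to something like this (sketch;
assumes the default 4 MB object size with no custom file layouts, the usual
<inode-hex>.<index> object naming in the cephfs data pool, and an object
list file whose name here is just an example):

INODE_HEX=10000000abc        # placeholder: inode number in hex
SIZE_BYTES=123456789         # placeholder: size reported by ls (metadata is intact)
EXPECTED=$(( (SIZE_BYTES + 4194303) / 4194304 ))            # ceil(size / 4 MB)
FOUND=$(grep -c "^${INODE_HEX}\." cephfs_data_objects.txt)  # dump from step 10a
# note: sparse files will legitimately have fewer objects than EXPECTED
echo "inode ${INODE_HEX}: expected ${EXPECTED} objects, found ${FOUND}"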

That's where we are right now. I'll update this thread as we get closer to
recovery from backups and accepting data loss if necessary.

I will note that we wish there were some documentation on using
ceph-objectstore-tool. We understand that it's for emergencies, but that's
when concise documentation is most important. From what we've found, the
only documentation seems to be --help and the source code.


Re: [ceph-users] permission denied, unable to bind socket

2018-01-07 Thread Nathan Dehnel
Ok I fixed the address error. The service is able to start now. ceph -s
hangs though.

gentooserver ~ # ceph -s
^CError EINTR: problem getting command descriptions from mon.

I'm not sure how to fix the permissions issue. /var/run/ceph is a temporary
directory so I can't just chown it.
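
My best guess so far is that the directory is supposed to be created by a
systemd-tmpfiles snippet that my install seems to be missing, something
like this (untested; it also assumes the daemons run as the ceph user,
which the unit file should confirm):

# /etc/tmpfiles.d/ceph.conf
d /run/ceph 0770 ceph ceph -

and then "systemd-tmpfiles --create ceph.conf" (or a reboot) to create it.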

On Sat, Jan 6, 2018 at 7:40 PM, Nathan Dehnel  wrote:

> So I'm following the guide at
> http://docs.ceph.com/docs/master/install/manual-deployment/
>
> ceph-mon@gentooserver.service fails.
>
> Jan 06 19:12:40 gentooserver systemd[1]: Started Ceph cluster monitor
> daemon.
> Jan 06 19:12:41 gentooserver ceph-mon[2674]: warning: unable to create
> /var/run/ceph: (13) Permission denied
> Jan 06 19:12:41 gentooserver ceph-mon[2674]: 2018-01-06 19:12:41.000507
> 7f3163008f80 -1 asok(0x563d0f97d2c0) AdminSocketConfigObs::init: failed:
> AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to
> '/var/run/ceph/ceph-mon.gentooserver.asok': (2) No such file or directory
> Jan 06 19:12:41 gentooserver ceph-mon[2674]: 2018-01-06 19:12:41.152781
> 7f3163008f80 -1  Processor -- bind unable to bind to
> [2001:1c:d64b:91c5:3a84:dfce:8546:998]:6789/0: (99) Cannot assign
> requested address
> Jan 06 19:12:41 gentooserver ceph-mon[2674]: 2018-01-06 19:12:41.152789
> 7f3163008f80 -1  Processor -- bind was unable to bind. Trying again in 5
> seconds
> Jan 06 19:12:46 gentooserver ceph-mon[2674]: 2018-01-06 19:12:46.152982
> 7f3163008f80 -1  Processor -- bind unable to bind to
> [2001:1c:d64b:91c5:3a84:dfce:8546:998]:6789/0: (99) Cannot assign
> requested address
> Jan 06 19:12:46 gentooserver ceph-mon[2674]: 2018-01-06 19:12:46.152996
> 7f3163008f80 -1  Processor -- bind was unable to bind. Trying again in 5
> seconds
> Jan 06 19:12:51 gentooserver ceph-mon[2674]: 2018-01-06 19:12:51.153218
> 7f3163008f80 -1  Processor -- bind unable to bind to
> [2001:1c:d64b:91c5:3a84:dfce:8546:998]:6789/0: (99) Cannot assign
> requested address
> Jan 06 19:12:51 gentooserver ceph-mon[2674]: 2018-01-06 19:12:51.153244
> 7f3163008f80 -1  Processor -- bind was unable to bind after 3 attempts:
> (99) Cannot assign requested address
> Jan 06 19:12:51 gentooserver ceph-mon[2674]: 2018-01-06 19:12:51.153250
> 7f3163008f80 -1 unable to bind monitor to [2001:1c:d64b:91c5:3a84:dfce:
> 8546:998]:6789/0
> Jan 06 19:12:51 gentooserver systemd[1]: ceph-mon@gentooserver.service:
> Main process exited, code=exited, status=1/FAILURE
> Jan 06 19:12:51 gentooserver systemd[1]: ceph-mon@gentooserver.service:
> Unit entered failed state.
> Jan 06 19:12:51 gentooserver systemd[1]: ceph-mon@gentooserver.service:
> Failed with result 'exit-code'.
> Jan 06 19:13:01 gentooserver systemd[1]: ceph-mon@gentooserver.service:
> Service hold-off time over, scheduling restart.
> Jan 06 19:13:01 gentooserver systemd[1]: Stopped Ceph cluster monitor
> daemon.
> Jan 06 19:13:01 gentooserver systemd[1]: ceph-mon@gentooserver.service:
> Start request repeated too quickly.
> Jan 06 19:13:01 gentooserver systemd[1]: Failed to start Ceph cluster
> monitor daemon.
> Jan 06 19:13:01 gentooserver systemd[1]: ceph-mon@gentooserver.service:
> Unit entered failed state.
> Jan 06 19:13:01 gentooserver systemd[1]: ceph-mon@gentooserver.service:
> Failed with result 'exit-code'.
>
> cat /etc/ceph/ceph.conf
> [global]
> fsid = a736559a-92d1-483e-9289-d2c7feed510f
> ms bind ipv6 = true
> mon initial members = gentooserver
>
> [mon.mona]
> host = gentooserver
> mon addr = [2001:1c:d64b:91c5:3a84:dfce:8546:9982]
>
> [osd]
> osd journal size = 1
>
> I'm not sure if the problem is the permissions error, or the IP address
> appearing to get truncated in the output, or both.
>


[ceph-users] "VolumeDriver.Create: Unable to create Ceph RBD Image"

2018-01-07 Thread Traiano Welcome
Hi List

I'm getting the following error when trying to run docker with an rbd volume
(either pre-existing, or not):

"VolumeDriver.Create: Unable to create Ceph RBD Image"

Please could someone give me a clue as to how to debug this further and
resolve it?

Details of my platform:

1. ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
2. Docker version 17.05.0-ce, build 89658be
3. rbd-docker-plugin --version 2.0.1
4. Kernel: Linux lol-server-049 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18
14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Here are the details from the rbd-docker logs and syslogs:

- Running docker with an as-yet-uncreated rbd volume, and rbd-docker-plugin
with --create=true:

```
root@lol-server-045:~# docker run  --volume-driver=rbd --volume
dummy02:/mnt centos:latest bash
docker: Error response from daemon: create dummy02: VolumeDriver.Create:
Unable to create Ceph RBD Image(dummy02): exit status 2.
See 'docker run --help'.
```

- With an already created rbd volume, and rbd-docker-plugin with
--create=false:

```
root@lol-server-045:~# docker run  --volume-driver=rbd --volume
dummy01:/mnt centos:latest bash
docker: Error response from daemon: create dummy01: VolumeDriver.Create:
Ceph RBD Image not found: dummy01.

```


- state of a pre-created rbd device:

root@lol-server-045:/var/log# rbd ls| egrep dummy
dummy01

root@lol-server-045:/var/log# rbd info dummy01
rbd image 'dummy01':
size 1096 MB in 274 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.85d6238e1f29
format: 2
features: layering, exclusive-lock, object-map, fast-diff,
deep-flatten
flags:

BUG:  https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1578484

```
root@lol-server-045:/var/log#
root@lol-server-045:/var/log# rbd feature disable foo exclusive-lock
object-map fast-diff deep-flatten
rbd: error opening image foo: (2) No such file or directory
root@lol-server-045:/var/log# rbd feature disable dummy01 exclusive-lock
object-map fast-diff deep-flatten
root@lol-server-045:/var/log# rbd map dummy01 --pool rbd
/dev/rbd3
```

- rbd-docker-plugin.log entry following restart of the rbd-docker driver
service:

```
2018/01/07 23:45:20 main.go:121: INFO: Creating Docker VolumeDriver Handler
2018/01/07 23:45:20 main.go:125: INFO: Opening Socket for Docker to
connect: /run/docker/plugins/rbd.sock
2018/01/07 23:45:29 main.go:141: INFO: received TERM or KILL signal:
terminated
2018/01/07 23:45:29 main.go:190: INFO: closing log file
2018/01/07 23:45:29 main.go:91: INFO: starting rbd-docker-plugin version
2.0.1
2018/01/07 23:45:29 main.go:92: INFO: canCreateVolumes=true,
removeAction="ignore"
2018/01/07 23:45:29 main.go:101: INFO: Setting up Ceph Driver for
PluginID=rbd, cluster=, ceph-user=docker, pool=rbd,
mount=/var/lib/docker-volumes, config=/etc/ceph/ceph.conf
2018/01/07 23:45:29 driver.go:85: INFO: newCephRBDVolumeDriver: setting
base mount dir=/var/lib/docker-volumes/rbd
2018/01/07 23:45:29 main.go:121: INFO: Creating Docker VolumeDriver Handler
2018/01/07 23:45:29 main.go:125: INFO: Opening Socket for Docker to
connect: /run/docker/plugins/rbd.sock
```

- when attempting to run a docker image, specifying a volume that does not
yet exist:

```
root@lol-server-045:/var/log# docker run  -u 0 --privileged -it
--volume-driver rbd -v dummy02:/mnt:rw centos:latest bash
```


```
docker: Error response from daemon: create dummy02: VolumeDriver.Create:
Unable to create Ceph RBD Image(dummy02): exit status 2.
```

- Log entry:

```
2018/01/07 23:45:29 driver.go:85: INFO: newCephRBDVolumeDriver: setting
base mount dir=/var/lib/docker-volumes/rbd
2018/01/07 23:45:29 main.go:121: INFO: Creating Docker VolumeDriver Handler
2018/01/07 23:45:29 main.go:125: INFO: Opening Socket for Docker to
connect: /run/docker/plugins/rbd.sock
2018/01/07 23:46:56 api.go:188: Entering go-plugins-helpers getPath
2018/01/07 23:46:56 driver.go:467: WARN: Image dummy02 does not exist
2018/01/07 23:46:56 api.go:132: Entering go-plugins-helpers createPath
2018/01/07 23:46:56 driver.go:145: INFO: API Create(&{"dummy02" map[]})
2018/01/07 23:46:56 driver.go:153: INFO: createImage(&{"dummy02" map[]})
2018/01/07 23:46:56 driver.go:687: INFO: Attempting to create new RBD
Image: (rbd/dummy02, %!s(int=20480), xfs)
2018/01/07 23:46:56 driver.go:203: ERROR: Unable to create Ceph RBD
Image(dummy02): exit status 2
```
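
To narrow down whether it's the plugin or ceph itself, my next step is to
retry the create by hand as the plugin's ceph user (sketch; assumes the
client.docker keyring is in the default /etc/ceph location):

```
# can the plugin's user even see the pool?
rbd --id docker ls rbd
# retry the exact create the plugin logged (rbd/dummy02, 20480 MB)
rbd --id docker create rbd/dummy02 --size 20480
```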

- docker log entries:


```
Jan  7 23:42:03 lol-server-045 kernel: [4063726.059726] aufs
au_opts_verify:1597:dockerd[107149]: dirperm1 breaks the protection by the
permission bits on the lower branch
Jan  7 23:42:30 lol-server-045 kernel: [4063752.624828] aufs
au_opts_verify:1597:dockerd[107147]: dirperm1 breaks the protection by the
permission bits on the lower branch
Jan  7 23:45:20 lol-server-045 rbd-docker-plugin[77813]: 2018/01/07
23:45:20 main.go:179: INFO: setting log file: /var/log/rbd-docker-plugin.log
Jan  7 23:45:29 lol-server-045 rbd-docker-plugin[77856]: 2018/01/07
23:45:29 

[ceph-users] Stuck pgs (activating+remapped) and slow requests after adding OSD node via ceph-ansible

2018-01-07 Thread Tzachi Strul
Hi all,
We have a 5 node ceph cluster (Luminous 12.2.1) installed via ceph-ansible.
All servers have 16x1.5TB SSD disks.
3 of these servers are also acting as MON+MGRs.
We don't have separate networks for cluster and public traffic; each node
has 4 NICs bonded together (40G) that serve both cluster and public
communication (we know it's not ideal and plan to change it).

Last week we added another node to the cluster (another 16x1.5TB ssd).
We used the latest stable ceph-ansible release.
After OSD activation the cluster started rebalancing and problems began:
1. Cluster entered HEALTH_ERROR state
2. 67 pgs stuck at activating+remapped
3. A lot of blocked slow requests.

This cluster serves OpenStack volumes; almost all OpenStack instances
went to 100% disk utilization and hung, and eventually cinder-volume crashed.

Eventually, after restarting several OSDs, the problem resolved and the
cluster got back to HEALTH_OK.

Our configuration already has:
osd max backfills = 1
osd max scrubs = 1
osd recovery max active = 1
osd recovery op priority = 1

In addition, we see a lot of bad mappings:
for example: bad mapping rule 0 x 52 num_rep 8 result [32,5,78,25,96,59,80]

What could be the cause, and what can I do to avoid this situation next time?
We need to add another 9 osd servers and can't afford downtime.
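
For reference, I assume the bad mappings above can be reproduced offline
against our crush map with something like this (sketch):

ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 0 --num-rep 8 --show-bad-mappings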

Any help would be appreciated. Thank you very much


Our ceph configuration:

[mgr]
mgr_modules = dashboard zabbix

[global]
cluster network = *removed for security resons*
fsid =  *removed for security resons*
mon host =  *removed for security resons*
mon initial members =  *removed for security resons*
mon osd down out interval = 900
osd pool default size = 3
public network =  *removed for security resons*

[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be
writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and
allowed by SELinux or AppArmor

[osd]
osd backfill scan max = 16
osd backfill scan min = 4
osd bluestore cache size = 104857600  **Due to 12.2.1 bluestore memory leak
bug**
osd max backfills = 1
osd max scrubs = 1
osd recovery max active = 1
osd recovery max single start = 1
osd recovery op priority = 1
osd recovery threads = 1


--

*Tzachi Strul*

*Storage DevOps *// *Kenshoo*
