Re: [ceph-users] Single threaded IOPS on SSD pool.

2019-06-04 Thread Wido den Hollander



On 6/5/19 8:44 AM, jes...@krogh.cc wrote:
> Hi.
> 
> This is more an inquiry to figure out how our current setup compares
> to other setups. I have a 3 x replicated SSD pool with RBD images.
> When running fio on /tmp I'm interested in seeing how much IOPS a
> single thread can get - as Ceph scales up very nicely with concurrency.
> 
> Currently 34 OSD of ~896GB Intel D3-4510's each over 7 OSD-hosts.
> 
> jk@iguana:/tmp$ for i in 01 02 03 04 05 06 07; do ping -c 10 ceph-osd$i;
> done  |egrep '(statistics|rtt)'
> --- ceph-osd01.nzcorp.net ping statistics ---
> rtt min/avg/max/mdev = 0.316/0.381/0.483/0.056 ms
> --- ceph-osd02.nzcorp.net ping statistics ---
> rtt min/avg/max/mdev = 0.293/0.415/0.625/0.100 ms
> --- ceph-osd03.nzcorp.net ping statistics ---
> rtt min/avg/max/mdev = 0.319/0.395/0.558/0.074 ms
> --- ceph-osd04.nzcorp.net ping statistics ---
> rtt min/avg/max/mdev = 0.224/0.352/0.492/0.077 ms
> --- ceph-osd05.nzcorp.net ping statistics ---
> rtt min/avg/max/mdev = 0.257/0.360/0.444/0.059 ms
> --- ceph-osd06.nzcorp.net ping statistics ---
> rtt min/avg/max/mdev = 0.209/0.334/0.442/0.062 ms
> --- ceph-osd07.nzcorp.net ping statistics ---
> rtt min/avg/max/mdev = 0.259/0.401/0.517/0.069 ms
> 
> Ok, average network latency from VM to OSD's ~0.4ms.
> 
> $ fio fio-job-randr.ini
> test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
> fio-2.2.10
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [100.0% done] [2145KB/0KB/0KB /s] [536/0/0 iops]
> [eta 00m:00s]
> test: (groupid=0, jobs=1): err= 0: pid=29519: Wed Jun  5 08:40:51 2019
>   Description  : [fio random 4k reads]
>   read : io=143352KB, bw=2389.2KB/s, iops=597, runt= 60001msec
> slat (usec): min=8, max=1925, avg=30.24, stdev=13.56
> clat (usec): min=7, max=321039, avg=1636.47, stdev=4346.52
>  lat (usec): min=102, max=321074, avg=1667.58, stdev=4346.57
> clat percentiles (usec):
>  |  1.00th=[  157],  5.00th=[  844], 10.00th=[  924], 20.00th=[ 1012],
>  | 30.00th=[ 1096], 40.00th=[ 1160], 50.00th=[ 1224], 60.00th=[ 1304],
>  | 70.00th=[ 1400], 80.00th=[ 1528], 90.00th=[ 1768], 95.00th=[ 2128],
>  | 99.00th=[11328], 99.50th=[18304], 99.90th=[51456], 99.95th=[94720],
>  | 99.99th=[216064]
> bw (KB  /s): min=0, max= 3089, per=99.39%, avg=2374.50, stdev=472.15
> lat (usec) : 10=0.01%, 100=0.01%, 250=2.95%, 500=0.03%, 750=0.27%
> lat (usec) : 1000=14.96%
> lat (msec) : 2=75.87%, 4=2.99%, 10=1.78%, 20=0.73%, 50=0.30%
> lat (msec) : 100=0.07%, 250=0.03%, 500=0.01%
>   cpu  : usr=0.76%, sys=3.29%, ctx=38871, majf=0, minf=11
>   IO depths: 1=108.2%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>> =64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> =64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> =64=0.0%
>  issued: total=r=35838/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>  latency   : target=0, window=0, percentile=100.00%, depth=1
> 
> Run status group 0 (all jobs):
>READ: io=143352KB, aggrb=2389KB/s, minb=2389KB/s, maxb=2389KB/s,
> mint=60001msec, maxt=60001msec
> 
> Disk stats (read/write):
>   vda: ios=38631/51, merge=0/3, ticks=62668/40, in_queue=62700, util=96.77%
> 
> 
> And fio-file:
> $ cat fio-job-randr.ini
> [global]
> readwrite=randread
> blocksize=4k
> ioengine=libaio
> numjobs=1
> thread=0
> direct=1
> iodepth=1
> group_reporting=1
> ramp_time=5
> norandommap=1
> description=fio random 4k reads
> time_based=1
> runtime=60
> randrepeat=0
> 
> [test]
> size=1g
> 
> 
> Single threaded performance ~500-600 IOPS - or average latency of 1.6ms
> Is that comparable to what other are seeing?

Something below 1 ms is possible. I did a test last week and got 1236 IOPS
with 4k writes on a 3x replicated pool.

Think about:

- Disable Ceph logging
- Pin CPU to C-State 1
- Disable CPU powersaving
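
For example, roughly (the debug sections, GRUB location and governor tooling
depend on your setup, so treat this as a sketch, not a recipe):

# silence Ceph debug logging on the OSDs at runtime (persist it in ceph.conf)
ceph tell osd.* injectargs '--debug_ms 0/0 --debug_osd 0/0 --debug_bluestore 0/0'

# keep the CPUs out of deep C-states: add to GRUB_CMDLINE_LINUX in
# /etc/default/grub, then update-grub and reboot
#   intel_idle.max_cstate=1 processor.max_cstate=1

# disable CPU powersaving / frequency scaling
cpupower frequency-set -g performance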

Try again!

Wido

> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Single threaded IOPS on SSD pool.

2019-06-04 Thread jesper
Hi.

This is more an inquiry to figure out how our current setup compares
to other setups. I have a 3 x replicated SSD pool with RBD images.
When running fio on /tmp I'm interested in seeing how much IOPS a
single thread can get - as Ceph scales up very nicely with concurrency.

Currently 34 OSDs, each an ~896GB Intel D3-4510, spread over 7 OSD hosts.

jk@iguana:/tmp$ for i in 01 02 03 04 05 06 07; do ping -c 10 ceph-osd$i;
done  |egrep '(statistics|rtt)'
--- ceph-osd01.nzcorp.net ping statistics ---
rtt min/avg/max/mdev = 0.316/0.381/0.483/0.056 ms
--- ceph-osd02.nzcorp.net ping statistics ---
rtt min/avg/max/mdev = 0.293/0.415/0.625/0.100 ms
--- ceph-osd03.nzcorp.net ping statistics ---
rtt min/avg/max/mdev = 0.319/0.395/0.558/0.074 ms
--- ceph-osd04.nzcorp.net ping statistics ---
rtt min/avg/max/mdev = 0.224/0.352/0.492/0.077 ms
--- ceph-osd05.nzcorp.net ping statistics ---
rtt min/avg/max/mdev = 0.257/0.360/0.444/0.059 ms
--- ceph-osd06.nzcorp.net ping statistics ---
rtt min/avg/max/mdev = 0.209/0.334/0.442/0.062 ms
--- ceph-osd07.nzcorp.net ping statistics ---
rtt min/avg/max/mdev = 0.259/0.401/0.517/0.069 ms

OK, average network latency from the VM to the OSDs is ~0.4 ms.

$ fio fio-job-randr.ini
test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [2145KB/0KB/0KB /s] [536/0/0 iops]
[eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=29519: Wed Jun  5 08:40:51 2019
  Description  : [fio random 4k reads]
  read : io=143352KB, bw=2389.2KB/s, iops=597, runt= 60001msec
slat (usec): min=8, max=1925, avg=30.24, stdev=13.56
clat (usec): min=7, max=321039, avg=1636.47, stdev=4346.52
 lat (usec): min=102, max=321074, avg=1667.58, stdev=4346.57
clat percentiles (usec):
 |  1.00th=[  157],  5.00th=[  844], 10.00th=[  924], 20.00th=[ 1012],
 | 30.00th=[ 1096], 40.00th=[ 1160], 50.00th=[ 1224], 60.00th=[ 1304],
 | 70.00th=[ 1400], 80.00th=[ 1528], 90.00th=[ 1768], 95.00th=[ 2128],
 | 99.00th=[11328], 99.50th=[18304], 99.90th=[51456], 99.95th=[94720],
 | 99.99th=[216064]
bw (KB  /s): min=0, max= 3089, per=99.39%, avg=2374.50, stdev=472.15
lat (usec) : 10=0.01%, 100=0.01%, 250=2.95%, 500=0.03%, 750=0.27%
lat (usec) : 1000=14.96%
lat (msec) : 2=75.87%, 4=2.99%, 10=1.78%, 20=0.73%, 50=0.30%
lat (msec) : 100=0.07%, 250=0.03%, 500=0.01%
  cpu  : usr=0.76%, sys=3.29%, ctx=38871, majf=0, minf=11
  IO depths: 1=108.2%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 issued: total=r=35838/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=143352KB, aggrb=2389KB/s, minb=2389KB/s, maxb=2389KB/s,
mint=60001msec, maxt=60001msec

Disk stats (read/write):
  vda: ios=38631/51, merge=0/3, ticks=62668/40, in_queue=62700, util=96.77%


And fio-file:
$ cat fio-job-randr.ini
[global]
readwrite=randread
blocksize=4k
ioengine=libaio
numjobs=1
thread=0
direct=1
iodepth=1
group_reporting=1
ramp_time=5
norandommap=1
description=fio random 4k reads
time_based=1
runtime=60
randrepeat=0

[test]
size=1g


Single-threaded performance is ~500-600 IOPS, or an average latency of 1.6 ms.
Is that comparable to what others are seeing?
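
(For a server-side reference point that takes the VM, virtio and librbd out of
the picture, I guess something like this could be compared as well; the pool
name is just a placeholder:)

# 30s of single-threaded 4k writes, keeping the objects for the read pass
rados bench -p <ssd-pool> 30 write -b 4096 -t 1 --no-cleanup
# 30s of single-threaded random reads against those objects
rados bench -p <ssd-pool> 30 rand -t 1
rados -p <ssd-pool> cleanup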


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Two questions about ceph update/upgrade strategies

2019-06-04 Thread Martin Verges
Hello Rainer,

Most of the time you just install the newer versions and restart the
running daemons, without having to worry about the sequence.

Otherwise, just use a management solution that helps you with any
day-to-day operation, including the complete software update part.
Something like what you can see in this video:
https://www.youtube.com/watch?v=Jrnzlylidjs.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx

On Tue, Jun 4, 2019 at 11:19 AM Rainer Krienke  wrote:
>
> I have a fresh ceph 14.2.1 cluster up and running based on Ubuntu 18.04.
> It consists of 9 hosts (+1 admin host). The nine hosts have each 16
> ceph-osd daemons running, three in these nine hosts also have
> a ceph-mon and a ceph-mgr daemon running. So three hosts are running
> osd, mon and also mgr daemons.
>
> Now I am unsure about the right way to go for ceph upgrades and linux
> system host updates.
>
> Ceph-Upgrade:
> Reading the ceph upgrade docs I ask myself how a future upgrade say to
> 14.2.2 should be performed correctly? The recommendation says to upgrade
> first monitors, then osds etc...
>
> So what is the correct way to go in a mixed setup like mine? Following
> the rules strictly would mean not to use ceph-deploy install, but
> instead to log into the mon(/osd) hosts and then upgrade only the
> ceph-mon package and restart this mon, and then do the same with the
> other monitors/osd hosts. After all mons have been successfully upgraded
> I should then continue with upgrading OSDs (ceph-osd package) on one
> host and restart all osds on this host one after another or reboot the
> whole host. Then proceed to the next osd-host.
>
> Is this the correct and best way to go?
>
> Linux system updates:
> The second point I would like to hear your opinions about is how you
> handle linux system updates? Since even a non ceph linux system package
> update might break ceph or even stop the whole linux host from booting,
> care has to be taken. So how do you handle this problem? Do you run host
> upgrades only manually in a fixed sequence eg first on a osd/mon host
> and if the update is successful, then run the linux system package
> updates on the other hosts?   Do you use another strategy?
>
> Thanks
> Rainer
> --
> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
> Web: http://userpages.uni-koblenz.de/~krienke
> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]

2019-06-04 Thread 解决


Thanks for your help, Jason.
I found the reason: the exclusive-lock on the image was not released after the
disaster test. After I released the exclusive-lock, the virtual machine started
properly, and it can also create snapshots with the nova user.
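
(For anyone hitting the same thing: the stale lock can be inspected and removed
roughly like this; the image spec is the one from the quoted traceback below,
and the lock id and locker come from the list output:)

rbd lock list vms/10df4634-4401-45ca-9c57-f349b78da475_disk
rbd lock remove vms/10df4634-4401-45ca-9c57-f349b78da475_disk <lock-id> <locker>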


At 2019-06-04 20:13:35, "Jason Dillaman"  wrote:
>On Tue, Jun 4, 2019 at 4:55 AM 解决  wrote:
>>
>> Hi all,
>> We use ceph(luminous) + openstack(queens) in my test environment。The 
>> virtual machine does not start properly after the disaster test and the 
>> image of virtual machine can not create snap.The procedure is as follows:
>> #!/usr/bin/env python
>>
>> import rados
>> import rbd
>> with rados.Rados(conffile='/etc/ceph/ceph.conf',rados_id='nova') as cluster:
>> with cluster.open_ioctx('vms') as ioctx:
>> rbd_inst = rbd.RBD()
>> print "start open rbd image"
>> with rbd.Image(ioctx, '10df4634-4401-45ca-9c57-f349b78da475_disk') 
>> as image:
>> print "start create snapshot"
>> image.create_snap('myimage_snap1')
>>
>> when i run it ,it show readonlyimage,as follows:
>>
>> start open rbd image
>> start create snapshot
>> Traceback (most recent call last):
>>   File "testpool.py", line 17, in 
>> image.create_snap('myimage_snap1')
>>   File "rbd.pyx", line 1790, in rbd.Image.create_snap 
>> (/builddir/build/BUILD/ceph-12.2.5/build/src/pybind/rbd/pyrex/rbd.c:15682)
>> rbd.ReadOnlyImage: [errno 30] error creating snapshot myimage_snap1 from 
>> 10df4634-4401-45ca-9c57-f349b78da475_disk
>>
>> but i run it with admin instead of nova,it is ok.
>>
>> "ceph auth list"  as follow
>>
>> installed auth entries:
>>
>> osd.1
>> key: AQBL7uRcfuyxEBAAoK8JrQWMU6EEf/g83zKJjg==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.10
>> key: AQCV7uRcdsB9IBAAHbHHCaylVUZIPKFX20polQ==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.11
>> key: AQCW7uRcRIMRIhAAbXfLbQwijEO5ZQFWFZaO5w==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.2
>> key: AQBL7uRcfFMWDBAAo7kjQobGBbIHYfZkx45pOw==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.4
>> key: AQBk7uRc97CPOBAAK9IBJICvchZPc5p80bISsg==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.5
>> key: AQBk7uRcOdqaORAAkQeEtYsE6rLWLPhYuCTdHA==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.7
>> key: AQB97uRc+1eRJxAA34DImQIMFjzHSXZ25djp0Q==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.8
>> key: AQB97uRcFilBJhAAXzSzNJsgwpobC8654Xo7Sw==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> client.admin
>> key: AQAU7uRcNia+BBAA09mOYdX+yJWbLCjcuMih0A==
>> auid: 0
>> caps: [mds] allow
>> caps: [mgr] allow *
>> caps: [mon] allow *
>> caps: [osd] allow *
>> client.cinder
>> key: AQBp7+RcOzPHGxAA7azgyayVu2RRNWJ7JxSJEg==
>> caps: [mon] allow r
>> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
>> pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow rwx 
>> pool=vms-cache, allow rx pool=images, allow rx pool=images-cache
>> client.cinder-backup
>> key: AQBq7+RcVOwGNRAAiwJ59ZvAUc0H4QkVeN82vA==
>> caps: [mon] allow r
>> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
>> pool=backups, allow rwx pool=backups-cache
>> client.glance
>> key: AQDf7uRc32hDBBAAkGucQEVTWqnIpNvihXf/Ng==
>> caps: [mon] allow r
>> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
>> pool=images, allow rwx pool=images-cache
>> client.nova
>> key: AQDN7+RcqDABIxAAXnFcVjBp/S5GkgOy0wqB1Q==
>> caps: [mon] allow r
>> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
>> pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow rwx 
>> pool=vms-cache, allow rwx pool=images, allow rwx pool=images-cache
>> client.radosgw.gateway
>> key: AQAU7uRccP06CBAA6zLFtDQoTstl8CNclYRugQ==
>> auid: 0
>> caps: [mon] allow rwx
>> caps: [osd] allow rwx
>> mgr.172.30.126.26
>> key: AQAr7uRclc52MhAA+GWCQEVnAHB01tMFpgJtTQ==
>> caps: [mds] allow *
>> caps: [mon] allow profile mgr
>> caps: [osd] allow *
>> mgr.172.30.126.27
>> key: AQAs7uRclkD2OBAAW/cUhcZEebZnQulqVodiXQ==
>> caps: [mds] allow *
>> caps: [mon] allow profile mgr
>> caps: [osd] allow *
>> mgr.172.30.126.28
>> key: AQAu7uRcT9OLBBAAZbEjb/N1NnZpIgfaAcThyQ==
>> caps: [mds] allow *
>> caps: [mon] allow profile mgr
>> caps: [osd] allow *
>>
>>
>> Can someone explain it to me?
>
>Your clients don't have the correct caps. See [1] or [2].
>
>
>> thanks!!
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>[1] 
>http://docs.ceph.com/docs/mimic/releases/luminous/#upgrade-from-jewel-or-kraken
>[2] 
>http://docs.ceph.com/docs/luminous/rbd/rados-rbd-cmds/#create-a-block-device-user
>
>-- 
>Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.5 Luminous released

2019-06-04 Thread Alex Gorbachev
On Tue, Jun 4, 2019 at 3:32 PM Sage Weil  wrote:
>
> [pruning CC list]
>
> On Tue, 4 Jun 2019, Alex Gorbachev wrote:
> > Late question, but I am noticing
> >
> > ceph-volume: automatic VDO detection
> >
> > Does this mean that the OSD layer will at some point support
> > deployment with VDO?
> >
> > Or that one could build on top of VDO devices and Ceph would detect
> > this and report somewhere?
>
> Some preliminary support is there, in that the OSD will detect it is
> consuming VDO, and it will query VDO for utilization/freespace and report
> that (instead of the size of the thinly-provisioned VDO device it is
> consuming).
>
> However, there isn't any automated testing, we haven't done any
> performance analysis, and it's not clear that the balancer will behave
> well as it does not yet take into consideration the consumed storage vs
> stored storage (which means that variation in the compressibility/dedup of
> data on different OSDs will throw the OSD data balancing off).
>
> Aside from that balancer issue (which should be fixed independent of
> VDO!), I don't know of any current plans to pursue the ceph OSDs on VDO.
> I'm hoping that we can get a distributed dedup solution in place...
>
> sage

Thank you for the quick response, Sage.

I did some very basic testing with VDO on block devices, and seem to
have about 40% to 70% of the original device's performance (only
measured throughput), likely based on the data being fed into VDO.
Also, fio is not likely the best tool, i.e. dedup performance and
ratio will vary wildly, depending on the data being used (common
sense).

I see a good use case at higher levels, maybe between RBD and whatever
consumes it; I will try that next.
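
(Roughly what I have in mind, as a sketch only, assuming the RHEL vdo manager
tool and krbd; names and sizes are made up:)

rbd create datapool/vdo-backing --size 10T
rbd map datapool/vdo-backing
vdo create --name=rbd_vdo --device=/dev/rbd/datapool/vdo-backing --vdoLogicalSize=30T
mkfs.xfs -K /dev/mapper/rbd_vdo     # -K skips discards at mkfs time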

Best,
Alex

>
>
>
> >
> > Best,
> > --
> > Alex Gorbachev
> > ISS Storcium
> >
> > On Tue, Apr 24, 2018 at 4:29 PM Abhishek  wrote:
> > >
> > > Hello cephers,
> > >
> > > We're glad to announce the fifth bugfix release of Luminous v12.2.x long
> > > term stable
> > > release series. This release contains a range of bug fixes across all
> > > compoenents of Ceph. We recommend all the users of 12.2.x series to
> > > update.
> > >
> > > Notable Changes
> > > ---
> > >
> > > * MGR
> > >
> > >The ceph-rest-api command-line tool included in the ceph-mon
> > >package has been obsoleted by the MGR "restful" module. The
> > >ceph-rest-api tool is hereby declared deprecated and will be dropped
> > >in Mimic.
> > >
> > >The MGR "restful" module provides similar functionality via a "pass
> > > through"
> > >method. See http://docs.ceph.com/docs/luminous/mgr/restful for
> > > details.
> > >
> > > * CephFS
> > >
> > >Upgrading an MDS cluster to 12.2.3+ will result in all active MDS
> > >exiting due to feature incompatibilities once an upgraded MDS comes
> > >online (even as standby). Operators may ignore the error messages
> > >and continue upgrading/restarting or follow this upgrade sequence:
> > >
> > >Reduce the number of ranks to 1 (`ceph fs set  max_mds 1`),
> > >wait for all other MDS to deactivate, leaving the one active MDS,
> > >upgrade the single active MDS, then upgrade/start standbys. Finally,
> > >restore the previous max_mds.
> > >
> > >See also: https://tracker.ceph.com/issues/23172
> > >
> > >
> > > Other Notable Changes
> > > -
> > >
> > > * add --add-bucket and --move options to crushtool (issue#23472,
> > > issue#23471, pr#21079, Kefu Chai)
> > > * BlueStore.cc: _balance_bluefs_freespace: assert(0 == "allocate failed,
> > > wtf") (issue#23063, pr#21394, Igor Fedotov, xie xingguo, Sage Weil, Zac
> > > Medico)
> > > * bluestore: correctly check all block devices to decide if journal
> > > is\_… (issue#23173, issue#23141, pr#20651, Greg Farnum)
> > > * bluestore: statfs available can go negative (issue#23074, pr#20554,
> > > Igor Fedotov, Sage Weil)
> > > * build Debian installation packages failure (issue#22856, issue#22828,
> > > pr#20250, Tone Zhang)
> > > * build/ops: deb: move python-jinja2 dependency to mgr (issue#22457,
> > > pr#20748, Nathan Cutler)
> > > * build/ops: deb: move python-jinja2 dependency to mgr (issue#22457,
> > > pr#21233, Nathan Cutler)
> > > * build/ops: run-make-check.sh: fix SUSE support (issue#22875,
> > > issue#23178, pr#20737, Nathan Cutler)
> > > * cephfs-journal-tool: Fix Dumper destroyed before shutdown
> > > (issue#22862, issue#22734, pr#20251, dongdong tao)
> > > * ceph.in: print all matched commands if arg missing (issue#22344,
> > > issue#23186, pr#20664, Luo Kexue, Kefu Chai)
> > > * ceph-objectstore-tool command to trim the pg log (issue#23242,
> > > pr#20803, Josh Durgin, David Zafman)
> > > * ceph osd force-create-pg cause all ceph-mon to crash and unable to
> > > come up again (issue#22942, pr#20399, Sage Weil)
> > > * ceph-volume: adds raw device support to 'lvm list' (issue#23140,
> > > pr#20647, Andrew Schoen)
> > > * ceph-volume: allow parallel creates (issue#23757, pr#21509, Theofilos
> > > Mou

Re: [ceph-users] v12.2.5 Luminous released

2019-06-04 Thread Sage Weil
[pruning CC list]

On Tue, 4 Jun 2019, Alex Gorbachev wrote:
> Late question, but I am noticing
> 
> ceph-volume: automatic VDO detection
> 
> Does this mean that the OSD layer will at some point support
> deployment with VDO?
> 
> Or that one could build on top of VDO devices and Ceph would detect
> this and report somewhere?

Some preliminary support is there, in that the OSD will detect it is 
consuming VDO, and it will query VDO for utilization/freespace and report 
that (instead of the size of the thinly-provisioned VDO device it is 
consuming).

However, there isn't any automated testing, we haven't done any 
performance analysis, and it's not clear that the balancer will behave 
well as it does not yet take into consideration the consumed storage vs 
stored storage (which means that variation in the compressibility/dedup of 
data on different OSDs will throw the OSD data balancing off).

Aside from that balancer issue (which should be fixed independent of 
VDO!), I don't know of any current plans to pursue the ceph OSDs on VDO.  
I'm hoping that we can get a distributed dedup solution in place...

sage



> 
> Best,
> --
> Alex Gorbachev
> ISS Storcium
> 
> On Tue, Apr 24, 2018 at 4:29 PM Abhishek  wrote:
> >
> > Hello cephers,
> >
> > We're glad to announce the fifth bugfix release of Luminous v12.2.x long
> > term stable
> > release series. This release contains a range of bug fixes across all
> > compoenents of Ceph. We recommend all the users of 12.2.x series to
> > update.
> >
> > Notable Changes
> > ---
> >
> > * MGR
> >
> >The ceph-rest-api command-line tool included in the ceph-mon
> >package has been obsoleted by the MGR "restful" module. The
> >ceph-rest-api tool is hereby declared deprecated and will be dropped
> >in Mimic.
> >
> >The MGR "restful" module provides similar functionality via a "pass
> > through"
> >method. See http://docs.ceph.com/docs/luminous/mgr/restful for
> > details.
> >
> > * CephFS
> >
> >Upgrading an MDS cluster to 12.2.3+ will result in all active MDS
> >exiting due to feature incompatibilities once an upgraded MDS comes
> >online (even as standby). Operators may ignore the error messages
> >and continue upgrading/restarting or follow this upgrade sequence:
> >
> >Reduce the number of ranks to 1 (`ceph fs set  max_mds 1`),
> >wait for all other MDS to deactivate, leaving the one active MDS,
> >upgrade the single active MDS, then upgrade/start standbys. Finally,
> >restore the previous max_mds.
> >
> >See also: https://tracker.ceph.com/issues/23172
> >
> >
> > Other Notable Changes
> > -
> >
> > * add --add-bucket and --move options to crushtool (issue#23472,
> > issue#23471, pr#21079, Kefu Chai)
> > * BlueStore.cc: _balance_bluefs_freespace: assert(0 == "allocate failed,
> > wtf") (issue#23063, pr#21394, Igor Fedotov, xie xingguo, Sage Weil, Zac
> > Medico)
> > * bluestore: correctly check all block devices to decide if journal
> > is\_… (issue#23173, issue#23141, pr#20651, Greg Farnum)
> > * bluestore: statfs available can go negative (issue#23074, pr#20554,
> > Igor Fedotov, Sage Weil)
> > * build Debian installation packages failure (issue#22856, issue#22828,
> > pr#20250, Tone Zhang)
> > * build/ops: deb: move python-jinja2 dependency to mgr (issue#22457,
> > pr#20748, Nathan Cutler)
> > * build/ops: deb: move python-jinja2 dependency to mgr (issue#22457,
> > pr#21233, Nathan Cutler)
> > * build/ops: run-make-check.sh: fix SUSE support (issue#22875,
> > issue#23178, pr#20737, Nathan Cutler)
> > * cephfs-journal-tool: Fix Dumper destroyed before shutdown
> > (issue#22862, issue#22734, pr#20251, dongdong tao)
> > * ceph.in: print all matched commands if arg missing (issue#22344,
> > issue#23186, pr#20664, Luo Kexue, Kefu Chai)
> > * ceph-objectstore-tool command to trim the pg log (issue#23242,
> > pr#20803, Josh Durgin, David Zafman)
> > * ceph osd force-create-pg cause all ceph-mon to crash and unable to
> > come up again (issue#22942, pr#20399, Sage Weil)
> > * ceph-volume: adds raw device support to 'lvm list' (issue#23140,
> > pr#20647, Andrew Schoen)
> > * ceph-volume: allow parallel creates (issue#23757, pr#21509, Theofilos
> > Mouratidis)
> > * ceph-volume: allow skipping systemd interactions on activate/create
> > (issue#23678, pr#21538, Alfredo Deza)
> > * ceph-volume: automatic VDO detection (issue#23581, pr#21505, Alfredo
> > Deza)
> > * ceph-volume be resilient to $PATH issues (pr#20716, Alfredo Deza)
> > * ceph-volume: fix action plugins path in tox (pr#20923, Guillaume
> > Abrioux)
> > * ceph-volume Implement an 'activate all' to help with dense servers or
> > migrating OSDs (pr#21533, Alfredo Deza)
> > * ceph-volume improve robustness when reloading vms in tests (pr#21072,
> > Alfredo Deza)
> > * ceph-volume lvm.activate error if no bluestore OSDs are found
> > (issue#23644, pr#21335, Alfredo Deza)
> > * ceph-volume: Nits noticed while studying code (pr#21

Re: [ceph-users] v12.2.5 Luminous released

2019-06-04 Thread Alex Gorbachev
Late question, but I am noticing

ceph-volume: automatic VDO detection

Does this mean that the OSD layer will at some point support
deployment with VDO?

Or that one could build on top of VDO devices and Ceph would detect
this and report somewhere?

Best,
--
Alex Gorbachev
ISS Storcium

On Tue, Apr 24, 2018 at 4:29 PM Abhishek  wrote:
>
> Hello cephers,
>
> We're glad to announce the fifth bugfix release of Luminous v12.2.x long
> term stable
> release series. This release contains a range of bug fixes across all
> compoenents of Ceph. We recommend all the users of 12.2.x series to
> update.
>
> Notable Changes
> ---
>
> * MGR
>
>The ceph-rest-api command-line tool included in the ceph-mon
>package has been obsoleted by the MGR "restful" module. The
>ceph-rest-api tool is hereby declared deprecated and will be dropped
>in Mimic.
>
>The MGR "restful" module provides similar functionality via a "pass
> through"
>method. See http://docs.ceph.com/docs/luminous/mgr/restful for
> details.
>
> * CephFS
>
>Upgrading an MDS cluster to 12.2.3+ will result in all active MDS
>exiting due to feature incompatibilities once an upgraded MDS comes
>online (even as standby). Operators may ignore the error messages
>and continue upgrading/restarting or follow this upgrade sequence:
>
>Reduce the number of ranks to 1 (`ceph fs set  max_mds 1`),
>wait for all other MDS to deactivate, leaving the one active MDS,
>upgrade the single active MDS, then upgrade/start standbys. Finally,
>restore the previous max_mds.
>
>See also: https://tracker.ceph.com/issues/23172
>
>
> Other Notable Changes
> -
>
> * add --add-bucket and --move options to crushtool (issue#23472,
> issue#23471, pr#21079, Kefu Chai)
> * BlueStore.cc: _balance_bluefs_freespace: assert(0 == "allocate failed,
> wtf") (issue#23063, pr#21394, Igor Fedotov, xie xingguo, Sage Weil, Zac
> Medico)
> * bluestore: correctly check all block devices to decide if journal
> is\_… (issue#23173, issue#23141, pr#20651, Greg Farnum)
> * bluestore: statfs available can go negative (issue#23074, pr#20554,
> Igor Fedotov, Sage Weil)
> * build Debian installation packages failure (issue#22856, issue#22828,
> pr#20250, Tone Zhang)
> * build/ops: deb: move python-jinja2 dependency to mgr (issue#22457,
> pr#20748, Nathan Cutler)
> * build/ops: deb: move python-jinja2 dependency to mgr (issue#22457,
> pr#21233, Nathan Cutler)
> * build/ops: run-make-check.sh: fix SUSE support (issue#22875,
> issue#23178, pr#20737, Nathan Cutler)
> * cephfs-journal-tool: Fix Dumper destroyed before shutdown
> (issue#22862, issue#22734, pr#20251, dongdong tao)
> * ceph.in: print all matched commands if arg missing (issue#22344,
> issue#23186, pr#20664, Luo Kexue, Kefu Chai)
> * ceph-objectstore-tool command to trim the pg log (issue#23242,
> pr#20803, Josh Durgin, David Zafman)
> * ceph osd force-create-pg cause all ceph-mon to crash and unable to
> come up again (issue#22942, pr#20399, Sage Weil)
> * ceph-volume: adds raw device support to 'lvm list' (issue#23140,
> pr#20647, Andrew Schoen)
> * ceph-volume: allow parallel creates (issue#23757, pr#21509, Theofilos
> Mouratidis)
> * ceph-volume: allow skipping systemd interactions on activate/create
> (issue#23678, pr#21538, Alfredo Deza)
> * ceph-volume: automatic VDO detection (issue#23581, pr#21505, Alfredo
> Deza)
> * ceph-volume be resilient to $PATH issues (pr#20716, Alfredo Deza)
> * ceph-volume: fix action plugins path in tox (pr#20923, Guillaume
> Abrioux)
> * ceph-volume Implement an 'activate all' to help with dense servers or
> migrating OSDs (pr#21533, Alfredo Deza)
> * ceph-volume improve robustness when reloading vms in tests (pr#21072,
> Alfredo Deza)
> * ceph-volume lvm.activate error if no bluestore OSDs are found
> (issue#23644, pr#21335, Alfredo Deza)
> * ceph-volume: Nits noticed while studying code (pr#21565, Dan Mick)
> * ceph-volume tests alleviate libvirt timeouts when reloading
> (issue#23163, pr#20754, Alfredo Deza)
> * ceph-volume update man page for prepare/activate flags (pr#21574,
> Alfredo Deza)
> * ceph-volume: Using --readonly for {vg|pv|lv}s commands (pr#21519,
> Erwan Velu)
> * client: allow client to use caps that are revoked but not yet returned
> (issue#23028, issue#23314, pr#20904, Jeff Layton)
> * : Client:Fix readdir bug (issue#22936, pr#20356, dongdong tao)
> * client: release revoking Fc after invalidate cache (issue#22652,
> pr#20342, "Yan, Zheng")
> * Client: setattr should drop "Fs" rather than "As" for mtime and size
> (issue#22935, pr#20354, dongdong tao)
> * client: use either dentry_invalidate_cb or remount_cb to invalidate k…
> (issue#23355, pr#20960, Zhi Zhang)
> * cls/rbd: group_image_list incorrectly flagged as RW (issue#23407,
> issue#23388, pr#20967, Jason Dillaman)
> * cls/rgw: fix bi_log_iterate_entries return wrong truncated
> (issue#22737, issue#23225, pr#21054, Tianshan Qu)
> * cmake: rbd resource agent needs

[ceph-users] ceph monitor keep crash

2019-06-04 Thread Jianyu Li
Hello,

I have a ceph cluster that has been running for over 2 years, and the monitors
began crashing yesterday. I have had some OSDs flapping up and down
occasionally, and sometimes I need to rebuild an OSD. I found 3 OSDs down
yesterday; they may or may not have caused this issue.

Ceph version: 12.2.12 (upgrading from 12.2.8 did not fix the issue).
I have 5 mon nodes. When I start the mon service on the first 2 nodes, they are
fine. Once I start the service on the third node, all 3 nodes keep flapping
up/down due to an abort in OSDMonitor::build_incremental. I also tried to
recover the monitors from 1 node (removing the other 4 nodes) by injecting a
monmap, but that node keeps crashing as well.
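
(The monmap injection was roughly the documented extract/remove/inject
sequence; sketched here with ctlr101 as the surviving mon and an arbitrary
scratch path:)

systemctl stop ceph-mon@ctlr101
ceph-mon -i ctlr101 --extract-monmap /tmp/monmap
monmaptool /tmp/monmap --rm <other-mon-id>     # repeated for each mon dropped
ceph-mon -i ctlr101 --inject-monmap /tmp/monmap
systemctl start ceph-mon@ctlr101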

See the crash log from the mon below:
May 31 02:26:09 ctlr101 systemd[1]: Started Ceph cluster monitor daemon.
May 31 02:26:09 ctlr101 ceph-mon[2632098]: 2019-05-31 02:26:09.345533
7fe250321080 -1 compacting monitor store ...
May 31 02:26:11 ctlr101 ceph-mon[2632098]: 2019-05-31 02:26:11.320926
7fe250321080 -1 done compacting
May 31 02:26:16 ctlr101 ceph-mon[2632098]: 2019-05-31 02:26:16.497933
7fe242925700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR 13 osds
down; 1 host (6 osds) down; 74266/2566020 objects misplace
May 31 02:26:16 ctlr101 ceph-mon[2632098]: *** Caught signal (Aborted) **
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  in thread 7fe24692d700
thread_name:ms_dispatch
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  ceph version 12.2.12
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  1: (()+0x9e6334)
[0x558c5f2fb334]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  2: (()+0x11390) [0x7fe24f6ce390]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  3: (gsignal()+0x38)
[0x7fe24dc14428]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  4: (abort()+0x16a)
[0x7fe24dc1602a]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  5:
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
long)+0x9c5) [0x558c5ee80455]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  6:
(OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
boost::intrusive_ptr)+0xcf) [0x558c5ee80b3f]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  7:
(OSDMonitor::check_osdmap_sub(Subscription*)+0x22d) [0x558c5ee8622d]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  8:
(Monitor::handle_subscribe(boost::intrusive_ptr)+0x1082)
[0x558c5ecdb0b2]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  9:
(Monitor::dispatch_op(boost::intrusive_ptr)+0x9f4)
[0x558c5ed05114]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  10:
(Monitor::_ms_dispatch(Message*)+0x6db) [0x558c5ed061ab]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  11:
(Monitor::ms_dispatch(Message*)+0x23) [0x558c5ed372c3]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  12:
(DispatchQueue::entry()+0xf4a) [0x558c5f2a205a]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  13:
(DispatchQueue::DispatchThread::entry()+0xd) [0x558c5f035dcd]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  14: (()+0x76ba) [0x7fe24f6c46ba]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  15: (clone()+0x6d)
[0x7fe24dce641d]
May 31 02:26:16 ctlr101 ceph-mon[2632098]: 2019-05-31 02:26:16.578932
7fe24692d700 -1 *** Caught signal (Aborted) **
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  in thread 7fe24692d700
thread_name:ms_dispatch
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  ceph version 12.2.12
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  1: (()+0x9e6334)
[0x558c5f2fb334]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  2: (()+0x11390) [0x7fe24f6ce390]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  3: (gsignal()+0x38)
[0x7fe24dc14428]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  4: (abort()+0x16a)
[0x7fe24dc1602a]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  5:
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
long)+0x9c5) [0x558c5ee80455]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  6:
(OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
boost::intrusive_ptr)+0xcf) [0x558c5ee80b3f]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  7:
(OSDMonitor::check_osdmap_sub(Subscription*)+0x22d) [0x558c5ee8622d]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  8:
(Monitor::handle_subscribe(boost::intrusive_ptr)+0x1082)
[0x558c5ecdb0b2]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  9:
(Monitor::dispatch_op(boost::intrusive_ptr)+0x9f4)
[0x558c5ed05114]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  10:
(Monitor::_ms_dispatch(Message*)+0x6db) [0x558c5ed061ab]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  11:
(Monitor::ms_dispatch(Message*)+0x23) [0x558c5ed372c3]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  12:
(DispatchQueue::entry()+0xf4a) [0x558c5f2a205a]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  13:
(DispatchQueue::DispatchThread::entry()+0xd) [0x558c5f035dcd]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  14: (()+0x76ba) [0x7fe24f6c46ba]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  15: (clone()+0x6d)
[0x7fe24dce641d]
May 31 02:26:16 ctlr101 ceph-mon[2632098]:  NOTE: a copy of the executable,
or `objdump -rdS ` is needed to inter

Re: [ceph-users] Large OMAP object in RGW GC pool

2019-06-04 Thread J. Eric Ivancich
On 6/4/19 7:37 AM, Wido den Hollander wrote:
> I've set up a temporary machine next to the 13.2.5 cluster with the
> 13.2.6 packages from Shaman.
> 
> On that machine I'm running:
> 
> $ radosgw-admin gc process
> 
> That seems to work as intended! So the PR seems to have fixed it.
> 
> Should be fixed permanently when 13.2.6 is officially released.
> 
> Wido

Thank you, Wido, for sharing the results of your experiment. I'm happy
to learn that it was successful. And v13.2.6 was just released about 2
hours ago.

Eric
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v13.2.6 Mimic released

2019-06-04 Thread Abhishek Lekshmanan

We're glad to announce the sixth bugfix release of the Mimic v13.2.x
long term stable release series. We recommend that all Mimic users
upgrade. We thank everyone for contributing towards this release.

Notable Changes
---
* Ceph v13.2.6 now packages python bindings for python3.6 instead of
  python3.4, because EPEL7 recently switched from python3.4 to
  python3.6 as the native python3. See the announcement [1]
  for more details on the background of this change.


For a detailed changelog, please refer to the official blog post entry
at https://ceph.com/releases/v13-2-6-mimic-released/


[1]: 
https://lists.fedoraproject.org/archives/list/epel-annou...@lists.fedoraproject.org/message/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO

Getting Ceph
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-13.2.6.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 7b695f835b03642f85998b2ae7b6dd093d9fbce4

-- 
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]

2019-06-04 Thread Jason Dillaman
On Tue, Jun 4, 2019 at 4:55 AM 解决  wrote:
>
> Hi all,
> We use ceph(luminous) + openstack(queens) in my test environment。The 
> virtual machine does not start properly after the disaster test and the image 
> of virtual machine can not create snap.The procedure is as follows:
> #!/usr/bin/env python
>
> import rados
> import rbd
> with rados.Rados(conffile='/etc/ceph/ceph.conf',rados_id='nova') as cluster:
> with cluster.open_ioctx('vms') as ioctx:
> rbd_inst = rbd.RBD()
> print "start open rbd image"
> with rbd.Image(ioctx, '10df4634-4401-45ca-9c57-f349b78da475_disk') as 
> image:
> print "start create snapshot"
> image.create_snap('myimage_snap1')
>
> when i run it ,it show readonlyimage,as follows:
>
> start open rbd image
> start create snapshot
> Traceback (most recent call last):
>   File "testpool.py", line 17, in 
> image.create_snap('myimage_snap1')
>   File "rbd.pyx", line 1790, in rbd.Image.create_snap 
> (/builddir/build/BUILD/ceph-12.2.5/build/src/pybind/rbd/pyrex/rbd.c:15682)
> rbd.ReadOnlyImage: [errno 30] error creating snapshot myimage_snap1 from 
> 10df4634-4401-45ca-9c57-f349b78da475_disk
>
> but i run it with admin instead of nova,it is ok.
>
> "ceph auth list"  as follow
>
> installed auth entries:
>
> osd.1
> key: AQBL7uRcfuyxEBAAoK8JrQWMU6EEf/g83zKJjg==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.10
> key: AQCV7uRcdsB9IBAAHbHHCaylVUZIPKFX20polQ==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.11
> key: AQCW7uRcRIMRIhAAbXfLbQwijEO5ZQFWFZaO5w==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.2
> key: AQBL7uRcfFMWDBAAo7kjQobGBbIHYfZkx45pOw==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.4
> key: AQBk7uRc97CPOBAAK9IBJICvchZPc5p80bISsg==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.5
> key: AQBk7uRcOdqaORAAkQeEtYsE6rLWLPhYuCTdHA==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.7
> key: AQB97uRc+1eRJxAA34DImQIMFjzHSXZ25djp0Q==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.8
> key: AQB97uRcFilBJhAAXzSzNJsgwpobC8654Xo7Sw==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> client.admin
> key: AQAU7uRcNia+BBAA09mOYdX+yJWbLCjcuMih0A==
> auid: 0
> caps: [mds] allow
> caps: [mgr] allow *
> caps: [mon] allow *
> caps: [osd] allow *
> client.cinder
> key: AQBp7+RcOzPHGxAA7azgyayVu2RRNWJ7JxSJEg==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow rwx 
> pool=vms-cache, allow rx pool=images, allow rx pool=images-cache
> client.cinder-backup
> key: AQBq7+RcVOwGNRAAiwJ59ZvAUc0H4QkVeN82vA==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=backups, allow rwx pool=backups-cache
> client.glance
> key: AQDf7uRc32hDBBAAkGucQEVTWqnIpNvihXf/Ng==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=images, allow rwx pool=images-cache
> client.nova
> key: AQDN7+RcqDABIxAAXnFcVjBp/S5GkgOy0wqB1Q==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow rwx 
> pool=vms-cache, allow rwx pool=images, allow rwx pool=images-cache
> client.radosgw.gateway
> key: AQAU7uRccP06CBAA6zLFtDQoTstl8CNclYRugQ==
> auid: 0
> caps: [mon] allow rwx
> caps: [osd] allow rwx
> mgr.172.30.126.26
> key: AQAr7uRclc52MhAA+GWCQEVnAHB01tMFpgJtTQ==
> caps: [mds] allow *
> caps: [mon] allow profile mgr
> caps: [osd] allow *
> mgr.172.30.126.27
> key: AQAs7uRclkD2OBAAW/cUhcZEebZnQulqVodiXQ==
> caps: [mds] allow *
> caps: [mon] allow profile mgr
> caps: [osd] allow *
> mgr.172.30.126.28
> key: AQAu7uRcT9OLBBAAZbEjb/N1NnZpIgfaAcThyQ==
> caps: [mds] allow *
> caps: [mon] allow profile mgr
> caps: [osd] allow *
>
>
> Can someone explain it to me?

Your clients don't have the correct caps. See [1] or [2].
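
For the nova user that could look something like this (a sketch only; the pool
list is copied from the caps you posted, adjust to taste):

ceph auth caps client.nova \
    mon 'profile rbd' \
    osd 'profile rbd pool=vms, profile rbd pool=vms-cache, profile rbd pool=volumes, profile rbd pool=volumes-cache, profile rbd pool=images, profile rbd pool=images-cache'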


> thanks!!
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] 
http://docs.ceph.com/docs/mimic/releases/luminous/#upgrade-from-jewel-or-kraken
[2] 
http://docs.ceph.com/docs/luminous/rbd/rados-rbd-cmds/#create-a-block-device-user

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple rbd images from different clusters

2019-06-04 Thread Jason Dillaman
On Tue, Jun 4, 2019 at 8:07 AM Jason Dillaman  wrote:
>
> On Tue, Jun 4, 2019 at 4:45 AM Burkhard Linke
>  wrote:
> >
> > Hi,
> >
> > On 6/4/19 10:12 AM, CUZA Frédéric wrote:
> >
> > Hi everyone,
> >
> >
> >
> > We want to migrate datas from one cluster (Hammer) to a new one (Mimic). We 
> > do not wish to upgrade the actual cluster as all the hardware is EOS and we 
> > upgrade the configuration of the servers.
> >
> > We can’t find a “proper” way to mount two rbd images from two different 
> > cluster on the same host.
> >
> > Does anyone know what is the “good” procedure to achieve this ?
>
> Copy your "/etc/ceph/ceph.conf" and associated keyrings for both
> clusters to a single machine (preferably running a Mimic "rbd" client)
> under "/etc/ceph/<cluster>.conf" and
> "/etc/ceph/<cluster>.client.<user>.keyring".
>
> You can then use "rbd -c <src conf> export --export-format 2
> <src image spec> - | rbd -c <dst conf> import --export-format=2 -
> <dst image spec>". The "--export-format=2" option will also copy all
> associated snapshots with the images. If you don't want/need the
> snapshots, just drop that option.

That "-c" should be "--cluster" if specifying by name, otherwise with
"-c" it's the full path to the two different conf files.

> >
> > Just my 2 ct:
> >
> > the 'rbd' commands allows specifying a configuration file (-c). You need to 
> > setup two configuration files, one for each cluster. You can also use two 
> > different cluster names (--cluster option). AFAIK the name is only used to 
> > locate the configuration file. I'm not sure how well the kernel works with 
> > mapping RBDs from two different cluster.
> >
> >
> > If you only want to transfer RBDs from one cluster to another, you do not 
> > need to map and mount them; the 'rbd' command has the sub commands 'export' 
> > and 'import'. You can pipe them to avoid writing data to a local disk. This 
> > should be the fastest way to transfer the RBDs.
> >
> >
> > Regards,
> >
> > Burkhard
> >
> > --
> > Dr. rer. nat. Burkhard Linke
> > Bioinformatics and Systems Biology
> > Justus-Liebig-University Giessen
> > 35392 Giessen, Germany
> > Phone: (+49) (0)641 9935810
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple rbd images from different clusters

2019-06-04 Thread Jason Dillaman
On Tue, Jun 4, 2019 at 4:45 AM Burkhard Linke
 wrote:
>
> Hi,
>
> On 6/4/19 10:12 AM, CUZA Frédéric wrote:
>
> Hi everyone,
>
>
>
> We want to migrate datas from one cluster (Hammer) to a new one (Mimic). We 
> do not wish to upgrade the actual cluster as all the hardware is EOS and we 
> upgrade the configuration of the servers.
>
> We can’t find a “proper” way to mount two rbd images from two different 
> cluster on the same host.
>
> Does anyone know what is the “good” procedure to achieve this ?

Copy your "/etc/ceph/ceph.conf" and associated keyrings for both
clusters to a single machine (preferably running a Mimic "rbd" client)
under "/etc/ceph/.conf" and
"/etc/ceph/.client..keyring".

You can then use "rbd -c  export --export-format 2
 - | rbd -c  import --export-format=2 -
". The "--export-format=2" option will also copy all
associated snapshots with the images. If you don't want/need the
snapshots, just drop that optional.
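
For example (the cluster nicknames and image name here are just placeholders):

# /etc/ceph/hammer.conf + hammer.client.admin.keyring -> source (Hammer) cluster
# /etc/ceph/mimic.conf  + mimic.client.admin.keyring  -> destination (Mimic) cluster
rbd --cluster hammer export --export-format 2 rbd/vm-disk-0 - \
  | rbd --cluster mimic import --export-format 2 - rbd/vm-disk-0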

>
> Just my 2 ct:
>
> the 'rbd' commands allows specifying a configuration file (-c). You need to 
> setup two configuration files, one for each cluster. You can also use two 
> different cluster names (--cluster option). AFAIK the name is only used to 
> locate the configuration file. I'm not sure how well the kernel works with 
> mapping RBDs from two different cluster.
>
>
> If you only want to transfer RBDs from one cluster to another, you do not 
> need to map and mount them; the 'rbd' command has the sub commands 'export' 
> and 'import'. You can pipe them to avoid writing data to a local disk. This 
> should be the fastest way to transfer the RBDs.
>
>
> Regards,
>
> Burkhard
>
> --
> Dr. rer. nat. Burkhard Linke
> Bioinformatics and Systems Biology
> Justus-Liebig-University Giessen
> 35392 Giessen, Germany
> Phone: (+49) (0)641 9935810
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance in a small cluster

2019-06-04 Thread vitalif

Basically they max out at around 1000 IOPS and report 100%
utilization and feel slow.

Haven't seen the 5200 yet.


Micron 5100s perform wonderfully!

You have to just turn its write cache off:

hdparm -W 0 /dev/sdX

1000 IOPS means you haven't done it. Although even with write cache 
enabled I observe like ~5000 iops, not 1000, but that delta is probably 
just eaten by Ceph :))


With write cache turned off 5100 is capable of up to 4 write iops. 
5200 is slightly worse, but only slightly: it still gives ~25000 iops.
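
To check what the drive itself does, independent of Ceph, something like this
(a sketch; it writes to the raw device, so only use a scratch disk):

hdparm -W 0 /dev/sdX
fio --name=synctest --filename=/dev/sdX --ioengine=libaio --direct=1 --fsync=1 \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --runtime=30 --time_based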


Funny thing is that the same applies to a lot of server SSDs with 
supercapacitors. As I understand when their write cache is turned on 
every `fsync` is translated to SATA FLUSH CACHE, and the latter is 
interpreted by the drive as "please flush all caches, including 
capacitor-protected write cache".


And when you turn it off the drive just writes at its full speed and 
doesn't flush the cache because it has capacitors to account for a 
possible power loss.


The only case where you don't need to disable the cache explicitly is with
some HBAs that do it internally.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Large OMAP object in RGW GC pool

2019-06-04 Thread Wido den Hollander



On 5/30/19 2:45 PM, Wido den Hollander wrote:
> 
> 
> On 5/29/19 11:22 PM, J. Eric Ivancich wrote:
>> Hi Wido,
>>
>> When you run `radosgw-admin gc list`, I assume you are *not* using the
>> "--include-all" flag, right? If you're not using that flag, then
>> everything listed should be expired and be ready for clean-up. If after
>> running `radosgw-admin gc process` the same entries appear in
>> `radosgw-admin gc list` then gc apparently stalled.
>>
> 
> Not using the --include-all in both cases.
> 
> GC seems to stall and doesn't do anything when looking at it with
> --debug-rados=10
> 
>> There were a few bugs within gc processing that could prevent it from
>> making forward progress. They were resolved with a PR (master:
>> https://github.com/ceph/ceph/pull/26601 ; mimic backport:
>> https://github.com/ceph/ceph/pull/27796). Unfortunately that code was
>> backported after the 13.2.5 release, but it is in place for the 13.2.6
>> release of mimic.
>>
> 
> Thanks! I'll might grab some packages from Shaman to give GC a try.
> 

I've set up a temporary machine next to the 13.2.5 cluster with the
13.2.6 packages from Shaman.

On that machine I'm running:

$ radosgw-admin gc process

That seems to work as intended! So the PR seems to have fixed it.

Should be fixed permanently when 13.2.6 is officially released.

Wido

> Wido
> 
>> Eric
>>
>>
>> On 5/29/19 3:19 AM, Wido den Hollander wrote:
>>> Hi,
>>>
>>> I've got a Ceph cluster with this status:
>>>
>>> health: HEALTH_WARN
>>> 3 large omap objects
>>>
>>> After looking into it I see that the issue comes from objects in the
>>> '.rgw.gc' pool.
>>>
>>> Investigating it I found that the gc.* objects have a lot of OMAP keys:
>>>
>>> for OBJ in $(rados -p .rgw.gc ls); do
>>>   echo $OBJ
>>>   rados -p .rgw.gc listomapkeys $OBJ|wc -l
>>> done
>>>
>>> I then found out that on average these objects have about 100k of OMAP
>>> keys each, but two stand out and have about 3M OMAP keys.
>>>
>>> I can list the GC with 'radosgw-admin gc list' and this yields a JSON
>>> which is a couple of MB in size.
>>>
>>> I ran:
>>>
>>> $ radosgw-admin gc process
>>>
>>> That runs for hours and then finishes, but the large list of OMAP keys
>>> stays.
>>>
>>> Running Mimic 13.3.5 on this cluster.
>>>
>>> Has anybody seen this before?
>>>
>>> Wido
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Two questions about ceph update/upgrade strategies

2019-06-04 Thread Rainer Krienke
I have a fresh ceph 14.2.1 cluster up and running based on Ubuntu 18.04.
It consists of 9 hosts (+1 admin host). The nine hosts have each 16
ceph-osd daemons running, three in these nine hosts also have
a ceph-mon and a ceph-mgr daemon running. So three hosts are running
osd, mon and also mgr daemons.

Now I am unsure about the right way to go for ceph upgrades and linux
system host updates.

Ceph-Upgrade:
Reading the ceph upgrade docs, I ask myself how a future upgrade, say to
14.2.2, should be performed correctly. The recommendation says to upgrade
the monitors first, then the osds, etc.

So what is the correct way to go in a mixed setup like mine? Following
the rules strictly would mean not to use ceph-deploy install, but
instead to log into the mon(/osd) hosts and then upgrade only the
ceph-mon package and restart this mon, and then do the same with the
other monitors/osd hosts. After all mons have been successfully upgraded
I should then continue with upgrading OSDs (ceph-osd package) on one
host and restart all osds on this host one after another or reboot the
whole host. Then proceed to the next osd-host.

Is this the correct and best way to go?
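
(Expressed as commands, the sequence I have in mind would look roughly like
this on Ubuntu/apt; the noout step is my own assumption:)

# on each mon host, one at a time
apt-get update && apt-get install --only-upgrade ceph-mon
systemctl restart ceph-mon.target
ceph mon stat                       # confirm quorum before the next host

# then per OSD host
ceph osd set noout
apt-get install --only-upgrade ceph-osd
systemctl restart ceph-osd.target   # or reboot the whole host
ceph osd unset noout                # once the cluster is healthy again
ceph versions                       # verify all daemons run the new release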

Linux system updates:
The second point I would like to hear your opinions about is how you
handle linux system updates. Since even a non-ceph linux system package
update might break ceph or even stop the whole linux host from booting,
care has to be taken. So how do you handle this problem? Do you run host
upgrades only manually, in a fixed sequence, e.g. first on an osd/mon host
and, if the update is successful, then run the linux system package
updates on the other hosts? Or do you use another strategy?

Thanks
Rainer
-- 
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
Web: http://userpages.uni-koblenz.de/~krienke
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd.ReadOnlyImage: [errno 30]

2019-06-04 Thread 解决
Hi all,
We use ceph (luminous) + openstack (queens) in my test environment. The
virtual machine does not start properly after the disaster test, and we cannot
create a snapshot of the virtual machine's image. The procedure is as follows:
#!/usr/bin/env python


import rados
import rbd

with rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id='nova') as cluster:
    with cluster.open_ioctx('vms') as ioctx:
        rbd_inst = rbd.RBD()
        print "start open rbd image"
        with rbd.Image(ioctx, '10df4634-4401-45ca-9c57-f349b78da475_disk') as image:
            print "start create snapshot"
            image.create_snap('myimage_snap1')


When I run it, it shows rbd.ReadOnlyImage, as follows:


start open rbd image
start create snapshot
Traceback (most recent call last):
  File "testpool.py", line 17, in 
image.create_snap('myimage_snap1')
  File "rbd.pyx", line 1790, in rbd.Image.create_snap 
(/builddir/build/BUILD/ceph-12.2.5/build/src/pybind/rbd/pyrex/rbd.c:15682)
rbd.ReadOnlyImage: [errno 30] error creating snapshot myimage_snap1 from 
10df4634-4401-45ca-9c57-f349b78da475_disk


But when I run it with admin instead of nova, it is OK.


"ceph auth list"  as follow


installed auth entries:


osd.1
key: AQBL7uRcfuyxEBAAoK8JrQWMU6EEf/g83zKJjg==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.10
key: AQCV7uRcdsB9IBAAHbHHCaylVUZIPKFX20polQ==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.11
key: AQCW7uRcRIMRIhAAbXfLbQwijEO5ZQFWFZaO5w==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.2
key: AQBL7uRcfFMWDBAAo7kjQobGBbIHYfZkx45pOw==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.4
key: AQBk7uRc97CPOBAAK9IBJICvchZPc5p80bISsg==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.5
key: AQBk7uRcOdqaORAAkQeEtYsE6rLWLPhYuCTdHA==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.7
key: AQB97uRc+1eRJxAA34DImQIMFjzHSXZ25djp0Q==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.8
key: AQB97uRcFilBJhAAXzSzNJsgwpobC8654Xo7Sw==
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: AQAU7uRcNia+BBAA09mOYdX+yJWbLCjcuMih0A==
auid: 0
caps: [mds] allow
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *
client.cinder
key: AQBp7+RcOzPHGxAA7azgyayVu2RRNWJ7JxSJEg==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow rwx 
pool=vms-cache, allow rx pool=images, allow rx pool=images-cache
client.cinder-backup
key: AQBq7+RcVOwGNRAAiwJ59ZvAUc0H4QkVeN82vA==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
pool=backups, allow rwx pool=backups-cache
client.glance
key: AQDf7uRc32hDBBAAkGucQEVTWqnIpNvihXf/Ng==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=images, 
allow rwx pool=images-cache
client.nova
key: AQDN7+RcqDABIxAAXnFcVjBp/S5GkgOy0wqB1Q==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow rwx 
pool=vms-cache, allow rwx pool=images, allow rwx pool=images-cache
client.radosgw.gateway
key: AQAU7uRccP06CBAA6zLFtDQoTstl8CNclYRugQ==
auid: 0
caps: [mon] allow rwx
caps: [osd] allow rwx
mgr.172.30.126.26
key: AQAr7uRclc52MhAA+GWCQEVnAHB01tMFpgJtTQ==
caps: [mds] allow *
caps: [mon] allow profile mgr
caps: [osd] allow *
mgr.172.30.126.27
key: AQAs7uRclkD2OBAAW/cUhcZEebZnQulqVodiXQ==
caps: [mds] allow *
caps: [mon] allow profile mgr
caps: [osd] allow *
mgr.172.30.126.28
key: AQAu7uRcT9OLBBAAZbEjb/N1NnZpIgfaAcThyQ==
caps: [mds] allow *
caps: [mon] allow profile mgr
caps: [osd] allow *
 


Can someone explain it to me? 
thanks!!



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple rbd images from different clusters

2019-06-04 Thread Burkhard Linke

Hi,

On 6/4/19 10:12 AM, CUZA Frédéric wrote:


Hi everyone,

We want to migrate data from one cluster (Hammer) to a new one 
(Mimic). We do not wish to upgrade the existing cluster, as all the 
hardware is EOS and we are upgrading the configuration of the servers.


We can’t find a “proper” way to mount two rbd images from two 
different clusters on the same host.


Does anyone know what the “good” procedure is to achieve this?



Just my 2 ct:

the 'rbd' command allows specifying a configuration file (-c). You need 
to set up two configuration files, one for each cluster. You can also use 
two different cluster names (--cluster option); AFAIK the name is only 
used to locate the configuration file. I'm not sure how well the kernel 
handles mapping RBDs from two different clusters.



If you only want to transfer RBDs from one cluster to another, you do 
not need to map and mount them; the 'rbd' command has the subcommands 
'export' and 'import'. You can pipe them together to avoid writing data 
to a local disk. This should be the fastest way to transfer the RBDs.
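
As an illustration of the per-cluster configuration file approach, here is a
rough sketch using the python rados/rbd bindings that copies one image between
the two clusters from a single host. All paths, the pool name 'rbd' and the
image name 'myimage' are placeholders, and it assumes a client whose
librados/librbd can talk to both the Hammer and the Mimic cluster:

import rados
import rbd

CHUNK = 8 * 1024 * 1024  # copy granularity; an arbitrary choice

# One handle per cluster, each pointing at its own configuration file
# and keyring (placeholder paths).
src_cluster = rados.Rados(conffile='/etc/ceph/hammer.conf',
                          conf={'keyring': '/etc/ceph/hammer.client.admin.keyring'})
dst_cluster = rados.Rados(conffile='/etc/ceph/mimic.conf',
                          conf={'keyring': '/etc/ceph/mimic.client.admin.keyring'})
src_cluster.connect()
dst_cluster.connect()

src_ioctx = src_cluster.open_ioctx('rbd')
dst_ioctx = dst_cluster.open_ioctx('rbd')

with rbd.Image(src_ioctx, 'myimage', read_only=True) as src:
    size = src.size()
    # Create the destination image with the same size, then copy in chunks.
    rbd.RBD().create(dst_ioctx, 'myimage', size)
    with rbd.Image(dst_ioctx, 'myimage') as dst:
        offset = 0
        while offset < size:
            length = min(CHUNK, size - offset)
            dst.write(src.read(offset, length), offset)
            offset += length

src_ioctx.close()
dst_ioctx.close()
src_cluster.shutdown()
dst_cluster.shutdown()

In practice the piped 'rbd export'/'rbd import' mentioned above is simpler and
handles sparse images better; the sketch is only meant to show that two clusters
can be addressed from one host by giving each connection its own configuration
file.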



Regards,

Burkhard

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Multiple rbd images from different clusters

2019-06-04 Thread CUZA Frédéric
Hi everyone,

We want to migrate data from one cluster (Hammer) to a new one (Mimic). We do 
not wish to upgrade the existing cluster, as all the hardware is EOS and we are 
upgrading the configuration of the servers.
We can't find a "proper" way to mount two rbd images from two different clusters 
on the same host.
Does anyone know what the "good" procedure is to achieve this?


Cheers and thanks,

Fred.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH MDS Damaged Metadata - recovery steps

2019-06-04 Thread James Wilkins
(Thanks, Yan, for confirming the fix - we'll implement it now)

@Marc

Yep - 3x replication on the metadata pools

We have 4 clusters (all running the same version) and have experienced metadata 
corruption on the majority of them at some time or other - normally a scan 
fixes it. I suspect it is down to the use case: think LAMP stacks with various 
Drupal/WordPress caching plugins, running within OpenShift containers and using 
CephFS as the storage backend. These clusters have all been life-cycled up from 
Jewel, if that matters.


Example:

# ceph osd dump | grep metadata
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 528539 flags hashpspool stripe_width 
0 application cephfs

The only other thing of note is that on this particular cluster the metadata 
pool is quite large for the number of files - see below re 281GiB - whereas the 
other clusters' metadata pools are a lot smaller for a similar dataset.

# ceph df
GLOBAL:
SIZEAVAIL   RAW USED %RAW USED
66.9TiB 29.1TiB  37.8TiB 56.44
POOLS:
NAMEID USED%USED MAX AVAIL OBJECTS
rbd 0  8.86TiB 64.57   4.86TiB  2637530
cephfs_data 1  2.59TiB 34.72   4.86TiB 25341863
cephfs_metadata 2   281GiB  5.34   4.86TiB  6755178

Cheers,

James

On 04/06/2019, 08:59, "Marc Roos"  wrote:

 
How did this get damaged? You had 3x replication on the pool?



-Original Message-
From: Yan, Zheng [mailto:uker...@gmail.com] 
Sent: dinsdag 4 juni 2019 1:14
To: James Wilkins
Cc: ceph-users
Subject: Re: [ceph-users] CEPH MDS Damaged Metadata - recovery steps

On Mon, Jun 3, 2019 at 3:06 PM James Wilkins 
 wrote:
>
> Hi all,
>
> After a bit of advice to ensure we’re approaching this the right way.
>
> (version: 12.2.12, multi-mds, dirfrag is enabled)
>
> We have corrupt meta-data as identified by ceph
>
> health: HEALTH_ERR
> 2 MDSs report damaged metadata
>
> Asking the mds via damage ls
>
> {
> "damage_type": "dir_frag",
> "id": 2265410500,
> "ino": 2199349051809,
> "frag": "*",
> "path": 
"/projects/17343-5bcdaf07f4055-managed-server-0/apache-echfq-data/html/s
hop/app/cache/prod/smarty/cache/iqitreviews/simple/21832/1"
> }
>
>
> We’ve done the steps outlined here -> 
> http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/ namely
>
> cephfs-journal-tool –fs:all journal reset (both ranks) 
> cephfs-data-scan scan extents / inodes / links has completed
>
> However when attempting to access the named folder we get:
>
> 2019-05-31 03:16:04.792274 7f56f6fb5700 -1 log_channel(cluster) log 
> [ERR] : dir 0x200136b41a1 object missing on disk; some files may be 
> lost 
> (/projects/17343-5bcdaf07f4055-managed-server-0/apache-echfq-data/html
> /shop/app/cache/prod/smarty/cache/iqitreviews/simple/21832/1)
>
> We get this error followed shortly by an MDS failover
>
> Two questions really
>
> What’s not immediately clear from the documentation is should we/do 
we also need to run the below?
>
> # Session table
> cephfs-table-tool 0 reset session
> # SnapServer
> cephfs-table-tool 0 reset snap
> # InoTable
> cephfs-table-tool 0 reset inode
> # Root inodes ("/" and MDS directory)
> cephfs-data-scan init
>

No, don't do this.

> And secondly – our current train of thought is we need to grab the 
inode number of the parent folder and delete this from the metadata pool 
via rados rmomapkey – is this correct?
>

Yes, find the inode number of directory 21832. Check if omap key '1_head'
exists in object .. If it exists,
remove it.

> Any input appreciated
>
> Cheers,
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
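
For what it's worth, a rough sketch of the check-and-remove step Yan suggests
above, using the python rados bindings. The object name is a placeholder for
the dirfrag object of the parent directory (the name elided in Yan's reply),
the pool name comes from the 'ceph osd dump' output above, the default admin
conf path is assumed, and the omap calls should be verified against the
bindings shipped with 12.2.x before touching a production metadata pool:

import rados

POOL = 'cephfs_metadata'
OBJ = '<parent-dir-ino-in-hex>.<frag>'   # placeholder for the dirfrag object name
KEY = '1_head'                           # dentry key for the directory named "1"

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx(POOL)

# Check whether the dentry key is present in the dirfrag object's omap.
with rados.ReadOpCtx() as read_op:
    it, _ = ioctx.get_omap_vals_by_keys(read_op, (KEY,))
    ioctx.operate_read_op(read_op, OBJ)
    present = KEY in dict(it)

if present:
    # Remove only that key; the rest of the dirfrag is left untouched.
    with rados.WriteOpCtx() as write_op:
        ioctx.remove_omap_keys(write_op, (KEY,))
        ioctx.operate_write_op(write_op, OBJ)

ioctx.close()
cluster.shutdown()

The same check and removal can of course be done with the 'rados listomapkeys'
and 'rados rmomapkey' commands Yan refers to; the sketch just shows the
scripted form.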




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH MDS Damaged Metadata - recovery steps

2019-06-04 Thread Marc Roos
 
How did this get damaged? You had 3x replication on the pool?



-Original Message-
From: Yan, Zheng [mailto:uker...@gmail.com] 
Sent: dinsdag 4 juni 2019 1:14
To: James Wilkins
Cc: ceph-users
Subject: Re: [ceph-users] CEPH MDS Damaged Metadata - recovery steps

On Mon, Jun 3, 2019 at 3:06 PM James Wilkins 
 wrote:
>
> Hi all,
>
> After a bit of advice to ensure we’re approaching this the right way.
>
> (version: 12.2.12, multi-mds, dirfrag is enabled)
>
> We have corrupt meta-data as identified by ceph
>
> health: HEALTH_ERR
> 2 MDSs report damaged metadata
>
> Asking the mds via damage ls
>
> {
> "damage_type": "dir_frag",
> "id": 2265410500,
> "ino": 2199349051809,
> "frag": "*",
> "path": 
"/projects/17343-5bcdaf07f4055-managed-server-0/apache-echfq-data/html/s
hop/app/cache/prod/smarty/cache/iqitreviews/simple/21832/1"
> }
>
>
> We’ve done the steps outlined here -> 
> http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/ namely
>
> cephfs-journal-tool –fs:all journal reset (both ranks) 
> cephfs-data-scan scan extents / inodes / links has completed
>
> However when attempting to access the named folder we get:
>
> 2019-05-31 03:16:04.792274 7f56f6fb5700 -1 log_channel(cluster) log 
> [ERR] : dir 0x200136b41a1 object missing on disk; some files may be 
> lost 
> (/projects/17343-5bcdaf07f4055-managed-server-0/apache-echfq-data/html
> /shop/app/cache/prod/smarty/cache/iqitreviews/simple/21832/1)
>
> We get this error followed shortly by an MDS failover
>
> Two questions really
>
> What’s not immediately clear from the documentation is should we/do 
we also need to run the below?
>
> # Session table
> cephfs-table-tool 0 reset session
> # SnapServer
> cephfs-table-tool 0 reset snap
> # InoTable
> cephfs-table-tool 0 reset inode
> # Root inodes ("/" and MDS directory)
> cephfs-data-scan init
>

No, don't do this.

> And secondly – our current train of thought is we need to grab the 
inode number of the parent folder and delete this from the metadata pool 
via rados rmomapkey – is this correct?
>

Yes, find the inode number of directory 21832. Check if omap key '1_head'
exists in object .. If it exists,
remove it.

> Any input appreciated
>
> Cheers,
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com