[ceph-users] CephFS Ganesha NFS for VMWare
Hello Ceph Users,

I am trialing CephFS / Ganesha NFS for VMware usage. We are on Mimic / CentOS 7.7 / 130 x 12TB 7200rpm OSDs / 13 hosts / 3x replica. So far the read performance has been great. The write performance (NFS sync) hasn't been great. We use a lot of 64KB NFS reads / writes and the latency is around 50-60ms from esxtop.

I have been benchmarking different CephFS block / stripe sizes but would like to hear what others have settled on. The default 4MB / 1 stripe doesn't seem to give great 64KB performance. I would also like to know if I am experiencing PG locking but haven't found a way to check for that.

Glen

This e-mail is intended solely for the benefit of the addressee(s) and any other named recipient. It is confidential and may contain legally privileged or confidential information. If you are not the recipient, any use, distribution, disclosure or copying of this e-mail is prohibited. The confidentiality and legal privilege attached to this communication is not waived or lost by reason of the mistaken transmission or delivery to you. If you have received this e-mail in error, please notify us immediately.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
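For anyone comparing layouts: the way a 64KB NFS write maps onto RADOS objects under a given CephFS file layout can be sketched with the standard striping arithmetic. This is a minimal sketch; the 64KB stripe-unit / 1MB object layout shown is a hypothetical candidate to benchmark, not a recommendation.

```python
def object_for_offset(offset, stripe_unit, stripe_count, object_size):
    """Map a file offset to (object index, offset within object) under
    CephFS/RADOS striping: stripe units are laid round-robin across
    stripe_count objects, object_size bytes per object."""
    stripes_per_object = object_size // stripe_unit
    blockno = offset // stripe_unit          # which stripe unit overall
    stripeno = blockno // stripe_count       # which stripe "row"
    stripepos = blockno % stripe_count       # which object within the set
    objectsetno = stripeno // stripes_per_object
    objectno = objectsetno * stripe_count + stripepos
    off_in_obj = (stripeno % stripes_per_object) * stripe_unit + offset % stripe_unit
    return objectno, off_in_obj

MB = 1 << 20
# Default layout: 4MB objects, stripe_count 1
print(object_for_offset(6 * MB, 4 * MB, 1, 4 * MB))     # -> (1, 2097152)
# Hypothetical small-object layout: 64KB stripe unit, 1MB objects
print(object_for_offset(6 * MB, 64 * 1024, 1, 1 * MB))  # -> (6, 0)
```

Layouts are set per-directory via the ceph.dir.layout.* virtual xattrs (e.g. setfattr -n ceph.dir.layout.object_size) and only apply to files created afterwards, so smaller objects can be trialled on a test share without touching existing data.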
Re: [ceph-users] Any CEPH's iSCSI gateway users?
Interesting performance increase! I'm using iSCSI at a few installations and now wonder what version of CentOS is required to improve performance. Did the cluster go from Luminous to Mimic?

Glen

-Original Message-
From: ceph-users On Behalf Of Heðin Ejdesgaard Møller
Sent: Saturday, 8 June 2019 8:00 AM
To: Paul Emmerich ; Igor Podlesny
Cc: Ceph Users
Subject: Re: [ceph-users] Any CEPH's iSCSI gateway users?

I recently upgraded a RHCS-3.0 cluster with 4 iGWs to RHCS-3.2 on top of RHEL-7.6. Big block size performance went from ~350MB/s to about 1100MB/s on each LUN, seen from a VM in vSphere-6.5, with data read from an SSD pool and written to an HDD pool, both being 3/2 replica. I have not experienced any hiccup since the upgrade.

You will always have a degree of performance hit when using the iGW, because it's both an extra layer between consumer and hardware, and a potential choke-point, just like any "traditional" iSCSI-based SAN solution.

If you are considering deploying the iGW on the upstream bits then I would recommend you stick to CentOS, since a lot of its development has happened on the RHEL platform.

Regards
Heðin Ejdesgaard
Synack sp/f

On frí, 2019-06-07 at 12:44 +0200, Paul Emmerich wrote:
> Hi,
>
> ceph-iscsi 3.0 fixes a lot of problems and limitations of the older gateway.
>
> Best way to run it on Debian/Ubuntu is to build it yourself
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at
> https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Tue, May 28, 2019 at 10:02 AM Igor Podlesny wrote:
> > What is your experience?
> > Does it make sense to use it -- is it solid enough or beta quality
> > rather (both in terms of stability and performance)?
> >
> > I've read it was more or less packaged to work with RHEL. Does it
> > hold true still?
> > What's the best way to install it on, say, CentOS or Debian/Ubuntu?
Re: [ceph-users] op_w_latency
Thanks for the updated command – much cleaner!

The OSD nodes have a single 6-core X5650 @ 2.67GHz, 72GB RAM and around 8 x 10TB HDD OSDs / 4 x 2TB SSD OSDs. CPU usage is around 20% and the RAM has 22GB available. The 3 MON nodes are the same but with no OSDs. The cluster has around 150 drives and is only doing 500-1000 ops overall. The network is dual 10Gbit using LACP, with a VLAN for private Ceph traffic and untagged for public.

Glen

From: Konstantin Shalygin
Sent: Wednesday, 3 April 2019 11:39 AM
To: Glen Baars
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] op_w_latency

Hello Ceph Users, I am finding that the write latency across my ceph clusters isn't great and I wanted to see what other people are getting for op_w_latency. Generally I am getting 70-110ms latency. I am using:

ceph --admin-daemon /var/run/ceph/ceph-osd.102.asok perf dump | grep -A3 '\"op_w_latency' | grep 'avgtime'

Better like this:

ceph daemon osd.102 perf dump | jq '.osd.op_w_latency.avgtime'

Ram, CPU and network don't seem to be the bottleneck. The drives are behind a Dell H810p RAID card with a 1GB writeback cache and battery. I have tried with LSI JBOD cards and haven't found it faster (as you would expect with write cache). The disks through iostat -xyz 1 show 10-30% usage with general service + write latency around 3-4ms. Queue depth is normally less than one. RocksDB write latency is around 0.6ms, read 1-2ms. Usage is RBD backend for CloudStack.

What is your hardware? Your CPU, RAM, Eth?

k
[ceph-users] op_w_latency
Hello Ceph Users,

I am finding that the write latency across my ceph clusters isn't great and I wanted to see what other people are getting for op_w_latency. Generally I am getting 70-110ms latency. I am using:

ceph --admin-daemon /var/run/ceph/ceph-osd.102.asok perf dump | grep -A3 '\"op_w_latency' | grep 'avgtime'

Ram, CPU and network don't seem to be the bottleneck. The drives are behind a Dell H810p RAID card with a 1GB writeback cache and battery. I have tried with LSI JBOD cards and haven't found it faster (as you would expect with write cache). The disks through iostat -xyz 1 show 10-30% usage with general service + write latency around 3-4ms. Queue depth is normally less than one. RocksDB write latency is around 0.6ms, read 1-2ms. Usage is RBD backend for CloudStack.

Dumping the ops seems to show the latency here (ceph --admin-daemon /var/run/ceph/ceph-osd.102.asok dump_historic_ops_by_duration | less):

{
    "time": "2019-04-01 22:24:38.432000",
    "event": "queued_for_pg"
},
{
    "time": "2019-04-01 22:24:38.438691",
    "event": "reached_pg"
},
{
    "time": "2019-04-01 22:24:38.438740",
    "event": "started"
},
{
    "time": "2019-04-01 22:24:38.727820",
    "event": "sub_op_started"
},
{
    "time": "2019-04-01 22:24:38.728448",
    "event": "sub_op_committed"
},
{
    "time": "2019-04-01 22:24:39.129175",
    "event": "commit_sent"
},
{
    "time": "2019-04-01 22:24:39.129231",
    "event": "done"
}
]
}
}

This write was a very slow one and I am wondering if I have a few ops that are taking a long time and most that are good. What else can I do to figure out where the issue is?
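To see where the time goes across many of these dumps, the event timestamps can be diffed programmatically instead of eyeballing them. A small sketch using the events from the dump above (timestamp format as emitted by Mimic):

```python
from datetime import datetime

# Event timestamps taken from the historic-ops dump above.
events = [
    ("queued_for_pg",    "2019-04-01 22:24:38.432000"),
    ("reached_pg",       "2019-04-01 22:24:38.438691"),
    ("started",          "2019-04-01 22:24:38.438740"),
    ("sub_op_started",   "2019-04-01 22:24:38.727820"),
    ("sub_op_committed", "2019-04-01 22:24:38.728448"),
    ("commit_sent",      "2019-04-01 22:24:39.129175"),
    ("done",             "2019-04-01 22:24:39.129231"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")

# Compute the gap leading up to each event, in milliseconds.
deltas = []
for (prev_ev, prev_ts), (ev, ts) in zip(events, events[1:]):
    ms = (parse(ts) - parse(prev_ts)).total_seconds() * 1000.0
    deltas.append((ev, ms))
    print(f"{prev_ev:>16} -> {ev:<16} {ms:8.3f} ms")

worst = max(deltas, key=lambda d: d[1])
print("largest gap before:", worst[0])
```

For this particular op the two big gaps are before sub_op_started (~289ms, waiting on the replica) and before commit_sent (~401ms), which matches the slow sections discussed elsewhere in this thread. The same parsing applies to the full JSON from dump_historic_ops_by_duration.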
[ceph-users] When to use a separate RocksDB SSD
Hello Ceph,

What is the best way to find out how the RocksDB is currently performing? I need to build a business case for NVMe devices for RocksDB.

Kind regards,
Glen Baars
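One commonly cited rule of thumb (not an official requirement, and workload-dependent: RGW-heavy clusters need more DB space than RBD-only ones) is to size the BlueStore DB device at roughly 1-4% of the OSD's raw capacity. For a 12TB spinner the arithmetic looks like:

```python
def db_partition_gib(osd_capacity_tb, pct):
    """Rule-of-thumb BlueStore DB sizing: pct percent of raw OSD capacity.
    Approximate: treats 1 TB as 1024 GiB for a conservative estimate."""
    return osd_capacity_tb * 1024 * pct / 100.0

for pct in (1, 2, 4):
    print(f"{pct}% of a 12 TB OSD -> {db_partition_gib(12, pct):.0f} GiB DB device")
```

Before buying hardware, the rocksdb section of `ceph daemon osd.N perf dump` (counters such as submit_latency and get_latency) should show whether DB operations are actually the bottleneck on the current spinners.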
Re: [ceph-users] Slow OPS
Hello Brad,

It doesn't seem to be a set of OSDs, the cluster has 160ish OSDs over 9 hosts. I seem to get a lot of these ops also that don't show a client.

    "description": "osd_repop(client.14349712.0:4866968 15.36 e30675/22264 15:6dd17247:::rbd_data.2359ef6b8b4567.0042766a:head v 30675'5522366)",
    "initiated_at": "2019-03-21 16:51:56.862447",
    "age": 376.527241,
    "duration": 1.331278,

Kind regards,
Glen Baars

-Original Message-
From: Brad Hubbard
Sent: Thursday, 21 March 2019 1:43 PM
To: Glen Baars
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow OPS

Actually, the lag is between "sub_op_committed" and "commit_sent". Is there any pattern to these slow requests? Do they involve the same osd, or set of osds?

On Thu, Mar 21, 2019 at 3:37 PM Brad Hubbard wrote:
>
> On Thu, Mar 21, 2019 at 3:20 PM Glen Baars wrote:
> >
> > Thanks for that - we seem to be experiencing the wait in this section of the ops.
> >
> > {
> >     "time": "2019-03-21 14:12:42.830191",
> >     "event": "sub_op_committed"
> > },
> > {
> >     "time": "2019-03-21 14:12:43.699872",
> >     "event": "commit_sent"
> > },
> >
> > Does anyone know what that section is waiting for?
>
> Hi Glen,
>
> These are documented, to some extent, here.
>
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
>
> It looks like it may be taking a long time to communicate the commit
> message back to the client? Are these slow ops always the same client?
>
> > Kind regards,
> > Glen Baars
> >
> > -Original Message-
> > From: Brad Hubbard
> > Sent: Thursday, 21 March 2019 8:23 AM
> > To: Glen Baars
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Slow OPS
> >
> > On Thu, Mar 21, 2019 at 12:11 AM Glen Baars wrote:
> > >
> > > Hello Ceph Users,
> > >
> > > Does anyone know what the flag point 'Started' is? Is that ceph osd
> > > daemon waiting on the disk subsystem?
> > This is set by "mark_started()" and is roughly set when the pg starts
> > processing the op. Might want to capture dump_historic_ops output after the
> > op completes.
> >
> > > Ceph 13.2.4 on centos 7.5
> > >
> > > "description": "osd_op(client.1411875.0:422573570 5.18ds0 5:b1ed18e5:::rbd_data.6.cf7f46b8b4567.0046e41a:head [read 1703936~16384] snapc 0=[] ondisk+read+known_if_redirected e30622)",
> > > "initiated_at": "2019-03-21 01:04:40.598438",
> > > "age": 11.340626,
> > > "duration": 11.342846,
> > > "type_data": {
> > >     "flag_point": "started",
> > >     "client_info": {
> > >         "client": "client.1411875",
> > >         "client_addr": "10.4.37.45:0/627562602",
> > >         "tid": 422573570
> > >     },
> > >     "events": [
> > >         { "time": "2019-03-21 01:04:40.598438", "event": "initiated" },
> > >         { "time": "2019-03-21 01:04:40.598438", "event": "header_read" },
> > >         { "time": "2019-03-21 01:04:40.598439", "event": "throttled" },
Re: [ceph-users] Slow OPS
Thanks for that - we seem to be experiencing the wait in this section of the ops.

{
    "time": "2019-03-21 14:12:42.830191",
    "event": "sub_op_committed"
},
{
    "time": "2019-03-21 14:12:43.699872",
    "event": "commit_sent"
},

Does anyone know what that section is waiting for?

Kind regards,
Glen Baars

-Original Message-
From: Brad Hubbard
Sent: Thursday, 21 March 2019 8:23 AM
To: Glen Baars
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow OPS

On Thu, Mar 21, 2019 at 12:11 AM Glen Baars wrote:
>
> Hello Ceph Users,
>
> Does anyone know what the flag point 'Started' is? Is that ceph osd daemon
> waiting on the disk subsystem?

This is set by "mark_started()" and is roughly set when the pg starts processing the op. Might want to capture dump_historic_ops output after the op completes.

>
> Ceph 13.2.4 on centos 7.5
>
> "description": "osd_op(client.1411875.0:422573570 5.18ds0 5:b1ed18e5:::rbd_data.6.cf7f46b8b4567.0046e41a:head [read 1703936~16384] snapc 0=[] ondisk+read+known_if_redirected e30622)",
> "initiated_at": "2019-03-21 01:04:40.598438",
> "age": 11.340626,
> "duration": 11.342846,
> "type_data": {
>     "flag_point": "started",
>     "client_info": {
>         "client": "client.1411875",
>         "client_addr": "10.4.37.45:0/627562602",
>         "tid": 422573570
>     },
>     "events": [
>         { "time": "2019-03-21 01:04:40.598438", "event": "initiated" },
>         { "time": "2019-03-21 01:04:40.598438", "event": "header_read" },
>         { "time": "2019-03-21 01:04:40.598439", "event": "throttled" },
>         { "time": "2019-03-21 01:04:40.598450", "event": "all_read" },
>         { "time": "2019-03-21 01:04:40.598499", "event": "dispatched" },
>         { "time": "2019-03-21 01:04:40.598504", "event": "queued_for_pg" },
>         { "time": "2019-03-21 01:04:40.598883", "event": "reached_pg" },
>         { "time": "2019-03-21 01:04:40.598905", "event": "started" }
>     ]
> }
> }
> ],
> Glen

--
Cheers,
Brad
[ceph-users] Slow OPS
Hello Ceph Users,

Does anyone know what the flag point 'Started' is? Is that ceph osd daemon waiting on the disk subsystem?

Ceph 13.2.4 on centos 7.5

"description": "osd_op(client.1411875.0:422573570 5.18ds0 5:b1ed18e5:::rbd_data.6.cf7f46b8b4567.0046e41a:head [read 1703936~16384] snapc 0=[] ondisk+read+known_if_redirected e30622)",
"initiated_at": "2019-03-21 01:04:40.598438",
"age": 11.340626,
"duration": 11.342846,
"type_data": {
    "flag_point": "started",
    "client_info": {
        "client": "client.1411875",
        "client_addr": "10.4.37.45:0/627562602",
        "tid": 422573570
    },
    "events": [
        { "time": "2019-03-21 01:04:40.598438", "event": "initiated" },
        { "time": "2019-03-21 01:04:40.598438", "event": "header_read" },
        { "time": "2019-03-21 01:04:40.598439", "event": "throttled" },
        { "time": "2019-03-21 01:04:40.598450", "event": "all_read" },
        { "time": "2019-03-21 01:04:40.598499", "event": "dispatched" },
        { "time": "2019-03-21 01:04:40.598504", "event": "queued_for_pg" },
        { "time": "2019-03-21 01:04:40.598883", "event": "reached_pg" },
        { "time": "2019-03-21 01:04:40.598905", "event": "started" }
    ]
}
}
],

Glen
Re: [ceph-users] Mimic 13.2.4 rbd du slowness
Here is the strace result.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.94    0.236170         790       299         5 futex
  0.06    0.000136           0       365           brk
  0.00    0.000000           0        41         2 read
  0.00    0.000000           0        48           write
  0.00    0.000000           0        72        27 open
  0.00    0.000000           0        43           close
  0.00    0.000000           0        10         5 stat
  0.00    0.000000           0        36           fstat
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0       103           mmap
  0.00    0.000000           0        70           mprotect
  0.00    0.000000           0        19           munmap
  0.00    0.000000           0        11           rt_sigaction
  0.00    0.000000           0        32           rt_sigprocmask
  0.00    0.000000           0        26        26 access
  0.00    0.000000           0         3           pipe
  0.00    0.000000           0        19           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         7           uname
  0.00    0.000000           0        12           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           sysinfo
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           prctl
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           gettid
  0.00    0.000000           0         3           epoll_create
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           membarrier
------ ----------- ----------- --------- --------- ----------------
100.00    0.236306                  1231        65 total

From: David Turner
Sent: Friday, 1 March 2019 11:46 AM
To: Glen Baars
Cc: Wido den Hollander ; ceph-users
Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness

Have you used strace on the du command to see what it's spending its time doing?

On Thu, Feb 28, 2019, 8:45 PM Glen Baars <g...@onsitecomputers.com.au> wrote:

Hello Wido,

The cluster layout is as follows:
3 x Monitor hosts (2 x 10Gbit bonded)
9 x OSD hosts (2 x 10Gbit bonded, LSI CacheCade and write cache, drives set to single, all HDD in this pool, no separate DB / WAL. With the write cache and the SSD read cache on the LSI card it seems to perform well.)
168 OSD disks

No major increase in OSD disk usage or CPU usage. The rbd du process uses 100% of a single 2.4GHz core while running - I think that is the limiting factor.

I have just tried removing most of the snapshots for that volume (from 14 snapshots down to 1 snapshot) and the rbd du command now takes around 2-3 minutes.

Kind regards,
Glen Baars

-Original Message-
From: Wido den Hollander <w...@42on.com>
Sent: Thursday, 28 February 2019 5:05 PM
To: Glen Baars <g...@onsitecomputers.com.au>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness

On 2/28/19 9:41 AM, Glen Baars wrote:
> Hello Wido,
>
> I have looked at the libvirt code and there is a check to ensure that
> fast-diff is enabled on the image and only then does it try to get the real
> disk usage. The issue for me is that even with fast-diff enabled it takes
> 25min to get the space usage for a 50TB image.
>
> I had considered turning off fast-diff on the large images to get
> around the issue but I think that will hurt my snapshot removal times
> (untested)
>
Can you tell a bit more about the Ceph cluster? HDD? SSD? DB and WAL on SSD?

Do you see OSDs spike in CPU or Disk I/O when you do a 'rbd du' on these images?

Wido

> I can't see in the code any other way of bypassing the disk usage check but I
> am not that familiar with the code.
>
> ---
> if (volStorageBackendRBDUseFastDiff(features)) {
>     VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. "
>               "Querying for actual allocation",
>               def->source.name, vol->name);
>
>     if (virStorageBackendRBDSetAllocation(vol, image, &info) < 0)
>         goto cleanup;
> } else {
>     vol->target.allocation = info.obj_size * info.num_objs;
> }
> ---
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Wido den Hollander <w...@42on.com>
> Sent: Thursday, 28 February 2019 3:49 PM
> To: Glen Baars <g...@onsitecomputers.com.au>;
> cep
Re: [ceph-users] Mimic 13.2.4 rbd du slowness
Hello Wido,

The cluster layout is as follows:
3 x Monitor hosts (2 x 10Gbit bonded)
9 x OSD hosts (2 x 10Gbit bonded, LSI CacheCade and write cache, drives set to single, all HDD in this pool, no separate DB / WAL. With the write cache and the SSD read cache on the LSI card it seems to perform well.)
168 OSD disks

No major increase in OSD disk usage or CPU usage. The rbd du process uses 100% of a single 2.4GHz core while running - I think that is the limiting factor.

I have just tried removing most of the snapshots for that volume (from 14 snapshots down to 1 snapshot) and the rbd du command now takes around 2-3 minutes.

Kind regards,
Glen Baars

-Original Message-
From: Wido den Hollander
Sent: Thursday, 28 February 2019 5:05 PM
To: Glen Baars ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness

On 2/28/19 9:41 AM, Glen Baars wrote:
> Hello Wido,
>
> I have looked at the libvirt code and there is a check to ensure that
> fast-diff is enabled on the image and only then does it try to get the real
> disk usage. The issue for me is that even with fast-diff enabled it takes
> 25min to get the space usage for a 50TB image.
>
> I had considered turning off fast-diff on the large images to get
> around the issue but I think that will hurt my snapshot removal times
> (untested)
>
Can you tell a bit more about the Ceph cluster? HDD? SSD? DB and WAL on SSD?

Do you see OSDs spike in CPU or Disk I/O when you do a 'rbd du' on these images?

Wido

> I can't see in the code any other way of bypassing the disk usage check but I
> am not that familiar with the code.
>
> ---
> if (volStorageBackendRBDUseFastDiff(features)) {
>     VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. "
>               "Querying for actual allocation",
>               def->source.name, vol->name);
>
>     if (virStorageBackendRBDSetAllocation(vol, image, &info) < 0)
>         goto cleanup;
> } else {
>     vol->target.allocation = info.obj_size * info.num_objs;
> }
> ---
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Wido den Hollander
> Sent: Thursday, 28 February 2019 3:49 PM
> To: Glen Baars ; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness
>
> On 2/28/19 2:59 AM, Glen Baars wrote:
>> Hello Ceph Users,
>>
>> Has anyone found a way to improve the speed of the rbd du command on large
>> rbd images? I have object map and fast diff enabled - no invalid flags on
>> the image or its snapshots.
>>
>> We recently upgraded our Ubuntu 16.04 KVM servers for Cloudstack to Ubuntu
>> 18.04. The upgrade moves libvirt to version 4. When libvirt 4 adds an rbd
>> pool it discovers all images in the pool and tries to get their disk usage.
>> We are seeing a 50TB image take 25min. The pool has over 300TB of images in
>> it and takes hours for libvirt to start.
>>
> This is actually a pretty bad thing imho. As a lot of images people will be
> using do not have fast-diff enabled (images from the past) and that will kill
> their performance.
>
> Isn't there a way to turn this off in libvirt?
>
> Wido
>
>> We can replicate the issue without libvirt by just running an rbd du on the
>> large images. The limiting factor is the CPU on the rbd du command, it uses
>> 100% of a single core.
>>
>> Our cluster is completely bluestore/mimic 13.2.4. 168 OSDs, 12 Ubuntu 16.04
>> hosts.
>>
>> Kind regards,
>> Glen Baars
Re: [ceph-users] Mimic 13.2.4 rbd du slowness
Hello Wido,

I have looked at the libvirt code and there is a check to ensure that fast-diff is enabled on the image and only then does it try to get the real disk usage. The issue for me is that even with fast-diff enabled it takes 25min to get the space usage for a 50TB image.

I had considered turning off fast-diff on the large images to get around the issue but I think that will hurt my snapshot removal times (untested).

I can't see in the code any other way of bypassing the disk usage check but I am not that familiar with the code.

---
if (volStorageBackendRBDUseFastDiff(features)) {
    VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. "
              "Querying for actual allocation",
              def->source.name, vol->name);

    if (virStorageBackendRBDSetAllocation(vol, image, &info) < 0)
        goto cleanup;
} else {
    vol->target.allocation = info.obj_size * info.num_objs;
}
---

Kind regards,
Glen Baars

-Original Message-
From: Wido den Hollander
Sent: Thursday, 28 February 2019 3:49 PM
To: Glen Baars ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness

On 2/28/19 2:59 AM, Glen Baars wrote:
> Hello Ceph Users,
>
> Has anyone found a way to improve the speed of the rbd du command on large
> rbd images? I have object map and fast diff enabled - no invalid flags on the
> image or its snapshots.
>
> We recently upgraded our Ubuntu 16.04 KVM servers for Cloudstack to Ubuntu
> 18.04. The upgrade moves libvirt to version 4. When libvirt 4 adds an rbd pool
> it discovers all images in the pool and tries to get their disk usage. We are
> seeing a 50TB image take 25min. The pool has over 300TB of images in it and
> takes hours for libvirt to start.
>
This is actually a pretty bad thing imho. As a lot of images people will be using do not have fast-diff enabled (images from the past) and that will kill their performance.

Isn't there a way to turn this off in libvirt?

Wido

> We can replicate the issue without libvirt by just running an rbd du on the
> large images. The limiting factor is the CPU on the rbd du command, it uses
> 100% of a single core.
>
> Our cluster is completely bluestore/mimic 13.2.4. 168 OSDs, 12 Ubuntu 16.04
> hosts.
>
> Kind regards,
> Glen Baars
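The effect of the two libvirt branches quoted above can be sketched in a few lines (the helper name below is hypothetical; the point is that the non-fast-diff fallback is pure arithmetic, while the fast-diff path has to walk the whole object map of the image, which is what burns the CPU core on a 50TB image):

```python
def reported_allocation(fast_diff_enabled, obj_size, num_objs, actual_alloc=None):
    """Mirror the libvirt logic quoted above: with fast-diff it queries the
    cluster for real usage (the slow rbd-du path); otherwise it simply
    reports the fully provisioned size."""
    if fast_diff_enabled:
        return actual_alloc          # what the expensive query would return
    return obj_size * num_objs       # cheap fallback: assume fully allocated

TB = 1 << 40
obj_size = 4 << 20                   # default 4 MiB RBD objects
num_objs = (50 * TB) // obj_size     # a 50 TB image -> 13,107,200 objects
print(reported_allocation(False, obj_size, num_objs) == 50 * TB)  # True
```

So disabling fast-diff would make libvirt start instantly but report every image as fully allocated, and (as noted above) would likely slow snapshot removal.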
[ceph-users] Mimic 13.2.4 rbd du slowness
Hello Ceph Users,

Has anyone found a way to improve the speed of the rbd du command on large rbd images? I have object map and fast-diff enabled - no invalid flags on the image or its snapshots.

We recently upgraded our Ubuntu 16.04 KVM servers for Cloudstack to Ubuntu 18.04. The upgrade moves libvirt to version 4. When libvirt 4 adds an rbd pool it discovers all images in the pool and tries to get their disk usage. We are seeing a 50TB image take 25min. The pool has over 300TB of images in it and it takes hours for libvirt to start.

We can replicate the issue without libvirt by just running an rbd du on the large images. The limiting factor is the CPU on the rbd du command, it uses 100% of a single core.

Our cluster is completely bluestore/mimic 13.2.4. 168 OSDs, 12 Ubuntu 16.04 hosts.

Kind regards,
Glen Baars
[ceph-users] Mimic Bluestore memory optimization
Hello Ceph!

I am tracking down a performance issue with some of our mimic 13.2.4 OSDs. It feels like a lack of memory but I have no real proof of the issue. I have used the memory profiling (pprof tool) and the OSDs are maintaining their 4GB allocated limit.

My questions are:

1. How do you know if the allocated memory is enough for the OSD? My 1TB disks and 12TB disks take the same memory and I wonder if the OSDs should have memory allocated based on the size of the disks?
2. In the past, SSD disks needed 3 times the memory and now they don't - why is that? (1GB RAM per HDD and 3GB RAM per SSD both went to 4GB.)
3. I have read that the number of placement groups per OSD is a significant factor in the memory usage. Generally I have ~200 placement groups per OSD; this is at the higher end of the recommended values and I wonder if it's causing high memory usage?

For reference the hosts are 1 x 6-core CPU, 72GB RAM, 14 OSDs, 2 x 10Gbit. LSI CacheCade / writeback cache for the HDDs and LSI JBOD for the SSDs. 9 hosts in this cluster.

Kind regards,
Glen Baars
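On question 3, the per-OSD PG count is just arithmetic over the pool settings. A quick sketch (the pool numbers below are hypothetical, picked only to roughly reproduce the ~200 figure on a 9-host x 14-OSD cluster):

```python
def pg_shards_per_osd(pools, num_osds):
    """Average number of PG replicas ("shards") each OSD hosts:
    sum over pools of pg_num * replica size, divided by OSD count."""
    return sum(pg_num * size for pg_num, size in pools) / num_osds

# Hypothetical pools as (pg_num, replica size), 9 hosts x 14 OSDs = 126 OSDs
pools = [(8192, 3), (256, 3)]
print(round(pg_shards_per_osd(pools, 9 * 14)))   # -> 201
```

On questions 1 and 2: in Mimic the BlueStore cache is governed by the osd_memory_target option (default 4GiB), which matches the 4GB ceiling observed above. It is a flat per-daemon target, not scaled by disk capacity or media type, which would explain why the 1TB and 12TB OSDs, and the HDDs and SSDs, all sit at the same usage.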
[ceph-users] Segfaults on 12.2.9 and 12.2.8
dedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x55565ee0c1a4] 18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55565ee0f1e0] 19: (()+0x76ba) [0x7fec8af206ba] 20: (clone()+0x6d) [0x7fec89f9741d] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. Kind regards, Glen Baars
[ceph-users] Hyper-V iSCSI support
Hello Ceph Users, We have been using ceph-iscsi-cli for some time now with VMware and it is performing OK. We would like to use the same iSCSI service to store our Hyper-V VMs via Windows Clustered Shared Volumes. When we add the volume to Windows Failover Manager we get a "device is not ready" error. I am assuming this is due to SCSI-3 persistent reservations. Has anyone managed to get Ceph to serve iSCSI to Windows Clustered Shared Volumes? If so, how? Kind regards, Glen Baars
Re: [ceph-users] Invalid Object map without flags set
Hello K, We have found our issue – we were only fixing the main RBD image in our script rather than the snapshots. Working fine now. Thanks for your help. Kind regards, Glen Baars From: Konstantin Shalygin Sent: Friday, 17 August 2018 11:20 AM To: ceph-users@lists.ceph.com; Glen Baars Subject: Re: [ceph-users] Invalid Object map without flags set We are having issues with ensuring that object-map and fast-diff are working correctly. Most of the time when there is an invalid fast-diff map, the flag is set to correctly indicate this. We have a script that checks for this and rebuilds object maps as required. If we don't fix these, snapshot removal and rbd usage commands are too slow. About 10% of the time when we issue an `rbd du` command we get the following situation. warning: fast-diff map is invalid for 2d6b4502-f720-4c00-b4a4-8879e415f283@18-D-2018-07-11:1109. operation may be slow. When we check the `rbd info` it doesn't have any flags set. [INFO] {"name":"2d6b4502-f720-4c00-b4a4-8879e415f283","size":536870912000,"objects":128000,"order":22,"object_size":4194304,"block_name_prefix":"rbd_data.17928643c9869","format":2,"features":["layering","exclusive-lock","object-map","fast-diff","deep-flatten"],"flags":[],"create_timestamp":"Sat Apr 28 19:45:59 2018"} [Feat]["layering","exclusive-lock","object-map","fast-diff","deep-flatten"] [Flag][] Is there another way to detect invalid object maps? Ceph 12.2.7 - All Bluestore As long as I remember, when object-map is bad you will see this flag on rbd info. for e in `rbd ls replicated_rbd`; do echo "replicated_rbd/${e}"; rbd info replicated_rbd/${e} | grep "flag"; done k
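The fix described above, rebuilding the snapshots' object maps as well as the image's, might look like this; the pool name and the use of jq are assumptions:

```shell
#!/bin/sh
# Sketch: rebuild invalid object maps for every image AND its snapshots.
# Pool name is an example; jq is assumed available for JSON parsing.
POOL=replicated_rbd
for IMG in $(rbd ls "$POOL"); do
  # Rebuild the image's own object map if it is flagged invalid.
  if rbd info "$POOL/$IMG" | grep -q 'invalid'; then
    rbd object-map rebuild "$POOL/$IMG"
  fi
  # Repeat for each snapshot -- the part originally missed by the script.
  for SNAP in $(rbd snap ls "$POOL/$IMG" --format json | jq -r '.[].name'); do
    if rbd info "$POOL/$IMG@$SNAP" | grep -q 'invalid'; then
      rbd object-map rebuild "$POOL/$IMG@$SNAP"
    fi
  done
done
```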
[ceph-users] Invalid Object map without flags set
Hello Ceph Users, We are having issues with ensuring that object-map and fast-diff are working correctly. Most of the time when there is an invalid fast-diff map, the flag is set to correctly indicate this. We have a script that checks for this and rebuilds object maps as required. If we don't fix these, snapshot removal and rbd usage commands are too slow. About 10% of the time when we issue an `rbd du` command we get the following situation. warning: fast-diff map is invalid for 2d6b4502-f720-4c00-b4a4-8879e415f283@18-D-2018-07-11:1109. operation may be slow. When we check the `rbd info` it doesn't have any flags set. [INFO] {"name":"2d6b4502-f720-4c00-b4a4-8879e415f283","size":536870912000,"objects":128000,"order":22,"object_size":4194304,"block_name_prefix":"rbd_data.17928643c9869","format":2,"features":["layering","exclusive-lock","object-map","fast-diff","deep-flatten"],"flags":[],"create_timestamp":"Sat Apr 28 19:45:59 2018"} [Feat]["layering","exclusive-lock","object-map","fast-diff","deep-flatten"] [Flag][] Is there another way to detect invalid object maps? Ceph 12.2.7 - All Bluestore Kind regards, Glen Baars
Re: [ceph-users] RBD journal feature
Thanks for your help. Kind regards, Glen Baars From: Jason Dillaman Sent: Thursday, 16 August 2018 10:21 PM To: Glen Baars Cc: ceph-users Subject: Re: [ceph-users] RBD journal feature On Thu, Aug 16, 2018 at 2:37 AM Glen Baars <g...@onsitecomputers.com.au> wrote: Is there any workaround that you can think of to correctly enable journaling on locked images? You could add the "rbd journal pool = XYZ" configuration option to the ceph.conf on the hosts currently using the images (or use 'rbd image-meta set conf_rbd_journal_pool SSDPOOL' on each image), restart/live-migrate the affected VMs(?) to pick up the config changes, and enable journaling. Kind regards, Glen Baars From: ceph-users <ceph-users-boun...@lists.ceph.com> On Behalf Of Glen Baars Sent: Tuesday, 14 August 2018 9:36 PM To: dilla...@redhat.com Cc: ceph-users <ceph-users@lists.ceph.com> Subject: Re: [ceph-users] RBD journal feature Hello Jason, Thanks for your help. Here is the output you asked for also. https://pastebin.com/dKH6mpwk Kind regards, Glen Baars From: Jason Dillaman <jdill...@redhat.com> Sent: Tuesday, 14 August 2018 9:33 PM To: Glen Baars <g...@onsitecomputers.com.au> Cc: ceph-users <ceph-users@lists.ceph.com> Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 9:31 AM Glen Baars <g...@onsitecomputers.com.au> wrote: Hello Jason, I have now narrowed it down. If the image has an exclusive lock – the journal doesn’t go on the correct pool. OK, that makes sense. If you have an active client on the image holding the lock, the request to enable journaling is sent over to that client but it's missing all the journal options. I'll open a tracker ticket to fix the issue. Thanks. 
Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 9:29 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 9:19 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. it doesn’t seem to make a difference. It should be SSDPOOL, but regardless, I am at a loss as to why it's not working for you. You can try appending "--debug-rbd=20" to the end of the "rbd feature enable" command and provide the generated logs in a pastebin link. Also, here is the output: rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a There are 0 metadata on this image. Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 9:00 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling journaling on a different pool: $ rbd info rbd/foo rbd image 'foo': size 1024 MB in 256 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.101e6b8b4567 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten flags: create_timestamp: Tue Aug 14 08:51:19 2018 $ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd $ rbd journal info --pool rbd --image foo rbd journal '101e6b8b4567': header_oid: journal.101e6b8b4567 object_oid_prefix: journal_data.1.101e6b8b4567. order: 24 (16384 kB objects) splay_width: 4 object_pool: rbd_ssd Can you please run "rbd image-meta list " to see if you are overwriting any configuration settings? Do you have any client configuration overrides in your "/etc/ceph/ceph.conf"? 
On Tue, Aug 14, 2018 at 8:25 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I will also complete testing of a few combinations tomorrow to try and isolate the issue now that we can get it to work with a new image. The cluster started out at 12.2.3 bluestore so there shouldn’t be any old issues from previous versions. Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 7:43 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 4:08 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I can confirm that your tests work on our cluster with a newly created image. We still can’t get the current image
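Putting the suggested workaround together for an image held by an active client (pool and image names are taken from the thread; treat this as a sketch, not a verified procedure):

```shell
# Set the journal pool as per-image metadata so any client opening the
# image picks it up:
rbd image-meta set RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a \
    conf_rbd_journal_pool RBD_SSD
# Restart or live-migrate the VM holding the exclusive lock so the new
# per-image setting takes effect, then enable journaling:
rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling
# Verify the journal landed on the SSD pool (look for object_pool):
rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
```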
Re: [ceph-users] RBD journal feature
Is there any workaround that you can think of to correctly enable journaling on locked images? Kind regards, Glen Baars From: ceph-users On Behalf Of Glen Baars Sent: Tuesday, 14 August 2018 9:36 PM To: dilla...@redhat.com Cc: ceph-users Subject: Re: [ceph-users] RBD journal feature Hello Jason, Thanks for your help. Here is the output you asked for also. https://pastebin.com/dKH6mpwk Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 9:33 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 9:31 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I have now narrowed it down. If the image has an exclusive lock – the journal doesn’t go on the correct pool. OK, that makes sense. If you have an active client on the image holding the lock, the request to enable journaling is sent over to that client but it's missing all the journal options. I'll open a tracker ticket to fix the issue. Thanks. Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 9:29 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 9:19 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. it doesn’t seem to make a difference. It should be SSDPOOL, but regardless, I am at a loss as to why it's not working for you. You can try appending "--debug-rbd=20" to the end of the "rbd feature enable" command and provide the generated logs in a pastebin link. Also, here is the output: rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a There are 0 metadata on this image. 
Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 9:00 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling journaling on a different pool: $ rbd info rbd/foo rbd image 'foo': size 1024 MB in 256 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.101e6b8b4567 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten flags: create_timestamp: Tue Aug 14 08:51:19 2018 $ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd $ rbd journal info --pool rbd --image foo rbd journal '101e6b8b4567': header_oid: journal.101e6b8b4567 object_oid_prefix: journal_data.1.101e6b8b4567. order: 24 (16384 kB objects) splay_width: 4 object_pool: rbd_ssd Can you please run "rbd image-meta list " to see if you are overwriting any configuration settings? Do you have any client configuration overrides in your "/etc/ceph/ceph.conf"? On Tue, Aug 14, 2018 at 8:25 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I will also complete testing of a few combinations tomorrow to try and isolate the issue now that we can get it to work with a new image. The cluster started out at 12.2.3 bluestore so there shouldn’t be any old issues from previous versions. Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 7:43 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 4:08 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I can confirm that your tests work on our cluster with a newly created image. 
We still can’t get the current images to use a different object pool. Do you think that maybe another feature is incompatible with this feature? Below is a log of the issue. I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 just in case it's an issue that's only in the luminous release. :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a': size 51200 MB in 12800 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.37c8974b0dc51 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten flags: create_timestamp: Sat May 5 11:39:07 2018 :~# rbd journal info --pool RBD_HD
Re: [ceph-users] RBD journal feature
Hello Jason, Thanks for your help. Here is the output you asked for also. https://pastebin.com/dKH6mpwk Kind regards, Glen Baars From: Jason Dillaman Sent: Tuesday, 14 August 2018 9:33 PM To: Glen Baars Cc: ceph-users Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 9:31 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I have now narrowed it down. If the image has an exclusive lock – the journal doesn’t go on the correct pool. OK, that makes sense. If you have an active client on the image holding the lock, the request to enable journaling is sent over to that client but it's missing all the journal options. I'll open a tracker ticket to fix the issue. Thanks. Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 9:29 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 9:19 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. it doesn’t seem to make a difference. It should be SSDPOOL, but regardless, I am at a loss as to why it's not working for you. You can try appending "--debug-rbd=20" to the end of the "rbd feature enable" command and provide the generated logs in a pastebin link. Also, here is the output: rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a There are 0 metadata on this image. 
Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 9:00 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling journaling on a different pool: $ rbd info rbd/foo rbd image 'foo': size 1024 MB in 256 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.101e6b8b4567 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten flags: create_timestamp: Tue Aug 14 08:51:19 2018 $ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd $ rbd journal info --pool rbd --image foo rbd journal '101e6b8b4567': header_oid: journal.101e6b8b4567 object_oid_prefix: journal_data.1.101e6b8b4567. order: 24 (16384 kB objects) splay_width: 4 object_pool: rbd_ssd Can you please run "rbd image-meta list " to see if you are overwriting any configuration settings? Do you have any client configuration overrides in your "/etc/ceph/ceph.conf"? On Tue, Aug 14, 2018 at 8:25 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I will also complete testing of a few combinations tomorrow to try and isolate the issue now that we can get it to work with a new image. The cluster started out at 12.2.3 bluestore so there shouldn’t be any old issues from previous versions. Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 7:43 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 4:08 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I can confirm that your tests work on our cluster with a newly created image. 
We still can’t get the current images to use a different object pool. Do you think that maybe another feature is incompatible with this feature? Below is a log of the issue. I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 just in case it's an issue that's only in the luminous release. :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a': size 51200 MB in 12800 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.37c8974b0dc51 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten flags: create_timestamp: Sat May 5 11:39:07 2018 :~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a :~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling --journal-pool RBD_SSD :~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd journal '37c8974b0dc51': header_oid: journal.
Re: [ceph-users] RBD journal feature
Hello Jason, I have now narrowed it down. If the image has an exclusive lock – the journal doesn’t go on the correct pool. Kind regards, Glen Baars From: Jason Dillaman Sent: Tuesday, 14 August 2018 9:29 PM To: Glen Baars Cc: ceph-users Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 9:19 AM Glen Baars <g...@onsitecomputers.com.au> wrote: Hello Jason, I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. It doesn’t seem to make a difference. It should be SSDPOOL, but regardless, I am at a loss as to why it's not working for you. You can try appending "--debug-rbd=20" to the end of the "rbd feature enable" command and provide the generated logs in a pastebin link. Also, here is the output: rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a There are 0 metadata on this image. Kind regards, Glen Baars From: Jason Dillaman <jdill...@redhat.com> Sent: Tuesday, 14 August 2018 9:00 PM To: Glen Baars <g...@onsitecomputers.com.au> Cc: dillaman <dilla...@redhat.com>; ceph-users <ceph-users@lists.ceph.com> Subject: Re: [ceph-users] RBD journal feature I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling journaling on a different pool: $ rbd info rbd/foo rbd image 'foo': size 1024 MB in 256 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.101e6b8b4567 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten flags: create_timestamp: Tue Aug 14 08:51:19 2018 $ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd $ rbd journal info --pool rbd --image foo rbd journal '101e6b8b4567': header_oid: journal.101e6b8b4567 object_oid_prefix: journal_data.1.101e6b8b4567. order: 24 (16384 kB objects) splay_width: 4 object_pool: rbd_ssd Can you please run "rbd image-meta list " to see if you are overwriting any configuration settings? Do you have any client configuration overrides in your "/etc/ceph/ceph.conf"? 
On Tue, Aug 14, 2018 at 8:25 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I will also complete testing of a few combinations tomorrow to try and isolate the issue now that we can get it to work with a new image. The cluster started out at 12.2.3 bluestore so there shouldn’t be any old issues from previous versions. Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 7:43 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 4:08 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I can confirm that your tests work on our cluster with a newly created image. We still can’t get the current images to use a different object pool. Do you think that maybe another feature is incompatible with this feature? Below is a log of the issue. I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 just in case it's an issue that's only in the luminous release. :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a': size 51200 MB in 12800 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.37c8974b0dc51 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten flags: create_timestamp: Sat May 5 11:39:07 2018 :~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a :~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling --journal-pool RBD_SSD :~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd journal '37c8974b0dc51': header_oid: journal.37c8974b0dc51 object_oid_prefix: journal_data.1.37c8974b0dc51. 
order: 24 (16384 kB objects) splay_width: 4 *** :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a': size 51200 MB in 12800 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.37c8974b0dc51 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling flags: create_timestamp: Sat May 5 11:39:07 2018 journal: 37c8974b0dc51 mirroring state: disabled Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 12
Re: [ceph-users] RBD journal feature
Hello Jason, I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. It doesn’t seem to make a difference. Also, here is the output: rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a There are 0 metadata on this image. Kind regards, Glen Baars From: Jason Dillaman Sent: Tuesday, 14 August 2018 9:00 PM To: Glen Baars Cc: dillaman ; ceph-users Subject: Re: [ceph-users] RBD journal feature I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling journaling on a different pool: $ rbd info rbd/foo rbd image 'foo': size 1024 MB in 256 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.101e6b8b4567 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten flags: create_timestamp: Tue Aug 14 08:51:19 2018 $ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd $ rbd journal info --pool rbd --image foo rbd journal '101e6b8b4567': header_oid: journal.101e6b8b4567 object_oid_prefix: journal_data.1.101e6b8b4567. order: 24 (16384 kB objects) splay_width: 4 object_pool: rbd_ssd Can you please run "rbd image-meta list " to see if you are overwriting any configuration settings? Do you have any client configuration overrides in your "/etc/ceph/ceph.conf"? On Tue, Aug 14, 2018 at 8:25 AM Glen Baars <g...@onsitecomputers.com.au> wrote: Hello Jason, I will also complete testing of a few combinations tomorrow to try and isolate the issue now that we can get it to work with a new image. The cluster started out at 12.2.3 bluestore so there shouldn’t be any old issues from previous versions. 
Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 7:43 PM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature On Tue, Aug 14, 2018 at 4:08 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, I can confirm that your tests work on our cluster with a newly created image. We still can’t get the current images to use a different object pool. Do you think that maybe another feature is incompatible with this feature? Below is a log of the issue. I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 just in case it's an issue that's only in the luminous release. :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a': size 51200 MB in 12800 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.37c8974b0dc51 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten flags: create_timestamp: Sat May 5 11:39:07 2018 :~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a :~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling --journal-pool RBD_SSD :~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd journal '37c8974b0dc51': header_oid: journal.37c8974b0dc51 object_oid_prefix: journal_data.1.37c8974b0dc51. 
order: 24 (16384 kB objects) splay_width: 4 *** :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a': size 51200 MB in 12800 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.37c8974b0dc51 format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling flags: create_timestamp: Sat May 5 11:39:07 2018 journal: 37c8974b0dc51 mirroring state: disabled Kind regards, Glen Baars From: Jason Dillaman mailto:jdill...@redhat.com>> Sent: Tuesday, 14 August 2018 12:04 AM To: Glen Baars mailto:g...@onsitecomputers.com.au>> Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users mailto:ceph-users@lists.ceph.com>> Subject: Re: [ceph-users] RBD journal feature On Sun, Aug 12, 2018 at 12:13 AM Glen Baars mailto:g...@onsitecomputers.com.au>> wrote: Hello Jason, Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any objects. Is this the correct way to view the journal objects? You won't see any journal objects in the SSDPOOL until you issue a write: $ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test $ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M rbd_hdd/test --rbd-cache=false bench type write io_size 4096 io_threads 16 bytes 16777216 pattern random SEC
Re: [ceph-users] RBD journal feature
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate the issue, now that we can get it to work with a new image. The cluster started out at 12.2.3 bluestore, so there shouldn't be any old issues from previous versions.

Kind regards,
Glen Baars

From: Jason Dillaman
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars
Cc: dillaman; ceph-users
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars <g...@onsitecomputers.com.au> wrote:

> Hello Jason,
>
> I can confirm that your tests work on our cluster with a newly created image. We still can't get the current images to use a different object pool. Do you think that maybe another feature is incompatible with this feature? Below is a log of the issue.

I wouldn't think so. I used the master branch for my testing, but I'll try 12.2.7 just in case it's an issue that's only in the luminous release.

> :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
> rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
>         size 51200 MB in 12800 objects
>         order 22 (4096 kB objects)
>         block_name_prefix: rbd_data.37c8974b0dc51
>         format: 2
>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>         flags:
>         create_timestamp: Sat May 5 11:39:07 2018
>
> :~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
> rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
>
> :~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling --journal-pool RBD_SSD
>
> :~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
> rbd journal '37c8974b0dc51':
>         header_oid: journal.37c8974b0dc51
>         object_oid_prefix: journal_data.1.37c8974b0dc51.
>         order: 24 (16384 kB objects)
>         splay_width: 4
>
> ***
>
> :~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
> rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
>         size 51200 MB in 12800 objects
>         order 22 (4096 kB objects)
>         block_name_prefix: rbd_data.37c8974b0dc51
>         format: 2
>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
>         flags:
>         create_timestamp: Sat May 5 11:39:07 2018
>         journal: 37c8974b0dc51
>         mirroring state: disabled
>
> Kind regards,
> Glen Baars
>
> From: Jason Dillaman
> Sent: Tuesday, 14 August 2018 12:04 AM
> To: Glen Baars
> Cc: dillaman; ceph-users
> Subject: Re: [ceph-users] RBD journal feature
>
> On Sun, Aug 12, 2018 at 12:13 AM Glen Baars <g...@onsitecomputers.com.au> wrote:
>
> Hello Jason,
>
> Interesting, I used 'rados ls' to view the SSDPOOL and can't see any objects. Is this the correct way to view the journal objects?

You won't see any journal objects in the SSDPOOL until you issue a write:

$ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1       320    332.01  1359896.98
    2       736    360.83  1477975.96
    3      1040    351.17  1438393.57
    4      1392    350.94  1437437.51
    5      1744    350.24  1434576.94
    6      2080    349.82  1432866.06
    7      2416    341.73  1399731.23
    8      2784    348.37  1426930.69
    9      3152    347.40  1422966.67
   10      3520    356.04  1458356.70
   11      3920    361.34  1480050.97
elapsed:    11  ops:  4096  ops/sec:  353.61  bytes/sec: 1448392.06

$ rbd feature enable rbd_hdd/test journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd_hdd --image test
rbd journal '10746b8b4567':
        header_oid: journal.10746b8b4567
        object_oid_prefix: journal_data.2.10746b8b4567.
        order: 24 (16 MiB objects)
        splay_width: 4
        object_pool: rbd_ssd

$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1       240    248.54  1018005.17
    2       512    263.47  1079154.06
    3       768    258.74  1059792.10
    4      1040    258.50  1058812.60
    5      1312    258.06  1057001.34
    6      1536    258.21  1057633.14
    7      1792    253.81  1039604.73
    8      2032    253.66  1038971.01
    9      2256    241.41   988800.93
   10      2480    237.87   974335.65
   11      2752    239.41   980624.20
   12      2992    239.61   981440.94
   13      3200    233.13   954887.84
   14      3440    237.36   972237.80
   15      3680    239.47   980853.37
   16      3920    238.75   977920.70
elapsed:    16  ops:  4096  ops/sec:  245.04  bytes/sec: 1003692.81
Re: [ceph-users] RBD journal feature
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image. We still can't get the current images to use a different object pool. Do you think that maybe another feature is incompatible with this feature? Below is a log of the issue.

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
        size 51200 MB in 12800 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.37c8974b0dc51
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
        create_timestamp: Sat May 5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling --journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
        header_oid: journal.37c8974b0dc51
        object_oid_prefix: journal_data.1.37c8974b0dc51.
        order: 24 (16384 kB objects)
        splay_width: 4

***

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
        size 51200 MB in 12800 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.37c8974b0dc51
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
        flags:
        create_timestamp: Sat May 5 11:39:07 2018
        journal: 37c8974b0dc51
        mirroring state: disabled

Kind regards,
Glen Baars

From: Jason Dillaman
Sent: Tuesday, 14 August 2018 12:04 AM
To: Glen Baars
Cc: dillaman; ceph-users
Subject: Re: [ceph-users] RBD journal feature

On Sun, Aug 12, 2018 at 12:13 AM Glen Baars <g...@onsitecomputers.com.au> wrote:

> Hello Jason,
>
> Interesting, I used 'rados ls' to view the SSDPOOL and can't see any objects. Is this the correct way to view the journal objects?

You won't see any journal objects in the SSDPOOL until you issue a write:

$ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1       320    332.01  1359896.98
    2       736    360.83  1477975.96
    3      1040    351.17  1438393.57
    4      1392    350.94  1437437.51
    5      1744    350.24  1434576.94
    6      2080    349.82  1432866.06
    7      2416    341.73  1399731.23
    8      2784    348.37  1426930.69
    9      3152    347.40  1422966.67
   10      3520    356.04  1458356.70
   11      3920    361.34  1480050.97
elapsed:    11  ops:  4096  ops/sec:  353.61  bytes/sec: 1448392.06

$ rbd feature enable rbd_hdd/test journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd_hdd --image test
rbd journal '10746b8b4567':
        header_oid: journal.10746b8b4567
        object_oid_prefix: journal_data.2.10746b8b4567.
        order: 24 (16 MiB objects)
        splay_width: 4
        object_pool: rbd_ssd

$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1       240    248.54  1018005.17
    2       512    263.47  1079154.06
    3       768    258.74  1059792.10
    4      1040    258.50  1058812.60
    5      1312    258.06  1057001.34
    6      1536    258.21  1057633.14
    7      1792    253.81  1039604.73
    8      2032    253.66  1038971.01
    9      2256    241.41   988800.93
   10      2480    237.87   974335.65
   11      2752    239.41   980624.20
   12      2992    239.61   981440.94
   13      3200    233.13   954887.84
   14      3440    237.36   972237.80
   15      3680    239.47   980853.37
   16      3920    238.75   977920.70
elapsed:    16  ops:  4096  ops/sec:  245.04  bytes/sec: 1003692.81

$ rados -p rbd_ssd ls | grep journal_data.2.10746b8b4567.
journal_data.2.10746b8b4567.3
journal_data.2.10746b8b4567.0
journal_data.2.10746b8b4567.2
journal_data.2.10746b8b4567.1

> rbd feature enable SLOWPOOL/RBDImage journaling --journal-pool SSDPOOL
>
> The symptoms that we are experiencing are a huge decrease in write speed (QD1 128K writes from 160MB/s down to 14MB/s). We see no improvement when moving the journal to SSDPOOL (but we don't think it is really moving).

If you are trying to optimize for 128KiB writes, you might need to tweak the "rbd_journal_max_payload_bytes" setting, since it currently is defaulted to split journal write events into a maximum of 16KiB payload [1] in order to optimize the worst-case memory usage of the r
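Jason's suggestion can be expressed as a ceph.conf fragment on the client side. The 131072 value below is an assumption sized to the 128 KiB writes discussed in this thread, not an upstream recommendation; test the effect on journal memory usage before relying on it:

```
[client]
# Default is 16384 (16 KiB); a journal write event is split into payloads
# of at most this size, so a 128 KiB write becomes 8 journal appends.
rbd journal max payload bytes = 131072
```

Option names written with spaces and with underscores are equivalent in ceph.conf.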
Re: [ceph-users] RBD journal feature
Hello Jason,

Interesting, I used 'rados ls' to view the SSDPOOL and can't see any objects. Is this the correct way to view the journal objects?

rbd feature enable SLOWPOOL/RBDImage journaling --journal-pool SSDPOOL

The symptoms that we are experiencing are a huge decrease in write speed (QD1 128K writes from 160MB/s down to 14MB/s). We see no improvement when moving the journal to SSDPOOL (but we don't think it is really moving).

Kind regards,
Glen Baars

From: Jason Dillaman
Sent: Saturday, 11 August 2018 11:28 PM
To: Glen Baars
Cc: ceph-users
Subject: Re: [ceph-users] RBD journal feature

On Fri, Aug 10, 2018 at 3:01 AM Glen Baars <g...@onsitecomputers.com.au> wrote:

> Hello Ceph Users,
>
> I am trying to implement image journals for our RBD images (required for mirroring):
>
> rbd feature enable SLOWPOOL/RBDImage journaling --journal-pool SSDPOOL
>
> When we run the above command we still find the journal on SLOWPOOL and not on SSDPOOL. We are running 12.2.7 and all bluestore. We have also tried the ceph.conf option (rbd journal pool = SSDPOOL). Has anyone else gotten this working?

The journal header was on SLOWPOOL or the journal data objects? I would expect that the journal metadata header is located on SLOWPOOL, but all data objects should be created on SSDPOOL as needed.

> Kind regards,
> Glen Baars

This e-mail is intended solely for the benefit of the addressee(s) and any other named recipient. It is confidential and may contain legally privileged or confidential information. If you are not the recipient, any use, distribution, disclosure or copying of this e-mail is prohibited. The confidentiality and legal privilege attached to this communication is not waived or lost by reason of the mistaken transmission or delivery to you. If you have received this e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Jason
Re: [ceph-users] [Ceph-deploy] Cluster Name
I have now gotten this working. Thanks everyone for the help. The RBD-Mirror service is co-located on a MON server. Key points are:

Start the services on the boxes with the following syntax (depending on your config file names):
On primary: systemctl start ceph-rbd-mirror@primary
On secondary: systemctl start ceph-rbd-mirror@secondary

Ensure this works on both boxes:
ceph --cluster secondary -n client.secondary -s
ceph --cluster primary -n client.primary -s

Check the log files under /var/log/ceph/ceph-client.primary.log and /var/log/ceph/ceph-client.secondary.log.

My primary server had these files in it:
ceph.client.admin.keyring
ceph.client.primary.keyring
ceph.conf
primary.client.primary.keyring
primary.conf
secondary.client.secondary.keyring
secondary.conf

Kind regards,
Glen Baars

-Original Message-
From: Thode Jocelyn
Sent: Thursday, 9 August 2018 1:41 PM
To: Erik McCormick
Cc: Glen Baars; Vasu Kulkarni; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] [Ceph-deploy] Cluster Name

Hi Erik,

The thing is that the rbd-mirror service uses the /etc/sysconfig/ceph file to determine which configuration file to use (from CLUSTER_NAME), so you need to set this to the name you chose for rbd-mirror to work. However, setting this CLUSTER_NAME variable in /etc/sysconfig/ceph means that the mon, osd etc. services will also use it. Because of this they cannot start anymore, as all their paths are set with "ceph" as the cluster name. There might, however, be something I missed which would make this point moot.

Best Regards
Jocelyn Thode

-Original Message-
From: Erik McCormick [mailto:emccorm...@cirrusseven.com]
Sent: mercredi, 8 août 2018 16:39
To: Thode Jocelyn
Cc: Glen Baars; Vasu Kulkarni; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

I'm not using this feature, so maybe I'm missing something, but from the way I understand cluster naming to work... I still don't understand why this is blocking for you.
Unless you are attempting to mirror between two clusters running on the same hosts (why would you do this?), systemd doesn't come into play. The --cluster flag on the rbd command simply selects the name of a configuration file holding the FSID and settings of the appropriate cluster. The cluster name is just a way of telling ceph commands and systemd units where to find the configs. So, what you end up with is something like:

/etc/ceph/ceph.conf (your local cluster configuration) on both clusters
/etc/ceph/local.conf (config of the source cluster; just a copy of ceph.conf of the source cluster)
/etc/ceph/remote.conf (config of the destination peer cluster; just a copy of ceph.conf of the remote cluster)

Run all your rbd mirror commands against the local and remote names. However, when starting things like mons, osds, mds, etc. you need no cluster name, as they can use ceph.conf (cluster name of ceph).

Am I making sense, or have I completely missed something?

-Erik

On Wed, Aug 8, 2018 at 8:34 AM, Thode Jocelyn wrote:
> Hi,
>
> We are still blocked by this problem on our end. Glen, did you or someone else figure out something for this?
>
> Regards
> Jocelyn Thode
>
> From: Glen Baars [mailto:g...@onsitecomputers.com.au]
> Sent: jeudi, 2 août 2018 05:43
> To: Erik McCormick
> Cc: Thode Jocelyn; Vasu Kulkarni; ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] [Ceph-deploy] Cluster Name
>
> Hello Erik,
>
> We are going to use RBD-mirror to replicate the clusters. This seems to need separate cluster names.
>
> Kind regards,
> Glen Baars
>
> From: Erik McCormick
> Sent: Thursday, 2 August 2018 9:39 AM
> To: Glen Baars
> Cc: Thode Jocelyn; Vasu Kulkarni; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
>
> Don't set a cluster name. It's no longer supported. It really only matters if you're running two or more independent clusters on the same boxes. That's generally inadvisable anyway.
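A minimal sketch of the layout Erik describes. The pool name (rbd), peer client name, and host names are placeholders you would substitute for your own; the conf files are plain copies, so no daemon ever needs a non-default cluster name:

```shell
# /etc/ceph/ceph.conf    -> this cluster (daemons keep the default "ceph" name)
# /etc/ceph/local.conf   -> copy of this cluster's ceph.conf
# /etc/ceph/remote.conf  -> copy of the peer cluster's ceph.conf
cp /etc/ceph/ceph.conf /etc/ceph/local.conf
scp peer-mon:/etc/ceph/ceph.conf /etc/ceph/remote.conf   # hostname is a placeholder

# rbd picks the cluster purely by config-file name:
rbd --cluster local  mirror pool info rbd
rbd --cluster remote mirror pool info rbd

# Pool-level mirroring plus peer registration (names are placeholders):
rbd --cluster local mirror pool enable rbd pool
rbd --cluster local mirror pool peer add rbd client.remote@remote
```

The key design point is that "cluster name" here is only a config-file naming convention on the rbd-mirror client; the daemons on both sides keep running as cluster "ceph".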
> > > > Cheers, > > Erik > > > > On Wed, Aug 1, 2018, 9:17 PM Glen Baars wrote: > > Hello Ceph Users, > > Does anyone know how to set the Cluster Name when deploying with > Ceph-deploy? I have 3 clusters to configure and need to correctly set > the name. > > Kind regards, > Glen Baars > > -Original Message- > From: ceph-users On Behalf Of Glen > Baars > Sent: Monday, 23 July 2018 5:59 PM > To: Thode Jocelyn ; Vasu Kulkarni > > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name > > How very timely, I am facing the exact same issue. > > Kind regards, > Glen Baars > > -Original Message- > From: ceph-users On Behalf Of > Thode Jocelyn > Sent: M
[ceph-users] RBD journal feature
Hello Ceph Users,

I am trying to implement image journals for our RBD images (required for mirroring):

rbd feature enable SLOWPOOL/RBDImage journaling --journal-pool SSDPOOL

When we run the above command we still find the journal on SLOWPOOL and not on SSDPOOL. We are running 12.2.7 and all bluestore. We have also tried the ceph.conf option (rbd journal pool = SSDPOOL). Has anyone else gotten this working?

Kind regards,
Glen Baars
Re: [ceph-users] [Ceph-deploy] Cluster Name
Hello Erik,

We are going to use RBD-mirror to replicate the clusters. This seems to need separate cluster names.

Kind regards,
Glen Baars

From: Erik McCormick
Sent: Thursday, 2 August 2018 9:39 AM
To: Glen Baars
Cc: Thode Jocelyn; Vasu Kulkarni; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

Don't set a cluster name. It's no longer supported. It really only matters if you're running two or more independent clusters on the same boxes. That's generally inadvisable anyway.

Cheers,
Erik

On Wed, Aug 1, 2018, 9:17 PM Glen Baars <g...@onsitecomputers.com.au> wrote:

Hello Ceph Users,

Does anyone know how to set the Cluster Name when deploying with Ceph-deploy? I have 3 clusters to configure and need to correctly set the name.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users On Behalf Of Glen Baars
Sent: Monday, 23 July 2018 5:59 PM
To: Thode Jocelyn; Vasu Kulkarni
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

How very timely, I am facing the exact same issue.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users On Behalf Of Thode Jocelyn
Sent: Monday, 23 July 2018 1:42 PM
To: Vasu Kulkarni
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

Hi,

Yes, my rbd-mirror is collocated with my mon/osd. It only affects nodes where they are collocated, as they all use the "/etc/sysconfig/ceph" configuration file.
Best
Jocelyn Thode

-Original Message-
From: Vasu Kulkarni [mailto:vakul...@redhat.com]
Sent: vendredi, 20 juillet 2018 17:25
To: Thode Jocelyn
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

On Fri, Jul 20, 2018 at 7:29 AM, Thode Jocelyn <jocelyn.th...@elca.ch> wrote:
> Hi,
>
> I noticed that in commit https://github.com/ceph/ceph-deploy/commit/b1c27b85d524f2553af2487a98023b60efe421f3, the ability to specify a cluster name was removed. Is there a reason for this removal?
>
> Because right now there is no possibility to create a ceph cluster with a different name with ceph-deploy, which is a big problem when having two clusters replicating with rbd-mirror, as we need different names.
>
> And even when following the doc here:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/block_device_guide/block_device_mirroring#rbd-mirroring-clusters-with-the-same-name
>
> This is not sufficient, as once we change the CLUSTER variable in the sysconfig file, the mon, osd, mds etc. all use it and fail to start on a reboot, as they then try to load data from a path in /var/lib/ceph containing the cluster name.

Is your rbd-mirror client also collocated with mon/osd? This needs to be changed only on the client side where you are doing the mirroring; the rest of the nodes are not affected.

> Is there a solution to this problem?
>
> Best Regards
> Jocelyn Thode
Re: [ceph-users] [Ceph-deploy] Cluster Name
Hello Ceph Users,

Does anyone know how to set the Cluster Name when deploying with Ceph-deploy? I have 3 clusters to configure and need to correctly set the name.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users On Behalf Of Glen Baars
Sent: Monday, 23 July 2018 5:59 PM
To: Thode Jocelyn; Vasu Kulkarni
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

How very timely, I am facing the exact same issue.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users On Behalf Of Thode Jocelyn
Sent: Monday, 23 July 2018 1:42 PM
To: Vasu Kulkarni
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

Hi,

Yes, my rbd-mirror is collocated with my mon/osd. It only affects nodes where they are collocated, as they all use the "/etc/sysconfig/ceph" configuration file.

Best
Jocelyn Thode

-Original Message-
From: Vasu Kulkarni [mailto:vakul...@redhat.com]
Sent: vendredi, 20 juillet 2018 17:25
To: Thode Jocelyn
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

On Fri, Jul 20, 2018 at 7:29 AM, Thode Jocelyn wrote:
> Hi,
>
> I noticed that in commit https://github.com/ceph/ceph-deploy/commit/b1c27b85d524f2553af2487a98023b60efe421f3, the ability to specify a cluster name was removed. Is there a reason for this removal?
>
> Because right now there is no possibility to create a ceph cluster with a different name with ceph-deploy, which is a big problem when having two clusters replicating with rbd-mirror, as we need different names.
>
> And even when following the doc here:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/block_device_guide/block_device_mirroring#rbd-mirroring-clusters-with-the-same-name
>
> This is not sufficient, as once we change the CLUSTER variable in the sysconfig file, the mon, osd, mds etc. all use it and fail to start on a reboot, as they then try to load data from a path in /var/lib/ceph containing the cluster name.

Is your rbd-mirror client also collocated with mon/osd? This needs to be changed only on the client side where you are doing the mirroring; the rest of the nodes are not affected.

> Is there a solution to this problem?
>
> Best Regards
> Jocelyn Thode
Re: [ceph-users] [Ceph-deploy] Cluster Name
How very timely, I am facing the exact same issue.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users On Behalf Of Thode Jocelyn
Sent: Monday, 23 July 2018 1:42 PM
To: Vasu Kulkarni
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

Hi,

Yes, my rbd-mirror is collocated with my mon/osd. It only affects nodes where they are collocated, as they all use the "/etc/sysconfig/ceph" configuration file.

Best
Jocelyn Thode

-Original Message-
From: Vasu Kulkarni [mailto:vakul...@redhat.com]
Sent: vendredi, 20 juillet 2018 17:25
To: Thode Jocelyn
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

On Fri, Jul 20, 2018 at 7:29 AM, Thode Jocelyn wrote:
> Hi,
>
> I noticed that in commit https://github.com/ceph/ceph-deploy/commit/b1c27b85d524f2553af2487a98023b60efe421f3, the ability to specify a cluster name was removed. Is there a reason for this removal?
>
> Because right now there is no possibility to create a ceph cluster with a different name with ceph-deploy, which is a big problem when having two clusters replicating with rbd-mirror, as we need different names.
>
> And even when following the doc here:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/block_device_guide/block_device_mirroring#rbd-mirroring-clusters-with-the-same-name
>
> This is not sufficient, as once we change the CLUSTER variable in the sysconfig file, the mon, osd, mds etc. all use it and fail to start on a reboot, as they then try to load data from a path in /var/lib/ceph containing the cluster name.

Is your rbd-mirror client also collocated with mon/osd? This needs to be changed only on the client side where you are doing the mirroring; the rest of the nodes are not affected.

> Is there a solution to this problem?
>
> Best Regards
> Jocelyn Thode
Re: [ceph-users] 12.2.7 - Available space decreasing when adding disks
Thanks for the reply! It ended up being that the HDD pool in this server is larger than in the other servers. This increases the server's weight, and therefore the SSD pool in this server is affected. I will add more SSDs to this server to keep the ratio of HDDs to SSDs the same across all hosts.

Kind regards,
Glen Baars

From: Linh Vu
Sent: Sunday, 22 July 2018 7:46 AM
To: Glen Baars; ceph-users
Subject: Re: 12.2.7 - Available space decreasing when adding disks

Something funny going on with your new disks:

138  ssd  0.90970  1.0  931G  820G  111G  88.08  2.71  216  Added
139  ssd  0.90970  1.0  931G  771G  159G  82.85  2.55  207  Added
140  ssd  0.90970  1.0  931G  709G  222G  76.12  2.34  197  Added
141  ssd  0.90970  1.0  931G  664G  267G  71.31  2.19  184  Added

The last 3 columns are: % used, variation, and PG count. These 4 have much higher %used and PG count than the rest, almost double. You probably have these disks in multiple pools and therefore have too many PGs on them. One of them is at 88% used. The max available capacity of a pool is calculated based on the most full OSD in it, which is why your total available capacity drops to 0.6TB.

From: ceph-users on behalf of Glen Baars
Sent: Saturday, 21 July 2018 10:43:16 AM
To: ceph-users
Subject: [ceph-users] 12.2.7 - Available space decreasing when adding disks

Hello Ceph Users,

We have added more ssd storage to our ceph cluster last night. We added 4 x 1TB drives and the available space went from 1.6TB to 0.6TB (in `ceph df` for the SSD pool). I would assume that the weight needs to be changed, but I didn't think I would need to? Should I change them to 0.75 from 0.9 and hopefully it will rebalance correctly?
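Linh Vu's point, that a pool's MAX AVAIL is capped by its most-full OSD, can be sketched numerically. This is a simplified model, not Ceph's exact accounting; the OSD figures are taken loosely from the `ceph osd df` output in this thread, and the 3-replica divisor matches the pool described here.

```python
# Simplified model of why one nearly-full OSD caps a pool's MAX AVAIL:
# CRUSH spreads data roughly in proportion to weight, so each OSD's
# free space limits how much more the whole pool can absorb.

osds = [
    # (size_gib, used_gib, crush_weight) -- figures from `ceph osd df` above
    (447, 250, 0.43660),   # osd.115
    (447, 191, 0.43660),   # osd.116
    (931, 820, 0.90970),   # osd.138, new 1 TB drive, 88% full
    (931, 771, 0.90970),   # osd.139, new 1 TB drive
]

total_weight = sum(w for _, _, w in osds)
replicas = 3

# Per OSD: how much more raw data the pool could take before THIS OSD
# fills, assuming writes keep landing in proportion to CRUSH weight.
limits = []
for size, used, weight in osds:
    share = weight / total_weight      # fraction of pool data on this OSD
    limits.append((size - used) / share)

# The tightest OSD wins; divide by replica count for client-visible space.
pool_max_avail_gib = min(limits) / replicas
```

Even though the four OSDs together have over 700 GiB of raw free space, the model shows the pool can only advertise what the 88%-full drive allows.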
#ceph osd tree | grep -v hdd
ID   CLASS  WEIGHT     TYPE NAME                      STATUS  REWEIGHT  PRI-AFF
 -1         534.60309  root default
-19          62.90637      host NAS-AUBUN-RK2-CEPH06
115  ssd       0.43660          osd.115                  up   1.0  1.0
116  ssd       0.43660          osd.116                  up   1.0  1.0
117  ssd       0.43660          osd.117                  up   1.0  1.0
118  ssd       0.43660          osd.118                  up   1.0  1.0
-22         105.51169      host NAS-AUBUN-RK2-CEPH07
138  ssd       0.90970          osd.138                  up   1.0  1.0  Added
139  ssd       0.90970          osd.139                  up   1.0  1.0  Added
-25         105.51169      host NAS-AUBUN-RK2-CEPH08
140  ssd       0.90970          osd.140                  up   1.0  1.0  Added
141  ssd       0.90970          osd.141                  up   1.0  1.0  Added
 -3          56.32617      host NAS-AUBUN-RK3-CEPH01
 60  ssd       0.43660          osd.60                   up   1.0  1.0
 61  ssd       0.43660          osd.61                   up   1.0  1.0
 62  ssd       0.43660          osd.62                   up   1.0  1.0
 63  ssd       0.43660          osd.63                   up   1.0  1.0
 -5          56.32617      host NAS-AUBUN-RK3-CEPH02
 64  ssd       0.43660          osd.64                   up   1.0  1.0
 65  ssd       0.43660          osd.65                   up   1.0  1.0
 66  ssd       0.43660          osd.66                   up   1.0  1.0
 67  ssd       0.43660          osd.67                   up   1.0  1.0
 -7          56.32617      host NAS-AUBUN-RK3-CEPH03
 68  ssd       0.43660          osd.68                   up   1.0  1.0
 69  ssd       0.43660          osd.69                   up   1.0  1.0
 70  ssd       0.43660          osd.70                   up   1.0  1.0
 71  ssd       0.43660          osd.71                   up   1.0  1.0
-13          45.84741      host NAS-AUBUN-RK3-CEPH04
 72  ssd       0.54579          osd.72                   up   1.0  1.0
 73  ssd       0.54579          osd.73                   up   1.0  1.0
 76  ssd       0.54579          osd.76                   up   1.0  1.0
 77  ssd       0.54579          osd.77                   up   1.0  1.0
-16          45.84741      host NAS-AUBUN-RK3-CEPH05
 74  ssd       0.54579          osd.74                   up   1.0  1.0
 75  ssd       0.54579          osd.75                   up   1.0  1.0
 78  ssd       0.54579          osd.78                   up   1.0  1.0
 79  ssd       0.54579          osd.79                   up   1.0  1.0

# ceph osd df | grep -v hdd
ID   CLASS  WEIGHT   REWEIGHT  SIZE  USE   AVAIL  %USE   VAR   PGS
115  ssd    0.43660  1.0       447G  250G  196G   56.00  1.72  103
116  ssd    0.43660  1.0       447G  191G  255G   42.89  1.32   84
117  ssd    0.43660  1.0       447G  213G  233G   47.79  1.47   92
118  ssd    0.43660  1.0       447G  208G  238G   46.61  1.43   85
138  ssd    0.90970  1.0       931G  820G  111G   88.08  2.71  216  Added
139  ssd    0.90970  1.0       931G  771G  159G   82.85  2.55  207  Added
140  ssd    0.90970  1
Re: [ceph-users] 12.2.7 - Available space decreasing when adding disks
 68  ssd  0.43660  osd.68   up  1.0  1.0
 69  ssd  0.43660  osd.69   up  1.0  1.0
 70  ssd  0.43660  osd.70   up  1.0  1.0
 71  ssd  0.43660  osd.71   up  1.0  1.0
-13  45.84741  host NAS-AUBUN-RK3-CEPH04
 80  hdd  3.63869  osd.80   up  1.0  1.0
 81  hdd  3.63869  osd.81   up  1.0  1.0
 82  hdd  3.63869  osd.82   up  1.0  1.0
 83  hdd  3.63869  osd.83   up  1.0  1.0
 84  hdd  3.63869  osd.84   up  1.0  1.0
 85  hdd  3.63869  osd.85   up  1.0  1.0
 86  hdd  3.63869  osd.86   up  1.0  1.0
 87  hdd  3.63869  osd.87   up  1.0  1.0
 88  hdd  3.63869  osd.88   up  1.0  1.0
 89  hdd  3.63869  osd.89   up  1.0  1.0
 90  hdd  3.63869  osd.90   up  1.0  1.0
 91  hdd  3.63869  osd.91   up  1.0  1.0
 72  ssd  0.54579  osd.72   up  1.0  1.0
 73  ssd  0.54579  osd.73   up  1.0  1.0
 76  ssd  0.54579  osd.76   up  1.0  1.0
 77  ssd  0.54579  osd.77   up  1.0  1.0
-16  45.84741  host NAS-AUBUN-RK3-CEPH05
 92  hdd  3.63869  osd.92   up  1.0  1.0
 93  hdd  3.63869  osd.93   up  1.0  1.0
 94  hdd  3.63869  osd.94   up  1.0  1.0
 95  hdd  3.63869  osd.95   up  1.0  1.0
 96  hdd  3.63869  osd.96   up  1.0  1.0
 97  hdd  3.63869  osd.97   up  1.0  1.0
 98  hdd  3.63869  osd.98   up  1.0  1.0
 99  hdd  3.63869  osd.99   up  1.0  1.0
100  hdd  3.63869  osd.100  up  1.0  1.0
101  hdd  3.63869  osd.101  up  1.0  1.0
102  hdd  3.63869  osd.102  up  1.0  1.0
103  hdd  3.63869  osd.103  up  1.0  1.0
 74  ssd  0.54579  osd.74   up  1.0  1.0
 75  ssd  0.54579  osd.75   up  1.0  1.0
 78  ssd  0.54579  osd.78   up  1.0  1.0
 79  ssd  0.54579  osd.79   up  1.0  1.0

Kind regards,
Glen Baars

From: Shawn Iverson
Sent: Saturday, 21 July 2018 9:21 PM
To: Glen Baars
Cc: ceph-users
Subject: Re: [ceph-users] 12.2.7 - Available space decreasing when adding disks

Glen,

Correction... I was looking at the wrong column for weight, my bad. You have varying weights, but the process is still the same. Balance your buckets (hosts) in your crush map, and balance your osds in each bucket (host).

On Sat, Jul 21, 2018 at 9:14 AM, Shawn Iverson <ivers...@rushville.k12.in.us> wrote:

Glen,

It appears you have 447G, 931G, and 558G disks in your cluster, all with a weight of 1.0. This means that although the new disks are bigger, they are not going to be utilized by pgs any more than any other disk. I would suggest reweighting your other disks (they are smaller), so that you balance your cluster. You should do this gradually over time, preferably during off-peak times, when remapping will not affect operations.

I do a little math, first by taking total cluster capacity and dividing it by the total capacity of each bucket. I then do the same thing in each bucket, until everything is proportioned appropriately down to the osds.

On Fri, Jul 20, 2018 at 8:43 PM, Glen Baars <g...@onsitecomputers.com.au> wrote:

Hello Ceph Users,

We have added more ssd storage to our ceph cluster last night. We added 4 x 1TB drives and the available space went from 1.6TB to 0.6TB (in `ceph df` for the SSD pool). I would assume that the weight needs to be changed, but I didn't think I would need to? Should I change them to 0.75 from 0.9 and hopefully it will rebalance correctly?

#ceph osd tree | grep -v hdd
ID   CLASS  WEIGHT     TYPE NAME                      STATUS  REWEIGHT  PRI-AFF
 -1         534.60309  root default
-19          62.90637      host NAS-AUBUN-RK2-CEPH06
115  ssd       0.43660          osd.115                  up   1.0  1.0
116  ssd       0.43660          osd.116                  up   1.0  1.0
117  ssd       0.43660          osd.117                  up   1.0  1.0
118  ssd       0.43660          osd.118                  up   1.0  1.0
-22         105.51169      host NAS-AUBUN-RK2-CEPH07
138  ssd       0.90970          osd.138
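Shawn's proportioning idea can be sketched as a quick calculation. This is a rough heuristic, not Ceph's balancer: nudge each over-full OSD's reweight toward the cluster-average %USE. The OSD numbers are taken from the `ceph osd df` output in this thread; the averaging over just four OSDs is illustrative.

```python
# Crude reweight heuristic: suggested = current * (average %USE / OSD %USE),
# capped at 1.0 (an override reweight can only reduce an OSD's share).

osds = {
    # name: (%USE, current reweight) -- from `ceph osd df` in this thread
    "osd.115": (56.00, 1.0),
    "osd.138": (88.08, 1.0),   # new, over-full 1 TB drive
    "osd.139": (82.85, 1.0),
    "osd.72":  (38.96, 1.0),
}

avg_use = sum(use for use, _ in osds.values()) / len(osds)

suggested = {
    name: round(min(1.0, rw * avg_use / use), 2)
    for name, (use, rw) in osds.items()
}
```

Applied via `ceph osd reweight osd.138 0.75` and so on, gradually and during off-peak hours as Shawn advises. For osd.138 this heuristic happens to land at 0.75, the same figure Glen floated earlier in the thread.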
[ceph-users] 12.2.7 - Available space decreasing when adding disks
Hello Ceph Users,

We added more SSD storage to our Ceph cluster last night: 4 x 1TB drives. The available space went from 1.6TB to 0.6TB ( in `ceph df` for the SSD pool ). I assume the weights need to be changed, but I didn't think I would need to do that. Should I change them from 0.9 to 0.75 and hope that it rebalances correctly?

# ceph osd tree | grep -v hdd
ID  CLASS  WEIGHT     TYPE NAME                      STATUS REWEIGHT PRI-AFF
 -1        534.60309  root default
-19         62.90637      host NAS-AUBUN-RK2-CEPH06
115   ssd    0.43660          osd.115                    up  1.00000 1.00000
116   ssd    0.43660          osd.116                    up  1.00000 1.00000
117   ssd    0.43660          osd.117                    up  1.00000 1.00000
118   ssd    0.43660          osd.118                    up  1.00000 1.00000
-22        105.51169      host NAS-AUBUN-RK2-CEPH07
138   ssd    0.90970          osd.138                    up  1.00000 1.00000  Added
139   ssd    0.90970          osd.139                    up  1.00000 1.00000  Added
-25        105.51169      host NAS-AUBUN-RK2-CEPH08
140   ssd    0.90970          osd.140                    up  1.00000 1.00000  Added
141   ssd    0.90970          osd.141                    up  1.00000 1.00000  Added
 -3         56.32617      host NAS-AUBUN-RK3-CEPH01
 60   ssd    0.43660          osd.60                     up  1.00000 1.00000
 61   ssd    0.43660          osd.61                     up  1.00000 1.00000
 62   ssd    0.43660          osd.62                     up  1.00000 1.00000
 63   ssd    0.43660          osd.63                     up  1.00000 1.00000
 -5         56.32617      host NAS-AUBUN-RK3-CEPH02
 64   ssd    0.43660          osd.64                     up  1.00000 1.00000
 65   ssd    0.43660          osd.65                     up  1.00000 1.00000
 66   ssd    0.43660          osd.66                     up  1.00000 1.00000
 67   ssd    0.43660          osd.67                     up  1.00000 1.00000
 -7         56.32617      host NAS-AUBUN-RK3-CEPH03
 68   ssd    0.43660          osd.68                     up  1.00000 1.00000
 69   ssd    0.43660          osd.69                     up  1.00000 1.00000
 70   ssd    0.43660          osd.70                     up  1.00000 1.00000
 71   ssd    0.43660          osd.71                     up  1.00000 1.00000
-13         45.84741      host NAS-AUBUN-RK3-CEPH04
 72   ssd    0.54579          osd.72                     up  1.00000 1.00000
 73   ssd    0.54579          osd.73                     up  1.00000 1.00000
 76   ssd    0.54579          osd.76                     up  1.00000 1.00000
 77   ssd    0.54579          osd.77                     up  1.00000 1.00000
-16         45.84741      host NAS-AUBUN-RK3-CEPH05
 74   ssd    0.54579          osd.74                     up  1.00000 1.00000
 75   ssd    0.54579          osd.75                     up  1.00000 1.00000
 78   ssd    0.54579          osd.78                     up  1.00000 1.00000
 79   ssd    0.54579          osd.79                     up  1.00000 1.00000

# ceph osd df | grep -v hdd
ID  CLASS WEIGHT  REWEIGHT SIZE USE  AVAIL %USE  VAR  PGS
115   ssd 0.43660  1.00000 447G 250G  196G 56.00 1.72 103
116   ssd 0.43660  1.00000 447G 191G  255G 42.89 1.32  84
117   ssd 0.43660  1.00000 447G 213G  233G 47.79 1.47  92
118   ssd 0.43660  1.00000 447G 208G  238G 46.61 1.43  85
138   ssd 0.90970  1.00000 931G 820G  111G 88.08 2.71 216  Added
139   ssd 0.90970  1.00000 931G 771G  159G 82.85 2.55 207  Added
140   ssd 0.90970  1.00000 931G 709G  222G 76.12 2.34 197  Added
141   ssd 0.90970  1.00000 931G 664G  267G 71.31 2.19 184  Added
 60   ssd 0.43660  1.00000 447G 275G  171G 61.62 1.89 100
 61   ssd 0.43660  1.00000 447G 237G  209G 53.04 1.63  90
 62   ssd 0.43660  1.00000 447G 275G  171G 61.58 1.89  95
 63   ssd 0.43660  1.00000 447G 260G  187G 58.15 1.79  97
 64   ssd 0.43660  1.00000 447G 232G  214G 52.08 1.60  83
 65   ssd 0.43660  1.00000 447G 207G  239G 46.36 1.42  75
 66   ssd 0.43660  1.00000 447G 217G  230G 48.54 1.49  84
 67   ssd 0.43660  1.00000 447G 252G  195G 56.36 1.73  92
 68   ssd 0.43660  1.00000 447G 248G  198G 55.56 1.71  94
 69   ssd 0.43660  1.00000 447G 229G  217G 51.25 1.57  84
 70   ssd 0.43660  1.00000 447G 259G  187G 58.01 1.78  87
 71   ssd 0.43660  1.00000 447G 267G  179G 59.83 1.84  97
 72   ssd 0.54579  1.00000 558G 217G  341G 38.96 1.20 100
 73   ssd 0.54579  1.00000 558G 283G  275G 50.75 1.56 121
 76   ssd 0.54579  1.00000 558G 286G  272G 51.33 1.58 129
 77   ssd 0.54579  1.00000 558G 246G  312G 44.07 1.35 104
 74   ssd 0.54579  1.00000 558G 273G  285G 48.91 1.50 122
 75   ssd 0.54579  1.00000 558G 281G  276G 50.45 1.55 114
 78   ssd 0.54579  1.00000 558G 289G  269G 51.80 1.59 133
 79   ssd 0.54579  1.00000 558G 276G  282G 49.39 1.52 119

Kind regards,
Glen Baars
BackOnline Manager
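A plausible explanation (my assumption, not stated in the thread): the available space that `ceph df` reports for a replicated pool is capped by the OSD that would fill up first, not by the sum of free space. Since the new 1TB OSDs are already ~88% full, they drag the projection down even though raw capacity grew. A minimal sketch of that arithmetic, using two OSDs from the `ceph osd df` output above:

```python
# Rough sketch (an assumption, not actual Ceph code) of the MAX AVAIL
# heuristic: usable pool space is capped by the OSD with the smallest free
# fraction, not by total free space across OSDs.
REPLICA = 3  # the pool in this thread is 3x replicated

def max_avail_gb(osds, replica=REPLICA):
    # osds: list of (size_gb, used_gb) tuples
    worst_free_fraction = min((size - used) / size for size, used in osds)
    raw_capacity = sum(size for size, _ in osds)
    return worst_free_fraction * raw_capacity / replica

# Figures from the `ceph osd df` output above (one old and one new OSD,
# to keep the example small).
old_only = [(447, 250)]               # osd.115: ~56% full
with_new = [(447, 250), (931, 820)]   # plus osd.138: ~88% full

print(f"without new OSD: {max_avail_gb(old_only):.0f} GB projected available")
print(f"with new OSD   : {max_avail_gb(with_new):.0f} GB projected available")
# Raw capacity roughly triples, yet the projected available space drops,
# because the overfull new OSD now sets the worst-case free fraction.
```

If that is what is happening here, rebalancing the data (e.g. `ceph osd reweight-by-utilization`, or the balancer module on Luminous and later) should recover the reported space without lowering CRUSH weights.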
Re: [ceph-users] 12.2.6 upgrade
Thanks, we are fully bluestore and therefore just set osd skip data digest = true.

Kind regards,
Glen Baars

-Original Message-
From: Dan van der Ster
Sent: Friday, 20 July 2018 4:08 PM
To: Glen Baars
Cc: ceph-users
Subject: Re: [ceph-users] 12.2.6 upgrade

That's right. But please read the notes carefully to understand if you need to set osd skip data digest = true or osd distrust data digest = true.

.. dan

On Fri, Jul 20, 2018 at 10:02 AM Glen Baars wrote:
>
> I saw that on the release notes.
>
> Does that mean that the active+clean+inconsistent PGs will be OK?
>
> Is the data still getting replicated even if inconsistent?
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Dan van der Ster
> Sent: Friday, 20 July 2018 3:57 PM
> To: Glen Baars
> Cc: ceph-users
> Subject: Re: [ceph-users] 12.2.6 upgrade
>
> CRC errors are expected in 12.2.7 if you ran 12.2.6 with bluestore.
> See
> https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6
>
> On Fri, Jul 20, 2018 at 8:30 AM Glen Baars wrote:
> >
> > Hello Ceph Users,
> >
> > We have upgraded all nodes to 12.2.7 now. We have 90 PGs ( ~2000 scrub errors ) to fix from the time when we ran 12.2.6. It doesn’t seem to be affecting production at this time.
> >
> > Below is the log of a PG repair. What is the best way to correct these errors? Is there any further information required?
> >
> > rados list-inconsistent-obj 1.275 --format=json-pretty
> > {
> >     "epoch": 38481,
> >     "inconsistents": []
> > }
> >
> > Is it odd that it doesn’t list any inconsistents?
> >
> > Ceph.log entries for this PG.
> > 2018-07-20 12:13:28.381903 osd.124 osd.124 10.4.35.36:6810/1865422 81 : cluster [ERR] 1.275 shard 100: soid 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 dd 92f2c4c8 od alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:13:28.381907 osd.124 osd.124 10.4.35.36:6810/1865422 82 : cluster [ERR] 1.275 shard 124: soid 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 dd 92f2c4c8 od alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:13:28.381909 osd.124 osd.124 10.4.35.36:6810/1865422 83 : cluster [ERR] 1.275 soid 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head: failed to pick suitable auth object
> >
> > 2018-07-20 12:15:15.310579 osd.124 osd.124 10.4.35.36:6810/1865422 84 : cluster [ERR] 1.275 shard 100: soid 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 0xdf907335 != data_digest 0x38400b00 from auth oi 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 38400b00 od alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:15:15.310582 osd.124 osd.124 10.4.35.36:6810/1865422 85 : cluster [ERR] 1.275 shard 124: soid 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 0xdf907335 != data_digest 0x38400b00 from auth oi 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 38400b00 od alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:15:15.310584 osd.124 osd.124 10.4.35.36:6810/1865422 86 : cluster [ERR] 1.275 soid 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head: failed to pick suitable auth object
> >
> > 2018-07-20 12:16:07.518970 osd.124 osd.124 10.4.35.36:6810/1865422 87 : cluster [ERR] 1.275 shard 100: soid 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 0x6555a7c9 != data_digest 0xbad822f from auth oi 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 dd bad82
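For anyone following this thread later, the flag being discussed is set cluster-wide in ceph.conf. A sketch, based on my reading of the 12.2.7 release notes linked above (check the notes yourself before choosing between the two options):

```ini
# ceph.conf -- [osd] section; only relevant on clusters that ran 12.2.6.
[osd]
osd skip data digest = true        ; all-BlueStore clusters (this thread's case)
; osd distrust data digest = true  ; mixed FileStore/BlueStore clusters
```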
Re: [ceph-users] 12.2.6 upgrade
I saw that on the release notes. Does that mean that the active+clean+inconsistent PGs will be OK? Is the data still getting replicated even if inconsistent?

Kind regards,
Glen Baars

-Original Message-
From: Dan van der Ster
Sent: Friday, 20 July 2018 3:57 PM
To: Glen Baars
Cc: ceph-users
Subject: Re: [ceph-users] 12.2.6 upgrade

CRC errors are expected in 12.2.7 if you ran 12.2.6 with bluestore. See https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6

On Fri, Jul 20, 2018 at 8:30 AM Glen Baars wrote:
>
> Hello Ceph Users,
>
> We have upgraded all nodes to 12.2.7 now. We have 90 PGs ( ~2000 scrub errors ) to fix from the time when we ran 12.2.6. It doesn’t seem to be affecting production at this time.
>
> Below is the log of a PG repair. What is the best way to correct these errors? Is there any further information required?
>
> rados list-inconsistent-obj 1.275 --format=json-pretty
> {
>     "epoch": 38481,
>     "inconsistents": []
> }
>
> Is it odd that it doesn’t list any inconsistents?
>
> Ceph.log entries for this PG.
> 2018-07-20 12:13:28.381903 osd.124 osd.124 10.4.35.36:6810/1865422 81 : cluster [ERR] 1.275 shard 100: soid 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 dd 92f2c4c8 od alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:13:28.381907 osd.124 osd.124 10.4.35.36:6810/1865422 82 : cluster [ERR] 1.275 shard 124: soid 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 dd 92f2c4c8 od alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:13:28.381909 osd.124 osd.124 10.4.35.36:6810/1865422 83 : cluster [ERR] 1.275 soid 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head: failed to pick suitable auth object
>
> 2018-07-20 12:15:15.310579 osd.124 osd.124 10.4.35.36:6810/1865422 84 : cluster [ERR] 1.275 shard 100: soid 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 0xdf907335 != data_digest 0x38400b00 from auth oi 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 38400b00 od alloc_hint [4194304 4194304 0])
>
> 2018-07-05 12:15:15.310582 osd.124 osd.124 10.4.35.36:6810/1865422 85 : cluster [ERR] 1.275 shard 124: soid 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 0xdf907335 != data_digest 0x38400b00 from auth oi 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 38400b00 od alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:15:15.310584 osd.124 osd.124 10.4.35.36:6810/1865422 86 : cluster [ERR] 1.275 soid 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head: failed to pick suitable auth object
>
> 2018-07-20 12:16:07.518970 osd.124 osd.124 10.4.35.36:6810/1865422 87 : cluster [ERR] 1.275 shard 100: soid 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 0x6555a7c9 != data_digest 0xbad822f from auth oi 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 dd bad822f od alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:16:07.518975 osd.124 osd.124 10.4.35.36:6810/1865422 88 : cluster [ERR] 1.275 shard 124: soid 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 0x6555a7c9 != data_digest 0xbad822f from auth oi 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 dd bad822f od alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:16:07.518977 osd.124 osd.124 10.4.35.36:6810/1865422 89 : cluster [ERR] 1.275 soid 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head: failed to pick suitable auth object
>
> 2018-07-20 12:16:29.476778 osd.124 osd.124 10.4.35.36:6810/1865422 90 : cluster [ERR] 1.275 shard 100: soid 1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head data_digest 0xa394e845 != data_digest 0xd8aa931c
[ceph-users] 12.2.6 upgrade
gest 0x218b7cb4 from auth oi 1:ae4de127:::rbd_data.37c2374b0dc51.0002f6a6:head(37426'306744 client.1079025.0:23363742 dirty|data_digest|omap_digest s 4194304 uv 306744 dd 218b7cb4 od alloc_hint [4194304 4194304 0])

2018-07-20 12:19:59.498925 osd.124 osd.124 10.4.35.36:6810/1865422 94 : cluster [ERR] 1.275 shard 124: soid 1:ae4de127:::rbd_data.37c2374b0dc51.0002f6a6:head data_digest 0x2008cb1b != data_digest 0x218b7cb4 from auth oi 1:ae4de127:::rbd_data.37c2374b0dc51.0002f6a6:head(37426'306744 client.1079025.0:23363742 dirty|data_digest|omap_digest s 4194304 uv 306744 dd 218b7cb4 od alloc_hint [4194304 4194304 0])

2018-07-20 12:19:59.498927 osd.124 osd.124 10.4.35.36:6810/1865422 95 : cluster [ERR] 1.275 soid 1:ae4de127:::rbd_data.37c2374b0dc51.0002f6a6:head: failed to pick suitable auth object

2018-07-20 12:20:29.937564 osd.124 osd.124 10.4.35.36:6810/1865422 96 : cluster [ERR] 1.275 shard 100: soid 1:ae4f1dd8:::rbd_data.7695c59bb0bc2.05bb:head data_digest 0x1b42858b != data_digest 0x69a5f3de from auth oi 1:ae4f1dd8:::rbd_data.7695c59bb0bc2.05bb:head(38220'328463 client.1084539.0:403248048 dirty|data_digest|omap_digest s 4194304 uv 308146 dd 69a5f3de od alloc_hint [4194304 4194304 0])

2018-07-20 12:20:29.937568 osd.124 osd.124 10.4.35.36:6810/1865422 97 : cluster [ERR] 1.275 shard 124: soid 1:ae4f1dd8:::rbd_data.7695c59bb0bc2.05bb:head data_digest 0x1b42858b != data_digest 0x69a5f3de from auth oi 1:ae4f1dd8:::rbd_data.7695c59bb0bc2.05bb:head(38220'328463 client.1084539.0:403248048 dirty|data_digest|omap_digest s 4194304 uv 308146 dd 69a5f3de od alloc_hint [4194304 4194304 0])

2018-07-20 12:20:29.937570 osd.124 osd.124 10.4.35.36:6810/1865422 98 : cluster [ERR] 1.275 soid 1:ae4f1dd8:::rbd_data.7695c59bb0bc2.05bb:head: failed to pick suitable auth object

2018-07-20 12:21:07.463206 osd.124 osd.124 10.4.35.36:6810/1865422 99 : cluster [ERR] 1.275 repair 12 errors, 0 fixed

Kind regards,
Glen Baars

From: ceph-users <ceph-users-boun...@lists.ceph.com> On Behalf Of Glen Baars
Sent: Wednesday, 18 July 2018 10:33 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] 10.2.6 upgrade

Hello Ceph Users,

We installed 12.2.6 on a single node in the cluster ( new node added, 80TB moved ) and disabled scrub/deep-scrub once the issues with 12.2.6 were discovered.

Today we upgraded the affected node to 12.2.7, set osd skip data digest = true and re-enabled the scrubs. It's a 500TB all-bluestore cluster. We are now seeing inconsistent PGs and scrub errors now that scrubbing has resumed.

What is the best way forward?

1. Upgrade all nodes to 12.2.7?
2. Remove the 12.2.7 node and rebuild?

Kind regards,
Glen Baars
BackOnline Manager
Re: [ceph-users] 10.2.6 upgrade
Hello Sage,

Thanks for the response. I'm fairly new to Ceph. Are there any commands that would help confirm the issue?

Kind regards,
Glen Baars
T 1300 733 328  NZ +64 9280 3561  MOB +61 447 991 234

-Original Message-
From: Sage Weil
Sent: Wednesday, 18 July 2018 10:38 PM
To: Glen Baars
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] 10.2.6 upgrade

On Wed, 18 Jul 2018, Glen Baars wrote:
> Hello Ceph Users,
>
> We installed 12.2.6 on a single node in the cluster ( new node added,
> 80TB moved ) Disabled scrub/deepscrub once the issues with 12.2.6 were
> discovered.
>
> Today we upgrade the one affected node to 12.2.7 today, set osd skip data
> digest = true and re enabled the scrubs. It's a 500TB all bluestore cluster.
>
> We are now seeing inconsistent PGs and scrub errors now the scrubbing has
> resumed.

It is likely the inconsistencies were there from the period running 12.2.6, not due to 12.2.7. I would suggest continuing the upgrade. The scrub errors will either go away on their own or need to wait until 12.2.8 for scrub to learn how to repair them for you.

Can you share the scrub error you got to confirm it is the digest issue in 12.2.6 that is to blame?

sage

> What is the best way forward?
>
> 1. Upgrade all nodes to 12.2.7?
> 2. Remove the 12.2.7 node and rebuild?
>
> Kind regards,
> Glen Baars
> BackOnline Manager
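One way to answer Sage's question is to check whether every scrub error in the cluster log carries the digest-mismatch signature rather than some other corruption. A sketch; the log lines below are abbreviated, hypothetical stand-ins for entries from /var/log/ceph/ceph.log (the pattern is what matters):

```python
import re

# Abbreviated cluster-log lines modeled on the errors quoted in this thread;
# on a live cluster these would come from /var/log/ceph/ceph.log.
log = """\
2018-07-20 12:13:28.381903 osd.124 ... [ERR] 1.275 shard 100: soid 1:ae423e16:...:head data_digest 0x1a131dab != data_digest 0x92f2c4c8 from auth oi ...
2018-07-20 12:13:28.381909 osd.124 ... [ERR] 1.275 soid 1:ae423e16:...:head: failed to pick suitable auth object
"""

errors = [line for line in log.splitlines() if "[ERR]" in line]
# The 12.2.6 signature: two digests disagreeing with the authoritative
# object info, rather than a read error or missing shard.
digest_mismatches = [
    line for line in errors
    if re.search(r"data_digest 0x[0-9a-f]+ != data_digest 0x[0-9a-f]+", line)
]
print(f"{len(digest_mismatches)} of {len(errors)} [ERR] lines are digest mismatches")
```

If essentially all of the [ERR] lines match that pattern, it is consistent with the 12.2.6 digest bug rather than real on-disk corruption.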
[ceph-users] 10.2.6 upgrade
Hello Ceph Users,

We installed 12.2.6 on a single node in the cluster ( new node added, 80TB moved ) and disabled scrub/deep-scrub once the issues with 12.2.6 were discovered.

Today we upgraded the affected node to 12.2.7, set osd skip data digest = true and re-enabled the scrubs. It's a 500TB all-bluestore cluster. We are now seeing inconsistent PGs and scrub errors now that scrubbing has resumed.

What is the best way forward?

1. Upgrade all nodes to 12.2.7?
2. Remove the 12.2.7 node and rebuild?

Kind regards,
Glen Baars
BackOnline Manager
Re: [ceph-users] intermittent slow requests on idle ssd ceph clusters
Hello Pavel,

I don't have all that much info ( fairly new to Ceph ) but we are facing a similar issue. If the cluster is fairly idle we get slow requests; if I'm backfilling a new node there are no slow requests. Same X540 network cards, but Ceph 12.2.5 on Ubuntu 16.04 with the 4.4.0 kernel, and LACP with VLANs for the Ceph front/back-end networks.

Not sure that it is the same issue, but if you want me to do any tests, let me know.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users On Behalf Of Xavier Trilla
Sent: Tuesday, 17 July 2018 6:16 AM
To: Pavel Shub ; Ceph Users
Subject: Re: [ceph-users] intermittent slow requests on idle ssd ceph clusters

Hi Pavel,

Any strange messages in dmesg, syslog, etc? I would recommend profiling the kernel with perf and checking for the calls that are consuming more CPU. We had several problems like the one you are describing; for example, one of them got fixed by increasing vm.min_free_kbytes to 4GB.

Also, how is the sys usage if you run top on the machines hosting the OSDs?

Best regards,
Xavier Trilla P.
Clouding.io

A Cloud Server with SSDs, redundant and available in less than 30 seconds? Try it now at Clouding.io!

-Original Message-
From: ceph-users On Behalf Of Pavel Shub
Sent: Monday, 16 July 2018 23:52
To: Ceph Users
Subject: [ceph-users] intermittent slow requests on idle ssd ceph clusters

Hello folks,

We've been having issues with slow requests cropping up on practically idle Ceph clusters. From what I can tell the requests are hanging waiting for subops, and the OSD on the other end receives the request minutes later! Below, the op started waiting for subops at 12:09:51 and the subop was completed at 12:14:28.
{
    "description": "osd_op(client.903117.0:569924 6.391 6:89ed76f2:::%2fraster%2fv5%2fes%2f16%2f36320%2f24112:head [writefull 0~2072] snapc 0=[] ondisk+write+known_if_redirected e5777)",
    "initiated_at": "2018-07-05 12:09:51.191419",
    "age": 326.651167,
    "duration": 276.977834,
    "type_data": {
        "flag_point": "commit sent; apply or cleanup",
        "client_info": {
            "client": "client.903117",
            "client_addr": "10.20.31.234:0/1433094386",
            "tid": 569924
        },
        "events": [
            { "time": "2018-07-05 12:09:51.191419", "event": "initiated" },
            { "time": "2018-07-05 12:09:51.191471", "event": "queued_for_pg" },
            { "time": "2018-07-05 12:09:51.191538", "event": "reached_pg" },
            { "time": "2018-07-05 12:09:51.191877", "event": "started" },
            { "time": "2018-07-05 12:09:51.192135", "event": "waiting for subops from 11" },
            { "time": "2018-07-05 12:09:51.192599", "event": "op_commit" },
            { "time": "2018-07-05 12:09:51.192616", "event": "op_applied" },
            { "time": "2018-07-05 12:14:28.169018", "event": "sub_op_commit_rec from 11" },
            { "time": "2018-07-05 12:14:28.169164", "event": "commit_sent" },
            { "time": "2018-07-05 12:14:28.169253", "event": "done" }
        ]
    }
},

Below is what I assume is the corresponding request on osd.11; it seems to receive the network request ~4 minutes later.
2018-07-05 12:14:28.058552 7fb75ee0e700 20 osd.11 5777 share_map_peer 0x562b61bca000 already has epoch 5777
2018-07-05 12:14:28.167247 7fb75de0c700 10 osd.11 5777 new session 0x562cc23f0200 con=0x562baaa0e000 addr=10.16.15.28:6805/3218
2018-07-05 12:14:28.167282 7fb75de0c700 10 osd.11 5777 session 0x562cc23f0200 osd.20 has caps osdcap[grant(*)] 'allow *'
2018-07-05 12:14:28.167291 7fb75de0c700 0 -- 10.16.16.32:6817/3808 >> 10.16.15.28:6805/3218 conn(0x562baaa0e000 :6817 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 20 vs existing csq=19 existing_state=STATE_STANDBY
2018-07-05 12:14:28.167322 7fb7546d6700 2 osd.11 5777 ms_handle_reset con 0x562baaa0e000 session 0x562cc23f0200
2018-07-05 12:14:28.167546 7fb75de0c700 10 osd.11 5777 session 0x562b62195c00 osd.20 has caps osdcap[grant(*)] 'allow *'

This is an all-SSD cluster with minimal load. All hardware checks return good values. The cluster is currently running the latest Ceph Mimic (13.2.0), but we have also experienced this on luminous 12.2.2 and 12.2.5.

I'm starting to think that this is a potential network driver issue. We're currently running on kernel 4.14.15, and when we updated to the latest 4.17 the slow requests seemed to occur more frequently. The network cards that we run are 10G Intel X540.

Does anyone know how I can debug this further?

Thanks,
Pavel
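To spot where an op stalls without eyeballing timestamps, the event list from `dump_historic_ops` can be diffed programmatically. A small sketch (the event list is abbreviated from the dump quoted above):

```python
from datetime import datetime

# Abbreviated events from the `dump_historic_ops` output above.
events = [
    ("2018-07-05 12:09:51.191419", "initiated"),
    ("2018-07-05 12:09:51.192135", "waiting for subops from 11"),
    ("2018-07-05 12:09:51.192616", "op_applied"),
    ("2018-07-05 12:14:28.169018", "sub_op_commit_rec from 11"),
    ("2018-07-05 12:14:28.169253", "done"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")

# Gap between consecutive events; the biggest one is the stall.
gaps = [
    (parse(t2) - parse(t1), f"{e1} -> {e2}")
    for (t1, e1), (t2, e2) in zip(events, events[1:])
]
worst_gap, transition = max(gaps)
print(f"stalled {worst_gap.total_seconds():.1f}s in: {transition}")
# The ~277s gap sits between op_applied and sub_op_commit_rec, i.e. the
# primary was waiting on the replica (osd.11) -- pointing at the network
# or the peer OSD rather than the local disk.
```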
Re: [ceph-users] 12.2.6 CRC errors
Thanks Uwe, I saw that on the website. Any idea if what I have done is correct? Do I now just wait?

Sent from my Cyanogen phone

On 14 Jul 2018 11:16 PM, Uwe Sauter wrote:

Hi Glen,

about 16h ago there was a notice on this list with subject "IMPORTANT: broken luminous 12.2.6 release in repo, do not upgrade" from Sage Weil (main developer of Ceph). Quote from this notice:

"tl;dr: Please avoid the 12.2.6 packages that are currently present on download.ceph.com. We will have a 12.2.7 published ASAP (probably Monday). If you do not use bluestore or erasure-coded pools, none of the issues affect you.

Details: We built 12.2.6 and pushed it to the repos Wednesday, but as that was happening realized there was a potentially dangerous regression in 12.2.5[1] that an upgrade might exacerbate. While we sorted that issue out, several people noticed the updated version in the repo and upgraded. That turned up two other regressions[2][3]. We have fixes for those, but are working on an additional fix to make the damage from [3] be transparently repaired."

Regards,

	Uwe

On 14.07.2018 17:02, Glen Baars wrote:
> Hello Ceph users!
>
> Note to users, don't install new servers on Friday the 13th!
>
> We added a new Ceph node on Friday and it received the latest 12.2.6 update. I started to see CRC errors and investigated hardware issues. I have since found that it is caused by the 12.2.6 release. About 80TB has been copied onto this server.
>
> I have set noout, noscrub, nodeep-scrub and repaired the affected PGs ( ceph pg repair ). This has cleared the errors.
>
> *** No idea if this is a good way to fix the issue. From the bug report this issue is in the deep-scrub, and therefore I suppose stopping it will limit the issues. ***
>
> Can anyone tell me what to do? Downgrade seems like it won't fix the issue. Maybe remove this node, rebuild with 12.2.5 and resync data? Or wait a few days for 12.2.7?
>
> Kind regards,
> Glen Baars
[ceph-users] 12.2.6 CRC errors
Hello Ceph users!

Note to users, don't install new servers on Friday the 13th!

We added a new Ceph node on Friday and it received the latest 12.2.6 update. I started to see CRC errors and investigated hardware issues. I have since found that it is caused by the 12.2.6 release. About 80TB has been copied onto this server.

I have set noout, noscrub, nodeep-scrub and repaired the affected PGs ( ceph pg repair ). This has cleared the errors.

*** No idea if this is a good way to fix the issue. From the bug report this issue is in the deep-scrub, and therefore I suppose stopping it will limit the issues. ***

Can anyone tell me what to do? Downgrade seems like it won't fix the issue. Maybe remove this node, rebuild with 12.2.5 and resync data? Or wait a few days for 12.2.7?

Kind regards,
Glen Baars
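The repair step described above can be scripted instead of run PG by PG: pull the inconsistent PG IDs out of `ceph health detail` and generate one repair command each. A sketch; the health output below is a hypothetical sample in the shape 12.2.x prints (only pg 1.275 appears in this thread; the other PG ID and the acting sets are made up), and the script only prints the commands so they can be reviewed before running:

```python
# Hypothetical sample of `ceph health detail` output; on a live cluster this
# text would come from running the command and capturing stdout.
health_detail = """\
HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
    pg 1.275 is active+clean+inconsistent, acting [124,100,84]
    pg 1.3a0 is active+clean+inconsistent, acting [12,45,67]
"""

# Collect the PG IDs from the "pg <id> is active+clean+inconsistent" lines.
damaged = [
    line.split()[1]
    for line in health_detail.splitlines()
    if line.split()[:1] == ["pg"] and "active+clean+inconsistent" in line
]

# Print (not run) one repair command per damaged PG, for review first.
for pg in damaged:
    print(f"ceph pg repair {pg}")
```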