Re: [ceph-users] RGW pools don't show up in luminous
Casey - this was exactly it. My ceph-mgr had issues. I didn't know this was necessary for ceph df to work. Thank you

R

On Fri, Aug 24, 2018 at 8:56 AM Casey Bodley wrote:
> On 08/23/2018 01:22 PM, Robert Stanford wrote:
> >
> > I installed a new Ceph cluster with Luminous, after a long time
> > working with Jewel. I created my RGW pools the same as always (pool
> > create default.rgw.buckets.data etc.), but they don't show up in ceph
> > df with Luminous. Has the command changed?
> >
> > Thanks
> > R
>
> Hi Robert,
>
> Do you have a ceph-mgr running? I believe the accounting for 'ceph df'
> is performed by ceph-mgr in Luminous and beyond, rather than ceph-mon.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
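Casey's pointer can be turned into a quick check before debugging missing 'ceph df' pool stats. A minimal sketch, assuming an admin keyring is readable on the node; the grep pattern below matches the "mgr:" line of Luminous-era `ceph -s` output, which is an assumption about your exact release:

```shell
# Return success only when a status dump reports an active ceph-mgr daemon.
check_mgr() {
    echo "$1" | grep -Eq 'mgr:[[:space:]]*[^ ]+\(active'
}

# Against a live cluster (assumption: client.admin keyring available):
#   check_mgr "$(ceph -s)" || echo "no active mgr - 'ceph df' pool stats will be empty"
```

Keeping the check as a function makes it easy to drop into a monitoring script alongside the usual `ceph health` probe.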
Re: [ceph-users] cephfs kernel client hangs
Are there hung requests in /sys/kernel/debug/ceph//osdc?

On Fri, Aug 24, 2018 at 9:32 PM Zhenshi Zhou wrote:
> I'm afraid that the client hangs again... the log shows:
>
> 2018-08-24 21:27:54.714334 [WRN] slow request 62.607608 seconds old, received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:26:52.106425 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:27:54.714320 [WRN] 3 slow requests, 1 included below; oldest blocked for > 843.556758 secs
> 2018-08-24 21:27:24.713740 [WRN] slow request 32.606979 seconds old, received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:26:52.106425 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:27:24.713729 [WRN] 3 slow requests, 1 included below; oldest blocked for > 813.556129 secs
> 2018-08-24 21:25:49.711778 [WRN] slow request 483.807963 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:25:49.711766 [WRN] 2 slow requests, 1 included below; oldest blocked for > 718.554206 secs
> 2018-08-24 21:21:54.707536 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 483.548912 seconds ago
> 2018-08-24 21:21:54.706930 [WRN] slow request 483.549363 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> 2018-08-24 21:21:54.706920 [WRN] 2 slow requests, 1 included below; oldest blocked for > 483.549363 secs
> 2018-08-24 21:21:49.706838 [WRN] slow request 243.803027 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:21:49.706828 [WRN] 2 slow requests, 1 included below; oldest blocked for > 478.549269 secs
> 2018-08-24 21:19:49.704294 [WRN] slow request 123.800486 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:19:49.704284 [WRN] 2 slow requests, 1 included below; oldest blocked for > 358.546729 secs
> 2018-08-24 21:18:49.703073 [WRN] slow request 63.799269 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:18:49.703062 [WRN] 2 slow requests, 1 included below; oldest blocked for > 298.545511 secs
> 2018-08-24 21:18:19.702465 [WRN] slow request 33.798637 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> 2018-08-24 21:18:19.702456 [WRN] 2 slow requests, 1 included below; oldest blocked for > 268.544880 secs
> 2018-08-24 21:17:54.702517 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 243.543893 seconds ago
> 2018-08-24 21:17:54.701904 [WRN] slow request 243.544331 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> 2018-08-24 21:17:54.701894 [WRN] 1 slow requests, 1 included below; oldest blocked for > 243.544331 secs
> 2018-08-24 21:15:54.700034 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 123.541410 seconds ago
> 2018-08-24 21:15:54.699385 [WRN] slow request 123.541822 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> 2018-08-24 21:15:54.699375 [WRN] 1 slow requests, 1 included below; oldest blocked for > 123.541822 secs
> 2018-08-24 21:14:57.055183 [WRN] Health check failed: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE)
> 2018-08-24 21:14:56.167868 [WRN] MDS health message (mds.0): Client docker39 failing to respond
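The osdc check suggested at the top of the thread can be wrapped so it covers every mounted kernel client at once. This is a sketch, not from the thread itself; it assumes debugfs is mounted at the default /sys/kernel/debug and that you run it as root (the per-mount directory name is the cluster FSID plus client id, which varies):

```shell
# Print any in-flight OSD and MDS requests recorded by the CephFS kernel
# client. Empty osdc/mdsc files mean the hang is not a stuck RADOS/MDS
# request on this client.
list_inflight() {
    base=${1:-/sys/kernel/debug/ceph}
    for d in "$base"/*/; do
        echo "== $d"
        [ -f "${d}osdc" ] && cat "${d}osdc"
        [ -f "${d}mdsc" ] && cat "${d}mdsc"
    done
}

# Usage (as root, on the client host):
#   list_inflight
```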
Re: [ceph-users] ceph-fuse slow cache?
On Fri, Aug 24, 2018 at 1:20 AM Stefan Kooman wrote:
> Hi Gregory,
>
> Quoting Gregory Farnum (gfar...@redhat.com):
> > This is quite strange. Given that you have a log, I think what you want to
> > do is find one request in the log, trace it through its lifetime, and see
> > where the time is elapsed. You may find a bifurcation, where some
> > categories of requests happen instantly but other categories take a second
> > or more; focus on the second, obviously.
>
> So that is what I did. Turns out it's not the (slow) cache at all, probably
> not to your surprise. The reads are quite fast actually; compared to the
> kernel client they are ~8 ms slower, or ~40%. It looks like a couple
> of writes / updates to, at least, a session file are slow:
>
> 2018-08-23 16:40:25.631 7f79156a8700 10 client.15158830 put_inode on 0x1693859.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={} open={3=1} mode=100600 size=0/4194304 nlink=1 btime=2018-08-23 16:40:25.632601 mtime=2018-08-23 16:40:25.632601 ctime=2018-08-23 16:40:25.632601 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x1693859 ts 0/0 objects 0 dirty_or_tx 0] parents=0x168547c.head["sess_ia0agoj01ul4rob7ji55ouca41"] 0x5646ff0e8000)
>
> 2018-08-23 16:40:28.547 7f79156a8700 10 client.15158830 update_inode_file_time 0x1693859.head(faked_ino=0 ref=4 ll_ref=1 cap_refs={} open={3=1} mode=100600 size=0/4194304 nlink=1 btime=2018-08-23 16:40:25.632601 mtime=2018-08-23 16:40:25.632601 ctime=2018-08-23 16:40:25.632601 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x1693859 ts 0/0 objects 0 dirty_or_tx 0] parents=0x168547c.head["sess_ia0agoj01ul4rob7ji55ouca41"] 0x5646ff0e8000) pAsxLsXsxFsxcrwb ctime 2018-08-23 16:40:25.632601 mtime 2018-08-23 16:40:25.632601

Hmm, these aren't actually the start and end times of the same operation. put_inode() is literally adjusting a refcount, which can happen for reasons ranging from the VFS doing something that drops it, to an internal operation completing, to a response coming back from the MDS. You should be able to find requests coming in from the kernel and a response going back out (the function names will be prefixed with "ll_", e.g. "ll_lookup").

> So, almost 3 seconds. The page is only served after this, and possibly after
> some cache files have been written. Note though that ceph-fuse is in
> debug=20 mode. Apparently the kernel client is _much_ faster in writing
> than ceph-fuse. If I write a file with "dd" (from /dev/urandom) it's in
> the tens of milliseconds range, not seconds. atime / ctime changes are
> handled in < 5 ms.
>
> I wonder if tuning file striping [1] with stripe units of 4 KB would be
> beneficial in this case.
>
> Gr. Stefan
>
> [1]: http://docs.ceph.com/docs/master/dev/file-striping/
>
> --
> | BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
> | GPG: 0xD14839C6  +31 318 648 688 / i...@bit.nl
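Greg's suggestion — pairing each incoming "ll_" request with its completion — can be semi-automated. A rough sketch, assuming the log format shown in the excerpts above (date, time, thread id, debug level, client id, function name); the field positions are an assumption about your exact log lines:

```shell
# Extract timestamp + entry point for every kernel-facing "ll_" call in a
# ceph-fuse debug log. Consecutive output lines for the same operation
# make its wall-clock duration visible at a glance.
trace_ll() {
    awk '$6 ~ /^ll_/ { print $1, $2, $6 }' "$1"
}

# Usage: trace_ll /var/log/ceph/ceph-client.fuse.log   # path is an assumption
```

Matching on the sixth whitespace-separated field (rather than grepping for "ll_") avoids false hits on the "ll_ref=" counters inside inode dumps.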
[ceph-users] mimic - troubleshooting prometheus
Hi,

Any ideas/suggestions for troubleshooting prometheus? What logs/commands are available to find out why OSD-server-specific data (IOPS, disk and network data) is not scraped, while cluster-specific data (pools, capacity, etc.) is?

Increasing the log level for the MGR showed only the following:

2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_r_latency_out_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_out_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_in_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_w_latency_in_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_r_latency_out_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_out_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_in_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_w_latency_in_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_r_latency_out_bytes_histogram, type
2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_out_bytes_histogram, type
2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_in_bytes_histogram, type
2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_w_latency_in_bytes_histogram, type
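To narrow down whether the OSD counters are missing from the scrape payload itself (rather than being dropped on the Prometheus side), it helps to count them directly in the exporter output. A sketch: port 9283 is the module's documented default, and the `ceph_osd_op` metric-name prefix is an assumption about how the mgr names these series:

```shell
# Count OSD op metrics in an exported payload; 0 strongly suggests the
# mgr is not exporting per-OSD perf counters at all.
count_osd_metrics() {
    echo "$1" | grep -c '^ceph_osd_op'
}

# Against a live mgr (hostname and port are assumptions):
#   count_osd_metrics "$(curl -s http://mgr-host:9283/metrics)"
```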
Re: [ceph-users] Mimic prometheus plugin -no socket could be created
To have prometheus plugin working you HAVE to tell it to listen to an IPV4 address ...like this ceph config set mgr mgr/prometheus/server_addr 0.0.0.0 On Fri, 24 Aug 2018 at 12:44, Jones de Andrade wrote: > Hi all. > > I'm new to ceph, and after having serious problems in ceph stages 0, 1 and > 2 that I could solve myself, now it seems that I have hit a wall harder > than my head. :) > > When I run salt-run state.orch ceph.stage.deploy, i monitor I see it going > up to here: > > ### > [14/71] ceph.sysctl on > node01... ✓ (0.5s) > node02 ✓ (0.7s) > node03... ✓ (0.6s) > node04. ✓ (0.5s) > node05... ✓ (0.6s) > node06.. ✓ (0.5s) > > [15/71] ceph.osd on > node01.. ❌ (0.7s) > node02 ❌ (0.7s) > node03... ❌ (0.7s) > node04. ❌ (0.6s) > node05... ❌ (0.6s) > node06.. ❌ (0.7s) > > Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s > > Failures summary: > > ceph.osd (/srv/salt/ceph/osd): > node02: > deploy OSDs: Module function osd.deploy threw an exception. Exception: > Mine on node02 for cephdisks.list > node03: > deploy OSDs: Module function osd.deploy threw an exception. Exception: > Mine on node03 for cephdisks.list > node01: > deploy OSDs: Module function osd.deploy threw an exception. Exception: > Mine on node01 for cephdisks.list > node04: > deploy OSDs: Module function osd.deploy threw an exception. Exception: > Mine on node04 for cephdisks.list > node05: > deploy OSDs: Module function osd.deploy threw an exception. Exception: > Mine on node05 for cephdisks.list > node06: > deploy OSDs: Module function osd.deploy threw an exception. Exception: > Mine on node06 for cephdisks.list > ### > > Since this is a first attempt in 6 simple test machines, we are going to > put the mon, osds, etc, in all nodes at first. Only the master is left in a > single machine (node01) by now. 
> > As they are simple machines, they have a single hdd, which is partitioned > as follows (the hda4 partition is unmounted and left for the ceph system): > > ### > # lsblk > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > sda 8:00 465,8G 0 disk > ├─sda1 8:10 500M 0 part /boot/efi > ├─sda2 8:2016G 0 part [SWAP] > ├─sda3 8:30 49,3G 0 part / > └─sda4 8:40 400G 0 part > sr0 11:01 3,7G 0 rom > > # salt -I 'roles:storage' cephdisks.list > node01: > node02: > node03: > node04: > node05: > node06: > > # salt -I 'roles:storage' pillar.get ceph > node02: > -- > storage: > -- > osds: > -- > /dev/sda4: > -- > format: > bluestore > standalone: > True > (and so on for all 6 machines) > ## > > Finally and just in case, my policy.cfg file reads: > > # > #cluster-unassigned/cluster/*.sls > cluster-ceph/cluster/*.sls > profile-default/cluster/*.sls > profile-default/stack/default/ceph/minions/*yml > config/stack/default/global.yml > config/stack/default/ceph/cluster.yml > role-master/cluster/node01.sls > role-admin/cluster/*.sls > role-mon/cluster/*.sls > role-mgr/cluster/*.sls > role-mds/cluster/*.sls > role-ganesha/cluster/*.sls > role-client-nfs/cluster/*.sls > role-client-cephfs/cluster/*.sls > ## > > Please, could someone help me and shed some light on this issue? > > Thanks a lot in advance, > > Regasrds, > > Jones > > > > On Thu, Aug 23, 2018 at 2:46 PM John Spray wrote: > >> On Thu, Aug 23, 2018 at 5:18 PM Steven Vacaroaia >> wrote: >> > >> > Hi All, >> > >> > I am trying to enable prometheus plugin with no success due to "no >> socket could be created" >> > >> > The instructions for enabling the plugin are very straightforward and >> simple >> > >> > Note >> > My ultimate goal is to use Prometheus with Cephmetrics >> > Some of you suggested to deploy ceph-exporter but why do we need to do >> that when there is a plugin already ? >> > >> > >> > How can I troubleshoot this further ? 
>> > >> > nhandled exception from module 'prometheus' while running on mgr.mon01: >> error('No socket could be created',) >> > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1 >> prometheus.serve: >> > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1 >> Traceback (most recent call last): >> > Aug 23 12:03:06 mon01 ceph-mgr: File >>
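Combining the bind-address fix from the top of this thread with a quick listener check gives a repeatable recovery sequence. A sketch, not a confirmed procedure: port 9283 is the module's default, and disabling/re-enabling the module is assumed here to be enough for it to re-read server_addr:

```shell
# Force the prometheus module onto an explicit IPv4 wildcard address,
# restart the module, then confirm a listener actually appeared.
ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph mgr module disable prometheus
ceph mgr module enable prometheus
sleep 2
ss -tlnp | grep 9283 || echo "still no listener - check 'ceph mgr module ls' and the mgr log"
```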
[ceph-users] Ceph-Deploy error on 15/71 stage
(Please forgive my previous email: I was using another message and completely forgot to update the subject)

Hi all.

I'm new to ceph, and after having serious problems in ceph stages 0, 1 and 2 that I could solve myself, now it seems that I have hit a wall harder than my head. :)

When I run salt-run state.orch ceph.stage.deploy, in the monitor I see it going up to here:

###
[14/71] ceph.sysctl on
        node01... ✓ (0.5s)
        node02 ✓ (0.7s)
        node03... ✓ (0.6s)
        node04. ✓ (0.5s)
        node05... ✓ (0.6s)
        node06.. ✓ (0.5s)

[15/71] ceph.osd on
        node01.. ❌ (0.7s)
        node02 ❌ (0.7s)
        node03... ❌ (0.7s)
        node04. ❌ (0.6s)
        node05... ❌ (0.6s)
        node06.. ❌ (0.7s)

Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s

Failures summary:

ceph.osd (/srv/salt/ceph/osd):
  node02:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node02 for cephdisks.list
  node03:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node03 for cephdisks.list
  node01:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node01 for cephdisks.list
  node04:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node04 for cephdisks.list
  node05:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node05 for cephdisks.list
  node06:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node06 for cephdisks.list
###

Since this is a first attempt in 6 simple test machines, we are going to put the mon, osds, etc. in all nodes at first. Only the master is left in a single machine (node01) by now.
As they are simple machines, they have a single hdd, which is partitioned as follows (the sda4 partition is unmounted and left for the ceph system):

###
# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465,8G  0 disk
├─sda1   8:1    0   500M  0 part /boot/efi
├─sda2   8:2    0    16G  0 part [SWAP]
├─sda3   8:3    0  49,3G  0 part /
└─sda4   8:4    0   400G  0 part
sr0     11:0    1   3,7G  0 rom

# salt -I 'roles:storage' cephdisks.list
node01:
node02:
node03:
node04:
node05:
node06:

# salt -I 'roles:storage' pillar.get ceph
node02:
    ----------
    storage:
        ----------
        osds:
            ----------
            /dev/sda4:
                ----------
                format:
                    bluestore
                standalone:
                    True
(and so on for all 6 machines)
##

Finally, and just in case, my policy.cfg file reads:

#
#cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/*.sls
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*yml
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
role-master/cluster/node01.sls
role-admin/cluster/*.sls
role-mon/cluster/*.sls
role-mgr/cluster/*.sls
role-mds/cluster/*.sls
role-ganesha/cluster/*.sls
role-client-nfs/cluster/*.sls
role-client-cephfs/cluster/*.sls
##

Please, could someone help me and shed some light on this issue?

Thanks a lot in advance,

Regards,

Jones
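A hedged first step, which is an assumption on my part and not something confirmed in this thread: the "Mine on \<node\> for cephdisks.list" exception and the empty `cephdisks.list` output both point at the Salt mine having no cached disk inventory for the storage nodes, so refreshing the mine and re-querying is a cheap check before digging deeper:

```shell
# Refresh the Salt mine on all minions so cached module results
# (including the DeepSea disk inventory) are regenerated, then re-query.
salt '*' mine.update
salt -I 'roles:storage' cephdisks.list
```

If `cephdisks.list` still returns nothing, the next thing to examine would be whether DeepSea's disk discovery accepts a bare partition like /dev/sda4 at all, since it commonly expects whole unused disks; that, too, is a hypothesis rather than a confirmed diagnosis.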
Re: [ceph-users] Mimic prometheus plugin -no socket could be created
Hi all. I'm new to ceph, and after having serious problems in ceph stages 0, 1 and 2 that I could solve myself, now it seems that I have hit a wall harder than my head. :) When I run salt-run state.orch ceph.stage.deploy, i monitor I see it going up to here: ### [14/71] ceph.sysctl on node01... ✓ (0.5s) node02 ✓ (0.7s) node03... ✓ (0.6s) node04. ✓ (0.5s) node05... ✓ (0.6s) node06.. ✓ (0.5s) [15/71] ceph.osd on node01.. ❌ (0.7s) node02 ❌ (0.7s) node03... ❌ (0.7s) node04. ❌ (0.6s) node05... ❌ (0.6s) node06.. ❌ (0.7s) Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s Failures summary: ceph.osd (/srv/salt/ceph/osd): node02: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node02 for cephdisks.list node03: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node03 for cephdisks.list node01: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node01 for cephdisks.list node04: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node04 for cephdisks.list node05: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node05 for cephdisks.list node06: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node06 for cephdisks.list ### Since this is a first attempt in 6 simple test machines, we are going to put the mon, osds, etc, in all nodes at first. Only the master is left in a single machine (node01) by now. 
As they are simple machines, they have a single hdd, which is partitioned as follows (the hda4 partition is unmounted and left for the ceph system): ### # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:00 465,8G 0 disk ├─sda1 8:10 500M 0 part /boot/efi ├─sda2 8:2016G 0 part [SWAP] ├─sda3 8:30 49,3G 0 part / └─sda4 8:40 400G 0 part sr0 11:01 3,7G 0 rom # salt -I 'roles:storage' cephdisks.list node01: node02: node03: node04: node05: node06: # salt -I 'roles:storage' pillar.get ceph node02: -- storage: -- osds: -- /dev/sda4: -- format: bluestore standalone: True (and so on for all 6 machines) ## Finally and just in case, my policy.cfg file reads: # #cluster-unassigned/cluster/*.sls cluster-ceph/cluster/*.sls profile-default/cluster/*.sls profile-default/stack/default/ceph/minions/*yml config/stack/default/global.yml config/stack/default/ceph/cluster.yml role-master/cluster/node01.sls role-admin/cluster/*.sls role-mon/cluster/*.sls role-mgr/cluster/*.sls role-mds/cluster/*.sls role-ganesha/cluster/*.sls role-client-nfs/cluster/*.sls role-client-cephfs/cluster/*.sls ## Please, could someone help me and shed some light on this issue? Thanks a lot in advance, Regasrds, Jones On Thu, Aug 23, 2018 at 2:46 PM John Spray wrote: > On Thu, Aug 23, 2018 at 5:18 PM Steven Vacaroaia wrote: > > > > Hi All, > > > > I am trying to enable prometheus plugin with no success due to "no > socket could be created" > > > > The instructions for enabling the plugin are very straightforward and > simple > > > > Note > > My ultimate goal is to use Prometheus with Cephmetrics > > Some of you suggested to deploy ceph-exporter but why do we need to do > that when there is a plugin already ? > > > > > > How can I troubleshoot this further ? 
> > > > nhandled exception from module 'prometheus' while running on mgr.mon01: > error('No socket could be created',) > > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1 > prometheus.serve: > > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1 > Traceback (most recent call last): > > Aug 23 12:03:06 mon01 ceph-mgr: File > "/usr/lib64/ceph/mgr/prometheus/module.py", line 720, in serve > > Aug 23 12:03:06 mon01 ceph-mgr: cherrypy.engine.start() > > Aug 23 12:03:06 mon01 ceph-mgr: File > "/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 250, in > start > > Aug 23 12:03:06 mon01 ceph-mgr: raise e_info > > Aug 23 12:03:06 mon01 ceph-mgr: ChannelFailures: error('No socket could > be created',) > > The things I usually check if a process can't create a socket are: > - is
Re: [ceph-users] ceph auto repair. What is wrong?
Hi!

Did not help. :(

HEALTH_WARN 3 osds down; 1 host (3 osds) down; 1 rack (3 osds) down; Degraded data redundancy: 112 pgs undersized
OSD_DOWN 3 osds down
    osd.24 (root=default,rack=R-26-7-1,host=S-26-7-1-1) is down
    osd.25 (root=default,rack=R-26-7-1,host=S-26-7-1-1) is down
    osd.26 (root=default,rack=R-26-7-1,host=S-26-7-1-1) is down
OSD_HOST_DOWN 1 host (3 osds) down
    host S-26-7-1-1 (root=default,rack=R-26-7-1) (3 osds) is down
OSD_RACK_DOWN 1 rack (3 osds) down
    rack R-26-7-1 (root=default) (3 osds) is down
PG_DEGRADED Degraded data redundancy: 112 pgs undersized
    pg 2.0 is stuck undersized for 2466.145928, current state active+undersized, last acting [18,33]
    pg 2.6 is stuck undersized for 2466.144061, current state active+undersized, last acting [15,18]
    pg 2.1b is stuck undersized for 2466.143789, current state active+undersized, last acting [30,6]
    pg 2.22 is stuck undersized for 2466.141138, current state active+undersized, last acting [15,21]
[...]

[root@S-26-6-1-2 tmp]# ceph config dump
WHO  MASK  LEVEL     OPTION                          VALUE  RO
mon        advanced  mon_allow_pool_delete           true
mon        advanced  mon_osd_down_out_subtree_limit  pod    *

On 08/24/18 17:12, Fyodor Ustinov wrote:

Hi!

I.e. I have to do "ceph config set mon mon_osd_down_out_subtree_limit row" and restart every mon?

On 08/24/18 12:44, Paul Emmerich wrote:

Ceph doesn't mark out whole racks by default; set mon_osd_down_out_subtree_limit to something higher like row or pod.

Paul

2018-08-24 10:50 GMT+02:00 Christian Balzer:

Hello,

On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote:

Hi!

I wait about an hour.

Aside from verifying those timeout values in your cluster, what's your mon_osd_down_out_subtree_limit set to?

Christian

- Original Message -
From: "Wido den Hollander"
To: "Fyodor Ustinov" , ceph-users@lists.ceph.com
Sent: Friday, 24 August, 2018 09:52:23
Subject: Re: [ceph-users] ceph auto repair. What is wrong?

On 08/24/2018 06:11 AM, Fyodor Ustinov wrote:

Hi!

I have a fresh ceph cluster: 12 hosts with 3 osds on each host (one hdd and two ssd). Each host is located in its own rack. I made the following crush configuration on the fresh ceph installation:

sudo ceph osd crush add-bucket R-26-3-1 rack
sudo ceph osd crush add-bucket R-26-3-2 rack
sudo ceph osd crush add-bucket R-26-4-1 rack
sudo ceph osd crush add-bucket R-26-4-2 rack
[...]
sudo ceph osd crush add-bucket R-26-8-1 rack
sudo ceph osd crush add-bucket R-26-8-2 rack
sudo ceph osd crush move R-26-3-1 root=default
[...]
sudo ceph osd crush move R-26-8-2 root=default
sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1
[...]
sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2
sudo ceph osd crush rule create-replicated hddreplrule default rack hdd
sudo ceph osd pool create rbd 256 256 replicated hddreplrule
sudo ceph osd pool set rbd size 3
sudo ceph osd pool set rbd min_size 2

The osd tree looks like:

ID  CLASS WEIGHT    TYPE NAME            STATUS REWEIGHT PRI-AFF
 -1       117.36346 root default
 -2         9.78029     rack R-26-3-1
-27         9.78029         host S-26-3-1-1
  0   hdd   9.32390             osd.0        up  1.0  1.0
  1   ssd   0.22820             osd.1        up  1.0  1.0
  2   ssd   0.22820             osd.2        up  1.0  1.0
 -3         9.78029     rack R-26-3-2
-43         9.78029         host S-26-3-2-1
  3   hdd   9.32390             osd.3        up  1.0  1.0
  4   ssd   0.22820             osd.4        up  1.0  1.0
  5   ssd   0.22820             osd.5        up  1.0  1.0
[...]

Now I write some data to the rbd pool and shut down one node:

  cluster:
    id: 9000d700-8529-4d38-b9f5-24d6079429a2
    health: HEALTH_WARN
            3 osds down
            1 host (3 osds) down
            1 rack (3 osds) down
            Degraded data redundancy: 1223/12300 objects degraded (9.943%), 74 pgs degraded, 74 pgs undersized

And ceph does not try to repair the pool. Why?

How long did you wait? The default timeout is 600 seconds before recovery starts. These OSDs are not marked as out yet.

Wido

WBR,
Fyodor.
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Rakuten Communications
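For reference, Paul's suggested change expressed as commands against the Luminous+ central config store. Note that the reporter's `ceph config dump` above already shows `pod`, so the useful next step is verifying what the running mons actually observe; `mon.a` below is a placeholder, and whether a restart is required was left open in the thread:

```shell
# Raise the subtree limit above "rack" so a whole down rack can still be
# marked out automatically.
ceph config set mon mon_osd_down_out_subtree_limit row
ceph config dump | grep mon_osd_down_out_subtree_limit

# Check the value the running daemon observes (run on the mon host,
# against its admin socket; mon.a is a placeholder for your mon id):
ceph daemon mon.a config get mon_osd_down_out_subtree_limit
```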
Re: [ceph-users] Clients report OSDs down/up (dmesg) nothing in Ceph logs (flapping OSDs)
Update: I changed the primary affinity of one OSD back to 1.0 to test if those metrics change, and indeed they do: OSD.24 immediately shows values greater than 0. I guess the metrics are completely unrelated to the flapping. So the search goes on... Zitat von Eugen Block : An hour ago host5 started to report the OSDs on host4 as down (still no clue why), resulting in slow requests. This time no flapping occured, the cluster recovered a couple of minutes later. No other OSDs reported that, only those two on host5. There's nothing in the logs of the reporting or the affected OSDs. Then I compared a perf dump of one healthy OSD with one on host4. There's something strange about the metrics (many of them are 0), I just can't tell if it's related to the fact that host4 has no primary OSDs. But even with no primary OSD I would expect different values for OSDs that are running for a week now. ---cut here--- host1:~ # diff -u perfdump.osd1 perfdump.osd24 --- perfdump.osd1 2018-08-23 11:03:03.695927316 +0200 +++ perfdump.osd24 2018-08-23 11:02:09.919927375 +0200 @@ -1,99 +1,99 @@ { "osd": { "op_wip": 0, -"op": 7878594, -"op_in_bytes": 852767683202, -"op_out_bytes": 1019871565411, +"op": 0, +"op_in_bytes": 0, +"op_out_bytes": 0, "op_latency": { -"avgcount": 7878594, -"sum": 1018863.131206702, -"avgtime": 0.129320425 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, "op_process_latency": { -"avgcount": 7878594, -"sum": 879970.400440694, -"avgtime": 0.111691299 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, "op_prepare_latency": { -"avgcount": 8321733, -"sum": 41376.442963329, -"avgtime": 0.004972094 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, -"op_r": 3574792, -"op_r_out_bytes": 1019871565411, +"op_r": 0, +"op_r_out_bytes": 0, "op_r_latency": { -"avgcount": 3574792, -"sum": 54750.502669010, -"avgtime": 0.015315717 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, "op_r_process_latency": { -"avgcount": 3574792, -"sum": 34107.703579874, -"avgtime": 0.009541171 +"avgcount": 0, 
+"sum": 0.0, +"avgtime": 0.0 }, "op_r_prepare_latency": { -"avgcount": 3574817, -"sum": 34262.515884817, -"avgtime": 0.009584411 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, -"op_w": 4249520, -"op_w_in_bytes": 847518164870, +"op_w": 0, +"op_w_in_bytes": 0, "op_w_latency": { -"avgcount": 4249520, -"sum": 960898.540843217, -"avgtime": 0.226119312 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, "op_w_process_latency": { -"avgcount": 4249520, -"sum": 844398.804808119, -"avgtime": 0.198704513 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, "op_w_prepare_latency": { -"avgcount": 4692618, -"sum": 7032.358957948, -"avgtime": 0.001498600 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, -"op_rw": 54282, -"op_rw_in_bytes": 5249518332, +"op_rw": 0, +"op_rw_in_bytes": 0, "op_rw_out_bytes": 0, "op_rw_latency": { -"avgcount": 54282, -"sum": 3214.087694475, -"avgtime": 0.059210929 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, "op_rw_process_latency": { -"avgcount": 54282, -"sum": 1463.892052701, -"avgtime": 0.026968277 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, "op_rw_prepare_latency": { -"avgcount": 54298, -"sum": 81.568120564, -"avgtime": 0.001502230 +"avgcount": 0, +"sum": 0.0, +"avgtime": 0.0 }, "op_before_queue_op_lat": { -"avgcount": 25469574, -"sum": 6654.779033909, -"avgtime": 0.000261283 +"avgcount": 4307123, +"sum": 2361.603323307, +"avgtime": 0.000548301 }, "op_before_dequeue_op_lat": { -
Re: [ceph-users] Reminder: bi-weekly dashboard sync call today (15:00 CET)
My mistake, Lenz. That recording is just the 7 minutes of fun before everyone joined. This is the proper one (~1 hour): https://bluejeans.com/s/HUofE

Ernesto

ERNESTO PUERTA
SENIOR SOFTWARE ENGINEER, CEPH R&D
Red Hat

On Fri, Aug 24, 2018 at 4:38 PM Lenz Grimmer wrote:
>
> On 08/24/2018 02:00 PM, Lenz Grimmer wrote:
>
> > On 08/24/2018 10:59 AM, Lenz Grimmer wrote:
> >
> >> JFYI, the team working on the Ceph Manager Dashboard has a bi-weekly
> >> conference call that discusses the ongoing development and gives an
> >> update on recent improvements/features.
> >>
> >> Today, we plan to give a demo of the new dashboard landing page (see
> >> https://tracker.ceph.com/issues/24573 and
> >> https://github.com/ceph/ceph/pull/23568 for details) and the
> >> implementation of the "RBD trash" functionality in the UI
> >> (http://tracker.ceph.com/issues/24272 and
> >> https://github.com/ceph/ceph/pull/23351)
> >>
> >> The meeting takes places every second Friday at 15:00 CET at this URL:
> >>
> >> https://bluejeans.com/150063190
> >
> > My apologies, I picked an incorrect meeting URL - this is the correct one:
> >
> > https://bluejeans.com/470119167/
> >
> > Sorry for the confusion.
>
> Thanks to everyone who participated. We actually moved to yet another
> different BlueJeans session in order to be able to record it...
>
> For those of you who missed it, here's a recording:
>
> https://bluejeans.com/s/HXnam
>
> Have a nice weekend!
>
> Lenz
>
> --
> SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
> GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
[ceph-users] rbd + openshift cause cpu stuck now and then
I am testing openshift with ceph rbd and it works as expected, except that sometimes a container which has an rbd volume starts slowly, and the load on the node running the containers gets pretty high until the following error shows up in dmesg. After some googling I found one similar issue at [0]. It seems to be a kernel bug, but since I cannot reproduce this issue reliably, I want to make sure: can anyone confirm this issue, and is there a fix?

Btw, here is my env:

OS: CentOS 7.5
kernel: Linux ocm-74 3.10.0-862.6.3.el7.x86_64 #1 SMP Tue Jun 26 16:32:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
ceph: ceph-12.2.5-0.el7.x86_64 from the ceph official repo
openshift: origin-3.9.0-1.el7.git.0.ba7faec.x86_64

[0] https://www.spinics.net/lists/ceph-users/msg46963.html

[4381870.921579] device veth816c5e2f entered promiscuous mode
[4381899.771170] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [mount:2760216]
[4381899.772326] Modules linked in: vfat fat isofs ip_vs fuse ext4 mbcache jbd2 rbd libceph dns_resolver cfg80211 rfkill udp_diag unix_diag tcp_diag inet_diag veth nf_conntrack_netlink nfnetlink xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_mark xt_comment ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc overlay(T) scsi_transport_iscsi bonding vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ipmi_ssif joydev mei_me mei iTCO_wdt iTCO_vendor_support
[4381899.772379] pcspkr dcdbas ipmi_si ipmi_devintf ipmi_msghandler shpchp lpc_ich acpi_pad acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs sr_mod sd_mod cdrom mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
ahci ttm libahci drm ixgbe libata crc32c_intel tg3 megaraid_sas i2c_core mdio dca ptp pps_core dm_mirror dm_region_hash dm_log dm_snapshot target_core_user uio target_core_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common dm_multipath dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio dm_mod libcrc32c [4381899.772421] CPU: 14 PID: 2760216 Comm: mount Kdump: loaded Tainted: GWL T 3.10.0-862.6.3.el7.x86_64 #1 [4381899.772423] Hardware name: Dell Inc. PowerEdge R720/0DCWD1, BIOS 2.6.1 02/12/2018 [4381899.772426] task: 93917178dee0 ti: 93a1cf22c000 task.ti: 93a1cf22c000 [4381899.772428] RIP: 0010:[] [] __call_rcu+0x98/0x2c0 [4381899.772440] RSP: 0018:93a1cf22fd30 EFLAGS: 0246 [4381899.772441] RAX: 02e07679 RBX: 939c3f9dbb80 RCX: acd41e20 [4381899.772443] RDX: acc73000 RSI: 00014340 RDI: 0246 [4381899.772445] RBP: 93a1cf22fd58 R08: R09: [4381899.772446] R10: 939c3f9dbb80 R11: db1b021f3800 R12: 2a7c93a8 [4381899.772448] R13: 93a1cf22fd58 R14: 93a1cf22fcb0 R15: 938cc7ce208f [4381899.772450] FS: 7fc6c57db880() GS:939c3f9c() knlGS: [4381899.772452] CS: 0010 DS: ES: CR0: 80050033 [4381899.772465] CR2: 7fc6c499c15c CR3: 00108440 CR4: 001607e0 [4381899.772467] Call Trace: [4381899.772474] [] call_rcu_sched+0x1d/0x20 [4381899.772479] [] d_free+0x4f/0x70 [4381899.772481] [] __dentry_kill+0x16a/0x180 [4381899.772483] [] shrink_dentry_list+0xde/0x230 [4381899.772485] [] shrink_dcache_sb+0x9a/0xe0 [4381899.772491] [] do_remount_sb+0x51/0x200 [4381899.772496] [] do_mount+0x757/0xce0 [4381899.772501] [] ? memdup_user+0x42/0x70 [4381899.772503] [] SyS_mount+0x83/0xd0 [4381899.772512] [] system_call_fastpath+0x1c/0x21 [4381899.772513] Code: 3c cd a0 53 d3 ac 80 3d 66 eb 02 01 00 8b 87 70 01 00 00 0f 85 3a 01 00 00 80 3d 3f ba bc 00 00 0f 84 bd 01 00 00 4c 89 ef 57 9d <0f> 1f 44 00 00 48 83 c4 10 5b 41 5c 41 5d 5d c3 0f 1f 84 00 00 [4381899.984579] rbd: rbd0: encountered watch error: -107 [4381927.770763] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! 
[mount:2760216] [4381927.771950] Modules linked in: vfat fat isofs ip_vs fuse ext4 mbcache jbd2 rbd libceph dns_resolver cfg80211 rfkill udp_diag unix_diag tcp_diag inet_diag veth nf_conntrack_netlink nfnetlink xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_mark xt_comment ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc overlay(T) scsi_transport_iscsi bonding vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul
Re: [ceph-users] Reminder: bi-weekly dashboard sync call today (15:00 CET)
On 08/24/2018 02:00 PM, Lenz Grimmer wrote: > On 08/24/2018 10:59 AM, Lenz Grimmer wrote: > >> JFYI, the team working on the Ceph Manager Dashboard has a bi-weekly >> conference call that discusses the ongoing development and gives an >> update on recent improvements/features. >> >> Today, we plan to give a demo of the new dashboard landing page (See >> https://tracker.ceph.com/issues/24573 and >> https://github.com/ceph/ceph/pull/23568 for details) and the >> implementation of the "RBD trash" functionality in the UI >> (http://tracker.ceph.com/issues/24272 and >> https://github.com/ceph/ceph/pull/23351) >> >> The meeting takes place every second Friday at 15:00 CET at this URL: >> >> https://bluejeans.com/150063190 > > My apologies, I picked an incorrect meeting URL - this is the correct one: > > https://bluejeans.com/470119167/ > > Sorry for the confusion. Thanks to everyone who participated. We actually moved to yet another BlueJeans session in order to be able to record it... For those of you who missed it, here's a recording: https://bluejeans.com/s/HXnam Have a nice weekend! Lenz -- SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Migrating from pre-luminous multi-root crush hierachy
The proper way would be to do this change atomically by adjusting the crush hierarchy and rules at the same time, by editing and setting the crush map manually. Paul 2018-08-24 9:40 GMT+02:00 Konstantin Shalygin : > On 08/24/2018 01:57 PM, Buchberger, Carsten wrote: >> >> Hi Konstantin, >> >> sounds easy ;-) If I apply the new rule to the existing pools there won't >> be any osds to satisfy the requirements of the rule - because the osds are >> not in the new root yet. >> Isn't that a problem ? >> >> Thank you > > > Your IO will stall. > You need to move the osds to the new root quickly. Make a list of `ceph osd crush > move host=` commands and paste it right after applying the crush rule. > > > > > > k -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90
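For reference, the edit-and-set cycle described above can be sketched like this (a sketch only; the file names are placeholders, and the actual edits inside the decompiled map depend on your hierarchy):

```shell
# Sketch: change hierarchy and rules in one atomic setcrushmap.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# Edit crushmap.txt: add the new root, move the host buckets under it,
# and retarget the pool's rule at the new root -- all in the same file.
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
```

Because the host moves and the rule change land in a single map injection, there is no window where the rule points at an empty root.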
Re: [ceph-users] ceph auto repair. What is wrong?
Hi! I.e. I have to do ceph config set mon mon_osd_down_out_subtree_limit row and restart every mon? On 08/24/18 12:44, Paul Emmerich wrote: Ceph doesn't mark out whole racks by default, set mon_osd_down_out_subtree_limit to something higher like row or pod. Paul 2018-08-24 10:50 GMT+02:00 Christian Balzer : Hello, On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote: Hi! I wait about hour. Aside from verifying those timeout values in your cluster, what's your mon_osd_down_out_subtree_limit set to? Christian - Original Message - From: "Wido den Hollander" To: "Fyodor Ustinov" , ceph-users@lists.ceph.com Sent: Friday, 24 August, 2018 09:52:23 Subject: Re: [ceph-users] ceph auto repair. What is wrong? On 08/24/2018 06:11 AM, Fyodor Ustinov wrote: Hi! I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and two - ssd). Each host located in own rack. I make such crush configuration on fresh ceph installation: sudo ceph osd crush add-bucket R-26-3-1 rack sudo ceph osd crush add-bucket R-26-3-2 rack sudo ceph osd crush add-bucket R-26-4-1 rack sudo ceph osd crush add-bucket R-26-4-2 rack [...] sudo ceph osd crush add-bucket R-26-8-1 rack sudo ceph osd crush add-bucket R-26-8-2 rack sudo ceph osd crush move R-26-3-1 root=default [...] sudo ceph osd crush move R-26-8-2 root=default sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1 [...] 
sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2 sudo ceph osd crush rule create-replicated hddreplrule default rack hdd sudo ceph osd pool create rbd 256 256 replicated hddreplrule sudo ceph osd pool set rbd size 3 sudo ceph osd pool set rbd min_size 2 osd tree look like: ID CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF -1 117.36346 root default -2 9.78029 rack R-26-3-1 -27 9.78029 host S-26-3-1-1 0 hdd 9.32390 osd.0 up 1.0 1.0 1 ssd 0.22820 osd.1 up 1.0 1.0 2 ssd 0.22820 osd.2 up 1.0 1.0 -3 9.78029 rack R-26-3-2 -43 9.78029 host S-26-3-2-1 3 hdd 9.32390 osd.3 up 1.0 1.0 4 ssd 0.22820 osd.4 up 1.0 1.0 5 ssd 0.22820 osd.5 up 1.0 1.0 [...] Now write some data to rbd pool and shutdown one node. cluster: id: 9000d700-8529-4d38-b9f5-24d6079429a2 health: HEALTH_WARN 3 osds down 1 host (3 osds) down 1 rack (3 osds) down Degraded data redundancy: 1223/12300 objects degraded (9.943%), 74 pgs degraded, 74 pgs undersized And ceph does not try to repair pool. Why? How long did you wait? The default timeout is 600 seconds before recovery starts. These OSDs are not marked as out yet. Wido WBR, Fyodor. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Christian BalzerNetwork/Systems Engineer ch...@gol.com Rakuten Communications ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
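For reference, one way to raise mon_osd_down_out_subtree_limit as suggested in this thread (a sketch; depending on the Ceph release, injected values may be rejected or may not take effect until the mons are restarted):

```shell
# Inject on all mons at runtime (verify with
# 'ceph daemon mon.<id> config get mon_osd_down_out_subtree_limit'):
ceph tell mon.* injectargs '--mon_osd_down_out_subtree_limit=row'
# Persist it in ceph.conf on every mon host so it survives restarts:
# [mon]
# mon_osd_down_out_subtree_limit = row
```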
Re: [ceph-users] radosgw: need couple of blind (indexless) buckets, how-to?
On 08/24/2018 06:44 AM, Konstantin Shalygin wrote: Answer to myself.
radosgw-admin realm create --rgw-realm=default --default
radosgw-admin zonegroup modify --rgw-zonegroup=default --rgw-realm=default
radosgw-admin period update --commit
radosgw-admin zonegroup placement add --rgw-zonegroup="default" \
  --placement-id="indexless-placement"
radosgw-admin zonegroup placement default --placement-id="default-placement"
radosgw-admin period update --commit
radosgw-admin zone placement add --rgw-zone="default" \
  --placement-id="indexless-placement" \
  --data-pool="default.rgw.buckets.data" \
  --index-pool="default.rgw.buckets.index" \
  --data_extra_pool="default.rgw.buckets.non-ec" \
  --placement-index-type="indexless"
Restart the rgw instances and it is now possible to create indexless buckets:
s3cmd mb s3://blindbucket --region=:indexless-placement
The Object Storage Gateway documentation is worse than that for rbd or cephfs and still contains outdated strings (removed a year ago). http://tracker.ceph.com/issues/18082 http://tracker.ceph.com/issues/24508 http://tracker.ceph.com/issues/8073 Hope this post will help somebody in the future. k Thank you very much! If anyone would like to help update these docs, I would be happy to help with guidance/review.
Re: [ceph-users] RGW pools don't show up in luminous
On 08/23/2018 01:22 PM, Robert Stanford wrote: I installed a new Ceph cluster with Luminous, after a long time working with Jewel. I created my RGW pools the same as always (pool create default.rgw.buckets.data etc.), but they don't show up in ceph df with Luminous. Has the command changed? Thanks R Hi Robert, Do you have a ceph-mgr running? I believe the accounting for 'ceph df' is performed by ceph-mgr in Luminous and beyond, rather than ceph-mon.
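A quick way to check the ceph-mgr suggestion above (a sketch; the host name in the deploy command is a placeholder, and the deploy step depends on your tooling):

```shell
# The status output should list an active mgr; without one, Luminous
# leaves the 'ceph df' statistics empty or stale.
ceph -s | grep -i mgr
ceph mgr dump | grep active_name
# If no mgr daemon exists, create one, e.g. with ceph-deploy:
# ceph-deploy mgr create mon-host-1
```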
Re: [ceph-users] cephfs kernel client hangs
I'm afraid that the client hangs again... the log shows:
2018-08-24 21:27:54.714334 [WRN] slow request 62.607608 seconds old, received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:26:52.106425 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:27:54.714320 [WRN] 3 slow requests, 1 included below; oldest blocked for > 843.556758 secs
2018-08-24 21:27:24.713740 [WRN] slow request 32.606979 seconds old, received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:26:52.106425 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:27:24.713729 [WRN] 3 slow requests, 1 included below; oldest blocked for > 813.556129 secs
2018-08-24 21:25:49.711778 [WRN] slow request 483.807963 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:25:49.711766 [WRN] 2 slow requests, 1 included below; oldest blocked for > 718.554206 secs
2018-08-24 21:21:54.707536 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 483.548912 seconds ago
2018-08-24 21:21:54.706930 [WRN] slow request 483.549363 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2018-08-24 21:21:54.706920 [WRN] 2 slow requests, 1 included below; oldest blocked for > 483.549363 secs
2018-08-24 21:21:49.706838 [WRN] slow request 243.803027 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:21:49.706828 [WRN] 2 slow requests, 1 included below; oldest blocked for > 478.549269 secs
2018-08-24 21:19:49.704294 [WRN] slow request 123.800486 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:19:49.704284 [WRN] 2 slow requests, 1 included below; oldest blocked for > 358.546729 secs
2018-08-24 21:18:49.703073 [WRN] slow request 63.799269 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:18:49.703062 [WRN] 2 slow requests, 1 included below; oldest blocked for > 298.545511 secs
2018-08-24 21:18:19.702465 [WRN] slow request 33.798637 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x12e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
2018-08-24 21:18:19.702456 [WRN] 2 slow requests, 1 included below; oldest blocked for > 268.544880 secs
2018-08-24 21:17:54.702517 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 243.543893 seconds ago
2018-08-24 21:17:54.701904 [WRN] slow request 243.544331 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2018-08-24 21:17:54.701894 [WRN] 1 slow requests, 1 included below; oldest blocked for > 243.544331 secs
2018-08-24 21:15:54.700034 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 123.541410 seconds ago
2018-08-24 21:15:54.699385 [WRN] slow request 123.541822 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x12e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2018-08-24 21:15:54.699375 [WRN] 1 slow requests, 1 included below; oldest blocked for > 123.541822 secs
2018-08-24 21:14:57.055183 [WRN] Health check failed: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE)
2018-08-24 21:14:56.167868 [WRN] MDS health message (mds.0): Client docker39 failing to respond to capability release
2018-08-24 21:14:54.698753 [WRN] client.213528 isn't responding to mclientcaps(revoke), ino 0x12e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 63.540127 seconds ago
2018-08-24 21:14:54.698104 [WRN] slow request 63.540533 seconds old, received at 2018-08-24 21:13:51.157483:
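To see whether the hang is caused by requests stuck on the client side (as asked earlier in this thread), the kernel client's in-flight operations can be inspected via debugfs. A sketch, assuming debugfs is mounted and the client id matches the warnings above:

```shell
# On the hung client node (docker39 in the log): non-empty osdc/mdsc
# files list requests still outstanding against the OSDs / the MDS.
for d in /sys/kernel/debug/ceph/*; do
    echo "== $d"
    cat "$d"/osdc "$d"/mdsc 2>/dev/null
done
# Last resort -- evict the unresponsive client from the MDS
# (id taken from the warnings, e.g. client.213528):
# ceph tell mds.0 client evict id=213528
```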
Re: [ceph-users] Reminder: bi-weekly dashboard sync call today (15:00 CET)
On 08/24/2018 10:59 AM, Lenz Grimmer wrote: > JFYI, the team working on the Ceph Manager Dashboard has a bi-weekly > conference call that discusses the ongoing development and gives an > update on recent improvements/features. > > Today, we plan to give a demo of the new dashboard landing page (See > https://tracker.ceph.com/issues/24573 and > https://github.com/ceph/ceph/pull/23568 for details) and the > implementation of the "RBD trash" functionality in the UI > (http://tracker.ceph.com/issues/24272 and > https://github.com/ceph/ceph/pull/23351) > > The meeting takes place every second Friday at 15:00 CET at this URL: > > https://bluejeans.com/150063190 My apologies, I picked an incorrect meeting URL - this is the correct one: https://bluejeans.com/470119167/ Sorry for the confusion. Lenz -- SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)
Re: [ceph-users] radosgw: need couple of blind (indexless) buckets, how-to?
Answer to myself.
radosgw-admin realm create --rgw-realm=default --default
radosgw-admin zonegroup modify --rgw-zonegroup=default --rgw-realm=default
radosgw-admin period update --commit
radosgw-admin zonegroup placement add --rgw-zonegroup="default" \
  --placement-id="indexless-placement"
radosgw-admin zonegroup placement default --placement-id="default-placement"
radosgw-admin period update --commit
radosgw-admin zone placement add --rgw-zone="default" \
  --placement-id="indexless-placement" \
  --data-pool="default.rgw.buckets.data" \
  --index-pool="default.rgw.buckets.index" \
  --data_extra_pool="default.rgw.buckets.non-ec" \
  --placement-index-type="indexless"
Restart the rgw instances and it is now possible to create indexless buckets:
s3cmd mb s3://blindbucket --region=:indexless-placement
The Object Storage Gateway documentation is worse than that for rbd or cephfs and still contains outdated strings (removed a year ago). http://tracker.ceph.com/issues/18082 http://tracker.ceph.com/issues/24508 http://tracker.ceph.com/issues/8073 Hope this post will help somebody in the future. k
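To double-check that the placement changes above were committed before creating buckets, something like the following should show the indexless target in both the zonegroup and the zone (a sketch):

```shell
# Both outputs should contain the new placement target, and the zone
# entry should show index_type indexless.
radosgw-admin zonegroup get --rgw-zonegroup=default | grep -A2 indexless-placement
radosgw-admin zone get --rgw-zone=default | grep -A6 indexless-placement
```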
Re: [ceph-users] Stability Issue with 52 OSD hosts
We pin half the OSDs to each socket (and to the corresponding memory). Since the disk controller and the network card is connected only to one socket, this still probably produces quite a bit of QPI traffic. It is also worth investigating how the network does under high load. We did run into problems where 40Gbps cards dropped packets heavily under load. Andras On 08/24/2018 05:16 AM, Marc Roos wrote: Can this be related to numa issues? I have also dual processor nodes, and was wondering if there is some guide on how to optimize for numa. -Original Message- From: Tyler Bishop [mailto:tyler.bis...@beyondhosting.net] Sent: vrijdag 24 augustus 2018 3:11 To: Andras Pataki Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Stability Issue with 52 OSD hosts Thanks for the info. I was investigating bluestore as well. My host dont go unresponsive but I do see parallel io slow down. On Thu, Aug 23, 2018, 8:02 PM Andras Pataki wrote: We are also running some fairly dense nodes with CentOS 7.4 and ran into similar problems. The nodes ran filestore OSDs (Jewel, then Luminous). Sometimes a node would be so unresponsive that one couldn't even ssh to it (even though the root disk was a physically separate drive on a separate controller from the OSD drives). Often these would coincide with kernel stack traces about hung tasks. Initially we did blame high load, etc. from all the OSDs. But then we benchmarked the nodes independently of ceph (with iozone and such) and noticed problems there too. When we started a few dozen iozone processes on separate JBOD drives with xfs, some didn't even start and write a single byte for minutes. The conclusion we came to was that there is some interference among a lot of mounted xfs file systems in the Red Hat 3.10 kernels. Some kind of central lock that prevents dozens of xfs file systems from running in parallel. When we do I/O directly to raw devices in parallel, we saw no problems (no high loads, etc.). 
So we built a newer kernel, and the situation got better. 4.4 is already much better, nowadays we are testing moving to 4.14. Also, migrating to bluestore significantly reduced the load on these nodes too. At busy times, the filestore host loads were 20-30, even higher (on a 28 core node), while the bluestore nodes hummed along at a load of perhaps 6 or 8. This also confirms that somehow lots of xfs mounts don't work in parallel. Andras On 08/23/2018 03:24 PM, Tyler Bishop wrote: > Yes I've reviewed all the logs from monitor and host. I am not > getting useful errors (or any) in dmesg or general messages. > > I have 2 ceph clusters, the other cluster is 300 SSD and i never have > issues like this. That's why Im looking for help. > > On Thu, Aug 23, 2018 at 3:22 PM Alex Gorbachev wrote: >> On Wed, Aug 22, 2018 at 11:39 PM Tyler Bishop >> wrote: >>> During high load testing I'm only seeing user and sys cpu load around 60%... my load doesn't seem to be anything crazy on the host and iowait stays between 6 and 10%. I have very good `ceph osd perf` numbers too. >>> >>> I am using 10.2.11 Jewel. >>> >>> >>> On Wed, Aug 22, 2018 at 11:30 PM Christian Balzer wrote: Hello, On Wed, 22 Aug 2018 23:00:24 -0400 Tyler Bishop wrote: > Hi, I've been fighting to get good stability on my cluster for about > 3 weeks now. I am running into intermittent issues with OSD flapping > marking other OSD down then going back to a stable state for hours and > days. > > The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB ram, 40G > Network to 40G Brocade VDX Switches. The OSD are 6TB HGST SAS drives > with 400GB HGST SAS 12G SSDs. My configuration is 4 journals per > host with 12 disk per journal for a total of 56 disk per system and 52 > OSD. > Any denser and you'd have a storage black hole. You already pointed your finger in the (or at least one) right direction and everybody will agree that this setup is woefully underpowered in the CPU department.
> I am using CentOS 7 with kernel 3.10 and the redhat tuned-adm profile > for throughput-performance enabled. > Ceph version would be interesting as well... > I have these sysctls set: > > kernel.pid_max = 4194303 > fs.file-max = 6553600 > vm.swappiness = 0 >
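One common way to do the per-socket pinning described above is a systemd drop-in that wraps the OSD in numactl. This is a sketch only -- the osd id, NUMA node, and ExecStart line are examples and must match your system (copy the ExecStart from your distribution's ceph-osd@.service unit):

```shell
# Pin osd.12 to NUMA node 0 (the socket owning its HBA/NIC).
mkdir -p /etc/systemd/system/ceph-osd@12.service.d
cat > /etc/systemd/system/ceph-osd@12.service.d/numa.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/numactl --cpunodebind=0 --membind=0 \
    /usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
EOF
systemctl daemon-reload
systemctl restart ceph-osd@12
```

Repeat with the other node/osd ids so half the OSDs land on each socket.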
Re: [ceph-users] ceph auto repair. What is wrong?
Ceph doesn't mark out whole racks by default, set mon_osd_down_out_subtree_limit to something higher like row or pod. Paul 2018-08-24 10:50 GMT+02:00 Christian Balzer : > Hello, > > On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote: > >> Hi! >> >> I wait about hour. >> > Aside from verifying those timeout values in your cluster, what's your > mon_osd_down_out_subtree_limit set to? > > Christian > >> - Original Message - >> From: "Wido den Hollander" >> To: "Fyodor Ustinov" , ceph-users@lists.ceph.com >> Sent: Friday, 24 August, 2018 09:52:23 >> Subject: Re: [ceph-users] ceph auto repair. What is wrong? >> >> On 08/24/2018 06:11 AM, Fyodor Ustinov wrote: >> > Hi! >> > >> > I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and >> > two - ssd). Each host located in own rack. >> > >> > I make such crush configuration on fresh ceph installation: >> > >> >sudo ceph osd crush add-bucket R-26-3-1 rack >> >sudo ceph osd crush add-bucket R-26-3-2 rack >> >sudo ceph osd crush add-bucket R-26-4-1 rack >> >sudo ceph osd crush add-bucket R-26-4-2 rack >> > [...] >> >sudo ceph osd crush add-bucket R-26-8-1 rack >> >sudo ceph osd crush add-bucket R-26-8-2 rack >> > >> >sudo ceph osd crush move R-26-3-1 root=default >> > [...] >> >sudo ceph osd crush move R-26-8-2 root=default >> > >> > sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1 >> > [...] 
>> > sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2 >> > >> > sudo ceph osd crush rule create-replicated hddreplrule default rack hdd >> > sudo ceph osd pool create rbd 256 256 replicated hddreplrule >> > sudo ceph osd pool set rbd size 3 >> > sudo ceph osd pool set rbd min_size 2 >> > >> > osd tree look like: >> > ID CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF >> > -1 117.36346 root default >> > -2 9.78029 rack R-26-3-1 >> > -27 9.78029 host S-26-3-1-1 >> > 0 hdd 9.32390 osd.0 up 1.0 1.0 >> > 1 ssd 0.22820 osd.1 up 1.0 1.0 >> > 2 ssd 0.22820 osd.2 up 1.0 1.0 >> > -3 9.78029 rack R-26-3-2 >> > -43 9.78029 host S-26-3-2-1 >> > 3 hdd 9.32390 osd.3 up 1.0 1.0 >> > 4 ssd 0.22820 osd.4 up 1.0 1.0 >> > 5 ssd 0.22820 osd.5 up 1.0 1.0 >> > [...] >> > >> > >> > Now write some data to rbd pool and shutdown one node. >> > cluster: >> > id: 9000d700-8529-4d38-b9f5-24d6079429a2 >> > health: HEALTH_WARN >> > 3 osds down >> > 1 host (3 osds) down >> > 1 rack (3 osds) down >> > Degraded data redundancy: 1223/12300 objects degraded >> > (9.943%), 74 pgs degraded, 74 pgs undersized >> > >> > And ceph does not try to repair pool. Why? >> >> How long did you wait? The default timeout is 600 seconds before >> recovery starts. >> >> These OSDs are not marked as out yet. >> >> Wido >> >> > >> > WBR, >> > Fyodor. >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > > > -- > Christian BalzerNetwork/Systems Engineer > ch...@gol.com Rakuten Communications > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 
31h 81247 München www.croit.io Tel: +49 89 1896585 90
Re: [ceph-users] Stability Issue with 52 OSD hosts
Can this be related to numa issues? I have also dual processor nodes, and was wondering if there is some guide on how to optimize for numa. -Original Message- From: Tyler Bishop [mailto:tyler.bis...@beyondhosting.net] Sent: vrijdag 24 augustus 2018 3:11 To: Andras Pataki Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Stability Issue with 52 OSD hosts Thanks for the info. I was investigating bluestore as well. My host dont go unresponsive but I do see parallel io slow down. On Thu, Aug 23, 2018, 8:02 PM Andras Pataki wrote: We are also running some fairly dense nodes with CentOS 7.4 and ran into similar problems. The nodes ran filestore OSDs (Jewel, then Luminous). Sometimes a node would be so unresponsive that one couldn't even ssh to it (even though the root disk was a physically separate drive on a separate controller from the OSD drives). Often these would coincide with kernel stack traces about hung tasks. Initially we did blame high load, etc. from all the OSDs. But then we benchmarked the nodes independently of ceph (with iozone and such) and noticed problems there too. When we started a few dozen iozone processes on separate JBOD drives with xfs, some didn't even start and write a single byte for minutes. The conclusion we came to was that there is some interference among a lot of mounted xfs file systems in the Red Hat 3.10 kernels. Some kind of central lock that prevents dozens of xfs file systems from running in parallel. When we do I/O directly to raw devices in parallel, we saw no problems (no high loads, etc.). So we built a newer kernel, and the situation got better. 4.4 is already much better, nowadays we are testing moving to 4.14. Also, migrating to bluestore significantly reduced the load on these nodes too. At busy times, the filestore host loads were 20-30, even higher (on a 28 core node), while the bluestore nodes hummed along at a lot of perhaps 6 or 8. This also confirms that somehow lots of xfs mounts don't work in parallel. 
Andras On 08/23/2018 03:24 PM, Tyler Bishop wrote: > Yes I've reviewed all the logs from monitor and host. I am not > getting useful errors (or any) in dmesg or general messages. > > I have 2 ceph clusters, the other cluster is 300 SSD and i never have > issues like this. That's why Im looking for help. > > On Thu, Aug 23, 2018 at 3:22 PM Alex Gorbachev wrote: >> On Wed, Aug 22, 2018 at 11:39 PM Tyler Bishop >> wrote: >>> During high load testing I'm only seeing user and sys cpu load around 60%... my load doesn't seem to be anything crazy on the host and iowait stays between 6 and 10%. I have very good `ceph osd perf` numbers too. >>> >>> I am using 10.2.11 Jewel. >>> >>> >>> On Wed, Aug 22, 2018 at 11:30 PM Christian Balzer wrote: Hello, On Wed, 22 Aug 2018 23:00:24 -0400 Tyler Bishop wrote: > Hi, I've been fighting to get good stability on my cluster for about > 3 weeks now. I am running into intermittent issues with OSD flapping > marking other OSD down then going back to a stable state for hours and > days. > > The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB ram, 40G > Network to 40G Brocade VDX Switches. The OSD are 6TB HGST SAS drives > with 400GB HGST SAS 12G SSDs. My configuration is 4 journals per > host with 12 disk per journal for a total of 56 disk per system and 52 > OSD. > Any denser and you'd have a storage black hole. You already pointed your finger in the (or at least one) right direction and everybody will agree that this setup is woefully underpowered in the CPU department. > I am using CentOS 7 with kernel 3.10 and the redhat tuned-adm profile > for throughput-performance enabled. > Ceph version would be interesting as well... > I have these sysctls set: > > kernel.pid_max = 4194303 > fs.file-max = 6553600 > vm.swappiness = 0 > vm.vfs_cache_pressure = 50 > vm.min_free_kbytes = 3145728 > > I feel like my issue is directly related to the high number of OSD per > host but I'm not sure what issue I'm really running into. 
I believe > that I have ruled out network issues, i am able to get 38Gbit > consistently via
Re: [ceph-users] Ceph RGW Index Sharding In Jewel
You should probably have a look at ceph-ansible as it has a "take-over-existing-cluster" playbook. I think versions older than 2.0 support Ceph versions older than Jewel. --- Alex Cucu On Fri, Aug 24, 2018 at 4:31 AM Russell Holloway wrote: > > Thanks. Unfortunately even my version of hammer is too old on 0.94.5. I think > my only route to address this issue is to figure out the upgrade, at the very > least to 0.94.10. The biggest issue again is the deployment tool originally > used is set on 0.94.5 and pretty convoluted and no longer receiving updates, > but this isn't a ceph issue. > > -Russ > > > > From: David Turner > Sent: Wednesday, August 22, 2018 11:48 PM > To: Russell Holloway > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Ceph RGW Index Sharding In Jewel > > The release notes for 0.94.10 mention the introduction of the `radosgw-admin > bucket reshard` command. Redhat [1] documentation for their Enterprise > version of Jewel goes into detail for the procedure. You can also search the > ML archives for the command to find several conversations about the process > as well as problems. Make sure that the procedure works on a test bucket for > Hammer before attempting it on your 12M object bucket. > > > [1] > https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_ubuntu/administration_cli#rados_gateway_user_management > > > On Wed, Aug 22, 2018, 9:23 PM Russell Holloway > wrote: > > Did I say Jewel? I was too hopeful. I meant hammer. This particular cluster > is hammer :( > > > -Russ > > > From: ceph-users on behalf of Russell > Holloway > Sent: Wednesday, August 22, 2018 8:49:19 PM > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Ceph RGW Index Sharding In Jewel > > > So, I've finally journeyed deeper into the depths of ceph and discovered a > grand mistake that is likely the root cause of many woeful nights of blocked > requests. 
To start off, I'm running jewel, and I know that is dated and I > need to upgrade (if anyone knows if this is a seamless upgrade even though > several major versions behind, do let me know. > > > My current issue is due to a rgw bucket index. I have just discovered I have > a bucket with about 12M objects in it. Sharding is not enabled on it. And > it's on a spinning disk, not SSD (journal is SSD though, so it could be > worse?). A bad combination as I just learned. From my recent understanding, > in jewel I could maybe update the rgw region to set max shards for buckets, > but it also sounds like this may or may not affect my existing bucket. > Furthermore, somewhere I saw mention that prior to luminous, resharding > needed to be done offline. I haven't found any documentation on this process > though. There is some mention around putting bucket indexes on SSD for > performance and latency reasons, which sounds great, but I get the feeling if > I modified crush map and tried to get the index pool on SSDs, and tried to > start moving things around involving this PG, it will fail in the same way I > can't even do a deep scrub on the PG. > > > Does anyone have a good reference on how I could begin to clean this bucket > up or get it sharded while on jewel? Again, it sounds like in Luminous it may > just start resharding itself and fix itself right up, but I feel going to > luminous will require more work and testing (mostly due to my original > deployment tool Fuel 8 for openstack, bound to jewel, and no easy upgrade > path for fuel...I'll have to sort out how to transition away from that while > maintaining my existing nodes) > > > The core issue was identified when I took finer grained control over deep > scrubs and trigger them manually. I eventually found out I could trigger my > entire ceph cluster to hang by triggering a deep scrub on a single PG, which > happens to be the one hosting this index. 
The OSD hosting it basically > becomes unresponsive for a very long time and begins blocking a lot of other > requests affecting all sorts of VMs using rbd. I could simply not deep scrub > this PG (ceph ends up marking OSD as down and deep scrub seems to fail, never > completes, and about 30 minutes after hung requests, cluster eventually > recovers), but I know I need to address this bucket sizing issue and then try > to work on upgrading ceph. > > > Is it doable? For what it's worth, I tried to list the keys in ceph with > rados and that also hung requests. I'm not quite sure how to break the bucket > up at a software level especially if I cannot list the contents, so I hope > within ceph there is some route forward here... > > > Thanks a bunch in advance for helping a naive ceph operator. > > > -Russ > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >
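For sizing a reshard of a bucket like this one, a rough rule of thumb can be sketched as follows. Note the ~100k objects-per-shard ceiling is a commonly cited guideline, not a number from this thread; adjust it to your hardware:

```python
import math

# Rule-of-thumb bucket index sizing: keep each shard under a target
# object count (~100k is the commonly cited ceiling; an assumption here).
OBJECTS_PER_SHARD = 100_000

def recommended_shards(num_objects, per_shard=OBJECTS_PER_SHARD):
    """Smallest shard count that keeps every shard at or under the target."""
    return max(1, math.ceil(num_objects / per_shard))

# The 12M-object bucket from this thread would want roughly 120 shards.
print(recommended_shards(12_000_000))  # -> 120
```

The computed value would then feed `radosgw-admin bucket reshard --bucket=<name> --num-shards=<N>`, the command David mentions as introduced in 0.94.10; as he also suggests, try it on a small test bucket first.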
[ceph-users] Reminder: bi-weekly dashboard sync call today (15:00 CET)
Hi all, JFYI, the team working on the Ceph Manager Dashboard has a bi-weekly conference call that discusses the ongoing development and gives an update on recent improvements/features. Today, we plan to give a demo of the new dashboard landing page (see https://tracker.ceph.com/issues/24573 and https://github.com/ceph/ceph/pull/23568 for details) and the implementation of the "RBD trash" functionality in the UI (http://tracker.ceph.com/issues/24272 and https://github.com/ceph/ceph/pull/23351). The meeting takes place every second Friday at 15:00 CET at this URL: https://bluejeans.com/150063190 See you there! Lenz -- SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Re: [ceph-users] ceph auto repair. What is wrong?
Hello, On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote: > Hi! > > I wait about hour. > Aside from verifying those timeout values in your cluster, what's your mon_osd_down_out_subtree_limit set to? Christian > - Original Message - > From: "Wido den Hollander" > To: "Fyodor Ustinov" , ceph-users@lists.ceph.com > Sent: Friday, 24 August, 2018 09:52:23 > Subject: Re: [ceph-users] ceph auto repair. What is wrong? > > On 08/24/2018 06:11 AM, Fyodor Ustinov wrote: > > Hi! > > > > I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and > > two - ssd). Each host located in own rack. > > > > I make such crush configuration on fresh ceph installation: > > > >sudo ceph osd crush add-bucket R-26-3-1 rack > >sudo ceph osd crush add-bucket R-26-3-2 rack > >sudo ceph osd crush add-bucket R-26-4-1 rack > >sudo ceph osd crush add-bucket R-26-4-2 rack > > [...] > >sudo ceph osd crush add-bucket R-26-8-1 rack > >sudo ceph osd crush add-bucket R-26-8-2 rack > > > >sudo ceph osd crush move R-26-3-1 root=default > > [...] > >sudo ceph osd crush move R-26-8-2 root=default > > > > sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1 > > [...] > > sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2 > > > > sudo ceph osd crush rule create-replicated hddreplrule default rack hdd > > sudo ceph osd pool create rbd 256 256 replicated hddreplrule > > sudo ceph osd pool set rbd size 3 > > sudo ceph osd pool set rbd min_size 2 > > > > osd tree look like: > > ID CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF > > -1 117.36346 root default > > -2 9.78029 rack R-26-3-1 > > -27 9.78029 host S-26-3-1-1 > > 0 hdd 9.32390 osd.0 up 1.0 1.0 > > 1 ssd 0.22820 osd.1 up 1.0 1.0 > > 2 ssd 0.22820 osd.2 up 1.0 1.0 > > -3 9.78029 rack R-26-3-2 > > -43 9.78029 host S-26-3-2-1 > > 3 hdd 9.32390 osd.3 up 1.0 1.0 > > 4 ssd 0.22820 osd.4 up 1.0 1.0 > > 5 ssd 0.22820 osd.5 up 1.0 1.0 > > [...] > > > > > > Now write some data to rbd pool and shutdown one node. 
> > cluster: > > id: 9000d700-8529-4d38-b9f5-24d6079429a2 > > health: HEALTH_WARN > > 3 osds down > > 1 host (3 osds) down > > 1 rack (3 osds) down > > Degraded data redundancy: 1223/12300 objects degraded (9.943%), > > 74 pgs degraded, 74 pgs undersized > > > > And ceph does not try to repair pool. Why? > > How long did you wait? The default timeout is 600 seconds before > recovery starts. > > These OSDs are not marked as out yet. > > Wido > > > > > WBR, > > Fyodor. > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Christian BalzerNetwork/Systems Engineer ch...@gol.com Rakuten Communications ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
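For reference, a minimal sketch of the down->out interaction Christian is hinting at (my own simplification of the monitor's behavior; the defaults shown follow the stock `mon_osd_down_out_interval` / `mon_osd_down_out_subtree_limit` values, and the one-host-per-rack layout in this thread is exactly the case where the subtree limit bites):

```python
# Simplified model of the mon's down->out decision (names follow the
# config options; the real logic lives in ceph-mon, this is a sketch).
DOWN_OUT_INTERVAL = 600          # mon_osd_down_out_interval default (seconds)
SUBTREE_LIMIT = "rack"           # mon_osd_down_out_subtree_limit default
CRUSH_LEVELS = ["osd", "host", "rack", "root"]  # simplified hierarchy

def would_auto_mark_out(seconds_down, failed_subtree):
    """True if the mons would mark the down OSD(s) out and start recovery.

    If an entire subtree at or above the subtree limit is down (e.g. a
    whole rack with the default limit of 'rack'), the mons assume a larger
    failure and never auto-mark it out -- matching this thread, where one
    host == one rack and recovery never started."""
    if CRUSH_LEVELS.index(failed_subtree) >= CRUSH_LEVELS.index(SUBTREE_LIMIT):
        return False
    return seconds_down >= DOWN_OUT_INTERVAL

print(would_auto_mark_out(3600, "host"))  # -> True: marked out after 600s
print(would_auto_mark_out(3600, "rack"))  # -> False: whole rack down, no auto-out
```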
Re: [ceph-users] ceph auto repair. What is wrong?
Hi! I wait about hour. - Original Message - From: "Wido den Hollander" To: "Fyodor Ustinov" , ceph-users@lists.ceph.com Sent: Friday, 24 August, 2018 09:52:23 Subject: Re: [ceph-users] ceph auto repair. What is wrong? On 08/24/2018 06:11 AM, Fyodor Ustinov wrote: > Hi! > > I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and two > - ssd). Each host located in own rack. > > I make such crush configuration on fresh ceph installation: > >sudo ceph osd crush add-bucket R-26-3-1 rack >sudo ceph osd crush add-bucket R-26-3-2 rack >sudo ceph osd crush add-bucket R-26-4-1 rack >sudo ceph osd crush add-bucket R-26-4-2 rack > [...] >sudo ceph osd crush add-bucket R-26-8-1 rack >sudo ceph osd crush add-bucket R-26-8-2 rack > >sudo ceph osd crush move R-26-3-1 root=default > [...] >sudo ceph osd crush move R-26-8-2 root=default > > sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1 > [...] > sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2 > > sudo ceph osd crush rule create-replicated hddreplrule default rack hdd > sudo ceph osd pool create rbd 256 256 replicated hddreplrule > sudo ceph osd pool set rbd size 3 > sudo ceph osd pool set rbd min_size 2 > > osd tree look like: > ID CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF > -1 117.36346 root default > -2 9.78029 rack R-26-3-1 > -27 9.78029 host S-26-3-1-1 > 0 hdd 9.32390 osd.0 up 1.0 1.0 > 1 ssd 0.22820 osd.1 up 1.0 1.0 > 2 ssd 0.22820 osd.2 up 1.0 1.0 > -3 9.78029 rack R-26-3-2 > -43 9.78029 host S-26-3-2-1 > 3 hdd 9.32390 osd.3 up 1.0 1.0 > 4 ssd 0.22820 osd.4 up 1.0 1.0 > 5 ssd 0.22820 osd.5 up 1.0 1.0 > [...] > > > Now write some data to rbd pool and shutdown one node. > cluster: > id: 9000d700-8529-4d38-b9f5-24d6079429a2 > health: HEALTH_WARN > 3 osds down > 1 host (3 osds) down > 1 rack (3 osds) down > Degraded data redundancy: 1223/12300 objects degraded (9.943%), > 74 pgs degraded, 74 pgs undersized > > And ceph does not try to repair pool. Why? How long did you wait? 
The default timeout is 600 seconds before recovery starts. These OSDs are not marked as out yet. Wido > > WBR, > Fyodor. > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-fuse slow cache?
Hi Gregory, Quoting Gregory Farnum (gfar...@redhat.com): > This is quite strange. Given that you have a log, I think what you want to > do is find one request in the log, trace it through its lifetime, and see > where the time is elapsed. You may find a bifurcation, where some > categories of requests happen instantly but other categories take a second > or more; focus on the second, obviously. So that is what I did. Turns out it's not the (slow) cache at all, probably not to your surprise. The reads are quit fast actually, compared to kernel client it's ~ 8 ms slower, or ~ 40%. It looks like couple of writes / updates to, at least a session file, are slow: 2018-08-23 16:40:25.631 7f79156a8700 10 client.15158830 put_inode on 0x1693859.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={} open={3=1} mode=100600 size=0/4194304 nlink=1 btime=2018-08-23 16:40:25.632601 mtime=2018-08-23 16:40:25.632601 ctime=2018-08-23 16:40:25.632601 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x1693859 ts 0/0 objects 0 dirty_or_tx 0] parents=0x168547c.head["sess_ia0agoj01ul4rob7ji55ouca41"] 0x5646ff0e8000) 2018-08-23 16:40:28.547 7f79156a8700 10 client.15158830 update_inode_file_time 0x1693859.head(faked_ino=0 ref=4 ll_ref=1 cap_refs={} open={3=1} mode=100600 size=0/4194304 nlink=1 btime=2018-08-23 16:40:25.632601 mtime=2018-08-23 16:40:25.632601 ctime=2018-08-23 16:40:25.632601 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x1693859 ts 0/0 objects 0 dirty_or_tx 0] parents=0x168547c.head["sess_ia0agoj01ul4rob7ji55ouca41"] 0x5646ff0e8000) pAsxLsXsxFsxcrwb ctime 2018-08-23 16:40:25.632601 mtime 2018-08-23 16:40:25.632601 So, almost 3 seconds. Page is only served after this, and possibly, after some cache files have been written. Note though that ceph-fuse is in debug=20 mode. Apparently the kernel client is _much_ faster in writing than ceph-fuse. If I write a file with "dd" (from /dev/urandom) it's in the tens of milliseconds range, not seconds. 
atime / ctime changes are handled in < 5 ms. I wonder if tuning file-striping [1] with stripe units of 4 KB would be beneficial in this case. Gr. Stefan [1]: http://docs.ceph.com/docs/master/dev/file-striping/ -- BIT BV | http://www.bit.nl/ | Kamer van Koophandel 09090351 | GPG: 0xD14839C6 | +31 318 648 688 / i...@bit.nl
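For anyone curious what a 4 KB stripe unit would actually change, here is a sketch of the offset-to-object mapping described in the file-striping doc Stefan links (my own reading of [1], not librados code; verify against the doc before relying on it):

```python
def object_for_offset(offset, stripe_unit, stripe_count, object_size):
    """Map a file offset to (object_number, offset_within_object) following
    the striping scheme in the linked dev doc (a sketch, not librados)."""
    su_per_object = object_size // stripe_unit
    blockno = offset // stripe_unit          # which stripe unit overall
    stripeno = blockno // stripe_count       # which stripe
    stripepos = blockno % stripe_count       # position within the stripe
    objectsetno = stripeno // su_per_object  # which object set
    objectno = objectsetno * stripe_count + stripepos
    obj_off = (stripeno % su_per_object) * stripe_unit + offset % stripe_unit
    return objectno, obj_off

# Default-ish layout: 4 MB objects, no striping (stripe_unit == object_size):
print(object_for_offset(5 * 1024 * 1024, 4194304, 1, 4194304))  # -> (1, 1048576)
# With 4 KB stripe units over 4 objects, consecutive 4 KB blocks
# round-robin across objects 0..3:
print(object_for_offset(8192, 4096, 4, 4194304))  # -> (2, 0)
```

With a 4 KB stripe unit, small writes land in small units spread over several objects rather than all hitting the head of one 4 MB object, which is the effect Stefan is wondering about.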
Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?
Hi, I don't know why but, I noticed in the ceph-volume-systemd.log (above in bold), that there are 2 different lines corresponding to the lvm-1 (normally associated to the osd.1) ? One seems to have the correct id, while the other has a bad one...and it's looks like he's trying to start the one with the wrong id !? those can be remains of previous attempts to create OSDs. There are probably still enabled systemd-units referring to old LVs, just disable them to rule it out as the root cause. I've seen these messages, too, but eventually ceph-volume was able to find the right LVs. In your case it seems like it doesn't, though. Regards, Eugen Zitat von Hervé Ballans : Le 23/08/2018 à 18:44, Alfredo Deza a écrit : ceph-volume-systemd.log (extract) [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-6-ba351d69-5c48-418e-a377-4034f503af93 [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-3-9380cd27-c0fe-4ede-9ed3-d09eff545037 *[2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd* [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-4-02540fff-5478-4a67-bf5c-679c72150e8d [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-0-98bfb597-009b-4e88-bc5e-dd22587d21fe [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-8-913e65e3-62d9-48f8-a0ef-45315cf64593 [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-5-b7100200-9eef-4c85-b855-b5a0a435354c [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8 [2018-08-20 11:26:26,386][systemd][INFO ] parsed sub-command: lvm, extra data: 6-ba351d69-5c48-418e-a377-4034f503af93 [2018-08-20 11:26:26,386][systemd][INFO ] parsed sub-command: lvm, extra data: 3-9380cd27-c0fe-4ede-9ed3-d09eff545037 [2018-08-20 11:26:26,386][systemd][INFO ] parsed sub-command: lvm, extra data: 
1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 4-02540fff-5478-4a67-bf5c-679c72150e8d [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 0-98bfb597-009b-4e88-bc5e-dd22587d21fe [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 8-913e65e3-62d9-48f8-a0ef-45315cf64593 [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 5-b7100200-9eef-4c85-b855-b5a0a435354c [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8 [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-2-b8e82f22-e993-4458-984b-90232b8b3d55 [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8 *[2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-1-4a9954ce-0a0f-432b-a91d-eaacb45287d4* [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 1-4a9954ce-0a0f-432b-a91d-eaacb45287d4 [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8 [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 2-b8e82f22-e993-4458-984b-90232b8b3d55 [2018-08-20 11:26:26,458][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 3-9380cd27-c0fe-4ede-9ed3-d09eff545037 [2018-08-20 11:26:26,458][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 2-b8e82f22-e993-4458-984b-90232b8b3d55 [2018-08-20 11:26:26,458][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 5-b7100200-9eef-4c85-b855-b5a0a435354c [2018-08-20 11:26:26,458][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 6-ba351d69-5c48-418e-a377-4034f503af93 [2018-08-20 11:26:26,458][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 4-02540fff-5478-4a67-bf5c-679c72150e8d 
[2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 8-913e65e3-62d9-48f8-a0ef-45315cf64593 [2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 0-98bfb597-009b-4e88-bc5e-dd22587d21fe [2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8 *[2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 1-4a9954ce-0a0f-432b-a91d-eaacb45287d4* [2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8 *[2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd* [2018-08-20 11:26:27,068][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.1 with fsid 4a9954ce-0a0f-432b-a91d-eaacb45287d4
Re: [ceph-users] RGW pools don't show up in luminous
I installed a new Ceph cluster with Luminous, after a long time working with Jewel. I created my RGW pools the same as always (pool create default.rgw.buckets.data etc.), but they don't show up in ceph df with Luminous. Has the command changed? Since Luminous you don't need to create pools - rgw will create them automatically. And no, the command hasn't changed; rgw pools will be present in 'ceph df' and 'rados df'. k
Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?
Le 23/08/2018 à 18:44, Alfredo Deza a écrit : ceph-volume-systemd.log (extract) [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-6-ba351d69-5c48-418e-a377-4034f503af93 [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-3-9380cd27-c0fe-4ede-9ed3-d09eff545037 *[2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd* [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-4-02540fff-5478-4a67-bf5c-679c72150e8d [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-0-98bfb597-009b-4e88-bc5e-dd22587d21fe [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-8-913e65e3-62d9-48f8-a0ef-45315cf64593 [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-5-b7100200-9eef-4c85-b855-b5a0a435354c [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8 [2018-08-20 11:26:26,386][systemd][INFO ] parsed sub-command: lvm, extra data: 6-ba351d69-5c48-418e-a377-4034f503af93 [2018-08-20 11:26:26,386][systemd][INFO ] parsed sub-command: lvm, extra data: 3-9380cd27-c0fe-4ede-9ed3-d09eff545037 [2018-08-20 11:26:26,386][systemd][INFO ] parsed sub-command: lvm, extra data: 1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 4-02540fff-5478-4a67-bf5c-679c72150e8d [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 0-98bfb597-009b-4e88-bc5e-dd22587d21fe [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 8-913e65e3-62d9-48f8-a0ef-45315cf64593 [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 5-b7100200-9eef-4c85-b855-b5a0a435354c [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8 [2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: 
lvm-2-b8e82f22-e993-4458-984b-90232b8b3d55 [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8 *[2018-08-20 11:26:26,386][systemd][INFO ] raw systemd input received: lvm-1-4a9954ce-0a0f-432b-a91d-eaacb45287d4* [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 1-4a9954ce-0a0f-432b-a91d-eaacb45287d4 [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8 [2018-08-20 11:26:26,387][systemd][INFO ] parsed sub-command: lvm, extra data: 2-b8e82f22-e993-4458-984b-90232b8b3d55 [2018-08-20 11:26:26,458][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 3-9380cd27-c0fe-4ede-9ed3-d09eff545037 [2018-08-20 11:26:26,458][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 2-b8e82f22-e993-4458-984b-90232b8b3d55 [2018-08-20 11:26:26,458][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 5-b7100200-9eef-4c85-b855-b5a0a435354c [2018-08-20 11:26:26,458][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 6-ba351d69-5c48-418e-a377-4034f503af93 [2018-08-20 11:26:26,458][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 4-02540fff-5478-4a67-bf5c-679c72150e8d [2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 8-913e65e3-62d9-48f8-a0ef-45315cf64593 [2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 0-98bfb597-009b-4e88-bc5e-dd22587d21fe [2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 7-5d4af2fc-388c-4795-9d1a-53ad8aba56d8 *[2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 1-4a9954ce-0a0f-432b-a91d-eaacb45287d4* [2018-08-20 11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 9-2e5b3463-5904-4aee-9ae1-7d31d8576dc8 *[2018-08-20 
11:26:26,459][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd* [2018-08-20 11:26:27,068][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.1 with fsid 4a9954ce-0a0f-432b-a91d-eaacb45287d4 This is odd: why is osd.1 not found? Do you have an OSD with that ID and FSID? This line means that we have queried all the LVs in the system and we haven't found anything that responds to that ID and FSID Hi Alfredo, I don't know why, but I noticed in the ceph-volume-systemd.log (above in bold) that there are 2 different lines corresponding to lvm-1 (normally associated with osd.1). One seems to have the correct fsid, while the other has a bad one... and it looks like it's trying to start the one with the wrong fsid!? Just a guess, but would it be possible, following the NVMe device path reversal, that a second lvm unit was then created for the same osd?
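Hervé's observation can be checked mechanically against a log like the one above; a throwaway sketch (the `lvm-<id>-<fsid>` unit-suffix format is taken from the log itself):

```python
import re
from collections import defaultdict

# The raw systemd inputs from the log above -- two of them claim osd id 1
# with different fsids, which is the duplicate Hervé spotted.
raw_inputs = [
    "lvm-0-98bfb597-009b-4e88-bc5e-dd22587d21fe",
    "lvm-1-bcb9d7e6-44ea-449b-ad97-1aa5f880dfdd",
    "lvm-1-4a9954ce-0a0f-432b-a91d-eaacb45287d4",
    "lvm-2-b8e82f22-e993-4458-984b-90232b8b3d55",
]

def find_duplicate_osd_ids(inputs):
    """Group ceph-volume systemd unit suffixes by osd id and return the ids
    that appear with more than one fsid (stale units from earlier attempts)."""
    by_id = defaultdict(set)
    for item in inputs:
        m = re.match(r"lvm-(\d+)-([0-9a-f-]+)$", item)
        if m:
            by_id[int(m.group(1))].add(m.group(2))
    return {osd_id: fsids for osd_id, fsids in by_id.items() if len(fsids) > 1}

print(find_duplicate_osd_ids(raw_inputs))  # osd.1 shows up with two fsids
```

Once the stale fsid is identified, disabling its unit (e.g. `systemctl disable ceph-volume@lvm-1-<old-fsid>`) is the cleanup Eugen suggests to rule it out as the root cause.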
Re: [ceph-users] Migrating from pre-luminous multi-root crush hierachy
On 08/24/2018 01:57 PM, Buchberger, Carsten wrote: Hi Konstantin, sounds easy;-) If I apply the new rule to the existing pools there won't be any osds to satisfy the requirements of the rule - because the osds are not in the new root yet. Isn't that a problem? Thank you Your IO will stall, so you need to move the osds to the new root quickly. Prepare a list of `ceph osd crush move host=` commands and paste it in immediately after applying the crush rule. k
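Konstantin's batch of move commands could be generated along these lines (a sketch only; the host names and target root below are placeholders for your own tree):

```python
def crush_move_commands(hosts, new_root="default"):
    """Build the `ceph osd crush move` batch described above, one command
    per host, ready to paste right after applying the new crush rule.
    Host names and root are placeholders -- adapt to your own hierarchy."""
    return [f"ceph osd crush move {host} root={new_root}" for host in hosts]

# Example using host names in the style seen elsewhere in this thread:
for cmd in crush_move_commands(["S-26-3-1-1", "S-26-3-2-1"], new_root="default"):
    print(cmd)
```

Having the whole batch ready before changing the rule keeps the window where no osds satisfy the rule (and IO stalls) as short as possible.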
[ceph-users] PG auto repair with BlueStore
Hi, osd_scrub_auto_repair still defaults to false and I was wondering how we think about enabling this feature by default. Would we say it's safe to enable this with BlueStore? Wido
Re: [ceph-users] ceph auto repair. What is wrong?
On 08/24/2018 06:11 AM, Fyodor Ustinov wrote: > Hi! > > I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and two > - ssd). Each host located in own rack. > > I make such crush configuration on fresh ceph installation: > >sudo ceph osd crush add-bucket R-26-3-1 rack >sudo ceph osd crush add-bucket R-26-3-2 rack >sudo ceph osd crush add-bucket R-26-4-1 rack >sudo ceph osd crush add-bucket R-26-4-2 rack > [...] >sudo ceph osd crush add-bucket R-26-8-1 rack >sudo ceph osd crush add-bucket R-26-8-2 rack > >sudo ceph osd crush move R-26-3-1 root=default > [...] >sudo ceph osd crush move R-26-8-2 root=default > > sudo ceph osd crush move S-26-3-1-1 rack=R-26-3-1 > [...] > sudo ceph osd crush move S-26-8-2-1 rack=R-26-8-2 > > sudo ceph osd crush rule create-replicated hddreplrule default rack hdd > sudo ceph osd pool create rbd 256 256 replicated hddreplrule > sudo ceph osd pool set rbd size 3 > sudo ceph osd pool set rbd min_size 2 > > osd tree look like: > ID CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF > -1 117.36346 root default > -2 9.78029 rack R-26-3-1 > -27 9.78029 host S-26-3-1-1 > 0 hdd 9.32390 osd.0 up 1.0 1.0 > 1 ssd 0.22820 osd.1 up 1.0 1.0 > 2 ssd 0.22820 osd.2 up 1.0 1.0 > -3 9.78029 rack R-26-3-2 > -43 9.78029 host S-26-3-2-1 > 3 hdd 9.32390 osd.3 up 1.0 1.0 > 4 ssd 0.22820 osd.4 up 1.0 1.0 > 5 ssd 0.22820 osd.5 up 1.0 1.0 > [...] > > > Now write some data to rbd pool and shutdown one node. > cluster: > id: 9000d700-8529-4d38-b9f5-24d6079429a2 > health: HEALTH_WARN > 3 osds down > 1 host (3 osds) down > 1 rack (3 osds) down > Degraded data redundancy: 1223/12300 objects degraded (9.943%), > 74 pgs degraded, 74 pgs undersized > > And ceph does not try to repair pool. Why? How long did you wait? The default timeout is 600 seconds before recovery starts. These OSDs are not marked as out yet. Wido > > WBR, > Fyodor. 
Re: [ceph-users] Migrating from pre-luminous multi-root crush hierachy
We recently upgraded to luminous (you can see the device-classes in the output). So it should be possible to have one single root, no fake hosts, and just use the device-class. We added some hosts/osds recently which back new pools, so we also created a new hierarchy and crush rules for those. That worked perfectly, and of course we want to have that for the old parts of the cluster, too. Is it possible to move the existing osd's to a new root/bucket without having to move all the data around (which might be difficult because we don't have enough capacity to move 50 % of the osd's)? I imagine something like: 1. Magic maintenance command 2. Move osds to new bucket in hierarchy 3. Update either existing crush-rule or create new rule and update pool 4. Magic maintenance-done command We also plan to migrate the osds to bluestore. Should we do this a) before moving or b) after moving? I hope our issue is clear. Best regards Carsten You don't need a "magic maintenance command": when you apply your new crush rule online, you need to move your osds to the root defined in that rule. Data movement is not huge in this case. k