Setting iotune limits on rbd
Hi,

# virsh blkdeviotune Test vdb --write_iops_sec 50   // file block device
# virsh blkdeviotune Test vda --write_iops_sec 50   // rbd block device
error: Unable to change block I/O throttle
error: invalid argument: No device found for specified path

2012-04-03 07:38:49.170+: 30171: debug : virDomainSetBlockIoTune:18317 : dom=0x1114590, (VM: name=Test, uuid=8c27bf32-82dc-315a-d0ba-4653b1b3d595), disk=vda, params=0x1114600, nparams=1, flags=0
2012-04-03 07:38:49.170+: 30169: debug : virEventPollMakePollFDs:383 : Prepare n=8 w=11, f=16 e=1 d=0
2012-04-03 07:38:49.170+: 30169: debug : virEventPollCalculateTimeout:325 : Calculate expiry of 4 timers
2012-04-03 07:38:49.170+: 30169: debug : virEventPollCalculateTimeout:331 : Got a timeout scheduled for 1333438734170
2012-04-03 07:38:49.170+: 30171: error : qemuDiskPathToAlias:11338 : invalid argument: No device found for specified path
2012-04-03 07:38:49.170+: 30171: debug : virDomainFree:2313 : dom=0x1114590, (VM: name=Test, uuid=8c27bf32-82dc-315a-d0ba-4653b1b3d595)
2012-04-03 07:38:49.170+: 30171: debug : virUnrefDomain:276 : unref domain 0x1114590 Test 1
2012-04-03 07:38:49.170+: 30171: debug : virReleaseDomain:238 : release domain 0x1114590 Test 8c27bf32-82dc-315a-d0ba-4653b1b3d595
2012-04-03 07:38:49.170+: 30169: debug : virEventPollCalculateTimeout:351 : Timeout at 1333438734170 due in 5000 ms
2012-04-03 07:38:49.170+: 30169: debug : virEventPollRunOnce:619 : EVENT_POLL_RUN: nhandles=9 timeout=5000
2012-04-03 07:38:49.170+: 30171: debug : virReleaseDomain:246 : unref connection 0x1177b10 2

libvirt 0.9.10, json-escape patch applied, but it seems this problem is related to another case of incorrect path handling. I'm not sure whether this belongs on the libvirt ML or here.

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Setting iotune limits on rbd
Hi,

On 3-4-2012 10:02, Andrey Korolyov wrote:
> # virsh blkdeviotune Test vdb --write_iops_sec 50   // file block device
> # virsh blkdeviotune Test vda --write_iops_sec 50   // rbd block device
> error: Unable to change block I/O throttle
> error: invalid argument: No device found for specified path

That is correct. As far as I know iotune uses the underlying cgroups from the OS. RBD devices (when used via Qemu) are not block devices which can be managed by cgroups. That's why it's not working and you get the error that the device can't be found.

There is however somebody working on DiskIOThrottling inside Qemu: http://wiki.qemu.org/Features/DiskIOLimits

That would work with RBD (he even names Ceph :) )

Wido
Re: Setting iotune limits on rbd
But I am able to set static limits in the config for rbd :) All I want is to change them on the fly. It is NOT the cgroups mechanism; it is completely qemu-driven.

On Tue, Apr 3, 2012 at 12:21 PM, Wido den Hollander <w...@widodh.nl> wrote:
> That is correct. As far as I know iotune uses the underlying cgroups from the OS. RBD devices (when used via Qemu) are not block devices which can be managed by cgroups. That's why it's not working and you get the error that the device can't be found.
> There is however somebody working on DiskIOThrottling inside Qemu: http://wiki.qemu.org/Features/DiskIOLimits
> That would work with RBD (he even names Ceph :) )
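[Editor's note: for reference, the static per-device limits mentioned above are set in the libvirt domain XML via the <iotune> element inside <disk>. The element names below are from the libvirt domain format documentation; the pool/image name, target device, and value are illustrative only.]

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <!-- illustrative rbd pool/image name -->
  <source protocol='rbd' name='rbd/test-image'/>
  <target dev='vda' bus='virtio'/>
  <iotune>
    <!-- static limit applied at domain start; the thread is about
         changing this at runtime via virsh blkdeviotune -->
    <write_iops_sec>50</write_iops_sec>
  </iotune>
</disk>
```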
Re: Setting iotune limits on rbd
On 3-4-2012 10:28, Andrey Korolyov wrote:
> But I am able to set static limits in the config for rbd :) All I want is to change them on the fly. It is NOT the cgroups mechanism; it is completely qemu-driven.

Are you sure about that? http://libvirt.org/formatdomain.html#elementsBlockTuning

Browsing through the source code I found that this is indeed related to libvirt. In the Qemu driver:

    if (disk->type != VIR_DOMAIN_DISK_TYPE_BLOCK &&
        disk->type != VIR_DOMAIN_DISK_TYPE_FILE)
        goto cleanup;
    ...
    cleanup:
        if (!ret) {
            qemuReportError(VIR_ERR_INVALID_ARG, "%s",
                            _("No device found for specified path"));
        }

RBD devices are however of the type VIR_DOMAIN_DISK_TYPE_NETWORK. That's why you get this error: it assumes the device you want to set the limits on is a block device or a regular file.

Wido
Re: Setting iotune limits on rbd
At least, elements under the iotune block do apply to rbd, and you can test that by running fio, for example. From reading the libvirt code I assumed the blkdeviotune call could be applied to pseudo-devices; there is nothing in the code pointing to different behavior for the iotune and blkdeviotune paths at all. It seems the libvirt folks did not push a workaround for sheepdog/ceph. I'll try a dirty hack in this code snippet (somehow I overlooked the possibility that there is no non-block workaround) and report back soon.

On Tue, Apr 3, 2012 at 12:45 PM, Wido den Hollander <w...@widodh.nl> wrote:
> Are you sure about that? http://libvirt.org/formatdomain.html#elementsBlockTuning
> Browsing through the source code I found that this is indeed related to libvirt.
> RBD devices are however of the type VIR_DOMAIN_DISK_TYPE_NETWORK. That's why you get this error: it assumes the device you want to set the limits on is a block device or a regular file.
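[Editor's note: to observe whether a limit actually bites, a minimal fio job file along these lines can be run inside the guest against the throttled device. The device path, block size, and runtime are illustrative; if the 50 write-IOPS limit holds, fio's reported write IOPS should settle near that value.]

```ini
; write-IOPS test against the throttled disk (run inside the guest)
[global]
ioengine=libaio
direct=1
runtime=30
time_based

[throttle-test]
filename=/dev/vdb
rw=randwrite
bs=4k
iodepth=8
```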
Re: Setting iotune limits on rbd
The suggested hack works. It seems the libvirt devs have not removed the block-device limitation because they consider this feature experimental, or simply forgot about it.

On Tue, Apr 3, 2012 at 12:55 PM, Andrey Korolyov <and...@xdel.ru> wrote:
> At least, elements under the iotune block do apply to rbd, and you can test that by running fio, for example. It seems the libvirt folks did not push a workaround for sheepdog/ceph. I'll try a dirty hack in this code snippet and report back soon.
Re: OSD suicide
Yes, Stefan. You are right. I'm not sure about the D state, but the high CPU usage is a fact. I do want to try an OSD-per-disk configuration, but a bit later.

Thanks, Vladimir.

2012/4/3 Stefan Kleijkers <ste...@unilogicnetworks.net>:
> Hello,
> A while back I had the same errors you are seeing. I had these problems only when using mdraid. After doing IO for some time the IO stalled, and in most cases if you looked at the ceph-osd daemon it was in D state (waiting for IO). Also if you look with top you notice a very high load and IO wait.
> I didn't find out what the exact reason was, but I stopped using mdraid and use a setup with an OSD per disk and the disks formatted with XFS. This gave me the best stability and performance.
> Stefan
> On 04/02/2012 04:01 PM, Бородин Владимир wrote:
>> Hi all. I have a cluster with 4 OSDs (mdRAID10 on each of them with XFS) and I am trying to put 20 million objects of 20 KB each into RADOS (through the Python API). I have two problems: 1. the speed is not as good as I expect (but that's not the main problem now); 2. after I put 10 million objects, the OSDs started to mark themselves down and out.
The logs give something like this:

2012-04-02 17:05:17.894395 7f2d2a213700 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f2d1d0f8700' had timed out after 30
2012-04-02 17:05:18.877781 7f2d1a8f3700 osd.47 1673 heartbeat_check: no heartbeat from osd.49 since 2012-04-02 17:02:49.217108 (cutoff 2012-04-02 17:04:58.877752)
2012-04-02 17:05:19.578112 7f2d1a8f3700 osd.47 1673 heartbeat_check: no heartbeat from osd.49 since 2012-04-02 17:02:49.217108 (cutoff 2012-04-02 17:04:59.578079)
2012-04-02 17:05:20.678455 7f2d1a8f3700 osd.47 1673 heartbeat_check: no heartbeat from osd.49 since 2012-04-02 17:02:49.217108 (cutoff 2012-04-02 17:05:00.678421)
2012-04-02 17:05:21.678785 7f2d1a8f3700 osd.47 1673 heartbeat_check: no heartbeat from osd.49 since 2012-04-02 17:02:49.217108 (cutoff 2012-04-02 17:05:01.678751)
2012-04-02 17:05:22.579101 7f2d1a8f3700 osd.47 1673 heartbeat_check: no heartbeat from osd.49 since 2012-04-02 17:02:49.217108 (cutoff 2012-04-02 17:05:02.579069)
2012-04-02 17:05:22.894568 7f2d2a213700 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f2d1d0f8700' had timed out after 30
2012-04-02 17:05:22.894601 7f2d2a213700 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f2d1d0f8700' had suicide timed out after 300
common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f2d2a213700 time 2012-04-02 17:05:22.894637
common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")
ceph version 0.44.1 (commit:c89b7f22c8599eb974e75a2f7a5f855358199dee)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x1fe) [0x7634ee]
 2: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x76381f]
 3: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x763a50]
 4: (CephContextServiceThread::entry()+0x5f) [0x65a31f]
 5: (()+0x69ca) [0x7f2d2beab9ca]
 6: (clone()+0x6d) [0x7f2d2a4fccdd]
ceph version 0.44.1 (commit:c89b7f22c8599eb974e75a2f7a5f855358199dee)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x1fe) [0x7634ee]
 2: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x76381f]
 3: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x763a50]
 4: (CephContextServiceThread::entry()+0x5f) [0x65a31f]
 5: (()+0x69ca) [0x7f2d2beab9ca]
 6: (clone()+0x6d) [0x7f2d2a4fccdd]
*** Caught signal (Aborted) **
 in thread 7f2d2a213700
ceph version 0.44.1 (commit:c89b7f22c8599eb974e75a2f7a5f855358199dee)
 1: /usr/bin/ceph-osd() [0x661cb1]
 2: (()+0xf8f0) [0x7f2d2beb48f0]
 3: (gsignal()+0x35) [0x7f2d2a449a75]
 4: (abort()+0x180) [0x7f2d2a44d5c0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f2d2acec58d]
 6: (()+0xb7736) [0x7f2d2acea736]
 7: (()+0xb7763) [0x7f2d2acea763]
 8: (()+0xb785e) [0x7f2d2acea85e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x841) [0x667541]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x1fe) [0x7634ee]
 11: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x76381f]
 12: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x763a50]
 13: (CephContextServiceThread::entry()+0x5f) [0x65a31f]
 14: (()+0x69ca) [0x7f2d2beab9ca]
 15: (clone()+0x6d) [0x7f2d2a4fccdd]

Or something like this:

...
2012-04-02 17:01:38.673223 7f7855486700 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f7847369700' had timed out after 30
2012-04-02 17:01:38.673267 7f7855486700 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f7847b6a700' had timed out after 30
2012-04-02 17:01:38.833509 7f7847369700 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f7847369700' had timed out after 30
2012-04-02 17:01:39.031229 7f7847b6a700 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f7847b6a700' had timed out after 30
2012-04-02 17:02:06.971487 7f784324b700 -- 84.201.161.48:6803/17442
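[Editor's note: the one-OSD-per-disk layout Stefan recommends looks roughly like this in a ceph.conf of that era. Hostnames, paths, and device comments are made up for illustration; formatting the disks with XFS and mounting them at the osd data paths happens outside ceph.conf.]

```ini
[osd]
    osd data = /var/lib/ceph/osd/$name
    osd journal = /var/lib/ceph/osd/$name/journal

; one OSD per physical disk instead of one OSD on an mdraid device
[osd.0]
    host = node1
    ; backed by /dev/sdb, formatted as XFS, mounted at the osd data path

[osd.1]
    host = node1
    ; backed by /dev/sdc
```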
Re: OSD suicide
Hello Vladimir,

Well, in that case you could try BTRFS. With BTRFS it's possible to group all the disks in a node together in a RAID0/RAID1/RAID10 configuration, so you can run one or a few OSDs per node. But I would recommend the newest kernel possible. I haven't tried the 3.3 range, but with the early 3.2.x kernels I got BTRFS crashes, and with the later 3.2.x kernels I saw a real slowdown after some time.

If you get it stabilised with mdraid, please let me know; I'm still interested in that setup. With the current setup I have the problem that after a disk crash, in most cases I can't umount the filesystem anymore and I have to reboot the node. I would like to avoid that, and with mdraid it's possible to swap a disk without bringing the system down.

Stefan

On 04/03/2012 07:16 PM, Borodin Vladimir wrote:
> Yes, Stefan. You are right. I'm not sure about the D state, but the high CPU usage is a fact. I do want to try an OSD-per-disk configuration, but a bit later.
> Thanks, Vladimir.