Setting iotune limits on rbd

2012-04-03 Thread Andrey Korolyov
Hi,

# virsh blkdeviotune Test vdb --write_iops_sec 50 //file block device

# virsh blkdeviotune Test vda --write_iops_sec 50 //rbd block device
error: Unable to change block I/O throttle
error: invalid argument: No device found for specified path

2012-04-03 07:38:49.170+: 30171: debug :
virDomainSetBlockIoTune:18317 : dom=0x1114590, (VM: name=Test,
uuid=8c27bf32-82dc-315a-d0ba-4653b1b3d595), disk=vda,
params=0x1114600, nparams=1, flags=0
2012-04-03 07:38:49.170+: 30169: debug :
virEventPollMakePollFDs:383 : Prepare n=8 w=11, f=16 e=1 d=0
2012-04-03 07:38:49.170+: 30169: debug :
virEventPollCalculateTimeout:325 : Calculate expiry of 4 timers
2012-04-03 07:38:49.170+: 30169: debug :
virEventPollCalculateTimeout:331 : Got a timeout scheduled for
1333438734170
2012-04-03 07:38:49.170+: 30171: error : qemuDiskPathToAlias:11338
: invalid argument: No device found for specified path
2012-04-03 07:38:49.170+: 30171: debug : virDomainFree:2313 :
dom=0x1114590, (VM: name=Test,
uuid=8c27bf32-82dc-315a-d0ba-4653b1b3d595)
2012-04-03 07:38:49.170+: 30171: debug : virUnrefDomain:276 :
unref domain 0x1114590 Test 1
2012-04-03 07:38:49.170+: 30171: debug : virReleaseDomain:238 :
release domain 0x1114590 Test 8c27bf32-82dc-315a-d0ba-4653b1b3d595
2012-04-03 07:38:49.170+: 30169: debug :
virEventPollCalculateTimeout:351 : Timeout at 1333438734170 due in
5000 ms
2012-04-03 07:38:49.170+: 30169: debug : virEventPollRunOnce:619 :
EVENT_POLL_RUN: nhandles=9 imeout=5000
2012-04-03 07:38:49.170+: 30171: debug : virReleaseDomain:246 :
unref connection 0x1177b10 2

libvirt 0.9.10, with the json-escape patch applied; it seems this problem is
related to another case of incorrect path handling.

I'm not sure whether this belongs on the libvirt ML or here.


Re: Setting iotune limits on rbd

2012-04-03 Thread Wido den Hollander

Hi,

On 3-4-2012 10:02, Andrey Korolyov wrote:

Hi,

# virsh blkdeviotune Test vdb --write_iops_sec 50 //file block device

# virsh blkdeviotune Test vda --write_iops_sec 50 //rbd block device
error: Unable to change block I/O throttle
error: invalid argument: No device found for specified path


That is correct. As far as I know iotune uses the underlying cgroups 
from the OS.


RBD devices (when using Qemu) are not block devices which can be managed 
by cgroups. That's why it's not working and you get the error that the 
device can't be found.


There is however somebody working on DiskIoThrottling inside Qemu: 
http://wiki.qemu.org/Features/DiskIOLimits


That would work with RBD (he even names Ceph :) )
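
Once that lands, the limits would be set per -drive on the Qemu command line
rather than through cgroups. Roughly along these lines (illustrative only;
the option names follow the proposal on that wiki page and the pool/image
name is a placeholder):

    qemu-system-x86_64 ... \
        -drive file=rbd:rbd/test,if=virtio,cache=none,iops_wr=50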

Wido





Re: Setting iotune limits on rbd

2012-04-03 Thread Andrey Korolyov
But I am able to set static limits in the config for rbd :) All I want
is to be able to change them on the fly.

It is NOT a cgroups mechanism; it is completely qemu-driven.
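
For example, after setting a static limit you can see it in the live domain
XML (output trimmed, purely illustrative):

    # virsh dumpxml Test | grep -A 2 '<iotune>'
        <iotune>
          <write_iops_sec>50</write_iops_sec>
        </iotune>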

On Tue, Apr 3, 2012 at 12:21 PM, Wido den Hollander w...@widodh.nl wrote:
 Hi,

 On 3-4-2012 10:02, Andrey Korolyov wrote:

 Hi,

 # virsh blkdeviotune Test vdb --write_iops_sec 50 //file block device

 # virsh blkdeviotune Test vda --write_iops_sec 50 //rbd block device
 error: Unable to change block I/O throttle
 error: invalid argument: No device found for specified path


 That is correct. As far as I know iotune uses the underlying cgroups from
 the OS.

 RBD devices (when using Qemu) are not block devices which can be managed by
 cgroups. That's why it's not working and you get the error that the device
 can't be found.

 There is however somebody working on DiskIoThrottling inside Qemu:
 http://wiki.qemu.org/Features/DiskIOLimits

 That would work with RBD (he even names Ceph :) )

 Wido




Re: Setting iotune limits on rbd

2012-04-03 Thread Wido den Hollander

On 3-4-2012 10:28, Andrey Korolyov wrote:

But I am able to set static limits in the config for rbd :) All I want
is to be able to change them on the fly.

It is NOT a cgroups mechanism; it is completely qemu-driven.


Are you sure about that? 
http://libvirt.org/formatdomain.html#elementsBlockTuning


Browsing through the source code I found that this is indeed related to 
libvirt.


In the Qemu driver:

if (disk->type != VIR_DOMAIN_DISK_TYPE_BLOCK &&
    disk->type != VIR_DOMAIN_DISK_TYPE_FILE)
    goto cleanup;
...
...
cleanup:
    if (!ret) {
        qemuReportError(VIR_ERR_INVALID_ARG,
                        "%s", _("No device found for specified path"));
    }


RBD devices are however of the type: VIR_DOMAIN_DISK_TYPE_NETWORK

That's why you get this error: it assumes the device you want to set
the limits on is a block device or a regular file.


Wido





Re: Setting iotune limits on rbd

2012-04-03 Thread Andrey Korolyov
At least the elements under the iotune block do apply to rbd, and you can
test that by running fio, for example. From reading the libvirt code I had
assumed that the blkdeviotune call could be applied to pseudo-devices as
well; there is nothing in the code that implements different behavior for
the iotune and blkdeviotune paths. It seems the libvirt folks simply never
added a workaround for sheepdog/ceph. I'll try a dirty hack in this code
snippet (somehow I had overlooked the fact that there is no code path for
non-block devices) and report back soon.
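
For example, something like this inside the guest, against the throttled
disk, should show writes capped near the configured 50 IOPS if the limit is
really enforced (file name and sizes are just placeholders):

    # fio --name=iotune-test --filename=/root/fio-testfile --size=256M \
          --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
          --iodepth=16 --runtime=30 --time_based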

On Tue, Apr 3, 2012 at 12:45 PM, Wido den Hollander w...@widodh.nl wrote:
 On 3-4-2012 10:28, Andrey Korolyov wrote:

 But I am able to set static limits in the config for rbd :) All I want
 is to be able to change them on the fly.

 It is NOT a cgroups mechanism; it is completely qemu-driven.


 Are you sure about that?
 http://libvirt.org/formatdomain.html#elementsBlockTuning

 Browsing through the source code I found that this is indeed related to
 libvirt.

 In the Qemu driver:

    if (disk->type != VIR_DOMAIN_DISK_TYPE_BLOCK &&
        disk->type != VIR_DOMAIN_DISK_TYPE_FILE)
        goto cleanup;
 ...
 ...
 cleanup:
    if (!ret) {
        qemuReportError(VIR_ERR_INVALID_ARG,
                        "%s", _("No device found for specified path"));
    }


 RBD devices are however of the type: VIR_DOMAIN_DISK_TYPE_NETWORK

 That's why you get this error: it assumes the device you want to set the
 limits on is a block device or a regular file.

 Wido





Re: Setting iotune limits on rbd

2012-04-03 Thread Andrey Korolyov
The suggested hack works. It seems the libvirt devs have not removed the
block-device limitation because they consider this feature experimental,
or they simply forgot about it.
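
With the patched libvirtd, setting and querying the limit on the rbd-backed
disk both work; querying with no throttle arguments prints the current
values, roughly like this (output illustrative):

    # virsh blkdeviotune Test vda --write_iops_sec 50
    # virsh blkdeviotune Test vda
    total_bytes_sec: 0
    read_bytes_sec : 0
    write_bytes_sec: 0
    total_iops_sec : 0
    read_iops_sec  : 0
    write_iops_sec : 50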



Re: OSD suicide

2012-04-03 Thread Borodin Vladimir
Yes, Stefan, you are right. I'm not sure about the D state, but the high
CPU usage is a fact.
I do want to try an OSD-per-disk configuration, but a bit later.

Thanks,
Vladimir.

2012/4/3 Stefan Kleijkers ste...@unilogicnetworks.net:
 Hello,

 A while back I had the same errors you are seeing. I had these problems only
 when using mdraid. After doing IO for some time, the IO stalled, and in most
 cases if you look at the ceph-osd daemon it's in D state (waiting for IO).
 Also, if you look with top you notice a very high load and IO wait.

 I didn't find out what the exact reason was, but I stopped using mdraid and
 switched to a setup with an OSD per disk and the disks formatted with XFS.
 This gave me the best stability and performance.

 Stefan


 On 04/02/2012 04:01 PM, Бородин Владимир wrote:

 Hi all.

 I have a cluster with 4 OSDs (mdRAID10 on each of them, with XFS) and I
 am trying to put 20 million objects of 20 KB each into RADOS (through the
 python API). I have two problems:
 1. the speed is not as good as I expect (but that's not the main problem
 now),
 2. after I put 10 million objects, the OSDs started to mark themselves down
 and out. The logs show something like this:

 2012-04-02 17:05:17.894395 7f2d2a213700 heartbeat_map is_healthy
 'OSD::op_tp thread 0x7f2d1d0f8700' had timed out after 30
 2012-04-02 17:05:18.877781 7f2d1a8f3700 osd.47 1673 heartbeat_check:
 no heartbeat from osd.49 since 2012-04-02 17:02:49.217108 (cutoff
 2012-04-02 17:04:58.87
 7752)
 2012-04-02 17:05:19.578112 7f2d1a8f3700 osd.47 1673 heartbeat_check:
 no heartbeat from osd.49 since 2012-04-02 17:02:49.217108 (cutoff
 2012-04-02 17:04:59.57
 8079)
 2012-04-02 17:05:20.678455 7f2d1a8f3700 osd.47 1673 heartbeat_check:
 no heartbeat from osd.49 since 2012-04-02 17:02:49.217108 (cutoff
 2012-04-02 17:05:00.67
 8421)
 2012-04-02 17:05:21.678785 7f2d1a8f3700 osd.47 1673 heartbeat_check:
 no heartbeat from osd.49 since 2012-04-02 17:02:49.217108 (cutoff
 2012-04-02 17:05:01.67
 8751)
 2012-04-02 17:05:22.579101 7f2d1a8f3700 osd.47 1673 heartbeat_check:
 no heartbeat from osd.49 since 2012-04-02 17:02:49.217108 (cutoff
 2012-04-02 17:05:02.57
 9069)
 2012-04-02 17:05:22.894568 7f2d2a213700 heartbeat_map is_healthy
 'OSD::op_tp thread 0x7f2d1d0f8700' had timed out after 30
 2012-04-02 17:05:22.894601 7f2d2a213700 heartbeat_map is_healthy
 'OSD::op_tp thread 0x7f2d1d0f8700' had suicide timed out after 300
 common/HeartbeatMap.cc: In function 'bool
 ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*,
 time_t)' thread 7f2d2a213700 time 2012-04-02 17:
 05:22.894637
 common/HeartbeatMap.cc: 78: FAILED assert(0 == hit suicide timeout)
  ceph version 0.44.1 (commit:c89b7f22c8599eb974e75a2f7a5f855358199dee)
  1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char
 const*, long)+0x1fe) [0x7634ee]
  2: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x76381f]
  3: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x763a50]
  4: (CephContextServiceThread::entry()+0x5f) [0x65a31f]
  5: (()+0x69ca) [0x7f2d2beab9ca]
  6: (clone()+0x6d) [0x7f2d2a4fccdd]
  ceph version 0.44.1 (commit:c89b7f22c8599eb974e75a2f7a5f855358199dee)
  1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char
 const*, long)+0x1fe) [0x7634ee]
  2: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x76381f]
  3: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x763a50]
  4: (CephContextServiceThread::entry()+0x5f) [0x65a31f]
  5: (()+0x69ca) [0x7f2d2beab9ca]
  6: (clone()+0x6d) [0x7f2d2a4fccdd]
 *** Caught signal (Aborted) **
  in thread 7f2d2a213700
  ceph version 0.44.1 (commit:c89b7f22c8599eb974e75a2f7a5f855358199dee)
  1: /usr/bin/ceph-osd() [0x661cb1]
  2: (()+0xf8f0) [0x7f2d2beb48f0]
  3: (gsignal()+0x35) [0x7f2d2a449a75]
  4: (abort()+0x180) [0x7f2d2a44d5c0]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f2d2acec58d]
  6: (()+0xb7736) [0x7f2d2acea736]
  7: (()+0xb7763) [0x7f2d2acea763]
  8: (()+0xb785e) [0x7f2d2acea85e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
 const*)+0x841) [0x667541]
  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char
 const*, long)+0x1fe) [0x7634ee]
  11: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x76381f]
  12: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x763a50]
  13: (CephContextServiceThread::entry()+0x5f) [0x65a31f]
  14: (()+0x69ca) [0x7f2d2beab9ca]
  15: (clone()+0x6d) [0x7f2d2a4fccdd]

 Or something like that:

 ...
 2012-04-02 17:01:38.673223 7f7855486700 heartbeat_map is_healthy
 'OSD::op_tp thread 0x7f7847369700' had timed out after 30
 2012-04-02 17:01:38.673267 7f7855486700 heartbeat_map is_healthy
 'OSD::op_tp thread 0x7f7847b6a700' had timed out after 30
 2012-04-02 17:01:38.833509 7f7847369700 heartbeat_map reset_timeout
 'OSD::op_tp thread 0x7f7847369700' had timed out after 30
 2012-04-02 17:01:39.031229 7f7847b6a700 heartbeat_map reset_timeout
 'OSD::op_tp thread 0x7f7847b6a700' had timed out after 30
 2012-04-02 17:02:06.971487 7f784324b700 -- 84.201.161.48:6803/17442
 

Re: OSD suicide

2012-04-03 Thread Stefan Kleijkers

Hello Vladimir,

Well, in that case you could try BTRFS. With BTRFS it's possible to group
all the disks in a node together in a RAID0/RAID1/RAID10 configuration,
so you can run just one or a few OSDs per node. I would recommend the
newest kernel possible, though. I haven't tried the 3.3 range, but with the
early 3.2.x kernels I got BTRFS crashes, and with the later 3.2.x
kernels I saw a real slowdown after some time.
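
For example, creating one multi-device BTRFS filesystem per node would look
roughly like this (device names and mount point are placeholders, not
tested on your hardware):

    # mkfs.btrfs -d raid10 -m raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    # mount /dev/sdb /srv/osd.0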


If you get it stabilised with mdraid, please let me know; I'm still
interested in that setup. With my current setup I have the problem that
after a disk crash I usually can't unmount the filesystem anymore and I
have to reboot the node. I would like to avoid that, and with mdraid it's
possible to swap a disk without bringing the system down.
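
The swap itself would be something along these lines (array and partition
names are placeholders):

    # mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
      (physically replace the disk and partition it, then)
    # mdadm /dev/md0 --add /dev/sdd1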


Stefan

On 04/03/2012 07:16 PM, Borodin Vladimir wrote:

Yes, Stefan, you are right. I'm not sure about the D state, but the high
CPU usage is a fact.
I do want to try an OSD-per-disk configuration, but a bit later.

Thanks,
Vladimir.
