Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-31 Thread Jason Dillaman
The exclusive-lock feature should only require grabbing the lock on
the very first IO, so if this is an issue that pops up after extended
use, it's most likely either not related to exclusive-lock at all, or
the result of a client<->OSD link hiccup. In the latter case, you
should see a log message like "image watch failed" in your client logs.
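
A quick way to check for that case (just a sketch; the pool/image name
and log path below are examples, not taken from your setup):

  # look for watch failures in the librbd client log
  grep -i "image watch failed" /var/log/ceph/qemu-guest-*.log

  # list the current watchers on the image -- a healthy, attached client
  # should show up here
  rbd status rbd/vm-disk-1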

Since this isn't something that we have run into during our regular
testing, I would greatly appreciate it if someone could capture a
"gcore" dump from a running-but-stuck process and use "ceph-post-file"
to provide us with the dump (along with the versions of the installed
RPMs/DEBs so we can configure the proper debug symbols).
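
If it helps, the capture could look roughly like this (a sketch; the
process name and paths are examples):

  # dump core of the stuck qemu process without killing it (needs gdb installed);
  # assumes a single qemu-system-x86_64 process -- otherwise pass the pid directly
  gcore -o /tmp/qemu-stuck $(pidof qemu-system-x86_64)

  # record the installed ceph/librbd package versions (rpm -qa on RPM systems)
  dpkg -l | grep -E 'ceph|librbd|librados' > /tmp/qemu-stuck-versions.txt

  # upload both for the developers
  ceph-post-file /tmp/qemu-stuck.* /tmp/qemu-stuck-versions.txt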

On Thu, Mar 30, 2017 at 7:18 AM, Peter Maloney
 wrote:
> On 03/28/17 17:28, Brian Andrus wrote:
>> Just adding some anecdotal input. It likely won't be ultimately
>> helpful other than a +1..
>>
>> Seemingly, we also have the same issue since enabling exclusive-lock
>> on images. We experienced these messages at a large scale when making
>> a CRUSH map change a few weeks ago that resulted in many many VMs
>> experiencing the blocked task kernel messages, requiring reboots.
>>
>> We've since disabled it on all images we can, but there are still
>> jewel-era instances that cannot have the feature disabled. Since
>> disabling the feature, I have not observed any cases of blocked tasks,
>> but so far given the limited timeframe I'd consider that anecdotal.
>>
>>
>
> Why do you need it enabled in jewel-era instances? With jewel you can
> set them on the fly, and live migrate the VM to get the client to update
> its usage of it.
>
> I couldn't find any difference except removing big images is faster with
> object-map (which depends on exclusive-lock). So I can't imagine why it
> can be required.
>
> And how long did you test it? I tested it a few weeks ago for about a
> week, with no hangs. Normally there are hangs after a few days. And I
> have permanently disabled it since the 20th, without any hangs since.
> And I'm gradually adding back the VMs that died when they were there,
> starting with the worst offenders. With that small time, I'm still very
> convinced.
>
> And did you test other features? I suspected exclusive-lock, so I only
> tested removing that one, which required removing object-map and
> fast-diff too, so I didn't test those 2 separately.



-- 
Jason


Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-30 Thread Peter Maloney
On 03/28/17 17:28, Brian Andrus wrote:
> Just adding some anecdotal input. It likely won't be ultimately
> helpful other than a +1..
>
> Seemingly, we also have the same issue since enabling exclusive-lock
> on images. We experienced these messages at a large scale when making
> a CRUSH map change a few weeks ago that resulted in many many VMs
> experiencing the blocked task kernel messages, requiring reboots.
>
> We've since disabled it on all images we can, but there are still
> jewel-era instances that cannot have the feature disabled. Since
> disabling the feature, I have not observed any cases of blocked tasks,
> but so far given the limited timeframe I'd consider that anecdotal.
>
>

Why do you need it enabled on the jewel-era instances? With jewel you can
change image features on the fly, and live-migrate the VM so the client
picks up the change.

I couldn't find any difference except that removing big images is faster
with object-map (which depends on exclusive-lock), so I can't imagine why
it would be required.

And how long did you test it? I tested it a few weeks ago for about a
week, with no hangs; normally there are hangs after a few days. I have
had it permanently disabled since the 20th, without any hangs since, and
I'm gradually adding back the VMs that died while the features were
enabled, starting with the worst offenders. Even in that short window,
I'm still quite convinced.

And did you test other features? I suspected exclusive-lock, so I only
tested removing that one, which required removing object-map and
fast-diff too, so I didn't test those 2 separately.


Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-28 Thread Brian Andrus
Just adding some anecdotal input. It likely won't be ultimately helpful
other than a +1.

Seemingly, we also have the same issue since enabling exclusive-lock on
images. We experienced these messages at a large scale when making a CRUSH
map change a few weeks ago that resulted in many, many VMs hitting the
blocked-task kernel messages and requiring reboots.

We've since disabled it on all images we can, but there are still jewel-era
instances that cannot have the feature disabled. Since disabling the
feature, I have not observed any cases of blocked tasks, but so far, given
the limited timeframe, I'd consider that anecdotal.
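
(For what it's worth, a rough sketch of how the remaining images can be
found -- the pool name "volumes" is just an example:)

  # list images in the pool that still have exclusive-lock enabled
  for img in $(rbd ls volumes); do
      rbd info volumes/"$img" | grep -q 'features:.*exclusive-lock' \
          && echo "$img still has exclusive-lock"
  done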


On Mon, Mar 27, 2017 at 12:31 PM, Hall, Eric 
wrote:

> In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel),
> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and
> ceph hosts, we occasionally see hung processes (usually during boot, but
> otherwise as well), with errors reported in the instance logs as shown
> below.  Configuration is vanilla, based on openstack/ceph docs.
>
> Neither the compute hosts nor the ceph hosts appear to be overloaded in
> terms of memory or network bandwidth, none of the 67 osds are over 80%
> full, nor do any of them appear to be overwhelmed in terms of IO.  Compute
> hosts and ceph cluster are connected via a relatively quiet 1Gb network,
> with an IBoE net between the ceph nodes.  Neither network appears
> overloaded.
>
> I don’t see any related (to my eye) errors in client or server logs, even
> with 20/20 logging from various components (rbd, rados, client,
> objectcacher, etc.)  I’ve increased the qemu file descriptor limit
> (currently 64k... overkill for sure.)
>
> It “feels” like a performance problem, but I can’t find any capacity issues
> or constraining bottlenecks.
>
> Any suggestions or insights into this situation are appreciated.  Thank
> you for your time,
> --
> Eric
>
>
> [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more
> than 120 seconds.
> [Fri Mar 24 20:30:40 2017]   Not tainted 3.13.0-52-generic #85-Ubuntu
> [Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [Fri Mar 24 20:30:40 2017] jbd2/vda1-8 D 88043fd13180 0   226
> 2 0x
> [Fri Mar 24 20:30:40 2017]  88003728bbd8 0046
> 88042690 88003728bfd8
> [Fri Mar 24 20:30:40 2017]  00013180 00013180
> 88042690 88043fd13a18
> [Fri Mar 24 20:30:40 2017]  88043ffb9478 0002
> 811ef7c0 88003728bc50
> [Fri Mar 24 20:30:40 2017] Call Trace:
> [Fri Mar 24 20:30:40 2017]  [] ?
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
> [Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
> [Fri Mar 24 20:30:40 2017]  [] ?
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  []
> out_of_line_wait_on_bit+0x77/0x90
> [Fri Mar 24 20:30:40 2017]  [] ?
> autoremove_wake_function+0x40/0x40
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_buffer+0x2a/0x30
> [Fri Mar 24 20:30:40 2017]  [] jbd2_journal_commit_
> transaction+0x185d/0x1ab0
> [Fri Mar 24 20:30:40 2017]  [] ?
> try_to_del_timer_sync+0x4f/0x70
> [Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
> [Fri Mar 24 20:30:40 2017]  [] ?
> prepare_to_wait_event+0x100/0x100
> [Fri Mar 24 20:30:40 2017]  [] ? commit_timeout+0x10/0x10
> [Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
> [Fri Mar 24 20:30:40 2017]  [] ?
> kthread_create_on_node+0x1c0/0x1c0
> [Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
> [Fri Mar 24 20:30:40 2017]  [] ?
> kthread_create_on_node+0x1c0/0x1c0
>
>
>



-- 
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com


Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-28 Thread Jason Dillaman
Eric,

If you already have debug level 20 logs captured from one of these
events, I would love to be able to take a look at them to see what's
going on. Depending on the size, you could either attach the log to a
new RBD tracker ticket [1] or use the ceph-post-file helper to upload
a large file.

Thanks,
Jason

[1] http://tracker.ceph.com/projects/rbd/issues
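
For reference, ceph-post-file usage would be roughly the following (the
log path and description are just examples):

  # upload a large debug log for the developers, with a short description;
  # note the upload id it prints so it can be referenced in the ticket
  ceph-post-file -d "qemu/librbd hung IO, debug rbd/objectcacher 20" \
      /var/log/ceph/qemu-guest-debug.log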

On Mon, Mar 27, 2017 at 3:31 PM, Hall, Eric  wrote:
> In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel), 
> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and ceph 
> hosts, we occasionally see hung processes (usually during boot, but otherwise 
> as well), with errors reported in the instance logs as shown below.  
> Configuration is vanilla, based on openstack/ceph docs.
>
> Neither the compute hosts nor the ceph hosts appear to be overloaded in terms 
> of memory or network bandwidth, none of the 67 osds are over 80% full, nor do 
> any of them appear to be overwhelmed in terms of IO.  Compute hosts and ceph 
> cluster are connected via a relatively quiet 1Gb network, with an IBoE net 
> between the ceph nodes.  Neither network appears overloaded.
>
> I don’t see any related (to my eye) errors in client or server logs, even 
> with 20/20 logging from various components (rbd, rados, client, objectcacher, 
> etc.)  I’ve increased the qemu file descriptor limit (currently 64k... 
> overkill for sure.)
>
> It “feels” like a performance problem, but I can’t find any capacity issues or
> constraining bottlenecks.
>
> Any suggestions or insights into this situation are appreciated.  Thank you 
> for your time,
> --
> Eric
>
>
> [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 
> 120 seconds.
> [Fri Mar 24 20:30:40 2017]   Not tainted 3.13.0-52-generic #85-Ubuntu
> [Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [Fri Mar 24 20:30:40 2017] jbd2/vda1-8 D 88043fd13180 0   226 
>  2 0x
> [Fri Mar 24 20:30:40 2017]  88003728bbd8 0046 
> 88042690 88003728bfd8
> [Fri Mar 24 20:30:40 2017]  00013180 00013180 
> 88042690 88043fd13a18
> [Fri Mar 24 20:30:40 2017]  88043ffb9478 0002 
> 811ef7c0 88003728bc50
> [Fri Mar 24 20:30:40 2017] Call Trace:
> [Fri Mar 24 20:30:40 2017]  [] ? 
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
> [Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
> [Fri Mar 24 20:30:40 2017]  [] ? 
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [] 
> out_of_line_wait_on_bit+0x77/0x90
> [Fri Mar 24 20:30:40 2017]  [] ? 
> autoremove_wake_function+0x40/0x40
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_buffer+0x2a/0x30
> [Fri Mar 24 20:30:40 2017]  [] 
> jbd2_journal_commit_transaction+0x185d/0x1ab0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> try_to_del_timer_sync+0x4f/0x70
> [Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
> [Fri Mar 24 20:30:40 2017]  [] ? 
> prepare_to_wait_event+0x100/0x100
> [Fri Mar 24 20:30:40 2017]  [] ? commit_timeout+0x10/0x10
> [Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> kthread_create_on_node+0x1c0/0x1c0
> [Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> kthread_create_on_node+0x1c0/0x1c0
>
>
>


Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-28 Thread Marius Vaitiekunas
On Mon, Mar 27, 2017 at 11:17 PM, Peter Maloney <
peter.malo...@brockmann-consult.de> wrote:

> I can't guarantee it's the same as my issue, but from that it sounds the
> same.
>
> Jewel 10.2.4, 10.2.5 tested
> hypervisors are proxmox qemu-kvm, using librbd
> 3 ceph nodes with mon+osd on each
>
> -faster journals, more disks, bcache, rbd_cache, fewer VMs on ceph, iops
> and bw limits on client side, jumbo frames, etc. all improve/smooth out
> performance and mitigate the hangs, but don't prevent it.
> -hangs are usually associated with blocked requests (I set the complaint
> time to 5s to see them)
> -hangs are very easily caused by rbd snapshot + rbd export-diff to do
> incremental backup (one snap persistent, plus one more during backup)
> -when qemu VM io hangs, I have to kill -9 the qemu process for it to
> stop. Some broken VMs don't appear to be hung until I try to live
> migrate them (live migrating all VMs helped test solutions)
>
> Finally I have a workaround... disable exclusive-lock, object-map, and
> fast-diff rbd features (and restart clients via live migrate).
> (object-map and fast-diff appear to have no effect on diff or export-diff
> ... so I don't miss them). I'll file a bug at some point (after I move
> all VMs back and see if it is still stable). And one other user on IRC
> said this solved the same problem (also using rbd snapshots).
>
> And strangely, they don't seem to hang if I put back those features,
> until a few days later (making testing much less easy...but now I'm very
> sure removing them prevents the issue)
>
> I hope this works for you (and maybe gets some attention from devs too),
> so you don't waste months like me.
>
> On 03/27/17 19:31, Hall, Eric wrote:
> > In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel),
> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and
> ceph hosts, we occasionally see hung processes (usually during boot, but
> otherwise as well), with errors reported in the instance logs as shown
> below.  Configuration is vanilla, based on openstack/ceph docs.
> >
> > Neither the compute hosts nor the ceph hosts appear to be overloaded in
> terms of memory or network bandwidth, none of the 67 osds are over 80%
> full, nor do any of them appear to be overwhelmed in terms of IO.  Compute
> hosts and ceph cluster are connected via a relatively quiet 1Gb network,
> with an IBoE net between the ceph nodes.  Neither network appears
> overloaded.
> >
> > I don’t see any related (to my eye) errors in client or server logs,
> even with 20/20 logging from various components (rbd, rados, client,
> objectcacher, etc.)  I’ve increased the qemu file descriptor limit
> (currently 64k... overkill for sure.)
> >
> > It “feels” like a performance problem, but I can’t find any capacity
> issues or constraining bottlenecks.
> >
> > Any suggestions or insights into this situation are appreciated.  Thank
> you for your time,
> > --
> > Eric
> >
> >
> > [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more
> than 120 seconds.
> > [Fri Mar 24 20:30:40 2017]   Not tainted 3.13.0-52-generic #85-Ubuntu
> > [Fri Mar 24 20:30:40 2017] "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> > [Fri Mar 24 20:30:40 2017] jbd2/vda1-8 D 88043fd13180 0
>  226  2 0x
> > [Fri Mar 24 20:30:40 2017]  88003728bbd8 0046
> 88042690 88003728bfd8
> > [Fri Mar 24 20:30:40 2017]  00013180 00013180
> 88042690 88043fd13a18
> > [Fri Mar 24 20:30:40 2017]  88043ffb9478 0002
> 811ef7c0 88003728bc50
> > [Fri Mar 24 20:30:40 2017] Call Trace:
> > [Fri Mar 24 20:30:40 2017]  [] ?
> generic_block_bmap+0x50/0x50
> > [Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
> > [Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
> > [Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
> > [Fri Mar 24 20:30:40 2017]  [] ?
> generic_block_bmap+0x50/0x50
> > [Fri Mar 24 20:30:40 2017]  []
> out_of_line_wait_on_bit+0x77/0x90
> > [Fri Mar 24 20:30:40 2017]  [] ?
> autoremove_wake_function+0x40/0x40
> > [Fri Mar 24 20:30:40 2017]  []
> __wait_on_buffer+0x2a/0x30
> > [Fri Mar 24 20:30:40 2017]  [] jbd2_journal_commit_
> transaction+0x185d/0x1ab0
> > [Fri Mar 24 20:30:40 2017]  [] ?
> try_to_del_timer_sync+0x4f/0x70
> > [Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
> > [Fri Mar 24 20:30:40 2017]  [] ?
> prepare_to_wait_event+0x100/0x100
> > [Fri Mar 24 20:30:40 2017]  [] ?
> commit_timeout+0x10/0x10
> > [Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
> > [Fri Mar 24 20:30:40 2017]  [] ?
> kthread_create_on_node+0x1c0/0x1c0
> > [Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
> > [Fri Mar 24 20:30:40 2017]  [] ?
> kthread_create_on_node+0x1c0/0x1c0
> >
> >
> >

Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-27 Thread Peter Maloney
I can't guarantee it's the same as my issue, but from that description it
sounds the same.

Jewel 10.2.4, 10.2.5 tested
hypervisors are proxmox qemu-kvm, using librbd
3 ceph nodes with mon+osd on each

-faster journals, more disks, bcache, rbd_cache, fewer VMs on ceph, iops
and bw limits on the client side, jumbo frames, etc. all improve/smooth
out performance and mitigate the hangs, but don't prevent them.
-hangs are usually associated with blocked requests (I set the complaint
time to 5s to see them)
-hangs are very easily caused by rbd snapshot + rbd export-diff to do
incremental backups (one snap persistent, plus one more during backup;
see the sketch after this list)
-when a qemu VM's IO hangs, I have to kill -9 the qemu process to make it
stop. Some broken VMs don't appear to be hung until I try to live-migrate
them (live migrating all VMs helped test solutions)
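
For context, that backup pattern is roughly the following (a sketch; the
pool/image/snapshot names are just examples):

  # keep one persistent snapshot as the diff base, plus one taken during backup
  rbd snap create rbd/vm-disk-1@backup-2017-03-27
  rbd export-diff --from-snap backup-2017-03-26 rbd/vm-disk-1@backup-2017-03-27 \
      /backup/vm-disk-1-2017-03-27.diff

  # drop the old base; the new snapshot stays as the base for the next run
  rbd snap rm rbd/vm-disk-1@backup-2017-03-26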

Finally I have a workaround... disable the exclusive-lock, object-map, and
fast-diff rbd features (and restart the clients via live migration); see
the sketch below. (object-map and fast-diff appear to have no effect on
diff or export-diff... so I don't miss them.) I'll file a bug at some point
(after I move all the VMs back and see if it is still stable). And one
other user on IRC said this solved the same problem (also using rbd
snapshots).
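
Roughly (a sketch; the pool/image name, VM id, and target node are just
examples from a proxmox setup like ours):

  # fast-diff and object-map depend on exclusive-lock, so list the
  # dependent features first when disabling
  rbd feature disable rbd/vm-disk-1 fast-diff object-map exclusive-lock

  # then restart the librbd client, e.g. by live-migrating the VM
  qm migrate 101 other-node --online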

And strangely, the VMs don't seem to hang again if I put those features
back until a few days later (which makes testing much less easy... but by
now I'm very sure removing them prevents the issue).

I hope this works for you (and maybe gets some attention from devs too),
so you don't waste months like me.

On 03/27/17 19:31, Hall, Eric wrote:
> In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel), 
> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and ceph 
> hosts, we occasionally see hung processes (usually during boot, but otherwise 
> as well), with errors reported in the instance logs as shown below.  
> Configuration is vanilla, based on openstack/ceph docs.
>
> Neither the compute hosts nor the ceph hosts appear to be overloaded in terms 
> of memory or network bandwidth, none of the 67 osds are over 80% full, nor do 
> any of them appear to be overwhelmed in terms of IO.  Compute hosts and ceph 
> cluster are connected via a relatively quiet 1Gb network, with an IBoE net 
> between the ceph nodes.  Neither network appears overloaded.
>
> I don’t see any related (to my eye) errors in client or server logs, even 
> with 20/20 logging from various components (rbd, rados, client, objectcacher, 
> etc.)  I’ve increased the qemu file descriptor limit (currently 64k... 
> overkill for sure.)
>
> It “feels” like a performance problem, but I can’t find any capacity
> constraining bottlenecks. 
>
> Any suggestions or insights into this situation are appreciated.  Thank you 
> for your time,
> --
> Eric
>
>
> [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 
> 120 seconds.
> [Fri Mar 24 20:30:40 2017]   Not tainted 3.13.0-52-generic #85-Ubuntu
> [Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [Fri Mar 24 20:30:40 2017] jbd2/vda1-8 D 88043fd13180 0   226 
>  2 0x
> [Fri Mar 24 20:30:40 2017]  88003728bbd8 0046 
> 88042690 88003728bfd8
> [Fri Mar 24 20:30:40 2017]  00013180 00013180 
> 88042690 88043fd13a18
> [Fri Mar 24 20:30:40 2017]  88043ffb9478 0002 
> 811ef7c0 88003728bc50
> [Fri Mar 24 20:30:40 2017] Call Trace:
> [Fri Mar 24 20:30:40 2017]  [] ? 
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
> [Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
> [Fri Mar 24 20:30:40 2017]  [] ? 
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [] 
> out_of_line_wait_on_bit+0x77/0x90
> [Fri Mar 24 20:30:40 2017]  [] ? 
> autoremove_wake_function+0x40/0x40
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_buffer+0x2a/0x30
> [Fri Mar 24 20:30:40 2017]  [] 
> jbd2_journal_commit_transaction+0x185d/0x1ab0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> try_to_del_timer_sync+0x4f/0x70
> [Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
> [Fri Mar 24 20:30:40 2017]  [] ? 
> prepare_to_wait_event+0x100/0x100
> [Fri Mar 24 20:30:40 2017]  [] ? commit_timeout+0x10/0x10
> [Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> kthread_create_on_node+0x1c0/0x1c0
> [Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> kthread_create_on_node+0x1c0/0x1c0
>
>
>




[ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-27 Thread Hall, Eric
In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel), using 
libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and ceph hosts, 
we occasionally see hung processes (usually during boot, but otherwise as 
well), with errors reported in the instance logs as shown below.  Configuration 
is vanilla, based on openstack/ceph docs.

Neither the compute hosts nor the ceph hosts appear to be overloaded in terms 
of memory or network bandwidth, none of the 67 osds are over 80% full, nor do 
any of them appear to be overwhelmed in terms of IO.  Compute hosts and ceph 
cluster are connected via a relatively quiet 1Gb network, with an IBoE net 
between the ceph nodes.  Neither network appears overloaded.

I don’t see any related (to my eye) errors in client or server logs, even with 
20/20 logging from various components (rbd, rados, client, objectcacher, etc.)  
I’ve increased the qemu file descriptor limit (currently 64k... overkill for 
sure.)
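
(For reference, that kind of client-side debug logging is typically
enabled with a ceph.conf section along these lines on the compute hosts;
the levels and paths here are just examples:)

  [client]
      debug rbd = 20/20
      debug rados = 20/20
      debug objectcacher = 20/20
      log file = /var/log/ceph/qemu-guest-$pid.log
      admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok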

It “feels” like a performance problem, but I can’t find any capacity issues or
constraining bottlenecks. 

Any suggestions or insights into this situation are appreciated.  Thank you for 
your time,
--
Eric


[Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 120 
seconds.
[Fri Mar 24 20:30:40 2017]       Not tainted 3.13.0-52-generic #85-Ubuntu
[Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[Fri Mar 24 20:30:40 2017] jbd2/vda1-8     D 88043fd13180     0   226      
2 0x
[Fri Mar 24 20:30:40 2017]  88003728bbd8 0046 88042690 
88003728bfd8
[Fri Mar 24 20:30:40 2017]  00013180 00013180 88042690 
88043fd13a18
[Fri Mar 24 20:30:40 2017]  88043ffb9478 0002 811ef7c0 
88003728bc50
[Fri Mar 24 20:30:40 2017] Call Trace:
[Fri Mar 24 20:30:40 2017]  [] ? generic_block_bmap+0x50/0x50
[Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
[Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
[Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
[Fri Mar 24 20:30:40 2017]  [] ? generic_block_bmap+0x50/0x50
[Fri Mar 24 20:30:40 2017]  [] 
out_of_line_wait_on_bit+0x77/0x90
[Fri Mar 24 20:30:40 2017]  [] ? 
autoremove_wake_function+0x40/0x40
[Fri Mar 24 20:30:40 2017]  [] __wait_on_buffer+0x2a/0x30
[Fri Mar 24 20:30:40 2017]  [] 
jbd2_journal_commit_transaction+0x185d/0x1ab0
[Fri Mar 24 20:30:40 2017]  [] ? 
try_to_del_timer_sync+0x4f/0x70
[Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
[Fri Mar 24 20:30:40 2017]  [] ? 
prepare_to_wait_event+0x100/0x100
[Fri Mar 24 20:30:40 2017]  [] ? commit_timeout+0x10/0x10
[Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
[Fri Mar 24 20:30:40 2017]  [] ? 
kthread_create_on_node+0x1c0/0x1c0
[Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
[Fri Mar 24 20:30:40 2017]  [] ? 
kthread_create_on_node+0x1c0/0x1c0


