100% CPU lockup

Lee Garrett Thu, 07 Jun 2012 07:37:08 -0700

Hi everyone,

I'm trying to debug a 100% CPU lockup with aox. I'm experiencing the bug in 
3.1.3 and in the master branch of the git repo (up to commit e96c93d).


There is an IMAP folder with 2000+ new messages. My mail client accesses it
(UID fetch 1:* (FLAGS) (CHANGEDSINCE 13466)), the SQL query takes maybe a
second or two, and after that aox "loops" in Allocator::allocate.

359         if ( taken < capacity ) {
(gdb) bt
#0  0x0000000000547a88 in Allocator::allocate (this=0x21f5270, size=24, 
pointers=1) at core/allocator.cpp:359
#1  0x0000000000547b0e in Allocator::allocate (this=0x232b310, size=24, 
pointers=1) at core/allocator.cpp:393
#2  0x0000000000547b0e in Allocator::allocate (this=0x2363d60, size=24, 
pointers=1) at core/allocator.cpp:393
#3  0x0000000000547b0e in Allocator::allocate (this=0x232abb0, size=24, 
pointers=1) at core/allocator.cpp:393
#4  0x0000000000547b0e in Allocator::allocate (this=0x232b7b0, size=24, 
pointers=1) at core/allocator.cpp:393
#5  0x0000000000547b0e in Allocator::allocate (this=0x231ab80, size=24, 
pointers=1) at core/allocator.cpp:393
#6  0x0000000000547b0e in Allocator::allocate (this=0x21f1850, size=24, 
pointers=1) at core/allocator.cpp:393
#7  0x0000000000547b0e in Allocator::allocate (this=0x23aaca0, size=24, 
pointers=1) at core/allocator.cpp:393
[...]
#825 0x0000000000547b0e in Allocator::allocate (this=0x1947930, size=44, 
pointers=5) at core/allocator.cpp:393
#826 0x0000000000547da7 in Allocator::alloc (s=44, n=4294967295) at 
core/allocator.cpp:183
#827 0x000000000042486e in PatriciaTree<FetchData::DynamicData>::Node::operator 
new (this=<value optimized out>, x=<value optimized out>)
    at core/patriciatree.h:373
#828 PatriciaTree<FetchData::DynamicData>::node (this=<value optimized out>, 
x=<value optimized out>) at core/patriciatree.h:373
#829 0x0000000000421cc2 in PatriciaTree<FetchData::DynamicData>::insert 
(this=0x7fb8afda1f08) at core/patriciatree.h:169
#830 Map<FetchData::DynamicData>::insert (this=0x7fb8afda1f08) at core/map.h:26
#831 Fetch::execute (this=0x7fb8afda1f08) at imap/handlers/fetch.cpp:741
#832 0x000000000049b18e in Query::notify (this=0x7fb89dd25468) at 
db/query.cpp:563
#833 0x000000000048a1e6 in Postgres::process (this=0x7fb8b8681c48, type=<value 
optimized out>) at db/postgres.cpp:624
#834 0x0000000000491100 in Postgres::react (this=0x7fb8b8681c48, e=<value 
optimized out>) at db/postgres.cpp:326
#835 0x0000000000507ff5 in EventLoop::dispatch (this=0x7fb8b86c2248, 
c=0x7fb8b8681c48, r=<value optimized out>, w=false, now=1339076952)
    at server/eventloop.cpp:463
#836 0x00000000005091b8 in EventLoop::start (this=0x7fb8b86c2248) at 
server/eventloop.cpp:306
#837 0x000000000050e2a3 in Server::run (this=<value optimized out>) at 
server/server.cpp:645
#838 0x000000000040377e in main (argc=<value optimized out>, argv=<value 
optimized out>) at archiveopteryx/archiveopteryx.cpp:257
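
Frames #0 to #825 all sit in Allocator::allocate with a different "this" each
time, which looks to me like one full allocator handing the request on to the
next. Below is a toy model of that pattern as I understand it. It is not aox's
actual code: ToyAllocator, its members and visitsForLastAllocation are made up
for illustration (only the names taken and capacity are borrowed from the gdb
line above). It just shows how the number of allocators visited per request
grows with the length of the chain.

```cpp
#include <cstddef>

// Toy model only: a fixed-capacity allocator that forwards the request
// to the next allocator in a chain when it is full. NOT aox's Allocator.
struct ToyAllocator {
    std::size_t taken;
    std::size_t capacity;
    ToyAllocator * next;

    explicit ToyAllocator( std::size_t cap )
        : taken( 0 ), capacity( cap ), next( 0 ) {}

    // Returns how many allocators were visited to satisfy one request.
    std::size_t allocate()
    {
        if ( taken < capacity ) {
            taken++;
            return 1;
        }
        // Full: extend the chain if needed, then recurse into it.
        if ( !next )
            next = new ToyAllocator( capacity );
        return 1 + next->allocate();
    }
};

// Performs `count` allocations on a fresh chain and returns the number
// of allocators visited by the last one.
std::size_t visitsForLastAllocation( std::size_t count, std::size_t cap )
{
    ToyAllocator head( cap );
    std::size_t visits = 0;
    for ( std::size_t i = 0; i < count; i++ )
        visits = head.allocate();

    // Free the chained allocators created above.
    ToyAllocator * a = head.next;
    while ( a ) {
        ToyAllocator * n = a->next;
        delete a;
        a = n;
    }
    return visits;
}
```

In this model every request walks past all the full allocators first, so the
total work for N allocations grows roughly quadratically in the chain length.
Whether that is what actually happens inside aox I can't tell.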

taken and capacity are always equal, so that block gets skipped and, AFAICS, a
new Allocator object gets added to the chain. I don't know enough about aox's
memory allocator to understand the gist of the problem. What strikes me as odd
is that Allocator::alloc is apparently asked to find 4.3 billion 44-byte
ranges (n=4294967295 in frame #826)?
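
One thing I did notice: 4294967295 is exactly 2^32 - 1, which is what -1 turns
into when it is converted to a 32-bit unsigned value. So n might be a marker
rather than a real count, but that is only a guess on my part; I have not
checked how aox uses it. The sketch below only demonstrates the arithmetic, and
the helper names toUnsigned32 and looksLikeSentinel are made up.

```cpp
#include <cstdint>
#include <limits>

// Convert a signed value to a 32-bit unsigned one, the way a conversion
// to a 32-bit unsigned parameter would (modulo 2^32).
constexpr std::uint32_t toUnsigned32( int v )
{
    return static_cast<std::uint32_t>( v );
}

// -1 wraps to 2^32 - 1, the value gdb prints for n in frame #826.
static_assert( toUnsigned32( -1 ) == 4294967295u,
               "-1 wraps to 2^32 - 1" );
static_assert( std::numeric_limits<std::uint32_t>::max() == 4294967295u,
               "2^32 - 1 is the largest 32-bit unsigned value" );

// Hypothetical helper: true if n has the value a converted -1 would have.
// It cannot tell whether the program really meant it as a marker.
constexpr bool looksLikeSentinel( std::uint32_t n )
{
    return n == std::numeric_limits<std::uint32_t>::max();
}
```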

The only patch on top of the master branch is this one, in db/postgres.cpp line 141:
-    if ( p && getuid() != p->pw_uid ) {
+    if ( false && p && getuid() != p->pw_uid ) {

Our earlier admin reported that it fixed certain deadlocks on MP machines. I
don't think it's related to the bug.

The bug is reproducible on our machine. I hope you can give me some pointers
on how to fix this problem.

Greets,
Lee
