Re: 100% CPU lockup

Lee Garrett Thu, 07 Jun 2012 09:06:32 -0700

On 06/07/2012 04:43 PM, Arnt Gulbrandsen wrote:

On 06/07/2012 04:37 PM, Lee Garrett wrote:

I'm trying to debug a 100% CPU lockup with aox. I'm experiencing the
bug in 3.1.3 and in master branch of the git repo (up to commit e96c93d).


There is an IMAP folder with 2000+ new messages, my mail client
accesses it ( UID fetch 1:* (FLAGS) (CHANGEDSINCE 13466) ), the SQL
query takes maybe a sec or two and after that aox "loops" in
Allocator::allocate.

359 if ( taken < capacity ) {
(gdb) bt
#0 0x0000000000547a88 in Allocator::allocate (this=0x21f5270, size=24,
pointers=1) at core/allocator.cpp:359
#1 0x0000000000547b0e in Allocator::allocate (this=0x232b310, size=24,
pointers=1) at core/allocator.cpp:393
#2 0x0000000000547b0e in Allocator::allocate (this=0x2363d60, size=24,
pointers=1) at core/allocator.cpp:393
#3 0x0000000000547b0e in Allocator::allocate (this=0x232abb0, size=24,
pointers=1) at core/allocator.cpp:393
#4 0x0000000000547b0e in Allocator::allocate (this=0x232b7b0, size=24,
pointers=1) at core/allocator.cpp:393
#5 0x0000000000547b0e in Allocator::allocate (this=0x231ab80, size=24,
pointers=1) at core/allocator.cpp:393
#6 0x0000000000547b0e in Allocator::allocate (this=0x21f1850, size=24,
pointers=1) at core/allocator.cpp:393
#7 0x0000000000547b0e in Allocator::allocate (this=0x23aaca0, size=24,
pointers=1) at core/allocator.cpp:393
[...]


Sounds like memory corruption affecting the allocator. Do the 'this'
values ever repeat?

I checked three different backtraces. All the 'this' values on
allocate() within a backtrace are unique.


If you change BlockShift in core/allocator.cpp to e.g. 20 or 24, does
the stack change? If you build with jam -sOPTIM= install, does the bug
go away or change?

I tried jam -sOPTIM= first, the problem was still there. Though it did
enter the "loop" from a different part of the code:


[...]

#270 0x000000000059e008 in Allocator::allocate (this=0x147f4d0, size=16,
pointers=2) at core/allocator.cpp:393
#271 0x000000000059e008 in Allocator::allocate (this=0x13e2010, size=16,
pointers=2) at core/allocator.cpp:393
#272 0x000000000059e008 in Allocator::allocate (this=0x1401830, size=16,
pointers=2) at core/allocator.cpp:393
#273 0x000000000059e008 in Allocator::allocate (this=0x1401c00, size=16,
pointers=2) at core/allocator.cpp:393
#274 0x000000000059d4d1 in Allocator::alloc (s=16, n=2) at
core/allocator.cpp:183
#275 0x0000000000596772 in Garbage::operator new (s=16) at
core/global.cpp:48
#276 0x00000000004ca0ed in PgDataRow (this=0x7fff6b7320d0,
b=0x7ffc347c1948, d=0x7ffc323c0a88) at db/pgmessage.cpp:965
#277 0x00000000004bc456 in Postgres::process (this=0x7ffc347c1748,
type=68 'D') at db/postgres.cpp:579
#278 0x00000000004ba8fb in Postgres::react (this=0x7ffc347c1748,
e=Connection::Read) at db/postgres.cpp:326
#279 0x000000000055bf02 in EventLoop::dispatch (this=0x7ffc34802248,
c=0x7ffc347c1748, r=true, w=false, now=1339083595) at
server/eventloop.cpp:463
#280 0x000000000055b60b in EventLoop::start (this=0x7ffc34802248) at
server/eventloop.cpp:306
#281 0x0000000000562c2d in Server::run (this=0x7fff6b732daf) at
server/server.cpp:645
#282 0x0000000000404501 in main (argc=2, argv=0x7fff6b732eb8) at
archiveopteryx/archiveopteryx.cpp:257

However, changing BlockShift to 20 did fix my problem. The IMAP folder
did finish loading and my aox instance is still responsive. I guess
throwing larger memory chunks at aox did help :)


Thanks for helping me with debugging!

Greets, Lee


17 means "allocate 2^17 bytes from the OS at a time", 128k. 20 is 1MB,
24 16MB.
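
The arithmetic behind those three values can be checked with a small
sketch. The helper name blockBytes() is made up for illustration and is
not part of aox; the only thing taken from the thread is that BlockShift
is the power of two requested from the OS at a time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of aox: the number of bytes requested
// from the OS per block for a given BlockShift value.
constexpr std::size_t blockBytes( unsigned int blockShift )
{
    return static_cast<std::size_t>( 1 ) << blockShift;
}

// The three values mentioned in this thread.
void checkThreadValues()
{
    assert( blockBytes( 17 ) == 128u * 1024u );         // 128k
    assert( blockBytes( 20 ) == 1024u * 1024u );        // 1MB
    assert( blockBytes( 24 ) == 16u * 1024u * 1024u );  // 16MB
}
```

Each step of 1 in the shift doubles the block size, so going from 17 to
20 makes every block eight times larger.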

taken and capacity are always the same size, so that block gets
skipped and AFAICS a new Allocator object gets added to the chain. I
don't know enough of aox's memory allocator to understand the gist of
the problem. What strikes me as odd is that apparently
Allocator::alloc tries to find 4.3 billion 44 byte ranges?
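
To make the "full block gets skipped, new Allocator gets added to the
chain" reading concrete, here is a toy model. It is an assumption-laden
sketch and not aox's code: ToyAllocator, its fields and the depth return
value are invented for illustration. Only the taken/capacity comparison
and the recursive allocate() call are suggested by the gdb output above.

```cpp
#include <cassert>

// Toy model only. The names echo the backtrace, but this is not the
// implementation in core/allocator.cpp.
struct ToyAllocator {
    unsigned int taken;     // slots already handed out
    unsigned int capacity;  // slots this block can hold
    ToyAllocator * next;    // next allocator in the chain, or nullptr

    explicit ToyAllocator( unsigned int cap )
        : taken( 0 ), capacity( cap ), next( nullptr ) {}
    ~ToyAllocator() { delete next; }

    // Hands out one slot and returns the chain depth where it was found.
    unsigned int allocate( unsigned int depth = 0 )
    {
        if ( taken < capacity ) {
            ++taken;
            return depth;
        }
        // This block is full: skip it, extending the chain if needed.
        if ( !next )
            next = new ToyAllocator( capacity );
        return next->allocate( depth + 1 );
    }
};

// With two slots per block, every third request goes one level deeper.
void checkChainGrowth()
{
    ToyAllocator a( 2 );
    assert( a.allocate() == 0 );
    assert( a.allocate() == 0 );
    assert( a.allocate() == 1 );
}
```

In this model, once every block in the chain is full, each request
recurses through all of them before finding room, which produces a stack
with one allocate() frame per block, each with a different 'this'.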


No, that function is alloc( number of bytes, maximum number of pointers
) and the default number of pointers is UINT_MAX, meaning "all of the
bytes are pointers". So it's looking for a 44-byte object which may
contain pointers anywhere.
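
A rough sketch of how such a default argument reads at the call site.
Request, describeAlloc() and slotsToScan() are hypothetical names, and
capping the count at the number of pointer-sized slots is my own
assumption about what "all of the bytes are pointers" implies, not
something taken from the aox source.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical record of one request: size in bytes plus an upper bound
// on how many pointers the object may contain.
struct Request {
    std::size_t bytes;
    unsigned int maxPointers;
};

// Stand-in for the alloc( bytes, max pointers ) interface described
// above. Omitting the second argument means "pointers anywhere".
Request describeAlloc( std::size_t bytes,
                       unsigned int maxPointers = UINT_MAX )
{
    return Request{ bytes, maxPointers };
}

// Assumption: an object cannot hold more pointers than it has
// pointer-sized slots, so UINT_MAX effectively means "every slot".
std::size_t slotsToScan( const Request & r )
{
    std::size_t slots = r.bytes / sizeof( void * );
    return r.maxPointers < slots ? r.maxPointers : slots;
}

void checkAllocExamples()
{
    // One argument: the pointer limit is the UINT_MAX default.
    assert( describeAlloc( 44 ).maxPointers == UINT_MAX );
    // Two arguments, as in the s=16, n=2 frame of the backtrace.
    assert( describeAlloc( 16, 2 ).maxPointers == 2u );
}
```

So the 4.3 billion is an upper bound on pointers within a single
object, not a count of objects to allocate.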

Arnt
