On 06/07/2012 04:43 PM, Arnt Gulbrandsen wrote:
On 06/07/2012 04:37 PM, Lee Garrett wrote:
I'm trying to debug a 100% CPU lockup with aox. I'm experiencing the
bug in 3.1.3 and in the master branch of the git repo (up to commit e96c93d).

There is an IMAP folder with 2000+ new messages. My mail client
accesses it ( UID fetch 1:* (FLAGS) (CHANGEDSINCE 13466) ), the SQL
query takes a second or two, and after that aox "loops" in
Allocator::allocate.

359         if ( taken < capacity ) {
(gdb) bt
#0 0x0000000000547a88 in Allocator::allocate (this=0x21f5270, size=24,
pointers=1) at core/allocator.cpp:359
#1 0x0000000000547b0e in Allocator::allocate (this=0x232b310, size=24,
pointers=1) at core/allocator.cpp:393
#2 0x0000000000547b0e in Allocator::allocate (this=0x2363d60, size=24,
pointers=1) at core/allocator.cpp:393
#3 0x0000000000547b0e in Allocator::allocate (this=0x232abb0, size=24,
pointers=1) at core/allocator.cpp:393
#4 0x0000000000547b0e in Allocator::allocate (this=0x232b7b0, size=24,
pointers=1) at core/allocator.cpp:393
#5 0x0000000000547b0e in Allocator::allocate (this=0x231ab80, size=24,
pointers=1) at core/allocator.cpp:393
#6 0x0000000000547b0e in Allocator::allocate (this=0x21f1850, size=24,
pointers=1) at core/allocator.cpp:393
#7 0x0000000000547b0e in Allocator::allocate (this=0x23aaca0, size=24,
pointers=1) at core/allocator.cpp:393
[...]

Sounds like memory corruption affecting the allocator. Do the 'this'
values ever repeat?
I checked three different backtraces. All the 'this' values on allocate() within a backtrace are unique.


If you change BlockShift in core/allocator.cpp to e.g. 20 or 24, does
the stack change? If you build with jam -sOPTIM= install, does the bug
go away or change?
I tried jam -sOPTIM= first; the problem was still there, though it entered the "loop" from a different part of the code:

[...]
#270 0x000000000059e008 in Allocator::allocate (this=0x147f4d0, size=16, pointers=2) at core/allocator.cpp:393
#271 0x000000000059e008 in Allocator::allocate (this=0x13e2010, size=16, pointers=2) at core/allocator.cpp:393
#272 0x000000000059e008 in Allocator::allocate (this=0x1401830, size=16, pointers=2) at core/allocator.cpp:393
#273 0x000000000059e008 in Allocator::allocate (this=0x1401c00, size=16, pointers=2) at core/allocator.cpp:393
#274 0x000000000059d4d1 in Allocator::alloc (s=16, n=2) at core/allocator.cpp:183
#275 0x0000000000596772 in Garbage::operator new (s=16) at core/global.cpp:48
#276 0x00000000004ca0ed in PgDataRow (this=0x7fff6b7320d0, b=0x7ffc347c1948, d=0x7ffc323c0a88) at db/pgmessage.cpp:965
#277 0x00000000004bc456 in Postgres::process (this=0x7ffc347c1748, type=68 'D') at db/postgres.cpp:579
#278 0x00000000004ba8fb in Postgres::react (this=0x7ffc347c1748, e=Connection::Read) at db/postgres.cpp:326
#279 0x000000000055bf02 in EventLoop::dispatch (this=0x7ffc34802248, c=0x7ffc347c1748, r=true, w=false, now=1339083595) at server/eventloop.cpp:463
#280 0x000000000055b60b in EventLoop::start (this=0x7ffc34802248) at server/eventloop.cpp:306
#281 0x0000000000562c2d in Server::run (this=0x7fff6b732daf) at server/server.cpp:645
#282 0x0000000000404501 in main (argc=2, argv=0x7fff6b732eb8) at archiveopteryx/archiveopteryx.cpp:257

However, changing BlockShift to 20 did fix my problem. The IMAP folder finished loading and my aox instance is still responsive. I guess throwing larger memory chunks at aox helped :)

Thanks for helping me with debugging!

Greets, Lee

17 means "allocate 2^17 bytes from the OS at a time", 128k. 20 is 1MB,
24 is 16MB.
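The arithmetic behind those numbers can be sketched as follows; BlockShift itself is a real constant in core/allocator.cpp, but this standalone helper is only illustrative:

```cpp
#include <cstddef>

// Illustrative helper: an allocator grabs 2^BlockShift bytes from the
// OS at a time, so raising the shift means fewer, larger blocks.
constexpr std::size_t blockSize(unsigned blockShift) {
    return std::size_t(1) << blockShift;
}

static_assert(blockSize(17) == 128 * 1024, "17 -> 128k");
static_assert(blockSize(20) == 1024 * 1024, "20 -> 1MB");
static_assert(blockSize(24) == 16 * 1024 * 1024, "24 -> 16MB");
```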

taken and capacity are always equal, so that block gets skipped and
AFAICS a new Allocator object gets added to the chain. I don't know
enough about aox's memory allocator to understand the gist of the
problem. What strikes me as odd is that Allocator::alloc apparently
tries to find 4.3 billion 44-byte ranges?
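The chain walk described above can be modelled with a toy allocator (hypothetical names, not the real aox code): when the current block is full, allocate() recurses into the next block, which is exactly the shape of the nested allocate() frames in the backtraces.

```cpp
#include <cstddef>

// Toy model of the control flow seen in the backtrace: if this block
// is full (taken == capacity), recurse into the next Allocator in the
// chain, adding one stack frame per full block along the way.
struct MiniAllocator {
    std::size_t taken = 0;
    std::size_t capacity;
    MiniAllocator *next = nullptr;

    explicit MiniAllocator(std::size_t cap) : capacity(cap) {}

    // Returns a slot index. The real code hands out memory; this only
    // models how a long chain of full blocks yields deep recursion.
    std::size_t allocate() {
        if (taken < capacity)
            return taken++;
        if (!next)
            next = new MiniAllocator(capacity);
        return next->allocate();  // one extra frame per full block
    }

    // Chain depth, i.e. how deep the recursion would go.
    int chainLength() const {
        return 1 + (next ? next->chainLength() : 0);
    }
};
```

With hundreds of full blocks in the chain, every allocation pays the full recursion, which matches the 280+ frame backtraces.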

No, that function is alloc( number of bytes, maximum number of pointers
) and the default number of pointers is UINT_MAX, meaning "all of the
bytes are pointers". So it's looking for a 44-byte object which may
contain pointers anywhere.
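The calling convention Arnt describes can be sketched like this; the body is a placeholder, not the real aox implementation, and only the signature shape comes from the thread:

```cpp
#include <climits>
#include <new>

// Sketch of the interface shape: the second argument is the maximum
// number of pointers the object may contain, defaulting to UINT_MAX,
// i.e. "every word may be a pointer". A GC-aware allocator would
// record this so the collector knows how much of the object to scan.
void *alloc(unsigned bytes, unsigned pointers = UINT_MAX) {
    (void)pointers;              // placeholder: real code records this
    return ::operator new(bytes);
}
```

So a call like alloc(44) asks for a 44-byte object that may contain pointers anywhere, not 4.3 billion 44-byte ranges.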

Arnt

