Hi everyone,
I'm trying to debug a 100% CPU lockup with aox. I'm experiencing the bug in
3.1.3 and in master branch of the git repo (up to commit e96c93d).
There is an IMAP folder with 2000+ new messages. When my mail client accesses it (
UID fetch 1:* (FLAGS) (CHANGEDSINCE 13466) ), the SQL query takes maybe a second
or two, and after that aox "loops" in Allocator::allocate.
359 if ( taken < capacity ) {
(gdb) bt
#0 0x0000000000547a88 in Allocator::allocate (this=0x21f5270, size=24,
pointers=1) at core/allocator.cpp:359
#1 0x0000000000547b0e in Allocator::allocate (this=0x232b310, size=24,
pointers=1) at core/allocator.cpp:393
#2 0x0000000000547b0e in Allocator::allocate (this=0x2363d60, size=24,
pointers=1) at core/allocator.cpp:393
#3 0x0000000000547b0e in Allocator::allocate (this=0x232abb0, size=24,
pointers=1) at core/allocator.cpp:393
#4 0x0000000000547b0e in Allocator::allocate (this=0x232b7b0, size=24,
pointers=1) at core/allocator.cpp:393
#5 0x0000000000547b0e in Allocator::allocate (this=0x231ab80, size=24,
pointers=1) at core/allocator.cpp:393
#6 0x0000000000547b0e in Allocator::allocate (this=0x21f1850, size=24,
pointers=1) at core/allocator.cpp:393
#7 0x0000000000547b0e in Allocator::allocate (this=0x23aaca0, size=24,
pointers=1) at core/allocator.cpp:393
[...]
#825 0x0000000000547b0e in Allocator::allocate (this=0x1947930, size=44,
pointers=5) at core/allocator.cpp:393
#826 0x0000000000547da7 in Allocator::alloc (s=44, n=4294967295) at
core/allocator.cpp:183
#827 0x000000000042486e in PatriciaTree<FetchData::DynamicData>::Node::operator
new (this=<value optimized out>, x=<value optimized out>)
at core/patriciatree.h:373
#828 PatriciaTree<FetchData::DynamicData>::node (this=<value optimized out>,
x=<value optimized out>) at core/patriciatree.h:373
#829 0x0000000000421cc2 in PatriciaTree<FetchData::DynamicData>::insert
(this=0x7fb8afda1f08) at core/patriciatree.h:169
#830 Map<FetchData::DynamicData>::insert (this=0x7fb8afda1f08) at core/map.h:26
#831 Fetch::execute (this=0x7fb8afda1f08) at imap/handlers/fetch.cpp:741
#832 0x000000000049b18e in Query::notify (this=0x7fb89dd25468) at
db/query.cpp:563
#833 0x000000000048a1e6 in Postgres::process (this=0x7fb8b8681c48, type=<value
optimized out>) at db/postgres.cpp:624
#834 0x0000000000491100 in Postgres::react (this=0x7fb8b8681c48, e=<value
optimized out>) at db/postgres.cpp:326
#835 0x0000000000507ff5 in EventLoop::dispatch (this=0x7fb8b86c2248,
c=0x7fb8b8681c48, r=<value optimized out>, w=false, now=1339076952)
at server/eventloop.cpp:463
#836 0x00000000005091b8 in EventLoop::start (this=0x7fb8b86c2248) at
server/eventloop.cpp:306
#837 0x000000000050e2a3 in Server::run (this=<value optimized out>) at
server/server.cpp:645
#838 0x000000000040377e in main (argc=<value optimized out>, argv=<value
optimized out>) at archiveopteryx/archiveopteryx.cpp:257
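Frames #0 to #825 above are all Allocator::allocate, each with a different "this" pointer. That is the pattern one would expect if every full allocator forwards the request to the next one in a chain. The following is a minimal sketch of that pattern only. ToyAllocator and depth_after_filling are hypothetical names and this is not aox's actual Allocator code:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical toy model, NOT aox's Allocator: a block with a fixed
// number of slots that forwards the request to the next block when full.
struct ToyAllocator {
    std::size_t taken;
    std::size_t capacity;
    std::unique_ptr<ToyAllocator> next;

    explicit ToyAllocator(std::size_t cap) : taken(0), capacity(cap) {}

    // Returns how many nested allocate() calls one request needed.
    std::size_t allocate() {
        if (taken < capacity) {      // the check that is skipped when full
            ++taken;
            return 1;
        }
        if (!next)                   // chain exhausted: append a new block
            next.reset(new ToyAllocator(capacity));
        return 1 + next->allocate(); // one more stack frame per full block
    }
};

// Fill `blocks` allocators completely, then report the call depth of the
// next request.
std::size_t depth_after_filling(std::size_t blocks, std::size_t capacity) {
    assert(capacity > 0);            // capacity 0 would recurse forever
    ToyAllocator head(capacity);
    for (std::size_t i = 0; i < blocks * capacity; ++i)
        head.allocate();
    return head.allocate();
}
```

In this toy model, every request made once the chain is full walks the whole chain, so the cost per allocation grows with the number of full blocks. Whether that is what makes the real process spin at 100% CPU is an open question.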
taken and capacity are always equal, so that block gets skipped and,
AFAICS, a new Allocator object gets added to the chain. I don't know enough about
aox's memory allocator to understand the gist of the problem. What strikes me
as odd is that Allocator::alloc apparently tries to find 4.3 billion 44-byte
ranges?
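On the 4.3 billion question: n=4294967295 in frame #826 is exactly 2^32 - 1, the value that -1 takes when converted to a 32-bit unsigned integer. That makes it more likely to be a sentinel or default argument than a real count. This reading is an assumption and has not been checked against the aox source. A minimal check of the arithmetic (is_all_ones_32 and check_n_from_trace are hypothetical helper names):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if n is the all-ones 32-bit pattern, i.e. what -1 becomes when
// converted to an unsigned 32-bit integer.
constexpr bool is_all_ones_32(std::uint32_t n) {
    return n == static_cast<std::uint32_t>(-1);
}

// The value printed for n in frame #826 is the largest 32-bit unsigned value.
static_assert(is_all_ones_32(4294967295u), "n equals (uint32_t)-1");
static_assert(std::numeric_limits<std::uint32_t>::max() == 4294967295u,
              "n equals UINT32_MAX");

// Runtime version of the same check, e.g. for use in a small test program.
inline void check_n_from_trace() {
    const std::uint32_t n = 4294967295u; // value of n in frame #826
    assert(is_all_ones_32(n));
}
```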
The only patch on top of the master branch is this one, in db/postgres.cpp line 141:
- if ( p && getuid() != p->pw_uid ) {
+ if ( false && p && getuid() != p->pw_uid ) {
Our previous admin reported that it fixed certain deadlocks on MP machines. I don't
think it's related to the bug.
The bug is reproducible on our machine. I hope you can give me some pointers on
how to fix this problem.
Greets,
Lee