> yeah. ok, next steps:
> *) can you confirm that postgres process is using high cpu (according
> to top) during stall time

Yes, CPU is spread across a lot of postmasters:

  PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
29863 pgsql  20   0 3636m 102m  36m R 19.1  0.3 0:01.33 postmaster
30277 pgsql  20   0 3645m 111m  37m R 16.8  0.3 0:01.27 postmaster
11966 pgsql  20   0 3568m  22m  15m R 15.1  0.1 0:00.66 postmaster
 8073 pgsql  20   0 3602m  60m  26m S 13.6  0.2 0:00.77 postmaster
29780 pgsql  20   0 3646m 115m  43m R 13.6  0.4 0:01.13 postmaster
11865 pgsql  20   0 3606m  61m  23m S 12.8  0.2 0:01.87 postmaster
29379 pgsql  20   0 3603m  70m  30m R 12.8  0.2 0:00.80 postmaster
29727 pgsql  20   0 3616m  77m  31m R 12.5  0.2 0:00.81 postmaster

> *) if so, please strace that process and save some of the log

https://dl.dropbox.com/u/109778/stall_postmaster.log

> *) you're using a 'bleeding edge' kernel. so we must be suspicious of
> a regression there, particularly in the scheduler.

This has been observed for a while, during which the system went from
3.4.* kernels to 3.5.*, but I do not rule out that possibility.

> *) I am suspicious of spinlock issue. so, if we can't isolate the
> problem, is running a hand compiled postgres a possibility (for lock
> stats)?

Yes, definitely possible. We run manually compiled PostgreSQL anyway.
Please provide instructions.

> *) what is the output of this:
> echo /proc/sys/vm/zone_reclaim_mode

I presume you wanted cat instead of echo, and it shows 0.

-- vlad
