Re: [HACKERS] ExclusiveLock on extension of relation with huge shared_buffers

2014-12-28 Thread Borodin Vladimir

On 25 Oct 2014, at 4:31, Jim Nasby wrote:

> Please don't top-post.
> 
> On 10/24/14, 3:40 AM, Borodin Vladimir wrote:
>> I have taken some backtraces (attached to this message) of two 
>> processes with the following command:
>> pid=17981; while true; do date; gdb -batch -e back 
>> /usr/pgsql-9.4/bin/postgres $pid; echo; echo; echo; echo; sleep 0.1; done
>> 
>> Process 17981 was holding the lock for a long time - 
>> http://pastie.org/9671931.
>> And process 13886 was waiting for the lock (at a different time and blocked by a 
>> different process, actually, but I don’t think that is really important) - 
>> http://pastie.org/9671939.
>> 
>> As far as I can see, 17981 is actually waiting on the BufFreelistLock LWLock in 
>> StrategyGetBuffer() (freelist.c:134) while holding the exclusive lock on the 
>> relation. I will try to increase NUM_BUFFER_PARTITIONS (on a read-only load it 
>> also gave us some performance boost) and will post the result in this thread.
> 
> BufFreelistLock becomes very contended when shared buffers are under a lot of 
> pressure.
> 
> Here's what I believe is happening:
> 
> If RelationGetBufferForTuple() decides it needs to extend, this happens:
>   LockRelationForExtension(relation, ExclusiveLock);
>   buffer = ReadBufferBI(relation, P_NEW, bistate);
> 
> Assuming bistate is NULL (I didn't check the bulk case), ReadBufferBI() ends 
> up at ReadBuffer_common(), which calls BufferAlloc(). In the normal case, 
> BufferAlloc() won't find the necessary buffer, so it will call 
> StrategyGetBuffer(), which will end up getting the freelist lock. Currently 
> the free list is normally empty, which means we now need to run the clock 
> sweep to find a victim buffer. The clock sweep will keep running until it 
> finds a buffer that is not pinned and has usage_count = 0. If shared buffers 
> are under heavy pressure, you can have a huge number of them with usage_count 
> = 5, which means that with 100GB of shared buffers and an 8K BLCKSZ you could 
> have to check buffers *52 million* times (assuming you finally find a buffer 
> at the start of the 5th loop) before you find a victim.
> 
> Keep in mind that's all happening while you're holding both the extension 
> lock *and the freelist lock*, which basically means no one else in the entire 
> system can allocate a new buffer.
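
To make the clock sweep described above concrete, here is a minimal C sketch of 
the victim search, loosely modeled on StrategyGetBuffer() in 
src/backend/storage/buffer/freelist.c as of 9.4. The function name is made up, 
free-list and buffer-ring handling are omitted, and it assumes the backend's 
buffer-manager declarations (BufferDescriptors, StrategyControl, NBuffers, 
LockBufHdr) are in scope; it is an illustration, not the actual PostgreSQL code:

    /*
     * Sketch only: run the clock sweep until a buffer that is not pinned and
     * has usage_count = 0 is found, decrementing usage_count along the way.
     */
    static volatile BufferDesc *
    clock_sweep_get_victim(void)
    {
        for (;;)
        {
            volatile BufferDesc *buf;

            buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];

            /* advance the clock hand, wrapping around at NBuffers */
            if (++StrategyControl->nextVictimBuffer >= NBuffers)
                StrategyControl->nextVictimBuffer = 0;

            LockBufHdr(buf);                /* per-buffer header spinlock */
            if (buf->refcount == 0)         /* not pinned by anyone */
            {
                if (buf->usage_count > 0)
                    buf->usage_count--;     /* recently used: skip for now */
                else
                    return buf;             /* victim found (header still locked) */
            }
            UnlockBufHdr(buf);
        }
    }

The arithmetic behind the *52 million* figure: 100GB of shared buffers at an 8K 
BLCKSZ is 13,107,200 buffers, so four full passes of the clock hand before a 
victim turns up at the start of the fifth pass is about 4 x 13,107,200 ≈ 52.4 
million buffer-header checks, all while the extension lock and the freelist 
lock are held.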

I’ll try the same workload with the recent patch from Andres Freund [0].

> 
> This is one reason why a large shared_buffers setting is usually 
> counter-productive. Experience with older versions is that setting it higher 
> than about 8GB is more likely to hurt than to help. Newer versions are 
> probably better, but I think you'll be hard-pressed to find a workload where 
> 100GB makes sense. It might if your entire database fits in shared_buffers 
> (though, even then there's probably a number of O(n) or worse operations that 
> will hurt you), but if your database is > shared_buffers you're probably in 
> trouble.
> 
> I suggest cutting shared_buffers *way* down. Old-school advice for this 
> machine would be 8G (since 25% of 128G would be too big). You might be able 
> to do better than 8G, but I recommend not even trying unless you've got a 
> good way to test your performance.
> 
> If you can test performance and find an optimal setting for shared_buffers, 
> please do share your test data and findings. :)

Of course, it works well with shared_buffers <= 8GB. But we have seen that on a 
read-only load whose data set fits in RAM, with <= 8GB of shared_buffers we hit 
the BufFreelistLock LWLock while moving pages between shared buffers and the 
page cache. Increasing shared_buffers to the size of the data set makes this 
read-only load up to 2.5x faster. So we started testing a configuration with 
huge shared_buffers under a write load, and that is why I started this thread.

Since StrategyGetBuffer() does not use the BufFreelistLock LWLock any more [1], 
I’ll also re-run the tests with a read-only load and small shared_buffers.
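
For illustration, a minimal sketch of the general idea behind [1]: instead of 
serializing every victim search on BufFreelistLock, the clock hand is advanced 
with an atomic fetch-and-add, so each backend claims its own position to 
inspect. This assumes a pg_atomic_uint32 clock-hand counter in shared memory 
and a made-up function name; it is a simplification, not the committed code:

    /*
     * Sketch only: claim the next clock position without any LWLock.
     * Assumes StrategyControl->nextVictimBuffer is a pg_atomic_uint32.
     */
    static uint32
    clock_sweep_tick(void)
    {
        uint32 victim;

        /* atomic fetch-and-add replaces the lock-protected increment */
        victim = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);

        /* map the monotonically increasing counter onto the buffer array */
        return victim % NBuffers;
    }

Each caller then inspects the returned buffer exactly as in the sweep above; 
the contention point moves from an exclusive LWLock to a single atomic 
increment.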

[0] http://git.postgresql.org/pg/commitdiff/d72731a70450b5e7084991b9caa15cb58a2820df
[1] http://git.postgresql.org/pg/commitdiff/1dcfb8da09c47d2a7502d1dfab06c8be4b6cf323

> -- 
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Vladimir





