On Oct 4, 2010, at 1:23 PM, Heikki Linnakangas wrote:

> On 04.10.2010 14:02, Hans-Jürgen Schönig wrote:
>> it seems we have found a fairly nasty problem.
>> imagine a long transaction which piles up XX.XXX locks (as counted in
>> pg_locks) inside the same transaction by taking lots of savepoints,
>> using WITH HOLD cursors, and so on.
>> in this case a normal count issued from a second database connection
>> takes ages. in a practical case we did a plain seq scan in connection
>> 2: instead of 262 ms (cached case) it started to head north linearly
>> with the number of locks taken by connection 1. in an extreme case it
>> took around 1.5 hours or so (at XXX.XXX pg_locks entries).
>> 
>> i tracked the issue down quickly and took the following profile (at
>> 10k locks or so):
>> 
>> Flat profile:
>> 
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls   s/call   s/call  name
>>  32.49      6.01     6.01 98809118     0.00     0.00  SimpleLruReadPage_ReadOnly
>>  26.97     11.00     4.99 98837761     0.00     0.00  LWLockAcquire
>>  21.89     15.05     4.05 98837761     0.00     0.00  LWLockRelease
>>   8.70     16.66     1.61 98789370     0.00     0.00  SubTransGetParent
>>   4.38     17.47     0.81    19748     0.00     0.00  SubTransGetTopmostTransaction
>>   2.41     17.92     0.45 98851951     0.00     0.00  TransactionIdPrecedes
>>   0.59     18.03     0.11                             LWLockAssign
>>   0.54     18.13     0.10                             LWLockConditionalAcquire
>>   0.46     18.21     0.09    19748     0.00     0.00  TransactionLogFetch
>>   0.38     18.28     0.07                             SimpleLruReadPage
>>   0.27     18.33     0.05                             SubTransSetParent
>>   0.05     18.34     0.01   136778     0.00     0.00  AllocSetAlloc
>>   0.05     18.35     0.01    42996     0.00     0.00  slot_deform_tuple
>>   0.05     18.36     0.01    42660     0.00     0.00  TransactionIdIsCurrentTransactionId
>> 
>> it seems we are running into some nice shared buffer / locking
>> contention here, and the number of calls explodes (this profile comes
>> from a seq scan on a 500.000-row table - 400 mb or so).
> 
> That doesn't seem related to the lock manager. Is the long-running 
> transaction inserting a lot of tuples (by INSERT or UPDATE), with a lot 
> of different subtransaction xids, into the same table that the seqscan 
> scans? That profile looks like the seqscan is spending most of its time 
> swapping pg_subtrans pages in and out of the SLRU buffers.
> 
> Increasing NUM_SUBTRANS_BUFFERS should help. A more sophisticated 
> solution would be to allocate SLRU buffers (for clog and the other SLRU 
> caches as well) dynamically from shared_buffers. That's been discussed 
> before, but no-one has gotten around to it.
> 
> -- 
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com



hello ...

yeah, it seems this solves the problem.
i had a closer look at the SQL trace and did some more profiling, and
this was exactly the case: the long transaction had created a huge
number of subtransaction xids, and the seq scan in connection 2 was
paging pg_subtrans in and out of the SLRU buffers, just as you
describe. a sketch of the hot path is below.
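for the archives: the call counts in the profile line up nicely. there
are ~98.8 million SubTransGetParent calls for only 19.748
SubTransGetTopmostTransaction calls, i.e. a parent chain roughly 5.000
levels deep on average (consistent with ~10k stacked savepoints), with
one pg_subtrans page read per level. the toy model below paraphrases
that walk; it is NOT the literal server code from subtrans.c - locking,
the TransactionXmin cutoff and error handling are all elided:

/*
 * toy model of the hot path in the profile.  in the real server every
 * SubTransGetParent() call reads a pg_subtrans page via
 * SimpleLruReadPage_ReadOnly(), which is why those two counters move
 * in lockstep in the profile.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* stand-in for the SLRU-backed lookup; one page read per call in the
 * real thing */
static TransactionId
SubTransGetParent(const TransactionId *parents, TransactionId xid)
{
    return parents[xid];
}

/* walk the parent chain to the topmost transaction: the cost is one
 * lookup per subtransaction level, for every tuple examined */
static TransactionId
SubTransGetTopmostTransaction(const TransactionId *parents, TransactionId xid)
{
    TransactionId parentXid = xid;
    TransactionId previousXid = xid;

    while (parentXid != InvalidTransactionId)
    {
        previousXid = parentXid;
        parentXid = SubTransGetParent(parents, parentXid);
    }
    return previousXid;
}

int main(void)
{
    /* chain 5 -> 4 -> 3 -> 2 -> 1; xid 1 is the toplevel xact */
    TransactionId parents[] = {0, 0, 1, 2, 3, 4};

    printf("topmost of 5 is %u\n",
           (unsigned) SubTransGetTopmostTransaction(parents, 5));
    return 0;
}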

        many thanks for the quick hint.

for reference, what we tested was simply bumping the compile-time
constant in subtrans.h and rebuilding (a sketch of the change below;
the stock value is quoted from memory - check your tree - and the new
value is just what worked for our workload):
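/* src/include/access/subtrans.h -- sketch of our local change; the
 * stock value is from memory, check your tree.  each buffer caches one
 * 8 kB pg_subtrans page, so even a generous bump is cheap memory-wise. */
#define NUM_SUBTRANS_BUFFERS    256     /* stock value: 32 */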

        hans
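
PS: for anyone hitting this thread in the archives, here is a minimal
reproduction sketch using libpq (conninfo, table name and loop counts
are made up - tune to taste). session 1 piles up subtransaction xids
inside one open transaction; while it stays open, time
"SELECT count(*) FROM t" from a second session and watch it degrade.
iirc the per-backend subxid cache overflows beyond
PGPROC_MAX_CACHED_SUBXIDS (64) subtransactions, and from then on other
backends have to resolve those xids through pg_subtrans:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void die(PGconn *conn, const char *what)
{
    fprintf(stderr, "%s failed: %s", what, PQerrorMessage(conn));
    PQfinish(conn);
    exit(1);
}

static void run(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);

    if (PQresultStatus(res) != PGRES_COMMAND_OK &&
        PQresultStatus(res) != PGRES_TUPLES_OK)
        die(conn, sql);
    PQclear(res);
}

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");  /* made-up conninfo */
    char    sql[128];
    int     i;

    if (PQstatus(conn) != CONNECTION_OK)
        die(conn, "connect");

    run(conn, "CREATE TABLE t AS "
              "SELECT g AS id FROM generate_series(1, 500000) g");

    /* one long transaction, tens of thousands of subtransactions; the
     * UPDATE under each savepoint stamps a tuple with a distinct
     * subtransaction xid -- that is what the seq scan in the other
     * session later has to resolve through pg_subtrans */
    run(conn, "BEGIN");
    for (i = 1; i <= 50000; i++)
    {
        snprintf(sql, sizeof(sql), "SAVEPOINT s%d", i);
        run(conn, sql);
        snprintf(sql, sizeof(sql),
                 "UPDATE t SET id = id WHERE id = %d", i);
        run(conn, sql);
    }

    /* transaction left open on purpose: now time, from a second
     * session:  SELECT count(*) FROM t;  */
    printf("transaction open - press enter to commit and clean up\n");
    getchar();

    run(conn, "COMMIT");
    run(conn, "DROP TABLE t");
    PQfinish(conn);
    return 0;
}

(a savepoint alone would not trigger it: without the DML no tuple
carries a subtransaction xid, so the scan has nothing to look up.)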






--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

