On Oct 4, 2010, at 1:23 PM, Heikki Linnakangas wrote:
> On 04.10.2010 14:02, Hans-Jürgen Schönig wrote:
>> it seems we have found a fairly nasty problem.
>> imagine a long transaction which piles up XX.XXX of locks (count on
>> pg_locks) inside the same transaction by doing some tasty savepoints, with
>> hold cursors and so on.
>> in this case we see that a normal count issued in a second database
>> connection will take ages. in a practical case we did a plain seq_scan in
>> connection 2. instead of 262 ms (cached case) it started to head north
>> linearily with the number of locks taken by connection 1. in an extreme case
>> it took around 1.5 hours or so (on XXX.XXX pg_locks entries).
>>
>> i tracked down the issue quickly and make the following profile (in 10k
>> locks or so):
>>
>> Flat profile:
>>
>> Each sample counts as 0.01 seconds.
>> % cumulative self self total
>> time seconds seconds calls s/call s/call name
>> 32.49 6.01 6.01 98809118 0.00 0.00
>> SimpleLruReadPage_ReadOnly
>> 26.97 11.00 4.99 98837761 0.00 0.00 LWLockAcquire
>> 21.89 15.05 4.05 98837761 0.00 0.00 LWLockRelease
>> 8.70 16.66 1.61 98789370 0.00 0.00 SubTransGetParent
>> 4.38 17.47 0.81 19748 0.00 0.00
>> SubTransGetTopmostTransaction
>> 2.41 17.92 0.45 98851951 0.00 0.00 TransactionIdPrecedes
>> 0.59 18.03 0.11 LWLockAssign
>> 0.54 18.13 0.10
>> LWLockConditionalAcquire
>> 0.46 18.21 0.09 19748 0.00 0.00 TransactionLogFetch
>> 0.38 18.28 0.07 SimpleLruReadPage
>> 0.27 18.33 0.05 SubTransSetParent
>> 0.05 18.34 0.01 136778 0.00 0.00 AllocSetAlloc
>> 0.05 18.35 0.01 42996 0.00 0.00 slot_deform_tuple
>> 0.05 18.36 0.01 42660 0.00 0.00
>> TransactionIdIsCurrentTransactionId
>>
>> it seems we are running into a nice shared buffer / locking contention here
>> and the number of calls explodes (this profiling infos is coming from a seq
>> scan on a 500.000 rows table - 400 mb or so).
>
> That doesn't seem related to the lock manager. Is the long-running
> transaction inserting a lot of tuples (by INSERT or UPDATE) to the same table
> that the seqscan scans? With a lot of different subtransaction xids. That
> profile looks like the seqscan is spending a lot of time swapping pg_subtrans
> pages in and out of the slru buffers.
>
> Increasing NUM_SUBTRANS_BUFFERS should help. A more sophisticated solution
> would be to allocate slru buffers (for clog and other slru caches as well)
> dynamically from shared_buffers. That's been discussed before but no-one has
> gotten around to it.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
hello ...
yeah, it seems this solves the problem.
i had a closer look at the SQL trace and did some more profiling.
this was the case.
many thanks for the quick hint.
hans
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers