Re: [HACKERS] issue with gininsert under very high load

2014-02-14 Thread Andres Freund
On 2014-02-14 08:06:40 +0100, Jesper Krogh wrote:
 On 14/02/14 00:49, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2014-02-13 16:15:42 -0500, Tom Lane wrote:
 Something like the attached?  Can somebody who's seen this problem confirm
 this improves matters?
 Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
 I think we would at least need to stop using the fast tuple
 mechanism during gininsert() if it's already too big and do plain
 inserts.
 No, because we've already got a process working on cleaning it out.
 
 In any case, this needs some testing to see if it's an improvement
 or not.
 
 Having some real-world experience with the fastupdate mechanism: under
 concurrent load it behaves really badly. Random processes waiting for
 cleanup (or competing with cleanup) will see latency spikes because they
 magically hit that corner, so reverting to plain inserts when a backend
 cannot add to the pending list will not remove this problem, but will
 confine it to the process actually doing the cleanup.

Yea, this is only a part of fixing fastupdate. Limiting the size of the
fastupdate list to something more reasonable is pretty important as
well. Not competing around cleanup will make cleanup much faster though,
so I am not that concerned about the latency spikes it causes once it's
limited in size and done non-concurrently.

 The built-in mechanism, where cleanup is a cost paid by whichever process
 happened to fill the pending list, is really hard to deal with in
 production. More control would be appreciated, perhaps even an explicit
 flush mechanism. I'd like to batch up inserts during one transaction
 only and flush on commit.

That doesn't seem likely to work with a reasonable amount of effort. The
fastupdate list is shared across all processes, so one backend will
always pay the price for several others.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] issue with gininsert under very high load

2014-02-14 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-02-14 08:06:40 +0100, Jesper Krogh wrote:

  The built-in mechanism, where cleanup is a cost paid by whichever process
  happened to fill the pending list, is really hard to deal with in
  production. More control would be appreciated, perhaps even an explicit
  flush mechanism. I'd like to batch up inserts during one transaction
  only and flush on commit.
 
 That doesn't seem likely to work with a reasonable amount of effort. The
 fastupdate list is shared across all processes, so one backend will
 always pay the price for several others.

Unless some other process does it, such as autovacuum.

-- 
Álvaro Herrera            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Andrew Dunstan


On 02/12/2014 04:04 PM, Heikki Linnakangas wrote:

On 02/12/2014 10:50 PM, Andres Freund wrote:

On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for
one wedged process to end its transaction, which never happened (this
transaction should normally take milliseconds). oprofile revealed that
postgres was spending 87% of its time in s_lock(), and strace on the wedged
process revealed that it was in a tight loop constantly calling select(). It
did not respond to a SIGTERM.


That's a deficiency of the gin fastupdate cache: a) it bases its size
on work_mem, which usually makes it *far* too big; b) it doesn't perform the
cleanup in one go if it can get a suitable lock, but does independent
locking for each entry. That usually leads to absolutely horrific
performance under concurrency.


I'm not sure that what Andrew is describing can fairly be called a
concurrent-performance problem.  It sounds closer to a stuck lock.
Are you sure you've diagnosed it correctly?


No. But I've several times seen similar backtraces where it wasn't
actually stuck, just livelocked. I'm just on my mobile right now, but
afair Andrew described a loop involving lots of semaphores and
spinlocks, which shouldn't be the case if it were actually stuck.
If there are dozens of processes waiting on the same lock, cleaning up a
large amount of items one by one, it's not surprising if it's
dramatically slow.


Perhaps we should use a lock to enforce that only one process tries to 
clean up the pending list at a time.




Is that going to serialize all these inserts?

cheers

andrew





Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Heikki Linnakangas

On 02/13/2014 05:40 PM, Andrew Dunstan wrote:


On 02/12/2014 04:04 PM, Heikki Linnakangas wrote:

On 02/12/2014 10:50 PM, Andres Freund wrote:

On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for
one wedged process to end its transaction, which never happened (this
transaction should normally take milliseconds). oprofile revealed that
postgres was spending 87% of its time in s_lock(), and strace on the wedged
process revealed that it was in a tight loop constantly calling select(). It
did not respond to a SIGTERM.


That's a deficiency of the gin fastupdate cache: a) it bases its size
on work_mem, which usually makes it *far* too big; b) it doesn't perform the
cleanup in one go if it can get a suitable lock, but does independent
locking for each entry. That usually leads to absolutely horrific
performance under concurrency.


I'm not sure that what Andrew is describing can fairly be called a
concurrent-performance problem.  It sounds closer to a stuck lock.
Are you sure you've diagnosed it correctly?


No. But I've several times seen similar backtraces where it wasn't
actually stuck, just livelocked. I'm just on my mobile right now, but
afair Andrew described a loop involving lots of semaphores and
spinlocks, which shouldn't be the case if it were actually stuck.
If there are dozens of processes waiting on the same lock, cleaning up a
large amount of items one by one, it's not surprising if it's
dramatically slow.


Perhaps we should use a lock to enforce that only one process tries to
clean up the pending list at a time.


Is that going to serialize all these inserts?


It will serialize the cleanup process, which moves entries from the 
pending list to the tree proper. But that's better than the current 
situation. Currently, when two processes attempt it, they will both try 
to insert into the GIN tree, but one of them will notice that the other 
one already did the cleanup, and bail out. So only one process 
contributes to progress, while the others just waste their effort.


The processes should try to get the lock, and just give up if it's 
already held rather than wait. If someone else is already doing the 
cleanup, there's no need for the current process to do it.


- Heikki
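
A minimal sketch of the try-lock-and-skip behaviour described above, assuming
the heavyweight page-lock primitives from storage/lmgr.h (ConditionalLockPage
and UnlockPage) that Tom Lane's patch further down the thread uses; buffer
handling and the actual pending-list processing are elided:

#include "postgres.h"

#include "access/gin_private.h"
#include "storage/lmgr.h"

/*
 * Sketch only: flush the GIN pending list unless some other backend is
 * already doing so.  If the conditional lock cannot be obtained, another
 * backend owns the cleanup and we simply skip it.
 */
static void
ginCleanupPendingListIfIdle(Relation index)
{
    /* Try to become the single cleaner; give up immediately if we can't. */
    if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
        return;

    /* ... move pending-list entries into the main GIN tree here ... */

    UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
}

The conditional acquisition means concurrent inserters never block on the
cleanup; at worst they keep appending to the pending list while exactly one
backend drains it.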




Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Andrew Dunstan


On 02/13/2014 10:58 AM, Heikki Linnakangas wrote:


Perhaps we should use a lock to enforce that only one process tries to
clean up the pending list at a time.


Is that going to serialize all these inserts?


It will serialize the cleanup process, which moves entries from the 
pending list to the tree proper. But that's better than the current 
situation. Currently, when two processes attempt it, they will both 
try to insert into the GIN tree, but one of them will notice that the 
other one already did the cleanup, and bail out. So only one process 
contributes to progress, while the others just waste their effort.


The processes should try to get the lock, and just give up if it's 
already held rather than wait. If someone else is already doing the 
cleanup, there's no need for the current process to do it.





Sounds good.

cheers

andrew





Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Perhaps we should use a lock to enforce that only one process tries to
 clean up the pending list at a time.

Something like the attached?  Can somebody who's seen this problem confirm
this improves matters?

(ginInsertCleanup's header comment also needs to be rewritten, but for
testing purposes, this is fine.)

regards, tom lane

diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 4a65046..38cda14 100644
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***
*** 21,26 
--- 21,27 
  #include access/gin_private.h
  #include commands/vacuum.h
  #include miscadmin.h
+ #include storage/lmgr.h
  #include utils/memutils.h
  #include utils/rel.h
  
*** ginInsertCleanup(GinState *ginstate,
*** 739,744 
--- 740,755 
  	KeyArray	datums;
  	BlockNumber blkno;
  
+ 	/*
+ 	 * We use a heavyweight lock on the metapage to ensure that only one
+ 	 * backend at a time tries to clean up the pending list.  While it does
+ 	 * actually work for multiple backends to run this code concurrently, that
+ 	 * turns out to be a bad idea because there's lots of locking conflicts.
+ 	 * So if someone else is already running cleanup, we just do nothing.
+ 	 */
+ 	if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
+ 		return;
+ 
  	metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
  	LockBuffer(metabuffer, GIN_SHARE);
  	metapage = BufferGetPage(metabuffer);
*** ginInsertCleanup(GinState *ginstate,
*** 748,753 
--- 759,765 
  	{
  		/* Nothing to do */
  		UnlockReleaseBuffer(metabuffer);
+ 		UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
  		return;
  	}
  
*** ginInsertCleanup(GinState *ginstate,
*** 925,930 
--- 937,944 
  
  	ReleaseBuffer(metabuffer);
  
+ 	UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
+ 
  	/* Clean up temporary space */
  	MemoryContextSwitchTo(oldCtx);
  	MemoryContextDelete(opCtx);



Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Andres Freund
On 2014-02-13 16:15:42 -0500, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
  Perhaps we should use a lock to enforce that only one process tries to
  clean up the pending list at a time.
 
 Something like the attached?  Can somebody who's seen this problem confirm
 this improves matters?
 
 (ginInsertCleanup's header comment also needs to be rewritten, but for
 testing purposes, this is fine.)

Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
I think we would at least need to stop using the fast tuple
mechanism during gininsert() if it's already too big and do plain
inserts.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-02-13 16:15:42 -0500, Tom Lane wrote:
 Something like the attached?  Can somebody who's seen this problem confirm
 this improves matters?

 Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
 I think we would at least need to stop using the fast tuple
 mechanism during gininsert() if it's already too big and do plain
 inserts.

No, because we've already got a process working on cleaning it out.

In any case, this needs some testing to see if it's an improvement
or not.

regards, tom lane




Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Andres Freund
On 2014-02-13 18:49:21 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-02-13 16:15:42 -0500, Tom Lane wrote:
  Something like the attached?  Can somebody who's seen this problem confirm
  this improves matters?
 
  Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
  I think we would at least need to stop using the fast tuple
  mechanism during gininsert() if it's already too big and do plain
  inserts.
 
 No, because we've already got a process working on cleaning it out.

Well, appending to the fast tuple list will normally be fully cached I/O,
cleaning it up not so much.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Jesper Krogh

On 14/02/14 00:49, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-02-13 16:15:42 -0500, Tom Lane wrote:

Something like the attached?  Can somebody who's seen this problem confirm
this improves matters?

Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
I think we would at least need to stop using the fast tuple
mechanism during gininsert() if it's already too big and do plain
inserts.

No, because we've already got a process working on cleaning it out.

In any case, this needs some testing to see if it's an improvement
or not.


Having some real-world experience with the fastupdate mechanism: under
concurrent load it behaves really badly. Random processes waiting for
cleanup (or competing with cleanup) will see latency spikes because they
magically hit that corner, so reverting to plain inserts when a backend
cannot add to the pending list will not remove this problem, but will
confine it to the process actually doing the cleanup.

The built-in mechanism, where cleanup is a cost paid by whichever process
happened to fill the pending list, is really hard to deal with in
production. More control would be appreciated, perhaps even an explicit
flush mechanism. I'd like to batch up inserts during one transaction only
and flush on commit.

--
Jesper - with fastupdate turned off due to above issues.






[HACKERS] issue with gininsert under very high load

2014-02-12 Thread Andrew Dunstan
A client of ours encountered this problem when upgrading (via slony)
from 8.4 to 9.2, and then from 8.4 to 9.3. The application is a
telephony app that inserts call records at pretty high volume in the
busiest part of the day, which usually starts around 10.00 am to 11.00
am US eastern time. What happened was that the system load average shot
up from, say, 6 to way over 100, and the database became largely
unresponsive. This problem did not occur on 8.4. The platform is CentOS
6.4 / x86_64.


On investigation I found that a number of processes were locked waiting 
for one wedged process to end its transaction, which never happened 
(this transaction should normally take milliseconds). oprofile revealed 
that postgres was spending 87% of its time in s_lock(), and strace on 
the wedged process revealed that it was in a tight loop constantly 
calling select(). It did not respond to a SIGTERM. I attached a debugger 
to the wedged process and got this backtrace:


   #0  0x00386a2eaf37 in semop () from /lib64/libc.so.6
   #1  0x006021a7 in PGSemaphoreLock ()
   #2  0x006463a1 in LWLockAcquire ()
   #3  0x00631997 in ReadBuffer_common ()
   #4  0x006322ee in ReadBufferExtended ()
   #5  0x00467475 in ginPrepareFindLeafPage ()
   #6  0x0046663f in ginInsertItemPointers ()
   #7  0x00462adc in ginEntryInsert ()
   #8  0x0046cad0 in ginInsertCleanup ()
   #9  0x0046d7c6 in ginHeapTupleFastInsert ()
   #10 0x00462cfa in gininsert ()
   #11 0x00722b33 in FunctionCall6Coll ()
   #12 0x0048bdff in index_insert ()
   #13 0x00587595 in ExecInsertIndexTuples ()
   #14 0x005948e1 in ExecModifyTable ()
   #15 0x0057e718 in ExecProcNode ()
   #16 0x0057d512 in standard_ExecutorRun ()
   [...]



Armed with this the client identified a single gin index (on an array of 
text) that could have been involved, and on removing the index the 
problem simply went away, and they have now been live on 9.3 for 36 
hours without a mishap.


The client is attempting to create a self-contained reproducible test, 
but understandably are not going to re-add the index to their production 
system.


I'm not terribly familiar with the gin code, so I'm reporting this in 
the hope that other people who are more familiar with it than I am might 
know where to look for a potential race condition or other bug that 
might have caused this.


cheers

andrew






Re: [HACKERS] issue with gininsert under very high load

2014-02-12 Thread Andres Freund
On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:
 On investigation I found that a number of processes were locked waiting for
 one wedged process to end its transaction, which never happened (this
 transaction should normally take milliseconds). oprofile revealed that
 postgres was spending 87% of its time in s_lock(), and strace on the wedged
 process revealed that it was in a tight loop constantly calling select(). It
 did not respond to a SIGTERM. I attached a debugger to the wedged process
 and got this backtrace:
 
#0  0x00386a2eaf37 in semop () from /lib64/libc.so.6
#1  0x006021a7 in PGSemaphoreLock ()
#2  0x006463a1 in LWLockAcquire ()
#3  0x00631997 in ReadBuffer_common ()
#4  0x006322ee in ReadBufferExtended ()
#5  0x00467475 in ginPrepareFindLeafPage ()
#6  0x0046663f in ginInsertItemPointers ()
#7  0x00462adc in ginEntryInsert ()
#8  0x0046cad0 in ginInsertCleanup ()
#9  0x0046d7c6 in ginHeapTupleFastInsert ()
#10 0x00462cfa in gininsert ()
#11 0x00722b33 in FunctionCall6Coll ()
#12 0x0048bdff in index_insert ()
#13 0x00587595 in ExecInsertIndexTuples ()
#14 0x005948e1 in ExecModifyTable ()
#15 0x0057e718 in ExecProcNode ()
#16 0x0057d512 in standard_ExecutorRun ()
[...]
 
 
 
 Armed with this the client identified a single gin index (on an array of
 text) that could have been involved, and on removing the index the problem
 simply went away, and they have now been live on 9.3 for 36 hours without a
 mishap.
 
 The client is attempting to create a self-contained reproducible test, but
 understandably are not going to re-add the index to their production system.
 
 I'm not terribly familiar with the gin code, so I'm reporting this in the
 hope that other people who are more familiar with it than I am might know
 where to look for a potential race condition or other bug that might have
 caused this.

That's a deficiency of the gin fastupdate cache: a) it bases its size
on work_mem, which usually makes it *far* too big; b) it doesn't perform the
cleanup in one go if it can get a suitable lock, but does independent
locking for each entry. That usually leads to absolutely horrific
performance under concurrency.
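
To make point a) concrete, the flush decision is tied to work_mem; here is a
rough, schematic illustration (names and details are approximate, not the
literal ginfast.c code) of why a large work_mem lets the pending list grow
very large before any backend starts the cleanup:

#include "postgres.h"

#include "miscadmin.h"        /* work_mem GUC */

/*
 * Schematic only; not the actual ginfast.c code.  Cleanup is triggered
 * once the pending list exceeds a threshold derived from work_mem (in kB),
 * so a big work_mem means a big pending list, and the unlucky backend that
 * crosses the threshold pays for the whole flush.
 */
static bool
pendingListNeedsCleanup(long nPendingPages, long usableBytesPerPage)
{
    return nPendingPages * usableBytesPerPage > (long) work_mem * 1024L;
}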

The fix is usually to just turn FASTUPDATE off for all indexes. If it's
done after the initial creation, the table should be vacuumed afterwards
to flush it.

There's previously been talk about changing the limits to something more
reasonable but it got stalled in bikeshedding IIRC.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] issue with gininsert under very high load

2014-02-12 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 There's previously been talk about changing the limits to something more
 reasonable but it got stalled in bikeshedding IIRC.

As I recall, there was argument that we didn't really need a new GUC for
this (which was the proposal) but rather just needed to pick a reasonable
(small) value and hard-code it.  Are there objections to doing so, or
are there cases where that would be a serious problem?  How do people
feel about 4MB?

Thanks,

Stephen
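
In backend terms, that proposal would amount to replacing the work_mem-derived
trigger with a fixed constant, roughly like the sketch below (analogous to the
earlier sketch; the macro name is invented for illustration and is not an
existing PostgreSQL symbol):

#include "postgres.h"

/*
 * Hypothetical sketch of hard-coding the pending-list cleanup trigger
 * instead of deriving it from work_mem.  GIN_PENDING_LIST_LIMIT_KB is an
 * invented name used only for this illustration.
 */
#define GIN_PENDING_LIST_LIMIT_KB    (4 * 1024)    /* 4MB, per the proposal */

static bool
pendingListNeedsCleanup(long pendingListBytes)
{
    return pendingListBytes > (long) GIN_PENDING_LIST_LIMIT_KB * 1024L;
}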




Re: [HACKERS] issue with gininsert under very high load

2014-02-12 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:
 On investigation I found that a number of processes were locked waiting for
 one wedged process to end its transaction, which never happened (this
 transaction should normally take milliseconds). oprofile revealed that
 postgres was spending 87% of its time in s_lock(), and strace on the wedged
 process revealed that it was in a tight loop constantly calling select(). It
 did not respond to a SIGTERM.

 That's a deficiency of the gin fastupdate cache: a) it bases its size
 on work_mem, which usually makes it *far* too big; b) it doesn't perform the
 cleanup in one go if it can get a suitable lock, but does independent
 locking for each entry. That usually leads to absolutely horrific
 performance under concurrency.

I'm not sure that what Andrew is describing can fairly be called a
concurrent-performance problem.  It sounds closer to a stuck lock.
Are you sure you've diagnosed it correctly?

regards, tom lane




Re: [HACKERS] issue with gininsert under very high load

2014-02-12 Thread Andrew Dunstan


On 02/12/2014 02:50 PM, Andres Freund wrote:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for
one wedged process to end its transaction, which never happened (this
transaction should normally take milliseconds). oprofile revealed that
postgres was spending 87% of its time in s_lock(), and strace on the wedged
process revealed that it was in a tight loop constantly calling select(). It
did not respond to a SIGTERM. I attached a debugger to the wedged process
and got this backtrace:

#0  0x00386a2eaf37 in semop () from /lib64/libc.so.6
#1  0x006021a7 in PGSemaphoreLock ()
#2  0x006463a1 in LWLockAcquire ()
#3  0x00631997 in ReadBuffer_common ()
#4  0x006322ee in ReadBufferExtended ()
#5  0x00467475 in ginPrepareFindLeafPage ()
#6  0x0046663f in ginInsertItemPointers ()
#7  0x00462adc in ginEntryInsert ()
#8  0x0046cad0 in ginInsertCleanup ()
#9  0x0046d7c6 in ginHeapTupleFastInsert ()
#10 0x00462cfa in gininsert ()
#11 0x00722b33 in FunctionCall6Coll ()
#12 0x0048bdff in index_insert ()
#13 0x00587595 in ExecInsertIndexTuples ()
#14 0x005948e1 in ExecModifyTable ()
#15 0x0057e718 in ExecProcNode ()
#16 0x0057d512 in standard_ExecutorRun ()
[...]



Armed with this the client identified a single gin index (on an array of
text) that could have been involved, and on removing the index the problem
simply went away, and they have now been live on 9.3 for 36 hours without a
mishap.

The client is attempting to create a self-contained reproducible test, but
understandably are not going to re-add the index to their production system.

I'm not terribly familiar with the gin code, so I'm reporting this in the
hope that other people who are more familiar with it than I am might know
where to look for a potential race condition or other bug that might have
caused this.

That's a deficiency of the gin fastupdate cache: a) it bases its size
on work_mem, which usually makes it *far* too big; b) it doesn't perform the
cleanup in one go if it can get a suitable lock, but does independent
locking for each entry. That usually leads to absolutely horrific
performance under concurrency.

The fix is usually to just turn FASTUPDATE off for all indexes. If it's
done after the initial creation, the table should be vacuumed afterwards
to flush it.

There's previously been talk about changing the limits to something more
reasonable but it got stalled in bikeshedding IIRC.



So this doesn't work just when it might be most useful?

cheers

andrew




Re: [HACKERS] issue with gininsert under very high load

2014-02-12 Thread Andres Freund
On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote:
Andres Freund and...@2ndquadrant.com writes:
 On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:
 On investigation I found that a number of processes were locked
 waiting for one wedged process to end its transaction, which never
 happened (this transaction should normally take milliseconds). oprofile
 revealed that postgres was spending 87% of its time in s_lock(), and
 strace on the wedged process revealed that it was in a tight loop
 constantly calling select(). It did not respond to a SIGTERM.

 That's a deficiency of the gin fastupdate cache: a) it bases its size
 on work_mem, which usually makes it *far* too big; b) it doesn't perform
 the cleanup in one go if it can get a suitable lock, but does independent
 locking for each entry. That usually leads to absolutely horrific
 performance under concurrency.

I'm not sure that what Andrew is describing can fairly be called a
concurrent-performance problem.  It sounds closer to a stuck lock.
Are you sure you've diagnosed it correctly?

No. But I've several times seen similar backtraces where it wasn't actually
stuck, just livelocked. I'm just on my mobile right now, but afair Andrew
described a loop involving lots of semaphores and spinlocks, which shouldn't
be the case if it were actually stuck.
If there are dozens of processes waiting on the same lock, cleaning up a large
amount of items one by one, it's not surprising if it's dramatically slow.

Andres
-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] issue with gininsert under very high load

2014-02-12 Thread Heikki Linnakangas

On 02/12/2014 10:50 PM, Andres Freund wrote:

On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for
one wedged process to end its transaction, which never happened (this
transaction should normally take milliseconds). oprofile revealed that
postgres was spending 87% of its time in s_lock(), and strace on the wedged
process revealed that it was in a tight loop constantly calling select(). It
did not respond to a SIGTERM.


That's a deficiency of the gin fastupdate cache: a) it bases its size
on work_mem, which usually makes it *far* too big; b) it doesn't perform the
cleanup in one go if it can get a suitable lock, but does independent
locking for each entry. That usually leads to absolutely horrific
performance under concurrency.


I'm not sure that what Andrew is describing can fairly be called a
concurrent-performance problem.  It sounds closer to a stuck lock.
Are you sure you've diagnosed it correctly?


No. But I've several times seen similar backtraces where it wasn't actually
stuck, just livelocked. I'm just on my mobile right now, but afair Andrew
described a loop involving lots of semaphores and spinlocks, which shouldn't
be the case if it were actually stuck.
If there are dozens of processes waiting on the same lock, cleaning up a large
amount of items one by one, it's not surprising if it's dramatically slow.


Perhaps we should use a lock to enforce that only one process tries to 
clean up the pending list at a time.


- Heikki

