Re: [HACKERS] GIN fast insert
> Why is tbm_add_page() not coded as a simple wrapper around
> tbm_mark_page_lossy()? As coded, it sometimes forces a whole bunch of
> pages *other than* the target page to become lossy too, which I cannot
> see a reason for it to be doing.

[after digging in tidbitmap] Oops, I was wrong: I supposed that all pages
in a chunk should be lossy, but that's true only for the chunk page
itself. So, tbm_add_page() should only call tbm_mark_page_lossy()...

-- 
Teodor Sigaev                 E-mail: teo...@sigaev.ru
                              WWW: http://www.sigaev.ru/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
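[Editor's note: for readers outside tidbitmap.c, here is a much-simplified
sketch of the fix agreed above. The struct and its fields are hypothetical
stand-ins, not the PostgreSQL implementation; the point is only that adding
a whole page should degrade that one page to lossy, never its neighbours.]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for a TIDBitmap: one flag pair
 * per page instead of real pagetable entries and chunk compression. */
#define MAX_PAGES 64

typedef struct
{
    bool present[MAX_PAGES];    /* page has an entry in the bitmap */
    bool lossy[MAX_PAGES];      /* entry is lossy: "some tuple on this
                                 * page matches", no per-tuple bits */
} SimpleTBM;

/* Mark exactly one page lossy, leaving all other pages untouched. */
static void
tbm_mark_page_lossy(SimpleTBM *tbm, int pageno)
{
    tbm->present[pageno] = true;
    tbm->lossy[pageno] = true;
}

/* Per the conclusion above: adding a whole page is just a wrapper around
 * tbm_mark_page_lossy() for the single target page.  The bug under
 * discussion was that the old coding could force *other* pages in the
 * same chunk to become lossy as well. */
static void
tbm_add_page(SimpleTBM *tbm, int pageno)
{
    tbm_mark_page_lossy(tbm, pageno);
}
```

Only the target page loses tuple-level precision; neighbouring pages keep
whatever state they already had.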
Re: [HACKERS] GIN fast insert
Teodor Sigaev teo...@sigaev.ru writes:
> Oops, I was wrong, I supposed that all pages in chunk should be lossy,
> but it's true only for chunk page. So, tbm_add_page() should only call
> tbm_mark_page_lossy()...

OK, thanks, that's what I thought. I've changed it in the copy I'm
editing here.

I have another question. I added the following comment to
ginInsertCleanup(); is it accurate? (If it isn't, I think the code is
buggy ...)

 * This can be called concurrently by multiple backends, so it must cope.
 * On first glance it looks completely not concurrent-safe and not
 * crash-safe either. The reason it's okay is that multiple insertion of
 * the same entry is detected and treated as a no-op by gininsert.c. If we
 * crash after posting entries to the main index and before removing them
 * from the pending list, it's okay because when we redo the posting later
 * on, nothing bad will happen. Likewise, if two backends simultaneously
 * try to post a pending entry into the main index, one will succeed and
 * one will do nothing. We try to notice when someone else is a little bit
 * ahead of us in the process, but that's just to avoid wasting cycles.
 * Only the action of removing a page from the pending list really needs
 * exclusive lock.

regards, tom lane
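[Editor's note: the safety argument in the comment above rests on posting
being idempotent. A hedged toy model of that property — the names and data
structure below are invented for illustration and have nothing to do with
the actual gininsert.c code:]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy "main index": a small set of integer keys.  In real GIN the
 * entries are index tuples; here idempotence is all that matters. */
#define CAPACITY 16

typedef struct
{
    int keys[CAPACITY];
    int nkeys;
} ToyIndex;

static bool
entry_present(const ToyIndex *idx, int key)
{
    for (int i = 0; i < idx->nkeys; i++)
        if (idx->keys[i] == key)
            return true;
    return false;
}

/* Posting the same entry twice is detected and treated as a no-op,
 * which is what makes crash-redo and a race between two backends
 * harmless: the duplicate work simply collapses to nothing. */
static void
post_entry(ToyIndex *idx, int key)
{
    if (entry_present(idx, key))
        return;                 /* duplicate insertion: do nothing */
    assert(idx->nkeys < CAPACITY);
    idx->keys[idx->nkeys++] = key;
}
```

A backend that crashed after posting but before trimming the pending list
just re-posts on the next cleanup pass, and the repeats are no-ops.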
Re: [HACKERS] GIN fast insert
> ginInsertCleanup(); is it accurate? (If it isn't, I think

Exactly, that is right.

>  * This can be called concurrently by multiple backends, so it must cope.
>  * On first glance it looks completely not concurrent-safe and not
>  * crash-safe either. The reason it's okay is that multiple insertion of
>  * the same entry is detected and treated as a no-op by gininsert.c. If we
>  * crash after posting entries to the main index and before removing them
>  * from the pending list, it's okay because when we redo the posting later
>  * on, nothing bad will happen. Likewise, if two backends simultaneously
>  * try to post a pending entry into the main index, one will succeed and
>  * one will do nothing. We try to notice when someone else is a little bit
>  * ahead of us in the process, but that's just to avoid wasting cycles.
>  * Only the action of removing a page from the pending list really needs
>  * exclusive lock.
>
> regards, tom lane

-- 
Teodor Sigaev                 E-mail: teo...@sigaev.ru
                              WWW: http://www.sigaev.ru/
Re: [HACKERS] GIN fast insert
I've committed this patch with additional editorialization --- mostly
cosmetic changes, except for removing the stats-driven cleanup in favor of
letting cleanups occur during auto-ANALYZE, as per my suggestion yesterday.

regards, tom lane
Re: [HACKERS] GIN fast insert
On 3/24/09 1:18 PM, Tom Lane wrote:
> I've committed this patch with additional editorialization --- mostly
> cosmetic changes, except for removing the stats-driven cleanup in favor
> of letting cleanups occur during auto-ANALYZE, as per my suggestion
> yesterday.

By my count, this was the last patch left for 8.4. No?

--Josh
Re: [HACKERS] GIN fast insert
On Tue, Mar 24, 2009 at 04:55:47PM -0700, Josh Berkus wrote:
> On 3/24/09 1:18 PM, Tom Lane wrote:
>> I've committed this patch with additional editorialization --- mostly
>> cosmetic changes, except for removing the stats-driven cleanup in favor
>> of letting cleanups occur during auto-ANALYZE, as per my suggestion
>> yesterday.
> By my count, this was the last patch left for 8.4. No?

One more patch left according to the CommitFest wiki: B-Tree emulation
for GIN.

http://wiki.postgresql.org/wiki/CommitFestInProgress

Cheers, David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] GIN fast-insert vs autovacuum scheduling
Tom Lane wrote:
> On top of those issues, there are implementation problems in the
> proposed relation_has_pending_indexes() check: it has hard-wired
> knowledge about GIN indexes, which means the feature cannot be extended
> to add-on index AMs; and it's examining indexes without any lock
> whatsoever on either the indexes or their parent table. (And we really
> would rather not let autovacuum take a lock here.)

I wonder if it's workable to have GIN send pgstats a message with the
number of fast-inserted tuples, and have autovacuum check that number as
well as dead/live tuples. ISTM this shouldn't be considered part of either
vacuum or analyze at all; autovacuum should invoke it separately from
both, with its own decision equations and such. We could even have a scan
over pg_class just for GIN indexes to implement this.

-- 
Alvaro Herrera                      http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] GIN fast insert
While I'm looking at this thing... Why is tbm_add_page() not coded as a
simple wrapper around tbm_mark_page_lossy()? As coded, it sometimes forces
a whole bunch of pages *other than* the target page to become lossy too,
which I cannot see a reason for it to be doing.

regards, tom lane
Re: [HACKERS] GIN fast-insert vs autovacuum scheduling
Alvaro Herrera alvhe...@commandprompt.com writes:
> Tom Lane wrote:
>> On top of those issues, there are implementation problems in the
>> proposed relation_has_pending_indexes() check:
> I wonder if it's workable to have GIN send pgstats a message with
> number of fast-inserted tuples, and have autovacuum check that number
> as well as dead/live tuples. ISTM this shouldn't be considered part of
> either vacuum or analyze at all, and have autovacuum invoke it
> separately from both, with its own decision equations and such. We
> could even have a scan over pg_class just for GIN indexes to implement
> this.

That's going in the wrong direction IMHO, because it's building
GIN-specific infrastructure into the core system. There is no need for any
such infrastructure if we just drive it off a post-ANALYZE callback.

regards, tom lane
Re: [HACKERS] GIN fast-insert vs autovacuum scheduling
On Mon, 2009-03-23 at 15:23 -0400, Tom Lane wrote:
> There is no need for any such infrastructure if we just drive it off a
> post-ANALYZE callback.

That sounds reasonable, although it does seem a little strange for analyze
to actually perform cleanup.

Now that we have FSM, the cost of VACUUMing insert-only tables is a lot
less. Does that possibly justify running VACUUM on insert-only tables? On
tables without GIN indexes, that wouldn't be a complete waste, because it
could set hint bits, which needs to be done sometime anyway.

Regards, Jeff Davis
Re: [HACKERS] GIN fast-insert vs autovacuum scheduling
Jeff Davis pg...@j-davis.com writes:
> On Mon, 2009-03-23 at 15:23 -0400, Tom Lane wrote:
>> There is no need for any such infrastructure if we just drive it off a
>> post-ANALYZE callback.
> That sounds reasonable, although it does seem a little strange for
> analyze to actually perform cleanup.

My thought was to have GIN do cleanup only in an autovacuum-driven
ANALYZE, not in a client-issued ANALYZE. You could argue it either way I
suppose, but I agree that if a user says ANALYZE he's probably not
expecting index cleanup to happen.

> Now that we have FSM, the cost of VACUUMing insert-only tables is a lot
> less.

Well, not if you just did a huge pile of inserts, which is the case that
we need to worry about here.

> On tables without GIN indexes, that wouldn't be a complete waste,
> because it could set hint bits, which needs to be done sometime anyway.

True, but we have not chosen to make autovacuum do that, and whether we
should or not seems to me to be orthogonal to when GIN index cleanup
should happen.

regards, tom lane
Re: [HACKERS] GIN fast insert
> Teodor, can you confirm
> * we WAL log the insert into the pending list
> * we WAL log the move from the pending list to the main index

Yes, I can, and I confirm both.

> * that we maintain the pending list correctly during redo so that it
>   can be accessed by index scans

Not sure about correct locking in ginxlog.c - but it should be fixable
rather easily.

> The main thing with Hot Standby is that we can't do any writes. So a
> pending list cannot change solely because of a gingettuple call on the
> *standby*. But that's easy to disable. If all the inserts happened on

Right. And suggest to increase work_mem.

> the primary node and all the reads happened on the standby, then
> pending list would never be cleaned up if the cleanup is triggered only
> by read. I would suggest that we trigger cleanup by read at threshold
> size X and trigger cleanup by insert at threshold size 5X. That avoids
> the strange

A threshold is not helpful here, because for gininsert and gingettuple it
depends on the work_mem setting, which could be very different on the
slave(s) and the master. Next, it's possible that the master never
executes read-only queries (i.e. gingettuple will not be called) by the
application's design. This is a strong objection to invoking cleanup from
gingettuple. So, I'm inclined toward Tom's idea of removing the
gingettuple interface, although GIN with fastupdate=off doesn't affect the
discussed problem.

> I found many parts of the patch and docs quite confusing because of the
> way things are named. For me, this is a deferred or delayed insert
> technique to allow batching. I would prefer if everything used one
> description, rather than fast, pending, delayed etc.

The terminology was taken from the inverted index technique.

-- 
Teodor Sigaev                 E-mail: teo...@sigaev.ru
                              WWW: http://www.sigaev.ru/
Re: [HACKERS] GIN fast insert
> it be? So far we've ruled out using the planner to prevent index scans
> when the pending list is long (because it's not reliable) and cleaning
> up the pending list during insert when needed (because it won't work
> with Hot Standby). We haven't decided what WILL work,

Cleanup during insert will work with Hot Standby, because there is no
limitation there on the number of pages touched or WAL records. The
problem is with cleanup invoked by gingettuple: the slave cannot start the
cleanup process at all, and hence it could emit an error if the tidbitmap
becomes lossy.

-- 
Teodor Sigaev                 E-mail: teo...@sigaev.ru
                              WWW: http://www.sigaev.ru/
Re: [HACKERS] GIN fast insert
On Thu, Feb 26, 2009 at 11:41 AM, Teodor Sigaev teo...@sigaev.ru wrote:
>> it be? So far we've ruled out using the planner to prevent index scans
>> when the pending list is long (because it's not reliable) and cleaning
>> up the pending list during insert when needed (because it won't work
>> with Hot Standby). We haven't decided what WILL work,
> During insert it will work with Hot Standby because there is no any
> limitation for number of pages touched or WAL records. There is a
> problem with cleanup invoked by gingettuple - slave could not start
> cleanup process at all and hence it could emit an error if tidbitmap
> becomes lossy.

I didn't think about that option - might be reasonable.

...Robert
Re: [HACKERS] GIN fast insert
Robert Haas robertmh...@gmail.com writes:
> On my system this takes about 45 ms to execute with default settings
> and about 90 ms to execute with index scan disabled.

[ shrug... ] That's well within my threshold of pain for this. In any
case, it might be possible to buy some/all of that back with minor
optimization effort on the bitmap-scan code paths; nobody's ever really
bothered to profile that AFAIK. There is no real difference in the useful
work (page and tuple fetches) getting done in the two cases, so there's no
reason in principle for bitmap scan to be much slower than indexscan here.
The LIMIT case is the only one I'm aware of where there's a fundamental
reason that bitmap scan should be slower.

regards, tom lane
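[Editor's note: a hedged sketch of why LIMIT is the fundamentally bad case
for bitmap scans. An ordinary index scan can emit its first tuple after
inspecting only the entries up to the first match, while a bitmap scan
must finish building the whole bitmap before anything can be returned. The
work counters below are an illustration, not planner cost arithmetic.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Entries inspected by an index scan to satisfy LIMIT 1: it can stream
 * tuples out, so it stops as soon as the first match is found. */
static int
indexscan_limit1_work(const bool *match, size_t n)
{
    int work = 0;

    for (size_t i = 0; i < n; i++)
    {
        work++;
        if (match[i])
            break;              /* first tuple returned; stop early */
    }
    return work;
}

/* A bitmap scan must visit every entry to build the complete bitmap
 * before the first tuple can be returned, LIMIT or no LIMIT. */
static int
bitmapscan_limit1_work(const bool *match, size_t n)
{
    (void) match;
    return (int) n;             /* full pass to build the bitmap */
}
```

The same shape explains Robert's semi/anti-join observation downthread: a
semi or anti join behaves like an iterated LIMIT 1, so it benefits from
the early stop in the same way.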
Re: [HACKERS] GIN fast insert
On Tue, Feb 24, 2009 at 10:39 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On my system this takes about 45 ms to execute with default settings
>> and about 90 ms to execute with index scan disabled.
> [ shrug... ] That's well within my threshold of pain for this. In any
> case, it might be possible to buy some/all of that back with minor
> optimization effort on the bitmap-scan code paths; nobody's ever really
> bothered to profile that AFAIK. There is no real difference in the
> useful work (page and tuple fetches) getting done in the two cases, so
> there's no reason in principle for bitmap scan to be much slower than
> indexscan here. The LIMIT case is the only one I'm aware of where
> there's a fundamental reason that bitmap scan should be slower.

Uh, a semi or anti join stops as soon as one matching row is found,
doesn't it? ISTM that a semi or anti join is in essence an iterated
LIMIT 1 clause.

...Robert
Re: [HACKERS] GIN fast insert
On Tue, 2009-02-24 at 00:18 -0500, Robert Haas wrote:
> It only took me about 5 minutes to come up with a test case against CVS
> HEAD where disabling index scans resulted in a significant dropoff in
> performance. Here it is:

On the other hand, Teodor showed a typical use case and a very substantial
performance gain:
http://archives.postgresql.org/message-id/497f4606.6070...@sigaev.ru

My impression is that GIN is used almost entirely for full text search.
Even though GIN is actually quite general (as the BTree patch shows), the
current users we have to worry about are a fairly narrow group (correct me
if I'm wrong here).

I wonder how many people really use GIN with non-bitmap scans for some
benefit? And even if the benefit exists, does the planner have a way to
identify those cases reliably, or does it have to be done manually?

Regards, Jeff Davis
Re: [HACKERS] GIN fast insert
Jeff Davis pg...@j-davis.com writes:
> On Tue, 2009-02-24 at 00:18 -0500, Robert Haas wrote:
>> It only took me about 5 minutes to come up with a test case against
>> CVS HEAD where disabling index scans resulted in a significant dropoff
>> in performance. Here it is:
> On the other hand, Teodor showed a typical use case and a very
> substantial performance gain:

Yeah. Whatever we do here is a tradeoff (and whether Robert likes it or
not, reliability and code maintainability weigh heavily in the tradeoff).

> I wonder how many people really use GIN with non-bitmap scans for some
> benefit? And even if the benefit exists, does the planner have a way to
> identify those cases reliably, or does it have to be done manually?

A relevant point there is that most of the estimator functions for
GIN-amenable operators are just smoke and mirrors; so if the planner is
making a good choice between indexscan and bitmapscan at all, it's mostly
luck. This might get better someday, but not in 8.4.

regards, tom lane
Re: [HACKERS] GIN fast insert
On Tue, Feb 24, 2009 at 2:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> On the other hand, Teodor showed a typical use case and a very
>> substantial performance gain:
> Yeah. Whatever we do here is a tradeoff (and whether Robert likes it or
> not, reliability and code maintainability weigh heavily in the
> tradeoff).

I have no problem with reliability or code maintainability, and I'm not
sure what I said that would give that impression. If the consensus of the
group is that the performance loss from dropping index scans is not
important, then I'm fine with that, especially if that consensus is
reached in the context of an educated knowledge of what that performance
loss is likely to be. To me, a 2x slowdown on a two-table anti-join seems
pretty bad, but I just work here.

Perhaps nobody else thinks that a semi-join or anti-join against a GIN
index is a plausible use case (like: find all of the words from the
following list that do not appear in any document)? If everyone agrees
that we don't care about that case (or about ORDER-BY-without-LIMIT, which
is certainly less compelling), then go ahead and remove it. I have no
horse in this race other than having been asked to review the patch, which
I did. On the other hand, if a significant number of people think that it
might be a bad idea to make that case significantly worse, then some
redesign work is called for, and that may mean the patch needs to get
bumped.

My own opinion is that it is better to decide on the right design and then
figure out which release that design can go into than it is to start by
deciding this has to go into 8.4 and then figuring out what can be done in
that period of time. I don't think there is any question that making GIN
continue to support both index scans and bitmap index scans will make the
code more complex, but how bad will it be? So far we've ruled out using
the planner to prevent index scans when the pending list is long (because
it's not reliable) and cleaning up the pending list during insert when
needed (because it won't work with Hot Standby). We haven't decided what
WILL work, apart from ripping out index scans altogether, so to some
degree we're comparing against an unknown.

> I wonder how many people really use GIN with non-bitmap scans for some
> benefit? And even if the benefit exists, does the planner have a way to
> identify those cases reliably, or does it have to be done manually?
> A relevant point there is that most of the estimator functions for
> GIN-amenable operators are just smoke and mirrors; so if the planner is
> making a good choice between indexscan and bitmapscan at all, it's
> mostly luck. This might get better someday, but not in 8.4.

Based on the limited testing I've done thus far, it appears to pick an
index scan for small numbers of rows and a bitmap index scan for larger
numbers of rows. Index scans will have lower startup costs, which can be
valuable if you only need to scan part of the index (as in the semi and
anti join cases). I haven't done enough testing to see if there is any
benefit when scanning the whole index and only returning a few tuples.

...Robert
Re: [HACKERS] GIN fast insert
On Thu, 2009-02-19 at 22:43 -0500, Robert Haas wrote:
>> I don't see a problems here, because indexes in postgres don't depend
>> on any transaction's ids or modes as heap depends. WAL-logger works
>> without that knowledge too. May be I missed something here or don't
>> understand. Although heap's pages could be changed by setting
>> commit-status bits on tuple even in read-only transaction, but that
>> changes are not WAL-logged. That is correct for index's page too.
> It would be helpful if Heikki or Simon could jump in here, but my
> understanding is that cleaning up the pending list is a read-write
> operation. I don't think we can do that on a hot standby server.

From reading the docs with the patch, the pending list is merged into the
main index when a VACUUM is performed. I (think I) can see that additions
to the pending list are WAL-logged, so that will work in Hot Standby. I
also see ginEntryInsert() calls during the move from pending list to main
index, which means that is WAL-logged also. AFAICS this *could* work
during Hot Standby mode. Best check here
http://wiki.postgresql.org/wiki/Hot_Standby#Usage rather than attempt to
read the patch.

Teodor, can you confirm
* we WAL log the insert into the pending list
* we WAL log the move from the pending list to the main index
* that we maintain the pending list correctly during redo so that it can
  be accessed by index scans

The main thing with Hot Standby is that we can't do any writes. So a
pending list cannot change solely because of a gingettuple call on the
*standby*. But that's easy to disable. If all the inserts happened on the
primary node and all the reads happened on the standby, then the pending
list would never be cleaned up if the cleanup is triggered only by read. I
would suggest that we trigger cleanup by read at threshold size X and
trigger cleanup by insert at threshold size 5X. That avoids the strange
case mentioned, but generally ensures only reads trigger cleanup. (But why
do we want that??)

I found many parts of the patch and docs quite confusing because of the
way things are named. For me, this is a deferred or delayed insert
technique to allow batching. I would prefer if everything used one
description, rather than fast, pending, delayed etc.

Personally, I see ginInsertCleanup() as a scheduled task unrelated to
vacuum. Making the deferred tasks happen at vacuum time is just a
convenient way of having a background task occur regularly. That's OK for
now, but I would like to be able to request a background task without
having to hook into AV.

-- 
Simon Riggs    www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] GIN fast insert
On Mon, Feb 23, 2009 at 4:56 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> It would be helpful if Heikki or Simon could jump in here, but my
>> understanding is that cleaning up the pending list is a read-write
>> operation. I don't think we can do that on a hot standby server.
> From reading the docs with the patch the pending list is merged into
> the main index when a VACUUM is performed. I (think I) can see that
> additions to the pending list are WAL logged, so that will work in Hot
> Standby. I also see ginEntryInsert() calls during the move from pending
> list to main index which means that is WAL logged also. AFAICS this
> *could* work during Hot Standby mode. Best check here
> http://wiki.postgresql.org/wiki/Hot_Standby#Usage rather than attempt
> to read the patch.
>
> The main thing with Hot Standby is that we can't do any writes. So a
> pending list cannot change solely because of a gingettuple call on the
> *standby*.

That's what I thought. Thanks for confirming.

> But that's easy to disable. If all the inserts happened on the primary
> node and all the reads happened on the standby, then pending list would
> never be cleaned up if the cleanup is triggered only by read.

No, because the inserts would trigger VACUUM on the primary.

> I would suggest that we trigger cleanup by read at threshold size X and
> trigger cleanup by insert at threshold size 5X. That avoids the strange
> case mentioned, but generally ensures only reads trigger cleanup. (But
> why do we want that??)

I think that's actually not what we want. What we want is for VACUUM to
deal with it. Unfortunately that's hard to guarantee since, for example,
someone might turn autovacuum off. So the issue is what we do when we're
in the midst of an index scan and our TIDBitmap has become lossy. Right
now, the answer is that we clean up the pending list from inside the index
scan and then retry the index scan. I don't think that's going to work.
I'm starting to think that the right thing to do here is to create a
non-lossy option for TIDBitmap. Tom has been advocating just losing the
index scan AM altogether, but that risks losing performance in cases where
a LIMIT would have stopped the scan well prior to completion.

> I found many parts of the patch and docs quite confusing because of the
> way things are named. For me, this is a deferred or delayed insert
> technique to allow batching. I would prefer if everything used one
> description, rather than fast, pending, delayed etc.

I mentioned this in my previous review (perhaps not quite so articulately)
and I completely agree with you. It's clear enough reading the patch,
because you know that all the changes in the patch must be related to each
other, but once it's applied it's going to be tough to figure out.

> Personally, I see ginInsertCleanup() as a scheduled task unrelated to
> vacuum. Making the deferred tasks happen at vacuum time is just a
> convenient way of having a background task occur regularly. That's OK
> for now, but I would like to be able to request a background task
> without having to hook into AV.

This has been discussed previously, and I assume you will be submitting a
patch at some point, since no one else has volunteered to implement it. I
think autovacuum is the right way to handle this particular case, because
it is a cleanup operation that is not dependent on time but on write
activity and hooks into more or less the same stats infrastructure; but I
don't deny the existence of other cases that would benefit from a
scheduler.

...Robert
Re: [HACKERS] GIN fast insert
Robert Haas robertmh...@gmail.com writes:
> I'm starting to think that the right thing to do here is to create a
> non-lossy option for TIDBitmap. Tom has been advocating just losing the
> index scan AM altogether, but that risks losing performance in cases
> where a LIMIT would have stopped the scan well prior to completion.

Actually, I'm going to *insist* that we lose the index AM scan altogether.
There might be a possibility to put it back in 8.5 or later, if anyone
actually makes the case (with some evidence) that it's worth the trouble.
But right now, what this patch needs is to be made to work reliably, and
having a great deal of complexity added by an inessential feature is a
good way to make sure it doesn't go in at all.

regards, tom lane
Re: [HACKERS] GIN fast insert
On Mon, Feb 23, 2009 at 10:05 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> I'm starting to think that the right thing to do here is to create a
>> non-lossy option for TIDBitmap. Tom has been advocating just losing
>> the index scan AM altogether, but that risks losing performance in
>> cases where a LIMIT would have stopped the scan well prior to
>> completion.
> Actually, I'm going to *insist* that we lose the index AM scan
> altogether. There might be a possibility to put it back in 8.5 or later
> if anyone actually makes the case (with some evidence) that it's worth
> the trouble. But right now, what this patch needs is to be made to work
> reliably, and having a great deal of complexity added by an inessential
> feature is a good way to make sure it doesn't go in at all.

Except that the inessential feature in question is a feature that
currently WORKS, and I don't believe that the testing you've done is
anywhere near sufficient to show that no one will be upset if it goes
away. Without some convincing evidence to support that proposition, I
think it would be better to postpone the whole patch to 8.5 and use that
time to fix the problem, rather than slam something through now and then
possibly find out that we've introduced a performance regression in cases
that used to work well.

In fact, as far as I can see, the performance testing that has been done
on this patch to date is fairly one-sided. It focuses on the improvement
to index build speed, which is a good thing to test, but I don't believe
anyone has done any testing at all of the cases where the patch might
HURT. Previously, the relevant case was where someone has done a bunch of
fast-inserts but autovacuum has not kicked in yet, or the threshold to
trigger autovacuum has not been reached. Now we're adding to that every
query that currently uses an index scan, which seemed to be a pretty
common plan in the simple cases I tested. Maybe I'm worrying over nothing?

...Robert
Re: [HACKERS] GIN fast insert
Robert Haas robertmh...@gmail.com writes:
> On Mon, Feb 23, 2009 at 10:05 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Actually, I'm going to *insist* that we lose the index AM scan
>> altogether.
> Except that the inessential feature in question is a feature that
> currently WORKS, and I don't believe that the testing you've done is
> anywhere near sufficient to show that no one will be upset if it goes
> away.

What feature is that --- the ability to get an undefined subset of rows
quickly by using LIMIT without ORDER BY? Not much of a feature.

> Without some convincing evidence to support that proposition, I think
> it would be better to postpone the whole patch to 8.5 and use that time
> to fix the problem,

Wouldn't bother me any. We are way overdue for 8.4 already.

regards, tom lane
Re: [HACKERS] GIN fast insert
On Mon, Feb 23, 2009 at 1:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Feb 23, 2009 at 10:05 AM, Tom Lane t...@sss.pgh.pa.us wrote: Actually, I'm going to *insist* that we lose the index AM scan altogether. Except that the inessential feature in question is a feature that currently WORKS, and I don't believe that the testing you've done is anywhere near sufficient to show that no one will be upset if it goes away. What feature is that --- the ability to get an undefined subset of rows quickly by using LIMIT without ORDER BY? Not much of a feature. That's a huge over-generalization of the effect of removing index scans, as I am sure you are well aware. It only took me about 5 minutes to come up with a test case against CVS HEAD where disabling index scans resulted in a significant dropoff in performance. Here it is: create table foo (id serial, x int[], primary key (id)); create index foo_gin on foo using gin (x); insert into foo (x) select array[random()*1::integer, random()*1::integer, random()*1::integer, random()*1::integer] from generate_series(1,1); analyze foo; OK, now here's the query: select sum(1) from generate_series(1,1) g left join foo on array[g] @ x where x is null; On my system this takes about 45 ms to execute with default settings and about 90 ms to execute with index scan disabled. Without some convincing evidence to support that proposition, I think it would be better to postpone the whole patch to 8.5 and use that time to fix the problem, Wouldn't bother me any. We are way overdue for 8.4 already. 
I completely agree, but there are four patches on the CommitFest wiki that still need some committer attention before we close up shop: B-Tree Emulation for GIN, Improve Performance of Multi-Batch Hash Join for Skewed Data Sets, Proposal of PITR performance improvement, and SE-PostgreSQL Lite. I have already reviewed the second of these and believe it's in pretty good shape. I may review one or more of the others as time permits, especially if you or one of the other committers think that would be helpful, and especially if you say which of them it would be most helpful for. It is well past time to move Reducing some DDL Locks to ShareLock to committed and leave the unapplied portions for 8.5. As much as I would like to have the feature, it is probably also time to think about punting Hot Standby unless it's going to get committed RSN. At this point, we are definitely holding up both the release of 8.4 and development for 8.5. ...Robert
Re: [HACKERS] GIN fast insert
Right, can't do that on a hot standby server. Is there an applicable hot standby patch anywhere? The last version on the wiki is 9g, and it can't be applied cleanly.
Re: [HACKERS] GIN fast insert
Teodor Sigaev wrote: Right, can't do that on a hot standby server. Is anywhere applicable hot standby patch? Last version on wiki is 9g and it can't be applied cleanly. The latest version is in Simon's git repository at: http://git.postgresql.org/?p=~sriggs/simon.git;a=shortlog;h=refs/heads/dev_hot_standby in the dev_hot_standby branch. I don't think he's posted an updated patch based on that work. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] GIN fast insert
and I still think that's bogus. It seems clear that this is going to change much faster than plans are going to be invalidated, and if autovacuum is doing its job, the pending list won't get long enough to matter much anyway, right? I don't think this patch should be touching gincostestimate at all. Why not? For any estimation function it's possible to invent a workload that changes the cost (or the optimal plan) much faster than plans are invalidated. gincostestimate depends on the table's statistics, not on the real state of the index, and plan invalidation occurs after an analyze run. I am thinking that it may not be necessary to remove the gingettuple interface (as you did in v0.28.2). Forcing a cleanup of the pending list seems like a reasonable workaround. We don't expect this situation to come up frequently, so if the method we use to Agree handle it is not terribly efficient, oh well. The one thing that concerns me is - what will happen in a hot standby environment, when that patch is committed? In that situation, I believe that we can't modify any heap or index pages, so... I don't see a problem here, because indexes in Postgres don't depend on transaction IDs or modes the way the heap does. The WAL logger works without that knowledge too. Maybe I've missed or misunderstood something. Heap pages can be changed by setting commit-status bits on tuples even in a read-only transaction, but those changes are not WAL-logged; the same is true for index pages. 1. The description of the fastupdate reloption should be reworded for consistency with other options: Enables fast update feature for this GIN index Fixed 2. Why is this implemented as a string reloption rather than a boolean reloption? It seems like you want to model this on autovacuum_enabled. Fixed 3. Documentation wordsmithing. You have the following paragraph: Done 4. More wordsmithing. In the following paragraph, you have: Done 5.
In textsearch.sgml, you say that GIN indexes are moderately slower to update, but about 10x slower without fastupdate. Can we provide a real number in place of moderately? I don't know whether to think this means 20% or 2x. I'll make exact measurements a bit later :) 6. One of the comment changes in startScanEntry is simply a correction of a typographical error (deletion for deletition). You might as well commit this change separately and remove it from this patch. Sorry, I can't find what you mean. I found only begining (and fixed it). 7. pg_stat_get_fresh_inserted_tuples. I am not crazy about the fact that we call this the pending list in some places, fast update in some places, and now, here, fresh tuples. Let's just call it fast insert tuples. It really is the count of tuples inserted into the table (fresh tuples) since the last vacuum. This number doesn't depend on the existence of GIN indexes or their modes. 8. tbm_check_tuple. The opening curly brace should be uncuddled. The curly braces around wordnum = bitnum = 0 are superfluous. It's code copied from tbm_add_tuples, and I much prefer if { ... } else { ... } to if .. else { }. 9. gincostestimate. There are a lot of superfluous whitespace changes here, and some debugging code that obviously wasn't fully removed. Cleaned. 10. GinPageOpaqueData. Surely we can come up with a better name than GIN_LIST. This is yet another name for the same concept. Why not call this GIN_FAST_INSERT_LIST? It matches the other defines here: GIN_DATA, not GIN_DATA_PAGE; GIN_LEAF, not GIN_LEAF_TREE_PAGE. 11. ginInsertCleanup. Inserion is a typo. Fixed. Unfortunately, I don't understand all of this patch well enough to give it as thorough a going-over as it deserves, so my apologies for whatever I've missed. Thank you very much for your review.
fast_insert_gin-0.29.1.gz Description: Unix tar archive
Re: [HACKERS] GIN fast insert
On Thu, Feb 19, 2009 at 8:36 AM, Teodor Sigaev teo...@sigaev.ru wrote: and I still think that's bogus. It seems clear that this is going to change much faster than plans are going to be invalidated, and if autovacuum is doing its job, the pending list won't get long enough to matter much anyway, right? I don't think this patch should be touching gincostestimate at all. Why not? For any estimation function it's possible to invent workload which will change cost (or optimal plan) much faster than plan's invalidation. Gincostestimate depends on statistic on table, not on real index's state, and plan's invalidation will occur after analyze run. Hrm, hum, maybe you're right. I am thinking that it is may not be necessary to remove the gingettuple interface (as you did in v0.28.2). Forcing a cleanup of the pending list seems like a reasonable workaround. We don't expect this situation to come up frequently, so if the method we use to Agree handle it is not terribly efficient, oh well. The one thing that concerns me is - what will happen in a hot standby environment, when that patch is committed? In that situation, I believe that we can't call modify any heap or index pages, so... I don't see a problems here, because indexes in postgres don't depend on any transaction's ids or modes as heap depends. WAL-logger works without that knowledge too. May be I missed something here or don't understand. Although heap's pages could be changed by setting commit-status bits on tuple even in read-only transaction, but that changes are not WAL-logged. That is correct for index's page too. It would be helpful if Heikki or Simon could jump in here, but my understanding is that cleaning up the pending list is a read-write operation. I don't think we can do that on a hot standby server. ...Robert
Re: [HACKERS] GIN fast insert
Robert Haas wrote: On Thu, Feb 19, 2009 at 8:36 AM, Teodor Sigaev teo...@sigaev.ru wrote: handle it is not terribly efficient, oh well. The one thing that concerns me is - what will happen in a hot standby environment, when that patch is committed? In that situation, I believe that we can't call modify any heap or index pages, so... I don't see a problems here, because indexes in postgres don't depend on any transaction's ids or modes as heap depends. WAL-logger works without that knowledge too. May be I missed something here or don't understand. Although heap's pages could be changed by setting commit-status bits on tuple even in read-only transaction, but that changes are not WAL-logged. That is correct for index's page too. It would be helpful if Heikki or Simon could jump in here, but my understanding is that cleaning up the pending list is a read-write operation. I don't think we can do that on a hot standby server. Right, can't do that on a hot standby server.
Re: [HACKERS] GIN fast insert
On Tue, Feb 17, 2009 at 2:28 PM, Teodor Sigaev teo...@sigaev.ru wrote: Hi there, we present two variants of GIN fast insert patch, since we're not sure what is the best solution: v0.28.1 - remove disable cost in gincostestimate - per http://archives.postgresql.org/message-id/499466d2.4010...@sigaev.ru gingettuple could force cleanup of pending list if it got a lossy tidbitmap. If such cleanup occurs the gingettuple will rescan pending list again. v0.28.2 - remove disable cost in gincostestimate - per http://archives.postgresql.org/message-id/12795.1234379...@sss.pgh.pa.us AM can now have only one search method: amgettuple or amgetbitmap. - GIN now has only amgetbitmap interface I reviewed v0.28.1. I see that disable_cost is gone from gincostestimate, but you're still using the pending list to set costs, and I still think that's bogus. It seems clear that this is going to change much faster than plans are going to be invalidated, and if autovacuum is doing its job, the pending list won't get long enough to matter much anyway, right? I don't think this patch should be touching gincostestimate at all. I am thinking that it may not be necessary to remove the gingettuple interface (as you did in v0.28.2). Forcing a cleanup of the pending list seems like a reasonable workaround. We don't expect this situation to come up frequently, so if the method we use to handle it is not terribly efficient, oh well. The one thing that concerns me is - what will happen in a hot standby environment, when that patch is committed? In that situation, I believe that we can't modify any heap or index pages, so... Some other assorted minor comments on v0.28.1... 1. The description of the fastupdate reloption should be reworded for consistency with other options: Enables fast update feature for this GIN index 2. Why is this implemented as a string reloption rather than a boolean reloption? It seems like you want to model this on autovacuum_enabled. 3. Documentation wordsmithing.
You have the following paragraph: As of productnamePostgreSQL/productname 8.4, this problem is alleviated by postponing most of the work until the next commandVACUUM/. Newly inserted index entries are temporarily stored in an unsorted list of pending entries. commandVACUUM/ inserts all pending entries into the main acronymGIN/acronym index data structure, using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional vacuum overhead. Here is my proposed rewrite: As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. When the table is vacuumed, or in some cases when the pending list becomes too large, the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional vacuum overhead. 4. More wordsmithing. In the following paragraph, you have: It's recommended to use properly-configured autovacuum with tables having acronymGIN/acronym indexes, to keep this overhead to reasonable levels. I think it is clearer and more succinct to write simply: Proper use of autovacuum can minimize this problem. 5. In textsearch.sgml, you say that GIN indexes are moderately slower to update, but about 10x slower without fastupdate. Can we provide a real number in place of moderately? I don't know whether to think this means 20% or 2x. 6. One of the comment changes in startScanEntry is simply a correction of a typographical error (deletion for deletition). You might as well commit this change separately and remove it from this patch. 7. pg_stat_get_fresh_inserted_tuples. I am not crazy about the fact that we call this the pending list in some places, fast update in some places, and now, here, fresh tuples.
Let's just call it fast insert tuples. 8. tbm_check_tuple. The opening curly brace should be uncuddled. The curly braces around wordnum = bitnum = 0 are superfluous. 9. gincostestimate. There are a lot of superfluous whitespace changes here, and some debugging code that obviously wasn't fully removed. 10. GinPageOpaqueData. Surely we can come up with a better name than GIN_LIST. This is yet another name for the same concept. Why not call this GIN_FAST_INSERT_LIST? 11. ginInsertCleanup. Inserion is a typo. Unfortunately, I don't understand all of this patch well enough to give it as thorough a going-over as it deserves, so my apologies for whatever I've missed. ...Robert
Re: [HACKERS] GIN fast insert
So? Barring some evidence that there's a significant performance win from a conventional indexscan, this is a weak argument. AFAICS the only significant advantage of the conventional API is to support ordered scans, and GIN doesn't do that anyway. What about SELECT ... AND EXISTS (SELECT ... t @@ query) ? But I don't believe that is a popular use-case. In most cases, GIN is used with a bitmap scan. Your emails are convincing, so I'll remove support for the amgettuple interface in GIN. Do you think we need to add a new pg_am boolean option, like pg_am.amcangettuple or pg_am.amcanpertuplescan?
Re: [HACKERS] GIN fast insert
Teodor Sigaev wrote: So? Barring some evidence that there's a significant performance win from a conventional indexscan, this is a weak argument. AFAICS the only significant advantage of the conventional API is to support ordered scans, and GIN doesn't do that anyway. What about SELECT ... AND EXISTS (SELECT ... t @@ query) ? But I don't believe that is popular use-case. In most cases, GIN is used with bitmap scan. Your emails are so convincing and I'll remove support amgettuple interface in GIN. SELECT * FROM foo WHERE t @@ query LIMIT 100 might be a fairly common use case.
Re: [HACKERS] GIN fast insert
SELECT * FROM foo WHERE t @@ query LIMIT 100 might be a fairly common use case. It seems to me - SELECT * FROM foo WHERE t @@ query *ORDER BY rank* LIMIT 100;
Re: [HACKERS] GIN fast insert
On Fri, Feb 13, 2009 at 8:00 AM, Teodor Sigaev teo...@sigaev.ru wrote: SELECT * FROM foo WHERE t @@ query LIMIT 100 might be a fairly common use case. It seems to me - SELECT * FROM foo WHERE t @@ query *ORDER BY rank* LIMIT 100; I'm not sure you can really assume that the ORDER BY rank will always be in there. ...Robert
Re: [HACKERS] GIN fast insert
Teodor Sigaev teo...@sigaev.ru writes: Do you think we need to add new pg_am boolean option? Like pg_am.amcangettuple or pg_am.amcanpertuplescan I think it's sufficient to mark this by setting amgettuple to zero. We might as well allow amgetbitmap to be zero, too, to mark the case where the AM doesn't want to support bitmap scans. regards, tom lane
Re: [HACKERS] GIN fast insert
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Teodor Sigaev wrote: But I don't believe that is popular use-case. In most cases, GIN is used with bitmap scan. Your emails are so convincing and I'll remove support amgettuple interface in GIN. SELECT * FROM foo WHERE t @@ query LIMIT 100 might be a fairly common use case. [ shrug... ] The proposed implementation fails to be particularly fast-start anyway, since it will process the entire pending queue before returning anything to the executor. regards, tom lane
Re: [HACKERS] GIN fast insert
[ shrug... ] The proposed implementation fails to be particularly fast-start anyway, since it will process the entire pending queue before returning anything to the executor. That is not true for fastupdate=off. In any case it could be improved, but improvements don't solve the issue with the lossy bitmap.
Re: [HACKERS] GIN fast insert database hang
This freezes the whole system even with autovacuum = off in postgresql.conf. As before, the backends wait on a semop() call. Fixed. There was a deadlock between LockBufferForCleanup and LockBuffer(SHARE). I redesigned that place to downgrade LockBufferForCleanup to LockBuffer(EXCLUSIVE), with a correction of the page locking during the scan of the pending list. I was able to reproduce the recovery failure this way once as well, but that part of the problem seems to be much more erratic. Most of Fixed too. I had missed this comment on XLogInsert: * NB: this routine feels free to scribble on the XLogRecData structs, * though not on the data they reference. This is OK since the XLogRecData * structs are always just temporaries in the calling code. and I was reusing a once-initialized XLogRecData many times in a loop. fast_insert_gin-0.27.gz Description: Unix tar archive
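The XLogInsert comment quoted above names exactly this bug class: a callee that scribbles on the record struct it is handed, combined with a caller that initializes the struct once and reuses it across loop iterations. The following is a hypothetical Python analogue (all names invented; the real XLogRecData handling is C code in PostgreSQL), just to illustrate why only the first iteration behaves correctly:

```python
# Analogue of the XLogRecData reuse bug: the callee mutates its argument,
# so a caller that reuses one record object sees the scribbles next time.

def xlog_insert(rec):
    """Like XLogInsert, this feels free to scribble on the struct."""
    rec["next"] = None            # callee mutates its argument
    rec["len"] = 0
    return rec["data"]

def buggy_caller(pages):
    rec = {"data": None, "len": 8, "next": "chain"}   # initialized once
    logged = []
    for page in pages:
        rec["data"] = page
        # From the second iteration on, rec still carries the callee's
        # scribbles from the previous call (len == 0, next is None).
        logged.append((rec["len"], xlog_insert(rec)))
    return logged

def fixed_caller(pages):
    logged = []
    for page in pages:
        rec = {"data": page, "len": 8, "next": "chain"}  # re-init per call
        logged.append((rec["len"], xlog_insert(rec)))
    return logged

assert buggy_caller(["A", "B"]) == [(8, "A"), (0, "B")]   # corrupted state
assert fixed_caller(["A", "B"]) == [(8, "A"), (8, "B")]
```

The fix mirrors the one described in the message: re-initialize the record structure on every iteration instead of reusing it.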
Re: [HACKERS] GIN fast insert
But the *real* problem is that you simply can not guarantee that someone doesn't increase the size of the pending list between the time If the inserting process has a bigger work_mem. Agreed. What did you think of the idea of simply abandoning support for conventional indexscans in GIN? I agree that we could probably kluge something to make conventional scans still work reliably, but it seems to me that it'd be ugly, fragile, and quite possibly slow enough to not ever beat bitmap scans anyway. I don't like this idea because it forbids conventional indexscans even with fastupdate=off. May a read-only query change the index? The index doesn't use xmin/xmax/cmin/cmax anyhow, so it doesn't depend on transaction state. If so, gingettuple could clean up the pending list if it got a lossy bitmap, and then repeat the search. Although that could be slow, it would never produce failures, it would happen very rarely (and GIN could emit a WARNING/NOTICE/LOG message), and this solution allows removing the disabling of indexscans in gincostestimate.
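Teodor's fallback idea — force a cleanup of the pending list whenever a scan would get a lossy bitmap, then rescan — can be sketched as a retry loop. This is a hypothetical sketch with invented names and a dict standing in for index state, not the actual gingettuple code:

```python
# Sketch of the proposed fallback: if scanning the pending list would
# yield a lossy bitmap, force a cleanup and rescan. All names invented.

class LossyBitmapError(Exception):
    """Raised when the pending-list bitmap would degrade to lossy."""

def scan_with_pending_list(index):
    if index["pending"] > index["non_lossy_limit"]:
        raise LossyBitmapError
    return list(range(index["pending"]))   # stand-in for exact TIDs

def cleanup_pending_list(index):
    index["pending"] = 0    # entries moved into the main GIN structure

def gingettuple_like_scan(index):
    while True:
        try:
            return scan_with_pending_list(index)
        except LossyBitmapError:
            # Rare and possibly slow, but never a hard failure:
            # move pending entries into the main index and retry.
            cleanup_pending_list(index)

idx = {"pending": 500, "non_lossy_limit": 100}
tids = gingettuple_like_scan(idx)
assert idx["pending"] == 0   # cleanup was forced before the rescan
assert tids == []            # pending entries now live in the main index
```

The point of the loop is the guarantee Teodor describes: the scan may pay a one-time cleanup cost, but it can never fail outright because of a lossy bitmap.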
Re: [HACKERS] GIN fast insert
Teodor Sigaev teo...@sigaev.ru writes: What did you think of the idea of simply abandoning support for conventional indexscans in GIN? I don't like this idea because it forbids conventional indexscans even with fastupdate=off. So? Barring some evidence that there's a significant performance win from a conventional indexscan, this is a weak argument. AFAICS the only significant advantage of the conventional API is to support ordered scans, and GIN doesn't do that anyway. regards, tom lane
Re: [HACKERS] GIN fast insert
On Thu, Feb 12, 2009 at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: Teodor Sigaev teo...@sigaev.ru writes: What did you think of the idea of simply abandoning support for conventional indexscans in GIN? I don't like this idea because it forbids conventional indexscans even with fastupdate=off. So? Barring some evidence that there's a significant performance win from a conventional indexscan, this is a weak argument. AFAICS the only significant advantage of the conventional API is to support ordered scans, and GIN doesn't do that anyway. Wouldn't it force you to recheck all tuples on the page, instead of just rechecking the one of interest? ...Robert
Re: [HACKERS] GIN fast insert
Robert Haas robertmh...@gmail.com writes: On Thu, Feb 12, 2009 at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: So? Barring some evidence that there's a significant performance win from a conventional indexscan, this is a weak argument. AFAICS the only significant advantage of the conventional API is to support ordered scans, and GIN doesn't do that anyway. Wouldn't it force you to recheck all tuples on the page, instead of just rechecking the one of interest? In the scenario at hand you'd have to do that anyway. Bear in mind that if the query is predicted to return more than a few rows, the planner is always going to pick bitmap scan anyway. So this whole issue is really only going to arise when you have a very bad rowcount prediction (or a very stale plan), leading to a choice of indexscan plan followed by enough rows actually returned to make the TID bitmap become lossy. That's certainly within the realm of possibility, particularly since we often don't have good estimators for GIN-compatible operators. But I think designing to squeeze every last bit of performance out of the case is a mistake. We should be satisfied to have correctness. In the end this is a tradeoff: how much complexity and loss of maintainability are we willing to accept to squeeze out a bit more performance? I'm leaning to the KISS end of that choice. The tests I did yesterday suggested to me that it would be difficult even to measure a performance gain from supporting conventional indexscan in GIN. IMHO the kinds of schemes that are being tossed around here are not remotely sane to choose if they don't lead to *big* wins. regards, tom lane
Re: [HACKERS] GIN fast insert
What would be wrong with letting it degrade to lossy? I suppose the reason it's trying to avoid that is to avoid having to recheck all the rows on that page when it comes time to do the index insertion; but surely having to do that is better than having arbitrary, unpredictable failure conditions. No, I don't think that's it. See here, beginning with the problem with lossy tbm has two aspects: http://archives.postgresql.org/message-id/4974b002.3040...@sigaev.ru Right. Some comments on those points: - The amgettuple interface has no way to work with a page-wide result instead of an exact ItemPointer; amgettuple cannot return just a block number as amgetbitmap can. It's not so difficult to teach GIN to return ItemPointers one by one from the pending list and eliminate this point: GIN would not collect matched ItemPointers in a tidbitmap but would return them immediately. But: - Because of a concurrent vacuum process, while we scan the pending list its content could be transferred into the regular structure of the index, and then we would find the same tuple twice. Again, amgettuple has no protection from that; only amgetbitmap has it. So, we need to filter results from the regular GIN structure by results from the pending list, and for that filtering we can't use a lossy tbm. Again, we are talking about the amgettuple interface. We need to filter results from the regular GIN structure by results from the pending list. Right now GIN does it by looking up each matched ItemPointer in a tidbitmap constructed from the pending list. We could use an ordered array of ItemPointers instead of a tidbitmap, but I don't believe it would take less memory. It's impossible to rescan the pending list, for two reasons: a) it's too slow, and b) a concurrent cleanup process (vacuum, or now insert) could move tuples into the regular structure. Would it be acceptable to add an option to tidbitmap that forbids it from becoming lossy? That would remove the disabling of index scans in gincostestimate and the check for a non-lossy tidbitmap.
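The reason a lossy tbm cannot be used for this filtering can be shown with a toy model. The class below is an invented, much-simplified stand-in for PostgreSQL's tidbitmap (the real code works with page chunks and WordNum bitmaps; nothing here matches its actual API): once a page degrades to lossy, individual offsets are discarded, so a membership test can only answer "maybe" for that page, and using it to filter main-index results against pending-list results would report false matches for every tuple on the lossy page.

```python
# Toy model of a TID bitmap, illustrating why lossy pages break filtering.
# This is NOT PostgreSQL's tidbitmap.c; names and the memory limit are invented.

class ToyTidBitmap:
    def __init__(self, max_exact_pages=2):
        self.exact = {}          # page -> set of tuple offsets
        self.lossy = set()       # pages whose offsets were discarded
        self.max_exact_pages = max_exact_pages

    def add(self, page, offset):
        if page in self.lossy:
            return
        self.exact.setdefault(page, set()).add(offset)
        # Over the memory budget: degrade the fattest page to lossy.
        while len(self.exact) > self.max_exact_pages:
            victim = max(self.exact, key=lambda p: len(self.exact[p]))
            del self.exact[victim]
            self.lossy.add(victim)

    def contains(self, page, offset):
        if page in self.lossy:
            return True          # can only say "maybe": offsets are gone
        return offset in self.exact.get(page, set())

tbm = ToyTidBitmap(max_exact_pages=2)
for page, off in [(1, 1), (1, 2), (2, 5), (3, 7)]:   # 3 pages > budget of 2
    tbm.add(page, off)

# Page 1 went lossy, so a tuple that was never inserted now "matches":
assert tbm.contains(1, 99)      # false positive -- why exact filtering breaks
assert not tbm.contains(3, 8)   # exact pages still answer precisely
```

This is exactly why the message asks for a tidbitmap option that forbids going lossy: a filter needs exact per-tuple answers, which a lossy page cannot give.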
In the current version, cleanup of the pending list starts much earlier than the non-lossy limit is reached in typical use-cases. An inserting process will start cleanup when at least one trigger fires: - the number of heap tuples in the pending list could produce a lossy tidbitmap; - the total size of the pending list is greater than work_mem. The second trigger was designed to speed up bulk insertion (which is used in cleanup), because cleanup will collect the whole pending list in memory at once. It is also stricter than the first one, because in the typical GIN use-case a heap tuple is rather big. I believe a user could get GIN's error about work_mem only intentionally: - turn off autovacuum - set a big work_mem - populate the table with a GIN index (by the needed number of insertions) - prepare a query which will return a lot of results (possibly with seqscan=off, because the cost of scanning the pending list grows fast) - decrease work_mem by at least ten times - execute the query
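The two cleanup triggers described above can be summarized in a small sketch. The constants and function names below are invented for illustration (the actual thresholds live in the GIN fast-insert code and differ); the point is only that an insert kicks off cleanup when either condition holds:

```python
# Hypothetical sketch of the two pending-list cleanup triggers described
# above (invented names and numbers; not the actual GIN code).

BYTES_PER_EXACT_TUPLE = 6   # assumed per-TID cost of an exact bitmap entry

def should_start_cleanup(pending_tuples, pending_list_bytes, work_mem_bytes):
    """An insert kicks off pending-list cleanup if either trigger fires."""
    # Trigger 1: enough pending tuples that a bitmap built over the
    # pending list might not fit exactly in work_mem -> could go lossy.
    could_go_lossy = pending_tuples * BYTES_PER_EXACT_TUPLE > work_mem_bytes
    # Trigger 2: the pending list itself no longer fits in work_mem,
    # so bulk insertion could not slurp it into memory in one pass.
    too_big_for_bulk = pending_list_bytes > work_mem_bytes
    return could_go_lossy or too_big_for_bulk

work_mem = 1024 * 1024  # 1 MB
assert not should_start_cleanup(1000, 500 * 1024, work_mem)
assert should_start_cleanup(1000, 2 * 1024 * 1024, work_mem)   # trigger 2
assert should_start_cleanup(500_000, 500 * 1024, work_mem)     # trigger 1
```

Because pending heap tuples are usually much larger than a bitmap entry, the size trigger fires long before the lossy trigger in typical use, which is the claim made above.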
Re: [HACKERS] GIN fast insert
I believe that user could get GIN's error about work_mem only intentionally: - turn off autovacuum Meanwhile, in the other thread, we're having a discussion about people wanting to do exactly this on a database-wide basis during peak load hours... - set big work_mem - populate table with GIN index (by needed number of insertion) - prepare query which will return a lot of results (possibly, with seqscan=off because cost of scan of pending list grows fast) - decrease work_mem for at least ten times - execute query Why would the new work_mem need to be 10x smaller than the old work_mem? ...Robert
Re: [HACKERS] GIN fast insert
Robert Haas wrote: I believe that user could get GIN's error about work_mem only intentionally: - turn off autovacuum Meanwhile, in the other thread, we're having a discussion about people wanting to do exactly this on a database-wide basis during peak load hours... - decrease work_mem for at least ten times - execute query Why would the new work_mem need to be 10x smaller than the old work mem? That is the way to get GIN's error emitted: work_mem should be decreased to have a chance of getting a lossy tidbitmap.
Re: [HACKERS] GIN fast insert
Teodor Sigaev teo...@sigaev.ru writes: Robert Haas wrote: Why would the new work_mem need to be 10x smaller than the old work mem? That is is way to get GIN's error emitted. Work_mem should be decreased to catch a chance to get lossy tidbitmap. But it only has to be marginally lower, not 10x lower. And there are plenty of scenarios where different backends might be running with different work_mem settings. But the *real* problem is that you simply can not guarantee that someone doesn't increase the size of the pending list between the time that someone else commits to an indexscan plan and the time that they execute that plan. This scheme will result in random failures for concurrent inserts/selects, and that's not acceptable. What did you think of the idea of simply abandoning support for conventional indexscans in GIN? I agree that we could probably kluge something to make conventional scans still work reliably, but it seems to me that it'd be ugly, fragile, and quite possibly slow enough to not ever beat bitmap scans anyway. regards, tom lane
Re: [HACKERS] GIN fast insert database hang
On Wed, Feb 11, 2009 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote: I'm going to try to reproduce this, but here's approximately what I did. OK, I've managed to build a reproducible test case. Initial setup is just as I had before: create table foo (id serial, x int[], primary key (id)); create index foo_gin on foo using gin (x); Then just start these two commands running in different windows and wait: while true; do psql -c explain analyze select sum(1) from foo where array[1] @ x; done while true; do psql -c insert into foo (x) select array[(random() * 100)::int, (random() * 90)::int, (random()*80)::int] from generate_series(1,10);; done I did this four times, sometimes with the variant of adding set enable_bitmapscan to false; before the explain analyze. In three cases the database recovered successfully after the immediate shutdown; in the other case, it crapped out much as described in my original email. ...Robert
Re: [HACKERS] GIN fast insert database hang
I did this four times, sometimes with the variant of adding set enable_bitmapscan to false; before the explain analyze. In three cases the database recovered successfully after the immediate shutdown; in the other case, it crapped out much as described in my original email. Sorry to keep replying to myself, but I've figured out that autovacuum is not required to trigger this bug. In fact, I can reliably trigger it much more quickly just by starting two concurrent copies of: psql -c insert into foo (x) select array[(random() * 100)::int, (random() * 90)::int, (random()*80)::int] from generate_series(1,10); This freezes the whole system even with autovacuum = off in postgresql.conf. As before, the backends wait on a semop() call. I was able to reproduce the recovery failure this way once as well, but that part of the problem seems to be much more erratic. Most of the time after an immediate shutdown, pg_ctl start triggers recovery followed by normal startup. ...Robert
Re: [HACKERS] GIN fast insert
Robert Haas robertmh...@gmail.com writes:
> I think this is related to the problems with gincostestimate() that Tom Lane was complaining about here:
> http://archives.postgresql.org/message-id/20441.1234209...@sss.pgh.pa.us
> I am not 100% sure I'm understanding this correctly, but I think the reason why gincostestimate() is so desperate to avoid index scans when the pending list is long is because it knows that scanFastInsert() will blow up if an index scan is actually attempted, because of the aforementioned TIDBitmap problem. This seems unacceptably fragile.

Yipes. If that's really the reason then I agree, it's a nonstarter.

> I think this code needs to be somehow rewritten to make things degrade gracefully when the pending list is long - I'm not sure what the best way to do that is. Inventing a new data structure to store TIDs that is never lossy seems like it might work, but you'd have to think about what to do if it got too big.

What would be wrong with letting it degrade to lossy? I suppose the reason it's trying to avoid that is to avoid having to recheck all the rows on that page when it comes time to do the index insertion; but surely having to do that is better than having arbitrary, unpredictable failure conditions.

It strikes me that part of the issue here is that the behavior of this code is much better adapted to the bitmap-scan API than the traditional indexscan API. Since GIN doesn't support ordered scan anyway, I wonder whether it wouldn't be more sensible to simply allow it to not offer the traditional API. It should be easy to make the planner ignore plain indexscan plans for an AM that didn't support them.

regards, tom lane
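[Editor's note: for readers following along, here is a toy Python model of what "degrade to lossy" means for a TID bitmap. This is not PostgreSQL's actual C code from tidbitmap.c; the class and method names are hypothetical, and the eviction policy is simplified. The point it illustrates is the one under discussion: once the bitmap exceeds its memory budget it stops remembering individual tuple offsets for some pages and records only "this page has at least one match", which is exactly what forces the per-page recheck Tom mentions.]

```python
# Toy model (hypothetical names, not PostgreSQL's tidbitmap.c) of a TID
# bitmap that degrades pages to "lossy" under a memory budget.

class TidBitmap:
    def __init__(self, max_exact_pages):
        self.max_exact_pages = max_exact_pages
        self.exact = {}      # page number -> set of exact tuple offsets
        self.lossy = set()   # pages where only "something matched" is known

    def add_tuple(self, page, offset):
        if page in self.lossy:
            return           # page already lossy; offsets are lost anyway
        self.exact.setdefault(page, set()).add(offset)
        if len(self.exact) > self.max_exact_pages:
            self._lossify_one()

    def _lossify_one(self):
        # Over budget: degrade one exact page to a lossy marker.
        page, _offsets = self.exact.popitem()
        self.lossy.add(page)

    def needs_recheck(self, page):
        # A lossy page forces the executor to recheck every tuple on it.
        return page in self.lossy

bm = TidBitmap(max_exact_pages=2)
for page, off in [(1, 10), (1, 11), (2, 5), (3, 7)]:
    bm.add_tuple(page, off)

# Adding a third page exceeded the budget, so one page went lossy:
assert len(bm.exact) == 2 and len(bm.lossy) == 1
```

The trade-off in the thread follows directly: lossy pages bound memory use gracefully, but a consumer that needs exact TIDs (as the pending-list scan apparently did) cannot tolerate them, which is the fragility being objected to.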
Re: [HACKERS] GIN fast insert
On Tue, Feb 10, 2009 at 10:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I think this code needs to be somehow rewritten to make things degrade gracefully when the pending list is long - I'm not sure what the best way to do that is. Inventing a new data structure to store TIDs that is never lossy seems like it might work, but you'd have to think about what to do if it got too big.
> What would be wrong with letting it degrade to lossy? I suppose the reason it's trying to avoid that is to avoid having to recheck all the rows on that page when it comes time to do the index insertion; but surely having to do that is better than having arbitrary, unpredictable failure conditions.

No, I don't think that's it. See here, beginning with "the problem with lossy tbm has two aspects":

http://archives.postgresql.org/message-id/4974b002.3040...@sigaev.ru

> It strikes me that part of the issue here is that the behavior of this code is much better adapted to the bitmap-scan API than the traditional indexscan API. Since GIN doesn't support ordered scan anyway, I wonder whether it wouldn't be more sensible to simply allow it to not offer the traditional API. It should be easy to make the planner ignore plain indexscan plans for an AM that didn't support them.

If that doesn't lead to a performance degradation, I think it would be a good idea. It certainly seems like it would allow this patch to be a LOT simpler, cleaner, and more robust.

...Robert
Re: [HACKERS] GIN fast insert
Robert Haas robertmh...@gmail.com writes:
> On Tue, Feb 10, 2009 at 10:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> It strikes me that part of the issue here is that the behavior of this code is much better adapted to the bitmap-scan API than the traditional indexscan API. Since GIN doesn't support ordered scan anyway, I wonder whether it wouldn't be more sensible to simply allow it to not offer the traditional API. It should be easy to make the planner ignore plain indexscan plans for an AM that didn't support them.
> If that doesn't lead to a performance degradation, I think it would be a good idea.

For queries that select only a single index entry, there might be some speed degradation, but a quick test suggests it's in the single-digit-percentage range if everything's cached; and of course if you have to go to disk then the extra CPU cycles to push a bitmap around are lost in the noise.

In any case, as a wise man once said, "I can make this code arbitrarily fast, if it doesn't have to give the right answer." Blowing up in easily foreseeable circumstances isn't my idea of giving the right answer.

regards, tom lane
Re: [HACKERS] GIN fast insert
On Tue, Feb 10, 2009 at 11:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> For queries that select only a single index entry, there might be some speed degradation, but a quick test suggests it's in the single-digit-percentage range if everything's cached; and of course if you have to go to disk then the extra CPU cycles to push a bitmap around are lost in the noise. In any case, as a wise man once said, "I can make this code arbitrarily fast, if it doesn't have to give the right answer." Blowing up in easily foreseeable circumstances isn't my idea of giving the right answer.

Sure, but dropping index-scan support is not the only way of fixing that problem. It has a certain elegance to it, but if it produces a perceptible slowdown, it may not be the best approach.

...Robert
Re: [HACKERS] gin fast insert performance
> Here is a test of the fast insert patch. The patch has gone through some changes, so this set of tests is to see the current performance impact compared with HEAD. The test is simple: inserting a bunch of integer arrays into a table with a GIN index on the array column. I'm testing with small work_mem and large work_mem because the smaller work_mem should be periodically flushing the pending list throughout a large insert, while large work_mem should allow larger batches to build up before flushing the pending list. For HEAD, larger work_mem should have no effect.

You didn't provide distributions of array elements, number of unique elements, and so on. So I made a simple test script which generates data rather close to a typical tsearch installation (see tst.sql). Results on my notebook (in seconds):

               fastupdate |  insert  | vacuum
    ----------+-----------+----------+--------
     10 rows  |    off    |  316.147 |  0.770
              |    on     |   65.461 | 12.795
     100 rows |    off    | 16 hours |   -
              |    on     | 6612.595 | 12.795

I stopped the test with fastupdate=off and one million rows - it ran too long :). Changes in postgresql.conf:

    shared_buffers=128MB
    temp_buffers=16MB
    work_mem=16MB
    maintenance_work_mem=256MB
    effective_cache_size=1024MB
    autovacuum=off

The fastest way is to create the table, fill it, create the index, and vacuum it (for 10 records):

    17 secs to insert
    27 secs to create an index
     1 sec  to vacuum

So, in summary, it takes 45 secs that way, versus 78 secs with fast update and 317 secs without fast update. I think it's a win in performance.

> With the fast insert patch, the total time for insert + vacuum isn't much different with increased work_mem, but increased work_mem clearly defers a lot of the work to VACUUM.

Increased work_mem clearly *may* defer a lot of the work to VACUUM - because in the real world it's impossible to predict when clearing of the pending list will be started, and autovacuum will usually fire the clearing earlier than the pending list reaches the limit.
> So, unfortunately, the fast insert patch does not appear to bring the overall time anywhere close to building the index from scratch. When work_mem is set to 1GB, the VACUUM took about twice as long to run as the entire index build. Teodor, can you explain why that might be?

Yeah, creation of an index is much more optimizable than sequential insertions. With fast update enabled there is the overhead of writing the pending pages (and WAL too), and those pages must then be read back to collect the data into memory. Next, the clearing process uses work_mem instead of maintenance_work_mem, which is usually greater. Also, the bulk-insertion algorithm (used in index creation and in pending-list cleanup too) likes tids at the end of the table: if the lowest tid to insert is greater than the latest tid in the current tree, the algorithm can insert more than one tid at once.

> It does show improvement, and I think my test case might just be too small.

If the dataset is bigger, the improvement is better :).

> It seems like a lot of time is used inserting into the pending list, but it seems like it should be a simple operation. Maybe that can be optimized?

As you can see (ginfast.c), insertion into the pending list can leave pages not fully filled; in the worst case the pending list will contain about 50% unused space (if every indexed value takes GIN_PAGE_FREESIZE+1 bytes, then each value takes two pages). This is the price of keeping concurrency at a high level :(. If you have an idea how to do compact, fast and concurrent insertion into the pending list (or another structure) while keeping search time reasonable, please don't be quiet :)

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/
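[Editor's note: Teodor's worst-case figure - each value just over a page, so each value claims two pages and roughly half the space is wasted - can be checked with simple arithmetic. Below is a toy Python model; the page size and the assumption that an oversized value's pages are not shared with the next value are hypothetical simplifications, not the real constants or layout from PostgreSQL's ginblock.h.]

```python
# Toy model of pending-list page filling (hypothetical sizes, not the
# real GIN_PAGE_FREESIZE from PostgreSQL).  In this worst-case model a
# value occupies whole pages and cannot share its last partial page
# with the next value.
import math

PAGE_FREESIZE = 8000  # usable bytes per pending-list page (assumed)

def pages_for_value(size):
    # Number of pending-list pages a single value of `size` bytes claims.
    return math.ceil(size / PAGE_FREESIZE)

# The worst case from the thread: every value is one byte over a page.
size = PAGE_FREESIZE + 1
pages = pages_for_value(size)                  # two pages per value
utilization = size / (pages * PAGE_FREESIZE)   # fraction actually used

assert pages == 2
assert 0.50 < utilization < 0.51  # i.e. about 50% of the space is unused
```

Which is the ~50% waste Teodor describes: the space overhead is bounded, but it is real, and it is accepted deliberately to keep pending-list insertion highly concurrent.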
Re: [HACKERS] gin fast insert performance
Sorry, I lost the test script - here it is. BTW, is btree_gin ready to commit in your opinion?

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/

    CREATE OR REPLACE FUNCTION gena() RETURNS _int4 AS $$
        SELECT array(
            SELECT (10*random())::int
            FROM generate_series( 0, 2 + (100*random())::int )
        );
    $$ LANGUAGE SQL VOLATILE;

    \echo FU = off =
    DROP TABLE IF EXISTS ta;
    CREATE TABLE ta ( a int[] );
    CREATE INDEX taidx ON ta USING gin (a) WITH (fastupdate=off);
    INSERT INTO ta (SELECT gena() FROM generate_series(1,10));
    VACUUM ANALYZE ta;

    \echo FU = on =
    DROP TABLE IF EXISTS ta;
    CREATE TABLE ta ( a int[] );
    CREATE INDEX taidx ON ta USING gin (a) WITH (fastupdate=on);
    INSERT INTO ta (SELECT gena() FROM generate_series(1,10));
    VACUUM ANALYZE ta;
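[Editor's note: a rough Python model of the data that the script's gena() function produces may help readers see why it resembles tsearch-style data: arrays of roughly 3 to 103 elements drawn from only 11 distinct values, so a few "lexemes" recur in almost every row. This is an approximation of the SQL, not part of the original script; PostgreSQL's ::int cast rounds, so both endpoints of the value range are reachable.]

```python
# Rough Python model of the test script's gena() SQL function:
# each row gets an array of small integers in 0..10, with array
# length around 3..103 (from generate_series(0, 2 + (100*random())::int)).
import random

def gena():
    length = 3 + int(100 * random.random())       # ~3..103 elements
    return [round(10 * random.random()) for _ in range(length)]

rows = [gena() for _ in range(1000)]
assert all(3 <= len(r) <= 103 for r in rows)      # lengths in range
assert all(0 <= v <= 10 for r in rows for v in r) # only 11 distinct values
```

Data skewed this way (long arrays, few frequent elements) exercises GIN posting lists quite differently from Jeff's test of identical arrays, which may account for the differing results in this thread.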
Re: [HACKERS] gin fast insert performance
On Tue, 2009-01-27 at 20:36 +0300, Teodor Sigaev wrote:
> You didn't provide distributions of array elements, number of unique elements, and so on. So I made a simple test script which generates data rather close to a typical tsearch installation (see tst.sql).

The arrays I was inserting were actually all identical. In the case of a 1000-element array inserted 1 times, it was just ARRAY[1, 2, ..., 1000]. My test case must have been much too simple, but I expected that it would still benefit from fast insert.

> Increased work_mem clearly *may* defer a lot of the work to VACUUM - because in the real world it's impossible to predict when clearing of the pending list will be started, and autovacuum will usually fire the clearing earlier than the pending list reaches the limit.

Yes, that is the expected result and part of the design. It was just an observation, not a criticism. I will try with a better test case.

Regards,
Jeff Davis