Re: [HACKERS] GIN fast insert

2009-03-24 Thread David Fetter
On Tue, Mar 24, 2009 at 04:55:47PM -0700, Josh Berkus wrote: > On 3/24/09 1:18 PM, Tom Lane wrote: >> I've committed this patch with additional editorialization --- mostly >> cosmetic changes, except for removing the stats-driven cleanup in favor >> of letting cleanups occur during auto-ANALYZE, as

Re: [HACKERS] GIN fast insert

2009-03-24 Thread Josh Berkus
On 3/24/09 1:18 PM, Tom Lane wrote: I've committed this patch with additional editorialization --- mostly cosmetic changes, except for removing the stats-driven cleanup in favor of letting cleanups occur during auto-ANALYZE, as per my suggestion yesterday. By my count, this was the last patch l

Re: [HACKERS] GIN fast insert

2009-03-24 Thread Tom Lane
I've committed this patch with additional editorialization --- mostly cosmetic changes, except for removing the stats-driven cleanup in favor of letting cleanups occur during auto-ANALYZE, as per my suggestion yesterday. regards, tom lane -- Sent via pgsql-hackers mailing

Re: [HACKERS] GIN fast insert

2009-03-24 Thread Teodor Sigaev
ginInsertCleanup(); is it accurate? (If it isn't, I think Exactly, that is right. * This can be called concurrently by multiple backends, so it must cope. * On first glance it looks completely not concurrent-safe and not crash-safe * either. The reason it's okay is that multiple inserti

Re: [HACKERS] GIN fast insert

2009-03-24 Thread Tom Lane
Teodor Sigaev writes: > Oops, I was wrong, I supposed that all pages in chunk should be lossy, but > it's > true only for chunk page. So, tbm_add_page() should only call > tbm_mark_page_lossy()... OK, thanks, that's what I thought. I've changed it in the copy I'm editing here. I have another

Re: [HACKERS] GIN fast insert

2009-03-24 Thread Teodor Sigaev
Why is tbm_add_page() not coded as a simple wrapper around tbm_mark_page_lossy()? As coded, it sometimes forces a whole bunch of pages *other than* the target page to become lossy too, which I cannot see a reason for it to be doing. [after digging in tidbitmap] Oops, I was wrong, I supposed tha

Re: [HACKERS] GIN fast-insert vs autovacuum scheduling

2009-03-23 Thread Tom Lane
Jeff Davis writes: > On Mon, 2009-03-23 at 15:23 -0400, Tom Lane wrote: >> There is no need for any such infrastructure if we just drive it off a >> post-ANALYZE callback. > That sounds reasonable, although it does seem a little strange for > analyze to actually perform cleanup. My thought was t

Re: [HACKERS] GIN fast-insert vs autovacuum scheduling

2009-03-23 Thread Jeff Davis
On Mon, 2009-03-23 at 15:23 -0400, Tom Lane wrote: > There is no need for any such infrastructure if we just drive it off a > post-ANALYZE callback. That sounds reasonable, although it does seem a little strange for analyze to actually perform cleanup. Now that we have FSM, the cost of VACUUMing

Re: [HACKERS] GIN fast-insert vs autovacuum scheduling

2009-03-23 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> On top of those issues, there are implementation problems in the >> proposed relation_has_pending_indexes() check: > I wonder if it's workable to have GIN send pgstats a message with number > of fast-inserted tuples, and have autovacuum check that numbe

Re: [HACKERS] GIN fast insert

2009-03-23 Thread Tom Lane
While I'm looking at this thing... Why is tbm_add_page() not coded as a simple wrapper around tbm_mark_page_lossy()? As coded, it sometimes forces a whole bunch of pages *other than* the target page to become lossy too, which I cannot see a reason for it to be doing. rega

Re: [HACKERS] GIN fast-insert vs autovacuum scheduling

2009-03-23 Thread Alvaro Herrera
Tom Lane wrote: > On top of those issues, there are implementation problems in the > proposed relation_has_pending_indexes() check: it has hard-wired > knowledge about GIN indexes, which means the feature cannot be > extended to add-on index AMs; and it's examining indexes without any > lock whats

[HACKERS] GIN fast-insert vs autovacuum scheduling

2009-03-23 Thread Tom Lane
I'm looking again at the fast-insert patch, and I find myself still desperately unhappy about the mechanism for scheduling autovacuum cleanup of pending insertions. I complained about that before, but I think I only cited a worry about adding overhead to statistics tracking in order to have the "r

Re: [HACKERS] GIN fast insert

2009-02-26 Thread Robert Haas
On Thu, Feb 26, 2009 at 11:41 AM, Teodor Sigaev wrote: >> it be?  So far we've ruled out using the planner to prevent index >> scans when the pending list is long (because it's not reliable) and >> cleaning up the pending list during insert when needed (because it >> won't work with Hot Standby).

Re: [HACKERS] GIN fast insert

2009-02-26 Thread Teodor Sigaev
it be? So far we've ruled out using the planner to prevent index scans when the pending list is long (because it's not reliable) and cleaning up the pending list during insert when needed (because it won't work with Hot Standby). We haven't decided what WILL work, During insert it will work wi

Re: [HACKERS] GIN fast insert

2009-02-26 Thread Teodor Sigaev
Teodor, can you confirm * we WAL log the insert into the pending list * we WAL log the move from the pending list to the main index Yes, I can and I confirm * that we maintain the pending list correctly during redo so that it can be accessed by index scans Not sure about correct locking in gi

Re: [HACKERS] GIN fast insert

2009-02-24 Thread Robert Haas
On Tue, Feb 24, 2009 at 2:56 PM, Tom Lane wrote: >> On the other hand, Teodor showed a typical use case and a very >> substantial performance gain: > > Yeah.  Whatever we do here is a tradeoff (and whether Robert likes it > or not, reliability and code maintainability weigh heavily in the > tradeo

Re: [HACKERS] GIN fast insert

2009-02-24 Thread Tom Lane
Jeff Davis writes: > On Tue, 2009-02-24 at 00:18 -0500, Robert Haas wrote: >> It only took me about 5 minutes to come up with a test case against CVS >> HEAD where disabling index scans resulted in a significant dropoff in >> performance. Here it is: > On the other hand, Teodor showed a typical

Re: [HACKERS] GIN fast insert

2009-02-24 Thread Jeff Davis
On Tue, 2009-02-24 at 00:18 -0500, Robert Haas wrote: > It only took me about 5 minutes to come up with a test case against CVS > HEAD where disabling index scans resulted in a significant dropoff in > performance. Here it is: On the other hand, Teodor showed a typical use case and a very subs

Re: [HACKERS] GIN fast insert

2009-02-24 Thread Robert Haas
On Tue, Feb 24, 2009 at 10:39 AM, Tom Lane wrote: > Robert Haas writes: >> On my system this takes about 45 ms to execute with default settings >> and about 90 ms to execute with index scan disabled. > > [ shrug... ]  That's well within my threshold of pain for this. > In any case, it might be po

Re: [HACKERS] GIN fast insert

2009-02-24 Thread Tom Lane
Robert Haas writes: > On my system this takes about 45 ms to execute with default settings > and about 90 ms to execute with index scan disabled. [ shrug... ] That's well within my threshold of pain for this. In any case, it might be possible to buy some/all of that back with minor optimization

Re: [HACKERS] GIN fast insert

2009-02-23 Thread Robert Haas
On Mon, Feb 23, 2009 at 1:35 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Feb 23, 2009 at 10:05 AM, Tom Lane wrote: >>> Actually, I'm going to *insist* that we lose the index AM scan >>> altogether. > >> Except that the "inessential" feature in question is a feature that >> currently WOR

Re: [HACKERS] GIN fast insert

2009-02-23 Thread Tom Lane
Robert Haas writes: > On Mon, Feb 23, 2009 at 10:05 AM, Tom Lane wrote: >> Actually, I'm going to *insist* that we lose the index AM scan >> altogether. > Except that the "inessential" feature in question is a feature that > currently WORKS, and I don't believe that the testing you've done is >

Re: [HACKERS] GIN fast insert

2009-02-23 Thread Robert Haas
On Mon, Feb 23, 2009 at 10:05 AM, Tom Lane wrote: > Robert Haas writes: >> I'm starting to think that the right thing to do here is to create a >> non-lossy option for TIDBitmap. Tom has been advocating just losing >> the index scan AM altogether, but that risks losing performance in >> cases wh

Re: [HACKERS] GIN fast insert

2009-02-23 Thread Tom Lane
Robert Haas writes: > I'm starting to think that the right thing to do here is to create a > non-lossy option for TIDBitmap. Tom has been advocating just losing > the index scan AM altogether, but that risks losing performance in > cases where a LIMIT would have stopped the scan well prior to > c

Re: [HACKERS] GIN fast insert

2009-02-23 Thread Robert Haas
On Mon, Feb 23, 2009 at 4:56 AM, Simon Riggs wrote: >> It would be helpful if Heikki or Simon could jump in here, but my >> understanding is that cleaning up the pending list is a read-write >> operation. I don't think we can do that on a hot standby server. > > >From reading the docs with the pa

Re: [HACKERS] GIN fast insert

2009-02-23 Thread Simon Riggs
On Thu, 2009-02-19 at 22:43 -0500, Robert Haas wrote: > > I don't see a problems here, because indexes in postgres don't > depend on any > > transaction's ids or modes as heap depends. WAL-logger works without > that > > knowledge too. May be I missed something here or don't understand. > > > > Al

Re: [HACKERS] GIN fast insert

2009-02-20 Thread Heikki Linnakangas
Teodor Sigaev wrote: Right, can't do that on a hot standby server. Is anywhere applicable hot standby patch? Last version on wiki is 9g and it can't be applied cleanly. The latest version is in Simon's git repository at: http://git.postgresql.org/?p=~sriggs/simon.git;a=shortlog;h=refs/heads

Re: [HACKERS] GIN fast insert

2009-02-20 Thread Teodor Sigaev
Right, can't do that on a hot standby server. Is anywhere applicable hot standby patch? Last version on wiki is 9g and it can't be applied cleanly. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.s

Re: [HACKERS] GIN fast insert

2009-02-19 Thread Heikki Linnakangas
Robert Haas wrote: On Thu, Feb 19, 2009 at 8:36 AM, Teodor Sigaev wrote: handle it is not terribly efficient, oh well. The one thing that concerns me is - what will happen in a hot standby environment, when that patch is committed? In that situation, I believe that we can't call modify any he

Re: [HACKERS] GIN fast insert

2009-02-19 Thread Robert Haas
On Thu, Feb 19, 2009 at 8:36 AM, Teodor Sigaev wrote: >> and I still think that's bogus. It seems clear that this is going to >> change much faster than plans are going to be invalidated, and if >> autovacuum is doing its job, the pending list won't get long enough to >> matter much anyway, right

Re: [HACKERS] GIN fast insert

2009-02-19 Thread Teodor Sigaev
and I still think that's bogus. It seems clear that this is going to change much faster than plans are going to be invalidated, and if autovacuum is doing its job, the pending list won't get long enough to matter much anyway, right? I don't think this patch should be touching gincostestimate at

Re: [HACKERS] GIN fast insert

2009-02-18 Thread Robert Haas
On Tue, Feb 17, 2009 at 2:28 PM, Teodor Sigaev wrote: > Hi there, > > we present two variants of GIN fast insert patch, since we're not sure > what is a the best solution: > > v0.28.1 > - remove disable cost in gincostestimate > - per http://archives.postgresql.org/message-id/499466d2.4010...@siga

Re: [HACKERS] GIN fast insert

2009-02-13 Thread Teodor Sigaev
[ shrug... ] The proposed implementation fails to be particularly fast-start anyway, since it will process the entire pending queue before returning anything to the executor. That is not true for fastupdate=off. But in any case it could be improved, but improvements doesn't solve the issue wit

Re: [HACKERS] GIN fast insert

2009-02-13 Thread Tom Lane
Heikki Linnakangas writes: > Teodor Sigaev wrote: >> But I don't believe that is popular use-case. In most cases, GIN is used >> with bitmap scan. Your emails are so convincing and I'll remove support >> amgettuple interface in GIN. > SELECT * FROM foo WHERE t @@ query LIMIT 100 > might be a fa

Re: [HACKERS] GIN fast insert

2009-02-13 Thread Tom Lane
Teodor Sigaev writes: > Do you think we need to add new pg_am boolean option? Like > pg_am.amcangettuple > or pg_am.amcanpertuplescan I think it's sufficient to mark this by setting amgettuple to zero. We might as well allow amgetbitmap to be zero, too, to mark the case where the AM doesn't wa

Re: [HACKERS] GIN fast insert

2009-02-13 Thread Robert Haas
On Fri, Feb 13, 2009 at 8:00 AM, Teodor Sigaev wrote: >> SELECT * FROM foo WHERE t @@ query LIMIT 100 >> might be a fairly common use case. > > It seems to me - > SELECT * FROM foo WHERE t @@ query *ORDER BY rank* LIMIT 100; I'm not sure you can really assume that the ORDER BY rank will always be

Re: [HACKERS] GIN fast insert

2009-02-13 Thread Teodor Sigaev
SELECT * FROM foo WHERE t @@ query LIMIT 100 might be a fairly common use case. It seems to me - SELECT * FROM foo WHERE t @@ query *ORDER BY rank* LIMIT 100; -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: ht

Re: [HACKERS] GIN fast insert

2009-02-13 Thread Heikki Linnakangas
Teodor Sigaev wrote: So? Barring some evidence that there's a significant performance win from a conventional indexscan, this is a weak argument. AFAICS the only significant advantage of the conventional API is to support ordered scans, and GIN doesn't do that anyway. What about SELECT ... AN

Re: [HACKERS] GIN fast insert

2009-02-13 Thread Teodor Sigaev
So? Barring some evidence that there's a significant performance win from a conventional indexscan, this is a weak argument. AFAICS the only significant advantage of the conventional API is to support ordered scans, and GIN doesn't do that anyway. What about SELECT ... AND EXISTS (SELECT ... t

Re: [HACKERS] GIN fast insert

2009-02-12 Thread Tom Lane
Robert Haas writes: > On Thu, Feb 12, 2009 at 1:42 PM, Tom Lane wrote: >> So? Barring some evidence that there's a significant performance win >> from a conventional indexscan, this is a weak argument. AFAICS the only >> significant advantage of the conventional API is to support ordered >> sca

Re: [HACKERS] GIN fast insert

2009-02-12 Thread Robert Haas
On Thu, Feb 12, 2009 at 1:42 PM, Tom Lane wrote: > Teodor Sigaev writes: >>> What did you think of the idea of simply abandoning support for >>> conventional indexscans in GIN? > >> I don't like this idea because it forbids conventional indexscans even with >> fastupdate=off. > > So? Barring som

Re: [HACKERS] GIN fast insert

2009-02-12 Thread Tom Lane
Teodor Sigaev writes: >> What did you think of the idea of simply abandoning support for >> conventional indexscans in GIN? > I don't like this idea because it forbids conventional indexscans even with > fastupdate=off. So? Barring some evidence that there's a significant performance win from

Re: [HACKERS] GIN fast insert

2009-02-12 Thread Teodor Sigaev
But the *real* problem is that you simply can not guarantee that someone doesn't increase the size of the pending list between the time If insertion process has bigger work_mem. Agree. What did you think of the idea of simply abandoning support for conventional indexscans in GIN? I agree that

Re: [HACKERS] GIN fast insert database hang

2009-02-12 Thread Teodor Sigaev
This freezes the whole system even with autovacuum = off in postgresql.conf. As before, the backends wait on a semop() call. Fixed. There was a deadlock of LockBufferForCleanup and LockBuffer(SHARE). Redesign that place to downgrade LockBufferForCleanup to LockBuffer(EXCLUSIVE) with correction

Re: [HACKERS] GIN fast insert database hang

2009-02-11 Thread Robert Haas
> I did this four times, sometimes with the variant of adding "set > enable_bitmapscan to false;" before the explain analyze. In three > cases the database recovered succesfully after the immediate shutdown; > in the other case, it crapped out much as described in my original > email. Sorry to ke

Re: [HACKERS] GIN fast insert database hang

2009-02-11 Thread Robert Haas
On Wed, Feb 11, 2009 at 10:03 PM, Robert Haas wrote: > I'm going to try to reproduce this, but here's approximately what I did. OK, I've managed to build a reproducible test case. Initial setup is just as I had before: > create table foo (id serial, x int[], primary key (id)); > create index fo

[HACKERS] GIN fast insert database hang

2009-02-11 Thread Robert Haas
While fooling around with the GIN fast insert patch tonight, I managed to hang my test database. :-( I'm going to try to reproduce this, but here's approximately what I did. create table foo (id serial, x int[], primary key (id)); create index foo_gin on foo using gin (x); insert into foo (x) se

Re: [HACKERS] GIN fast insert

2009-02-11 Thread Tom Lane
Teodor Sigaev writes: > Robert Haas wrote: >> Why would the new work_mem need to be 10x smaller than the old work mem? > That is is way to get GIN's error emitted. Work_mem should be decreased > to catch a chance to get lossy tidbitmap. But it only has to be marginally lower, not 10x lower. An

Re: [HACKERS] GIN fast insert

2009-02-11 Thread Teodor Sigaev
Robert Haas wrote: I believe that user could get GIN's error about work_mem only intentionally: - turn off autovacuum Meanwhile, in the other thread, we're having a discussion about people wanting to do exactly this on a database-wide basis during peak load hours... - decrease work_mem for at l

Re: [HACKERS] GIN fast insert

2009-02-11 Thread Robert Haas
> I believe that user could get GIN's error about work_mem only intentionally: > - turn off autovacuum Meanwhile, in the other thread, we're having a discussion about people wanting to do exactly this on a database-wide basis during peak load hours... > - set big work_mem > - populate table with

Re: [HACKERS] GIN fast insert

2009-02-11 Thread Teodor Sigaev
What would be wrong with letting it degrade to lossy? I suppose the reason it's trying to avoid that is to avoid having to recheck all the rows on that page when it comes time to do the index insertion; but surely having to do that is better than having arbitrary, unpredictable failure conditions

Re: [HACKERS] GIN fast insert

2009-02-10 Thread Robert Haas
On Tue, Feb 10, 2009 at 11:18 PM, Tom Lane wrote: > For queries that select only a single index entry, there might be some > speed degradation, but a quick test suggests it's in the > single-digit-percentage range if everything's cached; and of course if > you have to go to disk then the extra CPU

Re: [HACKERS] GIN fast insert

2009-02-10 Thread Tom Lane
Robert Haas writes: > On Tue, Feb 10, 2009 at 10:38 PM, Tom Lane wrote: >> It strikes me that part of the issue here is that the behavior of this >> code is much better adapted to the bitmap-scan API than the traditional >> indexscan API. Since GIN doesn't support ordered scan anyway, I wonder >

Re: [HACKERS] GIN fast insert

2009-02-10 Thread Robert Haas
On Tue, Feb 10, 2009 at 10:38 PM, Tom Lane wrote: >> I think this code needs to be somehow rewritten to make things degrade >> gracefully when the pending list is long - I'm not sure what the best >> way to do that is. Inventing a new data structure to store TIDs that >> is never lossy seems like

Re: [HACKERS] GIN fast insert

2009-02-10 Thread Tom Lane
Robert Haas writes: > I think this is related to the problems with gincostestimate() that > Tom Lane was complaining about here: > http://archives.postgresql.org/message-id/20441.1234209...@sss.pgh.pa.us > I am not 100% sure I'm understanding this correctly, but I think the > reason why gincostes

[HACKERS] GIN fast insert

2009-02-10 Thread Robert Haas
Jeff Davis asked me if I'd be willing to do a review of the GIN fast insert patch about two weeks ago, but I haven't actually had a chance to read through it in detail until tonight. I can't say I really know anything about GIN (though I did take this opportunity to RTM), so apologies in advance i

Re: [HACKERS] gin fast insert performance

2009-01-27 Thread Jeff Davis
On Tue, 2009-01-27 at 20:36 +0300, Teodor Sigaev wrote: > You didn't provide distributions of array's element, number of unique element > and so on. And I make simple test script, which generates data rather close > to > typical tsearch installation (see tst.sql). The arrays I was inserting we

Re: [HACKERS] gin fast insert performance

2009-01-27 Thread Teodor Sigaev
Sorry, lost test sript BTW, is btree_gin ready to commit by your opinion? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ CREATE OR REPLACE FUNCTION gena() RETURNS _int4 AS $$ SELECT array(

Re: [HACKERS] gin fast insert performance

2009-01-27 Thread Teodor Sigaev
Here is a test of the fast insert patch. The patch has gone through some changes, so this set of tests is to see the current performance impact compared with HEAD. The test is simple: inserting a bunch of integer arrays into a table with a GIN index on the array column. I'm testing with small wo

[HACKERS] gin fast insert performance

2009-01-26 Thread Jeff Davis
Here is a test of the fast insert patch. The patch has gone through some changes, so this set of tests is to see the current performance impact compared with HEAD. The test is simple: inserting a bunch of integer arrays into a table with a GIN index on the array column. I'm testing with small wo