Re: [HACKERS] [PATCHES] GIN improvements

2009-02-11 Thread Teodor Sigaev
But the real bottom line is: if autovacuum is working properly, it should clean up the index before the list ever gets to the point where it'd be sane to turn off indexscans. So I don't see why we need to hack the planner for this at all. If any hacking is needed, it should be in the direction

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-09 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes: On Wed, 2009-02-04 at 14:40 -0500, Robert Haas wrote: Well, there's nothing to force that plan to be invalidated when the state of the pending list changes, is there? Would it be unreasonable to invalidate cached plans during the pending list cleanup? If

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Teodor Sigaev
I looked at this a little bit --- it needs proofreading (VACUUME?). Sorry, VACUUME fixed Do we really need an additional column in pgstat table entries in order to store something that looks like it can be derived from the other columns? The stats tables are way too big already. It's not

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Jeff Davis
On Mon, 2009-02-02 at 20:38 -0500, Tom Lane wrote: Also, I really think it's a pretty bad idea to make index cost estimation depend on the current state of the index's pending list --- that state seems far too transient to base plan choices on. I'm confused by this. Don't we want to base the

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Robert Haas
On Wed, Feb 4, 2009 at 1:39 PM, Jeff Davis pg...@j-davis.com wrote: On Mon, 2009-02-02 at 20:38 -0500, Tom Lane wrote: Also, I really think it's a pretty bad idea to make index cost estimation depend on the current state of the index's pending list --- that state seems far too transient to

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Jeff Davis
On Wed, 2009-02-04 at 14:40 -0500, Robert Haas wrote: Well, there's nothing to force that plan to be invalidated when the state of the pending list changes, is there? Would it be unreasonable to invalidate cached plans during the pending list cleanup? Anyway, it just strikes me as strange to

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Robert Haas
On Wed, Feb 4, 2009 at 4:23 PM, Jeff Davis pg...@j-davis.com wrote: On Wed, 2009-02-04 at 14:40 -0500, Robert Haas wrote: Well, there's nothing to force that plan to be invalidated when the state of the pending list changes, is there? Would it be unreasonable to invalidate cached plans

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-02 Thread Tom Lane
Teodor Sigaev teo...@sigaev.ru writes: I'm very sorry, but v0.24 has a silly bug with not initialized value :(. New version is attached I looked at this a little bit --- it needs proofreading (VACUUME?). Do we really need an additional column in pgstat table entries in order to store something

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-23 Thread Teodor Sigaev
I'm very sorry, but v0.24 has a silly bug with not initialized value :(. New version is attached -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ fast_insert_gin-0.25.gz Description: Unix

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-22 Thread Teodor Sigaev
BTW, gincostestimate could use that information for cost estimation, but is index opening and metapge reading in amcostestimate acceptable? That sounds reasonable to me. I think that's what the index-specific cost estimators are for. Done. Do you expect a performance impact? I'm afraid

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-21 Thread Teodor Sigaev
- after limit is reached, force cleanup of pending list by calling gininsertcleanup. Not very good, because users sometimes will see a huge execution time of simple insert. Although users who runs a huge update should be satisfied. I have difficulties in a choice of way. Seems to me, the

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-21 Thread Jeff Davis
On Wed, 2009-01-21 at 15:06 +0300, Teodor Sigaev wrote: Done. Now GIN counts number of pending tuples and pages and stores they on metapage. Index cleanup could start during normal insertion in two cases: - number of pending tuples is too high to keep guaranteed non-lossy tidbitmap - pending

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-20 Thread Jeff Davis
On Mon, 2009-01-19 at 19:53 +0300, Teodor Sigaev wrote: I see only two guaranteed solution of the problem: - after limit is reached, force normal index inserts. One of the motivation of patch was frequent question from users: why update of whole table with GIN index is so slow? So this

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Teodor Sigaev
Changes: Results of pernding list's scan now are placed directly in resulting tidbitmap. This saves cycles for filtering results and reduce memory usage. Also, it allows to not check losiness of tbm. Is this a 100% bulletproof solution, or is it still possible for a query to fail due to

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Alvaro Herrera
Teodor Sigaev wrote: New version. Changes: - synced with current CVS I notice you added a fillfactor reloption in ginoptions, but does it really make sense? I recall removing it because the original code contained a comment that says this is here because default_reloptions wants it, but it

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Teodor Sigaev
I notice you added a fillfactor reloption in ginoptions, but does it really make sense? I recall removing it because the original code contained a comment that says this is here because default_reloptions wants it, but it has no effect. I didn't change a recognition of fillfactor value,

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Alvaro Herrera
Teodor Sigaev wrote: I notice you added a fillfactor reloption in ginoptions, but does it really make sense? I recall removing it because the original code contained a comment that says this is here because default_reloptions wants it, but it has no effect. I didn't change a recognition of

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes: Teodor Sigaev wrote: I didn't change a recognition of fillfactor value, although GIN doesn't use it for now. I suggest you take StdRdOptions out of the GinOptions struct, and leave fillfactor out of ginoptions. I don't think there's much

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Teodor Sigaev
I suggest you take StdRdOptions out of the GinOptions struct, and leave fillfactor out of ginoptions. I don't think there's much point in supporting options that don't actually do anything. If the user tries to set fillfactor for a gin index, he will get an error. Which is a good thing IMHO.

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-18 Thread Jeff Davis
On Fri, 2009-01-16 at 15:39 +0300, Teodor Sigaev wrote: START_CRIT_SECTION(); ... l = PageAddItem(...); if (l == InvalidOffsetNumber) elog(ERROR, failed to add item to index page in \%s\, RelationGetRelationName(index)); It's no use using ERROR, because it will turn

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-16 Thread Teodor Sigaev
New version. Changes: - synced with current CVS - added all your changes - autovacuum will run if fast update mode is turned on and trigger of fresh tuple is fired - gincostestimate now tries to calculate cost of scan of pending pages. gincostestimate set disable_cost if it believe that

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-12 Thread Teodor Sigaev
Changes: - added vacuum_delay_point() in gininsertcleanup - add trigger to run vacuum by number of inserted tuples from last vacuum. Number of inserted tuples represents number of really inserted tuples in index and it is calculated as tuples_inserted + tuples_updated -

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-03 Thread Greg Stark
On 3 Dec 2008, at 06:57 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: If *that* is a use case we're interested in, the incoming tuples could be accumulated in backend-private memory, and inserted into the index at commit.

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-03 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes: If we do this though it would be really nice to do it at a higher level than the indexam. If we could do it for any indexam that provides a kind of bulk insert method that would be great. I'm just not sure how to support all the indexable operators for

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-03 Thread Heikki Linnakangas
Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: If we do this though it would be really nice to do it at a higher level than the indexam. If we could do it for any indexam that provides a kind of bulk insert method that would be great. I'm just not sure how to support all the

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-02 Thread Heikki Linnakangas
Teodor Sigaev wrote: - Falling back to regular insert will take long time for update of whole table - and that was one of reasons of that patch. Users forget to drop GIN index before a global update and query runs forever. If *that* is a use case we're interested in, the incoming tuples could

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-02 Thread Teodor Sigaev
grovel through. The situation will only be rectified at the next vacuum, but if there's no deletes or updates on the table, just inserts, autovacuum won't happen until the next anti-wraparound vacuum. There is not agreement here, see http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-02 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Teodor Sigaev wrote: - Falling back to regular insert will take long time for update of whole table - and that was one of reasons of that patch. Users forget to drop GIN index before a global update and query runs forever. If *that* is a use

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-02 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: If *that* is a use case we're interested in, the incoming tuples could be accumulated in backend-private memory, and inserted into the index at commit. That would be a lot simpler, with no need to worry about concurrent inserts or

Re: [HACKERS] [PATCHES] GIN improvements

2008-11-27 Thread Heikki Linnakangas
There's a pretty fundamental issue with this patch, which is that while buffering the inserts in the list pages makes the inserts fast, all subsequent queries become slower until the tuples have been properly inserted into the index. I'm sure it's a good tradeoff in many cases, but there has

Re: [HACKERS] [PATCHES] GIN improvements

2008-11-27 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes: I think we need a hard limit on the number of list pages, before we can consider accepting this patch. After the limit is full, the next inserter can flush the list, inserting the tuples in the list into the tree, or just fall back to regular,

Re: [HACKERS] [PATCHES] GIN improvements

2008-11-27 Thread Alvaro Herrera
Gregory Stark wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: I think we need a hard limit on the number of list pages, before we can consider accepting this patch. After the limit is full, the next inserter can flush the list, inserting the tuples in the list into the tree, or

Re: [HACKERS] [PATCHES] GIN improvements

2008-10-31 Thread Teodor Sigaev
Reworked version of fast insertion patch for GIN. * shiftList does LockBufferForCleanup, which means that it can be blocked for an indefinitely long time by a concurrent scan, and since it's holding exclusive lock on metapage no new scans or insertions can start meanwhile. This is not only

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-25 Thread Teodor Sigaev
(a) that's not back-patchable and (b) it'll create a merge conflict with your patch, if you're still going to add a new AM function column. I think that aminsertcleanup per se isn't needed, but if we want an amanalyze there'd still be a conflict. Where are we on that? I'll revert

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Teodor Sigaev
* shiftList() holds an exclusive lock on metapage throughout its run, which means that it's impossible for two of them to run concurrently. So why bother with concurrent deletion detection? Because metapage is locked immediately before shiftList call, while metapage is unlocked another process

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: * There is a bigger race condition, which is that after a scan has returned a tuple from a pending page, vacuum could move the index entry into the main index structure, and then that same scan could return that same index entry a second time. This is a

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Tom Lane
I wrote: Really? Then GiST needs to be fixed too. Otherwise you risk having queries return the same row twice. A bitmap indexscan plan would mask such an index bug ... but a plain indexscan won't. BTW, there's another issue I forgot about yesterday, which is that the planner assumes that

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Teodor Sigaev
operations such as page splits. Do we need to change the planner to assume that this only works nicely for btree? It seems to that direction (backward or forward) has meaning only for indexes with amcanorder = true. With amcanorder=false results will be occasionally for any direction. --

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: operations such as page splits. Do we need to change the planner to assume that this only works nicely for btree? It seems to that direction (backward or forward) has meaning only for indexes with amcanorder = true. With amcanorder=false results will

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Teodor Sigaev
queries return the same row twice. A bitmap indexscan plan would mask such an index bug ... but a plain indexscan won't. Fuh. :(. Well. Will fix. GiST: - GiST already supports both scan directions in theory, but page split may change order between forward and backward scans (user-defined

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: - GiST already supports both scan directions in theory, but page split may change order between forward and backward scans (user-defined pageSplit doesn't preserve order of tuples). Holding of split until end of scan will produce unacceptable

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Teodor Sigaev
once, for regular VACUUM I think you really have to call it within each bulkdelete operation. Exactly what I did in last patch. There's probably no point in optimizing it away in VACUUM FULL either, since surely it'll be fast to call index_cleanup when there's nothing in the pending list?

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: So, may be we just move insertcleanup call to ginbulkdelete/ginvacuumcleanup but leave aminsertcleanup field in pg_proc for a future. I'd be inclined not to add the extra AM call if we aren't going to use it now. There's no very good reason to think

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane
I wrote: Yeah, I was going to complain about that next :-). Autovacuum isn't going to trigger as a result of INSERT operations; somehow we have to teach it what to do for GIN indexes. I remember we discussed this at PGCon but I don't think we decided exactly what to do... One simple idea is

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: Updated: http://www.sigaev.ru/misc/fast_insert_gin-0.9.gz Here is the GIN fast-insert patch back again. Changes: * Sync with CVS HEAD * Clean up documentation and some of the code comments * Fix up custom reloptions code * Suppress some compiler warnings

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Alvaro Herrera
Tom Lane wrote: Teodor Sigaev [EMAIL PROTECTED] writes: I didn't get much further than that because I got discouraged after looking at the locking issues around the pending-insertions list. It's a mess: These are rather severe problems. Maybe there's a better solution, but perhaps it would

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: It's a mess: These are rather severe problems. Maybe there's a better solution, but perhaps it would be good enough to lock out concurrent access to the index while the bulkinsert procedure is working. Ugh... The idea I was toying

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-22 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: Updated: http://www.sigaev.ru/misc/fast_insert_gin-0.9.gz I still havn't clearness of acceptability for suggested aminsertcleanup calling. I started to look at this. I don't understand why VACUUM does an insert cleanup before starting to vacuum, but

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-22 Thread Teodor Sigaev
I started to look at this. I don't understand why VACUUM does an insert cleanup before starting to vacuum, but VACUUM FULL doesn't? Hmm. May be I missed something, but I don't understand where and what... I tried to track all places of ambultdelete call. aminsertcleanup should be called

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-22 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: That's close to trivial to revert this piece to add cleanup call to ginbulkdelete/ginvacuumcleanup. Early variants used this variant. Yeah, I think we should do it that way. On reflection I don't think you even need the amvacuumstartup call, because it

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-22 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: I started to look at this. I don't understand why VACUUM does an insert cleanup before starting to vacuum, but VACUUM FULL doesn't? Hmm. May be I missed something, but I don't understand where and what... I tried to track all places of ambultdelete

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-22 Thread Teodor Sigaev
Well, if that is required to be true then this whole design is pretty broken, because VACUUM doesn't hold any lock that would guarantee that no insert happens between the two calls. If we fold the two AM calls into one call then it'd be okay for the index AM to take such a lock transiently

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-13 Thread Teodor Sigaev
Updated: http://www.sigaev.ru/misc/fast_insert_gin-0.9.gz need more review of fast_insert yet? It looked like a number of people commented on it already. I still havn't clearness of acceptability for suggested aminsertcleanup calling. -- Teodor Sigaev

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-11 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: http://www.sigaev.ru/misc/fast_insert_gin-0.7.gz http://www.sigaev.ru/misc/multicolumn_gin-0.3.gz I've committed the multicolumn one with minor revisions (fix some poor English in docs and comments, add regression-test coverage). Do you need more review

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-11 Thread Teodor Sigaev
I've committed the multicolumn one with minor revisions (fix some poor English in docs and comments, add regression-test coverage). Do you Thank you very much. need more review of fast_insert yet? It looked like a number of people commented on it already. I should modify it to

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-09 Thread Neil Conway
On Tue, 2008-07-08 at 14:51 -0400, Tom Lane wrote: I'd still like to take a look. I was tasked with reviewing this for the current commit fest, although so far I've just been working on grokking the rest of the GIN code. But if you'd like to review it instead, that's fine with me. -Neil --

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-09 Thread Josh Berkus
Neil, I was tasked with reviewing this for the current commit fest, although so far I've just been working on grokking the rest of the GIN code. But if you'd like to review it instead, that's fine with me. I have plenty of other stuff I could assign you if you're not needed on GIN. --

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-08 Thread Teodor Sigaev
I looked this over and it looks good in general. May I think that patch passed review and commit it? -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-08 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes: I looked this over and it looks good in general. May I think that patch passed review and commit it? I'd still like to take a look. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-07 Thread Alvaro Herrera
Teodor Sigaev wrote: Sync with current CVS HEAD and post in hackers- too because patches- close to the closing. http://www.sigaev.ru/misc/multicolumn_gin-0.3.gz I looked this over and it looks good in general. I was only wondering about for single-column indexes -- we're storing attribute

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-07 Thread Teodor Sigaev
I looked this over and it looks good in general. I was only wondering about for single-column indexes -- we're storing attribute numbers too, right? No, GinState-oneCol field signals to GinFormTuple and gin_index_getattr/gintuple_get_attrnum about actual storage. Single column index is binary

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-07 Thread Alvaro Herrera
Teodor Sigaev wrote: I looked this over and it looks good in general. I was only wondering about for single-column indexes -- we're storing attribute numbers too, right? No, GinState-oneCol field signals to GinFormTuple and gin_index_getattr/gintuple_get_attrnum about actual storage.

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-02 Thread Teodor Sigaev
Sync with current CVS HEAD and post in hackers- too because patches- close to the closing. http://www.sigaev.ru/misc/fast_insert_gin-0.7.gz http://www.sigaev.ru/misc/multicolumn_gin-0.3.gz -- Teodor Sigaev E-mail: [EMAIL PROTECTED]