Re: [HACKERS] CVS corruption/mistagging?
Tom Lane wrote: In the meantime, though, it's not quite clear why this would lead to a buildfarm failure --- it should just mean a lot of extraneous files appearing in a fresh checkout. (Looks a bit harder ... Oh, it looks like btree_gist has some files that used to be autogenerated and are now in CVS, so the bogusly new versions from CVS are suppressing the desired generation from the old btree_num.c file.)

Funny enough, that's exactly the problem I saw on win32 when I rsynced from a windows box and tried a local CVS checkout :-) Once I rsynced to a local linux instead and checked out from there, things worked again :)

//Magnus ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] is this trigger safe and effective? - locking (caching via triggers)
Hello, I wrote a sample about triggers and I have a question: is my solution correct, and does a better solution exist? Regards Pavel Stehule

DROP SCHEMA safecache CASCADE;
CREATE SCHEMA safecache;

CREATE TABLE safecache.source_tbl(category int, int_value int);
CREATE TABLE safecache.cache(category int, sum_val int);

CREATE OR REPLACE FUNCTION safecache.source_tbl_trg_fce()
RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    -- row cannot exist in cache
    -- complication -- I would like to finish this transaction without conflict
    IF NOT EXISTS(SELECT category FROM safecache.cache
                   WHERE category = NEW.category) THEN
      LOCK TABLE safecache.cache IN SHARE ROW EXCLUSIVE MODE;
      -- I have to repeat the test
      IF NOT EXISTS(SELECT category FROM safecache.cache
                     WHERE category = NEW.category) THEN
        INSERT INTO safecache.cache VALUES(NEW.category, NEW.int_value);
      END IF;
    ELSE
      -- simple
      UPDATE safecache.cache SET sum_val = sum_val + NEW.int_value
        WHERE category = NEW.category;
    END IF;
  ELSEIF TG_OP = 'UPDATE' THEN
    -- if category is without change, simple
    IF NEW.category = OLD.category THEN
      UPDATE safecache.cache
        SET sum_val = sum_val + (NEW.int_value - OLD.int_value)
        WHERE category = OLD.category;
    ELSE
      -- old category has to exist
      UPDATE safecache.cache SET sum_val = sum_val - OLD.int_value
        WHERE category = OLD.category;
      -- new category is maybe a problem
      IF NOT EXISTS(SELECT category FROM safecache.cache
                     WHERE category = NEW.category) THEN
        LOCK TABLE safecache.cache IN SHARE ROW EXCLUSIVE MODE;
        -- I have to repeat the test
        IF NOT EXISTS(SELECT category FROM safecache.cache
                       WHERE category = NEW.category) THEN
          INSERT INTO safecache.cache VALUES(NEW.category, NEW.int_value);
        END IF;
      ELSE
        -- simple, new category exists
        UPDATE safecache.cache SET sum_val = sum_val + OLD.int_value
          WHERE category = NEW.category;
      END IF;
    END IF;
  ELSE -- DELETE
    -- value has to exist in cache, simple
    UPDATE safecache.cache SET sum_val = sum_val - OLD.int_value
      WHERE category = OLD.category;
  END IF;
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER actualise_cache
  AFTER INSERT OR UPDATE OR DELETE ON safecache.source_tbl
  FOR EACH ROW EXECUTE PROCEDURE safecache.source_tbl_trg_fce();

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
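[Editor's note: a minimal usage sketch for the trigger above, assuming the schema exactly as posted; the values are made up for illustration. Note that this only demonstrates the single-session behaviour, not the concurrency question the post is actually asking about.]

```sql
-- Populate the source table; the AFTER trigger maintains the cache.
INSERT INTO safecache.source_tbl VALUES (1, 10);
INSERT INTO safecache.source_tbl VALUES (1, 20);
INSERT INTO safecache.source_tbl VALUES (2, 5);

-- The cache should now hold the per-category sums:
--   category | sum_val
--   ---------+---------
--          1 |      30
--          2 |       5
SELECT * FROM safecache.cache ORDER BY category;
```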
Re: [HACKERS] CVS corruption/mistagging?
Tom Lane wrote: So I think that cvs rtag -d REL7_4_STABLE pgsql will fix it. I'd like someone to double-check that though. Also maybe we should back up the repository first?

Just for your info: all VMs on tribble, which includes cvs, are backed up at 02:30 every day, CEST (that's 00:30 UTC right now, but it follows the local time on tribble, so it'll be 23:30 UTC when Austria switches later this year. Perhaps we should switch the machine to GMT, but that's a different discussion.) For something like this it's good to run a special backup before that of course, but it's probably good to know when the backups run anyway...

//Magnus ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] CVS corruption/mistagging?
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: So I'd say your strategy looks good - backup and remove the phony tag. I'd also say we should probably be logging tag commands in taginfo. Presumably we mere mortal committers should not be doing any tagging whatsoever, and tags should only be applied in preparation for releases. +1 ... we should at least log such commands, and maybe disallow them to anyone except Marc's pgsql account. Particularly since they don't get reported in pgsql-committers messages (or is that something we can/should change?)

I think you can, by sticking a command in the taginfo file. It's the same script that Marc is using to try to restrict who can do it - once he has that working, it should be as simple as adding a mail command to the end of that script. And +1 on doing it.

//Magnus ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Since I don't think that a datatype solution is the way to go, I don't feel that we are as far away from an agreement as Bruce is worried about. Well, from where I sit, there is one person saying give me the foot gun, and Heikki saying he wants a bullet-proof type system, and you and I are in the middle, so the big problem is I don't see a consensus forming, and we have been discussing this for a while. The people who actually use tsearch2 seem to all have the same opinion ... so I think we can't go too far in the bullet-proofing direction. But I would like a design that is bulletproof in dump/reload scenarios, and I think it's fair to question that aspect of the tsearch2 design because we've seen many reports of people having trouble updating databases that use tsearch2.

dump/reload is *the* biggest problem I've had with tsearch2 so far. But it hasn't been with the actual data - it's been the functions, and only when migrating between versions. But solving dump/reload reliably is one of the main things I'm hoping for in 8.3 ;-)

As for another usage pointer, I use different configurations in the same database - but only one per table. I explicitly use the to_tsvector that specifies a configuration, always - to avoid surprising myself. I don't use the functional index part, but for new users I can see how that's certainly a *lot* easier. Requiring the specification of the configuration explicitly when creating this index I don't see as a big problem at all - compared to the work needed to set up triggers. But it's nice not to have to do it when querying.

But wouldn't that be solved by having to_tsvector() require the configuration, but to_tsquery() and plainto_tsquery() not require it?

//Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] CVS corruption/mistagging?
Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: I'd like someone to double-check that though. Also maybe we should back up the repository first? Just for your info: all VMs on tribble, which includes cvs, are backed up at 02:30 every day, CEST Good, but the salient followup questions to that are (1) backed up to where exactly?, and (2) how many days' past backups could we get at, if we had to? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CVS corruption/mistagging?
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: I'd like someone to double-check that though. Also maybe we should back up the repository first? Just for your info: all VMs on tribble, which includes cvs, are backed up at 02:30 every day, CEST Good, but the salient followup questions to that are (1) backed up to where exactly?, and (2) how many days' past backups could we get at, if we had to? 1) The entire VM, with dump 2) We have today, yesterday and one weekly (copies the daily on monday, if I read it right) They are dumped to a NFS share on this schedule. That NFS share is dumped to tape by systems at Conova - I'll let Stefan fill in the details about that. //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
On Wed, 15 Aug 2007, Magnus Hagander wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Since I don't think that a datatype solution is the way to go, I don't feel that we are as far away from an agreement as Bruce is worried about. Well, from where I sit, there is one person saying give me the foot gun, and Heikki saying he wants a bullet-proof type system, and you and I are in the middle, so the big problem is I don't see a consensus forming, and we have been discussing this for a while. The people who actually use tsearch2 seem to all have the same opinion ... so I think we can't go too far in the bullet-proofing direction. But I would like a design that is bulletproof in dump/reload scenarios, and I think it's fair to question that aspect of the tsearch2 design because we've seen many reports of people having trouble updating databases that use tsearch2. dump/reload is *the* biggest problem I've had with tsearch2 so far. But it hasn't been with the actual data - it's been the functions, and only when migrating between versions. But solving dump/reload reliably is one of the main things I'm hoping for in 8.3 ;-)

The dump/reload problem should be gone once tsearch2 becomes a part of core. The problem is an inability to say what the correct configuration is in the case of an expression index when restoring. In any other case there are many use cases where a tsvector could be intentionally obtained using different configurations.

As for another usage pointer, I use different configurations in the same database - but only one per table. I explicitly use the to_tsvector that specifies a configuration, always - to avoid surprising myself. I don't use the functional index part, but for new users I can see how that's certainly a *lot* easier. Requiring the specification of the configuration explicitly when creating this index I don't see as a big problem at all - compared to the work needed to set up triggers. But it's nice not to have to do it when querying. But wouldn't that be solved by having to_tsvector() require the configuration, but to_tsquery() and plainto_tsquery() not require it?

Or better: introduce a novice-level interface with the configuration name required, and insist on using it with expression indexes (I don't know if there is a mechanism to do so).

Regards, Oleg

_ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 6: explain analyze is your friend
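[Editor's note: the explicit-configuration style discussed above can be sketched as follows. This is an editor's illustration using the 8.3-era two-argument forms; the table and column names are hypothetical.]

```sql
-- Expression index that pins the configuration explicitly, so the
-- indexed expression does not depend on any per-session setting.
CREATE INDEX docs_fts_idx ON docs
    USING gin (to_tsvector('english', body));

-- The query must name the same configuration for the planner to
-- match the indexed expression.
SELECT id
  FROM docs
 WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres & index');
```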
Re: [HACKERS] CVS corruption/mistagging?
Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Good, but the salient followup questions to that are (1) backed up to where exactly?, and (2) how many days' past backups could we get at, if we had to? They are dumped to a NFS share on this schedule. That NFS share is dumped to tape by systems at Conova - I'll let Stefan fill in the details about that. That's good as far as it goes, but seeing that PG is a worldwide organization now, I wonder whether our primary CVS shouldn't have backups on several continents. Pardon my paranoia ... but our collective arses have been saved by offsite backups at least once already ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CVS corruption/mistagging?
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Good, but the salient followup questions to that are (1) backed up to where exactly?, and (2) how many days' past backups could we get at, if we had to? They are dumped to a NFS share on this schedule. That NFS share is dumped to tape by systems at Conova - I'll let Stefan fill in the details about that. That's good as far as it goes, but seeing that PG is a worldwide organization now, I wonder whether our primary CVS shouldn't have backups on several continents. Pardon my paranoia ... but our collective arses have been saved by offsite backups at least once already ...

The CVS repository is already synced almost-live back to both svr1 in Panama and the anoncvs box in the US. It will be synced to the new VM Marc is setting up as well. I assume these machines also have some kind of backup - Marc will have to fill in on that. Oh, and the plan is to have the entire VM synced up between the Austria and Panama machines once a day again, once Marc has fixed the issues with that box.

//Magnus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] CVS corruption/mistagging?
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Good, but the salient followup questions to that are (1) backed up to where exactly?, and (2) how many days' past backups could we get at, if we had to? They are dumped to a NFS share on this schedule. That NFS share is dumped to tape by systems at Conova - I'll let Stefan fill in the details about that. That's good as far as it goes, but seeing that PG is a worldwide organization now, I wonder whether our primary CVS shouldn't have backups on several continents. Pardon my paranoia ... but our collective arses have been saved by offsite backups at least once already ...

We already have that -- there's plenty of people who have rsync'ed copies of the repository. Now, I don't back it up, so if someone makes a mistake my copy will faithfully follow it.

-- Alvaro Herrera  http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] CVS corruption/mistagging?
On 8/15/07, Tom Lane [EMAIL PROTECTED] wrote: I wrote: Kris Jurka [EMAIL PROTECTED] writes: It looks like parts of the CVS repository have been mistagged as belonging to REL7_4_STABLE or have been corrupted somehow: I have no idea how you make CVS do that, but I'm sure there is some magic one-liner for it. Something like cvs tag without -F on the HEAD? -- marko ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] CVS corruption/mistagging?
Magnus Hagander wrote: Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Good, but the salient followup questions to that are (1) backed up to where exactly?, and (2) how many days' past backups could we get at, if we had to? They are dumped to a NFS share on this schedule. That NFS share is dumped to tape by systems at Conova - I'll let Stefan fill in the details about that. That's good as far as it goes, but seeing that PG is a worldwide organization now, I wonder whether our primary CVS shouldn't have backups on several continents. Pardon my paranoia ... but our collective arses have been saved by offsite backups at least once already ... The CVS repository is already synced almost-live back to both svr1 in panama and the anoncvs box in the US. It will be synced to the new VM marc is setting up as well. I assume these machines also have some kind of backup - Marc will have to fill in on that. Oh, and the plan is to have the entire VM synced up between austria and panama machine once / day again once Marc has fixed the issues with that box. yeah that has been my understanding as well though I'm not sure if marc has set up the sync of the full VM already. Stefan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CVS corruption/mistagging?
Magnus Hagander wrote: Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: I'd like someone to double-check that though. Also maybe we should back up the repository first? Just for your info: all VMs on tribble, which includes cvs, are backed up at 02:30 every day, CEST Good, but the salient followup questions to that are (1) backed up to where exactly?, and (2) how many days' past backups could we get at, if we had to? 1) The entire VM, with dump 2) We have today, yesterday and one weekly (copies the daily on Monday, if I read it right) They are dumped to a NFS share on this schedule. That NFS share is dumped to tape by systems at Conova - I'll let Stefan fill in the details about that.

that's exactly what happens on the backup side here - but maybe I missed an earlier mail - is something broken on tribble or on the new CVS VM that requires a restore from backup?

Stefan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] CVS corruption/mistagging?
Stefan Kaltenbrunner wrote: Magnus Hagander wrote: Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: I'd like someone to double-check that though. Also maybe we should back up the repository first? Just for your info: all VMs on tribble, which includes cvs, are backed up at 02:30 every day, CEST Good, but the salient followup questions to that are (1) backed up to where exactly?, and (2) how many days' past backups could we get at, if we had to? 1) The entire VM, with dump 2) We have today, yesterday and one weekly (copies the daily on monday, if I read it right) They are dumped to a NFS share on this schedule. That NFS share is dumped to tape by systems at Conova - I'll let Stefan fill in the details about that. that's exactly what happens on the backup side here - but maybe I missed an earlier mail - is something broken on tribble or on the new CVS VM that requires a restore from backup ? erm -ENOTENOUGHCOFFEE Stefan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CVS corruption/mistagging?
On Wednesday, 15 August 2007 04:20, Tom Lane wrote: we should at least log such commands, and maybe disallow to anyone except Marc's pgsql account.

I don't think we should disallow it. Otherwise we might one day be stuck if we need to release while some specific person is on vacation. I never understood why tagging uses a special account anyway. It should be done as the person doing the tagging.

-- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Index Tuple Compression Approach?
On Wed, 2007-08-15 at 06:51 +0100, Heikki Linnakangas wrote: Jeff Davis wrote: On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote: Isn't this what Grouped Index Tuples is? http://community.enterprisedb.com/git/git-readme.txt It looks like GIT is a little different. GIT actually stores a lower-bound key of a contiguous* range of keys that all point to the same page, and for each of those ranges stores a bitmap of page offsets. A search searches first for an exact match in the index, and failing that, looks to see if the previous index tuple happens to be one of these ranges. The algorithm Chris is talking about stores a set of tuple ids (which include page and offset) for each distinct key. Right. Both could be helpful, although I don't think they can work together very well. What Chris is suggesting is basically a special case of GIT, where all the heap tuples represented by an index tuple have the same key. I was actually thinking of adding a flag to index tuples to indicate that special case in GIT. We could effectively do both. A few additional thoughts... The approach suggested by Chris is also used by Teradata Non-Unique Secondary Indexes, known as NUSIs (but not named by me!). The following link is a public domain description that is detailed enough: http://teradata.uark.edu/research/wang/indexes.html Replicating this approach directly isn't that useful because it would interact badly with the way we handle updates. Thinking about the basic design pattern however, we can envisage a type of index that changes the 1:1 mapping between index and heap tuple into the concept of a tuple set index where we have a 1:Many mapping between index and heap. In general, the tuple set index approach can significantly reduce index size. This won't be of interest to anyone unless all of your data overflows RAM and you need to do I/O to constantly page back in pieces of your data. 
If your database does fit in RAM, reducing the number of index blocks might just increase contention. This means that the tuple set approach is only useful when you have very large databases, but when you do it is very, very useful.

GIT is a tuple set index with two important extensions:

1. GIT allows a range of values to be indexed, not just a single value, so this allows it to be useful for both Unique and Non-Unique cases.

2. GIT restricts the set of tuples to a small range of blocks within the table. Making the range of blocks = 1 means that GIT is designed to work well with HOT, which also stays on the same block. Keeping the range of blocks small means GIT degrades as slowly as possible in the face of cold UPDATEs. If the values are inserted in roughly ordered/clustered sequence then this doesn't increase index size at all, so the most common/highest volume use cases are covered.

So from my perspective, GIT is very close to the optimal design for a tuple set index that addresses the need for high concurrency and unique/near-uniqueness with PostgreSQL. There are certainly many other options for a tuple set index, and bitmap indexes are simply another version of a tuple set index but with different behaviours tuned to a different use case. There may be other use cases that require more than two kinds of tuple set index... and we have discussed implementing the tuple set behaviour as part of the other main index types.

As an aside, it turns out after further research that GIT is similar to a clustered index in SQL Server, as described by Louis Davidson, Pro SQL Server 2005 Database Design and Optimization, p. 405. SQL Server creates a clustered index by default on each PK, so it says.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Testing the async-commit patch
On Tue, 2007-08-14 at 12:29 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote: heapam.c lines 1843-1852 presume previous xmax can be hinted immediately, ditto lines 2167-2176, ditto lines 2716-2725. I think probably we should just remove those lines --- they are only trying to save work for future tqual.c calls. I'll check those out later tonight. [ looks closer ] Actually, we can't just dike out those code sections, because the immediately following code assumes that XMAX_INVALID is correct. So I guess we need to export HeapTupleSetHintBits from tqual.c and do the full pushup in these places. Looks good: the code is neater and better commented than before as well as being fully accurate with async commit. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
On Tue, 2007-08-14 at 17:41 -0400, Tom Lane wrote: I've just finished re-reading the prior thread, and here are what seem to me to be the salient points: * Oleg, Teodor, and all of the old-line users of tsearch2 are comfortable with setting up a trigger to maintain a materialized tsvector column for a table. They argue that recomputing the tsvector (possibly more than once) during a query is more expensive than fetching it from disk. My suspicion is that the latter argument gets weaker every year --- CPUs are getting faster lots faster than disks are. * Bruce (and I ... not sure about anyone else) want to support usage of text search via a functional index. This is argued to be easier to set up (no fooling with triggers) and possibly faster depending on CPU vs I/O speeds. I don't think there is any desire here to eliminate the trigger approach, just to provide an alternative. ISTM that the functional index would be considerably smaller than the additional column approach, since tsvectors can be quite long. That seems like a very desirable thing with larger textbases. However, without an additional column certain queries would not be possible, such as IndexScans on a non-text search index with an additional filter on text search. So each way would be desirable in different situations. Would it be wrong to allow both approaches? If there is strong disagreement then it usually means both people are right. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
On Wed, 2007-08-15 at 08:10 +0200, Magnus Hagander wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Since I don't think that a datatype solution is the way to go, I don't feel that we are as far away from an agreement as Bruce is worried about. Well, from where I sit, there is one person saying give me the foot gun, and Heikki saying he wants a bullet-proof type system, and you and I are in the middle, so the big problem is I don't see a consensus forming, and we have been discussing this for a while. The people who actually use tsearch2 seem to all have the same opinion ... so I think we can't go too far in the bullet-proofing direction. But I would like a design that is bulletproof in dump/reload scenarios, and I think it's fair to question that aspect of the tsearch2 design because we've seen many reports of people having trouble updating databases that use tsearch2. dump/reload is *the* biggest problem I've had with tsearch2 so far. But it hasn't been with the actual data - it's been the functions, and only when migrating between versions. But solving dump/reload reliably is one of the main things I'm hoping for in 8.3 ;-)

I can see the problem, but I'm sure there are more solutions than have been listed so far. If dump/restore is a problem we can:

1. Force pg_dump to output a SET command for the GUC, so it is correctly set at restore time. That seems like a straightforward addition to pg_dump. Maybe this can be done in a generalised manner to support other dump/restore configuration difficulties that might occur in the future.

2. Put the desired value in a table and make sure the text_search_config table is dumped ahead of other objects. When we restore, we build the index based on the config option set in the table, so it all just works.

3. ...probably other options too.

Maybe we should consider that the user may be dumping and reloading *because* they want the configuration to change. Just a thought.
-- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
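[Editor's note: option 1 above would amount to pg_dump emitting something like the following near the top of a dump. This is only a sketch; default_text_search_config is shown as an assumed GUC name, and the index is a hypothetical example of an object whose meaning depended on that setting when it was built.]

```sql
-- At restore time the session picks up the same text search
-- configuration the dump was taken with.
SET default_text_search_config = 'pg_catalog.english';

-- ...followed by the regular dumped objects, e.g. an expression
-- index that relied on that setting when it was originally created:
CREATE INDEX docs_fts_idx ON docs USING gin (to_tsvector(body));
```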
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
Magnus Hagander wrote: But I would like a design that is bulletproof in dump/reload scenarios, and I think it's fair to question that aspect of the tsearch2 design because we've seen many reports of people having trouble updating databases that use tsearch2. dump/reload is *the* biggest problem I've had with tsearch2 so far. But it hasn't been with the actual data - it's been the functions, and only when migrating between versions. But solving dump/reload reliably is one of the main things I'm hoping for in 8.3 ;-) As for another usage pointer, I use different configurations in the same database - but only one per table. I explicitly use the to_tsvector that specifies a configuration, always - to avoid surprising myself. I don't use the functional index part, but for new users I can see how that's certainly a *lot* easier. Requiring the specification of the configuration explicitly when creating this index I don't see as a big problem at all - compared to the work needed to set up triggers. But it's nice not to have to do it when querying. But wouldn't that be solved by having to_tsvector() require the configuration, but to_tsquery() and plainto_tsquery() not require it?

Yea, I have thought about splitting up the behavior so tsvector always needs the configuration but tsquery does not. However, for a query, you are probably still creating a tsvector, so it didn't seem to help much in clarity.

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
On Wed, Aug 15, 2007 at 10:23:00AM -0400, Bruce Momjian wrote: Magnus Hagander wrote: But I would like a design that is bulletproof in dump/reload scenarios, and I think it's fair to question that aspect of the tsearch2 design because we've seen many reports of people having trouble updating databases that use tsearch2. dump/reload is *the* biggest problem I've had with tsearch2 so far. But it hasn't been with the actual data - it's been the functions, and only when migrating between versions. But solving dump/reload reliably is one of the main things I'm hoping for in 8.3 ;-) As for another usage pointer, I use different configurations in the same database - but only one per table. I explicitly use the to_tsvector that specifies a configuration, always - to avoid surprising myself. I don't use the functional index part, but for new users I can see how that's certainly a *lot* easier. Requiring the specification of the configuration explicitly when creating this index I don't see as a big problem at all - compared to the work needed to set up triggers. But it's nice not to have to do it when querying. But wouldn't that be solved by having to_tsvector() require the configuration, but to_tsquery() and plainto_tsquery() not require it? Yea, I have thought about splitting up the behavior so tsvector always needs the configuration but tsquery does not. However, for a query, you are probably still creating a tsvector, so it didn't seem to help much in clarity.

Agh, I got stuck thinking of the trigger case again - when you don't need to create a vector at all.

//Magnus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] is this trigger safe and effective? - locking (caching via triggers)
I don't like the locking... take a look at Example 37-1 at the end of http://lnk.nu/postgresql.org/fhe.html for a better way (though the comment below about going into an infinite loop is a good observation; I think that after some number of fast tries it should start putting a sleep in the loop, rather than just arbitrarily bombing after 10 tries).

Also, I remember discussion on -performance about this from folks using it in the real world... the problem they ran into was that doing the updates in the cache/mview table directly bloated it too much; they found it was better to just insert changes into an interim table and then periodically batch-process that table.

On Wed, Aug 15, 2007 at 08:01:24AM +0200, Pavel Stehule wrote:

Hello

I wrote a sample about triggers and I have a question: is my solution correct, and does a better solution exist?

Regards
Pavel Stehule

DROP SCHEMA safecache CASCADE;
CREATE SCHEMA safecache;

CREATE TABLE safecache.source_tbl(category int, int_value int);
CREATE TABLE safecache.cache(category int, sum_val int);

CREATE OR REPLACE FUNCTION safecache.source_tbl_trg_fce()
RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    -- row may not exist in cache yet -- complication
    -- I would like to finish this transaction without conflict
    IF NOT EXISTS(SELECT category FROM safecache.cache
                   WHERE category = NEW.category) THEN
      LOCK TABLE safecache.cache IN SHARE ROW EXCLUSIVE MODE;
      -- I have to repeat the test
      IF NOT EXISTS(SELECT category FROM safecache.cache
                     WHERE category = NEW.category) THEN
        INSERT INTO safecache.cache VALUES(NEW.category, NEW.int_value);
      ELSE
        -- another transaction created the row while we waited for the
        -- lock (the original posting silently dropped the value here)
        UPDATE safecache.cache SET sum_val = sum_val + NEW.int_value
          WHERE category = NEW.category;
      END IF;
    ELSE
      -- simple
      UPDATE safecache.cache SET sum_val = sum_val + NEW.int_value
        WHERE category = NEW.category;
    END IF;
  ELSEIF TG_OP = 'UPDATE' THEN
    -- if category is unchanged, simple
    IF NEW.category = OLD.category THEN
      UPDATE safecache.cache
        SET sum_val = sum_val + (NEW.int_value - OLD.int_value)
        WHERE category = OLD.category;
    ELSE
      -- old category has to exist
      UPDATE safecache.cache SET sum_val = sum_val - OLD.int_value
        WHERE category = OLD.category;
      -- new category is maybe a problem
      IF NOT EXISTS(SELECT category FROM safecache.cache
                     WHERE category = NEW.category) THEN
        LOCK TABLE safecache.cache IN SHARE ROW EXCLUSIVE MODE;
        -- I have to repeat the test
        IF NOT EXISTS(SELECT category FROM safecache.cache
                       WHERE category = NEW.category) THEN
          INSERT INTO safecache.cache VALUES(NEW.category, NEW.int_value);
        ELSE
          -- row appeared while we waited for the lock
          UPDATE safecache.cache SET sum_val = sum_val + NEW.int_value
            WHERE category = NEW.category;
        END IF;
      ELSE
        -- simple, new category exists
        -- (NEW.int_value here; the original posting had OLD.int_value,
        -- which would add the wrong amount to the new category)
        UPDATE safecache.cache SET sum_val = sum_val + NEW.int_value
          WHERE category = NEW.category;
      END IF;
    END IF;
  ELSE -- DELETE
    -- value has to exist in cache, simple
    UPDATE safecache.cache SET sum_val = sum_val - OLD.int_value
      WHERE category = OLD.category;
  END IF;
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER actualise_cache
  AFTER INSERT OR UPDATE OR DELETE ON safecache.source_tbl
  FOR EACH ROW EXECUTE PROCEDURE safecache.source_tbl_trg_fce();

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate

-- Decibel!, aka Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
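[Editor's note] The pattern in the docs example Jim points to avoids the table-level LOCK by retrying on unique_violation instead. A minimal sketch of that approach adapted to the cache table — merge_cache is a hypothetical helper name, and it assumes a unique index on cache.category, which the posted schema does not declare:

```sql
-- Assumed: the retry pattern only works if concurrent inserts of the
-- same category actually conflict, so add a unique index first.
CREATE UNIQUE INDEX cache_category_uidx ON safecache.cache (category);

CREATE OR REPLACE FUNCTION safecache.merge_cache(p_category int, p_delta int)
RETURNS void AS $$
BEGIN
  LOOP
    -- Try to update an existing cache row first.
    UPDATE safecache.cache SET sum_val = sum_val + p_delta
      WHERE category = p_category;
    IF found THEN
      RETURN;
    END IF;
    -- No row yet: try to insert one.  A concurrent transaction may beat
    -- us to it, in which case we trap the error and retry the UPDATE.
    BEGIN
      INSERT INTO safecache.cache VALUES (p_category, p_delta);
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      -- do nothing; loop around and try the UPDATE again
    END;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```

With a helper like this, each trigger branch collapses into a single call such as safecache.merge_cache(NEW.category, NEW.int_value), with no LOCK TABLE and no repeated EXISTS tests.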
Re: [HACKERS] [EMAIL PROTECTED]: Re: [GENERAL] array_to_set functions]
On Wed, Aug 15, 2007 at 06:47:05AM +0200, Pavel Stehule wrote: 2007/8/14, Decibel! [EMAIL PROTECTED]: On Tue, Aug 14, 2007 at 05:38:33PM +0200, Pavel Stehule wrote: 2007/8/14, Bruce Momjian [EMAIL PROTECTED]: TODO item?

I am against it. It's too simple to do in the SQL language.

Why make everyone who works with arrays create a function just to do this? Something that's of use to common users should be included, simple or not.

Unpacking an array is more of a SQL construct for me than an SRF function. With a function you cannot control the behavior of unpacking. With a SQL construct I can.

Huh? You can do a DISTINCT or an ORDER BY on the output of a SRF.

SELECT DISTINCT a(i) FROM generate_series ... removes duplicates
SELECT a(i) FROM generate_series ORDER BY .. sorted output
etc.

But I can SELECT * FROM generate_series(ARRAY[1,3,4,5,7,10]); or else FUNCTION generate_series(anyarray) returns setof any

-- Decibel!, aka Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
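[Editor's note] The generate_series(anyarray) Pavel asks for can be written today as a thin polymorphic SQL wrapper over the existing integer generate_series. A sketch (the name overloads the built-in, distinguished by its signature; handles one-dimensional arrays only — this is essentially what later shipped in core as unnest() in 8.4):

```sql
-- Hypothetical polymorphic wrapper: enumerate the elements of a
-- one-dimensional array, in order.
CREATE OR REPLACE FUNCTION generate_series(anyarray)
RETURNS SETOF anyelement AS $$
  SELECT $1[i]
    FROM generate_series(array_lower($1, 1), array_upper($1, 1)) AS g(i)
$$ LANGUAGE sql IMMUTABLE STRICT;

-- Usage:
--   SELECT * FROM generate_series(ARRAY[1,3,4,5,7,10]);
--   SELECT DISTINCT g FROM generate_series(ARRAY['a','a','b']) AS g;
```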
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
Simon Riggs [EMAIL PROTECTED] writes: ISTM that the functional index would be considerably smaller than the additional column approach, since tsvectors can be quite long. That seems like a very desirable thing with larger textbases. However, without an additional column certain queries would not be possible, such as IndexScans on a non-text search index with an additional filter on text search. So each way would be desirable in different situations. Huh? Of course you can do the searching without an additional column; you just have to compute the tsvector on-the-fly at each row. This is a straight trade of more CPU cycles for less I/O. Would it be wrong to allow both approaches? Nobody has suggested disallowing the trigger approach (indeed it's hard to see how we could). The argument is mostly about how to make a functional index approach work conveniently; and secondarily about what's needed to make dump/restore reliably reproduce the current database state, whichever approach you choose. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] CVS corruption/mistagging?
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Good, but the salient followup questions to that are (1) backed up to where exactly?, and (2) how many days' past backups could we get at, if we had to? They are dumped to an NFS share on this schedule. That NFS share is dumped to tape by systems at Conova - I'll let Stefan fill in the details about that. That's good as far as it goes, but seeing that PG is a worldwide organization now, I wonder whether our primary CVS shouldn't have backups on several continents. Pardon my paranoia ... but our collective arses have been saved by offsite backups at least once already ...

Yes, I think we could improve on that. Have we considered more sophisticated solutions that provide incremental backup on a more frequent basis? I'd be inclined to use Bacula or similar (and it uses Postgres for its metadata store :-) ). Ideally I think we'd like to be able, fairly easily and quickly, to roll the repo (or some portion of it) back to a fairly arbitrary and fairly precise (say within an hour or two) point in recent time.

Meanwhile, those of us who rsync the entire repo could easily make rolling backup copies. Arguably this might be better done from a repo made using --numeric-ids. Tarred and compressed it's a shade under 90 MB, which isn't huge. If people do this at various times of the day we'd get pretty good coverage :-)

cheers

andrew

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[HACKERS] Another idea for index-only scans
I have added another idea for index-only scans to the TODO list: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] XID wraparound and busy databases
I was talking to someone at LinuxWorld and they mentioned they often have activity of 6k SELECTs per second, and that they were needing to autovacuum every few days because of xid wraparound. I did some calculations and found that:

6000 * 60 * 60 * 24 = 518,400,000

or roughly 500 million xids per day; against the ~2^31 wraparound horizon that works out to about four days, confirming they would need the autovacuum to run every few days.

Is enlarging the xid field something we should consider for 8.4? Is the autovacuum I/O less than the extra I/O needed for an expanded xid field on every row?

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
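[Editor's note] The back-of-the-envelope numbers can be checked directly in psql; a quick sanity-check query (integer division, so the day count comes out truncated):

```sql
SELECT 6000::bigint * 60 * 60 * 24         AS xids_per_day,        -- 518400000
       2147483648 / (6000::bigint * 86400) AS days_to_wraparound;  -- 4
```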
Re: [HACKERS] CVS corruption/mistagging?
Andrew Dunstan wrote: Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Tom Lane wrote: Good, but the salient followup questions to that are (1) backed up to where exactly?, and (2) how many days' past backups could we get at, if we had to? They are dumped to an NFS share on this schedule. That NFS share is dumped to tape by systems at Conova - I'll let Stefan fill in the details about that. That's good as far as it goes, but seeing that PG is a worldwide organization now, I wonder whether our primary CVS shouldn't have backups on several continents. Pardon my paranoia ... but our collective arses have been saved by offsite backups at least once already ... Yes, I think we could improve on that. Have we considered more sophisticated solutions that provide incremental backup on a more frequent basis? I'd be inclined to use Bacula or similar (and it uses Postgres for its metadata store :-) ). Ideally I think we'd like to be able, fairly easily and quickly, to roll the repo (or some portion of it) back to a fairly arbitrary and fairly precise (say within an hour or two) point in recent time.

well yeah - while I do think that something as complex as Bacula is probably overkill for our needs, we can certainly improve over the current state. One thing to consider is that we actually have two major scenarios to deal with:

1. simple repo corruption (accident, CVS software bug, admin error) - this one might require us to restore the repo from an older backup in case the corruption cannot be repaired easily. For this we already have myriads of copies of the trees out in the wild, but it might be a good idea to keep a number of snapshots of the main repo on the CVS-VPS itself (either done every few hours or made as part of the push to anoncvs and svr1). This way we could do a very simple in-place recovery on the same running VPS instance with fairly low impact on all the committers (and dependent parts of the infrastructure).

2. total loss of the main CVS-VPS (extended power failure, hardware error, OS bug, admin error, fire, some other catastrophic event) - in this case we will have to fail over to one of the other project datacenters, and for this we need to have full regular copies of the whole VM on external hosts.

Stefan

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] CVS corruption/mistagging?
Peter Eisentraut wrote: On Wednesday, 15 August 2007 04:20, Tom Lane wrote: we should at least log such commands, and maybe disallow them to anyone except Marc's pgsql account. I don't think we should disallow it. Or otherwise we might one day be stuck if we need to release while some specific person is on vacation.

Is this really a problem in practice? If such a need arises, any committer (or sysadmin) would probably be able to remove that restriction in a few minutes. I think the point here is to prevent such things being done by accident (vs. on purpose), and a script like that seems like a fairly simple yet effective solution.

Stefan

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] CVS corruption/mistagging?
Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Peter Eisentraut wrote: On Wednesday, 15 August 2007 04:20, Tom Lane wrote: we should at least log such commands, and maybe disallow them to anyone except Marc's pgsql account. I don't think we should disallow it. Or otherwise we might one day be stuck if we need to release while some specific person is on vacation. Is this really a problem in practice?

I'd be happy if such commands got reported to the pgsql-committers mailing list. The real problem with this mistake is not so much that it was made as that we had to do detective work to find out.

regards, tom lane

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [EMAIL PROTECTED]: Re: [GENERAL] array_to_set functions]
2007/8/15, Decibel! [EMAIL PROTECTED]: On Wed, Aug 15, 2007 at 06:47:05AM +0200, Pavel Stehule wrote: 2007/8/14, Decibel! [EMAIL PROTECTED]: On Tue, Aug 14, 2007 at 05:38:33PM +0200, Pavel Stehule wrote: 2007/8/14, Bruce Momjian [EMAIL PROTECTED]: TODO item?

I am against it. It's too simple to do in the SQL language.

Why make everyone who works with arrays create a function just to do this? Something that's of use to common users should be included, simple or not.

Unpacking an array is more of a SQL construct for me than an SRF function. With a function you cannot control the behavior of unpacking. With a SQL construct I can.

Huh? You can do a DISTINCT or an ORDER BY on the output of a SRF.

Yes, but then you get a little bit different form than mine :)

SELECT DISTINCT a(i) FROM generate_series ... removes duplicates
SELECT a(i) FROM generate_series ORDER BY .. sorted output
etc.

But I can SELECT * FROM generate_series(ARRAY[1,3,4,5,7,10]); or else FUNCTION generate_series(anyarray) returns setof any

-- Decibel!, aka Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] XID wraparound and busy databases
Bruce Momjian [EMAIL PROTECTED] writes: Is enlarging the xid field something we should consider for 8.4? No. We just got the tuple header down to 24 bytes, we are not going to give that back and then some. If you are processing 6K transactions per second, you can afford to vacuum every couple days... and probably need to vacuum much more often than that anyway, to avoid table bloat. Possibly your respondent should think about trying to do more than one thing per transaction? regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] XID wraparound and busy databases
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Is enlarging the xid field something we should consider for 8.4? No. We just got the tuple header down to 24 bytes, we are not going to give that back and then some. If you are processing 6K transactions per second, you can afford to vacuum every couple days... and probably need to vacuum much more often than that anyway, to avoid table bloat. Possibly your respondent should think about trying to do more than one thing per transaction? OK, yea, I think that makes sense. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] XID wraparound and busy databases
Bruce Momjian wrote: Is enlarging the xid field something we should consider for 8.4? Is the autovacuum I/O less then the extra I/O needed for an expanded xid fields on every row? I doubt that's going to happen... Maybe we can do something to reduce the xid consumption? For example, reuse xids for read-only queries. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] CVS corruption/mistagging?
--On Wednesday, August 15, 2007 12:11:35 +0200 Peter Eisentraut [EMAIL PROTECTED] wrote: On Wednesday, 15 August 2007 04:20, Tom Lane wrote: we should at least log such commands, and maybe disallow them to anyone except Marc's pgsql account. I don't think we should disallow it. Or otherwise we might one day be stuck if we need to release while some specific person is on vacation.

This isn't a big issue ... note that the 'restriction' is easy to remove ... you check out CVSROOT, modify taginfo and comment out the ALL line, and check that in ... What this will prevent is an 'accidental tagging' ... you would have to consciously remove the restriction ... but it's something anyone could do ...

I never understood why tagging uses a special account anyway. It should be done as the person doing the tagging.

Agreed, I'm going to start doing it as myself from now on ... I'm not even 100% certain *why* I started doing it as pgsql in the first place ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.org ICQ . 7615664

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] XID wraparound and busy databases
Heikki Linnakangas [EMAIL PROTECTED] writes: Maybe we can do something to reduce the xid consumption? For example, reuse xids for read-only queries. Hmm, that's an idea. More simply, just keep the current transaction open (resetting everything but the XID) if we have made no changes by the time we're told to commit or rollback ... which is something we track already, so as not to waste cycles on useless commit XLOG records. You'd want some upper limit on transaction lifetime, so as to avoid the long lived transactions hurt VACUUM problem, but even reusing a single xact for a few seconds would pretty much eliminate this issue, I bet. It's hard to see how anyone could be doing 6K xacts/sec unless most are read-only. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Index Tuple Compression Approach?
On 8/14/07, Chris Browne [EMAIL PROTECTED] wrote: I recently had a chat with someone who was pretty intimate with Adabas for a number of years who's in the process of figuring things out about PostgreSQL. We poked at bits of the respective implementations, seeing some similarities and differences. He pointed out one aspect of index handling that could (in principle) be an interesting optimization. Evidently, in Adabas, index leaf nodes were not simply tuples, but lists where the index value would not be repeated. In PostgreSQL, if you have the index value 'abc', and there are 10 tuples with that value, then you'll have a page full of tuples of the following form:

|abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...

Now, the Adabas approach was rather different. It would only have the index value once, and then have the list of tuple pointers:

|abc|ptr[rec1],ptr[rec2],ptr[rec3],...,ptr[rec10]|

This could allow a fair bit of compression, for cases where the index value is not unique.

Interesting. Some time ago I played a little with quite a big table which contained a path (file path) as a primary key. It made sense to have that structure (SELECTs were mostly ORDER BY path WHERE path > '/foo' LIMIT n). The actual index was only a little bit smaller than the table (there were maybe 4 or 5 columns there).

Some time ago I had an idea that it might be possible to compress the index size, even if it is a unique index. Take the path example. My idea would be to split the indexed value into 8-byte chunks. For example, /var/lib/postgresql/8.2/main would be split into:

/var/lib /postgre sql/8.2 -- these would be inserted into the tree as a scaffold, and only vacuum should remove them
main -- this would be a leaf node. It could be repeated in non-unique indexes.

[/etc/pas] -- scaffold node
 |-swd -- leaf node
[/etc/sha]
 |-dow
[/var/lib] -- a problematic mixed scaffold/leaf node
[/postgre]
 |-sql
 |-sql/8.2
[sql/8.2/]
 |-main
 |-foobar

The scaffold nodes would be there to guarantee that there is some place to attach leafs to. They should not be removed by DELETE (unless we are sure no other node depends on them).

Advantages? The repeated values (such as /var/lib/postgresql/8.2) are not repeated -- they are embedded into the tree as a scaffold; the actually significant nodes (files, not directories, in my example) are put in as leafs. I guess it would reduce the size of large indexes and at the same time could remove the limitation that a B-tree index cannot index values larger than 1/3 of a database page. 8-byte chunks were given as an example here; perhaps a larger value would be better.

(Of course redesigning the schema to put directories separate from files would be useful, but it would not help with ORDER BY .. LIMIT queries -- they would have to be JOINed and re-sorted in memory, I'm afraid).

Regards, Dawid

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
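[Editor's note] The chunking Dawid describes can be illustrated with stock string functions; a hypothetical sketch of how a key would be cut up before insertion (it cuts strictly every 8 characters, so the boundaries differ slightly from the hand-split example above):

```sql
-- Cut a path into strict 8-character chunks; the leading chunks would
-- become scaffold nodes and the final one the leaf.
SELECT n,
       substring('/var/lib/postgresql/8.2/main'
                 FROM (n - 1) * 8 + 1 FOR 8) AS chunk
  FROM generate_series(1, (length('/var/lib/postgresql/8.2/main') + 7) / 8) AS n;
-- 1 | /var/lib
-- 2 | /postgre
-- 3 | sql/8.2/
-- 4 | main
```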
Re: [HACKERS] XID wraparound and busy databases
On Wed, Aug 15, 2007 at 12:49:52PM -0400, Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Maybe we can do something to reduce the xid consumption? For example, reuse xids for read-only queries. Hmm, that's an idea. More simply, just keep the current transaction open (resetting everything but the XID) if we have made no changes by the time we're told to commit or rollback ... which is something we track already, so as not to waste cycles on useless commit XLOG records. You'd want some upper limit on transaction lifetime, so as to avoid the long lived transactions hurt VACUUM problem, but even reusing a single xact for a few seconds would pretty much eliminate this issue, I bet. It's hard to see how anyone could be doing 6K xacts/sec unless most are read-only.

Aren't there potential issues with keeping the same XID if a transaction in the middle has modified data? IE:

SELECT * FROM a
DELETE FROM a ...
SELECT * FROM a

I'm thinking that in any system that's doing an enormous transaction rate, most will be read-only, which means there'd still be a lot of benefit to simply using one transaction until the next write transaction came along.

Something else to think about... any app that's doing that kind of transaction rate is likely going to have a large number of backends, so it would be even better if one XID could be shared across backends.

-- Decibel!, aka Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] Index Tuple Compression Approach?
On Wed, 2007-08-15 at 06:51 +0100, Heikki Linnakangas wrote: What Chris is suggesting is basically a special case of GIT, where all the heap tuples represented by an index tuple have the same key. I was actually thinking of adding a flag to index tuples to indicate that special case in GIT. We could effectively do both. The bigger difference that I see is that GIT doesn't just group together ranges of keys, it also groups by heap page number (or a small range of page numbers, as Simon pointed out). Regards, Jeff Davis ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] XID wraparound and busy databases
Decibel! [EMAIL PROTECTED] writes: Aren't there potential issues with keeping the same XID if a transaction in the middle has modified data? I don't see any, as long as you take a new snapshot. Something else to think about... any app that's doing that kind of transaction rate is likely going to have a large number of backends, so it would be even better if one XID could be shared across backends. Not sane in the least. What happens if two of them start to modify data, and then one commits and one rolls back? In any case, if we can cut the xact rate to one every few seconds per backend, the problem goes away. Actually ... an idea that might be worth investigating is to do something similar to what we already did for subtransactions: don't assign an XID at all until the transaction makes a data change. I think that the main reason for assigning an XID earlier is just that the ProcArray routines ignore backends that are not currently showing an active XID when they figure global XMIN. But a backend could probably advertise an XMIN, indicating the age of the oldest active snapshot it's got, without advertising an active XID. (This could also tie into the idea we discussed awhile back of tracking minimum XMIN better by maintaining a central list of open snapshots within each backend. Perhaps that should be totally decoupled from whether we are advertising an active XID ...) regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Another idea for index-only scans
On 8/15/07, Bruce Momjian [EMAIL PROTECTED] wrote: I have added another idea for index-only scans to the TODO list: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter. Perhaps this is naive (or discussed and discarded... if so, I couldn't find it, but I apologize if that's the case), but wouldn't recording the xid of non-readonly transactions, at commit time, and at the table level, be equivalent to the flag and remove the need for a counter? Readers could just check the last-modification-xid at the beginning and end of their scans to test for heap stability. I suppose that would require a write-exclusive lock on some metadata for each modified table during each commit... so perhaps it's a non-starter right there. --miker ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] XID wraparound and busy databases
On Wednesday 15 August 2007 09:49:52 Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Maybe we can do something to reduce the xid consumption? For example, reuse xids for read-only queries. Hmm, that's an idea. More simply, just keep the current transaction open (resetting everything but the XID) if we have made no changes by the time we're told to commit or rollback ... which is something we track already, so as not to waste cycles on useless commit XLOG records.

Jan and I were discussing something like this as it would relate to a subscribe process for Slony. Jan, care to summarize your thoughts on this?

You'd want some upper limit on transaction lifetime, so as to avoid the long lived transactions hurt VACUUM problem, but even reusing a single xact for a few seconds would pretty much eliminate this issue, I bet. It's hard to see how anyone could be doing 6K xacts/sec unless most are read-only.

regards, tom lane

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Another idea for index-only scans
Bruce, I have added another idea for index-only scans to the TODO list: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter. Seems marginal at best. Checking overlap between the index and the FSM/DSM and only check dirty pages seems more intelligent, and able to cover a larger number of cases. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] change name of redirect_stderr?
Heikki, Should we change the ordering of pg_settings? Well, an unfixed issue in pg_settings is that the category/subcategory relationship got represented by a non-atomic field which makes sorting on that difficult. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Another idea for index-only scans
Bruce Momjian [EMAIL PROTECTED] writes: I have added another idea for index-only scans to the TODO list: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter. I think I would prefer to address this in the same infrastructure as the dead-space-map. That way you're not dependent on having no updates happening on the table at all. Any tuples on pages which contain no in-doubt tuples could have their visibility check skipped but when you come across a tuple on a page which has been modified since the last vacuum then you have to check the visibility. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CVS corruption/mistagging?
Marc G. Fournier wrote: --On Wednesday, August 15, 2007 12:11:35 +0200 Peter Eisentraut [EMAIL PROTECTED] wrote: On Wednesday, 15 August 2007 04:20, Tom Lane wrote: we should at least log such commands, and maybe disallow them to anyone except Marc's pgsql account. I don't think we should disallow it. Or otherwise we might one day be stuck if we need to release while some specific person is on vacation. This isn't a big issue ... note that the 'restriction' is easy to remove ... you check out CVSROOT, modify taginfo and comment out the ALL line, and check that in ... What this will prevent is an 'accidental tagging' ... you would have to consciously remove the restriction ... but it's something anyone could do ... I never understood why tagging uses a special account anyway. It should be done as the person doing the tagging. Agreed, I'm going to start doing it as myself from now on ... I'm not even 100% certain *why* I started doing it as pgsql in the first place ...

If you're doing that, we should probably just delete the pgsql userid from the system? Or at least change it so it doesn't have 'dev' permissions. That way you can't do it wrong in that direction ;-) Seems reasonable?

//Magnus

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] XID wraparound and busy databases
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Is enlarging the xid field something we should consider for 8.4? No. We just got the tuple header down to 24 bytes, we are not going to give that back and then some. If you are processing 6K transactions per second, you can afford to vacuum every couple days... and probably need to vacuum much more often than that anyway, to avoid table bloat. Speaking of vacuum, hopefully we'll get some sort of dead space map in 8.4. If we keep track of frozen pages there, vacuuming to avoid xid wraparound will be much cheaper. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Another idea for index-only scans
A hybrid scan approach combined with this idea would fit nicely - provide results for tids that are directly visible and set a bit in a bitmap for those that need recheck and extend recheck to take a bitmap (wait - it already does :-) - Luke Msg is shrt cuz m on ma treo -Original Message- From: Gregory Stark [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 15, 2007 02:58 PM Eastern Standard Time To: Bruce Momjian Cc: PostgreSQL-development Subject:Re: [HACKERS] Another idea for index-only scans Bruce Momjian [EMAIL PROTECTED] writes: I have added another idea for index-only scans to the TODO list: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter. I think I would prefer to address this in the same infrastructure as the dead-space-map. That way you're not dependent on having no updates happening on the table at all. Any tuples on pages which contain no in-doubt tuples could have their visibility check skipped but when you come across a tuple on a page which has been modified since the last vacuum then you have to check the visibility. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CVS corruption/mistagging?
--On Wednesday, August 15, 2007 20:57:14 +0200 Magnus Hagander [EMAIL PROTECTED] wrote: Marc G. Fournier wrote: --On Wednesday, August 15, 2007 12:11:35 +0200 Peter Eisentraut [EMAIL PROTECTED] wrote: On Wednesday, 15 August 2007 04:20, Tom Lane wrote: we should at least log such commands, and maybe disallow them to anyone except Marc's pgsql account. I don't think we should disallow it. Or otherwise we might one day be stuck if we need to release while some specific person is on vacation. This isn't a big issue ... note that the 'restriction' is easy to remove ... you check out CVSROOT, modify taginfo and comment out the ALL line, and check that in ... What this will prevent is an 'accidental tagging' ... you would have to consciously remove the restriction ... but it's something anyone could do ... I never understood why tagging uses a special account anyway. It should be done as the person doing the tagging. Agreed, I'm going to start doing it as myself from now on ... I'm not even 100% certain *why* I started doing it as pgsql in the first place ... If you're doing that, we should probably just delete the pgsql userid from the system? Or at least change it so it doesn't have 'dev' permissions. That way you can't do it wrong in that direction ;-) Seems reasonable?

there is no pgsql user *on* cvs.postgresql.org, at least there wasn't when I logged on last night:

cvs# grep pgsql /etc/passwd
cvs#

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.org ICQ . 7615664

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] XID wraparound and busy databases
Tom Lane wrote: It's hard to see how anyone could be doing 6K xacts/sec unless most are read-only. regards, tom lane In a recent stress test with our PostgreSQL-based cluster between two machines, 3 million transactions were performed with pgbench -c 150 -t 2 -s 200 in about _ten hours_. The primary machine (desktop-level machine for development) used a real disk, the secondary used tmpfs as PGDATA. Say whatever you want about my disk lying about flush, its 75MB/sec transfer rate is real. So 5 million real transactions in 24 hours is not unrealistic. -- -- Zoltán Böszörményi Cybertec Geschwinde & Schönig GmbH http://www.postgresql.at/
Re: [HACKERS] Another idea for index-only scans
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: I have added another idea for index-only scans to the TODO list: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter. I think I would prefer to address this in the same infrastructure as the dead-space-map. That way you're not dependent on having no updates happening on the table at all. Any tuples on pages which contain no in-doubt tuples could have their visibility check skipped but when you come across a tuple on a page which has been modified since the last vacuum then you have to check the visibility. Yea, the bitmap/page idea is already on the TODO list. This was just a less granular idea. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
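For what it's worth, Bruce's flag-plus-counter scheme can be sketched as a toy model. This is illustrative Python only, with invented names (`Table`, `modify`, `heap_scan`) standing in for backend machinery; it is not PostgreSQL code, just the control flow the TODO item describes:

```python
# Toy model of the per-table "all-visible" flag plus modification
# counter idea.  Any change clears the flag; a heap scan may set it
# only if the counter is unchanged between scan start and scan end.

class Table:
    def __init__(self):
        self.mod_counter = 0      # bumped by any change to the table
        self.all_visible = False  # flag an index-only scan could consult
        self.rows = []

    def modify(self, row):
        self.rows.append(row)
        self.mod_counter += 1
        self.all_visible = False  # any table change clears the flag

    def heap_scan(self):
        # Remember the counter at scan start; if it is the same at the
        # end, the table was not modified during the scan, so the flag
        # can safely be set.
        start = self.mod_counter
        seen = list(self.rows)    # pretend each row's visibility was checked
        if self.mod_counter == start:
            self.all_visible = True
        return seen

t = Table()
t.modify("row1")
t.heap_scan()
assert t.all_visible              # no concurrent change during the scan
t.modify("row2")
assert not t.all_visible          # the modification cleared the flag
```

This also makes Decibel's later point concrete: a table that is never modified keeps the flag set indefinitely, which is exactly the read-only-partition case.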
Re: [HACKERS] CVS corruption/mistagging?
Marc G. Fournier wrote: If you're doing that, we should probably just delete the pgsql userid from the system? Or at least change it so it doesn't have 'dev' permissions. That way you can't do it wrong in that direction ;-) Seems reasonable? there is no pgsql user *on* cvs.postgresql.org, at least there wasn't when I logged on last night: cvs# grep pgsql /etc/passwd cvs# Ah, seems I was faster than I expected myself :-) Problem solved then. //Magnus
Re: [HACKERS] Index Tuple Compression Approach?
Dawid Kuroczko wrote: Some time ago I've had an idea that it might be possible to compress the index size, even if it is a unique index. Take the path example. My idea would be to split the indexed value into 8-byte chunks. For example: /var/lib/postgresql/8.2/main would be split into: /var/lib /postgre sql/8.2 -- these would be inserted into a tree as a scaffold, and only vacuum should remove them.. main -- this would be a leaf node. It could be repeated in non-unique indexes. That general approach of storing a common leading part just once is called prefix compression. Yeah, it helps a lot on long text fields. Tree structures like file paths in particular. It's been discussed before. One big problem is extracting the common leading part. You could only do it for text, but it should be done in a datatype-neutral way. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
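The prefix compression Heikki names can be sketched for the text case in a few lines. This is an illustrative Python toy (invented helper names `compress_page`/`decompress_page`), not btree page layout; it only shows the store-the-common-leading-part-once idea on a sorted run of keys:

```python
import os.path

def compress_page(keys):
    """Store the common leading part of sorted text keys once,
    keeping only each key's distinct suffix."""
    if not keys:
        return "", []
    prefix = os.path.commonprefix(keys)   # character-wise common prefix
    return prefix, [k[len(prefix):] for k in keys]

def decompress_page(prefix, suffixes):
    return [prefix + s for s in suffixes]

keys = ["/var/lib/postgresql/8.1/main",
        "/var/lib/postgresql/8.2/main",
        "/var/lib/postgresql/8.3/main"]
prefix, suffixes = compress_page(keys)
assert prefix == "/var/lib/postgresql/8."
assert suffixes == ["1/main", "2/main", "3/main"]
# lossless: the original keys are fully recoverable
assert decompress_page(prefix, suffixes) == keys
```

Note how this only falls out naturally for text; doing it datatype-neutrally, as Heikki says, is the hard part.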
Re: [HACKERS] XID wraparound and busy databases
Heikki Linnakangas wrote: Zoltan Boszormenyi wrote: Tom Lane wrote: It's hard to see how anyone could be doing 6K xacts/sec unless most are read-only. In a recent stress test with our PostgreSQL-based cluster between two machines, 3 million transactions were performed with pgbench -c 150 -t 2 -s 200 in about _ten hours_. The primary machine (desktop-level machine for development) used a real disk, the secondary used tmpfs as PGDATA. Say whatever you want about my disk lying about flush, its 75MB/sec transfer rate is real. So 5 million real transactions in 24 hours is not unrealistic. 6k xacts / s is five *hundred* million transactions, not five million... Blush. :-) You're right. However a single machine with ramdisk is able to do that. -- -- Zoltán Böszörményi Cybertec Geschwinde & Schönig GmbH http://www.postgresql.at/
Re: [HACKERS] XID wraparound and busy databases
Zoltan Boszormenyi wrote: Tom Lane wrote: It's hard to see how anyone could be doing 6K xacts/sec unless most are read-only. In a recent stress test with our PostgreSQL-based cluster between two machines, 3 million transactions were performed with pgbench -c 150 -t 2 -s 200 in about _ten hours_. The primary machine (desktop-level machine for development) used a real disk, the secondary used tmpfs as PGDATA. Say whatever you want about my disk lying about flush, its 75MB/sec transfer rate is real. So 5 million real transactions in 24 hours is not unrealistic. 6k xacts / s is five *hundred* million transactions, not five million... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
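Heikki's correction is just arithmetic: at Tom's hypothetical sustained rate, a day's worth of transactions is two orders of magnitude above five million. A quick check:

```python
rate = 6000                     # transactions per second (Tom's 6K figure)
seconds_per_day = 24 * 60 * 60  # 86400
per_day = rate * seconds_per_day
assert per_day == 518_400_000   # ~5.2e8: five *hundred* million, not five million
```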
Re: [HACKERS] XID wraparound and busy databases
Zoltan Boszormenyi wrote: used a real disk, the secondary used tmpfs as PGDATA. Say whatever you want about my disk lying about flush, its 75MB/sec transfer rate is real. So 5 million real transactions in 24 hours is not unrealistic. 6k xacts / s is five *hundred* million transactions, not five million... Blush. :-) You're right. However a single machine with ramdisk is able to do that. A battery-backed hard disk controller card can do that too. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Index Tuple Compression Approach?
[EMAIL PROTECTED] (Dawid Kuroczko) writes: On 8/14/07, Chris Browne [EMAIL PROTECTED] wrote: I recently had a chat with someone who was pretty intimate with Adabas for a number of years who's in the process of figuring things out about PostgreSQL. We poked at bits of the respective implementations, seeing some similarities and differences. He pointed out one aspect of index handling that could (in principle) be an interesting optimization. Evidently, in Adabas, index leaf nodes were not simply tuples, but lists where the index value would not be repeated. In PostgreSQL, if you have the index value 'abc', and there are 10 tuples with that value, then you'll have a page full of tuples of the following form: |abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth... Now, the Adabas approach was rather different. It would only have the index value once, and then have the list of tuple pointers: |abc|ptr[rec1],ptr[rec2],ptr[rec3],...,ptr[rec10]| This could allow a fair bit of compression, for cases where the index value is not unique. Interesting. Some time ago I've played a little with quite a big table which contained a path (file path) as a primary key. It made sense to have a structure (SELECTs were mostly ORDER BY path WHERE path > '/foo' LIMIT n). The actual index was only a little bit smaller than the table (there were maybe 4 or 5 columns there). Some time ago I've had an idea that it might be possible to compress the index size, even if it is a unique index. Take the path example. My idea would be to split the indexed value into 8-byte chunks. For example: /var/lib/postgresql/8.2/main would be split into: /var/lib /postgre sql/8.2 -- these would be inserted into a tree as a scaffold, and only vacuum should remove them.. main -- this would be a leaf node. It could be repeated in non-unique indexes. [/etc/pas] -- scaffold-node |-swd -- leaf node [/etc/sha] |-dow [/var/lib] -- a problematic mixed scaffold/leaf node. 
[/postgre] |-sql |-sql/8.2 [sql/8.2/] |-main |-foobar The scaffold nodes would be there to guarantee that there is some place to attach leaves to. They should not be removed by DELETE (unless we are sure no other node depends on them). Note that there is a well-understood name for this; this is variously known as a Radix tree or a Patricia trie. http://en.wikipedia.org/wiki/Radix_tree As you observe, the tree/trie edges consist not of individual characters, but rather of sequences of characters. Advantages? The repeated values (as /var/lib/postgresql/8.2) are not repeated -- they are embedded into the tree as a scaffold; the actual nodes that are significant (files, not directories, in my example) are put in as actual leaves. Certainly as compared to a traditional trie, this representation leads to there being a whole lot fewer nodes and a whole lot less branching. The Radix/Patricia tree compresses things two ways: - As you observe, there can be fewer, larger components - By combining common prefixes together, this makes cases of strings that are highly patterned much, much, much cheaper, as the tree branches at (and only at) the places where they tend to differ. It could conceivably make it workable to have indexes on big, highly repeating things such as blobs of XML. (Although it *doesn't* get you the ability to search on substrings, which is probably what you'd also want...) I guess it would reduce large index sizes and at the same time it could remove the limitation that a B-tree index cannot index values larger than 1/3 of the database page. 8-byte chunks were given as an example here; perhaps a larger value would be better. (Of course redesigning the schema to put directories separate from files would be useful, but it would not help with ORDER BY .. LIMIT queries -- they would have to be JOIN-ed and re-sorted in memory I'm afraid). I'll gleefully ignore the nature of the example, as it's kind of beside the point. The point is to try to compress what's in the column. 
If it's being abused somewhat, and has more crud in it, that gives a potential for a *bigger* win. -- output = reverse(gro.mca @ enworbbc) http://linuxdatabases.info/info/spiritual.html Documentation wants to be obsolete. -- Bruce R. Lewis
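Dawid's fixed 8-byte chunking can be sketched directly. This is an illustrative Python toy with invented names (`RadixNode`, `insert`, `contains`); a real radix/Patricia tree splits edges at divergence points rather than at fixed offsets, so this follows the email's simplified scheme, not the textbook structure:

```python
class RadixNode:
    def __init__(self):
        self.children = {}    # edge label (a string chunk) -> RadixNode
        self.is_leaf = False  # True if a complete key ends at this node

def insert(root, key, chunk=8):
    """Insert key, splitting it into fixed-size chunks as in the
    /var/lib /postgre sql/8.2/ main example above."""
    node = root
    for i in range(0, len(key), chunk):
        part = key[i:i + chunk]
        node = node.children.setdefault(part, RadixNode())
    node.is_leaf = True       # everything on the way down is scaffold

def contains(root, key, chunk=8):
    node = root
    for i in range(0, len(key), chunk):
        part = key[i:i + chunk]
        if part not in node.children:
            return False
        node = node.children[part]
    return node.is_leaf

root = RadixNode()
insert(root, "/var/lib/postgresql/8.2/main")
insert(root, "/var/lib/postgresql/8.2/foobar")
assert contains(root, "/var/lib/postgresql/8.2/main")
# the shared leading chunks are stored only once:
assert len(root.children) == 1
# scaffold nodes exist but are not matches in their own right:
assert not contains(root, "/var/lib")
```

The scaffold/leaf distinction is exactly the `is_leaf` flag: DELETE would clear a leaf flag, while removing now-unreferenced scaffold nodes would be vacuum's job.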
Re: [HACKERS] XID wraparound and busy databases
On Wednesday 15 August 2007 13:54, Tom Lane wrote: Decibel! [EMAIL PROTECTED] writes: Aren't there potential issues with keeping the same XID if a transaction in the middle has modified data? I don't see any, as long as you take a new snapshot. I'm a little confused, wouldn't the transaction that waits 30 minutes before modifying data need to get an XID that jives with the system when its transaction started, not when it began manipulating data? Would it really be safe to take a new snapshot at that time, istm concurrent writers might have caused potential issues by that point. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] Index Tuple Compression Approach?
Heikki Linnakangas [EMAIL PROTECTED] writes: That general approach of storing a common leading part just once is called prefix compression. Yeah, it helps a lot on long text fields. Tree structures like file paths in particular. You kind of want to avoid both the prefix and the suffix, no? It's been discussed before. One big problem is extracting the common leading part. You could only do it for text, Or for multi-column indexes I could see this being especially useful if you have some columns in the index key which are small and some that are quite large. So if you have an event table with an index on userid,timestamp you wouldn't have to store lots of timestamps on the upper level tree nodes. You would only store them for the leaf nodes. but it should be done in a datatype neutral way. I wonder if there's an analogous operation for other data types though. Numeric could store a value relative to the parent value. Arrays could store only the elements needed. bytea of course works just as well as text (or better in the face of i18n). -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
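Stark's "store a value relative to the parent" suggestion for numerics is essentially delta coding. As a simplified stand-in (illustrative Python, deltas taken against the predecessor in a sorted run rather than against a parent tree node), the idea looks like this:

```python
def delta_encode(values):
    """Store each value in a sorted run as a (small) delta from its
    predecessor instead of repeating the full value."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def delta_decode(deltas):
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

vals = [1000000, 1000003, 1000007, 1000050]
deltas = delta_encode(vals)
assert deltas == [1000000, 3, 4, 43]   # only the first entry is large
assert delta_decode(deltas) == vals    # lossless round trip
```

The payoff mirrors the text case: clustered key values shrink to small deltas, just as clustered paths shrink to short suffixes.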
Re: [HACKERS] XID wraparound and busy databases
Robert Treat [EMAIL PROTECTED] writes: I'm a little confused, wouldn't the transaction that waits 30 minutes before modifying data need to get an XID that jives with the system when its transaction started, not when it began manipulating data? Why? Would it really be safe to take a new snapshot at that time, You wouldn't take a new snapshot. The thought that occurs to me is that there's no reason that a transaction has to have an XID for itself before it takes a snapshot. We always special-case our own XID anyway. regards, tom lane
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
Magnus Hagander wrote: I don't use the functional index part, but for new users I can see how that's certainly a *lot* easier. Can someone with modern hardware test to see if it's still quite a bit slower than the extra column? I had tried it too years ago, and found the functional index to be quite a bit slower: http://archives.postgresql.org/pgsql-general/2005-10/msg00475.php but it'd be interesting to see if faster CPUs changed this.
Re: [HACKERS] XID wraparound and busy databases
On Wed, Aug 15, 2007 at 5:06 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: Robert Treat [EMAIL PROTECTED] writes: I'm a little confused, wouldn't the transaction that waits 30 minutes before modifying data need to get an XID that jives with the system when its transaction started, not when it began manipulating data? Why? Would it really be safe to take a new snapshot at that time, You wouldn't take a new snapshot. The thought that occurs to me is that there's no reason that a transaction has to have an XID for itself before it takes a snapshot. We always special-case our own XID anyway. I'm having trouble picturing how that would work with a transaction using the SERIALIZABLE transaction isolation level, or would this just be done at the READ COMMITTED level? -Kevin
Re: [HACKERS] XID wraparound and busy databases
Kevin Grittner [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: You wouldn't take a new snapshot. The thought that occurs to me is that there's no reason that a transaction has to have an XID for itself before it takes a snapshot. We always special-case our own XID anyway. I'm having trouble picturing how that would work with a transaction using the SERIALIZABLE transaction isolation level, Why? You take a snapshot, you use it. If you later need to allocate an XID for yourself, you do that --- your own XID is surely uncommitted in any case, so this doesn't affect the validity of the snapshot. The bottom line here is that we need own-XID-if-any to be >= snapshot xmin, but there's no obvious reason why it has to be <= snapshot xmax. This is, in fact, *not* the case for subtransaction XIDs, and I see no fundamental reason why it need be true for the top transaction XID. regards, tom lane
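Tom's point can be modeled in miniature. This is an illustrative Python toy (invented names `Txn`, `first_write`, a global counter standing in for the XID counter), not backend code; it just shows that a lazily assigned XID satisfies the one invariant that matters (own XID >= snapshot xmin) while being free to exceed the snapshot's xmax:

```python
# Toy model of lazy XID assignment: take the snapshot first, allocate
# an XID only when the transaction first writes.

next_xid = 100   # stand-in for the global XID counter

class Txn:
    def __init__(self):
        self.xid = None
        # snapshot taken without allocating an XID; for simplicity no
        # other transactions are in progress, so xmin == xmax here
        self.snap_xmin = self.snap_xmax = next_xid

    def first_write(self):
        global next_xid
        if self.xid is None:          # allocate lazily, on first write
            self.xid = next_xid
            next_xid += 1

t = Txn()
next_xid += 5        # other transactions start and commit meanwhile
t.first_write()
assert t.xid >= t.snap_xmin   # the required invariant holds
assert t.xid > t.snap_xmax    # allowed: own XID need not be <= xmax
```

The snapshot stays valid because the transaction's own XID is special-cased as uncommitted regardless of where it falls relative to xmax.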
Re: [HACKERS] Another idea for index-only scans
On Aug 15, 2007, at 1:54 PM, Gregory Stark wrote: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter. I think I would prefer to address this in the same infrastructure as the dead-space-map. That way you're not dependent on having no updates happening on the table at all. Any tuples on pages which contain no in-doubt tuples could have their visibility check skipped but when you come across a tuple on a page which has been modified since the last vacuum then you have to check the visibility. The advantage to Bruce's idea is that it sounds pretty simple to implement. While it wouldn't be of use for many general cases, it *would* be useful for read-only tables, ie: old partitions. -- Decibel!, aka Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)