Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!
Gavin,

Personally, I'd love to see some of these newer data analysis capabilities added to PostgreSQL -- or at least put out there as interesting patches. I think if the code is good enough, and we can avoid horrible non-standard syntax extensions, they should go in. We have to defend our title as most advanced database, and having stuff like Skyline first (before DB2 or MS) goes a long way toward that.

-- Josh Berkus PostgreSQL @ Sun San Francisco

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Bug: Buffer cache is not scan resistant
On 3/7/07, Hannu Krosing [EMAIL PROTECTED] wrote:
> Do any of you know about a way to READ PAGE ONLY IF IN CACHE in *nix
> systems?

Supposedly you could mmap() a file and then do mincore() on the area to see which pages are cached. But you were talking about the postgres cache before; there it should be easily implementable.

-- marko
Re: [HACKERS] Auto creation of Partitions
On Wednesday, March 7, 2007 at 07:44, NikhilS wrote:
> Sure, but as Chris mentioned earlier, wouldn't it be useful to maintain
> uniqueness on a partition-by-partition basis too?

Many things might be useful, but the aim of the table partitioning venture, as I understand it, is to provide a transparent interface to the existing do-it-yourself partitioning facilities. Therefore, the logical definition of a table must continue to work unchanged (or, alternatively, use of the feature must be prohibited if that cannot be guaranteed in a particular case). Other features such as uniqueness on a partition basis might also be useful, but you can do that today and you don't need partitioning for that.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] Auto creation of Partitions
On Tue, 2007-03-06 at 18:31 +0530, NikhilS wrote:
> On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote:
> > NikhilS wrote:
> > > iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
> > > pass it on to the children tables.
> > How will you maintain a primary key in such a table, considering that
> > indexes can't span multiple tables?
> We will not (I know it's a hard thing to do :)); the intention is to
> use this information from the parent and make it a property of the
> child table. This will avoid the user having to manually specify
> CREATE INDEX and the like on all the child tables one by one.

If you know that the constraints on each of the tables are distinct, then building a UNIQUE index on each of the partitions is sufficient to prove that all rows in the combined partitioned table are distinct as well.

The hard part there is checking that the partition constraints are distinct. If the partition constraints are added one at a time, you can use the predicate-testing logic to compare the to-be-added partition's constraint against each of the already-added constraints. That becomes an O(N) problem.

What is really needed is a data structure that allows range partitions to be accessed more efficiently. This could make adding partitions, and deciding in which partition a specific value goes, an O(logN) operation.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
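The O(logN) structure Simon asks for can be sketched as a sorted array of partition lower bounds probed with binary search: both the overlap check at partition-add time and the routing of a value then touch only the neighbouring entries, instead of scanning every constraint. This is an illustrative toy in Python (all names here are invented for the sketch, nothing from any patch):

```python
import bisect

class RangePartitionMap:
    """Toy directory of non-overlapping [low, high) range partitions,
    kept sorted by lower bound so lookups are binary searches."""

    def __init__(self):
        self.lows = []    # sorted lower bounds
        self.parts = []   # (low, high, name), parallel to self.lows

    def add_partition(self, low, high, name):
        i = bisect.bisect_right(self.lows, low)
        # Overlap check against the two neighbours only -- found in
        # O(log N), instead of comparing against every constraint.
        if i > 0 and self.parts[i - 1][1] > low:
            raise ValueError("overlaps previous partition")
        if i < len(self.parts) and self.parts[i][0] < high:
            raise ValueError("overlaps next partition")
        self.lows.insert(i, low)
        self.parts.insert(i, (low, high, name))

    def route(self, value):
        """Return the name of the partition holding value, or None."""
        i = bisect.bisect_right(self.lows, value) - 1
        if i >= 0 and self.parts[i][0] <= value < self.parts[i][1]:
            return self.parts[i][2]
        return None
```

(The list insert itself is still O(N) in this toy; a balanced tree would make adding a partition O(logN) as well, which is the point of the suggestion.)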
[HACKERS] Grouped Index Tuples / Clustered Indexes
I've updated the GIT patch at http://community.enterprisedb.com/git/. Bitrot caused by the findinsertloc patch has been fixed, making that part of the GIT patch a little bit smaller and cleaner. I also did some refactoring, and minor cleanup and commenting. Any comments on the design or patch? For your convenience, I copied the same text I added to access/nbtree/README to http://community.enterprisedb.com/git/git-readme.txt

Should we start playing the name game at this point? I've been thinking we should call this feature just Clustered Indexes, even though it's not exactly the same thing as clustered indexes in other DBMSs. From the user's point of view, they behave similarly enough that it may be best to use the existing term.

As a next step, I'm hoping to get the indexam API changes from the bitmap index patch committed soon, and in a way that supports GIT as well.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
Hi,

> If you know that the constraints on each of the tables is distinct,
> then building a UNIQUE index on each of the partitions is sufficient to
> prove that all rows in the combined partitioned table are distinct
> also. The hard part there is checking that the partition constraints
> are distinct. If the partition constraints are added one at a time, you
> can use the predicate testing logic to compare the to-be-added
> partition's constraint against each of the already added constraints.
> That becomes an O(N) problem. What is really needed is a data structure
> that allows range partitions to be accessed more efficiently. This
> could make adding partitions and deciding in which partition a specific
> value goes an O(logN) operation.
> -- Simon Riggs EnterpriseDB http://www.enterprisedb.com

If the unique constraint is supposed to be on a column which is NOT being used for the partitioning, then all of the above becomes much more difficult. While partitioning, the additional onus on the user is to specify non-conflicting CHECKs for the range/list partitions.

Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Grouped Index Tuples / Clustered Indexes
My only question would be: why isn't that in core already?
Re: [HACKERS] Auto creation of Partitions
> > iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
> > pass it on to the children tables.
> How will you maintain a primary key in such a table, considering that
> indexes can't span multiple tables?

Many partitioning schemes have (or can be made to have) only one possible target partition for the primary key. Thus, if you create separate unique indexes on each partition, the problem is solved.

For a first version, I suggest it is sufficient to disallow creation of a unique index on the master when the constraints (on the columns of this index) do not nail down a specific partition for each row (e.g. a hash, or a non-overlapping range, on one of the index columns). Later, global indexes, or indexes with separate partitioning rules, can be implemented to cover the other cases.

Andreas
Re: [HACKERS] Grouped Index Tuples / Clustered Indexes
+1

On 3/7/07 6:53 AM, Grzegorz Jaskiewicz [EMAIL PROTECTED] wrote:
> my only question would be. Why isn't that in core already ?
Re: [HACKERS] Auto creation of Partitions
Hi,

On 3/7/07, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote:
> > > iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
> > > pass it on to the children tables.
> > How will you maintain a primary key in such a table, considering that
> > indexes can't span multiple tables?
> Many partitioning schemes have (or can be made to have) only one
> possible target partition for the primary key. Thus if you create
> separate unique indexes on each partition the problem is solved. For a
> first version I opt, that it is sufficient to disallow creation of a
> unique index on the master, when the constraints (on columns of this
> index) do not nail down a specific partition for each row (e.g. a hash
> or a range on one of the index columns that does not overlap). Later,
> global index, or indexes with separate partitioning rules can be
> implemented, that cover the other cases. Andreas

Yes, I agree. For version 1, UNIQUE/PRIMARY indexes will cascade down to the child tables only if the indexed column is present as part of the partitioning rule.

Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
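The rule agreed on in this subthread rests on a simple argument: when the unique column is also the partitioning column and the partition ranges do not overlap, any two rows with equal keys must route to the same partition, so the per-partition unique index alone is enough to enforce global uniqueness. A toy model in Python (hypothetical names, not patch code) makes that concrete:

```python
class PartitionedTable:
    """Toy partitioned table whose unique column is also the partition
    key, with non-overlapping [low, high) ranges per partition."""

    def __init__(self, bounds):
        self.bounds = bounds                     # list of (low, high)
        self.indexes = [dict() for _ in bounds]  # per-partition "unique index"

    def insert(self, key, row):
        for (low, high), idx in zip(self.bounds, self.indexes):
            if low <= key < high:
                # The local check suffices: any duplicate of this key
                # must have routed to this same partition.
                if key in idx:
                    raise KeyError("duplicate key")
                idx[key] = row
                return
        raise ValueError("no partition accepts this key")
```

If the unique column were not the partition key, a duplicate could land in a different partition and slip past every local check, which is exactly the hard case discussed elsewhere in the thread.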
Re: [HACKERS] Patch license update to developer's FAQ
> I don't think it is common. I didn't add that part, so if you also
> think it is rare, I will remove that distinction. New text:
>
> "PostgreSQL is licensed under a BSD license. By posting a patch to the
> public PostgreSQL mailing lists, you are giving the PostgreSQL Global
> Development Group the non-revokable right to distribute your patch
> under the BSD license. If the patch is not BSD-licensed, it will be
> rejected."

I would remove the last sentence, since it puts the responsibility back on the group (namely, to actively reject).

Andreas
Re: [HACKERS] Auto creation of Partitions
I am wondering if we can implement unique indexes across several tables (an inheritance hierarchy) not by using a single, big index covering all the tables, but rather by inserting a dummy entry into each partition's unique index. This dummy entry would have an expanded CTID which would include the tableoid, so it's possible to check it (albeit there is a problem in that we may require the opening of another heap to do the actual checking). These dummy entries could be removed by bulkcleanup as soon as the inserting transaction is no longer running, to avoid bloating the index too much. All said dummy index entries would be located at either the rightmost or the leftmost leaf, or close to it, so another idea is to have future inserters reuse the entry for a different key.

The obvious problem with this is, naturally, the excess I/O that the extra index traversing causes. The not-so-obvious ones are locking, deadlocking, and the opening of other heaps and indexes while you do the insertion, which may be too expensive. On the other hand, maybe this idea is easier to implement than full-fledged cross-table indexes, so we could have richer partitioning earlier than when somebody finally bites the bullet and implements cross-table indexes.

Or maybe this is just a dumb idea, but I had to let it out anyway :-)

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!
Josh Berkus josh@agliodbs.com writes:
> I think if the code is good enough, and we can avoid horrible
> non-standard syntax extensions, they should go in. We have to defend
> our title as most advanced database and having stuff like Skyline
> first (before DB2 or MS) goes a long way for that.

Well, whether it's horrible or not is in the eye of the beholder, but this is certainly a non-standard syntax extension. My questions about whether to adopt it have more to do with cost/benefit. I haven't seen the patch, but it sounds like it will be large and messy; and it's for a feature that nobody ever heard of before, let alone one that the community has developed a consensus it wants. I'm not interested in adopting stuff just because DB2 hasn't got it.

It's also worth noting that what we've got here is a large patch developed, by students, completely outside our normal development process; so the odds that it's going to be anywhere near acceptable are low. I think the last time we applied a patch that met that description was the INTERSECT/EXCEPT patch in 1999 ... maybe you don't remember what a fiasco that was, but I do.

Sorry to be a thrower of cold water, but I just don't see that this comes anywhere near being something we should be eager to accept.

regards, tom lane
Re: [HACKERS] Auto creation of Partitions
Simon,

On 3/7/07 5:26 AM, Simon Riggs [EMAIL PROTECTED] wrote:
> What is really needed is a data structure that allows range partitions
> to be accessed more efficiently. This could make adding partitions and
> deciding in which partition a specific value goes an O(logN) operation.

I think we need to re-evaluate the inheritance mechanism for partitioning and implement something much closer to the storage layer, similar to Oracle. Having the constraint exclusion occur in the planner is not flexible enough to allow more advanced solutions.

- Luke
Re: [HACKERS] GIST and TOAST
Teodor Sigaev [EMAIL PROTECTED] writes:
> > > input value. As I remember, only R-Tree emulation over boxes,
> > > contrib/seg and contrib/cube have simple compress methods.
> > Hm, if they just return the original datum without detoasting it
> > then it could be an issue. I'll check.
> seg and box aren't varlena types, but cube is, and it seems broken :(.
> g_cube_decompress and g_cube_compress don't detoast values. I'll fix
> that.

Also, all the cube operators like cube_union, cube_size, cube_cmp, etc. Would you like me to do it, or have you already started?

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
On Wed, 2007-03-07 at 10:23 -0500, Luke Lonergan wrote:
> On 3/7/07 5:26 AM, Simon Riggs [EMAIL PROTECTED] wrote:
> > What is really needed is a data structure that allows range
> > partitions to be accessed more efficiently. This could make adding
> > partitions and deciding in which partition a specific value goes an
> > O(logN) operation.
> I think we need to re-evaluate the inheritance mechanism for
> partitioning and implement something much closer to the storage layer,
> similar to Oracle.

Oracle's implementation is fairly close to ours, actually. Each partition is a segment. SQLServer's is fairly low level.

> Having the constraint exclusion occur in the planner is not flexible
> enough to allow more advanced solutions.

It depends what those advanced solutions are. I definitely want to be able to do run-time exclusion, push-down merge joins, and parallelism, but I also think that being able to index only the first 3 partitions is a useful feature too. ISTM that making the Append node responsible for exclusion might be a way to go with this, but various ways are possible, I'm sure. The trick is to find one that does everything you need, and that will take some deeper analysis.

However you do it, you'll still need a way of deciding which partitions to include/exclude that doesn't involve a sequential scan of all partition constraints. So my comments about a different index structure are still going to be relevant, wherever that lives/executes.

I'm not doing anything in this area personally for 8.3, though.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Time-correlated columns in large tables
Jeroen,

On 3/5/07 12:39 PM, Jeroen T. Vermeulen [EMAIL PROTECTED] wrote:
> I guess if you did simple run-length compression on these bitmaps
> you'd end up more or less where I came in. But you wouldn't want to
> flip a bit somewhere in the middle of a compressed data stream, of
> course.

We handle that by doing a recompression in-page if possible, and a page split if not. Jie/Gavin's work will initially be an equality-encoded bitmap, as Heikki indicates; soon after, we can implement range encoding, etc.

- Luke
Re: [HACKERS] GIST and TOAST
I've already started, don't worry about that. Cube has been broken since TOAST was implemented :)

Gregory Stark wrote:
> Also, all the cube operators like cube_union, cube_size, cube_cmp, etc.
> Would you like me to do it or are you already started?

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
[HACKERS] Test report on GENERATED/IDENTITY
Hi,

I made some tests to prove that GENERATED can help boost performance. I created a table like this:

create table t1 (
    id serial,
    i1 integer,
    i2 integer,
    g1 integer generated always as (
        case when i1 is null then i2
             when i2 is null then i1
             else i1 + i2 end));

I inserted 1 million records into the table:

for (i = 1; i <= 1000; i++)
    for (j = 1; j <= 1000; j++)
        INSERT INTO t1 (i1, i2) VALUES (i, j);

After VACUUM FULL ANALYZE, I timed SELECT id, i1, i2, g1 FROM t1 and SELECT id, i1, i2, <generation expression> FROM t1, with the result redirected to /dev/null. Results of ten consecutive runs are:

SELECT id, i1, i2, g1 FROM t1 --
2.59user 0.18system 0:04.75elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33536minor)pagefaults 0swaps
2.57user 0.19system 0:04.59elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.60user 0.16system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33826minor)pagefaults 0swaps
2.59user 0.17system 0:04.82elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.16system 0:04.61elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.61user 0.17system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33574minor)pagefaults 0swaps
2.59user 0.18system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.18system 0:04.67elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32147minor)pagefaults 0swaps
2.58user 0.19system 0:04.63elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33584minor)pagefaults 0swaps
2.73user 0.16system 0:04.94elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+34066minor)pagefaults 0swaps

Average is 4.68 seconds.
SELECT id, i1, i2, <generation expression> FROM t1 --
2.76user 0.18system 0:05.49elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33945minor)pagefaults 0swaps
2.69user 0.17system 0:05.23elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33917minor)pagefaults 0swaps
2.60user 0.18system 0:05.04elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32027minor)pagefaults 0swaps
2.64user 0.16system 0:05.03elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32626minor)pagefaults 0swaps
2.60user 0.15system 0:05.03elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32550minor)pagefaults 0swaps
2.77user 0.18system 0:05.21elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32609minor)pagefaults 0swaps
2.71user 0.17system 0:05.07elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.66user 0.17system 0:05.12elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33042minor)pagefaults 0swaps
2.80user 0.16system 0:05.19elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.71user 0.16system 0:05.14elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+32052minor)pagefaults 0swaps

Average is 5.16 seconds. Using a single GENERATED column gains a good 9.4% in runtime.

With IDENTITY, the picture is not so bright. I expected some cost, but not that much. Why is [simple_]heap_update() so expensive? I created a table and timed inserting 1 million rows into it:

create table t2 (id serial, i1 integer);

or

create table t2 (id serial generated always as identity, i1 integer);

Using a serial column gave me about 12 seconds on the average of 5 runs. With an IDENTITY column, I got 61 seconds once and 66 seconds twice. So, the strictness of the identity column gave me a 500-550% performance penalty.
With a single unique index on i1, I got 24.4 seconds with the serial column and 67 seconds with the identity column. I ran these only once, so this last comparison isn't representative.

I tried to use heap_inplace_update() to update the newly updated or inserted tuple in place, but it gave me ERROR: heap_inplace_update: wrong tuple length even when I had already filled the IDENTITY column with a constant Datum, an Int64 value 0 converted to the type of the column.

If I read it correctly, the HOT patch would give me a speedup for this case?

Best regards, Zoltán Böszörményi
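As a quick sanity check on the SELECT figures above, the averages and the quoted gain can be recomputed from the ten elapsed times of each query (this is just arithmetic on the numbers reported, sketched in Python):

```python
# Elapsed seconds for the ten runs of each SELECT, as reported above.
generated = [4.75, 4.59, 4.60, 4.82, 4.61, 4.60, 4.60, 4.67, 4.63, 4.94]
expression = [5.49, 5.23, 5.04, 5.03, 5.03, 5.21, 5.07, 5.12, 5.19, 5.14]

avg_gen = sum(generated) / len(generated)     # ~4.68 s
avg_expr = sum(expression) / len(expression)  # ~5.16 s

# Fraction of runtime saved by reading the stored g1 column instead of
# recomputing the expression: roughly 9%, in line with the gain quoted.
saving = (avg_expr - avg_gen) / avg_expr
```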
Re: [HACKERS] Auto creation of Partitions
> > What is really needed is a data structure that allows range
> > partitions to be accessed more efficiently. This could make adding
> > partitions and deciding in which partition a specific value goes an
> > O(logN) operation.
> I think we need to re-evaluate the inheritance mechanism for
> partitioning and implement something much closer to the storage layer,
> similar to Oracle. Having the constraint exclusion occur in the
> planner is not flexible enough to allow more advanced solutions.

Whoa, do you have anything to back that up? You would need to elaborate on what you actually mean, but I think it is moot. Sure, the constraint technique can be further extended (e.g. at run time), but IMHO the approach is very good.

Andreas
[HACKERS] Bug in VACUUM FULL ?
Hi,

I am right now working on getting HOT and VACUUM FULL to work together. I hit upon a bug which I initially thought was something that HOT had introduced, but I can reproduce it with CVS HEAD as well. Here is what I do:

Create a simple table with three columns and one index, and insert a single row:

CREATE TABLE test (a int, b int, c char(512));
CREATE UNIQUE INDEX testindx ON test (a);
INSERT INTO test VALUES (1, 2, 'test');

Now, I continuously UPDATE the same row with a simple sql script using pgbench with two clients:

$ cat test.sql
UPDATE test set b = b + 10 WHERE a = 1;
SELECT *, ctid FROM test;

$ ./pgbench -c 2 -t 5 -f ./test.sql postgres

Then I run VACUUM FULL on the table in a loop with 20 seconds of sleep:

$ while (true); do echo VACUUM FULL START; ./install/bin/psql -c 'vacuum full verbose test;' postgres; echo VACUUM FULL COMPLETE; sleep 20; done;

After a few seconds pgbench fails with the following output:

starting vacuum...end.
Client 1 aborted in state 0: ERROR: could not read block 650 of relation 1663/11467/16401: read only 0 of 8192 bytes
Client 0 aborted in state 0: ERROR: could not read block 649 of relation 1663/11467/16401: read only 0 of 8192 bytes
transaction type: Custom query
scaling factor: 1
number of clients: 2
number of transactions per client: 5
number of transactions actually processed: 29445/10
tps = 459.980394 (including connections establishing)
tps = 460.040423 (excluding connections establishing)

Is this something which has been reported earlier? My first guess would be some kind of race condition between the FSM updates and the relation truncation. It's too late for me to look into this now. If someone wants to look into this, that would be great; I would otherwise work on it tomorrow. Any clues/pointers are appreciated.

Thanks, Pavan

EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!
The query Ranbeer gave - as with any skyline query - can be solved with just pure SQL:

select * from books b
where not exists (
    select * from books b2
    where b2.rating >= b.rating
      and b2.price <= b.price
      and (b2.rating > b.rating or b2.price < b.price)
);

 book_name         | rating | price
-------------------+--------+-------
 Prodigal Daughter |      3 |   250
 The Notebook      |      4 |   300
 Fountain Head     |      5 |   350
(3 rows)

The idea of the BNL (block nested loop) skyline algorithm is to avoid the nested loop by storing dominating records as the query proceeds - in the above example, records which are relatively high in rating and low in price - and comparing each candidate record to those first. BNL is the most reasonable skyline algorithm in the absence of a multidimensional (usually R-Tree) index on the columns. For answering skyline queries where such an index exists over all query columns, the most broadly used generalized algorithm is BBS [1].

Thanks, Dave Fuhry

[1] Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2005. Progressive skyline computation in database systems. ACM Trans. Database Syst. 30, 1 (Mar. 2005), 41-82. DOI= http://doi.acm.org/10.1145/1061318.1061320

Gavin Sherry wrote:
> On Tue, 6 Mar 2007, Alvaro Herrera wrote:
> > Also, keep in mind that there were plenty of changes in the executor.
> > This stuff is not likely to be very easy to implement efficiently
> > using our extant executor machinery; note that Ranbeer mentioned
> > implementation of block nested loop and other algorithms. Not sure
> > how easy it would be to fold that stuff into the optimizer for
> > multi-input aggregates, instead of hardwiring it to the SKYLINE OF
> > syntax.
> Yes, there's been a lot of work on calculating skylines efficiently,
> with different sorting techniques and so on. This is the most
> interesting part of the idea. You could calculate the query Ranbeer
> gave using pure SQL and, perhaps, use of some covariance aggregates or
> something already. Of course, it gets harder when you want to
> calculate across many dimensions.
> Personally, I'd love to see some of these newer data analysis
> capabilities added to PostgreSQL -- or at least put out there as
> interesting patches.

Thanks, Gavin
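Dave's prose description of BNL condenses to a few lines. The sketch below is illustrative Python, not the patch's C executor code, and it keeps the whole "window" of non-dominated records in memory (the real algorithm spills blocks of it to temp files when the window overflows work memory). Rows are (rating, price, name) triples, with rating maximized and price minimized, as in the books example:

```python
def dominates(a, b):
    """a dominates b: at least as high a rating, at least as low a
    price, and strictly better on one of the two dimensions."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def skyline_bnl(rows):
    """Block-nested-loop skyline: each candidate is compared against
    the running window of non-dominated rows, not the whole input."""
    window = []
    for row in rows:
        if any(dominates(w, row) for w in window):
            continue                                  # candidate loses
        # Candidate survives: evict anything it dominates, then keep it.
        window = [w for w in window if not dominates(row, w)]
        window.append(row)
    return window
```

On the books data this returns the same three rows as the NOT EXISTS query above, while each candidate is checked only against the (usually small) set of current skyline points.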
Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!
Tom,

> My questions about whether to adopt it have more to do with
> cost/benefit. I haven't seen the patch, but it sounds like it will be
> large and messy; and it's for a feature that nobody ever heard of
> before, let alone one that the community has developed a consensus it
> wants. I'm not interested in adopting stuff just because DB2 hasn't
> got it.

OK, to make it a clearer case: we have an increasing user base using PostgreSQL for decision support. One of the reasons for this is that PG is the *only* OSDB which does a decent job of DSS. Adding unique DSS features will make PostgreSQL attractive to a lot more DSS application developers, and help make up for the things which we don't have yet (parallel query, async I/O, windowing functions).

Approximate queries are something which DSS users *want*. Jim Gray addressed this in his ACM editorial on the databases of the future. It's something that *I* want, and if the Greenplum people aren't speaking up here, it's because they're not paying attention.

Now, I don't know if this Skyline patch is our answer for approximate queries. Maybe I should pester Meredith about getting QBE free of its IP issues; it certainly looked more flexible than Skyline. In either case, the code probably needs a complete refactor. But I think that approximate queries ought to be on our TODO list.

-- Josh Berkus PostgreSQL @ Sun San Francisco
Re: [HACKERS] Bug in VACUUM FULL ?
Pavan Deolasee [EMAIL PROTECTED] writes:
> I am right now working on to get HOT and VACUUM FULL work together. I
> hit upon a bug which I initially thought is something that HOT has
> introduced. But I can reproduce it with CVS HEAD as well.

I think we broke this in 8.2: vac_update_relstats needs to ensure that it always sends a relcache invalidation event, but as of 8.2 it's set up to conditionally update the pg_class entry only if it wrote new values into the tuple. If vacuum full is truncating the rel back to the same length that it was on the previous cycle (as is always the case in this test), then no update, hence no relcache flush, hence clients still think their cached rd_targblock is good.

I think the code in vac_update_relstats

    if (dirty)
        heap_inplace_update(rd, ctup);

needs to look more like what index_update_stats does:

    if (dirty)
    {
        heap_inplace_update(pg_class, tuple);
        /* the above sends a cache inval message */
    }
    else
    {
        /* no need to change tuple, but force relcache inval anyway */
        CacheInvalidateRelcacheByTuple(tuple);
    }

Please check if this makes it go away for you --- I'm a bit busy at the moment.

regards, tom lane
Re: [HACKERS] Auto creation of Partitions
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alvaro Herrera
Sent: Wednesday, March 7, 2007 15:59
To: NikhilS
Cc: Zeugswetter Andreas ADI SD; Peter Eisentraut; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Auto creation of Partitions

> I am wondering if we can implement unique indexes across several
> tables (inheritance hierarchy) not by using a single, big index
> covering all the tables, but rather by inserting a dummy entry into
> each partition's unique index.

I think that one big index is much better in this case. You are already replicating the data, and you need a lot of work when updating the partition structure (e.g. removing one partition).

[snip]

Rather: if we have the partition relations (aka partitions), then we can define a unique index on each. This guarantees that there is at most one tuple with the same value (for the specified columns) in every individual partition.

Now for the inserts. We already guarantee that a row is unique within the partition it lives in, so we must insert it first. Next is to check the other partitions (in order!) for the existence of a row with similar content. Of course, we require full visibility of the data. We do the insert first, as this ensures others will find it. The order is also important; otherwise there can be situations where we can't guarantee the constraint. Updates are exactly the same. Deletes are trivial.

Perhaps you can do it without an index, but then this is nothing different from a normal table.

- Joris Dobbelsteen
Re: [HACKERS] Synchronized Scan update
On Mar 6, 2007, at 9:43 AM, Josh Berkus wrote:
> > Don't get me wrong, I want things to be easily understandable as
> > well, but the reason you cite above pretty much makes us need to
> > remove most of the postgresql.conf, including all bgwriter, vacuum
> > cost delay, and autovac settings. Not to mention commit delay and
> > others ;).
> Wouldn't that be nice! The explosion of GUC settings is primarily a
> result of not enough information. The reason there are 7 bgwriter
> settings, for example, is that we have no idea what those settings
> should be and are hoping that people will tinker with them and tell
> us. Someday, when I can fully profile bgwriter, we'll just have one
> setting: bgwriter_aggressive, set to a number between 0 and 9.

In the meantime, it would be great for these multiple-settings cases to be listed somewhere, indicating that they're something we could use help with. I think that, with some explanation of what we're looking for, there are any number of people who could do this kind of profiling.

-- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] Grouped Index Tuples / Clustered Indexes
On Wed, 2007-03-07 at 10:32 +, Heikki Linnakangas wrote: I've been thinking we should call this feature just Clustered Indexes Works for me. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] WITH/RECURSIVE plans
On Fri, 2007-03-02 at 11:09 -0800, David Fetter wrote: On Fri, Mar 02, 2007 at 10:52:14AM -0800, Joshua D. Drake wrote: Hello, Unfortunately we (the community) will not have WITH/RECURSIVE for 8.3. However, I have spoken with Alexey and Alvaro, and Command Prompt has decided to make WITH/RECURSIVE a priority for 8.4. Any chance we can get WITH without RECURSIVE? That would be very handy all by itself. I thought Greg already submitted that? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Log levels for checkpoint/bgwriter monitoring
On Mar 6, 2007, at 10:11 PM, ITAGAKI Takahiro wrote: I have some results showing that if we have plenty of time for checkpoints, bgwriter_all_maxpages is not such an important parameter, because it is adjusted to shared_buffers / duration of checkpoint. Also, my recommended bgwriter_lru_maxpages is the average number of recycled buffers per cycle, which is hard to tune manually. What do you mean by 'number of recycled buffers per cycle'? -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!
Josh Berkus wrote: Now, I don't know if this Skyline patch is our answer for approximate queries. Maybe I should pester Meredith about getting QBE free of its IP issues; it certainly looked more flexible than Skyline. In either case, the code probably needs a complete refactor. But I think that approximate queries ought to be on our TODO list. +1 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!
On Wed, 7 Mar 2007, Josh Berkus wrote: Approximate queries are something that DSS users *want*. Jim Gray addressed this in his ACM editorial on the databases of the future. It's something that *I* want, and if the Greenplum people aren't speaking up here, it's because they're not paying attention. Now, I don't know if this Skyline patch is our answer for approximate queries. Maybe I should pester Meredith about getting QBE free of its IP issues; it certainly looked more flexible than Skyline. In either case, the code probably needs a complete refactor. What people want from approximate queries is different from this: the desire is usually to balance run time against level of accuracy/quality (sometimes the desire is to have accurate results as well as similar results). Neither Skyline nor QBE is about this. The only thing in the spec which addresses this is 'tablesample'. Thanks, Gavin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
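For readers following along: the operator the thread is debating computes the set of tuples not dominated by any other tuple (a Pareto front). A naive sketch of that semantics, purely illustrative and unrelated to the posted patch's implementation:

```python
# Minimal illustration of what a SKYLINE OF clause computes (the Pareto
# front): keep each row for which no other row is at least as good in
# every dimension and strictly better in at least one.  Naive O(n^2)
# sketch for illustration only, not the patch under discussion.

def dominates(a, b):
    """True if row `a` dominates row `b`, minimizing every dimension."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

def skyline(rows):
    return [r for r in rows if not any(dominates(s, r) for s in rows)]

# The classic example: hotels as (price, distance_to_beach), both minimized.
hotels = [(50, 8), (60, 5), (70, 4), (80, 4), (45, 9)]
print(skyline(hotels))   # (80, 4) is dominated by (70, 4); the rest remain
```

Gavin's point holds in this sketch too: nothing here trades accuracy for run time; the result is exact, just restricted to non-dominated rows.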
Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!
Yep - we're paying attention, Josh! I like the category being explored with skyline, I'm not sure yet how it fits with existing 'soft data' models and applications that use them. If SKYLINE is interesting to app developers, maybe we should consider it for Bizgres? - Luke Msg is shrt cuz m on ma treo -Original Message- From: Gavin Sherry [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 07, 2007 05:44 PM Eastern Standard Time To: Josh Berkus Cc: Tom Lane; pgsql-hackers@postgresql.org; Alvaro Herrera; Chris Browne Subject: Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added! [snip] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] RFC: changing autovacuum_naptime semantics
Hackers, I want to propose some very simple changes to autovacuum in order to move forward (a bit): 1. autovacuum_naptime semantics 2. limiting the number of workers: global, per database, per tablespace? I still haven't found the magic bullet to solve the hot table problem, but these at least mean we can continue doing *something*. Changing autovacuum_naptime semantics Are we agreed on changing the autovacuum_naptime semantics? The idea is to make it per-database instead of the current per-cluster, i.e., a nap would be the minimum time that passes between starting one worker in a database and starting another worker in the same database. Currently, naptime is the time elapsed between two worker runs across all databases. So if you have 15 databases, autovacuuming each one takes place every 15*naptime. Eventually, we could have a per-database naptime defined in pg_database, and do away with the autovacuum_naptime GUC param (or maybe keep it as a default value). Say for database D1 you want to have workers every 60 seconds, but for database D2 you want 1 hour. Question: Is everybody OK with changing the autovacuum_naptime semantics? Limiting the number of workers I was originally proposing having a GUC parameter which would limit the cluster-wide maximum number of workers. Additionally we could have a per-database limit (stored in a pg_database column), which would be simple to implement. Josh Drake proposed getting rid of the GUC param, saying that it would confuse users to set the per-database limit to some value higher than the GUC setting and then find the lower limit enforced (presumably because of being unaware of it). The problem is that we need to set up shared memory for workers, so we really need a hard limit, and it must be global. Thus the GUC param is not optional. Other people also proposed having a per-tablespace limit. This would make a lot of sense, tablespaces being the natural I/O units. 
However, I'm not sure it would be easy to implement, because you can put half of database D1 and half of database D2 in tablespace T1, and the two other halves in tablespace T2. Then enforcing the limit becomes rather complicated and will probably mean putting a worker to sleep. I think it makes more sense to skip implementing per-tablespace limits for now, and plan to add per-tablespace I/O throttles in the future. Questions: Is everybody OK with not putting in a per-tablespace worker limit? Is everybody OK with putting per-database worker limits in a pg_database column? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
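The proposed semantics can be illustrated with a toy scheduler. Everything below is a hypothetical sketch (the function and numbers are invented for illustration, not autovacuum code): the naptime becomes the minimum interval between two worker launches in the *same* database, rather than between any two launches cluster-wide.

```python
# Toy illustration of the proposed per-database naptime semantics.
# Hypothetical sketch only: each database has its own naptime, and a
# worker may be launched into a database once its naptime has elapsed
# since the previous launch into that same database.

def pick_database(last_started, naptimes, now):
    """Return the database most overdue for a worker, or None if every
    database had a worker started less than its naptime ago."""
    due = [(now - last_started[db] - naptimes[db], db)
           for db in naptimes
           if now - last_started[db] >= naptimes[db]]
    if not due:
        return None
    return max(due)[1]          # most-overdue database wins

# Alvaro's example: D1 wants workers every 60 seconds, D2 every hour.
naptimes = {"D1": 60, "D2": 3600}
last = {"D1": 0, "D2": 0}
print(pick_database(last, naptimes, 90))    # -> 'D1' (30s overdue; D2 not due)
```

Under the current per-cluster semantics, by contrast, a single naptime clock gates launches into *any* database, so with 15 databases each one is visited only every 15*naptime.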
[HACKERS] Proposed ProcessUtility() API additions
I'd like to change ProcessUtility to take a couple of additional parameters, which it in turn would pass down to those (relatively few) utility statements that need one or both: * query_string: source text of command, if known (can be NULL) * is_top_level: TRUE if command is being driven directly from exec_simple_query or exec_execute_message, else FALSE (this would need to be passed through PortalRun, so it gets this parameter added too). The point of adding query_string is that whenever parse analysis of a sub-command is postponed until utility execution, we need to pass the query string to parse_analyze if we want syntax error location. This is already an issue for CREATE SCHEMA, and it's about to be a problem for PREPARE. There are also a couple of places that rely on debug_query_string, which they really shouldn't be doing since that's the current interactive command, not necessarily what's being parsed at the moment. The point of adding is_top_level is to provide a simpler, more reliable means for PreventTransactionChain and related functions to detect whether a function is trying to invoke a non-transaction-block-safe command. Currently we rely on an ugly test involving seeing if the statement node is in the QueryContext, but that's always been a kluge, and I'm not sure that it works 100% even today. I'd like to get rid of the QueryContext global altogether. Comments, objections? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] RFC: changing autovacuum_naptime semantics
Alvaro Herrera [EMAIL PROTECTED] writes: Is everybody OK with changing the autovacuum_naptime semantics? it seems already different from 8.2, so no objection to further change. Is everybody OK with not putting a per-tablespace worker limit? Is everybody OK with putting per-database worker limits on a pg_database column? I don't think we need a new pg_database column. If it's a GUC you can do ALTER DATABASE SET, no? Or was that what you meant? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] WITH/RECURSIVE plans
Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2007-03-02 at 11:09 -0800, David Fetter wrote: On Fri, Mar 02, 2007 at 10:52:14AM -0800, Joshua D. Drake wrote: Hello, Unfortunately we (the community) will not have WITH/RECURSIVE for 8.3. However I have spoken with Alexey and Alvaro and Command Prompt has decided to make WITH/RECURSIVE a priority for 8.4. Any chance we can get WITH without RECURSIVE? That would be very handy all by itself. I thought Greg already submitted that? I thought so too. Except then I tested it on more complex examples and it didn't seem to work. Somehow the common table expression name wasn't being propagated into the scope of subqueries. If that were fixed then, yes, basically it's done. I think in the long term we want common table expressions to not be executed multiple times when they're referred to multiple times. So the existing implementation, which is basically just the syntax support and directly inlines them wherever they're referenced, is inadequate. But there's nothing (aside from the aforementioned failure to work) wrong with putting in something that multiply evaluates now and avoiding the multiple evaluation later as an optimisation. Having said that, I suppose one could argue it's incorrect to multiply evaluate if there's a volatile function in the common table expression. But I could see an equally strong argument for the converse. I wouldn't be too concerned about that. So... The question I posed earlier was what types of data belong in the pstate and what belongs in the parse node itself. I had the impression from the code that the pstate was inherited when subqueries were planned (and restored after they were done), so that made it the right place to keep the list of common table expression names that were in scope. Is that right? If so then I probably just have some mundane bug somewhere I can track down now that I have some time. 
If not then what am I misunderstanding about the pstate and where would be the right place to keep this kind of parser namespace state? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposed ProcessUtility() API additions
Tom Lane [EMAIL PROTECTED] writes: * is_top_level: TRUE if command is being driven directly from exec_simple_query or exec_execute_message, else FALSE (this would need to be passed through PortalRun, so it gets this parameter added too). ... The point of adding is_top_level is to provide a simpler, more reliable means for PreventTransactionChain and related functions to detect whether a function is trying to invoke a non-transaction-block-safe command. Currently we rely on an ugly test involving seeing if the statement node is in the QueryContext, but that's always been a kluge, and I'm not sure that it works 100% even today. I'd like to get rid of the QueryContext global altogether. I'm not exactly following. How does the exec_simple_query or exec_execute_message tell you whether you're in a transaction? Can't you exec_simple_query(BEGIN;) and then exec_simple_query a second query in the same transaction? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Grouped Index Tuples / Clustered Indexes
On Wed, 2007-03-07 at 10:32 +, Heikki Linnakangas wrote: I've been thinking we should call this feature just Clustered Indexes So we would have clustered tables which are tables whose heap is ordered according to an index and separately clustered indexes which are indexes optimized for such tables? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] WITH/RECURSIVE plans
Any chance we can get WITH without RECURSIVE? That would be very handy all by itself. I thought Greg already did submitted that? To my knowledge, it is not done. If it is, great! Joshua D. Drake ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!
Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: I think if the code is good enough, and we can avoid horrible non-standard syntax extensions, they should go in. We have to defend our title as most advanced database and having stuff like Skyline first (before DB2 or MS) goes a long way for that. Well, whether it's horrible or not is in the eye of the beholder, but this is certainly a non-standard syntax extension. Being non-standard should not be the only reason to reject a worthwhile feature. Do you really believe that the SQL standard covers every feature that a RDBMS could ever want to implement? Do you think that the current non-standard features of PostgreSQL should be removed? My questions about whether to adopt it have more to do with cost/benefit. I haven't seen the patch, but it sounds like it will be large and messy; and it's for a feature that nobody ever heard of before, let alone one that the community has developed a consensus it wants. I'm not interested in adopting stuff just because DB2 hasn't got it. Partially agree but I do think it is worth looking at to see if some or all of the feature is worth implementing. The fact that several different groups have been mentioned to be working on this feature would indicate that it is worth considering. Maybe one of the other groups will have implemented it better than the first off the rank. Maybe our core developers can work out a better way to implement these features. A few people on this list have said they are interested in this. It's also worth noting that what we've got here is a large patch developed, by students, completely outside our normal development process; so the odds that it's going to be anywhere near acceptable are low. I think the last time we applied a patch that met that description was the INTERSECT/EXCEPT patch in 1999 ... maybe you don't remember what a fiasco that was, but I do. 
True, but the qualifications he has listed on his web pages look impressive and give some reason to have his work considered. He may just end up being a main PostgreSQL developer in the future. Sorry to be a thrower of cold water, but I just don't see that this comes anywhere near being something we should be eager to accept. True, we shouldn't just say 'sounds good, let's put it in', but with some indication that this feature is along the lines of what users want, we should be asking: Do we want this or a similar feature? Is the theory behind this feature solid? Can the same end results be gained with other existing methods? Is the implementation offered worth considering? Has it been developed to meet the PostgreSQL developer guidelines? Is it reasonable to work on it to reach a level of quality/performance that we will be happy to include? Can we implement this feature better ourselves? Do we want to start this feature from scratch ourselves? -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] RFC: changing autovacuum_naptime semantics
Alvaro, Alvaro Herrera wrote: I still haven't found the magic bullet to solve the hot table problem, but these at least mean we can continue doing *something*. Can I ask what your plan or idea is for autovacuum improvement in 8.3? And what is the roadmap for autovacuum improvement in 8.4? Thanks, Galy Lee lee.galy _at_ ntt.oss.co.jp NTT Open Source Software Center ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Auto creation of Partitions
Andreas, On 3/7/07 11:45 AM, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote: Whoa, do you have anything to back that up ? Sure - when we start to consider designs that implement advanced data management features, we run into problems with the architecture of tables-tables-tables Here are some examples: 1 - people think of partitions as a logical building block for tables, they would like to move partitions around underneath a table without the table definition being involved. In the current implementation, there are explicit linkages between the table definition and the child tables - imagine an ALTER TABLE foo_parent ADD COLUMN and how it would need to cascade to 1,000 child tables and you get the beginning of it - this connection should not exist. 2 - INSERT/UPDATE/DELETE processing through the SQL rewrite layer (rules) is terribly slow and gets slower as you add more partitions. If done closer to the storage layer, this can be done in ways that use access methods shared with other storage entities, e.g. Indices, and the code path would flow more naturally. 3 - Parallel query can be accomplished more easily by separating scans across relations split among tablespaces. This is more natural than trying to parallelize APPEND nodes within existing plans You would need to elaborate what you actually mean, but I think it is moot. Sure, the constraint technique can be further extended (e.g. during runtime), but imho the approach is very good. Well, it's being used and that's good, but it needs to be better IMO and I think that before we go too far down the current path we should consider the alternatives more carefully. - Luke ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Log levels for checkpoint/bgwriter monitoring
On Wed, 7 Mar 2007, ITAGAKI Takahiro wrote: Also, my recommended bgwriter_lru_maxpages is the average number of recycled buffers per cycle, which is hard to tune manually. This is completely dependent on what percentage of your buffer cache is pinned. If your load is something like the standard pgbench, the LRU writer will rarely find anything useful to write, so this entire line of thinking won't work. The proper behavior for heavily pinned data is to turn off the LRU writer altogether so there's more time to run the all scan. The job I'm trying to take on here is not to presume I can solve these problems myself yet. I've instead recognized that people need usefully organized information in order to even move in that direction, and that information is not even close to being available right now. What my latest work-in-progress patches do is summarize each scan of the buffer pool with information about how much was written by each of the two writers, along with noting what percentage of the pool was pinned data. I'm trying to get that one ready to submit this week. Those three values suggest some powerful techniques for tuning, but it's not quite good enough to allow auto-tuning. It also needs a feel for how much time is left before the next checkpoint. What really needs to go along with all this is a sort of progress bar that estimates how far we are from a checkpoint based on both a) the timeout, and b) how many segments have been written. The timeout one is easy to work with that way (from what I read of your code, you've worked that angle). The part I had trouble with was getting the WAL writers to communicate a progress report on how many segments they filled back to the bgwriter. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
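The "progress bar" idea above combines the two checkpoint triggers. A minimal sketch under the simplest possible model (function name and numbers invented for illustration; the real interplay of checkpoint_timeout and checkpoint_segments is more involved):

```python
# Hypothetical sketch of the checkpoint progress estimate described
# above: a checkpoint fires either when checkpoint_timeout elapses or
# when checkpoint_segments worth of WAL has been written, so progress
# toward the next one is driven by whichever trigger is closer.

def checkpoint_progress(elapsed, timeout, segments_filled, max_segments):
    """Fraction in [0.0, 1.0] of the way to the next checkpoint."""
    by_time = elapsed / timeout
    by_xlog = segments_filled / max_segments
    return min(1.0, max(by_time, by_xlog))

# 2 minutes into a 5-minute timeout, but WAL is already 80% consumed:
print(checkpoint_progress(120, 300, 24, 30))   # 0.8 -- WAL will trigger first
```

The timeout term is trivially available to the bgwriter; as Greg notes, the hard part is the xlog term, which requires the WAL writers to report filled segments back.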
Re: [HACKERS] Proposed ProcessUtility() API additions
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: The point of adding is_top_level is to provide a simpler, more reliable means for PreventTransactionChain and related functions to detect whether a function is trying to invoke a non-transaction-block-safe command. I'm not exactly following. How does the exec_simple_query or exec_execute_message tell you whether you're in a transaction? Read PreventTransactionChain and friends. We can tell from state info provided by xact.c whether we're inside a BEGIN block. The problem stems from cases like SELECT myfunc() where myfunc() tries to execute one of the verboten commands, via SPI or some similar mechanism. If we allowed that, then the function could try to execute more commands within the same top-level transaction, which is exactly what we don't want to allow. If the command is being issued directly by exec_simple_query or exec_execute_message, and it's not inside a BEGIN block, then there's no way for an additional command to be issued before commit. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
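Tom's rule can be condensed as follows. This is a hypothetical Python restatement of the decision only, not the backend's actual PreventTransactionChain code:

```python
# Hypothetical restatement of the rule above: a command like VACUUM is
# safe only when it is the top-level command (issued directly by
# exec_simple_query/exec_execute_message) AND no BEGIN block is open,
# since only then can nothing else follow it before commit.

def prevent_transaction_chain(is_top_level, in_begin_block, stmt):
    if not is_top_level:
        # e.g. SELECT myfunc() where myfunc() invokes the command via
        # SPI: the function could issue more commands in the same
        # top-level transaction afterward.
        raise RuntimeError(f"{stmt} cannot be executed from a function")
    if in_begin_block:
        raise RuntimeError(f"{stmt} cannot run inside a transaction block")
    return True   # safe: nothing further can run before commit

print(prevent_transaction_chain(True, False, "VACUUM"))   # True
```

This is why the proposed is_top_level parameter suffices: the BEGIN-block half of the test already comes from xact.c state, and is_top_level supplies the other half without the QueryContext kluge.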
Re: [HACKERS] Trivial HugeTLB Benchmark
On Mar 4, 2007, at 3:33 PM, Ryan Cumming wrote: I did another 18 runs, 9 each for huge pages and normal shared memory. The database was reinitialized before every third run with pgbench -i -s 10. The runs themselves were done with pgbench -s 10 -c 5 -t 1 Rather than doing that, I think you'd be much better off just running a very long benchmark and turning on autovacuum. That would at least be closer to real-world usage. -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Auto creation of Partitions
On Mar 6, 2007, at 9:13 AM, Shane Ambler wrote: NikhilS wrote: On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote: 1) Whether we should use triggers/rules for step number (iii) above. Maybe rules is the way to go. Since this would basically be a case of the updatable rules problem, you should review those discussions in the past to check whether the issues mentioned there don't interfere with that plan. The rules mentioned here will be to specify that all the inserts/updates/deletes should go into proper children tables instead of the parent. I do not see the updateable rules problem with regards to this, but will check out the archives for discussion on this related to partitioning. I would think that a trigger would be a better choice as I see the need (or at least the possibility) for more than just a rewrite. When a value is inserted that is outside of a condition currently covered by an existing child table then a new child will need to be spawned to contain the new data. There's no reason a new child has to be spawned, and I don't think that having a DML statement 'automagically' generating DDL is such a hot idea, either. Also, there's nothing inherently wrong with having an 'overflow partition' (to use Oracle syntax) that holds values that don't fall in the range of any other tables. The obvious place to do that with our partitioning is in the parent table. There are 2 other reasons to favor triggers though: 1) People (Josh Drake comes to mind) have found that if you get over a tiny number of partitions, the performance of rules is abysmal. 2) I believe it should be possible to construct an update trigger that allows you to perform updates that will place the row in question into a new partition. While I can see cases for simply disallowing updates to the partitioning key, I think there are also times when being able to do that would be very useful. Will ALTER TABLE be extended to handle partitions? 
This will allow partitioning existing tables (maybe without blocking access?) and allow things like ALTER TABLE mytable ADD PARTITION (mydate within 200703) and ALTER TABLE mytable DROP PARTITION (mydate within 199912) or would dropping be covered by DELETE FROM mytable where mydate = 199912 ? I think it'd be great to make adding and removing partitions as simple as ALTER TABLE. I don't think that DELETE should be the mechanism to drop a partition, though. Again, DML statements shouldn't be performing DDL. Could such a syntax be devised for date columns? (month of mydate) or similar to auto create partitions based on the year and month of a date column? or will we just do CHECK(mydatecol >= 1/3/07 and mydatecol <= 31/3/07) for each month of data? Also (day of mydatecol) to partition based on year and day of year. Another syntax possibility - range(myserialcol of 50) where new child tables are created every 50 rows? Maybe I'm looking at auto-maintenance which is beyond any current planning? I don't think it's worth it to burden the database with auto-creating time partitions; it's easy enough to set up a cron job to handle it. It might be more useful to have the database handle things like partitioning on a SERIAL column, though I agree with Nikhils that this should wait. Does any other database support 'automagic' partition creation? I know Oracle 9i doesn't... not sure about 10g or DB2 or MSSQL... -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
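The routing-trigger-plus-overflow-partition idea discussed above can be sketched like this (table names and date ranges are invented for illustration; a real trigger would be PL/pgSQL or C):

```python
# Hypothetical sketch of trigger-based row routing, with the parent
# table acting as an Oracle-style "overflow partition" for values that
# match no child's CHECK constraint.  Names and ranges invented here.

PARTITIONS = {
    "mytable_2007_03": lambda d: "2007-03-01" <= d <= "2007-03-31",
    "mytable_2007_04": lambda d: "2007-04-01" <= d <= "2007-04-30",
}

def route_insert(mydate):
    """Return the table an inserted row should land in."""
    for child, check in PARTITIONS.items():
        if check(mydate):
            return child
    return "mytable"          # overflow: keep it in the parent table

print(route_insert("2007-03-15"))   # mytable_2007_03
print(route_insert("1999-12-01"))   # mytable (no matching child)
```

Note that the out-of-range value lands in the parent rather than triggering DDL to spawn a new child, matching the point above that a DML statement should never 'automagically' perform DDL.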
Re: [HACKERS] Auto creation of Partitions
On Mar 6, 2007, at 4:57 AM, NikhilS wrote: iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it on to the children tables. Since we want to eventually support 'global' indexes, I think we need to be really careful here. There are actually 3 possible cases: 1) Index* should be global (spanning multiple tables) 2) Index* should be inherited by all partitions as they're created 3) Index* should exist only on the parent table * Note that there's really no reason this has to be limited to indexes; it could certainly apply to constraints, or even triggers. IIRC, Oracle has a special syntax for global indexes; any other index defined on a partitioned table is picked up *by newly created partitions*. If you want to add indexes to existing partitions, you have to explicitly add it to each partition. I'd like to eventually see us supporting all 3 options. I'm not sure how much we want to clutter the grammar, though. -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Auto creation of Partitions
On Mar 7, 2007, at 3:26 AM, Simon Riggs wrote: If you know that the constraints on each of the tables is distinct, then building a UNIQUE index on each of the partitions is sufficient to prove that all rows in the combined partitioned table are distinct also. The hard part there is checking that the partition constraints are distinct. If the partition constraints are added one at a time, you can use the predicate testing logic to compare the to-be-added partition's constraint against each of the already added constraints. That becomes an O(N) problem. What is really needed is a data structure that allows range partitions to be accessed more efficiently. This could make adding partitions and deciding in which partition a specific value goes an O(logN) operation. Directing data to child tables with triggers pretty much necessitates having some way to codify what partition a particular row belongs in. IE: for partitioning by month, you'll see things like naming the partition tables parent_table_name_$YEAR_$MONTH, so the 'partitioning function' takes a date or timestamp and then returns what partition it belongs to. Perhaps there is some way to use that mapping to drive the selection of what partitions could contain a given value? One possibility would be to require 3 functions for a partitioned table: one accepts the partitioning key and tells you what partition it's in, one that tells you what the minimum partitioning key for a partition would be, and one that tells you what the maximum would be. If the user supplied those 3 functions, I think it would be possibly to automatically generate code for the triggers and check constraints. The min/max partition key functions might allow you to more efficiently do partition elimination, too. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
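The three user-supplied functions proposed above might look like this for month-based partitioning. This is a hypothetical sketch only (names and the (year, month, day) key encoding are invented here):

```python
# Hypothetical illustration of the three-function interface proposed
# above, for month-based partitioning: one function maps a key to its
# partition, and two give the smallest and largest keys a partition can
# hold.  From these, the system could generate both each partition's
# CHECK constraint and the routing trigger.
import calendar

def partition_for(year, month):
    """Which partition does this key belong to?"""
    return f"parent_table_name_{year:04d}_{month:02d}"

def partition_min_key(partition):
    """Smallest key the partition can hold (first day of its month)."""
    _, y, m = partition.rsplit("_", 2)
    return (int(y), int(m), 1)

def partition_max_key(partition):
    """Largest key the partition can hold (last day of its month)."""
    _, y, m = partition.rsplit("_", 2)
    y, m = int(y), int(m)
    return (y, m, calendar.monthrange(y, m)[1])

p = partition_for(2007, 3)
print(p, partition_min_key(p), partition_max_key(p))
# parent_table_name_2007_03 (2007, 3, 1) (2007, 3, 31)
```

The min/max pair is what would enable the more efficient partition elimination mentioned above: partition ranges can be kept in a sorted structure and probed in O(log N) instead of testing each constraint in turn.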
Re: [HACKERS] RFC: changing autovacuum_naptime semantics
On Mar 7, 2007, at 4:00 PM, Alvaro Herrera wrote: Is everybody OK with putting per-database worker limits on a pg_database column? I'm worried that we would live to regret such a limit. I can't really see any reason to limit how many vacuums are occurring in a database, because there's no limiting factor there; you're either going to be IO bound (per-tablespace), or *maybe* CPU-bound (perhaps the Greenplum folks could enlighten us as to whether they run into vacuum being CPU-bound on thumpers). Changing the naptime behavior to be database related makes perfect sense, because the minimum XID you have to worry about is a per-database thing; I just don't see limiting the number of vacuums as being per-database, though. I'm also skeptical that we'll be able to come up with a good way to limit the number of backends until we get the hot table issue addressed. Perhaps a decent compromise for now would be to limit how many 'small table' vacuums could run on each tablespace, and then limit how many 'unlimited table size' vacuums could run on each tablespace, where 'small table' would probably have to be configurable. I don't think it's the best final solution, but it should at least solve the immediate need. -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!
Shane Ambler [EMAIL PROTECTED] writes: Tom Lane wrote: Well, whether it's horrible or not is in the eye of the beholder, but this is certainly a non-standard syntax extension. Being non-standard should not be the only reason to reject a worthwhile feature. No, but being non-standard is certainly an indicator that the feature may not be of widespread interest --- if it were, the SQL committee would've gotten around to including it; seems they've managed to include everything but the kitchen sink already. Add to that the complete lack of any previous demand for the feature, and you have to wonder where the market is. The fact that several different groups have been mentioned to be working on this feature would indicate that it is worth considering. It looks to me more like someone published a paper that caught the attention of a few profs looking for term projects for their students. Now maybe it really is the best idea since sliced bread and will be seen in the next SQL spec edition, but color me skeptical. It seems to me to be a very narrow-usage extension, as opposed to (eg) multi-input aggregates or WITH/RECURSIVE, which provide general mechanisms applicable to a multitude of problems. Now even so it would be fine if the implementation were similarly narrow in scope, but the published description of the patch mentions a large chunk of additional executor mechanisms. If we're going to be adding as much code as that, I'd like to see a wider scope of usage for it. Basically, this patch isn't sounding like it has a reasonable bang-to-the-buck ratio ... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Bug in VACUUM FULL ?
Tom Lane wrote: Please check if this makes it go away for you --- I'm a bit busy at the moment. Thanks a lot, Tom. It seems to work fine for me. I will do some more tests and report if I see any issue. Btw, the patch as per your suggestion is attached. Thanks, Pavan

*** src/backend/commands/vacuum.c	2007-02-16 04:53:22.0 +0530
--- src/backend/commands/vacuum.c	2007-03-08 09:25:15.016991272 +0530
***************
*** 718,724 ****
--- 718,732 ----
  	 * If anything changed, write out the tuple
  	 */
  	if (dirty)
+ 	{
  		heap_inplace_update(rd, ctup);
+ 		/* the above sends a cache inval message */
+ 	}
+ 	else
+ 	{
+ 		/* no need to change tuple, but force relcache inval anyway */
+ 		CacheInvalidateRelcacheByTuple(ctup);
+ 	}
  	heap_close(rd, RowExclusiveLock);
  }

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq