[HACKERS] about hacking postgresql
Hello, here are more details. I have a set of transactions. Naturally, the
transactions execute concurrently, but I would like to give each transaction a
priority, so that the transaction with the higher priority executes first.

As a first step, I thought of changing the transaction syntax as follows:

Start transaction (priority)

But I do not know exactly where to make the change in gram.y. I tried changing
TransactionStmt, but I always get the message: syntax error at or near "("

Thank you very much
Re: [HACKERS] Python 3.0 does not work with PL/Python
On 4/4/09, Tom Lane t...@sss.pgh.pa.us wrote:
> Peter Eisentraut pete...@gmx.net writes:
>> I have recently fixed the configure script to recognize Python 3.0. But
>> note that building and running PL/Python with Python 3.0 does not
>> actually work. It looks like several symbols have been removed or
>> changed. It would be good if the Python pundits around here could take
>> a look. (I have found Python 3.0 to be very quick and easy to install
>> from source, in case your distribution doesn't have it packaged yet.)
>
> I thought I would experiment with this a bit. I got past Python's
> configure; make; make install okay, but got no further than here with
> building PG:
>
> checking for python... /home/tgl/python3.0.1/bin/python
> checking for Python distutils module... ./configure: line 6946: 21044 Aborted ${PYTHON} -c 'import distutils' 2- no
> configure: error: distutils module not found
> $
>
> Okay, but some research revealed that there does not exist any working
> distutils for Python 3.0.1 yet:
> http://regebro.wordpress.com/2009/02/01/setuptools-and-easy_install-for-python-3/
>
> If the language is still at the point where they're breaking fundamental
> tools with each dot-release, I don't think it's really stable enough for
> us to spend effort on :-(

Well, the point of 3.0 was to break the world...

> I also found out that my favorite distro is just *starting* to think
> about what it will take to migrate to Python 3, and they seem to think
> that it's not going to be viable till around Fedora 13 (a year away):
> https://www.redhat.com/archives/fedora-devel-list/2009-April/msg00085.html
>
> So my conclusion is that Python 3.0 is much too wet behind the ears for
> us to worry about in PG 8.4. I'd guess that we should come back to the
> issue towards the end of 2009, and perhaps think about back-porting
> after we have something working in 8.5.

It is not wet (the new interfaces should be stable), but it is a break from
the 2.x series. This means that users of PL/Python should not expect
PL/Python to automatically work with 3.0.

Supporting 3.0 will be a new feature. So it's OK to drop it from 8.4.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about hacking postgresql
abdelhak benmohamed wrote:
> Hello, here are more details. I have a set of transactions. Naturally,
> the transactions execute concurrently, but I would like to give each
> transaction a priority, so that the transaction with the higher priority
> executes first. As a first step, I thought of changing the transaction
> syntax as follows: Start transaction (priority). But I do not know
> exactly where to make the change in gram.y. I tried changing
> TransactionStmt, but I always get the message: syntax error at or
> near "("

hello ...

If you pump them in through the same database connection, you do basically
the same thing. If you want transactions to lock out each other, take a look
at advisory locks. Implementing this for START TRANSACTION seems pretty
useless to me, however.

best regards,

hans

--
Cybertec Schönig & Schönig GmbH
Professional PostgreSQL Consulting, Support, Training
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: www.postgresql-support.de
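Hans's suggestion of handling ordering on the client side can be sketched concretely: queue the transactions and submit them in priority order over a single connection, instead of patching gram.y. A minimal Python sketch (the priorities, statements, and the `execute` callback are illustrative stand-ins, not PostgreSQL API):

```python
import heapq

# Each entry: (priority, sequence number, list of SQL statements).
# Lower number = higher priority in this sketch; adjust to taste.
transactions = [
    (3, 0, ["BEGIN", "UPDATE t SET x = 1", "COMMIT"]),
    (4, 1, ["BEGIN", "UPDATE t SET x = 2", "COMMIT"]),
    (1, 2, ["BEGIN", "UPDATE t SET x = 3", "COMMIT"]),
]

def run_in_priority_order(txns, execute):
    """Pop transactions off a heap so the highest-priority one runs first."""
    heap = list(txns)
    heapq.heapify(heap)
    order = []
    while heap:
        prio, seq, stmts = heapq.heappop(heap)
        order.append(seq)
        for stmt in stmts:
            execute(stmt)  # e.g. cursor.execute(stmt) over one connection
    return order

# With a dummy executor, the third transaction (priority 1) runs first:
print(run_in_priority_order(transactions, lambda sql: None))  # [2, 0, 1]
```

Because everything goes through one connection, the transactions no longer compete at all; that is the "basically the same thing" Hans describes.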
Re: [HACKERS] psql \d commands and information_schema
Tom Lane wrote:
> I don't find this to be a pressing problem. If the user has lots of
> schemas, they probably have lots of objects too, and are unlikely to
> need such a thing.

Current behaviour makes it impossible to get a quick overview of all the
user-defined objects. And it doesn't really matter what the number of
schemas is -- it gets messy for even a small number of schemas and objects.
Let's assume 2 user tables, in schemas public and foo. \dt *.* will give:

                    List of relations
       Schema       |     Name     | Type  |  Owner
--------------------+--------------+-------+---------
 foo                | t2           | table | martinp
 information_schema | sql_features | table | martinp
 ...
 pg_catalog         | pg_aggregate | table | martinp
 ...
 public             | t1           | table | martinp
(51 rows)

This is a lot of irrelevant stuff the user has to filter out. It is much
worse with functions -- \df *.* results in 1900+ functions that I usually
don't want to see. The alternative is to perform a \dn first and then loop
through that (this is the annoyance the U switch would remove).

> search_path enters into it too; a simple U switch isn't going to provide
> a full answer.

For our needs I wouldn't really consider using search_path for anything but
temporary hacks. However, a psql variable that specifies a list of name
patterns to be excluded from describe could be useful. Something along the
lines of:

\set DESCRIBE_EXCLUDE_PATTERNS 'pg_catalog.*, information_schema.*, ...'

This could then be customized to each site's needs -- add pgq, slony, etc.
and put it in .psqlrc. It is questionable whether the filter should be
applied to default \dX (override with S to describe all). Maybe it'd be
better to introduce an extra switch that applies the filters.

I just noticed that something similar was already suggested by Euler in
http://archives.postgresql.org/message-id/49cdb4e0.8030...@timbira.com

regards,
Martin
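The proposed DESCRIBE_EXCLUDE_PATTERNS variable is essentially glob filtering of schema-qualified names. Its intended effect can be sketched in Python with fnmatch (the relation names and patterns here are illustrative):

```python
from fnmatch import fnmatch

def filter_relations(relations, exclude_patterns):
    """Drop schema-qualified names that match any exclude pattern."""
    return [r for r in relations
            if not any(fnmatch(r, p) for p in exclude_patterns)]

rels = ["public.t1", "foo.t2", "pg_catalog.pg_aggregate",
        "information_schema.sql_features"]
patterns = ["pg_catalog.*", "information_schema.*"]

print(filter_relations(rels, patterns))
# ['public.t1', 'foo.t2']
```

With the system schemas excluded, the \dt *.* overview above would shrink from 51 rows to just the two user tables.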
Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT
On Sat, Apr 4, 2009 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Sat, Apr 4, 2009 at 7:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> I'm not thrilled about adding a column to pg_attribute for this.
>>
>> What is the specific nature of your concern?
>
> Actually, I'm more worried about the TupleDesc data structure than the
> catalogs. There are TupleDescs all over the backend, and I've seen
> evidence in profiles that setting them up is a nontrivial cost. You're
> very possibly right that four more bytes is in the noise, though.
>
> Two other comments now that I've read a little further:
>
> * This isn't happening for 8.4, so adjust the pg_dump code.

I thought about writing 80500, but the effect of that would have been to
render the patch impossible to test, so I didn't. :-)

I think I'll be very lucky if that's the most bitrot this accumulates
between now and when the tree is open for 8.5 development. System catalog
changes stink in that regard. I suppose we could tag and branch the tree
now, but that would just move the work of fixing any subsequent conflicts
from patch authors to committers, which is sort of a zero-sum game.

> * Using an integer is bogus. Use a float4 and forget the weird scaling;
> it should have exactly the same interpretation as stadistinct, except
> for 0 meaning "unset" instead of "unknown".

I think there's a pretty good chance that will lead to a complaint that is
some variant of the following: "I ran this command and then I did a pg_dump
and the output doesn't match what I put in." Or maybe, "I did a dump and a
restore on a different machine with a different architecture and then
another dump and then I diffed them and this popped out." I have a
deep-seated aversion to storing important values as float, and we seem to
have no other floats anywhere in our DDL, so I was a little leery about
breaking new ground.

There's nothing particularly special about the scaling that the
pg_statistic stuff uses, and it's basically pretty obscure internal stuff
anyway, so I think the consistency argument is fairly weak.

...Robert
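Robert's dump-diff worry is reproducible outside the backend: a value with no exact float4 representation changes when it round-trips through 4-byte storage, so the restored value can differ textually from what was entered. A quick Python illustration (struct's 'f' format is IEEE 754 single precision, like float4):

```python
import struct

def roundtrip_float4(x):
    """Store a Python float (a double) as a 4-byte float and read it back."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

print(roundtrip_float4(0.1) == 0.1)   # False: 0.1 is not exact in float4
print(roundtrip_float4(0.1))          # 0.10000000149011612
print(roundtrip_float4(0.5) == 0.5)   # True: 0.5 is exactly representable
```

Any SET DISTINCT value whose float4 representation is inexact would come back from a dump/restore cycle looking slightly different, which is exactly the kind of diff users tend to report.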
Re: [HACKERS] Closing some 8.4 open items
Tom Lane wrote:
> If there are no objections, I'm going to remove the following items from
> the list at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items
>
> change cardinality() for multi-dim arrays?
>         Drop; there's no consensus that this should be changed

I don't think we should let this go quite so easily. This is a new
function, so the bias should be towards getting it right rather than
towards not changing it. The supplied functionality is not only surprising,
but easily obtained by an existing function. ISTM if we're supplying a new
function it should have new functionality.

cheers

andrew
Re: [HACKERS] Closing some 8.4 open items
On Sun, Apr 5, 2009 at 7:45 AM, Andrew Dunstan and...@dunslane.net wrote:
> Tom Lane wrote:
>> If there are no objections, I'm going to remove the following items
>> from the list at
>> http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items
>>
>> change cardinality() for multi-dim arrays?
>>         Drop; there's no consensus that this should be changed
>
> I don't think we should let this go quite so easily, as this is a new
> function, so the bias should be to getting it right rather than don't
> change it.

I think it is right already, but the point is debatable.

> The supplied functionality is not only surprising, but easily obtained
> by an existing function. ISTM if we're supplying a new function it
> should have new functionality.

Well, it's a compatibility function...

...Robert
Re: [HACKERS] Closing some 8.4 open items
Robert Haas wrote:
>> The supplied functionality is not only surprising, but easily obtained
>> by an existing function. ISTM if we're supplying a new function it
>> should have new functionality.
>
> Well, it's a compatibility function...

Compatible with what?

The other thing that frankly bothers me is that we appear to have acquired
this function by a curious process which involved no proposal or discussion
that I have discovered. If there had been proper and adequate discussion
before the item was committed I wouldn't be making a fuss now, whether or
not I agreed with the result.

cheers

andrew
Re: [HACKERS] Closing some 8.4 open items
On Sun, Apr 05, 2009 at 07:55:44AM -0400, Robert Haas wrote:
>> The supplied functionality is not only surprising, but easily obtained
>> by an existing function. ISTM if we're supplying a new function it
>> should have new functionality.
>
> Well, it's a compatibility function...

It's actually in SQL:2008.

Cheers,
David.
--
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
[HACKERS] About hacking postgresql
Hello, I want to be more precise. I want to change the syntax as follows:

Start Transaction (priority)

I have 3 transactions and I want to give each transaction a priority.
Example:

Transaction 1
Start transaction (03)
...
Commit

Transaction 2
Start transaction (04)
...
Commit

Transaction 3
Start transaction (01)
...
Commit

(The transaction with the highest priority must execute first.) How do I
express this in gram.y? Is a change in gram.y sufficient? Thanks a lot
[HACKERS] EXPLAIN WITH
I'm a bit unsatisfied with this output because it doesn't tell me the plan
it used to construct the CTE being scanned.

rhaas=# explain with wumpus as (select * from foo where id 200)
  select * from foo f, wumpus c, wumpus u
  where f.creator_id = c.id and f.last_updater_id = u.id;
                                QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=31.44..74.94 rows=1000 width=36)
   Hash Cond: (f.creator_id = c.id)
   InitPlan
     ->  Seq Scan on foo  (cost=0.00..18.50 rows=199 width=12)
           Filter: (id 200)
   ->  Hash Join  (cost=6.47..36.22 rows=1000 width=24)
         Hash Cond: (f.last_updater_id = u.id)
         ->  Seq Scan on foo f  (cost=0.00..16.00 rows=1000 width=12)
         ->  Hash  (cost=3.98..3.98 rows=199 width=12)
               ->  CTE Scan on wumpus u  (cost=0.00..3.98 rows=199 width=12)
   ->  Hash  (cost=3.98..3.98 rows=199 width=12)
         ->  CTE Scan on wumpus c  (cost=0.00..3.98 rows=199 width=12)
(12 rows)

I haven't looked at what would be required to fix this, but I think we
should fix it before releasing 8.4.

...Robert
Re: [HACKERS] Crash in gist insertion on pathological box data
> I don't like throwing an error there; I wish there were a way for the
> generic code to apply the fallbackSplit code instead. I see that in this
> particular formulation it's dependent on the datatype --- can we get
> around that, by having it invoke the union method?

Done. rtree.patch.gz contains the patch for gistproc.c;
genericsplit.patch.gz adds a simple genericPickSplit to gistsplit.c to work
around the bug of a user-defined picksplit.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/

rtree.patch.gz
Description: Unix tar archive

genericsplit.patch.gz
Description: Unix tar archive
Re: [HACKERS] EXPLAIN WITH
On Sun, Apr 5, 2009 at 9:40 AM, Robert Haas robertmh...@gmail.com wrote:
> I'm a bit unsatisfied with this output because it doesn't tell me the
> plan it used to construct the CTE being scanned.

I'm totally wrong. Sorry for the noise.

...Robert
Re: [HACKERS] EXPLAIN WITH
* Robert Haas (robertmh...@gmail.com) wrote:
> On Sun, Apr 5, 2009 at 9:40 AM, Robert Haas robertmh...@gmail.com wrote:
>> I'm a bit unsatisfied with this output because it doesn't tell me the
>> plan it used to construct the CTE being scanned.
>
> I'm totally wrong. Sorry for the noise.

Eh. It could be made clearer what's happening, imv. Esp. if you have a
number of WITH pieces and want to know which ones connect to which in the
plan between the CTE nodes and the plans that they're running... eg:

postgres=# explain with wumpus as (select * from foo where id 200),
  abc as (select * from tab1)
  select * from foo f, wumpus c, wumpus u, abc x
  where f.creator_id = c.id and f.last_updater_id = u.id;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=1266.59..1001458.19 rows=35936310 width=52)
   InitPlan
     ->  Seq Scan on foo  (cost=0.00..34.25 rows=647 width=12)
           Filter: (id 200)
     ->  Seq Scan on tab1  (cost=0.00..27.70 rows=1770 width=16)
   ->  CTE Scan on abc x  (cost=0.00..35.40 rows=1770 width=16)
   ->  Materialize  (cost=1204.64..1566.67 rows=20303 width=36)
         ->  Hash Join  (cost=42.05..1025.34 rows=20303 width=36)
               Hash Cond: (f.creator_id = c.id)
               ->  Hash Join  (cost=21.03..275.66 rows=6276 width=24)
                     Hash Cond: (f.last_updater_id = u.id)
                     ->  Seq Scan on foo f  (cost=0.00..29.40 rows=1940 width=12)
                     ->  Hash  (cost=12.94..12.94 rows=647 width=12)
                           ->  CTE Scan on wumpus u  (cost=0.00..12.94 rows=647 width=12)
               ->  Hash  (cost=12.94..12.94 rows=647 width=12)
                     ->  CTE Scan on wumpus c  (cost=0.00..12.94 rows=647 width=12)
(16 rows)

I see the initplans for foo and tab1, and can figure out where those fit
in, but in a more complex situation it would be much less clear, I believe.
Would be nice if there was a CTE ID or similar to link between the pieces
of the InitPlan and the CTE nodes.

I think that makes sense anyway; I haven't played with CTE much, but that's
what it looks like to me.

	Stephen
Re: [HACKERS] EXPLAIN WITH
* Stephen Frost (sfr...@snowman.net) wrote:
> Would be nice if there was a CTE ID or similar to link between the
> pieces of the InitPlan and the CTE nodes.

Erm, of course, the CTE *has* an ID already, since you name them. Could we
get that ID/name up into the piece of the InitPlan that is handling that
CTE?

	Stephen
Re: [HACKERS] Python 3.0 does not work with PL/Python
Marko Kreen mark...@gmail.com writes:
> On 4/4/09, Tom Lane t...@sss.pgh.pa.us wrote:
>> So my conclusion is that Python 3.0 is much too wet behind the ears for
>> us to worry about in PG 8.4. I'd guess that we should come back to the
>> issue towards the end of 2009, and perhaps think about back-porting
>> after we have something working in 8.5.
>
> It is not wet (the new interfaces should be stable), but it is a break
> from the 2.x series.

Hm, did you read the link I cited? It's not so surprising that 3.0 should
have broken distutils, but what I found distressing is that they fixed
distutils and then 3.0.1 broke it *again*. I stand by my opinion that
Python 3 isn't stable yet.

> This means that users of PL/Python should not expect PL/Python to
> automatically work with 3.0. Supporting 3.0 will be a new feature. So
> it's OK to drop it from 8.4.

One other thing that we'll have to seriously consider is whether we should
package python3 as a separate PL, so that people can keep using their 2.x
plpython functions without fear of breakage. I know that the Fedora guys
are currently debating whether to treat it that way, and I suppose other
distros are having or will soon have the same conversation. Six months from
now, there will be some precedents and some track record for us to look at
in making that choice.

			regards, tom lane
Re: [HACKERS] EXPLAIN WITH
Stephen Frost sfr...@snowman.net writes:
> * Stephen Frost (sfr...@snowman.net) wrote:
>> Would be nice if there was a CTE ID or similar to link between the
>> pieces of the InitPlan and the CTE nodes.
>
> Erm, of course, the CTE *has* an ID already, since you name them. Could
> we get that ID/name up into the piece of the InitPlan that is handling
> that CTE?

I'm not sure but will be glad to take a look. Assuming it's not
unreasonably difficult, does anyone object to a format like this:

 Nested Loop  (cost=1266.59..1001458.19 rows=35936310 width=52)
   InitPlan
     CTE abc
       ->  Seq Scan on foo  (cost=0.00..34.25 rows=647 width=12)
             Filter: (id 200)
     CTE wumpus
       ->  Seq Scan on tab1  (cost=0.00..27.70 rows=1770 width=16)
   ->  CTE Scan on abc x  (cost=0.00..35.40 rows=1770 width=16)
   ->  Materialize  (cost=1204.64..1566.67 rows=20303 width=36)
         ->  Hash Join  (cost=42.05..1025.34 rows=20303 width=36)
               ...

			regards, tom lane
Re: [HACKERS] Closing some 8.4 open items
Andrew Dunstan and...@dunslane.net writes:
> Robert Haas wrote:
>> Well, it's a compatibility function...
>
> compatible with what?

It's required by the SQL standard.

> The other thing that frankly bothers me is that we appear to have
> acquired this function by a curious process which involved no proposal
> or discussion that I have discovered. If there had been proper and
> adequate discussion before the item was committed I wouldn't be making
> a fuss now, whether or not I agreed with the result.

I think Peter put it in under the assumption that meeting spec-required
syntax would always pass muster. It is however fair to question whether he
made the right extrapolation of the spec's definition to cases that are not
in the spec. Personally I am in favor of changing it to give the total
number of array elements, on the grounds that (1) that's as defensible a
reading of the spec as the other and (2) it would add actual new
functionality rather than being only a relabeling of array_length.

I will leave that item on the Open Items list. I take it no one's excited
about the others?

			regards, tom lane
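The two competing readings of cardinality() for a multi-dimensional array are easy to state concretely. With a nested list standing in for ARRAY[[1,2,3],[4,5,6]], one reading returns the length of the first dimension (what array_length already gives) and the other returns the total element count (the behaviour Tom favours). A Python sketch:

```python
def length_of_first_dimension(arr):
    """The array_length-style reading: count only the outer dimension."""
    return len(arr)

def total_elements(arr):
    """The other reading: count scalar elements at any nesting depth."""
    if isinstance(arr, list):
        return sum(total_elements(x) for x in arr)
    return 1

a = [[1, 2, 3], [4, 5, 6]]  # stand-in for ARRAY[[1,2,3],[4,5,6]]
print(length_of_first_dimension(a))  # 2
print(total_elements(a))             # 6
```

The first reading adds nothing over the existing array_length function; the second is the genuinely new functionality argued for above.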
Re: [HACKERS] EXPLAIN WITH
I wrote:
> Stephen Frost sfr...@snowman.net writes:
>> Erm, of course, the CTE *has* an ID already, since you name them.
>> Could we get that ID/name up into the piece of the InitPlan that is
>> handling that CTE?
>
> I'm not sure but will be glad to take a look. Assuming it's not
> unreasonably difficult, does anyone object to a format like this:

I looked at this a bit and found that it *is* unreasonably difficult given
what's currently stored in plan trees. SubPlans and InitPlans are
identified only by ID numbers. In principle we could search the plan tree
for a CTE Scan node referencing that ID number and then grab the name of
the RTE it refers to ... but ick, especially considering we'd have to do
that for every subplan, even the ones that are not CTEs.

What we could do instead, which is a pretty simple change, is to add a name
field to struct SubPlan. If we were going to do this, I'd vote for filling
it in for every subplan. For actual CTEs we could fill in the CTE name; for
anonymous subqueries we could do no better than "InitPlan n" or
"SubPlan n". However, that would still be a step forward, because then we
could have references to subplans say "subplan n" instead of just
"subplan". So instead of

regression=# explain select * from tenk1 a
  where unique2 not in (select f1 from int4_tbl);
                          QUERY PLAN
----------------------------------------------------------------
 Seq Scan on tenk1 a  (cost=1.06..484.06 rows=5000 width=244)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on int4_tbl  (cost=0.00..1.05 rows=5 width=4)
(4 rows)

you'd get

 Seq Scan on tenk1 a  (cost=1.06..484.06 rows=5000 width=244)
   Filter: (NOT (hashed subplan 1))
   SubPlan 1
     ->  Seq Scan on int4_tbl  (cost=0.00..1.05 rows=5 width=4)

This isn't terribly compelling in this example, of course, but it gets a
lot more important when you've got a dozen of 'em. From the perspective of
the backend this is a simple and cheap change. I think the objection that
is most likely to be raised is that it would confuse or break programs that
analyze EXPLAIN output in any degree of detail. Of course such programs are
going to need some work for 8.4 already.

Comments?

			regards, tom lane
Re: [HACKERS] EXPLAIN WITH
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> This isn't terribly compelling in this example, of course, but it gets
> a lot more important when you've got a dozen of 'em.

Exactly.

> From the perspective of the backend this is a simple and cheap change.

Awesome.

> I think the objection that is most likely to be raised is that it would
> confuse or break programs that analyze EXPLAIN output in any degree of
> detail. Of course such programs are going to need some work for 8.4
> already.

I definitely feel that it would be best to make this change now, when we're
introducing CTE as a type that anything doing EXPLAIN would need to deal
with at some level anyway, rather than to add it later (eg: 8.5). I'm
definitely a +1 on this.

	Thanks!

		Stephen
Re: [HACKERS] Closing some 8.4 open items
On Sun, Apr 5, 2009 at 12:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> I will leave that item on the Open Items list. I take it no one's
> excited about the others?

I'm excited about some of them, but not to the point of not wanting to ship
beta. So +1 for removing them as per your suggestions.

...Robert
Re: [HACKERS] Closing some 8.4 open items
On Sun, Apr 05, 2009 at 12:21:41PM -0400, Tom Lane wrote:
> I will leave that item on the Open Items list. I take it no one's
> excited about the others?

When the windowing functions become a pain point, let's revisit :)

Cheers,
David.
Re: [HACKERS] Closing some 8.4 open items
David Fetter da...@fetter.org writes:
> On Sun, Apr 05, 2009 at 12:21:41PM -0400, Tom Lane wrote:
>> I will leave that item on the Open Items list. I take it no one's
>> excited about the others?
>
> When the windowing functions become a pain point, let's revisit :)

The \df thing? That's something it'd be okay to revisit during beta, IMHO.
The things I'd really like to get right before beta are the ones that are
going to require an initdb to change. Like, say, the cardinality()
issue ...

			regards, tom lane
Re: [HACKERS] Closing some 8.4 open items
On Sun, Apr 5, 2009 at 6:54 PM, Robert Haas robertmh...@gmail.com wrote:
> I'm excited about some of them, but not to the point of not wanting to
> ship beta. So +1 for removing them as per your suggestions.

I'm somewhat excited about posix_fadvise, but my general feeling was that
it was best to do nothing anyways. I don't know how to test these
questions, though, because they depend a lot on workload, and pgbench or
synthetic queries which stress prefetching aren't especially good at
measuring how fast pages get evicted.

As far as reimplementing regular index scans -- I don't currently see any
way to do it that would satisfy your demands and wouldn't be insanely
complex. Hopefully I'm missing something obvious, and if someone sees what,
I would be happy to go ahead and implement something. But everything I've
tried has turned into a monster.

--
greg
Re: [HACKERS] Closing some 8.4 open items
Greg Stark st...@enterprisedb.com writes:
> I'm somewhat excited about posix_fadvise but my general feeling was
> that it was best to do nothing anyways.

Yeah. One of the things in the back of my mind is that the planner is going
to prefer bitmap scans anyway for anything that fetches more than a very
few rows. So it's not clear that prefetching plain indexscans is going to
buy enough to justify a whole lotta work or ugliness there. I'm content to
throw this one on TODO.

			regards, tom lane
Re: [HACKERS] EXPLAIN WITH
Stephen Frost sfr...@snowman.net writes:
> I definitely feel that it would be best to make this change now, when
> we're introducing CTE as a type that anything doing EXPLAIN would need
> to deal with at some level anyway than to add it later (eg: 8.5). I'm
> definitely a +1 on this.

Here's a somewhat contrived example of what you get now:

regression=# explain with wumpus as (select * from tenk1)
  select * from wumpus
  where unique2 = (select sum(f1) from int4_tbl)
     or exists(select 1 from tenk1 z where z.unique1 = wumpus.thousand);
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 CTE Scan on wumpus  (cost=459.07..83371.07 rows=5025 width=244)
   Filter: ((unique2 = $1) OR (alternatives: SubPlan 3 or hashed SubPlan 4))
   CTE wumpus
     ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=244)
   InitPlan 2 (returns $1)
     ->  Aggregate  (cost=1.06..1.07 rows=1 width=4)
           ->  Seq Scan on int4_tbl  (cost=0.00..1.05 rows=5 width=4)
   SubPlan 4
     ->  Seq Scan on tenk1 z  (cost=0.00..458.00 rows=1 width=4)
   SubPlan 3
     ->  Index Scan using tenk1_unique1 on tenk1 z  (cost=0.00..8.27 rows=1 width=0)
           Index Cond: (unique1 = $2)
(12 rows)

Before, it would have been pretty tough to tell which subplan did what.

The ruleutils printout of a subplan reference still leaves some things to
be desired, eg in this example you can't really tell it's an EXISTS check
as opposed to something else. But fixing that is probably too much for this
late date --- I'm not even real clear on what a better printout would look
like, exactly.

			regards, tom lane
Re: XML only working in UTF-8 - Re: [HACKERS] 8.4 open items list
On Sunday 05 April 2009 05:00:04 Tom Lane wrote:
> Chris Browne cbbro...@acm.org writes:
>> j...@agliodbs.com (Josh Berkus) writes:
>>> This one is also really bad, but probably only Doc-patchable.
>>> However, can SQL/XML really be said to be core functionality if it
>>> only works in UTF-8?
>>> * BUG #4622: xpath only work in utf-8 server encoding
>>
>> Well, much of the definition of XML assumes the use of Unicode, so I
>> don't feel entirely badly about there being such a restriction. It
>> seems likely to me that opening its use to other encodings has a
>> considerable risk of breaking due to a loss of, erm, closure, in the
>> mathematical sense. Or, alternatively, opening a Pandora's Box of
>> needing to do translations to prevent mappings from breaking.
>
> Is there a reason not to fix it as suggested at
> http://archives.postgresql.org/pgsql-bugs/2009-02/msg00032.php
> ie recode on-the-fly from database encoding to UTF8?

Probably just verifying that it works.
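The fix Tom points at amounts to transcoding before libxml ever sees the datum: decode from the server encoding, re-encode as UTF-8. A minimal sketch of that recoding step in Python (LATIN1 is just an example server encoding):

```python
def recode_to_utf8(data: bytes, server_encoding: str) -> bytes:
    """Recode a byte string from the server encoding to UTF-8."""
    return data.decode(server_encoding).encode('utf-8')

# 'café' stored under a LATIN1 server encoding is one byte for the é:
latin1_bytes = 'caf\u00e9'.encode('latin-1')       # b'caf\xe9'
utf8_bytes = recode_to_utf8(latin1_bytes, 'latin-1')
print(utf8_bytes)                                  # b'caf\xc3\xa9'
print(utf8_bytes.decode('utf-8'))                  # café
```

The "verifying that it works" part is the real effort: every supported server encoding has to round-trip cleanly into Unicode this way before xpath can rely on it.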
Re: [HACKERS] Solaris getopt_long and PostgreSQL
Zdenek Kotala píše v út 31. 03. 2009 v 21:25 +0200:
> Another possibility is to rewrite postgres (and pg_resetxlog) to use
> getopt_long() instead of getopt().

Attached patch rewrites postgres to use getopt_long instead of getopt. Patch also removes the configure part for Solaris related to getopt.

Zdenek

diff -Nrc pgsql_getopt.2ecfaec29a72/configure.in pgsql_getopt/configure.in
*** pgsql_getopt.2ecfaec29a72/configure.in	2009-04-05 23:12:35.718886756 +0200
--- pgsql_getopt/configure.in	2009-04-05 23:12:35.761770812 +0200
***************
*** 1276,1287 ****
      AC_LIBOBJ(getopt_long)
  fi
  
- # Solaris' getopt() doesn't do what we want for long options, so always use
- # our version on that platform.
- if test "$PORTNAME" = solaris; then
-     AC_LIBOBJ(getopt)
- fi
- 
  # Win32 support
  if test "$PORTNAME" = win32; then
      AC_REPLACE_FUNCS(gettimeofday)
--- 1276,1281 ----
diff -Nrc pgsql_getopt.2ecfaec29a72/src/backend/bootstrap/bootstrap.c pgsql_getopt/src/backend/bootstrap/bootstrap.c
*** pgsql_getopt.2ecfaec29a72/src/backend/bootstrap/bootstrap.c	2009-04-05 23:12:35.723955129 +0200
--- pgsql_getopt/src/backend/bootstrap/bootstrap.c	2009-04-05 23:12:35.762270558 +0200
***************
*** 17,25 ****
  #include <time.h>
  #include <unistd.h>
  #include <signal.h>
! #ifdef HAVE_GETOPT_H
! #include <getopt.h>
! #endif
  
  #include "access/genam.h"
  #include "access/heapam.h"
--- 17,23 ----
  #include <time.h>
  #include <unistd.h>
  #include <signal.h>
! #include "getopt_long.h"
  
  #include "access/genam.h"
  #include "access/heapam.h"
***************
*** 38,43 ****
--- 36,42 ----
  #include "tcop/tcopprot.h"
  #include "utils/builtins.h"
  #include "utils/fmgroids.h"
+ #include "utils/guc.h"
  #include "utils/memutils.h"
  #include "utils/ps_status.h"
  #include "utils/tqual.h"
***************
*** 208,213 ****
--- 207,214 ----
  	int			flag;
  	AuxProcType auxType = CheckerProcess;
  	char	   *userDoption = NULL;
+ 	struct option *optlist;
+ 	int			optindex;
  
  	/*
  	 * initialize globals
***************
*** 247,253 ****
  		argc--;
  	}
  
! 	while ((flag = getopt(argc, argv, "B:c:d:D:Fr:x:-:")) != -1)
  	{
  		switch (flag)
  		{
--- 248,256 ----
  		argc--;
  	}
  
! 	optlist = GetLongOptionList();
! 
! 	while ((flag = getopt_long(argc, argv, "B:c:d:D:Fr:x:", optlist, &optindex)) != -1)
  	{
  		switch (flag)
  		{
***************
*** 280,286 ****
  				auxType = atoi(optarg);
  				break;
  			case 'c':
- 			case '-':
  				{
  					char	   *name,
  							   *value;
--- 283,288 ----
***************
*** 288,293 ****
--- 290,296 ----
  					ParseLongOption(optarg, &name, &value);
  					if (!value)
  					{
+ 						free(optlist);
  						if (flag == '-')
  							ereport(ERROR,
  									(errcode(ERRCODE_SYNTAX_ERROR),
***************
*** 306,311 ****
--- 309,317 ----
  					free(value);
  					break;
  				}
+ 			case 1:
+ 				SetConfigOption(optlist[optindex].name, optarg, PGC_POSTMASTER, PGC_S_ARGV);
+ 				break;
  			default:
  				write_stderr("Try \"%s --help\" for more information.\n", progname);
***************
*** 313,318 ****
--- 319,325 ----
  			break;
  		}
  	}
+ 	free(optlist);
  
  	if (argc != optind)
  	{
diff -Nrc pgsql_getopt.2ecfaec29a72/src/backend/postmaster/postmaster.c pgsql_getopt/src/backend/postmaster/postmaster.c
*** pgsql_getopt.2ecfaec29a72/src/backend/postmaster/postmaster.c	2009-04-05 23:12:35.729964754 +0200
--- pgsql_getopt/src/backend/postmaster/postmaster.c	2009-04-05 23:12:35.762704640 +0200
***************
*** 83,92 ****
  #ifdef HAVE_SYS_SELECT_H
  #include <sys/select.h>
  #endif
! 
! #ifdef HAVE_GETOPT_H
! #include <getopt.h>
! #endif
  
  #ifdef USE_BONJOUR
  #include <DNSServiceDiscovery/DNSServiceDiscovery.h>
--- 83,89 ----
  #ifdef HAVE_SYS_SELECT_H
  #include <sys/select.h>
  #endif
! #include "getopt_long.h"
  
  #ifdef USE_BONJOUR
  #include <DNSServiceDiscovery/DNSServiceDiscovery.h>
***************
*** 116,121 ****
--- 113,119 ----
  #include "tcop/tcopprot.h"
  #include "utils/builtins.h"
  #include "utils/datetime.h"
+ #include "utils/guc.h"
  #include "utils/memutils.h"
  #include "utils/ps_status.h"
***************
*** 463,468 ****
--- 461,468 ----
  	int			status;
  	char	   *userDoption = NULL;
  	int			i;
+ 	struct option *optlist;
+ 	int			optindex;
  
  	MyProcPid = PostmasterPid = getpid();
***************
*** 506,517 ****
  	opterr = 1;
  
  	/*
  	 * Parse command-line options.  CAUTION: keep this in sync with
  	 * tcop/postgres.c (the option sets should not conflict) and with the
  	 * common help() function in main/main.c.
  	 */
! 	while ((opt = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1)
  	{
  		switch (opt)
  		{
--- 506,518 ----
  	opterr = 1;
  
+ 	optlist = GetLongOptionList();
+ 
  	/*
  	 * Parse command-line options.  CAUTION: keep this in sync with
  	 * tcop/postgres.c (the option sets should not conflict) and with the
  	 * common help() function in main/main.c.
  	 */
! 	while ((opt = getopt_long(argc, argv,
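The mechanics of the proposed switch are easy to see outside the backend. Below is a minimal standalone sketch, not the patch's code: the patch's GetLongOptionList() is replaced by a hypothetical hard-coded option table, and a table entry whose val is 1 plays the role of a long GUC option, mirroring the new "case 1:" branch above.

```c
#include <getopt.h>
#include <string.h>

/*
 * Parse a short option plus a table-driven long option the way the
 * patch proposes.  The option table here is a stand-in for the patch's
 * GetLongOptionList(); getopt_long() returns the entry's val (1) when
 * a long option from the table is seen, with its argument in optarg.
 * Returns the value given for --work_mem, or NULL if none was seen.
 */
static const char *parse_work_mem(int argc, char **argv)
{
    static struct option longopts[] = {
        {"work_mem", required_argument, NULL, 1},
        {NULL, 0, NULL, 0}
    };
    const char *value = NULL;
    int flag;
    int optindex;

    optind = 1;                 /* reset getopt state so the helper is reusable */
    while ((flag = getopt_long(argc, argv, "B:", longopts, &optindex)) != -1)
    {
        if (flag == 1)          /* a long option from the table */
            value = optarg;
    }
    return value;
}
```

The key difference from the old getopt()-with-"-:" hack is that the long option's name arrives already split from its value, via the option table and optarg.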
Re: XML only working in UTF-8 - Re: [HACKERS] 8.4 open items list
Peter Eisentraut pete...@gmx.net writes:
> On Sunday 05 April 2009 05:00:04 Tom Lane wrote:
>> Is there a reason not to fix it as suggested at
>> http://archives.postgresql.org/pgsql-bugs/2009-02/msg00032.php
>> ie recode on-the-fly from database encoding to UTF8?
> Probably just verifying that it works.

Well, I'm willing to review that patch for sanity and apply it, but I know too little about xpath to test it creatively. Have you any suggestions other than something like the original complaint at http://archives.postgresql.org/pgsql-bugs/2009-01/msg00135.php which is basically trying to use a made-up(?) non-ASCII tag name?

regards, tom lane
Re: [HACKERS] Solaris getopt_long and PostgreSQL
Zdenek Kotala zdenek.kot...@sun.com writes:
> Zdenek Kotala píše v út 31. 03. 2009 v 21:25 +0200:
>> Another possibility is to rewrite postgres (and pg_resetxlog) to use
>> getopt_long() instead of getopt().
> Attached patch rewrites postgres to use getopt_long instead of getopt.

Actually, I fooled around with it last night and seem to have fixed it (buildfarm is all green today) by the expedient of not defining our own optind etc. variables if the system supplies them. So that seemed like a clean fix to me --- the old handling of optreset in particular was a huge kluge, whereas it's clear how this code is supposed to work. I don't think we need to mess around with changing our option parsing logic, especially not to the extent that you propose here.

regards, tom lane
Re: XML only working in UTF-8 - Re: [HACKERS] 8.4 open items list
Peter Eisentraut pete...@gmx.net writes:
> On Sunday 05 April 2009 05:00:04 Tom Lane wrote:
>> Is there a reason not to fix it as suggested at
>> http://archives.postgresql.org/pgsql-bugs/2009-02/msg00032.php
>> ie recode on-the-fly from database encoding to UTF8?
> Probably just verifying that it works.

I studied this patch a bit and I'm unimpressed. It looks to me like xml.c is absolutely chock-full of places where we pass DB-encoding data to libxml, or vice versa. The patch only fixes a few of them, and does so in a fairly ugly, ad-hoc fashion with lots of duplicated code. As near as I can tell, every place where you see an explicit cast between char * and xmlChar * is probably broken. I think we ought to approach this by refactoring to have all those conversions go through subroutines, instead of blithely casting. This is more work than I personally care to put into xml.c. Any takers?

regards, tom lane
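For illustration, the kind of boundary conversion such subroutines would centralize can be sketched in isolation. This is a hypothetical helper, not xml.c code: it recodes a Latin-1 string (standing in for "database encoding") into the UTF-8 that libxml expects, which is the direction the cited patch handles ad hoc at a few call sites.

```c
#include <stddef.h>
#include <string.h>

/*
 * Recode a Latin-1 (single-byte) string into UTF-8.  Every byte below
 * 0x80 is ASCII and passes through; every byte 0x80..0xFF becomes a
 * two-byte UTF-8 sequence.  'out' must have room for 2*strlen(in)+1
 * bytes.  A real subroutine would dispatch on the database encoding
 * instead of assuming Latin-1.
 */
static void latin1_to_utf8(const unsigned char *in, char *out)
{
    while (*in)
    {
        if (*in < 0x80)
            *out++ = (char) *in;                    /* ASCII passes through */
        else
        {
            *out++ = (char) (0xC0 | (*in >> 6));    /* leading byte */
            *out++ = (char) (0x80 | (*in & 0x3F));  /* continuation byte */
        }
        in++;
    }
    *out = '\0';
}
```

The reverse conversion (UTF-8 back to the database encoding) would be needed for everything libxml hands back, which is why a pair of subroutines beats casting at each of the chock-full-of places.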
Re: [HACKERS] Python 3.0 does not work with PL/Python
On Apr 5, 2009, at 8:54 AM, Tom Lane wrote:
> Hm, did you read the link I cited? It's not so surprising that 3.0 should
> have broken distutils, but what I found distressing is that they fixed
> distutils and then 3.0.1 broke it *again*. I stand by my opinion that
> Python 3 isn't stable yet.

Yeah, actually. From some of the talk I've seen on python-dev, it sounds like 3.0.2 will be the last 3.0 release. 3.1 is in alpha, and ready to start cleaning things up, afaict.

> This means that users of PL/Python should not expect PL/Python to
> automatically work with 3.0. Supporting 3.0 will be a new feature. So it's
> OK to drop it from 8.4.
>
> One other thing that we'll have to seriously consider is whether we should
> package python3 as a separate PL, so that people can keep using their 2.x
> plpython functions without fear of breakage. I know that the Fedora guys
> are currently debating whether to treat it that way, and I suppose other
> distros are having or will soon have the same conversation. Six months from
> now, there will be some precedents and some track record for us to look at
> in making that choice.

I think this would be wise. Any thoughts on the acceptability of a complete rewrite for Python 3? I've been fiddling with a HEAD branch including the plpy code in a github repo. (nah it dunt compile yet: bitrot and been busy with a 3.x driver. ;)
Re: [HACKERS] Python 3.0 does not work with PL/Python
James Pye li...@jwp.name writes:
> Any thoughts on the acceptability of a complete rewrite for Python 3?

I've always thought that plpython.c was a bit on the hackish side. If we do decide we have to make plpython2 and plpython3 separate languages, it'd be pretty easy to just start over with a whole new implementation for python3 ...

regards, tom lane
Re: [HACKERS] psql \d commands and information_schema
Martin Pihlak martin.pih...@gmail.com writes:
> Tom Lane wrote:
>> I don't find this to be a pressing problem. If the user has lots of
>> schemas, they probably have lots of objects too, and are unlikely to need
>> such a thing.
> Current behaviour makes it impossible to get a quick overview of all the
> user-defined objects. And it doesn't really matter what the number of
> schemas is -- it gets messy for even a small number of schemas and objects.

Well, if they're all in your search_path then plain old \df will do fine. If they're not in your search path then I think it gets pretty questionable whether they're "user defined" in a real sense. It seems more likely that you've got a pile of modules loaded, and which of those modules is user defined for your immediate purposes is something that psql can't hope to intuit. In short, I'm still not convinced that there's much use-case for a simple U switch.

> For our needs I wouldn't really consider using search_path for anything but
> temporary hacks. However, a psql variable that specifies a list of name
> patterns to be excluded from describe could be useful. Something along the
> lines of:
> \set DESCRIBE_EXCLUDE_PATTERNS 'pg_catalog.*, information_schema.*, ...'

Possibly something like this could be useful. But I'd like to see it designed in conjunction with the real module facility that we keep hoping for, because I think a situation with a number of modules loaded is going to be exactly where you want some flexible filtering.

regards, tom lane
Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT
Robert Haas robertmh...@gmail.com writes:
> On Sat, Apr 4, 2009 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> * Using an integer is bogus. Use a float4 and forget the weird scaling; it
>> should have exactly the same interpretation as stadistinct, except for 0
>> meaning unset instead of unknown.
> I have a deep-seated aversion to storing important values as float,

[ shrug... ] Precision is not important for this value: we are not anywhere near needing more than six significant digits for our statistical estimates. Range, on the other hand, could be important when dealing with really large tables. So I'm much more concerned about whether the definition is too restrictive than about whether some uninformed person complains about exactness.

regards, tom lane
Re: [HACKERS] Closing some 8.4 open items
On Sun, Apr 05, 2009 at 02:07:32PM -0400, Tom Lane wrote:
> David Fetter da...@fetter.org writes:
>> On Sun, Apr 05, 2009 at 12:21:41PM -0400, Tom Lane wrote:
>>> I will leave that item on the Open Items list.
>> I take it no one's excited about the others? When the windowing functions
>> become a pain point, let's revisit :)
> The \df thing? That's something it'd be okay to revisit during beta, IMHO.

OK, I'll work on this tomorrow :)

Cheers,
David.

> The things I'd really like to get right before beta are the ones that are
> going to require an initdb to change. Like, say, the cardinality() issue ...
> regards, tom lane

--
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Sampling Profler for Postgres
I wrote:
> Peter Eisentraut pete...@gmx.net writes:
>> On March 10, 2009, Tom Lane wrote:
>>> FWIW, the systemtap guys are really, really close to having a working
>>> DTrace equivalent for Linux:
>>> http://gnu.wildebeest.org/diary/2009/02/24/systemtap-09-markers-everywhere/
>> So how is this going? Is it usable? I assume it's source compatible with
>> the dtrace support that we already have?
> The current 0.9 release does *not* work on our CVS tip (dtrace fails on
> more-than-6-argument probes, and there are some other issues), but you can
> pull from their git repository:

BTW, systemtap 0.9.5 is now available as part of the standard Fedora 10 package set, so you don't have to install any nonstandard software anymore. I've checked, and 0.9.5 appears to just work with our CVS HEAD. You need these packages:

$ rpm -qa | grep systemtap
systemtap-sdt-devel-0.9.5-1.fc10.x86_64
systemtap-runtime-0.9.5-1.fc10.x86_64
systemtap-0.9.5-1.fc10.x86_64

Then configure --enable-dtrace, and away you go.

regards, tom lane
Re: [HACKERS] Closing some 8.4 open items
David Fetter da...@fetter.org writes:
> On Sun, Apr 05, 2009 at 02:07:32PM -0400, Tom Lane wrote:
>> The \df thing? That's something it'd be okay to revisit during beta, IMHO.
> OK, I'll work on this tomorrow :)

I think what we were lacking was consensus on what it should do, not code ...

regards, tom lane
Re: [HACKERS] Closing some 8.4 open items
On Sun, Apr 05, 2009 at 08:55:07PM -0400, Tom Lane wrote:
> David Fetter da...@fetter.org writes:
>> On Sun, Apr 05, 2009 at 02:07:32PM -0400, Tom Lane wrote:
>>> The \df thing? That's something it'd be okay to revisit during beta, IMHO.
>> OK, I'll work on this tomorrow :)
> I think what we were lacking was consensus on what it should do, not code ...

I was thinking I'd knock out a proposal or two.

Cheers,
David.
Re: [HACKERS] Crash in gist insertion on pathological box data
Teodor Sigaev teo...@sigaev.ru writes:
>> I don't like throwing an error there; I wish there were a way for the
>> generic code to apply the fallbackSplit code instead. I see that in this
>> particular formulation it's dependent on the datatype --- can we get
>> around that, by having it invoke the union method?
> Done. rtree.patch.gz contains patch for gistproc.c, genericsplit.patch.gz
> adds simple genericPickSplit to gistsplit.c to workaround bug of
> user-defined picksplit.

This looks good to me. I tested it to the extent of verifying that either patch individually would prevent the originally-reported failure. The only question I have is whether we want this nag message or not:

! 	ereport(LOG,
! 			(errcode(ERRCODE_INTERNAL_ERROR),
! 			 errmsg("Picksplit method for %d column of index \"%s\" failed",
! 					attno + 1, RelationGetRelationName(r)),
! 			 errhint("Index is not optimal, to optimize it contact developer or try to use the column as a second one in create index command")));

I'd be inclined to keep it but reduce it to level DEBUG1 or so.

regards, tom lane
Re: [HACKERS] EXPLAIN WITH
> I think the objection that is most likely to be raised is that it would
> confuse or break programs that analyze EXPLAIN output in any degree of
> detail. Of course such programs are going to need some work for 8.4 already.

As someone who has written more than one such program, I say go for it; looks like a good solution.

--
Greg Sabino Mullane g...@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200904052149
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT
On Sun, Apr 5, 2009 at 7:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Sat, Apr 4, 2009 at 11:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> * Using an integer is bogus. Use a float4 and forget the weird scaling;
>>> it should have exactly the same interpretation as stadistinct, except
>>> for 0 meaning unset instead of unknown.
>> I have a deep-seated aversion to storing important values as float,
> [ shrug... ] Precision is not important for this value: we are not anywhere
> near needing more than six significant digits for our statistical
> estimates. Range, on the other hand, could be important when dealing with
> really large tables. So I'm much more concerned about whether the
> definition is too restrictive than about whether some uninformed person
> complains about exactness.

I thought about that, and if you think that's better, I can implement it that way. Personally, I'm unconvinced. The use case for specifying a number of distinct values in excess of 2 billion as an absolute number rather than as a percentage of the table size seems pretty weak to me. I would rather use integers and have it be clean. But I would rather have it your way than not have it at all.

...Robert
Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT
Robert Haas robertmh...@gmail.com writes:
> On Sun, Apr 5, 2009 at 7:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> [ shrug... ] Precision is not important for this value: we are not
>> anywhere near needing more than six significant digits for our
>> statistical estimates. Range, on the other hand, could be important when
>> dealing with really large tables.
> I thought about that, and if you think that's better, I can implement it
> that way. Personally, I'm unconvinced. The use case for specifying a
> number of distinct values in excess of 2 billion as an absolute number
> rather than as a percentage of the table size seems pretty weak to me.

I was more concerned about the other end of it. Your patch sets a not-too-generous lower bound on the percentage that can be represented ...

regards, tom lane
Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT
On Sun, Apr 5, 2009 at 10:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> I thought about that, and if you think that's better, I can implement it
>> that way. Personally, I'm unconvinced. The use case for specifying a
>> number of distinct values in excess of 2 billion as an absolute number
>> rather than as a percentage of the table size seems pretty weak to me.
> I was more concerned about the other end of it. Your patch sets a
> not-too-generous lower bound on the percentage that can be represented ...

Huh? With a scaling factor of 1 million, you can represent anything down to about 0.01, which is apparently all you can expect out of a float4 anyway: http://archives.postgresql.org/pgsql-bugs/2009-01/msg00039.php

In fact, we could change the scaling factor to 1 billion if you like, and it would then give you MORE significant digits than you'll get out of a float4 (and you'll be able to predict the exact number that you're going to get). If someone has billions of rows in the table but only thousands of distinct values, I would expect them to run a script to count 'em up and specify the exact number, rather than specifying some microscopic percentage. But there's certainly enough range in int4 to tack on three more decimal places if you think it's warranted.

(It's also worth pointing out that the calculations we do with ndistinct are pretty rough approximations anyway. If the difference between stadistinct = -1 x 10^-6 and stadistinct = -1.4 x 10^-6 is the thing that's determining whether the planner is picking the correct plan on your 4-billion-row table, you probably want to tune some other parameter as well so as to get further away from that line. Just getting the value in the ballpark should be a big improvement over how things stand now.)

...Robert
Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT
On Sun, Apr 5, 2009 at 10:38 PM, Robert Haas robertmh...@gmail.com wrote:
> On Sun, Apr 5, 2009 at 10:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I was more concerned about the other end of it. Your patch sets a
>> not-too-generous lower bound on the percentage that can be represented ...
> Huh? With a scaling factor of 1 million, you can represent anything down to
> about 0.01, which is apparently all you can expect out of a float4 anyway:
> http://archives.postgresql.org/pgsql-bugs/2009-01/msg00039.php

I guess I'm wrong here - 0.1 is only one SIGNIFICANT digit. But the point remains that specifying ndistinct in ppm is probably enough for most cases, and ppb (which would still fit in int4) even more so. I don't think we need to worry about people with trillions of rows (and even they could still specify an absolute number).

...Robert
Re: [HACKERS] ALTER TABLE ... ALTER COLUMN ... SET DISTINCT
Robert Haas robertmh...@gmail.com writes:
> (It's also worth pointing out that the calculations we do with ndistinct
> are pretty rough approximations anyway. If the difference between
> stadistinct = -1 x 10^-6 and stadistinct = -1.4 x 10^-6 is the thing
> that's determining whether the planner is picking the correct plan on your
> 4-billion-row table,

No, it's the loss of ability to set stadistinct to -1e-9 or -1e-12 or -1e-15 or so that is bothering me. In a table with billions of rows that could become important. Or maybe not; but the real bottom line here is that it is 100% silly to use a different representation in this column than is used in the underlying stadistinct column. All you accomplish by that is to impose on the user the intersection of the accuracy/range limits of the two different representations.

regards, tom lane