Re: [HACKERS] [GENERAL] Returning a RECORD, not SETOF RECORD
Michael Fuhr wrote:
> On Thu, Apr 28, 2005 at 09:47:45PM +0200, Thomas Hallgren wrote:
>>> What version of PostgreSQL are you using?
>> The latest and greatest from CVS.
> Which branch? HEAD? REL8_0_STABLE?

Sorry. To me "latest" always defaults to HEAD, and by "greatest" I mean the coming 8.1.

>> Wouldn't it make sense to be able to define a record in the projection part of a query, similar to what I was attempting with my SELECT? Has this been discussed, or is it just considered not very useful?
> Sounds reasonable to me, but if it's currently possible then I haven't yet figured out how to do it. I can't remember if it's been discussed before or not. If nobody answers here then you might try pgsql-hackers.

Ok. Thanks. I'll redirect this to hackers and see if I have any luck there. My original question was:

I do the following:

  CREATE FUNCTION xyz(int, int) RETURNS RECORD AS '...'
  CREATE TABLE abc(a int, b int);

Now I want to call my xyz function once for each row in abc, and I want my RECORD to be (x int, y int, z timestamptz). How do I write that query? I.e., where do I specify my RECORD definition? Is it possible at all? Ideally I'd like to write something like this:

  SELECT xyz(a, b) AS (x int, y int, z timestamptz) FROM abc;

but that yields a syntax error.

Regards,
Thomas Hallgren

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] pg_restore stuck in a loop?
On Thu, 2005-04-28 at 02:11 -0400, Tom Lane wrote:
> Rod Taylor [EMAIL PROTECTED] writes:
>> On Tue, 2005-04-26 at 23:22 -0400, Tom Lane wrote:
>>> You tell us ;-). You've got the test case; attach to it with a debugger and find out what it's doing.
>> I wasn't entirely sure how to catch it in action, so I just used CTRL+C to interrupt the pg_restore process, and got these as backtraces:
> Hm, I wonder if there is something broken about the SQL parsing logic in _sendSQLLine, such that it could go into an infinite loop given the right input?

Let me see if I can put together a smaller test case next week, and I'll send that in.
Re: [HACKERS] Distinct-Sampling (Gibbons paper) for Postgres
Well, this guy has it nailed. He cites Flajolet and Martin, which was (I thought) as good as you could get with only a reasonable amount of memory per statistic. Unfortunately, their hash table is a one-shot deal; there's no way to maintain it once the table changes. His incremental update doesn't degrade as the table changes. If there isn't the same patent wrangle as with the ARC algorithm, and if the existing stats collector process can stand the extra traffic, then this one is a winner. Many thanks to the person who posted this reference in the first place; so sorry I canned your posting and can't recall your name. Now, if we can come up with something better than the ARC algorithm ...
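For readers who haven't seen the Flajolet-Martin estimator: it hashes each value and tracks the deepest run of trailing zero bits observed, and 2^R (with a correction factor of 0.77351 from the original paper) then estimates the distinct count. A toy Python sketch of that one-shot estimator — illustrative only, not the maintainable distinct-sampling variant from the Gibbons paper being discussed:

```python
import hashlib

def trailing_zeros(n: int) -> int:
    """Count trailing zero bits of a nonzero integer."""
    return (n & -n).bit_length() - 1

def fm_estimate(items, num_hashes=32):
    """Flajolet-Martin distinct-count estimate.

    Hash each item num_hashes independent ways, record the maximum
    number of trailing zero bits seen per hash function, and average
    the exponents before applying the 0.77351 bias correction.
    """
    max_tz = [0] * num_hashes
    for item in items:
        for i in range(num_hashes):
            # Vary the salt to get num_hashes independent hash functions.
            h = hashlib.blake2b(repr(item).encode(), digest_size=8,
                                salt=i.to_bytes(16, "little")).digest()
            v = int.from_bytes(h, "little")
            if v:
                max_tz[i] = max(max_tz[i], trailing_zeros(v))
    mean_r = sum(max_tz) / num_hashes
    return (2 ** mean_r) / 0.77351
```

Note that the per-statistic memory is just `num_hashes` small integers, which is why the technique is attractive for a stats collector; the drawback the post complains about is visible here too — deletions cannot lower `max_tz`, so the sketch cannot be maintained under table churn.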
Re: [HACKERS] Feature freeze date for 8.1
In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
> o integrated auto-vacuum (Bruce)

If this can kick off a vacuum of a Very Large Table at an unfortunate time, it can turn out to be a pretty painful misfeature. What I'd _really_ love to see (and alas, it's beyond my ken) is some parallel to the FSM, namely a Recently Updated Blocks Map, which would enable a vacuuming approach that would not go through entire tables, but rather only those blocks known to be recently updated. There continues to be trouble if you have a table that grows to 50 million rows where there are 100K rows that are being heavily updated. In effect, only the 100K rows need vacuuming.

--
(reverse (concatenate 'string moc.liamg @ enworbbc)) http://linuxfinances.info/info/emacs.html
Group Dynamics: Following Minsky and Schelling, consider a person as a society of agents. A group is then a larger society of such agents. Understand groups by examining interactions of coalitions of agents that cross-cut their grouping into people. -- Mark Miller
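The Recently Updated Blocks Map idea can be sketched in a few lines of Python — a hypothetical model, not PostgreSQL code; blocks and dead tuples are reduced to counters. An UPDATE flags its block, and a partial vacuum visits only the flagged blocks instead of scanning the whole heap:

```python
class Table:
    """Toy heap: each block holds a dead-tuple count that vacuum resets."""

    def __init__(self, num_blocks):
        self.blocks = [{"dead": 0} for _ in range(num_blocks)]
        # The proposed "Recently Updated Blocks Map": block numbers
        # touched since the last vacuum pass.
        self.recently_updated = set()

    def update(self, block_no):
        # An UPDATE leaves a dead tuple behind and flags the block.
        self.blocks[block_no]["dead"] += 1
        self.recently_updated.add(block_no)

    def partial_vacuum(self):
        """Vacuum only flagged blocks; return how many blocks were visited."""
        visited = 0
        for block_no in self.recently_updated:
            self.blocks[block_no]["dead"] = 0
            visited += 1
        self.recently_updated.clear()
        return visited
```

In the 50-million-row scenario above, the win is that the vacuum pass is proportional to the hot set (the 100K heavily-updated rows), not to the table size — e.g. a 10,000-block table whose updates all land in 10 blocks costs 10 block visits per pass, not 10,000.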
[HACKERS] PseudoPartitioning and aggregates
Hello, pgsql-hackers. I have an idea ( :-) ) about

  SELECT field1, aggregate(field2) FROM view GROUP BY field1;

(and its variant SELECT aggregate(field2) FROM view) where view is SELECT ... UNION ALL ... As I understood from the thread http://archives.postgresql.org/pgsql-performance/2004-12/msg00101.php, there is a problem: the Append always runs before the aggregate. I think Postgres could perform the aggregate on each table in the union first, and then merge the results. For this, each aggregate would need a merge function defined, which should accept two values of type STYPE. For example:

  CREATE AGGREGATE max(
    BASETYPE=float4, SFUNC=float4larger, STYPE=float4,
    MERGEFUNC=float4larger, FINALFUNC=-
  );
  CREATE AGGREGATE sum(
    BASETYPE=float4, SFUNC=float4pl, STYPE=float4,
    MERGEFUNC=float4pl, FINALFUNC=-
  );
  CREATE AGGREGATE count(
    BASETYPE=any, SFUNC=int8inc, STYPE=int8, FINALFUNC=-,
    MERGEFUNC=int8pl, -- special case
    INITCOND='0'
  );

  CREATE TABLE t1 ( id INT PRIMARY KEY, grp INT, amount FLOAT4 );
  CREATE INDEX ix_t1_grp ON t1 (grp);
  CREATE TABLE t2 ( id INT PRIMARY KEY, grp INT, amount FLOAT4 );
  CREATE INDEX ix_t2_grp ON t2 (grp);

  INSERT INTO t1 SELECT tt.i, tt.i/100, tt.i*sin(tt.i) FROM generate_series(1,262144) tt(i);
  INSERT INTO t2 SELECT tt.i, tt.i/100, tt.i*sin(tt.i) FROM generate_series(262145,524288) tt(i);
  VACUUM FULL ANALYZE t1;
  VACUUM FULL ANALYZE t2;

  CREATE VIEW union_table AS
    SELECT id, grp, amount FROM t1
    UNION ALL
    SELECT id, grp, amount FROM t2;

So now t1 and t2 both contain 262144 rows (524288 in total), and max(t1.grp) = min(t2.grp) = 2621. Now, when performing the query

  SELECT grp, count(*) AS c, sum(amount) AS s, max(amount) AS m
  FROM union_table GROUP BY grp;

Postgres selects the rows from t1 and t2, Appends them together, and then performs the HashAggregate:
  HashAggregate (cost=23830.52..23832.02 rows=200 width=8) (actual time=22547.272..22586.130 rows=5243 loops=1)
    -> Subquery Scan t_union (cost=0.00..18587.64 rows=524288 width=8) (actual time=0.204..17863.444 rows=524288 loops=1)
          -> Append (cost=0.00..13344.76 rows=524288 width=12) (actual time=0.193..12990.177 rows=524288 loops=1)
                -> Subquery Scan "*SELECT* 1" (cost=0.00..6684.88 rows=262144 width=12) (actual time=0.186..4488.981 rows=262144 loops=1)
                      -> Seq Scan on t (cost=0.00..4063.44 rows=262144 width=12) (actual time=0.163..1915.213 rows=262144 loops=1)
                -> Subquery Scan "*SELECT* 2" (cost=0.00..6659.88 rows=262144 width=12) (actual time=0.225..4558.788 rows=262144 loops=1)
                      -> Seq Scan on t1 (cost=0.00..4038.44 rows=262144 width=12) (actual time=0.208..1798.410 rows=262144 loops=1)
  Total runtime: 22634.454 ms

(well, actual time is 2375 ms; Postgres 8.0.1, Slackware 10.0)

But it would be quicker to aggregate the first table, then the second, and merge the results. For example, here is a query that does it explicitly:

  SELECT COALESCE(t1.grp, t2.grp) AS grp,
    CASE WHEN t1.grp IS NOT NULL AND t2.grp IS NOT NULL THEN int8pl(t1.c, t2.c)
         WHEN t1.grp IS NOT NULL THEN t1.c ELSE t2.c END AS c,
    CASE WHEN t1.grp IS NOT NULL AND t2.grp IS NOT NULL THEN float4pl(t1.s, t2.s)
         WHEN t1.grp IS NOT NULL THEN t1.s ELSE t2.s END AS s,
    CASE WHEN t1.grp IS NOT NULL AND t2.grp IS NOT NULL THEN float4larger(t1.m, t2.m)
         WHEN t1.grp IS NOT NULL THEN t1.m ELSE t2.m END AS m
  FROM (SELECT grp, count(*) AS c, sum(amount) AS s, max(amount) AS m FROM t1 GROUP BY grp) AS t1
  FULL JOIN (SELECT grp, count(*) AS c, sum(amount) AS s, max(amount) AS m FROM t2 GROUP BY grp) AS t2
  ON t1.grp = t2.grp;

Here is the EXPLAIN ANALYZE:

  Merge Full Join (cost=13737.48..14535.48 rows=34885 width=40) (actual time=7908.438..7989.105 rows=5243 loops=1)
    Merge Cond: ("outer".grp = "inner".grp)
    -> Sort (cost=6854.32..6860.87 rows=2618 width=20) (actual time=4070.833..4083.687 rows=2622 loops=1)
          Sort Key: t2.grp
          -> Subquery Scan t2 (cost=6659.88..6705.70 rows=2618 width=20) (actual time=4005.290..4057.131 rows=2622 loops=1)
                -> HashAggregate (cost=6659.88..6679.52 rows=2618 width=8) (actual time=4005.273..4025.564 rows=2622 loops=1)
                      -> Seq Scan on t1 (cost=0.00..4038.44 rows=262144 width=8) (actual time=0.094..1712.362 rows=262144 loops=1)
    -> Sort (cost=6883.15..6889.82 rows=2665 width=20) (actual time=3837.433..3845.754 rows=2622 loops=1)
          Sort Key: t1.grp
          -> Subquery Scan t1 (cost=6684.88..6731.52 rows=2665 width=20) (actual time=3771.661..3822.520 rows=2622 loops=1)
                -> HashAggregate (cost=6684.88..6704.87 rows=2665 width=8) (actual time=3771.564..3793.023 rows=2622 loops=1)
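The proposal above is the classic partial-aggregation scheme. A Python sketch of the idea — illustrative only; tuples of (count, sum, max) stand in for the per-aggregate STYPE states, and the merge step plays the role of the proposed MERGEFUNCs (int8pl, float4pl, float4larger):

```python
def partial_agg(rows):
    """Aggregate one UNION ALL branch: grp -> (count, sum, max)."""
    state = {}
    for grp, amount in rows:
        if grp in state:
            c, s, m = state[grp]
            state[grp] = (c + 1, s + amount, max(m, amount))
        else:
            state[grp] = (1, amount, amount)
    return state

def merge_states(a, b):
    """Combine two partial states with the per-aggregate merge functions:
    counts add (int8pl), sums add (float4pl), maxes take the larger
    (float4larger). Groups present in only one branch pass through,
    which is what the FULL JOIN / COALESCE in the explicit query does."""
    out = dict(a)
    for grp, (c, s, m) in b.items():
        if grp in out:
            c0, s0, m0 = out[grp]
            out[grp] = (c0 + c, s0 + s, max(m0, m))
        else:
            out[grp] = (c, s, m)
    return out
```

The correctness requirement is exactly the one the MERGEFUNC proposal relies on: merging the two partial states must give the same result as aggregating the concatenated input, which holds because count, sum, and max are all decomposable.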
Re: [HACKERS] Feature freeze date for 8.1
Christopher Browne [EMAIL PROTECTED] writes:
> In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
>> o integrated auto-vacuum (Bruce)
> If this can kick off a vacuum of a Very Large Table at an unfortunate time, this can turn out to be a pretty painful misfeature.

[ shrug... ] You'll always be able to turn it off if you don't want it. I'm not sure that we'll be ready to turn it on by default even in 8.1.

regards, tom lane
Re: [HACKERS] Feature freeze date for 8.1
Tom Lane wrote:
> Christopher Browne [EMAIL PROTECTED] writes:
>> If this can kick off a vacuum of a Very Large Table at an unfortunate time, this can turn out to be a pretty painful misfeature.
> [ shrug... ] You'll always be able to turn it off if you don't want it. I'm not sure that we'll be ready to turn it on by default even in 8.1.

Agreed. It will just be there to turn on from postgresql.conf if you want it, and we do have TODO information about keeping such an FSM for recently expired pages.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
[HACKERS] performance of bitmap scans in nested loop joins
Hello All, I'd like to report some surprising (for me) results from performance testing of bitmap index scans in nested loop join plans. I have the table q3c:

  test=# \d q3c
       Table "public.q3c"
   Column |  Type  | Modifiers
  --------+--------+-----------
   ipix   | bigint |
   errbox | box    |
   ra     | real   |
   dec    | real   |
  Indexes:
      "ipix_idx" UNIQUE, btree (ipix)

  test=# SELECT count(*) FROM q3c;
   count
  -------
     300
  (1 row)

First, two join plans with just a simple index scan:

# 1)
  test=# EXPLAIN ANALYZE SELECT * FROM q3c, q3c AS q3cs
         WHERE q3c.ipix >= q3cs.ipix-3 AND q3c.ipix <= q3cs.ipix+3;

  Nested Loop (cost=0.01..49059519577.21 rows=11095 width=96) (actual time=0.137..39385.725 rows=300 loops=1)
    -> Seq Scan on q3c q3cs (cost=0.00..60928.16 rows=316 width=48) (actual time=0.007..3659.171 rows=300 loops=1)
    -> Index Scan using ipix_idx on q3c (cost=0.01..9686.37 rows=35 width=48) (actual time=0.007..0.009 rows=1 loops=300)
          Index Cond: ((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3)))
  Total runtime: 40755.390 ms
  (5 rows)

# 2)
  test=# EXPLAIN ANALYZE SELECT * FROM q3c, q3c AS q3cs
         WHERE (q3c.ipix >= q3cs.ipix-1000 AND q3c.ipix <= q3cs.ipix-993);

  Nested Loop (cost=0.01..49059519577.21 rows=11095 width=96) (actual time=28058.983..28058.983 rows=0 loops=1)
    -> Seq Scan on q3c q3cs (cost=0.00..60928.16 rows=316 width=48) (actual time=0.061..3803.598 rows=300 loops=1)
    -> Index Scan using ipix_idx on q3c (cost=0.01..9686.37 rows=35 width=48) (actual time=0.006..0.006 rows=0 loops=300)
          Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))
  Total runtime: 28059.066 ms
  (5 rows)

# And now I combine them into one:
  test=# EXPLAIN ANALYZE SELECT * FROM q3c, q3c AS q3cs
         WHERE (q3c.ipix >= q3cs.ipix-3 AND q3c.ipix <= q3cs.ipix+3)
            OR (q3c.ipix >= q3cs.ipix-1000 AND q3c.ipix <= q3cs.ipix-993);

  Nested Loop (cost=5832.03..190281569757.46 rows=1888909037091 width=96) (actual time=0.180..122444.610 rows=300 loops=1)
    -> Seq Scan on q3c q3cs (cost=0.00..60928.16 rows=316 width=48) (actual time=0.063..3871.731 rows=300 loops=1)
    -> Bitmap Heap Scan on q3c (cost=5832.03..43426.73 rows=70 width=48) (actual time=0.033..0.034 rows=1 loops=300)
          Recheck Cond: (((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3))) OR ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993))))
          -> BitmapOr (cost=5832.03..5832.03 rows=70 width=0) (actual time=0.029..0.029 rows=0 loops=300)
                -> Bitmap Index Scan on ipix_idx (cost=0.00..2916.02 rows=35 width=0) (actual time=0.014..0.014 rows=1 loops=300)
                      Index Cond: ((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3)))
                -> Bitmap Index Scan on ipix_idx (cost=0.00..2916.02 rows=35 width=0) (actual time=0.011..0.011 rows=0 loops=300)
                      Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))
  Total runtime: 124366.545 ms
  (10 rows)

So we see that the total time of the plan with the bitmap scan is roughly two times greater than the sum of the times of the individual index scans. In my case even a simple union is significantly faster than the bitmap scan:

  test=# EXPLAIN ANALYZE SELECT * FROM q3c, q3c AS q3cs
         WHERE (q3c.ipix >= q3cs.ipix-3 AND q3c.ipix <= q3cs.ipix+3)
         UNION ALL
         SELECT * FROM q3c, q3c AS q3cs
         WHERE (q3c.ipix >= q3cs.ipix-1000 AND q3c.ipix <= q3cs.ipix-993);

  Append (cost=0.01..118119252488.32 rows=22190 width=96) (actual time=0.139..75048.897 rows=300 loops=1)
    -> Subquery Scan "*SELECT* 1" (cost=0.01..59059626244.16
Re: [HACKERS] [proposal] protocol extension to support loadable stream filters
Brent Verner [EMAIL PROTECTED] writes:
> Now, the hard part... where should this code live? I'm thinking a src/transport directory seems sensible.

Our experience with trying to write single files to serve both server and client sides has been, um, painful. I recommend you not try this. My recommendation would be server-side code in src/backend/libpq/ and client-side code in src/interfaces/libpq/.

regards, tom lane
Re: [HACKERS] [GENERAL] Returning a RECORD, not SETOF RECORD
Thomas Hallgren [EMAIL PROTECTED] writes:
> Ideally I'd like to write something like this:
>   SELECT xyz(a, b) AS (x int, y int, z timestamptz) FROM abc;
> but that yields a syntax error.

While that's probably doable if anyone were really motivated, I'm not sure it's worth the trouble in view of the recent OUT-parameter improvements. IMHO most of the use cases for such a thing would be better served by declaring the function with OUT parameters. The AS-clause-column-list functionality was invented for functions where the result type is truly not known when the function is written, such as dblink. But it's pretty hard to believe that many people need to write such things. Your example can be done like this in CVS tip:

  regression=# create function xyz(int, int, out x int, out y int, out z timestamptz)
  regression-# as $$ select $1, $2, now() $$ language sql;
  CREATE FUNCTION
  regression=# select xyz(unique1,unique2) from tenk1 limit 5;
                   xyz
  ----------------------------------------
   (8800,0,2005-04-29 10:26:37.738946-04)
   (1891,1,2005-04-29 10:26:37.738946-04)
   (3420,2,2005-04-29 10:26:37.738946-04)
   (9850,3,2005-04-29 10:26:37.738946-04)
   (7164,4,2005-04-29 10:26:37.738946-04)
  (5 rows)

Notice that this returns the record as a single column. In most cases you would probably wish that the record were burst into multiple columns, which you can do easily with

  regression=# select (xyz(unique1,unique2)).* from tenk1 limit 5;
    x   | y |               z
  ------+---+-------------------------------
   8800 | 0 | 2005-04-29 10:27:53.197948-04
   1891 | 1 | 2005-04-29 10:27:53.197948-04
   3420 | 2 | 2005-04-29 10:27:53.197948-04
   9850 | 3 | 2005-04-29 10:27:53.197948-04
   7164 | 4 | 2005-04-29 10:27:53.197948-04
  (5 rows)

but AFAICS that is not amenable to having an AS plastered on it (unless the AS goes inside the parentheses, which'd be a really spectacular abuse of the syntax).

regards, tom lane
Re: [HACKERS] Feature freeze date for 8.1
On Fri, Apr 29, 2005 at 10:09:43 -0400, Bruce Momjian pgman@candle.pha.pa.us wrote:
> Tom Lane wrote:
>> [ shrug... ] You'll always be able to turn it off if you don't want it. I'm not sure that we'll be ready to turn it on by default even in 8.1.
> Agreed. It will just be there to turn on from postgresql.conf if you want it, and we do have TODO information about keeping such an FSM for recently expired pages.

I think if we aren't finding problems in testing, it would be better to turn pg_autovacuum on by default. Vacuum is something that burns new users, and having it on by default is going to cut down on surprises. Experienced users know about vacuum and will probably read the release notes when upgrading and do something that is appropriate for them.
Re: [HACKERS] Feature freeze date for 8.1
Tom Lane wrote:
> Christopher Browne [EMAIL PROTECTED] writes:
>> If this can kick off a vacuum of a Very Large Table at an unfortunate time, this can turn out to be a pretty painful misfeature.
> [ shrug... ] You'll always be able to turn it off if you don't want it. I'm not sure that we'll be ready to turn it on by default even in 8.1.

What do people think about having an optional maintenance window, so that autovac only takes action during an approved time? But perhaps just using the vacuum delay settings will be enough.
Re: [HACKERS] Feature freeze date for 8.1
On Fri, 29 Apr 2005, Bruno Wolff III wrote:
> On Fri, Apr 29, 2005 at 10:09:43 -0400, Bruce Momjian pgman@candle.pha.pa.us wrote:
>> Agreed. It will just be there to turn on from postgresql.conf if you want it, and we do have TODO information about keeping such an FSM for recently expired pages.
> I think if we aren't finding problems in testing, it would be better to turn pg_autovacuum on by default. Vacuum is something that burns new users, and having it on by default is going to cut down on surprises.

Except for the surprise of periodically having the system go unresponsive because it hit a large table, and that new user wondering what is wrong with postgresql that it just stalls seemingly randomly :(

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] Feature freeze date for 8.1
On Fri, Apr 29, 2005 at 12:43:37 -0300, Marc G. Fournier [EMAIL PROTECTED] wrote:
> Except for the surprise of periodically having the system go unresponsive because it hit a large table, and that new user wondering what is wrong with postgresql that it just stalls seemingly randomly :(

I think most users running systems with tables that large will know what they are doing, and will be more careful with vacuum. Vacuum running for a long time on large tables with few pages that need updating is really a separate problem that could use its own solution.
Re: [HACKERS] pg_restore stuck in a loop?
Rod Taylor [EMAIL PROTECTED] writes:
> On Thu, 2005-04-28 at 02:11 -0400, Tom Lane wrote:
>> Hm, I wonder if there is something broken about the SQL parsing logic in _sendSQLLine, such that it could go into an infinite loop given the right input?
> Let me see if I can put together a smaller test case next week, and I'll send that in.

If the parsing-bug theory is right, then there is probably some one object in your database that will make even pg_dump -s go nuts.

regards, tom lane
Re: [HACKERS] Feature freeze date for 8.1
Matthew T. O'Connor matthew@zeut.net writes:
> What do people think about having an optional maintenance window, so that autovac only takes action during an approved time? But perhaps just using the vacuum delay settings will be enough.

I'm not sure autovac should go completely catatonic during the day; what if someone does an unusual mass deletion, or something? But it does seem pretty reasonable to have a notion of a maintenance window where it should be more active than it is at other times. Maybe what you want is two complete sets of autovac parameters. Definitely at least two sets of the vacuum-delay values.

regards, tom lane
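The "two complete sets of autovac parameters" idea could look like the following hypothetical Python sketch — the parameter names are invented for illustration, not real GUCs: a gentle set during the day, an aggressive set inside a configurable maintenance window (including windows that cross midnight):

```python
from datetime import time

# Hypothetical parameter sets; names are illustrative, not real settings.
DAYTIME = {"vacuum_cost_delay_ms": 100, "vacuum_cost_limit": 200}
MAINTENANCE = {"vacuum_cost_delay_ms": 10, "vacuum_cost_limit": 2000}

def autovac_params(now: time, window_start=time(1, 0), window_end=time(5, 0)):
    """Pick the aggressive set inside the maintenance window and the
    gentle set outside it. Handles windows that cross midnight."""
    if window_start <= window_end:
        in_window = window_start <= now < window_end
    else:  # e.g. 22:00 -> 02:00
        in_window = now >= window_start or now < window_end
    return MAINTENANCE if in_window else DAYTIME
```

Note this matches Tom's point: autovac never goes catatonic — it always runs with *some* parameter set — it just throttles itself harder outside the window.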
Re: [HACKERS] Feature freeze date for 8.1
On Fri, 29 Apr 2005, Tom Lane wrote:
> But it does seem pretty reasonable to have a notion of a maintenance window where it should be more active than it is at other times. Maybe what you want is two complete sets of autovac parameters. Definitely at least two sets of the vacuum-delay values.

With the introduction of the stats collector, is there not some way of extending it so that autovac has more information to work off of? For instance, in my environment we have clients in every timezone hitting the database ... our Japanese clients will be busy at a totally different time of day than our East Coast/NA clients, so a 'maintenance window' is nearly impossible to state ... I know one person was talking about being able to target only those pages that have changes, instead of the whole table ... but some sort of load monitoring that checks the number of active connections and tries to find 'lulls'? Basically, everything right now is being keyed to updates to the tables themselves, but isn't looking at what the system itself is doing ... if I'm doing a massive import of data into a table, the last thing I want is a VACUUM to cut in and slow down the loading ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] [GENERAL] Returning a RECORD, not SETOF RECORD
On Fri, Apr 29, 2005 at 10:36:05AM -0400, Tom Lane wrote:
> regression=# select (xyz(unique1,unique2)).* from tenk1 limit 5;

This is a little off topic, but I've noticed that the above invokes the function once per output column:

  CREATE FUNCTION xyz(INOUT x integer, INOUT y integer, OUT z integer) AS $$
  BEGIN
    RAISE INFO 'calling xyz';
    z := x + y;
  END;
  $$ LANGUAGE plpgsql IMMUTABLE;

  SELECT xyz(1,2);
  INFO:  calling xyz
    xyz
  ---------
   (1,2,3)
  (1 row)

  SELECT (xyz(1,2)).*;
  INFO:  calling xyz
  INFO:  calling xyz
  INFO:  calling xyz
   x | y | z
  ---+---+---
   1 | 2 | 3
  (1 row)

Is that because the splat causes the query to be expanded into SELECT (xyz(1,2)).x, (xyz(1,2)).y, (xyz(1,2)).z? Is it possible or desirable to optimize that into a single call, at least if the function were stable or immutable?

--
Michael Fuhr http://www.fuhr.org/~mfuhr/
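A small Python model of the behavior Michael describes — names invented for illustration, not PostgreSQL internals: naively expanding the splat re-evaluates the function once per projected column, while caching the call (safe only if the function is immutable, as Michael suggests) collapses it to a single evaluation:

```python
from functools import lru_cache

calls = 0

def xyz(x, y):
    """Stand-in for the SQL function; returns the whole record."""
    global calls
    calls += 1
    return {"x": x, "y": y, "z": x + y}

# Naive splat expansion: one full call per projected column, mirroring
# SELECT (xyz(1,2)).x, (xyz(1,2)).y, (xyz(1,2)).z
row = {col: xyz(1, 2)[col] for col in ("x", "y", "z")}
naive_calls = calls

# Cached variant: legitimate only for an immutable function, so the
# three column references can share one evaluation.
calls = 0
cached_xyz = lru_cache(maxsize=None)(lambda x, y: xyz(x, y))
row2 = {col: cached_xyz(1, 2)[col] for col in ("x", "y", "z")}
```

Both produce the same row; only the call counts differ, which is exactly why the per-column expansion hurts for expensive functions.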
Re: [HACKERS] [GENERAL] Returning a RECORD, not SETOF RECORD
Michael Fuhr [EMAIL PROTECTED] writes:
> On Fri, Apr 29, 2005 at 10:36:05AM -0400, Tom Lane wrote:
>> regression=# select (xyz(unique1,unique2)).* from tenk1 limit 5;
> This is a little off topic, but I've noticed that the above invokes the function once per output column.

Yeah, that is unfortunate but doesn't seem very easy to fix.

regards, tom lane
Re: [HACKERS] performance of bitmap scans in nested loop joins
Sergey E. Koposov [EMAIL PROTECTED] writes:
> I'd like to report some surprising (for me) results from performance testing of bitmap index scans in nested loop join plans.

I'm surprised too. There's something weird in the indexscans themselves:

  -> Index Scan using ipix_idx on q3c (cost=0.01..9686.37 rows=35 width=48) (actual time=0.006..0.006 rows=0 loops=300)
        Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))

  -> Bitmap Index Scan on ipix_idx (cost=0.00..2916.02 rows=35 width=0) (actual time=0.011..0.011 rows=0 loops=300)
        Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))

The latter is (or should be) doing slightly *less* work, so why is it taking almost twice as much time? Can you get gprof profiles of the two cases?

regards, tom lane
Re: [HACKERS] [proposal] protocol extension to support loadable stream filters
On Fri, Apr 29, 2005 at 10:17:44AM -0400, Tom Lane wrote:
> Our experience with trying to write single files to serve both server and client sides has been, um, painful. I recommend you not try this.

BTW, why not get rid of src/corba?

--
Alvaro Herrera ([EMAIL PROTECTED])
"XML!" exclaimed C++. "What are you doing here? You're not a programming language."
"Tell that to the people who use me," said XML.
Re: [HACKERS] Increased company involvement
I've deliberately let the dust settle slightly on this. One thing that might help is a more open sponsorship clearing house. Example (not meant as a bid, but just to illustrate): the JDBC driver needs a scanner overhaul; it breaks on dollar quoting and a bunch of other stuff. I could do that work (as could others, of course), but I don't have time unless someone buys some of my professional time. Someone might want to do just that, but how would they find me?

Regarding the secret code stuff: I predict that it will quickly bite whoever does it, unless they are extremely lucky.

cheers
andrew

Bruce Momjian wrote:
> I am very excited to see companies involved in PostgreSQL development. It gives us funding for developers and features that is new for us. We had Fujitsu funding some features for 8.0 and that really helped us. However, there was a lot of coordination that happened with Fujitsu that I don't see happening with the current companies involved. Companies are already duplicating work that is also done by community members or by other companies.
>
> The big issue is communication. Because the PostgreSQL code base is common for most of the companies involved, there has to be coordination in what they are working on and their approaches. If that doesn't happen, two companies will work on the same feature, and only one can be added, or a complex process of merging the two patches into one patch has to happen --- again duplicated effort. I am willing to do the coordination, or even better, have the companies involved publicly post their efforts so all the other companies can know what is happening. I realize this is hard for companies because their efforts are in some ways part of their profitability. Does profitability require duplication of effort and code collisions? I am not sure, but if it does, we are in trouble. I am not sure the community has the resources to resolve that many collisions.
>
> Second, some developers are being hired from the community to work on closed-source additions to PostgreSQL. That is fine and great, but one way to kill PostgreSQL is to hire away its developers. If a commercial company wanted to hurt us, that is certainly one way they might do it. Anyway, it is a concern I have. I am hoping community members hired to do closed-source additions can at least spend some of their time on community work.
>
> And finally, we have a few companies working on features that they eventually want merged back into the PostgreSQL codebase. That is a very tricky process and usually goes badly unless the company seeks community involvement from the start, including user interface, implementation, and coding standards. I hate to be discouraging here, but I am trying to communicate what we have learned over the past few years to help companies be effective in working with open source communities. I am available to talk to any company that wants further details.
Re: [HACKERS] Feature freeze date for 8.1
I think what you're suggesting is that the vacuum settings (most likely delay) take into consideration the load on the database, which I think is a great idea. One possibility: if vacuum tracks how many blocks it's read/written, it can see how many blocks the database has done overall; subtract the two and you know how much other disk I/O is going on in the system. You can then use that number to decide how long to sleep before the next vacuum cycle.

On Fri, Apr 29, 2005 at 01:34:56PM -0300, Marc G. Fournier wrote:
> Basically, everything right now is being keyed to updates to the tables themselves, but isn't looking at what the system itself is doing ... if I'm doing a massive import of data into a table, the last thing I want is a VACUUM to cut in and slow down the loading ...

--
Jim C. Nasby, Database Consultant [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
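The block-counter arithmetic suggested above might look like this hypothetical Python sketch — invented names, no such hook exists in PostgreSQL; the point is just subtracting vacuum's own I/O from the system total and scaling the sleep by the remainder:

```python
def vacuum_sleep_ms(vac_blocks, total_blocks, prev_vac, prev_total,
                    base_ms=10, max_ms=1000):
    """Scale the per-cycle vacuum sleep by how much non-vacuum I/O
    happened since the counters were last sampled.

    vac_blocks/total_blocks are cumulative counters: blocks touched by
    vacuum itself, and blocks touched by the whole database.
    """
    vac_delta = vac_blocks - prev_vac
    # Everything the database did that was NOT vacuum's own I/O.
    other_delta = (total_blocks - prev_total) - vac_delta
    if other_delta <= 0:
        return base_ms  # system is idle: vacuum at full speed
    # More foreign I/O relative to vacuum's own -> longer sleep, capped.
    return min(max_ms, base_ms * (1 + other_delta / max(vac_delta, 1)))
```

This gives exactly the behavior Marc wants for the mass-import case: when the import is generating heavy foreign I/O, the computed sleep climbs toward the cap and vacuum backs off; during a lull it drops back to the base delay.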
Re: [HACKERS] performance of bitmap scans in nested loop joins
On Fri, 29 Apr 2005, Tom Lane wrote:

->  Index Scan using ipix_idx on q3c  (cost=0.01..9686.37 rows=35 width=48) (actual time=0.006..0.006 rows=0 loops=300)
      Index Cond: ((q3c.ipix >= (outer.ipix - 1000)) AND (q3c.ipix <= (outer.ipix - 993)))
->  Bitmap Index Scan on ipix_idx  (cost=0.00..2916.02 rows=35 width=0) (actual time=0.011..0.011 rows=0 loops=300)
      Index Cond: ((q3c.ipix >= (outer.ipix - 1000)) AND (q3c.ipix <= (outer.ipix - 993)))

The latter is (or should be) doing slightly *less* work, so why is it taking almost twice as much time? Can you get gprof profiles of the two cases?

I've got them. Here are the two gprof profiles: http://lnfm1.sai.msu.ru/~math/public_misc/idxscan.gprof http://lnfm1.sai.msu.ru/~math/public_misc/bitmap.gprof (now as links, because the previous letter with those files as attachments didn't get through on -hackers, due to size, I think).

bitmap.gprof is the profiling of:

test=# explain analyze select * from q3c,q3c as q3cs where (q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3) OR (q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993);
                                   QUERY PLAN
-----------------------------------------------------------------------------------
 Nested Loop  (cost=5832.01..190280130928.00 rows=18889 width=96) (actual time=0.435..374743.591 rows=300 loops=1)
   ->  Seq Scan on q3c q3cs  (cost=0.00..60928.00 rows=300 width=48) (actual time=0.079..10632.570 rows=300 loops=1)
   ->  Bitmap Heap Scan on q3c  (cost=5832.01..43426.68 rows=67 width=48) (actual time=0.102..0.104 rows=1 loops=300)
         Recheck Cond: (((q3c.ipix >= (outer.ipix - 3)) AND (q3c.ipix <= (outer.ipix + 3))) OR ((q3c.ipix >= (outer.ipix - 1000)) AND (q3c.ipix <= (outer.ipix - 993))))
         ->  BitmapOr  (cost=5832.01..5832.01 rows=67 width=0) (actual time=0.094..0.094 rows=0 loops=300)
               ->  Bitmap Index Scan on ipix_idx  (cost=0.00..2916.01 rows=33 width=0) (actual time=0.045..0.045 rows=1 loops=300)
                     Index Cond: ((q3c.ipix >= (outer.ipix - 3)) AND (q3c.ipix <= (outer.ipix + 3)))
               ->  Bitmap Index Scan on ipix_idx  (cost=0.00..2916.01 rows=33 width=0) (actual time=0.041..0.041 rows=0 loops=300)
                     Index Cond: ((q3c.ipix >= (outer.ipix - 1000)) AND (q3c.ipix <= (outer.ipix - 993)))
 Total runtime: 377551.805 ms
(10 rows)

And idxscan.gprof is the profiling of:

test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c as q3cs WHERE (q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993);
                                   QUERY PLAN
-----------------------------------------------------------------------------------
 Nested Loop  (cost=0.01..49059045928.00 rows=1 width=96) (actual time=104991.950..104991.950 rows=0 loops=1)
   ->  Seq Scan on q3c q3cs  (cost=0.00..60928.00 rows=300 width=48) (actual time=0.069..10465.514 rows=300 loops=1)
   ->  Index Scan using ipix_idx on q3c  (cost=0.01..9686.33 rows=33 width=48) (actual time=0.025..0.025 rows=0 loops=300)
         Index Cond: ((q3c.ipix >= (outer.ipix - 1000)) AND (q3c.ipix <= (outer.ipix - 993)))
 Total runtime: 104992.202 ms
(5 rows)

With best regards, Sergey Koposov
[HACKERS] SPI bug.
Hi, While trying to determine whether SPI_cursor_move can rewind (and receiving great help from the guys on IRC), we found out that since the count parameter is an int and FETCH_ALL is LONG_MAX, setting the count parameter to FETCH_ALL to rewind will not work on 64-bit systems. On my PIII 32-bit system it works, since sizeof(int) == sizeof(long). I am using 8.0.2 (i.e. the repositioning bug is already fixed here). I think the solution can be either changing FETCH_ALL to INT_MAX or changing the interface parameter count (and subsequent usages) to long. (FETCH_ALL is defined in parsenodes.h.) Regards, tzahi.

WARNING TO SPAMMERS: see http://members.lycos.co.uk/my2nis/spamwarning.html
Re: [HACKERS] Feature freeze date for 8.1
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Marc G. Fournier) wrote: I know one person was talking about being able to target only those pages that have changes, instead of the whole table ... but some sort of load monitoring that checks the # of active connections and tries to find 'lulls'?

I have some log table purging processes I'd like to put in place; it would be really slick to be able to get some statistics from the system as to how busy the DB has been in the last little while. The nice, adaptive algorithm:

- Loop forever
- Once a minute, evaluate how busy things seem, giving some metric X
- If X is high, then purge 10 elderly tuples from table log_table
- If X is moderate, then purge 100 elderly tuples from table log_table
- If X is low, then purge 1000 elderly tuples from table log_table

The trouble is in measuring some form of X. Some reasonable approximations might include:

- How much disk I/O was recorded in the last 60 seconds?
- How many application transactions (e.g. invoices or such) were issued in the last 60 seconds? (Monitoring a sequence could be good enough.)

-- output = reverse(gro.mca @ enworbbc) http://linuxfinances.info/info/slony.html ?OM ERROR
Re: [HACKERS] Feature freeze date for 8.1
On Fri, 29 Apr 2005, Christopher Browne wrote: Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Marc G. Fournier) wrote: I know one person was talking about being able to target only those pages that have changes, instead of the whole table ... but some sort of load monitoring that checks the # of active connections and tries to find 'lulls'? I have some log table purging processes I'd like to put in place; it would be really slick to be able to get some statistics from the system as to how busy the DB has been in the last little while. The nice, adaptive algorithm: Loop forever; once a minute, evaluate how busy things seem, giving some metric X; if X is high, then purge 10 elderly tuples from table log_table; if X is moderate, then purge 100; if X is low, then purge 1000. The trouble is in measuring some form of X. Some reasonable approximations might include: How much disk I/O was recorded in the last 60 seconds? How many application transactions (e.g. invoices or such) were issued in the last 60 seconds? (Monitoring a sequence could be good enough.)

Some way of doing a 'partial vacuum' would be nice ... where a VACUUM could stop after it processed those '10 elderly tuples' and, on the next pass, resume from that point instead of starting from the beginning again ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] Increased company involvement
Andrew Dunstan wrote: I've deliberately let the dust settle slightly on this. One thing that might help is a more open sponsorship clearing house. Example (not meant as a bid, but just to illustrate): the JDBC driver needs a scanner overhaul - it breaks on dollar quoting and a bunch of other stuff. I could do that work (as could others, of course) but I don't have time, unless someone buys some of my professional time. Someone might want to do just that, but how would they find me? Regarding the secret code stuff - I predict that it will quickly bite whoever does it, unless they are extremely lucky.

Let me add that we really do want these companies to succeed. For me, it is all a matter of meeting the company's expectations. If a company wants to add a closed-source feature to PostgreSQL, that's fine, but if they work on a feature and eventually want it integrated into the main code, I don't want that to fail and have them be disappointed. The same goes for duplicated effort --- if they don't want it to happen, I want to help them prevent it.

-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] Feature freeze date for 8.1
Marc G. Fournier wrote: On Fri, 29 Apr 2005, Christopher Browne wrote: Some reasonable approximations might include: How much disk I/O was recorded in the last 60 seconds? How many application transactions (e.g. invoices or such) were issued in the last 60 seconds? (Monitoring a sequence could be good enough.) Some way of doing a 'partial vacuum' would be nice ... where a VACUUM could stop after it processed those '10 elderly tuples' and, on the next pass, resume from that point instead of starting from the beginning again ...

That is sorta what the vacuum delay settings accomplish.
Re: [HACKERS] Increased company involvement
Andrew Dunstan [EMAIL PROTECTED] writes: Regarding the secret code stuff - I predict that it will quickly bite whoever does it, unless they are extremely lucky.

Yeah. Bruce and I were worrying about this on the phone today. If a company is doing some work with the intent that it's a proprietary extension they can sell, no problem --- the BSD license is specifically intended to let them do that. What's bothering us is the thought that people are off in corners developing code that they think they are going to contribute back into the community code base after it's done. Past history shows that the odds of getting such things accepted into the PG community code base are *very* bad if you didn't communicate with the community from the start of your development process. We'd like to avoid such unpleasant surprises, but how to get the word out?

regards, tom lane
Re: [pgsql-advocacy] [HACKERS] Increased company involvement
On Fri, 29 Apr 2005, Andrew Dunstan wrote: One thing that might help is a more open sponsorship clearing house. Example (not meant as a bid, but just to illustrate): the JDBC driver needs a scanner overhaul - it breaks on dollar quoting and a bunch of other stuff. I could do that work (as could others, of course) but I don't have time, unless someone buys some of my professional time. Someone might want to do just that, but how would they find me?

I don't think this is a big issue. I don't know of any companies who were desperate for a feature and willing to throw money at the problem who couldn't find a developer to take them up on it. Right now this seems to be a kind of behind-the-scenes operation that relies heavily on knowing the right people, but I think most of our sponsor contact points are able to point sponsors to the right people. Could this process be more open? That depends on how the sponsor wants to handle it; they probably don't just want to throw the task out there and see who comes calling - they want an assurance from someone they trust that the chosen developer is capable.

One thing that definitely would be nice would be the ability to combine funds from various sponsors for various features. Alone, a company can't spring for it, but by pooling resources it could get done. This is a lot tougher to coordinate, and unless there is a complete spec in place, different sponsors will pull in different directions. Other bounty-type schemes don't seem to produce results, largely from a lack of cash. (Here's $500 for two weeks of work.) Anyone care to shed some light on how it works now?

Kris Jurka