Re: [HACKERS] [GENERAL] Returning a RECORD, not SETOF RECORD

2005-04-29 Thread Thomas Hallgren
Michael Fuhr wrote:
On Thu, Apr 28, 2005 at 09:47:45PM +0200, Thomas Hallgren wrote:
 

What version of PostgreSQL are you using
 

The latest and greatest from CVS.
   

Which branch?  HEAD?  REL8_0_STABLE?
 

Sorry. To me, "latest" always defaults to HEAD, and by "greatest" I mean 
the coming 8.1.

 

Wouldn't it make sense to be able to define a record in the projection 
part of a query, similar to what I was attempting with my SELECT? Has 
this been discussed or is it just considered as not very useful?
   

Sounds reasonable to me, but if it's currently possible then I
haven't yet figured out how to do it.  I can't remember if it's
been discussed before or not.  If nobody answers here then you
might try pgsql-hackers.
 

Ok. Thanks. I'll redirect this to hackers and see if I have any luck 
there. My original question was:

I do the following:
CREATE FUNCTION xyz(int, int) RETURNS RECORD AS '...'
CREATE TABLE abc(a int, b int);
Now I want to call my xyz function once for each row in abc and I want 
my RECORD to be (x int, y int, z timestamptz). How do I write that 
query? I.e., where do I specify my RECORD definition? Is it possible at 
all? Ideally I'd like to write something like this:

SELECT xyz(a, b) AS (x int, y int, z timestamptz) FROM abc;
but that yields a syntax error.
Regards,
Thomas Hallgren
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] pg_restore stuck in a loop?

2005-04-29 Thread Rod Taylor
On Thu, 2005-04-28 at 02:11 -0400, Tom Lane wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  On Tue, 2005-04-26 at 23:22 -0400, Tom Lane wrote:
  You tell us ;-).  You've got the test case, attach to it with a debugger
  and find out what it's doing.
 
  I wasn't entirely sure how to catch it in action so I just used CTRL+C
  to interrupt the pg_restore process, and got these as backtraces:
 
 Hm, I wonder if there is something broken about the SQL parsing logic
 in _sendSQLLine, such that it could go into an infinite loop given the
 right input?

Let me see if I can put together a test case which is smaller for next
week, and I'll send that in.

-- 


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Distinct-Sampling (Gibbons paper) for Postgres

2005-04-29 Thread a3a18850
Well, this guy has it nailed. He cites Flajolet and Martin, which was (I 
thought) as good as you could get with only a reasonable amount of memory per 
statistic. Unfortunately, their hash table is a one-shot deal; there's no way 
to maintain it once the table changes. His incremental update doesn't degrade 
as the table changes. If there isn't the same patent wrangle as with the 
ARC algorithm, and if the existing stats collector process can stand the extra 
traffic, then this one is a winner. 
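For reference, the one-shot sketch being improved on can be outlined in a few lines of Python. This is a minimal single-hash Flajolet-Martin estimator, not Gibbons' incremental scheme; the hash choice and correction factor are just the textbook ones, and the names are mine:

```python
import hashlib

def trailing_zeros(n: int) -> int:
    """Number of trailing zero bits in n (32 for n == 0)."""
    if n == 0:
        return 32
    count = 0
    while n & 1 == 0:
        n >>= 1
        count += 1
    return count

def fm_estimate(values) -> float:
    """One-shot distinct-count estimate: duplicates hash identically,
    so only the deepest trailing-zero run ever seen matters."""
    max_r = 0
    for v in values:
        h = int(hashlib.md5(str(v).encode()).hexdigest()[:8], 16)
        max_r = max(max_r, trailing_zeros(h))
    return (2 ** max_r) / 0.77351   # standard FM correction factor

# Feeding each value ten times changes nothing -- the sketch only
# tracks distinct hash patterns, which is why deletes can't be undone:
print(fm_estimate(range(1000)) == fm_estimate(list(range(1000)) * 10))
```

The one-shot limitation mentioned above is visible here: once `max_r` has been raised, there is no way to lower it when rows are deleted, which is exactly what the incremental scheme fixes.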
 
Many thanks to the person who posted this reference in the first place; so 
sorry I canned your posting and can't recall your name. 
 
Now, if we can come up with something better than the ARC algorithm ... 


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Christopher Browne
In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
   o integrated auto-vacuum (Bruce)

If this can kick off a vacuum of a Very Large Table at an unfortunate
time, this can turn out to be a pretty painful misfeature.

What I'd _really_ love to see (and alas, it's beyond my ken) is some
parallel to the FSM, namely a Recently Updated Blocks Map, which
would enable a vacuuming approach that would not go through entire
tables, but which would rather go through only those blocks known to
be recently updated.

There continues to be trouble if you have a table that grows to 50
million rows where there are 100K rows that are being heavily
updated.  In effect, only the 100K rows need vacuuming.
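A rough Python sketch of the suggested Recently Updated Blocks Map; the class, its methods, and the rows-per-block figure are all invented for illustration:

```python
class RecentBlocksMap:
    """Tracks which table blocks have seen updates since the last
    vacuum, so a vacuum pass can visit only those blocks instead of
    scanning the whole table."""

    def __init__(self):
        self.dirty = set()

    def note_update(self, block_no: int):
        self.dirty.add(block_no)

    def blocks_to_vacuum(self):
        """Return and clear the set of recently updated blocks."""
        todo = sorted(self.dirty)
        self.dirty.clear()
        return todo

# A huge table, but only rows in a couple of blocks get updated:
m = RecentBlocksMap()
for row in (12, 13, 12, 9000):
    m.note_update(row // 100)   # assume ~100 rows per block
print(m.blocks_to_vacuum())     # [0, 90] -- two blocks, not the table
```

The point is the asymmetry: the map costs one set insertion per update, while vacuum's work becomes proportional to the number of dirty blocks rather than the table size.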
-- 
(reverse (concatenate 'string moc.liamg @ enworbbc))
http://linuxfinances.info/info/emacs.html
Group Dynamics
Following Minsky and Schelling, consider a person as a society of
agents. A group is then a larger society of such agents. Understand
groups by examining interactions of coalitions of agents that
cross-cut their grouping into people.
-- Mark Miller

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[HACKERS] PseudoPartitioning and agregates

2005-04-29 Thread Sokolov Yura
Hello, pgsql-hackers.

I have an idea ( :-) ) about
SELECT field1, aggregate(field2) FROM view GROUP BY field1;
(and its variant SELECT aggregate(field2) FROM view)
where view is SELECT ... UNION ALL ... :

As I understood from the thread
http://archives.postgresql.org/pgsql-performance/2004-12/msg00101.php
there is a problem: the Append is always performed before the aggregate.

I think Postgres could perform the aggregate on each table in the union first,
and then merge the results.  To support this, each aggregate would need a
merge function that accepts two values of type STYPE.

for example:
CREATE AGGREGATE max(
  BASETYPE=float4,
  SFUNC=float4larger,
  STYPE=float4,
  MERGEFUNC=float4larger,
  FINALFUNC=-
);
CREATE AGGREGATE sum(
  BASETYPE=float4,
  SFUNC=float4pl,
  STYPE=float4,
  MERGEFUNC=float4pl,
  FINALFUNC=-
);
CREATE AGGREGATE count(
  BASETYPE=any,
  SFUNC=int8inc,
  STYPE=int8,
  FINALFUNC=-,
  MERGEFUNC=int8pl,  -- special case
  INITCOND='0'
);
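The split these definitions describe can be sketched in Python. This is a hypothetical simulation of the SFUNC/MERGEFUNC idea (partial aggregation per partition, then a per-group merge), not PostgreSQL internals:

```python
def partial_agg(rows, sfunc):
    """Per-group partial aggregate over one partition (the SFUNC)."""
    state = {}
    for grp, val in rows:
        state[grp] = sfunc(state[grp], val) if grp in state else val
    return state

def merge(a, b, mergefunc):
    """Combine two partial-aggregate maps (the MERGEFUNC).  This is
    what the FULL JOIN with COALESCE/CASE later in this message spells
    out by hand in SQL.  Note count() is the special case: its merge
    is addition (int8pl), not its transition function (int8inc)."""
    out = dict(a)
    for grp, val in b.items():
        out[grp] = mergefunc(out[grp], val) if grp in out else val
    return out

t1 = [(1, 10.0), (1, 5.0), (2, 7.0)]
t2 = [(2, 9.0), (3, 1.0)]
# max(amount) GROUP BY grp over the UNION ALL, computed per table
# first and then merged:
merged = merge(partial_agg(t1, max), partial_agg(t2, max), max)
print(merged)   # {1: 10.0, 2: 9.0, 3: 1.0}
```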

CREATE TABLE t1
(
id INT PRIMARY KEY,
grp  INT,
amount FLOAT4
);
CREATE INDEX ix_t1_grp ON t1 (grp);

CREATE TABLE t2
(
id INT PRIMARY KEY,
grp  INT,
amount FLOAT4
);
CREATE INDEX ix_t2_grp ON t2 (grp);

insert into t1 select tt.i,tt.i/100,tt.i*sin(tt.i) from 
generate_series(1,262144) tt(i);
insert into t2 select tt.i,tt.i/100,tt.i*sin(tt.i) from 
generate_series(262145,524288) tt(i);

VACUUM FULL ANALYZE t1;
VACUUM FULL ANALYZE t2;

CREATE VIEW union_table AS
SELECT id,grp,amount FROM t1
UNION ALL
SELECT id,grp, amount FROM t2;

So now t1 and t2 both contain 262144 rows (524288 in total), and
max(t1.grp) = min(t2.grp) = 2621.

Now, to perform the query

SELECT grp, count(*) AS c, sum(amount) AS s, max(amount) AS m FROM union_table
GROUP BY grp;

Postgres selects rows from t1 and t2, appends them together, and then performs
a HashAggregate:

HashAggregate  (cost=23830.52..23832.02 rows=200 width=8) (actual time=22547.272..22586.130 rows=5243 loops=1)
  ->  Subquery Scan t_union  (cost=0.00..18587.64 rows=524288 width=8) (actual time=0.204..17863.444 rows=524288 loops=1)
        ->  Append  (cost=0.00..13344.76 rows=524288 width=12) (actual time=0.193..12990.177 rows=524288 loops=1)
              ->  Subquery Scan "*SELECT* 1"  (cost=0.00..6684.88 rows=262144 width=12) (actual time=0.186..4488.981 rows=262144 loops=1)
                    ->  Seq Scan on t  (cost=0.00..4063.44 rows=262144 width=12) (actual time=0.163..1915.213 rows=262144 loops=1)
              ->  Subquery Scan "*SELECT* 2"  (cost=0.00..6659.88 rows=262144 width=12) (actual time=0.225..4558.788 rows=262144 loops=1)
                    ->  Seq Scan on t1  (cost=0.00..4038.44 rows=262144 width=12) (actual time=0.208..1798.410 rows=262144 loops=1)
Total runtime: 22634.454 ms
(well, actual time is 2375 ms; Postgres 8.0.1, Slackware 10.0)

But it would be quicker to aggregate the first table, then the second, and
merge the results.  For example, here is a query that does this explicitly:

SELECT  COALESCE(t1.grp,t2.grp) as grp,
CASE WHEN t1.grp IS NOT NULL AND t2.grp IS NOT NULL THEN
int8pl(t1.c,t2.c)
 when t1.grp IS NOT NULL THEN
t1.c
 ELSE
t2.c
END AS c,
CASE WHEN t1.grp IS NOT NULL AND t2.grp IS NOT NULL THEN
float4pl(t1.s,t2.s)
 WHEN t1.grp IS NOT NULL THEN
t1.s
 ELSE
t2.s
END AS s,
CASE WHEN t1.grp IS NOT NULL AND t2.grp IS NOT NULL THEN
float4larger(t1.m,t2.m)
 WHEN t1.grp IS NOT NULL THEN
t1.m
 ELSE
t2.m
END AS m
FROM
(SELECT grp,count(*) AS c,sum(amount) AS s,max(amount) AS m from t1 
group by grp) as t1
FULL JOIN
(SELECT grp,count(*) AS c,sum(amount) AS s,max(amount) AS m from t2 
group by grp) as t2
ON t1.grp=t2.grp;

Here is the EXPLAIN ANALYZE output:

Merge Full Join  (cost=13737.48..14535.48 rows=34885 width=40) (actual time=7908.438..7989.105 rows=5243 loops=1)
  Merge Cond: ("outer".grp = "inner".grp)
  ->  Sort  (cost=6854.32..6860.87 rows=2618 width=20) (actual time=4070.833..4083.687 rows=2622 loops=1)
        Sort Key: t2.grp
        ->  Subquery Scan t2  (cost=6659.88..6705.70 rows=2618 width=20) (actual time=4005.290..4057.131 rows=2622 loops=1)
              ->  HashAggregate  (cost=6659.88..6679.52 rows=2618 width=8) (actual time=4005.273..4025.564 rows=2622 loops=1)
                    ->  Seq Scan on t1  (cost=0.00..4038.44 rows=262144 width=8) (actual time=0.094..1712.362 rows=262144 loops=1)
  ->  Sort  (cost=6883.15..6889.82 rows=2665 width=20) (actual time=3837.433..3845.754 rows=2622 loops=1)
        Sort Key: t1.grp
        ->  Subquery Scan t1  (cost=6684.88..6731.52 rows=2665 width=20) (actual time=3771.661..3822.520 rows=2622 loops=1)
              ->  HashAggregate  (cost=6684.88..6704.87 rows=2665 width=8) (actual time=3771.564..3793.023 rows=2622 loops=1)
   

Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Tom Lane
Christopher Browne [EMAIL PROTECTED] writes:
 In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
 o integrated auto-vacuum (Bruce)

 If this can kick off a vacuum of a Very Large Table at an unfortunate
  time, this can turn out to be a pretty painful misfeature.

[ shrug... ]  You'll always be able to turn it off if you don't want it.
I'm not sure that we'll be ready to turn it on by default even in 8.1.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Bruce Momjian
Tom Lane wrote:
 Christopher Browne [EMAIL PROTECTED] writes:
  In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
  o integrated auto-vacuum (Bruce)
 
  If this can kick off a vacuum of a Very Large Table at an unfortunate
  time, this can turn out to be a pretty painful misfeature.
 
 [ shrug... ]  You'll always be able to turn it off if you don't want it.
 I'm not sure that we'll be ready to turn it on by default even in 8.1.

Agreed.  It will just be there to turn on from postgresql.conf if you
want it, and we do have TODO information about keeping such an FSM for
recently expired pages.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] performance of bitmap scans in nested loop joins

2005-04-29 Thread Sergey E. Koposov
Hello All,

I'd like to report some surprising (to me) results from performance testing of
bitmap indexes in nested loop join plans.

I have the table q3c

test=# \d q3c
        Table "public.q3c"
 Column |  Type  | Modifiers 
--------+--------+-----------
 ipix   | bigint | 
 errbox | box    | 
 ra     | real   | 
 dec    | real   | 
Indexes:
    "ipix_idx" UNIQUE, btree (ipix)



test=# SELECT count(*) from q3c;
  count  
---------
 300
(1 row)


First, two join plans with just simple index scan: 

# 1) 

test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c AS q3cs WHERE 
q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.01..49059519577.21 rows=11095 width=96) (actual time=0.137..39385.725 rows=300 loops=1)
   ->  Seq Scan on q3c q3cs  (cost=0.00..60928.16 rows=316 width=48) (actual time=0.007..3659.171 rows=300 loops=1)
   ->  Index Scan using ipix_idx on q3c  (cost=0.01..9686.37 rows=35 width=48) (actual time=0.007..0.009 rows=1 loops=300)
         Index Cond: ((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3)))
 Total runtime: 40755.390 ms
(5 rows)


# 2)

test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c as q3cs WHERE 
(q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993);
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.01..49059519577.21 rows=11095 width=96) (actual time=28058.983..28058.983 rows=0 loops=1)
   ->  Seq Scan on q3c q3cs  (cost=0.00..60928.16 rows=316 width=48) (actual time=0.061..3803.598 rows=300 loops=1)
   ->  Index Scan using ipix_idx on q3c  (cost=0.01..9686.37 rows=35 width=48) (actual time=0.006..0.006 rows=0 loops=300)
         Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))
 Total runtime: 28059.066 ms
(5 rows)

#

And now I combine them in one:


test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c AS q3cs WHERE 
(q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3) OR 
(q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993);

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5832.03..190281569757.46 rows=1888909037091 width=96) (actual time=0.180..122444.610 rows=300 loops=1)
   ->  Seq Scan on q3c q3cs  (cost=0.00..60928.16 rows=316 width=48) (actual time=0.063..3871.731 rows=300 loops=1)
   ->  Bitmap Heap Scan on q3c  (cost=5832.03..43426.73 rows=70 width=48) (actual time=0.033..0.034 rows=1 loops=300)
         Recheck Cond: (((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3))) OR ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993))))
         ->  BitmapOr  (cost=5832.03..5832.03 rows=70 width=0) (actual time=0.029..0.029 rows=0 loops=300)
               ->  Bitmap Index Scan on ipix_idx  (cost=0.00..2916.02 rows=35 width=0) (actual time=0.014..0.014 rows=1 loops=300)
                     Index Cond: ((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3)))
               ->  Bitmap Index Scan on ipix_idx  (cost=0.00..2916.02 rows=35 width=0) (actual time=0.011..0.011 rows=0 loops=300)
                     Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))
 Total runtime: 124366.545 ms
(10 rows)


So we see that the total time of the plan with the bitmap scan is roughly two
times greater than the sum of the times of the individual index scans.


I see that in my case even a simple UNION is significantly faster than the
bitmap index plan.

test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c AS q3cs WHERE 
(q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3) 
UNION ALL
SELECT * FROM q3c,q3c AS q3cs WHERE 
(q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993);
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.01..118119252488.32 rows=22190 width=96) (actual time=0.139..75048.897 rows=300 loops=1)
   ->  Subquery Scan "*SELECT* 1"  (cost=0.01..59059626244.16 

Re: [HACKERS] [proposal] protocol extension to support loadable stream filters

2005-04-29 Thread Tom Lane
Brent Verner [EMAIL PROTECTED] writes:
   Now, the hard part...where should this code live?  I'm thinking a 
 src/transport directory seems sensible.

Our experience with trying to write single files to serve both server
and client sides has been, um, painful.  I recommend you not try this.
My recommendation would be server-side code in src/backend/libpq/
and client-side code in src/interfaces/libpq/.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Returning a RECORD, not SETOF RECORD

2005-04-29 Thread Tom Lane
Thomas Hallgren [EMAIL PROTECTED] writes:
 Ideally I'd like to write something like this:

 SELECT xyz(a, b) AS (x int, y int, z timestamptz) FROM abc;

 but that yields a syntax error.

While that's probably doable if anyone were really motivated,
I'm not sure it's worth the trouble in view of the recent OUT-parameter
improvements.  IMHO most of the use cases for such a thing would be
better served by declaring the function with OUT parameters.  The
AS-clause-column-list functionality was invented for functions where the
result type is truly not known when the function is written, such as
dblink.  But it's pretty hard to believe that many people need to write
such things.

Your example can be done like this in CVS tip:

regression=# create function xyz(int, int, out x int, out y int, out z 
timestamptz) as $$ select $1, $2, now() $$ language sql;
CREATE FUNCTION
regression=# select xyz(unique1,unique2) from tenk1 limit 5;
                  xyz                   
----------------------------------------
 (8800,0,2005-04-29 10:26:37.738946-04)
 (1891,1,2005-04-29 10:26:37.738946-04)
 (3420,2,2005-04-29 10:26:37.738946-04)
 (9850,3,2005-04-29 10:26:37.738946-04)
 (7164,4,2005-04-29 10:26:37.738946-04)
(5 rows)

Notice that this returns the record as a single column.  In most cases
you would probably wish that the record were burst into multiple
columns, which you can do easily with

regression=# select (xyz(unique1,unique2)).* from tenk1 limit 5;
  x   | y |               z
------+---+-------------------------------
 8800 | 0 | 2005-04-29 10:27:53.197948-04
 1891 | 1 | 2005-04-29 10:27:53.197948-04
 3420 | 2 | 2005-04-29 10:27:53.197948-04
 9850 | 3 | 2005-04-29 10:27:53.197948-04
 7164 | 4 | 2005-04-29 10:27:53.197948-04
(5 rows)

but AFAICS that is not amenable to having an AS plastered on it (unless
the AS goes inside the parentheses, which'd be a really spectacular
abuse of the syntax).

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Bruno Wolff III
On Fri, Apr 29, 2005 at 10:09:43 -0400,
  Bruce Momjian pgman@candle.pha.pa.us wrote:
 Tom Lane wrote:
  Christopher Browne [EMAIL PROTECTED] writes:
   In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) 
   wrote:
   o integrated auto-vacuum (Bruce)
  
   If this can kick off a vacuum of a Very Large Table at an unfortunate
   time, this can turn out to be a pretty painful misfeature.
  
  [ shrug... ]  You'll always be able to turn it off if you don't want it.
  I'm not sure that we'll be ready to turn it on by default even in 8.1.
 
 Agreed.  It will just be there to turn on from postgresql.conf if you
 want it, and we do have TODO information about keeping such an FSM for
 recently expired pages.

I think if we aren't finding problems in testing, it would be better
to turn pg_autovacuum on by default. Vacuum is something that burns new
users, and having it on by default is going to cut down on surprises.
Experienced users know about vacuum and will probably read the release
notes when upgrading and do something that is appropriate for them.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Matthew T. O'Connor
Tom Lane wrote:
Christopher Browne [EMAIL PROTECTED] writes:
 

If this can kick off a vacuum of a Very Large Table at an unfortunate
time, this can turn out to be a pretty painful misfeature.
   

[ shrug... ]  You'll always be able to turn it off if you don't want it.
I'm not sure that we'll be ready to turn it on by default even in 8.1.
 

What do people think about having an optional maintenance window, so 
that autovac only takes action during an approved time?  But perhaps 
just using the vacuum delay settings will be enough.

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Marc G. Fournier
On Fri, 29 Apr 2005, Bruno Wolff III wrote:
On Fri, Apr 29, 2005 at 10:09:43 -0400,
 Bruce Momjian pgman@candle.pha.pa.us wrote:
Tom Lane wrote:
Christopher Browne [EMAIL PROTECTED] writes:
In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
o integrated auto-vacuum (Bruce)

If this can kick off a vacuum of a Very Large Table at an unfortunate
time, this can turn out to be a pretty painful misfeature.
[ shrug... ]  You'll always be able to turn it off if you don't want it.
I'm not sure that we'll be ready to turn it on by default even in 8.1.
Agreed.  It will just be there to turn on from postgresql.conf if you
want it, and we do have TODO information about keeping such an FSM for
recently expired pages.
I think if we aren't finding problems in testing, it would be better
to turn pg_autovacuum on by default. Vacuum is something that burns new
users, and having it on by default is going to cut down on surprises.
Except for the surprise of periodically having the system go unresponsive 
because it hit a large table, and that new user wondering what is wrong 
with PostgreSQL that it just stalls seemingly randomly :(


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Bruno Wolff III
On Fri, Apr 29, 2005 at 12:43:37 -0300,
  Marc G. Fournier [EMAIL PROTECTED] wrote:
 On Fri, 29 Apr 2005, Bruno Wolff III wrote:
 
 Except for the surprise of periodically having the system go unresponsive 
 because it hit a large table, and that new user wondering what is wrong 
 with PostgreSQL that it just stalls seemingly randomly :(

I think most users running systems with tables that large will know
what they are doing, and will be more careful with vacuum.

Vacuum running for a long time on large tables with few pages that need
updating is really a separate problem that could use its own solution.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] pg_restore stuck in a loop?

2005-04-29 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 On Thu, 2005-04-28 at 02:11 -0400, Tom Lane wrote:
 Hm, I wonder if there is something broken about the SQL parsing logic
 in _sendSQLLine, such that it could go into an infinite loop given the
 right input?

 Let me see if I can put together a test case which is smaller for next
 week, and I'll send that in.

If the parsing-bug theory is right, then there is probably some one
object in your database that will make even pg_dump -s go nuts.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Tom Lane
Matthew T. O'Connor matthew@zeut.net writes:
 What do people think about having an optional maintenance window, so 
 that autovac only takes action during an approved time?  But perhaps 
 just using the vacuum delay settings will be enough.

I'm not sure autovac should go completely catatonic during the day;
what if someone does an unusual mass deletion, or something?  But
it does seem pretty reasonable to have a notion of a maintenance
window where it should be more active than it is at other times.

Maybe what you want is two complete sets of autovac parameters.
Definitely at least two sets of the vacuum-delay values.
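The two-parameter-sets idea can be sketched in a few lines of Python. Everything here is invented for illustration (the delay values, the window, the function name); real autovacuum settings would live in postgresql.conf:

```python
from datetime import time

# Two sets of vacuum-delay values: gentle during business hours,
# aggressive inside the maintenance window.
DAYTIME_DELAY_MS = 200
WINDOW_DELAY_MS = 10

def vacuum_delay(now: time, window=(time(1, 0), time(5, 0))) -> int:
    """Pick the vacuum cost delay based on whether we are inside the
    maintenance window (window assumed not to cross midnight)."""
    start, end = window
    in_window = start <= now < end
    return WINDOW_DELAY_MS if in_window else DAYTIME_DELAY_MS

print(vacuum_delay(time(3, 0)))    # 10  (inside the window)
print(vacuum_delay(time(14, 0)))   # 200 (daytime)
```

Note this matches Tom's framing: autovac never goes fully catatonic, it just runs with a much larger delay outside the window.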

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Marc G. Fournier
On Fri, 29 Apr 2005, Tom Lane wrote:
Matthew T. O'Connor matthew@zeut.net writes:
What do people think about having an optional maintenance window, so
that autovac only takes action during an approved time?  But perhaps
just using the vacuum delay settings will be enough.
I'm not sure autovac should go completely catatonic during the day;
what if someone does an unusual mass deletion, or something?  But
it does seem pretty reasonable to have a notion of a maintenance
window where it should be more active than it is at other times.
Maybe what you want is two complete sets of autovac parameters.
Definitely at least two sets of the vacuum-delay values.
With the introduction of the stats collector, is there not some way of 
extending it so that autovac has more information to work off of?  For 
instance, in my environment, we have clients in every timezone hitting the 
database ... our Japanese clients will be busy at a totally different time 
of day than our East Coast/NA clients, so a 'maintenance window' is near 
impossible to state ...

I know one person was talking about being able to target only those 
pages that have changed, instead of the whole table ... but what about some 
sort of load monitoring that checks the # of active connections and tries 
to find 'lulls'?

Basically, everything right now is keyed to updates to the tables 
themselves, but isn't looking at what the system itself is doing ... if 
I'm doing a massive import of data into a table, the last thing I want is a 
VACUUM cutting in and slowing down the loading ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] [GENERAL] Returning a RECORD, not SETOF RECORD

2005-04-29 Thread Michael Fuhr
On Fri, Apr 29, 2005 at 10:36:05AM -0400, Tom Lane wrote:

 regression=# select (xyz(unique1,unique2)).* from tenk1 limit 5;

This is a little off topic, but I've noticed that the above invokes
the function once per output column:

CREATE FUNCTION xyz(INOUT x integer, INOUT y integer, OUT z integer) AS $$
BEGIN
RAISE INFO 'calling xyz';
z := x + y;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

SELECT xyz(1,2);
INFO:  calling xyz
   xyz   
---------
 (1,2,3)
(1 row)

SELECT (xyz(1,2)).*;
INFO:  calling xyz
INFO:  calling xyz
INFO:  calling xyz
 x | y | z 
---+---+---
 1 | 2 | 3
(1 row)

Is that because the splat causes the query to be expanded into
SELECT (xyz(1,2)).x, (xyz(1,2)).y, (xyz(1,2)).z?  Is it possible
or desirable to optimize that into a single call, at least if the
function were stable or immutable?

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] [GENERAL] Returning a RECORD, not SETOF RECORD

2005-04-29 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes:
 On Fri, Apr 29, 2005 at 10:36:05AM -0400, Tom Lane wrote:
 regression=# select (xyz(unique1,unique2)).* from tenk1 limit 5;

 This is a little off topic, but I've noticed that the above invokes
 the function once per output column:

Yeah, that is unfortunate but doesn't seem very easy to fix.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] performance of bitmap scans in nested loop joins

2005-04-29 Thread Tom Lane
Sergey E. Koposov [EMAIL PROTECTED] writes:
 I'd like to report some surprising (to me) results from performance testing of
 bitmap indexes in nested loop join plans.

I'm surprised too.  There's something weird in the indexscans
themselves:

->  Index Scan using ipix_idx on q3c  (cost=0.01..9686.37 rows=35 width=48) (actual time=0.006..0.006 rows=0 loops=300)
      Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))

->  Bitmap Index Scan on ipix_idx  (cost=0.00..2916.02 rows=35 width=0) (actual time=0.011..0.011 rows=0 loops=300)
      Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))

The latter is (or should be) doing slightly *less* work, so why is it
taking almost twice as much time?  Can you get gprof profiles of the
two cases?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] [proposal] protocol extension to support loadable stream filters

2005-04-29 Thread Alvaro Herrera
On Fri, Apr 29, 2005 at 10:17:44AM -0400, Tom Lane wrote:

 Our experience with trying to write single files to serve both server
 and client sides has been, um, painful.  I recommend you not try this.

BTW, why not get rid of src/corba?

-- 
Alvaro Herrera ([EMAIL PROTECTED])
XML! Exclaimed C++.  What are you doing here? You're not a programming
language.
Tell that to the people who use me, said XML.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Increased company involvement

2005-04-29 Thread Andrew Dunstan
I've deliberately let the dust settle slightly on this.
One thing that might help is a more open sponsorship clearing house. 
Example (not meant as a bid, but just to illustrate): the JDBC driver 
needs a scanner overhaul - it breaks on dollar quoting and a bunch of 
other stuff. I could do that work (as could others, of course) but I 
don't have time, unless someone buys some of my professional time. 
Someone might want to do just that, but how would they find me?

Regarding the secret code stuff - I predict that it will quickly bite 
whoever does it, unless they are extremely lucky.

cheers
andrew
Bruce Momjian wrote:
I am very excited to see companies involved in PostgreSQL development. 
It gives us funding for developers and features, which is new for us.  We
had Fujitsu funding some features for 8.0, and that really helped us.

However, there was a lot of coordination that happened with Fujitsu that
I don't see happening with the current companies involved.  Companies
are already duplicating work that is also done by community members or
by other companies.  The big issue is communication.  Because the
PostgreSQL code base is common for most of the companies involved, there
has to be coordination in what they are working on and their approaches.
If that doesn't happen, two companies will work on the same feature, and
only one can be added, or a complex process of merging the two patches
into one patch has to happen --- again duplicated effort.  I am willing
to do the coordination, or even better, have the companies involved
publicly post their efforts so all the other companies can know what
is happening.  I realize this is hard for companies because their
efforts are in some ways part of their profitability.  Does
profitability require duplication of effort and code collisions?  I am
not sure, but if it does, we are in trouble.  I am not sure the
community has the resources to resolve that many collisions.
Second, some developers are being hired from the community to work on
closed-source additions to PostgreSQL.  That is fine and great, but one
way to kill PostgreSQL is to hire away its developers.  If a commercial
company wanted to hurt us, that is certainly one way they might do it.
Anyway, it is a concern I have.  I am hoping community members hired to
do closed-source additions can at least spend some of their time on
community work.
And finally, we have a few companies working on features that they
eventually want merged back into the PostgreSQL codebase.  That is a
very tricky process and usually goes badly unless the company seeks
community involvement from the start, including user interface,
implementation, and coding standards.
I hate to be discouraging here, but I am trying to communicate what we
have learned over the past few years to help companies be effective in
working with open source communities.  I am available to talk to any
company that wants further details.
 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Jim C. Nasby
I think what you're suggesting is that vacuum settings (most likely
delay) take into consideration the load on the database, which I think
is a great idea. One possibility: if vacuum tracks how many blocks
it has read/written, it can see how many blocks the database has done
overall; subtract the two and you know how much other disk I/O is going
on in the system. You can then use that number to decide how long you'll
sleep before the next vacuum cycle.
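A toy Python version of that accounting; all the constants and names are invented, and it only illustrates the subtract-and-sleep idea:

```python
# Compare vacuum's own block I/O against the system total; the
# difference is "other" activity, and the busier the system, the
# longer vacuum sleeps before its next cycle.

def next_sleep_ms(vacuum_blocks: int, total_blocks: int,
                  base_ms: int = 20, per_block_us: int = 50) -> int:
    """Sleep longer when non-vacuum I/O dominates the interval."""
    other = max(total_blocks - vacuum_blocks, 0)
    return base_ms + (other * per_block_us) // 1000

print(next_sleep_ms(vacuum_blocks=500, total_blocks=500))     # 20
print(next_sleep_ms(vacuum_blocks=500, total_blocks=40500))   # 2020
```

On an idle system the extra term vanishes and vacuum proceeds at its base pace; under heavy load it backs off roughly in proportion to the competing I/O.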

On Fri, Apr 29, 2005 at 01:34:56PM -0300, Marc G. Fournier wrote:
 On Fri, 29 Apr 2005, Tom Lane wrote:
 
 Matthew T. O'Connor matthew@zeut.net writes:
 What do people think about having an optional maintenance window, so
 that autovac only takes action during an approved time?  But perhaps
 just using the vacuum delay settings will be enough.
 
 I'm not sure autovac should go completely catatonic during the day;
 what if someone does an unusual mass deletion, or something?  But
 it does seem pretty reasonable to have a notion of a maintenance
 window where it should be more active than it is at other times.
 
 Maybe what you want is two complete sets of autovac parameters.
 Definitely at least two sets of the vacuum-delay values.
 
 With the introduction of the stats collector, is there not some way of 
 extending it so that autovac has more information to work off of?  For 
 instance, in my environment, we have clients in every timezone hitting the 
 database ... our Japanese clients will be busy at a totally different time 
 of day than our East Coast/NA clients, so a 'maintenance window' is near 
 impossible to state ...
 
 I know one person was talking about being able to target only those 
 pages that have changes, instead of the whole table ... but some sort of 
 load monitoring that checks # of active connections and tries to find 
 'lulls'?
 
 Basically, everything right now is being keyed to updates to the tables 
 themselves, but isn't looking at what the system itself is doing ... if 
 I'm doing a massive import of data into a table, the last thing I want is a 
 VACUUM to cut in and slow down the loading ...
 
 
 Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
 Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
 
 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [HACKERS] performance of bitmap scans in nested loop joins

2005-04-29 Thread Sergey E. Koposov
On Fri, 29 Apr 2005, Tom Lane wrote:
 ->  Index Scan using ipix_idx on q3c  (cost=0.01..9686.37 rows=35 
  width=48) (actual time=0.006..0.006 rows=0 loops=300)
    Index Cond: ((q3c.ipix >= (outer.ipix - 1000)) AND (q3c.ipix <= 
  (outer.ipix - 993)))
 
 ->  Bitmap Index Scan on ipix_idx  (cost=0.00..2916.02 
  rows=35 width=0) (actual time=0.011..0.011 rows=0 loops=300)
    Index Cond: ((q3c.ipix >= (outer.ipix - 1000)) AND 
  (q3c.ipix <= (outer.ipix - 993)))
 
 The latter is (or should be) doing slightly *less* work, so why is it
 taking almost twice as much time?  Can you get gprof profiles of the
 two cases?


I've got them. Here are the two gprof profiles:

http://lnfm1.sai.msu.ru/~math/public_misc/idxscan.gprof
http://lnfm1.sai.msu.ru/~math/public_misc/bitmap.gprof
  (now as links, because the previous message with those files as attachments
   didn't get through to -hackers (due to size, I think)) 


bitmap.gprof is the profile of:

test=# explain analyze select * from q3c,q3c as q3cs where 
(q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3) OR 
(q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993);

                                QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=5832.01..190280130928.00 rows=18889 width=96)
   (actual time=0.435..374743.591 rows=300 loops=1)
   ->  Seq Scan on q3c q3cs  (cost=0.00..60928.00 rows=300 width=48)
       (actual time=0.079..10632.570 rows=300 loops=1)
   ->  Bitmap Heap Scan on q3c  (cost=5832.01..43426.68 rows=67 width=48)
       (actual time=0.102..0.104 rows=1 loops=300)
         Recheck Cond: (((q3c.ipix >= (outer.ipix - 3)) AND (q3c.ipix <=
           (outer.ipix + 3))) OR ((q3c.ipix >= (outer.ipix - 1000)) AND
           (q3c.ipix <= (outer.ipix - 993))))
         ->  BitmapOr  (cost=5832.01..5832.01 rows=67 width=0) (actual
             time=0.094..0.094 rows=0 loops=300)
               ->  Bitmap Index Scan on ipix_idx  (cost=0.00..2916.01
                   rows=33 width=0) (actual time=0.045..0.045 rows=1 loops=300)
                     Index Cond: ((q3c.ipix >= (outer.ipix - 3)) AND
                       (q3c.ipix <= (outer.ipix + 3)))
               ->  Bitmap Index Scan on ipix_idx  (cost=0.00..2916.01
                   rows=33 width=0) (actual time=0.041..0.041 rows=0 loops=300)
                     Index Cond: ((q3c.ipix >= (outer.ipix - 1000)) AND
                       (q3c.ipix <= (outer.ipix - 993)))
 Total runtime: 377551.805 ms
(10 rows)


And idxscan.gprof is the profile of:

test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c as q3cs WHERE 
(q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993);

                                QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=0.01..49059045928.00 rows=1 width=96)
   (actual time=104991.950..104991.950 rows=0 loops=1)
   ->  Seq Scan on q3c q3cs  (cost=0.00..60928.00 rows=300 width=48)
       (actual time=0.069..10465.514 rows=300 loops=1)
   ->  Index Scan using ipix_idx on q3c  (cost=0.01..9686.33 rows=33
       width=48) (actual time=0.025..0.025 rows=0 loops=300)
         Index Cond: ((q3c.ipix >= (outer.ipix - 1000)) AND (q3c.ipix <=
           (outer.ipix - 993)))
 Total runtime: 104992.202 ms
(5 rows)


With Best regards, 
Sergey Koposov








[HACKERS] SPI bug.

2005-04-29 Thread Tzahi Fadida
Hi,
While trying to determine whether SPI_cursor_move can rewind
(and receiving great help from the folks on IRC), we
found out that since the count parameter is an int
and FETCH_ALL is LONG_MAX, setting
the count parameter to FETCH_ALL to rewind
will not work on 64-bit systems.

On my PIII 32-bit system it works, since sizeof(int) == sizeof(long).

I am using 8.0.2 (i.e. the repositioning bug is already fixed here).

I think the solution is either changing FETCH_ALL to
INT_MAX or changing the interface parameter count, and its subsequent
usages, to long.
(FETCH_ALL is defined in parsenodes.h.)
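The narrowing is easy to demonstrate. A small Python sketch simulating what happens when a C long holding FETCH_ALL is passed through an int parameter on an LP64 platform (the helper function is hypothetical, just modeling the implicit C conversion):

```python
import ctypes

FETCH_ALL_LP64 = 2**63 - 1   # LONG_MAX where long is 64 bits
FETCH_ALL_ILP32 = 2**31 - 1  # LONG_MAX where sizeof(int) == sizeof(long)

def narrow_to_c_int(count):
    # Simulate an implicit C conversion from 'long' to 'int':
    # keep the low 32 bits and reinterpret them as signed.
    return ctypes.c_int32(count & 0xFFFFFFFF).value

# On a 32-bit box the value survives and still compares equal to FETCH_ALL:
narrow_to_c_int(FETCH_ALL_ILP32)  # 2147483647
# On an LP64 box LONG_MAX narrows to -1, so a FETCH_ALL comparison
# against the narrowed count can never match:
narrow_to_c_int(FETCH_ALL_LP64)   # -1
```

This is why defining FETCH_ALL as INT_MAX (or widening the count parameter to long end to end) would make the sentinel survive the call.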

Regards,
tzahi.

WARNING TO SPAMMERS:  see at
http://members.lycos.co.uk/my2nis/spamwarning.html





Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Christopher Browne
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Marc G. 
Fournier) wrote:
 I know one person was talking about being able to target only those
 pages that have changes, instead of the whole table ... but some
 sort of load monitoring that checks # of active connections and
 tries to find 'lulls'?

I have some log table purging processes I'd like to put in place; it
would be really slick to be able to get some statistics from the
system as to how busy the DB has been in the last little while.  

The nice, adaptive algorithm:

- Loop forever

  - Once a minute, evaluate how busy things seem, giving some metric X

    - If X is high then purge 10 elderly tuples from table log_table
    - If X is moderate then purge 100 elderly tuples from table log_table
    - If X is low then purge 1000 elderly tuples from table log_table

The trouble is in measuring some form of X.

Some reasonable approximations might include:
 - How much disk I/O was recorded in the last 60 seconds?
 - How many application transactions (e.g. - invoices or such) were 
   issued in the last 60 seconds (monitoring a sequence could be
   good enough).
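The tiers above can be sketched roughly as follows (the threshold values and the pluggable load/purge/stop hooks are all assumptions for illustration):

```python
import time

def purge_batch_size(metric_x, high=0.75, moderate=0.25):
    """Map a normalized load metric in [0, 1] to a purge batch size,
    following the 10/100/1000 tiers above; thresholds are invented."""
    if metric_x >= high:
        return 10        # busy: barely touch the table
    if metric_x >= moderate:
        return 100       # moderate load
    return 1000          # lull: purge aggressively

def purge_loop(get_load_metric, purge_elderly, should_stop,
               sleep_fn=time.sleep, interval=60):
    # Loop forever (until told to stop): once a minute, evaluate how
    # busy things seem and purge a correspondingly sized batch of
    # elderly tuples from log_table.
    while not should_stop():
        purge_elderly(purge_batch_size(get_load_metric()))
        sleep_fn(interval)
```

The hard part, as noted, is computing the metric; the loop itself is trivial once some proxy for load (disk I/O, sequence advancement) is available.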
-- 
output = reverse(gro.mca @ enworbbc)
http://linuxfinances.info/info/slony.html
?OM ERROR



Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Marc G. Fournier
On Fri, 29 Apr 2005, Christopher Browne wrote:
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Marc G. Fournier) wrote:
I know one person was talking about being able to target only those
pages that have changes, instead of the whole table ... but some
sort of load monitoring that checks # of active connections and
tries to find 'lulls'?
I have some log table purging processes I'd like to put in place; it
would be really slick to be able to get some statistics from the
system as to how busy the DB has been in the last little while.
The nice, adaptive algorithm:
- Loop forever
 - Once a minute, evaluate how busy things seem, giving some metric X
  - If X is high then purge 10 elderly tuples from table log_table
  - If X is moderate then purge 100 elderly tuples from table
 log_table
  - If X is low then purge 1000 elderly tuples from table
 log_table
The trouble is in measuring some form of X.
Some reasonable approximations might include:
- How much disk I/O was recorded in the last 60 seconds?
- How many application transactions (e.g. - invoices or such) were
  issued in the last 60 seconds (monitoring a sequence could be
  good enough).
Some way of doing a 'partial vacuum' would be nice ... where a VACUUM 
could stop after it processed those '10 elderly tuples' and on the next 
pass, resume from that point instead of starting from the beginning again 
...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664


Re: [HACKERS] Increased company involvement

2005-04-29 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 I've deliberately let the dust settle slightly on this.
 
 One thing that might help is a more open sponsorship clearing house. 
 Example (not meant as a bid, but just to illustrate): the JDBC driver 
 needs a scanner overhaul - it breaks on dollar quoting and a bunch of 
 other stuff. I could do that work (as could others, of course) but I 
 don't have time, unless someone buys some of my professional time. 
 Someone might want to do just that, but how would they find me?
 
 Regarding the secret code stuff - I predict that it will quickly bite 
 whoever does it, unless they are extremely lucky.

Let me add that we really do want these companies to succeed.

For me, it is all a matter of meeting the company's expectations.  If a
company wants to add a closed-source feature to PostgreSQL, that's fine,
but if they work on a feature and eventually want it integrated into
the main code, I don't want that to fail and have them be disappointed. 
The same goes for duplicated effort --- if they don't want it to happen,
I want to help them prevent it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Feature freeze date for 8.1

2005-04-29 Thread Matthew T. O'Connor
Marc G. Fournier wrote:
On Fri, 29 Apr 2005, Christopher Browne wrote:
Some reasonable approximations might include:
- How much disk I/O was recorded in the last 60 seconds?
- How many application transactions (e.g. - invoices or such) were
  issued in the last 60 seconds (monitoring a sequence could be
  good enough).

Some way of doing a 'partial vacuum' would be nice ... where a VACUUM 
could stop after it processed those '10 elderly tuples' and on the 
next pass, resume from that point instead of starting from the 
beginning again ...

That is sorta what the vacuum delay settings accomplish.


Re: [HACKERS] Increased company involvement

2005-04-29 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Regarding the secret code stuff - I predict that it will quickly bite 
 whoever does it, unless they are extremely lucky.

Yeah.  Bruce and I were worrying about this on the phone today.
If a company is doing some work with the intent that it's a proprietary
extension they can sell, no problem --- the BSD license is specifically
intended to let them do that.  What's bothering us is the thought that
people are off in corners developing code that they think they are going
to contribute back into the community code base after it's done.  Past
history shows that the odds of getting such things accepted into the PG
community code base are *very* bad if you didn't communicate with the
community from the start of your development process.

We'd like to avoid such unpleasant surprises, but how to get the word
out?

regards, tom lane



Re: [pgsql-advocacy] [HACKERS] Increased company involvement

2005-04-29 Thread Kris Jurka


On Fri, 29 Apr 2005, Andrew Dunstan wrote:

 One thing that might help is a more open sponsorship clearing house. 
 Example (not meant as a bid, but just to illustrate): the JDBC driver 
 needs a scanner overhaul - it breaks on dollar quoting and a bunch of 
 other stuff. I could do that work (as could others, of course) but I 
 don't have time, unless someone buys some of my professional time. 
 Someone might want to do just that, but how would they find me?

I don't think this is a big issue.  I don't know of any companies who were
desperate for a feature and willing to throw money at the problem who
couldn't find a developer to take them up on it.  Right now this seems to
be a kind of behind the scenes operation that relies heavily on knowing
the right people, but I think most of our sponsor contact points are able 
to point sponsors to the right people.  Could this process be more open?  
That depends on how the sponsor wants to handle it; they probably don't just 
want to throw the task out there and see who comes calling, but rather want an 
assurance from someone they trust that the chosen developer is capable.

One thing that definitely would be nice would be to be able to combine 
funds from various sponsors for various features.  Alone, a company can't 
spring for it, but by pooling resources it could get done.  This is a lot 
tougher to coordinate, and unless there is a complete spec in place, 
different sponsors will pull in different directions.  Other bounty-type 
schemes don't seem to produce results, largely from a lack of cash 
("Here's $500 for two weeks of work").

Anyone care to shed some light on how it works now?

Kris Jurka
