Re: [PERFORM] WAL logging of SELECT ... INTO command

2006-03-24 Thread Jim C. Nasby
On Wed, Mar 22, 2006 at 02:37:28PM -0500, Kris Jurka wrote:
 
 
 On Wed, 22 Mar 2006, Jim C. Nasby wrote:
 
 Ok, I saw disk activity on the base directory and assumed it was pg_xlog
 stuff. Turns out that both SELECT INTO and CREATE TABLE AS ignore
 default_tablespace and create the new tables in the base directory. I'm
 guessing that's a bug... (this is on 8.1.2, btw).
 
 This has been fixed in CVS HEAD as part of a patch to allow additional 
 options to CREATE TABLE AS.
 
 http://archives.postgresql.org/pgsql-patches/2006-02/msg00211.php

I'll argue that the current behavior is still a bug and should be fixed.
Would it be difficult to patch 8.1 (and 8.0 if there were tablespaces
then...) to honor default_tablespace?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] WAL logging of SELECT ... INTO command

2006-03-24 Thread Jim C. Nasby
On Wed, Mar 22, 2006 at 02:20:39PM +0800, Qingqing Zhou wrote:
 
 Simon Riggs [EMAIL PROTECTED] wrote
  On Tue, 2006-03-21 at 06:22 -0600, Jim C. Nasby wrote:
   Currently, it appears that SELECT * INTO new_table FROM old_table logs
   each page as it's written to WAL. Is this actually needed? Couldn't the
   database simply log that the SELECT ... INTO statement was executed
   instead? Doing so would likely result in a large performance improvement
   in most installs. Is there no provision for writing anything but data
   page changes (or whole pages) to WAL?
 
  AFAIK it takes the same code path as CREATE TABLE AS SELECT, which
  already does exactly what you suggest (except when using PITR).
 
 
 As I read, they did take the same code path, but did they simply log that
 the SELECT ... INTO statement was executed? If so, how can we rely on the
 unreliable content of the old_table to do recovery?

Why would the content of the old_table be unreliable? If we've replayed
logs up to the point of the CTAS then any data that would be visible to
the CTAS should be fine, no?

Though, the way Tom put it in one of his replies it sounds like WAL
doesn't do any kind of statement logging, only data logging. If that's
the case I'm not sure that the CTAS would actually get replayed. But I
suspect I'm just misunderstanding...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Jim C. Nasby
On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
 Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Scott Marlowe) 
 wrote:
  On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
   Has someone been working on the problem of splitting a query into pieces
   and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
   done that.  
  
  I believe that is limited to Bizgress MPP yes?
 
  Yep.  I hope that someday it will be released to the postgresql global
  dev group for inclusion.  Or at least parts of it.
 
 Question: Does the Bizgress/MPP use threading for this concurrency?
 Or forking?
 
 If it does so via forking, that's more portable, and less dependent on
 specific complexities of threading implementations (which amounts to
 non-portability ;-)).
 
 Most times Jan comes to town, we spend a few minutes musing about the
 "splitting queries across threads" problem, and dismiss it again; if
 there's the beginning of a split across processes, that's decidedly
 neat :-).

Correct me if I'm wrong, but there's no way to (reasonably) accomplish
that without having some dedicated extra processes lying around that
you can use to execute the queries, no? In other words, the cost of a
fork() during query execution would be prohibitive...

FWIW, DB2 executes all queries in a dedicated set of processes. The
process handling the connection from the client will pass a query
request off to one of the executor processes. I can't remember which
process actually plans the query, but I know that the executor runs it.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Postmaster using only 4-5% CPU

2006-03-24 Thread Jim C. Nasby
On Thu, Mar 23, 2006 at 10:14:24AM +0100, Edoardo Serra wrote:
 Now, for the interesting test.  Run the import on both machines, with
 the begin; commit; pairs around it.  Halfway through the import, pull
 the power cord, and see which one comes back up.  Don't do this to
 servers with data you like, only test machines, obviously.  For an even
 more interesting test, do this with MySQL, Oracle, DB2, etc...
 
 I will surely run a test like this ;)

If you do, I'd be *very* interested in the results. Pervasive would
probably pay for a whitepaper about this, btw (see
http://www.pervasivepostgres.com/postgresql/partners_in_publishing.asp).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Indexes with descending date columns

2006-03-24 Thread Jim C. Nasby
On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote:
 ii If no to i, is it feasible to extend PostgreSQL to allow traversing
an index in column descending and column ascending order - assuming
an order by on more than one column with column order not 
in the same direction and indexes existing? ... if that makes sense.

Yes.

stats=# explain select * from email_contrib order by project_id desc, id desc,
date desc limit 10;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=0.00..31.76 rows=10 width=24)
   ->  Index Scan Backward using email_contrib_pkey on email_contrib
         (cost=0.00..427716532.18 rows=134656656 width=24)
(2 rows)
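For the mixed-direction case in (ii), where 8.1 index columns can't be declared
descending, a common workaround is to index a negated expression so that every
key sorts in the same direction. A sketch with illustrative names (t, a, b),
assuming b is an integer column:

```sql
-- Illustrative only: make ORDER BY a DESC, b ASC satisfiable by one
-- index, by indexing -b so both keys run in the same direction.
CREATE INDEX t_a_negb ON t (a, (-b));

-- Phrase the ORDER BY to match the index; a backward scan of
-- t_a_negb then yields a DESC, b ASC order.
SELECT * FROM t ORDER BY a DESC, (-b) DESC LIMIT 10;
```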

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Problem with query, server totally unresponsive

2006-03-24 Thread Jim C. Nasby
On Thu, Mar 23, 2006 at 01:12:08PM +0100, Bendik Rognlien Johansen wrote:
 Hello, I have a big problem with one of my databases. When I run my
 query, after a few minutes, the postmaster shows 99% mem in top, and
 the server becomes totally unresponsive.

You've got a bunch of sorts going on; could you be pushing the machine
into swapping?

 I get this message when I try to cancel the query:
 
 server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.
 
Did you send a kill of some kind to the backend?
 
 The machine has 2x Intel dual core processors (3GHz) and 2 Gigs of ram.

Unless I missed some big news recently, no such CPU exists.
Hyperthreading is absolutely not the same as dual core, and many people
have found that it's best to disable hyperthreading on database servers.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] WAL logging of SELECT ... INTO command

2006-03-24 Thread Alvaro Herrera
Jim C. Nasby wrote:

 Why would the content of the old_table be unreliable? If we've replayed
 logs up to the point of the CTAS then any data that would be visible to
 the CTAS should be fine, no?
 
 Though, the way Tom put it in one of his replies it sounds like WAL
 doesn't do any kind of statement logging, only data logging. If that's
 the case I'm not sure that the CTAS would actually get replayed. But I
 suspect I'm just misunderstanding...

The CTAS doesn't get logged (nor replayed obviously).  What happens is
that the involved files are fsync'ed before transaction commit, AFAIR.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Array performance

2006-03-24 Thread Ruben Rubio Rey

Hi,

I have a select like

SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total
FROM table
WHERE
(array[20]+array[21]+ ... +array[50]+array[51]) > 5000
AND array[20] <> 0
AND array[21] <> 0
...
AND array[50] <> 0
AND array[51] <> 0

Any ideas to make this query faster?

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Array performance

2006-03-24 Thread Jim C. Nasby
On Fri, Mar 24, 2006 at 01:41:50PM +0100, Ruben Rubio Rey wrote:
 Hi,
 
 I have a select like
 
 SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total
 FROM table
 WHERE
 (array[20]+array[21]+ ... +array[50]+array[51]) > 5000

http://www.varlena.com/GeneralBits/109.php might provide some useful
insights. I also recall seeing something about sum operators for arrays,
but I can't recall where.

 AND array[20] <> 0
 AND array[21] <> 0
 ...
 AND array[50] <> 0
 AND array[51] <> 0

Uhm... please don't tell me that you're using 0 in place of NULL...

You might be able to greatly simplify that by use of ANY; you'd need to
ditch elements 1-19 though:

... WHERE NOT ANY(array) = 0

See http://www.postgresql.org/docs/8.1/interactive/arrays.html
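For what it's worth, ANY/ALL only parse on the right-hand side of an operator
(see the "limitation using LIKE on ANY(array)" thread), so the test would need
to be spelled roughly like this sketch, with the column name assumed:

```sql
-- Sketch, assuming the array column is named "arr": "no element is 0",
-- with the array on the right-hand side as the parser requires.
... WHERE 0 <> ALL (arr)
```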
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Performance problems with multiple layers of functions

2006-03-24 Thread Jim C. Nasby
On Fri, Mar 24, 2006 at 01:49:17PM +0100, Svenne Krap wrote:
 explain select dataset_id, entity, sum(amount) from entrydata_current 
 where  flow_direction in (select * from outflow_direction(dataset_id)) 
 and dataset_id in (select * from get_dataset_ids(122)) group by 
 dataset_id, entity;
snip 
 which does not return within 10 minutes - which is unacceptable.


The issue is that the planner has no way to know what's coming back
from get_dataset_ids.

I think your best bet will be to wrap that select into its own function
and have that function prepare the query statement, going back to
hard-coded values. So you could do something like:

SQL := 'SELECT ... AND dataset_id IN (''' || get_dataset_ids(122) ||
''');' (yeah, I know that won't work as written, but you get the idea).
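A hedged sketch of that approach as a set-returning plpgsql function —
get_dataset_ids and entrydata_current come from the thread, while the function
name and the column types are illustrative assumptions:

```sql
-- Sketch only: collapse get_dataset_ids() to a literal IN-list, then
-- EXECUTE the query so it is planned with the constants visible.
CREATE OR REPLACE FUNCTION sum_for_dataset(p_id integer)
RETURNS SETOF record AS $$
DECLARE
    ids text;
    r   record;
BEGIN
    -- flatten the set-returning function's output into '1,2,3,...'
    SELECT array_to_string(ARRAY(SELECT * FROM get_dataset_ids(p_id)), ',')
      INTO ids;
    FOR r IN EXECUTE
        'SELECT dataset_id, entity, sum(amount) FROM entrydata_current '
        || 'WHERE dataset_id IN (' || ids || ') '
        || 'GROUP BY dataset_id, entity'
    LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
```

It would be called with a column definition list, e.g. SELECT * FROM
sum_for_dataset(122) AS (dataset_id int, entity int, total numeric) — the
types there are guesses and would need to match the real schema.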
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Array performance

2006-03-24 Thread Ruben Rubio Rey

Jim C. Nasby wrote:


On Fri, Mar 24, 2006 at 01:41:50PM +0100, Ruben Rubio Rey wrote:
 


Hi,

I have a select like

SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total
FROM table
WHERE
(array[20]+array[21]+ ... +array[50]+array[51]) > 5000
   



http://www.varlena.com/GeneralBits/109.php might provide some useful
insights. I also recall seeing something about sum operators for arrays,
but I can't recall where.
 


I'll check it out, seems to be very useful.
Is it faster to create a function to sum the array?

 


AND array[20] <> 0
AND array[21] <> 0
...
AND array[50] <> 0
AND array[51] <> 0
   



Uhm... please don't tell me that you're using 0 in place of NULL...
 

mmm ... I have read in the postgres documentation that null values in arrays 
are not supported ...



You might be able to greatly simplify that by use of ANY; you'd need to
ditch elements 1-19 though:

... WHERE NOT ANY(array) = 0
 


Yep this is much better.


See http://www.postgresql.org/docs/8.1/interactive/arrays.html
 







Re: [PERFORM] Array performance

2006-03-24 Thread Jim C. Nasby
On Fri, Mar 24, 2006 at 02:01:29PM +0100, Ruben Rubio Rey wrote:
 http://www.varlena.com/GeneralBits/109.php might provide some useful
 insights. I also recall seeing something about sum operators for arrays,
 but I can't recall where.
  
 
 I'll check it out, seems to be very useful.
 Is it faster to create a function to sum the array?

There's been talk of having one, but I don't think any such thing
currently exists.
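A small SQL function could stand in for it, though; a sketch, with illustrative
names, assuming a one-dimensional integer array:

```sql
-- Sketch: sum elements lo..hi of a 1-D integer array via
-- generate_series (available since 8.0). Names are illustrative.
CREATE FUNCTION sum_slice(a integer[], lo integer, hi integer)
RETURNS bigint AS
  'SELECT sum(($1)[i]) FROM generate_series($2, $3) AS g(i)'
LANGUAGE sql IMMUTABLE;

-- e.g.:  WHERE sum_slice(arr, 20, 51) > 5000
```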

 AND array[20] <> 0
 AND array[21] <> 0
 ...
 AND array[50] <> 0
 AND array[51] <> 0

 
 
 Uhm... please don't tell me that you're using 0 in place of NULL...
  
 
 mmm ... I have read in the postgres documentation that null values in arrays 
 are not supported ...

Damn, you're right. Another reason I tend to stay away from them...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Array performance

2006-03-24 Thread Michael Fuhr
On Fri, Mar 24, 2006 at 07:06:19AM -0600, Jim C. Nasby wrote:
 On Fri, Mar 24, 2006 at 02:01:29PM +0100, Ruben Rubio Rey wrote:
  mmm ... I have read in the postgres documentation that null values in arrays 
  are not supported ...
 
 Damn, you're right. Another reason I tend to stay away from them...

8.2 will support NULL array elements.

http://archives.postgresql.org/pgsql-committers/2005-11/msg00385.php
http://developer.postgresql.org/docs/postgres/arrays.html

test=# SELECT '{1,2,NULL,3,4}'::integer[];
      int4      
----------------
 {1,2,NULL,3,4}
(1 row)

-- 
Michael Fuhr



Re: [PERFORM] WAL logging of SELECT ... INTO command

2006-03-24 Thread Alvaro Herrera
Jim C. Nasby wrote:
 On Fri, Mar 24, 2006 at 08:39:02AM -0400, Alvaro Herrera wrote:
  Jim C. Nasby wrote:
  
   Why would the content of the old_table be unreliable? If we've replayed
   logs up to the point of the CTAS then any data that would be visible to
   the CTAS should be fine, no?
   
   Though, the way Tom put it in one of his replies it sounds like WAL
   doesn't do any kind of statement logging, only data logging. If that's
   the case I'm not sure that the CTAS would actually get replayed. But I
   suspect I'm just misunderstanding...
  
  The CTAS doesn't get logged (nor replayed obviously).  What happens is
  that the involved files are fsync'ed before transaction commit, AFAIR.
 
 Ahh, yes, that sounds right. Might be a nice gain to be had if there was
 some way to log the statement, but I suspect getting WAL to support that
 would be extremely non-trivial.

None at all, at least in the current incarnation, I think, because said
query execution is dependent on the contents of the FSM, which is itself
dependent on the timing of VACUUM and other stuff.  Such an action,
running with a different FSM content, can very trivially cause data
corruption.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] WAL logging of SELECT ... INTO command

2006-03-24 Thread Jim C. Nasby
On Fri, Mar 24, 2006 at 09:47:20AM -0400, Alvaro Herrera wrote:
 Jim C. Nasby wrote:
  On Fri, Mar 24, 2006 at 08:39:02AM -0400, Alvaro Herrera wrote:
   Jim C. Nasby wrote:
   
Why would the content of the old_table be unreliable? If we've replayed
logs up to the point of the CTAS then any data that would be visible to
the CTAS should be fine, no?

Though, the way Tom put it in one of his replies it sounds like WAL
doesn't do any kind of statement logging, only data logging. If that's
the case I'm not sure that the CTAS would actually get replayed. But I
suspect I'm just misunderstanding...
   
   The CTAS doesn't get logged (nor replayed obviously).  What happens is
   that the involved files are fsync'ed before transaction commit, AFAIR.
  
  Ahh, yes, that sounds right. Might be a nice gain to be had if there was
  some way to log the statement, but I suspect getting WAL to support that
  would be extremely non-trivial.
 
 None at all, at least in the current incarnation, I think, because said
 query execution is dependent on the contents of the FSM, which is itself
 dependent on the timing of VACUUM and other stuff.  Such an action,
 running with a different FSM content, can very trivially cause data
 corruption.

Oh, duh, because subsequent operations will depend on the heap being in
a very specific state. Oh well.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Problem with query, server totally unresponsive

2006-03-24 Thread Dave Dutcher


 From: [EMAIL PROTECTED]
[mailto:pgsql-performance-
 [EMAIL PROTECTED] On Behalf Of Jim C. Nasby
 Subject: Re: [PERFORM] Problem with query, server totally unresponsive
 
 On Thu, Mar 23, 2006 at 01:12:08PM +0100, Bendik Rognlien Johansen
wrote:
  Hello, I have a big problem with one of my databases. When I run my
  query, after a few minutes, the postmaster shows 99% mem in top, and
  the server becomes totally unresponsive.
 
 You've got a bunch of sorts going on; could you be pushing the machine
 into swapping?
 
  I get this message when I try to cancel the query:
 
  server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
  The connection to the server was lost. Attempting reset: Failed.
 
 Did you send a kill of some kind to the backend?
 
  The machine has 2x Intel dual core processors (3GHz) and 2 Gigs of
ram.
 
 Unless I missed some big news recently, no such CPU exists.
 Hyperthreading is absolutely not the same as dual core, and many
people
 have found that it's best to disable hyperthreading on database
servers.

Maybe I'm confused by the marketing, but I think those CPUs do exist.
According to New Egg the Pentium D 830 and the Pentium D 930 both are
dual core Pentiums that run at 3GHz.  It also specifically says these
processors don't support hyperthreading, so I believe they really have
two cores.  Maybe you are thinking he was talking about a 3GHz Core
Duo.

http://www.newegg.com/Product/ProductList.asp?Category=34N=200034+5
0001157+1302820275+1051007392Submit=ENE

Dave





Re: [PERFORM] Problem with query, server totally unresponsive

2006-03-24 Thread Jim C. Nasby
On Fri, Mar 24, 2006 at 08:46:54AM -0600, Dave Dutcher wrote:
   The machine has 2x Intel dual core processors (3GHz) and 2 Gigs of
 ram.
  
  Unless I missed some big news recently, no such CPU exists.
  Hyperthreading is absolutely not the same as dual core, and many
 people
  have found that it's best to disable hyperthreading on database
 servers.
 
 Maybe I'm confused by the marketing, but I think those CPUs do exist.
 According to New Egg the Pentium D 830 and the Pentium D 930 both are
 dual core Pentiums that run at 3Ghz.  It also specifically says these
 processors don't support hyper threading, so I believe they really have
 two cores.   Maybe you are thinking he was talking about a 3Ghz Core
 Duo.

A quick google shows I'm just behind the times; Intel does have true
dual-core CPUs now.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Luke Lonergan
Christopher,

On 3/23/06 6:22 PM, Christopher Browne [EMAIL PROTECTED] wrote:

 Question: Does the Bizgress/MPP use threading for this concurrency?
 Or forking?
 
 If it does so via forking, that's more portable, and less dependent on
 specific complexities of threading implementations (which amounts to
 non-portability ;-)).

OK - I'll byte:

It's process based, we fork backends at slice points in the execution plan.

To take care of the startup latency problem, we persist sets of these
backends, called gangs.  They appear, persist for connection scope for
reuse, then are disbanded.

 Most times Jan comes to town, we spend a few minutes musing about the
 "splitting queries across threads" problem, and dismiss it again; if
 there's the beginning of a split across processes, that's decidedly
 neat :-).

:-)

- Luke





[PERFORM] limitation using LIKE on ANY(array)

2006-03-24 Thread K C Lau



With 8.1.3, I get an error when trying to do this on a Text[] column:
.. WHERE ANY(array) LIKE 'xx%'

Indeed, I get rejected even with:
.. WHERE ANY(array) = 'xx'

In both cases, the error is: ERROR: syntax error at or near
"any" ...

It would only work as documented in the manual (8.10.5):
SELECT * FROM sal_emp WHERE 1 = ANY (pay_by_quarter);

It appears that this restriction is still in place in
8.2:

http://developer.postgresql.org/docs/postgres/arrays.html

Is that the case?
Thanks in advance,
KC.



Re: [PERFORM] Array performance

2006-03-24 Thread Tom Lane
Ruben Rubio Rey [EMAIL PROTECTED] writes:
 SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total
 FROM table
 WHERE
 (array[20]+array[21]+ ... +array[50]+array[51])5000
 AND array[20]0
 AND array[21]0
  ...
 AND array[50]0
 AND array[51])0

 Any ideas to make this query faster?

What's the array datatype?  Integer or float would probably go a lot
faster than NUMERIC, if that's what you're using now.

regards, tom lane



Re: [PERFORM] Performance problems with multiple layers of functions

2006-03-24 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Fri, Mar 24, 2006 at 01:49:17PM +0100, Svenne Krap wrote:
 explain select dataset_id, entity, sum(amount) from entrydata_current 
 where  flow_direction in (select * from outflow_direction(dataset_id)) 
 and dataset_id in (select * from get_dataset_ids(122)) group by 
 dataset_id, entity;

 The issue is that the planner has no way to know what's coming back
 from get_dataset_ids.

More specifically, the first IN is not optimizable into a join because
the results of the sub-SELECT depend on the current row of the outer
query.  The second IN is being optimized fine, but the first one is
what's killing you.

I'd suggest refactoring the functions into something that returns a set
of outflow_direction/dataset_id pairs, and then phrase the query as

where (flow_direction, dataset_id) in (select * from new_func(122))

You could do it without refactoring:

where (flow_direction, dataset_id) in
  (select outflow_direction(id),id from get_dataset_ids(122) id)

however this won't work if outflow_direction() is a plpgsql function
because of limitations in plpgsql's set-function support.  (It will work
if outflow_direction() is a SQL function, or you could kluge it as a SQL
function wrapper around a plpgsql function.)
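The wrapper kluge mentioned above might look like this sketch (the integer
return type is an assumption):

```sql
-- Sketch: a SQL-language wrapper around the plpgsql
-- outflow_direction(), so the set-returning call works in FROM.
CREATE FUNCTION outflow_direction_sql(integer)
RETURNS SETOF integer AS
  'SELECT * FROM outflow_direction($1)'
LANGUAGE sql STABLE;
```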

regards, tom lane



Re: [PERFORM] limitation using LIKE on ANY(array)

2006-03-24 Thread Tom Lane
K C Lau [EMAIL PROTECTED] writes:
 Indeed, I get rejected even with:
 .. WHERE ANY(array) = 'xx'

 It would only work as documented in the manual (8.10.5):
 SELECT * FROM sal_emp WHERE 1 = ANY (pay_by_quarter);

That's not changing any time soon; the SQL spec defines only the second
syntax for ANY, and I believe there would be syntactic ambiguity if we
tried to allow the other.

 With 8.1.3, I get an error when trying to do this on a Text[] column :
 .. WHERE ANY(array) LIKE 'xx%'

If you're really intent on doing that, make an operator for reverse
LIKE and use it with the ANY on the right-hand side.

regression=# create function rlike(text,text) returns bool as
regression-# 'select $2 like $1' language sql strict immutable;
CREATE FUNCTION
regression=# create operator ~~~ (procedure = rlike, leftarg = text,
regression(# rightarg = text, commutator = ~~);
CREATE OPERATOR
regression=# select 'xx%' ~~~ any(array['aaa','bbb']);
 ?column?
----------
 f
(1 row)

regression=# select 'xx%' ~~~ any(array['aaa','xxb']);
 ?column?
----------
 t
(1 row)

regression=#

regards, tom lane



Re: [PERFORM] Postmaster using only 4-5% CPU

2006-03-24 Thread Scott Marlowe
On Fri, 2006-03-24 at 04:16, Jim C. Nasby wrote:
 On Thu, Mar 23, 2006 at 10:14:24AM +0100, Edoardo Serra wrote:
  Now, for the interesting test.  Run the import on both machines, with
  the begin; commit; pairs around it.  Halfway through the import, pull
  the power cord, and see which one comes back up.  Don't do this to
  servers with data you like, only test machines, obviously.  For an even
  more interesting test, do this with MySQL, Oracle, DB2, etc...
  
  I will surely run a test like this ;)
 
 If you do, I'd be *very* interested in the results. Pervasive would
 probably pay for a whitepaper about this, btw (see
 http://www.pervasivepostgres.com/postgresql/partners_in_publishing.asp).

Hehe.  good luck with it.

At the last company I worked at I was the PostgreSQL DBA, and I could
not get one single Oracle, DB2, MySQL, MSSQL, Ingres, or other DBA to
agree to that kind of test.

6 months later, when all three power conditioners blew at once (amazing
what a 1/4 piece of wire can do, eh?) and we lost all power in our
hosting center, there was one, and only one, database server that came
back up without errors, and we know which one that was.  No other
database there was up in less than 2 hours.  So, I wandered the floor
watching the folks panic who were trying to bring their systems back
up.  

And you know what?  They still didn't want to test their systems for
recovery from a power loss situation.



Re: [PERFORM] WAL logging of SELECT ... INTO command

2006-03-24 Thread Kris Jurka



On Fri, 24 Mar 2006, Jim C. Nasby wrote:


On Wed, Mar 22, 2006 at 02:37:28PM -0500, Kris Jurka wrote:


On Wed, 22 Mar 2006, Jim C. Nasby wrote:


Ok, I saw disk activity on the base directory and assumed it was pg_xlog
stuff. Turns out that both SELECT INTO and CREATE TABLE AS ignore
default_tablespace and create the new tables in the base directory. I'm
guessing that's a bug... (this is on 8.1.2, btw).


This has been fixed in CVS HEAD as part of a patch to allow additional
options to CREATE TABLE AS.

http://archives.postgresql.org/pgsql-patches/2006-02/msg00211.php


I'll argue that the current behavior is still a bug and should be fixed.
Would it be difficult to patch 8.1 (and 8.0 if there were tablespaces
then...) to honor default_tablespace?


Here are patches that fix this for 8.0 and 8.1.

Kris Jurka

Index: src/backend/executor/execMain.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.241.4.2
diff -c -r1.241.4.2 execMain.c
*** src/backend/executor/execMain.c	12 Jan 2006 21:49:17 -0000	1.241.4.2
--- src/backend/executor/execMain.c	24 Mar 2006 18:05:53 -0000
***************
*** 36,41 ****
--- 36,42 ----
  #include "catalog/heap.h"
  #include "catalog/namespace.h"
  #include "commands/tablecmds.h"
+ #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "executor/execdebug.h"
  #include "executor/execdefs.h"
***************
*** 731,736 ****
--- 732,738 ----
  {
      char       *intoName;
      Oid         namespaceId;
+     Oid         tablespaceId;
      AclResult   aclresult;
      Oid         intoRelationId;
      TupleDesc   tupdesc;
***************
*** 747,752 ****
--- 749,764 ----
      aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
                     get_namespace_name(namespaceId));
  
+     tablespaceId = GetDefaultTablespace();
+     if (OidIsValid(tablespaceId)) {
+         aclresult = pg_tablespace_aclcheck(tablespaceId, GetUserId(),
+                                            ACL_CREATE);
+ 
+         if (aclresult != ACLCHECK_OK)
+             aclcheck_error(aclresult, ACL_KIND_TABLESPACE,
+                            get_tablespace_name(tablespaceId));
+     }
+ 
/*
 * have to copy tupType to get rid of constraints
 */
***
*** 754,760 
  
intoRelationId = heap_create_with_catalog(intoName,

  namespaceId,
!   
  InvalidOid,

  tupdesc,

  RELKIND_RELATION,

  false,
--- 766,772 
  
intoRelationId = heap_create_with_catalog(intoName,

  namespaceId,
!   
  tablespaceId,

  tupdesc,

  RELKIND_RELATION,

  false,
Index: src/backend/executor/execMain.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.256.2.5
diff -c -r1.256.2.5 execMain.c
*** src/backend/executor/execMain.c	12 Jan 2006 21:49:06 -0000	1.256.2.5
--- src/backend/executor/execMain.c	24 Mar 2006 17:57:11 -0000
***************
*** 37,42 ****
--- 37,43 ----
  #include "catalog/heap.h"
  #include "catalog/namespace.h"
  #include "commands/tablecmds.h"
+ #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "executor/execdebug.h"
  #include "executor/execdefs.h"
***************
*** 737,742 ****
--- 738,744 ----
  {
      char       *intoName;
      Oid         namespaceId;
+     Oid         tablespaceId;
      AclResult   aclresult;
      Oid         intoRelationId;
      TupleDesc   tupdesc;
***
*** 

Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Chris Browne
[EMAIL PROTECTED] (Jim C. Nasby) writes:
 On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
 Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Scott Marlowe) 
 wrote:
  On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
   Has someone been working on the problem of splitting a query into pieces
   and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
   done that.  
  
  I believe that is limited to Bizgress MPP yes?
 
  Yep.  I hope that someday it will be released to the postgresql global
  dev group for inclusion.  Or at least parts of it.
 
 Question: Does the Bizgress/MPP use threading for this concurrency?
 Or forking?
 
 If it does so via forking, that's more portable, and less dependent on
 specific complexities of threading implementations (which amounts to
 non-portability ;-)).
 
 Most times Jan comes to town, we spend a few minutes musing about the
 "splitting queries across threads" problem, and dismiss it again; if
 there's the beginning of a split across processes, that's decidedly
 neat :-).

 Correct me if I'm wrong, but there's no way to (reasonably) accomplish
 that without having some dedicated extra processes lying around that
 you can use to execute the queries, no? In other words, the cost of a
 fork() during query execution would be prohibitive...

Counterexample...

The sort of scenario we keep musing about is where you split off a
(thread|process) for each partition of a big table.  There is in fact
a natural such partitioning, in that tables get split at the 1GB mark,
by default.

Consider doing a join against 2 tables that are each 8GB in size
(e.g. - they consist of 8 data files).  Let's assume that the query
plan indicates doing seq scans on both.

You *know* you'll be reading through 16 files, each 1GB in size.
Spawning a process for each of those files doesn't strike me as
prohibitively expensive.

A naive read on this is that you might start with one backend process,
which then spawns 16 more.  Each of those backends is scanning through
one of those 16 files; they then throw relevant tuples into shared
memory to be aggregated/joined by the central one.

That particular scenario is one where the fork()s would hardly be
noticeable.

 FWIW, DB2 executes all queries in a dedicated set of processes. The
 process handling the connection from the client will pass a query
 request off to one of the executor processes. I can't remember which
 process actually plans the query, but I know that the executor runs
 it.

It seems to me that the kinds of cases where extra processes/threads
would be warranted are quite likely to be cases where fork()ing may be
an immaterial cost.
-- 
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/languages.html
TECO Madness: a moment of convenience, a lifetime of regret.
-- Dave Moon



Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Chris Browne
[EMAIL PROTECTED] (Luke Lonergan) writes:
 Christopher,

 On 3/23/06 6:22 PM, Christopher Browne [EMAIL PROTECTED] wrote:

 Question: Does the Bizgress/MPP use threading for this concurrency?
 Or forking?
 
 If it does so via forking, that's more portable, and less dependent on
 specific complexities of threading implementations (which amounts to
 non-portability ;-)).

 OK - I'll byte:

 It's process based, we fork backends at slice points in the execution plan.

By slice points, do you mean that you'd try to partition tables
(e.g. - if there's a Seq Scan on a table with 8 1GB segments, you
could spawn as many as 8 processes), or that two scans that are then
merge joined means a process for each scan, and a process for the
merge join?  Or perhaps both :-).  Or perhaps something else entirely ;-).

 To take care of the startup latency problem, we persist sets of
 these backends, called gangs.  They appear, persist for connection
 scope for reuse, then are disbanded.

If only that could happen to more gangs...
-- 
output = (cbbrowne @ cbbrowne.com)
http://cbbrowne.com/info/multiplexor.html
I'm sorry, the teleportation booth you have reached is not in service
at this  time.   Please  hand-reassemble  your molecules or   call  an
operator to help you

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Performance problems with multiple layers of functions

2006-03-24 Thread Svenne Krap

Tom Lane wrote:

where (flow_direction, dataset_id) in (select * from new_func(122))
  


Is this form of multi-column IN mentioned anywhere in the docs? I can't 
find it.


Svenne




Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Michael Stone

On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:

A naive read on this is that you might start with one backend process,
which then spawns 16 more.  Each of those backends is scanning through
one of those 16 files; they then throw relevant tuples into shared
memory to be aggregated/joined by the central one.


Of course, table scanning is going to be IO limited in most cases, and 
having every query spawn 16 independent IO threads is likely to slow 
things down in more cases than it speeds them up. It could work if you 
have a bunch of storage devices, but at that point it's probably easier 
and more direct to implement a clustered approach.


Mike Stone

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance problems with multiple layers of functions

2006-03-24 Thread Tom Lane
Svenne Krap [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 where (flow_direction, dataset_id) in (select * from new_func(122))

 Is this form of multi-column IN mentioned anywhere in the docs? I can't 
 find it.

Sure, look under Subquery Expressions.  8.0 and later refer to it as a
row_constructor, but it's documented at least as far back as 7.3.
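
A minimal illustration of the row-constructor form, with a spelled-out
EXISTS equivalent (table and column names here are hypothetical; `new_func`
is assumed to return the two columns being compared):

```sql
-- Row-constructor IN: true when some subquery row matches column-by-column.
SELECT id
FROM   flows                                   -- hypothetical table
WHERE  (flow_direction, dataset_id) IN (SELECT * FROM new_func(122));

-- The same condition spelled out with EXISTS:
SELECT id
FROM   flows f
WHERE  EXISTS (SELECT 1
               FROM   new_func(122) AS n(direction, dataset)
               WHERE  n.direction = f.flow_direction
               AND    n.dataset   = f.dataset_id);
```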

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Jim C. Nasby
On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
  Correct me if I'm wrong, but there's no way to (reasonably) accomplish
  that without having some dedicated extra processes laying around that
  you can use to execute the queries, no? In other words, the cost of a
  fork() during query execution would be too prohibitive...
 
 Counterexample...
 
 The sort of scenario we keep musing about is where you split off a
 (thread|process) for each partition of a big table.  There is in fact
 a natural such partitioning, in that tables get split at the 1GB mark,
 by default.
 
 Consider doing a join against 2 tables that are each 8GB in size
 (e.g. - they consist of 8 data files).  Let's assume that the query
 plan indicates doing seq scans on both.
 
 You *know* you'll be reading through 16 files, each 1GB in size.
 Spawning a process for each of those files doesn't strike me as
 prohibitively expensive.

Have you ever tried reading from 2 large files on a disk at the same
time, let alone 16? The results ain't pretty.

What you're suggesting maybe makes sense if the two tables are in
different tablespaces, provided you have some additional means to know
if those two tablespaces are on the same set of spindles. Though even
here the usefulness is somewhat suspect, because CPU is a hell of a lot
faster than disks are, unless you have a whole lot of disks. Of course,
this is exactly the target market for MPP.

Where parallel execution really makes sense is when you're doing things
like sorts or hash operations, because those are relatively
CPU-intensive.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-24 Thread Chris Browne
[EMAIL PROTECTED] (Michael Stone) writes:

 On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
A naive read on this is that you might start with one backend process,
which then spawns 16 more.  Each of those backends is scanning through
one of those 16 files; they then throw relevant tuples into shared
memory to be aggregated/joined by the central one.

 Of course, table scanning is going to be IO limited in most cases, and
 having every query spawn 16 independent IO threads is likely to slow
 things down in more cases than it speeds them up. It could work if you
 have a bunch of storage devices, but at that point it's probably
 easier and more direct to implement a clustered approach.

All stipulated, yes.  It obviously wouldn't be terribly useful to scan
more aggressively than I/O bandwidth can support.  The point is that
this is one of the kinds of places where concurrent processing could
do some good...
-- 
let name = "cbbrowne" and tld = "acm.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/spiritual.html
Save the whales. Collect the whole set. 



[PERFORM] Query plan from hell

2006-03-24 Thread PFC


Whoa!

	bookmark_delta contains very few rows but is inserted/deleted very
often... the effect is spectacular!

I guess I'll have to vacuum analyze this table every minute...


annonces=# EXPLAIN ANALYZE SELECT id, priority FROM annonces WHERE id IN  
(SELECT annonce_id FROM bookmark_delta);

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=32.12..8607.08 rows=1770 width=6) (actual time=387.011..387.569 rows=1 loops=1)
   Hash Cond: ("outer".id = "inner".annonce_id)
   ->  Seq Scan on annonces  (cost=0.00..7796.00 rows=101500 width=6) (actual time=0.022..164.369 rows=101470 loops=1)
   ->  Hash  (cost=27.70..27.70 rows=1770 width=4) (actual time=0.013..0.013 rows=5 loops=1)
         ->  Seq Scan on bookmark_delta  (cost=0.00..27.70 rows=1770 width=4) (actual time=0.004..0.010 rows=5 loops=1)
 Total runtime: 387.627 ms
(6 lignes)

annonces=# EXPLAIN ANALYZE SELECT id, priority FROM annonces a, (SELECT  
annonce_id FROM bookmark_delta GROUP BY annonce_id) foo WHERE  
a.id=foo.annonce_id;

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=32.12..10409.31 rows=1770 width=6) (actual time=0.081..0.084 rows=1 loops=1)
   ->  HashAggregate  (cost=32.12..49.83 rows=1770 width=4) (actual time=0.038..0.040 rows=1 loops=1)
         ->  Seq Scan on bookmark_delta  (cost=0.00..27.70 rows=1770 width=4) (actual time=0.024..0.027 rows=5 loops=1)
   ->  Index Scan using annonces_pkey on annonces a  (cost=0.00..5.83 rows=1 width=6) (actual time=0.039..0.040 rows=1 loops=1)
         Index Cond: (a.id = "outer".annonce_id)
 Total runtime: 0.163 ms
(6 lignes)

annonces=# vacuum bookmark_delta ;
VACUUM
annonces=# EXPLAIN ANALYZE SELECT id, priority FROM annonces WHERE id IN  
(SELECT annonce_id FROM bookmark_delta);

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=32.12..8607.08 rows=1770 width=6) (actual time=195.284..196.063 rows=1 loops=1)
   Hash Cond: ("outer".id = "inner".annonce_id)
   ->  Seq Scan on annonces  (cost=0.00..7796.00 rows=101500 width=6) (actual time=0.014..165.626 rows=101470 loops=1)
   ->  Hash  (cost=27.70..27.70 rows=1770 width=4) (actual time=0.008..0.008 rows=2 loops=1)
         ->  Seq Scan on bookmark_delta  (cost=0.00..27.70 rows=1770 width=4) (actual time=0.003..0.004 rows=2 loops=1)
 Total runtime: 196.122 ms
(6 lignes)

annonces=# vacuum analyze bookmark_delta ;
VACUUM
annonces=# EXPLAIN ANALYZE SELECT id, priority FROM annonces WHERE id IN  
(SELECT annonce_id FROM bookmark_delta);

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.02..6.88 rows=1 width=6) (actual time=0.025..0.027 rows=1 loops=1)
   ->  HashAggregate  (cost=1.02..1.03 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1)
         ->  Seq Scan on bookmark_delta  (cost=0.00..1.02 rows=2 width=4) (actual time=0.004..0.006 rows=2 loops=1)
   ->  Index Scan using annonces_pkey on annonces  (cost=0.00..5.83 rows=1 width=6) (actual time=0.009..0.010 rows=1 loops=1)
         Index Cond: (annonces.id = "outer".annonce_id)
 Total runtime: 0.104 ms
(6 lignes)
