Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Gavin Sherry wrote:
> On Thu, 14 Aug 2003, Stephan Szabo wrote:
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> Perhaps it would be easier to allow SKIP VALIDATION (or something) with
> ALTER TABLE ADD, which can set FkConstraint->skip_validation. If we're
> just handling pg_dump output, then presumably the data is already
> validated. On the other hand, it might encourage users to bypass FKs when
> they feel like it...

We've talked about stuff like that in the past, but we seem to generally get stuck on how to specify it. If we add it to ALTER TABLE ADD as an option, then we're generating statements that are almost like a standard SQL statement, but not quite, and some people didn't like that. A SET option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad, but I'd be worried about making it apply to the normal checks as well (which I believe was suggested the last time this came up). In any case, making the full checks faster for when you really do care isn't a bad plan. :)
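[Archive note: a switch along exactly these lines did appear in much later PostgreSQL releases as NOT VALID / VALIDATE CONSTRAINT. A sketch of that later syntax, with hypothetical table and constraint names; none of this exists in 7.4:]

    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id)
        NOT VALID;    -- skip the initial whole-table validation scan

    -- later, when convenient, perform the full check:
    ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;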
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > I can also attest to the horrendously long time it takes to restore
> > > the ADD FOREIGN KEY section...
> >
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> Surely in the default case it would reduce to using the new hashed IN()
> feature, so it'd be a lot faster?

If we wrote the query using IN, that'd be the hope (I've not played with it enough to guarantee that). However, a simple test comparing

    select * from fk
    where not exists (select * from pk where pk.key = fk.key)
      and key is not null;

(doing a seq scan, with the subplan doing an index scan - which is probably close to the current system) against

    select * from fk
    where key in (select key from pk)
      and key is not null;

on a pk table with 100k rows and an fk table with 1m rows gives me a difference of about 2x on my machine. But that's with a single-column int4 key; I haven't tried multi-column keys or larger key types.
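[Archive note: a minimal way to reproduce this comparison on a current PostgreSQL; generate_series did not exist in 7.4, and the table and column names simply follow the queries above:]

    create table pk (key int4 primary key);
    create table fk (key int4);

    -- 100k referenced rows, 1m referencing rows
    insert into pk select g from generate_series(1, 100000) g;
    insert into fk select (g % 100000) + 1 from generate_series(1, 1000000) g;
    analyze pk;
    analyze fk;

    explain analyze
    select * from fk
    where not exists (select * from pk where pk.key = fk.key)
      and key is not null;

    explain analyze
    select * from fk
    where key in (select key from pk)
      and key is not null;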
Re: [HACKERS] [GENERAL] 7.4Beta
On Thu, 14 Aug 2003, Stephan Szabo wrote:
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.

Perhaps it would be easier to allow SKIP VALIDATION (or something) with ALTER TABLE ADD, which can set FkConstraint->skip_validation. If we're just handling pg_dump output, then presumably the data is already validated. On the other hand, it might encourage users to bypass FKs when they feel like it...

Thanks,

Gavin
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > 1. Add the FK to the table BEFORE COPYing data
> > > 2. Use the old "update blah set reltriggers = 0" trick
> > > 3. Restore the data
> > > 4. Undo step 2
> >
> > The problem with that is that I think the reltriggers=0 trick only
> > works if you're superuser; I thought that's why the trigger disabling
> > became optional. A SET that affected only ALTER TABLE ADD CONSTRAINT
> > would probably be reasonable in the dump (and presumably easy to do).
> > It'd also carry over to future cases where we separate out some check
> > constraints (for example, ones that refer directly or indirectly to the
> > same table in a subselect).
>
> Well yes, this would be a super-user only ability. Are you worried about
> people restoring dumps as non-superuser?

Basically, yes (you might dump only some tables that you own, for example). I think that's why the data-only dumps no longer do the reltriggers thing by default and you need an option to get at it. We could make a similar option for this case, but it'd only work when restored by a superuser.
Re: [HACKERS] [GENERAL] 7.4Beta
> > I can also attest to the horrendously long time it takes to restore the
> > ADD FOREIGN KEY section...
>
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.

Surely in the default case it would reduce to using the new hashed IN() feature, so it'd be a lot faster?

Chris
Re: [HACKERS] [GENERAL] 7.4Beta
> > 1. Add the FK to the table BEFORE COPYing data
> > 2. Use the old "update blah set reltriggers = 0" trick
> > 3. Restore the data
> > 4. Undo step 2
>
> The problem with that is that I think the reltriggers=0 trick only works
> if you're superuser; I thought that's why the trigger disabling became
> optional. A SET that affected only ALTER TABLE ADD CONSTRAINT would
> probably be reasonable in the dump (and presumably easy to do). It'd also
> carry over to future cases where we separate out some check constraints
> (for example, ones that refer directly or indirectly to the same table in
> a subselect).

Well yes, this would be a super-user only ability. Are you worried about people restoring dumps as non-superuser?

Chris
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > We've talked about stuff like that in the past, but we seem to
> > generally get stuck on how to specify it. If we add it to ALTER TABLE
> > ADD as an option, then we're generating statements that are almost like
> > a standard SQL statement, but not quite, and some people didn't like
> > that. A SET option that only affected ALTER TABLE ADD CONSTRAINT
> > wouldn't be too bad, but I'd be worried about making it apply to the
> > normal checks as well (which I believe was suggested the last time this
> > came up). In any case, making the full checks faster for when you
> > really do care isn't a bad plan. :)
>
> Or, this:
>
> 1. Add the FK to the table BEFORE COPYing data
> 2. Use the old "update blah set reltriggers = 0" trick
> 3. Restore the data
> 4. Undo step 2

The problem with that is that I think the reltriggers=0 trick only works if you're superuser; I thought that's why the trigger disabling became optional. A SET that affected only ALTER TABLE ADD CONSTRAINT would probably be reasonable in the dump (and presumably easy to do). It'd also carry over to future cases where we separate out some check constraints (for example, ones that refer directly or indirectly to the same table in a subselect).
Re: [HACKERS] [GENERAL] 7.4Beta
> We've talked about stuff like that in the past, but we seem to generally
> get stuck on how to specify it. If we add it to ALTER TABLE ADD as an
> option, then we're generating statements that are almost like a standard
> SQL statement, but not quite, and some people didn't like that. A SET
> option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
> but I'd be worried about making it apply to the normal checks as well
> (which I believe was suggested the last time this came up). In any case,
> making the full checks faster for when you really do care isn't a bad
> plan. :)

Or, this:

1. Add the FK to the table BEFORE COPYing data
2. Use the old "update blah set reltriggers = 0" trick
3. Restore the data
4. Undo step 2

Chris
Re: [HACKERS] [GENERAL] 7.4Beta
> We've talked about stuff like that in the past, but we seem to generally
> get stuck on how to specify it. If we add it to ALTER TABLE ADD as an
> option, then we're generating statements that are almost like a standard
> SQL statement, but not quite, and some people didn't like that. A SET
> option that only affected ALTER TABLE ADD CONSTRAINT wouldn't be too bad,
> but I'd be worried about making it apply to the normal checks as well
> (which I believe was suggested the last time this came up). In any case,
> making the full checks faster for when you really do care isn't a bad
> plan. :)

How about having an "I'm restoring" SET var:

    SET data_restore = true;

which means (among other things) that FKs aren't checked?

Chris
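[Archive note: later PostgreSQL releases grew a closely related restore-time knob: session_replication_role (8.3 and up), which restore tooling can set so that the row-level FK triggers simply don't fire for the session. A sketch with a hypothetical table name; it is superuser-only and skips the checks outright rather than deferring them:]

    BEGIN;
    SET LOCAL session_replication_role = replica;  -- FK triggers don't fire
    COPY orders FROM '/path/to/orders.dat';
    COMMIT;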
Re: [HACKERS] [GENERAL] 7.4Beta
Stephan Szabo wrote:
> On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > I threw last night's backup at it. Data went in in about half an
> > > hour; then the constraints went in, and they took an age: about 2
> > > hours. Is there any way to speed up the database constraint code?
> > > Because quite frankly, at the current speed you're probably better
> > > off without the constraints. (Same problem with 7.3, come to think
> > > of it.)
> >
> > I can also attest to the horrendously long time it takes to restore the
> > ADD FOREIGN KEY section...
>
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.

This is clearly a case for a statement-level trigger, as soon as affected rows can be identified.

One remark on that enable/disable triggers stuff: from a user's perspective, I wouldn't consider a constraint trigger to be a trigger, so if I disabled all triggers on a table, I would still expect all constraints to be checked.

Regards,
Andreas
Re: [HACKERS] 7.4 beta 1 getting out of swap
Joe Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
> > 3. Set up a long-lived cache internal to the array functions that can
> > translate element type OID to the needed lookup data, and won't leak
> > memory across repeated calls. This is not the fastest or most general
> > solution, but it seems the most localized and safest fix.
>
> It seems to me that #3 is the least risky, and even if it isn't the best
> possible performance, this is the initial implementation of indexes on
> arrays, so it isn't like we're taking away something. Maybe solution #2
> is better held as a performance enhancement for 7.5.

I'm leaning that way too. It occurs to me also that the same cache could be used to eliminate repeated lookups in sorting setup --- which would not be much of a win percentagewise, compared to the sort itself, but still it seems worth doing.

> Do you want me to take a shot at this since I created the mess?

Actually I led you down the garden path on that, IIRC --- I was the one who insisted these lookups needed to be cached. I'll work on fixing it.

regards, tom lane
ADD FOREIGN KEY (was Re: [HACKERS] [GENERAL] 7.4Beta)
Stephan Szabo <[EMAIL PROTECTED]> writes:
> On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > I can also attest to the horrendously long time it takes to restore the
> > ADD FOREIGN KEY section...
>
> That really needs to be rewritten to do a single check over the table
> rather than running the constraint for every row. I keep meaning to get
> around to it and never actually do. :( I'm not sure that in practice
> you'll get a better plan at restore time depending on what the default
> statistics give you.

In simple cases I think that the creation of indexes would be enough to get you a passable plan --- CREATE INDEX does update pg_class.reltuples, so the planner will know how big the tables are, and for single-column primary keys the existence of a unique index is enough to cue the planner that the column is unique, even without any ANALYZE stats. Those are the biggest levers on the plan choice. This assumes that pg_dump always dumps CREATE INDEX before ADD FOREIGN KEY; I'm not certain if there's anything to enforce that at the moment...

I assume what you have in mind is to replace validateForeignKeyConstraint() with something that does a join of the two tables via an SPI command. But supposing that we want to keep the present ability to report (one of) the failing key values, it seems like the query has to look like

    SELECT keycolumns FROM referencing_table
    WHERE keycolumns NOT IN (SELECT refcols FROM referenced_table);

which is only gonna do the right thing for one of the MATCH styles (not sure which, offhand ... actually it may not do the right thing for any match style if there are nulls in referenced_table ...). How would you make it work for all the MATCH styles? And will it really be all that efficient? (NOT IN is a lot more circumscribed than IN.)

regards, tom lane
Re: [HACKERS] [GENERAL] 7.4Beta
Stephan Szabo <[EMAIL PROTECTED]> writes:
> select * from fk
> where not exists (select * from pk where pk.key = fk.key)
>   and key is not null;
>
> (doing a seq scan, with the subplan doing an index scan - which is
> probably close to the current system)

Actually, even that would probably be noticeably better than the current system. I haven't profiled it (someone should), but I suspect that executor startup/shutdown time is a huge hit. Even though the trigger is caching a plan, it has to instantiate that plan for each referencing tuple --- and the executor is not designed for quick startup/shutdown. (Of course, this would become less relevant if the triggers got rewritten to not go through SPI ...)

regards, tom lane
Re: [HACKERS] New function: epoch_to_timestamp...
David Fetter <[EMAIL PROTECTED]> writes:
> Is there a reasonable way to add an optional param or two to mark time
> zones, or should that just be the application programmer's hassle?

The return type should be timestamptz, which makes the transformation timezone-independent.

regards, tom lane
Re: ADD FOREIGN KEY (was Re: [HACKERS] [GENERAL] 7.4Beta)
On Fri, 15 Aug 2003, Tom Lane wrote:
> Stephan Szabo <[EMAIL PROTECTED]> writes:
> > On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> > > I can also attest to the horrendously long time it takes to restore
> > > the ADD FOREIGN KEY section...
> >
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> In simple cases I think that the creation of indexes would be enough to
> get you a passable plan --- CREATE INDEX does update pg_class.reltuples,
> so the planner will know how big the tables are, and for single-column
> primary keys the existence of a unique index is enough to cue the planner
> that the column is unique, even without any ANALYZE stats. Those are the
> biggest levers on the plan choice. This assumes that pg_dump always dumps
> CREATE INDEX before ADD FOREIGN KEY; I'm not certain if there's anything
> to enforce that at the moment...
>
> I assume what you have in mind is to replace
> validateForeignKeyConstraint() with something that does a join of the two
> tables via an SPI command. But supposing that we want to keep the present
> ability to report (one of) the failing key values, it seems like the
> query has to look like
>
>     SELECT keycolumns FROM referencing_table
>     WHERE keycolumns NOT IN (SELECT refcols FROM referenced_table);
>
> which is only gonna do the right thing for one of the MATCH styles (not
> sure which, offhand ... actually it may not do the right thing for any
> match style if there are nulls in referenced_table ...).

Yes; in practice you'd have to put IS NOT NULL checks in the subselect, which is fine for the two match types we support, since a referenced row with a NULL isn't a candidate referenced row for those. I think MATCH PARTIAL might have to fall back to the repeated check unless we can make the query work, which would be harder because you only want to compare the columns for a particular row where the key column value is not null, and I can't think of a query for that that'd be particularly clean and likely to be fast; then again, I don't think the constraint check would be either. :(

It'd probably be, for MATCH <unspecified>:

    SELECT keycolumns FROM referencing_table
    WHERE (keycolumns) NOT IN
          (SELECT refcols FROM referenced_table
           WHERE refcol1 IS NOT NULL AND ...)
      AND keycolumn1 IS NOT NULL AND ...;

and for MATCH FULL (something like this; I haven't tried it):

    SELECT keycolumns FROM referencing_table
    WHERE ((keycolumns) NOT IN
           (SELECT refcols FROM referenced_table
            WHERE refcol1 IS NOT NULL AND ...)
       AND (keycolumn1 IS NOT NULL AND ...))
       OR ((keycolumn1 IS NOT NULL) != (keycolumn2 IS NOT NULL) OR ...);

> How would you make it work for all the MATCH styles? And will it really
> be all that efficient? (NOT IN is a lot more circumscribed than IN.)

I'm not really sure yet. Limited tests seem to show that it'll probably be as fast if not faster for all reasonable cases, but I'd want to generate a much larger random data set and actually put it in to make a fair comparison (maybe temporarily with a SET to allow people to try both cases on real-world data). One other advantage here is that we don't need to get row locks while checking this if we've already gotten the exclusive table locks on both tables involved. I'm not sure if we do that currently, though.
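[Archive note: the single-query validation that eventually shipped uses a left anti-join rather than NOT IN, which sidesteps the NULL pitfalls raised above: NULLs in the referenced table can never satisfy the join condition, and the IS NOT NULL filter implements the MATCH <unspecified> rule that rows with NULL keys need no match. A sketch for a two-column key, following the placeholder names above:]

    SELECT fk.keycolumn1, fk.keycolumn2
    FROM ONLY referencing_table fk
    LEFT OUTER JOIN ONLY referenced_table pk
      ON pk.refcol1 = fk.keycolumn1 AND pk.refcol2 = fk.keycolumn2
    WHERE pk.refcol1 IS NULL          -- no referenced row was found
      AND fk.keycolumn1 IS NOT NULL
      AND fk.keycolumn2 IS NOT NULL;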
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Andreas Pflug wrote:
> Stephan Szabo wrote:
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row. I keep meaning to get
> > around to it and never actually do. :( I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> This is clearly a case for a statement-level trigger, as soon as affected
> rows can be identified.

Well, I think single inserts might be more expensive (because the query is more involved for the table-joining case) using a statement-level trigger, so we'd probably want to profile the cases.
Re: [HACKERS] [GENERAL] 7.4Beta
Stephan Szabo wrote:
> On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > Stephan Szabo wrote:
> > > That really needs to be rewritten to do a single check over the table
> > > rather than running the constraint for every row. I keep meaning to
> > > get around to it and never actually do. :( I'm not sure that in
> > > practice you'll get a better plan at restore time depending on what
> > > the default statistics give you.
> >
> > This is clearly a case for a statement-level trigger, as soon as
> > affected rows can be identified.
>
> Well, I think single inserts might be more expensive (because the query
> is more involved for the table-joining case) using a statement-level
> trigger, so we'd probably want to profile the cases.

This really depends. If a constraint is just a check on the inserted/updated column, so that no other row needs to be checked, there's no faster way than the current row trigger. But FK constraints need to execute a query to retrieve the referenced row, and every RDBMS prefers to execute a single statement over many rows rather than many statements over a single row, because the former profits from optimization. And even if only a single row is inserted or updated, there's still the need to look up the reference.

Regards,
Andreas
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Andreas Pflug wrote:
> Stephan Szabo wrote:
> > On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > > Stephan Szabo wrote:
> > > > That really needs to be rewritten to do a single check over the
> > > > table rather than running the constraint for every row. I keep
> > > > meaning to get around to it and never actually do. :( I'm not sure
> > > > that in practice you'll get a better plan at restore time depending
> > > > on what the default statistics give you.
> > >
> > > This is clearly a case for a statement-level trigger, as soon as
> > > affected rows can be identified.
> >
> > Well, I think single inserts might be more expensive (because the query
> > is more involved for the table-joining case) using a statement-level
> > trigger, so we'd probably want to profile the cases.
>
> This really depends. If a constraint is just a check on the
> inserted/updated column, so that no other row needs to be checked,
> there's no faster way than the current row trigger. But FK constraints
> need to execute a query to retrieve the referenced row, and every RDBMS
> prefers to execute a single statement over many rows rather than many
> statements over a single row, because the former profits from
> optimization. And even if only a single row is inserted or updated,
> there's still the need to look up the reference.

I don't think that addresses the issue I brought up. If you're doing a bunch of single inserts:

    begin;
    insert into foo values (1);
    insert into foo values (1);
    insert into foo values (1);
    insert into foo values (1);
    insert into foo values (1);
    end;

each of those statement triggers is still only going to be dealing with a single row. If you're in immediate mode, there's not much you can do about that, since the constraint is checked between inserts. If you're in deferred mode, right now it won't help either, because it's not going to batch them: it's going to be five statement triggers, AFAICT each with its own one-row affected table. I believe that the more complicated "join the old/new table with the pk table and do the constraint check" is going to be slightly slower than the current row behavior for such cases, because the trigger query is more complicated. What would be nice would be some way to choose between a single query per statement and a simpler query per row, based on what's happening.
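[Archive note: the missing piece discussed here, a statement-level trigger that can see the set of affected rows, eventually arrived as transition tables in PostgreSQL 10. A hedged sketch of a hand-rolled batch check in that much later syntax, with hypothetical tables foo(ref_id) and ref(id); rows with a NULL key pass, as in MATCH SIMPLE:]

    CREATE FUNCTION check_foo_fk() RETURNS trigger AS $$
    BEGIN
        -- one set-oriented check per statement, not one per row
        IF EXISTS (SELECT 1 FROM new_rows n
                   WHERE n.ref_id IS NOT NULL
                     AND NOT EXISTS (SELECT 1 FROM ref r WHERE r.id = n.ref_id)) THEN
            RAISE EXCEPTION 'foreign key violation on foo.ref_id';
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER foo_fk_check
        AFTER INSERT ON foo
        REFERENCING NEW TABLE AS new_rows
        FOR EACH STATEMENT EXECUTE PROCEDURE check_foo_fk();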
[HACKERS] Copyright (C) 1996-2002
Today I've downloaded PostgreSQL 7.3.4. I've seen that in $PGSQLD/doc/html/index.html it still says "Copyright (C) 1996-2002". Shouldn't it be 2003?

Regards, Christoph

PS: I've sent this to [EMAIL PROTECTED] before, but in return I got: "Your message to pgsql-docs has been delayed, and requires the approval of the moderators, for the following reason(s): The author (Christoph Haller <[EMAIL PROTECTED]>) is not a member of any of the restrict_post groups."
Re: [HACKERS] [GENERAL] 7.4Beta
Stephan Szabo wrote:
> On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > Stephan Szabo wrote:
> > > On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > > > Stephan Szabo wrote:
> > > > > That really needs to be rewritten to do a single check over the
> > > > > table rather than running the constraint for every row. I keep
> > > > > meaning to get around to it and never actually do. :( I'm not
> > > > > sure that in practice you'll get a better plan at restore time
> > > > > depending on what the default statistics give you.
> > > >
> > > > This is clearly a case for a statement-level trigger, as soon as
> > > > affected rows can be identified.
> > >
> > > Well, I think single inserts might be more expensive (because the
> > > query is more involved for the table-joining case) using a
> > > statement-level trigger, so we'd probably want to profile the cases.
> >
> > This really depends. If a constraint is just a check on the
> > inserted/updated column, so that no other row needs to be checked,
> > there's no faster way than the current row trigger. But FK constraints
> > need to execute a query to retrieve the referenced row, and every RDBMS
> > prefers to execute a single statement over many rows rather than many
> > statements over a single row, because the former profits from
> > optimization. And even if only a single row is inserted or updated,
> > there's still the need to look up the reference.
>
> I don't think that addresses the issue I brought up. If you're doing a
> bunch of single inserts:
>
>     begin;
>     insert into foo values (1);
>     insert into foo values (1);
>     insert into foo values (1);
>     insert into foo values (1);
>     insert into foo values (1);
>     end;
>
> each of those statement triggers is still only going to be dealing with
> a single row. If you're in immediate mode, there's not much you can do
> about that, since the constraint is checked between inserts. If you're
> in deferred mode, right now it won't help either, because it's not going
> to batch them: it's going to be five statement triggers, AFAICT each
> with its own one-row affected table. I believe that the more complicated
> "join the old/new table with the pk table and do the constraint check"
> is going to be slightly slower than the current row behavior for such
> cases, because the trigger query is more complicated. What would be nice
> would be some way to choose between a single query per statement and a
> simpler query per row, based on what's happening.

Deferring the constraint check would mean checking 5 single rows, right. But I still can't see why you think a row-level trigger would be cheaper in this case. I had a look at ri_triggers.c, and what's coded there looks just as I expected: it does a query on the referenced table. The queries might look a bit different when checking multiple rows at once, but carefully designed, I doubt there would be a performance hit from this. In case it *is* significantly slower, single-row updates could be handled separately using the current triggers, with statement triggers for multiple rows. This would cover both scenarios best. At the moment, update/insert doesn't scale too well.

The best thing in the situation above would certainly be if all 5 rows were checked in a single query, but that looks quite impossible, because a mixture of inserts/updates/deletes on different tables might be deferred.

Regards,
Andreas
Re: [HACKERS] [GENERAL] 7.4Beta
On Fri, 15 Aug 2003, Andreas Pflug wrote:
> Stephan Szabo wrote:
> > On Fri, 15 Aug 2003, Andreas Pflug wrote:
> > > Stephan Szabo wrote:
> > > > Well, I think single inserts might be more expensive (because the
> > > > query is more involved for the table-joining case) using a
> > > > statement-level trigger, so we'd probably want to profile the
> > > > cases.
> > >
> > > This really depends. If a constraint is just a check on the
> > > inserted/updated column, so that no other row needs to be checked,
> > > there's no faster way than the current row trigger. But FK
> > > constraints need to execute a query to retrieve the referenced row,
> > > and every RDBMS prefers to execute a single statement over many rows
> > > rather than many statements over a single row, because the former
> > > profits from optimization. And even if only a single row is inserted
> > > or updated, there's still the need to look up the reference.
> >
> > I don't think that addresses the issue I brought up. If you're doing a
> > bunch of single inserts:
> >
> >     begin;
> >     insert into foo values (1);
> >     insert into foo values (1);
> >     insert into foo values (1);
> >     insert into foo values (1);
> >     insert into foo values (1);
> >     end;
> >
> > each of those statement triggers is still only going to be dealing
> > with a single row. If you're in immediate mode, there's not much you
> > can do about that, since the constraint is checked between inserts. If
> > you're in deferred mode, right now it won't help either, because it's
> > not going to batch them: it's going to be five statement triggers,
> > AFAICT each with its own one-row affected table. I believe that the
> > more complicated "join the old/new table with the pk table and do the
> > constraint check" is going to be slightly slower than the current row
> > behavior for such cases, because the trigger query is more
> > complicated. What would be nice would be some way to choose between a
> > single query per statement and a simpler query per row, based on
> > what's happening.
>
> Deferring the constraint check would mean checking 5 single rows, right.
> But I still can't see why you think a row-level trigger would be cheaper
> in this case. I had a look at ri_triggers.c, and what's coded there
> looks just as I expected: it does a query on the referenced table. The
> queries might look a bit different when checking multiple rows at once,
> but carefully designed, I doubt there would be a performance hit from
> this. In case it *is* significantly slower, single-row updates

I don't know if there will be or not, but in one case it's a single-table select with constant values, while in the other it's probably some kind of scan and subselect. I'm just not going to rule out the possibility, so we should profile it in large transactions with, say, 100k single inserts, and see.

> could be handled separately using the current triggers, with statement
> triggers for multiple rows. This would cover both scenarios best.

Yep. I'd wish that it could do that without actually needing to queue up both triggers, but I don't know if that'd be possible without tying some knowledge of the FK functions deeper down.

> The best thing in the situation above would certainly be if all 5 rows
> were checked in a single query, but that looks quite impossible, because
> a mixture of inserts/updates/deletes on different tables might be
> deferred.

Yeah, the 5 above are pretty easy to show safe, but other cases and referential-action cases won't necessarily be so easy.
[HACKERS] best way to retrieve the next record in a multi-column index
Can anybody help me with this? (Sorry for posting on hackers.)

I need to be able to determine the next row based on a non-unique key (index). I have solved this problem, but I would like to know if there is a simpler solution. For those of you who have ever dealt with COBOL, this is an on-the-fly SQL construction of a 'READ NEXT' statement following a START. It is very similar to cursors, but because of the transactional limitations of cursors, they cannot be used in this context.

Example: I have a table t with columns a, b, c. I have values a1, b1, c1 for those columns and would like to know the next value in the table when ordered by a, b. That is, I have values a1, b1, and oid1 and would like to find the very next record in the table (essentially looking for the next record in the index).

I have two solutions: one with 'or' logic and one with 'and' logic. Note: if the index we are scanning has the unique constraint, the oid part of the logic (and the index) can be left out.

    create index t_idx on t (a, b, oid);

'or' logic:

    select * from t
    where a > a1
       or (a = a1 and b > b1)
       or (a = a1 and b = b1 and oid > oid1)
    order by a, b, oid;

'and' logic:

    select * from t
    where a >= a1
      and (a > a1 or b >= b1)
      and (a > a1 or b > b1 or oid > oid1)
    order by a, b, oid;

I think, of the two, the 'or' logic is much better. The problem with both approaches is that when we have a 4-column key (common in COBOL), our index is based on a, b, c, d, oid, and the number of comparisons (and our select statement) becomes large, and performance is very important! If some logical genius knows how to reduce the above logic into a more direct approach, feel free to comment. Postgres properly optimizes both cases and uses the key: even for a table with 1 million+ records in it, the answer comes back right away.

My question is: is there a simpler way to do this? AFAIK there is no way in SQL to directly find the 'next' or 'previous' record in an ordered index (in other words: I have oid n, what is the next oid in the index?) without using the above logic. In other words, I am missing the ability to deal with a multi-column index value in a comparison as a single entity.

p.s. The above queries are 'sliding window' queries similar to cursors. If your table traversal can be defined by a (unique) index, you can use the above templates to slide over the tables without the use of a cursor.

Merlin
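[Archive note: the "multi-column index value as a single entity" asked for here is the SQL-spec row-value comparison, which much later PostgreSQL releases (8.2 and up) implement with index support; it did not exist in 7.4. A sketch against the same table:]

    select * from t
    where (a, b, oid) > (a1, b1, oid1)
    order by a, b, oid
    limit 1;

[This is exactly the 'or' logic above spelled as one lexicographic comparison, and the planner can satisfy it with a single scan of t_idx.]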
Re: [HACKERS] [GENERAL] 7.4Beta
> -----Original Message-----
> From: Andreas Pflug [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 15, 2003 10:49 AM
> To: Stephan Szabo
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] [GENERAL] 7.4Beta
>
> Stephan Szabo wrote:
> > I don't know if there will be or not, but in one case it's a
> > single-table select with constant values, while in the other it's
> > probably some kind of scan and subselect. I'm just not going to rule
> > out the possibility, so we should profile it in large transactions
> > with, say, 100k single inserts, and see.
>
> You're talking about bulk operations, which should be handled carefully
> too. Usually, loading all the data into a temporary table and doing
> INSERT INTO xxx SELECT ... FROM tmptable will give better performance
> where indexes and constraints are concerned. PostgreSQL shouldn't be
> expected to accept the most abusive ways of operation, but it should
> offer a reasonable set of tools enabling the job in a convenient way. The
> best situation is when many small random transactions perform well (for
> TPC-like loads) as well as bulk operations. Nobody should expect a
> database to smoothly convert a bunch of single transactions into an
> optimized bulk one; that's the job of a programmer.
>
> > Yeah, the 5 above are pretty easy to show safe, but other cases and
> > referential-action cases won't necessarily be so easy.
>
> So it's the programmer's responsibility to offer mass data to the
> backend, not separate inserts that by chance might be handled in a
> similar way. An RDBMS is not clairvoyant.

Simplification of bulk operations can be very important for customers, on the other hand. For the CONNX tool set, we offer an escape on INSERT/SELECT that performs the operation in bulk mode. There are serious downsides to bulk operations, too (such as not being logged and therefore existing outside of a transaction). Therefore, they are really only useful in cases where a complete destruction and repopulation is called for: if anything goes haywire, you can't simply roll it back. Yet the speed savings can be enormous (orders of magnitude). Compared to iterating over a set of prepared inserts, a bulk insert (such as one using Microsoft's BCP API or Oracle's direct-path loading) can be 100 times faster. If you are moving gigabytes of data and performing a complete refresh, the method to use becomes obvious.

When we go outside of the language bounds, a curly-braced escape notation is used. For instance, an insert/select might look like this:

    INSERT INTO destination_table
    SELECT column_list FROM source_table {fn commitcount 1000} {bulkmode}

The commit count says to use batches of 1000 rows, and bulkmode says to use the fastest possible insert method. Imagine (for instance) that PostgreSQL could use an ODBC {or similar} data source as a table. Then, with a syntax such as the above (or SELECT INTO, etc.), you could very rapidly move data from one system into another.
Re: [HACKERS] [GENERAL] 7.4Beta
Dann Corbit wrote:
> Simplification of bulk operations can be very important for customers, on
> the other hand. For the CONNX tool set, we offer an escape on
> INSERT/SELECT that performs the operation in bulk mode. There are serious
> downsides to bulk operations, too (such as not being logged and therefore
> existing outside of a transaction). Therefore, they are really only
> useful in cases where a complete destruction and repopulation is called
> for: if anything goes haywire, you can't simply roll it back. Yet the
> speed savings can be enormous (orders of magnitude). Compared to
> iterating over a set of prepared inserts, a bulk insert (such as one
> using Microsoft's BCP API or Oracle's direct-path loading) can be 100
> times faster. If you are moving gigabytes of data and performing a
> complete refresh, the method to use becomes obvious.
>
> When we go outside of the language bounds, a curly-braced escape notation
> is used. For instance, an insert/select might look like this:
>
>     INSERT INTO destination_table
>     SELECT column_list FROM source_table {fn commitcount 1000} {bulkmode}
>
> The commit count says to use batches of 1000 rows, and bulkmode says to
> use the fastest possible insert method. Imagine (for instance) that
> PostgreSQL could use an ODBC {or similar} data source as a table. Then,
> with a syntax such as the above (or SELECT INTO, etc.), you could very
> rapidly move data from one system into another.

When I say "bulk operation" I don't necessarily mean bulk load or stuff like that. What I mean is handling large amounts of similar data at the same time; that doesn't say anything about transactions or logging problems. Imagine you have 100k rows to load, each having FKs to (hopefully) existing rows:

- Step 1: load the rows into a temp table.
- Step 2: identify duplicates, logging and deleting them.
- Step 3: insert all data satisfying the FK constraints.
- Step 4: log everything that didn't insert.

These are relatively few statements (not the simplest ones), which can be handled in a safe manner, as sketched below. Just an example of how an RDBMS can do the job in a mass-oriented (and thus optimizable) way.

Regards,
Andreas
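[Archive note: a hedged sketch of those four steps in plain SQL, with hypothetical tables target(id, ref_id), ref(id), and load_errors(id, ref_id, reason); it assumes ref_id is never NULL in the incoming data:]

    -- Step 1: load the rows into a temp table.
    CREATE TEMP TABLE staging (id int, ref_id int);
    COPY staging FROM '/path/to/incoming.dat';

    -- Step 2: log and delete rows whose keys are already present.
    INSERT INTO load_errors (id, ref_id, reason)
        SELECT s.id, s.ref_id, 'duplicate'
        FROM staging s WHERE s.id IN (SELECT id FROM target);
    DELETE FROM staging WHERE id IN (SELECT id FROM target);

    -- Step 3: insert everything whose FK target exists.
    INSERT INTO target (id, ref_id)
        SELECT s.id, s.ref_id
        FROM staging s WHERE s.ref_id IN (SELECT id FROM ref);

    -- Step 4: log the rest.
    INSERT INTO load_errors (id, ref_id, reason)
        SELECT s.id, s.ref_id, 'missing reference'
        FROM staging s WHERE s.ref_id NOT IN (SELECT id FROM ref);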
Re: [HACKERS] 7.4 LOG: invalid message length
Gmane,

I just checked in a fix to the JDBC driver for this. The problem was that the connection termination message was being passed the wrong length, which really had no adverse side effect other than this message in the log, since the connection was no longer being used.

thanks,
--Barry

Gmane Smith wrote:
> Using Mac OS X 10.2.6, pgsql 7.4 (postgresql-7.4beta1) from
> postgresql.org, devpgjdbc2.jar, and WebObjects 5.2, I get "LOG: invalid
> message length" when EOModeler creates my database. When I built
> PostgreSQL I specified --with-pam --without-readline. Since I don't have
> ant, the --with-java build failed, so I left it off. Should I go back to
> a more stable JDBC driver? Or should I press on? Thanks.
Re: [HACKERS] Copyright (C) 1996-2002
Christoph Haller <[EMAIL PROTECTED]> writes:
> Today I've downloaded PostgreSQL 7.3.4. I've seen that in
> $PGSQLD/doc/html/index.html it still says "Copyright (C) 1996-2002".
> Shouldn't it be 2003?

We only update the copyright notices when we are preparing a major release. (Bruce just did it a week or two back for 7.4, for example.) Updating for minor releases would create a lot of churn in the stable CVS branches, for little purpose.

regards, tom lane
[HACKERS] Bounds error in LockMethodInit().
In lmgr.c you have a static LOCKMASK LockConflicts[] with 9 elements in it. You call LockMethodTableInit() with that pointer and MAX_LOCKMODES - 1 (10 - 1 = 9). That calls LockMethodInit() with the same arguments, but LockMethodInit() does numModes++. So you basically end up with a for loop that looks like:

    for (i = 0; i < 10; i++, conflictsP++)

On the last iteration, conflictsP no longer points within the LockConflicts array. I have no idea what that numModes++ line is doing there.

Kurt
Re: [HACKERS] [GENERAL] 7.4Beta
Jan Wieck <[EMAIL PROTECTED]> writes:
> I'm thinking instead of a way to cache entire executors for this. Each
> SPI plan used during a transaction would need its own executor, and I
> don't know offhand what type and how much resources an executor requires
> (I think it's only some memory that gets initialized and the VFDs
> opened).

Hmm. This is probably more feasible now than it would have been a year ago, because I did some cleanup work to ensure that executor state is localized into a specific memory context. I'm not certain about the amount of overhead either, but it's surely worth a try.

regards, tom lane
Re: [HACKERS] [GENERAL] 7.4Beta
> -----Original Message-----
> From: Andreas Pflug [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 15, 2003 11:36 AM
> To: Dann Corbit
> Cc: Stephan Szabo; PostgreSQL-development
> Subject: Re: [HACKERS] [GENERAL] 7.4Beta
>
> Dann Corbit wrote:
> > Simplification of bulk operations can be very important for customers,
> > on the other hand. For the CONNX tool set, we offer an escape on
> > INSERT/SELECT that performs the operation in bulk mode. There are
> > serious downsides to bulk operations, too (such as not being logged and
> > therefore existing outside of a transaction). Therefore, they are
> > really only useful in cases where a complete destruction and
> > repopulation is called for: if anything goes haywire, you can't simply
> > roll it back. Yet the speed savings can be enormous (orders of
> > magnitude). Compared to iterating over a set of prepared inserts, a
> > bulk insert (such as one using Microsoft's BCP API or Oracle's
> > direct-path loading) can be 100 times faster. If you are moving
> > gigabytes of data and performing a complete refresh, the method to use
> > becomes obvious.
> >
> > When we go outside of the language bounds, a curly-braced escape
> > notation is used. For instance, an insert/select might look like this:
> >
> >     INSERT INTO destination_table
> >     SELECT column_list FROM source_table {fn commitcount 1000}
> >     {bulkmode}
> >
> > The commit count says to use batches of 1000 rows, and bulkmode says to
> > use the fastest possible insert method. Imagine (for instance) that
> > PostgreSQL could use an ODBC {or similar} data source as a table. Then,
> > with a syntax such as the above (or SELECT INTO, etc.), you could very
> > rapidly move data from one system into another.
>
> When I say "bulk operation" I don't necessarily mean bulk load or stuff
> like that. What I mean is handling large amounts of similar data at the
> same time; that doesn't say anything about transactions or logging
> problems. Imagine you have 100k rows to load, each having FKs to
> (hopefully) existing rows:
>
> - Step 1: load the rows into a temp table.
> - Step 2: identify duplicates, logging and deleting them.
> - Step 3: insert all data satisfying the FK constraints.
> - Step 4: log everything that didn't insert.
>
> These are relatively few statements (not the simplest ones), which can be
> handled in a safe manner. Just an example of how an RDBMS can do the job
> in a mass-oriented (and thus optimizable) way.

I really, really like SQL*Server's solution to the problem you have stated above. A typical example: you have a pile of incoming words and you want to create a dictionary. An example might be a library database where you have an abstract for each item; you parse the words and look for new ones to add. What SQL*Server allows is an index with an option called IGNORE DUPLICATES that simply tosses out rows that are already in the table. For applications like what I have described and what you have described, it is an incredibly useful extension. Once I got used to it, I found myself using it all the time. Of course, you must be very careful to ensure that the duplicates really are completely unimportant.
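[Archive note: PostgreSQL eventually grew a comparable "toss the duplicates" facility in 9.5 as INSERT ... ON CONFLICT DO NOTHING. A sketch with hypothetical tables dictionary(word), which needs a unique index on word, and incoming_words(word); nothing like this exists in 7.4:]

    INSERT INTO dictionary (word)
    SELECT DISTINCT word FROM incoming_words
    ON CONFLICT (word) DO NOTHING;  -- silently drop already-present words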
[HACKERS] Behavior of equality_oper and ordering_oper
There are a couple of routines in parse_oper.c that are intended to look up appropriate '=' and '<' operators given a datatype OID. These are used in quite a variety of places, for example to determine the semantics of GROUP BY.

It's bothered me for some time that these routines depend on actually looking for an operator named '='. (ordering_oper used to depend on looking for the name '<', too, but at the moment it looks for a mergesort operator associated with '='.) The system in general does not assume that operator names determine operator semantics, so depending on names here is a big wart. equality_oper tries to apply some additional checks to verify that '=' really does behave like equality, but that's just a kluge. What's worse, as of 7.3 the lookup results could vary depending on your schema search path. This is bad news, especially for user-defined types.

Up to now I've seen no way around this; I've wished we could have a type's pg_type entry link directly to the proper '=' operator, but there are some circularity issues there. (CREATE TYPE couldn't set the link because the operator wouldn't exist yet when you're creating the type.)

Today it occurred to me that we could look in pg_opclass for a default btree opclass for the datatype. If we find one, then the Equal and Less members of the opclass are the operators we want. (If we don't find one, we could try for a default hash opclass, which would give us Equal, but not Less, for a few additional datatypes.) This seems like a much cleaner approach for two reasons: the opclass structure declares directly that the operators have the semantics we are looking for, and the search is not dependent on schema visibility. (We only allow one default opclass per datatype/AM, so the result would be unique.)

This lookup would perhaps be a little slower than the name-based operator lookup, since AFAICS there's no way to use the catcaches for it. However, we already realized we need a datatype-to-comparison-operator lookup cache to avoid memory leakage in array_cmp. If we cache the results in a specialized hashtable then it certainly won't be any slower than what the code does now.

There are a few built-in types for which this would change the behavior. Presently, these operators are believed by equality_oper to be '=', but do not have any corresponding btree or hash opclass:

    =(tid,tid)             | tideq
    =(box,box)             | box_eq
    =(path,path)           | path_n_eq
    =(tinterval,tinterval) | tintervaleq
    =(money,money)         | cash_eq
    =(aclitem,aclitem)     | aclitemeq
    =(circle,circle)       | circle_eq
    =(lseg,lseg)           | lseg_eq
    =(line,line)           | line_eq

In several of these cases, equality_oper is actually wrong --- box_eq for example compares areas, which is not what one would consider the normal equality behavior for boxes. The only ones that really ought to be found are the ones for TID, MONEY, and ACLITEM. I'm not particularly concerned about losing the ability to group by any of those datatypes, but if anyone is, we could talk about forcing an initdb to add the necessary comparison operators.

There are also a small number of operators that are found by ordering_oper but would not be found by a btree opclass search:

    =(box,box)         | box_eq
    =(reltime,reltime) | reltimeeq
    =(money,money)     | cash_eq
    =(circle,circle)   | circle_eq

Again, I'm not too concerned about this; only MONEY actually has comparison semantics that would support a btree index, but if it's not btree-indexable, does it need to be groupable?

Comments, objections?

regards, tom lane
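[Archive note: this default-opclass lookup is the design that stuck. On today's catalogs, which were reorganized around operator families well after this thread (so none of this runs on 7.4), the lookup Tom describes can be sketched as a query: find the default btree opclass for a type, then pull strategy 1 (less) and strategy 3 (equal) from its operator family:]

    SELECT opc.opcname, amop.amopstrategy, amop.amopopr::regoperator
    FROM pg_opclass opc
    JOIN pg_am am ON am.oid = opc.opcmethod AND am.amname = 'btree'
    JOIN pg_amop amop ON amop.amopfamily = opc.opcfamily
                     AND amop.amoplefttype = opc.opcintype
                     AND amop.amoprighttype = opc.opcintype
    WHERE opc.opcdefault
      AND opc.opcintype = 'integer'::regtype   -- the datatype being looked up
      AND amop.amopstrategy IN (1, 3);         -- 1 = less-than, 3 = equal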
Re: [HACKERS] best way to retrieve the next record in a multi-column index
On Fri, Aug 15, 2003 at 13:42:23 -0400, Merlin Moncure <[EMAIL PROTECTED]> wrote:
> Example: I have a table t with columns a, b, c. I have values a1, b1, c1
> for those columns and would like to know the next value in the table
> when ordered by a, b. That is, I have values a1, b1, and oid1 and would
> like to find the very next record in the table (essentially looking for
> the next record in the index). I have two solutions: one with 'or' logic
> and one with 'and' logic. Note: if the index we are scanning has the
> unique constraint, the oid part of the logic (and the index) can be left
> out.

How about something like the following:

    select * from t where a >= a1 and b >= b1 order by a, b limit 1 offset 1;
Re: [HACKERS] UnixWare on Current CVS: Success!
Yep, just tested it with a full checkout... GREAT JOB! Thanks to you all!

On Thu, 14 Aug 2003, Larry Rosenman wrote:
> Date: Thu, 14 Aug 2003 22:23:17 -0500
> From: Larry Rosenman <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: UnixWare on Current CVS: Success!
>
> Thanks to Tom and Bruce, we can now compile CVS 7.4 on UnixWare withOUT
> any tweaks, including the --enable-thread-safety switch. Here is what I
> used, and it is now running on lerami:
>
>     CC=cc CXX=CC ./configure --prefix=/usr/local/pgsql --enable-syslog \
>         --with-CXX --enable-multibyte --enable-cassert \
>         --with-includes=/usr/local/include --with-libs=/usr/local/lib \
>         --with-tcl --with-tclconfig=/usr/local/lib \
>         --with-tkconfig=/usr/local/lib --enable-locale \
>         --enable-thread-safety
>
> Thanks, Guys!
> LER

--
Olivier PRENANT                 Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou           +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)
Re: [HACKERS] best way to retrieve the next record in a multi-column index
Bruno Wolff III wrote:
> How about something like the following:
>
>     select * from t where a >= a1 and b >= b1 order by a, b limit 1
>     offset 1;

Well, this may have recently changed, but the OFFSET clause is not suitable for arbitrary jumps over large tables. Essentially, pg does an index lookup to the first element and then scans sequentially until the offset criterion is met. Even if that were not the case, there is another problem: suppose that while you are iterating over your table, another backend deletes a row after your initial start position; this will cause a record to get skipped! (Unless you're inside a transaction, of course, but that can't be assumed.)

I also spent a lot of time thinking about using some type of concatenation and functional indexes to get around the multi-column issue (then things would be really simple!). This turned out to be very complicated and I ended up giving up on it: I was stymied in the creation of a 'universal concatenation' function, plus losing the elegant syntax to do partials was a loss.

Merlin
Re: [HACKERS] [GENERAL] 7.4Beta
Is there a TODO here?

---------------------------------------------------------------------------

Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > I'm thinking instead of a way to cache entire executors for this. Each
> > SPI plan used during a transaction would need its own executor, and I
> > don't know offhand what type and how much resources an executor
> > requires (I think it's only some memory that gets initialized and the
> > VFDs opened).
>
> Hmm. This is probably more feasible now than it would have been a year
> ago, because I did some cleanup work to ensure that executor state is
> localized into a specific memory context. I'm not certain about the
> amount of overhead either, but it's surely worth a try.

--
Bruce Momjian                   | http://candle.pha.pa.us
[EMAIL PROTECTED]               | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
Re: [HACKERS] New function: epoch_to_timestamp...
On Fri, Aug 15, 2003 at 06:57:24AM -0400, Tom Lane wrote:
> David Fetter <[EMAIL PROTECTED]> writes:
> > Is there a reasonable way to add an optional param or two to mark time
> > zones, or should that just be the application programmer's hassle?
>
> The return type should be timestamptz, which makes the transformation
> timezone-independent.

Do you mean the default one should read as follows, or that the default one should take params (integer, output_timezone, input_timezone), or...?

    CREATE OR REPLACE FUNCTION epoch_to_timestampz (INTEGER)
    RETURNS TIMESTAMPTZ AS '
    DECLARE
        the_sql    TEXT;
        the_record RECORD;
        the_time   TIMESTAMPTZ;
    BEGIN
        the_sql := ''SELECT ''''January 1 1970''''::timestamp + ''''''
                   || $1 || '' seconds''''::interval AS foo'';
        RAISE NOTICE ''%'', the_sql;
        FOR the_record IN EXECUTE the_sql LOOP
            the_time := the_record.foo;
        END LOOP;
        RETURN the_time;
    END;
    ' LANGUAGE 'plpgsql';

Cheers,
D
--
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100    cell: +1 415 235 3778
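[Archive note: the dynamic SQL isn't actually needed, since the expression can be computed directly. A minimal sketch of an equivalent function, with a hypothetical name; later releases also provide the built-in to_timestamp(double precision) for exactly this job:]

    CREATE OR REPLACE FUNCTION epoch_to_timestamptz (INTEGER)
    RETURNS TIMESTAMPTZ AS '
        -- ''epoch'' is the special date/time input for 1970-01-01 00:00:00+00
        SELECT ''epoch''::timestamptz + $1 * ''1 second''::interval;
    ' LANGUAGE 'sql';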
Re: [HACKERS] [GENERAL] 7.4Beta
Bruce Momjian wrote:
> Is there a TODO here?

Maybe!? It's one of those premature things no one can tell about yet, so for now the TODO would be to investigate.

Jan

> ---------------------------------------------------------------------------
>
> Tom Lane wrote:
> > Jan Wieck <[EMAIL PROTECTED]> writes:
> > > I'm thinking instead of a way to cache entire executors for this.
> > > Each SPI plan used during a transaction would need its own executor,
> > > and I don't know offhand what type and how much resources an executor
> > > requires (I think it's only some memory that gets initialized and the
> > > VFDs opened).
> >
> > Hmm. This is probably more feasible now than it would have been a year
> > ago, because I did some cleanup work to ensure that executor state is
> > localized into a specific memory context. I'm not certain about the
> > amount of overhead either, but it's surely worth a try.

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#== [EMAIL PROTECTED] #