[SQL] update query taking 24+ hours
Hello,

I have PostgreSQL 8.1 on a Linux box: 2.6GHz P4, 1.5GB RAM, 320GB hard drive. I'm performing an update between two large tables and so far it's been running for 24+ hours.

I have two tables:

Master:
  x    int4
  y    int4
  val1 int2
  val2 int2

Import:
  x   int4
  y   int4
  val int2

Each table has about 100 million rows. I want to populate val2 in Master with val from Import where the two tables match on x and y. So my query looks like:

UPDATE Master SET val2=Import.val
WHERE Master.x=Import.x AND Master.y=Import.y;

Both tables have indexes on the x and y columns. Will that help? Is there a better way to do this? In each table (x, y) is unique; does that make a difference? That is, would it be faster to run some kind of query, or loop, that just goes through each row in Import and updates Master (val2=val) where x=x and y=y? If this approach would be better, how would I construct such a SQL statement?

The other weird thing is that when I monitor the system with xload it shows two bars of load, and the hard drive is going nuts (so far my database directory has grown by 25GB). However, when I run "top" the system shows 98% idle and the postmaster process is usually only between 1-2% CPU, although it is using 50% (750MB) of RAM. Also, the process shows up with a "D" status in the "S" column. I'm not sure what is going on: whether the size of the tables makes what I'm trying to do insane, whether I just have a bad SQL approach, or whether something is wrong with my postgres configuration.

Really appreciate any help! Thanks!

Ken
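For reference: PostgreSQL's documented form of a joined UPDATE names the second table in a FROM clause; if the query above ran at all, the server was adding the missing FROM implicitly. A minimal explicit sketch (untested, tables as described above):

UPDATE Master
SET val2 = Import.val
FROM Import
WHERE Master.x = Import.x
  AND Master.y = Import.y;

Since (x, y) is unique in both tables, a composite index on Import (x, y) may also help the planner avoid sorting both tables.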
Re: [SQL] update query taking 24+ hours
Thanks Tom!
explain output:
Merge Join  (cost=60454519.54..70701005.93 rows=682951183 width=22)
  Merge Cond: (("outer".y = "inner".y) AND ("outer".x = "inner".x))
  ->  Sort  (cost=41812068.08..42304601.78 rows=197013479 width=20)
        Sort Key: Master.y, Master.x
        ->  Seq Scan on Master  (cost=0.00..3129037.79 rows=197013479 width=20)
  ->  Sort  (cost=18642451.46..18879400.92 rows=94779784 width=10)
        Sort Key: Import.y, Import.x
        ->  Seq Scan on Import  (cost=0.00..1460121.84 rows=94779784 width=10)
I don't really understand all those numbers, but they look big to me.
work_mem is set to 262144. Should it be bigger? I have 1.5GB of RAM on
the system. Also, I set /proc/sys/kernel/shmmax to 25600. Too big,
too small?
There are no foreign key constraints on either table.
I don't know what hashjoin or sort-and-mergejoin are but I will look
into them.
Thanks!
Ken
Tom Lane wrote:
Ken <[EMAIL PROTECTED]> writes:
I have postgres 8.1 on a linux box: 2.6Ghz P4, 1.5GB ram, 320GB hard
drive. I'm performing an update between two large tables and so far
it's been running for 24+ hours.
UPDATE Master SET val2=Import.val WHERE Master.x=Import.x AND
Master.y=Import.y;
What does EXPLAIN say about that? (Don't try EXPLAIN ANALYZE,
but a plain EXPLAIN should be quick enough.)
Both tables have indexes on the x and y columns. Will that help?
A two-column index would have helped a lot more, probably, although
with so many rows to process I'm not sure that indexes are useful
anyway. For really bulk updates a hashjoin or sort-and-mergejoin
plan is probably the best bet.
BTW, both of those would require plenty of work_mem to run fast
... what have you got work_mem set to?
And possibly even more to the point, do you have any foreign key
constraints leading into or out of the Master table?
regards, tom lane
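For readers tuning similar bulk updates: in 8.1 work_mem is expressed in kilobytes and is allocated per sort or hash operation, so it can be raised for a single session before re-running the query. A sketch (the value is illustrative only, not a recommendation):

SHOW work_mem;
SET work_mem = 262144;  -- 256MB per sort/hash, this session only
EXPLAIN UPDATE Master SET val2 = Import.val
    FROM Import WHERE Master.x = Import.x AND Master.y = Import.y;

Re-running EXPLAIN after the change shows whether the planner switches to a cheaper plan.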
[SQL] Re: Help with query. (*)
[NOTE: I'm a pgsql newbie myself. Take this reply with a large-ish grain of salt!]

Shouldn't it be something straightforward like:

select a.a, a.b, a.c, ...
from a a, b b
where a.x = b.x
  and a.y = b.y, ...

(I'd watch out for too many clauses here... if you've got a lot of clauses, you're probably not normalized as much as you should be.) If you have indexes on the relevant fields, you shouldn't get a table scan and this should return rather quickly, right?

-Ken

"Diehl, Jeffrey" wrote:
> I'm having difficulty writing a query which I really can't live without...
>
> I need to get a list of records from table A for which there are
> corresponding records in table B. I've tried to use the intersect clause,
> but it doesn't seem to work, or it runs far too long. For example:
>
> select * from A
> where 1=1
> intersect select * from A where
> B.x=A.x
> and A.y=B.y
> and A.z=B.z
> limit 100
>
> I need the most efficient method possible; my A tables have upward of 5
> Million records. The B table, btw, only has about 100 records.
>
> Any help will be most appreciated.
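For the quoted question (rows of A that have matching rows in B), a correlated EXISTS is another common formulation; a sketch assuming the x, y, z columns mentioned above (untested):

SELECT a.*
FROM A a
WHERE EXISTS (SELECT 1
              FROM B b
              WHERE b.x = a.x
                AND b.y = a.y
                AND b.z = a.z)
LIMIT 100;

With only ~100 rows in B, the planner can scan B quickly and probe indexes on A for the matches.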
[SQL] Re: binary operators
Frederic Metoz wrote:
> I am looking for the binary AND and OR ... SHIFT as well.
> Do they exist for postgresql ?

Depending on what you're doing... you might get away with using mathematical operators to accomplish the above. A right SHIFT would be dividing by 2; a left shift would be multiplying by 2. I don't know off the top of my head about AND and OR. Doesn't the manual cover this?

-Ken
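For the archives: later PostgreSQL releases do ship native bitwise operators for the integer types. A quick illustration (syntax as in current PostgreSQL):

SELECT 12 & 10;  -- bitwise AND, returns 8
SELECT 12 | 10;  -- bitwise OR, returns 14
SELECT 12 # 10;  -- bitwise XOR, returns 6
SELECT 1 << 4;   -- shift left, returns 16
SELECT 16 >> 2;  -- shift right, returns 4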
[SQL] Hrm...why is this wrong?
In trying to use a plpgsql stored proc, I'm getting an error I don't
understand.
When the select at the bottom of this email is executed, I'm getting the
message:
ERROR: parser: parse error at or near "$1"
Any ideas?
--
Ken Corey, CTO, Atomic Interactive, Ltd.
select 'drop FUNCTION IU_EMPLOYEE(varchar(255), ...);' as Progress;
drop FUNCTION IU_EMPLOYEE( varchar(255), varchar(255),
varchar(255), varchar(255), varchar(255), varchar(255),
varchar(255), varchar(255), varchar(255), varchar(255),
varchar(255));
select 'create FUNCTION IU_EMPLOYEE(varchar(255), ...)' as Progress;
create FUNCTION IU_EMPLOYEE( varchar(255), varchar(255),
varchar(255), varchar(255), varchar(255), varchar(255),
varchar(255), varchar(255), varchar(255), varchar(255),
varchar(255))
RETURNS INT4
AS '
DECLARE
user_name_in alias for $1;
passwd_in alias for $2;
firstname_in alias for $3;
lastname_in alias for $4;
company_in alias for $5;
addr1_in alias for $6;
addr2_in alias for $7;
city_in alias for $8;
state_in alias for $9;
postcode_in alias for $10;
country_in alias for $11;
userid_calc INT4;
companyid_calc INT4;
BEGIN
userid_calc := 0;
select into companyid_calc COMPANY_ID from COMPANY
where COMPANY_NAME = company_in;
if (companyid_calc is null)
then
insert into COMPANY (COMPANY_NAME) values (company_in);
companyid_calc := currval(''company_company_id_seq'');
end if;
if (companyid_calc is not null)
then
insert into EMPLOYEE ( COMPANY_ID ,
AUTHORIZED , RIGHTS , USERNAME , PASSWD , FIRSTNAME ,
LASTNAME , ADDR1 , ADDR2 , CITY , STATE , POSTCODE ,
COUNTRY)
values (
companyid_calc,0,0,username_in, password_in, firstname_in,
lastname_in, company_in,addr1_in,addr2_in,city_in,
state_in,postcode_in,country_in
);
userid_calc := currval(''employee_employee_id_seq'');
else
rollback;
return 0;
end if;
return userid_calc;
END;'
LANGUAGE 'plpgsql';
select iu_employee('handtest','password','hand','test','handcompany',
'handaddr','handaddr2','handcity','handstate','handpostcode','handcountry');
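A note for readers comparing this function against the reply below: the parse error aside, the INSERT near the end references username_in and password_in while the declared aliases are user_name_in and passwd_in, and company_in appears as an extra value (14 values for 13 columns). A corrected VALUES clause would presumably look like this (a sketch, matching the 13 listed columns):

values (
companyid_calc, 0, 0, user_name_in, passwd_in, firstname_in,
lastname_in, addr1_in, addr2_in, city_in,
state_in, postcode_in, country_in
);

Note also that a bare "rollback;" is not allowed inside a PL/pgSQL function, which runs within the caller's transaction.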
Re: [SQL] Hrm...why is this wrong?
Wow! Answering emails on a Sunday? Someone should be giving you an award or something.

On Sunday 04 February 2001 8:13 pm, you wrote:
> Ken Corey <[EMAIL PROTECTED]> writes:
> > When the select at the bottom of this email is executed, I'm getting the
> > message:
> > ERROR: parser: parse error at or near "$1"
>
> I don't get that; I get
> ERROR: Attribute 'username_in' not found
> which is about what I'd expect for the given function text; maybe you
> didn't transcribe it accurately?

That's strange... perhaps the difference was a problem with my table definition? *shrug* I also had made a few mistakes, so once I got those fixed, the code seems to work again.

> Anyway, an invaluable technique for debugging plpgsql functions is to
> start psql with debug level 2, so that the queries the plpgsql executor
> feeds to the SQL engine get logged in the postmaster log. (If you don't
> run the postmaster with a logfile, you should...) For example:

Hey, that's perfect. It's okay just so long as the debugging output goes *somewhere*... :^)

Thanks, Tom.

--
Ken Corey, CTO, Atomic Interactive, Ltd.
[SQL] Transactions in PLPGSQL?
Hi All!

Are the BEGIN/END seen in a typical PL/pgSQL function a transaction wrapper, or do I need to add another BEGIN/END block? Should I just put a 'rollback' in the function, or do I need to do something special?

Thanks!
-Ken
[SQL] My apologies...
My apologies to the lists...in trying to stop my own spam, I spammed you all. This has now been fixed. Again, sorry for any inconvenience. -Ken
[SQL] logging a psql script
Hello,

I would like my psql script to log everything that it does. I set the following:

\set ECHO all
\o foo.txt
\qecho
some sql, some ddl, etc...
\o

But foo.txt only contains:

DROP
DROP
DROP
CREATE
CREATE
CREATE

I want it to contain everything that I see on the screen. What am I missing?

Thanks
Ken
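One way to capture everything (a sketch): \o only redirects query output, while the echoed statements go to stdout, so run the script from the shell with echoing enabled and redirect both streams:

psql -a -f myscript.sql > foo.txt 2>&1

Here -a (--echo-all) prints every input line as it is read; the shell redirection then catches what \o alone misses. The script name is a placeholder.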
[SQL] greetings
I have just joined the list a few days ago and am trying quite hard to come up to speed with pgsql, but I find the documentation frustrating. I think maybe it's just a matter of finding things that are of the correct scope. I've been an Oracle developer for over 6 years, so often I know what it is I want to do but something is just a little different. If there are others on the list that learned Oracle first and then pgsql, please tell me what you think are the best resources.

Recently I did a google search on the key words "postgresql cursor loop"; the example below is all I could come up with, but it doesn't seem to work. Is this for an older version, or am I just overlooking something simple?

thanks
Ken

DECLARE emp_cursor CURSOR FOR
    SELECT Salary, Title, Start, Stop FROM Employee;
OPEN emp_cursor;
loop:
    FETCH emp_cursor INTO :salary, :start, :stop;
    if no-data returned then goto finished;
    find position in linked list to insert this information;
    goto loop;
finished:
CLOSE emp_cursor;
Re: [SQL] greetings
It is to be server side code; the code I gave you was merely an example of a cursor that I found when I did a search...
http://www.armed.net/how/pg001676.htm

Originally what I wanted to do was this:

INSERT INTO pledge_classes (semester, year)
SELECT DISTINCT pseason, pyear
FROM load_bros
WHERE pyear IS NOT NULL
  AND pseason IS NOT NULL
ORDER BY pyear, pseason;

However pgsql does not allow ORDER BY in an INSERT-SELECT statement, so I thought maybe I could do something like this:

DECLARE
    CURSOR get_rows AS
        SELECT DISTINCT pseason, pyear
        FROM load_members
        WHERE pyear IS NOT NULL
          AND pseason IS NOT NULL
        ORDER BY pyear, pseason;
BEGIN
    FOR rec IN get_rows LOOP
        INSERT INTO pledge_classes (semester, year)
        VALUES (rec.pseason, rec.pyear);
    END LOOP;
    COMMIT;
END;
/

Well, all the code I just showed you works in Oracle, but pgsql is a little different, and even though the book has an example of a cursor
http://www.postgresql.org/docs/aw_pgsql_book/node142.html
it does not explain beforehand 1) the format of an anonymous block, 2) how to loop a cursor, or 3) how to reference columns from a cursor row (i.e. rec.column_name).

thanks
Ken

Tom Lane wrote:
> Ian Lance Taylor <[EMAIL PROTECTED]> writes:
> > PL/pgSQL does not support cursors. It also does not support goto.
>
> The context is pretty unclear here, but perhaps he needs ecpg not
> plpgsql ... is this to be client- or server-side code?
>
> regards, tom lane
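For readers hitting the same wall: PL/pgSQL of that era had no explicit cursor support, but its FOR-IN-SELECT loop covers this case. A sketch as a function, using the table names above (untested; the function name is hypothetical):

CREATE FUNCTION load_pledge_classes() RETURNS integer AS '
DECLARE
    rec RECORD;
BEGIN
    FOR rec IN SELECT DISTINCT pseason, pyear
               FROM load_members
               WHERE pyear IS NOT NULL AND pseason IS NOT NULL
               ORDER BY pyear, pseason
    LOOP
        INSERT INTO pledge_classes (semester, year)
        VALUES (rec.pseason, rec.pyear);
    END LOOP;
    RETURN 1;
END;
' LANGUAGE 'plpgsql';

There is no anonymous block; the function is invoked with SELECT load_pledge_classes(); and the COMMIT happens in the calling transaction.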
[SQL] conversion
Hello,
another brain twister, at least for me...
I have a table of varchars, and one of the values I want
to insert into another table, where the destination column is
defined as INTEGER...
and none of these statements seems to work:
INSERT INTO pledge_classes (semester, year)
SELECT pseason, to_number('pyear','') from temp;
INSERT INTO pledge_classes (semester, year)
SELECT pseason, pyear::integer from temp;
INSERT INTO pledge_classes (semester, year)
SELECT pseason, pyear::numeric(4) from temp;
Re: [SQL] conversion
Follow-up: actually the destination column is defined as a numeric(4).
The following are the statements again with their error messages:
SELECT pseason, to_number(pyear,'') from temp;
ERROR: Bad numeric input format ' '
SELECT pyear::int from temp;
ERROR: Cannot cast type 'varchar' to 'int4'
Ken Kline wrote:
> Hello,
> another brain twister, at least for me...
> i have a table of varchar and one of the values I want
> to insert into another table, one of the columns is
> defined as INTEGER in destination table, column...
> and none of these statements seem to work
>
> INSERT INTO pledge_classes (semester, year)
> SELECT pseason, to_number('pyear','') from temp;
>
> INSERT INTO pledge_classes (semester, year)
> SELECT pseason, pyear::integer from temp;
>
> INSERT INTO pledge_classes (semester, year)
> SELECT pseason, pyear::numeric(4) from temp;
Re: [SQL] conversion
here you go, thanks in advance, ken
Table "temp"
Attribute |Type | Modifier
---+-+--
pseason | varchar(15) |
pyear | varchar(5) |
adx=# \d pledge+ _classe4s s
Table "pledge_classes"
Attribute | Type | Modifier
---+--+--
pc_id | integer | not null default nextval('pc_seq'::text)
semester | varchar(6) |
year | numeric(4,0) |
pc_letter | varchar(20) |
pc_name | varchar(50) |
Index: pc_pk
adx=# \q
bash-2.04$
Script done on Mon Feb 26 11:42:35 2001
Christopher Sawtell wrote:
> On Mon, 26 Feb 2001 17:11, Ken Kline wrote:
> > follow up
> > actually the destination column is defined
> > as a numeric(4)
>
> Could you to a \d on each of the tables and tell us the results.
>
> [ ... ]
>
> --
> Sincerely etc.,
>
> NAME Christopher Sawtell
> CELL PHONE 021 257 4451
> ICQ UIN45863470
> EMAIL csawtell @ xtra . co . nz
> CNOTES ftp://ftp.funet.fi/pub/languages/C/tutorials/sawtell_C.tar.gz
>
> -->> Please refrain from using HTML or WORD attachments in e-mails to me
> <<--
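For the archives: the "Bad numeric input format" error comes from the empty format mask, and the quotes around 'pyear' in the first attempt made it a string literal rather than a column reference. A sketch of a working form, given the table definitions above (untested):

INSERT INTO pledge_classes (semester, year)
SELECT pseason, to_number(pyear, '9999')
FROM temp
WHERE pyear IS NOT NULL;

The '9999' mask tells to_number() to expect up to four digits, which fits a varchar(5) pyear holding four-digit years.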
Re: [SQL] Weird NOT IN effect with NULL values
This is kind of weird, but it is how it works. You cannot use equality for NULL... NULL does not equal NULL. NULL means no value; since it's not a value, it can't equal another no-value.

SELECT name
FROM customer
WHERE customer_id NOT IN (
    SELECT customer_id
    FROM salesorder
    WHERE customer_id IS NOT NULL
);

should work. (The IS NOT NULL test belongs inside the subselect, where the NULLs come from.)

Ken

Frank Joerdens wrote:
> When doing a subselect with NOT IN, as in
>
> SELECT name
> FROM customer
> WHERE customer_id NOT IN (
>     SELECT customer_id
>     FROM salesorder
> );
>
> (from Bruce Momjian's book)
>
> I get no rows if the result column returned by the subselect
> contains NULL values. It works as expected if I remove the NULL values
> from the result set. Is this behaviour correct and if so, why?
>
> I am using 7.1 beta 4.
>
> Regards, Frank
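An equivalent formulation that sidesteps the NULL issue entirely is NOT EXISTS; a sketch using the same tables:

SELECT name
FROM customer c
WHERE NOT EXISTS (SELECT 1
                  FROM salesorder s
                  WHERE s.customer_id = c.customer_id);

A NULL customer_id in salesorder simply fails to match here, instead of making the whole NOT IN predicate unknown.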
[SQL] perl dbd
My apologies if this is not the correct list, but I cannot seem to install the package DBD-Pg-0.73-1.i386.rpm; it complains that it needs libpq.so.1. I have the following installed from a source package rebuild:

postgresql-7.0.3-2
..server
..devel
..perl
..tk
..odbc
..tcl

thanks as always
Ken
[SQL] Silly question about numbering of rows?
Hi All,

I haven't posted a silly question in a while, and didn't want to break my streak... I have a table like this:

 id | make | model | year | value
----+------+-------+------+-------
 57 |    2 |     0 |    4 |  4750
 57 |    2 |     3 |    4 |  4750
 57 |    2 |     0 |    0 |  4750
 57 |    2 |     0 |    3 |  4750
 57 |    2 |     3 |    0 |  4750
 57 |    2 |     3 |    3 |  4750
  2 |    2 |     0 |    3 |  4750
  2 |    2 |     3 |    3 |  4750
  2 |    2 |     0 |    4 |  4350
  2 |    2 |     3 |    4 |  4350
  2 |    2 |     0 |    0 |  4750
  2 |    2 |     0 |    5 |  4750
  2 |    2 |     3 |    0 |  4750
  2 |    2 |     3 |    5 |  4750

I want to be able to select from this table like this:

select <numbering>, id, value from mytable
where make=2 and model=3 and year=5
order by score desc;

which would return this:

 numbering | id | value
-----------+----+-------
         1 | 57 |  4750
         2 |  2 |  4350

Everything is working swimmingly, except for the numbering magic. As you can see there are several variations of data in that table, so I can't use a serial on the table directly. Further, this table is likely to change on a minute by minute basis, so I don't want to create a permanent numbering that will just have to be changed. I thought about creating a temporary table with a serial and selecting into that so that a numbering is created and then returning rows from that, but that sounds like much busywork for the database. It sounds easiest to me to just punt and number the rows as they are returned in my calling application... What's the best approach here?

--
Ken Corey, CTO, Atomic Interactive, Ltd. [EMAIL PROTECTED]
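One era-appropriate sketch of the numbering (untested; it assumes nextval() is evaluated in output order, which holds for simple plans like this): wrap the ordered query in a subselect and number rows with a scratch sequence:

CREATE TEMP SEQUENCE rownum_seq;
SELECT nextval('rownum_seq') AS numbering, id, value
FROM (SELECT id, value
      FROM mytable
      WHERE make = 2 AND model = 3 AND year = 5
      ORDER BY value DESC) AS ordered;

Much later PostgreSQL releases do this directly with the row_number() window function.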
[SQL] Re: Re: Using Random Sequence as Key
First of all, let me apologise for stepping so far back into the archives,
but I wanted to get this in the archives in case anyone else wanted to use
this.
There's a slight addendum here... as written, it's fairly likely that people
will get duplicate IDs using this system.
A small change gets rid of the problem: multiply the serialportion by 10
raised to the number of digits you'll have in timeportion. So, if
timeportion is three digits, you'd multiply serialportion by 1000 and
*then* add the timeportion. Below, I use the mod operator '%' to make sure
it's 3 digits or less.
Caveat: this will only work as long as you have less than 40 million users.
;^)
-Ken
create FUNCTION new_request_id()
RETURNS INT4
as
'
DECLARE
timeportion INT4;
serialportion INT4;
BEGIN
timeportion := cast
(date_part(''milliseconds'',timeofday()::timestamp) as integer);
serialportion := nextval(''request_idfake_seq'')* 1000;
RETURN (timeportion % 1000) + serialportion;
END;'
LANGUAGE 'plpgsql'
;
On Tuesday 17 April 2001 3:08 pm, Josh Berkus wrote:
> Bernardo,
>
> > I needed the random field because if I use serial and the user gets a
> > 34203
> > he's sure that 34202 exists, and that (probably, there where 34202
> > inserts
> > before him (or at least an offset + some)). Using a random just makes
> > the
> > user totally blind.
> > As I said I could use a serial for indexing the table but I NEED the
> > random
> > field and I need to to be unique since all the queries will be using
> > it as a
> > search parameter.
> > If inserting this way is slow it's not such a big deal since it's a
> > small db
> > and inserts are seldom made.
> > Thanks in advance for any help.
>
> Here's another suggestion for you then:
>
> 1. Add a sequence "Sales_sq"
>
> 1. write a custom function for new id numbers:
> CREATE FUNCTION new_sales_id() RETURNS INT4 AS '
> DECLARE
> timeportion VARCHAR;
> serialportion INT4;
> BEGIN
> timeportion := to_char(current_timestamp, ''ms'');
> -- (or whatever the abbreviation for 2-digit milliseconds is)
> serialportion := 100*(nextval(''sales_seq''));
> RETURN CAST(to_number(timeportion) AS INT4) + serialportion;
> END;
>
> 3. Then set the id column to default to this new function.
>
> This would give you (after you correct my mistakes) a number, the first
> X digits of are Serial, and the last 2 digits based on the server's
> internal clock. Thus, the numbers would *not* be sequential, and would
> appear fairly random, but would be unique *without* and expensive check
> for that value anywhere in the table for each insert.
>
> -Josh Berkus
>
>
>
>
>
>
> __AGLIO DATABASE SOLUTIONS___
>Josh Berkus
> Complete information technology [EMAIL PROTECTED]
>and data management solutions (415) 565-7293
> for law firms, small businessesfax 621-2533
> and non-profit organizations. San Francisco
>
--
Ken Corey, CTO, Atomic Interactive, Ltd. [EMAIL PROTECTED]
Re: [SQL] Returning PK of first insert for second insert use.
On Mon, 2002-07-29 at 19:32, Peter Atkins wrote:
> I have two tables t_proj, t_task see below:
>
> CREATE TABLE t_proj (
> proj_id SERIAL NOT NULL,
> PRIMARY KEY (proj_id),
> task_id integer(12),
> user_id integer(6),
> title varchar(35),
> description varchar(80)
> );
>
> CREATE TABLE t_task (
> task_id SERIAL NOT NULL,
> PRIMARY KEY (task_id),
> title varchar(35),
> description varchar(80)
> );
>
> When I insert into t_task I need to return the task_id (PK) for that insert
> to be used for the insert into the t_proj table.
>
> I tried using RESULT_OID but I have no idea how to obtain the true PK using
> this opague id. Below is the procedure I tried to use.
Since the primary key of the first table is a SERIAL, it's really
defined as something like this:
create table t_task (
task_id int4 not null default nextval('t_task_task_id_seq'),
...
Which means that you can predict what the next value will be, store that
in a temporary var, and then insert it into both tables...
CREATE OR REPLACE FUNCTION insertTask (varchar, varchar)
RETURNS INTEGER AS '
DECLARE
-- local variables
oid1 INTEGER;
retval INTEGER;
tempvar int4;
BEGIN
select into tempvar nextval(''t_task_task_id_seq'');
INSERT INTO t_task (task_id, title, description)
VALUES (tempvar,$1, $2);
-- Everything has passed, return id as pk
RETURN tempvar;
END;
' LANGUAGE 'plpgsql';
WARNING: this is not guaranteed to be the correct syntax, I didn't
create the tables and the function to test it, but I do this kind of
thing all the time in my functions.
--
Ken Corey CTO http://www.atomic-interactive.com 07720 440 731
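A side note not in the original thread: the same effect is available by letting the column default fire and reading the value back with currval(), which is likewise safe under concurrency within a session. A sketch against the tables above:

INSERT INTO t_task (title, description) VALUES ('a title', 'a description');
INSERT INTO t_proj (task_id, title, description)
VALUES (currval('t_task_task_id_seq'), 'proj title', 'proj description');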
Re: [SQL] Returning PK of first insert for second insert use.
On Mon, 2002-07-29 at 20:52, Peter Atkins wrote:
> Is there a possibility of another application accessing the DB and using the
> id before my function has completed the transaction? I'm concerned with the
> possibility of cross-over of ID's if the insert hangs.
>
> There's no way to return the id of that insert inherently, and then use it
> for the second insert? I think SQL uses something like ADD_ID, not sure.

That's the beauty of the nextval statement. The database internally sequences requests to it so that you're kept out of harm's way.

Say process A called the function, and nextval returns 16. The function now continues on its way, but is not finished when process B then calls the function (before A is done), and nextval returns 17. So, the function called by process A returns 16, and the function called by process B returns 17. That means that unless the results of process B depend in some way upon the results of process A, there's no problem.

-Ken

--
Ken Corey CTO http://www.atomic-interactive.com 07720 440 731
Re: [SQL] VACUUM not doing its job?
I've run into this myself. Tom Lane helped me out. In my case, it was the fact that indexes don't release the space of index entries for deleted rows. So, if you have a table that has a lot of inserts/deletes, your indexes will grow incredibly fast.

The way to see what your biggest items are:

select * from pg_class order by relpages desc;

If your tables are active with lots of inserts/deletes, the biggest things will likely be indexes. The only way that I know to recover this space is to drop the indexes and recreate them. Vacuum didn't touch them for me.

-Ken

On Sat, 2002-08-03 at 21:53, Kristian Eide wrote:
> Three weeks ago we did a pg_dump/pg_restore, and after that pgsql/data/ took
> about 750MB. Now it uses 2.4GB of disk space! If we do a new complete dump
> and restore the volume will decrease to about 800MB.
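For readers with the same symptom: REINDEX rebuilds indexes in one step, equivalent to the drop-and-recreate described above (table and index names illustrative):

REINDEX TABLE my_busy_table;       -- rebuild all indexes on one table
REINDEX INDEX my_busy_table_pkey;  -- or rebuild a single index

Be aware that REINDEX takes heavy locks on the table while it runs.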
[SQL] Timezone issue with date_part
In trying to debug some code, I've come across this SQL issue that's
causing my problem.
I've got two epoch time values that I have to compare. Time #1 seems
to be working straightforwardly enough, but a tricky timezone-related
error has surfaced with Time #2.
Looking at the straight timestamp:
kenzoid=# select max(posted_date) from pinds_blog_entries
kenzoid-#   where package_id = '2969'
kenzoid-#   and draft_p = 'f'
kenzoid-#   and deleted_p = 'f';
            max
----------------------------
 2002-11-01 09:56:41.474084

That's correct, for my timezone. (EST5EDT)

The query that's in the script now to return that as an epoch time is:

kenzoid=# select coalesce
kenzoid-#   (date_part('epoch', max(posted_date)), 0) as last_update
kenzoid-#   from pinds_blog_entries
kenzoid-#   where package_id = '2969'
kenzoid-#   and draft_p = 'f'
kenzoid-#   and deleted_p = 'f';
   last_update
------------------
 1036144601.47408

I finally realized something was amiss, and reconstituted that epoch value:

kenzoid=# select timestamp 'epoch' + interval '1036144601.47408 seconds';
           ?column?
------------------------------
 2002-11-01 04:56:41.47408-05

I'm five hours off... my timezone value, I imagine. I tried putting the TIMESTAMP into the date_part, but no joy:

kenzoid=# select coalesce (date_part('epoch',
kenzoid-#   TIMESTAMP max(posted_date)), 0)
kenzoid-#   as last_update
kenzoid-#   from pinds_blog_entries
kenzoid-#   where package_id = '2969'
kenzoid-#   and draft_p = 'f'
kenzoid-#   and deleted_p = 'f';
ERROR:  parser: parse error at or near "max"
I kinda figured that.
So I'm stuck, without making two calls. If I call to the db and get
max(posted_date), and then turn around and call the date_part
with that value, things work. But I'm trying to avoid the two db
calls. Any ideas? Thanks!!
--
Ken Kennedy | http://www.kenzoid.com| [EMAIL PROTECTED]
Re: [SQL] Timezone issue with date_part
On Sat, Nov 02, 2002 at 09:17:14AM -0500, Tom Lane wrote:
> Ken Kennedy <[EMAIL PROTECTED]> writes:
> > [ date_part('epoch') is wrong for a timestamp value ]
>
> The epoch value is really only correct for a TIMESTAMP WITH TIME ZONE
> value. If you apply date_part('epoch') to a timestamp without time zone,
> as you appear to be doing here, what you will get is the epoch for the
> given value interpreted as GMT.
Excellent! I see. The table is indeed using TIMESTAMP WITHOUT TIME
ZONE. (It is, in fact, an old 'datetime' hold-on in the table creation
DDL.) Hopefully, I can alter that sucker in place...it'll help for
upgrade scripts.
> A hack solution is to cast the value to TIMESTAMP WITH TIME ZONE before
> extracting the epoch; the cast will assume that the given value is local
> time. But a better idea is to store the column as TIMESTAMP WITH TIME
> ZONE in the first place.
Gotcha. I've confirmed the hack solution is working for now, and
eliminates my even hackier (more hackish?) two-call solution. I'll get
with the package owner (this is in an OpenACS package) and we'll work
out an upgrade for the table and procs.
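Concretely, the hack looks something like this against the table above (a sketch):

select date_part('epoch',
       max(posted_date)::timestamp with time zone) as last_update
from pinds_blog_entries
where package_id = '2969'
  and draft_p = 'f'
  and deleted_p = 'f';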
> (IMHO, the SQL spec is really brain-dead to define timestamp without
> time zone as the default form of timestamp; the variant with time zone
> is much more useful for most applications.
I see exactly what you're saying now. I guess that's the reason
datetime resolves to 'TIMESTAMP WITHOUT TIME ZONE'? I agree...the TZ
is very useful to have tagging along!
Thanks so much for your help, Tom!
--
Ken Kennedy | http://www.kenzoid.com| [EMAIL PROTECTED]
Re: [SQL] how to get the source table & field name of a view field
On Thu, Nov 07, 2002 at 05:12:20PM +0800, Prime Ho wrote:
> Hi,
>
> Could you tell me how to get a view field's source table and field name?
> In other words, how can I tell where the view field comes from?

\d view_name should give you the view definition in psql.

--
Ken Kennedy | http://www.kenzoid.com | [EMAIL PROTECTED]
Re: [SQL] Making NULL entries appear first when ORDER BY
Well, for the docs to list every possible conditional statement for an ORDER BY clause would just about include them all, and so be all the more confusing. Subqueries, IN, aggregate functions, aliases... the list goes on and on. I'd say that knowledge (that most conditionals can be used in an ORDER BY or GROUP BY) should be implicit once a person has a basic understanding of the language.

Thomas F. O'Connell wrote:
> How would one know from the reference material that it is possible to
> include IS NOT NULL in an ORDER BY clause? Similarly, other than the FAQ,
> I've never been able to tell from the SELECT documentation why ORDER BY
> random() works.
>
> -tfo
>
> --
> Thomas F. O'Connell
> Co-Founder, Information Architect
> Sitening, LLC
> http://www.sitening.com/
> 110 30th Avenue North, Suite 6
> Nashville, TN 37203-6320
> 615-260-0005
>
> On Feb 16, 2005, at 3:39 AM, Andreas Joseph Krogh wrote:
> > On Wednesday 16 February 2005 04:47, Bruno Wolff III wrote:
> > > > Now, as you see, tuples with NULL in the "start_time" field appear
> > > > "after" the others. I would like to make all entries where
> > > > start_time IS NULL appear *before* all the others. Any idea how to
> > > > achieve this?
> > >
> > > SELECT start_date, start_time, end_time, title
> > > FROM onp_crm_activity_log
> > > WHERE start_date IS NOT NULL
> > > ORDER BY start_date ASC, start_time IS NOT NULL ASC, start_time ASC;
> > >
> > > This assumes you want the NULL start times first within a particular
> > > date. Otherwise change the order in the ORDER BY clause.
> >
> > Thanks! This is, IMO, the cleanest solution as it doesn't involve any
> > COALESCE.
Re: [SQL] definative way to place secs from epoc into timestamp
> > and the fact that I want the same value from the data base that I put
> > into it.
>
> "same" in which sense? The same absolute point in time? Or the same
> point on a calendar? Obviously if the timezone doesn't change, then the
> two are equivalent; but which one is your application actually looking
> for? (If your app is using Unix epoch times, then it's looking only at
> the absolute time and not the calendar time...)

Unix time stamps, short (int) or long res, are always supposed to be GMT
based, as far as I know; I have never seen anything different, except maybe
in homebrew software. So it should be both calendar and P.I.T. And you
wouldn't need the TZ storage if the date->number and number->date
translation itself takes the TZ arg so that it can localize the Human
String for you.

Ken
Re: [SQL] definative way to place secs from epoc into timestamp
> Unix time stamps, short (int) or long res, are always supposed to be GMT
> based, as far as I know; I have never seen anything different, except
> maybe in homebrew software. So it should be both calendar and P.I.T. And
> you wouldn't need the TZ storage if the date->number and number->date
> translation itself takes the TZ arg so that it can localize the Human
> String for you.
>
> Ken

In fact, I would suggest that if there is any function, or field, that takes
a TZ-less argument (*especially* if it takes only the number), its name
should be made to contain 'UTC' to clearly disambiguate its intended use
(since zone-less values/fields SHOULD be regarded as UTC). Otherwise, some
users will place epoch numbers adjusted for their timezone in the field (and
even with daylight saving offsets applied, somewhat amusingly but wrong). So
then two different users are using the exact same datatype for inconsistent
types. (Just a concern for interoperability, user awareness, and for when an
employee comes on-board and has to deal with bad legacy.)
[SQL] Variable column names in PL/pgSQL RECORD referencces
(Sorry for the redundancy; I sent this query earlier but forgot to put a title on it. Seems like it would be more useful with a title, so here it is again. If there's a moderator who can delete my earlier message, please do so.)

In PL/pgSQL, is there a way to put a *variable* column-name in a dot-notation reference to a RECORD column? For example, suppose I want to write a function like the following, which is to be called by a "BEFORE INSERT" trigger:

CREATE OR REPLACE FUNCTION foo ( ) RETURNS TRIGGER AS '
DECLARE
    var VARCHAR;
BEGIN
    var := TG_ARGV[0];
    NEW.<column> := ''whatever'';
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

The aim of this uninteresting function is to assign the value 'whatever' to the table column that is passed in by the calling trigger as TG_ARGV[0], i.e. the first calling argument. What I don't know is what to put into the dot notation in place of "<column>" so that the column of NEW that is addressed by the assignment statement is the one passed in as the first argument.

Is there any PL/pgSQL construct that could be substituted in here to achieve this result? If not, can anybody suggest a way to write a trigger-called function that would accomplish the same result?

In case it's not obvious, the underlying goal is to write a single trigger-called function that could modify different columns for each trigger that called it, where each trigger specified the target column by a calling argument (or by any other viable mechanism).

~ TIA
~ Ken
[SQL] Defaulting a column to 'now'
How can a column's default be set to 'now', meaning 'now' as of when each row is inserted? For example, here's a snip of DDL:

create table personal_data (
    ...
    effective_date_and_time TIMESTAMP WITH TIME ZONE not null default 'now',
    ...

The problem is that when PostgreSQL processes this DDL, it interprets the 'now' as the timestamp when the table is created, so the table's definition reads as if the DDL were:

    effective_date_and_time TIMESTAMP WITH TIME ZONE not null default '2005-12-14 11:00:16.749616-06',

and all of the newly inserted rows get assigned effective_date_and_time = '2005-12-14 11:00:16.749616-06', which in addition to being wrong leads to uniqueness constraint violations.

~ TIA
~ Ken
Re: [SQL] Defaulting a column to 'now'
Thanks, Tom (also Keith Worthington and Bricklen Anderson). That works.

~ Ken

> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, December 14, 2005 1:15 PM
> To: Ken Winter
> Cc: PostgreSQL pg-sql list
> Subject: Re: [SQL] Defaulting a column to 'now'
>
> "Ken Winter" <[EMAIL PROTECTED]> writes:
> > How can a column's default be set to 'now', meaning 'now' as of when each
> > row is inserted?
>
> You need a function, not a literal constant. The SQL-spec way is
>     CURRENT_TIMESTAMP
> (which is a function, despite the spec's weird idea that it should be
> spelled without parentheses); the traditional Postgres way is
>     now()
>
> Either way only sets an insertion default, though. If you want to
> enforce a correct value on insertion, or change the value when the
> row is UPDATEd, you need to use a trigger.
>
> regards, tom lane
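Applied to the table from the original post, the default can also be fixed in place on the existing table (a sketch; the column name is as posted upthread):

ALTER TABLE personal_data
    ALTER COLUMN effective_date_and_time SET DEFAULT now();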
[SQL] Rule causes baffling error
I'm trying to figure out why a rule gives me a uniqueness violation when I
try to do an update.
I have a table, "my_data", defined as:
create table my_data (
id INT8 not null default nextval('person_seq'),
effective_date_and_time TIMESTAMP WITH TIME ZONE not null default
CURRENT_TIMESTAMP,
expiration_date_and_time TIMESTAMP WITH TIME ZONE null default
'infinity',
user_name VARCHAR(255) null,
constraint PK_MY_DATA primary key
(effective_date_and_time, id)
);
I have a view, my_data_now, defined as:

SELECT my_data.id,
       my_data.user_name,
       my_data.effective_date_and_time,
       my_data.expiration_date_and_time
FROM my_data
WHERE my_data.effective_date_and_time <= 'now'::text::timestamp(6) with time zone
  AND my_data.expiration_date_and_time >= 'now'::text::timestamp(6) with time zone;
And I have this rule (among others):
CREATE OR REPLACE RULE upd_my_data_now AS
ON UPDATE TO my_data_now
DO INSTEAD
(
/* Update current record, and make it effective now. */
UPDATE my_data
SET id = NEW.id,
user_name = NEW.user_name,
effective_date_and_time =
('now'::text)::timestamp(6) with time zone
WHERE effective_date_and_time =
OLD.effective_date_and_time
AND id = OLD.id;
/* Insert a record containing the old values,
and expire it as of now. */
INSERT INTO my_data (
effective_date_and_time,
expiration_date_and_time,
id,
user_name)
VALUES (
OLD.effective_date_and_time,
('now'::text)::timestamp(6) with time zone,
OLD.id,
OLD.user_name)
)
;
This rule is supposed to (1) cause an update directed to the view
"my_data_now" to be made to the underlying table "my_data", (2) reset the
"effective_date_and_time" of that row to 'now', (3) insert a record
containing the old values into "my_data", and (4) expire that "old" record
by setting its "expiration_date_and_time" to 'now'.
But when I try to do an update against the view "my_data_now" with a query
such as:
update my_data_now set user_name = 'Suzy' where id = 1;
I get:
ERROR: duplicate key violates unique constraint "pk_my_data"
Presumably this happens when the rule tries to insert the new row. The new
row does indeed contain the "old" id and effective_date_and_time. However,
the rule is structured so that the current row's "effective_date_and_time"
gets updated to 'now' *before* the new row is inserted, making its value
different from the old "effective_date_and_time". So the uniqueness
conflict shouldn't occur.
I figure either there's some bug in my code that I can't see, or else the
PostgreSQL rule processor works in some way that I don't understand.
In either case, help!
~ TIA
~ Ken
Re: [SQL] Rule causes baffling error
Richard ~

Thanks for your response.

Can a trigger be written on a *view*? I can't find anything in the PostgreSQL docs that answers this question.

I originally wrote these actions (described in my original message) as a trigger on my base table, but then realized I was getting in deeper and deeper trouble because (a) I was getting into cascading triggers that I didn't want and (b) I need to enable some queries to access the base table without triggering these actions. That's why I set up the view, and then I assumed that the only way I could implement these actions was as rules.

~ Ken

> -----Original Message-----
> From: Richard Huxton [mailto:[EMAIL PROTECTED]]
> Sent: Monday, December 19, 2005 4:08 AM
> To: Ken Winter
> Cc: 'PostgreSQL pg-sql list'
> Subject: Re: [SQL] Rule causes baffling error
>
> Ken Winter wrote:
> > This rule is supposed to (1) cause an update directed to the view
> > "my_data_now" to be made to the underlying table "my_data", (2) reset the
> > "effective_date_and_time" of that row to 'now', (3) insert a record
> > containing the old values into "my_data", and (4) expire that "old" record
> > by setting its "expiration_date_and_time" to 'now'.
>
> I think you want a trigger rather than a rule.
>
> Rules rewrite the query structure, triggers let you deal with values on
> a row-by-row basis (for row-level triggers).
>
> --
>    Richard Huxton
>    Archonet Ltd
Re: [SQL] Rule causes baffling error
The relevant change is that I'm now expiring the record with the old data and inserting the one with the new data, rather than vice versa. I still don't know why the old rule didn't work and this one does, but hey, whatever. Another advantage of the new one is that I don't have to re-point foreign keys that were already pointed to the record containing the old data, because that record stays in place.

(The other change, adding the lines

    AND effective_date_and_time <= CURRENT_TIMESTAMP
    AND expiration_date_and_time >= CURRENT_TIMESTAMP;

to the UPDATE, was necessary to keep updates to "my_data_now" from updating the expired rows as well.)

Thanks for your help. I hope this little essay is of some value to others.

~ Ken

> -----Original Message-----
> From: Richard Huxton [mailto:[EMAIL PROTECTED]]
> Sent: Monday, December 19, 2005 11:05 AM
> To: Ken Winter
> Cc: 'PostgreSQL pg-sql list'
> Subject: Re: [SQL] Rule causes baffling error
>
> Ken Winter wrote:
> > Richard ~
> >
> > Thanks for your response.
> >
> > Can a trigger be written on a *view*? I can't find anything in the
> > PostgreSQL docs that answers this question.
>
> There's nothing for them to fire against even if you could attach the
> trigger. I suppose you could have a statement-level trigger in more
> recent versions, but for row-level triggers there aren't any rows in the
> view to be affected.
>
> > I originally wrote these actions (described in my original message) as a
> > trigger on my base table, but then realized I was getting in deeper and
> > deeper trouble because (a) I was getting into cascading triggers that I
> > didn't want and (b) I need to enable some queries to access the base table
> > without triggering these actions. That's why I set up the view, and then I
> > assumed that the only way I could implement these actions was as rules.
>
> Hmm - the cascading should be straightforward enough to deal with. When
> you are updating check if NEW.expiration_date_and_time = now() and if so
> exit the trigger function (since there's nothing to do anyway).
>
> The other thing you might want to consider is whether the "live" data
> should be in the same table as the "old" data. That will depend on how
> you want to use it - conceptually is it all one continuum or is the
> "old" data just for archive purposes.
>
> Now, having got this feature working, why do you want to bypass it? Will
> it be a specific user, involve specific patterns of values or what?
>
> --
>    Richard Huxton
>    Archonet Ltd
[SQL] The Information Schema vs the PG Catalog
I'm writing PL/pgSQL routines that generate triggers, functions, and rules based on design characteristics of tables, columns, and other database objects. These routines need to be able to look up the definitions of these objects. I see that there are two places available to look up this info: the Information Schema and the PG Catalog.

Which source is preferable? Or if that answer isn't absolute, what are the reasons or conditions for preferring one over the other?

Also, a specific question: Does the Information Schema offer any way to list the sequences that exist and their attributes? I can't seem to find any.

~ TIA
~ Ken
Re: [SQL] The Information Schema vs the PG Catalog
Thanks, George. What you say fits with what I was finding. I think that's the way I will go.

~ Ken

> -----Original Message-----
> From: George Pavlov [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, December 27, 2005 12:11 PM
> To: [email protected]
> Cc: [EMAIL PROTECTED]
> Subject: Re: The Information Schema vs the PG Catalog
>
> I would say that pg_catalog is the more complete one whereas the
> information_schema is the more generic, standards-conformant place. I
> would stick with the information_schema unless that becomes inadequate.
> A case in point may be sequences. Apart from
> information_schema.columns.column_default I haven't seen them
> represented anywhere there (please someone correct me if I am wrong).
> You can get more information about sequences from pg_catalog.pg_class
> (look for pg_class.relkind='S') and various views that sit on top of
> that (e.g. pg_statio_all_sequences).
>
> George
>
> > I'm writing PL/pgSQL routines that generate triggers, functions,
> > and rules based on design characteristics of tables, columns, and
> > other database objects. These routines need to be able to look up
> > the definitions of these objects. I see that there are two places
> > available to look up this info: the Information Schema and the
> > PG Catalog.
> >
> > Which source is preferable? Or if that answer isn't absolute,
> > what are the reasons or conditions for preferring one over the
> > other?
> >
> > Also, a specific question: Does the Information Schema offer any
> > way to list the sequences that exist and their attributes? I
> > can't seem to find any.
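Following George's pointer, a minimal catalog query for listing sequences (a sketch):

SELECT relname
FROM pg_catalog.pg_class
WHERE relkind = 'S'
ORDER BY relname;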
[SQL] Arrays in PL/pgSQL routines?
Can arrays be declared in PL/pgSQL routines? If so, how?

Section 8.10 of the documentation (http://www.postgresql.org/docs/7.4/static/arrays.html) tells how to declare and use arrays as table columns. But I don't find any part of the documentation that says how to declare a simple array local to a PL/pgSQL function. I tried the following guess, but it only won me a "syntax error at or near VARCHAR":

DECLARE
    my_array VARCHAR [];
    ...

~ TIA
~ Ken
Re: [SQL] Arrays in PL/pgSQL routines?
Bricklen ~
That works. (Odd that the initialization seems to be necessary to make it
work.) Thanks! Yes, I'm using version 7.4.
~ Ken
> -Original Message-
> From: Bricklen Anderson [mailto:[EMAIL PROTECTED]
> Sent: Thursday, December 29, 2005 12:53 PM
> To: Ken Winter
> Cc: PostgreSQL pg-sql list
> Subject: Re: [SQL] Arrays in PL/pgSQL routines?
>
> Ken Winter wrote:
> > Can arrays be declared in PL/pgSQL routines? If so, how?
> >
>
> > DECLARE
> >
> try:
> my_array VARCHAR[] := '{}';
>
> not sure if this works in 7.4 though, if that's the version that you are
> using.
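For later readers, a self-contained sketch of declaring and filling a local array in 7.4-era PL/pgSQL (the function name is hypothetical; array_append() exists as of 7.4):

CREATE FUNCTION array_demo() RETURNS varchar AS '
DECLARE
    my_array VARCHAR[] := ''{}'';
BEGIN
    my_array := array_append(my_array, ''first'');
    my_array := array_append(my_array, ''second'');
    RETURN my_array[2];  -- arrays are 1-indexed, so this returns second
END;
' LANGUAGE 'plpgsql';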
[SQL] Still struggling with history tables
Friends ~
I'm still trying to implement a solution to the requirement to keep a
complete history of data changes to a "person" table. (See earlier
correspondence below.) I'm trying for a variant of the architecture
suggested by Richard Huxton (also below). In my variant, I have this
"identifier" table, carrying the id and invariant info about each person:
/*==*/
/* Table: person_i */
/*==*/
create table person_i (
idi BIGSERIAL not null,
date_of_birth DATE null,
constraint PK_person_i_key_1 primary key (idi)
)
;
And then I have this "history" table, carrying the time-varying info on
which I want to keep a complete history:
/*==*/
/* Table: person_h */
/*==*/
create table person_h (
idh INT8 not null,
start_date DATE not null default 'now()',
end_date DATE null,
name VARCHAR(255) null,
constraint PK_person_h_key_1 primary key (idh, start_date),
constraint fk_reference_6 foreign key (idh)
references person_i (idi)
on delete restrict on update restrict
)
;
Triggers are in place on the "person_h" table so that when an app does an
update, the current h record is expired (with its old data) and a new record
(with the updated data) is inserted and made effective "now". What I'm now
trying to build is this view:
/*==*/
/* View: person */
/*==*/
create view person as
select idi, date_of_birth, start_date, end_date, name
from person_i i, person_h h
where i.idi = h.idh;
I want to enable users (and apps) who need to deal only with current data to
be able to treat "person" as a real table (i.e. to write to it as well as
read from it). Specifically, the requirements are that when a user does:
. Insert - The system inserts a record into the i table and the first
record in the h table.
. Select - The system returns attributes of i and h tables (not
duplicating the identifier columns).
. Update - The system allows updating of i attributes
(update-in-place, not creating a new history record) and h attributes
(creating a new history record).
. Delete - The system deletes the i record and all of its h records.
I'm stuck on how to implement the "insert" action, which I thought would be
simple. The problem is this: The i table id is of type BIGSERIAL, i.e.
sequence-assigned. I've tried writing the following rule to get both the i
record and the first h record inserted:
CREATE RULE ru AS
ON INSERT TO person
DO INSTEAD (
INSERT INTO person_i DEFAULT VALUES;
INSERT INTO person_h (idh) VALUES (NEW.idi)
);
I thought this would grab the sequence-assigned value of person_i.idi to put
into person_h.idh (this trick works in trigger functions), but instead it
just assigns Null to person_h.idh, and the transaction fails with a "not
null" violation on person_h.idh. And now that I look at the documentation
(http://www.postgresql.org/docs/7.4/static/rules-update.html), it says that
a column not assigned in the invoking query "is replaced by a null value
(for an INSERT)" in the NEW pseudo-record. Bummer. Triggers behave nicely,
but rules don't.
I'd be willing to do it with a trigger function instead, but I can't attach
a trigger to a view.
I considered doing it with a trigger function on the person_i table, but I
don't know how that could be made to cause an insert of the person_h table
record - and the assignment of h table values such as "name" from the app's
query.
Suggestions?
~ TIA
~ Ken
> -Original Message-
> From: Richard Huxton [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 20, 2005 4:16 AM
> To: Ken Winter
> Cc: 'PostgreSQL pg-sql list'
> Subject: Re: [SQL] Rule causes baffling error
>
> Ken Winter wrote:
> > Richard ~
> >
> > Let me zoom out for a moment, for the bigger picture.
> >
> > As you have inferred, what I'm trying to do is develop a history-
> preserving
> > table ("my_data" in the example that started this thread). *Most* user
> > programs would see and manipulate this table as if it contained only the
> > current rows (marked by effective_date_and_time <= 'now' and
> > expiration_date_and_time = 'infinity').
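For archive readers, one workaround for the NULL NEW.idi problem (a sketch, untested; the sequence name assumes the BIGSERIAL above created person_i_idi_seq): have the rule's second INSERT read the value the first INSERT's default just generated, via currval():

CREATE RULE ru AS
ON INSERT TO person
DO INSTEAD (
    INSERT INTO person_i DEFAULT VALUES;
    INSERT INTO person_h (idh, name)
    VALUES (currval('person_i_idi_seq'), NEW.name)
);

The two statements in the rule run in order in the same session, so currval() sees the id assigned by the first INSERT. Note this sketch only behaves for single-row inserts.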
[SQL] Non Matching Records in Two Tables
I need some help with a bit of SQL. I have two tables. I want to find records in one table that don't match records in another table, based on a common column in the two tables. Both tables have a column named 'key100'. I was trying something like:

SELECT count(*)
FROM table1, table2
WHERE (table1.key100 != table2.key100);

But the query is very slow and I finally just cancel it. Any help is very much appreciated.

-Ken
Re: [SQL] Non Matching Records in Two Tables
On Wed, 2006-02-08 at 16:27 -0500, Frank Bax wrote:
At 04:10 PM 2/8/06, Ken Hill wrote:
>I need some help with a bit of SQL. I have two tables. I want to find
>records in one table that don't match records in another table based on a
>common column in the two tables. Both tables have a column named 'key100'.
>I was trying something like:
>
>SELECT count(*)
>FROM table1, table2
>WHERE (table1.key100 != table2.key100);
>
>But the query is very slow and I finally just cancel it. Any help is very
>much appreciated.
vacuum analyse table1;
vacuum analyse table2;
select count(*) from table1 full outer join table2 on
table1.key100=table2.key100 where table1.key100 is null or table2.key100 is
null;
If this is also slow, post output of "EXPLAIN ANALYSE SELECT "
OK. I added indexes on the two columns in the two tables:
CREATE INDEX key100_idex
ON ncccr9 (key100);
CREATE INDEX key100_ncccr10_idex
ON ncccr10 (key100);
Here is the analysis of the query:
csalgorithm=# EXPLAIN ANALYSE SELECT count(*)
csalgorithm-# FROM ncccr9 LEFT JOIN ncccr10 ON ncccr9.key100 = ncccr10.key100
csalgorithm-# WHERE ncccr10.key100 IS NULL;
QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=208337.59..208337.59 rows=1 width=0) (actual time=255723.212..255723.214 rows=1 loops=1)
   ->  Hash Left Join  (cost=99523.55..207101.41 rows=494471 width=0) (actual time=92326.635..255538.447 rows=38144 loops=1)
         Hash Cond: ("outer".key100 = "inner".key100)
         Filter: ("inner".key100 IS NULL)
         ->  Seq Scan on ncccr9  (cost=0.00..59360.71 rows=494471 width=104) (actual time=171.778..75099.734 rows=494471 loops=1)
         ->  Hash  (cost=88438.64..88438.64 rows=611564 width=104) (actual time=91962.956..91962.956 rows=0 loops=1)
               ->  Seq Scan on ncccr10  (cost=0.00..88438.64 rows=611564 width=104) (actual time=11.704..76519.323 rows=611564 loops=1)
 Total runtime: 255724.219 ms
(8 rows)
The result of 38,144 non-matching records seems too much:
csalgorithm=# SELECT count(*)
csalgorithm-# FROM ncccr9 LEFT JOIN ncccr10 ON ncccr9.key100 = ncccr10.key100
csalgorithm-# WHERE ncccr10.key100 IS NULL;
count
---
38144
(1 row)
Maybe I need to do a RIGHT JOIN to return the count of records in table 'ncccr10' that don't match records in 'ncccr9'?
Thanks for your help. JOINS are fairly new to me.
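For the other direction (rows in ncccr10 with no match in ncccr9), the join simply flips; a sketch:

SELECT count(*)
FROM ncccr10 LEFT JOIN ncccr9 ON ncccr10.key100 = ncccr9.key100
WHERE ncccr9.key100 IS NULL;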
[SQL] Column Index vs Record Insert Trade-off?
Is there a performance trade-off between column indexes and record inserts? I know that in MS Access there is such a trade-off: indexes make SQL queries perform faster at the cost of record insert speed. Put another way, the more column indexes a table has, the slower a record insert into that table performs. Is there a similar trade-off in PostgreSQL?
Re: [SQL] Syntax for "IF" clause in SELECT
This has been something I've been trying to do so that I can do some column comparisons as part of "data-cleaning" work. I'll let you know if this helps me accomplish my task!

On Wed, 2006-02-08 at 15:20 -0800, Bricklen Anderson wrote:
> [EMAIL PROTECTED] wrote:
> > Greetings,
> >
> > the following is a MySQL statement that I would like to
> > translate to PostgreSQL. Could someone point me to documentation of a
> > corresponding syntax for an "IF" clause in a SELECT,
> > or is there some other way to do this?
> >
> > select
> > if(spektrum is null,' ','J'),
> > if(s19 is null,' ','J'),
> > if(OhneGrenze is null,' ','J'),
> > from namen;
> >
> > Do I need to create my own function to allow this behaviour?
> >
> > my best regards,
> >
> > Stefan
>
> Use CASE. Since I'm not a user of MySQL, and if I'm reading your query
> correctly, try:
>
> select (CASE when spektrum is null then 'J' else spektrum end), ...
>
> or, if you are just trying to replace nulls, then try COALESCE.
Re: [SQL] unique constraint instead of primary key? what
On Wed, 2006-02-08 at 21:04 -0500, george young wrote:
> [PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.1]
> I'm designing a completely new schema for my database. A major criterion
> is that it facilitate ad-hoc queries via MS-Access, Excel and OpenOffice,
> presumably with ODBC.
>
> My question regards the use of UNIQUE constraints instead of PRIMARY KEYs
> on some tables. Both result in an index on the specified tuple of fields,
> so I presume query performance shouldn't be much different. Using UNIQUE
> constraints seems to let me better match the natural structure of my data.
>
> A 'run' contains a sequence of 'opsets'. Each opset contains a sequence
> of (a few) 'steps':
>
> run-foo
>   opset-1
>     step-1
>     step-2
>   opset-2
>     step-1
>
> So the 'steps' table is logically indexed by (run, opset_num, step_num).
> But some opsets are not in runs, and some steps are not in opsets, so I
> would have step.run be null in some cases, likewise step.opset_num. Null
> values mean I can't use these fields in a primary key, so I propose to
> use UNIQUE constraints instead.
>
> What am I losing by not using PRIMARY KEYs? Will ODBC clients have
> difficulty dealing nicely with the database? Will the planner produce
> lousy query plans? Will Edgar Codd haunt my dreams?
> -- George Young

I think I can give you some insights about MS Access to help you. In MS Access, you can specify a column as a "primary key", which basically means the column is indexed and must contain unique values (also, nulls are not allowed). I have run into problems depending on columns being "primary key" in MS Access in db apps that receive data in batch file uploads from other sources (e.g., uploading 1,000+ records into a table).

It sounds like your requirement to use MS Access for ad-hoc queries means that you will have some users that want to access the database with MS Access as a "front-end" client tool. If that is the situation, then you don't need to worry about the structure of the table, as MS Access relies on ODBC for this. You may also want to communicate to the end users that MS Access is not a client-server tool; in other words, all of the records are transferred from the server to the client's box and then the query is executed. I hope that helps.
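(For readers following George's question: a UNIQUE constraint does allow NULLs, and in PostgreSQL two rows with NULL in a constrained column never conflict with each other. A minimal sketch of the proposed steps table — the column types are guesses:)

CREATE TABLE steps (
    run text,               -- may be NULL: some steps are not in a run
    opset_num integer,      -- may be NULL: some steps are not in an opset
    step_num integer NOT NULL,
    UNIQUE (run, opset_num, step_num)
);
-- note: because NULLs never compare equal, two rows with run = NULL and the
-- same (opset_num, step_num) are both accepted by this constraint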
Re: [SQL] date
On Fri, 2006-02-10 at 07:38 +0100, A. Kretschmer wrote:
On 09.02.2006 at 22:18:09 -0800, superboy143 (sent by Nabble.com) wrote:
>
> Hello,
>
> How can I write an sql query in postgresql so that I can insert a date into
> a table in the format DD-MM-YYYY, and when I select the date from the table
> I should get the date in the same format.
You can't define the format in the db, but you can define the
output-format with to_char(date, 'DD-MM-YYYY');
HTH, Andreas
You could also try using the date_part() function:
date_part('month',date)||'-'||date_part('day',date)||'-'||date_part('year',date)
But I think Andreas' suggestion is a bit more elegant.
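(A one-line illustration of Andreas' suggestion; the table and column names are invented:)

SELECT to_char(mydate, 'DD-MM-YYYY') FROM mytable;
-- the date is stored in the native date type; only the output is formatted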
Re: [SQL] query
On Fri, 2006-02-10 at 00:11 -0600, Bruno Wolff III wrote:
On Tue, Feb 07, 2006 at 01:45:50 -0800, "superboy143 (sent by Nabble.com)" <[EMAIL PROTECTED]> wrote:
> I have a table in which I have a field with format like 100101. It has many
> values like 100101, 100102, 100103, 100201, 100202, 100301. I have to write
> a query such that I get only distinct values containing the substring I
> need. If I give 10 as a substring, then it should return only 100101 or
> 100102 but not both; i.e., if the last two characters are not the same it
> should not return both of them. It should return only values starting with
> 10, the middle two characters should be distinct, and the last two
> characters may be anything.

You can probably use the Postgres extension DISTINCT ON to do what you want.

Try substring(column,1,4). That should return values of 1001, 1002, 1003 when grouped.
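(Combining the two replies, a hedged sketch — the table and column names are invented:)

SELECT DISTINCT ON (substring(code,1,4)) code
FROM codes
WHERE code LIKE '10%'
ORDER BY substring(code,1,4), code;
-- one row per distinct four-character prefix; the ORDER BY decides which row wins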
[SQL] Deleting rows in a file based on condition
I have the following perl script that reads a fixed-width file and replaces values in various sections of the file.
---
open (IN, '< in.txt') or die "can't open in.txt: $!";
open (OUT, '> out_test.txt') or die "can't open out_test.txt: $!";
while (<IN>) {
chomp;
$first_section = substr $_, 0, 381; # extract the first section of the record
$facilityno = substr $_, 381, 10; # extract the facilityno field
$second_section = substr $_, 391, 1056; # extract the second section of the record
$requestor_section=" " x 500;
# $requestor_section = substr $_, 1446, 499; # extract requestor section of record
$third_section = substr $_, 1946, 4748; # extract third section of record
# print out the file with changed facilityno value ...
print OUT $first_section . "0039007300" . $second_section . $requestor_section . $third_section . "\n";
}
close (IN);
close (OUT);
I want to place an "if...then" condition on the $facilityno value; such that if the $facilityno value = 00, delete the record (e.g., don't print out that row); rather skip that row and continue printing out the remaining rows.
Any advice is very much appreciated.
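(For the if...then condition asked about above, a minimal hedged sketch in the script's own Perl — assuming "facilityno value = 00" means the 10-character field holds only zeros, possibly padded with spaces:)

next if $facilityno =~ /^[0 ]+$/; # skip (don't print) records whose facilityno is all zeros

Placed just after $facilityno is extracted inside the while loop, this drops those records and lets every other row print as before.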
Re: [SQL] Deleting rows in a file based on condition
Oops. I posted this to the wrong support list. Sorry.
-Ken
[SQL] Slow update SQL
I'm experiencing a very slow query. The table contains 611,564 rows of data. I vacuumed the table:
VACUUM ANALYZE ncccr10;
SELECT count(*) FROM ncccr10;
count
611564
(1 row)
When I try to analyze the query plan with:
EXPLAIN ANALYZE
UPDATE ncccr10
SET key = facilityno||'-'||
lastname||'-'||
sex||'-'||
ssno||'-'||
birthdate||'-'||
primarysit||'-'||
dxdate||'-'||
morphology3
WHERE date_part('year',dxdate) > '2000';
The query just never finishes (even 1 hour later). The column key100 is
indexed, and I'm setting the value of this column from other columns. Why is
this so slow?
Re: [SQL] Slow update SQL
On Mon, 2006-02-13 at 22:17 -0700, Michael Fuhr wrote:
[Please copy the mailing list on replies.]
On Mon, Feb 13, 2006 at 06:48:06PM -0800, Ken Hill wrote:
> On Mon, 2006-02-13 at 19:14 -0700, Michael Fuhr wrote:
> > How many rows does the condition match?
>
> csalgorithm=# SELECT count(*) FROM ncccr10 WHERE
> date_part('year',dxdate) > '2000';
> count
>
> 199209
> (1 row)
You're updating about a third of the table; an expression index on
date_part probably wouldn't help because the planner is likely to
stick with a sequential scan for such a large update. Even if it
did help it's likely to be a small fraction of the total time.
The table definition you sent me showed nine indexes. You might
see a substantial performance improvement by dropping all the
indexes, doing the update, then creating the indexes again (don't
forget to vacuum and analyze the table after the update). However,
dropping the indexes has obvious implications for other queries so
you might need to do the update at a time when that doesn't matter.
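(A hedged sketch of that drop/update/recreate cycle — the index name and abbreviated SET list are placeholders, not taken from the real table definition:)

DROP INDEX ncccr10_lastname_idx;   -- repeat for each of the nine indexes
UPDATE ncccr10
SET key = facilityno||'-'||lastname   -- abbreviated; full expression as in the original post
WHERE date_part('year',dxdate) > '2000';
CREATE INDEX ncccr10_lastname_idx ON ncccr10 (lastname);
VACUUM ANALYZE ncccr10;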
> > Have you queried pg_locks
> > to see if the update is blocked on an ungranted lock?
>
> I don't know what that is. How do I query pg_locks?
SELECT * FROM pg_locks;
http://www.postgresql.org/docs/7.4/static/monitoring-locks.html
> > What version of PostgreSQL are you running?
>
> 7.4.8. Thank you for your help.
Newer versions generally perform better; consider upgrading to 8.0
or 8.1 if possible.
Removing the indexes, running the update SQL, and then adding back the indexes worked much faster. Thank you for your help.
Re: [SQL] Non Matching Records in Two Tables
On Wed, 2006-02-08 at 22:31 +0100, Markus Schaber wrote:

Hi, Ken,

Ken Hill schrieb:
> I need some help with a bit of SQL. I have two tables. I want to find
> records in one table that don't match records in another table based on
> a common column in the two tables. Both tables have a column named
> 'key100'. I was trying something like:
>
> SELECT count(*)
> FROM table1, table2
> WHERE (table1.key100 != table2.key100);
>
> But the query is very slow and I finally just cancel it. Any help is
> very much appreciated.

Do you have indices on the key100 columns? Is autovacuum running, or do you do analyze manually? Can you send us the output from "EXPLAIN ANALYZE [your query]"?

Btw, I don't think this query will do what you wanted; it basically creates a cross product. That means if your tables look like:

schabitest=# select * from table1;
 key100 | valuea | valueb
--------+--------+--------
      1 | foo    | bar
      2 | blah   | blubb
      3 | manga  | mungo

schabitest=# select * from table2;
 key100 | valuec | valued
--------+--------+--------
      1 | monday | euro
      2 | sunday | dollar
      4 | friday | pounds

Then your query will produce something like:

schabitest=# select * from table1, table2 WHERE (table1.key100 != table2.key100);
 key100 | valuea | valueb | key100 | valuec | valued
--------+--------+--------+--------+--------+--------
      1 | foo    | bar    |      2 | sunday | dollar
      1 | foo    | bar    |      4 | friday | pounds
      2 | blah   | blubb  |      1 | monday | euro
      2 | blah   | blubb  |      4 | friday | pounds
      3 | manga  | mungo  |      1 | monday | euro
      3 | manga  | mungo  |      2 | sunday | dollar
      3 | manga  | mungo  |      4 | friday | pounds

I suggest you would like to have all records from table1 that don't have a corresponding record in table2:

schabitest=# select * from table1 where table1.key100 not in (select key100 from table2);
 key100 | valuea | valueb
--------+--------+--------
      3 | manga  | mungo

HTH, Markus

Here is my query SQL:

SELECT key100 FROM ncccr10
WHERE ncccr10.key100 NOT IN (SELECT key100 FROM ncccr9);

It is still running after 30 minutes. Here is the query plan:

QUERY PLAN
-------------------------------------------------------------------------
 Seq Scan on ncccr10 (cost=0.00..20417160510.08 rows=305782 width=104)
   Filter: (NOT (subplan))
   SubPlan
     -> Seq Scan on ncccr9 (cost=0.00..65533.71 rows=494471 width=104)
(4 rows)

Any ideas why it is so slow?
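(An aside on the plan above: on 7.4, NOT IN over a large subquery falls back to the per-row subplan shown unless the subquery result fits in memory as a hash. A hedged alternative that can instead probe an index on ncccr9.key100, if one exists:)

SELECT a.key100
FROM ncccr10 a
WHERE NOT EXISTS (SELECT 1 FROM ncccr9 b WHERE b.key100 = a.key100);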
Re: [SQL] Non Matching Records in Two Tables
On Tue, 2006-02-14 at 13:08 -0800, chester c young wrote:
> Here is my query SQL:
>
> SELECT key100 FROM ncccr10
> WHERE ncccr10.key100 NOT IN (SELECT key100 FROM ncccr9);
>
> It is still running after 30 minutes.

I would try an outer join:

select a.key100
from ncccr10 a
left join ncccr9 b on( key100 )
where b.key100 is null;

also (hate to be obvious) have you analyzed lately?

Yes, I did a VACUUM ANALYZE on the entire database before running this query. Also, I get an error with your suggestion:

csalgorithm=# SELECT a.key100 FROM ncccr10 a
csalgorithm-# LEFT JOIN ncccr9 b ON(a.key100)
csalgorithm-# WHERE b.key100 IS Null;
ERROR: argument of JOIN/ON must be type boolean, not type character
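(The error is only syntax: ON needs a boolean expression, or USING needs the bare shared column name. Either hedged form below should run:)

SELECT a.key100
FROM ncccr10 a LEFT JOIN ncccr9 b ON (a.key100 = b.key100)
WHERE b.key100 IS NULL;

-- or, since both tables name the column key100:
SELECT a.key100
FROM ncccr10 a LEFT JOIN ncccr9 b USING (key100)
WHERE b.key100 IS NULL;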
Re: [SQL] Non Matching Records in Two Tables
On Tue, 2006-02-14 at 16:07 -0500, Tom Lane wrote:
Ken Hill <[EMAIL PROTECTED]> writes:
> Seq Scan on ncccr10 (cost=0.00..20417160510.08 rows=305782 width=104)
>   Filter: (NOT (subplan))
>   SubPlan
>     -> Seq Scan on ncccr9 (cost=0.00..65533.71 rows=494471 width=104)
> (4 rows)
> Any ideas why it is so slow?

"NOT (subplan)" is horrendous (and the system knows it; note the huge cost estimate). Try increasing work_mem enough so you get a hashed subplan instead.

regards, tom lane

How do I increase work_mem?
Re: [SQL] create table and data types
On Tue, 2006-02-14 at 22:12 +0100, Maciej Piekielniak wrote:
Hello pgsql-sql,

Does anybody know how to create a field in a new table with the data type acquired from a field in another table? For example:

create table new_table
(
name other_table.name%TYPE
);

Have you tried inheritance from one table to the new table?

CREATE TABLE new_table (new_column)
INHERITS (old_table)

All columns in 'old_table' will be included in 'new_table' plus the column 'new_column'.
Re: [SQL] create table and data types
On Tue, 2006-02-14 at 23:03 +0100, Maciej Piekielniak wrote:
Hello Ken,
Tuesday, February 14, 2006, 10:30:34 PM, you wrote:
KH> On Tue, 2006-02-14 at 22:12 +0100, Maciej Piekielniak wrote:
>> Hello pgsql-sql,
>>
>> Does anybody know how to create a field in a new table with the data type acquired from a field in another table?
>> For example:
>>
>> create table new_table
>> (
>> name other_table.name%TYPE
>> );
>>
KH> Have you tried inheritance from one table to the new table?
KH> CREATE TABLE new_table (new_column)
KH> INHERITS (old_table)
KH> All columns in 'old_table' will be included in 'new_table' plus the
KH> column 'new_column'.
yes, but i don't need all columns, i need only the same data type for only
some fields.
for example
create table new_table
(
name other_table.name%TYPE,
mynewfield VARCHAR(100),
mynewfield2 VARCHAR(100)
);
Have you tried restructuring the table with CREATE TABLE AS...? Try something like:

CREATE TABLE new_table AS
SELECT id,
       CAST(NULL AS varchar(100)) AS mynewfield,
       CAST(NULL AS varchar(100)) AS mynewfield2
FROM old_table;

This should create a new table ('new_table') whose id column takes its data type from 'old_table'.
Re: [SQL] Non Matching Records in Two Tables
On Tue, 2006-02-14 at 13:24 -0800, Ken Hill wrote:
> On Tue, 2006-02-14 at 13:08 -0800, chester c young wrote:
> > I would try an outer join:
> >
> > select a.key100
> > from ncccr10 a
> > left join ncccr9 b on( key100 )
> > where b.key100 is null;
> >
> > also (hate to be obvious) have you analyzed lately?
>
> Yes, I did a VACUUM ANALYZE on the entire database before running this
> query. Also, I get an error with your suggestion:
>
> csalgorithm=# SELECT a.key100 FROM ncccr10 a
> csalgorithm-# LEFT JOIN ncccr9 b ON(a.key100)
> csalgorithm-# WHERE b.key100 IS Null;
> ERROR: argument of JOIN/ON must be type boolean, not type character

Well, this works:

SELECT *
FROM ncccr9 a
LEFT JOIN ncccr10 b USING( key100 )
WHERE b.key100 is null;

It still seems slow. It takes about the same time to run as in MS Access. I thought PostgreSQL would be faster.
Re: [SQL] Non Matching Records in Two Tables
On Tue, 2006-02-14 at 15:05 -0800, Bryce Nesbitt wrote:
Ken Hill wrote:
> > also (hate to be obvious) have you analyzed lately?

I'd say that's fair game, not obvious. Vacuum/Analyze is not so apparent to a person moving to Postgres from other DBs.

In my O'Reilly "Practical PostgreSQL" book it recommends vacuuming a production database on a daily basis. I liken this to MS Access' "compact/repair" procedure.
[SQL] Modifying postgresql.org file
It has been suggested to me to increase my work_mem to make queries perform faster. I believe I do this in the 'postgresql.org' file. I seem to have two of these files:

/etc/postgresql/7.4/main/postgresql.org
/usr/share/postgresql/7.4/postgresql.conf.sample

I believe the second one is an example/sample file (similar to how xorg.conf comes with a sample file). Can someone point me in a direction as to where I can learn how to modify the postgresql.org file to increase work_mem? I have the O'Reilly book "Practical PostgreSQL" but it doesn't seem to cover this topic. I'm fairly confident that I can tweak the postgresql.org file, given that I was successfully tweaking my xorg.conf file. Any help/guidance is very much appreciated.

-Ken
Re: [SQL] Modifying postgresql.org file
On Wed, 2006-02-15 at 11:20 -0500, Andrew Sullivan wrote:
On Wed, Feb 15, 2006 at 08:15:46AM -0800, Ken Hill wrote:
> It has been suggested to me to increase my work_mem to make queries
> perform faster. I believe I do this in the 'postgresql.org' file. I
> seem to have two of these files:
>
> /etc/postgresql/7.4/main/postgresql.org
> /usr/share/postgresql/7.4/postgresql.conf.sample

Where did you get this version of Postgres? The main config file should be $PGDATA/postgresql.conf. Anything else is probably the result of your system's packaging having done some magic. But in any case, unless I'm misremembering, the work_mem setting isn't in 7.4. You can check the docs on postgresql.org. There's a whole section on the configuration variables, and manuals are available for several releases back.

A

This is how Ubuntu installed postgresql via its synaptic package manager.
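(For reference, a hedged sketch of both ways to raise the setting; on 7.4 the equivalent knob is sort_mem — work_mem arrived in 8.0 — and the values are arbitrary examples in kilobytes:)

SET sort_mem = 65536;   -- 7.4, per session
SET work_mem = 65536;   -- 8.0 and later, per session

-- or edit postgresql.conf (sort_mem = 65536 on 7.4, work_mem = 65536 on 8.0+)
-- and then reload: pg_ctl reload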
Re: [SQL] Modifying postgresql.org file
On Wed, 2006-02-15 at 18:09 +0100, Peter Eisentraut wrote:
Ken Hill wrote:
> Can someone point me in a direction as to where I can learn how to
> modify the postgresql.org file to increase work_mem?

RTFM

I apologize for my lack of knowledge, but what is "RTFM"?
Re: [SQL] Slow update SQL
bungsu,
That does make the query work a bit faster. Thanks!
On Mon, 2006-02-27 at 09:42 +0700, Bungsuputra Linan wrote:
Hi Ken,
I used to have the same problem. On my computer, using date_part in a WHERE
clause will always slow down the system when the table has plenty of rows.
My suggestion is to try changing the query to:
... WHERE dxdate >= '2001-01-01';
I hope this helps.
Regards,
bungsu
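(An editorial aside, not from the thread: 7.4 also supports expression indexes, so the original date_part predicate can be made indexable directly — assuming dxdate is a plain date or timestamp column, not timestamptz:)

CREATE INDEX ncccr10_dxyear_idx ON ncccr10 ((date_part('year', dxdate)));
-- WHERE date_part('year',dxdate) > '2000' can then use an index scan, though,
-- as noted earlier in the thread, updating a third of the table will likely
-- still favor a sequential scan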
----- Original Message -----
From: Ken Hill
To: Postgres Help
Sent: Tuesday, February 14, 2006 8:48 AM
Subject: [SQL] Slow update SQL
I'm experiencing a very slow query. The table contains 611,564 rows of data.
I vacuumed the table:
VACUUM ANALYZE ncccr10;
SELECT count(*) FROM ncccr10;
count
611564
(1 row)
When I try to analyze the query plan with:
EXPLAIN ANALYZE
UPDATE ncccr10
SET key = facilityno||'-'||
lastname||'-'||
sex||'-'||
ssno||'-'||
birthdate||'-'||
primarysit||'-'||
dxdate||'-'||
morphology3
WHERE date_part('year',dxdate) > '2000';
The query just never finishes (even 1 hour later). The column key100 is
indexed, and I'm setting the value of this
column from other columns. Why is this so slow?
Re: [SQL] Triggers using PL/pgSQL
On Sun, 2006-07-30 at 21:16 -0700, Thusitha Kodikara wrote:
> Hello,
>
> I am interested in developing some triggers to keep track of records
> that are changed (possibly the changes of one or more specific
> columns). In addition to keeping the new values, I also need to keep
> the old values (maybe in a separate table).
>
> Though I have done similar things in other RDBMSs using SQL, I find
> doing this in Postgres a little bit complicated - maybe because it
> needs to be done through a separate procedural language and through a
> separate function. The Postgres documentation also didn't provide much
> help (the examples are in C). I was mainly looking for examples showing
> how to refer to 'OLD' and 'NEW' rows using PL/pgSQL.
>
> Can someone please direct me to some such examples?
>
> Thanks and regards,
> -Thusitha

I too am very interested in this!

-Ken
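(Since the thread ends without an example, here is a minimal hedged sketch of an OLD/NEW audit trigger in PL/pgSQL; the person table and its name column are invented, and the dollar quoting needs 8.0+:)

CREATE TABLE person_audit (
    changed_at timestamp NOT NULL DEFAULT now(),
    old_name text,
    new_name text
);

CREATE OR REPLACE FUNCTION audit_person() RETURNS trigger AS $$
BEGIN
    -- OLD and NEW are the row values before and after the UPDATE
    IF NEW.name IS DISTINCT FROM OLD.name THEN
        INSERT INTO person_audit (old_name, new_name)
        VALUES (OLD.name, NEW.name);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER person_audit_trg
    AFTER UPDATE ON person
    FOR EACH ROW EXECUTE PROCEDURE audit_person();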
Re: [HACKERS] [SQL] Case Preservation disregarding case
Chuck McDevitt wrote:
At Teradata, we certainly interpreted the spec to allow case-preserving,
but case-insensitive, identifiers.
Users really liked it that way
My 2 thoughts:
1: It seems like this behavior of case sensitive-or-not-identifiers
could/should be a config option -- either globally for the server,
database, or at the connection/session level. Other databases *do*
support this type of granular config of misc SQL behavior -- it's
essential for shared hosting environments. Without it some users just
*can't* make the switch. Quoting all an app's identifiers -- or renaming
camel-case to underscored -- is a show stopper.
2: Even though the spec states differently (that identifiers should be
treated as case sensitive or else folded), precedence seems to have
changed that:
a) The databases that enforce this rule are fewer, I believe. IMO SQL
is now considered even higher than a 4GL language because its use is so
widespread - laymen need to use it.
b) the fact that different identifiers of mixed case could even coexist
in table columns or 'AS' or 'JOIN' -- really represents more of an
err'd design -- and a case-insen option would detect this (unlike the
current behavior). It would throw an immediate ("fail fast") runtime
exception. So I think it's *safer*. (If tbl.rowId and tbl.rowid both
exist in a table or AS identifiers, something bad _will_ happen when
someone takes over a project)
If there were a new default behavior (or just config option added), my
vote would, without a doubt, be for case-insens (yet case preserving)
mode... even when using quoted identifiers. This case-sensitive behavior
doesn't seem to offer any advantage/safety.
ken
Re: [HACKERS] [SQL] Case Preservation disregarding case
Martijn van Oosterhout wrote:
On Sat, Dec 02, 2006 at 12:41:37AM -0700, Ken Johanson wrote:
> 1: It seems like this behavior of case sensitive-or-not identifiers
> could/should be a config option -- either globally for the server,
> database, or at the connection/session level. Other databases *do*
> support this type of granular config of misc SQL behavior -- it's
> essential for shared hosting environments. Without it some users just
> *can't* make the switch. Quoting all an app's identifiers -- or renaming
> camel-case to underscored -- is a show stopper.

What about option 3: use camelcase without underscores and don't quote. Then you get case-insensitivity and it's still readable. You're obviously talking about an app which isn't quoting identifiers, so I'm not sure what the actual problem is.

Have a nice day,

Yes, I do routinely use non-quoted identifiers. The problem is that they are case-folded (to lower in PG's case), so my camel-case does not match. For the query to work I MUST quote identifiers that have camel-case:

SELECT pers."firstName", pers.lastname, ...

Has your experience with PG been different? If so I presume you have found a config that allows?:

SELECT pers.firstName, pers.lastname,

Ken
Re: [HACKERS] [SQL] Case Preservation disregarding case
Dennis Bjorklund wrote:
Ken Johanson skrev:
> Has your experience with PG been different? If so I presume you have
> found a config that allows?:
>
> SELECT pers.firstName, pers.lastname,

As long as you don't create the columns using quotes you can use that kind of names. For example:

CREATE TABLE foo (BAR int);
INSERT INTO foo VALUES (42);
SELECT BaR, bar, BAR, "bar" FROM foo;

But using "Bar" won't work.

/Dennis

That definitely makes sense. If one *wants* to enforce case, they should create the identifier with quotes. Although, since I'm using pgAdmin (III) to design tables in this case, or creating the tables through JDBC (a convenient way to copy tables and data from another datasource, using the copy-paste gesture), I believe those tools both *are* quoting identifiers that have camel case, and their behavior can't be overridden. So though I might personally be comfortable with DDL commands to re-define my existing quoted columns (as created by JDBC and pgAdmin), other users may not be. And having to recreate and issue the DDL to use un-quoted col names will be tedious in my case since I have ~20 tables to import. So my vote would remain for having a config option to ignore case, even on quoted identifiers.

Ken
Re: [HACKERS] [SQL] Case Preservation disregarding case
Dennis Bjorklund wrote:
> So my vote would remain for having a config option to ignore case, even
> on quoted identifiers..

And my vote is to not have such an option. But I'm not the one who decides, so don't worry about what I think :-) I would like to have an option to upper-case the identifiers instead of lower-casing them as pg does. The sql standard says that they should be upper-cased. But as far as I know there are no plans at the moment to add such an option either. Some time in the future I expect it to be implemented only because it's the standard.

/Dennis

Ya, I thought about that - just designing with all lower-case column names. But then the column names are harder to read, unless I change over to using underscore naming. I personally can't see any benefit at all to having the DB fold case (up or down), or enforcing case-sensitive identifiers. In fact I believe that an option to support case-insensitive matching (at least at the session level) would be largely innocuous --- beneficial even, since the optimizer would then detect that someone created both a rowId and a rowid, which indicates a possible design error (the whole fail-fast notion). In one way I think that even allowing creation of a separate "rowid" and "rowId" sort of violates set theory in a 4+ GL language... a "name" in its most abstract (human) sense doesn't (shouldn't) consider the case of its characters, only what the characters are. A rowid is also a rowId (or ROWID). Who really intentionally mixes them? (Only 3-4GL *programmers* who consider all-caps to represent constants, in my experience.)

thoughts,
Ken
Re: [HACKERS] [SQL] Case Preservation disregarding case
Martijn van Oosterhout wrote:
On Sat, Dec 02, 2006 at 11:08:51AM -0700, Ken Johanson wrote:

> I think it's unlikely to happen anytime soon. The primary reason being
> that then you can no longer use indexes to search the catalog.

Which I'm pretty sure is not the case - other DBs do allow index search on columns/identifiers regardless of their case. Probably the typical strategy is to use a case-insensitive hashtable (fold case for the keys before generating the hash). If it's the actual data that you're referring to in index searches, that would be a separate topic I think.

> means it has to be fixed at initdb time. And it would break a large
> number of client apps, for no particularly good reason.

I take a different opinion on this:

- *If* the option to turn on case-insensitive behavior were selectable at the DB or session level, existing apps could continue to use the case-sensitive mode and be completely unaffected.

- IMO turning it on *globally* would only break apps that are built case-sensitively *and* refer to identifiers of the same name (but mixed case) *and* are written for PG (since PG *had* been by and large non-portable until recently... the addition of standard string quoting, for example).

- It would *enhance* people's ability to "bring in" apps from so many other DBs which don't treat identifiers as case-sensitive. More of a compatibility boon than loss. That is a particularly good reason to me (since I'm the one who has to issue DDL on all my camelCase columns and recode my identifiers).

> Since the way identifiers are treated is user-visible, it would mean
> that apps would have to be coded to work with any setting. What would
> probably happen is that app A would only work with case-sensitive, and
> app B would only work with case-insensitive, and you end up with two
> apps that can't work on the same database. That's *bad*, we don't want
> to go there.

That is a good point and I'd normally agree - entice people to use the lowest-common-denominator behavior and code their apps case-sensitive. And yet, the DBs that expect case-sensitivity are now the minority, and we have: a) programmers who code against MySQL or MSSQL, or; b) laymen trying to run or port an app designed on MySQL to PG. Maybe not right per se - but the more popular way of doing things eventually wins out.

> The thing is, postgresql *is* case-insensitive, as is the whole SQL
> language. It's not case-preserving, that's all.

Right, it's case-insensitive only if you're willing to accept case-folding (down) everything that's not quoted. Not being case-preserving, as you say. But that's a pita to anyone coming from those "other" DBs who wants their column names to have mixed/camel case (for readability). PG right now *forces* them to change/adhere to underscore naming, or to quote *every* mixed-case identifier. You MUST tolerate having your column names stored in all lower case, or else you must quote all of them.

Best,
Ken
Re: [HACKERS] [SQL] Case Preservation disregarding case
Tom Lane wrote:
Ken Johanson <[EMAIL PROTECTED]> writes:
> -*If* the option to turn on case-insensitive behavior were selectable
> at the DB or session level, existing apps could continue to use the
> case-sensitive mode and be completely unaffected.

Ken, you clearly fail to understand the point being made above. This is not something that is "selectable at the session level". It won't work from either a functional or performance perspective.

Tom, re: "clearly fail to understand": I know it's not currently selectable at the session level. My proposal (quote: "*If* the option ... were selectable") was to make it selectable -- this and another item that I'd proposed in the past (the proposal for 'standard_conforming_strings' at the session level was a step *toward* spec without harming existing apps). Having to set standard_conforming_strings globally, for example, is a problematic setup for shared servers (ISPs etc). A session-configurable strategy is a huge feature to have in shared environments. (BTW that strategy is both "functional and performing" for Mysql -- several SQL conformance items can be config'd per connection.)

> ...proposals that involve moving away from the SQL spec rather than
> closer to it; and that's what you're asking us to do.

If I ever proposed changing any *default* (spec) behavior, it was inadvertent or a mistake (probably in making my argument that case-sensitivity does more harm than good for new PG users). Of course adhering to spec is desirable, and if that mode of operation remains *supported by default*, good. Adding a per-connection or per-db 'quirks mode' (in the case of plug-in apps already built as case-insensitive) is innocuous... that's a useful thing. Put another way: how many places is PG *deviating* from spec just by providing conveniences -- would users revolt if those were dropped? For compatibility, whether with old PG or Mysql (which is case-preserving but case-insensitive), we NEED them. Not as a default behavior per se, but as a per-session option.

Best,
Ken
Re: [HACKERS] [SQL] Case Preservation disregarding case
Martijn van Oosterhout wrote:
I think you're missing the point that clients will be using languages
that are case-sensitive. Consider the following Perl code:
$dbh->do("CREATE TEMP TABLE foo (Bar int4)");
$dbh->do("INSERT INTO foo VALUES (1)");
my $sth = $dbh->prepare("SELECT Bar FROM foo");
$sth->execute();
my $row = $sth->fetchrow_hashref();
print $row->{"bar"}; # prints 1
This code will work on any version of postgresql ever released, yet
your change will break it. By setting some options you can work around
it, but it's still a gratuitous change.
Have a nice day,
I agree, that code would break -- if the option were enabled globally --
because the named-column retrieval internals of that and many existing
modules don't do case-insens lookups. They would have to be retrofitted
to support it.
So that is the reason the idea was proposed as per-database or
per-connection SQL conformance controls (like what Mysql has). They'd allow
other apps to elect "less standard" modes (not just this one), for the
sake of compatibility (be it old PG modules or another DB). Your code
sample would continue to work.
Ken
Re: [HACKERS] [SQL] Case Preservation disregarding case
Bruce Momjian wrote:
Tom Lane wrote:
> The real bottom line, though, is that this community has little respect
> for proposals that involve moving away from the SQL spec rather than
> closer to it; and that's what you're asking us to do. The spec is not
> at all vague about the case-sensitivity of identifiers. Now certainly
> we are not doing exactly what the spec says, but what you ask is even
> less like the spec's requirements.

I think there are two major issues here:

o if you quote identifiers that have any upper-case characters, do it both at table creation and use
o display of non-quoted identifiers is lower-case

I think we are OK making people either always quote, or always not quote. What we don't currently have a good answer for is people wanting the identifiers displayed using the original case. You can use quotes all the time if you want such display, but it is a pain. I think this is the crux of the complaints. Saving the case of the original creation and displaying that does work, but then it isn't clear if the identifier needs quotes (is the upper-case real or just for display). This gets us into even more confusion. Can someone think of an answer to all this?

I believe there is no one answer that will solve all cases... but one solution that might come close is something like this (I take no credit; others have described this before me). PG would support two modes of operation:

1) Standard-compliant, and;
2) Quirks / compatibility mode (case-preserving but case-insensitive)

I believe (and may be wrong) that the #2 mode of operation would only require:

a) resultset data to have its (unquoted) identifiers case-folded, and;
b) queries with (unquoted) identifiers (joins / aliases etc) would continue in the planner to be case-folded, so would work as now (and in effect be case-insensitive);
c) the table-describe syntax would return the case-preserved id name (which it already does if quoted?), or d:
d) in addition to a), optionally send metadata describing the case-preserved name; this might someday allow newer drivers to display (display only, not column lookup) those names in database-designer views (iSQL-type apps)

If #a is possible, then drivers should not break, even if the DB is config'd to use setting #2. But I don't know the low-level protocol of PG well enough to know if that is possible.

Hopefully I'm not missing any points here; please correct me if so... I believe what I described above (while not being standard-compliant per se) is identical to how mysql and mssql work (operationally anyway) right now. On the other hand, Oracle and others work as PG does now, but my point in discussing this is that the first two DBs have enough market that offering a compatibility mode to ease the burden of porting apps would have substantial value (I know this from experience).

Ken
[SQL] Comparing two slices within one table efficiently
I have a table with the following simplified form:

create table t (
    run_id integer,
    domain_id integer,
    mta_id integer,
    attribute1 integer,
    attribute2 integer,
    unique(run_id, domain_id, mta_id)
);

The table has about 1 million rows with run_id=1, another 1 million rows with run_id=2, and so on. I need to efficiently query the differences between "runs" - i.e. for each (domain_id, mta_id) tuple in run 1, is there a corresponding tuple in run 2 where either attribute1 or attribute2 has changed?

The only way I have been able to think of doing this so far is an O(n^2) search, which even with indexes takes a long time, e.g.:

select * from t t1
where exists (select 1 from t t2
              where t2.mta_id=t1.mta_id
                and t2.domain_id=t1.domain_id
                and (t2.attribute1 != t1.attribute1
                     or t2.attribute2 != t1.attribute2));

This query takes millennia... Any help would be greatly appreciated. I hope I am naively missing some obvious alternative strategy, since this sort of operation must be common in databases.

Thanks,
Ken

--
Ken Simpson, CEO
MailChannels Corporation
Reliable Email Delivery (tm)
http://www.mailchannels.com
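(The approach the poster reports success with in the reply below is a direct self-join restricted to the two runs being compared; a hedged sketch:)

SELECT t1.domain_id, t1.mta_id,
       t1.attribute1 AS old_attr1, t2.attribute1 AS new_attr1,
       t1.attribute2 AS old_attr2, t2.attribute2 AS new_attr2
FROM t t1
JOIN t t2 ON t2.domain_id = t1.domain_id AND t2.mta_id = t1.mta_id
WHERE t1.run_id = 1 AND t2.run_id = 2
  AND (t1.attribute1 <> t2.attribute1 OR t1.attribute2 <> t2.attribute2);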
Re: [SQL] Comparing two slices within one table efficiently
Christian Kindler [13/08/07 21:34 +0200]:
> Yes and you could make it even more speedy with the use of table partitioning.
> http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

Thanks for all your speedy help, everyone. I tried doing a "self join" and that sped things up enormously (the query took on the order of 30 seconds to compare two million-row table slices, yielding a 20K-row result). I will also try re-ordering the unique constraint to get speedier indexing out of it, and will look at table partitioning.

Regards,
Ken

--
Ken Simpson
CEO, MailChannels
Web: http://mailchannels.com
MailChannels - Reliable Email Delivery (tm)
[SQL] Shorthand column labels (SELECT col1 foo, ...) vs (SELECT col1 AS foo, ...)
I notice PG doesn't allow shorthand column labels -- it requires the 'AS' keyword:

SELECT col1 foo, ...; -> ERROR: syntax error at or near "foo"

For compatibility with other databases, what objections might be argued against allowing this syntax in the future? On the 'pros' side I think it eases migration to PG, shortens code, is similar syntax to shorthand table aliases, and some users might argue it has become de facto syntax among DBs.

Regards,
Ken
Re: [SQL] Shorthand column labels (SELECT col1 foo, ...) vs (SELECT col1 AS foo, ...)
Paul Lambert wrote:
Ken Johanson wrote:
> I notice PG doesn't allow shorthand column labels -- it requires the
> 'AS' keyword:
>
> SELECT col1 foo, ...; -> ERROR: syntax error at or near "foo"

Briefly discussed a couple of weeks ago. See http://archives.postgresql.org/pgsql-general/2008-01/msg00089.php

Interesting thread(s)! What I didn't see discussed was the possibility of a server and/or session option where we could elect to turn off the old (PG-specific) behavior and enable the standard/shorthand syntax. Users need a migration path. I personally can't ever see using those PGisms/features and would choose to enable the standard mode. I think I'd have fewer compatibility problems.

Ken
[SQL] What are the (various) best practices/opinions for table/column/constraint naming?
I'd like to see a list of the various approaches, and a poll as to which are best and why, for naming tables, columns, and constraints. We've all seen several variations, but the most common (and pg-used) seems to be:

columns:
  primary key: _pk OR _id OR _rowid
  foreign key: _fk OR _join
indexes: __idx
sequences: __seq
constraints: __req OR __constr (etc)

The most used variations seem to be removing the '_', and/or removing the table/column prefix from objects where it is implied (when there is seen to be no point trying to make the names unique, e.g. when a label is needed):

columns:
  primary key: pk

example:
SELECT person.pk [AS person_id], person.named, company.pk, company.named
FROM contact AS person
JOIN contact AS company ON person.companyid=company.pk

Other variations suggest putting the type at the start of the object name:

columns:
  primary key: pk_ etc
  foreign key: fk_ etc

And other names which don't necessarily represent constraints or indexes and are only meaningful to the apps:

columns:
  _id (integer numbers or alpha-num, abstract/machine-meaningful: uuids, base-36 etc)
  _no (integer numbers, human-meaningful)
  _nm OR _name (named value, e.g. user_name, app_name, etc)
  _date OR _ts (datetime/timestamp, e.g. created_date, modified_date etc)
  _info (informational value)

And other naming conventions suggest using mixed/camel case (quoted identifiers) instead of '_' delimiters, or no delimiters at all... Has anyone seen articles or ISO/ANSI-endorsed best practices for naming, or do you otherwise have an opinion about the variations?
[SQL] SQL standards in Mysql
Here's one Mysql developer's response to adding (fixing) the integer/bigint/tinyint types to their CAST function:
http://bugs.mysql.com/bug.php?id=34562
