Re: [GENERAL] Postgres Encoding conversion problem
Clemens Schwaighofer wrote:
> I sometimes have a problem with conversion of encodings, eg from UTF-8
> to ShiftJIS:
> ERROR: character 0xf0a0aeb7 of encoding UTF8 has no equivalent in SJIS
> I have no idea what character this is, I cannot view it in my browser, etc.

It translates to Unicode 10BB7, which is not defined.
I guess that is not intended; can you guess what the character(s) should be?

> If I run the conversion through PHP with mb_convert_encoding it works,
> perhaps it is ignoring the character. Is there a way to do a similar
> thing, like ignoring this character, in Postgres too?

As far as I know, no. You'll have to fix the data before you import them.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Postgres Encoding conversion problem
On 04/22/2008 05:37 PM, Albe Laurenz wrote:
> Clemens Schwaighofer wrote:
>> I sometimes have a problem with conversion of encodings, eg from UTF-8
>> to ShiftJIS:
>> ERROR: character 0xf0a0aeb7 of encoding UTF8 has no equivalent in SJIS
>> I have no idea what character this is, I cannot view it in my browser, etc.
>
> It translates to Unicode 10BB7, which is not defined.
> I guess that is not intended; can you guess what the character(s) should be?

To be honest, no idea. It's some Chinese character; I have no idea how the
user input this, because this is a Japanese page. I actually found the
character, but only my Mac OS X can show it. It looks similar to a Japanese
character used for a name, but how the Chinese one got selected is a
mystery to me ...

>> If I run the conversion through PHP with mb_convert_encoding it works,
>> perhaps it is ignoring the character. Is there a way to do a similar
>> thing, like ignoring this character, in Postgres too?
>
> As far as I know, no. You'll have to fix the data before you import them.

Well, the web page data is in UTF-8, so I never see this issue, unless I
were to write a method that detects illegal Shift_JIS characters, and
that's difficult. The reporting is only done in CSV ... so I am not sure
it is worth wasting too much time here.

Thanks for the tip.

--
[ Clemens Schwaighofer -=:~ ]
[ IT Engineer/Manager, TEQUILA\ Japan IT Group ]
[ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703  Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp ]
[GENERAL] How to modify ENUM datatypes?
All,

In the past I have used foreign keys to lookup tables for small lists of
values that I now think ENUM could do the job of. I was hoping that by
using ENUM, I would avoid having to do joins in my queries, and that I'd
be making the data more robust and faster.

I used to have a table for account_status:

  A | Active
  B | Billing Failed
  C | Closed
  D | Deactivated

  account.acct_type CHAR references account_type.acct_type CHAR

But now I've converted that to an ENUM:

  ACTIVE
  BILLING_FAILED
  CLOSED
  DEACTIVATED

  account.acct_type ENUM account_type

The problem is that once I create a column in my account table that uses
this 'account_type' datatype, I can't seem to change or add to it any
more. I want to add a new value or edit/delete an existing one. How do
you make changes to an ENUM datatype that is already in use?

-- Dante
Re: [GENERAL] Postgres Encoding conversion problem
Clemens Schwaighofer wrote:
>>> I sometimes have a problem with conversion of encodings, eg from UTF-8
>>> to ShiftJIS:
>>> ERROR: character 0xf0a0aeb7 of encoding UTF8 has no equivalent in SJIS
>>> I have no idea what character this is, I cannot view it in my browser, etc.
>>
>> It translates to Unicode 10BB7, which is not defined.
>> I guess that is not intended; can you guess what the character(s) should be?
>
> To be honest, no idea. It's some Chinese character; I have no idea how
> the user input this, because this is a Japanese page. I actually found
> the character, but only my Mac OS X can show it. It looks similar to a
> Japanese character used for a name, but how the Chinese one got selected
> is a mystery to me ...

Are you sure that your Mac OS X computer interprets the character as UTF-8?

Yours,
Laurenz Albe
[GENERAL] How is statement level read consistency implemented?
Hi,

In my office we were discussing the various ways statement-level read
consistency is implemented in different databases, namely Oracle and
Postgres.

I am interested in the technical details of how PG determines that a block
needs to be read from some other place than the data block because another
transaction has updated the data block.

I'm referring to the scenario that Tom Kyte describes here:
www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html
and how PG would detect that row 342,023 has been modified by a different
transaction (and thus reads the correct value, unlike the other database).

Oracle assigns an SCN (system change number) to each transaction; each
block contains the SCN for which it is valid, and any number higher than
the current SCN indicates that the block has to be taken from the rollback
segment.

How is this test implemented in Postgres? Does it have a similar concept
(SCN), or does WAL imply a completely different way?

Thanks in advance
Thomas
Re: [GENERAL] How is statement level read consistency implemented?
On Tue, Apr 22, 2008 at 4:11 PM, Thomas Kellerer [EMAIL PROTECTED] wrote:
> I am interested in the technical details of how PG determines that a
> block needs to be read from some other place than the data block because
> another transaction has updated the data block.

Postgres uses MVCC for transaction consistency. When a row is updated, a
new copy of the row is created and the old version is marked for deletion
(though it's not removed immediately). The old readers continue to read
from the old copy, whereas new transactions see the new copy. This is all
managed by XMIN and XMAX, which are the transaction ids of the inserting
and deleting transactions respectively, and which control the visibility
of the different versions.

If you want to understand MVCC, please see this excellent documentation by
Tom Lane: http://www.postgresql.org/files/developer/transactions.pdf

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
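[Editor's aside: the xmin/xmax machinery Pavan describes can be observed
directly from SQL, since both fields are exposed as hidden system columns
on every table. A minimal sketch; the table and data are invented for
illustration:]

```sql
-- Any table will do; this one is hypothetical.
CREATE TABLE accounts (id int, balance int);
INSERT INTO accounts VALUES (1, 100);

-- xmin = xid of the inserting transaction; xmax = 0 while no
-- transaction has deleted (or updated) this row version.
SELECT xmin, xmax, id, balance FROM accounts;

BEGIN;
UPDATE accounts SET balance = 200 WHERE id = 1;
-- This session now sees the new row version, whose xmin is our own
-- xid. A concurrent session still sees the old version, whose xmax
-- has been set to our xid, until we commit.
SELECT xmin, xmax, id, balance FROM accounts;
COMMIT;
```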
Re: [GENERAL] How is statement level read consistency implemented?
On Tue, Apr 22, 2008 at 12:41:08PM +0200, Thomas Kellerer wrote:
> I'm referring to the scenario that Tom Kyte describes here:
> www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html
> and how PG would detect that row 342,023 has been modified by a
> different transaction (and thus reads the correct value, unlike the
> other database)

Check the docs, but the basic ideas are:

- Every transaction has a transaction ID and a snapshot which identifies
  the transactions whose results it is allowed to see.
- Every row has a start and end transaction ID which identifies who
  created it and who deleted it.
- Each row also has a ctid pointer which points to the newer version of
  that row.

What will happen in this case is that the updating transaction will
duplicate the row, setting the end xid of the old version and the start
xid of the new version to its own xid. Any parallel scan will see either
the new version or the old version depending on its snapshot.

If you end up trying to update a row someone else updated in the
meantime, the ctid is followed to the new version, the tuple is retested
for matching and then updated.

See http://www.postgresql.org/docs/8.3/interactive/mvcc.html

Have a nice day,
--
Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
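[Editor's aside: the ctid chain Martijn mentions can also be watched from
SQL; ctid is the physical location (block, item) of the current row
version. A sketch with an invented table:]

```sql
CREATE TABLE t (id int, val text);
INSERT INTO t VALUES (1, 'old');

-- The freshly inserted row version sits at some ctid, e.g. (0,1).
SELECT ctid, xmin, xmax, * FROM t;

UPDATE t SET val = 'new' WHERE id = 1;

-- The update wrote a new row version at a new ctid; the old version's
-- header now points to it, which is how a concurrent updater follows
-- the chain to the newest version and retests the WHERE clause.
SELECT ctid, xmin, xmax, * FROM t;
```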
Re: [GENERAL] How is statement level read consistency implemented?
Pavan Deolasee, 22.04.2008 12:57:
> Postgres uses MVCC for transaction consistency. When a row is updated, a
> new copy of the row is created and the old version is marked for
> deletion (though it's not removed immediately). This is all managed by
> XMIN and XMAX, which are the transaction ids of the inserting and
> deleting transactions respectively, and which control the visibility of
> the different versions.

The xmin, xmax info is basically what I was looking for. As far as I can
tell (from the PDF and your quote) Postgres uses a very similar concept to
Oracle: each transaction has a unique number, and each tuple contains the
information for which transaction numbers it is visible.

Thanks for the quick answer.
Thomas
Re: [GENERAL] Schema migration tools?
Christophe -

We use LiquiBase: http://www.liquibase.org/home

We don't use it quite the same way as what you're proposing, but I think
you could use it that way. When we found it, we did 'Generate Changelog'
to start, then made new changes to the ChangeLog, and applied it to each
database as needed. It has really helped us, because we were forgetting to
apply needed changes to the production database.

Hope this helps!
Kerri

On 4/21/08, Christophe [EMAIL PROTECTED] wrote:
> Greetings,
>
> We have the traditional three servers: dev -- staging -- production,
> each with a PostgreSQL instance and the same schema, at least over time.
> Right now, we handle schema migration (updating the schema for new
> releases) by manually-created scripts that apply the changes to staging
> and production.
>
> I'd like a tool that would automatically create these scripts, and I
> wondered if anything like this existed. The theory would be that it
> would consider two databases a and b, and produce the appropriate script
> to change b's schema to match a.
>
> Does anything like this exist? If not, I might have a new project...
>
> Thanks!
> -- Christophe

--
Yuma Educational Computer Consortium
Compass Development Team
Kerri Reno [EMAIL PROTECTED] (928) 502-4240
.·:*¨¨*:·. .·:*¨¨*:·. .·:*¨¨*:·.
Re: FW: Re: [GENERAL] create temp in function
Thanks to all who responded. I now know why execute will help this
problem, but then it causes a new problem. The example I sent you was
trivial, trying to get to the bottom of the issue. What I'm really trying
to do is get past the restriction that EXECUTE can't do SELECT INTO.
That's why I created a temporary table, so that that command could be
dynamic, and then did a SELECT INTO from that table. Because of the
planning issue that won't work. I can't upgrade to 8.3 at this time (long
story). Any ideas how to do this? Here is a section of my code.

begin
    query = 'create temp table schedrec as select ' ||
            'salary_schedule, pay_column, step from ' || tbl ||
            ' where cp_id = ' || to_char(tcp_id,'');
    raise notice 'query: %', query;
    execute query;
    select into relid distinct(attrelid) from pg_attribute
        where attrelid='schedrec'::regclass;
    raise notice 'relid: %', relid;
    raise notice 'about to do select';
    select into arow * from schedrec limit 1;
    drop table schedrec;
    return arow;
end;

Thanks so much!
Kerri

On 4/21/08, Adrian Klaver [EMAIL PROTECTED] wrote:
> -- Original message --
> From: Kerri Reno [EMAIL PROTECTED]
>> Adrian,
>> I don't understand. Why do I need to use execute? It runs fine the
>> first time. The second time it bombs, because it's not seeing schedrec
>> correctly. Which part should be in an execute query statement?
>
> plpgsql caches query plans. In versions prior to 8.3 this meant that the
> first time you ran a function, the plans for the statements were cached
> for use by later runs of the function in the same session. The error you
> are getting about a missing OID means the function is looking for the
> OID of the temp table as it was cached in the first run and not finding
> it. To get around this you need to EXECUTE the create temp table
> statement. This causes the plan not to be cached but built anew for each
> call of the function. If you follow the link I included in the previous
> email you will see some examples.
Re: [GENERAL] How is statement level read consistency implemented?
Pavan Deolasee, 22.04.2008 12:57:
>> Postgres uses MVCC for transaction consistency. When a row is updated,
>> a new copy of the row is created and the old version is marked for
>> deletion (though it's not removed immediately).
>
> The xmin, xmax info is basically what I was looking for. As far as I can
> tell (from the PDF and your quote) Postgres uses a very similar concept
> to Oracle: each transaction has a unique number, and each tuple contains
> the information for which transaction numbers it is visible.

Oracle moves the old row(s) to the rollback segment instead of putting a
new row in the table as PostgreSQL does. The new value goes right in place
of the old row, plus a pointer to the rollback segment. A commit removes
the pointer, and a rollback forces the db to move the old row from the
rollback segment back in place of the updated row.

Oracle's rollback segment can be tuned so that it retains data up to a
certain age and is then flushed. If the rollback segment is too small, you
can get a "snapshot too old" error.

Back to PostgreSQL: when a vacuum is issued, the old deleted rows are
removed. This can be done with the autovacuum feature or manually.

- Does vacuum make sure there are no long-running queries referencing the
  deleted rows before it attempts to remove the old rows?
- With autovacuum, does it skip these rows still being referenced in a
  transaction, or does it wait?

I'm guessing that you will never get the "snapshot too old" error in
PostgreSQL based on the architecture.

Jon
Re: [GENERAL] How to modify ENUM datatypes?
D. Dante Lorenso wrote:
> The problem is that once I create a column in my account table that uses
> this 'account_type' datatype, I can't seem to change or add to it any
> more. I want to add a new value or edit/delete an existing one. How do
> you make changes to an ENUM datatype that is already in use?

As far as I know ENUM is not well suited to uses where new enumeration
members may be added later. A lookup table and a foreign key is probably
better for this sort of use.

You *can* add elements to an enum type - sort of - by creating a new type
and converting columns. It's ugly, though, and will be hard to get right
when the column of interest is referenced by foreign keys and such. One
way to do it if you really must:

-- Starting state
CREATE TYPE et1 AS ENUM('yes','no');
CREATE TABLE testtab ( a et1 );
INSERT INTO testtab (a) values ('yes');

-- Change
CREATE TYPE et2 AS ENUM('yes','no','filenotfound');
ALTER TABLE testtab ALTER COLUMN a TYPE et2 USING (
    CASE a
        WHEN 'yes' THEN 'yes'::et2
        WHEN 'no' THEN 'no'::et2
    END
);

-- Alternative ALTER that's suitable if you're not removing anything
-- from the enum
ALTER TABLE testtab ALTER COLUMN a TYPE et2 USING ( a::text::et2 );

Personally, though, I'd stick to the good 'ol lookup table and foreign
key.

--
Craig Ringer
Re: [GENERAL] How is statement level read consistency implemented?
Roberts, Jon, 22.04.2008 14:56:
> Oracle moves the old row(s) to the rollback segment instead of putting a
> new row in the table as PostgreSQL does. The new value goes right in
> place of the old row, plus a pointer to the rollback segment. A commit
> removes the pointer, and a rollback forces the db to move the old row
> from the rollback segment back in place of the updated row.
>
> Oracle's rollback segment can be tuned so that it retains data up to a
> certain age and is then flushed. If the rollback segment is too small,
> you can get a "snapshot too old" error.

Thanks, this was exactly the answer I was looking for!

Cheers
Thomas
Re: [GENERAL] How is statement level read consistency implemented?
Roberts, Jon wrote:
> Back to PostgreSQL: when a vacuum is issued, the old deleted rows are
> removed. This can be done with the autovacuum feature or manually.
>
> - Does vacuum make sure there are no long-running queries referencing
>   the deleted rows before it attempts to remove the old rows?

It does.

> - With autovacuum, does it skip these rows still being referenced in a
>   transaction, or does it wait?

It skips them, the idea being that a future vacuum will remove them.

--
Alvaro Herrera  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: FW: Re: [GENERAL] create temp in function
Hello

On 22/04/2008, Kerri Reno [EMAIL PROTECTED] wrote:
> What I'm really trying to do is get past the restriction that EXECUTE
> can't do SELECT INTO. That's why I created a temporary table, so that
> that command could be dynamic, and then did a SELECT INTO from that
> table. Because of the planning issue that won't work. I can't upgrade
> to 8.3 at this time (long story). Any ideas how to do this?

You don't need to upgrade to 8.3. Just use dynamic statements, like:

BEGIN
  EXECUTE 'CREATE TEMP TABLE a ...';
  -- a)
  EXECUTE 'SELECT * FROM a WHERE ...' INTO somevars;
  -- b)
  FOR vars IN EXECUTE 'SELECT * FROM .. ' LOOP

Regards
Pavel Stehule
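[Editor's aside: filled out, Pavel's sketch applied to the function from
earlier in the thread might look like the following. This is a sketch
only; the function name is invented, the table and column names follow
Kerri's example, and quote_ident guards the dynamic table name:]

```sql
CREATE OR REPLACE FUNCTION get_first_schedrec(tbl text, tcp_id integer)
RETURNS record AS $$
DECLARE
    arow record;
BEGIN
    -- EXECUTE plans the statement at each call, so the temp table's
    -- OID is never baked into a cached plan.
    EXECUTE 'CREATE TEMP TABLE schedrec AS '
         || 'SELECT salary_schedule, pay_column, step FROM '
         || quote_ident(tbl)
         || ' WHERE cp_id = ' || tcp_id;

    -- EXECUTE ... INTO (available since 8.2) stands in for the
    -- unsupported SELECT INTO inside EXECUTE.
    EXECUTE 'SELECT * FROM schedrec LIMIT 1' INTO arow;

    EXECUTE 'DROP TABLE schedrec';
    RETURN arow;
END;
$$ LANGUAGE plpgsql;
```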
Re: FW: Re: [GENERAL] create temp in function
Can you explain what you mean by the restriction to do SELECT INTO? Why
are you using a temp table to begin with?

Jon

From: Kerri Reno
Sent: Tuesday, April 22, 2008 7:55 AM
To: pgsql-general@postgresql.org
Subject: Re: FW: Re: [GENERAL] create temp in function

> Thanks to all who responded. I now know why execute will help this
> problem, but then it causes a new problem. What I'm really trying to do
> is get past the restriction that EXECUTE can't do SELECT INTO. That's
> why I created a temporary table, so that that command could be dynamic,
> and then did a SELECT INTO from that table. Because of the planning
> issue that won't work. I can't upgrade to 8.3 at this time (long
> story). Any ideas how to do this?
Re: [GENERAL] How is statement level read consistency implemented?
>> Back to PostgreSQL: when a vacuum is issued, the old deleted rows are
>> removed. This can be done with the autovacuum feature or manually.
>>
>> - Does vacuum make sure there are no long-running queries referencing
>>   the deleted rows before it attempts to remove the old rows?
>
> It does.
>
>> - With autovacuum, does it skip these rows still being referenced in a
>>   transaction, or does it wait?
>
> It skips them, the idea being that a future vacuum will remove them.

Awesome. In a large data warehouse, the "snapshot too old" error is very
annoying, and I'm glad PostgreSQL is superior to Oracle in this regard. :)

Jon
Re: FW: Re: [GENERAL] create temp in function
http://www.postgresql.org/docs/8.2/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
says SELECT INTO is not currently supported within EXECUTE. I was using a
temp table to get around the above problem.

On 4/22/08, Roberts, Jon [EMAIL PROTECTED] wrote:
> Can you explain what you mean by the restriction to do SELECT INTO? Why
> are you using a temp table to begin with?
>
> Jon
Re: [GENERAL] How is statement level read consistency implemented?
Roberts, Jon wrote:
>>> - With autovacuum, does it skip these rows still being referenced in
>>>   a transaction, or does it wait?
>>
>> It skips them, the idea being that a future vacuum will remove them.
>
> Awesome. In a large data warehouse, the "snapshot too old" error is very
> annoying, and I'm glad PostgreSQL is superior to Oracle in this
> regard. :)

Well, the disadvantage of the PostgreSQL way is that it keeps dead rows
around for longer than they're actually needed, and so it causes some
problems in pathological conditions -- for example, when setting up large
replication sets with Slony, or during a pg_dump, no dead rows can be
removed. Since the Slony setup can take a very long time, dead rows start
to pile up in a way that can really harm performance.

We're currently working on making this less of a problem for 8.4 ...

--
Alvaro Herrera  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: FW: Re: [GENERAL] create temp in function
I tried that, too, and got a different error.

NOTICE: query: select salary_schedule, pay_column, step from saltab07
where cp_id =32
ERROR: syntax error at or near "$2"
SQL state: 42601
Context: PL/pgSQL function get_salary_schedule line 15 at execute
statement

In the PG log file, it says:
syntax error at or near "$2"
SELECT $1 into $2, $3, $4

My code is below.

CREATE OR REPLACE FUNCTION private.get_salary_schedule(integer, text)
  RETURNS record AS
$BODY$
declare
    tcp_id alias for $1;
    tfy alias for $2;
    tbl text := 'saltab' || tfy;
    arow record;
    sched text;
    cl text;
    st integer;
    query text;
begin
    query = 'select ' ||
            'salary_schedule, pay_column, step from ' || tbl ||
            ' where cp_id = ' || to_char(tcp_id,'');
    raise notice 'query: %', query;
    execute query into sched, cl, st;
    arow := (sched, cl, st);
    return arow;
end;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE SECURITY DEFINER;

I also tried:
    execute query into (sched, cl, st);
which gave me 'syntax error on (', and
    execute query into arow;
which gave me 'record arow is not assigned yet'.

Thanks!
Kerri

On 4/22/08, Pavel Stehule [EMAIL PROTECTED] wrote:
> You don't need to upgrade to 8.3. Just use dynamic statements, like:
>
> BEGIN
>   EXECUTE 'CREATE TEMP TABLE a ...';
>   -- a)
>   EXECUTE 'SELECT * FROM a WHERE ...' INTO somevars;
>   -- b)
>   FOR vars IN EXECUTE 'SELECT * FROM .. ' LOOP
Re: FW: Re: [GENERAL] create temp in function
On Tuesday 22 April 2008 6:26 am, Kerri Reno wrote: http://www.postgresql.org/docs/8.2/static/plpgsql-statements.html#PLPGSQL-S TATEMENTS-EXECUTING-DYN says SELECT INTO is not currently supported within EXECUTE. In 8.2 EXECUTE INTO is supported.; The INTO clause specifies where the results of a SQL command returning rows should be assigned. If a row or variable list is provided, it must exactly match the structure of the query's results (when a record variable is used, it will configure itself to match the result structure automatically). If multiple rows are returned, only the first will be assigned to the INTO variable. If no rows are returned, NULL is assigned to the INTO variable. If no INTO clause is specified, the query results are discarded. I was using a temp table to get around the above problem. On 4/22/08, Roberts, Jon [EMAIL PROTECTED] wrote: Can you explain what you mean by the restriction to do SELECT INTO? Why are you using a temp table to begin with? Jon -- *From:* [EMAIL PROTECTED] [mailto: [EMAIL PROTECTED] *On Behalf Of *Kerri Reno *Sent:* Tuesday, April 22, 2008 7:55 AM *To:* pgsql-general@postgresql.org *Subject:* Re: FW: Re: [GENERAL] create temp in function Thanks to all who responded. I now know why execute will help this problem, but then it causes a new problem. The example I sent you was trivial, trying to get to the bottom of the issue. What I'm really trying to is get past the restriction of execute to do SELECT INTO. That's why I created a temporary table, so that that command could be dynamic, and then do a SELECT INTO from that table. Because of the planning issue that won't work. I can't upgrade to 8.3 at this time (long story). Any ideas how to do this? Here is a section of my code. 
begin
    query = 'create temp table schedrec as select '
        || 'salary_schedule, pay_column, step from ' || tbl
        || ' where cp_id = ' || to_char(tcp_id,'');
    raise notice 'query: %', query;
    execute query;
    select into relid distinct(attrelid) from pg_attribute
        where attrelid = 'schedrec'::regclass;
    raise notice 'relid: %', relid;
    raise notice 'about to do select';
    select into arow * from schedrec limit 1;
    drop table schedrec;
    return arow;
end;

Thanks so much! Kerri

On 4/21/08, Adrian Klaver [EMAIL PROTECTED] wrote: -- Original message -- From: Kerri Reno [EMAIL PROTECTED] Adrian, I don't understand. Why do I need to use execute? It runs fine the first time. The second time it bombs, because it's not seeing schedrec correctly. Which part should be in an execute query statement? plpgsql caches query plans. In versions prior to 8.3 this meant that the first time you ran a function, the plans for its statements were cached for use by later runs of the function in the same session. The error you are getting about a missing OID means the function is looking for the OID of the temp table as it was cached in the first run and not finding it. To get around this you need to EXECUTE the create temp table statement. This causes the plan not to be cached but built anew for each call of the function. If you follow the link I included in the previous email you will see some examples. -- Yuma Educational Computer Consortium Compass Development Team Kerri Reno [EMAIL PROTECTED] (928) 502-4240 .·:*¨¨*:·. .·:*¨¨*:·. .·:*¨¨*:·. -- Adrian Klaver [EMAIL PROTECTED] -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
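The fix Adrian describes — running the CREATE TEMP TABLE through EXECUTE so its plan is not cached — looks roughly like this (a sketch, not the poster's final code):

```sql
-- Pre-8.3 plan-caching workaround: build the DDL as a string and
-- EXECUTE it, so each call creates a fresh temp table instead of
-- reusing a cached table OID from an earlier call.
execute 'create temp table schedrec as '
     || 'select salary_schedule, pay_column, step from ' || tbl
     || ' where cp_id = ' || tcp_id;

-- Caveat: any later *static* reference to schedrec in the same
-- function (like "select into arow * from schedrec") is still plan-
-- cached, so it hits the same stale-OID problem and must be EXECUTEd
-- dynamically as well.
```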
Re: FW: Re: [GENERAL] create temp in function
I'll ask again in a different way. What is the purpose of your dynamic SQL and/or temp table? Don't tell me anything about using select into. What is the business purpose of the function? An appropriate answer would be "I'm trying to calculate x" or "I'm trying to determine y by looking at x". It looks like you are trying to use a temp table when an inline view would be more appropriate. Jon From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kerri Reno Sent: Tuesday, April 22, 2008 8:27 AM To: Roberts, Jon Cc: pgsql-general@postgresql.org Subject: Re: FW: Re: [GENERAL] create temp in function
Re: FW: Re: [GENERAL] create temp in function
So the reason I'm getting the error is that I'm running it in 8.0. Thanks so much for your help! Kerri

On 4/22/08, Adrian Klaver [EMAIL PROTECTED] wrote: On Tuesday 22 April 2008 6:26 am, Kerri Reno wrote: http://www.postgresql.org/docs/8.2/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN says "SELECT INTO is not currently supported within EXECUTE". I was using a temp table to get around the above problem. In 8.2, EXECUTE ... INTO is supported: "The INTO clause specifies where the results of a SQL command returning rows should be assigned. ... If no INTO clause is specified, the query results are discarded."
-- Adrian Klaver [EMAIL PROTECTED] -- Yuma Educational Computer Consortium Compass Development Team Kerri Reno [EMAIL PROTECTED] (928) 502-4240 .·:*¨¨*:·. .·:*¨¨*:·. .·:*¨¨*:·.
Re: [GENERAL] table as log (multiple writers and readers)
David Wilson wrote: On Mon, Apr 21, 2008 at 7:55 PM, Joris Dobbelsteen [EMAIL PROTECTED] wrote: If you want to clean up the staging table I have some concerns about the advisory lock. I think you mean exclusive table lock. Either works, really. An advisory lock is really just a lock over which you have control of the meaning, as long as you're using it in the appropriate places. Also, an advisory lock on just the processes doing staging-to-log moves would allow writes into the staging table to continue concurrently with the staging-to-log transaction (whereas an exclusive lock would unnecessarily prevent them). Describe the mechanism, because I don't really believe it yet. I think you need to take an advisory lock around every commit of every transaction that writes to the log table. If you are only using the advisory lock in the staging-to-log transaction, how would this prevent newly committed tuples from showing up during this process? (You can't both delete and insert in a single statement, I believe, in which case you won't have a problem anyway.) Also, while Vance appears to have chosen to have a dedicated staging-to-log process, even that isn't necessary - each reader can simply do the lock/clear staging/unlock before any attempt to read - unless you're polling that log table at truly crazy rates, the overhead should be negligible and will ensure that the staging table is simply cleared out whenever necessary while removing the complexity of a separate process. Using serializable mode for the staging-to-log process seems to be the most efficient method, as it won't even block writers. - Joris
[GENERAL] better error handling for COPY from stdin
if you have a .sql file with "COPY table FROM stdin;" and the table doesn't exist, psql prints a lot of errors from lines which contain only data for the copy. To reproduce:

echo -ne 'COPY non_existing_table FROM stdin;\n1\n\\N\n2\n\\.\n' | psql
ERROR: relation "non_existing_table" does not exist
invalid command \N
invalid command \.
ERROR: syntax error at or near "1"
LINE 1: 1
        ^

would it be possible to make it smarter, so it will not treat data lines from a failed copy as statements? I see a lot of questions that "postgresql says command \N is invalid" - which is of course rubbish, but making psql a bit smarter would make all of these questions go away. regards, depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
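Until psql itself learns to skip the data lines, one practical workaround (my suggestion, not from the thread) is to make psql abort at the first error, so the COPY data lines are never interpreted as commands at all:

```
# Stop the script at the first error instead of parsing the COPY
# data lines as backslash commands / SQL:
psql -v ON_ERROR_STOP=1 -f dump.sql
```

This doesn't make the error message smarter, but it prevents the misleading "invalid command \N" cascade.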
Re: [GENERAL] How is statement level read consistency implemented?
On Apr 22, 2008, at 8:35 AM, Alvaro Herrera wrote: Roberts, Jon wrote: - With autovacuum, does it skip these rows still being referenced in a transaction or does it wait? It skips them, the idea being that a future vacuum will remove them. Awesome. In a large data warehouse, the "snapshot too old" error is very annoying and I'm glad PostgreSQL is superior to Oracle in this regard. :) Well, the disadvantage of the PostgreSQL way is that it keeps dead rows around for longer than they're actually needed, and so it causes some problems in pathological conditions -- for example, when setting up large replication sets with Slony, or during a pg_dump, no dead rows can be removed. Since the Slony thing can take a very long time, dead rows start to pile up in a way that can really harm performance. In addition, or rather as another potential issue: if you have a REALLY long transaction running then you can risk transaction id wraparound. Erik Jones DBA | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Re: [GENERAL] table as log (multiple writers and readers)
On Tue, Apr 22, 2008 at 9:52 AM, Joris Dobbelsteen [EMAIL PROTECTED] wrote: Describe the mechanism, because I don't really believe it yet. I think you need to take an advisory lock around every commit of every transaction that writes to the log table.

Consider some number of reader processes and some number of writer processes. Writer processes touch only the staging table, and solely do inserts into it. As a result, writer processes cannot interfere with each other in any way and do not require any synchronization beyond that provided by MVCC. Reader processes are interested in polling the logging table at intervals. In the process, they also act as staging-to-log movers. This act (because it is destructive and because we require serialized inserts for id generation in the log table) must take a lock that prevents other readers from attempting the same work at the same time. Each reader process therefore has a loop that appears as follows:

1) Obtain advisory lock.
2) Begin transaction.
3) For each row in staging table, insert copy into log table.
4) Delete all visible rows from staging table.
5) Commit transaction.
6) Release advisory lock.
7) Handle not-yet-seen rows in the logging table (this is the primary work of the readers).
8) Sleep for the desired interval and return to 1).

We require two types of synchronization and the above takes care of both:

1) The advisory lock prevents multiple readers from doing simultaneous staging-to-log moves.
2) The transaction block ensures that the reader will see a consistent state on the staging table while writers may write at the same time; writes that occur during the reader's transaction block will simply be ignored during this round of reading.

You need both types of synchronization to avoid problems - taking an exclusive lock would simply be the sledgehammer method of doing the synchronization, since it would take the place of both the advisory lock and the transaction at the same time but would also block writers.
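One staging-to-log pass of the reader loop above could be sketched like this (illustrative table names and lock key; assumes PostgreSQL 8.2+, where the pg_advisory_lock functions exist):

```sql
-- Steps 1-6 of the reader loop (names are made up for illustration).
SELECT pg_advisory_lock(42);               -- 1) serialize the movers
BEGIN;                                     -- 2) one snapshot for the move
INSERT INTO log_table (payload)            -- 3) copy visible staging rows
    SELECT payload FROM staging ORDER BY ts;
DELETE FROM staging;                       -- 4) clear the staging table
COMMIT;                                    -- 5)
SELECT pg_advisory_unlock(42);             -- 6)
-- 7) read the not-yet-seen rows from log_table; 8) sleep and repeat.
```

Note Joris's objection downthread: in READ COMMITTED, the DELETE in step 4 can see rows committed after step 3's snapshot, so the isolation level (or a log-referencing DELETE) matters.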
-- David T. Wilson [EMAIL PROTECTED]
Re: [GENERAL] How to modify ENUM datatypes?
On Tue, 22 Apr 2008 20:58:30 +0800 Craig Ringer wrote: D. Dante Lorenso wrote: The problem is that once I create a column in my account table that uses this 'account_type' datatype, I can't seem to change or add to it any more. I want to add a new value or edit/delete an existing one. How do you make changes to an ENUM datatype that is already in use? As far as I know ENUM is not well suited to uses where new enumeration members may be added later. A lookup table and a foreign key is probably better for this sort of use. I remember the discussions before PG implemented ENUMs at all - some people voted against this feature because they knew that questions about modifying the enum values would pop up sooner or later. You *can* add elements to an enum type - sort of - by creating a new type and converting columns. It's ugly, though, and will be hard to get right when the column of interest is referenced by foreign keys and such. If there's one request to modify a specific ENUM column, other requests will follow, because the enum set is not as static as it seemed in the first place. So my advice would be to change the column to a foreign key construct. Converting columns to new data types is much more overhead anyway. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
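The "ugly" new-type conversion Craig mentions can be sketched as follows (type, table, and value names are illustrative; this assumes no views, foreign keys, or defaults depend on the column):

```sql
-- Add a value to an in-use enum by swapping in a replacement type:
CREATE TYPE account_type2 AS ENUM ('free', 'paid', 'trial');  -- old values + the new one

ALTER TABLE account
    ALTER COLUMN atype TYPE account_type2
    USING (atype::text::account_type2);   -- go through text to remap values

DROP TYPE account_type;                   -- old type is now unused
```

The lookup-table alternative both posters recommend avoids this dance entirely: a `CREATE TABLE account_types (name text PRIMARY KEY)` plus a foreign key lets new values be added with a plain INSERT.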
[GENERAL] can't cast char to in
I have a table with a mileage column that is a character varying (please don't ask why :). I need to do a query where mileage > 500:

select * from cars where mileage > 500

So I need to cast it, but everything I try throws an error such as:

ERROR: invalid input syntax for integer: "+"

How can I cast this? Thanks!
Re: [GENERAL] can't cast char to in
On Apr 22, 2008, at 10:34 AM, blackwater dev wrote: I have a table with a mileage column that is a character varying (please don't ask why :). I need to do a query where mileage > 500: select * from cars where mileage > 500. So I need to cast it, but everything I try throws an error such as: ERROR: invalid input syntax for integer: "+" How can I cast this? Well, you didn't really give any real information on the format of the data in your mileage column. However, my guess is that you've got at least one row with just '+' in the mileage column, which is not the same thing as '+0'. You'll probably need to do a little data cleaning and, once that's done, you should definitely consider switching that to an integer/numeric data type. Erik Jones DBA | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
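A quick way to do the data cleaning Erik suggests — locating rows whose mileage won't cast to integer — might be (a sketch; table and column names from the thread):

```sql
-- Find mileage values that are not an optionally signed integer,
-- e.g. the lone '+' that triggers the cast error:
SELECT mileage, count(*)
FROM cars
WHERE mileage !~ '^[+-]?[0-9]+$'
GROUP BY mileage;
```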
Re: [GENERAL] can't cast char to in
Yeah, it was my being stupid, I got it going now. Thanks!

On Tue, Apr 22, 2008 at 11:42 AM, Erik Jones [EMAIL PROTECTED] wrote: You'll probably need to do a little data cleaning and, once that's done you should definitely consider switching that to an integer/numeric data type.
Re: [GENERAL] can't cast char to in
blackwater dev wrote: I have a table with a mileage column that is a character varying (please don't ask why :). Why? :-) I need to do a query where mileage > 500: select * from cars where mileage > 500. So I need to cast it, but everything I try throws an error such as: ERROR: invalid input syntax for integer: "+" Once you've cleaned your data, I would do one of two things:

1. Add a constraint to restrict the values the mileage column will accept:

ALTER TABLE cars ADD CONSTRAINT valid_mileage CHECK (mileage ~ '^[+]?[0-9]+$');

2. You can alter the type on-the-fly too:

ALTER TABLE mileage_test ALTER COLUMN mileage TYPE integer USING (mileage::int);

Note that you'll need to remove the constraint from #1 if you've applied that. -- Richard Huxton Archonet Ltd
[GENERAL] Can not restart postgres: Panic could not locate a valid checkpoint record
Hi, I am running postgres-8.1 on a CentOS 5 system. This morning when I came in there was no more disk space on the partition that holds pgsql: /var/lib/pgsql. I believe postgres had crashed at this point. I stupidly removed all of the files under pg_xlog to create disk space, not knowing that those are the files I need to start the database. I quickly realized the error I had made and restored the pg_xlog data/files from our system backup that ran at 1am last night (4/22/08). Now when I try to restart the database, I get the following error. It is looking for file 000000010000001300000064, which I do not have.

LOG: database system was shut down at 2008-04-22 10:38:59 EDT
LOG: could not open file "pg_xlog/000000010000001300000064" (log file 19, segment 100): No such file or directory
LOG: invalid primary checkpoint record
LOG: could not open file "pg_xlog/000000010000001300000064" (log file 19, segment 100): No such file or directory
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 6800) was terminated by signal 6
LOG: aborting startup due to startup process failure
LOG: logger shutting down

I am in the process of having the WAL dir restored as well. Again, in my haste, I deleted that dir as well. This data will be from 4/22/08 at 1am as well, the time of the system backup. If I do not have the files that the startup script is asking for, what is the best way to restart the database? Any advice (other than not deleting important dirs) would be greatly appreciated. -- Thanks Mary Ellen
Re: [GENERAL] Can not restart postgres: Panic could not locate a valid checkpoint record
Some time ago I had a similar problem with PostgreSQL 7.3. If I remember correctly, running pg_resetxlog helped; check out http://www.postgresql.org/docs/8.3/static/app-pgresetxlog.html Michael. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mary Ellen Fitzpatrick Sent: Tuesday, April 22, 2008 1:14 PM To: PGSQL General Subject: [GENERAL] Can not restart postgres: Panic could not locate a valid checkpoint record
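For the record, the invocation pattern is (the data directory path is illustrative; -n is a dry run, and the server must be stopped first):

```
# Dry run: show the values pg_resetxlog would write without changing anything
pg_resetxlog -n /var/lib/pgsql/data

# If the reported values look sane, run it for real
pg_resetxlog /var/lib/pgsql/data
```

Note that pg_resetxlog discards any WAL not yet checkpointed, so the database may lose recent transactions; it is a last resort for getting the server to start.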
Re: [GENERAL] table as log (multiple writers and readers)
David Wilson wrote: On Tue, Apr 22, 2008 at 9:52 AM, Joris Dobbelsteen [EMAIL PROTECTED] wrote: Describe the mechanism, because I don't really believe it yet. I think you need to do a advisory lock around every commit of every transaction that writes to the log table. Consider some number of reader processes and some number of writer processes. Writer processes touch only the staging table, and solely do inserts into it. As a result, writer processes cannot interfere with each other in any way and do not require any synchronization beyond that provided by MVCC. Reader processes are interested in polling the logging table at intervals. In the process, they also act as staging-to-log movers. This act (because it is destructive and because we require serialized inserts for id generation in the log table) must take a lock that prevents other readers from attempting the same work at the same time. Each reader process therefore has a loop that appears as follows: 1) Obtain advisory lock. 2) Begin transaction. 3) For each row in staging table, insert copy into log table. 4) Delete all visible rows from staging table. Ah, yes, all visible rows... My point is that, unless you use a transaction with serializable isolation, this all visible rows for the second statement might be different from those that you copied into the log table. With the normal Read committed isolation level you suffer from a possible nonrepeatable read that might change tuple visibility between different statements. 5) Commit transaction. 6) Release advisory lock. 7) Handle not-yet-seen rows in the logging table (This is the primary work of the readers) 8) Sleep for the desired interval and return to 1). We require two types of synchronization and the above takes care of both: 1) The advisory lock prevents multiple readers from doing simultaneous staging-to-log moves. 
2) The transaction block ensures that the reader will see a consistent state on the staging table while writers may write at the same time; writes that occur during the reader's transaction block will simply be ignored during this round of reading. See above: you assume it's impossible for nonrepeatable reads to occur. You need both types of synchronization to avoid problems- taking an exclusive lock would simply be the sledgehammer method of doing the synchronization, since it would take the place of both the advisory lock and the transaction at the same time but would also block writers. I agree with you on this, but it does guarantee the impossibility of a nonrepeatable read, at the cost of concurrency. There seems to be a better solution indeed. - Joris
Re: [GENERAL] table as log (multiple writers and readers)
On Tue, Apr 22, 2008 at 2:48 PM, Joris Dobbelsteen [EMAIL PROTECTED] wrote: Ah, yes, all visible rows... My point is that, unless you use a transaction with serializable isolation, this all visible rows for the second statement might be different from those that you copied into the log table. With the normal Read committed isolation level you suffer from a possible nonrepeatable read that might change tuple visibility between different statements. That depends on implementation. A select into ... to do the initial copy followed by a delete where... with the where clause referencing the log table itself to ensure that we delete only things that now exist in the log table, or a row by row insert/delete pair. Either would provide the appropriate level of protection from accidental deletion of more things than you intended without harming concurrency. The delete referencing the log table might require that the log table be indexed for performance, but it's likely that such indexing would be done anyway for general log use. -- David T. Wilson [EMAIL PROTECTED]
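The self-referencing delete David describes could be sketched as follows (illustrative schema: each staging row carries a unique source id that is copied into the log):

```sql
-- Copy first, then delete only rows that provably made it into the log,
-- so rows committed between the two statements survive for the next pass:
INSERT INTO log_table (src_id, payload)
    SELECT src_id, payload FROM staging;

DELETE FROM staging s
    WHERE EXISTS (SELECT 1 FROM log_table l WHERE l.src_id = s.src_id);
```

An index on log_table(src_id) would keep the EXISTS probe cheap, as David notes.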
Re: [GENERAL] Can not restart postgres: Panic could not locate a valid checkpoint record
I ran "pg_resetxlog -n" to test and it found the correct values. Ran it for real and it worked. Thank you for the info/help. Mary Ellen

BRUSSER Michael wrote: Some time ago I had a similar problem with PostgreSQL 7.3. If I remember correctly, running pg_resetxlog helped; check out http://www.postgresql.org/docs/8.3/static/app-pgresetxlog.html Michael.
Re: [GENERAL] table as log (multiple writers and readers)
On Wed, Apr 23, 2008 at 12:29 AM, David Wilson [EMAIL PROTECTED] wrote: That depends on implementation. A select into ... to do the initial copy followed by a delete where... with the where clause referencing the log table itself to ensure that we delete only things that now exist in the log table, or a row by row insert/delete pair. Either would provide the appropriate level of protection from accidental deletion of more things than you intended without harming concurrency. The delete referencing the log table might require that the log table be indexed for performance, but it's likely that such indexing would be done anyway for general log use.

I think this plpgsql function would solve the problem of an atomic read-and-delete operation:

create or replace function log_rotate() returns void as $$
declare
    rec record;
begin
    for rec in delete from t1 returning * loop
        insert into t2 values( rec.a, rec.b );
    end loop;
end;
$$ language 'plpgsql';

select log_rotate();

Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
Re: [GENERAL] Can not restart postgres: Panic could not locate a valid checkpoint record
On Tue, Apr 22, 2008 at 1:00 PM, Mary Ellen Fitzpatrick [EMAIL PROTECTED] wrote: I ran pg_resetxlog -n to test and it found the correct values. Ran it for real and it worked. Note that on most unix systems, some percentage of disk storage is reserved for use by root only. On linux with ext2/3 you can change this percentage with the tune2fs command... tune2fs -m 0 /dev/sdb2 would set the filesystem mounted from /dev/sdb2 to have 0% reserved for root, and may be enough in the future to get you out of this situation. Be sure to set it back to 1 or 2% after you've cleaned up enough so that you have the free space again when you need it. The same thing can be done with quotas...
Re: [GENERAL] table as log (multiple writers and readers)
Gurjeet Singh wrote: On Wed, Apr 23, 2008 at 12:29 AM, David Wilson [EMAIL PROTECTED] wrote: On Tue, Apr 22, 2008 at 2:48 PM, Joris Dobbelsteen [EMAIL PROTECTED] wrote: Ah, yes, all visible rows... My point is that, unless you use a transaction with serializable isolation, this "all visible rows" for the second statement might be different from those that you copied into the log table. With the normal read committed isolation level you suffer from a possible nonrepeatable read that might change tuple visibility between different statements. That depends on implementation. A select into ... to do the initial copy followed by a delete where ... with the where clause referencing the log table itself to ensure that we delete only things that now exist in the log table, or a row-by-row insert/delete pair. Either would provide the appropriate level of protection from accidental deletion of more things than you intended without harming concurrency. The delete referencing the log table might require that the log table be indexed for performance, but it's likely that such indexing would be done anyway for general log use. Of course; the point is that that is another way to define visibility in this context: "if present in log table". The point is, a suitable definition is needed. I think this plpgsql function would solve the problem of an atomic read-and-delete operation... [log_rotate function quoted in full upthread - snipped] Don't forget ordering, this was important before...

START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT ... INTO log FROM staging ORDER BY ...;
DELETE FROM staging;
COMMIT;

Don't know if that ORDER BY works. It should in this case.
- Joris
Re: [GENERAL] Schema migration tools?
On Apr 21, 2008, at 10:44 PM, Christophe wrote: I'd like a tool that would automatically create these scripts, and I wondered if anything like this existed. The theory would be that it would consider two databases a and b, and produce the appropriate script to change b's schema to match a. Does anything like this exist? If not, I might have a new project... However it happens, you need to *capture* the schema changes that need to be applied to each database. I don't think you can just compare schemas and guarantee that database A will be transformed in exactly the same way as database B. For example, suppose the last text column name in database B was different from A. How can you determine if the column was renamed or if the column was dropped and a new column was added? The semantics of that difference could be very important. It would be nice if PostgreSQL had some kind of unique reference for the column, but I think columns are just numbered sequentially as they are added. It would also be neat to have a built-in way to log the schema changes. John DeSoi, Ph.D. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Rapidly decaying performance repopulating a large table
I have a fairly simple table (a dozen real/integer columns, a few indexes, one foreign key reference) with ~120m rows. Periodically the table is truncated or dropped and recreated and the data is regenerated (slightly different data, of course, or the exercise would be rather pointless). The regeneration occurs in batches of ~4000 data points at a time, which are inserted into the table via COPY, and are coming from several simultaneous processes. The first several million data points are quite quick (the copy executes in well under a quarter second). By the time the table reaches 10-15m rows, however, each individual COPY is taking 20 seconds to execute. Is there anything I can do to improve this performance? I can't drop/recreate the indices because some of the data points rely on points generated already in the run, and dropping the indices would make the required joins ridiculously expensive once the table starts growing. The foreign key reference *is* droppable for this regeneration, but I wouldn't expect it to be a performance problem. The autovacuum daemon is running in the background, with these settings (all autovacuum-specific settings are still at defaults):

vacuum_cost_delay = 50        # 0-1000 milliseconds
vacuum_cost_page_hit = 1      # 0-10000 credits
vacuum_cost_page_miss = 10    # 0-10000 credits
vacuum_cost_page_dirty = 20   # 0-10000 credits
vacuum_cost_limit = 200       # 1-10000 credits

My gut feeling is that better autovacuum settings would help, but I'm not really sure what to modify to get the improvement I'm looking for. -- - David T. Wilson [EMAIL PROTECTED]
Re: [GENERAL] Rapidly decaying performance repopulating a large table
On Tue, Apr 22, 2008 at 2:31 PM, David Wilson [EMAIL PROTECTED] wrote: I have a fairly simple table (a dozen real/integer columns, a few indexes, one foreign key reference) with ~120m rows. Periodically the table is truncated or dropped and recreated and the data is regenerated (slightly different data, of course, or the exercise would be rather pointless). The regeneration occurs in batches of ~4000 data points at a time, which are inserted into the table via COPY, and are coming from several simultaneous processes. The first several million data points are quite quick (the copy executes in well under a quarter second). By the time the table reaches 10-15m rows, however, each individual COPY is taking 20 seconds to execute. Is there anything I can do to improve this performance? I can't drop/recreate the indices because some of the The best bet is to issue an analyze table (with your table name in there, of course) and see if that helps. Quite often the real issue is that pgsql is using a method to insert rows when you have 10million of them that made perfect sense when you had 100 rows, but no longer is the best way. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Schema migration tools?
If you're looking for a PHP solution, there's: http://ezcomponents.org/docs/api/latest/introduction_DatabaseSchema.html It's very limited but works for simple schemas; I began extending it to support SQL-92 schema information more rigorously (it will be released under a BSD license at openmv.com). From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kerri Reno Sent: April 22, 2008 8:44 AM To: Christophe Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Schema migration tools? Christophe - We use LiquiBase: http://www.liquibase.org/home We don't use it quite the same as what you're proposing, but I think you could use it that way. When we found it, we did 'Generate Changelog' to start, then made new changes to the ChangeLog, and applied it to each database as needed. It has really helped us, because we were forgetting to apply the changes to the production database that were needed. Hope this helps! Kerri On 4/21/08, Christophe [EMAIL PROTECTED] wrote: Greetings, We have the traditional three servers: dev -- staging -- production each with a PostgreSQL instance and the same schema, at least over time. Right now, we handle schema migration (updating the schema for new releases) by manually-created scripts that apply the changes to staging and production. I'd like a tool that would automatically create these scripts, and I wondered if anything like this existed. The theory would be that it would consider two databases a and b, and produce the appropriate script to change b's schema to match a. Does anything like this exist? If not, I might have a new project... Thanks! -- Christophe -- Yuma Educational Computer Consortium Compass Development Team Kerri Reno [EMAIL PROTECTED] (928) 502-4240 .·:*¨¨*:·. .·:*¨¨*:·. .·:*¨¨*:·.
Re: [GENERAL] How to modify ENUM datatypes?
Andreas 'ads' Scherbaum wrote: On Tue, 22 Apr 2008 20:58:30 +0800 Craig Ringer wrote: D. Dante Lorenso wrote: The problem is that once I create a column in my account table that uses this 'account_type' datatype, I can't seem to change or add to it any more. I want to add a new value or edit/delete an existing one. How do you make changes to an ENUM datatype that is already in use? As far as I know, ENUM is not well suited to uses where new enumeration members may be added later. A lookup table and a foreign key is probably better for this sort of use. I remember the discussions before PG implemented ENUMs at all - some people voted against this feature because they knew that questions about modifying the enum values would pop up sooner or later. You *can* add elements to an enum type - sort of - by creating a new type and converting columns. It's ugly, though, and will be hard to get right when the column of interest is referenced by foreign keys and such. If there's one request to modify a specific ENUM column, other requests will follow, because the enum set is not as static as it seems in the first place. So I believe the only advice should be: change the column to a foreign key construct. Converting columns to new data types is much more overhead anyway. So, the advice here is don't use ENUM? I was really hoping that it would be more efficient to not have to do all the foreign keys and joins for tables that may have 4-5 enum types. Just being able to: SELECT * FROM tablename would be nice if my columns contained enums, instead of doing: SELECT * FROM tablename, lookuptable WHERE tablename.some_id = lookuptable.some_id Isn't the join more expensive? -- Dante
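For completeness, the "create a new type and convert the column" workaround Craig alludes to can be sketched as follows. This is only a sketch using Dante's account/account_type names; the extra 'SUSPENDED' label and the account_type2 name are made up for illustration:

```sql
BEGIN;
-- new enum carrying the extra value
CREATE TYPE account_type2 AS ENUM
    ('ACTIVE', 'BILLING_FAILED', 'CLOSED', 'DEACTIVATED', 'SUSPENDED');
-- rewrite the column; the text cast maps the old labels onto the new type
ALTER TABLE account
    ALTER COLUMN acct_type TYPE account_type2
    USING acct_type::text::account_type2;
-- the old type can go once nothing depends on it
DROP TYPE account_type;
COMMIT;
```

Note that the ALTER rewrites the whole table and fails if views, indexes, or defaults still depend on the old type, which is exactly the ugliness being discussed in this thread.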
Re: [GENERAL] How to modify ENUM datatypes?
On Apr 22, 2008, at 1:45 PM, D. Dante Lorenso wrote: So, the advice here is don't use ENUM? I think it's more "Don't use ENUM for a type that you are planning to extend."
Re: [GENERAL] Schema migration tools?
On Apr 22, 2008, at 3:09 PM, John DeSoi wrote: It would be nice if PostgreSQL had some kind of unique reference for the column, but I think columns are just numbered sequentially as they are added. It would also be neat to have a built-in way to log the schema changes. It does: log_statement set to either 'all' or 'ddl' will do the trick. Erik Jones DBA | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate market in style. Visit us online at http://www.myemma.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Rapidly decaying performance repopulating a large table
On Tue, Apr 22, 2008 at 4:38 PM, Scott Marlowe [EMAIL PROTECTED] wrote: The best bet is to issue an analyze table (with your table name in there, of course) and see if that helps. Quite often the real issue is that pgsql is using a method to insert rows when you have 10million of them that made perfect sense when you had 100 rows, but no longer is the best way. This has caused the behavior to be... erratic. That is, individual copies are now taking anywhere from 2 seconds (great!) to 30+ seconds (back where we were before). I also clearly can't ANALYZE the table after every 4k batch; even if that resulted in 2 second copies, the analyze would take up as much time as the copy otherwise would have been. I could conceivably analyze after every ~80k (the next larger unit of batching; I'd love to be able to batch the copies at that level but dependencies ensure that I can't), but it seems odd to have to analyze so often. Oh, barring COPY delays I'm generating the data at a rate of something like a half million rows every few minutes, if that's relevant. -- - David T. Wilson [EMAIL PROTECTED] -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Rapidly decaying performance repopulating a large table
On Tue, Apr 22, 2008 at 2:59 PM, David Wilson [EMAIL PROTECTED] wrote: On Tue, Apr 22, 2008 at 4:38 PM, Scott Marlowe [EMAIL PROTECTED] wrote: The best bet is to issue an analyze table (with your table name in there, of course) and see if that helps. Quite often the real issue is that pgsql is using a method to insert rows when you have 10million of them that made perfect sense when you had 100 rows, but no longer is the best way. This has caused the behavior to be... erratic. That is, individual copies are now taking anywhere from 2 seconds (great!) to 30+ seconds (back where we were before). I also clearly can't ANALYZE the table after every 4k batch; even if that resulted in 2 second copies, the analyze would take up as much time as the copy otherwise would have been. I could conceivably analyze after every ~80k (the next larger unit of batching; I'd love to be able to batch the copies at that level but dependencies ensure that I can't), but it seems odd to have to analyze so often. Normally, after the first 50,000 or so the plan won't likely change due to a new analyze, so you could probably just analyze after 50k or so and get the same performance. If the problem is a bad plan for the inserts / copies. also, non-indexed foreign keyed fields can cause this problem. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to modify ENUM datatypes?
On Tue, Apr 22, 2008 at 3:31 PM, D. Dante Lorenso [EMAIL PROTECTED] wrote: All, In the past I have used foreign keys to lookup tables for small lists of values that I now think ENUM could do the job of. I was hoping that by using ENUM, I would avoid having to do joins in my queries, and that I'd be making the data more robust and faster. I used to have a table for account_status:

A | Active
B | Billing Failed
C | Closed
D | Deactivated

account.acct_type CHAR references account_type.acct_type CHAR

But, now I've converted that to an ENUM:

ACTIVE
BILLING_FAILED
CLOSED
DEACTIVATED

account.acct_type ENUM account_type

The problem is that once I create a column in my account table that uses this 'account_type' datatype, I can't seem to change or add to it any more. I want to add a new value or edit/delete an existing one. How do you make changes to an ENUM datatype that is already in use? I agree with others that ENUMs stop looking pretty when you need to modify them... Here's a thread from the recent past where this exact problem was discussed... maybe it'll interest you... http://archives.postgresql.org/pgsql-general/2007-12/msg01363.php Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
Re: [GENERAL] How to modify ENUM datatypes?
On Tue, Apr 22, 2008 at 2:45 PM, D. Dante Lorenso [EMAIL PROTECTED] wrote: So, the advice here is don't use ENUM? I was really hoping that it would be more efficient to not have to do all the foreign keys and joins for tables that may have 4-5 enum types. Just being able to: SELECT * FROM tablename If you use a lookup table methodology you still get that. Try this:

smarlowe=# create table choices (color text primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "choices_pkey" for table "choices"
CREATE TABLE
smarlowe=# insert into choices values ('red'),('yellow'),('green'),('blue');
INSERT 0 4
smarlowe=# create table mystuff (id serial primary key, usenam text, mycolor text references choices(color));
NOTICE:  CREATE TABLE will create implicit sequence "mystuff_id_seq" for serial column "mystuff.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "mystuff_pkey" for table "mystuff"
CREATE TABLE
smarlowe=# insert into mystuff (usenam, mycolor) values ('scott','red');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('darren','blue');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('dan','green');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('steve','green');
INSERT 0 1
smarlowe=# insert into mystuff (usenam, mycolor) values ('mike','black');
ERROR:  insert or update on table "mystuff" violates foreign key constraint "mystuff_mycolor_fkey"
DETAIL:  Key (mycolor)=(black) is not present in table "choices".
smarlowe=# select * from mystuff;
 id | usenam | mycolor
----+--------+---------
  1 | scott  | red
  2 | darren | blue
  3 | dan    | green
  4 | steve  | green
(4 rows)

tada! No enum, and no join. But you can't insert illegal values in mycolor...
Re: [GENERAL] Rapidly decaying performance repopulating a large table
On Tue, Apr 22, 2008 at 5:04 PM, Scott Marlowe [EMAIL PROTECTED] wrote: Normally, after the first 50,000 or so the plan won't likely change due to a new analyze, so you could probably just analyze after 50k or so and get the same performance. If the problem is a bad plan for the inserts / copies. also, non-indexed foreign keyed fields can cause this problem. Analyzing after the first 50k or so is easy enough, then; thanks for the suggestion. Foreign keys are definitely indexed (actually referencing a set of columns that the foreign table is UNIQUE on). Any other suggestions? COPY times alone are pretty much quadrupling my table-rebuild runtime, and I can interrupt the current rebuild to try things pretty much at a whim (nothing else uses the DB while a rebuild is happening), so I'm pretty much game to try any reasonable suggestions anyone has. -- - David T. Wilson [EMAIL PROTECTED] -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Rapidly decaying performance repopulating a large table
On Tue, Apr 22, 2008 at 3:15 PM, David Wilson [EMAIL PROTECTED] wrote: On Tue, Apr 22, 2008 at 5:04 PM, Scott Marlowe [EMAIL PROTECTED] wrote: Normally, after the first 50,000 or so the plan won't likely change due to a new analyze, so you could probably just analyze after 50k or so and get the same performance. If the problem is a bad plan for the inserts / copies. also, non-indexed foreign keyed fields can cause this problem. Analyzing after the first 50k or so is easy enough, then; thanks for the suggestion. Foreign keys are definitely indexed (actually referencing a set of columns that the foreign table is UNIQUE on). Any other suggestions? COPY times alone are pretty much quadrupling my table-rebuild runtime, and I can interrupt the current rebuild to try things pretty much at a whim (nothing else uses the DB while a rebuild is happening), so I'm pretty much game to try any reasonable suggestions anyone has. Try upping your checkpoint segments. Some folks find fairly large numbers like 50 to 100 to be helpful. Each segment = 16Megs, so be sure not to run your system out of drive space while increasing it. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
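For reference, the checkpoint settings discussed here live in postgresql.conf; a sketch with the illustrative values from this subthread (tune them to your own disk space and recovery-time needs):

```
checkpoint_segments = 50            # each WAL segment is 16 MB on disk
checkpoint_timeout = 30min          # max time between automatic checkpoints
checkpoint_completion_target = 0.9  # spread checkpoint I/O over the interval (8.3+)
```

Reload the server after changing these. More segments means more disk used under pg_xlog, but fewer forced checkpoints in the middle of a bulk load.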
Re: [GENERAL] How to modify ENUM datatypes?
On Tue, 22 Apr 2008 15:45:39 -0500 D. Dante Lorenso [EMAIL PROTECTED] wrote: I was really hoping that it would be more efficient to not have to do all the foreign keys and joins for tables that may have 4-5 enum types. Just being able to: SELECT * FROM tablename would be nice if my columns contained enums instead of doing: SELECT * FROM tablename, lookuptable WHERE tablename.some_id = lookuptable.some_id Isn't the join more expensive? If you were using natural keys, the join would not be required. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Re: [GENERAL] Schema migration tools?
On Apr 22, 2008, at 4:53 PM, Erik Jones wrote: It would be nice if PostgreSQL had some kind of unique reference for the column, but I think columns are just numbered sequentially as they are added. It would also be neat to have a built-in way to log the schema changes. It does: log_statement set to either 'all' or 'ddl' will do the trick. If I do this, is there a way to get a transaction consistent log of just the necessary commands to transform another copy of the database? In other words, I assume this approach will log each DDL command even if the transaction is rolled back. Correct? Thanks, John DeSoi, Ph.D. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to modify ENUM datatypes?
Scott Marlowe wrote: On Tue, Apr 22, 2008 at 2:45 PM, D. Dante Lorenso [EMAIL PROTECTED] wrote: So, the advice here is don't use ENUM? I was really hoping that it would be more efficient to not have to do all the foreign keys and joins for tables that may have 4-5 enum types. Just being able to: SELECT * FROM tablename If you use a lookup table methodology you still get that. Try this: [psql session quoted in full upthread - snipped] tada! No enum, and no join. But you can't insert illegal values in mycolor... This approach is so old-school, I seem to have overlooked the obvious. Here you've elected to use the foreign key to just control the possible values inserted but not really to look up the value.
Seems you are storing the values in text form, which goes against all the normalization techniques I learned in school. I see this might be a problem for storage, since you will need to store the TEXT value for every row in the 'mystuff' table instead of just storing a reference to the lookup table as an INTEGER. Over millions of rows, perhaps this would become a concern? What is the general consensus in the community about this approach? Is this de-normalization frowned upon, or is there a performance advantage here that warrants the usage? -- Dante
Re: [GENERAL] Rapidly decaying performance repopulating a large table
On Tue, Apr 22, 2008 at 5:18 PM, Scott Marlowe [EMAIL PROTECTED] wrote: Try upping your checkpoint segments. Some folks find fairly large numbers like 50 to 100 to be helpful. Each segment = 16Megs, so be sure not to run your system out of drive space while increasing it. Ahh, much more progress. Upping the segments to 50, timeout to 30m and completion target to 0.9 has improved average copy time to between 2 and 10 seconds, which is definitely an improvement. Thanks for the help. Any other random thoughts while you're at it? :) -- - David T. Wilson [EMAIL PROTECTED] -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Schema migration tools?
On Apr 22, 2008, at 4:33 PM, John DeSoi wrote: On Apr 22, 2008, at 4:53 PM, Erik Jones wrote: It would be nice if PostgreSQL had some kind of unique reference for the column, but I think columns are just numbered sequentially as they are added. It would also be neat to have a built-in way to log the schema changes. It does: log_statement set to either 'all' or 'ddl' will do the trick. If I do this, is there a way to get a transaction consistent log of just the necessary commands to transform another copy of the database? In other words, I assume this approach will log each DDL command even if the transaction is rolled back. Correct? Right. It's not something you'd want to lift directly and run somewhere else. Erik Jones DBA | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate market in style. Visit us online at http://www.myemma.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Rapidly decaying performance repopulating a large table
On Apr 22, 2008, at 4:46 PM, David Wilson wrote: On Tue, Apr 22, 2008 at 5:18 PM, Scott Marlowe [EMAIL PROTECTED] wrote: Try upping your checkpoint segments. Some folks find fairly large numbers like 50 to 100 to be helpful. Each segment = 16Megs, so be sure not to run your system out of drive space while increasing it. Ahh, much more progress. Upping the segments to 50, timeout to 30m and completion target to 0.9 has improved average copy time to between 2 and 10 seconds, which is definitely an improvement. Thanks for the help. Any other random thoughts while you're at it? :) Has anyone yet pointed out the standards: drop indexes and foreign keys and rebuild them once the entire data import is finished? Erik Jones DBA | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate market in style. Visit us online at http://www.myemma.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
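The standard drop-and-rebuild cycle Erik refers to looks roughly like this; the table, index, constraint, and column names are invented here, since the OP's actual schema wasn't posted:

```sql
-- drop the constraint and index before the bulk load
ALTER TABLE datapoints DROP CONSTRAINT datapoints_src_fkey;
DROP INDEX datapoints_val_idx;

-- ... run all the COPY batches here ...

-- rebuild once the data is in place; a single index build over N rows
-- is much cheaper than N incremental index updates
CREATE INDEX datapoints_val_idx ON datapoints (val);
ALTER TABLE datapoints ADD CONSTRAINT datapoints_src_fkey
    FOREIGN KEY (src_id) REFERENCES sources (src_id);
```

As David explained earlier in the thread, this doesn't fit his case, because later batches join against already-loaded rows and need the indexes in place during the load.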
Re: [GENERAL] Rapidly decaying performance repopulating a large table
David Wilson [EMAIL PROTECTED] writes: Foreign keys are definitely indexed (actually referencing a set of columns that the foreign table is UNIQUE on). Are you loading any tables that are the targets of foreign key references from other tables being loaded? If so, I'd bet on Scott's theory being correct with respect to the plans for checks of those FK constraints. The COPY itself hasn't got any plan, and inserting rows into a table should be constant-time in itself, so it seems to me there are only two possibilities for a progressive slowdown: * the cost of updating the indexes, which for standard btree indexes ought to grow at about log(N) when there are already N entries * bad plans in either foreign-key triggers or user-defined triggers attached to the tables. You failed to mention what PG version this is (tut tut) but if it's less than 8.3 then ANALYZE alone won't fix bad plans in triggers; you'd need to analyze and then start a fresh database session. You said that you don't need to have the FK constraints present, so I'd strongly suggest trying it without ... regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to modify ENUM datatypes?
On Tue, 2008-04-22 at 16:34 -0500, D. Dante Lorenso wrote: I see this might be a problem with storage since you will need to store the TEXT value for every row in the 'mystuff' table instead of just storing the reference to the lookup table as an INTEGER. Over millions of rows, perhaps this would become a concern? It does use additional storage to store the full text value, rather than a fixed-size integer. However, the difference is not much when the average string length is short. If you store an integer reference instead, joins are not necessarily expensive. If the number of distinct values is small (which is the normal use case for ENUM), I would expect the joins to be quite cheap. Beware of running into bad plans however, or making the optimizer work too hard (if you have a lot of other joins, too). I don't think the performance concerns are major, but worth considering if you have millions of rows. What is the general consensus by the community about this approach? Is this de-normalization frowned upon, or is there a performance advantage here that warrants the usage? This is not de-normalization, at all. Normalization is a formal process, and if this were de-normalization, you could find a specific rule that is violated by this approach. Look here: http://en.wikipedia.org/wiki/Third_normal_form If you go to higher normal forms, you will not find any violations there, either. There is nothing about normalization that requires the use of surrogate keys. The approach suggested by Scott Marlowe is normalized as well as being quite natural and simple. I think often this is overlooked as being too simple, but it's a quite good design in many cases. Regards, Jeff Davis -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Rapidly decaying performance repopulating a large table
On Tue, Apr 22, 2008 at 6:10 PM, Tom Lane [EMAIL PROTECTED] wrote:
> David Wilson [EMAIL PROTECTED] writes:
> Are you loading any tables that are the targets of foreign key
> references from other tables being loaded? If so, I'd bet on Scott's
> theory being correct with respect to the plans for checks of those FK
> constraints. The COPY itself hasn't got any plan, and inserting rows
> into a table should be constant-time in itself, so it seems to me there
> are only two possibilities for a progressive slowdown:
>
> * the cost of updating the indexes, which for standard btree indexes
>   ought to grow at about log(N) when there are already N entries
>
> * bad plans in either foreign-key triggers or user-defined triggers
>   attached to the tables.

Only one table is being regenerated, and it's not the target of any foreign key checks itself; it merely has a single FK reference out to one unchanging table. There are no triggers on the table.

> You failed to mention what PG version this is (tut tut) but if it's
> less than 8.3 then ANALYZE alone won't fix bad plans in triggers;
> you'd need to analyze and then start a fresh database session.

PG is 8.3.1.

I certainly expect some slowdown, given that I have indices that I can't drop (as you indicate above). Having watched it for a bit now, however, I believe that the checkpoint settings were the major cause of the problem. Changing those settings has dropped the copy times back down toward what I'd expect; I have also now dropped the FK constraint, but that has made no perceptible difference in time. My guess at this point is that I'm just running into index update times and checkpoint I/O.

The only thing that still seems strange is the highly variable nature of the COPY times: anywhere from 1.0 seconds to 20 seconds, with an average probably around 8. I can live with that, but I'm still open to any other suggestions anyone has! Thanks for the help so far.

--
- David T. Wilson
[EMAIL PROTECTED]
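[The checkpoint tuning referred to above typically amounts to a couple of postgresql.conf settings; the values below are illustrative for an 8.3-era bulk load, not the ones actually used in this thread:]

```
# postgresql.conf (8.3) -- example values only
checkpoint_segments = 32            # fewer, larger checkpoints during bulk loads
checkpoint_completion_target = 0.9  # spread checkpoint I/O across the interval
```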
Re: [GENERAL] Rapidly decaying performance repopulating a large table
David Wilson [EMAIL PROTECTED] writes:
> My guess at this point is that I'm just running into index update times
> and checkpoint IO. The only thing that still seems strange is the
> highly variable nature of the COPY times- anywhere from 1.0 seconds to
> 20 seconds, with an average probably around 8ish. I can live with that,
> but I'm still open to any other suggestions anyone has!

What have you got shared_buffers set to? If it's not enough to cover the working set for your indexes, that might be the (other) problem.

			regards, tom lane
Re: FW: Re: [GENERAL] create temp in function
Kerri Reno wrote:
> So the reason I'm getting the error is that I'm running it in 8.0.
> Thanks so much for your help!
> Kerri

Use FOR instead:

CREATE OR REPLACE FUNCTION private.get_salary_schedule(integer, text)
  RETURNS record AS
$BODY$
declare
   tcp_id alias for $1;
   tfy alias for $2;
   tbl text := 'saltab' || tfy;
   arow record;
   query text;
begin
   query = 'select salary_schedule, pay_column, step from ' || tbl
           || ' where cp_id = ' || to_char(tcp_id,'');
   raise notice 'query: %', query;
   for arow in execute query loop
      return arow;
   end loop;
   return null;
end;
$BODY$
  LANGUAGE 'plpgsql' STABLE SECURITY DEFINER;

You probably need to protect tfy from SQL injection too; see quote_ident.

klint.

--
Klint Gore
Database Manager Sheep CRC
A.G.B.U. University of New England
Armidale NSW 2350
Ph: 02 6773 3789  Fax: 02 6773 3266
EMail: [EMAIL PROTECTED]
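[A minimal sketch of the quote_ident suggestion above, applied to the same query-building step; quote_literal is used for the value position as well, which is not spelled out in the message but follows the same principle:]

```sql
-- Illustrative: build the dynamic query with quote_ident/quote_literal
-- so a malicious tfy value cannot escape the identifier position.
query := 'select salary_schedule, pay_column, step from '
         || quote_ident('saltab' || tfy)
         || ' where cp_id = ' || quote_literal(tcp_id::text);
```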
Re: [GENERAL] Postgres Encoding conversion problem
On 04/22/2008 07:30 PM, Albe Laurenz wrote:
> Clemens Schwaighofer wrote:
>>>> I sometimes have a problem with conversion of encodings, e.g. from
>>>> UTF-8 to ShiftJIS:
>>>> ERROR: character 0xf0a0aeb7 of encoding UTF8 has no equivalent in SJIS
>>>> I have no idea what character this is, I cannot view it in my
>>>> browser, etc.
>>> It translates to Unicode 10BB7, which is not defined. I guess that is
>>> not intended; can you guess what the character(s) should be?
>> To be honest, no idea. It's some Chinese character; I have no idea how
>> the user input this, because this is a Japanese page. I actually found
>> the character, but only my Mac OS X can show it. It looks similar to a
>> Japanese character used for a name, but how the Chinese one got
>> selected is a mystery to me ...
> Are you sure that your Mac OS X computer interprets the character as
> UTF-8?

Of that I cannot be sure; I just searched through a page that has a complete list. OS X can render it, Linux cannot, and I have not tried Windows.

--
[ Clemens Schwaighofer  -=:~ ]
[ IT Engineer/Manager, TEQUILA\ Japan IT Group ]
[ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703  Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp ]
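[Editorial note: the byte sequence 0xF0 0xA0 0xAE 0xB7 actually decodes to U+20BB7, not 10BB7 — the CJK ideograph 𠮷, a variant of 吉 that does occur in Japanese personal names, which matches the description above. In a UTF8 database the decoding can be checked with standard functions:]

```sql
-- Decode the offending byte sequence from the error message into text
-- (requires a UTF8 server encoding to display the character).
SELECT convert_from(decode('f0a0aeb7', 'hex'), 'UTF8');
```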
Re: [GENERAL] Rapidly decaying performance repopulating a large table
On Tue, Apr 22, 2008 at 7:33 PM, Tom Lane [EMAIL PROTECTED] wrote:
> What have you got shared_buffers set to? If it's not enough to cover
> the working set for your indexes, that might be the (other) problem.

shared_buffers = 1536MB

Is there a way to get the size of a specific index, on that note? There seem to be access functions for the relation plus its indexes, and for the relation by itself, but not for a single index out of possibly several.

I could increase shared_buffers some, but client apps on the same machine occasionally also have hefty memory requirements (not during these regeneration runs, but restarting the server with a new shared_buffers value before and after each regeneration seems like overkill).

--
- David T. Wilson
[EMAIL PROTECTED]
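[On the index-size question: an index is itself a relation, so pg_relation_size accepts an index name directly; 'my_index' and 'my_table' below are placeholders:]

```sql
-- Size of one specific index, independent of its table
SELECT pg_size_pretty(pg_relation_size('my_index'));

-- For comparison: the table alone vs. the table plus all its indexes
SELECT pg_size_pretty(pg_relation_size('my_table')),
       pg_size_pretty(pg_total_relation_size('my_table'));
```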
Re: [GENERAL] How to modify ENUM datatypes?
On Tue, Apr 22, 2008 at 6:11 PM, Jeff Davis [EMAIL PROTECTED] wrote:
> If you store an integer reference instead, joins are not necessarily
> expensive. If the number of distinct values is small (which is the
> normal use case for ENUM), I would expect the joins to be quite cheap.
> Beware of running into bad plans however, or making the optimizer work
> too hard (if you have a lot of other joins, too).

"Necessarily" being the operative word here. Think about an enum as part of a composite key, for example. It's a lot nicer to rely on an enum for natural ordering than to do something like a functional index.

Anyway, it's pretty easy to extend an enum: you can manually insert an entry into pg_enum (see the relevant docs). Just watch out for oid overlap. One thing that is currently very difficult to do is to alter the order of the enum elements. The current state of things is pretty workable, though.

Scott's color/mystuff example is generally preferred for a lot of cases. I _really_ prefer this to surrogate-style enums where you have a color_id... that approach makes your database unreadable, IMO.

A decent hybrid approach which I have been using lately is "char" (not char), where the set of choices is reasonably small, well represented by a single character, and the intrinsic ordering property is not too important (where an enum might be better). In many cases, though, the pure natural approach is simply the best. The enum, with its intrinsic ordering and more efficient indexing, fills an important niche, however.

merlin
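[A minimal sketch of the "char" hybrid described above; table, column, and status values are made up for illustration. "char" (with the quotes) is PostgreSQL's internal single-byte character type, distinct from char(1):]

```sql
-- One byte per row; the CHECK constraint documents and enforces the set.
CREATE TABLE ticket (
    id     serial PRIMARY KEY,
    status "char" NOT NULL DEFAULT 'o'
           CHECK (status IN ('o', 'p', 'c'))  -- open / in progress / closed
);
```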
Re: [GENERAL] Rapidly decaying performance repopulating a large table
On Tue, 22 Apr 2008, David Wilson wrote:
> My guess at this point is that I'm just running into index update times
> and checkpoint IO. The only thing that still seems strange is the
> highly variable nature of the COPY times- anywhere from 1.0 seconds to
> 20 seconds, with an average probably around 8ish.

Have you turned on log_checkpoints to see whether those are correlated with the slow ones? Given that you've had an improvement by increasing checkpoint_segments, it's not out of the question that you're still getting nailed sometimes during the more stressful portions of the checkpoint cycle (usually right near the end). The 1-second ones might just happen to be those that start just as the previous checkpoint finished.

To make lining those up easier, you might turn on logging of long statements with log_min_duration_statement, so you can see both bits of data in the same log file. That might get you some accidental enlightenment as well (for example, if some other statement is colliding badly with this load).

This is a bit out of my area, but after reading the rest of this thread I wonder whether raising the default_statistics_target parameter a bit might reduce the instances of bad plans showing up.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
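[The two logging settings suggested above go in postgresql.conf; the duration threshold here is only an example value:]

```
# postgresql.conf
log_checkpoints = on                 # log checkpoint start/end with timings
log_min_duration_statement = 1000    # log statements slower than 1s (in ms)
```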