[GENERAL] What is this vacuum doing?
I have a vacuum process that is sitting around and apparently not doing anything. It's been running over 2000 seconds and is eating up no CPU. It isn't waiting on a lock. Backtrace is this:

#0  0x00367aed4ff7 in semop () from /lib64/libc.so.6
#1  0x005d2a83 in PGSemaphoreLock (sema=0x2ace8a5721f0, interruptOK=1 '\001') at pg_sema.c:418
#2  0x005fa275 in LockBufferForCleanup (buffer=138743) at bufmgr.c:2471
#3  0x00476ff0 in btvacuumscan (info=0x7fff3b3dcfc0, stats=0x1eb0a900, callback=0x551250 <lazy_tid_reaped>, callback_state=0x1eae2db0, cycleid=45547) at nbtree.c:856
#4  0x004774c5 in btbulkdelete (fcinfo=<value optimized out>) at nbtree.c:574
#5  0x006d100a in FunctionCall4Coll (flinfo=0xfffc, collation=993900848, arg1=1, arg2=18446744073709551615, arg3=0, arg4=140734187288784) at fmgr.c:1371
#6  0x005511c4 in lazy_vacuum_index (indrel=0x2ace8b697ab8, stats=0x1eb09cc8, vacrelstats=0x1eae2db0) at vacuumlazy.c:1020
#7  0x00551a77 in lazy_vacuum_rel (onerel=0x2ace8b696470, vacstmt=<value optimized out>, bstrategy=<value optimized out>) at vacuumlazy.c:843
#8  0x005507e0 in vacuum_rel (relid=3471206537, vacstmt=0x7fff3b3de740, do_toast=<value optimized out>, for_wraparound=0 '\0') at vacuum.c:1042
#9  0x00550b0b in vacuum (vacstmt=0x7fff3b3de740, relid=<value optimized out>, do_toast=0 '\0', bstrategy=<value optimized out>, for_wraparound=0 '\0', isTopLevel=<value optimized out>) at vacuum.c:227
#10 0x005d5d57 in do_autovacuum () at autovacuum.c:2739
#11 0x005d6393 in AutoVacWorkerMain (argc=<value optimized out>, argv=<value optimized out>) at autovacuum.c:1641
#12 0x005d6486 in StartAutoVacWorker () at autovacuum.c:1428
#13 0x005e0d22 in sigusr1_handler (postgres_signal_arg=<value optimized out>) at postmaster.c:4588
#14 <signal handler called>
#15 0x00367aecce83 in __select_nocancel () from /lib64/libc.so.6
#16 0x005dd7dc in ServerLoop () at postmaster.c:1415
#17 0x005dec97 in PostmasterMain (argc=2, argv=0x1ea0bd90) at postmaster.c:1116
#18 0x0058352e in main (argc=2, argv=<value optimized out>) at main.c:199

So what is this doing? This is a problem because other backends are blocking on this one, and as soon as I kill -INT it, it respawns, so I can't get rid of it. pg version is: PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
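For what it's worth, frame #2 (LockBufferForCleanup) waits for a buffer *pin* rather than a heavyweight lock, which is why nothing shows up in pg_locks; the usual culprits are long-idle transactions or open cursors still pinning the page. A sketch of how one might look for candidates (column names as of 9.1 — procpid and current_query were renamed in later releases):

```sql
-- Sessions with old transactions that may still be pinning a buffer;
-- buffer-pin waits are invisible in pg_locks, so check for long-lived
-- transactions and open cursors instead.
SELECT procpid, usename, backend_start, xact_start, current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;
```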
[GENERAL] VIEW still referring to old name of field
Hi, This is a case where I changed the name of a field in a table that a VIEW referred to, but the VIEW definition still points to the old name of the field. The surprise is that the VIEW still works (with live data). An excerpt from psql (v9.1.2) is given below. The problem came up when I took a schema dump and restored it on another server, which crashed on this VIEW since the field referred to by the VIEW isn't there any more. The same error is shown if (on this server) I copy the VIEW definition and execute it directly (which is expected, since the field name is incorrect in the definition). I think an easy way to solve this is to do a drop/create for the VIEW, but I think this is a bug and it's better if it's found / resolved, if possible. Let me know if I can provide any more input on this. I have this server for another day or two, so do let me know if running a command on this server may help. Further (I am unsure here), I believe the field name was changed ~1-2 weeks back and the server was restarted just the day before. Is it possible that this survives a restart as well? Thanks -- Robins Tharakan

==
[pgsql@server /webstats/pgsql]$ psql
psql (9.1.2)
Type "help" for help.
pgsql=# \c aproject
You are now connected to database "aproject" as user "pgsql".
aproject=# \d ui.thisview
             View "ui.thisview"
          Column          |          Type          | Modifiers
--------------------------+------------------------+-----------
 pid                      | integer                |
 product_name             | character varying(100) |
 product_usage            | bigint                 |
 product_usage_percentage | numeric(10,2)          |

aproject=# select * from pg_views where viewname = 'thisview';
 schemaname | viewname | viewowner | definition
------------+----------+-----------+------------
 ui         | thisview | pgsql     | WITH x AS (SELECT session.pid, count(*) AS product_usage FROM ((s.history JOIN s.session USING (session_id)) JOIN s.product USING (pid)) WHERE (((history.datetime_entry >= (now() - '90 days'::interval)) AND (history.datetime_entry <= now())) AND (session.pid IS NOT NULL)) GROUP BY session.pid) SELECT x.pid, product.product_name, x.product_usage, ((((x.product_usage * 100))::numeric / (SELECT sum(x.product_usage) AS count FROM x)))::numeric(10,2) AS product_usage_percentage FROM (x JOIN s.product USING (pid));
(1 row)

aproject=# select * from ui.thisview limit 1;
 pid |  product_name   | product_usage | product_usage_percentage
-----+-----------------+---------------+--------------------------
  14 | Unknown Product |           700 |                     2.02
(1 row)

aproject=# WITH x AS (SELECT session.pid, count(*) AS product_usage FROM ((s.history JOIN s.session USING (session_id)) JOIN s.product USING (pid)) WHERE (((history.datetime_entry >= (now() - '90 days'::interval)) AND (history.datetime_entry <= now())) AND (session.pid IS NOT NULL)) GROUP BY session.pid) SELECT x.pid, product.product_name, x.product_usage, ((((x.product_usage * 100))::numeric / (SELECT sum(x.product_usage) AS count FROM x)))::numeric(10,2) AS product_usage_percentage FROM (x JOIN s.product USING (pid));
ERROR:  column "pid" specified in USING clause does not exist in right table

aproject=# \d s.product
                          Table "s.product"
    Column    |          Type          |                        Modifiers
--------------+------------------------+---------------------------------------------------------
 product_id   | integer                | not null default nextval('s.product_id_seq'::regclass)
 product_name | character varying(100) | not null
Indexes:
    "product_pkey" PRIMARY KEY, btree (product_id)
Referenced by:
    TABLE "s.session" CONSTRAINT "fk_product_id" FOREIGN KEY (pid) REFERENCES s.product(product_id)
Re: [GENERAL] phantom long-running query (check_postgres.pl reports a long-running query, but the database logs don't show it)
On Thu 09 Feb 2012 02:32:37 Aleksey Tsalolikhin wrote: I got an alert from check_postgres.pl today on a long-running query on our production database, but our PostgreSQL 8.4.9 server log, which is configured to log queries over 5 seconds long (log_min_duration_statement = 5000), does not show the query. check_postgres.pl showed: Date/Time: Wed Feb 8 08:41:36 PST 2012 POSTGRES_QUERY_TIME WARNING: (host:xxx) longest query: 264s (database:xxx PID:xxx port:xxx address:xxx username:xxx) The postgres log showed other long-running queries, but nothing over 48s. It's really interesting why the database server log does not show this query. Can you think of some scenario where check_postgres.pl would see a long-running query but it would not get logged by the database server? I checked the server log and there is nothing there for 08:41 at all. 08:40 and 08:42, yes, some queries over 5 secs, but nothing even close to 264 secs. I've added the verbose switch to my check_postgres.pl script so we'll have a record of what the long-running query was.

You should either look at the source of this perl script to see what it is doing, or enable log_statement = 'all' in postgresql.conf and send the postmaster the -HUP signal, and then watch the postgresql log for the queries that the perl script is issuing. Then replay those queries by hand and examine them. Also, you can always check the pg_stat_activity view. Best, -at -- Achilleas Mantzios IT DEPT
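One relevant detail: log_min_duration_statement only logs a statement when it *completes*, so a query that is still running (or whose backend is killed first) never appears in the log at all, while check_postgres.pl sees it live in pg_stat_activity. A quick way to catch such queries in the act, assuming 8.4-era column names (procpid and current_query were renamed to pid and query in 9.2):

```sql
-- Show the longest-running statements right now (PostgreSQL 8.4 columns).
SELECT procpid,
       now() - query_start AS duration,
       usename,
       current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY duration DESC
LIMIT 5;
```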
Re: [GENERAL] SOUNDEX call
On Wed, Feb 8, 2012 at 15:37, 84.le0n 84.l...@gmail.com wrote: This is an interesting solution, but I only know a little PL/pgSQL and I don't know how to provide a SOUNDEX version in PL/pgSQL; I don't know the SOUNDEX algorithm either. How can I provide soundex in PL/pgSQL? I wrote and posted a PL/pgSQL version of soundex() on the PostgreSQL wiki: https://wiki.postgresql.org/wiki/Soundex You are free to use it under the terms of any version of the GPL license, MIT license or PostgreSQL license at your choice. If you end up using it, getting some credit in release notes would be appreciated. :) Regards, Marti
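For anyone who just needs soundex() without writing PL/pgSQL at all, the fuzzystrmatch contrib module ships a C implementation; on 9.1+ it can be installed with CREATE EXTENSION (on earlier releases, by running the contrib SQL script):

```sql
CREATE EXTENSION fuzzystrmatch;           -- 9.1+; contrib must be installed

SELECT soundex('Anne');                   -- 'A500'
SELECT soundex('Anne') = soundex('Ann');  -- true: same code
SELECT difference('Anne', 'Andrew');      -- 0-4 scale of soundex similarity
```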
Re: [GENERAL] Warning: you don't own a lock of type ExclusiveLock
Chris (et al.), Thanks for the reply. I have not replied sooner because I was hoping to get some more feedback from the list. I have a recently-migrated Pg cluster running 8.4.7 on Red Hat Enterprise Linux Client release 5.7 (Tikanga) in a VMware VM that is logging the subject warning. The application code is considerably old (deployed back in the 7.4 days, I believe) and the message is preceded by a call to select user_write_unlock(), which is a simple wrapper query that calls pg_advisory_unlock(). What are the causes and possible problems and side-effects of receiving such a log warning? Thank you for any assistance you can provide. "Warning: you don't own a lock of type ExclusiveLock" That means that you didn't have the lock you were trying to release. Whether that indicates a critical error or not depends on application logic. The application appears to be functioning without issue, but I still have some concerns. One possible cause of the warning I considered was having the VM's underlying storage NFS-mounted on a NetApp. Is it possible that Postgres is not receiving a meaningful response with respect to ExclusiveLock locking (i.e. unable to really obtain an ExclusiveLock) due to the VM disk residing on an NFS mount? -- Gary Chambers
Re: [GENERAL] VIEW still referring to old name of field
Robins Tharakan robins.thara...@gmail.com writes: This is a case where I changed the name of a field in a table that a VIEW referred to, but the VIEW definition still points to the old name of the field. The surprise is that the VIEW still works (with live data). Specifically, you mean that you had a column referenced by a USING clause, and then you renamed it, right? We've had discussions about that in the past, and concluded that the SQL spec is just fundamentally broken here. If you rename one of the input columns, there is no way to represent a view that (used to) use USING without changing the view's behavior -- in particular, the set of columns exposed by a join with USING is different from the set of columns exposed without that, so simply replacing the USING with an ON clause wouldn't get the job done. So our view-dumping code just doesn't bother to try. You'll get something that still says USING, but of course this won't work when the view definition is dumped and reloaded. This is not the fault of the view: if you'd not used a view but just issued the equivalent join directly as a SQL query, the rename would still have broken your application. Our internal representation doesn't depend on the name-matching aspect of USING, so the view continues to work as before, so long as you don't dump and reload. But it looks wrong if you dump the definition as SQL. That's basically because SQL lacks a way to represent the situation. The best idea I've heard for fixing it is to invent a non-standard syntax that could represent a USING clause matching two dissimilarly named columns, say USING (foo = bar AS baz), and then use that syntax when dumping a view if the column names don't match. Nobody's worked out the idea in full detail, though, let alone implemented it; it's not really clear it's worth the trouble. 
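A minimal reproduction of what Tom describes (table and column names invented for illustration):

```sql
CREATE TABLE t1 (pid int, x int);
CREATE TABLE t2 (pid int, y int);
CREATE VIEW v AS SELECT * FROM t1 JOIN t2 USING (pid);

ALTER TABLE t2 RENAME COLUMN pid TO product_id;

SELECT * FROM v;   -- still works: the stored parse tree references the
                   -- column by position, not by name

-- But the deparsed definition still says USING (pid), so dumping and
-- reloading it fails with:
--   ERROR:  column "pid" specified in USING clause does not exist in right table
```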
regards, tom lane
Re: [GENERAL] Warning: you don't own a lock of type ExclusiveLock
On Fri, Feb 10, 2012 at 2:35 AM, Gary Chambers gwch...@gwcmail.com wrote: Is it possible that Postgres is not receiving a meaningful response with respect to ExclusiveLock locking (i.e. unable to really obtain an ExclusiveLock) due to VM disk residing on an NFS mount? pg_advisory_unlock (along with the other functions in that family) works on a set of mythical objects with no actual meaning beyond what the database administrator chooses to give them. You lock and unlock these ethereal things, just numbers off a set of tables, with no relationship to NFS mounts, tables, records, or anything else. In (the current iteration of) the priority-queue I wrote for work, each queue-pumping process takes an exclusive lock on a partition, where a partition is one fraction of the available ID space, using modulo arithmetic. At least, that's what I, the programmer, see; to Postgres, it just takes an exclusive lock on (42,64) or some other pair of numbers. That lock will succeed or fail only on the basis of other advisory lock calls, nothing else can affect it. Chris Angelico
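Concretely, the warning in the subject line appears when a session releases an advisory lock it never took (or has already released). A minimal sketch:

```sql
SELECT pg_advisory_lock(42, 64);     -- take the (42,64) advisory lock

SELECT pg_advisory_unlock(42, 64);   -- true: we held it, now released

SELECT pg_advisory_unlock(42, 64);   -- false, and the server logs:
-- WARNING:  you don't own a lock of type ExclusiveLock
```

This matches Gary's scenario: the application calling its unlock wrapper more times than it locked would produce exactly this message, with no storage or NFS involvement.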
[GENERAL] Strategy for Primary Key Generation When Populating Table
I have a lot of data currently in .pdf files. I can extract the relevant data to plain text and format it to create a large text file of INSERT INTO ... rows. I need a unique ID for each row and there are no columns that would make a natural key so the serial data type would be appropriate. When I prepare the text file I can start each row with the delimiter (',') to indicate there's a table column preceding. If I define the primary key as serial type on that first position in the file, will postgres automagically fill it in as each row is read into the table? If not, or if there's a better way of approaching this task, please clue me in to that. TIA, Rich
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On Thu, Feb 9, 2012 at 10:49 AM, Rich Shepard rshep...@appl-ecosys.com wrote: I have a lot of data currently in .pdf files. I can extract the relevant data to plain text and format it to create a large text file of INSERT INTO ... rows. I need a unique ID for each row and there are no columns that would make a natural key so the serial data type would be appropriate. The record should be logically unique as well as physically unique (or if it isn't, why bother making a unique constraint at all?). Sometimes you *have* to force a surrogate, for example if certain (broken) client tools need a primary key to work, but aside from that you shouldn't rely on a surrogate to generate uniqueness. merlin
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On Thu, 9 Feb 2012, Merlin Moncure wrote: The record should be logically unique as well as physically unique (of if it isn't, why bother making a unique constraint at all?). Sometimes you *have* to force a surrogate, for example if certain (broken) client tools need a primary key to work, but aside from that you shouldn't rely on a surrogate to generate uniqueness. merlin, I have reports containing macroinvertebrate collection data for several hundred (or several thousand) taxa. There is no natural key since there are multiple rows for each site/date pair. Years ago Joe Celko taught me to seek natural keys whenever they might exist. They don't here. That's why I specifically mentioned that in my message. The only 'broken client tools' are their consistent uses of Microsoft Excel to store data or providing text reports in pdf with other data. Rich
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On 2/9/2012 10:49 AM, Rich Shepard wrote: I have a lot of data currently in .pdf files. I can extract the relevant data to plain text and format it to create a large text file of INSERT INTO ... rows. I need a unique ID for each row and there are no columns that would make a natural key so the serial data type would be appropriate. When I prepare the text file I can start each row with the delimiter (',') to indicate there's a table column preceding. If I define the primary key as serial type on that first position in the file, will postgres automagically fill it in as each row is read into the table? If not, or if there's a better way of approaching this task, please clue me in to that. TIA, Rich If you create a serial column, don't put the column name or a value into your insert statement.

create table junk (id serial, stuff text);
insert into junk(stuff) values ('my stuff');

or, and I've never done this, I think you can use the default keyword:

insert into junk(id, stuff) values (default, 'my stuff');

-Andy
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On Thu, Feb 9, 2012 at 9:49 AM, Rich Shepard rshep...@appl-ecosys.com wrote: I have a lot of data currently in .pdf files. I can extract the relevant data to plain text and format it to create a large text file of INSERT INTO ... rows. I need a unique ID for each row and there are no columns that would make a natural key so the serial data type would be appropriate. When I prepare the text file I can start each row with the delimiter (',') to indicate there's a table column preceding. If I define the primary key as serial type on that first position in the file, will postgres automagically fill it in as each row is read into the table? If not, or if there's a better way of approaching this task, please clue me in to that. If you format your copy statement with a column list that leaves out the serial key the db will insert that for you.

file: /tmp/input.sql
copy test (i1) from stdin;
10
20
30
40
50
\.

create table test (id serial primary key, i1 int);
\i /tmp/input.sql
select * from test;
 id | i1
----+----
  1 | 10
  2 | 20
  3 | 30
  4 | 40
  5 | 50
(5 rows)
Re: [GENERAL] Warning: you don't own a lock of type ExclusiveLock
Chris, pg_advisory_unlock (along with the other functions in that family) works on a set of mythical objects with no actual meaning beyond what the database administrator chooses to give them. Thank you for your excellent description. I have never used the advisory lock functionality that Postgres provides, and your reply, combined with Merlin's multi-part primer on them, makes it easy to see their value, application, and implementation. Finally, with respect to my original request, it's quite conceivable that the application is attempting to free the lock more times than it requested it. Thank you again for your reply! -- Gary Chambers
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On Thu, 9 Feb 2012, Andy Colson wrote: If you create a serial column, dont put the column name or a value into your insert statement. create table junk (id serial, stuff text); insert into junk(stuff) values ('my stuff'); Andy, That's what I assumed would work but did not know for sure. Thanks, Rich
[GENERAL] initdb $PGDATA not working
Hi! I've set my PGDATA variable in my profile:

export PGDATA=/home/mydata/pgsql/data

Testing that the variable is set correctly in the environment:

$ echo $PGDATA
/home/mydata/pgsql/data

but when I execute:

$ sudo /etc/init.d/postgresql-9.1 initdb

or:

$ sudo /etc/init.d/postgresql-9.1 initdb -D /home/mydata/pgsql/data

the cluster is always created on the default path /var/lib/pgsql/9.1/data/. This is on CentOS 5.7. Thx
Re: [GENERAL] initdb $PGDATA not working
On 2/9/2012 3:25 PM, Daniel Vázquez wrote: Hi! I've set my PGDATA variable in profile export PGDATA=/home/mydata/pgsql/data Testing variable for correct set in enviroment $ echo $PGDATA /home/mydata/pgsql/data but when execute: $ sudo /etc/init.d/postgresql-9.1 initdb or execute: $ sudo /etc/init.d/postgresql-9.1 initdb -D /home/mydata/pgsql/data The cluster always is created on default path /var/lib/pgsql/9.1/data/ Will install in CentOS 5.7 Thx I'm guessing /etc/init.d/postgresql-9.1 is a script that is ignoring your arguments: initdb, $PGDATA, and -D /path/. I'd say don't use that script and go right for the original initdb. -Andy
Re: [GENERAL] initdb $PGDATA not working
On Thursday, February 09, 2012 10:25:51 PM Daniel Vázquez wrote: Hi! I've set my PGDATA variable in profile export PGDATA=/home/mydata/pgsql/data Testing variable for correct set in enviroment $ echo $PGDATA /home/mydata/pgsql/data but when execute: $ sudo /etc/init.d/postgresql-9.1 initdb or execute: $ sudo /etc/init.d/postgresql-9.1 initdb -D /home/mydata/pgsql/data The cluster always is created on default path /var/lib/pgsql/9.1/data/ Will install in CentOS 5.7 The init script doesn't take command line options except the initdb action. And it sets PGDATA itself. Perhaps su postgres -c "initdb -D /home/mydata/pgsql/data" or something similar might work better. Or modifying the init script before using it, which would have the advantage that it might also start the server later using your new data directory.
Re: [GENERAL] initdb $PGDATA not working
On 02/09/12 1:25 PM, Daniel Vázquez wrote: I've set my PGDATA variable in profile export PGDATA=/home/mydata/pgsql/data Testing variable for correct set in enviroment $ echo $PGDATA /home/mydata/pgsql/data but when execute: $ sudo /etc/init.d/postgresql-9.1 initdb or execute: $ sudo /etc/init.d/postgresql-9.1 initdb -D /home/mydata/pgsql/data The cluster always is created on default path /var/lib/pgsql/9.1/data/ Will install in CentOS 5.7 The /etc/init.d/postgresql* scripts are completely self-contained and don't use any external environment variables. You can put custom settings in /etc/sysconfig/pgsql/postgresql-9.1 (where the last part of the filename is the same as the /etc/init.d/ script name). The main things you'd set in those scripts are:

PGENGINE=/usr/pgsql-${PGMAJORVERSION}/bin
PGPORT=5432
PGDATA=/var/lib/pgsql/${PGMAJORVERSION}/data
PGLOG=/var/lib/pgsql/${PGMAJORVERSION}/pgstartup.log

However, if you're creating a cluster under your own home directory, I'm not sure you even want it to be under the control of the system scripts. -- john r pierce    N 37, W 122    santa cruz ca mid-left coast
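Putting that together for the original poster's layout, the override would go in a sysconfig file matching the init script's name rather than in a shell profile; something like (paths taken from the original post, assuming the RPM init-script conventions described above):

```shell
# /etc/sysconfig/pgsql/postgresql-9.1 -- sourced by /etc/init.d/postgresql-9.1
PGDATA=/home/mydata/pgsql/data
PGLOG=/home/mydata/pgsql/pgstartup.log
```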
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On 2/9/12 10:08 AM, Rich Shepard wrote: I have reports containing macroinvertebrate collection data for several hundred (or several thousand) of taxa. There is no natural key since there are multiple rows for each site/date pair. Years ago Joe Celko taught me to seek natural keys whenever they might exist. They don't here. That's why I specifically mentioned that in my message. Interesting. I used to think natural keys were okay, but have since decided that surrogates are the way to go. That second layer of abstraction allows for much easier data modifications when needed. What would be an example of a natural key that would be good to use, and why would it be preferable? I'd think the key value must never change, and even, say, kingdom values in a taxa table could possibly change... might discover something new and do a little reordering. :) Also, natural keys might be strings, which I'm thinking would not be as efficient as integers for an index. -ds
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On 2/9/2012 4:10 PM, David Salisbury wrote: On 2/9/12 10:08 AM, Rich Shepard wrote: I have reports containing macroinvertebrate collection data for several hundred (or several thousand) of taxa. There is no natural key since there are multiple rows for each site/date pair. Years ago Joe Celko taught me to seek natural keys whenever they might exist. They don't here. That's why I specifically mentioned that in my message. Interesting. I used to think natural keys were okay, but have since decided that surrogates are the way to go. That second layer of abstraction allows for much easier data modifications when needed. What would be an example of a natural key that would be good to use, and why would it be preferable?? I'd think the key value must never change, and even say kingdom values in a taxa table could possibly change.. might discover something new and do a little reordering. :) Also natural keys might be strings, which I'm thinking would not be as efficient as integers for an index. -ds Yeah, this is a Vim vs Emacs war. (Vim, :-) ) I prefer surrogates, like you. It's way too easy to pick something that one day has to change. Within the last year I remember a long thread about this same thing. -Andy
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On Thu, Feb 9, 2012 at 4:20 PM, Andy Colson a...@squeakycode.net wrote: On 2/9/2012 4:10 PM, David Salisbury wrote: On 2/9/12 10:08 AM, Rich Shepard wrote: I have reports containing macroinvertebrate collection data for several hundred (or several thousand) of taxa. There is no natural key since there are multiple rows for each site/date pair. Years ago Joe Celko taught me to seek natural keys whenever they might exist. They don't here. That's why I specifically mentioned that in my message. Interesting. I used to think natural keys were okay, but have since decided that surrogates are the way to go. That second layer of abstraction allows for much easier data modifications when needed. What would be an example of a natural key that would be good to use, and why would it be preferable?? I'd think the key value must never change, and even say kingdom values in a taxa table could possibly change.. might discover something new and do a little reordering. :) Also natural keys might be strings, which I'm thinking would not be as efficient as integers for an index. -ds Yeah, this is a Vim vs Emacs war. (Vim, :-) ) I prefer surrogates like you. Its way to easy to pick something that one day has to change. Within the last year I remember a long thread about this same thing. Sure, you can use surrogates, but you should still define or at least be aware of a natural key if there is one. If you can't (which happens on various type of data), then the surrogate is giving the illusion of row uniqueness when there isn't one. This is really a design error: other keys could depend on this table's primary key which is a provably ambiguous relationship. Since your rows are not informationally distinct from each other, why do you need to be able to point at a specific one? natural/surrogate is a performance/usability debate with various tradeoffs. 
but using surrogate to 'create' uniqueness is a logical design error; maybe a very forgivable one for various reasons, but the point stands. merlin
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On 2/9/2012 4:20 PM, Andy Colson wrote: On 2/9/2012 4:10 PM, David Salisbury wrote: On 2/9/12 10:08 AM, Rich Shepard wrote: I have reports containing macroinvertebrate collection data for several hundred (or several thousand) of taxa. There is no natural key since there are multiple rows for each site/date pair. Years ago Joe Celko taught me to seek natural keys whenever they might exist. They don't here. That's why I specifically mentioned that in my message. Interesting. I used to think natural keys were okay, but have since decided that surrogates are the way to go. That second layer of abstraction allows for much easier data modifications when needed. What would be an example of a natural key that would be good to use, and why would it be preferable?? I'd think the key value must never change, and even say kingdom values in a taxa table could possibly change.. might discover something new and do a little reordering. :) Also natural keys might be strings, which I'm thinking would not be as efficient as integers for an index. -ds Yeah, this is a Vim vs Emacs war. (Vim, :-) ) I prefer surrogates like you. Its way to easy to pick something that one day has to change. Within the last year I remember a long thread about this same thing. -Andy Ah, here it is: http://archives.postgresql.org/pgsql-general/2011-04/msg00996.php -Andy
Re: [GENERAL] initdb $PGDATA not working
THX all!! My fault: I set the $PGDATA environment variable in /etc/sysconfig/pgsql/postgresql, as was usual in 8.4; renaming the file to postgresql-9.1 solves the issue. Thanks guys! On 9 February 2012 22:25, Daniel Vázquez daniel2d2...@gmail.com wrote: Hi! I've set my PGDATA variable in profile export PGDATA=/home/mydata/pgsql/data Testing variable for correct set in enviroment $ echo $PGDATA /home/mydata/pgsql/data but when execute: $ sudo /etc/init.d/postgresql-9.1 initdb or execute: $ sudo /etc/init.d/postgresql-9.1 initdb -D /home/mydata/pgsql/data The cluster always is created on default path /var/lib/pgsql/9.1/data/ Will install in CentOS 5.7 Thx
Re: [GENERAL] VIEW still referring to old name of field
Thanks Tom. I get it. Putting it another way, if there were a function and a VIEW and this field name were changed, then we'd have a broken function anyway. The only issue is that (before writing this mail) I expected that a VIEW would either throw up errors or would work without fail regardless of a dump / reload scenario (and yes, likewise, I expected a function not to exhibit that behaviour). So I actually did a search for this field in the functions, and relied on PG to throw up errors for a VIEW. Probably I got stumped there. But that aside, the only question left here is: if a token is not exposed by a VIEW, would not an automatic search / replace have done the job? Theoretically speaking, a known case where a VIEW works but its stored definition does not is flawed (frankly worrying, now that I think of it). Thanks nonetheless. Guess I need some daily dump/reload scripts for all projects right away. -- Robins Tharakan On Thu, Feb 9, 2012 at 9:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robins Tharakan robins.thara...@gmail.com writes: This is a case where I changed the name of a field in a table that a VIEW referred to, but the VIEW definition still points to the old name of the field. The surprise is that the VIEW still works (with live data). Specifically, you mean that you had a column referenced by a USING clause, and then you renamed it, right? We've had discussions about that in the past, and concluded that the SQL spec is just fundamentally broken here. If you rename one of the input columns, there is no way to represent a view that (used to) use USING without changing the view's behavior -- in particular, the set of columns exposed by a join with USING is different from the set of columns exposed without that, so simply replacing the USING with an ON clause wouldn't get the job done. So our view-dumping code just doesn't bother to try.
You'll get something that still says USING, but of course this won't work when the view definition is dumped and reloaded.

This is not the fault of the view: if you'd not used a view but just issued the equivalent join directly as a SQL query, the rename would still have broken your application. Our internal representation doesn't depend on the name-matching aspect of USING, so the view continues to work as before, so long as you don't dump and reload. But it looks wrong if you dump the definition as SQL. That's basically because SQL lacks a way to represent the situation.

The best idea I've heard for fixing it is to invent a non-standard syntax that could represent a USING clause matching two dissimilarly named columns, say USING (foo = bar AS baz), and then use that syntax when dumping a view if the column names don't match. Nobody's worked out the idea in full detail, though, let alone implemented it; it's not really clear it's worth the trouble.

regards, tom lane
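A minimal reproduction of the scenario Tom describes (table and column names here are invented for illustration):

```sql
CREATE TABLE a (k int, v1 text);
CREATE TABLE b (k int, v2 text);

-- USING (k) exposes a single merged "k" column instead of a.k and b.k.
CREATE VIEW ab AS
    SELECT k, v1, v2
    FROM a JOIN b USING (k);

ALTER TABLE a RENAME COLUMN k TO a_k;

-- The view keeps working: the internal representation does not match
-- columns by name.  But the dumped definition still says "USING (k)",
-- which will fail on reload because table a no longer has a column k.
```

Rewriting the dumped view as `JOIN b ON a.a_k = b.k` would restore the join condition but not the merged-column behavior, which is exactly why Tom says replacing USING with ON doesn't get the job done.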
Re: [GENERAL] initdb $PGDATA not working
On 02/09/12 2:38 PM, Daniel Vázquez wrote: YES, /etc/sysconfig/pgsql/postgresql-9.1. I was using /etc/sysconfig/pgsql/postgresql like in 8.4 ... my fault ... it's easy to misunderstand, and there's no doc about it.

Whatever the name of the /etc/init.d/postgres* script is, it uses that same name in the /etc/sysconfig/pgsql directory. So if you want to run two copies of 9.1, cp /etc/init.d/postgresql-9.1 /etc/init.d/postgresql-9.1B, then create /etc/sysconfig/pgsql/postgresql-9.1B and put your port and data directory in there, and poof, like magic!

-- john r pierce, N 37, W 122, santa cruz ca, mid-left coast

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
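The two-instance recipe above can be sketched as a shell sequence. This version runs against a throwaway directory so it can be tried unprivileged; on a real system you would operate on /etc directly (the dataB path and port 5433 are illustrative choices, not from the thread):

```shell
# ROOT stands in for "/" so this sketch can run without root privileges.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc/init.d" "$ROOT/etc/sysconfig/pgsql"
printf '#!/bin/sh\n' > "$ROOT/etc/init.d/postgresql-9.1"

# 1. Copy the init script under a new name.
cp "$ROOT/etc/init.d/postgresql-9.1" "$ROOT/etc/init.d/postgresql-9.1B"

# 2. Create a sysconfig file with the SAME name as the new script:
#    the init script looks up /etc/sysconfig/pgsql/<its own name>.
cat > "$ROOT/etc/sysconfig/pgsql/postgresql-9.1B" <<'EOF'
PGDATA=/home/mydata/pgsql/dataB
PGPORT=5433
EOF
```

The name-matching in step 2 is the whole trick: a mismatched file name is what bit the original poster in the first place.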
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On Thu, 9 Feb 2012, David Salisbury wrote: Interesting. I used to think natural keys were okay, but have since decided that surrogates are the way to go. That second layer of abstraction allows for much easier data modifications when needed. What would be an example of a natural key that would be good to use, and why would it be preferable?

For water quality data the primary key is (site, date, param), since there's only one value for a given parameter collected at a specific site on a single day. No surrogate key needed.

I'd think the key value must never change, and even, say, kingdom values in a taxa table could possibly change... might discover something new and do a little reordering. :) Also natural keys might be strings, which I'm thinking would not be as efficient as integers for an index.

The problem with real-world data is that different taxonomic levels are used. Not all organisms can be identified to species; some (such as the round worms, or nematodes) are identified only to the level of order. That means there is no combination of columns that is consistently not NULL. Sigh.

Rich
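Rich's water-quality key translates directly into a composite primary key. A sketch, with column types assumed for illustration:

```sql
CREATE TABLE water_quality (
    site  text    NOT NULL,
    date  date    NOT NULL,
    param text    NOT NULL,
    value numeric,
    -- One value per parameter per site per day, exactly as described:
    PRIMARY KEY (site, date, param)
);
```

Because the key columns carry real meaning, a duplicate measurement is rejected at insert time rather than silently acquiring a fresh surrogate id.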
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On Thu, 9 Feb 2012, Merlin Moncure wrote: If you can't (which happens with various types of data), then the surrogate is giving the illusion of row uniqueness when there isn't one.

Ah, but each row is unique. However, there is no consistent set of non-NULL values that can define a unique key for each row.

Rich
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On 2/9/12 5:25 PM, Rich Shepard wrote: For water quality data the primary key is (site, date, param) since there's only one value for a given parameter collected at a specific site on a single day. No surrogate key needed.

Yea. I was wondering if the surrogate key debate really boils down to the composite primary key debate. Seems so in my mind, though one could maybe come up with a combination: basically alias the values and composite those. Perhaps that's the ultimate methodology. :)

The problem with real world data is that different taxonomic levels are used. Not all organisms can be identified to species; some (such as the round worms, or nematodes) are at the level of order. That means there is no combination of columns that are consistently not NULL. Sigh.

I didn't know that about worms. I did know grasses only went to the genus. You could make a tall skinny self-referential table though, and nothing would be null and everything would be unique (I think, unless certain taxon values can appear under different higher-order taxon values).

Thanks for the viewpoints out there. Cheers, -ds
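The "tall skinny self-referential table" David suggests is usually modeled as an adjacency list. A sketch (table and column names are illustrative, not from the thread):

```sql
CREATE TABLE taxon (
    taxon_id  serial PRIMARY KEY,              -- surrogate key for joins
    rank      text    NOT NULL,                -- 'kingdom', 'order', 'genus', ...
    name      text    NOT NULL,
    parent_id integer REFERENCES taxon (taxon_id),
    -- The same name may legitimately recur under different parents,
    -- which addresses David's parenthetical worry:
    UNIQUE (name, parent_id)
);
```

An organism identified only to order simply references an order-level row; no row needs NULL taxonomy columns, which sidesteps the no-consistent-non-NULL-key problem Rich describes.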
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On Thursday, February 09, 2012 5:18:19 pm David Salisbury wrote: On 2/9/12 5:25 PM, Rich Shepard wrote: For water quality data the primary key is (site, date, param) since there's only one value for a given parameter collected at a specific site on a single day. No surrogate key needed.

Yea. I was wondering if the surrogate key debate really boils down to the composite primary key debate. Seems so in my mind, though one could maybe come up with a combination: basically alias the values and composite those. Perhaps that's the ultimate methodology. :)

The problem with real world data is that different taxonomic levels are used. Not all organisms can be identified to species; some (such as the round worms, or nematodes) are at the level of order. That means there is no combination of columns that are consistently not NULL. Sigh.

I didn't know that about worms. I did know grasses only went to the genus. You could make a tall skinny self-referential table though, and nothing would be null and everything would be unique (I think, unless certain taxon values can appear under different higher-order taxon values).

OT: Alright, now I have to ask. When you say grasses (or for that matter round worms) cannot be identified to species, are you talking about the data you are receiving, or in general? Because as far as I know there are many species identified for both. They are difficult to ID, but species do exist.

Thanks for the viewpoints out there. Cheers, -ds

-- Adrian Klaver adrian.kla...@gmail.com
Re: [GENERAL] Strategy for Primary Key Generation When Populating Table
On Thu, Feb 9, 2012 at 2:10 PM, David Salisbury salisb...@globe.gov wrote: On 2/9/12 10:08 AM, Rich Shepard wrote: I have reports containing macroinvertebrate collection data for several hundred (or several thousand) taxa. There is no natural key since there are multiple rows for each site/date pair. Years ago Joe Celko taught me to seek natural keys whenever they might exist. They don't here. That's why I specifically mentioned that in my message.

Interesting. I used to think natural keys were okay, but have since decided that surrogates are the way to go. That second layer of abstraction allows for much easier data modifications when needed. What would be an example of a natural key that would be good to use, and why would it be preferable? I'd think the key value must never change, and even, say, kingdom values in a taxa table could possibly change... might discover something new and do a little reordering. :) Also natural keys might be strings, which I'm thinking would not be as efficient as integers for an index.

Well, here is the approach we have taken with LedgerSMB: every table has a defined primary key, and where possible this is the natural key. There are cases where there is no natural key, however, and there we use a surrogate key. But every table also has at least one single-column key, whether that is the natural primary key or a surrogate one, and all joins are done on surrogate keys.

This has a few specific advantages as the db schema changes: if the criteria for the natural key must change because of evolving requirements, the join conditions need not change. Moreover, joins don't require intimate knowledge of natural keys between tables, making them simpler, more predictable, and easier to read. So I don't think this is an either/or proposition; I think there is a great deal of benefit to using both.

Best Wishes, Chris Travers
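One way to realize Chris's both-keys approach is a surrogate single-column key for joins plus the natural key enforced as a separate UNIQUE constraint. A sketch (table names invented; this is an interpretation of the scheme described, not LedgerSMB's actual schema):

```sql
CREATE TABLE site (
    site_id serial PRIMARY KEY,     -- surrogate: what other tables join on
    code    text NOT NULL,
    region  text NOT NULL,
    UNIQUE (code, region)           -- natural key, still enforced
);

CREATE TABLE sample (
    sample_id serial PRIMARY KEY,
    site_id   integer NOT NULL REFERENCES site (site_id),
    taken_on  date    NOT NULL
);

-- If the natural-key criteria later change (say "region" is dropped from
-- the key), only the UNIQUE constraint changes; every join on site_id,
-- and every foreign key referencing it, is unaffected.
```

This is the concrete sense in which it is not either/or: the natural key keeps guarding row uniqueness while the surrogate keeps the joins stable.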