[GENERAL] What is this vacuum doing?

2012-02-09 Thread Joseph Shraibman
I have a vacuum process that is sitting around and apparently not doing 
anything.  It's been running for over 2000 seconds and is using no CPU.  
It isn't waiting on a lock.  The backtrace is this:



#0  0x00367aed4ff7 in semop () from /lib64/libc.so.6
#1  0x005d2a83 in PGSemaphoreLock (sema=0x2ace8a5721f0, 
interruptOK=1 '\001') at pg_sema.c:418
#2  0x005fa275 in LockBufferForCleanup (buffer=138743) at 
bufmgr.c:2471
#3  0x00476ff0 in btvacuumscan (info=0x7fff3b3dcfc0, 
stats=0x1eb0a900, callback=0x551250 <lazy_tid_reaped>, 
callback_state=0x1eae2db0, cycleid=45547) at nbtree.c:856
#4  0x004774c5 in btbulkdelete (fcinfo=<value optimized out>) at 
nbtree.c:574
#5  0x006d100a in FunctionCall4Coll (flinfo=0xfffc, 
collation=993900848, arg1=1, arg2=18446744073709551615, arg3=0, 
arg4=140734187288784) at fmgr.c:1371
#6  0x005511c4 in lazy_vacuum_index (indrel=0x2ace8b697ab8, 
stats=0x1eb09cc8, vacrelstats=0x1eae2db0) at vacuumlazy.c:1020
#7  0x00551a77 in lazy_vacuum_rel (onerel=0x2ace8b696470, 
vacstmt=<value optimized out>, bstrategy=<value optimized out>) at 
vacuumlazy.c:843
#8  0x005507e0 in vacuum_rel (relid=3471206537, 
vacstmt=0x7fff3b3de740, do_toast=<value optimized out>, for_wraparound=0 
'\0') at vacuum.c:1042
#9  0x00550b0b in vacuum (vacstmt=0x7fff3b3de740, relid=<value 
optimized out>, do_toast=0 '\0', bstrategy=<value optimized out>, 
for_wraparound=0 '\0', isTopLevel=<value optimized out>) at vacuum.c:227
#10 0x005d5d57 in do_autovacuum () at autovacuum.c:2739
#11 0x005d6393 in AutoVacWorkerMain (argc=<value optimized out>, 
argv=<value optimized out>) at autovacuum.c:1641
#12 0x005d6486 in StartAutoVacWorker () at autovacuum.c:1428
#13 0x005e0d22 in sigusr1_handler (postgres_signal_arg=<value 
optimized out>) at postmaster.c:4588
#14 <signal handler called>
#15 0x00367aecce83 in __select_nocancel () from /lib64/libc.so.6
#16 0x005dd7dc in ServerLoop () at postmaster.c:1415
#17 0x005dec97 in PostmasterMain (argc=2, argv=0x1ea0bd90) at 
postmaster.c:1116
#18 0x0058352e in main (argc=2, argv=<value optimized out>) at 
main.c:199


So what is this doing?  This is a problem because other threads are 
blocking on this one, and as soon as I kill -INT it, it respawns, so I 
can't get rid of it.
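Frame #2, LockBufferForCleanup(), waits for every other backend to release its pin on a shared buffer; a buffer pin is not a heavyweight lock, so the wait does not show up in pg_locks. A long-idle cursor or idle-in-transaction session is a likely culprit, and a query along these lines (9.1-era column names; the threshold is illustrative) can help spot it:

```sql
-- Sessions idle in a transaction, or with very old transactions,
-- may still hold the buffer pin the vacuum is waiting for.
SELECT procpid, usename, xact_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
   OR xact_start < now() - interval '30 minutes'
ORDER BY xact_start;
```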


pg version is:

PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 
4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] VIEW still referring to old name of field

2012-02-09 Thread Robins Tharakan
Hi,

This is a case where I changed the name of a field in a table that a VIEW
referred to, but the VIEW definition still points to the old name of the
field. The surprise is that the VIEW still works (with live data).

Excerpt from psql (v9.1.2) given below.

The problem came up when I took a schema dump and restored it on another
server, which crashed on this VIEW since the field referred to by the VIEW
isn't there any more. The same error is shown if (on this server) I copy
the VIEW definition and execute it directly (which is expected, since the
field name is incorrect in the definition).

I think an easy way to work around this is to drop/create the VIEW, but
I think this is a bug and it's better if it is found / resolved, if
possible. Let me know if I can provide any more input on this. I have this
server for another day or two, so do let me know if running a command on
this server may help.

Further (I am unsure here), I believe the field name was changed ~1-2
weeks back and the server was restarted just the day before. Is it possible
that this survives a restart as well?

Thanks
--
Robins Tharakan
==


[pgsql@server /webstats/pgsql]$ psql
psql (9.1.2)
Type "help" for help.

pgsql=# \c aproject
You are now connected to database "aproject" as user "pgsql".

aproject=# \d ui.thisview
                  View "ui.thisview"
          Column          |          Type          | Modifiers
--------------------------+------------------------+-----------
 pid                      | integer                |
 product_name             | character varying(100) |
 product_usage            | bigint                 |
 product_usage_percentage | numeric(10,2)          |
aproject=# select * from pg_views where viewname = 'thisview';
 schemaname | viewname | viewowner | definition
------------+----------+-----------+------------
 ui         | thisview | pgsql     | WITH x AS (SELECT session.pid,
count(*) AS product_usage FROM ((s.history JOIN s.session USING
(session_id)) JOIN s.product USING (pid)) WHERE (((history.datetime_entry
>= (now() - '90 days'::interval)) AND (history.datetime_entry <= now()))
AND (session.pid IS NOT NULL)) GROUP BY session.pid) SELECT x.pid,
product.product_name, x.product_usage, ((((x.product_usage * 100))::numeric
/ (SELECT sum(x.product_usage) AS count FROM x)))::numeric(10,2) AS
product_usage_percentage FROM (x JOIN s.product USING (pid));
(1 row)

aproject=# select * from ui.thisview limit 1;
 pid |  product_name   | product_usage | product_usage_percentage
-----+-----------------+---------------+--------------------------
  14 | Unknown Product |           700 |                     2.02
(1 row)

aproject=# WITH x AS (SELECT session.pid, count(*) AS product_usage FROM
((s.history JOIN s.session USING (session_id)) JOIN s.product USING (pid))
WHERE (((history.datetime_entry >= (now() - '90 days'::interval)) AND
(history.datetime_entry <= now())) AND (session.pid IS NOT NULL)) GROUP BY
session.pid) SELECT x.pid, product.product_name, x.product_usage,
((((x.product_usage * 100))::numeric / (SELECT sum(x.product_usage) AS
count FROM x)))::numeric(10,2) AS product_usage_percentage FROM (x JOIN
s.product USING (pid));
ERROR:  column "pid" specified in USING clause does not exist in right table

aproject=# \d s.product
                                Table "s.product"
    Column    |          Type          |                       Modifiers
--------------+------------------------+--------------------------------------------------------
 product_id   | integer                | not null default nextval('s.product_id_seq'::regclass)
 product_name | character varying(100) | not null
Indexes:
    "product_pkey" PRIMARY KEY, btree (product_id)
Referenced by:
    TABLE "s.session" CONSTRAINT "fk_product_id" FOREIGN KEY (pid)
REFERENCES s.product(product_id)


Re: [GENERAL] phantom long-running query (check_postgres.pl reports a long-running query, but the database logs don't show it)

2012-02-09 Thread Achilleas Mantzios
On Thu 09 Feb 2012 02:32:37 Aleksey Tsalolikhin wrote:
 I got an alert from check_postgres.pl today on a long-running query on
 our production database, but our PostgreSQL 8.4.9 server log, which is
 configured to log queries over 5 seconds long
 (log_min_duration_statement = 5000) does not show the query.
 
 check_postgres.pl showed:
 Date/Time: Wed Feb 8 08:41:36 PST 2012
 POSTGRES_QUERY_TIME WARNING: (host:xxx) longest query: 264s
 (database:xxx PID:xxx port:xxx address:xxx username:xxx)
 
 postgres log showed other long-running queries, but nothing over 48s.
 
 It's really interesting why the database server log does not show this
 query.
 
 Can you think of some scenario where check_postgres.pl would see a
 long-running query but it would not get logged by the database server?
 
 I checked the server log and there is nothing there for 08:41 at all.
 08:40 and 08:42, yes, some queries over 5 secs, but nothing even close
 to 264 secs.
 
 I've added the verbose switch to my check_postgres.pl script so we'll
 have a record of what was the long-running query.
 

You should either look at the source of this perl script to see what it is 
doing, or enable log_statement = 'all' in postgresql.conf, send the 
postmaster the -HUP signal, and then watch the postgresql log for the queries 
that the perl script is issuing. Then replay those queries by hand and 
examine them.
Also, you can always check the pg_stat_activity view.
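To see current long-running statements directly (rather than waiting for the log), something along these lines against the 8.4 catalogs should work; the exact query check_postgres runs may differ:

```sql
-- Longest-running statements, roughly as check_postgres's query_time
-- check measures them (8.4-era column names: procpid, current_query).
SELECT procpid, usename, now() - query_start AS runtime, current_query
FROM pg_stat_activity
WHERE current_query NOT LIKE '<IDLE>%'
ORDER BY runtime DESC
LIMIT 5;
```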

 Best,
 -at

-- 
Achilleas Mantzios
IT DEPT



Re: [GENERAL] SOUNDEX call

2012-02-09 Thread Marti Raudsepp
On Wed, Feb 8, 2012 at 15:37, 84.le0n 84.l...@gmail.com wrote:
 This is an interesting solution, but I only know a little PL/pgSQL and I
 don't know how to provide a SOUNDEX version in PL/pgSQL; I don't know the
 SOUNDEX algorithm either.
 How can I implement soundex in PL/pgSQL?

I wrote and posted a PL/pgSQL version of soundex() on the PostgreSQL wiki:
https://wiki.postgresql.org/wiki/Soundex

You are free to use it under the terms of any version of the GPL
license, MIT license or PostgreSQL license at your choice. If you end
up using it, getting some credit in release notes would be
appreciated. :)
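For anyone who, like the original poster, doesn't know the algorithm itself: the classic American Soundex is short enough to sketch. This is a rough Python illustration of the rules (keep the first letter, map consonants to six digit classes, collapse adjacent equal codes, treat 'h'/'w' as transparent, pad to four characters); it is not the PL/pgSQL code on the wiki:

```python
def soundex(name):
    """Classic American Soundex: first letter plus three digits."""
    digits = {}
    for group, code in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                        ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            digits[ch] = code

    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = digits.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":             # h and w are transparent: they do
            continue               # not reset the previous code
        code = digits.get(ch, "")
        if code and code != prev:  # collapse runs of the same code
            result += code
        prev = code                # vowels (code "") reset the run
    return (result + "000")[:4]
```

For example, soundex("Robert") and soundex("Rupert") both map to "R163", which is exactly why the algorithm is useful for matching misspelled names.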

Regards,
Marti



Re: [GENERAL] Warning: you don't own a lock of type ExclusiveLock

2012-02-09 Thread Gary Chambers

Chris (et al.),

Thanks for the reply.  I have not replied sooner because I was hoping to get
some more feedback from the list.


I have a recently-migrated Pg cluster running 8.4.7 on Red Hat Enterprise
Linux Client release 5.7 (Tikanga) in a VMware VM that is logging the
subject warning.  The application code is considerably old (deployed back
in the 7.4 days, I believe) and the message is preceded by a call to
select user_write_unlock(), which is a simple wrapper query that calls
pg_advisory_unlock().  What are the possible causes of this warning, and
what problems or side-effects could it indicate?  Thank you for any
assistance you can provide.


Warning: you don't own a lock of type ExclusiveLock


That means that you didn't have the lock you were trying to release.
Whether that indicates a critical error or not depends on application
logic.


The application appears to be functioning without issue, but I still have
some concerns.  One possible cause of the warning I considered was that the
VM's underlying storage is NFS-mounted on a NetApp.  Is it possible that
Postgres is not receiving a meaningful response with respect to
ExclusiveLock locking (i.e. unable to really obtain an ExclusiveLock)
because the VM disk resides on an NFS mount?

-- Gary Chambers



Re: [GENERAL] VIEW still referring to old name of field

2012-02-09 Thread Tom Lane
Robins Tharakan robins.thara...@gmail.com writes:
 This is a case where I changed the name of a field in a table that a VIEW
 referred to, but the VIEW definition still points to the old name of the
 field. The surprise is that the VIEW still works (with live data).

Specifically, you mean that you had a column referenced by a USING
clause, and then you renamed it, right?  We've had discussions about
that in the past, and concluded that the SQL spec is just fundamentally
broken here.  If you rename one of the input columns, there is no way
to represent a view that (used to) use USING without changing the view's
behavior -- in particular, the set of columns exposed by a join with
USING is different from the set of columns exposed without that, so
simply replacing the USING with an ON clause wouldn't get the job done.
So our view-dumping code just doesn't bother to try.  You'll get
something that still says USING, but of course this won't work when the
view definition is dumped and reloaded.  This is not the fault of the
view: if you'd not used a view but just issued the equivalent join
directly as a SQL query, the rename would still have broken your
application.

Our internal representation doesn't depend on the name-matching aspect
of USING, so the view continues to work as before, so long as you don't
dump and reload.  But it looks wrong if you dump the definition as SQL.
That's basically because SQL lacks a way to represent the situation.

The best idea I've heard for fixing it is to invent a non-standard
syntax that could represent a USING clause matching two dissimilarly
named columns, say USING (foo = bar AS baz), and then use that syntax
when dumping a view if the column names don't match.  Nobody's worked
out the idea in full detail, though, let alone implemented it; it's not
really clear it's worth the trouble.
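A minimal sketch of the scenario Tom describes (table and column names invented for illustration):

```sql
CREATE TABLE a (k integer, v1 text);
CREATE TABLE b (k integer, v2 text);
CREATE VIEW j AS
    SELECT k, v1, v2 FROM a JOIN b USING (k);

ALTER TABLE b RENAME COLUMN k TO b_k;

-- The view keeps working: internally it references columns by number,
-- not by name.  But its dumped definition still says USING (k), which
-- no longer parses, so restoring the dump fails on this view.
SELECT pg_get_viewdef('j'::regclass);
```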

regards, tom lane



Re: [GENERAL] Warning: you don't own a lock of type ExclusiveLock

2012-02-09 Thread Chris Angelico
On Fri, Feb 10, 2012 at 2:35 AM, Gary Chambers gwch...@gwcmail.com wrote:
 Is it possible that
 Postgres is not receiving a meaningful response with respect to
 ExclusiveLock locking (i.e. unable to really obtain an ExclusiveLock) due to
 VM disk residing on an NFS mount?

pg_advisory_unlock (along with the other functions in that family)
works on a set of mythical objects with no actual meaning beyond what
the database administrator chooses to give them. You lock and unlock
these ethereal things, just numbers off a set of tables, with no
relationship to NFS mounts, tables, records, or anything else. In (the
current iteration of) the priority-queue I wrote for work, each
queue-pumping process takes an exclusive lock on a partition, where
a partition is one fraction of the available ID space, using modulo
arithmetic. At least, that's what I, the programmer, see; to Postgres,
it just takes an exclusive lock on (42,64) or some other pair of
numbers. That lock will succeed or fail only on the basis of other
advisory lock calls; nothing else can affect it.
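Chris's description can be made concrete with a minimal psql session (the (42, 64) key pair is as arbitrary as in his example); the third call also reproduces the warning from the subject line:

```sql
-- Advisory locks operate on caller-chosen numbers, nothing more.
SELECT pg_advisory_lock(42, 64);     -- blocks until this key pair is free
SELECT pg_advisory_unlock(42, 64);   -- returns true: we held the lock
SELECT pg_advisory_unlock(42, 64);   -- returns false and emits
                                     -- WARNING: you don't own a lock of
                                     -- type ExclusiveLock
```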

Chris Angelico



[GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Rich Shepard

  I have a lot of data currently in .pdf files. I can extract the relevant
data to plain text and format it to create a large text file of INSERT INTO
... rows. I need a unique ID for each row and there are no columns that
would make a natural key so the serial data type would be appropriate.

  When I prepare the text file I can start each row with the delimiter (',')
to indicate there's a table column preceding. If I define the primary key
as serial type on that first position in the file, will postgres
automagically fill it in as each row is read into the table?

  If not, or if there's a better way of approaching this task, please clue
me in to that.

TIA,

Rich





Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Merlin Moncure
On Thu, Feb 9, 2012 at 10:49 AM, Rich Shepard rshep...@appl-ecosys.com wrote:
  I have a lot of data currently in .pdf files. I can extract the relevant
 data to plain text and format it to create a large text file of INSERT INTO
 ... rows. I need a unique ID for each row and there are no columns that
 would make a natural key so the serial data type would be appropriate.

The record should be logically unique as well as physically unique (or
if it isn't, why bother making a unique constraint at all?).
Sometimes you *have* to force a surrogate, for example if certain
(broken) client tools need a primary key to work, but aside from that
you shouldn't rely on a surrogate to generate uniqueness.

merlin



Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Rich Shepard

On Thu, 9 Feb 2012, Merlin Moncure wrote:


The record should be logically unique as well as physically unique (or if
it isn't, why bother making a unique constraint at all?). Sometimes you
*have* to force a surrogate, for example if certain (broken) client tools
need a primary key to work, but aside from that you shouldn't rely on a
surrogate to generate uniqueness.


merlin,

  I have reports containing macroinvertebrate collection data for several
hundred (or several thousand) taxa. There is no natural key, since there
are multiple rows for each site/date pair. Years ago Joe Celko taught me to
seek natural keys whenever they might exist. They don't here. That's why I
specifically mentioned that in my message.

  The only 'broken client tools' are their consistent use of Microsoft
Excel to store data and their delivery of text reports as PDFs mixed with
other data.

Rich





Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Andy Colson

On 2/9/2012 10:49 AM, Rich Shepard wrote:

I have a lot of data currently in .pdf files. I can extract the relevant
data to plain text and format it to create a large text file of INSERT
INTO
... rows. I need a unique ID for each row and there are no columns that
would make a natural key so the serial data type would be appropriate.

When I prepare the text file I can start each row with the delimiter (',')
to indicate there's a table column preceding. If I define the primary key
as serial type on that first position in the file, will postgres
automagically fill it in as each row is read into the table?

If not, or if there's a better way of approaching this task, please clue
me in to that.

TIA,

Rich





If you create a serial column, don't put the column name or a value into 
your insert statement.


create table junk (id serial, stuff text);
insert into junk(stuff) values ('my stuff');

Or, though I've never done this, I think you can use the DEFAULT keyword:

insert into junk(id, stuff) values (default, 'my stuff');


-Andy



Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Scott Marlowe
On Thu, Feb 9, 2012 at 9:49 AM, Rich Shepard rshep...@appl-ecosys.com wrote:
  I have a lot of data currently in .pdf files. I can extract the relevant
 data to plain text and format it to create a large text file of INSERT INTO
 ... rows. I need a unique ID for each row and there are no columns that
 would make a natural key so the serial data type would be appropriate.

  When I prepare the text file I can start each row with the delimiter (',')
 to indicate there's a table column preceding. If I define the primary key
 as serial type on that first position in the file, will postgres
 automagically fill it in as each row is read into the table?

  If not, or if there's a better way of approaching this task, please clue
 me in to that.

If you format your copy statement with a column list that leaves out
the serial key, the db will fill it in for you.

file: /tmp/input.sql
copy test (i1) from stdin;
10
20
30
40
50
\.

create table test (id serial primary key, i1 int);
\i /tmp/input.sql
select * from test;
 id | i1
----+----
  1 | 10
  2 | 20
  3 | 30
  4 | 40
  5 | 50
(5 rows)



Re: [GENERAL] Warning: you don't own a lock of type ExclusiveLock

2012-02-09 Thread Gary Chambers

Chris,


pg_advisory_unlock (along with the other functions in that family) works
on a set of mythical objects with no actual meaning beyond what the
database administrator chooses to give them.


Thank you for your excellent description.  I have never used the advisory
lock functionality that Postgres provides, but your reply, combined with
Merlin's multi-part primer on them, makes it easy to see their value,
application, and implementation.

Finally, with respect to my original request, it's quite conceivable that
the application is attempting to free the lock more times than it requested
it.  Thank you again for your reply!

-- Gary Chambers



Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Rich Shepard

On Thu, 9 Feb 2012, Andy Colson wrote:

If you create a serial column, dont put the column name or a value into your 
insert statement.


create table junk (id serial, stuff text);
insert into junk(stuff) values ('my stuff');


Andy,

  That's what I assumed would work but did not know for sure.

Thanks,

Rich




[GENERAL] initdb $PGDATA not working

2012-02-09 Thread Daniel Vázquez
Hi!

I've set my PGDATA variable in my profile:
export PGDATA=/home/mydata/pgsql/data

Testing that the variable is set correctly in the environment:
$ echo $PGDATA
/home/mydata/pgsql/data

but when I execute:
$ sudo /etc/init.d/postgresql-9.1 initdb

or:
$ sudo /etc/init.d/postgresql-9.1 initdb -D /home/mydata/pgsql/data

the cluster is always created in the default path /var/lib/pgsql/9.1/data/

Installing on CentOS 5.7.

Thx


Re: [GENERAL] initdb $PGDATA not working

2012-02-09 Thread Andy Colson

On 2/9/2012 3:25 PM, Daniel Vázquez wrote:

Hi!

I've set my PGDATA variable in profile
export PGDATA=/home/mydata/pgsql/data

Testing variable for correct set in enviroment
$ echo $PGDATA
/home/mydata/pgsql/data

but when execute:
$ sudo /etc/init.d/postgresql-9.1 initdb

or execute:
$ sudo /etc/init.d/postgresql-9.1 initdb -D /home/mydata/pgsql/data

The cluster always is created on default path /var/lib/pgsql/9.1/data/

Will install in CentOS 5.7

Thx






I'm guessing /etc/init.d/postgresql-9.1 is a script that is ignoring 
your arguments: initdb, $PGDATA, and -D /path/.

I'd say don't use that script and go right for the original initdb.

-Andy



Re: [GENERAL] initdb $PGDATA not working

2012-02-09 Thread Alan Hodgson
On Thursday, February 09, 2012 10:25:51 PM Daniel Vázquez wrote:
 Hi!
 
 I've set my PGDATA variable in profile
 export PGDATA=/home/mydata/pgsql/data
 
 Testing variable for correct set in enviroment
 $ echo $PGDATA
 /home/mydata/pgsql/data
 
 but when execute:
 $ sudo /etc/init.d/postgresql-9.1 initdb
 
 or execute:
 $ sudo /etc/init.d/postgresql-9.1 initdb -D /home/mydata/pgsql/data
 
 The cluster always is created on default path /var/lib/pgsql/9.1/data/
 
 Will install in CentOS 5.7

The init script doesn't take command line options except the initdb action, 
and it sets PGDATA itself.

Perhaps su postgres -c "initdb -D /home/mydata/pgsql/data" or something 
similar might work better. Or modify the init script before using it, which 
would have the advantage that it might also start the server later using your 
new data directory.




Re: [GENERAL] initdb $PGDATA not working

2012-02-09 Thread John R Pierce

On 02/09/12 1:25 PM, Daniel Vázquez wrote:

I've set my PGDATA variable in profile
export PGDATA=/home/mydata/pgsql/data

Testing variable for correct set in enviroment
$ echo $PGDATA
/home/mydata/pgsql/data

but when execute:
$ sudo /etc/init.d/postgresql-9.1 initdb

or execute:
$ sudo /etc/init.d/postgresql-9.1 initdb -D /home/mydata/pgsql/data

The cluster always is created on default path /var/lib/pgsql/9.1/data/

Will install in CentOS 5.7


the /etc/init.d/postgresql* scripts are completely self-contained and 
don't use any external environment variables.  you can put custom 
settings in /etc/sysconfig/pgsql/postgresql-9.1  (where the last part of 
the filename is the same as the /etc/init.d/ script name).   the main 
things you'd set in that file are...


   PGENGINE=/usr/pgsql-${PGMAJORVERSION}/bin
   PGPORT=5432
   PGDATA=/var/lib/pgsql/${PGMAJORVERSION}/data
   PGLOG=/var/lib/pgsql/${PGMAJORVERSION}/pgstartup.log


However, if you're creating a cluster under your own home directory, I'm 
not sure you even want it to be under the control of the system scripts.
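For example, an override file named to match the init script (path per John's description; the values are illustrative) could look like:

```shell
# /etc/sysconfig/pgsql/postgresql-9.1 -- sourced by the matching init script
PGDATA=/home/mydata/pgsql/data
PGLOG=/home/mydata/pgsql/pgstartup.log
```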


--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast





Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread David Salisbury



On 2/9/12 10:08 AM, Rich Shepard wrote:

   I have reports containing macroinvertebrate collection data for several
hundred (or several thousand) of taxa. There is no natural key since there
are multiple rows for each site/date pair. Years ago Joe Celko taught me to
seek natural keys whenever they might exist. They don't here. That's why I
specifically mentioned that in my message.



Interesting.  I used to think natural keys were okay, but have since decided
that surrogates are the way to go.  That second layer of abstraction allows
for much easier data modifications when needed.  What would be an example
of a natural key that would be good to use, and why would it be preferable?

I'd think the key value must never change, and even, say, kingdom values in a
taxa table could possibly change: someone might discover something new and do
a little reordering. :)  Also, natural keys might be strings, which I'm
thinking would not be as efficient as integers for an index.

-ds



Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Andy Colson

On 2/9/2012 4:10 PM, David Salisbury wrote:



On 2/9/12 10:08 AM, Rich Shepard wrote:

I have reports containing macroinvertebrate collection data for several
hundred (or several thousand) of taxa. There is no natural key since
there
are multiple rows for each site/date pair. Years ago Joe Celko taught
me to
seek natural keys whenever they might exist. They don't here. That's
why I
specifically mentioned that in my message.



Interesting. I used to think natural keys were okay, but have since decided
that surrogates are the way to go. That second layer of abstraction allows
for much easier data modifications when needed. What would be an example
of a natural key that would be good to use, and why would it be
preferable??

I'd think the key value must never change, and even say kingdom values in a
taxa table could possibly change.. might discover something new and do a
little reordering. :) Also natural keys might be strings, which I'm
thinking
would not be as efficient as integers for an index.

-ds




Yeah, this is a Vim vs Emacs war.  (Vim, :-) )

I prefer surrogates, like you.  It's way too easy to pick something that 
one day has to change.


Within the last year I remember a long thread about this same thing.

-Andy



Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Merlin Moncure
On Thu, Feb 9, 2012 at 4:20 PM, Andy Colson a...@squeakycode.net wrote:
 On 2/9/2012 4:10 PM, David Salisbury wrote:



 On 2/9/12 10:08 AM, Rich Shepard wrote:

 I have reports containing macroinvertebrate collection data for several
 hundred (or several thousand) of taxa. There is no natural key since
 there
 are multiple rows for each site/date pair. Years ago Joe Celko taught
 me to
 seek natural keys whenever they might exist. They don't here. That's
 why I
 specifically mentioned that in my message.



 Interesting. I used to think natural keys were okay, but have since
 decided
 that surrogates are the way to go. That second layer of abstraction allows
 for much easier data modifications when needed. What would be an example
 of a natural key that would be good to use, and why would it be
 preferable??

 I'd think the key value must never change, and even say kingdom values in
 a
 taxa table could possibly change.. might discover something new and do a
 little reordering. :) Also natural keys might be strings, which I'm
 thinking
 would not be as efficient as integers for an index.

 -ds



 Yeah, this is a Vim vs Emacs war.  (Vim, :-) )

 I prefer surrogates like you.  Its way to easy to pick something that one
 day has to change.

 Within the last year I remember a long thread about this same thing.

Sure, you can use surrogates, but you should still define, or at least
be aware of, a natural key if there is one. If you can't (which happens
with various types of data), then the surrogate gives the illusion of
row uniqueness when there isn't any.   This is really a design error:
other keys could depend on this table's primary key, which is a
provably ambiguous relationship.   Since your rows are not
informationally distinct from each other, why do you need to be able
to point at a specific one?

natural/surrogate is a performance/usability debate with various
tradeoffs, but using a surrogate to 'create' uniqueness is a logical
design error; maybe a very forgivable one for various reasons, but the
point stands.
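Merlin's point can be put in DDL terms: keep the surrogate for convenience, but still declare the natural key when one exists (schema invented for illustration):

```sql
CREATE TABLE site_sample (
    sample_id  serial PRIMARY KEY,        -- surrogate: convenient join target
    site_code  text    NOT NULL,
    taxon      text    NOT NULL,
    visit_date date    NOT NULL,
    UNIQUE (site_code, visit_date, taxon) -- the natural key, still enforced
);
```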

merlin



Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Andy Colson

On 2/9/2012 4:20 PM, Andy Colson wrote:

On 2/9/2012 4:10 PM, David Salisbury wrote:



On 2/9/12 10:08 AM, Rich Shepard wrote:

I have reports containing macroinvertebrate collection data for several
hundred (or several thousand) of taxa. There is no natural key since
there
are multiple rows for each site/date pair. Years ago Joe Celko taught
me to
seek natural keys whenever they might exist. They don't here. That's
why I
specifically mentioned that in my message.



Interesting. I used to think natural keys were okay, but have since
decided
that surrogates are the way to go. That second layer of abstraction
allows
for much easier data modifications when needed. What would be an example
of a natural key that would be good to use, and why would it be
preferable??

I'd think the key value must never change, and even say kingdom values
in a
taxa table could possibly change.. might discover something new and do a
little reordering. :) Also natural keys might be strings, which I'm
thinking
would not be as efficient as integers for an index.

-ds




Yeah, this is a Vim vs Emacs war. (Vim, :-) )

I prefer surrogates like you. Its way to easy to pick something that one
day has to change.

Within the last year I remember a long thread about this same thing.

-Andy




Ah, here it is:

http://archives.postgresql.org/pgsql-general/2011-04/msg00996.php

-Andy





Re: [GENERAL] initdb $PGDATA not working

2012-02-09 Thread Daniel Vázquez
THX all !!
My mistake: I had set the $PGDATA environment variable in
/etc/sysconfig/pgsql/postgresql, as was usual in 8.4.
Renaming the file to postgresql-9.1 solved the issue.

Thanks guys!

On 9 February 2012 at 22:25, Daniel Vázquez daniel2d2...@gmail.com wrote:

 Hi!

 I've set my PGDATA variable in profile
 export PGDATA=/home/mydata/pgsql/data

 Testing variable for correct set in enviroment
 $ echo $PGDATA
 /home/mydata/pgsql/data

 but when execute:
 $ sudo /etc/init.d/postgresql-9.1 initdb

 or execute:
 $ sudo /etc/init.d/postgresql-9.1 initdb -D /home/mydata/pgsql/data

 The cluster always is created on default path /var/lib/pgsql/9.1/data/

 Installing on CentOS 5.7

 Thx






Re: [GENERAL] VIEW still referring to old name of field

2012-02-09 Thread Robins Tharakan
Thanks Tom.

I get it. Put another way, if there were a function and a VIEW and this
field name were changed, then we'd have a broken function anyway.

The only issue is that (before writing this mail) I expected that a VIEW
would either throw errors or work without fail regardless of a dump /
reload scenario (and likewise, I expected a function not to exhibit that
behaviour). So I actually did a search for this field in the functions,
and relied on PG to throw errors for the VIEW. That's probably where I
got stumped.

But that aside, the only question left is: if a token is not exposed by a
VIEW, wouldn't an automatic search / replace have done the job?
Theoretically speaking, having a known case where a VIEW's definition
doesn't work while the VIEW itself does is flawed (frankly worrying, now
that I think of it).

Thanks nonetheless. Guess I need some daily dump/reload scripts for all
projects right away.
--
Robins Tharakan



On Thu, Feb 9, 2012 at 9:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robins Tharakan robins.thara...@gmail.com writes:
  This is a case where I changed the name of a field in a table that a VIEW
  referred to, but the VIEW definition still points to the old name of the
  field. The surprise is that the VIEW still works (with live data).

 Specifically, you mean that you had a column referenced by a USING
 clause, and then you renamed it, right?  We've had discussions about
 that in the past, and concluded that the SQL spec is just fundamentally
 broken here.  If you rename one of the input columns, there is no way
 to represent a view that (used to) use USING without changing the view's
 behavior -- in particular, the set of columns exposed by a join with
 USING is different from the set of columns exposed without that, so
 simply replacing the USING with an ON clause wouldn't get the job done.
 So our view-dumping code just doesn't bother to try.  You'll get
 something that still says USING, but of course this won't work when the
 view definition is dumped and reloaded.  This is not the fault of the
 view: if you'd not used a view but just issued the equivalent join
 directly as a SQL query, the rename would still have broken your
 application.
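 For instance, a minimal sketch of that failure mode (table and column
 names here are made up):

```sql
CREATE TABLE a (id int, x text);
CREATE TABLE b (id int, y text);
CREATE VIEW v AS SELECT * FROM a JOIN b USING (id);

ALTER TABLE a RENAME COLUMN id TO a_id;

-- The view itself keeps working, but its dumped definition still
-- contains a USING clause, which after the rename no longer names a
-- column present in both tables, so reloading the dump fails.
```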

 Our internal representation doesn't depend on the name-matching aspect
 of USING, so the view continues to work as before, so long as you don't
 dump and reload.  But it looks wrong if you dump the definition as SQL.
 That's basically because SQL lacks a way to represent the situation.

 The best idea I've heard for fixing it is to invent a non-standard
 syntax that could represent a USING clause matching two dissimilarly
 named columns, say USING (foo = bar AS baz), and then use that syntax
 when dumping a view if the column names don't match.  Nobody's worked
 out the idea in full detail, though, let alone implemented it; it's not
 really clear it's worth the trouble.

regards, tom lane



Re: [GENERAL] initdb $PGDATA not working

2012-02-09 Thread John R Pierce

On 02/09/12 2:38 PM, Daniel Vázquez wrote:

YES /etc/sysconfig/pgsql/postgresql-9.1

I was using /etc/sysconfig/pgsql/postgresql like in 8.4 ... my fault ...
it was a misunderstanding on my part ... but there's no doc about it.


whatever the name of the /etc/init.d/postgres** script is, it uses that 
same name in the /etc/sysconfig/pgsql directory.


so if you want to run 2 copies of 9.1, cp /etc/init.d/postgresql-9.1 
/etc/init.d/postgresql-9.1B and then create 
/etc/sysconfig/pgsql/postgresql-9.1B  and put your port, and data 
directory in there, and poof, like magic!
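In other words, something like this (the port and data directory values 
are just examples):

```shell
# Second service script with a distinct name:
cp /etc/init.d/postgresql-9.1 /etc/init.d/postgresql-9.1B

# Matching per-service config file, same name, under /etc/sysconfig/pgsql:
cat > /etc/sysconfig/pgsql/postgresql-9.1B <<'EOF'
PGPORT=5433
PGDATA=/var/lib/pgsql/9.1B/data
EOF

# Initialize and start the second cluster:
/etc/init.d/postgresql-9.1B initdb
/etc/init.d/postgresql-9.1B start
```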




--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast





Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Rich Shepard

On Thu, 9 Feb 2012, David Salisbury wrote:


Interesting.  I used to think natural keys were okay, but have since decided
that surrogates are the way to go.  That second layer of abstraction allows
for much easier data modifications when needed.  What would be an example
of a natural key that would be good to use, and why would it be preferable??


  For water quality data the primary key is (site, date, param) since
there's only one value for a given parameter collected at a specific site on
a single day. No surrogate key needed.
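  In DDL terms, something like this (table and column names are
illustrative):

```sql
CREATE TABLE water_quality (
    site  text    NOT NULL,
    date  date    NOT NULL,
    param text    NOT NULL,
    value numeric,
    PRIMARY KEY (site, date, param)   -- the natural key
);
```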


I'd think the key value must never change, and even say kingdom values in
a taxa table could possibly change.. might discover something new and do a
little reordering. :) Also natural keys might be strings, which I'm
thinking would not be as efficient as integers for an index.


  The problem with real world data is that different taxonomic levels are
used. Not all organisms can be identified to species; some (such as the
round worms, or nematodes) are at the level of order. That means there is no
combination of columns that are consistently not NULL. Sigh.

Rich




Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Rich Shepard

On Thu, 9 Feb 2012, Merlin Moncure wrote:


If you can't (which happens on various type of data), then the surrogate
is giving the illusion of row uniqueness when there isn't one.


  Ah, but each row is unique. However, there is no consistent set of
non-NULL values that can define a unique key for each row.

Rich




Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread David Salisbury


On 2/9/12 5:25 PM, Rich Shepard wrote:

For water quality data the primary key is (site, date, param) since
there's only one value for a given parameter collected at a specific
site on
a single day. No surrogate key needed.


Yea.  I was wondering if the surrogate key debate really boils down to the
composite primary key debate.  Seems so in my mind, though one could maybe
come up with a combination: basically alias the values and make a
composite of those.  Perhaps that's the ultimate methodology. :)


The problem with real world data is that different taxonomic levels are
used. Not all organisms can be identified to species; some (such as the
round worms, or nematodes) are at the level of order. That means there
is no
combination of columns that are consistently not NULL. Sigh.


I didn't know that about worms.  I did know grasses only go to the genus.
You could make a tall, skinny, self-referential table, though, and nothing
would be null and everything would be unique (I think, unless certain
taxon values can appear under different higher-order taxon values).
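One possible shape for that tall, skinny table (all names here are
illustrative):

```sql
CREATE TABLE taxon (
    taxon_id  serial PRIMARY KEY,     -- surrogate, since the rank varies
    rank      text NOT NULL,          -- 'kingdom', ... 'order', 'species'
    name      text NOT NULL,
    parent_id integer REFERENCES taxon (taxon_id),
    UNIQUE (parent_id, rank, name)    -- unique within its parent
);

-- A nematode identified only to order is simply a row whose deepest
-- rank is 'order'; no NULL taxonomy columns are needed.
```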

Thanks for the view points out there.

Cheers,

-ds



Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Adrian Klaver
On Thursday, February 09, 2012 5:18:19 pm David Salisbury wrote:
 On 2/9/12 5:25 PM, Rich Shepard wrote:
  For water quality data the primary key is (site, date, param) since
  there's only one value for a given parameter collected at a specific
  site on
  a single day. No surrogate key needed.
 
 Yea.  I was wondering if the surrogate key debate really boils down to the
 composite primary key debate.  Seems so in my mind, though one could
 maybe come up with a combination.  Basically aliases of values and
 composite those.  Perhaps that's the ultimate methodology. :)
 
  The problem with real world data is that different taxonomic levels are
  used. Not all organisms can be identified to species; some (such as the
  round worms, or nematodes) are at the level of order. That means there
  is no
  combination of columns that are consistently not NULL. Sigh.
 
 I didn't know that about worms.  I did know grasses only went to the genus.
 You could make a tall skinny self referential table though, and nothing
 would be null and everything would be unique ( I think, unless certain
 taxon values can appear under different higher order taxon values ).

OT. Alright, now I have to ask. When you say grasses (or for that matter 
round worms) cannot be identified to species, are you talking about the 
data you are receiving or in general?  Because as far as I know there are 
many species identified for both. They are difficult to ID, but species 
do exist.

 
 Thanks for the view points out there.
 
 Cheers,
 
 -ds

-- 
Adrian Klaver
adrian.kla...@gmail.com



Re: [GENERAL] Strategy for Primary Key Generation When Populating Table

2012-02-09 Thread Chris Travers
On Thu, Feb 9, 2012 at 2:10 PM, David Salisbury salisb...@globe.gov wrote:



 On 2/9/12 10:08 AM, Rich Shepard wrote:

   I have reports containing macroinvertebrate collection data for several
 hundred (or several thousand) of taxa. There is no natural key since there
 are multiple rows for each site/date pair. Years ago Joe Celko taught me
 to
 seek natural keys whenever they might exist. They don't here. That's why I
 specifically mentioned that in my message.



 Interesting.  I used to think natural keys were okay, but have since
 decided
 that surrogates are the way to go.  That second layer of abstraction allows
 for much easier data modifications when needed.  What would be an example
 of a natural key that would be good to use, and why would it be
 preferable??

 I'd think the key value must never change, and even say kingdom values in a
 taxa table could possibly change.. might discover something new and do a
 little reordering. :)  Also natural keys might be strings, which I'm
 thinking
 would not be as efficient as integers for an index.

Well, here is the approach we have taken with LedgerSMB:

Every table has a defined primary key, and where possible this is the
natural key.  There are cases where there is no natural key however and we
use a surrogate key.  However every table also has at least one single
column key whether it is the natural primary key or a surrogate one.

All joins are done on surrogate keys.

This has a few very specific advantages as the db schema changes:  if
criteria for the natural key must change because of evolving requirements,
the join conditions need not change.  Moreover joins don't require intimate
knowledge of natural keys between tables, making joins simpler and more
predictable, and easier to read.
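A sketch of the pattern (table and column names are illustrative, not
LedgerSMB's actual schema):

```sql
CREATE TABLE customer (
    id   serial PRIMARY KEY,      -- surrogate key, used for all joins
    code text NOT NULL UNIQUE     -- natural key, free to evolve
);

CREATE TABLE invoice (
    id          serial PRIMARY KEY,
    customer_id integer NOT NULL REFERENCES customer (id)
);

-- Joins use the surrogate, so redefining the natural key later (say,
-- making "code" composite) never forces a rewrite of join conditions:
SELECT i.id FROM invoice i JOIN customer c ON c.id = i.customer_id;
```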

So I don't think this is an either/or proposition.  I think there is a
great deal of benefit to the use of both.

Best Wishes,
Chris Travers