Re: [GENERAL] select count() out of memory

2007-10-26 Thread Thomas Finneid



Gregory Stark wrote:

Tom's point is that if you have 55k tables then just *finding* the newest
child table is fairly expensive. You're accessing a not insignificant-sized
index and table of tables. And the situation is worse when you consider the
number of columns all those tables have, all the indexes those tables have,
all the index keys those indexes have, etc.


Yes, I got that. But I name the child tables so that when my server 
receives read requests, I can retrieve details from the request to 
figure out the exact child table name, without the system needing to 
do any internal searches to find the newest table.



Nonetheless you've more or less convinced me that you're not completely nuts.


thank you for only regarding me as somewhat nuts :)


I would suggest not bothering with inheritance though. Inheritance imposes
additional costs to track the inheritance relationships. For your purposes you
may as well just create separate tables and not bother trying to use
inheritance.


As addressed in a previous reply, I find inheritance better for a couple 
of practical reasons.



If it's practical to use partitions, granularity does not come into the
equation.


Uhm, yes it does. This is engineering, it's all about trade-offs. Having 55k
tables will have costs and benefits. I think it's a bit early to dismiss the
costs. Keep in mind that profiling them may be a bit tricky since they occur
during planning and DDL that you haven't finished experimenting with yet. The
problem you just ran into is an example of the kind of cost it imposes.


See answer on why granularity is not relevant for my case.


You should also consider some form of compromise with separate tables but at a
lower level of granularity. Perhaps one partition per day instead of one per
30s. You could drop a partition when all the keys in it are marked as dead.


The structure of the data is divided into a discrete timeline, so every 
predefined X seconds a whole new batch of data arrives, and all of it 
belongs in a single partition.
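
(For illustration, a per-interval child table along those lines might look
like this; the parent table and column names here are hypothetical:)

CREATE TABLE data_tg0003 ( CHECK ( time_group = 3 ) ) INHERITS (data_parent);
-- bulk-load the batch, then index the new child
CREATE INDEX data_tg0003_key_idx ON data_tg0003 (key_col);
-- later, when every key in the partition is marked dead:
DROP TABLE data_tg0003;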



regards

thomas



Re: [GENERAL] select count() out of memory

2007-10-26 Thread Thomas Finneid



Scott Marlowe wrote:

It may well be that one big table and partial indexes would do what
you want.  Did you explore partial indexes against one big table?
That can be quite handy.


Hmm, interesting, I suppose it could work. Thanks for the suggestion, 
I'll keep it in mind.
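
(For illustration, a partial index per time group on one big table might
look like this; the table and column names are hypothetical:)

CREATE INDEX big_table_tg3_idx ON big_table (key_col) WHERE time_group = 3;

-- reads for that time group can then use the small per-group index:
SELECT * FROM big_table WHERE time_group = 3 AND key_col = 42;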


regards

thomas




Re: [GENERAL] Query_time SQL as a function w/o creating a new type

2007-10-26 Thread Reg Me Please
You could try this:


CREATE OR REPLACE FUNCTION foo( out procpid integer, out client_addr inet, out 
query_time interval, out current_query text )
RETURNS SETOF RECORD AS $BODY$
...
$BODY$ LANGUAGE PLPGSQL VOLATILE;

(Thanks to Joe Conway for showing this in tablefunc!)


On Friday 26 October 2007 08:24:46, Ow Mun Heng wrote:
 Hi,

 After Erik Jones gave me the idea for this, I started to become too lazy
 to type this into the sql every time I want to see how long a query
 is taking.. so, I thought that I'll create a function to do just that..
 I ended up with..

 CREATE OR REPLACE FUNCTION query_time()
   RETURNS SETOF query_time AS
 $BODY$
 DECLARE
 rec RECORD;

 BEGIN
   FOR rec IN
   SELECT procpid, client_addr, now() - query_start as query_time,
   current_query
   FROM pg_stat_activity
   ORDER BY query_time DESC
   LOOP
   RETURN NEXT rec;
   END LOOP;
   RETURN;
 END;

 $BODY$
   LANGUAGE 'plpgsql' VOLATILE;


 But the issue with the above is that I need to create a type.

 CREATE TYPE query_time AS
(procpid integer,
 client_addr inet,
 query_time interval,
 current_query text);

 Is there a method by which I'm able to return a result set w/o needing to
 declare/create a new type?

 I tried to use language 'sql' but it only returned me 1 column, with all
 the fields concatenated together, with commas separating the fields.







Re: [GENERAL] select count() out of memory

2007-10-26 Thread Thomas Finneid


Jorge Godoy wrote:

On Thursday 25 October 2007 13:36:09, [EMAIL PROTECTED] wrote:

Regarding dumps and restore; the system will always be offline during
those operations and it will be so for several days, because a new project
might start at another location in the world, so the travelling there
takes time. In the meantime, all admin tasks can be performed without
problems, even backup operations that take 3 days.


This sounds a lot like oil exploration...  Data gathered from sensors is 
usually a few TBs, explosions have definite intervals, interference between 
sensors, etc.


Sorry I can't talk about what the work actually is; a colleague of mine 
just got reprimanded for merely mentioning he was working on a compression 
library. The manager thought he was revealing *too much* :)


Putting the data inside the DB fast is part of the solution, getting it out 
fast to be processed / analyzed is another part.


But you are right about that part, things need to be fast.

regards

thomas




Re: [GENERAL] Query_time SQL as a function w/o creating a new type

2007-10-26 Thread Ow Mun Heng

On Fri, 2007-10-26 at 08:35 +0200, Reg Me Please wrote:
 You could try this:
 
 
 CREATE OR REPLACE FUNCTION foo( out procpid integer, out client_addr inet, 
 out 
 query_time interval, out current_query text )
 RETURNS SETOF RECORD AS $BODY$
 ...
 $BODY$ LANGUAGE PLPGSQL VOLATILE;


Somehow it doesn't work..

CREATE OR REPLACE FUNCTION query_time2( out procpid integer, out
client_addr inet, out 
query_time interval, out current_query text ) AS
  --RETURNS SETOF RECORD AS
$BODY$

BEGIN
SELECT procpid, client_addr, (now() - query_start),
current_query
FROM pg_stat_activity
ORDER BY (now() - query_start) DESC;
RETURN;
END;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE;


ERROR:  query has no destination for result data
HINT:  If you want to discard the results of a SELECT, use PERFORM
instead.
CONTEXT:  PL/pgSQL function query_time2 line 3 at SQL statement


 
 On Friday 26 October 2007 08:24:46, Ow Mun Heng wrote:
  Hi,
 
  After Erik Jones gave me the idea for this, I started to become too lazy
  to type this into the sql every time I want to see how long a query
  is taking.. so, I thought that I'll create a function to do just that..
  I ended up with..
 
  CREATE OR REPLACE FUNCTION query_time()
RETURNS SETOF query_time AS
  $BODY$
  DECLARE
  rec RECORD;
 
  BEGIN
  FOR rec IN
  SELECT procpid, client_addr, now() - query_start as query_time,
  current_query
  FROM pg_stat_activity
  ORDER BY query_time DESC
  LOOP
  RETURN NEXT rec;
  END LOOP;
  RETURN;
  END;
 
  $BODY$
LANGUAGE 'plpgsql' VOLATILE;
 
 
  But the issue with the above is that I need to create a type.
 
  CREATE TYPE query_time AS
 (procpid integer,
  client_addr inet,
  query_time interval,
  current_query text);
 
  Is there a method by which I'm able to return a result set w/o needing to
  declare/create a new type?
 
  I tried to use language 'sql' but it only returned me 1 column, with all
  the fields concatenated together, with commas separating the fields.
 
 
 
 
 


[GENERAL] Query_time SQL as a function w/o creating a new type

2007-10-26 Thread Ow Mun Heng
Hi,

After Erik Jones gave me the idea for this, I started to become too lazy
to type this into the sql every time I want to see how long a query
is taking.. so, I thought that I'll create a function to do just that..
I ended up with..

CREATE OR REPLACE FUNCTION query_time()
  RETURNS SETOF query_time AS
$BODY$
DECLARE
rec RECORD;

BEGIN
FOR rec IN
SELECT procpid, client_addr, now() - query_start as query_time,
current_query
FROM pg_stat_activity
ORDER BY query_time DESC
LOOP
RETURN NEXT rec;
END LOOP;
RETURN;
END;

$BODY$
  LANGUAGE 'plpgsql' VOLATILE;


But the issue with the above is that I need to create a type.

CREATE TYPE query_time AS
   (procpid integer,
client_addr inet,
query_time interval,
current_query text);

Is there a method by which I'm able to return a result set w/o needing to 
declare/create a new type?

I tried to use language 'sql' but it only returned me 1 column, with all the 
fields concatenated together, with commas separating the fields.







Re: [GENERAL] Query_time SQL as a function w/o creating a new type

2007-10-26 Thread A. Kretschmer
On Fri, 26.10.2007 at 14:51:52 +0800, Ow Mun Heng wrote the following:
 
 On Fri, 2007-10-26 at 08:35 +0200, Reg Me Please wrote:
  You could try this:
  
  
  CREATE OR REPLACE FUNCTION foo( out procpid integer, out client_addr inet, 
  out 
  query_time interval, out current_query text )
  RETURNS SETOF RECORD AS $BODY$
  ...
  $BODY$ LANGUAGE PLPGSQL VOLATILE;
 
 
 Somehow it doesn't work..
 
 CREATE OR REPLACE FUNCTION query_time2( out procpid integer, out
 client_addr inet, out 
 query_time interval, out current_query text ) AS
   --RETURNS SETOF RECORD AS
 $BODY$
 
 BEGIN
   SELECT procpid, client_addr, (now() - query_start),
   current_query
   FROM pg_stat_activity
   ORDER BY (now() - query_start) DESC;
   RETURN;
 END;
 $BODY$
   LANGUAGE 'plpgsql' VOLATILE;
 
 
 ERROR:  query has no destination for result data
 HINT:  If you want to discard the results of a SELECT, use PERFORM
 instead.
 CONTEXT:  PL/pgSQL function query_time2 line 3 at SQL statement

Change the SELECT procpid, ... to
   SELECT INTO procpid, ...


That's all (I hope)...
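
(For what it's worth, a minimal sketch of the corrected function, untested:
with OUT parameters the rows still have to be emitted one at a time with
RETURN NEXT, and the columns are table-qualified to avoid colliding with
the parameter names:)

CREATE OR REPLACE FUNCTION query_time2( out procpid integer, out client_addr inet,
out query_time interval, out current_query text )
RETURNS SETOF RECORD AS $BODY$
BEGIN
    FOR procpid, client_addr, query_time, current_query IN
        SELECT s.procpid, s.client_addr, now() - s.query_start, s.current_query
        FROM pg_stat_activity s
        ORDER BY (now() - s.query_start) DESC
    LOOP
        RETURN NEXT;
    END LOOP;
    RETURN;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;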


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net



Re: [GENERAL] subversion support?

2007-10-26 Thread Dave Page
Roberts, Jon wrote:
 That is awesome.  Can it be added to pga3?

Like I said - it wasn't implemented in pga3 because no one used it in
pga2 except (as far as I know) the team I was working with at the
time. As I recall we polled the mailing lists before dropping it and
no one said they wanted to keep it, nor did anyone complain about it
being missing in pga3.

Unless there are a decent number of people that would actually use it
now, I'm reluctant to spend the time on a feature that would simply add
to the maintenance burden.

Regards, Dave

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Selecting tree data

2007-10-26 Thread Gregory Stark
D. Dante Lorenso [EMAIL PROTECTED] writes:

 Pat Maddox wrote:
 I'd like to store some tree data in my database.  I want to be able to
 sort the data but maintain a tree structure
 Is it possible to pull all the data like that with one query?  How do
 I need to structure the table, and what query do I have to run in
 order to make it happen?

 You need to look at the connectby function which is part of contrib.

Or ltree. Depending on how static your data is and what else you need to do
with it.
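
(For reference, a minimal connectby call might look like this, assuming
contrib/tablefunc is installed and a hypothetical table posts(id, parent_id);
'1' is the key of the root row and 0 means no depth limit:)

SELECT * FROM connectby('posts', 'id', 'parent_id', '1', 0)
    AS t(keyid int, parent_keyid int, level int);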


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit

2007-10-26 Thread Dave Page
Magnus Hagander wrote:
 Rainer Bauer wrote:
 After increasing the session heap size in the registry from 512KB to 1024KB
 the no. of connections was roughly doubled. So this might be a solution for
 people running out of Desktop heap.

 Alter the value of the following key
 HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows

 The numeric values following SharedSection= control the heap management:
 On WinXP these are the default values: SharedSection=1024,3072,512
 Altering this to SharedSection=1024,3072,1024 will increase the heap for 
 all
 non-interactive window stations to 1024KB.
 
 This part should go in the FAQ, I think. It's valid for 8.2 as well,
 from what I can tell, and it's valid for 8.3 both before and after the
 patch I just applied.
 
 Dave, you're listed as maintainer :-P

done.

/D



Re: [GENERAL] select count() out of memory

2007-10-26 Thread Sam Mason
On Fri, Oct 26, 2007 at 08:26:13AM +0200, Thomas Finneid wrote:
 Scott Marlowe wrote:
 It may well be that one big table and partial indexes would do what
 you want.  Did you explore partial indexes against one big table?
 That can be quite handy.
 
 Hmm, interesting, I suppose it could work. Thanks for the suggestion, 
 I'll keep it in mind.

That's still going to have to do a tablescan on the whole dataset (a
couple of terabytes?) before building the index, isn't it? That doesn't
sound like something you'd want to do too often.

Are there any thoughts of deferring index update so that many rows
can be merged simultaneously, rather than doing many individual index
operations?  It sounds as though this is what Thomas is really after and
it would also remove the need for dropping indexes while doing a bulk
insert of data.  I apologise if this has been discussed before!


  Sam



Re: [GENERAL] select count() out of memory

2007-10-26 Thread Sam Mason
On Fri, Oct 26, 2007 at 08:25:09AM +0200, Thomas Finneid wrote:
 Gregory Stark wrote:
 Tom's point is that if you have 55k tables then just *finding* the newest
 child table is fairly expensive. You're accessing a not insignificant-sized
 index and table of tables. And the situation is worse when you consider the
 number of columns all those tables have, all the indexes those tables have,
 all the index keys those indexes have, etc.
 
 Yes, I got that. But I name the child tables so that when my server 
 receives read requests, I can retrieve details from the request to 
 figure out the exact child table name, without the system needing to 
 do any internal searches to find the newest table.

I think the lookup that is being referred to is the fact that if you've
got 55k (plus) files in a directory then the filesystem still has to
perform a search in the directory to locate the actual file associated
with the filename.  There probably isn't going to be much difference
between the filesystem performing this lookup vs the database descending
a few levels of its index structure.

I have a feeling you may have thought about this before, but thought it
wouldn't hurt to point it out more explicitly.


  Sam



Re: [GENERAL] INDEX and JOINs

2007-10-26 Thread Martijn van Oosterhout
On Fri, Oct 26, 2007 at 12:34:06PM +0200, Reg Me Please wrote:
 it's very fast (of course!). But when I run:
 
 SELECT * FROM T_DATA NATURAL JOIN T_FIELDS WHERE TABL_ID='TABL';
 
 it's very slow.
 The EXPLAIN says that in the second case it has to do a sequential
 scan on T_DATA. And this explains the timing.
 Is there a way to avoid such a behaviour by acting on indexes?

Firstly, have you run ANALYZE recently? Secondly, you'll have to show
us the output of EXPLAIN ANALYZE if you want some useful help.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




[GENERAL] How to ALTER a TABLE to change the primary key?

2007-10-26 Thread Reg Me Please
Hi all.
I'd need to modify the primary key definition in an already populated table.
How can I do it?

Thanks.



Re: [GENERAL] How to ALTER a TABLE to change the primary key?

2007-10-26 Thread Ron Johnson

On 10/26/07 04:39, Reg Me Please wrote:
 Hi all.
 I'd need to modify the primary key definition in an already populated table.
 How can I do it?

Have you tried dropping the constraint, and creating a new one?

http://www.postgresql.org/docs/8.2/interactive/sql-altertable.html

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




[GENERAL] TRUNCATE - timing of the return of disk space - caused by long-lived client?

2007-10-26 Thread Vince Negri

Hi All,

Running postgres 8.1.9 on SLES10 (kernel 2.6.16) currently, I noticed something 
about the behaviour
of TRUNCATE on a large table.

The docs imply that TRUNCATE is the best way to delete everything in a table
(rather than DELETE) since there is no need to VACUUM afterward - the disk space
used is immediately returned to the operating system.

In the setup in question, there is one table in the cluster that acts as a 
sequential
log. A long-lived process (24/7) connects to the cluster and writes rows to the 
table
(existing rows are never altered.) The client does not use transactions, only 
single INSERT commands.

Obviously this table can't be allowed to grow forever, but it is important not 
to
disrupt the client connection. One approach is to periodically
DELETE old entries and then do a VACUUM so that they can be re-used. This is 
quite slow since
the table is large.

So I tried TRUNCATE on the table. It appeared to work  - in that the row count 
dropped to zero
and the connected client was not disrupted, and du on the postgres data 
directory showed a fall. 

But the available disk space (reported by df) did not fall. 

So I used lsof | grep pgsql | grep deleted to look for files that have been 
deleted but are held open, and sure enough, there is the file for the table 
I just truncated. It is referenced by a number of postmaster processes 
(threads?), most of which are associated with connections that have *never 
queried* the table in question, which is odd, but one process is associated 
with the long-lived connection.

What causes the file handles of the truncated table to be released by all 
postmaster processes?
I am concerned that some of these files will only get fully deleted once all 
clients have disconnected
or the postgres server shuts down (neither of which is desirable.)

Vince






[GENERAL] INDEX and JOINs

2007-10-26 Thread Reg Me Please
Hi all.
I have a setup like this:

CREATE TABLE T_FIELDS (
  TABL_ID TEXT NOT NULL,
  COLU_ID TEXT NOT NULL,
  FIEL_ID TEXT PRIMARY KEY,
  UNIQUE( TABL_ID,COLU_ID ) ); -- ~200 ROWS

CREATE TABLE T_DATA (
  ITEM_ID INT8 NOT NULL,
  FIEL_ID TEXT NOT NULL REFERENCES T_FIELDS,
  DATA_T TEXT NOT NULL,
  PRIMARY KEY( FIEL_ID,ITEM_ID ) ); -- ~10M ROWS

When I run

SELECT * FROM T_DATA WHERE FIEL_ID='TABL.FIEL';

it's very fast (of course!). But when I run:

SELECT * FROM T_DATA NATURAL JOIN T_FIELDS WHERE TABL_ID='TABL';

it's very slow.
The EXPLAIN says that in the second case it has to do a sequential
scan on T_DATA. And this explains the timing.
Is there a way to avoid such a behaviour by acting on indexes?

Thanks.



Re: [GENERAL] INDEX and JOINs

2007-10-26 Thread Reg Me Please
On Friday 26 October 2007 13:05:10 Martijn van Oosterhout wrote:
 On Fri, Oct 26, 2007 at 12:34:06PM +0200, Reg Me Please wrote:
  it's very fast (of course!). But when I run:
 
  SELECT * FROM T_DATA NATURAL JOIN T_FIELDS WHERE TABL_ID='TABL';
 
  it's very slow.
  The EXPLAIN says that in the second case it has to do a sequential
  scan on T_DATA. And this explains the timing.
  Is there a way to avoid such a behaviour by acting on indexes?

 Firstly, have you run ANALYZE recently? Secondly, you'll have to show
 us the output of EXPLAIN ANALYZE if you want some useful help.

 Have a nice day,

Yes, I often run ANALYZE while trying to sort this kind of
thing out.

This is the output:

prove=# explain analyze SELECT * from t_dati natural left join t_campi where tabe_id='CONTE';
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3.95..382140.91 rows=274709 width=91) (actual time=1.929..57713.305 rows=92 loops=1)
   Hash Cond: (t_dati.camp_id = t_campi.camp_id)
   ->  Seq Scan on t_dati  (cost=0.00..326851.72 rows=14010172 width=73) (actual time=0.028..43814.946 rows=14011712 loops=1)
   ->  Hash  (cost=3.91..3.91 rows=3 width=33) (actual time=0.129..0.129 rows=3 loops=1)
         ->  Seq Scan on t_campi  (cost=0.00..3.91 rows=3 width=33) (actual time=0.040..0.121 rows=3 loops=1)
               Filter: (tabe_id = 'CONTE'::text)
 Total runtime: 57713.449 ms

(I translated the table and column names. The substance is the same.)



Re: [GENERAL] INDEX and JOINs

2007-10-26 Thread Reg Me Please
On Friday 26 October 2007 13:56:20 Martijn van Oosterhout wrote:
 On Fri, Oct 26, 2007 at 01:10:42PM +0200, Reg Me Please wrote:
  prove=# explain analyze SELECT * from t_dati natural left join t_campi
  where tabe_id='CONTE';
                            QUERY PLAN
  --------------------------------------------------------------
   Hash Join  (cost=3.95..382140.91 rows=274709 width=91) (actual
  time=1.929..57713.305 rows=92 loops=1)
     Hash Cond: (t_dati.camp_id = t_campi.camp_id)

 Neither of the columns are indexed according to the schema you sent so
 that's the problem. Or you broke something while translating.

  (I translated the table and column names. The substance is the same.)

 Try not translating, and we might get somewhere...

 Have a nice day,


prove=# \d t_dati
                              Table public.t_dati
    Column     |           Type           |              Modifiers
---------------+--------------------------+--------------------------------------
 elem_id       | bigint                   | not null
 camp_id       | text                     | not null
 dato_t        | text                     | not null
 dato_r        | double precision         |
 dato_validita | timestamp with time zone | not null default
                                            '-infinity'::timestamp with time zone
 dato_scadenza | timestamp with time zone | not null default
                                            'infinity'::timestamp with time zone
 dato_flag     | boolean                  | not null default true
 dato_data     | timestamp with time zone | not null default now()
 dato_id       | bigint                   | not null default
                                            nextval('t_dati_dato_id_seq'::regclass)
Indexes:
    t_dati_pkey PRIMARY KEY, btree (dato_id)
    i_dati_0 btree (elem_id)
    i_dati_1 btree (camp_id)
    i_dati_2 btree (dato_t text_pattern_ops)
    i_dati_3 btree (dato_flag, dato_validita, dato_scadenza)
    i_dati_4 btree (dato_data)
Foreign-key constraints:
    t_dati_camp_id_fkey FOREIGN KEY (camp_id) REFERENCES t_campi(camp_id)

prove=# \d t_campi
      Table public.t_campi
 Column  | Type | Modifiers
---------+------+-----------
 tabe_id | text | not null
 colo_id | text | not null
 camp_id | text | not null
Indexes:
    t_campi_pkey PRIMARY KEY, btree (camp_id)
    i_t_campi_0 btree (tabe_id)
Foreign-key constraints:
    t_campi_colo_id_fkey FOREIGN KEY (colo_id) REFERENCES t_colonne(colo_id)
    t_campi_tabe_id_fkey FOREIGN KEY (tabe_id) REFERENCES t_tabelle(tabe_id)

They seem to be indexed.




Re: [GENERAL] TRUNCATE - timing of the return of disk space - caused by long-lived client?

2007-10-26 Thread Tom Lane
Vince Negri [EMAIL PROTECTED] writes:
 What causes the file handles of the truncated table to be released by all 
 postmaster processes?

It should happen when the other backends process the sinval message
about the TRUNCATE, which at the latest should be the next time they
begin command execution.  What were the other clients doing, just
sitting idle?

regards, tom lane



Re: [GENERAL] TRUNCATE - timing of the return of disk space - caused by long-lived client?

2007-10-26 Thread Alvaro Herrera
Vince Negri wrote:

 So I tried TRUNCATE on the table. It appeared to work  - in that the
 row count dropped to zero and the connected client was not disrupted,
 and du on the postgres data directory showed a fall. 
 
 But the available disk space (reported by df) did not fall. 

I think you shouldn't worry about it.  The file will eventually be
closed (maybe after a couple of checkpoints) and the space returned to
the filesystem.

FYI what TRUNCATE does is create a new, separate file for the table and
index storages.  The old one can still be open for a while, but it
should get detached not long after the transaction commits.
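
(One way to observe this from SQL, as a sketch: the table's relfilenode in
pg_class changes when TRUNCATE creates the new storage file; 'log_table' is
a hypothetical name:)

SELECT relfilenode FROM pg_class WHERE relname = 'log_table';
TRUNCATE log_table;
SELECT relfilenode FROM pg_class WHERE relname = 'log_table';  -- now different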

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [GENERAL] Selecting tree data

2007-10-26 Thread Michael Glaesemann


On Oct 26, 2007, at 4:19 , Gregory Stark wrote:

D. Dante Lorenso [EMAIL PROTECTED] writes:

You need to look at the connectby function which is part of contrib.

Or ltree. Depending on how static your data is and what else you
need to do with it.


Or adjacency list or nested set (or even nested intervals).

Michael Glaesemann
grzm seespotcode net





Re: [GENERAL] INDEX and JOINs

2007-10-26 Thread Martijn van Oosterhout
On Fri, Oct 26, 2007 at 01:10:42PM +0200, Reg Me Please wrote:
 prove=# explain analyze SELECT * from t_dati natural left join t_campi where 
 tabe_id='CONTE';
                             QUERY PLAN
 --------------------------------------------------------------
  Hash Join  (cost=3.95..382140.91 rows=274709 width=91) (actual 
 time=1.929..57713.305 rows=92 loops=1)
    Hash Cond: (t_dati.camp_id = t_campi.camp_id)

Neither of the columns are indexed according to the schema you sent so
that's the problem. Or you broke something while translating.

 (I translated the table and column names. The substance is the same.)

Try not translating, and we might get somewhere...

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [GENERAL] How to ALTER a TABLE to change the primary key?

2007-10-26 Thread Michael Glaesemann


On Oct 26, 2007, at 5:39 , Reg Me Please wrote:

I'd need to modify the primary key definition in an already  
populated table.

How can I do it?


Drop the primary key constraint and create a new one. You can do this  
inside a transaction.


test=# \d strings
   Table public.strings
  Column  | Type | Modifiers
--+--+---
a_string | text | not null
Indexes:
strings_pkey PRIMARY KEY, btree (a_string)

test=# begin; alter table strings drop constraint strings_pkey; alter  
table strings add constraint new_pkey primary key (a_string); commit;

BEGIN
ALTER TABLE
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index  
new_pkey for table strings

ALTER TABLE
COMMIT
test=# \d strings;
   Table public.strings
  Column  | Type | Modifiers
--+--+---
a_string | text | not null
Indexes:
new_pkey PRIMARY KEY, btree (a_string)

Michael Glaesemann
grzm seespotcode net





Re: [GENERAL] INDEX and JOINs

2007-10-26 Thread Tom Lane
Reg Me Please [EMAIL PROTECTED] writes:
 (cost=3.95..382140.91 rows=274709 width=91) (actual
 time=1.929..57713.305 rows=92 loops=1)
 Hash Cond: (t_dati.camp_id = t_campi.camp_id)

I think the reason it doesn't want to use an indexed join is the large
estimate of the number of join result rows.  You need to try to get that
number down to something nearer the reality.  Increasing the statistics
target for the larger table might help.
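
(For instance, something along these lines, using the translated names from
the earlier EXPLAIN; the statistics target value is only illustrative:)

ALTER TABLE t_dati ALTER COLUMN camp_id SET STATISTICS 100;
ANALYZE t_dati;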

regards, tom lane



Re: [GENERAL] Query_time SQL as a function w/o creating a new type

2007-10-26 Thread Tom Lane
A. Kretschmer [EMAIL PROTECTED] writes:
 Change the SELECT procpid, ... to
SELECT into procpid, ...

For something like this, you shouldn't use plpgsql at all: a simple
SQL function gets the job done with a lot less notational overhead
(and likely less runtime overhead too).

postgres=# CREATE OR REPLACE FUNCTION query_time2( out procpid integer, out
postgres(# client_addr inet, out 
postgres(# query_time interval, out current_query text ) 
postgres-# returns setof record as $$
postgres$# SELECT procpid, client_addr, (now() - query_start),
postgres$# current_query
postgres$# FROM pg_stat_activity
postgres$# ORDER BY (now() - query_start) DESC;
postgres$# $$ language sql;
CREATE FUNCTION
postgres=# select * from query_time2();
 procpid | client_addr | query_time |current_query 
-+-++--
9874 | | 00:00:00   | select * from query_time2();
(1 row)

postgres=# 

regards, tom lane



Re: [GENERAL] INDEX and JOINs

2007-10-26 Thread Martijn van Oosterhout
On Fri, Oct 26, 2007 at 02:39:28PM +0200, Reg Me Please wrote:
   Hash Join  (cost=3.95..382140.91 rows=274709 width=91) (actual
   time=1.929..57713.305 rows=92 loops=1)
     Hash Cond: (t_dati.camp_id = t_campi.camp_id)
 
 prove=# \d t_dati

Ok, my suggestion would be to run it with enable_seqscan=off and if
that is indeed faster, then try reducing random_page_cost.
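
(A sketch of that experiment in a psql session; the random_page_cost value
is only a starting point, the default in 8.x being 4.0:)

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM t_dati NATURAL LEFT JOIN t_campi WHERE tabe_id='CONTE';
-- if the index plan wins, turn seqscans back on and lower the cost instead:
SET enable_seqscan = on;
SET random_page_cost = 2;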

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [GENERAL] select count() out of memory

2007-10-26 Thread Gregory Stark
Sam Mason [EMAIL PROTECTED] writes:

 On Fri, Oct 26, 2007 at 08:25:09AM +0200, Thomas Finneid wrote:
 Gregory Stark wrote:
 Tom's point is that if you have 55k tables then just *finding* the newest
 child table is fairly expensive. You're accessing a not insignificant-sized
 index and table of tables. And the situation is worse when you consider the
 number of columns all those tables have, all the indexes those tables have,
  all the index keys those indexes have, etc.
 
  Yes, I got that. But I name the child tables so that when my server 
  receives read requests, I can retrieve details from the request to 
  figure out the exact child table name, without the system needing to 
  do any internal searches to find the newest table.

 I think the lookup that is being referred to is the fact that if you've
 got 55k (plus) files in a directory then the filesystem still has to
 perform a search in the directory to locate the actual file associated
 with the filename.  There probably isn't going to be much difference
 between the filesystem performing this lookup vs the database descending
 a few levels of its index structure.

That's true but it's in *addition* to the database having to find the catalog
records for the table, which involves an index lookup itself. 

Actually many index lookups since it has to look up the catalog record for the
table, for all the columns of the table, for all indexes of the table, for all
the index keys of those indexes, all constraints of the table, all triggers of
the table, all dependencies on other objects and of other objects on this
table (this latter is why I suggest not using inheritance). 

Each of these lookups is using an index to find the table out of 55k records
which is just the same work that you're saving in the top level of the index
tree.

If you think there's one obvious solution then you just haven't analyzed the
problem seriously. In serious engineering there are always tradeoffs. The
cleanest prettiest solution is not necessarily -- and in cases where you're
dealing with large numbers like this almost certainly isn't -- the optimal
choice.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] TRUNCATE - timing of the return of disk space - caused by long-lived client?

2007-10-26 Thread Vince Negri
Hi Tom (and all)

Yes, in the meantime I realised that the other relevant clients (the ones that
seemed to be holding the file handle) were ones that sat idle most of the time
and rarely executed any query. You are right, as each of these executed a query
(thus processing sinval) they released the filehandle.

Thanks for the pointers.


Vince

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: 26 October 2007 13:22
To: Vince Negri
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] TRUNCATE - timing of the return of disk space -
caused by long-lived client? 


Vince Negri [EMAIL PROTECTED] writes:
 What causes the file handles of the truncated table to be released by all 
 postmaster processes?

It should happen when the other backends process the sinval message
about the TRUNCATE, which at the latest should be the next time they
begin command execution.  What were the other clients doing, just
sitting idle?

regards, tom lane




Re: [GENERAL] subversion support?

2007-10-26 Thread Tino Wildenhain

Hi Dave,

Dave Page schrieb:



--- Original Message ---
From: Roberts, Jon [EMAIL PROTECTED]
To: pgsql-general@postgresql.org
Sent: 25/10/07, 17:35:32
Subject: Re: [GENERAL] subversion support?

Complaint?  Who is complaining?  


I am simply asking if this feature that is rather common in other database
development tools will ever be added to pgAdmin.


pgAdmin II had change control. No-one ever really used it though so we never 
bothered to implement it in pgAdmin III.


But it was implemented differently than the proposal above.

One way to implement it as easily as possible would be the ability
to link editor windows to a file on disk, where you could have
the file version-controlled; changes to the file would show
up immediately in the edit window, and edits in the window
could (with a small delay) be auto-saved to the file.

This way you need not change pgadmin much, while you can use cvs/svn
on your file system to do the VC stuff.

All you would need is a clever way of mapping (maybe based on object
type), some configuration, and the change detection (file notify,
FAM, ...), the latter unfortunately depending on the OS.

Regards
Tino



Re: [GENERAL] select count() out of memory

2007-10-26 Thread Sam Mason
On Fri, Oct 26, 2007 at 02:21:39PM +0100, Gregory Stark wrote:
 Sam Mason [EMAIL PROTECTED] writes:
  I think the lookup that is being referred to is the fact that if you've
  got 55k (plus) files in a directory then the filesystem still has to
  perform a search in the directory to locate the actual file associated
  with the filename.  There probably isn't going to be much difference
  between the filesystem performing this lookup vs the database descending
  a few levels of its index structure.
 
 That's true but it's in *addition* to the database having to find the catalog
 records for the table which involves an index lookup itself. 
 
 Actually many index lookups since it has to look up the catalog record for the
 table, for all the columns of the table, for all indexes of the table, for all
 the index keys of those indexes, all constraints of the table, all triggers of
 the table, all dependencies on other objects and of other objects on this
 table (this latter is why I suggest not using inheritance). 

OK, sounds pretty horrible to contemplate.  If most of the queries
are to the same table (sounds like they will be) then the relevant
information will remain resident in memory, won't it?


  Sam



Re: [GENERAL] subversion support?

2007-10-26 Thread Dave Page
Hi Tino

Tino Wildenhain wrote:
 Hi Dave,
 pgAdmin II had change control. No-one ever really used it though so we
 never bothered to implement it in pgAdmin III.
 
 But it was implemented differently then the proposal above.

I'm not sure the detail of how it was implemented was a huge factor in
the fact that few people used it. People tend to complain if a feature
is there but they don't like the way it is designed - they didn't in
this instance.

 One way to implement it as easily as possible would be the ability
 to link editor windows to a file on disk, where you could have
 the file version-controlled; changes to the file would show
 up immediately in the edit window, and edits in the window
 could (with a small delay) be auto-saved to the file.
 
 This way you need not change pgadmin much while you can use cvs/svn
 on your file system to do the VC stuff.

Yeah, but the most useful feature of such a system is to provide a
'diff' to allow a schema to be patched to a new version. To do that, you
need to store not only the object definition, but the changes made to
get to that state - ie. a bunch of ALTER statements instead of a
traditional diff. Unless you're going to provide that sort of
functionality (which I believe would be difficult with a traditional
SCMS), you might as well just script a regular 'pg_dump --schema-only'
followed by 'svn commit'.

Regards, Dave.



[GENERAL] cases in rules problem

2007-10-26 Thread Darnell Brawner

I am trying to make an SQL-based versioning system.
I am working on a Ruby on Rails project and am using a plugin called
hobo. The plugin can do some nice things but overall it's lame;
still, that's what I've got to work with.
The problem is hobo does a lot of work for you, but the database must
be in a standard format to use it.

So my idea for an SQL versioning workaround was this.

CREATE TABLE main(
id serial CONSTRAINT firstkey PRIMARY KEY,
parent_id int,
title varchar(30),
public boolean default false
);

INSERT INTO main(parent_id,title,public)
VALUES
(1,'blah',true),
(1,'tah',false),
(1,'blah2',false),
(1,'blah3',false),
(2,'tah2',false),
(2,'tah3',true);

CREATE VIEW  vmain as
(SELECT * FROM main
   WHERE public=true
   ORDER BY id DESC)
   UNION
(SELECT *
   FROM main
   WHERE id IN (select max(id) from main group by parent_id)
   ORDER BY id DESC)

CREATE OR REPLACE RULE main_up AS ON UPDATE TO vmain
DO INSTEAD
INSERT INTO main(parent_id,title,public)
VALUES(NEW.parent_id,NEW.title,false);

The result of the view should be all rows with public = true, plus one
false row for each parent_id (if any), which must have a higher id than
the true one.


So on the web server, someone of level 'writer' can edit something a
superuser has created; what happens is that the update goes into the
view, hits the rule, and makes a duplicate in the main table with
public set to false, so no one on the outside can see it. Basically
the only rows that show up will be the public one and the highest-id
private one; I don't really care about rolling back versions.


My problem is when the admin wants to approve the private row.  I tried
CREATE OR REPLACE RULE main_up AS ON UPDATE TO vmain
DO INSTEAD
CASE NEW.public = true and OLD.public = false
THEN
UPDATE main set public=true where id=NEW.id
ELSE
INSERT INTO main(parent_id,title,public)
VALUES(NEW.parent_id,NEW.title,false);

But I can't seem to put CASE statements in a rule. Is there any way I
can do this without having to create a function and a rule that fires it?

This has to go on a lot of tables.



Re: [GENERAL] Selecting tree data

2007-10-26 Thread Pat Maddox
On 10/26/07, Michael Glaesemann [EMAIL PROTECTED] wrote:

 On Oct 26, 2007, at 4:19 , Gregory Stark wrote:

  D. Dante Lorenso [EMAIL PROTECTED] writes:
 
  You need to look at the connectby function which is part of contrib.
 
  Or ltree. Depending on how static your data is and what else you
  need to do
  with it.

 Or adjacency list or nested set (or even nested intervals).

 Michael Glaesemann
 grzm seespotcode net




A bunch of options so far...but there's really no way to do this with
standard SQL?

I'm starting to feel I'm better off just pulling the data I need and
then building the tree structure in my app code.

Pat



Re: [GENERAL] select count() out of memory

2007-10-26 Thread tfinneid
 Sam Mason [EMAIL PROTECTED] writes:

 On Fri, Oct 26, 2007 at 08:25:09AM +0200, Thomas Finneid wrote:
 Gregory Stark wrote:
  Tom's point is that if you have 55k tables then just *finding* the
  newest child table is fairly expensive. You're accessing a not
  insignificant-sized index and table of tables. And the situation is
  worse when you consider the number of columns all those tables have,
  all the indexes those tables have, all the index keys those indexes
  have, etc.

  Yes, I got that. But I name the child tables so that when my server
  receives read requests, I can retrieve details from the request to
  figure out the exact child table name, without the system needing to
  do any internal searches to find the newest table.

 I think the lookup that is being referred to is the fact that if you've
 got 55k (plus) files in a directory then the filesystem still has to
 perform a search in the directory to locate the actual file associated
 with the filename.  There probably isn't going to be much difference
 between the filesystem performing this lookup vs the database descending
 a few levels of its index structure.

 That's true but it's in *addition* to the database having to find the
 catalog
 records for the table which involves an index lookup itself.

 Actually many index lookups since it has to look up the catalog record for
 the
 table, for all the columns of the table, for all indexes of the table, for
 all
 the index keys of those indexes, all constraints of the table, all
 triggers of
 the table, all dependencies on other objects and of other objects on this
 table (this latter is why I suggest not using inheritance).

 Each of these lookups is using an index to find the table out of 55k
 records
 which is just the same work that you're saving in the top level of the
 index
 tree.

 If you think there's one obvious solution then you just haven't analyzed
 the
 problem seriously. In serious engineering there are always tradeoffs. The
 cleanest prettiest solution is not necessarily -- and in cases where
 you're
 dealing with large numbers like this almost certainly isn't -- the optimal
 choice.

Serious engineering does not imply perfect engineering; I have analyzed it
and made my tradeoffs. What you are forgetting here is that you clearly
don't understand the entire solution, so I will try to explain it again. And
if you still think it's bonkers, then I urge you to come up with a solution
that works with the requirements.

Every predefined X seconds (e.g. 3, 6, 9, 12, etc.) a bunch of data arrives,
which must be stored by discrete time groups, e.g. second 3, 6, 9, 12. The
data that arrives is approx 4MB per second, so in this case it's 12MB. This
has to be processed by the server and written to the db within 1 second.
There can be up to 5 writers at the same time. Within that same second, at
least 16 readers should be able to read all the data, *each*. Writers and
readers are only concerned with the latest data, i.e. data from the latest
time group, e.g. second 9.
This has to go on, every predefined X seconds, for the next 6-12 weeks,
without stops, pauses or any errors. These are the requirements.

When I performed performance tests I found several unwanted effects from
several test scenarios. Here are the most important ones:

- single large table, with indexes created when the table is created.
   - this leads to the performance of an insert degrading as more data is
     added; by the time I got to 1 billion rows it took 50 seconds to add
     the data.

 My lessons from this are that
  - single inserts can never be efficient enough
  - indexes cause a linear performance drop as data volume increases

So I tried a different approach, which would address both issues:

- separate tables for each bulk of data
   - use of bulk insert through jdbc COPY.
   - add indexes to the newly created table after the copy is finished.

   My lessons from this are:
   - inserts take constant time, no matter how much data is in the base
   - adding the indexes after the insert takes constant time, i.e. some
     milliseconds.

From this I realised that using either single tables or partitions is the
way to go, since I only need to access the latest data, i.e. the newest
table, in normal situations.

After thinking about it and discussing with this group, I found that using
partitions would be more practical for two reasons:
- changes to the parent table are automatically propagated to all child
tables, so the schema remains consistent and the server won't break because
of differences in the tables.
- it is more maintainable to keep a CREATE TABLE ... INHERITS statement in
source code than the entire ddl of the table.

So now I have tested the server 24/7 for a week and a half, with 1 writer
and 16 readers writing and reading all the mentioned data, and everything
works fine. Except for the select on the parent table, which now runs out
of memory. Which in itself is not a problem since I will never use the
parent table in production in any 

Re: [GENERAL] cases in rules problem

2007-10-26 Thread Erik Jones

On Oct 26, 2007, at 10:28 AM, Darnell Brawner wrote:


I am trying to make an SQL-based versioning system.
I am working on a Ruby on Rails project and am using a plugin
called hobo. The plugin can do some nice things but overall it's
lame; still, that's what I've got to work with.
The problem is hobo does a lot of work for you, but the database
must be in a standard format to use it.

So my idea for an SQL versioning workaround was this.

CREATE TABLE main(
id serial CONSTRAINT firstkey PRIMARY KEY,
parent_id int,
title varchar(30),
public boolean default false
);

INSERT INTO main(parent_id,title,public)
VALUES
(1,'blah',true),
(1,'tah',false),
(1,'blah2',false),
(1,'blah3',false),
(2,'tah2',false),
(2,'tah3',true);

CREATE VIEW  vmain as
(SELECT * FROM main
   WHERE public=true
   ORDER BY id DESC)
   UNION
(SELECT *
   FROM main
   WHERE id IN (select max(id) from main group by parent_id)
   ORDER BY id DESC)

CREATE OR REPLACE RULE main_up AS ON UPDATE TO vmain
DO INSTEAD
INSERT INTO main(parent_id,title,public)
VALUES(NEW.parent_id,NEW.title,false);

The result of the view should be all rows with public = true, plus
one false row for each parent_id (if any), which must have a higher id
than the true one.


So on the web server, someone of level 'writer' can edit something a
superuser has created; what happens is that the update goes into
the view, hits the rule, and makes a duplicate in the main table with
public set to false, so no one on the outside can see it. Basically
the only rows that show up will be the public one and the highest-id
private one; I don't really care about rolling back versions.


My problem is when the admin wants to approve the private row.  I
tried

CREATE OR REPLACE RULE main_up AS ON UPDATE TO vmain
DO INSTEAD
CASE NEW.public = true and OLD.public = false
THEN
UPDATE main set public=true where id=NEW.id
ELSE
INSERT INTO main(parent_id,title,public)
VALUES(NEW.parent_id,NEW.title,false);

But I can't seem to put CASE statements in a rule. Is there any way
I can do this without having to create a function and a rule that
fires it?

This has to go on a lot of tables.


The problem here is that CASE statements go in queries, not around
them.  That leaves two options: either create two rules, one for each
case (a sketch follows below), or go ahead and create a function that
gets fired by either a rule or a trigger.  As far as managing the
trigger on a lot of tables, you can script that, and I think you'll
find that easier to manage than multiple rules on each table.
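
(A minimal sketch of the two-rule variant, untested, using the tables from
the original post; note the two conditions must be mutually exclusive and
together cover every update:)

CREATE OR REPLACE RULE main_up_approve AS ON UPDATE TO vmain
    WHERE NEW.public AND NOT OLD.public
    DO INSTEAD
    UPDATE main SET public = true WHERE id = NEW.id;

CREATE OR REPLACE RULE main_up_version AS ON UPDATE TO vmain
    WHERE NOT (NEW.public AND NOT OLD.public)
    DO INSTEAD
    INSERT INTO main (parent_id, title, public)
    VALUES (NEW.parent_id, NEW.title, false);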


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [GENERAL] PostgreSQL and AutoCad

2007-10-26 Thread Guy Fraser
On Thu, 2007-10-25 at 08:57 -0600, Josh Tolley wrote:
 On 10/24/07, Bob Pawley [EMAIL PROTECTED] wrote:
  Is there any way of converting text from an AutoCad (.dwg ot .dxf) file into
  a PostgreSQL  Database??
 
  Bob Pawley
 
 I know nothing of AutoCad, but your message has been sitting for a
 while without response, so I'll throw out the suggestion that you
 probably want AutoCad to export the text to some more common format
 (like an ASCII or UTF8 file or some such) and import that.
 

Stupid list!
I guess my response went directly to the poster instead of the list.

DXF is text that can be parsed, if I remember correctly.

Look it up on Wikipedia; it has some info, but there are AutoDesk
manuals that detail the format.




Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit

2007-10-26 Thread Rainer Bauer
Dave Page wrote:

Magnus Hagander wrote:
 Rainer Bauer wrote:
 After increasing the session heap size in the registry from 512KB to 1024KB
 the no. of connections was roughly doubled. So this might be a solution for
 people running out of Desktop heap.

 Alter the value of the following key
 HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows

 The numeric values following SharedSection= control the heap management:
 On WinXP these are the default values: SharedSection=1024,3072,512
 Altering this to SharedSection=1024,3072,1024 will increase the heap for 
 all
 non-interactive window stations to 1024KB.
 
 This part should go in the FAQ, I think. It's valid for 8.2 as well,
 from what I can tell, and it's valid for 8.3 both before and after the
 patch I just applied.
 
 Dave, you're listed as maintainer :-P

done.

Dave, could you add that it's the third parameter of the SharedSection string
that must be changed? I read that KB article, but still had to find the
correct one by trial and error, which required a reboot every time.

Rainer



Re: [GENERAL] Selecting tree data

2007-10-26 Thread Michael Glaesemann


On Oct 26, 2007, at 10:56 , Pat Maddox wrote:


A bunch of options so far...but there's really no way to do this with
standard SQL?


What do you mean by standard SQL? Trees aren't inherently relational.



I'm starting to feel I'm better off just pulling the data I need and
then building the tree structure in my app code.


Part of the issue is how do you *store* the tree in the database. You  
have to encode that information somehow. These are all methods to do  
that.


Michael Glaesemann
grzm seespotcode net





Re: [GENERAL] Selecting tree data

2007-10-26 Thread Pat Maddox
On 10/26/07, Michael Glaesemann [EMAIL PROTECTED] wrote:

 On Oct 26, 2007, at 10:56 , Pat Maddox wrote:

  A bunch of options so far...but there's really no way to do this with
  standard SQL?

 What do you mean by standard SQL? Trees aren't inherently relational.

Right now my table looks like this:

posts
  id
  body
  parent_id
  root_id
  created_at

so if I've got the records

(1, 'post 1', NULL, 1, '4pm')
(2, 'post 2', NULL, 2, '8pm')
(3, 'post 3', 1, 1, '6pm')
(4, 'post 4', 1, 1, '5pm')
(5, 'post 5', 4, 1, '6pm')
(6, 'post 6', NULL, 1, '5pm')

I'd like to do a select and get them all in this order:

(1, 'post 1', NULL, 1, '4pm')
(4, 'post 4', 1, 1, '5pm')
(5, 'post 5', 4, 1, '6pm')
(3, 'post 3', 1, 1, '6pm')
(6, 'post 6', NULL, 1, '5pm')
(2, 'post 2', NULL, 2, '8pm')

And reverse sorted would be:

(2, 'post 2', NULL, 2, '8pm')
(6, 'post 6', NULL, 1, '5pm')
(1, 'post 1', NULL, 1, '4pm')
(3, 'post 3', 1, 1, '6pm')
(4, 'post 4', 1, 1, '5pm')
(5, 'post 5', 4, 1, '6pm')

Does that make sense?

Pat



Re: [GENERAL] Selecting tree data

2007-10-26 Thread brian

Pat Maddox wrote:

Right now my table looks like this:

posts
  id
  body
  parent_id
  root_id
  created_at

so if I've got the records

(1, 'post 1', NULL, 1, '4pm')
(2, 'post 2', NULL, 2, '8pm')
(3, 'post 3', 1, 1, '6pm')
(4, 'post 4', 1, 1, '5pm')
(5, 'post 5', 4, 1, '6pm')
(6, 'post 6', NULL, 1, '5pm')

I'd like to do a select and get them all in this order:

(1, 'post 1', NULL, 1, '4pm')
(4, 'post 4', 1, 1, '5pm')
(5, 'post 5', 4, 1, '6pm')
(3, 'post 3', 1, 1, '6pm')
(6, 'post 6', NULL, 1, '5pm')
(2, 'post 2', NULL, 2, '8pm')

And reverse sorted would be:

(2, 'post 2', NULL, 2, '8pm')
(6, 'post 6', NULL, 1, '5pm')
(1, 'post 1', NULL, 1, '4pm')
(3, 'post 3', 1, 1, '6pm')
(4, 'post 4', 1, 1, '5pm')
(5, 'post 5', 4, 1, '6pm')




SELECT * FROM posts ORDER BY root_id, id;

brian



Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit

2007-10-26 Thread Dave Page


 --- Original Message ---
 From: Rainer Bauer [EMAIL PROTECTED]
 To: pgsql-general@postgresql.org
 Sent: 26/10/07, 18:09:26
 Subject: Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit
 
 Dave could you add that it's the third parameter of the SharedSection string
 that must be changed. I read that KB article, but still had to find the
 correct one by trial and error, which required a reboot every time.

Err, it does say that:

You can increase the non-interactive Desktop Heap by modifying the third 
SharedSection value in the registry as described in this Microsoft 
Knowledgebase article.

/D



Re: [GENERAL] change format of logging statements?

2007-10-26 Thread Matthew Hixson
With 8.2.4, is it possible to get Postgres to log incoming SQL
statements the same as they look when written?  Instead of:


DEBUG: insert into foo (name) values ($1);
DETAIL: parameters: $1 = 'stan'

I'd like to see:

DEBUG: insert into foo (name) values ('stan');

This would be extremely helpful when debugging complex Hibernate-
generated queries.  I could just copy/paste the query into a psql
session and begin playing with it.

  Thanks,
   -M@



Re: [GENERAL] Selecting tree data

2007-10-26 Thread Pat Maddox
On 10/26/07, brian [EMAIL PROTECTED] wrote:
 Pat Maddox wrote:
  Right now my table looks like this:
 
  posts
id
body
parent_id
root_id
created_at
 
  so if I've got the records
 
  (1, 'post 1', NULL, 1, '4pm')
  (2, 'post 2', NULL, 2, '8pm')
  (3, 'post 3', 1, 1, '6pm')
  (4, 'post 4', 1, 1, '5pm')
  (5, 'post 5', 4, 1, '6pm')
  (6, 'post 6', NULL, 1, '5pm')
 
  I'd like to do a select and get them all in this order:
 
  (1, 'post 1', NULL, 1, '4pm')
  (4, 'post 4', 1, 1, '5pm')
  (5, 'post 5', 4, 1, '6pm')
  (3, 'post 3', 1, 1, '6pm')
  (6, 'post 6', NULL, 1, '5pm')
  (2, 'post 2', NULL, 2, '8pm')
 
  And reverse sorted would be:
 
  (2, 'post 2', NULL, 2, '8pm')
  (6, 'post 6', NULL, 1, '5pm')
  (1, 'post 1', NULL, 1, '4pm')
  (3, 'post 3', 1, 1, '6pm')
  (4, 'post 4', 1, 1, '5pm')
  (5, 'post 5', 4, 1, '6pm')
 


 SELECT * FROM posts ORDER BY root_id, id;

 brian

 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

http://archives.postgresql.org/


Okay, but what if I want to order by created_at?

btw created_at is a timestamp, I just wrote '4pm' to make it a bit
easier to read.

Pat

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] cases in rules problem

2007-10-26 Thread Darnell Brawner

OK, I gave up on keeping this purely in rules, so I did this:

CREATE OR REPLACE RULE version AS ON UPDATE TO vmain
DO INSTEAD
    SELECT version2(OLD.id, NEW.title, OLD.parent_id, NEW.public, OLD.public);


CREATE OR REPLACE FUNCTION version2(int,varchar,int,bool,bool)
RETURNS VOID
LANGUAGE 'plpgsql' AS '
DECLARE
_id ALIAS FOR $1;
_title  ALIAS FOR $2;
_parent_id  ALIAS FOR $3;
_n_public   ALIAS FOR $4;
_o_public   ALIAS FOR $5;
BEGIN
IF _n_public <> true or _o_public <> false
THEN update main set public=true where id=_id;
ELSE
 INSERT INTO main(parent_id,title,public) VALUES 
(_parent_id,_title,false);

END IF;
RETURN;
END
';

But when I run

update vmain set title='tah4' where id=6

which in theory should insert a new row with parent_id=2,
title='tah4' and public = false,

instead I get this error:

[UPDATE - 0 row(s), 0.003 secs]  [Error Code: 0, SQL State: 23505]
ERROR: duplicate key violates unique constraint "firstkey"



On Oct 26, 2007, at 12:04 PM, Erik Jones wrote:


On Oct 26, 2007, at 10:28 AM, Darnell Brawner wrote:


I am trying to make an SQL-based versioning system.
I am working on a Ruby on Rails project and am using a plugin called
Hobo. The plugin can do some nice things, but overall it's lame;
still, it's what I've got to work with.
The problem is that Hobo does a lot of work for you, but the database
must be in a standard format to use it.

So my idea for an SQL versioning workaround was this:

CREATE TABLE main(
id serial CONSTRAINT firstkey PRIMARY KEY,
parent_id int,
title varchar(30),
public boolean default false
);

INSERT INTO main(parent_id,title,public)
VALUES
(1,'blah',true),
(1,'tah',false),
(1,'blah2',false),
(1,'blah3',false),
(2,'tah2',false),
(2,'tah3',true);

CREATE VIEW  vmain as
(SELECT * FROM main
   WHERE public=true
   ORDER BY id DESC)
   UNION
(SELECT *
   FROM main
   WHERE id IN (select max(id) from main group by parent_id)
   ORDER BY id DESC)

CREATE OR REPLACE RULE main_up AS ON UPDATE TO vmain
DO INSTEAD
INSERT INTO main(parent_id,title,public)
VALUES(NEW.parent_id,NEW.title,false);

The result of the view should be all rows with public true, plus,
for each parent_id, the one false row (if any) that has a higher
id than the true one.


So on the web server, someone of level writer can edit something a
superuser has created; the update goes into the view, hits the rule,
and makes a duplicate in the main table with public set to false, so
no one on the outside can see it. At most, the rows that show up will
be the public one and the highest-id private one; I don't really care
about rolling back versions.


My problem is when the admin wants to approve the private row.  I
tried:

CREATE OR REPLACE RULE main_up AS ON UPDATE TO vmain
DO INSTEAD
CASE NEW.public = true and OLD.public = false
THEN
UPDATE main set public=true where id=NEW.id
ELSE
INSERT INTO main(parent_id,title,public)
VALUES(NEW.parent_id,NEW.title,false);

But I can't seem to put CASE statements in a rule. Is there any way
I can do this without having to create a function and a rule that
fires it?

This has to go on a lot of tables.


The problem here is that CASE statements go in queries, not around
them.  That leaves two options: either create two rules, one for
each case, or go ahead and create a function that gets fired by
either a rule or a trigger.  As far as managing the trigger on a
lot of tables, you can script that, and I think you'll find that
easier to manage than multiple rules on each table.
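
A rough sketch of the two-rules route (untested; note that a view also
needs an unconditional INSTEAD rule, hence the DO INSTEAD NOTHING):

CREATE RULE vmain_upd_default AS ON UPDATE TO vmain
    DO INSTEAD NOTHING;

CREATE RULE vmain_upd_approve AS ON UPDATE TO vmain
    WHERE NEW.public AND NOT OLD.public
    DO INSTEAD
    -- the admin is approving the private row
    UPDATE main SET public = true WHERE id = OLD.id;

CREATE RULE vmain_upd_version AS ON UPDATE TO vmain
    WHERE NOT (NEW.public AND NOT OLD.public)
    DO INSTEAD
    -- any other edit creates a new private version
    INSERT INTO main (parent_id, title, public)
    VALUES (OLD.parent_id, NEW.title, false);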


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---(end of  
broadcast)---

TIP 2: Don't 'kill -9' the postmaster



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] change format of logging statements?

2007-10-26 Thread Tom Lane
Matthew Hixson [EMAIL PROTECTED] writes:
 Instead of:
 DEBUG: insert into foo (name) values ($1);
 DETAIL: parameters: $1 = 'stan'
 I'd like to see:
 DEBUG: insert into foo (name) values ('stan');

Don't hold your breath.  That would require a great deal more smarts
in the logging code (and a great deal more cycles expended) than it
has now.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Selecting tree data

2007-10-26 Thread brian



Pat Maddox wrote:

On 10/26/07, brian [EMAIL PROTECTED] wrote:






SELECT * FROM posts ORDER BY root_id, id;

brian

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/




Okay, but what if I want to order by created_at?

btw created_at is a timestamp, I just wrote '4pm' to make it a bit
easier to read.



SELECT * FROM posts ORDER BY created_at, root_id, id;

brian

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Selecting K random rows - efficiently!

2007-10-26 Thread cluster
All you're doing is picking random =subsequences= from the same 
permutation of the original data.


You have some good points in your reply. I am very much aware of the 
non-random behavior you point out for the static random-value column 
approach, but at least it is fast, which is a requirement. :-(
However, if the lifetime of the individual rows is short, the 
behaviour is, luckily, sufficiently random for my specific purpose.


I furthermore realize that the only way to get truly random samples is 
to ORDER BY random(), but this is an unacceptably slow method for large 
data sets.
Even though it is not trivial at all, there ARE indeed algorithms out 
there [1,2] for picking random subsets from a result set, but these are 
(sadly) not implemented in postgresql.
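
For reference, the static random-value column approach discussed above
looks roughly like this (a sketch; table and column names are
hypothetical):

ALTER TABLE items ADD COLUMN rnd double precision;
UPDATE items SET rnd = random();
CREATE INDEX items_rnd_idx ON items (rnd);

-- pick ~10 rows starting at a random point in the fixed permutation;
-- the subselect is evaluated once, not once per row. Near the top of
-- the range this can return fewer than 10 rows.
SELECT *
  FROM items
 WHERE rnd >= (SELECT random())
 ORDER BY rnd
 LIMIT 10;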



References:
[1] http://portal.acm.org/citation.cfm?id=304206

[2] http://compstat.chonbuk.ac.kr/Sisyphus/CurrentStudy/Sampling/vldb86.pdf

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit

2007-10-26 Thread Rainer Bauer
Dave Page wrote:

 --- Original Message ---
 From: Rainer Bauer [EMAIL PROTECTED]
 To: pgsql-general@postgresql.org
 Sent: 26/10/07, 18:09:26
 Subject: Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit
 
 Dave could you add that it's the third parameter of the SharedSection 
 string
 that must be changed. I read that KB article, but still had to find the
 correct one by trial and error, which required a reboot every time.

Err, it does say that:

"You can increase the non-interactive Desktop Heap by modifying the third
SharedSection value in the registry as described in this Microsoft
Knowledgebase article."

Must have overlooked that part. Sorry for the noise.

Rainer

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] change format of logging statements?

2007-10-26 Thread Alvaro Herrera
Tom Lane wrote:
 Matthew Hixson [EMAIL PROTECTED] writes:
  Instead of:
  DEBUG: insert into foo (name) values ($1);
  DETAIL: parameters: $1 = 'stan'
  I'd like to see:
  DEBUG: insert into foo (name) values ('stan');
 
 Don't hold your breath.  That would require a great deal more smarts
 in the logging code (and a great deal more cycles expended) than it
 has now.

That said, you can use explain on these things, though you must do a bit
more work:

alvherre=# prepare foo as insert into foo (name) values ($1);
PREPARE
alvherre=# explain execute foo('stan');
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=0)
(1 ligne)

The benefit is that this will use the same plan that Hibernate would be
using, whereas simply expanding the literal in the query would possibly not.
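
The same trick extends to EXPLAIN ANALYZE when you want actual run
times; note that EXPLAIN ANALYZE really executes the INSERT, so wrap it
in a transaction you can roll back:

BEGIN;
EXPLAIN ANALYZE EXECUTE foo('stan');
ROLLBACK;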

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Geographic data sources, queries and questions

2007-10-26 Thread Chuck D.
On May 24, 2007 01:02:42 pm John D. Burger wrote:
 Tilmann Singer wrote:
  We are using this data which seems to be fairly extensive and
  accurate, and is free:
 
  http://earth-info.nga.mil/gns/html/gis_countryfiles.htm

 We use that, but it is only non-US, so we combine it with this:

http://geonames.usgs.gov/domestic/download_data.htm

 We also have a hodge-podge of other sources, but those are the main
 ones.  (By the way, we have found USGS to be very amenable to dumping
 their data in arbitrary ways.  Those state files essentially try to
 fit everything into a single CSV format, but they have given us other
 custom dumps.)

 Note that both of these main sources have multiple names for the same
 location, so our schema is highly normalized - we have a separate
 table for names (so the string "Springfield" occurs in only one
 place :).  Because we are interested in all sorts of geographic
 entities, not just city/state/country, we have only a single table
 for these, with fields for type, lat/long, primary name, and a few
 other things.  All other relationships are represented in separate
 linking tables, using our internal IDs for locations and names, e.g.,
 location_has_name, location_contained_in_location, etc.  As far as
 FIPS and ISO codes are concerned, we have a separate table mapping
 (locationID, standards body) to codes.

 We are interested in sharing this stuff, so I'd be happy to pass
 along the schema and/or the data, although all of it is kind of beta.

 - John D. Burger
MITRE

It has been a while since I last wrote about this, but I am looking for 
some help.

Has anyone successfully imported any of the newer 
http://earth-info.nga.mil/gns/html/gis_countryfiles.htm world cities files?

I keep getting problems with the import using COPY.  Just when I thought I'd 
solved the problem of the ^M$ in some fields (notably China's adm2's), I've 
come across a problem with cc1=SG in which there appear to be extra ^I tabs.

Anyone have success with these?  What preprocessing needs to be done to these 
files to get them to COPY correctly?

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] find out data types using sql or php

2007-10-26 Thread Tom Hart

Hey guys. This is probably a n00b question, but here goes anyway.

I have a set of csv files that I COPY to a number of import tables 
(same field layout as live tables, but with all datatypes 'text') then 
use an INSERT INTO ... SELECT FROM statement to transfer the rows over 
to the live table (at times filtering out, at times not). Unfortunately 
if any of the data is not type perfect (letters in number fields, etc.) 
then the entire query bombs and nothing gets loaded. What I'd like to do 
is add a field is_ok and then use sql or php (or whatever else, if 
there's an easier way) to determine the field datatype (text, numeric, 
bool, etc.) and then use some regex or something along those lines to 
attempt to verify that the data is good, and then mark the is_ok field 
(obviously a bool) as true, and use is_ok = TRUE in the insert/select 
statement. Can somebody give me a push in the right direction?


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[GENERAL] WAL archiving idle database

2007-10-26 Thread Brian Wipf
I have a test PG 8.2.5 installation that has been left idle with no  
connections to it whatsoever for the last 24 hours plus. WALs are  
being archived exactly 5 minutes apart, even though archive_timeout  
is set to 60. Is this the expected behavior for a database with no  
changes?


Brian


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] WAL archiving idle database

2007-10-26 Thread Jeff Davis
On Fri, 2007-10-26 at 16:31 -0500, Erik Jones wrote:
 In the absence of activity, WAL are archived every checkpoint_timeout  
 seconds.  archive_timeout is how long postgres will wait for the  
 archive_command to return before declaring it failed.
 

http://www.postgresql.org/docs/current/static/runtime-config-wal.html

"When this parameter is greater than zero, the server will switch to a
new segment file whenever this many seconds have elapsed since the last
segment file switch."

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] WAL archiving idle database

2007-10-26 Thread Erik Jones

On Oct 26, 2007, at 4:08 PM, Brian Wipf wrote:

I have a test PG 8.2.5 installation that has been left idle with no  
connections to it whatsoever for the last 24 hours plus. WALs are  
being archived exactly 5 minutes apart, even though archive_timeout  
is set to 60. Is this the expected behavior for a database with no  
changes?


Brian


In the absence of activity, WAL are archived every checkpoint_timeout  
seconds.  archive_timeout is how long postgres will wait for the  
archive_command to return before declaring it failed.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] WAL archiving idle database

2007-10-26 Thread Jeff Davis
On Fri, 2007-10-26 at 15:08 -0600, Brian Wipf wrote:
 I have a test PG 8.2.5 installation that has been left idle with no  
 connections to it whatsoever for the last 24 hours plus. WALs are  
 being archived exactly 5 minutes apart, even though archive_timeout  
 is set to 60. Is this the expected behavior for a database with no  
 changes?
 

If it's set to just 60 that means 60 seconds.

What's happening is that you have a checkpoint_timeout of 5 minutes, and
that checkpoint must write a checkpoint record in the WAL, prompting the
archiving.

If you want it to happen less frequently, it's often safe to have
checkpoint timeout set to something larger by a reasonable amount. 

Anyone using a checkpoint_timeout is going to end up with quite a few
mostly-empty 16MB files to deal with. Someone wrote a utility to zero
out the empty space in WAL segments; you might look at
pg_clearxlogtail written by Kevin Grittner (search the archives or
pgfoundry). This allows you to gzip the files to basically nothing.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] find out data types using sql or php

2007-10-26 Thread Tom Hart

Replying to yourself is so depressing...

Anyway, I managed to google myself into a solution; I just wanted to 
share it with the list in case anybody else was interested.


Using the INFORMATION SCHEMA and a query like
SELECT column_name, data_type FROM information_schema.columns WHERE 
table_name = 'table';


I get results similar to

    column_name    | data_type
-------------------+-----------
 atm_acct_mess     | text
 atm_acct_num      | numeric
 atm_acct_tp1      | text
 atm_acct_tp2      | text
 atm_acct_tp3      | text
 atm_acct_tp4      | text
 atm_acct_tp5      | text
 atm_acct_tp6      | text
 atm_acct1_stcd    | text
 atm_acct2_stcd    | text
 atm_acct3_stcd    | text
 atm_acct4_stcd    | text
 atm_acct5_stcd    | text
 atm_acct6_stcd    | text
 atm_atm/ach_cd    | integer
 atm_atm/ach_id    | numeric
 atm_atm/ach_tp    | integer
 atm_cn_num        | integer
 atm_date_opened   | date
 atm_id1           | text
 atm_id2           | text
 atm_id3           | text
 atm_id4           | text
 atm_id5           | text
 atm_id6           | text
 atm_last_act_date | date
 atm_next_rec      | integer
 atm_stat_cd       | integer
 atm_trn_acct_id   | text
 atm_trn_acct_num  | numeric
 atm_trn_acct_tp   | text
 atm_trn_cn_num    | integer
 atm_trn_date      | date
 atm_trn_reg_e     | integer
 atm_trn_term_id   | text
 atm_trn_trace     | text
 atm_trn_trn_num   | integer
(37 rows)


Which I can then of course parse with php and do some testing from there.

I hope this helps somebody; I know I could have used this information 
about 20 minutes ago :-)
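
For the is_ok idea itself, a minimal sketch (untested; the staging and
live table names are hypothetical, and the patterns only cover plain
integer/numeric syntax):

ALTER TABLE import_atm ADD COLUMN is_ok boolean NOT NULL DEFAULT true;

UPDATE import_atm
   SET is_ok = false
 WHERE atm_cn_num   !~ '^-?[0-9]+$'               -- integer column
    OR atm_acct_num !~ '^-?[0-9]+([.][0-9]+)?$';  -- numeric column

-- then transfer only the clean rows, casting as you go:
INSERT INTO atm (atm_cn_num, atm_acct_num)
SELECT atm_cn_num::integer, atm_acct_num::numeric
  FROM import_atm
 WHERE is_ok;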


Tom Hart wrote:

Hey guys. This is probably a n00b question, but here goes anyway.

I have a set of csv files that I COPY to a number of import tables 
(same field layout as live tables, but with all datatypes 'text') then 
use an INSERT INTO ... SELECT FROM statement to transfer the rows over 
to the live table (at times filtering out, at times not). 
Unfortunately if any of the data is not type perfect (letters in 
number fields, etc.) then the entire query bombs and nothing gets 
loaded. What I'd like to do is add a field is_ok and then use sql or 
php (or whatever else, if there's an easier way) to determine the 
field datatype (text, numeric, bool, etc.) and then use some regex or 
something along those lines to attempt to verify that the data is 
good, and then mark the is_ok field (obviously a bool) as true, and 
use is_ok = TRUE in the insert/select statement. Can somebody give me 
a push in the right direction?


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] find out data types using sql or php

2007-10-26 Thread Webb Sprague
 ... to determine the field datatype (text, numeric,
 bool, etc.)

I am not sure if this helps, but you can dig around in the system
files (pg_catalog.*), and probably write a query that gets the types
of every column in the data table you want to insert to.

Not a big push, but maybe it will get you started.  I don't have links
at hand -- sorry.

 and then use some regex or something along those lines to
 attempt to verify that the data is good, and then mark the is_ok field
 (obviously a bool) as true, and use is_ok = TRUE in the insert/select
 statement. Can somebody give me a push in the right direction?

This sounds like a lot of programming, but you have the right
idea.  I am not sure how you would use is_ok within SQL; I would
probably only try to insert data that is ok, and filter that in the
application.

I might also think in terms of wrapping everything in a transaction,
assuming it is all good, and then rolling back and catching your
exception and giving the user a decent error -- such as "reformat your
spreadsheet, doofus, and try uploading again"...  Very doable with
psycopg2 and python.
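
Inside the database, the row-at-a-time version of that idea looks
roughly like this (a sketch; table names are hypothetical, and each
row gets its own subtransaction via the exception block):

CREATE OR REPLACE FUNCTION load_clean_rows() RETURNS integer AS $$
DECLARE
    r        record;
    rejected integer := 0;
BEGIN
    FOR r IN SELECT * FROM import_atm LOOP
        BEGIN
            INSERT INTO atm (atm_cn_num, atm_acct_num)
            VALUES (r.atm_cn_num::integer, r.atm_acct_num::numeric);
        EXCEPTION WHEN others THEN
            rejected := rejected + 1;  -- bad row: skip it, keep going
        END;
    END LOOP;
    RETURN rejected;  -- how many rows failed the cast
END;
$$ LANGUAGE plpgsql;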

-W

 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] WAL archiving idle database

2007-10-26 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 What's happening is that you have a checkpoint_timeout of 5 minutes, and
 that checkpoint must write a checkpoint record in the WAL, prompting the
 archiving.

 If you want it to happen less frequently, it's often safe to have
 checkpoint timeout set to something larger by a reasonable amount. 

I think you're confusing checkpoint_timeout and archive_timeout...

I seem to recall this behavior having been discussed before, but I
can't find it in the archives right now.  What is happening is that
after each checkpoint_timeout, we test to see if we need to write
a new checkpoint; which is determined by whether anything's been
inserted into WAL since the start of the last checkpoint.  And after
each archive_timeout, we test to see if we need to flush the current WAL
segment out to the archive; which is determined by whether the write
pointer is currently exactly at the start of a segment or not.

Which would be fine, except that the "has anything been inserted
since last checkpoint" test is actually done by seeing if the WAL
insert pointer has moved.  Which it will have, if we did an archive
flush.  And that means that each of these activities makes it look
to the other one like something has happened, and so you get a
checkpoint record every checkpoint_timeout seconds, and then we flush
the entire WAL file (containing only that record), even if the database
is in reality completely idle.  Lather, rinse, repeat.

In the prior discussion that I seem to remember, we didn't think of
a decent solution, and it kinda fell off the radar since zero-activity
isn't too interesting to a lot of folks.  However, chewing on it again
I think I've come up with a good idea that will fix this and actually
simplify the code a bit:

* Add a boolean flag insertedXLog to XLogCtlInsert, which means "at
least one WAL record has been inserted since start of last checkpoint".
Also add a flag completedCkpt somewhere in XLogCtlData, which means
"checkpoint successfully completed"; this second flag is only used by
checkpoint so it can be considered as being protected by the
CheckpointLock.  At startup we can initialize insertedXLog = false,
completedCkpt = true.

* XLogInsert sets insertedXLog to true while holding WALInsertLock,
*except* when inserting either a checkpoint record or an xlog switch
record; in those cases it doesn't change the flag.

* CreateCheckpoint replaces its current rather complex test (lines
5693-5703 in CVS-tip xlog.c) with "if insertedXLog is clear and
completedCkpt is set, we need not checkpoint".  If it does have
to perform a checkpoint, it clears both flags before releasing
WALInsertLock.

* After successful completion of a checkpoint, completedCkpt gets set.

Because insertedXLog is cleared at the same time the checkpoint's REDO
pointer is determined, this will correctly implement the requirement of
detecting whether anything has been inserted since the last REDO point.
This replaces the current indirect test involving comparing the last
checkpoint's REDO pointer to its own address.  However we have to not
set insertedXLog when we finally do insert the checkpoint record, thus
the special case is needed in XLogInsert.  The other special case of
ignoring xlog switch is what's needed to fix the bug, and is obviously
OK because an xlog switch doesn't represent a checkpointable change.

The reason we need the completedCkpt flag is that if a checkpoint fails
partway through, it would nonetheless have cleared insertedXLog, and
we don't want that to cause us to not retry the checkpoint next time.

This is slightly warty but it certainly seems a lot clearer than the
current test in lines 5693-5703.  The couple of lines to be added to
XLogInsert should have negligible performance impact.

Comments?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] find out data types using sql or php

2007-10-26 Thread Michael Glaesemann


On Oct 26, 2007, at 16:06 , Tom Hart wrote:

What I'd like to do is add a field is_ok and then use sql or php  
(or whatever else, if there's an easier way) to determine the field  
datatype (text, numeric, bool, etc.) and then use some regex or  
something along those lines to attempt to verify that the data is  
good, and then mark the is_ok field (obviously a bool) as true, and  
use is_ok = TRUE in the insert/select statement.


This is one way to do it. Another option would be to process the csv  
file beforehand using your favorite scripting language and COPY into  
tables with the appropriate, expected datatypes. You know you haven't  
processed the text file adequately if the COPY fails.


If you're going to do the cleaning inside the database (i.e., all
text fields), I'd stay inside the database using SQL and PL functions
rather than connecting to the database via some external script,
processing the data, and sticking it back in: that's just added
overhead, and if the process is automated enough that you can write a
script to handle it, pretty much anything you can do in an external
script you can do inside the database without the overhead of
round-tripping out and back.


However, if the cleaning is going to take significant user  
interaction, you might consider using a simple web app that would  
connect to the database so a user could view and clean the data. I  
did this with some success for manually checking if names in an  
imported file were properly split along given and family name lines.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] WAL archiving idle database

2007-10-26 Thread Kevin Grittner
 On Fri, Oct 26, 2007 at  5:47 PM, in message [EMAIL PROTECTED],
Tom Lane [EMAIL PROTECTED] wrote: 
 
 And after
 each archive_timeout, we test to see if we need to flush the current WAL
 segment out to the archive; which is determined by whether the write
 pointer is currently exactly at the start of a segment or not.
 
Hmmm...  We would actually prefer to get the WAL file at the
specified interval.  We have software to ensure that the warm
standby instances are not getting stale, and that's pretty simple
with the current behavior.  We don't have a bandwidth or storage
space issue because we zero out the unused portion of the WAL file
and gzip it -- an empty file's about 16 KB.  Checking that the whole
system is healthy gets a lot more complicated if we stop sending
empty WAL files.
 
Could this at least be a configurable option?
 
-Kevin
 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [HACKERS] [GENERAL] WAL archiving idle database

2007-10-26 Thread Tom Lane
Kevin Grittner [EMAIL PROTECTED] writes:
 On Fri, Oct 26, 2007 at  5:47 PM, in message [EMAIL PROTECTED],
 Tom Lane [EMAIL PROTECTED] wrote: 
 And after
 each archive_timeout, we test to see if we need to flush the current WAL
 segment out to the archive; which is determined by whether the write
 pointer is currently exactly at the start of a segment or not.
 
 Hmmm...  We would actually prefer to get the WAL file at the
 specified interval.

Well, if it's a feature not a bug, that's fine with me.  I wonder though
how predictable the behavior will really be with 8.3's distributed
checkpoints ... you might need to find another way anyhow.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] WAL archiving idle database

2007-10-26 Thread Jeff Davis
On Fri, 2007-10-26 at 18:06 -0500, Kevin Grittner wrote:
  On Fri, Oct 26, 2007 at  5:47 PM, in message [EMAIL PROTECTED],
 Tom Lane [EMAIL PROTECTED] wrote: 
  
  And after
  each archive_timeout, we test to see if we need to flush the current WAL
  segment out to the archive; which is determined by whether the write
  pointer is currently exactly at the start of a segment or not.
  
 Hmmm...  We would actually prefer to get the WAL file at the
 specified interval.  We have software to ensure that the warm
 standby instances are not getting stale, and that's pretty simple
 with the current behavior.  We don't have a bandwidth or storage
 space issue because we zero out the unused portion of the WAL file
 and gzip it -- an empty file's about 16 KB.  Checking that the whole
 system is healthy gets a lot more complicated if we stop sending
 empty WAL files.
  
 Could this at least be a configurable option?
  

A good point.

Keep in mind that even in the current system, your configuration is
variable based on the checkpoint_timeout setting.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] WAL archiving idle database

2007-10-26 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 Keep in mind that even in the current system, your configuration is
 variable based on the checkpoint_timeout setting.

Yeah, and he has to keep this less than archive_timeout in order for
it to work the way he wants, which is probably not good for performance.
(Sane settings of checkpoint_timeout are probably higher, not lower,
than what people are likely to use for archive_timeout.)

I think my recommendation to Kevin would be to force some trivial
transaction to occur a little before each expected archive_timeout,
so that there will be something to be archived.  This would have the
additional advantage that the monitor is checking that the database is
actually responding to queries, whereas just noting that it's spitting
out WAL files doesn't really prove that --- especially not if mere
no-op checkpoints can cause WAL files to be emitted.
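
A minimal sketch of such a keep-alive, assuming a hypothetical
one-row heartbeat table:

-- one-time setup:
CREATE TABLE archive_heartbeat (beat timestamptz NOT NULL);
INSERT INTO archive_heartbeat VALUES (now());

-- run from cron shortly before each archive_timeout expires; this both
-- forces WAL to be written and proves the database answers queries:
UPDATE archive_heartbeat SET beat = now();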

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] WAL archiving idle database

2007-10-26 Thread Jeff Davis
On Fri, 2007-10-26 at 18:47 -0400, Tom Lane wrote:
 Jeff Davis [EMAIL PROTECTED] writes:
  What's happening is that you have a checkpoint_timeout of 5 minutes, and
  that checkpoint must write a checkpoint record in the WAL, prompting the
  archiving.
 
  If you want it to happen less frequently, it's often safe to have
  checkpoint timeout set to something larger by a reasonable amount. 
 
 I think you're confusing checkpoint_timeout and archive_timeout...

Thanks for clarifying it. The user-visible behavior, as I understand it,
is that the time between archiving on an idle database is:
MAX(archive_timeout,checkpoint_timeout)

[ of course, there's no guarantee that the archive_command succeeds in
that time ]

It looks like checkpoint_timeout was the limiting factor, in his case.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] WAL archiving idle database

2007-10-26 Thread Jeff Davis
On Fri, 2007-10-26 at 18:06 -0500, Kevin Grittner wrote:
 Hmmm...  We would actually prefer to get the WAL file at the
 specified interval.  We have software to ensure that the warm
 standby instances are not getting stale, and that's pretty simple
 with the current behavior.  We don't have a bandwidth or storage

Another thought: when you say it's pretty simple, what do you do now?
My monitoring scripts for this particular situation employ some pretty
ugly code.

I think if this did get changed, I would change my script to monitor the
pg_current_xlog_location() of the primary database and compare to the
last restored log file... entry in the standby database's log.

I would think if the current location does not end in all zeros, you
should expect a new WAL segment to be archived soon. Although this
assumes that an idle database would not advance that location at all,
and I'm still trying to understand Tom's proposal well enough to know
whether that would be true or not.
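
A minimal sketch of that check, run on the primary
(pg_current_xlog_location() is available in 8.2):

SELECT pg_current_xlog_location();
-- returns e.g. '0/5A000000'; if the location does not sit exactly on
-- a 16MB segment boundary, WAL has been written since the last switch
-- and a newly archived segment should show up soon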

If this doesn't get changed, I think we should archive every
archive_timeout seconds, rather than 
MAX(archive_timeout,checkpoint_timeout), which is less obvious.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] WAL archiving idle database

2007-10-26 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 I would think if the current location does not end in all zeros, you
 should expect a new WAL segment to be archived soon. Although this
 assumes that an idle database would not advance that location at all,
 and I'm still trying to understand Tom's proposal well enough to know
 whether that would be true or not.

With my proposal, after the last activity, you'd get a checkpoint, and
then at the next archive_timeout we'd advance the pointer to a segment
boundary and archive the old segment, and then nothing more would happen
until the next WAL-loggable update.  So yeah, the master's
pg_current_xlog_location could be expected to sit at a segment boundary
while it was idle.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] WAL archiving idle database

2007-10-26 Thread Erik Jones


On Oct 26, 2007, at 4:46 PM, Jeff Davis wrote:


On Fri, 2007-10-26 at 16:31 -0500, Erik Jones wrote:

In the absence of activity, WAL are archived every checkpoint_timeout
seconds.  archive_timeout is how long postgres will wait for the
archive_command to return before declaring it failed.



http://www.postgresql.org/docs/current/static/runtime-config-wal.html

"When this parameter is greater than zero, the server will switch to a
new segment file whenever this many seconds have elapsed since the last
segment file switch."

Regards,
Jeff Davis


Ah, my bad :)

Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Selecting tree data

2007-10-26 Thread Pat Maddox
On 10/26/07, brian [EMAIL PROTECTED] wrote:


 Pat Maddox wrote:
  On 10/26/07, brian [EMAIL PROTECTED] wrote:
 
 
 
 
 SELECT * FROM posts ORDER BY root_id, id;
 
 brian
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org/
 
 
 
  Okay, but what if I want to order by created_at?
 
  btw created_at is a timestamp, I just wrote '4pm' to make it a bit
  easier to read.
 

 SELECT * FROM posts ORDER BY created_at, root_id, id;

That doesn't work because it just sorts by created_at, and then if two
records are the same it goes to root_id.  That's not what I want.

Pat
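
One way to get the threaded order without recursive queries is a
materialized sort key: each row stores its parent's path plus its own
fixed-width timestamp, so a plain ORDER BY walks the tree. A sketch
(untested; the sort_path column and trigger are hypothetical, and
existing rows would need a one-time backfill):

ALTER TABLE posts ADD COLUMN sort_path text;

CREATE OR REPLACE FUNCTION posts_set_sort_path() RETURNS trigger AS $$
BEGIN
    IF NEW.parent_id IS NULL THEN
        NEW.sort_path := to_char(NEW.created_at, 'YYYYMMDDHH24MISSUS');
    ELSE
        SELECT p.sort_path || '.' ||
               to_char(NEW.created_at, 'YYYYMMDDHH24MISSUS')
          INTO NEW.sort_path
          FROM posts p
         WHERE p.id = NEW.parent_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_sort_path BEFORE INSERT ON posts
    FOR EACH ROW EXECUTE PROCEDURE posts_set_sort_path();

-- threaded, oldest thread first (matches the first listing above):
SELECT * FROM posts ORDER BY sort_path;

The reverse listing (newest siblings first, but replies still under
their parents) needs a descending key at each level, e.g. an inverted
timestamp in the path, so a simple ORDER BY sort_path DESC is not
enough.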

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match