Re: [PERFORM] Analysis Function

2010-06-14 Thread Magnus Hagander
On Sun, Jun 13, 2010 at 17:58, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Sun, Jun 13, 2010 at 17:42, Tom Lane t...@sss.pgh.pa.us wrote:
 ... (We presumably want
 timezone to default to the system timezone setting, but I wonder how
 we should make that work --- should an empty string be treated as
 meaning that?)

 Umm. NULL could be made to mean that, or we could provide two
 different versions - one that takes TZ and one that doesn't.

 Using NULL like that seems a bit awkward: for one thing it'd mean the
 function couldn't be STRICT, and also it'd be bizarre that only this
 one argument could be null without leading to a null result.

Hmm, yeah.


 And two separate functions isn't good either.  Basically, I think it's
 important that there be a way to specify an explicit parameter value
 that behaves identically to the default.

In that case, empty string seems fairly reasonable - if you look at
the text based parsing, that's what we do if the timezone is an empty
string (meaning not specified).
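
(For illustration only, a rough sketch of the kind of signature being discussed;
the parameter names, types and defaults here are assumptions, not an agreed design:

CREATE FUNCTION construct_timestamp(
    year      int,
    month     int    DEFAULT 1,
    day       int    DEFAULT 1,
    hour      int    DEFAULT 0,
    minutes   int    DEFAULT 0,
    seconds   float8 DEFAULT 0.0,
    timezone  text   DEFAULT ''  -- empty string = use the session TimeZone setting
) RETURNS timestamp with time zone ...

With something like that, passing timezone := '' would behave identically to
omitting the argument, which is the property being asked for above.)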



 And a third, construct_time(), no?

 Yeah, maybe ... do you think there's any demand for it?

Yes, I think there is. Plus, it's for completeness :-)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [PERFORM] Analysis Function

2010-06-14 Thread Magnus Hagander
On Sun, Jun 13, 2010 at 21:19, David Jarvis thanga...@gmail.com wrote:
 Hi,

 It's not immediately obvious what the default value of timezone
 will be?

 The system's locale, like now(); documentation can clarify.

 By named parameter, I meant default value. You could construct a timestamp
 variable using:

   construct_timestamp( year := 1900, hour := 1 )

 When I read that code, the first thing I think it should return is:

   1900-01-01 01:00:00.-07

 I agree construct_timestamp( hour := 1 ) and construct_date() are errors:
 year is required.

Does it make sense to allow minutes when hours isn't specified? Or
should we simply say that for each of the date and the time part, to
specify at level n you need to have everything from the top up to
level n-1 specified? E.g. month requires year to be specified, day
requires both year and month etc?


 I prefer to_timestamp and to_date over the more verbose construct_timestamp.

Yeah, I agree with that.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[PERFORM] query tuning help

2010-06-14 Thread AI Rumman
Can anyone please help me tune this query?

explain
select *
from (select * from crmentity where deleted = 0 and createdtime between
(now() - interval '6 month') and now() ) as crmentity
inner join (select * from activity where  activitytype = 'Emails' and
date_start between (now() - interval '6 month')  and now()) as activity on
crmentity.crmid=activity.activityid
inner join emaildetails on emaildetails.emailid = crmentity.crmid
inner join vantage_email_track on
vantage_email_track.mailid=emaildetails.emailid
left join seactivityrel on seactivityrel.activityid = emaildetails.emailid



                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=8725.27..17121.20 rows=197 width=581)
   ->  Nested Loop  (cost=8725.27..16805.64 rows=7 width=573)
         ->  Hash Join  (cost=8725.27..10643.08 rows=789 width=292)
               Hash Cond: (emaildetails.emailid = public.activity.activityid)
               ->  Seq Scan on emaildetails  (cost=0.00..1686.95 rows=44595 width=186)
               ->  Hash  (cost=8664.41..8664.41 rows=4869 width=106)
                     ->  Hash Join  (cost=5288.61..8664.41 rows=4869 width=106)
                           Hash Cond: (vantage_email_track.mailid = public.activity.activityid)
                           ->  Seq Scan on vantage_email_track  (cost=0.00..1324.52 rows=88852 width=12)
                           ->  Hash  (cost=4879.22..4879.22 rows=15071 width=94)
                                 ->  Bitmap Heap Scan on activity  (cost=392.45..4879.22 rows=15071 width=94)
                                       Recheck Cond: (((activitytype)::text = 'Emails'::text) AND (date_start >= (now() - '6 mons'::interval)) AND (date_start <= now()))
                                       ->  Bitmap Index Scan on activity_activitytype_date_start_idx  (cost=0.00..388.68 rows=15071 width=0)
                                             Index Cond: (((activitytype)::text = 'Emails'::text) AND (date_start >= (now() - '6 mons'::interval)) AND (date_start <= now()))
         ->  Index Scan using crmentity_pkey on crmentity  (cost=0.00..7.80 rows=1 width=281)
               Index Cond: (public.crmentity.crmid = public.activity.activityid)
               Filter: ((public.crmentity.deleted = 0) AND (public.crmentity.createdtime <= now()) AND (public.crmentity.createdtime >= (now() - '6 mons'::interval)))
   ->  Index Scan using seactivityrel_activityid_idx on seactivityrel  (cost=0.00..39.57 rows=441 width=8)
         Index Cond: (seactivityrel.activityid = emaildetails.emailid)
(19 rows)


Re: [PERFORM] Analysis Function

2010-06-14 Thread David Jarvis

 Does it make sense to allow minutes when hours isn't specified? Or


For time, 00 seems a reasonable default for all values; clearly document the
defaults. Also, having a default makes the code simpler than the "level n
requires level n-1" rule. (Not to mention explaining it.) ;-)

SELECT to_timestamp( minutes := 19 ) -- error (year not specified)
SELECT to_timestamp( year := 2000, minutes := 19 ) -- 2000-01-01
00:19:00.-07

Dave


[PERFORM] Dead lock

2010-06-14 Thread Elias Ghanem

Hi all,
I have two databases trying to perform an update query at the same time 
on the same table in a third database, using dblink.

I'm getting a deadlock exception:
ERROR:  deadlock detected
DETAIL:  Process 27305 waits for ShareLock on transaction 55575; blocked 
by process 27304.
Process 27304 waits for ShareLock on transaction 55576; blocked by 
process 27305.

HINT:  See server log for query details.
Actually the following function is installed on two databases, DB1 and DB2. This 
function issues an update query on DB3.
When this function runs simultaneously on DB1 and DB2, it produces 
a deadlock, making one of the functions (in DB1 or DB2) stop with the 
above exception.
Is this normal? Shouldn't postgres be able to handle such situations, for 
example by letting one transaction wait until the other commits or rolls back, then 
continuing with the first transaction?
Is there a parameter that should be set in postgresql.conf to allow 
handling of concurrent transactions...?


CREATE OR REPLACE FUNCTION TEST_DB_LINK(VARCHAR)
RETURNS VOID AS'
DECLARE
C INTEGER;
P ALIAS FOR $1;
DUMMY  VARCHAR;
BEGIN
C:= 0;
LOOP
EXIT WHEN C > 15;
C:= C+1;
SELECT INTO DUMMY DBLINK_EXEC(''CONNECTION_STRING TO DB3'', 
''UPDATE IN_FICHE_PRODUIT SET VALIDE = 1'');

RAISE NOTICE ''%, %'', C,P;
END LOOP;
END;'
LANGUAGE 'plpgsql';

Thanks for your time.


Re: [PERFORM] Dead lock

2010-06-14 Thread Tom Molesworth

On 14/06/10 12:50, Elias Ghanem wrote:
SELECT INTO DUMMY DBLINK_EXEC(''CONNECTION_STRING TO DB3'', 
''UPDATE IN_FICHE_PRODUIT SET VALIDE = 1'');


If there's more than one value in that table, an explicit ORDER BY might 
help (otherwise you could get the situation where query A will update in 
the order 1,2,3 and query B will do 3,2,1 so neither will be able to get 
the requested locks until the other query has finished).
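
(A minimal sketch of that approach, assuming the table has a primary key column
named id -- the id column is an assumption, while the table and the VALIDE column
come from the function quoted above:

BEGIN;
-- take the row locks in a fixed, deterministic order first
SELECT id FROM in_fiche_produit ORDER BY id FOR UPDATE;
-- then do the actual work
UPDATE in_fiche_produit SET valide = 1;
COMMIT;

Both sessions then acquire the row locks in the same order, so one simply waits
for the other instead of deadlocking.)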


Tom




Re: [PERFORM] query tuning help

2010-06-14 Thread Andy Colson

On 06/14/2010 05:41 AM, AI Rumman wrote:

Can any one please help me in tuning the query?

explain
select *
from (select * from crmentity where deleted = 0 and createdtime between
(now() - interval '6 month') and now() ) as crmentity
inner join (select * from activity where  activitytype = 'Emails' and
date_start between (now() - interval '6 month')  and now()) as activity
on crmentity.crmid=activity.activityid
inner join emaildetails on emaildetails.emailid = crmentity.crmid
inner join vantage_email_track on
vantage_email_track.mailid=emaildetails.emailid
left join seactivityrel on seactivityrel.activityid = emaildetails.emailid



Can you send us 'explain analyze' too?


->  Seq Scan on emaildetails  (cost=0.00..1686.95 rows=44595 width=186)
->  Seq Scan on vantage_email_track  (cost=0.00..1324.52 rows=88852 width=12)


do you have indexes on emaildetails(emailid) and  vantage_email_track(mailid)?
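
If not, something along these lines might help (the index names are placeholders;
the columns come from the join conditions above):

CREATE INDEX emaildetails_emailid_idx ON emaildetails (emailid);
CREATE INDEX vantage_email_track_mailid_idx ON vantage_email_track (mailid);
ANALYZE emaildetails;
ANALYZE vantage_email_track;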

-Andy



Re: [PERFORM] Analysis Function

2010-06-14 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Sun, Jun 13, 2010 at 21:19, David Jarvis thanga...@gmail.com wrote:
 I prefer to_timestamp and to_date over the more verbose construct_timestamp.

 Yeah, I agree with that.

Those names are already taken.  It will cause confusion (of both people
and machines) if you try to overload them with this.

regards, tom lane



Re: [PERFORM] Dead lock

2010-06-14 Thread Andy Colson

On 06/14/2010 06:50 AM, Elias Ghanem wrote:

Hi all,
I have 2 data bases trying to perform an update query at the same time
on a same table in a third data base using db link.
I'm getting a dead lock exception:
ERROR: deadlock detected
DETAIL: Process 27305 waits for ShareLock on transaction 55575; blocked
by process 27304.
Process 27304 waits for ShareLock on transaction 55576; blocked by
process 27305.
HINT: See server log for query details.
Actually the folowing function is installed on 2 dbs DB1 and DB2. This
function issues an update query on DB3.
When this function is running simultaneously on DB1 and DB2, it produces
a dead lock making one of the functions (in DB1 or DB2) stop with the
above exception:
Is it normal? should'nt postgres be able to handle such situations, for
ex: let one transaction wait untill the other commits or rollback then
continue with the first transaction?
Is there a parameter that should be set in postgresql.conf to allow
handling of concurrent transaction...?

CREATE OR REPLACE FUNCTION TEST_DB_LINK(VARCHAR)
RETURNS VOID AS'
DECLARE
C INTEGER;
P ALIAS FOR $1;
DUMMY VARCHAR;
BEGIN
C:= 0;
LOOP
EXIT WHEN C > 15;
C:= C+1;
SELECT INTO DUMMY DBLINK_EXEC(''CONNECTION_STRING TO DB3'', ''UPDATE
IN_FICHE_PRODUIT SET VALIDE = 1'');
RAISE NOTICE ''%, %'', C,P;
END LOOP;
END;'
LANGUAGE 'plpgsql';

Thanks for your time.



I think PG is doing what you want.. if you think about it.  You start two transactions at the same 
time.  A transaction is defined as do this set of operations, all of which must succeed or 
fail atomicly.  One transaction cannot update the exact same row as another transaction 
because that would break the second transactions must succeed rule.


-Andy



Re: [PERFORM] query hangs

2010-06-14 Thread Kevin Grittner
AI Rumman rumman...@gmail.com wrote:
 
 [It takes a long time to return 2.6 million rows.]
 
 Any more idea, please.
 
I don't recall you telling us exactly what the environment and
connection type is in which you're trying to return this large
result set.  Any specific suggestions would depend on that
information.
 
I do wonder why you are returning 2.6 million rows.  A result set
that large is rarely useful directly (except during data conversion
or loading of some sort).  Is there any filtering or aggregation
happening on the client side with the received rows?  If so, my
first suggestion would be to make that part of the query, rather
than part of the client code.
 
 Is table partition a good solution for query optimization?
 
Table partitioning is useful in some cases, but you haven't told us
anything yet to indicate that it would help here.
 
-Kevin



[PERFORM] Fwd: Dead lock

2010-06-14 Thread Elias Ghanem

Hi,
Actually I guess the problem is related to the way PG acquires 
locks on the rows that will be updated.

Suppose the update query will affect 5 rows: A, B, C, D and E.
Apparently the following scenario is happening:
1- Transaction1 locks row A
2- Transaction2 locks row B
3- Transaction1 updates row A
4- Transaction2 updates row B
5- Transaction1 tries to acquire a lock on row B (and fails because 
row B is still locked by Transaction2)
6- Transaction2 tries to acquire a lock on row A (and fails because 
row A is still locked by Transaction1)

Hence the deadlock.
Is this a plausible explanation of what is going on?
If yes, what can be done to avoid the deadlock?
Thanks again.


 Original Message 
Subject:Dead lock
Date:   Mon, 14 Jun 2010 14:50:43 +0300
From:   Elias Ghanem e.gha...@acteos.com
To: pgsql-performance@postgresql.org



Hi all,
I have 2 data bases trying to perform an update query at the same time 
on a same table in a third data base using db link.

I'm getting a dead lock exception:
ERROR:  deadlock detected
DETAIL:  Process 27305 waits for ShareLock on transaction 55575; blocked 
by process 27304.
Process 27304 waits for ShareLock on transaction 55576; blocked by 
process 27305.

HINT:  See server log for query details.
Actually the folowing function is installed on 2 dbs DB1 and DB2. This 
function issues an update query on DB3.
When this function is running simultaneously on DB1 and DB2, it produces 
a dead lock making one of the functions (in DB1 or DB2) stop with the 
above exception:
Is it normal? should'nt postgres be able to handle such situations, for 
ex:  let one transaction wait untill the other commits or rollback then 
continue with the first transaction?
Is there a parameter that should be set in postgresql.conf to allow 
handling of concurrent transaction...?


CREATE OR REPLACE FUNCTION TEST_DB_LINK(VARCHAR)
RETURNS VOID AS'
DECLARE
C INTEGER;
P ALIAS FOR $1;
DUMMY  VARCHAR;
BEGIN
C:= 0;
LOOP
EXIT WHEN C > 15;
C:= C+1;
SELECT INTO DUMMY DBLINK_EXEC(''CONNECTION_STRING TO DB3'', 
''UPDATE IN_FICHE_PRODUIT SET VALIDE = 1'');

RAISE NOTICE ''%, %'', C,P;
END LOOP;
END;'
LANGUAGE 'plpgsql';

Thanks for your time.



Re: [PERFORM] Fwd: Dead lock

2010-06-14 Thread Dave Crooke
It's a standard (indeed, required) best practice of concurrent database
programming across any brand of database to ensure that multi-row
transactions always acquire the locks they use in a predictable order based
on row identities, e.g. for the classic banking debit-credit pair, doing
something like this (Java / JDBC, simplified for brevity and clarity):

PreparedStatement debit = conn.prepareStatement("update account set balance = balance - ? where acc_no = ? and balance >= ?");
debit.setLong(1, amount);
debit.setLong(2, debit_acct);
debit.setLong(3, amount);

PreparedStatement credit = conn.prepareStatement("update account set balance = balance + ? where acc_no = ?");
credit.setLong(1, amount);
credit.setLong(2, credit_acct);

try {
   // always acquire row locks in increasing account number order
   conn.setAutoCommit(false);   // begin the transaction
   if (credit_acct < debit_acct) {
      credit.executeUpdate();
      if (debit.executeUpdate() < 1) throw new SQLException("Insufficient balance");
   }
   else {
      if (debit.executeUpdate() < 1) throw new SQLException("Insufficient balance");
      credit.executeUpdate();
   }
}
catch (SQLException e) {
   System.err.println("Oops, transaction failed: " + e.getMessage());
   conn.rollback();
}
conn.commit();

If you're doing straight SQL bulk updates, then as someone suggested, you
could use an ORDER BY on a subquery, but I don't know if that is a
guarantee, if you're not actually displaying the results then the DB may be
technically allowed to optimize it out from underneath you. The only way to
be sure is a cursor / procedure.

In short, this boils down to learning more about database programming. PG is
performing as it should.

Cheers
Dave

On Mon, Jun 14, 2010 at 10:36 AM, Elias Ghanem e.gha...@acteos.com wrote:

  Hi,
 Actually i guess the problem is related to the way PG uses to aquire lock
 on the rows that will be updated.
 Suppose the update query will affect 5 rows: A, B, C, D and E.
 Apparently the folowing senario is happening:
 1- Transaction1 locks row A
 2- Trnasaction2 locks row B
 3- Transaction1 updates row A
 4- Tranasaction2 updates row B
 5- Transaction1 *tries *to acquire lock on row B(and fail because row
 B is still locked by transaction2)
 6- Transaction2 *tries *to acquire lock on row A(and fail because row
 A is still locked by transaction1)
 Hence the dead lock.
 Is this a plausible explanation of what is going on?
 If yes, what can be done to avoid the dead lock?
 Thanks again.



  Original Message 
 Subject: Dead lock
 Date: Mon, 14 Jun 2010 14:50:43 +0300
 From: Elias Ghanem e.gha...@acteos.com
 To: pgsql-performance@postgresql.org

 Hi all,
 I have 2 data bases trying to perform an update query at the same time on a
 same table in a third data base using db link.
 I'm getting a dead lock exception:
 ERROR:  deadlock detected
 DETAIL:  Process 27305 waits for ShareLock on transaction 55575; blocked by
 process 27304.
 Process 27304 waits for ShareLock on transaction 55576; blocked by process
 27305.
 HINT:  See server log for query details.
 Actually the folowing function is installed on 2 dbs DB1 and DB2. This
 function issues an update query on DB3.
 When this function is running simultaneously on DB1 and DB2, it produces a
 dead lock making one of the functions (in DB1 or DB2) stop with the above
 exception:
 Is it normal? should'nt postgres be able to handle such situations, for
 ex:  let one transaction wait untill the other commits or rollback then
 continue with the first transaction?
 Is there a parameter that should be set in postgresql.conf to allow
 handling of concurrent transaction...?

 CREATE OR REPLACE FUNCTION TEST_DB_LINK(VARCHAR)
 RETURNS VOID AS'
 DECLARE
 C INTEGER;
 P ALIAS FOR $1;
 DUMMY  VARCHAR;
 BEGIN
 C:= 0;
 LOOP
 EXIT WHEN C > 15;
 C:= C+1;
 SELECT INTO DUMMY DBLINK_EXEC(''CONNECTION_STRING TO DB3'', ''UPDATE
 IN_FICHE_PRODUIT SET VALIDE = 1'');
 RAISE NOTICE ''%, %'', C,P;
 END LOOP;
 END;'
 LANGUAGE 'plpgsql';

 Thanks for your time.




[PERFORM] RE: [PERFORM] Dbt2 with postgres issues on CentOS-5.3

2010-06-14 Thread MUHAMMAD ASIF

I am facing sar-related issues on Red Hat Enterprise Linux 5.4 64-bit too (60G RAM, 
no swap space, Xeon processor).

sar -o /var/dbt2_data/PG/Output/driver/dbt2-sys1/sar_raw.out 60 204
|___ sadc 60 205 -z /var/dbt2_data/PG/Output/driver/dbt2-sys1/sar_raw.out

It generates following sar data i.e.
….
….
03:52:43 AM  2.31
03:53:43 AM  2.31
03:54:43 AM  2.28
03:55:43 AM  2.31
03:56:43 AM  1.67
03:57:43 AM  0.29
03:58:43 AM  0.29
04:00:43 AM  0.30
04:04:00 AM  3.52
04:07:07 AM  0.30
04:09:36 AM  0.23
04:12:04 AM  0.36
04:14:25 AM  0.23
04:16:45 AM  0.26
04:19:10 AM  0.24
04:21:30 AM  0.38
04:23:55 AM  0.24
04:26:25 AM  0.35
04:28:48 AM  0.24
04:31:10 AM  0.27
04:33:40 AM  0.33
04:36:45 AM  0.41
04:39:12 AM  0.27
04:41:41 AM  0.26
04:44:11 AM  0.33
04:46:35 AM  0.25
04:49:06 AM  0.33
04:51:27 AM  0.27
04:53:56 AM  0.23
04:56:19 AM  0.36
04:58:43 AM  0.24
05:01:10 AM  0.35
05:03:43 AM  0.33
05:06:53 AM  0.29
05:09:25 AM  0.23
….
….

To fix this issue I have modified sysstat-9.1.2/sadc.c and replaced the 
signal-based pause (which is not real-time) with a select-based pause. That fixed 
the issue. Thanks.

---

sadc.c.patch
---
--- sadc.c.org	2010-06-14 21:44:18.0 +0500
+++ sadc.c	2010-06-14 22:52:51.693211184 +0500
@@ -33,6 +33,10 @@
 #include <sys/stat.h>
 #include <sys/utsname.h>
 
+#include <sys/types.h>
+#include <sys/time.h>
+#include <time.h>
+
 #include "version.h"
 #include "sa.h"
 #include "rd_stats.h"
@@ -792,6 +796,15 @@
 }
 }
 
+void pause_new( void )
+{
+	struct timeval tvsel;
+	tvsel.tv_sec = interval;
+	tvsel.tv_usec = 0;
+
+	select( 0, NULL, NULL, NULL, &tvsel );
+}
+
 /*
  ***
  * Main loop: Read stats from the relevant sources and display them.
@@ -899,7 +912,7 @@
 }
 
 if (count) {
-pause();
+pause_new();
 }
 
 /* Rotate activity file if necessary */
---

Best Regards,
Asif Naeem

 Date: Wed, 21 Apr 2010 18:10:35 -0700
 Subject: Re: [PERFORM] Dbt2 with postgres issues on CentOS-5.3‏
 From: mark...@gmail.com
 To: anaeem...@hotmail.com
 CC: pgsql-performance@postgresql.org
 
 2010/4/20 MUHAMMAD ASIF anaeem...@hotmail.com:
  Hi,
 
  I am using dbt2 on Linux 64 (CentOS release 5.3 (Final)) . I have compiled
  latest postgresql-8.4.3 code on the machine and run dbt2 against it. I am
  little confused about the results. I ran dbt2 with the following
  configuration i.e.
 
  DBT2 Options :
  WAREHOUSES=75
  DB_CONNECTIONS=20
  REGRESS_DURATION=1 #HOURS
  REGRESS_DURATION_SEC=$((60*60*$REGRESS_DURATION))
 
  DBT2 Command :
  ./dbt2-pgsql-create-db
  ./dbt2-pgsql-build-db -d $DBDATA -g -r -w $WAREHOUSES
  ./dbt2-run-workload -a pgsql -c $DB_CONNECTIONS -d
  $REGRESS_DURATION_SEC -w $WAREHOUSES -o $OUTPUT_DIR
  ./dbt2-pgsql-stop-db
 
  I am not able to understand the sar related graphs. Iostat, mpstat and vmstat
  results are similar but sar results are strange. I tried to explore the dbt2
  source code to find out how the graphs are drawn and why the sar results
  differ. DBT2.pm:189 reads sar.out, parses it, and assumes 1 minute of elapsed
  time between each record, i.e.
 
 That is certainly a weakness in the logic of the perl modules in
 plotting the charts accurately.  I wouldn't be surprised if the other
 stat tools suffer the same problem.
 
 Regards,
 Mark
  

Re: [PERFORM] query hangs

2010-06-14 Thread Scott Marlowe
On Thu, Jun 10, 2010 at 5:36 AM, AI Rumman rumman...@gmail.com wrote:
 I found only AccessShareLock in pg_locks during the query.
 And the query does not return data though I have been waiting for 10 mins.

 Do you have any idea ?

I have queries that run for hours.  As long as it's using CPU / IO
(use top in unix, whatever in windows to see) it's not hung, it's just
taking longer than you expected.  Those are not the same thing at all.

Seeing as how you're joining three tables with millions of rows with
no where clause, it's gonna take some time to complete.  Go grab a
sandwich, etc, come back when it's done.



Re: [PERFORM] Fwd: Dead lock

2010-06-14 Thread Merlin Moncure
On Mon, Jun 14, 2010 at 11:58 AM, Dave Crooke dcro...@gmail.com wrote:
 If you're doing straight SQL bulk updates, then as someone suggested, you 
 could use an ORDER BY on a subquery, but I don't know if that is a guarantee, 
 if you're not actually displaying the results then the DB may be technically 
 allowed to optimize it out from underneath you. The only way to be sure is a 
 cursor / procedure.

'order by' should be safe if you use SELECT...FOR UPDATE.  update
doesn't have an order by clause.   Using cursor/procedure vs a query
is not the material point; you have to make sure locks are acquired in
a regular way.

update foo set x=x where id in (select * from bar order by x) does
look dangerous.

I think:
update foo set x=x where id in (select * from bar order by x for update)
should be ok.  I don't usually do it that way.

merlin



Re: [PERFORM] requested shared memory size overflows size_t

2010-06-14 Thread Tom Wilcox


Hi Bob,

Thanks a lot. Here's my best attempt to answer your questions:

The VM is setup with a virtual disk image dynamically expanding to fill 
an allocation of 300GB on a fast, local hard drive (avg read speed = 
778MB/s ).
WAL files can have their own disk, but how significantly would this 
affect our performance?
The filesystem of the host OS is NTFS (Windows Server 2008 OS 64), the 
guest filesystem is Ext2 (Ubuntu 64).
The workload is OLAP (lots of large, complex queries on large tables run 
in sequence).


In addition, I have reconfigured my server to use more memory. Here's a 
detailed blow by blow of how I reconfigured my system to get better 
performance (for anyone who might be interested)...


In order to increase the shared memory on Ubuntu I edited the System V 
IPC values using sysctl:


sysctl -w kernel.shmmax=16106127360
sysctl -w kernel.shmall=2097152

I had some fun with permissions as I somehow managed to change the 
owner  of the postgresql.conf to root where it needed to be postgres, 
resulting in failure to start the service.. (Fixed with chown 
postgres:postgres ./data/postgresql.conf and chmod u=rwx ./data -R).


I changed the following params in my configuration file..

default_statistics_target=1
maintenance_work_mem=512MB
work_mem=512MB
shared_buffers=512MB
wal_buffers=128MB

With this config, the following command took  6,400,000ms:

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;

With plan:
Seq Scan on match_data  (cost=0.00..1392900.78 rows=32237278 width=232) 
(actual time=0.379..464270.682 rows=2961 loops=1)

Total runtime: 6398238.890 ms

With these changes to the previous config, the same command took  
5,610,000ms:


maintenance_work_mem=4GB
work_mem=4GB
shared_buffers=4GB
effective_cache_size=4GB
wal_buffers=1GB

Resulting plan:

Seq Scan on match_data  (cost=0.00..2340147.72 rows=30888572 width=232) 
(actual time=0.094..452793.430 rows=2961 loops=1)

Total runtime: 5614140.786 ms

Then I performed these changes to the postgresql.conf file:

max_connections=3
effective_cache_size=15GB
maintenance_work_mem=5GB
shared_buffers=7000MB
work_mem=5GB

And ran this query (for a quick look - can't afford the time for the 
previous tests..):


EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE match_data_id < 10;


Result:

Index Scan using match_data_pkey1 on match_data  (cost=0.00..15662.17 
rows=4490 width=232) (actual time=27.055..1908.027 rows=9 loops=1)

  Index Cond: (match_data_id < 10)
Total runtime: 25909.372 ms

I then ran EnterpriseDB's Tuner on my postgres install (for a dedicated 
machine) and got the following settings and results:


EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE match_data_id < 10;


Index Scan using match_data_pkey1 on match_data  (cost=0.00..13734.54 
rows=4495 width=232) (actual time=0.348..2928.844 rows=9 loops=1)

  Index Cond: (match_data_id < 10)
Total runtime: 1066580.293 ms

For now, I will go with the config using 7000MB shared_buffers. Any 
suggestions on how I can further optimise this config for a single 
session, 64-bit install utilising ALL of 96GB RAM. I will spend the next 
week making the case for a native install of Linux, but first we need to 
be 100% sure that is the only way to get the most out of Postgres on 
this machine.


Thanks very much. I now feel I am at a position where I can really 
explore and find the optimal configuration for my system, but would 
still appreciate any suggestions.


Cheers,
Tom

On 11/06/2010 07:25, Bob Lunney wrote:

Tom,

First off, I wouldn't use a VM if I could help it, however, sometimes you have 
to make compromises.  With a 16 Gb machine running 64-bit Ubuntu and only 
PostgreSQL, I'd start by allocating 4 Gb to shared_buffers.  That should leave 
more than enough room for the OS and file system cache.  Then I'd begin testing 
by measuring response times of representative queries with significant amounts 
of data.

Also, what is the disk setup for the box?  Filesystem?  Can WAL files have 
their own disk?  Is the workload OLTP or OLAP, or a mixture of both?  There is 
more that goes into tuning a PG server for good performance than simply 
installing the software, setting a couple of GUCs and running it.

Bob

--- On Thu, 6/10/10, Tom Wilcox hungry...@gmail.com wrote:

  

From: Tom Wilcox hungry...@gmail.com
Subject: Re: [PERFORM] requested shared memory size overflows size_t
To: Bob Lunney bob_lun...@yahoo.com
Cc: Robert Haas robertmh...@gmail.com, pgsql-performance@postgresql.org
Date: Thursday, June 10, 2010, 10:45 AM
Thanks guys. I am currently
installing Pg64 onto a Ubuntu Server 64-bit installation
running as a VM in VirtualBox with 16GB of RAM accessible.
If what you say is true then what do you suggest I do to
configure my new setup to best use the available 16GB (96GB
and native install eventually if the test goes well) of RAM
on Linux.

I was considering starting by using Enterprise DBs tuner to
see if that 

Re: [PERFORM] Large (almost 50%!) performance drop after upgrading to 8.4.4?

2010-06-14 Thread Lacey Powers

Alvaro Herrera wrote:

Excerpts from Tom Lane's message of jue jun 10 11:46:25 -0400 2010:


Yes, the folks at commandprompt need to be told about this.  Loudly.
It's a serious packaging error.


Just notified Lacey, the packager (not so loudly, though); she's working
on new packages, and apologizes for the inconvenience.



Hello Everyone,

New packages for 8.4.4 on CentOS 5.5 and RHEL 5.5 (all arches), have 
been built, and are available in the PGDG repo.


http://yum.pgsqlrpms.org/8.4/redhat/rhel-5-i386/
http://yum.pgsqlrpms.org/8.4/redhat/rhel-5-x86_64/

Output from pg_config --configure --version is below.

x86_64:

'--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' 
'--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' 
'--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' 
'--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' 
'--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' 
'--sharedstatedir=/usr/com' '--mandir=/usr/share/man' 
'--infodir=/usr/share/info' '--disable-rpath' '--with-perl' 
'--with-python' '--with-tcl' '--with-tclconfig=/usr/lib64' 
'--with-openssl' '--with-pam' '--with-krb5' '--with-gssapi' 
'--with-includes=/usr/include' '--with-libraries=/usr/lib64' 
'--enable-nls' '--enable-thread-safety' '--with-libxml' '--with-libxslt' 
'--with-ldap' '--with-system-tzdata=/usr/share/zoneinfo' 
'--sysconfdir=/etc/sysconfig/pgsql' '--datadir=/usr/share/pgsql' 
'--with-docdir=/usr/share/doc' 'build_alias=x86_64-redhat-linux-gnu' 
'host_alias=x86_64-redhat-linux-gnu' 
'target_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall 
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
--param=ssp-buffer-size=4 -m64 -mtune=generic -I/usr/include/et' 
'CPPFLAGS= -I/usr/include/et'

PostgreSQL 8.4.4

i386:

'--build=i686-redhat-linux-gnu' '--host=i686-redhat-linux-gnu' 
'--target=i386-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' 
'--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' 
'--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' 
'--libdir=/usr/lib' '--libexecdir=/usr/libexec' '--localstatedir=/var' 
'--sharedstatedir=/usr/com' '--mandir=/usr/share/man' 
'--infodir=/usr/share/info' '--disable-rpath' '--with-perl' 
'--with-python' '--with-tcl' '--with-tclconfig=/usr/lib' 
'--with-openssl' '--with-pam' '--with-krb5' '--with-gssapi' 
'--with-includes=/usr/include' '--with-libraries=/usr/lib' 
'--enable-nls' '--enable-thread-safety' '--with-libxml' '--with-libxslt' 
'--with-ldap' '--with-system-tzdata=/usr/share/zoneinfo' 
'--sysconfdir=/etc/sysconfig/pgsql' '--datadir=/usr/share/pgsql' 
'--with-docdir=/usr/share/doc' 'build_alias=i686-redhat-linux-gnu' 
'host_alias=i686-redhat-linux-gnu' 'target_alias=i386-redhat-linux-gnu' 
'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 
-mtune=generic -fasynchronous-unwind-tables -I/usr/include/et' 
'CPPFLAGS= -I/usr/include/et'

PostgreSQL 8.4.4

Again, I extend deep apologies for the inconvenience.

If there is anything further we can help with, please let us know.

Regards,

Lacey

--
Lacey Powers

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 104
PostgreSQL Replication, Consulting, Custom Development, 24x7 support




Re: [PERFORM] requested shared memory size overflows size_t

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 2:53 PM, Tom Wilcox hungry...@gmail.com wrote:
 maintenance_work_mem=4GB
 work_mem=4GB
 shared_buffers=4GB
 effective_cache_size=4GB
 wal_buffers=1GB

It's pretty easy to drive your system into swap with such a large
value for work_mem - you'd better monitor that carefully.

The default value for wal_buffers is 64kB.  I can't imagine why you'd
need to increase that by four orders of magnitude.  I'm not sure
whether it will cause you a problem or not, but you're allocating
quite a lot of shared memory that way that you might not really need.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] requested shared memory size overflows size_t

2010-06-14 Thread Dave Crooke
Tom

I always prefer to choose apps based on business needs, then the OS based on
the needs for the app.

Cynically, I often feel that the best answer to "we have a policy that says
we're only allowed to use operating system x" is to ignore the policy ...
the kind of people ignorant enough to be that blinkered are usually not
tech-savvy enough to notice when it gets flouted :-)

More seriously, is the policy "Windows only on the metal", or could you run
e.g. VMware ESX server? I/O is the area that takes the biggest hit in
virtualization, and ESX server has far less overhead loss than either
Hyper-V (which I presume you are using) or VMware Workstation for NT
kernels.

If it's a Windows-only policy, then perhaps you can run those traps in
reverse, and switch to a Windows database, i.e. Microsoft SQL Server.

Cheers
Dave

On Mon, Jun 14, 2010 at 1:53 PM, Tom Wilcox hungry...@gmail.com wrote:


 Hi Bob,

 Thanks a lot. Here's my best attempt to answer your questions:

 The VM is setup with a virtual disk image dynamically expanding to fill an
 allocation of 300GB on a fast, local hard drive (avg read speed = 778MB/s ).
 WAL files can have their own disk, but how significantly would this affect
 our performance?
 The filesystem of the host OS is NTFS (Windows Server 2008 OS 64), the
 guest filesystem is Ext2 (Ubuntu 64).
 The workload is OLAP (lots of large, complex queries on large tables run in
 sequence).

 In addition, I have reconfigured my server to use more memory. Here's a
 detailed blow by blow of how I reconfigured my system to get better
 performance (for anyone who might be interested)...

 In order to increase the shared memory on Ubuntu I edited the System V IPC
 values using sysctl:

 sysctl -w kernel.shmmax=16106127360*
 *sysctl -w kernel.shmall=2097152

 I had some fun with permissions as I somehow managed to change the owner
  of the postgresql.conf to root where it needed to be postgres, resulting in
 failure to start the service.. (Fixed with chown postgres:postgres
 ./data/postgresql.conf and chmod u=rwx ./data -R).

 I changed the following params in my configuration file..

 default_statistics_target=1
 maintenance_work_mem=512MB
 work_mem=512MB
 shared_buffers=512MB
 wal_buffers=128MB

 With this config, the following command took  6,400,000ms:

 EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;

 With plan:
 Seq Scan on match_data  (cost=0.00..1392900.78 rows=32237278 width=232)
 (actual time=0.379..464270.682 rows=2961 loops=1)
 Total runtime: 6398238.890 ms

 With these changes to the previous config, the same command took
  5,610,000ms:

 maintenance_work_mem=4GB
 work_mem=4GB
 shared_buffers=4GB
 effective_cache_size=4GB
 wal_buffers=1GB

 Resulting plan:

 Seq Scan on match_data  (cost=0.00..2340147.72 rows=30888572 width=232)
 (actual time=0.094..452793.430 rows=2961 loops=1)
 Total runtime: 5614140.786 ms

 Then I performed these changes to the postgresql.conf file:

 max_connections=3
 effective_cache_size=15GB
 maintenance_work_mem=5GB
 shared_buffers=7000MB
 work_mem=5GB

 And ran this query (for a quick look - can't afford the time for the
 previous tests..):

 EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE match_data_id 
 10;

 Result:

 Index Scan using match_data_pkey1 on match_data  (cost=0.00..15662.17
 rows=4490 width=232) (actual time=27.055..1908.027 rows=9 loops=1)
   Index Cond: (match_data_id  10)
 Total runtime: 25909.372 ms

 I then ran EntrepriseDB's Tuner on my postgres install (for a dedicated
 machine) and got the following settings and results:

 EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE match_data_id 
 10;

 Index Scan using match_data_pkey1 on match_data  (cost=0.00..13734.54
 rows=4495 width=232) (actual time=0.348..2928.844 rows=9 loops=1)
   Index Cond: (match_data_id  10)
 Total runtime: 1066580.293 ms

 For now, I will go with the config using 7000MB shared_buffers. Any
 suggestions on how I can further optimise this config for a single session,
 64-bit install utilising ALL of 96GB RAM. I will spend the next week making
 the case for a native install of Linux, but first we need to be 100% sure
 that is the only way to get the most out of Postgres on this machine.

 Thanks very much. I now feel I am at a position where I can really explore
 and find the optimal configuration for my system, but would still appreciate
 any suggestions.

 Cheers,
 Tom


 On 11/06/2010 07:25, Bob Lunney wrote:

 Tom,

 First off, I wouldn't use a VM if I could help it, however, sometimes you
 have to make compromises.  With a 16 Gb machine running 64-bit Ubuntu and
 only PostgreSQL, I'd start by allocating 4 Gb to shared_buffers.  That
 should leave more than enough room for the OS and file system cache.  Then
 I'd begin testing by measuring response times of representative queries with
 significant amounts of data.

 Also, what is the disk setup for the box?  Filesystem?  Can WAL files have
 their own 

Re: [PERFORM] requested shared memory size overflows size_t

2010-06-14 Thread Tom Wilcox

Hi Dave,

I am definitely able to switch OS if it will get the most out of 
Postgres. So it is definitely a case of choosing the OS based on the needs 
of the app, provided it is well justified.


Currently, we are running Ubuntu Server 64-bit in a VirtualBox VM.

Cheers,
Tom


Dave Crooke wrote:

Tom

I always prefer to choose apps based on business needs, then the OS 
based on the needs for the app.


Cynically, I often feel that the best answer to we have a policy that 
says we're only allowed to use operating system x is to ignore the 
policy  the kind of people ignorant enough to be that blinkered 
are usually not tech-savvy enough to notice when it gets flouted :-)


More seriously, is the policy Windows only on the metal or could you 
run e.g. VMware ESX server? I/O is the area that takes the biggest hit 
in virtualization, and ESX server has far less overhead loss than 
either Hyper-V (which I presume you are using) or VMWare Workstation 
for NT (kernels).


If it's a Windows-only policy, then perhaps you can run those traps in 
reverse, and switch to a Windows database, i.e. Microsoft SQL Server.


Cheers
Dave

On Mon, Jun 14, 2010 at 1:53 PM, Tom Wilcox hungry...@gmail.com 
mailto:hungry...@gmail.com wrote:



Hi Bob,

Thanks a lot. Here's my best attempt to answer your questions:

The VM is setup with a virtual disk image dynamically expanding to
fill an allocation of 300GB on a fast, local hard drive (avg read
speed = 778MB/s ).
WAL files can have their own disk, but how significantly would
this affect our performance?
The filesystem of the host OS is NTFS (Windows Server 2008 OS 64),
the guest filesystem is Ext2 (Ubuntu 64).
The workload is OLAP (lots of large, complex queries on large
tables run in sequence).

In addition, I have reconfigured my server to use more memory.
Here's a detailed blow by blow of how I reconfigured my system to
get better performance (for anyone who might be interested)...

In order to increase the shared memory on Ubuntu I edited the
System V IPC values using sysctl:

sysctl -w kernel.shmmax=16106127360*
*sysctl -w kernel.shmall=2097152

I had some fun with permissions as I somehow managed to change the
owner  of the postgresql.conf to root where it needed to be
postgres, resulting in failure to start the service.. (Fixed with
chown postgres:postgres ./data/postgresql.conf and chmod u=rwx
./data -R).

I changed the following params in my configuration file..

default_statistics_target=1
maintenance_work_mem=512MB
work_mem=512MB
shared_buffers=512MB
wal_buffers=128MB

With this config, the following command took  6,400,000ms:

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;

With plan:
Seq Scan on match_data  (cost=0.00..1392900.78 rows=32237278
width=232) (actual time=0.379..464270.682 rows=2961 loops=1)
Total runtime: 6398238.890 ms

With these changes to the previous config, the same command took
 5,610,000ms:

maintenance_work_mem=4GB
work_mem=4GB
shared_buffers=4GB
effective_cache_size=4GB
wal_buffers=1GB

Resulting plan:

Seq Scan on match_data  (cost=0.00..2340147.72 rows=30888572
width=232) (actual time=0.094..452793.430 rows=2961 loops=1)
Total runtime: 5614140.786 ms

Then I performed these changes to the postgresql.conf file:

max_connections=3
effective_cache_size=15GB
maintenance_work_mem=5GB
shared_buffers=7000MB
work_mem=5GB

And ran this query (for a quick look - can't afford the time for
the previous tests..):

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE
match_data_id  10;

Result:

Index Scan using match_data_pkey1 on match_data
 (cost=0.00..15662.17 rows=4490 width=232) (actual
time=27.055..1908.027 rows=9 loops=1)
  Index Cond: (match_data_id  10)
Total runtime: 25909.372 ms

I then ran EntrepriseDB's Tuner on my postgres install (for a
dedicated machine) and got the following settings and results:

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE
match_data_id  10;

Index Scan using match_data_pkey1 on match_data
 (cost=0.00..13734.54 rows=4495 width=232) (actual
time=0.348..2928.844 rows=9 loops=1)
  Index Cond: (match_data_id  10)
Total runtime: 1066580.293 ms

For now, I will go with the config using 7000MB shared_buffers.
Any suggestions on how I can further optimise this config for a
single session, 64-bit install utilising ALL of 96GB RAM. I will
spend the next week making the case for a native install of Linux,
but first we need to be 100% sure that is the only way to get the
most out of Postgres on this machine.

Thanks very much. I now feel I am at a position where I can really
explore and find the optimal configuration for my system, but
would still 

Re: [PERFORM] requested shared memory size overflows size_t

2010-06-14 Thread Tom Wilcox

Thanks a lot Dave,

That's exactly the kind of answer I can use to justify the OS switch. 
Motivation for the previous setup was based on the fact that we will use 
the same machine for other projects that will use SQL Server and most of 
our experience lies within the MS domain. However, these projects are 
not a high priority currently and therefore I have been focusing on the 
best solution for a Postgres-focused setup.


This does however mean that I will need to have the other projects 
running in a VM on Linux. However, they are less demanding in terms of 
resources.


Cheers,
Tom

Dave Crooke wrote:
With that clarification, I stand squarely behind what others are 
saying ... if performance is important to you, then you should always 
run databases on dedicated hardware, with the OS running on bare metal 
with no virtualization. VirtualBox has even more I/O losses than 
Hyper-V. It's simply not designed for this, and you're giving away a 
ton of performance.


If nothing else, my confusion should indicate to you how 
unconventional and poorly performing this virtualized setup is ... I 
simply assumed that the only plausible reason you were piggybacking on 
virtualization on Windows was a mandated lack of alternative options.


Reload the hardware with an OS which PGSQL supports well, and get rid 
of the VirtualBox and Windows layers. If you have hardware that only 
Windows supports well, then you may need to make some hardware changes.


I haven't said anything about which Unix-like OS ... you may find 
people arguing passionately for BSD vs. Linux ... however, the 
difference between these is negligible compared to virtualized vs. 
real system, and at this point considerations like support base, ease 
of use and familiarity also come into play.


IMHO Ubuntu would be a fine choice, and PGSQL is a first-class 
supported package from the distributor ... however, at customer sites, 
I've typically used Red Hat AS because they have a corporate 
preference for it, even though it is less convenient to install and 
manage.


On Mon, Jun 14, 2010 at 7:41 PM, Tom Wilcox hungry...@gmail.com 
mailto:hungry...@gmail.com wrote:


Hi Dave,

I am definitely able to switch OS if it will get the most out of
Postgres. So it is definitely a case of choosing the OS on the
needs if the app providing it is well justified.

Currently, we are running Ubuntu Server 64-bit in a VirtualBox VM.

Cheers,
Tom


Dave Crooke wrote:

Tom

I always prefer to choose apps based on business needs, then
the OS based on the needs for the app.

Cynically, I often feel that the best answer to we have a
policy that says we're only allowed to use operating system x
is to ignore the policy  the kind of people ignorant
enough to be that blinkered are usually not tech-savvy enough
to notice when it gets flouted :-)

More seriously, is the policy Windows only on the metal or
could you run e.g. VMware ESX server? I/O is the area that
takes the biggest hit in virtualization, and ESX server has
far less overhead loss than either Hyper-V (which I presume
you are using) or VMWare Workstation for NT (kernels).

If it's a Windows-only policy, then perhaps you can run those
traps in reverse, and switch to a Windows database, i.e.
Microsoft SQL Server.

Cheers
Dave

On Mon, Jun 14, 2010 at 1:53 PM, Tom Wilcox
hungry...@gmail.com mailto:hungry...@gmail.com
mailto:hungry...@gmail.com mailto:hungry...@gmail.com wrote:


   Hi Bob,

   Thanks a lot. Here's my best attempt to answer your questions:

   The VM is setup with a virtual disk image dynamically
expanding to
   fill an allocation of 300GB on a fast, local hard drive
(avg read
   speed = 778MB/s ).
   WAL files can have their own disk, but how significantly would
   this affect our performance?
   The filesystem of the host OS is NTFS (Windows Server 2008
OS 64),
   the guest filesystem is Ext2 (Ubuntu 64).
   The workload is OLAP (lots of large, complex queries on large
   tables run in sequence).

   In addition, I have reconfigured my server to use more memory.
   Here's a detailed blow by blow of how I reconfigured my
system to
   get better performance (for anyone who might be interested)...

   In order to increase the shared memory on Ubuntu I edited the
   System V IPC values using sysctl:

   sysctl -w kernel.shmmax=16106127360*
   *sysctl -w kernel.shmall=2097152

   I had some fun with permissions as I somehow managed to
change the
   owner  of the postgresql.conf to root where it needed to be
   postgres, resulting in failure to start the service..
(Fixed with
   

Re: [PERFORM] requested shared memory size overflows size_t

2010-06-14 Thread Greg Smith

Tom Wilcox wrote:

default_statistics_target=1
wal_buffers=1GB
max_connections=3
effective_cache_size=15GB
maintenance_work_mem=5GB
shared_buffers=7000MB
work_mem=5GB


That value for default_statistics_target means that every single query 
you ever run will take a seriously long time to generate a plan for.  
Even on an OLAP system, I would consider 10,000 an appropriate setting 
for a column or two in a particularly troublesome table.  I wouldn't 
consider a value of even 1,000 in the postgresql.conf to be a good 
idea.  You should consider making the system default much lower, and 
increase it only on columns that need it, not for every column on every 
table.
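
For example, something along these lines (a sketch only; the table and column are
taken from the queries earlier in the thread, and the target value is illustrative):

ALTER TABLE nlpg.match_data ALTER COLUMN org SET STATISTICS 1000;
ANALYZE nlpg.match_data;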


There is no reason to set wal_buffers larger than 16MB, the size of a 
full WAL segment.  Have you read 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server yet?  
checkpoint_segments is the main parameter you haven't touched yet that you 
should consider increasing.  Even if you have a low write load, when 
VACUUM runs it will be very inefficient running against a large set of 
tables without the checkpoint frequency being decreased some.  Something 
in the 16-32 range would be plenty for an OLAP setup.


At 3 connections, a work_mem of 5GB is possibly reasonable.  I would 
normally recommend that you make the default much smaller than that 
though, and instead just increase to a large value for queries that 
benefit from it.  If someone later increases max_connections to 
something higher, your server could run completely out of memory if 
work_mem isn't cut way back as part of that change.


You could consider setting effective_cache_size to something even larger 
than that,


EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE 
match_data_id < 10;


By the way--repeatedly running this form of query to test for 
improvements in speed is not going to give you particularly good 
results.  Each run will execute a bunch of UPDATE statements that leave 
behind dead rows.  So the next run done for comparison sake will either 
have to cope with that additional overhead, or it will end up triggering 
autovacuum and suffer from that.  If you're going to use an UPDATE 
statement as your benchmark, at a minimum run a manual VACUUM ANALYZE in 
between each test run, to level out the consistency of results a bit.  
Ideally you'd restore the whole database to an initial state before each 
test.
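
(In other words, a test loop shaped roughly like this, with the table name taken
from the earlier posts:

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE match_data_id < 10;
VACUUM ANALYZE nlpg.match_data;  -- clean up dead rows before the next timed run

so that each timing starts from a comparable state.)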


I will spend the next week making the case for a native install of 
Linux, but first we need to be 100% sure that is the only way to get 
the most out of Postgres on this machine.


I really cannot imagine taking a system as powerful as you're using here 
and crippling it by running through a VM.  You should be running Ubuntu 
directly on the hardware, ext3 filesystem without LVM, split off RAID-1 
drive pairs dedicated to OS and WAL, then use the rest of them for the 
database.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




[PERFORM] B-Heaps

2010-06-14 Thread Eliot Gable
Just curious if this would apply to PostgreSQL:
http://queue.acm.org/detail.cfm?id=1814327

Now that I've read it, it seems
like a no-brainer. So, how does PostgreSQL deal with the different latencies
involved in accessing data on disk for searches / sorts vs. accessing data
in memory? Is it allocated in a similar way as described in the article such
that disk access is reduced to a minimum?


Re: [PERFORM] requested shared memory size overflows size_t

2010-06-14 Thread Scott Carey

On Jun 14, 2010, at 11:53 AM, Tom Wilcox wrote:

 
 
 max_connections=3
 effective_cache_size=15GB
 maintenance_work_mem=5GB
 shared_buffers=7000MB
 work_mem=5GB
 

maintenance_work_mem doesn't need to be so high, it certainly has no effect on 
your queries below.  It would affect vacuum, reindex, etc.

With fast disk like this (assuming your 700MB/sec above was not a typo) make 
sure you tune autovacuum up to be much more aggressive than the default 
(increase the allowable cost per sleep by at least 10x).

A big work_mem like above is OK if you know that no more than a couple sessions 
will be active at once.  Worst case, a single connection ... probably ... won't 
use more than 2x that amount.


 For now, I will go with the config using 7000MB shared_buffers. Any 
 suggestions on how I can further optimise this config for a single 
 session, 64-bit install utilising ALL of 96GB RAM. I will spend the next 
 week making the case for a native install of Linux, but first we need to 
 be 100% sure that is the only way to get the most out of Postgres on 
 this machine.
 

Getting the most from the RAM does NOT mean making Postgres use all the 
RAM.  Postgres relies on the OS file cache heavily.  If there is a lot of free 
RAM for the OS to use to cache files, it will help the performance.  Both 
Windows and Linux aggressively cache file pages and do a good job at it.






Re: [PERFORM] requested shared memory size overflows size_t

2010-06-14 Thread Scott Carey

On Jun 14, 2010, at 7:06 PM, Greg Smith wrote:

 I really cannot imagine taking a system as powerful as you're using here 
 and crippling it by running through a VM.  You should be running Ubuntu 
 directly on the hardware, ext3 filesystem without LVM, split off RAID-1 
 drive pairs dedicated to OS and WAL, then use the rest of them for the 
 database.
 

Great points.  There is one other option that is decent for the WAL:
If splitting out a volume is not acceptable for the OS and WAL -- absolutely 
split those two out into their own partitions.  It is most important to make 
sure that WAL and data are not on the same filesystem, especially if ext3 is 
involved.


 -- 
 Greg Smith  2ndQuadrant US  Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com   www.2ndQuadrant.us
 
 




Re: [PERFORM] requested shared memory size overflows size_t

2010-06-14 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 Great points.  There is one other option that is decent for the WAL:
 If splitting out a volume is not acceptable for the OS and WAL -- absolutely 
 split those two out into their own partitions.  It is most important to make 
 sure that WAL and data are not on the same filesystem, especially if ext3 is 
 involved.

Uh, no, WAL really needs to be on its own *spindle*.  The whole point
here is to have one disk head sitting on the WAL and not doing anything
else except writing to that file.  Pushing WAL to a different partition
but still on the same physical disk is likely to be a net pessimization,
because it'll increase the average seek distance whenever the head does
have to move between WAL and everything-else-in-the-database.

regards, tom lane
