Re: [PERFORM] Huge Data sets, simple queries

2006-01-29 Thread Michael Adler
On Fri, Jan 27, 2006 at 08:23:55PM -0500, Mike Biamonte wrote:

 This query took 18 hours on PG 8.1 on a Dual Xeon, RHEL3, (2.4
 Kernel) with RAID-10 (15K drives) and 12 GB Ram.  I was expecting it
 to take about 4 hours - based on some experience with a similar
 dataset on a different machine (RH9, PG7.3 Dual Xeon, 4GB RAM,
 Raid-5 10K drives)

 It's difficult to just try various ideas because each attempt
 takes a full day to test.  Real experience is needed here!

It seems like you are changing multiple variables at the same time.

I think you need to first compare the query plans with EXPLAIN SELECT
to see if they are significantly different. Your upgrade from 7.3 to
8.1 may have resulted in a less optimal plan.
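
For instance (a sketch only; the table and column names below are placeholders for your real query), running the same statement under EXPLAIN on both servers and diffing the output will show whether 8.1 chose a different join or scan strategy:

```sql
-- Run on both the 7.3 and the 8.1 box and compare the plans.
-- Plain EXPLAIN only plans the query; EXPLAIN ANALYZE also
-- executes it, which you probably don't want for an 18-hour query.
EXPLAIN
SELECT count(*)
FROM hits
GROUP BY user_id;
```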

Second, you should monitor your IO performance during the query
execution and test it independently of postgres. Then compare the
stats between the two systems. 

As a side note, if you have many disks and you are concerned about
bottlenecks on read operations, RAID 5 may perform better than
RAID 10. 

 -Mike

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] SAN/NAS options

2006-01-18 Thread Michael Adler
On Sat, Jan 14, 2006 at 09:37:01PM -0500, Charles Sprickman wrote:
 Following up to myself again...
 
 On Wed, 14 Dec 2005, Charles Sprickman wrote:
 
 Hello all,
 
 Supermicro 1U w/SCA backplane and 4 bays
 2x2.8 GHz Xeons
 Adaptec 2015S zero channel RAID card
 
 I don't want to throw away the four machines like that that we have.  I do 
 want to throw away the ZCR cards... :)  If I ditch those I still have a 1U 
 box with a U320 scsi plug on the back.
 
 I'm vaguely considering pairing these two devices:
 
 http://www.areca.us/products/html/products.htm
 
 That's an Areca 16 channel SATA II (I haven't even read up on what's new 
 in SATA II) RAID controller with an optional U320 SCSI daughter card to 
 connect to the host(s).
 
 http://www.chenbro.com.tw/Chenbro_Special/RM321.php

Not sure how significant this is, but the RM321 backplane claims to
support SATA 150 (aka SATA I) only.

 -Mike



Re: [PERFORM] fine tuning for logging server

2005-03-31 Thread Michael Adler
On Wed, Mar 30, 2005 at 08:41:43PM -0600, John Arbash Meinel wrote:
 If all you are doing is append only logging, the fastest thing is
 probably just a flat file. You could have something that comes along
 later to move it into the database. It doesn't really sound like you are
 using any features a database provides. (normalization, foreign keys,
 indexes, etc.)

Here are two ideas that I don't think have been mentioned yet: use COPY
to bulk-load the data instead of doing individual inserts. And if you
get desperate, you can run pg with fsync=false, since you don't seem to
mind re-initializing the whole database after an unexpected
interruption. 
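
A rough sketch of the COPY idea (the table, columns, and file path are all made up):

```sql
-- Hypothetical log table; adjust the columns to your actual data.
CREATE TABLE access_log (ts timestamp, line text);

-- Append log records to a flat file, then bulk-load it
-- periodically. One COPY is far cheaper than thousands of
-- single-row INSERTs.
COPY access_log FROM '/var/tmp/access_log.batch';

-- fsync is set in postgresql.conf, not per-session:
--   fsync = false
```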

 -Mike



Re: [PERFORM] Peformance Tuning Opterons/ Hard Disk Layout

2005-02-23 Thread Michael Adler
On Wed, Feb 23, 2005 at 02:15:52PM -0500, John Allgood wrote:
 using custom scripts. Maybe I have given a better explanation of the 
 application. My biggest concern is how to partition the shared storage 
 for maximum performance. Is there a real benefit to having more than one 
 RAID 5 partition or am I wasting my time?

I think the simplest and most generic solution would be to put the OS
and pg_xlog on a RAID 1 pair and dedicate the rest of the drives to a
RAID 5 or RAID 1+0 (striped set of mirrors) array.

Depending on the nature of your work, you may get better performance
by placing individual tables/indices on dedicated spindles for
parallel access.
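
If you're on 8.0, tablespaces make that layout easy to express; on 7.x you'd symlink directories inside $PGDATA instead. A sketch, with invented paths and names:

```sql
-- PostgreSQL 8.0+ only. Point a tablespace at a dedicated array,
-- then place hot tables or indexes on it.
CREATE TABLESPACE fastspindles LOCATION '/mnt/array2/pgdata';
CREATE INDEX orders_custid_idx ON orders (customer_id)
    TABLESPACE fastspindles;
```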

 -Mike Adler



Re: [PERFORM] Low Performance for big hospital server ..

2005-01-02 Thread Michael Adler
On Sun, Jan 02, 2005 at 09:54:32AM +0700, [EMAIL PROTECTED] wrote:
 postgresql 7.3.2-1 with RH 9 on a machine with 2 Xeon 3.0 GHz CPUs and 4 GB of RAM.

You may want to try disabling hyperthreading, if you don't mind
rebooting. 

 grew up to 3.5 Gb and there were more than 160 concurent connections.

Looks like your growing dataset won't fit in your OS disk cache any
longer. Isolate your most problematic queries and check out their
query plans. I bet you have some sequential scans that used to read
from cache but now need to read the disk. An index may help you. 
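
The pattern to look for, sketched with an invented table:

```sql
-- If EXPLAIN shows a Seq Scan with a Filter on some column, e.g.
--   Seq Scan on visits  (cost=0.00..12345.00 ...)
--     Filter: (patient_id = 42)
-- an index on that column lets the planner avoid reading the
-- whole table:
CREATE INDEX visits_patient_id_idx ON visits (patient_id);
ANALYZE visits;  -- refresh planner statistics
```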

More RAM wouldn't hurt. =)

 -Mike Adler



Re: [PERFORM] PG Logging is Slow

2004-12-20 Thread Michael Adler

On Mon, Dec 20, 2004 at 03:17:11PM +1100, Theo Galanakis wrote:
 Under postgres 7.3 logging is incredibly slow!
 
 I have applied the following settings:
 
 syslog = 2
 syslog_facility = 'LOCAL0'
 syslog_ident = 'postgres'
  
  log_connections =  true 
 log_duration =  true 
 log_pid =  true 
 log_statement =  true 
 log_timestamp =  true 
  
 This severely impacted the performance of our production system, a search
 page which took 1-3 seconds now takes over 30, is this normal?
  
 I need to get some performance indicators from our production db, however I
 cant turn on logging with such performance degradation.


I've experienced this problem many times due to hanging dns
lookups. /etc/resolv.conf may point to a nonexistent
nameserver. Comment it out and restart syslogd. Or use a syslog
implementation that allows you to disable dns lookups. Or just give
the nameserver a kick.

 -Mike Adler



Re: [PERFORM] memcached and PostgreSQL

2004-11-17 Thread Michael Adler
On Wed, Nov 17, 2004 at 09:13:09AM -0800, Darcy Buskermolen wrote:
 On November 16, 2004 08:00 pm, Michael Adler wrote:
  http://pugs.postgresql.org/sfpug/archives/21.html
 
  I noticed that some of you left coasters were talking about memcached
  and pgsql. I'm curious to know what was discussed.
 
 Have a look at the pdf presentation found on the following site:
 
 http://people.freebsd.org/~seanc/pgmemcache/

Thanks for that.

That presentation was rather broad and the API seems fairly general
purpose, but I wonder why you would really want to access the cache by
way of the DB. If one major point of memcached is to allocate RAM to a
low-overhead server instead of to the RDBMS's disk cache, why would
you add the overhead of the RDBMS to the process? (This is a bit of a
straw man; I'm just trying to flesh out the pros and cons.)

Still, it seems like a convenient way to maintain cache coherency,
assuming that your application doesn't already have a clean way to do
that.

(just my uninformed opinion, though...)

-Mike



[PERFORM] memcached and PostgreSQL

2004-11-16 Thread Michael Adler
http://pugs.postgresql.org/sfpug/archives/21.html

I noticed that some of you left coasters were talking about memcached
and pgsql. I'm curious to know what was discussed.

In reading about memcached, it seems that many people are using it to
circumvent the scalability problems of MySQL (lack of MVCC). 

from their site:

<snip>
Shouldn't the database do this?

Regardless of what database you use (MS-SQL, Oracle, Postgres,
MySQL-InnoDB, etc..), there's a lot of overhead in implementing ACID
properties in a RDBMS, especially when disks are involved, which means
queries are going to block. For databases that aren't ACID-compliant
(like MySQL-MyISAM), that overhead doesn't exist, but reading threads
block on the writing threads. memcached never blocks. 
</snip>

So what does memcached offer pgsql users? It would still seem to offer
the benefit of a multi-machine cache.

-Mike 



Re: [PERFORM] Excessive context switching on SMP Xeons

2004-10-07 Thread Michael Adler
On Thu, Oct 07, 2004 at 11:48:41AM -0400, Bill Montgomery wrote:
 Alan Stange wrote:
 
 The same test on a Dell PowerEdge 1750, Dual Xeon 3.2 GHz, 512k cache, 
 HT on, Linux 2.4.21-20.ELsmp (RHEL 3), 4GB memory, pg 7.4.5:
 
 Far less performance than the Dual Opterons with a low number of 
 clients, but the gap narrows as the number of clients goes up. Anyone 
 smarter than me care to explain?

You'll have to wait for someone smarter than you, but I will posit
this: did you use a tmpfs filesystem like Alan did? You didn't mention
either way. Alan did that as an attempt to remove IO as a variable.

-Mike



Re: [PERFORM] [HACKERS] spurious function execution in prepared statements.

2004-09-30 Thread Michael Adler
On Thu, Sep 30, 2004 at 09:45:51AM -0400, Merlin Moncure wrote:
 Now, if the same query is executed as a prepared statement,
 prepare ps(...) as select f(t.c) from t where [expr] limit 1;
 execute ps;
 
 now, if ps ends up using a index scan on t, everything is ok.  However,
 if ps does a seqscan, f executes for every row on t examined until the
 [expr] criteria is met.  Is this a bug?  If necessary I should be able
 to set up a reproducible example.  The easy workaround is to not use
 prepared statements in these situations, but I need to be able to
 guarantee that f only executes once (even if that means exploring
 subqueries).


Here's another workaround that may let you use a prepared statement:

prepare ps(...) as 
select f(c) from (select c from t where [expr] limit 1) as t1

-Mike



Re: [PERFORM] Caching of Queries (now with pgpool)

2004-09-23 Thread Michael Adler
On Thu, Sep 23, 2004 at 09:23:51PM -0400, Jason Coene wrote:
 I ran some previous queries to get pgpool to pre-establish all the
 connections, and ab ran for a few minutes (with one query per page, eek!).
 It was still exhibiting the same problems as before.  While so many new
 connections at once can surely make the problem worse (and pgpool will
 surely help there), shouldn't this prove that it's not the only issue?

 Any ideas?

Now that your connections are persistent, you may benefit from using
PREPAREd queries.
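
A sketch of the idea (statement, table, and parameter names invented): the statement is parsed and planned once per connection, and each later page view only pays for execution.

```sql
PREPARE get_story (integer) AS
    SELECT title, body FROM stories WHERE id = $1;

EXECUTE get_story (42);

-- Prepared statements live for the life of the connection,
-- which is why persistent connections via pgpool make them pay off.
```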

-Mike



Re: [PERFORM] Performance over a LAN

2004-07-23 Thread Michael Adler
On Fri, Jul 23, 2004 at 03:20:54PM +0930, William Carney wrote:

 But with the server running on one machine and the client running on
 another, the two machines being connected by a 100 Mb ethernet, with nothing
 else on the network, this test takes 17 minutes to run. I have tried
 changing the frequency of COMMIT operations, but with only a small effect.

 The machines used are P4s running FreeBSD 5.2.1. The Postgres version is
 7.4.3. Can anyone tell me why there's such a big difference?

Can you reproduce this problem in a tiny test case? If your application
is doing other networky things (e.g. many name resolutions that hang
for 10 seconds each), they may be slowing down the PostgreSQL work. 

Just a WAG.

-mike



Re: [PERFORM] string casting for index usage

2004-03-19 Thread Michael Adler
On Thu, Mar 18, 2004 at 03:39:12PM -0500, Tom Lane wrote:
 Michael Adler [EMAIL PROTECTED] writes:
  In porting an application from v7.2 and v7.3, I noticed that a join on a varchar 
  column and a text column was ignoring indices that were helpful in v7.2. When I 
  explicitly cast the text to a varchar (or set ENABLE_SEQSCAN TO false) the index 
  is scanned and it works as efficiently as in v7.2. 
 
 Maybe you should be moving to 7.4, instead.

That's a fair suggestion, but it's not practical for our 75 sites, most without decent 
network access. If this is in fact addressed in newer releases, then my point is 
mostly inconsequential.

We use Debian stable (7.2.1-2woody4) and Debian testing (7.3.4-9). 

 A desultory test didn't show any difference between 7.2.4 and 7.3.6
 in this respect, however.  Perhaps you forgot to ANALYZE yet in the
 new database?

I have a test with sample data and queries to demonstrate what I'm seeing. I hope it 
is useful. 

Having to do manual casts is not cruel and unusual, but it's not encouraging to see 
performance go down after an upgrade. If anyone has any clever solutions, let me know. 

tables, data, and queries: 
http://www.panix.com/~adler/manual-cast-for-index-scan.sql

my test output:
http://www.panix.com/~adler/manual-cast-for-index-scan_7.3.4-9.out
http://www.panix.com/~adler/manual-cast-for-index-scan_7.2.1-2woody4.out

(the times are not horrific in these specific examples, but the sequential scan makes 
them unscalable). 


manual-cast-for-index-scan_7.3.4-9.out:

DROP TABLE t1;
DROP TABLE
DROP TABLE t2;
DROP TABLE
CREATE TABLE t1 (
key_col text,
grp text
);
CREATE TABLE
COPY t1 FROM stdin;
CREATE UNIQUE INDEX tempindex1 ON t1 USING btree (key_col);
CREATE INDEX
CREATE TABLE t2 (
item_num character varying(5),
key_col character varying(14)
);
CREATE TABLE
COPY t2 FROM stdin;
CREATE INDEX tempindex2 ON t2 USING btree (key_col);
CREATE INDEX
VACUUM ANALYZE;
VACUUM
SELECT version();
 PostgreSQL 7.3.4 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc (GCC) 3.3.2 (Debian)

EXPLAIN ANALYZE  SELECT item_num, t1.key_col FROM t1 LEFT JOIN t2 ON (t2.key_col = t1.key_col) WHERE grp = '24';
 Nested Loop  (cost=0.00..23803.27 rows=194 width=31) (actual time=20.95..1401.46 rows=69 loops=1)
   Join Filter: (("inner".key_col)::text = "outer".key_col)
   ->  Seq Scan on t1  (cost=0.00..492.94 rows=194 width=18) (actual time=0.32..30.27 rows=69 loops=1)
         Filter: (grp = '24'::text)
   ->  Seq Scan on t2  (cost=0.00..66.87 rows=4287 width=13) (actual time=0.01..12.06 rows=4287 loops=69)
 Total runtime: 1401.73 msec

EXPLAIN ANALYZE  SELECT item_num, t1.key_col FROM t1 LEFT JOIN t2 ON (t2.key_col::text = t1.key_col) WHERE grp = '24';
 Nested Loop  (cost=0.00..23803.27 rows=194 width=31) (actual time=20.27..1398.82 rows=69 loops=1)
   Join Filter: (("inner".key_col)::text = "outer".key_col)
   ->  Seq Scan on t1  (cost=0.00..492.94 rows=194 width=18) (actual time=0.26..25.91 rows=69 loops=1)
         Filter: (grp = '24'::text)
   ->  Seq Scan on t2  (cost=0.00..66.87 rows=4287 width=13) (actual time=0.01..12.02 rows=4287 loops=69)
 Total runtime: 1399.08 msec

EXPLAIN ANALYZE  SELECT item_num, t1.key_col FROM t1 LEFT JOIN t2 ON (t2.key_col = t1.key_col::varchar(24)) WHERE grp = '24';
 Nested Loop  (cost=0.00..4819.13 rows=194 width=31) (actual time=0.52..27.46 rows=69 loops=1)
   ->  Seq Scan on t1  (cost=0.00..492.94 rows=194 width=18) (actual time=0.27..25.94 rows=69 loops=1)
         Filter: (grp = '24'::text)
   ->  Index Scan using tempindex2 on t2  (cost=0.00..22.17 rows=12 width=13) (actual time=0.01..0.01 rows=0 loops=69)
         Index Cond: (t2.key_col = ("outer".key_col)::character varying(24))
 Total runtime: 27.70 msec



manual-cast-for-index-scan_7.2.1-2woody4.out:

DROP TABLE t1;
DROP
DROP TABLE t2;
DROP
CREATE TABLE t1 (
key_col text,
grp text
);
CREATE
COPY t1 FROM stdin;
CREATE UNIQUE INDEX tempindex1 ON t1 USING btree (key_col);
CREATE
CREATE TABLE t2 (
item_num character varying(5),
key_col character varying(14)
);
CREATE
COPY t2 FROM stdin;
CREATE INDEX tempindex2 ON t2 USING btree (key_col);
CREATE
VACUUM ANALYZE;
VACUUM
SELECT version();
 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.4

EXPLAIN ANALYZE  SELECT item_num, t1.key_col FROM t1 LEFT JOIN t2 ON (t2.key_col = t1.key_col) WHERE grp = '24';
psql:castedneed.sql:29127: NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..1405.88 rows=204 width=32) (actual time=0.46..40.60 rows=69 loops=1)
  ->  Seq Scan on t1  (cost=0.00..505.94 rows=204 width=18) (actual time=0.35..39.09 rows=69 loops=1)
  ->  Index Scan using tempindex2 on t2  (cost=0.00..4.27 rows=11 width=14) (actual time=0.01..0.01 rows=0 loops=69)
Total runtime: 40.81 msec

EXPLAIN
EXPLAIN ANALYZE  SELECT item_num, t1.key_col FROM t1 LEFT JOIN t2 ON (t2.key_col::text = t1.key_col) WHERE grp = '24';
psql:castedneed.sql:29128: NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..1405.88 rows=204 width

Re: [PERFORM] inferior SCSI performance

2003-09-30 Thread Michael Adler


On Wed, 17 Sep 2003, Tom Lane wrote:
 Michael Adler [EMAIL PROTECTED] writes:
  I have been experimenting with a new Seagate Cheetah 10k-RPM SCSI to
  compare with a cheaper Seagate Barracuda 7200-RPM IDE (each in a
  single-drive configuration). The Cheetah definitely dominates the generic
  IO tests such as bonnie++, but fares poorly with pgbench (and other
  postgresql operations).

 It's fairly common for ATA drives to be configured to lie about write
 completion (ie, claim write-complete as soon as data is accepted into
 their onboard RAM buffer), whereas SCSI drives usually report write
 complete only when the data is actually down to disk.  The performance
 differential may thus be coming at the expense of reliability.  If you
 run Postgres with fsync off, does the differential go away?

Yes, they both perform equally at about 190 tps with fsync off.

The culprit turns out to be write-caching on the IDE drive. It is enabled
by default, but can be disabled with hdparm -W0 /dev/hdx. After it is
disabled, the tps are proportional to rpms.

There's an (2001) Linux thread on this if anyone is interested:
http://www.ussg.iu.edu/hypermail/linux/kernel/0103.0/0331.html

So the question is whether it is ever sensible to use write caching
and still expect comparable persistence.

Thanks,

Michael Adler
