Re: [PERFORM] 60 core performance with 9.3

2014-07-30 Thread Matt Clarkson
I've been assisting Mark with the benchmarking of these new servers. 

The drop-off in both throughput and CPU utilisation that we've been
observing as the client count increases has led me to investigate which
lwlocks are dominant at different client counts.

I've recompiled postgres with Andres LWLock improvements, Kevin's
libnuma patch and with LWLOCK_STATS enabled.

The LWLOCK_STATS below suggest that ProcArrayLock might be the main
source of locking that's causing throughput to take a dive as the client
count increases beyond the core count.


wal_buffers = 256MB
checkpoint_segments = 1920
wal_sync_method = open_datasync

pgbench -s 2000 -T 600


Results:

 clients |  tps
---------+-------
       6 |  9490
      12 | 17558
      24 | 25681
      48 | 41175
      96 | 48954
     192 | 31887
     384 | 15564
   

LWLOCK_STATS at 48 clients

        Lock        |   Blk   | SpinDelay | Blk % | SpinDelay %
--------------------+---------+-----------+-------+-------------
 BufFreelistLock    |   31144 |        11 |  1.64 |        1.62
 ShmemIndexLock     |     192 |         1 |  0.01 |        0.15
 OidGenLock         |   32648 |        14 |  1.72 |        2.06
 XidGenLock         |   35731 |        18 |  1.88 |        2.64
 ProcArrayLock      |  291121 |       215 | 15.36 |       31.57
 SInvalReadLock     |   32136 |        13 |  1.70 |        1.91
 SInvalWriteLock    |   32141 |        12 |  1.70 |        1.76
 WALBufMappingLock  |   31662 |        15 |  1.67 |        2.20
 WALWriteLock       |  825380 |        45 | 36.31 |        6.61
 CLogControlLock    |  583458 |       337 | 26.93 |       49.49
 
 
   
LWLOCK_STATS at 96 clients

        Lock        |   Blk   | SpinDelay | Blk % | SpinDelay %
--------------------+---------+-----------+-------+-------------
 BufFreelistLock    |   62954 |        12 |  1.54 |        0.27
 ShmemIndexLock     |   62635 |         4 |  1.54 |        0.09
 OidGenLock         |   92232 |        22 |  2.26 |        0.50
 XidGenLock         |   98326 |        18 |  2.41 |        0.41
 ProcArrayLock      |  928871 |      3188 | 22.78 |       72.57
 SInvalReadLock     |   58392 |        13 |  1.43 |        0.30
 SInvalWriteLock    |   57429 |        14 |  1.41 |        0.32
 WALBufMappingLock  |  138375 |        14 |  3.39 |        0.32
 WALWriteLock       | 1480707 |        42 | 36.31 |        0.96
 CLogControlLock    | 1098239 |      1066 | 26.93 |       27.27
 
 
 
LWLOCK_STATS at 384 clients

        Lock        |   Blk   | SpinDelay | Blk % | SpinDelay %
--------------------+---------+-----------+-------+-------------
 BufFreelistLock    |  184298 |       158 |  1.93 |        0.03
 ShmemIndexLock     |  183573 |       164 |  1.92 |        0.03
 OidGenLock         |  184558 |       173 |  1.93 |        0.03
 XidGenLock         |  200239 |       213 |  2.09 |        0.04
 ProcArrayLock      | 4035527 |    579666 | 42.22 |       98.62
 SInvalReadLock     |  182204 |       152 |  1.91 |        0.03
 SInvalWriteLock    |  182898 |       137 |  1.91 |        0.02
 WALBufMappingLock  |  219936 |       215 |  2.30 |        0.04
 WALWriteLock       | 3172725 |       457 | 24.67 |        0.08
 CLogControlLock    | 1012458 |      6423 | 10.59 |        1.09
 

The same test done with a read-only workload shows virtually no SpinDelay
at all.


Any thoughts or comments on these results are welcome!


Regards,
Matt.







[PERFORM] OFFSET/LIMIT - Disparate Performance w/ Go application

2014-06-12 Thread Matt Silverlock
Hi all. This might be tricky insofar as there are a few moving parts (when
isn’t there?), but I’ve tried to test the postgres side as much as possible.

Trying to work out a potential database bottleneck with an HTTP application
(written in Go):

- Pages that render HTML templates but don’t perform DB queries can hit ~36k+ req/s
- Pages that perform a SELECT on a single row net ~6.6k req/s:
  db.Get(l, "SELECT * FROM listings WHERE id = $1 AND expiry_date > current_date", l.Id)
- Pages that SELECT multiple rows with OFFSET and LIMIT conditions struggle to top 1.3k req/s
There’s very little “extra” logic around these queries: you can find the code 
here (about 39 lines for both functions) 
https://gist.github.com/elithrar/b2497b0b473da64932b5
Other pertinent details:

- It’s always been about this slow to my knowledge.
- The test table has about 40 rows, although performance doesn’t change noticeably
  even with a few hundred (it’s unlikely to ever hold more than 10,000 rows over its lifetime).
- Running PostgreSQL 9.3.4 on OS X w/ a 3.5GHz i7, 12GB RAM, 128GB PCI-E SSD.
- The Go application peaks at about 40MB memory when hitting 37k req/s — so there
  doesn’t appear to be an issue of it eating into the available RAM on the machine.
- I’m also aware that HTTP benchmarks aren’t the most reliable thing, but I’m
  using wrk -c 400 -t 32 -d 15s to stress it out.
- The application has a connection pool via the lib/pq driver
  (https://github.com/lib/pq) with MaxOpen set to 256 connections. Stack size is
  8GB and max socket connections are set to 1024 (running out of FDs isn’t the
  problem here from what I can see).

Relevant postgresql.conf settings — everything else should be default, 
including fsync/synchronous commits (on) for obvious reasons:
max_connections = 512
shared_buffers = 2048MB
temp_buffers = 16MB
work_mem = 4MB
wal_buffers = 16
checkpoint_segments = 16
random_page_cost = 2.0
effective_cache_size = 8192MB
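
As a sanity check, a query like this lists whatever isn’t sitting at its built-in
default, in case I’ve missed a setting (generic, nothing specific to this box):

-- settings that differ from their built-in defaults
SELECT name, setting, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;
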
The query in question is: http://explain.depesz.com/s/7g8 and the table schema 
is as below:


Table public.listings
┌───┬──┬───┐
│Column │   Type   │ Modifiers │
├───┼──┼───┤
│ id│ character varying(17)│ not null  │
│ title │ text │   │
│ company   │ text │   │
│ location  │ text │   │
│ description   │ text │   │
│ rendered_desc │ text │   │
│ term  │ text │   │
│ commute   │ text │   │
│ company_url   │ text │   │
│ rep   │ text │   │
│ rep_email │ text │   │
│ app_method│ text │   │
│ app_email │ text │   │
│ app_url   │ text │   │
│ posted_date   │ timestamp with time zone │   │
│ edited_date   │ timestamp with time zone │   │
│ renewed_date  │ timestamp with time zone │   │
│ expiry_date   │ timestamp with time zone │   │
│ slug  │ text │   │
│ charge_id │ text │   │
│ sponsor_id│ text │   │
│ tsv   │ tsvector │   │
└───┴──┴───┘
Indexes:
listings_pkey PRIMARY KEY, btree (id)
fts gin (tsv)
listings_expiry_date_idx btree (expiry_date)
listings_fts_idx gin (to_tsvector('english'::regconfig,
(((((((title || ' '::text) || company) || ' '::text) || location) || ' '::text)
|| term) || ' '::text) || commute))
Triggers:
tsvectorupdate BEFORE INSERT OR UPDATE ON listings FOR EACH ROW EXECUTE 
PROCEDURE tsvector_update_trigger('tsv', 'pg_catalog.english', 'title', 
'company', 'location', 'term', 'commute')


The single row query has a query plan here: http://explain.depesz.com/s/1Np 
(this is where I see 6.6k req/s at the application level), 

Some pgbench results from this machine as well:
$ pgbench -c 128 -C -j 4 -T 15 -M extended -S
starting vacuum...end.
transaction type: SELECT only
scaling factor: 1
query mode: extended
number of clients: 128
number of threads: 4
duration: 15 s
number of transactions actually processed: 17040
tps = 1134.481459 (including connections establishing)
tps = 56884.093652 (excluding connections establishing)
Ultimately I'm not expecting a miracle—database ops are nearly always the 
slowest part of a web server outside the latency to the client itself—but I'd 
expect something a little closer (even 10% of 33k would be a lot better). And 
of course, this is somewhat academic because I don't expect to see four 
million hits an hour—but I'd also like to catch problems for future reference.

Thanks in advance.

Re: [PERFORM] OFFSET/LIMIT - Disparate Performance w/ Go application

2014-06-12 Thread Matt Silverlock
Thanks for the replies Jeff, Tom and Merlin.

 Pages that SELECT multiple rows with OFFSET and LIMIT conditions struggle to 
 top 1.3k req/s
 
 Is that tested at the OFFSET and LIMIT of 0 and 15, as shown in the
 explain plan?


Yes — 0 (OFFSET) and 16 (LIMIT), or 15 and 31 (i.e. “second page” of results).
There’s no difference on that front. For context, OFFSET is always a multiple of 15
(i.e. 15 results per page) and LIMIT is always 15 + 1: the extra row lets the code
check the length of the returned slice, set paginate = true, and slice the last
result off if there are more than 15.
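
In SQL terms each page request boils down to something like this (a sketch only;
ordering by posted_date is an assumption to show the shape, not the exact query):

-- "second page": skip the first 15 rows, fetch 15 + 1 to detect whether a third page exists
SELECT id, title, company, location, term, commute, expiry_date
FROM listings
WHERE expiry_date > current_date
ORDER BY posted_date DESC
LIMIT 16 OFFSET 15;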

 
 
 The query in question is: http://explain.depesz.com/s/7g8 and the table 
 schema is as below:
 
 The reported runtime of 0.078 ms should be able to sustain nearly 10
 times the reported rate of 1.3k/s, so the bottleneck would seem to be
 elsewhere.
 
 Perhaps the bottleneck is formatting the result set in postgres to be
 sent over the wire, then sending it over the wire, then parsing it in
 the Go connection library to hand back to the Go user code, and then
 the Go user code doing something meaningful with it.
 
 What happens if you get rid of the offset and the order by, and just
 use limit?  I bet it doesn't change the speed much (because that is
 not where the bottleneck is).
 
 You seem to be selecting an awful lot of wide columns.  Do you really
 need to see all of them?

- Testing SELECT * FROM … with just LIMIT 15 and no offset yields 1299 
request/s at the front end of the application.
- Testing SELECT id, title, company, location, commute, term, expiry_date (a 
subset of fields) with LIMIT 15 and no OFFSET yields 1800 request/s at the 
front end.

There’s definitely an increase to be realised there (I’d say by just tossing 
the rendered HTML field). 

Based on your comments about the Go side of things, I ran a quick test by cutting
the table down to 6 records, from the 39 used in all previous tests. This skips the
pagination logic (below) and yields 3068 req/s on the front-end.

// Determine if we have more than one page of results.
// If so, trim the extra result off and set pagination = true
if len(listings) > opts.PerPage {
paginate = true
listings = listings[:opts.PerPage]
}

So there certainly appears to be a bottleneck on the Go side as well (outside 
of even the DB driver), probably from the garbage generated from slicing the 
slice, although I’d be keen to know if there’s a better way to approach 
returning a paginated list of results.

 Well, you can also do client side pagination using the row-wise
 comparison feature, implemented by you :-).  Cursors can be the best
 approach, but it's nice to know the client side approach if you're
 really stateless and/or want to be able to pick up external changes
 during the browse.


What would be a better approach here? The cursor approach isn’t ideal in my 
case (although I could make it work), but what other options are there that are 
stateless?
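
For what it’s worth, the row-wise comparison approach would look roughly like the
following (a sketch only; ordering by (posted_date, id) is an assumption made to
illustrate the shape):

-- first page: newest 16 listings (15 to render, 1 to detect a further page)
SELECT id, title, company, location, posted_date
FROM listings
WHERE expiry_date > current_date
ORDER BY posted_date DESC, id DESC
LIMIT 16;

-- subsequent pages: $1/$2 are the (posted_date, id) of the last row rendered;
-- the row-wise comparison continues from that point without an OFFSET
SELECT id, title, company, location, posted_date
FROM listings
WHERE expiry_date > current_date
  AND (posted_date, id) < ($1, $2)
ORDER BY posted_date DESC, id DESC
LIMIT 16;

That stays stateless, since the client only carries the sort key of the last row it
saw, and each page can come straight off an index on (posted_date, id) instead of
skipping OFFSET rows.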


 
 Some pgbench results from this machine as well:
 
 $ pgbench -c 128 -C -j 4 -T 15 -M extended -S
 
 This is just benchmarking how fast you can make and break connections
 to the database.
 
 Because your app is using an embedded connection pooler, this
 benchmark isn't very relevant to your situation.


Noted — thanks.


On 13 Jun 2014, at 4:46 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 snip



Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-07 Thread Matt Clarkson

On Tue, 2013-05-07 at 18:32 +1200, Mark Kirkwood wrote:
 On 07/05/13 18:10, Simon Riggs wrote:
  On 7 May 2013 01:23,  mark.kirkw...@catalyst.net.nz wrote:
 
  I'm thinking that a variant of (2) might be simpler to implement:
 
  (I think Matt C essentially beat me to this suggestion - he originally
  discovered this issue). It is probably good enough for only *new* plans to
  react to the increased/increasing number of in progress rows. So this
  would require backends doing significant numbers of row changes to either
  directly update pg_statistic or report their in progress numbers to the
  stats collector. The key change here is the partial execution numbers
  would need to be sent. Clearly one would need to avoid doing this too
  often (!) - possibly only when number of changed rows >
  autovacuum_analyze_scale_factor proportion of the relation concerned or
  similar.
 
  Are you loading using COPY? Why not break down the load into chunks?
 
 
 INSERT - but we could maybe workaround by chunking the INSERT. However 
 that *really* breaks the idea that in SQL you just say what you want, 
 not how the database engine should do it! And more practically means 
 that the most obvious and clear way to add your new data has nasty side 
 effects, and you have to tip toe around muttering secret incantations to 
 make things work well :-)

We also had the same problem with an UPDATE altering the data
distribution in such a way that trivial but frequently executed queries
caused massive server load until auto-analyze sorted out the stats.

-- 
Matt Clarkson
Catalyst.Net Limited






Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-06 Thread Matt Clarkson

 
 (2) seems fairly hard generically, since we'd have to keep track of
 the tids returned from the IndexScan to allow us to switch to a
 different plan and avoid re-issuing rows that we've already returned.
 But maybe if we adapted the IndexScan plan type so that it adopted a
 more page oriented approach internally, it could act like a
 bitmapscan. Anyway, that would need some proof that it would work and
 sounds like a fair task.
 
 (1) sounds more easily possible and plausible. At the moment we have
 enable_indexscan = off. If we had something like
 plan_cost_weight_indexscan = N, we could selectively increase the cost
 of index scans so that they would be less likely to be selected. i.e.
 plan_cost_weight_indexscan = 2 would mean an indexscan would need to
 be half the cost of any other plan before it was selected. (parameter
 name selected so it could apply to all parameter types). The reason to
 apply this weighting would be to calculate risk adjusted cost not
 just estimated cost.
 
 --
  Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Another option would be for the bulk insert/update/delete to track the
distribution stats as the operation progresses; if it detects that it is
changing the distribution of data beyond a certain threshold, it would
update the pg stats accordingly.

-- 
Matt Clarkson
Catalyst.Net Limited






Re: [PERFORM] Estimation question...

2013-02-28 Thread Matt Daw
I get it now... project_id=115 has a frequency of 0.09241 in pg_stats.
So if ((project_id = 115) AND (project_id = 115)) is considered as two
independent conditions, the row estimate ends up being 0.09241 *
0.09241 * 1.20163e+07 (reltuples from pg_class) = 102614.

http://www.postgresql.org/docs/9.0/static/row-estimation-examples.html
was a big help.
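
For anyone following along, the frequency the planner is multiplying can be read
straight out of pg_stats for that column:

-- per-value frequencies behind the (project_id = 115) selectivity estimate
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'versions'
  AND attname = 'project_id';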

Matt

On Wed, Feb 27, 2013 at 9:08 AM, Matt Daw m...@shotgunsoftware.com wrote:
 Quick follow up... I've found that the row estimate in:

 explain select count(id) from versions where project_id IN (80,115)
 AND project_id=115;
QUERY PLAN
 -
  Aggregate  (cost=178572.75..178572.76 rows=1 width=4)
-  Index Scan using dneg_versions_project_id on versions
 (cost=0.00..178306.94 rows=106323 width=4)
  Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND
 (project_id = 115))


 ... is the sum of two other estimates, seen when rewriting the query
 using OR instead of IN:


 explain select count(id) from versions where (project_id = 80 OR
 project_id = 115) AND project_id=115;
 QUERY PLAN
 ---
  Aggregate  (cost=305896.95..305896.96 rows=1 width=4)
-  Bitmap Heap Scan on versions  (cost=2315.08..305632.00
 rows=105980 width=4)
  Recheck Cond: (((project_id = 80) AND (project_id = 115)) OR
 ((project_id = 115) AND (project_id = 115)))
  -  BitmapOr  (cost=2315.08..2315.08 rows=106323 width=0)
-  Bitmap Index Scan on dneg_versions_project_id
 (cost=0.00..94.52 rows=3709 width=0)
  Index Cond: ((project_id = 80) AND (project_id = 115))
-  Bitmap Index Scan on dneg_versions_project_id
 (cost=0.00..2167.57 rows=102614 width=0)
  Index Cond: ((project_id = 115) AND (project_id = 115))

 106323 = 3709 + 102614

 Looks like the underlying problem is that the estimate for
 ((project_id = 115) AND (project_id = 115)) doesn't end up being the
 same as (project_id=115) on its own.

 Matt

 On Tue, Feb 26, 2013 at 11:35 AM, Matt Daw m...@shotgunsoftware.com wrote:
 Howdy, the query generator in my app sometimes creates redundant
 filters of the form:

 project_id IN ( list of projects user has permission to see ) AND
 project_id = single project user is looking at 

 ... and this is leading to a bad estimate (and thus a bad plan) on a
 few complex queries. I've included simplified examples below. This
 server is running 9.0.10 and the statistics target has been updated to
 1000 on the project_id column. I've also loaded the one table into a
 9.2.2 instance and replicated the behaviour.

 I can change how the query is being generated, but I'm curious why I'm
 getting a bad estimate. Is this an expected result?

 Thanks!

 Matt

 =

 1) Filter on project_id only, row estimate for Bitmap Index Scan quite good.

 explain (analyze,buffers) select count(id) from versions WHERE 
 project_id=115;

  QUERY PLAN
 ---
  Aggregate  (cost=1218111.01..1218111.02 rows=1 width=4) (actual
 time=1531.341..1531.342 rows=1 loops=1)
Buffers: shared hit=452619
-  Bitmap Heap Scan on versions  (cost=34245.06..1215254.86
 rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197
 loops=1)
  Recheck Cond: (project_id = 115)
  Buffers: shared hit=452619
  -  Bitmap Index Scan on versions_project_id
 (cost=0.00..33959.45 rows=1142461 width=0) (actual
 time=139.709..139.709 rows=1116037 loops=1)
Index Cond: (project_id = 115)
Buffers: shared hit=22077
  Total runtime: 1531.399 ms

 2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower.

 explain (analyze,buffers) select count(id) from versions WHERE
 project_id IN (80,115) AND project_id=115;;

 QUERY PLAN
 -
  Aggregate  (cost=327066.18..327066.19 rows=1 width=4) (actual
 time=1637.889..1637.889 rows=1 loops=1)
Buffers: shared hit=458389
-  Bitmap Heap Scan on versions  (cost=3546.56..326793.17
 rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180
 loops=1)
  Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND
 (project_id = 115))
  Buffers: shared hit=458389
  -  Bitmap Index Scan on versions_project_id
 (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502
 rows=1125436 loops=1)
Index Cond: ((project_id = ANY ('{80,115}'::integer

Re: [PERFORM] Estimation question...

2013-02-27 Thread Matt Daw
Quick follow up... I've found that the row estimate in:

explain select count(id) from versions where project_id IN (80,115)
AND project_id=115;
   QUERY PLAN
-
 Aggregate  (cost=178572.75..178572.76 rows=1 width=4)
   ->  Index Scan using dneg_versions_project_id on versions
(cost=0.00..178306.94 rows=106323 width=4)
 Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND
(project_id = 115))


... is the sum of two other estimates, seen when rewriting the query
using OR instead of IN:


explain select count(id) from versions where (project_id = 80 OR
project_id = 115) AND project_id=115;
QUERY PLAN
---
 Aggregate  (cost=305896.95..305896.96 rows=1 width=4)
   ->  Bitmap Heap Scan on versions  (cost=2315.08..305632.00
rows=105980 width=4)
 Recheck Cond: (((project_id = 80) AND (project_id = 115)) OR
((project_id = 115) AND (project_id = 115)))
 ->  BitmapOr  (cost=2315.08..2315.08 rows=106323 width=0)
   ->  Bitmap Index Scan on dneg_versions_project_id
(cost=0.00..94.52 rows=3709 width=0)
 Index Cond: ((project_id = 80) AND (project_id = 115))
   ->  Bitmap Index Scan on dneg_versions_project_id
(cost=0.00..2167.57 rows=102614 width=0)
 Index Cond: ((project_id = 115) AND (project_id = 115))

106323 = 3709 + 102614

Looks like the underlying problem is that the estimate for
((project_id = 115) AND (project_id = 115)) doesn't end up being the
same as (project_id=115) on its own.
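
A quick way to see the effect in isolation is to compare the planner's estimate for
the duplicated qual against the plain one (same table as above; only the estimated
row counts matter here):

-- duplicated qual: the two selectivities get multiplied as if independent
explain select count(id) from versions where project_id = 115 and project_id = 115;

-- plain qual: the straight per-column estimate from pg_stats
explain select count(id) from versions where project_id = 115;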

Matt

On Tue, Feb 26, 2013 at 11:35 AM, Matt Daw m...@shotgunsoftware.com wrote:
 Howdy, the query generator in my app sometimes creates redundant
 filters of the form:

 project_id IN ( list of projects user has permission to see ) AND
 project_id = single project user is looking at 

 ... and this is leading to a bad estimate (and thus a bad plan) on a
 few complex queries. I've included simplified examples below. This
 server is running 9.0.10 and the statistics target has been updated to
 1000 on the project_id column. I've also loaded the one table into a
 9.2.2 instance and replicated the behaviour.

 I can change how the query is being generated, but I'm curious why I'm
 getting a bad estimate. Is this an expected result?

 Thanks!

 Matt

 =

 1) Filter on project_id only, row estimate for Bitmap Index Scan quite good.

 explain (analyze,buffers) select count(id) from versions WHERE project_id=115;

  QUERY PLAN
 ---
  Aggregate  (cost=1218111.01..1218111.02 rows=1 width=4) (actual
 time=1531.341..1531.342 rows=1 loops=1)
Buffers: shared hit=452619
-  Bitmap Heap Scan on versions  (cost=34245.06..1215254.86
 rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197
 loops=1)
  Recheck Cond: (project_id = 115)
  Buffers: shared hit=452619
  -  Bitmap Index Scan on versions_project_id
 (cost=0.00..33959.45 rows=1142461 width=0) (actual
 time=139.709..139.709 rows=1116037 loops=1)
Index Cond: (project_id = 115)
Buffers: shared hit=22077
  Total runtime: 1531.399 ms

 2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower.

 explain (analyze,buffers) select count(id) from versions WHERE
 project_id IN (80,115) AND project_id=115;;

 QUERY PLAN
 -
  Aggregate  (cost=327066.18..327066.19 rows=1 width=4) (actual
 time=1637.889..1637.889 rows=1 loops=1)
Buffers: shared hit=458389
-  Bitmap Heap Scan on versions  (cost=3546.56..326793.17
 rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180
 loops=1)
  Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND
 (project_id = 115))
  Buffers: shared hit=458389
  -  Bitmap Index Scan on versions_project_id
 (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502
 rows=1125436 loops=1)
Index Cond: ((project_id = ANY ('{80,115}'::integer[]))
 AND (project_id = 115))
Buffers: shared hit=22076
  Total runtime: 1637.941 ms




[PERFORM] Estimation question...

2013-02-26 Thread Matt Daw
Howdy, the query generator in my app sometimes creates redundant
filters of the form:

project_id IN ( list of projects user has permission to see ) AND
project_id = single project user is looking at 

... and this is leading to a bad estimate (and thus a bad plan) on a
few complex queries. I've included simplified examples below. This
server is running 9.0.10 and the statistics target has been updated to
1000 on the project_id column. I've also loaded the one table into a
9.2.2 instance and replicated the behaviour.

I can change how the query is being generated, but I'm curious why I'm
getting a bad estimate. Is this an expected result?

Thanks!

Matt

=

1) Filter on project_id only, row estimate for Bitmap Index Scan quite good.

explain (analyze,buffers) select count(id) from versions WHERE project_id=115;

 QUERY PLAN
---
 Aggregate  (cost=1218111.01..1218111.02 rows=1 width=4) (actual
time=1531.341..1531.342 rows=1 loops=1)
   Buffers: shared hit=452619
   ->  Bitmap Heap Scan on versions  (cost=34245.06..1215254.86
rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197
loops=1)
 Recheck Cond: (project_id = 115)
 Buffers: shared hit=452619
 ->  Bitmap Index Scan on versions_project_id
(cost=0.00..33959.45 rows=1142461 width=0) (actual
time=139.709..139.709 rows=1116037 loops=1)
   Index Cond: (project_id = 115)
   Buffers: shared hit=22077
 Total runtime: 1531.399 ms

2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower.

explain (analyze,buffers) select count(id) from versions WHERE
project_id IN (80,115) AND project_id=115;;

QUERY PLAN
-
 Aggregate  (cost=327066.18..327066.19 rows=1 width=4) (actual
time=1637.889..1637.889 rows=1 loops=1)
   Buffers: shared hit=458389
   ->  Bitmap Heap Scan on versions  (cost=3546.56..326793.17
rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180
loops=1)
 Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND
(project_id = 115))
 Buffers: shared hit=458389
 ->  Bitmap Index Scan on versions_project_id
(cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502
rows=1125436 loops=1)
   Index Cond: ((project_id = ANY ('{80,115}'::integer[]))
AND (project_id = 115))
   Buffers: shared hit=22076
 Total runtime: 1637.941 ms




[PERFORM] Query plan, nested EXISTS

2012-09-28 Thread Matt Daw


 Aggregate  (cost=821544.18..821544.19 rows=1 width=0) (actual
time=5046.492..5046.492 rows=1 loops=1)
   ->  Hash Semi Join  (cost=735371.03..821521.73 rows=8977 width=0)
(actual time=4941.968..5045.968 rows=7116 loops=1)
 Hash Cond: (a.id = b.note_id)
 ->  Index Scan using notes_retirement_date_project on notes a
 (cost=0.00..66725.10 rows=12469 width=4) (actual time=9.639..68.751
rows=12469 loops=1)
   Index Cond: (project_id = 114)
 ->  Hash  (cost=712116.23..712116.23 rows=1417424 width=4) (actual
time=4931.956..4931.956 rows=297401 loops=1)
   Buckets: 65536  Batches: 4  Memory Usage: 2633kB
   ->  Hash Join  (cost=620484.32..712116.23 rows=1417424
width=4) (actual time=3362.472..4864.816 rows=297401 loops=1)
 Hash Cond: (b.entity_id = c.id)
 ->  Seq Scan on note_links b  (cost=0.00..71849.56
rows=1417424 width=8) (actual time=0.079..622.277 rows=1509795 loops=1)
   Filter: ((retirement_date IS NULL) AND
((entity_type)::text = 'Version'::text))
 ->  Hash  (cost=618673.97..618673.97 rows=144828
width=4) (actual time=3362.337..3362.337 rows=155834 loops=1)
   Buckets: 16384  Batches: 1  Memory Usage: 5479kB
   ->  HashAggregate  (cost=617225.69..618673.97
rows=144828 width=4) (actual time=3289.861..3335.344 rows=155834 loops=1)
 ->  Seq Scan on versions c
 (cost=0.00..616863.62 rows=144828 width=4) (actual time=217.080..3133.870
rows=155834 loops=1)
   Filter: ((retirement_date IS NULL)
AND ((sg_status_list)::text = 'ip'::text))
 Total runtime: 5051.414 ms
(17 rows)


Does anything come to mind that would help me debug why this plan is being
chosen? Thanks!

Matt


Re: [PERFORM] Query plan, nested EXISTS

2012-09-28 Thread Matt Daw
Hi Tom, thank you very much. I'll load these tables onto a 9.2 instance and
report back.

Matt

On Fri, Sep 28, 2012 at 2:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Matt Daw m...@shotgunsoftware.com writes:
  Howdy, I've been debugging a client's slow query today and I'm curious
  about the query plan. It's picking a plan that hashes lots of rows from
 the
  versions table (on v9.0.10)...

  EXPLAIN ANALYZE
  SELECT COUNT(*) FROM notes a WHERE
  a.project_id = 114 AND
  EXISTS (
  SELECT 1 FROM note_links b
  WHERE
  b.note_id = a.id AND
  b.entity_type = 'Version' AND
  EXISTS (
  SELECT 1 FROM versions c
  WHERE
  c.id = b.entity_id AND
  c.code ILIKE '%comp%' AND
  c.retirement_date IS NULL
  ) AND
  b.retirement_date IS NULL
  )

 I think the real problem here is that 9.0 is incapable of avoiding a
 full table scan on note_links, which means it doesn't really have any
 better option than to do the inner EXISTS as a full-table semijoin.
 This is because it can't push a.id down through two levels of join, and
 because the semijoins don't commute, there's no way to get a.id into the
 scan of note_links to pull out only the useful rows.  The hack with
 LIMIT avoids this problem by preventing the inner EXISTS from being
 treated as a full-fledged semijoin; but of course that hack leaves you
 vulnerable to very bad plans if the statistics are such that a nestloop
 join isn't the best bet for the inner EXISTS.

 The work I did for parameterized paths in 9.2 was intended to address
 exactly this type of scenario.  I would be interested to know if 9.2
 does this any better for you.

 regards, tom lane
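
For anyone else reading the thread: the LIMIT hack Tom refers to is presumably along
these lines, i.e. a LIMIT inside the inner sub-select so that it can't be flattened
into a semijoin. A sketch, not the exact query used earlier:

EXPLAIN ANALYZE
SELECT COUNT(*) FROM notes a WHERE
a.project_id = 114 AND
EXISTS (
SELECT 1 FROM note_links b
WHERE
b.note_id = a.id AND
b.entity_type = 'Version' AND
EXISTS (
SELECT 1 FROM versions c
WHERE
c.id = b.entity_id AND
c.code ILIKE '%comp%' AND
c.retirement_date IS NULL
LIMIT 1   -- keeps this EXISTS from being planned as a full-fledged semijoin on 9.0
) AND
b.retirement_date IS NULL
)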



Re: [PERFORM] Query plan, nested EXISTS

2012-09-28 Thread Matt Daw
Hi Tom, v9.2.1 looks good!

 Aggregate  (cost=420808.99..420809.00 rows=1 width=0) (actual
time=147.345..147.345 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=0.00..420786.71 rows=8914 width=0)
(actual time=13.847..147.219 rows=894 loops=1)
 ->  Index Scan using notes_retirement_date_project on notes a
 (cost=0.00..67959.22 rows=12535 width=4) (actual time=13.811..71.741
rows=12469 loops=1)
   Index Cond: (project_id = 114)
 ->  Nested Loop Semi Join  (cost=0.00..28.14 rows=1 width=4)
(actual time=0.006..0.006 rows=0 loops=12469)
   ->  Index Scan using note_links_note on note_links b
 (cost=0.00..12.37 rows=1 width=8) (actual time=0.002..0.002 rows=1
loops=12469)
 Index Cond: (note_id = a.id)
 Filter: ((retirement_date IS NULL) AND
((entity_type)::text = 'Version'::text))
 Rows Removed by Filter: 1
   ->  Index Scan using versions_pkey on versions c
 (cost=0.00..15.76 rows=1 width=4) (actual time=0.003..0.003 rows=0
loops=11794)
 Index Cond: (id = b.entity_id)
 Filter: ((retirement_date IS NULL) AND ((code)::text
~~* '%comp%'::text))
 Rows Removed by Filter: 1
 Total runtime: 147.411 ms
(14 rows)

On Fri, Sep 28, 2012 at 2:47 PM, Matt Daw m...@shotgunsoftware.com wrote:

 Hi Tom, thank you very much. I'll load these tables onto a 9.2 instance
 and report back.

 Matt


 On Fri, Sep 28, 2012 at 2:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Matt Daw m...@shotgunsoftware.com writes:
  Howdy, I've been debugging a client's slow query today and I'm curious
  about the query plan. It's picking a plan that hashes lots of rows from
 the
  versions table (on v9.0.10)...

  EXPLAIN ANALYZE
  SELECT COUNT(*) FROM notes a WHERE
  a.project_id = 114 AND
  EXISTS (
  SELECT 1 FROM note_links b
  WHERE
  b.note_id = a.id AND
  b.entity_type = 'Version' AND
  EXISTS (
  SELECT 1 FROM versions c
  WHERE
  c.id = b.entity_id AND
  c.code ILIKE '%comp%' AND
  c.retirement_date IS NULL
  ) AND
  b.retirement_date IS NULL
  )

 I think the real problem here is that 9.0 is incapable of avoiding a
 full table scan on note_links, which means it doesn't really have any
 better option than to do the inner EXISTS as a full-table semijoin.
 This is because it can't push a.id down through two levels of join, and
 because the semijoins don't commute, there's no way to get a.id into the
 scan of note_links to pull out only the useful rows.  The hack with
 LIMIT avoids this problem by preventing the inner EXISTS from being
 treated as a full-fledged semijoin; but of course that hack leaves you
 vulnerable to very bad plans if the statistics are such that a nestloop
 join isn't the best bet for the inner EXISTS.

 The work I did for parameterized paths in 9.2 was intended to address
 exactly this type of scenario.  I would be interested to know if 9.2
 does this any better for you.

 regards, tom lane





[PERFORM] Performance of Seq Scan from buffer cache

2012-08-21 Thread Matt Daw
Howdy. I'm curious what besides raw hardware speed determines the performance 
of a Seq Scan that comes entirely out of shared buffers… I ran the following on 
the client's server I'm profiling, which is otherwise idle:

EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT * FROM notes;

Seq Scan on notes  (cost=0.00..94004.88 rows=1926188 width=862) (actual 
time=0.009..1673.702 rows=1926207 loops=1)
   Buffers: shared hit=74743
 Total runtime: 3110.442 ms
(3 rows)


… and that's about 9x slower than what I get on my laptop with the same data. I 
ran stream-scaling on the machine and the results seem reasonable (8644.1985 
MB/s with 1 core - 25017 MB/s with 12 cores). The box is running 2.6.26.6-49 
and postgresql 9.0.6.

I'm stumped as to why it's so much slower, any ideas on what might explain it… 
or other benchmarks I could run to try to narrow down the cause?

Thanks!

Matt

Re: [PERFORM] Performance of Seq Scan from buffer cache

2012-08-21 Thread Matt Daw
Ugh, never mind. I ran ltrace and it's spending 99% of its time in 
gettimeofday.  

select count(*) from notes;
  count   
-
 1926207
(1 row)

Time: 213.950 ms

explain analyze select count(*) from notes;
   QUERY PLAN   
 

 Aggregate  (cost=99274.59..99274.60 rows=1 width=0) (actual 
time=2889.325..2889.325 rows=1 loops=1)
   ->  Seq Scan on notes  (cost=0.00..94459.07 rows=1926207 width=0) (actual 
time=0.005..1475.218 rows=1926207 loops=1)
 Total runtime: 2889.360 ms
(3 rows)

Time: 2889.842 ms  


On Tuesday, 21 August, 2012 at 3:57 PM, Matt Daw wrote:

 Howdy. I'm curious what besides raw hardware speed determines the performance 
 of a Seq Scan that comes entirely out of shared buffers… I ran the following 
 on the client's server I'm profiling, which is otherwise idle:
  
 EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT * FROM notes;
  
 Seq Scan on notes  (cost=0.00..94004.88 rows=1926188 width=862) (actual 
 time=0.009..1673.702 rows=1926207 loops=1)
Buffers: shared hit=74743
  Total runtime: 3110.442 ms
 (3 rows)
  
  
 … and that's about 9x slower than what I get on my laptop with the same data. 
 I ran stream-scaling on the machine and the results seem reasonable 
 (8644.1985 MB/s with 1 core - 25017 MB/s with 12 cores). The box is running 
 2.6.26.6-49 and postgresql 9.0.6.
  
 I'm stumped as to why it's so much slower, any ideas on what might explain 
 it… or other benchmarks I could run to try to narrow down the cause?
  
 Thanks!
  
 Matt  



Re: [PERFORM] Slowing UPDATEs inside a transaction

2011-03-04 Thread Matt Burke
Robert Haas wrote:
 Old row versions have to be kept around until they're no longer of 
 interest to any still-running transaction.

Thanks for the explanation.

Regarding the snippet above, why would the intermediate history of
multiply-modified uncommitted rows be of interest to anything, or is the
current behaviour simply cheaper overall in terms of cpu/developer time?




[PERFORM] Slowing UPDATEs inside a transaction

2011-03-03 Thread Matt Burke
Hi. I've only been using PostgreSQL properly for a week or so, so I
apologise if this has been covered numerous times, however Google is
producing nothing of use.

I'm trying to import a large amount of legacy data (billions of
denormalised rows) into a pg database with a completely different schema,
de-duplicating bits of it on-the-fly while maintaining a reference count.
The procedures to do this have proven easy to write, however the speed is
not pretty. I've spent some time breaking down the cause and it's come down
to a simple UPDATE as evidenced below:


CREATE TABLE foo (a int PRIMARY KEY, b int);
INSERT INTO foo VALUES (1,1);

CREATE OR REPLACE FUNCTION test() RETURNS int AS $$
DECLARE
i int;
BEGIN
FOR i IN 1..10000 LOOP
UPDATE foo SET b=b+1 WHERE a=1;
END LOOP;
RETURN 1;
END;
$$ LANGUAGE plpgsql;

When run individually, this function produces the following timing:
Time: 1912.593 ms
Time: 1937.183 ms
Time: 1941.607 ms
Time: 1943.673 ms
Time: 1944.738 ms

However, when run inside a transaction (to simulate a function doing the
same work) I get this:

START TRANSACTION
Time: 0.836 ms
Time: 1908.931 ms
Time: 5370.110 ms
Time: 8809.378 ms
Time: 12274.294 ms
Time: 15698.745 ms
Time: 19218.389 ms


There is no disk i/o and the postgresql process runs 100% cpu.
Server is amd64 FreeBSD 8-STABLE w/16GB RAM running postgresql 9.0.3 from
packages

Looking at the timing of real data (heavily grouped), it seems the speed of
UPDATEs can vary depending on how heavily updated a row is, so I set out to
produce a test case:

CREATE TABLE foo (a int PRIMARY KEY, b int);
INSERT INTO foo VALUES (1,1),(2,1),(3,1),(4,1);

CREATE OR REPLACE FUNCTION test(int) RETURNS int AS $$
DECLARE
i int;
BEGIN
FOR i IN 1..10000 LOOP
UPDATE foo SET b=1 WHERE a=$1;
END LOOP;
RETURN 1;
END;
$$ LANGUAGE plpgsql;
START TRANSACTION;
SELECT test(1);  Time: 1917.305 ms
SELECT test(2);  Time: 1926.758 ms
SELECT test(3);  Time: 1926.498 ms
SELECT test(1);  Time: 5376.691 ms
SELECT test(2);  Time: 5408.231 ms
SELECT test(3);  Time: 5403.654 ms
SELECT test(1);  Time: 8842.425 ms
SELECT test(4);  Time: 1925.954 ms
COMMIT; START TRANSACTION;
SELECT test(1);  Time: 1911.153 ms


As you can see, the more an individual row is updated /within a
transaction/, the slower it becomes for some reason.

Unfortunately in my real-world case, I need to do many billions of these
UPDATEs. Is there any way I can get around this without pulling my huge
source table out of the database and feeding everything in line-at-a-time
from outside the database?



Thanks.




Re: [PERFORM] Slow-ish Query Needs Some Love

2010-02-02 Thread Matt White
On Feb 2, 6:06 am, Edgardo Portal egportal2...@yahoo.com wrote:
 On 2010-02-02, Matt White mattw...@gmail.com wrote:

  I have a relatively straightforward query that by itself isn't that
  slow, but we have to run it up to 40 times on one webpage load, so it
  needs to run much faster than it does. Here it is:

  SELECT COUNT(*) FROM users, user_groups
   WHERE users.user_group_id = user_groups.id AND NOT users.deleted AND
  user_groups.partner_id IN
   (partner_id_1, partner_id_2);

  The structure is partners have user groups which have users. In the
  test data there are over 200,000 user groups and users but only ~3000
  partners. Anyone have any bright ideas on how to speed this query up?

 Can you avoid running it 40 times, maybe by restructuring the
 query (or making a view) along the lines of the following and
 adding some logic to your page?

 SELECT p.partner_id, ug.user_group_id, u.id, count(*)
   FROM partners p
        LEFT JOIN user_groups ug
               ON ug.partner_id=p.partner_id
        LEFT JOIN users u
               ON u.user_group_id=ug.id
  WHERE NOT u.deleted
  GROUP BY 1,2,3
 ;
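
 Along the same lines, the per-partner counts for the whole page could be pulled
 in a single pass and the pairs summed in the application (a sketch, untested):

 SELECT ug.partner_id, count(*) AS user_count
   FROM user_groups ug
        JOIN users u ON u.user_group_id = ug.id
  WHERE NOT u.deleted
    AND ug.partner_id IN (partner_id_1, partner_id_2)  -- full list of partner ids the page needs
  GROUP BY ug.partner_id;

 Since each user group belongs to exactly one partner, the count for any pair of
 partner ids is just the sum of the two per-partner counts.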

Thanks for the suggestion. The view didn't seem to speed things up.
Perhaps we can reduce the number of times it's called, we'll see. Any
additional ideas would be helpful. Thanks.



[PERFORM] Slow-ish Query Needs Some Love

2010-02-02 Thread Matt White
I have a relatively straightforward query that by itself isn't that
slow, but we have to run it up to 40 times on one webpage load, so it
needs to run much faster than it does. Here it is:

SELECT COUNT(*) FROM users, user_groups
 WHERE users.user_group_id = user_groups.id AND NOT users.deleted AND
user_groups.partner_id IN
 (partner_id_1, partner_id_2);

The structure is partners have user groups which have users. In the
test data there are over 200,000 user groups and users but only ~3000
partners. Anyone have any bright ideas on how to speed this query up?

Here's the query plan:

 Aggregate  (cost=12574.53..12574.54 rows=1 width=0) (actual
time=2909.298..2909.299 rows=1 loops=1)
   ->  Hash Join  (cost=217.79..12566.08 rows=3378 width=0) (actual
time=2909.284..2909.284 rows=0 loops=1)
 Hash Cond: (users.user_group_id = user_groups.id)
 ->  Seq Scan on users  (cost=0.00..11026.11 rows=206144
width=4) (actual time=0.054..517.811 rows=205350 loops=1)
   Filter: (NOT deleted)
 ->  Hash  (cost=175.97..175.97 rows=3346 width=4) (actual
time=655.054..655.054 rows=22 loops=1)
   ->  Nested Loop  (cost=0.27..175.97 rows=3346 width=4)
(actual time=1.327..428.406 rows=22 loops=1)
 ->  HashAggregate  (cost=0.27..0.28 rows=1
width=4) (actual time=1.259..1.264 rows=2 loops=1)
   ->  Result  (cost=0.00..0.26 rows=1
width=0) (actual time=1.181..1.240 rows=2 loops=1)
 ->  Index Scan using user_groups_partner_id_idx
on user_groups  (cost=0.00..133.86 rows=3346 width=8) (actual
time=0.049..96.992 rows=11 loops=2)
   Index Cond: (user_groups.partner_id =
(partner_all_subpartners(3494)))


The one obvious thing that everyone will point out is the sequential
scan on users, but there actually is an index on users.deleted. When I
forced sequential scanning off, it ran slower, so the planner wins
again.

Thanks for any help you can offer.



Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller

2009-02-06 Thread Matt Burke
Rajesh Kumar Mallah wrote:
 I've checked out the latest Areca controllers, but the manual 
 available on their website states there's a limitation of 32 disks 
 in an array...
 
 Where exactly is there a limitation of 32 drives? The datasheet of the
 1680 states support for up to 128 drives using enclosures.

The 1680 manual:
http://www.areca.us//support/download/RaidCards/Documents/Manual_Spec/SAS_Manual.zip

Page 25:

 Note:
 
 1. The maximum no. is 32 disk drived included in a single RAID set

Page 49:

 1. Up to 32 disk drives can be included in a single RAID set.
 2. Up to 8 RAID sets can be created per controller

(point 2 meaning you can't do s/w RAID over umpteen h/w RAID1 pairs)

Page 50:

 To create RAID 30/50/60 volume, you need create multiple RAID sets
 first with the same disk members on each RAID set. The max no. disk
 drives per volume set: 32 for RAID 0/1/10/3/5/6 and 128 for RAID
 30/50/60.

...and a few more times saying the same thing

-- 



Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller

2009-02-06 Thread Matt Burke
Scott Carey wrote:
 You probably don’t want a single array with more than 32 drives anyway,
 its almost always better to start carving out chunks and using software
 raid 0 or 1 on top of that for various reasons. I wouldn’t put more than
 16 drives in one array on any of these RAID cards, they’re just not
 optimized for really big arrays and tend to fade between 6 to 16 in one
 array, depending on the quality.

This is what I'm looking at now. The server I'm working on at the moment
currently has a PERC6/e and 3xMD1000s which needs to be tested in a few
setups.  I need to code a benchmarker yet (I haven't found one yet that
can come close to replicating our DB usage patterns), but I intend to try:

1. 3x h/w RAID10 (one per shelf), software RAID0
2. lots x h/w RAID1, software RAID0 if the PERC will let me create
enough arrays
3. Pure s/w RAID10 if I can convince the PERC to let the OS see the disks
4. 2x h/w RAID30, software RAID0

I'm not holding much hope out for the last one :)


I'm just glad work on a rewrite of my inherited backend systems should
start soon; get rid of the multi-TB MySQL hell and move to a distributed
PG setup on dirt cheap Dell R200s/blades


 You can do direct-attached storage to 100+ drives or more if you want.
  The price and manageability cost go up a lot if it gets too big
 however.  Having global hot spare drives is critical.  Not that the cost
 of using SAN’s and such is low...  SAS expanders have made DAS with
 large arrays very accessible though.

For large storage arrays (RAID60 or similar) you can't beat a RAID
controller and disk shelf(s), especially if you keep the raidsets small
and use cheap ludicrous capacity SATA disks

You just need to be aware that performance doesn't scale well/easily
over 1-2 shelves on the things


-- 





Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller

2009-02-06 Thread Matt Burke
Glyn Astill wrote:
 Stupid question, but why do people bother with the Perc line of
 cards if the LSI brand is better?  It seems the headache of trying 
 to get the Perc cards to perform is not worth any money saved.
 
 I think in most cases the dell cards actually cost more, people end
 up stuck with them because they come bundled with their servers -
 they find out too late that they've got a lemon.

That's what's been happening with me... The fact Dell prices can have a
fair bit of downward movement when you get the account manager on the
phone makes them especially attractive to the people controlling the
purse strings.

The biggest reason for me however is the lack of comparative reviews. I
struggled to get the LSI card to replace the PERC3 because all I had to
go on was qualitative mailing list/forum posts from strangers. The only
way I got it was to make the argument that other than trying the LSI,
we'd have no choice other than replacing the server+shelf+disks.

I want to see just how much better a high-end Areca/Adaptec controller
is, but I just don't think I can get approval for a £1000 card because
some guy on the internet said the PERC sucks. Would that same person
say it sucked if it came in Areca packaging? Am I listening to the
advice of a professional, or a fanboy?





Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller

2009-02-05 Thread Matt Burke
Arjen van der Meijden wrote:

 Afaik the Perc 5/i and /e are more or less rebranded LSI-cards (they're
 not identical in layout etc), so it would be a bit weird if they
 performed much less than the similar LSI's wouldn't you think?

I've recently had to replace a PERC4/DC with the exact same card made by
LSI (320-2) because the PERCs firmware was crippled. Its idea of RAID10
actually appears to be concatenated RAID1 arrays.

Since replacing it and rebuilding the array on the LSI card, performance
has been considerably better (14 disk SCSI shelf)

 Areca may be the fastest around right now, but if you'd like to get it
 all from one supplier, its not too bad to be stuck with Dell's perc 5 or
 6 series.

The PERC6 isn't too bad, however it grinds to a halt when the IO queue
gets large and it has the serious limitation of not supporting more than
8 spans, so trying to build a RAID10 array greater than 16 disks is
pointless if you're not just after the extra capacity.

Are there any reasonable choices for bigger (3+ shelf) direct-connected
RAID10 arrays, or are hideously expensive SANs the only option? I've
checked out the latest Areca controllers, but the manual available on
their website states there's a limitation of 32 disks in an array...



Re: [PERFORM] too many clog files

2008-09-09 Thread Matt Smiley
Hi Duan,

As others have said, you should probably attempt to run pg_dump to export your 
database.  If that doesn't work, consider restoring from backup.  If the dump 
does work, you can create a clean PGDATA directory (using initdb like when you 
setup your original installation), and create a fresh copy of your database 
using the dump file.  Then abandon your potentially damaged PGDATA directory.

For future reference:

 - The autovacuum parameter in postgresql.conf is off by default under 
Postgres 8.1.  You should probably turn it on to ensure regular vacuuming, 
unless you have your own cronjob to do the vacuuming.

 - About finding old transactions, there are 2 places you have to look for old 
transactions.  The usual place is in pg_stat_activity.  The 2nd place is 
pg_prepared_xacts, where prepared transactions are listed.  If there's a 
prepared transaction in your system, it might explain why your old commit-logs 
aren't being purged.  The following query shows both prepared and normal 
transactions:

select
  l.transactionid,
  age(l.transactionid) as age,  /* measured in number of other transactions 
elapsed, not in terms of time */
  l.pid,
  case when l.pid is null then true else false end as is_prepared,  -- prepared transactions have no backend pid
  a.backend_start,
  p.prepared as time_xact_was_prepared,
  p.gid as prepared_name
from
  pg_locks l
  left outer join pg_stat_activity a on l.pid = a.procpid
  left outer join pg_prepared_xacts p on l.transactionid = p.transaction
where
  l.locktype = 'transactionid'
  and l.mode = 'ExclusiveLock'
  and l.granted
order by age(l.transactionid) desc
;

 transactionid | age | pid  | is_prepared |         backend_start         |    time_xact_was_prepared     |      prepared_name
---------------+-----+------+-------------+-------------------------------+-------------------------------+--------------------------
        316645 |  44 |      | t           |                               | 2008-09-09 00:31:46.724178-07 | my_prepared_transaction1
        316689 |   0 | 6093 | f           | 2008-09-09 00:40:10.928287-07 |                               |
(2 rows)

Note that unless you run this query as a superuser (e.g. postgres), the 
columns from pg_stat_activity will only be visible for sessions that belong to 
you.  To rollback this example prepared transaction, you'd type:
  ROLLBACK PREPARED 'my_prepared_transaction1';

Hope this helps!
Matt





Re: [PERFORM] too many clog files

2008-09-09 Thread Matt Smiley
Alvaro Herrera wrote:
 Move the old clog files back where they were, and run VACUUM FREEZE in
 all your databases.  That should clean up all the old pg_clog files, if
 you're really that desperate.

Has anyone actually seen a CLOG file get removed under 8.2 or 8.3?  How about 
8.1?

I'm probably missing something, but looking at src/backend/commands/vacuum.c 
(under 8.2.9 and 8.3.3), it seems like vac_truncate_clog() scans through *all* 
tuples of pg_database looking for the oldest datfrozenxid.  Won't that always 
be template0, which as far as I know can never be vacuumed (or otherwise 
connected to)?

postgres=# select datname, datfrozenxid, age(datfrozenxid), datallowconn from 
pg_database order by age(datfrozenxid), datname ;
 datname  | datfrozenxid |   age| datallowconn
--+--+--+--
 template1| 36347792 | 3859 | t
 postgres | 36347733 | 3918 | t
 mss_test | 36347436 | 4215 | t
 template0|  526 | 36351125 | f
(4 rows)

I looked at several of my 8.2 databases' pg_clog directories, and they all have 
all the sequentially numbered segments (0000 through the current segment).  Would 
it be reasonable for vac_truncate_clog() to skip databases where datallowconn 
is false (i.e. template0)?  Looking back to the 8.1.13 code, it does exactly 
that:
if (!dbform->datallowconn)
continue;

Also, Duan, if you have lots of files under pg_clog, you may be burning through 
transactions faster than necessary.  Do your applications leave autocommit 
turned on?  And since no one else mentioned it, as a work-around for a small 
filesystem you can potentially shutdown your database, move the pg_clog 
directory to a separate filesystem, and create a symlink to it under your 
PGDATA directory.  That's not a solution, just a mitigation.





Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Matt Smiley
Tom Lane [EMAIL PROTECTED] writes:
  Matt Smiley [EMAIL PROTECTED] writes:
So an Index Scan is always going to have a higher cost estimate than
an equivalent Seq Scan returning the same result rows (unless
random_page_cost is  1).  That's why I think the planner is always
preferring the plan that uses a Seq Scan.
  
  If that were the case, we'd never choose an indexscan at all...

You're right, that was a silly guess.

  It's true that a plain indexscan is not preferred for queries that will
  return a large fraction of the table.  However, it should be willing to
  use a bitmap scan for this query, given default cost settings (the
  default cost settings will cause it to prefer bitmap scan for retrieving
  up to about a third of the table, in my experience).  I too am confused
  about why it doesn't prefer that choice in the OP's example.

It looks like the bitmap scan has a higher cost estimate because the entire 
bitmap index must be built before beginning the heap scan and returning rows up 
the pipeline.  The row-count limit can't be pushed lower than the 
bitmap-heap-scan like it can for the basic index-scan.

test_8_3_3=# set enable_seqscan = false ;
SET

test_8_3_3=# set enable_indexscan = false ;
SET

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15 ;
   QUERY PLAN

 Limit  (cost=17070.22..17071.02 rows=15 width=8) (actual time=606.902..607.086 
rows=15 loops=1)
   ->  Bitmap Heap Scan on my_table  (cost=17070.22..69478.96 rows=988217 
width=8) (actual time=606.892..606.983 rows=15 loops=1)
 Recheck Cond: (b = 3)
 Filter: (a IS NULL)
 ->  Bitmap Index Scan on idx_b  (cost=0.00..16823.17 rows=109 
width=0) (actual time=592.657..592.657 rows=100 loops=1)
   Index Cond: (b = 3)
 Total runtime: 607.340 ms
(7 rows)


  It would be interesting to alter the random_page_cost setting and see if he 
 gets
  different results.

Using an unmodified postgresql.conf, the cost estimate for an index-scan was 
so much higher than for a seqscan that random_page_cost had to be set below 0.2 
before the index-scan was preferred.  However, it looks like this was mainly 
because effective_cache_size was too small.  The planner thought the cache was 
only 128 MB, and the size of the complete table+index was 39492 + 21946 pages * 
8 KB/block = 330 MB.  It makes sense for the cost estimate to be so much higher 
if blocks are expected to be repeatedly re-fetched from disk.  I wonder if 
David's effective_cache_size is too small.
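
The page counts above come straight from pg_class; something like this shows the
footprint being weighed against effective_cache_size (pg_size_pretty is just for
readability):

-- heap and index footprint in pages and bytes (8 kB per page at the default block size)
select relname, relpages, pg_size_pretty(relpages::bigint * 8192) as approx_size
from pg_class
where relname in ('my_table', 'idx_b')
order by relname;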

test_8_3_3=# reset all ;
RESET

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15 ;
QUERY PLAN
---
 Limit  (cost=0.00..2.50 rows=15 width=8) (actual time=0.036..0.239 rows=15 
loops=1)
   ->  Seq Scan on my_table  (cost=0.00..164492.74 rows=988217 width=8) (actual 
time=0.028..0.138 rows=15 loops=1)
 Filter: ((a IS NULL) AND (b = 3))
 Total runtime: 0.338 ms
(4 rows)

test_8_3_3=# set enable_seqscan = false ;
SET

test_8_3_3=# show random_page_cost ;
 random_page_cost
--
 4
(1 row)

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15 ;
QUERY PLAN
--
 Limit  (cost=0.00..45.99 rows=15 width=8) (actual time=0.051..0.200 rows=15 
loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..3029924.36 rows=988217 
width=8) (actual time=0.043..0.100 rows=15 loops=1)
 Index Cond: (b = 3)
 Filter: (a IS NULL)
 Total runtime: 0.308 ms
(5 rows)

test_8_3_3=# set random_page_cost = 0.19 ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15 ;
   QUERY PLAN
-
 Limit  (cost=0.00..2.45 rows=15 width=8) (actual time=0.050..0.201 rows=15 
loops=1)
   -  Index Scan using idx_b on my_table  (cost=0.00..161190.65 rows=988217 
width=8) (actual time=0.042..0.097 rows=15 loops=1)
 Index Cond: (b = 3)
 Filter: (a IS NULL)
 Total runtime: 0.307 ms
(5 rows)


Now fix effective_cache_size and try again.

test_8_3_3=# reset all ;
RESET

test_8_3_3=# set effective_cache_size = '500MB' ;
SET
test_8_3_3=# set enable_seqscan = false ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15

Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Matt Smiley
Tom Lane [EMAIL PROTECTED] writes:
 I'm not sure offhand whether the existing correlation stats would be of use 
 for
 it, or whether we'd have to get ANALYZE to gather additional data.

Please forgive the tangent, but would it be practical to add support for 
gathering statistics on an arbitrary expression associated with a table, rather 
than just on materialized columns?  For example:
analyze my_tab for expression 'my_func(my_tab.col)' ;
It seems like any time you'd consider using a functional index, this feature 
would let the planner calculate decent selectivity estimates for the 
expression's otherwise opaque data distribution.  The expression might be 
treated as a virtual column on the table; not sure if that helps or hurts.  
Should I post this question on pgsql-hackers?
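
For what it's worth, creating the functional index already gets part of the way 
there: ANALYZE collects statistics on the indexed expression, and they show up in 
pg_stats under the index's name.  A minimal sketch (names are made up, and the 
function must be immutable to be indexable):

create index idx_my_func on my_tab ( my_func(col) ) ;
analyze my_tab ;
select tablename, attname, n_distinct, most_common_vals
from pg_stats
where tablename = 'idx_my_func' ;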



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Matt Smiley
Hi David,

Early in this thread, Pavel suggested:

 you should partial index
 
 create index foo(b) on mytable where a is null;

Rather, you might try the opposite partial index (where a is NOT null) as a 
replacement for the original unqualified index on column A.  This new index 
will be ignored by the query you're trying to tune, but it'll be available to 
the other queries that filter to a non-null value of column A.  (Omitting NULL 
from that index should be ok because you normally wouldn't want to use an index 
when 95% of the table's rows match the filtered key.)

Then you can temporarily disable Seq Scans in your session for just this one 
query, as follows:

SQL create table my_table ( a int, b int ) ;
CREATE TABLE

SQL create index idx_a_not_null on my_table ( a ) where a is not null ;
CREATE INDEX

SQL create index idx_b on my_table ( b ) ;
CREATE INDEX

SQL insert into my_table (a, b)
select
  case when random() <= 0.95 then null else i end as a,
  mod(i, 10) as b
from generate_series(1, 10000000) s(i)
;
INSERT 0 10000000

SQL analyze my_table ;
ANALYZE


Review the statistics available to the optimizer:

SQL select attname, null_frac, n_distinct, most_common_vals, 
most_common_freqs, histogram_bounds, correlation
from pg_stats
where tablename = 'my_table'
order by attname
;
 attname           | a
 null_frac         | 0.945
 n_distinct        | -1
 most_common_vals  |
 most_common_freqs |
 histogram_bounds  | {2771,1301755,2096051,3059786,3680728,4653531,5882434,6737141,8240245,9428702,9875768}
 correlation       | 1

 attname           | b
 null_frac         | 0
 n_distinct        | 10
 most_common_vals  | {9,4,3,1,2,6,8,5,7,0}
 most_common_freqs | {0.110333,0.104,0.102333,0.100333,0.100333,0.0996667,0.0986667,0.098,0.096,0.09}
 histogram_bounds  |
 correlation       | 0.127294
(2 rows)

SQL select relname, reltuples, relpages from pg_class where relname in 
('my_table', 'idx_a_not_null', 'idx_b') order by relname ;
relname | reltuples | relpages
+---+--
 idx_a_not_null |499955 | 1100
 idx_b  | 1e+07 |21946
 my_table   | 1e+07 |39492
(3 rows)


Run the test query, first without disabling Seq Scan to show this example 
reproduces the plan you're trying to avoid.

SQL explain analyze select * from my_table where a is null and b = 5 limit 15 ;
QUERY PLAN
---
 Limit  (cost=0.00..2.66 rows=15 width=8) (actual time=0.070..0.263 rows=15 
loops=1)
   -  Seq Scan on my_table  (cost=0.00..164492.00 rows=929250 width=8) (actual 
time=0.061..0.159 rows=15 loops=1)
 Filter: ((a IS NULL) AND (b = 5))
 Total runtime: 0.371 ms
(4 rows)


Now run the same query without the Seq Scan option.

SQL set enable_seqscan = false ;
SET

SQL explain analyze select * from my_table where a is null and b = 5 limit 15 ;
QUERY PLAN
--
 Limit  (cost=0.00..46.33 rows=15 width=8) (actual time=0.081..0.232 rows=15 
loops=1)
   -  Index Scan using idx_b on my_table  (cost=0.00..2869913.63 rows=929250 
width=8) (actual time=0.072..0.130 rows=15 loops=1)
 Index Cond: (b = 5)
 Filter: (a IS NULL)
 Total runtime: 0.341 ms
(5 rows)

SQL reset enable_seqscan ;
RESET


Yes, it's unsavory to temporarily adjust a session-level parameter to tune a 
single query, but I don't know of a less intrusive way to avoid the SeqScan.  
Here's why I think it might be your simplest option:

As far as I can tell, the plan nodes for accessing the table/index are unaware 
of the LIMIT.  The cost of the Limit node is estimated as the cost of its input 
row-source multiplied by the ratio of requested/returned rows.  For example, 
from the preceding plan output:
2869913.63 for Index Scan upper cost * (15 row limit / 929250 returned 
rows) = 46.326 upper cost for the Limit node
The underlying plan nodes each assume that all the rows matching their filter 
predicates will be returned up the pipeline; the cost estimate is only reduced 
at the Limit node.  A Seq Scan and an Index Scan (over a complete index) will 
both expected the same number of input rows (pg_class.reltuples).  They also 
produce the same estimated result set, since 

Re: [SOLVED] [PERFORM] Query plan excluding index on view

2008-04-07 Thread Matt Klinker
Removing the constants definitely did take care of the issue on 8.3 (still the
same query plan on 8.1).  Thanks for your help in getting this resolved, and
sorry again for not including all the relevant information in my initial request.

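For the archives, a minimal sketch of the constant-free view (the listing_type 
value can be supplied by the caller, or stored as a real column on each table):

CREATE OR REPLACE VIEW directory_listing AS
 SELECT school.id, school.name, school.description
   FROM school
UNION ALL
 SELECT company.id, company.name, company.description
   FROM company;
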
On Fri, Apr 4, 2008 at 10:20 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Matt Klinker [EMAIL PROTECTED] writes:
  --Joined View:
  CREATE OR REPLACE VIEW directory_listing AS
   SELECT school.id, school.name, school.description, 119075291 AS
  listing_type_fid
 FROM school
  UNION ALL
   SELECT company.id, company.name, company.description, 119074833 AS
  listing_type_fid
 FROM company;

 Ah, there's the problem :-(.  Can you get rid of the constants here?
 The planner's currently not smart about UNION ALL subqueries unless
 their SELECT lists contain just simple column references.

 (Yes, fixing that is on the todo list, but don't hold your breath...
 it'll be 8.4 material at the earliest.)

regards, tom lane



Re: [PERFORM] Query plan excluding index on view

2008-04-04 Thread Matt Klinker
 on company c  (cost=0.00..9.67 rows=1
width=424)
Index Cond: (c.id = xref.listing_fid)




On Thu, Apr 3, 2008 at 11:49 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Matt Klinker [EMAIL PROTECTED] writes:
  Sorry for not including this extra bit originally.  Below is the explain
  detail from both the query to the view that takes longer and then the
 query
  directly to the single table that performs quickly.
 ...
  -  Subquery Scan *SELECT* 1  (cost=0.00..1285922.80
 rows=18384890
  width=251)
-  Seq Scan on company  (cost=0.00..1102073.90
 rows=18384890

 The presence of a Subquery Scan node tells me that either this is a much
 older PG version than you stated, or there are some interesting details
 to the query that you omitted.  Please drop the fan-dance routine and
 show us a complete reproducible test case.

regards, tom lane



[PERFORM] Query plan excluding index on view

2008-04-03 Thread Matt Klinker
I'm trying to fine tune this query to return in a reasonable amount of time
and am having difficulties getting the query to run the way I'd like.  I
have a couple of semi-related entities that are stored in individual tables,
say, A and B.  There is then a view created that pulls together the common
fields from these 2 tables.  These are then related through a m:m
relationship to a classification.  Quick definitions of all of this follows:

Table: ItemA
id - primary key
name
description
addtl fields for A

Table: ItemB
id- primary key
name
description
addtl fields for B


View: Combined
SELECT id, name, description from ItemA
UNION ALL
SELECT id, name, description from ItemB


Table: xref
id   - primary key
item_id  - indexed, points to either ItemA.id or ItemB.id
classification_id  - indexed, points to classification.id


Table: classification
id   - primary key
name

I'm trying to query from the classification, through the xref, and to the
view to get a list of Items (either A or B) that are tied to a specific
classification.  My query is rather simple, basically as follows:

SELECT id, name, description
FROM combined c
INNER JOIN xref on c.id = xref.item_id
WHERE xref.classification_id = 1

This query runs in about 2-3 minutes (I should mention that ItemA has ~18M
records and xref has ~26M records - and both will continue to grow).  The
explain text shows a disregard for the indexes on ItemA and ItemB and a
sequence scan is done on both of them.  However, if I rewrite this query to
join directly to ItemA rather to the view it runs in ~50ms because it now
uses the proper index.

I know it's generally requested to include the EXPLAIN text when submitting
a specific question, but I thought perhaps this was generic enough that
someone might at least have some suggestions.  If required I can certainly
work up a simpler example, or I could include my actual explain (though it
doesn't exactly match everything defined above as I tried to keep things
rather generic).

Any links would be nice as well, from all my searching the past few days,
most of the performance tuning resources I could find where about tuning the
server itself, not really a specific query - at least not one that dealt
with this issue.  If you've read this far - thank you much!


Re: [PERFORM] Query plan excluding index on view

2008-04-03 Thread Matt Klinker
Sorry for not including this extra bit originally.  Below is the explain
detail from both the query to the view that takes longer and then the query
directly to the single table that performs quickly.

Hash Join  (cost=49082.96..1940745.80 rows=11412 width=76)
  Hash Cond: (outer.?column1? = inner.listing_fid)
  -  Append  (cost=0.00..1290709.94 rows=18487347 width=252)
-  Subquery Scan *SELECT* 1  (cost=0.00..1285922.80 rows=18384890
width=251)
  -  Seq Scan on company  (cost=0.00..1102073.90 rows=18384890
width=251)
-  Subquery Scan *SELECT* 2  (cost=0.00..4787.14 rows=102457
width=252)
  -  Seq Scan on school  (cost=0.00..3762.57 rows=102457
width=252)
  -  Hash  (cost=49042.64..49042.64 rows=16130 width=8)
-  Bitmap Heap Scan on listing_node_xref xref
(cost=102.45..49042.64 rows=16130 width=8)
  Recheck Cond: (node_fid = 173204537)
  -  Bitmap Index Scan on idx_listing_node_xref_node_fid
(cost=0.00..102.45 rows=16130 width=0)
Index Cond: (node_fid = 173204537)


Nested Loop  (cost=102.45..98564.97 rows=11349 width=517)
  -  Bitmap Heap Scan on listing_node_xref xref  (cost=102.45..49042.64
rows=16130 width=8)
Recheck Cond: (node_fid = 173204537)
-  Bitmap Index Scan on idx_listing_node_xref_node_fid
(cost=0.00..102.45 rows=16130 width=0)
  Index Cond: (node_fid = 173204537)
  -  Index Scan using idx_pki_company_id on company c  (cost=0.00..3.06
rows=1 width=517)
Index Cond: (c.id = outer.listing_fid)


On Thu, Apr 3, 2008 at 7:19 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Matt Klinker [EMAIL PROTECTED] writes:
  I new I'd forget something!  I've tried this on both 8.2 and 8.3 with
 the
  same results.

 Then you're going to have to provide more details ...

regards, tom lane



[PERFORM] db performance/design question

2007-09-12 Thread Matt Chambers


I'm designing a system that will be doing over a million inserts/deletes 
on a single table every hour.  Rather than using a single table, it is 
possible for me to partition the data into multiple tables if I wanted 
to, which would be nice because I can just truncate them when I don't 
need them.  I could even use table spaces to split the IO load over 
multiple filers.  The application does not require all this data be in 
the same table.   The data is fairly temporary, it might last 5 seconds, 
it might last 2 days, but it will all be deleted eventually and 
different data will be created.


Considering a single table would grow to 10mil+ rows at max, and this 
machine will sustain about 25mbps of insert/update/delete traffic 24/7 - 
365, will I be saving much by partitioning data like that?

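Roughly what I have in mind, sketched with table inheritance (names are made up, 
and the extra tablespace would have to exist already):

CREATE TABLE events ( id bigint, payload text, created_at timestamp );

CREATE TABLE events_part1 () INHERITS ( events );
CREATE TABLE events_part2 () INHERITS ( events ) TABLESPACE ts_filer2;

-- the application writes to the partition for the current window,
-- and an expired partition is discarded cheaply:
TRUNCATE TABLE events_part1;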

--
-Matt

http://twiki.spimageworks.com/twiki/bin/view/Software/CueDevelopment


Re: [PERFORM] Sunfire X4500 recommendations

2007-03-29 Thread Matt Smiley
Hi David,

Thanks for your feedback!  I'm rather a newbie at this, and I do appreciate the 
critique.

First, let me correct myself: The formulas for the risk of losing data when 
you lose 2 and 3 disks shouldn't have included the first term (g/n).  I'll 
give the corrected formulas and tables at the end of the email.


 please explain why you are saying that the risk of loosing any 1 disk is 
 1/n. shouldn't it be probability of failure * n instead?

1/n represents the assumption that all disks have an equal probability of being 
the next one to fail.  This seems like a fair assumption in general for the 
active members of a stripe (not including hot spares).  A possible exception 
would be the parity disks (because reads always skip them and writes always hit 
them), but that's only a consideration if the RAID configuration used dedicated 
disks for parity instead of distributing it across the RAID 5/6 group members.  
Apart from that, whether the workload is write-heavy or read-heavy, sequential 
or scattered, the disks in the stripe ought to handle a roughly equivalent 
number of iops over their lifetime.


 following this logic the risk of loosing all 48 disks in a single group of 
 48 would be 100%

Exactly.  Putting all disks in one group is RAID 0 -- no data protection.  If 
you lose even 1 active member of the stripe, the probability of losing your 
data is 100%.


 also what you are looking for is the probability of the second (and third) 
 disks failing in time X (where X is the time nessasary to notice the 
 failure, get a replacement, and rebuild the disk)

Yep, that's exactly what I'm looking for.  That's why I said, these 
probabilities are only describing the case where we don't have enough time 
between disk failures to recover the array.  My goal wasn't to estimate how 
long time X is.  (It doesn't seem like a generalizable quantity; due partly to 
logistical and human factors, it's unique to each operating environment.)  
Instead, I start with the assumption that time X has been exceeded, and we've 
lost a 2nd (or 3rd) disk in the array.  Given that assumption, I wanted to show 
the probability that the loss of the 2nd disk has caused the stripe to become 
unrecoverable.

We know that RAID 10 and 50 can tolerate the loss of anywhere between 1 and n/g 
disks, depending on how lucky you are.  I wanted to quantify the amount of luck 
required, as a risk management tool.  The duration of time X can be minimized 
with hot spares and attentive administrators, but the risk after exceeding time 
X can only be minimized (as far as I know) by configuring the RAID stripe with 
small enough underlying failure groups.


 the killer is the time needed to rebuild the disk, with multi-TB arrays 
 is't sometimes faster to re-initialize the array and reload from backup 
 then it is to do a live rebuild (the kernel.org servers had a raid failure 
 recently and HPA mentioned that it took a week to rebuild the array, but 
 it would have only taken a couple days to do a restore from backup)

That's very interesting.  I guess the rebuild time also would depend on how 
large the damaged failure group was.  Under RAID 10, for example, I think you'd 
still only have to rebuild 1 disk from its mirror, regardless of how many other 
disks were in the stripe, right?  So shortening the rebuild time may be another 
good motivation to keep the failure groups small.


 add to this the fact that disk failures do not appear to be truely 
 independant from each other statisticly (see the recent studies released 
 by google and cmu), and I wouldn't bother with single-parity for a 

I don't think I've seen the studies you mentioned.  Would you cite them please? 
 This may not be typical of everyone's experience, but what I've seen during 
in-house load tests is an equal I/O rate for each disk in my stripe, using 
short-duration sampling intervals to avoid long-term averaging effects.  This 
is what I expected to find, so I didn't delve deeper.

Certainly it's true that some disks may be more heavily burdened than others 
for hours or days, but I wouldn't expect any bias from an application-driven 
access pattern to persist for a significant fraction of a disk's lifespan.  The 
only influence I'd expect to bias the cumulative I/O handled by a disk over its 
entire life would be its role in the RAID configuration.  Hot spares will have 
minimal wear-and-tear until they're activated.  Dedicated parity disks will 
probably live longer than data disks, unless the workload is very heavily 
oriented towards small writes (e.g. logging).


 multi-TB array. If the data is easy to recreate (including from backup) or 
 short lived (say a database of log data that cycles every month or so) I 
 would just do RAID-0 and plan on loosing the data on drive failure (this 
 assumes that you can afford the loss of service when this happens). if the 
 data is more important then I'd do dual-parity or more, along with a hot 
 spare so that the 

Re: [PERFORM] Sunfire X4500 recommendations

2007-03-27 Thread Matt Smiley
Hi Dimitri,

First of all, thanks again for the great feedback!

Yes, my I/O load is mostly read operations.  There are some bulk writes done in 
the background periodically throughout the day, but these are not as 
time-sensitive.  I'll have to do some testing to find the best balance of read 
vs. write speed and tolerance of disk failure vs. usable diskspace.

I'm looking forward to seeing the results of your OLTP tests!  Good luck!  
Since I won't be doing that myself, it'll be all new to me.

About disk failure, I certainly agree that increasing the number of disks will 
decrease the average time between disk failures.  Apart from any performance 
considerations, I wanted to get a clear idea of the risk of data loss under 
various RAID configurations.  It's a handy reference, so I thought I'd share it:



The goal is to calculate the probability of data loss when we lose a certain 
number of disks within a short timespan (e.g. losing a 2nd disk before 
replacing+rebuilding the 1st one).  For RAID 10, 50, and Z, we will lose data 
if any disk group (i.e. mirror or parity-group) loses 2 disks.  For RAID 60 
and Z2, we will lose data if 3 disks die in the same parity group.  The parity 
groups can include arbitrarily many disks.  Having larger groups gives us more 
usable diskspace but less protection.  (Naturally we're more likely to lose 2 
disks in a group of 50 than in a group of 5.)

g = number of disks in each group (e.g. mirroring = 2; single-parity = 3 or 
more; dual-parity = 4 or more)
n = total number of disks
risk of losing any 1 disk = 1/n
risk of losing 1 disk from a particular group = g/n
risk of losing 2 disks in the same group = g/n * (g-1)/(n-1)
risk of losing 3 disks in the same group = g/n * (g-1)/(n-1) * (g-2)/(n-2)

For the x4500, we have 48 disks.  If we stripe our data across all those disks, 
then these are our configuration options:

RAID 10 or 50 -- Mirroring or single-parity must lose 2 disks from the same 
group to lose data:
disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
  2  24   4824  0.09%
  3  16   4832  0.27%
  4  12   4836  0.53%
  6   8   4840  1.33%
  8   6   4842  2.48%
 12   4   4844  5.85%
 24   2   4846 24.47%
 48   1   4847100.00%

RAID 60 or Z2 -- Double-parity must lose 3 disks from the same group to lose 
data:
disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
  2  24   48   n/an/a
  3  16   4816  0.01%
  4  12   4824  0.02%
  6   8   4832  0.12%
  8   6   4836  0.32%
 12   4   4840  1.27%
 24   2   4844 11.70%
 48   1   4846100.00%

So, in terms of fault tolerance:
 - RAID 60 and Z2 always beat RAID 10, since they never risk data loss when 
only 2 disks fail.
 - RAID 10 always beats RAID 50 and Z, since it has the largest number of disk 
groups across which to spread the risk.
 - Having more parity groups increases fault tolerance but decreases usable 
diskspace.

That's all assuming each disk has an equal chance of failure, which is probably 
true since striping should distribute the workload evenly.  And again, these 
probabilities are only describing the case where we don't have enough time 
between disk failures to recover the array.
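
For anyone who wants to rerun these numbers, here's a quick SQL sketch that 
tabulates the formulas above for the 48-disk case (it also fills in every group 
size that divides 48 evenly):

SELECT g AS disks_per_group,
       48 / g AS num_groups,
       round(100.0 * g / 48 * (g - 1) / 47, 2)                AS pct_risk_2_disks,
       round(100.0 * g / 48 * (g - 1) / 47 * (g - 2) / 46, 2) AS pct_risk_3_disks
  FROM generate_series(2, 48) AS g
 WHERE 48 % g = 0;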

In terms of performance, I think RAID 10 should always be best for write speed. 
 (Since it doesn't calculate parity, writing a new block doesn't require 
reading the rest of the RAID stripe just to recalculate the parity bits.)  I 
think it's also normally just as fast for reading, since the controller can 
load-balance the pending read requests to both sides of each mirror.





---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Sunfire X4500 recommendations

2007-03-23 Thread Matt Smiley
Thanks Dimitri!  That was very educational material!  I'm going to think out 
loud here, so please correct me if you see any errors.

The section on tuning for OLTP transactions was interesting, although my OLAP 
workload will be predominantly bulk I/O over large datasets of 
mostly-sequential blocks.

The NFS+ZFS section talked about the zil_disable control for making zfs ignore 
commits/fsyncs.  Given that Postgres' executor does single-threaded synchronous 
I/O like the tar example, it seems like it might benefit significantly from 
setting zil_disable=1, at least in the case of frequently flushed/committed 
writes.  However, zil_disable=1 sounds unsafe for the datafiles' filesystem, 
and would probably only be acceptable for the xlogs if they're stored on a 
separate filesystem and you're willing to lose recently committed 
transactions.  This sounds pretty similar to just setting fsync=off in 
postgresql.conf, which is easier to change later, so I'll skip the zil_disable 
control.

The RAID-Z section was a little surprising.  It made RAID-Z sound just like 
RAID 50, in that you can customize the trade-off between iops versus usable 
diskspace and fault-tolerance by adjusting the number/size of parity-protected 
disk groups.  The only difference I noticed was that RAID-Z will apparently set 
the stripe size across vdevs (RAID-5s) to be as close as possible to the 
filesystem's block size, to maximize the number of disks involved in 
concurrently fetching each block.  Does that sound about right?

So now I'm wondering what RAID-Z offers that RAID-50 doesn't.  I came up with 2 
things: an alleged affinity for full-stripe writes and (under RAID-Z2) the 
added fault-tolerance of RAID-6's 2nd parity bit (allowing 2 disks to fail per 
zpool).  It wasn't mentioned in this blog, but I've heard that under certain 
circumstances, RAID-Z will magically decide to mirror a block instead of 
calculating parity on it.  I'm not sure how this would happen, and I don't know 
the circumstances that would trigger this behavior, but I think the goal (if it 
really happens) is to avoid the performance penalty of having to read the rest 
of the stripe required to calculate parity.  As far as I know, this is only an 
issue affecting small writes (e.g. single-row updates in an OLTP workload), but 
not large writes (compared to the RAID's stripe size).  Anyway, when I saw the 
filesystem's intent log mentioned, I thought maybe the small writes are 
converted to full-stripe writes by deferring their commit until a full stripe's 
worth of data had been accumulated.  Does that sound plausible?

Are there any other noteworthy perks to RAID-Z, rather than RAID-50?  If not, 
I'm inclined to go with your suggestion, Dimitri, and use zfs like RAID-10 to 
stripe a zpool over a bunch of RAID-1 vdevs.  Even though many of our queries 
do mostly sequential I/O, getting higher seeks/second is more important to us 
than the sacrificed diskspace.

For the record, those blogs also included a link to a very helpful ZFS Best 
Practices Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

To sum up, so far the short list of tuning suggestions for ZFS includes:
 - Use a separate zpool and filesystem for xlogs if your apps write often.
 - Consider setting zil_disable=1 on the xlogs' dedicated filesystem.  ZIL is 
the intent log, and it sounds like disabling it may be like disabling 
journaling.  Previous message threads in the Postgres archives debate whether 
this is safe for the xlogs, but it didn't seem like a conclusive answer was 
reached.
 - Make filesystem block size (zfs record size) match the Postgres block size.
 - Manually adjust vdev_cache.  I think this sets the read-ahead size.  It 
defaults to 64 KB.  For OLTP workload, reduce it; for DW/OLAP maybe increase it.
 - Test various settings for vq_max_pending (until zfs can auto-tune it).  See 
http://blogs.sun.com/erickustarz/entry/vq_max_pending
 - A zpool of mirrored disks should support more seeks/second than RAID-Z, just 
like RAID 10 vs. RAID 50.  However, no single Postgres backend will see better 
than a single disk's seek rate, because the executor currently dispatches only 
1 logical I/O request at a time.


 Dimitri [EMAIL PROTECTED] 03/23/07 2:28 AM 
On Friday 23 March 2007 03:20, Matt Smiley wrote:
 My company is purchasing a Sunfire x4500 to run our most I/O-bound
 databases, and I'd like to get some advice on configuration and tuning. 
 We're currently looking at: - Solaris 10 + zfs + RAID Z
  - CentOS 4 + xfs + RAID 10
  - CentOS 4 + ext3 + RAID 10
 but we're open to other suggestions.


Matt,

for Solaris + ZFS you may find answers to all your questions here:

  http://blogs.sun.com/roch/category/ZFS
  http://blogs.sun.com/realneel/entry/zfs_and_databases

Think to measure log (WAL) activity and use separated pool for logs if needed. 
Also, RAID-Z is more security-oriented rather performance, RAID-10 should be 
a better choice...

Rgds

[PERFORM] Sunfire X4500 recommendations

2007-03-22 Thread Matt Smiley
My company is purchasing a Sunfire x4500 to run our most I/O-bound databases, 
and I'd like to get some advice on configuration and tuning.  We're currently 
looking at:
 - Solaris 10 + zfs + RAID Z
 - CentOS 4 + xfs + RAID 10
 - CentOS 4 + ext3 + RAID 10
but we're open to other suggestions.

From previous message threads, it looks like some of you have achieved stellar 
performance under both Solaris 10 U2/U3 with zfs and CentOS 4.4 with xfs.  
Would those of you who posted such results please describe how you tuned the 
OS/fs to yield those figures (e.g. patches, special drivers, read-ahead, 
checksumming, write-through cache settings, etc.)?

Most of our servers currently run CentOS/RedHat, and we have little experience 
with Solaris, but we're not opposed to Solaris if there's a compelling reason 
to switch.  For example, it sounds like zfs snapshots may have a lighter 
performance penalty than LVM snapshots.  We've heard that just using LVM (even 
without active snapshots) imposes a maximum sequential I/O rate of around 600 
MB/s (although we haven't yet reached this limit experimentally).

By the way, we've also heard that Solaris is more stable under heavy I/O load 
than Linux.  Have any of you experienced this?  It's hard to put much stock in 
such a blanket statement, but naturally we don't want to introduce 
instabilities.

Thanks in advance for your thoughts!

For reference:

Our database cluster will be 3-6 TB in size.  The Postgres installation will be 
8.1 (at least initially), compiled to use 32 KB blocks (rather than 8 KB).  The 
workload will be predominantly OLAP.  The Sunfire X4500 has 2 dual-core 
Opterons, 16 GB RAM, 48 SATA disks (500 GB/disk * 48 = 24 TB raw - 12 TB 
usable under RAID 10).

So far, we've seen the X4500 deliver impressive but suboptimal results using 
the out-of-the-box installation of Solaris + zfs.  The Linux testing is in the 
early stages (no xfs, yet), but so far it yields comparatively modest write 
rates and very poor read and rewrite rates.

===
Results under Solaris with zfs:
===

Four concurrent writers:
% time dd if=/dev/zero of=/zpool1/test/50GB-zero1 bs=1024k count=51200 ; time 
sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero2 bs=1024k count=51200 ; time 
sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero3 bs=1024k count=51200 ; time 
sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero4 bs=1024k count=51200 ; time 
sync

Seq Write (bs = 1 MB):  128 + 122 + 131 + 124 = 505 MB/s

Four concurrent readers:
% time dd if=/zpool1/test/50GB-zero1 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero2 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero3 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero4 of=/dev/null bs=1024k

Seq Read (bs = 1 MB):   181 + 177 + 180 + 178 = 716 MB/s


One bonnie++ process:
% bonnie++ -r 16384 -s 32g:32k -f -n0 -d /zpool1/test/bonnie_scratch

Version  1.03   --Sequential Output-- --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
thumper132G:32k   604173  98 268893  43   543389  59 519.2  
 3
thumper1,32G:32k,,,604173,98,268893,43,,,543389,59,519.2,3,


4 concurrent synchronized bonnie++ processes:
% bonnie++ -p4
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -p-1

Combined results of 4 sessions:
Seq Output:   124 + 124 + 124 + 140 = 512 MB/s
Rewrite:   93 +  94 +  93 +  96 = 376 MB/s
Seq Input:192 + 194 + 193 + 197 = 776 MB/s
Random Seek:  327 + 327 + 335 + 332 = 1321 seeks/s


=
Results under CentOS 4 with ext3 and LVM:
=

% bonnie++ -s 32g:32k -f -n0 -d /large_lvm_stripe/test/bonnie_scratch
Version  1.03   --Sequential Output-- --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
thumper1.rt 32G:32k   346595  94 59448  11   132471  12 479.4   
2
thumper1.rtkinternal,32G:32k,,,346595,94,59448,11,,,132471,12,479.4,2,



Summary of bonnie++ results:


                        sequential  sequential    sequential  scattered
Test case               write MB/s  rewrite MB/s  read MB/s   seeks/s
----------------------  ----------  ------------  ----------  ---------
Sol10+zfs, 1 process           604           269         543        519
Sol10+zfs, 4 processes         512           376         776

[PERFORM] pgsql upgrade

2006-12-04 Thread Matt Chambers
Hi all, I've run into an issue with a Pg 7.4.6 to 8.1.5 upgrade along 
with hardware upgrade. 
I moved the database from a 2x 3.0ghz Xeon (512kb w/HT) to a 2x Opteron 
250.  The database is in
memory on a tmpfs partition. (the application can rebuild the db during 
total system failure)


When I first switched it over, the results were exactly what I expected. 
I was sustaining about
a 2.2 on the Xeon and with the Opteron, about a 1.5 with the same 
traffic. The box is highload,

it outputs about 15mbps.

After a couple of hours working perfectly, the 'system' (vs user) load 
jumped from
3% of total CPU usage to 30%, a 10x increase, and postgres started to 
write out
to data/base at a fairly fast rate.  The CPU context switch rate doubled 
at the same time.

Iowait, which was historically 0 on the 7.4 box, went to 0.08.

Strangely enough, a vacuum (not full or analyze) stopped postgres from 
writing to data/base
but the strange load pattern remains. (system is ~30% of the overall 
load, vs 3% before)


So, my question is, what happened, and how can I get it back to the same 
load pattern

7.4.6 had, and the same pattern I had for 4 hours before it went crazy?


Matt

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Loading the entire DB into RAM

2006-04-07 Thread Matt Davies | Postgresql List
If memory serves me correctly I have seen several posts about this in 
the past.


I'll try to recall highlights.

1. Create an md in Linux large enough to handle the data set 
you want to store.

2. Create a HD based copy somewhere as your permanent storage mechanism.
3. Start up your PostgreSQL instance with the MD as the data store
4. Load your data to the MD instance.
5. Figure out how you will change indexes _and_ ensure that your disk 
storage is consistent with your MD instance.


I haven't done so, but it would be interesting to have a secondary 
database somewhere that is your primary storage. It needn't be 
especially powerful, or even available. It serves as the place to 
generate your indexing data. You could then use SLONY to propagate the 
data to the MD production system.


Of course, if you are updating your system that resides in ram, you 
should be thinking the other way. Have SLONY replicate changes to the 
other, permanent storage, system.


Either way you do it, I can't think of an out-of-the-box method of doing 
it. Somehow one has to transfer data from permanent storage to the md 
instance, and, likewise, back to permanent storage.


Out of curiosity, what are you using as the search engine?


Charles A. Landemaine wrote:

I have a web server with PostgreSQL and RHEL. It hosts a search
engine, and each time some one makes a query, it uses the HDD Raid
array. The DB is not very big, it is less than a GB. I plan to add
more RAM anyway.

What I'd like to do is find out how to keep the whole DB in RAM so
that each time some one does a query, it doesn't use the HDD. Is it
possible, if so, how?
Thanks,

Charles.

---(end of broadcast)---
TIP 6: explain analyze is your friend


  



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Storing Digital Video

2006-01-31 Thread Matt Davies | Postgresql List

Rodrigo Madera wrote:


I am concerned with performance issues involving the storage of DV on
a database.

I though of some options, which would be the most advised for speed?

1) Pack N frames inside a container and store the container to the db.
2) Store each frame in a separate record in the table frames.
3) (type something here)

Thanks for the help,

 



My experience has been that this is a very bad idea. Many people want to 
store all sorts of data in a database, such as email messages, pictures, 
etc... The idea of a relational database is to perform queries against 
data. If you just need to store data, then store it on a disk and 
use the database as the indexer of the data.


Keep in mind the larger the database the slower some operations become.

Unless you are operating on the frame data (which you either store as 
blobs or hex-encoded data) I'd recommend you store the data on a hard 
drive and let the database store meta data about the video such as path 
information, run time, author, etc...

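Something along these lines is usually all the database needs to hold (column 
names are just an illustration):

CREATE TABLE video (
    id          serial PRIMARY KEY,
    file_path   text NOT NULL,        -- where the actual frames live on disk
    run_time    interval,
    author      text,
    recorded_at timestamp
);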

We do this on an application storing close to a million images and the 
performance is impressive.
   1. we don't have to do any sort of data manipulation storing the 
data in or retrieving the data out of the database.
   2. our database is compact and extremely fast  - it is using the 
database for what it was designed for - relational queries.


My $0.02


Rodrigo

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


 




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Prefetch

2005-05-15 Thread Matt Olson
I wanted to get some opinions about row prefetching.  AFAIK, there is no 
prefetching done by PostgreSQL; all prefetching is delegated to the operating 
system.  

The hardware (can't say enough good things about it):

Athlon 64, dual channel
4GB ram
240GB usable 4 disk raid5 (ATA133)
Fedora Core 3
PostgreSQL 7.4.7

I have what is essentially a data warehouse of stock data.  Each day has 
around 30,000 records (tickers).  A typical operation is to get the 200 day 
simple moving average (of price) for each ticker and write the result to a 
summary table.  In running this process (Perl/DBI), it is typical to see 
70-80% I/O wait time with postgres running at about 8-9%.   If I run the next 
day's date, the postgres cache and file cache is now populated with 199 days 
of the needed data, postgres runs 80-90% of CPU and total run time is greatly 
reduced.  My conclusion is that this is a high cache hit rate in action.  

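(For the curious, the per-ticker calculation is essentially a 200-day moving 
average along these lines -- written here with a window function for brevity, 
which is newer than the 7.4 install above, and with made-up table and column 
names.)

SELECT ticker, trade_date,
       avg(close_price) OVER (PARTITION BY ticker
                              ORDER BY trade_date
                              ROWS BETWEEN 199 PRECEDING AND CURRENT ROW) AS sma_200
  FROM daily_prices;
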
I've done other things that make sense, like using indexes, playing with the 
planner constants and turning up the postgres cache buffers.  

Even playing with extreme hdparm read-ahead numbers (e.g. 64738), there is no 
apparent difference in database performance.  The random nature of the I/O 
drops disk reads down to about 1MB/sec for the array.  A linear table scan 
can easily yield 70-80MB/sec on this system.  Total table size is usually 
around 1GB and with indexes should be able to fit completely in main memory.

Other databases like Oracle and DB2 implement some sort of row prefetch.  Has 
there been serious consideration of implementing something like a prefetch 
subsystem?  Does anyone have any opinions as to why this would be a bad idea 
for postgres?  

Postgres is great for a multiuser environment and OLTP applications.  However, 
in this set up, a data warehouse, the observed performance is not what I 
would hope for.  

Regards,

Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Prefetch

2005-05-10 Thread Matt Olson
I wanted to get some opinions about row prefetching.  AFAIK, there is no
prefetching done by PostgreSQL; all prefetching is delegated to the operating
system.

The hardware (can't say enough good things about it):

Athlon 64, dual channel
4GB ram
240GB usable 4 disk raid5 (ATA133)
Fedora Core 3
PostgreSQL 7.4.7

I have what is essentially a data warehouse of stock data.  Each day has
around 30,000 records (tickers).  A typical operation is to get the 200 day
simple moving average (of price) for each ticker and write the result to a
summary table.  In running this process (Perl/DBI), it is typical to see
70-80% I/O wait time with postgres running at about 8-9%.   If I run the next
day's date, the postgres cache and file cache is now populated with 199 days
of the needed data, postgres runs 80-90% of CPU and total run time is greatly
reduced.  My conclusion is that this is a high cache hit rate in action.

I've done other things that make sense, like using indexes, playing with the
planner constants and turning up the postgres cache buffers.

Even playing with extreme hdparm read-ahead numbers (e.g. 64738) yields no
apparent difference in database performance.  The random nature of the I/O
drops disk reads down to about 1MB/sec for the array.  A linear table scan
can easily yield 70-80MB/sec on this system.  Total table size is usually
around 1GB and with indexes should be able to fit completely in main memory.

Other databases like Oracle and DB2 implement some sort of row prefetch.  Has
there been serious consideration of implementing something like a prefetch
subsystem?  Does anyone have any opinions as to why this would be a bad idea
for postgres?

Postgres is great for a multiuser environment and OLTP applications.  However,
in this set up, a data warehouse, the observed performance is not what I
would hope for.

Regards,

Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Prefetch

2005-05-10 Thread Matt Olson
My postgres binaries and WAL are on a separate disk from the raid array.  The 
table I'm doing the selects from is probably about 4GB in size and 18-20 
million records.  No concurrent or dependent inserts or deletes are going on.

Tom's point and your points about optimizing the application are well taken.  
I know my approach is sub optimal and prone to getting caught by latency 
issues (seek times, cache hit rates, etc.).  However, the question of 
prefetch in my mind is all about eliminating latencies, so, I thought my 
problem would be good for the sake of discussing prefetching.

The two approaches I'm in the process of testing are Rod and Greg's suggestion 
of using 'CLUSTER'.  And for the sake of not letting a good idea get away, 
I'll probably spend time on doing a parallel query approach which Tom 
suggested.  

I'll report back to the list what I find and maybe do some _rough_ 
benchmarking.  This is a production app, so I can't get too much in the way 
of the daily batches.  

-- 
Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

On Tuesday 10 May 2005 11:13 am, Greg Stark wrote:
 Matt Olson writes:
  I've done other things that make sense, like using indexes, playing with
  the planner constants and turning up the postgres cache buffers.
 
  Even playing with extream hdparm read-ahead numbers (i.e. 64738) yields
  no apparent difference in database performance.  The random nature of the
  I/O drops disk reads down to about 1MB/sec for the array.  A linear table
  scan can easily yield 70-80MB/sec on this system.  Total table size is
  usually around 1GB and with indexes should be able to fit completely in
  main memory.

 Actually forcing things to use indexes is the wrong direction to go if
 you're trying to process lots of data and want to stream it off disk as
 rapidly as possible. I would think about whether you can structure your
 data such that you can use sequential scans. That might mean partitioning
 your raw data into separate tables and then accessing only the partitions
 that are relevant to the query.

 In your application that might be hard. It sounds like you would need more
 or less one table per stock ticker which would really be hard to manage.

 One thing you might look into is using the CLUSTER command. But postgres
 doesn't maintain the cluster ordering so it would require periodically
 rerunning it.

 I'm a bit surprised by your 1MB/s rate. I would expect to see about 10MB/s
 even for completely random reads. Is it possible you're seeing something
 else interfering? Do you have INSERT/UPDATE/DELETE transactions happening
 concurrently with this select scan? If so you should strongly look into
 separating the transaction log from the data files.



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM]

2005-01-21 Thread Matt Casters

 On Thu, Jan 20, 2005 at 11:31:29 -0500,
   Alex Turner [EMAIL PROTECTED] wrote:
 I am curious - I wasn't aware that postgresql supported partitioned tables,
 Could someone point me to the docs on this.

 Some people have been doing it using a union view. There isn't actually
 a partition feature.



Actually, there is.  If found this example on pgsql-performance:

   CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
   ANALYZE super_foo ;

   CREATE TABLE sub_foo1 () INHERITS ( super_foo );
   INSERT INTO sub_foo1 VALUES ( 1, 1 );
   -- repeat insert until sub_foo1 has 1,000,000 rows
   CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition );
   ANALYZE sub_foo1 ;

   CREATE TABLE sub_foo2 () INHERITS ( super_foo );
   INSERT INTO sub_foo2 VALUES ( 2, 1 );
   -- repeat insert until sub_foo2 has 1,000,000 rows
   CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition );
   ANALYZE sub_foo2 ;

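(As a side note: if the child tables also carry CHECK constraints on the partition 
column, a planner that supports constraint exclusion -- PostgreSQL grew a 
constraint_exclusion setting in 8.1 -- can skip the irrelevant children entirely.  
A sketch on top of the example above:)

   CREATE TABLE sub_foo1 ADD CONSTRAINT sub_foo1_part CHECK ( partition = 1 );
   -- correction: ALTER, not CREATE
   ALTER TABLE sub_foo1 ADD CONSTRAINT sub_foo1_part CHECK ( partition = 1 );
   ALTER TABLE sub_foo2 ADD CONSTRAINT sub_foo2_part CHECK ( partition = 2 );
   SET constraint_exclusion = on;
   -- should now scan only sub_foo2 (plus the empty parent):
   EXPLAIN SELECT * FROM super_foo WHERE partition = 2;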

I think that in certain cases this system even beats Oracle, as it stores less 
information in the
table partitions (and in doing so causes less disk IO).
BTW, internally, Oracle sees partitions as tables too.  Even the UNION ALL 
system that MS SQL
Server uses works fine as long as the optimiser knows how to prune it correctly.

Cheers,

Matt
--
Matt Casters [EMAIL PROTECTED]
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 Okegem, Belgium
Phone +32 (0) 486/97.29.37



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-21 Thread Matt Clark
Presumably it can't _ever_ know without being explicitly told, because 
even for a plain SELECT there might be triggers involved that update 
tables, or it might be a select of a stored proc, etc.  So in the 
general case, you can't assume that a select doesn't cause an update, 
and you can't be sure that the table list in an update is a complete 
list of the tables that might be updated.


Tatsuo Ishii wrote:
Can I ask a question?
Suppose table A gets updated on the master at time 00:00. Until 00:03
pgpool needs to send all queries regarding A to the master only. My
question is, how can pgpool know a query is related to A?
--
Tatsuo Ishii
 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM]

2005-01-21 Thread Matt Casters

  Some people have been doing it using a union view. There isn't actually
  a partition feature.

 Actually, there is.  If found this example on pgsql-performance:

CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
ANALYZE super_foo ;
 
CREATE TABLE sub_foo1 () INHERITS ( super_foo );
 [...]
 
CREATE TABLE sub_foo2 () INHERITS ( super_foo );
 [...]
 

 Yes, this could be used instead of a view. But there is one thing
 missing. You can't just insert into super_foo and aquire the correct
 partition. You will still have to insert into the correct underlying
 table. Real partitioning will take care of correct partition
 selection.

This IS bad news.  It would mean a serious change in the ETL.
I think I can solve the other problems, but I don't know about this one...
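
One workaround, sketched against the super_foo example quoted above, is a rewrite 
rule per partition so the ETL can keep targeting the parent table (a BEFORE INSERT 
trigger works too, and unlike rules it is also honoured by COPY):

CREATE RULE super_foo_ins_1 AS
    ON INSERT TO super_foo WHERE ( NEW.partition = 1 )
    DO INSTEAD INSERT INTO sub_foo1 VALUES ( NEW.partition, NEW.bar );

CREATE RULE super_foo_ins_2 AS
    ON INSERT TO super_foo WHERE ( NEW.partition = 2 )
    DO INSTEAD INSERT INTO sub_foo2 VALUES ( NEW.partition, NEW.bar );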

Regards,

Matt





---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-21 Thread Matt Clark
Yes, I wasn't really choosing my examples particularly carefully, but I 
think the conclusion stands: pgpool (or anyone/thing except for the 
server) cannot in general tell from the SQL it is handed by the client 
whether an update will occur, nor which tables might be affected.

That's not to say that pgpool couldn't make a good guess in the majority 
of cases!

M
Joshua D. Drake wrote:
Matt Clark wrote:
Presumably it can't _ever_ know without being explicitly told, 
because even for a plain SELECT there might be triggers involved that 
update tables, or it might be a select of a stored proc, etc.  So in 
the general case, you can't assume that a select doesn't cause an 
update, and you can't be sure that the table list in an update is a 
complete list of the tables that might be updated.

Uhmmm no :) There is no such thing as a select trigger. The closest 
you would get
is a function that is called via select which could be detected by 
making sure
you are prepending with a BEGIN or START Transaction. Thus yes pgPool 
can be made
to do this.

Sincerely,
Joshua D. Drake

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


[PERFORM]

2005-01-20 Thread Matt Casters

Hi,

I have the go ahead of a customer to do some testing on Postgresql in a couple 
of weeks as a
replacement for Oracle.
The reason for the test is that the number of users of the warehouse is going 
to increase and this
will have a serious impact on licencing costs. (I bet that sounds familiar)

We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on 
Oracle.
Basically we have 2 large fact tables to deal with: one going for 400M rows, 
the other will be
hitting 1B rows soon.
(around 250Gb of data)

My questions to the list are: has this sort of thing been attempted before? If 
so, what were the
performance results compared to Oracle?
I've been reading up on partitioned tables on pgsql; will the performance 
benefit be
comparable to Oracle partitioned tables?
What are the gotchas?
Should I be testing on 8 or the 7 version?
While I didn't find any documents immediately, are there any fine manuals to 
read on data
warehouse performance tuning on PostgreSQL?

Thanks in advance for any help you may have, I'll do my best to keep 
pgsql-performance up to date
on the results.

Best regards,

Matt
--
Matt Casters [EMAIL PROTECTED]
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 Okegem, Belgium
Phone +32 (0) 486/97.29.37



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM]

2005-01-20 Thread Matt Casters
 
Thanks Stephen,

My main concern is to get as much read performance on the disks as possible
on this given system.  CPU is rarely a problem on a typical data warehouse
system, this one's not any different.

We basically have 2 RAID5 disk sets (300Gb and 150Gb) with a third one
coming along (around 350Gb).
I was kind of hoping that the new PGSQL tablespaces would allow me to create
a storage container spanning multiple file-systems, but unfortunately, that
seems to be not the case.  Is this correct?

That tells me that I probably need to do a full reconfiguration of the disks
on the Solaris level to get maximum performance out of the system.
Mmmm. This is going to be a tough one to crack.  Perhaps it will be
possible to get some extra juice out of placing the indexes on the smaller
disks (150G) and the data on the bigger ones?

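Separate tablespaces per disk set would at least let me put the data and the 
indexes on different arrays explicitly -- something like this (paths and names 
invented):

CREATE TABLESPACE ts_big   LOCATION '/raid5_300gb/pgdata';
CREATE TABLESPACE ts_small LOCATION '/raid5_150gb/pgindex';

CREATE TABLE fact_sales ( sale_id bigint, sale_date date, amount numeric )
    TABLESPACE ts_big;
CREATE INDEX fact_sales_date_idx ON fact_sales ( sale_date )
    TABLESPACE ts_small;
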
Thanks!

Matt

-Original Message-
From: Stephen Frost [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 20 January 2005 15:26
To: Matt Casters
CC: pgsql-performance@postgresql.org
Subject: Re: [PERFORM]

* Matt Casters ([EMAIL PROTECTED]) wrote:
 I have the go ahead of a customer to do some testing on Postgresql in 
 a couple of weeks as a replacement for Oracle.
 The reason for the test is that the number of users of the warehouse 
 is going to increase and this will have a serious impact on licencing 
 costs. (I bet that sounds familiar)

Rather familiar, yes... :)

 We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb
RAM) on Oracle.
 Basically we have 2 large fact tables to deal with: one going for 400M 
 rows, the other will be hitting 1B rows soon.
 (around 250Gb of data)

Quite a bit of data.  There's one big thing to note here I think- Postgres
will not take advantage of multiple CPUs for a given query, Oracle will.
So, it depends on your workload as to how that may impact you.  Situations
where this will be unlikely to affect you:

Your main bottle-neck is IO/disk and not CPU.
You run multiple queries in parallel frequently.
There are other processes on the system which chew up CPU time anyway.

Situations where you're likely to be affected would be:

You periodically run one big query.
You run a set of queries in sequential order.

 My questions to the list are: has this sort of thing been attempted 
 before? If so, what where the performance results compared to Oracle?

I'm pretty sure it's been attempted before but unfortunately I don't have
any numbers on it myself.  My data sets aren't that large (couple million
rows) but I've found PostgreSQL at least as fast as Oracle for what we do,
and much easier to work with.

 I've been reading up on partitioned tabes on pgsql, will the 
 performance benefit will be comparable to Oracle partitioned tables?

In this case I would think so, except that PostgreSQL still won't use
multiple CPUs for a given query, even against partitioned tables, aiui.

 What are the gotchas?

See above? :)  Other issues are things having to do w/ your specific
SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it,
something like select x,y from a,b where x=%y; to do a right-join, iirc).

 Should I be testing on 8 or the 7 version?

Now that 8.0 is out I'd say probably test with that and just watch for 8.0.x
releases before you go production, if you have time before you have to go
into production with the new solution (sounds like you do- changing
databases takes time anyway).

 Thanks in advance for any help you may have, I'll do my best to keep 
 pgsql-performance up to date on the results.

Hope that helps.  Others on here will correct me if I misspoke. :)

Stephen



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM]

2005-01-20 Thread Matt Casters

Joshua,

Actually that's a great idea!
I'll have to check if Solaris wants to play ball though.
We'll have to see as we don't have the new disks yet, ETA is next week.

Cheers,

Matt

-Original Message-
From: Joshua D. Drake [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 20 January 2005 21:26
To: [EMAIL PROTECTED]
CC: pgsql-performance@postgresql.org
Subject: Re: [PERFORM]

Matt Casters wrote:
  
 Thanks Stephen,
 
 My main concern is to get as much read performance on the disks as 
 possible on this given system.  CPU is rarely a problem on a typical 
 data warehouse system, this one's not any different.
 
 We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third 
 one coming along.(around 350Gb)

Why not run two raid systems. A RAID 1 for your OS and a RAID 10 for your
database? Push all of your extra drives into the RAID 10.

Sincerely,

Joshua D. Drake




 I was kind of hoping that the new PGSQL tablespaces would allow me to 
 create a storage container spanning multiple file-systems, but 
 unfortunately, that seems to be not the case.  Is this correct?
 
 That tells me that I probably need to do a full reconfiguration of the 
 disks on the Solaris level to get maximum performance out of the system.
 Mmmm. This is going to be a though one to crack.  Perhaps it will be 
 possible to get some extra juice out of placing the indexes on the 
 smaller disks (150G) and the data on the bigger ones?
 
 Thanks!
 
 Matt
 
 -Original Message-
 From: Stephen Frost [mailto:[EMAIL PROTECTED]
 Sent: Thursday, 20 January 2005 15:26
 To: Matt Casters
 CC: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM]
 
 * Matt Casters ([EMAIL PROTECTED]) wrote:
 
I have the go ahead of a customer to do some testing on Postgresql in 
a couple of weeks as a replacement for Oracle.
The reason for the test is that the number of users of the warehouse 
is going to increase and this will have a serious impact on licencing 
costs. (I bet that sounds familiar)
 
 
 Rather familiar, yes... :)
 
 
We're running a medium sized data warehouse on a Solaris box (4CPU, 
8Gb
 
 RAM) on Oracle.
 
Basically we have 2 large fact tables to deal with: one going for 400M 
rows, the other will be hitting 1B rows soon.
(around 250Gb of data)
 
 
 Quite a bit of data.  There's one big thing to note here I think- 
 Postgres will not take advantage of multiple CPUs for a given query,
Oracle will.
 So, it depends on your workload as to how that may impact you.  
 Situations where this will be unlikely to affect you:
 
 Your main bottle-neck is IO/disk and not CPU.
 You run multiple queries in parallel frequently.
 There are other processes on the system which chew up CPU time anyway.
 
 Situations where you're likely to be affected would be:
 
 You periodically run one big query.
 You run a set of queries in sequential order.
 
 
My questions to the list are: has this sort of thing been attempted 
before? If so, what where the performance results compared to Oracle?
 
 
 I'm pretty sure it's been attempted before but unfortunately I don't 
 have any numbers on it myself.  My data sets aren't that large (couple 
 million
 rows) but I've found PostgreSQL at least as fast as Oracle for what we 
 do, and much easier to work with.
 
 
I've been reading up on partitioned tabes on pgsql, will the 
performance benefit will be comparable to Oracle partitioned tables?
 
 
 In this case I would think so, except that PostgreSQL still won't use 
 multiple CPUs for a given query, even against partitioned tables, aiui.
 
 
What are the gotchas?
 
 
 See above? :)  Other issues are things having to do w/ your specific
 SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is 
 it, something like select x,y from a,b where x=%y; to do a right-join,
iirc).
 
 
Should I be testing on 8 or the 7 version?
 
 
 Now that 8.0 is out I'd say probably test with that and just watch for 
 8.0.x releases before you go production, if you have time before you 
 have to go into production with the new solution (sounds like you do- 
 changing databases takes time anyway).
 
 
Thanks in advance for any help you may have, I'll do my best to keep 
pgsql-performance up to date on the results.
 
 
 Hope that helps.  Others on here will correct me if I misspoke. :)
 
   Stephen
 
 
 


--
Command Prompt, Inc., your source for PostgreSQL replication, professional
support, programming, managed services, shared and dedicated hosting. Home
of the Open Source Projects plPHP, plPerlNG, pgManage,  and pgPHPtoolkit.
Contact us now at: +1-503-667-4564 - http://www.commandprompt.com






[PERFORM] DWH on Postgresql

2005-01-20 Thread Matt Casters



Hi,

I have the go ahead of a customer to do some testing on Postgresql in a
couple of weeks as a replacement for Oracle.

The reason for the test is that the number of users of the warehouse is
going to increase and this will have a serious impact on licencing costs.
(I bet that sounds familiar)

We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb
RAM) on Oracle. Basically we have 2 large fact tables to deal with: one
going for 400M rows, the other will be hitting 1B rows soon.
(around 250Gb of data)

My questions to the list are: has this sort of thing been attempted
before? If so, what were the results?
I've been reading up on partitioned tables on pgsql, will the performance
benefit be comparable to Oracle partitioned tables?
What are the gotchas? Should I be testing on 8 or the 7 version?

Thanks in advance for any help you may have, I'll do my best to keep
pgsql-performance up to date on the results.

Best regards,

Matt
___
Matt Casters
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 OKEGEM, Belgium
Tel. 054/25.01.37
GSM 0486/97.29.37




Re: [PERFORM] Swapping on Solaris

2005-01-19 Thread Matt Casters

 Kevin Schroeder wrote:
 It looks to me like you are using no (device or file) swap at all, and
 have 1.3G of real memory free, so could in fact give Postgres more of it :-)


Indeed.
If you DO run into trouble after giving Postgres more RAM, use the vmstat 
command.
You can use this command like vmstat 10. (ignore the first line)
Keep an eye on the pi and po parameters. (kilobytes paged in and out)

HTH,

Matt
--
Matt Casters [EMAIL PROTECTED]
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 Okegem, Belgium
Phone +32 (0) 486/97.29.37




Re: [PERFORM] Swapping on Solaris

2005-01-19 Thread Matt Clark
This page may be of use:
http://www.serverworldmagazine.com/monthly/2003/02/solaris.shtml
From personal experience, for god's sake don't think Solaris' VM/swap 
implementation is easy - it's damn good, but it ain't easy!

Matt
Kevin Schroeder wrote:
I think it's probably just reserving them.  I can't think of anything 
else. Also, when I run swap activity with sar I don't see any 
activity, which also points to reserved swap space, not used swap space.

swap -s reports
total: 358336k bytes allocated + 181144k reserved = 539480k used, 
2988840k available

Kevin
- Original Message - From: Alan Stange [EMAIL PROTECTED]
To: Kevin Schroeder [EMAIL PROTECTED]
Cc: pgsql-performance@postgresql.org
Sent: Wednesday, January 19, 2005 11:04 AM
Subject: Re: [PERFORM] Swapping on Solaris

Kevin Schroeder wrote:
I may be asking the question the wrong way, but when I start up 
PostgreSQL swap is what gets used the most of.  I've got 1282MB free 
RAM right now and and 515MB swap in use.  Granted, swap file usage 
probably wouldn't be zero, but I would guess that it should be a lot 
lower so something must be keeping PostgreSQL from using the free 
RAM that my system is reporting.  For example, one of my postgres 
processes is 201M in size but on 72M is resident in RAM.  That extra 
130M is available in RAM, according to top, but postgres isn't using 
it.

The test you're doing doesn't measure what you think you're measuring.
First, what else is running on the machine?  Note that some shared 
memory allocations do reserve backing pages in swap, even though the 
pages aren't currently in use.  Perhaps this is what you're 
measuring?  swap -s has better numbers than top.

You'd be better by trying a reboot then starting pgsql and seeing 
what memory is used.

Just because you start a process and see the swap number increase 
doesn't mean that the new process is in swap.  It means some 
anonymous pages had to be evicted to swap to make room for the new 
process or some pages had to be reserved in swap for future use.   
Typically a new process won't be paged out unless something else is 
causing enormous memory pressure...

-- Alan




Re: [PERFORM] Speed in V8.0

2004-12-24 Thread Matt Clark
Another man working to the bitter end this Christmas!

There could be many reasons, but maybe first you should look at the amount
of RAM available?  If the tables fit in RAM on the production server but not
on the dev server, then that will easily defeat the improvement due to using
the native DB version.

Why don't you install cygwin on the dev box and do the comparison using the
same hardware?

M

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Thomas Wegner
 Sent: 21 December 2004 23:03
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] Speed in V8.0
 
 
 Hello, i have a problem between V7.4.3 Cygwin and
 V8.0RC2 W2K. I have 2 systems:
 
 1. Production Machine
 - Dual P4 3000MHz
 - 2 GB RAM
 - W2K
 - PostgreSQL 7.4.3 under Cygwin
 - i connect to it over a DSL Line
 2. Develop Machine
 - P4 1800MHz
 - 760 MB RAM
 - PostgreSQL Native Windows
 - local connection 100MB/FD
 
 Both systems use the default postgresql.conf. Now the 
 problem. I have an (unoptimized, dynamic) query which was 
 executed on the production machine over DSL in 2 seconds and 
 on my development machine, connected over local LAN, in 119 seconds!
 
 What's this? I can not post the query details here publicly, it's 
 a commercial project. Any first idea? I execute on both 
 machines the same query with the same database design!
 -
 Thomas Wegner
 CabrioMeter - The Weather Plugin for Trillian 
 http://www.wegner24.de/cabriometer
 
 
 


Re: [PERFORM] ext3 journalling type

2004-11-08 Thread Matt Clark
 Am I right to assume that writeback is both fastest and at 
 the same time as safe to use as ordered?  Maybe any of you 
 did some benchmarks?

It should be fastest because it is the least overhead, and safe because
postgres does its own write-order guaranteeing through fsync().  You should
also mount the FS with the 'noatime' option.
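
For what it's worth, a minimal /etc/fstab sketch of that combination (device
and mount point are invented - adjust to your own layout):

  /dev/sdb1  /var/lib/pgsql  ext3  noatime,data=writeback  0 0

data=writeback only relaxes the ordering of file data, not metadata, which is
fine here precisely because of the fsync() behaviour described above.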

But for some workloads, there are tests showing that 'data=journal' can
be the fastest!  This is because although the data is written twice (once to
the journal, and then to its real location on disk) in this mode data is
written _sequentially_ to the journal, and later written out to its
destination, which may be at a quieter time.

There's a discussion (based around 7.2) here:
http://www.kerneltraffic.org/kernel-traffic/kt20020401_160.txt

M




Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Matt Clark
 For some reason it's a requirement that partial wildcard 
 searches are done on this field, such as SELECT ... WHERE 
 field LIKE 'A%'
 
 I thought an interesting way to do this would be to simply 
 create partial indexes for each letter on that field, and it 
 works when the query matches the WHERE clause in the index 
 exactly like above.  The problem is thus:

I thought PG could use an ordinary index for 'like' conditions with just a
terminating '%'?

My other thought is that like 'A%' should grab about 1/26th of the table
anyway (if the initial character distribution is random), and so a
sequential scan might be the best plan anyway...
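
One caveat on the ordinary-index idea: it only works out of the box in the C
locale.  In other locales (7.4 and later) you'd need an operator-class index -
a sketch, with table and column names invented:

  CREATE INDEX idx_field_prefix ON mytable (field varchar_pattern_ops);

(or text_pattern_ops for a text column).  With that in place, WHERE field LIKE
'A%' can use an index range scan, and the planner is still free to fall back
to a seq scan when the prefix isn't selective enough.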

M




[PERFORM] What is the difference between these?

2004-11-05 Thread Matt Nuzum
To me, these three queries seem identical... why doesn't the first one
(simplest to understand and write) go the same speed as the third one?

All I'm trying to do is get statistics for one day (in this case,
today) summarized.  Table has ~25M rows.  I'm using postgres 7.3.? on
rh linux 7.3 (note that i think the difference between the first two
might just be related to the data being in memory for the second
query).


 EXPLAIN ANALYZE
 select count(distinct sessionid) from usage_access where atime >
date_trunc('day', now());
QUERY PLAN  
--
 Aggregate  (cost=933439.69..933439.69 rows=1 width=4) (actual
time=580350.43..580350.43 rows=1 loops=1)
   -  Seq Scan on usage_access  (cost=0.00..912400.11 rows=8415831
width=4) (actual time=580164.48..580342.21 rows=2964 loops=1)
 Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 580350.65 msec
(4 rows)


 EXPLAIN ANALYZE
 select count(distinct sessionid) from (select * from usage_access
where atime > date_trunc('day', now())) as temp;
QUERY PLAN  
--
 Aggregate  (cost=933439.69..933439.69 rows=1 width=4) (actual
time=348012.85..348012.85 rows=1 loops=1)
   -  Seq Scan on usage_access  (cost=0.00..912400.11 rows=8415831
width=4) (actual time=347960.53..348004.68 rows=2964 loops=1)
 Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 348013.10 msec
(4 rows)


 EXPLAIN ANALYZE
 select count(distinct sessionid) from usage_access where atime
between date_trunc('day', now()) and date_trunc('day', now()) + '1
day'::interval;
QUERY PLAN  
--
 Aggregate  (cost=89324.98..89324.98 rows=1 width=4) (actual
time=27.84..27.84 rows=1 loops=1)
   -  Index Scan using usage_access_atime on usage_access 
(cost=0.00..89009.39 rows=126237 width=4) (actual time=0.51..20.37
rows=2964 loops=1)
 Index Cond: ((atime >= date_trunc('day'::text, now())) AND
(atime <= (date_trunc('day'::text, now()) + '1 day'::interval)))
 Total runtime: 28.11 msec
(4 rows)

-- 
Matthew Nuzum   | Makers of Elite Content Management System
www.followers.net   | View samples of Elite CMS in action
[EMAIL PROTECTED]   | http://www.followers.net/portfolio/



Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Matt Clark
 With that many rows, and a normal index on the field, 
 postgres figures the best option for say I% is not an index 
 scan, but a sequential scan on the table, with a filter -- 
 quite obviously this is slow as heck, and yes, I've run 
 analyze several times and in fact have the vacuum analyze automated.

Ah, so like 'I%' uses a very slow seq scan, but like 'ABC%' uses an
ordinary index OK?  If so then...

The planner would usually assume (from what Tom usually says) that 1/26
selectivity isn't worth doing an index scan for, but in your case it's wrong
(maybe because the rows are very big?)

You may be able to get the planner to go for an index scan on like 'I%' by
tweaking the foo_cost variables in postgresql.conf 

Or you could have the app rewrite like 'I%' to like 'IA%' or like 'IB%'
... , or do that as a stored proc.
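
For the cost-variable route, the knob that usually matters is
random_page_cost; an easy way to experiment per-session before touching
postgresql.conf (the value and names here are only illustrative):

  SET random_page_cost = 2;
  EXPLAIN ANALYZE SELECT * FROM bigtable WHERE field LIKE 'I%';

If the plan flips to an index scan and the actual time drops, make the change
permanent; if not, the planner's seq scan was probably the right call.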

 With the partial index the index scan is used and the cost 
 drops from 0..2million to 0..9000 -- a vast improvement.

So there are really only 9000 rows out of 76 million starting with 'I'?  How
about combining some techniques - you could create an index on the first two
chars of the field (should be selective enough to give an index scan),
select from that, and select the actual data with the like clause.

CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
CREATE INDEX idx_all ON table (field);
SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
AS approx WHERE field LIKE 'DE%';

Any good?




Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark
 I have a dual processor system that can support over 150 concurrent 
 connections handling normal traffic and load.   Now suppose I setup 
 Apache to spawn all of it's children instantly, what will 
...
 This will spawn 150 children in a short order of time and as 
 this takes 

Doctor, it hurts when I do this!
Well, don't do that then...

Sorry, couldn't resist ;-)

Our Apache/PG driven website also needs to be able to deal with occasional
large peaks, so what we do is:

StartServers 15 # Don't create too many children initially
MinSpareServers 10  # Always have at least 10 spares lying around
MaxSpareServers 20  # But no more than 20
MaxClients 150  # Up to 150 - the default 256 is too much for our RAM


So on server restart 15 Apache children are created, then one new child
every second up to a maximum of 150.

Apache's 'ListenBackLog' is around 500 by default, so there's plenty of
scope for queuing inbound requests while we wait for sufficient children to
be spawned.

In addition we (as _every_ high load site should) run Squid as an
accelerator, which dramatically increases the number of client connections
that can be handled.  Across 2 webservers at peak times we've had 50,000
concurrently open http  https client connections to Squid, with 150 Apache
children doing the work that squid can't (i.e. all the dynamic stuff), and
PG (on a separate box of course) whipping through nearly 800 mixed selects,
inserts and updates per second - and then had to restart Apache on one of
the servers for a config change...  Not a problem :-)

One little tip - if you run squid on the same machine as apache, and use a
dual-proc box, then because squid is single-threaded it will _never_ take
more than half the CPU - nicely self balancing in a way.

M




Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark
 Apache::DBI overall works better to what I require, even if 
 it is not a 
 pool per se.   Now if pgpool supported variable rate pooling like 
 Apache does with it's children, it might help to even things 
 out.  That 
 and you'd still get the spike if you have to start the webserver and 
 database server at or around the same time.

I still don't quite get it though - you shouldn't be getting more than one
child per second being launched by Apache, so that's only one PG postmaster
per second, which is really a trivial load.  That is unless you have
'StartServers' set high, in which case the 'obvious' answer is to lower it.
Are you launching multiple DB connections per Apache process as well?




Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark
 Case in point: A first time visitor hits your home page.  A 
 dynamic page is generated (in about 1 second) and served 
 (taking 2 more seconds) which contains links to 20 additional 

The gain from an accelerator is actually even more that that, as it takes
essentially zero seconds for Apache to return the generated content (which
in the case of a message board could be quite large) to Squid, which can
then feed it slowly to the user, leaving Apache free again to generate
another page.  When serving dialup users large dynamic pages this can be a
_huge_ gain.

I think Martin's pages (dimly recalling another thread) take a pretty long
time to generate though, so he may not see quite such a significant gain.





Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark
 Correct the 75% of all hits are on a script that can take 
 anywhere from 
 a few seconds to a half an hour to complete.The script 
 essentially 
 auto-flushes to the browser so they get new information as it arrives 
 creating the illusion of on demand generation.

This is more like a streaming data server, which is a very different beast
from a webserver, and probably better suited to the job.  Usually either
multithreaded or single-process using select() (just like Squid).  You could
probably build one pretty easily.  Using a 30MB Apache process to serve one
client for half an hour seems like a hell of a waste of RAM.

 A squid proxy would probably cause severe problems when 
 dealing with a 
 script that does not complete output for a variable rate of time.

No, it's fine, squid gives it to the client as it gets it, but can receive
from the server faster.




Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

1- You have a query that runs for half an hour and you spoon feed 
the  results to the client ?
(argh)

2- Your script looks for new data every few seconds, sends a 
packet, then  sleeps, and loops ?

If it's 2 I have a readymade solution for you, just ask.
I'm guessing (2) - PG doesn't give the results of a query in a stream. 



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

These are CGI scripts at the lowest level, nothing more and nothing 
less.  While I could probably embed a small webserver directly into 
the perl scripts and run that as a daemon, it would take away the 
portability that the scripts currently offer.
If they're CGI *scripts* then they just use the CGI environment, not 
Apache, so a daemon that accepts the inbound connections, then compiles 
the scripts a-la Apache::Registry, but puts each in a separate thread 
would be, er, relatively easy for someone better at multithreaded stuff 
than me.

This should be my last question on the matter, does squid report the 
proper IP address of the client themselves?That's a critical 
requirement for the scripts.

In the X-Forwarded-For header.  Not that you can be sure you're seeing 
the true client IP anyway if they've gone through an ISP proxy beforehand.




Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

In your webpage include an iframe with a Javascript to refresh it 
every  five seconds. The iframe fetches a page from the server which 
brings in  the new data in form of generated JavaScript which writes 
in the parent  window. Thus, you get a very short request every 5 
seconds to fetch new  data, and it is displayed in the client's window 
very naturally.

...
Yup.  If you go the JS route then you can do even better by using JS to 
load data into JS objects in the background and manipulate the page 
content directly, no need for even an Iframe.  Ignore the dullards who 
have JS turned off - it's essential for modern web apps, and refusing JS 
conflicts absolutely with proper semantic markup.

http://developer.apple.com/internet/webcontent/xmlhttpreq.html is a good 
starting point.

It's clear that this discussion has moved way away from PG!  Although in 
the context of DB backed web apps I guess in remains a bit on-topic...

M


Re: [PERFORM] appropriate indexing

2004-11-04 Thread Matt Clark

- ITEM table will, grow, grow, grow (sold items are not deleted)
WHERE PRODUCT.SECTION_USED_FK IS NOT NULL AND ITEM.STATUS=1 and 
(ITEM.KIND=2 or ITEM.KIND=3)

Partial index on item.status ?
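
Roughly along these lines, reusing the column names from the query above
(untested sketch):

  CREATE INDEX item_active_idx ON item (status)
      WHERE status = 1 AND (kind = 2 OR kind = 3);

If sold items no longer satisfy that predicate, the index stays small no
matter how much the ITEM table grows.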


Re: [PERFORM] Better Hardware, worst Results

2004-11-04 Thread Matt Clark

All 3 plans have crappy estimates.
Run ANALYZE in production, then send another explain analyze (as an
attachment please, to avoid linewrap).
 

Er, no other possible answer except Rod's :-)


Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

Javascript is too powerful to turn for any random web page. It is only
essential for web pages because people write their web pages to only
work with javascript.
 

Hmm... I respectfully disagree.  It is so powerful that it is impossible 
to ignore when implementing a sophisticated app.  And it is not 
dangerous to the user so long as they have a popup blocker.  
Commercially, I can ignore the people who turn it off, and I can gain a 
huge benefit from knowing that 95% of people have it turned on, because 
it gives my users a hugely better experience than the equivalent XHTML 
only page (which I deliver, and which works, but which is a fairly 
depressing experience compared to the JS enabled version).

It is _amazing_ how much crud you can take out of a page if you let JS 
do the dynamic stuff (with CSS still in full control of the styling).  
Nice, clean, semantically sensible XHTML, that can be transformed for 
multiple devices - it's great.

An example:
<a class="preview_link">/previews/foo.wmv</a>
But we want it to appear in a popup when viewed in certain devices  
Easy - Attach an 'onclick' event handler (or just set the target 
attribute) when the device has a suitable screen & media player, but 
leave the markup clean for the rest of the world.





Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

A note though : you'll have to turn off HTTP persistent 
connections in  your server (not in your proxy) or youre back to 
square one.

I hadn't considered that.  On the client side it would seem to be up to 
the client whether to use a persistent connection or not.  If it does, 
then yeah, a request every 5 seconds would still just hold open a 
server.  One more reason to use a proxy I s'pose.

It's clear that this discussion has moved way away from PG!  Although 
in  the context of DB backed web apps I guess in remains a bit 
on-topic...

I find it very on-topic as
- it's a way to help this guy solve his pg problem which was in 
fact a design problem
- it's the future of database driven web apps (no more reloading 
the whole page !)

I think in the future there will be a good bit of presentation 
logic in the client...
Not if Bruno has his way ;-)



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

Pierre-Frédéric Caillaud wrote:
check this marvelous piece of 5 minutes of work :
http://boutiquenumerique.com/test/iframe_feed.html
cela m'a fait le sourire :-)
(apologies for bad french)
M



Re: [PERFORM] preloading indexes

2004-11-03 Thread Matt Clark
The best way to get all the stuff needed by a query into RAM is to run
the query.  Is it more that you want to 'pin' the data in RAM so it
doesn't get overwritten by other queries?

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 03 November 2004 17:31
To: [EMAIL PROTECTED]
Subject: [PERFORM] preloading indexes

  I am working with some pretty convoluted queries that work very slowly
  the first time they're called but perform fine on the second call.  I am
  fairly certain that these differences are due to the caching.  Can
  someone point me in a direction that would allow me to pre-cache the
  critical indexes?
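
A crude but practical way to do that warming by hand is to run a query that
is forced onto the index in question at startup - a sketch with invented
names, assuming a text column:

  SET enable_seqscan = off;
  SELECT count(*) FROM big_table WHERE indexed_col > '';
  SET enable_seqscan = on;

The count itself is thrown away; the point is simply that the index (and the
heap pages it touches) get read into cache before the first real query
arrives.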


Re: [PERFORM] can't handle large number of INSERT/UPDATEs

2004-10-27 Thread Matt Clark

and certainly anyone who's been around a computer more than a week or
two knows which direction in and out are customarily seen from.
regards, tom lane

Apparently not whoever wrote the man page that everyone copied ;-)
Interesting. I checked this on several machines. They actually say 
different things.

Redhat 9- bi: Blocks sent to a block device (blocks/s).
Latest Cygwin- bi: Blocks sent to a block device (blocks/s).
Redhat 7.x- bi: Blocks sent to a block device (blocks/s).
Redhat AS3- bi: blocks sent out to a block device (in blocks/s)
I would say that I probably agree, things should be relative to the 
cpu. However, it doesn't seem to be something that was universally 
agreed upon. Or maybe the man-pages were all wrong, and only got 
updated recently.

Looks like the man pages are wrong, for RH7.3 at least.  It says bi is 
'blocks written', but an actual test like 'dd if=/dev/zero of=/tmp/test 
bs=1024 count=16384' on an otherwise nearly idle RH7.3 box gives:
  procs  memoryswap  io 
system cpu
r  b  w   swpd   free   buff  cache  si  sobibo   incs  us  
sy  id
0  0  0  75936 474704 230452 953580   0   0 0 0  106  2527   0   
0  99
0  0  0  75936 474704 230452 953580   0   0 0 16512  376  2572   
0   2  98
0  0  0  75936 474704 230452 953580   0   0 0 0  105  2537   
0   0 100

Which is in line with bo being 'blocks written'.
M


Re: [PERFORM] can't handle large number of INSERT/UPDATEs

2004-10-26 Thread Matt Clark

I don't have iostat on that machine, but vmstat shows a lot of writes to
the drives, and the runnable processes are more than 1:
6  1  0 3617652 292936 279192800 0 52430 1347  4681 25
19 20 37
 

Assuming that's the output of 'vmstat 1' and not some other delay, 
50MB/second of sustained writes is usually considered 'a lot'. 



[PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Matt Clark
I suppose I'm just idly wondering really.  Clearly it's against PG
philosophy to build an FS or direct IO management into PG, but now it's so
relatively easy to plug filesystems into the main open-source Oses, It
struck me that there might be some useful changes to, say, XFS or ext3, that
could be made that would help PG out.

I'm thinking along the lines of an FS that's aware of PG's strategies and
requirements and therefore optimised to make those activities as efficient
as possible - possibly even being aware of PG's disk layout and treating
files differently on that basis.

Not being an FS guru I'm not really clear on whether this would help much
(enough to be worth it anyway) or not - any thoughts?  And if there were
useful gains to be had, would it need a whole new FS or could an existing
one be modified?

So there might be (as I said, I'm not an FS guru...):
* great append performance for the WAL?
* optimised scattered writes for checkpointing?
* Knowledge that FSYNC is being used for preserving ordering a lot of the
time, rather than requiring actual writes to disk (so long as the writes
eventually happen in order...)?


Matt



Matt Clark
Ymogen Ltd
P: 0845 130 4531
W: https://ymogen.net/
M: 0774 870 1584
 




Re: [PERFORM] OS desicion

2004-10-20 Thread Matt Clark
You are asking the wrong question.  The best OS is the OS you (and/or 
the customer)  knows and can administer competently.  The real 
performance differences between unices are so small as to be ignorable 
in this context.  The context switching bug is not OS-dependent, but 
varies in severity across machine architectures (I understand it to be 
mostly P4/Athlon related, but don't take my word for it).

M
Tom Fischer wrote:
Hi List,
I have a Dual-Xeon 3Ghz System with with GB RAM and an Adaptec 212ß SCSI
RAID with 4 SCA Harddiscs. Our customer wants to have the Machine tuned
for best Database performance. Which OS should we used? We are tending
between Linux 2.6 or FreeBSD. The Database Size is 5GB and ascending.
Most SQL-Queries are Selects, the Tablesizes are beetween 300k and up to
10 MB. I've read the Hardware Performance Guide and the result was to
take FreeBSD in the Decision too :)
And what is on this Context Switiching Bug i have read in the Archive? 

Hope you can help me
Regards
Tom


Re: [PERFORM] how much mem to give postgres?

2004-10-20 Thread Matt Clark

How would I turn that off? In the kernel config? Not too familiar with
that. I have a 2 proc xeon with 4 gigs of mem on the way for postgres,
so I hope HT isn't a problem. If HT is turned off, does it just not
use the other half of the processor? Or does the processor just work
as one unit?
 

You turn it off in the BIOS.  There is no 'other half', the processor is 
just pretending to have two cores by shuffling registers around, which 
gives maybe a 5-10% performance gain in certain multithreaded 
situations.  <opinion>A hack to overcome marchitectural limitations due 
to the overly long pipeline in the Prescott core.</opinion>  Really of 
most use for desktop interactivity rather than actual throughput.

M


Re: [PERFORM] OS desicion

2004-10-20 Thread Matt Clark






  
 The real performance differences between unices are so small as to be
 ignorable in this context.

Well, at least the difference between Linux and BSD.  There are substantial
tradeoffs should you choose to use Solaris or UnixWare.

Yes, quite right, I should have said 'popular x86-based unices'. 




Re: [PERFORM] how much mem to give postgres?

2004-10-20 Thread Matt Clark

<OT>
Hyperthreading is actually an excellent architectural feature that
can give significant performance gains when implemented well and used
for an appropriate workload under a decently HT aware OS.
IMO, typical RDBMS streams are not an obviously appropriate workload,
Intel didn't implement it particularly well and I don't think there
are any OSes that support it particularly well.
</OT>
But don't write off using it in the future, when it's been improved
at both the OS and the silicon levels.
 

You are quite right of course  - unfortunately the current Intel 
implementation meets nearly none of these criteria!  As Rod Taylor 
pointed out off-list, IBM's SMT implementation on the Power5 is vastly 
superior.  Though he's also just told me that Sun is beating IBM on 
price/performance for his workload, so who knows how reliable a chap he 
is... ;-)

M


[PERFORM] Speeding up this function

2004-10-19 Thread Matt Nuzum
Hello, I've thought it would be nice to index certain aspects of my
apache log files for analysis. I've used several different techniques
and have something usable now, but I'd like to tweak it one step
further.

My first performance optimization was to change the logformat into a
CSV format.  I processed the logfiles with PHP and plsql stored
procedures. Unfortunately, it took more than 24 hours to process 1
days worth of log files.

I've now switched to using C# (using mono) to create hash-tables to do
almost all of the pre-processing.  This has brought the time down to
about 3 hours.  Actually, if I take out one step it brought the
process down to about 6 minutes, which is a tremendous improvement.

The one step that is adding 2.5+ hours to the job is not easily done
in C#, as far as I know.

Once the mostly-normalized data has been put into a table called
usage_raw_access I then use this query:
insert into usage_access select * , 
usage_normalize_session(accountid,client,atime) as sessionid 
from usage_raw_access;

All it does is try to link pageviews together into a session. 
here's the function:
 create or replace function usage_normalize_session (varchar(12),
inet, timestamptz) returns integer as '
 DECLARE
 -- $1 = Account ID, $2 = IP Address, $3 = Time
 RecordSet record;
 BEGIN
 SELECT INTO RecordSet DISTINCT sessionid FROM usage_access ua
 WHERE ua.accountid = $1
 AND ua.client = $2
 AND ua.atime >= ($3 - ''20 min''::interval)::timestamptz;

 if found
 then return RecordSet.sessionid;
 end if;

 return nextval(''usage_session_ids'');
 END;'
 language plpgsql;

And the table usage_access looks like this:
   Table public.usage_access
  Column|   Type   | Modifiers
-+--+---
[snip]
client  | inet |
atime   | timestamp with time zone |
accountid   | character varying(12)|
sessionid   | integer  |
Indexes: usage_acccess_req_url btree (req_url),
usage_access_accountid btree (accountid),
usage_access_atime btree (atime),
usage_access_hostid btree (hostid),
usage_access_sessionid btree (sessionid)
usage_access_sessionlookup btree (accountid,client,atime);

As you can see, this looks for clients who have visited the same site
within 20 min.  If there is no match, a unique sessionid is assigned
from a sequence.  If there is a visit, the session id assigned to them
is used.  I'm only able to process about 25 records per second with my
setup.  My window to do this job is 3-4 hours and the shorter the
better.

Here is an explain analyze of the query I do (note I limited it to 1000):
EXPLAIN ANALYZE
insert into usage_access select * ,
usage_normalize_session(accountid,client,atime) as sessionid from
usage_raw_access limit 1000;
QUERY PLAN
--
 Subquery Scan *SELECT*  (cost=0.00..20.00 rows=1000 width=196)
(actual time=51.63..47634.22 rows=1000 loops=1)
   -  Limit  (cost=0.00..20.00 rows=1000 width=196) (actual
time=51.59..47610.23 rows=1000 loops=1)
 -  Seq Scan on usage_raw_access  (cost=0.00..20.00 rows=1000
width=196) (actual time=51.58..47606.14 rows=1001 loops=1)
 Total runtime: 48980.54 msec

I also did an explain of the query that's performed inside the function:

EXPLAIN ANALYZE
select sessionid from usage_access ua where ua.accountid =  'XYZ' and
ua.client = '64.68.88.45'::inet and ua.atime >= '2003-11-02
04:50:01-05'::timestamptz;
  
  QUERY PLAN
--
Index Scan using usage_access_sessionlookup on usage_access ua 
(cost=0.00..6.02 rows=1 width=4) (actual time=0.29..0.29 rows=0
loops=1)
  Index Cond: ((accountid = 'XYZ'::character varying) AND (client =
'64.68.88.45'::inet) AND (atime >= '2003-11-02 04:50:01-05'::timestamp
with time zone))
Total runtime: 0.35 msec
(3 rows)


What I'd really like to know is if someone knows a way to do any of
the following:
 a: Make the INSERT into ... SELECT *,usage_access_sessionlookup().. work faster
 b: Make the usage_access_sessionlookup() smarter,better,etc.
 c: Do this in C# using a hash-table or some other procedure that
would be quicker.
 d: Find an algorithm to create the sessionid without having to do any
database or hash-table lookups.  As the dataset gets bigger, it won't
fit in RAM and the lookup queries will become I/O bound, drastically
slowing things down.

d: is my first choice.
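
One lookup-free way to attack (d) is to treat it as a gaps-and-islands
problem: sort the hits by account, client and time, and start a new session
whenever the gap to the previous hit exceeds 20 minutes.  A sketch using
window functions (8.4+, so not available on the 7.3 install above, but it
shows the shape of a single pass that could just as well be coded in C# over
the sorted data):

  SELECT accountid, client, atime,
         sum(is_new) OVER (ORDER BY accountid, client, atime) AS sessionid
  FROM (
      SELECT accountid, client, atime,
             CASE WHEN atime - lag(atime) OVER
                       (PARTITION BY accountid, client ORDER BY atime)
                       <= interval '20 min'
                  THEN 0 ELSE 1 END AS is_new
      FROM usage_raw_access
  ) AS gaps;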

For some reason I just can't seem to get my mind around the data.  I
wonder if there's someway 

Re: [PERFORM] Speeding up this function

2004-10-19 Thread Matt Nuzum
On Tue, 19 Oct 2004 15:49:45 -0400, Jeremy Dunn [EMAIL PROTECTED] wrote:
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of
  Matt Nuzum
  Sent: Tuesday, October 19, 2004 3:35 PM
  To: pgsql-performance
  Subject: [PERFORM] Speeding up this function
 
 snip
snip
 
 This is probably a stupid question, but why are you trying to create
 sessions after the fact?  Since it appears that users of your site must
 login, why not just assign a sessionID to them at login time, and keep
 it in the URL for the duration of the session?  Then it would be easy to
 track where they've been.
 
 - Jeremy
 
 

You don't have to log in to visit the sites.  These log files are
actually for many domains.  Right now, we do logging with a web-bug
and it does handle the sessions, but it relies on javascript and we
want to track a lot more than we are now.  Plus, that code is in
JavaScript and one of our primary motivations is to ditch MySQL
completely.

-- 
Matthew Nuzum   | Makers of Elite Content Management System
www.followers.net   | View samples of Elite CMS in action
[EMAIL PROTECTED]   | http://www.followers.net/portfolio/



Re: [PERFORM] Select with qualified join condition / Batch inserts

2004-10-15 Thread Matt Clark
 SELECT cmp.WELL_INDEX, cmp.COMPOUND, con.CONCENTRATION 
   FROM SCR_WELL_COMPOUND cmp, SCR_WELL_CONCENTRATION con 
   WHERE cmp.BARCODE=con.BARCODE 
   AND cmp.WELL_INDEX=con.WELL_INDEX 
   AND cmp.MAT_ID=con.MAT_ID 
   AND cmp.MAT_ID = 3 
   AND cmp.BARCODE='910125864' 
   AND cmp.ID_LEVEL = 1;

Quick guess - type mismatch forcing sequential scan.  Try some quotes:
AND cmp.MAT_ID = '3' 
AND cmp.BARCODE='910125864' 
AND cmp.ID_LEVEL = '1';

M




Re: [PERFORM] Opteron vs RHAT

2004-10-13 Thread Matt Clark
 trainwreck...  If you're going through IBM, then they won't want to 
 respond to any issues if you're not running a 
 bog-standard RHAS/RHES 
 release from Red Hat.  
... To be fair, we keep on actually running into things that 
 _can't_ be backported, like fibrechannel drivers that were 
 written to take advantage of changes in the SCSI support in 2.6.

I thought IBM had good support for SUSE?  I don't know why I thought that...




Re: [PERFORM] IBM P-series machines

2004-10-12 Thread Matt Clark

As for vendor support for Opteron, that sure looks like a
trainwreck...  If you're going through IBM, then they won't want to
respond to any issues if you're not running a bog-standard RHAS/RHES
release from Red Hat.  And that, on Opteron, is preposterous, because
there's plenty of the bits of Opteron support that only ever got put
in Linux 2.6, whilst RHAT is still back in the 2.4 days.
 

To be fair, they have backported a boatload of 2.6 features to their kernel:
http://www.redhat.com/software/rhel/kernel26/
And that page certainly isn't an exhaustive list...
M


Re: [PERFORM] Which plattform do you recommend I run PostgreSQL for

2004-10-12 Thread Matt Clark

In the MySQL manual it says that MySQL performs best with Linux 2.4 with
ReiserFS on x86. Can anyone official, or in the know, give similar
information regarding PostgreSQL?
 

I'm neither official, nor in the know, but I do have a spare moment! I 
can tell you that any *NIX variant on any modern hardware platform will 
give you good performance, except for Cygwin/x86.  Any differences 
between OSes on the same hardware are completely swamped by far more 
direct concerns like IO systems, database design, OS tuning etc.  'Pick 
the OS you're most familiar with' is usually a good recommendation (and 
not just for Postgres).



Re: [PERFORM] Caching of Queries

2004-10-01 Thread Matt Clark
 OK, that'd work too... the point is if you're re-connecting 
 all the time it doesn't really matter what else you do for 
 performance.

Yeah, although there is the chap who was asking questions on the list
recently who had some very long-running code on his app servers, so was best
off closing the connection because he had far too many postmaster processes
just sitting there idle all the time!

But you're right, it's a killer usually.

M 




Re: [PERFORM] Caching of Queries

2004-09-27 Thread Matt Clark

Basically you set a default in seconds for the HTML results to be
cached, and then have triggers set that force the cache to regenerate
(whenever CRUD happens to the content, for example).
Can't speak for Perl/Python/Ruby/.Net/Java, but Cache_Lite sure made a
believer out of me!
 

Nice to have it in a library, but if you want to be that simplistic then 
it's easy in any language.  What if a process on server B modifies a n 
important value that server A has cached though?  Coherency (albeit that 
the client may choose to not use it) is a must for a general solution.
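
If the invalidation does have to cross servers, the database itself can
broadcast it via LISTEN/NOTIFY - a rough sketch (table, function and channel
names invented; statement-level triggers need 7.4+):

  CREATE FUNCTION content_changed() RETURNS trigger AS '
  BEGIN
      NOTIFY content_cache;  -- every listener drops its cached HTML
      RETURN NULL;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER content_cache_invalidate
      AFTER INSERT OR UPDATE OR DELETE ON content
      FOR EACH STATEMENT EXECUTE PROCEDURE content_changed();

Each app server holds one connection that has done LISTEN content_cache and
flushes its local cache when the notification arrives, which buys the
coherency without moving the cache itself into the database.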



Re: [PERFORM] Caching of Queries

2004-09-27 Thread Matt Clark






  
More to the point though, I think this is a feature that really really 
should be in the DB, because then it's trivial for people to use.  

  
  
How does putting it into PGPool make it any less trivial for people to
use?

The answers are at
http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html . Specifically,
it's a separate application that needs configuration, the homepage has
no real discussion of the potential pitfalls of pooling and what this
implementation does to get around them, you get the idea. I'm sure
it's great software, but it doesn't come as part of the DB server, so
95% of people who would benefit from query caching being implemented in
it never will. If it shipped with and was turned on by default in SUSE
or RedHat that would be a different matter. Which I realise makes me
look like one of those people who doesn't appreciate code unless it's
'popular', but I hope I'm not *that* bad...

Oh OK, I'll say it, this is a perfect example of why My*** has so much
more mindshare. It's not better, but it sure makes the average Joe
_feel_ better. Sorry, I've got my corporate hat on today, I'm sure
I'll feel a little less cynical tomorrow.

M




Re: [PERFORM] Caching of Queries

2004-09-27 Thread Matt Clark

Any competently written application where caching results would be a
suitable performance boost can already implement application or
middleware caching fairly easily, and increase performance much more
than putting result caching into the database would.
 

I guess the performance increase is that you can spend $10,000 on a 
developer, or $10,000 on hardware, and for the most part get a more 
reliable result the second way.  MemcacheD is fine(ish), but it's not a 
panacea, and it's more than easy to shoot yourself in the foot with it.  
Caching is hard enough that lots of people do it badly - I'd rather use 
an implementation from the PG team than almost anywhere else.

I don't see caching results in the database as much of a win for most
well written applications.  Toy benchmarks, sure, but for real apps it
seems it would add a lot of complexity, and violate the whole point of
using an ACID database.
 

Well the point surely is to _remove_ complexity from the application, 
which is written by God Knows Who, and put it in the DB, which is 
written by God And You.  And you can still have ACID (cached data is not 
the same as stale data, although once you have the former, the latter 
can begin to look tempting sometimes).

M


Re: [PERFORM] Caching of Queries

2004-09-23 Thread Matt Clark
 I've looked at PREPARE, but apparently it only lasts 
 per-session - that's worthless in our case (web based 
 service, one connection per data-requiring connection).

That's a non-sequitur.  Most 'normal' high volume web apps have persistent
DB connections, one per http server process.  Are you really dropping DB
connections and reconnecting each time a new HTTP request comes in?

M




Re: [PERFORM] insert

2004-08-13 Thread Matt Clark
  It is likely that you are missing an index on one of those foreign 
  key'd items.
 
 I don't think that is too likely as a foreign key reference 
 must be a unique key which would have an index. 

I think you must be thinking of primary keys, not foreign keys.  All
one-to-many relationships have non-unique foreign keys.
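
To be concrete (table and column names invented), the index that is typically
missing is the one on the referencing column:

  CREATE INDEX order_item_order_id_idx ON order_item (order_id);

Without it, every DELETE or key UPDATE on the referenced (parent) table has to
scan the whole child table to find the rows that point at it, which is the
classic symptom of a missing foreign-key index.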




Re: [PERFORM] Performance Bottleneck

2004-08-10 Thread matt

 Squid also takes away the work of doing SSL (presuming you're running it
 on a different machine). Unfortunately it doesn't support HTTP/1.1 which
 means that most generated pages (those that don't set Content-length) end
 up forcing squid to close and then reopen the connection to the web
 server.

It is true that it doesn't support http/1.1, but 'most generated pages'? 
Unless they are actually emitted progressively they should have a
perfectly good content-length header.

 I've also had some problems when Squid had a large number of connections
 open (several thousand); though that may have been because of my
 half_closed_clients setting. Squid 3 coped a lot better when I tried it
 (quite a few months ago now - and using FreeBSD and the special kqueue
 system call) but crashed under some (admittedly synthetic) conditions.

It runs out of the box with a very conservative setting for max open file
descriptors - this may or may not be the cause of the problems you have
seen.  Certainly I ran squid with 16,000 connections back in 1999...

 You still have periods of time when the web servers are busy using their
 CPUs to generate HTML rather than waiting for database queries. This is
 especially true if you cache a lot of data somewhere on the web servers
 themselves (which, in my experience, reduces the database load a great
 deal). If you REALLY need to reduce the number of connections (because you
 have a large number of web servers doing a lot of computation, say) then
 it might still be useful.

Aha, a postgres related topic in this thread!  What you say is very true,
but then given that the connection overhead is so vanishingly small, why
not simply run without a persistent DB connection in this case?  I would
maintain that if your webservers are holding open idle DB connections for
so long that it's a problem, then simply close the connections!

M



Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Matt Clark
 And this is exactly where the pgpool advantage lies. 
 Especially with the 
 TPC-W, the Apache is serving a mix of PHP (or whatever CGI 
 technique is 
 used) and static content like images. Since the 200+ Apache 
 kids serve 
 any of that content by random and the emulated browsers very much 
 encourage it to ramp up MaxClients children by using up to 4 
 concurrent 
 image connections, one does end up with MaxClients DB 
 connections that 
 are all relatively low frequently used. In contrast to that the real 
 pgpool causes lesser, more active DB connections, which is better for 
 performance.

There are two well-worn and very mature techniques for dealing with the
issue of web apps using one DB connection per apache process, both of which
work extremely well and attack the issue at its source.

1)  Use a front-end caching proxy like Squid as an accelerator.  Static
content will be served by the accelerator 99% of the time.  Additionally,
large pages can be served immediately to the accelerator by Apache, which
can then go on to serve another request without waiting for the end user's
dial-up connection to pull the data down.  Massive speedup, fewer apache
processes needed.

2)  Serve static content off an entirely separate apache server than the
dynamic content, but by using separate domains (e.g. 'static.foo.com').

Personally I favour number 1.  Our last biggish peak saw 6000 open HTTP and
HTTPS connections and only 200 apache children, all of them nice and busy,
not hanging around on street corners looking bored.  During quiet times
Apache drops back to its configured minimum of 40 kids.  Option 2 has the
advantage that you can use a leaner build for the 'dynamic' apache server,
but with RAM so plentiful these days that's a less useful property.

Basically this puts the 'pooling' back in the stateless HTTP area where it
truly belongs and can be proven not to have any peculiar side effects
(especially when it comes to transaction safety).  Even better, so long as
you use URL parameters for searches and the like, you can have the
accelerator cache those pages for a certain time too so long as slightly
stale results are OK.

I'm sure pgpool and the like have their place, but being band-aids for
poorly configured websites probably isn't the best use for them.

M




Re: [PERFORM] Swapping in 7.4.3

2004-07-16 Thread Matt Clark
 This is normal.  My personal workstation has been up for 16 
 days, and it shows 65 megs used for swap.  The linux kernel 
 looks for things that haven't been accessed in quite a while 
 and tosses them into swap to free up the memory for other uses.
 
 This isn't PostgreSQL's fault, or anything elses.  It's how a 
 typical Unix kernel works.  I.e. you're seeing a problem that 
 simply isn't there.

Actually it (and other OSes) does slightly better than that.  It _copies_
the least recently used pages into swap, but leaves them in memory.  Then
when there really is a need to swap stuff out there is no need to actually
write to swap because it's already been done, and conversely if those pages
are wanted then they don't have to be read from disk because they were never
removed from memory.





Re: [PERFORM] Postgres over Linux NBD or NFS

2004-06-22 Thread Matt Clark
How about iSCSI?  This is exactly what it's for - presenting a bunch of
remote SCSI hardware as if it were local.  

There are several reference implementations on SourceForge from Intel, Cisco
& others.

I've never tried it myself, but I would if I had the need.  And let's face
it there are some very big players selling very pricey kit that uses it, so
you should have pretty high confidence that the fundamentals are strong.

M



 The other goal is to be able to stick LOTS of disk into one 
 box, and dole it out to multiple servers.  It's more 
 expensive to set up and manage 3 RAID arrays than it is to 
 set up and manage just 1, because you have to manage 3 sets 
 of disk hardware rather than 1.
[snip]
 The poor man's approach involves trying to fake this by 
 building a disk box running Linux that exports the storage 
 either as a filesystem (using NFS) or as disk blocks (NBD).  
 NFS clearly doesn't provide the filesystem semantics needed 
 to get decent reliability; with NBD, it's not clear what happens :-(.





Re: [PERFORM] RAID or manual split?

2004-06-03 Thread matt
 It seems, that if I know the type and frequency of the queries a
 database will be seeing, I could split the database by hand over
 multiple disks and get better performance that I would with a RAID array
 with similar hardware.

Unlikely, but possible if you had radically different hardware for
different tables.

 Six large (3-7 Mrow) 'summary' tables, each being updated continuously
 by 5-20 processes with about 0.5 transactions/second/process.

Well you should get close to an order of magnitude better performance from
a RAID controller with write-back cache on those queries.

 Periodically (currently every two weeks), join queries are
 performed between one of the 'summary' tables(same one each time) and
 each of the other five.  Each join touches most rows of both tables,
 indexes aren't used.  Results are written into a separate group of
 'inventory' tables (about 500 Krow each), one for each join.

The more disks the data is spread over the better (the RAID controller
will help here with striping).

 There are frequent (100-1000/day) queries of both the
 inventory and summary tables using the primary key -- always using the
 index and returning < 10 rows.

RAM is what you need, to cache the data and indexes, and then as much CPU
power as you can get.

 We're currently getting (barely) acceptable performance from a single
 15k U160 SCSI disk, but db size and activity are growing quickly.
 I've got more disks and a battery-backed LSI card on order.

3 or more disks in a stripe set, with write back caching, will almost
certainly give a huge performance boost.  Try that first, and only if you
have issues should you think about futzing with symlinks etc.

M



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Matt Clark
As a cross-ref to all the 7.4.x tests people have sent in, here's 7.2.3 (Redhat 7.3), 
Quad Xeon 700MHz/1MB L2 cache, 3GB RAM.

Idle-ish (it's a production server) cs/sec ~5000

3 test queries running:
   procs  memoryswap  io system cpu
 r  b  w   swpd   free   buff  cache   si  sobibo   incs   us  sy  id
 3  0  0  23380 577680 105912 2145140   0   0 0 0  107 116890  50  14  35
 2  0  0  23380 577680 105912 2145140   0   0 0 0  114 118583  50  15  34
 2  0  0  23380 577680 105912 2145140   0   0 0 0  107 115842  54  14  32
 2  1  0  23380 577680 105920 2145140   0   0 032  156 117549  50  16  35

HTH

Matt

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Tom Lane
 Sent: 20 April 2004 01:02
 To: [EMAIL PROTECTED]
 Cc: Joe Conway; scott.marlowe; Bruce Momjian; [EMAIL PROTECTED];
 [EMAIL PROTECTED]; Neil Conway
 Subject: Re: [PERFORM] Wierd context-switching issue on Xeon 
 
 
 Here is a test case.  To set up, run the test_setup.sql script once;
 then launch two copies of the test_run.sql script.  (For those of
 you with more than two CPUs, see whether you need one per CPU to make
 trouble, or whether two test_runs are enough.)  Check that you get a
 nestloops-with-index-scans plan shown by the EXPLAIN in test_run.
 
 In isolation, test_run.sql should do essentially no syscalls at all once
 it's past the initial ramp-up.  On a machine that's functioning per
 expectations, multiple copies of test_run show a relatively low rate of
 semop() calls --- a few per second, at most --- and maybe a delaying
 select() here and there.
 
 What I actually see on Josh's client's machine is a context swap storm:
 vmstat 1 shows CS rates around 170K/sec.  strace'ing the backends
 shows a corresponding rate of semop() syscalls, with a few delaying
 select()s sprinkled in.  top(1) shows system CPU percent of 25-30
 and idle CPU percent of 16-20.
 
 I haven't bothered to check how long the test_run query takes, but if it
 ends while you're still examining the behavior, just start it again.
 
 Note the test case assumes you've got shared_buffers set to at least
 1000; with smaller values, you may get some I/O syscalls, which will
 probably skew the results.
 
   regards, tom lane
 
 




Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux

2004-03-24 Thread Matt Clark
  Now if these vendors could somehow eliminate downtime due to human error
  we'd be talking *serious* reliablity.

 You mean making the OS smart enough to know when clearing the arp
 cache is a bonehead operation, or just making the hardware smart
 enough to realise that the keyswitch really shouldn't be turned
 while 40 people are logged in?  (Either way, I agree this'd be an
 improvement.  It'd sure make colocation a lot less painful.)

Well I was joking really, but those are two very good examples!  Yes, machines should require
extra confirmation for operations like those.  Hell, even a simple 'init 0' would be well served
by a prompt that says "There are currently 400 network sockets open, 50 remote users logged in,
and 25 disk IOs per second.  What's more, there's nobody logged in at the console to boot me up
again afterwards - are you _sure_ you want to shut the machine down?"  It's also crazy that
there's no prompt after an 'rm -rf' (we could have 'rm -rf --iacceptfullresponsibility' for an
unprompted version).

Stuff like that would have saved me from a few embarrassments in the past for sure ;-)

It drives me absolutely nuts every time I see a $staggeringly_expensive clustered 
server whose sysadmins are scared to do a failover
test in case something goes wrong!  Or which has worse uptime than my desktop PC 
because the cluster software's poorly set up or
administered.  Or which has both machines on the same circuit breaker.  I could go on 
but it's depressing me.

Favourite anecdote:  A project manager friend of mine had a new 'lights out' 
datacenter to set up.  The engineers, admins and
operators swore blind that everything had been tested in every possible way, and that 
incredible uptime was guaranteed.  'So if I
just pull this disk out everything will keep working?' he asked, and then pulled the 
disk out without waiting for an answer...

Ever since he told me that story I've done exactly that with every piece of so-called 
'redundant' hardware a vendor tries to flog
me.  Ask them to set it up, then just do nasty things to it without asking for 
permission.  Less than half the gear makes it through
that filter, and actually you can almost tell from the look on the technical sales 
rep's face as you reach for the
drive/cable/card/whatever whether it will or won't.

M








Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux

2004-03-23 Thread Matt Clark
If it's going to be write intensive then the RAID controller will be the most important
thing.  A dual P3/500 with a write-back cache will smoke either of the boxes you mention
running software RAID, as far as write performance goes.

As for the compute intensive side (complex joins & sorts etc), the Dell will most likely
beat the Sun by some distance, although what the Sun lacks in CPU power it may make up a
bit in memory bandwidth/latency.

Matt

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Subbiah,
 Stalin
 Sent: 23 March 2004 18:41
 To: 'Andrew Sullivan'; '[EMAIL PROTECTED]'
 Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux


 We are looking into a Sun V210 (2 x 1 GHz CPU, 2 gig RAM, Solaris 5.8) vs. a Dell 1750
 (2 x 2.4 GHz Xeon, 2 gig RAM, RH3.0).  The database will mostly be
 write intensive and the disks will be on RAID 10.  Wondering if 64bit 1 GHz vs.
 32bit 2.4 GHz makes a big difference here.

 Thanks!

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Andrew
 Sullivan
 Sent: Tuesday, March 23, 2004 9:37 AM
 To: '[EMAIL PROTECTED]'
 Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux


 On Mon, Mar 22, 2004 at 04:05:45PM -0800, Subbiah, Stalin wrote:
  being the key performance booster for postgres.  what is the preferred OS
  for postgres deployment if given an option between linux and solaris. As

 One thing this very much depends on is what you're trying to do.
 Suns have a reputation for greater reliability.  While my own
 experience with Sun hardware has been rather shy of sterling, I _can_
 say that it stands head and shoulders above a lot of the x86 gear you
 can get.

 If you're planning to use Solaris on x86, don't bother.  Solaris is a
 slow, bloated pig compared to Linux, at least when it comes to
 managing the largish number of processes that Postgres requires.

 If pure speed is what you're after, I have found that 2-way, 32 bit
 Linux on P-IIIs compares very favourably to 4 way 64 bit Ultra SPARC
 IIs.

 A

 --
 Andrew Sullivan  | [EMAIL PROTECTED]
 The fact that technology doesn't work is no bar to success in the
 marketplace.
   --Philip Greenspun







Re: [PERFORM] Disappointing performance in db migrated from MS SQL

2004-02-13 Thread matt
 Josh, the disks in the new system should be substantially faster than
 the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has
 15k RPM disks, as opposed to the 10k RPM disks in the old system.

Spindle speed does not correlate with 'throughput' in any easy way.  What
controllers are you using for these disks?
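
A quick way to see what each array actually delivers, rather than inferring from spindle
speed, is a crude dd test (the path and size below are placeholders only):

# ~1GB sequential write, forcing it to disk before the timer stops
time sh -c 'dd if=/dev/zero of=/array/ddtest bs=8k count=131072 && sync'
# sequential read back (use a file bigger than RAM, or the OS cache will flatter you)
time dd if=/array/ddtest of=/dev/null bs=8k

Sequential numbers only tell part of the story though - RAID 5 tends to fall over on small
random writes, which is what most database write activity looks like, so the controller and
its cache usually matter far more than 10k vs 15k spindles.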



Re: [PERFORM] insert speed - Mac OSX vs Redhat

2004-01-15 Thread Matt Clark
 On a variety of hardware with Redhat, and versions of postgres, we're
 not getting much better than 50 inserts per second. This is prior to
 moving WAL to another disk, and fsync is on.

 However, with postgres 7.4 on Mac OSX 10.2.3, we're getting an amazing
 500 inserts per second.

 We can only put this down to the OS.
  ^^^

You haven't really produced much evidence to support that statement.  Given that the
differences in performance between Postgres running on *BSD and Linux on Intel hardware
are not large at all, it's almost certainly false.

It may of course be due to some settings of the different OSes, but not the OSes 
themselves.

It would help if you gave a straight PG7.4 comparison with hardware specs as well, and 
config file differences if any.

One thought: assuming the Apple has IDE disks, then the disks probably have write 
caching turned on, which is good for speed, but
not crash-safe.
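
The rotational arithmetic backs that up.  With fsync on and no write cache, a commit can't
complete until the WAL record is physically on the platter, so a single connection doing
one insert per transaction is bounded, as a rule of thumb, by the rotation rate:

 7,200 RPM / 60 ~= 120 commits/sec   (typical IDE)
10,000 RPM / 60 ~= 166 commits/sec
15,000 RPM / 60 ~= 250 commits/sec

So if those 500 inserts/sec are individually committed on one connection, something between
PG and the platter is acknowledging writes out of cache.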

matt







Re: [PERFORM] Adding RAM: seeking advice & warnings of hidden gotchas

2003-12-17 Thread Matt Clark
If you have 3 1.5GB tables then you might as well go for 4GB while you're at
it.  Make sure you've got a bigmem kernel either running or available, and
boost effective_cache_size by whatever amount you increase the RAM by.  We
run a Quad Xeon/4GB server on Redhat 7.3 and it's solid as a rock.

There is no way I know of to get indexes preferentially cached over data
though.
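
To put rough numbers on it, the postgresql.conf side of a 1GB-to-2GB (or 4GB) upgrade would
look something like the following - illustrative values only, and note that on 7.3
effective_cache_size is counted in 8KB disk pages:

shared_buffers = 8192             # stay at 64MB; 7.3 gains little from much more
sort_mem = 16384                  # 16MB per sort - fine with a 16-connection pool
effective_cache_size = 196608     # ~1.5GB of OS cache with 2GB of RAM
# with 4GB of RAM, something nearer 393216 (~3GB) is reasonable

Everything else (SHMMAX etc.) can stay put unless shared_buffers goes up.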

Matt

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Nick
 Fankhauser
 Sent: 17 December 2003 19:57
 To: [EMAIL PROTECTED] Org
 Subject: [PERFORM] Adding RAM: seeking advice & warnings of hidden gotchas


 Hi-

 After having done my best to squeeze better performance out of our
 application by tuning within our existing resources, I'm falling back on
 adding memory as a short-term solution while we get creative for
 a long-term
 fix. I'm curious about what experiences others have had with the
 process of
 adding big chunks of RAM. In particular, if I'm trying to encourage the OS
 to cache more of my index information in RAM, what sort of configuration
 should I do at both the PostgreSQL and OS level?

 In a slightly off-topic vein, I'd also like to hear about it if
 anyone knows
 about any gotchas at the OS level that might become a problem.

 The server is a dual processor Athlon 1.2GHz box with hardware
 SCSI RAID. It
 currently has 1 GB RAM, and we're planning to add one GB more for
 a total of
 2GB. The OS is Debian Linux Kernel 2.4.x, and we're on PostgreSQL v7.3.2

 My current memory related settings are:

 SHMMAX and SHMALL set to 128MB (OS setting)
 shared buffers 8192 (64MB)
 sort_mem 16384 (16MB)
 effective_cache_size  65536 (512MB)


 We support up to 70 active users, sharing a connection pool of 16
 connections. Most of the queries center around 3 tables that are about 1.5
 GB each.


 Thanks.
-Nick

 -
 Nick Fankhauser

 [EMAIL PROTECTED]  Phone 1.765.965.7363  Fax 1.765.962.9788
 doxpop - Court records at your fingertips - http://www.doxpop.com/








Re: [PERFORM] tuning questions

2003-12-09 Thread Matt Clark
 I ended up going back to a default postgresql.conf and reapplying the
 various tunings one-by-one. Turns out that while setting fsync = false
 had little effect on the slow IDE box, it had a drastic effect on this
 faster SCSI box and performance is quite acceptable now (aside from the
 expected falloff of about 30% after the first twenty minutes, which I
 believe comes from growing and shrinking tables without vacuumdb
 --analyzing).

Hmm.  I wonder if that could be related to the issue where many IDE drives have 
write-caching enabled.  With the write cache enabled
fsyncs are nearly immediate, so setting fsync=false makes little difference...
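
If you want to confirm it, the drive's write cache can be toggled from Linux - the device
name below is a guess, and leave the cache off only for the test, since performance will
crater while it's disabled:

hdparm -W  /dev/hda    # show the current write-caching setting
hdparm -W0 /dev/hda    # disable the drive's write cache
# re-run the same load with fsync=true and again with fsync=false
hdparm -W1 /dev/hda    # re-enable it afterwards

With the cache off, the IDE box should show the same dramatic fsync=true vs fsync=false gap
the SCSI box does; with it on, fsync is effectively a no-op as far as the drive is concerned
(and not crash-safe).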






