Re: [PERFORM] Pgsql - Red Hat Linux - VS MySQL VS MSSQL
On Sun, Jul 13, 2003 at 12:51:02PM -0700, Balazs Wellisch wrote:
> > Alternatively, you simply compile 7.3.3 from source. I've upgraded
> > most of my machines that way.
>
> Unfortunately, compiling from source is not really an option for us. We
> use RPMs only, to ease the installation and upgrade process. We have
> over a hundred servers to maintain, and having to compile and recompile
> software every time a new release comes out would be way too much work.

If you aren't settled on an OS yet, take a look at FreeBSD, or one of the
Linuxes that have better application management. Keeping pgsql up-to-date
using ports on FreeBSD is pretty painless (for that matter, so is keeping
the OS itself up-to-date).
--
Jim C. Nasby, Database Consultant                [EMAIL PROTECTED]
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Odd explain estimate
On Thu, Jul 31, 2003 at 04:59:21PM -0400, Andrew Sullivan wrote:
> On Thu, Jul 31, 2003 at 02:51:45PM -0500, Jim C. Nasby wrote:
>
> If you really needed to set enable_seqscan=false (did you really? Are
> you sure that's not the cheapest way?), you might want to investigate
> expanding the statistics on the indexed column, increasing the
> correlation through clustering, and other such tricks.

Well, if I don't do this it wants to seqscan a table that occupies 350k
pages, instead of pulling a couple thousand rows. I started running it
with the seqscan and it's already taken way longer than it does if I
disable seqscan. I guess I'll try expanding the statistics.
Re: [PERFORM] Odd explain estimate
On Fri, Aug 01, 2003 at 08:16:12AM -0400, Andrew Sullivan wrote:
> On Thu, Jul 31, 2003 at 05:59:59PM -0500, Jim C. Nasby wrote:
> > Well, if I don't do this it wants to seqscan a table that occupies
> > 350k pages, instead of pulling a couple thousand rows. I started
> > running it with the seqscan and it's already taken way longer than it
> > does if I disable seqscan.
>
> That was indeed the question. If it uses a seqscan when it ought not to,
> then there's something wrong with the statistics, or you haven't vacuum
> analysed correctly, or your table needs a vacuum full (is it really 350k
> pages, or is that mostly dead space?), etc. -- all the usual bad-seqscan
> candidates. enable_seqscan=off is probably not a good strategy for any
> moderately complicated query. If the planner were perfect, of course,
> you'd never need it at all.

Set statistics on the ID column to 1000, vacuum analyze, and it's good to
go now. Thanks for your help!
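For the archives, the statistics change described above is a per-column
setting; it looks roughly like this (table and column names here are
illustrative, not the originals from this thread):

```sql
-- Widen the histogram/MCV list for the column the planner misjudges,
-- then re-gather stats so the new target takes effect.
ALTER TABLE mytable ALTER COLUMN id SET STATISTICS 1000;
ANALYZE mytable;

-- Sanity check what the planner now sees:
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'mytable' AND attname = 'id';
```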
[PERFORM] Poor pg_dump performance
First, sorry if this has been answered before; the list search seems to
be down...

This is on a quad Xeon-PII 450 machine running FBSD 4.8.

  PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
84386 pgsql     64    0  104M    99M RUN    1  78:20 61.87% 61.87% postgres
84385 decibel   64    0 3748K  2268K CPU1   3  49:49 37.79% 37.79% pg_dump

(note that the CPU percents are per-CPU, so 100% would be 100% of one CPU)

According to vmstat, there's very little disk I/O, so that's not a
bottleneck. The command I used was:

    pg_dump -vFc -f pgsql-20030906.cdb stats

It should be compressing, but if that was the bottleneck, shouldn't the
pg_dump process be at 100% CPU? It does seem a bit coincidental that the
two procs seem to be taking 100% of one CPU (top shows them running on
different CPUs though). This is version 7.3.4.
Re: [PERFORM] possible improvement between G4 and G5
On Tue, Apr 06, 2004 at 01:47:22AM -0400, Tom Lane wrote:
> Qing Zhao <[EMAIL PROTECTED]> writes:
> > We have got a G5 64-bit processor to replace an old G4 32-bit
> > processor. Given everything else equal, should we see a big
> > improvement on PG's performance?
>
> Nope. Database performance typically depends on disk performance first,
> and RAM size second. A 64-bit processor might help by allowing you to
> install more RAM, but you didn't say that you had.

Memory bandwidth is a consideration too, so you might see some
performance improvements on a G5. We recently debated between Xeons and
Opterons in a new PGSQL server, and a little poking around on the lists
indicated that the Opterons did perform better, presumably due to the
increased memory bandwidth. Incidentally, this is why you need about 2x
the CPUs on Sun hardware vs RS/6000 hardware for database work (and that
gets expensive if you're paying per CPU!).
[PERFORM] Poor performance of group by query
Anyone have any ideas why this query would be so slow?

stats=# explain analyze SELECT work_units, min(raw_rank) AS rank
stats-#   FROM Trank_work_overall GROUP BY work_units;
                                QUERY PLAN
---------------------------------------------------------------------------
 HashAggregate  (cost=1050.12..1085.98 rows=14347 width=16)
   (actual time=163149.981..163227.758 rows=17849 loops=1)
   ->  Seq Scan on trank_work_overall  (cost=0.00..804.41 rows=49141
       width=16) (actual time=0.071..328.682 rows=49091 loops=1)
 Total runtime: 163296.212 ms
(3 rows)

stats=# \d Trank_work_overall
  Table "pg_temp_1.trank_work_overall"
   Column   |  Type  | Modifiers
------------+--------+-----------
 raw_rank   | bigint |
 work_units | bigint |

stats=#

FreeBSD fritz.distributed.net 5.2.1-RELEASE FreeBSD 5.2.1-RELEASE #1:
Wed Apr 7 18:42:52 CDT 2004
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/FRITZ amd64

The machine is a dual Opteron with 4G of memory. The query in question
was not hitting the disk at all. PostgreSQL 7.4.2 compiled with -O3.

Also, if I set enable_hashagg = false, it runs in less than a second.
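The workaround mentioned in the last line looks like this in a session (a
sketch; disabling hashagg globally is not recommended):

```sql
SET enable_hashagg = false;  -- session-local planner toggle
EXPLAIN ANALYZE
SELECT work_units, min(raw_rank) AS rank
FROM Trank_work_overall
GROUP BY work_units;
RESET enable_hashagg;        -- restore the default afterwards
```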
[PERFORM] Horribly slow hash join
Note the time for the hash join step:

                                QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=357.62..26677.99 rows=93668 width=62)
   (actual time=741.159..443381.011 rows=49091 loops=1)
   Hash Cond: ("outer".work_today = "inner".work_units)
   ->  Hash Join  (cost=337.11..24784.11 rows=93668 width=54)
         (actual time=731.374..417188.519 rows=49091 loops=1)
         Hash Cond: ("outer".work_total = "inner".work_units)
         ->  Seq Scan on email_rank  (cost=0.00..22240.04 rows=254056
               width=46) (actual time=582.145..1627.759 rows=49091 loops=1)
               Filter: (project_id = 8)
         ->  Hash  (cost=292.49..292.49 rows=17849 width=16)
               (actual time=148.944..148.944 rows=0 loops=1)
               ->  Seq Scan on rank_tie_overall o  (cost=0.00..292.49
                     rows=17849 width=16)
                     (actual time=0.059..75.984 rows=17849 loops=1)
   ->  Hash  (cost=17.81..17.81 rows=1081 width=16)
         (actual time=8.996..8.996 rows=0 loops=1)
         ->  Seq Scan on rank_tie_today d  (cost=0.00..17.81 rows=1081
               width=16) (actual time=0.080..4.635 rows=1081 loops=1)
 Total runtime: 619047.032 ms

By comparison:

stats=# set enable_hashjoin=false;
SET
stats=# explain analyze select * from email_rank, rank_tie_overall o,
stats-#   rank_tie_today d
stats-#   WHERE email_rank.work_today = d.work_units
stats-#     AND email_rank.work_total = o.work_units
stats-#     AND email_rank.project_id = :ProjectID;
                                QUERY PLAN
---------------------------------------------------------------------------
 Merge Join  (cost=55391.69..56823.23 rows=93668 width=80)
   (actual time=2705.344..3349.318 rows=49091 loops=1)
   Merge Cond: ("outer".work_units = "inner".work_today)
   ->  Index Scan using work_units_today on rank_tie_today d
         (cost=0.00..23.89 rows=1081 width=16)
         (actual time=0.150..4.874 rows=1081 loops=1)
   ->  Sort  (cost=55391.69..55625.86 rows=93668 width=64)
         (actual time=2705.153..2888.039 rows=49091 loops=1)
         Sort Key: email_rank.work_today
         ->  Merge Join  (cost=45047.64..47656.93 rows=93668 width=64)
               (actual time=1685.414..2494.342 rows=49091 loops=1)
               Merge Cond: ("outer".work_units = "inner".work_total)
               ->  Index Scan using work_units_overall on
                     rank_tie_overall o  (cost=0.00..361.34 rows=17849
                     width=16) (actual time=0.122..79.383 rows=17849 loops=1)
               ->  Sort  (cost=45047.64..45682.78 rows=254056 width=48)
                     (actual time=1685.228..1866.215 rows=49091 loops=1)
                     Sort Key: email_rank.work_total
                     ->  Seq Scan on email_rank  (cost=0.00..22240.04
                           rows=254056 width=48)
                           (actual time=786.515..1289.101 rows=49091 loops=1)
                           Filter: (project_id = 8)
 Total runtime: 3548.087 ms

Even though the second case is only a select, it seems clear that
something's wrong...
Re: [PERFORM] Horribly slow hash join
Dammit, I somehow deleted a bunch of replies to this. Did a TODO ever
come out of this?
Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?
On Thu, Aug 26, 2004 at 12:04:48PM -0700, J. Andrew Rogers wrote:
> The major caveat to having tables of this type is that you can only
> have a primary key index. No other indexes are possible, because the
> heap constantly undergoes local reorganizations if you have a lot of
> write traffic; the same kind of reorganization you would normally
> expect in a BTree index.

This isn't true, at least in Oracle 9i. You can create whatever indexes
you want on an index-organized table. I believe the secondary index
stores the PK value instead of the ROWID.
Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?
On Thu, Aug 26, 2004 at 11:39:42PM -0400, Greg Stark wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Updated TODO item:
> >
> >   o Automatically maintain clustering on a table
> >
> >     This would require some background daemon to maintain clustering
> >     during periods of low usage. It might also require tables to be
> >     only partially filled for easier reorganization. It also might
> >     require creating a merged heap/index data file so an index lookup
> >     would automatically access the heap data too.
>
> Fwiw, I would say the first "would" is also a "might". None of the
> previous discussions here presumed a maintenance daemon. The
> discussions before talked about a mechanism to try to place new tuples
> as close as possible to the proper index position.
>
> I would also suggest making some distinction between a cluster system
> similar to what we have now, but improved to maintain the clustering
> continuously, and an actual index-organized table where the tuples are
> stored only in a btree structure. They're two different approaches to
> similar problems. But they might both be useful to have, and they have
> markedly different implementation details.

There's a third approach that I think is worth considering. Half the
benefit of clustered tables is limiting the number of pages you need to
access when scanning the primary key. The order of tuples within a page
isn't nearly as important as the ordering of the pages themselves. This
means you can get most of the benefit of an index-organized table just by
being careful about what page you place a tuple on. What I'm thinking of
is some means to ensure all the tuples on a page are within some PK
range, without worrying about the exact order within the page, since it's
relatively cheap to scan through the page in memory.

Some pros:
- This would probably mean less change to the code that inserts tuples.
- No need for a background daemon.
- No need to create a new B-Tree table structure.
- Ideally there would be no need to move tuples around, which should
  mean the current indexing code doesn't need to change.

Cons:
- Need some way to deal with pages that fill up.
- To gain full benefit, some means of indicating what range of PK values
  are on a page might be needed.
- It's not as beneficial as a true IOT, since you don't get the benefit
  of storing your tuples inline with your B-Tree.

I'm sure there's a ton of things I'm missing, especially since I'm not
familiar with the PostgreSQL code, but hopefully others can explore this
further.
Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
On Mon, Sep 13, 2004 at 11:07:35PM +0100, Simon Riggs wrote:
> PostgreSQL's functionality is in many ways similar to Oracle
> Partitioning. Load your data into many similar tables, then create a
> view like:
>
> CREATE VIEW BIGTABLE (idate, col1, col2, col3...) AS
> SELECT 200409130800, col1, col2, col3... FROM table200409130800
> UNION ALL
> SELECT 200409131000, col1, col2, col3... FROM table200409131000
> UNION ALL
> SELECT 200409131200, col1, col2, col3... FROM table200409131200
> ...etc...
>
> This will allow the PostgreSQL optimizer to eliminate partitions from
> the query when you run queries which include a predicate on the
> partitioning_col, e.g.
>
>   select count(*) from bigtable where idate >= 200409131000
>
> will scan the last two partitions only...
>
> There are a few other ways of creating the view that return the same
> answer, but only using constants in that way allows the partitions to
> be eliminated from the query; the other forms will run for much longer.

Is there by any chance a set of functions to manage adding and removing
partitions? Certainly this can be done by hand, but having a set of
tools would make life much easier. I just looked, but didn't see
anything on GBorg.
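Lacking ready-made tools, adding a partition by hand looks roughly like
this (a sketch with made-up table and column types; note that CREATE OR
REPLACE VIEW requires the column list to stay the same, which it does
here):

```sql
-- Create and load the new partition, then fold it into the view:
CREATE TABLE table200409131400 (col1 int, col2 text, col3 numeric);

CREATE OR REPLACE VIEW bigtable (idate, col1, col2, col3) AS
SELECT 200409130800, col1, col2, col3 FROM table200409130800 UNION ALL
SELECT 200409131000, col1, col2, col3 FROM table200409131000 UNION ALL
SELECT 200409131200, col1, col2, col3 FROM table200409131200 UNION ALL
SELECT 200409131400, col1, col2, col3 FROM table200409131400;
```

Dropping a partition is the reverse: rebuild the view without the
member, then DROP TABLE it.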
Re: [PERFORM] Partitioning
On Wed, Sep 15, 2004 at 02:09:31PM -0700, J. Andrew Rogers wrote:
> On Tue, 2004-09-14 at 21:30, Joe Conway wrote:
> > That's exactly what we're doing, but using inherited tables instead
> > of a union view. With inheritance, there is no need to rebuild the
> > view each time a table is added or removed. Basically, in our
> > application, tables are partitioned by either month or week,
> > depending on the type of data involved, and queries are normally
> > date qualified.
>
> We do something very similar, also using table inheritance and a lot of
> triggers to automatically generate partitions and so forth. It works
> pretty well, but it is a custom job every time I want to implement a
> partitioned table. You can save a lot on speed and space if you use it
> to break up large tables with composite indexes, since you can drop
> columns from the table depending on how you use it. A big part of

Forgive my ignorance, but I didn't think you could have a table that
inherits from a parent and not have all the columns. Or is that not what
you mean by 'you can drop columns from the table...'?

This is one advantage I see to a big UNION ALL view: if you're doing
partitioning based on unique values, you don't actually have to store
that value in the partition tables. For example,
http://stats.distributed.net has a table that details how much work each
participant did each day for each project. Storing project_id in that
table is an extra 4 bytes... doesn't sound like much until you consider
that the table has over 130M rows right now. So it would be nice to have
an easy way to partition the table based on unique project_ids and not
waste space in the partition tables on a field that will be the same for
every row (in each partition).
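The inheritance approach under discussion, in minimal form (illustrative
table and column names; this also shows the limitation questioned above,
since a child table always carries all of the parent's columns):

```sql
CREATE TABLE email_contrib (
    project_id int    NOT NULL,
    id         int    NOT NULL,
    date       date   NOT NULL,
    work_units bigint NOT NULL
);

-- One child per project; it inherits every parent column, including
-- project_id, even though project_id is constant within the partition.
CREATE TABLE email_contrib_p8 () INHERITS (email_contrib);

-- Queries against the parent automatically scan the children too:
SELECT sum(work_units) FROM email_contrib WHERE date = '2004-09-15';
```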
Re: [PERFORM] Article about PostgreSQL and RAID in Brazil
On Thu, Sep 16, 2004 at 10:50:33AM -0700, Josh Berkus wrote:
> The second step is to have lots of disks; 5 drives is a minimum for
> really good performance. 3-drive RAID5, in particular, is a poor
> performer for PostgreSQL, often resulting in I/O that is 40% or less as
> efficient as a single disk, due to extremely slow random seeks and
> little parallelization.
>
> Once you have 6 drives or more, opinions are divided on whether RAID 10
> or RAID 5 is better. I think it partly depends on your access pattern.

What about benefits from putting WAL and pg_temp on separate drives?
Specifically, we have a box with 8 drives: 2 in a mirror with the OS,
WAL, and pg_temp; the rest in a RAID10 with the database on it. Do you
think it would have been better to make one big RAID10? What if it was
RAID5? And what if it was only 6 drives total?
Re: [PERFORM] Article about PostgreSQL and RAID in Brazil
On Thu, Sep 16, 2004 at 02:07:37PM -0700, Josh Berkus wrote:
> Jim,
>
> > What about benefits from putting WAL and pg_temp on separate drives?
> > Specifically, we have a box with 8 drives: 2 in a mirror with the
> > OS, WAL, and pg_temp; the rest in a RAID10 with the database on it.
> > Do you think it would have been better to make one big RAID10? What
> > if it was RAID5? And what if it was only 6 drives total?
>
> OSDL's finding was that even with a large RAID array, it still benefits
> you to have WAL on a separate disk resource ... substantially, like 10%
> of total performance. However, your setup doesn't get the full possible
> benefit, since WAL is sharing the array with other resources.

Yes, but if a 3-drive RAID array is 40% slower than a single disk, it
seems like the 10% benefit of having WAL on a separate drive would still
be a losing proposition. BTW, my experience with our setup is that the
RAID10 is almost always the I/O bottleneck, not the mirror with
everything else on it.
Re: [PERFORM] Caching of Queries
On Thu, Sep 23, 2004 at 08:29:25AM -0700, Mr Pink wrote:
> Not knowing anything about the internals of pg, I don't know how this
> relates, but in theory, query plan caching is not just about saving
> time re-planning queries, it's about scalability. Optimizing queries
> requires shared locks on the database metadata, which, as I understand
> it, causes contention and serialization, which kills scalability.

One of the gurus can correct me if I'm wrong here, but AFAIK metadata
lookups use essentially the same access methods as normal queries. This
means MVCC is used and no locking is required. Even if locks were
required, they would be shared read locks, which wouldn't block each
other.
Re: [PERFORM] Caching of Queries
On Mon, Sep 27, 2004 at 09:30:31PM +0100, Matt Clark wrote:
> It's certainly the case that the typical web app (which, along with
> warehouses, seems to be one half of the needy apps) could probably do
> worse than use pooling as well. I'm not well up enough on pooling to
> know how bulletproof it is though, which is why I included it in my
> list of things that make me go 'hmm'. It would be really nice not to
> have to take both things together.

If you're not using a connection pool of some kind, then you might as
well forget query plan caching, because your connect overhead will swamp
the planning cost. This does not mean you have to use something like
pgpool (which makes some rather questionable claims, IMO); any decent
web application language/environment will support connection pooling.
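The pooling idea, stripped to its essentials, is just "pay the connect
cost once and hand connections out per request". A toy sketch, with
stand-in objects instead of real database connections:

```python
import queue

class ConnectionPool:
    """Hand out pre-built connections instead of reconnecting per request."""
    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())   # pay the connect cost once, up front

    def acquire(self):
        return self._pool.get()         # blocks if every connection is busy

    def release(self, conn):
        self._pool.put(conn)

# Stand-in for an expensive connect:
connects = 0
def fake_connect():
    global connects
    connects += 1
    return object()

pool = ConnectionPool(fake_connect, size=2)
for _ in range(100):                    # a hundred "requests"...
    conn = pool.acquire()
    pool.release(conn)
print(connects)                         # ...only 2 connects ever happen
```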
Re: [PERFORM] Caching of Queries
On Fri, Oct 01, 2004 at 06:43:42AM +0100, Matt Clark wrote:
> > If you're not using a connection pool of some kind, then you might
> > as well forget query plan caching, because your connect overhead
> > will swamp the planning cost. This does not mean you have to use
> > something like pgpool (which makes some rather questionable claims,
> > IMO); any decent web application language/environment will support
> > connection pooling.
>
> Hmm, a question of definition: there's a difference between a pool and
> a persistent connection. Pretty much all web apps have one connection
> per process, which is persistent (i.e. not dropped and remade for each
> request), but not shared between processes, and therefore not pooled.

OK, that'd work too... the point is, if you're re-connecting all the
time, it doesn't really matter what else you do for performance.
Re: [PERFORM] Caching of Queries
On Fri, Oct 01, 2004 at 10:10:40AM -0700, Josh Berkus wrote:
> Transparent query caching is the industry standard for how these things
> are handled. However, Postgres' lack of this feature has made me
> consider other approaches, and I'm starting to wonder if the standard
> query caching -- where a materialized query result, or some reduction
> thereof, is cached in database memory -- isn't the best way to cache
> things. I'm going to abbreviate it SQC for the rest of this e-mail.

Not to quibble, but are you sure that's the standard? Oracle and DB2
don't do this, and I didn't think MSSQL did either. What they do cache
is query *plans*. This is a *huge* deal in Oracle; search
http://asktom.oracle.com for 'soft parse'.

In any case, I think a means of marking some specific queries as
cacheable is an excellent idea; perfect for 'static data' scenarios.
What I don't know is how much will be saved.
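Plan caching (as opposed to result caching) is already available at the
session level via PREPARE; the table name here is borrowed from earlier
in this digest purely as an example:

```sql
-- Parse and plan once...
PREPARE rank_lookup (int) AS
    SELECT * FROM email_rank WHERE project_id = $1;

-- ...then execute many times against the stored plan:
EXECUTE rank_lookup(8);
EXECUTE rank_lookup(25);

DEALLOCATE rank_lookup;
```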
Re: [PERFORM] Performance suggestions for an update-mostly database?
And obviously make sure you're vacuuming frequently.

On Mon, Oct 04, 2004 at 10:38:14AM -0700, Josh Berkus wrote:
> Steve,
>
> > I'm used to performance tuning on a select-heavy database, but this
> > will have a very different impact on the system. Does anyone have
> > any experience with an update-heavy system, and have any performance
> > hints or hardware suggestions?
>
> Minimal/no indexes on the table(s). Raise checkpoint_segments and
> consider using commit_siblings/commit_delay if it's a multi-stream
> application. Figure out ways to do inserts instead of updates where
> possible, and COPY instead of INSERT, where possible. Put your WAL on
> its own disk resource.
>
> I'm a little curious as to what kind of app would be 95% writes. A log?
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
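Josh's checkpoint/commit suggestions translate to postgresql.conf
settings along these lines (values are illustrative starting points, not
recommendations; measure before and after):

```
checkpoint_segments = 32   # fewer, larger checkpoints for heavy write load
commit_delay = 10          # microseconds to wait for piggy-backed commits
commit_siblings = 5        # only delay if this many other xacts are active
```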
Re: [PERFORM] Does PostgreSQL run with Oracle?
On Fri, Oct 15, 2004 at 10:19:48AM -0700, Steve Atkins wrote:
> On Fri, Oct 15, 2004 at 11:54:44AM -0500, [EMAIL PROTECTED] wrote:
> > My basic question to the community is "is PostgreSQL approximately
> > as fast as Oracle?" I'm currently running single-processor
> > UltraSPARC workstations, and intend to use Intel-arch laptops and
> > Linux. The application is a big turnkey workstation app. I know the
> > hardware switch alone will enhance performance, and may do so to the
> > point where even a slower database will still be adequate.
>
> I have found that PostgreSQL seems to perform poorly on Solaris/SPARC
> (less so after recent improvements, but still...) compared to x86
> systems -- more so than the delta between Oracle on the two platforms.
> Just a gut impression, but it might mean that comparing the two
> databases on SPARC may not be that useful a comparison if you're
> planning to move to x86.

As a point of reference, an IBM hardware sales rep I worked with a few
years ago told me that he got a lot of sales from Oracle shops that were
running Sun and switched to RS/6000. Basically, for a given workload, it
would take 2x the number of Sun CPUs as RS/6000 CPUs. The difference in
Oracle licensing costs was usually enough to pay for the new hardware in
one year.
Re: [PERFORM] OS desicion
On Wed, Oct 20, 2004 at 09:38:51AM -0700, Josh Berkus wrote:
> Tom,
>
> > You are asking the wrong question. The best OS is the OS you (and/or
> > the customer) knows and can administer competently.
>
> I'll have to 2nd this.

I'll 3rd, but add one tidbit: FreeBSD will schedule disk I/O based on
process priority, while Linux won't. This can be very handy for things
like vacuum.
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
Note that most people are now moving away from raw devices for databases
in most applications. The relatively small performance gain isn't worth
the hassles.

On Thu, Oct 21, 2004 at 12:27:27PM +0200, Steinar H. Gunderson wrote:
> On Thu, Oct 21, 2004 at 08:58:01AM +0100, Matt Clark wrote:
> > I suppose I'm just idly wondering really. Clearly it's against PG
> > philosophy to build an FS or direct IO management into PG, but now
> > that it's so relatively easy to plug filesystems into the main
> > open-source OSes, it struck me that there might be some useful
> > changes to, say, XFS or ext3, that could be made that would help PG
> > out.
>
> This really sounds like a poor replacement for just making PostgreSQL
> use raw devices to me. (I have no idea why that isn't done already,
> but presumably it isn't all that easy to get right. :-) )
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/
Re: [PERFORM] Sequential Scan with LIMIT
On Sun, Oct 24, 2004 at 04:11:53PM -0400, Tom Lane wrote:
> But the LIMIT will cut the cost of the seqscan case too. Given the
> numbers you posit above, about one row in five will have 'myval', so a
> seqscan can reasonably expect to hit the first matching row in the
> first page of the table. This is still cheaper than doing an index
> scan (which must require reading at least one index page plus at least
> one table page).
>
> The test case you are showing is probably suffering from nonrandom
> placement of this particular data value; which is something that the
> statistics we keep are too crude to detect.

Isn't that exactly what pg_stats.correlation is?
Re: [PERFORM] Sequential Scan with LIMIT
On Thu, Oct 28, 2004 at 07:49:28PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > On Sun, Oct 24, 2004 at 04:11:53PM -0400, Tom Lane wrote:
> > > The test case you are showing is probably suffering from nonrandom
> > > placement of this particular data value; which is something that
> > > the statistics we keep are too crude to detect.
> >
> > Isn't that exactly what pg_stats.correlation is?
>
> No. A far-from-zero correlation gives you a clue that on average,
> *all* the data values are placed nonrandomly ... but it doesn't really
> tell you much one way or the other about a single data value.

Maybe I'm confused about what the original issue was, then... it
appeared that you were suggesting PGSQL was doing a seq scan instead of
an index scan because it thought it would find the value on the first
page if the data was randomly distributed. If the correlation is highly
non-zero, though, shouldn't it 'play it safe' and assume that unless
it's picking the min or max value stored in the statistics, it will be
better to do an index scan, since the value it's looking for is probably
in the middle of the table somewhere? IE: if the values in the field are
between 1 and 5 and the table is clustered on that field, then clearly
an index scan would be better to find a row with field=3 than a seq
scan.
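For reference, the statistics being debated here are directly visible;
this is a sketch with made-up table and column names:

```sql
-- correlation near +1 or -1 means the table is physically ordered by
-- the column; near 0 means placement is effectively random.
SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'mytable' AND attname = 'field';

-- Clustering rewrites the table in index order, pushing correlation
-- toward 1 (7.x-era syntax):
CLUSTER mytable_field_idx ON mytable;
ANALYZE mytable;
```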
[PERFORM] seqscan strikes again
) (actual time=0.018..0.038 rows=10 loops=1) Total runtime: 218.644 ms (15 rows) opensims=# I'd really like to avoid putting a 'set enable_seqscan=false' in my code, especially since this query only has a problem if it's run on a large date/time window, which normally doesn't happen.
Re: [PERFORM] seqscan strikes again
Which column would you recommend? Did something stick out at you? On Tue, Nov 09, 2004 at 03:14:36PM -0800, Joshua D. Drake wrote: opensims=# I'd really like to avoid putting a 'set enable_seqscan=false' in my code, especially since this query only has a problem if it's run on a large date/time window, which normally doesn't happen. Try increasing your statistics target for the column and then rerunning analyze. Sincerely, Joshua D. Drake
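Joshua's suggestion amounts to something like the following (the table and column names are placeholders for the ones in the slow query):

```sql
-- The default per-column statistics target is 10 in this era; raise it
-- for the column the planner is mis-estimating, then rebuild its stats.
ALTER TABLE sometable ALTER COLUMN log_time SET STATISTICS 100;
ANALYZE sometable;
-- Then re-check the plan for the original date/time-window query
-- with EXPLAIN ANALYZE.
```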
Re: [PERFORM] Analyzer is clueless
On Wed, Nov 17, 2004 at 10:32:48PM +0000, Simon Riggs wrote: The main issue is that PostgreSQL's default histogram statistics setting is lower than other RDBMS. This means that it is less able to discriminate between cases such as yours that are close to the edge. This is a trade-off between run-time of the ANALYZE command and the benefit it produces. As Joshua suggests, increasing the statistics target for this table will likely allow the optimizer to correctly determine the selectivity of the index and take the right path. Is there still a good reason to have the histogram stats so low? Should the default be changed to more like 100 at this point? Also, how extensively does the planner use n_distinct, null_frac, reltuples and the histogram to see what the odds are of finding a unique value or a low number of values? I've seen cases where it seems the planner doesn't think it'll be getting a unique value or a small set of values even though stats indicates that it should be. One final question... would there be interest in a process that would dynamically update the histogram settings for tables based on how distinct/unique each field was?
Re: [PERFORM] memcached and PostgreSQL
If instead of a select you do a select for update I think this would be transaction safe. Nothing would be able to modify the data in the database between when you do the SELECT and when you commit. If the transaction fails the value in memcached will be correct. Also, it's not clear if you're doing an update here or not... If you're doing an update then this wouldn't work. You'd want to do your update, then re-insert the value into memcached outside of the update transaction. On Tue, Nov 23, 2004 at 02:20:34PM -0800, Sean Chittenden wrote: My point was that there are two failure cases --- one where the cache is slightly out of date compared to the db server --- these are cases where the cache update is slightly before/after the commit. I was thinking about this and ways to minimize this even further. Have memcache clients add data and have a policy to have the database only delete data. This sets the database up as the bottleneck again, but then you have a degree of transactionality that couldn't be previously achieved with the database issuing replace commands. For example: 1) client checks the cache for data and gets a cache lookup failure 2) client begins transaction 3) client SELECTs data from the database 4) client adds the key to the cache 5) client commits transaction This assumes that the client won't rollback or have a transaction failure. Again, in 50M transactions, I doubt one of them would fail (sure, it's possible, but that's a symptom of bigger problems: memcached isn't an RDBMS). The update case being: 1) client begins transaction 2) client updates data 3) database deletes record from memcache 4) client commits transaction 5) client adds data to memcache The second is where the cache update happens and the commit later fails, or the commit happens and the cache update never happens. Having pgmemcache delete, not replace data addresses this second issue. 
-sc -- Sean Chittenden
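The update-case ordering Sean describes can be sketched in SQL. This assumes the pgmemcache extension is loaded so the server can issue the cache delete inside the transaction; the table, key naming, and exact function spelling are illustrative, not taken from his setup:

```sql
BEGIN;
-- 2) client updates data
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
-- 3) database deletes the record from memcache (pgmemcache call; doing
--    this from a trigger on the table has the same effect)
SELECT memcache_delete('accounts:42');
COMMIT;  -- 4) client commits transaction
-- 5) happens client-side, outside the transaction: the next reader
--    misses the cache, SELECTs the committed row, and re-adds it.
```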
Re: [PERFORM] Postgres vs. DSpam
FWIW, those queries won't be able to use an index. A better WHERE clause would be: AND last_hit < CURRENT_DATE - 60 On Fri, Nov 26, 2004 at 02:37:12PM +1300, Andrew McMillan wrote: On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote: Hi, I have installed the dspam filter (http://www.nuclearelephant.com/projects/dspam) on our mail server (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users with a quite low traffic of 4000 messages/day. So it's a quite common platform/environment, nothing spectacular. I am using DSpam with PostgreSQL here. I have a daily job that cleans the DSpam database up, as follows: DELETE FROM dspam_token_data WHERE (innocent_hits*2) + spam_hits < 5 AND CURRENT_DATE - last_hit > 60; DELETE FROM dspam_token_data WHERE innocent_hits = 1 AND CURRENT_DATE - last_hit > 30; DELETE FROM dspam_token_data WHERE CURRENT_DATE - last_hit > 180; DELETE FROM dspam_signature_data WHERE CURRENT_DATE - created_on > 14; VACUUM dspam_token_data; VACUUM dspam_signature_data; I also occasionally do a VACUUM FULL ANALYZE; on the database as well. In all honesty though, I think that MySQL is better suited to DSpam than PostgreSQL is. Please, could anyone explain this difference to me? Is Postgres that bad? Is MySQL that good? Am I the only one to have observed this behavior? I believe that what DSpam does that is not well-catered for in the way PostgreSQL operates, is that it does very frequent updates to rows in (eventually) quite large tables. In PostgreSQL the UPDATE will result internally in a new record being written, with the old record being marked as deleted. That old record won't be re-used until after a VACUUM has run, and this means that the on-disk tables will have a lot of dead rows in them quite quickly. The reason that PostgreSQL operates this way, is a direct result of the way transactional support is implemented, and it may well change in a version or two. 
It's got better over the last few versions, with things like pg_autovacuum, but that approach still doesn't suit some types of database updating. Cheers, Andrew. - Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!! -
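Jim's one-line fix at the top deserves unpacking: in `CURRENT_DATE - last_hit > 60` the column is buried inside an expression, so a plain index on last_hit cannot be used; rewriting the predicate so the bare column is compared to a constant makes it indexable. A sketch (the index name is invented):

```sql
CREATE INDEX dspam_token_data_last_hit ON dspam_token_data (last_hit);

-- Not indexable: the planner sees an expression applied to last_hit.
-- DELETE FROM dspam_token_data WHERE CURRENT_DATE - last_hit > 60;

-- Indexable: matches the same rows, but as a range scan on the index.
DELETE FROM dspam_token_data WHERE last_hit < CURRENT_DATE - 60;
```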
Re: [PERFORM] Normalization or Performance
On Thu, Dec 02, 2004 at 03:05:55PM -0200, Alvaro Nunes Melo wrote: Hi, Before writing this mail, I'd researched a little about this topic, and got some opinions from guys like Fabien Pascal, who argues that logical design should be separated from physical design, and other sources. As this is not settled fact, I'm writing to you guys who make things work in the real world. We started our first big (for our company standards) project always thinking in normalization. But once we imported legacy data into the DB, things got harder. One example is the client's status. A client might be active, inactive or pending (for many reasons). We store all the statuses a client has had since it entered the system. To check the actual status of a client, we get the last status from this historical status table. This takes a considerable time, so our best results were achieved by building a function that checks the status and indexing this function. The problem is that indexed functions must be immutable, so as you can figure, if the status changes after the creation of the index, the return of the function is still the same. What do you suggest for situations like this? Should I add a field to the clients table and store its actual status, keep storing data in the historical table, and control its changes with a trigger? It seems you shouldn't have to resort to this. SELECT status FROM client_status WHERE client_id = blah ORDER BY status_date DESC LIMIT 1 should be pretty fast given an index on client_id, status_date (which should be able to be unique). There are other situations that are making things difficult for us. For example, one query must return the total amount a client bought in the last 90 days. It's taking too long, when we must know it for many clients, many times. So should I create summarization tables to store this kind of stuff, update it with a trigger on a daily basis (for example), and solve this problem with one join? 
This sounds like a more likely candidate for a summary table, though you might not want to use a trigger. Unless you need absolutely up-to-date information it seems like a nightly process to update the totals would be better and more efficient. Our database is not that big. The largest table has about 7,000,000 rows. About 50,000 clients, half of them active. All that I'd pointed out above uses indexes for queries, etc. But even with this it's not been fast enough. We have a Dell server for this (I know, the Dell issue), a Dual Xeon 2.8, SCSI HD, 1 GB mem. Do we need better hardware for our system? Is all this on a single HD? That's going to be a huge bottleneck. You'll be much better off with a mirrored partition for your WAL files and either raid5 or raid10 for the database itself. You'd probably be better off with more memory as well. If you're going to buy a new box instead of upgrading your existing one, I'd recommend going with an Opteron because of its much better memory bandwidth. For reference, stats.distributed.net is a dual Opteron 244 1.8GHz with 4G ram, a 200G mirror for WAL and the system files and a 6x200G RAID10 for the database (all SATA drives). The largest table has 120M rows and 825,000 8k pages. I can scan 1/5th of that table via an index scan in about a minute. (The schema can be found at http://minilink.org/cvs.distributed.net/l3.sql.)
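A minimal version of the nightly summary-table approach suggested above might look like this (all object names are hypothetical):

```sql
CREATE TABLE client_purchases_90d (
    client_id    integer PRIMARY KEY,
    total_amount numeric NOT NULL
);

-- Run nightly from cron instead of maintaining totals with a trigger:
BEGIN;
DELETE FROM client_purchases_90d;
INSERT INTO client_purchases_90d (client_id, total_amount)
SELECT client_id, sum(amount)
FROM purchases
WHERE purchase_date > CURRENT_DATE - 90
GROUP BY client_id;
COMMIT;
-- Reports then join against the small summary table instead of
-- re-aggregating the large detail table per client.
```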
Re: [PERFORM] Partitioned table performance
Sorry for the late reply, so I included the whole thread. Should this be a TODO? On Wed, Dec 15, 2004 at 08:30:08PM -0500, Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: But I'm a bit puzzled. Why would Append have any significant cost? It's just taking the tuples from one plan node and returning them until they run out, then taking the tuples from another plan node. It should have no i/o cost and hardly any cpu cost. Where is the time going? As best I can tell by profiling, the cost of the Append node per se is indeed negligible --- no more than a couple percent of the runtime in CVS tip for a test case similar to Stacy White's example. It looks bad in EXPLAIN ANALYZE, but you have to realize that passing the tuples up through the Append node doubles the instrumentation overhead of EXPLAIN ANALYZE, which is pretty sizable already. (If you turn on \timing in psql and try the query itself vs. EXPLAIN ANALYZE, the actual elapsed time is about double, at least for me.) The other effect, which I hadn't expected, is that the seqscans themselves actually slow down. 
I get regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM super_foo ; QUERY PLAN Aggregate (cost=16414.32..16414.32 rows=1 width=4) (actual time=32313.980..32313.988 rows=1 loops=1) -> Append (cost=0.00..13631.54 rows=556555 width=4) (actual time=0.232..21848.401 rows=524289 loops=1) -> Seq Scan on super_foo (cost=0.00..0.00 rows=1 width=4) (actual time=0.020..0.020 rows=0 loops=1) -> Seq Scan on sub_foo1 super_foo (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.187..6926.395 rows=262144 loops=1) -> Seq Scan on sub_foo2 super_foo (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.168..7026.953 rows=262145 loops=1) Total runtime: 32314.993 ms (6 rows) regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM sub_foo1; QUERY PLAN Aggregate (cost=8207.16..8207.16 rows=1 width=4) (actual time=9850.420..9850.428 rows=1 loops=1) -> Seq Scan on sub_foo1 (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.202..4642.401 rows=262144 loops=1) Total runtime: 9851.423 ms (3 rows) Notice the actual times for the sub_foo1 seqscans. That increase (when counted for both input tables) almost exactly accounts for the difference in non-EXPLAIN ANALYZE runtime. After digging around, I find that the reason for the difference is that the optimization to avoid a projection step (ExecProject) isn't applied for scans of inheritance unions: /* * Can't do it with inheritance cases either (mainly because Append * doesn't project). */ if (rel->reloptkind != RELOPT_BASEREL) return false; So if you were to try the example in a pre-7.4 PG, which didn't have that optimization, you'd probably find that the speeds were just about the same. (I'm too lazy to verify this though.) I looked briefly at what it would take to cover this case, and decided that it's a nontrivial change, so it's too late to do something about it for 8.0. I think it's probably possible to fix it though, at least for cases where the child tables have rowtypes identical to the parent. 
regards, tom lane
Re: [PERFORM] Partitioned table performance
On Wed, Dec 15, 2004 at 11:56:40AM -0800, Josh Berkus wrote: Greg, Well Oracle has lots of partitioning intelligence pushed up to the planner to avoid overhead. If you have a query with something like WHERE date = '2004-01-01' and date is your partition key (even if it's a range) then Oracle will figure out which partition it will need at planning time. Hmmm ... well, we're looking at making a spec for Postgres Table Partitioning. Maybe you could help? This is something I've been thinking about doing for http://stats.distributed.net; is there a formal project for this somewhere? On a different note, has anyone looked at the savings you get by omitting the partition field from the child tables? ISTM that the savings would be substantial for narrow tables. Of course that most likely means doing a union view instead of inheritance, but I'm guessing here. The table I'm thinking of partitioning is quite narrow (see below), so I suspect that dropping project_id out would result in a substantial savings (there's basically nothing that ever queries across the whole table). With the data distribution, I suspect just breaking project ID's 205, 5, and 25 into partitioned tables that didn't contain project_id would save about 450M (4 bytes * 95% * 130M). 
(the table has ~130M rows)

            Table "public.email_contrib"
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    "email_contrib_pkey" primary key, btree (project_id, id, date)
    "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
    "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
    "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
    "email_contrib__project_date" btree (project_id, date)
Foreign-key constraints:
    "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# select * from pg_stats where tablename='email_contrib' and attname='project_id';
 schemaname |   tablename   |  attname   | null_frac | avg_width | n_distinct | most_common_vals  |                    most_common_freqs                    | histogram_bounds | correlation
------------+---------------+------------+-----------+-----------+------------+-------------------+---------------------------------------------------------+------------------+-------------
 public     | email_contrib | project_id |         0 |         4 |          6 | {205,5,25,8,24,3} | {0.461133,0.4455,0.0444333,0.0418667,0.0049,0.00216667} |                  |    0.703936
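The union view Jim is guessing at, with project_id dropped from the children, might look like the following sketch (the child table names are made up; the choice of hot projects comes from his stats above):

```sql
-- One narrow child per heavily-hit project; project_id is not stored.
CREATE TABLE email_contrib_p205 (
    id         integer NOT NULL,
    date       date    NOT NULL,
    team_id    integer,
    work_units bigint  NOT NULL
);
CREATE TABLE email_contrib_p5 (LIKE email_contrib_p205);

-- The view re-synthesizes the partition key as a constant per branch,
-- saving the 4 bytes per row the column would otherwise occupy.
CREATE VIEW email_contrib_all AS
    SELECT 205 AS project_id, * FROM email_contrib_p205
    UNION ALL
    SELECT 5 AS project_id, * FROM email_contrib_p5;
```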
Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft
On Mon, Jan 10, 2005 at 12:46:01PM -0500, Alex Turner wrote: You sir are correct! You can't use perl in MS-SQL or Oracle ;). On the other hand, PL/SQL is incredibly powerful, especially combined with all the tools/utilities that come with Oracle. I think you'd be hard-pressed to find too many real-world examples where you could do something with a PostgreSQL procedural language that you couldn't do with PL/SQL.
Re: [PERFORM] Performance delay
On Thu, Jan 13, 2005 at 07:14:10PM +0800, Hasnul Fadhly bin Hasan wrote: Hi Richard, Thanks for the reply.. is that the case? i thought it would comply to the where condition first.. and after that it will format the output to what we want.. That is in fact exactly what it's doing. The second query is faster not because of the where clause, but because of the limit clause. The first query builds a list of id, long2ip(srcip), long2ip(dstip) for the timestamp range, then it orders that list and gives you the first 30. The second query builds a list of everything from sometable for the timestamp range, orders it, keeps the first 30, THEN it calculates long2ip based on that list of 30 items. Hasnul Richard Huxton wrote: Hasnul Fadhly bin Hasan wrote: Hi, just want to share with all of you a weird thing that i found when i tested it. i was doing a query that will call a function long2ip to convert bigint to ips. so the query looks something like this. select id, long2ip(srcip), long2ip(dstip) from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' order by id limit 30; for your info, there are about 300k rows for that timeframe. it cost me about 57+ secs to get the list. which is about the same if i query select id, long2ip(srcip), long2ip(dstip) from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' it will cost me about 57+ secs also. Now if i did this select id,long2ip(srcip), long2ip(dstip) from (select * from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' order by id limit 30) as t; it will cost me about 3+ secs The difference will be that in the final case you only make 30 calls to long2ip() whereas in the first two you call it 300,000 times and then throw away most of them. Try running EXPLAIN ANALYSE ... for both - that will show how PG is planning the query. 
-- Richard Huxton Archonet Ltd
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Thu, Jan 20, 2005 at 10:08:47AM -0500, Stephen Frost wrote: * Christopher Kings-Lynne ([EMAIL PROTECTED]) wrote: PostgreSQL has replication, but not partitioning (which is what you want). It doesn't have multi-server partitioning.. It's got partitioning within a single server (doesn't it? I thought it did, I know it was discussed w/ the guy from Cox Communications and I thought he was using it :). No, PostgreSQL doesn't support any kind of partitioning, unless you write it yourself. I think there's some work being done in this area, though.
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Thu, Jan 20, 2005 at 10:40:02PM -0200, Bruno Almeida do Lago wrote: I was thinking the same! I'd like to know how other databases such as Oracle do it. In a nutshell, in a clustered environment (which iirc in oracle means shared disks), they use a set of files for locking and consistency across machines. So you better have fast access to the drive array, and the array better have caching of some kind.
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Thu, Jan 20, 2005 at 07:12:42AM -0800, Joshua D. Drake wrote: then I was thinking. Couldn't he use multiple databases over multiple servers with dblink? It is not exactly how I would want to do it, but it would provide what he needs I think??? Yes seems to be the only solution ... but I'm a little disappointed about this ... could you explain to me why there is not this kind of functionality ... it seems to be a real need for big applications no ? Because it is really, really hard to do correctly and hard equals expensive. To expand on what Josh said, the expense in this case is development resources. If you look on the developer site you'll see a huge TODO list and a relatively small list of PostgreSQL developers. To develop a cluster solution similar to RAC would probably take the efforts of the entire development team for a year or more, during which time very little else would be done. I'm glad to see your persistence in wanting to use PostgreSQL, and there might be some kind of limited clustering scheme that could be implemented without a great amount of effort by the core developers. In that case I think there's a good chance you could find people willing to work on it.
[PERFORM] Odd number of rows expected
I have a query that thinks it's going to generate a huge number of rows, when in fact it won't: INSERT INTO page_log.rrs ( bucket_id, page_id,project_id,other, hits,min_hits,max_hits,total_duration,min_duration,max_duration ) SELECT a.rrs_bucket_id, page_id,project_id,other , count(*),count(*),count(*),sum(duration),min(duration),max(duration) FROM (SELECT b.bucket_id AS rrs_bucket_id, s.* FROM rrs.bucket b JOIN page_log.log s ON ( b.prev_end_time < log_time AND b.end_time >= log_time ) WHERE b.rrs_id = '1' AND b.end_time <= '2005-01-21 20:23:00+00' AND b.end_time > '1970-01-01 00:00:00+00' ) a GROUP BY rrs_bucket_id, page_id,project_id,other; QUERY PLAN --- Subquery Scan *SELECT* (cost=170461360504.98..183419912556.69 rows=91175544 width=77) -> GroupAggregate (cost=170461360504.98..183418316984.67 rows=91175544 width=29) -> Sort (cost=170461360504.98..171639141309.21 rows=471112321692 width=29) Sort Key: b.bucket_id, s.page_id, s.project_id, s.other -> Nested Loop (cost=0.00..17287707964.10 rows=471112321692 width=29) -> Seq Scan on bucket b (cost=0.00..9275.84 rows=281406 width=20) Filter: ((rrs_id = 1) AND (end_time <= '2005-01-21 20:23:00+00'::timestamp with time zone) AND (end_time > '1970-01-01 00:00:00+00'::timestamp with time zone)) -> Index Scan using log__log_time on log s (cost=0.00..36321.24 rows=1674137 width=33) Index Cond: ((outer.prev_end_time < s.log_time) AND (outer.end_time >= s.log_time)) The final rowcount after the aggregate will actually be about 1.9M rows: QUERY PLAN --- Subquery Scan *SELECT* (cost=170461360504.98..183419912556.69 rows=91175544 width=77) (actual time=156777.374..234613.843 rows=1945123 loops=1) -> GroupAggregate (cost=170461360504.98..183418316984.67 rows=91175544 width=29) (actual time=156777.345..214246.751 rows=1945123 loops=1) -> Sort (cost=170461360504.98..171639141309.21 rows=471112321692 width=29) (actual time=156777.296..177517.663 rows=4915567 loops=1) Sort Key: b.bucket_id, s.page_id, s.project_id, s.other -> Nested Loop 
(cost=0.00..17287707964.10 rows=471112321692 width=29) (actual time=0.662..90702.755 rows=4915567 loops=1) -> Seq Scan on bucket b (cost=0.00..9275.84 rows=281406 width=20) (actual time=0.063..1591.591 rows=265122 loops=1) Filter: ((rrs_id = 1) AND (end_time <= '2005-01-21 20:23:00+00'::timestamp with time zone) AND (end_time > '1970-01-01 00:00:00+00'::timestamp with time zone)) -> Index Scan using log__log_time on log s (cost=0.00..36321.24 rows=1674137 width=33) (actual time=0.014..0.174 rows=19 loops=265122) Index Cond: ((outer.prev_end_time < s.log_time) AND (outer.end_time >= s.log_time)) Total runtime: 299623.954 ms Everything is analyzed, and the statistics target is set to 1000. Basically, it seems that it doesn't understand that each row in log will match up with at most one row in bucket. There is a unique index on bucket(rrs_id, end_time), so it should be able to tell this.
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
From http://developer.postgresql.org/todo.php: Maintain a map of recently-expired rows This allows vacuum to reclaim free space without requiring a sequential scan On Sat, Jan 22, 2005 at 12:20:53PM -0500, Greg Stark wrote: Dawid Kuroczko [EMAIL PROTECTED] writes: Quick thought -- would it be possible to implement a 'partial VACUUM' per analogiam to partial indexes? No. But it gave me another idea. Perhaps equally infeasible, but I don't see why. What if there were a map of modified pages. So every time any tuple was marked deleted it could be marked in the map as modified. VACUUM would only have to look at these pages. And if it could mark as free every tuple that was marked as deleted then it could unmark the page. The only downside I see is that this could be a source of contention on multi-processor machines running lots of concurrent update/deletes. -- greg
Re: [PERFORM] Cheaper VACUUMing
For reference, here's the discussion about this that took place on hackers: http://lnk.nu/archives.postgresql.org/142.php On Sun, Jan 23, 2005 at 01:16:20AM -0500, Christopher Browne wrote: A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Greg Stark) wrote: Dawid Kuroczko [EMAIL PROTECTED] writes: Quick thought -- would it be possible to implement a 'partial VACUUM' per analogiam to partial indexes? No. But it gave me another idea. Perhaps equally infeasible, but I don't see why. What if there were a map of modified pages. So every time any tuple was marked deleted it could be marked in the map as modified. VACUUM would only have to look at these pages. And if it could mark as free every tuple that was marked as deleted then it could unmark the page. The only downside I see is that this could be a source of contention on multi-processor machines running lots of concurrent update/deletes. I was thinking the same thing after hearing fairly extensive pooh-poohing of the notion of vacuuming based on all the pages in the shared cache. This hot list page table would probably need to be a hash table. It rather parallels the FSM, including the way that it would need to be limited in size. -- wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','gmail.com'). http://cbbrowne.com/info/lsf.html Rules of the Evil Overlord #57. Before employing any captured artifacts or machinery, I will carefully read the owner's manual. http://www.eviloverlord.com/
Re: [PERFORM] Odd number of rows expected
On Sat, Jan 22, 2005 at 10:18:00PM -0500, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: (SELECT b.bucket_id AS rrs_bucket_id, s.* FROM rrs.bucket b JOIN page_log.log s ON ( b.prev_end_time < log_time AND b.end_time >= log_time ) WHERE b.rrs_id = '1' AND b.end_time <= '2005-01-21 20:23:00+00' AND b.end_time > '1970-01-01 00:00:00+00' ) a Basically, it seems that it doesn't understand that each row in log will match up with at most one row in bucket. There is a unique index on bucket(rrs_id, end_time), so it should be able to tell this. Why should it be able to tell that? Indexes: rrs_bucket__rrs_id__end_time unique, btree (rrs_id, end_time) Err, crap, I guess that wouldn't work, because of prev_end_time not being in there... In English, each bucket defines a specific time period, and no two buckets can overlap (though there's no constraints defined to actually prevent that). So reality is that each row in page_log.log will in fact only match one row in bucket (at least for each value of rrs_id). Given that, would the optimizer make a better choice if it knew that (since it means a much smaller result set)? Is there any way to tell the optimizer this is the case? Maybe what I ultimately need is a timestamp with interval datatype, that specifies an interval that's fixed in time.
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Sun, Jan 23, 2005 at 03:40:03PM -0500, Tom Lane wrote: There was some discussion in Toronto this week about storing bitmaps that would tell VACUUM whether or not there was any need to visit individual pages of each table. Getting rid of useless scans through not-recently-changed areas of large tables would make for a significant reduction in the cost of VACUUM. FWIW, that's already on the TODO. See also http://lnk.nu/archives.postgresql.org/142.php. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Sun, Jan 23, 2005 at 03:40:03PM -0500, Tom Lane wrote: The real issue with any such scheme is that you are putting maintenance costs into the critical paths of foreground processes that are executing user queries. I think that one of the primary advantages of the Postgres storage design is that we keep that work outside the critical path and delegate it to maintenance processes that can run in the background. We shouldn't lightly toss away that advantage. To pull out the oft-used "show me the numbers" card... has anyone done a study to see if keeping this stuff out of the 'critical path' actually helps overall system performance? While the current scheme initially speeds up transactions, eventually you have to run vacuum, which puts a big load on the system. If you can put off vacuuming until off-hours (assuming your system has off-hours), then this doesn't matter, but more and more we're seeing systems where vacuum is a big performance issue (hence the recent work on vacuum delay, so as not to swamp the I/O system). If you vacuum as part of the transaction it's going to be a more efficient use of resources, because you have more of what you need right there (ie: odds are that you're on the same page as the old tuple). In cases like that it very likely makes a lot of sense to take a small hit in your transaction time up-front, instead of a larger hit doing a vacuum down the road. Of course, without numbers this is a bunch of hand-waving, but I don't think it's valid to assume that minimizing the amount of work you do in a transaction means better throughput without considering what it will cost to do the work you're putting off until later. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what?
---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Automagic tuning
On Mon, Jan 31, 2005 at 03:26:12PM -0500, Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: I would like something that really measures values like random_page_cost or cpu_tuple_cost that are hardware dependent. I assume such a thing does not exist? Nope. You gotta whip out your calculator and run some queries. Preferably a whole lot of queries. All the measurement techniques I can think of are going to have a great deal of noise, so you shouldn't twiddle these cost settings based on just a few examples. Are there any examples of how you can take numbers from pg_stats_* or EXPLAIN ANALYZE and turn them into configuration settings (such as random_page_cost)? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
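For what it's worth, the calibration Tom describes in the reply below (L105's message) can be roughed out mechanically. All numbers here are invented, and a real index scan's estimated cost is only partly proportional to random_page_cost, so treat this strictly as a sketch of the ratio idea, not a tuning tool:

```python
# Sketch: scale random_page_cost so that estimated-cost-per-millisecond is
# the same for index scans as for seq scans, averaged over many EXPLAIN
# ANALYZE samples. Hypothetical numbers throughout.

def suggest_random_page_cost(current_rpc, samples):
    """samples: list of (kind, estimated_cost, actual_ms) tuples."""
    def cost_per_ms(kind):
        pairs = [(c, t) for k, c, t in samples if k == kind]
        return sum(c for c, _ in pairs) / sum(t for _, t in pairs)
    # If index scans look over-costed relative to reality, lower rpc;
    # if under-costed, raise it.
    return current_rpc * cost_per_ms('seq') / cost_per_ms('index')

samples = [
    ('seq',   1000.0, 100.0),   # seq scans: ~10 cost units per ms
    ('seq',   2000.0, 200.0),
    ('index',  500.0,  25.0),   # index scans: ~20 cost units per ms
    ('index',  300.0,  15.0),
]
print(suggest_random_page_cost(4.0, samples))  # -> 2.0
```

The "whole lot of queries" caveat applies directly: with only a few samples the per-ms ratios are dominated by noise (cache state, concurrent activity), which is exactly why the averaging matters.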
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
On Mon, Jan 31, 2005 at 09:41:32PM +0100, Cosimo Streppone wrote: 2) The goal is to make the db handle 100 tps (something like 100 users). What kind of server and storage should I provide? The actual servers our application runs on normally have 2 Intel Xeon processors, 2-4 Gb RAM, RAID 0/1/5 SCSI disk storage with hard drives @ 10,000 rpm You might look at Opterons, which theoretically have higher data bandwidth. If you're doing anything data intensive, like a sort in memory, this could make a difference. 4) Is it correct to suppose that multiple RAID 1 arrays can provide the fastest I/O ? I usually reserve one RAID1 array for the db data directory, one RAID1 array for the pg_xlog directory and one RAID1 array for os and application needs. RAID10 will be faster than RAID1. The key factor in a high performance database is a high performance I/O system. If you look in the archives you'll find people running postgresql on 30 and 40 drive arrays. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Automagic tuning
On Tue, Feb 01, 2005 at 12:06:27AM -0500, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: On Mon, Jan 31, 2005 at 03:26:12PM -0500, Tom Lane wrote: Preferably a whole lot of queries. All the measurement techniques I can think of are going to have a great deal of noise, so you shouldn't twiddle these cost settings based on just a few examples. Are there any examples of how you can take numbers from pg_stats_* or EXPLAIN ANALYZE and turn them into configuration settings (such as random_page_cost)? Well, the basic idea is to adjust random_page_cost so that the ratio of estimated cost to real elapsed time (as shown by EXPLAIN ANALYZE) is the same for seqscans and indexscans. What you have to watch out for is that the estimated cost model is oversimplified and doesn't take into account a lot of real-world factors, such as the activity of other concurrent processes. The reason for needing a whole lot of tests is essentially to try to average out the effects of those unmodeled factors, so that you have a number that makes sense within the planner's limited view of reality. Given that, I guess the next logical question is: what would it take to collect stats on queries so that such an estimate could be made? And would it be possible/make sense to gather stats useful for tuning the other parameters? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
On Tue, Feb 01, 2005 at 07:35:35AM +0100, Cosimo Streppone wrote: You might look at Opterons, which theoretically have higher data bandwidth. If you're doing anything data intensive, like a sort in memory, this could make a difference. Would Opteron systems need a 64-bit postgresql (and os, gcc, ...) build to have that advantage? Well, that would give you the most benefit, but the memory bandwidth is still greater than on a Xeon. There's really no issue with 64 bit if you're using open source software; it all compiles for 64 bits and you're good to go. http://stats.distributed.net runs on a dual opteron box running FreeBSD and I've had no issues. RAID10 will be faster than RAID1. Sorry Jim, by RAID10 you mean several raid1 arrays mounted on different linux partitions? Or several raid1 arrays that build up a raid0 array? In the latter case, who decides which data goes in which raid1 array? Raid Adapter? You should take a look around online for a description of raid types. There's technically RAID0+1 and RAID1+0; one is a stripe of mirrored drives (a RAID 0 built out of RAID 1s), the other is a mirror of two RAID 0s. The former is much better; if you're lucky you can lose half your drives without any data loss (if each dead drive is part of a different mirror). Recovery is also faster. You'll almost certainly be much happier with hardware raid instead of software raid. stats.distributed.net runs a 3ware controller and SATA drives. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
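The failure-tolerance claim above is easy to check combinatorially. A sketch with four hypothetical drives (A1/A2 and B1/B2), counting which two-drive failures lose data under each layout:

```python
# RAID1+0 (stripe of mirrors): mirrors (A1,A2) and (B1,B2); data is lost
# only if both halves of the same mirror die.
# RAID0+1 (mirror of stripes): stripe sets (A1,B1) and (A2,B2); data
# survives only while at least one whole stripe set is intact.
from itertools import combinations

drives = ['A1', 'A2', 'B1', 'B2']

def raid10_survives(dead):
    return not ({'A1', 'A2'} <= dead or {'B1', 'B2'} <= dead)

def raid01_survives(dead):
    return not ({'A1', 'B1'} & dead) or not ({'A2', 'B2'} & dead)

two_dead = [set(c) for c in combinations(drives, 2)]
print(sum(raid10_survives(d) for d in two_dead), 'of', len(two_dead))  # -> 4 of 6
print(sum(raid01_survives(d) for d in two_dead), 'of', len(two_dead))  # -> 2 of 6
```

So with two failures, RAID1+0 survives 4 of the 6 possible combinations while RAID0+1 survives only 2, which is the "former is much better" point; the gap widens with more drives.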
Re: [PERFORM] Peformance Tuning Opterons/ Hard Disk Layout
On Wed, Feb 23, 2005 at 01:37:28PM -0500, Tom Lane wrote: Bruno Almeida do Lago [EMAIL PROTECTED] writes: Is there a real limit for max_connections? Here we've an Oracle server with up to 1200 simultaneous connections over it! [ shrug... ] If your machine has the beef to run 1200 simultaneous queries, you can set max_connections to 1200. The point of what you were quoting is that if you want to service 1200 concurrent users but you only expect maybe 100 simultaneously active queries from them (and you have a database box that can only service that many) then you want to put a connection pooler in front of 100 backends, not try to start a backend for every user. Oracle may handle this sort of thing differently, I dunno. regards, tom lane Oracle has some form of built-in connection pooling. I don't remember the exact details of it off the top of my head, but I think it was a 'wedge' that clients would connect to as if it was the database, and the wedge would then find an available database process to use. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
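A toy version of that 'wedge' idea — many users funneled through a small fixed set of backend workers via a queue, instead of one backend per user. Names and counts are made up; a real pooler speaks the wire protocol in front of actual database processes:

```python
# Sketch of connection pooling: 100 "users" are serviced by 4 "backends"
# pulling requests off a shared queue, rather than 100 backends.
import queue
import threading

N_BACKENDS, N_USERS = 4, 100
requests, results = queue.Queue(), queue.Queue()

def backend():
    while True:
        user = requests.get()
        if user is None:          # sentinel: shut this backend down
            return
        # A real backend would run the user's query here.
        results.put((user, threading.current_thread().name))

workers = [threading.Thread(target=backend, name=f'backend-{i}')
           for i in range(N_BACKENDS)]
for w in workers:
    w.start()
for u in range(N_USERS):          # 1200 users in Bruno's case; 100 here
    requests.put(u)
for _ in workers:                 # one sentinel per backend
    requests.put(None)
for w in workers:
    w.join()

print(results.qsize(), 'requests served by', N_BACKENDS, 'backends')
```

The point Tom makes carries over: the pool size should match how many queries the box can actually run at once, not how many clients exist.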
Re: [PERFORM] multi billion row tables: possible or insane?
On Tue, Mar 01, 2005 at 10:34:29AM +0100, Ramon Bastiaans wrote: Hi all, I am doing research for a project of mine where I need to store several billion values for a monitoring and historical tracking system for a big computer system. My current estimate is that I have to store (somehow) around 1 billion values each month (possibly more). On a side-note, do you need to keep the actual row-level details for history? http://rrs.decibel.org might be of some use. Other than that, what others have said. Lots and lots of disks in RAID10, and Opterons (though I would choose Opterons not for memory size but because of memory *bandwidth*). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] What needs to be done for real Partitioning?
On Mon, Mar 21, 2005 at 09:55:03AM -0800, Josh Berkus wrote: Stacy, Luckily they had the chance to work with a truly fantastic DBA (the author of an Oracle Press performance tuning book even) before they could switch back. He convinced them to make some of their indexes global. Performance dramatically improved (compared with both the unpartitioned schema, and the partitioned-and-locally-indexed schema), and they've since stayed with partitioned tables and a mix of local and global indexes. Hmmm. Wouldn't Greg's suggestion of a bitmap index which holds information on what values are found in what partition also solve this? Without 1/2 of the overhead imposed by global indexes? I can actually see such a bitmap as being universally useful to the partitioning concept ... for one, it would resolve the whole partition on {value} issue. I suspect both will have their uses. I've read quite a bit about global v. local indexes in Oracle, and there are definitely cases where global is much better than local. Granted, there are some things with how Oracle handles their catalog, etc. that might make local indexes more expensive for them than they would be for PostgreSQL. It's also not clear how much a 'partition bitmap' index would help. As for the 'seqscan individual partitions' argument, that's not going to work well at all for a case where you need to hit a relatively small percentage of rows in a relatively large number of partitions. SELECT ... WHERE customer_id = 1 would be a good example of such a query (assuming the table is partitioned on something like invoice_date). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] What needs to be done for real Partitioning?
On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: We probably also need multi-table indexes. As Josh says, that seems antithetical to the main point of partitioning, which is to be able to rapidly remove (and add) partitions of a table. If you have to do index cleaning before you can drop a partition, what's the point of partitioning? Why would you need to do index cleaning first? Presumably the code that goes to check a heap tuple that an index pointed at to ensure that it was visible in the current transaction would be able to recognize if the partition that tuple was in had been removed, and just ignore that index entry. Granted, you'd need to clean the index up at some point (presumably via vacuum), but it doesn't need to occur at partition drop time. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Preventing query from hogging server
On Thu, Mar 24, 2005 at 01:07:39PM -0600, Matthew Nuzum wrote: I've tried `nice psql` in the past and I don't think that had much impact, but I haven't tried it on this query. On linux, nice will only help if the query is CPU-bound. On FreeBSD, nice affects I/O scheduling, as well as CPU, so it's a more effective means of limiting the impact of large queries. I don't know how other OS's handle this. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Compressing WAL
Maybe better for -hackers, but here it goes anyway... Has anyone looked at compressing WALs before writing to disk? On a system generating a lot of WAL it seems there might be some gains to be had. WAL data could be compressed before going to disk, since today's machines are generally more I/O bound than CPU bound. And unlike the base tables, you generally don't need to read the WAL, so you don't really need to worry about being able to quickly scan through the data without decompressing it. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
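A quick feasibility check of the trade-off, using zlib on a fake 8 KB page image. Real WAL record contents will compress differently, and the byte string here is invented; this only illustrates the CPU-for-I/O exchange:

```python
# Compress a synthetic page-sized WAL chunk before it would hit disk.
import zlib

# Fabricated, repetitive 8 KB "full page image" — the kind of data written
# on first modification of a page after a checkpoint.
page = (b'INSERT INTO log VALUES (42, "hello")' * 250)[:8192]

# Level 1 = fastest: the point is trading a little CPU for less I/O.
compressed = zlib.compress(page, level=1)

print(len(page), '->', len(compressed), 'bytes')
assert zlib.decompress(compressed) == page   # lossless round trip
```

Since WAL is written sequentially and normally only read during recovery, the "can't seek into compressed data cheaply" objection matters much less here than it would for heap pages.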
Re: [PERFORM] Sustained inserts per sec ... ?
On Tue, Apr 05, 2005 at 12:16:27AM -0400, Christopher Petrilli wrote: My fear is that it's some bizarre situation interacting with both issues, and one that might not be solvable. Does anyone else have much experience with this sort of sustained COPY? You might ask the guy who just posted to -admin about a database that's doing 340M inserts a day in 300M transactions... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Recognizing range constraints (was Re: [PERFORM] Plan for relatively simple query seems to be very inefficient)
On Wed, Apr 06, 2005 at 06:09:37PM -0400, Tom Lane wrote: Can anyone suggest a more general rule? Do we need for example to consider whether the relation membership is the same in two clauses that might be opposite sides of a range restriction? It seems like a.x > b.y AND a.x < b.z In a case like this, you could actually look at the data in b and see what the average range size is. If you wanted to get really fancy, the optimizer could decide how best to access a based on each row of b. probably can be treated as a range restriction on a.x for this purpose, but I'm much less sure that the same is true of a.x > b.y AND a.x < c.z Well, this could end up being much trickier, since who knows how b and c are related. Though thinking about it, although I threw out the row-by-row analysis idea to be glib, that would actually work in this case; you could take a look at what b and c look like each time 'through the loop'. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] How to improve db performance with $7K?
Sorry if I'm pointing out the obvious here, but it seems worth mentioning. AFAIK all 3ware controllers are set up so that each SATA drive gets its own SATA bus. My understanding is that by and large, SATA still suffers from a general inability to have multiple outstanding commands on the bus at once, unlike SCSI. Therefore, to get good performance out of SATA you need to have a separate bus for each drive. Theoretically, it shouldn't really matter that it's SATA rather than ATA, other than I certainly wouldn't want to try and cram 8 ATA cables into a machine... Incidentally, when we were investigating storage options at a previous job we talked to someone who deals with RS/6000 storage. He had a bunch of info about their serial controller protocol (which I can't think of the name of) vs SCSI. SCSI had a lot more overhead, so you could end up saturating even a 160MB SCSI bus with only 2 or 3 drives. People are finally realizing how important bandwidth has become in modern machines. Memory bandwidth is why RS/6000 was (and maybe still is) cleaning Sun's clock, and it's why the Opteron blows Itaniums out of the water. Likewise it's why SCSI is so much better than IDE (unless you just give each drive its own dedicated bandwidth). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Recognizing range constraints (was Re: [PERFORM] Plan for relatively simple query seems to be very inefficient)
On Wed, Apr 06, 2005 at 06:35:10PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: On Wed, Apr 06, 2005 at 06:09:37PM -0400, Tom Lane wrote: Can anyone suggest a more general rule? Do we need for example to consider whether the relation membership is the same in two clauses that might be opposite sides of a range restriction? It seems like a.x > b.y AND a.x < b.z In a case like this, you could actually look at the data in b and see what the average range size is. Not with the current statistics --- you'd need some kind of cross-column statistics involving both y and z. (That is, I doubt it would be helpful to estimate the average range width by taking the difference of independently-calculated mean values of y and z ...) But yeah, in principle it would be possible to make a non-default estimate. Actually, it might be possible to take a SWAG at it using the histogram and correlation stats. You know... since getting universally useful cross-column stats seems to be pretty pie-in-the-sky, would it be possible to generate more complex stats on the fly from a sampling of a table? If you're looking at a fairly sizeable table ISTM it would be worth sampling the rows on 10 or 20 random pages to see what you get. In this case, you'd want to know the average difference between two fields. Other queries might want something different. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
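The sample-and-estimate idea might look like this in miniature (toy data, not planner code): sample a couple hundred rows of b, measure the average width of (z - y), and turn that into a selectivity guess for a.x > b.y AND a.x < b.z.

```python
# On-the-fly cross-column stat: average range width from a small sample.
import random

random.seed(1)
# Fabricated table b: each row holds a range (y, z) of width ~5..15.
b = [(y, y + random.uniform(5, 15)) for y in range(10_000)]

sample = random.sample(b, 200)      # stands in for "10 or 20 random pages"
avg_width = sum(z - y for y, z in sample) / len(sample)

# Selectivity guess: fraction of a.x's domain a typical range covers.
domain = max(z for _, z in b) - min(y for y, _ in b)
selectivity = avg_width / domain
print(round(avg_width, 1), round(selectivity, 5))
```

This is exactly the estimate Tom notes you cannot get from independent means of y and z: the width has to be computed per row, which is why it needs either a cross-column statistic or a sample like this.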
Re: [PERFORM] Any way to speed this up?
2 things to point out from this last run: 50% of the time is taken scanning tblassociate -> Seq Scan on tblassociate a (cost=0.00..38388.79 rows=199922 width=53) (actual time=62.000..10589.000 rows=176431 loops=1) Filter: ((clientnum)::text = 'SAKS'::text) If you had an index on clientnum and didn't cast it to text in the view, you might be able to use an indexscan, which could be faster (depends on how big the table actually is). This sort is taking about 25% of the time: -> Sort (cost=85542.59..86042.39 rows=199922 width=75) (actual time=19641.000..19955.000 rows=159960 loops=1) Sort Key: a.locationid -> Merge Right Join (cost=60850.40..62453.22 rows=199922 width=75) (actual time=13500.000..14734.000 rows=176431 loops=1) I suspect it shouldn't take 5 seconds to sort 160k rows in memory, and that this sort is spilling to disk. If you increase your working memory the sort might fit entirely in memory. As a quick test, you could set working memory to 80% of system memory and see how that changes the speed. But you wouldn't want to set it that high in production. On Thu, Apr 07, 2005 at 01:14:33PM -0400, Joel Fradkin wrote: Here is the result after putting it back to 4 the original value (I had done that prior to your suggestion of using 2 or 3) to see what might change. I also vacuumed and thought I saw records deleted in associate, which I found odd as this is a test site and no new records were added or deleted.
Merge Join  (cost=86788.09..87945.00 rows=10387 width=112) (actual time=19703.000..21154.000 rows=159959 loops=1)
  Merge Cond: ("outer".locationid = "inner".locationid)
  ->  Sort  (cost=1245.50..1246.33 rows=332 width=48) (actual time=62.000..62.000 rows=441 loops=1)
        Sort Key: l.locationid
        ->  Index Scan using ix_location on tbllocation l  (cost=0.00..1231.60 rows=332 width=48) (actual time=15.000..62.000 rows=441 loops=1)
              Index Cond: ('SAKS'::text = (clientnum)::text)
  ->  Sort  (cost=85542.59..86042.39 rows=199922 width=75) (actual time=19641.000..19955.000 rows=159960 loops=1)
        Sort Key: a.locationid
        ->  Merge Right Join  (cost=60850.40..62453.22 rows=199922 width=75) (actual time=13500.000..14734.000 rows=176431 loops=1)
              Merge Cond: (("outer".id = "inner".jobtitleid) AND ("outer"."?column4?" = "inner"."?column10?"))
              ->  Sort  (cost=554.11..570.13 rows=6409 width=37) (actual time=94.000..94.000 rows=6391 loops=1)
                    Sort Key: jt.id, (jt.clientnum)::text
                    ->  Seq Scan on tbljobtitle jt  (cost=0.00..148.88 rows=6409 width=37) (actual time=0.000..63.000 rows=6391 loops=1)
                          Filter: (1 = presentationid)
              ->  Sort  (cost=60296.29..60796.09 rows=199922 width=53) (actual time=13406.000..13859.000 rows=176431 loops=1)
                    Sort Key: a.jobtitleid, (a.clientnum)::text
                    ->  Seq Scan on tblassociate a  (cost=0.00..38388.79 rows=199922 width=53) (actual time=62.000..10589.000 rows=176431 loops=1)
                          Filter: ((clientnum)::text = 'SAKS'::text)
Total runtime: 22843.000 ms
Joel Fradkin -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Thursday, April 07, 2005 11:43 AM To: Joel Fradkin Cc: 'PostgreSQL Perform' Subject: Re: [PERFORM] Any way to speed this up? Joel Fradkin [EMAIL PROTECTED] writes: random_page_cost = 1.2#4# units are one sequential page fetch cost That is almost certainly overoptimistic; it's causing the planner to use indexscans when it shouldn't. Try 2 or 3 or thereabouts.
regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Functionscan estimates
On Sat, Apr 09, 2005 at 12:00:56AM -0400, Tom Lane wrote: Not too many releases ago, there were several columns in pg_proc that were intended to support estimation of the runtime cost and number of result rows of set-returning functions. I believe in fact that these were the remains of Joe Hellerstein's thesis on expensive-function evaluation, and are exactly what he was talking about here: http://archives.postgresql.org/pgsql-hackers/2002-06/msg00085.php But with all due respect to Joe, I think the reason that stuff got trimmed is that it didn't work very well. In most cases it's *hard* to write an estimator for a SRF. Let's see you produce one for dblink() for instance ... Actually, if the remote database supported a way to get a row-count estimate for the query passed to dblink, it would be trivial, since you'd just pass that back. In fact, having such a function (estimate_rows_for_sql(text)) would probably be very useful to functions that wanted to support returning a rows estimate. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Compressing WAL
On Sun, Apr 10, 2005 at 09:12:41PM -0400, Bruce Momjian wrote: I have never heard anyone talk about it, but it seems useful. I think compressing the page images written on first page modification since checkpoint would be a big win. Could you clarify that? Maybe I'm being naive, but it seems like you could just put a compression routine between the log writer and the filesystem. Is this a TODO? ISTM it's at least worth hacking something together and doing some performance testing... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] How to improve db performance with $7K?
On Thu, Apr 14, 2005 at 10:51:46AM -0500, Matthew Nuzum wrote: So if you all were going to choose between two hard drives where: drive A has capacity C and spins at 15K rpms, and drive B has capacity 2 x C and spins at 10K rpms and all other features are the same, the price is the same and C is enough disk space which would you choose? I've noticed that on IDE drives, as the capacity increases the data density increases and there is a perceived (I've not measured it) performance increase. Would the increased data density of the higher capacity drive be of greater benefit than the faster spindle speed of drive A? The increased data density will help transfer speed off the platter, but that's it. It won't help rotational latency. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
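The rotational-latency half of that answer is simple arithmetic: on average the head waits half a revolution for the right sector, and that depends only on spindle speed, not platter density.

```python
# Average rotational latency: half a revolution, in milliseconds.
def avg_rotational_latency_ms(rpm):
    ms_per_rev = 60_000 / rpm   # 60,000 ms per minute
    return ms_per_rev / 2

print(avg_rotational_latency_ms(10_000))  # -> 3.0 (drive B)
print(avg_rotational_latency_ms(15_000))  # -> 2.0 (drive A)
```

So for the random I/O typical of OLTP, the 15K drive saves a millisecond per seek regardless of capacity, while the density advantage of drive B only shows up on sequential transfers.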
Re: [PERFORM] How to improve db performance with $7K?
On Mon, Apr 18, 2005 at 07:41:49PM +0200, Jacques Caron wrote: It would be interesting to actually compare this to real-world (or nearly-real-world) benchmarks to measure the effectiveness of features like TCQ/NCQ etc. I was just thinking that it would be very interesting to benchmark different RAID configurations using dbt2. I don't know if this is something that the lab is setup for or capable of, though. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] How to improve db performance with $7K?
On Mon, Apr 18, 2005 at 10:20:36AM -0500, Dave Held wrote: Hmm...so you're saying that at some point, quantity beats quality? That's an interesting point. However, it presumes that you can actually distribute your data over a larger number of drives. If you have a db with a bottleneck of one or two very large tables, the extra spindles won't help unless you break up the tables and glue them together with query magic. But it's still a point to consider. Huh? Do you know how RAID10 works? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to improve db performance with $7K?
On Mon, Apr 18, 2005 at 06:41:37PM -, Mohan, Ross wrote: Don't you think optimal stripe width would be a good question to research the binaries for? I'd think that drives the answer, largely. (uh oh, pun alert) EG, oracle issues IO requests (this may have changed _just_ recently) in 64KB chunks, regardless of what you ask for. So when I did my striping (many moons ago, when the Earth was young...) I did it in 128KB widths, and set the oracle multiblock read count accordingly. For oracle, any stripe size under 64KB=stupid, anything much over 128K/256K=wasteful. I am eager to find out how PG handles all this. AFAIK PostgreSQL requests data one database page at a time (normally 8k). Of course the OS might do something different. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] How to improve db performance with $7K?
On Tue, Apr 19, 2005 at 11:22:17AM -0500, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 04/19/2005 11:10:22 AM: What is 'multiple initiators' used for in the real world? I asked this same question and got an answer off list: Somebody said their SAN hardware used multiple initiators. I would try to check the archives for you, but this thread is becoming more of a rope. Multiple initiators means multiple sources on the bus issuing I/O instructions to the drives. In theory you can have two computers on the same SCSI bus issuing I/O requests to the same drive, or to anything else on the bus, but I've never seen this implemented. Others have noted this feature as being a big deal, so somebody is benefiting from it. It's a big deal for Oracle clustering, which relies on shared drives. Of course most people doing Oracle clustering are probably using a SAN and not raw SCSI... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] immutable functions vs. join for lookups ?
You should re-run the function test using SQL as the function language instead of plpgsql. There might be some performance to be had there. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
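A minimal sketch of what that comparison might look like (the function name get_team_name and the stats_team table here are hypothetical, used only to illustrate the two languages; single-quoted function bodies as required before 8.0's dollar quoting, and IMMUTABLE as in the thread's experiment):

```sql
-- plpgsql version: every call goes through the PL/pgSQL interpreter
CREATE FUNCTION get_team_name(integer) RETURNS text AS '
BEGIN
    RETURN (SELECT team_name FROM stats_team WHERE team = $1);
END;
' LANGUAGE plpgsql IMMUTABLE;

-- SQL version: same lookup, but cheaper to call and potentially
-- inlinable by the planner
CREATE FUNCTION get_team_name_sql(integer) RETURNS text AS '
    SELECT team_name FROM stats_team WHERE team = $1;
' LANGUAGE sql IMMUTABLE;
```

Timing the same query once with each variant should show whether the PL/pgSQL call overhead is what's being measured.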
Re: [PERFORM] How to improve postgres performace
On Mon, Apr 18, 2005 at 11:36:01AM -0300, Rodrigo Moreno wrote: I'm really worried about that, because it's not enough anymore, and users complain about performance. But after running vacuumdb full, everything starts to run better again, so I think the problem is not related to a specific query. Vacuum full will skew your results, unless you plan on running vacuum full all the time. This is because you will always have some amount of dead tuples in a table that has any update or delete activity. A regular vacuum doesn't remove these tuples, it just marks them as available. So over time, depending on how frequently a table is vacuumed, it will settle down to a steady-state size that is greater than its size after a vacuum full. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
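For the archives, the difference is easy to watch directly (table name hypothetical; the VERBOSE output reports how many dead tuples a plain vacuum found and made reusable):

```sql
-- Plain vacuum: marks dead tuples as free space for reuse,
-- but does not shrink the table file
VACUUM VERBOSE my_table;

-- Vacuum full: compacts the table to its minimum size,
-- at the cost of an exclusive lock for the duration
VACUUM FULL VERBOSE my_table;
```

If routine plain vacuums run often enough for the workload, the steady-state size stabilizes and the exclusive-locking full vacuum becomes unnecessary.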
Re: [PERFORM] Sort and index
On Mon, Apr 18, 2005 at 10:44:43AM -0500, Dave Held wrote: Since you are fetching the entire table, you are touching all the rows. If the query were to fetch the rows in index order, it would be seeking all over the table's tracks. By fetching in sequence order, it has a much better chance of fetching rows in a way that minimizes head seeks. Since disk I/O is generally 10-100x slower than RAM, the in-memory sort can be surprisingly slow and still beat indexed disk access. Of course, this is only true if the table can fit and be sorted entirely in memory (which, with 1500 rows, it probably can). Actually, the planner (at least in 7.4) isn't smart enough to consider if the sort would fit in memory or not. I'm running a test right now to see if it's actually faster to use an index in this case. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] What to do with 6 disks?
http://stats.distributed.net is set up with the OS, WAL, and temp on a RAID1 and the database on a RAID10. The drives are 200G SATA with a 3ware raid card. I don't think the controller has battery-backed cache, but I'm not sure. In any case, it's almost never disk-bound on the mirror; when it's disk-bound it's usually the RAID10. But this is a read-mostly database. If it was write-heavy, that might not be the case. Also, in general, I see very little disk activity from the OS itself, so I don't think there's a large disadvantage to having it on the same drives as part of your database. I would recommend different filesystems for each, though. (ie: not one giant / partition) On Tue, Apr 19, 2005 at 06:00:42PM -0700, Jeff Frost wrote: Now that we've hashed out which drives are quicker and more money equals faster... Let's say you had a server with 6 separate 15k RPM SCSI disks, what raid option would you use for a standalone postgres server? a) 3xRAID1 - 1 for data, 1 for xlog, 1 for os? b) 1xRAID1 for OS/xlog, 1xRAID5 for data c) 1xRAID10 for OS/xlog/data d) 1xRAID1 for OS, 1xRAID10 for data e) . I was initially leaning towards b, but after talking to Josh a bit, I suspect that with only 4 disks the raid5 might be a performance detriment vs 3 raid 1s or some sort of split raid10 setup. -- Jeff Frost, Owner [EMAIL PROTECTED] Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] Slow copy with little CPU/disk usage
A friend of mine has an application where he's copying in 4000 rows at a time into a table that has about 4M rows. Each row is 40-50 bytes. This is taking 25 seconds on a dual PIII-1GHz with 1G of RAM and a 2 disk SATA mirror, running FBSD 4.10-stable. There's one index on the table. What's really odd is that neither the CPU nor the disk are being hammered. The box appears to be pretty idle; the postgresql process is using 4-5% CPU. I seem to recall others running into this before, but I can't remember what the issue was and I can't find it in the archives. This is version 8.0, btw. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Sort and index
On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: Actually, the planner (at least in 7.4) isn't smart enough to consider if the sort would fit in memory or not. Really? Have you read cost_sort()? It's certainly possible that the calculation is all wet, but to claim that the issue is not considered is just wrong. To be fair, no, I haven't looked at the code. This is based strictly on anecdotal evidence on a 120M row table. I'm currently running a test to see how an index scan compares to a seqscan. I also got the same results when I added a where clause that would restrict it to about 7% of the table. Actually, after running some tests (below), the plan cost does change when I change sort_mem (it was originally 5).

stats=# \d email_contrib
     Table public.email_contrib
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    email_contrib_pkey primary key, btree (project_id, id, date)
    email_contrib__pk24 btree (id, date) WHERE (project_id = 24)
    email_contrib__pk25 btree (id, date) WHERE (project_id = 25)
    email_contrib__pk8 btree (id, date) WHERE (project_id = 8)
    email_contrib__project_date btree (project_id, date)
Foreign-key constraints:
    fk_email_contrib__id FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    fk_email_contrib__team_id FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 width=24)
         Filter: (project_id = 8)
(4 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=# select 8862263::float/130969400;
    ?column?
----------------
 0.067687027657
(1 row)

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 rows=8862263 width=24)
   Index Cond: (project_id = 8)
(2 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..100055905.62 rows=130969400 width=24)
(1 row)

stats=# set enable_seqscan=on;
SET
stats=# set sort_mem=1000;
SET
stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=#

-- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Slow copy with little CPU/disk usage
No, this is a single process. And there are known issues with context-switch storms on Xeons, so that might be what you're seeing. On Tue, Apr 19, 2005 at 09:37:21PM -0700, Mischa Sandberg wrote: Quoting Tom Lane [EMAIL PROTECTED]: Jim C. Nasby [EMAIL PROTECTED] writes: A friend of mine has an application where he's copying in 4000 rows at a time into a table that has about 4M rows. Each row is 40-50 bytes. This is taking 25 seconds on a dual PIII-1GHz with 1G of RAM and a 2 disk SATA mirror, running FBSD 4.10-stable. There's one index on the table. If there's no hidden costs such as foreign key checks, that does seem pretty dang slow. What's really odd is that neither the CPU or the disk are being hammered. The box appears to be pretty idle; the postgresql proces is using 4-5% CPU. -- This sounds EXACTLY like my problem, if you make the box to a Xeon 2.4GHz, 2GB RAM ... with two SCSI drives (xlog and base); loading 10K rows of about 200 bytes each; takes about 20 secs at the best, and much longer at the worst. By any chance does your friend have several client machines/processes trying to mass-load rows at the same time? Or at least some other processes updating that table in a bulkish way? What I get is low diskio, low cpu, even low context-switches ... and I'm betting he should take a look at pg_locks. For my own problem, I gather that an exclusive lock is necessary while updating indexes and heap, and the multiple processes doing the update can make that pathological. Anyway, have your friend check pg_locks. Dreams come true, not free. -- S.Sondheim, ITW -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Slow copy with little CPU/disk usage
No, he's using either COPY or \COPY. On Wed, Apr 20, 2005 at 12:34:27AM -0400, Greg Stark wrote: Jim C. Nasby [EMAIL PROTECTED] writes: What's really odd is that neither the CPU or the disk are being hammered. The box appears to be pretty idle; the postgresql proces is using 4-5% CPU. Is he committing every row? In that case you would see fairly low i/o bandwidth usage because most of the time is being spent seeking and waiting for rotational latency. -- greg -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Hrm... I was about to suggest that for timing just the query (and not output/data transfer time) using explain analyze, but then I remembered that explain analyze can incur some non-trivial overhead with the timing calls. Is there a way to run the query but have psql ignore the output? If so, you could use \timing. In any case, it's not valid to use pgadmin to time things. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Sort and index
I've run some performance tests. The actual test case is at http://stats.distributed.net/~decibel/timing.sql, and the results are at http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing an index scan appears to be about 2x faster than a sequential scan and a sort. Something else of interest is that going from 50M of sort memory to 3G sped the sort up by 900 seconds. If someone wants to record data about the effect of sort_mem on on-disk sorts somewhere (maybe in the docs?) I can run some more tests for that case. In any case, it's clear that the planner is making the wrong choice here. BTW, changing random_page_cost to 3 or 4 doesn't change the plan. On Tue, Apr 19, 2005 at 10:40:41PM -0500, Jim C. Nasby wrote: On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: Actually, the planner (at least in 7.4) isn't smart enough to consider if the sort would fit in memory or not. Really? Have you read cost_sort()? It's certainly possible that the calculation is all wet, but to claim that the issue is not considered is just wrong. To be fair, no, I haven't looked at the code. This is based strictly on anecdotal evidence on a 120M row table. I'm currently running a test to see how an index scan compares to a seqscan. I also got the same results when I added a where clause that would restrict it to about 7% of the table. Actually, after running some tests (below), the plan cost does change when I change sort_mem (it was originally 5). 
stats=# \d email_contrib
     Table public.email_contrib
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    email_contrib_pkey primary key, btree (project_id, id, date)
    email_contrib__pk24 btree (id, date) WHERE (project_id = 24)
    email_contrib__pk25 btree (id, date) WHERE (project_id = 25)
    email_contrib__pk8 btree (id, date) WHERE (project_id = 8)
    email_contrib__project_date btree (project_id, date)
Foreign-key constraints:
    fk_email_contrib__id FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    fk_email_contrib__team_id FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 width=24)
         Filter: (project_id = 8)
(4 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=# select 8862263::float/130969400;
    ?column?
----------------
 0.067687027657
(1 row)

stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 rows=8862263 width=24)
   Index Cond: (project_id = 8)
(2 rows)

stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..100055905.62 rows=130969400 width=24)
(1 row)

stats=# set enable_seqscan=on;
SET
stats=# set sort_mem=1000;
SET
stats=# explain select * from email_contrib order by project_id, id, date;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
   Sort Key: project_id, id, date
   ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
(3 rows)

stats=#

-- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
On Fri, Apr 22, 2005 at 05:04:19PM -0400, Joel Fradkin wrote: And is the system always going to be used by only one user? No we have 400+ concurrent users I guess the question is if this big select is representative of the load you expect in production. Yes we see many time on the two processor box running MSSQL large return sets using 100%cpu for 5-30 seconds. What happens if you execute the query more times? Do the times stay the same as the second run? I will definitely have to do pressure testing prior to going live in production. I have not done concurrent tests as honestly single-user tests are failing, so multi-user testing is not something I need yet. I would very, very strongly encourage you to run multi-user tests before deciding on mysql. Mysql is nowhere near as capable when it comes to concurrent operations as PostgreSQL is. From what others have said, it doesn't take many concurrent operations for it to just fall over. I can't speak from experience because I avoid mysql like the plague, though. :) Likewise, MSSQL will probably look better single-user than it will multi-user. Unless you're going to only access the database single-user, it's just not a valid test case (and by the way, this is true no matter what database you're looking at. Multiuser access is where you uncover your real bottlenecks.) -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Interesting numbers on a CREATE INDEX
Building a single-column index on a dual opteron with 4G of memory, data on a 4-disk SATA RAID10; OS, logs and tempspace on a SATA mirror, with sort_mem set to 2.5G, create index is actually CPU bound for large portions of time. The postgresql process and system time are accounting for an entire CPU, and systat (this is a FreeBSD5.2 box) is generally showing 80% utilization on the RAID10 and 40% on the mirror. Not a performance problem, but I thought some people might be interested. The RAID10 is doing about 28-32MB/s, I would think this wouldn't be enough to swamp the CPU but I guess I would be thinking wrong. BTW, the column I'm indexing is a bigint with a low correlation. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Sort and index
On Fri, Apr 22, 2005 at 10:08:06PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: I've run some performance tests. The actual test case is at http://stats.distributed.net/~decibel/timing.sql, and the results are at http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing an index scan appears to be about 2x faster than a sequential scan and a sort. ... for one test case, on one platform, with a pretty strong bias to the fully-cached state since you ran the test multiple times consecutively. The table is 6.5G and the box only has 4G, so I suspect it's not cached. Past experience has generally been that an explicit sort is quicker, so you'll have to pardon me for suspecting that this case may be atypical. Is the table nearly in order by pkey, by any chance? It might be, but there's no way I can check with a multi-key index, right? I'll re-run the tests with a single-column index on a column with a correlation of 16%. In any case, it's clear that the planner is making the wrong choice here. BTW, changing random_page_cost to 3 or 4 doesn't change the plan. Feel free to propose better cost equations. Where would I look in code to see what's used now? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Index bloat problem?
You would be interested in http://archives.postgresql.org/pgsql-hackers/2005-04/msg00565.php On Thu, Apr 21, 2005 at 03:33:05PM -0400, Dave Chapeskie wrote: On Thu, Apr 21, 2005 at 11:28:43AM -0700, Josh Berkus wrote: Michael, Every five minutes, DBCC INDEXDEFRAG will report to the user an estimated percentage completed. DBCC INDEXDEFRAG can be terminated at any point in the process, and *any completed work is retained.* Keen. Sounds like something for our TODO list. -- Josh Berkus Aglio Database Solutions San Francisco See http://archives.postgresql.org/pgsql-general/2005-03/msg01465.php for my thoughts on a non-blocking alternative to REINDEX. I got no replies to that message. :-( I've almost got a working solution integrated in the backend that does correct WAL logging and everything. (Writing the code to write and replay WAL logs for complicated operations can be very annoying!) For now I've gone with a syntax of: REINDEX INDEX btree_index_name INCREMENTAL; (For now it's not a proper index AM (accessor method), instead the generic index code knows this is only supported for btrees and directly calls the btree_compress function.) It's not actually a REINDEX per-se in that it doesn't rebuild the whole index. It holds brief exclusive locks on the index while it shuffles items around to pack the leaf pages fuller. There were issues with the code I attached to the above message that have been resolved with the new code. With respect to the numbers provided in that e-mail the new code also recycles more pages than before. Once I've finished it up I'll prepare and post a patch. -- Dave Chapeskie OpenPGP Key ID: 0x3D2B6B34 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? 
---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Sort and index
On Sat, Apr 23, 2005 at 01:00:40AM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: Feel free to propose better cost equations. Where would I look in code to see what's used now? All the gold is hidden in src/backend/optimizer/path/costsize.c. regards, tom lane After setting up a second test that orders the table by a highly non-correlated column, I think I've found part of the problem. The estimated index scan cost for (project_id, id, date) is 0.00..100117429.34 while the estimate for work_units is 0.00..103168408.62; almost no difference, even though project_id correlation is .657 while work_units correlation is .116. This is with random_page_cost set to 1.1; if I set it much higher I can't force the index scan (BTW, would it make more sense to set the cost of a disable seqscan to either pages or tuples * disable_cost?), but even with only a 10% overhead on random page fetches it seems logical that the two estimates should be much farther apart. If you look at the results of the initial run (http://stats.distributed.net/~decibel/timing.log), you'll see that the cost of the index scan is way overestimated. Looking at the code, the runcost is calculated as run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost); where csquared is indexCorrelation^2. Why is indexCorrelation squared? The comments say a linear interpolation between min_IO and max_IO is used, but ISTM that if it was linear then instead of csquared, indexCorrelation would just be used. By the way, I'm running a test for ordering by work_units right now, and I included code to allocate and zero 3.3G of memory (out of 4G) between steps to clear the kernel buffers. This brought the seqscan times up to ~6800 seconds, so it seems there was in fact buffering going on in the first test. The second test has been running an index scan for over 14 hours now, so clearly a seqscan+sort is the way to go for a highly uncorrelated index (at least one that won't fit in effective_cache_size). 
-- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
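Spelling out the interpolation questioned above: with index correlation $\rho$, the run cost formula from costsize.c amounts to

```latex
\mathrm{run\_cost} \mathrel{+}= \mathrm{max\_IO\_cost}
    + \rho^{2}\,\bigl(\mathrm{min\_IO\_cost} - \mathrm{max\_IO\_cost}\bigr)
```

so a perfectly uncorrelated index ($\rho = 0$) is charged max_IO_cost and a perfectly correlated one ($\rho = 1$) is charged min_IO_cost. A genuinely linear interpolation, as the code comment describes, would use $\rho$ rather than $\rho^{2}$; squaring pulls every intermediate correlation toward the expensive end, which matches the overestimated index-scan costs observed in the timing log.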
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
BTW, http://stats.distributed.net/~decibel/base.log is a test I ran; select count(*) was ~6x faster than explain analyze select *. On Tue, Apr 26, 2005 at 07:46:52PM -0700, Kevin Brown wrote: Josh Berkus wrote: Jim, Kevin, Hrm... I was about to suggest that for timing just the query (and not output/data transfer time) using explain analyze, but then I remembered that explain analyze can incur some non-trivial overhead with the timing calls. Is there a way to run the query but have psql ignore the output? If so, you could use \timing. Would timing SELECT COUNT(*) FROM (query) work? Just \timing would work fine; PostgreSQL doesn't return anything until it has the whole result set. Hmm...does \timing show the amount of elapsed time between query start and the first results handed to it by the database (even if the database itself has prepared the entire result set for transmission by that time), or between query start and the last result handed to it by the database? Because if it's the latter, then things like server-client network bandwidth are going to affect the results that \timing shows, and it won't necessarily give you a good indicator of how well the database backend is performing. I would expect that timing SELECT COUNT(*) FROM (query) would give you an idea of how the backend is performing, because the amount of result set data that has to go over the wire is trivial. Each is, of course, useful in its own right, and you want to be able to measure both (so, for instance, you can get an idea of just how much your network affects the overall performance of your queries). That's why MSSQL vs. PostgreSQL timing comparisons are deceptive unless you're careful: MSSQL returns the results on block at a time, and reports execution time as the time required to return the *first* block, as opposed to Postgres which reports the time required to return the whole dataset. Interesting. I had no idea MSSQL did that, but I can't exactly say I'm surprised. 
:-) -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
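Putting the suggestions in this thread together, a psql session that times mostly the backend might look like this (\timing and the count(*) wrapper are from the discussion above; the table and filter are borrowed from the earlier email_contrib tests, and the subselect alias is required):

```sql
\timing
-- Collapse the result set to a single row so network transfer and
-- client-side rendering are negligible; the backend still does all
-- the scan/sort work for the inner query
SELECT count(*) FROM (
    SELECT * FROM email_contrib WHERE project_id = 8
) AS t;
```

One caveat: wrapping a query this way measures the backend's execution of the inner query, but as noted above it deliberately excludes the transfer cost that a real client would pay, so both measurements are worth taking.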
Re: [PERFORM] Kernel Resources and max_connections
On Wed, May 04, 2005 at 01:46:34PM +1200, Mark Kirkwood wrote: (This not-very-clear distinction between what is sysctl'able and what is a kernel tunable is a bit of a downer). I think this is documented somewhere, though I can't think of where right now. Also, note that some sysctl's can only be set in /boot/loader.conf. hw.ata.wc=0 is an example (which you want to set on any box with IDE drives if you want fsync to actually do what it thinks it's doing). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
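For reference, a sketch of the corresponding /boot/loader.conf entry (FreeBSD loader tunables conventionally take quoted values; takes effect at next boot):

```
# /boot/loader.conf -- read by the boot loader, before sysctl.conf
hw.ata.wc="0"   # disable IDE write-back caching so fsync() really hits the platter
```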
Re: [PERFORM] COPY vs INSERT
On Wed, May 04, 2005 at 10:22:56PM -0400, Tom Lane wrote: Also, there is a whole lot of one-time-per-statement overhead that can be amortized across many rows instead of only one. Stuff like opening the target table, looking up the per-column I/O conversion functions, identifying trigger functions if any, yadda yadda. It's not *that* expensive, but compared to an operation as small as inserting a single row, it's significant. Has thought been given to supporting inserting multiple rows in a single insert? DB2 supported: INSERT INTO table VALUES( (1,2,3), (4,5,6), (7,8,9) ); I'm not sure how standard that is or if other databases support it. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] COPY vs INSERT
On Fri, May 06, 2005 at 09:30:46AM +0200, Dennis Bjorklund wrote: The sql standard includes this, except that you can not have the outer (). So it should be INSERT INTO table VALUES (1,2,3), (4,5,6), (7,8,9); Does DB2 demand these extra ()? My recollection is that it does, but it's been a few years... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
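So the standard form Dennis describes, with the outer parentheses dropped, looks like this (table and column names hypothetical):

```sql
-- One statement, one round of per-statement setup, three rows
INSERT INTO t (a, b, c) VALUES
    (1, 2, 3),
    (4, 5, 6),
    (7, 8, 9);
```

For the archives: PostgreSQL eventually did grow support for multi-row VALUES lists, in 8.2, well after this thread.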
Re: [PERFORM] [GENERAL] Hash index vs. b-tree index (PostgreSQL
On Tue, May 10, 2005 at 02:38:41AM +1000, Neil Conway wrote: Jim C. Nasby wrote: Having indexes that people shouldn't be using does add confusion for users, and presents the opportunity for foot-shooting. Emitting a warning/notice on hash-index creation is something I've suggested in the past -- that would be fine with me. Probably not a bad idea. Even if there is some kind of advantage (would they possibly speed up hash joins?) No, hash joins and hash indexes are unrelated. I know they are now, but does that have to be the case? Like I said, I don't know the history, so I don't know why we even have them to begin with. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] [GENERAL] Hash index vs. b-tree index (PostgreSQL
On Tue, May 10, 2005 at 10:14:11AM +1000, Neil Conway wrote: Jim C. Nasby wrote: No, hash joins and hash indexes are unrelated. I know they are now, but does that have to be the case? I mean, the algorithms are fundamentally unrelated. They share a bit of code such as the hash functions themselves, but they are really solving two different problems (disk based indexing with (hopefully) good concurrency and WAL logging vs. in-memory joins via hashing with spill to disk if needed). Well, in a hash-join right now you normally end up feeding at least one side of the join with a seqscan. Wouldn't it speed things up considerably if you could look up hashes in the hash index instead? That way you can eliminate going to the heap for any hashes that match. Of course, if limited tuple visibility info was added to hash indexes (similar to what I think is currently happening to B-trees), many of the heap scans could be eliminated as well. A similar method could also be used for hash aggregates, assuming they use the same hash. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] [GENERAL] Hash index vs. b-tree index (PostgreSQL
On Tue, May 10, 2005 at 12:10:57AM -0400, Tom Lane wrote: be responsive to your search.) (This also brings up the thought that it might be interesting to support hash buckets smaller than a page ... but I don't know how to make that work in an adaptive fashion.) IIRC, other databases that support hash indexes also allow you to define the bucket size, so it might be a good start to allow for that. DBAs usually have a pretty good idea of what a table will look like in production, so if there's clear documentation on the effect of bucket size a good DBA should be able to make a good decision. What's the challenge to making it adaptive: coming up with an algorithm that gives you the optimal bucket size (which I would think there's research on...) or allowing the index to accommodate different bucket sizes existing in the index at once? (Presumably you don't want to re-write the entire index every time it looks like a different bucket size would help.) -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] [GENERAL] Hash index vs. b-tree index (PostgreSQL
On Tue, May 10, 2005 at 11:49:50AM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: What's the challenge to making it adaptive: coming up with an algorithm that gives you the optimal bucket size (which I would think there's research on...) or allowing the index to accommodate different bucket sizes existing in the index at once? (Presumably you don't want to re-write the entire index every time it looks like a different bucket size would help.) Exactly. That's (a) expensive and (b) really hard to fit into the WAL paradigm --- I think we could only handle it as a REINDEX. So if it were adaptive at all I think we'd have to support multiple bucket sizes existing simultaneously in the index, and I do not see a good way to do that. I'm not really familiar enough with hash indexes to know if this would work, but if the maximum bucket size was known you could use that to determine a maximum range of buckets to look at. In some cases, that range would include only one bucket, otherwise it would be a set of buckets. If you found a set of buckets, I think you could then just go to the specific one you need. If we assume that the maximum bucket size is one page it becomes more realistic to take an existing large bucket and split it into several smaller ones. This could be done on an update to the index page, or a background process could handle it. In any case, should this go on the TODO list? Allowing a bucket size to be specified at CREATE INDEX doesn't seem out of line though. We'd have to think up a scheme for index-AM-specific index parameters ... -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] PGSQL Capacity
http://stats.distributed.net has a table that's 130M rows. http://stats.distributed.net/participant/phistory.php?project_id=8&id=39622 is a page that hits that table, and as you can see it's quite fast. This is on a dual opteron with 4G of memory. Unless you're looking for sub-millisecond response times, 50k rows is nothing. On Mon, May 09, 2005 at 09:32:18PM +0200, Steinar H. Gunderson wrote: On Mon, May 09, 2005 at 09:22:40PM +0200, [EMAIL PROTECTED] wrote: How can I know the capacity of a pg database? How many records can my table have? I saw in a message that someone has 50,000 records -- is that possible in a table? (My table has 8 string fields (length 32 chars)). Thanks for your response. You can have several million records in a table easily -- I've done 10 million personally, but you can find people doing that many records a _day_. Hitting 1 billion records should probably not be impossible either -- it all depends on your hardware, and perhaps more importantly, what kind of queries you're running against it. 50,000 is absolutely no problem at all. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Configing 8 gig box.
On Mon, May 09, 2005 at 04:55:53PM -0400, Joel Fradkin wrote: Seems to be only using like 360 meg out of 7 gig free (odd thing is I did see some used swap 4k out of 1.9) with a bunch of users (this may be normal, but it is not going overly fast so thought I would ask). This is perfectly normal. Each postgresql backend will only report memory usage roughly equal to shared_buffers plus the size of the code (16M or so?). If it's in the middle of a sort or vacuum, it will use more memory. It's not uncommon for modern OS's to swap out stuff that's not being used. They would rather have the memory available for disk caching, which is normally a good trade-off. For reference, on a 4G box running FreeBSD, there's currently 18M of swap used. Postgresql processes typically show 53M of total VM, with ~22M resident. This is with shared buffers of 2000. Items I modified per commandprompt.com and watching this list etc. shared_buffers = 24576 work_mem = 32768 max_fsm_pages = 10 max_fsm_relations = 1500 fsync = true wal_sync_method = open_sync wal_buffers = 2048 checkpoint_segments = 100 effective_cache_size = 524288 default_statistics_target = 250 Any help is appreciated. Joel Fradkin Wazagua, Inc. 2520 Trailmate Dr Sarasota, Florida 34243 Tel. 941-753-7111 ext 305 [EMAIL PROTECTED] www.wazagua.com -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? 
Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
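A quick sanity check of how those page-denominated settings translate into bytes (a sketch assuming the default 8 kB block size; custom builds can differ) also explains why only a few hundred MB of the 7 GB shows as used -- shared_buffers here is only about 192 MB:

```python
BLCKSZ = 8192  # default PostgreSQL page size in bytes

# Page-denominated settings taken from the config listed above
settings = {
    "shared_buffers": 24576,
    "wal_buffers": 2048,
    "effective_cache_size": 524288,
}

# Convert pages -> megabytes
sizes_mb = {name: pages * BLCKSZ // (1024 * 1024) for name, pages in settings.items()}
print(sizes_mb)  # {'shared_buffers': 192, 'wal_buffers': 16, 'effective_cache_size': 4096}
```

Note that effective_cache_size doesn't allocate anything -- it only tells the planner how much cache (PostgreSQL plus OS) to assume.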
Re: [PERFORM] Partitioning / Clustering
On Tue, May 10, 2005 at 07:29:59PM +0200, PFC wrote: I wonder how Oracle does it ;) Oracle *clustering* demands shared storage. So you've shifted your money from big-iron CPUs to big-iron disk arrays. Oracle replication works similarly to Slony, though it supports a lot more modes (ie: synchronous). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Partitioning / Clustering
On Tue, May 10, 2005 at 02:55:55PM -0700, Mischa Sandberg wrote: just beyond belief, for both updates and queries. At Acxiom, the datasets are so large, even after partitioning, that they just constantly cycle them through memory, and commands are executed in convoys --- sort of like riding a paternoster. Speaking of which... what's the status of the patch that allows seqscans to piggyback on already running seqscans on the same table? So where should I go look, to see what's been done so far, on a Postgres that can treat another PG server as a new table type? To the best of my knowledge no such work has been done. There is a project (whose name escapes me) that lets you run queries against a remote postgresql server from a postgresql connection to a different server, which could serve as the basis for what you're proposing. BTW, given your experience, you might want to check out Bizgres. (http://pgfoundry.org/projects/bizgres/) I'm sure your insights would be most welcome. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Sort and index
First, I've got some updated numbers up at http://stats.distributed.net/~decibel/ timing2.log shows that the planner actually under-estimates an index scan by several orders of magnitude. Granted, random_page_cost is set to an unrealistic 1.1 (otherwise I can't force the index scan), but that alone isn't enough to explain the difference. On Wed, May 11, 2005 at 05:59:10PM +0200, Manfred Koizar wrote: On Sun, 24 Apr 2005 17:01:46 -0500, Jim C. Nasby [EMAIL PROTECTED] wrote: Feel free to propose better cost equations. I did. More than once. estimated index scan cost for (project_id, id, date) is 0.00..100117429.34 while the estimate for work_units is 0.00..103168408.62; almost no difference, ~3% even though project_id correlation is .657 This is divided by the number of index columns, so the index correlation is estimated to be 0.219. That seems like a pretty bad assumption to make. Is there any ETA on having statistics for multi-column indexes? you'll see that the cost of the index scan is way overestimated. Looking at the code, the runcost is calculated as run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost); where csquared is indexCorrelation^2. Why is indexCorrelation squared? The comments say a linear interpolation between min_IO and max_IO is used, but ISTM that if it was linear then instead of csquared, indexCorrelation would just be used. In my tests I got much more plausible results with 1 - (1 - abs(correlation))^2 What's the theory behind that? And I'd still like to know why correlation squared is used. Jim, are you willing to experiment with one or two small patches of mine? What version of Postgres are you running? It depends on the patches, since this is a production machine. Currently it's running 7.4.*mumble*, though I need to upgrade to 8, which I was intending to do via Slony. Perhaps the best thing would be for me to get that setup and we can experiment against version 8.0.3. -- Jim C. 
Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
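To see numerically how the two weightings discussed above differ, a small sketch (the min/max I/O costs are made-up numbers, not from a real plan):

```python
# Interpolates between best-case (perfectly correlated) and worst-case
# index scan I/O cost.  weight=1 -> min_io (best case), weight=0 -> max_io.
# The planner uses weight = correlation^2; the alternative proposed above
# is weight = 1 - (1 - |correlation|)^2.
def run_cost(min_io, max_io, weight):
    return max_io + weight * (min_io - max_io)

min_io, max_io = 1_000.0, 100_000.0
for corr in (0.0, 0.219, 0.657, 1.0):
    squared = corr ** 2
    alt = 1 - (1 - abs(corr)) ** 2
    print(f"corr={corr:5.3f}  csquared -> {run_cost(min_io, max_io, squared):9.0f}"
          f"  alternative -> {run_cost(min_io, max_io, alt):9.0f}")
```

Both formulas agree at the endpoints (correlation 0 and 1), but for mid-range correlations the squared weight stays near the worst-case cost while the alternative drops toward the best case much faster, which matches the over-estimation observed in timing2.log.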
Re: [PERFORM] Partitioning / Clustering
On Wed, May 11, 2005 at 08:57:57AM +0100, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how livejournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf They have 2.6 Million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Neither of which is that horribly impressive. 200 TPM is less than 4 TPS. While I haven't run high transaction rate databases under PostgreSQL, I suspect others who have will say that 4 TPS isn't that big of a deal. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use mysql on dell hardware! And found single master replication did not scale. There's a section on multimaster replication, not sure if Probably didn't scale because they used to use MyISAM. they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the Which means they've got a huge amount of additional code complexity, not to mention how many times you can't post something because 'that cluster is down for maintenance'. application level to avoid hitting the db for rendered pageviews. Memcached is about the only good thing I've seen come out of livejournal. It's interesting that the solution livejournal have arrived at is quite similar in ways to the way google is set up. Except that unlike LJ, google stays up and it's fast. Though granted, LJ is quite a bit faster than it was 6 months ago. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Sort and index
On Thu, May 12, 2005 at 08:54:48PM +0200, Manfred Koizar wrote: On Wed, 11 May 2005 16:15:16 -0500, Jim C. Nasby [EMAIL PROTECTED] wrote: This is divided by the number of index columns, so the index correlation is estimated to be 0.219. That seems like a pretty bad assumption to make. Any assumption we make without looking at entire index tuples has to be bad. A new GUC variable secondary_correlation introduced by my patch at least gives you a chance to manually control the effects of additional index columns. It seems it would be much better to gather statistics on any multi-column indexes, but I know that's probably beyond what's reasonable for your patch. Also, my data (http://stats.distributed.net/~decibel) indicates that max_io isn't high enough. Look specifically at timing2.log compared to timing.log. Though, it is possible that this is because of having random_page_cost set to 1.1 (if I set it much higher I can't force the index scan because the index estimate actually exceeds the cost of the seqscan with the disable cost added in). It depends on the patches, since this is a production machine. Currently it's running 7.4.*mumble*, The patch referenced in http://archives.postgresql.org/pgsql-hackers/2003-08/msg00931.php is still available. It doesn't touch too many places and should be easy to review. I've been using it and its predecessors in production for more than two years. Let me know if the 74b1 version does not apply cleanly to your source tree. Looks reasonable; I'll give it a shot on 8.0 once I have replication happening. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[PERFORM] Tuning planner cost estimates
I've been doing some work to try and identify the actual costs associated with an index scan with some limited success. What's been run so far can be seen at http://stats.distributed.net/~decibel. But there are a couple of problems. First, I can't use the box exclusively for this testing, which results in some result inconsistencies. Second, I've been using a dataset that I can't make public, which means no one else can run these tests on different hardware. So what I think would be useful is some way to generate a known dataset, and then be able to run tests against it on different machines. In the case of testing index scans, we need to be able to vary correlation, which so far I've been doing by ordering by different columns. I suspect it will also be important to test with different tuple sizes. There's also the question of whether or not the cache should be flushed for each run or not. Does this sound like a good way to determine actual costs for index scans (and hopefully other access methods in the future)? If so, what would be a good way to implement this? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
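One way to generate a known dataset with tunable correlation is to start from sorted keys and swap a controllable fraction of them; a hedged sketch (the function name and the swap-fraction knob are made up for illustration -- real tests would load the resulting keys into a table and index them):

```python
import random

def make_keys(n, disorder):
    """Return n integer keys whose physical order has a tunable
    resemblance to sorted order.  disorder=0.0 -> perfectly correlated
    (fully sorted); larger values perform more random swaps, pushing
    the correlation toward zero."""
    keys = list(range(n))
    swaps = int(n * disorder)
    rng = random.Random(42)  # fixed seed so datasets are reproducible
    for _ in range(swaps):
        i, j = rng.randrange(n), rng.randrange(n)
        keys[i], keys[j] = keys[j], keys[i]
    return keys

# Fully correlated ordering; COPY-style rows could be generated from
# these keys and loaded into a test table.
sample = make_keys(10, 0.0)
print(sample)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because the seed is fixed, the same dataset can be regenerated on different machines, which addresses the reproducibility problem above.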
Re: [PERFORM] Tuning planner cost estimates
On Thu, May 19, 2005 at 09:31:47AM -0700, Josh Berkus wrote: In the case of testing index scans, we need to be able to vary correlation, which so far I've been doing by ordering by different columns. I suspect it will also be important to test with different tuple sizes. There's also the question of whether or not the cache should be flushed for each run or not. Does this sound like a good way to determine actual costs for index scans (and hopefully other access methods in the future)? If so, what would be a good way to implement this? Well, the problem is that what we need for index scans is a formula, rather than a graph. The usefulness of benchmarking index scan cost is so that we True, but having a graphical representation of how different input variables (such as correlation) affect runtime is a good way to derive such a formula, or at least point you in the right direction. can test our formula for accuracy and precision. However, such a formula *does* need to take into account concurrent activity, updates, etc ... that is, it needs to approximately estimate the relative cost on a live database, not a test one. Well, that raises an interesting issue, because AFAIK none of the cost estimate functions currently do that. Heck, AFAIK even the piggyback seqscan code doesn't take other seqscans into account. Another issue is: what state should the buffers/disk cache be in? In the thread that kicked all this off Tom noted that my results were skewed because of caching, so I changed my tests to flush the disk cache as effectively as I could (by running a program that would consume enough available memory to just start the box swapping), but I don't think that's necessarily realistic. Though at least it should preclude the need to run tests multiple times on an otherwise idle box in order to 'pre-seed' the cache (not that that's any more realistic). 
If you don't use one of these techniques you end up with results that depend on what test was run before the current one... This is also going to be a moving target because Tom's in-memory-bitmapping changes relative cost equations. I thought those all had separate costing functions...? In any case, if we have a cost estimation tool it will make it much easier to derive cost estimation functions. I think a first step would be, in fact, to develop a tool that allows us to put EXPLAIN ANALYZE results in a database table. Without that, there is no possibility of statistical-scale analysis. Rather than trying to parse all possible output, ISTM it would be much better if there was a way to access the info directly. Would it be difficult to have an option that produces output that is a set of different fields? I'm thinking something like: Level (basically how far something's indented) Parent node (what node a child node is feeding) node_id (some kind of identifier for each step) operation (estimate|actual)_(startup|total|rows|width|loops) other (something to hold index condition, filter, etc) But ultimately, I'm not sure if this is really required or not, because I don't see that we need to use explain when running queries. In fact, it's possibly desirable that we don't, because of the overhead it incurs. We would want to log an explain (maybe analyze) just to make sure we knew what the optimizer was doing, but I think we shouldn't need the info to produce cost estimates. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
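Until something structured exists, a rough parser over the text output can approximate the fields listed above; a sketch assuming the 8.0-era plan line format (the sample plan line and node name are fabricated for the example):

```python
import re

# Matches plan lines of the form:
#   ->  Seq Scan on foo  (cost=0.00..155.00 rows=10000 width=4)
#                        (actual time=0.02..12.30 rows=10000 loops=1)
LINE = re.compile(
    r"(?P<indent>\s*)(?:->\s+)?(?P<op>.+?)\s+"
    r"\(cost=(?P<est_start>[\d.]+)\.\.(?P<est_total>[\d.]+) "
    r"rows=(?P<est_rows>\d+) width=(?P<width>\d+)\)"
    r"(?: \(actual time=(?P<act_start>[\d.]+)\.\.(?P<act_total>[\d.]+) "
    r"rows=(?P<act_rows>\d+) loops=(?P<loops>\d+)\))?"
)

def parse_plan(text):
    """Extract (level, operation, estimated, actual) fields from
    EXPLAIN ANALYZE text output, one dict per plan node."""
    nodes = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            d = m.groupdict()
            # indent depth serves as a crude nesting level
            d["level"] = len(d.pop("indent")) // 2
            nodes.append(d)
    return nodes

plan = "Seq Scan on foo  (cost=0.00..155.00 rows=10000 width=4) (actual time=0.02..12.30 rows=10000 loops=1)"
print(parse_plan(plan))
```

The parsed dicts could then be inserted straight into a results table for the statistical-scale analysis Josh is after.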
Re: [PERFORM] Tuning planner cost estimates
On Fri, May 20, 2005 at 04:47:38PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: On Thu, May 19, 2005 at 09:31:47AM -0700, Josh Berkus wrote: can test our formula for accuracy and precision. However, such a formula *does* need to take into account concurrent activity, updates, etc ... that is, it needs to approximately estimate the relative cost on a live database, not a test one. Well, that raises an interesting issue, because AFAIK none of the cost estimate functions currently do that. I'm unconvinced that it'd be a good idea, either. People already complain that the planner's choices change when they ANALYZE; if the current load factor or something like that were to be taken into account then you'd *really* have a problem with irreproducible behavior. It might make sense to have something a bit more static, perhaps a GUC variable that says plan on the assumption that there's X amount of concurrent activity. I'm not sure what scale to measure X on, nor exactly how this would factor into the estimates anyway --- but at least this approach would maintain reproducibility of behavior. Or allowing the load of the machine to affect query plans dynamically is something that could be disabled by default, so presumably if you turn it on it means you know what you're doing. Of course this is all academic until we have a means to actually measure how much system load affects the different things we estimate cost for, and I don't see that happening until we have a system for measuring how changing different input variables affects costs. Another issue is: what state should the buffers/disk cache be in? The current cost models are all based on the assumption that every query starts from ground zero: nothing in cache. Which is pretty bogus in most real-world scenarios. We need to think about ways to tune that assumption, too. 
Maybe this is actually the same discussion, because certainly one of the main impacts of a concurrent environment is on what you can expect to find in cache. Well, load doesn't directly affect cache efficiency; it's really a question of the ratios of how often different things in the database are accessed. If you wanted to get a crude idea of how likely pages from some relation are to be in cache, you could take periodic snapshots of io stats and see what percentage of the IO done in a given time period was on the relation you're interested in as compared to the rest of the database. But I think this is probably still a 2nd order effect. In terms of a testing system, here's what I'm thinking of. For each cost estimate, there will be a number of input variables we want to vary, and then check to see how changes in them affect run time. Using index scan as a simple example, 1st order variables will likely be index and table size (especially in relation to cache size), and correlation. So, we need some kind of a test harness that can vary these variables (preferably one at a time), and run a test case after each change. It would then need to store the timing info, possibly along with other information (such as explain output). If I'm the one to write this it'll end up in Perl, since that's the only language I know that would be able to accomplish this. DBT seems to be a reasonable test database to do this testing with, especially since it would provide a ready means to provide external load. Does this sound like a reasonable approach? Also, how important do people think it is to use explain analyze output instead of just doing SELECT count(*) FROM (query you actually want to test)? (The select count(*) wrapper is just a means to throw away the results since we don't really want to worry about data transfer times, etc). 
The testing I've done (http://stats.distributed.net/~decibel/base.log) shows explain analyze to be almost 5x slower than select count(*), so it seems a big gain if we can avoid that. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Tuning planner cost estimates
On Fri, May 20, 2005 at 03:23:16PM -0700, Josh Berkus wrote: Jim, Well, that raises an interesting issue, because AFAIK none of the cost estimate functions currently do that. Heck, AFAIK even the piggyback seqscan code doesn't take other seqscans into account. Sure. But you're striving for greater accuracy, no? Actually, all that's really needed in the way of concurrent activity is a calculated factor that lets us know how likely a particular object is to be cached, either in the fs cache or the pg cache (with different factors for each presumably) based on history. Right now, that's based on estimated_cache_size, which is rather inaccurate: a table which is queried once a month has the exact same cost factors as one which is queried every 2.1 seconds. This would mean an extra column in pg_stats I suppose. True, though that's a somewhat different issue than what the load on the box is (see the reply I just posted). Load on the box (particularly IO load) will also play a factor for things; for example, it probably means seqscans end up costing a lot more than random IO does, because the disk heads are being sent all over the place anyway. But ultimately, I'm not sure if this is really required or not, because I don't see that we need to use explain when running queries. In fact, it's possibly desirable that we don't, because of the overhead it incurs. We would want to log an explain (maybe analyze) just to make sure we knew what the optimizer was doing, but I think we shouldn't need the info to produce cost estimates. Well, the problem is that you need to know how much time the index scan took vs. other query steps. I don't see a way to do this other than an analyze. True, but that can be done by a separate seqscan step. I would argue that doing it that way is actually more accurate, because the overhead of explain analyze is huge and tends to swamp other factors out. 
As I mentioned in my other email, my tests show explain analyze select * from table is 5x slower than select count(*) from table. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Select performance vs. mssql
On Wed, May 25, 2005 at 09:29:36AM +0800, Christopher Kings-Lynne wrote: --MSSQL's ability to hit the index only and not having to go to the table itself results in a _big_ performance/efficiency gain. If someone who's in development wants to pass this along, it would be a nice addition to PostgreSQL sometime in the future. I'd suspect that as well as making one query faster, it would make everything else faster/more scalable as the server load is so much less. This is well-known and many databases do it. However, due to MVCC considerations in PostgreSQL, it's not feasible for us to implement it... Wasn't there a plan to store some visibility info in indexes? IIRC the idea was that a bit would be set in the index tuple indicating that all transactions that wouldn't be able to see that index value were complete, meaning that there was no reason to hit the heap for that tuple. I looked on the TODO but didn't see this, maybe it fell through the cracks? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster