Re: [PERFORM] SCSI vs SATA

2007-04-04 Thread Rod Taylor


On 4-Apr-07, at 8:46 AM, Andreas Kostyrka wrote:


* Peter Kovacs <[EMAIL PROTECTED]> [070404 14:40]:
This may be a silly question but: will not 3 times as many disk drives
mean 3 times higher probability for disk failure? Also rumor has it
that SATA drives are more prone to fail than SCSI drives. More
failures will result, in turn, in more administration costs.

Actually, the newest research papers show that all discs (be it
desktop or high-end SCSI) have basically the same failure statistics.

But yes, having 3 times the discs will increase the fault probability.


I highly recommend RAID6 to anyone with more than 6 standard SATA  
drives in a single array. It's actually fairly probable that you will  
lose 2 drives in a 72 hour window (say over a long weekend) at some  
point.



Andreas


Thanks
Peter

On 4/4/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

On Tue, 3 Apr 2007, Geoff Tolley wrote:



Ron wrote:

 At 07:07 PM 4/3/2007, Ron wrote:

 For random IO, the 3ware cards are better than PERC

 Question: will 8*15k 73GB SCSI drives outperform 24*7K 320GB SATA II drives?

 Nope.  Not even if the 15K 73GB HDs were the brand new Savvio 15K screamers.

 Example assuming 3.5" HDs and RAID 10 => 4 15K 73GB vs 12 7.2K 320GB
 The 15K's are 2x faster rpm, but they are only ~23% the density =>
 advantage per HD to SATAs.
 Then there's the fact that there are 1.5x as many 7.2K spindles as 15K spindles...

 Oops, make that =3x= as many 7.2K spindles as 15K spindles...


I don't think the density difference will be quite as high as you seem to
think: most 320GB SATA drives are going to be 3-4 platters, the most that a
73GB SCSI is going to have is 2, and more likely 1, which would make the
SCSIs more like 50% the density of the SATAs. Note that this only really
makes a difference to theoretical sequential speeds; if the seeks are random
the SCSI drives could easily get there 50% faster (lower rotational latency
and they certainly will have better actuators for the heads). Individual 15K
SCSIs will trounce 7.2K SATAs in terms of I/Os per second.


True, but with 3x as many drives (and 4x the capacity per drive) the SATA
system will have to do far less seeking.

For that matter, with 20ish 320G drives, how large would a partition be
that only used the outer physical track of each drive? (almost certainly
multiple logical tracks) If you took the time to set this up you could
eliminate seeking entirely (at the cost of not using your capacity, but
since you are considering a 12x range in capacity, it's obviously not your
primary concern)

If you care about how often you'll have to replace a failed drive, then
the SCSI option, no question, although check the cases for hot-swappability.


note that the CMU and Google studies both commented on being  
surprised at
the lack of difference between the reliability of SCSI and SATA  
drives.


David Lang



Re: [PERFORM] What you would consider as heavy traffic?

2006-12-29 Thread Rod Taylor
Depends on what the query is. If the queries take 3 to 5 days to  
execute, then 1 query per day on a 4 CPU machine would be at capacity.


On 23-Dec-06, at 3:12 AM, [EMAIL PROTECTED] wrote:


Hey Everyone,

I am having a bit of trouble with a web host, and was wondering what
you would class as a high level of traffic to a database (queries per
second) for an average server running Postgres in a shared hosting
environment (very modern servers).

Many Thanks in Advance,
Oliver




Re: [PERFORM] Backup/Restore too slow

2006-12-29 Thread Rod Taylor
Rebuilding the indexes or integrity confirmations are probably taking  
most of the time.


What is your work_mem setting?
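
For what it's worth, a rough sketch of the settings I would look at for the
restore window (the values are only illustrative, not recommendations for
your hardware; 8.1 takes these as kB and segment counts):

# postgresql.conf, temporarily while restoring
maintenance_work_mem = 262144   # ~256MB, speeds up the index rebuilds
checkpoint_segments = 30        # fewer checkpoint pauses during the bulk load

-- and from psql, to confirm what is currently in effect
SHOW maintenance_work_mem;
SHOW checkpoint_segments;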

On 22-Dec-06, at 9:32 AM, Sebastián Baioni wrote:


Hi,
We have a database with one table of 10,000,000 tuples and 4 tables
with 5,000,000 tuples.
While in SQL Server it takes 3 minutes to restore this complete
database, in PostgreSQL it takes more than 2 hours.
The backup takes 6 minutes in SQL Server and 13 minutes in PostgreSQL (which is
not a problem).


We are running PostgreSQL 8.1 for Windows and we are using:
C:\pg_dump.exe -i -h localhost -p 5432 -U usuario -F c -b -v -f "C:\BK\file.backup" base

and
C:\pg_restore.exe -i -h localhost -p 5432 -U usuario -d base -O -v "C:\BK\file.backup"


We use those parameters because we copied them  from PGAdminIII.

Is there any way to make it faster?

Thanks
 Sebastián




Re: [PERFORM] When to vacuum a table?

2006-11-26 Thread Rod Taylor


On 26-Nov-06, at 8:11 AM, Steinar H. Gunderson wrote:


On Sun, Nov 26, 2006 at 09:43:11AM -0300, Marcelo Costa wrote:
All its constantly manipulated tables (INSERT, UPDATE, DELETE) need a
VACUUM

Just a minor clarification here: INSERT does not create dead rows, only
UPDATE and DELETE do. Thus, if you only insert rows, you do not need to
vacuum (although you probably need to analyze).


Not entirely true. An insert & rollback will create dead rows. If you  
attempt and fail a large number of insert transactions then you will  
still need to vacuum.
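
A quick sketch to see this for yourself (the table name is made up):

CREATE TABLE t (i integer);
BEGIN;
INSERT INTO t SELECT * FROM generate_series(1, 100000);
ROLLBACK;
VACUUM VERBOSE t;  -- reports the dead row versions left behind by the aborted insert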




Re: [PERFORM] autovacuum on a -mostly- r/o table

2006-09-27 Thread Rod Taylor
On Wed, 2006-09-27 at 18:08 +0200, Edoardo Ceccarelli wrote:
> 
> I have read that autovacuum cannot check to see pg load before
> launching 
> vacuum but is there any patch about it? that would sort out the
> problem 
> in a good and simple way. 

In some cases the solution to high load is to vacuum the tables being
hit the heaviest -- meaning that simply checking machine load isn't
enough to make that decision.

In fact, that high load problem is exactly why autovacuum was created in
the first place.
-- 




Re: [PERFORM] Optimizing DELETE

2006-09-19 Thread Rod Taylor
On Tue, 2006-09-19 at 15:22 +0200, Ivan Voras wrote:
> I've just fired off a "DELETE FROM table" command (i.e. unfiltered 
> DELETE) on a trivially small table but with many foreign key references 
> (on similar-sized tables), and I'm waiting for it to finish. It's been 
> 10 minutes now, which seems very excessive for a table of 9000 rows on a 
> 3 GHz desktop machine.

I would guess that a few of the tables referencing this one are missing
indexes on their foreign key columns.
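
As a sketch (table and column names are hypothetical), every table that
references the one you are deleting from should have an index on its
foreign key column; otherwise each deleted row forces a sequential scan of
the referencing table:

-- child.parent_id references parent(id)
CREATE INDEX child_parent_id_idx ON child (parent_id);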

> 'top' says it's all spent in USER time, and there's a ~~500KB/s write 
> rate going on. Just before this DELETE, I've deleted data from a larger 
> table (5 rows) using the same method and it finished in couple of 
> seconds - maybe it's a PostgreSQL bug?
> 
> My question is: assuming it's not a bug, how to optimize DELETEs? 
> Increasing work_mem maybe?
> 
> (I'm using PostgreSQL 8.1.4 on FreeBSD 6- amd64)
> 
> (I know about TRUNCATE; I need those foreign key references to cascade)
> 




Re: [PERFORM] sub select performance due to seq scans

2006-07-31 Thread Rod Taylor
> capsa=# explain analyze select name from capsa.flatomfilesysentry
> where objectid in ( select dstobj from capsa.flatommemberrelation
> where srcobj = 'c1c7304a-1fe1-11db-8af7-001143214409');
> 
> 
>QUERY PLAN
> -
>  Nested Loop IN Join  (cost=0.00..1386.45 rows=5809 width=14) (actual
> time=2.933..101467.463 rows=5841 loops=1)
>Join Filter: ("outer".objectid = "inner".dstobj)
>->  Seq Scan on flatomfilesysentry  (cost=0.00..368.09 rows=5809
> width=30) (actual time=0.007..23.451 rows=5844 loops=1)
>->  Seq Scan on flatommemberrelation  (cost=0.00..439.05 rows=5842
> width=16) (actual time=0.007..11.790 rows=2922 loops=5844)

A loop for an IN indicates that you are using a very old version of
PostgreSQL (7.2 or earlier).  Please double check that the server is
8.1.3 as you indicated and not just the client.

From psql:
select version();

Hmm... Perhaps it is an 8.1.3 server with mergejoin and hashjoin
disabled?
show enable_mergejoin;
show enable_hashjoin;

You can try this query syntax:

select name
  from capsa.flatomfilesysentry
  join capsa.flatommemberrelation on (objectid = dstobj)
 where srcobj = 'c1c7304a-1fe1-11db-8af7-001143214409';


>  Filter: (srcobj =
> 'c1c7304a-1fe1-11db-8af7-001143214409'::capsa_sys.uuid)
>  Total runtime: 101482.256 ms
> (6 rows)
> 
> capsa=# select count(*) from capsa.flatommemberrelation ;
>  count
> ---
>  11932
> (1 row)
> 
> capsa=# select count(*) from capsa.flatomfilesysentry ;
>  count
> ---
>   5977
> 
> 
> 
> 




Re: [PERFORM] lots of updates on small table

2005-07-14 Thread Rod Taylor
On Thu, 2005-07-14 at 15:08 +1000, Alison Winters wrote:
> Hi,
> 
> Our application requires a number of processes to select and update rows
> from a very small (<10 rows) Postgres table on a regular and frequent
> basis.  These processes often run for weeks at a time, but over the

Are these long running transactions or is the process issuing many short
transactions?

If your transaction lasts a week, then a daily vacuum isn't really doing
anything.

I presume you also run ANALYZE in some shape or form periodically?

> space of a few days we find that updates start getting painfully slow.
> We are running a full vacuum/analyze and reindex on the table every day,

If they're short transactions, run vacuum (not vacuum full) every 100 or
so updates. This might even be once a minute.

Analyze periodically as well.
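
Something along these lines, as a sketch (the table name is hypothetical;
match the interval to your update rate):

-- from cron or from the application, every minute or every N updates
VACUUM my_small_table;
-- much less frequently
ANALYZE my_small_table;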

-- 




Re: [PERFORM] Mount database on RAM disk?

2005-07-08 Thread Rod Taylor
> If I could get and deploy some SSD (Solid State Disk) devices that
> would make this sort of thing *actually safe,* I'd expect that to be a
> pretty fabulous improvement, at least for write-heavy database
> activity.

Not nearly as much as you would expect. For the price of the SSD, a
SCSI controller capable of keeping up with the SSD, and regular
storage with enough throughput to keep up with the structural IO, you can
purchase a pretty good mid-range SAN which will be just as capable and
much more versatile.

-- 




Re: [PERFORM] Performance - moving from oracle to postgresql

2005-06-24 Thread Rod Taylor

> There are some immediate questions from our engineers about performance
> 
> "- Oracle has one particular performance enhancement that Postgres is
> missing.  If you do a select that returns 100,000 rows in a given order,
> and all you want are rows 99101 to 99200, then Oracle can do that very
> efficiently.  With Postgres, it has to read the first 99200 rows and
> then discard the first 99100.  But...  If we really want to look at
> performance, then we ought to put together a set of benchmarks of some
> typical tasks."
> 
> Is this accurate:
> accoring to
> http://www.postgresql.org/docs/8.0/interactive/queries-limit.html
>   -- " The rows skipped by an OFFSET clause still have to be computed 
> inside the server; therefore a large OFFSET can be inefficient."

Yes. That's accurate. First you need to determine whether PostgreSQLs
method is fast enough for that specific query, and if the performance
gains for other queries (inserts, updates, delete) from reduced index
management evens out your concern. All performance gains through design
changes either increase complexity dramatically or have a performance
trade-off elsewhere.


I find it rather odd that anyone would issue a single one-off select for
0.1% of the data about 99.1% of the way through, without doing anything
with the rest. Perhaps you want to take a look at using a CURSOR?
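
A rough sketch of the cursor approach (table and column names are
hypothetical). The skipped rows are still computed on the server, but they
are never shipped to the client, and you keep your position for the next
fetch:

BEGIN;
DECLARE c CURSOR FOR SELECT * FROM big_table ORDER BY some_col;
MOVE FORWARD 99100 IN c;
FETCH 100 FROM c;  -- rows 99101 to 99200
CLOSE c;
COMMIT;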

> Where is psql not appropriate to replace Oracle?

Anything involving reporting using complex aggregates or very long
running selects which Oracle can divide amongst multiple CPUs.

Well, PostgreSQL can do it if you give it enough time to run the query,
but a CUBE in PostgreSQL on a TB sized table would likely take
significantly longer to complete. It's mostly just that the Pg
developers haven't implemented those features optimally, or at all, yet.

-- 




Re: [PERFORM] max_connections / shared_buffers /

2005-06-24 Thread Rod Taylor

> 1.) shared_buffers I see lot of reference to making
> this the size of available ram (for the DB).  However,
> I also read to make it the size of pgdata directory.  

> 2.) effective_cache_size - from what I read this is
> the 'total' allowed memory for postgresql to use
> correct? So, if I am willing to allow 1GB of memory
> should I make this 1GB?

shared_buffers in your case should be about 10,000. It is not taken on a
per connection basis, but is global for that cluster. Perhaps your
memory analysis tool is fooling you?

effective_cache_size is what you want to set to the amount of ram that
you expect the kernel to use for caching the database information in
memory. PostgreSQL will not allocate this memory, but it will make
adjustments to the query execution methods (plan) chosen.
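
As a sketch, for a machine where you are willing to give about 1GB to
database caching overall (numbers are illustrative only; on a pre-8.2
server both settings are expressed in 8kB pages):

shared_buffers = 10000          # roughly 80MB of dedicated buffer cache
effective_cache_size = 100000   # roughly 800MB you expect the kernel to cache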

> 3.) max_connections, been trying to figure 'how' to
> determine this #.  I've read this is buffer_size+500k
> per a connection.  

> ie.  128mb(buffer) + 500kb = 128.5mb per connection?

Max connections is the number of connections to the database you intend
to allow.

Shared_buffers must be of a certain minimum size to have that number of
connections, but the 10k number above should cover any reasonable
configurations.

> work_mem and max_stack_depth set to 4096
> maintenance_work_mem set to 64mb

Sort_mem and vacuum_mem became work_mem and maintenance_work_mem as
those terms better indicate what they really do.

> Thanks for any help on this.  I'm sure bombardment of
> newbies gets old :)

That's alright. We only request that once you have things figured out
that you, at your leisure, help out a few others.


-- 




Re: [PERFORM] How does the transaction buffer work?

2005-06-16 Thread Rod Taylor
> Now of course this isn't Nirvana, you must pay somewhere ;-) and our
> weak spot is the need for VACUUM.  But you have no need to fear large
> individual transactions.

No need to fear long running transactions other than their ability to
stop VACUUM from doing what it's supposed to be doing, thus possibly
impacting performance.
-- 




Re: [PERFORM] Updates on large tables are extremely slow

2005-06-13 Thread Rod Taylor

> Ok, if all 21 are affected, I can understand the problem.
> But allow me to say that this is a "functional error"

It's a choice between total throughput on a high load, high connection
basis (MVCC dramatically wins here), versus a single user, low load
scenario (MS Access is designed for this).

Believe me when I say that a lot of people have spent a lot of time
explicitly making the system work that way.

> On 13 Jun 2005, at 18:02, Richard Huxton wrote:
> 
> Yves Vindevogel wrote:
> I forgot cc
> Begin forwarded message:
> From: Yves Vindevogel
> <[EMAIL PROTECTED]>
> Date: Mon 13 Jun 2005 17:45:19 CEST
> To: Tom Lane <[EMAIL PROTECTED]>
> Subject: Re: [PERFORM] Updates on large tables
> are extremely slow
> 
> Yes, but if I update one column, why should PG
> update 21 indexes ?
> There's only one index affected !
> 
> No - all 21 are affected. MVCC creates a new row on disk.
> 
> -- 
>   Richard Huxton
>   Archonet Ltd
> 
> 
> Met vriendelijke groeten,
> Bien à vous,
> Kind regards,
> 
> Yves Vindevogel
> Implements
> 
> 
> 
> __
> 
> 
> 
> Mail: [EMAIL PROTECTED]  - Mobile: +32 (478) 80 82 91
> 
> Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
> 
> Web: http://www.implements.be
> 
> First they ignore you.  Then they laugh at you.  Then they fight you.
> Then you win.
> Mahatma Ghandi.
-- 




Re: [PERFORM] Need help to decide Mysql vs Postgres

2005-06-06 Thread Rod Taylor
On Mon, 2005-06-06 at 12:00 -0400, Amit V Shah wrote:
> Hi all,
> 
> Thanks for your replies. 
> 
> I ran a very preliminary test, and found the following results. I feel they are
> weird and I don't know what I am doing wrong !!!
> 
> I made a schema with 5 tables. I have a master data table with foreign keys
> pointing to other 4 tables. Master data table has around 4 million records.
> When I run a select joining it with the baby tables, 
> 
> postgres -> returns results in 2.8 seconds
> mysql -> takes around 16 seconds  (This is with myisam ... with innodb
> it takes 220 seconds)

We said MySQL was faster for simple selects and non-transaction inserts
on a limited number of connections.

Assuming you rebuilt statistics in MySQL (myisamchk -a), I would presume
that PostgreSQL's more mature optimizer has come into play in the above 5
table join test by finding a better (faster) way of executing the query.

If you post EXPLAIN ANALYZE output for the queries, we might be able to
tell you what they did differently.

> I am all for postgres at this point, however just want to know why I am
> getting opposite results !!! Both DBs are on the same machine

If possible, it would be wise to run a performance test with the
expected load you will receive. If you expect to have 10 clients perform
operation X at a time, then benchmark that specific scenario.

Both PostgreSQL and MySQL will perform differently in a typical real
load situation than with a single user, single query situation.

> -Original Message-
> From: Jeffrey Tenny [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 06, 2005 11:51 AM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Need help to decide Mysql vs Postgres
> 
> 
> Re: your JDBC wishes: Consider IBM Cloudscape (now Apache Derby) too, 
> which has an apache license.  It's all pure java and it's easy to get going.
> 
> 
> As to MySql vs Postgres: license issues aside, if you have 
> transactionally complex needs (multi-table updates, etc), PostgreSQL 
> wins hands down in my experience.  There are a bunch of things about 
> MySQL that just suck for high end SQL needs. (I like my subqueries,
> and I absolutely demand transactional integrity).
> 
> There are some pitfalls to pgsql though, especially for existing SQL 
> code using MAX and some other things which can really be blindsided 
> (performance-wise) by pgsql if you don't use the workarounds.
> 
> 
> MySQL is nice for what I call "raw read speed" applications. But that 
> license is an issue for me, as it is for you apparently.
> 
> 
> Some cloudscape info:
> http://www-306.ibm.com/software/data/cloudscape/
> 
> Some info on pitfalls of MySQL and PostgreSQL, an interesting contrast:
> http://sql-info.de/postgresql/postgres-gotchas.html
> http://sql-info.de/mysql/gotchas.html
> 
> 
-- 




Re: [PERFORM] Prefetch

2005-05-11 Thread Rod Taylor
On Wed, 2005-05-11 at 12:53 +0800, Christopher Kings-Lynne wrote:
> > Another trick you can use with large data sets like this when you want 
> > results
> > back in seconds is to have regularly updated tables that aggregate the data
> > along each column normally aggregated against the main data set.
> 
> > Maybe some bright person will prove me wrong by posting some working
> > information about how to get these apparently absent features working.
> 
> Most people just use simple triggers to maintain aggregate summary tables...

Agreed. I've also got a view which calls a function that will 1) use the
summary table where data exists, or 2) calculate the summary
information, load it into the summary table, and send a copy to the client
(a partial query results cache).

It's not all nicely abstracted behind user friendly syntax, but most of
those features can be cobbled together (with effort) in PostgreSQL.
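
As a sketch of the trigger approach mentioned above (all table, column and
function names are hypothetical, and concurrent first-inserts for a new day
would need more care than this):

CREATE TABLE sales (sold_on date, amount numeric);
CREATE TABLE sales_daily (day date PRIMARY KEY, total numeric NOT NULL DEFAULT 0);

CREATE FUNCTION sales_daily_upd() RETURNS trigger AS $$
BEGIN
    UPDATE sales_daily SET total = total + NEW.amount WHERE day = NEW.sold_on;
    IF NOT FOUND THEN
        INSERT INTO sales_daily (day, total) VALUES (NEW.sold_on, NEW.amount);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_summary AFTER INSERT ON sales
    FOR EACH ROW EXECUTE PROCEDURE sales_daily_upd();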
-- 




Re: [PERFORM] Prefetch

2005-05-10 Thread Rod Taylor
> I've done other things that make sense, like using indexes, playing with the
> planner constants and turning up the postgres cache buffers.

After you load the new days data try running CLUSTER on the structure
using a key of (stockID, date) -- probably your primary key.

This should significantly reduce the amount of IO required for your
calculations involving a few stocks over a period of time.
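
A minimal sketch with hypothetical table/index names (on 7.4/8.0 the syntax
is CLUSTER indexname ON tablename):

CREATE INDEX quotes_stock_date_idx ON quotes (stockid, date);
CLUSTER quotes_stock_date_idx ON quotes;
ANALYZE quotes;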

-- 




Re: [PERFORM] Final decision

2005-04-27 Thread Rod Taylor

> 
> I did have a question if any folks are using two servers one for
> reporting and one for data entry what system should be the beefier?

Yeah. We started putting up slaves for reporting purposes and
application specific areas using Slony replicating partial data sets to
various locations -- some for reporting.

If your reports have a long runtime and don't require transactional
safety for writes (daily summary written or results aren't recorded in
the DB at all) this is probably something to consider.

I understand that PGAdmin makes Slony fairly painless to set up, but it
can be time consuming to get going, and Slony can add new complications
depending on the data size and what you're doing with it -- but they're
working hard to reduce the impact of those complications.

> I have a 2proc machine I will be using and I can either put Sears off
> by themselves on this machine or split up functionality and have one
> for reporting and one for inserts and updates; so not sure which
> machine would be best for which spot (reminder the more robust is a
> 4proc with 8 gigs and 2 proc is 4 gigs, both dells).
> 
> Thank you for any ideas in this arena.
> 
> Joel Fradkin
> 
-- 




Re: [PERFORM] Disk filling, CPU filling, renegade inserts and

2005-04-20 Thread Rod Taylor
> I'm having a pretty serious problem with postgresql's performance.  
> Currently, I have a cron task that is set to restart and vacuumdb -faz 
> every six hours.  If that doesn't happen, the disk goes from 10% full 
> to 95% full within 2 days (and it's a 90GB disk...with the database 
> being a 2MB download after dump), and the CPU goes from running at 
> around a 2% load to a 99+% load right away (the stats look like a 
> square wave).

Are you running frequent queries which use temporary tables?


-- 




Re: [PERFORM] Opteron vs Xeon (Was: What to do with 6 disks?)

2005-04-20 Thread Rod Taylor
> The HPs are at root pretty good machines -- and take 6 drives, so I expect 
> you're mixed up there.  However, they use HP's proprietary RAID controller 
> which is seriously defective.   So you need to factor replacing the RAID 
> controller into the cost.

Do you have any additional materials on what is defective with their
raid controllers?
-- 




Re: [PERFORM] Spend 7K *WHERE*? WAS Intel SRCS16 SATA raid? and How

2005-04-15 Thread Rod Taylor
On Fri, 2005-04-15 at 15:43 -0500, Matthew Nuzum wrote:
> I think there are many people who feel that $7,000 is a good budget for a
> database server, me being one.

The budget for a database server is usually some percentage of the value of
the data within the database or the value of its availability. Big
budget hardware (well, going from $7k to $100k) often brings more
redundancy and reliability improvement than performance improvement.

If you're going to lose $100k in business because the database was
unavailable for 12 hours, then kick $75k into the hardware and call a
profit of $25k over 3 years (hardware lifetime is 3 years, catastrophic
failure happens once every 3 or so years...).

Ditto for backup systems. If the company depends on the data in the
database for its survival, where bankruptcy or worse would happen as a
result of complete data loss, then it would be a good idea to invest a
significant amount of the company's revenue into making damn sure that
doesn't happen. Call it an insurance policy.


Performance for me dictates which hardware is purchased and
configuration is used within $BUDGET, but $BUDGET itself is nearly
always defined by the value of the data stored.


>  * I agree with the threads that more disks are better.
>  * I also agree that SCSI is better, but can be hard to justify if your
> budget is tight, and I have great certainty that 2x SATA drives on a good
> controller is better than x SCSI drives for many work loads.
>  * I also feel that good database design and proper maintenance can be one
> of the single biggest performance enhancers available. This can be labor
> intensive, however, and sometimes throwing more hardware at a problem is
> cheaper than restructuring a db.
> 
> Either way, having a good hardware platform is an excellent place to start,
> as much of your tuning will depend on certain aspects of your hardware.
> 
> So if you need a db server, and you have $7k to spend, I'd say spend it.
> From this list, I've gathered that I/O and RAM are your two most important
> investments.
> 
> Once you get that figured out, you can still do some performance tuning on
> your new server using the excellent advice from this mailing list.
> 
> By the way, for all those who make this list work, I've rarely found such a
> thorough, helpful and considerate group of people as these on the
> performance list.
> 
-- 




Re: [PERFORM] Réf

2005-04-06 Thread Rod Taylor
On Wed, 2005-04-06 at 14:40 -0400, Alex Turner wrote:
> I think his point was that 9 * 4 != 2400

Oh.. heh.. I didn't even notice that.

Can I pretend I did it in my head using HEX math and that it wasn't a
mistake?

> On Apr 6, 2005 2:23 PM, Rod Taylor <[EMAIL PROTECTED]> wrote:
> > On Wed, 2005-04-06 at 19:42 +0200, Steinar H. Gunderson wrote:
> > > On Wed, Apr 06, 2005 at 01:18:29PM -0400, Rod Taylor wrote:
> > > > Yeah, I think that can be done provided there is more than one worker.
> > > > My limit seems to be about 1000 transactions per second each with a
> > > > single insert for a single process (round trip time down the Fibre
> > > > Channel is large) but running 4 simultaneously only drops throughput to
> > > > about 900 per process (total of 2400 transactions per second) and the
> > > > machine still seemed to have lots of oomph to spare.
> > >
> > > Erm, have I missed something here? 900 * 4 = 2400?
> > 
> > Nope. You've not missed anything.
> > 
> > If I ran 10 processes, the requirement would be met.
> > --
> > 
> 
-- 




Re: [PERFORM] Réf

2005-04-06 Thread Rod Taylor
On Wed, 2005-04-06 at 19:42 +0200, Steinar H. Gunderson wrote:
> On Wed, Apr 06, 2005 at 01:18:29PM -0400, Rod Taylor wrote:
> > Yeah, I think that can be done provided there is more than one worker.
> > My limit seems to be about 1000 transactions per second each with a
> > single insert for a single process (round trip time down the Fibre
> > Channel is large) but running 4 simultaneously only drops throughput to
> > about 900 per process (total of 2400 transactions per second) and the
> > machine still seemed to have lots of oomph to spare.
> 
> Erm, have I missed something here? 900 * 4 = 2400?

Nope. You've not missed anything.

If I ran 10 processes, the requirement would be met.
-- 




Re: Réf. : Re: RE : RE: [PERFORM] Postgresql

2005-04-06 Thread Rod Taylor
On Wed, 2005-04-06 at 19:08 +0200, [EMAIL PROTECTED] wrote:
> 
> On our production server, I can insert 5000 tuples in 2100 ms.  
> 
> Single Xeon 2.6 Ghz 
> 2 Gigs ram 
> 3ware RAID 5 SATA drives array, 3 drives only :-(( 
> PG 8.0 - fsync off  
> 
> I do think inserting 5000 tuples in a second (i.e. 5000 insert
> transactions, no bulk load) can be reached with a well configured SCSI
> RAID 10 array. 

Yeah, I think that can be done provided there is more than one worker.
My limit seems to be about 1000 transactions per second each with a
single insert for a single process (round trip time down the Fibre
Channel is large) but running 4 simultaneously only drops throughput to
about 900 per process (total of 2400 transactions per second) and the
machine still seemed to have lots of oomph to spare.

Also worth noting is that this test was performed on a machine which as
a noise floor receives about 200 queries per second, which it was
serving during the test.

> Is pgcluster worth giving a try and can it be trusted for in a
> production environnement ? 
> Will it be possible to get a sort of real-time application ? 

From the design of pgcluster it looks like it adds a significant
amount of additional communication, so expect your throughput for a
single process to drop through the floor.

-- 




Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this

2005-04-06 Thread Rod Taylor
On Wed, 2005-04-06 at 16:12 +, Mohan, Ross wrote:
> I wish I had a Dell system and run case to show you Alex, but I don't...
> however...using Oracle's "direct path" feature, it's pretty straightforward. 
> 
> We've done 110,000 rows per second into index-less tables on a big system
> (IBM Power5 chips, Hitachi SAN). ( Yes, I am sure: over 100K a second. 
> Sustained
> for almost 9 minutes. )

Just for kicks I did a local test on a desktop machine (single CPU,
single IDE drive) using COPY from STDIN for a set of integers in via a
single transaction, no indexes.

1572864 tuples were loaded in 13715.613ms, which is approx 115k rows per
second.

Okay, no checkpoints and I didn't cross an index boundary, but I also
haven't tuned the config file beyond bumping up the buffers.

Lets try again with more data this time.

31Million tuples were loaded in approx 279 seconds, or approx 112k rows
per second.

> I'd love to see PG get into this range..i am a big fan of PG (just a
> rank newbie) but I gotta think the underlying code to do this has
> to be not-too-complex.

I'd say we're there.

> -Original Message-
> From: Alex Turner [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, April 06, 2005 11:38 AM
> To: [EMAIL PROTECTED]
> Cc: pgsql-performance@postgresql.org; Mohan, Ross
> Subject: Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ?
> 
> 
> I think everyone was scared off by the 5000 inserts per second number.
> 
> I've never seen even Oracle do this on a top end Dell system with copious 
> SCSI attached storage.
> 
> Alex Turner
> netEconomist
> 
> On Apr 6, 2005 3:17 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >  
> > Unfortunately.
> >  
> > But we are in the the process to choose Postgresql with pgcluster. I'm 
> > currently running some tests (performance, stability...) Save the 
> > money on the license fees, you get it for your hardware ;-)
> >  
> > I still welcome any advices or comments and I'll let you know how the 
> > project is going on.
> >  
> > Benjamin.
> >  
> >  
> >  
> >  "Mohan, Ross" <[EMAIL PROTECTED]>
> > 
> > 05/04/2005 20:48
> >  
> > Pour :<[EMAIL PROTECTED]> 
> > cc : 
> > Objet :RE: [PERFORM] Postgresql vs SQLserver for this
> > application ?
> >  
> >  
> > You never got answers on this? Apologies, I don't have one, but'd be 
> > curious to hear about any you did get
> >   
> > thx
> >   
> > Ross
> > 
> > -Original Message-
> >  From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf
> > Of [EMAIL PROTECTED]
> >  Sent: Monday, April 04, 2005 4:02 AM
> >  To: pgsql-performance@postgresql.org
> >  Subject: [PERFORM] Postgresql vs SQLserver for this application ?
> >  
> > 
> >  hi all.
> >  
> >  We are designing a quite big application that requires a 
> > high-performance database backend.  The rates we need to obtain are at 
> > least  5000 inserts per second and 15 selects per second for one 
> > connection. There should only be 3 or 4 simultaneous connections.
> >  I think our main concern is to deal with the constant flow of data coming
> > from the inserts that must be available for selection as fast as possible.
> > (kind of real time access ...) 
> >  
> >  As a consequence, the database should rapidly increase up to more 
> > than one hundred gigs. We still have to determine how and when we 
> > should back up old data to prevent the application from a performance 
> > drop. We intend to develop some kind of real-time partitioning on our 
> > main table to keep the flows up.
> >  
> >  At first, we were planning to use SQL Server as it has features that 
> > in my opinion could help us a lot :
> > - replication 
> > - clustering
> >  
> >  Recently we started to study Postgresql as a solution for our project : 
> > - it also has replication 
> > - Postgis module can handle geographic datatypes (which would 
> > facilitate our developments)
> > - We do have a strong knowledge on Postgresql administration 
> > (we use it for production processes)
> > - it is free (!) and we could save money for hardware 
> > purchase.
> >  
> >  Is SQL server clustering a real asset ? How reliable are Postgresql 
> > replication tools  ? Should I trust Postgresql performance for this 
> > kind of needs ?
> >  
> >  My question is a bit fuzzy but any advices are most welcome... 
> > hardware,tuning or design tips as well :))
> >  
> >  Thanks a lot.
> >  
> >  Benjamin.
> >  
> >  
> >
> 
-- 




Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-19 Thread Rod Taylor
On Sun, 2005-03-20 at 00:29 -0400, Alvaro Herrera wrote:
> On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote:
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > > We probably also need multi-table indexes.
> > 
> > As Josh says, that seems antithetical to the main point of partitioning,
> > which is to be able to rapidly remove (and add) partitions of a table.
> > If you have to do index cleaning before you can drop a partition, what's
> > the point of partitioning?
> 
> Hmm.  You are right, but without that we won't be able to enforce
> uniqueness on the partitioned table (we could only enforce it on each
> partition, which would mean we can't partition on anything else than
> primary keys if the tables have one).  IMHO this is something to
> consider.

Could uniqueness across partitions be checked for using a mechanism
similar to what a deferred unique constraint would use (trigger / index
combination)?




Re: [PERFORM] Speeding up select distinct

2005-03-16 Thread Rod Taylor
On Wed, 2005-03-16 at 19:31 +0100, Laurent Martelli wrote:
> >>>>> "Rod" == Rod Taylor <[EMAIL PROTECTED]> writes:
> 
>   Rod> On Wed, 2005-03-16 at 18:58 +0100, Laurent Martelli wrote:
>   >> Consider this query:
>   >> 
>   >> SELECT distinct owner from pictures;
> 
>   Rod> The performance has nothing to do with the number of rows
>   Rod> returned, but rather the complexity of calculations and amount
>   Rod> of data to sift through in order to find it.
> 
> Yes, but I thought that an index might be able to know what distinct
> values there are and help optime that query very much.

The index does know. You just have to visit all of the pages within the
index to find out, which it does, and that's why you dropped 10ms.

But if you want a sub ms query, you're going to have to normalize the
structure.

-- 




Re: [PERFORM] Speeding up select distinct

2005-03-16 Thread Rod Taylor
On Wed, 2005-03-16 at 18:58 +0100, Laurent Martelli wrote:
> Consider this query:
> 
> SELECT distinct owner from pictures; 

The performance has nothing to do with the number of rows returned, but
rather the complexity of calculations and amount of data to sift through
in order to find it.

> Any ideas, apart from more or less manually maintaining a list of
> distinct owners in another table ?

This would be the proper thing to do, along with adding a foreign key
from pictures to the new owner structure for integrity enforcement.
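
A rough sketch of that, assuming owner is a text column (pictures and its
owner column come from the query above; everything else is hypothetical):

CREATE TABLE owners (owner text PRIMARY KEY);
INSERT INTO owners SELECT DISTINCT owner FROM pictures;
ALTER TABLE pictures ADD FOREIGN KEY (owner) REFERENCES owners (owner);

-- the original query then becomes a scan of the small table
SELECT owner FROM owners;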
-- 




Re: [PERFORM] Performance problem on delete from for 10k rows.

2005-03-15 Thread Rod Taylor
>  I get this problem on my dev (Windows/7.4/Cygwin) environment.  But now 
> I see that it also has this problem on my production env.  Yes, I 
> thought it was maybe just a Cygwin/Windows problem .. apparently not :-

Care to try again with logging enabled on the PostgreSQL side within the
development environment?

log_statement = true
log_duration = true
log_connections = on

Then run it via Java and from pgAdminIII and send us the two log
snippets as attachments?

Thanks.
-- 




Re: [PERFORM] Peformance Tuning Opterons/ Hard Disk Layout

2005-02-23 Thread Rod Taylor
On Wed, 2005-02-23 at 15:26 -0300, Bruno Almeida do Lago wrote:
> Is there a real limit for max_connections? Here we've an Oracle server with
> up to 1200 simultaneous conections over it!

If you can reduce them by using something like pgpool between PostgreSQL
and the client, you'll save some headache. PostgreSQL did not perform as
well with a large number of idle connections as it does otherwise (the last
time I tested was 7.4 though -- perhaps it's better now).

The kernel also starts to play a significant role with a high number of
connections. Some operating systems don't perform as well with a high
number of processes (process handling, scheduling, file handles, etc.).

I think you can do it without any technical issues, but you will
probably be happier with the result if you can hide idle connections
from the database machine.
-- 




Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Rod Taylor
> My concern is that this kind of testing has very little relevance to the 
> real world of multiuser processing where contention for the cache becomes an 
> issue.  It may be that, at least in the current situation, postgres is 
> giving too much weight to seq scans based on single user, straight line 

To be fair, a large index scan can easily throw the buffers out of whack
as well. An index scan on 0.1% of a table with 1 billion tuples will
have a similar impact to buffers as a sequential scan of a table with 1
million tuples.

Any solution fixing buffers should probably not take into consideration
the method being performed (do you really want to skip caching a
sequential scan of a 2 tuple table because it didn't use an index) but
the volume of data involved as compared to the size of the cache.

I've often wondered if a single 1GB toasted tuple could wipe out the
buffers. I would suppose that toast doesn't bypass them.
-- 
Rod Taylor <[EMAIL PROTECTED]>




Re: [PERFORM] VACCUM FULL ANALYZE PROBLEM

2005-02-14 Thread Rod Taylor
On Tue, 2005-02-15 at 09:34 +0800, Michael Ryan S. Puncia wrote:
> Hi,
> 
>  
> 
> I have 3 tables in the database with 80G of data; one of them is
> almost 40G and the remaining 2 tables have 20G each.
> 
> We use this database mainly for queries, and updating is done only
> quarterly, and the database performs well. My problem
> 
> is that after updating and then running VACUUM FULL ANALYZE, vacuuming the
> tables takes days to complete. I hope someone

I suspect the VACUUM FULL is the painful part. Try running CLUSTER on
the table or changing a column type (in 8.0) instead.
-- 




Re: [PERFORM] Performance Tuning

2005-02-09 Thread Rod Taylor
On Wed, 2005-02-09 at 15:01 -0500, Chris Kratz wrote:
> Hello All,
> 
> In contrast to what we hear from most others on this list, we find our 
> database servers are mostly CPU bound.  We are wondering if this is because 
> we have postgres configured incorrectly in some way, or if we really need 
> more powerfull processor(s) to gain more performance from postgres.  

Not necessarily. I had a very disk bound system, bought a bunch of
higher end equipment (which focuses on IO) and now have a (faster) but
CPU bound system.

It's just the way the cookie crumbles.

Some things to watch for are large calculations which are easy to move
client side, such as queries that sort for display purposes. Or data
types which aren't really required (using numeric where an integer would
do).

> We continue to tune our individual queries where we can, but it seems we 
> still 
> are waiting on the db a lot in our app.  When we run most queries, top shows 
> the postmaster running at 90%+ constantly during the duration of the request. 
>  

Is this for the duration of a single request or 90% constantly?

If it's a single request, odds are you're going through much more
information than you need to. Lots of aggregate work (max / min) perhaps
or count(*)'s where an approximation would do?

> Our question is simply this, is it better to invest in a faster processor at 
> this point, or are there configuration changes to make it faster?  I've done 

If it's for a single request, you cannot get single processors which are
much faster than what you describe as having.

Want to send us a few EXPLAIN ANALYZE's of your longer running queries?

-- 
Rod Taylor <[EMAIL PROTECTED]>




Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-22 Thread Rod Taylor
On Sat, 2005-01-22 at 12:41 -0600, Bruno Wolff III wrote:
> On Sat, Jan 22, 2005 at 12:13:00 +0900,
>   Tatsuo Ishii <[EMAIL PROTECTED]> wrote:
> > 
> > Probably VACUUM works well for small to medium size tables, but not
> > for huge ones. I'm considering about to implement "on the spot
> > salvaging dead tuples".
> 
> You are probably vacuuming too often. You want to wait until a significant
> fraction of a large table is dead tuples before doing a vacuum. If you are
> scanning a large table and only marking a few tuples as deleted, you aren't
> getting much bang for your buck.

The big problem occurs when you have a small set of hot tuples within a
large table.  In the time it takes to vacuum a table with 200M tuples
one can update a small subset of that table many many times.

Some special purpose vacuum which can target hot spots would be great,
but I've always assumed this would come in the form of table
partitioning and the ability to vacuum different partitions
independently of each-other.

-- 




Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Rod Taylor
On Thu, 2005-01-20 at 15:36 +0100, Hervé Piedvache wrote:
> Le Jeudi 20 Janvier 2005 15:24, Christopher Kings-Lynne a écrit :
> > > Is there any solution with PostgreSQL matching these needs ... ?
> >
> > You want: http://www.slony.info/
> >
> > > Do we have to backport our development to MySQL for this kind of problem
> > > ? Is there any other solution than a Cluster for our problem ?
> >
> > Well, Slony does replication which is basically what you want :)
> >
> > Only master->slave though, so you will need to have all inserts go via
> > the master server, but selects can come off any server.
> 
> Sorry but I don't agree with this ... Slony is a replication solution ... I 
> don't need replication ... what will I do when my database will grow up to 50 
> Gb ... I'll need more than 50 Gb of RAM on each server ???
> This solution is not very realistic for me ...

Slony has some other issues with databases > 200GB in size as well
(well, it hates long running transactions -- and pg_dump is a regular
long running transaction)

However, you don't need RAM one each server for this, you simply need
enough disk space.

Have a Master which takes writes, a "replicator" which you can consider
to be a hot-backup of the master, have N slaves replicate off of the
otherwise untouched "replicator" machine.

For your next trick, have the application send read requests for Clients
A-C to slave 1, D-F to slave 2, ...

You need enough memory to hold the index sections for clients A-C on
slave 1. The rest of the index can remain on disk. It's available should
it be required (D-F box crashed, so your application is now feeding
those read requests to the A-C machine)...

Go to more slaves and smaller segments as you require. Use the absolute
cheapest hardware you can find for the slaves that gives reasonable
performance. They don't need to be reliable, so RAID 0 on IDE drives is
perfectly acceptable.

PostgreSQL can do the replication portion quite nicely. You need to
implement the "cluster" part in the application side.
-- 




Re: [PERFORM] Low Performance for big hospital server ..

2005-01-06 Thread Rod Taylor
On Thu, 2005-01-06 at 12:35 -0500, Dave Cramer wrote:
> Reading can be worse for a normalized db, which is likely what the 
> developers were concerned about.

To a point. Once you have enough data that you start running out of
space in memory then normalization starts to rapidly gain ground again
because it's often smaller in size and won't hit the disk as much.

Moral of the story is don't tune with a smaller database than you expect
to have.

> Frank Wiles wrote:
> 
> >On Thu, 6 Jan 2005 09:06:55 -0800
> >Josh Berkus  wrote:
> >
> >  
> >
> >>I can't tell you how many times I've seen this sort of thing.   And
> >>the developers always tell me "Well, we denormalized for performance
> >>reasons ... "
> >>
> >>
> >
> >  Now that's rich.  I don't think I've ever seen a database perform
> >  worse after it was normalized.  In fact, I can't even think of a
> >  situation where it could! 
> >
> > -
> >   Frank Wiles <[EMAIL PROTECTED]>
> >   http://www.wiles.org
> > -
> >
> >
> 
-- 




Re: [PERFORM] query rewrite using materialized views

2005-01-04 Thread Rod Taylor
On Tue, 2005-01-04 at 13:26 -0600, Wager, Ryan D [NTK] wrote:
> Rod,
>   I do this, PG gets forked many times, it is tough to find the max
> number of times I can do this, but I have a Proc::Queue Manager Perl
> driver that handles all of the copy calls.  I have a quad CPU machine.
> Each COPY only hits one CPU for like 2.1% but anything over about 5
> kicks the load avg up.

Sounds like disk IO is slowing down the copy then.

>   Ill get some explain analysis and table structures out there pronto.
> 
> -Original Message-
> From: Rod Taylor [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, January 04, 2005 1:02 PM
> To: Wager, Ryan D [NTK]
> Cc: Postgresql Performance
> Subject: Re: [PERFORM] query rewrite using materialized views
> 
> >   1)the 250 million records are currently whipped and reinserted as a
> > "daily snapshot" and the fastest way I have found "COPY" to do this
> from
> > a file is no where near fast enough to do this.  SQL*Loader from
> Oracle
> > does some things that I need, ie Direct Path to the db files access
> > (skipping the RDBMS), inherently ignoring indexing rules and saving a
> > ton of time (Dropping the index, COPY'ing 250 million records, then
> > Recreating the index just takes way too long).
> 
> If you have the hardware for it, instead of doing 1 copy, do 1 copy
> command per CPU (until your IO is maxed out anyway) and divide the work
> amongst them. I can push through 100MB/sec using methods like this --
> which makes loading 100GB of data much faster.
> 
> Ditto for indexes. Don't create a single index on one CPU and wait --
> send off one index creation command per CPU.
> 
> >   2)Finding a way to keep this many records in a fashion that can be
> > easily queried.  I even tried breaking it up into almost 2800 separate
> > tables, basically views of the data pre-broken down, if this is a
> > working method it can be done this way, but when I tried it, VACUUM,
> and
> > the COPY's all seemed to slow down extremely.
> 
> Can you send us EXPLAIN ANALYSE output for the slow selects and a little
> insight into what your doing? A basic table structure, and indexes
> involved would be handy. You may change column and table names if you
> like.
> 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Josh
> Berkus
> > Sent: Tuesday, January 04, 2005 12:06 PM
> > To: pgsql-performance@postgresql.org
> > Cc: Yann Michel
> > Subject: Re: [PERFORM] query rewrite using materialized views
> > 
> > Yann,
> > 
> > > are there any plans for rewriting queries to preexisting
> materialized
> > > views?  I mean, rewrite a query (within the optimizer) to use a
> > > materialized view instead of the originating table?
> > 
> > Automatically, and by default, no.   Using the RULES system?  Yes, you
> > can 
> > already do this and the folks on the MattView project on pgFoundry are
> 
> > working to make it easier.
> > 
-- 




Re: [PERFORM] query rewrite using materialized views

2005-01-04 Thread Rod Taylor
>   1)the 250 million records are currently whipped and reinserted as a
> "daily snapshot" and the fastest way I have found "COPY" to do this from
> a file is no where near fast enough to do this.  SQL*Loader from Oracle
> does some things that I need, ie Direct Path to the db files access
> (skipping the RDBMS), inherently ignoring indexing rules and saving a
> ton of time (Dropping the index, COPY'ing 250 million records, then
> Recreating the index just takes way too long).

If you have the hardware for it, instead of doing 1 copy, do 1 copy
command per CPU (until your IO is maxed out anyway) and divide the work
amongst them. I can push through 100MB/sec using methods like this --
which makes loading 100GB of data much faster.

Ditto for indexes. Don't create a single index on one CPU and wait --
send off one index creation command per CPU.
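
A sketch of what that looks like (file, table, column and index names are
hypothetical; each COPY reads a server-side chunk of the pre-split input
and runs in its own session):

-- session 1
COPY big_table FROM '/data/part1.dat';
-- session 2
COPY big_table FROM '/data/part2.dat';
-- ...one per CPU

-- likewise for the indexes afterwards, one per session:
CREATE INDEX big_table_a_idx ON big_table (a);
CREATE INDEX big_table_b_idx ON big_table (b);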

>   2)Finding a way to keep this many records in a fashion that can be
> easily queried.  I even tried breaking it up into almost 2800 separate
> tables, basically views of the data pre-broken down, if this is a
> working method it can be done this way, but when I tried it, VACUUM, and
> the COPY's all seemed to slow down extremely.

Can you send us EXPLAIN ANALYSE output for the slow selects and a little
insight into what your doing? A basic table structure, and indexes
involved would be handy. You may change column and table names if you
like.

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Josh Berkus
> Sent: Tuesday, January 04, 2005 12:06 PM
> To: pgsql-performance@postgresql.org
> Cc: Yann Michel
> Subject: Re: [PERFORM] query rewrite using materialized views
> 
> Yann,
> 
> > are there any plans for rewriting queries to preexisting materialized
> > views?  I mean, rewrite a query (within the optimizer) to use a
> > materialized view instead of the originating table?
> 
> Automatically, and by default, no.   Using the RULES system?  Yes, you
> can 
> already do this and the folks on the MattView project on pgFoundry are 
> working to make it easier.
> 
-- 




[PERFORM] Speeding up pg_dump

2004-12-14 Thread Rod Taylor
Are there any tricks to speeding up pg_dump aside from doing them from a
replicated machine?

I'm using -Fc with no compression.

-- 




Re: [PERFORM] VACUUM ANALYZE downgrades performance

2004-12-02 Thread Rod Taylor
On Thu, 2004-12-02 at 17:07 +0100, Dmitry Karasik wrote:
>   Hi Thomas!
> 
>  Thomas> Look at the ACTUAL TIME.  It dropped from 0.029ms (using the index
>  Thomas> scan) to 0.009ms (using a sequential scan.)
> 
>  Thomas> Index scans are not always faster, and the planner/optimizer knows
>  Thomas> this.  VACUUM ANALYZE is best run when a large proportion of data
>  Thomas> has been updated/loaded or in the off hours to refresh the
>  Thomas> statistics on large datasets.
> 
> While I agree that generally this is true, look how stupid this 
> behavior looks in this particular case: A developer creates a table
> and index, knowing that the table will be large and will be intensively 
> used. An admin runs 'VACUUM ANALYZE' when table is occasionally empty, 
> and next, say, 1 day, until another 'VACUUM ANALYZE' starts, the index 
> is simply not used! Sure you don't suppose to run 'VACUUM ANALYZE' every 
> 5 minutes as a solution, right?

You might want to try this on the next 8.0 beta to come out, or against
CVS. Tom recently applied some changes which should mitigate this
situation.

-- 


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] time to stop tuning?

2004-11-27 Thread Rod Taylor
> On limiting the client side connections: we've been gradually pushing up
> the client-side connection pool and threads, and have seen steady
> improvement in our throughput up to the current barrier we have reached.

Very well. Sometimes more simultaneous workers help; other times they
hinder.

> I've attached the plans for the 4 queries that represent ~35% of our
> load. These are run against the same dataset, but without any other
> load. Another big query basically requires a test to be runnning because

Those aren't likely from your production system as there isn't any data
in those tables and the queries took less than 1ms.

-- 
Rod Taylor <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] time to stop tuning?

2004-11-27 Thread Rod Taylor
On Fri, 2004-11-26 at 12:13 -0500, David Parker wrote:
> 
> I suspect the ultimate answer to our problem will be:
> 
>1) aggressive client-side caching
>2) SQL tuning
>3) more backend hardware

#0 might actually be using connection pooling, cached query plans
(PREPARE), disabling the statistics daemon, etc.
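
For the prepared statements, a minimal sketch (table and statement names
are made up):

  PREPARE get_item (integer) AS
    SELECT * FROM items WHERE item_id = $1;

  EXECUTE get_item(42);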

For the plans, send us EXPLAIN ANALYZE output for each of the common
queries.

If you can, I'd give FreeBSD or a newer Linux a try on your system
instead of Solaris. Older versions of Solaris had not received the same
amount of attention for Intel hardware as the BSDs and Linux have, and I
would imagine (having not tested it recently) that this is still true
for 32-bit Intel.

Another interesting test might be to limit the number of simultaneous
connections to 8 instead of 30 (client side connection retry) after
client side connection pooling via pgpool or similar has been installed.

Please report back with your findings.
-- 
Rod Taylor <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Data type to use for primary key

2004-11-22 Thread Rod Taylor
On Mon, 2004-11-22 at 16:54 -0800, Josh Berkus wrote:
> Alexandre,
> 
> > What is the common approach? Should I use directly the product_code as
> > my ID, or use a sequantial number for speed? (I did the same for the
> > company_id, this is a 'serial' and not the shor name of the customer.
> > I just don't know what is usually done.
> 
> Don't use SERIAL just because it's there.Ideally, you *want* to use the 
> product_code if you can.   It's your natural key and a natural key is always 
> superior to a surrogate key all other things being equal.   

It would be nice if PostgreSQL had some form of transparent surrogate
keying in the background which would automatically run around and
replace your real data with SERIAL integers. It could use a lookup table
for conversions between the surrogate and real values so the user never
knows that it's done, a bit like ENUM. Then we could all use the real
values with no performance issues, because 1) it's an integer in the
background, and 2) a cascade only touches a single tuple in the lookup
table.


-- 


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] Tuning suggestions wanted

2004-11-13 Thread Rod Taylor
I'm looking for suggestions on tuning Solaris 9 for a SunFire 890 (Ultra
IV chips) connected to an Hitachi 9500V running PostgreSQL 7.4.

So that I don't lead people in a direction, I'll hold off for a while
before posting our configuration settings.

Database is approx 160GB in size with a churn of around 4GB per day (2
GB updated,  2GB inserted, very little removed). It's a mixture of OLTP
and reporting.

5% is reports which do trickle writes
95% is short (30 second or less) transactions with about 10 selects, 10
writes (inserts, updates, deletes all mixed in) affecting 150 tuples.

Thanks for any tips -- particularly Solaris kernel tuning.
-- 
Rod Taylor <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Solaris 9 Tuning Tips requested

2004-11-10 Thread Rod Taylor
I'm looking for suggestions on tuning Solaris 9 for a SunFire 890 (Ultra
IV chips) connected to an Hitachi 9500V running PostgreSQL 7.4.

Database is approx 160GB in size with a churn of around 4GB per day (2
GB updated,  2GB inserted, very little removed). It's a mixture of OLTP
and reporting.

5% is reports which do trickle writes 95% is short (30 second or less)
transactions with about 10 selects, 10 writes (inserts, updates, deletes
all mixed in) affecting 150 tuples.

Thanks for any tips -- particularly Solaris kernel tuning or oddities in
Disk IO or configuration settings as they related to Solaris (as they
differ from an Intel).




-- 


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Rod Taylor
> It seems to me that a query saying "SELECT column FROM table WHERE
> column LIKE 'AA%';" should be just as fast or very close to the first
> case up above.  However, explain tells me that this query is not using
> the index above, which is what's not making sense to me.

It looks for an exact expression match against the index definition, and
doesn't work out that one pattern implies the other.

You can provide both clauses.

WHERE column LIKE 'A%' and column LIKE 'AA%';



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Better Hardware, worst Results

2004-11-04 Thread Rod Taylor
On Thu, 2004-11-04 at 17:42, [EMAIL PROTECTED] wrote:
> Citando Rod Taylor <[EMAIL PROTECTED]>:
> > Please send an explain analyze from both.
> I'm sendin three explains. In the first the Dell machine didn't use existing
> indexes, so I turn enable_seqscan off (this is the second explain). The total
> cost decreased, but the total time not. The third explain refers to the cheaper
> (and faster) machine. The last thing is the query itself.

All 3 plans have crappy estimates.

Run ANALYZE in production, then send another explain analyze (as an
attachment please, to avoid linewrap).



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Better Hardware, worst Results

2004-11-04 Thread Rod Taylor
On Thu, 2004-11-04 at 16:06, Alvaro Nunes Melo wrote:
> Hi,
> 
> I have a very tricky situation here. A client bought a Dell dual-machine
> to be used as Database Server, and we have a cheaper machine used in
> development. With identical databases, configuration parameters and
> running the same query, our machine is almost 3x faster.

Please send an explain analyze from both.



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Performance Anomalies in 7.4.5

2004-10-28 Thread Rod Taylor
On Thu, 2004-10-28 at 12:31, Tom Lane wrote:
> Josh Berkus <[EMAIL PROTECTED]> writes:
> >> One drawback to this is that it would require an additional lseek per table
> >> while planning, but that doesn't seem like a huge penalty.
> 
> > Hmmm ... would the additional lseek take longer for larger tables, or would it 
> > be a fixed cost? 
> 
> Should be pretty much a fixed cost: one kernel call per table.

Is this something that the bgwriter could periodically do and share the
data? Possibly in the future it could even force a function or prepared
statement recompile if the data has changed significantly?



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] can't handle large number of INSERT/UPDATEs

2004-10-26 Thread Rod Taylor
On Tue, 2004-10-26 at 13:42, Anjan Dave wrote:
> It probably is locking issue. I got a long list of locks held when we ran select * 
> from pg_locks during a peak time.
> 
> relation | database | transaction |  pid  |   mode   | granted 
> --+--+-+---+--+-
> 17239 |17142 | |  3856 | AccessShareLock  | t

How many have granted = false?
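
Something like this will tell you (pg_locks is the regular system view,
nothing made up here):

  SELECT granted, count(*) FROM pg_locks GROUP BY granted;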

> Vmstat would show a lot of disk IO at the same time.
> 
> Is this pointing towards a disk IO issue?

Not necessarily. Is your IO reaching the limit or is it just heavy?


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] can't handle large number of INSERT/UPDATEs

2004-10-25 Thread Rod Taylor
On Mon, 2004-10-25 at 16:53, Anjan Dave wrote:
> Hi,
> 
>  
> 
> I am dealing with an app here that uses pg to handle a few thousand
> concurrent web users. It seems that under heavy load, the INSERT and
> UPDATE statements to one or two specific tables keep queuing up, to
> the count of 150+ (one table has about 432K rows, other has about
> 2.6Million rows), resulting in 'wait's for other queries, and then

This isn't an index issue, it's a locking issue. Sounds like you have a
bunch of inserts and updates hitting the same rows over and over again.

Eliminate that contention point, and you will have solved your problem.

Feel free to describe the processes involved, and we can help you do
that.



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Insert performance, what should I expect?

2004-10-20 Thread Rod Taylor
On Wed, 2004-10-20 at 12:45, Robert Creager wrote:
> When grilled further on (Tue, 19 Oct 2004 22:12:28 -0400),
> Rod Taylor <[EMAIL PROTECTED]> confessed:
> 
> > > I've done some manual benchmarking running my script 'time script.pl'
> > > I realise my script uses some of the time, bench marking shows that
> > > %50 of the time is spent in dbd:execute.
> > > 
> > 1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI
> > compliant) which has functions similar to putline() that allow COPY to
> > be used.
> 
> COPY can be used with DBD::Pg, per a script I use:
> 
> $dbh->do( "COPY temp_obs_$band ( $col_list ) FROM stdin" );
> $dbh->func( join ( "\t", @data ) . "\n", 'putline' );
> $dbh->func( "\\.\n", 'putline' );
> $dbh->func( 'endcopy' );

Thanks for that. All of the conversations I've seen on the subject
stated that DBD::Pg only supported standard DB features -- copy not
amongst them.

> With sets of data from 1000 to 8000 records, my COPY performance is consistent
> at ~1 records per second.

Well done.



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Insert performance, what should I expect?

2004-10-19 Thread Rod Taylor
> I've done some manual benchmarking running my script 'time script.pl'
> I realise my script uses some of the time, bench marking shows that
> %50 of the time is spent in dbd:execute.

The perl drivers don't currently use database level prepared statements
which would give a small boost.

But your best bet is to switch to using COPY instead of INSERT. Two ways
to do this.

1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI
compliant) which has functions similar to putline() that allow COPY to
be used.

2) Have your perl script output a .sql file with the data prepared (COPY
statements) which you feed into the database via psql.

You can probably achieve a 50% increase in throughput.
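
For option 2, the generated file only needs to look roughly like this
(table and column names are made up; the data columns are tab
separated), and can then be fed in with psql -f:

  COPY mytable (id, val) FROM stdin;
  1	first value
  2	second value
  \.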


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Vacuum takes a really long time, vacuum full required

2004-10-19 Thread Rod Taylor
> Whatever the case, the database still slows down to a halt after a month or
> so, and I have to go in and shut everything down and do a VACUUM FULL by
> hand.  One index (of many many) takes 2000 seconds to vacuum.  The whole
> process takes a few hours.

Do a REINDEX on that table instead, and run regular vacuums more frequently.

> $ pg_config --version
> PostgreSQL 7.3.2

7.4.x deals with index growth a little better than 7.3 and older did.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: IBM P-series machines (was: [PERFORM] Excessive context

2004-10-11 Thread Rod Taylor
On Mon, 2004-10-11 at 13:38, Andrew Sullivan wrote:
> On Tue, Oct 05, 2004 at 09:47:36AM -0700, Josh Berkus wrote:
> > As long as you're on x86, scaling outward is the way to go.   If you want to 
> > continue to scale upwards, ask Andrew Sullivan about his experiences running 
> > PostgreSQL on big IBM boxes.   But if you consider an quad-Opteron server 
> > expensive, I don't think that's an option for you.


> The 650s are not cheap, but boy are they fast.  I don't have any
> numbers I can share, but I can tell you that we recently had a few
> days in which our write load was as large as the entire write load
> for last year, and you couldn't tell.  It is too early for us to say
> whether the P series lives up to its billing in terms of relibility:
> the real reason we use these machines is reliability, so if
> approaching 100% uptime isn't important to you, the speed may not be
> worth it.

Agreed completely, and the 570 knocks the 650 out of the water -- nearly
double the performance for math heavy queries. Beware vendor support for
Linux on these things though -- we ran into many of the same issues with
vendor support on the IBM machines as we did with the Opterons.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Caching of Queries

2004-09-27 Thread Rod Taylor
> More to the point though, I think this is a feature that really really 
> should be in the DB, because then it's trivial for people to use.  

How does putting it into PGPool make it any less trivial for people to
use?


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Getting rid of nested loop

2004-09-24 Thread Rod Taylor
I set enable_nestloop = off, which is why I have the high cost.

@ is a PostGIS operator between 2 geometries (both polygons). It's the @
operator which is expensive. Is there a way to force a cheaper way of
doing that join?

 ->  Nested Loop  (cost=11905.94..11906.08 rows=1
width=68) (actual time=1739.368..17047.422 rows=100 loops=1)
   Join Filter: ((COALESCE("outer".geom, "outer".geom) @
COALESCE("inner".geom, "inner".geom)) AND ("outer".region_id <>
"inner".region_id))



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] NAS, SAN or any alternate solution ?

2004-09-22 Thread Rod Taylor
> Rod Taylor wrote:
> | I've used both a NetApp and Hitachi based SANs with PostgreSQL. Both
> | work as well as expected, but do require some tweeking as they normally
> | are not optimized for the datablock size that PostgreSQL likes to deal
> | with (8k by default) -- this can make as much as a 50% difference in
> | performance levels.

> I'm also not entirely sure how to make the datablocks line up with the
> filesystem blocks. Any suggestions on this would be greatly appreciated.

We just played with Veritas settings while running pg_bench on a 200GB
database. I no longer have access to the NetApp, but the settings for
the Hitachi are below.

In tunefstab we have:

read_pref_io=8192,read_nstream=4,write_pref_io=8192,write_nstream=2

In fstab it's:
defaults,mincache=tmpcache,noatime


If you have better settings, please shoot them over so we can try them
out. Perhaps even get someone over there to write a new SAN section in
the Tuning Chapter.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Large # of rows in query extremely slow, not using

2004-09-13 Thread Rod Taylor
On Mon, 2004-09-13 at 20:51, Stephen Crowley wrote:
> Does postgres cache the entire result set before it begins returning
> data to the client?

Sometimes you need to be careful as to how the clients treat the data. 

For example, psql will size its column widths based on the length of the
data returned.

PHP and Perl will retrieve and cache all of the rows if you request a
row count ($sth->rows() or pg_num_rows($rset))


You may find that using a cursor will help you out.
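
A rough sketch, using the table from your message below (the cursor name
is arbitrary):

  BEGIN;
  DECLARE hist_cur CURSOR FOR
    SELECT * FROM island_history
     WHERE date = '2004-09-07' AND stock = 'QQQ';
  FETCH 1000 FROM hist_cur;
  -- keep issuing FETCH 1000 until no rows come back
  CLOSE hist_cur;
  COMMIT;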

> I have a table with ~8 million rows and I am executing a query which
> should return about ~800,000 rows. The problem is that as soon as I
> execute the query it absolutely kills my machine and begins swapping
> for 5 or 6 minutes before it begins returning results. Is postgres
> trying to load the whole query into memory before returning anything?
> Also, why would it choose not to use the index? It is properly
> estimating the # of rows returned. If I set enable_seqscan to off it
> is just as slow.
> 
> Running postgres 8.0 beta2 dev2
> 
> explain select * from island_history where date='2004-09-07' and stock='QQQ';
> QUERY PLAN
> ---
>  Seq Scan on island_history  (cost=0.00..266711.23 rows=896150 width=83)
>Filter: ((date = '2004-09-07'::date) AND ((stock)::text = 'QQQ'::text))
> (2 rows)
> 
> Any help would be appreciated
> 
> --Stephen
> 
>  Table "public.island_history"
>   Column  |  Type  | Modifiers
> --++---
>  date | date   | not null
>  stock| character varying(6)   |
>  time | time without time zone | not null
>  reference_number | numeric(9,0)   | not null
>  message_type | character(1)   | not null
>  buy_sell_ind | character(1)   |
>  shares   | numeric(6,0)   |
>  remaining_shares | numeric(6,0)   |
>  price| numeric(10,4)  |
>  display  | character(1)   |
>  match_number | numeric(9,0)   | not null
> Indexes:
> "island_history_pkey" PRIMARY KEY, btree (date, reference_number,
> message_type, "time", match_number)
> "island_history_date_stock_time" btree (date, stock, "time")
> "island_history_oid" btree (oid)
> 
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend
-- 
Rod Taylor 

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc




Re: [PERFORM] Why does a simple query not use an obvious index?

2004-08-29 Thread Rod Taylor
> People expect count(*) _without a where clause_ to be cached in a single
> global variable. Postgres can't do this, but the reason has everything to do

Someone should write an approx_count('table') function that reads
reltuples from pg_class and tell them to use it in combination with
autovac.

I've yet to see someone use count(*) across a table and not round the
result themselves (approx 9 million clients).
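
A minimal sketch of such a function (the name and example table are made
up, and the number is only as fresh as the last ANALYZE or VACUUM):

  CREATE FUNCTION approx_count(text) RETURNS real AS '
    SELECT reltuples FROM pg_class WHERE relname = $1;
  ' LANGUAGE SQL STABLE;

  SELECT approx_count('mytable');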



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] using an index worst performances

2004-08-20 Thread Rod Taylor
On Fri, 2004-08-20 at 05:37, Gaetano Mendola wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Christopher Kings-Lynne wrote:
> 
> |>>> Without index: 1.140 ms
> |>>> With index: 1.400 ms
> |>>> With default_statistic_targer = 200:   1.800 ms
> |>>
> |>>
> |>>
> |>>
> |>> Can I just check that 1.800ms means 1.8 secs (You're using . as the
> |>> thousands separator)?
> |>>
> |>> If it means 1.8ms then frankly the times are too short to mean
> |>> anything without running them 100 times and averaging.
> |>
> |>
> |>
> |>
> |> It mean 1.8 ms and that execution time is sticky to that value even
> |> with 1000 times.
> |
> |
> | Given the almost irrelvant difference in the speed of those queries, I'd
> | say that with the stats so high, postgres simply takes longer to check
> | the statistics to come to the same conclusion.  ie. it has to loop over
> | 200 rows instead of just 10.
> 
> The time increase seems too much.

We can test this.

What are the times without the index, with the index and with the higher
statistics value when using a prepared query?



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Help specifying new machine

2004-08-18 Thread Rod Taylor
On Wed, 2004-08-18 at 11:18, Raoul Buzziol wrote:
> > You're not going to be able to get a Dual Athlon MP for the same price
> > as a single Xeon. A few years back, this was the case because Xeon CPUs
> > & MBs had a huge premium over Athlon. This is no longer true mainly
> > because the number of people carrying Athlon MP motherboards has dropped
> > down drastically. Go to pricewatch.com and do a search for 760MPX -- you
> > get a mere 8 entries. Not surprisingly because who would not want to
> > spend a few pennies more for a much superior Dual Opteron? The few
> > sellers you see now just keep stuff in inventory for people who need
> > replacement parts for emergencies and are willing to pay up the nose
> > because it is an emergency.
> 
> I saw pricewatch.com and you're right. 
> 
> I looked for some benchmarks, and I would know if I'm right on:
> - Dual Opteron 246 have aproximately the same performance of a Dual Xeon 3Gh
> (Opteron a little better)
> - Opteron system equal or cheeper than Xeon system.

For PostgreSQL, Opteron might be a touch worse than Xeon for single
processor, little better for Dual, and a whole heck of a bunch better
for Quads -- but this depends on your specific work load as memory
bound, cpu bound, lots of float math, etc. work loads will perform
differently.

In general, an Opteron is a better bet simply because you can shove more
ram onto it (without workarounds), and you can't beat an extra 8GB ram
on an IO bound database (consider your datasize in 1 year).



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] insert

2004-08-13 Thread Rod Taylor
On Fri, 2004-08-13 at 08:10, Ulrich Wisser wrote:
> Hi,
> 
> my inserts are done in one transaction, but due to some foreign key 
> constraints and five indexes sometimes the 100 inserts will take more 
> than 5 minutes.

It is likely that you are missing an index on one of those foreign key'd
items.

Do an EXPLAIN ANALYZE SELECT * FROM foreign_table WHERE foreign_col =
'';

Fix them until they're quick.
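
If any of those come back as a sequential scan over a large table, add
the missing index -- along the lines of (names made up):

  CREATE INDEX orders_customer_id_idx ON orders (customer_id);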



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Hardware upgrade for a high-traffic database

2004-08-11 Thread Rod Taylor
On Wed, 2004-08-11 at 18:03, Jason Coene wrote:
> > -Original Message-
> > From: Rod Taylor [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, August 11, 2004 5:46 PM
> > To: Jason Coene
> > Cc: 'Merlin Moncure'; Postgresql Performance
> > Subject: Re: [PERFORM] Hardware upgrade for a high-traffic database
> > 
> > > I'm wondering why our PG server is using so little memory...  The system
> > has
> > > 2GB of memory, though only around 200MB of it are used.  Is there a PG
> > 
> > This is the second time you've said this. Surely you're not implying
> > there is 1.8GB Free Memory -- rather than 1.8GB in Buffers or Cache.
> 
> Hi Rod,
> 
> I was looking at top and vmstat - which always show under 300MB "Active".
> We may hit 400MB at peak.  Everything I see (though this isn't my area of
> expertise) points to most of the memory simply being unused.  Results below,
> am I missing something?

This looks fine. The memory is not unused (only 5MB is actually empty)
but is being used for disk cache.

Active is memory used by programs and would need to be swapped if this
space was needed.

Inactive is memory that is generally dirty. Disk cache is often here. In
your case, you likely write to the same pages you're reading from --
which is why this number is so big. It also explains why a checkpoint is
a killer; a large chunk of this memory set needs to be pushed to disk.

Cache is memory used generally for disk cache that is not dirty. It's
been read from the disk and could be cleared immediately if necessary.

Wired is memory that cannot be swapped. In your case, Shared Memory is
probably Wired (this is good). There is another sysctl to check and set
whether it is wired or swappable.
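
If I remember the name correctly, the knob is shm_use_phys -- worth
double-checking against your FreeBSD release:

  sysctl kern.ipc.shm_use_phys=1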



Interesting (if dry) read:
http://www.freebsd.org/doc/en_US.ISO8859-1/articles/vm-design/index.html



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Hardware upgrade for a high-traffic database

2004-08-11 Thread Rod Taylor
On Wed, 2004-08-11 at 17:31, Brian Hirt wrote:
> On Aug 11, 2004, at 3:18 PM, Jason Coene wrote:
> >
> > I'm wondering why our PG server is using so little memory...  The 
> > system has
> > 2GB of memory, though only around 200MB of it are used.  Is there a PG
> > setting to force more memory usage towards the cache?  Additionally, 
> > we use
> > FreeBSD.  I've heard that Linux may manage that memory better, any 
> > truth
> > there?  Sorry if I'm grabbing at straws here :)
> >
> 
> i don't know about freebsd, but linux is very aggressive about using 
> unused memory for disk cache.  we have dedicated linux box running pg 

Aggressive indeed.. I'm stuck with the version that has a tendency to
swap out active processes rather than abandon disk cache -- it gets very
annoying!



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Hardware upgrade for a high-traffic database

2004-08-11 Thread Rod Taylor
> I'm wondering why our PG server is using so little memory...  The system has
> 2GB of memory, though only around 200MB of it are used.  Is there a PG

This is the second time you've said this. Surely you're not implying
there is 1.8GB Free Memory -- rather than 1.8GB in Buffers or Cache.

Send output of the below:

sysctl vm

sysctl -a | grep buffers

top | grep -E "(Mem|Swap):"



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware upgrade for a high-traffic database

2004-08-10 Thread Rod Taylor
> Our database is about 20GB on disk, we have some quite large tables - 2M
> rows with TEXT fields in a sample table, accessed constantly.  We average
> about 4,000 - 5,000 queries per second - all from web traffic.  As you can

99% is reads? and probably the same data over and over again? You might
want to think about a small code change to cache sections of page output
in memory for the most commonly generated pages (there are usually 3 or
4 that account for 25% to 50% of web traffic -- starting pages).

The fact you're getting 5k queries/second off IDE drives tells me most
of the active data is in memory -- so your actual working data set is
probably quite small (less than 10% of the 20GB).


If the above is all true (mostly reads, smallish dataset, etc.) and the
database is not growing very quickly, you might want to look into RAM
and RAM bandwidth over disk. An Opteron with 8GB ram using the same old
IDE drives. Get a mobo with a SCSI raid controller in it, so the disk
component can be upgraded in the future (when necessary).



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Tuning queries on large database

2004-08-04 Thread Rod Taylor
On Wed, 2004-08-04 at 08:44, Valerie Schneider DSI/DEV wrote:
> Hi,
> 
> I have some problem of performance on a PG database, and I don't
> know how to improve. I Have two questions : one about the storage
> of data, one about tuning queries. If possible !
> 
> My job is to compare Oracle and Postgres. All our operational databases
> have been running under Oracle for about fifteen years. Now I try to replace
> Oracle by Postgres.

You may assume some additional hardware may be required -- this would be
purchased out of the Oracle License budget :)

> My first remark is that the table takes a lot of place on disk, about
> 70 Gb, instead of 35 Gb with oracle.
> 125 000 000 rows x 256 b = about 32 Gb. This calculation gives an idea
> not so bad for oracle. What about for PG ? How data is stored ?

This is due to the datatype you've selected. PostgreSQL does not convert
NUMERIC into a more appropriate integer format behind the scenes, nor
will it use the faster integer math routines even when the stored values
are whole numbers. It currently assumes that if you've asked for numeric
rather than integer or float, you are dealing with either very large
numbers or require high-precision math.

Changing most of your columns to integer + Check constraint (where
necessary) will give you a large speed boost and reduce disk
requirements a little.
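
For example, where the schema currently uses NUMERIC columns for what
are really whole numbers, something like this (table and column names
are made up) keeps the range check without paying the NUMERIC overhead:

  CREATE TABLE obs (
    obs_id       integer PRIMARY KEY,
    temperature  integer CHECK (temperature BETWEEN -9999 AND 9999),
    pressure     integer CHECK (pressure >= 0)
  );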

> The different queries of the bench are "simple" queries (no join,
> sub-query, ...) and are using indexes (I "explained" each one to
> be sure) :

Care to send us the EXPLAIN ANALYZE output for each of the 4 queries
after you've improved the datatype selection?

-- 
Rod Taylor 

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc




Re: [PERFORM] Page Miss Hits

2004-08-02 Thread Rod Taylor
> | ARC still helps, since it makes sure the shared_buffers don't all get
> | flushed from the useful small datasets when a seq scan gets executed.
> 
> I'm still not convinced. Why the last backend alive, have to throw away
> bunch of memory copied in the SHM? And again, the ARC is a replacement
> policy for a cache, which one ?

As you know, ARC is a recent addition. I've not seen any benchmarks
demonstrating that the optimal SHARED_BUFFERS setting is different today
than it was in the past.

We know it's changed, but the old buffer strategy had an equally hard
time with a small buffer as it did a large one. Does that mean the
middle of the curve is still at 15k buffers but the extremes are handled
better? Or something completely different?

Please feel free to benchmark 7.5 (OSDL folks should be able to help us
as well) and report back.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] No index usage with "left join"

2004-08-02 Thread Rod Taylor
> SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid =
> companies.intfield01
> 
> contacts.sid (type text, b-tree index on it)
> companies.intfield01 (type bigint, b-tree index on it)

> How can I force the usage of the indexes when using "left join". Or
> any other SQL construct that does the same !? Can anybody please give
> us a hint !?

You really don't need to use indexes since you're fetching all
information from both tables.

Anyway, we can be fairly sure this isn't PostgreSQL 7.4 (which would
likely choose a far better plan -- hash join rather than nested loop) as
it won't join a bigint to a text field without a cast.

Try this:
set enable_nestloop = false;
SELECT count(*) FROM contacts LEFT JOIN companies ON
cast(contacts.sid as bigint) = companies.intfield01;
set enable_nestloop = true;



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] best way to fetch next/prev record based on index

2004-07-27 Thread Rod Taylor
You only want one record to be returned? Tack a LIMIT 1 onto the end of 
the query.
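
If "next" means the next row in (col1, col2) order after a known pair of
values, something like this generally does it (names and values are made
up):

  SELECT *
    FROM t
   WHERE col1 > 'abc'
      OR (col1 = 'abc' AND col2 > 42)
   ORDER BY col1, col2
   LIMIT 1;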

> My problem is deceptively simple: how you read the next record from a
> table based on a given set of values?  In practice, this is difficult to
> implement.  If anybody can suggest a alternative/better way to this, I'm
> all ears.



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Performance over a LAN

2004-07-23 Thread Rod Taylor
On Fri, 2004-07-23 at 01:50, William Carney wrote:
> Hello,
> 
> Using a test client application that performs 10 insert operations on a
> table, with the client application running on the same machine as the
> Postgres server, I get the following results for the time taken to run the
> test:
> 
> Unix domain socket connection: 26 seconds
> Inet domain socket ('localhost'): 35 seconds

> The machines used are P4s running FreeBSD 5.2.1. The Postgres version is
> 7.4.3. Can anyone tell me why there's such a big difference?

Domain sockets have significantly less work to do than inet sockets, as
well as fewer delays for the transmission itself.



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Odd sorting behaviour

2004-07-21 Thread Rod Taylor
On Wed, 2004-07-21 at 06:04, Steinar H. Gunderson wrote:
> On Tue, Jul 20, 2004 at 10:18:19PM -0400, Rod Taylor wrote:
> > I've taken a look and managed to cut out quite a bit of used time.
> > You'll need to confirm it's the same results though (I didn't -- it is
> > the same number of results (query below)
> 
> It looks very much like the same results.

Oh.. On my (slow) laptop it cut the time back significantly..

> As my server is 7.2 and not 7.4, that obviously won't help much :-) Thanks
> anyway, though -- we'll upgrade eventually, and it'll help then. 

I see. Yeah, avoid NOT IN like the plague on 7.2.



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Odd sorting behaviour

2004-07-20 Thread Rod Taylor
> I could of course post the updated query plan if anybody is interested; let
> me know. (The data is still available if anybody needs it as well, of
> course.)

I've taken a look and managed to cut out quite a bit of the time used.
You'll need to confirm the results are the same though -- I didn't check
row by row, only that it returns the same number of results (query below).

First off, "DROP INDEX prodid_index;". It doesn't help anything since
the primary key is just as usable, but it does take enough space that it
causes thrashing in the buffer_cache. Any queries based on prodid will
use the index for the PRIMARY KEY instead.

Secondly, I had no luck getting the hashjoin but this probably doesn't
matter. I've assumed that the number of users will climb faster than the
product set offered, and generated additional data via the below command
run 4 times:

INSERT INTO opinions SELECT prodid, uid + (SELECT max(uid) FROM
opinions), opinion FROM opinions;

I found that by this point, the hashjoin and mergejoin have essentially
the same performance -- in other words, as you grow you'll want the
mergejoin eventually, so I wouldn't worry about it too much.


New Query cuts about 1/3rd the time, forcing hashjoin gets another 1/3rd
but see the above note:

  SELECT o3.prodid
   , SUM(o3.opinion*o12.correlation) AS total_correlation
FROM opinions o3

 -- Plain join okay since o12.correlation <> 0
 -- eliminates any NULLs anyway.
 -- Was RIGHT JOIN
JOIN (SELECT o2.uid
   , SUM(o1.opinion*o2.opinion)/SQRT(count(*)::numeric)
 AS correlation
FROM opinions AS o1
JOIN opinions AS o2 USING (prodid)
   WHERE o1.uid = 1355
GROUP BY o2.uid
 ) AS o12 USING (uid)

 -- Was old Left join
   WHERE o3.prodid NOT IN (SELECT prodid
 FROM opinions AS o4
WHERE uid = 1355)
 AND o3.opinion <> 0 
 AND o12.correlation <> 0
GROUP BY o3.prodid
ORDER BY total_correlation desc;



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] NAS, SAN or any alternate solution ?

2004-07-20 Thread Rod Taylor
> Would NAS or SAN be good solutions ? (I've read that NAS uses NFS
> which could slow down the transfer rate ??)
> Has anyone ever tried one of these with postgresql ? 

I've used both a NetApp and Hitachi based SANs with PostgreSQL. Both
work as well as expected, but do require some tweaking as they normally
are not optimized for the datablock size that PostgreSQL likes to deal
with (8k by default) -- this can make as much as a 50% difference in
performance levels.

For a NAS setup, be VERY careful that the NFS implementation you're
using has the semantics that the database requires (do plenty of failure
testing -- pull plugs and things at random). iSCSI looks more promising,
but I've not tested how gracefully it fails.

Have your supplier run a bunch of benchmarks for random IO with 8k
blocks.

One side note: SANs seem to be very good at scaling across multiple jobs
from multiple sources, but beware your Fibre Channel driver -- mine
seems to spend quite a bit of time managing interrupts and I've not
found a way to put it into a polling mode (I'm not a Linux person and
that trick usually happens for me on the BSDs).



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] extrem bad performance

2004-07-16 Thread Rod Taylor
> The database grows very slowly. The main load comes from SELECT's and 
> not from INSERT's or UPDATE's, but the performance gets slower day by day...
> 
> I have no idea where to search for the speed break!

Let's start with an example. Please send us an EXPLAIN ANALYZE of a
couple of the poorly performing queries.



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] query plan wierdness?

2004-07-12 Thread Rod Taylor
> Oddly enough, I put the same database on a different machine, and the
> query now behaves as I hoped all along.  Notice that I'm using the
> "real" query, with the aspid in asc and the other fields in desc order,
> yet the query does use the call_idx13 index:

Notice that while it only takes 19 seconds to pull the data out of the
table, it is spending 30 seconds sorting it -- so the index scan isn't
buying you very much.

Try it again with ORDER BY aspid DESC and you should get the query down
to 20 seconds in total on that Sparc; so I wouldn't call it exactly what
you wanted.

The decision about whether to use an index is borderline, and as you've
shown, both plans take approximately the same amount of time. Use of an
index will not necessarily be faster than a sequential scan -- but the
penalty for accidentally selecting one when it shouldn't have been used
is much higher.

> > Any chance you could put together a test case demonstrating the above
> > behaviour? Everything from CREATE TABLE, through dataload to the
> EXPLAIN
> > ANALYZE.
> 
> 
> Forgive me for being thick: what exactly would be involved?  Due to
> HIPAA regulations, I cannot "expose" any of the data.

Of course. But that doesn't mean you couldn't create a table with a
different name and muck around with the values. In any case, you're
getting what you want now, so it isn't a problem anymore.

> 
> I hesitated to bring this up because I wanted to focus on the technical
> issues rather than have this degenerate into a religious war.  The chief
> developer in charge of the project brought this query to my attention.
> He has a fair amount of political sway in the company, and is now
> lobbying to switch to MySQL because he maintains that PostgreSQL is
> broken and/or too slow for our needs.  He has apparently benchmarked the
> same query using MySQL and gotten much more favorable results (I have
> been unable to corroborate this yet).
> 

I wouldn't be surprised if MySQL did run this single query faster with
nothing else going on during that time. MySQL was designed primarily
with a single user in mind, but it is unlikely this will be your
production situation so the benchmark is next to useless.

Connect 50 clients to the databases running this (and a mixture of other
selects) while another 20 clients are firing off updates, inserts,
deletes on these and other structures -- or whatever matches your full
production load.

This is what PostgreSQL (and a number of other DBs) are designed for,
typical production loads.



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Working on huge RAM based datasets

2004-07-12 Thread Rod Taylor
> What would be most interesting to see is whether this makes it wise to
> increase shared buffer size.  It may be more effective to bump down
> the cache a little, and bump up sort memory; hard to tell.

How do we go about scheduling tests with the OSDL folks? If they could
do 10 runs with buffers between 1k and 500k it would help us get a broad
view of the situation.



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] query plan wierdness?

2004-07-09 Thread Rod Taylor
> OK, that makes sense; however, this doesn't:
> 
> elon2=# explain analyse select * from call where aspid='123C' and
> OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
> 23:59:59.999' order by aspid asc, openeddatetime asc, callstatus asc,
> calltype asc, callkey asc;

> I've modified the "order by" to reflect the call_idx13 index, yet the
> query still causes a sequence scan of the table.

The query shown above does not have a limit, where the original one had
LIMIT 26. PostgreSQL has determined that pulling out all the table rows
and sorting them is cheaper than scanning the index and then randomly
pulling out all of the table rows.

Normally, that would be well on the mark: you can sort a large number of
tuples for the price of a single random disk seek. But this is not true
for you.

Considering you're pulling out 450k rows in 8 seconds, I'd also guess
the data is mostly in memory. Is that normal? Or is this a result of
having run several test queries against the same data multiple times?

If it's normal, bump your effective_cache_size parameter higher to move
the sort vs. scan threshold.

> Again, that makes sense to me, but if I remove aspid from the query it
> still ignores the index

You've changed 2 variables. You removed the aspid AND removed the LIMIT.
Add back the limit of 26 like you originally showed, and it'll do what I
described.

> Setting enable_seqscan=off still doesn't cause the desired index to be
> selected:
> 
> elon2=# explain analyse select * from call where aspid='123C' and
> OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
> 23:59:59.999' order by aspid desc, openeddatetime desc, callstatus desc,
> calltype desc, callkey desc;
>  
> QUERY PLAN
> 
> 
> 
> -
>  Sort  (cost=355314.41..356482.87 rows=467384 width=295) (actual
> time=33382.92..34088.10 rows=461973 loops=1)
>Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
>->  Index Scan using call_aspid on call  (cost=0.00..43430.25
> rows=467384 width=295) (actual time=0.24..7915.21 rows=461973 loops=1)
>  Index Cond: (aspid = '123C'::bpchar)
>  Filter: ((openeddatetime >= '2000-01-01 00:00:00-07'::timestamp
> with time zone) AND (openeddatetime <= '2004-06-24
> 23:59:59.999-07'::timestamp with time zone))
>  Total runtime: 39196.39 msec

I'm a little surprised at this. It should have done a reverse index scan
and skipped the sort step. In fact, with a very simple select, I get
this:

rbt=# \d t
  Table "public.t"
 Column |  Type  | Modifiers
++---
 col1   | bpchar |
 col2   | timestamp(0) without time zone |
 col3   | integer|
 col4   | integer|
 col5   | integer|
Indexes:
"t_idx" btree (col1, col2, col3, col4, col5)

rbt=# set enable_seqscan = false;
SET
rbt=# explain analyze select * from t order by col1 desc, col2 desc,
col3 desc, col4 desc, col5 desc;
 QUERY PLAN 
-
 Index Scan Backward using t_idx on t  (cost=0.00..6.20 rows=18
width=52) (actual time=0.046..0.219 rows=18 loops=1)
 Total runtime: 1.813 ms
(2 rows)

Any chance you could put together a test case demonstrating the above
behaviour? Everything from CREATE TABLE, through dataload to the EXPLAIN
ANALYZE.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] vacuum_mem

2004-07-08 Thread Rod Taylor
> It seems vacuum_mem does not have performance 
> effect at all.

Wrong conclusion. It implies that your test case takes less than 64MB of
memory to track your removed tuples. I think it takes 8 bytes to track a
tuple for vacuuming an index, which means 64MB should be able to track
roughly 8 million deletions. Since your demonstration removed fewer
tuples than that, it's under the limit.

Try your test again with 32MB; it should make a single sequential pass
on the table, and 2 passes on each index for that table.

Either that, or do a few more aborted updates.
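
To change it for a single session (vacuum_mem on 7.4 and earlier; 8.0
renames it to maintenance_work_mem), the value is in kilobytes:

  SET vacuum_mem = 32768;    -- 32MB
  VACUUM ANALYZE yourtable;  -- table name made up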



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] query plan wierdness?

2004-07-08 Thread Rod Taylor

> However, this query performs a sequence scan on the table, ignoring the
> call_idx13 index (the only difference is the addition of the aspid field
> in the order by clause):

You do not have an index which matches the ORDER BY, so PostgreSQL
cannot simply scan the index for the data you want. Thus it needs to
find all matching rows, order them, etc.

> 23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
> calltype desc, callkey desc limit 26;

aspid ASC, openeddatetime DESC, callstatus DESC, calltype DESC, callkey DESC

>  call_idx13 btree (aspid, openeddatetime, callstatus, calltype,
> callkey),

This index is: aspid ASC, openeddatetime ASC, callstatus ASC, calltype
ASC, callkey ASC

A reverse scan would of course be DESC, DESC, DESC, DESC, DESC --
neither of which matches your requested ORDER BY, and thus cannot help
reduce the rows looked at to 26.

This leaves your WHERE clause to restrict the dataset, and it doesn't do
a very good job of it. There are more than 450,000 rows matching the
WHERE clause, which means the sequential scan was probably the right
choice (unless you have over 10 million entries in the table).


Since your WHERE clause contains a single aspid, an improvement to the
PostgreSQL optimizer may be to ignore that field in the ORDER BY as
order is no longer important since there is only one possible value. If
it did ignore aspid, it would use a plan similar to the first one you
provided.

You can accomplish the same thing by leaving out aspid ASC OR by setting
it to aspid DESC in the ORDER BY. Leaving it out entirely will be
slightly faster, but DESC will cause PostgreSQL to use index
"call_idx13".



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Query performance

2004-06-30 Thread Rod Taylor
> Can I get any better performance?

You can try bumping your sort memory way up (for this query only).

Another method would be to cluster the table by the symbol column
(eliminates the expensive sort).
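
For example (index and table names are made up; the setting is sort_mem
on 7.4 and becomes work_mem in later releases):

  SET sort_mem = 65536;   -- 64MB for this session only
  CLUSTER ticks_symbol_idx ON ticks;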

If you could run a very simple calculation against open & close numbers
to eliminate a majority of symbols early, that would be useful as well.



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] postgresql and openmosix migration

2004-06-23 Thread Rod Taylor
> 2) You can hire a PG database expert.This will be much faster, but cost 
> you a lot of money.

I wouldn't exactly say "a lot of money". Lots of consulters out there
are willing to put in a weeks worth of effort, on site, for
significantly less than a support contract with most commercial DB
organizations (including MySQL) -- and often give better results since
they're on-site rather than over phone or via email.

But yes, doing it via this mailing list is probably the cheapest option.



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] pg_fetch_array

2004-06-20 Thread Rod Taylor
> Does php need to read database everytime when pg_fetch_array is executed in
> the while loop or all the rows have been in the memory after pg_query?

You may need to ask the php people about this one. The PostgreSQL
protocol would allow data to continue streaming in at the same time as
you are processing other rows (asynchronous retrieval). So, optionally
they may fetch and cache all rows in local memory at pg_query OR grab
them in sets of 1000 rows and cache that (fetching the next set when the
first set runs out) OR grab one row for each fetch.

You could run a simple select that will fetch 100M rows from a table
with no WHERE clause. See how quickly the first row comes in, and how
much memory is used by the process.

I suspect they fetch all of the rows at pg_query execution. Otherwise
they wouldn't know how to respond to a pg_num_rows() call.


On a side note, that is a rather unique email address.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] in-transaction insert performance in 7.5devel

2004-06-11 Thread Rod Taylor
> As I understand it, sync() is never called anymore.  mdsync() hits the
> all the files 1 by 1 with an fsync.  My understanding of the commit
> process is that 30 tps is quite reasonable for my hardware.  

Sorry. I didn't see the version in the subject and assumed 7.4 on a
Linux machine with excessive journaling enabled.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] in-transaction insert performance in 7.5devel

2004-06-11 Thread Rod Taylor
On Fri, 2004-06-11 at 14:40, Merlin Moncure wrote:
> I am batch inserting insert statements into a database with fsync = on.
> My single disk system is on a 10k drive...even though I am inside a
> transaction there is at least 1 file sync per row insert.  

Which filesystem?

PostgreSQL isn't issuing the sync except at commit of a transaction, but
some filesystems do wacky things if you ask them to.



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Index oddity

2004-06-09 Thread Rod Taylor
On Wed, 2004-06-09 at 21:45, Christopher Kings-Lynne wrote:
> > If I take away the diagonalSize condition in my query I find that there
> > are 225 rows that satisfy the other conditions.  155 of these have a

> Maybe you should drop your random_page_cost to something less than 4, 
> eg. 3 or even 2...

The big problem is a very poor estimate (off by a couple orders of
magnitude). I was hoping someone with more knowledge in fixing those
would jump in.



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Index oddity

2004-06-09 Thread Rod Taylor
> ... and here is the plan with statistics set to 1000 ...
> 
>  Seq Scan on nrgfeature f  (cost=0.00..31675.57 rows=18608 width=218)
> (actual time=63.544..1002.701 rows=225 loops=1)
>Filter: ((upperrightx > 321264.236977215::double precision) AND
> (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> 123286.261898636::double precision) AND (lowerlefty <
> 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> precision))

It's better like this, but still way off the mark. Even your good query
which uses the index was out by more than an order of magnitude.

Try raising the statistics levels for upperrightx, lowerleftx,
upperrighty and lowerlefty.

Failing that, you might be able to push it back down again by giving
diagonalsize an upper limit. Perhaps 500 is a value that would never
occur.

AND (diagonalsize BETWEEN 49.999::double precision AND 500)



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Index oddity

2004-06-09 Thread Rod Taylor
On Wed, 2004-06-09 at 16:50, ken wrote:
> Thanks Rod,
> 
> This setting has no effect however.  If I set statistics to 1000, or

Okay.. but you never did send EXPLAIN ANALYZE output. I want to know
what it is really finding.

> On Wed, 2004-06-09 at 13:12, Rod Taylor wrote:
> > It seems to believe that the number of rows returned for the >49.999
> > case will be 4 times the number for the >50 case. If that was true, then
> > the sequential scan would be correct.
> > 
> > ALTER TABLE  ALTER COLUMN diagonalsize SET STATISTICS 1000;
> > ANALZYE ;
> > 
> > Send back EXPLAIN ANALYZE output for the >49.999 case.
> > 
> > > The query plan for diagonalSize > 50.000 is ...
> > > 
> > > Index Scan using nrgfeature_xys_index on nrgfeature f 
> > > (cost=0.00..17395.79 rows=4618 width=220)
> > >Index Cond: ((upperrightx > 321264.236977215::double precision) AND
> > > (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> > > 123286.261898636::double precision) AND (lowerlefty <
> > > 124985.927450476::double precision) AND (diagonalsize > 50::double
> > > precision))
> > > 
> > > ... while for diagonalSize > 49.999 is ...
> > > 
> > >  Seq Scan on nrgfeature f  (cost=0.00..31954.70 rows=18732 width=220)
> > >Filter: ((upperrightx > 321264.236977215::double precision) AND
> > > (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> > > 123286.261898636::double precision) AND (lowerlefty <
> > > 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> > > precision))
> > 
> > 
> > ---(end of broadcast)---
> > TIP 9: the planner will ignore your desire to choose an index scan if your
> >   joining column's datatypes do not match
> > 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Index oddity

2004-06-09 Thread Rod Taylor
It seems to believe that the number of rows returned for the >49.999
case will be 4 times the number for the >50 case. If that was true, then
the sequential scan would be correct.

ALTER TABLE  ALTER COLUMN diagonalsize SET STATISTICS 1000;
ANALYZE ;

Send back EXPLAIN ANALYZE output for the >49.999 case.

> The query plan for diagonalSize > 50.000 is ...
> 
> Index Scan using nrgfeature_xys_index on nrgfeature f 
> (cost=0.00..17395.79 rows=4618 width=220)
>Index Cond: ((upperrightx > 321264.236977215::double precision) AND
> (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> 123286.261898636::double precision) AND (lowerlefty <
> 124985.927450476::double precision) AND (diagonalsize > 50::double
> precision))
> 
> ... while for diagonalSize > 49.999 is ...
> 
>  Seq Scan on nrgfeature f  (cost=0.00..31954.70 rows=18732 width=220)
>Filter: ((upperrightx > 321264.236977215::double precision) AND
> (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> 123286.261898636::double precision) AND (lowerlefty <
> 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> precision))


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] seq scan woes

2004-06-09 Thread Rod Taylor
On Mon, 2004-06-07 at 16:12, Dan Langille wrote:
> On 7 Jun 2004 at 16:00, Rod Taylor wrote:
> 
> > On Mon, 2004-06-07 at 15:45, Dan Langille wrote:
> > > A production system has had a query recently degrade in performance.  
> > > What once took < 1s now takes over 1s.  I have tracked down the 
> > > problem to a working example.
> > 
> > What changes have you made to postgresql.conf?
> 
> Nothing recently (ie. past few months). Nothing at all really.  
> Perhaps I need to start tuning that.
> 
> > Could you send explain analyse again with SEQ_SCAN enabled but with
> > nested loops disabled?
> 
> See http://rafb.net/paste/results/zpJEvb28.html

This doesn't appear to be the same query as we were shown earlier.

> > Off the cuff? I might hazard a guess that effective_cache is too low or
> > random_page_cost is a touch too high. Probably the former.
> 
> I grep'd postgresql.conf:
> 
> #effective_cache_size = 1000# typically 8KB each
> #random_page_cost = 4   # units are one sequential page fetch cost

This would be the issue. You haven't told PostgreSQL anything about your
hardware. The defaults are somewhat modest.

http://www.postgresql.org/docs/7.4/static/runtime-config.html

Skim through the run-time configuration parameters that can be set in
postgresql.conf.

Pay particular attention to:
  * shared_buffers (you may be best with 2000 or 4000)
  * effective_cache_size (set to 50% of ram size if dedicated db
machine)
  * random_page_cost (good disks will bring this down to a 2 from a
4)
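
As a very rough starting point for a dedicated machine with 1GB of RAM
(adjust to your hardware; the first two settings are in 8KB pages):

  shared_buffers = 4000          # roughly 32MB
  effective_cache_size = 65536   # roughly 512MB, half of RAM
  random_page_cost = 2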


-- 
Rod Taylor 

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] seq scan woes

2004-06-09 Thread Rod Taylor
On Mon, 2004-06-07 at 15:45, Dan Langille wrote:
> A production system has had a query recently degrade in performance.  
> What once took < 1s now takes over 1s.  I have tracked down the 
> problem to a working example.

What changes have you made to postgresql.conf?

Could you send explain analyse again with SEQ_SCAN enabled but with
nested loops disabled?

Off the cuff? I might hazard a guess that effective_cache_size is too
low or random_page_cost is a touch too high. Probably the former.
-- 
Rod Taylor 

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] WAL Optimisation - configuration and usage

2004-06-08 Thread Rod Taylor
> random_page_cost = 0.5

Not likely. The lowest this value should ever be is 1, and that's if
you're using something like a RAM drive.

If your drives are doing a ton of extra random IO due to the setting
above (rather than sequential reads), it would lower the throughput
quite a bit.

Try a value of 2 for a while.
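
To try that without editing postgresql.conf, you can set it for one
session first (a sketch; 2 is a starting point, not a tuned value):

SET random_page_cost = 2;
-- re-run the affected queries under EXPLAIN ANALYZE and compare plans
SET random_page_cost = 4;   -- put it back if things get worse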



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] postgres performance: comparing 2 data centers

2004-06-06 Thread Rod Taylor
> The members table contains about 500k rows.  It has an index on
> (group_id, member_id) and on (member_id, group_id).


Yes, bad stats are causing it to pick a poor plan, but you're giving it
too many options (which doesn't help) and using up space unnecessarily.

Keep (group_id, member_id)
Remove (member_id, group_id)
Add (member_id)

An index on just member_id is actually going to perform better than
(member_id, group_id) since it has a smaller footprint on disk.

Any time both group_id and member_id appear in the query, the
(group_id, member_id) index will likely be used.

-- 
Rod Taylor 

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] postgres performance: comparing 2 data centers

2004-06-04 Thread Rod Taylor
On Fri, 2004-06-04 at 18:07, Michael Nonemacher wrote:
> Slight update:
> 
> Thanks for the replies; this is starting to make a little more sense...
> 
> I've managed to track down the root of the problem to a single query on
> a single table.  I have a query that looks like this:
>select count(*) from members where group_id = ? and member_id >
> 0;
> 
> The members table contains about 500k rows.  It has an index on
> (group_id, member_id) and on (member_id, group_id).
> 
> It seems like the statistics are wildly different depending on whether
> the last operation on the table was a 'vacuum analyze' or an 'analyze'.


Yes, bad stats are causing it to pick a poor plan (this might be better
in 7.5), but you're giving it too many options (which doesn't help) and
using up disk space unnecessarily.

Keep (group_id, member_id)
Remove (member_id, group_id)
Add (member_id)

An index on just member_id is actually going to perform better than
(member_id, group_id) since it has a smaller footprint on disk.

Any time both group_id and member_id appear in the query, the
(group_id, member_id) index will likely be used.
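
A minimal sketch of those changes; the index names below are made up,
so check \di for the real ones:

-- keep the existing (group_id, member_id) index as-is
DROP INDEX members_member_id_group_id_idx;  -- the (member_id, group_id) index
CREATE INDEX members_member_id_idx ON members (member_id);
ANALYZE members;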



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL and Kernel 2.6.x

2004-06-01 Thread Rod Taylor
On Tue, 2004-06-01 at 23:16, V i s h a l Kashyap @ [Sai Hertz And
Control Systems] wrote:
> Dear all,
> 
> Has anyone compiled PostgreSQL with kernel 2.6.x?
> If YES:
> 1. Were there any performance gains?

OSDL reports approx 20% improvement. I've seen similar with some data
access patterns.

> 2. What problems would keep us from compiling on kernel 2.6?

Nothing that I know of, assuming you have vendor support for it.



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Rod Taylor
> What is essentially required is the "prescient cacheing algorithm,"
> where the postmaster must consult /dev/esp in order to get a
> prediction of what blocks it may need to refer to in the next sixty
> seconds.

Easy enough. Television does it all the time with live shows. The guy
with the buzzer always seems to know what will be said before they say
it. All we need is a 5 to 10 second delay...


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] TPCH 100GB - need some help

2004-05-14 Thread Rod Taylor
On Fri, 2004-05-14 at 14:00, Eduardo Almeida wrote:
> Hi folks,
> 
> I need some help in a TPCH 100GB benchmark.

Performance with 7.5 is much improved over 7.4 for TPCH due to efforts
of Tom Lane and OSDL. Give it a try with a recent snapshot of
PostgreSQL.

Otherwise, disable nested loops for that query. 

set enable_nestloop = off;
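
If you only want it off for that one query, a sketch of keeping the
change local to a single transaction (SET LOCAL reverts automatically
at COMMIT or ROLLBACK):

BEGIN;
SET LOCAL enable_nestloop = off;
-- run the offending TPCH query here
COMMIT;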



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] LIKE and INDEX

2004-05-05 Thread Rod Taylor
> but if I use:
> select url from urlinfo where url like 'http://%.lycos.de';
> it won't use the index at all, NOT good!
> is there any way I can force the second query to use the index?

CREATE INDEX nowww ON urlinfo (replace(replace(url, 'http://', ''),
'www.', ''));

SELECT url
  FROM urlinfo
 WHERE replace(replace(url, 'http://', ''), 'www.', '') = 'lycos.de'
   AND url LIKE 'http://%.lycos.de';

The replace() will narrow the search down to all records containing
lycos.de. Feel free to write a more complex alternative for replace()
that will deal with more than just optional www.

Once the results have been narrowed down, you may use the original LIKE
expression to confirm it is the record you want.
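
To double-check that the planner actually picks the new index,
something like this (plan output omitted):

EXPLAIN
SELECT url
  FROM urlinfo
 WHERE replace(replace(url, 'http://', ''), 'www.', '') = 'lycos.de'
   AND url LIKE 'http://%.lycos.de';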


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Insert only tables and vacuum performance

2004-04-29 Thread Rod Taylor
> Or even better an offset into the datatable for the earliest deleted 
> row, so if you have a table where you update the row shortly after 
> insert and then never touch it vacuum can skip most of the table 
> (inserts are done at the end of the table, right?)

Inserts are done at the end of the table as a last resort. But anyway,
how do you handle a rolled back insert?



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Rod Taylor
> It would be nice to get a feel for how much performance loss would be
> incurred in maintaining the index flags, against the possible
> performance gains for getting the data back out again.

I guess the real question is, why maintain index flags and not simply
drop the index entry altogether?

A more interesting case would be to have the backend process record
index tuples that it would invalidate (if committed), then on commit
send that list to a garbage collection process.

It's still vacuum -- just the reaction time for it would be much
quicker.



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

