Re: [PERFORM] Comparing postgresql7.4 CVS head on linux 2.4.20 and 2.6.0-test4

2003-08-28 Thread Neil Conway
On Wed, Aug 27, 2003 at 09:02:25PM +0530, Shridhar Daithankar wrote:
 IIRC in a kernel release note recently, it was commented that the IO
 scheduler is still being worked on and does not perform as well for random
 seeks, which is exactly what a database needs.

Yeah, I've read that as well. It would be interesting to see how 2.6
performs with the traditional (non-anticipatory) scheduler -- I believe
you can switch from one I/O scheduler to another via a sysctl.
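
For anyone wanting to try it: the knob I know of is the elevator= boot
parameter (untested here, and the exact mechanism on the -test kernels may
differ), e.g. appended to the kernel line in grub.conf:

    kernel /vmlinuz-2.6.0-test4 ro root=/dev/hda1 elevator=deadline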

 pgbench -c10 -t100 test1
 tps = 64.917044 (including connections establishing)
 tps = 65.438067 (excluding connections establishing)

Interesting that the performance of 2.4.20 on this particular
benchmark is a little less than 3 times faster than 2.6.

 3) Shared buffers 3000
 
 pgbench -c5 -t100 test
 tps = 132.489569 (including connections establishing)
 tps = 135.177003 (excluding connections establishing)
 
 pgbench -c5 -t1000 test
 tps = 70.272855 (including connections establishing)
 tps = 70.343452 (excluding connections establishing)
 
 pgbench -c10 -t100 test
 tps = 121.624524 (including connections establishing)
 tps = 123.549086 (excluding connections establishing)

[...] 

 4) noatime enabled Shared buffers 3000
 
 pgbench -c5 -t100 test
 tps = 90.850600 (including connections establishing)
 tps = 92.053686 (excluding connections establishing)
 
 pgbench -c5 -t1000 test
 tps = 92.209724 (including connections establishing)
 tps = 92.329682 (excluding connections establishing)
 
 pgbench -c10 -t100 test
 tps = 79.264231 (including connections establishing)
 tps = 80.145448 (excluding connections establishing)

I'm a little skeptical of the consistency of these numbers
(several people have observed in the past that it's difficult
to get pgbench to produce reliable results) -- how could using
noatime possibly *reduce* performance by 50%, as in the first
and third benchmarks?

-Neil


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Anders K. Pedersen
Hello,

We're running a set of Half-Life based game servers that lookup user 
privileges from a central PostgreSQL 7.3.4 database server (I recently 
ported the MySQL code in Adminmod to PostgreSQL to be able to do this).

The data needed by the game servers are combined from several different 
tables, so we have some views set up to provide the data in the format 
needed.

Currently there are only a few users in the database for testing purposes, 
and most of the time the user lookups take 2-3 ms (I have syslog'ing of 
queries and duration turned on), but several times per hour the duration 
of one of the queries is 2-3 seconds (1000 times longer), while the 
surrounding lookups take the usual 2-3 ms.

This is rather critical, as the game server software isn't asynchronous 
and thus waits for a reply before continuing, so when someone connects, 
and the user lookup happens to have one of these very long durations, 
the players on this server experience a major lag spike, which isn't 
very popular :-(

All the game servers and the database server are connected to the same 
switch, so I don't think it is a network problem.

So far I've been unable to locate the problem, so any suggestions are 
very welcome.

Regards,
Anders K. Pedersen
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


[PERFORM] Simple queries take forever to run

2003-08-28 Thread Michael Guerin
I'm running into some performance problems trying to execute simple 
queries.

postgresql version 7.3.3
.conf params changed from defaults.
shared_buffers = 64000
sort_mem = 64000
fsync = false
effective_cache_size = 40
ex. query: select * from x where id in (select id from y);

There's an index on each table for id.  SQL Server takes 1s to return; 
PostgreSQL doesn't return at all, and neither does EXPLAIN ANALYZE.
x has 1200673 rows
y has 1282 rows

It seems like it's ignoring the index and not using enough memory... any 
ideas?
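
One thing worth trying on 7.3: IN (subquery) is a known planner weak spot
there, and the usual workaround is rewriting it as EXISTS - a sketch using
the tables above:

    select * from x where exists (select 1 from y where y.id = x.id);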

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread matt
 You probably, more than anything, should look at some kind of 
 superfast, external storage array

Yeah, I think that's going to be a given.  Low end EMC FibreChannel
boxes can do around 20,000 IOs/sec, which is probably close to good
enough.

You mentioned using multiple RAID controllers as a boost - presumably
the trick here is to split the various elements (WAL, tables, indexes)
across different controllers using symlinks or suchlike?  Can I feasibly
split the DB tables across 5 or more controllers?
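
I'm guessing the mechanics are something like this (a sketch - paths are
invented, and the postmaster would be stopped first):

    # move the WAL directory onto a volume behind its own controller
    mv $PGDATA/pg_xlog /vol2/pg_xlog
    ln -s /vol2/pg_xlog $PGDATA/pg_xlog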

  Also, and importantly, the load comes but one hour per week, so buying a
  Starfire isn't a real option, as it'd just sit idle the rest of the
  time.  I'm particularly interested in keeping the cost down, as I'm a
  shareholder in the company!
 
 Interesting.  If you can't spread the load out, can you batch some parts 
 of it?  Or is the whole thing interactive therefore needing to all be 
 done in real time at once?

All interactive I'm afraid.  It's a micropayment system that's going to
be used here in the UK to do online voting for a popular TV programme. 
The phone voting system has a hard limit of [redacted] million votes per
hour, and the producers would like to be able to tell people to vote
online if the phone lines are busy.  They can vote online anyway, but we
expect the average viewer to have to make 10 calls just to get through
during peak times, so the attraction is obvious.

 whether you like it or not, you're gonna need heavy iron if you need to do 
 this all in one hour once a week.

Yeah, I need to rent a Starfire for a month later this year, anybody got
one lying around?  Near London?

 Actually, I've seen stuff like that going on Ebay pretty cheap lately.  I 
 saw a 64 CPU E10k (366 MHz CPUs) with 64 gigs ram and 20 hard drives going 
 for $24,000 a month ago.  Put Linux or BSD on it and Postgresql should 
 fly.

Jeez, and I thought I was joking about the Starfire.  Even Slowaris
would be OK on one of them.

The financial issue is that there's just not that much money in the
micropayments game for bursty sales.  If I was doing these loads
*continuously* then I wouldn't be working, I'd be in the Maldives :-)

I'm also looking at renting equipment, or even trying out IBM/HP's
'on-demand' offerings.




---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Rod Taylor
 Currently there's only a few users in the database for testing purposes, 
 and most of the time the user lookup's take 2-3 ms (I have syslog'ing of 
 queries and duration turned on), but several times per hour the duration 
 for one of the queries is 2-3 seconds (1000 times larger), while the 
 surrounding lookups take the usual 2-3 ms.

Are there any other jobs running at the time of these excessive queries?


signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread scott.marlowe
On 27 Aug 2003, matt wrote:

 I'm wondering if the good people out there could perhaps give me some
 pointers on suitable hardware to solve an upcoming performance issue. 
 I've never really dealt with these kinds of loads before, so any
 experience you guys have would be invaluable.  Apologies in advance for
 the amount of info below...
 
 My app is likely to come under some serious load in the next 6 months,
 but the increase will be broadly predictable, so there is time to throw
 hardware at the problem.
 
 Currently I have a ~1GB DB, with the largest (and most commonly accessed
 and updated) two tables having 150,000 and 50,000 rows.
 
 A typical user interaction with the system involves about 15
 single-table selects, 5 selects with joins or subqueries, 3 inserts, and
 3 updates.  The current hardware probably (based on benchmarking and
 profiling) tops out at about 300 inserts/updates *or* 2500 selects per
 second.
 
 There are multiple indexes on each table that updates & inserts happen
 on.  These indexes are necessary to provide adequate select performance.
 
 Current hardware/software:
 Quad 700MHz PIII Xeon/1MB cache
 3GB RAM
 RAID 10 over 4 18GB/10,000rpm drives
 128MB battery backed controller cache with write-back enabled
 Redhat 7.3, kernel 2.4.20
 Postgres 7.2.3 (stock redhat issue)
 
 I need to increase the overall performance by a factor of 10, while at
 the same time the DB size increases by a factor of 50.  e.g. 3000
 inserts/updates or 25,000 selects per second, over a 25GB database with
 most used tables of 5,000,000 and 1,000,000 rows.

It will likely take a combination of optimizing your database structure / 
methods and increasing your hardware / OS performance.

You probably, more than anything, should look at some kind of 
superfast, external storage array that has dozens of drives, and a large 
battery backed cache.  You may be able to approximate this yourself with 
just a few dual channel Ultra 320 SCSI cards and a couple dozen hard 
drives.  The more spindles you throw at a database, generally speaking, 
the more parallel load it can handle.  

You may find that once you get to 10 or 20 drives, RAID 5 or 5+0 or 0+5 
will be outrunning 1+0/0+1 due to fewer writes.

You likely want to look at the fastest CPUs with the fastest memory you 
can afford.  Those 700MHz Xeons are likely using PC133 memory, which is 
painfully slow compared to newer memory pumping data out at 4 to 8 times 
the rate of the older stuff.

Maybe an SGI Altix could do this?  Have you looked at them?  They're not 
cheap, but they do look to be quite fast, and can scale to 64 CPUs if need 
be.  Their interbox communication fabric is faster than most CPUs' front 
side busses.

 Notably, the data is very time-sensitive, so the active dataset at any
 hour is almost certainly going to be more on the order of 5GB than 25GB
 (plus I'll want all the indexes in RAM of course).
 
 Also, and importantly, the load comes but one hour per week, so buying a
 Starfire isn't a real option, as it'd just sit idle the rest of the
 time.  I'm particularly interested in keeping the cost down, as I'm a
 shareholder in the company!

Interesting.  If you can't spread the load out, can you batch some parts 
of it?  Or is the whole thing interactive therefore needing to all be 
done in real time at once?

 So what do I need?

whether you like it or not, you're gonna need heavy iron if you need to do 
this all in one hour once a week.

 Can anyone who has (or has ever had) that kind of
 load in production offer any pointers, anecdotes, etc?  Any theoretical
 musings also more than welcome.  Comments upon my sanity will be
 referred to my doctor.
 
 If the best price/performance option is a second hand 32-cpu Alpha
 running VMS I'd be happy to go that way ;-)

Actually, I've seen stuff like that going on Ebay pretty cheap lately.  I 
saw a 64 CPU E10k (366 MHz CPUs) with 64 gigs ram and 20 hard drives going 
for $24,000 a month ago.  Put Linux or BSD on it and Postgresql should 
fly.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread matt
 Don't know how cheap they are.
 
 I have an app that does large batch updates. I found that if I dropped
 the indexes, did the updates and recreated the indexes, it was faster
 than doing the updates while the indexes were intact.

Yeah, unfortunately it's not batch work, but real time financial work. 
If I drop all the indexes my select performance goes through the floor,
as you'd expect.

 Does noatime make much difference on a PostgreSQL database?  I haven't
 tested that yet.

Yup, it does.  In fact it should probably be in the standard install
documentation (unless someone has a reason why it shouldn't).  Who
*cares* when PG last looked at the tables?  If 'nomtime' was available
that would probably be a good thing too.

 Can you split it onto multiple boxes?  Some database layouts lend themselves
 to this, others don't.  Obviously you can't do joins from one server to
 another, so you may lose more in multiple queries than you gain by having
 multiple servers.  It's worth looking into though.

I'm considering that.  There are some tables which I might be able to
split out.  There may even be some things I can pull from the DB
altogether (session info in particular, so long as I can reliably send a
given user's requests to the same app server each time, bearing in mind
I can't see the cookies too easily because 50% of the requests are over
SSL).

 I know my answers aren't quite the ones you were looking for, but my
 experience is that many people try to solve poor application design
 by simply throwing bigger hardware at the problem.  It appears as though
 you've already done your homework, though.

Well, I *hope* that's the case!  The core issue is simply that we have
to deal with an insane load for 1 hour a week, and there's just no
avoiding it.

Maybe I can get Sun/HP/IBM to lend some gear (it's a pretty high-profile
site).


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly



Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Christopher Browne
Martha Stewart called it a Good Thing [EMAIL PROTECTED] (matt) wrote:
 I'm also looking at renting equipment, or even trying out IBM/HP's
 'on-demand' offerings.

You're assuming that this is likely to lead to REAL savings, and that
seems unlikely.

During the recent power outage in the Northeast, people looking for
generators and fuel were paying _premium_ prices, not discounted
prices.

If your hardware requirement leads to someone having to buy hardware
to support your peak load, then _someone_ has to pay the capital cost,
and that someone is unlikely to be IBM or HP.  Peak demand equipment
is likely to attract pretty peaked prices.

If you can find someone who needs the hardware during the day, but who
_never_ needs it during your needful hours, then there might be an
arrangement to be had, assuming the someone else trusts you to use
what's, at other times, their hardware, and assuming you trust them
with the financial information you're managing.
-- 
select 'cbbrowne' || '@' || 'ntlug.org';
http://www3.sympatico.ca/cbbrowne/linux.html
Rules of the   Evil Overlord #170.  I  will  be  an equal-opportunity
despot and make sure that terror and oppression is distributed fairly,
not just against one particular  group that  will  form the core of  a
rebellion. http://www.eviloverlord.com/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread matt
 Are you sure?  Have you tested the overall application to see if possibly
 you gain more on insert performance than you lose on select performance?

Unfortunately dropping any of the indexes results in much worse select
performance that is not remotely clawed back by the improvement in
insert performance.

Actually there doesn't really seem to *be* that much improvement in
insert performance when going from 3 indexes to 2.  I guess indexes must
be fairly cheap for PG to maintain?

 It's possible that compiling Postgres manually with proper optimizations
 could yield some improvements, as well as building a custom kernel in
 Redhat.
 
 Also, you don't mention which filesystem you're using:
 http://www.potentialtech.com/wmoran/postgresql.php

Yeah, I can imagine getting 5% extra from a slim kernel and
super-optimised PG.

The FS is ext3, metadata journaling (the default), mounted noatime.
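
For reference, the noatime bit is just the stock mount option; the fstab
entry looks something like this (device and mount point here are examples
only):

    /dev/sda3  /var/lib/pgsql  ext3  defaults,noatime  1 2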

 But if you're in the situation where you have more time than money,
 you may find that an overall audit of your app is worthwhile.  Consider
 taking parts that are in perl (for example) and recoding them into C
 (that is, unless you've already identified that all the bottlenecks are
 at the PostgreSQL server)

I can pretty cheaply add more CPU horsepower for the app servers, as
they scale horizontally, so I can chuck in a couple (or 3, or 4, or ...)
more dual-cpu boxen with a gig of ram and tell the load balancer about
them.  The problem with the DB is that that approach simply won't work -
the box just has to get bigger!

 I doubt if the suggestions I've made are going to get you 10x, but they
 may get you 2x, and then you only need the hardware to do 5x.

It all helps :-)  A few percent here, a few percent there, pretty soon
you're talking serious improvements...

Thanks

Matt


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Tests

2003-08-28 Thread Tomka Gergely
http://mail.sth.sze.hu/~hsz/sql/

New, upgraded test results. As we can see, the developers work hard, and
with good results. A million thanks and congratulations.

Sorry, *BSD lovers - if you send a new hard drive, our tester can do BSD
tests also. Life is hard.

And last, but not least, thanks for the tests, Horváth Szabolcs

-- 
Tomka Gergely
And now - what will become of us without barbarians?
They were, after all, some kind of solution...


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Sun vs a P2. Interesting results.

2003-08-28 Thread Jeff
On Tue, 26 Aug 2003, Neil Conway wrote:

 Would it be possible to get a profile (e.g. gprof output) for a postgres
 backend executing the query on the Sun machine?

Heh. Never thought of doing a profile!

I attached the entire gprof output, but here's the top few functions.

I did the test, 1 beater, 100 searches: 148 seconds total.

 30.9   45.55   45.55                           nocachegetattr [16]
 16.0   69.20   23.65                           internal_mcount [22]
  6.9   79.37   10.17  5245902   0.00   0.00    heapgettup [21]
  6.0   88.28    8.91  3663201   0.00   0.00    ExecMakeFunctionResult <cycle 5> [23]
  5.4   96.27    7.99 11431400   0.00   0.00    ExecEvalVar [25]
  3.0  100.73    4.46 18758201   0.00   0.00    ExecEvalExpr <cycle 5> [24]
  3.0  105.17    4.44  5246005   0.00   0.00    AllocSetReset [29]
  2.5  108.89    3.72  5245700   0.00   0.00    HeapTupleSatisfiesSnapshot [30]
  2.0  111.78    2.89  5650632   0.00   0.00    LWLockRelease [32]
  1.6  114.10    2.32  5650632   0.00   0.00    LWLockAcquire [34]
  1.6  116.40    2.30  5245800   0.00   0.01    SeqNext [17]
  1.4  118.54    2.14  5438301   0.00   0.00    ExecStoreTuple [27]
  1.4  120.62    2.08  5245700   0.00   0.01    ExecQual [18]
  1.3  122.50    1.88  5379202   0.00   0.00    ReleaseAndReadBuffer [35]
  1.1  124.16    1.66   178400   0.01   0.40    ExecScan [15]
  1.1  125.80    1.64                           _mcount (6247)
  1.1  127.41    1.61  5245902   0.00   0.01    heap_getnext [20]


.. as it turns out the profile gzipped is still huge (100kb) so I put it
on my web server - snag it at

http://www.jefftrout.com/~threshar/postgres/postgres-7.3.4-sol8-gprof.txt.gz

I'll do a profile for the P2 and post that in an hour or two

-- 
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Andrew Sullivan
On Wed, Aug 27, 2003 at 02:35:13AM +0100, matt wrote:

 I need to increase the overall performance by a factor of 10, while at
 the same time the DB size increases by a factor of 50.  e.g. 3000
 inserts/updates or 25,000 selects per second, over a 25GB database with
 most used tables of 5,000,000 and 1,000,000 rows.

Your problem is mostly going to be disk related.  You can only get as
many tuples in there per second as your disk rotates per second.  I
suspect what you need is really expensive disk hardware (sorry to
tell you that) set up as RAID 1+0 on fibre channel or something. 
3000 write transactions per second is probably too much to ask for
any standard hardware.

But given that you are batching this once a week, and trying to avoid
big expenses, are you sure this is the right approach?  Perhaps you
should consider a redesign using COPY and such?
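
For bulk loading, a single COPY (a sketch - table and file names are
invented) is far cheaper than the equivalent stream of INSERTs:

    COPY payments FROM '/tmp/payments.dat';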

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Liberty RMS   Toronto, Ontario Canada
[EMAIL PROTECTED]  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Christopher Browne
After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED]
(scott.marlowe) belched out... :-):
 whether you like it or not, you're gonna need heavy iron if you need
 to do this all in one hour once a week.

The other thing worth considering is trying to see if there is a way
of partitioning the workload across multiple hosts.

At the point that you start going past hardware that is
over-the-counter commodity stuff, the premiums start getting pretty
high.  Dual-CPU Intel boxes are pretty cheap compared to buncha-CPU
Sparc boxes.

If some sort of segmentation of the workload can be done, whether by
area code, postal code, or perhaps the last couple digits of the
caller's phone number, or even a round robin, it's likely to be a
lot cheaper to get an array of 4 Dual-Xeon boxes with 8 disk drives
apiece than a Sun/HP/IBM box with 16 CPUs.
-- 
let name=cbbrowne and tld=ntlug.org in name ^ @ ^ tld;;
http://cbbrowne.com/info/linuxxian.html
Show me... show me... show me... COMPUTERS!

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Bill Moran
Christopher Browne wrote:
Martha Stewart called it a Good Thing [EMAIL PROTECTED] (matt) wrote:

I'm also looking at renting equipment, or even trying out IBM/HP's
'on-demand' offerings.
You're assuming that this is likely to lead to REAL savings, and that
seems unlikely.
During the recent power outage in the NorthEast, people looking for
generators and fuel were paying _premium_ prices, not discounted
prices.
If your hardware requirement leads to someone having to buy hardware
to support your peak load, then _someone_ has to pay the capital cost,
and that someone is unlikely to be IBM or HP.  Peak demand equipment
is likely to attract pretty peaked prices.
If you can find someone who needs the hardware during the day, but who
_never_ needs it during your needful hours, then there might be an
arrangement to be had, assuming the someone else trusts you to use
what's, at other times, their hardware, and assuming you trust them
with the financial information you're managing.
I hadn't considered this, but that's not a bad idea.

With FreeBSD, you have jails, which allow multiple users to share
hardware without having to worry about user A looking at user B's
stuff.  Does such a paradigm exist on any heavy iron?  I have no
idea where you'd go to find this kind of co-op server leasing,
but it sure sounds like it could work.
--
Bill Moran
Potential Technologies
http://www.potentialtech.com
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Ron Johnson
On Tue, 2003-08-26 at 20:35, matt wrote:
 I'm wondering if the good people out there could perhaps give me some
 pointers on suitable hardware to solve an upcoming performance issue. 
 I've never really dealt with these kinds of loads before, so any
 experience you guys have would be invaluable.  Apologies in advance for
 the amount of info below...
 
 My app is likely to come under some serious load in the next 6 months,
 but the increase will be broadly predictable, so there is time to throw
 hardware at the problem.
 
 Currently I have a ~1GB DB, with the largest (and most commonly accessed
 and updated) two tables having 150,000 and 50,000 rows.
 
 A typical user interaction with the system involves about 15
 single-table selects, 5 selects with joins or subqueries, 3 inserts, and
 3 updates.  The current hardware probably (based on benchmarking and
 profiling) tops out at about 300 inserts/updates *or* 2500 selects per
 second.
 
 There are multiple indexes on each table that updates & inserts happen
 on.  These indexes are necessary to provide adequate select performance.
 
 Current hardware/software:
 Quad 700MHz PIII Xeon/1MB cache
 3GB RAM
 RAID 10 over 4 18GB/10,000rpm drives
 128MB battery backed controller cache with write-back enabled

Much more cache needed.  Say 512MB per controller?

 Redhat 7.3, kernel 2.4.20
 Postgres 7.2.3 (stock redhat issue)

Upgrade to Pg 7.3.4!

 I need to increase the overall performance by a factor of 10, while at
 the same time the DB size increases by a factor of 50.  e.g. 3000

Are you *sure* about that?  3K updates/inserts per second translates
to 10,800,000 per hour.  That, my friend, is a WHOLE HECK OF A LOT!

 inserts/updates or 25,000 selects per second, over a 25GB database with

Likewise: 90,000,000 selects per hour.

 most used tables of 5,000,000 and 1,000,000 rows.
 
 Notably, the data is very time-sensitive, so the active dataset at any

During the 1 hour surge, will SELECTs at 10 minutes after the 
hour depend on INSERTs at 5 minutes after the hour?

If not, maybe you could pump the INSERT/UPDATE records into
flat files, to be processed after the 1-hour surge is complete.
That may reduce the h/w requirements.

 hour is almost certainly going to be more on the order of 5GB than 25GB
 (plus I'll want all the indexes in RAM of course).
 
 Also, and importantly, the load comes but one hour per week, so buying a

Only one hour out of 168?  May I ask what kind of app it is?

 Starfire isn't a real option, as it'd just sit idle the rest of the
 time.  I'm particularly interested in keeping the cost down, as I'm a
 shareholder in the company!

What a fun exercise.  OK, let's see:
Postgres 7.3.4
RH AS 2.1
12GB RAM
motherboard with 64 bit 66MHz PCI slots
4 - Xeon 3.0GHz (1MB cache) CPUs
8 - 36GB 15K RPM as RAID10 on a 64 bit 66MHz U320 controller
having 512MB cache (for database)
2 - 36GB 15K RPM as RAID1 on a 64 bit 66MHz U320 controller
having 512MB cache (for OS, swap, WAL files)
1 - library tape drive plugged into the OS' SCSI controller.  I
prefer DLT, but that's my DEC bias.
1 - 1000 VA UPS.

If you know when the flood will be coming, you could perform
SELECT * FROM ... WHERE statements on an indexed field, to
pull the relevant data into Linux's buffers.
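
Something along these lines, with a hypothetical table and indexed field
(scanning via the index drags the relevant pages into the OS cache):

    -- run shortly before the surge begins
    SELECT * FROM votes WHERE vote_time > '2003-08-30 19:00:00';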

Yes, the 8 disks are capacity-overkill, but the 8 high-speed
spindles are what you're looking for.

 So what do I need?  Can anyone who has (or has ever had) that kind of
 load in production offer any pointers, anecdotes, etc?  Any theoretical
 musings also more than welcome.  Comments upon my sanity will be
 referred to my doctor.
 
 If the best price/performance option is a second hand 32-cpu Alpha
 running VMS I'd be happy to go that way ;-)

I'd love to work on a GS320!  You may even pick one up for a million
or 2.  The license costs for VMS & Rdb would eat you, though.

Rdb *does* have ways, though, using large buffers and hashed indexes,
with the table tuples stored on the same page as the hashed index
keys, to make such accesses *blazingly* fast.

 Many thanks for reading this far.

-- 
-
Ron Johnson, Jr. [EMAIL PROTECTED]
Jefferson, LA USA

A C program is like a fast dance on a newly waxed dance floor 
by people carrying razors.
Waldi Ravens


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Bill Moran
Hey all.

I said I was going to do it, and I finally did it.

As with all performance tests/benchmarks, there are probably dozens or
more reasons why these results aren't as accurate or wonderful as they
should be.  Take them for what they are and hopefully everyone can
learn a few things from them.
Intelligent feedback is welcome.

http://www.potentialtech.com/wmoran/postgresql.php

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Bill Moran
matt wrote:
I'm wondering if the good people out there could perhaps give me some
pointers on suitable hardware to solve an upcoming performance issue. 
I've never really dealt with these kinds of loads before, so any
experience you guys have would be invaluable.  Apologies in advance for
the amount of info below...

My app is likely to come under some serious load in the next 6 months,
but the increase will be broadly predictable, so there is time to throw
hardware at the problem.
Currently I have a ~1GB DB, with the largest (and most commonly accessed
and updated) two tables having 150,000 and 50,000 rows.
A typical user interaction with the system involves about 15
single-table selects, 5 selects with joins or subqueries, 3 inserts, and
3 updates.  The current hardware probably (based on benchmarking and
profiling) tops out at about 300 inserts/updates *or* 2500 selects per
second.
There are multiple indexes on each table that updates & inserts happen
on.  These indexes are necessary to provide adequate select performance.
Are you sure?  Have you tested the overall application to see if possibly
you gain more on insert performance than you lose on select performance?
(Hey, you asked for musings ...)

Current hardware/software:
Quad 700MHz PIII Xeon/1MB cache
3GB RAM
RAID 10 over 4 18GB/10,000rpm drives
128MB battery backed controller cache with write-back enabled
Redhat 7.3, kernel 2.4.20
Postgres 7.2.3 (stock redhat issue)
It's possible that compiling Postgres manually with proper optimizations
could yield some improvements, as well as building a custom kernel in
Redhat.
Also, you don't mention which filesystem you're using:
http://www.potentialtech.com/wmoran/postgresql.php
I need to increase the overall performance by a factor of 10, while at
the same time the DB size increases by a factor of 50.  e.g. 3000
inserts/updates or 25,000 selects per second, over a 25GB database with
most used tables of 5,000,000 and 1,000,000 rows.
Notably, the data is very time-sensitive, so the active dataset at any
hour is almost certainly going to be more on the order of 5GB than 25GB
(plus I'll want all the indexes in RAM of course).
Also, and importantly, the load comes but one hour per week, so buying a
Starfire isn't a real option, as it'd just sit idle the rest of the
time.  I'm particularly interested in keeping the cost down, as I'm a
shareholder in the company!
I can't say for sure without looking at your application overall, but
many applications I've seen could be optimized.  It's usually a few
seconds here and there that take hours to find and tweak.
But if you're in the situation where you have more time than money,
you may find that an overall audit of your app is worthwhile.  Consider
taking parts that are in perl (for example) and recoding them into C
(that is, unless you've already identified that all the bottlenecks are
at the PostgreSQL server)
I doubt if the suggestions I've made are going to get you 10x, but they
may get you 2x, and then you only need the hardware to do 5x.
--
Bill Moran
Potential Technologies
http://www.potentialtech.com
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Best tweak for fast results.. ?

2003-08-28 Thread scott.marlowe
On Tue, 26 Aug 2003, JM wrote:

 need input on parameter values on confs...
 
 our database is getting 1000 transactions/sec on peak periods..
 
 sitting on RH 7.3 
 2.4.7-10smp
 RAM: 1028400
 SWAP: 2040244

1:  Upgrade your kernel.  2.4.7 on RH 7.3 was updated to 2.4.18-24 in March, 
and the 2.4.18 kernel is MUCH faster and has many bugs squashed.

2:  Upgrade to the latest stable version of postgresql, 7.3.4

3:  Make sure your kernel's file-nr settings and shm settings are big 
enough to handle the load.

4:  Edit the $PGDATA/postgresql.conf file to reflect all that extra cache 
you've got, etc. (a worked effective_cache_size example follows this list):

shared_buffers = 5000
sort_mem = 16384
effective_cache_size = (size of cache/buffer mem divided by 8192)

5:  Look at moving WAL to its own spindle(s), as it is often the choke 
point when doing lots of transactions.

6:  Look at using more drives in a RAID 1+0 array for the data (as well 
as a separate one for WAL, if you can afford it).

7:  Make sure your drives are mounted noatime.

8:  If you don't mind living dangerously, or the data can be reproduced 
from source files (i.e. catastrophic failure of your data set won't set 
you back), look at both mounting the drives async (the default for Linux, 
slightly dangerous) and turning fsync off (quite dangerous; in case of 
crashed hardware / OS, you very well might lose data).
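
The worked example promised in item 4: if, hypothetically, free reports
about 800MB sitting in buffers+cache, that divides out to:

effective_cache_size = 102400	# 800MB * 1024 / 8KB pages = 102400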


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Sun vs a P2. Interesting results.

2003-08-28 Thread Tom Lane
Jeff [EMAIL PROTECTED] writes:
 I'll do a profile for hte p2 and send post that in an hour or two

Please redo the linux profile after recompiling postmaster.c with
-DLINUX_PROFILE added (I use make PROFILE='-pg -DLINUX_PROFILE'
when building for profile on Linux).
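
Assuming gmon.out then gets written into the database directory (paths
here are illustrative), the annotated profile comes from something like:

    gprof /usr/local/pgsql/bin/postgres $PGDATA/gmon.out > profile.txt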

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Ron Johnson
On Wed, 2003-08-27 at 21:26, Bill Moran wrote:
 Christopher Browne wrote:
  Martha Stewart called it a Good Thing [EMAIL PROTECTED] (matt) wrote:
[snip]
 With FreeBSD, you have jails, which allow multiple users to share
 hardware without having to worry about user A looking at user B's
 stuff.  Does such a paradigm exist on any heavy iron?  I have no

IBM invented the idea (or maybe stole it) back in the '70s.  The
VM hypervisor was designed as a conversion tool, to let customers
run both OS/MVS and DOS/VSE, to aid in converting from VSE to MVS.

Customers, the cheap, uncooperative beasts, liked VSE, but also liked
VM, since it let them have, for example, a dev, test, and production
systems all on the same piece of h/w, thus saving them oodles of
money in h/w costs and maintenance fees.

Yes, yes, the modern term for this is server consolidation, and
VMware does the same thing, 30 years after dinosaur customers had
it on boxen that academics, analysts and young whippersnappers 
said were supposed to be extinct 20 years ago.

-- 
-
Ron Johnson, Jr. [EMAIL PROTECTED]
Jefferson, LA USA

Knowledge should be free for all.
Harcourt Fenton Mudd, Star Trek:TOS, I, Mudd


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Balazs Wellisch

Bill,

Very interesting results. I'd like to commend you on your honesty.
Having started out with the intentions of proving that FreeBSD is faster
than Linux only to find that the opposite is true must not have been
rewarding for you. However, these unexpected results serve only to
reinforce the integrity of your tests.

Thanks for all the work.

Balazs



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bill Moran
Sent: Tuesday, August 26, 2003 6:48 PM
To: [EMAIL PROTECTED]
Subject: [PERFORM] The results of my PostgreSQL/filesystem performance
tests

Hey all.

I said I was going to do it, and I finally did it.

As with all performance tests/benchmarks, there are probably dozens or
more reasons why these results aren't as accurate or wonderful as they
should be.  Take them for what they are and hopefully everyone can
learn a few things from them.

Intelligent feedback is welcome.

http://www.potentialtech.com/wmoran/postgresql.php

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Tests

2003-08-28 Thread Balazs Wellisch

Nicely done!

Thanks,

Balazs


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tomka
Gergely
Sent: Wednesday, August 27, 2003 5:40 AM
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Tests

http://mail.sth.sze.hu/~hsz/sql/

New, upgraded test results. As we can see, the developers work hard, and
with good results. A million thanks and congratulations.

Sorry, *BSD lovers - if you send a new hard drive, our tester can do BSD
tests also. Life is hard.

And last, but not least, thanks for the tests, Horváth Szabolcs

-- 
Tomka Gergely
And now - what will become of us without barbarians?
They were, after all, some kind of solution...


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] pgsql inserts problem

2003-08-28 Thread Tarhon-Onu Victor
On Wed, 27 Aug 2003, Bruno Wolff III wrote:

 Did you check the error status for the records that weren't entered?
 
 My first guess is that you have some bad data you are trying to insert.

Of course, I checked the error status for every insert; there is 
no error. It seems like in my case the postgres server cannot handle so 
many inserts per second: some of the lines are not being parsed and the 
data not inserted into the database.
I don't know where the problem can be: in the DBD::Pg Perl DBI
driver, or in my postgresql server settings not being optimal.

-- 
Any views or opinions presented within this e-mail are solely those of
the author and do not necessarily represent those of any company, unless
otherwise expressly stated.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Tomka Gergely
On 2003-08-27, matt wrote:

 Yeah, I can imagine getting 5% extra from a slim kernel and
 super-optimised PG.

Hm, more like 20%, but only for accuracy's sake - 20% won't help you either :(

 The FS is ext3, metadata journaling (the default), mounted noatime.

The worst fs under Linux :) Try XFS.

-- 
Tomka Gergely
And now - what will become of us without barbarians?
They were, after all, some kind of solution...


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Please scan your computer

2003-08-28 Thread Dennis Gearon
Someone who has my:

   [EMAIL PROTECTED]

email address has an infected computer, infected with the SoBig.F virus. 
I'm getting 200+ infected emails a day from that person(s).

Go to this site and do a free online virus scan. It's safe, and done by 
one of the two top virus scanning companies in the world. I've done it 
several times.

http://housecall.antivirus.com/

--
Dennis Gearon
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Shridhar Daithankar
On 28 Aug 2003 at 1:07, Anders K. Pedersen wrote:

 Hello,
 
 We're running a set of Half-Life based game servers that lookup user 
 privileges from a central PostgreSQL 7.3.4 database server (I recently 
 ported the MySQL code in Adminmod to PostgreSQL to be able to do this).
 
 The data needed by the game servers are combined from several different 
 tables, so we have some views set up to provide the data in the format 
 needed.
 
 Currently there's only a few users in the database for testing purposes, 
 and most of the time the user lookup's take 2-3 ms (I have syslog'ing of 
 queries and duration turned on), but several times per hour the duration 
 for one of the queries is 2-3 seconds (1000 times larger), while the 
 surrounding lookups take the usual 2-3 ms.

Check vmstat during the same period to see if it is syncing at that point, 
as Tom suggested.

Are you using pooled connections? If yes, you could shorten the life of a 
connection and force a new connection every 10-15 minutes, say. That would 
avoid the end-of-the-hour type of IO avalanche.

HTH.

Bye
 Shridhar

--
ignorance, n.:  When you don't know anything, and someone else finds out.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] pgsql inserts problem

2003-08-28 Thread Shridhar Daithankar
On 27 Aug 2003 at 15:50, Tarhon-Onu Victor wrote:

 
   Hi,
 
   I have a (big) problem with postgresql when making lots of 
 inserts per second. I have a tool that is generating an output of ~2500 
 lines per second. I wrote a script in Perl that opens a pipe to that 
 tool, reads every line and inserts data.
   I tried both committed and uncommitted variants (the inserts 
 were committed every 60 seconds), and the problem persists.

Assuming one record per line, you are committing after 150K records; that's 
not good.

Try committing every 5 seconds. And open more than one connection. That will 
certainly improve performance. After all, concurrency is the biggest asset of 
postgresql.

Fiddle around with combinations and see which one works best for you.
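
A minimal sketch of the Perl loop (tool path, table and column are
invented; errors surface via RaiseError):

    use DBI;

    open(my $tool, '-|', './tool') or die "can't start tool: $!";
    my $dbh = DBI->connect('dbi:Pg:dbname=test', '', '',
                           { AutoCommit => 0, RaiseError => 1 });
    my $sth = $dbh->prepare('INSERT INTO lines (line) VALUES (?)');

    my $last_commit = time();
    while (defined(my $line = readline($tool))) {
        chomp $line;
        $sth->execute($line);
        # commit every 5 seconds instead of every 60
        if (time() - $last_commit >= 5) {
            $dbh->commit;
            $last_commit = time();
        }
    }
    $dbh->commit;
    $dbh->disconnect;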

Bye
 Shridhar

--
Mencken and Nathan's Ninth Law of The Average American: The quality of a 
champagne is judged by the amount of noise the  cork makes when it is popped.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] 8 way Intel Xeon system

2003-08-28 Thread Tomka Gergely
On 2003-08-27, Christopher Browne wrote:

 After a long battle with technology, [EMAIL PROTECTED] (Tomka Gergely), an
 earthling, wrote:
  On 2003-08-27, Castle, Lindsay wrote:
  Perhaps some may say Linux isn't the best option for an 8 CPU
  server but this is what I have to work with for reasons we won't
  get into :-)
 
  This is not true; the 2.4 series AFAIK runs nicely on these monsters. If
  you want some thrill, try the 2.6-test series. Linux Is Good For You
  (tm) :)

 The other bleeding edge that it'll be interesting to see, um,
 coagulate, is Dragonfly BSD, which plans to do some really
 interesting SMP stuff as a fork of FreeBSD...

As I see from the pages (what a beautiful insect :), they have not reached
the limit of usability yet - or am I wrong?

-- 
Tomka Gergely
And now - what will become of us without barbarians?
They were, after all, some kind of solution...


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] pgsql inserts problem

2003-08-28 Thread Christopher Kings-Lynne
 Of course, I checked the error status for every insert; there is
 no error. It seems like in my case the postgres server cannot handle so
 many inserts per second: some of the lines are not being parsed and the
 data not inserted into the database.

That sounds extremely unlikely.  Postgres is not one to fail without any
sort of error.  There's something else that is the problem.  More than
likely, it's a problem in your code.

Chris


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread matt
 Are you *sure* about that?  3K updates/inserts per second translates
 to 10,800,000 per hour.  That, my friend, is a WHOLE HECK OF A LOT!

Yup, I know!  

 During the 1 hour surge, will SELECTs at 10 minutes after the 
 hour depend on INSERTs at 5 minutes after the hour?

Yes, they do.  It's a payments system, so things like account balances
and purchase histories have to be updated in real time.

 Only one hour out of 168?  May I ask what kind of app it is?

Online voting for an unnamed TV show...

  If the best price/performance option is a second hand 32-cpu Alpha
  running VMS I'd be happy to go that way ;-)
 
 I'd love to work on a GS320!  You may even pick one up for a million
 or 2.  The license costs for VMS & Rdb would eat you, though.

You'd be amazed how little they do go for actually :-)




---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Russell Garrett
We have a somewhat similar situation - we're running a fairly constant, but
low priority, background load of about 70 selects and 40 inserts per second
(batched into fairly large transactions), and on top of that we're trying to
run time-sensitive queries for a web site (well, two). I should emphasize
that this is low, low priority - if a query is delayed by an hour here, it
doesn't matter.

The web site queries will jump up one or two orders of magnitude (I have
seen a normally 100ms query take in excess of 30 seconds) in duration at
seemingly random points. It's not always when the transactions are
committing, and it doesn't seem to be during checkpointing either. The same
thing happens with WAL switched off. It appears to happen the first time the
query runs after a while. If I run the same query immediately afterwards, it
will take the normal amount of time.

Any ideas?

Cheers,

Russ Garrett

[EMAIL PROTECTED] wrote:
 Subject: [PERFORM] Queries sometimes take 1000 times the normal time


 Hello,

 We're running a set of Half-Life based game servers that lookup user
 privileges from a central PostgreSQL 7.3.4 database server (I recently
 ported the MySQL code in Adminmod to PostgreSQL to be able to do
 this).

 The data needed by the game servers are combined from several
 different tables, so we have some views set up to provide the data in
 the format needed.

 Currently there's only a few users in the database for testing
 purposes, and most of the time the user lookup's take 2-3 ms (I have
 syslog'ing of queries and duration turned on), but several times per
 hour the duration for one of the queries is 2-3 seconds (1000 times
 larger), while the surrounding lookups take the usual 2-3 ms.

 This is rather critical, as the game server software isn't asynchronous
 and thus waits for a reply before continuing, so when someone
 connects, and the user lookup happens to have one of these very long
 durations, the players on this server experience a major lag spike,
 which isn't very popular :-(

 All the game servers and the database server are connected to the same
 switch, so I don't think it is a network problem.

 So far I've been unable to locate the problem, so any suggestions are
 very welcome.

 Regards,
 Anders K. Pedersen


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Shridhar Daithankar
On 28 Aug 2003 at 10:02, Russell Garrett wrote:

 The web site queries will jump up one or two orders of magnitude (I have
 seen a normally 100ms query take in excess of 30 seconds) in duration at
 seemingly random points. It's not always when the transactions are
 committing, and it doesn't seem to be during checkpointing either. The same
 thing happens with WAL switched off. It appears to happen the first time the
 query runs after a while. If I run the same query immediately afterwards, it
 will take the normal amount of time.

Looks like it got flushed out of every type of cache, and the IO scheduler 
could not deliver immediately because of other load...


Bye
 Shridhar

--
Abstainer, n.:  A weak person who yields to the temptation of denying himself a 
pleasure.   -- Ambrose Bierce, The Devil's Dictionary


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] The results of my PostgreSQL/filesystem performance

2003-08-28 Thread Ron Johnson
On Tue, 2003-08-26 at 20:47, Bill Moran wrote:
 Hey all.
 
 I said I was going to do it, and I finally did it.
 
 As with all performance tests/benchmarks, there are probably dozens or
 more reasons why these results aren't as accurate or wonderful as they
 should be.  Take them for what they are and hopefully everyone can
 learn a few things from them.
 
 Intelligent feedback is welcome.
 
 http://www.potentialtech.com/wmoran/postgresql.php

Hi,

Woody has pg 7.2.1.  Note also that Woody comes with kernel 2.4.18.

It would be interesting to see how Debian Sid (kernel 2.4.21 and
pg 7.3.3) would perform.

Thanks for the results!

-- 
-
Ron Johnson, Jr. [EMAIL PROTECTED]
Jefferson, LA USA

Oh, great altar of passive entertainment, bestow upon me thy 
discordant images at such speed as to render linear thought impossible
Calvin, regarding TV


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] The results of my PostgreSQL/filesystem performance

2003-08-28 Thread Ron Johnson
On Tue, 2003-08-26 at 20:47, Bill Moran wrote:
 Hey all.
 
 I said I was going to do it, and I finally did it.
 
 As with all performance tests/benchmarks, there are probably dozens or
 more reasons why these results aren't as accurate or wonderful as they
 should be.  Take them for what they are and hopefully everyone can
 learn a few things from them.
 
 Intelligent feedback is welcome.
 
 http://www.potentialtech.com/wmoran/postgresql.php

I notice that the Linux FSs weren't tested with noatime.  Any 
reason?

-- 
-
Ron Johnson, Jr. [EMAIL PROTECTED]
Jefferson, LA USA

As I like to joke, I may have invented it, but Microsoft made 
it popular
David Bradley, regarding Ctrl-Alt-Del 


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Russell Garrett
 The web site queries will jump up one or two orders of magnitude (I
 have seen a normally 100ms query take in excess of 30 seconds) in
 duration at seemingly random points. It's not always when the
 transactions are committing, and it doesn't seem to be during
 checkpointing either. The same thing happens with WAL switched off.
 It appears to happen the first time the query runs after a while. If
 I run the same query immediately afterwards, it will take the normal
 amount of time.

 Looks like it got flushed out of every type of cache and IO scheduler
 could not deliver immediately because of other loads...

Yeah, I wasn't sure what (or how) Postgres caches. The db server does have
2Gb of memory, but then again the database amounts to more than 2Gb, so it's
fairly possible it's getting pushed out of cache. It's also fairly possible
that it's not tuned completely optimally. I wonder if FreeBSD/kernel 2.6
would perform better in such a situation...

Russ


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Christopher Browne
A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Balazs Wellisch) 
wrote:
 Very interesting results. I'd like to commend you on your honesty.
 Having started out with the intentions of proving that FreeBSD is faster
 than Linux only to find that the opposite is true must not have been
 rewarding for you. However, these unexpected results serve only to
 reinforce the integrity of your tests.

Well put.

To see a result that the tester didn't really want to see/present does
suggest good things about the tester's honesty.  There was incentive
to hide unfavorable results.

What it still leaves quite open is just what happens when the OS has
more than one disk drive or CPU to play with.  It's not clear what
happens in such cases, whether FreeBSD would catch up, or be left
further in the dust.  The traditional propaganda has been that
there are all sorts of reasons to expect PostgreSQL on FreeBSD to run
a bit faster than on Linux; it is a bit unexpected for the opposite to
seem true.
-- 
output = reverse(gro.mca @ enworbbc)
http://www3.sympatico.ca/cbbrowne/sap.html
I am aware of the benefits  of a micro kernel approach.  However, the
fact remains  that Linux is  here, and GNU  isn't --- and  people have
been working on Hurd for a lot longer than Linus has been working on
Linux. -- Ted T'so, 1992.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] The results of my PostgreSQL/filesystem performance

2003-08-28 Thread Jeff
On Tue, 26 Aug 2003, Bill Moran wrote:


 Intelligent feedback is welcome.

That's some good work there, Lou. You'll make sgt for that someday.

But I think the next step, before trying out other filesystems and options
would be concurrency. Run a bunch of these beasts together and see what
happens (I don't think too many of us have a single session running).
Perhaps even make them interfere with each other to create as much
pain as possible?
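
Something like this, maybe - a sketch reusing the pgbench flags from
earlier in the thread, with four clients hammering the same database:

    #!/bin/sh
    # fire off four pgbench runs in parallel
    for i in 1 2 3 4; do
        pgbench -c 10 -t 100 test &
    done
    wait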

on a side note - I might be blind here - I didn't see what version of pg
you were using or any postgresql.conf tweaks - or did you just use
whatever came with each distro?

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Rod Taylor
Couple of questions:

What was the postgresql.conf configuration used? Default?

How many threads of the script ran? Looks like a single user only.

I assume there was nothing else running at the time (cron, sendmail,
etc. were all off?)

Do you know whether the machines were CPU or I/O bound?

Was PostgreSQL compiled the same for each OS or did you use the rpm,
deb, tgz that were available?

On Tue, 2003-08-26 at 21:47, Bill Moran wrote:
 Hey all.
 
 I said I was going to do it, and I finally did it.
 
 As with all performance tests/benchmarks, there are probably dozens or
 more reasons why these results aren't as accurate or wonderful as they
 should be.  Take them for what they are and hopefully everyone can
 learn a few things from them.
 
 Intelligent feedback is welcome.
 
 http://www.potentialtech.com/wmoran/postgresql.php


signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] The results of my PostgreSQL/filesystem performance

2003-08-28 Thread Tomka Gergely
On 2003-08-28, Christopher Browne wrote:

 A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Balazs Wellisch) 
 wrote:
  Very interesting results. I'd like to commend you on your honesty.
  Having started out with the intentions of proving that FreeBSD is faster
  than Linux only to find that the opposite is true must not have been
  rewarding for you. However, these unexpected results serve only to
  reinforce the integrity of your tests.

 Well put.

 To see a result that the tester didn't really want to see/present does
 suggest good things about the tester's honesty.  There was incentive
 to hide unfavorable results.

 What it still leaves quite open is just what happens when the OS has
 more than one disk drive or CPU to play with.  It's not clear what
 happens in such cases, whether FreeBSD would catch up, or be left
 further in the dust.  The traditional propaganda has been that
 there are all sorts of reasons to expect PostgreSQL on FreeBSD to run
 a bit faster than on Linux; it is a bit unexpected for the opposite to
 seem true.

AFAIK *BSD is better at handling big loads - maybe when multiple
concurrent tests are run against a Linux and a BSD box, we'll see better
results. Or not.

-- 
Tomka Gergely
And now - what will become of us without barbarians?
They were, after all, some kind of solution...


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] The results of my PostgreSQL/filesystem performance

2003-08-28 Thread Tomka Gergely
On the shining day of 2003-08-28, Ludek Finstrle wrote:

  Intelligent feedback is welcome.
 
  http://www.potentialtech.com/wmoran/postgresql.php

 Good work. But I can't find information about XFS. Do you plan to add
 this FS to the test?

http://mail.sth.sze.hu/~hsz/sql/



-- 
Tomka Gergely
And now - what will become of us without barbarians?
They were, after all, some kind of solution...


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Chris Bowlby
On Tue, 2003-08-26 at 23:59, Ron Johnson wrote:

 What a fun exercise.  Ok, let's see:
 Postgres 7.3.4
 RH AS 2.1
 12GB RAM
 motherboard with 64 bit 66MHz PCI slots
 4 - Xeon 3.0GHz (1MB cache) CPUs
 8 - 36GB 15K RPM as RAID10 on a 64 bit 66MHz U320 controller
 having 512MB cache (for database)
 2 - 36GB 15K RPM as RAID1 on a 64 bit 66MHz U320 controller
 having 512MB cache (for OS, swap, WAL files)
 1 - library tape drive plugged into the OS' SCSI controller.  I
 prefer DLT, but that's my DEC bias.
 1 - 1000 VA UPS.

 Be careful here: we've seen that with P4 Xeons that are
hyper-threaded, a system with very high disk I/O becomes sluggish and
slow. But after disabling the hyper-threading itself, our system flew.

-- 
Chris Bowlby [EMAIL PROTECTED]
Hub.Org Networking Services


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Shridhar Daithankar
On 28 Aug 2003 at 11:05, Chris Bowlby wrote:

 On Tue, 2003-08-26 at 23:59, Ron Johnson wrote:
 
  What a fun exercise.  Ok, let's see:
  Postgres 7.3.4
  RH AS 2.1
  12GB RAM
  motherboard with 64 bit 66MHz PCI slots
  4 - Xeon 3.0GHz (1MB cache) CPUs
  8 - 36GB 15K RPM as RAID10 on a 64 bit 66MHz U320 controller
  having 512MB cache (for database)
  2 - 36GB 15K RPM as RAID1 on a 64 bit 66MHz U320 controller
  having 512MB cache (for OS, swap, WAL files)
  1 - library tape drive plugged into the OS' SCSI controller.  I
  prefer DLT, but that's my DEC bias.
  1 - 1000 VA UPS.
 
  Be careful here: we've seen that with P4 Xeons that are
 hyper-threaded, a system with very high disk I/O becomes sluggish and
 slow. But after disabling the hyper-threading itself, our system flew.

Does anybody have an Opteron working? How's the performance?

Bye
 Shridhar

--
A father doesn't destroy his children.  -- Lt. Carolyn Palamas, Who Mourns
for Adonais?, stardate 3468.1.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Simple queries take forever to run

2003-08-28 Thread Michael Guerin
Stephan Szabo wrote:

On Wed, 27 Aug 2003, Michael Guerin wrote:

I'm running into some performance problems trying to execute simple
queries.
postgresql version 7.3.3
.conf params changed from defaults.
shared_buffers = 64000
sort_mem = 64000
fsync = false
effective_cache_size = 40
ex. query: select * from x where id in (select id from y);

There's an index on each table for id.  SQL Server takes 1s to return,
postgresql doesn't return at all, neither does explain analyze.

IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
generally much better (for reasonably sized subqueries) but in earlier
versions you'll probably want to convert into an EXISTS or join form.

Something else seems to be going on, even switching to an exists clause
gives much better but poor performance.
count(*) where exists clause: Postgresql 19s, SQL Server 1s
count(*) where not exists: 23.3s, SQL Server 1.5s

SQL Server runs on a dual 1.4 with 4gigs, win2k
Postgresql runs on a quad 900 with 8 gigs, sunos 5.8
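
To make Stephan's suggested rewrite concrete, here is a minimal sketch of the
three forms, using the x/y names from the query above (it assumes id is the
joining column in both tables):

-- Original form; known to run poorly on 7.3.x and earlier:
select * from x where id in (select id from y);

-- EXISTS form; usually much faster before 7.4:
select * from x where exists (select 1 from y where y.id = x.id);

-- Join form; DISTINCT guards against duplicate rows if y.id is not unique:
select distinct x.* from x join y on y.id = x.id;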


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Vivek Khera

On 27 Aug 2003, matt wrote:

 My app is likely to come under some serious load in the next 6 months,
 but the increase will be broadly predictable, so there is time to throw
 hardware at the problem.
 
 Currently I have a ~1GB DB, with the largest (and most commonly accessed
 and updated) two tables having 150,000 and 50,000 rows.

Just how big do you expect your DB to grow?  For a 1GB disk-space
database, I'd probably just splurge for an SSD hooked up either via
SCSI or FibreChannel.  Heck, up to about 5Gb or so it is not that
expensive (about $25k) and adding another 5Gb should set you back
probably another $20k.  I use an SSD from Imperial Technology
( http://www.imperialtech.com/ ) for mail spools.  My database is way
too big for my budget to put in SSD.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Simple queries take forever to run

2003-08-28 Thread Shridhar Daithankar
On 28 Aug 2003 at 10:38, Michael Guerin wrote:
 IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
 generally much better (for reasonably sized subqueries) but in earlier
 versions you'll probably want to convert into an EXISTS or join form.
 Something else seems to be going on, even switching to an exists clause 
 gives much better but poor performance.
 count(*) where exists clause: Postgresql 19s, SQL Server 1s
 count(*) where not exists:  23.3s, SQL Server 1.5s

This was with 7.4? Can you try downloading 7.4 CVS and testing with that?

 
 SQL Server runs on a dual 1.4 with 4gigs, win2k
 Postgresql runs on a quad 900 with 8 gigs, sunos 5.8

SunOS... not exactly the impala out there, but anyway, I'll refrain from
dwelling on that..

Pardon me if this is a repetition, but have you set your effective cache size?

Bye
 Shridhar

--
Nouvelle cuisine, n.: French for not enough food.
Continental breakfast, n.: English for not enough food.
Tapas, n.: Spanish for not enough food.
Dim Sum, n.: Chinese for more food than you've ever seen in your entire life.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Rod Taylor
 I need to increase the overall performance by a factor of 10, while at
 the same time the DB size increases by a factor of 50.  e.g. 3000
 inserts/updates or 25,000 selects per second, over a 25GB database with
 most used tables of 5,000,000 and 1,000,000 rows.

Ok.. I would be surprised if you needed much more actual CPU power. I
suspect they're mostly idle waiting on data -- especially with a Quad
Xeon (shared memory bus is it not?).

I'd be looking to get your hands on a large pSeries machine from IBM or
perhaps an 8-way Opteron (not that hard to come by today, should be easy
in the near future). The key is low latency ram tied to a chip rather
than a centralized bus -- a 3800 SunFire would do too ;).

Write performance won't matter very much. 3000 inserts/second isn't high
-- some additional battery backed write cache may be useful but not
overly important with enough ram to hold the complete dataset. I suspect
those are slow due to things like foreign keys -- which of course are 
selects.

 Notably, the data is very time-sensitive, so the active dataset at any
 hour is almost certainly going to be more on the order of 5GB than 25GB
 (plus I'll want all the indexes in RAM of course).

Very good. Find yourself 8GB to 12GB ram and you should be fine. In this
case, additional ram will keep the system from hitting the disk for
writes as well.

You may want to play around with checkpoints. Prevention of a checkpoint
during this hour will help prevent peaks. Be warned though, WAL will
grow very large, and recovery time should a crash occur could be
painful.

You say the data is very time sensitive -- how time sensitive? Are the
selects all based on this weeks data? A copy of the database on a second
machine (say your Quad Xeon) for static per client data would be very
useful to reduce needless load. I assume the application servers have
already cached any static global data by this point.

Finally, upgrade to 7.4. Do use prepared statements. Do limit the number
of connections any given application server is allowed (especially for
short transactions). 3 PostgreSQL processes per CPU (where the box limit
is not Disk) seems to be about right -- your OS may vary.

Pre-calculate anything you can. Are the $ amounts for a transaction
generally the same? Do you tend to have repeat clients? Great --
make your current clients' transactions a day in advance. Now you have a
pair of selects and 1 update (mark it with the time the client actually
approved it). If the client doesn't approve of the pre-calculated
transaction, throw it away at some later time.
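
Since Rod recommends prepared statements, here is a minimal sketch of the
PREPARE/EXECUTE syntax; the table and column names are hypothetical:

-- Parsed and planned once per connection...
PREPARE approve_txn (integer, numeric) AS
    UPDATE transactions SET approved_at = now()
    WHERE clientid = $1 AND amount = $2;

-- ...then executed cheaply on each request.
EXECUTE approve_txn(42, 9.99);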




Re: [PERFORM] Simple queries take forever to run

2003-08-28 Thread Stephan Szabo
On Thu, 28 Aug 2003, Michael Guerin wrote:

 Stephan Szabo wrote:

 On Wed, 27 Aug 2003, Michael Guerin wrote:
 
 
 
 I'm running into some performance problems trying to execute simple
 queries.
 
 postgresql version 7.3.3
 .conf params changed from defaults.
 shared_buffers = 64000
 sort_mem = 64000
 fsync = false
 effective_cache_size = 40
 
 ex. query: select * from x where id in (select id from y);
 
 There's an index on each table for id.  SQL Server takes 1s to return,
 postgresql doesn't return at all, neither does explain analyze.
 
 
 
 IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
 generally much better (for reasonably sized subqueries) but in earlier
 versions you'll probably want to convert into an EXISTS or join form.
 
 
 
 
 Something else seems to be going on, even switching to an exists clause
 gives much better but poor performance.
 count(*) where exists clause: Postgresql 19s, SQL Server 1s
 count(*) where not exists:  23.3s SQL Server 1.5s

What does explain analyze show for the two queries?


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread scott.marlowe
On 27 Aug 2003, matt wrote:

  You probably, more than anything, should look at some kind of 
  superfast, external storage array
 
 Yeah, I think that's going to be a given.  Low end EMC FibreChannel
 boxes can do around 20,000 IOs/sec, which is probably close to good
 enough.
 
 You mentioned using multiple RAID controllers as a boost - presumably
 the trick here is to split the various elements (WAL, tables, indexes)
 across different controllers using symlinks or suchlike?  Can I feasibly
 split the DB tables across 5 or more controllers?

I'm not sure I'd split the tables by hand right up front.  Try getting as 
many hard drives as you can afford hooked up at once, and then try 
different ways of partitioning them.  I'm guessing that making two fairly 
good sized 1+0 sets, one for data and one for WAL might be the best 
answer.

  Actually, I've seen stuff like that going on Ebay pretty cheap lately.  I 
  saw a 64 CPU E10k (366 MHz CPUs) with 64 gigs ram and 20 hard drives going 
  for $24,000 a month ago.  Put Linux or BSD on it and Postgresql should 
  fly.
 
 Jeez, and I thought I was joking about the Starfire.  Even Slowaris
 would be OK on one of them.
 
 The financial issue is that there's just not that much money in the
 micropayments game for bursty sales.  If I was doing these loads
 *continuously* then I wouldn't be working, I'd be in the Maldives :-)

$24,000 isn't that much for a server really, and if you can leverage this 
one sale to get more, then it would likely pay for itself over time.

If you have problems keeping up with load, it will be harder to get more 
customers, so you kinda wanna do this as well as possible the first time.




---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Matt Clark
 Just how big do you expect your DB to grow?  For a 1GB disk-space
 database, I'd probably just splurge for an SSD hooked up either via
 SCSI or FibreChannel.  Heck, up to about 5Gb or so it is not that
 expensive (about $25k) and adding another 5Gb should set you back
 probably another $20k.  I use an SSD from Imperial Technology
 ( http://www.imperialtech.com/ ) for mail spools.  My database is way
 too big for my budget to put in SSD.

I may well be able to split some tables that aren't used in joins into a separate DB, 
and could well use an SSD for those.

In fact two of the inserts per user interaction could be split off, and they're not 
financially important tables, so fsync=false
could be enabled for those, in which case an SSD might be overkill...

The whole thing will definitely *not* fit in an SSD for a sensible price, but the WAL 
might well!






---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Matt Clark
 Ok.. I would be surprised if you needed much more actual CPU power. I
 suspect they're mostly idle waiting on data -- especially with a Quad
 Xeon (shared memory bus is it not?).

In reality the CPUs get pegged: about 65% PG and 35% system.  But I agree that memory 
throughput and latency is an issue.

 Write performance won't matter very much. 3000 inserts/second isn't high
 -- some additional battery backed write cache may be useful but not
 overly important with enough ram to hold the complete dataset. I suspect
 those are slow due to things like foreign keys -- which of course are
 selects.

3000 inserts/sec isn't high when they're inside one transaction, but if each is inside 
its own transaction then that's 3000
commits/second.
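
One way to sidestep the commit rate - a minimal sketch assuming a hypothetical
payments table - is to batch the per-user inserts into one transaction, so a
group of inserts shares a single WAL flush instead of paying one per row:

BEGIN;
INSERT INTO payments (userid, amount) VALUES (1, 9.99);
INSERT INTO payments (userid, amount) VALUES (2, 4.50);
-- ... remaining inserts for this batch ...
COMMIT;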

 case, additional ram will keep the system from hitting the disk for
 writes as well.

How does that work?

 You may want to play around with checkpoints. Prevention of a checkpoint
 during this hour will help prevent peaks. Be warned though, WAL will
 grow very large, and recovery time should a crash occur could be
 painful.

Good point.  I'll have a think about that.




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[PERFORM] performance of foreign key constraints

2003-08-28 Thread teknokrat
I have a table that has about 20 foreign key constraints on it. I think 
this is a bit excessive and am considering removing them ( they are all 
related to the same table and I don't think there is much chance of any 
integrity violations ). Would this improve performance or not?

thanks

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Simple queries take forever to run

2003-08-28 Thread Michael Guerin
Stephan Szabo wrote:

On Thu, 28 Aug 2003, Michael Guerin wrote:

Stephan Szabo wrote:

On Wed, 27 Aug 2003, Michael Guerin wrote:

I'm running into some performance problems trying to execute simple
queries.
postgresql version 7.3.3
.conf params changed from defaults.
shared_buffers = 64000
sort_mem = 64000
fsync = false
effective_cache_size = 40
ex. query: select * from x where id in (select id from y);

There's an index on each table for id.  SQL Server takes 1s to return,
postgresql doesn't return at all, neither does explain analyze.

IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
generally much better (for reasonably sized subqueries) but in earlier
versions you'll probably want to convert into an EXISTS or join form.

Something else seems to be going on, even switching to an exists clause
gives much better but poor performance.
count(*) where exists clause: Postgresql 19s, SQL Server 1s
count(*) where not exists: 23.3s, SQL Server 1.5s

What does explain analyze show for the two queries?

explain analyze  select count(*) from tbltimeseries where not exists(select
uniqid  from tblobjectname where timeseriesid = uniqid);
Aggregate  (cost=5681552.18..5681552.18 rows=1 width=0) (actual
time=22756.64..22756.64 rows=1 loops=1)
  ->  Seq Scan on tbltimeseries  (cost=0.00..5680051.34 rows=600336
width=0) (actual time=22.06..21686.78 rows=1200113 loops=1)
        Filter: (NOT (subplan))
        SubPlan
          ->  Index Scan using idx_objectname on tblobjectname
(cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
loops=1200673)
                Index Cond: ($0 = uniqid)
Total runtime: 22756.83 msec
(7 rows)

explain analyze  select count(*) from tbltimeseries where exists(select
uniqid  from tblobjectname where timeseriesid = uniqid);
Aggregate  (cost=5681552.18..5681552.18 rows=1 width=0) (actual
time=19558.77..19558.77 rows=1 loops=1)
  ->  Seq Scan on tbltimeseries  (cost=0.00..5680051.34 rows=600336
width=0) (actual time=0.21..19557.73 rows=560 loops=1)
        Filter: (subplan)
        SubPlan
          ->  Index Scan using idx_objectname on tblobjectname
(cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
loops=1200673)
                Index Cond: ($0 = uniqid)
Total runtime: 19559.04 msec
(7 rows)





---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Anders K. Pedersen
Shridhar Daithankar wrote:
On 28 Aug 2003 at 1:07, Anders K. Pedersen wrote:
We're running a set of Half-Life based game servers that lookup user 
privileges from a central PostgreSQL 7.3.4 database server (I recently 
ported the MySQL code in Adminmod to PostgreSQL to be able to do this).

The data needed by the game servers are combined from several different 
tables, so we have some views set up to provide the data in the format 
needed.

Currently there's only a few users in the database for testing purposes, 
and most of the time the user lookup's take 2-3 ms (I have syslog'ing of 
queries and duration turned on), but several times per hour the duration 
for one of the queries is 2-3 seconds (1000 times larger), while the 
surrounding lookups take the usual 2-3 ms.


Check vmstat during the same period to see if it is syncing at that point, as
Tom suggested.

I've been running a vmstat 1 logging process for a while now, and the
sample below shows what happened around one of these spikes - at
18:18:03 specifically (actually there were two 1-second-long queries
that finished at 18:18:03).

Thu Aug 28 18:17:53 2003  0  0  0  40904   4568  22288 404352   0   0    12     0  181   362  2   1  97
Thu Aug 28 18:17:54 2003  0  0  0  40904   4580  22260 404380   0   0   128     0  205   330  2   3  95
Thu Aug 28 18:17:55 2003  0  0  2  40904   4576  22264 404380   0   0     0   284  224   127  0   1  99
Thu Aug 28 18:17:56 2003  0  0  2  40904   5008  22268 404512   0   0   128   728  571   492  2   3  95
Thu Aug 28 18:17:57 2003  0  0  1  40904   5000  22276 404512   0   0     0   120  201   181  1   0  99
Thu Aug 28 18:17:58 2003  0  0  1  40904   4936  22284 404528   0   0     8     0 1147  2204 12   3  85
Thu Aug 28 18:17:59 2003  0  0  0  40904   4784  22304 404660   0   0   148     0 2112  3420  2   3  95
Thu Aug 28 18:18:00 2003  1  1  3  40904   4760  22324 404664   0   0    20   456 2374  3277  2   1  97
Thu Aug 28 18:18:01 2003  0  2 10  40904   4436  22000 401456   0   0   144   540  510   457 11   6  83
Thu Aug 28 18:18:02 2003  1  1  2  40904   8336  22032 401512   0   0    68   676 1830  2540  4   3  93
Thu Aug 28 18:18:04 2003  1  0  1  40904   8160  22052 401664   0   0   140   220 2308  3253  2   3  95
Thu Aug 28 18:18:05 2003  0  0  1  40904   7748  22064 402064   0   0   288     0 1941  2856  1   3  96
Thu Aug 28 18:18:06 2003  0  0  3  40904   6704  22064 403100   0   0   496   992 2326  3510  0   5  95
Thu Aug 28 18:18:07 2003  1  0  0  40904   6324  22088 402716   0   0   260   188 1984  2927 11   4  85
Thu Aug 28 18:18:08 2003  0  0  0  40904   6920  22088 402828   0   0    72     0  419  1473 17   5  78
Thu Aug 28 18:18:09 2003  0  0  0  40904   6784  22088 402964   0   0   128     0  235   476  2   1  97
Thu Aug 28 18:18:10 2003  0  0  1  40904   6404  22088 402980   0   0     0     0  343   855 14   2  84

As this shows, some disk I/O and an increased number of interrupts and
context switches were taking place at this time, and the same happens at
the time of all the other long queries I examined. However, vmstat
also shows this pattern at a lot of other times, when the queries
aren't affected by it.

Are you using pooled connections? If yes, you could shorten the life of a
connection and force making a new connection every 10-15 minutes, say. That
would avoid I/O avalanches of the end-of-the-hour type.
I'm not quite sure what you mean by pooled connections. Each game
server has one connection to the PostgreSQL server, which is opened
when the server is first started, and then never closed (until the game
server terminates, but that only happens once in several days).

Regards,
Anders K. Pedersen
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Al Hulaton
http://www.potentialtech.com/wmoran/postgresql.php
--
Bill Moran
Potential Technologies
http://www.potentialtech.com
Adding my voice to the many, thanks for sharing your results Bill. Very 
instructive.

--
Best,
Al Hulaton|  Sr. Account Engineer  |  Command Prompt, Inc.
503.222.2783  |  [EMAIL PROTECTED]
Home of Mammoth PostgreSQL and 'Practical PostgreSQL'
Managed PostgreSQL, Linux services and consulting
Read and Search O'Reilly's 'Practical PostgreSQL' at
http://www.commandprompt.com
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] performance of foreign key constraints

2003-08-28 Thread teknokrat
Stephan Szabo wrote:

On Thu, 28 Aug 2003, teknokrat wrote:


I have a table that has about 20 foreign key constraints on it. I think
this is a bit excessive and am considering removing them ( they are all
related to the same table and I don't think there is much chance of any
integrity violations ). Would this improve performance or not?


It depends on your frequency of inserts/updates to the table with the
constraint and the frequency of update/delete to the table(s) being
refered to. My guess is probably.  You may wish to leave some of the
constraints (decide which are the most important), but 20 does seem a bit
excessive in general.
The references are all to the same table i.e. they are employee ids, so 
leaving some and not others would make no sense. The table has no 
deletes, small amount of inserts and moderate amount of updates. However 
there are many selects and its their performance I am most concerned with.

thanks

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Sean Chittenden
 What it still leaves quite open is just what happens when the OS has
 more than one disk drive or CPU to play with.  It's not clear what
 happens in such cases, whether FreeBSD would catch up, or be left
 further in the dust.  The traditional propaganda has been that
 there are all sorts of reasons to expect PostgreSQL on FreeBSD to
 run a bit faster than on Linux; it is a bit unexpected for the
 opposite to seem true.

Let me nip this in the bud before people run away with ideas that
aren't correct.  When the tests were performed on FreeBSD 5.1 and
Linux, the hard drives were running UDMA.  When running 4.8, for some
reason his drives settled in on PIO mode:

ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
241 tn 12 sn 44) falling back to PIO mode

The benchmarks were hardly conclusive, as UDMA runs vastly faster than
PIO.  Until we hear back as to whether cables were jarred loose
between the tests, or whether something else changed, I'd hardly
consider these conclusive tests, given that PIO/UDMA is apples to
oranges in terms of speed.  I fully expect that FreeBSD 4.8 will
perform at least faster than 5.1 (5.x is still being unwound from
Giant), but it should outperform Linux as well if industry experience
is any indicator.

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Anders K. Pedersen
Tom Lane wrote:
Anders K. Pedersen [EMAIL PROTECTED] writes:

Currently there's only a few users in the database for testing purposes, 
and most of the time the user lookup's take 2-3 ms (I have syslog'ing of 
queries and duration turned on), but several times per hour the duration 
for one of the queries is 2-3 seconds (1000 times larger), while the 
surrounding lookups take the usual 2-3 ms.


One thing that comes to mind is that the slow query could be occurring
at the same time as a checkpoint, or some other cycle-chewing background
operation.  It's not clear why a checkpoint would slow things down that
much, though.  Anyway I'd suggest looking for such activities; once we
know if that's the issue or not, we can make some progress.
One of my colleagues suggested looking for checkpoints as well; I 
searched the log, but only the following messages turned up:

Aug 11 15:21:04 gs1 postgres[5447]: [2] LOG:  checkpoint record is at 
0/80193C
Aug 23 13:59:51 gs1 postgres[16451]: [2] LOG:  checkpoint record is at 
0/201EB74
Aug 25 02:48:17 gs1 postgres[1059]: [2] LOG:  checkpoint record is at 
0/2B787D0

Currently there are only relatively few changes to the database - one
INSERT every time one of our game admins executes an administrative
command (like ban or kick), and this happens at most 10 times per hour.
As I understand checkpoints, this should mean that they aren't
happening very often, and when they do, they should be able to finish
almost immediately.

Regards,
Anders K. Pedersen
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-08-28 Thread Rod Taylor
On Thu, 2003-08-28 at 12:37, Matt Clark wrote:
  Ok.. I would be surprised if you needed much more actual CPU power. I
  suspect they're mostly idle waiting on data -- especially with a Quad
  Xeon (shared memory bus is it not?).
 
 In reality the CPUs get pegged: about 65% PG and 35% system.  But I agree that 
 memory throughput and latency is an issue.

Is the "system" time in this case disk activity or process switches?

Usually the 65% includes the CPU waiting on a request for data from main
memory. Since you will be moving a lot of data through the CPU, the L1 /
L2 cache doesn't help too much (even large cache), but low latency high 
bandwidth memory will make a significant difference.  CPUs not having to
wait on other CPUs doing a memory fetch will make an even larger
difference (dedicated memory bus per CPU).

Good memory is the big ticket item. Sun CPUs are not better than Intel
CPUs, for simple DB interaction. It's the additional memory bandwidth
that makes them shine.  Incidentally, Suns are quite slow with PG for
calculation intensive work on a small dataset.

  Write performance won't matter very much. 3000 inserts/second isn't high
  -- some additional battery backed write cache may be useful but not
  overly important with enough ram to hold the complete dataset. I suspect
  those are slow due to things like foreign keys -- which of course are
  selects.
 
 3000 inserts/sec isn't high when they're inside one transaction, but if each is 
 inside its own transaction then that's 3000
 commits/second.

Still not anything to concern yourself with.  WAL on battery backed
write cache (with a good controller) will more than suffice -- boils
down to the same as if fsync was disabled. You might want to try putting
it onto its own controller, but I don't think you will see much of a
change.  20k WAL operations / sec would be something to worry about.

  case, additional ram will keep the system from hitting the disk for
  writes as well.
 
 How does that work?

Simple.  Your OS will buffer writes in memory until they are required to
hit disk (fsync or similar).  Modify the appropriate sysctl to inform
the OS it can use more than 10% (10% is the FreeBSD default I believe)
of the memory for writes.  Buffering 4GB of work in memory (WAL logs
will ensure this is crash safe) will nearly eliminate I/O.

When the OS is no longer busy, it will flush the writes from RAM back
to disk. Visibly, there is no change to the user aside from a speed
increase.

  You may want to play around with checkpoints. Prevention of a checkpoint
  during this hour will help prevent peaks. Be warned though, WAL will
  grow very large, and recovery time should a crash occur could be
  painful.
 
 Good point.  I'll have a think about that.

This is more important with a larger buffer. A checkpoint informs the OS
to dump the buffer to disk so it can guarantee it hit hardware (thus
allowing PG to remove / recycle WAL files).
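
For reference, the checkpoint spacing Rod describes is controlled by two
settings in that era's postgresql.conf; the values below are illustrative
assumptions, not recommendations:

# Stretch the interval between checkpoints; WAL grows accordingly.
checkpoint_segments = 64     # WAL segments between forced checkpoints (default 3)
checkpoint_timeout = 3600    # seconds between timed checkpoints (default 300)

The trade-off is exactly the one noted above: fewer checkpoint spikes, but
more WAL on disk and a longer recovery should a crash occur.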


I do think your best bet is to segregate the DB.  Read / write, by user
location, first 4 digits of the credit card; any such split will make for a
much better system.

Keep a master with all of the data that can take the full week to
process it.




Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Rod Taylor
 With regards to other jobs on the server, there is a MySQL server on it 
 as well, which from time to time has some multi-second queries generated 
 from a webserver also on this host, but the MySQL is running with nice 
 10 (PostgreSQL isn't nice'd).

Do those MySQL queries hit disk hard?

I've never seen PostgreSQL have hiccups like you describe when running on
a machine by itself.  I have experienced similar issues when another
process (cron job in my case) caused brief swapping to occur.




Re: [PERFORM] Simple queries take forever to run

2003-08-28 Thread Stephan Szabo
On Thu, 28 Aug 2003, Michael Guerin wrote:

 Stephan Szabo wrote:

 On Thu, 28 Aug 2003, Michael Guerin wrote:

 Stephan Szabo wrote:

 On Wed, 27 Aug 2003, Michael Guerin wrote:

 I'm running into some performance problems trying to execute simple
 queries.

 postgresql version 7.3.3
 .conf params changed from defaults.
 shared_buffers = 64000
 sort_mem = 64000
 fsync = false
 effective_cache_size = 40

 ex. query: select * from x where id in (select id from y);

 There's an index on each table for id.  SQL Server takes 1s to return,
 postgresql doesn't return at all, neither does explain analyze.

 IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
 generally much better (for reasonably sized subqueries) but in earlier
 versions you'll probably want to convert into an EXISTS or join form.

 Something else seems to be going on, even switching to an exists clause
 gives much better but poor performance.
 count(*) where exists clause: Postgresql 19s, SQL Server 1s
 count(*) where not exists: 23.3s, SQL Server 1.5s

 What does explain analyze show for the two queries?

 explain analyze  select count(*) from tbltimeseries where not exists(select
 uniqid  from tblobjectname where timeseriesid = uniqid);
 Aggregate  (cost=5681552.18..5681552.18 rows=1 width=0) (actual
 time=22756.64..22756.64 rows=1 loops=1)
   ->  Seq Scan on tbltimeseries  (cost=0.00..5680051.34 rows=600336
 width=0) (actual time=22.06..21686.78 rows=1200113 loops=1)
         Filter: (NOT (subplan))
         SubPlan
           ->  Index Scan using idx_objectname on tblobjectname
 (cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
 loops=1200673)
                 Index Cond: ($0 = uniqid)
 Total runtime: 22756.83 msec
 (7 rows)

Hmm... I'd thought that it had options for a better plan than that.

What do things like:

explain analyze select count(distinct timeseriesid) from tbltimeseries,
 tblobjectname where timeseriesid=uniquid;

and

explain analyze select count(distinct timeseriesid) from
 tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
 where uniqid is null;

give you?


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] performance of foreign key constraints

2003-08-28 Thread Stephan Szabo

On Thu, 28 Aug 2003, teknokrat wrote:

 Stephan Szabo wrote:

  On Thu, 28 Aug 2003, teknokrat wrote:
 
 
 I have a table that has about 20 foreign key constraints on it. I think
 this is a bit excessive and am considering removing them ( they are all
 related to the same table and I don't think there is much chance of any
 integrity violations ). Would this improve performance or not?
 
 
  It depends on your frequency of inserts/updates to the table with the
  constraint and the frequency of update/delete to the table(s) being
  refered to. My guess is probably.  You may wish to leave some of the
  constraints (decide which are the most important), but 20 does seem a bit
  excessive in general.
 

 The references are all to the same table i.e. they are employee ids, so
 leaving some and not others would make no sense. The table has no
 deletes, small amount of inserts and moderate amount of updates. However
 there are many selects and its their performance I am most concerned with.

The foreign keys should only really affect insert/update/delete
performance.  If you're using 7.3.4 (I think) then updates to the fk table
that don't change any of the keys should be relatively cheap.  I'd be much
more worried if you had any changes to the referenced employee table that
might change the key because that could get relatively expensive.
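
A minimal sketch of how one might measure the cost of a single constraint,
with hypothetical table and constraint names (a worklog table referencing
employee); since DDL is transactional in PostgreSQL, ROLLBACK restores the
constraint afterwards:

BEGIN;
ALTER TABLE worklog DROP CONSTRAINT worklog_employee_fk;
-- time a representative batch of inserts/updates here, e.g.:
UPDATE worklog SET hours = hours WHERE employee_id < 100;
ROLLBACK;  -- discards the drop, putting the constraint back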



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Anders K. Pedersen
Rod Taylor wrote:
With regards to other jobs on the server, there is a MySQL server on it 
as well, which from time to time has some multi-second queries generated 
from a webserver also on this host, but the MySQL is running with nice 
10 (PostgreSQL isn't nice'd).
Do those MySQL queries hit disk hard?
I guess they may be able to do so - the MySQL database is 450 MB, the
server has 512 MB RAM, and some of the queries pretty much summarize
everything in the database.

However, I just cross-referenced the access logs from the webserver with
the duration logs, and although some of the spikes did happen while
there would have been some MySQL activity (I can't tell for sure if it
was simple queries or the long ones), other spikes happened without any
website activity in the surrounding minutes.

I've never seen PostgreSQL have hiccups like you describe when running on
a machine by itself.  I have experienced similar issues when another
process (cron job in my case) caused brief swapping to occur.
OK. I may have to try to put the database on a separate server.

Regards,
Anders K. Pedersen
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] The results of my PostgreSQL/filesystem performance

2003-08-28 Thread Bill Moran
I need to step in and do 2 things:

First, apologize for posting inaccurate test results.

Second, verify that Sean is absolutely correct.  FreeBSD 4.8 was accessing
the drives in PIO mode, which is significantly lousier than DMA, which
RedHat was able to use.  As a result, the tests are unreasonably skewed
in favor of Linux.

The only thing that the currently posted results prove is that Linux is
better at dealing with crappy hardware than BSD (which I feel we already
knew).

I did some rescrounging, and found some newer hardware stuffed in a
corner that I had forgotten was even around.  I am currently re-running
the tests and will post new results as soon as there are enough to
be interesting to talk about.

In an attempt to avoid the same mistake, I did a timed test with dd(1)
to a raw partition on both Linux and FreeBSD to ensure that both systems
are able to access the hardware at more or less the same speed.  The
results of this will be included.

I'm also gathering considerably more information about the state of
the system during the tests, which should answer a number of questions
I've been getting.

To the many people who asked questions like "why not try filesystem x
on distribution y" and similar questions, the answer in most cases is
time.  I've pared the tests down some so they run faster, and I'm hoping
to be able to run more tests on more combinations of configurations as
a result.  Also, I never intended for anyone to assume that I was _done_
testing, just that I had enough results for folks to talk about.

I'll post again when I have enough results to be interesting; until then,
I apologize again for the inaccurate results.
Sean Chittenden wrote:
What it still leaves quite open is just what happens when the OS has
more than one disk drive or CPU to play with.  It's not clear what
happens in such cases, whether FreeBSD would catch up, or be left
further in the dust.  The traditional propaganda has been that
there are all sorts of reasons to expect PostgreSQL on FreeBSD to
run a bit faster than on Linux; it is a bit unexpected for the
opposite to seem true.
Let me nip this in the bud before people run away with ideas that
aren't correct.  When the tests were performed on FreeBSD 5.1 and
Linux, the hard drives were running UDMA.  When running 4.8, for some
reason his drives settled in on PIO mode:
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
241 tn 12 sn 44) falling back to PIO mode
The benchmarks were hardly conclusive, as UDMA runs vastly faster than
PIO.  Until we hear back as to whether cables were jarred loose
between the tests, or whether something else changed, I'd hardly
consider these conclusive tests, given that PIO/UDMA is apples to
oranges in terms of speed.  I fully expect that FreeBSD 4.8 will
perform at least faster than 5.1 (5.x is still being unwound from
Giant), but it should outperform Linux as well if industry experience
is any indicator.
--
Bill Moran
Potential Technologies
http://www.potentialtech.com
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Sean Chittenden
 I need to step in and do 2 things:

Thanks for posting that.  Let me know if you have any questions while
doing your testing.  I've found that using 16K blocks on FreeBSD
results in about an 8% speedup in writes to the database, fwiw.

I'm likely going to make this the default for PostgreSQL on FreeBSD
starting with 7.4 (just posted something to -hackers about this).  If
you'd like to do this in your testing, just apply the following patch.

Right now PostgreSQL defaults to 8K blocks, but FreeBSD uses 16K
blocks, which means that currently reading two blocks of data in PG is
two read calls to the OS: one reads 16K of data off disk and returns
the 1st page, and the 2nd call pulls the 2nd block from the FS cache.
Making things 16K avoids the need for the 2nd system call, which is
where the performance difference is coming from, afaict.  -sc

-- 
Sean Chittenden
Index: src/include/pg_config_manual.h
===
RCS file: /home/ncvs/pgsql/pgsql-server/src/include/pg_config_manual.h,v
retrieving revision 1.5
diff -u -r1.5 pg_config_manual.h
--- src/include/pg_config_manual.h  4 Aug 2003 00:43:29 -   1.5
+++ src/include/pg_config_manual.h  27 Aug 2003 17:40:12 -
@@ -23,7 +23,7 @@
  *
  * Changing BLCKSZ requires an initdb.
  */
-#define BLCKSZ 8192
+#define BLCKSZ 16384
 
 /*
  * RELSEG_SIZE is the maximum number of blocks allowed in one disk

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] opinion on RAID choice

2003-08-28 Thread Vivek Khera
I just ran a handful of tests on a 14-disk array on a SCSI hardware
RAID card.

From some quickie benchmarks using the bonnie++ benchmark, it appears
that the RAID5 across all 14 disks is a bit faster than RAID50 and
noticeably faster than RAID10...

Sample numbers for a 10Gb file (speed in Kbytes/second)

                   RAID5    RAID50   RAID10
sequential write:  39728    37568    23533
read/write file:   13831    13289    11400
sequential read:   52184    51529    54222


Hardware is a Dell 2650 dual Xeon, 4GB Ram, PERC3/DC RAID card with
14 external U320 SCSI 15kRPM drives.  Software is FreeBSD 4.8 with the
default newfs settings.

The RAID drives were configured with 32k stripe size.  From informal
tests it doesn't seem to make much difference in the bonnie++
benchmark to go with 64k stripe on the RAID10 (didn't test it with
RAID5 or RAID50).  They say use larger stripe size for sequential
access, and lower for random access.

My concern is speed.  Any RAID config on this system has more disk
space than I will need for a LOOONG time.

My Postgres load is a heavy mix of select/update/insert.  ie, it is a
very actively updated and read database.

The conventional wisdom has been to use RAID10, but with 14 disks, I'm
kinda leaning toward RAID50 or perhaps just RAID5.

Has anyone else done similar tests of different RAID levels?  What
were your conclusions?

Raw output from bonnie++ available upon request.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Vivek Khera
 SC == Sean Chittenden [EMAIL PROTECTED] writes:

 I need to step in and do 2 things:
SC Thanks for posting that.  Let me know if you have any questions while
SC doing your testing.  I've found that using 16K blocks on FreeBSD
SC results in about an 8% speedup in writes to the database, fwiw.

Where/how does one set this?  In postgresql.conf or on the file system
or during compilation of postgres?  I'm on FreeBSD 4.8 still.

I've got a box right now on which I'm comparing the speed merits of
hardware RAID10, RAID5, and RAID50 using a filesystem benchmark
utility (bonnie++).  If I have time I'm gonna try different striping
block sizes.  Right now I'm using 32k byte stripe size.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Vivek Khera
 SC == Sean Chittenden [EMAIL PROTECTED] writes:

 I need to step in and do 2 things:
SC Thanks for posting that.  Let me know if you have any questions while
SC doing your testing.  I've found that using 16K blocks on FreeBSD
SC results in about an 8% speedup in writes to the database, fwiw.

ok.. ignore my prior request about how to set that... i missed you had
included a patch.

Any recommendations on newfs parameters for an overly large file
system used solely for Postgres?  Over 100Gb (with raid 10) or over
200Gb (with raid 5)?


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] The results of my PostgreSQL/filesystem performance

2003-08-28 Thread scott.marlowe
On Thu, 28 Aug 2003, Sean Chittenden wrote:

  What it still leaves quite open is just what happens when the OS has
  more than one disk drive or CPU to play with.  It's not clear what
  happens in such cases, whether FreeBSD would catch up, or be left
  further in the dust.  The traditional propaganda has been that
  there are all sorts of reasons to expect PostgreSQL on FreeBSD to
  run a bit faster than on Linux; it is a bit unexpected for the
  opposite to seem true.
 
 Let me nip this in the bud before people run away with ideas that
 aren't correct.  When the tests were performed on FreeBSD 5.1 and
 Linux, the hard drives were running UDMA.  When running 4.8, for some
 reason his drives settled in on PIO mode:
 
 ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
 241 tn 12 sn 44) retrying
 ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
 241 tn 12 sn 44) retrying
 ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
 241 tn 12 sn 44) retrying
 ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 
 241 tn 12 sn 44) falling back to PIO mode
 
 The benchmarks were hardly conclusive, as UDMA runs vastly faster than
 PIO.  Until we hear back as to whether cables were jarred loose
 between the tests, or whether something else changed, I'd hardly
 consider these conclusive tests, given that PIO/UDMA is apples to
 oranges in terms of speed.  I fully expect that FreeBSD 4.8 will
 perform at least faster than 5.1 (5.x is still being unwound from
 Giant), but it should outperform Linux as well if industry experience
 is any indicator.

Plus, in most real servers you're gonna be running SCSI, so it might be 
nice to see a test with a good SCSI controller (Symbios 875 is a nice 
choice) and a couple hard drives, one each for WAL and data.  This would 
more closely resemble actual usage and there are likely to be fewer issues 
with things like UDMA versus PIO on SCSI.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Simple queries take forever to run

2003-08-28 Thread Michael Guerin
Stephan Szabo wrote:

On Thu, 28 Aug 2003, Michael Guerin wrote:

Stephan Szabo wrote:

On Thu, 28 Aug 2003, Michael Guerin wrote:

Stephan Szabo wrote:

On Wed, 27 Aug 2003, Michael Guerin wrote:

I'm running into some performance problems trying to execute simple
queries.
postgresql version 7.3.3
.conf params changed from defaults.
shared_buffers = 64000
sort_mem = 64000
fsync = false
effective_cache_size = 40
ex. query: select * from x where id in (select id from y);

There's an index on each table for id.  SQL Server takes 1s to return,
postgresql doesn't return at all, neither does explain analyze.

IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
generally much better (for reasonably sized subqueries) but in earlier
versions you'll probably want to convert into an EXISTS or join form.

Something else seems to be going on, even switching to an exists clause
gives much better but poor performance.
count(*) where exists clause: Postgresql 19s, SQL Server 1s
count(*) where not exists: 23.3s, SQL Server 1.5s

What does explain analyze show for the two queries?

explain analyze  select count(*) from tbltimeseries where not exists(select
uniqid  from tblobjectname where timeseriesid = uniqid);
Aggregate  (cost=5681552.18..5681552.18 rows=1 width=0) (actual
time=22756.64..22756.64 rows=1 loops=1)
  ->  Seq Scan on tbltimeseries  (cost=0.00..5680051.34 rows=600336
width=0) (actual time=22.06..21686.78 rows=1200113 loops=1)
        Filter: (NOT (subplan))
        SubPlan
          ->  Index Scan using idx_objectname on tblobjectname
(cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
loops=1200673)
                Index Cond: ($0 = uniqid)
Total runtime: 22756.83 msec
(7 rows)

Hmm... I'd thought that it had options for a better plan than that.

What do things like:

explain analyze select count(distinct timeseriesid) from tbltimeseries,
tblobjectname where timeseriesid=uniquid;

and

explain analyze select count(distinct timeseriesid) from
tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
where uniqid is null;

give you?

much better performance:

explain analyze select count(distinct timeseriesid) from tbltimeseries,
tblobjectname where timeseriesid=uniquid;
Aggregate  (cost=7384.03..7384.03 rows=1 width=8) (actual time=668.15..668.15
rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..7380.83 rows=1282 width=8) (actual
time=333.31..666.13 rows=561 loops=1)
        ->  Seq Scan on tblobjectname  (cost=0.00..33.82 rows=1282 width=4)
(actual time=0.05..4.98 rows=1282 loops=1)
        ->  Index Scan using xx on tbltimeseries  (cost=0.00..5.72 rows=1
width=4) (actual time=0.51..0.51 rows=0 loops=1282)
              Index Cond: (tbltimeseries.timeseriesid = outer.uniqid)
Total runtime: 669.61 msec
(6 rows)

explain analyze select count(distinct timeseriesid) from
tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
where uniqid is null;
Aggregate  (cost=59144.19..59144.19 rows=1 width=8) (actual
time=12699.47..12699.47 rows=1 loops=1)
  ->  Hash Join  (cost=37.02..56142.51 rows=1200673 width=8) (actual
time=7.41..6376.12 rows=1200113 loops=1)
        Hash Cond: (outer.timeseriesid = inner.uniqid)
        Filter: (inner.uniqid IS NULL)
        ->  Seq Scan on tbltimeseries  (cost=0.00..44082.73 rows=1200673
width=4) (actual time=0.01..3561.61 rows=1200673 loops=1)
        ->  Hash  (cost=33.82..33.82 rows=1282 width=4) (actual
time=4.84..4.84 rows=0 loops=1)
              ->  Seq Scan on tblobjectname  (cost=0.00..33.82 rows=1282
width=4) (actual time=0.04..2.84 rows=1282 loops=1)
Total runtime: 12699.76 msec
(8 rows)






---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] The results of my PostgreSQL/filesystem performance tests

2003-08-28 Thread Sean Chittenden
  I need to step in and do 2 things:
 SC Thanks for posting that.  Let me know if you have any questions while
 SC doing your testing.  I've found that using 16K blocks on FreeBSD
 SC results in about an 8% speedup in writes to the database, fwiw.
 
 ok.. ignore my prior request about how to set that... i missed you
 had included a patch.
 
 Any recommendations on newfs parameters for an overly large file
 system used solely for Postgres?  Over 100Gb (with raid 10) or over
 200Gb (with raid 5)?

Nope, you'll have to test and see.  If you find something that works,
however, let me know.  -sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] pgsql inserts problem

2003-08-28 Thread Tarhon-Onu Victor

Hi,

I have a (big) problem with postgresql when making lots of
inserts per second. I have a tool that is generating an output of ~2500
lines per second. I wrote a script in Perl that opens a pipe to that
tool, reads every line, and inserts the data.

I tried both committed and uncommitted variants (the inserts
were committed every 60 seconds), and the problem persists.

The problem is that only ~15% of the lines are inserted into
the database. The same script modified to insert the same data into a
similar table created in a MySQL database inserts 100%.

I also dropped the indexes on various columns, just to make sure
that the overhead is not too big (but I also need those indexes because
I'll make lots of SELECTs from that table).

I tried both variants: connecting to a host and connecting locally
(through the postgresql server's socket, /tmp/.s.PGSQL.5432).

Where could the problem be?

I'm using postgresql 7.4 devel snapshot 20030628 and 20030531. 
Some of the settings are:

shared_buffers = 520
max_locks_per_transaction = 128
wal_buffers = 8 
max_fsm_relations = 3
max_fsm_pages = 482000 
sort_mem = 131072
vacuum_mem = 131072
effective_cache_size = 10000
random_page_cost = 2

-- 
Any views or opinions presented within this e-mail are solely those of
the author and do not necessarily represent those of any company, unless
otherwise expressly stated.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] pgsql inserts problem

2003-08-28 Thread Bruno Wolff III
On Wed, Aug 27, 2003 at 15:50:32 +0300,
  Tarhon-Onu Victor [EMAIL PROTECTED] wrote:
 
   The problem is that only ~15% of the lines are inserted into
 the database. The same script modified to insert the same data into a
 similar table created in a MySQL database inserts 100%.

Did you check the error status for the records that weren't entered?

My first guess is that you have some bad data you are trying to insert.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] pgsql inserts problem

2003-08-28 Thread Christopher Kings-Lynne
  The problem is that only ~15% of the lines are inserted into
  the database. The same script modified to insert the same data into a
  similar table created in a MySQL database inserts 100%.

 Did you check the error status for the records that weren't entered?

 My first guess is that you have some bad data you are trying to insert.

I wouldn't be surprised; MySQL will just insert a zero instead of failing in
most cases :P

Chris


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] pgsql inserts problem

2003-08-28 Thread Richard Huxton
On Wednesday 27 August 2003 13:50, Tarhon-Onu Victor wrote:

 shared_buffers = 520
 max_locks_per_transaction = 128
 wal_buffers = 8
 max_fsm_relations = 3
 max_fsm_pages = 482000
 sort_mem = 131072
 vacuum_mem = 131072
 effective_cache_size = 10000
 random_page_cost = 2

Slightly off-topic, but I think your tuning settings are a bit out. You've got 
4MB allocated to shared_buffers but 128MB allocated to sort_mem? And only 
80MB to effective_cache_size? Your settings might be right, but you'd need a 
very strange set of circumstances.
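
For scale, a sketch of more conventional proportions, assuming a machine with
roughly 1GB of RAM (the values are illustrative assumptions, not a
recommendation for this particular box):

shared_buffers = 4000          # ~32MB, rather than 4MB
sort_mem = 8192                # 8MB per sort, rather than 128MB
effective_cache_size = 65536   # ~512MB, roughly what the OS is caching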

As for PG silently discarding inserts, your best bet might be to write a short 
Perl script to reproduce the problem. Without that, people are likely to be 
sceptical - if PG tended to do this sort of thing, none of us would use it.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])