Re: [PERFORM] Best OS for Postgres 8.2

2007-05-13 Thread Andrew McMillan
On Tue, 2007-05-08 at 23:31 -0400, Greg Smith wrote:
 
 My issue wasn't with the idea, it was with the implementation.  When I 
 have my newbie hat on, it adds a layer of complexity that isn't needed for 
 simple installs.

I find it very hard to agree with that.

As a newbie I install postgresql and have a database server installed,
and operating.  The fact that the DB files are installed somewhere
like /var/lib/postgresql/8.1/main is waaay beyond newbie.

At that point I can createdb or createuser, but I _do_ _not_ need to
know anything about the cluster stuff until there is more than one DB on
the machine.

The Debian wrappers all default appropriately for the single-cluster
case, so having a single cluster has added _no_ perceivable complexity
for a newbie (as it should).

If you have a second cluster, whether it's the same Pg version or not,
things necessarily start to get complicated.  OTOH I haven't had any
problem explaining to people that the --cluster option applies, and
there are sane ways to make that default to a reasonable thing as well.
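
For what it's worth, a quick sketch (from memory, so treat the details as
approximate - the second cluster's name here is made up) of how that looks
in practice:

  pg_lsclusters                            # list the clusters postgresql-common knows about
  pg_createcluster 8.2 secondary           # create a second 8.2 cluster alongside "main"
  createdb --cluster 8.2/secondary mydb    # the wrapped client tools accept --cluster
  psql --cluster 8.2/secondary mydb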

All in all I think that the Debian scripts are excellent.  I'm sure
there are improvements that could be made, but overall they don't get in
the way, they do the right thing in the minimal case, and they give the
advanced user a lot more choices about multiple DB instances on the same
machine.

Cheers,
Andrew McMillan.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
 Open Source: the difference between trust and antitrust
-





Re: [PERFORM] Question about memory allocations

2007-04-13 Thread Andrew McMillan
On Tue, 2007-04-10 at 15:28 -0400, Steve wrote:
 
 I'm trying to tune the memory usage of a new machine that has a -lot- of 
 memory in it (32 gigs).

...
 
 shared_buffers = 16GB

Really?

Wow!

Common wisdom in the past has been that values above a couple of hundred
MB will degrade performance.  Have you done any benchmarks on 8.2.x that
show that you get an improvement from this, or did you just take the
"too much of a good thing is wonderful" approach?

Cheers,
Andrew.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
 You have an unusual equipment for success.  Be sure to use it properly.
-





Re: [PERFORM] Temporary Table

2005-11-07 Thread Andrew McMillan
On Tue, 2005-11-08 at 10:22 +, Christian Paul B. Cosinas wrote:
 I see.
 
 But How Can I put this in the Cron of my Linux Server?
 I really don't have an idea :)
 What I want to do is to loop around all the databases in my server and
 execute the vacuum of these 3 tables in each tables.

I usually write a small shell script something like:

==
#!/bin/sh

psql somedatabase <<EOQ
  VACUUM this;
  VACUUM that;
  DELETE FROM someotherplace WHERE delete_this_record;
EOQ
==

and so forth...

This makes the SQL quite nicely readable.
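
If you want to hit those three tables in every database, a variation on
that (only a sketch - it assumes the cron job runs as a user allowed to
connect to every database, and the table names are placeholders) would be:

==
#!/bin/sh
# Vacuum the same three tables in every connectable database in the cluster.
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE datallowconn" template1)
do
  psql "$db" <<EOQ
  VACUUM this;
  VACUUM that;
  VACUUM something_else;
EOQ
done
==

Put that in the postgres user's crontab and it will happily run once a night.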


Regards,
Andrew McMillan.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
You work very hard.  Don't try to think as well.
-





Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Andrew McMillan
On Fri, 2005-07-29 at 09:37 -0800, Matthew Schumacher wrote:
  
  On my laptop this takes:
  
  real1m33.758s
  user0m4.285s
  sys 0m1.181s
  
  One interesting effect is the data in bayes_vars has a huge number of
  updates and needs vacuum _frequently_.  After the run a vacuum full
  compacts it down from 461 pages to 1 page.
  
 
 I wonder why your laptop is so much faster.  My 2 min 30 sec test was
 done on a dual xeon with a LSI megaraid with 128MB cache and writeback
 caching turned on.

I only do development stuff on my laptop, and all of my databases are
reconstructable from copies, etc...  so I turn off fsync in this case.


 How often should this table be vacuumed, every 5 minutes?

I would be tempted to vacuum after each e-mail, in this case.


 Also, this test goes a bit faster with sync turned off, if mysql isn't
 using sync that would be why it's so much faster.  Anyone know what the
 default for mysql is?

It depends on your table type for MySQL.

For the data in question (i.e. bayes scoring) it would seem that not
much would be lost if you did have to restore your data from a day old
backup, so perhaps fsync=false is OK for this particular application.

Regards,
Andrew McMillan.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
  What we wish, that we readily believe.
 -- Demosthenes
-





Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-28 Thread Andrew McMillan
On Thu, 2005-07-28 at 16:13 -0800, Matthew Schumacher wrote:
 
 Ok, I finally got some test data together so that others can test
 without installing SA.
 
 The schema and test dataset is over at
 http://www.aptalaska.net/~matt.s/bayes/bayesBenchmark.tar.gz
 
 I have a pretty fast machine with a tuned postgres and it takes it about
 2 minutes 30 seconds to load the test data.  Since the test data is the
 bayes information on 616 spam messages that comes out to be about 250ms 
 per message.  While that is doable, it does add quite a bit of overhead
 to the email system.

On my laptop this takes:

real1m33.758s
user0m4.285s
sys 0m1.181s

One interesting effect is the data in bayes_vars has a huge number of
updates and needs vacuum _frequently_.  After the run a vacuum full
compacts it down from 461 pages to 1 page.

Regards,
Andrew.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
  I don't do it for the money.
-- Donald Trump, Art of the Deal

-





Re: [PERFORM] Postgresql and Software RAID/LVM

2005-06-06 Thread Andrew McMillan
On Fri, 2005-06-03 at 11:45 -0700, Steve Poe wrote:
 I have a small business client that cannot afford high-end/high quality
 RAID cards for their next server. That's a seperate argument/issue right
 there for me, but what the client wants is what the client wants.
 
 Has anyone ran Postgres with software RAID or LVM on a production box?
 What have been your experience?

Hi,

We regularly run LVM on top of software raid for our PostgreSQL servers
(and our other servers, for that matter).

As far as I can see these systems have not had issues related to either
software RAID or LVM - that's around 30 systems all up, maybe 8 running
PostgreSQL, in production.  The database servers are a variety of
dual-Xeon (older) and dual-Opteron (newer) systems.

The Xeons are all running Debian Woody with 2.4.xx kernels and the
Opterons are all running Sarge with 2.6.x kernels.

Regards,
Andrew.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
   The secret of being a bore is to say everything -- Voltaire
-





Re: [PERFORM] Adaptec/LSI/?? RAID

2005-06-02 Thread Andrew McMillan
On Wed, 2005-06-01 at 20:42 -0700, Stacy White wrote:
 We're in the process of buying another Opteron server to run Postgres, and
 based on the suggestions in this list I've asked our IT director to get an
 LSI MegaRaid controller rather than one of the Adaptecs.
 
 But when we tried to place our order, our vendor (Penguin Computing) advised
 us:
 
 we find LSI does not work well with 4GB of RAM. Our engineering find that
 LSI card could cause system crashes. One of our customer ... has found that
 Adaptec cards works well on PostGres SQL -- they're using it as a preforce
 server with xfs and post-gress.
 
 Any comments?  Suggestions for other RAID controllers?

Hi,

We're using the Megaraid (Intel branded model) on a dual Opteron system
with 8G RAM very happily.  The motherboard is a RioWorks one, the OS is
Debian Sarge AMD64 with kernel 2.6.11.8 and PostgreSQL 7.4.7.

Cheers,
Andrew.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267


-





Re: [PERFORM] Adaptec/LSI/?? RAID (what about JBOD?)

2005-06-02 Thread Andrew McMillan
On Thu, 2005-06-02 at 14:02 -0700, [EMAIL PROTECTED] wrote:
 I have a similar question about what to choose (either LSI or Adaptec U320), 
 but
 plan to use them just for JBOD drives.  I expect to be using either netbsd or
 freebsd.  The system CPU will be Opteron.  My impression is that both the ahd
 and mpt drivers (for U320 Adaptec and LSI, respectively) are quite stable, but
 not from personal experience.  Like I said, I don't plan to have the cards 
 doing
 RAID in hardware.  Should I be pretty safe with either choice of HBA then?

On the machine I mentioned earlier in this thread we use the Megaraid
for JBOD, but the card setup to use the disks that way was somewhat
confusing, requiring us to configure logical drives that in fact matched
the physical ones.  The card still wanted to write that information onto
the disks, reducing the total disk space available by some amount, but
also meaning that we were unable to migrate our system from a previous
non-RAID card cleanly.

Regards,
Andrew.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
Whereof one cannot speak, thereon one must remain silent. -- Wittgenstein
-





Re: [PERFORM] OID vs overall system performances on high load

2005-05-29 Thread Andrew McMillan
On Sun, 2005-05-29 at 16:17 -0400, Eric Lauzon wrote:
 I am still in the dark due to my lack of knowledge of internal OID
 management, but I would presume that a table with OIDs enabled and that has
 high load would require some more work from pgsql internals to maintain the
 OID index for the database.
 
 So OIDs can be beneficial on static tables, or tables that you want to be
 able to manipulate with pgAdmin X, but can a table without OIDs increase
 performance on insert, delete, update, COPY?
 
 I am not really worried about the disk space that an OID column can take,
 but I was wondering whether an insert into a table of 20 million rows or more
 that has OIDs would slow the insert process.  Since OIDs seem to act as a
 global index, maybe maintaining that index can become costly under high
 table load for the postgresql backend.

There is no OID index, unless you create one.

The disk space that an OID column can take has an effect on performance:
reducing the amount of physical disk reads will mean that more of your
real data is cached, and so forth.  How much effect it will have will
depend on the relative size of the OID column and the other columns in
your data.

Regards,
Andrew McMillan.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
   http://survey.net.nz/ - any more questions?
-





Re: [PERFORM] OID vs overall system performances on high load

2005-05-27 Thread Andrew McMillan
On Fri, 2005-05-27 at 13:05 -0400, Eric Lauzon wrote:
 What is the effect of having a table with around 500
 inserts/updates/deletes on two to eight tables in a time frame of 2
 minutes, 24/24h, when you have OIDs enabled versus the same setup when
 you don't have OIDs?
 
 That deployment is done on a postgres with 8 to 9 databases, each
 having those 2 to 8 high-load tables with OIDs enabled.
 
 Would the OID column slow down table scans when you have over 20
 million rows?
 
 Would the cost of maintaining the OID column inside those high-load
 tables, when there is no OID reference used for data seeking, be costly for
 postgres resources?

The OID column is an extra few bytes on each row.  If you don't have any
use for it (and let's face it: most of us don't), then create your
tables without OID.
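
For example (only a sketch - the table and columns here are invented), that
just means saying so at creation time:

CREATE TABLE access_log (
    id        bigserial PRIMARY KEY,
    logged_at timestamp NOT NULL,
    detail    text
) WITHOUT OIDS;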

The amount of impact that it makes will depend on what the general row
size is.  If they are rows with a couple of integers then the size of an
OID column will be a significant portion of the size of each row, and
removing it will make the physical on-disk data size significantly
smaller.  If the size of the average row is (e.g.) 2k then the OID will
only be a very small fraction of the data, and removing it will only
make a small difference.

Regards,
Andrew McMillan.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
 ... I want a COLOR T.V. and a VIBRATING BED!!!
-





Re: [PERFORM] Using LIMIT changes index used by planner

2004-12-13 Thread Andrew McMillan
On Mon, 2004-12-13 at 01:13 -0500, Sven Willenberger wrote:
 I have a question regarding a serious performance hit taken when using a 
 LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with 2GB 
 of memory. The table in question contains some 25 million rows with a 
 bigserial primary key, orderdate index and a referrer index. The 2 
 select statements are as follow:

It's an interesting question, but to be able to get answers from this
list you will need to provide EXPLAIN ANALYZE ... rather than just
EXPLAIN 

AFAICS the bad plan on LIMIT is because it optimistically thinks the
odds are around the 0.00 end, rather than the 64297840.86 end, and
indeed that is what the Limit ... estimate is showing.  A bad plan (in
your case) is encouraged here by the combination of LIMIT and ORDER
BY.

For real background on this, and calculated recommendations, we'd need
that more detailed output though.

As a quick hack, it's possible that you could improve things by
increasing the samples on relevant columns with some judicious ALTER
TABLE ... ALTER COLUMN ... SET STATISTICS ... commands.
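
Something along those lines (the table name is a guess, the columns are the
orderdate and referrer you mention, and 100 is only an example target - the
default is 10):

ALTER TABLE orders ALTER COLUMN orderdate SET STATISTICS 100;
ALTER TABLE orders ALTER COLUMN referrer  SET STATISTICS 100;
ANALYZE orders;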

Cheers,
Andrew McMillan.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
 Planning an election?  Call us!
-





Re: [PERFORM] pg_restore taking 4 hours!

2004-12-02 Thread Andrew McMillan
On Wed, 2004-12-01 at 09:16 -0200, Rodrigo Carvalhaes wrote:
 
 I am using PostgreSQL with a proprietary ERP software in Brazil. The 
 database have around 1.600 tables (each one with +/- 50 columns).

...

 max_fsm_pages = 2
 max_fsm_relations = 1000

Hi,

I doubt that this will improve your pg_restore performance, but if you
have 1600 tables in the database then you very likely want to increase
the above two settings.

In general max_fsm_relations should be more than the total number of
tables across all databases in a given installation.  The best way to
set these is to do a VACUUM VERBOSE, which will print the appropriate
minimum numbers at the end of the run, along with the current setting.
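
As an illustration only (these particular numbers are invented - use the
ones your own VACUUM VERBOSE prints), postgresql.conf might end up with
something like:

max_fsm_relations = 5000      # comfortably more than the ~1600 tables, plus indexes
max_fsm_pages     = 200000    # at least the minimum VACUUM VERBOSE reports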

Cheers,
Andrew.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
Never trust a computer you can't repair yourself.
-





Re: [PERFORM] Postgres vs. DSpam

2004-11-25 Thread Andrew McMillan
On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote:
 Hi,
 
 I have installed the dspam filter
 (http://www.nuclearelephant.com/projects/dspam) on our mail server
 (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users
 with a quite low traffic of 4000 messages/day. So it's a quite common
 platform/environment, nothing spectacular.

I am using DSpam with PostgreSQL here.  I have a daily job that cleans
the DSpam database up, as follows:

DELETE FROM dspam_token_data
  WHERE (innocent_hits*2) + spam_hits < 5
  AND CURRENT_DATE - last_hit > 60;

DELETE FROM dspam_token_data
  WHERE innocent_hits = 1
  AND CURRENT_DATE - last_hit > 30;

DELETE FROM dspam_token_data
  WHERE CURRENT_DATE - last_hit > 180;

DELETE FROM dspam_signature_data
  WHERE CURRENT_DATE - created_on > 14;

VACUUM dspam_token_data;

VACUUM dspam_signature_data;



I also occasionally do a VACUUM FULL ANALYZE; on the database as well.
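
The daily job itself is nothing special - a crontab entry for the database
owner along these lines (the file path and database name here are made up):

15 3 * * *   psql -q -f /usr/local/etc/dspam_clean.sql dspam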


In all honesty though, I think that MySQL is better suited to DSpam than
PostgreSQL is.


 Please, could anyone explain me this difference?
 Is Postgres that bad?
 Is MySQL that good?
 Am I the only one to have observed this behavior?

I believe that what DSpam does that is not well-catered for in the way
PostgreSQL operates, is that it does very frequent updates to rows in
(eventually) quite large tables.  In PostgreSQL the UPDATE will result
internally in a new record being written, with the old record being
marked as deleted.  That old record won't be re-used until after a
VACUUM has run, and this means that the on-disk tables will have a lot
of dead rows in them quite quickly.

The reason that PostgreSQL operates this way, is a direct result of the
way transactional support is implemented, and it may well change in a
version or two.  It's got better over the last few versions, with things
like pg_autovacuum, but that approach still doesn't suit some types of
database updating.
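
You can watch that happening, roughly like this (the token value is made up;
vacuum's verbose output reports the dead row versions it was able to remove):

UPDATE dspam_token_data SET innocent_hits = innocent_hits + 1 WHERE token = 12345;
-- ... repeated for every token in every message, all day ...
VACUUM VERBOSE dspam_token_data;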

Cheers,
Andrew.
-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
   These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!!
-





Re: [PERFORM] Checking = with timestamp field is slow

2004-11-05 Thread Andrew McMillan
On Fri, 2004-11-05 at 12:46 +0530, Antony Paul wrote:
 Hi all,
   I have a table which has more than 200,000 records. I need to get
 the records which matches like this
 
 where today::date = '2004-11-05';
 
 This is the only condition in the query. There is a btree index on the
 column today.
 Is there any way to optimise it.

Hi Antony,

I take it your field is called "today" (seems dodgy, but these things
happen...).  Anyway, have you tried indexing on the truncated value?

  create index xyz_date on xyz( (today::date) );
  analyze xyz;

That's one way.  It depends on how many of those 200,000 rows are on
each date too, as to whether it will get used by your larger query.

Regards,
Andrew.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
  When in doubt, tell the truth.
-- Mark Twain
-





Re: [PERFORM] Insert performance, what should I expect?

2004-10-20 Thread Andrew McMillan
On Wed, 2004-10-20 at 11:53 +1000, Brock Henry wrote:
 
 Test 1, For each import, I'm dropping all indexes and pkeys/fkeys,
 then importing, then adding keys and indexes. Then I've got successive
 runs. I figure the reindexing will get more expensive as the database
 grows?

Sounds like the right approach to me, if the tables are empty before the
import.


 Successive Imports: 44,49,50,57,55,61,72 (seconds)
 = average 1051inserts/second (which now that I've written this seems
 fairly good)

(A) Are you doing the whole thing inside a transaction?  This will be
significantly quicker.  COPY would probably be quicker still, but the
biggest difference will be a single transaction.
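
For example (a sketch only - the table, columns and file name are made up,
and COPY reads a tab-separated file on the server; use psql's \copy if the
file lives on the client):

BEGIN;
INSERT INTO import_data (id, name, amount) VALUES (1, 'first', 10.00);
-- ... the rest of the inserts ...
COMMIT;

-- or, usually faster again:
COPY import_data (id, name, amount) FROM '/tmp/import_data.tab';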

(B) If you are starting with empty files, are you ensuring that the dead
records are vacuumed before you start?  I would recommend a vacuum
full on the affected tables prior to the first import run (i.e. when
the tables are empty).  This is likely to be the reason that the timing
on your successive imports increases so much.



 sort_mem = 4096   

You probably want to increase this - if you have 1G of RAM then there is
probably some spare.  But if you actually expect to use 32 connections
then 32 * 4M = 128M might mean a careful calculation is needed.  If you
are really only likely to have 1-2 connections running concurrently then
increase it to (e.g.) 32768.

 max_fsm_relations = 300  

If you do a "vacuum full verbose;" the last line will give you some
clues as to what to set this (and max_fsm_pages) to.


 effective_cache_size = 16000

16000 * 8k = 128M seems low for a 1G machine - probably you could say
64000 without fear of being wrong.  What does free show as cached?
Depending on how dedicated the machine is to the database, the effective
cache size may be as much as 80-90% of that.
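
Pulling that together, a sketch of the relevant postgresql.conf lines for
your 1G machine (the fsm figures are placeholders until you have run
"vacuum full verbose"):

sort_mem             = 32768     # in KB, per sort operation - OK for 1-2 concurrent connections
max_fsm_relations    = 1000
max_fsm_pages        = 100000    # replace with the figure vacuum reports
effective_cache_size = 64000     # 8k pages, roughly 500M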


 Can I expect it to go faster than this? I'll see where I can make my
 script itself go faster, but I don't think I'll be able to do much.
 I'll do some pre-prepare type stuff, but I don't expect significant
 gains, maybe 5-10%. I'd could happily turn off fsync for this job, but
 not for some other databases the server is hosting.

You can probably double the speed - maybe more.

Cheers,
Andrew,
-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
  How many things I can do without! -- Socrates
-





Re: [PERFORM] Seq scan vs. Index scan with different query

2004-07-05 Thread Andrew McMillan
On Mon, 2004-07-05 at 12:15 +0200, [EMAIL PROTECTED] wrote:
 Hello,
 
 Can anybody suggest any hint on this:
 
 temp= EXPLAIN SELECT DISTINCT number FROM tablex WHERE Date BETWEEN 
 '2004-06-28'::date AND '2004-07-04'::date AND Time BETWEEN '00:00:00'::time AND 
 '18:01:00'::time;
 
 Unique  (cost=305669.92..306119.43 rows=89 width=8)
   ->  Sort  (cost=305669.92..305894.67 rows=89903 width=8)
         Sort Key: number
         ->  Index Scan using DateTimeIndex on tablex  (cost=0.00..298272.66 rows=89903 width=8)
               Index Cond: ((Date >= '2004-06-28'::date) AND (Date <= '2004-07-04'::date)
                 AND (Time >= '00:00:00'::time without time zone)
                 AND (Time <= '18:01:00'::time without time zone))
 
 
 temp= EXPLAIN SELECT DISTINCT number FROM tablex WHERE Date BETWEEN 
 '2004-06-28'::date AND '2004-07-04'::date AND Time BETWEEN '00:00:00'::time AND 
 '19:01:00'::time;
 
 Unique  (cost=315252.77..315742.27 rows=97 width=8)
   ->  Sort  (cost=315252.77..315497.52 rows=97900 width=8)
         Sort Key: number
         ->  Seq Scan on tablex  (cost=0.00..307137.34 rows=97900 width=8)
               Filter: ((Date >= '2004-06-28'::date) AND (Date <= '2004-07-04'::date)
                 AND (Time >= '00:00:00'::time without time zone)
                 AND (Time <= '19:01:00'::time without time zone))
 
 Basically, the difference is in upper Time value (as you can see, it's
 18:01:00 in the first query and 19:01:00 in the other one). 
 The question is - why does it use index in first case and 
 it tries to do full sequential scan when the upper Time value
 is different?
 
 DateTimeIndex was created on both columns (Date/Time):
 CREATE INDEX DateTimeIndex ON tablex USING btree (Date, Time);

PostgreSQL is always going to switch at some point, where the number of
rows that have to be read from the table exceed some percentage of the
total rows in the table.

We can possibly be more helpful if you send EXPLAIN ANALYZE, rather than
just EXPLAIN.

A few things to be careful of:

- Is this supposed to be a slice of midnight to 6pm, for each day
between 28 June and 4 July?  If you want a continuous period from
Midnight 28 June - 6pm 4 July you're better to have a single timestamp
field (there is a sketch of that below this list).

- It is unlikely that the ", Time" part of your index is adding much to your
selectivity, and it may be that you would be better off without it.

- the DISTINCT can screw up your results, and it usually means that the
SQL is not really the best it could be.  A _real_ need for DISTINCT is
quite rare in my experience, and from what I have seen it adds overhead
and tends to encourage bad query plans when used unnecessarily.
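
Following up the first point, a minimal sketch of the single-timestamp
approach (the combined column name is invented; the table, index and query
shape come from your message):

CREATE INDEX tablex_when_idx ON tablex (when_at);   -- when_at replaces Date + Time

SELECT DISTINCT number
  FROM tablex
 WHERE when_at BETWEEN '2004-06-28 00:00:00' AND '2004-07-04 18:01:00';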

Hope this is some help.

Regards,
Andrew McMillan

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
  Make things as simple as possible, but no simpler -- Einstein
-




Re: [PERFORM] Slow vacuum performance

2004-06-18 Thread Andrew McMillan
On Thu, 2004-06-17 at 13:09 -0700, Patrick Hatcher wrote:
 
 
 
 Pg:  7.4.2
 RedHat 7.3
 Ram: 8gig
 
 I have a 6 million row table that I vacuum full analyze each night.  The time
 seems to be stretching out further and further as I add more rows.  I read
 the archives and Josh's annotated pg.conf guide that setting the FSM higher
 might help.  Currently, my memory settings are set as such.  Does this seem
 low?
 
 Last reading from vaccum verbose:
   INFO:  analyzing cdm.cdm_ddw_customer
 INFO:  cdm_ddw_customer: 209106 pages, 3000 rows sampled, 6041742
 estimated total rows
  I think I should now set my max FSM to at least 210000 but wanted to make
  sure

Yes, that's my interpretation of those numbers too.  I would set
max_fsm_pages to 300000 (or more) in that case.

If you have 8G of RAM in the machine your shared_buffers seems very low
too.  Depending on how it is used I would increase that to at least the
recommended maximum (10000 buffers - around 80M).

You don't quote your setting for effective_cache_size, but you should
probably look at what /usr/bin/free reports as cached, divide that
by 10, and set it to that as a quick rule of thumb...
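
As a sketch (the cached figure is made up - substitute whatever "free"
actually reports on your box):

# if "free" reports ~6,000,000 kB cached:
shared_buffers       = 10000      # 8k pages, about 80M
effective_cache_size = 600000     # 8k pages, about 4.8G (cached kB / 10)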

Regards,
Andrew McMillan


 shared_buffers = 2000   # min 16, at least max_connections*2, 8KB
 each
 sort_mem = 12288# min 64, size in KB
 
 # - Free Space Map -
 
 max_fsm_pages = 10  # min max_fsm_relations*16, 6 bytes each
 #max_fsm_relations = 1000   # min 100, ~50 bytes each
 

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
  Make things as simple as possible, but no simpler -- Einstein
-




Re: [PERFORM] DB Design

2004-05-19 Thread Andrew McMillan
On Wed, 2004-05-19 at 15:37 +0800, Michael Ryan S. Puncia wrote:
 Hi Guys,
 

 
   My question is .. which is better design
 
  
 
  1. Single Table with 50 million records or
  2. Multiple Table using inheritance to the parents table

It's not that simple.

Given your e-mail address I assume you want to store Philippines Census
data in such a table, but does Census data fit well in a single flat
table structure?  Not from what I have seen here in NZ, but perhaps
Census is simpler there.

So to know what the best answer to that question is, people here will
surely need more and better information from you about database schema,
record size, indexing and query characteristics, and so on.


 I will use this only for query purpose ..

Then you may quite possibly want to consider a different database,
particularly if it is for single-user query purposes.

For example, there are some SQL databases that would load the entire
database into RAM from static files, and then allow query against this.
This can obviously give huge performance improvements in situations
where volatility is not a problem.

Cheers,
Andrew.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053,  Manners St,  Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201   MOB: +64(21)635-694  OFFICE: +64(4)499-2267
   Do not overtax your powers.
-




Re: [PERFORM] Quad processor options

2004-05-13 Thread Andrew McMillan
On Tue, 2004-05-11 at 15:46 -0700, Paul Tuckfield wrote:

 - the cache column shows that linux is using 2.3G for cache. (way too 
 much) you generally want to give memory to postgres to keep it close to 
 the user, not leave it unused to be claimed by linux cache (need to leave 
 *some* for linux tho)
 
 My recommendations:
 - I'll bet you have a low value for shared buffers, like 10000.  On 
 your 3G system you should ramp up the value to at least 1G (125000 8k buffers) 
 unless something else runs on the system.   It's best to not do things too 
 drastically, so if I'm right and you sit at 10000 now, try going to
 30000 then 60000 then 125000 or above.

Huh?

Doesn't this run counter to almost every piece of PostgreSQL performance
tuning advice given?

I run my own boxes with buffers set to around 10000-20000 and an
effective_cache_size = 375000 (8k pages - around 3G).

That's working well with PostgreSQL 7.4.2 under Debian woody (using
Oliver Elphick's backported packages from
http://people.debian.org/~elphick/debian/).

Regards,
Andrew.
-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053,  Manners St,  Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201   MOB: +64(21)635-694  OFFICE: +64(4)499-2267
Q:How much does it cost to ride the Unibus?
A:2 bits.
-




Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-25 Thread Andrew McMillan
On Thu, 2004-04-22 at 10:37 -0700, Josh Berkus wrote:
 Tom,
 
  The tricky
  part is that a slow adaptation rate means we can't have every backend
  figuring this out for itself --- the right value would have to be
  maintained globally, and I'm not sure how to do that without adding a
  lot of overhead.
 
 This may be a moot point, since you've stated that changing the loop timing 
 won't solve the problem, but what about making the test part of make?   I 
 don't think too many systems are going to change processor architectures once 
 in production, and those that do can be told to re-compile.

Sure they do - PostgreSQL is regularly provided as a pre-compiled
distribution.  I haven't compiled PostgreSQL for years, and we have it
running on dozens of machines, some SMP, some not, but most running
Debian Linux.

Even having a compiler _installed_ on one of our client's database
servers would usually be considered against security procedures, and
would get a black mark when the auditors came through.

Regards,
Andrew McMillan
-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053,  Manners St,  Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201   MOB: +64(21)635-694  OFFICE: +64(4)499-2267
 Planning an election?  Call us!
-




Re: [PERFORM] good pc but bad performance,why?

2004-04-07 Thread Andrew McMillan
On Wed, 2004-04-07 at 20:56, huang yaqin wrote:
 Hello, Richard Huxton,
   
   You said  turning fsync off may cause losing data, that's terrible.
   I use SCSI disk, and file system is ext3. I tune postgresql.conf and can't get 
 any improvement. So what can I do?
   Does SCSI disk and IDE disk have difference?

Yes, turning off fsync means that the database is not guaranteeing
consistency of writes to disk any longer.  On the other hand your IDE
system probably never was, because IDE drives just typically turn on
write caching in hardware without telling anyone.

SCSI typically doesn't turn on write caching in the physical drive by
default, as Tom Lane pointed out earlier.  Good SCSI has a battery
backed up cache, and then it is OK to turn on write caching, because the
controller has enough battery to complete all writes in the event of a
power failure.

One thing I recommend is to use ext2 (or almost anything but ext3). 
There is no real need (or benefit) from having the database on a
journalled filesystem - the journalling is only trying to give similar
sorts of guarantees to what the fsync in PostgreSQL is doing.

The suggestion someone else made regarding use of software raid is
probably also a good one if you are trying to use the on-board RAID at
the moment.

Finally, I would say that because you are seeing poor performance on one
box and great performance on another, you should look at the hardware,
or at the hardware drivers, for the problem - not so much at PostgreSQL.

Of course if it is application performance you want to achieve, we _can_
help here, but you will need to provide more details of what you are
trying to do in your application, including;
 - confirmation that you have done a VACUUM and ANALYZE of all tables
before you start
 - output from EXPLAIN ANALYZE for slow queries
 - anything else you think is useful.

without that sort of detail we can only give vague suggestions, like
"wrap everything in a transaction" - excellent advice, certainly, but
you can read that in the manual.

There are no magic bullets, but I am sure most of the people on this
list have systems that regularly do way more than 50 inserts / second on
server hardware.

Regards,
Andrew McMillan

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053,  Manners St,  Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201   MOB: +64(21)635-694  OFFICE: +64(4)499-2267
 http://survey.net.nz/ - any questions?
-




Re: [PERFORM] 100 simultaneous connections, critical limit?

2004-01-14 Thread Andrew McMillan
On Thu, 2004-01-15 at 01:48, Jón Ragnarsson wrote:
 I am writing a website that will probably have some traffic.
 Right now I wrap every .php page in pg_connect() and pg_close().
 Then I read somewhere that Postgres only supports 100 simultaneous
 connections (default). Is that a limitation? Should I use some other
 method when writing code for high-traffic website?

Whether the overhead of pg_connect() pg_close() has a noticeable effect
on your application depends on what you do in between them.  TBH I never
do that second one myself - PHP will close the connection when the page
is finished.

I have developed some applications which are trying to be
as-fast-as-possible and for which I either use pg_pconnect so you have
one DB connection per Apache process, or I use DBBalancer where you have
a pool of connections, and pg_connect is _actually_ connecting to
DBBalancer in a very low-overhead manner and you have a pool of
connections out the back.  I am the Debian package maintainer for
DBBalancer.

You may also want to consider differentiating based on whether the
application is writing to the database or not.  Pooling and persistent
connections can give weird side-effects if transaction scoping is
bollixed in the application - a second page view re-using an earlier
connection which was serving a different page could find itself in the
middle of an unexpected transaction.  Temp tables are one thing that can
bite you here.

There are a few database pooling solutions out there. Using pg_pconnect
is the simplest of these, DBBalancer fixes some of its issues, and
others go further still.

Another point to consider is that database pooling will give you the
biggest performance increase if your queries are all returning small
datasets.  If you return large datasets it can potentially make things
worse (depending on implementation) through double-handling of the data.

As others have said too: 100 is just a configuration setting in
postgresql.conf - not an implemented limit.

Cheers,
Andrew McMillan.
-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053,  Manners St,  Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201   MOB: +64(21)635-694  OFFICE: +64(4)499-2267
  How many things I can do without! - Socrates
-



Re: [PERFORM] Tuning PostgreSQL

2003-07-26 Thread Andrew McMillan
On Wed, 2003-07-23 at 00:53, Alexander Priem wrote:
 Wow, I never figured how many different RAID configurations one could think
 of   :)
 
 After reading lots of material, forums and of course, this mailing-list, I
 think I am going for a RAID5 configuration of 6 disks (18Gb, 15.000 rpm
 each), one of those six disks will be a 'hot spare'. I will just put the OS,
 the WAL and the data on one volume. RAID10 is way too expensive   :)

The general heuristic is that RAID-5 is not the way to deal with
databases.  Now surely someone will disagree with me, but as I
understand it RAID-5 has a bottleneck on a single disk for the
(checksum) information.  Bottleneck is not the word you want to hear in
the context of database server.

RAID-1 (mirroring) or RAID-10 (sort-of-mirrored-RAID-5) is the best
choice.

As far as FS performance goes, a year or two ago I remember someone
doing an evaluation of FS performance for PostgreSQL and they found that
the best performance was...

FAT

Yep: FAT

The reason is that a lot of what the database is doing, especially
guaranteeing writes (WAL) and so forth is best handled through a
filesystem that does not get in the way.  The fundamentals will not have
changed.

It is for this reason that ext2 is very much likely to be better than
ext3.  XFS is possibly (maybe, perhaps) OK, because there are
optimisations in there for databases, but the best optimisation is to
not be there at all.  That's why Oracle want direct IO to disk
partitions so they can implement their own filesystem (i.e. record
system... table system...) on a raw partition.

Personally I don't plan to reboot my DB server more than once a year (if
that - even my laptop currently has 37 days uptime, not including
suspend).  On our DB servers I use ext2 (rather than ext3) mounted with
noatime, and I bite the 15 minutes to fsck (once a year) rather than
screw general performance with journalling database on top of
journalling FS.  I split pg_xlog onto a separate physical disk, if
performance requirements are extreme. 

Catalyst's last significant project was to write the Domain Name
registration system for .nz (using PostgreSQL).  Currently we are
developing the electoral roll for the same country (2.8 million electors
living at 1.4 million addresses).  We use Oracle (or Progress, or MySQL)
if a client demands them, but we use PostgreSQL if we get to choose.
Increasingly we get to choose.  Good.

Regards,
Andrew.
-- 
-
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201 MOB: +64(21)635-694OFFICE: +64(4)499-2267
   Survey for nothing with http://survey.net.nz/ 
-

