Re: [PERFORM] Postgresql works too slow
Simon Riggs wrote:
> On Mon, 2005-04-18 at 08:50 +0400, Nurlan Mukhanov (AL/EKZ) wrote:
>> I'm trying to restore my database from a dump in several parallel
>> processes, but the restore process works too slowly.
>> Number of rows: about 100,000,000
>> RAM: 8192M
>> CPU: UltraSPARC 3
>> Number of CPUs: 4
>> OS: SunOS sun 5.8
>> RDBMS: PostgreSQL 8.0
>> How do I increase PostgreSQL's speed? Why does postgres take only
>> 5.0% of CPU time?
>
> When you say restore... what are you actually doing? An archive
> recovery? A reload? A file-level restore of the database?

If you are doing a restore off a pg_dump, did you dump the data as INSERTs? This takes a lot more time to restore.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.

---(end of broadcast)---
TIP 8: explain analyze is your friend
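For reference, a dump taken in the default COPY-based format restores far faster than one taken with per-row INSERTs. A minimal sketch (database and file names here are hypothetical):

```bash
# Fast to restore: the default dump format reloads data with COPY.
pg_dump mydb > mydb.sql

# Slow to restore: --inserts emits one INSERT statement per row.
pg_dump --inserts mydb > mydb_inserts.sql

# Either file reloads through psql; the COPY-based dump loads rows in bulk.
psql -d mydb_restored -f mydb.sql
```

If the dump was already taken with INSERTs, wrapping the reload in a single transaction helps, but re-dumping in the default format is the real fix.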
Re: [PERFORM] Moving pg_xlog
Keith Worthington wrote:
> On Wed, 01 Jun 2005 12:19:40 -0400, Tom Lane wrote:
>> Keith Worthington <[EMAIL PROTECTED]> writes:
>>> I have been reading about increasing PostgreSQL performance by
>>> relocating the pg_xlog to a disk other than the one where the
>>> database resides. I have the following pg_xlogs on my system:
>>>
>>>   /raid02/databases/pg_xlog
>>>   /raid02/rhdb_databases/pg_xlog
>>>   /raid02/databases-8.0.0/pg_xlog
>>>   /var/lib/pgsql/data/pg_xlog
>>>
>>> I have no idea why the fourth entry is there. It is in the PostgreSQL
>>> installation directory.
>>
>> It's there because the RPM sets up a database under /var/lib/pgsql/data.
>>
>>> 1) stop the postmaster
>>> 2) rm -rf /var/lib/pgsql/data/pg_xlog
>>> 3) mv /raid02/databases/pg_xlog /var/lib/pgsql/data/pg_xlog
>>> 4) ln -s /var/lib/pgsql/data/pg_xlog /raid02/databases/pg_xlog
>>> 5) start postmaster
>>
>> Put the xlog anywhere BUT there!
>>
>>> If I can do that and place the pg_xlog in the installation directory,
>>> will I create any installation issues the next time I upgrade
>>> PostgreSQL?
>>
>> Oh, the installation will be just fine ... but your database will not
>> be, after the upgrade wipes out your WAL. Put the xlog under some
>> non-system-defined directory.
>>
>> regards, tom lane
>
> Thanks Tom. I am glad I asked before I leaped. 8-0
>
> Is there a convention that most people follow? It would seem that
> anywhere in the installation directory is a bad idea. From what I have
> read on other threads, it does not want to be in the database directory,
> since in most cases that would put it on the same disk as the database.

We tend to use something that associates the WAL with the appropriate cluster, like /var/lib/CLUSTER for the data and /var/lib/CLUSTER_WAL for the WAL files.

> I am assuming, due to the lack of reaction, that the symbolic link is
> not an issue. Is there a cleaner or more appropriate way of moving the
> pg_xlog?

A symbolic link is the standard way to do it.
> Finally, am I correct in assuming that as long as the postmaster is
> shut down, moving the log is safe?

You are correct. Moving the WAL files with the postmaster running would be a very bad thing.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
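Putting the thread's advice together, a sketch of the safe procedure (the cluster path and WAL destination are hypothetical; adjust to your layout and take a backup first):

```bash
# 1. Stop the postmaster so nothing is writing WAL.
pg_ctl -D /var/lib/CLUSTER stop

# 2. Move the WAL directory to a non-system-defined location,
#    ideally on a different physical disk than the data.
mv /var/lib/CLUSTER/pg_xlog /var/lib/CLUSTER_WAL

# 3. Leave a symbolic link behind so the server still finds it.
ln -s /var/lib/CLUSTER_WAL /var/lib/CLUSTER/pg_xlog

# 4. Restart the postmaster.
pg_ctl -D /var/lib/CLUSTER start
```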
Re: [PERFORM] 'Real' auto vacuum?
Mindaugas Riauba wrote:
>>> When a row is orphaned, it's added to a list of possibly available
>>> rows. When a new row is needed, the list of possible rows is
>>> examined, and the first one with a transaction id less than the
>>> lowest running transaction id is chosen to be the new row. These
>>> rows can be kept in a heap, so it's really fast to find one.
>>
>> This is the long-term plan. However, it's actually a lot harder than
>> it sounds. Patches welcome.
>
> Some ETA? Since that would be the most welcome addition for us. We
> have a few very heavily updated databases where table bloat and
> constant vacuuming are killing performance.

How often are you vacuuming (the definition of 'constantly' tends to vary)? Are you vacuuming the whole database each time? If so, identify which tables are being updated frequently, and vacuum those often. Vacuum other tables less frequently. Also, are you using VACUUM FULL? (If so, you certainly don't want to be.)

--
Brad Nicholson 416-673-4106 [EMAIL PROTECTED]
Database Administrator, Afilias Canada Corp.
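The advice above amounts to giving the hot tables their own vacuum schedule. A sketch (table names are hypothetical):

```sql
-- Run frequently, against the heavily-updated tables only.
-- Plain VACUUM does not take an exclusive lock, unlike VACUUM FULL.
VACUUM ANALYZE hot_orders;
VACUUM ANALYZE hot_sessions;

-- Run much less often, for the whole database.
VACUUM ANALYZE;
```

The frequent per-table vacuums keep dead space from accumulating faster than the free space map can track it, which is what turns bloat into a permanent problem.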
[PERFORM] Investigating IO Saturation
I'm investigating a potential IO issue. We're running 7.4 on AIX 5.1. During periods of high activity (reads, writes, and vacuums), we are seeing iostat reporting 100% disk usage. I have a feeling that the iostat numbers are misleading. I can make iostat usage jump from less than 10% to greater than 95% by running a single vacuum against a moderately sized table (no noticeable change in the other activity).

Do I actually have a problem with IO? Whether I do or not, at what point should I start to be concerned about IO problems? If my understanding is correct, it should be based on the wait time.

Here's the output of vmstat during heavy load (reads, writes, several daily vacuums, and a nightly pg_dump). Wait times appear to be all right, but my understanding of when to start being concerned about IO starvation is foggy at best.

  vmstat 5
  kthr     memory              page               faults         cpu
  ----- -------------- -------------------- ----------------- -----------
   r  b     avm    fre re pi po   fr    sr cy   in    sy    cs us sy id wa
   2  2 1548418  67130  0  0  0  141    99  0 1295 22784 13128 11  4 71 14
   3  3 1548422  66754  0  0  0 2127  2965  0 2836 29981 25091 26  4 39 31
   2  3 1548423  66908  0  0  0 2369  3221  0 3130 34725 28424 25  7 38 30
   3  5 1548423  67029  0  0  0 2223  3097  0 2722 31885 25929 26  9 33 32
   3  3 1548423  67066  0  0  0 2366  3194  0 2824 43546 36226 30  5 35 31
   2  4 1548423  67004  0  0  0 2123  3236  0 2662 25756 21841 22  4 39 35
   2  4 1548957  66277  0  0  0 1928 10322  0 2941 36340 29906 28  6 34 33
   3  5 1549245  66024  0  0  0 2324 14291  0 2872 39413 25615 25  4 34 37
   2  6 1549282  66107  0  0  0 1930 11189  0 2832 72062 32311 26  5 32 38
   2  4 1549526  65855  0  0  0 2375  9278  0 2822 40368 32156 29  5 37 29
   2  3 1548984  66227  0  0  0 1732  5065  0 2825 39240 30788 26  5 40 30
   3  4 1549341  66027  0  0  0 2325  6453  0 2790 37567 30509 28  5 37 30
   2  4 1549377  65789  0  0  0 1633  2731  0 2648 35533 27395 20  5 39 36
   1  5 1549765  65666  0  0  0 2272  3340  0 2792 43002 34090 26  5 29 40
   2  3 1549787  65646  0  0  0 1779  2679  0 2596 37446 29184 22  5 37 36
   2  5 1548985  66263  0  0  0 2077  3086  0 2778 49579 39940 26  9 35 30
   2  4 1548985  66473  0  0  0 2078  3093  0 2682 23274 18460 22  3 41 34
   4  3 1548985  66263  0  0  0 2177  3344  0 2734 43029 35536 29  5 38 28
   1  4 1548985  66491  0  0  0 1978  3215  0 2739 28291 22672 23  4 41 32
   3  3 1548985  66422  0  0  0 1732  2469  0 2852 71865 30850 28  5 38 29

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Investigating IO Saturation
Joshua D. Drake wrote:
> Brad Nicholson wrote:
>> I'm investigating a potential IO issue. We're running 7.4 on AIX 5.1.
>> During periods of high activity (reads, writes, and vacuums), we are
>> seeing iostat reporting 100% disk usage. I have a feeling that the
>> iostat numbers are misleading. I can make iostat usage jump from less
>> than 10% to greater than 95% by running a single vacuum against a
>> moderately sized table (no noticeable change in the other activity).
>
> Well, that isn't surprising. Vacuum is brutal, especially on 7.4, as
> that is pre background writer. What type of IO do you have available
> (RAID, SCSI)?

The data LUN is RAID 10, the WAL LUN is RAID 1.

--
Brad Nicholson 416-673-4106 [EMAIL PROTECTED]
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] vacuum full seems to hang on very small table
Dan Harris wrote:
> I have a table with 1 live row that I found has 115000 dead rows in it
> (from a testing run). I'm trying to VACUUM FULL the table, and it has
> run for over 18 hours without completing. Considering the hardware on
> this box and the fact that performance seems reasonable in all other
> aspects, I'm confused as to why this would happen. The database other
> than this table is quite large (70 gigs on disk), and I would expect
> that to take days to complete, but I just did 'vacuum full
> table_stats'. That should only do that table, correct? I'm running
> 8.0.3.

VACUUM FULL requires an exclusive lock on the table that it's vacuuming. Chances are something else has a lock on the table that is blocking the vacuum from obtaining the necessary lock. Check pg_locks for ungranted locks; you'll probably find that the request from the vacuum is ungranted.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
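Something along these lines will show the waiting lock requests (the regclass cast turns the relation OID into a table name):

```sql
-- Lock requests that have not been granted yet; the stuck VACUUM FULL
-- should show up here, waiting on its AccessExclusiveLock.
SELECT pid, relation::regclass AS table_name, mode, granted
FROM pg_locks
WHERE NOT granted;
```

Once you know the relation, querying pg_locks again for granted locks on that same relation identifies the backend holding the conflicting lock.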
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Tom Lane wrote:
> This is unfortunately not going to help you as far as getting that
> machine into production now (unless you're brave enough to run CVS tip
> as production, which I certainly am not). I'm afraid you're most
> likely going to have to ship that pSeries back at the end of the
> month, but while you've got it, it'd be awfully nice if we could use
> it as a testbed.

We have pSeries boxes here that won't be going away anytime soon. If there are any specific test cases that need to run, I should be able to find the time to do it.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] newly created database makes queries run 300% faster
Ksenia Marasanova wrote:
> Hi,
> After hours of adjusting the performance of the queries in my Postgres
> 7.3 database - reprogramming the queries, VACUUMing, changing the
> value of enable_seqscan - I gave up, recreated the database, and
> transferred the dump of the old database into it. The queries went
> from 15 sec to 50 msec!! Wow. Now I would really love to know how the
> old database got that slow, and how I can avoid it in the future. Any
> tips are greatly appreciated!

If memory serves me (and it might not in this case), vacuum in 7.3 had issues with indexes. Reindexing or clustering your tables might have helped. Both are blocking operations.

How to avoid it in the future is simple: upgrade to a modern version of Postgres and vacuum your database properly. People work on this thing for a reason :-)

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
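For the record, the two blocking repairs mentioned look roughly like this (table and index names are hypothetical; note that CLUSTER used the `indexname ON tablename` syntax in releases of that era):

```sql
-- Rebuild every index on a bloated table (takes an exclusive lock):
REINDEX TABLE orders;

-- Or rewrite the table itself in index order, which also compacts the
-- heap (pre-8.3 syntax; blocking as well):
CLUSTER orders_pkey ON orders;
```

Either one removes index bloat; CLUSTER additionally rewrites the heap, which is why a dump-and-reload, which rebuilds everything from scratch, had the same dramatic effect.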
Re: [PERFORM] Vacuum not identifying rows for removal..
On Mon, 2006-08-21 at 11:50 -0700, Eamonn Kent wrote:
> So, my best guess is that something in our application is preventing
> vacuum from removing dead rows. What could cause this? Would it be
> caused by a long-living transaction? What is the best way to track the
> problem down? Right now, I am looking through pg_stat_activity and
> pg_locks to find processes that are "in transaction" and what locks
> they are holding.

If you have any long-running transactions - idle or active - that's your problem. Vacuum can only clear out dead tuples older than the oldest transaction. Deal with those. Make sure every single transaction your app initiates commits or rolls back every single time. You'll generally find them in pg_stat_activity, but not always; ps may show you idle transactions not showing as idle in pg_stat_activity.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
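On servers of this vintage, idle transactions report a literal '<IDLE> in transaction' query string, so a query along these lines (a sketch; the column names are the pre-8.3 ones) lists them oldest-first:

```sql
-- Sessions sitting idle inside an open transaction, oldest first.
-- These pin the vacuum horizon and prevent dead-tuple removal.
SELECT procpid, usename, query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;
```

Anything old in that list is a candidate for fixing in the application (a connection that began a transaction and never committed or rolled back).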
Re: [PERFORM] Identifying bloated tables
On Mon, 2006-08-28 at 16:39 +0200, Michal Taborsky - Internet Mall wrote:
> I just put together a view which helps us in identifying which
> database tables are suffering from space bloat, i.e. they take up much
> more space than they actually should. I thought this might be useful
> for some folk here, because questions about bloat-related performance
> degradation are quite common.

Are you sure you haven't reinvented the wheel? Have you checked out contrib/pgstattuple?

Brad.
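For comparison, contrib/pgstattuple reports the dead-space figures directly, at the cost of scanning the table (the table name here is hypothetical):

```sql
-- After installing the pgstattuple contrib module:
SELECT * FROM pgstattuple('orders');
-- Returns table_len, tuple_count, tuple_len, dead_tuple_count,
-- dead_tuple_len, free_space, free_percent, and related columns.
```

A high dead_tuple_percent or free_percent on a frequently-queried table is the signature of the bloat the view above is trying to detect.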
Re: [PERFORM] PostgreSQL and sql-bench
On Thu, 2006-09-21 at 07:52 -0700, yoav x wrote:
> Hi,
> After upgrading DBI and DBD::Pg, this benchmark still picks MySQL as
> the winner (at least on Linux RH3 on a Dell 1875 server with 2
> hyperthreaded 3.6GHz CPUs and 4GB RAM). I've applied the following
> parameters to postgresql.conf:
>
>   max_connections = 500
>   shared_buffers = 3000
>   work_mem = 10
>   effective_cache_size = 30
>
> Most queries still perform slower than with MySQL. Is there anything
> else that can be tweaked, or is this a limitation of PG or the
> benchmark?

As mentioned by others, you are using a benchmark that is slanted towards MySQL.
Re: [PERFORM] recommended benchmarks
On Fri, 2006-09-22 at 13:14 -0400, Charles Sprickman wrote:
> Hi all,
> I still have a dual dual-core Opteron box with a 3Ware 9550SX-12
> sitting here, and I need to start getting it ready for production. I
> also have to send back one processor, since we were mistakenly sent
> two. Before I do that, I would like to record some stats for posterity
> and post them to the list so that others can see how this particular
> hardware performs. It looks to be more than adequate for our needs...
> What are the standard benchmarks that people here use for comparison
> purposes? I know all benchmarks are flawed in some way, but I'd at
> least like to measure with the same tools that folks here generally
> use, to get a ballpark figure.

Check out the OSDL stuff.

http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/

Brad.
Re: [PERFORM] PostgreSQL Caching
On Wed, 2006-10-04 at 07:38 -0500, Dave Dutcher wrote:
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
>> Adnan DURSUN
>>
>> I want to be able to read an execution plan when I look at it. So, is
>> there any doc about how it should be read?
>
> You are asking how to read the output from EXPLAIN? This page is a
> good place to start:
> http://www.postgresql.org/docs/8.1/interactive/performance-tips.html

Robert Treat's "Explaining Explain" presentation from OSCON is also very good:

http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf

Brad.
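As a starting point, running EXPLAIN ANALYZE on even a trivial query shows the pieces those documents teach you to read (the table and index names are hypothetical, and the numbers will of course vary):

```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- The output has roughly this shape:
--   Index Scan using orders_customer_idx on orders
--     (cost=0.00..8.27 rows=1 width=64)
--     (actual time=0.031..0.034 rows=3 loops=1)
--
-- "cost" and the first "rows" are the planner's estimates; the
-- "actual" figures come from really executing the query. Large gaps
-- between estimated and actual row counts are the usual sign of a
-- bad plan (often fixable with ANALYZE or better statistics).
```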
Re: [PERFORM] PostgreSQL Caching
On Tue, 2006-10-03 at 18:29 -0700, Tomeh, Husam wrote:
>> * When any session updates data that is already in the shared
>> buffers, does Postgres synchronize the data to both disk and the
>> shared buffers area immediately?
>
> Not necessarily true. When a block is modified in the shared buffers,
> the modified block is written to the Postgres WAL log. A periodic DB
> checkpoint is performed to flush the modified blocks in the shared
> buffers to the data files.

Postgres 8.0 and beyond have a process called bgwriter that continually flushes dirty buffers to disk, to minimize the work that needs to be done at checkpoint time.
Re: [PERFORM] Priority to a mission critical transaction
On Tue, 2006-11-21 at 21:43 -0200, Carlos H. Reimer wrote:
> Hi,
> We have an application that is mission critical and normally very
> fast, but when an I/O- or CPU-bound transaction appears, the mission
> critical application suffers. Is there a way to give some kind of
> priority to this kind of application?
> Reimer

Not that I'm aware of. Depending on what the problem transactions are, setting up a replica on a separate machine and running those transactions against the replica might be the solution.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] New to PostgreSQL, performance considerations
On Mon, 2006-12-11 at 20:22 -0200, Daniel van Ham Colchete wrote:
> I'm thinking about writing a script to run all the tests (more than 3
> times each), get the data, and plot some graphs. I don't have the time
> right now to do it; maybe next week I will.

Check out the OSDL test suite stuff. It runs the tests and handles all the reporting for you (including graphs).

http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/

> I invite everyone to comment/suggest on the procedure or the results.

I'd recommend running each test for quite a bit longer. Get your buffers dirty and exercised, and see what things look like then. Also, if you have the disk, offload your WAL files to a separate disk.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Not Picking Index
On Fri, 2007-02-16 at 20:01 +0530, Gauri Kanekar wrote:
> I want the planner to ignore a specific index. I am testing some query
> output, and for that purpose I don't want the index used. Is it
> possible to have the planner ignore an index?

If the indexed field is an integer, add 0 to it.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
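The trick works because the planner will not use a plain index on a column once the column is wrapped in an expression. A sketch (table and column names are hypothetical):

```sql
-- Can use an index on id, if one exists:
SELECT * FROM orders WHERE id = 42;

-- The added no-op expression defeats the index, forcing another plan:
SELECT * FROM orders WHERE id + 0 = 42;

-- The equivalent trick for text columns is appending an empty string:
-- ... WHERE name || '' = 'foo'
```

For a quick session-local test, `SET enable_indexscan = off;` is another way to steer the planner away from index scans, though it disables them for all indexes rather than one.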
[PERFORM] Long Running Commits - Not Checkpoints
I'm having a problem with long-running commits appearing in my database logs. It may be hardware related, as the problem appeared when we moved the database to a new server connected to a different disk array. The disk array is a lower-class array, but still more than powerful enough to handle the IO requirements. One big difference, though, is that the old array had 16 GB of cache; the new one has 4 GB.

We are running Postgres 8.1.8 on AIX 5.3.

We have enough IO to spare that we have the bgwriter cranked up pretty high; dirty buffers are getting written quickly. Vmstat indicates 0 io wait time, and no swapping or anything nasty like that going on. The long-running commits do not line up with checkpoint times.

The postgresql.conf configs are identical, except that wal_buffers was 8 on the old master, and it is set to 16 on the new one.

We have other installations of this product running on the same array (different servers, though), and they are not suffering from this problem. The only other thing of note is that the wal files sit on the same disk as the data directory. This has not changed between the old and new config, but the installs that are running fine do have their wal files on a separate partition.

Any ideas where the problem could lie? Could having the wal files on the same partition as the data cause long-running commits when there is plenty of IO to spare?

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Long Running Commits - Not Checkpoints
On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote:
> I'm having a problem with long-running commits appearing in my
> database logs. It may be hardware related, as the problem appeared
> when we moved the database to a new server connected to a different
> disk array.
> [snip]
> The only other thing of note is that the wal files sit on the same
> disk as the data directory. This has not changed between the old and
> new config, but the installs that are running fine do have their wal
> files on a separate partition.

More on this - we also have long-running commits on installations that do have the wal files on a separate partition.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Long Running Commits - Not Checkpoints
On Thu, 2007-09-13 at 11:10 -0400, Tom Lane wrote:
> Brad Nicholson <[EMAIL PROTECTED]> writes:
>> On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote:
>>> I'm having a problem with long-running commits appearing in my
>>> database logs. It may be hardware related, as the problem appeared
>>> when we moved the database to a new server connected to a different
>>> disk array.
>>
>> More on this - we also have long-running commits on installations
>> that do have the wal files on a separate partition.
>
> What's your definition of "long running commit" --- seconds?
> milliseconds? Exactly what are you measuring? Can you correlate the
> problem with [...]?

log_min_duration_statement is set to 150 ms. Commits are running over that, up to 788 ms. Here is what we see in the logs (with obfuscated dbname, username, and IP):

2007-09-13 10:01:49.787 CUT [782426] dbname username 1.2.3.171 LOG: duration: 224.286 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:19:16.373 CUT [737404] dbname username 1.2.3.174 LOG: duration: 372.545 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:19:24.437 CUT [1806498] dbname username 11.2.3.171 LOG: duration: 351.544 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:33:11.204 CUT [962598] dbname username 1.2.3.170 LOG: duration: 504.057 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:40:33.735 CUT [1282104] dbname username 1.2.3.174 LOG: duration: 250.127 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:49:54.752 CUT [1188032] dbname username 1.2.3.170 LOG: duration: 382.781 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 11:30:43.339 CUT [1589464] dbname username 1.2.3.172 LOG: duration: 408.463 ms statement: EXECUTE <unnamed> [PREPARE: commit]

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Long Running Commits - Not Checkpoints
On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> On Thu, 13 Sep 2007, Brad Nicholson wrote:
> I'd be curious to see how you've got your background writer
> configured, to see if it matches situations like this I've seen in the
> past. The parameters controlling the all scan are the ones you might
> consider turning down: definitely the percentage, and possibly the
> maxpages as well.

bgwriter_delay = 50           # 10-10000 milliseconds between rounds
bgwriter_lru_percent = 20.0   # 0-100% of LRU buffers scanned/round
bgwriter_lru_maxpages = 300   # 0-1000 buffers max written/round
bgwriter_all_percent = 20     # 0-100% of all buffers scanned/round
bgwriter_all_maxpages = 600   # 0-1000 buffers max written/round

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Long Running Commits - Not Checkpoints
On Thu, 2007-09-13 at 12:19 -0400, Brad Nicholson wrote:
> On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
>> I'd be curious to see how you've got your background writer
>> configured, to see if it matches situations like this I've seen in
>> the past. The parameters controlling the all scan are the ones you
>> might consider turning down: definitely the percentage, and possibly
>> the maxpages as well.
>
> bgwriter_delay = 50           # 10-10000 milliseconds between rounds
> bgwriter_lru_percent = 20.0   # 0-100% of LRU buffers scanned/round
> bgwriter_lru_maxpages = 300   # 0-1000 buffers max written/round
> bgwriter_all_percent = 20     # 0-100% of all buffers scanned/round
> bgwriter_all_maxpages = 600   # 0-1000 buffers max written/round

I should add, there are 6 back ends running on this disk array (different servers and different data partitions) with these bgwriter settings.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Long Running Commits - Not Checkpoints
On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> Since you're probably not monitoring I/O waits and similar statistics
> on how the disk array's cache is being used, whether this is happening
> or not to you won't be obvious from what the operating system is
> reporting.

A sysadmin looked at cache usage on the disk array. The read cache is being used heavily, and the write cache is not.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Curious about dead rows.
On Wed, 2007-11-14 at 17:46 -0500, Tom Lane wrote:
> Russell Smith <[EMAIL PROTECTED]> writes:
>> Is it possible that analyze is not getting the number of dead rows
>> right?
>
> Hah, I think you are on to something. ANALYZE is telling the truth
> about how many "dead" rows it saw, but its notion of "dead" is "not
> good according to SnapshotNow". Thus, rows inserted by a
> not-yet-committed transaction would be counted as dead. So if these
> are background auto-analyzes being done in parallel with inserting
> transactions that run for awhile, seeing a few not-yet-committed rows
> would be unsurprising.
>
> I wonder if that is worth fixing? I'm not especially concerned about
> the cosmetic aspect of it, but if we mistakenly launch an autovacuum
> on the strength of an inflated estimate of dead rows, that could be
> costly.

Sounds to me like that could result in autovacuum kicking off while doing large data loads. This sounds suspiciously like a problem someone on -novice was having - tripping over a Windows autovac bug while doing a data load:

http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
[PERFORM] 7.4 Checkpoint Question
I have a legacy system still on 7.4 (I know, I know... the upgrade is coming soon). I have a fairly big spike happening once a day, every day, at the same time. It happens during a checkpoint - no surprise there. I know the solution to the problem (upgrade to a modern version), but what I'm looking for is an explanation as to why one particular checkpoint would be so bad on a low-volume system, so I can appease certain management concerns.

This is a _really_ low-volume system: less than 500 writes/hour. Normal operation sees checkpoint-related spikes of around 200-300 milliseconds. We always checkpoint at the checkpoint timeout (every 5 minutes). During this one checkpoint, I'm seeing transactions running 2-3 seconds. During this time, writes are 5/minute.

Relevant settings:

  shared_buffers = 1
  checkpoint_segments = 30
  checkpoint_timeout = 300

What gives?

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] 7.4 Checkpoint Question
On Thu, 2007-11-29 at 16:14 +0000, Simon Riggs wrote:
> On Thu, 2007-11-29 at 10:10 -0500, Brad Nicholson wrote:
>> I have a legacy system still on 7.4 (I know, I know... the upgrade is
>> coming soon). I have a fairly big spike happening once a day, every
>> day, at the same time. It happens during a checkpoint - no surprise
>> there.
>> [snip]
>> What gives?
>
> If the timing is regular, it's most likely a human-initiated action
> rather than a behavioural characteristic. VACUUM runs in the
> background at that time and updates loads of blocks, which then need
> to be written out at checkpoint time. That slows queries down at that
> time, but not at others.

Bingo. The big daily vacuum completes shortly before this checkpoint. Thanks.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Replication Syatem
On Mon, 2008-04-28 at 19:35 +0530, Gauri Kanekar wrote:
> Peter,
> We are doing vacuum full every alternate day. We also do vacuum
> analyze very often. We are currently using version 8.1.3. Autovacuum
> is already on, but table1 is so busy that autovacuum doesn't get a
> sufficient chance to vacuum it :(.

You should seriously consider upgrading to PG 8.3. There have been substantial improvements to VACUUM since 8.1.

Brad.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Monitoring buffercache...
On Mon, 2008-11-24 at 11:43 -0700, Kevin Kempter wrote:
> Hi All;
> I've installed pg_buffercache and I want to use it to help define the
> optimal shared_buffers size.
>
> Currently I run this every 15 min via cron:
>
>   insert into buffercache_stats
>   select now(), isdirty, count(*) as buffers, (count(*) * 8192) as memory
>   from pg_buffercache
>   group by 1,2;
>
> and here's its explain plan:
>
>   explain insert into buffercache_stats select now(), isdirty,
>   count(*) as buffers, (count(*) * 8192) as memory from pg_buffercache
>   group by 1,2;
>                                    QUERY PLAN
>   ----------------------------------------------------------------------------
>    Subquery Scan "*SELECT*"  (cost=65.00..65.23 rows=2 width=25)
>      ->  HashAggregate  (cost=65.00..65.12 rows=2 width=1)
>            ->  Function Scan on pg_buffercache_pages p  (cost=0.00..55.00 rows=1000 width=1)
>   (3 rows)
>
> Then once a day I will pull a report from the buffercache_stats table.
> The buffercache_stats table is our own creation:
>
>   \d buffercache_stats
>              Table "public.buffercache_stats"
>        Column     |            Type             | Modifiers
>   ----------------+-----------------------------+-----------
>    snap_timestamp | timestamp without time zone |
>    isdirty        | boolean                     |
>    buffers        | integer                     |
>    memory         | integer                     |
>
> Here's my issue: the server that we'll eventually roll this out to is
> extremely busy, and the every-15-min query above has the potential to
> have a huge impact on performance.

I wouldn't routinely run pg_buffercache on a busy database. Plus, I don't think that pg_buffercache will answer this question for you. It will tell you what's currently in the buffer pool and its clean/dirty status, but that's not the first place I'd look. What you really need is to figure out the hit ratio on the buffer pool and go from there.

> Does anyone have any suggestions for a better approach, or maybe a way
> to improve the performance of the above query?

You should be able to use the blocks-hit vs blocks-read data in the pg_stat_database view (for the overall database), and drill down into pg_statio_user_tables/pg_statio_all_tables to get more detailed data if you want.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
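A hit-ratio query along these lines, built on the standard statistics views, is far cheaper than scanning the buffer pool (a sketch; adjust the rounding to taste):

```sql
-- Buffer-cache hit percentage per database, worst first.
-- blks_hit counts reads satisfied from shared buffers,
-- blks_read counts reads that had to go to the kernel/disk.
SELECT datname,
       blks_read,
       blks_hit,
       round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2)
           AS hit_pct
FROM pg_stat_database
ORDER BY hit_pct;
```

A consistently low hit percentage under normal load is the signal that shared_buffers may be too small; a high one suggests growing it further will not help much.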
-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
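Brad's pg_stat_database suggestion can be sketched as a single query. This is a sketch against the statistics views of that era, not something from the original mail; `blks_hit` and `blks_read` are the relevant columns:

```sql
-- Rough buffer-cache hit ratio for the current database.
-- blks_hit  = reads satisfied from shared_buffers
-- blks_read = reads that had to go to the OS / disk
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```

The same calculation can be drilled down per table using heap_blks_hit/heap_blks_read in pg_statio_user_tables.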
Re: [PERFORM] Monitoring buffercache...
On Mon, 2008-11-24 at 12:46 -0700, Scott Marlowe wrote: On Mon, Nov 24, 2008 at 11:43 AM, Kevin Kempter [EMAIL PROTECTED] wrote: Hi All; I've installed pg_buffercache and I want to use it to help define the optimal shared_buffers size. Currently I run this every 15 minutes via cron: insert into buffercache_stats select now(), isdirty, count(*) as buffers, (count(*) * 8192) as memory from pg_buffercache group by 1,2; and here's its explain plan: Subquery Scan "*SELECT*" (cost=65.00..65.23 rows=2 width=25) -> HashAggregate (cost=65.00..65.12 rows=2 width=1) -> Function Scan on pg_buffercache_pages p (cost=0.00..55.00 rows=1000 width=1) (3 rows). Then once a day I will pull a report from the buffercache_stats table. The buffercache_stats table is our own creation, with columns snap_timestamp (timestamp without time zone), isdirty (boolean), buffers (integer) and memory (integer). Here's my issue: the server that we'll eventually roll this out to is extremely busy, and the every-15-minutes query above has the potential to have a huge impact on performance. Does anyone have any suggestions for a better approach, or maybe a way to improve the performance of the above query?

I wouldn't worry about running it every 15 minutes unless it's on a REALLY slow machine. I just ran it in a loop over and over on my 8-core Opteron server, and it ran the load factor up by almost exactly 1.0. Under our normal daily load, it sits at 1.9 to 2.5, and it climbed to 2.9 under the new load of running that query over and over. So, it doesn't seem to be blocking or anything.

The internal docs for pg_buffercache_pages.c state: "To get a consistent picture of the buffer state, we must lock all partitions of the buffer map. Needless to say, this is horrible for concurrency. Must grab locks in increasing order to avoid possible deadlocks." I'd be concerned about that running routinely.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Sl_log_1 and sl_log_2 not getting truncated.
On Wed, 2009-01-07 at 18:18 +0530, Nimesh Satam wrote: Hi, We have recently installed Slony and started replication on one of our test machines. When we start inserting data into the replicated database, the replication works properly. Over a period of time, the lag increases between the two databases.

You should sign up to the Slony list and ask your question there.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] scheduling autovacuum at lean hours only.
On Wed, 2009-02-11 at 22:57 +0530, Rajesh Kumar Mallah wrote: On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz gryz...@gmail.com wrote: On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah mallah.raj...@gmail.com wrote: vacuum_cost_delay = 150, vacuum_cost_page_hit = 1, vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20, vacuum_cost_limit = 1000, autovacuum_vacuum_cost_delay = 300. Why is it not a good idea to give end users control over when they want to run it?

Effectively, you have control over autovacuum via these params. You have to remember that autovacuum doesn't cost much, and it makes the planner know more about the data. It's not there to clean up databases, as you might imagine - it is there to update stats and mark pages as free. So make sure you tweak that config first, because I have a funny feeling that you just think that vacuuming bogs down your machine, and _can_ be turned off without any bad consequences, which is simply not true.

Our usage pattern is such that peak activity (indicated by load average) during the daytime is 10 times that of the night hours. Autovacuum just puts more pressure on the system. If a less stressful version is used, then it takes longer to complete one cycle, which means less performance for a longer time. Less performance queues up queries and encourages people to resubmit their queries, which again adds to bogging down the system.

That's not exactly how it works in practice, if tuned properly. It may take longer, but it is less intensive while running. We had one system that had spikes happening due to the exact case you described - there were noticeably high IO wait times while certain tables were being vacuumed. We set the cost delay and the wait times dropped to the point where it was a non-issue. Vacuums take twice as long, but there is no measurable impact on performance.
In our case I feel the hardware is a bit underscaled compared to the load; that's why I think running in lean hours is the best of both worlds: no performance sacrifices and intelligent vacuuming.

That is a different issue altogether. Not vacuuming a running system at all during peak hours is not considered intelligent vacuuming, IMHO. There are plenty of use cases where small, frequent vacuums keep tables under control at a very low cost. Letting them go for extended periods of time without vacuuming causes bloat and eventual slowdowns to table access, which manifest in higher IO usage across the board. If you really are dead set on vacuuming only at night, you may want to do a careful analysis of which tables need to be vacuumed and when, and trigger manual vacuums from cron.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
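Brad's closing suggestion - analyze which tables need vacuuming and drive them from cron - might look something like the fragment below. The database and table names are placeholders, not from the thread:

```cron
# Sketch: vacuum/analyze the known-hot tables at 02:00, during the lean hours
# described above. "appdb", "hot_table_1" and "hot_table_2" are placeholders.
0 2 * * * /usr/bin/vacuumdb --analyze -t hot_table_1 -t hot_table_2 appdb >> /var/log/pg_vacuum.log 2>&1
```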
[PERFORM] Long Running Commits
backend, pid=667692 socket=9 2009-03-04 09:56:13.933 CUT [561402]DEBUG: server process (PID 864402) exited with exit code 0 2009-03-04 09:56:13.933 CUT [561402]DEBUG: server process (PID 594124) exited with exit code 0 2009-03-04 09:56:13.934 CUT [561402]DEBUG: server process (PID 852202) exited with exit code 0 2009-03-04 09:56:13.935 CUT [561402]DEBUG: server process (PID 2433156) exited with exit code 0 2009-03-04 09:56:13.935 CUT [561402]DEBUG: server process (PID 2216120) exited with exit code 0 2009-03-04 09:56:13.936 CUT [561402]DEBUG: server process (PID 1761484) exited with exit code 0 2009-03-04 09:56:13.936 CUT [561402]DEBUG: server process (PID 815268) exited with exit code 0 2009-03-04 09:56:13.936 CUT [561402]DEBUG: server process (PID 876668) exited with exit code 0 2009-03-04 09:56:13.937 CUT [561402]DEBUG: server process (PID 897060) exited with exit code 0 2009-03-04 09:56:13.937 CUT [561402]DEBUG: server process (PID 2199742) exited with exit code 0 2009-03-04 09:56:13.937 CUT [561402]DEBUG: server process (PID 913618) exited with exit code 0 2009-03-04 09:56:13.937 CUT [561402]DEBUG: server process (PID 2334774) exited with exit code 0 2009-03-04 09:56:13.984 CUT [561402]DEBUG: forked new backend, pid=2334776 socket=9 2009-03-04 09:56:14.065 CUT [602282] appdb appuser 1.2.3.3 LOG: duration: 872.457 ms statement: EXECUTE unnamed [PREPARE: commit] 2009-03-04 09:56:14.065 CUT [868552] appdb appuser 1.2.3.4 LOG: duration: 873.756 ms statement: EXECUTE unnamed [PREPARE: commit] 2009-03-04 09:56:14.075 CUT [2212000] appdb appuser 1.2.3.5 LOG: duration: 586.276 ms statement: EXECUTE unnamed [PREPARE: commit] 2009-03-04 09:56:14.076 CUT [561402]DEBUG: forked new backend, pid=913620 socket=9 2009-03-04 09:56:14.095 CUT [561402]DEBUG: forked new backend, pid=782510 socket=9 2009-03-04 09:56:14.106 CUT [561402]DEBUG: forked new backend, pid=2285776 socket=9 2009-03-04 09:56:14.179 CUT [561402]DEBUG: forked new backend, pid=999650 socket=9 2009-03-04 
09:56:14.194 CUT [561402]DEBUG: forked new backend, pid=1646620 socket=9 2009-03-04 09:56:14.245 CUT [561402]DEBUG: forked new backend, pid=921634 socket=9 2009-03-04 09:56:14.255 CUT [561402]DEBUG: forked new backend, pid=1257600 socket=9 2009-03-04 09:56:14.265 CUT [561402]DEBUG: forked new backend, pid=1622114 socket=9 2009-03-04 09:56:14.325 CUT [561402]DEBUG: forked new backend, pid=1519664 socket=9 2009-03-04 09:56:14.326 CUT [561402]DEBUG: forked new backend, pid=1839238 socket=9 2009-03-04 09:56:14.335 CUT [561402]DEBUG: forked new backend, pid=2412680 socket=9 2009-03-04 09:56:14.425 CUT [561402]DEBUG: forked new backend, pid=2007238 socket=9 2009-03-04 09:56:14.426 CUT [561402]DEBUG: server process (PID 2363424) exited with exit code 0 2009-03-04 09:56:14.426 CUT [561402]DEBUG: server process (PID 1749056) exited with exit code 0 2009-03-04 09:56:14.426 CUT [561402]DEBUG: server process (PID 2056242) exited with exit code 0 2009-03-04 09:56:14.427 CUT [561402]DEBUG: server process (PID 770156) exited with exit code 0 2009-03-04 09:56:14.433 CUT [561402]DEBUG: forked new backend, pid=770158 socket=9 2009-03-04 09:56:14.433 CUT [561402]DEBUG: server process (PID 2379944) exited with exit code 0 2009-03-04 09:56:14.484 CUT [561402]DEBUG: forked new backend, pid=2379946 socket=9 2009-03-04 09:56:14.485 CUT [561402]DEBUG: server process (PID 1130616) exited with exit code 0 2009-03-04 09:56:14.494 CUT [561402]DEBUG: forked new backend, pid=1130618 socket=9 2009-03-04 09:56:14.505 CUT [561402]DEBUG: forked new backend, pid=516338 socket=9 2009-03-04 09:56:14.506 CUT [561402]DEBUG: forked new backend, pid=2199744 socket=9 -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD + RAID
Greg Smith wrote: Karl Denninger wrote: With the write cache off on these disks they still are huge wins for very-heavy-read applications, which many are. Very read-heavy applications would do better to buy a ton of RAM instead and just make sure they populate from permanent media (say by reading everything in early at sequential rates to prime the cache). There is an extremely narrow use-case where SSDs are the right technology, and it's only in a subset even of read-heavy apps where they make sense.

Out of curiosity, what are those narrow use cases where you think SSDs are the correct technology?

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] SSD + RAID
On Tue, 2009-11-17 at 11:36 -0500, Merlin Moncure wrote: 2009/11/13 Greg Smith g...@2ndquadrant.com: As far as what real-world apps have that profile, I like SSDs for small to medium web applications that have to be responsive, where the user shows up and wants their randomly distributed and uncached data with minimal latency. SSDs can also be used effectively as second-tier targeted storage for things that have a performance-critical but small and random bit as part of a larger design that doesn't have those characteristics; putting indexes on SSD can work out well for example (and there the write durability stuff isn't quite as critical, as you can always drop an index and rebuild if it gets corrupted). I am right now talking to someone on PostgreSQL IRC who is measuring 15k iops from an X25-E and no data loss following a power plug test. I am becoming increasingly suspicious that Peter's results are not representative: given that 90% of bonnie++ seeks are read only, the math doesn't add up, and they contradict broadly published tests on the internet. Has anybody independently verified the results?

How many times have they run the plug test? I've read other reports of people (not on Postgres) losing data on this drive with the write cache on.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, 2009-11-16 at 23:57 -0500, cb wrote: On Nov 16, 2009, at 8:31 PM, Tom Lane wrote: Myself and the other guy responsible for the underlying hardware have already gone down this route. The big bosses know our stance and know it isn't us preventing the upgrade. After that, there isn't too much more I can do except sit back and shake my head each time something goes wrong and I get sent on a wild goose chase to find any reason for the failure OTHER than PG.

What you need to do is stop the wild goose chases. If the problem is your PG version, no amount of investigation into other areas is going to change that. Your company is simply wasting money by ignoring this and blindly hoping that the problem will be something else. It can be a difficult battle, but it can be won.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] SSD + RAID
On Thu, 2009-11-19 at 19:01 +0100, Anton Rommerskirchen wrote: On Thursday, 19 November 2009 13:29:56, Craig Ringer wrote: On 19/11/2009 12:22 PM, Scott Carey wrote: 3: Have PG wait a half second (configurable) after the checkpoint fsync() completes before deleting/overwriting any WAL segments. This would be a trivial feature to add to a postgres release, I think. How does that help? It doesn't provide any guarantee that the data has hit main storage - it could lurk in SSD cache for hours. 4: Yet another solution: The drives DO adhere to write barriers properly. A filesystem that used these in the process of fsync() would be fine too. So XFS without LVM or MD (or the newer versions of those that don't ignore barriers) would work too. *if* the WAL is also on the SSD. If the WAL is on a separate drive, the write barriers do you no good, because they won't ensure that the data hits the main drive storage before the WAL recycling hits the WAL disk storage. The two drives operate independently and the write barriers don't interact. You'd need some kind of inter-drive write barrier. -- Craig Ringer

Hello! As I understand this: SSD performance is great, but caching is the problem. Questions: 1. What about conventional disks with 32/64 MB cache? How do they handle the plug test if their caches are on?

If they aren't battery backed, they can lose data. This is not specific to SSD.

2. What about using a separate power supply for the disks? Is it possible to write back the cache after switching the SATA to another machine controller?

Not sure. I only use devices with battery backed caches or no cache. I would be concerned however about the drive not flushing itself and still running out of power.

3. What about making a statement about a lacking enterprise feature (aka an emergency-battery-equipped SSD) and submitting this to the producers?
The producers aren't making Enterprise products; they are using caches to accelerate the speeds of consumer products to make their drives more appealing to consumers. They aren't going to slow them down to make them more reliable, especially when the core consumer doesn't know about this issue, and is even less likely to understand it if explained. They may stamp the word Enterprise on them, but it's nothing more than marketing.

I found that one of them (OCZ) seems to handle suggestions from customers (see the write speed discussions on the Vertex, for example) and another (Intel) seems to handle serious problems with its disks by rewriting and sometimes redesigning its products - if you tell them and the market dictates that they react (see the degradation of performance before the 1.11 firmware). Perhaps it's time to act and not only to complain about the fact.

Or, you could just buy higher quality equipment that was designed with this in mind. There is nothing unique to SSD here, IMHO. I wouldn't run my production grade databases on consumer grade HDDs; I wouldn't run them on consumer grade SSDs either.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] DB is slow until DB is reloaded
On Mon, 2010-01-04 at 15:53 -0500, Madison Kelly wrote: Gary Doades wrote: From your queries it definitely looks like it's your stats that are the problem. When the stats get well out of date, the planner is choosing a hash join because it thinks thousands of rows are involved, whereas only a few are actually involved. That's why, with better stats, the second query is using a loop join over very few rows and running much quicker. Therefore it's ANALYZE you need to run as well as regular VACUUMing. There should be no need to VACUUM FULL at all as long as you VACUUM and ANALYZE regularly. Once a day may be enough, but you don't say how long it takes your database to become slow. You can VACUUM either the whole database (often easiest) or individual tables if you know in more detail what the problem is and that only certain tables need it. Setting up autovacuum may well be sufficient. Cheers, Gary.

That explains things, thank you! For the record: it was taking a few months for the performance to become intolerable. I've added CLUSTER - ANALYZE - VACUUM to my nightly routine and dropped the VACUUM FULL call. I'll see how this works.

I think you are going down the wrong route here - you should be looking at preventative maintenance instead of fixing it after it's broken. Ensure that autovacuum is running for the database (assuming that you are on a relatively modern version of PG), and possibly tune it to be more aggressive (we can help). This will ensure that the condition never comes up.

PS - if you do go with the route specified, there is no need to VACUUM after the CLUSTER. CLUSTER gets rid of the dead tuples - there's nothing for VACUUM to do.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
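Making autovacuum more aggressive, as suggested above, is a postgresql.conf change. The values below are illustrative starting points for an 8.x-era server, not tuned recommendations:

```ini
# postgresql.conf sketch -- illustrative values only
autovacuum = on
autovacuum_naptime = 1min              # look for work every minute
autovacuum_vacuum_scale_factor = 0.1   # vacuum after ~10% of a table changes
autovacuum_analyze_scale_factor = 0.05 # analyze after ~5% changes
autovacuum_vacuum_cost_delay = 20ms    # throttle the I/O impact
```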
Re: [PERFORM] DB is slow until DB is reloaded
On Mon, 2010-01-04 at 20:02 -0800, Craig James wrote: Madison Kelly wrote: You are right, autovacuum is not running after all. From your comment, I am wondering if you'd recommend I turn it on or not? If so, given that I doubt I will upgrade any time soon, how would I enable it? I suppose I could google that, but google rarely shares the gotchas. :)

Most of the pain of a Postgres upgrade is the dump/reload step. But you've already had to do that several times, so why the hesitation to upgrade? Upgrading Postgres to the latest release, even if you have to do it from the source code, takes almost no time at all compared to the time you've already burned trying to solve this problem.

Actually, the biggest pain going beyond 8.2 is the change to implicit casting. Do the upgrade, you won't regret it.

Agree.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] bgwriter tunables vs pg_stat_bgwriter
On Wed, 2010-02-17 at 08:30 -0500, Jeff wrote: Since getting on 8.4 I've been monitoring things fairly closely. I whipped up a quick script to monitor pg_stat_bgwriter and save deltas every minute so I can ensure my bgwriter is beating out the backends for writes (as it is supposed to do). Now, the odd thing I'm running into is this: bgwriter_delay is 100ms (i.e. 10 times a second, give or take), bgwriter_lru_maxpages is 500 (~5000 pages/second), bgwriter_lru_multiplier is 4. Now, assuming I understand these values right, the following is what should typically happen:

while (true)
{
    if (buffers_written > bgwriter_lru_maxpages
        or buffers_written > anticipated_pages_needed * bgwriter_lru_multiplier)
    {
        sleep(bgwriter_delay ms);
        continue;
    }
    ...
}

Correct.

so I should not be able to have more than ~5000 bgwriter_clean pages per minute (this assumes writing takes 0ms, which of course is inaccurate).

That works out to 5000/second, i.e. 300,000/minute.

However, I see this in my stats (they are deltas), and I'm reasonably sure it is not a bug in the code:

(timestamp, buffers_clean, buffers_checkpoint, buffers_backend)
2010-02-17 08:23:51.184018 |     1 |  1686 |    5
2010-02-17 08:22:51.170863 | 15289 | 12676 |  207
2010-02-17 08:21:51.155793 | 38467 |  8993 | 4277
2010-02-17 08:20:51.139199 | 35582 |     0 | 9437
2010-02-17 08:19:51.125025 |     8 |     0 |    3
2010-02-17 08:18:51.84     |  1140 |  1464 |    6
2010-02-17 08:17:51.098422 |     0 |  1682 |  228
2010-02-17 08:16:51.082804 |    50 |     0 |    6
2010-02-17 08:15:51.067886 |   789 |     0 |    1

perhaps some stats buffering occurring or something, or some general misunderstanding of some of these tunables?

-- Jeff Trout j...@jefftrout.com http://www.stuarthamm.net/ http://www.dellsmartexitin.com/

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
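The arithmetic behind the numbers being questioned above can be multiplied out directly. This sketch assumes the simplified loop described in the mail, not the actual bgwriter code:

```python
def max_lru_pages_per_minute(delay_ms: int, lru_maxpages: int) -> int:
    """Upper bound on buffers the bgwriter LRU scan can clean per minute,
    assuming writes take zero time (the simplification used in the thread)."""
    rounds_per_second = 1000 / delay_ms          # bgwriter wakeups per second
    return int(rounds_per_second * lru_maxpages * 60)

# bgwriter_delay = 100ms, bgwriter_lru_maxpages = 500
print(max_lru_pages_per_minute(100, 500))  # 300000 -- 5000 pages/s, not per minute
```

So the 38,467 buffers_clean delta in the stats above is comfortably inside the 300,000/minute ceiling; the "per minute" in the original question was the slip.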
Re: [PERFORM] Testing FusionIO
On Wed, 2010-03-17 at 14:30 +0200, Devrim GÜNDÜZ wrote: On Mon, 2010-03-08 at 09:38 -0800, Ben Chobot wrote: We've enjoyed our FusionIO drives very much. They can do 100k iops without breaking a sweat. Yeah, performance is excellent. I bet we could get more, but CPU was the bottleneck in our test, since it was just a demo server :(

Did you test the drive in all three modes? If so, what sort of differences did you see?

I've been hearing bad things from some folks about the quality of the FusionIO drives from a durability standpoint. I'm unsure if this is vendor-specific bias or not, but considering the source (which is not vendor-specific), I don't think so.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Testing FusionIO
On Wed, 2010-03-17 at 09:11 -0400, Justin Pitts wrote: On Mar 17, 2010, at 9:03 AM, Brad Nicholson wrote: I've been hearing bad things from some folks about the quality of the FusionIO drives from a durability standpoint. Can you be more specific about that? Durability over what time frame? How many devices in the sample set? How did FusionIO deal with the issue? I didn't get any specifics - as we are looking at other products. It did center around how FusionIO did wear-leveling though. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Testing FusionIO
On Wed, 2010-03-17 at 09:52 -0400, Justin Pitts wrote: FusionIO is publicly claiming 24 years @ 5TB/day on the 80GB SLC device, which wear levels across 100GB of actual installed capacity. http://community.fusionio.com/forums/p/34/258.aspx#258

20% of overall capacity free for levelling doesn't strike me as a lot. Some of the Enterprise grade stuff we are looking into (like TMS RamSan) leaves 40% (with much larger overall capacity). Also, running that drive at 80GB is the Maximum Capacity mode, which decreases the write performance. Max drive performance would be about 41TB/day, which coincidentally works out very close to the 3-year warranty they have on the devices.

To counter that: http://www.tomshardware.com/reviews/fusioinio-iodrive-flash,2140-2.html "Fusion-io's wear leveling algorithm is based on a cycle of 5 TB write/erase volume per day, resulting in 24 years run time for the 80 GB model, 48 years for the 160 GB version and 16 years for the MLC-based 320 GB type. However, since 5 TB could be written or erased rather quickly given the performance level, we recommend not relying on these approximations too much." FusionIO's claim _seems_ credible. I'd love to see some evidence to the contrary.

Vendor claims always seem credible. The key is to separate the marketing hype from the actual details. Again, I'm just passing along what I heard - which was from a vendor-neutral, major storage consulting firm that decided to stop recommending these drives to clients. Make of that what you will.

As an aside, some folks in our Systems Engineering department here did do some testing of FusionIO, and they found that the helper daemons were inefficient and placed a fair amount of load on the server. That might be something to watch out for, for those that are testing them.
-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Testing FusionIO
On Wed, 2010-03-17 at 14:11 -0400, Justin Pitts wrote: On Mar 17, 2010, at 10:41 AM, Brad Nicholson wrote: On Wed, 2010-03-17 at 09:52 -0400, Justin Pitts wrote: FusionIO is publicly claiming 24 years @ 5TB/day on the 80GB SLC device, which wear levels across 100GB of actual installed capacity. http://community.fusionio.com/forums/p/34/258.aspx#258 20% of overall capacity free for levelling doesn't strike me as a lot.

I don't have any idea how to judge what amount would be right.

Some of the Enterprise grade stuff we are looking into (like TMS RamSan) leaves 40% (with much larger overall capacity). Also, running that drive at 80GB is the Maximum Capacity mode, which decreases the write performance.

Very fair. In my favor, my proposed use case is probably at half capacity or less. I am getting the impression that partitioning/formatting the drive for the intended usage, and not the max capacity, is the way to go. Capacity isn't an issue with this workload. I cannot fit enough drives into these servers to get a tenth of the IOPS that even the Tom's Hardware article documents the ioDrive as capable of at reduced performance levels.

The actual media is only good for a very limited number of write cycles. The way the drives stay reliable is by constantly writing to different areas. The more you have free, the less often each area is re-used, and the longer the lifespan. This is done by the drive's wear-levelling algorithms, not by partitioning utilities, btw.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
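FusionIO's 24-years-at-5TB/day claim can be sanity-checked against the erase-cycle count it implies. This is back-of-envelope arithmetic over the numbers quoted in the thread, nothing more:

```python
def implied_erase_cycles(tb_per_day: float, years: float, capacity_gb: float) -> float:
    """Full write/erase cycles of the installed flash that the quoted
    endurance figure implies, assuming perfect wear levelling."""
    total_tb = tb_per_day * 365 * years      # lifetime write volume
    capacity_tb = capacity_gb / 1024         # installed flash, in TB
    return total_tb / capacity_tb

# 5 TB/day for 24 years, levelled across 100 GB of installed flash
print(round(implied_erase_cycles(5, 24, 100)))  # 448512
```

Whether ~450k cycles per cell is realistic for SLC flash is exactly the durability question being debated in the thread.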
Re: [PERFORM] Why Wal_buffer is 64KB
On Thu, 2010-03-25 at 20:31 +0530, Tadipathri Raghu wrote: Hi All, Can anybody clarify why wal_buffers is 64kB, and what the advantages and disadvantages are of increasing or decreasing it?

This is addressed in the documentation. http://www.postgresql.org/docs/8.4/interactive/wal-configuration.html

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] How to fast the REINDEX
On Thu, 2010-04-01 at 19:17 +0530, raghavendra t wrote: Hi All, System Config - CPU: Intel® Xeon® CPU, CPU Speed: 3.16 GHz, Server Model: Sun Fire X4150, RAM Size: 16GB. Steve: So am I to understand I don't need to do daily reindexing as a maintenance measure with 8.3.7 on FreeBSD? My question is something like Steve's: why should we not do reindexing as our maintenance task? I was doing reindex only to get a best fit and not fall short of the 90% hole, because my table has a lot of updates and deletes. We also have the weekly maintenance VACUUM, but the reindex still takes a lot of time.

This is your problem. You should enable autovacuum, let the vacuums happen more frequently, and this problem will go away. You will still have to fix the underlying bloat one last time, though.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Zeus IOPS
On Fri, 2010-05-28 at 13:48 -0600, Scott Marlowe wrote: Anybody on the list have any experience with these drives? They get good numbers, but I can't find diddly on them on the internet for the last year or so. http://www.stec-inc.com/product/zeusiops.php

I'd heard that they were a popular choice in the Enterprise market, but that was around 6 months ago or so, and purely anecdotal. Seems strange, though, for them to disappear off the radar when SSD-related information is becoming so prevalent. That could just be an effect of all the noise around the non-Enterprise grade SSDs floating around, though.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Zeus IOPS
On Tue, 2010-06-01 at 10:27 +0300, Mindaugas Riauba wrote: Hello, Anybody on the list have any experience with these drives? They get good numbers but I can't find diddly on them on the internet for the last year or so. http://www.stec-inc.com/product/zeusiops.php Most of the storage vendors (I have confirmation from EMC and HP) use those in their SAN boxes. I believe that is because they are the only SLC SSD makers that have supercapacitors on the SSD drive, which allows them to run with the write cache enabled. As a side effect, they are insanely expensive. :)

Texas Memory Systems also has these.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Tue, 2010-06-29 at 21:39 -0400, Bruce Momjian wrote: Jignesh Shah wrote: On Tue, Jun 29, 2010 at 2:45 PM, Bruce Momjian br...@momjian.us wrote: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I asked on IRC and was told it is true, and looking at the C code it looks true. What synchronous_commit = false does is to delay writing the wal buffers to disk and fsyncing them, not just fsync, which is where the commit loss due to db process crash comes from. Ah, I see. Thanks. I am personally surprised it was designed that way; I thought we would just delay fsync. That would require writing and syncing to be separable actions. If you're using O_SYNC or similar, they aren't. Ah, very good point. I have added a C comment to clarify why this is the current behavior; attached and applied. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com

Though has anybody seen a behaviour where synchronous_commit=off is slower than synchronous_commit=on? Again there are two cases here, one with the O_* flags and the other with the f*sync flags. But I had seen that behavior with PostgreSQL 9.0 beta (2, I think), though I haven't really investigated it much yet (and now I don't remember which wal_sync_method flag). Just curious if anybody has seen that behavior.

I have trouble believing how synchronous_commit=off could be slower than 'on'. I wonder if it could be contention on WAL buffers? Say I've turned synchronous_commit off and I drive enough traffic to fill up my wal_buffers. I assume that we would have to start writing buffers down to disk before allocating buffers to the new process.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Need help in performance tuning.
On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote: Samuel Gendler sgend...@ideasculptor.com writes: On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer cr...@postnewspapers.com.au wrote: If you're not using a connection pool, start using one. I see this issue and subsequent advice cross this list awfully frequently. Is there an architectural reason why postgres itself cannot pool incoming connections in order to eliminate the requirement for an external pool? Perhaps not, but there's no obvious benefit either. Since there's More Than One Way To Do It, it seems more practical to keep that as a separate problem that can be solved by a choice of add-on packages. This sounds similar to the approach taken with replication for years before it was moved into core. Just like replication, pooling has different approaches. I do think that in both cases, having a solution that works, easily, out of the box would meet the needs of most users. There is also the issue of perception/adoption here as well. One of my colleagues mentioned that at PG East he repeatedly heard people talking (negatively) about the over-reliance on add-on packages to deal with core DB functionality. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] dates and partitioning
On Tue, 2010-07-20 at 09:36 -0600, Kevin Kempter wrote: Hi All; we have a table partitioned by day, the check constraint on the child tables looks like this (this is the May 31st partition): CHECK (stime >= '2010-05-30 00:00:00+00'::timestamp with time zone AND stime <= '2010-05-30 23:59:59+00'::timestamp with time zone) We have a python based app that creates code like this: select * from table_a a, table_b b, table_d d where a.id = b.id and b.id = d.id and stime >= timestamp %s at time zone \'UTC\' and stime < timestamp %s at time zone \'UTC\' and stime >= timestamp %s at time zone d.name and stime < timestamp %s at time zone d.name ... so here are my questions: 1) the above app-generated query shows up like this in pg_stat_activity: and stime >= timestamp E'2010-07-17' at time zone 'UTC' and stime < timestamp E'2010-07-21' at time zone 'UTC' and stime >= timestamp E'2010-07-18' at time zone d.name and stime < timestamp E'2010-07-19' at time zone d.name what's the E'date' from? and why does it show up this way? That E is an escape-string prefix; Python is likely putting it in. See http://www.postgresql.org/docs/8.4/static/sql-syntax-lexical.html - section 4.1.2.2 2) the above query creates a plan that does a sequential scan + filter on every partition. Why won't it only hit the correct partitions? Is it due to the way the date was specified? or maybe the at time zone syntax? Do you have constraint_exclusion turned on? -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
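As a sketch of why the `at time zone d.name` predicates defeat pruning (table and column names follow the poster's example): constraint exclusion can only discard partitions when the bound is a constant the planner can compare against each child's CHECK constraint at plan time; an expression depending on another table's rows cannot be evaluated then.

```sql
-- Needed (pre-8.4 default was off) for the planner to use
-- child-table CHECK constraints to skip partitions:
SET constraint_exclusion = on;

-- Prunable: both bounds are plan-time constants.
SELECT *
FROM table_a
WHERE stime >= TIMESTAMP WITH TIME ZONE '2010-07-17 00:00:00+00'
  AND stime <  TIMESTAMP WITH TIME ZONE '2010-07-21 00:00:00+00';
```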
Re: [PERFORM] Testing Sandforce SSD
On 10-08-04 03:49 PM, Scott Carey wrote: On Aug 2, 2010, at 7:26 AM, Merlin Moncure wrote: On Fri, Jul 30, 2010 at 11:01 AM, Yeb Havingayebhavi...@gmail.com wrote: After a week of testing I think I can answer the question above: does it work like it's supposed to under PostgreSQL? YES The drive I have tested is the $435 50GB OCZ Vertex 2 Pro, http://www.newegg.com/Product/Product.aspx?Item=N82E16820227534 * it is safe to mount filesystems with barriers off, since it has a 'supercap backed cache'. That no data is lost was confirmed by a dozen power-switch-off tests while running either diskchecker.pl or pgbench. * the above implies it is also safe to use this SSD with barriers on, though that will perform worse, since this drive obeys write-through commands. * the highest pgbench tps number for the TPC-B test for a scale 300 database (~5GB) I could get was over 6700. Judging from the iostat average util of ~40% on the xlog partition, I believe that this number is limited by factors other than the SSD, like CPU, core count, core MHz, memory size/speed, and 8.4 pgbench running without threads. Unfortunately I don't have faster machines or ones with more cores available for testing right now. * pgbench numbers for a larger-than-RAM database, read-only, were over 25000 tps (details are at the end of this post), during which iostat reported ~18500 read iops and 100% utilization. * pgbench max reported latencies are 20% of comparable BBWC setups. * how reliable it is over time, and how it performs over time, I cannot say, since I tested it only for a week. Thank you very much for posting this analysis. This has IMNSHO the potential to be a game changer. There are still some unanswered questions in terms of how the drive wears, reliability, errors, and lifespan, but 6700 tps off of a single $400 device with decent fault tolerance is amazing (Intel, consider yourself upstaged). Ever since the first Samsung SSD hit the market I've felt the days of the spinning disk have been numbered.
Being able to build a 100k tps server on relatively inexpensive hardware, without an entire rack full of drives, is starting to look within reach. Intel's next-gen 'enterprise' SSDs are due out later this year. I have heard from those with access to test samples that they really like them -- these people rejected the previous versions because of the data loss on power failure. So, hopefully there will be some interesting competition later this year in the medium-price-range enterprise SSD market. I'll be doing some testing on enterprise-grade SSDs this year. I'll also be looking at some hybrid storage products that use SSDs as accelerators mixed with lower-cost storage. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
On 8/10/2010 12:21 PM, Greg Smith wrote: Scott Carey wrote: Also, the amount of data at risk in a power loss varies between drives. For Intel's drives, it's a small chunk of data (<256K). For some other drives, the cache can be over 30MB of outstanding writes. For some workloads this is acceptable. No, it isn't ever acceptable. You can expect that the type of data loss you get when a cache fails to honor write flush calls will result in catastrophic database corruption. It's not "I lost the last few seconds"; it's "the database is corrupted and won't start after a crash". This is why we pound on this topic on this list. An SSD that fails to honor flush requests is completely worthless for anything other than toy databases. You can expect significant work to recover any portion of your data after the first unexpected power loss under heavy write load in this environment, during which you're down. We do database corruption recovery at 2ndQuadrant; while I can't talk about the details of some recent incidents, I am not speaking theoretically when I warn about this. What about putting indexes on them? If the drive fails and drops writes on those, they could be rebuilt - assuming your system can function without the index(es) temporarily. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
On 8/10/2010 2:28 PM, Greg Smith wrote: Brad Nicholson wrote: What about putting indexes on them? If the drive fails and drops writes on those, they could be rebuilt - assuming your system can function without the index(es) temporarily. Dumping indexes on SSD is one of the better uses for them, presuming you can survive what is likely to be an outage from a "can the site handle full load?" perspective while they rebuild after a crash. As I'm sure Brad is painfully aware of already, index rebuilding in PostgreSQL can take a while. To spin my broken record here again, the main thing to note when you consider that--relocating indexes onto SSD--is that the ones you are most concerned about the performance of were likely already sitting in RAM anyway, meaning the SSD speedup doesn't help reads much. So the giant performance boost just isn't there in that case. The case where I'm thinking they may be of use is for indexes you can afford to lose. I'm thinking of ones that are needed by nightly batch jobs, downstream systems or reporting - the sorts of things that you can turn off during a rebuild, and where the data sets are not likely to be in cache. We have a few such cases, but we don't need the speed of SSDs for them. Personally, I wouldn't entertain any SSD without a capacitor backing it for anything, even indexes. Not worth the hassle to me. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
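For anyone who does decide to place rebuildable indexes on an SSD, the mechanics are just a tablespace; a minimal sketch (the mount point and the table/index names here are hypothetical):

```sql
-- Hypothetical mount point for the SSD volume.
CREATE TABLESPACE ssd_space LOCATION '/mnt/ssd/pgidx';

-- The index lives on the SSD; the table and WAL stay on reliable storage,
-- so losing the SSD costs only a REINDEX, not your data.
CREATE INDEX orders_batch_idx ON orders (batch_date) TABLESPACE ssd_space;
```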
Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
On 8/10/2010 2:38 PM, Karl Denninger wrote: Scott Marlowe wrote: On Tue, Aug 10, 2010 at 12:13 PM, Karl Denningerk...@denninger.net wrote: ANY disk that says write is complete when it really is not is entirely unsuitable for ANY real database use. It is simply a matter of time. What about read-only slaves where there's a master with 100+ spinning hard drives getting it right and you need a half dozen or so read slaves? I can imagine that being ok, as long as you don't restart a server after a crash without checking on it. A read-only slave isn't read-only, is it? I mean, c'mon - how does the data get there? A valid case is a Slony replica if used for query offloading (not for DR). It's considered a read-only subscriber from the perspective of Slony, as only Slony can modify the data (although you are technically correct, it is not read-only - controlled write may be more accurate). In case of failure, a rebuild + resubscribe gets you back to the same consistency. If you have high IO requirements, and don't have the budget to rack up extra disk arrays to meet them, it could be an option. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
On 8/10/2010 3:28 PM, Karl Denninger wrote: Brad Nicholson wrote: On 8/10/2010 2:38 PM, Karl Denninger wrote: Scott Marlowe wrote: On Tue, Aug 10, 2010 at 12:13 PM, Karl Denningerk...@denninger.net wrote: ANY disk that says write is complete when it really is not is entirely unsuitable for ANY real database use. It is simply a matter of time. What about read-only slaves where there's a master with 100+ spinning hard drives getting it right and you need a half dozen or so read slaves? I can imagine that being ok, as long as you don't restart a server after a crash without checking on it. A read-only slave isn't read-only, is it? I mean, c'mon - how does the data get there? A valid case is a Slony replica if used for query offloading (not for DR). It's considered a read-only subscriber from the perspective of Slony, as only Slony can modify the data (although you are technically correct, it is not read-only - controlled write may be more accurate). In case of failure, a rebuild + resubscribe gets you back to the same consistency. If you have high IO requirements, and don't have the budget to rack up extra disk arrays to meet them, it could be an option. CAREFUL with that model and beliefs. Specifically, the following will hose you without warning: 1. SLONY gets a change on the master. 2. SLONY commits it to the (read-only) slave. 3. Confirmation comes back to the master that the change was propagated. 4. Slave CRASHES without actually committing the changed data to stable storage. What will hose you is assuming that your data will be okay in the case of a failure, which is a very bad assumption to make in the case of unreliable SSDs. You are assuming I am implying that these should be treated like reliable media - I am not. In case of failure, you need to assume data loss until proven otherwise. If there is a problem, rebuild. When the slave restarts it will not know that the transaction was lost. Neither will the master, since it was told that it was committed.
Slony will happily go on its way and replicate forward, without any indication of a problem - except that on the slave, there are one or more transactions that are **missing**. Correct. Some time later you issue an update that goes to the slave, but the change previously lost causes the slave commit to violate referential integrity. SLONY will fail to propagate that change and all behind it - it effectively locks at that point in time. It will lock data flow to that subscriber, but not to others. You can recover from this by dropping the slave from replication and re-inserting it, but that forces a full-table copy of everything in the replication set. The bad news is that the queries to the slave in question may have been returning erroneous data for some unknown period of time prior to the lockup in replication (which hopefully you detect reasonably quickly - you ARE watching SLONY queue depth with some automated process, right?) There are ways around that - run two subscribers and redirect your queries on failure. Don't bring up the failed replica until it is verified or rebuilt. I can both cause this in the lab and have had it happen in the field. It's a nasty little problem that bit me on a series of disks that claimed to have write caching off, but in fact did not. I was very happy that the data on the master was good at that point, as if I had needed to fail over to the slave (thinking it was a good copy) I would have been in SERIOUS trouble. It's very easy to cause those sorts of problems. What I am saying is that the technology can have a use, if you are aware of the sharp edges, and can both work around them and live with them. Everything you are citing is correct, but it implies that they are blindly thrown in without understanding the risks and mitigating them. I'm also not suggesting that this is a configuration I would endorse, but it could potentially save a lot of money in certain use cases.
-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
On 10-08-12 03:22 AM, Arjen van der Meijden wrote: On 12-8-2010 2:53 gnuo...@rcn.com wrote: - The value of SSD in the database world is not as A Faster HDD(tm). Never was, despite the naïve who assert otherwise. The value of SSD is to enable BCNF datastores. Period. If you're not going to do that, don't bother. Silicon storage will never reach equivalent volumetric density, ever. SSD will never be useful in the byte-bloat world of xml and other flat-file datastores (resident in databases or not). Industrial-strength SSD will always be more expensive/GB, and likely by a lot. (Re)factoring to high normalization strips out an order of magnitude of byte bloat, increases native data integrity by as much, reduces much of the redundant code, and puts the ACID where it belongs. All good things, but not effortless. It is actually quite common to under-utilize (short stroke) hard drives in the enterprise world, simply because 'they' need more IOps per amount of data than a completely utilized disk can offer. As such, the expense/GB can be much higher than simply dividing the capacity by its price (and if you're looking at fibre channel disks, that price is quite high already). And then it is relatively easy to find enterprise SSDs with better pricing for the whole system as soon as the IOps are more important than the capacity. And when you compare the ongoing operational costs of rack space, power and cooling for big arrays full of spinning disks to flash-based solutions, the price comparison evens itself out even more. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] locking issue on simple selects?
On 10-09-15 03:07 PM, Tobias Brox wrote: On 15 September 2010 15:39, Tom Lanet...@sss.pgh.pa.us wrote: An exclusive lock will block selects too. Have you looked into pg_locks for ungranted lock requests? Well - I thought so; we have a logging script that logs the content of the pg_locks table, and it didn't log anything interesting, but it may be a problem with the script itself. It does an inner join on pg_locks.relation = pg_class.oid, but when I check now, this join seems to remove most of the rows in the pg_locks table. Does it make sense at all to join pg_class with pg_locks? I will ask the sysadmin to change it to an outer join for now. You can also enable log_lock_waits, and the lock waits will appear in your Postgres logs. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
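The join problem described above comes from pg_locks rows whose relation column is NULL (transaction-id and advisory locks, for example), which an inner join against pg_class silently drops. A sketch of the outer-join version, plus the logging setting mentioned in the reply (column names as in the pg_locks/pg_class system views):

```sql
-- Keep lock rows even when they aren't tied to a relation.
SELECT l.locktype, l.mode, l.granted, c.relname, l.pid
FROM pg_locks l
LEFT JOIN pg_class c ON c.oid = l.relation
WHERE NOT l.granted;

-- Log any lock wait longer than deadlock_timeout (default 1s)
-- to the server log; requires superuser (or set it in postgresql.conf).
SET log_lock_waits = on;
```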
Re: [PERFORM] Memory usage - indexes
On 10-09-24 12:46 PM, Tobias Brox wrote: On 24 September 2010 18:23, Bob Lunneybob_lun...@yahoo.com wrote: Consult pg_statio_user_indexes to see which indexes have been used and how much. What are the main differences between pg_statio_user_indexes and pg_stat_user_indexes? The pg_stat_* views give you usage information (for indexes: number of scans, numbers of tuples read/fetched). The pg_statio_* views give you information about block reads and block hits. I'm not at all concerned about 1 and 2 above - we don't have any performance issues on the write part, and we have plenty of disk capacity. We are still doing the nightly vacuum thing, and it does hurt us a bit since it's dragging out longer and longer. Why is the vacuum dragging out over time? Is the size of your data increasing, are you doing more writes that leave dead tuples, or are your tables and/or indexes getting bloated? Also, is there a reason why you do nightly vacuums instead of letting autovacuum handle the work? We started doing far less vacuuming when we let autovacuum handle things. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
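A side-by-side sketch of the two views (column names as in the 8.x statistics collector views):

```sql
-- Usage: how often each index is scanned and what it returns.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes;

-- I/O: blocks satisfied from shared buffers (hits) vs. read in
-- from outside them (reads).
SELECT indexrelname, idx_blks_read, idx_blks_hit
FROM pg_statio_user_indexes;
```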
Re: [PERFORM] Memory usage - indexes
On 10-09-24 01:41 PM, Tobias Brox wrote: What do you mean - that you could run regular vacuum less frequently, or that the regular vacuum would go faster? It means that vacuums ran less frequently. With cron-triggered vacuums, we estimated when tables needed to be vacuumed, and vacuumed them accordingly. Because of unpredictable shifts in activity, we scheduled the vacuums to happen more often than needed. With autovacuum, we vacuum some of our large tables far less frequently. We have a few large tables that used to get vacuumed every other day that now get vacuumed once or twice a month. The vacuums themselves take longer now, as we use the vacuum cost delay to control the IO. That wasn't an option for us when we did manual vacuums, as that was in 8.1, when vacuums were still treated as long-running transactions. Stretching a vacuum out to a few hours prior to 8.2 would bloat other tables. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
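The cost-delay throttling described above can be sketched as follows (the table name is hypothetical, and the values are illustrative rather than recommendations):

```sql
-- Manual vacuum: pause 20ms each time vacuum_cost_limit "cost units"
-- of I/O have been accrued, smoothing the I/O impact at the price
-- of a longer-running vacuum.
SET vacuum_cost_delay = 20;
VACUUM ANALYZE big_table;

-- autovacuum has its own knobs in postgresql.conf:
--   autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit
```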
Re: [PERFORM] Memory usage - indexes
On 10-09-24 03:06 PM, Bob Lunney wrote: The pertinent difference between pg_stat_user_indexes and pg_statio_user_indexes is that the latter shows the number of blocks read from disk or found in the cache. I have a minor, but very important, correction involving this point. The pg_statio views show you which blocks are found in the Postgres buffer cache, and which ones are not. For the ones that are not, those blocks may come from the OS filesystem cache, a battery-backed cache, or the actual disk. There is a big difference in performance based on where you are actually getting those blocks from (and you can't get this info from Postgres). -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] New wiki page on write reliability
On 10-10-21 10:08 AM, Greg Smith wrote: Now that some of my recent writing has gone from NDA protected to public sample, I've added a new page to the PostgreSQL wiki that provides a good starting set of resources to learn about an ever popular topic here, how write cache problems can lead to database corruption: http://wiki.postgresql.org/wiki/Reliable_Writes Bruce also has a presentation he's been working on that adds pictures showing the flow of data through the various cache levels, to help people visualize the whole thing, that should get added into there once he's finished tweaking it. I'd like to get some feedback from the members of this list about what's still missing after this expanded data dump. Ultimately I'd like to get this page to be an authoritative enough resource that the Reliability section of the official documentation could point back to this as a recommendation for additional information. So much of this material requires singling out specific vendors and staying up to date with hardware changes, both things that the official docs are not a good place for. Looks like a good start. I think a warning about turning fsync off, the dangers of async commit, and the potential problems with disabling full_page_writes might be worth mentioning on this page, unless you want to leave that buried in the attached references. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
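For the settings called out above, a quick sketch of checking them on a running server (all three are standard GUCs):

```sql
-- All three should normally be 'on' for a durable production database.
SHOW fsync;               -- off = no crash safety at all
SHOW full_page_writes;    -- off risks torn-page corruption after a crash
SHOW synchronous_commit;  -- off can lose (but not corrupt) recent commits
```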
Re: [PERFORM] AIX slow buffer reads
On 10-10-25 03:26 PM, André Volpato wrote: | On Mon, Oct 25, 2010 at 2:21 PM, André Volpato |andre.volp...@ecomtecnologia.com.br wrote: | Hi all, | | We are tuning a PostgreSQL box with AIX 5.3 and got stuck in a | very odd situation. | When a query is run for the second time, the system seems to | deliver the results too slowly. | | Here's some background info: | | AIX Box: | PostgreSQL 8.4.4, AIX 5.3-9 64bits, SAN IBM DS3400, 8x450GB SAS 15K | Raid-5 | 8GB RAM, 2.3GB Shared buffers | | Debian Box: | PostgreSQL 8.4.4, Debian 4.3.2 64bits, SAN IBM DS3400, 5x300GB SAS | 15K Raid-0 | 7GB RAM, 2.1GB Shared buffers | | Right now, we changed lots of AIX tunables to increase disk and OS | performance. | Of course, postgres got tuned as well. I can post all changes made | until now if needed. | | To keep it simple, I will try to explain only the buffer read issue. | This query [1] took like 14s to run at AIX, and almost the same time | at Debian. | The issue is when I run it for the second time: | AIX - 8s | Debian - 0.3s | | These times keep repeating after the second run, and I can ensure | AIX isn't touching the disks anymore. | I've never seen this behaviour before. I heard about Direct I/O and | I was thinking about giving it a shot. | Any ideas? | | | I doubt disk/io is the problem. Me either. Like I said, AIX does not touch the storage when running the query. It became CPU-bound after data got into cache. Have you confirmed that the hardware is ok on both servers? Have both OS's been tuned by people that know how to tune the respective OS's? AIX is very different than Linux, and needs to be tuned accordingly. On AIX, can you trace why it is CPU-bound? What else is taking the CPU time, anything? Also, can you provide the output of pg_config from your AIX build? -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] AIX slow buffer reads
On 10-10-26 05:04 PM, André Volpato wrote: - Mensagem original - | On 10-10-25 03:26 PM, André Volpato wrote: | | On Mon, Oct 25, 2010 at 2:21 PM, André Volpato | |andre.volp...@ecomtecnologia.com.br wrote: (...) | | These times keep repeating after the second run, and I can | | ensure AIX isn't touching the disks anymore. | | I've never seen this behaviour before. I heard about Direct I/O | | and I was thinking about giving it a shot. | | | | Any ideas? | | | | | | I doubt disk/io is the problem. | | Me either. | Like I said, AIX does not touch the storage when running the query. | It became CPU-bound after data got into cache. | | Have you confirmed that the hardware is ok on both servers? | The hardware was recently installed and checked by the vendor team. AIX box is on JS22: PostgreSQL 8.4.4, AIX 5.3-9 64bits, SAN IBM DS3400, 8x450GB SAS 15K Raid-5 8GB RAM (DDR2 667) # lsconf System Model: IBM,7998-61X Processor Type: PowerPC_POWER6 Processor Implementation Mode: POWER 6 Processor Version: PV_6 Number Of Processors: 4 Processor Clock Speed: 4005 MHz CPU Type: 64-bit Kernel Type: 64-bit Memory Size: 7680 MB Debian box is on HS21: PostgreSQL 8.4.4, Debian 4.3.2 64bits, SAN IBM DS3400, 5x300GB SAS 15K Raid-0 7GB RAM (DDR2 667) We are forced to use RedHat on this machine, so we are virtualizing the Debian box. # cpuinfo processor : [0-7] vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz stepping: 6 cpu MHz : 2500.148 cache size : 6144 KB | Have both OS's been tuned by people that know how to tune the | respective OS's? AIX is very different than Linux, and needs to be tuned | accordingly. We've been tuning AIX for the last 3 weeks, and lots of tunables got changed. On Debian, we have far more experience, and it's been a challenge to understand how AIX works.
Most important tunables: page_steal_method=1 lru_file_repage=0 kernel_heap_psize=64k maxperm%=90 maxclient%=90 minperm%=20 Disk: chdev -l hdisk8 -a queue_depth=24 chdev -l hdisk8 -a reserve_policy=no_reserve chdev -l hdisk8 -a algorithm=round_robin chdev -l hdisk8 -a max_transfer=0x40 HBA: chdev -l fcs0 -P -a max_xfer_size=0x40 -a num_cmd_elems=1024 Postgres: shared_buffers = 2304MB effective_io_concurrency = 5 I wonder if effective_io_concurrency has anything to do with it. It was implemented and mainly tested on Linux, and I am unsure if it will do anything on AIX. The plan you posted for the query does bitmap index scans, which is what effective_io_concurrency will speed up. Can you post the output of EXPLAIN ANALYZE for that query on both AIX and Linux? That will show where the time is being spent. If it is being spent in the bitmap index scan, try setting effective_io_concurrency to 0 on Linux, and see what effect that has. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
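The comparison suggested above can be sketched like this (the table and query are hypothetical stand-ins for the poster's query [1]; the point is to compare the "Bitmap Heap Scan" actual times between the two runs):

```sql
-- Baseline: posix_fadvise prefetching disabled.
SET effective_io_concurrency = 0;
EXPLAIN ANALYZE
SELECT count(*) FROM events WHERE event_date = DATE '2010-10-01';

-- Re-run with prefetching enabled and compare the bitmap scan
-- node times between the two plans.
SET effective_io_concurrency = 5;
EXPLAIN ANALYZE
SELECT count(*) FROM events WHERE event_date = DATE '2010-10-01';
```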
Re: [PERFORM] AIX slow buffer reads
On 10/27/2010 4:10 PM, Tom Lane wrote: Greg Smithg...@2ndquadrant.com writes: André Volpato wrote: I disabled effective_io_concurrency at AIX but it made no change to bitmap index times. Brad's point is that it probably doesn't do anything at all on AIX, and is already disabled accordingly. AFAICT from googling, AIX does have posix_fadvise, though maybe it doesn't do anything useful ... regards, tom lane Is there an easy way to check whether it does anything useful? If so, I can check it out. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.