[PERFORM] Fragmentation/Vacuum, Analyze, Re-Index

2010-01-22 Thread DM
Hello All,

How can I identify whether a table requires a full vacuum? And how can I tell
when an existing index on a table needs to be re-indexed?

Is there any tool for the above?

Thanks
Deepak Murthy


[PERFORM] Slow update query

2010-01-22 Thread elias ghanem
Hi,

 

For the explain analyze here's the output:

Seq Scan on in_sortie  (cost=0.00..171140.19 rows=114449 width=84) (actual
time=15.074..28461.349 rows=99611 loops=1)

  Output: type, site_id, fiche_produit_id, numero_commande, ligne_commande,
date_sortie, quantite_sortie, date_livraison_souhaitee, quantite_souhaitee,
client_ref, valeur, type_mouvement, etat_sortie_annulation,
etat_sortie_prevision, etat_sortie_taux_service, date_commande, valide

  Filter: (valeur > 0.83)

Total runtime: 104233.651 ms

 

(Although EXPLAIN ANALYZE reports a total runtime of 104233.651 ms, when I
actually run the query it takes about 2.5 minutes.)

 

-Concerning the exact version of postgresql I'm using, here is the result of
the select version() :

PostgreSQL 8.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.6
20060404 (Red Hat 3.4.6-10), 32-bit

 

- for the postgresql.conf I've attached the file.

 

-Concerning the query, I'm sorry; it seems that I did not explain the
problem clearly enough. Here's a better explanation:

This update, shown below, is just one step in a long process. After
processing certain rows, these rows have to be flagged so they don't get
processed another time.

UPDATE IN_SORTIE SET VALIDE = 'O' WHERE VALEUR > 0.83

The [SET VALIDE = 'O'] merely flags this row as already processed.

The where clause that identifies these rows is rather simple: [WHERE VALEUR >
0.83]. It affects around 100,000 records in a table that contains around
3,000,000.

We are running this process on both Oracle and Postgres. I have noticed that
this particular UPDATE statement for the same table size and the same number
of rows affected, takes 11 seconds on Oracle while it takes 2.5 minutes on
Postgres.

Note that there are no indexes on this table in either database.

 

So the problem can be summarized as follows: why does a query like UPDATE
IN_SORTIE SET VALIDE = 'O' WHERE VALEUR > 0.83 take 2.5 minutes on PostgreSQL,
given that it is issued against a table containing around 3,000,000 records
and affects around 100,000 of them?

 

Thanks again for your advice



postgresql.conf
Description: Binary data



Re: [PERFORM] Slow update query

2010-01-22 Thread Robert Haas
On Thu, Jan 21, 2010 at 11:14 AM, elias ghanem e.gha...@acteos.com wrote:
 So the average execution time of the function is around 2.5 mins, meaning
 that the update query (+ the vacuum) takes 2.5 mins to execute. So is this
 normal behavior? (The same function in Oracle with the same environment
 (without the vacuum, obviously) executes in 11 seconds.)

Does Oracle get slower if you actually change something?

...Robert



Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-22 Thread Carlo Stonebanks

Hi Greg,

As a follow up to this suggestion:

I don't see effective_cache_size listed there.  If that's at the default, 
I wouldn't be surprised that you're seeing sequential scans instead of 
indexed ones far too often.


I found an article written by you 
http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm and thought 
this was pretty useful, and especially this comment:


effective_cache_size should be set to how much memory is leftover for disk 
caching after taking into account what's used by the operating system, 
dedicated PostgreSQL memory, and other applications. If it's set too low, 
indexes may not be used for executing queries the way you'd expect. Setting 
effective_cache_size to 1/2 of total memory would be a normal conservative 
setting. You might find a better estimate by looking at your operating 
system's statistics. On UNIX-like systems, add the free+cached numbers from 
free or top. On Windows see the System Cache in the Windows Task Manager's 
Performance tab.




Are these values to look at BEFORE starting PG? If so, how do I relate the 
values returned to the effective_cache_size setting?
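
For instance, is it something like this? (The numbers below are made up,
purely to illustrate the arithmetic I have in mind, not values from my
server.)

-- Hypothetical 8GB box dedicated to PostgreSQL: free/top report roughly
-- 6GB in free+cached, so effective_cache_size would be set near 6GB.
SHOW effective_cache_size;           -- current value (the default is 128MB)
SET effective_cache_size = '6GB';    -- session-level trial; it is only a
                                     -- planner hint, no memory is allocated
-- If plans improve, make it permanent in postgresql.conf and reload.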


Carlo

PS Loved your 1995 era pages. Being a musician, it was great to read your 
recommendations on how to buy these things called CD's. I Googled the 
term, and they appear to be some ancient precursor to MP3s which people 
actually PAID for. What kind of stone were they engraved on? ;-D






Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Tory M Blue
On Thu, Jan 21, 2010 at 7:46 PM, Craig Ringer
cr...@postnewspapers.com.au wrote:


  Any assistance would be appreciated, don't worry about slapping me
  around I need to figure this out. Otherwise I'm buying new hardware
  where it may not be required.

 What is the reporting query that takes 26 hours? You didn't seem to
 include it, or any query plan information for it (EXPLAIN or EXPLAIN
 ANALYZE results).


It's this query, run 6000 times with a different makeid each time:

SELECT COUNT(DISTINCT uid )  AS active_users FROM pixelpool.userstats
WHERE makeid ='bmw-ferman' AND tagged =true

Plan:

Aggregate  (cost=49467.00..49467.01 rows=1 width=8)
  ->  Bitmap Heap Scan on userstats  (cost=363.49..49434.06 rows=13175 width=8)
        Recheck Cond: (makeid = 'b1mw-ferman'::text)
        Filter: tagged
        ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
              Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))


 What sort of activity is happening on the db concurrently with your
 tests? What's your max connection limit?


50 max, and there is nothing else going on; usually one person connected, if
that. Otherwise it's a cron job that bulk inserts, and then jobs run later
that generate the reports off the static data. No deletes or updates
happening.



 What're your shared_buffers and effective_cache_size settings?


shared_buffers = 1028MB  (Had this set at 128 and 256 and just recently
bumped it higher, didn't buy me anything)
maintenance_work_mem = 128MB
fsync=on
random_page_cost = 4.0
effective_cache_size = 7GB
default vac settings
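
(Those are from postgresql.conf; a quick cross-check of the live values can
be done with something like the query below.)

SELECT name, setting, unit
  FROM pg_settings
 WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                'effective_cache_size', 'random_page_cost', 'fsync');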



 Could sorts be spilling to disk? Check work_mem size and enable logging
 of tempfiles (see the manual).


work_mem = 100MB    # min 64kB

Will do, and I guess it's possible, but during the queries/reports I don't
see a ton of writes, mostly reads.
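
(For reference, I believe the tempfile logging being suggested is the
log_temp_files setting, available from 8.3 on; the value is a size in kB and
0 logs every temp file. A minimal sketch:)

-- in postgresql.conf, or per session as a superuser:
SET log_temp_files = 0;   -- log every temporary file created by sorts/hashes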


 Does an explicit ANALYZE of the problem table(s) help?


It didn't.

Thanks
Tory


Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Tory M Blue
On Fri, Jan 22, 2010 at 1:42 AM, Richard Huxton d...@archonet.com wrote:

 On 21/01/10 22:15, Tory M Blue wrote:

 · Data distribution = In the 98mill records, there are 7000 unique
 makeid's, and 21mill unique UID's. About 41mill of the records have
 tagged=true

 · Time to execute the following query with indices on makeid and
 tagged = 90-120 seconds. The planner uses the webid index and filters on
 tagged and then rechecks the webid index

 SELECT COUNT(DISTINCT uid )  AS active_users FROM
 pixelpool.userstats WHERE makeid ='bmw-ferman' AND tagged =true

 · Time to execute the same query with a combined index on makeid
 and tagged = 60-100 seconds. The planner uses the combined index and then
 filters tagged.


 Two things:

 1. You have got the combined index on (makeid, tagged) and not (tagged,
 makeid) haven't you? Just checking.


Yes we do


 2. If it's mostly tagged=true you are interested in you can always use a
 partial index: CREATE INDEX ... (makeid) WHERE tagged
 This might be a win even if you need a second index with WHERE NOT tagged.


A partial index doesn't seem to fit here, since 35-40% of the rows are
marked true.

Didn't think about creating a second index for false, may give that a shot.
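
For reference, the pair of partial indexes being discussed would look
something like this (index names are just placeholders):

CREATE INDEX userstats_makeid_tagged_idx
    ON pixelpool.userstats (makeid) WHERE tagged;
CREATE INDEX userstats_makeid_untagged_idx
    ON pixelpool.userstats (makeid) WHERE NOT tagged;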



 Also, either I've not had enough coffee yet, or a bitmap scan is an odd
 choice for only ~ 13000 rows out of 100 million.

   ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
         Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))


 Otherwise, see what Craig said.

 I'm assuming this isn't the query that is CPU bound for a long time. Unless
 your table is horribly bloated, there's no reason for that judging by this
 plan.


It is, but not always; only when there are 10K or more matches. And the
explain estimates are unfortunately sometimes way high or way low, so the
explain is hit and miss.

But the same SQL that returns maybe 500 rows is pretty fast; it's the return
of 10K+ rows that seems to stall and is CPU bound.

Thanks

Tory


Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Richard Huxton

On 22/01/10 18:03, Tory M Blue wrote:

On Fri, Jan 22, 2010 at 1:42 AM, Richard Huxton d...@archonet.com wrote:


On 21/01/10 22:15, Tory M Blue wrote:



2. If it's mostly tagged=true you are interested in you can always use a
partial index: CREATE INDEX ... (makeid) WHERE tagged
This might be a win even if you need a second index with WHERE NOT tagged.



A partial index doesn't seem to fit here, since 35-40% of the rows are
marked true.

Didn't think about creating a second index for false, may give that a shot.


If you're mostly searching tagged=true, try the partial index - it'll mean 
the planner is just scanning the index for the one term.



Also, either I've not had enough coffee yet, or a bitmap scan is an odd
choice for only ~ 13000 rows out of 100 million.

  ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
        Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))



Otherwise, see what Craig said.

I'm assuming this isn't the query that is CPU bound for a long time. Unless
your table is horribly bloated, there's no reason for that judging by this
plan.


It is, but not always; only when there are 10K or more matches. And the
explain estimates are unfortunately sometimes way high or way low, so the
explain is hit and miss.

But the same SQL that returns maybe 500 rows is pretty fast; it's the return
of 10K+ rows that seems to stall and is CPU bound.


Hmm - might be able to push that cross-over point up a bit by tweaking 
various costs, but you've got to be careful you don't end up making all 
your other queries worse. It'd be good to figure out what the problem is 
first.


Looking at the query there are four stages:
 1. Scan the index, build a bitmap of heap pages with matching rows
 2. Scan those pages, find the rows that match
 3. Run DISTINCT on the uids
 4. Count them
I wonder if it could be the DISTINCT. What happens with a count(*) or 
count(uid) instead? Also - you might find EXPLAIN ANALYZE more useful 
than straight EXPLAIN here. That will show actual times for each stage.
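
For instance (reusing the makeid from your example plan), comparing these
three should show how much of the time is going into the DISTINCT:

EXPLAIN ANALYZE SELECT COUNT(*) FROM pixelpool.userstats
 WHERE makeid = 'b1mw-ferman' AND tagged = true;

EXPLAIN ANALYZE SELECT COUNT(uid) FROM pixelpool.userstats
 WHERE makeid = 'b1mw-ferman' AND tagged = true;

EXPLAIN ANALYZE SELECT COUNT(DISTINCT uid) FROM pixelpool.userstats
 WHERE makeid = 'b1mw-ferman' AND tagged = true;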


On Craig's branch of this thread, you say you call it 6000 times with 
different makeids. Any reason why you can't join to a temp table and 
just do it in one query?


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Matthew Wakeling

On Fri, 22 Jan 2010, Tory M Blue wrote:

But the same sql that returns maybe 500 rows is pretty fast, it's the return
of 10K+ rows that seems to stall and is CPU Bound.


Okay, so you have two differing cases. Show us the EXPLAIN ANALYSE for 
both of them, and we will see what the difference is.


Matthew

--
The third years are wandering about all worried at the moment because they
have to hand in their final projects. Please be sympathetic to them, say
things like ha-ha-ha, but in a sympathetic tone of voice 
   -- Computer Science Lecturer




Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Scott Marlowe
On Fri, Jan 22, 2010 at 10:59 AM, Tory M Blue tmb...@gmail.com wrote:
 On Thu, Jan 21, 2010 at 7:46 PM, Craig Ringer cr...@postnewspapers.com.au
 wrote:
  Any assistance would be appreciated, don't worry about slapping me
  around I need to figure this out. Otherwise I'm buying new hardware
  where it may not be required.

 What is the reporting query that takes 26 hours? You didn't seem to
 include it, or any query plan information for it (EXPLAIN or EXPLAIN
 ANALYZE results).

 It's this query, run 6000 times with a different makeid each time:

 SELECT COUNT(DISTINCT uid )  AS active_users FROM pixelpool.userstats
 WHERE makeid ='bmw-ferman' AND tagged =true

Any chance of trying this instead:

select makeid, count(distinct uid) as active_users from
pixelpool.userstats where tagged=true group by makeid

And seeing how long it takes?  If you're limiting the total number of
makeids then you could add

and makeid in (biglistofmakeidsgoeshere)

Note that a partial index of

create index xyz on pixelpool.userstats (makeid) where tagged;

might help both the original and this query.



Re: [PERFORM] Fragmentation/Vacuum, Analyze, Re-Index

2010-01-22 Thread Richard Neill

DM wrote:
Is there any script/tool to identify if the table requires full vacuum? 
or to re-index an existing index table?




Don't know if there is a script to specifically do this, though you may 
find this query a useful one:


SELECT relname, reltuples, relpages FROM pg_class ORDER BY relpages DESC;


(it shows what's currently using most of the disk).


In general though, you should never use VACUUM FULL. The best bet is 
to tune autovacuum to be more aggressive, and then occasionally run CLUSTER.
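
If you're on 8.3 or later, the statistics views also give a rough idea of
which tables have accumulated dead rows (a sign that autovacuum isn't keeping
up), for example:

SELECT relname, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 20;

CLUSTER rewrites the table (and its indexes) from scratch, so it reclaims the
space that VACUUM FULL would, without leaving the indexes bloated.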


Best wishes,

Richard




Thanks
Deepak





Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-22 Thread fka...@googlemail.com
Scott Carey:

 Well, something is causing the system to alternate between
 CPU and disk bound here.  (see below).
 It would be useful to see what affect the index has.

Ok, I simply deleted the index and repeated the test: I did not notice any
difference. This is probably because in fact I am doing just 10 INSERTs.


  (B) perfmon34.png: Results/graphs (performance monitor):
  
 Great data!

BTW: I have some more screenshots, but as they do not arrive on the mailing
list I'll hold them back. The new graphs are basically the same anyway.


  (1) The upper dark/gray graph touching the 100% sometimes is
  disk write time % of the data disk G:

  (2) The yellow graph is nearly completely overpainted by (1)
  since it is disk time %.

  (3) The brown graph below (1) is Disk Write Bytes/s divided
  by 1.000.000, so around 40 MB/s average.
  
 
 Looks like it is writing everything twice, or close to it.
 Alternatively the index writes occupy half, but that is
 unlikely.

'Writing twice': That is the most interesting point, I believe. Why is the
data disk doing 40 MB/s *not* including WAL, when the actual write throughput
is 20 MB/s? Seems like: 20 MB for data, 20 MB for X, 20 MB for WAL.

Although that question is still unanswered: I verified again that I am disk
bound by temporarily replacing the raid-0 with a slower solution: a
singly-attached SATA disk of the same type. This *did* slow down the test a
lot (approx. 20%). So, yes, I am disk bound, but, again, why by that much...

About removing the index on OIDs: No impact (see above).


  (4) The read graph is Disk Time % of the WAL drive E:,
  average approx. 30%.
  
 
 WAL doesn't look like a bottleneck here, as other tests
 have shown.  A larger wal_buffers setting might lower this
 more, since your record overflows the buffer for sure.
 You might want to change your test case to write records
 similar size to what you expect (do you expect 80MB?) and
 then set wal_buffers up to the size of one checkpoint
 segment (16MB) if you expect larger data per transaction.

Ok, without knowing each exact effect I changed some of the
configuration values (from the defaults in 8.2.4), and did
some tests:

(1) First, the most important 8.2.4 defaults (for you to
review):

#shared_buffers=32MB
#temp_buffers=8MB
#max_prepared_transactions=5
#work_mem=1MB
#maintenance_work_mem=16MB
#max_stack_depth=2MB
#max_fsm_pages=204800
#max_fsm_relations=1000
#max_files_per_process=1000
#shared_preload_libraries=''
#vacuum_cost_delay=0
#vacuum_cost_page_hit=1
#vacuum_cost_page_miss=10
#vacuum_cost_page_dirty=20
#vacuum_cost_limit=200
#bgwriter_delay=200ms
#bgwriter_lru_percent=1.0
#bgwriter_lru_maxpages=5
#bgwriter_all_percent=0.333
#bgwriter_all_maxpages=5
#fsync=on
#full_page_writes=on
#wal_buffers=64kB
#checkpoint_segments=3
#checkpoint_timeout=5min
#checkpoint_warning=30s
#seq_page_cost=1.0
#random_page_cost=4.0
#cpu_tuple_cost=0.01
#cpu_index_tuple_cost=0.005
#cpu_operator_cost=0.0025
#effective_cache_size=128MB
#default_statistics_target=10
#constraint_exclusion=off
#from_collapse_limit=8
#join_collapse_limit=8
#autovacuum=on
#autovacuum_naptime=1min
#autovacuum_vacuum_threshold=500
#autovacuum_analyze_threshold=250
#autovacuum_vacuum_scale_factor=0.2
#autovacuum_analyze_scale_factor=0.1
#autovacuum_freeze_max_age=200000000
#autovacuum_vacuum_cost_delay=-1
#autovacuum_vacuum_cost_limit=-1
#deadlock_timeout=1s
#max_locks_per_transaction=64


(2) The tests:

Note: The standard speed was about 800MB/40s, so 20MB/s.


a)
What I changed: fsync=off
Result: 35s, so 5s faster.


b) like a) but:
checkpoint_segments=128 (was 3)
autovacuum=off

Result: 35s (no change...?!)


c) like b) but:
temp_buffers=200MB (was 8)
wal_sync_method=open_datasync (was fsync)
wal_buffers=1024kB (was 64)

Result:
The best ever, it took just 29s, so 800MB/29s = 27.5MB/s.
However, having autovacuum=off probably means that deleted rows will still
occupy disk space? And I also fear that checkpoint_segments=128 means that at
some point in the future there will be a huge delay (?).


d) also like b) but:
temp_buffers=1000MB
wal_buffers=4096kB
checkpoint_segments=3
autovacuum=on

Result: Again slower 36s



I am not able to interpret that in depth.





  (C) My interpretation
  
  (1) Although the data disk G: sometimes hits 100%: All in
  all it seems that neither the CPUs nor the data disk
  (approx. 65%) nor the WAL disk (approx. 30%) are at their
  limits. See also 1000 writes/s, 40MB/s write throughput.
  
 
 I think it is alternating.  Whatever is causing the 25%
 CPU jump during the 'slow' periods is a clue.  Some
 process on the system must be increasing its time
 significantly in these bursts.  I suspect it is postgres
 flushing data from its shared_buffers to the OS.  8.2 is
 not very efficient at its ability to write out to the OS
 in a constant stream, and tends to be very 'bursty' like
 this.  I suspect that 8.3 or 8.4 would perform a lot
 better here, when tuned right.

Ok, I've managed to use 8.4 here. 

Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Craig Ringer

On 23/01/2010 1:59 AM, Tory M Blue wrote:


It's this query, run 6000 times with a different makeid each time:

SELECT COUNT(DISTINCT uid )  AS active_users FROM
pixelpool.userstats WHERE makeid ='bmw-ferman' AND tagged =true

Plan:

 Aggregate  (cost=49467.00..49467.01 rows=1 width=8)
   ->  Bitmap Heap Scan on userstats  (cost=363.49..49434.06 rows=13175 width=8)
         Recheck Cond: (makeid = 'b1mw-ferman'::text)
         Filter: tagged
         ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
               Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))


Try:

- Adding a partial index on makeid, eg:

   CREATE INDEX userstats_makeid_where_tagged_idx
   ON userstats (makeid) WHERE (tagged);

- Instead of repeating the query 6000 times in a loop, collect the data 
in one pass by joining against a temp table containing the makeids of 
interest.


SELECT COUNT(DISTINCT u.uid) AS active_users
FROM pixelpool.userstats u
INNER JOIN temp_makeids m ON (u.makeid = m.makeid)
WHERE u.tagged = true;

(If the 6000 repeats are really a correlated subquery part of a bigger 
query you still haven't shown, then you might be able to avoid 6000 
individual passes by adjusting your outer query instead).
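
A rough sketch of the one-pass form, assuming a temp table of the makeids
(names here are hypothetical); grouping also gives you a per-makeid count
rather than one grand total:

CREATE TEMP TABLE temp_makeids (makeid text PRIMARY KEY);
-- populate it with the ~6000 makeids of interest (COPY, INSERT ... SELECT, etc.)

SELECT m.makeid, COUNT(DISTINCT u.uid) AS active_users
  FROM pixelpool.userstats u
  JOIN temp_makeids m ON (u.makeid = m.makeid)
 WHERE u.tagged = true
 GROUP BY m.makeid;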


--
Craig Ringer



Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-22 Thread Scott Carey

On Jan 21, 2010, at 12:35 AM, Greg Smith wrote:

 Scott Carey wrote:
 On Jan 20, 2010, at 5:32 AM, fka...@googlemail.com wrote:
 
 
  In the attachment you'll find 2 screenshots perfmon34.png
  and perfmon35.png (I hope 2x14 kb is o.k. for the mailing
  list).
 
 
 
 I don't think they made it to the list?  I didn't see it, presumably 
 Scott got a direct copy.  I'd like to get a copy and see the graphs even 
 if takes an off-list message.  If it's an 8.2 checkpoint issue, I know 
 exactly what shapes those take in terms of the disk I/O pattern.
 

Sorry -- I didn't get them from the list, I was CC'd along with the list, and 
so my copy has the images.

 -- 
 Greg Smith2ndQuadrant   Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com  www.2ndQuadrant.com
 




Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-22 Thread Scott Carey
On Jan 22, 2010, at 12:42 PM, fka...@googlemail.com wrote:
 
 'Writing twice': That is the most interesting point, I believe. Why is the
 data disk doing 40 MB/s *not* including WAL, when the actual write
 throughput is 20 MB/s? Seems like: 20 MB for data, 20 MB for X, 20 MB for
 WAL.
 

There are a few things that can do this for non-TOAST stuff.  The other comment 
that TOAST writes all zeros first might be related too.

 Although that question is still unanswered: I verified
 again that I am disk bound by temporarily replacing the
 raid-0 with a slower solution: a singly-attached SATA disk
 of the same type. This *did* slow down the test a lot
 (approx. 20%). So, yes, I am disk bound, but, again, why by
 that much...
 

Sometimes disk bound (as the graphs show).  I suspect that if you artificially 
slow your CPU down (maybe force it into power saving mode with a utility) it 
will also be slower.  The I/O seems to be the most significant part though.

 
 (1) First, the most important 8.2.4 defaults (for you to
 review):
 
 #shared_buffers=32MB

Try 200MB for the above
 #temp_buffers=8MB

You tried making this larger, which helped some.

 #bgwriter_delay=200ms
 #bgwriter_lru_percent=1.0
 #bgwriter_lru_maxpages=5
 #bgwriter_all_percent=0.333
 #bgwriter_all_maxpages=5
 #checkpoint_segments=3
 #checkpoint_timeout=5min
 #checkpoint_warning=30s

Check out this for info on these parameters
http://wiki.postgresql.org/wiki/User:Gsmith  (Is there a better link Greg?)

 #fsync=on
Changing this probably helps the OS spend less time flushing to disk.

 
 (2) The tests:
 
 Note: The standard speed was about 800MB/40s, so 20MB/s.
 
 
 a)
 What I changed: fsync=off
 Result: 35s, so 5s faster.
 
 
 b) like a) but:
 checkpoint_segments=128 (was 3)
 autovacuum=off
 
 Result: 35s (no change...?!)
 

Yes, more checkpoint_segments will help if your shared_buffers is larger; it
won't do a whole lot otherwise.  Generally, I like to keep these roughly equal
in size as a starting point for any small to medium sized configuration.  So
if shared_buffers is 1GB, it takes 64 checkpoint segments (at 16MB each) to
hold that much for heavy write scenarios.

 
 c) like b) but:
 temp_buffers=200MB (was 8)
 wal_sync_method=open_datasync (was fsync)
 wal_buffers=1024kB (was 64)
 
 Result:
 The best ever, it took just 29s, so 800MB/29s = 27.5MB/s.
 However, having autovacuum=off probably means that deleted
 rows will occupy disk space? And I also fear that
 checkpoint_segments=128 mean that at some point in the
 future there will be a huge delay then (?).

I am curious which of the two helped most.  I don't think temp_buffers should 
do anything (it is for temp tables afaik).

 d) also like b) but:
 temp_buffers=1000MB
 wal_buffers=4096kB
 checkpoint_segments=3
 autovacuum=on
 
 Result: Again slower 36s
 

Try changing shared_buffers.  This is where uncommitted data needs to avoid
overflowing before a commit.  If this was non-TOAST data, I would suspect this
is the cause of any double-writing.  But I don't know enough about TOAST to
know if the same thing happens here.


 Ok, I've managed to use 8.4 here. Unfortunately there was
 nearly no improvement in speed. For example test 2d)
 performed in 35s.
 

With a very small shared_buffers the improvements to Postgres' shared_buffer / 
checkpoint interaction can not be utilized.





Re: [PERFORM] Fragmentation/Vacuum, Analyze, Re-Index

2010-01-22 Thread Reid Thompson

On 1/22/2010 2:27 PM, Richard Neill wrote:

DM wrote:

Is there any script/tool to identify if the table requires full
vacuum? or to re-index an existing index table?


http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html

The bucardo project has released its nagios plugins for PostgreSQL and we can extract from them this nice view 
in order to check for table and index bloat into our PostgreSQL databases:

