Re: [PERFORM] Best practice to load a huge table from ORACLE to PG
Hi, On Sunday 27 April 2008, Greg Smith wrote: than SQL*PLUS. Then on the PostgreSQL side, you could run multiple COPY sessions importing at once to read this data all back in, because COPY will bottleneck at the CPU level before the disks will if you've got reasonable storage hardware. The latest pgloader version has been made to handle this exact case, so if you want to take this route, please consider pgloader 2.3.0: http://pgloader.projects.postgresql.org/#_parallel_loading http://pgfoundry.org/projects/pgloader/ Another good reason to consider using pgloader is when the data file contains erroneous input lines and you don't want the COPY transaction to abort: the error lines will be rejected by pgloader while the correct ones get COPYed in. Regards, -- dim
Re: [PERFORM] [pgsql-advocacy] Benchmarks WAS: Sun Talks about MySQL
Josh Berkus [EMAIL PROTECTED] writes: I think TPC-E will make both of these major improvements much more important. I suspect it would be hard to get 8.2 to even pass TPC-E due to the checkpoint dropouts. You'd be surprised, then. We're still horribly, horribly lock-bound on TPC-E; on anything over 4 cores lock resolution chokes us to death. See Jignesh's and Paul's various posts about attempts to fix this. Most of those posts have been about scalability issues with extremely large numbers of sessions. Those are interesting too, and they may be limiting our results in benchmarks which depend on such a configuration (which I don't think includes TPC-E, but the benchmark Jignesh has been writing about is some Java application benchmark which may be such a beast), but they don't directly relate to whether we're passing TPC-E. What I was referring to by passing TPC-E was the criteria for a conformant benchmark run. TPC-C has, IIRC, only two relevant criteria: 95th percentile response time < 5s and average response time < 95th percentile response time. You can pass those even if 1 transaction in 20 takes 10-20s, which is more than enough to cover checkpoints and other random sources of inconsistent performance. TPC-E has more stringent requirements which explicitly require very consistent response times, and I doubt 8.2 would have been able to pass them. So the performance-limiting factors, whether they be I/O, CPU, lock contention, or whatever, don't even come into play. We wouldn't have any conformant results whatsoever, not even low values limited by contention. 8.3, however, should be in a better position to pass. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [pgsql-advocacy] Benchmarks WAS: Sun Talks about MySQL
Gregory Stark wrote: TPC-E has more stringent requirements which explicitly require very consistent response times and I doubt 8.2 would have been able to pass them. Sure it would. Just not for a very large scale factor ;-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[PERFORM] Replication System
All, We have a table, table1, which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. We need to cut down on this downtime, so we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which replication solution would be best suited for this? Thanks for any help ~ Gauri
Re: [PERFORM] Replication System
2008/4/28 Gauri Kanekar [EMAIL PROTECTED]: All, We have a table, table1, which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. We need to cut down on this downtime, so we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which replication solution would be best suited for this? Thanks for any help ~ Gauri I hope you're not using VACUUM FULL... (Standard reply for this type of question:) What version of PostgreSQL are you using? Have you tried autovacuum? Run plain vacuum on this table even more often (like every half hour) and it should not take as long, and it will save space. If you still have trouble, run vacuum analyse verbose table1; and see what it says. If you're doing it right you should be able to vacuum with the database up. It sounds like you might be happier with a fix for the problem rather than a complex workaround which will actually solve a completely different problem. Regards Peter.
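A minimal sketch of the diagnostic Peter suggests (table name taken from the original post; run against the live database):

```sql
-- Plain (non-FULL) vacuum can run while the site stays up.
VACUUM ANALYZE VERBOSE table1;

-- The verbose output reports, per table and index, how many dead row
-- versions were removed and how many pages contain usable free space.
-- If those numbers keep growing between runs, vacuum is not keeping up
-- and needs to run more often.
```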
Re: [PERFORM] Replication System
Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So, any suggestions on that :). Thanks ~ Gauri On Mon, Apr 28, 2008 at 7:28 PM, Peter Childs [EMAIL PROTECTED] wrote: 2008/4/28 Gauri Kanekar [EMAIL PROTECTED]: All, We have a table, table1, which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. We need to cut down on this downtime, so we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which replication solution would be best suited for this? Thanks for any help ~ Gauri I hope you're not using VACUUM FULL... (Standard reply for this type of question:) What version of PostgreSQL are you using? Have you tried autovacuum? Run plain vacuum on this table even more often (like every half hour) and it should not take as long, and it will save space. If you still have trouble, run vacuum analyse verbose table1; and see what it says. If you're doing it right you should be able to vacuum with the database up. It sounds like you might be happier with a fix for the problem rather than a complex workaround which will actually solve a completely different problem. Regards Peter. -- Regards Gauri
Re: [PERFORM] Replication System
On Mon, 2008-04-28 at 19:35 +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You should seriously consider upgrading to PG 8.3. There have been substantial improvements to VACUUM since 8.1. Brad.
Re: [PERFORM] Replication System
Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So, any suggestions on that :). Thanks ~ Gauri We use Slony for exactly this type of situation. It's not the most user-friendly piece of software, but it works well enough that I can schedule maintenance windows (we're a 24/7 shop) and do clustering and other tasks on our DB to reclaim space, etc. -salman
Re: [PERFORM] Replication System
That's one of the things to be done in the near future, but it needs some changes from the application point of view :( ... so it just got escalated for that reason. But for now, which one would be a well-suited replication system? ~ Gauri On Mon, Apr 28, 2008 at 7:43 PM, Brad Nicholson [EMAIL PROTECTED] wrote: On Mon, 2008-04-28 at 19:35 +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You should seriously consider upgrading to PG 8.3. There have been substantial improvements to VACUUM since 8.1. Brad. -- Regards Gauri
Re: [PERFORM] Replication System
Salman, Slony doesn't do automatic failover, and we would appreciate a system with automatic failover :( ~ Gauri On Mon, Apr 28, 2008 at 7:46 PM, salman [EMAIL PROTECTED] wrote: Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So, any suggestions on that :). Thanks ~ Gauri We use Slony for exactly this type of situation. It's not the most user-friendly piece of software, but it works well enough that I can schedule maintenance windows (we're a 24/7 shop) and do clustering and other tasks on our DB to reclaim space, etc. -salman -- Regards Gauri
[PERFORM] Where does a novice go to make it run faster?
So, it is time to improve performance; it is running too slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files. Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in them. Thank you :-)
Re: [PERFORM] Benchmarks WAS: Sun Talks about MySQL
Greg, What I was referring to by passing TPC-E was the criteria for a conformant benchmark run. TPC-C has, IIRC, only two relevant criteria: 95th percentile response time < 5s and average response time < 95th percentile response time. You can pass those even if 1 transaction in 20 takes 10-20s, which is more than enough to cover checkpoints and other random sources of inconsistent performance. We can do this now. I'm unhappy because we're at about 1/4 of Oracle performance, but we certainly pass -- even with 8.2. --Josh
[PERFORM] shared_buffer/DRBD performances
I have completed the benchmarks I made with Postgres, which I talked about some weeks ago on the postgres performance mailing list (see the post shared_buffers). At the following link you can find a doc that contains the generated graphs. http://www.mediafire.com/?lk4woomsxlc Regards Gaetano Mendola
Re: [PERFORM] Where does a novice go to make it run faster?
1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files. Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in them. Thank you :-) You could provide some information: PostgreSQL version, the type of queries you're running, some EXPLAIN ANALYZE output for those, what type of hardware you're running on, and what OS is installed. -- regards Claus When lenity and cruelty play for a kingdom, the gentlest gamester is the soonest winner. Shakespeare
Re: [PERFORM] Replication System
On Mon, Apr 28, 2008 at 07:35:37PM +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. VACUUM FULL is making your problem worse, not better. Don't do that. We are currently using version 8.1.3. You need to upgrade immediately to the latest 8.1 stability and security release, which is 8.1.11. This is a drop-in replacement, and it's an urgent fix in your case. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You probably need to tune autovacuum not to touch that table, and just vacuum that table in a constant loop or something. VACUUM should _never_ take the site down. If it does, you're doing it wrong. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So, any suggestions on that :). I think you will find that no replication system will solve your underlying problems. That said, I happen to work for a company that will sell you a replication system that works with 8.1 if you really want it. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
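A sketch of the "tune autovacuum not to touch that table" step on 8.1, where per-table autovacuum settings live in the pg_autovacuum system catalog (column layout assumed from the 8.1 catalog; -1 means "use the default"):

```sql
-- Tell 8.1's autovacuum daemon to skip table1 entirely; it will be
-- vacuumed explicitly instead (e.g. a plain VACUUM ANALYZE from cron
-- every half hour or so).
INSERT INTO pg_autovacuum
    (vacrelid, enabled,
     vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor,
     vac_cost_delay, vac_cost_limit)
VALUES ('table1'::regclass, false, -1, -1, -1, -1, -1, -1);
```

From 8.3 onward this catalog is gone and the same thing is done with per-table storage parameters, which is one more reason the upgrade advice in this thread matters.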
Re: [PERFORM] Replication System
On Mon, Apr 28, 2008 at 07:48:48PM +0530, Gauri Kanekar wrote: Slony doesn't do automatic failover, and we would appreciate a system with automatic failover :( No responsible asynchronous system will give you automatic failover. You can lose data that way. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
Re: [PERFORM] Where does a novice go to make it run faster?
A B wrote: So, it is time to improve performance; it is running too slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files. Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in them. Thank you :-) First, change log_min_duration_statement in your log settings to something like 1000 (one second). This will show you which statements take the longest. Second, use EXPLAIN ANALYZE on those statements to determine what is taking a long time, and focus on optimizing the statements that take the longest to execute. That ought to get you a long way down the road. -Dennis
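The two steps Dennis describes, sketched concretely (the query is a hypothetical example, not from the thread):

```sql
-- Step 1, in postgresql.conf: log any statement taking longer
-- than one second (the value is in milliseconds).
--   log_min_duration_statement = 1000

-- Step 2: take a statement that shows up in the log and inspect
-- where the time actually goes.
EXPLAIN ANALYZE
SELECT *
FROM orders                 -- hypothetical table for illustration
WHERE customer_id = 42;
```

EXPLAIN ANALYZE actually executes the query, so the per-node actual times in its output can be compared against the planner's estimates to find the expensive step.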
Re: [PERFORM] Where does a novice go to make it run faster?
A B wrote: So, it is time to improve performance; it is running too slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files. Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in them. Go back to step zero - gather information that would be helpful in giving advice. For starters: - What hardware do you currently have? - What OS and version of PG? - How big is the database? - What is the nature of the workload (small queries or data-mining, how many simultaneous clients, transaction rate, etc.)? - Is PG sharing the machine with other workloads? Then edit your postgresql.conf file to gather data (see http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html). With stat collection enabled, you can often find some low-hanging fruit like indexes that aren't used (look in pg_stat_user_indexes) - sometimes because the query didn't cast something in the WHERE clause correctly. Also look at http://www.postgresql.org/docs/8.3/interactive/runtime-config-logging.html - especially the log_min_duration_statement setting, to find long-running queries. You will probably need to try different settings and watch the log. Logging impacts performance, so don't just set it to log everything and forget it; you need to play with it. Don't discount step 2 - you may find you can rewrite one inefficient but frequent query, or add a useful index on the server. Cheers, Steve
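A query along the lines Steve suggests for spotting never-used indexes (a sketch, not taken from the original post; requires the stats collector to be running):

```sql
-- Indexes that have never been scanned since the statistics were
-- last reset: candidates for dropping, or a sign that a query is
-- not matching the index as intended.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;
```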
Re: [PERFORM] Where does a novice go to make it run faster?
[EMAIL PROTECTED] (A B) writes: So, it is time to improve performance; it is running too slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files. Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in them. Thank you :-) In order of ease of implementation, it tends to be... 1. Tweak postgresql.conf 2. Make sure you ran VACUUM + ANALYZE 3. Find some expensive queries and try to improve them, which might involve changing the queries and/or adding relevant indices 4. Add RAM to your server 5. Add disk to your server 6. Redesign your application's DB schema so that it is more performant by design. The URL below may have some material of value... -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://linuxfinances.info/info/postgresqlperformance.html It is usually a good idea to put a capacitor of a few microfarads across the output, as shown.
[PERFORM] Very poor performance loading 100M of sql data using copy
Hi all: We are loading in a number (100+) of sql files that are about 100M in size. It takes about three hours to load the file. There is very little load on the database other than the copy from operations. We are running postgresql-8.1.3 under Centos 4 on a RAID 1/0 array with 4 disks (so we have only one spindle). The partitions are set up in an LVM and iostat 5 shows (for one report):

avg-cpu:  %user  %nice  %sys  %iowait  %idle
           1.70   0.00  0.80    51.40  46.10

Device:   tps     Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda       179.20  1472.00     2561.60     7360      12808
sda1        0.00     0.00        0.00        0          0
sda2      385.20  1462.40     2561.60     7312      12808
dm-0        0.80     0.00        6.40        0         32
dm-1        0.00     0.00        0.00        0          0
dm-2        0.00     0.00        0.00        0          0
dm-3        0.00     0.00        0.00        0          0
dm-4        4.40     0.00       35.20        0        176
dm-5        0.00     0.00        0.00        0          0
dm-6      380.00  1462.40     2520.00     7312      12600

dm-6 is where the data files reside and dm-4 is where the WAL archives are kept. Note all the DM's are on the same RAID 0 device /dev/sda2. A sample psql command file to load the data is:

BEGIN;
COPY peers (observe_start, observe_end, geo_scope, geo_value, peer_a, peer_b) FROM stdin WITH NULL AS '';
(data here)
(3 more copy commands to different tables w/ data)
COMMIT;

The primary keys for the tables being loaded are composite keys using 4-7 columns, so that may be part of the issue. From postgres.conf:

shared_buffers = 3000
#temp_buffers = 1000            # min 100, 8KB each
#max_prepared_transactions = 5  # can be 0 or more (see max_locks_per_transaction)
work_mem = 2048                 # min 64, size in KB
maintenance_work_mem = 65536    # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

The prior settings for work_mem/maintenance_work_mem were the defaults:

#work_mem = 1024                # min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB

I also took a look at disk-io hit rates:

# select * from pg_statio_user_tables;
 relid | schema | relname      | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
-------+--------+--------------+----------------+---------------+---------------+--------------
 17282 | public | providers    |      179485097 |      78832253 |        835008 |    196903582
 17264 | public | events       |              0 |             0 |               |
 17262 | public | days         |            495 |           219 |           478 |           16
 17276 | public | peers        |      147435004 |     114304828 |       1188908 |    295569499
 17288 | public | rankings     |      564638938 |     345456664 |     275607291 |   1341727605
 17270 | public | market_share |         131932 |         90048 |          5408 |       182100

market_share did have one tidx_blks_read reported, but all the other TOAST fields (toast_blks_read, toast_blks_hit, tidx_blks_read, tidx_blks_hit) were empty for all rows. This looks like we have whole indexes in memory except for the days table, which has a low update rate, so I am not worried about that. However, for heap_blks_read and heap_blks_hit we get a different story:

 relname      | hit_percent
--------------+-------------
 providers    |       43.92
 days         |       44.24
 peers        |       77.52
 rankings     |       61.18
 market_share |       68.25

so we see a 43% hit ratio for providers up to a 77% hit ratio for peers. Not horrible hit rates given that we are more data warehousing than OLTP, but I am not sure what effect increasing these (by increasing shared_buffers, I think) will have on the COPY operation. I would suspect none. To try to solve this speed issue, I checked the logs and was seeing a few of these (27 seconds apart):

2008-04-21 11:36:43 UTC @(2761)i: LOG: checkpoints ...

so I changed:

checkpoint_segments = 30
checkpoint_warning = 150

in postgres.conf and reloaded postgres. I have only seen one of these log messages in the past week. I have turned off autovacuum.
I have increased maintenance_work_mem as mentioned above (although I didn't expect it to do anything unless we drop/recreate indexes). I have increased work_mem as mentioned above. The only things I can think of are increasing shared memory or dropping indexes. I don't see any indication in the docs that increasing shared memory would help speed up a copy operation. The only indexes we have to drop are the ones on the primary keys (there is one non-primary key index in the database as well). Can you drop an index on the primary key for a table and add it back later? Am I correct in saying: the primary key index is what enforces the unique constraint in the table? If the index is dropped and non-unique primary key data has been added, what happens when you re-add the index?
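The hit percentages quoted in the message can be derived directly from pg_statio_user_tables; a sketch of the kind of query that produces them (not taken from the original post):

```sql
-- Heap buffer-cache hit ratio per table, as a percentage.
-- nullif() avoids division by zero for tables with no I/O yet.
SELECT relname,
       round(heap_blks_hit * 100.0
             / nullif(heap_blks_read + heap_blks_hit, 0), 2)
           AS hit_percent
FROM pg_statio_user_tables
WHERE heap_blks_read + heap_blks_hit > 0
ORDER BY relname;
```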
Re: [PERFORM] Replication System
On Mon, 28 Apr 2008, Gauri Kanekar wrote: We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3... We have already tried all the options listed by you; that's why we reached the decision of having a replication system. Andrew Sullivan has already given a response here I agree with; I wanted to expand on that. You have a VACUUM problem. The fact that you need (or feel you need) to VACUUM FULL every other day says there's something very wrong here. The way to solve most VACUUM problems is to VACUUM more often, so that the work in each individual one never gets so big that your system takes an unacceptable hit, and you shouldn't ever need VACUUM FULL. Since your problem is being aggravated because you're running a dangerously obsolete version, that's one of the first things you should fix--upgrade to at least the latest 8.1 if you can't deal with a larger version migration. The fact that you're happily running 8.1.3 says you most certainly haven't tried all the other options here. Every minute you spend looking into a replication system is wasted time you could be spending on the right fix. You've fallen into the common trap where you're so fixated on a particular technical solution that you're now ignoring suggestions on how to resolve the root problem. Replication is hard to get going even on a system that works perfectly, and replicating a known buggy system just to work around a problem really sounds like a bad choice. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
John Rouillard wrote: We are running postgresql-8.1.3 under Centos 4 You should upgrade, at least to the latest minor release of the 8.1 series (8.1.11), as there have been a bunch of important bug and security fixes. Or even better, upgrade to 8.3, which has reduced the storage size of variable-length datatypes like text/char/varchar in particular. As your COPY is I/O bound, reducing storage size will translate directly into improved performance. dm-6 is where the data files reside and dm-4 is where the WAL archives are kept. Note all the DM's are on the same RAID 0 device /dev/sda2. Another reason to upgrade to 8.3: if you CREATE or TRUNCATE the table in the same transaction as you COPY into it, you can avoid WAL logging of the loaded data, which will in the best case double your performance, as your WAL is on the same physical drives as the data files. The only indexes we have to drop are the ones on the primary keys (there is one non-primary key index in the database as well). Can you drop an index on the primary key for a table and add it back later? Am I correct in saying: the primary key index is what enforces the unique constraint in the table? If the index is dropped and non-unique primary key data has been added, what happens when you re-add the index? Yes, the index is what enforces the uniqueness. You can drop the primary key constraint, and add it back after the load with ALTER TABLE. If the load introduces any non-unique primary keys, adding the primary key constraint will give you an error and fail. Dropping and recreating the indexes is certainly worth trying. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
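Heikki's drop-and-recreate suggestion, sketched in SQL for the peers table from earlier in the thread (the constraint name and primary-key column list are assumptions based on the COPY command shown there):

```sql
-- Drop the primary key (and its index) before the bulk load;
-- the default constraint name "peers_pkey" is assumed here.
ALTER TABLE peers DROP CONSTRAINT peers_pkey;

-- ... bulk COPY into peers here ...

-- Re-add the primary key afterwards. This rebuilds the unique index
-- in one pass; it fails with a unique-violation error if the load
-- introduced duplicate keys, which must then be deleted before retrying.
ALTER TABLE peers ADD PRIMARY KEY
    (observe_start, observe_end, geo_scope, geo_value, peer_a, peer_b);
```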
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
On Mon, Apr 28, 2008 at 06:53:09PM +0100, Heikki Linnakangas wrote: John Rouillard wrote: We are running postgresql-8.1.3 under Centos 4 You should upgrade, at least to the latest minor release of the 8.1 series (8.1.11), as there has been a bunch of important bug and security fixes. Or even better, upgrade to 8.3, which has reduced the storage size of especially variable length datatypes like text/char/varchar in particular. As your COPY is I/O bound, reducing storage size will translate directly to improved performance. Yup. Just saw that suggestion in an unrelated email. dm-6 is where the data files reside and dm-4 is where the WAL archives are kept. Note all the DM's are on the same RAID 0 device /dev/sda2. Another reason to upgrade to 8.3: if you CREATE or TRUNCATE the table in the same transaction as you COPY into it, you can avoid WAL logging of the loaded data, which will in the best case double your performance as your WAL is on the same physical drives as the data files. We can't do this as we are backfilling a couple of months of data into tables with existing data. The only indexes we have to drop are the ones on the primary keys (there is one non-primary key index in the database as well). Can you drop an index on the primary key for a table and add it back later? Am I correct in saying: the primary key index is what enforces the unique constraint in the table? If the index is dropped and non-unique primary key data has been added, what happens when you re-add the index? Yes, the index is what enforces the uniqueness. You can drop the primary key constraint, and add it back after the load with ALTER TABLE. If the load introduces any non-unique primary keys, adding the primary key constraint will give you an error and fail. That's the part I am worried about. I guess using psql to delete the problem row then re-adding the index will work. Dropping and recreating the indexes is certainly worth trying. Thanks for the info. 
-- -- rouilj John Rouillard System Administrator Renesys Corporation 603-244-9084 (cell) 603-643-9300 x 111
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
On Mon, 28 Apr 2008, John Rouillard wrote: 2008-04-21 11:36:43 UTC @(2761)i: LOG: checkpoints ... (27 seconds apart) so I changed: checkpoint_segments = 30 checkpoint_warning = 150 That's good, but you might go higher than 30 for a bulk loading operation like this, particularly on 8.1 where checkpoints are no fun. Using 100 is not unreasonable. shared_buffers = 3000 I don't see any indication in the docs that increasing shared memory would help speed up a copy operation. The index blocks use buffer space, and what ends up happening if there's not enough memory is that they are written out more often than they need to be (and with your I/O hardware you need to avoid writes unless absolutely necessary). Theoretically the OS is caching around that situation, but it's better to avoid it. You didn't say how much RAM you have, but you should start with a factor of 10 increase, to 30,000, and see if that helps; if so, try making it large enough to use 1/4 of total server memory. 3000 is only giving the server 24MB of RAM to work with, and it's unfair to expect it to work well in that situation. While not relevant to this exercise, you'll need to set effective_cache_size to a useful value one day as well. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
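Greg's arithmetic as a postgresql.conf sketch (on 8.1, shared_buffers is a count of 8 KB pages, so the numbers work out as shown; the effective_cache_size value is a placeholder):

```sql
-- shared_buffers = 3000     -- current setting: 3000 x 8 KB = ~24 MB
-- shared_buffers = 30000    -- suggested first step: ~234 MB
-- checkpoint_segments = 100 -- generous for bulk loading on 8.1
-- effective_cache_size = ...  -- later: roughly the OS cache size,
--                              -- also counted in 8 KB pages on 8.1
```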
Re: [PERFORM] Where does a novice go to make it run faster?
You got the order slightly wrong, I guess.

1) hardware
Would only come first if your RAM is really too small, or you use RAID5 on write-heavy tables, or what limits you is transaction fsync (hint: 8.3). Adding RAM is cheap.

2) rewriting my queries and table structures
This should really come first. Log expensive queries. Note that an expensive query can be a slow query, or a rather fast query that you execute lots of times, or a very simple and fast query that you execute really, really too often. Now ask yourself:

* What is this query supposed to do? Do I need this query?
Example: You put your sessions in a database? => Perhaps put them in the good old filesystem. Your PHP is loading lots of configuration from the database for every page? => Cache it, generate some PHP code once and include it, put it in the session if it depends on the user, but don't reload the thing on each page! This feature is useless? => Do you really need to display a birthday cake on your forum for those users who have their birthday today? UPDATEs... => Do you really need to update the last time a user was online every time? What about updating it every 5 minutes instead?
* Is this query inside a loop? => Use JOIN.
* Do I need all the rows from this query?
Example: You use pagination and perform the same query changing LIMIT/OFFSET? => Perform the query once, retrieve the first N pages of results, cache it in the session or in a table.
* You have a website? => Use lighttpd and fastcgi.
* Do I need all the columns from this query?
* Do I suffer from locking?
etc.

Now you should see some easy targets. For the queries that are slow, use EXPLAIN ANALYZE. Question your schema. etc.
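The "query inside a loop? => Use JOIN" bullet above, as a sketch with hypothetical tables:

```sql
-- Instead of one query per row inside an application loop:
--   SELECT name FROM users WHERE id = $1;   -- executed N times
-- fetch everything in a single round trip with a join:
SELECT o.id, o.total, u.name
FROM orders o                    -- hypothetical tables for illustration
JOIN users u ON u.id = o.user_id
WHERE o.created_at > now() - interval '1 day';
```

One joined query lets the planner pick an efficient plan and avoids N network round trips and N plan/execute cycles.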
Re: [PERFORM] Benchmarks WAS: Sun Talks about MySQL
Josh Berkus [EMAIL PROTECTED] writes: Greg, What I was referring to by passing TPC-E was the criteria for a conformant benchmark run. TPC-C has, IIRC, only two relevant criteria: 95th percentile response time < 5s and average response time < 95th percentile response time. You can pass those even if 1 transaction in 20 takes 10-20s, which is more than enough to cover checkpoints and other random sources of inconsistent performance. We can do this now. I'm unhappy because we're at about 1/4 of Oracle performance, but we certainly pass -- even with 8.2. We certainly can pass TPC-C. I'm curious what you mean by 1/4 though? On similar hardware? Or the maximum we can scale to is 1/4 as large as Oracle? Can you point me to the actual benchmark runs you're referring to? But I just made an off-hand comment that I doubt 8.2 could pass TPC-E, which has much more stringent requirements. It has requirements like: the throughput computed over any period of one hour, sliding over the Steady State by increments of ten minutes, varies from the Reported Throughput by no more than 2% -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
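The quoted TPC-E steadiness rule can be sketched as a small check (a simplified illustration of the sliding-window test, not the official audit procedure; the timestamps and reported rate below are made up):

```python
def within_tpce_steadiness(completion_times, reported_tps,
                           window=3600, step=600):
    """Check that throughput over every 1-hour window, sliding by 10
    minutes, stays within 2% of the reported throughput (simplified
    TPC-E-style steadiness rule)."""
    if not completion_times:
        return False
    start, end = min(completion_times), max(completion_times)
    t = start
    while t + window <= end:
        count = sum(1 for c in completion_times if t <= c < t + window)
        tps = count / window
        if abs(tps - reported_tps) > 0.02 * reported_tps:
            return False
        t += step
    return True

# A perfectly steady stream of 10 tx/s over two hours passes:
steady = [i * 0.1 for i in range(72000)]
print(within_tpce_steadiness(steady, reported_tps=10.0))  # True
```

A workload that pauses during checkpoints would show depressed windows and fail this test even if its overall average is fine, which is exactly why checkpoint dropouts matter more for TPC-E than for TPC-C.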
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
John Rouillard wrote: We can't do this as we are backfilling a couple of months of data into tables with existing data. Is this a one-off loading of historic data or an ongoing thing? The only indexes we have to drop are the ones on the primary keys (there is one non-primary-key index in the database as well). If this amount of data importing is ongoing, then one thing I would try is partitioning (this could be worthwhile anyway with the amount of data you appear to have). Create an inherited table for the month being imported, load the data into it, then add the check constraints and indexes, and modify the rules/triggers to handle the inserts to the parent table. -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz
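The inherited-table approach described above might look like this, using the 8.x constraint-exclusion partitioning scheme (table, column, and file names are hypothetical):

```sql
-- Child table for the month being backfilled
CREATE TABLE measurements_2008_02 (
    CHECK (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01')
) INHERITS (measurements);

-- Load into the bare child first, then build its indexes
COPY measurements_2008_02 FROM '/tmp/feb.csv' WITH CSV;
CREATE INDEX measurements_2008_02_logdate
    ON measurements_2008_02 (logdate);

-- Route new inserts for that month from the parent to the child
CREATE RULE measurements_insert_2008_02 AS
    ON INSERT TO measurements
    WHERE (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01')
    DO INSTEAD INSERT INTO measurements_2008_02 VALUES (NEW.*);
```

With constraint_exclusion enabled, queries filtered on logdate will skip partitions whose CHECK constraint rules them out.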
Re: [PERFORM] Best practice to load a huge table from ORACLE to PG
Adonias Malosso wrote: Hi All, I'd like to know what's the best practice to LOAD a 70 million row, 101 column table from ORACLE to PGSQL. The current approach is to dump the data to CSV and then COPY it into Postgresql. Uhm. 101 columns, you say? Sounds interesting. There are data loaders like: http://pgfoundry.org/projects/pgloader/ which could speed up loading the data over a plain COPY of the CSV. I wonder how much normalizing could help. Tino
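The dump-to-CSV-then-COPY route might look like this on the Postgres side (the file path and table name are placeholders):

```sql
-- Server-side COPY: the file must be readable by the postgres
-- server process, and the path is resolved on the server machine
COPY big_table FROM '/data/oracle_dump.csv' WITH CSV;

-- Or stream it from the client side through psql instead:
--   \copy big_table FROM 'oracle_dump.csv' WITH CSV
```

For parallel loading as suggested earlier in the thread, split the CSV into chunks and run one COPY session per chunk in separate connections.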
Re: [PERFORM] Benchmarks WAS: Sun Talks about MySQL
On Mon, 28 Apr 2008 14:40:25 -0400 Gregory Stark [EMAIL PROTECTED] wrote: We certainly can pass TPC-C. I'm curious what you mean by 1/4 though? On similar hardware? Or the maximum we can scale to is 1/4 as large as Oracle? Can you point me to the actual benchmark runs you're referring to? I would be curious as well, considering there has been zero evidence provided to back up such a statement. I am not saying it isn't true; it wouldn't be surprising to me if Oracle outperformed PostgreSQL in TPC-C, but I would sure like to see in general how well we do (or don't). Sincerely, Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL SPI Liaison | SPI Director | PostgreSQL political pundit
Re: [PERFORM] Replication Syatem
On Mon, Apr 28, 2008 at 9:38 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: All, We have a table, "table1", which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming "table1" takes almost 30min and during that time the site is down. Slony is an open source replication system built for Postgres. But the real problem is that you are doing a vacuum full every day. This is highly invasive. Take a look at the postgres docs on vacuuming the db. Analyze is best done on a daily basis. If you have a lot of deletes, then plain VACUUM will mark that space for reuse. The postgres documentation describes the various vacuum options and explains the merits of each. Hope that helps. Radhika -- It is all a matter of perspective. You choose your view by choosing where to stand. --Larry Wall
[PERFORM] Postgres Benchmark looking for maintainer
Hello, Executive summary: Last year I wrote a database benchmark which simulates a forum. It works on Postgres and MySQL. It could be useful. I have no time to touch this, so it is rotting on my hard drive. Who wants to adopt it? I will put it on pgFoundry. I can spend a few hours documenting the source and writing some documentation, and pass the package to someone who might be interested and more available. Details: The benchmark is a forum-type load (actually it came from me arguing with the phpBB team, lol) but, unlike all forums I know, correctly optimized. A bunch of forums are created, and there is a website (in PHP), very basic, which allows you to browse the forums, view topics, and insert posts. It displays the usual forum info like last post, number of topics or posts in a forum, number of posts in a topic, etc. Then there is a benchmarking client, written in Python. It spawns a number of users who perform real-life actions, like viewing pages and adding posts, and there are a few simulated moderators who will, once in a while, destroy topics and even forums. This client can hit the PHP website via HTTP. However, postgres is so fast that you would need several PHP servers to kill it. So, I added a multi-backend capability to the client: it can hit the database directly, performing the queries the PHP script would have performed. However, postgres is still so fast that you won't be able to benchmark anything more powerful than a Core 2; the client would need to be rewritten in a compiled language like Java. Also, retrieving the posts' text easily blasted the 100 Mbps connection between server and client, so you would need Gigabit Ethernet. So, the load is very realistic (it would mimic a real forum pretty well); but in order to benchmark it you must simulate humongous traffic levels. The only difference is that my benchmark does a lot more writing (post insertions) than a normal forum; I wanted the database to grow big in a few hours.
It also works on MySQL, so you can get a good laugh. Actually I was able to extract some good performance out of MySQL, after lots of headaches, except that I was never able to make it use more than 1 core. Contrary to the usual benchmarks, the code is optimized for MySQL and for Postgres, and the stored procedures are too. Thus, what is compared is not a least-common-denominator implementation that happens to work on both databases, but two implementations specifically targeted at and optimized for each database. The benchmark is also pretty simple (unlike the TPC) but it is useful: first it is CPU-bound, then IO-bound, and clustering the tables does a lot for performance (you can test auto-cluster), checkpoints are very visible, etc. So it can provide useful information that is easier to understand than a very complex benchmark. Originally the purpose of the benchmark was to test postgres' full-text search; the code is still there. Regards, Pierre
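A minimal sketch of the kind of client described above -- simulated users each performing a weighted-random stream of forum actions (the action names and weights here are invented for illustration; a real client would issue HTTP requests or SQL instead of bumping counters):

```python
import random
import threading

# Hypothetical action mix for a simulated forum user
ACTIONS = ["view_topic", "view_forum", "post_reply", "new_topic"]
WEIGHTS = [60, 25, 12, 3]

def simulate_user(n_actions, counters, lock):
    """One simulated user: pick weighted-random actions and record them."""
    rng = random.Random()
    for _ in range(n_actions):
        action = rng.choices(ACTIONS, weights=WEIGHTS, k=1)[0]
        with lock:
            counters[action] = counters.get(action, 0) + 1

counters, lock = {}, threading.Lock()
users = [threading.Thread(target=simulate_user, args=(1000, counters, lock))
         for _ in range(10)]
for u in users:
    u.start()
for u in users:
    u.join()
print(sum(counters.values()))  # 10000 actions in total
```

As the post notes, a thread-per-user Python client tops out quickly; the same structure in a compiled language (or with the database calls batched) scales much further.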
Re: [PERFORM] Best practice to load a huge table from ORACLE to PG
Jonah, Thank you for the answer. Good to know about this EnterpriseDB feature. I'll proceed using pgloader. Regards. Adonias Malosso On Sat, Apr 26, 2008 at 10:14 PM, Jonah H. Harris [EMAIL PROTECTED] wrote: On Sat, Apr 26, 2008 at 9:25 AM, Adonias Malosso [EMAIL PROTECTED] wrote: I'd like to know what's the best practice to LOAD a 70 million row, 101 column table from ORACLE to PGSQL. The fastest and easiest method would be to dump the data from Oracle into CSV/delimited format using something like ociuldr (http://www.anysql.net/en/ociuldr.html) and load it back into PG using pg_bulkload (which is a helluva lot faster than COPY). Of course, you could try other things as well... such as setting up generic connectivity to PG and inserting the data into a PG table over the database link. Similarly, while I hate to see shameless self-plugs in the community, the *fastest* method you could use is dblink_ora_copy, contained in EnterpriseDB's PG+ Advanced Server; it uses an optimized OCI connection to COPY the data directly from Oracle into Postgres, which also saves you the intermediate step of dumping the data. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Replication Syatem
[EMAIL PROTECTED] (Gauri Kanekar) writes: We have a table, "table1", which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming "table1" takes almost 30min and during that time the site is down. We need to cut down on this downtime. So we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which will be the best suited replication solution for this? The only reason that it would be necessary for VACUUM to take the site down would be if you are running version 7.1, which was obsoleted in 2002, which, it should be noted, was SIX YEARS AGO. As has been noted, you seem to be presupposing a remarkably complex solution to resolve a problem which is likely to be better handled via running VACUUM rather more frequently. -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://www3.sympatico.ca/cbbrowne/postgresql.html Rules of the Evil Overlord #181. I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire. http://www.eviloverlord.com/
Re: [PERFORM] Best practice to load a huge table from ORACLE to PG
On Mon, Apr 28, 2008 at 5:37 PM, Adonias Malosso [EMAIL PROTECTED] wrote: Thank you for the answer. Good to know about this EnterpriseDB feature. No problem. I'll proceed using pgloader. That's fine. Though, I'd really suggest pg_bulkload; it's quite a bit faster. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [PERFORM] Replication Syatem
Basically we have some background processes which update "table1", and we don't want the application to make any changes to "table1" during vacuum. Vacuum requires an exclusive lock on "table1", and if any of the background or application processes are running, vacuum doesn't kick off. That's the reason we need to take the site down. ~ Gauri On Tue, Apr 29, 2008 at 3:13 AM, Chris Browne [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] (Gauri Kanekar) writes: We have a table, "table1", which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming "table1" takes almost 30min and during that time the site is down. We need to cut down on this downtime. So we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which will be the best suited replication solution for this? The only reason that it would be necessary for VACUUM to take the site down would be if you are running version 7.1, which was obsoleted in 2002, which, it should be noted, was SIX YEARS AGO. As has been noted, you seem to be presupposing a remarkably complex solution to resolve a problem which is likely to be better handled via running VACUUM rather more frequently. -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://www3.sympatico.ca/cbbrowne/postgresql.html Rules of the Evil Overlord #181. I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire. http://www.eviloverlord.com/ -- Regards Gauri
Re: [PERFORM] Replication Syatem
But unless we do a full vacuum the space is not recovered. That's why we prefer full vacuum. ~ Gauri On Tue, Apr 29, 2008 at 10:38 AM, Greg Smith [EMAIL PROTECTED] wrote: On Tue, 29 Apr 2008, Gauri Kanekar wrote: Basically we have some background processes which update "table1", and we don't want the application to make any changes to "table1" during vacuum. Vacuum requires an exclusive lock on "table1", and if any of the background or application processes are running, vacuum doesn't kick off. VACUUM FULL needs an exclusive lock; the regular one does not in 8.1. It's one of the reasons FULL should be avoided. If you do regular VACUUM frequently enough, you shouldn't ever need to do a FULL one anyway. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Regards Gauri
Re: [PERFORM] Replication Syatem
On Tue, 29 Apr 2008, Gauri Kanekar wrote: Basically we have some background processes which update "table1", and we don't want the application to make any changes to "table1" during vacuum. Vacuum requires an exclusive lock on "table1", and if any of the background or application processes are running, vacuum doesn't kick off. VACUUM FULL needs an exclusive lock; the regular one does not in 8.1. It's one of the reasons FULL should be avoided. If you do regular VACUUM frequently enough, you shouldn't ever need to do a FULL one anyway. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
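The distinction Greg describes, in command form (using "table1" from the thread):

```sql
-- Plain (lazy) VACUUM: runs alongside normal reads and writes,
-- marks dead space for reuse, never takes an exclusive lock
VACUUM ANALYZE table1;

-- VACUUM FULL: takes an ACCESS EXCLUSIVE lock and blocks all
-- access to the table until it finishes -- avoid on a live site
-- VACUUM FULL table1;
```

Run the plain form often enough (or let autovacuum do it) and the table stops growing without ever needing the exclusive-lock variant.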
Re: [PERFORM] Replication Syatem
Andrew, Can you explain in detail why you said vacuum full is making things worse? We do vacuum full, as vacuum verbose analyze doesn't regain space for us. ~ Gauri On Mon, Apr 28, 2008 at 9:52 PM, Andrew Sullivan [EMAIL PROTECTED] wrote: On Mon, Apr 28, 2008 at 07:35:37PM +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. VACUUM FULL is making your problem worse, not better. Don't do that. We are currently using version 8.1.3. You need immediately to upgrade to the latest 8.1 stability and security release, which is 8.1.11. This is a drop-in replacement. It's an urgent fix for your case. Autovacuum is already on. But "table1" is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You probably need to tune autovacuum not to do that table, and just vacuum that table in a constant loop or something. VACUUM should _never_ take the site down. If it does, you're doing it wrong. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So any suggestion on that :). I think you will find that no replication system will solve your underlying problems. That said, I happen to work for a company that will sell you a replication system to work with 8.1 if you really want it. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/ -- Regards Gauri
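On 8.1, the per-table tuning Andrew suggests is done by inserting a row into the pg_autovacuum system catalog (the thresholds below are made-up examples; check the 8.1 docs for the exact column list before relying on this):

```sql
-- Make autovacuum visit table1 much more aggressively (8.1-era catalog;
-- later releases use ALTER TABLE ... SET (autovacuum_...) instead)
INSERT INTO pg_autovacuum
    (vacrelid, enabled,
     vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor,
     vac_cost_delay, vac_cost_limit)
VALUES
    ('table1'::regclass, true,
     200, 0.05,    -- vacuum after 200 + 5% of rows change
     100, 0.02,    -- analyze after 100 + 2% of rows change
     -1, -1);      -- -1 means: use the global cost settings
```

Alternatively, set enabled to false for the table and run plain VACUUM on it from a cron loop, as suggested above.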
Re: [PERFORM] Replication Syatem
On Tue, Apr 29, 2008 at 10:41 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: But unless we do a full vacuum the space is not recovered. That's why we prefer full vacuum. There is no point in recovering the space by moving tuples and truncating the relation (that's what VACUUM FULL does) because you are doing frequent updates on the table, and those would again extend the relation. If you run plain VACUUM, that will recover dead space and update the free space map. It may not be able to reduce the table size, but you should not be bothered much about that, because the following updates/inserts will fill in the fragmented free space. You may want to check your FSM settings as well to make sure that you are tracking free space properly. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
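Checking whether the free space map is large enough on 8.1 (the numbers below are illustrative):

```sql
-- A database-wide VACUUM VERBOSE ends with a summary along the lines of:
--   INFO:  free space map contains 114520 pages in 39 relations
--   DETAIL: ... 170000 page slots are required to track all free space.
-- If the "required" figure exceeds max_fsm_pages, free space is being
-- lost; raise the limits in postgresql.conf (restart required):
--   max_fsm_pages     = 200000   -- comfortably above "required"
--   max_fsm_relations = 1000
VACUUM VERBOSE;
```

An undersized FSM is a common reason plain VACUUM "doesn't regain space": freed pages it cannot track are never reused, so the table keeps growing anyway.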
Re: [PERFORM] Replication Syatem
On Tue, Apr 29, 2008 at 11:16 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: Andrew, Can you explain in detail why you said vacuum full is making things worse? 1. VACUUM FULL takes an exclusive lock on the table. That makes the table unavailable for reads/writes. 2. VACUUM FULL moves live tuples around. When a tuple is moved, the old index entry is deleted and a new index entry is inserted. This causes index bloat, which is hard to recover from. We do vacuum full, as vacuum verbose analyze doesn't regain space for us. As I mentioned in the other reply, you are not gaining much by regaining space. The subsequent UPDATEs/INSERTs will quickly extend the relation and you lose all the work done by VACUUM FULL. Plain VACUUM will update the FSM to track the free space scattered across the relation, which is later reused by updates/inserts. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com