Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Christo Du Preez
I wonder if my dump/restore routine isn't causing this issue, seeing
that I do the db development on my laptop (the fast one) and then
restore it on the other two machines. I have confirmed that all the
indexes are present after a restore.

This is the routine:

/usr/local/pgsql/bin/pg_dump -t layer mapdb | gzip > layer.gz

rsync --progress --rsh=ssh layer.gz
[EMAIL PROTECTED]:/home/postgres/layer.gz

--

/usr/local/pgsql/bin/pg_dump -t visiblelayer mapdb | gzip > visiblelayer.gz

rsync --progress --rsh=ssh visiblelayer.gz
[EMAIL PROTECTED]:/home/postgres/visiblelayer.gz

--

/usr/local/pgsql/bin/pg_dump -t style mapdb | gzip > style.gz

rsync --progress --rsh=ssh style.gz
[EMAIL PROTECTED]:/home/postgres/style.gz

--

/usr/local/pgsql/bin/pg_dump -t layertype mapdb | gzip > layertype.gz

rsync --progress --rsh=ssh layertype.gz
[EMAIL PROTECTED]:/home/postgres/layertype.gz

--

DROP TABLE visiblelayer;
DROP TABLE style;
DROP TABLE layer;
DROP TABLE layertype;

gunzip -c layertype.gz | /usr/local/pgsql/bin/psql mapdb
gunzip -c style.gz | /usr/local/pgsql/bin/psql mapdb
gunzip -c visiblelayer.gz | /usr/local/pgsql/bin/psql mapdb
gunzip -c layer.gz | /usr/local/pgsql/bin/psql mapdb

/usr/local/pgsql/bin/vacuumdb -d mapdb -z -v

Craig James wrote:

 On 2007-06-11 Christo Du Preez wrote:
 I really hope someone can shed some light on my problem. I'm not sure
 if this is a postgres or postgis issue.

 Anyway, we have 2 development laptops and one live server, and somehow I
 managed to get the same query to perform very well on my laptop, while
 on both the server and the other laptop it's performing really badly.

 One simple possibility that bit me in the past: If you do
 pg_dump/pg_restore to create a copy of the database, you have to
 ANALYZE the newly-restored database.  I mistakenly assumed that
 pg_restore would do this, but you have to run ANALYZE explicitly
 after a restore.
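
 For example (a minimal sketch, reusing the database name from this
 thread), after each restore:

     /usr/local/pgsql/bin/psql mapdb -c "ANALYZE;"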

 Craig





-- 
Christo Du Preez

Senior Software Engineer
Mecola IT
Mobile:  +27 [0]83 326 8087
Skype:   christodupreez
Website: http://www.locateandtrade.co.za




Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Christo Du Preez
Where do I set the planner settings, or are you referring to settings in
postgresql.conf that may affect the planner?

The one badly performing laptop is the same as mine (the fast one) and
the server is much more powerful.

Laptops: Intel Centrino Duo T2600 @ 2.16GHz, 1.98 GB RAM

Server: 2 x Intel Pentium D CPU @ 3.00GHz, 4 GB RAM

All three systems are running Suse 10.2, with the same PostgreSQL, same
configs, same databases. As far as I know, same everything.

PostgreSQL 8.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
20061115 (prerelease) (SUSE Linux)
POSTGIS=1.2.1 GEOS=3.0.0rc4-CAPI-1.3.3 PROJ=Rel. 4.5.0, 22 Oct
2006 USE_STATS

Thanx for all the advice

Dave Dutcher wrote:
 -Original Message-
 From: Christo Du Preez
 Sent: Monday, June 11, 2007 10:10 AM

 I have narrowed down the problem (I think) and it's the query 
 planner using different plans and I haven't got a clue why. 
 Can anyone please shed some light on this?
 

 Different plans can be caused by several different things like different
 server versions, different planner settings in the config file, different
 schemas, or different statistics.  You say the server versions are the same,
 so that's not it.  Is the schema the same?  One isn't missing indexes that
 the other has?  Do they both have the same data, or at least very close to
 the same data?  Have you run analyze on both of them to update their
 statistics?  Do they have the same planner settings in the config file?  I
 would check that stuff out and see if it helps.

 Dave



   

-- 
Christo Du Preez

Senior Software Engineer
Mecola IT
Mobile:  +27 [0]83 326 8087
Skype:   christodupreez
Website: http://www.locateandtrade.co.za





Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Christo Du Preez
Good day,

I have noticed that my server never uses indexing. No matter what I do.

As an example I took a table with about 650 rows, having a parentid
field with an index on parentid.

EXPLAIN ANALYZE
SELECT *
  FROM layertype
where parentid = 300;

On my laptop the explain analyze looks like this:

Index Scan using fki_layertype_parentid on layertype  (cost=0.00..8.27
rows=1 width=109)
  Index Cond: (parentid = 300)

and on the problem server:

Seq Scan on layertype  (cost=0.00..20.39 rows=655 width=110)
  Filter: (parentid = 300)

.

I have dropped the index, recreated it, vacuumed the table, just about
everything I could think of, and there is just no way I can get the
query planner to use the index.

PostgreSQL 8.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
20061115 (prerelease) (SUSE Linux)
POSTGIS=1.2.1 GEOS=3.0.0rc4-CAPI-1.3.3 PROJ=Rel. 4.5.0, 22 Oct
2006 USE_STATS




Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Reid Thompson
try it with a table with 650K rows...

On Tue, 2007-06-12 at 15:32 +0200, Christo Du Preez wrote:
 Good day,
 
 I have noticed that my server never uses indexing. No matter what I do.
 
 As an example I took a table with about 650 rows, having a parentid
 field with an index on parentid.
 
 EXPLAIN ANALYZE
 SELECT *
   FROM layertype
 where parentid = 300;
 
 On my laptop the explain analyze looks like this:
 
 Index Scan using fki_layertype_parentid on layertype  (cost=0.00..8.27
 rows=1 width=109)
   Index Cond: (parentid = 300)
 
 and on the problem server:
 
 Seq Scan on layertype  (cost=0.00..20.39 rows=655 width=110)
   Filter: (parentid = 300)
 
 .
 
 I have dropped the index, recreated it, vacuumed the table, just about
 everything I could think of, and there is just no way I can get the
 query planner to use the index.
 
 PostgreSQL 8.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
 20061115 (prerelease) (SUSE Linux)
 POSTGIS=1.2.1 GEOS=3.0.0rc4-CAPI-1.3.3 PROJ=Rel. 4.5.0, 22 Oct
 2006 USE_STATS
 
 


Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Michael Glaesemann


On Jun 12, 2007, at 8:32 , Christo Du Preez wrote:

I have noticed that my server never uses indexing. No matter what I do.


As an example I took a table with about 650 rows, having a parentid
field with an index on parentid.

EXPLAIN ANALYZE
SELECT *
  FROM layertype
where parentid = 300;


The planner weighs the cost of the different access methods and
chooses the one that it believes is lowest in cost. An index scan is
not always faster than a sequential scan. With so few rows, it's
probably faster for the server to read the whole table rather than
reading the index and looking up the corresponding row. If you want
to test this, you can set enable_seqscan to false and try running
your query again.


http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE
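
A quick way to test that (session-local only; the query is the one from
this thread):

    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM layertype WHERE parentid = 300;
    RESET enable_seqscan;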


Michael Glaesemann
grzm seespotcode net





Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Gregory Stark

Christo Du Preez [EMAIL PROTECTED] writes:

 On my laptop the explain analyze looks like this:

 Index Scan using fki_layertype_parentid on layertype  (cost=0.00..8.27
 rows=1 width=109)
   Index Cond: (parentid = 300)

That's not explain analyze, that's just plain explain.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Christo Du Preez
The actual table where I noticed the problem has a million rows, and it
still doesn't use indexing.

Reid Thompson wrote:
 try it with a table with 650K rows...

 On Tue, 2007-06-12 at 15:32 +0200, Christo Du Preez wrote:
   
 Good day,

 I have noticed that my server never uses indexing. No matter what I do.

 As an example I took a table with about 650 rows, having a parentid
 field with an index on parentid.

 EXPLAIN ANALYZE
 SELECT *
   FROM layertype
 where parentid = 300;

 On my laptop the explain analyze looks like this:

 Index Scan using fki_layertype_parentid on layertype  (cost=0.00..8.27
 rows=1 width=109)
   Index Cond: (parentid = 300)

 and on the problem server:

 Seq Scan on layertype  (cost=0.00..20.39 rows=655 width=110)
   Filter: (parentid = 300)

 .

 I have dropped the index, recreated it, vacuumed the table, just about
 everything I could think of, and there is just no way I can get the
 query planner to use the index.

 PostgreSQL 8.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
 20061115 (prerelease) (SUSE Linux)
 POSTGIS=1.2.1 GEOS=3.0.0rc4-CAPI-1.3.3 PROJ=Rel. 4.5.0, 22 Oct
 2006 USE_STATS


 


   

-- 
Christo Du Preez

Senior Software Engineer
Mecola IT
Mobile:  +27 [0]83 326 8087
Skype:   christodupreez
Website: http://www.locateandtrade.co.za




[PERFORM] How much memory PostgreSQL is going to use?

2007-06-12 Thread Arnau

Hi all,

  I have a server with 4GB of memory and I'm tweaking the PostgreSQL 
configuration. This server will be dedicated to running PostgreSQL, so I'd 
like to dedicate as much RAM as possible to it.


  I have dedicated 1GB to shared_buffers (shared_buffers=131072), but 
I'm not sure whether this is the maximum memory PostgreSQL will use or 
whether it will take more memory on top of this. If shared_buffers 
is the maximum, I could raise that value even more.


Cheers!
--
Arnau



Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Dave Dutcher
 From: Christo Du Preez
 Sent: Tuesday, June 12, 2007 2:38 AM
 
 Where do I set the planner settings, or are you referring to 
 settings in postgresql.conf that may affect the planner?
 

Yes, I'm referring to settings in postgresql.conf.  I'm wondering if
enable_indexscan or something got turned off on the server for some reason.
Here is a description of those settings:

http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html
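
A quick way to compare the two machines is to run something like this on
both and diff the output (a sketch using the standard pg_settings view):

    SELECT name, setting FROM pg_settings WHERE name LIKE 'enable%';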

When you move data from the laptop to the server, I see that your script
correctly runs an analyze after the load; have you run analyze on the
fast laptop lately?  Hopefully running analyze wouldn't make the planner
choose a worse plan on the laptop, but if we are trying to get things
consistent between the laptop and server, that is something I would try.

If the consistency problem really is the planner not using index scans on
the server, then if you can, please post the table definition for the
table with a million rows and an EXPLAIN ANALYZE of a query which selects
a few rows from that table.

Dave




Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Steinar H. Gunderson
On Tue, Jun 12, 2007 at 03:32:40PM +0200, Christo Du Preez wrote:
 As an example I took a table with about 650 rows, having a parentid
 field with an index on parentid.

Try a bigger table. Using an index for only 650 rows is almost always
suboptimal, so it's no wonder the planner doesn't use the index.

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] How much memory PostgreSQL is going to use?

2007-06-12 Thread Bill Moran
In response to Arnau [EMAIL PROTECTED]:

 Hi all,
 
 I have a server with 4GB of memory and I'm tweaking the PostgreSQL 
  configuration. This server will be dedicated to running PostgreSQL, so I'd 
  like to dedicate as much RAM as possible to it.
  
 I have dedicated 1GB to shared_buffers (shared_buffers=131072), but 
  I'm not sure whether this is the maximum memory PostgreSQL will use or 
  whether it will take more memory on top of this. If shared_buffers 
  is the maximum, I could raise that value even more.

Individual backend processes will allocate more memory above shared_buffers
for processing individual queries.  See work_mem.
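
As a rough sketch of the worst case (illustrative numbers, not a formula
PostgreSQL enforces):

    total ~= shared_buffers
             + max_connections * work_mem    (work_mem can be used more
                                              than once per query)
             + maintenance_work_mem          (during VACUUM, CREATE INDEX)

So with shared_buffers at 1GB, 100 connections and work_mem = 4MB you
should budget well beyond 1GB, and leave the rest of the 4GB to the OS
page cache.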

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Steinar H. Gunderson
On Tue, Jun 12, 2007 at 04:11:33PM +0200, Christo Du Preez wrote:
 The actual table I noticed the problem has a million rows and it still
 doesn't use indexing

Then please post an EXPLAIN ANALYZE of the query that is slow, along with the
table definition and indexes.

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Variable (degrading) performance

2007-06-12 Thread Vladimir Stankovic

Heikki,

Thanks for the response.

Heikki Linnakangas wrote:

Vladimir Stankovic wrote:
I'm running write-intensive, TPC-C-like tests. The workload consists 
of 150 to 200 thousand transactions. The performance varies 
dramatically, between 5 and more than 9 hours (I don't have the exact 
figure for the longest experiment). Initially the server is 
relatively fast. It finishes the first batch of 50k transactions in 
an hour. This is probably due to the fact that the database is 
RAM-resident during this interval. As soon as the database grows 
bigger than the RAM the performance, not surprisingly, degrades, 
because of the slow disks.
My problem is that the performance is rather variable and, to me, 
non-deterministic. A 150k test can finish in approx. 3h30mins, but 
conversely it can take more than 5h to complete.
Preferably I would like to see *steady-state* performance (where my 
interpretation of the steady-state is that the average 
throughput/response time does not change over time). Is the 
steady-state achievable despite the MVCC and the inherent 
non-determinism between experiments? What could be the reasons for 
the variable performance?


Steadiness is relative; you'll never achieve perfectly steady 
performance where every transaction takes exactly X milliseconds. That 
said, PostgreSQL is not as steady as many other DBMS's by nature, 
because of the need to vacuum. Another significant source of 
unsteadiness is checkpoints, though it's not as bad with fsync=off, 
like you're running.
What I am hoping to see is NOT the same value for all executions of 
the same type of transaction (after some transient period). Instead, I'd 
like to see that if I take an appropriately-sized set of transactions I 
will see at least steady growth in transaction average times, if not 
exactly the same average. Each chunk would possibly include a sudden 
performance drop due to the necessary vacuums and checkpoints. The 
performance might be influenced by the change in the data set too.
I am unhappy about the fact that the durations of experiments can differ by 
as much as 30% (bearing in mind that they are not exactly the same due to 
the non-determinism on the client side). I would like to eliminate this 
variability. Are my expectations reasonable? What could be the cause(s) 
of this variability?


I'd suggest using the vacuum_cost_delay to throttle vacuums so that 
they don't disturb other transactions as much. You might also want to 
set up manual vacuums for the bigger tables, instead of relying on 
autovacuum, because until the recent changes in CVS head, autovacuum 
can only vacuum one table at a time, and while it's vacuuming a big 
table, the smaller heavily-updated tables are neglected.
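
 For a manual vacuum that could look roughly like this (the values are
 only a starting point, and the table name is a placeholder):

     SET vacuum_cost_delay = 20;    -- ms slept after each batch of work
     SET vacuum_cost_limit = 200;   -- cost accumulated before each sleep
     VACUUM ANALYZE some_big_table;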



The database server version is  8.1.5 running on Fedora Core 6.


How about upgrading to 8.2? You might also want to experiment with CVS 
HEAD to get the autovacuum improvements, as well as a bunch of other 
performance improvements.


I will try these, but as I said my primary goal is to have 
steady/'predictable' performance, not necessarily to obtain the fastest 
PG results.


Best regards,
Vladimir

--
Vladimir Stankovic  T: +44 20 7040 0273
Research Student/Research Assistant F: +44 20 7040 8585
Centre for Software Reliability E: [EMAIL PROTECTED]
City University 
Northampton Square, London EC1V 0HB 





[PERFORM] VACUUM vs auto-vacuum daemon

2007-06-12 Thread Sabin Coanda
Hi there,

Using the VACUUM command explicitly gives me the opportunity to fine-tune my 
VACUUM scheduling parameters, after I analyze the log generated by VACUUM 
VERBOSE.

On the other hand I'd like to use the auto-vacuum mechanism because of its 
facilities. Unfortunately, after I made some initial estimations for 
autovacuum_naptime and set the specific data into the pg_autovacuum table, I 
get no feedback from the auto-vacuum mechanism to check whether it works 
well or not.  It would be nice to have some kind of log similar to the one 
generated by VACUUM VERBOSE. Is the auto-vacuum mechanism able to provide 
such a useful log?

TIA,
Sabin 





Re: [PERFORM] Best way to delete unreferenced rows?

2007-06-12 Thread Tyrrill, Ed
Craig James wrote:
 Tyrrill, Ed wrote:
  QUERY PLAN
 ------------------------------------------------------------------------------
   Merge Left Join  (cost=38725295.93..42505394.70 rows=13799645
 width=8)
  (actual time=6503583.342..8220629.311 rows=93524 loops=1)
     Merge Cond: ("outer".record_id = "inner".record_id)
     Filter: ("inner".record_id IS NULL)
     ->  Index Scan using backupobjects_pkey on backupobjects
  (cost=0.00..521525.10 rows=13799645 width=8) (actual
  time=15.955..357813.621 rows=13799645 loops=1)
     ->  Sort  (cost=38725295.93..39262641.69 rows=214938304 width=8)
  (actual time=6503265.293..7713657.750 rows=214938308 loops=1)
   Sort Key: backup_location.record_id
           ->  Seq Scan on backup_location  (cost=0.00..3311212.04
  rows=214938304 width=8) (actual time=11.175..1881179.825
 rows=214938308
  loops=1)
   Total runtime: 8229178.269 ms
  (8 rows)
 
  I ran vacuum analyze after the last time any inserts, deletes, or
  updates were done, and before I ran the query above.  I've attached
 my
  postgresql.conf.  The machine has 4 GB of RAM.
 
 I thought maybe someone with more expertise than me might answer this,
 but since they haven't I'll just make a comment.  It looks to me like
 the sort of 214 million rows is what's killing you.  I suppose you
 could try to increase the sort memory, but that's a lot of memory.  It
 seems to me an index merge of a relation this large would be faster,
 but that's a topic for the experts.
 
 On a theoretical level, the problem is that it's sorting the largest
 table.  Perhaps you could re-cast the query so that it only has to
 sort the smaller table, something like
 
select a.id from a where a.id not in (select distinct b.id from b)
 
  where b is the smaller table.  There's still no guarantee that it
  won't do a sort on a, though.  In fact one of the clever things
  about Postgres is that it can convert a query like the one above into
  a regular join, unless you do something like "select ... offset 0",
  which blocks the optimizer from doing the rearrangement.
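  
  A sketch of that trick with the same hypothetical tables:
  
     select a.id from a
     where a.id not in (select distinct b.id from b offset 0);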
 
 But I think the first approach is to try to tune for a better plan
 using your original query.
 
 Craig

Thanks for the input Craig.  I actually started out with a query similar
to what you suggest, but the performance was days to complete back when
the larger table, backup_location, was still under 100 million rows.
The current query is the best performance to date.  I have been playing
around with work_mem, and doubling it to 128MB did result in some
improvement, but doubling it again to 256MB showed no further gain.
Here is the explain analyze with work_mem increased to 128MB:

mdsdb=# explain analyze select backupobjects.record_id from
backupobjects left outer join backup_location using (record_id) where
backup_location.record_id is null;

 QUERY PLAN
------------------------------------------------------------------------------
 Merge Left Join  (cost=36876242.28..40658535.53 rows=13712990 width=8)
(actual time=5795768.950..5795768.950 rows=0 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".record_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects
(cost=0.00..520571.89 rows=13712990 width=8) (actual
time=2.490..201516.228 rows=13706121 loops=1)
   ->  Sort  (cost=36876242.28..37414148.76 rows=215162592 width=8)
(actual time=4904205.255..5440137.309 rows=215162559 loops=1)
 Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3314666.92
rows=215162592 width=8) (actual time=4.186..1262641.774 rows=215162559
loops=1)
 Total runtime: 5796322.535 ms




Re: [PERFORM] VACUUM vs auto-vacuum daemon

2007-06-12 Thread Alvaro Herrera
Sabin Coanda wrote:
 Hi there,
 
 Using the VACUUM command explicitly gives me the opportunity to fine-tune my 
 VACUUM scheduling parameters, after I analyze the log generated by VACUUM 
 VERBOSE.
 
 On the other hand I'd like to use the auto-vacuum mechanism because of its 
 facilities. Unfortunately, after I made some initial estimations for 
 autovacuum_naptime and set the specific data into the pg_autovacuum table, I 
 get no feedback from the auto-vacuum mechanism to check whether it works 
 well or not.  It would be nice to have some kind of log similar to the one 
 generated by VACUUM VERBOSE. Is the auto-vacuum mechanism able to provide 
 such a useful log?

No, sorry, autovacuum is not currently very good at reporting its
activities.  It's a lot better in 8.3, but even there it doesn't report
the full VACUUM VERBOSE log.  It looks like this:

LOG:  automatic vacuum of table "alvherre.public.foo": index scans: 0
        pages: 45 removed, 0 remain
        tuples: 1 removed, 0 remain
        system usage: CPU 0.00s/0.00u sec elapsed 0.01 sec
LOG:  automatic analyze of table "alvherre.public.foo" system usage: CPU
        0.00s/0.00u sec elapsed 0.00 sec

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Tom Lane
Christo Du Preez [EMAIL PROTECTED] writes:
 On my laptop the explain analyze looks like this:

 Index Scan using fki_layertype_parentid on layertype  (cost=0.00..8.27
 rows=1 width=109)
   Index Cond: (parentid = 300)

OK ...

 and on the problem server:

 Seq Scan on layertype  (cost=0.00..20.39 rows=655 width=110)
   Filter: (parentid = 300)

The server thinks that every row of the table matches the WHERE clause.
That being the case, it's making the right choice to use a seqscan.
The question is why is the rows estimate so far off?  Have you ANALYZEd
the table lately?
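
That is, something like:

	ANALYZE layertype;
	EXPLAIN SELECT * FROM layertype WHERE parentid = 300;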

regards, tom lane



Re: [PERFORM] VACUUM vs auto-vacuum daemon

2007-06-12 Thread Bill Moran
In response to Sabin Coanda [EMAIL PROTECTED]:

 Hi there,
 
 Using the VACUUM command explicitly gives me the opportunity to fine-tune my 
 VACUUM scheduling parameters, after I analyze the log generated by VACUUM 
 VERBOSE.
 
 On the other hand I'd like to use the auto-vacuum mechanism because of its 
 facilities. Unfortunately, after I made some initial estimations for 
 autovacuum_naptime and set the specific data into the pg_autovacuum table, I 
 get no feedback from the auto-vacuum mechanism to check whether it works 
 well or not.  It would be nice to have some kind of log similar to the one 
 generated by VACUUM VERBOSE. Is the auto-vacuum mechanism able to provide 
 such a useful log?

Ditto what Alvaro said.

However, you can get some measure of tracking by running VACUUM VERBOSE
on a regular basis to see how well autovacuum is keeping up.  There's
no problem with running manual vacuum and autovacuum together, and you'll
be able to gather _some_ information about how well autovacuum is
keeping up.
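
For example (a hypothetical cron entry; adjust the path and schedule):

    0 3 * * * /usr/local/pgsql/bin/vacuumdb -a -v >> /var/log/vacuum-verbose.log 2>&1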

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] VACUUM vs auto-vacuum daemon

2007-06-12 Thread Sabin Coanda
Hi Bill,

...

 However, you can get some measure of tracking my running VACUUM VERBOSE
 on a regular basis to see how well autovacuum is keeping up.  There's
 no problem with running manual vacuum and autovacuum together, and you'll
 be able to gather _some_ information about how well autovacuum is
 keeping up.

Well, I think that is useful only if I can synchronize autovacuum to 
always run right after I run VACUUM VERBOSE, but I don't know how to do 
that. Do you?

Sabin 





Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Christo Du Preez
Yes, I have just about tried every combination of vacuum on the
database. Just to make 100% sure.

Tom Lane wrote:
 Christo Du Preez [EMAIL PROTECTED] writes:
   
 On my laptop the explain analyze looks like this:
 

   
 Index Scan using fki_layertype_parentid on layertype  (cost=0.00..8.27
 rows=1 width=109)
   Index Cond: (parentid = 300)
 

 OK ...

   
 and on the problem server:
 

   
 Seq Scan on layertype  (cost=0.00..20.39 rows=655 width=110)
   Filter: (parentid = 300)
 

 The server thinks that every row of the table matches the WHERE clause.
 That being the case, it's making the right choice to use a seqscan.
 The question is why is the rows estimate so far off?  Have you ANALYZEd
 the table lately?

   regards, tom lane


   

-- 
Christo Du Preez

Senior Software Engineer
Mecola IT
Mobile:  +27 [0]83 326 8087
Skype:   christodupreez
Website: http://www.locateandtrade.co.za




Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Tom Lane
Christo Du Preez [EMAIL PROTECTED] writes:
 Yes, I have just about tried every combination of vacuum on the
 database. Just to make 100% sure.

Well, there's something mighty wacko about that rowcount estimate;
even if you didn't have stats, the estimate for a simple equality
constraint oughtn't be a 100% match.

What do you get from SELECT * FROM pg_stats WHERE tablename = 'layertype'
on both systems?

regards, tom lane



Re: [PERFORM] test / live environment, major performance difference

2007-06-12 Thread Christo Du Preez
Fast:

public;layertype;id;0;4;-1;;;{1,442,508,575,641,708,774,840,907,973,1040};0.95
public;layertype;label;0;14;-0.971429;{arch,bank,bench,canyon,gap,hill,hills,levee,mountain,mountains};{0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752};{abandoned
airfield,boatyard,corridor,forest(s),intermittent lake,metro
station,park headquarters,reefs,section of bank,swamp,zoo};0.107307
public;layertype;parentid;0.98797;4;2;{4,1};{0.00902256,0.00300752};;-0.142857
public;layertype;zorder;0;4;9;{0};{0.98797};{1,2,3,4,5,6,7,8};0.928955
public;layertype;description;0.100752;74;-0.888722;{a branch of
a canyon or valley,a low, isolated, rounded hill,a near-level
shallow, natural depression or basin, usually containing an intermittent
lake, pond, or pool,a relatively shallow, wide depression, the bottom
of which usually has a continuous gradient,a shore zone of coarse
unconsolidated sediment that extends from the low-water line to the
highest reach of storm waves,a surface-navigation hazard composed of
consolidated material,a surface-navigation hazard composed of
unconsolidated
material};{0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752};{a
barrier constructed across a stream to impound water,a comparatively
depressed area on an icecap,a facility for pumping oil through a
pipeline,a large house, mansion, or chateau, on a large estate,an
area drained by a stream,an elongate (tongue-like) extension of a flat
sea floor into an adjacent higher feature,a place where caravans stop
for rest,a series of associated ridges or seamounts,a sugar mill no
longer used as a sugar mill,bowl-like hollows partially surrounded by
cliffs or steep slopes at the head of a glaciated
valley,well-delineated subdivisions of a large and complex positive
feature};-0.0178932
public;layertype;code;0.0135338;9;-1;;;{A.ADM1,H.HBRX,H.STMM,L.RGNL,S.BUSTN,S.HTL,S.PKLT,S.TRIG,T.MTS,U.GAPU,V.VINS};0.995628

Slow:

public;layertype;id;0;4;-1;;;{1,437,504,571,638,705,772,839,906,973,1040};-0.839432
public;layertype;label;0;15;-0.965723;{arch,bank,bench,canyon,country,gap,hill,hills,levee,mountain};{0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063};{abandoned
airfield,boatyard,cotton plantation,fork,intermittent oxbow
lake,military installation,park headquarters,reef,second-order
administrative division,swamp,zoo};-0.0551452
public;layertype;parentid;0.00745157;4;7;{300};{0.976155};{1,1,4,5,8,12};0.92262
public;layertype;zorder;0;4;8;{0};{0.971684};{1,2,3,3,5,7,7};0.983028
public;layertype;description;0.110283;74;-0.879285;{a branch of
a canyon or valley,a low, isolated, rounded hill,a near-level
shallow, natural depression or basin, usually containing an intermittent
lake, pond, or pool,a relatively shallow, wide depression, the bottom
of which usually has a continuous gradient,a shore zone of coarse
unconsolidated sediment that extends from the low-water line to the
highest reach of storm waves,a surface-navigation hazard composed of
consolidated material,a surface-navigation hazard composed of
unconsolidated
material};{0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063};{a
barrier constructed across a stream to impound water,a comparatively
depressed area on an icecap,a facility for pumping water from a major
well or through a pipeline,a large inland body of standing water,an
area drained by a stream,an embankment bordering a canyon, valley, or
seachannel,a place where diatomaceous earth is extracted,a series of
associated ridges or seamounts,a sugar mill no longer used as a sugar
mill,bowl-like hollows partially surrounded by cliffs or steep slopes
at the head of a glaciated valley,well-delineated subdivisions of a
large and complex positive feature};0.0103485
public;layertype;code;0.023845;9;-1;;;{A.ADM1,H.INLT,H.STMM,L.RNGA,S.BUSTN,S.HUT,S.PKLT,S.TRIG,T.MTS,U.GAPU,V.VINS};-0.852108

This table contains identical data on both systems.

Thanx for your help Tom




Tom Lane wrote:
 Christo Du Preez [EMAIL PROTECTED] writes:
   
 Yes, I have just about tried every combination of vacuum on the
 database. Just to make 100% sure.
 

 Well, there's something mighty wacko about that rowcount estimate;
 even if you didn't have stats, the estimate for a simple equality
 constraint oughtn't be 100% match.

 What do you get from SELECT * FROM pg_stats WHERE tablename = 'layertype'
 on both systems?

   regards, tom lane


   

-- 
Christo Du Preez

Senior Software Engineer
Mecola IT
Mobile:  +27 [0]83 326 8087
Skype:   christodupreez
Website: http://www.locateandtrade.co.za




Re: [PERFORM] Variable (degrading) performance

2007-06-12 Thread Heikki Linnakangas

Vladimir Stankovic wrote:
What I am hoping to see is NOT the same value for all executions of 
the same type of transaction (after some transient period). Instead, I'd 
like to see that if I take an appropriately-sized set of transactions I 
will see at least steady growth in transaction average times, if not 
exactly the same average. Each chunk would possibly include a sudden 
performance drop due to the necessary vacuums and checkpoints. The 
performance might be influenced by the change in the data set too.
I am unhappy about the fact that the durations of experiments can differ by 
as much as 30% (bearing in mind that they are not exactly the same due to 
the non-determinism on the client side). I would like to eliminate this 
variability. Are my expectations reasonable? What could be the cause(s) 
of this variability?


You should see that if you define your chunk to be long enough. Long 
enough is probably hours, not minutes or seconds. As I said earlier, 
checkpoints and vacuum are a major source of variability.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] Best use of second controller with faster disks?

2007-06-12 Thread Vivek Khera


On Jun 11, 2007, at 9:14 PM, Francisco Reyes wrote:


RAID card 1 with 8 drives. 7200 RPM SATA RAID10
RAID card 2 with 4 drives. 10K RPM SATA RAID10



what raid card have you got?  i'm playing with an external enclosure  
which has an areca sata raid in it and connects to the host via fibre  
channel.  it is wicked fast, and supports a RAID6 which seems to be  
as fast as the RAID10 in my initial testing on this unit.


What drives are you booting from?  If you're booting from the 4-drive  
RAID10, perhaps split that into a pair of RAID1's and boot from one  
and use the other as the pg log disk.


however, I must say that with my 16 disk array, peeling the log off  
the main volume actually slowed it down a bit.  I think that the raid  
card is just so fast at doing the RAID6 computations and having the  
striping is a big gain over the dedicated RAID1 for the log.


Right now I'm testing an 8-disk RAID6 configuration on the same   
device; it seems slower than the 16-disk RAID6, but I haven't yet  
tried 8-disk RAID10 with dedicated log yet.


Besides having pg_xlog in the 10K RPM drives what else can I do to  
best use those drives other than putting some data in them?


Iostat shows the drives getting used very little, even during  
constant updates and vacuum.


Some of the postgresl.conf settings that may be relevant.
wal_buffers = 64
checkpoint_segments = 64


i'd bump checkpoint_segments up to 256 given the amount of disk  
you've got dedicated to it.  be sure to increase checkpoint timeout too.
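
something like this in postgresql.conf (illustrative values):

    checkpoint_segments = 256
    checkpoint_timeout = 900       # seconds; up from the 300s default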


And if you can move to 6.2 FreeBSD you should pick up some speed on  
the network layer and possibly the disk I/O.





Re: [PERFORM] performance drop on 8.2.4, reverting to 8.1.4

2007-06-12 Thread Steven Flatt

Thanks Tom and Alvaro.

To follow up on this, I re-wrote and tweaked a number of queries (including
the one provided) to change LEFT OUTER JOIN ... WHERE col IS NULL clauses
to WHERE col NOT IN (...) clauses.
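
The shape of the rewrite was roughly this (illustrative table and column
names, not the real schema):

    -- before:
    SELECT a.col FROM a LEFT OUTER JOIN b ON a.col = b.col WHERE b.col IS NULL;
    -- after:
    SELECT a.col FROM a WHERE a.col NOT IN (SELECT col FROM b);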

This has brought performance to an acceptable level on 8.2.

Thanks for your time,
Steve


On 6/7/07, Tom Lane [EMAIL PROTECTED] wrote:


Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I was hoping that the auto plan invalidation code in CVS HEAD would get
 it out of this problem, but it seems not to for the problem-as-given.
 The trouble is that it won't change plans until autovacuum analyzes the
 tables, and that won't happen until the transaction commits and sends
 off its I-inserted-lotsa-rows report to the stats collector.

 I think there is something we can do about this -- drop the default
 value for analyze threshold.

Maybe worth doing, but it doesn't help for Steve's example.

   regards, tom lane



Re: [PERFORM] Best use of second controller with faster disks?

2007-06-12 Thread Francisco Reyes

Vivek Khera writes:


what raid card have you got?


2 3ware cards.
I believe both are 9550SX 

i'm playing with an external enclosure  
which has an areca sata raid in it and connects to the host via fibre  
channel. 


What is the OS? FreeBSD?
One of the reasons I stick with 3ware is that it is well supported in 
FreeBSD and has a pretty decent management program.


it is wicked fast, and supports a RAID6 which seems to be  
as fast as the RAID10 in my initial testing on this unit.


For my next large machine I am also leaning towards RAID6. The space 
difference is just too big to ignore.

3ware recommends RAID6 for 5+ drives.
 

What drives are you booting from?


Booting from the 8 drive raid.

 If you're booting from the 4-drive  
RAID10, perhaps split that into a pair of RAID1's and boot from one  
and use the other as the pg log disk.


Maybe for the next machine.

however, I must say that with my 16 disk array, peeling the log off  
the main volume actually slowed it down a bit.  I think that the raid  
card is just so fast at doing the RAID6 computations and having the  
striping is a big gain over the dedicated RAID1 for the log.


Could be.
Seems like RAID6 is supposed to be a good balance between performance and 
available space.


Right now I'm testing an 8-disk RAID6 configuration on the same   
device; it seems slower than the 16-disk RAID6, but I haven't yet  
tried 8-disk RAID10 with dedicated log yet.


Is all this within the same controller?

i'd bump checkpoint_segements up to 256 given the amount of disk  
you've got dedicated to it.  be sure to increase checkpoint timeout too.


Thanks. Will try that.
