Re: [PERFORM] Less rows - better performance?

2008-07-22 Thread Ɓukasz Filut

[...]

You can check this too:
select relname, relpages, reltuples, relkind
 from pg_class
where relkind in ('r', 'i')
order by relpages desc limit 20;

This will give you the top 20 relations and their sizes. One page is 
typically 8 kB, so you can cross-check whether the relpages/reltuples 
ratio is completely off; that is a good indicator of table/index bloat.

Use this query instead:

select pg_size_pretty(pg_relation_size(oid)) as relation_size,
       relname, relpages, reltuples, relkind
  from pg_class
 where relkind in ('r', 'i')
 order by relpages desc limit 20;

The output will be much more readable.
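
To act on the relpages/reltuples hint above, a rough check like this may help (a sketch; it assumes the default 8 kB page size and reasonably fresh ANALYZE statistics):

-- Rows per page as a crude bloat signal: very low values on relations
-- with small rows suggest dead space worth vacuuming away.
select relname, relkind, relpages, reltuples,
       round(reltuples / nullif(relpages, 0)) as tuples_per_page
  from pg_class
 where relkind in ('r', 'i') and relpages > 10
 order by relpages desc limit 20;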


Regards,
Mario

   Lukasz

--
Lukasz Filut
DBA - IT Group, WSB-TEB Corporation
Poznan - Poland




[PERFORM] Less rows - better performance?

2008-07-21 Thread Andreas Hartmann

Dear PostgreSQL community,

first some info about our application:

- Online course directory for a University
- Amount of data: complete dump is 27 MB
- Semester is part of primary key in each table
- Data for approx. 10 semesters stored in the DB
- Read-only access from web application (JDBC)

Our client has asked us if the performance of the application could be
improved by moving the data from previous years to a separate archive
application. This would reduce the overall amount of data in the main
application by about 80% at the moment.

Actually, I doubt that this will have the desired effect, since the
semester is part of the primary key in virtually all tables (apart from
some small tables containing string constants etc.) and is therefore
indexed. Some tests with EXPLAIN ANALYZE and some web tests (JMeter)
seem to confirm this: the queries showed the same performance with 2
semesters as with 10.
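
For illustration, the tests were of this shape (a sketch; the table and column names here are invented):

-- Compare plan and timings for a typical lookup against the full data
-- set and against a copy holding only 2 semesters.
EXPLAIN ANALYZE
SELECT *
  FROM course
 WHERE semester = '2008S'
   AND title ILIKE '%database%';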

But since I'm not sure yet, I would very much appreciate any answers to
the following questions:

- Do you think the approach (reducing the data) is effective?
- Are there any particular tests which I should do?

Thanks a lot in advance!

-- Andreas



--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01



Re: [PERFORM] Less rows - better performance?

2008-07-21 Thread Richard Huxton

Andreas Hartmann wrote:

Dear PostgreSQL community,

first some info about our application:

- Online course directory for a University
- Amount of data: complete dump is 27 MB
- Semester is part of primary key in each table
- Data for approx. 10 semesters stored in the DB
- Read-only access from web application (JDBC)

Our client has asked us if the performance of the application could be
improved by moving the data from previous years to a separate archive
application. 


If you had 27GB of data maybe, but you've only got 27MB - that's 
presumably all sitting in memory.
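
One quick way to confirm that (a sketch; it relies on the standard statistics views being enabled):

-- Share of buffer reads served from cache for this database; a value
-- near 100% supports the "all sitting in memory" assumption.
SELECT datname, blks_hit, blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
  FROM pg_stat_database
 WHERE datname = current_database();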


What in particular is slow?

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Less rows - better performance?

2008-07-21 Thread Andreas Hartmann

Richard, thanks for your reply!

Richard Huxton schrieb:

Andreas Hartmann wrote:

Dear PostgreSQL community,

first some info about our application:

- Online course directory for a University
- Amount of data: complete dump is 27 MB
- Semester is part of primary key in each table
- Data for approx. 10 semesters stored in the DB
- Read-only access from web application (JDBC)

Our client has asked us if the performance of the application could be
improved by moving the data from previous years to a separate archive
application. 


If you had 27GB of data maybe, but you've only got 27MB - that's 
presumably all sitting in memory.


Here's some info about the actual amount of data:

SELECT pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database where pg_database.datname = 'vvz_live_1';

  datname   |  size
------------+---------
 vvz_live_1 | 2565 MB

I wonder why the actual size is so much bigger than the data-only dump - 
is this because of index data etc.?
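
A quick way to break that down (a sketch; pg_relation_size requires 8.1 or later):

-- Total on-disk size of ordinary tables vs. indexes, to see how much
-- of the 2565 MB is index data.
select relkind, count(*) as relations,
       pg_size_pretty(sum(pg_relation_size(oid))::bigint) as total_size
  from pg_class
 where relkind in ('r', 'i')
 group by relkind;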




What in particular is slow?


There's no particular bottleneck (at least none that we're aware of). 
During the first couple of days after the beginning of the semester, 
request processing tends to slow down due to the high load (many 
students assembling their schedules). The customer upgraded the 
hardware (which already helped a lot), but they asked us to find 
further approaches to performance optimization.


-- Andreas


--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01



Re: [PERFORM] Less rows - better performance?

2008-07-21 Thread Guillaume Smet
On Mon, Jul 21, 2008 at 1:25 PM, Andreas Hartmann [EMAIL PROTECTED] wrote:
 SELECT pg_database.datname,
 pg_size_pretty(pg_database_size(pg_database.datname)) AS size
 FROM pg_database where pg_database.datname = 'vvz_live_1';

   datname   |  size
 ------------+---------
  vvz_live_1 | 2565 MB

 I wonder why the actual size is so much bigger than the data-only dump - is
 this because of index data etc.?

More probably because the database is totally bloated. Do you run
VACUUM regularly or did you set up autovacuum?
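
A quick check along those lines (a sketch; the last_vacuum/last_autovacuum columns exist from 8.2 on):

SHOW autovacuum;

-- When was each table last vacuumed/analyzed, manually or by autovacuum?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
 ORDER BY relname;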

-- 
Guillaume



Re: [PERFORM] Less rows - better performance?

2008-07-21 Thread Richard Huxton

Andreas Hartmann wrote:


Here's some info about the actual amount of data:

SELECT pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database where pg_database.datname = 'vvz_live_1';

  datname   |  size
------------+---------
 vvz_live_1 | 2565 MB

I wonder why the actual size is so much bigger than the data-only dump - 
is this because of index data etc.?


I suspect Guillaume is right and you've not been vacuuming. That or 
you've got a *LOT* of indexes. If the database is only 27MB dumped, I'd 
just dump/restore it.


Since the database is read-only, it might be worth running CLUSTER on 
the main tables if there's a sensible ordering for them.
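
For example (a sketch with hypothetical names; note that CLUSTER takes an exclusive lock while it rewrites the table, and the syntax below is the 8.3 form, while older releases use CLUSTER indexname ON tablename):

-- Rewrite the table in the physical order of its semester-leading
-- primary key, so per-semester scans touch fewer pages.
CLUSTER course USING course_pkey;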



What in particular is slow?


There's no particular bottleneck (at least none that we're aware of). 
During the first couple of days after the beginning of the semester, 
request processing tends to slow down due to the high load (many 
students assembling their schedules). The customer upgraded the 
hardware (which already helped a lot), but they asked us to find 
further approaches to performance optimization.


1. Cache sensibly at the application (I should have thought there's 
plenty of opportunity here).
2. Make sure you're using a connection pool and have sized it reasonably 
(try 4,8,16 see what loads you can support).
3. Use prepared statements where it makes sense (see the sketch after 
this list). Not sure how you'll manage the interplay between this and 
connection pooling in JDBC. Not a Java man, I'm afraid.


If you're happy with the query plans you're looking to reduce overheads 
as much as possible during peak times.


4. Offload more of the processing to clients with some fancy ajax-ed 
interface.
5. Throw in a spare machine as an app server for the first week of term. 
Presumably your load is 100 times average at this time.
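
On point 3, the same idea expressed directly in SQL (a sketch; the names are invented, and JDBC's PreparedStatement can map onto server-side prepares like this):

-- Plan once, execute many times with different parameters.
PREPARE course_by_semester (text) AS
  SELECT * FROM course WHERE semester = $1;

EXECUTE course_by_semester('2008S');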


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Less rows - better performance?

2008-07-21 Thread Craig Ringer

Guillaume Smet wrote:

On Mon, Jul 21, 2008 at 1:25 PM, Andreas Hartmann [EMAIL PROTECTED] wrote:

SELECT pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database where pg_database.datname = 'vvz_live_1';

  datname   |  size
------------+---------
 vvz_live_1 | 2565 MB

I wonder why the actual size is so much bigger than the data-only dump - is
this because of index data etc.?


More probably because the database is totally bloated. Do you run
VACUUM regularly or did you set up autovacuum?


You might also want to REINDEX and see if that improves things. My 
understanding is that if vacuum isn't run regularly, the indexes may end 
up a bit of a mess as well as the tables.
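
For instance (a sketch; REINDEX locks the table against writes while it runs, which matters less here since the database is read-only):

REINDEX TABLE course;          -- hypothetical table name, one relation at a time
-- or rebuild everything in one go:
REINDEX DATABASE vvz_live_1;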


--
Craig Ringer



Re: [PERFORM] Less rows - better performance?

2008-07-21 Thread Christian GRANDIN
Hi,

Reducing the amount of data will only have an effect on table scans or index
scans. If your queries are selective and optimized, it will have no effect.

Before looking for solutions, the first thing to do is to understand what's
happening.

If you already know the queries, then EXPLAIN them. Otherwise, you must log
statement durations with the log_statement and log_min_duration_statement
parameters in postgresql.conf.
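
For example, in postgresql.conf (the 200 ms threshold is an assumption; tune it to your workload):

log_min_duration_statement = 200   # log any statement slower than 200 ms; -1 disables
#log_statement = 'all'             # heavier alternative: log every statement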

Before this, you must at least run VACUUM ANALYZE on the database to collect
up-to-date statistics and get current explain plans.
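
For instance:

VACUUM ANALYZE;   -- reclaims dead row space and refreshes planner statistics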

Best regards.

Christian

2008/7/21 Richard Huxton [EMAIL PROTECTED]

 Andreas Hartmann wrote:


 Here's some info about the actual amount of data:

 SELECT pg_database.datname,
 pg_size_pretty(pg_database_size(pg_database.datname)) AS size
 FROM pg_database where pg_database.datname = 'vvz_live_1';

    datname   |  size
  ------------+---------
   vvz_live_1 | 2565 MB

 I wonder why the actual size is so much bigger than the data-only dump -
 is this because of index data etc.?


 I suspect Guillaume is right and you've not been vacuuming. That or you've
 got a *LOT* of indexes. If the database is only 27MB dumped, I'd just
 dump/restore it.

 Since the database is read-only, it might be worth running CLUSTER on the
 main tables if there's a sensible ordering for them.

  What in particular is slow?


 There's no particular bottleneck (at least none that we're aware of). During
 the first couple of days after the beginning of the semester, request
 processing tends to slow down due to the high load (many students assembling
 their schedules). The customer upgraded the hardware (which already helped a
 lot), but they asked us to find further approaches to performance
 optimization.


 1. Cache sensibly at the application (I should have thought there's plenty
 of opportunity here).
 2. Make sure you're using a connection pool and have sized it reasonably
 (try 4,8,16 see what loads you can support).
 3. Use prepared statements where it makes sense. Not sure how you'll manage
 the interplay between this and connection pooling in JDBC. Not a Java man
 I'm afraid.

 If you're happy with the query plans you're looking to reduce overheads as
 much as possible during peak times.

 4. Offload more of the processing to clients with some fancy ajax-ed
 interface.
 5. Throw in a spare machine as an app server for the first week of term.
  Presumably your load is 100 times average at this time.

 --
  Richard Huxton
  Archonet Ltd

