Re: [PERFORM] log_statement at postgres.conf

2008-07-21 Thread Pomarede Nicolas

On Mon, 21 Jul 2008, System/IJS - Joko wrote:


Thx a lot Nicolas,

I finally succeeded in logging query statements thanks to your simple explanation.
I have some other questions:
1. Is there a possibility to automatically log those statements to a table?


I don't know, never tried that.
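
One untested possibility: 8.3 can write the server log in CSV format, which
you could then load into a table with COPY. A rough sketch (paths are
hypothetical; a matching postgres_log table definition is in the 8.3 docs):

# postgresql.conf (8.3+)
log_destination = 'csvlog'
logging_collector = on     # called redirect_stderr in 8.2, which has no csvlog
log_directory = 'pg_log'

-- then, from psql, load a finished CSV log file:
COPY postgres_log FROM '/var/lib/postgresql/pg_log/postgresql-2008-07-21_000000.csv' WITH csv;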


2. All of those statements come from every database on my server.
Could I know which database a statement comes from,
or at least filter to log only database X?


You can modify 'log_line_prefix' to add the database name:
use '%d %t %p %r ' instead of the default '%t %p %r ' for example.

3. If I need to log only changes made to my database, then the value of 
'log_statement' is 'mod'?


yes
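
Putting the two together, a minimal postgresql.conf sketch:

log_line_prefix = '%d %t %p %r '   # %d prepends the database name
log_statement = 'mod'              # logs DDL plus INSERT/UPDATE/DELETE etc.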


Nicolas




Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-21 Thread Stephane Bailliez

Luke Lonergan wrote:


pgbench is unrelated to the workload you are concerned with if ETL/ELT 
and decision support / data warehousing queries are your target.


Also - placing the xlog on dedicated disks is mostly irrelevant to 
data warehouse / decision support work or ELT.  If you need to 
maximize loading speed while concurrent queries are running, it may be 
necessary, but I think you'll be limited in load speed by CPU related 
to data formatting anyway.


Indeed. pgbench was mostly done as 'informative' and is not really relevant 
to the future workload of this db (given the queries it runs, I'm not 
sure it's relevant for anything but connection speed, though it was 
interesting for me to get a reference for tx-like workloads). I was more 
interested in the raw disk performance.




The primary performance driver for ELT / DW is sequential transfer 
rate, thus the dd test at 2X memory.  With six data disks of this 
type, you should expect a maximum of around 6 x 80 = 480 MB/s.  With 
RAID10, depending on the raid adapter, you may need to have two or 
more IO streams to use all platters, otherwise your max speed for one 
query would be 1/2 that, or 240 MB/s.



ok, which seems to be on par with what I'm getting (the 240, that is).
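
For reference, a dd test of that kind might look like the following (path and 
sizes are assumptions; count is picked so the file is ~2X a 16GB RAM size):

$ dd if=/dev/zero of=/data/ddtest bs=8k count=4000000   # sequential write, ~32GB
$ dd if=/data/ddtest of=/dev/null bs=8k                 # sequential read back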



I'd suggest RAID5, or even better, configure all eight disks as a JBOD 
in the RAID adapter and run ZFS RAIDZ.  You would then expect to get 
about 7 x 80 = 560 MB/s on your single query.


Do you have a particular controller and disk hardware configuration in 
mind when you're suggesting RAID5?
My understanding was that it was more difficult to find the right hardware 
to get performance on RAID5 compared to RAID10.




That said, your single cpu on one query will only be able to scan that 
data at about 300 MB/s (try running a SELECT COUNT(*) against a table 
that is 2X memory size).



Not quite 2X memory size, but ~26GB (accounts table with scaling factor 2000):

$ time psql -c "select count(*) from accounts" pgbench
   count
-----------
 200000000
(1 row)

real    1m52.050s
user    0m0.020s
sys     0m0.020s


NB: For the sake of completeness, I reran pgbench taking the average of 
10 runs for each scaling factor (same configuration as per the initial mail; 
columns are scaling factor, db size in MB, average tps):


scale  size(MB)  avg tps
    1        20    23451
  100      1565    21898
  200      3127    20474
  300      4688    20003
  400      6249    20637
  500      7810    16434
  600      9372    15114
  700     11000    14595
  800     12000    16090
  900     14000    14894
 1000     15000     3071
 1200     18000     3382
 1400     21000     1888
 1600     24000     1515
 1800     27000     1435
 2000     30000     1354

-- stephane



Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-21 Thread Luke Lonergan
Hi Stephane,

On 7/21/08 1:53 AM, "Stephane Bailliez" <[EMAIL PROTECTED]> wrote:

>> I'd suggest RAID5, or even better, configure all eight disks as a JBOD
>> in the RAID adapter and run ZFS RAIDZ.  You would then expect to get
>> about 7 x 80 = 560 MB/s on your single query.
>> 
> Do you have a particular controller and disk hardware configuration in
> mind when you're suggesting RAID5?
> My understanding was that it was more difficult to find the right hardware
> to get performance on RAID5 compared to RAID10.

If you're running RAIDZ on ZFS, the controller you have should be fine.
Just configure the HW RAID controller to treat the disks as JBOD (eight
individual disks), then make a single RAIDZ zpool of the eight disks.  This
will run them in a robust SW RAID within Solaris.  The fault management is
superior to what you would otherwise have in your HW RAID and the
performance should be much better.
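
As a concrete sketch (device names are hypothetical):

# with the HW RAID controller in JBOD mode, on Solaris:
zpool create pgpool raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0
zfs create pgpool/pgdata    # filesystem to hold the PostgreSQL data directory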

- Luke




Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-21 Thread Stephane Bailliez

Greg Smith wrote:


Note that I've had some issues with the desktop Ubuntu giving slower 
results in tests like this than the same kernel release using the 
stock kernel parameters.  Haven't had a chance yet to see how the 
server Ubuntu kernel fits into that or exactly what the desktop one is 
doing wrong. Could be worse--if you were running any 8.04 I expect 
your pgbench results would be downright awful.


Ah, interesting. Isn't it a scheduler problem? I thought CFQ was the 
default for desktop.
I double-checked the 7.10 server on this box and it's really the deadline 
scheduler that is used:


cat /sys/block/sdb/queue/scheduler
noop anticipatory [deadline] cfq

Do you have some more pointers on the 8.04 issues you mentioned? 
(that's deemed to be the next upgrade from ops)



postgresql 8.2.9 with data and xlog as mentioned above

There are so many known performance issues in 8.2 that are improved in 
8.3 that I'd suggest you really should be considering it for a new 
install at this point.


Yes I'd definitely prefer to go 8.3 as well but there are a couple 
reasons for now I have to suck it up:

- 8.2 is the one in the 7.10 repository.
- I need plr as well and 8.3-plr debian package does not exist yet.

(I know in both cases we could recompile and install it from there, but ...)

In general, you'll want to use a couple of clients per CPU core for 
pgbench tests to get a true look at the scalability.  Unfortunately, 
the way the pgbench client runs means that it tends to top out at 20 
or 30 thousand TPS on read-only tests no matter how many cores you 
have around. But you may find operations where peak throughput comes 
at closer to 32 clients here rather than just 8.

ok, makes sense.
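
Something along these lines, for example (client and transaction counts assumed):

$ pgbench -S -c 16 -t 10000 pgbench   # -S = read-only select test
$ pgbench -S -c 32 -t 10000 pgbench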

As far as the rest of your results go, Luke's comment that you may 
need more than one process to truly see the upper limit of your disk 
performance is right on target.  More useful commentary on that issue that 
I'd recommend is near the end of 
http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/ 


Yeah I was looking at that url as well. Very useful.

Thanks for all the info Greg.

-- stephane




[PERFORM] Less rows -> better performance?

2008-07-21 Thread Andreas Hartmann

Dear PostgreSQL community,

first some info about our application:

- Online course directory for a University
- Amount of data: complete dump is 27 MB
- Semester is part of primary key in each table
- Data for approx. 10 semesters stored in the DB
- Read-only access from web application (JDBC)

Our client has asked us if the performance of the application could be
improved by moving the data from previous years to a separate "archive"
application. This would reduce the overall amount of data in the main
application by about 80% at the moment.

Actually I doubt that this will have the desired effect, since the
semester is part of the primary key in virtually all tables (apart from
some small tables containing string constants etc.), and therefore
indexed. Some tests with EXPLAIN ANALYZE and some web tests (JMeter)
seem to confirm this: the queries showed the same performance with 2 and
10 semesters.

But since I'm not sure yet, I would very much appreciate any answers to
the following questions:

- Do you think the approach (reducing the data) is effective?
- Are there any particular tests which I should do?

Thanks a lot in advance!

-- Andreas



--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01



Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Richard Huxton

Andreas Hartmann wrote:

Dear PostgreSQL community,

first some info about our application:

- Online course directory for a University
- Amount of data: complete dump is 27 MB
- Semester is part of primary key in each table
- Data for approx. 10 semesters stored in the DB
- Read-only access from web application (JDBC)

Our client has asked us if the performance of the application could be
improved by moving the data from previous years to a separate "archive"
application. 


If you had 27GB of data maybe, but you've only got 27MB - that's 
presumably all sitting in memory.


What in particular is slow?

--
  Richard Huxton
  Archonet Ltd



[PERFORM] Perl/DBI vs Native

2008-07-21 Thread Valentin Bogdanov
Hi,

I have run quite a few tests comparing how long a query takes to execute from 
Perl/DBI as compared to psql/libpq. No matter how many times I run the test, the 
results were always the same.

I ran a SELECT-all on a fairly big table and enabled the 
log_min_duration_statement option. With psql, postgres consistently logs half a 
second, while the exact same query executed with Perl/DBI again consistently 
takes 2 seconds.

If I were timing the applications I wouldn't have been too surprised by these 
results; obviously, processing with Perl would be slower than a native 
application. But it's the postmaster that gives these results. Could it be 
because the DBI module is slower at assimilating the data?

Any light on the subject would be greatly appreciated.


Regards,

Val




Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Andreas Hartmann

Richard, thanks for your reply!

Richard Huxton wrote:

Andreas Hartmann wrote:

Dear PostgreSQL community,

first some info about our application:

- Online course directory for a University
- Amount of data: complete dump is 27 MB
- Semester is part of primary key in each table
- Data for approx. 10 semesters stored in the DB
- Read-only access from web application (JDBC)

Our client has asked us if the performance of the application could be
improved by moving the data from previous years to a separate "archive"
application. 


If you had 27GB of data maybe, but you've only got 27MB - that's 
presumably all sitting in memory.


Here's some info about the actual amount of data:

SELECT pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database where pg_database.datname = 'vvz_live_1';

  datname   |  size
------------+---------
 vvz_live_1 | 2565 MB

I wonder why the actual size is so much bigger than the data-only dump - 
is this because of index data etc.?




What in particular is slow?


There's no particular bottleneck (at least that we're aware of). During 
the first couple of days after the beginning of the semester the 
application request processing tends to slow down due to the high load 
(many students assemble their schedule). The customer upgraded the 
hardware (which already helped a lot), but they asked us to find further 
approaches to performance optimization.


-- Andreas


--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01



Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Guillaume Smet
On Mon, Jul 21, 2008 at 1:25 PM, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
> SELECT pg_database.datname,
> pg_size_pretty(pg_database_size(pg_database.datname)) AS size
> FROM pg_database where pg_database.datname = 'vvz_live_1';
>
>   datname   |  size
> ------------+---------
>  vvz_live_1 | 2565 MB
>
> I wonder why the actual size is so much bigger than the data-only dump - is
> this because of index data etc.?

More probably because the database is totally bloated. Do you run
VACUUM regularly or did you set up autovacuum?

-- 
Guillaume



Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Richard Huxton

Andreas Hartmann wrote:


Here's some info about the actual amount of data:

SELECT pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database where pg_database.datname = 'vvz_live_1';

  datname   |  size
------------+---------
 vvz_live_1 | 2565 MB

I wonder why the actual size is so much bigger than the data-only dump - 
is this because of index data etc.?


I suspect Guillaume is right and you've not been vacuuming. That or 
you've got a *LOT* of indexes. If the database is only 27MB dumped, I'd 
just dump/restore it.


Since the database is read-only it might be worth running CLUSTER on the 
main tables if there's a sensible ordering for them.
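
A sketch with hypothetical names, using the pre-8.3 CLUSTER syntax:

CLUSTER courses_semester_idx ON courses;
ANALYZE courses;   -- CLUSTER doesn't update planner stats by itself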



What in particular is slow?


There's no particular bottleneck (at least that we're aware of). During 
the first couple of days after the beginning of the semester the 
application request processing tends to slow down due to the high load 
(many students assemble their schedule). The customer upgraded the 
hardware (which already helped a lot), but they asked us to find further 
approaches to performance optimization.


1. Cache sensibly at the application (I should have thought there's 
plenty of opportunity here).
2. Make sure you're using a connection pool and have sized it reasonably 
(try 4,8,16 see what loads you can support).
3. Use prepared statements where it makes sense. Not sure how you'll 
manage the interplay between this and connection pooling in JDBC. Not a 
Java man I'm afraid.


If you're happy with the query plans you're looking to reduce overheads 
as much as possible during peak times.


4. Offload more of the processing to clients with some fancy ajax-ed 
interface.
5. Throw in a spare machine as an app server for the first week of term. 
Presumably your load is 100 times average at this time.


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Perl/DBI vs Native

2008-07-21 Thread Craig Ringer

Valentin Bogdanov wrote:

Hi,

I have run quite a few tests comparing how long a query takes to execute from 
Perl/DBI as compared to psql/libpq. No matter how many times I run the test, the 
results were always the same.

I ran a SELECT-all on a fairly big table and enabled the 
log_min_duration_statement option. With psql, postgres consistently logs half a 
second, while the exact same query executed with Perl/DBI again consistently 
takes 2 seconds.

If I were timing the applications I wouldn't have been too surprised by these 
results; obviously, processing with Perl would be slower than a native 
application. But it's the postmaster that gives these results. Could it be 
because the DBI module is slower at assimilating the data?

Any light on the subject would be greatly appreciated.


Random guess: Perl's DBI is using parameterized prepared statements, 
preventing the optimizer from using its knowledge about common values in 
the table to decide whether or not index use is appropriate. When you're 
writing the query in psql, you're not using prepared statements so the 
optimizer can be cleverer.


Try comparing:

SELECT statement

to

PREPARE test(params) AS statement;
EXECUTE test(params);

eg:

SELECT x + 44 FROM t;

vs:

PREPARE test(int) AS SELECT x + $1 FROM t;
EXECUTE test(44);

Use EXPLAIN ANALYZE to better understand the changes in the query plan.

--
Craig Ringer



Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Craig Ringer

Guillaume Smet wrote:

On Mon, Jul 21, 2008 at 1:25 PM, Andreas Hartmann <[EMAIL PROTECTED]> wrote:

SELECT pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database where pg_database.datname = 'vvz_live_1';

  datname   |  size
------------+---------
 vvz_live_1 | 2565 MB

I wonder why the actual size is so much bigger than the data-only dump - is
this because of index data etc.?


More probably because the database is totally bloated. Do you run
VACUUM regularly or did you set up autovacuum?


You might also want to REINDEX and see if that improves things. My 
understanding is that if vacuum isn't run regularly, the indexes may end 
up a bit of a mess as well as the tables.
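
For example (table name hypothetical):

REINDEX TABLE courses;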


--
Craig Ringer



Re: [PERFORM] Perl/DBI vs Native

2008-07-21 Thread Rusty Conover


On Jul 21, 2008, at 5:19 AM, Valentin Bogdanov wrote:


Hi,

I have run quite a few tests comparing how long a query takes to
execute from Perl/DBI as compared to psql/libpq. No matter how many
times I run the test, the results were always the same.


I ran a SELECT-all on a fairly big table and enabled the
log_min_duration_statement option. With psql, postgres consistently
logs half a second, while the exact same query executed with Perl/DBI
again consistently takes 2 seconds.


If I were timing the applications I wouldn't have been too surprised
by these results; obviously, processing with Perl would be slower than
a native application. But it's the postmaster that gives these results.
Could it be because the DBI module is slower at assimilating the data?


Hi Val,

Yes, DBI can be slower than the native C interface.  The speed depends 
on how the data is being returned inside of Perl. Returning hashes is 
a slower method than returning arrays, from what I've found, due to the 
overhead of creating the objects in Perl.


So:

my $r = $dbh->selectall_arrayref("select * from table", { Columns => {} });


Is slower than:

my $r = $dbh->selectall_arrayref("select * from table", undef);

Secondarily, if you're returning a lot of rows you may want to look 
into using a cursor, so that you can fetch the rows 1000 at a time 
in a tight loop and then discard them once you are done with them.  This 
will hopefully prevent the system from having to continually allocate 
memory for all of your rows.  For each field in each row Perl 
allocates memory to store the value from Postgres, so if you have many 
fields in your table this can be a large number of allocations 
depending on the number of rows.  Any userland profiling tool should 
help you debug what's going on here.
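
A minimal sketch of the cursor approach (connection details and table name are 
assumptions):

use DBI;

my $dbh = DBI->connect("dbi:Pg:dbname=mydb", "user", "pass",
                       { AutoCommit => 0, RaiseError => 1 });
# DECLARE CURSOR needs a transaction, hence AutoCommit => 0
$dbh->do("DECLARE c CURSOR FOR SELECT * FROM mytable");
while (1) {
    my $rows = $dbh->selectall_arrayref("FETCH 1000 FROM c");
    last unless @$rows;
    # process the batch here; its memory can be released each iteration
}
$dbh->do("CLOSE c");
$dbh->commit;
$dbh->disconnect;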


Cheers,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com - http://www.gearbuyer.com - 
http://www.footwearbuyer.com



Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Christian GRANDIN
Hi,

Reducing the amount of data will only have an effect on table scans or index
scans. If your queries are selective and optimized, it will have no effect.

Before looking for solutions, the first thing to do is to understand what's
happening.

If you already know the queries, then EXPLAIN them. Otherwise, you must log
durations with the log_statement and log_min_duration_statement parameters in
postgresql.conf.

Before this, you must at least run VACUUM ANALYZE on the database to collect
current statistics and get up-to-date explain plans.

Best regards.

Christian

2008/7/21 Richard Huxton <[EMAIL PROTECTED]>

> Andreas Hartmann wrote:
>
>>
>> Here's some info about the actual amount of data:
>>
>> SELECT pg_database.datname,
>> pg_size_pretty(pg_database_size(pg_database.datname)) AS size
>> FROM pg_database where pg_database.datname = 'vvz_live_1';
>>
>>   datname   |  size
>> ------------+---------
>>  vvz_live_1 | 2565 MB
>>
>> I wonder why the actual size is so much bigger than the data-only dump -
>> is this because of index data etc.?
>>
>
> I suspect Guillaume is right and you've not been vacuuming. That or you've
> got a *LOT* of indexes. If the database is only 27MB dumped, I'd just
> dump/restore it.
>
> Since the database is read-only it might be worth running CLUSTER on the
> main tables if there's a sensible ordering for them.
>
>  What in particular is slow?
>>>
>>
>> There's no particular bottleneck (at least that we're aware of). During
>> the first couple of days after the beginning of the semester the application
>> request processing tends to slow down due to the high load (many students
>> assemble their schedule). The customer upgraded the hardware (which already
>> helped a lot), but they asked us to find further approaches to performance
>> optimization.
>>
>
> 1. Cache sensibly at the application (I should have thought there's plenty
> of opportunity here).
> 2. Make sure you're using a connection pool and have sized it reasonably
> (try 4,8,16 see what loads you can support).
> 3. Use prepared statements where it makes sense. Not sure how you'll manage
> the interplay between this and connection pooling in JDBC. Not a Java man
> I'm afraid.
>
> If you're happy with the query plans you're looking to reduce overheads as
> much as possible during peak times.
>
> 4. Offload more of the processing to clients with some fancy ajax-ed
> interface.
> 5. Throw in a spare machine as an app server for the first week of term.
>  Presumably your load is 100 times average at this time.
>
> --
>  Richard Huxton
>  Archonet Ltd
>


Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Andreas Hartmann

Guillaume Smet wrote:

On Mon, Jul 21, 2008 at 1:25 PM, Andreas Hartmann <[EMAIL PROTECTED]> wrote:

SELECT pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database where pg_database.datname = 'vvz_live_1';

  datname   |  size
------------+---------
 vvz_live_1 | 2565 MB

I wonder why the actual size is so much bigger than the data-only dump - is
this because of index data etc.?


More probably because the database is totally bloated. Do you run
VACUUM regularly or did you set up autovacuum?


Thanks for the hint!

I just verified that the autovacuum property is enabled. I did the 
following to prepare the tests:


- setup two test databases, let's call them db_all and db_current
- import the dump from the live DB into both test DBs
- delete the old semester data from db_current, leaving only the current 
data


Both test DBs were 600 MB large after this. I did a VACUUM FULL ANALYZE 
on both of them now. db_all didn't shrink significantly (only 1 MB), 
db_current shrunk to 440 MB. We're using quite a lot of indexes; I guess 
that's why so much space is allocated.


-- Andreas

--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01



Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Harald Armin Massa
Andreas,

> I just verified that the autovacuum property is enabled. I did the following
> to prepare the tests:

"autovacuum property is enabled" - did you also check the logs to see if
autovacuum is actually working?

> - setup two test databases, let's call them db_all and db_current
> - import the dump from the live DB into both test DBs
> - delete the old semester data from db_current, leaving only the current
> data
>
> Both test DBs were 600 MB large after this. I did a VACUUM FULL ANALYZE on
> both of them now. db_all didn't shrink significantly (only 1 MB), db_current
> shrunk to 440 MB.

Your test is not testing whether vacuum is done on your production
database! With pg_dump + pg_restore you removed next to all database
bloat (theoretically all of it).

After loading a fresh dump, vacuuming ideally has nearly nothing to
do; after deleting some data, VACUUM reclaims the space of the
deleted rows - that's the shrinking you see after delete + vacuum.

The bloat in your production system may be the result of updates and
deletes in that system; dumping and restoring removes that bloat.

If your live DB is ~2.5 GB, and your dumped / restored DB is only
600MB, then that 2500MB minus 600MB is bloat from not vacuuming or
bloated indexes. So, before the start of the next semester, at least
do a vacuum (maybe also a reindex).
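
A sketch of that maintenance (run it in a quiet window; VACUUM FULL takes
exclusive locks):

VACUUM FULL ANALYZE;
REINDEX DATABASE vvz_live_1;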

Best wishes,

Harald



-- 
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pidgeon
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!



Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Andreas Hartmann

Mario Weilguni wrote:

Andreas Hartmann wrote:


[…]


I just verified that the autovacuum property is enabled.


[…]


Did you have:
stats_start_collector = on
stats_block_level = on
stats_row_level = on

Otherwise autovacuum won't run IMO.


Thanks for the hint! The section looks like this:

stats_start_collector = on
#stats_command_string = off
#stats_block_level = off
stats_row_level = on
#stats_reset_on_server_start = off


I'll check the logs to see if the vacuum really runs - as soon as I find them :)

-- Andreas
--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01



Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Mario Weilguni

Andreas Hartmann wrote:

Mario Weilguni wrote:

Andreas Hartmann wrote:


[…]


I just verified that the autovacuum property is enabled.


[…]


Did you have:
stats_start_collector = on
stats_block_level = on
stats_row_level = on

Otherwise autovacuum won't run IMO.


Thanks for the hint! The section looks like this:

stats_start_collector = on
#stats_command_string = off
#stats_block_level = off
stats_row_level = on
#stats_reset_on_server_start = off


I'll check the logs to see if the vacuum really runs - as soon as I find them :)

-- Andreas

You might want to use these entries in your config:
redirect_stderr = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d

Fit those to your needs, then you will find log entries in $PGDATA/pg_log/

And BTW, I was wrong: you just need to have stats_row_level = on; 
stats_block_level doesn't matter. But in fact it's simple: if you don't 
have 24x7 requirements, type VACUUM FULL ANALYZE; and check if your DB 
becomes smaller. I really doubt you can have so many indexes that a 27MB 
dump would use 2.3 GB on-disk.


You can check this too:
select relname, relpages, reltuples, relkind
 from pg_class
where relkind in ('r', 'i')
order by relpages desc limit 20;

This will give you the top-20 relations and their sizes. One page is typically 
8KB, so you can cross-check whether relpages/reltuples is completely off; this 
is a good indicator of table/index bloat.


Regards,
Mario




Re: [PERFORM] Perl/DBI vs Native

2008-07-21 Thread Craig James
Valentin Bogdanov wrote: 

I have run quite a few tests comparing how long a query takes to
execute from Perl/DBI as compared to psql/libpq. No matter how many
times I run the test, the results were always the same.

I ran a SELECT-all on a fairly big table and enabled the
log_min_duration_statement option. With psql, postgres consistently
logs half a second, while the exact same query executed with Perl/DBI
again consistently takes 2 seconds.


The problem may be that your two tests are not equivalent.  When Perl executes 
a statement, it copies the *entire* result set back to the client before it 
returns the first row.  The following program might appear to just be fetching 
the first row:

 $sth = $dbh->prepare("select item from mytable");
 $sth->execute();
 $item = $sth->fetchrow_array();

But in fact, before Perl returns from the $sth->execute() statement, it has 
already run the query and copied all of the rows into a hidden, client-side cache.  
Each $sth->fetchrow_array() merely copies the data from the hidden cache into your 
local variable.

By contrast, psql executes the query, and starts returning the data a page at a 
time.  So it may appear to be much faster.

This also means that Perl has trouble with very large tables.  If the "mytable" 
in the above example is very large, say a hundred billion rows, you simply can't execute 
this statement in Perl.  It will try to copy 100 billion rows into memory before 
returning the first answer.

The reason for Perl's behind-the-scenes caching is that it allows multiple 
connections to a single database, and multiple statements on each database 
handle.  By executing each statement completely, it gives the appearance that 
multiple concurrent queries are supported.  The downside is that it can be a 
huge memory hog.

Craig



[PERFORM] [BACKUPS]Little backups

2008-07-21 Thread Leví Teodoro da Silva
Hi Guys,

I am developing a project with PostgreSQL, and one guy from the project is
familiar with Oracle and asked me a question that I could not answer; if
someone could help, it would be good. =)
The question is:

- In Oracle he makes a full backup twice a day. In between, Oracle makes a
lot of mini-backups, but these backups contain just the data that has
changed in that time. If the system fails, he can reconstruct the database
by combining the last "big backup" with the "mini-backups". Can Postgres
do this?




Regards,
Leví - Brazil


Re: [PERFORM] [BACKUPS]Little backups

2008-07-21 Thread Kevin Grittner
>>> "Leví Teodoro da Silva" <[EMAIL PROTECTED]> wrote: 
 
> - In Oracle he makes a full backup twice a day. In between, Oracle makes
> a lot of mini-backups, but these backups contain just the data that has
> changed in that time. If the system fails, he can reconstruct the
> database by combining the last "big backup" with the "mini-backups".
> Can Postgres do this?
 
The equivalent capability in PostgreSQL is the Point-In-Time Recovery
backup strategy:
 
http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html
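
At its core that is WAL archiving plus periodic base backups; a minimal
postgresql.conf sketch (archive destination hypothetical):

archive_command = 'cp %p /mnt/server/wal_archive/%f'   # 8.3 also needs archive_mode = on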
 
Twice daily seems rather extreme -- we generally go with monthly.
 
-Kevin



Re: [PERFORM] [BACKUPS]Little backups

2008-07-21 Thread Albert Cervera Areny
On Monday 21 July 2008, Leví Teodoro da Silva wrote:
> Hi Guys,
>
> I am developing a project with PostgreSQL, and one guy from the project is
> familiar with Oracle and asked me a question that I could not answer; if
> someone could help, it would be good. =)
> The question is:
>
> - In Oracle he makes a full backup twice a day. In between, Oracle makes a
> lot of mini-backups, but these backups contain just the data that has
> changed in that time. If the system fails, he can reconstruct the database
> by combining the last "big backup" with the "mini-backups". Can Postgres
> do this?

Yes, it can. If you need detailed information, you can take a look at 
http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html

>
>
>
>
> Regards,
> Leví - Brazil
 



Re: [PERFORM] [BACKUPS]Little backups

2008-07-21 Thread A. Kretschmer
On Mon, 21.07.2008 at 15:20:27 -0300, Leví Teodoro da Silva wrote:
> - In Oracle he makes a full backup twice a day. In between, Oracle makes a
> lot of mini-backups, but these backups contain just the data that has
> changed in that time. If the system fails, he can reconstruct the database
> by combining the last "big backup" with the "mini-backups". Can Postgres
> do this?

Sure, with the WAL-files.

http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net



Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-21 Thread Greg Smith

On Mon, 21 Jul 2008, Stephane Bailliez wrote:

Isn't it a scheduler problem? I thought CFQ was the default for desktop.


CFQ/Deadline/AS are I/O scheduler choices.  What changed completely in 
2.6.23 is the kernel process scheduler. 
http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt gives 
some info about the new one.


While the switch to CFS has shown great improvements in terms of desktop 
and many server workloads, what I discovered is that the pgbench test 
program itself is really incompatible with it.  There's a kernel patch 
that seems to fix the problem at http://lkml.org/lkml/2008/5/27/58 but I 
don't think it's made it into a release yet.


This is not to say the kernel itself is unsuitable for running PostgreSQL, 
but if you're using pgbench as the program to confirm that, I expect 
you'll be disappointed with the results under the Ubuntu 8.04 kernel. 
It tops out at around 10,000 TPS running the select-only test for me, while 
older kernels did 3X that much.


Yes I'd definitely prefer to go 8.3 as well but there are a couple reasons 
for now I have to suck it up:

- 8.2 is the one in the 7.10 repository.
- I need plr as well and 8.3-plr debian package does not exist yet.
(I know in both cases we could recompile and install it from there, but ...)


Stop and think about this for a minute.  You're going into production with 
an older version that has a set of known, impossible-to-work-around issues; 
if you hit them, the response will be "upgrade to 8.3 to fix that", which 
will require the major disruption to your application of a database dump 
and reload if that fix becomes critical.  And you can't just do that now 
because of some packaging issues?  I hope you can impress upon the other 
people involved how incredibly short-sighted that is.


Unfortunately, it's harder than everyone would like to upgrade an existing 
PostgreSQL installation.  That really argues for going out of your way if 
necessary to deploy the latest stable release when you're building 
something new, if there aren't some legacy bits seriously holding you back.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Perl/DBI vs Native

2008-07-21 Thread Tom Lane
Craig James <[EMAIL PROTECTED]> writes:
> Valentin Bogdanov wrote: 
>> I have run quite a few tests comparing how long a query takes to
>> execute from Perl/DBI as compared to psql/libpq. No matter how many
>> times I run the test, the results were always the same.
>> 
>> I ran a SELECT-all on a fairly big table and enabled the
>> log_min_duration_statement option. With psql, postgres consistently
>> logs half a second, while the exact same query executed with Perl/DBI
>> again consistently takes 2 seconds.

> The problem may be that your two tests are not equivalent.  When Perl
> executes a statement, it copies the *entire* result set back to the
> client before it returns the first row.

Sure, but so does psql (unless you've turned on the magic FETCH_COUNT
setting).  I think the theories about prepared versus literal statements
were more promising; but I don't know DBI well enough to know exactly
what it was sending to the server.
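
For reference, FETCH_COUNT is set from within psql and makes it retrieve 
results in batches via a cursor rather than all at once (table name assumed):

\set FETCH_COUNT 100
SELECT * FROM mytable;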

regards, tom lane



Re: [PERFORM] [BACKUPS]Little backups

2008-07-21 Thread Berge Schwebs Bjørlo
On Mon, Jul 21, 2008 at 03:20:27PM -0300, Leví Teodoro da Silva wrote:
> - In Oracle he makes a full backup twice a day. In between, Oracle makes a
> lot of mini-backups, but these backups contain just the data that has
> changed in that time. If the system fails, he can reconstruct the database
> by combining the last "big backup" with the "mini-backups". Can Postgres
> do this?

Take a look at Point-In-Time-Recovery, PITR:
http://www.postgresql.org/docs/current/static/continuous-archiving.html

-Berge

-- 
Berge Schwebs Bjørlo
Alegría!



Re: [PERFORM] [BACKUPS]Little backups

2008-07-21 Thread Levi

Thank you guys for the fast answer.

This same guy asked me about the support available for PostgreSQL. When he sees 
the community behind PostgreSQL, he will never be worried about support. =)


Thanks a lot,
Leví

A. Kretschmer wrote:

On Mon, 21.07.2008 at 15:20:27 -0300, Leví Teodoro da Silva wrote:

- In Oracle he makes a full backup twice a day. In between, Oracle makes a
lot of mini-backups, but these backups contain just the data that has
changed in that time. If the system fails, he can reconstruct the database
by combining the last "big backup" with the "mini-backups". Can Postgres do
this?



Sure, with the WAL-files.

http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html


Andreas
  





Re: [PERFORM] Perl/DBI vs Native

2008-07-21 Thread Greg Sabino Mullane


Tom Lane wrote:
> Sure, but so does psql (unless you've turned on the magic FETCH_COUNT
> setting).  I think the theories about prepared versus literal statements
> were more promising; but I don't know DBI well enough to know exactly
> what it was sending to the server.

Almost certainly a prepared statement, unless no placeholders were being
used at all. Another way to test (from the DBI side) is to set
$sth->{pg_server_prepare} = 0, which will send the SQL directly to the
backend, just as if you had typed it in at a command prompt. You can also
use the tracing mechanism of DBI to see what's going on behind the scenes.
For example:

$dbh->trace('SQL');

$dbh->do("SELECT 1234 FROM pg_class WHERE relname = 'bob'");
$dbh->do("SELECT 1234 FROM pg_class WHERE relname = ?", undef, 'mallory');

$sth = $dbh->prepare("SELECT 4567 FROM pg_class WHERE relname = ?");
$sth->execute('alice');
$sth->{pg_server_prepare} = 0;
$sth->execute('eve1');
$sth->{pg_server_prepare} = 1;
$sth->execute('eve2');

$dbh->commit;

Outputs:

===

begin;

SELECT 1234 FROM pg_class WHERE relname = 'bob';

EXECUTE SELECT 1234 FROM pg_class WHERE relname = $1 (
$1: mallory
);

PREPARE dbdpg_p22988_1 AS SELECT 4567 FROM pg_class WHERE relname = $1;

EXECUTE dbdpg_p22988_1 (
$1: alice
);

SELECT 4567 FROM pg_class WHERE relname = 'eve1';

EXECUTE dbdpg_p22988_1 (
$1: eve2
);

commit;

DEALLOCATE dbdpg_p22988_1;

===

You can even view exactly which libpq calls are being used at each point with:

$dbh->trace('SQL,libpq');

To get back to the original poster's complaint, you may want to figure out why
the difference is so great for a prepared plan. It may be that you need to
cast the placeholder(s) to a specific type, for example.
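
A sketch of such a cast from the DBI side (table, column, and type are
assumptions):

my $sth = $dbh->prepare("SELECT id FROM mytable WHERE created > ?::timestamptz");
$sth->execute('2008-07-01 00:00:00');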

--
Greg Sabino Mullane [EMAIL PROTECTED]
End Point Corporation
PGP Key: 0x14964AC8 200807211637
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [PERFORM] A guide/tutorial to performance monitoring and tuning

2008-07-21 Thread Francisco Reyes
On 2:59 pm 06/29/08 Greg Smith <[EMAIL PROTECTED]> wrote:
> Right now I'm working with a few other people to put together a more
> straightforward single intro guide that should address some of the
> vagueness you point out here,

Was that ever completed?




Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-21 Thread Emil Pedersen


[...]


Yes I'd definitely prefer to go 8.3 as well but there are a couple
reasons for now I have to suck it up:
- 8.2 is the one in the 7.10 repository.
- I need plr as well and 8.3-plr debian package does not exist yet.

(I know in both cases we could recompile and install it from there,
but ...)


At least on debian it was quite easy to "backport" 8.3.3 from sid
to etch using apt-get's source and build-dep functions.  That way
you get a normal installable package.

I'm not sure, but given the similarity I would guess it won't be
much harder on ubuntu.
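
Roughly, with deb-src entries for sid in sources.list (package name assumed):

# apt-get build-dep postgresql-8.3
# apt-get -b source postgresql-8.3
# dpkg -i postgresql-8.3_*.deb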

// Emil




Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-21 Thread Emil Pedersen



--On Tuesday, July 22, 2008 01.20.52 +0200 Emil Pedersen 
<[EMAIL PROTECTED]> wrote:




[...]


Yes I'd definitely prefer to go 8.3 as well but there are a couple
reasons for now I have to suck it up:
- 8.2 is the one in the 7.10 repository.
- I need plr as well and 8.3-plr debian package does not exist yet.

(I know in both cases we could recompile and install it from there,
but ...)


At least on debian it was quite easy to "backport" 8.3.3 from sid
to etch using apt-get's source and build-dep functions.  That way
you get a normal installable package.

I'm not sure, but given the similarity I would guess it won't be
much harder on ubuntu.


I should have said that I was talking about PostgreSQL; I
missed the plr part.  I apologize for the noise.

// Emil




Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-21 Thread Tom Lane
Emil Pedersen <[EMAIL PROTECTED]> writes:
>> At least on debian it was quite easy to "backport" 8.3.3 from sid
>> to etch using apt-get's source and build-dep functions.  That way
>> you get a normal installable package.

> I should have said that I was talking about PostgreSQL; I
> missed the plr part.  I apologize for the noise.

Still, there's not normally that much difference between the packaging
for one version and for the next.  I can't imagine that it would take
much time to throw together a package for 8.3 plr based on what you're
using for 8.2.  All modern package-based distros make this pretty easy.
The only reason not to do it would be if you're buying support from
a vendor who will only support specific package versions...

regards, tom lane



Re: [PERFORM] A guide/tutorial to performance monitoring and tuning

2008-07-21 Thread Greg Smith

On Mon, 21 Jul 2008, Francisco Reyes wrote:


On 2:59 pm 06/29/08 Greg Smith <[EMAIL PROTECTED]> wrote:

Right now I'm working with a few other people to put together a more
straightforward single intro guide that should address some of the
vagueness you point out here,


Was that ever completed?


Not done yet; we're planning to have a first rev done in another couple of 
weeks.  The work in progress is at 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server and I'm due 
to work out another set of improvements to that this week during OSCON.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-21 Thread Stephane Bailliez

Greg Smith wrote:
CFQ/Deadline/AS are I/O scheduler choices.  What changed completely in 
2.6.23 is the kernel process scheduler. 
http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt 
gives some info about the new one.


While the switch to CFS has shown great improvements in terms of 
desktop and many server workloads, what I discovered is that the 
pgbench test program itself is really incompatible with it.  There's a 
kernel patch that seems to fix the problem at 
http://lkml.org/lkml/2008/5/27/58 but I don't think it's made it into 
a release yet.


This is not to say the kernel itself is unsuitable for running 
PostgreSQL, but if you're using pgbench as the program to confirm 
that, I expect you'll be disappointed with the results under the 
Ubuntu 8.04 kernel. It tops out at around 10,000 TPS running the 
select-only test for me, while older kernels did 3X that much.


ok, thanks for all the details. good to know.

Stop and think about this for a minute.  You're going into production 
with an older version that has a set of known, impossible-to-work-around 
issues; if you hit them, the response will be "upgrade to 8.3 to fix 
that", which will require the major disruption to your application of a 
database dump and reload if that fix becomes critical.  And you can't 
just do that now because of some packaging issues?  I hope you can 
impress upon the other people involved how incredibly short-sighted 
that is.


I understand what you're saying. However, if I were to play devil's 
advocate: the existing one that I'm 'migrating' (read: entirely changing 
schemas, 'migrating' data) is coming out of an 8.1.11 install. It is 
not a critical system. The source data is always available from another 
system and the postgresql system would be a 'client'. So if 8.2.x is so 
abysmal that it should not even be considered for install compared to 
8.1.x, and only 8.3.x is viable, then ok, that makes sense and I have to 
go the extra mile.


But message received loud and clear. Conveniently, 8.3.3 is also 
available on backports, so it does not cost much, and it is pinned right 
now. (I don't think there will be any problem with plr, even though the 
original seems to be patched a bit, but that will be for later, once all 
the rest is ready.)


For the sake of completeness (even though irrelevant), here's the run 
with 32 clients on 8.3, same config as before (except max_fsm_pages at 
204800):


scale  size(MB)  avg tps
    1        19    36292
  100      1499    32127
  200      2994    30679
  300      4489    29673
  400      5985    18627
  500      7480    19714
  600      8975    19437
  700     10000    20271
  800     12000    18038
  900     13000     9842
 1000     15000     5996
 1200     18000     5404
 1400     21000     3701
 1600     23000     2877
 1800     26000     2657
 2000     29000     2612

cheers,

-- stephane
