Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-12 Thread Cédric Villemain
2010/11/11 Tom Lane t...@sss.pgh.pa.us:
 Robert Haas robertmh...@gmail.com writes:
 Yeah.  For Kevin's case, it seems like we want the caching percentage
 to vary not so much based on which table we're hitting at the moment
 but on how much of it we're actually reading.

 Well, we could certainly take the expected number of pages to read and
 compare that to effective_cache_size.  The thing that's missing in that
 equation is how much other stuff is competing for cache space.  I've
 tried to avoid having the planner need to know the total size of the
 database cluster, but it's kind of hard to avoid that if you want to
 model this honestly.

 Would it be at all workable to have an estimate that so many megs of a
 table are in cache

Yes, with Linux ... at least.

 (independently of any other table), and then we could
 scale the cost based on the expected number of pages to read versus that
 number?  The trick here is that DBAs really aren't going to want to set
 such a per-table number (at least, most of the time) so we need a
 formula to get to a default estimate for that number based on some simple
 system-wide parameters.  I'm not sure if that's any easier.

My current ideas for a future POC with pgfincore are along the lines of
what is being said in this thread.

I'd like to have some maintenance machinery, similar to auto-ANALYZE,
that reports table and index usage of the OS cache; it might be the % of
data in cache and the distribution of that data (perhaps only the last
15% of the table is in cache, or perhaps 15% of the blocks with a more
regular (random?) distribution).
My current stats on the OS cache show that the OS page cache remains
stable: the number of blocks in memory per object does not change much
once the application has run long enough.

Those are good stats for automatically adjusting random_page_cost and
seq_page_cost per table or index. The DBA provides random_page_cost and
seq_page_cost values that are accurate for the hardware; perhaps we may
also want a mem_page_cost (?). Or we just adjust random_page_cost and
seq_page_cost based on the average amount of data in cache.
Actually I think that updating *_page_cost while keeping the current
design of effective_cache_size (in costsize.c) may be good enough.
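For reference, a rough sketch of the knobs that exist today: there is no
per-table page cost, but if I recall correctly 9.0 adds per-tablespace
overrides, which is the closest thing ("fast_disks" below is a made-up
tablespace name):

  -- session-level (or postgresql.conf) defaults
  SET seq_page_cost = 1.0;           -- cost of a sequentially fetched page
  SET random_page_cost = 4.0;        -- cost of a randomly fetched page
  SET effective_cache_size = '3GB';  -- planner's guess at OS + shared cache

  -- 9.0+: override the page costs for everything in one tablespace
  ALTER TABLESPACE fast_disks
      SET (seq_page_cost = 0.5, random_page_cost = 1.5);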



 BTW, it seems that all these variants have an implicit assumption that
 if you're reading a small part of the table it's probably part of the
 working set; which is an assumption that could be 100% wrong.  I don't
 see a way around it without trying to characterize the data access at
 an unworkably fine level, though.

Exactly.


                        regards, tom lane





-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-12 Thread Cédric Villemain
2010/11/11 Robert Haas robertmh...@gmail.com:
 On Thu, Nov 11, 2010 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Yeah.  For Kevin's case, it seems like we want the caching percentage
 to vary not so much based on which table we're hitting at the moment
 but on how much of it we're actually reading.

 Well, we could certainly take the expected number of pages to read and
 compare that to effective_cache_size.  The thing that's missing in that
 equation is how much other stuff is competing for cache space.  I've
 tried to avoid having the planner need to know the total size of the
 database cluster, but it's kind of hard to avoid that if you want to
 model this honestly.

 I'm not sure I agree with that.  I mean, you could easily have a
 database that is much larger than effective_cache_size, but only that
 much of it is hot.  Or, the hot portion could move around over time.
 And for reasons of both technical complexity and plan stability, I
 don't think we want to try to model that.  It seems perfectly
 reasonable to say that reading 25% of effective_cache_size will be
 more expensive *per-page* than reading 5% of effective_cache_size,
 independently of what the total cluster size is.

 Would it be at all workable to have an estimate that so many megs of a
 table are in cache (independently of any other table), and then we could
 scale the cost based on the expected number of pages to read versus that
 number?  The trick here is that DBAs really aren't going to want to set
 such a per-table number (at least, most of the time) so we need a
 formula to get to a default estimate for that number based on some simple
 system-wide parameters.  I'm not sure if that's any easier.

 That's an interesting idea.  For the sake of argument, suppose we
 assume that a relation which is less than 5% of effective_cache_size
 will be fully cached; and anything larger we'll assume that much of it
 is cached.  Consider a 4GB machine with effective_cache_size set to
 3GB.  Then we'll assume that any relation less than 153MB table is
 100% cached, a 1 GB table is 15% cached, and a 3 GB table is 5%
 cached.  That doesn't seem quite right, though: the caching percentage
 drops off very quickly after you exceed the threshold.

 *thinks*

 I wondering if we could do something with a formula like 3 *
 amount_of_data_to_read / (3 * amount_of_data_to_read +
 effective_cache_size) = percentage NOT cached.  That is, if we're
 reading an amount of data equal to effective_cache_size, we assume 25%
 caching, and plot a smooth curve through that point.  In the examples
 above, we would assume that a 150MB read is 87% cached, a 1GB read is
 50% cached, and a 3GB read is 25% cached.


But isn't that already how effective_cache_size is used?

See index_pages_fetched() in costsize.c.
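For concreteness, here is the proposed curve checked against the numbers
above -- just a throwaway query, assuming effective_cache_size = 3GB
(3072MB) as in the example:

  SELECT read_mb,
         round(100 * 3 * read_mb / (3 * read_mb + 3072.0)) AS pct_not_cached
  FROM (VALUES (150), (1024), (3072)) AS t(read_mb);
  -- 150MB -> ~13% not cached (87% cached), 1GB -> 50%, 3GB -> 75% (25% cached)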



 BTW, it seems that all these variants have an implicit assumption that
 if you're reading a small part of the table it's probably part of the
 working set; which is an assumption that could be 100% wrong.  I don't
 see a way around it without trying to characterize the data access at
 an unworkably fine level, though.

 Me neither, but I think it will frequently be true, and I'm not sure
 it will hurt very much when it isn't.  I mean, if you execute the same
 query repeatedly, that data will become hot soon enough.  If you
 execute a lot of different queries that each touch a small portion of
 a big, cold table, we might underestimate the costs of the index
 probes, but so what?  There's probably no better strategy for
 accessing that table anyway.  Perhaps you can construct an example
 where this underestimate affects the join order in an undesirable
 fashion, but I'm having a hard time getting worked up about that as a
 potential problem case.  Our current system - where we essentially
 assume that the caching percentage is uniform across the board - can
 have the same problem in less artificial cases.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company





-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-12 Thread Vitalii Tymchyshyn

I'd say there are two questions here:

1) Modify costs based on information about how much of the table is in
cache. It would be great if this could be done, but I'd prefer to have it
as admin knobs (because of plan stability). Maybe both admin and
automatic ways can be followed, with some parallel (disableable) process
modifying the knobs on the admin's behalf. In that case different
strategies for automatically modifying the knobs could be applied.


2) Modify costs for retrieval of part of a table. Then you need to define
"part". The current ways are partitioning and partial indexes. Something
similar to a partial index could be created that has only a WHERE clause
and no data, but does have statistics and knobs (and maybe a personal
buffer space, if those are introduced). I don't like gathering data about
the "last X percent" or the like, because it only works for clustered
data, and it is hard for the optimizer to decide whether scanning only
that percentage will be enough for a given query. (A small partial-index
sketch follows below.)
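To illustrate the "current ways" mentioned in 2), a minimal partial-index
sketch (table and column names are made up):

  -- index only the hot tail of the table; an index predicate must be
  -- immutable (now() is not allowed), so the constant has to be rotated
  -- or the index recreated periodically:
  CREATE INDEX calls_recent_idx ON calls (call_time)
      WHERE call_time >= DATE '2010-11-12';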


Best regards, Vitalii Tymchyshyn



Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-12 Thread Cédric Villemain
I suppose this was an answer to my mail, but I'm not sure... please keep
people CC'ed; it makes threads easier to follow (at least for me).

2010/11/12 Vitalii Tymchyshyn tiv...@gmail.com:
 I'd say there are two Qs here:

 1) Modify costs based on information on how much of the table is in cache.
 It would be great  if this can be done, but I'd prefer to have it as admin
 knobs (because of plan stability). May be both admin and automatic ways can
 be followed with some parallel (disableable) process modify knobs on admin
 behalf. In this case different strategies to automatically modify knobs can
 be applied.

The OS cache is usually stable enough to keep your plans stable too, I think.


 2) Modify costs for part of table retrieval. Then you need to define part.
 Current ways are partitioning and partial indexes. Some similar to partial
 index thing may be created, that has only where clause and no data. But
 has statistics and knobs (and may be personal bufferspace if they are
 introduced). I don't like to gather data about last X percents or like,
 because it works only in clustering and it's hard for optimizer to decide if
 it will be enough to scan only this percents for given query.

Adjusting random_page_cost and seq_page_cost based on statistics about
cached blocks can be improved if we know the distribution.

It does not mean: "we know the last 15% is in cache, and we are going
to request exactly those 15%".


 Best regards, Vitalii Tymchyshyn





-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-12 Thread Vitalii Tymchyshyn

12.11.10 12:56, Cédric Villemain wrote:

I supposed it was an answer to my mail but not sure... please keep
CC'ed people, it is easier to follow threads (at least for me)
   

OK

2010/11/12 Vitalii Tymchyshyn tiv...@gmail.com:
   

I'd say there are two Qs here:

1) Modify costs based on information on how much of the table is in cache.
It would be great  if this can be done, but I'd prefer to have it as admin
knobs (because of plan stability). May be both admin and automatic ways can
be followed with some parallel (disableable) process modify knobs on admin
behalf. In this case different strategies to automatically modify knobs can
be applied.
 

OS cache is usualy stable enough to keep your plans stable too, I think.
   
Not if it is on the edge. There are always edge cases where the data
fluctuates near some threshold.
   

2) Modify costs for part of table retrieval. Then you need to define part.
Current ways are partitioning and partial indexes. Some similar to partial
index thing may be created, that has only where clause and no data. But
has statistics and knobs (and may be personal bufferspace if they are
introduced). I don't like to gather data about last X percents or like,
because it works only in clustering and it's hard for optimizer to decide if
it will be enough to scan only this percents for given query.
 

Modifying random_page_cost and sequential_page_cost thanks to
statistics about cached blocks can be improved if we know the
distribution.

It does not mean : we know we have last 15% in cache, and we are goign
to request those 15%.
   


You mean *_cost for the whole table, don't you? That is case (1) for me.
Case (2) is when different cost values are selected based on what
portion of the table is requested in the query. E.g. we have data for
the whole day in one table; the data for the last hour is cached and all
the other data is not. The optimizer may then use different *_cost
values for a query that requires all the data and for a query that
requires only the last hour's data. But, as I've said, that is a much
more complex task than (1).


Best regards, Vitalii Tymchyshyn




Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-12 Thread Cédric Villemain
2010/11/12 Vitalii Tymchyshyn tiv...@gmail.com:
 12.11.10 12:56, Cédric Villemain wrote:

 I supposed it was an answer to my mail but not sure... please keep
 CC'ed people, it is easier to follow threads (at least for me)


 OK

 2010/11/12 Vitalii Tymchyshyn tiv...@gmail.com:


 I'd say there are two Qs here:

 1) Modify costs based on information on how much of the table is in
 cache.
 It would be great  if this can be done, but I'd prefer to have it as
 admin
 knobs (because of plan stability). May be both admin and automatic ways
 can
 be followed with some parallel (disableable) process modify knobs on
 admin
 behalf. In this case different strategies to automatically modify knobs
 can
 be applied.


 OS cache is usualy stable enough to keep your plans stable too, I think.


 Not if it is on edge. There are always edge cases where data fluctuates near
 some threshold.

So far I have done some analysis on the topic with pgfincore. Tables and
indexes at first show peaks and holes if you graph the % of blocks in
cache starting from server start, but after a while it becomes more
stable.

Maybe there are applications where Linux fails to find a 'stable' page cache.

If people are able to graph the pgfincore results for all or part of the
objects in their database, it will give us more robust analysis,
especially when corner cases with the planner exist (like here).
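Something along these lines; the pgfincore() function name is real, but
its exact signature and output columns have varied between versions, so
treat the column names below as assumptions ('customer' is a made-up
table):

  -- fraction of a relation's OS pages currently resident in the page cache
  SELECT 100.0 * sum(pages_mem) / nullif(sum(rel_os_pages), 0)
             AS pct_in_os_cache
  FROM pgfincore('customer');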




 2) Modify costs for part of table retrieval. Then you need to define
 part.
 Current ways are partitioning and partial indexes. Some similar to
 partial
 index thing may be created, that has only where clause and no data. But
 has statistics and knobs (and may be personal bufferspace if they are
 introduced). I don't like to gather data about last X percents or like,
 because it works only in clustering and it's hard for optimizer to decide
 if
 it will be enough to scan only this percents for given query.


 Modifying random_page_cost and sequential_page_cost thanks to
 statistics about cached blocks can be improved if we know the
 distribution.

 It does not mean : we know we have last 15% in cache, and we are goign
 to request those 15%.


 You mean *_cost for the whole table, don't you? That is case (1) for me.

Yes.

 Case (2) is when different cost values are selected based on what portion of
 table is requested in the query. E.g. when we have data for the whole day in
 one table, data for the last hour is cached and all the other data is not.
 Optimizer then may use different *_cost for query that requires all the data
 and for query that requires only last hour data. But, as I've said, that is
 much more complex task then (1).

I need to think about that some more.

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



[PERFORM] MVCC performance issue

2010-11-12 Thread Kyriacos Kyriacou
This is my first post to this mailing list, and I would like to raise an
issue that in my opinion is causing performance problems for PostgreSQL,
especially in a transaction-processing environment. In my company we have
been using PostgreSQL for the last 8 years for our in-house developed
billing system (telecom). In the last few months we have started
considering moving to another RDBMS just because of this issue.

After all these years, I believe that the biggest improvement that could
be made, and that would boost overall performance especially for
enterprise applications, would be to improve the Multiversion Concurrency
Control (MVCC) mechanism. In theory MVCC improves performance for SELECT
queries, but on tables with very intensive and frequent updates even that
is not fully true, because of the fragmentation of data caused by MVCC. I
have seen cases where a SELECT COUNT(*) on an empty (!!!) table (used as
a buffer) took more than 40 minutes to return a result! VACUUM is not a
solution in my opinion, even though the situation got much better after
the introduction of the autovacuum daemon.

PROBLEM DESCRIPTION
--
By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a
new copy of the row in a new location. Any SELECT queries within the
same session access the new version of the row, while all queries from
other users still access the old version. When the transaction is
COMMITted, PostgreSQL marks the new version of the row as the active one
and expires the old row, which remains dead; it is then up to the VACUUM
procedure to recover the dead rows' space and make it available to the
database engine again. If the transaction is ROLLed BACK, the space
reserved for the new version of the row is released. The result is huge
fragmentation of the table space, unnecessary updates in all affected
indexes, unnecessary costly I/O operations, poor performance on SELECTs
that retrieve big record sets (i.e. reports etc.), and slower updates.
As an example, consider updating the live balance of a customer for each
phone call, where the entire customer record has to be duplicated again
and again upon each call just to modify a single numeric value!

SUGGESTION
--
1) When a row UPDATE is performed, store all new row versions either
in a separate temporary table space, or in a reserved space at the end
of each table (which can be allocated dynamically), etc.
2) Any SELECT queries within the same session will again access the new
version of the row.
3) Any SELECT queries from other users will still access the old
version.
4) When the UPDATE transaction is ROLLed BACK, just release the space
used in the new temporary location.
5) When the UPDATE transaction is COMMITted, try to LOCK the old version
and overwrite it at the same physical location (NO FRAGMENTATION).
6) A similar mechanism can be applied to INSERTs and DELETEs.
7) If the transaction was COMMITted, the temporary location can be
either released or archived/cleaned on a pre-scheduled basis. This would
possibly allow the introduction of a TRANSACTION LOG backup mechanism as
a next step.
8) After that, VACUUM would only have to deal with deletions!!!


I understand that my suggestion may seem oversimplified, and also that
there are many implementation details and difficulties that I am not
aware of.

I strongly believe that the outcome of a discussion of this issue will
be helpful.

Best Regards, 

Kyriacos Kyriacou
Senior Developer/DBA





Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Kenneth Marshall
On Fri, Nov 12, 2010 at 03:47:30PM +0200, Kyriacos Kyriacou wrote:
 This is my first post in this mailing list and I would like to raise an
 issue that in my opinion is causing performance issues of PostgreSQL
 especially in a transaction processing environment. In my company we are
 using PostgreSQL for the last 8 year for our in-house developed billing
 system (telecom). The last few months we started considering moving to
 another RDBMS just because of this issue. 
 
 After all these years, I believe that the biggest improvement that could
 be done and will boost overall performance especially for enterprise
 application will be to improve Multiversion Concurrency Control (MVCC)
 mechanism. In theory this seems to be improving performance for SELECT
 queries but on tables with very intensive and frequent updates, even
 that is not fully true because of the fragmentation of data caused by
 MVCC. I saw cases were a SELECT COUNT(*) on an empty (!!!) table (used
 as a buffer) took more than 40min to return a result! VACUUM is not a
 solution in my opinion even though after the introduction of autovacuum
 daemon situation got much better.
 
 PROBLEM DECRIPTION
 --
 By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a
 new copy of the row in a new location. Any SELECT queries within the
 same session are accessing the new version of the raw and all other
 queries from other users are still accessing the old version. When
 transaction is COMMIT PostgreSQL makes the a new version of the row as
 the active row and expires the old row that remains dead and then is
 up to VACUUM procedure to recover the dead rows space and make it
 available to the database engine. In case that transaction is ROLLBACK
 then space reserved for the new version of the row is released. The
 result is to have huge fragmentation on table space, unnecessary updates
 in all affected indexes, unnecessary costly I/O operations, poor
 performance on SELECT that retrieves big record sets (i.e. reports etc)
 and slower updates. As an example, consider updating the live balance
 of a customer for each phone call where the entire customer record has
 to be duplicated again and again upon each call just for modifying a
 numeric value! 
 
 SUGGESTION
 --
 1) When a raw UPDATE is performed, store all new raw versions either
 in separate temporary table space 
or in a reserved space at the end of each table (can be allocated
 dynamically) etc 
 2) Any SELECT queries within the same session will be again accessing
 the new version of the row
 3) Any SELECT queries from other users will still be accessing the old
 version
 4) When UPDATE transaction is ROLLBACK just release the space used in
 new temporary location  
 5) When UPDATE transaction is COMMIT then try to LOCK the old version
 and overwrite it at the same physical location (NO FRAGMENTATION).
 6) Similar mechanism can be applied on INSERTS and DELETES  
 7) In case that transaction was COMMIT, the temporary location can be
 either released or archived/cleaned on a pre-scheduled basis. This will
 possibly allow the introduction of a TRANSACTION LOG backup mechanism as
 a next step. 
 8) After that VACUUM will have to deal only with deletions!!! 
 
 
 I understand that my suggestion seems to be too simplified and also that
 there are many implementation details and difficulties that I am not
 aware. 
 
 I strongly believe that the outcome of the discussion regarding this
 issue will be helpful. 
 
 Best Regards, 
 
 Kyriacos Kyriacou
 Senior Developer/DBA
 

I cannot speak to your suggestion, but it sounds like you are not
vacuuming enough and a lot of the bloat/randomization would be helped
by making use of HOT updates in which the updates are all in the same
page and are reclaimed almost immediately.
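As a quick check, the statistics views in 8.3 and later show how many of
a table's updates were HOT; a rough query, with 'customer' standing in
for whichever table is bloating:

  SELECT relname, n_tup_upd, n_tup_hot_upd,
         round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct,
         n_dead_tup
  FROM pg_stat_user_tables
  WHERE relname = 'customer';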

Regards,
Ken



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Thom Brown
On 12 November 2010 13:47, Kyriacos Kyriacou kyriac...@prime-tel.com wrote:

 This is my first post in this mailing list and I would like to raise an
 issue that in my opinion is causing performance issues of PostgreSQL
 especially in a transaction processing environment. In my company we are
 using PostgreSQL for the last 8 year for our in-house developed billing
 system (telecom). The last few months we started considering moving to
 another RDBMS just because of this issue.

 After all these years, I believe that the biggest improvement that could
 be done and will boost overall performance especially for enterprise
 application will be to improve Multiversion Concurrency Control (MVCC)
 mechanism. In theory this seems to be improving performance for SELECT
 queries but on tables with very intensive and frequent updates, even
 that is not fully true because of the fragmentation of data caused by
 MVCC. I saw cases were a SELECT COUNT(*) on an empty (!!!) table (used
 as a buffer) took more than 40min to return a result! VACUUM is not a
 solution in my opinion even though after the introduction of autovacuum
 daemon situation got much better.

 PROBLEM DECRIPTION
 --
 By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a
 new copy of the row in a new location. Any SELECT queries within the
 same session are accessing the new version of the raw and all other
 queries from other users are still accessing the old version. When
 transaction is COMMIT PostgreSQL makes the a new version of the row as
 the active row and expires the old row that remains dead and then is
 up to VACUUM procedure to recover the dead rows space and make it
 available to the database engine. In case that transaction is ROLLBACK
 then space reserved for the new version of the row is released. The
 result is to have huge fragmentation on table space, unnecessary updates
 in all affected indexes, unnecessary costly I/O operations, poor
 performance on SELECT that retrieves big record sets (i.e. reports etc)
 and slower updates. As an example, consider updating the live balance
 of a customer for each phone call where the entire customer record has
 to be duplicated again and again upon each call just for modifying a
 numeric value!

 SUGGESTION
 --
 1) When a raw UPDATE is performed, store all new raw versions either
 in separate temporary table space
   or in a reserved space at the end of each table (can be allocated
 dynamically) etc
 2) Any SELECT queries within the same session will be again accessing
 the new version of the row
 3) Any SELECT queries from other users will still be accessing the old
 version
 4) When UPDATE transaction is ROLLBACK just release the space used in
 new temporary location
 5) When UPDATE transaction is COMMIT then try to LOCK the old version
 and overwrite it at the same physical location (NO FRAGMENTATION).
 6) Similar mechanism can be applied on INSERTS and DELETES
 7) In case that transaction was COMMIT, the temporary location can be
 either released or archived/cleaned on a pre-scheduled basis. This will
 possibly allow the introduction of a TRANSACTION LOG backup mechanism as
 a next step.
 8) After that VACUUM will have to deal only with deletions!!!


 I understand that my suggestion seems to be too simplified and also that
 there are many implementation details and difficulties that I am not
 aware.

 I strongly believe that the outcome of the discussion regarding this
 issue will be helpful.


Which version of PostgreSQL are you basing this on?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


Re: [PERFORM] postmaster consuming /lots/ of memory with hash aggregate. why?

2010-11-12 Thread Jon Nelson
On Thu, Nov 11, 2010 at 10:38 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2010/11/12 Jon Nelson jnelson+pg...@jamponi.net:
 On Thu, Nov 11, 2010 at 10:26 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 Hello

 look on EXPLAIN ANALYZE command. Probably your statistic are out, and
 then planner can be confused. EXPLAIN ANALYZE statement show it.

 As I noted earlier, I did set statistics to 1000 an re-ran vacuum
 analyze and the plan did not change.

 this change can do nothing. this is default in config. did you use
 ALTER TABLE ALTER COLUMN SET STATISTIC = ... ? and ANALYZE

No. To be clear: are you saying that changing the value of
default_statistics_target, restarting PostgreSQL, and re-running
VACUUM ANALYZE does *not* change the statistics for columns
created/populated *prior* to that sequence of operations, and that one
/must/ use ALTER TABLE ALTER COLUMN SET STATISTICS ... and re-ANALYZE?

That does not jibe with the documentation, which appears to suggest
that setting a new default_statistics_target, restarting PostgreSQL,
and then re-ANALYZE'ing a table should be sufficient (provided the
columns have not had a statistics target explicitly set).
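For what it's worth, the per-column form looks like this (table and
column names are placeholders; only a plain ANALYZE is needed
afterwards, not a VACUUM):

  ALTER TABLE mytable ALTER COLUMN mycol SET STATISTICS 1000;
  ANALYZE mytable;

  -- check what the planner now has for that column:
  SELECT attname, n_distinct, most_common_vals
  FROM pg_stats
  WHERE tablename = 'mytable' AND attname = 'mycol';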

 What other diagnostics can I provide? This still doesn't answer the
 4 row question, though. It seems absurd to me that the planner
 would give up and just use 4 rows (0.02 percent of the actual
 result).


 there can be some not well supported operation, then planner use a
 some % from rows without statistic based estimation

The strange thing is that the value 4 keeps popping up in totally
different contexts, with different tables, databases, etc. I tried
digging through the code, and the only thing I found was that numGroups
was being set to 4, but I couldn't see where.

-- 
Jon



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread bricklen
On Fri, Nov 12, 2010 at 5:52 AM, Kenneth Marshall k...@rice.edu wrote:

 I cannot speak to your suggestion, but it sounds like you are not
 vacuuming enough and a lot of the bloat/randomization would be helped
 by making use of HOT updates in which the updates are all in the same
 page and are reclaimed almost immediately.

 Regards,
 Ken

IIRC, HOT only operates on non-indexed columns, so if your tables
are heavily indexed you won't get the full benefit of HOT. I could be
wrong though.



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Kenneth Marshall
On Fri, Nov 12, 2010 at 07:34:36AM -0800, bricklen wrote:
 On Fri, Nov 12, 2010 at 5:52 AM, Kenneth Marshall k...@rice.edu wrote:
 
  I cannot speak to your suggestion, but it sounds like you are not
  vacuuming enough and a lot of the bloat/randomization would be helped
  by making use of HOT updates in which the updates are all in the same
  page and are reclaimed almost immediately.
 
  Regards,
  Ken
 
 IIRC, HOT only operates on non-indexed columns, so if you the tables
 are heavily indexed you won't get the full benefit of HOT. I could be
 wrong though.
 

That is true, but if they are truly having as big a bloat problem
as the message indicated, it would be worth designing the schema
to leverage HOT for the very frequent updates.

Cheers,
Ken



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Vitalii Tymchyshyn

12.11.10 15:47, Kyriacos Kyriacou wrote:

PROBLEM DECRIPTION
--
As an example, consider updating the live balance
of a customer for each phone call where the entire customer record has
to be duplicated again and again upon each call just for modifying a
numeric value!
   
Have you considered splitting the customer record into two tables, one
with mostly read-only data and one with the data that is updated often?
Such a 1-1 relationship can make a huge difference to performance in your
case. You can even try to simulate the old schema by using an updatable
view.
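A minimal sketch of that split (all names are made up; making the view
updatable would take rewrite rules on the versions discussed here, or
view triggers in later releases):

  -- static, rarely-updated attributes
  CREATE TABLE customer_static (
      customer_id integer PRIMARY KEY,
      name        text,
      address     text
  );

  -- the hot, frequently-updated part: one narrow row per customer
  CREATE TABLE customer_balance (
      customer_id integer PRIMARY KEY
                  REFERENCES customer_static (customer_id),
      balance     numeric NOT NULL DEFAULT 0
  );

  -- the old single-table schema can be simulated for readers
  CREATE VIEW customer AS
      SELECT s.customer_id, s.name, s.address, b.balance
      FROM customer_static s
      JOIN customer_balance b USING (customer_id);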


Best regards, Vitalii Tymchyshyn



Re: [PERFORM] postmaster consuming /lots/ of memory with hash aggregate. why?

2010-11-12 Thread Pavel Stehule
2010/11/12 Jon Nelson jnelson+pg...@jamponi.net:
 On Thu, Nov 11, 2010 at 10:38 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 2010/11/12 Jon Nelson jnelson+pg...@jamponi.net:
 On Thu, Nov 11, 2010 at 10:26 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 Hello

 look on EXPLAIN ANALYZE command. Probably your statistic are out, and
 then planner can be confused. EXPLAIN ANALYZE statement show it.

 As I noted earlier, I did set statistics to 1000 an re-ran vacuum
 analyze and the plan did not change.

 this change can do nothing. this is default in config. did you use
 ALTER TABLE ALTER COLUMN SET STATISTIC = ... ? and ANALYZE

 No. To be clear: are you saying that changing the value for
 default_statistics_target, restarting postgresql, and re-running
 VACUUM ANALYZE does *not* change the statistics for columns
 created/populated *prior* to the sequence of operations, and that one
 /must/ use ALTER TABLE ALTER COLUMN SET STATISTICS ... and re-ANALYZE?


Yes.

But I was wrong. The documentation is correct. The problem is elsewhere.

 That does not jive with the documentation, which appears to suggest
 that setting a new default_statistics_target, restarting postgresql,
 and then re-ANALYZE'ing a table should be sufficient (provided the
 columns have not had a statistics target explicitly set).


 What other diagnostics can I provide? This still doesn't answer the
 4 row question, though. It seems absurd to me that the planner
 would give up and just use 4 rows (0.02 percent of the actual
 result).


 there can be some not well supported operation, then planner use a
 some % from rows without statistic based estimation

 The strange thing is that the value 4 keeps popping up in totally
 diffferent contexts, with different tables, databases, etc... I tried
 digging through the code and the only thing I found was that numGroups
 was being set to 4 but I couldn't see where.



If I remember correctly, you can set the number of groups with ALTER
TABLE ... ALTER COLUMN ... SET (n_distinct = ...).

Maybe you are using it (a sketch follows below).
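A sketch of that command (I believe it is new in 9.0; table and column
names are placeholders):

  -- force the planner's n_distinct estimate for a column;
  -- it takes effect at the next ANALYZE
  ALTER TABLE mytable ALTER COLUMN mycol SET (n_distinct = 500);
  ANALYZE mytable;

  -- a negative value means a fraction of the row count,
  -- e.g. -0.05 = 5% of the rows are distinct
  ALTER TABLE mytable ALTER COLUMN mycol SET (n_distinct = -0.05);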

Regards

Pavel Stehule

http://www.postgresql.org/docs/9.0/interactive/sql-altertable.html




 --
 Jon



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Kyriacos Kyriacou
 

We are still using PostgreSQL 8.2.4. We are running a 24x7 system and
the database size is over 200GB, so an upgrade is not an easy decision!

I have it in my plans, so in the next few months I will set up new
servers and upgrade to version 9.


 Which version of PostgreSQL are you basing this on?


-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Kyriacos Kyriacou
This was done already as a workaround after identifying this problem. 
I just gave it as an example.

-Original Message-
From: Vitalii Tymchyshyn [mailto:tiv...@gmail.com] 
Sent: Friday, November 12, 2010 5:54 PM
To: Kyriacos Kyriacou
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] MVCC performance issue

12.11.10 15:47, Kyriacos Kyriacou wrote:
 PROBLEM DECRIPTION
 --
 As an example, consider updating the live balance
 of a customer for each phone call where the entire customer record has
 to be duplicated again and again upon each call just for modifying a
 numeric value!

Have you considered splitting customer record into two tables with 
mostly read-only data and with data that is updated often? Such 1-1 
relationship can make a huge difference to performance in your case. You 
can even try to simulate old schema by using an updateable view.

Best regards, Vitalii Tymchyshyn





Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Andy Colson

On 11/12/2010 7:47 AM, Kyriacos Kyriacou wrote:


SUGGESTION
--
1) When a raw UPDATE is performed, store all new raw versions either
in separate temporary table space
or in a reserved space at the end of each table (can be allocated
dynamically) etc


Your use of raw is confusing.  I'll just ignore the word.  New row 
versions are already stored in a dynamically allocated spot, right along 
with the other versions of the table.  You are assuming that getting to 
the correct version of the row is very slow?  That's only going to be 
the case if you have lots and lots of versions.  And your solution will 
not actually help if there are lots of versions.  While one person who 
is hitting the most recent version might be ok, everyone else will still 
have to search for theirs.  Just as they do now.



2) Any SELECT queries within the same session will be again accessing
the new version of the row


I don't see how this is different from what we currently have.  same 
session could have been dropped from your separate table space, and 
then you'd have to go search through previous versions of the row... 
exactly like you do now.


And worse, if you don't want to drop your version of the row from the 
separate table space until you commit/rollback, then no other user can 
start a transaction on that table until you're done!  Oh no!  You have 
reads and writes blocking each other.



3) Any SELECT queries from other users will still be accessing the old
version


Again.. the same.


4) When UPDATE transaction is ROLLBACK just release the space used in
new temporary location


current layout makes rollback very very fast.


5) When UPDATE transaction is COMMIT then try to LOCK the old version
and overwrite it at the same physical location (NO FRAGMENTATION).


Not sure what you mean by lock, but a lock requires single-user access and 
slows things down.  Right now we just bump the most active transaction 
number, which is very efficient and requires no locks.  As soon as you 
lock anything, somebody, by definition, has to wait.




6) Similar mechanism can be applied on INSERTS and DELETES
7) In case that transaction was COMMIT, the temporary location can be
either released or archived/cleaned on a pre-scheduled basis. This will
possibly allow the introduction of a TRANSACTION LOG backup mechanism as
a next step.


You are kind of assuming there will only ever be one new transaction, 
and one old transaction.  What about a case where 10 people start a 
transaction, and there are 10 versions of the row?



It seems to me like you are using very long transactions, which is 
causing lots of row versions to show up.  Have you run EXPLAIN ANALYZE 
on your slow queries to find out the problems?


Have you checked to see whether you are CPU bound or I/O bound?  If you 
are dealing with lots of row versions, I'd assume you are CPU bound.  If 
you check your system, though, and see you are I/O bound, I think that 
might invalidate your assumptions above.


MVCC makes multi-user access very nice because readers and writers don't 
block each other, and there are very few locks.  It does come with some 
kinks (gotta vacuum, keep transactions short, you must commit, etc).


SELECT COUNT(*), for example, is always going to be slow... just expect 
it; let's not destroy what works well about the database just to make it 
fast.  Instead, find a better alternative so you don't have to run it.


Just like any database, you have to work within MVCC's good points and 
try to avoid the bad spots.


-Andy



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Kenneth Marshall
Ah, this is a very old version. If you can take advantage of
a version with HOT support, you should be much, much happier.

Cheers,
Ken

On Fri, Nov 12, 2010 at 06:14:00PM +0200, Kyriacos Kyriacou wrote:
  
 
 We are still using PostgreSQL 8.2.4. We are running a 24x7 system and
 database size is over 200Gb so upgrade is not an easy decision! 
 
 I have it in my plans so in next few months I will setup new servers and
 upgrade to version 9. 
 
 
  Which version of PostgreSQL are you basing this on?
 
 
 -- 
 Thom Brown
 Twitter: @darkixion
 IRC (freenode): dark_ixion
 Registered Linux user: #516935
 



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Thom Brown
On 12 November 2010 16:14, Kyriacos Kyriacou kyriac...@prime-tel.com wrote:



 We are still using PostgreSQL 8.2.4. We are running a 24x7 system and
 database size is over 200Gb so upgrade is not an easy decision!

 I have it in my plans so in next few months I will setup new servers and
 upgrade to version 9.


Everything changed, performance-wise, in 8.3, and there have also been
improvements since then too.  So rather than completely changing your
database platform, at least take a look at what work has gone into Postgres
since the version you're using.
http://www.postgresql.org/docs/8.3/static/release-8-3.html#AEN87319

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Ben Chobot
On Nov 12, 2010, at 8:14 AM, Kyriacos Kyriacou wrote:

 We are still using PostgreSQL 8.2.4. We are running a 24x7 system and 
 database size is over 200Gb so upgrade is not an easy decision!

This is why we have Slony: so you can slowly upgrade your 200GB while you're 
live and then only suffer a minute or so of downtime while you switch over. 
Even if you only install Slony for the purpose of the upgrade and then 
uninstall it after you're done, that seems well worth it to me rather than 
running on 8.2.4 for a while.

Note there were some changes between 8.2 and 8.3 in regard to casting that 
might make you revisit your application.

Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Tom Lane
Kyriacos Kyriacou kyriac...@prime-tel.com writes:
 We are still using PostgreSQL 8.2.4.

In that case you don't have HOT updates, so it seems to me to be a
little premature to be proposing a 100% rewrite of the system to fix
your problems.

regards, tom lane



Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-12 Thread Robert Haas
On Fri, Nov 12, 2010 at 4:15 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 I wondering if we could do something with a formula like 3 *
 amount_of_data_to_read / (3 * amount_of_data_to_read +
 effective_cache_size) = percentage NOT cached.  That is, if we're
 reading an amount of data equal to effective_cache_size, we assume 25%
 caching, and plot a smooth curve through that point.  In the examples
 above, we would assume that a 150MB read is 87% cached, a 1GB read is
 50% cached, and a 3GB read is 25% cached.

 But isn't it already the behavior of effective_cache_size usage ?

No.

The idea of trying to know what is actually in cache strikes me as an
almost certain non-starter.  It can change very quickly, even as a
result of the query you're actually running.  And getting the
information we'd need in order to do it that way would be very
expensive, when it can be done at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Kyriacos Kyriacou
To be honest, I have just now read about HOT (Heap-Only Tuples) and it
seems that it will help a lot. Thanks for the pointer.

Kyriacos

-Original Message-
From: Kenneth Marshall [mailto:k...@rice.edu] 
Sent: Friday, November 12, 2010 6:22 PM
To: Kyriacos Kyriacou
Cc: Thom Brown; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] MVCC performance issue

Ah, this is a very old version. If you can take advantage of
a version with HOT support, you should be much, much happier.

Cheers,
Ken

On Fri, Nov 12, 2010 at 06:14:00PM +0200, Kyriacos Kyriacou wrote:
  
 
 We are still using PostgreSQL 8.2.4. We are running a 24x7 system and
 database size is over 200Gb so upgrade is not an easy decision! 
 
 I have it in my plans so in next few months I will setup new servers
and
 upgrade to version 9. 
 
 
  Which version of PostgreSQL are you basing this on?
 
 
 -- 
 Thom Brown
 Twitter: @darkixion
 IRC (freenode): dark_ixion
 Registered Linux user: #516935
 





Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Nov 12, 2010 at 4:15 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 I wondering if we could do something with a formula like 3 *
 amount_of_data_to_read / (3 * amount_of_data_to_read +
 effective_cache_size) = percentage NOT cached.  That is, if we're
 reading an amount of data equal to effective_cache_size, we assume 25%
 caching, and plot a smooth curve through that point.  In the examples
 above, we would assume that a 150MB read is 87% cached, a 1GB read is
 50% cached, and a 3GB read is 25% cached.

 But isn't it already the behavior of effective_cache_size usage ?

 No.

I think his point is that we already have a proven formula
(Mackert-Lohmann) and shouldn't be inventing a new one out of thin air.
The problem is to figure out what numbers to apply the M-L formula to.

I've been thinking that we ought to try to use it in the context of the
query as a whole rather than for individual table scans; the current
usage already has some of that flavor but we haven't taken it to the
logical conclusion.
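For reference, the Mackert-Lohmann approximation as index_pages_fetched()
applies it, where T is the number of pages in the table, Ns the number of
tuples fetched, and b the cache size in pages (effective_cache_size); this
is only a sketch of the formula given in the costsize.c comments, written
as a throwaway SQL function:

  CREATE FUNCTION ml_pages_fetched(ns numeric, t numeric, b numeric)
  RETURNS numeric LANGUAGE sql IMMUTABLE AS $$
      SELECT CASE
          WHEN $2 <= $3 THEN least(2 * $2 * $1 / (2 * $2 + $1), $2)
          WHEN $1 <= 2 * $2 * $3 / (2 * $2 - $3)
              THEN 2 * $2 * $1 / (2 * $2 + $1)
          ELSE $3 + ($1 - 2 * $2 * $3 / (2 * $2 - $3)) * ($2 - $3) / $2
      END
  $$;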

 The ideal of trying to know what is actually in cache strikes me as an
 almost certain non-starter.

Agreed on that point.  Plan stability would go out the window.

regards, tom lane



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Kyriacos Kyriacou
My suggestion differs in just one point from what MVCC currently does
(btw, I never said that MVCC is bad).

NOW === on COMMIT, the previous version of the record is expired and the
        new version of the record (created in a new dynamically allocated
        spot, as you said) is set as active.

MY  === on COMMIT, write the new version's data over the same physical
        location that the initial version occupied, and release the space
        used to keep the new version (which was dynamically allocated).

The rest is all the same! I do not think that this breaks anything, and I
still believe that it might help.

I will try to plan an upgrade to the newest version as soon as possible.
Having read a few words about HOT updates, it seems they are more or less
similar to what I have described and will be very helpful.

Kyriacos

 -Original Message-
 From: Andy Colson [mailto:a...@squeakycode.net]
 Sent: Friday, November 12, 2010 6:22 PM
 To: Kyriacos Kyriacou
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] MVCC performance issue
 
 On 11/12/2010 7:47 AM, Kyriacos Kyriacou wrote:
 
  SUGGESTION
  --
  1) When a raw UPDATE is performed, store all new raw versions
either
  in separate temporary table space
  or in a reserved space at the end of each table (can be
allocated
  dynamically) etc
 
 Your use of raw is confusing.  I'll just ignore the word.  New row
 versions are already stored in a dynamically allocated spot, right
along
 with the other versions of the table.  You are assuming that getting
to
 the correct version of the row is very slow?  That's only going to
be
 the case if you have lots and lots of versions.  And your solution
will
 not actually help if there are lots of versions.  While one person who
 is hitting the most recent version might be ok, everyone else will
still
 have to search for theirs.  Just as they do now.
 
  2) Any SELECT queries within the same session will be again
accessing
  the new version of the row
 
 I don't see how this is different from what we currently have.  same
 session could have been dropped from your separate table space, and
 then you'd have to go search through previous versions of the row...
 exactly like you do now.
 
 And worse, if you dont want to drop your version of the row from the
 separate table space until you commit/rollback, then no other user can
 start a transaction on that table until your done!  oh no!  You have
 reads and writes blocking each other.
 
  3) Any SELECT queries from other users will still be accessing the
old
  version
 
 Again.. the same.
 
  4) When UPDATE transaction is ROLLBACK just release the space used
in
  new temporary location
 
 current layout makes rollback very very fast.
 
  5) When UPDATE transaction is COMMIT then try to LOCK the old
version
  and overwrite it at the same physical location (NO FRAGMENTATION).
 
 Not sure what you mean by lock, but lock requires single user access
and
 slow's things down.  Right now we just bump the most active
transaction
 number, which is very efficient, and requires no locks.  As soon as
you
 lock anything, somebody, by definition, has to wait.
 
 
  6) Similar mechanism can be applied on INSERTS and DELETES
  7) In case that transaction was COMMIT, the temporary location can
be
  either released or archived/cleaned on a pre-scheduled basis. This
will
  possibly allow the introduction of a TRANSACTION LOG backup
mechanism as
  a next step.
 
 You are kind of assuming there will only ever be one new transaction,
 and one old transaction.  What about a case where 10 people start a
 transaction, and there are 10 versions of the row?
 
 
 It seems to me like you are using very long transactions, which is
 causing lots of row versions to show up.  Have you run explain analyze
 on your slow querys to find out the problems?
 
 Have you checked to see if you are cpu bound or io bound?  If you are
 dealing with lots of row versions, I'd assume you are cpu bound.  If
you
 check your system though, and see you are io bound, I think that might
 invalidate your assumptions above.
 
 MVCC makes multi user access very nice because readers and writers
dont
 block each other, and there are very few locks.  It does come with
some
 kinks (gotta vacuum, keep transactions short, you must commit, etc).
 
 select count(*) for example is always going to be slow... just expect
 it, lets not destroy what works well about the database just to make
it
 fast.  Instead, find a better alternative so you dont have to run it.
 
 Just like any database, you have to work within MVCC's good points and
 try to avoid the bad spots.
 
 -Andy





Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Scott Marlowe
On Fri, Nov 12, 2010 at 9:22 AM, Thom Brown t...@linux.com wrote:
 On 12 November 2010 16:14, Kyriacos Kyriacou kyriac...@prime-tel.com
 wrote:



 We are still using PostgreSQL 8.2.4. We are running a 24x7 system and
 database size is over 200Gb so upgrade is not an easy decision!

 I have it in my plans so in next few months I will setup new servers and
 upgrade to version 9.

 Everything changed, performance-wise, in 8.3, and there have also been
 improvements since then too.  So rather than completely changing your
 database platform, at least take a look at what work has gone into Postgres
 since the version you're using.

Agreed.  8.3 was a colossal step forward for pg performance.  8.4 was
a huge step ahead in maintenance with the on-disk free space map.  If I
were upgrading from 8.2 today I would go straight to 8.4 and skip 8.3,
since 8.3 is a much bigger pain in the butt to configure for FSM stuff.



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Scott Marlowe
OK, in general you have to pay for MVCC one way or another.  Many
databases make you pay as you go, so to speak, by storing all the MVCC
info in a log to be applied at some future date.  With other databases
you pay later, by storing all the MVCC data in the table itself.  Both
have similar costs, but the latter can punish you harshly if you let the
MVCC data stored in the database get out of hand.

8.3 and above are much more aggressive about autovacuuming, and on
bigger hardware you can make it VERY aggressive and keep the bloat out
while keeping up good throughput.  On some servers I set up 4 or 6 or
8 autovacuum threads to keep up.  If you were on another db you
might be adding more drives to make some other part faster.
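For the archives, the sort of settings that make that possible (the
values are only illustrative, and "customer_balance" is a made-up table
name):

  -- postgresql.conf (8.3+): more workers, cheaper vacuum throttling
  --   autovacuum_max_workers = 6
  --   autovacuum_vacuum_cost_delay = 10ms
  --   autovacuum_vacuum_cost_limit = 2000

  -- 8.4+ per-table storage parameters: visit a hot table much sooner
  -- than the default 20% of the table
  ALTER TABLE customer_balance
      SET (autovacuum_vacuum_scale_factor = 0.01,
           autovacuum_vacuum_threshold = 1000);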

For batch processing storing all MVCC data in the data store can be
problematic, but for more normal work where you're changing 1% of a
table all the time it can be very fast.

Some other databases will just run out of space to store transactions
and roll back everything you've done.  PostgreSQL will gladly let you
shoot yourself in the foot with bloating the data store by running
successive whole table updates without vacuuming in between.

Bottom line, if your hardware can't keep up, it can't keep up.  If
vacuum capsizes your IO and still can't keep up then you need more
disks and / or better storage subsystems.  A 32-disk array with a single
controller goes for ~$7k to $10k, and you can sustain some pretty
amazing throughput on that kind of IO subsystem.

If you're doing batch processing you can get a lot of return just by
making sure you vacuum after each mass update.  Especially if you are
on a single-use machine with no cost delays for vacuum, running a
vacuum on a freshly worked table should be pretty fast.
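I.e. something as simple as the following after each batch (table and
column names made up):

  UPDATE big_table SET amount = amount * 1.1;   -- the mass update
  VACUUM ANALYZE big_table;                     -- reclaim dead tuples right away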



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Scott Marlowe
On Fri, Nov 12, 2010 at 9:19 AM, Ben Chobot be...@silentmedia.com wrote:
 On Nov 12, 2010, at 8:14 AM, Kyriacos Kyriacou wrote:

 We are still using PostgreSQL 8.2.4. We are running a 24x7 system and
 database size is over 200Gb so upgrade is not an easy decision!

 This is why we have slony, so you can slowly upgrade your 200Gb while you're
 live and then only suffer a minute or so of downtime while you switchover.
 Even if you only install slony for the point of the upgrade and then
 uninstall it after you're done, that seems well worth it to me rather than
 running on 8.2.4 for a while.
 Note there were some changes between 8.2 and 8.3 in regards to casting that
 might make you revisit your application.

I work in a Slony shop and we used Slony to upgrade from 8.2 to 8.3
and it was a breeze.  Of course we practiced on some test machines first,
but it went really smoothly.  Our total downtime, due to necessary
testing before going live again, was less than 20 minutes.



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Scott Carey
HOT also usually requires setting FILLFACTOR to something other than the 
default for your table, so that there is guaranteed room in the page to modify 
data without allocating a new page.

If you have fillfactor=75, then basically this proposal is already done -- each 
page has 25% temp space for updates in it, with the caveat that that is only 
true if the updates are to columns without indexes.
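
For example (a sketch only; the table name is hypothetical):

-- keep ~25% of each heap page free so updated rows can stay on the same page
ALTER TABLE customer_balance SET (fillfactor = 75);
-- only newly written pages honor the new fillfactor; existing rows need a
-- table rewrite (e.g. CLUSTER) before they benefit
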
On Nov 12, 2010, at 7:37 AM, Kenneth Marshall wrote:

 On Fri, Nov 12, 2010 at 07:34:36AM -0800, bricklen wrote:
 On Fri, Nov 12, 2010 at 5:52 AM, Kenneth Marshall k...@rice.edu wrote:
 
 I cannot speak to your suggestion, but it sounds like you are not
 vacuuming enough and a lot of the bloat/randomization would be helped
 by making use of HOT updates in which the updates are all in the same
 page and are reclaimed almost immediately.
 
 Regards,
 Ken
 
 IIRC, HOT only operates on non-indexed columns, so if the tables
 are heavily indexed you won't get the full benefit of HOT. I could be
 wrong though.
 
 
 That is true, but if they are truly having as big a bloat problem
 as the message indicated, it would be worth designing the schema
 to leverage HOT for the very frequent updates.
 
 Cheers,
 Ken
 




Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Scott Carey

On Nov 12, 2010, at 9:13 AM, Kyriacos Kyriacou wrote:

 My suggestion had just a single difference from what MVCC is currently
 doing (btw I never said that MVCC is bad).
 
 NOW === on COMMIT the previous version record is expired and the
 new version record (created in a new dynamically allocated
 spot, as you said) is set as active
 
 MY  === on COMMIT, write the new version data over the same physical
 location that the initial version occupied,
 and release the space used to keep the new version (that was
 dynamically allocated).

But what about other transactions that can still see the old version?

You can't overwrite the old data if there are any other transactions open in 
the system at all.  You have to have a mechanism to keep the old copy around 
for a while.
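
A two-session sketch of why (object names invented; on 8.x, SERIALIZABLE
gives snapshot isolation):

-- session 1
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;   -- takes its snapshot, sees the old version

-- session 2
BEGIN;
UPDATE accounts SET balance = 0 WHERE id = 1;
COMMIT;                                      -- a new version is now current...

-- session 1, still open
SELECT balance FROM accounts WHERE id = 1;   -- ...but this must still see the old value,
COMMIT;                                      -- so the old copy cannot simply be overwritten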

 
 The rest are all the same! I do not think that this is breaking anything
 and I still believe that this might help.
 
 I will try to plan an upgrade to the newest version as soon as possible.
 Reading a few words about HOT updates,
 it seems that it is more or less similar to what I have described and will
 be very helpful.
 
 Kyriacos




Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-12 Thread Robert Haas
On Fri, Nov 12, 2010 at 11:43 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think his point is that we already have a proven formula
 (Mackert-Lohmann) and shouldn't be inventing a new one out of thin air.
 The problem is to figure out what numbers to apply the M-L formula to.

I'm not sure that's really measuring the same thing, although I'm not
opposed to using it if it produces reasonable answers.

 I've been thinking that we ought to try to use it in the context of the
 query as a whole rather than for individual table scans; the current
 usage already has some of that flavor but we haven't taken it to the
 logical conclusion.

That's got a pretty severe chicken-and-egg problem though, doesn't it?
 You're going to need to know how much data you're touching to
estimate the costs so you can pick the best plan, but you can't know
how much data will ultimately be touched until you've got the whole
plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] questions regarding shared_buffers behavior

2010-11-12 Thread Robert Haas
On Sun, Nov 7, 2010 at 10:03 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 2010/11/8 Mark Rostron mrost...@ql2.com:
 
  What is the procedure that postgres uses to decide whether or not a
  table/index block will be left in the shared_buffers cache at the end
  of the operation?
 

 The only special cases are for sequential scans and VACUUM, which
 continuously re-use a small section of the buffer cache in some cases 
 instead.

 Thanks - the part about sequential scans and the re-use of a small section 
 of shared_buffers is the bit I was interested in.
 I don't suppose you would be able to tell me how large that re-useable area 
 might be?

 There are 256KB per seqscan and 256KB per vacuum.

 I suggest you to go reading src/backend/storage/buffer/README

Note that there is a different, higher limit for the bulk write
strategy when using COPY IN or CTAS.
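
(For reference, that means bulk loads of roughly this shape; the names
and path are invented:)

COPY measurements FROM '/tmp/measurements.csv' WITH CSV;        -- COPY IN
CREATE TABLE measurements_2010 AS SELECT * FROM measurements;   -- CTAS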

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Why dose the planner select one bad scan plan.

2010-11-12 Thread 静安寺
Thanks for your answer! And I am sorry for treating the question as a bug and 
sending it to the 'bugs' mailing list. 
 
But I doubt your answer. I think the essence of the problem is when the planner 
selects 'Bitmap Index Scan' and how the planner computes the cost of 'Bitmap 
Index Scan'. 
 
Tom Lane said “In principle a bitmap index scan should be significantly faster 
if the index can return the bitmap more or less natively rather than having 
to construct it. My recollection though is that a significant amount of work is 
needed to make that happen, and that there is no existing patch that tackled 
the problem. So I'm not sure that this report should be taken as indicating 
that there's no chance of a SELECT performance improvement. What it does say is 
that we have to do that work if we want to make bitmap indexes useful.”
 
Okay, I want to know how the planner computes the cost of constructing the bitmap. 
And when the planner computes the cost of 'Bitmap Index Scan', does it consider 
the influence of the memory cache? When I do not clear the memory cache, I find 
that 'Bitmap Index Scan' is really faster than 'Seq Scan'.
 
 Best Regards!
 
Asen

[PERFORM] MVCC performance issue

2010-11-12 Thread Kyriacos Kyriacou
This is my first post in this mailing list and I would like to raise an
issue that in my opinion is causing performance problems in PostgreSQL,
especially in a transaction processing environment. In my company we have
been using PostgreSQL for the last 8 years for our in-house developed
billing system (telecom). The last few months we started considering
moving to another RDBMS just because of this issue. 

After all these years, I believe that the biggest improvement that could
be done and will boost overall performance especially for enterprise
application will be to improve Multiversion Concurrency Control (MVCC)
mechanism. In theory this seems to be improving performance for SELECT
queries but on tables with very intensive and frequent updates, even
that is not fully true because of the fragmentation of data caused by
MVCC. I saw cases where a SELECT COUNT(*) on an empty (!!!) table (used
as a buffer) took more than 40min to return a result! VACUUM is not a
solution in my opinion even though after the introduction of autovacuum
daemon situation got much better.

PROBLEM DESCRIPTION
--
By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a
new copy of the row in a new location. Any SELECT queries within the
same session are accessing the new version of the row and all other
queries from other users are still accessing the old version. When the
transaction is COMMITted, PostgreSQL marks the new version of the row as
the active row and expires the old row, which remains dead; it is then up
to the VACUUM procedure to recover the dead rows' space and make it
available to the database engine. In case the transaction is ROLLBACKed,
the space reserved for the new version of the row is released. The
result is huge fragmentation of table space, unnecessary updates
in all affected indexes, unnecessary costly I/O operations, poor
performance on SELECTs that retrieve big record sets (i.e. reports etc)
and slower updates. As an example, consider updating the live balance
of a customer for each phone call, where the entire customer record has
to be duplicated again and again upon each call just for modifying a
numeric value! 
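
The effect is easy to reproduce; a small sketch (object names invented;
the dead-tuple counters shown need 8.3+):

CREATE TABLE balance_buf (id int PRIMARY KEY, balance numeric);
INSERT INTO balance_buf SELECT g, 0 FROM generate_series(1, 100000) g;
UPDATE balance_buf SET balance = balance + 1;   -- leaves 100,000 dead row versions
UPDATE balance_buf SET balance = balance + 1;   -- and 100,000 more
SELECT n_live_tup, n_dead_tup
  FROM pg_stat_user_tables WHERE relname = 'balance_buf';
SELECT pg_size_pretty(pg_relation_size('balance_buf'));  -- keeps growing until VACUUM runs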

SUGGESTION
--
1) When a row UPDATE is performed, store all new row versions either
in a separate temporary table space 
   or in a reserved space at the end of each table (can be allocated
dynamically) etc 
2) Any SELECT queries within the same session will be again accessing
the new version of the row
3) Any SELECT queries from other users will still be accessing the old
version
4) When the UPDATE transaction is ROLLBACKed, just release the space
used in the new temporary location  
5) When the UPDATE transaction is COMMITted, try to LOCK the old version
and overwrite it at the same physical location (NO FRAGMENTATION).
6) A similar mechanism can be applied to INSERTs and DELETEs  
7) In case the transaction was COMMITted, the temporary location can be
either released or archived/cleaned on a pre-scheduled basis. This will
possibly allow the introduction of a TRANSACTION LOG backup mechanism as
a next step. 
8) After that, VACUUM will have to deal only with deletions!!! 


I understand that my suggestion seems to be too simplified and also that
there are many implementation details and difficulties that I am not
aware of. 

I strongly believe that the outcome of the discussion regarding this
issue will be helpful. 

Best Regards, 

Kyriacos Kyriacou
Senior Developer/DBA





[PERFORM] autovacuum blocks the operations of other manual vacuum

2010-11-12 Thread kuopo
Hi,

I have a question about the behavior of autovacuum. When I have a big
table A which is being processed by autovacuum, I also manually use
(full) vacuum to clean another table B. Then I found that I always got
something like “found 0 removable, 14283 nonremovable row”. However,
if I stop the autovacuum functionality and use vacuum on that big
table A manually, I can clean table B (ex. found 22615 removable, 2049
nonremovable row).

Is this correct? Why do vacuum and autovacuum have different actions?

PS: My PostgreSQL version is 8.4.



Re: [PERFORM] temporary tables, indexes, and query plans

2010-11-12 Thread Jon Nelson
On Thu, Oct 28, 2010 at 9:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jon Nelson jnelson+pg...@jamponi.net writes:
 On Wed, Oct 27, 2010 at 5:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It thinks it's faster, or there is some reason why it *can't* use the
 index, like a datatype mismatch.  You could tell which by trying set
 enable_seqscan = off to see if that will make it change to another
 plan; if so, the estimated costs of that plan versus the original
 seqscan would be valuable information.

 When I place the index creation and ANALYZE right after the bulk
 update, follow it with 'set enable_seqscan = false', the next query
 (also an UPDATE - should be about 7 rows) results in this plan:

 Seq Scan on foo_table  (cost=100.00..1004998.00 rows=24 
 width=236)

 OK, so it thinks it can't use the index.  (The cost=100 bit is
 the effect of enable_seqscan = off: it's not possible to just never use
 seqscans, but we assign an artificially high cost to discourage the
 planner from selecting them if there's any other alternative.)

 So we're back to wondering why it can't use the index.  I will say
 once more that we could probably figure this out quickly if you'd
 post an exact example instead of handwaving.

OK. This is a highly distilled example that shows the behavior.
The ANALYZE doesn't appear to change anything, nor the SET STATISTICS
(followed by ANALYZE), nor disabling sequential scans. Re-writing the
table with ALTER TABLE does, though.
If the initial UPDATE (the one before the index creation) is commented
out, then the subsequent updates don't use sequential scans.

\timing off
BEGIN;
CREATE TEMPORARY TABLE foo AS SELECT x AS A, chr(x % 75 + 32) AS b,
''::text AS c from generate_series(1,500) AS x;
UPDATE foo SET c = 'foo' WHERE b = 'A' ;
CREATE INDEX foo_b_idx on foo (b);

-- let's see what it looks like
EXPLAIN UPDATE foo SET c='bar' WHERE b = 'C';

-- does forcing a seqscan off help?
set enable_seqscan = false;
EXPLAIN UPDATE foo SET c='bar' WHERE b = 'C';

-- what about analyze?
ANALYZE VERBOSE foo;
EXPLAIN UPDATE foo SET c='bar' WHERE b = 'C';

-- what about statistics?
ALTER TABLE foo ALTER COLUMN b SET STATISTICS 1;
ANALYZE VERBOSE foo;
EXPLAIN UPDATE foo SET c='bar' WHERE b = 'C';

-- let's re-write the table
ALTER TABLE foo ALTER COLUMN a TYPE int;
EXPLAIN UPDATE foo SET c='bar' WHERE b = 'C';

ROLLBACK;

-- 
Jon



Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-12 Thread Cédric Villemain
2010/11/12 Tom Lane t...@sss.pgh.pa.us:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Nov 12, 2010 at 4:15 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 I wondering if we could do something with a formula like 3 *
 amount_of_data_to_read / (3 * amount_of_data_to_read +
 effective_cache_size) = percentage NOT cached.  That is, if we're
 reading an amount of data equal to effective_cache_size, we assume 25%
 caching, and plot a smooth curve through that point.  In the examples
 above, we would assume that a 150MB read is 87% cached, a 1GB read is
 50% cached, and a 3GB read is 25% cached.

 But isn't it already the behavior of effective_cache_size usage ?

 No.

 I think his point is that we already have a proven formula
 (Mackert-Lohmann) and shouldn't be inventing a new one out of thin air.
 The problem is to figure out what numbers to apply the M-L formula to.

 I've been thinking that we ought to try to use it in the context of the
 query as a whole rather than for individual table scans; the current
 usage already has some of that flavor but we haven't taken it to the
 logical conclusion.

 The ideal of trying to know what is actually in cache strikes me as an
 almost certain non-starter.

 Agreed on that point.  Plan stability would go out the window.

The point is not to know the current cache contents, but, as with
ANALYZE, to run on a regular basis (probably triggered by something
around the number of pages read/hit) a cache_analyze which reports
stats like ANALYZE does, and which may be adjusted per table like
auto-analyze is.

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Craig Ringer

On 11/12/2010 02:25 AM, Kyriacos Kyriacou wrote:


 The
result is to have huge fragmentation on table space, unnecessary updates
in all affected indexes, unnecessary costly I/O operations, poor
performance on SELECT that retrieves big record sets (i.e. reports etc)
and slower updates.


Yep. It's all about trade-offs. For some workloads the in-table MVCC 
storage setup works pretty darn poorly, but for most it seems to work 
quite well.


There are various other methods of implementing relational storage with 
ACID properties. You can exclude all other transactions while making a 
change to a table, ensuring that nobody else can see old or new rows 
so there's no need to keep them around. You can use an out-of-line redo 
log (a-la Oracle). Many other methods exist, too.


They all have advantages and disadvantages for different workloads. It's 
far from trivial to mix multiple schemes within a single database, so 
mixing and matching schemes for different parts of your DB isn't 
generally practical.



1) When a raw UPDATE is performed, store all new raw versions either
in separate temporary table space
or in a reserved space at the end of each table (can be allocated
dynamically) etc


OK, so you want a redo log a-la Oracle?


2) Any SELECT queries within the same session will be again accessing
the new version of the row
3) Any SELECT queries from other users will still be accessing the old
version


... and incurring horrible random I/O penalties if the redo log doesn't 
fit in RAM. Again, a-la Oracle.


Even read-only transactions have to hit the undo log if there's an 
update in progress, because rows they need may have been moved out to 
the undo log as they're updated in the main table storage.


[snip description]


I understand that my suggestion seems to be too simplified and also that
there are many implementation details and difficulties that I am not
aware.


It sounds like you're describing Oracle-style MVCC, using redo logs.

http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/

http://en.wikipedia.org/wiki/Multiversion_concurrency_control

Oracle's MVCC approach has its own costs. Like Pg's, those costs 
increase with update/delete frequency. Instead of table bloat, Oracle 
suffers from redo log growth (or redo log size management issues). 
Instead of increased table scan costs from dead rows, Oracle suffers 
from random I/O costs as it looks up the out-of-line redo log for old 
rows. Instead of long-running writer transactions causing table bloat, 
Oracle can have problems with long-running reader transactions aborting 
when the redo log runs out of space.


Personally, I don't know enough to know which is better. I suspect 
they're just different, with different trade-offs. If redo logs allow 
you  to do without write-ahead logging, that'd be interesting - but 
then, the WAL is useful for all sorts of replication options, and the 
use of linear WALs means that write ordering in the tables doesn't need 
to be as strict, which has performance advantages.


--
Craig Ringer



Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Rich
In reading what you are describing, don't you think PG 9 goes a long way to
helping you out?

On Sat, Nov 13, 2010 at 12:53 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:

 [Craig Ringer's message quoted in full; quote snipped -- see his message above]