Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-31 Thread Robert Haas
On Mon, Mar 31, 2014 at 12:35 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Mar 26, 2014 at 11:32 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 9, 2014 at 5:28 PM, Wang, Jing ji...@fast.au.fujitsu.com wrote:
 Enclosed is the patch to implement the requirement that issue log message to
 suggest VACUUM FULL if a table is nearly empty.

 The requirement comes from the Postgresql TODO list.

 If the relpage of the table  RELPAGES_VALUES_THRESHOLD(default 1000) then
 the table is considered to be large enough.

 If the free_space/total_space  FREESPACE_PERCENTAGE_THRESHOLD(default 0.5)
 then the table is considered to have large numbers of unused rows.

 I'm not sure that we want people to automatically VF a table just
 because it's 2x bloated.  Doesn't it depend on the table size?  And in
 sort of a funny way, too, like, if the tables is small, 2x bloat is
 not wasting much disk space, but getting rid of it is probably easy,
 so maybe you should - but if the table is a terabyte, even 50% bloat
 might be pretty intolerable, but whether it makes sense to try to get
 rid of it depends on your access pattern.  I'm not really too sure
 whether it makes sense to try to make an automated recommendation
 here, or maybe only in egregious cases.

 I think here main difficulty is to decide when it will be considered good
 to display such a message. As you said, that it depends on access pattern
 whether 50% bloat is tolerable or not, so one way could be to increase the
 bloat limit and table size threshold to higher value (bloat - 80%,
 table_size = 500M) where it would make sense to  recommend VF for all cases
 or another way could be to consider using some auto vacuum threshold parameter
 like autovacuum_vacuum_scale_factor to calculate threshold value for issuing
 this message. I think parameter like scale factor can make sense as to an 
 extent
 this parameter is an indicative of how much dead space percentage is tolerable
 for user.

I don't think there's a very direct relationship between those things.
 One of the problems we repeatedly encounter is that the scale factor
only governs when the table becomes eligible to be vacuumed; once that
happens, it takes some amount of time - ideally 1 minute but more if
all workers are busy or if autovacuum_naptime has unfortunately been
increased - for the vacuum to start, and then more time after that for
the vacuum to finish.  I think the latter is really the kicker.  Even
if your system is relatively well-tuned, a big table takes a long time
to vacuum, and you're going to continue accumulating bloat while the
vacuum is running.

Another aspect of my ambivalence about this is that VACUUM FULL tends
to get overused as it is.  If we start making automated
recommendations in that direction, it might cause people to lean that
way even further, which would not, on the whole, be a good thing.  On
the other hand, if the table is 80% dead space, it's a pretty good bet
that a VACUUM FULL is needed.  Even there, though, the VACUUM FULL may
be a pretty temporary fix unless the user also fixes the underlying
issue that caused the table bloat to accumulate in the first place.
Sometimes bloat is caused by a one-off issue, like one long-running
query.  But sometimes it's caused by something systematic, like
setting the cost limit too low or the nap time too high.  Just telling
the user to run VACUUM FULL is likely to make the user conclude that
PostgreSQL sucks, I have to keep running VACUUM FULL all the time,
taking a full-table lock.  Of course, really giving the user a useful
level of information here is probably impractical in a log message
anyway, but that doesn't mean giving them too little information to do
something useful is better.

Yet another thing that bothers me about this is that the table might
already be getting vacuumed very frequently.  If you start getting
this message from autovac once per minute, you're going to think
that's pretty stupid - especially after you try VACUUM FULL and the
problem comes right back because of constant update pressure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-31 Thread Amit Kapila
On Tue, Apr 1, 2014 at 12:24 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Mar 31, 2014 at 12:35 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Mar 26, 2014 at 11:32 AM, Robert Haas robertmh...@gmail.com wrote:
  I'm not really too sure
 whether it makes sense to try to make an automated recommendation
 here, or maybe only in egregious cases.

 I think here main difficulty is to decide when it will be considered good
 to display such a message. As you said, that it depends on access pattern
 whether 50% bloat is tolerable or not, so one way could be to increase the
 bloat limit and table size threshold to higher value (bloat - 80%,
 table_size = 500M) where it would make sense to  recommend VF for all cases
 or another way could be to consider using some auto vacuum threshold 
 parameter
 like autovacuum_vacuum_scale_factor to calculate threshold value for issuing
 this message. I think parameter like scale factor can make sense as to an 
 extent
 this parameter is an indicative of how much dead space percentage is 
 tolerable
 for user.



 Another aspect of my ambivalence about this is that VACUUM FULL tends
 to get overused as it is.  If we start making automated
 recommendations in that direction, it might cause people to lean that
 way even further, which would not, on the whole, be a good thing.  On
 the other hand, if the table is 80% dead space, it's a pretty good bet
 that a VACUUM FULL is needed.  Even there, though, the VACUUM FULL may
 be a pretty temporary fix unless the user also fixes the underlying
 issue that caused the table bloat to accumulate in the first place.
 Sometimes bloat is caused by a one-off issue, like one long-running
 query.  But sometimes it's caused by something systematic, like
 setting the cost limit too low or the nap time too high.

Right, but it can happen even if the settings for auto vacuum are done
considering the general usage but as a one of case there is sudden spike in
update in which case it might make sense to give such a message.
However if this message keep appearing in the log every now and then,
it will mean that autovacumm settings are not appropriate for the load.
I think it will be difficult to know the exact reason for dead space, do you
think it can make sense if the message indicates (as Hint) such that,
if user observes this message repeatedly the autovacuum settings are
not as per load.
Another way could be to update docs to indicate the same.

 Just telling
 the user to run VACUUM FULL is likely to make the user conclude that
 PostgreSQL sucks, I have to keep running VACUUM FULL all the time,
 taking a full-table lock.

Agreed user can conclude such things, but even if he figures that out himself
(which is quite possible), he will reach to same conclusion unless he is aware
that the reason could be the autovacuum settings.

Another thought that occurred to me is might be giving such an information for
Index can be more useful as there are always more chances for index bloat
especially in context of below information from docs.
B-tree index pages that have become completely empty are reclaimed for re-use.
However, there is still a possibility of inefficient use of space: if
all but a few index
keys on a page have been deleted, the page remains allocated. Therefore, a usage
pattern in which most, but not all, keys in each range are eventually
deleted will see
poor use of space. For such usage patterns, periodic reindexing is recommended.

There are certain usage pattern's like always inserting data in particular
(increasing/decreasing) order which can lead to bloat in above context.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-30 Thread Amit Kapila
On Wed, Mar 26, 2014 at 11:32 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 9, 2014 at 5:28 PM, Wang, Jing ji...@fast.au.fujitsu.com wrote:
 Enclosed is the patch to implement the requirement that issue log message to
 suggest VACUUM FULL if a table is nearly empty.

 The requirement comes from the Postgresql TODO list.

 If the relpage of the table  RELPAGES_VALUES_THRESHOLD(default 1000) then
 the table is considered to be large enough.

 If the free_space/total_space  FREESPACE_PERCENTAGE_THRESHOLD(default 0.5)
 then the table is considered to have large numbers of unused rows.

 I'm not sure that we want people to automatically VF a table just
 because it's 2x bloated.  Doesn't it depend on the table size?  And in
 sort of a funny way, too, like, if the tables is small, 2x bloat is
 not wasting much disk space, but getting rid of it is probably easy,
 so maybe you should - but if the table is a terabyte, even 50% bloat
 might be pretty intolerable, but whether it makes sense to try to get
 rid of it depends on your access pattern.  I'm not really too sure
 whether it makes sense to try to make an automated recommendation
 here, or maybe only in egregious cases.

I think here main difficulty is to decide when it will be considered good
to display such a message. As you said, that it depends on access pattern
whether 50% bloat is tolerable or not, so one way could be to increase the
bloat limit and table size threshold to higher value (bloat - 80%,
table_size = 500M) where it would make sense to  recommend VF for all cases
or another way could be to consider using some auto vacuum threshold parameter
like autovacuum_vacuum_scale_factor to calculate threshold value for issuing
this message. I think parameter like scale factor can make sense as to an extent
this parameter is an indicative of how much dead space percentage is tolerable
for user.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-26 Thread Robert Haas
On Sun, Mar 9, 2014 at 5:28 PM, Wang, Jing ji...@fast.au.fujitsu.com wrote:
 Enclosed is the patch to implement the requirement that issue log message to
 suggest VACUUM FULL if a table is nearly empty.

 The requirement comes from the Postgresql TODO list.

 If the relpage of the table  RELPAGES_VALUES_THRESHOLD(default 1000) then
 the table is considered to be large enough.

 If the free_space/total_space  FREESPACE_PERCENTAGE_THRESHOLD(default 0.5)
 then the table is considered to have large numbers of unused rows.

I'm not sure that we want people to automatically VF a table just
because it's 2x bloated.  Doesn't it depend on the table size?  And in
sort of a funny way, too, like, if the tables is small, 2x bloat is
not wasting much disk space, but getting rid of it is probably easy,
so maybe you should - but if the table is a terabyte, even 50% bloat
might be pretty intolerable, but whether it makes sense to try to get
rid of it depends on your access pattern.  I'm not really too sure
whether it makes sense to try to make an automated recommendation
here, or maybe only in egregious cases.

 The free_space is calculated by reading the details from the FSM pages. This
 may increase the IO, but expecting very less FSM pages thus it shouldn't
 cause

The free space map can show more or less than the real amount of free
space, can't it?  I worry about making a recommendation that might
turn out to be wildly inaccurate...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-23 Thread Wang, Jing
On Thursday, 20 March 2014 2:45 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Wed, Mar 19, 2014 at 6:25 AM, Wang, Jing ji...@fast.au.fujitsu.com wrote:
 On Friday, 14 March 2014 2:42 PM, Amit Kapila amit.kapil...@gmail.com 
 wrote:
 I think it might be okay to even change this API to return the 
 FreeSpace, as the other place it is used is for Index Vacuum, so even if we 
 don't have any intention to print such a message for index in this patch, 
 but similar information could be useful there as well to suggest a user 
 that index has lot of free space.

 Enclosed please find the new patch which get the FreeSpace for one relation 
 from the return of FreeSpaceMapVacuum() function. This function and the 
 fsm_vacuum_page() function have been slightly modified to get the FreeSpace 
 and no I/O burden increasing. The little side-effect is it will calculate 
 FreeSpace for every table even the table is very small.

I think that can also be avoided, because by the time you call 
FreeSpaceMapVacuum(), you already have the required information based on which 
you can decide not to ask for freespace if required.

That will make the function FreeSpaceMapVacuum() look strange and be difficult 
to understand, so I think keeping the existing patch is better.  Cause the 
number of pages of FSM file is small , calculating FreeSpace for small table 
will not bring the burden in performance.

Can't we avoid the new calculation you have added in fsm_vacuum_page(), as 
this function already updates the size, so might be we can get it from current 
calculation done in function.

Sorry, I can't find that information from the current calculation. Could you 
give me some more detail information?

Kind regards
Jing Wang
Fujitsu Australia



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-19 Thread Amit Kapila
On Wed, Mar 19, 2014 at 6:25 AM, Wang, Jing ji...@fast.au.fujitsu.com wrote:
 On Friday, 14 March 2014 2:42 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 I think it might be okay to even change this API to return the FreeSpace, as 
 the other place it is used is for Index Vacuum, so even if we don't have any 
 intention to print such a message for index in this patch,
 but similar information could be useful there as well to suggest a user that 
 index has lot of free space.

 Enclosed please find the new patch which get the FreeSpace for one relation 
 from the return of FreeSpaceMapVacuum() function. This function and the 
 fsm_vacuum_page() function have been slightly modified to get the FreeSpace 
 and no I/O burden increasing. The little side-effect is it will calculate 
 FreeSpace for every table even the table is very small.

I think that can also be avoided, because by the time you call
FreeSpaceMapVacuum(), you already have the required information
based on which you can decide not to ask for freespace if
required.

Can't we avoid the new calculation you have added in
fsm_vacuum_page(), as this function already updates the size,
so might be we can get it from current calculation done in
function.

+ #define RELPAGES_VALUES_THRESHOLD 1000
+ #define FREESPACE_PERCENTAGE_THRESHOLD 0.5
Is there any basis to define above hash defines, we already
have one number similar to above for deciding Truncate of relation.

In anycase, I think the patch's direction is better than previous and
can be further discussed/reviewed during next CF, as it's already
quite late for 9.4.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-18 Thread Wang, Jing
On Friday, 14 March 2014 2:42 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Wed, Mar 12, 2014 at 12:22 PM, Haribabu Kommi kommi.harib...@gmail.com 
wrote:
 On Tue, Mar 11, 2014 at 2:59 PM, Amit Kapila amit.kapil...@gmail.com wrote:

 By the way have you checked if FreeSpaceMapVacuum() can serve your 
 purpose, because this call already traverses FSM in depth-first 
 order to update the freespace. So may be by using this call or 
 wrapper on this such that it returns total freespace as well apart 
 from updating freespace can serve the need.

 Thanks for information. we can get the table free space by writing 
 some wrapper or modify a little bit of FreeSpaceMapVacuum() function.

 I think it might be okay to even change this API to return the FreeSpace, as 
 the other place it is used is for Index Vacuum, so even if we don't have any 
 intention to print such a message for index in this patch, 
 but similar information could be useful there as well to suggest a user that 
 index has lot of free space.

Enclosed please find the new patch which get the FreeSpace for one relation 
from the return of FreeSpaceMapVacuum() function. This function and the 
fsm_vacuum_page() function have been slightly modified to get the FreeSpace and 
no I/O burden increasing. The little side-effect is it will calculate FreeSpace 
for every table even the table is very small.


Kind regards
Jing Wang
Fujitsu Australia


vacuum_v2.patch
Description: vacuum_v2.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-13 Thread Amit Kapila
On Wed, Mar 12, 2014 at 12:22 PM, Haribabu Kommi
kommi.harib...@gmail.com wrote:
 On Tue, Mar 11, 2014 at 2:59 PM, Amit Kapila amit.kapil...@gmail.com wrote:

 By the way have you checked if FreeSpaceMapVacuum() can serve your
 purpose, because this call already traverses FSM in depth-first order to
 update the freespace. So may be by using this call or wrapper on this
 such that it returns total freespace as well apart from updating freespace
 can serve the need.

 Thanks for information. we can get the table free space by writing some 
 wrapper
 or modify a little bit of FreeSpaceMapVacuum() function.

I think it might be okay to even change this API to return the FreeSpace, as the
other place it is used is for Index Vacuum, so even if we don't have
any intention
to print such a message for index in this patch, but similar
information could be
useful there as well to suggest a user that index has lot of free space.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-12 Thread Haribabu Kommi
On Tue, Mar 11, 2014 at 2:59 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Mar 10, 2014 at 1:13 PM, Haribabu Kommi
 kommi.harib...@gmail.com wrote:
 On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing ji...@fast.au.fujitsu.com 
 wrote:
  Enclosed is the patch to implement the requirement that issue log message 
  to
  suggest VACUUM FULL if a table is nearly empty.
 
  The requirement comes from the Postgresql TODO list.
 
 I think it would be better if we can use some existing stats to issue 
 warning
 message rather than traversing the FSM for all pages. For example after
 vacuuming page in lazy_scan_heap(), we update the freespace for page.
 You can refer below line in lazy_scan_heap().
 freespace = PageGetHeapFreeSpace(page);

 Now it might be possible that we might not get freespace info easily as
 it is not accumulated for previous vacuum's. Incase there is no viable
 way to get it through vacuum stats, we are already updating fsm after
 vacuum by FreeSpaceMapVacuum(), where I think it should be possible
 to get freespace.

 yes this way it works without extra penalty. But the problem is how to 
 calculate
 the free space which is left in the skipped pages because of visibility bit.

 One way could be by extrapolating (vac_estimate_reltuples) like we do for
 some other stats, but not sure if we can get the correct estimates. The
 main reason is that if you observe that code path, all the decisions are
 mainly done on the basis of vacrelstats. I have not checked in detail if by
 using any other stats, this purpose can be achieved, may be once you can
 look into it.

I checked the vac_estimate_reltuples() function, but not able to find
a proper way to identify the free space.

 By the way have you checked if FreeSpaceMapVacuum() can serve your
 purpose, because this call already traverses FSM in depth-first order to
 update the freespace. So may be by using this call or wrapper on this
 such that it returns total freespace as well apart from updating freespace
 can serve the need.

Thanks for information. we can get the table free space by writing some wrapper
or modify a little bit of FreeSpaceMapVacuum() function. This way it
will not add
any extra overhead in identifying the table is almost empty or not.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-10 Thread Haribabu Kommi
On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila amit.kapil...@gmail.com wrote:

 On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing ji...@fast.au.fujitsu.com wrote:
  Enclosed is the patch to implement the requirement that issue log message to
  suggest VACUUM FULL if a table is nearly empty.
 
  The requirement comes from the Postgresql TODO list.
 
  [Solution details]
 
  A check function is added in the function 'lazy_vacuum_rel' to check if the
  table is large enough and contains large numbers of unused rows. If it is
  then issue a log message that suggesting using 'VACUUM FULL' on the table.
 
  The judgement policy is as following:
 
  If the relpage of the table  RELPAGES_VALUES_THRESHOLD(default 1000) then
  the table is considered to be large enough.
 
  If the free_space/total_space  FREESPACE_PERCENTAGE_THRESHOLD(default 0.5)
  then the table is considered to have large numbers of unused rows.
 
  The free_space is calculated by reading the details from the FSM pages. This
  may increase the IO, but expecting very less FSM pages thus it shouldn't
  cause

 I think it would be better if we can use some existing stats to issue warning
 message rather than traversing the FSM for all pages. For example after
 vacuuming page in lazy_scan_heap(), we update the freespace for page.
 You can refer below line in lazy_scan_heap().
 freespace = PageGetHeapFreeSpace(page);

 Now it might be possible that we might not get freespace info easily as
 it is not accumulated for previous vacuum's. Incase there is no viable
 way to get it through vacuum stats, we are already updating fsm after
 vacuum by FreeSpaceMapVacuum(), where I think it should be possible
 to get freespace.

yes this way it works without extra penalty. But the problem is how to calculate
the free space which is left in the skipped pages because of visibility bit.

In a normal scenario, the pages which are getting skipped during vacuum process
are less in number means then this approach is a good choice.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-10 Thread Amit Kapila
On Mon, Mar 10, 2014 at 1:13 PM, Haribabu Kommi
kommi.harib...@gmail.com wrote:
 On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing ji...@fast.au.fujitsu.com 
 wrote:
  Enclosed is the patch to implement the requirement that issue log message 
  to
  suggest VACUUM FULL if a table is nearly empty.
 
  The requirement comes from the Postgresql TODO list.
 
 I think it would be better if we can use some existing stats to issue warning
 message rather than traversing the FSM for all pages. For example after
 vacuuming page in lazy_scan_heap(), we update the freespace for page.
 You can refer below line in lazy_scan_heap().
 freespace = PageGetHeapFreeSpace(page);

 Now it might be possible that we might not get freespace info easily as
 it is not accumulated for previous vacuum's. Incase there is no viable
 way to get it through vacuum stats, we are already updating fsm after
 vacuum by FreeSpaceMapVacuum(), where I think it should be possible
 to get freespace.

 yes this way it works without extra penalty. But the problem is how to 
 calculate
 the free space which is left in the skipped pages because of visibility bit.

One way could be by extrapolating (vac_estimate_reltuples) like we do for
some other stats, but not sure if we can get the correct estimates. The
main reason is that if you observe that code path, all the decisions are
mainly done on the basis of vacrelstats. I have not checked in detail if by
using any other stats, this purpose can be achieved, may be once you can
look into it.

By the way have you checked if FreeSpaceMapVacuum() can serve your
purpose, because this call already traverses FSM in depth-first order to
update the freespace. So may be by using this call or wrapper on this
such that it returns total freespace as well apart from updating freespace
can serve the need.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-09 Thread Wang, Jing
Hi 

 

Enclosed is the patch to implement the requirement that issue log
message to suggest VACUUM FULL if a table is nearly empty.

The requirement comes from the Postgresql TODO list.

 

[Benefit]

To find which table is nearly empty and suggest using 'VACUUM FULL' to
release the unused disk space this table occupied. 

 

[Analysis]

A table is nearly empty include two scenario:

1. The table occupy small disk size and contains few unused rows. 

2. The table occupy large disk size and contains large numbers of unused
rows. 

 

Obviously the requirement is used to release the disk in the scenario2.

 

[Solution details]

A check function is added in the function 'lazy_vacuum_rel' to check if
the table is large enough and contains large numbers of unused rows. If
it is then issue a log message that suggesting using 'VACUUM FULL' on
the table.

 

The judgement policy is as following:

If the relpage of the table  RELPAGES_VALUES_THRESHOLD(default 1000)
then the table is considered to be large enough.

If the free_space/total_space  FREESPACE_PERCENTAGE_THRESHOLD(default
0.5) then the table is considered to have large numbers of unused rows.

The free_space is calculated by reading the details from the FSM pages.
This may increase the IO, but expecting very less FSM pages thus it
shouldn't cause 

Any problems. Please let me know your suggestions.

 

[When the log message prints]

When executing SQL command 'VACUUM' or 'VACUUM on a table', this
function will be invoked and may issue the log message if the table
reach the condition.

When auto vacuum work and execute 'VACUUM on a table', this function
will be invoked and may issue the log message if the table reach the
condition.

 

[Example]

SELECT count(*) from t5;

 

count 

---

  3000

(1 row)

 

DELETE FROM t5 where f12900;

DELETE 2899

 

SELECT count(*) from t5;

count 

---

   101

(1 row)

 

LOG:  automatic vacuum of table wjdb.public.t5: index scans: 0

pages: 0 removed, 20
remain

tuples: 2899 removed,
101 remain, 0 are dead but not yet removable

buffer usage: 64 hits, 1
misses, 25 dirtied

avg read rate: 0.130
MB/s, avg write rate: 3.261 MB/s

system usage: CPU
0.00s/0.00u sec elapsed 0.05 sec

LOG:  Table t5 contains large numbers of unused row, suggest using
VACUUM FULL on it!

 

VACUUM t5;

LOG:  Table t5 contains large numbers of unused row, suggest using
VACUUM FULL on it!

 

 

Kind regards

Jing Wang

Fujitsu Australia

 



vacuum_v1.patch
Description: vacuum_v1.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty

2014-03-09 Thread Amit Kapila
On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing ji...@fast.au.fujitsu.com wrote:
 Enclosed is the patch to implement the requirement that issue log message to
 suggest VACUUM FULL if a table is nearly empty.

 The requirement comes from the Postgresql TODO list.

 [Solution details]

 A check function is added in the function 'lazy_vacuum_rel' to check if the
 table is large enough and contains large numbers of unused rows. If it is
 then issue a log message that suggesting using 'VACUUM FULL' on the table.

 The judgement policy is as following:

 If the relpage of the table  RELPAGES_VALUES_THRESHOLD(default 1000) then
 the table is considered to be large enough.

 If the free_space/total_space  FREESPACE_PERCENTAGE_THRESHOLD(default 0.5)
 then the table is considered to have large numbers of unused rows.

 The free_space is calculated by reading the details from the FSM pages. This
 may increase the IO, but expecting very less FSM pages thus it shouldn't
 cause

I think it would be better if we can use some existing stats to issue warning
message rather than traversing the FSM for all pages. For example after
vacuuming page in lazy_scan_heap(), we update the freespace for page.
You can refer below line in lazy_scan_heap().
freespace = PageGetHeapFreeSpace(page);

Now it might be possible that we might not get freespace info easily as
it is not accumulated for previous vacuum's. Incase there is no viable
way to get it through vacuum stats, we are already updating fsm after
vacuum by FreeSpaceMapVacuum(), where I think it should be possible
to get freespace.

In general, I think idea to log a message for Vaccum Full is okay, but it would
be more viable if we can do that without any additional cost.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers