Re: [HACKERS] [GENERAL] Buglist

2003-09-04 Thread Bruce Momjian
Jan Wieck wrote:
 Tom Lane wrote:
 
  Jan Wieck [EMAIL PROTECTED] writes:
  Okay, my proposal would be to have a VACUUM mode where it tells the 
  buffer manager to only return a page if it is already in memory, and 
  some not cached if it would have to read it from disk, and simply skip 
  the page in that case.
  
  Since no such call is available at the OS level, this would only work
  well with very large shared_buffers settings (ie, you try to rely on
  PG shared buffers to the exclusion of kernel disk cache).  AFAIK the
  general consensus is that that's not a good way to run Postgres.
 
 Oh-kay ... so yes Manfred, your RSM is probably the better way.

Added to TODO.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Shridhar Daithankar
On Friday 22 August 2003 16:23, Manfred Koizar wrote:
 On Fri, 22 Aug 2003 12:15:33 +0530, Shridhar Daithankar

 [EMAIL PROTECTED] wrote:
  Which leads us to a zero gravity vacuum, that does the lazy vacuum for
  pages currently available in the buffer cache only. [...]
 
 Since autovacuum issues vacuum analyze only, is it acceptable to say that
  this is taken care of already?

 Even a plain VACUUM (without FULL) scans the whole relation to find
 the (possibly few) pages that need to be changed.  We are trying to
 find a way to avoid those needless reads of clean pages, because (a)
 they are IOs competing with other disk operations and (b) they push
 useful pages out of OS cache and (c) of PG shared buffers.  The latter
 might become a non-issue with LRU-k, 2Q or ARC.  But (a) and (b)
 remain.

Umm.. What does FSM does then? I was under impression that FSM stores page 
pointers and vacuum work on FSM information only. In that case, it wouldn't 
have to waste time to find out which pages to clean.

 Shridhar


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Manfred Koizar
On Fri, 22 Aug 2003 16:27:53 +0530, Shridhar Daithankar
[EMAIL PROTECTED] wrote:
What does FSM does then?

FSM = Free Space Map.  VACUUM writes information into the FSM, INSERTs
consult the FSM to find pages with free space for new tuples.

 I was under impression that FSM stores page 
pointers and vacuum work on FSM information only. In that case, it wouldn't 
have to waste time to find out which pages to clean.

This has been discussed yesterday here and on -hackers (Decent VACUUM
(was: Buglist)).  We were talking about inventing a second data
structure: RSM.

Servus
 Manfred

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Jan Wieck
Shridhar Daithankar wrote:

On Friday 22 August 2003 16:23, Manfred Koizar wrote:
On Fri, 22 Aug 2003 12:15:33 +0530, Shridhar Daithankar

[EMAIL PROTECTED] wrote:
 Which leads us to a zero gravity vacuum, that does the lazy vacuum for
 pages currently available in the buffer cache only. [...]

Since autovacuum issues vacuum analyze only, is it acceptable to say that
 this is taken care of already?
Even a plain VACUUM (without FULL) scans the whole relation to find
the (possibly few) pages that need to be changed.  We are trying to
find a way to avoid those needless reads of clean pages, because (a)
they are IOs competing with other disk operations and (b) they push
useful pages out of OS cache and (c) of PG shared buffers.  The latter
might become a non-issue with LRU-k, 2Q or ARC.  But (a) and (b)
remain.
Umm.. What does FSM does then? I was under impression that FSM stores page 
pointers and vacuum work on FSM information only. In that case, it wouldn't 
have to waste time to find out which pages to clean.
It's the other way around! VACUUM scan's the tables to find and reclaim 
free space and remembers that free space in the FSM. The regular 
backends that need storage space to insert tuples then use the free 
space in the pages that are recorded in the FSM instead of adding new 
pages at the end of the relations.

Jan
 Shridhar

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 Shridhar Daithankar wrote:
 Umm.. What does FSM does then? I was under impression that FSM stores page 
 pointers and vacuum work on FSM information only. In that case, it wouldn't 
 have to waste time to find out which pages to clean.

 It's the other way around! VACUUM scan's the tables to find and reclaim 
 free space and remembers that free space in the FSM.

Right.  One big question mark in my mind about these partial vacuum
proposals is whether they'd still allow adequate FSM information to be
maintained.  If VACUUM isn't looking at most of the pages, there's no
very good way to acquire info about where there's free space.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Shridhar Daithankar
On 22 Aug 2003 at 10:45, Tom Lane wrote:

 Jan Wieck [EMAIL PROTECTED] writes:
  Shridhar Daithankar wrote:
  Umm.. What does FSM does then? I was under impression that FSM stores page 
  pointers and vacuum work on FSM information only. In that case, it wouldn't 
  have to waste time to find out which pages to clean.
 
  It's the other way around! VACUUM scan's the tables to find and reclaim 
  free space and remembers that free space in the FSM.
 
 Right.  One big question mark in my mind about these partial vacuum
 proposals is whether they'd still allow adequate FSM information to be
 maintained.  If VACUUM isn't looking at most of the pages, there's no
 very good way to acquire info about where there's free space.

Somehow it needs to get two types of information.

A. If any transaction is accessing a page
B. If a page contains any free space.

Vacuum needs to look for pages not in A but in B. Can storage manager maintain 
two lists/hashes with minimal cost? In that case, all unlocked and not in 
transaction pages could be a much smaller subset.

Does it sound bizzare?


Bye
 Shridhar

--
Chemicals, n.:  Noxious substances from which modern foods are made.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Jan Wieck
Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:
Shridhar Daithankar wrote:
Umm.. What does FSM does then? I was under impression that FSM stores page 
pointers and vacuum work on FSM information only. In that case, it wouldn't 
have to waste time to find out which pages to clean.

It's the other way around! VACUUM scan's the tables to find and reclaim 
free space and remembers that free space in the FSM.
Right.  One big question mark in my mind about these partial vacuum
proposals is whether they'd still allow adequate FSM information to be
maintained.  If VACUUM isn't looking at most of the pages, there's no
very good way to acquire info about where there's free space.
That's why I think it needs one more pg_stat column to count the number 
of vacuumed tuples. If one does

tuples_updated + tuples_deleted - tuples_vacuumed

he'll get approximately the number of tuples a regular vacuum might be 
able to reclaim. If that number is really small, no need for autovacuum 
to cause any big trouble by scanning the relation.

Another way to give autovacuum some hints would be to return some number 
as commandtuples from vacuum. like the number of tuples actually 
vacuumed. That together with the new number of reltuples in pg_class 
will tell autovacuum how frequent a relation really needs scanning.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Jan Wieck
Jan Wieck wrote:

Another way to give autovacuum some hints would be to return some number 
as commandtuples from vacuum. like the number of tuples actually 
vacuumed. That together with the new number of reltuples in pg_class 
will tell autovacuum how frequent a relation really needs scanning.
Which actually would be much better because it'd work without the 
statistics collector configured for gathering IO stats.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Shridhar Daithankar
On 22 Aug 2003 at 11:03, Jan Wieck wrote:

 Tom Lane wrote:
 
  Jan Wieck [EMAIL PROTECTED] writes:
  Shridhar Daithankar wrote:
  Umm.. What does FSM does then? I was under impression that FSM stores page 
  pointers and vacuum work on FSM information only. In that case, it wouldn't 
  have to waste time to find out which pages to clean.
  
  It's the other way around! VACUUM scan's the tables to find and reclaim 
  free space and remembers that free space in the FSM.
  
  Right.  One big question mark in my mind about these partial vacuum
  proposals is whether they'd still allow adequate FSM information to be
  maintained.  If VACUUM isn't looking at most of the pages, there's no
  very good way to acquire info about where there's free space.
 
 That's why I think it needs one more pg_stat column to count the number 
 of vacuumed tuples. If one does
 
  tuples_updated + tuples_deleted - tuples_vacuumed
 
 he'll get approximately the number of tuples a regular vacuum might be 
 able to reclaim. If that number is really small, no need for autovacuum 
 to cause any big trouble by scanning the relation.
 
 Another way to give autovacuum some hints would be to return some number 
 as commandtuples from vacuum. like the number of tuples actually 
 vacuumed. That together with the new number of reltuples in pg_class 
 will tell autovacuum how frequent a relation really needs scanning.

This kind of information does not really help autovacuum. If we are talking 
about modifying backend stat collection algo., so that vacuum does minimum 
work, is has translate to cheaper vacuum analyze so that autovacuum can fire it 
at will any time. In the best case, another resident process like stat 
collector can keep cleaning the deads.

This information must be in terms of pages and actually be maintained as per 
page stat. Looking at number of tuples values does not give any idea to vacuum 
how it is going to flush cache lines, either in postgresql or on OS. I doubt it 
will help vacuum command in itself to be any lighter or more efficient.

If it is easy to do, I would favour maitaining two page maps as I mentioned in 
another mail. One for pages in cache but not locked by any transaction and 
another for pages which has some free space. If it is rare for a page to be 
full, we can skip the later one. I think that could be good enough.




Bye
 Shridhar

--
Office Automation:  The use of computers to improve efficiency in the office   
 by 
removing anyone you would want to talk with over coffee.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Matthew T. O'Connor
On Fri, 2003-08-22 at 11:08, Jan Wieck wrote:
  Another way to give autovacuum some hints would be to return some number 
  as commandtuples from vacuum. like the number of tuples actually 
  vacuumed. That together with the new number of reltuples in pg_class 
  will tell autovacuum how frequent a relation really needs scanning.
 
 Which actually would be much better because it'd work without the 
 statistics collector configured for gathering IO stats.

Which is certainly a good thing.  Using the stats system is a measurable
performance hit.

I still want to play with pg_autovacuum ignoring the stats system and
just looking at the FSM data.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Matthew T. O'Connor
On Fri, 2003-08-22 at 11:17, Shridhar Daithankar wrote:
 On 22 Aug 2003 at 11:03, Jan Wieck wrote:
  That's why I think it needs one more pg_stat column to count the number 
  of vacuumed tuples. If one does
  
   tuples_updated + tuples_deleted - tuples_vacuumed
  
  he'll get approximately the number of tuples a regular vacuum might be 
  able to reclaim. If that number is really small, no need for autovacuum 
  to cause any big trouble by scanning the relation.
  
  Another way to give autovacuum some hints would be to return some number 
  as commandtuples from vacuum. like the number of tuples actually 
  vacuumed. That together with the new number of reltuples in pg_class 
  will tell autovacuum how frequent a relation really needs scanning.
 
 This kind of information does not really help autovacuum. If we are talking 
 about modifying backend stat collection algo., so that vacuum does minimum 
 work, is has translate to cheaper vacuum analyze so that autovacuum can fire it 
 at will any time. In the best case, another resident process like stat 
 collector can keep cleaning the deads.

I believe what Jan is talking about is knowing when to use a normal
vacuum, and when to do a vacuum decent.  So his proposal is working
under the assumption that there would be a cheaper vacuum analyze that
can be run most of the time.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Matthew T. O'Connor
On Fri, 2003-08-22 at 10:45, Tom Lane wrote:
 Jan Wieck [EMAIL PROTECTED] writes:
 Right.  One big question mark in my mind about these partial vacuum
 proposals is whether they'd still allow adequate FSM information to be
 maintained.  If VACUUM isn't looking at most of the pages, there's no
 very good way to acquire info about where there's free space.

Well, pg_autovacuum really needs to be looking at the FSM anyway.  It
could look at the FSM, and choose to to do a vacuum normal when there
the amount of FSM data becomes inadequate.  Of course I'm not sure how
you would differentiate a busy table with inadequate FSM data and an
inactive table that doesn't even register in the FSM.  Perhaps you would
still need to consult the stats system.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Jan Wieck
Okay, my proposal would be to have a VACUUM mode where it tells the 
buffer manager to only return a page if it is already in memory, and 
some not cached if it would have to read it from disk, and simply skip 
the page in that case. Probably needs some modifications in vacuums FSM 
handling, but basically that's it. It'll still cause IO for the 
resulting index bulk cleaning, so I don't know how efficient it'll be 
after all.

The number of vacuumed tuples returned will tell the autovacuum how 
useful this vacuum scan was. The less useful it is, the less frequent 
it'll be scheduled. There is no point in vacuuming a 50M row table every 
hour when the average number of tuples reclaimed is in the hundreds. I 
don't intend to avoid a full table scan completely. I only intend to 
lower the frequency of them. It will require some fuzzy logic in 
autovacuum to figure out if a CACHEONLY vacuum for a table needs to be 
more or less frequent to find more tuples though.

So far for what I have in mind. Now what are you proposing down there? 
Where do you intend to hold that per page stat and what exactly is 
maintaining it? And please don't give us any vague some other resident 
process. This only indicates you don't really know what it requires for 
a process to be able to read or write data in PostgreSQL.

Jan

Shridhar Daithankar wrote:

On 22 Aug 2003 at 11:03, Jan Wieck wrote:

Tom Lane wrote:

 Jan Wieck [EMAIL PROTECTED] writes:
 Shridhar Daithankar wrote:
 Umm.. What does FSM does then? I was under impression that FSM stores page 
 pointers and vacuum work on FSM information only. In that case, it wouldn't 
 have to waste time to find out which pages to clean.
 
 It's the other way around! VACUUM scan's the tables to find and reclaim 
 free space and remembers that free space in the FSM.
 
 Right.  One big question mark in my mind about these partial vacuum
 proposals is whether they'd still allow adequate FSM information to be
 maintained.  If VACUUM isn't looking at most of the pages, there's no
 very good way to acquire info about where there's free space.

That's why I think it needs one more pg_stat column to count the number 
of vacuumed tuples. If one does

 tuples_updated + tuples_deleted - tuples_vacuumed

he'll get approximately the number of tuples a regular vacuum might be 
able to reclaim. If that number is really small, no need for autovacuum 
to cause any big trouble by scanning the relation.

Another way to give autovacuum some hints would be to return some number 
as commandtuples from vacuum. like the number of tuples actually 
vacuumed. That together with the new number of reltuples in pg_class 
will tell autovacuum how frequent a relation really needs scanning.
This kind of information does not really help autovacuum. If we are talking 
about modifying backend stat collection algo., so that vacuum does minimum 
work, is has translate to cheaper vacuum analyze so that autovacuum can fire it 
at will any time. In the best case, another resident process like stat 
collector can keep cleaning the deads.

This information must be in terms of pages and actually be maintained as per 
page stat. Looking at number of tuples values does not give any idea to vacuum 
how it is going to flush cache lines, either in postgresql or on OS. I doubt it 
will help vacuum command in itself to be any lighter or more efficient.

If it is easy to do, I would favour maitaining two page maps as I mentioned in 
another mail. One for pages in cache but not locked by any transaction and 
another for pages which has some free space. If it is rare for a page to be 
full, we can skip the later one. I think that could be good enough.



Bye
 Shridhar
--
Office Automation:	The use of computers to improve efficiency in the office	by 
removing anyone you would want to talk with over coffee.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Manfred Koizar
On Fri, 22 Aug 2003 10:45:50 -0400, Tom Lane [EMAIL PROTECTED]
wrote:
One big question mark in my mind about these partial vacuum
proposals is whether they'd still allow adequate FSM information to be
maintained.  If VACUUM isn't looking at most of the pages, there's no
very good way to acquire info about where there's free space.

VACUUM has accurate information about the pages it just visited.  Free
space information for pages not touched by VACUUM is still in the FSM,
unless free space on a page is too low to be interesting.  VACUUM has
to merge these two lists and throw away entries with little free space
if running out of room.

Thus we might end up with new almost full pages in the FSM while there
are pages with more free space lying around that a previous VACUUM
failed to register because there was more free space at that time.

Considering that
 .  FSM is lossy per definitionem
 .  we are targeting at relations with large passive areas
 .  decent VACUUM shall not replace lazy VACUUM
I see no problem here.

Future advice could be: VACCUM DECENT every hour, VACUUM daily,
VACUUM FULL once a year  where the first two could be scheduled by
autovacuum ...

Servus
 Manfred

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Manfred Koizar
On Fri, 22 Aug 2003 12:18:02 -0400, Jan Wieck [EMAIL PROTECTED]
wrote:
Okay, my proposal would be to have a VACUUM mode where it tells the 
buffer manager to only return a page if it is already in memory

But how can it know?  Yes, we know exactly what we have in PG shared
buffers.  OTOH we keep telling people that they should configure
moderate values for shared_buffers because the OS is better at
caching.  Your CACHEONLY VACUUM wouldn't catch those pages that are in
the OS cache but not in the shared buffers, although they are
retrievable at almost the same low cost.

We should not try to avoid _any_ physical disk access.  It's good
enough to avoid useless reads.  Hence my proposal for a reclaimable
space list ...

Servus
 Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 Okay, my proposal would be to have a VACUUM mode where it tells the 
 buffer manager to only return a page if it is already in memory, and 
 some not cached if it would have to read it from disk, and simply skip 
 the page in that case.

Since no such call is available at the OS level, this would only work
well with very large shared_buffers settings (ie, you try to rely on
PG shared buffers to the exclusion of kernel disk cache).  AFAIK the
general consensus is that that's not a good way to run Postgres.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Jan Wieck
Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:
Okay, my proposal would be to have a VACUUM mode where it tells the 
buffer manager to only return a page if it is already in memory, and 
some not cached if it would have to read it from disk, and simply skip 
the page in that case.
Since no such call is available at the OS level, this would only work
well with very large shared_buffers settings (ie, you try to rely on
PG shared buffers to the exclusion of kernel disk cache).  AFAIK the
general consensus is that that's not a good way to run Postgres.
Oh-kay ... so yes Manfred, your RSM is probably the better way.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly