Re: [HACKERS] VACUUM always makes all pages dirty

2007-10-24 Thread Tom Lane
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> The attached patch should fix this. We mark the buffer dirty only if there
> is any state change in the page header.

Applied, with minor additional tweak to avoid duplicate calls to
SetBufferCommitInfoNeedsSave --- that seems (just) expensive enough
to be worth the trouble.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] VACUUM always makes all pages dirty

2007-10-24 Thread Pavan Deolasee
On 10/24/07, Pavan Deolasee <[EMAIL PROTECTED]> wrote:
>
>
>
> I am looking at it. We must not call SetBufferCommitInfoNeedsSave unless
> we make any state changes to the page.
>
>
>

The attached patch should fix this. We mark the buffer dirty only if there
is any state change in the page header.


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Index: src/backend/access/heap/pruneheap.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/heap/pruneheap.c,v
retrieving revision 1.2
diff -c -r1.2 pruneheap.c
*** src/backend/access/heap/pruneheap.c	21 Sep 2007 21:25:42 -	1.2
--- src/backend/access/heap/pruneheap.c	24 Oct 2007 09:32:21 -
***
*** 146,162 
  	intnredirected = 0;
  	intndead = 0;
  	intnunused = 0;
  
  	START_CRIT_SECTION();
  
  	/*
! 	 * Mark the page as clear of prunable tuples.  If we find a tuple which
! 	 * may soon become prunable, we shall set the hint again.  Also clear
! 	 * the "page is full" flag, since there's no point in repeating the
! 	 * prune/defrag process until something else happens to the page.
  	 */
  	PageClearPrunable(page);
! 	PageClearFull(page);
  
  	/* Scan the page */
  	maxoff = PageGetMaxOffsetNumber(page);
--- 146,173 
  	intnredirected = 0;
  	intndead = 0;
  	intnunused = 0;
+ 	TransactionId	save_prune_xid;
  
  	START_CRIT_SECTION();
  
  	/*
! 	 * Save the current pd_prune_xid and mark the page as clear of prunable
! 	 * tuples. If we find a tuple which may soon become prunable, we shall set
! 	 * the hint again.
  	 */
+ 	save_prune_xid = ((PageHeader) page)->pd_prune_xid;
  	PageClearPrunable(page);
! 
! 	/* 
! 	 * Also clear the "page is full" flag if it is set, since there's no point
! 	 * in repeating the prune/defrag process until something else happens to the
! 	 * page.
! 	 */
! 	if (PageIsFull(page))
! 	{
! 		PageClearFull(page);
! 		SetBufferCommitInfoNeedsSave(buffer);
! 	}
  
  	/* Scan the page */
  	maxoff = PageGetMaxOffsetNumber(page);
***
*** 209,218 
  	else
  	{
  		/*
! 		 * If we didn't prune anything, we have nonetheless updated the
  		 * pd_prune_xid field; treat this as a non-WAL-logged hint.
  		 */
! 		SetBufferCommitInfoNeedsSave(buffer);
  	}
  
  	END_CRIT_SECTION();
--- 220,230 
  	else
  	{
  		/*
! 		 * If we didn't prune anything, but have updated the
  		 * pd_prune_xid field; treat this as a non-WAL-logged hint.
  		 */
! 		if (((PageHeader) page)->pd_prune_xid != save_prune_xid)
! 			SetBufferCommitInfoNeedsSave(buffer);
  	}
  
  	END_CRIT_SECTION();

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] VACUUM always makes all pages dirty

2007-10-24 Thread Pavan Deolasee
On 10/24/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>
>
> Yeah, it's definitely a HOT-introdued thing. Vacuum calls
> heap_page_prune on every page, and this in heap_page_prune is dirtying
> the buffer:
>
> >   else
> >   {
> >   /*
> >* If we didn't prune anything, we have nonetheless
> updated the
> >* pd_prune_xid field; treat this as a non-WAL-logged
> hint.
> >*/
> >   SetBufferCommitInfoNeedsSave(buffer);
> >   }
>
> I don't have time to dig deeper at this moment. I'll take a look later
> today, unless someone beats me to it. We obviously don't want to call
> SetBufferCommitInfoNeedsSave if we didn't really change the pd_prune_xid
> field.
>
>
I am looking at it. We must not call SetBufferCommitInfoNeedsSave unless
we make any state changes to the page.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] VACUUM always makes all pages dirty

2007-10-24 Thread Heikki Linnakangas
ITAGAKI Takahiro wrote:
> VACUUM in 8.3dev always makes all pages dirty even if there are no jobs.
> In 8.2.5, VACUUM produces no dirty pages in the same workload. Therefore,
> VACUUM on 8.3 takes longer time than 8.2. I doubt some bugs in the
> HOT-related codes here, but I cannot point out the actual position yet...

Yeah, it's definitely a HOT-introdued thing. Vacuum calls
heap_page_prune on every page, and this in heap_page_prune is dirtying
the buffer:

>   else
>   {
>   /*
>* If we didn't prune anything, we have nonetheless updated the
>* pd_prune_xid field; treat this as a non-WAL-logged hint.
>*/
>   SetBufferCommitInfoNeedsSave(buffer);
>   }

I don't have time to dig deeper at this moment. I'll take a look later
today, unless someone beats me to it. We obviously don't want to call
SetBufferCommitInfoNeedsSave if we didn't really change the pd_prune_xid
field.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] VACUUM always makes all pages dirty

2007-10-24 Thread ITAGAKI Takahiro
VACUUM in 8.3dev always makes all pages dirty even if there are no jobs.
In 8.2.5, VACUUM produces no dirty pages in the same workload. Therefore,
VACUUM on 8.3 takes longer time than 8.2. I doubt some bugs in the
HOT-related codes here, but I cannot point out the actual position yet...

Do you have any idea on this issue?


 # CREATE VIEW buffers AS
 SELECT nbuf, ndirty, N.nspname AS schemaname, relname
 FROM (SELECT count(*) AS nbuf,
   sum(CASE WHEN isdirty THEN 1 ELSE 0 END) AS ndirty,
   relnamespace, relname
   FROM pg_buffercache B, pg_class C
   WHERE B.relfilenode = C.relfilenode GROUP BY relnamespace, relname) AS T
 LEFT JOIN pg_namespace N ON relnamespace = N.oid
 ORDER BY schemaname, relname;

 # BEGIN;
 # CREATE TABLE test (i integer, filler char(100) default '');
 # INSERT INTO test(i) SELECT generate_series(1, 100);
 # COMMIT;
 # SELECT * FROM buffers WHERE relname = 'test';
nbuf  | ndirty | schemaname | relname
   ---+++-
17242 |  17242 | public | test

First query makes all pages dirty because of hit bits.
 # CHECKPOINT;
 # SELECT count(*) FROM test;
 count
   -
100
 # SELECT * FROM buffers WHERE relname = 'test';
nbuf  | ndirty | schemaname | relname
   ---+++-
17242 |  17242 | public | test

First vacuum makes all pages dirty. Why?
 # CHECKPOINT;
 # VACUUM test;
 # SELECT * FROM buffers WHERE relname = 'test';
nbuf  | ndirty | schemaname | relname
   ---+++-
17242 |  17242 | public | test

Second vacuum makes all pages dirty, too. Why?
 # CHECKPOINT;
 # VACUUM test;
 # SELECT * FROM buffers WHERE relname = 'test';
nbuf  | ndirty | schemaname | relname
   ---+++-
17242 |  17242 | public | test

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate