Re: [PATCHES] Display Pg buffer cache (WIP)

Mark Kirkwood Mon, 07 Mar 2005 17:38:26 -0800

The latest iteration.

I have added documentation and updated the expected output so that the
regression tests pass.

In addition, after looking at the various system view names, I decided
that 'pg_cache_dump' does not fit in nicely - so chose an more Pg
suitable name of 'pg_buffercache'. Some renaming of the backend
functions happened too. Finally, since I was saving blocknum, it went
into the view as well.

Hopefully I am dealing with invalid buffer tags sensibly now. The
per-buffer spin lock is still being held - altho it is obviously trivial
to remove if not actually required.

regards

Mark

P.s : remembered to use diff -c

Mark Kirkwood wrote:

Neil Conway wrote:
Tom Lane wrote:
It'd be possible to dispense with the per-buffer spinlocks so long as you look only at the tag (and perhaps the TAG_VALID flag bit). The tags can't be changing while you hold the BufMappingLock.
That's what I had thought at first, but this comment in buf_internals.h dissuaded me: "buf_hdr_lock must be held to examine or change the tag, flags, usage_count, refcount, or wait_backend_id fields." The comment already notes this isn't true if you've got the buffer pinned; it would be worth adding another exception for holding the BufMappingLock, IMHO.
I'm dubious that there's any point in recording information as
transient as the refcounts and dirtybits
I think it's worth recording dirty bits -- it provides an indication of the effectiveness of the bgwriter, for example. Reference counts could be done away with, although I doubt it would have a significant effect on the time spent holding the lock.
Let's suppose refcount is eliminated. I will then be examining the tag,
flags and buf_id elements of the buffer. Holding the BufMappingLock
prevents the tag changing, but what about the flags?
In addition Tom pointed out that I am not examining the BM_TAG_VALID or
BM_VALID flag bits (I am only checking if tag.blockNum equals
InvalidBlockNumber). My initial thought is to handle !BM_TAG_VALID or
!BM_VALID similarly to InvalidBlockNumber i.e all non buf_id fields set
to NULL.
Mark
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

diff -Nacr pgsql.orig/doc/src/sgml/catalogs.sgml 
pgsql/doc/src/sgml/catalogs.sgml
*** pgsql.orig/doc/src/sgml/catalogs.sgml       Mon Mar  7 12:20:17 2005
--- pgsql/doc/src/sgml/catalogs.sgml    Tue Mar  8 12:03:50 2005
***************
*** 3875,3880 ****
--- 3875,3885 ----
  
      <tbody>
       <row>
+       <entry><link 
linkend="view-pg-buffercache"><structname>pg_buffercache</structname></link></entry>
+       <entry>shared buffer cache</entry>
+      </row>
+ 
+      <row>
        <entry><link 
linkend="view-pg-indexes"><structname>pg_indexes</structname></link></entry>
        <entry>indexes</entry>
       </row>
***************
*** 3917,3922 ****
--- 3922,4021 ----
      </tbody>
     </tgroup>
    </table>
+  </sect1>
+ 
+  <sect1 id="view-pg-buffercache">
+   <title><structname>pg_buffercache</structname></title>
+ 
+   <indexterm zone="view-pg-buffercache">
+    <primary>pg_buffercache</primary>
+   </indexterm>
+   <para>
+    The view <structname>pg_buffercache</structname> provides access to
+    some information from the shared buffer cache.
+   </para>
+ 
+   <para>
+    There is one row for each buffer in the shared cache. Unused buffers are 
+    shown with all fields null except <structfield>bufferid</structfield>.
+    Because the cache is shared by all the databases, there are pages from
+    relations not belonging to the current database. 
+   </para>
+ 
+ 
+   <table>
+    <title><structname>pg_buffercache</structname> Columns</title>
+ 
+    <tgroup cols=4>
+     <thead>
+      <row>
+       <entry>Name</entry>
+       <entry>Type</entry>
+       <entry>References</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>bufferid</entry>
+       <entry><type>integer</type></entry>
+       <entry></entry>
+       <entry>
+        The buffer number. This is numbered 1 to 
<varname>shared_buffers</varname>.
+       </entry>
+      </row>
+      <row>
+       <entry>relfilenode</entry>
+       <entry><type>oid</type></entry>
+       <entry><literal><link 
linkend="catalog-pg-class"><structname>pg_class</structname></link>.relfilenode</literal></entry>
+       <entry>
+       The on-disk file for the relation that this page came from.
+       </entry>
+      </row>
+      <row>
+       <entry>reltablespace</entry>
+       <entry><type>oid</type></entry>
+       <entry>
+       <literal><link 
linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link>.oid</literal>
+       </entry>
+       <entry>Tablespace the corresponding relation is in.</entry>
+      </row>
+      <row>
+       <entry>reldatabase</entry>
+       <entry><type>oid</type></entry>
+       <entry><literal><link 
linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+       <entry>
+        Database the corresponding relation belongs to, or zero if the 
+        relation is a globally-shared table</entry>
+      </row>
+      <row>
+       <entry>relblocknumber</entry>
+       <entry><type>numeric</type></entry>
+       <entry></entry>
+       <entry>
+        The page offset within the relation that this buffer page is for.
+       </entry>
+      </row>
+      <row>
+       <entry>isdirty</entry>
+       <entry><type>bool</type></entry>
+       <entry></entry>
+       <entry>True if the buffer is dirty.</entry>
+      </row>
+     </tbody>
+    </tgroup>
+ 
+   </table>
+ 
+   <para>
+    When the <structname>pg_buffercache</structname> view is accessed, 
+    internal buffer manager locks are taken, and a copy of the buffer cache 
data
+    is made for the view to display. This ensures that the view produces a 
+    consistent set of results, while not blocking normal buffer activity 
longer 
+    than necessary.  Nonetheless there could be some impact on database 
+    performance if this view is read often.
+   </para>
+ 
   </sect1>
  
   <sect1 id="view-pg-indexes">
diff -Nacr pgsql.orig/src/backend/catalog/system_views.sql 
pgsql/src/backend/catalog/system_views.sql
*** pgsql.orig/src/backend/catalog/system_views.sql     Fri Mar  4 14:23:09 2005
--- pgsql/src/backend/catalog/system_views.sql  Tue Mar  8 11:45:39 2005
***************
*** 277,279 ****
--- 277,285 ----
      DO INSTEAD NOTHING;
  
  GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
+ 
+ CREATE VIEW pg_buffercache AS
+       SELECT P.* FROM pg_buffercache_pages() AS P
+       (bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid, 
+        relblocknumber numeric(10), isdirty bool);
+ 
diff -Nacr pgsql.orig/src/backend/utils/adt/Makefile 
pgsql/src/backend/utils/adt/Makefile
*** pgsql.orig/src/backend/utils/adt/Makefile   Mon Mar  7 10:16:24 2005
--- pgsql/src/backend/utils/adt/Makefile        Tue Mar  8 11:44:10 2005
***************
*** 24,30 ****
        tid.o timestamp.o varbit.o varchar.o varlena.o version.o xid.o \
        network.o mac.o inet_net_ntop.o inet_net_pton.o \
        ri_triggers.o pg_lzcompress.o pg_locale.o formatting.o \
!       ascii.o quote.o pgstatfuncs.o encode.o
  
  like.o: like.c like_match.c
  
--- 24,30 ----
        tid.o timestamp.o varbit.o varchar.o varlena.o version.o xid.o \
        network.o mac.o inet_net_ntop.o inet_net_pton.o \
        ri_triggers.o pg_lzcompress.o pg_locale.o formatting.o \
!       ascii.o quote.o pgstatfuncs.o encode.o buffercache.o
  
  like.o: like.c like_match.c
  
diff -Nacr pgsql.orig/src/backend/utils/adt/buffercache.c 
pgsql/src/backend/utils/adt/buffercache.c
*** pgsql.orig/src/backend/utils/adt/buffercache.c      Thu Jan  1 12:00:00 1970
--- pgsql/src/backend/utils/adt/buffercache.c   Tue Mar  8 11:48:56 2005
***************
*** 0 ****
--- 1,237 ----
+ /*-------------------------------------------------------------------------
+  *
+  * buffercache.c
+  *    display some contents of the buffer cache
+  *
+  * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  *
+  * IDENTIFICATION
+  *      $PostgreSQL$
+  *-------------------------------------------------------------------------
+  */
+ #include "postgres.h"
+ #include "funcapi.h"
+ #include "catalog/pg_type.h"
+ #include "storage/buf_internals.h"
+ #include "storage/bufmgr.h"
+ #include "utils/relcache.h"
+ #include "utils/builtins.h"
+ 
+ 
+ #define NUM_BUFFERCACHE_PAGES_ELEM    6
+ 
+ 
+ /*
+  * Record structure holding the to be exposed cache data.
+  */
+ typedef struct
+ {
+       uint32          bufferid;
+       Oid                     relfilenode;
+       Oid                     reltablespace;
+       Oid                     reldatabase;
+       BlockNumber     blocknum;
+       bool            isvalid;
+       bool            isdirty;
+ 
+ } BufferCachePagesRec;
+ 
+ 
+ /*
+  * Function context for data persisting over repeated calls.
+  */
+ typedef struct 
+ {
+       AttInMetadata   *attinmeta;
+       BufferCachePagesRec     *record;
+       char                    *values[NUM_BUFFERCACHE_PAGES_ELEM];
+ } BufferCachePagesContext;
+ 
+ 
+ /*
+  * Function returning data from the shared buffer cache - buffer number,
+  * relation node/tablespace/database/blocknum and dirty indicator.
+  */
+ Datum
+ pg_buffercache_pages(PG_FUNCTION_ARGS)
+ {
+       FuncCallContext         *funcctx;
+       Datum                           result;
+       MemoryContext           oldcontext;
+       BufferCachePagesContext *fctx;          /* User function context. */
+       TupleDesc                       tupledesc;
+       HeapTuple                       tuple;
+ 
+       if (SRF_IS_FIRSTCALL())
+       {
+               RelFileNode     rnode;
+               uint32          i;
+               BufferDesc      *bufHdr;
+ 
+ 
+               funcctx = SRF_FIRSTCALL_INIT();
+ 
+               /* Switch context when allocating stuff to be used in later 
calls */
+               oldcontext = 
MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+               
+               /* construct a tuple to return */
+               tupledesc = CreateTemplateTupleDesc(NUM_BUFFERCACHE_PAGES_ELEM, 
false);
+               TupleDescInitEntry(tupledesc, (AttrNumber) 1, "bufferid",
+                                                                       
INT4OID, -1, 0);
+               TupleDescInitEntry(tupledesc, (AttrNumber) 2, "relfilenode",
+                                                                       OIDOID, 
-1, 0);
+               TupleDescInitEntry(tupledesc, (AttrNumber) 3, "reltablespace",
+                                                                       OIDOID, 
-1, 0);
+               TupleDescInitEntry(tupledesc, (AttrNumber) 4, "reldatabase",
+                                                                       OIDOID, 
-1, 0);
+               TupleDescInitEntry(tupledesc, (AttrNumber) 5, "relblockbumber",
+                                                                       
NUMERICOID, -1, 0);
+               TupleDescInitEntry(tupledesc, (AttrNumber) 6, "isdirty",
+                                                                       
BOOLOID, -1, 0);
+ 
+               /* Generate attribute metadata needed later to produce tuples */
+               funcctx->attinmeta = TupleDescGetAttInMetadata(tupledesc);
+ 
+               /* 
+                * Create a function context for cross-call persistence 
+                * and initialize the buffer counters.
+                */
+               fctx = (BufferCachePagesContext *) 
palloc(sizeof(BufferCachePagesContext));
+               funcctx->max_calls = NBuffers;
+               funcctx->user_fctx = fctx;
+ 
+ 
+               /* Allocate NBuffers worth of BufferCachePagesRec records. */
+               fctx->record = (BufferCachePagesRec *) 
palloc(sizeof(BufferCachePagesRec) * NBuffers);
+ 
+               /* allocate the strings for tuple formation */
+               fctx->values[0] = (char *) palloc(3 * sizeof(uint32) + 1);
+               fctx->values[1] = (char *) palloc(3 * sizeof(uint32) + 1);
+               fctx->values[2] = (char *) palloc(3 * sizeof(uint32) + 1);
+               fctx->values[3] = (char *) palloc(3 * sizeof(uint32) + 1);
+               fctx->values[4] = (char *) palloc(3 * sizeof(uint32) + 1);
+               fctx->values[5] = (char *) palloc(10);
+ 
+               
+               /* Return to original context when allocating transient memory 
*/
+               MemoryContextSwitchTo(oldcontext);
+ 
+ 
+               /* 
+                * Lock Buffer map and scan though all the buffers, saving the
+                * relevant fields in the fctx->record structure.
+                */
+               LWLockAcquire(BufMappingLock, LW_SHARED);
+ 
+               for (i = 0, bufHdr = BufferDescriptors; i < NBuffers; i++, 
bufHdr++)
+               {
+                       /* Lock each buffer header before inspecting. */
+                       LockBufHdr(bufHdr);
+ 
+                       rnode = bufHdr->tag.rnode;
+ 
+                       fctx->record[i].bufferid = 
BufferDescriptorGetBuffer(bufHdr);
+                       fctx->record[i].relfilenode = rnode.relNode;
+                       fctx->record[i].reltablespace = rnode.spcNode;
+                       fctx->record[i].reldatabase = rnode.dbNode;
+                       fctx->record[i].blocknum = bufHdr->tag.blockNum;
+ 
+                       if ( bufHdr->flags & BM_DIRTY) 
+                       {
+                               fctx->record[i].isdirty = true;
+                       }
+                       else
+                       {
+                               fctx->record[i].isdirty = false;
+                       }
+ 
+                       /* Note if the buffer is valid, and has storage created 
*/
+                       if ( (bufHdr->flags & BM_VALID) && (bufHdr->flags & 
BM_TAG_VALID))
+                       {
+                               fctx->record[i].isvalid = true;
+                       }
+                       else
+                       {
+                               fctx->record[i].isvalid = false;
+                       }
+ 
+                       UnlockBufHdr(bufHdr);
+ 
+               }
+ 
+               /* Release Buffer map. */
+               LWLockRelease(BufMappingLock);
+       }
+ 
+       funcctx = SRF_PERCALL_SETUP();
+       
+       /* Get the saved state */
+       fctx = funcctx->user_fctx;
+ 
+ 
+       if (funcctx->call_cntr < funcctx->max_calls)
+       {
+               uint32          i = funcctx->call_cntr;
+               char            *values[NUM_BUFFERCACHE_PAGES_ELEM];
+               int                     j;
+               
+               /* 
+                * Use a temporary values array, initially pointing to
+                * fctx->values, so it can be reassigned w/o losing the storage
+                * for subsequent calls.
+                */
+               for (j = 0; j < NUM_BUFFERCACHE_PAGES_ELEM; j++)
+               {
+                       values[j] = fctx->values[j];
+               }
+               
+ 
+               /*
+                * Set all fields except the bufferid to null if the buffer is
+                * unused or not valid.
+                */
+               if (fctx->record[i].blocknum == InvalidBlockNumber ||
+                       fctx->record[i].isvalid == false )
+               {
+ 
+                       sprintf(values[0], "%u", fctx->record[i].bufferid);
+                       values[1] = NULL;
+                       values[2] = NULL;
+                       values[3] = NULL;
+                       values[4] = NULL;
+                       values[5] = NULL;
+ 
+               }
+               else
+               {
+ 
+                       sprintf(values[0], "%u", fctx->record[i].bufferid);
+                       sprintf(values[1], "%u", fctx->record[i].relfilenode);
+                       sprintf(values[2], "%u", fctx->record[i].reltablespace);
+                       sprintf(values[3], "%u", fctx->record[i].reldatabase);
+                       sprintf(values[4], "%u", fctx->record[i].blocknum);
+                       if (fctx->record[i].isdirty) 
+                       {
+                               strcpy(values[5], "true");
+                       }
+                       else
+                       {
+                               strcpy(values[5], "false");
+                       }
+       
+               }
+ 
+ 
+               /* Build and return the tuple. */
+               tuple = BuildTupleFromCStrings(funcctx->attinmeta, values);
+               result = HeapTupleGetDatum(tuple);
+ 
+ 
+               SRF_RETURN_NEXT(funcctx, result);
+       }
+       else
+               SRF_RETURN_DONE(funcctx);
+ }
+ 
diff -Nacr pgsql.orig/src/include/catalog/pg_proc.h 
pgsql/src/include/catalog/pg_proc.h
*** pgsql.orig/src/include/catalog/pg_proc.h    Fri Mar  4 14:24:20 2005
--- pgsql/src/include/catalog/pg_proc.h Tue Mar  8 11:46:06 2005
***************
*** 3615,3620 ****
--- 3615,3622 ----
  DATA(insert OID = 2558 ( int4                            PGNSP PGUID 12 f f t 
f i 1  23 "16" _null_   bool_int4 - _null_ ));
  DESCR("convert boolean to int4");
  
+ /* builtin for cache internals view */
+ DATA(insert OID = 2510 ( pg_buffercache_pages PGNSP PGUID 12 f f t t v 0 2249 
"" _null_ pg_buffercache_pages - _null_ ));
  
  /*
   * Symbolic values for provolatile column: these indicate whether the result
diff -Nacr pgsql.orig/src/include/utils/builtins.h 
pgsql/src/include/utils/builtins.h
*** pgsql.orig/src/include/utils/builtins.h     Fri Mar  4 14:24:31 2005
--- pgsql/src/include/utils/builtins.h  Tue Mar  8 11:46:26 2005
***************
*** 823,826 ****
--- 823,829 ----
  /* catalog/pg_conversion.c */
  extern Datum pg_convert_using(PG_FUNCTION_ARGS);
  
+ /* cache dump */
+ extern Datum pg_buffercache_pages(PG_FUNCTION_ARGS);
+ 
  #endif   /* BUILTINS_H */
diff -Nacr pgsql.orig/src/test/regress/expected/rules.out 
pgsql/src/test/regress/expected/rules.out
*** pgsql.orig/src/test/regress/expected/rules.out      Tue Mar  8 13:00:00 2005
--- pgsql/src/test/regress/expected/rules.out   Tue Mar  8 12:59:24 2005
***************
*** 1275,1280 ****
--- 1275,1281 ----
           viewname         |                                                   
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                             definition                         
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                        
  
--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   iexit                    | SELECT ih.name, ih.thepath, 
interpt_pp(ih.thepath, r.thepath) AS exit FROM ihighway ih, ramp r WHERE 
(ih.thepath ## r.thepath);
+  pg_buffercache           | SELECT p.bufferid, p.relfilenode, 
p.reltablespace, p.reldatabase, p.relblocknumber, p.isdirty FROM 
pg_buffercache_pages() p(bufferid integer, relfilenode oid, reltablespace oid, 
reldatabase oid, relblocknumbernumeric(10,0), isdirty boolean);
   pg_indexes               | SELECT n.nspname AS schemaname, c.relname AS 
tablename, i.relname AS indexname, t.spcname AS "tablespace", 
pg_get_indexdef(i.oid) AS indexdef FROM ((((pg_index x JOIN pg_class c ON 
((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN 
pg_namespace n ON ((n.oid = c.relnamespace))) LEFT JOIN pg_tablespace t ON 
((t.oid = i.reltablespace))) WHERE ((c.relkind = 'r'::"char") AND (i.relkind = 
'i'::"char"));
   pg_locks                 | SELECT l.relation, l."database", l."transaction", 
l.pid, l."mode", l.granted FROM pg_lock_status() l(relation oid, "database" 
oid, "transaction" xid, pid integer, "mode" text, granted boolean);
   pg_rules                 | SELECT n.nspname AS schemaname, c.relname AS 
tablename, r.rulename, pg_get_ruledef(r.oid) AS definition FROM ((pg_rewrite r 
JOIN pg_class c ON ((c.oid = r.ev_class))) LEFT JOIN pg_namespace n ON ((n.oid 
= c.relnamespace))) WHERE (r.rulename <> '_RETURN'::name);
***************
*** 1314,1320 ****
   shoelace_obsolete        | SELECT shoelace.sl_name, shoelace.sl_avail, 
shoelace.sl_color, shoelace.sl_len, shoelace.sl_unit, shoelace.sl_len_cm FROM 
shoelace WHERE (NOT (EXISTS (SELECT shoe.shoename FROM shoe WHERE (shoe.slcolor 
= shoelace.sl_color))));
   street                   | SELECT r.name, r.thepath, c.cname FROM ONLY road 
r, real_city c WHERE (c.outline ## r.thepath);
   toyemp                   | SELECT emp.name, emp.age, emp."location", (12 * 
emp.salary) AS annualsal FROM emp;
! (40 rows)
  
  SELECT tablename, rulename, definition FROM pg_rules 
        ORDER BY tablename, rulename;
--- 1315,1321 ----
   shoelace_obsolete        | SELECT shoelace.sl_name, shoelace.sl_avail, 
shoelace.sl_color, shoelace.sl_len, shoelace.sl_unit, shoelace.sl_len_cm FROM 
shoelace WHERE (NOT (EXISTS (SELECT shoe.shoename FROM shoe WHERE (shoe.slcolor 
= shoelace.sl_color))));
   street                   | SELECT r.name, r.thepath, c.cname FROM ONLY road 
r, real_city c WHERE (c.outline ## r.thepath);
   toyemp                   | SELECT emp.name, emp.age, emp."location", (12 * 
emp.salary) AS annualsal FROM emp;
! (41 rows)
  
  SELECT tablename, rulename, definition FROM pg_rules 
        ORDER BY tablename, rulename;

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Re: [PATCHES] Display Pg buffer cache (WIP)

Reply via email to