The latest iteration.
I have added documentation and updated the expected output so that the regression tests pass.
In addition, after looking at the various system view names, I decided that 'pg_cache_dump' does not fit in nicely - so chose an more Pg suitable name of 'pg_buffercache'. Some renaming of the backend functions happened too. Finally, since I was saving blocknum, it went into the view as well.
Hopefully I am dealing with invalid buffer tags sensibly now. The per-buffer spin lock is still being held - altho it is obviously trivial to remove if not actually required.
regards
Mark
P.s : remembered to use diff -c
Mark Kirkwood wrote:
Neil Conway wrote:
Tom Lane wrote:
It'd be possible to dispense with the per-buffer spinlocks so long as
you look only at the tag (and perhaps the TAG_VALID flag bit). The
tags can't be changing while you hold the BufMappingLock.
That's what I had thought at first, but this comment in buf_internals.h dissuaded me: "buf_hdr_lock must be held to examine or change the tag, flags, usage_count, refcount, or wait_backend_id fields." The comment already notes this isn't true if you've got the buffer pinned; it would be worth adding another exception for holding the BufMappingLock, IMHO.
I'm dubious that there's any point in recording information as transient as the refcounts and dirtybits
I think it's worth recording dirty bits -- it provides an indication of the effectiveness of the bgwriter, for example. Reference counts could be done away with, although I doubt it would have a significant effect on the time spent holding the lock.
Let's suppose refcount is eliminated. I will then be examining the tag, flags and buf_id elements of the buffer. Holding the BufMappingLock prevents the tag changing, but what about the flags?
In addition Tom pointed out that I am not examining the BM_TAG_VALID or BM_VALID flag bits (I am only checking if tag.blockNum equals InvalidBlockNumber). My initial thought is to handle !BM_TAG_VALID or !BM_VALID similarly to InvalidBlockNumber i.e all non buf_id fields set to NULL.
Mark
---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
diff -Nacr pgsql.orig/doc/src/sgml/catalogs.sgml
pgsql/doc/src/sgml/catalogs.sgml
*** pgsql.orig/doc/src/sgml/catalogs.sgml Mon Mar 7 12:20:17 2005
--- pgsql/doc/src/sgml/catalogs.sgml Tue Mar 8 12:03:50 2005
***************
*** 3875,3880 ****
--- 3875,3885 ----
<tbody>
<row>
+ <entry><link
linkend="view-pg-buffercache"><structname>pg_buffercache</structname></link></entry>
+ <entry>shared buffer cache</entry>
+ </row>
+
+ <row>
<entry><link
linkend="view-pg-indexes"><structname>pg_indexes</structname></link></entry>
<entry>indexes</entry>
</row>
***************
*** 3917,3922 ****
--- 3922,4021 ----
</tbody>
</tgroup>
</table>
+ </sect1>
+
+ <sect1 id="view-pg-buffercache">
+ <title><structname>pg_buffercache</structname></title>
+
+ <indexterm zone="view-pg-buffercache">
+ <primary>pg_buffercache</primary>
+ </indexterm>
+ <para>
+ The view <structname>pg_buffercache</structname> provides access to
+ some information from the shared buffer cache.
+ </para>
+
+ <para>
+ There is one row for each buffer in the shared cache. Unused buffers are
+ shown with all fields null except <structfield>bufferid</structfield>.
+ Because the cache is shared by all the databases, there are pages from
+ relations not belonging to the current database.
+ </para>
+
+
+ <table>
+ <title><structname>pg_buffercache</structname> Columns</title>
+
+ <tgroup cols=4>
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>bufferid</entry>
+ <entry><type>integer</type></entry>
+ <entry></entry>
+ <entry>
+ The buffer number. This is numbered 1 to
<varname>shared_buffers</varname>.
+ </entry>
+ </row>
+ <row>
+ <entry>relfilenode</entry>
+ <entry><type>oid</type></entry>
+ <entry><literal><link
linkend="catalog-pg-class"><structname>pg_class</structname></link>.relfilenode</literal></entry>
+ <entry>
+ The on-disk file for the relation that this page came from.
+ </entry>
+ </row>
+ <row>
+ <entry>reltablespace</entry>
+ <entry><type>oid</type></entry>
+ <entry>
+ <literal><link
linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link>.oid</literal>
+ </entry>
+ <entry>Tablespace the corresponding relation is in.</entry>
+ </row>
+ <row>
+ <entry>reldatabase</entry>
+ <entry><type>oid</type></entry>
+ <entry><literal><link
linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
+ <entry>
+ Database the corresponding relation belongs to, or zero if the
+ relation is a globally-shared table</entry>
+ </row>
+ <row>
+ <entry>relblocknumber</entry>
+ <entry><type>numeric</type></entry>
+ <entry></entry>
+ <entry>
+ The page offset within the relation that this buffer page is for.
+ </entry>
+ </row>
+ <row>
+ <entry>isdirty</entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>True if the buffer is dirty.</entry>
+ </row>
+ </tbody>
+ </tgroup>
+
+ </table>
+
+ <para>
+ When the <structname>pg_buffercache</structname> view is accessed,
+ internal buffer manager locks are taken, and a copy of the buffer cache
data
+ is made for the view to display. This ensures that the view produces a
+ consistent set of results, while not blocking normal buffer activity
longer
+ than necessary. Nonetheless there could be some impact on database
+ performance if this view is read often.
+ </para>
+
</sect1>
<sect1 id="view-pg-indexes">
diff -Nacr pgsql.orig/src/backend/catalog/system_views.sql
pgsql/src/backend/catalog/system_views.sql
*** pgsql.orig/src/backend/catalog/system_views.sql Fri Mar 4 14:23:09 2005
--- pgsql/src/backend/catalog/system_views.sql Tue Mar 8 11:45:39 2005
***************
*** 277,279 ****
--- 277,285 ----
DO INSTEAD NOTHING;
GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
+
+ CREATE VIEW pg_buffercache AS
+ SELECT P.* FROM pg_buffercache_pages() AS P
+ (bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid,
+ relblocknumber numeric(10), isdirty bool);
+
diff -Nacr pgsql.orig/src/backend/utils/adt/Makefile
pgsql/src/backend/utils/adt/Makefile
*** pgsql.orig/src/backend/utils/adt/Makefile Mon Mar 7 10:16:24 2005
--- pgsql/src/backend/utils/adt/Makefile Tue Mar 8 11:44:10 2005
***************
*** 24,30 ****
tid.o timestamp.o varbit.o varchar.o varlena.o version.o xid.o \
network.o mac.o inet_net_ntop.o inet_net_pton.o \
ri_triggers.o pg_lzcompress.o pg_locale.o formatting.o \
! ascii.o quote.o pgstatfuncs.o encode.o
like.o: like.c like_match.c
--- 24,30 ----
tid.o timestamp.o varbit.o varchar.o varlena.o version.o xid.o \
network.o mac.o inet_net_ntop.o inet_net_pton.o \
ri_triggers.o pg_lzcompress.o pg_locale.o formatting.o \
! ascii.o quote.o pgstatfuncs.o encode.o buffercache.o
like.o: like.c like_match.c
diff -Nacr pgsql.orig/src/backend/utils/adt/buffercache.c
pgsql/src/backend/utils/adt/buffercache.c
*** pgsql.orig/src/backend/utils/adt/buffercache.c Thu Jan 1 12:00:00 1970
--- pgsql/src/backend/utils/adt/buffercache.c Tue Mar 8 11:48:56 2005
***************
*** 0 ****
--- 1,237 ----
+ /*-------------------------------------------------------------------------
+ *
+ * buffercache.c
+ * display some contents of the buffer cache
+ *
+ * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * $PostgreSQL$
+ *-------------------------------------------------------------------------
+ */
+ #include "postgres.h"
+ #include "funcapi.h"
+ #include "catalog/pg_type.h"
+ #include "storage/buf_internals.h"
+ #include "storage/bufmgr.h"
+ #include "utils/relcache.h"
+ #include "utils/builtins.h"
+
+
+ #define NUM_BUFFERCACHE_PAGES_ELEM 6
+
+
+ /*
+ * Record structure holding the to be exposed cache data.
+ */
+ typedef struct
+ {
+ uint32 bufferid;
+ Oid relfilenode;
+ Oid reltablespace;
+ Oid reldatabase;
+ BlockNumber blocknum;
+ bool isvalid;
+ bool isdirty;
+
+ } BufferCachePagesRec;
+
+
+ /*
+ * Function context for data persisting over repeated calls.
+ */
+ typedef struct
+ {
+ AttInMetadata *attinmeta;
+ BufferCachePagesRec *record;
+ char *values[NUM_BUFFERCACHE_PAGES_ELEM];
+ } BufferCachePagesContext;
+
+
+ /*
+ * Function returning data from the shared buffer cache - buffer number,
+ * relation node/tablespace/database/blocknum and dirty indicator.
+ */
+ Datum
+ pg_buffercache_pages(PG_FUNCTION_ARGS)
+ {
+ FuncCallContext *funcctx;
+ Datum result;
+ MemoryContext oldcontext;
+ BufferCachePagesContext *fctx; /* User function context. */
+ TupleDesc tupledesc;
+ HeapTuple tuple;
+
+ if (SRF_IS_FIRSTCALL())
+ {
+ RelFileNode rnode;
+ uint32 i;
+ BufferDesc *bufHdr;
+
+
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /* Switch context when allocating stuff to be used in later
calls */
+ oldcontext =
MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ /* construct a tuple to return */
+ tupledesc = CreateTemplateTupleDesc(NUM_BUFFERCACHE_PAGES_ELEM,
false);
+ TupleDescInitEntry(tupledesc, (AttrNumber) 1, "bufferid",
+
INT4OID, -1, 0);
+ TupleDescInitEntry(tupledesc, (AttrNumber) 2, "relfilenode",
+ OIDOID,
-1, 0);
+ TupleDescInitEntry(tupledesc, (AttrNumber) 3, "reltablespace",
+ OIDOID,
-1, 0);
+ TupleDescInitEntry(tupledesc, (AttrNumber) 4, "reldatabase",
+ OIDOID,
-1, 0);
+ TupleDescInitEntry(tupledesc, (AttrNumber) 5, "relblockbumber",
+
NUMERICOID, -1, 0);
+ TupleDescInitEntry(tupledesc, (AttrNumber) 6, "isdirty",
+
BOOLOID, -1, 0);
+
+ /* Generate attribute metadata needed later to produce tuples */
+ funcctx->attinmeta = TupleDescGetAttInMetadata(tupledesc);
+
+ /*
+ * Create a function context for cross-call persistence
+ * and initialize the buffer counters.
+ */
+ fctx = (BufferCachePagesContext *)
palloc(sizeof(BufferCachePagesContext));
+ funcctx->max_calls = NBuffers;
+ funcctx->user_fctx = fctx;
+
+
+ /* Allocate NBuffers worth of BufferCachePagesRec records. */
+ fctx->record = (BufferCachePagesRec *)
palloc(sizeof(BufferCachePagesRec) * NBuffers);
+
+ /* allocate the strings for tuple formation */
+ fctx->values[0] = (char *) palloc(3 * sizeof(uint32) + 1);
+ fctx->values[1] = (char *) palloc(3 * sizeof(uint32) + 1);
+ fctx->values[2] = (char *) palloc(3 * sizeof(uint32) + 1);
+ fctx->values[3] = (char *) palloc(3 * sizeof(uint32) + 1);
+ fctx->values[4] = (char *) palloc(3 * sizeof(uint32) + 1);
+ fctx->values[5] = (char *) palloc(10);
+
+
+ /* Return to original context when allocating transient memory
*/
+ MemoryContextSwitchTo(oldcontext);
+
+
+ /*
+ * Lock Buffer map and scan though all the buffers, saving the
+ * relevant fields in the fctx->record structure.
+ */
+ LWLockAcquire(BufMappingLock, LW_SHARED);
+
+ for (i = 0, bufHdr = BufferDescriptors; i < NBuffers; i++,
bufHdr++)
+ {
+ /* Lock each buffer header before inspecting. */
+ LockBufHdr(bufHdr);
+
+ rnode = bufHdr->tag.rnode;
+
+ fctx->record[i].bufferid =
BufferDescriptorGetBuffer(bufHdr);
+ fctx->record[i].relfilenode = rnode.relNode;
+ fctx->record[i].reltablespace = rnode.spcNode;
+ fctx->record[i].reldatabase = rnode.dbNode;
+ fctx->record[i].blocknum = bufHdr->tag.blockNum;
+
+ if ( bufHdr->flags & BM_DIRTY)
+ {
+ fctx->record[i].isdirty = true;
+ }
+ else
+ {
+ fctx->record[i].isdirty = false;
+ }
+
+ /* Note if the buffer is valid, and has storage created
*/
+ if ( (bufHdr->flags & BM_VALID) && (bufHdr->flags &
BM_TAG_VALID))
+ {
+ fctx->record[i].isvalid = true;
+ }
+ else
+ {
+ fctx->record[i].isvalid = false;
+ }
+
+ UnlockBufHdr(bufHdr);
+
+ }
+
+ /* Release Buffer map. */
+ LWLockRelease(BufMappingLock);
+ }
+
+ funcctx = SRF_PERCALL_SETUP();
+
+ /* Get the saved state */
+ fctx = funcctx->user_fctx;
+
+
+ if (funcctx->call_cntr < funcctx->max_calls)
+ {
+ uint32 i = funcctx->call_cntr;
+ char *values[NUM_BUFFERCACHE_PAGES_ELEM];
+ int j;
+
+ /*
+ * Use a temporary values array, initially pointing to
+ * fctx->values, so it can be reassigned w/o losing the storage
+ * for subsequent calls.
+ */
+ for (j = 0; j < NUM_BUFFERCACHE_PAGES_ELEM; j++)
+ {
+ values[j] = fctx->values[j];
+ }
+
+
+ /*
+ * Set all fields except the bufferid to null if the buffer is
+ * unused or not valid.
+ */
+ if (fctx->record[i].blocknum == InvalidBlockNumber ||
+ fctx->record[i].isvalid == false )
+ {
+
+ sprintf(values[0], "%u", fctx->record[i].bufferid);
+ values[1] = NULL;
+ values[2] = NULL;
+ values[3] = NULL;
+ values[4] = NULL;
+ values[5] = NULL;
+
+ }
+ else
+ {
+
+ sprintf(values[0], "%u", fctx->record[i].bufferid);
+ sprintf(values[1], "%u", fctx->record[i].relfilenode);
+ sprintf(values[2], "%u", fctx->record[i].reltablespace);
+ sprintf(values[3], "%u", fctx->record[i].reldatabase);
+ sprintf(values[4], "%u", fctx->record[i].blocknum);
+ if (fctx->record[i].isdirty)
+ {
+ strcpy(values[5], "true");
+ }
+ else
+ {
+ strcpy(values[5], "false");
+ }
+
+ }
+
+
+ /* Build and return the tuple. */
+ tuple = BuildTupleFromCStrings(funcctx->attinmeta, values);
+ result = HeapTupleGetDatum(tuple);
+
+
+ SRF_RETURN_NEXT(funcctx, result);
+ }
+ else
+ SRF_RETURN_DONE(funcctx);
+ }
+
diff -Nacr pgsql.orig/src/include/catalog/pg_proc.h
pgsql/src/include/catalog/pg_proc.h
*** pgsql.orig/src/include/catalog/pg_proc.h Fri Mar 4 14:24:20 2005
--- pgsql/src/include/catalog/pg_proc.h Tue Mar 8 11:46:06 2005
***************
*** 3615,3620 ****
--- 3615,3622 ----
DATA(insert OID = 2558 ( int4 PGNSP PGUID 12 f f t
f i 1 23 "16" _null_ bool_int4 - _null_ ));
DESCR("convert boolean to int4");
+ /* builtin for cache internals view */
+ DATA(insert OID = 2510 ( pg_buffercache_pages PGNSP PGUID 12 f f t t v 0 2249
"" _null_ pg_buffercache_pages - _null_ ));
/*
* Symbolic values for provolatile column: these indicate whether the result
diff -Nacr pgsql.orig/src/include/utils/builtins.h
pgsql/src/include/utils/builtins.h
*** pgsql.orig/src/include/utils/builtins.h Fri Mar 4 14:24:31 2005
--- pgsql/src/include/utils/builtins.h Tue Mar 8 11:46:26 2005
***************
*** 823,826 ****
--- 823,829 ----
/* catalog/pg_conversion.c */
extern Datum pg_convert_using(PG_FUNCTION_ARGS);
+ /* cache dump */
+ extern Datum pg_buffercache_pages(PG_FUNCTION_ARGS);
+
#endif /* BUILTINS_H */
diff -Nacr pgsql.orig/src/test/regress/expected/rules.out
pgsql/src/test/regress/expected/rules.out
*** pgsql.orig/src/test/regress/expected/rules.out Tue Mar 8 13:00:00 2005
--- pgsql/src/test/regress/expected/rules.out Tue Mar 8 12:59:24 2005
***************
*** 1275,1280 ****
--- 1275,1281 ----
viewname |
definition
--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
iexit | SELECT ih.name, ih.thepath,
interpt_pp(ih.thepath, r.thepath) AS exit FROM ihighway ih, ramp r WHERE
(ih.thepath ## r.thepath);
+ pg_buffercache | SELECT p.bufferid, p.relfilenode,
p.reltablespace, p.reldatabase, p.relblocknumber, p.isdirty FROM
pg_buffercache_pages() p(bufferid integer, relfilenode oid, reltablespace oid,
reldatabase oid, relblocknumbernumeric(10,0), isdirty boolean);
pg_indexes | SELECT n.nspname AS schemaname, c.relname AS
tablename, i.relname AS indexname, t.spcname AS "tablespace",
pg_get_indexdef(i.oid) AS indexdef FROM ((((pg_index x JOIN pg_class c ON
((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN
pg_namespace n ON ((n.oid = c.relnamespace))) LEFT JOIN pg_tablespace t ON
((t.oid = i.reltablespace))) WHERE ((c.relkind = 'r'::"char") AND (i.relkind =
'i'::"char"));
pg_locks | SELECT l.relation, l."database", l."transaction",
l.pid, l."mode", l.granted FROM pg_lock_status() l(relation oid, "database"
oid, "transaction" xid, pid integer, "mode" text, granted boolean);
pg_rules | SELECT n.nspname AS schemaname, c.relname AS
tablename, r.rulename, pg_get_ruledef(r.oid) AS definition FROM ((pg_rewrite r
JOIN pg_class c ON ((c.oid = r.ev_class))) LEFT JOIN pg_namespace n ON ((n.oid
= c.relnamespace))) WHERE (r.rulename <> '_RETURN'::name);
***************
*** 1314,1320 ****
shoelace_obsolete | SELECT shoelace.sl_name, shoelace.sl_avail,
shoelace.sl_color, shoelace.sl_len, shoelace.sl_unit, shoelace.sl_len_cm FROM
shoelace WHERE (NOT (EXISTS (SELECT shoe.shoename FROM shoe WHERE (shoe.slcolor
= shoelace.sl_color))));
street | SELECT r.name, r.thepath, c.cname FROM ONLY road
r, real_city c WHERE (c.outline ## r.thepath);
toyemp | SELECT emp.name, emp.age, emp."location", (12 *
emp.salary) AS annualsal FROM emp;
! (40 rows)
SELECT tablename, rulename, definition FROM pg_rules
ORDER BY tablename, rulename;
--- 1315,1321 ----
shoelace_obsolete | SELECT shoelace.sl_name, shoelace.sl_avail,
shoelace.sl_color, shoelace.sl_len, shoelace.sl_unit, shoelace.sl_len_cm FROM
shoelace WHERE (NOT (EXISTS (SELECT shoe.shoename FROM shoe WHERE (shoe.slcolor
= shoelace.sl_color))));
street | SELECT r.name, r.thepath, c.cname FROM ONLY road
r, real_city c WHERE (c.outline ## r.thepath);
toyemp | SELECT emp.name, emp.age, emp."location", (12 *
emp.salary) AS annualsal FROM emp;
! (41 rows)
SELECT tablename, rulename, definition FROM pg_rules
ORDER BY tablename, rulename;
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
